
ADVANCED PL/SQL AND ORACLE 9I ETL

Doug Cosman, SageLogix, Inc.


This paper describes some of the more advanced PL/SQL capabilities, which can be exploited to improve application performance as well as add new functionality to the database. Topics covered include the advantages of nested tables over index-by tables, using table functions to return rows from a function, bulk binding, native compilation, returning cursors, and streaming data output using pipelined table functions. While these topics are important in their own right to the sophisticated PL/SQL developer, they are needed as background information for understanding Oracle's strategy for implementing an ETL solution within the database. It will be shown how all these concepts come together in the new Oracle 9i ETL functionality, which uses PL/SQL and 9i external tables to transform data without the use of third party ETL tools.

BULK BINDING
Interaction with Oracle in any host language, including PL/SQL, involves the binding of host variables in and out of the SQL engine. An 'in-bind' is when we pass a value from a program to the SQL engine, often either to constrain on a column or to specify a value for a DML statement. The following UPDATE statement uses in-binds for both purposes.

    DECLARE
      v_quantity NUMBER := 0;
      v_sales_id NUMBER := 2314;
    BEGIN
      UPDATE f_sales_detail
      SET    quantity = v_quantity
      WHERE  sales_id = v_sales_id;
    END;

Commonly, in-binds are only of interest because they are essential for SQL statements to be sharable. When DBAs talk of the importance of applications using 'bind variables' it is in the context of in-binds since, in applications that use dynamic SQL, using literals instead of bind variables causes each distinct SQL statement to be parsed separately. While this is a critical consideration for overall database performance, the relative cost of the bind in this statement is trivial because only a single bind is required regardless of how many rows are affected by the statement.

An 'out-bind' occurs when values are passed from the SQL engine back to the host language. Oracle makes the distinction between values that are passed back via a RETURNING clause in SQL and values that are passed back during a fetch operation, but for the purpose of this paper I will refer to both of these operations as out-binds. When processing a cursor, application developers can choose either to fetch back values one at a time or to return them in a batch operation, which will bind back many rows to the host application in a single operation. Before the release of Oracle 8i, values being bound out into PL/SQL host variables had to be fetched one at a time. The following CURSOR FOR-LOOP construct is a familiar one.
    DECLARE
      CURSOR cust_cur (p_customer_id NUMBER) IS
        SELECT *
        FROM   f_sales_detail
        WHERE  customer_id = p_customer_id;

      v_customer_id NUMBER := 1234;
    BEGIN
      FOR rec IN cust_cur (v_customer_id) LOOP
        INSERT INTO sales_hist (customer_id, detail_id, process_date)
        VALUES (v_customer_id, rec.sales_id, sysdate);
      END LOOP;
    END;

www.sagelogix.com    Oracle Open World 2003

In a CURSOR FOR-LOOP, a record variable is implicitly declared that matches the column list of the cursor. On each iteration of the loop, the execution context is switched from the PL/SQL engine to the SQL engine, performing an out-bind of the column values into the record variable once for each loop iteration. Likewise, an in-bind for the insert statement will occur once on each iteration. Although stored PL/SQL code has the advantage over other host languages of keeping this interaction within the same process, the context switching between the SQL engine and the PL/SQL engine is relatively expensive, making the above code very inefficient. In addition, the cursor is defined as SELECT * instead of selecting just the columns to be utilized, which is also inefficient. Whether the code references a column or not, Oracle will have to fetch and bind over all of the columns in the select list, slowing down code execution.

A better way to perform the above task would be to utilize bulk binding, introduced in Oracle 8i, for both the fetch and the insert statements. We have two new PL/SQL operators to accomplish this. The BULK COLLECT statement is used to specify bulk out-binds while the FORALL statement is used to provide bulk in-binds for DML statements.
    DECLARE
      TYPE sales_t IS TABLE OF f_sales_detail.sales_id%TYPE
        INDEX BY BINARY_INTEGER;

      sales_ids     sales_t;
      v_customer_id NUMBER := 1234;
      max_rows      CONSTANT NUMBER := 10000;

      CURSOR sales (p_customer_id NUMBER) IS
        SELECT sales_id
        FROM   f_sales_detail
        WHERE  customer_id = p_customer_id;
    BEGIN
      OPEN sales(v_customer_id);
      LOOP
        EXIT WHEN sales%NOTFOUND;
        FETCH sales BULK COLLECT INTO sales_ids LIMIT max_rows;
        FORALL i IN 1..sales_ids.COUNT
          INSERT INTO sales_hist (customer_id, detail_id, process_date)
          VALUES (v_customer_id, sales_ids(i), sysdate);
      END LOOP;
      CLOSE sales;
    END;

In this example, the fetch statement returns with the sales_ids array populated with all of the values fetched for the current iteration, with the maximum number of rows fetched set to 10,000. Using this method, only a single context switch per batch is required for the SELECT statement to populate the sales_ids array and another switch to bind all of the fetched values to the INSERT statements. Note also that the FORALL statement is not a looping construct: the array of values is given over in batch to the SQL engine for binding and execution. This second implementation will run at approximately 15 times the speed of the first, illustrating the importance of efficient binding in data driven code.

One potential issue with the bulk binding technique is the use of memory by the PL/SQL array variables. When a BULK COLLECT statement returns, all of the fetched values are stored in the target array. If the number of values returned is very large, this type of operation could lead to memory issues on the database server. The memory consumed by PL/SQL variables is private memory, allocated dynamically from the operating system. In dedicated server mode it would be the server process created for the current session that allocates memory. In the case where such allocation becomes extreme, either the host will become memory bound or the dedicated server process will reach a size where it tries to allocate beyond its addressing limits, normally 2 GB on many platforms. In either case the server process's call to malloc() will fail, raising an ORA-04030 out of process memory error. To prevent this possibility when loading anything larger than a small reference table, use the optional LIMIT clause, first introduced in Oracle 8.1.6, to control the 'batch size' of each BULK COLLECT operation. In the code example above the cursor will iterate through batches of 10,000 rows, fetching in the values and inserting 10,000 rows. On the final iteration, the cursor will fetch the remaining balance. Placement of the EXIT WHEN clause should be before the FETCH statement or the last, incomplete batch will not be processed.

The above example is only for the purpose of demonstrating coding techniques. The same logic could be accomplished totally in SQL using an INSERT AS SELECT statement. Doing this in pure SQL would be faster yet since no host binds or procedural execution would be required at all. This brings up an important point: never do anything procedurally that can be accomplished through well crafted SQL.
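As a rough sketch of the pure SQL alternative just described, the fetch-and-insert loop collapses into a single INSERT AS SELECT. It assumes the f_sales_detail and sales_hist tables from the earlier examples, and the literal customer id simply mirrors the value used there.

```sql
-- Set-based equivalent of the bulk-bind example: one statement,
-- no PL/SQL arrays and no context switches between the engines.
-- Assumes the f_sales_detail and sales_hist tables used above.
INSERT INTO sales_hist (customer_id, detail_id, process_date)
SELECT customer_id, sales_id, SYSDATE
FROM   f_sales_detail
WHERE  customer_id = 1234;
```

Bulk binding remains the better tool when each row needs procedural logic that cannot reasonably be expressed in the SELECT list.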

SQL TYPES VS. PL/SQL TYPES


Every programming language has what is called a type system: the set of data types that the language implements. Among other things, a language's type system determines how a value of a particular type is represented. Since PL/SQL was developed by Oracle as the procedural extension to SQL, these languages have an overlapping type system. For example, the PL/SQL NUMBER type is the same data type as the SQL NUMBER type. This is one of the main advantages of using PL/SQL as opposed to another language like Java for data intensive operations: there is no type conversion cost when binding values from a SQL query back to PL/SQL. However, since PL/SQL is a superset, not all PL/SQL data types are part of the SQL type system. For example, the types BOOLEAN and BINARY_INTEGER are only found in PL/SQL.
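The small anonymous block below is one way to picture the overlap, under the assumption of the Oracle releases discussed in this paper. A NUMBER moves between the two engines unchanged, while a BOOLEAN has to be mapped to a SQL type before it can take part in a SQL statement. All variable names are illustrative.

```sql
DECLARE
  v_row_count NUMBER;        -- NUMBER is shared by SQL and PL/SQL
  v_is_empty  BOOLEAN;       -- BOOLEAN exists only on the PL/SQL side
  v_flag      VARCHAR2(1);   -- SQL-compatible stand-in for the BOOLEAN
  v_label     VARCHAR2(10);
BEGIN
  -- The SQL NUMBER binds straight into the PL/SQL NUMBER.
  SELECT COUNT(*) INTO v_row_count FROM dual;

  -- Pure PL/SQL logic can use BOOLEAN freely.
  v_is_empty := (v_row_count = 0);

  -- Before going back to SQL, map the BOOLEAN to a shared type.
  IF v_is_empty THEN
    v_flag := 'Y';
  ELSE
    v_flag := 'N';
  END IF;

  SELECT DECODE(v_flag, 'Y', 'empty', 'not empty')
  INTO   v_label
  FROM   dual;

  dbms_output.put_line(v_label);
END;
/
```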

COLLECTIONS
There are three flavors of collection types: one that is only available in PL/SQL, and two others that are shared between both languages.

ASSOCIATIVE ARRAYS (PL/SQL TABLES)


Probably the most familiar collection type is the PL/SQL index-by table, now called an associative array in Oracle 9i Release 2. The code block below is a typical use of an associative array.
    DECLARE
      TYPE num_array IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
      powers num_array;
    BEGIN
      FOR i IN 1..100 LOOP
        powers(i) := power(2, i);
      END LOOP;
    END;

The element type of an associative array can be almost any PL/SQL type including a record type. The first thing to note is that PL/SQL associative array types are not SQL types, which is what one would expect since they are indexed by BINARY_INTEGER, a non-SQL type. This type of array is by far the most commonly used in PL/SQL code since it has the advantage of being the simplest to use. Array sizing and allocation is totally dynamic: there is no maximum array size other than the memory constraints imposed by the host environment. As illustrated above, the code does not need to extend the size of the array to add new elements.

Indexing By VARCHAR2
A welcome addition in 9i Release 2 is the ability to use the VARCHAR2 data type as an index key. Similar to associative arrays in Perl and Awk, or Hashtables in Java, this data type enables the look-up of a value using a VARCHAR2. Many existing PL/SQL routines can benefit from this capability. Now it is obvious why the table data type has been renamed to associative array.
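A minimal sketch of a VARCHAR2-indexed associative array follows. The type name, variable names and sample data are illustrative only. It shows a direct look-up by string key, an existence test, and a walk over the keys with FIRST and NEXT.

```sql
DECLARE
  -- Hypothetical look-up of state names keyed by two-letter code.
  TYPE state_names_t IS TABLE OF VARCHAR2(30) INDEX BY VARCHAR2(2);
  state_names state_names_t;
  v_key       VARCHAR2(2);
BEGIN
  -- No constructor and no EXTEND: assignment creates the element.
  state_names('CA') := 'California';
  state_names('NY') := 'New York';
  state_names('CO') := 'Colorado';

  -- Direct look-up by string key.
  dbms_output.put_line(state_names('CO'));

  -- Referencing a missing key raises NO_DATA_FOUND, so test first.
  IF NOT state_names.EXISTS('TX') THEN
    dbms_output.put_line('TX is not loaded');
  END IF;

  -- Iterate over the keys; FIRST and NEXT return them in sorted order.
  v_key := state_names.FIRST;
  WHILE v_key IS NOT NULL LOOP
    dbms_output.put_line(v_key || ' => ' || state_names(v_key));
    v_key := state_names.NEXT(v_key);
  END LOOP;
END;
/
```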

NESTED TABLES
Unlike associative arrays, the nested table data type is also a SQL data type. A nested table is similar to an associative array in that there is no maximum size to the array; however, a PL/SQL program needs to explicitly extend a nested table before assigning a new element to it. A nested table is an object type and therefore needs to first be initialized with a constructor before being used. For many PL/SQL programs, these two added requirements make regular associative arrays a better choice for basic array functionality in code; however, we will see that nested tables open up a whole new set of options that would not be possible with associative arrays.

Like all collection types, before a variable of a particular collection type can be defined, its type must first be declared. Since nested tables are a shared type, there are two ways that this can be done: locally in PL/SQL code or globally in the database. The first example shows a local PL/SQL declaration.
    DECLARE
      TYPE nest_tab_t IS TABLE OF NUMBER;
      nt nest_tab_t := nest_tab_t();
    BEGIN
      FOR i IN 1..100 LOOP
        nt.EXTEND;
        nt(i) := i;
      END LOOP;
    END;

Note that the variable was initialized to an empty nested table using the constructor for its type. Also, the example shows how the nested table EXTEND method is used to allocate a new element to the array so that it can be assigned to in the next statement. However, the most interesting use for nested tables is when you take advantage of sharing types with the SQL engine. This next example, which defines an object to hold demographic information associated with an email, lays the groundwork for many other possibilities. This is a SQL statement that would typically be run from a tool like SQL*Plus.
    CREATE OR REPLACE TYPE email_demo_obj_t AS OBJECT (
      email_id  NUMBER,
      demo_code NUMBER,
      value     VARCHAR2(30));
    /

    CREATE OR REPLACE TYPE email_demo_nt_t AS TABLE OF email_demo_obj_t;
    /

Note that in SQL*Plus, the syntax of the CREATE TYPE statement requires both a semi-colon and a forward slash on the next line, similar to the required syntax of a PL/SQL anonymous block, although this is in fact a SQL statement. Now that the required SQL types have been defined globally in the database they can be referenced in PL/SQL. The cool thing is that local variables in code of this type can now be treated like a SQL object, which means that the SQL engine can be used to manipulate local nested table variables as if they were true database tables.

Table Functions


To do this, the PL/SQL code executes a SQL statement passing the local nested table variable to the server. There are two special functions necessary to achieve this functionality. The TABLE function tells the server to bind over the values of the nested table, perform the requested SQL operation and return the results back as if the variable was a SQL table in the database. The CAST function is an explicit directive to the server to map the variable to the SQL type that was defined globally in the previous step. With this capability, many new operations become possible. For example, one can take a nested table of objects that have been created in code and send them to the server for ordering or aggregation. Almost any SQL operation is possible. For example, a nested table can be joined with other SQL tables in the database. The next example shows a simple ordering of an array by its first field, the email id.
    DECLARE
      eml_dmo_nt email_demo_nt_t := email_demo_nt_t();
    BEGIN
      -- Some logic that populates the nested table ...
      eml_dmo_nt.EXTEND(3);
      eml_dmo_nt(1) := email_demo_obj_t(45, 3, '23');
      eml_dmo_nt(2) := email_demo_obj_t(22, 3, '41');
      eml_dmo_nt(3) := email_demo_obj_t(18, 7, 'over_100k');

      -- Process the data in ascending order of email id.
      FOR r IN (SELECT *
                FROM   TABLE(CAST(eml_dmo_nt AS email_demo_nt_t))
                ORDER BY 1) LOOP
        dbms_output.put_line(r.email_id || ' ' || r.demo_code);
      END LOOP;
    END;
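The aggregation case mentioned earlier can be sketched the same way. The block below assumes the email_demo_obj_t and email_demo_nt_t types defined above already exist in the schema, and the sample values are made up for illustration. It lets the SQL engine count the elements of a local nested table per demographic code.

```sql
DECLARE
  eml_dmo_nt email_demo_nt_t := email_demo_nt_t();
BEGIN
  -- Illustrative sample data only.
  eml_dmo_nt.EXTEND(3);
  eml_dmo_nt(1) := email_demo_obj_t(45, 3, '23');
  eml_dmo_nt(2) := email_demo_obj_t(22, 3, '41');
  eml_dmo_nt(3) := email_demo_obj_t(18, 7, 'over_100k');

  -- Let the SQL engine group and count the local collection.
  FOR r IN (SELECT t.demo_code, COUNT(*) AS cnt
            FROM   TABLE(CAST(eml_dmo_nt AS email_demo_nt_t)) t
            GROUP BY t.demo_code
            ORDER BY t.demo_code) LOOP
    dbms_output.put_line('code ' || r.demo_code || ': ' || r.cnt || ' row(s)');
  END LOOP;
END;
/
```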

Another possibility is to exploit this technique to support direct path inserts into the database. For data warehouse applications, direct path is a way to dramatically increase insert performance by allowing the session's server process to format and write data blocks directly to the table segment, bypassing the buffer cache. This can be accomplished using the APPEND hint. Optionally, one can also suppress most of the redo log generation for the statement with NOLOGGING. Note that NOLOGGING is an attribute of the target table (set with ALTER TABLE ... NOLOGGING) rather than a statement hint, and it should normally only be used for inserts into staging tables that would not require recovery in the event of media failure.

However, direct path inserts are supported only for INSERT AS SELECT statements and not for INSERT BY VALUES. Recall that in the previous section on the advantages of bulk binding, it was noted that performing a bulk in-bind using a FORALL statement to implement an INSERT of associative array values was up to 15 times faster than a conventional CURSOR FOR-LOOP. However, the FORALL statement is implemented using an INSERT BY VALUES clause. Using the nested table approach, if one wanted to perform the insert using direct path, one could do so using the following syntax, assuming the type EMAIL_DEMO_NT_T was defined as a nested table of objects that matched the data types of the EMAIL_DEMOGRAPHIC table. This method, when both direct path and NOLOGGING features are appropriate, is the fastest way one can bind over a large number of values from PL/SQL to SQL tables for INSERTS.

    INSERT /*+ APPEND */ INTO email_demographic
      (SELECT * FROM TABLE(CAST(eml_dmo_nt AS email_demo_nt_t)));

VARRAYS
The last collection type to be discussed is the varray. Like nested tables, varrays can be both PL/SQL types and SQL types and therefore can take advantage of many of the features listed above. The main difference with varrays in PL/SQL is that their maximum size must be specified when the type is declared. It should be noted that both varray types and nested table types can define the column type of a SQL table. In the former case, if the size of the varray type is 4000 bytes or less, it can be stored in-line in the data block along with other column values. In contrast, the column data for a nested table is stored in a system managed child table, making it very similar to a normal parent/child table relationship. Because they have a shared type, PL/SQL nested table or varray variables can be used to atomically insert values into tables that use them. Apart from this capability, varrays are of less interest than nested tables to the PL/SQL developer because they have the restriction of an upper bound, and most anything one can do in code with a varray, one can do with a nested table.
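The upper bound is the practical difference in code, as the following sketch suggests. The type name, the limit of 3 and the sample values are all illustrative. Like a nested table, the varray is initialized with its constructor and extended before assignment, but extending it past the declared maximum raises the predefined SUBSCRIPT_OUTSIDE_LIMIT exception.

```sql
DECLARE
  -- Hypothetical varray type holding at most three phone numbers.
  TYPE phone_list_t IS VARRAY(3) OF VARCHAR2(20);
  phones phone_list_t := phone_list_t();
BEGIN
  phones.EXTEND(3);
  phones(1) := '555-0100';
  phones(2) := '555-0101';
  phones(3) := '555-0102';

  -- COUNT is the current size, LIMIT the declared maximum.
  dbms_output.put_line('count = ' || phones.COUNT ||
                       ', limit = ' || phones.LIMIT);

  BEGIN
    phones.EXTEND;  -- a fourth element exceeds the declared maximum
  EXCEPTION
    WHEN SUBSCRIPT_OUTSIDE_LIMIT THEN
      dbms_output.put_line('cannot extend past the declared maximum');
  END;
END;
/
```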

MULTI-DIMENSIONAL ARRAYS
Another new feature that has been provided in Oracle 9i Release 1 is the long awaited capability of multi-dimensional arrays, which Oracle has implemented as collections of collections. Multi-dimensional arrays have been around in most programming languages for a long time and are now available in PL/SQL. Technically, all collection types support only a single dimension; however, by now allowing a collection element to become a collection, one has effectively the same data structure. The following code shows the way to declare and reference a two-dimensional array of numbers.
    DECLARE
      TYPE element IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
      TYPE twoDimensional IS TABLE OF element INDEX BY BINARY_INTEGER;
      twoD twoDimensional;
    BEGIN
      twoD(1)(1) := 123;
      twoD(1)(2) := 456;
    END;

At first one would think that, while an interesting capability, it has no potential impact on performance, but it will be shown later in this paper how the combination of this capability along with the use of packaged variables can open up the door to dramatically speeding up PL/SQL code.

PACKAGE VARIABLES
PL/SQL packages offer a number of advantages over stand-alone code or anonymous blocks. While some of the advantages come from the familiar ability to organize code into logical collections of related procedures and functions, an often ignored aspect is the use of package-level variables and package initialization sections. A package variable is essentially a variable that is declared globally, typically at the top of the package body outside of any procedure definition. Once set, a package variable will maintain its state for the life of the session, as opposed to variables local to procedure definitions that only exist for the duration of the procedure call. Every package body implementation can optionally include a block of code at the end of the body referred to as the initialization section. This code will run only once per session, when the package is first referenced, and is normally used to initialize any package variables that are used.

The use of package variables is a powerful technique to speed up SQL statements that has been used for many years. Consider a procedure that is called repeatedly in a busy OLTP database that inserts a row into a table. One of the values passed to the procedure is a key that is used to first look up another value, which in turn is used in the insert. Most PL/SQL code will first execute a select statement for the row of interest, binding the value to a local variable that will then be used on the subsequent insert statement. A unique indexed table look-up is relatively quick, but if an application is being driven hard, the cost can be more than one would expect. Referencing a pre-initialized array value is approximately 20 times faster than an index based table look-up, making this technique a real time saver for intensive operations.

Using package variables, the look-up can be avoided entirely by first initializing an array with the desired values. For example, in a marketing application a procedure is passed a zip code as an argument and must perform an insert into a fact table that requires the city name as a denormalized value in the table. Assume also that there is a look-up table, with a numeric zip code as the primary key as well as the city and state information that the zip code maps to. Code to avoid all of the look-ups would look like this.
    CREATE OR REPLACE PACKAGE BODY direct_mkt AS

      TYPE zip_array IS TABLE OF VARCHAR2(30) INDEX BY BINARY_INTEGER;
      zip_deref zip_array;

      PROCEDURE do_insert(f_name VARCHAR2, l_name VARCHAR2, zip NUMBER) IS
      BEGIN
        INSERT INTO user_data (f_nm, l_nm, city_nm)
        VALUES (f_name, l_name, zip_deref(zip));
        COMMIT;
      END;

    -- Package initialization section.
    BEGIN
      FOR rec IN (SELECT zip_code, city FROM dma) LOOP
        zip_deref(rec.zip_code) := rec.city;
      END LOOP;
    END;

Until Oracle 9i Release 2, this technique couldn't be used if the look-up key was non-numeric or composite, but now with the combination of VARCHAR2 associative arrays and multi-dimensional arrays, it can be extended to almost any look-up table of a reasonable size. For example, consider a table of individuals mapped to email addresses that has as its primary key a composite index of a numeric user id and an email address that the user is associated with. The following code shows how to implement this.
    CREATE OR REPLACE PACKAGE BODY email_push AS

      TYPE email_array IS TABLE OF VARCHAR2(30) INDEX BY VARCHAR2(30);
      TYPE usermail_array IS TABLE OF email_array INDEX BY BINARY_INTEGER;
      user_emails usermail_array;

      FUNCTION lookup(p_userid NUMBER, p_email VARCHAR2) RETURN VARCHAR2 IS
      BEGIN
        RETURN user_emails(p_userid)(p_email);
      END;

    BEGIN
      FOR rec IN (SELECT user_id, email, l_name FROM user_emails) LOOP
        user_emails(rec.user_id)(rec.email) := rec.l_name;
      END LOOP;
    END;

NATIVE COMPILATION
Like Java, PL/SQL has always been an interpreted language, allowing code to be portable between platforms. When a PL/SQL library unit is compiled, the source code is compiled into generic byte code that closely resembles assembler code. At run time, the PL/SQL run time engine executes the byte code. The disadvantage of any interpreted language is execution speed. To address this, Oracle introduced the native compilation facility, which requires that the host on which the database resides has a native C language compiler. When a PL/SQL library unit is compiled using native compilation, the code is first compiled down to byte code as is normally the case. Then the byte code is converted to C source code which, in turn, is used as input to the native C compiler on the host to produce a shared object library in UNIX, or a DLL under Windows. Native compilation of PL/SQL does increase the speed of execution by about 300% when no interaction with the database is performed; however, when tested in the kind of environments described here, where large amounts of data are fetched or written to the database, the performance gains are fairly modest. This is because most of the time spent in data intensive code is the interaction with the database, not the execution of business logic. Note that while interpreted and compiled PL/SQL can be mixed and used to interoperate, Oracle does not recommend that practice for production environments.
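For orientation, switching a single unit to native compilation in Oracle 9i looks roughly like the sketch below. It assumes the DBA has already configured the native compilation environment (for example the plsql_native_library_dir initialization parameter and the C compiler settings), and my_proc is a hypothetical procedure name.

```sql
-- Compile subsequent units in this session to native code.
ALTER SESSION SET plsql_compiler_flags = 'NATIVE';

-- Recompile an existing unit so it picks up the new setting.
-- my_proc is a placeholder, not an object from this paper.
ALTER PROCEDURE my_proc COMPILE;

-- Return the session to the default interpreted mode.
ALTER SESSION SET plsql_compiler_flags = 'INTERPRETED';
```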

RETURNING RESULT SETS


Using PL/SQL to create an API for applications to interact with the database has the advantage of removing dependencies of client applications on the physical implementation of the schema. With this kind of approach, the DBA is free to change the physical model of the schema, perhaps for performance reasons, and not have to worry about breaking production applications. For write activity, this is the straightforward problem of implementing the necessary procedures to update and insert records in the database at some level of abstraction that the application requires. However, the problem of returning result sets from a PL/SQL procedure or function is a bit more difficult. There are three general approaches for doing this: returning cursors, returning collections, or using table functions.

CURSOR VARIABLES
One of the best ways to isolate an application from SQL dependencies is to write a package of PL/SQL functions that return the REF CURSOR type to calling programs written in other host languages such as Java or C. Cursor variables can be either weakly typed, which is more flexible, or strongly typed, which provides greater type safety. Of course, the application must know the number and data types of the returned columns as well as their semantics in order to use the data, but it can be totally isolated from the way the data is stored. The following function returns a weakly typed cursor using the new 9i type SYS_REFCURSOR.
    FUNCTION email_cur RETURN sys_refcursor IS
      rc sys_refcursor;
    BEGIN
      OPEN rc FOR SELECT * FROM email;
      RETURN rc;
    END;

An application can call the function and bind the returned open cursor to a local cursor variable. The application then iterates through the result set as if it had been defined and opened locally.
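From a PL/SQL caller, the pattern just described might look like the sketch below. It assumes that email_cur is callable by that name (for instance as a stand-alone function; a packaged version would need the package prefix) and that the email table it selects from exists. The variable names are illustrative.

```sql
DECLARE
  rc    SYS_REFCURSOR;     -- local cursor variable, weakly typed
  v_row email%ROWTYPE;     -- matches the SELECT * FROM email column list
BEGIN
  -- Bind the already-open cursor returned by the function.
  rc := email_cur;

  LOOP
    FETCH rc INTO v_row;
    EXIT WHEN rc%NOTFOUND;
    NULL;  -- process v_row here
  END LOOP;

  -- The caller owns the cursor and is responsible for closing it.
  CLOSE rc;
END;
/
```

A strongly typed alternative would declare the cursor type with a RETURN clause, for example TYPE email_rc_t IS REF CURSOR RETURN email%ROWTYPE, trading flexibility for compile-time checking.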


RETURNING COLLECTIONS
Another approach to returning result sets is to write a function that explicitly returns a PL/SQL collection type. The method has been around for years but it is really best suited for use by calling PL/SQL programs. Also, since there are no predefined collection types in PL/SQL, the returned type must either be declared in a shared package header or be a SQL type declared globally in the database. A function that returns a shared collection type is shown below. The type, email_demo_nt_t, was defined earlier in this paper.

    FUNCTION get_email_demo(p_email_id NUMBER) RETURN email_demo_nt_t IS
      eml_dmo email_demo_nt_t;
    BEGIN
      SELECT email_demo_obj_t(email_id, demo_id, value)
      BULK COLLECT INTO eml_dmo
      FROM   email_demographic
      WHERE  email_id = p_email_id;

      /* Apply some business logic and return the result set. */
      RETURN eml_dmo;
    END;

Note that when the BULK COLLECT feature is used, it is not necessary to initialize or extend a nested table because Oracle does it automatically. But it is necessary to call the constructor function for the object type in the query itself to be able to fetch scalar column values into the nested table of objects. This would not be necessary if fetching from a column of that object type.

TABLE FUNCTIONS (REVISITED)


Most client programs, however, don't really want to deal with trying to bind to a PL/SQL user defined type: instead, they want a cursor. The TABLE function provides a way to take a function like the one above and return its results to the caller directly as a cursor. Recall that the TABLE function takes a variable of a globally defined collection type as an argument; therefore a function with a return type of the same collection type, like the one above, can be used as an argument to the TABLE function as well. Without modifying the above function, a program can return its output collection as a cursor using the following syntax. Views can be wrapped around this kind of SQL statement to make life easier for a calling application.
    SELECT * FROM TABLE( CAST( get_email_demo(45) AS email_demo_nt_t ));
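The view idea can be sketched as follows. The view name is hypothetical, and the sketch assumes get_email_demo is visible to SQL as a schema-level function (a packaged version would need the package prefix) with the email_demo_nt_t type defined as above.

```sql
-- Hypothetical view hiding the TABLE/CAST syntax from the caller.
CREATE OR REPLACE VIEW email_demo_45_v AS
SELECT t.email_id, t.demo_code, t.value
FROM   TABLE(CAST(get_email_demo(45) AS email_demo_nt_t)) t;

-- The calling application then sees an ordinary relation.
SELECT * FROM email_demo_45_v;
```

Because the function argument is fixed inside the view definition, this form suits a small number of well-known look-ups rather than arbitrary parameters.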

Pipelined Table Functions


While that approach works, it is really only appropriate for smaller result sets of perhaps a few thousand rows. When the function executes to populate the result set, the data is buffered in the local variable of the function. Only after the function has finished executing will the rows be returned to the calling application. Memory to store the buffered data is dynamically allocated from the operating system by the server process executing the function. If the result set was very large, operating system memory could become depleted. Pipelined table functions are an Oracle 9i facility that addresses this issue by providing a mechanism to stream the values from the function back to the calling application while the function is executing. A small amount of data remains buffered in the function's address space so that result sets can be sent back in batches, which is faster than row-by-row processing. This is a far more scalable design for this functionality since the operating system memory footprint is independent of the size of the result set.


To utilize this feature, the function must be declared as PIPELINED and collection objects must be returned one at a time via a new statement called PIPE ROW. The function contains a RETURN statement without arguments that is used to terminate the cursor. The function can now be rewritten to take advantage of pipelining.

    FUNCTION get_email_demo RETURN email_demo_nt_t PIPELINED IS
      CURSOR email_demo_cur IS
        SELECT email_demo_obj_t(email_id, demo_id, value)
        FROM   email_demographic;

      eml_dmo_nt email_demo_nt_t;
    BEGIN
      OPEN email_demo_cur;
      LOOP
        -- Test before the fetch so the last, incomplete batch is still processed.
        EXIT WHEN email_demo_cur%NOTFOUND;
        FETCH email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT 1000;
        FOR i IN 1..eml_dmo_nt.COUNT LOOP
          /* Apply some business logic on the object here, and return a row. */
          PIPE ROW (eml_dmo_nt(i));
        END LOOP;
      END LOOP;
      CLOSE email_demo_cur;
      RETURN;
    END;

Note that while the return type of the function is still the collection type, the value handed to each PIPE ROW call is a single element of the object type. In this example, the fetch is performed using the BULK COLLECT feature. The Oracle documentation illustrates the much slower row-by-row fetch. Since the return type of the function has not been changed, only the implementation, it can be called the same way as the previous table function using the TABLE and CAST functions.
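A call to the pipelined version might look like the sketch below, again assuming get_email_demo is visible to SQL as a schema-level function. Because rows are streamed back as they are piped, a query that needs only the first few rows does not have to wait for the whole result set to be built.

```sql
-- Same TABLE/CAST calling convention as the non-pipelined version.
SELECT t.email_id, t.demo_code, t.value
FROM   TABLE(CAST(get_email_demo() AS email_demo_nt_t)) t
WHERE  ROWNUM <= 10;
```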

ORACLE 9I ETL
All of the above information has value in and of itself, but in the context of this paper it has served to provide background information for understanding the architecture of Oracle's ETL solution, now part of the Oracle 9i database. The term ETL stands for Extract, Transform, and Load. In a nutshell it's the process of reformatting and loading a data source of one format into the data model of the target database. The extract phase is just about accessing the data, normally by reading from a flat file. The transformation phase is about converting the input records into the format of the target database, perhaps by changing the record layout and encoding or transforming some column values. Loading is simply the process of inserting the transformed data into the database, either directly into the appropriate database tables or by loading into a staging table for further database processing.

Before describing the Oracle solution, there is still one more piece that is required to understand it. This can be illustrated with an example that processes a file produced by a Web marketing application. The file is a denormalized layout where all of the information produced in a user submission is written out to a single line in the file. The layout consists of user names and two demographic attributes that are being transformed and loaded into a normalized table of user ids, attribute codes and values. Each row of input must produce two rows of output. In a real-world example there would be more attributes in the file, and each additional attribute would produce an additional output record per input record. For simplicity, assume the users and attribute codes have already been defined in the database. The source and target record layouts are as follows. A comma character delimits each field.

    EMAIL                  AGE    INCOME
    John.Doe@excite.com    71     over 100k

Table 1 Input Record

    EMAIL_ID    DEMO_CODE    VALUE
    2345        3            71
    2345        7            over 100k

Table 2 Output Records

EXTERNAL TABLES
Oracle introduced the concept of external tables in Oracle 9i Release 1. An external table is simply a flat file on the local file system that Oracle can now read directly, returning rows to the application as if it was a database table. The database server reads the file directly using the information from the table definition, which closely resembles a SQL*Loader control file, including .log and .bad file definitions. Before the table can be defined, an Oracle DIRECTORY object must be created, which enables the owner to read and write files to that directory. The definition is as follows.
CREATE TABLE ext_tab (
  email   VARCHAR2(50),
  age     NUMBER,
  income  VARCHAR2(20)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    LOGFILE data_dir:'ext_tab.log'
    BADFILE data_dir:'ext_tab.bad'
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      email   CHAR(50),
      age     INTEGER EXTERNAL(2),
      income  CHAR(20)
    )
  )
  LOCATION ('ext_tab.dat')
)
REJECT LIMIT UNLIMITED;
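
The DIRECTORY object mentioned above has to exist before the table can be created. A minimal sketch follows; the file system path and the grantee etl_user are hypothetical placeholders, and creating a directory requires the CREATE ANY DIRECTORY privilege.

```sql
-- Run as a suitably privileged user. The path is a placeholder for the
-- server-side directory that holds ext_tab.dat.
CREATE OR REPLACE DIRECTORY data_dir AS '/u01/app/etl/data';

-- READ is needed for the data file, WRITE for the .log and .bad files.
-- etl_user is a hypothetical schema that will own the external table.
GRANT READ, WRITE ON DIRECTORY data_dir TO etl_user;
```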

The table creation will only create data dictionary information since, by definition, the data is external to the database. After the table definition is created, it can be queried like any other table except that, since it cannot be indexed, all access involves a full table scan. The table can also be queried using Oracle's parallel execution facility, which will cause multiple server processes to partition the file equally and read from it in parallel.
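
As a rough illustration of querying the external table, the statements below read ext_tab first serially and then with a parallel hint. The degree of parallelism (4) and the choice of aggregate are assumptions made for the example only.

```sql
-- Serial query: every access is a full scan of the flat file.
SELECT COUNT(*) FROM ext_tab;

-- Parallel query: the degree of 4 is an assumed value.
SELECT /*+ parallel(a, 4) */ a.income, COUNT(*) AS submissions
FROM ext_tab a
GROUP BY a.income;
```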

THE ORACLE 9I ETL SOLUTION


At many customer sites, ETL is accomplished by first running a third-party ETL tool or a home-grown application that reads the input file and produces the desired output file, in this case turning each input record line into two lines in the new file. Third-party tools are often smart enough to perform the necessary encoding of file
values to database codes during this process; however, many home-grown programs omit this step until later. In the latter case, the next step is to use SQL*Loader to perform a direct path load into a database staging table, where the values are encoded before being inserted into the production tables. The main point here is that, to load and transform, the data are sometimes read and written two or more times.

By combining the above capabilities, Oracle has provided the infrastructure to perform ETL in a single step. The Extract phase is now handled by the external table facility, the Transform capability is provided either by a SQL statement or, more flexibly, by pipelined table functions, and the Load phase is implemented by direct path inserts of the transformed data. With this approach the data is read and written only once. For this example, a pipelined table function will be used to transform the data. Note that if parallel processing is required, the table function must be defined with the PARALLEL_ENABLE clause or the process will serialize.

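CREATE OR REPLACE PACKAGE BODY etl IS

  -- Email-to-id lookup, loaded once per session by the initialization block below.
  -- The key is sized to match ext_tab.email.
  TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR2(50);
  email_map hash_table_t;

  FUNCTION transform (new_data SYS_REFCURSOR)
    RETURN email_demo_nt_t
    PIPELINED
    PARALLEL_ENABLE (PARTITION new_data BY ANY)
  IS
    TYPE ext_tab_array IS TABLE OF ext_tab%ROWTYPE INDEX BY BINARY_INTEGER;
    indata          ext_tab_array;
    email_demo_obj  email_demo_obj_t := email_demo_obj_t(NULL, NULL, NULL);
    demo_map        hash_table_t;   -- declared but not used in this example
  BEGIN
    LOOP
      -- %NOTFOUND is NULL before the first fetch and TRUE once a fetch returns
      -- fewer rows than the LIMIT, so the final partial batch is still processed.
      EXIT WHEN new_data%NOTFOUND;
      FETCH new_data BULK COLLECT INTO indata LIMIT 1000;
      FOR i IN 1 .. indata.COUNT LOOP
        email_demo_obj.email_id := email_map(indata(i).email);
        -- First output row: the age attribute (demo code 3).
        email_demo_obj.demo_code := 3;
        email_demo_obj.value := indata(i).age;
        PIPE ROW (email_demo_obj);
        -- Second output row: the income attribute (demo code 6).
        email_demo_obj.demo_code := 6;
        email_demo_obj.value := indata(i).income;
        PIPE ROW (email_demo_obj);
      END LOOP;
    END LOOP;
    CLOSE new_data;
    RETURN;
  END transform;

-- Package initialization: cache the email ids once per session.
BEGIN
  FOR email IN (SELECT email_id, email FROM email) LOOP
    email_map(email.email) := email.email_id;
  END LOOP;
END etl;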
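
The package body above depends on two schema-level types and a package specification that are not shown in this section. A minimal sketch of what they might look like follows; the attribute names come from the assignments in the function body, while the datatypes and the VARCHAR2(30) width are assumptions.

```sql
-- Object type for one output row (datatypes are assumed).
CREATE TYPE email_demo_obj_t AS OBJECT (
  email_id   NUMBER,
  demo_code  NUMBER,
  value      VARCHAR2(30)
);
/

-- Nested table type returned by the pipelined function.
CREATE TYPE email_demo_nt_t AS TABLE OF email_demo_obj_t;
/

-- The PIPELINED and PARALLEL_ENABLE clauses must match the package body.
CREATE OR REPLACE PACKAGE etl IS
  FUNCTION transform (new_data SYS_REFCURSOR)
    RETURN email_demo_nt_t
    PIPELINED
    PARALLEL_ENABLE (PARTITION new_data BY ANY);
END etl;
/
```

The types have to be created before the package, and the package specification before the body, since each depends on the one before it.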

This function and its call method below constitute the basic templates for using Oracle 9i ETL. There are several things to note from this example. First, the PIPE ROW statement is executed twice for each row, creating the pivot
required for the transformation. Second, package variables are used to avoid the repeated look-up of the email id. Third, an open cursor to the select statement that queries the external table is passed to the function. This CURSOR expression is necessary to read the external table in parallel. Fourth, the parameter uses the new pre-defined SYS_REFCURSOR type, a weakly defined ref cursor.

It should be noted that for those who wish to avoid hand coding PL/SQL like the above, Oracle's Warehouse Builder product can be used to generate code such as this, based upon metadata that describes the source and target data. The impressive thing about this capability is that it provides an elegant mechanism for allowing Oracle's proven parallel execution engine to read data, stream it through a PL/SQL co-process, and simultaneously write out the data. The actual transformation call is just the INSERT of the data produced by the function.
INSERT /*+ append nologging */ INTO email_demographic
(SELECT /*+ parallel( a, ) */ *
 FROM TABLE(CAST(etl.transform(CURSOR(SELECT * FROM ext_tab))
                 AS email_demo_nt_t)) a);

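A direct path insert has to be committed before the same session can query the target table again; otherwise ORA-12838 is raised. As a sketch, the load could be followed by a commit and a simple sanity check. The demo_id column name is taken from the multi-table insert shown in the Epilogue and is an assumption about the target table.

```sql
-- End the transaction that performed the direct path insert.
COMMIT;

-- Each input row produces one row per demo code, so the counts
-- for codes 3 and 6 should match.
SELECT demo_id, COUNT(*) AS row_count
FROM email_demographic
GROUP BY demo_id
ORDER BY demo_id;
```
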
EPILOGUE
Is there yet a better way? For this particular transformation there is. It's interesting to note that the above transformation could also be accomplished with pure SQL using Oracle's new extension to the INSERT statement, the multi-table INSERT statement. This new syntax allows each row returned by a subquery to be inserted into multiple tables, or into the same table multiple times. Since the statement must encode the values as well as perform the pivot, the external table must be joined to the reference tables. Remember that the external table can't be indexed, so it should be the driving table in the subquery.
INSERT /*+ append nologging */ ALL
  INTO email_demographic (email_id, demo_id, value)
  VALUES (email_id, 3, age)
  INTO email_demographic (email_id, demo_id, value)
  VALUES (email_id, 6, income)
SELECT /*+ ordered index(b) */ b.email_id, a.income, a.age
FROM ext_tab a, email b
WHERE a.email = b.email;

Besides being simpler, this query will perform about 4 times faster than the above PL/SQL implementation, since it's always faster if something can be performed in pure SQL as opposed to any host language. The speed is impressive: on a small 800 MHz Intel box running Red Hat Advanced Server 2.1 and Oracle 9i Release 2, the above transformation of 1 million input rows, which produces 2 million output rows, executed in approximately 10 seconds without the use of parallelism. The multi-table insert also has limited conditional execution capabilities. When combined with the SQL CASE expression, one can perform many simultaneous transformations using this approach. However, many transformations are too complex to be performed using this method and must be accomplished in hand-written code. In that case, the PL/SQL techniques shown here are the best approach.
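
To illustrate the conditional form combined with CASE, the sketch below skips attributes that are missing and recodes the income text into a band. The band labels 'HIGH' and 'STANDARD' and the income_band alias are hypothetical; the table and column names follow the multi-table insert above.

```sql
INSERT /*+ append */ ALL
  -- Load an age row only when the input supplied an age.
  WHEN age IS NOT NULL THEN
    INTO email_demographic (email_id, demo_id, value)
    VALUES (email_id, 3, age)
  -- Load an income row only when an income band could be derived.
  WHEN income_band IS NOT NULL THEN
    INTO email_demographic (email_id, demo_id, value)
    VALUES (email_id, 6, income_band)
SELECT /*+ ordered */
       b.email_id,
       a.age,
       -- CASE without ELSE yields NULL when income is missing.
       CASE
         WHEN a.income = 'over 100k' THEN 'HIGH'
         WHEN a.income IS NOT NULL   THEN 'STANDARD'
       END AS income_band
FROM ext_tab a, email b
WHERE a.email = b.email;
```

With INSERT ALL, every WHEN clause is evaluated for every source row, so one input row can still produce both output rows.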

PERFORMANCE ISSUES
It should also be noted that there appears to be a performance bottleneck returning data through table functions. The current performance of table functions is approximately 4 times slower than the equivalent BULK COLLECT and FORALL logic presented earlier. I confirmed this with the PL/SQL group at Oracle, and they have said that they expect to have this issue addressed in 9i, Release 3. However, it should be noted that, even with this bottleneck, performance remains very respectable. When this issue is fixed, I suspect that the Oracle 9i ETL solution will provide the fastest production ETL method available. Until the table function issue is sorted out, coding the same logic using bulk binding and perhaps the nested table insert technique described earlier is probably the best approach. However, the table function opens the door to
parallel, concurrent PL/SQL processing of parallel query output and will therefore be the optimal solution when the bottleneck is resolved.

When Java was first introduced into the database in Oracle 8i, many predicted the demise of PL/SQL. However, that has turned out to be anything but true. Since then, Oracle's PL/SQL group has done some excellent work to improve the language, and the features discussed in this document make it far more powerful than it ever was before.
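
As a sketch of the interim bulk binding approach suggested above, the anonymous block below performs the same pivot with BULK COLLECT and FORALL. It is an illustration under assumptions rather than the paper's own code: it resolves the email id with a SQL join instead of the package-level lookup table, the collection type names are hypothetical, and it uses one scalar collection per column rather than a collection of records.

```sql
DECLARE
  TYPE num_array_t   IS TABLE OF NUMBER       INDEX BY BINARY_INTEGER;
  TYPE value_array_t IS TABLE OF VARCHAR2(30) INDEX BY BINARY_INTEGER;

  -- Join the external table to the reference table to encode the email.
  CURSOR src IS
    SELECT b.email_id, a.age, a.income
    FROM ext_tab a, email b
    WHERE a.email = b.email;

  email_ids num_array_t;
  ages      num_array_t;
  incomes   value_array_t;
BEGIN
  OPEN src;
  LOOP
    -- Bind up to 1000 rows out of the SQL engine per fetch.
    FETCH src BULK COLLECT INTO email_ids, ages, incomes LIMIT 1000;
    EXIT WHEN email_ids.COUNT = 0;

    -- One bulk insert per output row of the pivot.
    FORALL i IN 1 .. email_ids.COUNT
      INSERT INTO email_demographic (email_id, demo_id, value)
      VALUES (email_ids(i), 3, ages(i));

    FORALL i IN 1 .. email_ids.COUNT
      INSERT INTO email_demographic (email_id, demo_id, value)
      VALUES (email_ids(i), 6, incomes(i));
  END LOOP;
  CLOSE src;
  COMMIT;
END;
/
```

The LIMIT of 1000 mirrors the batch size used in the pipelined function and bounds the memory held by the collections.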

CONCLUSION
Oracle's 9i ETL solution offers some great possibilities to transform and load data and will no doubt gain broad acceptance in the future. It should also be noted that, in terms of cost, it is a very attractive solution since this functionality is already included in the cost of the 9i server.

ABOUT THE AUTHOR


Doug Cosman is a senior consultant for SageLogix, Inc., based in Denver, Colorado. Doug is an Oracle DBA and developer specializing in Internet technologies, PL/SQL development, Oracle instance tuning, and developer support and mentoring. He has built databases and applications for clients like excite.com, GM, SmartMoney.com, IBM, MatchLogic, and Wyndham Hotels. Doug can be reached at: doug_cosman@yahoo.com

www.sagelogix.com

Oracle Open World 2003
