
BIG DATA VALIDATION
Written: JP Vijaykumar
Date: March 8th 2014

Our data warehouse application has 60000 table partitions.
01) The application tables are range partitioned.
02) Partitions are added once a month.

The business wants the following info, to validate the data:
01) number of partition_key values
02) number of distinct partition_key values
03) min(partition_key value)
04) max(partition_key value)
from each of the 60000 partitions.

In short, I need to execute a similar sql query on each of the 60000 partitions:

    select count(partition_key_value),count(distinct(partition_key_value)),
           min(partition_key_value),max(partition_key_value)
    from <table_name> partition(<partition_name>);

If the required data is collected manually from each partition, imagine the amount of time needed to accomplish the task.

How can this task be accomplished efficiently?
01) If a sql query is executed against each of the partitions, it needs to capture the table_name and partition_name info in the output as well. Otherwise, you do not know which table's partition the info pertains to.
02) If a pl/sql procedure is implemented, the info displayed via the dbms_output.put_line procedure should not fail with a buffer overflow error.
03) Since the amount of data to be processed is voluminous, running into terabytes, I need to fine tune the performance of the process for better response times.

Solution 01)
Dynamically generate an individual sql query for each table's partition, with the required info. Spool the sql queries and run the spool file from sqlplus.
To generate the required sql queries, I used the following statement:

    connect veeksha/....
    set head off timing off echo off feedback off pagesize 0 linesize 200 colsep ","

    select 'select /*+ parallel(a 8)*/ '||chr(39)||p.table_name||' ,'||p.partition_name||chr(39)||
           ',count(distinct('||k.column_name||')),count('||k.column_name||'), min('||k.column_name||
           '),max('||k.column_name||') from '||p.table_name||' partition('||p.partition_name||') a;'
    from user_part_key_columns k,user_tab_partitions p,user_tab_columns c
    where p.table_name in (
    .........
    .........
    .........
    )
    and p.table_name = k.name
    and p.table_name = c.table_name
    and k.column_name = c.column_name
    order by p.table_name,p.partition_name;

Solution 02)

    connect saketh/....

Dynamically generate a pl/sql procedure that loops through all the tables' partitions and generates the required info.
If the size of the messages to be displayed through dbms_output.put_line exceeds the output buffer, the display may fail with a buffer overflow error. In such a scenario, insert the messages into a log table and generate the report from the log table.

PS:- You cannot use a cursor for loop on cursor variables. To overcome this hurdle, I created a view on each of the tables' partitions and iterated on the view's column values.

    set serverout on size 1000000 timing on

    declare
      v_num number;
      v_str varchar2(30);
    begin
      -- header line for the comma separated report
      dbms_output.put_line('TABLE_NAME,PARTITION_NAME,PARTITION_KEY,PART_KEY_DATA_TYPE,DISTINCT_VALUE,MIN_VALUE,MAX_VALUE,NUM_ROWS');

      -- outer loop: one iteration per table partition and partition key column
      for c1 in (
        select p.table_name,p.partition_name,k.column_name, c.data_type
        from user_part_key_columns k,user_tab_partitions p,user_tab_columns c
        where p.table_name in (
        ............
        ............
        ............
        )
        and p.table_name = k.name
        and p.table_name = c.table_name
        and k.column_name = c.column_name
        order by 1,2) loop

        -- repoint the view at the current partition; the alias "a" is needed
        -- for the parallel(a 8) hint to refer to the table
        -- note: temp_jp_view must already exist before this block is run,
        -- because the inner loop below references it statically
        execute immediate 'create or replace view temp_jp_view as select /*+ parallel(a 8)*/ '||
          c1.column_name||' col1 from '||c1.table_name||
          ' partition ('||c1.partition_name||') a';

        --execute immediate 'select /*+ parallel(a 16)*/ count(distinct('||c1.column_name||')) from '||c1.table_name||
        --' partition ('||c1.partition_name||') a' into v_num;
        --if (v_num > 1) then
        --dbms_output.put_line(c1.table_name||' '||c1.partition_name||' '||c1.column_name||' '||v_num);
        --end if;

        -- inner loop: one output line per distinct partition key value,
        -- each carrying the partition's min, max and row count
        for c2 in (select a.distinct_val,b.min_val,b.max_val,b.num_rows
                   from (select distinct(col1) distinct_val from temp_jp_view) a,
                        (select min(col1) min_val,max(col1) max_val,count(col1) num_rows from temp_jp_view) b) loop
          dbms_output.put_line(c1.table_name||', '||c1.partition_name||' ,'||c1.column_name||', '||c1.data_type||', '||c2.distinct_val
            ||', '||c2.min_val||', '||c2.max_val||', '||c2.num_rows);
        end loop;
      end loop;

      --begin
      --execute immediate 'drop view temp_jp_view';
      --exception
      --when others then
      --dbms_output.put_line(sqlerrm);
      --end;
    end;
    /

References:
http://www.databasejournal.com/features/oracle/article.php/3744226/Multi-Table-Loop.htm
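
Solution 02 suggests inserting the messages into a log table when dbms_output.put_line may overflow its buffer, but does not show that variant. The following is only a minimal sketch of how it might look, not the author's implementation. The table part_validation_log and its columns are hypothetical names chosen for illustration. The sketch also assumes that one summary row per partition is enough, so it fetches the four values with execute immediate ... into instead of going through a view, and it assumes ordinary (unquoted, upper case) table and partition names.

```sql
-- Hypothetical log table; name, columns and sizes are assumptions.
create table part_validation_log (
  table_name      varchar2(128),
  partition_name  varchar2(128),
  column_name     varchar2(128),
  num_rows        number,
  distinct_values number,
  min_value       varchar2(4000),
  max_value       varchar2(4000),
  logged_at       date default sysdate
);

declare
  v_cnt  number;
  v_dist number;
  v_min  varchar2(4000);
  v_max  varchar2(4000);
begin
  -- one iteration per table partition and partition key column;
  -- add a filter on p.table_name here to restrict the tables of interest
  for c1 in (select p.table_name, p.partition_name, k.column_name
               from user_part_key_columns k, user_tab_partitions p
              where p.table_name = k.name
                and k.object_type = 'TABLE'   -- skip partitioned index keys
              order by p.table_name, p.partition_name)
  loop
    -- the single-row aggregate query is built dynamically for this partition;
    -- to_char makes min/max storable whatever the key's data type
    execute immediate
         'select /*+ parallel(a 8) */ count('||c1.column_name||'),'
      || ' count(distinct '||c1.column_name||'),'
      || ' to_char(min('||c1.column_name||')),'
      || ' to_char(max('||c1.column_name||'))'
      || ' from '||c1.table_name||' partition ('||c1.partition_name||') a'
      into v_cnt, v_dist, v_min, v_max;

    insert into part_validation_log
      (table_name, partition_name, column_name,
       num_rows, distinct_values, min_value, max_value)
    values
      (c1.table_name, c1.partition_name, c1.column_name,
       v_cnt, v_dist, v_min, v_max);

    commit;  -- keep finished partitions logged even if a later one fails
  end loop;
end;
/

-- generate the report from the log table
select table_name||','||partition_name||','||column_name||','||
       num_rows||','||distinct_values||','||min_value||','||max_value
  from part_validation_log
 order by table_name, partition_name;
```

Because the results are stored in a table, the report can be spooled from sqlplus at any time, and a run that stops midway still leaves the already processed partitions available for review. For date keys, to_char uses the session's NLS_DATE_FORMAT, so set it explicitly if a specific format is required.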