Teradata SQL Performance Tuning Case Study Part II

Eddy Cai 2007/03
Overview
Case 9: Derived Table versus Volatile Table
Case 10: Pre-aggregation First
Case 11: Cross left outer join skew on null
Case 12: Join column skew on default value
Case 13: QUALIFY & ROW_NUMBER() Function
Case 14: Avoid spool PI skew
Case 15: Pre-aggregate then join back with duplicated
APPENDIX 1: IMD Teradata Performance wiki
APPENDIX 2: IMD Support Team Page
APPENDIX 3: Key Performance Metrics1 - SKEW
APPENDIX 4: Key Performance Metrics2 - CPU/IO Efficiency Ratio
APPENDIX 5: Key Performance Metrics3 - Parallel Efficiency
APPENDIX 6: Checklist for Performance Tuning
APPENDIX 7: Possibility-Satisfaction Measure
eBay Inc. confidential
Case 9: Derived Table versus Volatile Table
The Teradata Optimizer cannot look inside a derived table, so it has no confidence in how many rows the derived table will return. For a volatile table, by contrast, Teradata automatically collects sample statistics. If the result set is small, the Optimizer can therefore choose a better plan for the volatile table (duplicating the data to all AMPs) than for the derived table (a merge join).
[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 2006-12-4 to 2006-12-20]
Example: o_srch.ods_item_aisle_clssfctn_w.del_ins.sql
select
    a.item_id
  , syslib.udf_utf8to16( prdct_aspct_nm )
  , syslib.udf_utf8to16( aspct_vlu_nm )
  , a.last_clsfn_date
from ods_batch_views.stg_item_aspct_clssfctn_w a,
     ( select item_id, max(last_clsfn_date) last_clsfn_date
       from ods_batch_views.stg_item_aspct_clssfctn_w
       group by item_id ) b
where a.item_id = b.item_id
  and a.last_clsfn_date = b.last_clsfn_date
;

Rewrite SQL
CREATE VOLATILE TABLE LATEST_CLSFN_V AS (
    select item_id, max(last_clsfn_date) last_clsfn_date
    from ods_batch_views.stg_item_dprtmnt_clssfctn_w
    group by item_id
) WITH DATA
PRIMARY INDEX (ITEM_ID, LAST_CLSFN_DATE)
ON COMMIT PRESERVE ROWS
;
select
    a.item_id
  , syslib.udf_utf8to16( dprtmnt_dmn_nm )
  , a.last_clsfn_date
from ods_batch_views.stg_item_dprtmnt_clssfctn_w a,
     latest_clsfn_v b
where a.item_id = b.item_id
  and a.last_clsfn_date = b.last_clsfn_date
;
[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, mid-March 2007]
Case 10: Pre-aggregation First
If a spool contains duplicate records that must be joined with a lookup table, pre-aggregating first to compress the data set is more effective. Joining the aggregate rather than the detail should also result in less skew. In the example below, pre-aggregation resolves the skew issue and downsizes the spool when joining with DW_EXCHANGE_RATE.
Before change: dw_dp_ebay_fee_w_ins.ksh

from batch_views.dw_accounts a,
     batch_views.dw_dp_actn_code b,
     batch_views.DW_DAILY_EXCHANGE_RATES c,
     batch_views.DW_DP_WKLY_RPT_SLR_W d,
     batch_views.dw_calendar cal
where a.user_id = d.user_id
  and c.curncy_id = d.user_site_curncy_id
  and a.acct_trans_date between d.beg_prd_adjd_by_tz and d.end_prd_adjd_by_tz
  and a.acct_trans_dt = c.day_of_rate_dt
  and d.user_site_curncy_id = c.curncy_id
  and a.actn_code = b.dp_actn_code
  and a.acct_trans_dt = cal.cal_date
  and a.acct_trans_dt between (DATE '2006-11-04') and (DATE '2006-11-12')
  and CASE WHEN kenan_source = 2 then tracking_id_3 else -999 end NOT IN (7, 8)

Rewrite SQL
From (
    from $readDB.dw_accounts a,
         $readDB.dw_dp_actn_code b,
         $readDB.$DrivingTable d
    where a.user_id = d.user_id
      and a.acct_trans_date between d.beg_prd_adjd_by_tz and d.end_prd_adjd_by_tz
      and a.acct_trans_dt between (DATE '${CREATED_MIN_DATE}') and (DATE '${CREATED_MAX_DATE}')
      and CASE WHEN a.kenan_source = 2 then a.tracking_id_3 else -999 end NOT IN (7, 8) -- NON_WACKO
      and a.actn_code = b.dp_actn_code
    group by 1,2,3,4,5,6,7
  ) ACCT,
  $readDB.DW_DAILY_EXCHANGE_RATES c
where c.curncy_id = acct.user_site_curncy_id
  and acct.acct_trans_dt = c.day_of_rate_dt
  and c.day_of_rate_dt between (DATE '${CREATED_MIN_DATE}') and (DATE '${CREATED_MAX_DATE}')
Case 11: Cross left outer join skew on null
When one query contains several left outer joins, the joins run as several steps. Depending on the previous result set, the join column of a later step can be skewed on NULL or on some other particular value. Replace that value with one that redistributes more evenly, such as one derived from ITEM_ID. This balances the distribution, and the substituted value cannot match in the join condition.
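The effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: MD5 stands in for Teradata's row hash, the 16 AMPs and the 10,000 rows (90% of them with a NULL ORDER_ID) are hypothetical, and real ORDER_ID values are assumed to be positive so that a negated ITEM_ID can never match one.

```python
import hashlib
from collections import Counter


def amp_for(key, n_amps):
    """Map a join key to an AMP. MD5 is a stand-in for the real row hash."""
    digest = hashlib.md5(repr(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_amps


def skew_pct(keys, n_amps):
    """Skew of rows per AMP, computed as (max - avg) / max * 100."""
    load = Counter(amp_for(key, n_amps) for key in keys)
    per_amp = [load.get(amp, 0) for amp in range(n_amps)]
    avg = sum(per_amp) / n_amps
    return (max(per_amp) - avg) / max(per_amp) * 100


N_AMPS = 16  # hypothetical system size

# Hypothetical result of the first left outer join: (item_id, order_id).
# Only every tenth item has an order; the rest carry NULL (None).
rows = [(item_id, item_id if item_id % 10 == 0 else None)
        for item_id in range(1, 10001)]

# Join key as written in the original query: all NULLs hash to one AMP.
before = [order_id for _, order_id in rows]

# Join key after COALESCE(order_id, -item_id): NULLs become distinct values.
after = [order_id if order_id is not None else -item_id
         for item_id, order_id in rows]

skew_before = skew_pct(before, N_AMPS)
skew_after = skew_pct(after, N_AMPS)
print(f"skew on raw ORDER_ID:          {skew_before:.0f}%")
print(f"skew after COALESCE rewrite:   {skew_after:.0f}%")
```

In this toy setup the raw key piles about nine tenths of the rows onto a single AMP, while the substituted key spreads them almost evenly.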
Example: dw_api_fetr_sd_w_ins.sql.
() ITEM
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER_TRANS_MAP MIP_TRANS
  ON ITEM.ITEM_ID=MIP_TRANS.ITEM_ID
 AND MIP_TRANS.ORDER_ID IS NOT NULL
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER MIP_ORDER
  ON MIP_TRANS.ORDER_ID=MIP_ORDER.ORDER_ID
 AND MIP_ORDER.ORDER_STATUS=3
GROUP BY 1,2,3,5,6,7,8,9,10,11,12,13;
Rewrite SQL
() ITEM
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER_TRANS_MAP MIP_TRANS
  ON ITEM.ITEM_ID=MIP_TRANS.ITEM_ID
 AND MIP_TRANS.ORDER_ID IS NOT NULL
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER MIP_ORDER
  ON coalesce (MIP_TRANS.ORDER_ID, -ITEM.ITEM_ID)=MIP_ORDER.ORDER_ID
 AND MIP_ORDER.ORDER_STATUS=3
[Chart: myEffectiveCPU, myTotalCPUTime and myParallelEfficiency by run date, 2006-12-26 to 2007-1-14]
Case 12: Join column skew on default value
When joining on a nullable column, the default value (-999, NULL, etc.) may be the root cause of poor parallel efficiency. There are two ways to avoid joining on the default value:
- Split the query into two queries, one with and one without the skewed data
- Filter the skewed data out in the join condition (needs confirmation with the SAE)
Example: dw_myebay.dw_myebay_sav_search.ups.sql
Before change:
UPDATE DW_SAV_SEARCH
FROM ${gdwDB}.DW_MYEBAY_SAV_SEARCH DW_SAV_SEARCH,
     DW_MYEBAY_SAV_SEARCH_V SAV_SEARCH_V
SET DELTD_YN_ID = 1
   ,DELTD_DT = DATE
   ,UPD_DATE = CURRENT_TIMESTAMP(0)
   ,UPD_USER = 'DW_BATCH'
WHERE DW_SAV_SEARCH.SAV_SRCH_ID=SAV_SEARCH_V.ID;
Rewrite SQL
UPDATE DW_SAV_SEARCH
FROM ${gdwDB}.DW_MYEBAY_SAV_SEARCH DW_SAV_SEARCH,
     DW_MYEBAY_SAV_SEARCH_V SAV_SEARCH_V
SET DELTD_YN_ID = 1
   ,DELTD_DT = DATE
   ,UPD_DATE = CURRENT_TIMESTAMP(0)
   ,UPD_USER = 'DW_BATCH'
WHERE DW_SAV_SEARCH.SAV_SRCH_ID=SAV_SEARCH_V.ID
  And DW_SAV_SEARCH.SAV_SRCH_ID <> -999;
Case 13: QUALIFY & ROW_NUMBER() Function
When a query joins with a derived table whose aggregation only serves to filter out unwanted data, QUALIFY with ROW_NUMBER() can replace the derived table.
QUALIFY keeps only the needed rows. It is often used with the functions ROW_NUMBER(), RANK() and SUM(1). These functions perform better than aggregate functions in GROUP BY statements.
Example: dw_se.stg_se_emls_bncd_cumm_w.ins.sql
Before change:
select from (select * LEAF_CATEG_ID
    ,a.FROM_EMAIL_ADDRESS
    ,a.HARD_BOUNCE_TYPE
    ,a.TO_EMAIL_ADDRESS
    ,a.ISP
    ,case when (b.user_id is null or b.user_id = -999)
          then -1 * (c.to_email_addr_id)
          else b.user_id end user_id
    ,count(*) USER_EMAIL_COUNT
  from batch_views.STG_EMLS_BNCD_W a
  left join (select email, max(user_id)
             from batch_views.dw_users_info
             group by 1 ) b (email, user_id)
    ON a.to_email_address = b.EMAIL
  left join (select to_email_address, max(to_email_addr_id)
             from batch_views.dw_se_to_email_addr
             group by 1) c (to_email_address, to_email_addr_id)
    ON a.to_email_address = c.to_email_address
  group by 1,2,3,4,5,6,7,8,9
) sebw;

Rewrite SQL:
select (select , count(*) USER_EMAIL_COUNT from from
    SUPPORT_SCRATCH.STG_EMLS_BNCD_W
    GROUP BY 1,2,3,4,5,6,7,8
  ) a
  left join batch_views.dw_users_info b
    ON a.to_email_address = b.EMAIL
  QUALIFY ROW_NUMBER() OVER(
      Partition by a.EMAIL_BOUNCED_DATE, a.TEMPLATE_ID, a.ITEM_SITE_ID, LEAF_CATEG_ID,
                   a.FROM_EMAIL_ADDRESS, a.HARD_BOUNCE_TYPE, a.TO_EMAIL_ADDRESS, a.ISP
      order by b.user_id desc )=1
)t
left join batch_views.dw_se_to_email_addr c
  ON t.to_email_address = c.to_email_address
QUALIFY ROW_NUMBER() OVER(
    Partition by t.EMAIL_BOUNCED_DATE, t.TEMPLATE_ID, t.ITEM_SITE_ID, LEAF_CATEG_ID,
                 t.FROM_EMAIL_ADDRESS, t.HARD_BOUNCE_TYPE, t.TO_EMAIL_ADDRESS, t.ISP
    order by c.to_email_addr_id desc )=1;
[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 4/13/2007 to 4/28/2007]
Case 14: Avoid spool PI skew
A spool is an intermediate table and, like a normal table, it has a PI (primary index). Skew on the spool PI can sometimes cause performance issues.
To avoid spool PI skew, rewrite the query to cut out the join that produces the skewed spool.
FROM fndng_working.STG_FNDNG_TD_RESULT_SET_W STG
JOIN fndng_tables.DW_FNDNG_RULE_SET DW
  ON DW.RULE_ID_STRING_TXT=STG.RULE_ID_STRING_TXT
 AND DW.LABEL_ID=STG.LABEL_ID
JOIN fndng_tables.DW_FNDNG_RULE_MAP DW_MAP
  ON DW.RULE_SET_KEY=DW_MAP.RULE_SET_KEY
JOIN fndng_tables.DW_FNDNG_TD_RULE DW_RULE
  ON DW_RULE.RULE_ID=DW_MAP.RULE_ID
 AND DW_RULE.LABEL_ID=DW_MAP.LABEL_ID
JOIN batch_views.DW_KWDM_CNSTRNT_VAL_CFG KW
  ON KW.ITEM_SITE_ID=DW_RULE.SITE_ID
GROUP BY 1,2,3;
[Diagram: join plan for the query above, covering STG, DW, DW_MAP, DW_RULE and CFG through spools S1 to S11, with join columns RULE_ID, LABEL_ID, RULE_ID_STRING_TXT and RULE_SET_KEY. One spool is marked SKEW; the labels include the figure 1,418,765 next to STG.]
FROM fndng_working.STG_FNDNG_TD_RESULT_SET_W STG
JOIN (select DW.RULE_SET_KEY
            ,DW.RULE_ID_STRING_TXT
            ,DW.LABEL_ID
            ,MAX(DW_RULE.SITE_ID) SITE_ID
            ,MAX(KW.BID_VAL) BID_VAL
            ,MAX(KW.BIN_VAL) BIN_VAL
      from fndng_tables.DW_FNDNG_RULE_SET DW
      JOIN fndng_tables.DW_FNDNG_RULE_MAP DW_MAP
      group by 1,2,3
     )tmp
  ON tmp.RULE_ID_STRING_TXT=STG.RULE_ID_STRING_TXT
 AND tmp.LABEL_ID=STG.LABEL_ID
GROUP BY 1,2,3;
[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 2007-7-24 to 2007-7-27]
Case 15: Pre-aggregate then join back with duplicated
When a large dataset in spool must be joined with a large lookup table, and the dataset has only a few distinct values in the join column, it is more effective to pre-aggregate first to compress the dataset, join the compressed set with the lookup table, and then join the result back to the duplicated dataset.
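The three steps of this case can be sketched outside the database to show why they pay off. This is a toy model under stated assumptions: the data, the key names and the helper `join_with_lookup` are hypothetical, and a dictionary access stands in for one probe of the large lookup table.

```python
def join_with_lookup(keys, lookup):
    """Join each key to the lookup table; return the rows and the probe count."""
    rows = [(key, lookup[key]) for key in keys]
    return rows, len(rows)


# Hypothetical detail set: 3,000 rows but only three distinct join keys.
detail = [("grpA", "v1"), ("grpA", "v2"), ("grpB", "v1")] * 1000
# Hypothetical stand-in for the large lookup table.
lookup = {("grpA", "v1"): 101, ("grpA", "v2"): 102, ("grpB", "v1"): 201}

# Direct join: every detail row probes the large lookup table.
direct_rows, direct_probes = join_with_lookup(detail, lookup)

# Step 1: pre-aggregate to the distinct join keys (the GROUP BY step).
distinct_keys = sorted(set(detail))
# Step 2: join only the compressed set with the large lookup table.
resolved_rows, preagg_probes = join_with_lookup(distinct_keys, lookup)
resolved = dict(resolved_rows)
# Step 3: join back to the duplicated detail set via the small result.
joined_back = [(key, resolved[key]) for key in detail]

print(f"probes of the large table, direct join:   {direct_probes}")
print(f"probes of the large table, pre-aggregate: {preagg_probes}")
```

Both paths produce the same joined rows; only the number of rows that touch the large lookup table differs.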
Before change: dw_um.dw_um_user_smpl_map_pr_prep2_w.ins.sql

FROM DDM_UM_W.STG_UM_USER_SMPL_PR_MTRC_P1_W PR_MTRC
JOIN DDM_UM_T.DW_UM_USER_SMPL_LKP SMPL_LKP
  ON PR_MTRC.TEST_GRP = SMPL_LKP.USER_SMPL_CMPGN_ID
 AND PR_MTRC.SOJ_SITE_ID = SMPL_LKP.SITE_ID
JOIN DDM_UM_T.DW_UM_USER_SMPL_VRNT SMPL_VRNT
  ON SMPL_LKP.USER_SMPL_ID = SMPL_VRNT.USER_SMPL_PRNT_ID
 AND SMPL_VRNT.USER_SMPL_VRNT=PR_MTRC.TEST_VARIANT
Rewrite SQL
First, pre-aggregate the dataset into a volatile table
CREATE volatile TABLE pre_distinct_v AS (
    SELECT SMPL_LKP.USER_SMPL_ID
          ,PR_MTRC.TEST_VARIANT
    FROM support_scratch.STG_UM_USER_SMPL_PR_MTRC_P1_W PR_MTRC
    JOIN DDM_UM_T.DW_UM_USER_SMPL_LKP SMPL_LKP
      ON PR_MTRC.TEST_GRP = SMPL_LKP.USER_SMPL_CMPGN_ID
     AND PR_MTRC.SOJ_SITE_ID = SMPL_LKP.SITE_ID
    Group By 1,2
) WITH DATA
Unique Primary Index( USER_SMPL_ID,TEST_VARIANT )
ON COMMIT PRESERVE ROWS;
[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 9/13/2007 to 9/24/2007]
Then, join back with the highly duplicated dataset
...
FROM DDM_UM_W.STG_UM_USER_SMPL_PR_MTRC_P1_W PR_MTRC
JOIN DDM_UM_T.DW_UM_USER_SMPL_LKP SMPL_LKP
  ON PR_MTRC.TEST_GRP = SMPL_LKP.USER_SMPL_CMPGN_ID
 AND PR_MTRC.SOJ_SITE_ID = SMPL_LKP.SITE_ID
JOIN (
    -- join the compressed volatile table with the large lookup table
    select SMPL_VRNT_PRE.USER_SMPL_PRNT_ID
          ,SMPL_VRNT_PRE.USER_SMPL_VRNT
          ,USER_SMPL_VRNT_ID
    From pre_distinct_v
    JOIN DDM_UM_T.DW_UM_USER_SMPL_VRNT SMPL_VRNT_PRE
      ON SMPL_VRNT_PRE.USER_SMPL_PRNT_ID = pre_distinct_v.USER_SMPL_ID
     AND SMPL_VRNT_PRE.USER_SMPL_VRNT = pre_distinct_v.TEST_VARIANT
  ) SMPL_VRNT
  ON SMPL_VRNT.USER_SMPL_PRNT_ID = SMPL_LKP.USER_SMPL_ID
 AND SMPL_VRNT.USER_SMPL_VRNT = PR_MTRC.TEST_VARIANT
Case 16: Using max() instead of OLAP function for De-duplication
OLAP functions in the current version (V2R6) have performance issues because they use a large spool. In some cases (not all), we can avoid OLAP functions and choose an alternative method. For de-duplication, try the max() function instead of an OLAP function (row_number() or sum() with a QUALIFY clause).
Before change: o_odw_itm.itm_dmx_cpu_w.ins.sql
Insert Into odw_itm_w.ITM_DMX_CPU_W ( )
Select
From odw_itm_w.ITM_DENORM_W w
Where w.RESOURCE_MODEL = 'DMXCpu'
qualify sum(1) over (
    partition by W.SERVER_NAME, W.PROFILE_NAME, W.RESOURCE_MODEL, W.CONTEXT_TYPE,
                 W.INSTANCE_NAME, W.ITM_TYPE, W.TRANS_TS
    order by W.TRANS_TS desc
    ROWS UNBOUNDED PRECEDING) = 1
;
Rewrite SQL
Insert Into odw_itm_w.ITM_DMX_CPU_W ( ......)
Select ......
From odw_itm_w.ITM_DENORM_W w
Join (
    Select SERVER_NAME
         , PROFILE_NAME
         , RESOURCE_MODEL
         , CONTEXT_TYPE
         , INSTANCE_NAME
         , ITM_TYPE
         , TRANS_TS
         , MAX(COALESCE(FILE_TIME,0)||COALESCE(DATA_VALUE1,0)||COALESCE(DATA_VALUE2,0)
               ||COALESCE(DATA_VALUE3,0)||COALESCE(DATA_VALUE4,0)||COALESCE(DATA_VALUE5,0)
               ||COALESCE(DATA_VALUE6,0)||COALESCE(DATA_VALUE7,0)||COALESCE(DATA_VALUE8,0)
               ||COALESCE(INSTANCE1,0)) AS MAX_VALUE
    From odw_itm_w.ITM_DENORM_W
    Where RESOURCE_MODEL = 'DMXCpu'
    Group by 1,2,3,4,5,6,7
  ) w1
  on w.SERVER_NAME = w1.SERVER_NAME
 and w.PROFILE_NAME = w1.PROFILE_NAME
 and w.RESOURCE_MODEL = w1.RESOURCE_MODEL
 and w.CONTEXT_TYPE = w1.CONTEXT_TYPE
 and w.INSTANCE_NAME = w1.INSTANCE_NAME
 and w.ITM_TYPE = w1.ITM_TYPE
 and w.TRANS_TS = w1.TRANS_TS
 and (COALESCE(FILE_TIME,0)||COALESCE(DATA_VALUE1,0)||COALESCE(DATA_VALUE2,0)
      ||COALESCE(DATA_VALUE3,0)||COALESCE(DATA_VALUE4,0)||COALESCE(DATA_VALUE5,0)
      ||COALESCE(DATA_VALUE6,0)||COALESCE(DATA_VALUE7,0)||COALESCE(DATA_VALUE8,0)
      ||COALESCE(INSTANCE1,0)) = w1.MAX_VALUE
Where w.RESOURCE_MODEL = 'DMXCpu'
;
[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run, 9/23/2007 to 9/24/2007]
APPENDIX 1: IMD Teradata Performance wiki
This wiki is maintained by the IMD performance analyst. It contains tips and techniques to improve your query performance and to learn about Teradata and the MPP (Massively Parallel Processing) capabilities of the system. It covers CSUM, skew facts, TD architecture and developer guidelines, and it also includes the TD Performance Quick Tips.
http://portal.corp.ebay.com/wiki/tikiindex.php?page=IMD+Teradata+Performance
http://portal.corp.ebay.com/wiki/tiki-index.php?page=TD Performance Quick Tips
APPENDIX 2: IMD Support Team Page
This teamwork page is community-owned by the IMD support team. It covers most of the hot support issues and explains various aspects of them, such as performance tuning. It also hosts a BBS where you can leave messages and responses.
http://teamworks/sites/10320/default.aspx
APPENDIX 3: Key Performance Metrics1 - SKEW
Skew: uneven resource consumption across units of parallelism.
%Max/Avg Skew = (max - avg) / max * 100
Example: Avg = 1.8, Max = 5, Skew = 64%
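The skew formula above can be checked with a few lines of Python. This is a minimal sketch: the function name `skew_pct` and the five per-AMP values are hypothetical, chosen only so that the average is 1.8 and the maximum is 5, as in the example.

```python
def skew_pct(per_unit_usage):
    """Return %Max/Avg skew: (max - avg) / max * 100."""
    peak = max(per_unit_usage)
    avg = sum(per_unit_usage) / len(per_unit_usage)
    return (peak - avg) / peak * 100


# Hypothetical CPU seconds on five AMPs: avg = 9 / 5 = 1.8, max = 5.
cpu_by_amp = [5, 1, 1, 1, 1]
skew = skew_pct(cpu_by_amp)
print(f"skew = {skew:.0f}%")

# A perfectly balanced workload has no skew.
balanced = skew_pct([2, 2, 2, 2])
print(f"balanced skew = {balanced:.0f}%")
```

With these inputs the result agrees with the 64% quoted in the example, and an even distribution yields 0%.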
APPENDIX 4: Key Performance Metrics2 - CPU/IO Efficiency Ratio
Teradata has high IO capacity, so TD performance monitoring focuses on CPU more than on IO.
Eff = Sum(CPU) / (Sum(IO) / 1000)
This formula gives the CPU consumed per 1000 IOs. A lower value indicates more efficient performance.
Example:
Query 1: 1283 CPU seconds, 1.09m IOs, ratio of 6
Query 2: 2568 CPU seconds, 2.09m IOs, ratio of 12
APPENDIX 5: Key Performance Metrics3 - Parallel Efficiency
Parallel efficiency is a calculation that determines how much a given query impacts the system overall.
Parallel Efficiency = (Total CPU Time / Effective CPU Time) * 100
Total CPU Time is the total CPU seconds spent executing the query.
Effective CPU Time is the CPU seconds spent by the hottest AMP during the execution of the query, multiplied by the number of AMPs on the system.
This information comes from the Database Query Log view QRYLOG_DBA_ALL.
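The definitions above translate directly into arithmetic. The sketch below uses hypothetical numbers (a hottest AMP at 50 CPU seconds, 100 AMPs, 2,000 total CPU seconds); the skew overhead is taken as effective CPU minus total CPU, which is how the mySkewOverhead column is derived in the monitoring query of this appendix.

```python
def effective_cpu(hot_amp_cpu, amp_count):
    """CPU seconds of the hottest AMP multiplied by the number of AMPs."""
    return hot_amp_cpu * amp_count


def parallel_efficiency(total_cpu, eff_cpu):
    """(Total CPU Time / Effective CPU Time) * 100."""
    return total_cpu / eff_cpu * 100


def skew_overhead(total_cpu, eff_cpu):
    """CPU capacity tied up by skew: effective CPU minus total CPU."""
    return eff_cpu - total_cpu


# Hypothetical query: all values below are illustrative only.
hot_amp_cpu = 50.0   # CPU seconds on the hottest AMP
amp_count = 100      # AMPs on the system
total_cpu = 2000.0   # CPU seconds summed over all AMPs

eff = effective_cpu(hot_amp_cpu, amp_count)
pe = parallel_efficiency(total_cpu, eff)
overhead = skew_overhead(total_cpu, eff)
print(f"effective CPU = {eff:.0f}s, parallel efficiency = {pe:.0f}%, "
      f"skew overhead = {overhead:.0f}s")
```

A perfectly parallel query, where every AMP does the same work as the hottest one, would score 100%.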
SELECT username, acctstringdate, starttime
     , SUM(HotAMp1CPU * v.VprocCnt) AS myEffectiveCPU
     , SUM(TotalCPUTime) AS myTotalCPUTime
     , myEffectiveCPU - myTotalCPUTime AS mySkewOverhead
     , myTotalCPUTime / (myEffectiveCPU+1) * 100 AS myParallelEfficiency
     , COUNT(*) AS myNumberOfExecutions
     , AVG(actual_mins) AS AvgActualRunTimeMins
     , SUM(TotalIOCount) AS myTotalIOCount
     , (myTotalCPUTime / myTotalIOCount * 1000) AS myCPUIORate
     , querytext
FROM dw_monitor_views.QryLog_dba q
inner join dw_monitor_views.dw_vproc_hist v
   on v.thedate = q.acctstringdate
WHERE AcctStringDate >= date - 20
GROUP BY 1,2,3,12
APPENDIX 6: Checklist for Performance Tuning

Checklist for Performance Tuning (# / Action Item)
Data Collection
  1. DBQL provides the primary tool for isolating poorly performing queries
  2. Excel based Graphical Reporting
Identify Poorly Performing Queries
  1. Parallel Efficiency
  2. Efficiency CPU
  3. IO Efficiency
  4. CPU/IO Efficiency Ratio
Collect statistics
  1. Diagnostic helpstats on for session; (V2R5.1+)
Explain Plan
  Diagnostic verboseexplain on for session; (V2R5.1+)
  1. Redistributions of large tables and spools (on fields)
  2. Product Joins on large numbers of rows
  3. Aggregates early in a plan on large numbers of rows
  4. Poor confidence
  5. Large estimates (hours or days)
  6. Large numbers of rows in a step (billions)
  7. Index check: see whether proper Primary Indexes (PI) and Secondary Indexes (SI) are being used
  8. Check for column data type mismatch (Translate)
  9. PPI filter enable
Indices (Primary, Secondary, Join, etc)
  1. Identify physical model changes (indexes, Primary indexes)
Rewrites
  1. Quantify Impact at each step
  2. Generate the Explain plan and look for improvement
  3. Execute the modified query and monitor Spool and CPU usage
  4. Document the improvement approaches and help SAEs to understand the proposal
APPENDIX 7: Possibility-Satisfaction Measure
Possibility: how possible it is that something can be done, denoted P, with P in [0..1]. P=1 means possible, while P=0 means impossible.
Satisfaction: how satisfied people are with something, denoted Q, with Q in [0..1]. Q=1 means satisfied, while Q=0 means unsatisfied.
Possibility-Satisfaction Score: how far an approach can be carried out with satisfaction, obtained by combining P and Q.
Performance Tuning metrics weight
Importance scale: Top Important, Very Important, Important, Consider, Can ignore

Metric                       Score   Weight
Effective CPU -%               104      39%
Parallel Efficiency +           96      36%
CPU/IO Effective Ratio -%       48      18%
Code change +                   19       7%
Total                          267     100%
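The Weight column is consistent with each metric's score divided by the total score. A short check, assuming that this is how the weights were derived (the variable names are illustrative):

```python
# Scores as listed in the weight table.
scores = {
    "Effective CPU": 104,
    "Parallel Efficiency": 96,
    "CPU/IO Effective Ratio": 48,
    "Code change": 19,
}

total_score = sum(scores.values())

# Assumed derivation: weight = score / total, shown as a whole percentage.
weights_pct = {metric: round(score / total_score * 100)
               for metric, score in scores.items()}

for metric, pct in weights_pct.items():
    print(f"{metric:<24} {scores[metric]:>4} {pct:>4}%")
print(f"{'Total':<24} {total_score:>4} {sum(weights_pct.values()):>4}%")
```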
Setting and approach value
A Value
B Value
Effective CPU -%
5%
50%
10%
5%
50%
50%
50
11.72
50
5%
50%
43%
40%
5%
45%
Code change +
10
Approaches possible satisfaction

Metric                      Approach 1  Approach 2  Approach 3  Approach 4
Effective CPU -%                  0.11        0.00        1.00        1.00
Parallel Efficiency +             0.23        0.10        0.00        1.00
CPU/IO Effective Ratio -%         0.84        0.78        0.00        0.89
Code change +                     0.67        0.89        0.56        1.00
Total                             0.33        0.24        0.43        0.98
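The slides do not spell out how the satisfaction values and the Total row are computed. The sketch below rests on two assumptions inferred from the tabulated numbers, not on anything stated in the source: satisfaction rises linearly from 0 at the A value to 1 at the B value, and the total is the sum of each metric's satisfaction weighted by score / 267. The row order (Effective CPU, Parallel Efficiency, CPU/IO Effective Ratio, Code change) and the reading A = 5%, B = 50% are likewise assumptions.

```python
def satisfaction(value, a, b):
    """Assumed linear satisfaction: 0 at A, 1 at B, clamped to [0, 1]."""
    q = (value - a) / (b - a)
    return max(0.0, min(1.0, q))


# Weights assumed to be score / total score from the metrics weight table.
SCORES = {"Effective CPU": 104, "Parallel Efficiency": 96,
          "CPU/IO Effective Ratio": 48, "Code change": 19}
WEIGHTS = {m: s / sum(SCORES.values()) for m, s in SCORES.items()}

# Two rows as read from the settings table (A = 5, B = 50, then four approaches).
effective_cpu_q = [round(satisfaction(v, 5, 50), 2) for v in (10, 5, 50, 50)]
cpu_io_q = [round(satisfaction(v, 5, 50), 2) for v in (43, 40, 5, 45)]

# Satisfaction per metric for the four approaches, as tabulated.
Q = {
    "Effective CPU": [0.11, 0.00, 1.00, 1.00],
    "Parallel Efficiency": [0.23, 0.10, 0.00, 1.00],
    "CPU/IO Effective Ratio": [0.84, 0.78, 0.00, 0.89],
    "Code change": [0.67, 0.89, 0.56, 1.00],
}

# Assumed combination: weighted sum of satisfaction values per approach.
totals = [sum(WEIGHTS[m] * Q[m][i] for m in Q) for i in range(4)]
print("Effective CPU satisfaction:", effective_cpu_q)
print("CPU/IO satisfaction:       ", cpu_io_q)
print("Weighted totals:           ", [f"{t:.2f}" for t in totals])
```

Under these assumptions the two recomputed rows match the tabulated satisfaction values, and the weighted totals agree with the Total row to within rounding of the two-decimal inputs.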
