
Teradata SQL Performance Tuning Case Study

Part II

Eddy Cai 2007/03

Overview

Case 9: Derived Table versus Volatile Table
Case 10: Pre-aggregation First
Case 11: Cross left outer join skew on null
Case 12: Join column skew on default value
Case 13: QUALIFY & ROW_NUMBER() Function
Case 14: Avoid spool PI skew
Case 15: Pre-aggregate then join back with duplicated
APPENDIX 1: IMD Teradata Performance wiki
APPENDIX 2: IMD Support Team Page
APPENDIX 3: Key Performance Metrics1 - SKEW
APPENDIX 4: Key Performance Metrics2 CPU/IO Efficiency Ratio
APPENDIX 5: Key Performance Metrics3 - Parallel Efficiency
APPENDIX 6: Checklist for Performance Tuning
APPENDIX 7: Possibility-Satisfaction Measure

eBay Inc. confidential

Case 9: Derived Table versus Volatile Table

The Teradata optimizer cannot look inside a derived table, so it has no confidence in how many rows the derived table will return. For a volatile table, however, Teradata automatically collects sample statistics. If the result set is small, the optimizer can choose a better plan for the volatile table (duplicating the data to all AMPs) than for the derived table (joining with a merge join).
[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 2006-12-4 to 2006-12-20]


Example: o_srch.ods_item_aisle_clssfctn_w.del_ins.sql

select
    a.item_id
  , syslib.udf_utf8to16( prdct_aspct_nm )
  , syslib.udf_utf8to16( aspct_vlu_nm )
  , a.last_clsfn_date
from ods_batch_views.stg_item_aspct_clssfctn_w a,
     ( select item_id, max(last_clsfn_date) last_clsfn_date
       from ods_batch_views.stg_item_aspct_clssfctn_w
       group by item_id ) b
where a.item_id = b.item_id
  and a.last_clsfn_date = b.last_clsfn_date
;



Rewrite SQL

CREATE VOLATILE TABLE LATEST_CLSFN_V AS (
    select item_id, max(last_clsfn_date) last_clsfn_date
    from ods_batch_views.stg_item_dprtmnt_clssfctn_w
    group by item_id
) WITH DATA
PRIMARY INDEX (ITEM_ID, LAST_CLSFN_DATE)
ON COMMIT PRESERVE ROWS
;

select
    a.item_id
  , syslib.udf_utf8to16( dprtmnt_dmn_nm )
  , a.last_clsfn_date
from ods_batch_views.stg_item_dprtmnt_clssfctn_w a,
     latest_clsfn_v b
where a.item_id = b.item_id
  and a.last_clsfn_date = b.last_clsfn_date
;

[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, March 2007]

Case 10: Pre-aggregation First

If the spool contains duplicate records that must be joined with a lookup table, pre-aggregating first to compress the data set is more effective. Joining the aggregate rather than the detail rows should also result in less skew. In the example below, pre-aggregation resolves the skew issue and reduces the spool size when joining with DW_EXCHANGE_RATE.


Before change: dw_dp_ebay_fee_w_ins.ksh

from batch_views.dw_accounts a,
     batch_views.dw_dp_actn_code b,
     batch_views.DW_DAILY_EXCHANGE_RATES c,
     batch_views.DW_DP_WKLY_RPT_SLR_W d,
     batch_views.dw_calendar cal
where a.user_id = d.user_id
  and c.curncy_id = d.user_site_curncy_id
  and a.acct_trans_date between d.beg_prd_adjd_by_tz and d.end_prd_adjd_by_tz
  and a.acct_trans_dt = c.day_of_rate_dt
  and d.user_site_curncy_id = c.curncy_id
  and a.actn_code = b.dp_actn_code
  and a.acct_trans_dt = cal.cal_date
  and a.acct_trans_dt between (DATE '2006-11-04') and (DATE '2006-11-12')
  and CASE WHEN kenan_source = 2 then tracking_id_3 else -999 end NOT IN (7, 8)



Rewrite SQL

From (
    from $readDB.dw_accounts a,
         $readDB.dw_dp_actn_code b,
         $readDB.$DrivingTable d
    where a.user_id = d.user_id
      and a.acct_trans_date between d.beg_prd_adjd_by_tz and d.end_prd_adjd_by_tz
      and a.acct_trans_dt between (DATE '${CREATED_MIN_DATE}') and (DATE '${CREATED_MAX_DATE}')
      and CASE WHEN a.kenan_source = 2 then a.tracking_id_3 else -999 end NOT IN (7, 8) -- NON_WACKO
      and a.actn_code = b.dp_actn_code
    group by 1,2,3,4,5,6,7
) ACCT,
  $readDB.DW_DAILY_EXCHANGE_RATES c
where c.curncy_id = acct.user_site_curncy_id
  and acct.acct_trans_dt = c.day_of_rate_dt
  and c.day_of_rate_dt between (DATE '${CREATED_MIN_DATE}') and (DATE '${CREATED_MAX_DATE}')


Case 11: Cross left outer join skew on null

When a query contains multiple left outer joins, the joins run as several steps. A later join column can be skewed on NULL (or on another particular value) because of the result set produced by an earlier step. Replace that value with one that redistributes more evenly, such as a value derived from ITEM_ID. This gives a balanced distribution, and the substituted value cannot match in the join condition.
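The effect of replacing NULL join keys can be illustrated with a small simulation. The sketch below is a toy model, not Teradata's hashing: a plain modulo stands in for the row hash, the AMP count and the data are hypothetical, and `amp_row_counts` and `skew_pct` are illustrative helper names. It only shows why all NULL keys pile up on one AMP and how substituting the negated ITEM_ID spreads them out.

```python
def amp_row_counts(join_keys, n_amps):
    """Count rows per AMP when rows are redistributed on the join key.

    Assumption: a plain modulo stands in for Teradata's row hash.
    Every NULL (None) key goes to the same AMP, because equal
    values always hash to the same place.
    """
    counts = [0] * n_amps
    for key in join_keys:
        amp = 0 if key is None else abs(key) % n_amps
        counts[amp] += 1
    return counts


def skew_pct(counts):
    """Skew as defined in Appendix 3: (max - avg) / max * 100."""
    avg = sum(counts) / len(counts)
    return (max(counts) - avg) / max(counts) * 100


# Hypothetical data: 1,000 items, only every 10th one has an ORDER_ID.
item_ids = list(range(1, 1001))
order_ids = [i // 10 if i % 10 == 0 else None for i in item_ids]

# Before: join on ORDER_ID as is, so 900 NULLs land on one AMP.
before = amp_row_counts(order_ids, n_amps=10)

# After: coalesce(ORDER_ID, -ITEM_ID). The negative keys spread across
# AMPs and can never equal a real (positive) ORDER_ID.
coalesced = [o if o is not None else -i for o, i in zip(order_ids, item_ids)]
after = amp_row_counts(coalesced, n_amps=10)

print("before:", before, "skew %:", round(skew_pct(before), 1))
print("after: ", after, "skew %:", round(skew_pct(after), 1))
```

In this toy data set the busiest AMP drops from 910 rows to 110 rows, while the total number of rows joined stays the same.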



Example: dw_api_fetr_sd_w_ins.sql.

() ITEM
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER_TRANS_MAP MIP_TRANS
  ON ITEM.ITEM_ID=MIP_TRANS.ITEM_ID
 AND MIP_TRANS.ORDER_ID IS NOT NULL
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER MIP_ORDER
  ON MIP_TRANS.ORDER_ID=MIP_ORDER.ORDER_ID
 AND MIP_ORDER.ORDER_STATUS=3
GROUP BY 1,2,3,5,6,7,8,9,10,11,12,13;

Rewrite SQL

() ITEM
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER_TRANS_MAP MIP_TRANS
  ON ITEM.ITEM_ID=MIP_TRANS.ITEM_ID
 AND MIP_TRANS.ORDER_ID IS NOT NULL
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER MIP_ORDER
  ON coalesce (MIP_TRANS.ORDER_ID, -ITEM.ITEM_ID)=MIP_ORDER.ORDER_ID
 AND MIP_ORDER.ORDER_STATUS=3

[Chart: myEffectiveCPU, myTotalCPUTime and myParallelEfficiency by run date, 2006-12-26 to 2007-1-14]

Case 12: Join column skew on default value

When joining on a nullable column, the default value (-999, NULL, etc.) may be the root cause of poor parallel efficiency. There are two ways to avoid joining on the default value:
- Split the query into two queries, one with and one without the skewed data.
- Filter out the skewed data in the join condition (confirm with the SAE first).


Example: dw_myebay.dw_myebay_sav_search.ups.sql

Before change:

UPDATE DW_SAV_SEARCH
FROM ${gdwDB}.DW_MYEBAY_SAV_SEARCH DW_SAV_SEARCH,
     DW_MYEBAY_SAV_SEARCH_V SAV_SEARCH_V
SET DELTD_YN_ID = 1
   ,DELTD_DT = DATE
   ,UPD_DATE = CURRENT_TIMESTAMP(0)
   ,UPD_USER = 'DW_BATCH'
WHERE DW_SAV_SEARCH.SAV_SRCH_ID=SAV_SEARCH_V.ID;

Rewrite SQL

UPDATE DW_SAV_SEARCH
FROM ${gdwDB}.DW_MYEBAY_SAV_SEARCH DW_SAV_SEARCH,
     DW_MYEBAY_SAV_SEARCH_V SAV_SEARCH_V
SET DELTD_YN_ID = 1
   ,DELTD_DT = DATE
   ,UPD_DATE = CURRENT_TIMESTAMP(0)
   ,UPD_USER = 'DW_BATCH'
WHERE DW_SAV_SEARCH.SAV_SRCH_ID=SAV_SEARCH_V.ID
  AND DW_SAV_SEARCH.SAV_SRCH_ID <> -999;


Case 13: QUALIFY & ROW_NUMBER() Function

When a query joins with a derived table that uses aggregation to filter out unwanted data, QUALIFY with ROW_NUMBER() can remove the need for the derived table.

QUALIFY keeps only the rows that are needed. It is often used with the functions ROW_NUMBER(), RANK() and SUM(1). These functions can perform better than aggregate functions in a GROUP BY clause.


Example: dw_se.stg_se_emls_bncd_cumm_w.ins.sql

Before change:

select
from (select *
        LEAF_CATEG_ID
       ,a.FROM_EMAIL_ADDRESS
       ,a.HARD_BOUNCE_TYPE
       ,a.TO_EMAIL_ADDRESS
       ,a.ISP
       ,case when (b.user_id is null or b.user_id = -999)
             then -1 * (c.to_email_addr_id)
             else b.user_id end user_id
       ,count(*) USER_EMAIL_COUNT
      from batch_views.STG_EMLS_BNCD_W a
      left join (select email, max(user_id)
                 from batch_views.dw_users_info
                 group by 1 ) b (email, user_id)
        ON a.to_email_address = b.EMAIL
      left join (select to_email_address, max(to_email_addr_id)
                 from batch_views.dw_se_to_email_addr
                 group by 1) c (to_email_address, to_email_addr_id)
        ON a.to_email_address = c.to_email_address
      group by 1,2,3,4,5,6,7,8,9
     ) sebw;


Rewrite SQL:

select
from (
    select
    from (
        select
            , count(*) USER_EMAIL_COUNT
        from SUPPORT_SCRATCH.STG_EMLS_BNCD_W
        GROUP BY 1,2,3,4,5,6,7,8
    ) a
    left join batch_views.dw_users_info b
      ON a.to_email_address = b.EMAIL
    QUALIFY ROW_NUMBER() OVER(Partition by a.EMAIL_BOUNCED_DATE,
        a.TEMPLATE_ID,a.ITEM_SITE_ID,LEAF_CATEG_ID,a.FROM_EMAIL_ADDRESS,
        a.HARD_BOUNCE_TYPE,a.TO_EMAIL_ADDRESS,a.ISP
        order by b.user_id desc )=1
) t
left join batch_views.dw_se_to_email_addr c
  ON t.to_email_address = c.to_email_address
QUALIFY ROW_NUMBER() OVER(Partition by t.EMAIL_BOUNCED_DATE,
    t.TEMPLATE_ID,t.ITEM_SITE_ID,LEAF_CATEG_ID,t.FROM_EMAIL_ADDRESS,
    t.HARD_BOUNCE_TYPE,t.TO_EMAIL_ADDRESS,t.ISP
    order by c.to_email_addr_id desc )=1;

[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 4/13/2007 to 4/28/2007]

Case 14: Avoid spool PI skew

A spool is an intermediate table with a primary index (PI), just like a normal table. Skew on the spool PI can sometimes cause performance problems.

To avoid spool PI skew, rewrite the query to eliminate the join that produces the skewed spool.


FROM fndng_working.STG_FNDNG_TD_RESULT_SET_W STG
JOIN fndng_tables.DW_FNDNG_RULE_SET DW
  ON DW.RULE_ID_STRING_TXT=STG.RULE_ID_STRING_TXT
 AND DW.LABEL_ID=STG.LABEL_ID
JOIN fndng_tables.DW_FNDNG_RULE_MAP DW_MAP
  ON DW.RULE_SET_KEY=DW_MAP.RULE_SET_KEY
JOIN fndng_tables.DW_FNDNG_TD_RULE DW_RULE
  ON DW_RULE.RULE_ID=DW_MAP.RULE_ID
 AND DW_RULE.LABEL_ID=DW_MAP.LABEL_ID
JOIN batch_views.DW_KWDM_CNSTRNT_VAL_CFG KW
  ON KW.ITEM_SITE_ID=DW_RULE.SITE_ID
GROUP BY 1,2,3;

[Diagram: join plan for the query above, covering STG, DW, DW_MAP, DW_RULE and CFG through spools S1 to S11, with join columns LABEL_ID, RULE_ID_STRING_TXT, RULE_ID and RULE_SET_KEY; one spool is marked SKEW]

FROM fndng_working.STG_FNDNG_TD_RESULT_SET_W STG
JOIN (select DW.RULE_SET_KEY
            ,DW.RULE_ID_STRING_TXT
            ,DW.LABEL_ID
            ,MAX(DW_RULE.SITE_ID) SITE_ID
            ,MAX(KW.BID_VAL) BID_VAL
            ,MAX(KW.BIN_VAL) BIN_VAL
      from fndng_tables.DW_FNDNG_RULE_SET DW
      JOIN fndng_tables.DW_FNDNG_RULE_MAP DW_MAP
      group by 1,2,3
     ) tmp
  ON tmp.RULE_ID_STRING_TXT=STG.RULE_ID_STRING_TXT
 AND tmp.LABEL_ID=STG.LABEL_ID
GROUP BY 1,2,3;

[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 2007-7-24 to 2007-7-27]

Case 15: Pre-aggregate then join back with duplicated

When a large dataset in spool must be joined with a large lookup table, and the dataset has only a few distinct values in the join column, it is more effective to pre-aggregate first to compress the dataset, join the compressed set with the lookup table, and then join the result back to the duplicated dataset.
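The idea can be sketched outside the database. The following toy model uses hypothetical data and helper names (`direct_join`, `preaggregate_join`), with a Python dict standing in for the large lookup table; it is not how Teradata executes the join. It only shows that both approaches return the same rows while the pre-aggregated version touches the lookup table once per distinct key instead of once per detail row.

```python
def direct_join(detail, lookup):
    """Join every detail row with the lookup table.

    Returns the joined rows and the number of lookup probes.
    """
    rows = [(key, payload, lookup[key]) for key, payload in detail if key in lookup]
    return rows, len(detail)


def preaggregate_join(detail, lookup):
    """Pre-aggregate the join keys, join the small set, then join back."""
    # Step 1: compress the dataset to its distinct join keys (GROUP BY).
    distinct_keys = {key for key, _ in detail}
    # Step 2: join only the compressed set with the large lookup table.
    small = {key: lookup[key] for key in distinct_keys if key in lookup}
    # Step 3: join back to the highly duplicated detail rows.
    rows = [(key, payload, small[key]) for key, payload in detail if key in small]
    return rows, len(distinct_keys)


# Hypothetical data: 9,000 detail rows with only 3 distinct join keys,
# and a lookup table with 1,003 entries.
detail = [(("A", i % 3), i) for i in range(9000)]
lookup = {("A", v): 100 + v for v in range(3)}
lookup.update({("Z", v): v for v in range(1000)})

direct_rows, direct_probes = direct_join(detail, lookup)
pre_rows, pre_probes = preaggregate_join(detail, lookup)

print(direct_probes, pre_probes)
```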


Before change: dw_um.dw_um_user_smpl_map_pr_prep2_w.ins.sql

FROM DDM_UM_W.STG_UM_USER_SMPL_PR_MTRC_P1_W PR_MTRC
JOIN DDM_UM_T.DW_UM_USER_SMPL_LKP SMPL_LKP
  ON PR_MTRC.TEST_GRP = SMPL_LKP.USER_SMPL_CMPGN_ID
 AND PR_MTRC.SOJ_SITE_ID = SMPL_LKP.SITE_ID
JOIN DDM_UM_T.DW_UM_USER_SMPL_VRNT SMPL_VRNT
  ON SMPL_LKP.USER_SMPL_ID = SMPL_VRNT.USER_SMPL_PRNT_ID
 AND SMPL_VRNT.USER_SMPL_VRNT=PR_MTRC.TEST_VARIANT


Rewrite SQL

First, pre-aggregate the dataset into a volatile table:

CREATE volatile TABLE pre_distinct_v AS (
    SELECT SMPL_LKP.USER_SMPL_ID
          ,PR_MTRC.TEST_VARIANT
    FROM support_scratch.STG_UM_USER_SMPL_PR_MTRC_P1_W PR_MTRC
    JOIN DDM_UM_T.DW_UM_USER_SMPL_LKP SMPL_LKP
      ON PR_MTRC.TEST_GRP = SMPL_LKP.USER_SMPL_CMPGN_ID
     AND PR_MTRC.SOJ_SITE_ID = SMPL_LKP.SITE_ID
    Group By 1,2
) WITH DATA
Unique Primary Index ( USER_SMPL_ID,TEST_VARIANT )
ON COMMIT PRESERVE ROWS;

Then, join back with the highly duplicated dataset:

FROM DDM_UM_W.STG_UM_USER_SMPL_PR_MTRC_P1_W PR_MTRC
JOIN DDM_UM_T.DW_UM_USER_SMPL_LKP SMPL_LKP
  ON PR_MTRC.TEST_GRP = SMPL_LKP.USER_SMPL_CMPGN_ID
 AND PR_MTRC.SOJ_SITE_ID = SMPL_LKP.SITE_ID
JOIN ( --join the compressed volatile table with the large lookup table
    select SMPL_VRNT_PRE.USER_SMPL_PRNT_ID
          ,SMPL_VRNT_PRE.USER_SMPL_VRNT
          ,USER_SMPL_VRNT_ID
    From pre_distinct_v
    JOIN DDM_UM_T.DW_UM_USER_SMPL_VRNT SMPL_VRNT_PRE
      ON SMPL_VRNT_PRE.USER_SMPL_PRNT_ID= pre_distinct_v.USER_SMPL_ID
     AND SMPL_VRNT_PRE.USER_SMPL_VRNT=pre_distinct_v.TEST_VARIANT
) SMPL_VRNT
  ON SMPL_VRNT.USER_SMPL_PRNT_ID= SMPL_LKP.USER_SMPL_ID
 AND SMPL_VRNT.USER_SMPL_VRNT=PR_MTRC.TEST_VARIANT

[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 9/13/2007 to 9/24/2007]

Case 16: Using max() instead of OLAP function for De-duplication

OLAP functions in the current version (V2R6) have performance issues because they use large spools. In some cases (not all), we can avoid OLAP functions and choose alternative methods. For de-duplication, try the max() function instead of an OLAP function (row_number() or sum() with a QUALIFY clause).


Before change: o_odw_itm.itm_dmx_cpu_w.ins.sql

Insert Into odw_itm_w.ITM_DMX_CPU_W ( )
Select
From odw_itm_w.ITM_DENORM_W w
Where w.RESOURCE_MODEL = 'DMXCpu'
qualify sum(1) over (
    partition by W.SERVER_NAME,W.PROFILE_NAME,W.RESOURCE_MODEL,W.CONTEXT_TYPE,
                 W.INSTANCE_NAME,W.ITM_TYPE,W.TRANS_TS
    order by W.TRANS_TS desc
    ROWS UNBOUNDED PRECEDING) = 1
;


Rewrite SQL

Insert Into odw_itm_w.ITM_DMX_CPU_W ( ......)
Select ......
From odw_itm_w.ITM_DENORM_W w
Join (
    Select
        SERVER_NAME
      , PROFILE_NAME
      , RESOURCE_MODEL
      , CONTEXT_TYPE
      , INSTANCE_NAME
      , ITM_TYPE
      , TRANS_TS
      , MAX(COALESCE(FILE_TIME,0)||COALESCE(DATA_VALUE1,0)||COALESCE(DATA_VALUE2,0)||COALESCE(DATA_VALUE3,0)||COALESCE(DATA_VALUE4,0)||COALESCE(DATA_VALUE5,0)||COALESCE(DATA_VALUE6,0)||COALESCE(DATA_VALUE7,0)||COALESCE(DATA_VALUE8,0)||COALESCE(INSTANCE1,0)) AS MAX_VALUE
    From odw_itm_w.ITM_DENORM_W
    Where RESOURCE_MODEL = 'DMXCpu'
    Group by 1,2,3,4,5,6,7
) w1
  on w.SERVER_NAME = w1.SERVER_NAME
 and w.PROFILE_NAME = w1.PROFILE_NAME
 and w.RESOURCE_MODEL = w1.RESOURCE_MODEL
 and w.CONTEXT_TYPE = w1.CONTEXT_TYPE
 and w.INSTANCE_NAME = w1.INSTANCE_NAME
 and w.ITM_TYPE = w1.ITM_TYPE
 and w.TRANS_TS = w1.TRANS_TS
 and (COALESCE(FILE_TIME,0)||COALESCE(DATA_VALUE1,0)||COALESCE(DATA_VALUE2,0)||COALESCE(DATA_VALUE3,0)||COALESCE(DATA_VALUE4,0)||COALESCE(DATA_VALUE5,0)||COALESCE(DATA_VALUE6,0)||COALESCE(DATA_VALUE7,0)||COALESCE(DATA_VALUE8,0)||COALESCE(INSTANCE1,0)) = w1.MAX_VALUE
Where w.RESOURCE_MODEL = 'DMXCpu'
;

[Chart: myEffectiveCPU, mySkewOverhead and myParallelEfficiency by run date, 9/23/2007 to 9/24/2007]

APPENDIX 1: IMD Teradata Performance wiki

This wiki is maintained by the IMD performance analysts. It contains tips and techniques for improving query performance and for learning about Teradata and the MPP (Massively Parallel Processing) capabilities of the system. Its contents cover CSUM, skew facts, TD architecture and developer guidelines. It also includes the TD Performance Quick Tips.

http://portal.corp.ebay.com/wiki/tikiindex.php?page=IMD+Teradata+Performance
http://portal.corp.ebay.com/wiki/tiki-index.php?page=TD Performance Quick Tips


APPENDIX 2: IMD Support Team Page

This teamwork page is community-owned by the IMD support team. It contains most of the hot support issues and explanations of various aspects, such as performance tuning. It also hosts a BBS where you can leave messages and responses.

http://teamworks/sites/10320/default.aspx


APPENDIX 3: Key Performance Metrics1 - SKEW

Skew: uneven resource consumption across units of parallelism.

%Max/Avg Skew = (max - avg) / max * 100

Example: Avg = 1.8, Max = 5, Skew = 64%
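As a quick check of the formula, the sketch below computes the skew for a hypothetical set of five units of parallelism whose usage averages 1.8 and peaks at 5, matching the example figures. The per-unit values and the function name `skew_pct` are illustrative assumptions.

```python
def skew_pct(per_unit_usage):
    """Skew = (max - avg) / max * 100 over the units of parallelism."""
    avg = sum(per_unit_usage) / len(per_unit_usage)
    peak = max(per_unit_usage)
    return (peak - avg) / peak * 100


# Hypothetical usage per unit: avg = 9 / 5 = 1.8, max = 5.
usage = [5, 1, 1, 1, 1]
print(f"{skew_pct(usage):.0f}%")  # prints 64%
```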


APPENDIX 4: Key Performance Metrics2 CPU/IO Efficiency Ratio

Teradata has high IO capacity, so TD performance monitoring focuses on CPU more than on IO.

Eff = Sum(CPU) / (Sum(IO) / 1000)

This formula calculates the ratio of CPU consumed per 1000 IOs. A lower value indicates more efficient performance. Example:
Query 1: 1283 CPU seconds, 1.09m IOs, ratio of 6
Query 2: 2568 CPU seconds, 2.09m IOs, ratio of 12
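A minimal sketch of the formula as written above, using hypothetical inputs that are not taken from the slides; the function name `cpu_per_1000_ios` is an illustrative assumption.

```python
def cpu_per_1000_ios(cpu_seconds, io_count):
    """Eff = Sum(CPU) / (Sum(IO) / 1000): CPU seconds per 1000 IOs."""
    return cpu_seconds / (io_count / 1000)


# Hypothetical queries with the same CPU cost but different IO volumes.
query_a = cpu_per_1000_ios(cpu_seconds=500, io_count=250_000)    # 500 / 250
query_b = cpu_per_1000_ios(cpu_seconds=500, io_count=1_000_000)  # 500 / 1000

print(query_a, query_b)
```

Under this metric the query with the lower value (here `query_b`) is the one rated more efficient.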


APPENDIX 5: Key Performance Metrics3 - Parallel Efficiency

Parallel efficiency is a calculation that determines how much a given query impacts the system overall.

Parallel Efficiency = (Total CPU Time / Effective CPU Time) * 100

Total CPU Time is the total CPU seconds spent executing the query.
Effective CPU Time is the CPU seconds spent by the hottest AMP during the execution of the query, multiplied by the number of AMPs on the system.
This information comes from the Database Query Log view QRYLOG_DBA_ALL.
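The definitions above can be worked through on a small example. The sketch below assumes a hypothetical four-AMP system with made-up per-AMP CPU seconds; the function name `parallel_efficiency` is illustrative. It also derives the skew overhead as Effective CPU minus Total CPU.

```python
def parallel_efficiency(amp_cpu_seconds):
    """Return (total CPU, effective CPU, parallel efficiency in %).

    Effective CPU = CPU seconds of the hottest AMP * number of AMPs.
    """
    total_cpu = sum(amp_cpu_seconds)
    effective_cpu = max(amp_cpu_seconds) * len(amp_cpu_seconds)
    return total_cpu, effective_cpu, total_cpu / effective_cpu * 100


# Hypothetical query: three AMPs use 10 CPU seconds each, one uses 50.
total_cpu, effective_cpu, efficiency = parallel_efficiency([10, 10, 10, 50])
skew_overhead = effective_cpu - total_cpu

print(total_cpu, effective_cpu, skew_overhead, round(efficiency, 1))
```

A perfectly even query, where every AMP uses the same CPU, scores 100 and has no skew overhead.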


SELECT username
     , acctstringdate
     , starttime
     , SUM(HotAMp1CPU * v.VprocCnt) AS myEffectiveCPU
     , SUM(TotalCPUTime) AS myTotalCPUTime
     , myEffectiveCPU - myTotalCPUTime AS mySkewOverhead
     , myTotalCPUTime / (myEffectiveCPU+1) * 100 AS myParallelEfficiency
     , COUNT(*) AS myNumberOfExecutions
     , AVG(actual_mins) AS AvgActualRunTimeMins
     , SUM(TotalIOCount) AS myTotalIOCount
     , (myTotalCPUTime / myTotalIOCount * 1000) AS myCPUIORate
     , querytext
FROM dw_monitor_views.QryLog_dba q
inner join dw_monitor_views.dw_vproc_hist v
   on v.thedate = q.acctstringdate
WHERE AcctStringDate >= date - 20
GROUP BY 1,2,3,12


APPENDIX 6: Checklist for Performance Tuning


Checklist for Performance Tuning

Data Collection
  1. DBQL provides the primary tool for isolating poorly performing queries
  2. Excel based Graphical Reporting

Identify Poorly Performing Queries
  1. Parallel Efficiency
  2. Efficiency CPU
  3. IO Efficiency
  4. CPU/IO Efficiency Ratio

Collect statistics
  1. Diagnostic helpstats on for session; (V2R5.1+)

Explain Plan
  Diagnostic verboseexplain on for session; (V2R5.1+)
  1. Redistributions of large tables and spools (on fields)
  2. Product joins on large numbers of rows
  3. Aggregates early in a plan on large numbers of rows
  4. Poor confidence
  5. Large estimates (hours or days)
  6. Large numbers of rows in a step (billions)
  7. Index check: see whether proper Primary Indexes (PI) and Secondary Indexes (SI) are being used
  8. Check for column data type mismatches (Translate)
  9. PPI filter enabled

Indices (Primary, Secondary, Join, etc)
  1. Identify physical model changes (indexes, Primary indexes)

Rewrites
  1. Quantify the impact at each step
  2. Generate the Explain plan and look for improvement
  3. Execute the modified query and monitor Spool and CPU usage
  4. Document the improvement approaches and help SAEs to understand the proposal


APPENDIX 7: Possibility-Satisfaction Measure

Possibility: how possible it is that something can be done, denoted P, with P in [0..1]. P=1 means possible and P=0 means impossible.

Satisfaction: how satisfied people are with something, denoted Q, with Q in [0..1]. Q=1 means satisfied and Q=0 means unsatisfied.

Possibility-Satisfaction Score: how likely it is that an approach can be carried out with satisfaction; it combines P and Q.


Performance Tuning metrics weight

Rating scale: Top Important, Very Important, Important, Consider, Can ignore

Metric                       Score   Weight
Effective CPU -%               104      39%
Parallel Efficiency +           96      36%
CPU/IO Effective Ratio -%       48      18%
Code change +                   19       7%
Total                          267     100%
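The weights in this table appear to be each metric's score as a share of the total score. The short sketch below checks that reading; it assumes nothing beyond the scores listed above, and the variable names are illustrative.

```python
# Scores as listed in the metrics weight table.
scores = {
    "Effective CPU": 104,
    "Parallel Efficiency": 96,
    "CPU/IO Effective Ratio": 48,
    "Code change": 19,
}

total_score = sum(scores.values())

# Weight of each metric = score / total score, shown as a whole percent.
weights_pct = {name: round(score / total_score * 100) for name, score in scores.items()}

print(total_score, weights_pct)
```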


Setting and approach value

A Value, B Value

Effective CPU -%: 5%, 50%, 10%, 5%, 50%, 50%
Parallel Efficiency +: 50, 11.72, 50
CPU/IO Effective Ratio -%: 5%, 50%, 43%, 40%, 5%, 45%
Code change +: 10

Approaches possible satisfaction

Effective CPU -%             0.11   0.00   1.00   1.00
Parallel Efficiency +        0.23   0.10   0.00   1.00
CPU/IO Effective Ratio -%    0.84   0.78   0.00   0.89
Code change +                0.67   0.89   0.56   1.00
Total                        0.33   0.24   0.43   0.98
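The Total row looks consistent with a weighted average of the four metric rows, using the metric scores (104, 96, 48, 19) as weights. This is an inference from the numbers, not something the slides state. The sketch below recomputes the totals under that assumption; three of the four columns reproduce the listed totals, and the first comes out at 0.32 against the listed 0.33, which may be down to rounding in the displayed inputs.

```python
# Metric scores from the metrics weight table, used as weights (assumption).
scores = [104, 96, 48, 19]

# Satisfaction values per metric (rows) for the four value columns.
satisfaction = [
    [0.11, 0.00, 1.00, 1.00],  # Effective CPU -%
    [0.23, 0.10, 0.00, 1.00],  # Parallel Efficiency +
    [0.84, 0.78, 0.00, 0.89],  # CPU/IO Effective Ratio -%
    [0.67, 0.89, 0.56, 1.00],  # Code change +
]

total_score = sum(scores)

# Weighted average per column: sum(score * value) / sum(score).
totals = [
    round(sum(s * row[col] for s, row in zip(scores, satisfaction)) / total_score, 2)
    for col in range(4)
]

print(totals)
```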
