SQL Pivot and Prune Queries - Keeping an Eye on Performance

Published by Brendan Furey
It is a very common requirement in SQL to join two record sets that have a one-to-many relationship, where the result set nevertheless has the same cardinality as the set on the 'one' side. The obvious case is standard grouping and aggregation, such as simply counting the records in the 'many' set for each record in the 'one' set. There are also some less obvious cases, for which various SQL techniques are available, with differing performance and complexity characteristics. This article looks at two such cases. In the first, one wishes to join multiple subtypes of a given entity; this is generally referred to as ‘pivoting’ from rows to columns. In the second, one wishes to join just one record from the 'many' set, but has no pure join condition to identify it and so must use an ordering condition instead; I will call this ‘pruning’.
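As a minimal illustration of the two terms, here is a Python sketch over hypothetical in-memory rows. The sample data and the names `prune` and `pivot` are assumptions made for illustration. They are loosely modelled on the phone-number example used later, and are not the author's code.

- Pruning keeps, for each 'one'-side record, the single 'many'-side row that sorts last by an ordering column.
- Pivoting turns each subtype into its own column, aggregating with MAX as the PIVOT clause in the later query does.

```python
from collections import defaultdict

# Hypothetical sample rows on the 'many' side:
# (employee_id, phone_type, valid_from, phone_number)
PHONES = [
    (1, "HOME", "2010-01-01", "111"),
    (1, "HOME", "2011-03-01", "112"),
    (1, "WORK", "2009-06-01", "121"),
    (2, "MOBILE", "2010-05-01", "231"),
]
EMPLOYEES = [1, 2, 3]  # the 'one' side; employee 3 has no phone rows


def prune(employees, phones):
    """One value per employee: the phone row that sorts last by valid_from.

    No join condition identifies that row directly, so an ordering condition
    selects it; ties on valid_from go to the larger phone number.
    """
    best = {}
    for emp_id, _type, valid_from, number in phones:
        key = (valid_from, number)
        if emp_id not in best or key > best[emp_id]:
            best[emp_id] = key
    # Outer-join semantics: employees with no rows still appear, as None.
    return {e: best[e][1] if e in best else None for e in employees}


def pivot(employees, phones, types=("HOME", "WORK", "MOBILE", "FAX")):
    """One tuple per employee, one column per phone subtype (MAX per type)."""
    cells = defaultdict(dict)
    for emp_id, phone_type, _valid_from, number in phones:
        current = cells[emp_id].get(phone_type)
        if current is None or number > current:
            cells[emp_id][phone_type] = number
    return {e: tuple(cells[e].get(t) for t in types) for e in employees}


if __name__ == "__main__":
    print(prune(EMPLOYEES, PHONES))
    print(pivot(EMPLOYEES, PHONES))
```

In both cases the output has exactly one entry per employee, including employee 3 who has no phone rows. This matches the cardinality of the 'one' side.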

This work attempts to find the best SQL techniques for such queries in Oracle 11g, mainly in terms of performance. It does so by running a variety of queries, within the context of an outbound interface, against a deliberately simple data model across a two-dimensional range of data sizes. A simple generic PL/SQL package has been written to perform the testing efficiently; it uses a previously described object type for timing (REF-4). Visio diagrams of the query structures are provided, based on a similar approach previously described (REF-3), and Microsoft Excel graphs are used to display comparative performance. The results reveal some interesting features of the behaviour of the Cost-Based Optimiser in Oracle 11g.

I have applied the same domain-based approach to performance analysis in a subsequent article, ‘Forming Range-Based Break Groups with Advanced SQL’.

Published by: Brendan Furey on May 02, 2011
Copyright: Attribution Non-commercial

09/26/2012

Query Text

SELECT /* PVKP*/
       '"' || emp_name || '","' || h || '","' || w || '","' || m || '","' || f || '"'
  FROM (
        SELECT emp.first_name || ' ' || emp.last_name emp_name,
               pho.phone_type,
               MAX (pho.phone_number) KEEP (DENSE_RANK LAST ORDER BY pho.valid_from) phone_number_last
          FROM employees emp
          LEFT JOIN phone_numbers pho
            ON pho.employee_id = emp.employee_id
         GROUP BY emp.first_name || ' ' || emp.last_name, pho.phone_type
       )
 PIVOT (MAX (phone_number_last) FOR phone_type IN ('HOME' AS h, 'WORK' AS w, 'MOBILE' AS m, 'FAX' AS f))
 ORDER BY 1
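Two Oracle-specific features carry most of the weight here. PIVOT performs the rows-to-columns step, and MAX ... KEEP (DENSE_RANK LAST ORDER BY valid_from) performs the prune. As a portable sketch only, the same result can be approximated in SQLite, which has neither feature. The substitutes are conditional aggregation (MAX(CASE ...)) for the pivot and a NOT EXISTS anti-join for the prune. This sketch assumes:

- Python's standard `sqlite3` module,
- hypothetical sample rows,
- a `valid_from` that is never NULL (Oracle sorts NULLs last by default, so NULL dates would behave differently).

These are swapped-in techniques, and the sketch says nothing about Oracle performance.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    first_name  TEXT,
    last_name   TEXT
);
CREATE TABLE phone_numbers (
    employee_id  INTEGER,
    phone_type   TEXT,
    phone_number TEXT,
    valid_from   TEXT NOT NULL  -- ISO-8601 dates sort correctly as text
);
"""

# Prune: the derived table keeps only phone rows with no later valid_from for
# the same employee and type; MAX() then picks the largest number among any
# ties, as MAX ... KEEP (DENSE_RANK LAST ...) would.
# Pivot: one MAX(CASE ...) per phone type instead of Oracle's PIVOT clause.
# IFNULL is needed because SQLite's || returns NULL for a NULL operand,
# whereas Oracle treats NULL as an empty string in concatenation.
PVKP_SQLITE = """
SELECT '"' || emp_name
       || '","' || IFNULL(MAX(CASE WHEN phone_type = 'HOME'   THEN phone_number END), '')
       || '","' || IFNULL(MAX(CASE WHEN phone_type = 'WORK'   THEN phone_number END), '')
       || '","' || IFNULL(MAX(CASE WHEN phone_type = 'MOBILE' THEN phone_number END), '')
       || '","' || IFNULL(MAX(CASE WHEN phone_type = 'FAX'    THEN phone_number END), '')
       || '"' AS csv_line
  FROM (SELECT emp.first_name || ' ' || emp.last_name AS emp_name,
               pho.phone_type,
               pho.phone_number
          FROM employees emp
          LEFT JOIN (SELECT p.*
                       FROM phone_numbers p
                      WHERE NOT EXISTS (SELECT 1
                                          FROM phone_numbers later
                                         WHERE later.employee_id = p.employee_id
                                           AND later.phone_type  = p.phone_type
                                           AND later.valid_from  > p.valid_from)) pho
            ON pho.employee_id = emp.employee_id)
 GROUP BY emp_name
 ORDER BY 1
"""


def build_sample_db():
    """Create an in-memory database holding hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                     [(1, "Ann", "Lee"), (2, "Bob", "Roe")])
    conn.executemany("INSERT INTO phone_numbers VALUES (?, ?, ?, ?)",
                     [(1, "HOME", "111", "2010-01-01"),
                      (1, "HOME", "112", "2011-03-01"),  # later HOME row wins
                      (1, "WORK", "121", "2009-06-01")])
    return conn


def run_pvkp(conn):
    """Return the CSV lines produced by the SQLite approximation of PVKP."""
    return [row[0] for row in conn.execute(PVKP_SQLITE)]


if __name__ == "__main__":
    for line in run_pvkp(build_sample_db()):
        print(line)
```

Run as a script, this prints two lines: "Ann Lee","112","121","","" and "Bob Roe","","","","". Bob has no phone numbers, yet the outer join still yields exactly one line for him.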

Query Diagram

110516346.doc

Page 26 of 52

Execution Plan Example (W128-D64)

Notice how poor the final cardinality estimate of 2,132,000 is, given that the actual number of rows returned is the number of employees (13,696): the estimate is roughly 156 times too high. It was found later that changing the form of the GROUP BY to separate out the two name fields (GROUP BY emp.first_name, emp.last_name, pho.phone_type) improved the cardinality estimate at Id 4 to about ¾ of the correct value, although the execution plan did not change (see the Analysis section later).

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |               |       |       |       | 40121 (100)|          |
|   1 |  SORT ORDER BY         |               |  2132K|    77M|       | 40121   (1)| 00:08:02 |
|   2 |   HASH GROUP BY PIVOT  |               |  2132K|    77M|       | 40121   (1)| 00:08:02 |
|   3 |    VIEW                |               |  2132K|    77M|       | 39952   (1)| 00:08:00 |
|   4 |     SORT GROUP BY      |               |  2132K|   109M|   226M| 39952   (1)| 00:08:00 |
|*  5 |      HASH JOIN OUTER   |               |  3480K|   179M|       |  5348   (2)| 00:01:05 |
|   6 |       TABLE ACCESS FULL| EMPLOYEES     |  13583|   358K|       |    68   (0)| 00:00:01 |
|   7 |       TABLE ACCESS FULL| PHONE_NUMBERS |  3509K|    90M|       |  5265   (1)| 00:01:04 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("PHO"."EMPLOYEE_ID"="EMP"."EMPLOYEE_ID")

Results

Wide Data Set

Table of CPU Times

Query  W1    W2    W4    W8    W16   W32   W64   W128
D1     0.14  0.17  0.2   0.31  0.58  0.98  2     3.84
D2     0.14  0.15  0.22  0.32  0.61  1.04  2.14  4.16
D4     0.12  0.14  0.22  0.35  0.63  1.17  2.4   4.74
D8     0.09  0.15  0.24  0.4   0.7   1.44  2.83  5.66
D16    0.14  0.17  0.3   0.5   1     1.87  3.8   7.84
D32    0.14  0.21  0.41  0.71  1.34  2.8   5.76  12
D64    0.2   0.33  0.54  1.11  2.24  4.77  9.56  20.19

Graph

Deep Data Set

Table of CPU Times

Query  W1    W2    W4    W8
D1     0.12  0.14  0.19  0.38
D2     0.13  0.12  0.2   0.32
D4     0.12  0.14  0.25  0.38
D8     0.12  0.15  0.22  0.39
D16    0.14  0.19  0.3   0.49
D32    0.14  0.24  0.37  0.68
D64    0.2   0.32  0.56  1.09
D128   0.28  0.49  0.92  1.88
D256   0.47  0.81  1.64  3.44
D512   0.77  1.58  3.12  6.85
D1024  1.45  2.86  6.2   13.06

Graph
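One way to summarise the two tables is to ask how much the CPU time grows each time one dimension doubles. The minimal Python sketch below uses only figures copied from the tables, with no new measurements. It takes the D64 row of the wide set and the W8 column of the deep set, two series chosen here purely for illustration. For each step it computes time(n) / time(n/2).

```python
# CPU times copied from the tables above; keys are the doubled dimension.
WIDE_D64 = {1: 0.2, 2: 0.33, 4: 0.54, 8: 1.11, 16: 2.24,
            32: 4.77, 64: 9.56, 128: 20.19}            # D64 row, keyed by W
DEEP_W8 = {1: 0.38, 2: 0.32, 4: 0.38, 8: 0.39, 16: 0.49, 32: 0.68,
           64: 1.09, 128: 1.88, 256: 3.44, 512: 6.85,
           1024: 13.06}                                 # W8 column, keyed by D


def doubling_ratios(times):
    """Map each size n to time(n) / time(n / 2), rounded to two decimals."""
    sizes = sorted(times)
    return {n: round(times[n] / times[prev], 2)
            for prev, n in zip(sizes, sizes[1:])}


if __name__ == "__main__":
    for label, series in (("Wide, D64, by W", WIDE_D64),
                          ("Deep, W8, by D", DEEP_W8)):
        print(label, doubling_ratios(series))
```

For the wide set, the ratios settle at about 2 from W8 upwards (2.06, 2.02, 2.13, 2.0, 2.11). That is consistent with CPU time growing roughly linearly in W at this depth. For the deep set, the W8 ratios stay near 1 at the smallest depths and rise towards 2 only at the largest (1.99 at D512, 1.91 at D1024).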
