110515958.doc
Author:
Brendan Furey
Creation Date:
12 June 2011
Version:
1.4
Last Updated:
25 September 2012
Page 1 of 49
Table of Contents
Introduction
  Hardware/Software Summary
Problem Definitions and Examples
  Problem Definitions
    Problem 1: Contiguous Ranges
    Problem 2: Overlapping Ranges
    Problem 3: Bursts of Activity
  Model Solution
    How It Works
    Query Diagram
    SQL
  Performance Analysis
    Test Data Sets
    Output Record Counts
    CPU Times
    Slice Graphs
    Explain Plans (Data Point W256-D1)
    Discussion of Results
  Model Solution
    How It Works
    Query Diagram
    SQL
  Performance Analysis
    Test Data Sets
    Output Record Counts
    CPU Times
    Slice Graphs
    Explain Plans (Data Point W64-D1)
    Discussion of Results
  Performance Analysis
    Test Data Sets
    Output Row Counts
    CPU Times
    Slice Graphs
    Explain Plans (Data Point W128-D1)
    Discussion of Results
  Performance Analysis
    Problem 1: Contiguous Ranges
    Problem 2: Overlapping Ranges
    CPU Times
Conclusions
References
Change Record
Date         Author  Version  Change Reference
12-Jun-2011  BPF     1.0      Initial covering 2 problems, analytic solutions only, no performance analysis
14-Jun-2011  BPF     1.1      Added test case 5, and tabulated intermediate solutions
19-Jul-2011  BPF     1.2      Restructured, adding third problem, Model and RSF solutions, and performance analysis
02-Aug-2011  BPF     1.3      Analytics anomaly analysis
25-Sep-2012  BPF     1.4      References now hyperlinks
Introduction
Records in a database often include range fields, such as a start and end time for some activity, and it is
sometimes desired to group the records by range. There are several possible ways of grouping by range:
In one case the records do not overlap, but additional breaking fields may be present; in a second case,
records may overlap, but additional breaking fields do not then make sense; in the third case considered
('bursts of activity'), only a single start field is used and break groups consist of all the records whose
range start is within a given distance from the starting point. For each problem, we consider two variations
that affect the choice of SQL: In the first, we are looking for all break groups, while in the second we want
to retrieve only a single one.
This article provides solutions for these problems, using three SQL techniques, namely: Analytic
Functions, Model Clause, and Recursive Subquery Factoring. Diagrams are used extensively to depict
query structures and help explain the solutions.
Performance analyses are included that compare performance of the three methods (only two for the third
problem) on each problem across a two-dimensional domain of size and depth. The analyses follow an
approach described in an earlier article (SQL Pivot and Prune Queries Keeping an Eye on
Performance). The results show that the best method depends on the depth of the groups, with Analytic
Functions being best for deep groups and Recursive Subquery Factoring best for shallow groups where
only a single group is required. The Model Clause performs best where an Analytic Functions solution is
not available (the bursts of activity problem) and either all groups are required or a single deep group is
required. The Model Clause also gives very stable performance across depth range, and is surprisingly
simple in structure. The article may be of interest to developers who have yet to learn about some of these
techniques.
An important performance glitch was discovered in using the analytic function First_Value with the Ignore
Nulls option, and methods for avoiding it are presented.
This document replaces a preliminary version (Forming Range-Based Break Groups with SQL Analytic
Functions) with only analytic solutions, two problems, and no performance analysis.
Hardware/Software Summary
Component         Description
Database          Oracle Database 11g Express Edition Release 11.2.0.2.0 - Beta
Diagrammer        Microsoft Visio 2003 (11.3216.5606)
Operating System  Microsoft Windows 7 Home Premium (32 bit)
Computer          Samsung X120, 3GB memory, Intel U4100 @ 1.3GHz x 2
key
break
other
- partition by fields
For each problem, we consider two variations that affect the choice of SQL: In the first, we are looking for
all break groups, while in the second we want to retrieve only a single one enclosing (or, starting from, for
the third problem) a particular value.
Problem 1: Contiguous Ranges
The first problem is to obtain for each record a group start, group end pair that are the range start and
range end values for the records that respectively start and end the break group of the current record. The
records are to be ordered by range start within the partitioning key, and a new break group starts when,
between successive records, either there is a gap between range end and range start fields, or any of the
break fields change value. No overlaps are allowed in the ranges within a key.
Problem 2: Overlapping Ranges
The second problem is the same as the first but with no break fields and overlapping is allowed. In other
words, groups consist of all records that overlap, counting contiguity as overlapping.
Problem 3: Bursts of Activity
The third problem is to determine the break groups using distance from the group start point, with
overlapping allowed (since the range end is here just another attribute). In other words, once a group
starts, all records that start within a fixed distance from the group start are in the group, and the first record
after the end of a group defines the next group start.
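The burst rule can be illustrated procedurally. The following Python is a minimal sketch, not one of the article's SQL solutions: the function name burst_starts is hypothetical, start dates are plain datetime.date values, and it assumes that "within a fixed distance" includes the boundary (a record starting exactly limit_days after the group start still belongs to the group).

```python
from datetime import date, timedelta

def burst_starts(start_dates, limit_days):
    """Pair each start date (in ascending order) with the start of its burst group.

    Assumption: the boundary is inclusive, i.e. a record is in the group when
    start <= group_start + limit_days.
    """
    limit = timedelta(days=limit_days)
    result = []
    group_start = None
    for s in sorted(start_dates):
        # The first record beyond the current group's window opens a new group
        if group_start is None or s > group_start + limit:
            group_start = s
        result.append((s, group_start))
    return result

# Illustrative start dates in June 2011, with a burst size limit of 3 days
starts = [date(2011, 6, d) for d in (1, 2, 4, 8, 9, 15)]
bursts = burst_starts(starts, 3)
burst_days = [g.day for _, g in bursts]
```

Note that each group start depends on the previous group start rather than on the previous record, which is why this problem does not reduce to a simple comparison of adjacent rows.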
Column         Type
activity_id    Number
person_id      Number
start_date     Date
end_date       Date
activity_name  Char(10)
Indexes
Activity_nov (problem 1, indexes unique)
Index            Columns
ACTIVITY_NOV_U1  person_id, start_date
ACTIVITY_NOV_U2  person_id, end_date

Index            Columns
ACTIVITY_N1      person_id, start_date, Nvl(end_date, To_Date(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
ACTIVITY_N2      person_id, Nvl(end_date, To_Date(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')), start_date
Test Cases
There are five test cases, two for the first problem, three for the other two, which can use the same data
sets, with a person for each case. The groups for the third problem are defined by a burst size limit of 3
days. Oracle standard dates have 1 second precision, but we'll take a time component of zero in the test
data for simplicity, as this causes no loss of generality.
Test
Case
Scenario
Test Cases T1 and T2 - Non-Overlapping with Additional Breaks
T1
T2
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
10
LEAVE
11
TRAINING
12
TRAINING
13
LEAVE
14
LEAVE
Start Date
End Date
01-Jun-11
02-Jun-11
02-Jun-11
04-Jun-11
04-Jun-11
07-Jun-11
08-Jun-11
09-Jun-11
09-Jun-11
14-Jun-11
20-Jun-11
30-Jun-11
01-Jun-11
02-Jun-11
02-Jun-11
04-Jun-11
04-Jun-11
07-Jun-11
08-Jun-11
09-Jun-11
09-Jun-11
14-Jun-11
20-Jun-11
01-Jun-11
03-Jun-11
02-Jun-11
05-Jun-11
Group
Start
01-Jun11
01-Jun11
01-Jun11
08-Jun11
08-Jun11
20-Jun11
01-Jun11
01-Jun11
01-Jun11
08-Jun11
08-Jun11
20-Jun-11
01-Jun11
01-Jun11
Group
End
07-Jun11
07-Jun11
07-Jun11
14-Jun11
14-Jun11
30-Jun11
07-Jun11
07-Jun11
07-Jun11
09-Jun-11
Burst
Date
01-Jun-11
14-Jun-11
08-Jun-11
07-Jun11
07-Jun11
01-Jun-11
01-Jun-11
08-Jun-11
08-Jun-11
20-Jun-11
01-Jun-11
01-Jun-11
01-Jun-11
08-Jun-11
20-Jun-11
01-Jun-11
01-Jun-11
15
LEAVE
16
LEAVE
17
TRAINING
18
TRAINING
19
LEAVE
20
LEAVE
21
LEAVE
22
LEAVE
23
TRAINING
24
TRAINING
25
LEAVE
26
LEAVE
27
LEAVE
28
LEAVE
29
TRAINING
30
TRAINING
04-Jun-11
07-Jun-11
08-Jun-11
16-Jun-11
09-Jun-11
14-Jun-11
20-Jun-11
30-Jun-11
01-Jun-11
03-Jun-11
02-Jun-11
05-Jun-11
04-Jun-11
07-Jun-11
08-Jun-11
16-Jun-11
09-Jun-11
20-Jun-11
30-Jun-11
01-Jun-11
03-Jun-11
02-Jun-11
05-Jun-11
04-Jun-11
07-Jun-11
08-Jun-11
16-Jun-11
09-Jun-11
14-Jun-11
15-Jun-11
30-Jun-11
01-Jun11
08-Jun11
08-Jun11
20-Jun11
01-Jun11
01-Jun11
01-Jun11
08-Jun11
08-Jun11
08-Jun11
01-Jun11
01-Jun11
01-Jun11
08-Jun11
08-Jun11
08-Jun11
07-Jun11
16-Jun11
16-Jun11
30-Jun11
07-Jun11
07-Jun11
07-Jun11
01-Jun-11
08-Jun-11
08-Jun-11
20-Jun-11
01-Jun-11
01-Jun-11
01-Jun-11
08-Jun-11
08-Jun-11
20-Jun-11
07-Jun11
07-Jun11
07-Jun11
30-Jun11
30-Jun11
30-Jun11
01-Jun-11
01-Jun-11
01-Jun-11
08-Jun-11
08-Jun-11
15-Jun-11
Notes
The diagram notation follows and extends notation developed earlier, including in SQL Pivot and Prune
Queries Keeping an Eye on Performance. The key can be referred to for subsequent diagrams.
SQL
SELECT /* NO_OVERLAP */
       person_id, start_date, end_date, activity_name, activity_id id,
       Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date) group_start,
       First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date
              RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) group_end
  FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       CASE WHEN (start_date > Nvl (Lag (end_date) OVER (PARTITION BY person_id ORDER BY start_date), start_date-1)) OR
                 (activity_name != Lag (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
            THEN start_date END group_start,
       CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date), end_date+1) > end_date) OR
                 (activity_name != Lead (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
            THEN end_date END group_end
  FROM activity_nov
)
 ORDER BY person_id, start_date
Act
Id
Activity
Name
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
LEAVE
Record Level
Start
End Date
Date
01-Jun-11 02-Jun-11
Level 1 View
Start
End Date
Date
01-Jun-11
02-Jun-11
04-Jun-11
04-Jun-11
07-Jun-11
08-Jun-11
09-Jun-11
09-Jun-11
14-Jun-11
20-Jun-11
30-Jun-11
20-Jun-11
01-Jun-11
02-Jun-11
01-Jun-11
02-Jun-11
04-Jun-11
07-Jun-11
08-Jun-11
14-Jun-11
30-Jun-11
Solution
Start
End Date
Date
01-Jun07-Jun-11
11
01-Jun07-Jun-11
11
01-Jun07-Jun-11
11
08-Jun14-Jun-11
11
08-Jun14-Jun-11
11
20-Jun30-Jun-11
11
01-Jun07-Jun-11
11
01-Jun07-Jun-11
LEAVE
10
LEAVE
11
TRAINING
12
TRAINING
04-Jun-11
07-Jun-11
08-Jun-11
09-Jun-11
08-Jun-11
09-Jun-11
09-Jun-11
14-Jun-11
09-Jun-11
14-Jun-11
20-Jun-11
07-Jun-11
20-Jun-11
11
01-Jun11
08-Jun11
08-Jun11
20-Jun-11
07-Jun-11
09-Jun-11
14-Jun-11
Model Solution
How It Works
The key to solving this problem using Oracle's Model clause (Oracle Database SQL Language
Reference 11g Release 2 (11.2)) is to realise that the solution can be represented as simple inductions,
forward for the group start dates, then backward for the group end dates. If a, s, e, S, E are the current
activity, start date, end date, group start date and group end date, and (pa, ps, pe, pS, pE) and
(na, ns, ne, nS, nE) are the prior and next values, then (using C-like terminology for brevity):
Initial, S = s; later,   S = (a != pa or s > pe) ? s : pS
Final,   E = e; earlier, E = (nS > S) ? e : nE
These inductions can easily be implemented as rules within the model clause:
1. Form the basic Select, with all the table columns required, and append placeholders group_start
and group_end
2. Add the Model keyword, partitioning by person, dimensioning by analytic function Row_Number,
ordering by start date within person, and with the remaining columns as measures
3. Initialise group start and end to start and end dates in the measures clause
4. Define the first rule to obtain the group start date for all rows after the first as the previous group
start date, unless there is a gap or the activity changes, relative to the previous record, in which
case take the new start date. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group end date for all rows as the next group end date,
unless the next group start date is greater than the current one, or there is no next (i.e. at the last
row), in which case take the current end date. This rule must be processed in descending row
order, and this is specified as it is not the default.
6. The output from the above obtains all groups, but if necessary, can be used within an inline view
to restrict the output to certain groups only (e.g. a 'current' group)
The query diagram, SQL and functional testing use the form for obtaining all groups, while the
performance testing uses the form for obtaining a single group, for consistency with the third solution
method.
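The two inductions above can be traced outside the database. The following Python is only an illustrative sketch of the forward and backward passes, not the Model clause itself; the function name contiguous_groups is hypothetical, and dates are represented as integer day numbers (days of June 2011) purely for brevity.

```python
def contiguous_groups(rows):
    """rows: list of (activity, start, end) for one person, ordered by start.

    Returns a list of (group_start, group_end), one pair per input row.
    """
    n = len(rows)
    S = [s for _, s, _ in rows]  # group start initialised to the record start
    E = [e for _, _, e in rows]  # group end initialised to the record end

    # Forward induction (ascending row order): S = (a != pa or s > pe) ? s : pS
    for i in range(1, n):
        a, s, _ = rows[i]
        pa, _, pe = rows[i - 1]
        S[i] = s if (a != pa or s > pe) else S[i - 1]

    # Backward induction (descending row order): E = (nS > S) ? e : nE
    for i in range(n - 2, -1, -1):
        E[i] = rows[i][2] if S[i + 1] > S[i] else E[i + 1]

    return list(zip(S, E))

# A single-activity data set with gaps after day 7 and after day 14
t1 = [("LEAVE", 1, 2), ("LEAVE", 2, 4), ("LEAVE", 4, 7),
      ("LEAVE", 8, 9), ("LEAVE", 9, 14), ("LEAVE", 20, 30)]
t1_groups = contiguous_groups(t1)

# A change of activity breaks the group even though the ranges are contiguous
mixed_groups = contiguous_groups([("LEAVE", 1, 2), ("TRAINING", 2, 4)])
```

The backward pass has to run after the forward pass has completed, because it compares finished group start values; this mirrors the sequential processing of the two rules.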
Query Diagram
Notes
Queries with the Model clause have a structure that is rather different from other queries, and the
diagrams attempt to reflect that structure for these problems. The main query feeds its output into an array
processing component with a set of rules that specify how any additional (here) data items (called
measures) are to be calculated, in a mostly declarative fashion.
The model box above contains 4 specification types:
Partition - processing is to be performed separately by one or more columns; the same meaning as in
analytic functions
Dimension
here
Measures - remaining columns that may be calculated or updated by the rules, possibly including
placeholders from the main query
Rules - a set of rules that specify measure calculation; rules are processed sequentially, unless
otherwise specified; in the diagram:
  f(n-1,n) (and so on) - denotes that the value depends on values from previous and current rows
  ^ - denotes that the calculation progresses in ascending order by dimension; this is the default so
      does not have to be coded
  v - denotes that the calculation progresses in descending order by dimension; this is not the default
      so does have to be coded
SQL
SELECT /* MOD_OVL */ person_id, start_date, end_date, activity_name, activity_id, group_start, group_end
  FROM activity_nov
 MODEL
    PARTITION BY (person_id)
    DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
    MEASURES (start_date, end_date, activity_name, activity_id, start_date group_start, end_date group_end)
    RULES (
       group_start[rn > 1] =
          CASE WHEN start_date[cv()] > end_date[cv()-1] OR activity_name[cv()] != activity_name[cv()-1]
               THEN start_date[cv()] ELSE group_start[cv()-1] END,
       group_end[ANY] ORDER BY rn DESC = PRESENTV (group_start[cv()+1],
          CASE WHEN group_start[cv()] < group_start[cv()+1] THEN group_end[cv()] ELSE group_end[cv()+1] END,
          end_date[cv()])
    )
 ORDER BY 1, 2
Query Diagram
Notes
Queries with a recursive subquery factor have a special structure, and the diagrams attempt to reflect that
structure for these problems. The recursive factor is a subquery having a Union All structure in which there
are two branches:
Anchor Branch
Recursive Branch
Notice the use of subtypes in the diagram: records in the recursive branch can be split into back and
front subtypes.
SQL
WITH
SELECT
FROM
WHERE
UNION
Performance Analysis
Test Data Sets
For the performance analysis it is simpler to generate test data using a single activity, with groups
determined only by the dates. If w and d are the numeric width and depth points, records are generated for
three persons as follows:
Let random(d) be a random integer between 1 and d (generated afresh on each access)
Start date = '01-JAN-1900'
Record limit = 3 * 100 * w
Loop while number of records <= record limit
Add group of records for person 1, with group size = random(d), as follows:
o
Depth/Width     W1   W2    W4    W8   W16   W32    W64   W128   W256
Total Records  300  600  1200  2400  4800  9600  19200  38400  76800
D1               1    1     1     1     1     1      1      1      1
D3               3    1     2     3     2     1      2      1      3
D9               5    8     3     5     7     8      8      2      2
D27              8   16     8    26    10     2     13      2      9
D81             80   11    25    55    26    72     67     41     42
D243             6  219   135   196    49   131     90    132    168
D729           300   93    75   290   134    68    547    627    446
D2187          300  600  1196  1501   972  1300    346    437   1265
D6561          300  600   717  2400  4330  3737   4243   1331   4103
CPU Times
Analytics Query
Depth/Width    W1    W2    W4    W8   W16   W32    W64    W128    W256
D1           0.02  0.05  0.17  0.64  2.42  9.73  38.45  147.35  604.91
D3           0.01  0.03  0.10  0.33  1.27  4.87  19.28   72.80   298.0
D9           0.02  0.01  0.05  0.16  0.50  1.85   7.24   28.41  113.02
D27          0.02  0.03  0.03  0.08  0.22  0.71   2.62   10.18   39.16
D81          0.00  0.01  0.03  0.04  0.12  0.32   1.00    3.81   14.23
D243         0.02  0.04  0.03  0.06  0.09  0.22   0.57    1.55    5.27
D729         0.02  0.03  0.03  0.05  0.10  0.17   0.41    0.99    2.64
D2187        0.03  0.07  0.10  0.18  0.15  0.25   0.34    0.68    1.68
D6561        0.02  0.06  0.08  0.14  0.33  0.31   0.51    0.67    1.42
Notes
The graph generated with Microsoft Excel 2007 may be slightly misleading as the pale blue peak
does not appear to reach 605.
Model Query
Depth/Width    W1    W2    W4    W8   W16   W32   W64  W128  W256
D1           0.03  0.03  0.06  0.10  0.19  0.38  0.74  1.43  2.98
D3           0.03  0.03  0.06  0.11  0.19  0.37  0.75  1.53  3.01
D9           0.02  0.03  0.07  0.11  0.20  0.37  0.77  1.52  2.99
D27          0.01  0.03  0.06  0.10  0.17  0.37  0.74  1.51  2.99
D81          0.05  0.03  0.06  0.11  0.21  0.39  0.73  1.50  3.00
D243         0.02  0.05  0.07  0.11  0.18  0.38  0.75  1.54  3.06
D729         0.04  0.03  0.04  0.12  0.20  0.39  0.77  1.51  3.02
D2187        0.04  0.08  0.12  0.19  0.22  0.47  0.79  1.50  3.09
D6561        0.03  0.06  0.12  0.19  0.44  0.56  0.95  1.60  3.18
Notes
Query
D1
D3
D9
D27
D81
D243
D729
D2187
D6561
W12
W1
W2
W4
W8
W16
W32
W64
8
W256
0.02
0.02
0.02
0.01
0.02
0.01
0.01
0.03
0.05
0.01
0.02
0.00
0.01
0.01
0.02
0.02
0.02
0.09
0.01
0.02
0.02
0.01
0.03
0.05
0.06
0.05
0.07
0.02
0.02
0.01
0.03
0.04
0.01
0.08
0.05
0.25
0.03
0.02
0.02
0.08
0.06
0.27
0.44
0.53
1.05
0.01
0.08
0.09
0.19
0.11
0.42
0.60
1.61
4.33
0.11
0.03
0.03
0.29
0.16
0.19
2.82
7.95 12.02
0.14
0.42
1.45
2.26
1.02
3.01
2.15
4.51 33.49
13.2
27.0
17.8
0.12
0.40
0.54
5.55
17.8
9
3
3 76.35
Notes
Slice Graphs
Wide Slice
Deep Slice
Model
-------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |      |       |     5 (100)|          |
|   1 |  SORT ORDER BY        |              |   12 |   828 |     5  (40)| 00:00:01 |
|*  2 |   VIEW                |              |   12 |   828 |     4  (25)| 00:00:01 |
|   3 |    SQL MODEL ORDERED  |              |   12 |   336 |     4  (25)| 00:00:01 |
|   4 |     WINDOW SORT       |              |   12 |   336 |     4  (25)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| ACTIVITY_NOV |   12 |   336 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("GROUP_START"<=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "GROUP_END">=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd
              hh24:mi:ss')))
Discussion of Results
The best method for shallow data sets is Recursive Subquery Factor
The Model method is independent of depth and performs in the wide slice at a level between the
two other methods, except for one intermediate data point where it is better than both
SQL
SELECT /* OVERLAP */
person_id, start_date, end_date, activity_name, activity_id id,
Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date)
group_start,
CASE First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date RANGE
BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) WHEN To_Date('01-JAN-3000', 'DD-MON-YY') THEN NULL ELSE
First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date RANGE
BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) END group_end
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
CASE WHEN (start_date > Nvl (Lag (running_end) OVER (PARTITION BY person_id ORDER BY start_date),
Act
id
Activity
name
13
LEAVE
14
LEAVE
15
LEAVE
16
LEAVE
17
TRAINING
18
TRAINING
19
LEAVE
Record Level
Running
(Level 0)
Level 1 View
Start
date
01-Jun-11
End date
End date
Start
date
01-Jun-11
03-Jun-11
03-Jun-11
02-Jun-11
05-Jun-11
05-Jun-11
04-Jun-11
07-Jun-11
07-Jun-11
08-Jun-11
16-Jun-11
16-Jun-11
09-Jun-11
14-Jun-11
16-Jun-11
20-Jun-11
30-Jun-11
30-Jun-11
20-Jun-11
01-Jun-11
03-Jun-11
03-Jun-11
01-Jun-11
End date
07-Jun-11
08-Jun-11
16-Jun-11
30-Jun-11
Solution
Start
date
01-Jun11
01-Jun11
01-Jun11
08-Jun11
08-Jun11
20-Jun11
01-Jun11
End date
07-Jun-11
07-Jun-11
07-Jun-11
16-Jun-11
16-Jun-11
30-Jun-11
07-Jun-11
20
LEAVE
21
LEAVE
22
LEAVE
23
TRAINING
24
TRAINING
25
LEAVE
26
LEAVE
27
LEAVE
28
LEAVE
29
TRAINING
30
TRAINING
02-Jun-11
05-Jun-11
05-Jun-11
04-Jun-11
07-Jun-11
07-Jun-11
08-Jun-11
16-Jun-11
16-Jun-11
09-Jun-11
20-Jun-11
07-Jun-11
08-Jun-11
01-Jan-00
30-Jun-11
01-Jan-00
01-Jun-11
03-Jun-11
03-Jun-11
02-Jun-11
05-Jun-11
05-Jun-11
04-Jun-11
07-Jun-11
07-Jun-11
08-Jun-11
16-Jun-11
16-Jun-11
09-Jun-11
14-Jun-11
16-Jun-11
15-Jun-11
30-Jun-11
30-Jun-11
01-Jan-00
01-Jun-11
07-Jun-11
08-Jun-11
30-Jun-11
01-Jun11
01-Jun11
08-Jun11
08-Jun11
08-Jun11
01-Jun11
01-Jun11
01-Jun11
08-Jun11
08-Jun11
08-Jun11
07-Jun-11
07-Jun-11
07-Jun-11
07-Jun-11
07-Jun-11
30-Jun-11
30-Jun-11
30-Jun-11
Model Solution
How It Works
The key to solving this problem using Oracle's Model clause is to realise that the solution can be
represented as three simple inductions. If s, e, S, E are the current start date, end date, group start date
and group end date, and (ps, pe, pS, pE) and (ns, ne, nS, nE) are the prior and next values, ordering by
start date, then (using C-like terminology for brevity):
Initial, E = e; later,   E = (e > pE) ? e : pE
Initial, S = s; later,   S = (s > pE) ? s : pS
Final,   E = e; earlier, E = (S < nS) ? E : nE
These inductions can easily be implemented as rules within the model clause:
1. Form the basic Select, with all the table columns required, and append placeholders group_start
and group_end
2. Add the Model keyword, partitioning by person, dimensioning by analytic function Row_Number,
ordering by start date within person, and with the remaining columns as measures
3. Initialise group start and end dates to start and end dates in the measures clause
4. Define the first rule to obtain a running latest end date for all rows after the first as the previous
running end date, unless the current end date is greater than the previous running end date, in
which case take the new end date. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group start date for all rows after the first as the previous
group start date, unless the start date is greater than the previous running latest end date, in which
case take the current start date. This rule will be processed in the default ascending row order.
6. Define the third rule to obtain the group end date for all rows before the last as the next group
end date, unless the group start date is less than the next group start date, in which case keep the
current running latest end date. This rule must be processed in descending row order, and this
is specified as it is not the default.
7. The output from the above obtains all groups, but if necessary, can be used within an inline view
to restrict the output to certain groups only (e.g. a 'current' group)
The query diagram, SQL and functional testing use the form for obtaining all break groups, while the
performance testing uses the form for obtaining a single break group, for consistency with the third
solution method.
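The three inductions can be traced on a small data set. The Python below is a sketch of the logic only, not of the Model clause; the function name overlap_groups is hypothetical, dates are integer day numbers for brevity, and open-ended (null) end dates, which the SQL handles with Nvl, are assumed not to occur.

```python
def overlap_groups(rows):
    """rows: list of (start, end) for one person, ordered by start.

    Returns a list of (group_start, group_end), one pair per input row.
    Assumption: every end date is populated (no nulls).
    """
    n = len(rows)
    S = [s for s, _ in rows]  # group start initialised to the record start
    E = [e for _, e in rows]  # group end initialised to the record end

    # Rule 1, ascending: running latest end date, E = (e > pE) ? e : pE
    for i in range(1, n):
        E[i] = rows[i][1] if rows[i][1] > E[i - 1] else E[i - 1]

    # Rule 2, ascending: S = (s > pE) ? s : pS
    for i in range(1, n):
        S[i] = rows[i][0] if rows[i][0] > E[i - 1] else S[i - 1]

    # Rule 3, descending: E = (S < nS) ? E : nE
    for i in range(n - 2, -1, -1):
        if not S[i] < S[i + 1]:
            E[i] = E[i + 1]

    return list(zip(S, E))

# Overlapping ranges: the record (9, 14) lies inside (8, 16)
t3 = [(1, 3), (2, 5), (4, 7), (8, 16), (9, 14), (20, 30)]
t3_groups = overlap_groups(t3)
```

The running latest end date is what allows a record that ends early, such as (9, 14), to stay in the group opened by a longer record before it.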
Query Diagram
SQL
SELECT /* MOD_OVL */ person_id, start_date,
CASE end_date WHEN To_Date ('01-JAN-3000', 'DD-MON-YYYY') THEN NULL ELSE end_date END end_date,
activity_name, activity_id, group_start,
CASE group_end WHEN To_Date ('01-JAN-3000', 'DD-MON-YYYY') THEN NULL ELSE group_end END group_end
FROM activity
MODEL
PARTITION BY (person_id)
DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
MEASURES (start_date, Nvl (end_date, '01-JAN-3000') end_date, activity_name, activity_id,
start_date group_start, Nvl (end_date, '01-JAN-3000') group_end)
RULES (
group_end[rn > 1] =
CASE WHEN end_date[cv()] > group_end[cv()-1] THEN end_date[cv()] ELSE group_end[cv()-1] END,
group_start[rn > 1] =
CASE WHEN start_date[cv()] > group_end[cv()-1] THEN start_date[cv()] ELSE group_start[cv()-1]
END,
group_end[ANY] ORDER BY rn DESC = PRESENTV (group_start[cv()+1],
CASE WHEN group_start[cv()] < group_start[cv()+1] THEN group_end[cv()] ELSE group_end[cv()+1]
END,
group_end[cv()])
)
ORDER BY 1, 2, 3
4. The recursive branch extends the record set by joining records that link to extreme parent records
and that push the envelope. The direction column is set to B or F according to the direction of
extension (Backward or Forward).
5. Define a subquery factor for the envelope that simply obtains the minimum start date and
maximum end dates from the recursive factor grouped by person
6. Select all records from the envelope factor, joining the activity table for all records within the
envelope by person to get all the group records with the group start and end dates being the
envelope values.
Note that we need the additional subquery factor because the recursive factor may exclude some records
that do not extend the envelope but are contained within it; for example, record 29 in data set T5 above.
The idea here is that for cases where the break group is small this will avoid expensive processing of the
entire record set. We'll demonstrate this saving in our performance analysis section.
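The envelope idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a translation of the recursive subquery factor: the names envelope and group_members are hypothetical, the extension is done with a simple loop instead of recursion, dates are integer day numbers, and null end dates are assumed not to occur. Unlike the SQL, which can locate linking records through the indexes, this sketch scans every record on each pass.

```python
def envelope(rows, today):
    """rows: list of (start, end) for one person.

    Returns (env_start, env_end) for the overlap group enclosing `today`,
    or None when no record covers that day.
    """
    # Anchor step: records whose range encloses the given day
    anchors = [(s, e) for s, e in rows if s <= today <= e]
    if not anchors:
        return None
    env_start = min(s for s, _ in anchors)
    env_end = max(e for _, e in anchors)

    # Extension step: repeat while any record pushes the envelope
    extended = True
    while extended:
        extended = False
        for s, e in rows:
            # Backward: starts before the envelope and reaches into it
            if s < env_start and e >= env_start:
                env_start, extended = s, True
            # Forward: ends after the envelope and starts within it
            if e > env_end and s <= env_end:
                env_end, extended = e, True
    return env_start, env_end

def group_members(rows, today):
    """All records lying within the envelope, including those that did not extend it."""
    env = envelope(rows, today)
    if env is None:
        return []
    return [(s, e) for s, e in rows
            if env[0] <= s <= env[1] and env[0] <= e <= env[1]]

rows = [(1, 3), (2, 5), (4, 7), (8, 16), (9, 14), (20, 30)]
env_day4 = envelope(rows, 4)
members_day10 = group_members(rows, 10)
```

The final selection against the envelope is what picks up records such as (9, 14), which are contained in the group but never extend it.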
Query Diagram
SQL
WITH
rsq (person_id, start_date, end_date, activity_name, activity_id, env_start, env_end, rn_asc,
rn_dsc, direction) AS (
SELECT person_id, start_date, end_date, activity_name, activity_id,
Min (start_date) OVER (PARTITION BY person_id) env_start,
Max (Nvl (end_date, '01-JAN-3000')) OVER (PARTITION BY person_id) env_end,
Row_Number () OVER (PARTITION BY person_id ORDER BY start_date) rn_asc,
Row_Number () OVER (PARTITION BY person_id ORDER BY Nvl (end_date, '01-JAN-3000') DESC) rn_dsc,
'E' direction
FROM activity
WHERE '&TODAY' BETWEEN start_date AND Nvl(end_date, '&TODAY')
AND person_id IN (3, 4, 5)
UNION ALL
SELECT act.person_id, act.start_date, act.end_date, act.activity_name, act.activity_id,
Min (act.start_date) OVER (PARTITION BY act.person_id) env_start,
Max (Nvl (act.end_date, '01-JAN-3000')) OVER (PARTITION BY act.person_id) env_end,
Row_Number () OVER (PARTITION BY act.person_id ORDER BY act.start_date) rn_asc,
Row_Number () OVER (PARTITION BY act.person_id ORDER BY Nvl (act.end_date, '01-JAN-3000') DESC)
rn_dsc,
CASE WHEN act.start_date < rsq.env_start THEN 'B' ELSE 'F' END
FROM rsq
JOIN activity act
ON act.person_id = rsq.person_id
AND ((
       act.start_date                    <  rsq.env_start AND
       Nvl (act.end_date, '01-JAN-3000') >= rsq.env_start AND
       rsq.rn_asc                        =  1 AND
       rsq.direction                     IN ('E', 'B')
     ) OR
     (
       Nvl (act.end_date, '01-JAN-3000') >  rsq.env_end AND
       act.start_date                    <= rsq.env_end AND
       rsq.rn_dsc                        =  1 AND
       rsq.direction                     IN ('E', 'F')
     )
    )
), env AS (
SELECT person_id, Min (env_start) env_start, Max (env_end) env_end
FROM rsq
GROUP BY person_id
)
SELECT /* RSQ_OVL '&TODAY' */ act.person_id, act.start_date, act.end_date, act.activity_name,
act.activity_id, env.env_start,
CASE WHEN env.env_end = '01-JAN-3000' THEN NULL ELSE env.env_end END env_end
FROM env
JOIN activity act
ON act.person_id = env.person_id
WHERE act.start_date BETWEEN env.env_start AND env.env_end
AND Nvl (act.end_date, '01-JAN-3000') BETWEEN env.env_start AND env.env_end
ORDER BY act.person_id, act.start_date, act.end_date
Performance Analysis
Test Data Sets
If w and d are the numeric width and depth points, records are generated for three persons as follows:
Let random(x) be a random integer between 1 and x (generated afresh on each access)
Century start date = '01-JAN-1900'
Record limit (per person) = 500 * w
Loop for record limit (per person)
Add record for person 1, as follows:
o
Depth/Width      W1    W2    W4     W8    W16    W32    W64
Total Records  1500  3000  6000  12000  24000  48000  96000
D1                1     1     1      1      7      3     11
D2                1     1     1      3      4      3     30
D4                1     1     3      1      5     10     62
D8                1     1     3      8     11     45    556
D16               1     3     6      6     47    231   7814
D32               2     7    20     19    117   6531  96000
D64               2    10    20    134   8893  48000  96000
D128              1    11    94   4778  24000  48000  96000
CPU Times
Analytics
Depth/Width    W1    W2    W4     W8    W16    W32     W64
D1           0.28  1.09  3.93  13.96  46.65  126.2  396.46
D2           0.28  1.51  3.51  12.39  40.73  96.72  160.89
D4           0.28  0.95  3.68  11.08  31.73  62.95   63.96
D8           0.29  0.96  3.17   9.13  20.02  24.98   12.04
D16          0.28  0.78  2.42   5.48   8.19   4.86    2.31
D32          0.20  0.61  1.71   2.29   1.67   1.28    7.76
D64          0.18  0.44  0.67   0.63   1.16   3.73    7.30
D128         0.12  0.22  0.25   0.58   2.03   3.93    7.33
Notes
Model
Depth/Width     W1     W2     W4     W8    W16    W32    W64
D1            0.11   0.23   0.41   0.71   1.54   2.82   5.74
D2            0.11   0.19   0.36   0.75   1.48   2.79   5.74
D4            0.12   0.18   0.37   0.74   1.40   2.77   5.87
D8            0.11   0.20   0.40   0.75   1.44   3.16   5.80
D16           0.11   0.20   0.37   0.73   1.85   2.82   6.15
D32           0.11   0.23   0.39   0.77   1.53   3.27  10.80
D64           0.08   0.19   0.40   0.75   2.00   6.39  10.32
D128          0.08   0.20   0.44   1.06   2.94   5.47  10.63
Notes
Performance for a given width is largely independent of depth, except where it starts to drop off at
the maximum depths on the wider data points
Depth/
Width
D1
D2
D4
D8
W1
W2
W4
W8
W16
W32
W64
0.03
0.01
0.02
0.01
0.01
0.03
0.02
0.03
0.03
0.02
0.03
0.03
0.05
0.05
0.01
0.05
0.09
0.04
0.06
0.14
0.10
0.09
0.16
0.57
D16
0.04
0.00
0.04
0.05
0.26
2.25
D32
0.02
0.05
0.07
0.06
0.58
D64
0.03
0.03
0.06
0.39
39.28
D128
0.03
0.05
0.16
9.86
90.30
59.10
330.4
3
325.5
1
0.38
0.71
1.30
10.03
121.2
2
1255.
15
1240.
12
1213.
75
Notes
Depth/
Width
D1
D2
D4
D8
W1
W2
W4
W8
W16
W32
W64
0.03
0.02
0.02
0.02
0.03
0.03
0.03
0.04
0.02
0.03
0.11
0.03
0.14
0.19
0.10
0.20
0.79
0.40
0.03
0.78
0.69
0.65
0.71
3.27
D16
0.03
0.06
0.10
0.19
1.27
D32
0.03
0.10
0.32
0.25
0.06
D64
0.03
0.13
0.24
0.10
68.10
9.27
113.8
3
186.7
0
D128
0.03
0.03
0.42
0.61
48.46
93.41
5.64
4.14
6.21
48.11
242.4
7
735.1
5
731.5
5
158.9
4
Notes
Performance for a given width worsens dramatically with depth, although less so than for the
unhinted query
Slice Graphs
Wide Slice
Deep Slice
Model
-----------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |       |       |       |  2794 (100)|          |
|   1 |  SORT ORDER BY        |          | 96660 | 6513K | 8416K |  2794   (1)| 00:00:34 |
|*  2 |   VIEW                |          | 96660 | 6513K |       |  1219   (1)| 00:00:15 |
|   3 |    SQL MODEL ORDERED  |          | 96660 | 3964K |       |  1219   (1)| 00:00:15 |
|   4 |     WINDOW SORT       |          | 96660 | 3964K | 5696K |  1219   (1)| 00:00:15 |
|   5 |      TABLE ACCESS FULL| ACTIVITY | 96660 | 3964K |       |   171   (1)| 00:00:03 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("GROUP_START"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"GROUP_END">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss')))
|  14 | MERGE JOIN                  |             | 1741K |  169M |       |   121K  (3)| 00:24:23 |
|  15 | TABLE ACCESS BY INDEX ROWID | ACTIVITY    | 96660 | 3964K |       | 96331   (1)| 00:19:16 |
|  16 | INDEX FULL SCAN             | ACTIVITY_N1 | 96000 |       |       |   517   (1)| 00:00:07 |
|* 17 | FILTER                      |             |       |       |       |            |          |
|* 18 | SORT JOIN                   |             | 21616 | 1266K | 3256K | 22642   (1)| 00:04:32 |
|  19 | RECURSIVE WITH PUMP         |             |       |       |       |            |          |
|  20 | MERGE JOIN                  |             | 1741K |  169M |       |   121K  (3)| 00:24:23 |
|  21 | TABLE ACCESS BY INDEX ROWID | ACTIVITY    | 96660 | 3964K |       | 96331   (1)| 00:19:16 |
|  22 | INDEX FULL SCAN             | ACTIVITY_N1 | 96000 |       |       |   517   (1)| 00:00:07 |
|* 23 | FILTER                      |             |       |       |       |            |          |
|* 24 | SORT JOIN                   |             | 21616 | 1266K | 3256K | 22642   (1)| 00:04:32 |
|  25 | RECURSIVE WITH PUMP         |             |       |       |       |            |          |
|  26 | TABLE ACCESS BY INDEX ROWID | ACTIVITY    |     1 |    42 |       |     2   (0)| 00:00:01 |
|* 27 | INDEX RANGE SCAN            | ACTIVITY_N1 |     1 |       |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
10 - access("ACTIVITY"."SYS_NC00006$">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"START_DATE"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
filter(("START_DATE"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"ACTIVITY"."SYS_NC00006$">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss')))
17 - filter(("ACT"."START_DATE"<="RSQ"."ENV_END" AND "RSQ"."ENV_END"<NVL("END_DATE",
TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
18 - access("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
filter("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
23 - filter(("ACT"."START_DATE"<"RSQ"."ENV_START" AND "RSQ"."ENV_START"<=NVL("END_DATE",
TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) AND (LNNVL("RSQ"."ENV_END"<NVL("END_DATE",
TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
OR LNNVL("ACT"."START_DATE"<="RSQ"."ENV_END") OR
LNNVL("RSQ"."RN_DSC"=1) OR (LNNVL("RSQ"."DIRECTION"='E')
AND LNNVL("RSQ"."DIRECTION"='F')))))
24 - access("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
filter("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
27 - access("ACT"."PERSON_ID"="ENV"."PERSON_ID" AND "ACT"."START_DATE">="ENV"."ENV_START" AND
"ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ACT"."START_DATE"<="ENV"."ENV_END" AND
"ENV"."ENV_END">="ACT"."SYS_NC00006$")
filter(("ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ENV"."ENV_END">="ACT"."SYS_NC00006$"))
Discussion of Results
The best method for shallow data sets is Recursive Subquery Factor. The hinted version levels
the performance off at the extremes, but this does not make it a preferred option
The Model method is largely independent of depth and, in the wide slice, performs at a level
between the two other methods, except for one intermediate data point where it is better than both
Model Solution
How It Works
The key to solving this problem using Oracle's Model clause is to realise that the solution can be
represented as simple inductions, forward for the group start dates, then backward for the group end
dates. If D is the distance parameter, s, e, S, E are the current start date, end date, group start date
and group end date, and (ps, pe, pS, pE) and (ns, ne, nS, nE) are the prior and next values, then (using
C-like terminology for brevity):
Initial, S = s; later, S = (s - pS > D) ? s : pS
Final, E = e; earlier, E = (nS > S) ? e : nE
These inductions can easily be implemented as rules within the model clause:
1. Form the basic Select, with all the table columns required, and append placeholders group_start
and group_end
2. Add the Model keyword, partitioning by person, dimensioning by analytic function Row_Number,
ordering by start date within person, and with the remaining columns as measures
3. Initialise group start and end dates to start and end dates in the measures clause
4. Define the first rule to obtain the group start date for all rows after the first as the start date, unless
the start date is no more than the distance parameter from the previous group start date, in which
case take that value. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group end date for all rows before the last as the next group
end date, unless the group start date is less than the next group start date, in which case take the
current end date. This rule must be processed in descending row order, and this is specified as it
is not the default.
6. The output from the above obtains all groups, but if necessary, can be used within an inline view
to restrict the output to certain groups only (e.g. a 'current' group)
The query diagram, SQL and functional testing use the form for obtaining all break groups, while the
performance testing uses the form for obtaining a single break group, for consistency with the second
solution method.
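The two inductions can also be traced procedurally, outside the database. The following is a minimal Python sketch, not the author's implementation: the function name burst_groups and the sample dates are hypothetical, rows are assumed to belong to one person and to be sorted by start date, and whole-day dates are assumed (Oracle dates also carry a time component).

```python
from datetime import date


def burst_groups(rows, distance_days):
    """Return (group_start, group_end) for each (start, end) row of one person.

    Hypothetical helper for illustration only. rows must be sorted by start
    date; distance_days plays the role of the distance parameter D.
    """
    n = len(rows)
    # Initialise S = s and E = e for every row (step 3).
    group_start = [start for start, _ in rows]
    group_end = [end for _, end in rows]

    # Forward induction (step 4): S = (s - pS > D) ? s : pS
    for i in range(1, n):
        start = rows[i][0]
        prior_group_start = group_start[i - 1]
        if (start - prior_group_start).days <= distance_days:
            group_start[i] = prior_group_start

    # Backward induction (step 5): E = (nS > S) ? e : nE
    for i in range(n - 2, -1, -1):
        if not group_start[i + 1] > group_start[i]:
            group_end[i] = group_end[i + 1]

    return list(zip(group_start, group_end))


if __name__ == "__main__":
    sample = [
        (date(2011, 1, 1), date(2011, 1, 2)),
        (date(2011, 1, 3), date(2011, 1, 4)),
        (date(2011, 1, 5), date(2011, 1, 6)),
        (date(2011, 1, 10), date(2011, 1, 11)),
    ]
    for row, group in zip(sample, burst_groups(sample, 3)):
        print(row, "->", group)
```

With a distance of 3 days, the first two sample rows share a group starting on 1 January, while the third row starts a new group because it begins 4 days after that group start. As in the rules above, the group end is the end date of the last row in the group.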
Query Diagram
SQL
SELECT /* MOD */ person_id, start_date, end_date, activity_name, activity_id, group_start, group_end
FROM activity
MODEL
PARTITION BY (person_id)
DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
MEASURES (start_date, end_date, activity_name, activity_id, start_date group_start, end_date group_end)
RULES (
  group_start[rn > 1] = CASE WHEN start_date[cv()] - group_start[cv()-1] > 3 THEN start_date[cv()]
                        ELSE group_start[cv()-1] END,
  group_end[ANY] ORDER BY rn DESC = PRESENTV (group_start[cv()+1],
    CASE WHEN group_start[cv()] < group_start[cv()+1] THEN end_date[cv()] ELSE group_end[cv()+1] END,
    end_date[cv()])
)
ORDER BY 1, 2, 3
Query Diagram
SQL
WITH
act AS (
SELECT person_id, start_date, end_date, activity_name, activity_id, Row_Number() OVER (PARTITION BY
person_id ORDER BY start_date) rn
FROM activity
WHERE start_date >= '&TODAY'
),
rsq (person_id, rn, start_date, end_date, activity_name, activity_id, group_start) AS (
SELECT person_id, rn, start_date, end_date, activity_name, activity_id, start_date
group_start
FROM act
WHERE rn = 1
UNION ALL
SELECT act.person_id,
act.rn,
act.start_date,
act.end_date,
act.activity_name,
act.activity_id,
rsq.group_start
FROM act
JOIN rsq
ON rsq.rn = act.rn - 1
AND rsq.person_id = act.person_id
AND act.start_date - rsq.group_start <= 3
)
SELECT /* RSQ_DST '&TODAY' */ rsq.person_id,
rsq.start_date,
rsq.end_date,
rsq.activity_name,
rsq.activity_id,
rsq.group_start,
Max (rsq.end_date) OVER (PARTITION BY rsq.person_id)
FROM rsq
ORDER BY 1, 2, 3
Performance Analysis
Test Data Sets
If w and d are the numeric width and depth points, records are generated for three persons as follows:
Let random(x) be a random integer between 1 and x (generated afresh on each access)
Record limit (per person) = 500 * w
Loop for record limit (per person)
Add record for person 1, as follows:
o
Depth/Width       W1     W2     W4     W8    W16    W32    W64    W128
Total Records > 1500   3000   6000  12000  24000  48000  96000  192000
D1                 3      3      3      5      4     11     31      12
D3                 3      4      3      9      9     21     70      23
D9                 3      3      3      7     16     38    138      71
D27                3      6      5     15     32     74    229     138
D81                4      5      9     28     64    150    494     438
D243               4      7     16     31    125    290    907    1295
D729               7     12     29     72    218    678   1959    3794
D2187             97    164    361    742   1444   2881   5785   11561
CPU Times
Model
Depth/Width       W1     W2     W4     W8    W16    W32    W64    W128
Total Records > 1500   3000   6000  12000  24000  48000  96000  192000
D1              0.07   0.16   0.29   0.58   1.19   2.28   4.73    9.39
D3              0.10   0.16   0.30   0.59   1.15   2.34   4.64    9.41
D9              0.10   0.15   0.29   0.59   1.17   2.31   4.67    9.29
D27             0.09   0.14   0.31   0.58   1.19   2.35   4.68    9.41
D81             0.09   0.16   0.31   0.58   1.17   2.36   4.69    9.38
D243            0.09   0.16   0.32   0.59   1.17   2.29   4.71    9.42
D729            0.11   0.15   0.31   0.61   1.20   2.34   4.84    9.57
D2187           0.08   0.18   0.29   0.69   1.23   2.43   5.00   10.03
Notes
Depth/
Width
Total
Records
>
D1
D3
D9
D27
D81
W1
W2
W4
W8
W16
W32
W64 W128
1500
3000
6000
1200
0
2400
0
4800
0
9600
0
19200
0
0.01
0.03
0.03
0.03
0.03
0.03
0.03
0.02
0.03
0.05
0.05
0.03
0.03
0.05
0.06
0.07
0.06
0.06
0.11
0.14
0.13
0.12
0.16
0.24
0.36
0.22
0.29
0.31
0.47
1.11
1.22
1.48
3.05
4.93
13.61
D243
0.05
0.05
0.10
0.22
0.71
2.42
D729
0.01
0.10
0.21
0.53
2.00
D2187
0.08
0.13
0.46
1.62
5.63
7.00
21.4
5
0.53
0.61
0.83
1.62
3.74
10.5
8
27.7
2
82.3
7
37.66
107.7
3
317.8
2
Notes
Slice Graphs
Wide Slice
Deep Slice
Discussion of Results
The best method for shallow data sets is Recursive Subquery Factor
The best method for deep data sets is Model, whose performance is also independent of depth
Explain Plan
---------------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |       |       |       |   522 (100)|          |
|   1 |  WINDOW SORT          |              | 19200 | 1293K | 1680K |   522   (1)| 00:00:07 |
|   2 |   WINDOW SORT         |              | 19200 | 1293K | 1680K |   522   (1)| 00:00:07 |
|   3 |    VIEW               |              | 19200 | 1293K |       |   205   (1)| 00:00:03 |
|   4 |     WINDOW SORT       |              | 19200 |  618K |  912K |   205   (1)| 00:00:03 |
|   5 |      TABLE ACCESS FULL| ACTIVITY_NOV | 19200 |  618K |       |    30   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
SQL
SELECT /* NOV_MAX */
person_id, start_date, end_date, activity_name, id,
group_start,
Max (end_date) OVER (PARTITION BY person_id, group_start) group_end
FROM (
SELECT
person_id, start_date, end_date, activity_name, activity_id id,
Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date)
group_start
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
CASE WHEN (start_date > Nvl (Lag (end_date) OVER (PARTITION BY person_id ORDER BY start_date),
start_date-1)) OR
(activity_name != Lag (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
THEN start_date END group_start,
CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date), end_date+1) >
end_date) OR
(activity_name != Lead (activity_name) OVER (PARTITION BY person_id ORDER BY
start_date)) THEN end_date END group_end
FROM activity_nov
)
)
ORDER BY person_id, start_date
Explain Plan
-----------------------------------------------------------------------------------------------
| Id  | Operation               | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |              |       |       |       |  1041 (100)|          |
|   1 |  SORT ORDER BY          |              | 19200 | 1125K | 1448K |  1041   (1)| 00:00:13 |
|   2 |   WINDOW SORT           |              | 19200 | 1125K | 1448K |  1041   (1)| 00:00:13 |
|   3 |    VIEW                 |              | 19200 | 1125K |       |   484   (1)| 00:00:06 |
|   4 |     WINDOW SORT         |              | 19200 | 1125K | 1448K |   484   (1)| 00:00:06 |
|   5 |      VIEW               |              | 19200 | 1125K |       |   205   (1)| 00:00:03 |
|   6 |       WINDOW SORT       |              | 19200 |  618K |  912K |   205   (1)| 00:00:03 |
|   7 |        TABLE ACCESS FULL| ACTIVITY_NOV | 19200 |  618K |       |    30   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
| Id  | Operation            | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |              |       |       |       |   205 (100)|          |
|   1 |  WINDOW SORT         |              | 19200 | 1293K |       |   205   (1)| 00:00:03 |
|   2 |   VIEW               |              | 19200 | 1293K |       |   205   (1)| 00:00:03 |
|   3 |    WINDOW SORT       |              | 19200 |  618K |  912K |   205   (1)| 00:00:03 |
|   4 |     TABLE ACCESS FULL| ACTIVITY_NOV | 19200 |  618K |       |    30   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Explain Plan
-------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |       |       |       |  2745 (100)|          |
|   1 |  WINDOW SORT            |          | 94880 | 6393K | 8264K |  2745   (1)| 00:00:33 |
|   2 |   WINDOW SORT           |          | 94880 | 6393K | 8264K |  2745   (1)| 00:00:33 |
|   3 |    VIEW                 |          | 94880 | 6393K |       |  1199   (1)| 00:00:15 |
|   4 |     WINDOW BUFFER       |          | 94880 | 5559K |       |  1199   (1)| 00:00:15 |
|   5 |      VIEW               |          | 94880 | 5559K |       |  1199   (1)| 00:00:15 |
|   6 |       WINDOW SORT       |          | 94880 | 3891K | 5592K |  1199   (1)| 00:00:15 |
|   7 |        TABLE ACCESS FULL| ACTIVITY | 94880 | 3891K |       |   171   (1)| 00:00:03 |
-------------------------------------------------------------------------------------------
SQL
SELECT /* ANA_MAX */
person_id, start_date, end_date, activity_name, id,
group_start,
CASE Max (Nvl(end_date, To_Date('01-JAN-3000', 'DD-MON-YY'))) OVER (PARTITION BY person_id,
group_start) WHEN To_Date('01-JAN-3000', 'DD-MON-YY') THEN NULL ELSE
Max (end_date) OVER (PARTITION BY person_id, group_start) END group_end
FROM (
SELECT /* ANA_OVL */
person_id, start_date, end_date, activity_name, activity_id id,
Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date)
group_start
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
CASE WHEN (start_date > Nvl (Lag (running_end) OVER (PARTITION BY person_id ORDER BY start_date),
start_date-1)) THEN start_date END group_start,
CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date),
running_end+1) > running_end) THEN running_end END group_end
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
Max (Nvl(end_date, '01-JAN-3000')) OVER (PARTITION BY person_id ORDER BY start_date) running_end
FROM activity
)
)
)
ORDER BY person_id, start_date
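The nested analytic steps in the query above (a running maximum end date, a flag where a new group starts, carrying the flag forward, then the latest end date per group) can be traced procedurally. The following is a minimal Python sketch under stated assumptions, not the author's code: the name overlap_groups and the constant FAR_FUTURE are hypothetical, rows belong to one person and are sorted by start date, start dates are assumed distinct, and None stands in for a NULL end date.

```python
from datetime import date

# Stand-in for NULL end dates, mirroring the '01-JAN-3000' value in the SQL.
FAR_FUTURE = date(3000, 1, 1)


def overlap_groups(rows):
    """Return (group_start, group_end) for each (start, end) row of one person.

    Hypothetical helper for illustration only. rows must be sorted by start
    date; an end of None means the range is open-ended.
    """
    group_starts = []
    running_end = None      # latest end date seen so far (the running_end column)
    current_group = None    # group start carried forward (Last_Value IGNORE NULLS)
    for start, end in rows:
        # A new group starts on the first row, or when this row begins
        # after everything seen so far has ended.
        if running_end is None or start > running_end:
            current_group = start
        effective_end = FAR_FUTURE if end is None else end
        if running_end is None or effective_end > running_end:
            running_end = effective_end
        group_starts.append(current_group)

    # Latest end date per group; an open-ended member makes the group open-ended.
    latest_end = {}
    for group_start, (_, end) in zip(group_starts, rows):
        effective_end = FAR_FUTURE if end is None else end
        previous = latest_end.get(group_start, effective_end)
        latest_end[group_start] = max(previous, effective_end)

    return [
        (g, None if latest_end[g] == FAR_FUTURE else latest_end[g])
        for g in group_starts
    ]
```

The running maximum matters when a long range contains shorter ones: comparing each start date only with the previous row's end date would split such a group too early.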
Explain Plan
-------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |       |       |       |  2745 (100)|          |
|   1 |  WINDOW SORT            |          | 94880 | 6393K | 8264K |  2745   (1)| 00:00:33 |
|   2 |   WINDOW SORT           |          | 94880 | 6393K | 8264K |  2745   (1)| 00:00:33 |
|   3 |    VIEW                 |          | 94880 | 6393K |       |  1199   (1)| 00:00:15 |
|   4 |     WINDOW BUFFER       |          | 94880 | 5559K |       |  1199   (1)| 00:00:15 |
|   5 |      VIEW               |          | 94880 | 5559K |       |  1199   (1)| 00:00:15 |
|   6 |       WINDOW SORT       |          | 94880 | 3891K | 5592K |  1199   (1)| 00:00:15 |
|   7 |        TABLE ACCESS FULL| ACTIVITY | 94880 | 3891K |       |   171   (1)| 00:00:03 |
-------------------------------------------------------------------------------------------
Performance Analysis
Problem 1: Contiguous Ranges
Group Sizes by Depth
The output consists of all the records (76,000) and the table below gives the average group sizes, which
are written to the log by a query in the data setup program.
Depth         D1   D3   D9   D27   D81   D243   D729   D2187
Group Size     1    2    5    14    41    125    356     985
CPU Times
Depth ->          D1      D3      D9    D27    D81   D243   D729  D2187
Group Size ->      1       2       5     14     41    125    356    985
ANA           694.71  340.58  130.57  50.47  21.28  10.87   8.63   7.74
NOF             7.05    6.63    6.88   7.24   6.53   6.92   7.12   7.27
MAX             5.78    5.91    5.46   4.99   5.32   5.60   5.84   5.42
MOD             7.78    7.48    7.62   7.55   7.78   7.53   8.08   7.45
Depth         D1   D3   D9   D27   D81   D243   D729    D2187
Group Size     4    6    9    33   120   2602   24615   32000
CPU Times
Depth ->          D1      D3      D9    D27    D81   D243    D729  D2187
Group Size ->      4       6       9     33    120   2602   24615  32000
ANA           278.09  180.97  120.87  36.26   16.6   9.13    9.37   8.40
NOF             9.18    8.70    8.89   9.53   8.76   8.83    9.15   8.42
MAX             9.19    8.95    8.95   8.83   8.63   8.69    8.95   8.60
MOD            11.76   12.12   12.33  11.37  11.61  11.64   11.97  11.47
Conclusions
Solution methods have been presented for a number of range-based SQL grouping problems, including
relatively new techniques from Oracle Database 10.1 and 11.2. It has been shown that the best method
depends not just on the size of the data set, but also on its shape. A few summary points may be made in
relation to these problems:
The Model clause tends to produce relatively simple SQL that performs consistently across data sets
The new Recursive Subquery Factor feature can be extremely efficient in cases where the records in the solution set are much fewer than the total, but only works for a single group
Solutions using analytic functions are slightly more efficient than Model solutions where available, but an important performance glitch in certain cases has been identified and needs to be worked around
SQL developers interested in performance need to be proficient in all three techniques (most are familiar only with the older analytic functions technique, available since Oracle v8)
Performance testing can be more effective when executed by automated methods across multidimensional domains
References
REF     Document / Details
REF-1   Question by Jayadev on Tom Kyte's Oracle database forum
REF-2   BP Furey, June 2011
REF-3   http://www.oracle.com/pls/db112