Jonathan Lewis
www.jlcomp.demon.co.uk
Who am I
Independent Consultant.
www.jlcomp.demon.co.uk
A Puzzle
Basic Costs
Correcting Oracle's assumptions
Oracle 9 learns
(Join Mechanics - time permitting)
Q and A
create table t2 as
select
mod(rownum,200) n1,
mod(rownum,200) n2,
rpad('x',100) v1
from all_objects where rownum <= 3000;
We construct two sets of data with identical content, although we do use two different mathematical methods to get 15 rows each for 200 different values.
© Jonathan Lewis 2001 - 2003, NoCOUG 2003
A puzzle - indexed
USER_TABLES
TABLE_NAME  BLOCKS   NUM_ROWS AVG_ROW_LEN
----------- ------ ---------- -----------
T1              96       3000         111
T2              96       3000         111
USER_TAB_COLUMNS
TAB COL  LOW_VALUE HIGH_VALUE NUM_DISTINCT
--- ---- --------- ---------- ------------
T1  N1   80        C20264              200
T2  N1   80        C20264              200
We can check that statistics like number of rows, column values and column counts are identical. The data content is identical across the two tables.
A puzzle - the problem
We now run exactly the same query against the two sets of data - with autotrace switched on - and find that the execution plans are different.
A puzzle - force it
Why has Oracle ignored the index on T2? Put in the hint(s) to force the index, and see if we get any clues. The tablescan has the lower cost!
A puzzle - the detail
select
table_name tab,
num_rows num_rows,
avg_leaf_blocks_per_key l_blocks,
avg_data_blocks_per_key d_blocks,
clustering_factor cl_fac
from user_indexes;
Why is the tablescan cheaper? We look at the data scattering, rather than the data content, and find the answer. The clustering is different.
A puzzle - the difference
T1 blocks (left)                   T2 blocks (right)
0, 0, 0, 0, ... 1, 1, 1, ...       0, 1, 2, 3, 4, 5, 6, 7, ...
5, 5, 5, 5, ... 6, 6, 6, 6, ...    45, 46, 47, 48, 49, ...
10, 10, 10, ... 11, 11, ...        103, 104, 105, ...
...                                ... 198, 199, 0, 1, 2, ...
45, 45, 45, 45, 45, ...            40, 41, 42, 43, 44, 45, ...
46, 46, ...
...                                ...
131, 131, 131, ...                 ...
...                                ... 44, 45, 46, ...
... 199, 199, 199, 199             ... 196, 197, 198, 199
The data on the left shows the effect of the trunc() function, the data on the right shows the mod() effect. The statistics describe the data perfectly.
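The difference in scattering can be reproduced outside the database. The sketch below is only an approximation: it assumes T1 was built with something like trunc((rownum-1)/15) (the T1 script is not shown here), and it assumes about 32 rows per block, which is roughly 3000 rows spread over 96 blocks. The function names are hypothetical. It walks the rows in index order and counts how often the block changes, which is the idea behind the clustering_factor.

```python
from collections import Counter

ROWS = 3000
ROWS_PER_BLOCK = 32  # assumption: roughly 3000 rows / 96 blocks


def build(value_of):
    """Return (value, block_number) pairs in physical (insert) order."""
    return [(value_of(r), (r - 1) // ROWS_PER_BLOCK) for r in range(1, ROWS + 1)]


def clustering_factor(rows):
    """Walk the rows in index order (value, then physical position)
    and count how many times the table block changes."""
    ordered = sorted(range(len(rows)), key=lambda i: (rows[i][0], i))
    changes, last_block = 0, None
    for i in ordered:
        block = rows[i][1]
        if block != last_block:
            changes += 1
            last_block = block
    return changes


t1 = build(lambda r: (r - 1) // 15)  # assumed trunc() expression for T1
t2 = build(lambda r: r % 200)        # mod(rownum,200), as in the T2 script

t1_counts = Counter(value for value, _ in t1)
t2_counts = Counter(value for value, _ in t2)
cf_t1 = clustering_factor(t1)
cf_t2 = clustering_factor(t2)

print("rows per value:", set(t1_counts.values()), set(t2_counts.values()))
print("clustering factor T1:", cf_t1, " T2:", cf_t2)
```

Under these assumptions T1's figure is close to the number of blocks in the table, while T2's is equal to the number of rows: same content, very different scattering.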
The arithmetic
T2 by index
one index block, 15 data blocks = 16
T2 by scan
96 blocks / 8 (multiblock read) = 12
(this is a first approximation)
T1 by index
one index block, one data block = 2
Silly assumption 1: Every logical request turns into a physical read.
Silly assumption 2: A multiblock read is just as fast as a single block read.
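The first-approximation arithmetic on this slide can be written out as a few lines of code. This is only a sketch of the slide's own sums; the function names are hypothetical, and the real optimizer does more than this.

```python
import math


def index_cost(index_blocks, data_blocks):
    """One read per index block visited plus one per table block visited."""
    return index_blocks + data_blocks


def tablescan_cost(table_blocks, multiblock_read_count):
    """Number of multiblock reads needed to cover the table."""
    return math.ceil(table_blocks / multiblock_read_count)


t2_index = index_cost(1, 15)   # one index block, 15 data blocks
t2_scan = tablescan_cost(96, 8)  # 96 blocks, 8 blocks per read
t1_index = index_cost(1, 1)    # one index block, one data block

print("T2 by index:", t2_index)
print("T2 by scan :", t2_scan)
print("T1 by index:", t1_index)
```

For T2 the scan comes out cheaper than the index, for T1 the index wins, which is the pair of plans seen in the puzzle.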
Multiblock Read
Actual   Adjusted
     4      4.175
     8      6.589
    16     10.398
    32     16.409
    64     25.895
   128     40.865
96 / 6.589 = 14.57
(and add 1 in 9.2)
EVENT AVERAGE_WAIT
db file sequential read 1.05
db file scattered read 3.72
Tim Gorman (www.evdbt.com) - The search for intelligent life in the CBO.
*** but see Garry Robinson: http://www.oracleadvice.com/Tips/optind.htm
The really nice thing about this is that we can set a genuine and realistic cost factor by checking recent, or localised, history (snapshot v$session_event).
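As a sketch of the two calculations on this slide: the adjusted tablescan cost uses the adjusted multiblock read figures from the table, and the cost factor is taken here as the ratio of the two average waits, expressed as a percentage. Treating that ratio as a candidate value for optimizer_index_cost_adj is an assumption about how the wait figures are meant to be used; the function names are hypothetical.

```python
# Adjusted multiblock read counts, copied from the slide.
ADJUSTED_MBRC = {4: 4.175, 8: 6.589, 16: 10.398, 32: 16.409, 64: 25.895, 128: 40.865}


def adjusted_scan_cost(blocks, actual_mbrc):
    """Tablescan cost using the adjusted, not the actual, multiblock read count."""
    return blocks / ADJUSTED_MBRC[actual_mbrc]


def cost_factor(single_block_wait, multiblock_wait):
    """Single block read time as a percentage of multiblock read time."""
    return round(100 * single_block_wait / multiblock_wait)


scan_cost = adjusted_scan_cost(96, 8)
suggested_adj = cost_factor(1.05, 3.72)  # sequential read vs scattered read

print("adjusted scan cost:", round(scan_cost, 2))
print("suggested cost factor:", suggested_adj)
```

With the wait times shown, the factor comes out at 28, i.e. a single block read is being charged at a little over a quarter of a multiblock read.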
Join cost-adjustment
We can even see the effect of this price fixing in joins. Unhinted, or unfixed, the optimizer chooses a hash join as the cheapest way to join our two tables.
Hash Join (1)
Hashed
The first table is hashed in memory, the second table is used to probe the hash (build) table for matches. In simple cases the cost is easy to calculate.
Force a nested loop
select
/*+ ordered use_nl(t1) index(t1) */
t2.n1, t1.n2
from t2,t1
where t2.n2 = 45
and t2.n1 = t1.n1;
The nested loop algorithm is: for each row in the outer table, use the value in that row to access the inner table - hence the simple formula.
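The algorithm in the note can be sketched in a few lines. The rows below are made-up sample data (not the contents of T1 and T2), a dictionary stands in for the index on t1.n1, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Stand-in for an index: map each key value to the rows that hold it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def nested_loop_join(outer_rows, inner_index, join_key):
    """For each outer row, use its join value to probe the inner table."""
    for outer in outer_rows:
        for inner in inner_index.get(outer[join_key], []):
            yield outer, inner


# Made-up sample rows, shaped like the query: t2 drives, t1 is probed by n1.
t2 = [{"n1": 1, "n2": 45}, {"n1": 2, "n2": 45}, {"n1": 3, "n2": 7}]
t1 = [{"n1": 1, "n2": 10}, {"n1": 1, "n2": 11}, {"n1": 2, "n2": 20}]

t1_by_n1 = build_index(t1, "n1")
driving = [row for row in t2 if row["n2"] == 45]  # where t2.n2 = 45
result = [(o["n1"], i["n2"]) for o, i in nested_loop_join(driving, t1_by_n1, "n1")]

print(result)
```

The work done is one pass to get the driving rows plus one probe of the inner table per driving row, which is why the cost formula is so simple.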
Nested Loop
T1
T2
The basic arithmetic of the nested loop join is visible in the picture. We do three indexed accesses into T2, but need the three driving rows from T1 first.
Nested Loops - recosted
alter session set OPTIMIZER_INDEX_COST_ADJ = 50;
NESTED LOOPS (Cost=30 Card=225)
  TABLE ACCESS(FULL) OF T2 (Cost=15, Card=15)
  TABLE ACCESS(BY ROWID) OF T1(Cost=1, Card=3000)
    INDEX(RANGE SCAN) OF T_I1(NON-UNIQUE)(Cost=1)
What happens to the cost when we tell Oracle that single block reads cost half as much as it would otherwise charge?
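The plan figures can be checked against the usual nested loop formula: cost of the outer table, plus one inner access for every row from the outer table. In the sketch below the adjusted figures (15, 15 and 1) are read off the plan above; the unadjusted inner cost of 2 is an assumption, taken from the earlier "T1 by index" arithmetic (one index block, one data block). The function name is hypothetical.

```python
def nested_loop_cost(outer_cost, outer_rows, inner_cost_per_probe):
    """Cost of the outer access plus one inner access per outer row."""
    return outer_cost + outer_rows * inner_cost_per_probe


# Figures from the plan with optimizer_index_cost_adj = 50.
adjusted = nested_loop_cost(outer_cost=15, outer_rows=15, inner_cost_per_probe=1)

# Assumed unadjusted inner cost: one index block + one data block = 2.
unadjusted = nested_loop_cost(outer_cost=15, outer_rows=15, inner_cost_per_probe=2)

print("adjusted  :", adjusted)
print("unadjusted:", unadjusted)
```

Halving the charge for single block reads only touches the indexed part of the join; the tablescan of T2 keeps its cost.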
Index Caching (NL only)
Basic nested loop cost (hinted)
We can improve on silly assumption 1 (every logical I/O is also a physical I/O). Index blocks are often cached. So tell the optimizer how good our cache is.
Simplifications
Basic index access cost (see Wolfgang Breitling's paper to IOUG-A 2002):
    blevel +
    selectivity * leaf_blocks +
    selectivity * clustering_factor
With the adjustments:
    (
        (blevel + selectivity * leaf_blocks) *
            (1 - optimizer_index_caching/100)
        +
        selectivity * clustering_factor             Table bit
    ) *
    optimizer_index_cost_adj / 100
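The simplified formula can be turned into a small function for experimenting with the two parameters. This is a sketch, not the optimizer's code: it assumes the caching factor applies to the index part only (blevel and leaf blocks) and the cost adjustment to the whole expression, it ignores Oracle's rounding, and the statistics used in the example are hypothetical values chosen to give round numbers.

```python
def index_access_cost(blevel, leaf_blocks, clustering_factor, selectivity,
                      optimizer_index_caching=0, optimizer_index_cost_adj=100):
    """Simplified index access cost with the two adjustment parameters."""
    index_part = blevel + selectivity * leaf_blocks
    table_part = selectivity * clustering_factor
    cached_index_part = index_part * (1 - optimizer_index_caching / 100)
    return (cached_index_part + table_part) * optimizer_index_cost_adj / 100


# Hypothetical statistics: 200 leaf blocks, clustering factor 3000, 200 distinct values.
stats = dict(blevel=1, leaf_blocks=200, clustering_factor=3000, selectivity=1 / 200)

basic = index_access_cost(**stats)
with_caching = index_access_cost(**stats, optimizer_index_caching=50)
with_both = index_access_cost(**stats, optimizer_index_caching=50,
                              optimizer_index_cost_adj=50)

print(round(basic, 2), round(with_caching, 2), round(with_both, 2))
```

With the defaults (caching 0, cost adjustment 100) the function reduces to the basic three-term formula.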
execute dbms_stats.gather_system_stats('start')
-- run a representative workload between the two calls
execute dbms_stats.gather_system_stats('stop')
But in version 9 you need the 'fudge factors' less (you could still use them as indicators of caching) - instead, you let Oracle learn about your hardware.
Conclusions
Merge Merge
Sort Sort
To next step -
e.g. order by
Once the two sets are in order, they can be shuffled together. The shuffling can be quick - the sorting may be the most expensive bit.
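A minimal sketch of the sort-then-shuffle idea, using made-up rows. The function name and the sample data are hypothetical; the point is only that the merge step is a single pass over two sorted inputs, while the two sorts do the heavy lifting.

```python
def sort_merge_join(left, right, key):
    """Sort both inputs, then shuffle them together in one pass."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows sharing this key value.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left-hand row with that key against the run.
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out


pairs = sort_merge_join(
    [(3, "c"), (1, "a"), (2, "b"), (2, "bb")],
    [(2, "x"), (4, "z"), (2, "y"), (1, "w")],
    key=lambda row: row[0],
)
print(pairs)
```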
In-memory sort
PGA
sort_area_retained_size sort_area_retained_size
To disc
sort_area_size - sort_area_retained_size
In a merge join, even if the first sort completes in memory, it will still dump the excess over sort_area_retained_size to disc (and so will the second sort).
Big Sorts
A one-pass sort. The data has been read, sorted, and dumped to disc in chunks, then re-read once to be merged into order, and dumped again.
Huge Sorts
Sort
Merge 1 Merge 2
Merge 3
Multipass sort. After sorting the data in chunks, Oracle was unable to re-read the top of every chunk simultaneously, so we have multiple merge passes.
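The step from a one-pass to a multipass sort can be sketched as simple counting. The model below is an assumption, not Oracle's actual algorithm: the data is dumped as a number of sorted chunks, and at most merge_width chunks (a hypothetical name) can be re-read simultaneously in each merge pass.

```python
import math


def merge_passes(chunk_count, merge_width):
    """Number of merge passes needed to reduce the sorted chunks to one."""
    if merge_width < 2:
        raise ValueError("merge_width must be at least 2")
    passes = 0
    while chunk_count > 1:
        chunk_count = math.ceil(chunk_count / merge_width)
        passes += 1
    return passes


def classify(chunk_count, merge_width):
    """Label the sort by the number of merge passes it needs."""
    passes = merge_passes(chunk_count, merge_width)
    if passes == 0:
        return "in memory"
    return "one-pass" if passes == 1 else "multipass"


for chunks in (1, 4, 8, 17):
    print(chunks, "chunks ->", merge_passes(chunks, 4), "passes,", classify(chunks, 4))
```

As soon as there are more chunks than can be merged at once, the intermediate results have to be written out and merged again, which is where the extra I/O comes from.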
Hash Join (1)
Hashed
The first table is hashed in memory, the second table is used to probe the hash (build) table for matches. In simple cases the cost is easy to calculate.
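The simple, all-in-memory case can be sketched as follows. The rows are made-up sample data and the function name is hypothetical; a Python dictionary plays the part of the in-memory hash (build) table.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Hash the first input in memory, then probe it with the second."""
    table = defaultdict(list)
    for row in build_rows:          # build phase: one pass over the first table
        table[key(row)].append(row)
    for row in probe_rows:          # probe phase: one pass over the second table
        for match in table.get(key(row), ()):
            yield match, row


build = [(1, "a"), (2, "b"), (2, "bb")]
probe = [(2, "x"), (3, "y"), (1, "z")]
joined = list(hash_join(build, probe, key=lambda row: row[0]))
print(joined)
```

Each table is read once, which is why the cost is easy to calculate in the simple case.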
Hash Join (2)
(Diagram labels: Bit mapped; each of the two inputs is "Hashed and partitioned", with partitions marked "Dump to disc")
And if things go really wrong (bad statistics) Oracle uses partitions which are too large - and the probe (secondary) partitions are re-read many times.
Version 9 approach
pga_aggregate_target = 500M
workarea_size_policy = auto
v$sysstat
Sorts, hashes, bitmap creates (v.9)
workarea executions - optimal
The job completed in memory - perfect.
workarea executions - onepass
The job required a dump to disc and a single re-read.
workarea executions - multipass
Data was dumped to disc and re-read more than once.