You are on page 1of 11

SQL Query Optimization -1-

******** Nested loop information *******************

Performance tuning focuses on writing efficent SQL, allocating computer resources and analyzing wait events
and contention in the system. The design approach to a database is critical to ensuring the best performance
from a database, here are the steps when designing a db

1. Design the application correctly


2. Tune the application SQL code
3. Tune memory
4. Tune I/O
5. Tune contention and other issues

There are two approaches to preformance tuning

• pro-active - try to see the problem before it happens i.e warning alert on a tablespace threshold before it
becomes a problem
• reactive - the problem has already occurred and action is required

pro-active means that it does not impact the end user where as reactive is generally when the user tells you
he/she has a problem. OEM provides many graphs that can show you trends about a particular area of the
database for more information see OEM.doc.

One rule is first try and fix the SQL code, especially if it is old code, no matter how you tune the database if the
code is badly written it won't make a different, however oracle does provide some parameters that can help with
poorly written code.

Oracle Query process

Query processing requires the transformation of your SQL query into an efficient execution plan, Query
optimization requires that the best execution plan is the one executed, the goal is to use the least amount of
resources possible (CPU and I/O), the more resources a query uses the more impact it has on the general
performance of the database.

A users query will go through 3 phases

At this tage the syntax ( decomposed into a relational algebra that's analyzed to see if its
syntactically correct) and semantics (make sure that tables and columns exist and you have
permissions to access the objects) are checked, at the end you have a parse tree which represents
the query's structure.
Parsing
The statement is normalized so that it can be processed more efficiently, once all the checks have
completed it is considered a valid parse tree and is sent to the logical query plan generation stage.

All this is done in the library cache of the SGA


Optimization The optimizer is used at this stage, which is a cost-based optimizer (CBO - see below for more
info), this chooses the best access method to retrieve the requested data. It uses statistics and any
hints specified in the SQL query, the CBO produces an optimal execution plan for the SQL
SQL Query Optimization -2-

staement.

The optimization process can be divided in two parts

Query rewrite phase

The prase tree is converted into an abstract logical query plan, the various nodes and branches are
replaced by operators of relational algebra.

Execution plan generation phase

Oracle transforms the logical query plan into a physical query plan, the physical query or execution
plan takes into account the following factors

• The various operations (joins) to be performed during the query


• The order in which the operations are performed
• The algorithm to be used for performing each operation
• The best way to retrieve data from disk or memory
• The best way to pass data from operation to another during the query

The optimizer may well come up with multiple physical plans, all of which are potential execution
plans. The optimizer then chooses among them by estimating the costs of each possible plan (based
on table and index statistics) and selecting the plan with the lowest cost. This is called the cost-
based query optimization (CBO).
Execution The final stage is to execute the physical query plan that was selected y the CBO

Optimization Mode

In previous versions of Oracle you had two choices RBO (rule-based optimizer) or CBO (cost-based optimizer),
RBO is available in Oracle 10g but is a deprecated product the CBO is the preferred and default method.

RBO used a heuristic method to select among several alternative access paths with the help of certain
Rule-Base rules. All paths were ranked and the lowest was choosen i.e using the ROWID was a cost of 1, a
Optimizer full table scan was a cost of 19.
The CBO uses statistics on tables and indexes, the order of tables and columns in the SQL
CBO statements, available indexes, and any user-supplied hints to pick the most efficient way to access
Cost-based the data requested.
Optimizer
CBO almost always performs better than RBO.

Cost-Based Optimizer (CBO)

The optimizers job is to find out the optimal or best plan to execute your DML statements (select, insert, update
and delete). The CBO uses statistics on tables and indexes, the order of tables and columns in the SQL
statements, available indexes, and any user-supplied hints to pick the most efficient way to access the data
SQL Query Optimization -3-

requested, the CBO uses the least costly method (cost being CPU and I/O which is the most expensive) to get at
the data.

Oracle 10g has a automatic job that collects statistics which is used the the CBO, the gather_stats_job is run
between 10pm-6am everyday. The job collects statistics on all tables that either have no statistics or stale
statistics (more than 10% of data has changed in the table since the last collection).

select last_analyzed, table_name, owner, num_rows, sample_size from dba_tables order by


last_analyzed;
Check if statistics are
being collected Note: sample_size can vary from 1 to 100%, the greater the sample_size the better statistics
are obtained, however a general 5 to 20% should be enough in most cases especially in very
large tables

Once the statistics have been gathered a number of columns will have been updated in a table or index

• Number of rows and number of rows per block


• Average row length
• Total number of database blocks in a table
• Number of levels in each index
• Number of leaf blocks in each index
• Number of distinct values in each column of a table
• Data distribution histograms
• Number if distinct index keys
• Cardinality (the number of columns with simular values for each column)
• Minimum and Maximum values for each column
• System statistics, which include I/O characteristics of your system and CPU statistics which include
CPU speed and other related statistics.

The CBO will use all of the above statistics and other statistics (CPU, I/O and O/S statistics) to help with
finding the optimal plan, using the above statistics the CBO can estimate costs of individual operations. The less
number of statistics collected will result in less physical plans the CBO can come up with, thus the less number
of choices the CBO can make.

Column statistics select column_name, num_distinct fro dba_col_statistics where table_name = 'PERSONNEL';

Optimizer Configuration

The mode level can be set to a number of levels, depending on what your application requirements are you can
start sending data to the user quickly, if the user requires all the data to be seen altogether then the ALL_ROWS
option is best but if the user does not care about getting all the data only getting something fast then
FIRST_ROWS may be an option, also by using hints you can force the optimizer to take a choose path see
Oracle sql optimization.doc under Hintsbelow for more details

Mode Levels
ALL_ROWS (default) The optimizer will process all rows before outputing data
SQL Query Optimization -4-

FIRST_ROWS_n The optimizer will process n rows before outputing data


FIRST_ROWS Once data is available start outputing immediately.
Setting Level
Mode Level (system
alter system set optimizer_mode = ALL_ROWS;
wide)
Mode Level (session
alter session set optimizer_goal = first_rows_10;
level)

CBO Drawbacks

The CBO may not use the same plan for the same SQL statement everytime, you need to watch out for the
following

• Execution plans can change across oracle versions, to make sure that the same plan is used then you
need to use stored outlines
• The application developer may know of a better plan than the CBO, in which case hints can be used to
direct or help the CBO.
• The CBO depends enormously on current statistics, if the statistics are absent or outdated the optimizer
can make poor decisions

Manually Collect Statistics

To collect statistics manually you need to use the package dbms_stats, you can sample row or block and
depending on the size of the table

dbms_stats.gather_database_stats (
estimate_percent => NULL,
method_opt => 'auto',
granularity => 'all,
cascade => 'true',
option => 'gather_auto'
);

Note:
Database esimate - refers to the percentage of rows that should be used to estimate the statistics, null
means all rows
method_opt - can specify several things , like include histograms, AUTO means collect
histograms
granularity - only applies ti tables, ALL means collect statistics for subpartitions, partitions, etc
cascade - collect all statistics for both table and index
option - gather = collect statistics for all objects even it the object has stale or fresh statistics
gather_auto = Oracle will decide on what to collect
gather_empty = collect only for objects that do not have any statistics
gather_stale = collect only for objects with stale statistics
Schema dbms_stats.gather_schema_stats( ownname => 'VALLEP');
SQL Query Optimization -5-

Table dbms_stats.gather_table_stats( 'hr', 'employees');


Index dbms_stats.gather_index_stats( 'hr', 'employ_idx');
dbms_stats.create_stat_table(ownname => 'vallep', stat_tab => 'stats_table', tblspace =>
'stat_tbs');

dbms_stats.gather_system.stats('start'); (wait a while then run stop)


System dbms_stats.gather_system.stats('stop');

dbms_stats.gather_system_stats(interval => 720, stat_tab => 'stats_table', stat_id => 'OLTP');

Note: you can use the create_stat_table to transfer system stats to other databases.

You can also collect statistics by using the analyze command

analyze table employee compute statistics;


Analyze a table
Note: additonal columns are filled in when this is run i.e average_row_size,
last_analyzed, size_of_table (in data blocks)
Useful Tables
DBA_TABLES look at the last_analyzed column to make stats are being collected
DBA_OBJECT_TABLES look at the last_analyzed column to make stats are being collected

Dynamic Sampling

Dynamic sampling is controlled by the optimizer_dynamic_sampling parameter which accepts values from 0
(off) to 10 (agressive sampling) with 2 as the default.

Dynamic sampling can be a benefit

• The sample time is small compared to the overall query execution time
• Results in better performing query
• Query may be executed multiple times

Configuration
Instance alter system set optimizer_dynamic_sampling = 2;
Session alter session set optimizer_dynamic_sampling = 5;
Usage
select /*+ dynamic_sampling(employees 5) */
emp_id, fname, lname, job, sal
SQL Hint
from employees
where dept_num = 50;

Efficient SQL
SQL Query Optimization -6-

To elimate the number of rows that the optimizer has to retrieve we use where clauses, however the optimzer
may not end up writing the best execution plan, you have better knowledge of the application than the optimizer
and with this knowledge you can use hints which force the optimizer to use that knowledge. Using where
clauses efficiently will reduce the I/O, thus increasing performance, the optimizer will use statistics (row count)
to determine how to create the best plan, if no statistics are available then it has no option but to perform a full
table scan.

Sometimes the optimizer will not use a index, even if you know one exists, the possible reason for these could
be any of the following

• Views in a query sometimes prevent the use of indexes


• If you think heavy data skew is in the table, using histograms will help the optimizer to privide more
accurate representations of the data distribution in the table.
• If for some reason it still refuses to use the index you can force it by using a hint.

Try using the where clause instead of the having clause, as the having clause incurs the additional overhead of
sorting and summing.

Hints

Sometimes the CBO does not know best and needs help to point it in the right direction, this is where hints help
by forcing the optimizer to take a path that you have knowledge about, hints can alter the join methods, join
order or even access paths. The are many hint options you would need to see the Oracle documentation for a full
listing but i have listed some of the more common one

• all_rows - optimize throughput, not optimizer the response time of the statement
• first_rows - return the first rows quickly
• full - perform a full table scan
• ordered - force the join order of the tables in the query
• index - force the use of indexes
• index_ffs - force a full scan of a index, can use parallelization for this to improve performance.

select /*+ FULL (employees) */ ... from employees


select /*+ INDEX_FFS (employees) */ ... from employees
Examples select /*+ USE_NL (employees, dept) */ .. from elpoyees, dept

Note: see oracle documentation for full listing of all the hints

Join Methods

You should choose a join method based on how many rows you expect to be returned from the join

cartesain joins are normally a result of not using a where clause, it basically joins every row in all
Cartesian Joins tables, so for an example if one table as 50,000 rows and the other table has 100 rows then a
cartesian join of the tables would be 50,000 * 100 = 500,000 rows
If you are join tables with few than 10,000 rows then a Nested Loop would be the way to go, see
Nested Loops
hints section above for more details on how to use a Nested Loop
SQL Query Optimization -7-

Use if the join will produce large subsets of data or a substantial proportion of a table is going to be
Hash Join
joined.
If the tables in the join are being joined with an inequality condition (not an equi join), then use a
Merge Join
merge join
Basically used for data warehouses, do not use if running OLTP system, they are used with low
Bitmap Join
cardinality columns (columns having low distinct values - gender, marital_status, relation, etc)

To see more information on Oracle join methods.doc

Index Strategy

An index is a data structure that takes the value of one or more columns of a table (the key) and returns all rows
(or the requested columns in that row) with that value of the column quickly. The efficiency of the index is that
it lets you find the necessary rows without having to perform a full table scan, this leads to few I/O's.

As a general rule you so only use a index if you select about 10-15% of a table, when using a index this
prevents you from performing a full table scan. When you want to retrieve a row oracle has to perform a lookup
in the index to obtain the ROWID, using this ROWID it can then retrieve the requested row. Using to many
indexes can degrade performance as when a table is updated the index has to be updated as well, so only index
when the trade-off is better, in other words if you use a OLTP database then limit the amount of indexes that you
use. Here are some guidelines on when you should index

• Index columns with high selectivity (few rows with identical values)
• Index all important foreign keys
• Index all predicate columns
• Index columns used in table joins
• Try to avoid indexing columns that consist of long character strings
• Try to use index-only query plans, in otherwords try to retrieve the query using the index only.
• Use secondary indexes on columns frequently involved in order by and group by as well as sorting
operations union and distinct

this is the default index type, you will probably use it for almost all the indexes in a typical OLTP
B-Tree index
application.
This is ideal for column data that has a low cardinality (few distinct values). Do not use if you
Bitmap Index
have heavy DML going on the table, this is ideal in a data warehouse environment.
IOT IOT's place all the table data in its primary key index, thus emliminating the need for a separate
(index organized index. The data is sorted and rows are stored in primary key order. This type of index will save
table) storage space when compared to a normal B-Tree index.
Concatenated or Composite indexes are indexes that include more than one column and are
Concatenated
excellent for improving selectivity of the where predicates. You are trying to elimate the
Index
optimizer performing full table scans thus reducing I/O.
are efficient in frequently used statements that involve functions or complex expression on
Funcation-Based
columns, they can quickly return the computed value of the function or expression directly from
Index
the index.
Reverse-Key These are ideal when you have a heavy insert application, the reverse key index provide an
SQL Query Optimization -8-

Index efficient way to distribute the index values more evenly and thus improve performance.
each partition can be distributed across disks thus increaing performance, also each partition can
Partioned Index be manitained separately (backup, remove) without affecting any other partion. You can alos use
parallel query options to improve performance.

For more information on Oracle indexes.doc

Bind Variables

In a ideal world you should parse just once and use the same parsed version of the statement for repeated
executions, this operation is much more expensive than actually executing the statement. In order you use the
same execution plan for the same query the SQL statement must be identical, you use bind variables to achive
this.

The way oracle knows if the query is identical is that the statement is hashed and a hash key stored, if this key
does not match then the statement is not the same, so even you add a space or replace a letter with a upper case
letter the hash key will be different.

There is a parameter you can set to force statements that fail to use bind variables to do so

alter system set cursor_sharing = force;


alter system set cursor_sharing = similar;
alter system set cursor_sharing = extact;
Force bind variable
Note:
usage
force - forces the use of bind variables, oracle will substitute the hard coded variable
similar - force the use of a bind variable only if oracle thinks it will not adversely affects
optimization.
extact - only use the already parsed query if it is identical (hash key matches)

Materialized Views

If you are dealing with very large amounts of data, you should consider using materialized views to improve
response time. Materialized views are objects with summary data from the underlaying table. Expensive table
joins can be done beforehand and saved in the materialized view. You can use the package dbms_olap package
to get recommendations on ideal materialized views.

For more information about materialized Oracle Using Views.doc

Stored Outlines

The CBO doesn't always use the same execution strategy, changes within the database can force the CBO to
change its plan. You can force oracle to use the same plan by using the plan stability feature stored outlines to
preserve the current execution plans, even if the statistics and optimizer mode changes. The only catch is that
the SQL statement must be identical if you wish to use the stored outline.
SQL Query Optimization -9-

Stored outlines are use when you plan to migrate from one oracle version to another, thus you can cut risks and
preserve the applications present performance via stored outlines. Outlines ensure that the execution paths the
queries used in a test instance successfully carry over to the production instance. Also it can be used to override
the code that is imbedded in the application.

The more common name for this feature is called optimizer stability. All the infomation on stored outlines is
stored in the OUTLN schema in two tables OL$ and OL$HINTS (these are created with you install Oracle).

alter system set query_rewrite_enabled = true;


alter system set star_transformation_enabled = true;
alter system set optimizer_features_enabled = 10.2.0;
System parameters
Note: the above values must be the same on all instances, when using this freature across
different databses.
alter system set create_stored_outlines = true;
Stored Outlines (database)
Note: this can use lots of disk space
Stored Outlines (session) alter session set create_stored_outlines = true;
create outline test_outline
Create Outline on select employee_id, last_name
from hr.employees;
alter system set use_stored_outline = true;
Use Outline
alter session set use_stored_outline = true;
Use the dbms_outln_edit package

Options
------------------------------------------------------------------------------------
Edit Outline change_join_pos - changes the join position for the hint identified by outline name
create_edit_tables - creates outline editing tables in calling a users schema
dropedit_tables - drops outline editong tables
generate_signature - generates a signature for the specified SQL text
referesh_private_outline - refreshes the in-memory copy of the outline
Remove Outline drop outline test_outline;
Useful Tables
DBA_OUTLINES A view of the below tables (name, owner, catergory, used, sql_text)
Contains the outlines (name, sql_text, signature, catergory, flags, etc)
OL$
Note: flags is useful as it can tell you if the outline is being used.
OL$HINTS Contains the outlines hints (name, hint_type, hint_text, table_name, etc)
OL$NODES Contains the outlines nodes
Packages
DBMS_OUTLN Manage stored outlines and their outline categories
DBMS_OUTLN_EDIT Manage stored outlines and their outline categories
SQL Query Optimization - 10 -

Histograms

The CBO normally assumes that the data is uniformly distributed in the table, however there are times when the
data is extremely skewed which means you are better off using histograms to store column statistics, histograms
provide more efficient access methods. Histograms use buckets to represent distribution of data in a column and
Oracle uses these buckets to how skewed the data distribution is.

You can use the following histograms

• height-based - divide column values into bands, with each band containing a roughly equal number of
rows
• frequency-based - determine the number of buckets based on the distinct values in the column, each
bucket contains all the data that has the same value

dbms_stats.gather_table_stats(
ownname => 'HR',
tabname = > 'benefits',
method_opt => 'for column size 10 number_of_vists'
);

dbms_stats.gather_table_stats(
ownname => 'HR',
height-based
tabname = > 'benefits',
method_opt => 'for column size skewonly'
);

Note:
for column - your own histogram creation
auto - let oracle decide what to do based on data distribution and workload
skewonly - based the decision only on the data distribution of the columns
dbms_stats.gather_table_stats(
ownname => 'HR',
frequency-based tabname = > 'benefits',
method_opt => 'for column size 20 department_id'
);
Useful Tables
DBA_HISTORGRAMS
DBA_PART_HISTOGRAMS
DBA_SUB_PART_HISTOGRAMS
DBA_TABE_HISTOGRAMS

Indexes

It is worth rebuilding indexes regularly so queries can run faster, see Oracle indexes.doc for more information.

PL/SQL Performance
SQL Query Optimization - 11 -

When you are developing a database there are four system parameters that can help the developer

Used to indentify errors and poor performance


PLSQL_WARNING
alter system set plsql_warnings = 'ENABLE:ALL' scope = both;
(disabled by default)
alter system set plsql_warnings = 'ENABLE:PERFORMANCE','ENABLE:SERVE'
scope=both;
Additional debuging information, compiled code will be stored as interpreted
PLSQL_DEBUG regardless of PLSQL_CODE_TYPE
(false by default)
alter system set plsql_debug = false scope = both;
Optimize when compiled
PLSQL_OPTIMIZE_MODE
(2 by default)
alter system set plsql_optimize_mode = 2 scope=both;
Compiled code into interpreted byte code (default) or native machine code

alter system plsql_code_type = native scope=both;


PLSQL_CODE_TYPE
Note: When PL/SQL objects are compiled the resulting code is stored depends on the
PLSQL_CODE_TYPE, Native compiled code will be stored in an O/S system file,
while interpreted code is stored in the data dictionary

You might also like