
APPLICATION PERFORMANCE
SCSD 3713
DATABASE ADMINISTRATION
Semester 1 2020/2021
Agenda
• Defining Applications for Relational Access
• Relational Optimization
• Additional Optimization Considerations
• Reviewing Access Paths
• SQL Coding and Tuning for Efficiency
• Questions
Application Code and SQL
Most relational tuning experts agree that the majority of performance
problems with applications that access a relational database are caused
by poorly coded programs or improperly coded SQL…
• as high as 70% to 80%
[Pie chart: "Application Code & SQL" versus "Everything Else"]
Designing Applications
for Relational Access
Design issues to examine when application performance suffers:
• Type of SQL. Is the correct type of SQL (planned or unplanned, dynamic or static,
embedded or stand-alone) being used for this particular application?
• Programming language. Is the programming language capable of achieving the
required performance, and is the language optimized for database access?
• Transaction design and processing. Are the transactions within the program properly
designed to assure ACID properties, and does the program use the transaction
processor of choice appropriately and efficiently?
• Locking strategy. Does the application hold the wrong type of locks, or does it hold
the correct type of locks for too long?
• COMMIT strategy. Does each application program issue SQL COMMIT statements to
minimize the impact of locking?
• Batch processing. Are batch programs designed appropriately to take advantage of
the sequential processing features of the DBMS?
• Online processing. Are online applications designed to return useful information and
to minimize the amount of information returned to the user’s screen for a single
invocation of the program?
The Optimizer
• The optimizer is the heart of a relational DBMS.
– The optimizer is an inference engine for determining the
database navigation strategy.
• The application developer specifies what data is needed by
coding the SQL statements…
• …the DBMS supplies information about where the data is
located, and…
• …the relational optimizer decides how to efficiently
navigate the database.
• The end user needs no knowledge of where and how the
actual data is stored. The optimizer knows this information.
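Optimization
[Flow diagram: the SQL request, any hint(s), the system catalog, and other
system information feed the step "Determine Optimal Access Path". Decision
"Run SQL?": if No, exit or save the access path; if Yes, execute the
optimized SQL against the database tables and return the query results
(multiple rows).]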
Physical Data Independence
• Relational optimization allows queries to adapt to a
changing database environment.
• The optimizer can react to changes by formulating
new access paths without requiring application coding
changes to be implemented.
– The application can therefore be flexible as tables expand
or contract in size, as indexes are added or removed, and
as the database becomes disorganized or reorganized.
• This separation of access criteria from physical
storage characteristics is called physical data
independence.
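A minimal sketch of physical data independence, assuming SQLite (through Python's sqlite3 module) as a stand-in DBMS. The employee table and the index name ix_emp_dept are hypothetical. The SQL text never changes; only the access path reported by the optimizer does once an index is added.

```python
import sqlite3

def access_path(conn, sql):
    """Return the optimizer's plan steps for sql as a list of strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (last_name, deptno) VALUES (?, ?)",
    [("Name%d" % i, i % 10) for i in range(1000)],
)

# The application codes this statement once and never edits it.
query = "SELECT empno, last_name FROM employee WHERE deptno = 7"

plan_before = access_path(conn, query)   # no index on deptno yet
rows_before = sorted(conn.execute(query).fetchall())

# A physical change made by the DBA, invisible to the application code.
conn.execute("CREATE INDEX ix_emp_dept ON employee (deptno)")

plan_after = access_path(conn, query)    # same SQL text, new access path
rows_after = sorted(conn.execute(query).fetchall())

print("before:", plan_before)
print("after: ", plan_after)
```

The result set is identical in both runs; only the navigation strategy differs. The exact plan wording varies between SQLite versions.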
Optimization
• Although every RDBMS has a relational optimizer
that turns SQL into executable access paths, each
vendor’s optimizer works a little differently, with
different steps and using different information.
• But the core of the process is the same:
– The optimizer parses the SQL statement and performs
various phases of optimization
• typically involving verification of syntactic and semantic
correctness
– Then the query is analyzed.
– And access paths are created for the query.
Query Cost Formula
• CPU and I/O Costs
• Database Statistics
– Number of rows in the tablespace, table, or index
– Number of unique values stored in the column
– Most frequently occurring values for columns
– Index key density
• the average percentage of duplicate values
– Details on the ratio of clustering for clustered tables
– Correlation of columns to other columns
– Structural state of the index or tablespace
– Amount of storage used by the database object
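To make the idea of database statistics concrete, the sketch below gathers them in SQLite (used here only as a convenient stand-in; each DBMS has its own utility and catalog tables). The employee table and ix_emp_dept index are hypothetical. SQLite's ANALYZE command stores its statistics in the sqlite_stat1 table, where the first number in the stat column is the approximate row count of the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER PRIMARY KEY, deptno INTEGER)")
conn.execute("CREATE INDEX ix_emp_dept ON employee (deptno)")
conn.executemany(
    "INSERT INTO employee (deptno) VALUES (?)",
    [(i % 10,) for i in range(100)],   # 100 rows, 10 distinct deptno values
)

# Capture statistics for the optimizer.
conn.execute("ANALYZE")

stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = 'ix_emp_dept'"
).fetchall()
print(stats)

# The same kind of figures computed by hand, for comparison.
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
distinct_values = conn.execute(
    "SELECT COUNT(DISTINCT deptno) FROM employee"
).fetchone()[0]
avg_rows_per_value = row_count / distinct_values
print(row_count, distinct_values, avg_rows_per_value)
```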
Test vs. Production Statistics
• When testing applications against test databases
there will be less data
• So statistics will not match production
• You can copy production statistics and populate
them into the test system to simulate production
access paths, though.
Query Analysis
• The query analysis scans the SQL statement to
determine its overall complexity.
• The formulation of the SQL statement is a significant factor
in determining the access paths chosen by the optimizer.
• The optimizer calculates an estimated cost based on factors including:
• the complexity of the query
• the number and type of predicates
• the presence of functions
• the presence of ordering clauses
Query Analysis
During query analysis, the optimizer analyzes aspects of the SQL
statement and the database system, such as:
• Which tables in which database are required
• Whether any views are required to be broken down into underlying tables
• Whether table joins or subselects are required
• Whether union, except, or intersect are required
• Which indexes, if any, can be used
• How many predicates (WHERE clauses) must be satisfied
• Which functions must be executed
• Whether the SQL uses OR or AND
• How the DBMS processes each component of the SQL statement
• How much memory has been assigned to the data cache(s) used by the
tables in the SQL statement
• How much memory is available for sorting if the query requires a sort
Understanding & Optimizing Joins
• Join predicates
• No Cartesian products
• Minimize the size of intermediate tables
• Index for performance
• Type of joins
– Nested Loop
– Merge Scan
– Hybrid
– Star Join
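The first two join methods can be sketched in plain Python. This is a simplified illustration of the general techniques, not any vendor's implementation; the sample rows and function names are hypothetical. A nested loop join compares every outer row with every inner row, while a merge scan join sorts both inputs on the join key and walks them in step.

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    """For each outer row, scan the whole inner input for matching keys."""
    result = []
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                result.append((o, i))
    return result

def merge_scan_join(left, right, left_key, right_key):
    """Sort both inputs on the join key, then advance through them together."""
    left = sorted(left, key=lambda row: row[left_key])
    right = sorted(right, key=lambda row: row[right_key])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][left_key] == lk:
                for k in range(j, j_end):
                    result.append((left[i], right[k]))
                i += 1
            j = j_end
    return result

# (empno, last_name, deptno) and (deptno, dept_name)
employees = [(1, "Smith", 20), (2, "Jones", 10), (3, "Lee", 20), (4, "Khan", 30)]
departments = [(10, "Sales"), (20, "Research"), (40, "Audit")]

nl_rows = nested_loop_join(employees, departments, 2, 0)
ms_rows = merge_scan_join(employees, departments, 2, 0)
print(sorted(nl_rows) == sorted(ms_rows))
```

Both functions return the same pairs; they differ in how much work they do. The nested loop touches every combination of rows, which is why an index on the inner table's join column matters, while the merge scan pays for the sorts up front.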
Access Path Choices
• Table Scans
• Indexed Access
– Direct index lookup
– Matching index scan
– Non-matching index scan
– Index screening
– Index only access
– Using indexes to avoid sorting
• Hashed Access
• Parallel Access
– I/O, CPU, system
Access Path Choices
• Table scan: the simplest form; reads every row of the table
• Alternate types of scan:
– Tablespace scan: reads every page in the tablespace; a
tablespace may contain more than one table
– Will run slower than a table scan, because additional I/O will be
incurred reading data that does not apply
– Partition scan: the data to be accessed exists in certain
partitions, so the scan is limited to only the appropriate partitions
Why Wasn’t That Index Chosen?
• Does the query specify a search argument?
– If no predicate uses a search argument, the optimizer cannot use an index
to satisfy the query.
• Are you joining a large number of tables?
– The optimizer within some DBMSs may produce unpredictable query plan
results when joining a large number of tables.
• Are statistics current?
– If many changes have occurred database statistics should be recaptured
to ensure that the optimizer has up-to-date information.
• Are you using stored procedures?
– Sometimes the DBMS provides options whereby a stored procedure, once
compiled, will not reformulate a query plan for subsequent executions.
• Are additional predicates needed?
– A different WHERE clause might possibly enable the optimizer to consider
a different index.
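The first question above can be checked directly against the optimizer. The sketch below assumes SQLite via Python's sqlite3 module, with a hypothetical employee table and ix_emp_last index. Wrapping the indexed column in a function leaves the predicate without a usable search argument, so the index is not chosen.

```python
import sqlite3

def plan(conn, sql):
    """Return the plan descriptions reported by EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
)
conn.execute("CREATE INDEX ix_emp_last ON employee (last_name)")
conn.executemany(
    "INSERT INTO employee (last_name, first_name) VALUES (?, ?)",
    [("Smith", "Ann"), ("Jones", "Bob"), ("Lee", "Cho"), ("Smith", "Dan")],
)

# The column is compared directly: the predicate is a search argument.
with_sarg = "SELECT * FROM employee WHERE last_name = 'Smith'"
# The column is hidden inside a function: no search argument on last_name.
without_sarg = "SELECT * FROM employee WHERE upper(last_name) = 'SMITH'"

plan_with_sarg = plan(conn, with_sarg)
plan_without_sarg = plan(conn, without_sarg)

print(plan_with_sarg)
print(plan_without_sarg)
```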
Additional Optimization
Considerations
• View Access
– View Merge
– View Materialization
• Query Rewrite
– Equivalent SQL

– Inferred Predicates
Rule-Based Optimization
• Most relational optimizers are cost based, meaning
they formulate access paths based on an estimation
of costs.
• Lower-cost access paths are favored over costlier ones.
• Some DBMSs support optimization based on
heuristics, or rules.
• Oracle provides both cost-based and rule-based
optimization.
• But Oracle is phasing out the rule-based optimizer.
Reviewing Access Paths
[Diagram: the SQL text, the catalog statistics, and a BIND or EXPLAIN
request are input to the optimizer, which determines the access path and
records it in the Plan Table.]
What Is in the Plan Table?
• Whether an index is used, and if so, how many
• How many columns of the index match the query
• Whether index-only access is used
• What join method is used
• Whether parallel access is used
• Whether sorting is required
Visual Explain Tools
• Instead of interpreting coded values in a Plan Table, a Visual
Explain tool diagrams access paths pictorially.
Forcing Access Paths
• Techniques to force access path selection should be used
with caution. It is usually better to let the optimizer choose
the appropriate access paths on its own unless:
– You have in-depth knowledge of the amount and type of data
stored in the tables to be joined
– You are reasonably sure that you can determine the optimal join
order better than the optimizer, or
– Database statistics are not up-to-date, so the optimizer is not
working with sufficient information about the database
environment.
• Examples:
– FORCEPLAN (SQL Server)
– Access Path Hints (Oracle)
– Plan Table Hints (DB2)
Query Tweaking
• An alternative method to encourage the optimizer
to select different access paths is tweaking your
SQL statement.
• Example:
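As one illustration of tweaking (SQLite-specific, and not necessarily the technique intended here), SQLite documents that a unary plus in front of a column name prevents that predicate from using an index without changing the result. The sketch below uses a hypothetical employee table with two hypothetical indexes and steers the optimizer toward each in turn.

```python
import sqlite3

def plan(conn, sql):
    """Return the plan descriptions reported by EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

names = ["Jones", "Lee", "Smith", "Khan", "Wong"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno INTEGER)"
)
conn.execute("CREATE INDEX ix_emp_last ON employee (last_name)")
conn.execute("CREATE INDEX ix_emp_dept ON employee (deptno)")
conn.executemany(
    "INSERT INTO employee (last_name, deptno) VALUES (?, ?)",
    [(names[i % 5], i % 10) for i in range(200)],
)

select = "SELECT empno FROM employee WHERE "
plain = select + "deptno = 7 AND last_name = 'Smith'"
# "+last_name" cannot use ix_emp_last, so only ix_emp_dept is a candidate.
force_dept = select + "deptno = 7 AND +last_name = 'Smith'"
# "+deptno" cannot use ix_emp_dept, so only ix_emp_last is a candidate.
force_last = select + "+deptno = 7 AND last_name = 'Smith'"

plan_plain = plan(conn, plain)
plan_force_dept = plan(conn, force_dept)
plan_force_last = plan(conn, force_last)

rows_plain = sorted(conn.execute(plain).fetchall())
rows_force_dept = sorted(conn.execute(force_dept).fetchall())
rows_force_last = sorted(conn.execute(force_last).fetchall())

print(plan_plain, plan_force_dept, plan_force_last, sep="\n")
```

Other DBMSs have their own tweaking idioms; whichever is used, the tweaked statement should be verified to return the same rows as the original.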
SQL Coding and Tuning for
Efficiency
1. Identify the business data requirements
2. Ensure that the required data is available within existing databases
3. Translate the business requirements into SQL
4. Test the SQL for accuracy and results
5. Review the access paths for performance
6. Tweak the SQL for better access paths
7. Code optimization hints
8. Repeat steps 4 through 7 until performance is acceptable.
9. Repeat step 8 whenever performance problems arise or a new
DBMS version is installed
10. Repeat entire process whenever business needs change
A Dozen SQL Rules of Thumb
1. It depends!
2. Be careful what you ask for
3. KISS
4. Retrieve only what is needed
5. Avoid Cartesian products
6. Judicious use of OR
7. Judicious use of LIKE
8. Avoid sorting when possible
9. Know what works best
10. Issue frequent COMMITs
11. Beware of code generators
12. Consider stored procedures
It Depends!
• The cardinal rule of SQL coding, and indeed of
database development, is “It depends!”
• A successful DBA will know on what it depends.
• Be skeptical of tuning tips that use the words
“always” or “never.”
• Just about everything depends on other things.
Be Careful What You Ask For
• The arrangement of elements within a query can
change query performance.
• Place the most restrictive predicate where the
optimizer can read it first.
• Enables the optimizer to
narrow down the first set
of results before proceeding
to the next predicate
KISS
• Keep It Simple, Stupid

This space intentionally left blank


Retrieve Only What is Needed
• Specify appropriate WHERE clause to minimize
number of rows returned.
• Specify the absolute minimum number of columns
in the SELECT list.
• Consider this query… what is wrong with it?
Avoid Cartesian Products
• Cartesian Product -> every row in one table is
joined to every row in another table with no join
criteria.
– The results of a Cartesian product are difficult to
interpret.
• Always provide join predicates.
• Failure to do so will result in severe performance
degradation and possibly incorrect results.
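A small sketch of the effect, assuming SQLite via Python's sqlite3 module and two hypothetical tables. Omitting the join predicate pairs every employee with every department, so the row count is the product of the two table sizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (deptno INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno INTEGER)"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(10, "Sales"), (20, "Research"), (30, "Audit")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Smith", 10), (2, "Jones", 20), (3, "Lee", 20), (4, "Khan", 30)],
)

# No join predicate: 4 employees x 3 departments.
cartesian_count = conn.execute(
    "SELECT COUNT(*) FROM employee, department"
).fetchone()[0]

# With the join predicate: one row per employee.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM employee e JOIN department d ON e.deptno = d.deptno"
).fetchone()[0]

print(cartesian_count, joined_count)
```

With production-sized tables the same mistake multiplies millions of rows by millions of rows, which is where the severe performance degradation comes from.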
Judicious Use of OR
• The OR logical operator can be troublesome for
performance.
• For example, consider changing this:

• To this:
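One commonly suggested rewrite, shown here only as a hypothetical illustration, replaces a chain of OR conditions on the same column with an IN list. The sketch assumes SQLite via Python's sqlite3 module and a made-up employee table. It confirms that both forms return the same rows and prints the access path for each; whether the rewrite actually helps depends on the DBMS and should be checked in its own plan output.

```python
import sqlite3

def plan(conn, sql):
    """Return the plan descriptions reported by EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno INTEGER)"
)
conn.execute("CREATE INDEX ix_emp_dept ON employee (deptno)")
conn.executemany(
    "INSERT INTO employee (last_name, deptno) VALUES (?, ?)",
    [("Name%d" % i, (i % 8) * 10) for i in range(400)],
)

query_or = (
    "SELECT empno FROM employee "
    "WHERE deptno = 10 OR deptno = 20 OR deptno = 30"
)
query_in = "SELECT empno FROM employee WHERE deptno IN (10, 20, 30)"

rows_or = sorted(conn.execute(query_or).fetchall())
rows_in = sorted(conn.execute(query_in).fetchall())

print("OR plan:", plan(conn, query_or))
print("IN plan:", plan(conn, query_in))
print("same rows:", rows_or == rows_in)
```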
Judicious Use of LIKE
• The LIKE logical operator can cause trouble.
• You might be able to change this SQL:

• To this:
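As a hypothetical illustration of why LIKE can cause trouble, the sketch below (SQLite via Python's sqlite3 module, made-up table and index) compares a pattern with a leading wildcard against a prefix search written as a range predicate. The leading wildcard gives the optimizer no starting point in the index, while the range form does. Note that SQLite's LIKE is case-insensitive for ASCII by default, so the range rewrite is only equivalent when the stored capitalization is consistent, as it is in this sample data.

```python
import sqlite3

def plan(conn, sql):
    """Return the plan descriptions reported by EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
)
conn.execute("CREATE INDEX ix_emp_last ON employee (last_name)")
conn.executemany(
    "INSERT INTO employee (last_name, first_name) VALUES (?, ?)",
    [("Smith", "A"), ("Smythe", "B"), ("Jones", "C"),
     ("Johnson", "D"), ("Small", "E"), ("Snow", "F")],
)

# Leading wildcard: every last_name has to be examined.
leading_wildcard = "SELECT * FROM employee WHERE last_name LIKE '%son'"
# Prefix search expressed with LIKE.
prefix_like = "SELECT * FROM employee WHERE last_name LIKE 'Sm%'"
# The same prefix search expressed as a range on the indexed column.
prefix_range = (
    "SELECT * FROM employee WHERE last_name >= 'Sm' AND last_name < 'Sn'"
)

plan_leading = plan(conn, leading_wildcard)
plan_range = plan(conn, prefix_range)

rows_like = sorted(conn.execute(prefix_like).fetchall())
rows_range = sorted(conn.execute(prefix_range).fetchall())

print(plan_leading)
print(plan_range)
```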
Avoid Sorts When Possible
• When performance is important, remember to look
for sorts and find ways to eliminate them.
• You can use indexes to avoid sorts for certain SQL
constructs in most relational DBMSs:
– ORDER BY
– GROUP BY
– DISTINCT
– UNION
– INTERSECT
– EXCEPT
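The ORDER BY case can be observed in a plan. This sketch assumes SQLite via Python's sqlite3 module and a hypothetical employee table. Without an index SQLite reports a temporary B-tree for the sort; once an index supplies the rows in order, that step disappears.

```python
import sqlite3

def plan(conn, sql):
    """Return the plan descriptions reported by EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employee (last_name) VALUES (?)",
    [("Name%03d" % ((i * 37) % 500),) for i in range(500)],
)

query = "SELECT last_name FROM employee ORDER BY last_name"

plan_without_index = plan(conn, query)   # a sort step is needed
rows_without_index = conn.execute(query).fetchall()

conn.execute("CREATE INDEX ix_emp_last ON employee (last_name)")

plan_with_index = plan(conn, query)      # rows arrive already ordered
rows_with_index = conn.execute(query).fetchall()

print(plan_without_index)
print(plan_with_index)
```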
Know What Works Best
• One way of coding usually provides better
performance than the others.
• Study your DBMS and learn what constructs work
best…
• …and use them!
Issue Frequent Commits
• The COMMIT statement finalizes any modifications
to the database.
• Changed data is locked until committed.
• Locks impact concurrency and degrade
performance (and perhaps, availability)
• As a DBA you must ensure that application
developers issue enough COMMIT statements to
minimize the impact of locking on availability and
to keep rollback segments to a manageable size
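A minimal sketch of a batch program that commits at intervals rather than once at the end, assuming Python's sqlite3 module with its default transaction handling. The audit_log table and the BATCH_SIZE value are hypothetical; the right interval depends on the workload and the DBMS.

```python
import sqlite3

BATCH_SIZE = 100   # hypothetical commit interval; tune per application

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, msg TEXT)")
conn.commit()

rows = [("event %d" % i,) for i in range(250)]
commits = 0

for n, row in enumerate(rows, start=1):
    conn.execute("INSERT INTO audit_log (msg) VALUES (?)", row)
    if n % BATCH_SIZE == 0:
        conn.commit()          # release locks held by this unit of work
        commits += 1

# Commit whatever is left over from the final, partial batch.
if conn.in_transaction:
    conn.commit()
    commits += 1

total = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(total, commits)
```

Committing after every row would minimize lock duration but adds overhead per statement, so the interval is a trade-off between concurrency and throughput.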
Beware of Code Generators
• Application code generators and similar tools that
automatically create SQL can create “bad” SQL… and
usually do.
– Keep an eye on the SQL generated by such tools and re-write
poorly written SQL before it reaches production.
• Some of these tools use gateways that require each
SQL statement to be recompiled and optimized each
time it is requested.
• Utilize the gateway’s caching mechanism to store
compiled and optimized SQL on the server.
– Such a cache can help to improve performance for
frequently recurring SQL statements.
Consider Stored Procedures
• Stored procedures can be used to reduce network
traffic and improve performance
• A stored procedure can contain multiple SQL statements
• Only one trip across the network is required to run the
entire stored procedure
• Multiple trips across the network would be required to
run each of multiple, individual SQL statements
Additional SQL Tuning Tips
• Create indexes to support troublesome queries.
• Whenever possible, do not perform arithmetic in SQL
predicates.
– Use the host programming language (Java, COBOL, C, etc.) to perform arithmetic.
• Use SQL functions to reduce programming effort.
• Look for ways to perform as much work as possible using
only SQL.
– Optimized SQL typically outperforms host language application code.
• Build proper constraints into the database to minimize
coding edit checks.
• Do not forget about the “hidden” impact of triggers.
– A delete from one table may trigger many more operations. Although you may
think the problem is a poorly performing DELETE, the trigger is really the culprit.
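The tip about arithmetic in predicates can be sketched as follows, assuming SQLite via Python's sqlite3 module and a hypothetical employee table with an index on salary. When the arithmetic is applied to the column, the index on that column is not used; when the host language computes the threshold first and passes it as a parameter, the predicate compares the bare column and the index becomes usable.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the plan descriptions reported by EXPLAIN QUERY PLAN."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return [row[-1] for row in cursor]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)"
)
conn.execute("CREATE INDEX ix_emp_salary ON employee (salary)")
conn.executemany(
    "INSERT INTO employee (last_name, salary) VALUES (?, ?)",
    [("Name%d" % i, 30000 + i * 1000) for i in range(50)],
)

# Arithmetic inside the predicate, applied to the indexed column.
in_sql = "SELECT * FROM employee WHERE salary * 2 > 100000"

# Arithmetic done in the host language; the column is left bare.
threshold = 100000 // 2
in_host = "SELECT * FROM employee WHERE salary > ?"

plan_in_sql = plan(conn, in_sql)
plan_in_host = plan(conn, in_host, (threshold,))

rows_in_sql = sorted(conn.execute(in_sql).fetchall())
rows_in_host = sorted(conn.execute(in_host, (threshold,)).fetchall())

print(plan_in_sql)
print(plan_in_host)
```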
Identifying Poorly Performing SQL
• A large part of the task of tuning SQL is identifying
the offending code.
• Acquire and use a SQL performance monitor to
constantly monitor the DBMS for sub-optimal SQL
statements.
• Identify the worst SQL and fix it.
