
SQL Query Optimization and Indexing

Along the way

How is a db query executed


Schema optimization
Execution Plan
Indexing (Types of Indices)
Using indices
Lock Contention
Covering Indices
The DB Engine

How does a database server run a query?

[Flowchart: the server process receives the SQL query and analyses it. If the query is new, the optimizer builds an execution plan and chooses between a table scan and an index.]

Require high performance??

Good optimized schema
Give indexes for specific queries

Tradeoffs !!

What's an Index??

A data structure

Retrieval: faster
Inserts: slower

Also

Denormalized db = faster for some queries + slower for others

Choosing optimal Data Types


Smaller is better
Use less space
Require fewer CPU cycles

Simple is good
Integers easier to compare than characters
E.g.: Use MySQL built-in types for date/time

Avoid NULL if possible
Harder for MySQL to optimize queries referring to nullable columns
DATETIME and TIMESTAMP store the same kind of data, but TIMESTAMP uses less space, is time zone aware, and covers a much smaller date range

String Types
VARCHAR and CHAR types
Their storage on disk is storage engine dependent
Usually the storage is different for disk, memory and
after retrieval from the storage engine

VARCHAR

Uses only as much space as it needs
Uses 1 or 2 extra bytes to store the length: 1 byte if the maximum length is up to 255 bytes, 2 bytes above that
So VARCHAR(10) uses up to 11 bytes and VARCHAR(1000) uses up to 1002 bytes
Improves performance as it saves space
Variable-length rows can grow on update, which means more work!!!
Use VARCHAR when the maximum column length is much larger than the average length and updates are rare
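The length-prefix arithmetic above can be sketched in a few lines of Python. The helper names `varchar_bytes` and `char_bytes` are hypothetical, and the sketch assumes a single-byte character set and ignores any per-row overhead added by the storage engine.

```python
def varchar_bytes(value: str, max_len: int) -> int:
    """Approximate bytes needed to store `value` in a VARCHAR(max_len) column.

    Assumption: one byte per character (for example latin1); engine-specific
    row overhead is ignored.
    """
    if len(value) > max_len:
        raise ValueError("value does not fit in the column")
    # 1 length byte when the column holds at most 255 bytes, otherwise 2.
    prefix = 1 if max_len <= 255 else 2
    return prefix + len(value)


def char_bytes(max_len: int) -> int:
    """A CHAR(max_len) column always occupies max_len bytes (single-byte charset)."""
    return max_len


print(varchar_bytes("x" * 10, 10))           # completely filled VARCHAR(10)
print(varchar_bytes("x" * 1000, 1000))       # completely filled VARCHAR(1000)
print(varchar_bytes("Y", 1), char_bytes(1))  # VARCHAR(1) versus CHAR(1)
```

A half-empty VARCHAR column costs only its actual length plus the prefix, which is where the space saving comes from.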

CHAR
Fixed length
For data that changes frequently, CHAR is better than VARCHAR
For very short columns: CHAR(1) = 1 byte and VARCHAR(1) = 2 bytes
The siblings of CHAR and VARCHAR are the BINARY and VARBINARY data types
Good for comparing values as bytes rather than as characters

Comparing random strings

Strings produced by MD5(), SHA1() or UUID()
Each new value is distributed in arbitrary ways over a large space
They can slow INSERTs because each value is inserted at a random location in the indexes
They slow some SELECT queries because logically adjacent rows are widely dispersed on disk and in memory
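The effect of random keys on insert position can be illustrated with a rough Python sketch. A sorted list stands in for an ordered index here, which is an assumption made for brevity (a real engine uses B-Tree pages), and `count_tail_inserts` is a hypothetical helper.

```python
import bisect
import random


def count_tail_inserts(keys):
    """Insert keys into a sorted list and count inserts that land at the end."""
    index = []
    tail = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            tail += 1  # cheap append at the right edge of the index
        index.insert(pos, key)
    return tail


n = 1000
sequential_keys = list(range(n))  # for example auto-increment ids
rng = random.Random(0)            # seeded so the run is repeatable
random_keys = [rng.getrandbits(128) for _ in range(n)]  # stand-in for MD5/UUID values

sequential_tail = count_tail_inserts(sequential_keys)
random_tail = count_tail_inserts(random_keys)
print(sequential_tail, random_tail)
```

Sequential keys always append at the end, while almost every random key lands somewhere in the middle of the existing entries.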

If you do store UUID values, you should remove the dashes or, even better, convert the UUID values to 16-byte numbers with UNHEX() and store them in a BINARY(16) column. You can retrieve the values in hexadecimal format with the HEX() function.
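As a rough illustration of the space saving, the same conversion can be mimicked in Python. The UUID value is only an example, and the Python calls mirror what UNHEX() and HEX() do rather than calling MySQL.

```python
import uuid

text = "550e8400-e29b-41d4-a716-446655440000"  # example UUID value

# Like UNHEX(REPLACE(text, '-', '')): 36 characters become 16 raw bytes.
packed = bytes.fromhex(text.replace("-", ""))

# Like HEX(packed): back to 32 hexadecimal characters.
hex_form = packed.hex().upper()

print(len(text), len(packed))
print(hex_form)

# The standard uuid module exposes the same 16 bytes directly.
same_bytes = uuid.UUID(text).bytes == packed
print(same_bytes)
```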

IP Address

The usual choice is VARCHAR(15)
But an IPv4 address is really an unsigned 32-bit integer, not a string
Dotted-quad notation exists only so that humans can read it easily
MySQL provides the INET_ATON() and INET_NTOA() functions to convert between the two representations
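The two representations can be sketched in Python with the standard `ipaddress` module. The functions below are local Python helpers that borrow the MySQL names for readability; they are not the MySQL functions themselves.

```python
import ipaddress


def inet_aton(dotted: str) -> int:
    """Dotted-quad string to unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(number: int) -> str:
    """Unsigned 32-bit integer back to dotted-quad string."""
    return str(ipaddress.IPv4Address(number))


as_int = inet_aton("192.168.1.10")
print(as_int)             # 3232235786
print(inet_ntoa(as_int))  # 192.168.1.10
```

The integer fits in a 4-byte unsigned column, compared with up to 15 characters plus a length byte for the string form.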

The Execution Plan


Every SQL query is broken down into a series of execution steps called operators
Each operator performs a basic operation such as insertion, search, scan, update or aggregation
There are 2 kinds of operators: logical operators and physical operators
Logical operators: describe how the query will be executed at a conceptual level
Physical operators: the actual logic / routine that performs the action

PARSE
Checks syntax
A query processor tree is the output of the parse step

OPTIMIZE
Calculates cost and gives out an estimated plan and an actual plan

DATA STATISTICS (used by the optimizer)
1. How many rows?
2. Unique data?
3. Does the table span more than one page?

EXECUTE
Execution is done as per the plan

Into Indexing
TYPES
B-Tree Indexes
Hash Indexes

B-Tree
We use the term "B-Tree" for these indexes because
that's what MySQL uses in CREATE TABLE and other
statements
All the values are stored in order, and each leaf page is
the same distance from the root

Leaf nodes have pointers to the indexed data instead of pointers to other pages
Because B-Trees store the indexed columns in order,
they're useful for searching for ranges of data
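Why ordered storage helps range searches can be sketched in Python. A sorted list with binary search stands in for the B-Tree's ordered leaf entries; this is a simplification and not how MySQL implements it, and the data and the `range_lookup` helper are illustrative only.

```python
import bisect

# (indexed value, row id) pairs kept in sorted order, like B-Tree leaf entries.
index = sorted([
    ("Babu", 4),
    ("Chandran", 2),
    ("Joseph", 3),
    ("Raj", 1),
])
keys = [key for key, _ in index]


def range_lookup(low, high):
    """Return row ids whose indexed value lies in the range [low, high]."""
    start = bisect.bisect_left(keys, low)    # first entry >= low
    end = bisect.bisect_right(keys, high)    # one past the last entry <= high
    return [row_id for _, row_id in index[start:end]]


print(range_lookup("C", "K"))  # [2, 3]
```

Because the entries are sorted, the matching rows form one contiguous slice, so the search only has to locate the two ends of the range.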

Hash Indexes
Built on hash tables and useful only for exact lookups that use every column in the index
In MySQL, only the Memory storage engine supports explicit hash indexes
Computes a hash code of the indexed columns for each row and stores the hash code with a pointer to the row in a hash table
E.g.:

CREATE TABLE testhash (
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL,
    KEY USING HASH(fname)
) ENGINE=MEMORY;

containing this data:

mysql> SELECT * FROM testhash;
+---------+----------+
| fname   | lname    |
+---------+----------+
| Darshan | Raj      |
| Bijesh  | Chandran |
| Jophin  | Joseph   |
| Vivek   | Babu     |
+---------+----------+

Suppose the index uses a hash function f(), which returns the following values:
f('Darshan') = 2323
f('Bijesh') = 7437
f('Jophin') = 8784
f('Vivek') = 2458

The index's data structure will look like this:

Slot    Value
2323    Pointer to row 1
2458    Pointer to row 4
7437    Pointer to row 2
8784    Pointer to row 3

A hash index on a TINYINT column will be the same size as a hash index on a large character column, because the index stores only the short hash values.
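The slot-to-row structure can be sketched in Python as a dictionary. The hash function `f` below uses CRC32 purely as a stand-in (a real storage engine uses its own hash function, and the slot numbers will differ from the ones on the slide), and `lookup` is a hypothetical helper.

```python
import zlib

rows = [
    ("Darshan", "Raj"),
    ("Bijesh", "Chandran"),
    ("Jophin", "Joseph"),
    ("Vivek", "Babu"),
]


def f(value: str) -> int:
    """Stand-in hash function (assumption: CRC32 is used only for illustration)."""
    return zlib.crc32(value.encode("utf-8"))


# slot (hash code) -> list of row positions; the list copes with collisions.
hash_index = {}
for pos, (fname, _lname) in enumerate(rows):
    hash_index.setdefault(f(fname), []).append(pos)


def lookup(fname):
    """Exact-match lookup: hash the value, then re-check the candidate rows."""
    candidates = hash_index.get(f(fname), [])
    return [rows[p] for p in candidates if rows[p][0] == fname]


print(lookup("Jophin"))  # [('Jophin', 'Joseph')]
print(lookup("Jop"))     # []  (a prefix hashes to an unrelated slot)
```

The index holds only fixed-size hash codes and row positions, and a partial value such as a prefix cannot be looked up because its hash bears no relation to the hash of the full value.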

Non - Clustered Indexes

Data is stored in arbitrary order
Logical ordering is specified by the index
Typically created on columns used in JOIN, WHERE and ORDER BY
Good for tables whose values may be modified frequently

Clustered Indexes
Data blocks arranged in order to match the index
Only one clustered index possible on a given table
Faster retrieval if data accessed in asc or desc order

MS SQL Server creates non-clustered indices by default when CREATE INDEX is given.

Using indices
Indexing the primary key
Usually automatically indexed to facilitate effective
information retrieval
Most effective access path
Other columns or combination of columns = secondary
index to improve performance in data retrieval

Secondary indexes
Indexes on columns other than the primary key
Create secondary indexes on tables that have more reads than writes
In effect a copy of the table that contains only the fields specified in the index

As a rule of thumb, don't put more than 4 fields in an index or more than 5 indexes on a table. You are inviting trouble otherwise!!

Index column order does matter!!

An index is not useful if the lookup does not start from the leftmost of the indexed columns.
You can't skip columns in the index.
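The leftmost-column rule can be illustrated with a small Python sketch. A sorted list of tuples stands in for a composite index on (lname, fname); the data and helper names are hypothetical, and a real engine uses B-Tree pages rather than a list.

```python
import bisect

# Composite index on (lname, fname): entries sort by lname first, then fname.
entries = sorted([
    ("Babu", "Vivek", 4),
    ("Chandran", "Bijesh", 2),
    ("Joseph", "Jophin", 3),
    ("Raj", "Darshan", 1),
])


def find_by_lname(lname):
    """Leftmost column given: binary search jumps to a contiguous run."""
    start = bisect.bisect_left(entries, (lname,))
    hits = []
    for entry in entries[start:]:
        if entry[0] != lname:
            break  # past the run of matching entries
        hits.append(entry[2])
    return hits


def find_by_fname(fname):
    """Leftmost column missing: the sort order is no help, so scan everything."""
    return [row_id for _lname, f, row_id in entries if f == fname]


print(find_by_lname("Joseph"))  # [3]
print(find_by_fname("Jophin"))  # [3], but only after checking every entry
```

Both calls return the same row, but only the first can use the ordering of the index; the second has to visit every entry.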

Join vs. Sub query

A join tends to be faster when few tables are involved
A join tends to be faster when the tables hold little data
A subquery can be faster when a large number of tables is involved, as joining many tables is tedious
A subquery can be faster when the tables hold huge amounts of data

Explaining the EXPLAIN ??!


A way to obtain information about how MySQL executes a SELECT statement
Syntax: EXPLAIN SELECT select_options
Returns a row of information for each table used in the SELECT statement
This is the information that MySQL gives for each table:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra

id: the select identifier
select_type: the type of SELECT (SIMPLE, PRIMARY, UNION, DEPENDENT UNION, SUBQUERY, DEPENDENT SUBQUERY, etc.)
table: the table to which the row of output refers
type: the join type (important)
possible_keys: the indexes that could be used for the query
key: the index actually chosen for the query
rows: the estimated number of rows to be examined
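The idea of asking the engine for its plan can be tried without a MySQL server. The sketch below uses SQLite, which ships with Python, as a stand-in: its EXPLAIN QUERY PLAN command is a different command with different output from MySQL's EXPLAIN, so only the workflow carries over. The table, index and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, status INTEGER)"
)
conn.executemany(
    "INSERT INTO users (name, status) VALUES (?, ?)",
    [("user%d" % i, i % 10) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT name FROM users WHERE status = 9"

before = plan(query)  # no index on status yet: expect a full table scan
conn.execute("CREATE INDEX idx_users_status ON users (status)")
after = plan(query)   # now the planner can search via the new index

print(before)
print(after)
```

In MySQL the equivalent check is to compare the type, key and rows columns of EXPLAIN before and after adding the index.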

Lock Contention??

1) DELETE FROM user WHERE status = 9
2) UPDATE user SET status = 9 WHERE user_id = 100

If the status column is not indexed
Query 1 fully scans the user table, deleting each row where status = 9
What happens if query 1 does not lock the row with user_id = 100?
DATA CONSISTENCY IS BROKEN!!

If the status column is indexed
Query 1 finds the matching rows through the status index, which maps each status value to primary keys
1) and 2) can run in parallel (CONCURRENCY IMPROVED)

[Figures: the user table with columns user_id (PK), name and status, holding user_id values from 100 to 100000 and names such as Roger, Rafael and Andy; and the status index with entries pointing to primary keys 100, 101, 12345 and 100000]

Covering Index??

DB Engine ??

The underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data in a database
MySQL has InnoDB and MyISAM
InnoDB = transactional
MyISAM = non-transactional

InnoDB creates a clustered index for every table. If the table has a primary key, that is the clustered index. If not, InnoDB uses the first unique index whose columns are all NOT NULL, and if there is none it generates a hidden six-byte row ID and makes that the clustered index.
All indexes are B-Trees. The leaf nodes of the primary key index are the data.
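How a clustered index and a secondary index cooperate can be sketched in Python. Plain dictionaries stand in for the B-Trees, the rows are made-up sample data, and the sketch relies on the fact that InnoDB secondary index entries carry the primary key value of the row they refer to.

```python
# Clustered index: primary key -> full row (the leaf entry is the row itself).
clustered = {
    1: {"id": 1, "name": "Asha", "status": 9},
    2: {"id": 2, "name": "Ben", "status": 1},
    3: {"id": 3, "name": "Chen", "status": 9},
}

# Secondary index on status: indexed value -> primary keys of matching rows.
secondary_status = {}
for pk, row in clustered.items():
    secondary_status.setdefault(row["status"], []).append(pk)


def find_by_status(status):
    """Two steps: secondary index gives primary keys, clustered index gives rows."""
    pks = secondary_status.get(status, [])  # step 1: secondary index lookup
    return [clustered[pk] for pk in pks]    # step 2: clustered index lookup


def ids_by_status(status):
    """Needs only values stored in the secondary index, so step 2 is skipped."""
    return list(secondary_status.get(status, []))


print([row["name"] for row in find_by_status(9)])  # ['Asha', 'Chen']
print(ids_by_status(9))                            # [1, 3]
```

The second function hints at what a covering index buys: when every requested column is already in the index, the row itself never has to be fetched.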


THANK YOU
