Creating an index is easy. All you need to do is identify which column(s) you want to index and give it a name!
But, as always, there's more to it than that. You can place many columns in the same index. For example, you could
also include the type of the toys in the index like so:
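A minimal sketch of both forms. The toys table, its columns and the index names are assumptions made for illustration, not taken from the original scripts:

```sql
-- Hypothetical table; names are illustrative only
create table toys (
  toy_id   integer primary key,
  toy_name varchar2(100),
  colour   varchar2(20),
  toy_type varchar2(20)
);

-- Single column index: name it and list the column
create index toys_colour_i on toys ( colour );

-- Composite index: several columns in one index
create index toys_colour_type_i on toys ( colour, toy_type );
```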
Bitmaps couldn't be more different. As with B-trees, they store the indexed values. But instead of one row per entry,
the database associates each value with a range of rowids. It then has a series of ones and zeros to show whether
each row in the range has the value (1) or not (0).
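As a rough sketch, creating a bitmap index only needs the extra bitmap keyword. The toys table and colour column are assumed names, and the commented layout is a conceptual picture of the description above rather than real storage output:

```sql
-- Assumes a hypothetical toys table with a colour column
create bitmap index toys_colour_bi on toys ( colour );

-- Conceptual view of the entries (illustrative only):
--   value  start rowid  end rowid  bitmap
--   BLUE   <first>      <last>     0110
--   RED    <first>      <last>     1001
-- Each bit says whether the matching row in the range has the value (1) or not (0)
```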
Rows where all the indexed values are null are NOT included in a B-tree. But they are in a bitmap! So the optimizer
can use a bitmap to answer queries like:
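A query of that shape might look like this, assuming a hypothetical toys table with a bitmap index on its colour column:

```sql
-- A single column B-tree on colour has no entries for these rows; a bitmap does
select count(*)
from   toys
where  colour is null;
```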
For example, take a table of Olympic medal winners. Creating indexes on edition, sport, medal, event and athlete
gives the following sizes (in blocks):
Column  B-tree  Bitmap
Sport       83       1
Medal       71       3
Event      111       5
Gender      64       1
As you can see, in most cases the bitmap indexes are substantially smaller. Though this advantage diminishes as the
number of different values in the index increases.
If you're familiar with Boolean logic truth tables, you may spot another big advantage for bitmaps:
By overlaying the rowid ranges of two indexes, you can find which rows match the where clause in both. Then go to
the table for just those rows.
This means indexes which point to a large number of rows can still be useful. Say you want to find all the female gold
medal winners in the 2000 Sydney Olympics. Your query is:
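A sketch of such a query follows. The olym_medals table and its edition, medal and gender columns match the names in the plan; the literal values 'Women' and 'Gold' are assumptions about how the data is stored:

```sql
select *
from   olym_medals
where  gender  = 'Women'   -- assumed value
and    medal   = 'Gold'    -- assumed value
and    edition = 2000;
```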
If you create single column bitmaps, the database can combine them using a BITMAP AND like so:
---------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| OLYM_MEDALS |
| 2 | BITMAP CONVERSION TO ROWIDS | |
| 3 | BITMAP AND | |
|* 4 | BITMAP INDEX SINGLE VALUE | OLYM_EDITION_BI |
|* 5 | BITMAP INDEX SINGLE VALUE | OLYM_MEDAL_BI |
|* 6 | BITMAP INDEX SINGLE VALUE | OLYM_GENDER_BI |
---------------------------------------------------------------
This gives a lot of flexibility. You only need to create single column indexes on your table. If you have conditions
against several columns the database will stitch them together!
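The three single column bitmaps named in the plan could be created along these lines. The index and table names come from the plan; the column names are inferred from them:

```sql
create bitmap index olym_edition_bi on olym_medals ( edition );
create bitmap index olym_medal_bi   on olym_medals ( medal );
create bitmap index olym_gender_bi  on olym_medals ( gender );
```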
B-trees don't have this luxury. You can't just plonk one on top of the other to find what you're looking for. While
Oracle Database can combine B-trees (via a "bitmap conversion from rowids"), this is relatively expensive. In general
to get the same performance as the three bitmaps, you need to place all three columns in a single index. This affects
how reusable an index is, which we'll come to later.
At this point it looks like a slam dunk victory for bitmaps over B-trees.
Smaller? Check
They're one of the few situations in Oracle Database where an insert in one session can block an insert in another.
This makes them questionable for most OLTP applications.
Why?
Well, whenever you insert, update or delete table rows, the database has to keep the index in sync. This happens in a
B-tree by walking down the tree, changing the leaf entry as needed. You can see how this works with this visualization
tool.
But a bitmap locks the entire start/end rowid range! So say you add a row with the value RED. Any other inserts which
try to add another row with the value RED to the same range are blocked until the first one commits!
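To see this for yourself, you could run something like the following in two separate sessions. This is a sketch only: the toys table and its bitmap index on colour are assumptions, and whether session 2 really waits depends on both rows landing in the same rowid range:

```sql
-- Session 1: insert a RED row and leave the transaction open
insert into toys ( toy_id, toy_name, colour )
values ( 1, 'Fire engine', 'RED' );

-- Session 2: insert another RED row
-- If it maps to the same bitmap range, this statement waits
insert into toys ( toy_id, toy_name, colour )
values ( 2, 'Racing car', 'RED' );

-- Session 1: committing releases the lock, letting session 2 continue
commit;
```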
This is an even bigger problem with updates. An update to a B-tree is really a delete of the old value and insert of
the new one. But with a bitmap Oracle Database has to lock the affected rowid ranges for both the old and new
values!
As a result, bitmap indexes are best suited for tables that will only have one process at a time writing to them. This is
often the case on reporting tables or data warehouses. But not your typical application.
So if more than one session will change the data in a table at the same time, think carefully before using bitmap
indexes.
And of course, there's the cost reason. Bitmap indexes are an Enterprise Edition only feature. Or, if you like your
database in the cloud, an Enterprise Package DBaaS or better.
So, great as bitmaps may be, for most applications B-trees are the way to go!
If you'd like to run these comparisons yourself, use this LiveSQL script. That covers one of the biggest differences in
index types. Here are some other common index types.
Function-based Indexes
These are simply indexes where one or more of the columns have a function applied to them. The index stores the
result of this calculation. For example:
create index date_at_midnight_i on table ( trunc ( datetime ) );
or
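As a second, hypothetical illustration of the same idea, this sketch indexes the uppercase form of a name to support case-insensitive searches. The people table and full_name column are invented for the example:

```sql
-- Stores upper(full_name) for each row, not the raw value
create index upper_names_i on people ( upper ( full_name ) );
```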
Bear in mind if you have a function-based index, to use it the function in your where clause must match the definition
in the index exactly(*). So if your index is:
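A sketch of what matching means, using an index like date_at_midnight_i. The orders table and order_datetime column are assumed names:

```sql
create index date_at_midnight_i on orders ( trunc ( order_datetime ) );

-- Matches the index definition, so the index is a candidate
select * from orders
where  trunc ( order_datetime ) = date'2017-01-01';

-- Different expression (truncates to the month), so it does not match
select * from orders
where  trunc ( order_datetime, 'mm' ) = date'2017-01-01';
```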
Simple formulas
Using standard math, rearrange the formula so there are no functions on the column:
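For instance, with a hypothetical products table and a regular index on price, the arithmetic can move to the other side of the comparison:

```sql
-- Arithmetic applied to the column: a plain index on price can't be range scanned for this
select * from products
where  price + 10 = 100;

-- Rearranged so the column stands alone; logically equivalent
select * from products
where  price = 100 - 10;
```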
Note that the reverse isn't always true. You can have a normal index on a column. Then apply a function to that
column in your where clause. Sticking with the dates example, you can index:
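One hedged way to picture this: a regular index on the date column, plus a query that finds a whole day without wrapping the column in a function. The orders table and order_datetime column are assumptions:

```sql
-- A normal (not function-based) index
create index order_datetime_i on orders ( order_datetime );

-- A half-open range covers every time within the day
-- while leaving the indexed column untouched
select * from orders
where  order_datetime >= date'2017-01-01'
and    order_datetime <  date'2017-01-02';
```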
*Starting in 11.2.0.2, Oracle Database can use function-based indexes to process queries without the function in the
where clause. This happens in a special case where the function preserves the leading part of the indexed values. For
example, trunc() or substr().
I still think it's better to create the regular index. But at least this helps if you must use the function-based index for
some reason. For worked examples, head over to LiveSQL.
Unique Indexes
A unique index is a form of constraint. It asserts that you can only store a given value once in a table. When you
create a primary key or unique constraint, Oracle Database will automatically create a unique index for you (assuming
there isn't an index already available). In most cases you'll add the constraint to the table and let the database build
the index for you.
But there is one case where you need to manually create the index:
You can't use functions in unique constraints. For example, you might want to build a "dates" table that stores one
row for each calendar day. Unfortunately, Oracle Database doesn't have a "day" data type. Dates always have a time
component. To ensure one row per day, you need to lock the time to midnight. You apply the trunc() function to do this.
But a constraint won't work:
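A sketch of the difference, using a hypothetical calendar_dates table. The rejected statement is left as a comment so the script still runs:

```sql
create table calendar_dates (
  calendar_day date not null
);

-- Constraints accept column names only, so this form is rejected:
--   alter table calendar_dates
--     add constraint calendar_dates_u unique ( trunc ( calendar_day ) );

-- A unique function-based index enforces one row per day instead
create unique index calendar_dates_day_u
  on calendar_dates ( trunc ( calendar_day ) );
```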
Descending Indexes
B-tree indexes are ordered data structures. New entries have to go in the correct location, according to the logical
order imposed by the columns. By default these sort from small to large.
But you can change this. By specifying desc after the column, Oracle Database sorts the values from large to small.
Usually this makes little difference. The database can walk up or down the index as needed. So it's rare you'll want to
use them.
But there is a case where they can help. If your query contains "order by ... desc", the descending index can avoid
a sort operation. The simplest case is where you're searching a range of values, sorting these descending and then
sorting by another column.
For example, finding the orders for customer_ids in a given range. Return these ids in reverse order. Then sort by sale
date:
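A sketch of that query, together with a conventional ascending index. The orders table, the orders_i name and both columns appear in this section; the bind variable names and the assumption that the first index is ascending on both columns are mine:

```sql
-- Regular index: both columns sorted small to large
create index orders_i on orders ( customer_id, order_datetime );

select *
from   orders
where  customer_id between :min_id and :max_id
order  by customer_id desc, order_datetime;
```

With an ascending index like this, the database has to sort the rows after reading them: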
-------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT ORDER BY | |
| 2 | TABLE ACCESS BY INDEX ROWID| ORDERS |
| 3 | INDEX RANGE SCAN | ORDERS_I |
-------------------------------------------------
But create it with ( customer_id desc, order_datetime ) and the sorting step disappears!
------------------------------------------------
| Id | Operation | Name |
------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| ORDERS |
| 2 | INDEX RANGE SCAN | ORDERS_I |
------------------------------------------------
This could be a big time saver if the sort is "expensive".
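The descending version could be created like this. It is a sketch that reuses the orders_i name from the plans:

```sql
-- If an ascending orders_i already exists, drop it first:
--   drop index orders_i;

-- customer_id sorts large to small; order_datetime small to large within it
create index orders_i on orders ( customer_id desc, order_datetime );
```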
JSON Search Indexes
Storing data as JSON documents is (sadly) becoming more common. All the attributes are part of the document,
instead of a column in their own right. This makes indexing the values tricky. So searching for documents that have
specific values can be agonizingly slow. You can overcome this by creating function-based indexes on the attributes
you search.
For example, say you store the Olympic data as JSON. You regularly look for which medals an athlete's won. You can
index this like so:
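A minimal sketch, assuming the documents live in a jdoc column of a hypothetical olym_medals_json table and that each one has a top-level athlete attribute:

```sql
-- Function-based index on one attribute of the document
create index olym_athlete_json_i
  on olym_medals_json ( json_value ( jdoc, '$.athlete' ) );

-- The expression in the where clause mirrors the index definition
select *
from   olym_medals_json
where  json_value ( jdoc, '$.athlete' ) = :athlete_name;
```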
But you may want to do ad-hoc searching of the JSON, looking for any values in the documents. To help with this you
can create an index specifically for JSON data from Oracle Database 12.2. It's easy to do:
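Something along these lines, again with assumed table, column and index names. The check constraint tells the database the column holds JSON:

```sql
create table olym_medals_json (
  jdoc clob check ( jdoc is json )
);

-- JSON search index (Oracle Database 12.2 and later)
create search index olym_medals_json_si
  on olym_medals_json ( jdoc ) for json;

-- Ad-hoc search for a word anywhere in the document
select *
from   olym_medals_json
where  json_textcontains ( jdoc, '$', 'gold' );
```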
Context
Category
Rule
Using these indexes is a huge topic in itself. So if you want to do this kind of search, I recommend starting with
the Text Developer's Guide.
If you want to do this, check out the Data Cartridge Developer's Guide.
If you have a large number of sessions doing this at the same time, this can lead to problems. They all need to access
the same index block to add their data. As only one can change it at a time, this can lead to contention, a problem
known as "hot blocks".
Reverse key indexes avoid this problem by flipping the byte order of the indexed values. So instead of storing 12345,
you're effectively storing 54321. The net effect of this is it spreads new rows throughout the index.
It's rare you'll need to do this. And because the entries are not stored in their natural sort order, you can only use
these indexes for equality queries. So it's often better to solve this problem in other ways, such as hash partitioning
the index.
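Both options are sketched here against a hypothetical orders table with a sequence-assigned order_id. They are alternatives, so create one or the other. The partition count is arbitrary, and partitioning is a separately licensed option:

```sql
-- Option 1: reverse key index, spreading consecutive ids across the index
create index orders_id_rev_i on orders ( order_id ) reverse;

-- Option 2 (instead of option 1): hash partition the index
-- create index orders_id_hash_i on orders ( order_id )
--   global partition by hash ( order_id ) partitions 8;
```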
OK, so that covers the different types of index available. But we still haven't got to the bottom of which indexes you
should create for your application. It's time to figure out:
First up, it's important to know there's no "right" answer here (though there are some wrong ones ;). It's likely you'll
have many different queries against a table. If you create an index tailored to each one, you'll end up with a mass of
indexes. This is a problem because each extra index adds storage and data maintenance overheads.
So for each query you need to make a compromise between making it as fast as possible and overall system impact.
Everything is a trade-off!
The most important thing to remember is that for the database to use an index, the column(s) must be in your query!
So if a column never appears in your queries, it's not worth indexing. With a couple of caveats:
Unique indexes are really constraints. So you may need these to ensure data quality. Even if you don't query
the columns themselves.
You should index any foreign key columns where you delete from the parent table or update its primary key.
Beyond this, an index is most effective when its columns appear in your where clause to locate "few" rows in the table.
I give a brief overview of what "few" means in this video series:
A key point is it’s not so much about how many rows you get, but how many database blocks you access. Generally
an index is useful when it enables the database to touch fewer blocks than a full table scan reads. A query may return
“lots” of rows, yet access “few” database blocks.
This typically happens when logically sorting the rows on the query column(s) closely matches the physical order the
database stores them in. Indexes store the values in this logical order. So you normally want to use indexes where this
happens. The clustering factor is a measure of how closely these logical and physical orders match. The lower this
value is, the more effective the index is likely to be. And thus the more likely the optimizer is to use it.
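You can check the clustering factor in the data dictionary. This sketch assumes statistics are up to date and uses the olym_medals table as the example:

```sql
select i.index_name,
       i.clustering_factor,
       t.blocks,     -- a clustering factor near this suggests well-ordered rows
       t.num_rows    -- a clustering factor near this suggests scattered rows
from   user_indexes i
join   user_tables  t
on     t.table_name = i.table_name
where  i.table_name = 'OLYM_MEDALS';
```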
But contrary to popular belief, you don't have to have the indexed column(s) in your where clause. For example, if you
have a query like this:
If your SQL only ever uses one column of a table in the join and where clauses then you can have single column
indexes and be done.
But the real world is more complicated than that. Chances are you have relatively few basic queries like:
So it's likely you’re going to want some composite indexes. This means you need to think about which order to place
columns in the index.
Why?
Because Oracle Database reads an index starting with its leftmost (“first”) column. Then it goes onto the second, third,
etc. So if you have an index on:
Those with range conditions (<, >=, between, etc.) should go towards the end
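To illustrate the leftmost-column behaviour described above, here is a sketch with a hypothetical three-column index. All table, column and index names are assumptions:

```sql
create index orders_cust_status_date_i
  on orders ( customer_id, order_status, order_datetime );

-- Equality on the leading columns, range on the last:
-- the database can use all three index columns to narrow the search
select * from orders
where  customer_id    = :cust_id
and    order_status   = :status
and    order_datetime >= :start_date;

-- No condition on the leading column:
-- a normal range scan of this index isn't available
select * from orders
where  order_datetime >= :start_date;
```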
Important disclaimer: The findings for column order here are NOT cast-iron rules. The specifics of your data and
requirements may lead you to different conclusions and thus indexes. The important part to grasp is the process of
analyzing the trade-offs so you can find the "best" indexes for your application. If you'd like to follow along, use these
LiveSQL scripts.
Say you've got three common queries on this data:
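Judging by the discussion that follows, the three queries filter on medal, on event, and on both. A sketch, with assumed literal values:

```sql
-- 1. All winners of a given medal
select * from olym_medals where medal = 'Gold';

-- 2. All medal winners in a given event
select * from olym_medals where event = '100m';

-- 3. Winners of a given medal in a given event
select * from olym_medals where medal = 'Gold' and event = '100m';
```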
With just three values for medal and an even spread between them an index on this is unlikely to help. Queries on
event return few rows. So this is definitely worth indexing.
( event, medal )
OR
( medal, event )
For the query searching both columns it will make little difference. Both options will only hit the table for the rows
matching the where clause.
But the order affects which of the other queries will use it.
Remember Oracle Database reads index columns from left to right. So if you index (medal, event), your query looking
for all the 100m medal winners must first read the medal values.
Normally the optimizer won’t do this. But in cases where there are few values in the first column, it can bypass it in
what’s called an index skip scan.
There are only three different medals available. So this is few enough to enable a skip scan. Unfortunately this is more
work than when event is first. And, as discussed, SQL looking for the winners of a particular medal is unlikely to use
an index anyway. So an index with medal first probably isn't worth it.
Using ( event, medal ) has the advantage of a more targeted index compared to just ( event ). And it gives a tiny
reduction in the work for the query on both columns compared to an index on event alone.
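For reference, the two candidate indexes could be created as shown here. The names are the ones used in the clustering factor listing in this section:

```sql
-- Single column option
create index olym_event_i on olym_medals ( event );

-- Composite option, event first
create index olym_event_medal_i on olym_medals ( event, medal );
```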
So you've got a choice. Do you index just event, or event and medal?
The index on both columns reduces the work for queries on event and medal slightly.
As well as consuming more disk space, adding to DML overheads, etc. there's another problem with the composite
index. Compare the clustering factor for the indexes on event:
INDEX_NAME CLUSTERING_FACTOR
OLYM_EVENT_I 916
OLYM_EVENT_MEDAL_I 2219
Notice that the clustering factor for event_medal is around 2.4 times that for event!
The clustering factor is one of the key drivers determining how "attractive" an index is. The higher it is, the less likely
the optimizer is to choose it. If you're unlucky this could be enough to make the optimizer think a full table scan is
cheaper...
Sure, if you really need the SQL looking for the gold winners in given events to be ultra speedy, go for the composite
index. But in this case it may be better to go one step further. Add athlete to the index too. That way Oracle Database
can answer the query by accessing just the index. Avoiding the table access can save you some work. In most other
cases I'd stick with the single column event index.
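A sketch of that three-column index. The index name and the literal values are assumptions:

```sql
create index olym_event_medal_athlete_i
  on olym_medals ( event, medal, athlete );

-- Every column the query needs is in the index,
-- so the database has no need to visit the table
select athlete
from   olym_medals
where  event = '100m'
and    medal = 'Gold';
```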
Let's take another example. Say your dominant query is to find the winners for given events in a particular year:
select listagg(athlete,',') within group (order by athlete)
from olym_medals where edition = 2000 and event = '100m';
In most cases this will return three or six rows. Even with those pesky team events with lots of athletes getting medals
you're looking at around 100 rows tops. So a composite index is a good idea.