Cluster Joins
Imagine a filing cabinet full of personnel records. For each employee there is a separate folder containing a basic information sheet, a list of vacation days used, and other papers pertaining to the employee. If the boss wants a report that lists each employee's name, social security number, and number of vacation days used this year, you could work through the filing cabinet one file at a time. When you pull out one employee's file, you immediately have access to both the basic information sheet (which has the employee's social security number on it) and the list of vacation days used. If you imagine that the basic information is stored in one database table and the vacation days used in another, then this scene describes a cluster join.
A cluster join is really just a special case of the nested loops join. If the two row sources being joined are tables that belong to the same cluster, and if the join is an equijoin between the cluster keys of the two tables, then Oracle can use a cluster join. In this case, Oracle reads each row from the first row source and finds all matches in the second row source by using the cluster index. In the personnel cabinet example, the filing cabinet is the cluster and the individual employee folders correspond to the cluster key.
Cluster joins are extremely efficient, since the joining rows in the two row sources are located in the same physical data block. However, clusters carry certain caveats of their own, and you can't have a cluster join without a cluster. Therefore, cluster joins are not very commonly used. In a later section we'll look at the pros and cons of clusters.
Hash Joins
In a hash join, Oracle reads all of the join column values from the second row source, builds a hash table, and then probes the hash table for each of the join column values from the first row source. This is like a nested loops join, except that Oracle first builds a hash table to facilitate the operation.
Hash joins can be effective when the lack of a useful index renders nested loops joins inefficient. The hash join might be faster than a sort-merge join in this case because only one row source needs to be preprocessed (hashed) instead of both row sources being sorted, and it could be faster than a nested loops join because probing a hash table in memory can be faster than traversing a B-tree index. As with sort-merge joins and cluster joins, though, hash joins only work on equijoins. Also, as with sort-merge joins, hash joins use memory resources and can drive up I/O in the temporary tablespace. Finally, hash joins are only available in Oracle 7.3 and later, and then only when cost-based optimization is used.
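The build-then-probe behavior of a hash join can be illustrated outside the database. The following is a minimal Python sketch, not Oracle's implementation: the hash_join function and the sample rows are hypothetical, and it assumes the hash table fits entirely in memory, so there is no spill to the temporary tablespace.

```python
from collections import defaultdict

def hash_join(first_rows, second_rows, key):
    """Equijoin two lists of dict rows on `key` with an in-memory hash table."""
    # Build phase: hash every row of the second row source by its join value.
    hash_table = defaultdict(list)
    for row in second_rows:
        hash_table[row[key]].append(row)

    # Probe phase: look up each row of the first row source in the hash table.
    joined = []
    for row in first_rows:
        for match in hash_table.get(row[key], []):
            joined.append({**match, **row})
    return joined

# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": 1, "order_number": "A100"},
    {"order_id": 2, "order_number": "A101"},
]
order_lines = [
    {"order_id": 1, "line_number": 1, "quantity": 5},
    {"order_id": 1, "line_number": 2, "quantity": 3},
    {"order_id": 3, "line_number": 1, "quantity": 7},
]

result = hash_join(orders, order_lines, "order_id")
for row in result:
    print(row["order_number"], row["line_number"], row["quantity"])
```

Each row source is read only once, and no index is needed on either side. The lookup is by exact hash key, which is why the technique is limited to equijoins.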
Effective Schema Design
The database schema plays an important role in tuning joins. Certain features in the schema, or the absence of these features, will limit the options available to the optimizer for joining tables efficiently. Also, poorly placed features in the schema can fool the optimizer into making a poor decision.
All primary and unique keys should be identified and the appropriate constraints declared. Oracle then creates unique indexes on the primary and unique key columns. Indexes should usually be created explicitly on foreign key columns as well, both to facilitate joins and to eliminate locking problems when Oracle enforces referential integrity constraints.
Consider an orders table with primary key order_id, and an order_lines table with primary key order_line_id and foreign key order_id referencing the orders table.
SELECT O.order_number, L.line_number, L.item_number, L.quantity
FROM orders O, order_lines L
WHERE O.order_id = :order_id
AND L.order_id = O.order_id
ORDER BY O.order_number, L.line_number
If the database will contain many orders, then the order_id column in the order_lines table is very selective and should be indexed. This index, plus the unique index on the order_id column of the orders table, will allow Oracle to execute this query extremely efficiently using a nested loops join.
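To see why the two indexes make this query cheap, the access path can be imitated with in-memory lookups. This is only a rough Python sketch under stated assumptions: dictionaries stand in for the B-tree indexes, the sample rows and the lines_for_order function are hypothetical, and nothing here reflects how Oracle stores or reads index blocks.

```python
from collections import defaultdict

# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": 1, "order_number": "A100"},
    {"order_id": 2, "order_number": "A101"},
]
order_lines = [
    {"order_line_id": 11, "order_id": 1, "line_number": 2,
     "item_number": "GADGET", "quantity": 3},
    {"order_line_id": 10, "order_id": 1, "line_number": 1,
     "item_number": "WIDGET", "quantity": 5},
    {"order_line_id": 12, "order_id": 2, "line_number": 1,
     "item_number": "WIDGET", "quantity": 7},
]

# Stand-in for the unique index on orders.order_id: one row per key.
orders_pk_index = {row["order_id"]: row for row in orders}

# Stand-in for the nonunique index on order_lines.order_id: many rows per key.
order_lines_fk_index = defaultdict(list)
for row in order_lines:
    order_lines_fk_index[row["order_id"]].append(row)

def lines_for_order(order_id):
    """Nested loops: one unique lookup, then one indexed lookup per outer row."""
    order = orders_pk_index.get(order_id)      # WHERE O.order_id = :order_id
    if order is None:
        return []
    rows = [
        (order["order_number"], line["line_number"],
         line["item_number"], line["quantity"])
        for line in order_lines_fk_index.get(order_id, [])  # L.order_id = O.order_id
    ]
    # ORDER BY O.order_number, L.line_number
    return sorted(rows, key=lambda r: (r[0], r[1]))

print(lines_for_order(1))
```

In this sketch neither table is scanned: the work is proportional to the number of lines on the requested order, not to the size of order_lines.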