
Table of Contents

Table Cluster
Index Cluster
HASH Cluster
Alter Statement:
Validating Tables (IOTs), Indexes, Clusters, and Materialized Views
Indexes

Table Cluster
Key Notes:
1. A table cluster is a schema object that contains data from one or more tables, all of which have one or more columns in common. The database physically stores together, in the same data block, all the rows from all tables that share the same cluster key.
2. The cluster key is the column or columns that the clustered tables have in common. You can specify up to 16 cluster key columns.
3. Restrictions on cluster data types:
   - You cannot specify a cluster key column of data type LONG, LONG RAW, REF, nested table, varray, BLOB, CLOB, BFILE, the Any* Oracle-supplied types, or a user-defined object type.
   - You can specify a column of type ROWID, but Oracle Database does not guarantee that the values in such columns are valid rowids.
4. The cluster key value is the value of the cluster key columns for a particular set of rows.
5. Each cluster key value is stored only once in the cluster and the cluster index, no matter how many rows of different tables contain the value.
6. Cluster those tables that are primarily queried together using joins and not modified frequently.
7. After you create a cluster, you add tables to it. A cluster can contain a maximum of 32 tables. Object tables and tables containing LOB columns or columns of the Any* Oracle-supplied types cannot be part of a cluster.
8. Guidelines for clustering tables:
   - Cluster tables that the application accesses frequently in join statements.
   - Clusters are not appropriate when the tables are frequently updated, frequently require a full table scan, or require truncating.
   - Do not cluster tables where the cluster key values are modified frequently. Modifying a row's cluster key value takes longer than modifying the value in an unclustered table, because Oracle Database might need to migrate the modified row to another block to maintain the cluster.
   - Do not cluster tables if the application often performs full table scans of only one of the tables. A full table scan of a clustered table can take longer than a full table scan of an unclustered table.
   - Store a detail table alone in a cluster if you often select many detail records of the same master. This measure improves the performance of queries that select detail records of the same master, but does not decrease the performance of a full table scan on the master table. An alternative is to use an index-organized table.
   - Do not cluster tables if the data from all tables with the same cluster key value exceeds one or two data blocks. A good cluster key has enough unique values so that the group of rows corresponding to each key value fills approximately one data block.
   - Do not cluster tables when the number of rows for each cluster key value varies significantly. This wastes space for the low-cardinality key values and causes collisions for the high-cardinality key values. Collisions degrade performance.
9. Advantages of clustering:
   - Disk I/O is reduced for joins of clustered tables.
   - Access time improves for joins of clustered tables.
   - Less storage is required to store related table and index data because the cluster key value is not stored repeatedly for each row.
10. A clustered table can be in a different schema than the schema containing the cluster. Also, the names of the columns are not required to match, but their structure (data type and size) must match.
11. A cluster index cannot be unique or include a column defined as LONG.
12. Columns that are suitable for indexing or cluster keys. Columns with one or more of the following characteristics are candidates for indexing:
   - Values are relatively unique in the column.
   - There is a wide range of values (good for regular indexes).
   - There is a small range of values (good for bitmap indexes).
   - The column contains many nulls, but queries often select all rows having a value.
   Note that LONG and LONG RAW columns cannot be indexed.
13. By default, the database stores only one cluster key and its associated rows in each data block of the cluster data segment. The rule of one key for each block is maintained as clustered tables are imported to other databases on other systems. If all the rows for a given cluster key value cannot fit in one block, the blocks are chained together to speed access to all the values with the given key. If the cluster SIZE is such that multiple keys fit in a block, then blocks can belong to multiple chains.
14. Set the storage parameters for the data segments of a cluster using the STORAGE clause of the CREATE CLUSTER or ALTER CLUSTER statement. The storage parameters set for the cluster override the table storage parameters.

15. Cluster data dictionary views:
   - DBA_CLUSTERS / ALL_CLUSTERS / USER_CLUSTERS
   - DBA_CLU_COLUMNS / USER_CLU_COLUMNS (these views map table columns to cluster columns)
   - DBA_CLUSTER_HASH_EXPRESSIONS / ALL_CLUSTER_HASH_EXPRESSIONS / USER_CLUSTER_HASH_EXPRESSIONS (these views list hash functions for hash clusters)
16. Dropping clusters and related objects:
   Dropping a cluster with the INCLUDING TABLES clause also drops the tables within the cluster and the corresponding cluster index. Without that clause, the cluster must not contain any tables.
   DROP CLUSTER emp_dept;
   DROP CLUSTER emp_dept INCLUDING TABLES;
   If one or more tables in a cluster contain primary or unique keys that are referenced by FOREIGN KEY constraints of tables outside the cluster, the cluster cannot be dropped unless the dependent FOREIGN KEY constraints are also dropped.
   DROP CLUSTER emp_dept INCLUDING TABLES CASCADE CONSTRAINTS;
   When you drop a single table from a cluster using DROP TABLE, the database deletes each row of the table individually.

CLUSTER INDEX: A cluster index can be dropped without affecting the cluster or its clustered tables. However, clustered tables cannot be used if there is no cluster index; you must re-create the cluster index to allow access to the cluster.
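As a hedged illustration of the dictionary views listed in item 15, the following queries sketch how a cluster could be inspected from the owning schema. The cluster name EMP_DEPT is borrowed from the examples in these notes, and the selected columns are only a subset of what the views expose.

```sql
-- Basic attributes of the clusters owned by the current user.
SELECT cluster_name, tablespace_name, cluster_type, key_size, hashkeys
  FROM user_clusters;

-- How table columns map to cluster key columns for one cluster
-- (EMP_DEPT is the example cluster used elsewhere in these notes).
SELECT cluster_name, clu_column_name, table_name, tab_column_name
  FROM user_clu_columns
 WHERE cluster_name = 'EMP_DEPT';

-- Hash expressions defined with HASH IS (relevant to hash clusters only).
SELECT cluster_name, hash_expression
  FROM user_cluster_hash_expressions;
```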

SORT: The SORT keyword is valid only if you are creating a hash cluster. This clause instructs Oracle Database to sort the rows of the cluster on this column after applying the hash function when performing a DML operation.

SIZE: The SIZE clause controls how many cluster keys are stored in each data block allocated to the cluster. Specify the amount of space, in bytes, reserved to store all rows with the same cluster key value or the same hash value. This space determines the maximum number of cluster or hash values stored in a data block.
   - You can change the SIZE parameter only for an indexed cluster, not for a hash cluster.
   - For a hash cluster, set SIZE to the average amount of space required to hold all rows for any given hash key.
   - If SIZE is not a divisor of the data block size, then Oracle Database uses the next largest divisor. If SIZE is larger than the data block size, then the database uses the operating system block size, reserving at least one data block for each cluster or hash value.
   - The database also considers the length of the cluster key when determining how much space to reserve for the rows having a cluster key value. Larger cluster keys require larger sizes. To see the actual size, query the KEY_SIZE column of the USER_CLUSTERS data dictionary view. (This value does not apply to hash clusters, because hash values are not actually stored in the cluster.)
   - If you omit this parameter, then the database reserves one data block for each cluster key value or hash value.
   - If the calculated SIZE value is small (more than four hash keys can be assigned for each data block), you can use this value for SIZE in the CREATE CLUSTER statement. However, if the value of SIZE is large (four or fewer hash keys can be assigned for each data block), then you should also consider the expected frequency of collisions and whether performance of data retrieval or efficiency of space usage is more important to you.

If you expect frequent collisions on inserts, the likelihood of overflow blocks being allocated to store rows is high. To reduce the possibility of overflow blocks and maximize performance when collisions are frequent, adjust SIZE as shown in the following chart.

   Available Space for each Block / Calculated SIZE     Setting for SIZE
   1                                                     SIZE
   2                                                     SIZE + 15%
   3                                                     SIZE + 12%
   4                                                     SIZE + 8%
   >4                                                    SIZE

Overestimating the value of SIZE increases the amount of unused space in the cluster.

INDEX: Specify INDEX to create an indexed cluster.

HASHKEYS: Specify the HASHKEYS clause to create a hash cluster and specify the number of hash values for the hash cluster. In a hash cluster, Oracle Database stores together rows that have the same hash key value. The hash value for a row is the value returned by the hash function of the cluster. Oracle Database rounds up the HASHKEYS value to the nearest prime number to obtain the actual number of hash values. The minimum value for this parameter is 2.

SINGLE TABLE: SINGLE TABLE indicates that the cluster is a type of hash cluster containing only one table. This clause can provide faster access to rows in the table.

allocate_extent_clause: Specify the allocate_extent_clause to explicitly allocate a new extent for the cluster. When you explicitly allocate an extent with this clause, Oracle Database does not evaluate the storage parameters of the cluster and determine a new size for the next extent to be allocated (as it does when you create a table). Therefore, specify SIZE if you do not want Oracle Database to use a default value. You can allocate a new extent only for an indexed cluster, not for a hash cluster.

PARALLEL: Specifies the degree of parallelism. If the tables in the cluster contain any columns of LOB or user-defined object type, this statement as well as subsequent INSERT, UPDATE, or DELETE operations on the cluster are executed serially without notification.

Restriction on single-table clusters: Only one table can be present in the cluster at a time. However, you can drop the table and create a different table in the same cluster.

HASH IS expr: Specify an expression to be used as the hash function for the hash cluster. The expression:
   - Must evaluate to a positive value
   - Must contain at least one column, with referenced columns of any data type as long as the entire expression evaluates to a number of scale 0. For example: number_column * LENGTH(varchar2_column)
   - Cannot reference user-defined PL/SQL functions
   - Cannot reference the pseudocolumns LEVEL or ROWNUM
   - Cannot reference the user-related functions USERENV, UID, or USER or the datetime functions CURRENT_DATE, CURRENT_TIMESTAMP, DBTIMEZONE, EXTRACT (datetime), FROM_TZ, LOCALTIMESTAMP, NUMTODSINTERVAL, NUMTOYMINTERVAL, SESSIONTIMEZONE, SYSDATE, SYSTIMESTAMP, TO_DSINTERVAL, TO_TIMESTAMP, TO_DATE, TO_TIMESTAMP_TZ, TO_YMINTERVAL, and TZ_OFFSET
   - Cannot evaluate to a constant
   - Cannot be a scalar subquery expression
   - Cannot contain columns qualified with a schema or object name (other than the cluster name)
If you omit the HASH IS clause, then Oracle Database uses an internal hash function for the hash cluster. Hash clusters with composite cluster keys or cluster keys made up of non-integer columns must use the internal hash function.

CACHE | NOCACHE:
   - CACHE: Specify CACHE if you want the blocks retrieved for this cluster to be placed at the most recently used end of the least recently used (LRU) list in the buffer cache when a full table scan is performed. This clause is useful for small lookup tables.
   - NOCACHE: Specify NOCACHE if you want the blocks retrieved for this cluster to be placed at the least recently used end of the LRU list in the buffer cache when a full table scan is performed. This is the default behavior. NOCACHE has no effect on clusters for which you specify KEEP in the storage_clause.

Default values:
   - PCTFREE: 10
   - PCTUSED: 40
   - INITRANS: 2 or the default value of the tablespace to contain the cluster, whichever is greater

Index Cluster
Key Notes:
1. An indexed cluster is a table cluster that uses an index to locate data. The cluster index is a B-tree index on the cluster key. A cluster index must be created before any rows can be inserted into clustered tables.
2. The database stores the rows in a heap and locates them with the index.
3. Oracle Database does not automatically create an index for a cluster when the cluster is initially created. Data manipulation language (DML) statements cannot be issued against cluster tables in an indexed cluster until you create a cluster index with a CREATE INDEX statement.
4. If you specify neither INDEX nor HASHKEYS in CREATE CLUSTER, an indexed cluster is created by default. The key values are ordered in the index.
5. After you create an indexed cluster, you must create an index on the cluster key before you can issue any DML statements against a table in the cluster. This index is called the cluster index.
6. A cluster index must be created before any rows can be inserted into any clustered table.
7. The B-tree cluster index associates the cluster key value with the database block address (DBA) of the block containing the data.
8. Creating a cluster and its index in different tablespaces that are stored on different storage devices allows table data and index data to be retrieved simultaneously with minimal disk contention. Hence, always specify tablespaces for the cluster and its cluster index.
9. To find or store a row in an indexed table or table cluster, the database must perform at least two I/Os:
   a. One or more I/Os to find or store the key value in the index
   b. Another I/O to read or write the row in the table or table cluster
-- Employee/department cluster.
CREATE CLUSTER emp_dept (deptno NUMBER(3))
   SIZE 600
   TABLESPACE users
   STORAGE (INITIAL 200K NEXT 300K);

-- dept is created before emp because emp.deptno references it.
CREATE TABLE dept (
   deptno NUMBER(3) PRIMARY KEY,
   . . . )
   CLUSTER emp_dept (deptno);

CREATE TABLE emp (
   empno  NUMBER(5) PRIMARY KEY,
   ename  VARCHAR2(15) NOT NULL,
   . . .
   deptno NUMBER(3) REFERENCES dept)
   CLUSTER emp_dept (deptno);

-- The cluster index must exist before any DML is issued against emp or dept.
CREATE INDEX emp_dept_index
   ON CLUSTER emp_dept
   TABLESPACE users
   STORAGE (INITIAL 50K);
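The kind of query an indexed cluster is meant to help can be sketched as follows. This is only an illustration that assumes the emp and dept tables of the emp_dept cluster exist with the columns shown in these notes; the department number 10 is a hypothetical value.

```sql
-- Join on the cluster key: the dept row and the emp rows for one department
-- share the same cluster key value, so they are stored together on disk.
SELECT d.deptno, e.empno, e.ename
  FROM dept d
  JOIN emp e ON e.deptno = d.deptno
 WHERE d.deptno = 10;
```

The cluster index on emp_dept is used to locate the block for the key value, after which the rows of both tables for that key can be read from the same block or chain of blocks.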






HASH Cluster
Key Notes:

1. In a hash cluster, which can contain one or more tables, Oracle Database stores together rows that have the same hash key value. A hash cluster is like an indexed cluster, except the index key is replaced with a hash function. No separate cluster index exists. In a hash cluster, the data is the index.
2. Even if you decide to use hashing, a table can still have separate indexes on any columns, including the cluster key. However, you need not create an index on a hash cluster key.
3. The hash key values are actual or possible values inserted into the cluster key column. For example, if the cluster key is department_id, then hash key values could be 10, 20, 30, and so on. Oracle Database uses a hash function that accepts an infinite number of hash key values as input and sorts them into a finite number of buckets. Each bucket has a unique numeric ID known as a hash value. Each hash value maps to the database block address for the block that stores the rows corresponding to the hash key value (department 10, 20, 30, and so on). The number of hash values for the cluster depends on the hash key.
4. A hash cluster with a composite key must use the internal hash function of the database.
5. Specify the HASH IS parameter only if the cluster key is a single column of the NUMBER data type and contains uniformly distributed integers.
6. Hash clusters suit tables that are primarily static in size, so that you can determine the number of rows and amount of space required for the tables in the cluster. If tables in a hash cluster require more space than the initial allocation for the cluster, performance degradation can be substantial because overflow blocks are required.
7. When you create a hash cluster, the database immediately allocates space for the cluster based on the values of the SIZE and HASHKEYS parameters.
8. When to use a hash cluster:
   - Most queries are equality queries on the cluster key:
     SELECT ... WHERE cluster_key = ...;
     In such cases, the cluster key in the equality condition is hashed, and the corresponding hash key is usually found with a single read. In comparison, for an indexed table the key value must first be found in the index (usually several reads), and then the row is read from the table (another read).
   - You can determine how much space is required to hold all rows with a given cluster key value, including rows to be inserted immediately and rows to be inserted in the future.
   - Use sorted hash clusters, where rows corresponding to each value of the hash function are sorted on a specific column in ascending order, when the database can improve response time on operations with this sorted clustered data.
9. When not to use a hash cluster:
   - Most queries on the table retrieve rows over a range of cluster key values. For example, in full table scans or queries such as the following, a hash function cannot be used to determine the location of specific hash keys. Instead, the equivalent of a full table scan must be done to fetch the rows for the query.
     SELECT . . . WHERE cluster_key < . . . ;
     With an index, key values are ordered in the index, so cluster key values that satisfy the WHERE clause of a query can be found with relatively few I/Os.
   - The table is not static, but instead is continually growing. If a table grows without limit, the space required over the life of the table (its cluster) cannot be predetermined.
   - Applications frequently perform full table scans on the table and the table is sparsely populated. A full table scan in this situation takes longer under hashing.
   - You cannot afford to pre-allocate the space that the hash cluster will eventually need.
   - The application often performs full table scans and you must allocate a great deal of space to the hash cluster in anticipation of the table growing. Such full table scans must read all blocks allocated to the hash cluster, even though some blocks might contain few rows. Storing the table alone reduces the number of blocks read by full table scans.
   - The application frequently modifies the cluster key values. Modifying a row's cluster key value can take longer than modifying the value in an unclustered table, because Oracle Database might need to migrate the modified row to another block to maintain the cluster.
10. To find or store a row in a hash cluster, Oracle Database applies the hash function to the cluster key value of the row. The resulting hash value corresponds to a data block in the cluster, which the database reads or writes on behalf of the issued statement.

11. A limitation of hash clusters is the unavailability of range scans on non-indexed cluster keys. A query for departments with IDs between 20 and 100 cannot use the hashing algorithm because it cannot hash every possible value between 20 and 100. Because no index exists, the database must perform a full scan.
12. Hash cluster variations:
   a. A single-table hash cluster is an optimized version of a hash cluster that supports only one table at a time. A one-to-one mapping exists between hash keys and rows. A single-table hash cluster can be beneficial when users require rapid access to a table by primary key. For example, users often look up an employee record in the employees table by employee_id.
CREATE CLUSTER <schema_name>.<cluster_name> (
   <cluster_key_column_name> <data_type>)
   PCTFREE <integer>
   PCTUSED <integer>
   INITRANS <integer>
   MAXTRANS <integer>
   SIZE <integer><K | M | G | T>
   TABLESPACE <tablespace_name>
   SINGLE TABLE
   HASHKEYS <integer>
   HASH IS <expression>
   PARALLEL <integer>
   <NOROWDEPENDENCIES | ROWDEPENDENCIES>
   <CACHE | NOCACHE>;

b. A sorted hash cluster stores the rows corresponding to each value of the hash function in such a way that the database can efficiently return them in sorted order. The database performs the optimized sort internally.
CREATE CLUSTER <schema_name>.<cluster_name> (
   <cluster_key_column_name> <data_type> <SORT>)
   PCTFREE <integer>
   PCTUSED <integer>
   INITRANS <integer>
   MAXTRANS <integer>
   SIZE <integer><K | M | G | T>
   TABLESPACE <tablespace_name>
   <SINGLE TABLE>
   HASHKEYS <integer>
   HASH IS <expression>
   PARALLEL <integer>
   <NOROWDEPENDENCIES | ROWDEPENDENCIES>
   <CACHE | NOCACHE>;
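To make the sorted hash cluster template more concrete, here is a minimal sketch. All object names, column names, and the HASHKEYS and SIZE values are illustrative assumptions, not taken from these notes: the hash key is telephone_number, and the rows for each number are kept sorted on the columns marked SORT.

```sql
-- Sorted hash cluster: one hash key column plus two SORT columns.
CREATE CLUSTER call_detail_cluster (
   telephone_number NUMBER,
   call_timestamp   NUMBER SORT,
   call_duration    NUMBER SORT)
   HASHKEYS 10000
   HASH IS telephone_number
   SIZE 256;

-- Table stored in the cluster; cluster columns must match in type and order.
CREATE TABLE call_detail (
   telephone_number NUMBER,
   call_timestamp   NUMBER SORT,
   call_duration    NUMBER SORT,
   other_info       VARCHAR2(30))
   CLUSTER call_detail_cluster (telephone_number, call_timestamp, call_duration);

-- Equality lookup on the hash key; the ORDER BY follows the SORT columns,
-- which is the case the database can satisfy without a separate sort step.
SELECT *
  FROM call_detail
 WHERE telephone_number = 6505551212
 ORDER BY call_timestamp, call_duration;
```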

13. Hash cluster storage: Oracle Database allocates space for a hash cluster differently from an indexed cluster. For a hash cluster keyed on department_id, HASHKEYS specifies the number of departments likely to exist, whereas SIZE specifies the size of the data associated with each department. The database computes a storage space value, in data blocks, based on the following formula:
HASHKEYS * SIZE / database_block_size
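As a worked instance of the formula, the arithmetic can be checked directly in SQL. The 8192-byte block size is an assumption for illustration; the HASHKEYS and SIZE values are those of the employees_departments_cluster and language examples in these notes. The result is only an estimate, since the database rounds HASHKEYS up to a prime number.

```sql
-- HASHKEYS 100, SIZE 8192, assumed block size 8192: about 100 blocks.
SELECT 100 * 8192 / 8192 AS estimated_blocks FROM dual;

-- HASHKEYS 10, SIZE 512, assumed block size 8192: evaluates to 0.625,
-- meaning all ten hash values fit in less than one block.
SELECT 10 * 512 / 8192 AS estimated_blocks FROM dual;
```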

14. Oracle Database does not limit the number of hash key values that you can insert into the cluster even though HASHKEYS specifies a limit.
15. Hash collision example: A user inserts a new department with department_id 43 into the departments table. The number of departments exceeds the HASHKEYS value, so the database hashes department_id 43 to hash value 77, which is the same hash value used for department_id 20. Hashing multiple input values to the same output value is called a hash collision. When users insert rows into the cluster for department 43, the database cannot store these rows in block 100, which is full. The database links block 100 to a new overflow block, say block 200, and stores the inserted rows in the new block. Both block 100 and block 200 are now eligible to store data for either department. A query of either department 20 or 43 now requires two I/Os to retrieve the data: block 100 and its associated block 200. You can solve this problem by re-creating the cluster with a different HASHKEYS value.

Examples:
The following statement creates a hash cluster named language with the cluster key column cust_language, a maximum of 10 hash key values, each of which is allocated 512 bytes, and storage parameter values:
CREATE CLUSTER language (cust_language VARCHAR2(3)) SIZE 512 HASHKEYS 10 STORAGE (INITIAL 100k next 50k);

Because the preceding statement omits the HASH IS clause, Oracle Database uses the internal hash function for the cluster.
CREATE CLUSTER employees_departments_cluster (department_id NUMBER(4)) SIZE 8192 HASHKEYS 100;

The following statement creates a hash cluster named address with the cluster key made up of the columns postal_code and country_id, and uses a SQL expression containing these columns for the hash function:
CREATE CLUSTER address (postal_code NUMBER, country_id CHAR(2)) HASHKEYS 20 HASH IS MOD(postal_code + country_id, 101);

Single-Table Hash Clusters: The following statement creates a single-table hash cluster named cust_orders with the cluster key customer_id and a maximum of 100 hash key values, each of which is allocated 512 bytes:
CREATE CLUSTER cust_orders (customer_id NUMBER(6)) SIZE 512 SINGLE TABLE HASHKEYS 100;
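A table can then be placed in the single-table hash cluster. The sketch below assumes the cust_orders cluster above has been created; the table name and its non-key column are hypothetical, and the cluster key column must match the cluster definition (NUMBER(6)).

```sql
-- The only table allowed in the cust_orders single-table hash cluster.
-- orders_by_customer and order_total are illustrative names.
CREATE TABLE orders_by_customer (
   customer_id NUMBER(6) PRIMARY KEY,
   order_total NUMBER(10,2))
   CLUSTER cust_orders (customer_id);

-- Equality lookup on the cluster key, the access pattern hashing is meant for.
SELECT order_total
  FROM orders_by_customer
 WHERE customer_id = 101;
```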

Create Hash Cluster With Hash Expression

CREATE CLUSTER cl_address (postal_code NUMBER, country_id VARCHAR(2)) HASHKEYS 16 HASH IS MOD(postal_code + country_id, 101);

Alter Statement:
Use the ALTER CLUSTER statement to redefine storage and parallelism characteristics of a cluster. You cannot use this statement to change the number or the name of columns in the cluster key, and you cannot change the tablespace in which the cluster is stored. You also cannot change the values of the storage parameters INITIAL and MINEXTENTS for a cluster.


Altering a CLUSTER: You can alter the following characteristics of an existing cluster:
   - Physical attributes (INITRANS and storage characteristics)
   - The default degree of parallelism
   - Allocation of a new extent for the cluster
   - Deallocation of any unused extents at the end of the cluster
   - The average amount of space required to store all the rows for a cluster key value (SIZE)

SIZE determines the estimated number of keys accommodated per block. When you alter the SIZE parameter of a cluster, the new setting applies to all data blocks used by the cluster, including blocks already allocated and blocks subsequently allocated for the cluster. Blocks already allocated for the table are reorganized when necessary (not immediately).

For HASH CLUSTER: The SIZE, HASHKEYS, and HASH IS parameters cannot be specified in an ALTER CLUSTER statement. To change these parameters, you must re-create the cluster and then copy the data from the original cluster.

ALTER CLUSTERED Table: You can use the ALTER TABLE statement on a clustered table only to:
   - Add or modify columns
   - Drop non-cluster-key columns
   - Add, drop, enable, or disable integrity constraints or triggers

Example:
ALTER CLUSTER language DEALLOCATE UNUSED KEEP 30 K; -- Keeps 30 kilobytes of unused space for future use.
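A few further ALTER CLUSTER statements, sketched against the indexed cluster emp_dept from these notes. The specific values (1024 bytes, 100K, degree 4) are illustrative assumptions.

```sql
-- Change the space reserved per cluster key (indexed clusters only).
ALTER CLUSTER emp_dept SIZE 1024;

-- Explicitly allocate a new extent (indexed clusters only).
ALTER CLUSTER emp_dept ALLOCATE EXTENT (SIZE 100K);

-- Change the default degree of parallelism.
ALTER CLUSTER emp_dept PARALLEL 4;
```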


The REUSE STORAGE option of the TRUNCATE statement specifies that all space currently allocated for the table or cluster remains allocated to it. A hash cluster cannot be truncated, nor can tables within a hash or index cluster be individually truncated. Truncation of an index cluster deletes all rows from all tables in the cluster. If all the rows must be deleted from an individual clustered table, use the DELETE statement or drop and re-create the table.

Validating Tables (IOTs), Indexes, Clusters, and Materialized Views
ANALYZE TABLE emp VALIDATE STRUCTURE; -- Validates the object only.
ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE; -- Validates the object and all dependent objects, including indexes.

Because the CASCADE operation can be resource intensive, you can perform a faster version of the validation by using the FAST clause. This version checks for the existence of corruptions using an optimized check algorithm, but does not report details about the corruption.

You can specify that you want to perform structure validation online while DML is occurring against the object being validated.
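The FAST and ONLINE variants described above can be sketched as follows, reusing the emp table and emp_dept cluster from these notes. The ONLINE form is shown for the table only, on the assumption that online validation is not available for clusters.

```sql
-- Faster check: detects corruption but does not report details.
ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE FAST;

-- Validate while DML continues against the table.
ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE ONLINE;

-- Validate a cluster together with its tables and cluster index.
ANALYZE CLUSTER emp_dept VALIDATE STRUCTURE CASCADE;
```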

Listing Chained Rows of Tables (IOTs) and Clusters: You can look at the chained and migrated rows of a table or cluster using the ANALYZE statement with the LIST CHAINED ROWS clause. The results of this statement are stored in a table created explicitly to accept the information returned by the LIST CHAINED ROWS clause. This table is named CHAINED_ROWS; create it by executing the UTLCHAIN.SQL or UTLCHN1.SQL script.
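A minimal sketch of listing chained rows for a cluster, assuming the CHAINED_ROWS table has already been created by one of the scripts named above and that the emp_dept cluster from these notes exists.

```sql
-- Record the chained and migrated rows of the cluster's tables.
ANALYZE CLUSTER emp_dept LIST CHAINED ROWS INTO chained_rows;

-- Inspect what was recorded; head_rowid identifies each affected row.
SELECT owner_name, table_name, cluster_name, head_rowid
  FROM chained_rows
 WHERE cluster_name = 'EMP_DEPT';
```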

Eliminating Migrated or Chained Rows in a Table: 1. Use the ANALYZE statement to collect information about migrated and chained rows.

2. Query the output table:


3. If the output table shows that you have many migrated or chained rows, then you can eliminate migrated rows by continuing through the following steps.
4. Create an intermediate table with the same columns as the existing table to hold the migrated and chained rows:
CREATE TABLE int_order_hist AS SELECT * FROM order_hist WHERE ROWID IN (SELECT head_rowid FROM chained_rows WHERE table_name = 'ORDER_HIST');


5. Delete the migrated and chained rows from the existing table:

6. Insert the rows of the intermediate table into the existing table:
INSERT INTO order_hist SELECT * FROM int_order_hist;

7. Drop the intermediate table:

DROP TABLE int_order_hist;

8. Delete the information collected in step 1 from the output table:


9. Use the ANALYZE statement again, and query the output table. Any rows that still appear in the output table are chained rows.

You can eliminate chained rows only by increasing your data block size. It might not be possible to avoid chaining in all situations. Chaining is often unavoidable with tables that have a LONG column or large CHAR or VARCHAR2 columns.
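The whole procedure can be sketched end to end as below. The statements for the steps that appear without SQL in these notes are assumptions: they rely on the CHAINED_ROWS table created by UTLCHAIN.SQL (with its table_name and head_rowid columns) and on the order_hist and int_order_hist names used in the other steps.

```sql
-- Step 1: collect information about migrated and chained rows.
ANALYZE TABLE order_hist LIST CHAINED ROWS INTO chained_rows;

-- Step 2: query the output table.
SELECT * FROM chained_rows WHERE table_name = 'ORDER_HIST';

-- Step 4: copy the affected rows into an intermediate table.
CREATE TABLE int_order_hist AS
   SELECT * FROM order_hist
    WHERE ROWID IN (SELECT head_rowid FROM chained_rows
                     WHERE table_name = 'ORDER_HIST');

-- Step 5: delete the affected rows from the existing table.
DELETE FROM order_hist
 WHERE ROWID IN (SELECT head_rowid FROM chained_rows
                  WHERE table_name = 'ORDER_HIST');

-- Step 6: re-insert them; freshly inserted rows are no longer migrated.
INSERT INTO order_hist SELECT * FROM int_order_hist;

-- Step 7: drop the intermediate table.
DROP TABLE int_order_hist;

-- Step 8: clear the information collected in step 1.
DELETE FROM chained_rows WHERE table_name = 'ORDER_HIST';
COMMIT;

-- Step 9: analyze again; rows listed now are truly chained, not migrated.
ANALYZE TABLE order_hist LIST CHAINED ROWS INTO chained_rows;
```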

Gathering Statistics: You can use the DBMS_STATS package or the ANALYZE statement to gather statistics about the physical storage characteristics of a table, index, or cluster. Oracle recommends using the more versatile DBMS_STATS package for gathering optimizer statistics, but you must use the ANALYZE statement to collect statistics unrelated to the optimizer, such as empty blocks, average space, and so forth. The DBMS_STATS package allows both the gathering of statistics, including utilizing parallel execution, and the external manipulation of statistics. The following DBMS_STATS procedures enable the gathering of optimizer statistics:
   - GATHER_INDEX_STATS
   - GATHER_TABLE_STATS
   - GATHER_SCHEMA_STATS
   - GATHER_DATABASE_STATS
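A minimal sketch of gathering optimizer statistics with DBMS_STATS for a single table. The table name EMP and the degree of 4 are illustrative assumptions; the block is run from the schema that owns the table.

```sql
BEGIN
   -- Gather optimizer statistics for one table and, via cascade, its indexes.
   DBMS_STATS.GATHER_TABLE_STATS(
      ownname => USER,    -- current schema
      tabname => 'EMP',   -- example table from these notes
      cascade => TRUE,    -- also gather statistics on the table's indexes
      degree  => 4);      -- assumed degree of parallelism
END;
/
```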

Note:
   a. Do not use the COMPUTE and ESTIMATE clauses of ANALYZE to collect optimizer statistics.
   b. The cost-based optimizer, which depends upon statistics, will eventually use only statistics that have been collected by DBMS_STATS.
   c. You must use the ANALYZE statement (rather than DBMS_STATS) for statistics collection not related to the cost-based optimizer, such as:
      - To use the VALIDATE or LIST CHAINED ROWS clauses
      - To collect information on free-list blocks