DBMS

1. Introduction

A Database is a collection of interrelated data, and a Database Management System (DBMS) is a set of programs to use and/or modify this data.

1.1 Approaches to Data Management

• File-Based Systems: Conventionally, before database systems evolved, data in software systems was stored in and represented using flat files.

• Database Systems: Database systems evolved in the late 1960s to address common issues in applications handling large volumes of data which are also data intensive. Some of these issues could be traced back to the following disadvantages of file-based systems.

Drawbacks of File-Based Systems

As shown in the figure, in a file-based system, different programs in the same application may be interacting with different private data files. There is no system enforcing any standardized control on the organization and structure of these data files.

• Data Redundancy and Inconsistency: Since data resides in different private data files, there are chances of redundancy and resulting inconsistency. For example, in the example shown above, the same customer can have a savings account as well as a mortgage loan. The customer details may be duplicated, since the programs for the two functions store their corresponding data in two different data files. This gives rise to redundancy in the customer's data. And since the same data is stored in two files, inconsistency arises if a change made in the data in one file is not reflected in the other.

• Unanticipated Queries

In a file-based system, handling sudden/ad-hoc queries is difficult, since it requires changes to the existing programs.

• Data Isolation: Though data used by different programs in the application may be related, they reside in isolated data files.

• Concurrent Access Anomalies: In large multi-user systems, the same file or record may need to be accessed by multiple users simultaneously. Handling this in a file-based system is difficult.

• Security Problems: In data-intensive applications, security of data is a major concern. Users should be given access only to required data and not to the whole database. In a file-based system, this can be handled only by additional programming in each application.

• Integrity Problems: In any application, there will be certain data integrity rules which need to be maintained. These could be in the form of conditions/constraints on the elements of the data records. In the savings bank application, one such integrity rule could be: "Customer ID, which is the unique identifier for a customer record, should be non-empty." There can be several such integrity rules. In a file-based system, all these rules need to be explicitly programmed in the application program.

It may be noted that we are not saying that handling the above issues — concurrent access, security, integrity and so on — is impossible in a file-based system. The real issue is that, though these are common concerns for any data-intensive application, each application had to handle all of them on its own. The application programmer has to worry not only about implementing the application business rules but also about handling these common issues.
1.2 Advantages of Database Systems

As shown in the figure, the DBMS is a central system which provides a common interface between the data and the various front-end programs in the application. It also provides a central location for the whole data in the application to reside.

Due to its centralized nature, the database system can overcome the disadvantages of the file-based system, as discussed below.

• Minimal Data Redundancy

Since the whole data resides in one central database, the various programs in the application access data from this common store; data needed by one program need not be duplicated in a private file of another. This reduces data redundancy. However, this does not mean all redundancy can be eliminated. There could be business or technical reasons for having some amount of redundancy. Any such redundancy should be carefully controlled and the DBMS should be aware of it.

• Data Consistency

Reduced data redundancy leads to better data consistency. • Data Integration

Since related data is stored in one single database, enforcing data integrity is much easier. Moreover, the functions in the DBMS can be used to enforce the integrity rules with minimum programming in the application programs. • Data Sharing

Related data can be shared across programs since the data is stored in a centralized manner. Even new applications can be developed to operate against the same data. • Enforcement of Standards

Enforcing standards in the organization and structure of data files is required, and is also easy in a database system, since it is one single set of programs which always interacts with the data files.

• Application Development Ease

The application programmer need not build the functions for handling issues like concurrent access, security, data integrity, etc. The programmer only needs to implement the application business rules. This brings in application development ease. Adding additional functional modules is also easier than in file-based systems. • Better Controls

Better controls can be achieved due to the centralized nature of the system. • Data Independence

The architecture of the DBMS can be viewed as a 3-level system comprising the following:
- The internal or physical level, where the data resides.
- The conceptual level, which is the level of the DBMS functions.
- The external level, which is the level of the application programs or the end user.

Data Independence is isolating an upper level from changes in the organization or structure of a lower level. For example, if changes in the file organization of a data file do not demand changes in the DBMS functions or in the application programs, data independence is achieved. Thus data independence can be defined as the immunity of applications to changes in physical representation and access technique. The provision of data independence is a major objective of database systems.

• Reduced Maintenance

Maintenance is less and easier, again due to the centralized nature of the system.

1.3 Functions of a DBMS

The functions performed by a typical DBMS are the following:

• Data Definition: The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields, and the various constraints/conditions to be satisfied by the data in each field.

• Data Manipulation: Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These functions can handle planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad-hoc queries which are performed on a need basis.

• Data Security & Integrity: The DBMS contains functions which handle the security and integrity of data in the application. These can be easily invoked by the application, and hence the application programmer need not code these functions in his/her programs.

• Data Recovery & Concurrency: Recovery of data after a system failure and concurrent access of records by multiple users are also handled by the DBMS.

• Data Dictionary Maintenance: Maintaining the data dictionary, which contains the data definitions of the application, is also one of the functions of a DBMS.

• Performance: Optimizing the performance of queries is one of the important functions of a DBMS. Hence the DBMS has a set of programs forming the Query Optimizer, which evaluates the different implementations of a query and chooses the best among them.

Thus the DBMS provides an environment that is both convenient and efficient to use when there is a large volume of data and many transactions to be processed.

1.4 Role of the Database Administrator

Typically there are three types of users of a DBMS:

1. The End User, who uses the application.
Ultimately, this is the user who actually puts the data in the system to use in business. This user need not know anything about the organization of data at the physical level. She also need not be aware of the complete data in the system; she needs access to and knowledge of only the data she is using.

2. The Application Programmer, who develops the application programs. She has more knowledge about the data and its structure, since she has to manipulate the data using her programs. She too need not have access to and knowledge of the complete data in the system.

3. The Database Administrator (DBA), who is like the super-user of the system. The role of the DBA is very important and is defined by the following functions.

• Defining the Schema: The DBA defines the schema, which contains the structure of the data in the application.

• Liaising with Users: The DBA needs to interact continuously with the users to understand the data in the system and its use. The DBA determines what data needs to be present in the system and how this data has to be represented and organized.

• Defining Security & Integrity Checks: The DBA finds out about the access restrictions to be defined and defines security checks accordingly. Data integrity checks are also defined by the DBA.

• Defining Backup / Recovery Procedures: The DBA also defines procedures for backup and recovery. Defining backup procedures includes specifying what data is to be backed up, the periodicity of taking backups, and also the medium and storage place for the backup data.

• Monitoring Performance: The DBA has to continuously monitor the performance of the queries and take measures to optimize all the queries in the application.

1.5 Types of Database Systems

Database systems can be categorised according to the data structures and operators they present to the user. The oldest systems fall into inverted list, hierarchic and network systems. These are the pre-relational models.

• In the Hierarchical Model, different records are inter-related through hierarchical or tree-like structures. A parent record can have several children, but a child can have only one parent. IMS (Information Management System) of IBM is an example of a Hierarchical DBMS. In the figure, there are two hierarchies shown: the first storing the relations between CUSTOMER, ORDERS, CONTACTS and ORDER_PARTS, and the second showing the relation between PARTS, ORDER_PARTS and SALES_HISTORY. The many-to-many relationship is implemented through the ORDER_PARTS segment, which occurs in both the hierarchies. In practice, only one tree stores the ORDER_PARTS segment, while the other has a logical pointer to this segment.

• In the Network Model, a parent can have several children and a child can also have many parent records. Records are physically linked through linked-lists. IDMS from Computer Associates International Inc. is an example of a Network DBMS.

• In the Relational Model, unlike the Hierarchical and Network models, there are no physical links. All data is maintained in the form of tables consisting of rows and columns. Data in two tables is related through common columns, not physical links or pointers. Operators are provided for operating on rows in tables. Unlike the other two types of DBMS, there is no need to traverse pointers in a Relational DBMS. This makes querying much easier in a Relational DBMS than in a Hierarchical or Network DBMS, and is in fact a major reason for the relational model becoming more programmer-friendly and much more dominant and popular in both industrial and academic scenarios. Oracle, Sybase, DB2, Ingres, Informix and MS-SQL Server are a few of the popular Relational DBMSs.

The case-example data in relational form (representative rows recovered from the original figure):

CUSTOMER
CUST.NO   CUSTOMER NAME     ADDRESS      CITY
15371     Nanubhai & Sons   L. J. Road   Mumbai
...

CONTACTS
CUST.NO   CONTACT        DESIGNATION
15371     Nanubhai       Owner
15371     Rajesh Munim   Accountant
...

ORDERS
ORDER NO.   ORDER DATE     CUSTOMER NO.
3216        24-June-1997   15371
...

ORDERS-PARTS
ORDER NO.   PART NO.   QUANTITY
3216        C1         300
3216        S3         120
...

PARTS
PART NO.   PARTS DESC              PART PRICE
S3         Amkette 3.5" Floppies   400.00
...

SALES-HISTORY
PART NO.   REGION   YEAR   UNITS
S3         East     1996   2000
S3         North    1996   5500
S3         South    1996   12000
S3         West     1996   20000

The recent developments in the area have shown up in the form of certain object and object/relational DBMS products. Examples of such systems are GemStone and Versant ODBMS. Research has also proceeded on to a variety of other schemes, including the multi-dimensional approach and the logic-based approach.

3-Level Database System Architecture

• The External Level represents the collection of views available to different end-users.
• The Conceptual Level is the representation of the entire information content of the database.
• The Internal Level is the physical level, which deals with how the data is physically stored.

2. The Internal Level

This chapter discusses the issues related to how the data is physically stored on the disk and some of the access mechanisms commonly used for retrieving this data. While designing this layer, the main objective is to optimize performance by minimizing the number of disk accesses during the various database operations.

The figure shows the process of database access in general. The DBMS views the database as a collection of records. The File Manager of the underlying operating system views it as a set of pages, and the Disk Manager views it as a collection of physical locations on the disk. When the DBMS makes a request for a specific record to the File Manager, the latter maps the record to a page containing it and requests the Disk Manager for the specific page. The Disk Manager determines the physical location on the disk and retrieves the required page. But if the page containing the requested record is already in memory, retrieval from the disk is not necessary.

2.1 Clustering

In the above process, if the records are clustered, more records will be in the same page. Hence the number of pages to be retrieved will be less, and this reduces the number of disk accesses, which in turn gives better performance. Thus, if records which are frequently used together are placed physically together, the time taken for the whole operation will be less. This method of storing logically related records physically together is called clustering.

Eg: Consider the CUSTOMER table as shown below.

Cust ID   Cust Name   Cust City
10001     Raj         Delhi
10002     ...         ...

If queries retrieving Customers with consecutive Cust_IDs frequently occur in the application, clustering based on Cust_ID will help improve the performance of these queries. This can be explained as follows. Assume that the Customer record size is 128 bytes and the typical size of a page retrieved by the File Manager is 1 KB (1024 bytes). If there is no clustering, it can be assumed that the Customer records are stored at random physical locations; in the worst-case scenario, each record is placed in a different page. Hence a query to retrieve 100 records with consecutive Cust_IDs (say, 10001 to 10100) will require 100 pages to be accessed, which in turn translates to 100 disk accesses. But if the records are clustered, a page can contain 8 records, so the number of pages to be accessed for retrieving the 100 consecutive records will be ceil(100/8) = 13. Thus, in the given example, only 13 disk accesses will be required to obtain the query results, and clustering improves the speed by a factor of about 7.

Q: Can a table have clustering on multiple fields simultaneously?
A: No.

Q: For what record size will clustering be of no benefit to improve performance?
A: When the record size and page size are such that a page can contain only one record.
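The page-access arithmetic above can be checked in a few lines of Python (the sizes are the ones assumed in the example, not DBMS defaults):

```python
from math import ceil

record_size = 128        # bytes per CUSTOMER record (assumed in the example)
page_size = 1024         # bytes per page fetched by the File Manager
n_records = 100          # consecutive Cust_IDs 10001..10100

records_per_page = page_size // record_size        # 8 records fit in one page

# Worst case without clustering: every record sits in a different page.
pages_unclustered = n_records                      # 100 disk accesses

# With clustering, consecutive records share pages.
pages_clustered = ceil(n_records / records_per_page)   # ceil(100/8) = 13

speedup = pages_unclustered / pages_clustered      # roughly 7.7
print(records_per_page, pages_clustered, round(speedup, 1))
```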

Intra-file clustering: clustered records belong to the same file (table), as in the above example.

Inter-file clustering: clustered records belong to different files (tables). Here, interleaving of records is used. This type of clustering may be required to enhance the speed of queries retrieving related records from more than one table.

2.2 Indexing

Indexing is another common method for making retrievals faster. Consider the example of the CUSTOMER table used above. The following query is based on the Customer's city.

Retrieve the records of all customers who reside in Delhi.

Here a sequential search on the CUSTOMER table has to be carried out, and all records with the value 'Delhi' in the Cust_City field have to be retrieved. The time taken for this operation depends on the number of pages to be accessed. If the records are randomly stored, the page accesses depend on the volume of data. If the records are stored physically together, the number of pages depends on the size of each record as well. If such queries based on the Cust_City field are very frequent in the application, steps can be taken to improve their performance.

Creating an index on Cust_City is one such method. A new index file is created, with two fields in each record: one field contains the value of the Cust_City field, and the second contains a pointer to the actual data record in the CUSTOMER table. The number of records in the index file is the same as that of the data file. This results in the scenario shown below.

Whenever a query based on the Cust_City field occurs, a search is carried out on the index file. When the records with value 'Delhi' in the Cust_City field of the index file are located, the pointer in the second field of those records can be followed to directly retrieve the corresponding CUSTOMER records. Thus the access involves a sequential access on the index file and a direct access on the actual data file. Though a search is still involved, it is to be noted that this search will be much faster than a sequential search in the CUSTOMER table. This is because of the much smaller size of the index record, due to which each page will be able to contain many more index records.

Retrieval Speed vs. Update Speed: Though indexes help make retrievals faster, they slow down updates on the table, since updates on the base table demand updates on the index as well.
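The same idea can be sketched with SQLite, whose secondary indexes play the role of the index file described above (the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, "
             "cust_name TEXT, cust_city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(10001, "Raj", "Delhi"), (10002, "Mira", "Mumbai"),
                  (10003, "Anil", "Delhi"), (10004, "Sara", "Pune")])

# The separate index file of the text corresponds to a secondary index here.
conn.execute("CREATE INDEX idx_cust_city ON customer (cust_city)")

# This query can now be answered via the index instead of a full table scan.
delhi = conn.execute("SELECT cust_id FROM customer "
                     "WHERE cust_city = 'Delhi' ORDER BY cust_id").fetchall()
print(delhi)   # [(10001,), (10003,)]
```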

It is possible to create an index with multiple fields, i.e., an index on field combinations. Multiple indexes can also be created on the same table simultaneously, though there may be a limit on the maximum number of indexes that can be created on a table.

Q: In which of the following situations will indexes be ineffective?
a) When the percentage of rows being retrieved is large
b) When the data table is small and the index record is of almost the same size as the actual data record
c) In queries involving NULL / NOT NULL in the indexed field
d) All of the above
A: d) All of the above

Q: Can a clustering based on one field and indexing on another field exist on the same table simultaneously?
A: Yes

2.3 Hashing

Hashing is yet another method used for making retrievals faster. This method provides direct access to a record on the basis of the value of a specific field called the hash field. Here, when a new record is inserted, it is physically stored at an address which is computed by applying a mathematical function (the hash function) to the value of the hash field. Thus for every new record, hash_address = f(hash_field), where f is the hash function. Later, when a record is to be retrieved, the same hash function is used to compute the address where the record is stored. Retrievals are faster since direct access is provided and there is no search involved in the process. An example of a typical hash function is a numeric hash field, say an id, modulus a very large prime number.

Q: Can there be more than one hash field on a file?
A: No.

As hashing relates the field value to the address of the record, multiple hash fields would map a record to multiple addresses at the same time. Hence there can be only one hash field per file.

Collisions: Consider the example of the CUSTOMER table given earlier while discussing clustering. Let CUST_ID be the hash field and the hash function be defined as ((CUST_ID mod 10000) * 64 + 1025). The records with CUST_ID 10001, 10002, 10003, etc. will be stored at addresses 1089, 1153, 1217, etc. respectively. But records with CUST_ID values 20001, 20002, 20003, etc. will also map on to the addresses 1089, 1153, 1217, etc. respectively, and the same is the case with CUST_ID values 30001, 30002, 30003, etc. It is thus possible for two records to hash to the same address, leading to a collision. The methods to resolve a collision are:

1. Linear Search: While inserting a new record, if it is found that the location at the hash address is already occupied by a previously inserted record, search for the next free location available on the disk and store the new record at this location. A pointer from the first record at the original hash address to the new record will also be stored. During retrieval, the hash address is computed to locate the record. When it is seen that the record is not available at the hash address, the pointer from the record at that address is followed to locate the required record. In this method, the overhead incurred is the time taken by the linear search to locate the next free location while inserting a record.

2. Collision Chain: Here, the hash address location contains the head of a list of pointers linking together all records which hash to that address. In this method, an overflow area needs to be used if the number of records mapping on to the same hash address exceeds the number of locations linked to it.
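The hash function and the collision in the example can be written out directly (the function is the one given in the text):

```python
def hash_address(cust_id):
    """Hash function from the example: ((CUST_ID mod 10000) * 64 + 1025)."""
    return (cust_id % 10000) * 64 + 1025

# Consecutive CUST_IDs map to distinct addresses...
addresses = [hash_address(c) for c in (10001, 10002, 10003)]
print(addresses)   # [1089, 1153, 1217]

# ...but 20001 and 30001 collide with 10001 at address 1089.
collisions = {hash_address(c) for c in (10001, 20001, 30001)}
print(collisions)  # {1089}
```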

3. The Relational Model

3.1 Relational Databases: Terminology

Case Example:

Ord_Aug
Ord#   OrdDate    Cust#
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Ord_Items
Ord#   Item#   Qty
101    HW1     100
101    HW3     50
101    SW1     150
102    HW2     10
103    HW3     50
104    HW2     25
104    HW3     100
105    SW1     100

Items
Item#   Descr          Price
HW1     Power Supply   4000
HW2     101-Keyboard   2000
HW3     Mouse          800
SW1     MS-DOS 6.0     5000
SW2     MS-Word 6.0    8000

Customers
Cust#   CustName     City
001     Shah         Bombay
002     Srinivasan   Madras
003     Gupta        Delhi
004     Banerjee     Calcutta
005     Apte         Bombay

Terminology, with examples from the given case example:

Relation: A table. Eg: Ord_Aug, Customers, Items etc.

Tuple: A row or a record in a relation. Eg: a row from the Customers relation is a Customer tuple.

Attribute: A field or a column in a relation. Eg: Ord#, OrdDate, Item#, CustName etc.

Cardinality of a relation: The number of tuples in the relation. Eg: the cardinality of the Ord_Items relation is 8.

Degree of a relation: The number of attributes in the relation. Eg: the degree of the Customers relation is 3.

Domain of an attribute: The set of all values that can be taken by the attribute. Eg: the domain of Qty in Ord_Items is the set of all values which can represent the quantity of an ordered item.

Primary Key of a relation: An attribute or a combination of attributes that uniquely defines each tuple in the relation. Eg: the Ord# and Item# combination forms the primary key of Ord_Items; the primary key of the Customers relation is Cust#.

Foreign Key: An attribute or a combination of attributes in one relation R1 which indicates the relationship of R1 with another relation R2. The foreign key attributes in R1 must contain values matching those in R2. Eg: Cust# in Ord_Aug is a foreign key creating a reference from Ord_Aug to Customers; it is required to indicate the relationship between orders in Ord_Aug and Customers. Ord# and Item# in Ord_Items are foreign keys creating references from Ord_Items to Ord_Aug and Items respectively.

3.2 Properties of Relations

• No Duplicate Tuples: A relation cannot contain two or more tuples which have the same values for all the attributes, i.e., in any relation every row is unique.
• Tuples are Unordered: The order of rows in a relation is immaterial.
• Attributes are Unordered: The order of columns in a relation is immaterial.
• Attribute Values are Atomic: Each tuple contains exactly one value for each attribute.
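The "no duplicate tuples" and "tuples are unordered" properties are exactly those of a mathematical set, which a quick sketch makes concrete (relations modelled here as Python sets of tuples):

```python
# A relation behaves like a set of tuples: inserting a duplicate row changes
# nothing, and two relations with the same rows in different order are equal.
customers = {("001", "Shah", "Bombay"), ("003", "Gupta", "Delhi")}
customers.add(("001", "Shah", "Bombay"))      # duplicate tuple: no effect
print(len(customers))                         # 2

reordered = {("003", "Gupta", "Delhi"), ("001", "Shah", "Bombay")}
print(customers == reordered)                 # True: row order is immaterial
```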

It may be noted that many of the properties of relations follow from the fact that the body of a relation is a mathematical set.

3.3 Integrity Rules

The following are the integrity rules to be satisfied by any relation:

• No component of the Primary Key can be null.
• The database must not contain any unmatched Foreign Key values. This is called the referential integrity rule.

Q: Can the Foreign Key accept nulls?
A: Yes, if the application business rule allows this. Unlike the case of primary keys, there is no integrity rule saying that no component of a foreign key can be null. This can be logically explained with the help of the following example. Consider the relations Employee and Account as given below.

Employee
Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

EmpAcc# in the Employee relation is a foreign key creating a reference from Employee to Account. Here, a Null value in the EmpAcc# attribute is logically possible if an employee does not have a bank account. If the business rules allow an employee to exist in the system without opening an account, a Null value can be allowed for EmpAcc#. In the case example given earlier, on the other hand, Cust# in Ord_Aug cannot accept Null if the business rule insists that the customer number be stored for every order placed.

The next issue related to foreign key references is handling deletes/updates of the parent. In the case example, can we delete the Customer records with Cust# values 002, 003 or 005?

The default answer is NO, as long as there is a foreign key reference to these records from some other table. In the case example, the Customer records are referenced from the order records in the Ord_Aug relation. Hence, Restrict the deletion of the parent record. Deletion can still be carried out if we use the Cascade or Nullify strategies.

Cascade: Delete/update all the references successively, in a cascaded fashion, and finally delete/update the parent record. In the case example, the Customer record with Cust# 002 can be deleted after deleting the order records with Ord# 101 and 104. But these order records, in turn, can be deleted only after deleting the records with Ord# 101 and 104 from the Ord_Items relation.

Nullify: Update the referencing attribute to Null and then delete/update the parent record. In the above example of the Employee and Account relations, an account record may have to be deleted if the account is to be closed. For example, if employee Raj decides to close his account, the Account record with Acc# 120002 has to be deleted. But this deletion is not possible as long as the Employee record of Raj references it. Hence the strategy can be to update the EmpAcc# field in the Employee record of Raj to Null and then delete the Account parent record of 120002. After the deletion, the data in the tables will be as follows:

Employee
Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      Null
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

3.4 Relational Algebra Operators

The eight relational algebra operators are:

1. SELECT: To retrieve specific tuples/rows from a relation.
Eg: Retrieve the orders placed by customer 002 from Ord_Aug:
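The Restrict / Cascade / Nullify choices correspond to referential actions in SQL. A sketch of the Nullify strategy with SQLite's ON DELETE SET NULL (the table and key names follow the Employee/Account example; only the columns needed are kept):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")      # SQLite enforces FKs only if asked
conn.execute("CREATE TABLE account (acc INTEGER PRIMARY KEY, bal INTEGER)")
conn.execute("CREATE TABLE employee (emp TEXT PRIMARY KEY, "
             "acc INTEGER REFERENCES account(acc) ON DELETE SET NULL)")
conn.execute("INSERT INTO account VALUES (120002, 1200)")
conn.execute("INSERT INTO employee VALUES ('X102', 120002)")

# Nullify: closing Raj's account nullifies the reference automatically.
conn.execute("DELETE FROM account WHERE acc = 120002")
raj_acc = conn.execute("SELECT acc FROM employee WHERE emp = 'X102'").fetchone()[0]
print(raj_acc)   # None
```

Declaring ON DELETE CASCADE instead would make the DBMS delete the referencing rows, which is the Cascade strategy; omitting both gives the default Restrict behaviour.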

Ord#   OrdDate    Cust#
101    02-08-94   002
104    28-08-94   002

2. PROJECT: To retrieve specific attributes/columns from a relation.
Eg: Project Descr and Price from Items:

Descr          Price
Power Supply   4000
101-Keyboard   2000
Mouse          800
MS-DOS 6.0     5000
MS-Word 6.0    8000
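SELECT and PROJECT can be mimicked in Python over the case-example data, with relations modelled as lists of dicts (a sketch of the algebra, not of DBMS internals):

```python
ord_aug = [
    {"ord": 101, "date": "02-08-94", "cust": "002"},
    {"ord": 102, "date": "11-08-94", "cust": "003"},
    {"ord": 103, "date": "21-08-94", "cust": "003"},
    {"ord": 104, "date": "28-08-94", "cust": "002"},
    {"ord": 105, "date": "30-08-94", "cust": "005"},
]

# SELECT: keep whole tuples satisfying a condition (orders by customer 002).
cust_002 = [r for r in ord_aug if r["cust"] == "002"]
print([r["ord"] for r in cust_002])   # [101, 104]

# PROJECT: keep chosen attributes of every tuple; pure relational algebra
# removes duplicate results, hence a set.
cust_column = {r["cust"] for r in ord_aug}
print(cust_column)                    # {'002', '003', '005'}
```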

3. UNION: To retrieve tuples appearing in either or both of the relations participating in the UNION.
Eg: Consider the relation Ord_Jul as follows:

Table: Ord_Jul
Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003

Ord_Jul UNION Ord_Aug gives:

Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Note: The union operation shown above logically implies retrieval of the records of orders placed in July or in August.

4. PRODUCT: To obtain all possible combinations of tuples from two relations.
Eg: Ord_Aug PRODUCT Customers gives (first rows shown):

Ord#   OrdDate    O.Cust#   C.Cust#   CustName     City
101    02-08-94   002       001       Shah         Bombay
101    02-08-94   002       002       Srinivasan   Madras
101    02-08-94   002       003       Gupta        Delhi
101    02-08-94   002       004       Banerjee     Calcutta
101    02-08-94   002       005       Apte         Bombay
102    11-08-94   003       001       Shah         Bombay
102    11-08-94   003       002       Srinivasan   Madras
...

5. INTERSECT: To retrieve tuples appearing in both the relations participating in the INTERSECT.
Eg: To retrieve the Cust# of customers who have placed orders in July and in August:

Cust#
003

6. DIFFERENCE: To retrieve tuples appearing in the first relation participating in the DIFFERENCE but not in the second.
Eg: To retrieve the Cust# of customers who have placed orders in July but not in August:

Cust#
001

7. JOIN: To retrieve combinations of tuples in two relations based on a common field in both the relations.
Eg: ORD_AUG join CUSTOMERS (here, the common column is Cust#) gives:

Ord#   OrdDate    Cust#   CustName     City
101    02-08-94   002     Srinivasan   Madras
102    11-08-94   003     Gupta        Delhi
103    21-08-94   003     Gupta        Delhi
104    28-08-94   002     Srinivasan   Madras

Note: The above join operation logically implies retrieval of the details of all orders together with the details of the corresponding customers who placed the orders. Such a join operation, where only those rows having corresponding rows in both the relations are retrieved, is called the natural join or inner join. This is the most common join operation.

Consider the example of the EMPLOYEE and ACCOUNT relations:

EMPLOYEE
Emp#   EmpName   EmpCity   Acc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

ACCOUNT
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500

A join can be formed between the two relations based on the common column Acc#. The result of the (inner) join is:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X104   Vani      Bhopal    120003   01-Jan-1999   3000

Note that only those records which have corresponding records in the other table appear in the result set. This means that the result of the inner join shows the details of those employees who hold an account, along with the account details.

The other type of join is the outer join, which has three variations: the left outer join, the right outer join and the full outer join. These three joins are explained as follows.

The left outer join retrieves all rows from the left-side (of the join operator) table. If there are corresponding or related rows in the right-side table, the correspondence will be shown. Otherwise, columns of the right-side table will take null values.

EMPLOYEE left outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X103   Sharma    Nagpur    NULL     NULL          NULL
X104   Vani      Bhopal    120003   01-Jan-1999   3000

The right outer join retrieves all rows from the right-side (of the join operator) table. If there are corresponding or related rows in the left-side table, the correspondence will be shown. Otherwise, columns of the left-side table will take null values.

EMPLOYEE right outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate      BalAmt
X101   Shekhar   Bombay    120001   30-Aug-1998   5000
X102   Raj       Pune      120002   29-Oct-1998   1200
X104   Vani      Bhopal    120003   01-Jan-1999   3000
NULL   NULL      NULL      120004   04-Mar-1999   500

(Assume that Acc# 120004 belongs to someone who is not an employee, and hence the details of the account holder are not available here.)

The full outer join retrieves all rows from both the tables. If there is a correspondence or relation between rows from the tables of either side, the correspondence will be shown. Otherwise, the related columns will take null values.

EMPLOYEE full outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate        BalAmt
X101   Shekhar   Bombay    120001   30. Aug. 1998   5000
X102   Raj       Pune      120002   29. Oct. 1998   1200
X103   Sharma    Nagpur    NULL     NULL            NULL
X104   Vani      Bhopal    120003   1. Jan. 1999    3000
NULL   NULL      NULL      120004   4. Mar. 1999    500

Q: What will be the result of a natural join operation between R1 and R2?
A:
a1  a2  a3
b1  b2  b3
c1  c2  c3

8. DIVIDE

Consider the following three relations. R1 divide by R2 per R3 gives: a. Thus the result contains those values from R1 whose corresponding R2 values in R3 include all the R2 values.
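These join variants can be checked against a real engine. The sketch below uses Python's built-in sqlite3 module with the EMPLOYEE and ACCOUNT data from the tables above ('#' is dropped from the column names, since it is awkward in SQL identifiers, and the full outer join is emulated with a UNION, because only recent SQLite versions support FULL OUTER JOIN directly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE employee (emp_no TEXT, empname TEXT, empcity TEXT, acc_no TEXT);
CREATE TABLE account  (acc_no TEXT, opendate TEXT, balamt INTEGER);
INSERT INTO employee VALUES
  ('X101','Shekhar','Bombay','120001'),
  ('X102','Raj','Pune','120002'),
  ('X103','Sharma','Nagpur',NULL),
  ('X104','Vani','Bhopal','120003');
INSERT INTO account VALUES
  ('120001','30-Aug-1998',5000),
  ('120002','29-Oct-1998',1200),
  ('120003','01-Jan-1999',3000),
  ('120004','04-Mar-1999',500);
""")

# Inner join: only employees who hold an account (3 rows).
inner = cur.execute("""
    SELECT e.emp_no, e.empname, a.balamt
    FROM employee e JOIN account a ON e.acc_no = a.acc_no""").fetchall()

# Left outer join: every employee; account columns are NULL for Sharma (4 rows).
left = cur.execute("""
    SELECT e.emp_no, a.balamt
    FROM employee e LEFT JOIN account a ON e.acc_no = a.acc_no""").fetchall()

# Full outer join emulated as a left join plus the right-side leftovers (5 rows).
full = cur.execute("""
    SELECT e.emp_no, a.acc_no FROM employee e LEFT JOIN account a
        ON e.acc_no = a.acc_no
    UNION ALL
    SELECT NULL, a.acc_no FROM account a
        WHERE a.acc_no NOT IN (SELECT acc_no FROM employee
                               WHERE acc_no IS NOT NULL)""").fetchall()

print(len(inner), len(left), len(full))  # 3 4 5
```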

4. Structured Query Language (SQL)

4.1 SQL: An Overview

The components of SQL are:

a. Data Definition Language - Consists of SQL statements for defining the schema (creating, modifying and dropping tables, indexes, views etc.)
b. Data Manipulation Language - Consists of SQL statements for operating on the data (inserting, modifying, deleting and retrieving data) in tables which already exist.
c. Data Control Language - Consists of SQL statements for granting and revoking access permissions to users.

4.2 DML - SELECT, INSERT, UPDATE and DELETE statements

The SELECT statement
Retrieves rows from one or more tables according to given conditions.
General form:
SELECT [ ALL | DISTINCT ] <attribute (comma)list>
FROM <table (comma)list>
[ WHERE <conditional expression> ]
[ GROUP BY <attribute (comma)list> ]
[ HAVING <conditional expression> ]
[ ORDER BY <attribute list> [ DESC ] ];

The INSERT statement
Inserts one or more tuples in a table.
General forms:
To insert a single tuple:
INSERT INTO <table-name> [<attribute (comma)list>]
VALUES <value list>;
To insert multiple tuples:
INSERT INTO <table-name> [<attribute (comma)list>]
SELECT [ ALL | DISTINCT ] <attribute (comma)list>
FROM <table (comma)list>*
[ WHERE <conditional expression> ];
* - list of existing tables
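The general forms above can be exercised against a small, hypothetical orders table (all table and column names below are invented for illustration), using SQLite via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical table, just to exercise the general forms above.
cur.execute("CREATE TABLE orders (ord_no INTEGER, city TEXT, amount INTEGER)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 'Pune', 500), (2, 'Bombay', 900), (3, 'Pune', 300)])

# SELECT with WHERE, GROUP BY, HAVING and ORDER BY.
rows = cur.execute("""
    SELECT city, SUM(amount) AS total
    FROM orders
    WHERE amount > 100
    GROUP BY city
    HAVING SUM(amount) > 400
    ORDER BY total DESC""").fetchall()
print(rows)  # [('Bombay', 900), ('Pune', 800)]

# Inserting multiple tuples via a subquery (the second general form of INSERT).
cur.execute("CREATE TABLE big_orders (ord_no INTEGER, amount INTEGER)")
cur.execute("INSERT INTO big_orders SELECT ord_no, amount FROM orders WHERE amount >= 500")
n_big = cur.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]
print(n_big)  # 2
```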

The UPDATE statement
Updates values of one or more attributes of one or more tuples in a table.
General form:
UPDATE <table-name>
SET <attribute-1 = value-1 [, attribute-2 = value-2, ... attribute-n = value-n]>
[ WHERE <conditional expression> ];

The DELETE statement
Deletes one or more tuples in a table according to given conditions.
General form:
DELETE FROM <table-name>
[ WHERE <conditional expression> ];

4.3 DDL - CREATE, ALTER and DROP statements

DDL statements are those which are used to create, modify and drop the definitions or structures of various tables, views, indexes and other elements of the DBMS.

The CREATE TABLE statement
Creates a new table.
General form:
CREATE TABLE <table-name> (<table-element (comma)list>*);
* - a table element may be an attribute with its data type and size, or any integrity constraint on attributes.

Some CREATE TABLE statements on the Case Example:

Query:
CREATE TABLE customers (
    cust#    NUMBER(6) NOT NULL,
    custname CHAR(30),
    city     CHAR(20));

- This query creates a table CUSTOMERS with 3 fields: cust#, custname and city. Cust# cannot be null.
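A minimal sketch of UPDATE and DELETE at work, using the CUSTOMERS definition above in SQLite (SQLite's INTEGER/TEXT types stand in for Oracle's NUMBER/CHAR, and the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE customers (
    custno   INTEGER NOT NULL,
    custname TEXT,
    city     TEXT)""")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, 'Rao', 'Pune'), (2, 'Mehta', 'Bombay'), (3, 'Iyer', 'Pune')])

# UPDATE one attribute of the tuples satisfying the condition.
cur.execute("UPDATE customers SET city = 'Mumbai' WHERE city = 'Bombay'")

# DELETE the tuples satisfying the condition.
cur.execute("DELETE FROM customers WHERE city = 'Pune'")

remaining = cur.execute("SELECT custno, city FROM customers").fetchall()
print(remaining)  # [(2, 'Mumbai')]

# The NOT NULL integrity constraint from the definition is enforced:
nn_enforced = False
try:
    cur.execute("INSERT INTO customers VALUES (NULL, 'X', 'Y')")
except sqlite3.IntegrityError:
    nn_enforced = True
print(nn_enforced)  # True
```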

Query:
CREATE TABLE ord_sep
AS SELECT * FROM ord_aug;

- This query creates table ORD_SEP as a copy of ORD_AUG. The new table ord_sep has the same structure as ord_aug, and the structure as well as the data is copied: the data in ord_aug is copied into the new table.

Query:
CREATE TABLE ord_sep
AS SELECT * FROM ord_aug
WHERE 1 = 2;

- This query creates table ORD_SEP as a copy of ORD_AUG, but does not copy any data, as the WHERE clause is never satisfied. The new table ord_sep has the same structure as ord_aug, but no data in ord_aug is copied into it, since there is no row which satisfies the 'always false' condition 1 = 2.

The ALTER TABLE statement
Alters the structure of an existing table. Used for adding attributes and for modifying field lengths and attributes.
General form:
ALTER TABLE <table-name> ADD | MODIFY (<table-element (comma)list>);

Examples of the ALTER TABLE statement:

Query:
ALTER TABLE customers ADD (
    phone         NUMBER(8),
    credit_rating CHAR(1));

- This query adds two new attributes, phone and credit_rating, to the customers table. Here, for existing tuples (if any), the new attributes will take NULL values, since no DEFAULT value is mentioned for them.

Query:
ALTER TABLE customers MODIFY custname CHAR(35);

- This query modifies the data type/size of an attribute in the table: it changes the custname field to a character field of length 35.
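The copy-with-data and copy-structure-only behaviour, and the NULLs taken by a newly added attribute, can be demonstrated in SQLite (note that SQLite spells the addition ALTER TABLE ... ADD COLUMN and has no MODIFY clause; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE ord_aug (ord_no INTEGER, qty INTEGER)")
cur.executemany("INSERT INTO ord_aug VALUES (?, ?)", [(1, 10), (2, 20)])

# Copy structure and data.
cur.execute("CREATE TABLE ord_sep AS SELECT * FROM ord_aug")
# Copy structure only: the 'always false' predicate filters out every row.
cur.execute("CREATE TABLE ord_sep2 AS SELECT * FROM ord_aug WHERE 1 = 2")

n_sep = cur.execute("SELECT COUNT(*) FROM ord_sep").fetchone()[0]
n_sep2 = cur.execute("SELECT COUNT(*) FROM ord_sep2").fetchone()[0]
print(n_sep, n_sep2)  # 2 0

# ALTER TABLE ... ADD: existing tuples take NULL for the new attribute.
cur.execute("ALTER TABLE ord_sep ADD COLUMN remarks TEXT")
remarks = cur.execute("SELECT remarks FROM ord_sep").fetchall()
print(remarks)  # [(None,), (None,)]
```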

The DROP TABLE statement
Drops an existing table.
General form:
DROP TABLE <table-name>;
Example:
Query: DROP TABLE ord_sep;
- The above query drops table ORD_SEP from the database.

Creating & Dropping Views

A view is a virtual relation created with attributes from one or more base tables. SELECT * FROM myview1, at any given time, will evaluate the view-defining query in the CREATE VIEW statement and display the result.

Query:
CREATE VIEW myview1
AS SELECT ord#, orddate, ord_aug.cust#, custname
FROM ord_aug, customers
WHERE ord_aug.cust# = customers.cust#;

- This query defines a view consisting of ord#, orddate, cust# and custname, using a join of the ORD_AUG and CUSTOMERS tables.

Query:
CREATE VIEW myview2 (ItemNo, Quantity)
AS SELECT item#, qty FROM ord_items;

- This query defines a view with the columns item# and qty from the ORD_ITEMS table, and renames these columns as ItemNo and Quantity respectively.
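That a view is virtual, i.e. that the view-defining query is re-evaluated at every access, can be seen with a short sqlite3 sketch (SQLite supports the column-renaming form of CREATE VIEW used for myview2; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE ord_items (itemno TEXT, qty INTEGER)")
cur.executemany("INSERT INTO ord_items VALUES (?, ?)", [('HW1', 5), ('HW2', 7)])

# The view stores the defining query, not the data, so it always
# reflects the current state of the base table.
cur.execute("CREATE VIEW myview2 (ItemNo, Quantity) AS SELECT itemno, qty FROM ord_items")

before = cur.execute("SELECT COUNT(*) FROM myview2").fetchone()[0]
cur.execute("INSERT INTO ord_items VALUES ('HW3', 9)")
after = cur.execute("SELECT COUNT(*) FROM myview2").fetchone()[0]
print(before, after)  # 2 3
```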

Query:
CREATE VIEW myview3
AS SELECT item#, descr, price
FROM items
WHERE price < 1000
WITH CHECK OPTION;

- WITH CHECK OPTION in a CREATE VIEW statement indicates that INSERTs or UPDATEs on the view will be rejected if they violate any integrity constraint implied by the view-defining query. It ensures that if this view is used for updation, the updated values do not cause the row to fall outside the view.

To drop a view:
Query: DROP VIEW myview1;
- This query drops the view MYVIEW1.

Creating & Dropping Indexes

Query:
CREATE INDEX i_city ON customers (city);
- Creates a new index named i_city. The new index file (table) will have the values of the city column of the Customers table.

Query:
CREATE UNIQUE INDEX i_custname ON customers (custname);
- Creates an index which allows only unique values for custnames.

Query:
CREATE INDEX i_city_custname ON customers (city, custname);
- Creates an index based on two fields: city and custname.

Query:
DROP INDEX i_city;
- Drops index i_city.
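A short sqlite3 sketch of the index statements above; the unique index is the one whose effect is directly observable, since it rejects a duplicate custname (the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE customers (custno INTEGER, custname TEXT, city TEXT)")

cur.execute("CREATE INDEX i_city ON customers (city)")
cur.execute("CREATE UNIQUE INDEX i_custname ON customers (custname)")
cur.execute("CREATE INDEX i_city_custname ON customers (city, custname)")

cur.execute("INSERT INTO customers VALUES (1, 'Rao', 'Pune')")
duplicate_allowed = True
try:
    # Second 'Rao' violates the unique index on custname.
    cur.execute("INSERT INTO customers VALUES (2, 'Rao', 'Delhi')")
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False

cur.execute("DROP INDEX i_city")
```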

4.4 DCL - GRANT and REVOKE statements

DCL statements are those which are used to control access permissions on the tables, indexes, views and other elements of the DBMS.

Granting & Revoking Privileges

Query: GRANT ALL ON customers TO ashraf;
- Grants all permissions on the table customers to the user who logs in as 'ashraf'.

Query: GRANT SELECT ON customers TO sunil;
- Grants SELECT permission on the table customers to the user 'sunil'. User 'sunil' does not have permission to insert, update, delete or perform any other operation on the customers table.

Query: GRANT SELECT ON customers TO sunil WITH GRANT OPTION;
- Enables user 'sunil' to give SELECT permission on the customers table to other users.

Query: REVOKE DELETE ON customers FROM ashraf;
- Takes away DELETE permission on the customers table from user 'ashraf'.

5. Recovery and Concurrency

Recovery and Concurrency in a DBMS are part of the general topic of transaction management. Hence we shall begin the discussion by examining the fundamental notion of a transaction.

5.1 Transaction

A transaction is a logical unit of work. Consider the following example: the procedure for transferring an amount of Rs. 100/- from the account of one customer to another.

EXEC SQL WHENEVER SQLERROR GOTO UNDO;
EXEC SQL UPDATE DEPOSIT
         SET BALANCE = BALANCE - 100
         WHERE CUSTID = from_cust;
EXEC SQL UPDATE DEPOSIT
         SET BALANCE = BALANCE + 100
         WHERE CUSTID = to_cust;
EXEC SQL COMMIT;
GOTO FINISH;
UNDO:
EXEC SQL ROLLBACK;
FINISH:
RETURN;

Here it has to be noted that the single operation "amount transfer" involves two database updates: updating the record of from_cust and updating the record of to_cust. In between these two updates the database is in an inconsistent (or, in this example, incorrect) state: after one update and before the next, one cannot say by looking at the database contents whether the amount transfer operation has been done or not. Hence, to guarantee database consistency, it has to be ensured that either both updates are performed or none are performed. If, after one update, something goes wrong due to a problem like a system crash, an overflow error or a violation of an integrity constraint, then the first update needs to be undone. Hence it is important to ensure that either a transaction executes in its entirety or is totally cancelled. This is true of all transactions. The set of programs which handles this forms the transaction manager in the DBMS.

The transaction manager uses the COMMIT and ROLLBACK operations for ensuring atomicity of transactions.

COMMIT - The COMMIT operation indicates successful completion of a transaction, which means that the database is in a consistent state and all updates made by the transaction can now be made permanent.

ROLLBACK - The ROLLBACK operation indicates that the transaction has been unsuccessful, which means that all updates done by the transaction till then need to be undone to bring the database back to a consistent state. To help in undoing the updates once done, a system log or journal is maintained by the transaction manager. The before- and after-images of the updated tuples are recorded in the log.

The properties of a transaction can be summarised as the ACID properties, ACID standing for atomicity, consistency, isolation and durability.

Atomicity: A transaction is atomic. Either all operations in the transaction have to be performed, or none should be performed.

Consistency: Transactions preserve database consistency. A transaction transforms a consistent state of the database into another consistent state, without necessarily preserving consistency at all intermediate points.

Isolation: Transactions are isolated from one another. A transaction's updates are concealed from all others until it commits (or rolls back).

Durability: Once a transaction commits, its updates survive in the database even if there is a subsequent system crash. If a transaction successfully commits, the system guarantees that its updates will be permanently installed in the database even if the system crashes immediately after the COMMIT.

5.2 Recovery from System Failures

System failures (also called soft crashes) are those failures, like a power outage, which affect all transactions in progress but do not physically damage the database. During a system failure, the contents of the main memory are lost, and thus the contents of the database buffers, which contain the updates of transactions, are lost. (Note: Transactions do not directly write to the database. The updates are written to database buffers and, at regular intervals, transferred to the database.)
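GRANT and REVOKE need a multi-user DBMS, but the transfer procedure itself, and the COMMIT/ROLLBACK behaviour it relies on, can be mirrored in SQLite. The customer ids and balances below are invented, and a negative balance stands in for an integrity-constraint violation:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None          # manage transactions explicitly
cur = con.cursor()
cur.execute("CREATE TABLE deposit (custid TEXT PRIMARY KEY, balance INTEGER)")
cur.executemany("INSERT INTO deposit VALUES (?, ?)", [('C1', 500), ('C2', 200)])

def transfer(frm, to, amount):
    """Both updates commit together, or neither does (atomicity)."""
    cur.execute("BEGIN")
    try:
        cur.execute("UPDATE deposit SET balance = balance - ? WHERE custid = ?",
                    (amount, frm))
        cur.execute("UPDATE deposit SET balance = balance + ? WHERE custid = ?",
                    (amount, to))
        bal = cur.execute("SELECT balance FROM deposit WHERE custid = ?",
                          (frm,)).fetchone()[0]
        if bal < 0:                  # stand-in for an integrity-constraint violation
            raise ValueError("insufficient funds")
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")      # undoes the first update too

transfer('C1', 'C2', 100)            # succeeds
transfer('C1', 'C2', 9999)           # fails; rolled back, nothing changes
final = cur.execute("SELECT custid, balance FROM deposit ORDER BY custid").fetchall()
print(final)  # [('C1', 400), ('C2', 300)]
```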

At restart, the system has to ensure that the ACID properties of transactions are maintained and that the database remains in a consistent state. To attain this, the strategy to be followed for recovery at restart is as follows:

• Transactions which were in progress at the time of failure have to be undone at the time of restart. This is needed because the precise state of such a transaction, which was active at the time of failure, is no longer known, and hence the transaction cannot be successfully completed.
• Transactions which had completed prior to the crash, but could not get all their updates transferred from the database buffers to the physical database, have to be redone at the time of restart.

This recovery procedure is carried out with the help of:

An online logfile or journal - The logfile maintains the before- and after-images of the tuples updated during a transaction. This helps in carrying out the UNDO and REDO operations as required. Typical entries made in the logfile are:

• Start of Transaction Marker
• Transaction Identifier
• Record Identifier
• Operations Performed
• Previous Values of Modified Data (Before-image or Undo Log)
• Updated Values of Modified Records (After-image or Redo Log)
• Commit / Rollback Transaction Marker

Taking a checkpoint at specific intervals - This involves the following two operations:

a) physically writing the contents of the database buffers out to the physical database, and
b) physically writing a special checkpoint record to the physical log.

The checkpoint record has a list of all the transactions active at the time of taking the checkpoint. Thus, during a checkpoint, the updates of all transactions, including both active and committed transactions, will be written to the physical database.

5.3 Recovery : An Example

At the time of restart, T3 and T5 must be undone, and T2 and T4 must be redone.
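The undo/redo decision at restart can be sketched as a small function. The transaction names follow the usual five-transaction checkpoint example (T1 finished before the checkpoint, so it needs no recovery and never enters the candidate list); the function signature is invented for illustration:

```python
def restart_lists(active_at_checkpoint, started_after, committed):
    """Return (redo, undo) sets following the restart strategy above:
    committed transactions are redone, in-progress ones are undone."""
    candidates = set(active_at_checkpoint) | set(started_after)
    redo = {t for t in candidates if t in committed}
    undo = candidates - redo
    return redo, undo

redo, undo = restart_lists(active_at_checkpoint=['T2', 'T3'],
                           started_after=['T4', 'T5'],
                           committed={'T2', 'T4'})
print(sorted(redo), sorted(undo))  # ['T2', 'T4'] ['T3', 'T5']
```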

T1 does not enter the recovery procedure at all, since its updates were all written to the database at time tc as part of the checkpoint process.

5.4 Concurrency

Concurrency refers to multiple transactions accessing the same database at the same time. In a system which allows concurrency, some kind of control mechanism has to be in place to ensure that concurrent transactions do not interfere with each other. Three typical problems which can occur due to concurrency are explained here.

a) Lost Update Problem

(To understand the above situation, assume that:
• there is a record R, with a field, say Amt, having value 1000 before time t1.
• both transactions A and B fetch this value at t1 and t2 respectively.
• Transaction A updates the Amt field in R to 800 at time t3.
• Transaction B updates the Amt field in R to 1200 at time t4.)

The update by Transaction A at time t3 is over-written by Transaction B at time t4. Thus, after time t4, the Amt value in record R is 1200.
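The lost-update schedule can be replayed in a few lines; an in-memory dict stands in for record R:

```python
# A toy schedule reproducing the lost update: R.Amt is 1000; A and B both
# read it, then each writes back its own computed value.
record = {"Amt": 1000}

a_seen = record["Amt"]         # t1: A fetches R
b_seen = record["Amt"]         # t2: B fetches R
record["Amt"] = a_seen - 200   # t3: A writes 800
record["Amt"] = b_seen + 200   # t4: B writes 1200, overwriting A's update

print(record["Amt"])  # 1200 -- A's update is lost
```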

b) Uncommitted Dependency Problem

(To understand the above situation, assume that:
• there is a record R, with a field, say Amt, having value 1000 before time t1.
• Transaction B fetches this value and updates it to 800 at time t1.
• Transaction A fetches R with Amt field value 800 at time t2.
• Transaction B rolls back, and its update is undone, at time t3. The Amt field takes back its initial value 1000 during the rollback.)

Transaction A continues processing with Amt field value 800, without knowing about B's rollback.

c) Inconsistent Analysis Problem

Locking of records can be used as a concurrency control technique to prevent the above mentioned problems.

5.5 Locking

Locking is a solution to the problems arising due to concurrency. A transaction acquires a lock on a record if it does not want the record values to be changed by some other transaction during a period of time. The transaction releases the lock after this time.

Locks are of two types:
1. shared (S lock)
2. exclusive (X lock)

• A transaction acquires a shared (read) lock on a record when it wishes to retrieve or fetch the record.
• An exclusive (write) lock is acquired on a record when a transaction wishes to update the record. (Here update means INSERT, UPDATE or DELETE.)

The following figure shows the Lock Compatibility matrix.

Normally, locks are implicit. A FETCH request is an implicit request for a shared lock, whereas an UPDATE request is an implicit request for an exclusive lock. Explicit lock requests need to be issued if a different kind of lock is required during an operation. For example, if an X lock is to be acquired before a FETCH, it has to be explicitly requested.

5.6 Deadlocks

Locking can be used to solve the problems of concurrency. However, locking can also introduce the problem of deadlock, as shown in the example below.
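The lock compatibility matrix reduces to one rule: only two shared locks are compatible. A sketch:

```python
def compatible(held, requested):
    """Lock compatibility: S is compatible only with S; X with nothing."""
    return held == "S" and requested == "S"

results = {(h, r): compatible(h, r) for h in ("S", "X") for r in ("S", "X")}
for (h, r), ok in results.items():
    print(h, r, ok)
# S S True / S X False / X S False / X X False
```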

Deadlock is a situation in which two or more transactions are in a simultaneous wait state, each of them waiting for one of the others to release a lock before it can proceed. Deadlock prevention can be done by not allowing any cyclic waits. Alternatively, if a deadlock occurs, the system may detect it and break it. Detection involves finding a cycle in the "Wait-For Graph" (a graph which shows 'who is waiting for whom'). Breaking a deadlock implies choosing one of the deadlocked transactions as the victim and rolling it back, thereby releasing all its locks. This may allow some other transaction(s) to proceed.

6. Query Optimization

6.1 Overview

When compared to other database systems, query optimization is a strength of the relational systems. It can be said so since relational systems by themselves do optimization to a large extent, unlike the other systems, which leave optimization to the programmer. Automatic optimization done by the relational systems will be much more efficient than manual optimization, due to several reasons:

• uniformity in optimization across programs, irrespective of the programmer's expertise in optimizing the programs.
• the system's ability to make use of the knowledge of internal conditions (e.g. the volume of data at the time of querying) for optimization. Such conditions may be different at different times of querying. (In a manual system, this knowledge can be utilised only if the query is re-written each time, which is not practically possible.)
• the system's ability to evaluate a large number of alternatives to find the most efficient query evaluation method.

In this chapter we shall look into the process of automatic query optimization done by the relational systems.

6.2 An Example of Query Optimization

Let us look at a query being evaluated in two different ways, to see the dramatic effect of query optimization. Consider the following query:

Select ORDDATE, ITEM#, QTY
from ORDTBL, ORD_ITEMS
where ORDTBL.ORD# = ORD_ITEMS.ORD#
and ITEM# = 'HW3';

Assumptions:
• There are 100 records in ORDTBL
• There are 10,000 records in ORD_ITEMS
• There are 50 order items with item# 'HW3'

Query Evaluation Method 1

T1 = ORDTBL X ORD_ITEMS
(Perform the Product operation as the first step towards joining the two tables)
- 10000 X 100 tuple reads (1000000 tuple reads, generating 1000000 tuples as the intermediate result)
- 1000000 tuple writes to a temporary space on the disk (assuming that the 1000000 tuples in the intermediate result cannot be held in memory)
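Deadlock detection, as described above, is cycle detection in the Wait-For Graph. A small sketch, with an assumed adjacency-set representation of the graph:

```python
# wfg[t] is the set of transactions that t is waiting for (assumed input).
def has_deadlock(wfg):
    """True if the Wait-For Graph contains a cycle."""
    def reachable(start, target, seen):
        for nxt in wfg.get(start, ()):
            if nxt == target or (nxt not in seen and
                                 reachable(nxt, target, seen | {nxt})):
                return True
        return False
    return any(reachable(t, t, {t}) for t in wfg)

d1 = has_deadlock({"A": {"B"}, "B": {"A"}})              # A waits for B, B for A
d2 = has_deadlock({"A": {"B"}, "B": {"C"}, "C": set()})  # a chain, no cycle
print(d1, d2)  # True False
```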

T2 = ORDTBL.ORD# = ORD_ITEMS.ORD# & ITEM# = 'HW3' (T1)
(Apply the two conditions in the query to the intermediate result obtained after the first step)
- 1000000 tuples read into memory (1000000 tuple reads)
- 50 tuples selected (those tuples satisfying both the conditions), no disk writes (the 50 tuples are held in memory itself)

T3 = ORDDATE, ITEM#, QTY (T2)
(Projection performed as the final step. No more tuple I/Os)
- 50 tuples (final result)

Total no. of tuple I/Os = 1000000 reads + 1000000 writes + 1000000 reads = 3000000 tuple I/Os

Query Evaluation Method 2

T1 = ITEM# = 'HW3' (ORD_ITEMS)
(Perform the Select operation on ORD_ITEMS as the first step)
- 10000 tuple reads (from ORD_ITEMS)
- 50 tuples selected, no disk writes (assuming that the 50 tuples satisfying the condition can be held in memory)

T2 = ORDTBL JOIN T1
- 100 tuple reads (from ORDTBL)
- resulting relation with 50 tuples

T3 = ORDDATE, ITEM#, QTY (T2)
(Projection performed as the final step. No more tuple I/Os)
- 50 tuples (final result)

Total no. of tuple I/Os = 10000 reads + 100 reads = 10100 tuple I/Os

Comparison of the two Query Evaluation Methods: 10,100 tuple I/Os (Method 2) v/s 3,000,000 tuple I/Os (Method 1)!

Thus, by sequencing the operations differently, a dramatic difference can be made in the performance of queries. It needs to be noted that in Method 2, the first operation performed was a Select, which filters out 50 tuples from the 10,000 tuples in the ORD_ITEMS table; this operation thus causes the elimination of 9950 tuples. Elimination in the initial steps helps optimization.

Some more examples:

1. select CITY, COUNT(*) from CUSTTBL group by CITY having CITY != 'BOMBAY';
   v/s
   select CITY, COUNT(*) from CUSTTBL where CITY != 'BOMBAY' group by CITY;

2. select * from ORDTBL where to_char(ORDDATE, 'dd-mm-yy') = '11-08-94';
   v/s
   select * from ORDTBL where ORDDATE = to_date('11-08-94', 'dd-mm-yy');
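The tuple-I/O totals above are plain arithmetic; recomputing them makes the nearly 300-fold gap explicit:

```python
# The stated assumptions: 100 ORDTBL rows, 10,000 ORD_ITEMS rows, 50 'HW3' items.
n_ord, n_items = 100, 10_000

# Method 1: product first -- read both tables to form the product, write the
# 1,000,000-tuple intermediate result to disk, then read it back to restrict.
product = n_ord * n_items
method1 = product + product + product   # reads + writes + reads
print(method1)  # 3000000

# Method 2: restrict ORD_ITEMS first (10,000 reads -> 50 tuples, kept in
# memory), then join by reading the 100 ORDTBL rows.
method2 = n_items + n_ord
print(method2)  # 10100
```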

Here the second version is faster. In the first form of the query, the function to_char is applied on an attribute, and hence needs to be evaluated for each tuple in the table; the time for this evaluation is thus proportional to the cardinality of the relation. In the second form, the function to_date is applied on a constant, and hence needs to be evaluated just once, irrespective of the cardinality of the relation. Moreover, if the attribute ORDDATE is indexed, the index will not be used in the first case, since the attribute appears inside an expression and its value is not directly used.

6.3 The Query Optimization Process

The steps of query optimization are explained below.

a) Cast into some Internal Representation - This step involves converting each SQL query into some internal representation which is more suitable for machine manipulation. The internal form typically chosen is a query tree, as shown below.

Query Tree for the SELECT statement discussed above:

b) Convert to Canonical Form - In this second step, the optimizer makes use of some transformation laws or rules for sequencing the internal operations involved. Some examples are given below. (Note: In all these examples the second form will be more efficient, irrespective of the actual data values and physical access paths that exist in the stored database.)

Rule 1:
(A JOIN B) WHERE restriction_A AND restriction_B
→ (A WHERE restriction_A) JOIN (B WHERE restriction_B)
Restrictions, when applied first, cause eliminations and hence better performance.

Rule 2:
(A WHERE restriction_1) WHERE restriction_2

→ A WHERE restriction_1 AND restriction_2
The two restrictions are applied as a single compound restriction, instead of applying the two individual restrictions separately.

Rule 3:
(A[projection_1])[projection_2]
→ A[projection_2]
If there is a sequence of successive projections applied on the same relation, all but the last one can be ignored. The entire operation is equivalent to applying the last projection alone.

Rule 4:
(A[projection]) WHERE restriction
→ (A WHERE restriction)[projection]
Restrictions, when applied first, cause eliminations and hence better performance.

Reference [1] gives more such general transformation laws.

c) Choose Candidate Low-level Procedures - In this step, the optimizer decides how to execute the transformed query. At this stage, factors such as the existence of indexes or other access paths, physical clustering of records, distribution of data values etc. are considered. The basic strategy is to consider the query expression as a set of low-level operations, with implementation procedures predefined for each operation. Each such procedure has an associated cost measure, indicating its cost, typically in terms of disk I/Os. The optimizer chooses one or more candidate procedures for each low-level operation in the query. For example, there will be a set of procedures for implementing the restriction operation: one (say, procedure 'a') for the case where the restriction attribute is indexed, one (say, procedure 'b') where the restriction attribute is hashed, and so on. The information about the current state of the database (existence of indexes, current cardinalities etc.), which is available from the system catalog, is used to make this choice of candidate procedures.

d) Generate Query Plans and Choose the Cheapest - In this last step, query plans are generated by combining a set of candidate implementation procedures. This can be explained with the following example (a trivial one, but illustrative enough). Assume that there is a query expression comprising a restriction, a join and a projection. The implementation procedures available for each of these operations can be assumed as given in the table below.

Operation     Implementation Procedure   Condition Existing
Restriction   a                          Restriction attribute is indexed
Restriction   b                          Restriction attribute is hashed
Restriction   c                          Restriction attribute is neither indexed nor hashed
Join          d
Join          e
Projection    f
Projection    g
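Rule 1's effect can be seen on toy in-memory relations: pushing the restrictions below the join yields the same result, while the join inputs shrink from 100 tuples to 10 each (the relations and predicates are invented):

```python
# Relations as lists of dicts; the join condition is equality on "id".
a = [{"id": i, "x": i % 10} for i in range(100)]
b = [{"id": i, "y": i % 10} for i in range(100)]

def join(r, s):
    return [{**ra, **rb} for ra in r for rb in s if ra["id"] == rb["id"]]

# (A JOIN B) WHERE x = 0 AND y = 0: the join builds 100 tuples, then restricts.
late = [t for t in join(a, b) if t["x"] == 0 and t["y"] == 0]

# (A WHERE x = 0) JOIN (B WHERE y = 0): each input shrinks to 10 tuples first.
early = join([t for t in a if t["x"] == 0], [t for t in b if t["y"] == 0])

print(len(late), len(early))  # 10 10 -- same result, smaller intermediates
```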

Now the various query plans for the original query expression can be generated by making permutations of the implementation procedures available for the different operations. Thus the query plans can be: adf, adg, aef, aeg, bdf, ... It has to be noted that, in reality, the number of such possible query plans can be too many, and hence generating all such plans and then choosing the cheapest will be expensive by itself. Hence a heuristic reduction of the search space, rather than an exhaustive search, needs to be done. Considering the above example, one such heuristic can be as follows: if the system knows that the restriction attribute is neither indexed nor hashed, then only the query plans involving implementation procedure 'c' (and not 'a' and 'b') need to be considered, and the cheapest plan can be chosen from the reduced set of query plans.

6.4 Query Optimization in Oracle

Some of the query optimization measures used in Oracle are the following:

Indexes unnecessary for small tables - If the size of the actual data record is not much larger than the index record, the search times in the index table and the data table will be comparable. Hence indexes will not make much difference in the performance of queries.

Indexes/clusters used when retrieving less than 25% of rows - The overhead of searching in the index file will be more when retrieving more rows.

Index not used in queries containing NULL / NOT NULL - Index tables will not have NULL / NOT NULL entries, hence there is no need to search for these in the index table.

Multiple-column WHERE clauses - The evaluations causing the largest number of eliminations are performed first.

JOIN columns should be indexed - JOIN columns or Foreign Key columns may be indexed, since queries based on these columns can be expected to be very frequent.
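The plan-enumeration and heuristic-pruning step can be sketched with invented cost figures (the per-procedure costs below are not from any real optimizer):

```python
from itertools import product

# Candidate procedures per operation, as in the table above, with
# made-up disk-I/O costs for each procedure.
candidates = {"restriction": ["a", "b", "c"],
              "join": ["d", "e"],
              "projection": ["f", "g"]}
cost = {"a": 5, "b": 8, "c": 120, "d": 40, "e": 60, "f": 3, "g": 7}

plans = ["".join(p) for p in product(*candidates.values())]
print(plans[:5])  # ['adf', 'adg', 'aef', 'aeg', 'bdf']

# Heuristic reduction: if the restriction attribute is neither indexed
# nor hashed, only plans using procedure 'c' are feasible.
feasible = [p for p in plans if p[0] == "c"]
cheapest = min(feasible, key=lambda p: sum(cost[ch] for ch in p))
print(cheapest)  # 'cdf'
```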
