DATABASE MANAGEMENT SYSTEM

A DBMS is a software system whose main purpose is to store data, but not every program that stores data is a DBMS. Characteristics of a DBMS:
- It must provide an easy language for retrieval and manipulation of data. The language should not require complex programming techniques, and it should support structured programming. Language supported by DBMS: SQL (Structured Query Language).
- It must provide concurrent access to data (multiple transactions can be performed on the data at the same time).
- It must provide data integrity (for example, no replicated data).
- It must provide security (prevent access to data by unauthorized users).

A DBMS stores data in the form of tables (relations). A table consists of rows (also called records or tuples) and columns (also called fields or attributes). A database is a collection of tables. Consider the Student table given below.

RollNo  Name     Address           DOB
1       Vivek    12, jivaji nagar  18/8/87
2       Priyesh  M25-gandhinagar   15/6/90

In this Student table there are four columns (RollNo, Name, Address, DOB) and two rows (records). Each column is defined with a data type. Some general data types supported by DBMSs are:
1. Varchar - for strings of characters
2. Number - for numeric values
3. Date - for date and time

Constraints are conditions that must be satisfied whenever data is inserted, deleted or modified. Suppose a constraint on the Student table must check that no student is older than 20; we can then apply the check DOB > '31/12/1990'. If we try to insert a student whose DOB is on or before 31/12/1990, the insertion gives an error and the transaction does not complete successfully.

RDBMS - A software system is said to be a relational DBMS if it follows all 12 rules proposed by E. F. Codd. By this document's count, Oracle follows only 7-8 of the rules and MS Access only 3-4.

STRUCTURED QUERY LANGUAGE

This is the standard query language that every RDBMS should implement for defining, manipulating and retrieving data. SQL statements can be divided into 3 categories:
(i) DDL (Data Definition Language): used to define tables, e.g. create, alter, drop
(ii) DML (Data Manipulation Language): used to manipulate data in tables, e.g. insert, update, delete, select
(iii) DCL (Data Control Language): e.g. commit, rollback (many texts class these two as transaction control and reserve DCL for grant and revoke)
Uses of some components of SQL are given below:
1. Create: used to create a table

Example:

    create table Student (
        Rollno  Number(5),
        Name    Varchar(20) Not Null,
        Address Varchar(30),
        DOB     Date
    )

ENGINEER’S CIRCLE, GWALIOR

This query will create a table with four fields:
i. Rollno - number type, with at most 5 digits.
ii. Name - variable-length character string with at most 20 characters. Not Null is a constraint specifying that values in the Name field cannot be null.
iii. Address - variable-length character string with at most 30 characters.
iv. DOB - date type.

The new table is empty:

RollNo  Name  Address  DOB

1. Insert: used to Insert data into table.

Example:
    insert into Student values (1, 'Vivek', '12, jivaji nagar', '26/10/1987')
This query will insert a tuple into the Student table.

RollNo  Name   Address           DOB
1       Vivek  12, jivaji nagar  26/10/1987

Note: varchar and date values in a query must be written in single quotes. SQL keywords, column names and table names are not case sensitive, but data is case sensitive. If we want to insert data into particular fields only, the query is:
    insert into student (Rollno, Name) values (2, 'Priyesh')
Now the table will be:

RollNo  Name     Address           DOB
1       Vivek    12, jivaji nagar  26/10/1987
2       Priyesh  Null              Null

If we try to insert a tuple without a name, the query gives an error on execution, because the Name field is defined with the constraint 'not null'. Example:
    insert into student (Rollno) values (3);
This query generates an error because a null value is not accepted in the Name field.
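The create and insert steps above can be tried out with a small sketch. It assumes SQLite through Python's standard sqlite3 module as a stand-in for the DBMS used in these notes, so the column types are adapted (SQLite has no Number or Date type, and the date is stored as ISO text).

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the DBMS of the notes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Student ("
    " Rollno integer,"
    " Name varchar(20) not null,"
    " Address varchar(30),"
    " DOB text)"  # SQLite has no Date type; ISO text is used instead
)

# Full-row insert, then an insert into particular fields only.
conn.execute("insert into Student values (1, 'Vivek', '12, jivaji nagar', '1987-10-26')")
conn.execute("insert into Student (Rollno, Name) values (2, 'Priyesh')")

# Omitting Name violates the not null constraint and raises an error.
try:
    conn.execute("insert into Student (Rollno) values (3)")
    not_null_error = None
except sqlite3.IntegrityError as exc:
    not_null_error = str(exc)

rows = conn.execute("select * from Student order by Rollno").fetchall()
print(rows)
print(not_null_error)
```

The fields left out of the second insert come back as None, which is how sqlite3 reports Null.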
2. Update: used to update values in table

Example:
    update student set Address = '21,gandhinagar'
This query will update all values in the Address field to '21,gandhinagar'.

RollNo  Name     Address         DOB
1       Vivek    21,gandhinagar  26/10/1987
2       Priyesh  21,gandhinagar  Null

But if we want to change the address of particular rows only, we can use a where clause. Example:
    update student set Address = '12,Jivajinagar' where rollno = 1
This query will update the address of the student with rollno 1 to '12,Jivajinagar'.
3. Delete: used to delete rows from table

Example:
    delete from student
This will delete all rows in the student table. If we want to delete particular rows, we can use a where clause. Example:
    delete from student where DOB < '31/12/1990'
This query will delete all records with a date of birth earlier than 31/12/1990.
4. Commit: save changes permanently to disk.

Example:
    update student set Name = 'Vikas' where rollno = 1;
    commit;
';' is used to separate multiple statements. The commit statement makes these changes permanent on disk. If you don't commit a transaction, its changes are not yet permanent, and if the computer crashes or loses power before the commit, they are no longer visible when you restart. Generally a DBMS also offers autocommit, which executes commit automatically (for example after every statement or at intervals).
5. Rollback: undo changes made since the last commit.
6. Drop: used to delete a table

Example:
    drop table student
This query removes the table from the disk.
Note: in many DBMSs (Oracle, for example) drop and create cannot be rolled back.
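The effect of commit and rollback described above can be observed with a minimal sketch. It assumes SQLite through Python's sqlite3 module with its default transaction handling; the table and values are the ones from the commit example.

```python
import sqlite3

# Assumption: in-memory SQLite with sqlite3's default (implicit) transactions.
conn = sqlite3.connect(":memory:")
conn.execute("create table student (rollno integer, name text)")
conn.execute("insert into student values (1, 'Vivek')")
conn.commit()  # the inserted row is now permanent

# An uncommitted update is undone by rollback.
conn.execute("update student set name = 'Vikas' where rollno = 1")
conn.rollback()
after_rollback = conn.execute("select name from student").fetchone()[0]

# Once committed, the same update survives a later rollback.
conn.execute("update student set name = 'Vikas' where rollno = 1")
conn.commit()
conn.rollback()  # nothing is left to undo
after_commit = conn.execute("select name from student").fetchone()[0]

print(after_rollback, after_commit)
```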
7. Select: used to view data

Example:
    select * from student
This query displays all rows and columns of the student table. To view particular rows and columns, a column list and a where clause can be used:
    select rollno, name from student where DOB > '1/1/1995'
This query returns only two columns, rollno and name, for the students born after 1/1/1995. In this type of query, records are first selected on the basis of the conditions in the where clause, and then the fields specified in the query are displayed.
8. Distinct: used to produce non-duplicate result.

Example:
    select distinct name from student
This query returns the names of students; if two or more students have the same name, that name is returned only once.
Query:
    select distinct name, address from student
Here distinct applies to the whole (name, address) combination. Two students with the same name but different addresses both appear in the result; only rows in which both name and address are equal are merged into one.
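A short sketch of how distinct behaves on one column and on two columns. It assumes SQLite via Python's sqlite3 module; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table student (name text, address text)")
conn.executemany(
    "insert into student values (?, ?)",
    [  # hypothetical rows: 'Vivek' appears three times, twice with the same address
        ("Vivek", "Gwalior"),
        ("Vivek", "Indore"),
        ("Vivek", "Gwalior"),
        ("Priyesh", "Indore"),
    ],
)

# distinct on one column: each name appears once.
names = conn.execute("select distinct name from student order by name").fetchall()

# distinct on two columns: each (name, address) combination appears once.
pairs = conn.execute(
    "select distinct name, address from student order by name, address"
).fetchall()

print(names)
print(pairs)
```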
9. Where clause: to understand the where clause and its conditions, consider the relation Employee (empid, ename, salary, job, deptid).

Q1. Write a query to find the names of all employees of deptid 10.
Ans. select ename from employee where deptid = 10
Q2. Write a query to find the names of all managers in deptid 20.
Ans. select ename from employee where deptid = 20 and job = 'Manager'
Q3. Write a query to find the names of employees belonging to deptid 10 or deptid 20.
Ans. select ename from employee where deptid = 10 or deptid = 20
This query can be rewritten with the use of 'in' as
    select ename from employee where deptid in (10, 20)
'in' checks for set membership. In this example 'in' checks whether deptid ∈ {10, 20}.
Q4. Write a query to find the names of employees having a salary between 1000 and 2000.
Ans. select ename from employee where salary >= 1000 and salary <= 2000
This query can be rewritten with the use of 'between' as

    select ename from employee where salary between 1000 and 2000
Between includes the boundary values: in the above example, salaries equal to 1000 or 2000 are also selected.
Q5. Write a query to find the names of employees whose names start with A, B, C or D.
Note: before solving this query we first need to understand how strings are handled in SQL. Suppose S1 = 'ABC' and S2 = 'X'; which of S1 and S2 is greater? SQL compares strings from left to right, position by position, according to the ASCII values of the characters. 'A' from S1 is compared with 'X' from S2; since the ASCII value of 'X' is greater than the ASCII value of 'A', S2 > S1. If the first characters of S1 and S2 are the same, the comparison moves on to the next position.
Examples:
S1 = 'ADAMS', S2 = 'A': S1 > S2, because the first characters are equal and S2 has no character left to compare with 'D' (a string is greater than any of its proper prefixes).
S1 = 'ADAMS', S2 = 'D': S1 < S2, because the ASCII value of 'D' is greater than the ASCII value of 'A'.

Ans. To find the names of employees whose names start with A, B, C or D, suppose we use the query
    select ename from employee where ename between 'A' and 'D'
It displays all names starting with 'A', 'B' and 'C', but of the names starting with 'D' only the name 'D' itself, because any longer name starting with 'D' is greater than 'D'. Instead we can use the query
    select ename from employee where ename >= 'A' and ename < 'E'
Q6. Write a query to find the names of employees having no deptid.
Ans. select ename from employee where deptid = null
This query will not display any row, because Null cannot be used with relational operators. deptid = null is neither true nor false; it evaluates to null. To compare with Null, SQL provides the 'is' and 'is not' operators. The above query is correctly written as:
    select ename from employee where deptid is null
Q7. Write a query to find the names of employees whose names start with 'C' and end with 'T'.
Ans. For this type of query SQL provides wildcards and the like operator. The wildcard '%' matches a string of any length, including zero. The wildcard '_' matches any single character. The like and not like operators are used when a string is compared with a pattern containing wildcards. The query for this question is:
    select ename from employee where ename like 'C%T'
Q8. Write a query to find the names of employees whose names contain 'C' as the second character.
Ans. select ename from employee where ename like '_C%'
Q9. Write a query to find the names of employees whose names contain at least two 'T's.
Ans. select ename from employee where ename like '%T%T%'
Q10. Write a query to find the names of employees whose names contain exactly two 'T's.

Ans. select ename from employee where ename like '%T%T%' and ename not like '%T%T%T%'
Q11. Write a query to find the names of employees whose names contain exactly two characters.
Ans. select ename from employee where ename like '__'
Q12. Write a query to find the names of employees whose names contain '_'.
Ans. If a wildcard character itself has to be matched in the data, an escape character is placed before it. Here '/' is declared as the escape character with the escape clause:
    select ename from employee where ename like '%/_%' escape '/'
Q13. Write a query to find the names of employees whose names contain '/'.
Ans. select ename from employee where ename like '%//%' escape '/'
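Several of the where-clause answers above (string ranges, Null comparison, like with wildcards and an escape character) can be checked with the sketch below. It assumes SQLite via Python's sqlite3 module and a made-up set of upper-case names; note that SQLite's like ignores case for ASCII letters, which other systems may not. The helper function names is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table employee (ename text, deptid integer)")
conn.executemany(
    "insert into employee values (?, ?)",
    [  # hypothetical sample data
        ("ADAMS", 10), ("BLAKE", 20), ("CLINT", None), ("D", 10),
        ("DAVID", 20), ("EVANS", None), ("A_B", 10), ("TT", 20),
    ],
)

def names(condition):
    """Return the enames that satisfy a where condition, sorted by ename."""
    sql = "select ename from employee where " + condition + " order by ename"
    return [row[0] for row in conn.execute(sql)]

closed_range = names("ename between 'A' and 'D'")         # misses 'DAVID'
half_open = names("ename >= 'A' and ename < 'E'")         # includes 'DAVID'
equals_null = names("deptid = null")                      # always empty
is_null = names("deptid is null")
c_to_t = names("ename like 'C%T'")
exactly_two_t = names("ename like '%T%T%' and ename not like '%T%T%T%'")
has_underscore = names("ename like '%/_%' escape '/'")

print(closed_range, half_open, equals_null, is_null)
print(c_to_t, exactly_two_t, has_underscore)
```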
10. Order by: used to display data in order (either increasing or decreasing).

In a table, data is present in the order in which it was inserted. While retrieving data we can use the order by keyword to display it in increasing or decreasing order. Example:
    select ename from employee order by ename
This displays the names in alphabetical order. Suppose the employee table contains:

ename
SCOTT
Allen
Null
123
Adams
Smith

In alphabetical increasing order the result is:

ename
Null
123
Adams
Allen
SCOTT
Smith

Digits sort before letters and upper-case letters before lower-case ones, following the ASCII values. Whether Null is placed first (as shown here) or last depends on the DBMS.

If two or more employees have the same name, we can order those rows according to another column:
    select ename from employee order by ename, job desc
desc is used to specify decreasing order.
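The ordering rules above can be checked with the following sketch, which assumes SQLite via Python's sqlite3 module. SQLite happens to place Null first in increasing order; other systems may place it last. The job values and the second 'Allen' row are made up to show the tie-break.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table employee (ename text, job text)")
conn.executemany(
    "insert into employee values (?, ?)",
    [  # names from the notes; jobs are hypothetical
        ("SCOTT", "Clerk"), ("Allen", "Clerk"), (None, "Clerk"),
        ("123", "Clerk"), ("Adams", "Clerk"), ("Smith", "Clerk"),
        ("Allen", "Manager"),
    ],
)

# Increasing order on ename.
ascending = [row[0] for row in conn.execute("select ename from employee order by ename")]

# Equal names are ordered by job in decreasing order.
tie_break = conn.execute(
    "select ename, job from employee where ename = 'Allen' order by ename, job desc"
).fetchall()

print(ascending)
print(tie_break)
```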
11. Aggregate functions: consider the Employee table given below:

empid  Ename    Job         Salary  Deptid
1      Vivek    Manager     50000   25
2      Priyesh  Programmer  20000   25
3      Pawan    HR          30000   26
4      Hermesh  Programmer  22000   Null
5      Null     Manager     Null    Null

There are 5 aggregate functions supported by SQL:
(i) sum(column_name) - used to find the sum of the values in the field column_name.
Example: select sum(salary) from employee
This query returns the sum of the salaries of all employees.
Output: sum(salary) = 122000
(ii) count(column_name) - used to count the values in the field column_name. Rows in which column_name is null are not counted. To count all rows regardless of nulls, use count(*).
Query: select count(salary) from employee
Output: count(salary) = 4
Query: select count(*) from employee
Output: count(*) = 5
Standard SQL's count takes a single argument. Counting a combination of columns, as in count(salary, deptid), is not portable, and where a DBMS accepts such a form its treatment of nulls is specific to that DBMS.
(iii) max(column_name) - used to find the maximum in a column.
Query: select max(salary) from employee
Output: max(salary) = 50000
(iv) min(column_name) - used to find the minimum in a column.
Query: select min(salary) from employee
Output: min(salary) = 20000
(v) avg(column_name) - used to find the average of the values in a column; it is computed as sum(column_name) / count(column_name), so null values are ignored.
Query: select avg(salary) from employee
Output: avg(salary) = 30500 (122000 / 4 = 30500)
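The five aggregate functions can be run against the Employee table from this section with the sketch below (assuming SQLite via Python's sqlite3 module). It also shows count(*) next to count(salary) to make the treatment of Null visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute(
    "create table employee (empid integer, ename text, job text, salary integer, deptid integer)"
)
conn.executemany(
    "insert into employee values (?, ?, ?, ?, ?)",
    [  # the Employee table from the notes; None stands for Null
        (1, "Vivek", "Manager", 50000, 25),
        (2, "Priyesh", "Programmer", 20000, 25),
        (3, "Pawan", "HR", 30000, 26),
        (4, "Hermesh", "Programmer", 22000, None),
        (5, None, "Manager", None, None),
    ],
)

# All aggregates in one query; the null salary of empid 5 is ignored
# by every function except count(*).
totals = conn.execute(
    "select sum(salary), count(salary), count(*), max(salary), min(salary), avg(salary)"
    " from employee"
).fetchone()
print(totals)
```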
12. As: used for renaming a field returned by select. In the above query avg(salary) is the column name returned by select; we can rename it using the 'as' operator.
Query: select avg(salary) as average_salary from employee
Output: average_salary = 30500
13. Group value function (group by): suppose we want to find the average salary in each department.
Query: select deptid, avg(salary) from employee group by deptid
Output:

Deptid  avg(salary)
25      35000
26      30000
Null    22000

If null is an entry in deptid, a separate group is created for the rows having a null deptid, and the aggregate function is then applied to it. (avg ignores null salaries, so the Null group averages only Hermesh's 22000.) If you try to print avg(salary) and deptid without group by, as in
    select deptid, avg(salary) from employee
the query gives an error because the results are not compatible: avg(salary) is a single value for the whole table, while deptid has one value per row (to see this, try to draw the result table for the query).
Note: a column can be selected alongside aggregate functions only if that column appears in the group by clause.
Query: select deptid, job, avg(salary) from employee group by deptid, job
This query first divides the table into groups by deptid; each deptid group is then divided into subgroups by job.
Output:

Deptid  Job         avg(salary)
25      Manager     50000
25      Programmer  20000
26      HR          30000
Null    Manager     Null
Null    Programmer  22000
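The two group by queries above can be reproduced with this sketch (assuming SQLite via Python's sqlite3 module). An order by is added so the group order is predictable; SQLite lists the Null group first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute(
    "create table employee (empid integer, ename text, job text, salary integer, deptid integer)"
)
conn.executemany(
    "insert into employee values (?, ?, ?, ?, ?)",
    [  # the Employee table from the aggregate-function section
        (1, "Vivek", "Manager", 50000, 25),
        (2, "Priyesh", "Programmer", 20000, 25),
        (3, "Pawan", "HR", 30000, 26),
        (4, "Hermesh", "Programmer", 22000, None),
        (5, None, "Manager", None, None),
    ],
)

# One group per deptid; the rows with a null deptid form their own group.
by_dept = conn.execute(
    "select deptid, avg(salary) from employee group by deptid order by deptid"
).fetchall()

# Groups by deptid, subdivided by job.
by_dept_job = conn.execute(
    "select deptid, job, avg(salary) from employee"
    " group by deptid, job order by deptid, job"
).fetchall()

print(by_dept)
print(by_dept_job)
```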

14. Having: it is like a where clause, but it applies to the groups formed by group by. A where clause cannot contain aggregate functions, so to apply a condition on an aggregate to each group, use having:
    select avg(salary) from employee group by deptid having count(empid) >= 5
This query returns the average salary of each department having at least 5 employees. First the table is grouped according to deptid, then the groups having at least 5 employees are selected, and finally the average is computed for each selected group.
Q14. Write a query to find the deptids that have more than 5 employees with a salary of more than 5000.
Ans. select deptid from employee where salary > 5000 group by deptid having count(empid) > 5
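The order of evaluation in Q14 (where first, then group by, then having) can be seen in the sketch below. It assumes SQLite via Python's sqlite3 module, uses made-up data, and lowers the threshold from 5 employees to 2 so that a small table is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table employee (empid integer, salary integer, deptid integer)")
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [  # hypothetical data: dept 10 has 4 employees, dept 20 has 3, dept 30 has 1
        (1, 6000, 10), (2, 7000, 10), (3, 8000, 10), (4, 4000, 10),
        (5, 6000, 20), (6, 9000, 20), (7, 3000, 20),
        (8, 7000, 30),
    ],
)

# where removes low salaries before grouping; having then filters the groups.
well_paid = [row[0] for row in conn.execute(
    "select deptid from employee where salary > 5000"
    " group by deptid having count(empid) > 2 order by deptid"
)]

# Without the where clause every employee is counted.
any_salary = [row[0] for row in conn.execute(
    "select deptid from employee group by deptid having count(empid) > 2 order by deptid"
)]

print(well_paid, any_salary)
```

Department 20 has three employees but only two of them earn more than 5000, so it passes the second query and not the first.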
15. Nested queries: a query within a query is called a nested query.

Example:
    select ename from employee where salary = (select max(salary) from employee)
In this query, the query inside the parentheses is the inner query. The inner query executes first and returns the maximum salary in the employee table, which is 50000; the outer query then becomes:
    select ename from employee where salary = 50000
Q15. Write a query to find the names of employees having a salary greater than that of 'Hermesh'.
Ans. select ename from employee where salary > (select salary from employee where ename = 'Hermesh')
This query runs correctly if there is only one employee named 'Hermesh', so that the inner query returns a single value. But if there is more than one employee named 'Hermesh', the inner query returns a set of values, and > (like every relational operator) can compare single values but not a set of values. The condition then looks like salary > {20000, 50000}, which is not a valid statement, so the query gives an error. To overcome this type of error, any, all and some are used with relational operators.
Example:
    select ename from employee where salary > all (select salary from employee where ename = 'Hermesh')
Now, if the table has more than one employee named 'Hermesh' and the inner query returns a set of salaries, say {1000, 30000, 15000, 20000}, then '> all' compares the salary of each row in the table with all values in the set. Put more simply, '> all' amounts to comparing with the maximum of the set: a salary that is greater than the maximum of the set is also greater than every other value in it, so there is no need to compare with the other values. Similarly:
> all - greater than the maximum of the set
< all - less than the minimum of the set
= all - true only when every value in the set equals the compared value
> any - greater than the minimum of the set
< any - less than the maximum of the set
= any - same as the 'in' operator
some is the same as any.

Nested queries for two or more tables
Examples: consider two relations, employee and department:
Employee (empno, ename, deptno)
Department (deptno, dname, dcity)
Q16.
Write a query to find the names of the employees of the 'Research' department.
Ans. select ename from employee where deptno in (select deptno from department where dname = 'Research')
Here the inner query runs on the department table and returns deptno values; these are matched against the deptno of the employee table.
Q17. Write a query to find the names of employees whose department is in Delhi.
Ans. select ename from employee where deptno in (select deptno from department where dcity = 'Delhi')

16. Key constraints:

Unique key - a field whose values are unique across the relation can be declared a unique key. A unique key can have any number of null values. For example, if we declare the ename field of the employee table a unique key, the table cannot contain two or more employees with the same name, but the field may still hold any number of null values. A unique key can also be assigned to a group of fields. Example: (empid, ename) can be assigned a unique key. In this case employees can have the same name or the same empid, but two or more employees cannot have both empid and ename the same.

Primary key - this is defined as a unique key with a Not Null constraint. If a field is the primary key, it cannot have null values; similarly, if a group of fields is the primary key, none of those fields can be null in any record. Example: if (empid, ename) is the primary key, then neither (1, Null) nor (Null, Null) is allowed in the table. The primary key uniquely identifies a record. In relation schemas the primary key is denoted by underlining its fields. For example, the relation employee(empid, ename, deptno) may have the primary key (ename, deptno).

Candidate key - a field or combination of fields that could be chosen as the primary key. In other words, the keys that are candidates for the primary key are called candidate keys. For example, if both empid and (ename, deptno) could serve as the primary key of the table, both are candidate keys, but only one of them is chosen as the primary key. We should choose the candidate key that is used most in queries: the primary key is used for indexing the table, and indexes speed up searching.

Super key - any field or combination of fields that uniquely identifies a record is called a super key. A candidate key (and hence the primary key) is a minimal super key. Super keys can be subdivided while a candidate key cannot: if we remove any field from a candidate key, the remaining combination is no longer a candidate key or even a super key. If we add more fields to a candidate key, the combination is no longer a candidate key but is still a super key.
Suppose empid and (ename, deptno) are the candidate keys of the employee relation; then the possible super keys are: empid, (empid, ename), (empid, deptno), (ename, deptno) and (empid, ename, deptno). All candidate keys are super keys, but not all super keys are candidate keys. Similarly, every primary key is a candidate key, but not vice versa. A unique key, because it may contain nulls, does not necessarily identify every record, but the non-null values in this field or combination of fields are unique.

While defining a table we can create a primary key and unique keys. Example:
    create table employee (empid number(6) primary key, ename varchar(20) unique, deptno number(2));
If we want to make (ename, deptno) the primary key, the query is:
    create table employee (empid number(6), ename varchar(20), deptno number(2), primary key (ename, deptno));

Foreign key - consider two relations employee(empid, ename, deptno) and department(deptno, dname, city). If we want to make sure that every deptno inserted or modified in the employee relation is present in the department table, in other words that every employee is in a valid department, we make deptno of the employee table (employee.deptno) a foreign key that references the deptno field of the department table (department.deptno). To create the foreign key, first define the department table:
    create table department (deptno number(2) primary key, dname varchar(20), city varchar(20));
Now define the employee table as:
    create table employee (empid number(6), ename varchar(20), deptno number(2), foreign key (deptno) references department (deptno));
Note: a foreign key can have any number of null values, and the field referred to by a foreign key must be unique. Suppose the entries in the tables are:

Table: Department
Deptno  Dname     City
10      Research  Gwalior
12      Account   Gwalior
13      Managing  Indore

Table: Employee
Empid  Ename    Deptno
1      Vivek    12
2      Hermesh  10
3      Priyesh  10
4      Pawan    Null

If we try to insert the tuple (5, 'XYZ', 20) into the employee table, the query generates an error because deptno 20 is not in the Department table. Suppose we delete the row with deptno = 10 from the department table: two records in the employee table above (empid 2 and empid 3) would become invalid, so SQL does not allow the row to be deleted from the department table until these two rows are deleted.

On delete cascade: this is used while defining the foreign key to handle the above situation. Example:
    create table employee (empid number(6), ename varchar(20), deptno number(2), foreign key (deptno) references department (deptno) on delete cascade);
Now, when the row with deptno 10 is deleted from the department table, the rows having deptno 10 in the employee table are deleted along with it.

On delete set null: in the above case data from the employee table is lost. If a department is removed from the company, that does not mean the employees belonging to it are removed as well; they can be shifted to another department or remain in no department. For this case 'on delete set null' can be used instead of 'on delete cascade'. When we delete the row with deptno = 10 from the department table, the deptno of 'Hermesh' and 'Priyesh' is first set to null and then the row is deleted from the department table.
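The foreign key behaviour described above can be tried with the sketch below. It assumes SQLite via Python's sqlite3 module, where foreign key checks must be switched on with a pragma, and it uses the on delete set null variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # assumption: SQLite as the test DBMS
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled

conn.execute("create table department (deptno integer primary key, dname text, city text)")
conn.execute(
    "create table employee (empid integer primary key, ename text, deptno integer,"
    " foreign key (deptno) references department (deptno) on delete set null)"
)
conn.executemany(
    "insert into department values (?, ?, ?)",
    [(10, "Research", "Gwalior"), (12, "Account", "Gwalior"), (13, "Managing", "Indore")],
)
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [(1, "Vivek", 12), (2, "Hermesh", 10), (3, "Priyesh", 10), (4, "Pawan", None)],
)

# deptno 20 does not exist in department, so this insert is rejected.
try:
    conn.execute("insert into employee values (5, 'XYZ', 20)")
    fk_error = None
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)

# Deleting department 10 sets the deptno of its employees to null.
conn.execute("delete from department where deptno = 10")
employees = conn.execute("select * from employee order by empid").fetchall()

print(fk_error)
print(employees)
```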


17. Join: to understand join, we first look at the Cartesian product of tables. The Cartesian product of the two tables from the previous example (each row of one table is combined with each row of the other table) is:

Employee × Department
Empid  Ename    Employee.Deptno  Department.Deptno  Dname     City
1      Vivek    12               10                 Research  Gwalior
1      Vivek    12               12                 Account   Gwalior
1      Vivek    12               13                 Managing  Indore
2      Hermesh  10               10                 Research  Gwalior
2      Hermesh  10               12                 Account   Gwalior
2      Hermesh  10               13                 Managing  Indore
3      Priyesh  10               10                 Research  Gwalior
3      Priyesh  10               12                 Account   Gwalior
3      Priyesh  10               13                 Managing  Indore
4      Pawan    Null             10                 Research  Gwalior
4      Pawan    Null             12                 Account   Gwalior
4      Pawan    Null             13                 Managing  Indore

If the 1st table has m rows and the 2nd table has n rows, the Cartesian product has m×n rows. The query for the above Cartesian product is:
    select * from employee, department
If two tables have a column with the same name, the columns are distinguished by prefixing the table name and a dot ('.'). Now, if we want the department information of each employee, the query is:
    select empid, ename, department.deptno, dname, city from employee, department where employee.deptno = department.deptno;
The result of this query is:

Empid  Ename    Department.Deptno  Dname     City
1      Vivek    12                 Account   Gwalior
2      Hermesh  10                 Research  Gwalior
3      Priyesh  10                 Research  Gwalior
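Both queries above can be run on the Employee and Department tables of the previous example with this sketch (assuming SQLite via Python's sqlite3 module). It counts the rows of the Cartesian product and lists the joined rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table department (deptno integer, dname text, city text)")
conn.execute("create table employee (empid integer, ename text, deptno integer)")
conn.executemany(
    "insert into department values (?, ?, ?)",
    [(10, "Research", "Gwalior"), (12, "Account", "Gwalior"), (13, "Managing", "Indore")],
)
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [(1, "Vivek", 12), (2, "Hermesh", 10), (3, "Priyesh", 10), (4, "Pawan", None)],
)

# Cartesian product: 4 employee rows x 3 department rows.
product_count = conn.execute("select count(*) from employee, department").fetchone()[0]

# Join: keep only the combinations whose deptno values match.
joined = conn.execute(
    "select empid, ename, department.deptno, dname, city"
    " from employee, department"
    " where employee.deptno = department.deptno order by empid"
).fetchall()

print(product_count)
print(joined)
```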

This is called a natural join, in which two or more tables are joined on their common fields. Pawan does not appear in the result because his deptno is Null and matches no department.
Q18. Write a query to find the names of employees in the Account department.

Ans. select ename from employee, department where employee.deptno = department.deptno and dname = 'Account'

This problem can also be solved by using a nested query:
    select ename from employee where deptno = (select deptno from department where dname = 'Account');
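As a quick check that the join and the nested query for Q18 give the same answer, here is a small sketch (assuming SQLite via Python's sqlite3 module and the Employee and Department rows of the foreign key example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table department (deptno integer, dname text, city text)")
conn.execute("create table employee (empid integer, ename text, deptno integer)")
conn.executemany(
    "insert into department values (?, ?, ?)",
    [(10, "Research", "Gwalior"), (12, "Account", "Gwalior"), (13, "Managing", "Indore")],
)
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [(1, "Vivek", 12), (2, "Hermesh", 10), (3, "Priyesh", 10), (4, "Pawan", None)],
)

# Q18 solved with a join.
via_join = conn.execute(
    "select ename from employee, department"
    " where employee.deptno = department.deptno and dname = 'Account'"
).fetchall()

# Q18 solved with a nested query; the inner query returns one deptno.
via_nested = conn.execute(
    "select ename from employee"
    " where deptno = (select deptno from department where dname = 'Account')"
).fetchall()

print(via_join, via_nested)
```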

Note: some problems can be solved by both a join and a nested query, some can be solved only by a join, and some only by a nested query. There are also problems that can be solved by neither.

Self join: joining a table with itself. Consider the table employee(empid, ename, mgrid), where empid is the primary key of the table and mgrid is a foreign key that references empid of the same table. This table contains information about employees and their managers.

Empid  Ename     Mgrid
1      Vivek     Null
2      Hermesh   3
3      Priyesh   1
4      Pawan     5
5      Ravindra  7
6      Aditya    1
7      Mohan     6

Now, if we want to find out who is the manager of Aditya, we have to use a self join:
    select e2.ename from employee e1, employee e2 where e1.mgrid = e2.empid and e1.ename = 'Aditya';
Here e1 and e2 are aliases of the employee table. They act like two copies of the employee table, and these copies are joined by equating e1.mgrid and e2.empid.

Outer joins: notice that data is lost when a join is applied to two relations, because rows without a match are dropped. In some cases this lost data might hold useful information. An outer join retains the rows that would otherwise have been lost from the tables, replacing the missing data with nulls. There are three forms of the outer join, depending on which data is to be kept.
• LEFT OUTER JOIN - keep data from the left-hand table
• RIGHT OUTER JOIN - keep data from the right-hand table
• FULL OUTER JOIN - keep data from both tables


Example:
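A minimal sketch of a self join and a left outer join on the employee(empid, ename, mgrid) table from the self join discussion. It assumes SQLite via Python's sqlite3 module; only the left outer join is shown because older SQLite versions do not support the right and full forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table employee (empid integer, ename text, mgrid integer)")
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [  # the manager table from the self join discussion
        (1, "Vivek", None), (2, "Hermesh", 3), (3, "Priyesh", 1), (4, "Pawan", 5),
        (5, "Ravindra", 7), (6, "Aditya", 1), (7, "Mohan", 6),
    ],
)

# Self join: who is the manager of Aditya?
manager_of_aditya = conn.execute(
    "select e2.ename from employee e1, employee e2"
    " where e1.mgrid = e2.empid and e1.ename = 'Aditya'"
).fetchall()

# Left outer join: every employee is kept, even Vivek, who has no manager;
# the missing manager name is filled with null.
with_managers = conn.execute(
    "select e1.ename, e2.ename from employee e1"
    " left outer join employee e2 on e1.mgrid = e2.empid order by e1.empid"
).fetchall()

print(manager_of_aditya)
print(with_managers)
```

A plain (inner) join on the same condition would return only six rows, because Vivek's Null mgrid matches no empid.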

1. Correlated queries: this is a special type of nested query in which the inner query is executed for every row selected by the outer query. Example: for the relation employee(empid, ename, deptno, salary), write a query to find the employees earning the highest salary in their department.
    select * from employee e1 where e1.salary = (select max(salary) from employee e2 where e1.deptno = e2.deptno)
Now suppose there are 10 records in the employee table selected by the outer query; then the inner query is executed once for each record (i.e. 10 times in total). For example, if the employee table is:

Empid  Ename     deptno  Salary
1      Vivek     10      30000
2      Hermesh   20      20000
3      Priyesh   20      12000
4      Pawan     10      25000
5      Ravindra  30      30000
6      Aditya    30      10000
7      Mohan     20      25000

The inner query first runs for the 1st record, with 10 substituted for e1.deptno. It returns the maximum salary of department 10; the outer query then compares this value with e1.salary (30000) and keeps the row if the two are equal. The inner query then runs for the 2nd, 3rd, ... 7th record in sequence. Finally the result is produced:

Empid  Ename     deptno  Salary
1      Vivek     10      30000
5      Ravindra  30      30000
7      Mohan     20      25000
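The correlated query above can be run on the same seven-row table with this sketch (assuming SQLite via Python's sqlite3 module). An order by is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table employee (empid integer, ename text, deptno integer, salary integer)")
conn.executemany(
    "insert into employee values (?, ?, ?, ?)",
    [  # the table from the correlated query example
        (1, "Vivek", 10, 30000), (2, "Hermesh", 20, 20000), (3, "Priyesh", 20, 12000),
        (4, "Pawan", 10, 25000), (5, "Ravindra", 30, 30000), (6, "Aditya", 30, 10000),
        (7, "Mohan", 20, 25000),
    ],
)

# The inner query refers to e1.deptno, so it is evaluated per outer row.
top_earners = conn.execute(
    "select * from employee e1"
    " where e1.salary = (select max(salary) from employee e2 where e1.deptno = e2.deptno)"
    " order by e1.empid"
).fetchall()
print(top_earners)
```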

2. Grant and Revoke: used for deciding the access permissions of other users for select/update/delete/insert queries on tables and views.
Example:
    grant update, delete on employee to Rahul
This query gives Rahul permission to update and delete on the employee table.
Query:
    grant all on employee to Rahul with grant option
This query gives Rahul permission for all operations on the employee table, and also permission to grant these permissions to other users. Similarly, revoke is used to withdraw (cancel) a granted permission from a user. Example:
    revoke update on employee from Rahul
3. Views: A SQL View is a virtual table, which is based on SQL SELECT query.

Essentially a view is very close to a real database table (it has columns and rows just like a regular table), except that real tables store data while views don't. The view's data is generated dynamically when the view is referenced. A view references one or more existing database tables or other views. In effect every view is a filter on the table data referenced in it, and this filter can restrict both the columns and the rows of the referenced tables. Here is an example of how to create a SQL view using the already familiar employee and department tables:
    create view employeeinfo as select empid, ename, department.deptno, city from employee, department where employee.deptno = department.deptno
Importance of views: if we want other users to have access to only some fields of a table, we create a view using only those fields and make the view, instead of the original table, accessible to them.
Question: are views updatable? For a view to be updatable, it should satisfy some conditions:
• It should be based on a single table.
• If it is created from two or more tables, all primary keys and not null fields of all the tables should be in the view.
• It should not contain aggregate functions.
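The employeeinfo view above can be created and queried with the sketch below (assuming SQLite via Python's sqlite3 module and the rows of the foreign key example). The update at the end is a made-up change that shows the view's data being generated afresh from the base tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table department (deptno integer, dname text, city text)")
conn.execute("create table employee (empid integer, ename text, deptno integer)")
conn.executemany(
    "insert into department values (?, ?, ?)",
    [(10, "Research", "Gwalior"), (12, "Account", "Gwalior"), (13, "Managing", "Indore")],
)
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [(1, "Vivek", 12), (2, "Hermesh", 10), (3, "Priyesh", 10), (4, "Pawan", None)],
)

conn.execute(
    "create view employeeinfo as"
    " select empid, ename, department.deptno, city"
    " from employee, department where employee.deptno = department.deptno"
)

# The view is queried like a table.
before = conn.execute("select * from employeeinfo order by empid").fetchall()

# A change in a base table is visible through the view immediately.
conn.execute("update department set city = 'Bhopal' where deptno = 12")
after = conn.execute("select * from employeeinfo order by empid").fetchall()

print(before)
print(after)
```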
1. Set operations: there are four set operations supported by SQL:

UNION ALL: combines the results of two SELECT statements into one result set.
UNION: combines the results of two SELECT statements into one result set, and then eliminates any duplicate rows from that result set.
MINUS: takes the result set of one SELECT statement, and removes those rows that are also returned by a second SELECT statement. (MINUS is the Oracle keyword; standard SQL calls this operation EXCEPT.)
INTERSECT: returns only those rows that are returned by each of two SELECT statements.

SQL statements containing these set operators are referred to as compound queries, and each SELECT statement in a compound query is referred to as a component query. Two SELECTs can be combined into a compound query by a set operation only if they satisfy the following two conditions:
1. The result sets of both queries must have the same number of columns.
2. The datatype of each column in the second result set must match the datatype of its corresponding column in the first result set.
Note: the datatypes do not need to be the same if those in the second result set can be automatically converted by the DBMS (using implicit casting) to types compatible with those in the first result set.
These conditions are also referred to as union compatibility conditions. The term union compatibility is used even though these conditions apply to the other set operations as well.

Set operations are often called vertical joins, because the result combines data from two or more SELECTs based on columns instead of rows. The generic syntax of a query involving a set operation is:
    <component query> {UNION | UNION ALL | MINUS | INTERSECT} <component query>
Example:
    select ename from employee where salary > 1000
    intersect
    select ename from employee where deptno = 10

Q19. The employee information in a company is stored in the relation Employee (name, sex, salary, deptName). Consider the following SQL query:
    select deptName from Employee where sex = 'male' group by deptName having avg(salary) > (select avg(salary) from Employee)
It returns the names of the departments in which
(a) the average salary is more than the average salary in the company
(b) the average salary of male employees is more than the average salary of all male employees in the company
(c) the average salary of male employees is more than the average salary of employees in the same department
(d) the average salary of male employees is more than the average salary in the company
(CS2004)
Ans. (d)
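The four set operations described before Q19 can be compared on the two component queries of the intersect example with the sketch below. It assumes SQLite via Python's sqlite3 module, where the MINUS operation is written EXCEPT; the sample rows are made up and the helper function compound is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as the test DBMS
conn.execute("create table employee (ename text, salary integer, deptno integer)")
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [  # hypothetical rows
        ("Vivek", 30000, 10), ("Hermesh", 500, 10),
        ("Priyesh", 12000, 20), ("Pawan", 800, 20),
    ],
)

HIGH_PAID = "select ename from employee where salary > 1000"  # Vivek, Priyesh
IN_DEPT_10 = "select ename from employee where deptno = 10"   # Vivek, Hermesh

def compound(operator):
    """Combine the two component queries with a set operator, sorted by ename."""
    sql = HIGH_PAID + " " + operator + " " + IN_DEPT_10 + " order by ename"
    return [row[0] for row in conn.execute(sql)]

union_all = compound("union all")  # keeps the duplicate 'Vivek'
union = compound("union")          # duplicates removed
intersect = compound("intersect")  # rows returned by both queries
minus = compound("except")         # SQLite spells MINUS as EXCEPT

print(union_all, union, intersect, minus)
```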

RELATIONAL ALGEBRA
In order to implement a DBMS, there must exist a set of rules which state how the database system will behave. For instance, somewhere in the DBMS there must be a set of statements which indicate that when someone inserts data into a row of a relation, it has the effect which the user expects. One way to specify this is to use words to write an 'essay' as to how the DBMS will operate, but words tend to be imprecise and open to interpretation. Instead, relational databases are more usually defined using relational algebra. Relational algebra is:
• the formal description of how a relational database operates
• an interface to the data stored in the database itself
• the mathematics which underpins SQL operations
Operators in relational algebra are not necessarily the same as SQL operators, even if they have the same name. For example, the SELECT statement exists in SQL, and also exists in relational algebra. These two uses of SELECT are not the same. The DBMS must take whatever SQL statements the user types in and translate them into relational algebra operations before applying them to the database. Relational algebra is a procedural language. Operators in relational algebra:

1. Project (Π): used to select a subset of the attributes of a relation by specifying the names of the required attributes. It corresponds to the column list of select in SQL.
Πename(employee)
This returns the set of ename values from the employee table; its result is the same as the SQL query Select ename from employee. The only difference is that select can return duplicate values, while relational algebra works on sets and a set does not contain duplicates, so the values returned by project are distinct.
2. Select (σ): corresponds to the where clause in SQL. The where clause only states the condition, whereas σ returns the complete rows of the table that satisfy the condition.
σsalary>5000(employee)
This relational expression returns the records of employees having salary > 5000. If we want only the names of employees having salary > 5000, the relational expression is:
Πename(σsalary>5000(employee))
Q20. Write a relational expression to find the names of employees of department number 10 having salary > 500.
Ans. Πename(σsalary>500 ∧ deptno=10(employee))
∧ is the AND operator and ∨ is the OR operator.
3. Cartesian product (×): Example:
Πename,dname(employee × department)
This relational expression is the same as the SQL statement: Select ename, dname from employee, department
4. Joins: in relational algebra special operators are used for joins. Joins are performed by equating fields with the same name in the two tables.
Natural join: ⋈
Full outer join: ⟗
Right outer join: ⟖
Left outer join: ⟕
5. Renaming operator (ρ): used for creating an alias or renaming a table or field in the output (similar to the 'as' operator in SQL).
6. Group by (γ): Example: write a relational expression to find the average salary of each department.
γdeptno, avg(salary)(employee)
Q21. Let R1 (A,B,C) and R2 (D,E) be two relation schemas, where the primary keys are shown underlined (A in R1 and D in R2), and let C be a foreign key in R1 referring to R2. Suppose there is no violation of the above referential integrity constraint in the corresponding relation instances r1 and r2. Which one of the following relational algebra expressions would necessarily produce an empty relation?
(a) ΠD(r2) – ΠC(r1)

(b) ΠC(r1) – ΠD(r2)
(c) ΠD(r1 ⋈C=D r2)
(d) ΠC(r1 ⋈C=D r2)
CS2004

Ans. b
Explanation: C is a foreign key referring to D of R2, which means every value in C is already present in D. Applying the MINUS operator as ΠC(r1) – ΠD(r2) therefore returns the empty set. We can also check this with an example:
R1
A   B   C
a1  b1  c1
a2  b2  c2
a3  b3  c2
a4  b4  c3
R2
D   E
c1  e1
c2  e2
c3  e2
c4  e4
c5  e5
ΠD(r2) – ΠC(r1) returns {c4,c5}
ΠC(r1) – ΠD(r2) returns the empty set {}
ΠD(r1 ⋈C=D r2) returns {c1,c2,c3}
ΠC(r1 ⋈C=D r2) returns {c1,c2,c3}

Q22. Consider the relation Student (name, sex, marks), where the primary key (name) is shown underlined, pertaining to students in a class that has at least one boy and one girl. What does the following relational algebra expression produce? (Note: ρ is the rename operator.)
Πname(σsex=female(Student)) – Πname(Student ⋈(sex=female ∧ x=male ∧ marks≤m) ρn,x,m(Student))
(a) names of girl students with the highest marks
(b) names of girl students with more marks than some boy student
(c) names of girl students with marks not less than some boy student
(d) names of girl students with more marks than all the boy students
CS2004
Ans. d
Explanation: Πname(σsex=female(Student)) returns only the names of female students. Πname(Student ⋈(sex=female ∧ x=male ∧ marks≤m) ρn,x,m(Student)) returns the names of female students whose marks are less than or equal to the marks of at least one male student. Subtracting the second result from the first leaves the female students whose marks are not less than or equal to those of any male student, i.e. girls with more marks than all the boys.
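The operators above can be imitated on small in-memory relations to check an answer such as Q22. The following is a minimal sketch in Python; the helper names select and project and the four sample students are assumptions made for illustration.

```python
def select(relation, predicate):
    """sigma: keep the complete rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """pi: keep only the named attributes; a set removes duplicates."""
    return {tuple(row[a] for a in attrs) for row in relation}


# Hypothetical instance of Student(name, sex, marks).
student = [
    {"name": "Anu", "sex": "female", "marks": 90},
    {"name": "Bina", "sex": "female", "marks": 70},
    {"name": "Chetan", "sex": "male", "marks": 80},
    {"name": "Dev", "sex": "male", "marks": 60},
]

# First operand: names of all girl students.
girls = project(select(student, lambda r: r["sex"] == "female"), ["name"])

# Second operand: theta join of Student with a renamed copy (n, x, m),
# keeping pairs where a girl has marks <= the marks of some boy.
joined = [
    s
    for s in student
    for t in student
    if s["sex"] == "female" and t["sex"] == "male" and s["marks"] <= t["marks"]
]
girls_not_above_all_boys = project(joined, ["name"])

q22_result = girls - girls_not_above_all_boys
print(q22_result)
```

Here Bina (70) is removed because Chetan has 80, while Anu (90) has more marks than every boy and is the only name left, in line with option (d).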

RELATIONAL CALCULUS

Relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries. This is in contrast to the relational algebra, which is also part of the relational model but provides a more procedural way of specifying queries.

A relational calculus query specifies what is to be retrieved rather than how to retrieve it; it gives no description of how to evaluate the query. In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments. When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false. If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x. When we substitute some values of this range for x, the proposition may be true; for other values, it may be false.
Tuple Relational Calculus: we are interested in finding tuples for which a predicate is true. It is based on the use of tuple variables. A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable whose only permitted values are tuples of the relation.
To specify the range of a tuple variable S as the Staff relation, write: Staff(S)
To find the set of all tuples S such that P(S) is true: {S | P(S)}
Examples:
To find details of all staff earning more than 10,000: {S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write: {S.salary | Staff(S) ∧ S.salary > 10000}
In relational calculus two quantifiers are used to tell how many instances the predicate applies to:
– Existential quantifier ∃ ('there exists')
– Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means 'There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple, S, and it is located in London'.
The universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
This means 'For all Branch tuples, the address is not in Paris'.
These quantifiers can be used with the negation operator (~), as in ~(∃B) (B.city = 'Paris'), which means 'There are no branches with an address in Paris'.
Examples:
List the names of all managers who earn more than £25,000.
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
List the names of staff who currently do not manage any properties.
{S.fName, S.lName | Staff(S) ∧ (~(∃P) (PropertyForRent(P) ∧ (S.staffNo = P.staffNo)))}
Or
{S.fName, S.lName | Staff(S) ∧ ((∀P) (~PropertyForRent(P) ∨ ~(S.staffNo = P.staffNo)))}
Expressions can generate an infinite set. For example:

{S | ~Staff(S)}
Expressions of this type are called unsafe expressions. To avoid this, add the restriction that all values in the result must be values in the domain of the expression.
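Tuple relational calculus expressions read very much like set comprehensions, which makes them easy to experiment with. The sketch below writes three of the example queries in Python, with any() standing for ∃ and all() standing for ∀; the sample Staff and PropertyForRent tuples are invented for illustration.

```python
# Hypothetical tuples; only the attributes used in the queries are included.
staff = [
    {"staffNo": "S1", "fName": "Ann", "lName": "Beech",
     "position": "Manager", "salary": 30000},
    {"staffNo": "S2", "fName": "David", "lName": "Ford",
     "position": "Assistant", "salary": 12000},
    {"staffNo": "S3", "fName": "Mary", "lName": "Howe",
     "position": "Manager", "salary": 24000},
]
property_for_rent = [
    {"propertyNo": "P1", "city": "Glasgow", "staffNo": "S1"},
    {"propertyNo": "P2", "city": "London", "staffNo": "S3"},
]

# {S.fName, S.lName | Staff(S) AND S.position = 'Manager' AND S.salary > 25000}
managers_over_25k = {
    (s["fName"], s["lName"])
    for s in staff
    if s["position"] == "Manager" and s["salary"] > 25000
}

# {S | Staff(S) AND (exists P)(PropertyForRent(P) AND P.staffNo = S.staffNo
#                              AND P.city = 'Glasgow')}
glasgow_staff = [
    s["staffNo"]
    for s in staff
    if any(p["staffNo"] == s["staffNo"] and p["city"] == "Glasgow"
           for p in property_for_rent)
]

# Staff managing no property, written with NOT EXISTS and with FOR ALL.
no_property_exists = {
    (s["fName"], s["lName"])
    for s in staff
    if not any(p["staffNo"] == s["staffNo"] for p in property_for_rent)
}
no_property_forall = {
    (s["fName"], s["lName"])
    for s in staff
    if all(p["staffNo"] != s["staffNo"] for p in property_for_rent)
}
print(managers_over_25k, glasgow_staff, no_property_exists)
```

Every comprehension ranges over a finite list, so these queries are safe by construction; an unsafe expression such as {S | ~Staff(S)} has no finite collection to iterate over and cannot be written this way.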

DOMAIN RELATIONAL CALCULUS

Domain relational calculus uses variables that take values from domains instead of tuples of relations. If F(d1, d2, . . . , dn) stands for a formula composed of atoms and d1, d2, . . . , dn represent domain variables, then:
{d1, d2, . . . , dn | F(d1, d2, . . . , dn)}
is a general domain relational calculus expression.
Example: Find the names of all managers who earn more than £25,000.
{fn, ln | (∃ sn, posn, sex, DOB, sal, bn) (Staff (sn, fn, ln, posn, sex, DOB, sal, bn) ∧ posn = 'Manager' ∧ sal > 25000)}
Note: When restricted to safe expressions, domain relational calculus is equivalent to tuple relational calculus restricted to safe expressions, which is equivalent to relational algebra. This means every relational algebra expression has an equivalent safe relational calculus expression, and vice versa. If unsafe expressions are allowed, relational calculus can express queries that relational algebra cannot.
Q23. With regard to the expressive power of the formal relational query languages, which of the following statements is true?
(a) Relational algebra is more powerful than relational calculus
(b) Relational algebra has the same power as relational calculus.
(c) Relational algebra has the same power as safe relational calculus.
(d) None of the above
CS2002
Ans. c
Explanation: as stated in the note above, relational algebra is equivalent only to safe relational calculus. Option (b) places no restriction on unsafe expressions, and unrestricted relational calculus can express unsafe queries that have no relational algebra equivalent.

FUNCTIONAL DEPENDENCY

Definition: A set of attributes X functionally determines a set of attributes Y (written X → Y) if each value of X determines a unique value of Y. This is similar to functions in mathematics. In mathematics, f is a valid function when, for every value of x, f(x) returns a single value of y. For example, y = x² returns a single value of y for every value of x (at x=0 y=0, at x=1 y=1, at x=2 y=4). Now consider y² = x, i.e. y = ±√x: this gives two values of y for a value of x (for x=4 it gives y = -2 and y = +2), so it is not a valid function. For a valid function we can say x functionally determines y.
Q24. Consider relation R(A,B,C) and the sample data in R:
A   B   C
a1  b1  c1
a1  b2  c1
a2  b3  c2
a2  b4  c2
Which of the following dependencies hold in R?
1. A → B  2. B → A  3. B → C  4. A → C

Ans.
1. A → B: this functional dependency does not hold, since for a1 there are two values in field B.
2. B → A: functional dependency holds
3. B → C: functional dependency holds
4. A → C: functional dependency holds
Note: by using sample data we can only decide which functional dependencies do not hold. If a functional dependency holds in the sample data then it may or may not hold in the whole relation.
Q25. From the following instance of a relation schema R(A,B,C), we can conclude that:
A  B  C
1  1  1
1  1  0
2  3  2
2  3  2

(a) A functionally determines B and B functionally determines C
(b) A functionally determines B and B does not functionally determine C
(c) B does not functionally determine C
(d) A does not functionally determine B and B does not functionally determine C
CS2002
Ans. c
Explanation: from an instance of a schema we can only prove that a particular functional dependency does not hold; we cannot conclude that a functional dependency holds. So options (a) and (b), which assert that A functionally determines B, cannot be concluded. For the value 1 in B there are two values in C (1 and 0), so B does not functionally determine C and option (c) is correct. Option (d) is wrong because nothing in the instance contradicts A → B.
Some rules for functional dependencies:
1. Reflexivity: X → X always holds. It means an attribute or group of attributes always functionally determines itself.
2. Transitivity: if X → Y and Y → Z then X → Z
3. Pseudo-transitivity: if X → Y and YZ → W then XZ → W
4. Additivity: if X → Y and X → Z then X → YZ
5. Projectivity: if X → YZ then X → Y and X → Z
6. Augmentation: if X → Y then XZ → Y
where X, Y and Z are single attributes or groups of attributes of a relation.
Note: if AB → C then you cannot split AB → C into A → C and B → C.
Closure of an attribute (*): the closure of an attribute contains all attributes that are directly or indirectly determined by this attribute (using the above rules).
Example: for a relation R(A,B,C,D), the functional dependencies are: A → B, B → C, BC → D.
Closure of A: by the 1st rule an attribute determines itself, so its closure contains A. From A → B, B can be derived directly from A. If A → B and B → C then A → C (2nd rule), so C can be derived from A. Similarly, if A → B and A → C then A → BC (4th rule), and if A → BC and BC → D then A → D, so D can be derived from A.

A* = {A,B,C,D}
Similarly, B* = {B,C,D}, C* = {C}, D* = {D}
If the closure of an attribute or group of attributes contains all attributes of the relation, then it is a key of the relation; it is a candidate key when no proper subset of it has the same property. In the above example A is a candidate key of R.
Note: How to find the closure of a group of attributes: suppose we want to find the closure of BC in the above example. The closure of BC contains the attributes directly or indirectly determined by B, C and BC. {BC}* = {B,C,D}
Q26. Consider the following functional dependencies for relation R(A,B,C,D,E,F,G,H,I,J,K):
AB → C, A → DE, B → F, F → GH, D → IJ
Find the closure of AB.
Ans. {AB}* = {A,B,C,D,E,F,G,H,I,J}
Note: in the above example AB is not a candidate key of R since K is not in the closure of AB. K is also not in any functional dependency. Attributes that are not in any functional dependency must be part of every candidate key, so the candidate key of R is ABK.
Q27. In a schema with attributes A, B, C, D and E the following set of functional dependencies is given:
A → B, A → C, CD → E, B → D, E → A
Which of the following functional dependencies is NOT implied by the above set?
(A) CD → AC (B) BD → CD (C) BC → CD (D) AC → BC
IT2005
Ans. B
Explanation: find the closure (written here with +) of the attributes on the left of each option.
(A) {CD}+ = {C,D,E,A,B}: AC is in the closure, so AC can be derived from CD
(B) {BD}+ = {B,D}: CD is not in the closure, so CD cannot be derived from BD
(C) {BC}+ = {B,C,D,E,A}: CD is in the closure, so CD can be derived from BC
(D) {AC}+ = {A,C,B,D,E}: BC is in the closure, so BC can be derived from AC
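The closure computation used in Q26 and Q27 is mechanical, so it can be sketched in a few lines of Python. The function names closure and implies are chosen for illustration, and each attribute is assumed to be a single letter so that a string such as "AB" can stand for the set {A, B}.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already known, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs follows from fds when rhs lies inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


example = [("A", "B"), ("B", "C"), ("BC", "D")]
q26 = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
q27 = [("A", "B"), ("A", "C"), ("CD", "E"), ("B", "D"), ("E", "A")]

print(sorted(closure("A", example)))   # closure of A in the worked example
print(sorted(closure("AB", q26)))      # Q26
for left, right in [("CD", "AC"), ("BD", "CD"), ("BC", "CD"), ("AC", "BC")]:
    print(left, "->", right, implies(q27, left, right))  # Q27 options
```

The loop keeps scanning the dependencies until a full pass adds nothing new, which is the same "directly or indirectly determined" idea as in the notes. For Q27 only BD → CD is reported as not implied.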

Minimal Cover: a minimal cover of a set of functional dependencies is an equivalent set of functional dependencies which does not contain any redundant functional dependency. For example, suppose a relation R(A,B,C) has the functional dependencies {A → B, B → C, A → C}. In this set A → C is redundant because it can be derived from A → B and B → C, so we need not write this functional dependency in the set. {A → B, B → C} is a minimal cover of the dependencies.
Steps to find a Minimal Cover: Consider the following functional dependencies of relation R(A,B,C,D,E,F):
A → C, AC → D, E → ADH
Step 1: convert all functional dependencies to simple form (if X → YZ then break it into X → Y and X → Z). Now the functional dependencies for R are:
A → C, AC → D, E → A, E → D, E → H
Step 2: to check whether a functional dependency is redundant, first hide that functional dependency from the set and then find the closure of the attributes on its left side using only the remaining dependencies. If this closure still contains the attribute on the right side of the hidden dependency, then the functional dependency is redundant; remove it from the set.
First we check A → C. The dependencies that remain after hiding it are:

AC → D, E → A, E → D, E → H
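Steps 1 and 2 can be sketched in Python as follows. The function names are illustrative, attributes are assumed to be single letters, and the sketch covers only the two steps described above.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_redundant(fds):
    # Step 1: split every right-hand side into single attributes.
    simple = [(frozenset(lhs), attr) for lhs, rhs in fds for attr in rhs]
    result = list(dict.fromkeys(simple))  # drop exact duplicates, keep order

    # Step 2: hide each dependency in turn; if its right side is still
    # reachable from its left side, the dependency is redundant.
    for fd in list(result):
        rest = [other for other in result if other != fd]
        lhs, attr = fd
        if attr in closure(lhs, rest):
            result = rest
    return result


fds = [("A", "C"), ("AC", "D"), ("E", "ADH")]
cover = remove_redundant(fds)
for lhs, attr in cover:
    print("".join(sorted(lhs)), "->", attr)
```

For the dependencies above, only E → D is dropped, because E → A, A → C and AC → D already give D. A complete minimal cover algorithm usually has one more step that removes extraneous attributes from left-hand sides (here C in AC → D, since A → C), which this sketch does not attempt.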

Problems In Unorganized Relation

Consider the relation student(Rollno, Name, CourseNo, CourseName) with (rollno, courseno) as primary key. This relation has the following problems:
Data redundancy: if one course is assigned to many students then the course name and course number are repeated in many records of the table. This causes the following anomalies in the table:
1. Insertion anomaly: we cannot insert a new course until at least one student registers for it.
2. Deletion anomaly: if we delete a course from the table then student information may be lost.
3. Updation anomaly: if we want to change the name of a course then we have to change it in the records of all students who are assigned to that course.

Normalization: To remove data redundancy and anomalies we normalize a table by decomposing it into multiple tables. The following normal forms are defined for normalization:
1st Normal Form (1NF): a relation is said to be in 1st normal form if its data is represented in tabular form with atomic values and there is no duplicated row (a whole row should not be duplicated; the value in at least one field must differ between any two rows).
Example: Consider the following data in the employee table.
Empid  Ename  Job              Salary
1      Vivek  Programmer       30000
              Analyzer         20000
              Project manager  12000

The above table appears to be in tabular form but it is not. A table is in tabular form if, for every row, each column has a single value. The above table will be in 1NF if it is represented as
Empid  Ename  Job              Salary
1      Vivek  Programmer       30000
1      Vivek  Analyzer         20000
1      Vivek  Project manager  12000

2nd Normal Form (2NF): To understand 2NF first look at these terms. Consider the relation Student(Rollno, Name, CourseNo, CourseName, Deptid, DeptName).
• Prime attribute: an attribute that is part of a candidate key/primary key. In these notes any group of attributes that is part of a candidate key, but is not itself a candidate key, is also called prime. For example, if {rollno, courseno, deptid} is the candidate key of the student relation then rollno, courseno, deptid, {rollno, courseno}, {rollno, deptid} and {courseno, deptid} are prime. In other words, prime attributes are proper subsets of candidate keys.
• Determinant: in a functional dependency X → Y, X is the determinant (the attributes at the tail, i.e. left side, of the arrow).

Partial dependency: a functional dependency is said to be partial when the determinant is prime (a proper subset of a candidate key) and the right side of the arrow has a non-prime attribute. Consider the following functional dependencies for the student relation defined above, with {rollno, courseno, deptid} as candidate key:
Rollno → name                        partial (prime → non-prime)
Rollno, courseno, deptid → name      not partial (candidate key → non-prime)
Rollno → courseno                    not partial (prime → prime)
Name, courseno → rollno              not partial (non-prime → prime)

A relation is said to be in 2NF if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key. In other words, a relation is in 2NF if and only if it is in 1NF and no partial dependency exists. A relation in 2NF can still have redundancy and suffer from anomalies.
Note: if every candidate key has a single attribute, then no candidate key has a proper subset, so there can be no partial dependency and the relation is in 2NF.
3rd Normal Form (3NF): A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key (non-prime) attribute is non-transitively dependent on the primary key. A functional dependency X → Y does not violate the 3NF conditions if either X is a candidate key or Y is a prime attribute, where X and Y are attributes or groups of attributes. If any functional dependency violates the 3NF conditions then the relation is not in 3NF.
An attribute C is transitively dependent on attribute A if there exists an attribute B such that A → B and B → C. Note that 3NF is concerned with transitive dependencies which do not involve candidate keys. If a 3NF relation has more than one candidate key then it can have transitive dependencies of the form: primary_key → other_candidate_key → any_non-key_column.
A relation R having just one candidate key is in third normal form (3NF) if and only if the non-key attributes of R (if any) are: 1) mutually independent (attributes that are not present in any functional dependency are mutually independent), and 2) fully dependent on the primary key of R. A non-key attribute is any column which is not part of the primary key. Two or more attributes are mutually independent if none of the attributes is functionally dependent on any of the others. Equivalently, a relation R having just one candidate key is in 3NF if and only if no non-key (non-prime) column (or group of columns) determines another non-key (non-prime) column (or group of columns).
Example: consider a relation ShipDetails (Ship, Capacity, Date, Cargo, Value) with the following functional dependencies:
Ship, Date → Cargo, Capacity
Cargo → Value
Capacity → Value
To find whether the given relation is in 3NF or not, first find all candidate keys of the relation using closure of attributes, then find whether the relation is in 2NF, then check for 3NF.
Step 1: the candidate key of the above relation is {ship, date}.
Step 2: there is no partial dependency, so the relation is in 2NF.
Step 3:

Ship, Date → Cargo, Capacity    does not violate 3NF (candidate key → non-prime attribute)
Cargo → Value                   violates 3NF (non-prime → non-prime)
Capacity → Value                violates 3NF (non-prime → non-prime)
Relation ShipDetails is not in 3NF.

A relation in 3NF is free of most anomalies, but it can still have some redundancy.
Boyce-Codd Normal Form (BCNF): A relation is in BCNF if, for every non-trivial functional dependency X → Y that holds in it, X is a superkey. This is stricter than 3NF. The strictness of the normal forms can be compared as 1NF < 2NF < 3NF < BCNF.
Q28. Consider the following functional dependencies in a database.
Date_of_Birth → Age
Age → Eligibility
Name → Roll_Number
Roll_Number → Name
Course_Number → Course_Name
Course_Number → Instructor
(Roll_Number, Course_Number) → Grade
The relation (Roll_Number, Name, Date_of_Birth, Age) is
(A) in second normal form but not in third normal form
(B) in third normal form but not in BCNF
(C) in BCNF
(D) in none of the above
CS2003
Ans. D
Explanation: the functional dependencies applicable to the relation (Roll_Number, Name, Date_of_Birth, Age) are:
Date_of_Birth → Age
Name → Roll_Number
Roll_Number → Name
To find the normal form of a relation we should apply the tests starting from the lowest level. First apply the test for 2NF. The candidate keys of the relation are {Name, Date_of_Birth} and {Roll_Number, Date_of_Birth}. Now check for partial dependencies:
Date_of_Birth → Age      partial (part of a candidate key → non-prime)
Name → Roll_Number       not partial (prime → prime)
Roll_Number → Name       not partial (prime → prime)
A partial dependency exists in the relation, so the relation is not in 2NF, and therefore it is in neither 3NF nor BCNF.
Q29. The relation scheme Student Performance (name, courseNo, rollNo, grade) has the following functional dependencies:
name, courseNo → grade
rollNo, courseNo → grade
name → rollNo
rollNo → name
The highest normal form of this relation scheme is
(a) 2 NF (b) 3 NF (c) BCNF (d) 4 NF
CS2004

Ans. b
Explanation: the candidate keys of the relation are {name, courseNo} and {rollNo, courseNo}.
First apply the test for 2NF.
name, courseNo → grade      not partial (candidate key → non-prime)
rollNo, courseNo → grade    not partial (candidate key → non-prime)
name → rollNo               not partial (prime → prime)
rollNo → name               not partial (prime → prime)
No partial dependency exists in this relation, so the relation is in 2NF.
Now check for 3NF: for every X → Y either X is a candidate key or Y is a prime attribute.
name, courseNo → grade      not violating 3NF (candidate key at left side)
rollNo, courseNo → grade    not violating 3NF (candidate key at left side)
name → rollNo               not violating 3NF (prime attribute at right side)
rollNo → name               not violating 3NF (prime attribute at right side)
No dependency violates the 3NF condition, so the relation is in 3NF.
Now check for BCNF: for every X → Y, X should be a superkey.
name, courseNo → grade      not violating BCNF (superkey at left side)
rollNo, courseNo → grade    not violating BCNF (superkey at left side)
name → rollNo               violating BCNF
rollNo → name               violating BCNF
The relation is not in BCNF. The highest normal form of the relation is 3NF.
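The checks made by hand in Q28 and Q29 can be automated for small schemas. The sketch below is one possible way to do it in Python, not a standard library routine: it finds candidate keys by brute force and uses the textbook definitions (a prime attribute is a member of some candidate key; a partial dependency is a proper subset of a candidate key determining a non-prime attribute). All function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) tuples of names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Smallest attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            is_minimal = not any(key <= candidate for key in keys)
            if is_minimal and closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


def highest_normal_form(attributes, fds):
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    # Non-trivial dependencies with a single attribute on the right.
    single = [(set(lhs), a) for lhs, rhs in fds for a in rhs if a not in lhs]

    def is_superkey(attrs):
        return closure(attrs, fds) == set(attributes)

    if all(is_superkey(lhs) for lhs, _ in single):
        return "BCNF"
    if all(is_superkey(lhs) or a in prime for lhs, a in single):
        return "3NF"
    # 2NF fails when part of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) - prime:
                    return "1NF"
    return "2NF"


q28 = highest_normal_form(
    ("Roll_Number", "Name", "Date_of_Birth", "Age"),
    [(("Date_of_Birth",), ("Age",)),
     (("Name",), ("Roll_Number",)),
     (("Roll_Number",), ("Name",))],
)
q29 = highest_normal_form(
    ("name", "courseNo", "rollNo", "grade"),
    [(("name", "courseNo"), ("grade",)),
     (("rollNo", "courseNo"), ("grade",)),
     (("name",), ("rollNo",)),
     (("rollNo",), ("name",))],
)
print(q28, q29)
```

The brute-force key search tries every subset of attributes, so it is only practical for the handful of attributes that appear in exam questions. The relation is assumed to be in 1NF already, so "1NF" here means "not in 2NF".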

Desirable Properties of Decomposition:

Lossy and lossless-join decomposition: if the divided tables are not able to reproduce the original table after a join then the decomposition of the table is lossy. This does not mean that data is lost after joining the tables; rather, extra spurious tuples may be produced. Consider the following relation enrol (sno, cno, date-enrolled, room-No., instructor):
Sno     cno    date-enrolled  room-No.  instructor
830057  CP302  1FEB1984       MP006     Gupta
830057  CP303  1FEB1984       MP006     Jones
820159  CP302  10JAN1984      MP006     Gupta
825678  CP304  1FEB1984       CE122     Wilson
826789  CP305  15JAN1984      EA123     Smith

Suppose we decompose the above relation into two relations enrol1 and enrol2 as follows enrol1 (sno, cno, date-enrolled) enrol2 (date-enrolled, room-No., instructor) There are problems with this decomposition but we wish to focus on one aspect at the moment.

Let the decomposed relations enrol1 and enrol2 be:
enrol1
Sno     Cno    date-enrolled
830057  CP302  1FEB1984
830057  CP303  1FEB1984
820159  CP302  10JAN1984
825678  CP304  1FEB1984
826789  CP305  15JAN1984
enrol2
date-enrolled  room-No.  instructor
1FEB1984       MP006     Gupta
1FEB1984       MP006     Jones
10JAN1984      MP006     Gupta
1FEB1984       CE122     Wilson
15JAN1984      EA123     Smith

All the information that was in the relation enrol appears to be still available in enrol1 and enrol2, but this is not so. Suppose we wanted to retrieve the student numbers of all students taking a course from Wilson; we would need to join enrol1 and enrol2. The join would have 11 tuples. The tuples for student 830057 are as follows:
Sno     Cno    date-enrolled  room-No.  instructor
830057  CP302  1FEB1984       MP006     Gupta
830057  CP302  1FEB1984       MP006     Jones
830057  CP303  1FEB1984       MP006     Gupta
830057  CP303  1FEB1984       MP006     Jones
830057  CP302  1FEB1984       CE122     Wilson
830057  CP303  1FEB1984       CE122     Wilson
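The join can be reproduced with plain Python sets to see where the 11 tuples come from. This is a minimal sketch; the student numbers are written as six-digit strings on the assumption that values such as 830057 are single numbers in the enrol table.

```python
# The enrol relation: (sno, cno, date_enrolled, room_no, instructor).
enrol = [
    ("830057", "CP302", "1FEB1984", "MP006", "Gupta"),
    ("830057", "CP303", "1FEB1984", "MP006", "Jones"),
    ("820159", "CP302", "10JAN1984", "MP006", "Gupta"),
    ("825678", "CP304", "1FEB1984", "CE122", "Wilson"),
    ("826789", "CP305", "15JAN1984", "EA123", "Smith"),
]

# Decompose by projecting onto the attributes of enrol1 and enrol2.
enrol1 = {(sno, cno, date) for sno, cno, date, _, _ in enrol}
enrol2 = {(date, room, inst) for _, _, date, room, inst in enrol}

# Natural join on the only common attribute, date-enrolled.
joined = {
    (sno, cno, date, room, inst)
    for sno, cno, date in enrol1
    for date2, room, inst in enrol2
    if date == date2
}

spurious = joined - set(enrol)
wilson_original = {row[0] for row in enrol if row[4] == "Wilson"}
wilson_joined = {row[0] for row in joined if row[4] == "Wilson"}

print(len(joined), len(spurious))
print(wilson_original, wilson_joined)
```

Three enrol1 rows and three enrol2 rows share the date 1FEB1984, giving 9 combinations, and the other two dates give one tuple each, for 11 in total. Six of them are spurious, and the joined table wrongly suggests that student 830057 takes a course from Wilson.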

The join contains a number of spurious tuples that were not in the original relation enrol. Because of these additional tuples, we have lost the information about which students take courses from Wilson. (Yes, we have more tuples but less information, because we are unable to say with certainty who is taking courses from Wilson.) Such decompositions are called lossy decompositions. A decomposition must be lossless.
How to check whether a decomposition is a lossy or a lossless-join decomposition:

For this we have to check whether the decomposed tables are able to reproduce the original table or not. Suppose we have a relation R(A,B,C,D,E) with functional dependencies:
A → B, A → C, D → C, D → E
Let us decompose R into two tables R1(A,B,D) and R2(C,D,E).
Step 1: Create a table with one row for each decomposed relation and one column for each attribute of R.
     A  B  C  D  E
R1
R2
Step 2: Now put X into cell (m,n) where m is a decomposed relation and n is a field which is present in relation m.
     A  B  C  D  E
R1   X  X     X
R2         X  X  X
Step 3: Now search for all columns in the table which have X in two rows (which is D here).
Step 4: Find those functional dependencies which have the column found in step 3 at the left side (D → C and D → E in the above example). Put X into cell (m,n) where m is a row selected in step 3 and n is an attribute on the right of these functional dependencies (C and E for the rows selected by D), provided at least one of the selected rows already has X in column n.
     A  B  C  D  E
R1   X  X  X  X  X
R2         X  X  X
Step 5: Repeat steps 3 and 4 until no further filling is possible. If any row contains X in all columns then the decomposition is lossless-join, else it is lossy. Here row R1 has X in every column, so the decomposition is lossless-join.

Suppose we decompose R in R1(A,B,C) and R2(A,D,E) then final table for it will be
ENGINEER’S CIRCLE, GWALIOR Page 27

     A  B  C  D  E
R1   X  X  X
R2   X  X  X  X  X
Row R2 has X in every column, which means this decomposition is also lossless-join.

If we divide R into R1(A,B,C) and R2(C,D,E) then the final table for it will be
     A  B  C  D  E
R1   X  X  X
R2         X  X  X
No further filling of the table is possible because there is no functional dependency in the relation having C as determinant, so this decomposition is lossy.
Dependency Preserving: when we decompose a table into multiple tables, every dependency of the original table must be preserved (every dependency must be satisfied by at least one decomposed table). In the previous example, if we divide R into R1(A,B,C) and R2(A,D,E) then the dependency D → C is not satisfied by R1 or R2, because neither of them contains D and C together. This decomposition is not dependency preserving.
How to decompose a relation into BCNF: a relation is not in BCNF when its functional dependencies do not satisfy the conditions of BCNF. Consider a relation R(A,B,C,D) with the following dependencies:
A → B, B → C, C → D
From this set of dependencies we can find the primary key of R, which is A, because the closure of A is A* = {A,B,C,D}. In this set of functional dependencies B → C and C → D violate the conditions of BCNF. Take the dependencies that violate the BCNF conditions one by one, create a separate table containing the attributes of that functional dependency (the attributes on its left form the key of the new table), and remove the attribute on the right of the functional dependency from the original table.
First we take C → D. Create a separate table for this dependency, R1(C,D), and remove D from R; the remaining attributes in R are {A,B,C}.
Next we take B → C. Create a separate table for this dependency, R2(B,C), and remove C from R; the remaining attributes in R are {A,B}.
So finally three tables are created: R(A,B), R1(C,D), R2(B,C). These tables are now in BCNF. This decomposition is lossless and dependency preserving.
Suppose we had taken B → C first instead of C → D. Then R and R1 after the first decomposition would be R(A,B,D) and R1(B,C). Now C → D is not contained in either R or R1, so this dependency is lost. This decomposition is lossless but not dependency preserving. A BCNF decomposition is lossless but may or may not be dependency preserving.
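The X-table test for lossless join and the simple dependency-preservation check described above can be sketched in Python. The function names are illustrative and attributes are assumed to be single letters. The sketch tracks only the X marks, as the notes do, which is exact for a decomposition into two tables; for three or more tables the full textbook test also tracks which unmarked cells have become equal, so a False result is then less conclusive.

```python
def is_lossless(decomposition, fds):
    """X-table test: one row per table, marks stored as sets of attributes."""
    rows = [set(table) for table in decomposition]        # Steps 1 and 2
    all_attrs = set().union(*rows)
    changed = True
    while changed:                                        # Step 5: repeat
        changed = False
        for lhs, rhs in fds:
            # Steps 3 and 4: rows that have X in every left-side column.
            holders = [row for row in rows if set(lhs) <= row]
            if len(holders) < 2:
                continue
            # Copy an X only if some selected row already has it.
            known = set(rhs) & set().union(*holders)
            for row in holders:
                if not known <= row:
                    row |= known
                    changed = True
    return any(row == all_attrs for row in rows)


def preserves_dependencies(decomposition, fds):
    """Criterion from the notes: each dependency fits inside one table."""
    tables = [set(table) for table in decomposition]
    return all(
        any(set(lhs) | set(rhs) <= table for table in tables)
        for lhs, rhs in fds
    )


fds = [("A", "B"), ("A", "C"), ("D", "C"), ("D", "E")]
print(is_lossless(["ABD", "CDE"], fds))
print(is_lossless(["ABC", "ADE"], fds))
print(is_lossless(["ABC", "CDE"], fds))
print(preserves_dependencies(["ABC", "ADE"], fds))

bcnf_fds = [("A", "B"), ("B", "C"), ("C", "D")]
print(preserves_dependencies(["AB", "CD", "BC"], bcnf_fds))
print(preserves_dependencies(["ABD", "BC"], bcnf_fds))
```

The first two decompositions of R(A,B,C,D,E) come out lossless and the third lossy, matching the tables above. The preservation check is the simple containment test used in these notes; the formal definition also accepts dependencies that can be derived from the projected dependencies of several tables.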

Note: if a relation R has no functional dependency then the highest normal form supported by such a relation is BCNF.
Q30. Relation R with an associated set of functional dependencies, F, is decomposed into BCNF. The redundancy (arising out of functional dependencies) in the resulting set of relations is
(a) Zero
(b) More than zero but less than that of an equivalent 3NF decomposition
(c) Proportional to the size of F+
(d) Indeterminate
CS2002
Ans. a
Explanation: if a relation is in BCNF then there is no redundancy arising from functional dependencies left in the relation, whereas a relation in 3NF can still have some redundancy.
Q31. Relation R is decomposed using a set of functional dependencies, F, and relation S is decomposed using another set of functional dependencies, G. One decomposition is definitely BCNF, the other is definitely 3NF, but it is not known which is which. To make a guaranteed identification, which one of the following tests should be used on the decompositions? (Assume that the closures of F and G are available.)
(a) Dependency-preservation
(b) Lossless-join
(c) BCNF definition
(d) 3NF definition
CS2002
Ans. c
Explanation: if we apply the BCNF test to both decompositions then only one of them will pass the test (the one which is in BCNF); the other (which is only in 3NF) will fail.
Q32. Which one of the following statements about normal forms is FALSE?
(a) BCNF is stricter than 3NF
(b) Lossless, dependency-preserving decomposition into 3NF is always possible
(c) Lossless, dependency-preserving decomposition into BCNF is always possible
(d) Any relation with two attributes is in BCNF
CS2005
Ans. c

TRANSACTION MANAGEMENT

A transaction is a sequence of many actions which are considered to be one atomic unit of work. A transaction in a DBMS uses the following operations:
– Read, write, commit, abort
Each transaction has a unique starting point, some actions and one end point. A transaction is a unit of work which completes as a unit or fails as a unit.
Properties of transactions (ACID)
• Atomicity: All actions in the transaction happen, or none happen. In other words, an event either happens and is committed, or fails and is rolled back. E.g. in a money transfer, debit one account and credit the other: either both the debiting and crediting operations succeed, or neither of them does. Transaction failure is called Abort. Commit and abort are irrevocable actions; there is no undo for these actions. An Abort undoes the operations that have already been executed. For database operations, this means restoring the data's previous value from before the transaction (rollback); a Rollback command will undo all actions taken since the last commit for that user. But some real world operations are not undoable. Examples: transfer money, print ticket, fire missile.

• Consistency: If each transaction is consistent, and the DB starts consistent, it ends up consistent. Consistency preservation is a property of a transaction, not of the database mechanisms for controlling it (unlike the A, I, and D of ACID). If each transaction maintains consistency, then a serial execution of transactions does also. A database state consists of the complete set of data values in the database. A database state is consistent if the database obeys all the integrity constraints. A transaction brings the database from one consistent state to another consistent state.
• Isolation: Execution of one transaction is isolated from that of other transactions.
• Durability: If a transaction commits, its effects persist. When a transaction commits, its results will survive failures (e.g. of the application, OS, DB system … even of the disk). Durability makes it possible for a transaction to be a legal contract. Implementation is usually via a log: the DB system writes all transaction updates to a log file. To commit, it adds a record "commit(Ti)" to the log. When the commit record is on disk, the transaction is committed. The system waits for the disk acknowledgement before acknowledging to the user.
There can be five states of a transaction:
1. Active: the transaction is started and is issuing reads and writes to the database.
2. Partially committed: operations are done and values are ready to be written to the database.
3. Committed: writing to the database is permitted and successfully completed.
4. Abort: the transaction or the system detects a fatal error.
5. Terminated: the transaction leaves the system.
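The role of the log in atomicity and durability can be illustrated with a small recovery routine. The sketch below is a simplified model in Python, not how any particular DBMS stores its log: the record shapes ("start", T), ("write", T, item, old_value, new_value) and ("commit", T) and the function name recover are assumptions made for illustration.

```python
def recover(db, log):
    """Undo the writes of every transaction that has no commit record."""
    committed = {record[1] for record in log if record[0] == "commit"}
    # Scan the log backwards so the oldest value is restored last.
    for record in reversed(log):
        if record[0] == "write" and record[1] not in committed:
            _, _, item, old_value, _ = record
            db[item] = old_value
    return db


# T1 committed before the crash; T2 had written A but not committed.
log = [
    ("start", "T1"),
    ("write", "T1", "A", 100, 70),
    ("write", "T1", "B", 50, 80),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "A", 70, 0),
]

# Database contents found on disk after the crash (hypothetical).
db_after_crash = {"A": 0, "B": 80}
recovered = recover(dict(db_after_crash), log)
print(recovered)
```

The committed transfer made by T1 survives (durability), while the half-finished update by T2 is rolled back to the old value recorded in the log (atomicity).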

A transaction T reaches its commit point when all of its operations that access the database are completed and their results have been recorded in the log. It then writes a [commit, T] record and terminates. When a system failure occurs, search the log file for [start, T] entries; if there is no matching [commit, T] entry, undo all operations of T that have logged entries of the form [write, T, X, old_value, new_value].

Durability is a hardware aspect, while consistency is a programming aspect (the programmer should design tables and write queries in such a way that the consistency of the database is maintained). To guarantee the ACID properties, a DBMS uses the following mechanisms:
– Concurrency control: guarantees Consistency and Isolation, given Atomicity.
– Logging and recovery: guarantees Atomicity and Durability.

Concurrency Control: concurrent transactions cause various problems if they run in an uncontrolled manner. Consider two transactions T1 and T2 running concurrently; the following problems may occur:

• Lost update
– Two transactions simultaneously update the same data item, and one update overwrites the other.

• Uncommitted update
– Transaction 2 uses a result updated by transaction 1
– Transaction 1 aborts and rolls back
– Transaction 2 commits

• Inconsistent analysis
– Transaction 1 reads
– Transaction 2 reads and uses the value for a calculation
– Transaction 1 updates and commits
– Transaction 2 updates and commits

Consider the following two transactions on the Bank table:

T1: Update bank set bal=3000 where accountno=10
T2: Update bank set bal=4000 where accountno=10

If two transactions are valid and they are executed serially (either <T1,T2> or <T2,T1> in the above case), then the system always moves from one valid state to another valid state. This type of execution is called serial execution, and the schedule is a serial schedule. A concurrent schedule is called serializable if it behaves like (is equivalent to) a serial schedule. Consider the following transactions T1 and T2:

T1              T2
Read A          Read A
A=A+30          A=A*5
Write A         Write A
Read B          Read B
B=B-30          B=B/5
Write B         Write B

There are two possible serial schedules for these two transactions:
S1: <T1, T2> execute T1 first, then T2
S2: <T2, T1> execute T2 first, then T1

Let us analyze these schedules, assuming that initially A is 100 and B is 200.

S1: <T1, T2>    Value of A    Value of B
Initially       100           200
After T1        130           170
After T2        650           34

S2: <T2, T1>    Value of A    Value of B
Initially       100           200
After T2        500           40
After T1        530           10

These two schedules are not equivalent because T1 and T2 read different values of A and B (in S1, transaction T1 reads A=100 and B=200, while in S2 it reads A=500 and B=40). Both schedules are nevertheless valid because they are serial: they always leave the database in a consistent state. Now consider a concurrent schedule S3 for transactions T1 and T2 of the above example:

T1              T2
Read A
A=A+30
Write A
                Read A
                A=A*5
                Write A
Read B
B=B-30
Write B
                Read B
                B=B/5
                Write B

Now analyze S3 using the same initial values:

S3                              Value of A    Value of B
Initially                       100           200
After first half of T1          130           200
After first half of T2          650           200
After second half of T1         650           170
After second half of T2         650           34
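The three schedules above can be replayed mechanically. The following is a minimal sketch, assuming each transaction works on a private copy of a value between its Read and its Write; the names run, T1, T2, S1, S2 and S3 are illustrative and integer division stands in for B/5.

```python
def run(schedule, initial):
    """Replay a schedule given as (transaction, operation, item) steps."""
    db = dict(initial)   # values stored in the database
    local = {}           # private copy held by each transaction per item
    for txn, op, item in schedule:
        if op == "read":
            local[txn, item] = db[item]
        elif op == "write":
            db[item] = local[txn, item]
        else:            # op is a function applied to the private copy
            local[txn, item] = op(local[txn, item])
    return db

T1 = [("T1", "read", "A"), ("T1", lambda v: v + 30, "A"), ("T1", "write", "A"),
      ("T1", "read", "B"), ("T1", lambda v: v - 30, "B"), ("T1", "write", "B")]
T2 = [("T2", "read", "A"), ("T2", lambda v: v * 5, "A"), ("T2", "write", "A"),
      ("T2", "read", "B"), ("T2", lambda v: v // 5, "B"), ("T2", "write", "B")]

initial = {"A": 100, "B": 200}
S1 = T1 + T2                               # serial <T1, T2>
S2 = T2 + T1                               # serial <T2, T1>
S3 = T1[:3] + T2[:3] + T1[3:] + T2[3:]     # interleaved schedule

for name, schedule in (("S1", S1), ("S2", S2), ("S3", S3)):
    print(name, run(schedule, initial))
```

Under these assumptions S3 ends in the same state as S1 (A=650, B=34), while S2 ends with A=530 and B=10.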

This schedule is equivalent to the serial schedule S1 (the values of A and B read by T1 and T2 are the same as in S1). A schedule is called serializable if it is equivalent to a serial schedule, so S3 is serializable. In other words, if changing the order of instructions in a serial schedule results in a concurrent schedule that exhibits the same behavior as that serial schedule, then the concurrent schedule is a serializable schedule.

Testing for serializability: there are two types of serializability.
1. Conflict serializability
2. View serializability

Conflict serializability: conflicting actions are pairs of actions whose relative order must not be changed if serializability is to be maintained for every data item. In the above example the two transactions are concurrent and interleave their actions, but for data item A the order T1 ---> T2 is maintained, and for data item B the order T1 ---> T2 is maintained as well. So all actions on each data item execute in the same order as in a serial schedule, and the schedule is conflict serializable. Three types of conflict can occur in a schedule; in each case the two actions belong to different transactions and access the same data item.
i. Read-write conflict: a transaction T1 reads a data item A and then another transaction T2 writes A. These read-write actions are in conflict.
ii. Write-read conflict: a transaction T1 writes a data item A and then another transaction T2 reads A. These write-read actions are in conflict.
iii. Write-write conflict: a transaction T1 writes a data item A and then another transaction T2 writes A, or vice versa. These write-write actions are in conflict.

Example: consider the following schedule S4.

Step  T1          T2
1     Read A
2     A=A+30
3                 Read A
4                 A=A*5
5     Write A
6     Read B
7     B=B-50
8     Write B
9                 Write A
10                Read B
11                B=B/5
12                Write B

In this schedule the conflicts between the two transactions are 3-5, 5-9 and 8-10, written in the order in which the conflicting actions occur. To find the conflicts, scan the schedule from the start: pick a read or write action, then find the next write action on the same data item. If the two actions are in the same transaction, do nothing (1 and 5, 6 and 8). If they are in different transactions, draw an arrow from the earlier to the later action (3-5 and 8-10). In this example 8-12 is also a pair of conflicting actions, but an arrow 8-10 already exists, and 10 comes before 12 in the same transaction T2, so there is no need to draw an arrow for 8-12.

Conflicting actions show that if a new schedule S5 is made by swapping actions of the transactions in schedule S4 while the order of the conflicting actions (3-5, 5-9 and 8-10 in S4) stays the same, then S5 is equivalent to S4. Consider the following S5, created by swapping some instructions of S4:

Step  T1          T2
1     Read A
2     A=A+30
3                 Read A
4     Write A
5     Read B
6                 A=A*5
7     B=B-50
8                 Write A
9     Write B
10                Read B
11                B=B/5
12                Write B

This schedule is equivalent to S4 since the order of the conflicting actions is the same as in S4.

To find out whether a schedule is conflict serializable, draw its dependency graph. This is a directed graph with the transactions as nodes and the conflicts as edges (an edge can be labelled with the name of the data item on which the conflict occurs). For S4 and S5 the dependency graph would be:

[Figure: dependency graph for S4 and S5 with nodes T1 and T2, an edge T2 ---> T1 (on A) and edges T1 ---> T2 (on A and on B)]

If the dependency graph contains a cycle, then the schedule is not conflict serializable. This graph contains a cycle, so S4 and S5 are not conflict serializable. If the dependency graph does not contain a cycle, a topological sort of the graph gives the serial schedule to which the schedule is equivalent.

Q33. Consider three data items D1, D2, and D3, and the following execution schedule of transactions T1, T2, and T3. In the diagram, R(D) and W(D) denote the actions reading and writing the data item D respectively.

T1          T2          T3
            R(D3)
            R(D2)
            W(D2)
                        R(D2)
                        R(D3)
R(D1)
W(D1)
                        W(D2)
                        W(D3)
            R(D1)
R(D2)
W(D2)
            W(D1)

Which of the following statements is correct?
(A) The schedule is serializable as T2;T3;T1
(B) The schedule is serializable as T2;T1;T3
(C) The schedule is serializable as T3;T2;T1
(D) The schedule is not serializable.
CS2003

Ans. D
Explanation: If a schedule is conflict serializable then it is serializable, so first apply the conflict serializability test to the schedule.
Step 1. Find all conflicts.

T1          T2          T3
            R(D3)
            R(D2)
            W(D2)
                        R(D2)
                        R(D3)
R(D1)
W(D1)
                        W(D2)
                        W(D3)
            R(D1)
R(D2)
W(D2)
            W(D1)

T2 writes D2 before T3 reads it (T2 ---> T3), T3 writes D2 before T1 reads it (T3 ---> T1), and T1 writes D1 before T2 reads it (T1 ---> T2).
Step 2. Draw the dependency graph using these conflicts: T2 ---> T3 ---> T1 ---> T2.
The dependency graph contains a cycle, so the schedule is not conflict serializable and no serial order can be obtained by topological sort.
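The dependency-graph test described above can be sketched in a few lines. This is an illustration only: the function names precedence_edges and has_cycle are hypothetical, a schedule is assumed to be a list of (transaction, "R" or "W", data item) triples, and the computation steps such as A=A+30 are left out because only reads and writes can conflict. S4 and S3 are the schedules from the text.

```python
from itertools import combinations

def precedence_edges(schedule):
    """Return edges (Ti, Tj): an action of Ti conflicts with a later action of Tj."""
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        # Conflict: different transactions, same item, at least one write.
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True          # reached a node that is still on the DFS path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)

S4 = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"),
      ("T1", "W", "B"), ("T2", "W", "A"), ("T2", "R", "B"), ("T2", "W", "B")]
S3 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]

print(sorted(precedence_edges(S4)), has_cycle(precedence_edges(S4)))
print(sorted(precedence_edges(S3)), has_cycle(precedence_edges(S3)))
```

For S4 the edges run in both directions between T1 and T2, so a cycle is reported; for S3 the only edge is T1 ---> T2, so the schedule is conflict serializable.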

Locking: used in practice to implement serializability and concurrency control.

There are two types of lock:
(1) Shared lock: applied when a transaction only wants to read a data item. Multiple transactions can hold a shared lock on the same data item simultaneously.
(2) Exclusive lock: applied when a transaction wants to manipulate a data item. It can be held by only a single transaction at a time on a data item.
Locking protocols specify when to lock and when to unlock.

2PL (two-phase locking): the transaction is divided into two phases.
(1) Growing phase: a transaction can acquire locks only while it is in this phase. A transaction starts in the growing phase.
(2) Shrinking phase: a transaction cannot acquire locks in this phase. It starts as soon as the transaction unlocks any data item.

Here X(A) denotes an exclusive lock on A.

X(A)            growing phase
A=A+30
write(A)
X(B)
unlock(A)       shrinking phase
read(B)
B=B+30
write(B)
unlock(B)

Consider the following schedule of two transactions under 2PL:

Step  T1                      T2
1     X(A)
2     Read A
3     A=A+30
4     Write A
5     X(B)
6     Unlock(A)
7                             X(A)
8                             Read A
9                             A=A+50
10                            Write A
11    Read B
12    If (B<60) rollback
13                            X(B)
14                            Unlock(A)
15                            Read B
16                            B=B/5
17                            Write B
18                            Unlock(B)

In this schedule, if T1 is rolled back (step 12) then T2 is affected, since it has read the value of A written by T1. So T2 must also be rolled back. When one transaction causes another to roll back, this is called a cascading rollback.

Problems in 2PL
To avoid cascading rollbacks, 2PL is modified; the modified protocol is called Strict 2PL. In Strict 2PL all data items are unlocked only at the end of the transaction, so that interleaving of transactions is kept to a minimum (concurrent execution is restricted to a certain limit). This increases the waiting time of other transactions.
In 2PL, unrecoverable schedules are possible: one transaction reads the value of a data item from an uncommitted transaction, manipulates the value and then commits. If the uncommitted transaction is then rolled back, the value written by the committed transaction becomes invalid, but that transaction has already committed and its value is stored on disk, so it cannot be rolled back. Such a schedule is called an unrecoverable schedule. Strict 2PL does not have this problem.
Note: a schedule produced by 2PL always satisfies conflict serializability, but deadlock is possible in both simple 2PL and Strict 2PL.

Deadlock: consider the following situation for two transactions T1 and T2.

T1          T2
X(A)
            X(B)
X(B)
            X(A)
.           .
.           .

In this case T1 waits for B and T2 waits for A. This situation, in which two or more transactions are each waiting for another transaction to unlock a data item and no transaction can make any progress, is called deadlock.

2PL is a pessimistic approach.
Optimistic approach: do not lock any data item, on the assumption that the great majority of transactions work on different data items. All transactions are allowed to read and manipulate data, but there is a mechanism to detect inconsistencies. If an inconsistency occurs, the transactions involved are rolled back. An example of the optimistic approach is the timestamp protocol.

Q34. Which of the following scenarios may lead to an irrecoverable error in a database system?
(A) A transaction writes a data item after it is read by an uncommitted transaction
(B) A transaction reads a data item after it is read by an uncommitted transaction
(C) A transaction reads a data item after it is written by a committed transaction
(D) A transaction reads a data item after it is written by an uncommitted transaction
CS2003
Ans. D
Explanation: as described under Problems in 2PL, a schedule becomes unrecoverable when a transaction reads a value written by an uncommitted transaction and commits before that transaction is rolled back.

ENTITY RELATION DIAGRAM

Entity and Attributes: any real-world object is an entity. Every entity has attributes. Ex. Book is an entity and bookno and authors are its attributes. Entities are represented by rectangles and attributes by ellipses.

If an attribute is the primary key, it is represented by underlining the name of the attribute.

Types of attribute:
1. Composite attribute: an attribute that can be divided into other attributes is called a composite attribute. Ex. Address can be divided into street, city, state, country and pin.

2. Derived attribute: an attribute that can be derived from another attribute. Ex. Age can be derived from DOB (date of birth). Derived attributes are represented by a dashed ellipse.

3. Multivalued attribute: an attribute that can have multiple values is called a multivalued attribute. Ex. A student can have multiple phone numbers. Multivalued attributes are denoted by a double ellipse.

4. Simple attribute: an attribute that is not composite, derived or multivalued.

Relation: a relation is an association (“has-a” relationship) among several entities. E.g. a student has a book. A relation is represented by a diamond.
Example: an E-R diagram of the book issuing system of an institute’s library:

A relation can have attributes. For example, in the above ER diagram the relation issued has the attribute “issuedate”, which shows the date on which a book is issued.

Cardinality of relations: expresses the number of entities to which another entity can be associated via a relationship. For binary relationship sets between entity sets A and B, the mapping cardinality must be one of:
1. One to one: an entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. E.g. if every student is allowed to borrow one book only.
2. One to many (many to one when read from B to A): an entity in A is associated with any number of entities in B. An entity in B is associated with at most one entity in A. E.g. if every student is allowed to borrow multiple books.
3. Many to many: entities in A and B are associated with any number of entities from each other. E.g. if a book can be issued to many students and a student can borrow many books.

To denote cardinality, put the label ‘1’ on the one side and ‘n’ on the many side:

[Figure: Student (1) --- issue --- (n) Book]


Partial participation: some entries of an entity may not take part in the relation, so the entity participates only partially in the relation. Example: there may be students who have not borrowed any book, or books that are not issued to any student.
Total participation: every entry of the entity participates in the relation. Total participation of an entity in a relation is denoted by a double line.

Types of entities:
1. Strong entity: an entity that has a primary key is called a strong entity. It is represented by a single rectangle.
2. Weak entity: an entity that does not have a primary key is called a weak entity. It is represented by a doubly-outlined rectangle. The relation joining a weak entity is also represented by a doubly-outlined diamond.

Example: consider the ER diagram for a Bank-Loan system.

[Figure: ER diagram relating entity Loan (attribute LoanNo) to weak entity Payment (attributes installmentNo, Date, amount), with cardinality labels 1 and n]

In this system Payment is a weak entity because the installment number can be the same for two customers (customer A pays 3000 for his 1st installment and customer B pays 3000 for his 1st installment). So Payment has no primary key.

Facts about weak entities
• A weak entity always has total participation in its relation.
• If a weak entity has a relation with a strong entity, then the cardinality is one to many, with one at the strong entity and many at the weak entity (one loan has many payments).
• A weak entity can also be represented as a multivalued attribute. In other words, if a multivalued attribute is composite (has more than one sub-attribute), it can be represented as a weak entity.

Translation of ERD into Tables
An entity-relationship diagram can be converted into tables using the following rules:
1. Create a table for each strong entity and create a column for each simple attribute of the entity. For a composite attribute, create columns only for its sub-attributes.
2. Do not create a separate column for a derived attribute. Derived attributes are not added to the tables.
3. Create a separate table for each multivalued attribute, and add the primary key of the entity as a column of that table. Similarly, create separate tables for weak entities. The primary key of the table for a weak entity is formed by combining the partial key of the weak entity and the primary key of the strong entity with which the weak entity is in relation. The partial key (discriminator) of a weak entity is the key that identifies a weak entity uniquely among those related to the same strong entity.
4. Tables for relations depend on their cardinality:
i. For a many-to-many relationship, create a separate table for the relationship and add the primary keys of both entities to it. The primary key of this table is formed by combining the primary keys of both entities.
ii. For a one-to-many relation there is no need to create a separate table. Add the primary key of the one-side entity to the table for the many-side entity. If the relation has attributes, add them as columns to the many-side entity.
iii. For a one-to-one relation there is also no need to create a separate table. Add the primary key of either entity to the table for the other entity.

Example: the tables for the following ERD will be


Book(BookNo, Bname, Rollno, issueDate)
Student(RollNo, firstname, lastname, DOB)
Phoneno(rollno, phoneno)
Note: tables created using an ERD are in 3NF.

Statement for Linked Answer Questions: 35 & 36
Consider the following ER diagram

Q35. The minimum number of tables needed to represent M, N, P, R1, R2 is (when a one-to-many relationship is shown by an arrow, the head of the arrow is at the one-side entity)
(A) 2 (B) 3 (C) 4 (D) 5
CS2008
Ans. B
Explanation: according to the rules, the tables created will be
M(M1, M2, M3, P1)
P(P1, P2)
N(N1, N2, P1)
No relation is many-to-many, so no table is created for a relation.

Q36. Which of the following is a correct attribute set for one of the tables for the correct answer to the above question?
(A) {M1,M2,M3,P1} (B) {M1,P1,N1,N2} (C) {M1,P1,N1} (D) {M1,P1}
CS2008
Ans. A
Explanation: see the explanation of the previous question.

Hashing and Indexing
Databases are stored on secondary storage in the form of files. Secondary storage is divided into blocks and a database is divided into records. Suppose we have secondary storage with a block size of 512 bytes and every record (row) is 100 bytes. Suppose we have to store 500 records of this table; there are two types of organization that can be used for storing records in blocks.

Spanned organization: the block size is 512 bytes, so 5 records can be stored in the first block directly; the remaining 12 bytes of the block are occupied by the first 12 bytes of the 6th record, and the next 88 bytes of the 6th record are stored in the 2nd block. For storing 500 records we need only ⌈(100*500)/512⌉ = ⌈97.6⌉ = 98 blocks.

Unspanned organization: a whole record is stored in one block; if the space left in a block is smaller than the record size, the record is stored in a new block. In the above example the 6th record is stored in a new block. The total number of blocks required for storing 500 records is 500/5 = 100 blocks.

In spanned organization, accessing a record can take more time than in unspanned organization; in the above example, accessing the 6th record requires accessing two blocks.

The main purpose of indexing is to speed up searching. Searching a table using linear or binary search (if the records are sorted) is not practical because it takes a long time (in the above example, searching 100 blocks requires loading up to 100 blocks into memory, so the access time is large). A more sophisticated method is needed for searching in tables. An index entry is created for every record using some key (generally the primary key) that identifies each record uniquely. Each index entry is associated with a record pointer, which points to the record stored on secondary storage. The index key together with its record pointer is called an index record. These index records are kept in a separate file, called the index file. To search for a record, we first search for its key in the index file; if it is found, we locate the whole record using the record pointer.

Suppose an index record is 6 bytes; then for 500 records in the table we have 500 index records. A block of 512 bytes can store ⌊512/6⌋ = 85 index records, and for 500 index records we need ⌈500/85⌉ = ⌈5.88⌉ = 6 blocks. Now, to search for a record in the table we have to access at most 6 index blocks.

Drawbacks of indexing:
– It requires extra storage.
– To search using another key, we have to create another index file using that key.

Still, searching the index file using linear or binary search is time consuming. To reduce the time complexity of searching we organize the index file as a B-tree or B+-tree built from the index records. In a B-tree of degree n, every internal node can have at most n-1 keys, n-1 record pointers associated with the keys, and n child pointers or block pointers (the term “block pointer” is used for a child pointer because, generally, every node of a B-tree is stored in a separate block).
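The block counts above follow from a few integer divisions. Below is a small sketch that reproduces them; the function names are illustrative and the sizes (512-byte blocks, 100-byte records, 6-byte index records, 500 records) are the ones assumed in the text.

```python
import math

def spanned_blocks(records, record_size, block_size):
    """Records may cross block boundaries, so only total bytes matter."""
    return math.ceil(records * record_size / block_size)

def unspanned_blocks(records, record_size, block_size):
    """Each record must fit entirely inside one block."""
    per_block = block_size // record_size
    return math.ceil(records / per_block)

BLOCK, RECORD, INDEX_RECORD, RECORDS = 512, 100, 6, 500

print(spanned_blocks(RECORDS, RECORD, BLOCK))          # data blocks, spanned
print(unspanned_blocks(RECORDS, RECORD, BLOCK))        # data blocks, unspanned
print(unspanned_blocks(RECORDS, INDEX_RECORD, BLOCK))  # index blocks
```

With these numbers the three results are 98, 100 and 6 blocks, matching the worked figures in the text.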

If the block pointer size is b bytes, the key size is k bytes, and the record pointer size is r bytes, then the largest degree n that fits in a block satisfies
n*b + (n-1)*k + (n-1)*r = block size.

If the block pointer size is 8 bytes, the key size is 4 bytes and the record pointer size is 6 bytes, then the maximum degree supported by the B-tree can be calculated using the above formula:
n*8 + (n-1)*4 + (n-1)*6 = 512
18n = 522
n = 522/18 = 29
For this example, a B-tree holding 1000 index records needs about ⌈log29 1000⌉ = ⌈2.05⌉ = 3 levels, which means that searching for a record needs only about 3 block accesses (instead of ⌈1000/85⌉ = 12 blocks in a linear search of the index file).

Q37. Consider a table T in a relational database with a key field K. A B-tree of order p is used as an access structure on K, where p denotes the maximum number of tree pointers in a B-tree index node. Assume that K is 10 bytes long; disk block size is 512 bytes; each data pointer D is 8 bytes long and each block pointer PB is 5 bytes long. In order for each B-tree node to fit in a single disk block, the maximum value of p is:
(A) 20 (B) 22 (C) 23 (D) 32
IT2004
Ans. C
Explanation: use the formula n*b + (n-1)*k + (n-1)*r = block size, where n is p:
p*5 + (p-1)*10 + (p-1)*8 = 512
23p = 530
p = 23

B+-trees have a different leaf structure. In a B+-tree a leaf node contains keys with their associated record pointers and one block pointer pointing to the next leaf node. Non-leaf nodes contain only keys and child pointers; there is no need to store record pointers in non-leaf nodes, because all keys are ultimately present in the leaf nodes. For a leaf node the order is the maximum number of (key, record pointer) pairs the node can hold, while the order of a non-leaf node is the maximum number of child pointers it can have.
For a leaf node the equation is: n*k + n*r + b = block size
For a non-leaf node the equation is: (n-1)*k + n*b = block size

Q38. A B+-tree index is to be built on the Name attribute of the relation STUDENT. Assume that all student names are of length 8 bytes, disk blocks are of size 512 bytes, and index pointers are of size 4 bytes. Given this scenario, what would be the best choice of the degree (i.e. the number of pointers per node) of the B+-tree?
(a) 16 (b) 42 (c) 43 (d) 44
CS2002
Ans. c
Explanation: the degree of a B+-tree can be calculated if we know the maximum number of keys an internal node can have. By the formula for an internal node of a B+-tree:
(n-1)*k + n*b = block size
(n-1)*8 + n*4 = 512
12n = 520
n = 43

Q39. The order of an internal node in a B+-tree index is the maximum number of children it can have. Suppose that a child pointer takes 6 bytes, the search field value takes 14 bytes, and the block size is 512 bytes. What is the order of the internal node?
(a) 24 (b) 25 (c) 26 (d) 27
CS2004
Ans. c
Explanation: by the formula for an internal node of a B+-tree of degree n:
(n-1)*k + n*b = block size
(n-1)*14 + n*6 = 512
20n = 526
n = 26

Q40. The order of a leaf node in a B+-tree is the maximum number of (value, data record pointer) pairs it can hold. Given that the block size is 1K bytes, data record pointer is 7 bytes long, the value field is 9 bytes long and a block pointer is 6 bytes long, what is the order of the leaf node?
(A) 63 (B) 64 (C) 67 (D) 68
CS2007
Ans. A
Explanation: the order of a leaf node of a B+-tree is determined by the formula
n*k + n*r + b = block size
n*9 + n*7 + 6 = 1024
16n = 1018
n = 63
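The three node-size formulas used in Q37 to Q40 can be solved for n with integer division. The sketch below is only an illustration; the function names are hypothetical, sizes are in bytes, and the result is rounded down because a node must fit entirely in one block.

```python
def btree_order(block, key, record_ptr, block_ptr):
    """Largest n with n*b + (n-1)*k + (n-1)*r <= block size (B-tree node)."""
    return (block + key + record_ptr) // (block_ptr + key + record_ptr)

def bplus_internal_order(block, key, block_ptr):
    """Largest n with (n-1)*k + n*b <= block size (B+-tree non-leaf node)."""
    return (block + key) // (key + block_ptr)

def bplus_leaf_order(block, key, record_ptr, block_ptr):
    """Largest n with n*k + n*r + b <= block size (B+-tree leaf node)."""
    return (block - block_ptr) // (key + record_ptr)

print(btree_order(512, key=4, record_ptr=6, block_ptr=8))        # worked example
print(btree_order(512, key=10, record_ptr=8, block_ptr=5))       # Q37
print(bplus_internal_order(512, key=8, block_ptr=4))             # Q38
print(bplus_internal_order(512, key=14, block_ptr=6))            # Q39
print(bplus_leaf_order(1024, key=9, record_ptr=7, block_ptr=6))  # Q40
```

The printed values are 29, 23, 43, 26 and 63, the same as in the worked solutions.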

Hashing

With a B-tree or B+-tree searching is faster, but we still have to search. Hashing is used to remove the cost of searching. We use a hash function, and index entries are mapped into a hash table according to that hash function. To locate an index entry, we apply the hash function to its key. In hashing, searching is removed completely; the hash function alone is used to map and locate index entries.

Hash function: the hash function is chosen in such a way that it can map all keys into the hash table. For example, with the hash function ‘mod 10’ and the keys 2010, 4011, 3127, 4256, 3214 the hash table looks like:

Index  Key
0      2010
1      4011
2
3
4      3214
5
6      4256
7      3127
8
9

If a new key maps to a position that is already filled in the hash table, a collision occurs. Ex. if the new entry is 5414 for the above hash table, the hash function returns location 4, which is already filled. There are two ways to handle collisions.

1. Open addressing / rehashing: slightly change the hash function for the new key that causes the collision. Ex. use (key+5) mod 10 when a collision occurs.

Linear probing: this is a type of rehashing. If a new entry collides, search for the next free slot in the table and store the new entry there. If 5414 and 5444 arrive in the above example, the hash table will be:

Index  Key
0      2010
1      4011
2
3
4      3214
5      5414
6      4256
7      3127
8      5444
9

Linear probing suffers from primary clustering, meaning that data becomes concentrated in one place. The hash function should distribute data uniformly to avoid primary clustering. Ex. a quadratic function can be used to distribute data:
Hashing function = (key + n^2) mod 10, where n denotes the number of collisions.

Key    Collision number  Hash table index
2010   0                 (2010+0^2)%10 = 0
4010   1                 (4010+1^2)%10 = 1
7120   2                 (7120+2^2)%10 = 4
9650   3                 (9650+3^2)%10 = 9
3250   4                 (3250+4^2)%10 = 6
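Linear probing as described above can be sketched as follows. This is a minimal illustration, assuming a fixed-size table with None marking an empty slot; the names insert_linear and build_table are hypothetical.

```python
def insert_linear(table, key, hash_fn):
    """Insert key at its hashed slot, or at the next free slot after it."""
    size = len(table)
    start = hash_fn(key) % size
    for step in range(size):
        slot = (start + step) % size   # wrap around at the end of the table
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

def build_table(keys, size, hash_fn):
    table = [None] * size
    for key in keys:
        insert_linear(table, key, hash_fn)
    return table

# Example from the text: key mod 10, then 5414 and 5444 collide at slot 4.
keys = [2010, 4011, 3127, 4256, 3214, 5414, 5444]
print(build_table(keys, 10, lambda k: k % 10))
```

Here 5414 ends up in slot 5 and 5444 in slot 8, because slots 4 to 7 are already occupied when 5444 arrives.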

2. Chaining: the table is an array of pointers instead of data. Entries with the same hash function value form a linked list. A new entry is added at the end of the linked list. Example: chaining for the data 2010, 4011, 3127, 4256, 3214, 5414, 5444, 6457, 9666, 8888 using the function key mod 10 gives:

0 -> 2010 -> Null
1 -> 4011 -> Null
2 -> Null
3 -> Null
4 -> 3214 -> 5414 -> 5444 -> Null
5 -> Null
6 -> 4256 -> 9666 -> Null
7 -> 3127 -> 6457 -> Null
8 -> 8888 -> Null
9 -> Null
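The chaining example can be reproduced with a list of Python lists standing in for the array of linked lists. This is a sketch under that assumption; build_chains is an illustrative name.

```python
def build_chains(keys, size=10):
    """Append each key to the chain of slot key mod size."""
    table = [[] for _ in range(size)]
    for key in keys:
        table[key % size].append(key)   # new entries go to the end of the chain
    return table

keys = [2010, 4011, 3127, 4256, 3214, 5414, 5444, 6457, 9666, 8888]
for slot, chain in enumerate(build_chains(keys)):
    print(slot, " -> ".join(str(k) for k in chain + ["Null"]))
```

A lookup then hashes the key and scans only the chain of that slot, so its cost depends on the chain length rather than on the table size.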

Q41. Consider a hash table of size seven, with starting index zero, and a hash function (3x + 4) mod 7. Assuming the hash table is initially empty, which of the following is the contents of the table when the sequence 1, 3, 8, 10 is inserted into the table using closed hashing? Note that − denotes an empty location in the table.
(A) 8, −, −, −, −, −, 10
(B) 1, 8, 10, −, −, −, 3
(C) 1, −, −, −, −, −, 3
(D) 1, 10, 8, −, −, −, 3
CS2007
Ans. B
Explanation: the hash function is (3x+4) mod 7.

Key   Hash table index
1     0
3     6
8     0
10    6

Key 8 collides with 1 at index 0 and moves to index 1. Key 10 collides with 3 at index 6, wraps around past the occupied indexes 0 and 1, and is stored at index 2. The final hash table (if linear probing is used) is:

Index  0  1  2   3  4  5  6
Key    1  8  10  −  −  −  3

Q42. Consider the following SQL query
Select distinct a1, a2, ..., an from r1, r2, ..., rm where p
For an arbitrary predicate p, this query is equivalent to which of the following relational algebra expressions?
(A) Π a1, a2 ... an σp (r1 × r2 × … × rm)
(B) Π a1, a2 ... an σp (r1 ⋈ r2 ⋈ … ⋈ rm)
(C) Π a1, a2 ... an σp (r1 ∪ r2 ∪ … ∪ rm)
(D) Π a1, a2 ... an σp (r1 ∩ r2 ∩ … ∩ rm)
CS2003
Ans. A

Q43. Consider the set of relations shown below and the SQL query that follows.
Students: (Roll_Number, Name, Date_of_Birth)
Courses: (Course_Number, Course_Name, Instructor)
Grades: (Roll_Number, Course_Number, Grade)
Select distinct Name
from Students, Courses, Grades
where Students.Roll_Number = Grades.Roll_Number
and Courses.Instructor = 'Korth'
and Courses.Course_Number = Grades.Course_Number
and Grades.Grade = 'A'
Which of the following sets is computed by the above query?
(A) Names of Students who have got an A grade in all courses taught by Korth
(B) Names of Students who have got an A grade in all courses
(C) Names of Students who have got an A grade in at least one of the courses taught by Korth
(D) None of the above
CS2003
Ans. C

Q44. Given the following input (4322, 1334, 1471, 9679, 1989, 6171, 6173, 4199) and the hash function x mod 10, which of the following statements are true?
i) 9679, 1989, 4199 hash to the same value
ii) 1471, 6171 hash to the same value
iii) All elements hash to the same value
iv) Each element hashes to a different value
(a) i only (b) ii only (c) i and ii only (d) iii or iv
CS2004
Ans. c
Explanation: when we apply the hash function x mod 10 to 9679, 1989 and 4199 the result is 9, so statement (i) is correct. Similarly, for 1471 and 6171 the hash function returns 1, so statement (ii) is also correct.

Q45. Consider the following relation schema pertaining to a Students database:
Students (rollno, name, address)
Enroll (rollno, courseno, coursename)
where the primary keys are shown underlined. The number of tuples in the Students and Enroll tables are 120 and 8 respectively. What are the maximum and minimum number of tuples that can be present in (Students * Enroll), where ‘*’ denotes natural join?
(a) 8,8 (b) 120,8 (c) 960,8 (d) 960,120
CS2004
Ans. a
Explanation: the natural join is performed by equating the rollno attribute of both tables. In the Students relation there are 120 tuples and each tuple has a unique rollno, since rollno is the primary key of Students. In the Enroll relation there are only 8 tuples, and there are two extreme conditions:
1. Each tuple in Enroll has a different student; in other words, 8 students are each enrolled in one course.
2. Each tuple in Enroll has the same student with a different course; in other words, one student is enrolled in 8 courses.
In both cases every Enroll tuple matches exactly one Students tuple, so the natural join of the two relations contains exactly 8 tuples.

Q46. Which one of the following is a key factor for preferring B-trees to binary search trees for indexing database relations?
(a) Database relations have a large number of records
(b) Database relations are sorted on the primary key
(c) B-trees require less memory than binary search trees
(d) Data transfer from disks is in blocks
CS2005
Ans. d
Explanation: a B-tree node is sized to fill one disk block, so each block read brings in many keys and the tree stays shallow; a binary search tree holds one key per node and needs many more block accesses.

Q47. Let r be a relation instance with schema R = (A, B, C, D). We define r1 = ΠA,B,C (r) and r2 = ΠA,D (r). Let s = r1 * r2 where * denotes natural join. Given that the decomposition of r into r1 and r2 is lossy, which one of the following is TRUE?
(a) s ⊂ r (b) r ∪ s = r (c) r ⊂ s (d) r * s = s
CS2005
Ans. c
Explanation: a decomposition is lossy when joining the decomposed tables generates some spurious (extra, meaningless) tuples, and because of these spurious tuples we cannot recover the actual data after joining. The join therefore contains all tuples of r plus the spurious ones, so option (c) is correct.

Q48. The following table has two attributes A and C where A is the primary key and C is the foreign key referencing A with on-delete cascade. The set of all tuples that must be additionally deleted to preserve referential integrity when the tuple (2,4) is deleted is:

A  C
2  4
3  4
4  3
5  2
7  2
9  5
6  4

(a) (3,4) and (6,4)
(b) (5,2) and (7,2)
(c) (5,2), (7,2) and (9,5)
(d) (3,4), (4,3) and (6,4)
CS2005

Ans. c
Explanation: C is a foreign key referencing A, so C cannot hold a value that is not present in A. If we delete tuple (2,4), the rows containing the value 2 in column C become invalid. These are (5,2) and (7,2), and they must be deleted from the table. Once (5,2) is deleted, one more row becomes invalid: (9,5). So deleting (2,4) also deletes (5,2), (7,2) and (9,5).

Q49. The relation book (title, price) contains the titles and prices of different books. Assuming that no two books have the same price, what does the following SQL query list?
select title from book as B where (select count(*) from book as T where T.price > B.price) < 5
(a) Titles of the four most expensive books
(b) Title of the fifth most inexpensive book
(c) Title of the fifth most expensive book
(d) Titles of the five most expensive books
CS2005
Ans. d
Explanation: For each book B the subquery counts the books priced higher than B. That count is 0, 1, 2, 3 or 4 exactly for the five most expensive books.

Q50. Consider a relation scheme R = (A, B, C, D, E, H) on which the following functional dependencies hold: {A → B, BC → D, E → C, D → A}. What are the candidate keys of R?
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
CS2005
Ans. d
Explanation: Taking the closures of AE, BE and DE gives every attribute that appears in the functional dependencies:
AE+ = {A, B, C, D, E}
BE+ = {A, B, C, D, E}
DE+ = {A, B, C, D, E}
Only H does not appear in any FD, so H must be added to AE, BE and DE to obtain the candidate keys AEH, BEH and DEH.

Q51. Consider the following log sequence of two transactions on a bank account, with initial balance 12000, that transfer 2000 to a mortgage payment and then apply a 5% interest.
1. T1 start
2. T1 B old=12000 new=10000
3. T1 M old=0 new=2000
4. T1 commit
5. T2 start
6. T2 B old=10000 new=10500
7. T2 commit
Suppose the database system crashes just before log record 7 is written. When the system is restarted, which one statement is true of the recovery procedure?
(A) We must redo log record 6 to set B to 10500
(B) We must undo log record 6 to set B to 10000 and then redo log records 2 and 3
(C) We need not redo log records 2 and 3 because transaction T1 has committed
(D) We can apply redo and undo operations in arbitrary order because they are idempotent.
CS2006
Ans. B
Explanation: T2 has no commit record in the log, so its update (record 6) must be undone. T1 has committed, but the log shows no checkpoint, so its changes may not have reached the disk and records 2 and 3 must be redone.

Q52. Consider the relation account (customer, balance) where customer is a primary key and there are no null values. We would like to rank customers according to decreasing balance. The customer with the largest balance gets rank 1. Ties are not broken but ranks are skipped: if exactly two customers have the largest balance they each get rank 1 and rank 2 is not assigned.
Query1: select A.customer, count(B.customer) from account A, account B where A.balance <= B.balance group by A.customer
Query2: select A.customer, 1+count(B.customer) from account A, account B where A.balance < B.balance group by A.customer
Consider these statements about Query1 and Query2.
1. Query1 will produce the same row set as Query2 for some but not all databases.
2. Both Query1 and Query2 are correct implementations of the specification.
3. Query1 is a correct implementation of the specification but Query2 is not.
4. Neither Query1 nor Query2 is a correct implementation of the specification.
5. Assigning rank with a pure relational query takes less time than scanning in decreasing balance order assigning ranks using ODBC.
Which two of the above statements are correct?
(A) 2 and 5
(B) 1 and 3
(C) 1 and 4
(D) 3 and 5
CS2006
Ans. C
Explanation: Solve these queries by taking an example. Suppose the content of the table is

Account
Customer  Balance
a         10000
b         10000
c         50000
d         60000
e         60000

The required ranks are d = 1, e = 1, c = 3, a = 4, b = 4. The queries return:

Customer  Query1  Query2
a         5       4
b         5       4
c         3       3
d         2       (no row)
e         2       (no row)

Query1 counts the tied customers themselves, so d and e get 2 instead of 1 and a and b get 5 instead of 4. Query2 returns no row for d and e, because no customer has a larger balance and the join produces nothing to group. Neither query implements the specification, so statements 2 and 3 are false and statement 4 is true. The two row sets differ here but coincide on an empty table, so statement 1 is true. Statement 5 is false: the self-join compares every pair of rows, which is not faster than a single scan in decreasing balance order.
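The example for Q52 can be checked mechanically. The following is a minimal sketch, assuming an in-memory SQLite database through Python's standard sqlite3 module; the helper names run and expected_ranks are hypothetical and only serve this illustration. It runs both queries on the five-customer instance and computes the ranks the specification asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (customer TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("a", 10000), ("b", 10000), ("c", 50000), ("d", 60000), ("e", 60000)],
)

QUERY1 = """
    SELECT A.customer, COUNT(B.customer)
    FROM account A, account B
    WHERE A.balance <= B.balance
    GROUP BY A.customer
"""
QUERY2 = """
    SELECT A.customer, 1 + COUNT(B.customer)
    FROM account A, account B
    WHERE A.balance < B.balance
    GROUP BY A.customer
"""


def run(query):
    """Return the query result as a {customer: rank} dictionary."""
    return dict(conn.execute(query).fetchall())


def expected_ranks():
    """Rank = 1 + number of customers with a strictly larger balance."""
    rows = conn.execute("SELECT customer, balance FROM account").fetchall()
    return {
        cust: 1 + sum(1 for _, other in rows if other > bal)
        for cust, bal in rows
    }


query1_ranks = run(QUERY1)
query2_ranks = run(QUERY2)
required_ranks = expected_ranks()

print("Query1  :", query1_ranks)
print("Query2  :", query2_ranks)
print("Required:", required_ranks)
```

Query2 drops the top-ranked customers entirely because an inner join has no row to group for them; an outer join would be one way to keep them, but that goes beyond what the question asks.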

Q53. Consider the relation enrolled (student, course) in which (student, course) is the primary key, and the relation paid (student, amount) where student is the primary key. Assume no null values and no foreign keys or integrity constraints. Given the following four queries:
Query1: select student from enrolled where student in (select student from paid)
Query2: select student from paid where student in (select student from enrolled)
Query3: select E.student from enrolled E, paid P where E.student = P.student
Query4: select student from paid where exists (select * from enrolled where enrolled.student = paid.student)
Which one of the following statements is correct?
(A) All queries return identical row sets for any database
(B) Query2 and Query4 return identical row sets for all databases but there exist databases for which Query1 and Query2 return different row sets.
(C) There exist databases for which Query3 returns strictly fewer rows than Query2
(D) There exist databases for which Query4 will encounter an integrity violation at runtime.
CS2006
Ans. B
Explanation: Solve by taking an example. Query2 and Query4 both scan paid, where student is the key, so each qualifying student appears exactly once in both. Query1 scans enrolled, so a student enrolled in several courses appears several times. For instance, with enrolled = {(s1, c1), (s1, c2)} and paid = {(s1, 100)}, Query1 returns s1 twice while Query2 returns it once. Query3 also returns one row per matching enrolled tuple, so it never returns fewer rows than Query2.

Q54. The following functional dependencies are given: AB → CD, AF → D, DE → F, C → G, F → E, G → A. Which one of the following options is false?
(A) {CF}+ = {ACDEFG}
(B) {BG}+ = {ABCDG}
(C) {AF}+ = {ACDEFG}
(D) {AB}+ = {ABCDFG}
CS2006
Ans. C, D
Explanation: The closure of AF is {A, D, E, F}; it does not contain C or G, so option C is wrong. The closure of AB is {A, B, C, D, G}; it does not contain F, so option D is also wrong.
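The closures used in Q54 (and the candidate keys of Q50) can be computed with the usual fixed-point procedure. The sketch below is illustrative only: the function name closure and the encoding of each functional dependency as a pair of attribute strings are assumptions made for this example.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names, e.g. ("AB", "CD") for AB -> CD.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the dependency when its whole left side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependencies of Q54
Q54_FDS = [("AB", "CD"), ("AF", "D"), ("DE", "F"),
           ("C", "G"), ("F", "E"), ("G", "A")]

for attrs in ("CF", "BG", "AF", "AB"):
    print(attrs, "->", "".join(sorted(closure(attrs, Q54_FDS))))

# Dependencies of Q50: a set is a superkey when its closure is all of R.
Q50_FDS = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
for attrs in ("AEH", "BEH", "DEH", "AE"):
    is_superkey = closure(attrs, Q50_FDS) == set("ABCDEH")
    print(attrs, "superkey of R:", is_superkey)
```

The loop stops as soon as a full pass adds no attribute, which is enough for exam-sized inputs; no attempt is made here to test minimality of a key.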

Q55. Information about a collection of students is given by the relation studInfo(studId, name, sex). The relation enroll(studId, courseId) gives which student has enrolled for (or taken) what course(s). Assume that every course is taken by at least one male and at least one female student. What does the following relational algebra expression represent?
Π courseId ((Π studId (σ sex="female" (studInfo)) × Π courseId (enroll)) - enroll)
(A) Courses in which all the female students are enrolled.
(B) Courses in which a proper subset of female students are enrolled.
(C) Courses in which only male students are enrolled.
(D) None of the above
CS2007
Ans. B
Explanation: Π studId (σ sex="female" (studInfo)) returns the studId of every female student. Its Cartesian product with Π courseId (enroll) pairs every female student with every available course. Next, enroll is subtracted from this product, which removes the (student, course) pairs that actually occur in enroll. If all female students are enrolled in a course, every pair for that course is removed. The remaining pairs are female students together with courses in which they have not enrolled, although other female students may have enrolled in those courses. The outer Π courseId therefore selects the courses in which a proper subset of female students are enrolled.

Example:

studInfo
StudId  Name  Sex
S1      A     Female
S2      B     Female
S3      C     Male
S4      D     Male

enroll
StudId  CourseId
S1      1
S1      2
S2      1
S3      1
S4      2

Π studId (σ sex="female" (studInfo)) × Π courseId (enroll) returns
StudId  CourseId
S1      1
S1      2
S2      1
S2      2

(Π studId (σ sex="female" (studInfo)) × Π courseId (enroll)) - enroll returns
StudId  CourseId
S2      2

Π courseId ((Π studId (σ sex="female" (studInfo)) × Π courseId (enroll)) - enroll) returns
CourseId
2

Q56. Consider the relation employee(name, sex, supervisorName) with name as the key. supervisorName gives the name of the supervisor of the employee under consideration. What does the following Tuple Relational Calculus query produce?
{e.name | employee(e) /\ ∀x [¬employee(x) \/ x.supervisorName ≠ e.name \/ x.sex = "male"]}
(A) Names of employees with a male supervisor.
(B) Names of employees with no immediate male subordinates.
(C) Names of employees with no immediate female subordinates.
(D) Names of employees with a female supervisor.
CS2007
Ans. C
Explanation: The condition says that every employee x is either not supervised by e or is male, that is, e has no female immediate subordinate.

Q57. Consider the table employee(empId, name, department, salary) and the two queries Q1, Q2 below. Assuming that department 5 has more than one employee, and we want to find the employees who get higher salary than anyone in the department 5, which one of the statements is TRUE for any arbitrary employee table?
Q1: Select e.empId From employee e Where not exists (Select * From employee s Where s.department = "5" and s.salary >= e.salary)
Q2: Select e.empId From employee e Where e.salary > Any (Select distinct salary From employee s Where s.department = "5")
(A) Q1 is the correct query
(B) Q2 is the correct query
(C) Both Q1 and Q2 produce the same answer.
(D) Neither Q1 nor Q2 is the correct query
CS2007
Ans. B

Q58. Which one of the following statements is FALSE?
(A) Any relation with two attributes is in BCNF
(B) A relation in which every key has only one attribute is in 2NF
(C) A prime attribute can be transitively dependent on a key in a 3NF relation.
(D) A prime attribute can be transitively dependent on a key in a BCNF relation.
CS2007
Ans. D
Explanation: In a BCNF relation the left side of every nontrivial functional dependency is a superkey, so no attribute, prime or not, can depend on a key transitively.
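The relational algebra expression of Q55 can be traced step by step with Python sets. This is a small sketch on the four-student example instance of Q55; the variable names are chosen for this illustration and relations are modelled as sets of tuples.

```python
# studInfo(studId, name, sex) and enroll(studId, courseId) as sets of tuples
stud_info = {
    ("S1", "A", "female"),
    ("S2", "B", "female"),
    ("S3", "C", "male"),
    ("S4", "D", "male"),
}
enroll = {("S1", 1), ("S1", 2), ("S2", 1), ("S3", 1), ("S4", 2)}

# Projection of studId after selecting sex = "female"
female_ids = {stud_id for stud_id, _, sex in stud_info if sex == "female"}

# Projection of courseId from enroll
course_ids = {course_id for _, course_id in enroll}

# Cartesian product: every female student paired with every course
product = {(s, c) for s in female_ids for c in course_ids}

# Set difference: pairs where a female student did NOT take the course
missing = product - enroll

# Final projection on courseId
result = {course_id for _, course_id in missing}

print("product:", sorted(product))
print("missing:", sorted(missing))
print("result :", sorted(result))
```

Course 1 drops out because both female students take it; course 2 stays because S2 does not, which is the "proper subset" reading of option (B).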

Q59. Consider the following schedules involving two transactions. Which one of the following statements is TRUE?
S1: r1(X); r1(Y); r2(X); r2(Y); w2(Y); w1(X)
S2: r1(X); r2(X); r2(Y); w2(Y); r1(Y); w1(X)
(A) Both S1 and S2 are conflict serializable.
(B) S1 is conflict serializable and S2 is not conflict serializable.
(C) S1 is not conflict serializable and S2 is conflict serializable.
(D) Both S1 and S2 are not conflict serializable.
CS2007
Ans. C
Explanation: To decide conflict serializability, first find the conflicting operations in each schedule and then draw the dependency graph.

S1
T1        T2
r1(X)
r1(Y)
          r2(X)
          r2(Y)
          w2(Y)
w1(X)

Conflicts: r1(Y) precedes w2(Y), giving the edge T1 → T2 (on Y); r2(X) precedes w1(X), giving the edge T2 → T1 (on X).
Dependency graph: T1 → T2 and T2 → T1. A cycle exists, so S1 is not conflict serializable.

S2
T1        T2
r1(X)
          r2(X)
          r2(Y)
          w2(Y)
r1(Y)
w1(X)

Conflicts: r2(X) precedes w1(X), giving the edge T2 → T1 (on X); w2(Y) precedes r1(Y), giving the edge T2 → T1 (on Y).
Dependency graph: T2 → T1 only. No cycle exists, so S2 is conflict serializable (equivalent to the serial order T2, T1).
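The dependency-graph test used in Q59 is easy to automate. The following is a minimal sketch, assuming schedules are written in the r1(X)/w2(Y) notation of the question; the function names parse, precedence_edges and has_cycle are hypothetical helpers for this illustration.

```python
import re


def parse(schedule):
    """Turn 'r1(X); w2(Y)' into a list of (op, txn, item) tuples."""
    return re.findall(r"([rw])(\d+)\((\w+)\)", schedule)


def precedence_edges(ops):
    """Edge (Ti, Tj) when an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Repeatedly strip nodes without incoming edges; leftovers form a cycle."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if all(dst != n for _, dst in remaining)}
        if not sources:
            return True
        nodes -= sources
        remaining = {(s, d) for s, d in remaining if s not in sources}
    return False


S1 = "r1(X); r1(Y); r2(X); r2(Y); w2(Y); w1(X)"
S2 = "r1(X); r2(X); r2(Y); w2(Y); r1(Y); w1(X)"

for name, schedule in (("S1", S1), ("S2", S2)):
    edges = precedence_edges(parse(schedule))
    verdict = "not conflict serializable" if has_cycle(edges) else "conflict serializable"
    print(name, sorted(edges), verdict)
```

The same helpers apply to any schedule in this notation, since the test depends only on the order of conflicting operations.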

Q60. Which of the following tuple relational calculus expression(s) is/are equivalent to ∀t ∈ r (P(t))?
I. ¬∃ t ∈ r (P(t))
II. ∃ t ∉ r (P(t))
III. ¬∃ t ∈ r (¬P(t))
IV. ¬∃ t ∉ r (¬P(t))
(A) I only
(B) II only
(C) III only
(D) III and IV only
CS2008
Ans. C
Explanation: This question uses the rules of predicate calculus: ∃ can be replaced by ¬∀¬, and ∀ can be replaced by ¬∃¬. Applying the second rule,
∀t ∈ r (P(t)) = ¬∃ t ∈ r (¬P(t)),
which is expression III.
Alternatively, you can give the symbols a meaning and then compare each predicate. Suppose r is the student relation and P is the predicate Sick, so P(t) = Sick(t) means student t is sick. Then ∀t ∈ r (P(t)) means all students are sick.
I. ¬∃ t ∈ r (P(t)) means there is no student who is sick.
II. ∃ t ∉ r (P(t)) means there is someone who is not a student and is sick.
III. ¬∃ t ∈ r (¬P(t)) means there is no student who is not sick.
IV. ¬∃ t ∉ r (¬P(t)) means there is no one outside the student relation who is not sick.
Only III says the same thing as the original expression.

Q61. A clustering index is defined on the fields which are of type
(A) non-key and ordering
(B) non-key and non-ordering
(C) key and ordering
(D) key and non-ordering
CS2008
Ans. A
Explanation: A clustering index is built on the field by which the file is physically ordered when that field is not a key. An index on an ordering key field is a primary index.

Q62. The keys 12, 18, 13, 2, 3, 23, 5 and 15 are inserted into an initially empty hash table of length 10 using open addressing with hash function h(k) = k mod 10 and linear probing. What is the resultant hash table?

Slot  (A)   (B)   (C)   (D)
0
1
2     2     12    12    12, 2
3     23    13    13    13, 3, 23
4                 2
5     15    5     3     5, 15
6                 23
7                 5
8     18    18    18    18
9                 15

CS2008
Ans. C
Explanation: 12, 18 and 13 go to their home slots 2, 8 and 3. Key 2 finds slots 2 and 3 occupied and goes to 4; 3 goes to 5; 23 goes to 6; 5 goes to 7; 15 probes 5, 6, 7 and 8 and goes to 9.

Q63. Let R and S be relational schemes such that R = {a, b, c} and S = {c}. Now consider the following queries on the database:
I. Π R-S (r) - Π R-S (Π R-S (r) × s - Π R-S,S (r))
II. {t | t ∈ Π R-S (r) /\ ∀u ∈ s (∃v ∈ r (u = v[S] /\ t = v[R-S]))}
III. {t | t ∈ Π R-S (r) /\ ∀v ∈ s (∃u ∈ r (u = v[S] /\ t = v[R-S]))}
IV. Select R.a, R.b From R, S Where R.c = S.c
Which of the above queries are equivalent?
(A) I and II
(B) I and III
(C) II and IV
(D) III and IV
CS2009
Ans. A
Hint: Here R-S means {a, b} and S means {c}. Query I is equivalent to
Π a,b (r) - Π a,b (Π a,b (r) × s - Π a,b,c (r)),
which returns the combinations (a, b) in r that are paired with every c in s, that is, the division of r by s. Query II states the same condition in tuple calculus. Query IV is a join and returns the combinations (a, b) that are paired with at least one c in s, so it is not equivalent to I or II.

Common Data Questions: 64 & 65
Consider the following relational schema:
Suppliers(sid:integer, sname:string, city:string, street:string)
Parts(pid:integer, pname:string, color:string)
Catalog(sid:integer, pid:integer, cost:real)

Q64. Consider the following relational query on the above database:
SELECT S.sname FROM Suppliers S
WHERE S.sid NOT IN (SELECT C.sid FROM Catalog C
  WHERE C.pid NOT IN (SELECT P.pid FROM Parts P WHERE P.color <> 'blue'))
Assume that relations corresponding to the above schema are not empty. Which one of the following is the correct interpretation of the above query?
(A) Find the names of all suppliers who have supplied a non-blue part.
(B) Find the names of all suppliers who have not supplied a non-blue part.
(C) Find the names of all suppliers who have supplied only blue parts.
(D) Find the names of all suppliers who have not supplied only blue parts.
CS2009
Ans. A
Explanation: (SELECT P.pid FROM Parts P WHERE P.color <> 'blue') returns the pid of every part that is not blue. (SELECT C.sid FROM Catalog C WHERE C.pid NOT IN (...)) returns the sid of every supplier who has supplied at least one blue part. Finally, the outer query selects the suppliers who are not in that set, that is, suppliers who have not supplied any blue part. Strictly, this also includes suppliers with no catalog entry at all, so option (A) matches the query only approximately; options (B) and (C) describe suppliers of blue parts and are clearly wrong.

Q65. Assume that, in the Suppliers relation above, each supplier and each street within a city has a unique name, and (sname, city) forms a candidate key. No other functional dependencies are implied other than those implied by primary and candidate keys. Which one of the following is TRUE about the above schema?
(A) The schema is in BCNF
(B) The schema is in 3NF but not in BCNF
(C) The schema is in 2NF but not in 3NF
(D) The schema is not in 2NF
CS2009
Ans. A
Explanation: In this relation every nontrivial functional dependency has the primary key (sid) or the candidate key (sname, city) on its left side, so the relation is in BCNF.

Q66.
A relational schema for a train reservation database is given below:
Passenger (pid, pname, age)
Reservation (pid, class, tid)

Table: Passenger
pid  pname   age
0    Sachin  65
1    Rahul   66
2    Sourav  67
3    Anil    69

Table: Reservation
pid  class  tid
0    AC     8200
1    AC     8201
2    SC     8201
5    AC     8203
1    SC     8204
3    AC     8202

What pids are returned by the following SQL query for the above instance of the tables?
SELECT pid FROM Reservation WHERE class = 'AC' AND EXISTS (SELECT * FROM Passenger WHERE age > 65 AND Passenger.pid = Reservation.pid)
(A) 1, 0
(B) 1, 2
(C) 1, 3
(D) 1, 5
CS2010
Ans. C
Explanation: This is a correlated query, so the inner query runs once for every row examined by the outer query.

pid  class  tid   Inner query returns  EXISTS  Row returned
0    AC     8200  nothing (age 65)     False   no
1    AC     8201  {1, Rahul, 66}       True    yes
2    SC     8201  {2, Sourav, 67}      True    no (class is not AC)
5    AC     8203  nothing              False   no
1    SC     8204  {1, Rahul, 66}       True    no (class is not AC)
3    AC     8202  {3, Anil, 69}        True    yes

The outer query returns pids 1 and 3.
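The hash-table questions in this set (Q62, Q70, Q71 and Q78) all rest on the same insertion procedure, so a small simulator is a convenient way to check answers. This is a sketch under stated assumptions: the function name linear_probe_insert is hypothetical, the hash function is k mod table size as in the questions, and the table is assumed never to be full.

```python
from itertools import permutations


def linear_probe_insert(keys, size=10):
    """Insert keys in order using h(k) = k mod size with linear probing."""
    table = [None] * size
    for key in keys:
        slot = key % size
        # Assumes fewer keys than slots; otherwise this loop would not end.
        while table[slot] is not None:
            slot = (slot + 1) % size
        table[slot] = key
    return table


# Q62: final table for the given insertion order
q62_table = linear_probe_insert([12, 18, 13, 2, 3, 23, 5, 15])
print("Q62:", q62_table)

# Q70: the order 46, 34, 42, 23, 52, 33 (option C)
q70_target = linear_probe_insert([46, 34, 42, 23, 52, 33])
print("Q70:", q70_target)

# Q71: count every insertion order of the six keys that yields the same table
q71_count = sum(
    1
    for order in permutations([42, 23, 34, 52, 46, 33])
    if linear_probe_insert(order) == q70_target
)
print("Q71:", q71_count)

# Q78: slot in which 142 finally lands
q78_slot = linear_probe_insert([43, 165, 62, 123, 142]).index(142)
print("Q78:", q78_slot)
```

Brute force over all 720 orders is fine at this size and avoids having to reason about which keys must precede which.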

Q67. Which of the following concurrency control protocols ensure both conflict serializability and freedom from deadlock?
I. 2-phase locking
II. Time-stamp ordering
(A) I only
(B) II only
(C) Both I and II
(D) Neither I nor II
CS2010
Ans. B
Explanation: Two-phase locking guarantees conflict serializability but transactions can wait for each other's locks and deadlock. Under time-stamp ordering a transaction never waits, it is rolled back instead, so no deadlock can occur.

Q68. Consider the following schedule for transactions T1, T2 and T3:

T1         T2         T3
Read(X)
           Read(Y)
                      Read(Y)
           Write(Y)
Write(X)
                      Write(X)
           Read(X)
           Write(X)

Which one of the schedules below is the correct serialization of the above?
(A) T1 → T3 → T2
(B) T2 → T1 → T3
(C) T2 → T3 → T1
(D) T3 → T1 → T2
CS2010
Ans. A
Explanation: First find all conflicts.
On Y: T3 Read(Y) precedes T2 Write(Y), giving T3 → T2.
On X: T1 Read(X) and Write(X) precede T3 Write(X), giving T1 → T3; T3 Write(X) precedes T2 Read(X) and Write(X), giving T3 → T2; T1 Write(X) precedes T2 Read(X), giving T1 → T2.
Now make the dependency graph: T1 → T3, T3 → T2, T1 → T2.
By topological sorting of this graph we find the order of serialization, which is T1 → T3 → T2.

Q69. The following functional dependencies hold for relations R(A, B, C) and S(B, D, E):
B → A, A → C
The relation R contains 200 tuples and the relation S contains 100 tuples. What is the maximum number of tuples possible in the natural join of R and S?
(A) 100
(B) 200
(C) 300
(D) 2000
CS2010
Ans. A
Explanation: From the set of functional dependencies we find that B is the key of R, so the 200 tuples of R hold 200 distinct values of B. Each tuple of S can therefore match at most one tuple of R. In S there are two extreme cases:
1. All 100 values of B in S are the same (and this value is present in R).
2. All 100 values of B in S are different (and every one of them is present in R).
In both cases the natural join picks at most 100 tuples.

Statement for Linked Answer Questions: 70 & 71
A hash table of length 10 uses open addressing with hash function h(k) = k mod 10, and linear probing. After inserting 6 values into an empty hash table, the table is as shown below.

Slot  Key
0
1
2     42
3     23
4     34
5     52
6     46
7     33
8
9

Q70. Which one of the following choices gives a possible order in which the key values could have been inserted in the table?
(A) 46, 42, 34, 52, 23, 33
(B) 34, 42, 23, 52, 33, 46
(C) 46, 34, 42, 23, 52, 33
(D) 42, 46, 33, 23, 34, 52
CS2010
Ans. C
Explanation: Create the hash table for every option (slots 0, 1, 8 and 9 stay empty in all four).

Slot      2    3    4    5    6    7
Option A  42   52   34   23   46   33
Option B  42   23   34   52   33   46
Option C  42   23   34   52   46   33
Option D  42   33   23   34   46   52

Only option C reproduces the given table.

Q71. How many different insertion sequences of the key values using the same hash function and linear probing will result in the hash table shown above?
(A) 10
(B) 20
(C) 30
(D) 40
CS2010
Ans. C
Explanation: 42, 23 and 34 sit in their home slots and can be inserted in any relative order (3! = 6 ways). 52 hashes to slot 2 and reaches slot 5 only if 42, 23 and 34 are already present, so it must follow all three. 46 sits in its home slot and can take any of the 5 positions among these first five insertions. 33 probes slots 3, 4, 5 and 6 before reaching 7, so it must be inserted last. The total is 6 × 5 = 30.

Q72. Consider the following entity relationship diagram (ERD), where two entities E1 and E2 have a relation R of cardinality 1:m.

E1 ---(1)--- R ---(m)--- E2

The attributes of E1 are A11, A12 and A13 where A11 is the key attribute. The attributes of E2 are A21, A22 and A23 where A21 is the key attribute and A23 is a multi-valued attribute. Relation R does not have any attribute. A relational database containing minimum number of tables with each table satisfying the requirements of the third normal form (3NF) is designed from the above ERD. The number of tables in the database is:
(A) 2
(B) 3
(C) 5
(D) 4
IT2004
Ans. B
Explanation: The tables created from the ERD are E1(A11, A12, A13), E2(A21, A22) and A23(A21, A23). The 1:m relationship R has no attributes and needs no table of its own; it is represented by placing the key A11 of E1 as a foreign key in E2. The multi-valued attribute A23 needs a separate table.

Q73. A relational database contains two tables student and department in which student table has columns roll_no, name and dept_id and department table has columns dept_id and dept_name. The following insert statements were executed successfully to populate the empty tables:
Insert into department values (1, 'Mathematics')
Insert into department values (2, 'Physics')
Insert into student values (1, 'Navin', 1)
Insert into student values (2, 'Mukesh', 2)
Insert into student values (3, 'Gita', 1)
How many rows and columns will be retrieved by the following SQL statement?
Select * from student, department
(A) 0 row and 4 columns
(B) 3 rows and 4 columns
(C) 3 rows and 5 columns
(D) 6 rows and 5 columns
IT2004

Ans. D
Explanation: The query is a Cartesian product of student and department, which returns 3*2 = 6 rows and 5 columns (3 from student and 2 from department).

Q74. A relation Empdtl is defined with attributes empcode (unique), name, street, city, state and pincode. For any pincode, there is only one city and state. Also, for any given street, city and state, there is just one pincode. In normalization terms, Empdtl is a relation in
(A) 1 NF only
(B) 2 NF and hence also in 1 NF
(C) 3 NF and hence also in 2 NF and 1 NF
(D) BCNF and hence also in 3 NF, 2 NF and 1 NF
IT2004
Ans. B
Explanation: The functional dependencies given are
empcode → name, street, city, state, pincode (empcode is unique)
pincode → city
pincode → state
street, city, state → pincode
The only candidate key of Empdtl is empcode, so every other attribute is non-prime.
Check for 2NF: the key has a single attribute, so no non-prime attribute can depend on a part of it. Empdtl is in 2NF.
Check for 3NF:
pincode → city - violates 3NF (left side is not a superkey, right side is non-prime)
pincode → state - violates 3NF (left side is not a superkey, right side is non-prime)
street, city, state → pincode - violates 3NF (left side is not a superkey, right side is non-prime)
Empdtl is not in 3NF, and therefore not in BCNF.
The highest normal form satisfied by Empdtl is 2NF.

Q75. A table T1 in a relational database has the following rows and columns:

Roll No  Marks
1        10
2        20
3        30
4        Null

The following sequence of SQL statements was successfully executed on table T1.
Update T1 set marks = marks + 5
Select avg(marks) from T1
What is the output of the select statement?
(A) 18.75
(B) 20
(C) 25
(D) Null
IT2004
Ans. C
Explanation: The query "Update T1 set marks = marks + 5" updates the table as follows (Null + 5 remains Null):

Roll No  Marks
1        15
2        25
3        35
4        Null

The query "Select avg(marks) from T1" ignores the Null value and returns (15+25+35)/3 = 25.

Q76. Consider the following schedule S of transactions T1 and T2:

T1             T2
Read(A)
A = A - 10
               Read(A)
               Temp = 0.2*A
               Write(A)
               Read(B)
Write(A)
Read(B)
B = B + 10
Write(B)
               B = B + Temp
               Write(B)

Which of the following is TRUE about the schedule S?
(A) S is serializable only as T1, T2
(B) S is serializable only as T2, T1
(C) S is serializable both as T1, T2 and T2, T1
(D) S is not serializable either as T1 or as T2
IT2004
Ans. D
Explanation: Find all conflicts in the schedule. On A, Read(A) of T1 precedes Write(A) of T2, giving T1 → T2, while Write(A) of T2 precedes Write(A) of T1, giving T2 → T1. The dependency graph has a cycle, so the schedule is not conflict serializable.
If a schedule is not conflict serializable, it may or may not be serializable, so we also have to check the less strict definition, view serializability. A schedule that fails the conflict test can still be view serializable when it contains blind writes (writes of an item the transaction has not read). S contains no blind writes: both transactions read A and B before writing them. In S both transactions read the initial value of A, but in either serial order the second transaction would read the value of A written by the first. So S is not view serializable either.
A numeric check confirms this. With A = 100 and B = 100 initially, S ends with A = 90 and B = 120. The serial order T1, T2 ends with A = 90 and B = 128, and the serial order T2, T1 ends with A = 90 and B = 130. So option D is correct.

Q77. Consider two tables in a relational database with columns and rows as follows:

Table: Student
Roll_no  Name  Dept_id
1        ABC   1
2        DEF   1
3        GHI   2
4        JKL   3

Table: Department
Dept_id  Dept_Name
1        A
2        B
3        C

Roll_no is the primary key of the Student table, Dept_id is the primary key of the Department table and Student.Dept_id is a foreign key referencing Department.Dept_id. What will happen if we try to execute the following two SQL statements?
(i) update Student set Dept_id = Null where Roll_no = 1
(ii) update Department set Dept_id = Null where Dept_id = 1
(A) Both (i) and (ii) will fail
(B) (i) will fail but (ii) will succeed
(C) (i) will succeed but (ii) will fail
(D) Both (i) and (ii) will succeed
IT2004

Ans. C
Explanation: Query (i) runs correctly because a foreign key (Student.Dept_id) may hold Null in addition to the values present in the referenced column (Department.Dept_id). Query (ii) fails because it tries to set Dept_id, the primary key, to Null, and a primary key implicitly carries two constraints: 1. Unique and 2. Not null.

Q78. A hash table contains 10 buckets and uses linear probing to resolve collisions. The key values are integers and the hash function used is key % 10. If the values 43, 165, 62, 123, 142 are inserted in the table, in what location would the key value 142 be inserted?
(A) 2
(B) 3
(C) 4
(D) 6
IT2005
Ans. D
Explanation: Create the hash table for the given data.

Slot  Key
0
1
2     62
3     43
4     123
5     165
6     142
7
8
9

43, 165 and 62 go to their home slots 3, 5 and 2. 123 hashes to 3, which is occupied, and goes to 4. 142 hashes to 2 and finds slots 2, 3, 4 and 5 occupied, so it is inserted at location 6.

Q79. Consider the entities 'hotel room' and 'person' with a many to many relationship 'lodging' as shown below:

Hotel Room ---(m)--- lodging ---(m)--- Person

If we wish to store information about the rent payment to be made by person(s) occupying different hotel rooms, then this information should appear as an attribute of
(A) Person
(B) Hotel Room
(C) Lodging
(D) None of these
IT2005
Ans. C
Explanation: For a many-to-many relationship a separate table is created. Here a separate table is created for lodging, which contains the primary keys of 'hotel room' and 'person' as its attributes. The rent payment depends on both the person and the room, so it is stored in the lodging table.

Q80. A table has fields F1, F2, F3, F4, F5 with the following functional dependencies
F1 → F3
F2 → F4
(F1.F2) → F5
In terms of Normalization, this table is in
(A) 1 NF
(B) 2 NF
(C) 3 NF
(D) None of these
IT2005
Ans. A
Explanation: First find the candidate key of the relation: {F1, F2}. Now check for 2NF by looking for partial dependencies:
F1 → F3 - partial (non-prime F3 depends on part of the key)
F2 → F4 - partial (non-prime F4 depends on part of the key)
(F1.F2) → F5 - not partial (F5 depends on the whole key)
The relation is not in 2NF because it has partial dependencies.

Q81. A B-tree used as an index for a large database table has four levels including the root node. If a new key is inserted in this index, then the maximum number of nodes that could be newly created in the process are
(A) 5
(B) 4
(C) 3
(D) 2
IT2005
Ans. A
Explanation: In the worst case every node on the path from the leaf to the root is full. The insertion then splits one node at each of the four levels, creating four new nodes, and the split of the root creates one new root, for a total of 5.

Q82. Amongst the ACID properties of a transaction, the 'Durability' property requires that the changes made to the database by a successful transaction persist
(A) except in case of an Operating System crash
(B) except in case of Disk crash
(C) except in case of a power failure
(D) always, even if there is a failure of any kind
IT2005
Ans. D

Q83. A company maintains records of sales made by its salespersons and pays them commission based on each individual's total sales made in a year.
This data is maintained in a table with the following schema:
salesinfo = (salespersonid, totalsales, commission)
In a certain year, due to better business results, the company decides to further reward its salespersons by enhancing the commission paid to them as per the following formula.
If commission <= 50000, enhance it by 2%
If 50000 < commission <= 100000, enhance it by 4%
If commission > 100000, enhance it by 6%
The IT staff has written three different SQL scripts to calculate enhancement for each slab; each of these scripts is to run as a separate transaction as follows:
T1: Update salesinfo Set commission = commission * 1.02 Where commission <= 50000;
T2: Update salesinfo Set commission = commission * 1.04 Where commission > 50000 and commission <= 100000;
T3: Update salesinfo Set commission = commission * 1.06 Where commission > 100000;
Which of the following options of running these transactions will update the commission of all salespersons correctly?
(A) Execute T1, followed by T2 followed by T3
(B) Execute T2, followed by T3; T1 running concurrently throughout
(C) Execute T3 followed by T2; T1 running concurrently throughout
(D) Execute T3 followed by T2 followed by T1
IT2005
Ans. D
Explanation: If we run T1 first, some salespersons will end up with a commission above 50000, and if we then run T2 these salespersons will also receive the 4% enhancement. So T2 must not run after T1, and similarly T3 must not run after T2. Running the highest slab first avoids this, so option D is correct.

Q84. A table 'student' with schema (roll, name, hostel, marks) and another table 'hobby' with schema (roll, hobbyname) contain records as shown below.

Table: Student
Roll  Name              Hostel  Marks
1798  Manoj Rathod      7       95
2154  Soumic Banerjee   5       68
2369  Gumma Reddy       7       86
2581  Pradeep Pense     6       92
2643  Suhas Kulkarni    5       78
2711  Nitin Kadam       8       72
2872  Kiran Vora        5       92
2926  Manoj Kulkalikar  5       94
2959  Hemant Karkhanis  7       88
3125  Rajesh Doshi      5       82

Table: Hobby
Roll  Hobbyname
1798  Chess
1798  Music
2154  Music
2369  Swimming
2581  Cricket
2643  Chess
2643  Hockey
2711  Volleyball
2872  Football
2926  Cricket
2959  Photography
3125  Music
3125  Chess

The following SQL query is executed on the above tables:
select hostel from student natural join hobby where marks >= 75 and roll between 2000 and 3000;
Relations S and H with the same schema as those of these two tables respectively contain the same information as tuples. A new relation S' is obtained by the following relational algebra operation:
S' = Π hostel ((σ S.roll=H.roll (σ marks>75 and roll>2000 and roll<3000 (S)) × (H)))
The difference between the number of rows output by the SQL statement and the number of tuples in S' is:
(A) 6
(B) 4
(C) 2
(D)
IT2005
Ans. B
Explanation: The following table is created after joining the tables with the given condition.

Roll  Name              Hostel  Marks  Hobby
2369  Gumma Reddy       7       86     Swimming
2581  Pradeep Pense     6       92     Cricket
2643  Suhas Kulkarni    5       78     Chess
2643  Suhas Kulkarni    5       78     Hockey
2872  Kiran Vora        5       92     Football
2926  Manoj Kulkalikar  5       94     Cricket
2959  Hemant Karkhanis  7       88     Photography

From this table only hostel is selected by the SQL statement, so 7 rows are returned. The relational algebra expression S' is similar to the SQL statement, but relational algebra always returns a set, and a set holds no duplicate values. The result produced by S' is {7, 6, 5}, only three tuples. So the difference is 7 - 3 = 4.
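The bag-versus-set difference in Q84 can be reproduced directly. The sketch below assumes an in-memory SQLite database through Python's standard sqlite3 module; the relational algebra side is emulated with a Python set comprehension, which is an illustration of set semantics rather than a real algebra engine.

```python
import sqlite3

STUDENTS = [
    (1798, "Manoj Rathod", 7, 95), (2154, "Soumic Banerjee", 5, 68),
    (2369, "Gumma Reddy", 7, 86), (2581, "Pradeep Pense", 6, 92),
    (2643, "Suhas Kulkarni", 5, 78), (2711, "Nitin Kadam", 8, 72),
    (2872, "Kiran Vora", 5, 92), (2926, "Manoj Kulkalikar", 5, 94),
    (2959, "Hemant Karkhanis", 7, 88), (3125, "Rajesh Doshi", 5, 82),
]
HOBBIES = [
    (1798, "Chess"), (1798, "Music"), (2154, "Music"), (2369, "Swimming"),
    (2581, "Cricket"), (2643, "Chess"), (2643, "Hockey"),
    (2711, "Volleyball"), (2872, "Football"), (2926, "Cricket"),
    (2959, "Photography"), (3125, "Music"), (3125, "Chess"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, "
    "hostel INTEGER, marks INTEGER)"
)
conn.execute("CREATE TABLE hobby (roll INTEGER, hobbyname TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", STUDENTS)
conn.executemany("INSERT INTO hobby VALUES (?, ?)", HOBBIES)

# SQL keeps duplicates: one output row per joined (student, hobby) pair.
sql_rows = [
    hostel
    for (hostel,) in conn.execute(
        "SELECT hostel FROM student NATURAL JOIN hobby "
        "WHERE marks >= 75 AND roll BETWEEN 2000 AND 3000"
    )
]

# Relational algebra returns a set: select, join on roll, project hostel.
algebra_rows = {
    hostel
    for roll, _, hostel, marks in STUDENTS
    if marks > 75 and 2000 < roll < 3000
    for hobby_roll, _ in HOBBIES
    if hobby_roll == roll
}

difference = len(sql_rows) - len(algebra_rows)
print("SQL rows    :", sorted(sql_rows))
print("Algebra rows:", sorted(algebra_rows))
print("Difference  :", difference)
```

Adding DISTINCT to the SQL statement would make it return the same three hostels as the algebra expression.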
