You are on page 1of 119

DATABASE MANAGEMENT SYSTEM

G PRAKASH
 Semi-structured Data Model
Domain Constraint
 Referential Integrity
 Relational Databases
 based on the relational model and uses a collection of tables to represent both
data and the relationships among those data
 Tables
 It
also includes DDL and DML
 Relational model is an example of record-based model
 Database structured in fixed-format records of several types
 Each table consists of a particular record type

 Each record type defines a fixed number of fields, or attributes

 Relationalmodel hides low-level implementation details from database


developers and users
 Data-Manipulation Language
 SQL is non-procedural
 A query takes as input several tables (possibly only one) and always returns a single

table
 select instructor.name from instructor where instructor.dept_name = ‘History’;
 Queries may involve information from more than one table.
 select instructor.ID, department.dept_name from instructor, department where
instructor.dept_name= department.dept_name and department.budget > 95000;
 Data-Definition Language
 SQL provides a rich DDL that allows one to define tables, integrity constraints,
assertions, etc
 create table department (dept name char (20), building char (15), budget number

(12,2));
 Database Access from Application Programs
 SQL does not support actions such as
 Input from users,
 output to displays,

 or communication over the network

 Actions
must be written in a host language, such as C, C++, or Java, with
embedded SQL queries that access the data in the database
 To access the database, DML statements need to be executed from the host
language. 2 ways:
 By providing an application program interface – Open Database Connectivity
(ODBC) – commonly used API standard & Java Database Connectivity (JDBC)
standard
 By extending the host language syntax to embed DML calls within the host language

program – a preprocessor, called the DML precompiler, converts the DML statements
to normal procedure calls in the host language.
 Database Design
 Designed to manage large bodies of information
 These large bodies of information do not exist in isolation
 mainly involves the design of the database schema
 Design Process
 Initial phase is to characterize fully the data needs of the prospective database users;
Outcome => user requirements specification.
 Next, the designer chooses a data model, by applying the concepts of the chosen data

model, translates these requirements into a conceptual schema of the database


 Designer reviews the schema for confirming the satisfaction of data requirements

 Examine the design to remove any redundant features

 W.r.t. to relational model, Also involves decision on “what” attributes to capture in

the database & “how to group” these attributes to form the various tables

Use the Entity-Relationship model Normalization


 A fully-developed conceptual schema indicates the functional requirements of
the enterprise
 In a specification of functional requirements, users describe the kinds of
operations (or transactions) that will be performed on the data
 Process of moving

Abstract Data Implementation of


Model Database

 Proceeds in 2 final design phases:


 Logical-Design phase
 designer maps the high-level conceptual schema onto the implementation data
model of the database system
 Physical-Design phase
 Resulting system-specific database schema is used in which the physical features
of the database are specified
 Database Design for a University Organization
 Refer the book and consider your own conceptual design
 The Entity-Relationship Model
 Entities are described in a database by a set of attributes
Form attributes of the
department entity set

 A relationship
is an association among
several entities.

 E.g. a member relationship


associates an instructor with
her department.
 The set of all entities of the same type is termed an entity set
 The set of all relationships of the same type is termed an relationship set.
 Overall logical structure (schema) of a database can be expressed
graphically by an entity-relationship (E-R) diagram
 In addition to entities and relationships, the E-R model represents certain
constraints to which the contents of a database must conform =>
mapping cardinalities
 Expresses the number of entities to which another entity can be associated via a
relationship set
 E.g. if each instructor must be associated with only a single department, the E-R
model can express that constraint.
 Normalization
 Goal: to store information without unnecessary redundancy
 Approach: to design schemas that are in an appropriate normal form
 To determine whether a relation schema is in one of the desirable normal forms,
additional information needed => to use functional dependencies
 A bad database design have undesirable properties:
 Repetition of information
 Inability to represent certain information (Suppose we are creating a new department in the
university
 Solution: introduce null values
 Transaction Management
 Several operations on the database form a single logical unit of work.
 E.g., a fund transfer must happen in its entirety or not at all; this all-or-none
requirement is called atomicity.
 In addition, it is essential that the execution of the funds transfer preserve the
consistency of the database; This correctness requirement is called
consistency
 Finally, after the successful execution of a funds transfer, the new values of
the balances of accounts A and B must persist, despite the possibility of
system failure; This persistence requirement is called durability.
 A transaction is a collection of operations that performs a single logical
function in a database application
 Each transaction is a unit of both atomicity and consistency
 Transactions should not violate any database consistency constraints =>
programmer’s responsibility to define various transactions correctly and
preserve consistency of the database
 Database system’s responsibility for ensuring atomicity and durability
properties – specifically of Recovery manager
 To ensure atomicity property, database system perform failure recovery =>
restore the database to the state that existed prior to the occurrence of the
failure.
 What happens when several transactions update the database concurrently =>
responsibility goes to concurrent -control manager
 The transaction manager consists of the concurrency-control manager and
the recovery manager
ACID PROPERTIES

 Let’s take an example of a simple transaction. Suppose a bank


employee transfers Rs 500 from A's account to B's account. This very
simple and small transaction involves several low-level tasks
 A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
 B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
Database Architecture
 Database Users and Administrators
 Database Users and User Interfaces
 Four different types of database-system users
 Na¨ıve users – typical user interface for na¨ıve users is a forms interface,

 Application programmers - choose from many tools to develop user interfaces.

Rapid application development (RAD) enable an application programmer to


construct forms and reports with minimal programming effort
 Sophisticated users interact with the system without writing programs; Analysts who

submit queries to explore data in the database fall in this category


 Specialized users write specialized database applications that do not fit into the

traditional data-processing framework. computer-aided design systems, knowledgebase


and expert systems, systems that store data with complex data types (for example,
graphics data and audio data),
 Database Administrator
 Central control over the system
 Schema definition

 Storage structure and access-method definition

 Schema and physical-organization modification

 Granting of authorization for data access

 Routine maintenance.
RELATIONAL DATABASES

 In a relational model, the terms


 Relation is used to refer to a table
 Tuple is used to refer to a row
 Attribute refers to a column of a table
 The term relation instance is used to refer to a specific instance of a
relation, i.e., containing a specific set of rows
 The order in which tuples appear in a relation is irrelevant, since a relation
is a set of tuples
 For each attribute of a relation, there is a set of permitted values, called the
domain of that attribute
 For all relations r, the domains of all attributes of r be atomic
 A domain is atomic if elements of the domain are considered to be
indivisible units
 The null value is a special value that signifies that the value is unknown or
does not exist
 Database Schema
 Logicaldesign of the database
 The concept of a relation schema corresponds to the programming-
language notion of type definition
 Relation corresponds to variable names
 Relation instance corresponds to value of variables
 The schema for this relation is
department (dept name, building, budget)
 Common attributes in relation schemas is one way of relating tuples of
distinct relations
 To find the information about all the instructors who work in the Watson
building? (Exercise)
 Let us continue with our university
database example
 Each course in a university may be
offered multiple times, across
different semesters, or even within a
semester
 Relation needed to describe each individual section, of the class are defined as
schema
section (course id, sec id, semester, year, building, room number, time slot id)
 Relation needed to describe the association between instructors and the class
sections that they teach; the schema is
teaches (ID, course id, sec id, semester, year)
 Create your own relation schemas for university database? (Exercise)
 Keys
 How tuples within a given relation are distinguished?
 No two tuples in a relation are allowed to have exactly the same value for all
attributes
 A superkey is a set of one or more attributes that, taken collectively, allow us to
identify uniquely a tuple in the relation
 A superkey may contain extraneous attributes
 E.g. the combination of ID and name is a superkey for the relation instructor.
If K is a superkey, then so is any superset
of K.

Superkeys for which no proper subset is


also a superkey

Such minimal superkeys are called


candidate keys

Suppose that a combination of name and


dept name is sufficient to distinguish
among members of the instructor relation.

We shall use the term primary key to denote a


candidate key that is chosen
by the database designer
 A key (whether primary, candidate, or super) is a property of the entire
relation, rather than of the individual tuples
 Any two individual tuples in the relation are prohibited from having the
same value on the key attributes at the same time.
 A relation, say r1, may include among its attributes the primary key of
another relation, say r2
 This attribute is called a foreign key from r1, referencing r2.
 The relation r1 is also called the referencing relation of the foreign key
dependency, and r2 is called the referenced relation of the foreign key
 In any database instance, given any tuple, say ta, from the instructor relation,
there must be some tuple, say tb, in the department relation such that the
value of the dept_name attribute of ta is the same as the value of the primary
key, dept name, of tb .
 The constraint from section to teaches is an example of a referential
integrity constraint;
 A referential integrity constraint requires that the values appearing in
specified attributes of any tuple in the referencing relation also appear in
specified attributes of at least one tuple in the referenced relation
 Schema Diagrams
 A database schema, along with primary key and foreign key
dependencies, can be depicted by schema diagrams
 Foreign key dependencies appear as arrows from the foreign key
attributes of the referencing relation to the primary key of the referenced
relation.
 Relational Query Languages
 A query language is a language in which a user requests information from the
database
 Usually has a level higher than that of a standard programming language
 Categorized as procedural or non-procedural language
 In a procedural language, the user instructs the system to perform a sequence of
operations on the database to compute the desired result
 In a nonprocedural language, the user describes the desired information without
giving a specific procedure for obtaining that information
 Relational Operations
 All procedural relational query languages provide a set of operations that can be
applied to either a single relation or a pair of relations
 Join operation
 allows the combining of two relations by merging pairs of tuples, one from each
relation, into a single tuple
 a number of different ways to join relations are available

 Natural join

 In general, the natural join operation on two relations matches tuples whose values
are the same on all attribute names that are common to both relations.
 Cartesian product operation
 combines tuples from two relations, but unlike the join operation, its result
contains all pairs of tuples from the two relations, regardless of whether their
attribute values match
 As relations are sets, we can perform normal set operations on relations
 Union operation
 performs a set union of two “similarly structured” tables
 Intersection operation
 Set difference operation

create table department (dept _name varchar (20), building varchar (15), budget
number (12,2), primary key (dept name));
create table course (course id varchar (7), title varchar (50),
dept _name varchar (20), credits number (2,0), primary key (course id),
foreign key (dept name) references department);

create table instructor (ID varchar (5), name varchar (20) not null, dept name
varchar (20), salary number (8,2), primary key (ID), foreign key (dept name)
references department);

create table section (course id varchar (8), sec id varchar (8),


semester varchar (6), year number (4,0), building varchar (15),
room number varchar (7), time slot id varchar (4), primary key (course id, sec id,
semester, year), foreign key (course id) references course);
create table teaches (ID varchar (5), course id varchar (8), sec id varchar (8),
semester varchar (6), year number (4,0), primary key (ID, course id, sec id, semester,
year), foreign key (course id, sec id, semester, year) references section, foreign key
(ID) references instructor);

 To force the elimination of duplicates


select distinct dept name from instructor;
 SQL allows us to use the keyword all to specify explicitly that
duplicates are not removed
select all dept name from instructor;
 Select clause may also contain arithmetic expressions involving the
operators +, −, ∗, and / operating on constants or attributes of tuples

select ID, name, dept name, salary * 1.1 from instructor;


 The where clause allows us to select only those rows in the result relation of the
from clause that satisfy a specified predicate.
 SQL allows the use of the logical connectives and, or, and not in the where
clause.
 The operands of the logical connectives can be expressions involving the
comparison operators <, <=, >, >=, =, and <>.
 Queries on Multiple Relations
 Retrieve the names of all instructors, along with their department names and
department building name (exercise)
 select name, instructor.dept name, building from instructor, department where
instructor.dept name = department.dept name;
 The from clause by itself defines a Cartesian product of the relations listed in the
clause. It is defined formally in terms of set theory.
 For example, the relation schema for the Cartesian product of relations
instructor and teaches is:
 (instructor.ID, instructor.name, instructor.dept name, instructor.salary
teaches.ID, teaches.course id, teaches.sec id, teaches.semester,
teaches.year)
 The Cartesian product by itself combines tuples from instructor and teaches
that are unrelated to each other.
 Each tuple in instructor is combined with every tuple in teaches, even those
that refer to a different instructor.
 The result can be an extremely large relation, and it rarely makes sense to
create such a Cartesian product.
 Instead, the predicate in the where clause is used to restrict the combinations
created by the Cartesian product to those that are meaningful for the desired
answer.
 We would expect a query involving instructor and teaches to combine a particular
tuple t in instructor with only those tuples in teaches that refer to the same instructor
to which t refers
select name, course id from instructor, teaches where instructor.ID= teaches.ID;

 To find instructor names and course identifiers for instructors in the Computer
Science department
select name, course id from instructor, teaches where instructor.ID= teaches.ID
and instructor.dept name = ’Comp. Sci.’;
 In general, the meaning of an SQL query can be understood as follows
1. Generate a Cartesian product of the relations listed in the from clause
2. Apply the predicates specified in the where clause on the result of Step 1.
3. For each tuple in the result of Step 2, output the attributes (or results of
expressions) specified in the select clause.
 Natural join (visited again)
Cartesian Product Natural Join
Concatenates each tuple of the first relation Considers only those pairs of tuples with the
with every tuple of the second same value on those attributes that appear in
the schemas of both relations
SQL STRING FUNCTIONS

 CONCAT - returns the result (a string) of concatenating two string values


 Concat (char1/string1/column1, char2/string2/column2)
 CHR – used to return the character having the binary equivalent to ‘n’ as a
varchar2 value
 Chr (n)
 LOWER – returns a specified character expression in lowercase letters
 Lower (cExpression/string/column)
 UPPER
 LPAD – used to pad the left side of a string with a specific set of characters
 Lpad(expr1, n[,expr2])
 RPAD
 LTRIM – used to remove all the specified characters from the left
end side of a string
 LTRIM (string1 [,set])
 RTRIM(string [,trimming text])
 SUBSTR – returns the specified number of characters from a
particular position of a given string
 Substr(char, position [,substring_length])
SQL NUMERIC FUNCTIONS
 Abs(n)
 Log(B,n) – to return the logarithm, base B of n

 Mod(N,M)

 Power(m,n)

 Sqrt(n)

 Round(m)
SQL AGGREGATE FUNCTIONS
 MIN(column_name)
 MAX(column_name)

 COUNT(column_name)

 AVG(column_name)

 SUM(column_name)
SOME DATE/TIME FUNCTIONS AND
QUERIES
 Select sysdate from dual;
 Select current_date from dual;

 Display the date 60 days before the current date:


 Select sysdate-60 from dual;
 Select months_between(date1, date2) from dual;
 SQL Like
 SELECT column1, column2, ... FROM table_name
WHERE columnN LIKE pattern;
 different LIKE operators with '%' and '_' wildcards:
 LIKE ‘a%’ => Finds any values that start with "a“
 LIKE ‘%a’ => Finds any values that end with "a“

 LIKE ‘_r%’ => Finds any values that have "r" in the second position

 LIKE ‘%or%’ => Finds any values that have "or" in any position

 SQL IN Operator
 The IN operator allows you to specify multiple values in a WHERE clause.
 SELECT column_name(s) FROM table_name
WHERE column_name IN (value1, value2, ...);
 SELECT column_name(s) FROM table_name
WHERE column_name IN (SELECT STATEMENT);
 The SQL BETWEEN Operator
 The BETWEEN operator selects values within a given range. The values can
be numbers, text, or dates.
 SELECT column_name(s) FROM table_name
WHERE column_name BETWEEN value1 AND value2;
 The SQL ORDER BY clause
 used to sort the data in ascending or descending order, based on one or more
columns
 Syntax:
 SELECT column-list FROM table_name [WHERE condition] [ORDER BY
column1, column2, .. columnN] [ASC | DESC];
 SQL> SELECT * FROM CUSTOMERS ORDER BY NAME;
 SQL> SELECT * FROM CUSTOMERS ORDER BY NAME DESC;
 The SQL GROUP BY clause
 used in collaboration with the SELECT statement to arrange identical data
into groups
 GROUP BY clause follows the WHERE clause in a SELECT statement
and precedes the ORDER BY clause
 SELECT column1, column2 FROM table_name WHERE [ conditions ]
GROUP BY column1, column2 ORDER BY column1, column2
 If you want to know the total amount of salary on each customer, then the
GROUP BY query would be as follows:
 SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY
NAME;
NAME SUM(SALARY
)
Hardik 8500.00
kaushik 8500.00
Komal 4500.00
Muffy 10000.00
Ramesh 3500.00
 SQL - Using Joins
 usedto combine records from two or more tables in a database
 A JOIN is a means for combining fields from two tables by using values
common to each
 Consider the following tables: CUSTOMERS & ORDERS

Let us join these two tables in our


SELECT statement as shown below.

SQL> SELECT ID, NAME, AGE,


AMOUNT FROM CUSTOMERS,
ORDERS WHERE CUSTOMERS.ID
= ORDERS.CUSTOMER_ID;
 Several operators can be used to join tables, such as =, <, >, <>, <=, >=, !=,
BETWEEN, LIKE, and NOT

SQL> SELECT ID, NAME, AMOUNT,


DATE FROM CUSTOMERS INNER JOIN
ORDERS ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;

 Different types of joins available in SQL:


 SQL - INNER JOINS / EQUI JOINS
 returns rows when there is a match in both tables.
 SELECT table1.column1, table2.column2... FROM table1 INNER JOIN table2 ON
table1.common_field = table2.common_field;
 SQL LEFT (OUTER) JOIN
 returns all rows from the left table, even if there are no matches in the right table
 This means that a left join returns all the values from the left table, plus matched values

from the right table or NULL in case of no matching join predicate


 SELECT table1.column1, table2.column2... FROM table1 LEFT JOIN table2 ON

table1.common_field = table2.common_field;
 SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS LEFT JOIN

ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;


 SQL RIGHT (OUTER) JOIN
 returns all rows from the right table, even if there are no matches in the left table
 This means that a right join returns all the values from the right table, plus
matched values from the left table or NULL in case of no matching join
predicate.
 SELECT table1.column1, table2.column2... FROM table1 RIGHT JOIN
table2 ON table1.common_field = table2.common_field;
 SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS RIGHT
JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
 SQL FULL JOIN
 combines the results of both left and right outer joins
 Syntax
 SELECT table1.column1, table2.column2... FROM table1 FULL JOIN table2

ON table1.common_field = table2.common_field;
 SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS FULL

JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;


 Natural Join in SQL
 The associated tables have one or more pairs of identically named columns
 The columns must be the same data type
 Don’t use ON clause in a natural join.
 Syntax: SELECT * FROM table1 NATURAL JOIN table2;
 Sample table: foods Sample table: company
 To get all the unique columns from foods and company tables, the following
SQL statement can be used:
 SELECT * FROM foods NATURAL JOIN company;
COMPANY_ ITEM_ID ITEM_NAME ITEM_UNIT COMPANY_ COMPANY_
ID NAME CITY

 Difference between natural join and inner join


 number of columns returned
 SQL | USING Clause
 If several columns have the same names but the datatypes do not match,
the NATURAL JOIN clause can be modified with the USING clause to specify
the columns that should be used for an EQUIJOIN
 USING Clause is used to match only one column when more than one column
matches.
 NATURAL JOIN and USING Clause are mutually exclusive.
 NATURAL JOIN uses all the columns with matching names and data types to join the
tables.
 The USING Clause can be used to specify only those columns that should be used for

an EQUIJOIN.
 Syntax: SELECT table1.column, table2.column FROM table1 JOIN
table2 USING (join_column1, join_column2…);
 table1, table2 are the name of the tables participating in joining.
 The natural join syntax contains the NATURAL keyword, the JOIN…USING
syntax does not.
 An error occurs if the NATURAL and USING keywords occur in the same
join clause.
 The JOIN…USING clause allows one or more equijoin columns to specify in
brackets after the USING keyword.
 SQL - UNIONS CLAUSE
 used to combine the results of two or more SELECT statements without returning any
duplicate rows
 To use this UNION clause, each SELECT statement must have
 The same number of columns selected
 The same number of column expressions
 The same data type and
 Have them in the same order
 Syntax
 SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition] UNION
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]

 SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS


LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID UNION
SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
 SQL - INTERSECT Clause
 used to combine two SELECT statements, but returns rows only from the first SELECT
statement that are identical to a row in the second SELECT statement
 Syntax
 SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
INTERSECT
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
INTERSECT
SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;
 SQL - EXCEPT Clause
 used to combine two SELECT statements and returns rows from the first SELECT
statement that are not returned by the second SELECT statement.
 SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
EXCEPT
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
 SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS

LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID


EXCEPT SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
 SQL - Having Clause
 enables you to specify conditions that filter which group results appear in the
results
 The WHERE clause places conditions on the selected columns, whereas the
HAVING clause places conditions on groups created by the GROUP BY clause
 Syntax
 SELECT FROM WHERE GROUP BY HAVING ORDER BY

 SELECT column1, column2 FROM table1, table2 WHERE [ conditions ]

GROUP BY column1, column2 HAVING [ conditions ] ORDER BY


column1, column2
 SQL > SELECT ID, NAME, AGE, ADDRESS, SALARY FROM

CUSTOMERS GROUP BY age HAVING COUNT(age) >= 2;


NESTED SUBQUERIES
 Set Membership
 SQL allows testing tuples for membership in a relation
 The ’ in’ connective tests for set membership, where the set is a
collection of values produced by a select clause.
 The ‘not in’ connective tests for the absence of set membership.

 Find all the courses taught in the both the Fall 2009 and Spring
2010 semesters?
 Using intersect

(select course_id from section where semester = ’Fall’ and year= 2009)
intersect
 Using
(select in connective
course _id from section where semester = ’Spring’ and year= 2010);
 Write the subquery

(select course_id from section where semester = ’Spring’ and year=


2010)
 We then need to find those courses that were taught in the Fall 2009 and that
appear in the set of courses obtained in the subquery.
 We do so by nesting the subquery in the where clause of an outer query.

select distinct course_id from section where semester = ’Fall’ and year=
2009 and course_id in (select course_id from section
where semester = ’Spring’ and year= 2010);
 Find all the courses taught in the Fall 2009 semester but not in the
Spring 2010 semester? Exercise, Hint: use ‘not in’
 Set Comparison

 Find the names of all instructors whose salary is greater than at


least one instructor in the Biology department
 Using
‘SOME’ & ‘ANY’ subquery predicates
 SOME returns true if some row of the group matched the condition

 The phrase “greater than at least one” is represented in SQL by ‘>


some’

select name from instructor where salary > some (select salary from
instructor where dept_name = ’Biology’);
 SQL also allows < some, <= some, >= some, = some, and <> some
comparisons.
 Find the names of all instructors that have a salary value greater
than that of each instructor in the Biology department?
 The construct > all corresponds to the phrase “greater than all.”
 As it does for some, SQL also allows < all, <= all, >= all, = all, and <>
all comparisons
select name from instructor where salary > all (select salary from
instructor where dept_name = ’Biology’);

 Find the departments that have the highest average salary

select dept_name from instructor group by dept_name having avg


(salary) >= all (select avg (salary) from instructor group by dept_name);
 Test for Empty Relations
 SQL includes a feature for testing whether a subquery has any tuples in
its result.
 The exists construct returns the value true if the argument subquery is
nonempty
 Find all courses taught in both the Fall 2009 semester and in the
Spring 2010 semester?
select course_id from section as S where semester = ’Fall’ and year= 2009
and exists (select * from section as T where semester = ’Spring’ and
year= 2010 and S.course id= T.course id);

 Test for Empty Relations


 SQL includes a feature for testing whether a subquery has any tuples in
its result
 The exists construct returns the value true if the argument subquery is
nonempty.
 Find all courses taught in both the Fall 2009 semester and in the
Spring 2010 semester?
select course_id from section as S where semester = ’Fall’ and year=
2009 and exists (select * from section as T where semester = ’Spring’
and year= 2010 and S.course id= T.course id);
 A correlation name from an outer query (S in the above query), can
be used in a subquery in the where clause.
 A subquery that uses a correlation name from an outer query is
called a correlated subquery
 We can test for the nonexistence of tuples in a subquery by using
the not exists construct
 Find all students who have taken all courses offered in the Biology
department.
 Using the except construct, we can write the query as follows:
select distinct S.ID, S.name from student as S where not exists
((select course id from course where dept _name = ’Biology’)
Except (select T.course id from takes as T where S.ID = T.ID));
DISPLAYING DATA FROM
MULTIPLE TABLES USING JOINS
 Join Conditions
 A join queries must have contained at least one join
condition, either in the FROM clause or in the WHERE
clause
 The join condition compares two columns from two different
tables
 Equijoins
 a join with a join condition containing an equality operator
 This join retrieves information by using equality condition.
RELATIONAL ALGEBRA

 A procedural query language used to query the database tables to


access data in different ways
 In relational algebra, Input is a relation(table from which data has to be accessed)
and
 Output is also a relation(a temporary table holding the data asked for by the user)
 The primary operations that we can perform using relational algebra
are:
 Select
 Project
 Union
 SetDifferent
 Cartesian product
 Rename
Select Predicate
 Select Operation (σ)
 usedto fetch rows(tuples) from table(relation) which satisfies a given
condition
 Syntax: σp(r) Prepositional Logic, where we specify the conditions that
must be satisfied by the data
 In prepositional logic, one can use unary and binary operators like =, <, > etc,
to specify the conditions
 σage > 17 (Student)
 σage > 17 and gender = 'Male' (Student)
 Project Operation (∏)
 Used to project only a certain set of attributes of a relation
 It will only project or show the columns or attributes asked for, and will also
remove duplicate data from the columns.
 Syntax: ∏A1, A2...(r) where A1, A2 etc are attribute names(column names).
 ∏Name, Age(Student)
 Union Operation (∪)
 Used to fetch data from two relations(tables) or temporary relation(result of
another operation).
 For this operation to work, the relations(tables) specified should have same number of
attributes(columns) and same attribute domain
 Also the duplicate tuples are automatically eliminated from the result.
 Syntax: A ∪ B where A and B are relations
 ∏dept_name(instructor) ∪ ∏dept_name(department)
 Above operation will give us name of departments present in both instructor and
department relations, eliminating repetition.
 Set Difference (-)
 used to find data present in one relation and not present in the second relation
 Syntax: A – B
 ∏dept_name(instructor) - ∏ dept_name(department)
 Cartesian Product (X)
 used to combine data from two different relations(tables) into one and fetch data
from the combined relation
 Syntax: A X B
 Exercise ?
 Rename Operation (ρ)
 used to rename the output relation for any query operation which returns result
like Select, Project etc. Or to simply rename a relation(table)
 Syntax: ρ(RelationNew, RelationOld)
DATABASE DESIGN
ENTITY-RELATIONSHIP MODEL

 Developed to facilitate database design by allowing specification of an


enterprise schema that represents the overall logical structure of a database.
 is a model used for design and representation of relationships between data.
 E-R data mode employs three basic concepts:
 entity sets,
 relationship sets, and
 attributes
 Entity Sets
 An entity is a “thing” or “object” in the real world that is distinguishable
from all other objects.
 E.g. student is an entity, instructor, course, section, etc. are all entities
 It has a set of properties, and the values for some set of properties may
uniquely identify an entity
 An entity set is a set of entities of the same type that share the same
properties, or attributes
 If a Student is an Entity, then the complete dataset of all the students will be
the Entity Set
 An entity is represented by a set of attributes
 If a Student is an Entity, then student's roll no., student's name, student's age,
student's gender etc will be its attributes
 An attribute can be of many types, here are different types of attributes
defined in ER database model:
 For each attribute, there is a set of permitted values, called the domain, or
value set, of that attribute
Composite attributes help
us to
group together related
 Simple attribute: The attributes with attributes,
values that aremaking
atomictheand cannot
modeling cleaner
be broken down further are simple attributes. For example,
student's age.
 Composite attribute: A composite attribute is made up of more than one
simple attribute. For example, student's address will contain, house
no., street name, pincode etc.
 Derived attribute: These are the attributes which are not present in the
whole database management system, but are derived using other attributes.
For example, average age of students in a class.
 As another example, suppose that the instructor entity set has an attribute
age that indicates the instructor’s age; consider another attribute DOB
(stored /base attribute)
 Single-valued attribute: As the name suggests, they have a single value.

E.g. Student _ID attribute for a specific student entity refers to only one

student ID.
 Multi-valued attribute: And, they can have multiple values.
 E.g. Phone_number attribute
 A relationship is an association among several entities;
 When an Entity is related to another Entity, they are said to have a
relationship
 A relationship set is a set of relationships of the same type.
 It is a mathematical relation on n ≥ 2 (possibly non-distinct) entity sets. If E1, E2,
. . . , En are entity sets, then a relationship set R is a subset of

advisor
 {(e1, e2, . . . , en) | e1 ∈ E1, e2 ∈ E2, . . . , en ∈ En}
 where (e1, e2, . . . , en) is a relationship
 The association between entity sets is referred to as participation; i.e., the
entity sets E1, E2, . . . , En participate in relationship set R
 A relationship instance in an E-R schema represents an association between
the named entities in the real-world enterprise that is being modeled
 A relationship may also have attributes called descriptive attributes
 Consider a relationship set advisor with entity sets instructor and student.
 We could associate the attribute date with that relationship to specify the date
when an instructor became the advisor of a student
 A relationship instance in a given relationship set must be uniquely
identifiable from its participating entities, without using the descriptive
attributes.
 We cannot represent multiple dates by multiple relationship instances between
the same instructor and a student, since the relationship instances would not be
uniquely identifiable using only the participating entities
 It is possible to have more than one relationship set involving the same entity
sets (Exercise?)
 Depending upon the number of entities involved, a degree is assigned to
relationships
 For example, if 2 entities are involved, it is said to be Binary relationship, if
3 entities are involved, it is said to be Ternary relationship, and so on.
 Constraints
 Mapping Cardinalities
 Or cardinality ratios, express the number of entities to which another entity
can be associated via a relationship set
 Most useful in describing binary relationship sets

 For a binary relationship set R between entity sets A and B, the mapping
cardinality must be one of the following:
 Participation Constraints
 The participation of an entity set E in a relationship set R is said to be total if every
entity in E participates in at least one relationship in R
 Partial – if only some entities in E participates in relationships in R

 Keys
 The primary key of an entity set allows us to distinguish among the various entities of
the set
 Keys also help to identify relationships uniquely, and thus distinguish relationships
from each other
 Similar mechanism needed to distinguish among the various relationships of a
relationship set.
 Let R be a relationship set involving entity sets E1, E2, . . . , En.
 Let primary key(Ei ) denote the set of attributes that forms the primary key for entity
set Ei
 The composition of the primary key for a relationship set depends on the
set of attributes associated with the relationship set R.
 If the relationship set R has no attributes associated with it, then the set of
attributes
 primary-key (E1) ∪ primary-key (E2) ∪ ·· · ∪ primary-key (En)
describes an individual relationship in set R.
 If the relationship set R has attributes a1, a2, . . . , am associated with it,
then the set of attributes
 primary-key (E1) ∪ primary-key (E2) ∪ · · · ∪ primary-key (En) ∪ {a1, a2, . . . ,
am}
describes an individual relationship in set R.
 The structure of the primary key for the relationship set depends on the
mapping cardinality of the relationship set

Forms a super key for the relationship set


 Removing Redundant Attributes in Entity Sets
 To design a database using the E-R model,
 1st identify the entity set
 Once entity sets are decided, then choose the appropriate attributes

 These attributes represent the various values to be captured in the database

 After choosing the entity sets and their corresponding attributes, relationship sets

among the various entity sets are formed


 These relationship sets may result in a situation where attributes in the various

entity sets are redundant and need to be removed from the original entity sets.
 Consider the entity sets instructor and department:
 Model the fact that each instructor has an associated department using a relationship
set inst_dept relating instructor and department
 Primary key for the department relation where it is redundant in the entity
set instructor and needs to be removed.
 Removing the attribute “dept_name” is rather unintuitive
 When we create a relational schema from the E-R diagram, the attribute
dept_name gets added to the relation instructor, but only if each
instructor has at most one associated department.
 If an instructor has more than one associated department, the relationship
between instructors and departments is recorded in a separate relation
inst_dept
 Treating the connections between “instructor” and “department”
uniformly as a relationship makes the logical relationship explicit
 Helps avoid an early assumption that each instructor is associated with
only one department
 Similarly the student entity set is related to the department entity set
through the relationship set student_dept and thus there is no need for a
dept_name attribute in student.
 A good entity-relationship design does not contain redundant attributes

For our university example, we list the entity


sets and their attributes below, with primary
keys underlined:
 The relationship sets in our design are listed below:
ENTITY-RELATIONSHIP DIAGRAMS

 Basic Structure
 An E-R diagram consists of the following major components:
 Rectangles divided into two parts
 Diamonds represent relationship sets

 Undivided rectangles represent the attributes of a relationship set

 Lines link entity sets to relationship sets

 Dashed lines link attributes of a relationship set to the relationship set

 Double lines indicate total participation of an entity in a relationship set.

 Double diamonds represent identifying relationship sets linked to weak entity sets
 Mapping Cardinality
 One-to-one:
Draw a directed line from the relationship set
advisor to both entity sets instructor and student

 One-to-many:
Draw a directed line from the relationship set
advisor to the entity set instructor and an
undirected line to the entity set student
 Many-to-one:

Draw an undirected line from the relationship set


advisor to the entity set instructor and a directed
line to the entity set student
The limit 0..∗ on the line between
advisor and instructor indicates that
an instructor can have zero or more
students.
 Many-to-many:
If both edges have a maximum
Draw an undirected line from the relationship set value of 1, the relationship is
advisor to both entity sets instructor and student one-to-one

The line between advisor and student has a cardinality constraint of 1..1,
meaning the minimum and the maximum cardinality are both 1
 What is the cardinality limit of 1…* specify? (Exercise)
 Complex Attributes Roles
Roles in E-R diagrams are indicated by
labeling the lines that connect diamonds to
rectangles.
 Nonbinary Relationship Sets
 Weak Entity Sets
 Suppose we create a relationship set sec_course between entity sets section and
course.
 An entity set that does not have sufficient attributes to form a primary key is
termed a weak entity set
 An entity set that has a primary key is termed a strong entity set.
 For a weak entity set to be meaningful, it must be associated with another entity
set, called the identifying or owner entity set.

Discriminator
underlined with
a dashed line
The identifying entity set is said
to own the weak entity set that it
identifies

The relationship associating the weak


entity set with the identifying entity
set is called the identifying
relationship

Every weak entity


must be associated
with an identifying
entity;

The weak entity set is


said to be existence
dependent on the
identifying entity set.
 The identifying relationship set should not have any descriptive attributes, since
any such attributes can instead be associated with the weak entity set.
 The identifying relationship is many-to-one from the weak entity set to the
identifying entity set
 The participation of the weak entity set in the relationship is total
 How to distinguish among all those entities in the weak entity set that depend on
one particular strong entity?
 Discriminator – set of attributes that allow the distinction
 Also called as the partial key of the entity set
 E.g., in ‘section’ weak entity set -> sec_id, year, & semester attributes uniquely
identifies one single section for that course
 Primary key of the weak entity set is formed by the primary key of the
identifying entity set + discriminator
 A weak entity set
 Can participate in relationships other than the identifying relationship
 May participate as owner in an identifying relationship with another weak entity
set
 May have more than one identifying entity set

 E-R diagram for the University Enterprise


 From the diagram followed, find out the components needed for showing the
constraints
– each instructor must have exactly one associated department.
– each instructor can have at most one associated department
 Identify the descriptive attributes present in the relationship set
 Reduction to Relational Schemas
 For each entity set and for each relationship set in the database design, there
is a unique relation schema to which we assign the name of the
corresponding entity set or relationship set
 Both the E-R model & Relational Schema are abstract, logical
representations of real-world enterprises
 Representation of Strong Entity Sets with Simple Attributes
 Let ‘E’ be a strong entity set with only simple descriptive attributes a1, a2, …, an =>
represented by a schema called E with n distinct attributes
 Each tuple corresponds to this E by one entity

 The primary key of the entity set serves as the primary key of the resulting schema

 E.g., entity set ‘student’ from the E-R diagram with 3 attributes: ID, name, tot_cred =>

represented by a schema: student(ID, name, tot_cred)


 Schemas derived from strong entity sets:
classroom (building, room number, capacity)
department (dept name, building, budget)
course (course id, title, credits)
instructor (ID, name, salary)
student (ID, name, tot cred)
 Representation of Strong Entity Sets with Complex Attributes
 We handle composite attributes by creating a separate attribute for each of the

component attributes
 We do not create a separate attribute for the composite attribute itself.

 ‘name’ attribute in ‘instructor’ relation => the schema generated for instructor

contains the attributes first_name, middle_name and last_name; there is no


separate attribute / scheme for name

instructor (ID, first name, middle name, last name, street number, street name,
apt number, city, state, zip code, date of birth)
 Multivalued attributes are treated differently from other attributes
 Attributes in an E-R diagram generally map directly into attributes for the appropriate

relation schemas
 Multivalued attributes, however, are an exception => new relation schemas are created

for these attributes


instructor_phone (ID, phone number)
 Each phone number of an ‘instructor’ is represented as a unique tuple in the relation on

this schema
 For a multivalued attribute M, we create a relation schema R with an attribute A that

corresponds to M and attributes corresponding to the primary key of the entity set or
relationship set of which M is an attribute
 In addition, we create a foreign-key constraint on the relation schema created from the

multivalued attribute, with the attribute generated from the primary key of the entity set
referencing the relation generated from the entity set.
 Derived attributes are not explicitly represented in the relational data model
 Representation of Weak Entity Sets
 Let A be a weak entity set with attributes a1 , a2 , . . . , am. Let B be the strong entity set
1 2

on which A depends
 Let the primary key of B consist of attributes b1, b 2, . . . , bn
2

 We represent the entity set A by a relation schema called A with one attribute for each

member of the set:

 For schemas derived from a weak entity set, the combination of the primary key of the
strong entity set and the discriminator of the weak entity set serves as the primary key
of the schema
 In addition to creating a primary key, we also create a foreign-key constraint on the

relation A, specifying that the attributes b1, b2, . . . , bn reference the primary key of
the relation B.
 Considering the weak entity set ‘section’
 The primary key of the course entity set, on which section depends, is

course_id
 Thus, we represent section by a schema with the following attributes:

section (course_id, sec_id, semester, year)


 The primary key consists of the primary key of the entity set course,

along with the discriminator of section, which is sec_id, semester, and


year.
 Foreign key constraint on the section schema, with the attribute course

id referencing the primary key of the course schema, and the integrity
constraint “on delete cascade”

When a referential-integrity constraint is violated, the normal procedure is to reject the action
that caused the violation
However, a foreign key clause can
specify that if a delete or update action
on the referenced relation violates the
constraint, then, instead of rejecting
the action, the system must take steps
to change the tuple in the referencing
relation to restore the constraint

 on delete cascade - if a delete of a tuple in department results in this


referential-integrity constraint being violated, the system does not reject the
delete
 Instead, the delete “cascades” to the course relation, deleting the tuple that
refers to the department that was deleted.
 Representation of Relationship Sets
 Let R be a relationship set
 Let a1, a2, . . . , am be the set of attributes formed by the union of the primary
keys of each of the entity sets participating in R
 Let the descriptive attributes (if any) of R be b1, b2, . . . , bn.
 We represent this relationship set by a relation schema called R with one
attribute for each member of the set
 {a1, a2, . . . , am} ∪ {b1, b2, . . . , bn}
 How to choose a primary key for a binary relationship set?
 For a binary many-to-many relationship, the union of the primary-key attributes from
the participating entity sets becomes the primary key
 For a binary one-to-one relationship set, the primary key of either entity set can be

chosen as the primary key. The choice can be made arbitrarily.


 For a binary many-to-one or one-to-many relationship set, the primary key of the entity

set on the “many” side of the relationship set serves as the primary key.
 For an n-ary relationship set without any arrows on its edges, the union of the primary
key-attributes from the participating entity sets becomes the primary key
 For an n-ary relationship set with an arrow on one of its edges, the primary keys of the

entity sets not on the “arrow” side of the relationship set serve as the primary key for
the schema
 How to create foreign-key constraints on the relation schema R?
 For each entity set Ei related to relationship set R, we create a foreign-key constraint
from relation schema R, with the attributes of R that were derived from primary-key
attributes of Ei referencing the primary key of the relation schema representing Ei .
 Consider the relationship set advisor in the E-R diagram

 Example:
 Since the relationship set has no attributes, the advisor schema has two attributes, the
primary keys of instructor and student.
 Since both attributes have the same name, we rename them i_ID and s_ID

 Since the advisor relationship set is many-to-one from student to instructor the primary

key for the advisor relation schema is s_ID.


 Two foreign-key constraints on the advisor relation with attribute i_ID referencing the

primary key of instructor and attribute s_ID referencing the primary key of student.
 The schemas derived from a relationship set are depicted as follows:

teaches (ID, course id, sec id, semester, year)


takes (ID, course id, sec id, semester, year, grade)
prereq (course id, prereq id)
advisor (s ID, i ID)
sec course (course id, sec id, semester, year)
sec time slot (course id, sec id, semester, year, time slot id)
sec class (course id, sec id, semester, year, building, room number)
inst_dept (ID, dept name)
stud dept (ID, dept name)
course dept (course id, dept name)
 Redundancy of Schemas
 Relationshipset linking a weak entity set to the corresponding strong entity
set => many-to-one relationships; no descriptive attributes present

 Primary key for weak entity set section is {course_id, sec_id, semester, year}
 Since sec course has no descriptive attributes, the sec course schema has
Includes
attributes course_id, sec_id, semester, andprimary
year. key of strong entity set
 The schema for the entity set section includes the attributes course_id, sec
_id, semester, and year (among others).
 Every (course_id, sec_id, semester, year) combination in a
sec_ course relation would also be present in the relation on
schema section, and vice versa.
 Combination of Schemas
 Consider a many-to-one relationship set AB from entity set A to entity set B.
 instructor (ID, name, salary) => A
 department (dept_name, building, budget) => B
 inst_dept(ID, dept_name)
 inst_dept and instructor schemas can be combined and now the
instructor (ID, name, dept_name, salary) has these attributes.
Similarly, we can combine the schemas for the relationship sets
 stud_dept
 course_dept
 sec_class
 sec_time_slot
In the case of one-to-one relationships, the
relation schema for the relationship set can be
combined with the schemas for either of the
entity sets

We can combine schemas even if the


participation is partial by using null values.

 Entity-Relationship Design Issues


CREATE TABLE sales (
    customer_id NUMBER,
    product_id NUMBER,
    order_date DATE NOT NULL,
    total NUMBER(9,2) DEFAULT 0 NOT NULL,
    PRIMARY KEY(customer_id,
                 product_id,
                 order_date)
);

You might also like