Professional Documents
Culture Documents
Codd proposed the relational data model in 1970. A Prototype research project
was developed at IBM UC-Berkeley by mid-1970. Today, Most database (DBMS)
products like IBM’s DB2 family, Informix, Oracle, Sybase, Microsoft’s Access and
SQLServer, FoxBase, and Paradox depend on the relational data model.
In the relational data model each relation is a table with row and column. The
major advantage of the relational model is simple representation and complex queries can
be expressed easily
The main construct for representing data in the relational model is a relation. A
relation consists of a relation schema and a relation instance.
The relation instance is a table, and the relation schema describes the column
heads for the table.
The schema specifies the relation’s name, the name of each field (or column,
or attribute), and the domain of each field. A domain is referred to in a relation schema
by the domain name and has a set of associated values.
Students(sid: string, name: string, login: string, age: integer, gpa: real)
From the above syntax, the field named sid has a domain named string. The
set of values associated with domain string is the set of all character strings.
Field names
sid name login age gpa
50000 Dave dave@cs 19 3.3
53666 Jones jones@cs 18 3.4
TUPLES (RECORDS, ROWS) 53688 Smith smith@ee 18 3.2
53650 Smith smith@math 19 3.8
53831 Madayan madayan@music 11 1.8
53832 Guldu guldu@music 12 2.0
The domain constraints specify a condition, that all values for the column must be
drawn from the domain associated with that column.
We can say that the relation instance satisfies the domain constraints in the
relation schema.
SQL language uses tables to represent relations. The subset of SQL that supports
the creation, deletion, and modification of tables is called the Data Definition Language
(DDL).
The CREATE TABLE statement is used to define a new table. For example,
create the Students relation, we can use the following statement:
3
CREATE TABLE Students (
sid CHAR(20),
name CHAR(30),
login CHAR(20),
age INTEGER,
gpa REAL );
Tuples are inserted using the INSERT command. We can insert a single tuple into
the Students table as follows:
We can optionally omit the list of column names in the INTO clause and list the
values in the appropriate order.
We can delete tuples using the DELETE command. We can delete all Students
tuples with name equal to Smith using the command:
DELETE
FROM Students S
WHERE S.name = ‘Smith’;
We can modify the column values in an existing row using the UPDATE
command. For example, we can increment the age and decrement the gpa of the student
with sid 53688:
UPDATE Students S
SET S.age = S.age + 1, S.gpa = S.gpa - 1
WHERE S.sid = 53688;
4
2) INTEGRITY CONSTRAINTS OVER RELATIONS
a) Key Constraints
A key constraint is a statement that a certain minimal subset of the fields of a relation
is a unique identifier for a tuple.
The set {sid, name} is an example of a superkey, which is a set of fields that contains a
key.
A relation can have several candidate keys like for student roll number and email id.
Out of all the available candidate keys, a database designer can identify a primary
key.
In SQL we can declare that a subset of the columns of a table constitute a key
by using the UNIQUE constraint. At most one of these ‘candidate’ keys can be
declared to be a primary key, using the PRIMARY KEY constraint.
5
Let us revisit our example table definition and specify key information:
This definition says that sid is the primary key and that the combination of name
and age is also a key. The definition of the primary key also illustrates how we can
name a constraint by preceding it with CONSTRAINT constraint-name. If the constraint
is violated, the constraint name is returned and can be used to identify the error.
Sometimes there can be a relationship between two tables. If one table is modified, the
other table must also be modified to keep data consistent.
To ensure that only Students in the above relation is enrolled, any value
that appears in sid of enrolled must appear in field of Students. The sid of
enrolled is called as foreign key.
However, every sid value that appears in the instance of the Enrolled table appears in the
primary key column of a row in the Students table.
6
c) Domain Integrity-
d) General Constraints
ICs are specified when a relation is created and enforced when a relation is modified.
The impact of the domain, PRIMARY KEY, and UNIQUE constraints is straightforward:
if an insert, delete, or update command causes a violation, it is rejected.
Consider the previous student instance, The following insertion violates the primary
key constraint because there is already a tuple with the sid 53688, and it will be rejected
by the DBMS:
INSERT
INTO Students (sid, name, login, age, gpa)
VALUES (53688, ‘Mike’, ‘mike@ee’, 17, 3.4);
The following insertion violates the constraint that the primary key cannot contain
null:
INSERT
INTO Students (sid, name, login, age, gpa)
VALUES (null, ‘Mike’, ‘mike@ee’, 17, 3.4)
When we violate a domain constraint then all command will be reject. Deletion
does not cause a violation of domain, primary key or unique constraints. However, an
update can cause violations, similar to an insertion:
This update violates the primary key constraint because there is already a tuple with
sid 50000.
Consider previous Enrolled relation deletion of tuples does not cause violation of
referential integrity of tuples in Enrolled could.
For example, the following insertion is illegal because there is no student tuple with sid
51111:
INSERT
INTO Enrolled (cid, grade, sid) VALUES (‘Hindi101’, ‘B’, 51111);
8
On the other hand, insertions of Student tuples may not cause violation but
deletion of student tuple can cause violation because the key may be used in other
relations.
SQL provides several alternative ways to handle foreign key violations. We must
consider three basic questions:
1. What should we do if an Enrolled row is inserted, with a sid column value that does
not appear in any row of the Students table?
• Delete all Enrolled rows that refer to the deleted Students row.
• Disallow the deletion of the Students row if an Enrolled row refers to it.
• Set the sid column to the sid of some (existing) ‘default’ student, for every
Enrolled row that refers to the deleted Students row.
• For every Enrolled row that refers to it, set the sid column to null. In our example,
z l this option conflicts with the fact that sid is part of the primary key of
Enrolled and therefore cannot be set to null. Thus, we are limited to the first three
options in our example, although this fourth option (setting the foreign key to
null) is available in the general case.
SQL allows us to choose any of the four options on DELETE and UPDATE.
For example, we can specify that when a Students row is deleted, all Enrolled
rows that refer to it are to be deleted as well, but that when the sid column of a Students
row is modified, this update is to be rejected if an Enrolled row refers to the modified
Students row:
9
When action is set to NO ACTION, the update will be rejected and when option is
cascade, if a students row is deleted all rows od enrolled refereeing to students will be
deleted.
If we want to update all enrolled rows we can use “NO UPDATE CASCADE”.
For example, When a student row is deleted, we can give ‘default’ student value
by using “NO DELETE SET DEFAULT”.
SQL also allows to use of null as default value by specifying “NO DELETE SET
NULL”.
A relational database query (query, for short) is a question about the data, and the
answer consists of a new relation containing the result. language is a specialized
language for writing queries.
For example, we might want to find all students younger than 18 or all students enrolled
in Reggae203. A query
SQL is the most popular commercial query language for a relational DBMS. Some of the
SQL queries on previous table cab be given as follows.
10
sid name login age gpa
50000 Dave dave@cs 19 3.3
53666 Jones jones@cs 18 3.4
53688 Smith smith@ee 18 3.2
53650 Smith smith@math 19 3.8
53831 Madayan madayan@music 11 1.8
53832 Guldu guldu@music 12 2
From the above table we can retrieve the students younger than 18, with following query.
SELECT *
FROM Students
WHERE age < 18;
• * means we want to retain all fields. The result of above query is as follows:
If we give the condition in where clause as age = name, it does not make sense
because it compares an integer value with a string value
We can also write a query to extract only subset fields. For example, if want to
retrieve name, login of students younger than 18 the query can be written as
Figure 3.7 shows the answer to this query; it is obtained by applying the selection to
the instance S1 of Students (to get the relation shown in Figure 3.6), followed by removing
unwanted fields. Note that the order in which we perform these operations does matter—if
we remove unwanted fields first, we cannot check the condition S.age
< 18, which involves one of those fields.
We can also combine information in the Students and Enrolled relations. If we want to
obtain the names of all students who obtained an A and the id of the course in which they
got an A, we could write the following query:
11
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = ‘A’
Converting an entity set to a table is very simple. Each attribute of Entity set be comes
column/attribute of the table.
name
ssn lot
Employees
The attributes are ssn, name, lot. Consider there are employee entities then this
entity set will be converted to table form as shown below
The following SQL statement captures the preceding information, including the
domain constraints and key information:
12
The set of nondescriptive attributes is a super key for the relation. If there are no key
constraints this set of attributes is a candidate key.
since
name dname
All the available information about the Works_In2 table is captured by the following SQL
definition:
13
CREATE TABLE Works In2 ( ssn CHAR(11),
did INTEGER,
address CHAR(20), since DATE,
PRIMARY KEY (ssn, did, address),
FOREIGN KEY (ssn) REFERENCES Employees,
FOREIGN KEY (address) REFERENCES Locations,
FOREIGN KEY (did) REFERENCES Departments );
Note that address, did, and ssn fields cannot take on null values and should be
unique since they are primary keys.
name
ssn lot
Employees
supervisor subordinate
Reports_To
Observe that we need to explicitly name the referenced field of Employees because the
field name differs from the name(s) of the referring field(s).
If a relationship set involves n entity sets and some m of them are linked via arrows
in the ER diagram, the key for any one of these m entity sets constitutes a key for the
relation to which the relationship set is mapped. Thus we have m candidate keys, and one
of these should be designated as the primary key.
14
since
name dname
The table for manages relationship consists of attributes ssn, did, since. did as
primary key and ssn, did becomes foreign key
Modify the table which corresponds to primary key in above relation(i.e; did of
department) by adding key field of employee and descriptive attribute “since”.
Since did should be unique and ssn can be same i.e; one department should have
almost one employee as manager and one manager can manage many department.
15
CREATE TABLE Dept Mgr ( did INTEGER,
dname CHAR(20),
budget REAL,
ssn CHAR(11),
since DATE,
PRIMARY KEY (did),
FOREIGN KEY (ssn) REFERENCES Employees );
Consider the ER diagram below Figure, which shows two relationship sets, Manages and
Works In.
since
name dname
Works_In
since
A weak entity set always participates in a one-to-many binary relationship and has a
key constraint and total participation at a weak entity.
name
cost pname
ssn lot age
The weak entity can’t be uniquely identified by its partial key, so it should be
combined with the primary key of owning entity to form the primary key of the
relationship. This relation can be written as
The cascade option ensures that information about the policy is deleted if
the employee tuple is deleted.
17
6) INTRODUCTION TO VIEWS
A view is a table whose rows are not explicitly stored in the database but are computed
as needed from a view definition. Consider the Students and Enrolled relations.
Suppose we want to find out sid and name of student who got grade B in a course
along with course name. For this can define a view as follows.
The view B-Students has three fields called name, sid, and course with the same
domains as the fields name and sid in Students and course_name in Enrolled. These are
optional because if we omit then they will be inherited.
The tables from which view is created is called as base table. We can perform
query on views as he do on table.
Views provide logical data independence. Because even the student of the table is
changed, we can the same view if required.
Views also provide security. We can create with only necessary data, the
18
unwanted data hide in the table.
Update on views:
An update on the view is possible by updating its base table. For example,
consider the following view updates are simple on single base table views.
sid gpa
1 7.5
4 8.5
5 8
Good_Student
We can modify the gpa of a Good_Stident row by modifying row of student table.
We can delete a row in view by deleting corresponding row of a base table.
We can insert rows into view by inserting rows in base table, and keeping null for
columns that does not corresponding to views in above table Sname, age. But we can’t
keep a null value for the primary key. If we do this insertion will be rejected.
Updates on views with more than base table is problematic consider a view on
student and Enrolled table.
Now consider we want to delete a row(1, Ramu, DBMS, A), how to do this?
This can be accomplished by either deleting row(1, Ramu, 25, 7.5) from Student
base table, if we do this then other row in view(1, Ramu, OS, A) will also deleted.
➢ When a table is deleted by using Drop table the view will be deleted.
Query languages are specialized languages for asking questions, or queries, that
in- volve the data in a database. Relational algebra and calculus are two formal query
language for model.
In relational calculus, a query produces the answer without specifying how the
answer is computed. This nonprocedural style of querying is called as declarative.
1) PRELIMINARIES
The inputs and outputs of a query are relations. A query is evaluated using
instances of each input relation and it produces an instance of the output relation.
The key fields are underlined, and the domain of each field is listed after the
field name. Thus sid is the key for Sailors, bid is the key for Boats, and all three fields
together form the key for Reserves.
Instance S1 of Sailors
Sid sname rating age sid bid day
22 Dustin 7 45.0 22 101 10/10/96
31 Lubber 8 55.5 58 103 11/12/96
58 Rusty 10 35.0
Instance S2 of Sailors Instance R1 of Reserves
Sid sname rating age
28 yuppy 9 35.0
31 Lubber 8 55.5
44 guppy 5 35.0
58 Rusty 10 35.0
2) RELATIONAL ALGEBRA
Relational algebra contains operator to select rows from a relation and to select
columns from relation. The first one is called as selection operator(σ) and another
one is called as projection operator(π).
For example, consider the instance S2 of Sailors, we can retrieve the rows of
expert Sailors by using “σ” operators as follows.
σrating>8(S2)
The selection operator “σ” specifies the tuples to return through a selection
condition.
πsname,rating (S2)
sname rating
yuppy 9
Lubber 8
guppy 5
Rusty 10
πsname,rating(S2)
Suppose that we wanted to find out only the ages of sailors. The expression
πage(S2)
➔ It returns
age
35.0
55.5
Even though there are four tuples, two of them have same value 35.0. The
23
elimination of duplicates is done, so that the relations are always sets of tuples.
Since the output of a query is also a relation, we can write the query as follows.
πsname,rating (σrating>8(S2))
sname rating
yuppy 9
Rusty 10
πsname,rating (σrating>8(S2))
b) Set Operations
S1 ∩ S2
−
➢ Set-difference: R - S returns a relation instance containing all tuples
that occur in R but not in S. The relations R and S must be union-
compatible, and the schema of the result is defined to be identical to the
schema of R.
The result of a relational algebra expression inherits fields names from its
argument (input) relation. But in same case name conflicts may arise. For example
in previous S1 X R1. So it convenient to give a new name to output relation.
The renaming operator (ρ) is used for change exist name. The expression
̅
ρ(R(𝐹 ), E) takes an arbitrary relational algebra expression E and returns an
instance of a (new) relation called R.
R contains the same tuples as the result of E, and has the same schema as
E, but some fields are renamed. 𝐹̅ contains the renaming list, it contains the terms
having the form old name -> new name or position -> new name.
d) Joins
The join operation is one of the most useful operations in relational algebra and is
the most commonly used way to combine information from two or more relations.
i. Conditional Join
The most general form of join is conditional join. It accepts a condition c and a
pair
relation instance as arguments and return a relation instance.
ii. Equijoin
e) Division
The division operator is useful for expressing certain kinds of queries, for example:
“Find the names of sailors who have reserved all boats.”
For example, consider two relation instances A with two fields x,y and B with only
one field y, with same domain of A. We define division as A/B as the set of x values, such
that for every y value in B, there is a tuple(x, y) in A.
Or
For each x value in A, there is a set of y value in tuples of A with that x value. If the
set contains (all y values in B), then x is the result if A/B.
Another way to understand division is as follows. For each x value in (the first column
of) A, consider the set of y values that appear in (the second field of) tuples of A with that
x value. If this set contains (all y values in) B, the x value is in the result of A/B.
An analogy with integer division may also help to understand division. For integers A
and B, A/B is the largest integer Q such that Q ∗ B ≤ A. For relation instances A and B,
A/B is the largest relation instance Q such that Q × B ⊆ A.
πx((πx(A) × B) − A)
28
B3 pno A/B3
p1 sno
p2 s1
p4
3) RELATIONAL CALCULUS