
CHAPTER ONE

The Database Management Systems


I. What is a database:

A database is a collection of information organized in such a way that it can be easily accessed and updated. Organizations (hospitals, banks, schools, etc.) use it as a method of storing and managing information.

Building a database consists of grouping the data into "homogeneous" packages (entities, or tables). Each entity (table) is composed of a number of elementary data items (attributes or fields), and the repetition (redundancy) of attributes must be kept to a minimum.

II. The database management systems:

i. Definition :
A database management system (DBMS) is a software system designed for storing and sharing information in a database. It also allows recording, retrieving, editing, sorting, transforming, and printing the information in the database.

Examples: Oracle Database (1979), Microsoft Access (1992), MySQL (1995)…

ii. Main functions of a DBMS:

• Data Definition Language (DDL): description of the data
(attributes) in the entities and of the relations between the entities.
• Data Manipulation Language (DML), which in relational systems is
provided by the Structured Query Language (SQL):
i. Inserting data.
ii. Updating data.
iii. Removing data.
iv. Searching data (query).
• Data Control Language (DCL):
i. Control of data integrity.
ii. Transaction management and security.

iii. Characteristics of databases and DBMS objectives:
a. Data independence:
In a database, the way the information is presented to the users differs from the way the information is logically organized, which in turn differs from the way the information is stored in files.

This architecture has 3 views (levels), and each of the three views can be changed separately.

Data independence allows any of the three views to be modified without the need to modify the other views:
1- User views
2- Conceptual model
3- Physical model

1- User views: an external view, for each group of users, on a subset of the database.

2- Conceptual model and logical data independence:

The conceptual model is the model of the logical organization of the information stored in the database; it is an overview of all the stored information in entity-association form.

Logical independence is the ability to change the logical model without modifying the external views or the application programs.

3- Physical model and physical data independence:

The physical model contains the characteristics of the structures used for the permanent storage of information as records in files. This includes the space reserved for each piece of information and how the information is stored.

Physical independence is the ability to change the physical model without modifying the logical model.
b. Data redundancy control:
Data redundancy is the unnecessary duplication of data. Data must be represented with the least possible redundancy, which decreases the possibility of errors.

Redundancy can cause several problems, such as:

- Wasted space.
- Inconsistency of data.
- Searching the data may become slow.
- Waste of time (data entry, update ...) ...

Exp: Table Material (the professor columns hold redundant data)

Code mat  Desc        Coeff  Nbh/W  Prof name  Date of birth  Address
Inf1      Algo        10     4      Wafaa      12/5/1985      Nabatieh
Inf2      VB.net      8      4      Danielle   7/1/1980       Nabatieh
Acc1      Analytical  12     2      Imad       3/2/1989       Saida
Inf3      Database    8      2      Wafaa      12/5/1985      Nabatié
Acc1      Finance     8      2      Imad       3/2/1989       Saida

The solution is to divide the data into related tables as follows:


Code mat  Desc        Coeff  Nbh/W  Prof Id
Inf1      Algo        10     4      1
Inf2      VB.net      8      4      3
Acc1      Analytical  12     2      2
Inf3      Database    8      2      1
Acc1      Finance     8      2      2

Prof Id  Prof name  Date of birth  Address
1        Wafaa      12/5/1985      Nabatieh
2        Imad       3/2/1989       Saida
3        Danielle   7/1/1980       Nabatieh
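
The split into two related tables can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and the new address 'Beirut' are assumptions made for the illustration, and only three of the subject rows are loaded. The point is that a professor's address is stored once, so one update is enough, and a join rebuilds the wide table when needed.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (
        prof_id       INTEGER PRIMARY KEY,
        prof_name     TEXT,
        date_of_birth TEXT,
        address       TEXT
    );
    CREATE TABLE material (
        code_mat TEXT,
        descr    TEXT,
        coeff    INTEGER,
        nbh_w    INTEGER,
        prof_id  INTEGER REFERENCES professor(prof_id)
    );
""")
conn.executemany("INSERT INTO professor VALUES (?, ?, ?, ?)", [
    (1, "Wafaa", "12/5/1985", "Nabatieh"),
    (2, "Imad", "3/2/1989", "Saida"),
    (3, "Danielle", "7/1/1980", "Nabatieh"),
])
conn.executemany("INSERT INTO material VALUES (?, ?, ?, ?, ?)", [
    ("Inf1", "Algo", 10, 4, 1),
    ("Inf2", "VB.net", 8, 4, 3),
    ("Inf3", "Database", 8, 2, 1),
])

# The address is stored once per professor, so one UPDATE changes it everywhere.
# 'Beirut' is a made-up value for the demonstration.
conn.execute("UPDATE professor SET address = 'Beirut' WHERE prof_id = 1")

# A join rebuilds the original wide table on demand.
rows = conn.execute("""
    SELECT m.code_mat, m.descr, p.prof_name, p.address
    FROM material AS m JOIN professor AS p ON m.prof_id = p.prof_id
    ORDER BY m.code_mat
""").fetchall()
```

With the single wide table, the same change would have required editing every row of the professor, with the risk of spelling the address differently in one of them.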

c. Data consistency (coherence):

Some data may depend on other data to define a consistent state of the database.

Exp: The quantity sold of a certain product must be less than or equal to the quantity in stock of the same product.
d. Control of integrity constraints:
The data are subject to a number of integrity constraints that define a consistent state of the database.

Exp:
1- The age of a person must be a positive integer. The DBMS checks that the entered data respect this rule.

2- Table Country [c_id, description, surface, capital, official lang]

• Each country can have only one capital, and a capital belongs to only one country.
• Each country has only one official language. An official language can be shared by several countries.

When these data are used, the system continuously verifies that these constraints are respected, as follows:

• Domain integrity: the domain of an attribute is the set of values it can take, and every stored value must belong to it.

• Referential integrity: controls the coherence of the data by checking the join constraints between the tables (each value of the secondary (foreign) key has to be taken from the values of the primary key).

• Relation integrity: ensures that each row of a relational table is identified by a unique value (primary key, unique value).
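
The three kinds of integrity can be observed with a small sketch in Python's sqlite3 module. The `language` table, the column names and the sample rows are assumptions added for the illustration (the notes only give the Country schema). Each rejected statement raises `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE language (
        lang_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE country (
        c_id          INTEGER PRIMARY KEY,                  -- relation integrity
        description   TEXT NOT NULL,
        surface       REAL CHECK (surface > 0),             -- domain integrity
        capital       TEXT UNIQUE,                          -- a capital belongs to one country
        official_lang INTEGER REFERENCES language(lang_id)  -- referential integrity
    );
    INSERT INTO language VALUES (1, 'Arabic');
    INSERT INTO country VALUES (1, 'Lebanon', 10452, 'Beirut', 1);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Hypothetical rows, each breaking exactly one rule.
domain_violation = rejected("INSERT INTO country VALUES (2, 'X', -5, 'Y', 1)")
referential_violation = rejected("INSERT INTO country VALUES (3, 'X', 100, 'Z', 99)")
key_violation = rejected("INSERT INTO country VALUES (1, 'X', 100, 'W', 1)")
capital_violation = rejected("INSERT INTO country VALUES (4, 'X', 100, 'Beirut', 1)")
```

All four flags end up true: the negative surface is outside the domain, language 99 does not exist, the identifier 1 is already used, and the capital is already assigned to another country.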

e. Data sharing (Data portability):

Data sharing allows two or more users to edit the same data at the same time while still guaranteeing consistent query results.

f. Protection and Data security:

The data must be protected against unauthorized access. For this reason, each user must be associated with access rights to the data.

i. Managing users and their passwords:
For security reasons, each user accesses the data with a username and a secret password. Access is organized by creating users, roles, and permissions (privileges).

ii. Access control and authorization:

A DBMS must be able to prohibit certain persons from performing certain operations on part of the database or on the entire database.

iii. Backup and restore data:

A backup is the operation of duplicating and securing the data contained in a database (BACKUP, RECOVERY).

In computing, data recovery (or data restoration) consists of retrieving data lost because of damage and restoring the DB to a consistent state, identical or as close as possible to the one preceding the damage, using recovery procedures (COMMIT, ROLLBACK, SAVEPOINT).
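
As an illustration of COMMIT and ROLLBACK, here is a minimal sketch in Python's sqlite3 module. It does not show a full backup procedure, only how a transaction returns the database to its last consistent state. The `product` table, the `sell` helper and the quantities are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "prod_id INTEGER PRIMARY KEY, "
    "qty_stock INTEGER CHECK (qty_stock >= 0))"
)
conn.execute("INSERT INTO product VALUES (1, 10)")
conn.commit()  # COMMIT: the initial state becomes permanent

def sell(prod_id, qty):
    """Decrease the stock; undo the change if it would break a constraint."""
    try:
        conn.execute(
            "UPDATE product SET qty_stock = qty_stock - ? WHERE prod_id = ?",
            (qty, prod_id),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # ROLLBACK: return to the last committed state
        return False

ok_sale = sell(1, 4)    # 10 - 4 = 6, accepted
bad_sale = sell(1, 50)  # would give a negative stock, rejected
stock = conn.execute(
    "SELECT qty_stock FROM product WHERE prod_id = 1"
).fetchone()[0]
```

After the rejected sale, the stock still holds the value left by the last successful commit.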

iv. Process of designing a database:

Real world
  → (Analysis) → Specifications of the DB
  → (Conception) → Conceptual model
  → (Transformation into logical model) → Logical model
  → (Physical conception) → Internal model

The steps up to the conceptual model are independent from a DBMS; the logical and internal models are specific to a DBMS.

III. Database Administrator:

The database administrator is a person or team responsible for the proper functioning of the database servers, in terms of database design, validation testing, operation, protection, and control of users.

IV. Classification of databases:

a. Centralized / Distributed Databases:


1. Centralized database:
A centralized database is a set of data stored in a single database that exists in one place.

2. Distributed database:
A distributed database is a collection of logically related databases that are physically distributed over a network.

b. Homogeneous / heterogeneous databases:

1. Homogeneous database:
A homogeneous database is a distributed database in which all sites are managed by the same database management system.

2. Heterogeneous database:
A heterogeneous database is a distributed database managed by several types of DBMS.

CHAPTER TWO

The Relational Data Model


I- The concepts of relational model:

The relational model is a logical model where data is stored in related tables.

II- Attributes, Domain and relationships:

1. Attribute:
An attribute is an identifier (a name) that describes a piece of information stored in a database.

Exp: the date of birth of a person, the person's name, the social security number.

a. Composed / atomic attributes:

• An atomic attribute is indivisible and stores a single piece of information. Exp: Firstname, Lastname, unit price.

• A composed attribute is formed of several elementary attributes. Exp: Address, composed of City, Street and Building Nb.

b. Existing / derived attributes:

• A derived attribute is an attribute that can be calculated at any time from the other attributes, so that its result is always available.
Exp: in Person (Person_Id, First name, Last name, Date of birth, Age), Age is a derived attribute:
Age = Current date – Date of birth

• An existing attribute stores a permanent elementary value which can be used in a calculated attribute. Exp: Unit_Price, Quantity.

2. Set of attribute values (domain)

A domain is a set of values characterized by a name. As a set, a domain can be defined in extension (by listing its values).

For example: Level = {"BT", "TS"}
YEAR = {1, 2, 3}

3. A relation:

A relation is represented as a two-dimensional array in which the n attributes correspond to the titles of the n columns. A relation is sometimes called a "table" or an "entity".

Exp: Relation Professors (Prof_id, Prof_name, DB, certificate, major, …)

a. Degree of a relation: The degree of a relation is the number of its attributes.
b. The tuple or occurrence of a relation: a tuple is an ordered collection of values of the n attributes relating to the same object. It corresponds to one record or row in the relation.

Exp: 1, Walid Fares, 2/8/1975, LT, Accounting… (Tuple 1)
2, Faten Saleh, 12/4/1960, BS, Informatics… (Tuple 2)



c. Intension and extension of a relation: The description of the attributes of a relation is called the schema of the relation, or the intension of the relation.

The set of the occurrences of a relation is called the extension of the relation.

Exp:
Schema (intension of the relation):
Professors (Prof_id, Prof_name, DB, certificate, major, …)

Extension (occurrences of the relation):
1, Walid Fares, 2/8/1975, LT, Accounting…
2, Faten Saleh, 12/4/1960, BS, Informatics…
3, Daniel Fakih, 9/2/1980, LT, Informatics…

d. Cardinality: The cardinality of a relation is its number of occurrences (tuples).

Prof_id  Prof_name     DB         Certificate  Major
1        Walid Fares   2/8/1975   LT           Accounting
2        Faten Saleh   12/4/1960  BS           Informatics
3        Daniel Fakih  9/2/1980   LT           Informatics

Cardinality = 3

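
As a small sketch of these definitions, the Professors relation can be held in plain Python structures. Only the five attributes shown in the table are used (the notes indicate with "…" that there may be more), and the variable names are illustrative.

```python
# Intension (schema): the ordered attribute names of the relation.
schema = ("Prof_id", "Prof_name", "DB", "Certificate", "Major")

# Extension: the tuples currently in the relation.
professors = [
    (1, "Walid Fares", "2/8/1975", "LT", "Accounting"),
    (2, "Faten Saleh", "12/4/1960", "BS", "Informatics"),
    (3, "Daniel Fakih", "9/2/1980", "LT", "Informatics"),
]

degree = len(schema)           # number of attributes
cardinality = len(professors)  # number of tuples
```

Adding or removing a tuple changes the cardinality but not the degree; adding a column changes the degree.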
III- The relational constraint:

1. Key constraints:

a. Candidate key constraint:

A candidate key is a minimal attribute or set of attributes of a relation whose value(s) identify a single element (tuple) of the relation.

The primary key is the key chosen among the candidate keys.

Exp:
- For a car, a candidate key could be its registration number or its serial number, which are supposed to be unique.

- For a person, the social security number is a candidate key. The email address can be another (if every person has an email address).

- For books, the ISBN is a candidate key.

b. The primary key of a relation:

The key of a relation, or identifier, is a non-null attribute that identifies a unique tuple (record) of this relation (it is the selected candidate key).

The values of the key attribute must be distinct (you cannot find two tuples that have the same key value).

Exp 1:

1, Walid Fares, 2/8/1975, LT, Accounting…
2, Faten Saleh, 12/4/1960, BS, Informatics…
(the identifier values 1 and 2 are distinct)

• Each tuple is identified by a unique identifier value.

Sometimes two or more attributes together (composed key) are needed to identify each tuple. The value of this kind of key (several values together) must still be unique.

Exp 2: the key is formed by Code student, Code subject and Date exam; the 3 values together must be found only once. Grade is an ordinary attribute.

Code student  Code subject  Date exam   Grade
1             5             12/12/2012  15
1             2             12/12/2012  17
2             5             25/2/2013   10

Exp 3: the table "Customers" has a simple key (one column, C_Id) while the table "Purchase" has a composed key (several columns form the key: (C_Id, prod_id, Date_purch)).

c. Constraint of secondary key (foreign key):

We call "foreign key" an attribute (in a secondary table) used to establish and maintain a link with a primary key or unique key in another table (the primary table). A foreign key can contain null or duplicated values.

Exp: the attribute C_Id of the relation "Purchase" (secondary table) must take its value from the values of the attribute C_Id of the relation "Customers" (primary table).

d. The super key:

A super key is a set of attributes that uniquely identifies each row but is not necessarily minimal: it may contain additional attribute(s).
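
The difference between a key and a super key can be checked on sample data with a short Python sketch. The function name `is_superkey` and the dictionary field names are assumptions for the illustration, and the rows are those of Exp 2. Note that a sample can only refute a key: uniqueness in three rows does not prove that a set of attributes is a key of the relation in general.

```python
def is_superkey(rows, attrs):
    """True if the values of `attrs` are distinct across all rows."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

exams = [
    {"student": 1, "subject": 5, "date": "12/12/2012", "grade": 15},
    {"student": 1, "subject": 2, "date": "12/12/2012", "grade": 17},
    {"student": 2, "subject": 5, "date": "25/2/2013", "grade": 10},
]

student_alone = is_superkey(exams, ["student"])                          # student 1 appears twice
composed_key = is_superkey(exams, ["student", "subject", "date"])        # the key of Exp 2
with_extra = is_superkey(exams, ["student", "subject", "date", "grade"]) # key plus one attribute
```

The last set still identifies every row, but it is only a super key because the grade attribute can be removed without losing uniqueness.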
2. Domain constraint:
The values assigned to a field of a relation must belong to the domain of definition of the attribute. This constraint is used to define:
• The type of the attribute.
• The field size.
• Limit values (check, between ...)
• The default value.
• Null / Not Null ...

Exp: Consider an authors' association in which each member must be at least 18 and at most 60 years old. If you build the relation MEMBER (nummembre, membername, prenommembre, agemembre, adressemembre), then in the agemembre attribute you cannot enter an age below 18 or above 60.

3. Referential integrity constraint:

• A foreign key must reference a primary key.
• A parent record that is referenced by child record(s) cannot be deleted (unless Delete Cascade / Delete Set Null is specified).
• The value of a primary key referenced by a foreign key cannot be updated (unless Update Cascade is specified).

Exp: a purchase transaction requires the presence of a customer. That is to say, a record in the PURCHASE table must match a record in the CUSTOMERS table such that PURCHASE.C_Id = CUSTOMERS.C_Id.

Customer
C_Id  FN  LN  DB  Nat_Id
…     …   …   …   …

Products
Prod_Id  Desc  Up  Qty_stock
…        …     …   …

Purchase
C_Id  Prod_Id  Purchased-qty
…     …        …

IV- Relational algebra:

1. The basic operations of relational algebra

Exp: suppose that we have the following two tables:


Customer (C_Id, FN, LN, DB, Nat_Id)
Nationality (Nat_Id, Desc)

Customer
C_Id  FN     LN     DB          Nat_id
1     Wafaa  Saleh  5/6/1992    1
2     Samir  Hanna  9/8/1993    2
3     Ali    Ahmad  9/8/1993    1
4     Faten  Makki  15/2/1992   1
5     Ali    Makki  10/10/1993  1

Nationality
Nat_id  Desc
1       Lebanese
2       Syrian
3       Egyptian

a. The PROJECT operator (unary): extracts attributes (columns) from a relation. Given a relation R that contains an attribute X, PROJECT (R, X) keeps only the column X of R.

Syntax:
RelName = PROJECT (tablename, attr1, attr2, attr3, …)

Exp: display the first and last names of the customers:

R1= PROJECT (CUSTOMER, fn, ln)

R1
FN LN
Wafaa Saleh
Samir Hanna
Ali Ahmad
Faten Makki
Ali Makki

b. The selection (Restrict): SELECT extracts from a relation the tuples that respect a given condition (criterion).

Syntax:
RelName = SELECT (tablename, condition)

Exp: display the customers whose number is greater than 3:
R2= SELECT (CUSTOMER, C_Id>3)

R2
C_Id  FN     LN     DB          Nat_id
4     Faten  Makki  15/2/1992   1
5     Ali    Makki  10/10/1993  1
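
The two unary operators can be sketched in Python by representing a relation as a list of dictionaries. The function names `project` and `select` mirror the notation of these notes; they are not part of any library, and the representation is only one possible choice.

```python
CUSTOMER = [
    {"C_Id": 1, "FN": "Wafaa", "LN": "Saleh", "DB": "5/6/1992", "Nat_id": 1},
    {"C_Id": 2, "FN": "Samir", "LN": "Hanna", "DB": "9/8/1993", "Nat_id": 2},
    {"C_Id": 3, "FN": "Ali", "LN": "Ahmad", "DB": "9/8/1993", "Nat_id": 1},
    {"C_Id": 4, "FN": "Faten", "LN": "Makki", "DB": "15/2/1992", "Nat_id": 1},
    {"C_Id": 5, "FN": "Ali", "LN": "Makki", "DB": "10/10/1993", "Nat_id": 1},
]

def project(relation, *attrs):
    """PROJECT: keep only the listed attributes, dropping duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result

def select(relation, condition):
    """SELECT: keep only the tuples for which condition(row) is true."""
    return [row for row in relation if condition(row)]

R1 = project(CUSTOMER, "FN", "LN")
R2 = select(CUSTOMER, lambda row: row["C_Id"] > 3)
```

Projection works on columns and selection on rows, so the two can be chained, for example `project(select(CUSTOMER, ...), "FN", "LN")`.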

The selection condition can be specified with:


 The comparison operators >,> =, <, <=, =, <>.
 The logical operators: AND, OR , NOT
 Other operators: IN, BETWEEN, LIKE, IS, ALL.

c. The UNION operator: it constructs the union of two tables. Given two tables R and S with the same structure, the union R ∪ S is the set of tuples that are in R, or in S, or in both tables.

Syntax:
RelName = UNION (tablename1, tablename2)

Exp: Show all Lebanese and foreign customers.

Note: Suppose we have one relation for Lebanese customers (CUSTOMER) and another for foreign customers (STRANGE_C).

R3= UNION (CUSTOMER, STRANGE_C)

Customer
C_Id  FN     LN     DB          Nat_id
1     Wafaa  Saleh  5/6/1992    1
3     Ali    Ahmad  9/8/1993    1
4     Faten  Makki  15/2/1992   1
5     Ali    Makki  10/10/1993  1

STRANGE_C
C_Id  FN     LN     DB        Nat_id
2     Samir  Hanna  9/8/1993  2

R3
C_Id  FN     LN     DB          Nat_id
1     Wafaa  Saleh  5/6/1992    1
2     Samir  Hanna  9/8/1993    2
3     Ali    Ahmad  9/8/1993    1
4     Faten  Makki  15/2/1992   1
5     Ali    Makki  10/10/1993  1

d. The DIFFERENCE operator: it constructs the difference of two tables. Suppose that R and S are two tables with the same structure. The difference R - S is the set of tuples present in R but not in S.

Syntax:
RelName = DIFFERENCE (tablename1, tablename2)

Exp: Show all customers who are not foreigners.

R4= DIFFERENCE (CUSTOMER, STRANGE_C)
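
UNION and DIFFERENCE can be sketched in Python with relations held as lists of tuples. The function names follow the notation of these notes and are not library functions. Here CUSTOMER holds the Lebanese customers, and the difference is applied to the union R3 so that its effect is visible.

```python
def union(r, s):
    """UNION: tuples that are in r, in s, or in both (r is assumed duplicate-free)."""
    return r + [t for t in s if t not in r]

def difference(r, s):
    """DIFFERENCE: tuples of r that do not appear in s."""
    return [t for t in r if t not in s]

CUSTOMER = [
    (1, "Wafaa", "Saleh", "5/6/1992", 1),
    (3, "Ali", "Ahmad", "9/8/1993", 1),
    (4, "Faten", "Makki", "15/2/1992", 1),
    (5, "Ali", "Makki", "10/10/1993", 1),
]
STRANGE_C = [(2, "Samir", "Hanna", "9/8/1993", 2)]

R3 = union(CUSTOMER, STRANGE_C)   # all customers
R4 = difference(R3, STRANGE_C)    # all customers except the foreign ones
```

Both operators compare whole tuples, which is why the two relations must have the same structure.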

e. The Cartesian product: PRODUCT (×): it constructs the Cartesian product of two relations. The result contains (number of tuples of table1) × (number of tuples of table2) tuples.

Syntax:
RelName = PRODUCT (tablename1, tablename2)

Exp1:

R5= PRODUCT (CUSTOMER, NATIONALITY)

Exp2: Given the 2 relations R and S (figure: tables R, S and R × S).

f. The JOIN operator: this operator is used to combine the data (rows and columns) of 2 tables based on a common attribute between them.

Syntax:
RelName = JOIN (tablename1, tablename2, condition of join)

Exp: Here are the contents of two tables R and S:

R
 A | B | C
---+---+---
 1 | 2 | 3
 4 | 5 | 6
 7 | 8 | 9
 8 | 6 | 3

S
 C | D | E
---+---+---
 3 | a | b
 6 | c | d

Joining R and S on the condition R.C = S.C and deleting the duplicate column S.C, we obtain:

 A | B | C | D | E
---+---+---+---+---
 1 | 2 | 3 | a | b
 4 | 5 | 6 | c | d
 8 | 6 | 3 | a | b
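
The relationship between PRODUCT and JOIN can be sketched in Python on the tables R and S above. The function names and the "R."/"S." prefixes used to keep the two C columns apart in the product are assumptions of this illustration.

```python
def product(r, s):
    """PRODUCT: every tuple of r paired with every tuple of s."""
    return [
        {**{f"R.{k}": v for k, v in a.items()},
         **{f"S.{k}": v for k, v in b.items()}}
        for a in r for b in s
    ]

def join(r, s, attr):
    """Equi-join on a common attribute, keeping a single copy of that column."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
    {"A": 7, "B": 8, "C": 9},
    {"A": 8, "B": 6, "C": 3},
]
S = [
    {"C": 3, "D": "a", "E": "b"},
    {"C": 6, "D": "c", "E": "d"},
]

RxS = product(R, S)   # 4 x 2 = 8 tuples
RS = join(R, S, "C")  # only the pairs where R.C = S.C
```

The join is the subset of the product that satisfies the join condition; the tuple (7, 8, 9) disappears because no tuple of S has C = 9.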

g. Rename a relation:
The Rename operator returns an existing relation under a new name.

Syntax:
RelName = RENAME (old_tablename, new_tablename)

Exp: Rename the relation CUSTOMER into CLIENT

R6= RENAME (CUSTOMER, CLIENT)

This operator can also rename one or more attributes of a relation.


2. The additional operators:
a. The INTERSECT operator: builds the intersection of two tables. Given two tables R and S, the intersection R ∩ S is the set of tuples that are both in R and in S. R and S must have the same structure.

b. Division (÷):
This operator is used when we wish to express queries with "all":

Exp: Consider:

- The relation "Results", obtained from the join (Student, Subjects).
- The relation Students (Student_id, Student name).

The relation T resulting from the division of "Results" by "Students" (more precisely, by the Student_id column of Students) gives the subjects examined by all the students.

Results
Student_id  Subj_name  Year
1           Database   1
2           VB.net     2
1           VB.net     2

Students
Student_id  Student name
1           Faten
2           Walid

T = Results ÷ Students

Subj_name  Year
VB.net     2
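
Division is the least intuitive operator, so a small Python sketch may help. The function name `divide` and its `by` parameter are assumptions of this illustration, not a standard API; the function groups the dividend by its remaining attributes and keeps the groups that are paired with every value of the divisor.

```python
def divide(r, s, by):
    """DIVISION: values of the other attributes of r that appear in r
    together with every `by`-value found in s (r is assumed non-empty)."""
    required = {row[by] for row in s}
    rest = [a for a in r[0] if a != by]
    groups = {}
    for row in r:
        key = tuple(row[a] for a in rest)
        groups.setdefault(key, set()).add(row[by])
    return [dict(zip(rest, key)) for key, seen in groups.items() if required <= seen]

Results = [
    {"Student_id": 1, "Subj_name": "Database", "Year": 1},
    {"Student_id": 2, "Subj_name": "VB.net", "Year": 2},
    {"Student_id": 1, "Subj_name": "VB.net", "Year": 2},
]
Students = [
    {"Student_id": 1, "Student_name": "Faten"},
    {"Student_id": 2, "Student_name": "Walid"},
]

T = divide(Results, Students, "Student_id")
```

Database is excluded because only student 1 appears with it, while VB.net appears with both students.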

3. Relational queries:
Exp:
Given the two following tables (or relations):
Customer(CodeCustomer, CustomerName, CustomerAdr, Tel)
Order(OrderNb, Date, CodeCustomer *)

Query 1:
We would like to obtain the code and the name of the customers who placed an order on 10/06/2017:

R1= SELECT (Order, Date="10/06/2017")
R2= JOIN (R1, Customer, R1.CodeCustomer = Customer.CodeCustomer)
R3= PROJECT (R2, CodeCustomer, CustomerName)

Query 2:
We want to get the code and the name of the customers who have not ordered yet:

R1= PROJECT (Customer, CodeCustomer)
R2= PROJECT (Order, CodeCustomer)
R3= DIFFERENCE (R1, R2)
R4= JOIN (R3, Customer, R3.CodeCustomer = Customer.CodeCustomer)
R5= PROJECT (R4, CodeCustomer, CustomerName)
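
For comparison, the two queries can be written in SQL and run with Python's sqlite3 module. This is a sketch under assumptions: the sample rows are invented, the columns CustomerAdr and Tel are left out for brevity, and the SQL shown is one possible translation of the algebra, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CodeCustomer INTEGER PRIMARY KEY, CustomerName TEXT);
    -- "Order" is a reserved word in SQL, so the table name is quoted.
    CREATE TABLE "Order" (
        OrderNb      INTEGER PRIMARY KEY,
        Date         TEXT,
        CodeCustomer INTEGER REFERENCES Customer(CodeCustomer)
    );
    INSERT INTO Customer VALUES (1, 'Customer A'), (2, 'Customer B'), (3, 'Customer C');
    INSERT INTO "Order" VALUES (10, '10/06/2017', 1), (11, '11/06/2017', 2);
""")

# Query 1: SELECT on the date, JOIN with Customer, PROJECT on code and name.
query1 = conn.execute("""
    SELECT c.CodeCustomer, c.CustomerName
    FROM "Order" AS o JOIN Customer AS c ON o.CodeCustomer = c.CodeCustomer
    WHERE o.Date = '10/06/2017'
""").fetchall()

# Query 2: DIFFERENCE of the two projections (EXCEPT), then back to Customer.
query2 = conn.execute("""
    SELECT CodeCustomer, CustomerName
    FROM Customer
    WHERE CodeCustomer IN (
        SELECT CodeCustomer FROM Customer
        EXCEPT
        SELECT CodeCustomer FROM "Order"
    )
""").fetchall()
```

Each algebra operator has a direct counterpart here: WHERE for SELECT, the column list for PROJECT, JOIN ... ON for JOIN, and EXCEPT for DIFFERENCE.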

EXERCISE N°1

AGENCY (Agency_Number, Name, City, Active)
CUSTOMER (Customer_Number, Name, City)
ACCOUNT (Account_Number, Agency_Number, Customer_Number, Balance)
BORROW (BorrowNumber, Agency_Number, Customer_Number, Amount)

Formulate in relational algebra the following queries:

a) Give the names of all the agencies that are in the city of 'Beirut'.
b) Give all accounts (Account_Number, customer name, balance) whose balance is greater than 1000.
c) Give all the accounts of the customer "Samir Wehbé".
d) Give the names of the customers who have not taken a loan yet.
e) Give the names of the customers who have both accounts and loans.
f) Change the name of the CUSTOMER relation to PATIENT.

EXERCISE N°2

REPRESENTATION (representationNb, representation_title, place)
MUSICIAN (name, representationNb *)
PROGRAM (date, representationNb*, tarrif)

Formulate in relational algebra the following queries:

a) Give the list of the representation titles.
b) Give the list of the representation titles taking place at the Opera.
c) List the names of the musicians and the titles of the representations in which they participate.
d) Give the list of the representation titles, the places and the tariffs of the representations held on 14/09/2016.
e) Give the list of musicians who have not performed in any representation.
f) Give the list of musicians who are scheduled for the representation “Romio and Juliette” on “17/9/2016”.

EXERCISE N°3

Client (CliNb, name, address)
Booking (BookingNb, dateB, CliNb, RoomNb, nDays)
Room (RoomNb, type, surface, CostPerDay)

N.B: - nDays means the number of days booked by the client.
- Amount (amount to be paid at checkout) = nDays * CostPerDay
- Type means single, double, business, etc.

Formulate in relational algebra the following queries:

a) List of clients (CliNb, name) living in Beirut.
b) List of bookings (BookingNb, dateB) of the client "Raed".
c) List of clients (CliNb, name) who stayed in the hotel for more than 5 days.
d) List of rooms (RoomNb, type) whose surface is greater than 45 m2.
e) List (name, cost) of all bookings made before '12/3/2021'.
f) List of the "double" rooms that were booked on '26/12/2021' for more than 10 days.
g) List of clients who have not booked any room yet.
CHAPTER THREE
The File Management Systems

I- The concept of a computer file:

A computer file is a set of digital information gathered under the same name, recorded on a storage device such as a hard disk or a CD-ROM, and manipulated as a single unit.

To make files easier to find, they are organized by file systems in containers called folders.

II- General definitions:

a. A data file:
A data file is a structured container of information concerning only one object of the real world (Exp: Students, Professors, …), and characterized by its name.

b. A field:
A field is the elementary piece of information of a data file.

In other words, a field (or attribute) is a property of a real-world object (Exp: Students).

This property has a name (Firstname, Lastname, date of birth, …), a data type (numeric, alphanumeric, date, …) and a value (Farran, Wassim, 12/8/1994, …).

c. An article (record / tuple / occurrence):

A record, or article, is a single element (row) in a file. A record usually contains several pieces of information related to the same object.
For example, a record (row) of a file describing students contains several pieces of information about the same student.

Exp: file (Students). Each column is a field and each row is an article.

Student code  First name  Last name  Date of birth  Telephone
25            X           X1         3/1/2004       07/123 456
27            Y           Y1         18/6/2005      07/654 321
42            Z           Z1         5/8/2004       01/111 111

III- Data Access:

The file management system (FMS) manages several access methods that depend on the organization of the files. The most used data access methods are:
a. Sequential access.
b. Direct access.
c. Indexed sequential access.

a. Sequential access:
Sequential access means that the elements must be accessed in sequential order. That is, to access record number N, we must first read all the previous records (N-1 records).

Sequential access can be imposed by certain constraints, for example when reading from a magnetic tape.

b. Direct access:
Direct access (also called random access) to an element (for example, a record of a file) means that we can read or write a record directly, without accessing the previous records.

To reach the desired place directly, the software uses an index or a mathematical calculation that gives the address of the element.

c. Indexed sequential access:

A file with an indexed sequential organization is composed of 2 parts, the records and the indexes.

1. The indexes are tables composed of 2 attributes (Key, Pointer): the first corresponds to a value of a key, the second to a pointer (address) to the record (direct access).
2. The records are organized in order in the data table, one record after another, and this order is respected when reading (sequential access).

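
The three access methods can be compared with a small Python sketch that stores fixed-length records in an in-memory byte stream. The record layout (a 4-byte key followed by a 16-byte name), the function names and the sample records are assumptions made for the illustration.

```python
import io
import struct

RECORD = struct.Struct("<i16s")  # assumed layout: 4-byte key + 16-byte name

def pack(key, name):
    return RECORD.pack(key, name.encode("ascii"))

# Records stored one after another in key order (the sequential file).
data = io.BytesIO(b"".join(pack(k, n) for k, n in [(25, "X"), (27, "Y"), (42, "Z")]))

def read_at(position):
    """Direct access: jump to record number `position` without reading the others."""
    data.seek(position * RECORD.size)
    key, name = RECORD.unpack(data.read(RECORD.size))
    return key, name.rstrip(b"\x00").decode("ascii")

def sequential_search(wanted_key):
    """Sequential access: read records from the start until the key is found."""
    data.seek(0)
    reads = 0
    while chunk := data.read(RECORD.size):
        reads += 1
        key, name = RECORD.unpack(chunk)
        if key == wanted_key:
            return name.rstrip(b"\x00").decode("ascii"), reads
    return None, reads

# Index table: key -> record position (the pointer).
index = {25: 0, 27: 1, 42: 2}

def indexed_search(wanted_key):
    """Indexed sequential access: look up the pointer, then read directly."""
    return read_at(index[wanted_key])[1]

found_seq = sequential_search(42)  # has to read the 3 records
found_idx = indexed_search(42)     # one lookup in the index, one direct read
```

The address calculation `position * RECORD.size` only works because every record has the same length.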
d. Advantages and disadvantages of the different types of files:

File type: sequential
  Advantages:
  - Reduced time to access the next record.
  Disadvantages:
  - Expensive updating.
  - Long time to access the Nth record.
  - Long time to process only some of the records.

File type: indexed sequential
  Advantages:
  - Reduced time to access records through the tables of indexes.
  - Easy management of new records.
  - Sequential processing is possible.
  Disadvantages:
  - Space lost because of the tables of indexes.

File type: direct
  Advantages:
  - Very efficient.
  Disadvantages:
  - Difficulty of establishing a link between the key and a serial number.
  - Space lost when records are deleted.

IV- The articles format:

a. Articles with the same length:
All the records of the same file have the same length.

Article1 Article2 Article3 Article4 …

The advantage of this type of article is that the articles are easy to manage; the disadvantage is the risk of wasting a significant amount of space.

b. Articles with variable length:
Each article in the same file has its own length.

Article1 Article2 Article3 Article4 Article5…

The disadvantage is that managing these articles is more complex; the advantage is that the space is used to the maximum.

c. Blocked articles:
Some file management systems (FMS) organize disk space in blocks of sectors.

Exp: 1 block = 2 sectors of 512 bytes, i.e. 1 KB

The read and write operations of the FMS are done block by block:

| sect | sect | sect | sect | sect | sect |
|   Block 1   |   Block 2   |   Block 3   | …

V- The blocking factor:

1 logical block = n articles
When multiple records are transferred together in a single exchange, we say that they are blocked. The blocking factor is the number n of records per logical block. Blocking has a double benefit: it saves space in secondary memory (no wasted space), and it reduces the number of I/O operations during sequential access.
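
The gain in I/O can be quantified with a short calculation, sketched here in Python. The block size, record size and record count are assumed values chosen for the example, and the records are assumed to have a fixed length and not to span two blocks.

```python
import math

def blocking_factor(block_size, record_size):
    """Number of whole fixed-length records that fit in one block."""
    return block_size // record_size

def blocks_needed(record_count, block_size, record_size):
    """Number of block transfers needed to read all records sequentially."""
    return math.ceil(record_count / blocking_factor(block_size, record_size))

n = blocking_factor(1024, 100)              # 1 KB block, 100-byte records
transfers = blocks_needed(1000, 1024, 100)  # reading 1000 records
```

With these values n is 10, so reading 1000 records takes 100 block transfers instead of 1000 record-by-record transfers.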

VI- File operations:
a. At file level:
1. Create a file:
A file is produced using a particular application (Word, Excel…); for this reason, we say that each file belongs to a particular family indicated by a specific extension.

For example, we use Word to get a document with a .doc extension, Access to create a database with a .accdb extension, a C++ development tool to get a source file with a .cc extension, etc.

The result is a file, which is stored on a storage device such as a hard disk, a CD-ROM…

2. Delete a file:
Deleting a file is the action of erasing (destroying) it from the
storage hardware where it is stored.

3. Copying a file:
This action consists of reproducing another copy of a source file
in order to obtain 2 separate files of the same origin.

4. Moving a file:
To move a file, the system must transfer its information from a starting point (source) to an ending point (destination), on the same storage device or on a different one.

5. Back up a file:
Backup consists of duplicating and securing the data contained in a computer system.

6. Sorting a file:
Sorting a file consists of arranging its articles in order (either increasing, i.e. ascending, or decreasing, i.e. descending).

7. Merging files:
When several files (sources) deal with the same subject but under different titles, merging consists of bringing them together to obtain a single global file (destination).

8. Splitting a file:
This action consists of obtaining several files (destination) from
a single file (Source).

b. At article level:

1. Insert an article:
This action consists of adding new articles to an existing file.
Exp: In a sequential file, new articles are inserted at the end of
the file.

2. Delete an article:
This action consists of removing or deleting articles from an
existing file.

3. Read an article:
This action consists in consulting one or more articles of a file in
order to know their contents.

4. Update an article:
This action consists of changing (adjusting or modifying) the content of one or more articles in order to update or maintain a file.

CHAPTER FOUR

The Database conception


I- Redundancy and anomalies:
Redundancy is the unnecessary repetition of information in several tuples. Here is a relation that contains information about films:

Title                            Year  Duration  Studio          Actor
Star Wars                        1977  124       Fox             Carrie Fisher
Star Wars                        1977  124       Fox             Mark Hamill
Star Wars                        1977  124       Fox             Harrison Ford
Le vent se lève (wind picks up)  2006  124       Diaphana Films  Cillian Murphy
Charlie et la chocolaterie       2005  116       Warner          Johnny Depp
Charlie et la chocolaterie       1971  100       Warner          Gene Wilder

The main anomalies found are:

• Update anomaly: if you must change redundant information, such as the duration of a film, you must change this information in several tuples.

• Deletion anomaly: if a set of values becomes empty, other information can be lost as a side effect. For example, if we remove the film "Le vent se lève", we lose the information that Cillian Murphy is an actor.

• Insertion anomaly: to record a new studio, we must provide at least a film title with its year of production, its duration and an actor's name.

• To avoid these anomalies, redundancy must be removed by decomposing the relational schema. For this we use normalization.

II- The functional dependencies:
A functional dependency is a constraint between two attributes (or two sets of attributes) in a relation of a database.

We say that X determines Y, or that Y functionally depends on X, if to each value of X corresponds a unique value of Y.

We write: X → Y

We read: X determines Y

Exp1:

PRODUCT (prod_Nb, Desc, price)
FDproduct = (prod_Nb → Desc, prod_Nb → price)
Note: FDproduct = family of functional dependencies of the relation PRODUCT.

Exp2:
Grade (contrôle_nb, student_Nb, grade)
(contrôle_nb, student_Nb) → grade

Properties (or Armstrong axioms)

1. Reflexivity: if Y ⊆ X
then X → Y (in particular, X → X).

2. Augmentation (increase): if X → Y
then X,Z → Y,Z.

3. Transitivity: if X → Y and Y → Z
then X → Z.

4. Pseudo-transitivity: if X → Y and Y,Z → W
then X,Z → W.

5. Decomposition: if X → Y,Z
then X → Y and X → Z.

6. Union: if X → Y and X → Z,
then X → Y,Z.
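
In practice, deciding whether a dependency follows from a given family is usually done by computing an attribute closure, which applies these properties repeatedly. The Python sketch below is one common way to do it; the function names `closure` and `implies` are assumptions of this illustration, and the dependencies are those of Exp1.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # If the left side is already determined, so is the right side.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result

def implies(fds, left, right):
    """True if left -> right can be derived from `fds`."""
    return set(right) <= closure(left, fds)

# FDproduct from Exp1: prod_Nb -> Desc, prod_Nb -> price
fd_product = [(["prod_Nb"], ["Desc"]), (["prod_Nb"], ["price"])]

prod_closure = closure(["prod_Nb"], fd_product)
union_holds = implies(fd_product, ["prod_Nb"], ["Desc", "price"])  # union property
desc_to_price = implies(fd_product, ["Desc"], ["price"])           # not derivable
```

A set of attributes is a key of the relation when its closure contains all the attributes, which is the case for prod_Nb here.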

III- The normalization theory:
Normalization makes it possible to eliminate storage anomalies and redundancy in order to obtain a coherent database.

a. The normalized relation:

A normalized relation is a simple structure, free from inconsistent dependencies, identified by a primary key on which all the fields depend.

Normalization leads to breaking a relation into several relations. It includes several steps: 1NF, 2NF, 3NF and BCNF.

b. The first three normal forms:

i. First normal form (1NF): every attribute must be atomic.

Exp: The relation Person (person_nb, first_name, last_name, street_and_city, nationalities) is not in first normal form, because street_and_city is a composed attribute and nationalities can hold several values.

We must decompose it into:

- Person (person_nb, first_name, last_name, street, city)
- person_nationalities (person_nb, Code_Nat)
- Nationality (Code_Nat, description)

ii. Second normal form (2NF):

- The relation must be in 1NF.
- Every non-key attribute must depend on the entire primary key and not on a part of it.

Exp: the schema of the relations
Invoice (Inv_Id, client_Id, Inv_date)
Inv_detail (inv_id, prod_id, qty, unit_price, qty_in_stock)

• The relation Invoice is in second normal form because the client_Id and Inv_date attributes depend on the key attribute.

• The relation Inv_detail is not in second normal form because the unit_price and qty_in_stock attributes depend on only a part of the key (prod_id). The solution is to divide it into 2 relations, Inv_detail and Products:

Inv_detail (inv_id, prod_id, qty)
Products (prod_id, unit_price, qty_in_stock)

iii. Third normal form (3NF): a relation is in 3NF if:

- The relation is in 2NF.
- Every non-key attribute depends on the primary key and not on another non-key attribute.

Exp: given the relation COMPANY (Flight, Airplane, Pilot) with Flight → Airplane and Airplane → Pilot (and therefore Flight → Pilot). The relation is in 2NF but not in 3NF because the non-key attribute Pilot depends on another non-key attribute, Airplane. The solution is to decompose the relation as follows:

COMPANY (Flight, Airplane)
AIRPLANE (Airplane, Pilot)

c. Boyce-Codd Normal Form (BCNF):
A relation is in BCNF if:
- It is in 3NF.
- No attribute that does not belong to a key is the source of an FD towards a part of a key. This means that the only existing elementary functional dependencies (EFD) are those in which a key determines an attribute.

Exp: given the relation Person:

Person (#SS_nb, #country, Name, Region)
and the following FDs on this relation:
#SS_nb, #country → Name
#SS_nb, #country → Region
Region → country

There is an EFD whose source is not a key and which determines an attribute that belongs to a key. This relation is in 3NF but not in BCNF.
To obtain a relational schema in BCNF, we must decompose Person:
Person (#SS_nb, #Region, Name)
Region (#Region, Country).
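
The BCNF test on the Person example can be sketched in Python: a relation violates BCNF as soon as one non-trivial FD has a left side that is not a super key. The function names are assumptions of this illustration, and the attribute names are written without the "#" prefix.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a super key of the relation."""
    return [
        (left, right)
        for left, right in fds
        if not set(right) <= set(left)
        and closure(left, fds) != set(attributes)
    ]

person = ["SS_nb", "country", "Name", "Region"]
fds = [
    (["SS_nb", "country"], ["Name"]),
    (["SS_nb", "country"], ["Region"]),
    (["Region"], ["country"]),
]

violations = bcnf_violations(person, fds)
```

The only dependency reported is Region → country, since Region alone does not determine SS_nb or Name; this is the dependency that the decomposition moves into its own relation.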

EXERCISES
NORMALIZATION AND
FUNCTIONAL DEPENDENCIES:

1) The axiom "Pseudo-transitivity" tells us that:
If X → Y and Y,W → Z, then X,W → Z.
Demonstrate this axiom using the other axioms.

2) Given R (A, B, C, D, E, G, H)
and F = {A,B → C ; B → D ; C,D → E ; G → A ; D → H}
Using the axioms of Armstrong, show that we can deduce the following dependencies:

1. B → H.
2. B,G → C.
3. A,B → E.

3) We consider the relation R (A, B, C, D, E) with the following functional dependencies:
A → B
CD → E
E → A
B → D
Specify the primary key of R.

4) We consider the relation R1 (A, B, C, D, E, F) with the following functional dependencies:
A → C
DE → F
B → D
Specify the primary key of R1.

5) Show that if X → Y, Z and Z → C, W then X → Y, Z, C

6) Let R (a, b, c, d, e) be a universal relation and F the set of functional dependencies: (1st session 2012)

F = {a,b → c ; a,b → d ; d → e ; b → d}

a) Prove that (a, b) is a key of the relation R.
b) Decompose R into third normal form (3NF).

7) Given the following relation:

Doctor (D#, FN, LN, #Op, Op_Date, Op_resultat, #Patient, Pfn, PLn)

1- What is the normal form of this relation?
2- Give a possible solution for this case.

8) Given the relation named Exam defined by the following relational schema: (2nd session 2012)
- Exam (ExamNo, ExamDate, CourseNo, CourseName, coefficient)
- The set F of functional dependencies is the following:

F = {ExamNo → ExamDate, ExamNo → CourseNo, CourseNo → (CourseName, coefficient)}

- What is the highest normal form of the relation Exam? Justify your answer.

9) The following relation describes orders placed by clients, with the products and quantities ordered by each client.
Commands (ComNb, ComDate, CliNb, AdrCli, ProdNb, Price, Qty)
a- What is the key of this relation?
b- In which normal form is it?
c- Put it in 3NF.

10) Let R (A, B, C, D) be a relation with the following FDs:
BC -> D, C -> A, AB -> C
a- Give the key of the relation.
b- What are the violations of BCNF?
c- Decompose the relation, if necessary, into a collection of BCNF relations.

11) Let R (A, B, C, D, E) be a relation with the following FDs:
A→B, A→C, BC→A and D→E
a- Determine whether R is in 3NF or BCNF.
b- If it is not in BCNF, decompose R into a collection of BCNF relations.

12) Consider a relation R (A, B, C, D, E, G, H).

Let F be the set of functional dependencies (FD) associated with R:
A, B → C
B → D
C, D → E
C, E → G
C, E → H
G → A

a- Prove formally, using the axioms of Armstrong, that (B, G) is a key of R.
b- Is the relation R in second normal form? In third normal form?
c- Give, if necessary, a decomposition of R into relations in third normal form. Are all the obtained relations in BCNF (Boyce-Codd normal form)?
