• We know that databases are often required to satisfy some integrity constraints.

• The most common ones are functional and inclusion dependencies.

• We'll study properties of integrity constraints.

• We show how they influence database design, in particular, how they tell us what type of information should be recorded in each particular relation.

343 1 Introduction to Databases

Review: functional dependencies and keys

• A functional dependency is X → Y where X, Y are sequences of attributes. It holds in a relation R if for every two tuples $t_1, t_2$ in R:

$\pi_X(t_1) = \pi_X(t_2)$ implies $\pi_Y(t_1) = \pi_Y(t_2)$

• A very important special case: keys

• Let K be a set of attributes of R, and U the set of all attributes of R. Then K is a key if R satisfies the functional dependency K → U.

• In other words, a set of attributes K is a key in R if for any two tuples $t_1, t_2$ in R,

$\pi_K(t_1) = \pi_K(t_2)$ implies $t_1 = t_2$

• That is, a key is a set of attributes that uniquely identifies a tuple in a relation.

Problems

• Good constraints vs bad constraints: some constraints may be undesirable as they lead to problems.

• Implication problem: Suppose we are given some constraints. Do they imply others?

This is important, as we never get the list of all constraints that hold in a database (such a list could be very large, or just unknown to the designer).

It is possible that all constraints given to us look OK, but they imply some bad constraints.

• Axiomatization of constraints: a simple way to state when some constraints imply others.

Bad constraints: example

DME:

Dept   Manager   Employee
D1     Smith     Jones
D1     Smith     Brown
D2     Turner    White
D3     Smith     Taylor
D4     Smith     Clarke
...    ...       ...

ES:

Employee   Salary
Jones      10
Brown      20
White      20
...        ...

• Functional dependencies

• in DME: Dept → Manager, but none of the following:

Dept → Employee
Manager → Dept
Manager → Employee

• in ES: Employee → Salary (Employee is a key for ES), but not Salary → Employee
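The dependencies claimed for DME and ES can be checked mechanically on the rows that are shown. The following is a minimal Python sketch, not part of the original slides: the function name `satisfies_fd` and the list-of-dicts row representation are assumptions made for illustration, and only the printed rows are included (the elided "..." rows are left out). Note that such a check can only confirm or refute an FD on one particular instance.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault remembers the first rhs-value seen for each lhs-value;
        # a different rhs-value for the same lhs-value violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Only the rows printed on the slide (assumption: the "..." rows are omitted).
DME = [
    {"Dept": "D1", "Manager": "Smith", "Employee": "Jones"},
    {"Dept": "D1", "Manager": "Smith", "Employee": "Brown"},
    {"Dept": "D2", "Manager": "Turner", "Employee": "White"},
    {"Dept": "D3", "Manager": "Smith", "Employee": "Taylor"},
    {"Dept": "D4", "Manager": "Smith", "Employee": "Clarke"},
]
ES = [
    {"Employee": "Jones", "Salary": 10},
    {"Employee": "Brown", "Salary": 20},
    {"Employee": "White", "Salary": 20},
]

print(satisfies_fd(DME, ["Dept"], ["Manager"]))    # True
print(satisfies_fd(DME, ["Manager"], ["Dept"]))    # False
print(satisfies_fd(ES, ["Employee"], ["Salary"]))  # True
print(satisfies_fd(ES, ["Salary"], ["Employee"]))  # False
```

A dictionary keyed by the left-hand-side values keeps the check to a single pass over the relation.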

Update anomalies

• Insertion anomaly: A company hires a new employee, but doesn't immediately assign him/her to a department. We cannot record this fact in relation DME.

• Deletion anomaly: Employee White leaves the company. We have to delete a tuple from DME. But this results in deleting manager Turner as well, even though he hasn't left!

• Reason for anomalies: the association between managers and employees is represented in the same relation as the association between managers and departments. Furthermore, the same fact (a department D is managed by M) could be represented more than once.

• Putting this into the language of functional dependencies: we have a dependency Dept → Manager, but Dept is not a key.

• In general, one tries to avoid situations like this.

Dependencies and implication

• Suppose we have a relation R with 3 attributes A, B, C.

• Someone tells us that R satisfies A → B and B → {A, C}.

• This looks like the bad case before: we have A → B that does not mention C.

• But A is a key: if $\pi_A(t_1) = \pi_A(t_2)$, then $\pi_B(t_1) = \pi_B(t_2)$ by A → B, and since B is a key (by B → {A, C}), $t_1 = t_2$.

• Thus, even though we did not have information that A is a key, we were able to derive it, as it is implied by other dependencies.

• Using implication, we can find all dependencies and figure out if there is a problem with the design.

Implication for functional dependencies

• Normally we try to find nontrivial dependencies. A trivial dependency is X → Y with Y ⊆ X.

• Suppose we are given a set U of attributes of a relation, a set F of functional dependencies (FDs), and a FD f over U.

• F implies f, written

$F \vdash f$

if every relation R over attributes U that satisfies all FDs in F also satisfies f.

• Problem: Given U and F, find all nontrivial FDs f such that $F \vdash f$.

Implication for functional dependencies cont’d

• Closed set with respect to F: a subset V of U such that, for every X → Y in F,

if X ⊆ V, then Y ⊆ V

• Property of FDs: For every set V, there exists a unique set $C_F(V)$, called the closure of V with respect to F, such that

$C_F(V)$ is closed
$V \subseteq C_F(V)$
For every closed set W, $V \subseteq W$ implies $C_F(V) \subseteq W$.

• Solution to the implication problem:

A FD X → Y is implied by F if and only if $Y \subseteq C_F(X)$

Implication and closure

• To solve the implication problem, it suffices to find the closure of each set of attributes.

• A naive approach to finding $C_F(X)$: check all subsets V ⊆ U, verify if they are closed, and select the smallest closed set containing X.

• Problem: this is too expensive.

• If U has n elements and X has m < n elements, then there are $2^{n-m}$ subsets of U that contain X.

• But there is a very fast, O(n), algorithm to compute $C_F(X)$.

• We will see a very simple $O(n^2)$ algorithm instead.

Implication and closure cont’d

Algorithm CLOSURE

Input: a set F of FDs, and a set X
Output: $C_F(X)$

1. unused := F
2. closure := X
3. repeat until no change:
     if Y → Z ∈ unused and Y ⊆ closure then
       (i) unused := unused − {Y → Z}
       (ii) closure := closure ∪ Z
4. Output closure.

Homework: Prove that CLOSURE returns $C_F(X)$ and that its running time is $O(n^2)$.
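Algorithm CLOSURE translates almost line by line into code. The sketch below is an illustration in Python rather than part of the original slides: the names `closure` and `implies` are assumptions, and FDs are represented as pairs of iterables of attribute names, so single-letter attributes can be written as plain strings such as `("B", "AC")`.

```python
def closure(attrs, fds):
    """Compute C_F(X) for X = attrs and F = fds, following algorithm CLOSURE."""
    unused = list(fds)       # step 1: unused := F
    result = set(attrs)      # step 2: closure := X
    changed = True
    while changed:           # step 3: repeat until no change
        changed = False
        for fd in list(unused):
            lhs, rhs = fd
            if set(lhs) <= result:
                unused.remove(fd)      # (i)  unused := unused - {Y -> Z}
                result |= set(rhs)     # (ii) closure := closure ∪ Z
                changed = True
    return result            # step 4


def implies(fds, lhs, rhs):
    """F implies lhs -> rhs if and only if rhs is contained in C_F(lhs)."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("B", "AC")]
print(sorted(closure("A", F)))  # ['A', 'B', 'C']
print(sorted(closure("C", F)))  # ['C']
print(implies(F, "A", "C"))     # True
```

Each FD is used at most once, which is what bounds the number of passes over `unused`.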

Properties of the closure

• $X \subseteq C_F(X)$

• $X \subseteq Y$ implies $C_F(X) \subseteq C_F(Y)$

• $C_F(C_F(X)) = C_F(X)$

• It turns out that for any operator C satisfying

$X \subseteq C(X)$
$X \subseteq Y$ implies $C(X) \subseteq C(Y)$
$C(C(X)) = C(X)$

there exists a family of FDs F such that $C = C_F$.

Closure: Example

• Common practice: write sets as AB for {A, B}, BC for {B, C}, etc.

• U = {A, B, C}, F = {A → B, B → AC}

• Closure:

$C_F(\emptyset) = \emptyset$
$C_F(C) = C$
$C_F(B) = ABC$, hence $C_F(X) = ABC$ for any X that contains B
$C_F(A) = ABC$, hence $C_F(X) = ABC$ for any X that contains A

Keys, candidate keys, and prime attributes

• A set X of attributes is a key with respect to F if X → U is implied by F

• That is, X is a key if $C_F(X) = U$

• In the previous example: any set containing A or B is a key.

• Candidate keys: smallest keys. That is, keys X such that for each Y ⊂ X, Y is not a key.

• In the previous example: A and B.

• Suppose U has n attributes. What is the maximum number of candidate keys?

Answer: $\binom{n}{\lfloor n/2 \rfloor}$, and that's very large: 252 for n = 10, > 180,000 for n = 20

• Prime attribute: an attribute of a candidate key.
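Candidate keys and prime attributes can be enumerated by brute force on small schemas. The following Python sketch is an illustration only: the helper names `closure` and `candidate_keys` are assumptions, FDs are pairs of iterables of attribute names, and the search is exponential in the number of attributes, which is consistent with the bound above.

```python
from itertools import combinations
from math import comb


def closure(attrs, fds):
    """Closure of attrs under fds (pairs of iterables of attribute names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(universe, fds):
    """Enumerate candidate keys by trying subsets in order of increasing size."""
    universe = set(universe)
    keys = []
    for size in range(len(universe) + 1):
        for subset in combinations(sorted(universe), size):
            candidate = set(subset)
            # A superset of an already found key is a key, but not a minimal one.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == universe:
                keys.append(candidate)
    return keys


F = [("A", "B"), ("B", "AC")]
keys = candidate_keys("ABC", F)
prime = set().union(*keys)
print(keys)                         # the candidate keys {A} and {B}
print(comb(10, 5), comb(20, 10))    # 252 184756
```

Because subsets are visited smallest first, any key that survives the superset test is guaranteed to be minimal.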

Inclusion dependencies

• Functional and inclusion dependencies are the most common ones encountered in applications

• Reminder: $R[A_1, \ldots, A_n] \subseteq S[B_1, \ldots, B_n]$ says that for any tuple t in R, there is a tuple $t'$ in S such that

$\pi_{A_1,\ldots,A_n}(t) = \pi_{B_1,\ldots,B_n}(t')$

• Suppose we have a set of relation names and inclusion dependencies (IDs) between them. Can we derive all IDs that are valid?

• Formally, let G be a set of IDs and g an ID. We say that G implies g, written

$G \vdash g$

if any set of relations that satisfies all IDs in G also satisfies g.

• The answer is positive, just as for FDs.
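Checking whether an inclusion dependency holds on given relation instances follows directly from the reminder above. This Python sketch is an illustration with assumed names (`satisfies_id`) and hypothetical sample rows; it checks one instance and says nothing about implication.

```python
def satisfies_id(r_rows, r_attrs, s_rows, s_attrs):
    """Check R[r_attrs] ⊆ S[s_attrs] on concrete instances (lists of dicts)."""
    if len(r_attrs) != len(s_attrs):
        raise ValueError("both sides of an ID must list the same number of attributes")
    # Projection of S onto s_attrs, as a set for constant-time lookups.
    s_projection = {tuple(row[b] for b in s_attrs) for row in s_rows}
    return all(tuple(row[a] for a in r_attrs) in s_projection for row in r_rows)


# Hypothetical instances used only for this illustration.
ES = [
    {"Employee": "Jones", "Salary": 10},
    {"Employee": "Brown", "Salary": 20},
]
DME = [
    {"Dept": "D1", "Manager": "Smith", "Employee": "Jones"},
    {"Dept": "D1", "Manager": "Smith", "Employee": "Brown"},
    {"Dept": "D3", "Manager": "Smith", "Employee": "Taylor"},
]

print(satisfies_id(ES, ["Employee"], DME, ["Employee"]))   # True
print(satisfies_id(DME, ["Employee"], ES, ["Employee"]))   # False
```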

Implication of inclusion dependencies

• There are simple rules:

• $R[A] \subseteq R[A]$

• if $R[A_1, \ldots, A_n] \subseteq S[B_1, \ldots, B_n]$
then $R[A_{i_1}, \ldots, A_{i_n}] \subseteq S[B_{i_1}, \ldots, B_{i_n}]$
where $\{i_1, \ldots, i_n\}$ is a permutation of $\{1, \ldots, n\}$.

• if $R[A_1, \ldots, A_m, A_{m+1}, \ldots, A_n] \subseteq S[B_1, \ldots, B_m, B_{m+1}, \ldots, B_n]$
then $R[A_1, \ldots, A_m] \subseteq S[B_1, \ldots, B_m]$

• if R[X] ⊆ S[Y] and S[Y] ⊆ T[Z] then R[X] ⊆ T[Z]

Implication of inclusion dependencies

• An ID g is implied by G iff it can be derived by repeated application of the four rules shown above

• This immediately gives an expensive algorithm: apply all the rules until none are applicable.

• It turns out that the problem is inherently hard (PSPACE-complete), so no efficient algorithm for solving it is expected to exist

• But an important special case admits an efficient solution

• Unary IDs: those of the form R[A] ⊆ S[B], where A and B are single attributes

• Implication $G \vdash g$ can be tested in polynomial time for unary IDs

Functional and inclusion dependencies

• In the majority of applications, one has to deal with only two kinds of dependencies: FDs and IDs

• The implication problem can be solved for FDs alone and for IDs alone.

• Can it be solved for FDs and IDs together?

• That is, given a set of FDs F and a set of IDs G, a FD f and an ID g, can we determine whether

$F \cup G \vdash f$
$F \cup G \vdash g$

• That is, whether any database that satisfies F and G must satisfy f (or g).

• It turns out that no algorithm can possibly solve this problem: it is undecidable.

Even more restrictions

• Relations are typically declared with just primary keys

• Inclusion dependencies typically occur in foreign key declarations

• Can implication be solved algorithmically for just primary keys and foreign keys?

• The answer is still NO.

• Thus, it is hard to reason about some of the most common classes of constraints found in relational databases.

When is implication solvable for FDs and IDs?

• Recall: an ID is unary if it is of the form R[A] ⊆ S[B], where A and B are single attributes.

• Implication can be tested for unary IDs and arbitrary FDs

• Moreover, it can be done in polynomial time.

• However, the algorithm for doing this is quite complex.

Dependencies: summary

• FDs: X → Y

• Keys: X → U, where U is the set of all attributes of a relation.

• Candidate key: a minimal key X; no proper subset of X is a key.

• Inclusion dependency: $R[A_1, \ldots, A_n] \subseteq S[B_1, \ldots, B_n]$ (unary if n = 1)

• Foreign key: $R[A_1, \ldots, A_n] \subseteq S[B_1, \ldots, B_n]$ and $\{B_1, \ldots, B_n\}$ is a key

• Implication problem:

easy for FDs alone
hard but solvable for IDs alone
solvable by a complex algorithm for FDs and unary IDs
unsolvable for FDs and IDs, and even
unsolvable for primary keys and foreign keys

Database Design

• Finding database schemas with good properties

• Good properties:

no update anomalies
no redundancies
no information loss

• Input: list of all attributes and constraints (usually functional dependencies)

• Output: list of relations and constraints they satisfy

Example: bad design

• Attributes: Title, Director, Theater, Address, Phone, Time, Price

• Constraints:

FD1 Theater → Address, Phone

FD2 Theater, Time, Title → Price

FD3 Title → Director

• Bad design: put everything in one relation

BAD[Title, Director, Theater, Address, Phone, Time, Price]


Why is BAD bad?

• Redundancy: many facts are repeated

• Director is determined by Title

For every showing, we list both director and title

• Address is determined by Theater

For every movie playing, we repeat the address

• Update anomalies:

• If Address changes in one tuple, we have inconsistency, as it must be

changed in all tuples corresponding to all movies and showtimes.

• If a movie stops playing, we lose association between Title and Director

• Cannot add a movie before it starts playing


Good design

• Split BAD into 3 relations:

Relational Schema GOOD:

Table   Attributes                     Constraints
T1      Theater, Address, Phone        FD1: Theater → Address, Phone
T2      Theater, Title, Time, Price    FD2: Theater, Time, Title → Price
T3      Title, Director                FD3: Title → Director

Why is GOOD good?

• No update anomalies:

Every FD defines a key

• No information loss:

T1 = $\pi_{\text{Theater, Address, Phone}}$(BAD)
T2 = $\pi_{\text{Theater, Title, Time, Price}}$(BAD)
T3 = $\pi_{\text{Title, Director}}$(BAD)
BAD = T1 $\bowtie$ T2 $\bowtie$ T3

• No constraints are lost

FD1, FD2, FD3 all appear as constraints for T1, T2, T3

Boyce-Codd Normal Form (BCNF)

• What causes update anomalies?

• Functional dependencies X → Y where X is not a key.

• A relation is in Boyce-Codd Normal Form (BCNF) if for every nontrivial FD X → Y, X is a key.

• A database is in BCNF if every relation in it is in BCNF.

Decompositions: Criteria for good design

Given a set of attributes U and a set F of functional dependencies, a decomposition of (U, F) is a set

$(U_1, F_1), \ldots, (U_n, F_n)$

where $U_i \subseteq U$ and $F_i$ is a set of FDs over attributes $U_i$.

A decomposition is called:

• BCNF decomposition if each $(U_i, F_i)$ is in BCNF.

Decompositions: Criteria for good design cont’d

• lossless if for every relation R over U that satisfies all FDs in F, each $\pi_{U_i}(R)$ satisfies $F_i$ and

$R = \pi_{U_1}(R) \bowtie \pi_{U_2}(R) \bowtie \ldots \bowtie \pi_{U_n}(R)$

• Dependency preserving if F and $F^* = \bigcup_i F_i$ are equivalent. That is, for every FD f,

$F \vdash f \Leftrightarrow F^* \vdash f$

or

$\forall X \subseteq U: \; C_F(X) = C_{F^*}(X)$
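The join equation in the losslessness condition can be tested on a concrete instance. The Python sketch below is an illustration, not the slides' method: the names `project`, `natural_join`, and `join_of_projections_equals` are assumptions, and the sample rows are hypothetical. It checks the equation for a single relation only, whereas losslessness requires it for every relation satisfying F.

```python
def project(rows, attrs):
    """Projection of dict rows, as a set of frozensets of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two relations in the representation produced by project()."""
    joined = set()
    for l_row in left:
        l_dict = dict(l_row)
        for r_row in right:
            # Two rows join if they agree on every attribute they share.
            if all(l_dict.get(a, v) == v for a, v in r_row):
                joined.add(l_row | r_row)
    return joined


def join_of_projections_equals(rows, schemas):
    """Check R = pi_U1(R) join ... join pi_Un(R) on this one instance."""
    all_attrs = set().union(*schemas)
    result = project(rows, schemas[0])
    for schema in schemas[1:]:
        result = natural_join(result, project(rows, schema))
    return result == project(rows, all_attrs)


# Hypothetical instance satisfying Title -> Director.
rows = [
    {"Title": "Alien", "Director": "Scott", "Theater": "Odeon", "Time": "20:00"},
    {"Title": "Alien", "Director": "Scott", "Theater": "Rex", "Time": "21:00"},
    {"Title": "Heat", "Director": "Mann", "Theater": "Odeon", "Time": "18:00"},
]
good = [["Title", "Director"], ["Title", "Theater", "Time"]]
bad = [["Title", "Director", "Theater"], ["Theater", "Time"]]
print(join_of_projections_equals(rows, good))  # True
print(join_of_projections_equals(rows, bad))   # False
```

The second decomposition joins on Theater, which is not a key of either part, so the join produces spurious tuples.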

Projecting FDs

• Let U be a set of attributes and F a set of FDs

• Let V ⊆ U

• Then

$\pi_V(F) = \{X \to Y \mid X, Y \subseteq V,\ Y \subseteq C_F(X)\}$

• In other words, these are all FDs on V that are implied by F:

$\pi_V(F) = \{X \to Y \mid X, Y \subseteq V,\ F \vdash X \to Y\}$

• Even though as defined $\pi_V(F)$ could be very large, it can often be compactly represented by a set $F'$ of FDs on V such that:

$\forall X, Y \subseteq V: \; F' \vdash X \to Y \Leftrightarrow F \vdash X \to Y$

Decomposition algorithm

Input: A set U of attributes and F of FDs
Output: A database schema $S = \{(U_1, F_1), \ldots, (U_n, F_n)\}$

1. Set $S := \{(U, F)\}$
2. While S is not in BCNF do:
   (a) Choose $(V, F') \in S$ not in BCNF
   (b) Choose nonempty disjoint X, Y, Z such that:
       (i) $X \cup Y \cup Z = V$
       (ii) $Y = C_{F'}(X) - X$
            (that is, $F' \vdash X \to Y$ and $F' \nvdash X \to A$ for all $A \in Z$)
   (c) Replace $(V, F')$ by
       $(X \cup Y, \pi_{X \cup Y}(F'))$ and $(X \cup Z, \pi_{X \cup Z}(F'))$
   (d) If there are $(V', F')$ and $(V'', F'')$ in S with $V' \subseteq V''$ then remove $(V', F')$
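The decomposition algorithm can be prototyped in a few dozen lines. The following Python sketch is one possible reading of the steps above, under stated assumptions: the helper names are invented for illustration, FDs are pairs of attribute sets, the projection of FDs is computed by brute force over subsets (exponential, so only suitable for small schemas), and the violating X is taken from the left-hand side of the first offending FD, so a different choice may give a different result.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds (pairs of iterables of attribute names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, attrs):
    """A cover of pi_V(F): one FD X -> (C_F(X) ∩ V) - X per proper subset X of V."""
    attrs = frozenset(attrs)
    projected = []
    for size in range(1, len(attrs)):
        for subset in combinations(sorted(attrs), size):
            x = frozenset(subset)
            y = (closure(x, fds) & attrs) - x
            if y:
                projected.append((x, frozenset(y)))
    return projected


def bcnf_violation(attrs, fds):
    """Return (X, Y) with Y = C_F(X) - X nonempty and X not a key, or None."""
    for lhs, _ in fds:
        x = frozenset(lhs)
        y = frozenset((closure(x, fds) & attrs) - x)
        if y and x | y != attrs:
            return x, y
    return None


def bcnf_decompose(attrs, fds):
    todo = [(frozenset(attrs), list(fds))]   # step 1
    done = {}
    while todo:                              # step 2
        v, f = todo.pop()
        violation = bcnf_violation(v, f)     # step (a)
        if violation is None:
            done[v] = f
            continue
        x, y = violation                     # step (b)
        z = v - x - y
        todo.append((x | y, project_fds(f, x | y)))   # step (c)
        todo.append((x | z, project_fds(f, x | z)))
    # step (d): drop schemas whose attribute set is contained in another one
    return [(v, f) for v, f in done.items() if not any(v < w for w in done)]


U = {"Title", "Director", "Theater", "Address", "Phone", "Time", "Price"}
F = [
    ({"Theater"}, {"Address", "Phone"}),         # FD1
    ({"Theater", "Time", "Title"}, {"Price"}),   # FD2
    ({"Title"}, {"Director"}),                   # FD3
]
schemas = {v for v, _ in bcnf_decompose(U, F)}
```

On the BAD schema this yields the attribute sets of T1, T2, and T3 from the GOOD design.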

Decomposition algorithm: example

Consider BAD[Title, Director, Theater, Address, Phone, Time, Price] and constraints FD1, FD2, FD3.

Initialization: S = (BAD, FD1, FD2, FD3)

While loop

• Step 1: Select BAD; it is not in BCNF because

Theater → Address, Phone
but Theater ↛ Title, Director, Time, Price

Our sets are:

X = {Theater}, Y = {Address, Phone},
Z = {Title, Director, Time, Price}

Decomposition algorithm: example cont’d

• Step 1, cont'd

$\pi_{X \cup Y}$({FD1, FD2, FD3}) = FD1
$\pi_{X \cup Z}$({FD1, FD2, FD3}) = {FD2, FD3}

Thus, after the first step we have two schemas:

$S_1$ = ({Theater, Address, Phone}, FD1)
$S_1'$ = ({Theater, Title, Director, Time, Price}, FD2, FD3)

• Step 2: $S_1$ is in BCNF, there is nothing to do.

$S_1'$ is not in BCNF: Title is not a key.

Let X = {Title}, Y = {Director}, Z = {Theater, Time, Price}

$\pi_{X \cup Y}$({FD1, FD2, FD3}) = FD3
$\pi_{X \cup Z}$({FD1, FD2, FD3}) = FD2

Decomposition algorithm: example cont’d

• After two steps, we have:

$S_1$ = ({Theater, Address, Phone}, FD1)
$S_2$ = ({Theater, Title, Time, Price}, FD2)
$S_3$ = ({Title, Director}, FD3)

• $S_1, S_2, S_3$ are all in BCNF; this completes the algorithm.

Properties of the decomposition algorithm

• For any relational schema, the decomposition algorithm yields a new

relational schema that is:

in BCNF, and

lossless

• However, the output is not guaranteed to be dependency preserving.


Surprises with BCNF

• Example: attributes Theater (th), Screen (sc), Title (tl), Snack (sn),

Price (pr)

• Two FDs

Theater, Screen → Title

Theater, Snack → Price

• Decomposition into BCNF: after one step

(th, sc, tl; th, sc → tl), (th, sc, sn, pr; th, sn → pr)

• After two steps:

(th, sc, tl; th, sc → tl), (th, sn, pr; th, sn → pr), (th, sc, sn; ∅)

• The last relation looks very unnatural!


Surprises with BCNF cont’d

• The extra relation (th, sc, sn; ∅) is not needed

• Why? Because it does not add any information

• All FDs are accounted for, so are all attributes, as well as all tuples

• The BCNF decomposition tells us that for any R satisfying the FDs:

$R = \pi_{th,sc,tl}(R) \bowtie \pi_{th,sn,pr}(R) \bowtie \pi_{th,sc,sn}(R)$

• However, in our case we also have

$R = \pi_{th,sc,tl}(R) \bowtie \pi_{th,sn,pr}(R)$

• This follows from the intuitive semantics of the data.

• But what is there in the schema to tell us that such an equation holds?

Multivalued dependencies

• Tell us when a relation is a join of two projections

• A multivalued dependency (MVD) is an expression of the form

X →→ Y

where X and Y are sets of attributes

• Given a relation R on the set of attributes U, MVD X →→ Y holds in it if

$R = \pi_{XY}(R) \bowtie \pi_{X(U-Y)}(R)$

• Simple property: X → Y implies X →→ Y

• Another definition: R satisfies X →→ Y if for every two tuples $t_1, t_2$ in R with $\pi_X(t_1) = \pi_X(t_2)$, there exists a tuple t in R with

$\pi_{XY}(t) = \pi_{XY}(t_1)$
$\pi_{X(U-Y)}(t) = \pi_{X(U-Y)}(t_2)$
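The tuple-based definition of an MVD gives a direct test on a relation instance. The Python sketch below is an illustration with an assumed function name (`satisfies_mvd`) and hypothetical theater data; since XY and X(U−Y) together cover all of U, the required tuple t is fully determined by $t_1$ and $t_2$, so the test reduces to a membership lookup.

```python
def satisfies_mvd(rows, attrs, x, y):
    """Check X ->> Y on rows (a list of dicts over attrs), tuple-based definition."""
    x, y = set(x), set(y)
    xy = sorted(x | y)                    # attributes XY
    rest = sorted(x | (set(attrs) - y))   # attributes X(U - Y)
    # Every row, represented by its two projections.
    present = {
        (tuple(r[a] for a in xy), tuple(r[a] for a in rest)) for r in rows
    }
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # The tuple t must agree with t1 on XY and with t2 on X(U - Y).
                wanted = (tuple(t1[a] for a in xy), tuple(t2[a] for a in rest))
                if wanted not in present:
                    return False
    return True


# Hypothetical instance: every screen of theater T1 is paired with every snack.
rows = [
    {"th": "T1", "sc": 1, "sn": "popcorn"},
    {"th": "T1", "sc": 2, "sn": "popcorn"},
    {"th": "T1", "sc": 1, "sn": "soda"},
    {"th": "T1", "sc": 2, "sn": "soda"},
]
print(satisfies_mvd(rows, ["th", "sc", "sn"], ["th"], ["sc"]))      # True
print(satisfies_mvd(rows[:3], ["th", "sc", "sn"], ["th"], ["sc"]))  # False
```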

MVDs and decomposition

• If a relation satisfies a nontrivial MVD X →→ Y and X is not a key, it should be decomposed into relations with attribute sets XY and X(U − Y).

• Returning to the example, assume that we have a MVD

Theater →→ Screen

• Apply this to the “bad” schema (th, sc, sn; ∅) and obtain

(th, sc; ∅) (th, sn; ∅)

• Both are subsets of already produced schemas and can be eliminated.

• Thus, the FDs plus Theater →→ Screen give rise to decomposition

(th, sc, tl; th, sc → tl), (th, sn, pr; th, sn → pr)


4th Normal Form (4NF)

• Decompositions like the one just produced are called 4th Normal Form

decompositions

• They can only be produced in the presence of MVDs

• Formally, a relation is in 4NF if for any nontrivial MVD X →→ Y ,

either X ∪ Y = U, or X is a key.

• Since X → Y implies X →→ Y , 4NF implies BCNF

• General rule: if BCNF decomposition has “unnatural relations”, try to

use MVDs to decompose further into 4NF

• A useful rule: any BCNF schema that has one key that consists of a

single attribute, is in 4NF

• Warning: the outcome of a decomposition algorithm may depend on

the order in which constraints are considered.


BCNF and dependency preservation

• Schema Lectures[C(lass), P(rofessor), T(ime)]

• Set of FDs F = {C → P, PT → C}

• (Lectures, F) not in BCNF: C → P ∈ F, but C is not a key.

• Apply BCNF decomposition algorithm: X = {C}, Y = {P}, Z = {T}

• Output: ({C, P}, C → P), ({C, T}, ∅)

• We lose PT → C!

• In fact, there is no relational schema in BCNF that is equivalent to Lectures and is lossless and dependency preserving.

• Proof: there are just a few schemas on 3 attributes ... check them all ... exercise!

Third Normal Form (3NF)

• If we want a decomposition that guarantees losslessness and dependency preservation, we cannot always have BCNF

• Thus we need a slightly weaker condition

• Recall: a candidate key is a key none of whose proper subsets is a key

• An attribute is prime if it belongs to a candidate key

• Recall (U, F) is in BCNF if for any FD X → A, where A ∉ X is an attribute, $F \vdash X \to A$ implies that X is a key

• (U, F) is in the third normal form (3NF) if for any FD X → A, where A ∉ X is an attribute, $F \vdash X \to A$ implies that one of the following is true:

- either X is a key, or
- A is prime

Third Normal Form (3NF) cont’d

• Main difference between BCNF and 3NF: in 3NF non-key FDs are OK, as long as they imply only prime attributes

• Lectures[C, P, T], F = {C → P, PT → C} is in 3NF:

• PT is a candidate key, so P is a prime attribute.

• More redundancy than in BCNF: each time a class appears in a tuple, the professor's name is repeated.

• We tolerate this redundancy because there is no BCNF decomposition.

Decomposition into third normal form: covers

• Decomposition algorithm needs a small set that represents all FDs in a set F

• Such a set is called a minimal cover. Formally:

• $F'$ is a cover of F iff $C_F = C_{F'}$. That is, for every f,

$F \vdash f \Leftrightarrow F' \vdash f$

• $F'$ is a minimal cover if

$F'$ is a cover,
no subset $F'' \subset F'$ is a cover of F,
each FD in $F'$ is of the form X → A, where A is an attribute,
for X → A ∈ $F'$ and $X' \subset X$, $A \notin C_F(X')$.

Decomposition into third normal form: covers cont’d

• A minimal cover is a small set of FDs that gives us all the same information as F.

• Example: let

F = {A → AB, A → AC, A → B, A → C, B → BC}

• Closure:

$C_F(A) = ABC$, $C_F(B) = BC$, $C_F(C) = C$, and hence:

• Minimal cover: A → B, B → C
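One standard way to compute a minimal cover, which the slides do not spell out, is to split right-hand sides, shrink left-hand sides, and then drop redundant FDs. The Python sketch below follows that three-step procedure; the names `closure` and `minimal_cover` and the representation of the result as (left-hand side, attribute) pairs are assumptions for illustration. A minimal cover is generally not unique, so the result depends on the order in which FDs are examined.

```python
def closure(attrs, fds):
    """Closure of attrs under fds (pairs of iterables of attribute names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover of fds as a list of (frozenset lhs, attribute) pairs."""
    # Step 1: one attribute per right-hand side, dropping trivial FDs.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):
            fd = (frozenset(lhs), a)
            if fd not in cover:
                cover.append(fd)

    # Step 2: remove attributes from a left-hand side while A stays derivable.
    reduced = []
    for lhs, a in cover:
        for b in sorted(lhs):
            if a in closure(lhs - {b}, fds):
                lhs = lhs - {b}
        if (lhs, a) not in reduced:
            reduced.append((lhs, a))

    # Step 3: drop every FD that is implied by the remaining ones.
    result = list(reduced)
    for fd in reduced:
        others = [f for f in result if f != fd]
        lhs, a = fd
        if a in closure(lhs, [(l, {r}) for l, r in others]):
            result = others
    return result


F = [("A", "AB"), ("A", "AC"), ("A", "B"), ("A", "C"), ("B", "BC")]
print(minimal_cover(F))  # the FDs A -> B and B -> C
```

Each step only calls `closure`, so the whole procedure runs in time polynomial in the size of F.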

Decomposition into third normal form: algorithm

Input: A set U of attributes and F of FDs
Output: A database schema $S = \{(U_1, F_1), \ldots, (U_n, F_n)\}$

Step 1. Find a minimal cover $F'$ of F.

Step 2. If there is X → A in $F'$ with XA = U, then output $(U, F')$. Otherwise, select a key K, and output:

2a) (XA, X → A) for each X → A in $F'$, and
2b) (K, ∅)

Decomposition into third normal form: example

• F = {A → AB, A → AC, A → B, A → C, B → BC}

• $F'$ = {A → B, B → C}

• A is a key, so output:

(A, ∅), (AB, A → B), (BC, B → C)

• Simplification: if one of the attribute sets produced in 2a) is a key, then step 2b) is not needed

• Hence the result is:

(AB, A → B), (BC, B → C)

• Other potential simplifications: if we have $(XA_1, X \to A_1), \ldots, (XA_k, X \to A_k)$ in the output, they can be replaced by

$(XA_1 \ldots A_k, X \to A_1 \ldots A_k)$
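Given a minimal cover, Step 2 of the 3NF algorithm together with the first simplification above can be sketched as follows. This is an illustrative Python sketch under assumptions: the function names are invented, the minimal cover is passed in as (left-hand side, attribute) pairs, and the key K is found by greedily removing attributes from U, which is only one of several possible choices.

```python
def closure(attrs, fds):
    """Closure of attrs under fds (pairs of iterables of attribute names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(universe, fds):
    """Shrink U greedily: drop an attribute whenever the rest is still a key."""
    key = set(universe)
    for a in sorted(universe):
        if closure(key - {a}, fds) == set(universe):
            key.discard(a)
    return frozenset(key)


def synthesize_3nf(universe, cover):
    """Step 2 of the 3NF algorithm; cover is a minimal cover of (lhs, attribute) pairs."""
    universe = frozenset(universe)
    fds = [(lhs, {a}) for lhs, a in cover]
    # If some X -> A already spans all attributes, output (U, F') unchanged.
    for lhs, a in cover:
        if frozenset(lhs) | {a} == universe:
            return [(universe, list(cover))]
    # 2a) one schema (XA, X -> A) per FD in the cover.
    schemas = [(frozenset(lhs) | {a}, [(lhs, a)]) for lhs, a in cover]
    # 2b) add (K, ∅) unless some schema from 2a) is already a key (simplification).
    if not any(closure(attrs, fds) == universe for attrs, _ in schemas):
        schemas.append((find_key(universe, fds), []))
    return schemas


cover = [(frozenset("A"), "B"), (frozenset("B"), "C")]
result = [attrs for attrs, _ in synthesize_3nf("ABC", cover)]
```

For the example above, `result` contains the attribute sets AB and BC, and no separate key schema is added because AB is already a key.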

3NF cont’d

• Properties of the decomposition algorithm: it produces a schema which is

in 3NF,
lossless, and
dependency preserving

• Complexity of the algorithm?

• Given a minimal cover $F'$, it is linear time. (Why?)

• But how hard is it to find a minimal cover?

• Naive approach: try all sets $F'$, check if they are minimal covers.

• This is too expensive (exponential); can one do better?

• There is a polynomial-time algorithm. How?

Overview of schema design

• Choose the set of attributes U

• Choose the set of FDs F

• Find a lossless dependency preserving decomposition into:

BCNF, if it exists

3NF, if BCNF decomposition cannot be found


Other constraints

• SQL lets you specify a variety of other constraints:

Local (refer to a tuple)
Global (refer to tables)

• These constraints are enforced by a DBMS, that is, they are checked when a database is modified.

• Local constraints occur in the CREATE TABLE statement and use the keyword CHECK after the attribute declaration:

CREATE TABLE T (...
    ....
    A <type> CHECK <condition>,
    B <type> CHECK <condition>,
    ....
)

Local constraints

• Example: the value of attribute Rank must be between 1 and 5:

Rank INT CHECK (1 <= Rank AND Rank <= 5)

• Example: the value of attribute A must be less than 10, and occur as a value of attribute C of relation R:

A INT CHECK (A IN
    (SELECT R.C FROM R WHERE R.C < 10))

• Example: each value of attribute Name occurs precisely once as a value of attribute LastName in relation S:

Name VARCHAR(20) CHECK (1 = (SELECT COUNT(*)
                             FROM S
                             WHERE Name = S.LastName))
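The first local constraint above can be tried out with SQLite through Python's standard `sqlite3` module. This is a small illustrative sketch: the table `Empl` with a `Name` column is hypothetical, and only the Rank example is used because SQLite does not accept subqueries inside CHECK constraints, so the other two examples would need a different DBMS or a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table carrying the Rank constraint from the first example.
conn.execute(
    "CREATE TABLE Empl (Name TEXT, Rank INT CHECK (1 <= Rank AND Rank <= 5))"
)


def try_insert(name, rank):
    """Insert a row; return False if the DBMS rejects it because of the CHECK."""
    try:
        conn.execute("INSERT INTO Empl VALUES (?, ?)", (name, rank))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Jones", 3))  # True
print(try_insert("White", 6))  # False: violates 1 <= Rank AND Rank <= 5
```

The rejected row never reaches the table, which is the sense in which the DBMS enforces the constraint on every modification.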

Assertions

• These assert some conditions that must be satisfied by the whole database.

• Typically assertions say that all elements in a database satisfy a certain condition.

• All salaries are at least 10K:

CREATE ASSERTION A1 CHECK
    ( (SELECT MIN(Empl.Salary) FROM Empl) >= 10000 )

• Most assertions use NOT EXISTS in them: the SQL way of saying

"every x satisfies F"

is to say

"there does not exist an x that violates F"

Assertions cont’d

• Example: all employees of department 'sales' have salary at least 20K:

CREATE ASSERTION A2 CHECK
    ( NOT EXISTS (SELECT * FROM Empl
                  WHERE Dept = 'sales' AND Salary < 20000) )

• All theaters play movies that are at most 3hrs long:

CREATE ASSERTION A3 CHECK
    ( NOT EXISTS (SELECT * FROM Schedule S, Movies M
                  WHERE S.Title = M.Title AND M.Length > 180) )

Assertions cont’d

• Some assertions use counting. For example, to ensure that table T is never empty, use:

CREATE ASSERTION T_notempty CHECK
    ( 0 <> (SELECT COUNT(*) FROM T) )

• Assertions are not forever: they can be created, and dropped later:

DROP ASSERTION T_notempty

Triggers

• They specify a set of actions to be taken if certain event(s) took place.

• Follow the event-condition-action scheme.

• Less declarative and more procedural than assertions.

• Example: If an attempt is made to change the length of a movie, it should not go through.

CREATE TRIGGER NoLengthUpdate
AFTER UPDATE OF Length ON Movies
REFERENCING
    OLD ROW AS OldTuple
    NEW ROW AS NewTuple
FOR EACH ROW
WHEN (OldTuple.Length <> NewTuple.Length)
    UPDATE Movies
    SET Length = OldTuple.Length
    WHERE title = NewTuple.title

Analysis of the trigger

• AFTER UPDATE OF Length ON Movies specifies the event: relation Movies was modified, and attribute Length changed its value.

• REFERENCING
    OLD ROW AS OldTuple
    NEW ROW AS NewTuple
says how we refer to tuples before and after the update.

• FOR EACH ROW: the trigger is executed once for each updated row.

• WHEN (OldTuple.Length <> NewTuple.Length): condition for the trigger to be executed (Length changed its value).

• SET Length = OldTuple.Length
  WHERE title = NewTuple.title
is the action: Length must be restored to its previous value.
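A similar effect can be reproduced in SQLite, whose trigger dialect differs from the SQL standard syntax shown above. The sketch below is an adaptation rather than a translation: instead of restoring the old value after the update, it uses a BEFORE trigger with `OLD` and `NEW` and SQLite's `RAISE(ABORT, ...)` to reject the update outright. The sample movie row and the helper function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (title TEXT PRIMARY KEY, Length INT);
INSERT INTO Movies VALUES ('Alien', 117);

-- SQLite dialect: OLD/NEW replace the REFERENCING clause, and the action
-- aborts the statement instead of writing the old value back.
CREATE TRIGGER NoLengthUpdate
BEFORE UPDATE OF Length ON Movies
FOR EACH ROW
WHEN OLD.Length <> NEW.Length
BEGIN
    SELECT RAISE(ABORT, 'Length may not be changed');
END;
""")


def try_update(title, length):
    """Attempt to change a movie's length; return False if the trigger blocks it."""
    try:
        conn.execute("UPDATE Movies SET Length = ? WHERE title = ?", (length, title))
        return True
    except sqlite3.DatabaseError:
        return False


def length_of(title):
    row = conn.execute("SELECT Length FROM Movies WHERE title = ?", (title,)).fetchone()
    return row[0]


print(try_update("Alien", 120))  # False: the trigger fires
print(length_of("Alien"))        # 117
```

Either way the event-condition-action structure is the same: the event is the update of Length, the condition compares old and new values, and the action undoes or blocks the change.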

Another trigger example

• Table Empl(emp_id,rank,salary)

• Requirement: the average salary of managers should never go below $100,000.

• Problem: suppose there is a complicated update operation that affects many tuples. It may initially decrease the average, and then increase it again. Thus, we want the trigger to run after all the statements of the update operation have been executed.

• Hence, a FOR EACH ROW trigger cannot work here.

Another trigger example cont’d

Trigger for the previous example:

CREATE TRIGGER MaintainAvgSal
AFTER UPDATE OF Salary ON Empl
REFERENCING
    OLD TABLE AS OldTuples
    NEW TABLE AS NewTuples
FOR EACH STATEMENT
WHEN ( 100000 > (SELECT AVG(Salary)
                 FROM Empl WHERE Rank = 'manager') )
BEGIN
    DELETE FROM Empl
    WHERE (emp_id, rank, salary) IN NewTuples;
    INSERT INTO Empl
    (SELECT * FROM OldTuples);
END;

Analysis of the trigger

• AFTER UPDATE OF Salary ON Empl specifies the event: relation Empl was modified, and attribute Salary changed its value.

• REFERENCING
    OLD TABLE AS OldTuples
    NEW TABLE AS NewTuples
says that we refer to the set of tuples that were inserted into Empl as NewTuples and to the set of tuples that were deleted from Empl as OldTuples.

• FOR EACH STATEMENT: the trigger is executed once for the entire update operation, not once for each updated row.

• WHEN ( 100000 > (SELECT AVG(Salary)
                   FROM Empl WHERE Rank = 'manager') )
is the condition for the trigger to be executed: after the entire update, the average Salary of managers is less than $100K.

Analysis of the trigger cont’d

• BEGIN

DELETE FROM Empl

WHERE (emp_id, rank, salary) in NewTuples;

INSERT INTO Empl

(SELECT * FROM OldTuples)

END;

is the action, that consists of two updates between BEGIN and END.

• DELETE FROM Empl

WHERE (emp_id, rank, salary) in NewTuples;

deletes all the tuples that were inserted in the illegal update, and

• INSERT INTO Empl

(SELECT * FROM OldTuples)

re-inserts all the tuples that were deleted in that update.


Conceptual modeling

• How does one come up with a set of attributes and FDs?

• Often by using conceptual design techniques, typically represented by

entity-relationship models

• Database describes objects (entities), relationships between them, and

their attributes.

• Goal: describe entities, relationships, and attributes based on a real-world situation that needs to be modeled by a database, find a relational database design, and then apply normalization techniques.

• Steps:

describe entities
describe relationships between them
describe attributes
translate into relational model

[Figure: ER diagram. Entities: STUDENT, COURSE, EMPLOYEE, PROJECT]

[Figure: ER diagram. Relationships: EXAM and WORKS, drawn among the entities STUDENT, COURSE, EMPLOYEE, PROJECT]

[Figure: ER diagram. Attributes: STUDENT (StudId, Name, Major), COURSE (Course#, Name, Enrollment), EXAM (Date)]

Translating into relational model: rules

• Each entity $E_i$ is described by a relation $T_i$

• Key attributes of $E_i$ form a key of $T_i$

• Each relationship $R_j$ is described by a relation $S_j$

• For a relationship $R_j$ between $E_1, \ldots, E_k$, the key of $S_j$ is the set of all attributes which are keys for $T_1, \ldots, T_k$

• Sometimes attributes have to be renamed

[Figure: ER diagram with STUDENT (StudId, Name, Major), COURSE (Course#, Name, Enrollment), and the relationship EXAM (Date)]

TRANSLATION INTO RELATIONS

Student(StudId, Name, Major)
Course(Course#, Name, Enrollment)
Exam(StudId, Course#, Date)

Database design in SQL

CREATE TABLE Student

(StudId char(10) not null primary key,

Name char(20), Major char(5))

CREATE TABLE Course

(Course# int not null primary key,

Name char(30), Enrollment int)

CREATE TABLE Exam

(Student char(10) not null,

Course int not null,

Date int,

unique (Student, Course))

343 66 Introduction to Databases

Problems • Good constraints vs Bad constraints: some constraints may be undesirable as they lead to problems. • Implication problem: Suppose we are given some constraints. Do they imply others? This is important, as we never get the list of all constraints that hold in a database (such a list could be very large, or just unknown to the designer). It is possible that all constraints given to us look OK, but they imply some bad constraints. • Axiomatization of constraints: a simple way to state when some constraints imply others.

343

3

Introduction to Databases

Bad constraints: example DME Dept Manager Employee D1 Smith Jones D1 Smith Brown D2 Turner White D3 Smith Taylor D4 Smith Clarke ... ... ... • Functional dependencies • in DME: Dept → Manager, but none of the following: Dept → Employee Manager → Dept Manager → Employee • in ES: Employee → Salary (Employee is a key for ES) but not Salary → Employee

343 4 Introduction to Databases

ES Employee Salary Jones 10 Brown 20 White 20 ... ...

