
CPSC 471 FINAL EXAM NOTES

ENTITY RELATION DIAGRAMS & RELATIONAL MODELS
ERD NOTATION
RELATIONAL MODEL CONSTRAINTS

1. INHERENT OR IMPLICIT CONSTRAINTS


• These are based on the data model itself.
• E.g., relational model does not allow a list as a value for any attribute

2. SCHEMA-BASED OR EXPLICIT CONSTRAINTS


• They are expressed in the schema by using the facilities provided by the
model.
• E.g., max. cardinality ratio constraint in the ER model

3. APPLICATION BASED OR SEMANTIC CONSTRAINTS


These are beyond the expressive power of the model and must be
specified and enforced by the application programs.

FUNDAMENTAL DBMS OPERATIONS


• INSERT
• DELETE
• MODIFY or UPDATE

INSERT OPERATION: CONSTRAINT VIOLATIONS

DOMAIN CONSTRAINT
• if one of the attribute values provided for the new tuple is not of the specified
attribute domain

KEY CONSTRAINT
• if the value of a key attribute in the new tuple already exists in another tuple in the
relation

REFERENTIAL INTEGRITY
• if a foreign key value in the new tuple references a primary key value that does not
exist in the referenced relation

ENTITY INTEGRITY
• if the primary key value is NULL in the new tuple
DELETE OPERATION: CONSTRAINT VIOLATIONS

DELETE may violate only referential integrity, and only if the primary key value of
the tuple being deleted is referenced from other tuples in the database

• Can be remedied by several actions: RESTRICT, CASCADE, SET NULL

• RESTRICT option: reject the deletion

• CASCADE option: propagate the deletion to the referencing tuples (they are
deleted as well)

• SET NULL option: set the foreign keys of the referencing tuples to NULL

• One of the above options must be specified during database design for each
foreign key constraint
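
The three delete options can be tried out in a small sketch. It assumes SQLite (through Python's built-in `sqlite3` module) as the DBMS; the tables `dept` and `emp` and the helper `delete_referenced_row` are made up for illustration and are not part of the course schema.

```python
import sqlite3

def delete_referenced_row(action):
    """Delete a referenced dept row under the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE emp (id INTEGER PRIMARY KEY, "
        f"dept_id INTEGER REFERENCES dept(id) ON DELETE {action})"
    )
    conn.execute("INSERT INTO dept VALUES (1)")
    conn.execute("INSERT INTO emp VALUES (10, 1)")  # emp 10 references dept 1
    try:
        conn.execute("DELETE FROM dept WHERE id = 1")
    except sqlite3.IntegrityError:
        return "rejected"  # RESTRICT: the deletion itself is refused
    # Otherwise report what happened to the referencing tuple
    return conn.execute("SELECT id, dept_id FROM emp").fetchall()

for action in ("RESTRICT", "CASCADE", "SET NULL"):
    print(action, "->", delete_referenced_row(action))
```

With CASCADE the referencing `emp` row disappears along with the `dept` row, and with SET NULL it survives with a NULL foreign key.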

UPDATE OPERATION: CONSTRAINT VIOLATIONS


UPDATE can violate all constraints, depending on the attribute being modified

PRIMARY KEY UPDATE


• Similar to DELETE followed by INSERT
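• Need to set one of: RESTRICT (reject), CASCADE, or SET NULL
• Here CASCADE propagates the new primary key value into the foreign keys of the
referencing tuples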

FOREIGN KEY UPDATE


• May violate referential integrity

NORMAL ATTRIBUTE UPDATE (Neither PK nor FK):

• Can only violate domain constraints


o This includes the NOT NULL or UNIQUE constraint that might be applied to a
field
Glossary / Translations:

Informal Terms                  Formal Terms

Table                           Relation
Column Header                   Attribute
All possible Column Values      Domain
Row                             Tuple
Table Definition                Schema of a Relation
Populated Table                 State of the Relation

Relational Database State:


An instance of a database: a snapshot of the way it is at the current time. Is the
union of all individual relation states.
• The state of a database changes whenever it is modified by INSERT,
UPDATE, or DELETE

Relation State:
An instance of a relation within a larger relational database state. Denoted as
foo(S) where S is the state itself, and foo is the name of the table.

Superkey: A set of attributes SK whose combined values are unique for every tuple of a relation R:


For all tuples in R, if t1[SK] = t2[SK] then t1 = t2 (no tuples share the same
superkey).
This must hold for all valid relation states

Key: A ‘minimal’ superkey K such that removal of any attribute from K breaks the
uniqueness property of the relation (K is no longer a superkey)

ALL KEYS ARE SUPERKEYS BUT NOT VICE VERSA
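
The superkey and key definitions can be checked against a single relation state with a short sketch. The sample `members` state and the function names are hypothetical. Keep in mind that one state can only refute a superkey; the definition requires the property to hold in every valid state.

```python
def is_superkey(state, attrs):
    """True if no two tuples of this state agree on all of attrs."""
    seen = set()
    for t in state:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False  # two tuples share the same SK value
        seen.add(value)
    return True

def is_key(state, attrs):
    """A key is a superkey that stops being one if any attribute is removed."""
    if not is_superkey(state, attrs):
        return False
    return all(
        not is_superkey(state, [b for b in attrs if b != a]) for a in attrs
    )

# Hypothetical relation state, one dict per tuple
members = [
    {"email": "a@x.org", "fname": "Ann", "lname": "Lee"},
    {"email": "b@x.org", "fname": "Ann", "lname": "Roy"},
    {"email": "c@x.org", "fname": "Bob", "lname": "Lee"},
]

print(is_superkey(members, ["email", "fname"]))  # superkey, but not minimal
print(is_key(members, ["email", "fname"]))
print(is_key(members, ["email"]))
```

In this particular state `(fname, lname)` also passes `is_key`, which shows why keys have to be declared from the meaning of the data rather than read off one snapshot.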


MAPPING OF ENTITY RELATION DIAGRAMS TO RELATIONAL MODELS
Entity Relation Diagrams & Relational Models
Construction:
• ERDs / EERDs:
o Start with Strong Entities
o Move onto weak entities
o Map any specializations / generalizations
§ Disjoint OR Overlapping
• Partial
• Full
o Write out Binary (1:1) Relationships
§ Weak entities
§ Full participation
§ Partial participation
o Write out Binary (1:N) Relationships
§ Weak entities
§ Full participation
§ Partial participation

o Write out Binary (M:N) Relationships


§ Full participation
§ Partial participation
o Write out N-ary(M:N:Q) Relationships
§ Full participation
§ Partial participation
o Add union types at the very end (these are usually not helpful)

Relational Models / Mapping:


• Mapping is the conversion between ERDs and RMs
• Aim for the minimum necessary redundancy
1) Convert all strong entity types to relational entities with their respective primary
keys underlined
a. If a primary key is composite, preserve this and make it the primary key of
the relation
2) Convert all weak entities into relational entities. Any partial key attributes
combine with foreign key attributes to form the primary key of weak entities.
3) Translate Binary (1:1) Relations into relational entities

This splits into 3 cases, depending on the participation of entities A and B


in the relation R
• It might be appropriate to merge the entities into one relational schema (the
table-like thing)
• A good example would be merging the attributes of an organization member
and their organization for the relation “head” (think like Alexis)
• In this case, pick a side. Attributes that don’t natively belong to the original
entity become candidate keys (they’re unique too)

• If it is a partial participation on one side, i.e. (0…1) : 1, make the partial (left)
side’s primary key a foreign key of the full (right) side
• Still make two separate relational entities, but connect them like above
• Shift any attributes of the relation itself to the full side.

• If there exists a relation R between two entities A and B such that R is a


partial participation on both sides [(0…1) : (0…1)], then make a new
“Relationship” entity (let’s call this C) to represent this connection between A
and B
• C = (A.PKey, B.PKey, [Attributes of R itself])
• The primary key of this new relationship entity is the combination of A and B’s
primary keys
• The separate primary keys of A and B, as seen within C would be a “foreign
key constraint” ➝ they can be used to connect an instance of C with an
instance of A OR an instance of B separately, but cannot uniquely identify C
independently

4) Map any 1 : N relations
a. “Follow the traffic” ➝ add the primary key of A as a foreign key of B if 1x A
is related to Nx B
b. Do the same with relational attributes
5) Map any N to N Binary relations
a. Create a brand-new ‘relationship entity’ containing the primary keys of A
AND B
i. Separately, A.Pkey and B.PKey relate the new ‘relationship’ entity to A
and B respectively, but can’t identify this new entity alone
ii. Together, (A.PK, B.PK) in C (the new entity we created from the
relationship) are the PRIMARY key of C
iii. EXAMPLE Invitation.PK = (M.Email, E.EventID)
iv. Any relationship attributes (initiation date for example) STAY with
this new relation (because we can’t tie them to any single instance of
“Member” or “Event”)
Extensions of Step 5:

6) Multivalued-Attributes:
a. If an attribute is multi-valued (Vehicle color for example), apply a similar
idea to step (5)
b. Create a new entity in the model to represent this multi-valued attribute,
and add the containing entity’s primary key (Vehicle plate number, for
example), to this new relation
c. The primary key of multi-valued attribute relations is:
(Parent.PK, MV_Attribute_Value)

7) N-ary Relations
a. Another ‘extension’ of step (5)

[Building some context for our example]:

Suppose an N-Ary Relation Rel exists between entities A, B, and C

That is, Rel[A,B,C]. Suppose the primary keys of A, B and C are (respectively),
A.PK, B.PK, C.PK.

What to do:
b. Combine A.PK, B.PK, C.PK, into a new relationship entity called “Rel”
c. The primary key of ‘Rel’ is (A.PK, B.PK, C.PK).
d. Add any attributes of Rel(A,B,C) (bubbles attached to the relationship in
the ERD), to the relationship entity ‘Rel’.
e. A.PK, B.PK, C.PK are all foreign keys of Rel

(Continued)
HARDEST PARTS: GENERALIZATION AND SPECIALIZATIONS

Specializations implicitly contain the inherited attributes of their parent classes:

Using your final project relational schema

‘Admin’ doesn’t need Fname, Lname, Gender, OrgID.

8) Map Specializations and generalizations:

a. For any type of specialization (disjoint or overlapping, partial or full),


create new entities to represent the subclasses of the parent class, and
add the parent’s primary key to each of the subclass relational entities.
i. This is a pretty good approach and you should do this by default
OR

(For subclasses with total participation ONLY)


b. For each subclass Ti of a superclass S:
i. Create a new relation Vi
ii. Give Vi all of the attributes of S, PLUS the attributes of Ti
• Attrs(Vi) = {s1, s2, … sn} ∪ {t1, t2, … tm} OR
• Attrs(Vi) = Attrs(S) ∪ Attrs(Ti)
iii. The primary key of S is the primary key of Vi
c. Other approaches to this exist, but this approach is universal, and difficult to
screw up.

What about multiple inheritance?


In the event of multiple inheritance, repeat step 8 for all sub classes, but make
sure that the same primary key is used throughout the entire ‘relational chain’
(since it inherits from multiple parents)

9) UNION TYPES:
a. Create a new relational entity in the model to represent the union, but add
a surrogate (artificial) key to the relation to make it uniquely identifiable
b. NOTE: the surrogate key doesn’t really exist in the original schema.
It’s purely for convenience.
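
Step 5 (an M:N relationship becoming its own relation) could look like the following sketch. It assumes SQLite through Python's `sqlite3` module; the column names and the helper `try_insert` are hypothetical, loosely following the Invitation example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign keys below
conn.executescript("""
CREATE TABLE Member (Email TEXT PRIMARY KEY);
CREATE TABLE Event  (EventID INTEGER PRIMARY KEY);
-- The relationship relation: both parent keys are foreign keys,
-- and together they form the primary key.
CREATE TABLE Invitation (
    Email      TEXT    REFERENCES Member(Email),
    EventID    INTEGER REFERENCES Event(EventID),
    InviteDate TEXT,   -- relationship attribute stays with the relationship
    PRIMARY KEY (Email, EventID)
);
INSERT INTO Member VALUES ('a@x.org'), ('b@x.org');
INSERT INTO Event VALUES (1), (2);
INSERT INTO Invitation VALUES ('a@x.org', 1, '2024-01-01');
""")

def try_insert(row):
    """Return True if the invitation row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Invitation VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(("b@x.org", 1, "2024-02-01")))       # new (Email, EventID) pair
print(try_insert(("a@x.org", 1, "2024-02-01")))       # repeats an existing pair
print(try_insert(("nobody@x.org", 2, "2024-02-01")))  # no such member
```

The second insert fails on the composite primary key and the third on referential integrity, matching the constraint list at the start of these notes.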
HANDLING COMPOSITE ATTRIBUTES:
1) Anywhere that a composite attribute occurs, break it up into its simpler
components (eg: ‘Name’ ➝ Fname + Lname).
2) In the case of a composite multi-valued attribute, apply step 6.

RELATIONAL MODEL EQUIVALENCE:


1) Two relational models are equivalent if they have the same domains for all
relations
2) In simplest terms, if one relational model is another relational model but with
slightly different attribute names, then it is equivalent
RELATIONAL ALGEBRA & CALCULUS
THE COMPLETE SET OF RELATIONAL OPERATIONS
*Note that the minimal set of relational operations consists of the following:
Project, Select, Rename, Cartesian Product, Union, and Difference

PROJECT (π):
π S(R)
• Acts on COLUMNS
• S is a subset of the attributes of R
• Not commutative
• Not associative
• Not distributive
• Equivalent to the attribute list of SELECT DISTINCT in SQL (π removes
duplicate tuples; a plain SELECT does not)

SELECT (σ):
σ(Condition(s))(R)

• Acts on ROWS
• Selects only instances of R that satisfy the logical condition(s)
• Is commutative: σ(c1)(σ(c2)(R)) = σ(c2)(σ(c1)(R))
• A cascade of selects can be merged with AND: σ(c1)(σ(c2)(R)) = σ(c1 AND c2)(R)
• Equivalent to WHERE in SQL

RENAME ρ
Given the relation R(A1, A2, … An):

(1) ρ(B1, B2, … B(n))(R)


(2) ρS(R)
(3) ρS(B1, B2, … B(n))(R)

• (1) Renames attributes Ai → Bi (for i = 1 … n)


• (2) Renames R to S
• (3) Does both (1) and (2)
• Equivalent to AS in SQL

UNION ∪
DIFFERENCE –
CARTESIAN PRODUCT ×
Making Relational Algebra Expressions
• Combine operations into one gigantic, illegible expression
• Break into smaller intermediate expressions
• Equivalent to the WITH operator in SQL
• intermediate results must be named
• Use the ‘←’ symbol instead of the ‘=’ sign:
• Bar ← π (σ(Foo))
• Bar ← σ (Foo)
• Quux ← π(Bar)

RELATIONAL ALGEBRA: SET THEORY OPERATORS


UNIONS
• Two relations are union-compatible if they have the same number of
attributes and the same data type in each column.
• A union of tuples merges the set of tuples, not the set of attributes.
• IS ASSOCIATIVE ➝ (A ∪ B) ∪ C = A ∪ (B ∪ C)
• IS COMMUTATIVE ➝ A ∪ B = B ∪ A

SQL Equivalent: “UNION” Clause

INTERSECTIONS

• Is the set of all shared elements of A and B:


Given: A = {A1, A2, A3, C1, C2, C3}, B = {B1, B2, B3, C1, C2, C3};
The set A ∩ B = {C1, C2, C3}
• IS ASSOCIATIVE: X ∩ (Y ∩ Z) = (X ∩ Y) ∩ Z
• IS COMMUTATIVE: X ∩ Y = Y ∩ X
If there are no shared tuples, the resulting set is the empty set (∅)

SQL Equivalent: INTERSECT Clause

DIFFERENCE
• The set A – B is the set A without any elements of set B
• SQL Equivalent: EXCEPT Clause
• Is NOT commutative
• Is NOT associative
• If A and B share no elements (A ∩ B = ∅), then A – B = A
Cartesian Product:

Extends the domain of a relation by ‘adding’ a new set of attributes. Lists every possible
combination of ‘merged tuples’.

Given: R1 {(A1, B1, C1), (A2, B2, C2)} and R2 {(D1, E1, F1), (D2, E2, F2)}

R1 × R2 = {
(A1, B1, C1, D1, E1, F1)
(A1, B1, C1, D2, E2, F2)
(A2, B2, C2, D1, E1, F1)
(A2, B2, C2, D2, E2, F2)}

SQL Equivalent: CROSS JOIN (or listing both tables in the FROM clause, separated
by a comma, with no join condition)


JOINS AND SET DIVISION
INNER JOIN / JOIN
A ⨝ (Condition) B = σ(Condition)(A × B)
• Combines a cartesian product and a select
• The general case is the theta join, where theta is the Boolean join condition
(listed as condition here)
• Is commutative and associative (up to the order of the columns in the result)
• Does not remove duplicate columns ➝ can result in duplication of attribute
values within a tuple
• SQL EQUIVALENT:

SELECT *
FROM A INNER JOIN B ON (A.x = B.x)
WHERE (Logical Condition)

Example Equivalent Statement removing duplicates:

SELECT A.x
FROM A INNER JOIN B
ON A.x = B.x
WHERE A.x = bar

Resolving Duplicate Columns:


1) Rename the clashing attribute names in either table
2) Use a project (π) operation to remove the redundant column
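
The identity A ⨝(Condition) B = σ(Condition)(A × B), and the project step that drops the duplicate column, can be sketched in a few lines. Relations are modelled as lists of dicts with table-qualified attribute names; the data and the names `theta_join`, `joined` and `projected` are illustrative assumptions.

```python
from itertools import product

def theta_join(a, b, condition):
    """A ⨝_condition B, computed literally as σ_condition(A × B)."""
    return [{**ra, **rb} for ra, rb in product(a, b) if condition(ra, rb)]

A = [{"A.x": 1, "A.name": "foo"}, {"A.x": 2, "A.name": "bar"}]
B = [{"B.x": 1, "B.qty": 10}, {"B.x": 3, "B.qty": 30}]

# Equijoin on x: both A.x and B.x survive, so the value is duplicated
joined = theta_join(A, B, lambda ra, rb: ra["A.x"] == rb["B.x"])

# Resolution (2): a project that drops the redundant column
projected = [{k: v for k, v in row.items() if k != "B.x"} for row in joined]

print(joined)
print(projected)
```

Qualifying the names up front plays the role of resolution (1), the rename.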

NATURAL JOIN (AVOID IF POSSIBLE)

A * B
• Merges the duplicate columns in the joined set
• Joins tuples based on attributes with the same name
• Results in no duplicate columns, but can cause loss of information if used
improperly
• SQL Equivalent: NATURAL JOIN
• Is identical to an equijoin on the same-named attributes, except the JOIN
attributes of the second relation are dropped from the result
OTHER JOIN TYPES:

LEFT OUTER JOIN:

R ⟕Condition S

• Identical to INNER JOIN, except ALL tuples from R are included in the result
• Tuples of R that do not satisfy the join condition are padded with NULL
values

RIGHT JOIN:

R ⟖Condition S
• Identical to INNER JOIN, except ALL tuples from S are included in the result
• Tuples of S that do not satisfy the join condition are padded with NULL values
• Columns that are foreign (not part) of the original relation S are set to NULL
in tuples that do not satisfy the join condition

FULL JOIN:
R ⟗ Condition S
• Is a combination of the LEFT and RIGHT joins.
• Tuples of S that do not satisfy the join condition are padded with NULL values
• Tuples of R that do not satisfy the join condition are padded with NULL
values
DIVISION
• The inverse of the cartesian product (for sets of tuples)
• R1(Z) ÷ R2(Y) produces a relation R(X), where Z = X ∪ Y, that contains every
X-value that appears in R1 in combination with every tuple from R2(Y)

Given two relations R(Z) and S(Y), with X = Z – Y:

R(Z) ÷ S(Y) = πX(R) – πX((πX(R) × S) – R)
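
The division formula can be followed term by term in a small sketch. Relations are lists of dicts; `divide`, `works_on` and `projects` are made-up names and data for illustration.

```python
def divide(r, s, x_attrs, y_attrs):
    """R(X ∪ Y) ÷ S(Y): the X-values paired in R with every tuple of S."""
    proj_x = {tuple(t[a] for a in x_attrs) for t in r}            # π_X(R)
    r_rows = {tuple(t[a] for a in x_attrs + y_attrs) for t in r}  # R as (X, Y) tuples
    s_rows = {tuple(t[a] for a in y_attrs) for t in s}
    # (π_X(R) × S) − R: the pairings that would have to exist but do not
    missing = {x + y for x in proj_x for y in s_rows} - r_rows
    # π_X of the missing pairings: X-values that fail for at least one S tuple
    disqualified = {row[:len(x_attrs)] for row in missing}
    return proj_x - disqualified

works_on = [
    {"emp": "e1", "proj": "p1"},
    {"emp": "e1", "proj": "p2"},
    {"emp": "e2", "proj": "p1"},
]
projects = [{"proj": "p1"}, {"proj": "p2"}]

# Employees who work on every project
print(divide(works_on, projects, ["emp"], ["proj"]))
```

Only `e1` is paired with both projects, so `e2` is removed by the subtraction.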

AGGREGATE FUNCTIONS:
Min
Max
Sum
Avg
Count

Syntax:
<Grouping_Attributes> ℱ <Aggregate Function 1, Aggregate Function 2, …> (R)

OUTER UNION:
• Takes the union of two union-incompatible relations and merges them

• BOTH RELATIONS MUST BE PARTIALLY COMPATIBLE
• There must be some shared attributes between relations

• Tuples are matched based on having the same combination of shared


attributes
• Attributes that are inapplicable to a tuple are set to NULL

Example:
Admin(Email, Position)
(liliana.brown@gmail.com, “Team Lead”)
Member(Email, FirstName, LastName)
(john.doe@gmail.com, John, Doe)
(liliana.brown@gmail.com, Liliana, Brown)

OUTER UNION RESULT:


MEMBER_OR_ADMIN(Email, FirstName, LastName, Position)
(liliana.brown@gmail.com, Liliana, Brown, ‘Team Lead’)
(john.doe@gmail.com, John, Doe, NULL)
TUPLE RELATIONAL CALCULUS
A weird non-procedural language.
Uses first-order predicate logic:

HANDY TABLE OF LOGICAL OPERATORS:

Operation Name      Operation              Negation

AND                 A ∧ B                  (¬A) ∨ (¬B)
OR                  A ∨ B                  (¬A) ∧ (¬B)
FOR ALL             ∀A: B                  ∃A: ¬B
EXISTS              ∃A: B                  ∀A: ¬B
THEN (implication)  A → B, same as ¬A ∨ B  A ∧ ¬B
NOT / NEGATION      ¬A                     A

• Note: the colon ‘ : ’ means “such that ”, and isn’t a true operator

CONSTRUCTING QUERIES:

Let T represent a member of the relation R that has attributes A1, A2, … An

The General Query Form:


{ T.A1, T.A2, … T.An | R(T) ∧ <Conditions>}

Example:
Given the following relational schema:

Team (ID, Name, LeadEmail, Specialization)


Admin(Email, Position)
Member(Email, Fname, Lname, Gender)

Find the first and last names of club members that are also team leads. Note that only
administrators can be team leads.

Solution
{M.Fname, M.Lname | MEMBER(M) ∧ ∃A(ADMIN(A) ∧ A.Email = M.Email
∧ ∃T(TEAM(T) ∧ T.LeadEmail = A.Email))}
Query Tree Notation
SQL
THINGS TO KNOW:
• Relational Division does not exist in SQL
• Universal quantifiers do not exist in SQL
• Syntax is a mix of RA and TRC
• NULL has a special meaning in SQL, and is not the same as NULL in other
programming languages.
STATEMENT TYPES:
• CREATE
• Table
• View
• SELECT QUERIES
• UPDATE QUERIES
• INSERT QUERIES
• DROP STATEMENTS
• DELETE QUERIES

SELECT STATEMENTS:
Syntax:
SELECT <Attributes>
FROM <Table 1>, <Table 2>…
WHERE <CONDITION>

INSERT STATEMENTS:
Syntax:
INSERT INTO <Table Name>(<Insert Cols>) VALUES
(<Comma-separated list of values>);
THREE VALUED LOGIC TABLE
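
The three-valued logic entries can be checked directly by asking a DBMS to evaluate them. The sketch below assumes SQLite via Python's `sqlite3`, where TRUE is returned as 1, FALSE as 0, and UNKNOWN (NULL) as `None`; the helper name `value` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def value(expression):
    """Evaluate a SQL expression and return its single result."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]

null_eq_null = value("NULL = NULL")     # comparing with NULL gives UNKNOWN
null_is_null = value("NULL IS NULL")    # IS NULL is the proper test
true_and_null = value("1 AND NULL")     # TRUE AND UNKNOWN = UNKNOWN
false_and_null = value("0 AND NULL")    # FALSE AND anything = FALSE
true_or_null = value("1 OR NULL")       # TRUE OR anything = TRUE
false_or_null = value("0 OR NULL")      # FALSE OR UNKNOWN = UNKNOWN
not_null = value("NOT NULL")            # NOT UNKNOWN = UNKNOWN

for name in ("null_eq_null", "null_is_null", "true_and_null",
             "false_and_null", "true_or_null", "false_or_null", "not_null"):
    print(name, "=", globals()[name])
```

A WHERE clause keeps only rows whose condition is TRUE, so rows where it evaluates to UNKNOWN are filtered out just like FALSE ones.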

STORING INTERMEDIATE QUERIES (‘WITH’):


WITH <AliasName1>(<Attributes>) AS (
SELECT <Attributes>
FROM <Table1>, <Table2>, … <Table n>
WHERE <condition>
), <AliasName2>(<Attributes>) AS (
SELECT <Attributes>
FROM <Table1>, <Table2>, … <Table n>
WHERE <condition>
)
SELECT <Attributes>
FROM <AliasName1, AliasName2…>
WHERE <Condition>

*Note: The sequence of SELECT → FROM → WHERE Clauses will be referred to as a
‘SELECT Statement’
Logical Operators & Operators from R.A. & T.R.C.
JOINS:
<Table 1> <JOIN TYPE> <Table 2> ON (Join Condition)
JOIN <Table 3> ON (Join Condition 2) …

‘EXISTS’ (∃) :
SELECT *
FROM <TABLE>
WHERE EXISTS(
<Nested Query>
);
• The EXISTS statement can be used inside the WHERE section of SELECT,
UPDATE, and DELETE Statements

• A universal quantifier does not exist in SQL
• Use NOT EXISTS instead, and invert the logic appropriately

‘FOR ALL’
Does not actually exist in SQL but can be performed with one of the following:

SELECT *
FROM Table_1
WHERE NOT EXISTS(Relation_X)
EXCEPT
SELECT *
FROM Table_1
WHERE EXISTS (Relation_X)

“Select from Table1 where of all possible relations, there is no


relation that does not exist”
SELECT <Table1 Attributes>
FROM <Table1> JOIN <Relationship_Table1> ON <Join Cols>
JOIN <Table2> ON <Join Cols>
WHERE <Condition ‘A’ on Table2>
GROUP BY <Table1 Attributes>
HAVING COUNT(*) = (
SELECT COUNT(*)
FROM <Table2>
WHERE <Condition ‘A’ on Table2>
)
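
Both ways of emulating FOR ALL can be compared on a toy database. The sketch assumes SQLite via Python's `sqlite3`; the tables `Member`, `Event`, `Invitation` and their columns are hypothetical. The question asked is "which members are invited to every event?", written once with a double NOT EXISTS ("there is no event for which no invitation exists") and once by counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Member (email TEXT PRIMARY KEY);
CREATE TABLE Event (id INTEGER PRIMARY KEY);
CREATE TABLE Invitation (email TEXT, event_id INTEGER,
                         PRIMARY KEY (email, event_id));
INSERT INTO Member VALUES ('a@x.org'), ('b@x.org');
INSERT INTO Event VALUES (1), (2);
INSERT INTO Invitation VALUES ('a@x.org', 1), ('a@x.org', 2), ('b@x.org', 1);
""")

# Double negation: keep a member if no event lacks an invitation for them
double_negation = conn.execute("""
SELECT m.email FROM Member m
WHERE NOT EXISTS (
    SELECT 1 FROM Event e
    WHERE NOT EXISTS (
        SELECT 1 FROM Invitation i
        WHERE i.email = m.email AND i.event_id = e.id))
ORDER BY m.email
""").fetchall()

# Counting: a member's invitation count must equal the number of events
# (relies on (email, event_id) being unique in Invitation)
counting = conn.execute("""
SELECT i.email FROM Invitation i
GROUP BY i.email
HAVING COUNT(*) = (SELECT COUNT(*) FROM Event)
ORDER BY i.email
""").fetchall()

print(double_negation, counting)
```

The two differ when `Event` is empty: the double negation returns every member, while the counting version returns none because there are no invitations to group.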

OTHER USEFUL OPERATORS


RETRIEVAL QUERY BLOCK:
SELECT <list of attributes>
FROM <list of tables>
[WHERE <condition>]
[GROUP BY <grouping attribute(s)>]
[HAVING <group condition>]
[ORDER BY <list of attributes>]
• Sections in square brackets are optional

IN: Preceded by a single attribute


SELECT *
FROM <Table Name>
WHERE <Table_Name.Attr> IN (<Nested Query>)

CREATE TABLE STATEMENTS:


CREATE TABLE XYZ(
A1 TYPE,
A2 TYPE NOT NULL,

PRIMARY KEY(A2),
FOREIGN KEY (A1) REFERENCES <Other_Table>(<Key_Column>)
ON UPDATE CASCADE
ON DELETE SET NULL
);
CREATE VIEW STATEMENTS:

CREATE VIEW View_Name AS (


SELECT <list of attributes>
FROM <list of tables>
[WHERE <condition>]
[GROUP BY <grouping attribute(s)>]
[HAVING <group condition>]
[ORDER BY <list of attributes>])

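Views can be referenced like ordinary tables. A view is a virtual table: its
definition is stored in the database, but its rows are recomputed from the base
tables whenever it is queried rather than stored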

UPDATE STATEMENTS:
UPDATE <table_name>
SET <Column Name> = <New Value>
WHERE <Condition>

UPDATE (WITH CASES):


UPDATE <table_name>
SET <attribute> =
CASE
WHEN <Condition1> THEN <value_1>
WHEN <Condition2> THEN <value_2> …
ELSE <default_value>
END

DELETE STATEMENTS:

DELETE FROM <TABLE>


WHERE <Condition>
*DELETE removes whole rows, so it does NOT take a column list

INSERT STATEMENTS:

INSERT INTO <TABLE>(<cols>)


VALUES
(<Tuple1>),(<Tuple2>), ...(<TupleN>);
TRIGGER STATEMENTS

CREATE TRIGGER <trigger_name>


[BEFORE | AFTER ]
{INSERT | UPDATE | DELETE}
ON <table_name>
[FOR EACH ROW]
[Trigger_Body];

TRIGGER-BODY SYNTAX:
WHEN
• acts like “IF” Statement:
WHEN(<condition>) <Call_Procedure>(<args?>)
Pseudo-records:
OLD: An alias to the existing tuple at the ‘cursor position’ (wherever the loop is within
the current table).

NEW: An alias to the “container” that has the new information being passed to the
trigger

Trigger Operation    OLD                NEW

INSERT               (No values)        New Values
UPDATE               Existing Values    New Values
DELETE               Existing Values    (No values)
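
The OLD and NEW pseudo-records can be seen in action with a small audit trigger. This is a sketch in SQLite's dialect (run through Python's `sqlite3`), which wraps the trigger body in BEGIN … END rather than calling a procedure; the tables `account` and `audit` and the trigger name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE audit (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

CREATE TRIGGER log_balance_change
AFTER UPDATE ON account
FOR EACH ROW
WHEN OLD.balance <> NEW.balance          -- acts like an IF on the row
BEGIN
    -- OLD holds the existing values, NEW holds the values being written
    INSERT INTO audit VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO account VALUES (1, 100);
UPDATE account SET balance = 150 WHERE id = 1;   -- fires: 100 -> 150
UPDATE account SET balance = 150 WHERE id = 1;   -- WHEN is false: no log row
""")

audit_rows = conn.execute("SELECT * FROM audit").fetchall()
print(audit_rows)
```

Only the first UPDATE changes the balance, so only one row ends up in `audit`.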
OPERATORS & SYNTAX RULES
FOR SELECT & DELETE STATEMENTS

OPERATOR | Symbol | Rules | Usage Tips

UNION | ∪ | Must be preceded by a SELECT statement, and followed by another SELECT statement | The SELECT statement AFTER these operators should share an attribute (or table) in common with the one PRECEDING the operator
INTERSECT | ∩ | Same as UNION | Same as UNION
EXCEPT | – | Same as UNION | Same as UNION
INNER JOIN / JOIN | ⨝ | Goes after a table name in a FROM clause, or after the ON clause of a previous JOIN operation | These can be chained together arbitrarily many times. You can specify an extra logical constraint in the ON section of the JOIN
LEFT (OUTER) JOIN | ⟕ | Same as INNER JOIN | Useful if you want to preserve NULL attributes (display X if NULL, Y if not)
RIGHT (OUTER) JOIN | ⟖ | Same as INNER JOIN | Same as LEFT (OUTER) JOIN
FULL OUTER JOIN | ⟗ | Same as INNER JOIN | Same as LEFT (OUTER) JOIN
EXISTS | ∃ | Attached to a logical condition in the WHERE block of a query | Use in existential queries.
IN | ∈ | Applies to a single attribute | Similar to EXISTS, but more specific

NORMALIZATION + DECOMPOSITION

FUNCTIONAL DEPENDENCIES:
Given two sets of attributes X and Y of a relation R
• X → Y holds if whenever two tuples have the same value for X, they have the
same value for Y
o ∀ t1, t2 ∈ rel(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
o X is not necessarily the primary key of R
• If X IS a primary key of R, then ∀ Y ⊆ R, X ➝ Y
• FDs come from the meaning of the attributes. Examining the values in one
relation state can disprove a candidate FD, but cannot prove that it holds in
every legal state
o Apply the rule above to the data to rule out candidate FDs, then confirm the
survivors against what the attributes mean.
Armstrong’s Rules of Inference:

RULE                        Explanation
IR1: Reflexive              If Y ⊆ X, then X → Y
IR2: Augmentation           If X → Y then (X ∪ Z) → (Y ∪ Z)
IR3: Transitivity           If X → Y and Y → Z, then X → Z
IR4: Decomposition          If X → Y ∪ Z, then X → Y and X → Z
IR5: Union                  If X → Y and X → Z, then X → Y ∪ Z
IR6: Pseudo-Transitivity    If X → Y and (W ∪ Y) → Z, then (W ∪ X) → Z

The union X ∪ Z can also be written as XZ, so IR2 reads: if X → Y then XZ → YZ

CLOSURE OF AN ATTRIBUTE

Given:
• A relation R (like “Member” from your Final Project Schema)
• A set of functional dependencies F on R
• A set of attributes X, X ⊆ R

Follow this algorithm to calculate X+ (the closure of X under F):

1) Add all attributes of X itself to X+ : ∀ A ∈ X ⟹ A ∈ X+
2) For every FD Y → Z in F:
• If Y ⊆ X+, then X+ = X+ ∪ Z
• Otherwise, move on
3) Repeat step 2 until a full pass over F adds nothing new to X+
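
The closure algorithm translates almost line for line into code. In this sketch an FD is a pair `(lhs, rhs)` of Python sets, which is a representation chosen here for illustration; the example FDs are hypothetical.

```python
def closure(attrs, fds):
    """Return X+ : every attribute determined by attrs under the FDs."""
    result = set(attrs)                 # step 1: X itself belongs to X+
    changed = True
    while changed:                      # keep passing over F until nothing is added
        changed = False
        for lhs, rhs in fds:            # step 2: for every FD Y -> Z
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # Y ⊆ X+, so add Z
                changed = True
    return result

# Hypothetical FDs: A -> B, C -> {D, E}, {A, C} -> F
fds = [({"A"}, {"B"}), ({"C"}, {"D", "E"}), ({"A", "C"}, {"F"})]

print(sorted(closure({"A"}, fds)))
print(sorted(closure({"A", "C"}, fds)))
```

Here `{A, C}+` contains all six attributes, so `{A, C}` is a superkey of a relation over A through F.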

ATTRIBUTE COVERS:

Covers
Given two sets of functional dependencies F and G:

F covers G if G+ ⊆ F+
• This means that all FD’s in G can be inferred from F (via Armstrong’s rules)
Equivalence of FD’s
• F = G if all X → Y in G can be inferred from F and vice versa
• Determine this by calculating X+ wrt F for each X → Y in G and checking that
Y ⊆ X+ (then swap the roles of F and G)
F = G if F covers G and G covers F
• Not guaranteed: consider G ⊂ F or F ⊂ G
• If G ⊂ F, then F covers G, but F may contain FD’s that cannot be inferred from G
Minimal Covers
• Covers but without redundancy

Creating the minimal cover of F:

Canonical Form
Only one attribute on the right hand side of the arrow:
{A,B} → {C, D} =
{A,B} → {C}
{A,B} → {D}

Extraneous Attributes
Given a set of FD’s F, and (X → A) ∈ F

Attribute Y ∈ X is extraneous if replacing X → A with (X – {Y}) → A has no
effect on the closures under F (all closures are preserved)

In other words Y is extraneous in X if:

Y ∈ X and F logically implies {F – (X → A)} ∪ {(X – {Y}) → A}

Redundant Dependencies

In a minimal cover, no FD can be inferred from the rest of F

• If we removed a functional dependency (X → A) from F, we couldn’t get it
back by applying the Rules of Inference on what is left.

• If {F – (X → A)} is equivalent to F, then (X → A) is redundant: remove it

THE PROCESS FOR FINDING MINIMAL COVERS:


1) Convert F to canonical form
2) Remove extraneous attributes from F
3) Remove redundant dependencies from F

Minimal Sets of FDs:


• Repeat the process above for EACH functional dependency in F (start from
the top down), until no further reduction is possible
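
The three-step process can be sketched as follows. FDs are `(lhs, rhs)` pairs of sets (a representation assumed here), and the example set of FDs is hypothetical. The result can depend on the order in which FDs and attributes are visited, so this returns a minimal cover, not necessarily the only one.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # 1) Canonical form: exactly one attribute on each right-hand side
    g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # 2) Remove extraneous left-hand-side attributes: b is extraneous in X -> A
    #    if A is already in the closure of X - {b}
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            reduced = g[i][0] - {b}
            if reduced and rhs <= closure(reduced, g):
                g[i] = (reduced, rhs)
    # 3) Remove redundant FDs: X -> A is redundant if the rest still implies it
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        if g[i][1] <= closure(g[i][0], rest):
            g = rest
        else:
            i += 1
    return g

# Hypothetical input: B -> A, D -> A, {A, B} -> D
example = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
for lhs, rhs in minimal_cover(example):
    print(sorted(lhs), "->", sorted(rhs))
```

In the example, A is extraneous in {A, B} → D (because B → A), and afterwards B → A becomes redundant (it follows from B → D and D → A).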
Defining Keys for a Relation:
Input: R and F = FDs on attributes of R
Output: K, the set of attributes that make up a key for R

ALGORITHM
K ← R (Why? Because R is a super key of itself!)
for each attribute A ∈ K do:
compute (K – {A})+ WRT F
(Remove A from K temporarily)
if (K – {A})+ contains all attributes of R do:
K = K – {A}
(K is still a superkey of R, so A is not needed in this key)
else: keep A in K and move on to the next attribute
(This finds ONE key; which one depends on the order the attributes are tried in)

FASTER (But less precise) WAY:


• Any attribute that never appears on the right hand side of an FD in F must be
part of every key of R
• Start from that set and compute its closure: if it covers R, it is the (only)
key; otherwise more attributes have to be added
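
Both approaches can be sketched together. FDs are `(lhs, rhs)` pairs of sets (an assumed representation), and the relation and FDs below are hypothetical. `find_key` follows the attribute-removal algorithm; `never_on_rhs` computes the attributes that no FD can derive, which therefore must be in every key.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(attributes, fds):
    """Shrink R attribute by attribute while it remains a superkey."""
    key = set(attributes)                       # K <- R
    for a in sorted(attributes):
        if closure(key - {a}, fds) >= set(attributes):
            key.discard(a)                      # still a superkey without a
    return key

def never_on_rhs(attributes, fds):
    """Attributes that no FD determines; every key must contain them."""
    determined = set().union(*(set(rhs) for _, rhs in fds)) if fds else set()
    return set(attributes) - determined

R = {"A", "B", "C", "D", "E", "F"}
fds = [({"A"}, {"B"}), ({"C"}, {"D", "E"}), ({"A", "C"}, {"F"})]

print(sorted(find_key(R, fds)))
print(sorted(never_on_rhs(R, fds)))
```

For this example both routes give `{A, C}`, and since its closure covers R it is the only key.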
DEFINITIONS FOR THIS SECTION

Full Functional Dependency:


A FD Y → Z where removal of any attribute A ∈ Y breaks the dependency
((Y – {A}) → Z doesn’t hold anymore)

Partial Functional Dependency:

A FD W → X where removal of some attribute A ∈ W keeps the dependency intact
((W – {A}) → X still holds)

Transitive Functional Dependency


A FD X → Y in a relation schema R is transitive if
• there exists a set of attributes Z in R that is neither a candidate key nor a
subset of any key of R
• both X → Z and Z → Y hold
• See IR3

KEYS:
Superkey:
A superkey is a set of attributes S ⊆ R with the property that no
two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

Key:
A key K is a superkey with the additional property that removal of any attribute
from K will cause K not to be a superkey any more

Candidate Key
If a relation schema has more than one key, each is called a candidate key.

Primary Key
One of the candidate keys is arbitrarily designated to be the primary key, and
the others are called secondary keys.
Prime Attribute
A Prime attribute must be a member of some candidate key

Non-prime Attribute
A Nonprime attribute is not a prime attribute—that is, it is not a member of any
candidate key.

Normal Forms:

First Normal Form:

All attributes are dependent on the key of the relation

Second Normal Form:

All attributes are dependent on the whole key

Third Normal Form:

All attributes are dependent on nothing but the key

Boyce-Codd Normal Form (BCNF):


A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a
functional dependency X → A holds in R, then X is a superkey of R

Lossless Join / Non-Additivity

Binary Decomposition:
• Decomposition of a relation R into two relations.

NJB (non-additive join test for binary decompositions):


• A decomposition D = {R1, R2} of R has the lossless join property with respect to
a set of FDs F on R if and only if either of the following FDs is in F+:
(1) (R1 ∩ R2) → (R1 – R2)
(2) (R1 ∩ R2) → (R2 – R1)

• In general, a lossless (non-additive) join decomposition splits a relation R
into relations R1, R2, … Rn such that the natural join of all of them yields
exactly R (the minimum value for n is 2)

• If R can be split into R1, R2, … Rn , and R1 * R2 * … * Rn = R, the decomposition
is said to possess this property.
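
The NJB test reduces to one closure computation: (R1 ∩ R2) → (R1 – R2) is in F+ exactly when (R1 – R2) ⊆ (R1 ∩ R2)+. A minimal sketch, with FDs as `(lhs, rhs)` pairs of sets and a hypothetical relation R(A, B, C) with A → B:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """NJB: the shared attributes must determine all of R1 - R2 or all of R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

fds = [({"A"}, {"B"})]
print(is_lossless_binary({"A", "B"}, {"A", "C"}, fds))  # split on A
print(is_lossless_binary({"A", "B"}, {"B", "C"}, fds))  # split on B
```

Splitting on A works because A determines B; splitting on B does not, because B determines neither A nor C.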
NORMAL FORMS & DECOMPOSITION

FIRST NORMAL FORM:

• What you’re used to seeing in SQL and Excel
• No nested relations, nor multivalued or composite attributes
• Every attribute value of every tuple must be atomic
• Resolve nested relations by moving them into their own relation and
propagating the primary key of the parent relation into it
• Most DBMS systems only accept this as input

SECOND NORMAL FORM (2NF)


DEFINITION:

All attributes of relation R are fully functionally dependent on the whole key.

EXAMPLE IN 2NF
R1 = {A, B, C, D, E, F}
FDs:
{A, B} ® {C, D}
{B, D} ® {E, F}

Note that {B, D} ⊄ {A, B}: the key {A, B} is missing D, so {E, F} depends on {B, D} rather than on a part of the key, and there is no partial dependency

THIRD NORMAL FORM

DEFINITION:
A relation schema R is in 3NF if, whenever a nontrivial FD X → A holds
in R, at least one of these two conditions is satisfied:
(1) X is a superkey of R OR
(2) A is a prime attribute of R

Alternative definition:
A relation schema R is in 3NF if every nonprime attribute of R:
(1) Is fully functionally dependent on every key of R (2NF), and
(2) Is non-transitively dependent on every key of R
CONVERSION PROCESS

• If R is not in the normal form wanted, we decompose it into sub-relations R =
{R1, R2, … Rn} such that each Ri is in the desired normal form

• This preserves F, the set of functional dependencies of R (except in BCNF
where this isn’t guaranteed).

For a relation schema R and the set of functional dependencies F on its attributes:

For each FD X → Y over R, that is in violation of the normal form (2NF or 3NF)
(1) Create a new sub-relation schema S
(2) Add X ∪ Y as the attributes of S
(3) X becomes the primary key of S
(4) Remove Y from R (X stays behind and acts as a foreign key to S)

Remember that X and Y can be sets of arbitrarily many attributes!

Decomposition
• A Universal Relation Schema R = {A1, A2, … Am} can be decomposed into a set
of smaller relation schemas D = {R1, R2,…Rn}

DECOMPOSITION GOALS:
• Attribute Preservation:
• Each attribute Ak of R will appear in at least one relation schema Ri ∈ D
• Each relation will be in 3NF or BCNF
• Dependency Preservation (to the maximum extent – it won’t always be perfect)
• Lossless (non-additive) join (mandatory)

Dependency Preservation Property

• When R is decomposed into D, we would like each dependency in R to be
preserved in D = {R1, R2, ..., Rm}.

• A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with
respect to F if the union of the projections of F on each Ri in D is equivalent to F
• F+ = {π(R1)(F) ∪ π(R2)(F) ∪ … ∪ π(Rm)(F)}+
(RULE) Preservation of L.J.P. in successive decompositions
Let DECOMP (R, F) represent the decomposition of an arbitrary relation R with its set of
functional dependencies F

IF DECOMP(R, F) = {R1, R2, … Rk , … Rn} = D has the L.J.P.
AND DECOMP(Rk , π(Rk)(F)) = {Q1, Q2, … Qm} = Dk ALSO has the L.J.P.
THEN (D – {Rk}) ∪ Dk HAS L.J.P. TOO

*L.J.P. = Lossless Join Property.

DECOMPOSITION INTO 3NF ALGORITHM


With Dependency Preservation & Non-Additive (Lossless) Join Properties

Input: A universal relation R and a set of FDs F on the attributes of R.


1. Find a minimal cover G for F
2. For each left-hand side X, where X → Y is in G:
• Create a relation schema S in D with attributes X ∪ {A1, ..., Ak}
• X → A1, ..., X → Ak are the only FDs with X on the LHS in G
(Note X is the key of S)
3. If none of the relation schemas in D contains a key of R:
• Create one more relation schema in D that contains attributes that
form a key of R.
4. Eliminate redundant relations, if any

It is always possible to find a dependency-preserving decomposition D with respect to F


such that each relation Ri in D is in 3NF.
This does not necessarily hold for BCNF.
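
Steps 2 to 4 of the 3NF algorithm can be sketched as below. To stay short, the sketch assumes step 1 is already done, so the input `cover` must be a minimal cover; FDs are `(lhs, rhs)` pairs of sets, and the example relation is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(attributes, fds):
    """One key of R, found by dropping attributes that are not needed."""
    key = set(attributes)
    for a in sorted(attributes):
        if closure(key - {a}, fds) >= set(attributes):
            key.discard(a)
    return key

def synthesize_3nf(attributes, cover):
    """Steps 2-4; `cover` is ASSUMED to already be a minimal cover."""
    # Step 2: one schema per distinct left-hand side X, holding X and all its A's
    grouped = {}
    for lhs, rhs in cover:
        grouped.setdefault(frozenset(lhs), set()).update(rhs)
    schemas = []
    for lhs, rhs in grouped.items():
        schema = set(lhs) | rhs
        if schema not in schemas:
            schemas.append(schema)
    # Step 3: a schema contains a key of R exactly when its closure covers R
    if not any(closure(s, cover) >= set(attributes) for s in schemas):
        schemas.append(find_key(attributes, cover))
    # Step 4: drop schemas that are fully contained in another schema
    return [s for s in schemas if not any(s < t for t in schemas)]

R = {"A", "B", "C", "D", "E", "F"}
cover = [({"A"}, {"B"}), ({"C"}, {"D"}), ({"C"}, {"E"}), ({"A", "C"}, {"F"})]
for schema in synthesize_3nf(R, cover):
    print(sorted(schema))
```

For this input the schema built from {A, C} → F already contains the key {A, C}, so step 3 adds nothing.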

DECOMPOSITION INTO BCNF


with Non-Additive (Lossless) Join Property

Input: A universal relation R and a set of FDs F on the attributes of R.

1. D ← {R}
2. While there is a relation schema Q in D that is not in BCNF do:
1. Find FD X → Y that violates BCNF in Q
2. Replace Q in D with the two schemas (Q – Y) and (X ∪ Y)
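
The BCNF loop can be sketched as follows. Finding a violating FD inside a sub-schema Q means looking at the FDs implied on Q, so this sketch tries every attribute subset X of Q and uses X+ ∩ Q; that is exponential and only meant for exam-sized examples. The function names and the example relation are assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(q, fds):
    """Return (X, Y) with X -> Y nontrivial on q and X not a superkey of q, else None."""
    q = set(q)
    for size in range(1, len(q)):
        for combo in combinations(sorted(q), size):
            x = set(combo)
            y = (closure(x, fds) & q) - x       # what X determines inside q
            if y and not (x | y) >= q:          # nontrivial, and X is no superkey
                return x, y
    return None

def bcnf_decompose(r, fds):
    pending, done = [set(r)], []                # D <- {R}
    while pending:
        q = pending.pop()
        violation = bcnf_violation(q, fds)
        if violation is None:
            done.append(q)                      # q is in BCNF
        else:
            x, y = violation
            pending.append(q - y)               # (Q - Y)
            pending.append(x | y)               # (X ∪ Y)
    return done

# Hypothetical R(A, B, C) with A -> B and B -> C
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
for schema in bcnf_decompose({"A", "B", "C"}, fds):
    print(sorted(schema))
```

Here B → C violates BCNF (B is not a superkey), so R splits into {A, B} and {B, C}.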
### Lossless Join Testing Algorithm
**Input**: A universal relation R, a decomposition D = {R1, R2, R3, ..., Rm} of R, and
a set F of functional dependencies
1. Create an initial matrix S with one row `i` for each relation `Ri` in `D`, and one
column `j` for each attribute `Aj` in `R`
2. For each row `i` and column `j` in S, if relation `Ri` includes attribute `Aj` then
set `S(i,j) = x`
3. For each FD `X -> Y`:
    - For each row `r1` that has `x` in every column of `X`:
        - For each other row `r2` that also has `x` in every column of `X`:
            - For each attribute `k` in `Y`:
                - If `r1` has `x` in column `k`, then set column `k` of `r2` to `x`
    - Repeat step 3 until a *full loop execution* results in no changes to S
4. If a row is made up entirely of `x` symbols, the decomposition has the lossless
join property. Otherwise, it does not.

### Example:
Let `R = {A, B, C, D, E, F}`, `D = {R1, R2, R3}`
`R1 = {A, B}`
`R2 = {C, D, E}`
`R3 = {A, C, F}`
`FD = { {A} -> {B}, {C} -> {D, E}, {A, C} -> {F} }`

**Step 1**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 |   |   |   |   |   |   |
| R2 |   |   |   |   |   |   |
| R3 |   |   |   |   |   |   |

**Step 2**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 | x | x |   |   |   |   |
| R2 |   |   | x | x | x |   |
| R3 | x |   | x |   |   | x |

**Step 3 (A → B)**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 | x | x |   |   |   |   |
| R2 |   |   | x | x | x |   |
| R3 | x | x | x |   |   | x |

• A exists in R1 and R3, as B exists in R1 we can set B to be in R3
**Step 3 (C → D, E)**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 | x | x |   |   |   |   |
| R2 |   |   | x | x | x |   |
| R3 | x | x | x | x | x | x |

• C exists in R2 and R3, as D exists in R2 we can set D to be in R3. Same with E

**Step 3 (A, C → F)**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 | x | x |   |   |   |   |
| R2 |   |   | x | x | x |   |
| R3 | x | x | x | x | x | x |

• A and C exist in R3, but since no other relation contains both A, C we can't set F
anywhere else

**Step 4**:
• As `R3` is composed entirely of `x` symbols, the decomposition has the lossless
join property.
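
The matrix test can also be run in code. The sketch below uses the textbook "chase" form of the test rather than the exact marking procedure above: instead of leaving unknown cells blank, each one gets its own placeholder symbol (`b0_C`, `b1_A`, ...), so two rows can also be made to agree on an unknown value. The symbol `a` plays the role of `x`. The function name and data layout are assumptions.

```python
def has_lossless_join(attributes, decomposition, fds):
    """Matrix (chase) test: True if some row ends up with 'a' in every column."""
    # Steps 1-2: one row per Ri; 'a' where Ri has the attribute,
    # otherwise a placeholder unique to that row and column
    rows = [
        {attr: "a" if attr in ri else f"b{i}_{attr}" for attr in attributes}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:                               # step 3, repeated until stable
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue                 # rows do not agree on X
                    for y in rhs:
                        if r1[y] != r2[y]:
                            # Make the two symbols equal, preferring 'a'
                            old = {r1[y], r2[y]}
                            target = "a" if "a" in old else min(old)
                            for r in rows:
                                if r[y] in old:
                                    r[y] = target
                            changed = True
    # Step 4: lossless if any row consists entirely of 'a'
    return any(all(v == "a" for v in r.values()) for r in rows)

R = ["A", "B", "C", "D", "E", "F"]
D = [{"A", "B"}, {"C", "D", "E"}, {"A", "C", "F"}]
FD = [({"A"}, {"B"}), ({"C"}, {"D", "E"}), ({"A", "C"}, {"F"})]
print(has_lossless_join(R, D, FD))
```

On the example above the row for `R3` fills up completely, so the function agrees with the hand-worked result.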
