CPSC 471 Final Exam Notes
ENTITY RELATION DIAGRAMS & RELATIONAL MODELS
ERD NOTATION
RELATIONAL MODEL CONSTRAINTS
DOMAIN CONSTRAINT
• if one of the attribute values provided for the new tuple is not of the specified
attribute domain
KEY CONSTRAINT
• if the value of a key attribute in the new tuple already exists in another tuple in the
relation
REFERENTIAL INTEGRITY
• if a foreign key value in the new tuple references a primary key value that does not
exist in the referenced relation
ENTITY INTEGRITY
• if the primary key value is NULL in the new tuple
DELETE OPERATION: CONSTRAINT VIOLATIONS
DELETE may violate only referential integrity, and only if the primary key value of
the tuple being deleted is referenced from other tuples in the database
• CASCADE option: propagate the deletion by also deleting the referencing tuples
• SET NULL option: set the foreign keys of the referencing tuples to NULL
• One of the above options must be specified during database design for each
foreign key constraint
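The two options can be observed with a minimal sketch using Python's built-in `sqlite3` module. The tables `Dept`, `Emp`, and `Project` are hypothetical and are not part of the course schema; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def demo_delete_options():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs unless enabled
    conn.executescript("""
        CREATE TABLE Dept (Dno INTEGER PRIMARY KEY);
        CREATE TABLE Emp (
            Eid INTEGER PRIMARY KEY,
            Dno INTEGER REFERENCES Dept(Dno) ON DELETE SET NULL
        );
        CREATE TABLE Project (
            Pid INTEGER PRIMARY KEY,
            Dno INTEGER REFERENCES Dept(Dno) ON DELETE CASCADE
        );
        INSERT INTO Dept VALUES (1);
        INSERT INTO Emp VALUES (10, 1);
        INSERT INTO Project VALUES (100, 1);
    """)
    # Deleting the referenced tuple triggers both options.
    conn.execute("DELETE FROM Dept WHERE Dno = 1")
    emp = conn.execute("SELECT Eid, Dno FROM Emp").fetchall()
    proj = conn.execute("SELECT Pid, Dno FROM Project").fetchall()
    conn.close()
    return emp, proj

emp_rows, project_rows = demo_delete_options()
print(emp_rows)      # the employee survives with a NULL foreign key
print(project_rows)  # the project was deleted along with its department
```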
Relation State:
An instance of a relation within a larger relational database state. Denoted as
foo(S) where S is the state itself, and foo is the name of the table.
Key: A ‘minimal’ superkey K such that removal of any attribute from K breaks the
uniqueness property of the relation (K is no longer a superkey)
• If it is a partial participation on one side, i.e. (0…1) : 1, make the partial (left)
side’s primary key a foreign key of the full (right) side
• Still make two separate relational entities, but connect them like above
• Shift any attributes of the relation itself to the full side.
6) Multivalued-Attributes:
a. If an attribute is multi-valued (Vehicle color for example), apply a similar
idea to step (5)
b. Create a new entity in the model to represent this multi-valued attribute,
and add the containing entity’s primary key (Vehicle plate number, for
example), to this new relation
c. The primary key of multi-valued attribute relations is:
(Parent.PK, MV_Attribute_Value)
7) N-ary Relations
a. Another ‘extension’ of step (5)
[Building some context for our example]:
That is, Rel[A,B,C]. Suppose the primary keys of A, B and C are (respectively),
A.PK, B.PK, C.PK.
What to do:
b. Combine A.PK, B.PK, C.PK, into a new relationship entity called “Rel”
c. The primary key of ‘Rel’ is (A.PK, B.PK, C.PK).
d. Add any attributes of Rel(A,B,C) (bubbles attached to the relationship in
the ERD), to the relationship entity ‘Rel’.
e. A.PK, B.PK, C.PK are all foreign keys of Rel
HARDEST PARTS: GENERALIZATION AND SPECIALIZATIONS
9) UNION TYPES:
a. Create a new relational entity in the model to represent the union, but add
a surrogate (artificial) key to the relation to make it uniquely identifiable
b. NOTE: the surrogate key doesn’t really exist in the original schema.
It’s purely for convenience.
HANDLING COMPOSITE ATTRIBUTES:
1) Anywhere that a composite attribute occurs, break it up into its simpler
components (eg: ‘Name’ ➝ Fname + Lname).
2) In the case of a composite multi-valued attribute, apply step 6.
PROJECT (π):
π<S>(R)
• Acts on COLUMNS
• S is a subset of the attributes of R
• Not commutative
• Not associative
• Not distributive
• Equivalent to the SELECT clause in SQL (with DISTINCT, since the result is a set)
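SELECT (σ):
σ<Condition(s)>(R)
• Acts on ROWS
• Selects only instances of R that satisfy the logical condition(s)
• Commutative: σ<c1>(σ<c2>(R)) = σ<c2>(σ<c1>(R))
• A cascade of selections collapses into one: σ<c1>(σ<c2>(R)) = σ<c1 AND c2>(R)
• Distributes over ∪, ∩, and –
• Equivalent to WHERE in SQL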
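A minimal sketch of the two operators, assuming a relation is stored as a list of dictionaries. The relation `R` and the helper names `select` and `project` are made up for illustration. It shows that two selections can be applied in either order, and that projection removes duplicate tuples.

```python
def select(rows, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    """π: keep only the named columns and drop duplicate tuples."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

R = [
    {"name": "Ann", "dept": "CS", "age": 30},
    {"name": "Bob", "dept": "CS", "age": 25},
    {"name": "Cat", "dept": "EE", "age": 30},
]
is_cs = lambda r: r["dept"] == "CS"
is_30 = lambda r: r["age"] == 30

a = select(select(R, is_cs), is_30)   # σ<age=30>(σ<dept=CS>(R))
b = select(select(R, is_30), is_cs)   # σ<dept=CS>(σ<age=30>(R))
depts = project(R, ["dept"])          # π<dept>(R)
print(a == b, depts)
```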
RENAME ρ
Given the relation R(A1, A2, … An):
UNION ∪
DIFFERENCE –
CARTESIAN PRODUCT ×
Making Relational Algebra Expressions
• Combine operations into one gigantic, illegible expression
• Break into smaller intermediate expressions
• Equivalent to the WITH operator in SQL
• intermediate results must be named
• Use the ‘←’ symbol instead of the ‘=’ sign:
• Bar ← π(σ(Foo))
• Bar ← σ(Foo)
• Quux ← π(Bar)
INTERSECTIONS
DIFFERENCE
• The set A – B is the set A without any elements of set B
• SQL Equivalent: EXCEPT Clause
• Is NOT commutative
• Is NOT associative
• If A ∩ B = ∅ (A and B share no tuples), then A – B = A
Cartesian Product:
Extends the domain of a relation by ‘adding’ a new set of attributes. Lists every possible
combination of ‘merged tuples’.
Given: R1 {(A1, B1, C1), (A2, B2, C2)} and R2 {(D1, E1, F1), (D2, E2, F2)}
R1 × R2 = {
(A1, B1, C1, D1, E1, F1)
(A1, B1, C1, D2, E2, F2)
(A2, B2, C2, D1, E1, F1)
(A2, B2, C2, D2, E2, F2)}
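The same result can be reproduced with a short sketch, assuming each relation is a list of tuples. The helper name `cartesian_product` is made up here; `itertools.product` does the pairing.

```python
from itertools import product

R1 = [("A1", "B1", "C1"), ("A2", "B2", "C2")]
R2 = [("D1", "E1", "F1"), ("D2", "E2", "F2")]

def cartesian_product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return [t1 + t2 for t1, t2 in product(r, s)]

result = cartesian_product(R1, R2)
for row in result:
    print(row)
```

The result always has |R1| × |R2| tuples, which is why a join condition is usually applied right after the product.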
SELECT *
FROM A INNER JOIN B ON (A.x = B.x)
WHERE (Logical Condition)
SELECT A.x
FROM A INNER JOIN B
ON A.x = B.x
WHERE A.x = bar
A *<Condition> B
• Merges the duplicate columns in the joined set
• Joins tuples based on attributes with the same name
• Results in no duplicate columns, but can cause loss of information if used
improperly
• SQL Equivalent: NATURAL JOIN
• Is identical to an equijoin, except the JOIN attributes of B are not repeated in the result
OTHER JOIN TYPES:
LEFT JOIN:
R ⟕Condition S
• Identical to INNER JOIN, except ALL tuples from R are included in the result
• Tuples of R that do not satisfy the join condition are padded with NULL
values
RIGHT JOIN:
R ⟖Condition S
• Identical to INNER JOIN, except ALL tuples from S are included in the result
• Tuples of S that do not satisfy the join condition are padded with NULL values
• Columns that are foreign (not part) of the original relation S are set to NULL
in tuples that do not satisfy the join condition
FULL JOIN:
R ⟗ Condition S
• Is a combination of the LEFT and RIGHT joins.
• Tuples of S that do not satisfy the join condition are padded with NULL values
• Tuples of R that do not satisfy the join condition are padded with NULL
values
DIVISION
• The inverse of the cartesian product (for sets of tuples)
• R1(Z) ÷ R2(Y) produces a relation R(X) that contains every tuple t over X such that
t appears in R1 in combination with every tuple from R2(Y), where Z = X ∪ Y
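A minimal sketch of division, assuming the simplest case where X and Y are single attributes, so R1 is a set of (x, y) pairs and R2 is a set of y values. The data (`works_on`, `projects`) is invented for illustration.

```python
def divide(r1, r2):
    """Return every x that is paired in r1 with *every* y in r2."""
    xs = {x for x, _ in r1}
    return {x for x in xs if all((x, y) in r1 for y in r2)}

# Hypothetical data: who works on which project.
works_on = {("ann", "p1"), ("ann", "p2"), ("bob", "p1")}
projects = {"p1", "p2"}

everyone_on_all = divide(works_on, projects)
print(everyone_on_all)
```

Only `ann` is paired with both `p1` and `p2`, so `bob` is dropped from the quotient.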
AGGREGATE FUNCTIONS:
Min
Max
Sum
Avg
Count
Syntax:
<Grouping attributes> ℱ <Aggregate function list> (R)
OUTER UNION:
• Takes the union of two union-incompatible relations and merges them
• BOTH RELATIONS MUST BE PARTIALLY COMPATIBLE
• There must be some shared attributes between relations
Example:
Admin(Email, Position)
(liliana.brown@gmail.com, “Team Lead”)
Member(Email, FirstName, LastName)
(john.doe@gmail.com, John, Doe)
(liliana.brown@gmail.com, Liliana, Brown)
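A rough sketch of the outer union of the two example relations, assuming rows are dictionaries and `None` stands in for NULL. The function name `outer_union` and the `shared` parameter are made up for this sketch. Tuples that agree on the shared attribute (Email) are merged into one row; the others are padded.

```python
def outer_union(r, s, shared):
    # Collect every attribute name in first-seen order.
    all_attrs = []
    for row in r + s:
        for a in row:
            if a not in all_attrs:
                all_attrs.append(a)
    merged = {}
    for row in r + s:
        key = tuple(row[a] for a in shared)
        # Start from an all-NULL row, then fill in what this tuple knows.
        merged.setdefault(key, dict.fromkeys(all_attrs))
        merged[key].update(row)
    return list(merged.values())

admin = [{"Email": "liliana.brown@gmail.com", "Position": "Team Lead"}]
member = [
    {"Email": "john.doe@gmail.com", "FirstName": "John", "LastName": "Doe"},
    {"Email": "liliana.brown@gmail.com", "FirstName": "Liliana", "LastName": "Brown"},
]

result = outer_union(admin, member, shared=["Email"])
for row in result:
    print(row)
```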
• Note: the colon ‘ : ’ means “such that ”, and isn’t a true operator
CONSTRUCTING QUERIES:
Let T represent a member of the relation R that has attributes A1, A2, … An
Example:
Given the following relational schema:
Find the first and last names of club members that are also team leads. Note that only
administrators can be team leads.
Solution
{M.Fname, M.Lname | MEMBER(M) ∧ ∃A(ADMIN(A) ∧ A.Email = M.Email
∧ ∃T(TEAM(T) ∧ T.LeadEmail = A.Email))}
Query Tree Notation
SQL
THINGS TO KNOW:
• Relational Division does not exist in SQL
• Universal quantifiers do not exist in SQL
• Syntax is a mix of RA and TRC
• NULL has a special meaning in SQL, and is not the same as NULL in other
programming languages.
STATEMENT TYPES:
• CREATE
• Table
• View
• SELECT QUERIES
• UPDATE QUERIES
• INSERT QUERIES
• DROP STATEMENTS
• DELETE QUERIES
SELECT STATEMENTS:
Syntax:
SELECT <Attributes>
FROM <Table 1>, <Table 2>…
WHERE <CONDITION>
INSERT STATEMENTS:
Syntax:
INSERT INTO <Table Name>(<Insert Cols>) VALUES
(<Comma-separated list>);
THREE VALUED LOGIC TABLE
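The standard SQL truth table for TRUE, FALSE, and UNKNOWN can be regenerated with a small sketch, assuming Python's `None` stands in for UNKNOWN (the result of comparing anything with NULL). The function names `and3`, `or3`, and `not3` are made up here.

```python
def and3(a, b):
    if a is False or b is False:
        return False          # FALSE dominates AND
    if a is None or b is None:
        return None           # otherwise UNKNOWN propagates
    return True

def or3(a, b):
    if a is True or b is True:
        return True           # TRUE dominates OR
    if a is None or b is None:
        return None
    return False

def not3(a):
    return None if a is None else (not a)

values = [True, False, None]
for a in values:
    for b in values:
        print(a, b, "AND:", and3(a, b), "OR:", or3(a, b))
```

A WHERE clause keeps a row only when its condition evaluates to TRUE, so rows whose condition is UNKNOWN are filtered out just like FALSE ones.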
‘EXISTS’ (∃) :
SELECT *
FROM <TABLE>
WHERE EXISTS(
<Nested Query>
);
• The EXISTS statement can be used inside the WHERE section of SELECT,
UPDATE, and DELETE Statements
• A universal quantifier does not exist in SQL
• Use NOT EXISTS instead, and invert the logic appropriately
‘FOR ALL’
Does not actually exist in SQL but can be performed with one of the following:
SELECT *
FROM Table_1
WHERE NOT EXISTS(Relation_X)
EXCEPT
SELECT *
FROM Table_1
WHERE EXISTS (Relation_X)
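One common way to express "for all" is a double NOT EXISTS: "there is no project that this person does not work on". The sketch below runs that pattern through Python's `sqlite3` module; the tables `Project` and `WorksOn` are hypothetical and only serve the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (Pid TEXT PRIMARY KEY);
    CREATE TABLE WorksOn (Name TEXT, Pid TEXT);
    INSERT INTO Project VALUES ('p1'), ('p2');
    INSERT INTO WorksOn VALUES ('ann', 'p1'), ('ann', 'p2'), ('bob', 'p1');
""")

# People for whom no project exists that they do NOT work on.
query = """
    SELECT DISTINCT w.Name
    FROM WorksOn AS w
    WHERE NOT EXISTS (
        SELECT * FROM Project AS p
        WHERE NOT EXISTS (
            SELECT * FROM WorksOn AS w2
            WHERE w2.Name = w.Name AND w2.Pid = p.Pid
        )
    )
"""
names = [row[0] for row in conn.execute(query)]
conn.close()
print(names)
```

This is the same question relational division answers, which is why the pattern is the usual substitute for ÷ in SQL.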
PRIMARY KEY(A2),
FOREIGN KEY (A1) REFERENCES TABLE(A2)
ON UPDATE CASCADE
ON DELETE SET NULL
);
CREATE VIEW STATEMENTS:
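Views can be referenced like ordinary tables, but they are virtual: the database stores
only the view’s defining query and recomputes its rows from the base tables whenever
the view is referenced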
UPDATE STATEMENTS:
UPDATE <table_name>
SET <Column Name> = <New Value>
WHERE <Condition>
DELETE STATEMENTS:
INSERT STATEMENTS:
TRIGGER-BODY SYNTAX:
WHEN
• acts like “IF” Statement:
WHEN(<condition>) <Call_Procedure>(<args?>)
Pseudo-records:
OLD: An alias to the existing tuple at the ‘cursor position’ (wherever the loop is within
the current table).
NEW: An alias to the “container” that has the new information being passed to the
trigger
NORMALIZATION + DECOMPOSITION
FUNCTIONAL DEPENDENCIES:
Given two sets of attributes X and Y of a relation R
• X → Y holds if whenever two tuples have the same value for X, they have the
same value for Y
o ∀ t1, t2 ∈ rel(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
o X is not necessarily the primary key of R
• If X IS a primary key of R, then ∀ Y ⊆ R, X → Y
• FDs come from the meaning of the attributes. Examining the values in a relation
state can disprove an FD (one counterexample is enough), but it cannot prove
that the FD holds in every legal state
o Apply the rule above to a given state to test whether a candidate
dependency survives.
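The definition translates directly into a check against one relation state. This is a minimal sketch, assuming tuples are dictionaries; the function name `fd_holds` and the sample rows are invented. A `True` result only means this particular state contains no counterexample.

```python
def fd_holds(rows, X, Y):
    """Return True if no two rows agree on X but differ on Y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Remember the first Y value seen for this X value, compare later ones.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

rows = [
    {"Email": "a@x.org", "Fname": "Ann", "Team": "Red"},
    {"Email": "a@x.org", "Fname": "Ann", "Team": "Blue"},
    {"Email": "b@x.org", "Fname": "Bob", "Team": "Red"},
]
print(fd_holds(rows, ["Email"], ["Fname"]))  # no counterexample in this state
print(fd_holds(rows, ["Email"], ["Team"]))   # a@x.org maps to two teams
```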
Armstrong’s Rules of Inference:
RULE                        Explanation
IR1: Reflexive              If Y ⊆ X, then X → Y
IR2: Augmentation           If X → Y, then (X ∪ Z) → (Y ∪ Z)
IR3: Transitivity           If X → Y and Y → Z, then X → Z
IR4: Decomposition          If X → (Y ∪ Z), then X → Y and X → Z
IR5: Union                  If X → Y and X → Z, then X → (Y ∪ Z)
IR6: Pseudo-Transitivity    If X → Y and (W ∪ Y) → Z, then (W ∪ X) → Z
CLOSURE OF AN ATTRIBUTE
Given:
• A relation R (like “Member” from your Final Project Schema)
• A set of functional dependencies F on R
• A set of attributes X, X ⊆ R
The closure X+ is the set of all attributes of R that are functionally determined by X
under F
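The closure X+ of X under F can be computed by repeatedly firing every FD whose left side is already inside the result. A minimal sketch, assuming each FD is a `(lhs, rhs)` pair of sets; the function name `closure` and the sample FDs are chosen for illustration.

```python
def closure(X, fds):
    """Return X+ : every attribute functionally determined by X under fds."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its left side is covered and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [({"A", "B"}, {"C", "D"}), ({"B", "D"}, {"E", "F"})]
print(sorted(closure({"A", "B"}, F)))
print(sorted(closure({"B", "D"}, F)))
print(sorted(closure({"B"}, F)))
```

Here {A, B}+ reaches every attribute, so {A, B} is a superkey of a relation with attributes A through F, while {B}+ stays {B}.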
ATTRIBUTE COVERS:
Covers
Given two sets of functional dependencies F and G:
F covers G if G+ ⊆ F+
• This means that all FD’s in G can be inferred from F (via Armstrong’s rules)
Equivalence of FD’s
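• F ≡ G if all X → Y in G can be inferred from F and vice versa
• Determine equivalence by calculating X+ wrt F for each X → Y in G (Y must be
contained in X+), then repeating with the roles of F and G swapped
F ≡ G if F covers G and G covers F
• Not guaranteed: consider G ⊂ F or F ⊂ G
• If G ⊂ F, then F will contain FD’s that G does not (the sets are equivalent only
if G still implies those FD’s)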
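The cover and equivalence tests reduce to closure computations. A minimal sketch, assuming FDs are `(lhs, rhs)` pairs of sets; `closure`, `covers`, and `equivalent` are names picked for this example, and the FD sets F, G, H are made up.

```python
def closure(X, fds):
    """Return X+ under the given FDs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(F, G):
    """F covers G if every X -> Y in G follows from F, i.e. Y ⊆ X+ wrt F."""
    return all(set(rhs) <= closure(lhs, F) for lhs, rhs in G)

def equivalent(F, G):
    return covers(F, G) and covers(G, F)

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]  # A -> C is redundant
H = [({"A"}, {"B"})]

print(equivalent(F, G))            # G only adds an FD that F already implies
print(covers(F, H), covers(H, F))  # H cannot derive B -> C
```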
Minimal Covers
• Covers but without redundancy
Canonical Form
Only one attribute on the right hand side of the arrow:
{A,B} → {C, D} becomes
{A,B} → {C}
{A,B} → {D}
Extraneous Attributes
Given a set of FD’s F, and (X → A) ∈ F
• If F – {X → A} is equivalent to F, remove the redundant dependency (X → A)
ALGORITHM (finding a key K of R, given F)
K ← R (Why? Because R is a super key of itself!)
for each attribute A ∈ K do:
compute (K – A)+ WRT F
(Remove A from K temporarily)
if (K – A)+ contains all attributes of R do:
K ← K – {A}
(K is still a superkey of R without A, so A is not needed in this key)
else: keep A in K and continue with the next attribute
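A sketch of the algorithm in Python, assuming FDs are `(lhs, rhs)` pairs of sets. The names `closure` and `find_key` are chosen for this example. It returns one key; which key it finds depends on the order in which attributes are tried (alphabetical here), so other candidate keys may exist.

```python
def closure(X, fds):
    """Return X+ under the given FDs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(R, fds):
    K = set(R)                      # R is a superkey of itself
    for A in sorted(R):
        # Drop A only if the remaining attributes still determine all of R.
        if closure(K - {A}, fds) >= set(R):
            K.remove(A)
    return K

R = set("ABCDEF")
F = [({"A", "B"}, {"C", "D"}), ({"B", "D"}, {"E", "F"})]
key = find_key(R, F)
print(sorted(key))
```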
KEYS:
Superkey:
A superkey of R is a set of attributes S ⊆ R with the property that no two distinct
tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
Key:
A key K is a superkey with the additional property that removal of any attribute
from K will cause K not to be a superkey any more
Candidate Key
If a relation schema has more than one key, each is called a candidate key.
Primary Key
One of the candidate keys is arbitrarily designated to be the primary key, and
the others are called secondary keys.
Prime Attribute
A Prime attribute must be a member of some candidate key
Non-prime Attribute
A Nonprime attribute is not a prime attribute—that is, it is not a member of any
candidate key.
Functional Dependencies:
Binary Decomposition:
• Decomposition of a relation R into two relations.
• The decomposition of a relation R into relations R1, R2, … Rn such that the
natural join of all relations yields R. (the minimum value for n is 2)
Second normal form (2NF): every nonprime attribute of relation R is fully functionally
dependent on every key of R (it depends on the whole key, not on part of it).
EXAMPLE IN 2NF
R1 = {A, B, C, D, E, F}
FDs:
{A, B} → {C, D}
{B, D} → {E, F}
Note that {B, D} ⊄ {A, B}: the key {A, B} is missing D, so {B, D} → {E, F} is not a
partial dependency on the key and R1 stays in 2NF, even though {E, F} depends on {B, D}
DEFINITION:
A relation schema R is in 3NF if at least one of these two conditions is satisfied
whenever a nontrivial FD X → A holds in R:
(1) X is a superkey of R OR
(2) A is a prime attribute of R
Alternative definition:
A relation schema R is in 3NF if every nonprime attribute of R:
(1) Is fully functionally dependent on every key of R (2NF), and
(2) Is non-transitively dependent on every key of R
CONVERSION PROCESS
For a relation schema R and the set of functional dependencies F on its attributes:
For each FD X → Y over R that is in violation of the normal form (2NF or 3NF)
(1) Create a new sub-relation schema S
(2) Add X ∪ Y as the attributes of S
(3) X becomes the primary key of S
Decomposition
• A Universal Relation Schema R = {A1, A2, … Am} can be decomposed into a set
of smaller relation schemas D = {R1, R2,…Rn}
DECOMPOSITION GOALS:
• Attribute Preservation:
• Each attribute Ak of R will appear in at least one relation schema Ri ∈ D
• Each relation will be in 3NF or BCNF
• Dependency Preservation (to the maximum extent – it won’t always be perfect)
• Lossless (non-additive) join (mandatory)
1. D ← {R}
2. While there is a relation schema Q in D that is not in BCNF do:
1. Find FD X → Y that violates BCNF in Q
2. Replace Q in D with the two schemas (Q – Y) and (X ∪ Y)
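A rough sketch of the loop above, assuming FDs are `(lhs, rhs)` pairs of sets. To find a violation inside a sub-schema Q it tries every attribute subset X of Q and uses Y = (X+ ∩ Q) – X, which is exponential and only reasonable for small schemas. The names `closure`, `find_violation`, and `bcnf_decompose` are chosen for this example.

```python
from itertools import combinations

def closure(X, fds):
    """Return X+ under the given FDs."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(Q, fds):
    """Return (X, Y) where X -> Y violates BCNF inside Q, or None."""
    attrs = sorted(Q)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            X = set(combo)
            Y = (closure(X, fds) & Q) - X
            # Violation: X determines something new but is not a superkey of Q.
            if Y and (X | Y) != Q:
                return X, Y
    return None

def bcnf_decompose(R, fds):
    todo, done = [set(R)], []
    while todo:
        Q = todo.pop()
        violation = find_violation(Q, fds)
        if violation is None:
            done.append(Q)
        else:
            X, Y = violation
            todo.append(Q - Y)   # Q without the dependent attributes
            todo.append(X | Y)   # new schema with X as its key
    return done

F = [({"A", "B"}, {"C", "D"}), ({"B", "D"}, {"E", "F"})]
result = bcnf_decompose(set("ABCDEF"), F)
print([sorted(q) for q in result])
```

For this input the violating FD is {B, D} → {E, F}, so the schema splits into {A, B, C, D} and {B, D, E, F}.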
### Lossless Join Testing Algorithm
**Input**: A universal relation R, a decomposition D = {R1, R2, R3, ..., Rm} of R, and
a set F of functional dependencies
1. Create an initial matrix S with one row `i` for each relation `Ri` in `D`, and one
column `j` for each attribute `Aj` in `R`
2. For each row `i` and column `j` in S, if relation `Ri` includes attribute `Aj` then
set `S(i,j) = x`
3. For each FD `X -> Y`:
- For each row `R1` that contains all attributes in `X`:
- For each row `R2` that contains all attributes in `X` where `R1 != R2`:
- For each attribute `k` in `Y`:
- If `R1` contains `k`, then set `k` in `R2` to be the symbol `x`
- Repeat step 3 until a *full loop execution* results in no changes to S
4. If a row is made up entirely of `x` symbols, the decomposition has the lossless
join property. Otherwise, it does not.
### Example:
Let `R = {A, B, C, D, E, F}`, `D = {R1, R2, R3}`
`R1 = {A, B}`
`R2 = {C, D, E}`
`R3 = {A, C, F}`
`FD = { {A} -> {B}, {C} -> {D, E}, {A, C} -> {F} }`
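**Step 1**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 |   |   |   |   |   |   |
| R2 |   |   |   |   |   |   |
| R3 |   |   |   |   |   |   |

**Step 2**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 | x | x |   |   |   |   |
| R2 |   |   | x | x | x |   |
| R3 | x |   | x |   |   | x |

**Step 3 (A → B)**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 | x | x |   |   |   |   |
| R2 |   |   | x | x | x |   |
| R3 | x | x | x |   |   | x |

• A exists in R1 and R3; since B exists in R1, we can set B to be in R3

**Step 3 (C → D, E)**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 | x | x |   |   |   |   |
| R2 |   |   | x | x | x |   |
| R3 | x | x | x | x | x | x |

• C exists in R2 and R3; since D exists in R2, we can set D to be in R3. Same with E

**Step 3 (A, C → F)**:

|    | A | B | C | D | E | F |
|----|---|---|---|---|---|---|
| R1 | x | x |   |   |   |   |
| R2 |   |   | x | x | x |   |
| R3 | x | x | x | x | x | x |

• A and C exist in R3, but since no other relation contains both A and C, we can't set F
anywhere else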
**Step 4**:
• As `R3` is composed entirely of `x` symbols, the decomposition has the lossless
join property.
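The matrix procedure above can be sketched in Python by storing each row as the set of columns currently marked `x`. This follows the simplified marking scheme used in these notes, not the full textbook version with distinguished and non-distinguished symbols. The function name `lossless_join` is chosen for this example, and FDs are given as `(X, Y)` pairs of attribute strings.

```python
def lossless_join(R, D, fds):
    # One row per relation in D; a row holds the attributes marked 'x'.
    S = [set(Ri) for Ri in D]
    changed = True
    while changed:                      # repeat step 3 until nothing changes
        changed = False
        for X, Y in fds:
            rows = [row for row in S if set(X) <= row]
            # Every Y attribute marked in one matching row...
            known = set()
            for row in rows:
                known |= row & set(Y)
            # ...gets marked in all other rows that also contain X.
            for row in rows:
                if not known <= row:
                    row |= known
                    changed = True
    return any(row >= set(R) for row in S)

R = set("ABCDEF")
D = [set("AB"), set("CDE"), set("ACF")]
FD = [("A", "B"), ("C", "DE"), ("AC", "F")]
print(lossless_join(R, D, FD))

# Dropping R3 leaves no row that can ever be filled completely.
print(lossless_join(set("ABCDE"), [set("AB"), set("CDE")], FD))
```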