CS309 IDS Lecs 1-19-2
Rohit Saluja
Data vs. Information
Data -> Process -> Information
- data/information
- stored and accessed electronically
- AMU: access, manage, update
Source: https://en.wikipedia.org/wiki/Database
Image by fancycrave1 from Pixabay
DBMS
Database Management System
- Highly Valuable
- Relatively Large
- Simultaneously Accessed by
- Multiple Users and
- Multiple Applications
Image by Stephan from Pixabay , Reference: Database System Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan, SEVENTH EDITION
Database System Applications
Databases in our day to day life:
- Grade Management
- Amazon
- Facebook
- Google
- Top to Bottom:
- Time: Old to New
- Volume of data:
- Megabytes (MB) -> Petabytes (10^6 GB) -> Exabytes (10^9 GB)
- Should we store data for all 4 in same way?
- Enterprises
- Sales
- Accounting
- Human Resources
- Bank Transactions
Roll Number Name Fees CS309 CGPA Credits
B19001 CS309
CSE 3-0-2-4
UG
CS309
Compulsory for CSE; CS elective for
CS207
EE and ME
IC152 The students will be exposed to the
Rs. 40,000 core concepts …
File Based
Student Information:- Course Information:-
B19001 CS309
UG
CS309
Compulsory for CSE; CS elective for Add Course
CS207
EE and ME
IC152 The students will be exposed to the Map
Rs. 40,000 core concepts … Student and
Course
Relational Model
Issues (with the file-based approach):
- Not self-describing
- Inadequate abstraction
- Copies may lead to data inconsistency and redundancy
- No insulation between program and data
- Multiple programs access the same data
- Data’s life outlives the program’s life
- Inefficient query mechanisms, e.g., how many students failed CS309 last year?
- access multiple files
- formats differ across files
- No notion of enforcing constraints (e.g. enforcing a prerequisite)
- Inconsistent state: Dropping a course -> name removed from mailing list, but credits to be completed in the sem. are not increased
- Debit from account A, but not credited to account B
- Atomicity: Either all operations execute, or none of them execute
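The debit/credit case can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the table name `account`, the balances, and the helper `transfer` are hypothetical and not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a balance may never go below zero.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 30)       # both updates are applied
failed = transfer(conn, "A", "B", 500)  # debit violates the CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the second call the credit to B runs first and succeeds, yet it does not survive, because the whole transaction is rolled back when the debit fails.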
File Based
Student Information:- Course Information:-
B19001 CS309
UG
CS309
Add Course
Compulsory for CSE; CS elective for
CS207
EE and ME
IC152 The students will be exposed to the Map
Student and
Rs. 40,000 core concepts …
Course
Query: Add
Course is Prerequisite
UG or not ? to Course
File Based
Student Information:- Course Information:-
CS309
B19001
Information and Database Systems
Rahul Sharma
3-0-2-4 Add Student
CSE
CS207
CS309 UG
Add Course
CS207 Compulsory for CSE; CS elective for EE
and ME
IC152 Map
The students will be exposed to the core
Student and
concepts …
Rs. 40,000 Course
Query: Add
Course is Prerequisite
UG or not ? to Course
File Based
Student Information:-
B19001
Rahul Sharma
Add Student
CSE Query:
What are
CS309 B19001’s
courses? Add Course
CS207
IC152 Query: Map
What is B19001’s Student and
Rs. 40,000 fees? Course
File Based
Student Information for Accounts:- Student Courses:-
Data Model
Requirement Analysis -> Conceptual Design -> Mapping (Logical Design)
[Figure: example attributes stock-price, price, name; relations A, B; Index]
Attributes of
Company makes Product Entities
Multiplicity of ER Relations
- one-one
- many-one
- many-many
[Figures: mappings between elements a, b, c, d and 1, 2, 3 for each case]
Database Design Process: Conceptual Design
E/R Diagram
Conceptual Design
first_name last_name
name
address
Person
phone_no address
Person
date_of_birth
age
Person
- An attribute for which each entity must have a unique value is called a key attribute of the entity type. For example, the Aadhaar number of EMPLOYEE.
- A key attribute may be composite. For example, VehicleTagNumber is a key of the CAR entity type with components (Number, State).
- An entity type may have more than one key. For example, the CAR entity type may have two keys:
- VehicleIdentificationNumber (popularly called VIN) and
- VehicleTagNumber (Number, State), also known as the license plate number
- Entity Set – collection of entities of a particular type
Structural Constraints on Relationships
A (min, max) constraint specifies that each entity e in E participates in at least min and at most max relationship instances in R
Participation Constraints
- Total participation (indicated by double line): every entity in the entity set
participates in at least one relationship in the relationship set.
- Partial participation: some entities may not participate in any relationship in the
relationship set
Students registers Courses
Some courses might not be offered in a semester.
Roles
manager
employee works for
worker
- Another E.g.
- Course has another course as prerequisite.
Weak Entities
- An entity type may have no key. Then it is referred to as weak entity.
- e.g. Payment of Loans
- Without a Loan, there is no meaning to Payment.
- Thus, the existence of a weak entity depends on the existence of an identifying entity
- Other e.g.
- Employee - has - dependants (weak entity).
- Course <- has = Sections/Course Offering (Total Participation).
Weak Entities
- The discriminator (or partial key) of a weak entity set is the set of attributes that
distinguishes among all the entities of a weak entity set.
- Total participation of weak entity
- One-to-Many relationship from the identifying to weak entity
payment-number
date
SID SName
Rating
Key for Relationship Set Courses Instructors
- Entities
- Course
- …
Aggregation
How to model relations between company, student and job offer?
- E/R Model does not allow relationships between relationships
For a union type, the individual entities need not be similar. E.g. an e-mail “contains” items, where an item is a union type of dissimilar entities such as text, link, and attachment.
Formal Definitions of EER Model
Class C:
- could be entity type, subclass, superclass, or category
Subclass S is a class whose:
- Type inherits all the attributes and relationship of a class C
- Set of entities must always be a subset of the set of entities of the other
class C
- S⊆C
- C is called the superclass of S
- A superclass/subclass relationship exists between S and C
Formal Definitions of EER Model
- Specialization Z: Z = {S1, S2, …, Sn} is a set of subclasses with the same superclass G; hence, G/Si is a superclass/subclass relationship for i = 1, …, n.
- G is called a generalization of the subclasses {S1, S2, …, Sn}
- Z is total if we always have:
- S1 ∪ S2 ∪ … ∪ Sn = G;
- Otherwise, Z is partial.
- Z is disjoint if we always have:
- Si ∩ Sj = ∅ for i ≠ j;
- Otherwise, Z is overlapping.
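The total/partial and disjoint/overlapping conditions can be checked on one snapshot of entity sets with ordinary Python sets. This is only a sketch: the entity names `e1` … `e4` and the helper names are made up, and the formal definitions require the conditions to hold always, not just for one snapshot.

```python
def is_total(G, subclasses):
    """Total: the union of all subclasses equals the superclass G."""
    return set().union(*subclasses) == set(G)

def is_disjoint(subclasses):
    """Disjoint: no entity belongs to two different subclasses."""
    seen = set()
    for s in subclasses:
        if seen & set(s):
            return False
        seen |= set(s)
    return True

# Hypothetical entity sets.
G = {"e1", "e2", "e3", "e4"}
S1 = {"e1", "e2"}
S2 = {"e2", "e3"}
S3 = {"e3", "e4"}

partial_overlapping = (is_total(G, [S1, S2]), is_disjoint([S1, S2]))
total_disjoint = (is_total(G, [S1, S3]), is_disjoint([S1, S3]))
```

Here {S1, S2} is partial (e4 is in no subclass) and overlapping (e2 is in both), while {S1, S3} is total and disjoint.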
Formal Definitions of EER Model
- Predicate: A condition expression which evaluates to True or False.
- Subclass S of C is predicate defined if predicate (condition) p on attributes
of C is used to specify membership in S;
- that is, S = C[p], where C[p] is the set of entities in C that satisfy condition p
- A subclass not defined by a predicate is called user-defined
Formal Definitions of EER Model
- Category or UNION type T
- A class that is a subset of the union of n defining superclasses D1, D2,…Dn, n>1:
- T ⊆ (D1 ∪ D2 ∪ … ∪ Dn)
- Can have a predicate pi on the attributes of Di to specify entities of Di that
are members of T.
- If a predicate is specified on every Di:
- T = (D1[p1] ∪ D2[p2] ∪…∪ Dn[pn])
Summary
- Introduced the EER model concepts
- Class/subclass relationships
- Specialization and generalization
- Inheritance
Database Design Process Physical Design
S_RNo CGPA
B20120 8.8
- Values in a tuple can be:
- Atomic values, e.g. the composite attribute name is stored as atomic values (first name, middle name, …)
- Null value
- Unknown
- No value
- May create confusion about whether the value is unknown or does not exist, so we generally avoid null values
- A specific tuple is referred to as t
- t[Ai] gives the value of attribute Ai for tuple t
- Referring to a subset of attributes, e.g. attributes A1, Ak, and Aj in tuple t:
- t[A1, Ak, Aj]
- Relational Integrity Constraints
- Conditions that must hold on all valid relation instances
- Key constraints: For any two distinct tuples t1 and t2, if Sk is a super key, then
- t1[Sk] ≠ t2[Sk], i.e. the values cannot be the same
- E.g. (B20320, 8.8) ≠ (B20120, 8.8)
- Entity constraints:
- Can a candidate key (a minimal super key) have length > 1?
- If Pk is the primary key, then
- t[Pk] ≠ null for any tuple t in r(R)
Relational Database
- Referential Integrity Constraints
- Student enrolls in Courses
- Student: (S1, S1Name, ..), (S2, …), …, (Sn, …)
- Enrols: (S1, C2), (S2, C1), (Sn+1, C2)
- Course: (C1, …), (C2, …), …, (Cn, …)
Here
- The Student table is called the referenced relation
- The Enrols table is called the referencing relation
- S1 in relation Enrols is a value of the foreign key
- Constraint: The value of the foreign key should be the value of an existing primary key in the referenced relation
Relational Database
- Referential Integrity Constraints
- Student enrolls in Courses
Creating the structure of a table is called defining the schema of the table.
- Schema for Student table: Student (SID, SName, …)
- Schema for Enrols table: Enrols (SID, CID, grade, …)
- Schema for Course table: Course (CID, CName, …)
Relational Database
- Referential Integrity Constraints
- Student enrolls in Courses
- Student: (S1, S1Name, ..), (S2, …), …, (Sn, …)
- Enrols: (S1, C2), (S2, C1), (Sn+1, C2)
- Course: (C1, …), (C2, …), …, (Cn, …)
- S1 in relation Enrols is a value of the foreign key
- Foreign key can:
- Be an existing primary key value in the referenced relation
- Have a null value
- Then the foreign key cannot be a prime attribute, or a combination containing it cannot form a primary key
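A DBMS can enforce the referential integrity constraint itself. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption; the course uses MySQL). The names `S2Name`, `C1Name`, `C2Name`, the student `S3` (playing the role of Sn+1), and the helper `try_enrol` are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Student (SID TEXT PRIMARY KEY, SName TEXT);
CREATE TABLE Course  (CID TEXT PRIMARY KEY, CName TEXT);
CREATE TABLE Enrols  (
    SID TEXT REFERENCES Student(SID),   -- foreign key into the referenced relation
    CID TEXT REFERENCES Course(CID),
    PRIMARY KEY (SID, CID)
);
INSERT INTO Student VALUES ('S1', 'S1Name'), ('S2', 'S2Name');
INSERT INTO Course  VALUES ('C1', 'C1Name'), ('C2', 'C2Name');
""")

def try_enrol(sid, cid):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Enrols VALUES (?, ?)", (sid, cid))
        return True
    except sqlite3.IntegrityError:
        return False

# S3 does not exist in Student, so the last insert is rejected.
results = [try_enrol("S1", "C2"), try_enrol("S2", "C1"), try_enrol("S3", "C2")]
```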
Semantic Constraints
- Value Constraint:
- E.g. lower limit and upper limit to student’s salary
- Non-volatility constraint
- Data item cannot be modified
- Record constraints
- Deductions cannot be greater than gross salary
- Required during Insert, Update, Delete operations
- E.g. Enrol and Student both have SID attribute
- delete entry from Student table
- Update entry on Student table: ME2XX -> CS2XX
- Cascade operations
- CS207: {Triggers, Assertions}
ER Diagram to Relational Model Physical Design
FIGURE 4.7 A specialization lattice with multiple inheritance for a UNIVERSITY database.
Representing Class Hierarchy
- Mapping of Shared Subclasses (Multiple Inheritance)
- A shared subclass, such as STUDENT_ASSISTANT, is a subclass of
several classes, indicating multiple inheritance. These classes must all
have the same key attribute; otherwise, the shared subclass would be
modeled as a category.
Example – Representing Class Hierarchy
FIGURE 7.5 Mapping the EER specialization lattice in Figure 4.6 using multiple options.
Major
Specialization
1. Multiple relations
- Make a table for the superclass as well as for the subclasses
- The primary key of each subclass is the same as the primary key of the superclass (because it is an is-a relationship)
[Figure: Super-Class (key pk) specialized (d) into S1 and S2, with keys p1k, p2k and attributes a1, b1 and a2, b2]
Specialization
1. Multiple relations
2. Multiple relations, but only for sub-classes
3. Single relation for super-class only
- type attribute needed
- Will work only in case of disjoint
- n subclasses implies n values
- type: {1, 2}, where 1 represents s1, and 2 represents s2.
- Overlapping -> both s1 and s2 may apply, n subclasses implies 2^n values
- Type (may use binary representation like following):
- {0, 1, 2, 3}, where 3 (= b11) means s1 and s2 both, 0 (= b00) means none of s1 and s2.
- Total participation => 2^n - 1 values
- type: {1, 2, 3}
- Limitation:
- As local attributes of subclasses increase, null values increase
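The binary representation of the type attribute can be sketched as a bitmask, with one bit per subclass. The helper names `encode` and `decode` are made up for this illustration; bit 0 stands for s1 and bit 1 for s2, matching the values listed above.

```python
SUBCLASSES = ["s1", "s2"]  # n = 2 subclasses, so 2^n = 4 possible type values

def encode(members):
    """Turn a set of subclass names into a single integer type value."""
    return sum(1 << i for i, name in enumerate(SUBCLASSES) if name in members)

def decode(type_value):
    """Recover the set of subclass names from a type value."""
    return {name for i, name in enumerate(SUBCLASSES) if type_value & (1 << i)}

all_values = list(range(2 ** len(SUBCLASSES)))    # overlapping, partial participation
total_values = [v for v in all_values if v != 0]  # total participation excludes 0

both = encode({"s1", "s2"})  # binary 11
only_s2 = decode(2)          # binary 10
```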
Specialization
1. Multiple relations
2. Multiple relations, but only for sub-classes
3. Single relation for super-class only
4. Single relation with multiple type attributes
- Multiple type attributes (n boolean flags)
- Works for the overlapping case also
Category Union Type
- S1, S2, … are of different types
- So, S1, S2, … can have different keys
- What will be the key for the sub-class, given that the superclasses have different keys?
- Create a surrogate (placeholder) key sk of length N = |p1k| + |p2k| + ..
- Add foreign keys to S1, S2, …
- referencing the sub-class’s surrogate key sk
- In other words:
- For mapping a category whose defining superclasses have different keys, it is customary to specify a new key attribute, called a surrogate key, when creating a relation to correspond to the category.
- In the example on the next slide we can create a relation OWNER to correspond to the OWNER category and include any attributes of the category in this relation. The primary key of the OWNER relation is the surrogate key, which we called OwnerId.
[Figure: superclasses S1 (key p1k) and S2 (key p2k) joined through a union (U) into Sub-Class (key sk)]
FIGURE 4.8 Two categories (union types): OWNER and REGISTERED_VEHICLE.
FIGURE 7.6
a1 a2 a3
a1 a2 a3 b1 b2
b1 b2
Relational Algebra
- How do we write queries on the relational model?
- Relational algebra is the formal method/specification for writing queries on the relational model.
- Relational algebra is a formal query language defined over the relational model.
- SQL (Structured Query Language) is the query language implemented by MySQL (and other databases!).
- The basic set of operations for the relational model is known as the relational algebra.
- The result of a relational algebra operation is a new relation.
- Relational Algebra Expression – a sequence of relational algebra operations
Example: simple college admissions database
College (cName, state, enrollment)
aID
Theta Join: combine two relations
- Will Natural Join work if the common attribute names are different in the two tables?
- E.g. sID and aID in the following tables.
- So a theta join is needed
- R ⋈θ S, where θ is R.att1 = S.att2
- Theta join is the basic operation implemented in a DBMS
- The term “join” often means theta join
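A theta join is a Cartesian product followed by a selection on the condition θ. The sketch below models relations as lists of dicts; the sample rows and the helper `theta_join` are hypothetical, and it assumes the two relations have no attribute names in common.

```python
def theta_join(R, S, theta):
    """Keep every pair (r, s) from R x S that satisfies the condition theta."""
    return [{**r, **s} for r in R for s in S if theta(r, s)]

# Hypothetical sample data: the matching attributes have different names.
Student = [{"sID": 1, "sName": "Asha"}, {"sID": 2, "sName": "Ravi"}]
Apply = [{"aID": 1, "cName": "IIT Mandi"}, {"aID": 1, "cName": "IIT Ropar"}]

# theta is Student.sID = Apply.aID
joined = theta_join(Student, Apply, lambda r, s: r["sID"] == s["aID"])
```

A natural join would find no common attribute name here and degenerate into a Cartesian product, which is why the explicit condition is needed.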
Relational Algebra Operations From Set Theory
- Union, Intersection, Difference, etc.
- Type Compatibility
- The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai) = dom(Bi) for i = 1, 2, ..., n.
- The resulting relation for R1 ∪ R2, R1 ∩ R2, or R1 - R2 has the same attribute names as the first operand relation R1 (by convention).
Union Operator
- List of college and student names
- T <- πcName(College) U πsName(Student)
- Resultant Schema will be T(cName)
- Entries will be union of cName and sName
- If all student and college names are unique, what will be |T|
- |College| + |Student|
- In general, |T| <= |R1| + |R2| for T <- R1 U R2
Difference Operator
- IDs and names of students who didn’t apply anywhere
- DA <- πsID(Student) - πsID(Apply)
- DA ⋈ πsID, sName(Student)
Intersection Operator
- Id and Name of students whose GPA is greater than 9 but were rejected.
- Hint: start with πsID(σGPA > 9 (Student))
- R <- πsID(σGPA > 9 (Student)) ∩ πsID(σdec = reject (Apply))
- R ⋈ πsID, sName(Student)
- Id and Name of students who applied to both CS and EE major.
Rename Operator
- Rename a relation
- Rename the columns
- Rename both column and relation
Rename Operator
- ρR(a1, a2, a3, …)(S)
- S gets renamed to R,
- and attributes of S get renamed to a1, a2, a3, …
- For disambiguation in “self-joins”
- Pairs of colleges in same state
- Hint: Start with ρC1(College)
- R <- σC1.state = C2.state ^ C1.cName ≠ C2.cName(ρC1(College) X ρC2(College))
- πC1.cName, C2.cName(R)
- How to avoid the following:
- IIT Mandi, IIT Ropar
- IIT Ropar, IIT Mandi
- Change C1.cName ≠ C2.cName to
- C1.cName >l C2.cName
- >l represents lexicographic order
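The effect of replacing ≠ with a lexicographic comparison can be seen in a small sketch of the self-join. The college names and states below are placeholders, and Python's string comparison plays the role of >l.

```python
from itertools import product

# Hypothetical College(cName, state) tuples.
College = [("Alpha", "State1"), ("Beta", "State1"), ("Gamma", "State2")]

# College x College with C1.state = C2.state and C1.cName != C2.cName:
# every pair appears twice, once in each order.
pairs_ne = [
    (c1, c2)
    for (c1, s1), (c2, s2) in product(College, College)
    if s1 == s2 and c1 != c2
]

# Using C1.cName > C2.cName instead keeps exactly one ordering of each pair.
pairs_gt = [
    (c1, c2)
    for (c1, s1), (c2, s2) in product(College, College)
    if s1 == s2 and c1 > c2
]
```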
Division Operation
- The division operation is applied to two relations
R(Z) ÷ S(X),
- where X is subset of Z.
- Let Y = Z - X (and hence Z = X U Y);
- that is, let Y be the set of attributes of R that are
not attributes of S.
- For a tuple t to appear in the result T of the
DIVISION, the values in t must appear in R in
combination with every tuple in S.
- The result of DIVISION is a relation T(Y)
- that includes a tuple t if tuples tR appear in R
with tR [Y] = t, and with tR [X] = ts for every
tuple ts in S.
Division Operation
Example:
- R has SID and TID pairs for all tracks.
- Track IDs of a particular destination
are given in S.
- Division will give ID of students who
have gone on all tracks of the
particular destination.
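The example can be sketched with Python sets. R holds (SID, TID) pairs and S holds the track IDs of one destination; the IDs below are made up, and `divide` is a hypothetical helper for the special case where R has exactly two attributes.

```python
def divide(R, S):
    """Return every y such that (y, x) is in R for every x in S."""
    ys = {y for y, _ in R}
    return {y for y in ys if all((y, x) in R for x in S)}

# Hypothetical data: which student (SID) has gone on which track (TID).
R = {
    ("B1", "T1"), ("B1", "T2"),
    ("B2", "T1"),
    ("B3", "T1"), ("B3", "T2"), ("B3", "T3"),
}
S = {"T1", "T2"}  # all tracks of the particular destination

went_on_all = divide(R, S)
```

B2 is excluded because the pair (B2, T2) is missing from R, while B3 qualifies even though it also appears with a track outside S.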
Additional Relational Operations
<group-by attributes> ℱ <function list> (R)
Use of Functional Operator
- Given a relational schema: Employee(SSN, Dno, Salary)
- ℱMAX Salary(Employee)
- retrieves the maximum salary value from the Employee relation
- How to get id of employee with maximum salary?
- Note ℱ will return a relation, not a scalar value.
- ℱMIN Salary(Employee)
- retrieves the minimum Salary value from the Employee relation
- ℱSUM Salary(Employee)
- retrieves the sum of the Salary from the Employee relation
- Dno ℱ COUNT SSN, AVERAGE Salary(Employee)
- groups employees by DNO (department number) and computes the
count of employees and average salary per department.
- Note: count just counts the number of values in that column (ignores
NULL), without removing duplicates.
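The grouping behaviour, including how COUNT and AVERAGE skip NULL, can be sketched in plain Python with None standing in for NULL. The sample rows and the helper `group_count_avg` are hypothetical. The last two lines show one way to answer the question about the employee with the maximum salary: compute the maximum first, then select the matching tuples.

```python
from collections import defaultdict

# Hypothetical Employee(SSN, Dno, Salary) rows; None plays the role of NULL.
Employee = [
    {"SSN": "111", "Dno": 1, "Salary": 100},
    {"SSN": "222", "Dno": 1, "Salary": 300},
    {"SSN": "333", "Dno": 2, "Salary": 200},
    {"SSN": "444", "Dno": 2, "Salary": None},
]

def group_count_avg(rows):
    """Group by Dno, then compute COUNT SSN and AVERAGE Salary per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Dno"]].append(row)
    result = []
    for dno, members in sorted(groups.items()):
        ssns = [m["SSN"] for m in members if m["SSN"] is not None]
        salaries = [m["Salary"] for m in members if m["Salary"] is not None]
        avg = sum(salaries) / len(salaries) if salaries else None
        result.append({"Dno": dno, "COUNT_SSN": len(ssns), "AVERAGE_Salary": avg})
    return result  # a relation (list of tuples), not a scalar

per_department = group_count_avg(Employee)

max_salary = max(r["Salary"] for r in Employee if r["Salary"] is not None)
top_earners = [r["SSN"] for r in Employee if r["Salary"] == max_salary]
```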
Aggregate Function Output
The OUTER JOIN Operation
- In NATURAL JOIN tuples without a matching (or related) tuple
are eliminated from the join result.
- Tuples with null in the join attributes are also eliminated.
- This amounts to loss of information.
- To retain all the tuples in R or S or both whether or not they
have matching tuples in the other relation, we use outer joins.
- Left outer join
- Right outer join
- Full outer join
- Pad with null value if no matching tuple is found in the other
relation.
The OUTER JOIN Operation
- Left outer join example
- Employee(EName, SSN)
- Department(DName, Manager SSN)
- Employee ⋈ ρ(DName, SSN)(Department) will only return tuples
with Employees who are managers.
- If we want all employees, then we should use:
- Employee ⟕ ρ(DName, SSN)(Department)
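The difference between the two expressions can be sketched with lists of dicts. The employee names and SSNs are made up, Department is shown after the rename of Manager SSN to SSN, and the helpers assume the join attribute is never null in the sample data.

```python
# Hypothetical data.
Employee = [{"EName": "Asha", "SSN": "111"}, {"EName": "Ravi", "SSN": "222"}]
Department = [{"DName": "CSE", "SSN": "111"}]

def natural_join(R, S, on):
    """Inner join on one shared attribute: unmatched tuples are dropped."""
    return [{**r, **s} for r in R for s in S if r[on] == s[on]]

def left_outer_join(R, S, on):
    """Keep every tuple of R, padding the attributes of S with None (null)."""
    s_attrs = {k for s in S for k in s} - {on}
    out = []
    for r in R:
        matches = [s for s in S if s[on] == r[on]]
        if matches:
            out.extend({**r, **s} for s in matches)
        else:
            out.append({**r, **{k: None for k in s_attrs}})
    return out

managers_only = natural_join(Employee, Department, "SSN")
all_employees = left_outer_join(Employee, Department, "SSN")
```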
Alter schema add ‘rating’ for every faculty
- ρ(Fid, Rating)(πFid,‘1’(Rating1-Faculty))
ID2 1 ID2 1
ID3 1 ID3 1
ID4 1 ID4 1
MySQL Workbench
- Signup and install from:
- https://dev.mysql.com/downloads/workbench/
- Add new database
- Add new table
- PK: Primary Key
- NN: Not Null (always set for a Primary Key)
- UQ: Unique
- UN: Unsigned
- ZF: Zero Fill (pads numeric values with leading zeros)
- AI: Auto Increment
- Rows in the “Columns” tab represent different columns
MySQL Workbench
- Add foreign key
- Old version: Relational View (for foreign key constraints)
- 8.0:
- foreign key tab in same table
- If the referenced primary key is deleted, then:
- cascade (delete the referencing rows as well)
- no action (in MySQL this behaves like restrict)
- set null (make the foreign key value null)
- restrict (don't allow that particular tuple to be deleted from the referenced table)
- Inserts tab: add values to table
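The delete options can be tried out without Workbench. The sketch below uses Python's built-in sqlite3 module rather than MySQL (an assumption made so the example is self-contained; the referential actions are standard SQL, but details can differ between systems). The tables and the helper `demo` are hypothetical.

```python
import sqlite3

def demo(action):
    """Delete a referenced Student row and report what happens to Enrols."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for enforcement in SQLite
    conn.execute("CREATE TABLE Student (SID TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Enrols ("
        f"SID TEXT REFERENCES Student(SID) ON DELETE {action}, CID TEXT)"
    )
    conn.execute("INSERT INTO Student VALUES ('S1')")
    conn.execute("INSERT INTO Enrols VALUES ('S1', 'C1')")
    try:
        conn.execute("DELETE FROM Student WHERE SID = 'S1'")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT SID, CID FROM Enrols").fetchall()

after_cascade = demo("CASCADE")    # referencing row is deleted too
after_set_null = demo("SET NULL")  # referencing row stays, foreign key becomes null
after_restrict = demo("RESTRICT")  # the delete itself is refused
```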
Relational Design Theory
Physical Design
- Redundancy
- May be ok if we have large space
- But redundancy leads to the following:
- Inconsistencies
- Query Efficiency
- In general, query efficiency goes down (σSID > S3 (E))
- In this case it increases: σSID > S3, CID < C2 (E)
- RAM: faster but less memory
- Disk: slow but high memory, has all records
- Multiple cycles are needed if multiple chunks of tables are transferred from disk to RAM due to redundancy
- SID, CID: 6 chars each
- CName, SName: 50 chars each
- CSyllabus: 1500 chars
- Size of a record with only SID and CID: 12 chars in query
- 1600 chars in query: if multiple chunks of a table are shifted to RAM, then in general query efficiency goes down due to redundancy.
[Figure: Processor and Disk; tables Student, Enrols, Course; combined table with columns SID, SName, CID, CName, Semester]
What is a Good Design?
- No Redundancy
- Query should be very efficient
- No loss of information
- Lossless Decomposition: Decomposing R into R1 and R2 should not lead to loss
of info
- i.e. πR1(R) ⋈ πR2(R) should give back original set of tuples in R
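The lossless condition can be tested on a concrete relation instance by projecting onto R1 and R2 and joining the projections back. This is a sketch with made-up rows and hypothetical helper names; a passing check on one instance does not prove that the decomposition is lossless for every instance.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in seen]

def natural_join(R, S):
    """Join on all attributes the two relations have in common."""
    out = []
    for r in R:
        for s in S:
            common = r.keys() & s.keys()
            if all(r[a] == s[a] for a in common):
                out.append({**r, **s})
    return out

def as_set(rows):
    return {frozenset(row.items()) for row in rows}

def is_lossless(R, attrs1, attrs2):
    """True if joining the two projections gives back exactly the tuples of R."""
    joined = natural_join(project(R, attrs1), project(R, attrs2))
    return as_set(joined) == as_set(R)

# Hypothetical instance: two different students share the same name.
R = [
    {"SID": "S1", "SName": "Asha", "CID": "C1"},
    {"SID": "S1", "SName": "Asha", "CID": "C2"},
    {"SID": "S2", "SName": "Asha", "CID": "C1"},
]

good = is_lossless(R, ["SID", "SName"], ["SID", "CID"])
bad = is_lossless(R, ["SID", "SName"], ["SName", "CID"])
```

The second decomposition joins on SName and produces the spurious tuple (S2, Asha, C2), so the original relation cannot be recovered from it.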
R1 R2
Relation Design Theory
- Start with a mega relation with all the attributes
- Provide constraints (or functional dependencies) that hold in the domain we are
trying to model
- E.g. SID -> SName (i.e. SID determines SName is a functional dependency)
- i.e. if t1[SID] = t2[SID] => t1[SName] = t2[SName]
- CID determines CName is another functional dependency
- How to know functional dependencies?
- Domain experts will know the functional dependencies, ask them
- Infer from the data, confirm with domain experts and add constraint if true
- Automatic Design (Decomposition)
- No redundancy
- Efficient queries
- Lossless decomposition
[Figure: R decomposed into R1 and R2]
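The condition t1[SID] = t2[SID] => t1[SName] = t2[SName] can be checked mechanically on a relation instance, which is one way to infer candidate dependencies from data. The rows and the helper `fd_holds` below are hypothetical. Data can only refute a dependency; a dependency that happens to hold in the data still needs confirmation from a domain expert.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in this particular instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first tuple with this key fixes the value all others must match.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical mega relation E(SID, SName, CID, CName).
E = [
    {"SID": "S1", "SName": "Asha", "CID": "C1", "CName": "DBMS"},
    {"SID": "S1", "SName": "Asha", "CID": "C2", "CName": "OS"},
    {"SID": "S2", "SName": "Ravi", "CID": "C1", "CName": "DBMS"},
]

sid_to_sname = fd_holds(E, ["SID"], ["SName"])
cid_to_cname = fd_holds(E, ["CID"], ["CName"])
sid_to_cid = fd_holds(E, ["SID"], ["CID"])
```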
Relation Design Theory: Normalization
- Normalization: the process of organizing data and attributes in a db
- Normalization happens through a series of Normal Forms (which the final set of relations satisfies)
- An algorithm for automatic decomposition leads to a particular normal form
- Several Normal Forms
- 1NF, 2NF, 3NF, Boyce-Codd Normal Form (BCNF), 4NF, 5NF
- 1NF: attributes must be atomic (e.g. FName, LName), but it does not eliminate redundancy
- 2NF eliminates some redundancy, 3NF eliminates some more, and so on.
- Lower-order NFs permit some redundancy which higher-order NFs eliminate
- Given all attributes and functional dependencies, we get an automatic decomposition based on the chosen normal form
R
R1 R2
Functional Dependencies and BCNF
- Apply (SSN, sName, cName)
- Redundancy; Update & Deletion Anomalies
- Storing SSN-sName pair once for each college
- Functional Dependency: SSN -> sName
- Same SSN always has same sName
- Should store each SSN’s sName only once
- Boyce-Codd Normal Form:
- If A -> B then A is a key
- Decompose: Student (SSN, sName) Apply(SSN, cName)
R1 R2
Functional Dependencies and BCNF
- Student (SSN, sName, address, HScode, HSname, HScity, GPA, priority)
- Functional Dependencies:
- SSN -> sName, address
- t1[SSN] = t2[SSN] => t1[sName] = t2[sName] and so on
- Refer this link to know why SSN does not determine HScode always
- HScode -> HSname, HScity
- GPA -> priority
- {SSN, HScode} -> GPA
- Boyce-Codd Normal Form:
- If A -> B then A is a key
- Decompose:
- R1 (SSN, sName, address)
- R2 (HScode, HSname, HScity)
- R3 (SSN, HScode, GPA)
- R4 (GPA, priority)
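Whether the left-hand side of a dependency is a key can be checked with the standard attribute-closure computation. This is a sketch over the functional dependencies listed above, not necessarily the procedure used in the lecture; the helper names are hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

ALL = {"SSN", "sName", "address", "HScode", "HSname", "HScity", "GPA", "priority"}
FDS = [
    ({"SSN"}, {"sName", "address"}),
    ({"HScode"}, {"HSname", "HScity"}),
    ({"GPA"}, {"priority"}),
    ({"SSN", "HScode"}, {"GPA"}),
]

def is_superkey(attrs):
    return closure(attrs, FDS) == ALL

# A dependency A -> B violates BCNF when A is not a (super)key.
violations = [lhs for lhs, _ in FDS if not is_superkey(lhs)]
key_ok = is_superkey({"SSN", "HScode"})
```

The three violating left-hand sides (SSN, HScode, GPA) each become the key of their own relation in the decomposition R1, R2, R4, while {SSN, HScode} remains the key of R3.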
R
R1 R2
Full Functional Dependency (2NF)
- X̄ ->Ȳ is a full functional dependency if removal of any attribute A from X̄ means
that the dependency does not hold any more.
- E.g. {SId, PNo} -> hrs ✓
- is a full functional dependency
- E.g. {SId, PNo} -> PName ❌
- is not a full functional dependency
- {PNo} -> PName ✓
- 2NF: A relation is in 2NF if every non-prime attribute A in R is fully functionally dependent on every candidate key.
- E.g. Student (SId, SName, PNo, PName, hrs)
- Functional Dependencies:
- SId, Pno -> rest of (non-prime) attributes
- Pno -> PName
- SId -> SName
- {SId, Pno} is the candidate key
- Pno -> PName and SId -> SName violate 2NF
- As PName depends on only part of the candidate key (Pno), not on all of it
- What if SName and PName are all unique
- then all attributes except hrs are prime (each belongs to some candidate key)
- and then Student (SId, SName, PNo, PName, hrs) is in 2NF (next slide).
Full Functional Dependency (2NF)
- X̄ ->Ȳ is a full functional dependency if removal of any attribute A from X̄ means
that the dependency does not hold any more.
- E.g. {SId, PNo} -> hrs ✓
- is a full functional dependency
- E.g. {SId, PNo} -> PName ❌
- is not a full functional dependency
- 2NF: A relation is in 2NF if every non-prime attribute A in R is fully functionally dependent on every candidate key.
- E.g. Student (SId, SName, PNo, PName, hrs)
- What if SName and PName are all unique
- then all attributes except hrs are prime (each belongs to some candidate key)
- and then Student (SId, SName, PNo, PName, hrs) is in 2NF. Still redundancy!
- Functional Dependencies:
- SId -> hrs ✓
- Pno -> hrs ✓
- SName -> hrs ✓
- PName -> hrs ✓
- {SId, Pno} -> hrs is a functional dependency (not fully ❌), but 2NF holds as {SId, Pno} is not
a candidate key. Still redundancy (SId -> SName and PNo -> PName)!
Non-trivial Functional Dependency: 3NF
- X̄ -> Ȳ is a trivial functional dependency if Ȳ ⊆ X̄
- Else it is non-trivial
- If X̄ ∩ Ȳ = Φ, then X̄ -> Ȳ is a completely non-trivial functional dependency
- 3NF: Whenever a non-trivial functional dependency X -> A holds in R, then:
- Either (a) X is a super-key of R
- Or (b) A is a prime attribute of R
- E.g. Student (SId, PNo, PName, Hrs, Salary) and PNames are all unique
- It does not violate 2NF ✓
- PNo -> PName does not violate 3NF ✓
- Hrs -> Salary violates 3NF ❌
- Note: Student (SId, PNo, PName, Hrs), given that PNames are all unique, does not violate 3NF
R1 R2
References and Acknowledgements
Text books:
- Database System Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan, Seventh Edition.
Slides derived from:
- Dr. Sriram’s Lectures in 2021
- J Ullman’s slides on ER modelling, Stanford University
- Dan Suciu’s slides, University of Washington
- Chapter 2 slides of Database System Concepts by Korth & Silberschatz
- Chapter 3 slides from Fundamentals of Database Systems by Navathe
Slides derived from slides on ER model to Relational Model
- Jung T. Chang, San Jose State University
- David Toman, University of Waterloo
- Sunnie S Chung, Ohio State University
- Navathe slides on Relational model
- Dr. Sriram’s Lectures in 2021
Relational Algebra: Slides derived from Navathe chapter 6 and from relational algebra slides by Jennifer Widom, Stanford University
Relational Design Theory: Slides derived from Prof. Jennifer Widom, Stanford University and Navathe Chapter 10