
Introduction

CS309: Information and Database Systems

Rohit Saluja
Data V/S Information

Process

Data Information

Images by Gerd Altmann and Pexels from Pixabay


Data V/S Information

Process

Images by Maikol Aquino and Pexels from Pixabay


Database
Organized collection of:

- data/information
- stored and accessed electronically
- AMU: access, manage, update

Source: https://en.wikipedia.org/wiki/Database
Image by fancycrave1 from Pixabay
DBMS
Database Management System

- collection of interrelated data DBMS


- and set of programs
- to access those data

Database systems manage data collections that are:

- Highly Valuable
- Relatively Large
- Simultaneously Accessed by
- Multiple Users and
- Multiple Applications
Image by Stephan from Pixabay , Reference: Database System Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan, SEVENTH EDITION
Database System Applications
Databases in our day to day life:

- Grade Management
- Amazon
- Facebook
- Google
- Top to Bottom:
- Time: Old to New
- Volume of data:
- Megabytes (MB) -> Petabytes (10^6 GB) -> Exabytes (10^9 GB)
- Should we store data for all 4 in same way?

Reference: Dr. Sriram’s Previous Year Lectures


Database System Applications
Volume of Data in Grade Management:

- No. of Students: ~2000


- Courses per Student: ~7 (courses/sem) * 8 (sems)
- At least 2000 * 60 ~ 10^5 entries
- Each entry has roll number, value of grade, etc.
- Roll Number: ‘b21001’ is stored in ~10 bytes
- Grade is stored in ~1 byte
- ~11 bytes per entry
- So total is ~10^6 bytes, i.e. ~1-10 MB of data
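The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch that plugs in the slide's rough figures (about 2000 students, 7 courses per semester over 8 semesters, about 11 bytes per entry); the variable names are chosen for illustration.

```python
# Back-of-the-envelope estimate using the slide's approximate figures.
students = 2000
courses_per_student = 7 * 8      # ~7 courses per semester over 8 semesters
bytes_per_entry = 10 + 1         # ~10 bytes roll number + ~1 byte grade

entries = students * courses_per_student
total_bytes = entries * bytes_per_entry
total_mb = total_bytes / 10**6   # 1 MB taken as 10^6 bytes

print(entries, total_bytes, round(total_mb, 2))
```

The result lands at the low end of the 1-10 MB range, which is why a single machine handles this workload comfortably.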
Database System Applications
Volume of Data in Amazon:

- No. of Prime Users: ~160 M


- Books, CDs, Games etc: ~ 168 M
- So ~ 10^7 GB (10 Petabytes) of data

https://www.businessofapps.com/data/amazon-statistics/ Image by kirstyfields from Pixabay


Database System Applications
Volume of Data in Facebook:

- No. of Users: ~ 3000 M
- Images, Posts, Videos: ~ 100 M
- So ~ 10^8 GB (100s of Petabytes) of data

Image by Tumisu from Pixabay


Database System Applications
- Can we store all of the data of Amazon/Facebook/Google in a single machine?
- Data Centre of Google has millions of servers
- Issues?
- Failure
- Safety of data during failure
- Recovery mechanisms needed
- So, traditional techniques used for a single machine cannot be used for a large distributed set of machines

Image by Simon from Pixabay


Database System Applications
What has changed with time?
- Storage
- Format - no longer structured
- semi-structured: JSON, XML
- unstructured: text, images
- Queries
- From graphs: who has maximum influence?
- Analytics
- Hard Disks -> Main memory databases
What will this course cover?
Different Data Models: How to Store Data?
- Tables: Relational Model
- NoSQL, NewSQL, etc.
- Relational Database Design
- Data Manipulation with SQL
- Transactions:
- Concurrency control
- Recovery
- Principles of query processing
- Data storage
- Scalable data processing

Image by Colin Behrens from Pixabay


Evaluations

Image by Colin Behrens from Pixabay


Labs
- A11 PC Lab (55 capacity)
- Time: Monday 2-5 p.m.
- Or A18-2
- In groups of 5
- Depends on number of students registered
- And Laptops students have
- Plag check will be done

Image by Colin Behrens from Pixabay


Relational Model
Where are they used?

- Enterprises
- Sales
- Accounting
- Human Resources
- Bank Transactions
[Figure: example table with columns Roll Number, Name, Fees, CS309 Credits, CGPA]

Image by Gerd Altmann from Pixabay


Relational Model
Example to motivate Relational Model:

- Office Automation System (OAS)


- Storage: Only files are allowed
- Use cases:
- add student
- add course
- map student to course
File Based
Student Information:
- B19001
- Rahul Sharma
- CSE
- CS309
- CS207
- IC152
- Rs. 40,000
Course Information:
- CS309
- Information and Database Systems
- 3-0-2-4
- UG
- Compulsory for CSE; CS elective for EE and ME
- The students will be exposed to the core concepts …
File Based
Operations on the student and course files:
- Add Student
- Add Course
- Map Student and Course
Relational Model
Issues:

- Not self-describing
- Inadequate abstraction
- Copies may lead to data inconsistency + redundancy
- No insulation between program and data
- Multiple programs access the same data
- Data’s life outlives program’s life
- Inefficient query mechanisms, e.g., how many students failed CS309 last year?
- access multiple files
- formats differ across files
- No notion of enforcing constraints (e.g., enforcing a prerequisite)
- Inconsistent state:
- Dropping a course -> name removed from mailing list, but credits to be completed in the sem. are not increased
- Debit from account A, but not credited to account B
- Atomicity: Either all operations execute, or none of them execute
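The debit/credit issue can be illustrated with a small sketch using Python's built-in sqlite3 module. The `account` table, the balances, and the `transfer` helper are assumptions made for this example; they are not part of the course material. The credit is deliberately applied before the debit so that a failed debit leaves a partial change for the DBMS to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Fails the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok_small = transfer(conn, "A", "B", 30)    # both updates are applied
ok_large = transfer(conn, "A", "B", 500)   # debit fails, credit is undone
print(ok_small, ok_large, balances(conn))
```

In a plain file-based program the first write of the failed transfer would already be on disk; here the whole unit is undone, so B is never credited without A being debited.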
File Based
Operations on the student and course files:
- Add Student
- Add Course
- Map Student and Course
- Add Prerequisite to Course
Query:
- Is the course UG or not?
File Based
Student Information:
- B19001, Rahul Sharma, CSE, CS309, CS207, IC152, Rs. 40,000
Operations:
- Add Student
- Add Course
- Map Student and Course
Queries:
- What are B19001’s courses?
- What are B19001’s fees?
File Based
Student Information:
- B19001, Rahul Sharma, CSE, CS309, CS207, IC152, Rs. 40,000
Student Information for Accounts:
- B19001, Rahul Sharma, Rs. 40,000
Student Courses:
- B190012, Rahul Sharma, CS309, …
Operations:
- Add Student
- Add Course
- Map Student and Course
Queries:
- What are B19001’s courses?
- What are B19001’s fees?
Relational Model
DBMS solves these issues, and:

- Manages massive data persistently


- Stores data in a highly available manner (even if a machine goes down)
- Provides convenient and highly efficient way of accessing data
Recap
- Database: Collection of related data.
- DBMS helps manage massive persistent data in
- Safe,
- Highly available,
- Efficient, and
- Convenient manner
- File based approach has multiple limitations
DBMS Requirements
- Self-describing nature of data / Insulation between program and data
- via field names
- Catalogue
- size or storage requirement of each item
- previous applications still work even if new entries are added to data
- Controlling data redundancy and building appropriate data abstraction
- Redundancy may give rise to data inconsistency
- Efficient query mechanism
- Support for atomic operations / recovery from failure
- Non-atomicity may again lead to data inconsistency
- Enforcing data integrity constraints
- E.g., Adding CSAB8 as course should not be allowed
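The last requirement can be sketched with a CHECK constraint in SQLite (via Python's sqlite3 module). The rule used here, two capital letters followed by three digits, is an assumed course-code format inferred from codes such as CS309; the table, column names, and `add_course` helper are also illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE course (
        cid   TEXT PRIMARY KEY
              -- assumed format: two capital letters, then three digits
              CHECK (cid GLOB '[A-Z][A-Z][0-9][0-9][0-9]'),
        cname TEXT NOT NULL
    )
""")


def add_course(cid, cname):
    """Return True if the DBMS accepted the row, False if it was rejected."""
    try:
        with conn:
            conn.execute("INSERT INTO course VALUES (?, ?)", (cid, cname))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = add_course("CS309", "Information and Database Systems")
rejected = add_course("CSAB8", "Not a valid course code")
course_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
print(accepted, rejected, course_count)
```

The point of the sketch is that the rule lives with the data, so every program that writes to the table is held to it.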
What Next?
- What process to follow to come up with design?
- We have requirements specified by end users.
- Ask different questions to user.
- Analyze requirements,
- Based on analysis, come up with design.
- What DBMS gives you?
Database Design Process
Requirement Analysis -> Conceptual Design -> Data Model Mapping (Logical Design) -> Physical Design
- Requirement Analysis: consult different stakeholders to find out requirements
- Conceptual Design: E-R Model
- Analyze design alternatives and select an appropriate one
[Figure: E-R diagram in which Company (name, stock-price) makes Product (name, price, category), next to a table with columns R.No., Name, Fees, CGPA]
Database Design Process: Physical Design
- Index, e.g. on Sender, Date, Keywords
- Optimize queries: different indexes are created based on the queries that are used frequently.
- For quick retrieval
Image by Oberholster Venita from Pixabay
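As a rough illustration of indexing for a frequent query, the sketch below creates an index on a sender column in SQLite and asks the query planner how it would run a lookup. The `mail` table, its columns, and the generated rows are assumptions made for the example, loosely following the Sender/Date/Keywords labels on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mail ("
    " id INTEGER PRIMARY KEY, sender TEXT, sent_date TEXT, keywords TEXT)"
)
conn.executemany(
    "INSERT INTO mail (sender, sent_date, keywords) VALUES (?, ?, ?)",
    [
        (f"user{i % 50}@example.com", f"2024-01-{i % 28 + 1:02d}", "db")
        for i in range(1000)
    ],
)

# Index chosen for a frequently used query: look up mail by sender.
conn.execute("CREATE INDEX idx_mail_sender ON mail (sender)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM mail WHERE sender = ?",
    ("user7@example.com",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)  # exact wording varies between SQLite versions
```

The plan text should mention `idx_mail_sender`, meaning the lookup uses the index instead of scanning every row.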
Database Design Process: Conceptual Design
E-R Model
- We want to model a system for storing data about companies and the products they make.
- Each company has a unique name, stock-price, etc.
- Similarly, each product has a price tag and a category associated with it.
Legend:
- Entities: things with independent existence
- Relationships, in which entities participate
- Attributes of Entities
- Underlined attributes: Key/s of Entities, which help uniquely identify an entity
[Figure: E-R diagram. Company (name, stock-price) makes Product (pid, price, category)]
Database Design Process: Conceptual Design
E/R Diagram
Legend:
- Entities: things with independent existence
- Relationships, in which entities participate
- Attributes of Entities
- Underlined attributes: Key/s of Entities, which help uniquely identify an entity
[Figure: E/R diagram. Company (name, stock-price) makes Product (pid, price, category); Company employs Person; Person buys Product; Person (ssn, name, address)]
Multiplicity of ER Relations
- one-one
- many-one
- many-many
[Figures: three mappings between the elements a, b, c, d and 1, 2, 3, one for each multiplicity]
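The three multiplicities can be told apart mechanically once a relationship is written down as a set of pairs. The function below is a minimal sketch; the name `multiplicity` and the sample pairs are illustrative. It also reports "one-many", which is many-one read in the other direction.

```python
from collections import defaultdict


def multiplicity(pairs):
    """Classify a binary relationship given as (left, right) pairs."""
    rights_of = defaultdict(set)
    lefts_of = defaultdict(set)
    for left, right in pairs:
        rights_of[left].add(right)
        lefts_of[right].add(left)
    # Each left element is related to at most one right element.
    left_is_single = all(len(r) <= 1 for r in rights_of.values())
    # Each right element is related to at most one left element.
    right_is_single = all(len(l) <= 1 for l in lefts_of.values())
    if left_is_single and right_is_single:
        return "one-one"
    if left_is_single:
        return "many-one"
    if right_is_single:
        return "one-many"
    return "many-many"


print(multiplicity({("a", 1), ("b", 2), ("c", 3)}))
print(multiplicity({("a", 1), ("b", 1), ("c", 2)}))
print(multiplicity({("a", 1), ("a", 2), ("b", 1)}))
```

Note that this inspects one particular set of pairs, whereas the multiplicity drawn in an E/R diagram is a rule that every valid set of pairs must obey.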
Types of Attributes
- Composite attribute: name (first_name, middle_name, last_name)
- Simple attribute: address
- Single valued: address
- Multivalued: phone_no
- Derived attribute: age, which is derived from date_of_birth
[Figures: Person entity with the attributes above]
Entity Types and Key Attributes

- An attribute for which each entity must have a unique value is called a key attribute of the entity type. For example, Aadhaar of EMPLOYEE
- A key attribute may be composite. For example, VehicleTagNumber is a key of the CAR entity type with components (Number, State).
- An entity type may have more than one key. For example, the CAR entity type may have two keys:
- VehicleIdentificationNumber (popularly called VIN) and
- VehicleTagNumber (Number, State), also known as license_plate number
- Entity Set – collection of entities of a particular type
Structural Constraints on Relationships

- Cardinality ratio (of a binary relationship):


- 1:1,
- 1:N,
- N:1, or
- M:N
- Participation constraint (on each participating entity type): total (called
existence dependency) or partial.
The (min,max) notation relationship constraints

Specifies that each entity e in E participates in at least min and at most max
relationship instances in R
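A (min, max) constraint can be checked against a given set of relationship instances by counting how often each entity appears. The sketch below assumes instances are stored as tuples; the function name and the student/course data are made up for illustration.

```python
from collections import Counter


def satisfies_min_max(entities, instances, position, min_count, max_count):
    """Check that every entity occurs between min_count and max_count times
    at the given tuple position of the relationship instances."""
    counts = Counter(instance[position] for instance in instances)
    return all(min_count <= counts[e] <= max_count for e in entities)


students = {"S1", "S2"}
courses = {"C1", "C2", "C3"}
registers = [("S1", "C1"), ("S1", "C2"), ("S2", "C1")]

# Every student registers for between 1 and 2 courses: total participation.
students_ok = satisfies_min_max(students, registers, 0, 1, 2)
# C3 has no registrations, so a minimum of 1 fails for courses ...
courses_total = satisfies_min_max(courses, registers, 1, 1, 2)
# ... while a minimum of 0 (partial participation) holds.
courses_partial = satisfies_min_max(courses, registers, 1, 0, 2)
print(students_ok, courses_total, courses_partial)
```

A minimum of 0 corresponds to partial participation and a minimum of 1 or more to total participation.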
Participation Constraints

- Total participation (indicated by double line): every entity in the entity set
participates in at least one relationship in the relationship set.
- Partial participation: some entities may not participate in any relationship in the
relationship set

[Figure: Students - registers - Courses. Some courses might not be offered in a semester.]
Roles

- Sometimes an entity set appears more than once in a relationship.


- Label the edges between the relationship and the entity set with names called
roles.

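[Figure: the employee entity set is connected to the works for relationship by two edges, labelled with the roles manager and worker]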

- Another E.g.
- Course has another course as prerequisite.
Weak Entities
- An entity type may have no key of its own. It is then referred to as a weak entity.
- e.g. Payment of Loans
- Without a Loan, there is no meaning to Payment.
- Thus, the existence of a weak entity depends on the existence of an identifying entity

[Figure: payment - loan_payment - loan]

- Other e.g.
- Employee - has - dependants (weak entity).
- Course - has - Sections/Course Offering (total participation).
Weak Entities
- The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among the entities of the weak entity set that depend on the same identifying entity.
- Total participation of weak entity
- One-to-Many relationship from the identifying to weak entity

[Figure: payment (payment-number, date) - loan_payment - loan]


Attributes on Relationship

- Hours is a function of both employee and project.


Equivalent Diagrams Without Attributes on
Relationships
- Create an entity set representing values of the attribute
- Make that entity set participate in the relationship.
More about Keys
- Key: Group of attributes that uniquely determine a particular entity.
- Simple terminology:
- Super key: any superset of a key is also a super key, e.g. {SID}, {SID, SName}, …
- Candidate key: Minimal set of attributes which uniquely determines the entity. If SID is a key and SName is also a key, then both are candidate keys.
- Primary key: One of the candidate keys is designated as the primary key.

[Figure: Student entity with attributes SID and SName. Underlined attributes are Key/s of Entities, which help uniquely identify an entity]
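The difference between super keys and candidate keys can be made concrete by testing attribute sets against sample rows. This is only a sketch: the rows, the `Dept` attribute, and both function names are invented for illustration, and a single set of rows can only rule a key out. Whether an attribute set really is a key is a design decision about all valid data.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, all_attrs):
    """Minimal attribute sets that are unique in these rows."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


students = [
    {"SID": "B19001", "SName": "Rahul Sharma", "Dept": "CSE"},
    {"SID": "B19002", "SName": "Asha Verma", "Dept": "CSE"},
    {"SID": "B19003", "SName": "Rahul Sharma", "Dept": "EE"},
]

print(is_superkey(students, ("SID", "SName")))
print(is_superkey(students, ("SName",)))
print(candidate_keys(students, ("SID", "SName", "Dept")))
```

Here (SID, SName) is a super key but not minimal, and SName alone is ruled out because two students share a name.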
Key for Relationship Set
- Super-key for Relationship Set: Union of the primary keys of each of the participating entities.
- For a one-one relation, the primary key of the relationship can be one of the primary keys of the entities, since they can be candidate keys.
- A relationship may not always be binary; ternary etc. are also possible.
Courses: (C1, C1Name, ..), (C2, …), (C3, …)
Instructors: (I1, …), (I2, …), (I3, …)
Possible entries which can be part of the Relationship Set:
- C1, I1
- C2, I3
- C3, I2
- C1, I3
[Figure: Course (CID, CName) - has - Instructor (IID, IName), with relationship attribute Rating]
Key for Relationship Set
- Super-key for Relationship Set: Union of the primary keys of each of the participating entities.
- For a many-one relation, the primary key of the relationship can be the primary key of the entity on the many side.
Courses: (C1, C1Name, ..), (C2, …), (C3, …)
Instructors: (I1, …), (I2, …), (I3, …)
Possible entries which can be part of the Relationship Set:
- C1, I1
- C2, I1
[Figure: Course (CID, CName) - has - Instructor (IID, IName), with relationship attribute Rating]
Key for Relationship Set
- Super-key for Relationship Set: Union of the primary keys of each of the participating entities.
- For a many-many relation, the primary key of the relationship can be the union of the primary keys of the entities.
Courses: (C1, C1Name, ..), (C2, …), (C3, …)
Instructors: (I1, …), (I2, …), (I3, …)
Possible entries which can be part of the Relationship Set:
- C1, I1
- C1, I2
- C2, I1
- C2, I2
[Figure: Course (CID, CName) - has - Instructor (IID, IName), with relationship attribute Rating]
Build an ER Model
Course Offerings in a University:

- Entities
- Course
- …
Aggregation
How to model relations between company, student and job offer?

- E/R Model does not allow relationships between relationships


- E.g.: Associate account officers with has account relationship set
Design Techniques
- Don’t use an entity set when an attribute will do.
- Avoid redundancy.
- Limit the use of weak entity sets.
Ternary Relationship Examples
- When is it right? Examples?
- Student, Professor, and Project (when relation is MTP)
- Supplier, Product, and Consumer
- Breaking a ternary relationship into 3 binary relations does not represent the semantics correctly.
- Correct way of Splitting n-ary relationship:
Recap
- Entities
- Attributes
- Relationships
- Roles
- Weak Entities
- Participation Constraints - min, max notation
- Keys for entities and relationships
- Aggregation
Extended ER DIAGRAM (EER)
- Includes all modeling concepts of basic ER
- Additional Concepts:
- subclasses/superclasses
- specialization/generalization
- categories (UNION types)
- attribute and relationship inheritance
- These are fundamental to conceptual modeling
- The additional EER concepts are used to model applications more
completely and more accurately
Extended ER DIAGRAM (EER)
Subclasses and Superclasses
- An entity that is member of a subclass represents the same real-world
entity as some member of the superclass e.g. SECRETARY IS-A EMPLOYEE
- The subclass member is the same entity in a distinct specific role
- An entity cannot exist in the database merely by being a member of a
subclass; it must also be a member of the superclass
- Overlapping constraints and Covering constraints
Subclasses and Superclasses
Examples:
- A salaried employee who is also an engineer belongs to the two
subclasses:
- ENGINEER, and
- SALARIED_EMPLOYEE
- A salaried employee who is also an engineering manager belongs to the
three subclasses:
- MANAGER,
- ENGINEER, and
- SALARIED_EMPLOYEE
- It is not necessary that every entity in a superclass be a member of some
subclass
Representing Specialization in EER Diagrams

The set of subclasses is based upon some


distinguishing characteristics (job-type attribute) of
the entities in the superclass
Extended ER DIAGRAM (EER)
Attribute Inheritance in Superclass / Subclass
Relationships
An entity that is member of a subclass inherits
- All attributes of the entity as a member of the superclass
- All relationships of the entity as a member of the superclass
Examples?
- In the previous slide, SECRETARY (as well as TECHNICIAN and ENGINEER)
inherit the attributes Name, SSN, …, from EMPLOYEE
- Every SECRETARY entity will have values for the inherited attributes
Extended ER DIAGRAM (EER)
Specialization
- Specialization is the process of defining a set of subclasses of a superclass
- The set of subclasses is based upon some distinguishing characteristics of
the entities in the superclass
- Example: {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of
EMPLOYEE based upon job type.
- May have several specializations of the same superclass
- Attributes of a subclass are called specific or local attributes. For example,
the attribute TypingSpeed of SECRETARY
- The subclass can also participate in specific relationship types.
- For example, a relationship BELONGS_TO of HOURLY_EMPLOYEE
Generalization
- Generalization is the reverse of the specialization process
- Specialization: process of defining a set of subclasses of a superclass
- Several classes with common features are generalized into a superclass;
- original classes become its subclasses
- Example: CAR, TRUCK generalized into VEHICLE;
- both CAR, TRUCK become subclasses of the superclass VEHICLE.
Constraints on Specialization/Generalization

- Two basic constraints can apply to a specialization/generalization:


- Disjointness Constraint – disjoint / overlapping (d/o):
- Completeness Constraint – total /partial
- Hence, we have four types of specialization/generalization:
- Disjoint, total
- Disjoint, partial
- Overlapping, total
- Overlapping, partial
Disjoint Partial Specialization
Overlapping Total Specialization
Specialization/Generalization Hierarchies, Lattices &
Shared Subclasses
- A subclass may itself have further subclasses specified on it, forming a hierarchy or a lattice
- A hierarchy has the constraint that every subclass has only one superclass (called single inheritance); this is basically a tree structure
- In a lattice, a subclass can be a subclass of more than one superclass (called multiple inheritance)
Shared Subclass
Lattice Example (University)
Categories (UNION TYPES)
- A subclass may be a subset of the union of several superclasses, where the superclasses represent different entity types
- Such a subclass is called a category or UNION TYPE
Union Types
Union Types V/S Disjoint/Overlapping Relation
Think whether it is “is-a” member type of relation or “union” or “contains” type of
relation.

For union type, individual entities will not be similar, e.g. e-mail is union type of
dissimilar entities like text, link, and attachment. E-mail “contains” items which is
union of text, link, and attachment.
Formal Definitions of EER Model
Class C:
- could be entity type, subclass, superclass, or category
Subclass S is a class whose:
- Type inherits all the attributes and relationship of a class C
- Set of entities must always be a subset of the set of entities of the other
class C
- S⊆C
- C is called the superclass of S
- A superclass/subclass relationship exists between S and C
Formal Definitions of EER Model
- Specialization Z: Z = {S1, S2, …, Sn} is a set of subclasses with the same superclass G; hence, G/Si is a superclass/subclass relationship for i = 1, …, n.
- G is called a generalization of the subclasses {S1, S2, …, Sn}
- Z is total if we always have:
- S1 ∪ S2 ∪ … ∪ Sn = G;
- Otherwise, Z is partial.
- Z is disjoint if we always have:
- Si ∩ Sj = ∅ for i ≠ j;
- Otherwise, Z is overlapping.
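These definitions translate directly into set operations. The sketch below classifies one snapshot of entity sets; the formal definitions require the condition to hold always, so this checks a single database state only. The function name and the employee identifiers e1 to e4 are illustrative.

```python
from itertools import combinations


def classify_specialization(G, subclasses):
    """Return (disjointness, completeness) for one snapshot of entity sets."""
    if not all(S <= G for S in subclasses):
        raise ValueError("every subclass must be a subset of the superclass")
    total = set().union(*subclasses) == G        # S1 ∪ … ∪ Sn = G
    disjoint = all(not (a & b) for a, b in combinations(subclasses, 2))
    return (
        "disjoint" if disjoint else "overlapping",
        "total" if total else "partial",
    )


employee = {"e1", "e2", "e3", "e4"}

# {SECRETARY, ENGINEER, TECHNICIAN}: e4 has none of these job types.
by_job_type = classify_specialization(employee, [{"e1"}, {"e2"}, {"e3"}])

# {MANAGER, ENGINEER}: e2 is both, and together they cover every employee.
by_role = classify_specialization(employee, [{"e1", "e2"}, {"e2", "e3", "e4"}])

print(by_job_type, by_role)
```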
Formal Definitions of EER Model
- Predicate: A condition expression which evaluates to True or False.
- Subclass S of C is predicate defined if predicate (condition) p on attributes
of C is used to specify membership in S;
- that is, S = C[p], where C[p] is the set of entities in C that satisfy condition p
- A subclass not defined by a predicate is called user-defined
Formal Definitions of EER Model
- Category or UNION type T
- A class that is a subset of the union of n defining superclasses D1, D2,…Dn, n>1:
- T ⊆ (D1 ∪ D2 ∪ … ∪ Dn)
- Can have a predicate pi on the attributes of Di to specify entities of Di that
are members of T.
- If a predicate is specified on every Di:
- T = (D1[p1] ∪ D2[p2] ∪…∪ Dn[pn])
Summary
- Introduced the EER model concepts
- Class/subclass relationships
- Specialization and generalization
- Inheritance
Database Design Process
Requirement Analysis -> Conceptual Design (E-R Model) -> Data Model Mapping (Logical Design) -> Physical Design
Types of Data Model
- Relational Model
- Graph Model
Relation in Sets
- A R B: R is a relation from set A to set B
- R ⊆ A × B
- × is the Cartesian product, e.g. A = {x, y, z} and B = {1, 2, 3},
- then A × B is { (x,1), (x,2), (x,3), (y,1), (y,2), (y,3), (z,1), (z,2), (z,3) }
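The same example can be computed with `itertools.product`. The relation `R` below is an arbitrary subset picked for illustration; any subset of A × B is a relation from A to B.

```python
from itertools import product

A = {"x", "y", "z"}
B = {1, 2, 3}

A_x_B = set(product(A, B))           # all 9 ordered pairs (a, b)
R = {("x", 1), ("y", 1), ("z", 3)}   # one possible relation from A to B

is_relation = R <= A_x_B             # R ⊆ A × B
print(len(A_x_B), is_relation)
```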
Relation in Database
[Figure: table R with columns S_RNo, S_Name, Fees, CGPA]

- E.g. Table containing data of students
- S_RNo is a set
- S_Name is also a set
- Student is a relation on attributes (S_RNo, S_Name, …)
- Each row in a relational database is referred to as a relational instance or a tuple
- Schema/Attributes: R (A1, A2, …, An)
- Attributes in R are ordered
- Table has a fixed schema
- Instances: r(R) ⊆ dom(A1) × dom(A2) × … × dom(An)
- dom represents domain
- r(R) is also called
- specific values of relation R
- population of R
- members of R
- r(R) is a set (collection) of all tuples; since it is a set, tuples are unordered
Relational Database

S_RNo   CGPA
B20320  8.8
B20120  8.8

- Values in a tuple can be:
- Atomic values, e.g. a composite attribute (name) is stored as atomic values (first name, middle name, …)
- Null value
- Unknown
- No value
- May create confusion about whether the value is unknown or absent, so we generally avoid null values
- A specific tuple is referred to as t
- t[Ai] gives the value of attribute Ai for tuple t
- Referring to a subset of attributes, e.g. attributes A1, Ak, and Aj in tuple t:
- t[A1, Ak, Aj]
Relational Database

S_RNo   CGPA
B20320  8.8
B20120  8.8

- t[Ai] gives the value of attribute Ai for tuple t
- Relational Integrity Constraints
- Conditions that must hold on all valid relation instances
- Key constraints: For any two distinct tuples t1 and t2, if Sk is a super key, then
- t1[Sk] ≠ t2[Sk], i.e. values cannot be the same
- E.g. (B20320, 8.8) ≠ (B20120, 8.8)
- Entity constraints:
- Can a candidate key (minimal super key) contain more than one attribute?
- If Pk is the primary key, then
- t[Pk] ≠ null for any tuple t in r(R)
- Referential Integrity Constraints
- Student enrolls in Courses
Relational Database
- Referential Integrity Constraints
- Student enrolls in Courses
Here:
- Student table is called the referenced relation
- Enrols table is called the referencing relation
- S1 in relation Enrols is a value of the foreign key
- Constraint: The value of the foreign key should be the value of an existing primary key in the referenced relation

[Figure: Student (SID, SName) - Enrols - Course (CID, CName)]
Students: (S1, S1Name, ..), (S2, …), …, (Sn, …)
Courses: (C1, …), (C2, …), …, (Cn, …)
Enrols: (S1, C1), (S1, C2), (S2, C1), (Sn+1, C2)
Relational Database
Creating the structure of a table is called defining the schema of the table.
- Schema for Student table: Student (SID, SName, …)
- Schema for Enrols table: Enrols (SID, CID, grade, …)
- Schema for Course table: Course (CID, CName, …)
- Referential Integrity Constraints
- Student enrolls in Courses
SQL: Create Table for Enrols
- Constraint: tS1[Sk] = tEn[Fk]
- For Enrols, SID and CID are prime attributes of the primary key formed by their combination
- But SID alone is not the primary key of Enrols; it is a foreign key

[Figure: Student (SID, SName) - Enrols - Course (CID, CName)]
Students: (S1, S1Name, ..), (S2, …), …, (Sn, …)
Courses: (C1, …), (C2, …), …, (Cn, …)
Enrols: (S1, C1), (S1, C2), (S2, C1), (Sn+1, C2)
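A possible CREATE TABLE for Enrols, run through Python's sqlite3 module, is sketched below. The column types, the sample rows, and the `enrol` helper are assumptions; the slides give only the schemas. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and the student `S3` stands in for the non-existent Sn+1 of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE Student (SID TEXT PRIMARY KEY, SName TEXT);
    CREATE TABLE Course  (CID TEXT PRIMARY KEY, CName TEXT);
    CREATE TABLE Enrols (
        SID   TEXT NOT NULL REFERENCES Student (SID),
        CID   TEXT NOT NULL REFERENCES Course (CID),
        grade TEXT,
        PRIMARY KEY (SID, CID)          -- SID and CID together form the key
    );
    INSERT INTO Student VALUES ('S1', 'S1Name'), ('S2', 'S2Name');
    INSERT INTO Course  VALUES ('C1', 'C1Name'), ('C2', 'C2Name');
""")


def enrol(sid, cid):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Enrols (SID, CID) VALUES (?, ?)", (sid, cid))
        return True
    except sqlite3.IntegrityError:
        return False


first = enrol("S1", "C1")      # valid: both keys exist
second = enrol("S1", "C2")     # valid: same student, another course
unknown = enrol("S3", "C2")    # rejected: S3 is not in Student
repeated = enrol("S1", "C1")   # rejected: (S1, C1) is already present
print(first, second, unknown, repeated)
```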
Relational Database
- Referential Integrity Constraints
- Student enrolls in Courses
- S1 in relation Enrols is a value of the foreign key
- Foreign key can:
- Be an existing primary key value in the referenced relation
- Have a null value
- Then the foreign key cannot be a prime attribute, i.e. a combination containing it cannot form a primary key

[Figure: Student (SID, SName) - Enrols - Course (CID, CName)]
Semantic Constraints
- Value Constraint:
- E.g. lower limit and upper limit to student’s salary
- Non-volatility constraint
- Data item cannot be modified
- Record constraints
- Deductions cannot be greater than gross salary
- Required during Insert, Update, Delete operations
- E.g. Enrol and Student both have SID attribute
- delete entry from Student table
- Update entry on Student table: ME2XX -> CS2XX
- Cascade operations
- CS207: {Triggers, Assertions}
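Cascade operations can be sketched in SQLite with `ON UPDATE CASCADE` and `ON DELETE CASCADE` on the foreign key. The tables, sample rows, and the renamed identifier `S10` are assumptions chosen for illustration; the slide's own update example (ME2XX -> CS2XX) is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Student (SID TEXT PRIMARY KEY, SName TEXT);
    CREATE TABLE Enrols (
        SID TEXT NOT NULL REFERENCES Student (SID)
            ON UPDATE CASCADE ON DELETE CASCADE,
        CID TEXT NOT NULL,
        PRIMARY KEY (SID, CID)
    );
    INSERT INTO Student VALUES ('S1', 'S1Name'), ('S2', 'S2Name');
    INSERT INTO Enrols  VALUES ('S1', 'C1'), ('S1', 'C2'), ('S2', 'C1');
""")

# Changing a referenced key value is propagated to the referencing rows.
with conn:
    conn.execute("UPDATE Student SET SID = 'S10' WHERE SID = 'S1'")
after_update = sorted(conn.execute("SELECT SID, CID FROM Enrols").fetchall())

# Deleting a referenced row removes the rows that referred to it.
with conn:
    conn.execute("DELETE FROM Student WHERE SID = 'S2'")
after_delete = sorted(conn.execute("SELECT SID, CID FROM Enrols").fetchall())

print(after_update)
print(after_delete)
```

Without the cascade clauses, both statements would be rejected because Enrols rows would be left pointing at a missing student.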
ER Diagram to Relational Model
Database design process: Requirement Analysis -> Conceptual Design (E-R Model) -> Data Model Mapping (Logical Design) -> Physical Design
Review of Concepts
- Relational Model is made up of tables
- A row of table = a relational instance/tuple
- A column of table = an attribute
- A table = a schema/relation
- Cardinality = number of rows
- Degree = number of columns
Review: A Schema/Relation

- Relational Model is made up of tables


From ER Model to Relational Model
Basic Ideas:
- Build a table for each entity set
- Build a table for each relationship set if necessary (more on this
later)
- Primary Key and Foreign key constraints
- Indivisibility Rule
Example – Strong Entity Set
Make a column in the table for each
attribute in the entity set
Representation of Weak Entity Set

- Weak Entity Set has no independent existence


- To build a table/schema for weak entity set
- Construct a table with one column for each attribute in the
weak entity set
- Augment the table with extra column containing the Primary
key of the identifying entity set
- Primary key of the weak entity set = Discriminator + Primary
key of the identifying entity set
Example – Weak Entity Set
Representation of Relationship Set

- Unary/Binary Relationship set


- Depends on the cardinality and participation of the
relationship
- Two possible approaches
- N-ary (multiple) Relationship set
- Primary Key Issue
- Identifying Relationship: a relationship in which the child's foreign key is part of its primary key.
- No relational model representation necessary
Example – One-to-One Relationship Set
Representing Relationship Set
Unary/Binary Relationship [1:1]
Foreign key approach (if one of the entities has total participation)
- Include the foreign key in the relation with total participation
- No new table needed
Merged Relation approach
- Appropriate when participation from both entities is total: merge the two relation tables into one.
- Only one relation (table) is needed, with attributes (E1 - {pk(E1)}) U (E2 - {pk(E2)}) U pk(E1 or E2).
- In case pk(E1) and pk(E2) both represent some important information, the relation can have attributes E1 U E2, and the primary key can be either pk(E1) or pk(E2).
Cross-referencing approach (generally used for many-many relationships)
- Create a new relation (table) for the relationship in ER Diagram
- Use key of either entity as key
Example – Many-to-One Relationship Set

SID Name Major GPA Prof_SSN Semester


Representing Relationship Set
Unary/Binary Relationship [1:N]
- For one-to-many relationship without total participation
- Create a new relation
- For one-to-many/many-to-one relationship with one entity set having total participation on the “many” side
- Augment the “many” side with the primary key of the “one” side in the relationship.
Representing Relationship Set
Unary/Binary Relationship
- For many-to-many relationship
- Create new relation
- Primary key of this new schema is the union of the foreign keys of
both entity sets.
- No augmentation approach possible…
Example – N-ary Relationship Set
Representing Relationship Set
N-ary Relationship
- Intuitively Simple
- Build a new table with as many columns as there are attributes
for the union of the primary keys of all participating entity sets.
- Augment with additional columns for descriptive attributes of the
relationship set (if necessary)
- The primary key of this table is the union of all primary keys of
entity sets that are on “many” side
Representing Relationship Set
Identifying Relationship
- You DON’T have to build a table/schema for the identifying
relationship set once you have built a table/schema for the
corresponding weak entity set
- Reason:
- A special case of one-to-many with total participation
- Reduce Redundancy
Example – Representing Composite Attribute
Representing Composite Attribute

- Relational Model: Indivisibility Rule Applies


- One column for each component attribute
- NO column for the composite attribute itself
Example – Representing Multivalued Attribute
- New relation with columns: Father SSN, Child SSN
- Primary key: (generally) the union of Father SSN and Child SSN
- The relation for children is kept separate to avoid repeating the father’s attributes once per child in the relation for father.
Representing Multivalued Attribute

- For each multivalued attribute in an entity set/relationship set


- Build a new relation schema with two columns
- One column for the primary key of the entity set/relationship set
that has the multivalued attribute
- Another column for the multivalued attribute. Each cell of this
column holds only one value. So each value is represented as a
unique tuple
- Primary key for this schema is the union of both the attributes
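The two-column mapping above can be sketched in a few lines of Python. The function name split_multivalued and the sample father/child rows are hypothetical and exist only for illustration.

```python
def split_multivalued(rows, key, attr):
    """Split a multivalued attribute into its own two-column relation.

    Returns (main, extra): main is the original relation without attr,
    extra is a set of (key value, single attr value) tuples.
    """
    main = [{k: v for k, v in row.items() if k != attr} for row in rows]
    extra = {(row[key], value) for row in rows for value in row[attr]}
    return main, extra


# Made-up sample data: each father has a list of children (a multivalued attribute).
fathers = [
    {"FatherSSN": "F1", "Name": "A", "ChildSSN": ["C1", "C2"]},
    {"FatherSSN": "F2", "Name": "B", "ChildSSN": ["C3"]},
]
main, children = split_multivalued(fathers, "FatherSSN", "ChildSSN")
```

Each (FatherSSN, ChildSSN) pair becomes one tuple, so the pair as a whole acts as the primary key of the new relation and the father's other attributes are stored only once.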
Example – Representing Class Hierarchy
Representing Class Hierarchy
- Two general approaches depending on disjointness and completeness
- For non-disjoint and/or non-complete class hierarchy:
- Table creation
- create a table for each super-class entity set according to normal
entity set translation method.
- Create a table for each subclass entity set with a column for each of
the attributes of that entity set plus one for each attribute of the
primary key of the super-class entity set
- Primary key
- The primary key from super-class entity set is also used as the
primary key for this new table
Example – Representing Class Hierarchy
Representing Class Hierarchy
- Two general approaches depending on disjointness and completeness
- For disjoint AND complete mapping class hierarchy:
- DO NOT create a table for the super-class entity set
- Create a table for each subclass entity set that includes all attributes of
that subclass entity set and the attributes of the superclass entity set
Example – Representing Class Hierarchy

FIGURE 4.7 A specialization lattice with multiple inheritance for a UNIVERSITY database.
Representing Class Hierarchy
- Mapping of Shared Subclasses (Multiple Inheritance)
- A shared subclass, such as STUDENT_ASSISTANT, is a subclass of
several classes, indicating multiple inheritance. These classes must all
have the same key attribute; otherwise, the shared subclass would be
modeled as a category.
Example – Representing Class Hierarchy
FIGURE 7.5 Mapping the EER specialization lattice in Figure 4.6 using multiple options.
Specialization
[Figure: Super-Class with key pk, specialized (d) into subclasses S1 and S2, labelled p1k and p2k]
1. Multiple relations
- Make a table for the superclass as well as for each subclass
- The primary key of each subclass is the same as the primary key of the superclass (because it is an is-a relationship)
Specialization
[Figure: Super-Class with key pk, specialized (d) into subclasses S1 and S2, with attributes a1, b1, a2, b2]
1. Multiple relations
2. Multiple relations, but only for sub-classes
- Works only when there is total participation from the super-class
- Overlapping -> data redundancy
- So it only works for the disjoint case
Specialization
[Figure: Super-Class with key pk, specialized (d) into subclasses S1 and S2, with attributes a1, b1, a2, b2]
1. Multiple relations
2. Multiple relations, but only for sub-classes
3. Single relation for super-class only
- type attribute needed
- Will work only in the disjoint case
- n subclasses implies n values
- type: {1, 2}, where 1 represents S1 and 2 represents S2
- Overlapping -> both S1 and S2 may be needed, so n subclasses implies 2^n values
- Type (may use a binary representation like the following):
- {0, 1, 2, 3}, where 3 (= b11) means both S1 and S2, and 0 (= b00) means neither S1 nor S2
- Total participation => 2^n - 1 values
- type: {1, 2, 3}
- Limitation:
- As local attributes of subclasses increase, null values increase
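The binary encoding of the type attribute for overlapping subclasses can be sketched as below. The function names are hypothetical; bit i of the code is assumed to stand for membership in subclass S(i+1).

```python
def encode_type(memberships):
    """memberships: one boolean per subclass S1..Sn. Returns the integer type code."""
    return sum(1 << i for i, is_member in enumerate(memberships) if is_member)


def overlapping_type_values(n, total):
    """Number of distinct type codes for n overlapping subclasses.

    With total participation the code 0 (member of no subclass) cannot occur.
    """
    return 2 ** n - (1 if total else 0)
```

For two subclasses, encode_type([True, True]) gives 3 (= b11) and encode_type([False, False]) gives 0 (= b00), matching the values on the slide.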
Specialization
[Figure: Super-Class with key pk, specialized (d) into subclasses S1 and S2, with attributes a1, b1, a2, b2]
1. Multiple relations
2. Multiple relations, but only for sub-classes
3. Single relation for super-class only
4. Single relation with multiple type attributes
- Multiple type attributes (n boolean flags)
- Works for the overlapping case also
Category Union Type
[Figure: Sub-Class with key sk, defined as the union (U) of superclasses S1 and S2 with keys p1k and p2k]
- S1, S2, … are of different types
- So, S1, S2, … can have different keys
- What will be the key for the Sub-class, given that the superclasses have different keys?
- Create a surrogate (placeholder) key sk of length N = |p1k| + |p2k| + ..
- Add foreign keys to S1, S2, …
- referencing the Sub-class’s surrogate key sk
- In other words:
- For mapping a category whose defining superclasses have different keys, it is customary to specify a new key attribute, called a surrogate key, when creating a relation to correspond to the category.
- In the example on the next slide we can create a relation OWNER to correspond to the OWNER category and include any attributes of the category in this relation. The primary key of the OWNER relation is the surrogate key, which we called OwnerId.
FIGURE 4.8 Two categories (union types): OWNER and REGISTERED_VEHICLE.
FIGURE 7.6 Mapping the EER categories (union types) in Figure 4.7 to relations (with surrogate key OwnerId).
Representing Aggregation
Relational Algebra
[Figure: example relations with attributes (a1, a3), (a1, a2, a3), (a1, a2, a3, b1, b2) and (b1, b2)]
Relational Algebra
- How do we write queries on the relational model?
- Formal method/specification of how to write specified queries etc. on
the relational model is Relational Algebra.
- Relational algebra is formal query language defined over
relational model.
- SQL (Structured Query Language) is the query language implemented by MySQL
(and other databases!).
- The basic set of operations for the relational model is known as the
relational algebra.
- The result of a relational algebra operation is a new relation.
- Relational Algebra Expression – sequence of relational algebra
operations
Example: simple college admissions database
College (cName, state, enrollment)

Student (sID, sName, GPA, sizeHS)

Apply (sID, cName, major, decision)


Simplest query: relation name
Use operators to filter, slice, combine
Select operator: selects a subset of the rows
σ_{<selection condition>}(R)
- Students with GPA > 9
- σ_{GPA > 9}(Student)
- Same as σ_{Student.GPA > 9}(Student)
- Applications to Stanford CS major
- σ_{cName = Stanford ∧ major = CS}(Apply)
Project operator: picks certain columns
π_{<attribute list>}(R)
- ID and decision of all applications
- π_{sID, dec}(Apply)
- Duplicates will be removed as output is a relation
Pick both rows and columns
- ID and name of students with GPA>3.7
- π_{sID, sName}(σ_{GPA > 3.7}(Student))
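Select and project can be sketched in plain Python by treating a relation as a list of dictionaries. The function names and the sample Student rows are made up for this illustration; project removes duplicates because its output must again be a relation.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attrs):
    """π: keep only the listed attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


# Made-up sample data for Student (sID, sName, GPA, sizeHS).
student = [
    {"sID": 1, "sName": "Amy", "GPA": 3.9, "sizeHS": 1000},
    {"sID": 2, "sName": "Bob", "GPA": 3.6, "sizeHS": 1500},
    {"sID": 3, "sName": "Craig", "GPA": 3.8, "sizeHS": 500},
]

# ID and name of students with GPA > 3.7: select first, then project.
result = project(select(student, lambda t: t["GPA"] > 3.7), ["sID", "sName"])
```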
Cross/Cartesian Product: combine two relations
- Names and GPAs of students with HS>1000 who applied to CS and were rejected
- R <- σ_{dec = reject ∧ major = CS}(Apply)
- π_{sName, GPA}(σ_{Student.sID = R.sID ∧ Student.HS > 1000}(Student × R))
- Equijoin: Student.sID = R.sID
- SQL query: {Select att1, att2 from R where att1 = ‘_’} is same as π_{att1, att2}(σ_{att1 = ‘_’}(R))
Natural Join: combine two relations
- Suppose R has attributes att1 and att2, S has attributes att1 and att3
- What will be the schema of R1 <- σ_{R.att1 = S.att1}(R X S)?
- R1 (R.att1, R.att2, S.att1, S.att3)
- So Natural Join (⋈) is introduced to avoid repeating att1 twice.
- R ⋈ S will have schema: R U S (common attributes appearing once)
- R ⋈ S is same as π_{R U S}(σ_{R.att1 = S.att1}(R X S))
Natural Join: combine two relations
- Enforce equality on all attributes with same name
- Eliminate one copy of duplicate attributes
- Names and GPAs of students with HS>1000 who applied to CS and were rejected
- π_{sName, GPA}(σ_{HS > 1000 ∧ major = CS ∧ dec = reject}(Student ⋈ Apply))
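A natural join can be sketched in Python as a filtered cross product that keeps one copy of the common attributes. The function name and the sample rows are assumptions for the example, and relations are lists of dictionaries whose schema is read from the first tuple.

```python
def natural_join(r, s):
    """R ⋈ S: pair tuples that agree on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]


# Made-up sample data.
student = [
    {"sID": 1, "sName": "Amy", "GPA": 3.9, "sizeHS": 1200},
    {"sID": 2, "sName": "Bob", "GPA": 3.6, "sizeHS": 1500},
    {"sID": 3, "sName": "Craig", "GPA": 3.8, "sizeHS": 500},
]
apply_rel = [
    {"sID": 1, "cName": "Stanford", "major": "CS", "dec": "reject"},
    {"sID": 2, "cName": "MIT", "major": "EE", "dec": "reject"},
    {"sID": 3, "cName": "Stanford", "major": "CS", "dec": "reject"},
]

joined = natural_join(student, apply_rel)
# Names and GPAs of students with HS > 1000 who applied to CS and were rejected.
result = {(t["sName"], t["GPA"]) for t in joined
          if t["sizeHS"] > 1000 and t["major"] == "CS" and t["dec"] == "reject"}
```

The joined tuples carry sID only once, which is the difference from the plain cross product followed by a selection.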
Natural Join: combine two relations
- Names and GPAs of students with HS>1000 who applied to CS at college with enr>20,000 and were rejected
- Is Natural Join associative? (S ⋈ A) ⋈ C = S ⋈ (A ⋈ C)
- Can we do the following query:
- π_{sName, GPA}(σ_{HS > 1000}(S) ⋈ π_{sID, cName}(σ_{major = CS ∧ dec = reject}(A)) ⋈ σ_{enr > 20000}(C))
- Let’s say |S| = 1e6, |A| = 1e8, |C| = 1e4
- and σ_{HS > 1000}(S) has 10 entries, π_{sID, cName}(σ_{major = CS ∧ dec = reject}(A)) has 20 entries, σ_{enr > 20000}(C) has 10 entries
- Cost: ~1e8 (1e6 + 1e8 + 1e4 + 10*20…)
Natural Join: combine two relations
- Names and GPAs of students with HS>1000 who applied to CS at college with enr>20,000 and were rejected
- Is Natural Join associative? (S ⋈ A) ⋈ C = S ⋈ (A ⋈ C)
- Can we do the following query:
- π_{sName, GPA}(σ_{HS > 1000 ∧ major = CS ∧ dec = reject ∧ enr > 20000}(S ⋈ A ⋈ C))
- Let’s say |S| = 1e6, |A| = 1e8, |C| = 1e4
- Cost: ~1e6*1e8…, so this is less efficient than the query on the previous slide
- The Query Optimizer transforms the above query into the query on the previous slide
Theta Join: combine two relations
- Will Natural Join work if common attribute names are different in two tables?
- E.g. sID in one table and aID in the other.
- So theta join is needed
- R ⋈_θ S, where θ is R.att1 = S.att2
- Theta join is the basic operation implemented in a DBMS
- The term “join” often means theta join
Relational Algebra Operations From Set Theory
- Union, Intersection, Difference, etc.
- Type Compatibility
- The operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai) = dom(Bi) for i = 1, 2, ..., n.
- The resulting relation for R1 U R2, R1 ∩ R2, or R1 - R2 has the same attribute names as the first operand relation R1 (by convention).
Union Operator
- List of college and student names
- T <- π_{cName}(College) U π_{sName}(Student)
- Resultant schema will be T(cName)
- Entries will be the union of the cName and sName values
- If all student and college names are unique, what will be |T|?
- |College| + |Student|
- In general, |T| <= |R1| + |R2| for T <- R1 U R2
Difference Operator
- IDs and names of students who didn’t apply anywhere
- DA <- π_{sID}(Student) - π_{sID}(Apply)
- DA ⋈ π_{sID, sName}(Student)
Intersection Operator
- Id and Name of students whose GPA is greater than 9 but were rejected.
- Hint: start with π_{sID}(σ_{GPA > 9}(Student))
- R <- π_{sID}(σ_{GPA > 9}(Student)) ∩ π_{sID}(σ_{dec = reject}(Apply))
- R ⋈ π_{sID, sName}(Student)
- Id and Name of students who applied to both CS and EE major.
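Union, difference and intersection map directly onto Python sets once each operand has been projected onto sID. The sample data below is invented, and the last query is one possible way to approach the CS-and-EE exercise, not necessarily the intended answer.

```python
# Made-up sample data: Student as {sID: (sName, GPA)}, Apply as (sID, major, dec) tuples.
student = {1: ("Amy", 9.5), 2: ("Bob", 8.0), 3: ("Craig", 9.2), 4: ("Doris", 9.8)}
apply_rel = [(1, "CS", "accept"), (2, "CS", "reject"),
             (3, "CS", "reject"), (3, "EE", "accept")]

student_ids = set(student)                               # π_{sID}(Student)
applied_ids = {sid for sid, _, _ in apply_rel}           # π_{sID}(Apply)

# Difference: students who did not apply anywhere.
did_not_apply = student_ids - applied_ids

# Intersection: GPA > 9 but rejected somewhere.
high_gpa = {sid for sid, (_, gpa) in student.items() if gpa > 9}
rejected = {sid for sid, _, dec in apply_rel if dec == "reject"}
high_gpa_rejected = high_gpa & rejected

# Intersection again: applied to both CS and EE.
cs = {sid for sid, major, _ in apply_rel if major == "CS"}
ee = {sid for sid, major, _ in apply_rel if major == "EE"}
both_cs_and_ee = cs & ee

# Joining back with Student recovers the names.
names = {(sid, student[sid][0]) for sid in high_gpa_rejected}
```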
Rename Operator
- Rename a relation
- Rename the columns
- Rename both column and relation
- ρ_{R(a1, a2, a3, …)}(S)
- S gets renamed to R,
- and attributes of S get renamed to a1, a2, a3, …
- For disambiguation in “self-joins”
- Pairs of colleges in same state
- Hint: Start with ρ_{C1}(College)
- R <- σ_{C1.state = C2.state ∧ C1.cName ≠ C2.cName}(ρ_{C1}(College) X ρ_{C2}(College))
- π_{C1.cName, C2.cName}(R)
- How to avoid the following:
- IIT Mandi, IIT Ropar
- IIT Ropar, IIT Mandi
- Change C1.cName ≠ C2.cName to C1.cName >_l C2.cName
- >_l represents lexicographic order
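The effect of replacing ≠ with a lexicographic comparison can be seen in a small Python self-join. The state labels X and Y are placeholders chosen only so that two colleges share a state; they are not real data.

```python
from itertools import product

# College as (cName, state) tuples; the states are placeholder labels.
college = [("IIT Mandi", "X"), ("IIT Ropar", "X"), ("IIT Delhi", "Y")]

# Self-join with C1.cName != C2.cName: every pair shows up twice.
pairs_ne = [(c1[0], c2[0]) for c1, c2 in product(college, college)
            if c1[1] == c2[1] and c1[0] != c2[0]]

# Self-join with C1.cName > C2.cName (string comparison is lexicographic): once per pair.
pairs_gt = [(c1[0], c2[0]) for c1, c2 in product(college, college)
            if c1[1] == c2[1] and c1[0] > c2[0]]
```

The two loop variables c1 and c2 play the role of the renamed copies ρ_{C1}(College) and ρ_{C2}(College).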
Division Operation
- The division operation is applied to two relations
R(Z) ÷ S(X),
- where X is subset of Z.
- Let Y = Z - X (and hence Z = X U Y);
- that is, let Y be the set of attributes of R that are
not attributes of S.
- For a tuple t to appear in the result T of the
DIVISION, the values in t must appear in R in
combination with every tuple in S.
- The result of DIVISION is a relation T(Y)
- that includes a tuple t if tuples t_R appear in R with t_R[Y] = t, and with t_R[X] = t_S for every tuple t_S in S.
Division Operation
Example:
- R has SID and TID pairs for all tracks.
- Track IDs of a particular destination
are given in S.
- Division will give ID of students who
have gone on all tracks of the
particular destination.
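Division for the example above can be sketched in Python with R as a set of (SID, TID) pairs and S as a set of TIDs. The function name divide and the sample IDs are made up for the illustration.

```python
def divide(r, s):
    """R(Y, X) ÷ S(X): the Y values that appear in R together with every X value in S."""
    candidates = {y for y, _ in r}
    return {y for y in candidates if all((y, x) in r for x in s)}


# Made-up sample data: which student went on which track.
r = {("S1", "T1"), ("S1", "T2"), ("S2", "T1"),
     ("S3", "T1"), ("S3", "T2"), ("S3", "T3")}
s = {"T1", "T2"}          # tracks of the particular destination

went_on_all = divide(r, s)
```

S2 is excluded because the pair (S2, T2) is missing from R.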
Additional Relational Operations
- Aggregate Functions and Grouping
- A type of request that cannot be expressed in the basic relational algebra is to specify mathematical aggregate functions on collections of values from the database.
- E.g. SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT
Example Aggregate queries
- Given a relational schema: Employee(SSN, Dno, Salary)
- Find number of employees and their average salary for each department
- Functional operator: <group-by attributes> ℱ <function list> (R)
Use of Functional Operator
- Given a relational schema: Employee(SSN, Dno, Salary)
- ℱ_{MAX Salary}(Employee)
- retrieves the maximum salary value from the Employee relation
- How to get id of employee with maximum salary?
- Note ℱ will return a relation, not a scalar value.
- ℱ_{MIN Salary}(Employee)
- retrieves the minimum Salary value from the Employee relation
- ℱ_{SUM Salary}(Employee)
- retrieves the sum of the Salary values from the Employee relation
- Dno ℱ_{COUNT SSN, AVERAGE Salary}(Employee)
- groups employees by Dno (department number) and computes the
count of employees and average salary per department.
- Note: count just counts the number of values in that column (ignores
NULL), without removing duplicates.
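Grouping with COUNT and AVERAGE can be sketched in Python as below. The function name and the employee rows are hypothetical; the sketch assumes every department has at least one non-null salary, and it returns a set of tuples to stress that the result is again a relation.

```python
from collections import defaultdict

# Made-up sample data for Employee (SSN, Dno, Salary).
employee = [("E1", 1, 100.0), ("E2", 1, 300.0), ("E3", 2, 500.0)]


def count_and_average_by_dno(rows):
    """Dno ℱ COUNT SSN, AVERAGE Salary: one (Dno, count, average) tuple per department."""
    groups = defaultdict(list)
    for ssn, dno, salary in rows:
        groups[dno].append((ssn, salary))
    result = set()
    for dno, members in groups.items():
        count_ssn = sum(1 for ssn, _ in members if ssn is not None)   # NULLs are ignored
        salaries = [sal for _, sal in members if sal is not None]
        result.add((dno, count_ssn, sum(salaries) / len(salaries)))
    return result


per_department = count_and_average_by_dno(employee)

# ℱ MAX Salary returns a one-tuple relation; comparing against it gives the employee id.
max_salary = {(max(sal for _, _, sal in employee),)}
top_earners = {ssn for ssn, _, sal in employee if (sal,) in max_salary}
```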
Aggregate Function Output
The OUTER JOIN Operation
- In NATURAL JOIN tuples without a matching (or related) tuple
are eliminated from the join result.
- Tuples with null in the join attributes are also eliminated.
- This amounts to loss of information.
- To retain all the tuples in R or S or both whether or not they
have matching tuples in the other relation, we use outer joins.
- Left outer join
- Right outer join
- Full outer join
- Pad with null value if no matching tuple is found in the other
relation.
The OUTER JOIN Operation
- Left outer join example
- Employee(EName, SSN)
- Department(DName, Manager SSN)
- Employee ⋈ ρ(DName, SSN)(Department) will only return tuples
with Employees who are managers.
- If we want all employees, then we should use:
- Employee ⟕ ρ(DName, SSN)(Department)
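The difference between the natural join and the left outer join in this example can be sketched in Python. The function name, its parameters and the sample rows are assumptions; Department is shown after renaming Manager SSN to SSN.

```python
def left_outer_join(r, s, on, s_attrs):
    """R ⟕ S on one shared attribute; unmatched R tuples are padded with None."""
    out = []
    for tr in r:
        matches = [ts for ts in s if ts[on] == tr[on]]
        if matches:
            out.extend({**tr, **ts} for ts in matches)
        else:
            out.append({**tr, **{a: None for a in s_attrs if a != on}})
    return out


# Made-up sample data.
employee = [{"EName": "Asha", "SSN": 1}, {"EName": "Ravi", "SSN": 2}]
department = [{"DName": "CS", "SSN": 1}]   # ρ_{(DName, SSN)}(Department)

result = left_outer_join(employee, department, "SSN", ["DName", "SSN"])
```

Ravi manages no department, so a natural join would drop him, while the left outer join keeps him with DName set to None (null).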
Alter schema: add ‘rating’ for every faculty
- ρ_{(Fid, Rating)}(π_{Fid, ‘1’}(Rating1-Faculty))

Before rename (Fid, ‘1’) | After rename (Fid, Rating)
ID1, 1 | ID1, 1
ID2, 1 | ID2, 1
ID3, 1 | ID3, 1
ID4, 1 | ID4, 1
MySQL Workbench
- Signup and install from:
- https://dev.mysql.com/downloads/workbench/
- Add new database
- Add new table
- PK: Primary Key
- NN: Not Null (required for a primary key)
- UQ: Unique
- UN: Unsigned
- ZF: Zero Fill (pad numeric values with leading zeros)
- AI: Auto Increment
- Rows in the “Columns” tab represent different columns
MySQL Workbench
- Add foreign key
- Old version: Relational View (for foreign key constraints)
- 8.0:
- Foreign key tab in the same table
- If the referenced primary key is deleted, then:
- cascade (delete the referencing tuples as well)
- no action (in MySQL, behaves like restrict)
- set null (set the foreign key value to null)
- restrict (do not allow deleting that particular tuple from the referenced table)
- Inserts tab: add values to table
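The delete options can be tried out with a few lines of Python. This sketch uses the built-in sqlite3 module instead of MySQL Workbench, so it only illustrates the general behaviour of the ON DELETE clause; the table layout and the helper name are made up.

```python
import sqlite3


def enrols_after_delete(action):
    """Delete student S1 and report what happens to the referencing Enrols row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this to enforce foreign keys
    conn.execute("CREATE TABLE Student (SID TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Enrols ("
        f"SID TEXT REFERENCES Student(SID) ON DELETE {action}, CID TEXT)"
    )
    conn.execute("INSERT INTO Student VALUES ('S1')")
    conn.execute("INSERT INTO Enrols VALUES ('S1', 'C1')")
    try:
        conn.execute("DELETE FROM Student WHERE SID = 'S1'")
    except sqlite3.IntegrityError:
        return "delete rejected"
    return conn.execute("SELECT SID, CID FROM Enrols").fetchall()
```

Calling the helper with CASCADE, SET NULL and RESTRICT shows the three behaviours side by side.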
Announcements
- No classes this Thursday and Friday
- Class at 7:30 -8:20 p.m. today
- Class at 7:30 -8:20 p.m. tomorrow
- No Lab on Saturday
Relational Design Theory
- Given different designs, define a goodness metric
[Figure: database design pipeline. Requirement Analysis (consult different stakeholders to find out requirements) -> Conceptual Design (E-R Model, e.g. Company makes Product; analyze design alternatives and select appropriate one) -> Data Model Mapping (Logical Design, e.g. a table with columns R.No., Name, Fees, CGPA) -> Physical Design]
Relational Design Theory
Single Enrollment Table
[Figure: ER diagram Student (SID, SName) - Enrols (Semester) - Course (CID, CName), stored as the single table below]

SID | SName | CID | CName | Semester
S1 | SN1 | C1 | CN1 | EVEN-22
S1 | SN1 | C2 | CN2 | EVEN-22

- Redundancy (e.g. Student Name repeated for multiple courses)
- May be ok if we have large space
- But redundancy leads to the following.
- Inconsistencies:
- Update anomaly
- Update must be propagated to all redundant records
- Insert anomaly
- Insert null values to student attributes, when a new course is introduced for the 1st time
- Delete anomaly
- If a student drops a course, then what happens?
- No issue
- If all students drop a course?
- Course info also gets lost
Relational Design Theory
Single Enrollment Table
[Figure: Processor, RAM and Disk; ER diagram Student (SID, SName) - Enrols (Semester) - Course (CID, CName) stored as one table with columns SID, SName, CID, CName, Semester]
- Redundancy
- May be ok if we have large space
- But redundancy leads to the following.
- Inconsistencies
- Query Efficiency
- In general query efficiency goes down (σ_{SID > S3}(E))
- In this case it increases: σ_{SID > S3 ∧ CID < C2}(E)
- RAM: faster but less memory
- Disk: slow but high memory, has all records
- Multiple cycles needed if multiple chunks of tables are transferred from Disk to RAM due to redundancy
- SID, CID: 6 chars each
- CName, SName: 50 chars each
- CSyllabus: 1500 chars
- Size of record with only SID and CID: 12 chars in query
- 1600 chars in query: if multiple chunks of a table are shifted to RAM, then in general query efficiency goes down due to redundancy.
What is a Good Design?
- No Redundancy
- Query should be very efficient
- No loss of information
- Lossless Decomposition: decomposing R into R1 and R2 should not lead to loss of info
- i.e. π_{R1}(R) ⋈ π_{R2}(R) should give back the original set of tuples in R
- Recap: R ⋈ S is same as π_{R U S}(σ_{R.att1 = S.att1}(R X S))
- Should not lead to spurious tuples
- Relationship: decomposing a ternary relationship into 3 binary ones can lead to spurious tuples: r(R) ⊂ r(R1) ⋈ r(R2) ⋈ r(R3)
- What if no attributes are common between R1 and R2?
- Primary key, foreign key avoid spurious tuples.
- If SName is the common attribute in the previous slide, then it will lead to spurious tuples!
[Figure: R decomposed into R1 and R2]
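Whether a decomposition is lossless can be checked on sample data by projecting and re-joining. In this Python sketch a tuple is a frozenset of (attribute, value) pairs so that relations can be ordinary sets; the helper names and the two sample rows (two students who share a name) are made up.

```python
def relation(rows):
    """Build a relation (set of hashable tuples) from a list of dictionaries."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    """π_{attrs}(rel)."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}


def natural_join(r, s):
    """R ⋈ S on all attributes the two tuples have in common."""
    out = set()
    for tr in r:
        for ts in s:
            dr, ds = dict(tr), dict(ts)
            if all(dr[a] == ds[a] for a in dr.keys() & ds.keys()):
                out.add(frozenset({**dr, **ds}.items()))
    return out


R = relation([
    {"SID": "S1", "SName": "Ann", "CID": "C1"},
    {"SID": "S2", "SName": "Ann", "CID": "C2"},
])

# Common attribute SID (a key): the join gives back exactly R.
good = natural_join(project(R, ["SID", "SName"]), project(R, ["SID", "CID"]))

# Common attribute SName (not a key): the join contains spurious tuples.
bad = natural_join(project(R, ["SID", "SName"]), project(R, ["SName", "CID"]))
```

Here good equals R, while bad has four tuples, two of them spurious, because the shared name matches each student with the other student's course.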
Relational Design Theory
- Start with a mega relation with all the attributes
- Provide constraints (or functional dependencies) that hold in the domain we are trying to model
- E.g. SID -> SName (i.e. SID determines SName is a functional dependency)
- i.e. t1[SID] = t2[SID] => t1[SName] = t2[SName]
- CID determines CName is another functional dependency
- How to know functional dependencies?
- Domain experts will know the functional dependencies, ask them
- Infer from the data, confirm with domain experts and add the constraint if true
- Automatic Design (Decomposition)
- No redundancy
- Efficient queries
- Lossless decomposition
[Figure: R decomposed into R1 and R2]
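Checking whether the current data is consistent with a candidate functional dependency can be sketched in Python. The function name fd_holds and the sample rows are hypothetical. A passing check only means the data does not contradict the dependency; as noted above, it still has to be confirmed with domain experts.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen and returns the stored value.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Made-up sample data for the single enrolment table.
enrolment = [
    {"SID": "S1", "SName": "SN1", "CID": "C1", "CName": "CN1"},
    {"SID": "S1", "SName": "SN1", "CID": "C2", "CName": "CN2"},
    {"SID": "S2", "SName": "SN2", "CID": "C1", "CName": "CN1"},
]
```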
Relational Design Theory: Normalization
- Normalization: process to organize data and attributes in a db
- Normalization happens through a series of Normal Forms (which the final set of relations satisfies)
- An algorithm for automatic decomposition leads to a particular normal form
- Several Normal Forms
- 1NF, 2NF, 3NF, Boyce-Codd Normal Form (BCNF), 4NF, 5NF
- 1NF: attributes must be atomic (e.g. FName, LName), but it does not eliminate redundancy
- 2NF eliminates some redundancy, 3NF eliminates some more, and so on
- Lower order NFs permit some redundancy which higher order NFs eliminate
- Given all attributes and functional dependencies, based on the normal form we get an automatic decomposition
[Figure: R decomposed into R1 and R2]
Functional Dependencies and BCNF
- Apply (SSN, sName, cName)
- Redundancy; Update & Deletion Anomalies
- Storing SSN-sName pair once for each college
- Functional Dependency: SSN -> sName
- Same SSN always has same sName
- Should store each SSN’s sName only once
- Boyce-Codd Normal Form:
- If A -> B then A is a key
- Decompose: Student (SSN, sName) and Apply (SSN, cName)
[Figure: R decomposed into R1 and R2]
Functional Dependencies and BCNF
- Student (SSN, sName, address, HScode, HSname, HScity, GPA, priority)
- Functional Dependencies:
- SSN -> sName, address
- t1[SSN] = t2[SSN] => t1[sName] = t2[sName] and so on
- Refer this link to know why SSN does not determine HScode always
- HScode -> HSname, HScity
- GPA -> priority
- {SSN, HScode} -> GPA
- Boyce-Codd Normal Form:
- If A -> B then A is a key
- Decompose:
- R1 (SSN, sName, address)
- R2 (HScode, HSname, HScity)
- R3 (SSN, HScode, GPA)
- R4 (GPA, priority)
[Figure: R decomposed into R1 and R2]
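The BCNF condition can be tested mechanically with attribute closure, a standard technique that the slide does not spell out. In this Python sketch the function names are made up, and the check is simplified: it looks only at the listed dependencies whose attributes all lie inside the relation, not at every implied dependency.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation_attrs, fds):
    """Listed non-trivial FDs inside the relation whose left side is not a superkey of it."""
    attrs = set(relation_attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if set(lhs) <= attrs and set(rhs) <= attrs
            and not set(rhs) <= set(lhs)
            and not attrs <= closure(lhs, fds)]


student = ["SSN", "sName", "address", "HScode", "HSname", "HScity", "GPA", "priority"]
fds = [
    (("SSN",), ("sName", "address")),
    (("HScode",), ("HSname", "HScity")),
    (("GPA",), ("priority",)),
    (("SSN", "HScode"), ("GPA",)),
]
```

On the original Student relation three of the four dependencies violate BCNF, while each of R1 to R4 passes the check.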
Full Functional Dependency (2NF)
- X̄ -> Ȳ is a full functional dependency if removal of any attribute A from X̄ means that the dependency does not hold any more.
- E.g. {SId, PNo} -> hrs ✓
- is a full functional dependency
- E.g. {SId, PNo} -> PName ❌
- is not a full functional dependency
- {PNo} -> PName ✓
- 2NF: A relation is in 2NF if every non-prime attribute A in R is fully functionally dependent on every candidate key.
- E.g. Student (SId, SName, PNo, PName, hrs)
- Functional Dependencies:
- {SId, PNo} -> rest of (non-prime) attributes
- PNo -> PName
- SId -> SName
- {SId, PNo} is the candidate key
- PNo -> PName and SId -> SName violate 2NF
- As PName depends on PNo alone, it is only partially dependent on {SId, PNo}
- What if SName and PName are all unique?
- then all attributes except hrs are candidate keys
- and then Student (SId, SName, PNo, PName, hrs) is in 2NF (next slide).
Full Functional Dependency (2NF)
- X̄ -> Ȳ is a full functional dependency if removal of any attribute A from X̄ means that the dependency does not hold any more.
- E.g. {SId, PNo} -> hrs ✓
- is a full functional dependency
- E.g. {SId, PNo} -> PName ❌
- is not a full functional dependency
- 2NF: A relation is in 2NF if every non-prime attribute A in R is fully functionally dependent on every candidate key.
- E.g. Student (SId, SName, PNo, PName, hrs)
- What if SName and PName are all unique?
- then all attributes except hrs are candidate keys
- and then Student (SId, SName, PNo, PName, hrs) is in 2NF. Still redundancy!
- Functional Dependencies:
- SId -> hrs ✓
- PNo -> hrs ✓
- SName -> hrs ✓
- PName -> hrs ✓
- {SId, PNo} -> hrs is a functional dependency (not full ❌), but 2NF holds as {SId, PNo} is not a candidate key. Still redundancy (SId -> SName and PNo -> PName)!
Non-trivial Functional Dependency: 3NF
- X̄ -> Ȳ is a trivial functional dependency if Ȳ ⊆ X̄
- Else it is non-trivial
- If X̄ ∩ Ȳ = Φ, then X̄ -> Ȳ is a completely non-trivial functional dependency
- 3NF: Whenever a non-trivial functional dependency X -> A holds in R, then:
- Either (a) X is a super-key of R
- Or (b) A is a prime attribute of R
- E.g. Student (SId, PNo, PName, Hrs, Salary) and PNames are all unique
- It does not violate 2NF ✓
- PNo -> PName does not violate 3NF ✓
- Hrs -> Salary violates 3NF ❌
- Note: Student (SId, PNo, PName, Hrs), given PNames are all unique, does not violate 3NF
- So we divide Student (SId, PNo, PName, Hrs, Salary) into
- R1 (SId, PNo, PName, Hrs) and R2 (Hrs, Salary)
- The redundancy from PNo -> PName is still present
Functional Dependencies and BCNF
- Student (SSN, sName, address, HScode, HSname, HScity, GPA, priority)
- Functional Dependencies:
- SSN -> sName, address, GPA, priority
- t1[SSN] = t2[SSN] => t1[sName] = t2[sName] and so on
- Refer this link to know why SSN does not determine HScode always
- HScode -> HSname, HScity
- GPA -> priority
- Boyce-Codd Normal Form:
- If A -> B in relation R, then A is a super key of R
- Decompose:
- R1 (SSN, sName, address, GPA)
- R2 (HScode, HSname, HScity)
- R3 (SSN, HScode)
- R4 (GPA, priority)
- Solve: Student (SId, SName, PNo, PName, Hrs, Salary) when SName and PName are not unique!
[Figure: R decomposed into R1 and R2]
References and Acknowledgements
Text books:
- Database System Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan, Seventh Edition.
Slides derived from:
- Dr. Sriram’s Lectures in 2021
- J Ullman’s slides on ER modelling, Stanford University
- Dan Suciu’s slides, University of Washington
- Chapter 2 slides of Database System Concepts by Korth & Silberschatz
- Chapter 3 slides from Fundamentals of Database Systems by Navathe
Slides derived from slides on ER model to Relational Model
- Jung T. Chang, San Jose State University
- David Toman, University of Waterloo
- Sunnie S Chung, Ohio State University
- Navathe slides on Relational model
- Dr. Sriram’s Lectures in 2021
Relational Algebra: Slides derived from Navathe chapter 6 and from relational algebra slides by Jennifer Widom, Stanford University
Relational Design Theory: Slides derived from Prof. Jennifer Widom, Stanford University and Navathe Chapter 10