Module 2

DATA MODELING
Entity
• An entity is a real-world object, either animate or
inanimate, that is easily identifiable.
• For example, in a school database, students, teachers,
classes, and courses offered can be considered as entities.
• All these entities have some attributes or properties that give
them their identity.
Attributes
• Entities are represented by means of their properties,
called attributes. All attributes have values. For
example, a student entity may have name, class, and age
as attributes.
• There exists a domain or range of values that can be
assigned to attributes. For example, a student's name
cannot be a numeric value. It has to be alphabetic. A
student's age cannot be negative, etc.
Types of Attributes
Simple Vs Composite
● Simple or atomic attribute − Simple attributes are atomic values, which
cannot be divided further. For example, a student's phone number is an
atomic value of 10 digits.

● Composite attribute − Composite attributes are made of more than one
simple attribute. Composite attributes can be divided into smaller
subparts. For example, a student's complete name may have first_name
and last_name.
Single-value Vs Multivalued attribute

● Single-value attribute − Single-value attributes contain a
single value. For example, Social_Security_Number.

● Multivalued attribute − Multivalued attributes may
contain more than one value. For example, a person can
have more than one phone number, email_address, etc.
Stored Vs Derived Attributes

• Stored attribute − Stored attributes are attributes whose values are
stored directly in the database. For example, DOB.

• Derived attribute − Derived attributes are attributes that do
not exist in the physical database; their values are derived
from other attributes present in the database. For example, age
can be derived from date_of_birth. As another example,
average_salary in a department should not be saved directly in
the database; instead, it can be derived.
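As an illustration of a derived attribute, the following minimal Python sketch computes age from a stored date of birth instead of storing it. The function name `age_on` and the sample dates are assumptions made for this example only.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive the age in whole years from a stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical student born on 15 June 2000, evaluated the day before a birthday.
example_age = age_on(date(2000, 6, 15), date(2024, 6, 14))
print(example_age)
```

Because the value is recomputed on demand, it can never drift out of step with the stored date_of_birth.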
Complex Attributes
• In general, composite and multivalued attributes may be nested
arbitrarily to any number of levels, although this is rare.
• We can represent arbitrary nesting by grouping components of a
composite attribute between parentheses () and separating the
components with commas, and by displaying multivalued
attributes between braces {}. Such attributes are called complex
attributes.
• For example, if a person can have more than one residence and
each residence can have multiple phones, an attribute
AddressPhone for a person can be specified as:
{AddressPhone({Phone(AreaCode, PhoneNumber)},
Address(StreetAddress(Number, Street, ApartmentNumber),
City, State, Zip))}
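The nesting in the AddressPhone example can be mirrored in code. This is only a sketch: composite attributes become classes, multivalued attributes become lists, and all class names, field names, and sample values are assumptions chosen to follow the notation above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phone:                      # composite: Phone(AreaCode, PhoneNumber)
    area_code: str
    phone_number: str


@dataclass
class StreetAddress:              # composite nested inside Address
    number: str
    street: str
    apartment_number: str


@dataclass
class Address:                    # composite: Address(StreetAddress(...), City, State, Zip)
    street_address: StreetAddress
    city: str
    state: str
    zip_code: str


@dataclass
class AddressPhone:               # one residence together with its phones
    phones: List[Phone]           # multivalued: {Phone(...)}
    address: Address


@dataclass
class Person:
    name: str
    address_phones: List[AddressPhone] = field(default_factory=list)  # multivalued


# Hypothetical person with one residence that has two phones.
person = Person(
    name="Example Person",
    address_phones=[
        AddressPhone(
            phones=[Phone("080", "5550100"), Phone("080", "5550101")],
            address=Address(StreetAddress("12", "Main Street", "4B"), "Bengaluru", "KA", "560001"),
        )
    ],
)
```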
Entity Types, Entity sets
• A database usually contains groups of entities that
are similar.
• An entity type defines a collection (or set) of
entities that have the same attributes.
• Each entity type in the database is described by its
name and attributes.
• The collection of all entities of a particular entity
type in the database at any point in time is called an
entity set.
Key attributes of an entity type
• An entity type usually has one or more attributes
whose values are distinct for each individual entity
in the entity set. Such an attribute is called a key
attribute.
Types of keys
• Primary Key
• Candidate key/ Alternate key
• Super key
• Minimal super key
Types of Keys
• Primary key: The attribute or combination of
attributes that uniquely identifies a row or record in
a relation is known as the primary key.
• Composite key or concatenated key: A primary key
that consists of two or more attributes is known as a
composite key.
• Super key: A set of attributes that uniquely
identifies a database record; it may contain extra
attributes that are not necessary to uniquely
identify records.
A superkey is a set of columns in a table for which no two rows share the same
combination of values. So, the superkey value is unique for each and every row in the table. A
superkey can also be just a single column.
Example of a superkey
Suppose we have a table that holds all the managers in a company, and that table is called
Managers. The table has columns called ManagerID, Name, Title, and DepartmentID.

Every manager has his/her own ManagerID, so that value is always unique in each and every
row. This means that if we combine the ManagerID column value for any given row with any
other column value, then we will have a unique set of values. So, for the combinations of
(ManagerID, Name), (ManagerID, Title), (ManagerID, DepartmentID), (ManagerID, Name,
DepartmentID), etc.,
there will be no two rows in the table that share the exact same combination of values,
because the ManagerID will always be unique and different for each row. This means that
pairing the Manager ID with any other column(s) will ensure that the combination will also be
unique across all rows in the table.

A superkey is any combination of column(s) for which that combination of values will be
unique across all rows in a table. So, all of those combinations of columns in the Managers
table are considered to be superkeys. Even the ManagerID column on its own is considered to
be a superkey.
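The uniqueness test described above can be sketched in a few lines of Python. The function `is_superkey` and the sample manager rows are assumptions for illustration. Note that checking one populated table only shows that a column set is consistent with being a superkey; the property itself must hold for every valid state of the table.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def is_superkey(rows: List[Row], columns: Sequence[str]) -> bool:
    """Return True if no two rows share the same values for `columns`."""
    seen = set()
    for row in rows:
        combo = tuple(row[c] for c in columns)
        if combo in seen:
            return False          # two rows agree on every listed column
        seen.add(combo)
    return True


# Hypothetical contents of the Managers table.
managers = [
    {"ManagerID": 1, "Name": "Asha", "Title": "Director", "DepartmentID": 10},
    {"ManagerID": 2, "Name": "Ravi", "Title": "Director", "DepartmentID": 20},
    {"ManagerID": 3, "Name": "Asha", "Title": "Lead", "DepartmentID": 10},
]

print(is_superkey(managers, ["ManagerID"]))            # ManagerID alone
print(is_superkey(managers, ["ManagerID", "Name"]))    # ManagerID plus another column
print(is_superkey(managers, ["Name", "DepartmentID"])) # two rows share (Asha, 10)
```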
Relationships and Relationship Types
• A relationship relates two or more distinct entities
with a specific meaning.
• For example, EMPLOYEE John Smith works on the
ProductX PROJECT, or EMPLOYEE Franklin Wong
manages the Research DEPARTMENT.
• Relationships of the same type are grouped or typed
into a relationship type.
• For example, the WORKS_ON relationship type in which
EMPLOYEEs and PROJECTs participate, or the
MANAGES relationship type in which EMPLOYEEs and
DEPARTMENTs participate.
Degree of Relationships
• The degree of a relationship type is the number of
participating entity types.

• Unary or Recursive Relationships


• Binary Relationships
• Ternary Relationship
Unary or Recursive Relationships
• A unary relationship, also called recursive, is one in
which a relationship exists between occurrences of
the same entity set.
Binary Relationships
• A binary relationship exists when two entities are
associated in a relationship.
• For example the relationship “a PROFESSOR
teaches one or more CLASSes” represents a binary
relationship.
Ternary Relationship
• A ternary relationship exists when three entities are associated.
Some of the examples are as follows :
• A DOCTOR writes one or more PRESCRIPTIONs.
• A PATIENT may receive one or more PRESCRIPTIONs.
• A DRUG may appear in one or more PRESCRIPTIONs.
Example relationship instances of the WORKS_FOR relationship
between EMPLOYEE and DEPARTMENT
[Figure: EMPLOYEE entities e1 to e7 are linked through WORKS_FOR relationship instances r1 to r7 to DEPARTMENT entities d1 to d3.]
Role Names
• Each entity type that participates in a relationship type plays
a particular role in the relationship.
• The role name signifies the role that a participating entity
from the entity type plays in each relationship instance, and
helps to explain what the relationship means.
• For example, in the WORKS_FOR relationship type,
EMPLOYEE plays the role of employee or worker and
DEPARTMENT plays the role of department or employer.
Constraints on Relationship Types
• Two main types of relationship constraints:
1. Cardinality ratio - The cardinality ratio for a binary
relationship specifies the maximum number of relationship
instances that an entity can participate in.
• The possible cardinality ratios for binary relationship types
are 1:1, 1:N, N:1, and M:N.
2. Participation - It specifies whether the existence of an
entity depends on its being related to another entity via the
relationship type.
• Total participation
• Partial participation
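To make the cardinality ratio concrete, the sketch below inspects a set of relationship instances and reports the ratio they exhibit. The function name and the sample pairs are assumptions. A cardinality ratio is a constraint declared on the schema, so the code only reports what one particular state happens to show.

```python
from collections import Counter
from typing import Iterable, Tuple


def observed_cardinality_ratio(pairs: Iterable[Tuple[str, str]]) -> str:
    """Report the left:right cardinality ratio exhibited by (left, right) pairs."""
    unique_pairs = set(pairs)
    left_counts = Counter(left for left, _ in unique_pairs)
    right_counts = Counter(right for _, right in unique_pairs)
    # A left entity related to several right entities puts "N" on the right side.
    left_repeats = any(n > 1 for n in left_counts.values())
    # A right entity related to several left entities puts "N" on the left side.
    right_repeats = any(n > 1 for n in right_counts.values())
    if left_repeats and right_repeats:
        return "M:N"
    if left_repeats:
        return "1:N"
    if right_repeats:
        return "N:1"
    return "1:1"


# Hypothetical instances: (employee, department), (manager, department), (employee, project).
works_for = [("e1", "d1"), ("e2", "d2"), ("e3", "d1"), ("e4", "d2"),
             ("e5", "d3"), ("e6", "d1"), ("e7", "d3")]
manages = [("e2", "d1"), ("e5", "d2")]
works_on = [("e1", "p1"), ("e1", "p2"), ("e2", "p1")]

print(observed_cardinality_ratio(works_for))  # EMPLOYEE:DEPARTMENT
print(observed_cardinality_ratio(manages))
print(observed_cardinality_ratio(works_on))
```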
Constraints on binary relationship types
• 1:1
• 1:N
• M:N

• Participation constraints
• It specifies whether the existence of an entity depends
on its being related to another entity via the relationship
type.
• Total participation
• Partial participation
• Strong Entity types
• Weak Entity types
• Partial key

A weak entity type normally has a partial key, which is the attribute that
can uniquely identify weak entities that are related to the same owner entity.
ER Schema diagram of company DB
ER-DIAGRAM - NOTATION FOR ER SCHEMAS

Symbol                                   Meaning
Rectangle                                ENTITY TYPE
Double rectangle                         WEAK ENTITY TYPE
Diamond                                  RELATIONSHIP TYPE
Double diamond                           IDENTIFYING RELATIONSHIP TYPE
Oval                                     ATTRIBUTE
Oval with underlined name                KEY ATTRIBUTE
Double oval                              MULTIVALUED ATTRIBUTE
Oval with component ovals attached       COMPOSITE ATTRIBUTE
Dashed oval                              DERIVED ATTRIBUTE
E1, R, E2 with a double line R to E2     TOTAL PARTICIPATION OF E2 IN R
E1 -1- R -N- E2                          CARDINALITY RATIO 1:N FOR E1:E2 IN R
R -(min,max)- E                          STRUCTURAL CONSTRAINT (min, max) ON PARTICIPATION OF E IN R
• ER model has three main concepts:
• Entities (and their entity types and entity sets)
• Attributes (simple, composite, multivalued)
• Relationships (and their relationship types and
relationship sets)
ER Diagrams
• A company database needs to store information about employees
(identified by ssn, with salary and phone as attributes), departments
(identified by dno, with dname and budget as attributes), and children of
employees (with name and age as attributes). Employees work in
departments; each department is managed by an employee; a child must
be identified uniquely by name when the parent (who is an employee;
assume that only one parent works for the company) is known. We are
not interested in information about a child once the parent leaves the
company.
Notown Records has decided to store information about musicians who perform on
its albums.
• Each musician that records at Notown has an SSN, a name, an address, and a
phone number.
• Each instrument used in songs recorded at Notown has a unique identification
number, a name (e.g., guitar, synthesizer, flute) and a musical key (e.g., C, B-flat,
E-flat).
• Each album recorded on the Notown label has a unique identification number, a
title, a copyright date, a format (e.g., CD or MC), and an album identifier.
• Each song recorded at Notown has a title and an author.
• Each musician may play several instruments, and a given instrument may be
played by several musicians.
• Each album has a number of songs on it, but no song may appear on more than
one album.
• Each song is performed by one or more musicians, and a musician may perform a
number of songs.
• Each album has exactly one musician who acts as its producer. A musician may
produce several albums, of course.
ER-to-Relational Mapping Algorithm

• Step 1: Mapping of Regular Entity Types
• Step 2: Mapping of Weak Entity Types
• Step 3: Mapping of Binary 1:1 Relationship Types
• Step 4: Mapping of Binary 1:N Relationship Types
• Step 5: Mapping of Binary M:N Relationship Types
• Step 6: Mapping of Multivalued Attributes
• Step 7: Mapping of N-ary Relationship Types
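As a rough illustration of Steps 1, 2, 4 and 5, the sketch below maps the company example (employees, departments, children of employees) to tables using Python's built-in sqlite3 module. The table and column names, and the reading of "employees work in departments" as an M:N relationship, are assumptions made for this sketch. The slides do not prescribe this exact schema.

```python
import sqlite3

DDL = """
-- Step 1: regular entity types become relations keyed by their key attribute.
CREATE TABLE employee (
    ssn    TEXT PRIMARY KEY NOT NULL,
    salary REAL,
    phone  TEXT
);
-- Step 4: "each department is managed by an employee" is folded into
-- department as a foreign key to the managing employee.
CREATE TABLE department (
    dno     INTEGER PRIMARY KEY NOT NULL,
    dname   TEXT,
    budget  REAL,
    mgr_ssn TEXT REFERENCES employee(ssn)
);
-- Step 2: the weak entity type takes the owner's key plus its partial key (name).
CREATE TABLE child (
    parent_ssn TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
    name       TEXT NOT NULL,
    age        INTEGER,
    PRIMARY KEY (parent_ssn, name)
);
-- Step 5: an M:N relationship type becomes its own relation (assumed M:N here).
CREATE TABLE works_in (
    ssn TEXT    NOT NULL REFERENCES employee(ssn),
    dno INTEGER NOT NULL REFERENCES department(dno),
    PRIMARY KEY (ssn, dno)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the relations that the mapping produced.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The ON DELETE CASCADE on child reflects the requirement that information about a child is not kept once the parent leaves the company.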
Relational Model
• Relational model represents the database as a
collection of relations.
What is Relation?
• A Relation is thought of as a table of values (rows and
columns)
• each row, or tuple, is a collection of related data values
• the degree of the relation is the number of attributes in the
relation
• each column represents an attribute
• each row is an instance of the relation
So, a relation is a table of facts.
• Each column contains the same attribute data with the same data
type
• Each row describes a real-world instance of the relation
• A Relational database contains one or more relations (or
tables).
In relational model terminology

Informal Terms          Formal Terms
Table                   Relation
Column                  Attribute
Row                     Tuple
Values in a column      Domain
Table Definition        Schema of a Relation
Populated Table         Extension
Formal Definition
• A domain has a logical definition:
• E.g., “Indian_phone_numbers” is the set of 10-digit phone
numbers valid in India.
• A domain may have a data type or a format defined for it.
• E.g., dates have various formats such as monthname, date, year
or yyyy-mm-dd, or dd mm,yyyy etc.
• An attribute designates the role played by the domain.
• E.g., the domain Date may be used to define attributes “Invoice-date”
and “Payment-date”.
• A relation schema R denoted by R(A1, A2, …, An ) is made up of a
relation name R and a list of attributes (A1, A2, …, An)
• each attribute Ai is the name of a role played by some
domain D in the relation schema R.
• D is called the domain of Ai and denoted by dom (Ai )

• A relation schema is used to describe a relation;
R is called the name of this relation.
• The degree of a relation is the number of attributes n of its
relation schema
• A relation state r of the relation schema R(A1, A2, …, An), also
denoted by r(R), is a set of n-tuples r = {t1, t2, …, tm}
• Formally,
Given R(A1, A2, …, An)
r(R) ⊆ dom(A1) × dom(A2) × … × dom(An)
• R: schema of the relation
• r of R: a specific "value" or population of R.
• R is also called the intension of a relation
• r is also called the extension of a relation
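The formal definition can be checked on a toy example. In the sketch below, the attribute names, domains, and tuples are all assumptions chosen to keep the Cartesian product small. It builds dom(A1) × dom(A2) and confirms that a relation state is a subset of it.

```python
from itertools import product

# Hypothetical schema R(Name, Age) with deliberately tiny domains.
attributes = ("Name", "Age")
dom = {"Name": {"Ann", "Bob"}, "Age": {18, 19, 20}}

# dom(Name) x dom(Age): every tuple that could ever appear in a state of R.
cartesian = set(product(*(dom[a] for a in attributes)))

# One relation state r(R): a set of 2-tuples, i.e. the extension of R.
r = {("Ann", 19), ("Bob", 18)}

degree = len(attributes)          # number of attributes in the schema
is_valid_state = r <= cartesian   # r(R) is a subset of the Cartesian product

print(degree, len(cartesian), is_valid_state)
```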
Example
Characteristics of Relations
• Ordering of tuples in a relation r(R): The tuples in a
relation do not have any particular order, even though
they appear in tabular form.

• Ordering of values within a tuple: We will consider the
attributes in R(A1, A2, ..., An) and the values in
t = <v1, v2, ..., vn> to be ordered. (Alternatively, a relation
state r(R) can be defined as a set of mappings r = {t1, t2, ..., tm}
from attributes to values, in which case the ordering of
attributes is not important.)

• Values in a tuple: All values are considered atomic
(indivisible). A special null value is used to represent
values that are unknown or inapplicable to certain tuples.
Relational Integrity Constraints
• Constraints are conditions that must hold on all valid
relation instances.
• Constraints on databases can generally be divided into
three categories:
• Constraints that are inherent in the data model: model-based
constraints
- The constraint that a relation cannot have duplicate
tuples is an inherent constraint.
• Constraints that can be directly expressed in the schemas of
the data model: schema-based constraints
• Constraints that cannot be directly expressed in the schemas of
the data model: application-based constraints
Schema-based Constraints
• There are five main types of constraints:

1. Domain constraints
2. Key constraints
3. Constraints on nulls
4. Entity integrity constraints
5. Referential integrity constraints
Domain Constraints
• Domain constraints specify that within each tuple, the value of
each attribute A must be an atomic value from the domain
dom(A).
• A data type or format is also specified for each domain.
• The data types associated with domains typically include
standard numeric data types for integers (such as short integer,
integer, and long integer) and real numbers (float and double-
precision float).
• Characters, booleans, fixed-length strings, and variable-length
strings are also available, as are date, time, timestamp, and, in
some cases, money data types.
• Other possible domains may be described by a subrange of
values from a data type or as an enumerated data type in which
all possible values are explicitly listed.
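Subrange and enumerated domains can be approximated in SQL with CHECK constraints. The sketch below uses Python's built-in sqlite3 module. The student table, its columns, and the permitted values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        name  TEXT    NOT NULL,
        age   INTEGER CHECK (age BETWEEN 0 AND 150),                -- subrange domain
        grade TEXT    CHECK (grade IN ('A', 'B', 'C', 'D', 'F'))    -- enumerated domain
    )
""")


def try_insert(name, age, grade) -> bool:
    """Attempt an insert; return False if a domain (CHECK) constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (name, age, grade))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("Ann", 19, "A"),   # all values lie inside their domains
    try_insert("Bob", -3, "B"),   # negative age is outside the subrange
    try_insert("Cy", 20, "Z"),    # 'Z' is not one of the enumerated grades
]
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(results, row_count)
```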
Key Constraints
• Superkey of R:
• Is a set of attributes SK of R with the following condition:
• No two tuples in any valid relation state r(R) will have the same value
for SK
• That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]
• This condition must hold in any valid state r(R)

• Key of R:
• A "minimal" superkey
• That is, a key is a superkey K such that removal of any attribute from K
results in a set of attributes that is not a superkey (does not possess the
superkey uniqueness property)
• Example: The STUDENT relation schema:
{SSN} is a key. {SSN, Name, Age} is a superkey.
However, the superkey {SSN, Name, Age} is not a key of STUDENT,
because removing Name or Age or both from the set still leaves us with a
superkey.
• A key is a superkey, but not vice versa
Key Constraints (continued)
• Example: Consider the CAR relation schema:
• CAR(State, Reg#, SerialNo, Make, Model, Year)
• CAR has two keys:
• Key1 = {State, Reg#}
• Key2 = {SerialNo}
• Both are also superkeys of CAR
• {SerialNo, Make} is a superkey but not a key.
• In general:
• Any key is a superkey (but not vice versa)
• Any set of attributes that includes a key is a superkey
• A minimal superkey is also a key
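The minimality condition can be explored with a short search. The sketch below enumerates attribute sets in increasing size and keeps those that are unique on a sample state and contain no smaller key. The sample CAR tuples and function names are assumptions. Keys are properties of the schema, so data can only suggest candidates and cannot prove them.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


def minimal_superkeys(rows, attributes):
    """Return the attribute sets that are unique on rows and contain no smaller key."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue              # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


CAR_ATTRIBUTES = ("State", "Reg#", "SerialNo", "Make", "Model", "Year")
# Hypothetical state of CAR.
car_rows = [
    dict(zip(CAR_ATTRIBUTES, ("TX", "ABC-739", "A69352", "Ford", "Mustang", 2002))),
    dict(zip(CAR_ATTRIBUTES, ("TX", "XYZ-111", "B43696", "Ford", "Mustang", 2002))),
    dict(zip(CAR_ATTRIBUTES, ("FL", "ABC-739", "X83554", "Ford", "Mustang", 2002))),
]

candidate_keys = minimal_superkeys(car_rows, CAR_ATTRIBUTES)
print(candidate_keys)
```

On this sample, {SerialNo, Make} passes `is_superkey` but is skipped by `minimal_superkeys`, because it contains the smaller key {SerialNo}.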
Key Constraints (continued)
• If a relation has several candidate keys, one is chosen arbitrarily to
be the primary key.
• The primary key attributes are underlined.
• Example: Consider the CAR relation schema:
• CAR(State, Reg#, SerialNo, Make, Model, Year)
• We chose SerialNo as the primary key
• The primary key value is used to uniquely identify each tuple in a
relation
• Provides the tuple identity
• Also used to reference the tuple from another tuple
• General rule: Choose as primary key the smallest of the candidate
keys (in terms of size)
• Not always applicable – choice is sometimes subjective
Key Constraints
Relational Database Schema
• Relational Database Schema:
• A set S of relation schemas that belong to the same
database.
• S is the name of the whole database schema
• S = {R1, R2, ..., Rn} and a set of integrity constraints IC.
• R1, R2, …, Rn are the names of the individual relation
schemas within the database S
• Following slide shows a COMPANY database
schema with 6 relation schemas
COMPANY Database Schema
Relational Database State
• A relational database state DB of S is a set of relation
states DB = {r1, r2, ..., rn} such that each ri is a state of Ri
and such that the ri relation states satisfy the integrity
constraints specified in IC.
• A relational database state is sometimes called a relational
database snapshot or instance.
• We will not use the term instance since it also applies to
single tuples.
• A database state that does not meet the constraints is an
invalid state
Populated Database State
• Each relation will have many tuples in its current relation
state
• The relational database state is a union of all the individual
relation states
• Whenever the database is changed, a new state arises
• Basic operations for changing the database:
• INSERT a new tuple in a relation
• DELETE an existing tuple from a relation
• MODIFY an attribute of an existing tuple
• Next slide (Fig. 5.6) shows an example state for the
COMPANY database schema shown in Fig. 5.5.
Populated Database State for COMPANY

Entity Integrity

Entity Integrity:
• The primary key attributes PK of each relation schema R in S
cannot have null values in any tuple of r(R).
• This is because primary key values are used to identify the individual
tuples.
• t[PK] ≠ null for any tuple t in r(R).
• If PK has several attributes, null is not allowed in any of these
attributes.
Note: Other attributes of R may be similarly constrained to disallow
null values, even though they are not members of the primary key.
Referential Integrity
• A constraint involving two relations (the previous constraints
involve a single relation).
• Used to specify a relationship among tuples in two relations: the
referencing relation and the referenced relation.
• Tuples in the referencing relation R1 have attributes FK (called
foreign key attributes) that reference the primary key attributes
PK of the referenced relation R2.
• A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
• A referential integrity constraint can be displayed in a relational
database schema as a directed arc from R1.FK to R2.PK.
Referential Integrity (or Foreign Key) Constraint
Statement of the constraint:
The value in the foreign key column (or columns) FK
of the referencing relation R1 can be either:
1. a value of an existing primary key value of the corresponding
primary key PK in the referenced relation R2, or
2. a null.
In case (2), the FK in R1 should not be a part of its own
primary key.
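The two permitted cases for a foreign key value can be observed with Python's built-in sqlite3 module. This is a sketch under assumptions: the department and employee tables and their column names are invented for the example, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("""
    CREATE TABLE employee (
        ssn TEXT PRIMARY KEY,
        dno INTEGER REFERENCES department(dnumber)   -- FK to the referenced relation
    )
""")
conn.execute("INSERT INTO department VALUES (5, 'Research')")


def insert_employee(ssn, dno) -> bool:
    """Attempt an insert; return False if referential integrity rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (ssn, dno))
        return True
    except sqlite3.IntegrityError:
        return False


ok_existing = insert_employee("111", 5)    # case 1: matches an existing primary key
ok_null = insert_employee("222", None)     # case 2: null foreign key
ok_missing = insert_employee("333", 9)     # department 9 does not exist
print(ok_existing, ok_null, ok_missing)
```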
Displaying a relational database schema and its constraints
• Each relation schema can be displayed as a row of attribute
names
• The name of the relation is written above the attribute names
• The primary key attribute (or attributes) will be underlined
• A foreign key (referential integrity) constraint is displayed
as a directed arc (arrow) from the foreign key attributes to
the referenced table
• The arrow can also point to the primary key of the referenced relation
for clarity
• Next slide shows the COMPANY relational schema
diagram with referential integrity constraints
Referential Integrity Constraints for COMPANY database
Other Types of Constraints
• Semantic Integrity Constraints:
- Semantic integrity constraints can also be specified and enforced on a relational database.
- E.g., “the salary of an employee should not exceed the salary of the
employee’s supervisor”
- E.g., “the maximum number of hours per employee for all projects he or she works
on is 56 hours per week”
• Constraint specification language:
- Such constraints can be specified and enforced within the application
programs that update the database, or expressed in a constraint specification language.
• SQL-99 allows CREATE TRIGGER and CREATE ASSERTION
to express some of these semantic constraints.
• Keys, permissibility of null values, candidate keys (UNIQUE in SQL),
foreign keys, referential integrity, etc. are expressed by the
CREATE TABLE statement in SQL.
Update Operations on Relations
• INSERT a tuple.
• DELETE a tuple.
• MODIFY a tuple.
• Integrity constraints should not be violated by the
update operations.
• Several update operations may have to be grouped
together.
• Updates may propagate to cause other updates
automatically. This may be necessary to maintain
integrity constraints.
Update Operations on Relations
• In case of integrity violation, several actions can be taken:
• Cancel the operation that causes the violation
(RESTRICT or REJECT option)
• Perform the operation but inform the user of the
violation
• Trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
• Execute a user-specified error-correction routine
Possible violations for each operation

• INSERT may violate any of the constraints:


• Domain constraint:
• if one of the attribute values provided for the new tuple is not of
the specified attribute domain
• Key constraint:
• if the value of a key attribute in the new tuple already exists in
another tuple in the relation
• Referential integrity:
• if a foreign key value in the new tuple references a primary key
value that does not exist in the referenced relation
• Entity integrity:
• if the primary key value is null in the new tuple
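The four checks listed for INSERT can be sketched as a small validator. Everything named here (the function, its parameters, the EMPLOYEE-style sample tuples) is an assumption for illustration. A real DBMS performs these checks internally.

```python
def insert_violations(new_tuple, existing, primary_key, domains, foreign_keys):
    """Return the names of the constraints that inserting new_tuple would violate."""
    violations = []
    # Domain constraint: each non-null value must belong to its attribute's domain.
    if any(v is not None and not domains[a](v) for a, v in new_tuple.items()):
        violations.append("domain")
    # Entity integrity: no primary key attribute may be null.
    if any(new_tuple[a] is None for a in primary_key):
        violations.append("entity integrity")
    else:
        # Key constraint: the primary key value must not already exist.
        key = tuple(new_tuple[a] for a in primary_key)
        if any(tuple(t[a] for a in primary_key) == key for t in existing):
            violations.append("key")
    # Referential integrity: a non-null FK must match an existing referenced key.
    for attr, referenced_values in foreign_keys.items():
        value = new_tuple[attr]
        if value is not None and value not in referenced_values:
            violations.append("referential integrity")
            break
    return violations


# Hypothetical current state and constraint declarations.
employees = [{"Ssn": "123", "Name": "John", "Dno": 5}]
domains = {
    "Ssn": lambda v: isinstance(v, str),
    "Name": lambda v: isinstance(v, str),
    "Dno": lambda v: isinstance(v, int),
}
fks = {"Dno": {1, 4, 5}}   # existing department numbers


def check(t):
    return insert_violations(t, employees, ("Ssn",), domains, fks)


print(check({"Ssn": None, "Name": "Cecilia", "Dno": 4}))
print(check({"Ssn": "123", "Name": "Alicia", "Dno": 4}))
print(check({"Ssn": "677", "Name": "Cecilia", "Dno": 7}))
print(check({"Ssn": "677", "Name": 42, "Dno": 4}))
print(check({"Ssn": "677", "Name": "Cecilia", "Dno": 4}))
```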
Possible violations for each operation
• DELETE may violate only referential integrity:
• If the primary key value of the tuple being deleted is referenced
from other tuples in the database
• Can be remedied by several actions: RESTRICT, CASCADE, SET
NULL.
• RESTRICT option: reject the deletion
• CASCADE option: propagate the deletion by also deleting the
referencing tuples
• SET NULL option: set the foreign keys of the referencing tuples to
NULL
ALTER TABLE dept DROP CONSTRAINT dept_unique RESTRICT;
ALTER TABLE dept DROP CONSTRAINT dept_unique CASCADE;
• One of the above options must be specified during database
design for each foreign key constraint
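The three remedies for a DELETE can be compared side by side using Python's built-in sqlite3 module, which accepts ON DELETE RESTRICT, CASCADE, and SET NULL on a foreign key. The tables, columns, and helper function are assumptions made for this sketch.

```python
import sqlite3


def employees_after_delete(on_delete: str):
    """Delete a referenced department under the given option and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # required for SQLite to enforce FKs
    conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY)")
    # on_delete is one of the fixed strings below, never user input.
    conn.execute(
        "CREATE TABLE employee (ssn TEXT PRIMARY KEY, "
        f"dno INTEGER REFERENCES department(dnumber) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO department VALUES (5)")
    conn.execute("INSERT INTO employee VALUES ('111', 5)")
    try:
        conn.execute("DELETE FROM department WHERE dnumber = 5")
    except sqlite3.IntegrityError:
        return "rejected"                      # the deletion was refused
    return conn.execute("SELECT ssn, dno FROM employee").fetchall()


restrict_result = employees_after_delete("RESTRICT")
cascade_result = employees_after_delete("CASCADE")
set_null_result = employees_after_delete("SET NULL")
print(restrict_result, cascade_result, set_null_result)
```

Under RESTRICT the department survives, under CASCADE the referencing employee row disappears with it, and under SET NULL the employee row remains with a null dno.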
Possible violations for each operation
• UPDATE may violate domain constraint and NOT NULL
constraint on an attribute being modified
• Any of the other constraints may also be violated, depending on
the attribute being updated:
• Updating the primary key (PK):
• Similar to a DELETE followed by an INSERT
• Need to specify similar options to DELETE
• Updating a foreign key (FK):
• May violate referential integrity
• Updating an ordinary attribute (neither PK nor FK):
• Can only violate domain constraints

Summary
• Presented Relational Model Concepts
• Definitions
• Characteristics of relations
• Discussed Relational Model Constraints and Relational
Database Schemas
• Domain constraints
• Key constraints
• Entity integrity
• Referential integrity
• Described the Relational Update Operations and Dealing
with Constraint Violations
In-Class Exercise
(Taken from Exercise 5.15)
Consider the following relations for a database that keeps track of student
enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this
schema.
