Professional Documents
Culture Documents
ER MODEL
➢ The Entity Relational Model is a model for identifying entities to be represented
in the database and representation of how those entities are related.
➢ The ER data model specifies enterprise schema that represents the overall
logical structure of a database graphically.
➢ The Entity Relationship Diagram explains the relationship among the entities
present in the database.
➢ ER models are used to model real-world objects like a person, a car, or a
company and the relation between these real-world objects.
➢ The ER Diagram is the structural format of the database.
Entity
An Entity may be an object with a physical existence – a particular
person, car, house, or employee – or it may be an object with a conceptual
existence – a company, a job, or a university course.
Entity Set:
An Entity is an object of Entity Type and a set of all entities is called an
entity set.
For Example, E1 is an entity having Entity Type Student and the set of
all students is called Entity Set. In ER diagram, Entity Type is represented as:
1. Strong Entity:
A Strong Entity is a type of entity that has a key
Attribute. Strong Entity does not depend on other
Entity in the Schema. It has a primary key, that helps
in identifying it uniquely, and it is represented by a
rectangle. These are called Strong Entity Types.
2. Weak Entity:
An Entity type has a key attribute that uniquely
identifies each entity in the entity set. But some
entity type exists for which key attributes can’t be
defined. These are called Weak Entity types.
For Example, A company may store the information of dependents
(Parents, Children, Spouse) of an Employee. But the dependents don’t have
existed without the employee. So Dependent will be a Weak Entity
Type and Employee will be Identifying Entity type for Dependent, which
means it is Strong Entity Type.
A weak entity type is represented by a Double Rectangle. The participation
of weak entity types is always total. The relationship between the weak
entity type and its identifying strong entity type is called identifying
relationship and it is represented by a double diamond.
Attributes
Attributes are the properties that define the entity type. For example,
Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that
define entity type Student. In ER diagram, the attribute is represented by an
oval.
Attribute
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called
the key attribute. For example, Roll_No will be unique for each student. In ER
diagram, the key attribute is represented by an oval with underlying lines.
Key Attribute
2. Composite Attribute
An attribute composed of many other
attributes is called a composite attribute. For
example, the Address attribute of the student
Entity type consists of Street, City, State, and
Country. In ER diagram, the composite attribute is represented by an oval
comprising of ovals.
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example,
Phone_No (can be more than one for a given student). In ER diagram, a
multivalued attribute is represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is
known as a derived attribute. e.g.; Age (can be derived from DOB). In ER
diagram, the derived attribute is represented by a dashed oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Relationship Type and
Relationship Set
A Relationship Type represents
the association between entity
types. For example, ‘Enrolled in’ is
a relationship type that exists
between entity type Student and
Course. In ER diagram, the
relationship type is represented by
a diamond and connecting the
entities with lines.
Relationship Set
Degree of a Relationship Set
The number of different entity sets participating in a relationship set is called
the degree of a relationship set.
1. Unary Relationship: When there is only ONE entity set participating in a
relation, the relationship is called a unary relationship. For example, one
person is married to only one person.
Unary Relationship
Binary Relationship
3. Many-to-One: When entities in one entity set can take part only once in
the relationship set and entities in other entity sets can take part more than
once in the relationship set, cardinality is many to one. Let us assume that a
student can take only one course but one course can be taken by many
students. So the cardinality will be n to 1. It means that for one course there
can be n students but for one student, there will be only one course.
The total number of tables that can be used in this is 3.
many to one cardinality
In this case, each student is taking only 1 course but 1 course has been taken
by many students.
4. Many-to-Many: When entities in all entity sets can take part more than
once in the relationship cardinality is many to many. Let us assume that a
student can take more
than one course and one
course can be taken by
many students. So the
relationship will be
many to many.
the total number of
tables that can be used in this is 3.
many to many cardinality
Using Sets, it can be represented as:
In this example, student S1 is enrolled in C1
and C3 and Course C3 is enrolled by S1, S3, and S4.
So it is many-to-many relationships.
Participation Constraint
Participation Constraint is applied to the entity
participating in the relationship set.
1. Total Participation – Each entity in the entity
set must participate in the relationship. If each
student must enroll in a course, the participation of
students will be total. Total participation is shown by a double line in the ER
diagram.
2. Partial Participation – The entity in the entity set may or may NOT
participate in the relationship. If some courses are not enrolled by any of the
students, the participation in the course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set
having total participation and Course Entity set having partial participation.
• Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M C2, c3
• In the above table Course is a multi-valued attribute so it is not in
1NF. Below Table is in 1NF as there is no multi-valued attribute
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
Second Normal Form
To be in second normal form, a relation must be in first normal form and
relation must not contain any partial dependency. A relation is in 2NF if it
has No Partial Dependency, i.e., no non-prime attribute (attributes which are
not part of any candidate key) is dependent on any proper subset of any
candidate key of the table. Partial Dependency – If the proper subset of
candidate key determines non-prime attribute, it is called partial dependency.
• Example 1 – Consider table-3 as following below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
• {Note that, there are many courses having the same course fee}
Here, COURSE_FEE cannot alone decide the value of COURSE_NO
or STUD_NO; COURSE_FEE together with STUD_NO cannot decide
the value of COURSE_NO; COURSE_FEE together with
COURSE_NO cannot decide the value of STUD_NO; Hence,
COURSE_FEE would be a non-prime attribute, as it does not belong
to the one only candidate key {STUD_NO, COURSE_NO} ; But,
COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key. Non-
prime attribute COURSE_FEE is dependent on a proper subset of
the candidate key, which is a partial dependency and so this relation
is not in 2NF. To convert the above relation to 2NF, we need to split
the table into two tables such as :
Table 1: STUD_NO, COURSE_NO Table 2: COURSE_NO,
COURSE_FEE
Table 1
Table 2
STUD_NO COURSE_NO COURSE_NO
COURSE_FEE
1 C1 C1 1000
2 C2 C2 1500
1 C4 C3 1000
4 C3 C4 2000
4 C1 C5 2000
• NOTE: 2NF tries to reduce the redundant data getting stored in
memory. For instance, if there are 100 students taking C1 course,
we don’t need to store its Fee as 1000 for all the 100 records,
instead, once we can store it in the second table as the course fee
for C1 is 1000.
• Example 2 – Consider following functional dependencies in
relation R (A, B , C, D )
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial
dependency, i.e., any proper subset of AB doesn’t determine any non-prime
attribute.
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate
key).
Example 1: In relation STUDENT given in Table 4, FD set: {STUD_NO ->
STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE ->
STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE -
> STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the
third normal form.
To convert it in third normal form, we will decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY_STUD_AGE) as: STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_AGE) STATE_COUNTRY (STATE,
COUNTRY)
Consider relation R(A, B, C, D, E) A -> BC, CD -> E, B -> D, E -> A All possible
candidate keys in above relation are {A, E, CD, BC} All attributes are on right
sides of all functional dependencies are prime.
Example 2: Find the highest normal form of a relation R(A,B,C,D,E) with FD
set as {BC->D, AC->BE, B->E}
Step 1: As we can see, (AC)+ ={A,C,B,E,D} but none of its subset can
determine all attribute of relation, So AC will be candidate key. A or C can’t
be derived from any other attribute of the relation, so there will be only 1
candidate key {AC}.
Step 2: Prime attributes are those attributes that are part of candidate key
{A, C} in this example and others will be non-prime {B, D, E} in this example.
Step 3: The relation R is in 1st normal form as a relational DBMS does not
allow multi-valued or composite attribute. The relation is in 2nd normal form
because BC->D is in 2nd normal form (BC is not a proper subset of candidate
key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E is
in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a super
key nor D is a prime attribute) and in B->E (neither B is a super key nor E is a
prime attribute) but to satisfy 3rd normal for, either LHS of an FD should be
super key or RHS should be prime attribute. So the highest normal form of
relation will be 2nd Normal form.
For example consider relation R(A, B, C) A -> BC, B -> A and B both are super
keys so above relation is in BCNF.
Third Normal Form
A relation is said to be in third normal form, if we did not have any transitive
dependency for non-prime attributes. The basic condition with the Third
Normal Form is that, the relation must be in Second Normal Form.
Below mentioned is the basic condition that must be hold in the non-trivial
functional dependency X -> Y:
• X is a Super Key.
• Y is a Prime Attribute ( this means that element of Y is some part
of Candidate Key).
For more, refer to Third Normal Form in DBMS.
BCNF (Boyce-Codd Normal Form)
BCNF (Boyce-Codd Normal Form) is just a advanced version of Third
Normal Form. Here we have some additional rules than Third Normal Form.
The basic condition for any relation to be in BCNF is that it must be in Third
Normal Form.
We have to focus on some basic rules that are for BCNF:
1. Table must be in Third Normal Form.
2. In relation X->Y, X must be a superkey in a relation.
For more, refer to BCNF in DBMS.
Fourth Normal Form
Fourth Normal Form contains no non-trivial multivaued dependency except
candidate key. The basic condition with Fourth Normal Form is that the
relation must be in BCNF.
The basic rules are mentioned below.
1. It must be in BCNF.
2. It does not have any multi-valued dependency.
For more, refer to Fourth Normal Form in DBMS.
Fifth Normal Form
Fifth Normal Form is also called as Projected Normal Form. The basic
conditions of Fifth Normal Form is mentioned below.
Relation must be in Fourth Normal Form.
The relation must not be further non loss decomposed.
For more, refer to Fifth Normal Form in DBMS.
Applications of Normal Forms in DBMS
• Data consistency: Normal forms ensure that data is consistent and
does not contain any redundant information. This helps to prevent
inconsistencies and errors in the database.
• Data redundancy: Normal forms minimize data redundancy by
organizing data into tables that contain only unique data. This
reduces the amount of storage space required for the database and
makes it easier to manage.
• Query performance: Normal forms can improve query
performance by reducing the number of joins required to retrieve
data. This helps to speed up query processing and improve overall
system performance.
• Database maintenance: Normal forms make it easier to maintain
the database by reducing the amount of redundant data that needs
to be updated, deleted, or modified. This helps to improve
database management and reduce the risk of errors or
inconsistencies.
• Database design: Normal forms provide guidelines for designing
databases that are efficient, flexible, and scalable. This helps to
ensure that the database can be easily modified, updated, or
expanded as needed.
Some Important Points about Normal Forms
• BCNF is free from redundancy.
• If a relation is in BCNF, then 3NF is also satisfied.
• If all attributes of relation are prime attribute, then the relation is
always in 3NF.
• A relation in a Relational Database is always and at least in 1NF
form.
• Every Binary Relation ( a Relation with only 2 attributes ) is always
in BCNF.
• If a Relation has only singleton candidate keys( i.e. every candidate
key consists of only 1 attribute), then the Relation is always in
2NF( because no Partial functional dependency possible).
• Sometimes going for BCNF form may not preserve functional
dependency. In that case go for BCNF only if the lost FD(s) is not
required, else normalize till 3NF only.
• There are many more Normal forms that exist after BCNF, like 4NF
and more. But in real world database systems it’s generally not
required to go beyond BCNF.
DBMS Architecture 1-level, 2-Level, 3-Level:
A Database store a lot of critical information to access data
quickly and securely.
DBMS Architecture helps users to get their requests done while
connecting to the database. We choose database architecture depending on
several factors like the size of the database, number of users, and
relationships between the users.
There are two types of database models that we generally use,
are logical model and physical model.
1. I/O parallelism :
is a form of parallelism in which the relations are partitioned on
multiple disks a motive to reduce the retrieval time of relations from the disk.
Within, the data inputted is partitioned and then processing is done in
parallel with each partition. The results are merged after processing all the
partitioned data. It is also known as data-partitioning.
Hash partitioning has the advantage that it provides an even
distribution of data across the disks and it is also best suited for those point
queries that are based on the partitioning attribute. It is to be noted that
partitioning is useful for the sequential scans of the entire table placed on ‘n‘
number of disks and the time taken to scan the relationship is
approximately 1/n of the time required to scan the table on a single disk
system. We have four types of partitioning in I/O parallelism:
• Hash partitioning –
As we already know, a Hash Function is a fast, mathematical
function. Each row of the original relationship is hashed on
partitioning attributes. For example, let’s assume that there are 4
disks disk1, disk2, disk3, and disk4 through which the data is to be
partitioned. Now if the Function returns 3, then the row is placed
on disk3.
• Range partitioning –
In range partitioning, it issues continuous attribute value ranges to
each disk. For example, we have 3 disks numbered 0, 1, and 2 in
range partitioning, and may assign relation with a value that is less
than 5 to disk0, values between 5-40 to disk1, and values that are
greater than 40 to disk2. It has some advantages, like it involves
placing shuffles containing attribute values that fall within a
certain range on the disk. See figure 1: Range partitioning
given below:
• Round-robin partitioning –
In Round Robin partitioning, the relations are studied in any
order. The ith tuple is sent to the disk number(i % n). So, disks take
turns receiving new rows of data. This technique ensures the even
distribution of tuples across disks and is ideally suitable for
applications that wish to read the entire relation sequentially for
each query.
• Schema partitioning –
In schema partitioning, different tables within a
database are placed on different disks. See figure 2
below:
2. Intra-query parallelism :
Intra-query parallelism refers to the execution of a single query in a parallel
process on different CPUs using a shared-nothing paralleling architecture
technique. This uses two types of approaches:
• First approach –
In this approach, each CPU can execute the duplicate task against
some data portion.
• Second approach –
In this approach, the task can be divided into different sectors with
each CPU executing a distinct subtask.
3. Inter-query parallelism :
In Inter-query parallelism, there is an execution of multiple
transactions by each CPU. It is called parallel transaction processing. DBMS
uses transaction dispatching to carry inter query parallelism.
In this method, each query is run sequentially, which leads to slowing
down the running of long queries.
In such cases, DBMS must understand the locks held by different
transactions running on different processes. Inter query parallelism on
shared disk architecture performs best when transactions that execute in
parallel do not accept the same data.
Also, it is the easiest form of parallelism in DBMS, and there is an
increased transaction throughput.
4. Intra-operation parallelism :
Intra-operation parallelism is a sort of parallelism in which we parallelize
the execution of each individual operation of a task like sorting, joins,
projections, and so on. The level of parallelism is very high in intra-
operation parallelism. This type of parallelism is natural in database
systems. Let’s take an SQL query example:
SELECT * FROM Vehicles ORDER BY Model_Number;
In the above query, the relational operation is sorting and since a relation can
have a large number of records in it, the operation can be performed on
different subsets of the relation in multiple processors, which reduces the
time required to sort the data.
5. Inter-operation parallelism :
When different operations in a query expression are executed in parallel,
then it is called inter-operation parallelism. They are of two types –
• Pipelined parallelism –
In pipeline parallelism, the output row of one operation is
consumed by the second operation even before the first operation
. It is useful for the small number of CPUs and avoids writing of
intermediate results to disk.
• Independent parallelism –
In this parallelism, the operations in query expressions that are not
dependent on each other can be executed in parallel. This parallelism is
very useful in the case of the lower degree of parallelism.