You are on page 1of 44

What is Database

The database is a collection of inter-related data which is used to retrieve, insert and delete the data
efficiently. It is also used to organize the data in the form of a table, schema, views, and reports, etc.

For example: The college Database organizes the data about the admin, staff, students and faculty
etc.

Using the database, you can easily retrieve, insert, and delete the information.

Database Management System


o Database management system is a software which is used to manage the database. For
example: MySQL, Oracle, etc are a very popular commercial database which is used in
different applications.
o DBMS provides an interface to perform various operations like database creation, storing data
in it, updating data, creating a table in the database and a lot more.
o It provides protection and security to the database. In the case of multiple users, it also
maintains data consistency.
DBMS allows users the following tasks:

o Data Definition: It is used for creation, modification, and removal of definition that defines
the organization of data in the database.
o Data Updation: It is used for the insertion, modification, and deletion of the actual data in
the database.
o Data Retrieval: It is used to retrieve the data from the database which can be used by
applications for various purposes.
o User Administration: It is used for registering and monitoring users, maintain data integrity,
enforcing data security, dealing with concurrency control, monitoring performance and
recovering information corrupted by unexpected failure.

Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the information.
o It can provide a clear and logical view of the process that manipulates data.
o DBMS contains automatic backup and recovery procedures.
o It contains ACID properties which maintain data in a healthy state in case of failure.
o It can reduce the complex relationship between data.
o It is used to support manipulation and processing of data.
o It is used to provide security of data.
o It can view the database from different viewpoints according to the requirements of the user.

Advantages of DBMS
o Controls database redundancy: It can control data redundancy because it stores all the data
in one single database file and that recorded data is placed in the database.
o Data sharing: In DBMS, the authorized users of an organization can share the data among
multiple users.
o Easily Maintenance: It can be easily maintainable due to the centralized nature of the
database system.
o Reduce time: It reduces development time and maintenance need.
o Backup: It provides backup and recovery subsystems which create automatic backup of data
from hardware and software failures and restores the data if required.
o multiple user interface: It provides different types of user interfaces like graphical user
interfaces, application program interfaces

Disadvantages of DBMS
o Cost of Hardware and Software: It requires a high speed of data processor and large
memory size to run DBMS software.
o Size: It occupies a large space of disks and large memory to run them efficiently.
o Complexity: Database system creates additional complexity and requirements.
o Higher impact of failure: Failure is highly impacted the database because in most of the
organization, all the data stored in a single database and if the database is damaged due to
electric failure or database corruption then the data may be lost forever.

Different types of Database Users:


Database users are categorized based up on their interaction with the database. These are seven
types of database users in DBMS.

1. Database Administrator (DBA): Database Administrator (DBA) is a person/team who


defines the schema and also controls the 3 levels of database. The DBA will then create a
new account id and password for the user if he/she need to access the database. DBA is
also responsible for providing security to the database and he allows only the authorized
users to access/modify the data base. DBA is responsible for the problems such as
security breaches and poor system response time.
 DBA also monitors the recovery and backup and provide technical support.
 The DBA has a DBA account in the DBMS which called a system or superuser
account.
 DBA repairs damage caused due to hardware and/or software failures.
 DBA is the one having privileges to perform DCL (Data Control Language) operations
such as GRANT and REVOKE, to allow/restrict a particular user from accessing the
database.
2. Naive / Parametric End Users: Parametric End Users are the unsophisticated who don’t
have any DBMS knowledge but they frequently use the database applications in their
daily life to get the desired results. For examples, Railway’s ticket booking users are
naive users. Clerks in any bank is a naive user because they don’t have any DBMS
knowledge but they still use the database and perform their given task.
3. System Analyst :
System Analyst is a user who analyses the requirements of parametric end users. They
check whether all the requirements of end users are satisfied.
4. Sophisticated Users: Sophisticated users can be engineers, scientists, business analyst,
who are familiar with the database. They can develop their own database applications
according to their requirement. They don’t write the program code but they interact the
database by writing SQL queries directly through the query processor.
5. Database Designers: Data Base Designers are the users who design the structure of
database which includes tables, indexes, views, triggers, stored procedures and
constraints which are usually enforced before the database is created or populated with
data. He/she controls what data must be stored and how the data items to be related. It
is responsibility of Database Designers to understand the requirements of different user
groups and then create a design which satisfies the need of all the user groups.
6. Application Programmers: Application Programmers also referred as System Analysts
or simply Software Engineers, are the back-end programmers who writes the code for
the application programs. They are the computer professionals. These programs could
be written in Programming languages such as Visual Basic, Developer, C, FORTRAN,
COBOL etc. Application programmers design, debug, test, and maintain set of programs
called “canned transactions” for the Naive (parametric) users in order to interact with
database.
7. Casual Users / Temporary Users: Casual Users are the users who occasionally
use/access the database but each time when they access the database they require the
new information, for example, Middle or higher-level manager.

DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server architecture is used to deal
with a large number of PCs, web servers, database servers and other components that are connected
with networks.
o The client/server architecture consists of many PCs and a workstation which are connected via the
network.
o DBMS architecture depends upon how users are connected to the database to get their request done.

Types of DBMS Architecture

Database architecture can be seen as a single tier or multi-tier. But logically, database architecture
is of two types like: 2-tier architecture and 3-tier architecture.

1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user can directly sit on
the DBMS and uses it.
o Any changes done here will directly be done on the database itself. It doesn't provide a handy tool
for end users.
o The 1-Tier architecture is used for development of the local application, where programmers can
directly communicate with the database for the quick response.
2-Tier Architecture
o The 2-Tier architecture is same as basic client-server. In the two-tier architecture, applications on the
client end can directly communicate with the database at the server side. For this interaction, API's
like: ODBC, JDBC are used.
o The user interfaces and application programs are run on the client-side.
o The server side is responsible to provide the functionalities like: query processing and transaction
management.
o To communicate with the DBMS, client-side application establishes a connection with the server side.

3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and server. In this architecture, client
can't directly communicate with the server.
o The application on the client-end interacts with an application server which further communicates
with the database system.
o End user has no idea about the existence of the database beyond the application server. The database
also has no idea about any other user beyond the application.
o The 3-Tier architecture is used in case of large web application.
Data Model
o Data models in DBMS help to understand the design at the conceptual, physical, and logical
levels as it provides a clear picture of the data making it easier for developers to create a
physical database.
Data models are used to describe how the data is stored, accessed, and updated in a DBMS. A set of
symbols and text is used to represent them so that all the members of an organization can understand
how the data is organized. It provides a set of conceptual tools that are vastly used to represent the
description of data.
o There are many types of data models that are used in the industry, some of them are -

Types of Data Model

Hierarchical Model
The hierarchical data model is one of the oldest data models, developed in the 1950s by IBM. In this data
model, the data is organized in a hierarchical tree-like structure. This data model can be easily visualized
because each record has one parent and many children (possibly 0) as shown in the image given below.

The above given image represents the data model of the Vehicle database, vehicle are classified into two
types Viz. two-wheelers and four-wheelers and then they are further classified. The main drawback we can
see here is we can only have one to many relationships under this model, hence the hierarchical data model
is very rarely used nowadays.

Network Model
A network model is nothing but a generalization of the hierarchical data model as this data model allows
many to many relationships therefore in this model a record can also have more than one parent.
The network model can be represented as a graph and hence it replaces the hierarchical tree with a graph
in which object types are the nodes and relationships are the edges. For example –

Here you can see all the three departments are linked with the director which was not possible in the
hierarchical data model. In the network model, there can be many possible paths to reach a node from the
root node (College is the root node in the above case), therefore the data can be accessed efficiently when
compared to the hierarchical data model. But, on the other hand, the process of insertion and deletion of
data is quite complex.

Entity-Relationship model (ER Model)


An Entity-Relationship model is a high-level data model that describes the structure of the database
in a pictorial form which is known as ER-diagram. In simple words, an ER diagram are used to
represent logical structure of the database easily. ER model develops a conceptual view of the data hence it
can be used as a blueprint to implement the database in the future.
Developers can easily understand the system just by looking at ER diagram. Let's first have a look at the
components of an ER diagram.

 Entity - Anything that has an independent existence about which we collect the data. They are
represented as rectangles in the ER diagram. For example - Car, house, employee.
 Entity Set - A set of the same type of entities is known as an entity set. For example - Set of
students studying in a college.
 Attributes - Properties that define entities are called attributes. They are represented by an ellipse
shape.
 Relationships - A relationship is used to describe the association between entities. They are
represented as diamond or rhombus shapes in the ER diagram.
In the above-represented ER diagram, we have two entities that are Employee and Company and the
relationship among them. Also, in the above-represented ER diagram, we can see that both employee and
company have some attributes and the relationship is of "works in" type, which means employee works in a
company.

Relational Model

This is the most widely accepted data model. In this model, the database is represented as a collection of
relations in the form of rows and columns of a two-dimensional table. Each row is known as a tuple (a
tuple contains all the data for an individual record) while each column represents an attribute. For example
-

Stu. Id Name Branch

101 Naman CSE

102 Saloni ECE

103 Rishabh IT

104 Pulkit ME

The above table shows a relation "STUDENT" with attributes as Stu. Id, Name, and Branch which consists of
4 records or tuples.

Object-Oriented Data model

As suggested by its name, the object-oriented data model is a combination of object-oriented


programming, and relational data model. In this data model, the data and their relationship are represented
in a single structure which is known as an object. Since data is stored as objects we can easily store audio,
video, images, etc in the database which was very difficult and inconvenient to do in the relational model.
As shown in the image below two objects are connected with each other through links.
In the above image, we have two objects that are Employee and Department in which all the data is
contained in a single unit (object). They are linked with each other as they share a common
attribute i.e.i.e. Department_Id.

Object Relational Data Model

Again, as suggested by its name, the object-relational data model is an integration of the object-oriented
model and the relational model. Since it inherits properties from both of the models it supports objects,
classes, etc like object-oriented model and tabular structures like the relational model. For example –

It provides data structures and operations used in the relational model and also provides features of object-
oriented models like classes, inheritance, etc. The only drawback of this data model is that it is complex and
quite difficult to handle.
Float Data Model

The float data model consists of a single two-dimensional array of data elements. For example, in the 2-d
array we can have one column as username and the other as password. One thing that can be noticed here
is, there should not be duplicate entries in the table (array).
For example -

The major drawbacks of this model are that it fails to store a huge amount of data secondly, to access any
data whole table needs to be searched which makes it inefficient.

Semi-Structured Data Model

The semi-structured data model is a generalized form of the relational model, which allows representing
data in a flexible way, hence we can not differentiate between data and schema in this model because,
in this model, some entities have a missing attribute(s) and on the other hand, some entities might
have some extra attribute(s) which in turn makes it easy to update the schema of the database.
For example - We can say a data model to be semi-structured if in some attributes we are storing both
atomic values (values that can't be divided further, for example, Roll_No) as well as a collection of values.

Associative Data Model

The associative data model sees the data in the same way as the brain does, i.e.i.e. entities and relationships
between them. The relationship is expressed as a simple English sentence of the form "subject-verb-object".
For example - From the sentences

 Pulkit is a customer
 Pulkit's customer id is 645.
 Neeraj is a customer
 Neeraj's customer id is 784.

We can make the following table -


Cust_ID Name

645 Pulkit

784 Neeraj

Context Data Model


The context model is nothing but a combination of several data models that have been discussed above.
For example, a context model can be a combination of a network model, ER model, etc. This data model
allows one to do many things which were not possible if he/she use a single data model.

Advantages of Data Models


 Data models ensure that the data is represented accurately.
 The relationship between the data is well defined.
 Data redundancy can be minimized and missing data can be identified easily.
 Last but not the least, security of the data is not compromised.

Disadvantages of Data Models


 The biggest disadvantage of the data model is, one must know the characteristics of physical data to build a
data model.
 Sometimes in big databases, it is quite difficult to understand the data model also the cost incurred is very
high.

Conclusion
 Data models are used to represent the logical structure of the database.
 Several types of data models exist with their own advantages and limitations.
 ER model, Relational model, and Object-oriented model are among the most popular data models.

Instance and schema (DBMS)


The overall design of the database is called database schema. Schema will not be changed
frequently. It is the logical structure of a database. It does not show the data in the database.
The schema is pictorially represented as follows −
Types of Schema
The different types of schemas are as follows −
 Physical schema − It is a database design at the physical level.It is hidden below the
logical schema and can be changed easily without affecting the application programs.
 Logical schema − It is a database design at the logical level. Programmers construct
applications using logical schema.
 External − It is schema at view level. It is the highest level of a schema which defines
the views for end users.
Generally the Database Management System (DBMS) assists one physical schema, one logical
schema and several sub or external schemas.
Database schema refers to the format and layout of the database in which the data will be stored.
It is the one thing that remains the same throughout unless otherwise modified. It defines the
structure of what type of data and how it will be stored.

Example
A database schema for a person will have fields for name, email, phone and address as shown
below −
Person

Name Email Phone no

Instance
Instance or extension or database state is a collection of information that stored in a database at
a particular moment is called an instance of the database. The Database instance refers to the
information stored in the database at a given point of time. Thus, it is a dynamic value which keeps
on changing.

Example
A database instance for the Person database can be (User1,emai.com,11345679,addr) So the
person construct will contain their individual entities in the attributes called as instance. This is
shown below −
Person

Name Email Phone no

BOB kksd@yasd.com 2343435

JANU werwr@sdas.in 5345464

PRIYA wefrwer@sdf.com 2342342


Differences
The major differences between schema and instance are as follows −

Database Schema Database Instance

It is the definition of the database or It is a snapshot of a database at a specific moment.


it is defined as the description of the
database.

It rarely changes. It changes frequently.

Example” We take two tables emp At a moment, what is the value of the database schema is
table and dept table.Emp called instance.At t=8 A.M

Id Empid name salary did

Name 1 A 5000 d1

Salary 2 B 2000 d2

dept At t=9 A.M

Dept 3

Dept_id C

dname 3000

Emp and dept both called as d3


schemasIt gives database definition
Empid 1 and 2 are called as Instance 1At time 9 A.M
instance 2 changes

This corresponds to the variable The value of the variable in a program at a point in time
declaration of a programming corresponds to an instance of the database schema.
language.
What is Data Independence of DBMS?
Data Independence is defined as a property of DBMS that helps you to change the Database
schema at one level of a database system without requiring to change the schema at the next
higher level. Data independence helps you to keep data separated from all programs that
make use of it.

You can use this stored data for computing and presentation. In many systems, data
independence is an essential function for components of the system.

Types of Data Independence


In DBMS there are two types of data independence

1. Physical data independence


2. Logical data independence.

Levels of Database
Before we learn Data Independence, a refresher on Database Levels is important. The
database has 3 levels as shown in the diagram below

1. Physical/Internal
2. Conceptual
3. External

Consider an Example of a University Database. At the different levels this is how the
implementation will look like:

Type of Schema Implementation

View 1: Course info(cid:int,cname:string)


External Schema
View 2: studeninfo(id:int. name:string)
Students(id: int, name: string, login: string, age: integer)
Conceptual Shema Courses(id: int, cname:string, credits:integer)
Enrolled(id: int, grade:string)
 Relations stored as unordered files.
Physical Schema  Index on the first column of Students.

Physical Data Independence


Physical data independence helps you to separate conceptual levels from the
internal/physical levels. It allows you to provide a logical description of the database without
the need to specify physical structures. Compared to Logical Independence, it is easy to
achieve physical data independence.

With Physical independence, you can easily change the physical storage structures or devices
with an effect on the conceptual schema. Any change done would be absorbed by the
mapping between the conceptual and internal levels. Physical data independence is
achieved by the presence of the internal level of the database and then the transformation
from the conceptual level of the database to the internal level.

Examples of changes under Physical Data Independence


Due to Physical independence, any of the below change will not affect the conceptual layer.

 Using a new storage device like Hard Drive or Magnetic Tapes


 Modifying the file organization technique in the Database
 Switching to different data structures.
 Changing the access method.
 Modifying indexes.
 Changes to compression techniques or hashing algorithms.
 Change of Location of Database from say C drive to D Drive

Logical Data Independence


Logical Data Independence is the ability to change the conceptual scheme without changing

1. External views
2. External API or programs

Any change made will be absorbed by the mapping between external and conceptual levels.

When compared to Physical Data independence, it is challenging to achieve logical data


independence.

Examples of changes under Logical Data Independence


Due to Logical independence, any of the below change will not affect the external layer.
1. Add/Modify/Delete a new attribute, entity or relationship is possible without a rewrite
of existing application programs
2. Merging two records into one
3. Breaking an existing record into two or more records

Difference between Physical and Logical Data Independence


Logica Data Independence Physical Data Independence

Logical Data Independence is mainly


Mainly concerned with the storage of the
concerned with the structure or changing the
data.
data definition.

It is difficult as the retrieving of data is mainly


It is easy to retrieve.
dependent on the logical structure of data.

Compared to Logic Physical independence it


Compared to Logical Independence it is easy
is difficult to achieve logical data
to achieve physical data independence.
independence.

You need to make changes in the Application


A change in the physical level usually does not
program if new fields are added or deleted
need change at the Application program level.
from the database.

Modification at the logical levels is significant Modifications made at the internal levels may
whenever the logical structures of the or may not be needed to improve the
database are changed. performance of the structure.

Concerned with conceptual schema Concerned with internal schema

Example: change in compression techniques,


Example: Add/Modify/Delete a new attribute
hashing algorithms, storage devices, etc

Importance of Data Independence


 Helps you to improve the quality of the data
 Database system maintenance becomes affordable
 Enforcement of standards and improvement in database security
 You don’t need to alter data structure in application programs
 Permit developers to focus on the general structure of the Database rather than
worrying about the internal implementation
 It allows you to improve state which is undamaged or undivided
 Database incongruity is vastly reduced.
 Easily make modifications in the physical level is needed to improve the performance
of the system.
Hierarchical model in DBMS
Data Models in the Database Management System (DBMS)are divided into three categories.
These are as follows −
 Object Model − It uses concepts like entities, attributes, and their relationship. This
model can be used to describe the data at the conceptual and external level. For
Example: E-R model.
 Physical Model − These models describe how the data is stored in the computer. This
model is used to describe the data at the internal level.
 Record based logical models − These models are used in describing the data at the
logical and view level. It is used to specify the overall logical structure of the database.

Features of Hierarchical Model


The features of hierarchical model are as follows −
 It is based on tree structure.
 It consists of a collection of records that are connected to each other by links.
 The tree structure used in hierarchical models is called as routed true.
 The root node of that tree is an empty node.
 So, Hierarchical model is a collection of routed trees and the relationship that exists
in the hierarchical model is one to many and one to one.
Advantages
The advantages of hierarchical model are as follows −
 It is easy to understand.
 More efficient than the ER model.
Disadvantages
The disadvantages of hierarchical model are as follows −
 Data inconsistency occurs when the parent node is deleting that result in the deletion
of the child node.
 Wastage of storage space.
 Complex to design.
 Absence of structural independence.
Given below is an example of the hierarchical model in pictorial format −
Relational Database Management System (RDBMS)
RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and
for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft
Access.
A Relational database management system (RDBMS) is a database management system (DBMS)
that is based on the relational model as introduced by E. F. Codd.

What is a table?
The data in an RDBMS is stored in database objects which are called as tables. This table is
basically a collection of related data entries and it consists of numerous columns and rows.
Remember, a table is the most common and simplest form of data storage in a relational database.
The following program is an example of a CUSTOMERS table −

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+

What is a field?
Every table is broken up into smaller entities called fields. The fields in the CUSTOMERS table
consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific information about every record
in the table.

What is a Record or a Row?


A record is also called as a row of data is each individual entry that exists in a table. For example,
there are 7 records in the above CUSTOMERS table. Following is a single row of data or record in
the CUSTOMERS table −

+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+

A record is a horizontal entity in a table.

What is a column?
A column is a vertical entity in a table that contains all information associated with a specific field
in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which represents location
description and would be as shown below −

+-----------+
| ADDRESS |
+-----------+
| Ahmedabad |
| Delhi |
| Kota |
| Mumbai |
| Bhopal |
| MP |
| Indore |
+----+------+

What is a NULL value?


A NULL value in a table is a value in a field that appears to be blank, which means a field with a
NULL value is a field with no value.
It is very important to understand that a NULL value is different than a zero value or a field that
contains spaces. A field with a NULL value is the one that has been left blank during a record
creation.

SQL Constraints
Constraints are the rules enforced on data columns on a table. These are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the database.
Constraints can either be column level or table level. Column level constraints are applied only to
one column whereas, table level constraints are applied to the entire table.
Following are some of the most commonly used constraints available in SQL −
 NOT NULL Constraint − Ensures that a column cannot have a NULL value.
 DEFAULT Constraint − Provides a default value for a column when none is specified.
 UNIQUE Constraint − Ensures that all the values in a column are different.
 PRIMARY Key − Uniquely identifies each row/record in a database table.
 FOREIGN Key − Uniquely identifies a row/record in any another database table.
 CHECK Constraint − The CHECK constraint ensures that all values in a column satisfy
certain conditions.
 INDEX − Used to create and retrieve data from the database very quickly.

Data Integrity
The following categories of data integrity exist with each RDBMS −
 Entity Integrity − There are no duplicate rows in a table.
 Domain Integrity − Enforces valid entries for a given column by restricting the type,
the format, or the range of values.
 Referential integrity − Rows cannot be deleted, which are used by other records.
 User-Defined Integrity − Enforces some specific business rules that do not fall into
entity, domain or referential integrity.
Advantages
 Easy to manage: Each table can be independently manipulated without affecting
others.
 Security: It is more secure consisting of multiple levels of security. Access of data
shared can be limited.
 Flexible: Updating of data can be done at a single point without making amendments
at multiple files. Databases can easily be extended to incorporate more records, thus
providing greater scalability. Also, facilitates easy application of SQL queries.
 Users: RDBMS supports client-side architecture storing multiple users together.
 Facilitates storage and retrieval of large amount of data.
 Easy Data Handling:
 Data fetching is faster because of relational architecture.
 Data redundancy or duplicity is avoided due to keys, indexes, and
normalization principles.
 Data consistency is ensured because RDBMS is based on ACID properties for
data transactions(Atomicity Consistency Isolation Durability).
 Fault Tolerance: Replication of databases provides simultaneous access and helps the
system recover in case of disasters, such as power failures or sudden shutdowns
Disadvantages
 High Cost and Extensive Hardware and Software Support: Huge costs and setups are
required to make these systems functional.
 Scalability: In case of addition of more data, servers along with additional power, and
memory are required.
 Complexity: Voluminous data creates complexity in understanding of relations and
may lower down the performance.
 Structured Limits: The fields or columns of a relational database system is enclosed
within various limits, which may lead to loss of data

Database Normalization
Database normalization is the process of efficiently organizing data in a database. There are two
reasons of this normalization process −
 Eliminating redundant data, for example, storing the same data in more than one
table.
 Ensuring data dependencies make sense.
Both these reasons are worthy goals as they reduce the amount of space a database consumes
and ensures that data is logically stored. Normalization consists of a series of guidelines that help
guide you in creating a good database structure.
Normalization guidelines are divided into normal forms; think of a form as the format or the way a
database structure is laid out. The aim of normal forms is to organize the database structure, so
that it complies with the rules of first normal form, then second normal form and finally the third
normal form.
It is your choice to take it further and go to the fourth normal form, fifth normal form and so on, but
in general, the third normal form is more than enough.

 First Normal Form (1NF)


 Second Normal Form (2NF)
 Third Normal Form (3NF)

DBMS RDBMS

DBMS stores data as file. RDBMS stores data in tabular form.

Data elements need to access Multiple data elements can be accessed at the
individually. same time.

Data is stored in the form of tables which are


No relationship between data. related to each other.

Normalization is not present. Normalization is present.

DBMS does not support distributed


database. RDBMS supports distributed database.

It uses a tabular structure where the headers


It stores data in either a navigational or are the column names, and the rows contain
hierarchical form. corresponding values.

It deals with small quantity of data. It deals with large amount of data.

Data redundancy is common in this Keys and indexes do not allow Data
model. redundancy.

It is used for small organization and


deal with small data. It is used to handle large amount of data.

It supports single user. It supports multiple users.

Data fetching is slower for the large Data fetching is fast because of relational
amount of data. approach.

The data in a DBMS is subject to low


security levels with regards to data There exists multiple levels of data security in a
manipulation. RDBMS.
DBMS RDBMS

Low software and hardware


necessities. Higher software and hardware necessities.

Examples: MySQL, PostgreSQL, SQL Server,


Examples: XML, Window Registry, etc. Oracle, Microsoft Access etc.
Data Modelling using the Entity Relationship Model
ER model
o ER model stands for an Entity-Relationship model. It is a high-level data model. This model is used to
define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and easy to design
view of data.
o In ER modelling, the database structure is portrayed as a diagram called an entity-relationship
diagram.

For example, Suppose we design a school database. In this database, the student will be an entity
with attributes like address, name, id, age, etc. The address can be another entity with attributes like
city, street name, pin code, etc and there will be a relationship between them.

Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity can be represented
as rectangles.

Consider an organization as an example- manager, product, employee, department etc. can be taken
as an entity.

a. Weak Entity

An entity that depends on another entity called a weak entity. The weak entity doesn't contain
any key attribute of its own. The weak entity is represented by a double rectangle.

2. Attribute
The attribute is used to describe the property of an entity. Eclipse is used to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute

An attribute that composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and those ellipses are connected with an ellipse.

c. Multivalued Attribute

An attribute can have more than one value. These attributes are known as a multivalued attribute.
The double oval is used to represent multivalued attribute.

For example, a student can have more than one phone number.

d. Derived Attribute

An attribute that can be derived from other attribute is known as a derived attribute. It can be
represented by a dashed ellipse.
For example, A person's age changes over time and can be derived from another attribute like Date
of birth.

3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus is used to
represent the relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of an entity is associated with the relationship, then it is known as one to
one relationship.

For example, A female can marry to one male, and a male can marry to one female.

b. One-to-many relationship

When only one instance of the entity on the left, and more than one instance of an entity on the
right associates with the relationship then this is known as a one-to-many relationship.
For example, Scientist can invent many inventions, but the invention is done by the only specific
scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left, and only one instance of an entity on the
right associates with the relationship then it is known as a many-to-one relationship.

For example, Student enrolls for only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left, and more than one instance of an entity on
the right associates with the relationship then it is known as a many-to-many relationship.

For example, Employee can assign by many projects and project can have many employees.

Notation of ER diagram
Database can be represented using the notations. In ER diagram, many notations are used to express
the cardinality. These notations are as follows:
Types of Attributes Description
Simple attributes can’t be divided any further.
Simple attribute For example, a student’s contact number. It is
also called an atomic value.
It is possible to break down composite
attribute. For example, a student’s full name
Composite attribute
may be further divided into first name, second
name, and last name.
This type of attribute does not include in the
physical database. However, their values are
derived from other attributes present in the
Derived attribute
database. For example, age should not be
stored directly. Instead, it should be derived
from the DOB of that employee.
Multivalued attributes can have more than
one values. For example, a student can have
Multivalued attribute
more than one mobile number, email address,
etc.

Strong Entity Set Weak Entity Set


It does not have enough attributes to build a
Strong entity set always has a primary key.
primary key.

It is represented by a double rectangle


It is represented by a rectangle symbol.
symbol.

It contains a Primary key represented by the It contains a Partial Key which is represented
underline symbol. by a dashed underline symbol.

The member of a strong entity set is called as The member of a weak entity set called as a
dominant entity set. subordinate entity set.

In a weak entity set, it is a combination of


Primary Key is one of its attributes which
primary key and partial key of the strong
helps to identify its member.
entity set.

In the ER diagram the relationship between The relationship between one strong and a
two strong entity set shown by using a weak entity set shown by using the double
diamond symbol. diamond symbol.

The connecting line of the strong entity set The line connecting the weak entity set for
with the relationship is single. identifying relationship is double.
Mapping Constraints
o A mapping constraint is a data constraint that expresses the number of entities to which another entity
can be related via a relationship set.
o It is most useful in describing the relationship sets that involve more than two entity sets.
o For binary relationship set R on an entity set A and B, there are four possible mapping cardinalities.
These are as follows:
1. One to one (1:1)
2. One to many (1:M)
3. Many to one (M:1)
4. Many to many (M:M)

One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in
E2 is associated with at most one entity in E1.

One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an
entity in E2 is associated with at most one entity in E1.

Many-to-one
In one-to-many mapping, an entity in E1 is associated with at most one entity in E2, and an entity in
E2 is associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an
entity in E2 is associated with any number of entities in E1.
Keys
o Keys play an important role in the relational database.
o It is used to uniquely identify any record or row of data from the table. It is also used to establish and
identify relationships between tables.

For example, ID is used as a key in the Student table because it is unique for each student. In the
PERSON table, passport_number, license_number, SSN are keys since they are unique for each
person.

Types of keys:

1. Primary key
o It is the first key used to identify one and only one instance of an entity uniquely. An entity can contain
multiple keys, as we saw in the PERSON table. The key which is most suitable from those lists becomes
a primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys since
they are also unique.
o For each entity, the primary key selection is based on requirements and developers.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o Except for the primary key, the remaining attributes are considered a candidate key. The candidate
keys are as strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the attributes,
like SSN, Passport_Number, License_Number, etc., are considered a candidate key.

3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate
key.

For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the name of two
employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this combination can
also be a key.

The super key would be EMPLOYEE-ID (EMPLOYEE_ID, EMPLOYEE-NAME), etc.


4. Foreign key
o Foreign keys are the column of the table used to point to the primary key of another table.
o Every employee works in a specific department in a company, and employee and department are two
different entities. So we can't store the department's information in the employee table. That's why
we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.

5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple
in a relation. These attributes or combinations of the attributes are called the candidate keys. One
key is chosen as the primary key from these candidate keys, and the remaining candidate key, if it
exists, is termed the alternate key. In other words, the total number of the alternate keys is the total
number of candidate keys minus the primary key. The alternate key may or may not exist. If there is
only one candidate key in a relation, it does not have an alternate key.

For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No,
acts as the Alternate key.

6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This
key is also known as Concatenated Key.
For example, in employee relations, we assume that an employee may be assigned multiple roles,
and an employee may work on multiple projects simultaneously. So the primary key will be
composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So these
attributes act as a composite key since the primary key comprises more than one attribute.

7. Artificial key
The key created using arbitrarily assigned data are known as artificial keys. These keys are created
when a primary key is large and complex and has no relationship with many other relations. The
data values of the artificial keys are usually numbered in a serial order.

For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large in
employee relations. So it would be better to add a new virtual attribute to identify each tuple in the
relation uniquely.
Extended ER model - Generalization,
Specialization, Aggregation
Generalization
o Generalization is like a bottom-up approach in which two or more entities of lower level combine to
form a higher-level entity if they have some attributes in common.
o In generalization, an entity of a higher level can also combine with the entities of the lower level to
form a further higher-level entity.
o Generalization is more like subclass and superclass system, but the only difference is the approach.
Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e., subclasses are
combined to make a superclass.

For example, Faculty and Student entities can be generalized and create a higher-level entity
Person.

Specialization
o Specialization is a top-down approach, and it is opposite to Generalization. In specialization, one
higher level entity can be broken down into two lower-level entities.
o Specialization is used to identify the subset of an entity set that shares some distinguishing
characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes are defined next, and
relationship set are then added.

For example: In an Employee management system, EMPLOYEE entity can be specialized as TESTER
or DEVELOPER based on what role they play in the company.
Aggregation
In aggregation, the relation between two entities is treated as a single entity. In aggregation,
relationship with its corresponding entities is aggregated into a higher-level entity.

For example: Center entity offers the Course entity act as a single entity in the relationship which is
in a relationship with another entity visitor. In the real world, if a visitor visits a coaching center then
he will never enquiry about the Course only or just about the Center instead he will ask the enquiry
about both.
ER diagram to tables Mapping
Following rules are used for converting an ER diagram into the tables:

Rule-01: For Strong Entity Set With Only Simple Attributes-

A strong entity set with only simple attributes will require only one table in relational model.
 Attributes of the table will be the attributes of the entity set.
 The primary key of the table will be the key attribute of the entity set.
Example-

Roll_no Name Sex

Schema : Student ( Roll_no , Name , Sex )

Rule-02: For Strong Entity Set With Composite Attributes-

 A strong entity set with any number of composite attributes will require only one table in
relational model.
 While conversion, simple attributes of the composite attributes are taken into account and not
the composite attribute itself.
Example-
Roll_no First_name Last_name House_no Street City

Schema : Student ( Roll_no , First_name , Last_name , House_no , Street , City )

Rule-03: For Strong Entity Set With Multi Valued Attributes-

A strong entity set with any number of multi valued attributes will require two tables in relational model.
 One table will contain all the simple attributes with the primary key.
 Other table will contain the primary key and all the multi valued attributes.
Example-

Roll_no City

Roll_no Mobile_no
Rule-04: Translating Relationship Set into a Table-

A relationship set will require one table in the relational model.


Attributes of the table are-
 Primary key attributes of the participating entity sets
 Its own descriptive attributes if any.
Set of non-descriptive attributes will be the primary key.
Example-

Emp_no Dept_id since

Schema : Works in ( Emp_no , Dept_id , since )

NOTE-

If we consider the overall ER diagram, three tables will be required in relational model-
 One table for the entity set “Employee”
 One table for the entity set “Department”
 One table for the relationship set “Works in”

Rule-05: For Binary Relationships With Cardinality Ratios-

The following four cases are possible-


 Case-01: Binary relationship with cardinality ratio m:n
 Case-02: Binary relationship with cardinality ratio 1:n
 Case-03: Binary relationship with cardinality ratio m:1
 Case-04: Binary relationship with cardinality ratio 1:1
Case-01: For Binary Relationship With Cardinality Ratio m:n

Here, three tables will be required-


1. A ( a1 , a2 )
2. R ( a1 , b1 )
3. B ( b1 , b2 )

Case-02: For Binary Relationship With Cardinality Ratio 1:n

Here, two tables will be required-


1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )

NOTE- Here, combined table will be drawn for the entity set B and relationship set R.

Case-03: For Binary Relationship With Cardinality Ratio m:1


Here, two tables will be required-
1. AR ( a1 , a2 , b1 )
2. B ( b1 , b2 )

NOTE- Here, combined table will be drawn for the entity set A and relationship set R.

Case-04: For Binary Relationship With Cardinality Ratio 1:1

Here, two tables will be required. Either combine ‘R’ with ‘A’ or ‘B’

Way-01:
1. AR ( a1 , a2 , b1 )
2. B ( b1 , b2 )

Way-02:
1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )

Thumb Rules to Remember

While determining the minimum number of tables required for binary relationships with given
cardinality ratios, following thumb rules must be kept in mind-
 For binary relationship with cardinality ration m : n , separate and individual tables will be drawn
for each entity set and relationship.
 For binary relationship with cardinality ratio either m : 1 or 1 : n , always remember “many side
will consume the relationship” i.e. a combined table will be drawn for many side entity set and
relationship set.
 For binary relationship with cardinality ratio 1 : 1 , two tables will be required. You can combine
the relationship set with any one of the entity sets.

Rule-06: For Binary Relationship With Both Cardinality Constraints and


Participation Constraints-
 Cardinality constraints will be implemented as discussed in Rule-05.
 Because of the total participation constraint, foreign key acquires NOT NULL constraint i.e. now
foreign key can not be null.

Case-01: For Binary Relationship With Cardinality Constraint and Total Participation Constraint From One Side-

Because cardinality ratio = 1 : n , so we will combine the entity set B and relationship set R.
Then, two tables will be required-
1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )
Because of total participation, foreign key a1 has acquired NOT NULL constraint, so it can’t be null
now.

Case-02: For Binary Relationship With Cardinality Constraint and Total Participation Constraint From Both
Sides-

If there is a key constraint from both the sides of an entity set with total participation, then that binary
relationship is represented using only single table.

Here, Only one table is required.


 ARB ( a1 , a2 , b1 , b2 )

Rule-07: For Binary Relationship With Weak Entity Set-

Weak entity set always appears in association with identifying relationship with total participation
constraint.
Here, two tables will be required-
1. A ( a1 , a2 )
2. BR ( a1 , b1 , b2 )

You might also like