
Data Model

A data model gives us an idea of how the final system will look after its complete
implementation. It defines the data elements and the relationships between them.
Data models are used to show how data is stored, connected, accessed and
updated in the database management system. Here, we use a set of symbols and text to
represent the information so that members of the organisation can communicate and
understand it. Though many data models are in use nowadays, the Relational
model is the most widely used. Apart from the Relational model, there are many
other types of data models, which we will study in detail in this blog. Some of the
data models in DBMS are:

1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
5. Object-Oriented Data Model
6. Object-Relational Data Model
7. Flat Data Model
8. Semi-Structured Data Model
9. Associative Data Model
10. Context Data Model

Hierarchical Model
The Hierarchical Model was the first DBMS model. It organises data in a
hierarchical tree structure. The hierarchy starts from the root, which holds the root data,
and expands like a tree by adding child nodes to parent nodes. This model easily
represents some real-world relationships, like food recipes, the sitemap of a website,
etc. Example: we can represent the relationship between the shoes listed on a shopping
website in the following way:
Features of a Hierarchical Model
1. One-to-many relationship: The data is organised in a tree-like structure with
one-to-many relationships between the record types. Also, there can be only one
path from the root to any node. Example: in the above example, if we want to go to
the node sneakers, we have only one path to reach it, i.e. through the men's shoes
node.
2. Parent-Child Relationship: Each child node has exactly one parent node, but a parent
node can have more than one child node. Multiple parents are not allowed.
3. Deletion Problem: If a parent node is deleted, its child nodes are automatically
deleted.
4. Pointers: Pointers link a parent node with its child nodes and are used
to navigate between the stored data. Example: in the above example, the 'shoes'
node points to two other nodes, the 'women's shoes' node and the 'men's shoes' node.
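The single-path and deletion properties above can be sketched in Python with a tree of nested dicts (the node names follow the shoe example; `find_path` and `delete` are illustrative helpers, not a DBMS API):

```python
# A minimal sketch of the shopping-site shoe hierarchy as nested dicts.
tree = {
    "shoes": {
        "men's shoes": {"sneakers": {}, "formal shoes": {}},
        "women's shoes": {"heels": {}, "flats": {}},
    }
}

def find_path(node, target, path=()):
    """Return the single root-to-target path, or None if absent."""
    for name, children in node.items():
        new_path = path + (name,)
        if name == target:
            return new_path
        found = find_path(children, target, new_path)
        if found:
            return found
    return None

def delete(node, target):
    """Deleting a parent removes its whole subtree (the deletion problem)."""
    for name in list(node):
        if name == target:
            del node[name]          # children vanish with the parent
        else:
            delete(node[name], target)

# There is exactly one path to 'sneakers', through the men's shoes node:
assert find_path(tree, "sneakers") == ("shoes", "men's shoes", "sneakers")
```

Deleting the "men's shoes" node with `delete(tree, "men's shoes")` also removes "sneakers", illustrating the deletion problem.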

Advantages of Hierarchical Model


 It is very simple and fast to traverse a tree-like structure.
 Any change in the parent node is automatically reflected in the child nodes, so the
integrity of the data is maintained.

Disadvantages of Hierarchical Model


 Complex relationships are not supported.
 Since a child node cannot have more than one parent, any complex relationship in
which a child node needs two parent nodes cannot be represented using this model.
 If a parent node is deleted, its child nodes are automatically deleted.

Network Model
This model is an extension of the hierarchical model and was the most popular model
before the relational model. It is the same as the hierarchical model, the only difference
being that a record can have more than one parent. It replaces the hierarchical tree with a
graph. Example: in the example below, we can see that the node Student has two parents, i.e.
CSE Department and Library. This was not possible in the hierarchical model.
Features of a Network Model
1. Ability to Merge more Relationships: In this model, as there are more relationships,
the data is more related. The model can manage one-to-one relationships
as well as many-to-many relationships.
2. Many paths: As there are more relationships, there can be more than one path to
the same record. This makes data access fast and simple.
3. Circular Linked List: Operations on the network model are done with the help
of circular linked lists. The current position is maintained by a
program, and this position navigates through the records according to the
relationships.
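A rough Python sketch of the graph structure, using the Student example (the `parents_of` mapping and `paths_to_root` helper are illustrative only):

```python
# The network model as a directed graph: a record may have several parents.
parents_of = {
    "Student": ["CSE Department", "Library"],
    "CSE Department": ["College"],
    "Library": ["College"],
    "College": [],                    # root record
}

def paths_to_root(record):
    """Enumerate every parent chain from a record up to a root."""
    parents = parents_of[record]
    if not parents:
        return [[record]]
    return [[record] + chain
            for parent in parents
            for chain in paths_to_root(parent)]

# Unlike a hierarchy, there is more than one path to the same record:
assert len(paths_to_root("Student")) == 2
```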

Advantages of Network Model


 Data can be accessed faster than in the hierarchical model, because the data
is more related in the network model and there can be more than
one path to a particular node. So the data can be accessed in many ways.
 As there is a parent-child relationship, data integrity is present. Any change in
a parent record is reflected in the child records.

Disadvantages of Network Model


 As more and more relationships need to be handled, the system may get complex,
so a user must have detailed knowledge of the model to work with it.
 Any change, like an update, deletion or insertion, is very complex.

Entity-Relationship Model
The Entity-Relationship Model, or simply ER Model, is a high-level data model. In this
model, we represent the real-world problem in pictorial form to make it easy for the
stakeholders to understand. It is also very easy for developers to understand the system
just by looking at the ER diagram. We use the ER diagram as a visual tool to represent an
ER Model. An ER diagram has the following three components:

 Entities: An entity is a real-world thing. It can be a person, a place, or even a
concept. Example: Teachers, Students, Course, Building, Department, etc. are some
of the entities of a School Management System.
 Attributes: An attribute is a real-world property of an entity; it describes a
characteristic of that entity. Example: the entity Teacher has properties like
teacher id, salary, age, etc.
 Relationship: A relationship tells how two entities are related. Example: a Teacher
works for a Department.

Example:

In the above diagram, the entities are Teacher and Department. The attributes
of the Teacher entity are Teacher_Name, Teacher_id, Age, Salary and Mobile_Number. The
attributes of the Department entity are Dept_id and Dept_name. The two entities are
connected by a relationship: here, each teacher works for a department.
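As a sketch of how such a diagram maps to storage, the Teacher and Department entities can be expressed as two tables, with the 'works for' relationship becoming a foreign key (this is one common translation; the sample row is made up):

```python
import sqlite3

# The Teacher–Department ER diagram mapped to tables; column names
# follow the diagram, the data is illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Department (
        Dept_id   INTEGER PRIMARY KEY,
        Dept_name TEXT
    );
    CREATE TABLE Teacher (
        Teacher_id    INTEGER PRIMARY KEY,
        Teacher_Name  TEXT,
        Age           INTEGER,
        Salary        REAL,
        Mobile_Number TEXT,
        Dept_id       INTEGER REFERENCES Department(Dept_id)  -- 'works for'
    );
""")
con.execute("INSERT INTO Department VALUES (1, 'Computer Science')")
con.execute("INSERT INTO Teacher VALUES (10, 'Asha', 35, 50000, '9800000000', 1)")

# The relationship 'teacher works for a department' becomes a join:
row = con.execute("""
    SELECT t.Teacher_Name, d.Dept_name
    FROM Teacher t JOIN Department d ON t.Dept_id = d.Dept_id
""").fetchone()
assert row == ('Asha', 'Computer Science')
```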

Features of ER Model
 Graphical Representation for Better Understanding: It is very easy and simple to
understand so it can be used by the developers to communicate with the
stakeholders.
 ER Diagram: ER diagram is used as a visual tool for representing the model.
 Database Design: This model helps the database designers to build the database and
is widely used in database design.

Advantages of ER Model
 Simple: Conceptually ER Model is very easy to build. If we know the relationship
between the attributes and the entities we can easily build the ER Diagram for the
model.
 Effective Communication Tool: This model is widely used by database
designers for communicating their ideas.
 Easy Conversion to any Model: This model maps well to the relational model and
can be easily converted to it by turning the ER model into tables.
It can also be converted to other models, like the network model,
the hierarchical model, etc.

Disadvantages of ER Model
 No industry standard for notation: There is no industry standard for developing an
ER model. So one developer might use notations which are not understood by other
developers.
 Hidden information: Some information might be lost or hidden in the ER model.
As it is a high-level view, there is a chance that some details of the
information remain hidden.

Relational Model
The Relational Model is the most widely used model. In this model, the data is maintained
in the form of two-dimensional tables. All the information is stored in rows and
columns. The basic structure of the relational model is the table, so tables are also
called relations in the relational model. Example: in this example, we have an Employee
table.

Features of Relational Model


 Tuples: Each row in the table is called a tuple. A row contains all the information
about one instance of the entity. In the above example, each row has all the
information about one specific individual, e.g. the first row has the information
about John.
 Attribute or field: Attributes are the properties that define the table or relation.
The values of an attribute must all come from the same domain. In the above example,
we have different attributes of the employee, like Salary, Mobile_no, etc.
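A small sqlite3 sketch of the Employee relation (column names and rows are illustrative): each fetched row is literally a tuple, and the cursor description lists the attributes:

```python
import sqlite3

# Sketch of the Employee relation: rows come back as tuples, and the
# column names are the relation's attributes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Name TEXT, Salary INTEGER, Mobile_no TEXT)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [("John", 40000, "111-2222"), ("Mary", 45000, "333-4444")])

cur = con.execute("SELECT Name, Salary, Mobile_no FROM Employee")
attributes = [d[0] for d in cur.description]   # the relation's attributes
tuples = cur.fetchall()                        # each row is one tuple

assert attributes == ["Name", "Salary", "Mobile_no"]
assert tuples[0] == ("John", 40000, "111-2222")
```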

Advantages of Relational Model


 Simple: This model is simpler than the network and hierarchical models.
 Scalable: This model can be easily scaled, as we can add as many rows and columns
as we want.
 Structural Independence: We can make changes to the database structure without
changing the way the data is accessed. When we can change the database
structure without affecting the DBMS's ability to access the data, we can say
that structural independence has been achieved.

Disadvantages of Relational Model


 Hardware Overheads: To hide the complexities and make things easier for the
user, this model requires more powerful hardware and data storage
devices.
 Bad Design: The relational model is very easy to design and use, and users
don't need to know how the data is stored in order to access it. This ease of design
can lead to a poorly designed database, which will slow down as it grows.
But these disadvantages are minor compared to the advantages of the relational
model, and the problems can be avoided with proper implementation and
organisation.

Object-Oriented Data Model


Real-world problems are represented more closely by the object-oriented data
model. In this model, both the data and the relationships are present in a single structure
known as an object. We can store audio, video, images, etc. in the database, which was not
possible in the relational model (although you can store audio and video in a relational
database, it is advised not to). In this model, two or more objects are
connected through links. We use these links to relate one object to another. This can be
understood from the example given below.
In the above example, we have two objects Employee and Department. All the data and
relationships of each object are contained as a single unit. The attributes like Name,
Job_title of the employee and the methods which will be performed by that object are
stored as a single object. The two objects are connected through a common attribute i.e the
Department_id and the communication between these two will be done with the help of this
common id.
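A minimal Python sketch of the Employee and Department objects described above (class and attribute names are illustrative): data and methods live together in each object, and the link between objects is an object reference:

```python
# In the object-oriented model, data and behaviour live together in
# objects, and objects are connected through links.
class Department:
    def __init__(self, department_id, name):
        self.department_id = department_id
        self.name = name

class Employee:
    def __init__(self, name, job_title, department):
        self.name = name
        self.job_title = job_title
        self.department = department      # the link between the two objects

    def describe(self):                   # a method stored with the data
        return f"{self.name} ({self.job_title}) works in {self.department.name}"

cse = Department(department_id=1, name="CSE")
emp = Employee("Ravi", "Lecturer", cse)
assert emp.describe() == "Ravi (Lecturer) works in CSE"
```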

Object-Relational Model
As the name suggests, it is a combination of the relational model and the object-oriented
model, built to fill the gap between the two. It offers many advanced features; for
example, we can build complex data types according to our requirements from the
existing data types. The problem with this model is that it can get complex and difficult
to handle, so a proper understanding of it is required.

Flat Data Model


It is a simple model in which the database is represented as a table consisting of rows and
columns. To access any data, the computer has to read the entire table. This makes the
model slow and inefficient.

Semi-Structured Model
The semi-structured model is an evolved form of the relational model. We cannot
differentiate between data and schema in this model. Example: web-based data sources,
where we can't distinguish between the schema and the data of the website. In this model,
some entities may have missing attributes while others may have extra attributes. This
model gives flexibility in storing the data, and it also gives flexibility to the attributes.
Example: a value stored in an attribute can be either an atomic value or a collection
of values.
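A sketch of semi-structured records as Python dicts (the records and the `emails` helper are made up): entities need not share a schema, and a value may be atomic or a collection:

```python
# Semi-structured records: some entities miss attributes, others carry
# extras, and a value may be atomic or a collection.
records = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob"},                                     # missing attribute
    {"name": "Carol",
     "email": ["c@x.com", "c@y.com"],                    # collection value
     "nickname": "Caz"},                                 # extra attribute
]

def emails(record):
    """Normalise the 'email' attribute to a list, whatever its shape."""
    value = record.get("email")
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

assert emails(records[0]) == ["alice@example.com"]
assert emails(records[1]) == []
assert emails(records[2]) == ["c@x.com", "c@y.com"]
```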

Associative Data Model


The Associative Data Model is a model in which the data is divided into two parts.
Everything that has an independent existence is called an entity, and the relationships
among these entities are called associations. The data is divided into two parts, called
items and links.

 Item: Items contain a name and an identifier (some numeric value).
 Links: Links contain an identifier, a source, a verb and a target.
Example: Let us say we have the statement "The world cup is being hosted by London from
30 May 2020". For this data, two links need to be stored:

1. The world cup is being hosted by London. The source here is 'the world cup', the
verb is 'is being hosted by' and the target is 'London'.
2. ...from 30 May 2020. The source here is the previous link, the verb is 'from' and the
target is '30 May 2020'.
This is represented using a table as follows:
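The items and links for this example can be sketched in Python (the identifiers and the `render` helper are invented for illustration; note that the second link's source is itself a link):

```python
# Associative model sketch: items (name + identifier) and links
# (identifier, source, verb, target). A source or target tagged "item"
# refers to an item; one tagged "link" refers to another link.
items = {
    1: "the world cup",
    2: "London",
    3: "30 May 2020",
}
links = {
    100: (("item", 1), "is being hosted by", ("item", 2)),
    101: (("link", 100), "from", ("item", 3)),   # source is the first link
}

def render(kind, ident):
    """Expand an item or link back into the original statement."""
    if kind == "item":
        return items[ident]
    source, verb, target = links[ident]
    return f"{render(*source)} {verb} {render(*target)}"

assert render("link", 101) == \
    "the world cup is being hosted by London from 30 May 2020"
```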

Context Data Model


The Context Data Model is a collection of several models, consisting of models like the
network model, the relational model, etc. Using this model, we can do various tasks that
are not possible using any one model alone.
Database Management System

Dr. S.P.Khandait

1
Introduction to DBMS
• Purpose of Database Systems
• View of Data
• Data Models
• Data Definition Language
• Data Manipulation Language
• Transaction Management
• Storage Management
• Database Administrator
• Database Users
• Overall System Structure

2
Database Management System
(DBMS)

• Collection of interrelated data
• Set of programs to access the data
• A DBMS contains information about a
particular enterprise
• A DBMS provides an environment that is both
convenient and efficient to use

3
Purpose of Database Systems
Database management systems were developed to
handle the following difficulties of typical file-
processing systems supported by conventional
operating systems:
• Data redundancy and inconsistency
• Difficulty in accessing data
• Data isolation – multiple files and formats
• Integrity problems
• Atomicity of updates
• Concurrent access by multiple users
• Security problems

4
Levels of Abstraction
• Physical level: describes how a record (e.g.
customer) is stored.
• Logical level: describes what data is stored in the
database, and the relationships among the data.
type customer = record
name: string;
street: string;
city: string;
end;
• View level: application programs hide details of
data types. Views can also hide information (e.g.
salary) for security purposes.
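The view level can be illustrated with a short sqlite3 sketch (the table, columns and view name are made up): a view exposes only the non-sensitive columns of the stored record:

```python
import sqlite3

# A view hides the salary column from users who should not see it.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE customer (name TEXT, street TEXT, city TEXT, salary INTEGER)")
con.execute("INSERT INTO customer VALUES ('Jones', 'Main', 'Harrison', 90000)")

# The view exposes only the non-sensitive attributes:
con.execute(
    "CREATE VIEW customer_public AS SELECT name, street, city FROM customer")

row = con.execute("SELECT * FROM customer_public").fetchone()
assert row == ('Jones', 'Main', 'Harrison')   # no salary column visible
```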

5
Objectives of Three-Level Architecture

• All users should be able to access same data.

• A change in a user’s view should not affect


other users’ views.

• Users should not need to know physical


database storage details.

6
Objectives of Three-Level Architecture

• DBA should be able to change database storage


structures without affecting the users’ views.

• Internal structure of database should be


unaffected by changes to physical aspects of
storage.

• DBA should be able to change conceptual


structure of database without affecting all users.

7
View of Data
An architecture for a database system
View level: View 1, View 2, …, View n
Logical level
Physical level

8
ANSI-SPARC Three-Level Architecture

9
ANSI-SPARC Three-Level Architecture

• External Level
– Users’ view of the database.
– Describes the part of the database that is relevant
to a particular user.
– The way perceived by end users.
• Conceptual Level
– Community view of the database.
– Describes what data is stored in database and
relationships among the data.
– The way perceived by the DBA &
programmers.
10
ANSI-SPARC Three-Level Architecture

Internal Level
– Physical representation of the database on the
computer.
– Describes how the data is stored in the
database.
– The way perceived by the DBMS & OS.

11
Differences between Three Levels

12
Instances and Schemas
• Similar to types and variables in programming
languages
• Schema – the logical structure of the database
(e.g., set of customers and accounts and the
relationship between them)
• Instance – the actual content of the database at
a particular point in time
 In RDBMS context: Schema – table names,
attribute names with their data types for each
table and constraints etc.
13
Schemas versus Instances
• Database Schema: The description of the database. It rarely changes.
– Includes descriptions of the database structure, data types, and the
constraints on the database.

• Database Instance (snapshot): The actual data stored in a database


at a particular moment in time. Changes rapidly.

• The concepts of Schema & Instance correspond to Types &
Values in programming languages, respectively.
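A quick sqlite3 illustration of the distinction (the table and data are made up): the schema is the fixed structure, while the instance is whatever rows exist at that moment:

```python
import sqlite3

# Schema vs. instance: structure rarely changes, content changes rapidly.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (name TEXT, city TEXT)")       # schema

schema = [r[1] for r in con.execute("PRAGMA table_info(customer)")]
assert schema == ["name", "city"]             # the fixed structure

con.execute("INSERT INTO customer VALUES ('Smith', 'Rye')")       # instance grows
instance = con.execute("SELECT * FROM customer").fetchall()
assert instance == [("Smith", "Rye")]         # the content right now
```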

14
Schemas, Mappings, and Instances
• Mapping is the process of transforming requests and results between
the Internal, Conceptual & External levels.

• Programs refer to an external schema, and are mapped by the


DBMS to the internal schema for execution.
• Data extracted from the internal DBMS level is reformatted to
match the user’s external view.
• Two types of mapping:
– External / Conceptual mapping.
– Conceptual / Internal mapping.

15
Example
[Figure: an example schema and a corresponding instance]
16
Data Independence
• Ability to modify a schema definition in one
level without affecting a schema definition in
the other levels.
• The interfaces between the various levels and
components should be well defined so that
changes in some parts do not seriously
influence others.
• Two levels of data independence
– Physical data independence
– Logical data independence

17
Data Independence

• Logical Data Independence


– Refers to immunity of external schemas to
changes in conceptual schema.
– Conceptual schema changes (e.g.
addition/removal of entities).
– Should not require changes to external schema
or rewrites of application programs.

18
Data Independence

• Physical Data Independence


– Refers to immunity of conceptual schema to
changes in the internal schema.
– Internal schema changes (e.g. using different file
organizations, storage structures/devices).
– Should not require change to conceptual or
external schemas.

19
Data Independence and the ANSI-SPARC
Three-Level Architecture

20
Database Languages

• Data Manipulation Language (DML)


– Provides basic data manipulation operations on data held in the
database.
– DML is a language for retrieving and updating (insert, delete, &
modify) the data in the DB.
– Types of DML:
• Procedural Language (3GL): the user specifies what data is required
and how to get it (allows the user to tell the system exactly how to
manipulate the data). Ex: Java

• Nonprocedural Language (4GL): the user specifies what data is
required without specifying how to get it (allows the user to
state what data is needed rather than how it is to be retrieved).
Ex: SQL

21
Data Definition Language (DDL)
• Specification notation for defining the database
schema
• DDL compiler generates a set of tables stored in a
data dictionary
• Data dictionary contains metadata (data about data)
• Data storage and definition language – special type
of DDL in which the storage structure and access
methods used by the database system are specified
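sqlite3 makes the data-dictionary idea concrete: after a DDL statement runs, the generated metadata can be queried from the built-in `sqlite_master` catalog (the table being created is made up):

```python
import sqlite3

# A DDL statement defines the schema; the DBMS records the resulting
# metadata (data about data) in its data dictionary. SQLite keeps this
# in the built-in sqlite_master catalog table.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")

# Metadata about the new table is now queryable like ordinary data:
name, = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchone()
assert name == "account"
```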

22
Data Manipulation Language (DML)

• Language for accessing and manipulating the


data organized by the appropriate data model
• Two classes of languages
– Procedural – user specifies what data is required
and how to get those data
– Nonprocedural – user specifies what data is
required without specifying how to get those data

23
Database Languages

• Both DDL and DML are usually not considered


distinct languages. Rather, they are included in a
comprehensive integrated language.

• For example, SQL relational database language


is a comprehensive DB language which represents
a combination of DDL and DML.

24
Database Languages
• DBMS have a facility for embedding DDL & DML
(sub-languages) in a High-Level Language
(COBOL, C, C++ or Java), which in this case is
considered a host language

[Figure: an application program written in a host language (C, C++, Lisp, …),
with its local variables in memory, makes calls to the DBMS]
25
Data Model
• Integrated collection of concepts for describing
data, relationships between data, and constraints
on the data in an organization.
– To represent data in an understandable way.

• Data Model comprises:


– a structural part;
– a manipulative part;
– possibly a set of integrity rules.

26
Data Models
• A collection of tools for describing:
– Data
– Data relationships
– Data semantics
– Data constraints
• Object-based logical models
– Entity-relationship model
– Object-oriented model
– Semantic model
– Functional model
• Record-based logical models
– Relational model (e.g., SQL/DS, DB2)
– Network model
– Hierarchical model (e.g., IMS)

27
Categories of Data Models

Conceptual data models (Object-based): describe an enterprise's
information independently of implementation details.
• Also called entity-based or object-based data models.

Logical data models (Record-based): give a logical description of
an enterprise's information with a high-level description of the
implementation.
• Also called record-based data models.

Physical data models: describe how the data is physically stored
in the computer.
28
Categories of Data Models

Conceptual model: hardware independent, software independent
Logical model:    hardware independent, software dependent
Physical model:   hardware dependent,  software dependent

29
Data Models
• Conceptual data models (Object-based):
– Entity-Relationship
– Semantic
– Functional
– Object-Oriented
• Logical data models (Record_based):
– Relational Data Model
– Network Data Model
– Hierarchical Data Model
• Physical Data Models
- Unifying &
- frame based memory models
30
E/R (Entity/Relationship) Model
 A conceptual level data model.
 Provides the concepts of entities, relationships and attributes.
The University Database Context
Entities: student, faculty member, course, department etc.
Relationships: enrollment relationship between student & course,
employment relationship between faculty member & department etc.
Attributes: name, rollNumber, address etc. of the student entity;
name, empNumber, phoneNumber etc. of the faculty entity.
Banking database: with customer and Account entities

31
Entity-Relationship Model
Example of the entity-relationship model:

[ER diagram: entity customer (social-security, customer-name,
customer-street, customer-city) linked through the depositor
relationship to entity account (account-number, balance)]

32
Representational Level Data Model
Relational Model: Provides the concept of a relation.
In the context of the university database, the relation name is
student, the attribute names head the columns, and each row of
data is a tuple:

student
SName  | RollNumber | JoiningYear | BirthDate | Program | Dept
Sriram | IT04B123   | 2004        | 15Aug1982 | BE      | IT
…      | …          | …           | …         | …       | …

Relation scheme: the attribute names of the relation.
Relation data/instance: the set of data tuples.
More details will be given in the Relational Data Model module.
33
Record-Based Data Models
Relational Data Model

34
Relational Model
Example of tabular data in the relational model:
name ssn street city account-number
Johnson 192-83-7465 Alma Palo Alto A-101
Smith 019-28-3746 North Rye A-215
Johnson 192-83-7465 Alma Palo Alto A-201
Jones 321-12-3123 Main Harrison A-217
Smith 019-28-3746 North Rye A-201

account-number balance
A-101 500
A-201 900
A-215 700
A-217 750

35
Record-Based Data Models
Network Data Model

36
Record-Based Data Models
Hierarchical Data Model

37
Functions of a DBMS

• Data Storage, Retrieval, and Update.

• A User-Accessible Catalog.

• Transaction Support.

• Concurrency Control Services.

• Recovery Services.

38
Functions of a DBMS

• Authorization Services.

• Support for Data Communication.

• Integrity Services.

• Services to Promote Data Independence.

• Utility Services.

39
Architecture of DBMS - Overall System Structure

[Figure: disk storage holds the data files, data dictionary, indices
and statistical data]

40


Architecture of an RDBMS system
[Figure: application programs (with GUI/parameter values) go through the
application program compiler; ad-hoc queries (analyst) go through the
query compiler and query optimizer; DDL and control commands (DBA) go
through the DDL and other command processor. Inside the RDBMS, the
run-time system works with the transaction manager, buffer manager and
recovery manager; disk storage holds the system data, metadata and the log]
Database Users
• Users are differentiated by the way they
expect to interact with the system.
• Application programmers: interact with system
through DML calls.
• Specialized users: write specialized database
applications that do not fit into the traditional
data processing framework
• Sophisticated users: form requests in a database
query language.
• Naive users: invoke one of the permanent
application programs that have been written
previously

42
Architecture Details (1/3)
Disk Storage:
Meta-data – schema
- table definition, view definitions, mappings
Data – relation instances, index structures
statistics about data
Log – record of database update operations essential for failure
recovery

DDL and other command processor:
Commands for relation scheme creation, constraint setting
Commands for handling authorization and data access control

43
Architecture Details (2/3)
Query compiler
SQL adhoc queries
Compiles
update / delete commands
Query optimizers
Selects a near optimal plan for executing a query
- relation properties and index structures are utilized

Application Program Compiler


Preprocess to separate embedded SQL commands
Use host language compiler to compile rest of the program
Integrate the compiled program with the libraries for
SQL commands supplied by RDBMS

44
Architecture Details (3/3)
RDBMS Run Time System:
Executes Compiled queries, Compiled application programs
Interacts with Transaction Manager, Buffer Manager
Transaction Manager:
Keeps track of start, end of each transaction
Enforces concurrency control protocols
Buffer Manager: Manages disk space and implements the paging mechanism
Recovery Manager:
Takes control at restart after a failure
Brings the system back to a consistent state before operation resumes

45
Roles for people in an Info System Management (1/2)

Naive users / Data entry operators


•Use the GUI provided by an application program
•Feed-in the data and invoke an operation
- e.g., person at the train reservation counter, person at library
issue / return counter
•No deep knowledge of the IS required

Application Programmers
•Embed SQL in a high-level language and develop programs to
handle functional requirements of an IS
•Should thoroughly understand the logical schema or relevant
views
•Meticulous testing of programs - necessary

46
Roles for people in an Info System management (2/2)
Sophisticated user / data analyst:
Uses SQL to generate answers for complex queries

DBA (Database Administrator):
Designing the logical schema
Creating the structure of the entire database
Monitoring usage and creating necessary index structures to
speed up query execution
Granting / revoking data access permissions to other users
etc.

47
Transaction Management
• A transaction is a collection of operations that
performs a single logical function in a database
application.
• Transaction-management component ensures that
the database remains in a consistent (correct) state
despite system failures (e.g. power failures and
operating system crashes) and transaction failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.
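A minimal sqlite3 sketch of this guarantee (the accounts and amounts are made up): a simulated failure between the debit and the credit is rolled back, so the database never shows a half-done transfer:

```python
import sqlite3

# Atomicity: a fund transfer either fully happens or is undone.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (no TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
con.commit()

try:
    con.execute("UPDATE account SET balance = balance - 30 WHERE no = 'A'")
    raise RuntimeError("simulated crash before the credit step")
    con.execute("UPDATE account SET balance = balance + 30 WHERE no = 'B'")
except RuntimeError:
    con.rollback()                      # undo the partial transfer

balances = dict(con.execute("SELECT no, balance FROM account"))
assert balances == {"A": 100, "B": 50}  # consistent: nothing half-done
```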

48
Storage Management
• A storage manager is a program module that
provides the interface between the low-level data
stored in the database and the application
programs and queries submitted to the system.
• The storage manager is responsible for the
following tasks:
– Interaction with the file manager
– Efficient storing, retrieving, and updating of data

49
Database Administrator
• Coordinates all the activities of the database system; the
database administrator has a good understanding of the
enterprise’s information resources and needs:
• Database administrator’s duties include:
– Schema definition
– Storage structure and access method definition
– Schema and physical organization modification
– Granting user authority to access the database
– Specifying integrity constraints
– Acting as liaison with users
– Monitoring performance and responding to changes in
requirements

50
Purpose of Database Systems
A Database Management System (DBMS) is defined as a software system that
allows the user to define, create and maintain the database and provides
controlled access to the data.
It is a collection of programs used for managing data and simultaneously it supports
different types of users to create, manage, retrieve, update and store information.
Purpose
The purpose of a DBMS is to transform −
 Data into information.
 Information into knowledge.
 Knowledge into action.
The diagram given below explains the process as to how the transformation of data
to information to knowledge to action happens respectively in the DBMS −

Previously, the database applications were built directly on top of the file system.
Drawbacks in File System
There are many drawbacks to using the file system. These are mentioned below
 Data redundancy and inconsistency: Different file formats, duplication of
information in different files.
 Difficulty in accessing data: To carry out new task we need to write a new
program.
 Data Isolation − Different files and formats.
 Integrity problems.
 Atomicity of updates − Failures leave the database in an inconsistent state.
For example, the fund transfer from one account to another may be
incomplete.
 Concurrent access by multiple users.
 Security problems.
Database systems offer solutions to all these problems.
Uses of DBMS
The main uses of DBMS are as follows −
 Data independence and efficient access of data.
 Application Development time reduces.
 Security and data integrity.
 Uniform data administration.
 Concurrent access and recovery from crashes.
Applications of DBMS
The different applications of DBMS are as follows −
 Railway Reservation System − It is used to keep records of ticket
bookings, train departures and arrival status, and to give updates to
passengers with the help of a database.
 Library Management System − There are a great many books in a
library, and it is very hard to keep a record of all of them in a register or a
copy. So a DBMS is necessary to keep track of all the book records, issue
dates, book names and authors, and to maintain the records.
 Banking − We carry out a lot of transactions daily without going to the
bank. This is possible only because databases manage all the customer
data.
 Educational Institutions − All the examinations and the data related to
students are maintained over the internet with the help of a database
management system. It contains students' registration details, results,
grades and available courses. All this work can be done online without
visiting the institution.
 Social Media Websites − By filling in the required details, we are able to
access social media platforms. Many users sign up daily for social websites
such as Facebook, Pinterest and Instagram. All the information related to
users is stored and maintained with the help of a DBMS.

Database systems arose in response to early methods of computerized


management of commercial data. As an example of such methods, typical of
the 1960s, consider part of a university organization that, among other data,
keeps information about all instructors, students, departments, and course
offerings. One way to keep the information on a computer is to store it in
operating system files. To allow users to manipulate the information, the
system has a number of application programs that manipulate the files,
including programs to:

1. Add new students, instructors, and courses

2. Register students for courses and generate class rosters

3. Assign grades to students, compute grade point averages (GPA), and


generate transcripts

System programmers wrote these application programs to meet the needs of


the university. New application programs are added to the system as the need
arises. For example, suppose that a university decides to create a new major
(say, computer science). As a result, the university creates a new department
and creates new permanent files (or adds information to existing files) to
record information about all the instructors in the department, students in that
major, course offerings, degree requirements, etc. The university may have to
write new application programs to deal with rules specific to the new major.
New application programs may also have to be written to handle new rules in
the university. Thus, as time goes by, the system acquires more files and more
application programs. This typical file-processing system is supported by a
conventional operating system. The system stores permanent records in
various files, and it needs different application programs to extract records
from, and add records to, the appropriate files. Before database management
systems (DBMSs) were introduced, organizations usually stored information in
such systems.

Keeping organizational information in a file-processing system has a number
of major disadvantages:

• Data redundancy and inconsistency: Since different programmers create
the files and application programs over a long period, the various files are
likely to have different structures and the programs may be written in several
programming languages. Moreover, the same information may be duplicated
in several places (files). For example, if a student has a double major (say,
music and mathematics) the address and telephone number of that student
may appear in a file that consists of student records of students in the Music
department and in a file that consists of student records of students in the
Mathematics department. This redundancy leads to higher storage and access
cost. In addition, it may lead to data inconsistency; that is, the various copies
of the same data may no longer agree. For example, a changed student
address may be reflected in the Music department records but not elsewhere
in the system.

• Difficulty in accessing data: Suppose that one of the university clerks
needs to find out the names of all students who live within a particular
postal-code area. The clerk asks the data-processing department to generate
such a list. Because the designers of the original system did not anticipate this
request, there is no application program on hand to meet it. There is, however,
an application program to generate the list of all students. The university clerk
now has two choices: either obtain the list of all students and extract the
needed information manually, or ask a programmer to write the necessary
application program. Both alternatives are obviously unsatisfactory. Suppose
that such a program is written, and that, several days later, the same clerk
needs to trim that list to include only those students who have taken at least
60 credit hours. As expected, a program to generate such a list does not
exist. Again, the clerk has the preceding two options, neither of which is
satisfactory. The point here is that conventional file-processing environments
do not allow needed data to be retrieved in a convenient and efficient manner.
More responsive data-retrieval systems are required for general use.

• Data isolation: Because data are scattered in various files, and files may
be in different formats, writing new application programs to retrieve the
appropriate data is difficult.

• Integrity problems: The data values stored in the database must satisfy
certain types of consistency constraints. Suppose the university maintains
an account for each department, and records the balance amount in each
account. Suppose also that the university requires that the account balance of
a department may never fall below zero. Developers enforce these constraints
in the system by adding appropriate code in the various application programs.
However, when new constraints are added, it is difficult to change
the programs to enforce them. The problem is compounded when constraints
involve several data items from different files.

• Atomicity problems: A computer system, like any other device, is subject
to failure. In many applications, it is crucial that, if a failure occurs, the data
be restored to the consistent state that existed prior to the failure. Consider
a program to transfer $500 from the account balance of department A to
the account balance of department B. If a system failure occurs during the
execution of the program, it is possible that the $500 was removed from the
balance of department A but was not credited to the balance of department B,
resulting in an inconsistent database state. Clearly, it is essential to database
consistency that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must be atomic—it must happen in its entirety or
not at all. It is difficult to ensure atomicity in a conventional file-processing
system.
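The funds-transfer scenario can be made atomic with database transactions. Below is a minimal sketch using Python's sqlite3 module (the table, department names, and amounts are invented for illustration): either both updates commit, or a failure rolls both back.

```python
import sqlite3

# Toy schema: one balance per department (names and values are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (dept TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 1000), ("B", 1000)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst atomically: both happen, or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE dept = ?",
                     (amount, src))
        if dst == "CRASH":                 # simulate a failure mid-transfer
            raise RuntimeError("system failure between debit and credit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE dept = ?",
                     (amount, dst))
        conn.commit()                      # both updates become durable together
    except Exception:
        conn.rollback()                    # undo the partial debit

transfer(conn, "A", "B", 500)       # succeeds: A = 500, B = 1500
transfer(conn, "A", "CRASH", 500)   # fails mid-way: rolled back, nothing changes
balances = dict(conn.execute("SELECT dept, balance FROM account"))
print(balances)   # {'A': 500, 'B': 1500}
```

A plain file-processing system provides no such rollback facility, which is exactly why atomicity is hard to ensure there.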

• Concurrent-access anomalies: For the sake of overall performance of the
system and faster response, many systems allow multiple users to update the
data simultaneously. Indeed, today, the largest Internet retailers may have
millions of accesses per day to their data by shoppers. In such an
environment, interaction of concurrent updates is possible and may result in
inconsistent data. Consider department A, with an account balance of
$10,000. If two department clerks debit the account balance (by say $500 and
$100, respectively) of department A at almost exactly the same time, the result
of the concurrent executions may leave the budget in an incorrect (or
inconsistent) state. Suppose that the programs executing on behalf of each
withdrawal read the old balance, reduce that value by the amount being
withdrawn, and write the result back. If the two programs run concurrently, they
may both read the value $10,000, and write back $9500 and $9900,
respectively. Depending on which one writes the value last, the account
balance of department A may contain either $9500 or $9900, rather than the
correct value of $9400. To guard against this possibility, the system must
maintain some form of supervision. But supervision is difficult to provide
because data may be accessed by many different application programs that
have not been coordinated previously. As another example, suppose a
registration program maintains a count of students registered for a course, in
order to enforce limits on the number of students registered. When a student
registers, the program reads the current count for the courses, verifies that the
count is not already at the limit, adds one to the count, and stores the count
back in the database. Suppose two students register concurrently, with the
count at (say) 39. The two program executions may both read the value 39,
and both would then write back 40, leading to an incorrect increase of only 1,
even though two students successfully registered for the course and the count
should be 41. Furthermore, suppose the course registration limit was 40; in the
above case both students would be able to register, leading to a violation of
the limit of 40 students.
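The lost-update interleaving described above can be reproduced in a few lines, along with the standard fix of serializing each read-modify-write (a toy sketch; variable names are invented):

```python
import threading

# Unsynchronized read-modify-write: both reads happen before either write.
balance = 10000
read_a = balance              # clerk A reads $10,000
read_b = balance              # clerk B reads $10,000 before A writes back
balance = read_a - 500        # A writes back $9,500
balance = read_b - 100        # B writes back $9,900; A's debit is lost
lost_update_result = balance  # 9900 instead of the correct 9400

# Serializing each read-modify-write with a lock prevents the interleaving.
balance = 10000
lock = threading.Lock()

def debit(amount):
    global balance
    with lock:                # the whole read-modify-write is one critical section
        balance = balance - amount

threads = [threading.Thread(target=debit, args=(a,)) for a in (500, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(lost_update_result, balance)   # 9900 9400
```

A DBMS supplies this supervision automatically through its concurrency-control mechanisms (for example, locking), so application programs need not coordinate with each other manually.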
• Security problems: Not every user of the database system should be able
to access all the data. For example, in a university, payroll personnel need
to see only that part of the database that has financial information. They do
not need access to information about academic records. But, since application
programs are added to the file-processing system in an ad hoc manner,
enforcing such security constraints is difficult.

These difficulties, among others, prompted the development of database
systems.
S.P.Khandait
• Basic E-R model is good for many uses
• Several extensions to E-R model for more
advanced modeling
– Generalization and specialization
– Aggregation
• These extensions can also be converted
to relational model
– Introduce a few more design choices
• An entity-set might contain distinct subgroups of
entities
– Subgroups have some different attributes, not shared
by entire entity-set
• E-R model provides specialization to represent
such entity-sets
• Example: bank account categories
– Checking accounts
– Savings accounts
– Have common features, but also unique attributes
• Generalization: a “bottom up” approach
– Taking similar entity-sets and unifying their common
features
– Start with specific entities, then create generalizations
from them
• Specialization: a “top down” approach
– Creating general purpose entity-sets, then providing
specializations of the general idea
– Start with general notion, then refine it
• Terms are basically equivalent
– Book refers to generalization as overarching concept
• Checking and savings accounts have:
– account number
– balance
– owner(s)
• Checking accounts also have:
– overdraft limit and associated account
– check transactions
• Savings accounts also have:
– minimum balance
• Create entity-set to represent common attributes
– Called the superclass, or higher-level entity-set
• Create entity-sets to represent specializations
– Called subclasses, or lower-level entity-sets
• Join superclass to subclasses using “ISA”
triangle
[E-R diagram: account (acct_id, balance) connected by an ISA triangle to
checking (overdraft_limit) and savings (min_balance)]
• Attributes of higher-level entity-sets are inherited
by lower-level entity-sets
• Relationships involving higher-level entity-sets
are also inherited by lower-level entity-sets!
– A lower-level entity-set can participate in its own
relationship-sets, too
• Usually, entity-sets inherit from one superclass
– Entity-sets form a hierarchy
• Can also inherit from multiple superclasses
– Entity-sets form a lattice
– Introduces many subtle issues, of course
[E-R diagram: account (acct_id, balance) connected by an ISA triangle to
checking (overdraft_limit) and savings (min_balance)]

• Can an account be both a savings account and a checking account?
• Can an account be neither a savings account nor a checking account?
• Can specify constraints on specialization
– Enforce what “makes sense” for the enterprise
• “An account must be either a checking account,
or a savings account, but not both.”
• An entity may belong to only one of the lower-
level entity-sets
– Must be a member of checking, or a member of
savings, but not both!
– Called a “disjointness constraint”
– A better way to state it: a disjoint specialization
• If an entity can be a member of multiple lower-
level entity-sets:
– Called an overlapping specialization
• Default constraint is overlapping!
• Indicate disjoint specialization with word
“disjoint” next to triangle
• Updated bank account diagram:
[E-R diagram: account (acct_id, balance) connected by an ISA triangle labeled
"disjoint" to checking (overdraft_limit) and savings (min_balance)]
• “An account must be a checking account or a
savings account.”
• Every entity in higher-level entity-set must also
be a member of at least one lower-level entity-
set
– Called total specialization
• If entities in higher-level entity-set aren’t required
to be members of lower-level entity-sets:
– Called partial specialization
• account specialization is a total specialization
• Default constraint is partial specialization
• Specify total specialization constraint with
a double line on superclass side
• Updated bank account diagram:
[E-R diagram: account (acct_id, balance) joined by a double line to an ISA
triangle labeled "disjoint", specializing into checking (overdraft_limit) and
savings (min_balance)]
• Our bank schema so far:
[E-R diagram: account (acct_id, balance) joined by a double line to an ISA
triangle labeled "disjoint", specializing into checking (overdraft_limit) and
savings (min_balance)]

• How to tell whether an account is a checking account or a savings account?
– No attribute indicates the type of account
• Membership constraints specify which entities
are members of lower-level entity-sets
– e.g. which accounts are checking or savings accounts
• Condition-defined lower-level entity-sets
– Membership is specified by a predicate
– If an entity satisfies a lower-level entity-set’s predicate
then it is a member of that lower-level entity-set
– If all lower-level entity-sets refer to the same attribute,
this is called attribute-defined specialization
• e.g. account could have an account_type attribute
• Entities may simply be assigned to lower-level
entity-sets by a database user
– No explicit predicate governs membership
– Called user-defined membership
• Generally used when an entity’s membership
could change in the future
• Bank account example:
– Accounts could use user-defined membership, but it wouldn't make much
sense
– It makes it harder to write queries involving only one kind of account
– Best choice is probably attribute-defined membership
• Final bank account diagram:
[E-R diagram: account (acct_id, acct_type, balance) joined by a double line to
an ISA triangle labeled "disjoint", specializing into checking (overdraft_limit)
and savings (min_balance)]

• Would also create relationship-sets against various entity-sets in the hierarchy
– associate customer with account
– associate check_txns weak entity-set with checking
• Mapping generalization/specialization to
relational model is straightforward
• Create relation schema for higher-level entity-set
– Including primary keys, etc.
• Create schemas for lower-level entity-sets
– Subclass schemas include superclass’ primary key
attributes!
– Primary key is same as superclass’ primary key
• If subclass contains its own primary key, treat as a separate
candidate key
– Foreign key reference from subclass schemas to
superclass schema, on primary-key attributes
[E-R diagram: account (acct_id, acct_type, balance) joined by a double line to
an ISA triangle labeled "disjoint", specializing into checking (overdraft_limit)
and savings (min_balance)]

• Schemas:
account(acct_id, acct_type, balance)
checking(acct_id, overdraft_limit)
savings(acct_id, min_balance)
– Could use CHECK constraints in the SQL tables for membership
constraints and other constraints
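A sketch of these three schemas in SQL, run here through Python's sqlite3 module. The CHECK constraint on acct_type illustrates one way to enforce the membership constraint; the exact syntax varies by system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Superclass schema, with an attribute-defined membership constraint.
conn.execute("""
    CREATE TABLE account (
        acct_id   INTEGER PRIMARY KEY,
        acct_type TEXT NOT NULL CHECK (acct_type IN ('checking', 'savings')),
        balance   NUMERIC NOT NULL
    )""")

# Subclass schemas share the superclass primary key and reference it.
conn.execute("""
    CREATE TABLE checking (
        acct_id         INTEGER PRIMARY KEY REFERENCES account(acct_id),
        overdraft_limit NUMERIC NOT NULL
    )""")
conn.execute("""
    CREATE TABLE savings (
        acct_id     INTEGER PRIMARY KEY REFERENCES account(acct_id),
        min_balance NUMERIC NOT NULL
    )""")

conn.execute("INSERT INTO account VALUES (1, 'checking', 250.0)")
conn.execute("INSERT INTO checking VALUES (1, 500.0)")

# The CHECK constraint rejects membership values outside the specialization.
try:
    conn.execute("INSERT INTO account VALUES (2, 'loan', 0.0)")
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)   # False
```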
• If specialization is disjoint and complete, can
convert only lower-level entity-sets to relational
schemas
– Every entity in higher-level entity-set also appears in
lower-level entity-sets
– Every entity is a member of exactly one lower-level
entity-set
• Each lower-level entity-set has its own relation
schema
– All attributes of superclass entity-set are included on
each subclass entity-set
– No relation schema for superclass entity-set

• Schemas:
checking(acct_id, acct_type, balance, overdraft_limit)
savings(acct_id, acct_type, balance, min_balance)
• Problems?
– Enforcing uniqueness of account IDs!
– Representing relationships involving general accounts
• Can solve by creating a simple relation:
account(acct_id)
– Contains all valid account IDs
– Relationships involving accounts can use account
– Need foreign key constraints again…
• Generating primary key values is actually the easy part
• Most databases provide sequences
– A source of INTEGER or BIGINT values
– Perfect for primary key values
– Multiple tables can use a sequence for their primary keys
• PostgreSQL example:
CREATE SEQUENCE acct_seq;

CREATE TABLE checking (
    acct_id INT PRIMARY KEY DEFAULT nextval('acct_seq'),
    ...
);

CREATE TABLE savings (
    acct_id INT PRIMARY KEY DEFAULT nextval('acct_seq'),
    ...
);
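In systems without sequences, the shared account(acct_id) relation suggested earlier can play the same role as a sequence; a sketch using Python's sqlite3 module (names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# account holds every valid ID; both subclass tables reference it, so a
# checking and a savings account can never share an acct_id by accident.
conn.execute("CREATE TABLE account (acct_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE checking (
                    acct_id INTEGER PRIMARY KEY REFERENCES account(acct_id),
                    overdraft_limit NUMERIC)""")
conn.execute("""CREATE TABLE savings (
                    acct_id INTEGER PRIMARY KEY REFERENCES account(acct_id),
                    min_balance NUMERIC)""")

def new_account_id(conn):
    # One shared generator for both subclasses, much like a sequence.
    cur = conn.execute("INSERT INTO account DEFAULT VALUES")
    return cur.lastrowid

c_id = new_account_id(conn)   # 1
s_id = new_account_id(conn)   # 2
conn.execute("INSERT INTO checking VALUES (?, ?)", (c_id, 500.0))
conn.execute("INSERT INTO savings  VALUES (?, ?)", (s_id, 100.0))
print(c_id, s_id)   # 1 2
```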
• Alternative mapping has some drawbacks
– Doesn’t actually give many benefits in general case
– Biggest issue is managing primary keys!
• Fewer drawbacks if:
– Total, disjoint specialization
– No relationships against superclass entity-set
• If specialization is overlapping, some details are
stored multiple times
– Unnecessary redundancy, and consistency issues
• Also limits future schema changes
Q1) Design a generalization-specialization
hierarchy for a motor-vehicle sales company.
The company sells motorcycles, passenger
cars, vans and buses. Justify your placement of
attributes at each level of hierarchy. Explain
why they should not be placed at a higher or
lower level.
Q2) Construct an EER Model for Banking System
showing generalization-specialization
• Basic E-R model can’t represent relationships
involving other relationships
• Example: employee jobs
[E-R diagram: employee and branch connected by the works_on relationship,
with job also participating]
• Want to assign a manager to each
(employee, branch, job) combination
– Need a separate manager entity-set
– Relationship between each manager, employee,
branch, and job entity
• One option: a quaternary relationship
– This option has lots of redundant information
– Benefit is that some jobs might not
require a manager
• Could also make works_on a
quaternary relationship
– Don’t use a separate
manager relation
– Jobs with no manager
would use null values
instead
• These options are clumsy
• Another option is to treat the works_on
relationship as an aggregate
• Build a relationship against the aggregate
– manages implicitly includes the set of entities
participating in a works_on relationship
instance
– Jobs can also have
no manager
[E-R diagram: manages connects manager to the aggregate enclosing
employee, works_on, branch, and job]
• Mapping for aggregation is straightforward
• For entity-sets and relationship-set being used
as an aggregate, mapping is unchanged
• Relationship-set against the aggregate:
– Includes primary keys of participating entity-sets
– Includes all primary key attributes of aggregated
relationship-set
– Also includes any descriptive attributes
– Primary key of relationship-set includes all the above
primary key attributes
– Foreign key against aggregated relationship-set, as
well as participating entity-sets
[E-R diagram repeated: manages connects manager to the aggregate of
employee, works_on, branch, and job]

• Job schemas:
employee(emp_id, emp_name)
job(title, level)
branch(branch_name, branch_city, assets)
works_on(emp_id, branch_name, title)
• Manager schemas:
manager(mgr_id, mgr_name)
manages(mgr_id, emp_id, branch_name, title)
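These schemas, with the composite foreign key from manages to the aggregated works_on relationship, might be declared as follows (a sketch using Python's sqlite3 module; sample data values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")
conn.execute("CREATE TABLE job (title TEXT PRIMARY KEY, level INTEGER)")
conn.execute("""CREATE TABLE branch (branch_name TEXT PRIMARY KEY,
                                     branch_city TEXT, assets NUMERIC)""")
conn.execute("""
    CREATE TABLE works_on (
        emp_id      INTEGER REFERENCES employee(emp_id),
        branch_name TEXT REFERENCES branch(branch_name),
        title       TEXT REFERENCES job(title),
        PRIMARY KEY (emp_id, branch_name, title)
    )""")
conn.execute("CREATE TABLE manager (mgr_id INTEGER PRIMARY KEY, mgr_name TEXT)")
# manages takes its foreign key from the aggregated relationship as a whole,
# so a manager can only be assigned to an existing works_on instance.
conn.execute("""
    CREATE TABLE manages (
        mgr_id      INTEGER REFERENCES manager(mgr_id),
        emp_id      INTEGER,
        branch_name TEXT,
        title       TEXT,
        PRIMARY KEY (mgr_id, emp_id, branch_name, title),
        FOREIGN KEY (emp_id, branch_name, title)
            REFERENCES works_on (emp_id, branch_name, title)
    )""")

conn.execute("INSERT INTO employee VALUES (7, 'Avi')")
conn.execute("INSERT INTO branch VALUES ('Downtown', 'Brooklyn', 900000)")
conn.execute("INSERT INTO job VALUES ('teller', 1)")
conn.execute("INSERT INTO works_on VALUES (7, 'Downtown', 'teller')")
conn.execute("INSERT INTO manager VALUES (1, 'Kim')")

# Assigning a manager to an existing (employee, branch, job) combination works;
# referencing a works_on instance that does not exist is rejected.
conn.execute("INSERT INTO manages VALUES (1, 7, 'Downtown', 'teller')")
try:
    conn.execute("INSERT INTO manages VALUES (1, 8, 'Downtown', 'teller')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)   # True
```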
• Differences between version with aggregation,
and version with quaternary relationship?
• Biggest difference:
– Quaternary relationship’s schema derives primary
and foreign key constraints from participating entities
– Relationship using aggregation derives primary and
foreign key constraints from aggregate relationship
• A subtle difference
– Doesn’t have any significant practical impact
• Covered two extensions to E-R model
– Higher level abstractions
• Generalization and specialization
– Can specify constraints:
• Membership constraints
• Completeness constraints
• Disjointness constraints
• Aggregation
– Can build relationships that include other
relationships
• Straightforward mappings to relational model
• Next time: normal forms!
Database System Structure & architecture
Database Management System (DBMS) is software that allows access to data stored in a
database and provides an easy and effective method of –
 Defining the information.
 Storing the information.
 Manipulating the information.
 Protecting the information from system crashes or data theft.
 Differentiating access permissions for different users.
• Database systems can be centralized, or client-server.
• A database system is divided into modules that deal with different responsibilities of
the overall system.

• Primary goal: retrieving information from and storing new information into the
database.

The design of a DBMS depends on its architecture. It can be centralized,
decentralized, or hierarchical. The architecture of a DBMS can be seen as either
single-tier or multi-tier. An n-tier architecture divides the whole system into n
related but independent modules, which can be independently modified, altered,
changed, or replaced.
In 1-tier architecture, the DBMS is the only entity where the user directly sits on the DBMS
and uses it. Any changes done here will directly be done on the DBMS itself. It does not
provide handy tools for end-users. Database designers and programmers normally prefer to
use single-tier architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which the
DBMS can be accessed. Programmers use 2-tier architecture where they access the DBMS by
means of an application. Here the application tier is entirely independent of the database in
terms of operation, design, and programming.

3-tier Architecture

A 3-tier architecture separates its tiers from each other based on the complexity of the users
and how they use the data present in the database. It is the most widely used architecture to
design a DBMS.
 Database (Data) Tier − At this tier, the database resides along with its query
processing languages. We also have the relations that define the data and their
constraints at this level.
 Application (Middle) Tier − At this tier reside the application server and the
programs that access the database. For a user, this application tier presents an
abstracted view of the database. End-users are unaware of any existence of the
database beyond the application. At the other end, the database tier is not aware of any
other user beyond the application tier. Hence, the application layer sits in the middle
and acts as a mediator between the end-user and the database.
 User (Presentation) Tier − End-users operate on this tier and they know nothing
about any existence of the database beyond this layer. At this layer, multiple views of
the database can be provided by the application. All views are generated by
applications that reside in the application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.

The database system is divided into functional components: Query Processor,
Storage Manager, and Disk Storage. These are explained below.
1. Query Processor: It interprets the requests (queries) received from the end user
via an application program into instructions. It also executes the user request
which is received from the DML compiler.
Query Processor contains the following components –
 DML Compiler: It processes the DML statements into low level instruction (machine
language), so that they can be executed.
 DDL Interpreter: It processes the DDL statements into a set of table containing meta
data (data about data).
 Embedded DML Pre-compiler: It processes DML statements embedded in an
application program into procedural calls.
 Query Optimizer: It optimizes the instructions generated by the DML
compiler, choosing the lowest-cost evaluation plan.
2. Storage Manager: Storage Manager is a program that provides an interface between the
data stored in the database and the queries received. It is also known as Database Control
System. It maintains the consistency and integrity of the database by applying the
constraints and executing the DCL statements. It is responsible for updating, storing,
deleting, and retrieving data in the database.
It contains the following components –
 Authorization Manager: It ensures role-based access control, i.e., it checks
whether a particular person is privileged to perform the requested operation.

 Integrity Manager: It checks the integrity constraints when the database is modified.

 Transaction Manager: It controls concurrent access by scheduling the
operations of the transactions it receives. Thus, it ensures that the database
remains in a consistent state before and after the execution of a transaction.

 File Manager: It manages the file space and the data structure used to represent
information in the database.

 Buffer Manager: It is responsible for cache memory and the transfer of data between
the secondary storage and main memory.

3. Disk Storage: It contains the following components –


 Data Files: It stores the data.

 Data Dictionary: It contains the information about the structure of any database
object. It is the repository that stores the metadata.
 Indices: They provide faster retrieval of data items.

SYSTEM USERS

Four different types of database-system users:

1. Application programmers
2. Sophisticated users and specialized users
3. Database Administrator (DBA)
4. Naive users

1. Application programmers
• Computer professionals who write application programs.
• They interact with the system through DML calls.
• DML calls are embedded in a program written in a host language (for example,
PHP, Python, C).
• These programs are commonly referred to as application programs.

2. Sophisticated users and specialized users
• Interact with the system without writing programs.
• They form their requests in a database query language.
• Each query is submitted to a query processor.
• The query processor breaks down DML statements into instructions that the
storage manager understands.
• They work as analysts.
• Specialized users are sophisticated users who write specialized database
applications such as computer-aided design systems, knowledge bases, expert
systems, etc.
• These users use complex data types (for example, graphics data and audio data).

(3) Database Administrator (DBA)


• DBA has central control over the system.
• Responsible for following functions:
(i) Schema Design and Maintenance,
(ii) Physical Schema and Organization Modification,
(iii) Authorization and Security,
(iv) Integrity Constraint Specification,
(v) Recovery from Failure,
(vi) Database Upgradation

(4) Naive users


• Naive users are unsophisticated users.
• They interact with the system through permanent application programs that have
been written previously.
• For example, a clerk at the ticket-booking window uses an application program
to make reservations for a passenger.

Query Processor:
• A query processor helps the database system simplify and facilitate data access.
• System users are not required to know the physical details of the implementation
of the system.
• Quick processing of updates.
• Queries are written in a nonprocedural language, at the logical level.
• They are translated into an efficient sequence of operations at the physical level.

Query Processor Components: -


These components are used in evaluating DDL and DML queries. These are:
(i) DDL Interpreter
• Interprets DDL statements and stores the definitions in the data dictionary, as
metadata.

(ii) DML Queries
• Sophisticated or specialized users make requests in a database query language.
• Each query is submitted to a query tool.
• The query tool breaks it down into DML queries and low-level instructions.
• These are sent to the DML compiler and organizer, and to the query evaluation
engine for further processing.

(iii) DML Compiler and Organizer
• DML calls are usually started by a special character/code so that the appropriate
code can be generated.
• A special preprocessor, called the DML precompiler, converts the DML
statements to normal procedure calls in the host language.
• The resulting program is then run through the host-language compiler, which
generates appropriate object code (a set of low-level instructions that can be
used by the query evaluation engine).

(iv) Application Program Object Code


• It converts DML statements embedded in an application program to normal procedure
calls in the host language.
• These pre-compilers consult the DML compiler to generate the appropriate code.

(v) Compiler and Linker
• The application programmer writes the application program.
• The source code is compiled by the compiler, and the linker links the application
program object code with the DML queries and sends the result to the query
evaluation engine.

(vi) Query Evaluation Engine
• A query can usually be translated into any of a number of alternative evaluation
plans that all give the same result.
• The DML compiler also performs query optimization, that is, it picks the
lowest-cost evaluation plan from among the alternatives.
• This component is responsible for interpreting and executing the SQL query.
• It contains three major components:
Compiler - builds a data structure from the SQL statement and then does semantic
checking on the query, such as whether the table exists, the fields exist, etc.
Optimizer - transforms the initial query plan (the data structure created by the
compiler) into a sequence of operations, usually pipelined together, to achieve
fast execution. It refers to the metadata (dictionary) and statistical information
stored about the data to decide which sequence of operations is likely to be
faster, and based on that it creates the optimal query plan. Both cost-based and
rule-based optimizers are used.
Execution Engine - executes each step in the query plan chosen by the optimizer.
It interacts with the relation engine to retrieve and store records.
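The optimizer's plan choice can be observed directly in some systems; SQLite, for instance, exposes it through EXPLAIN QUERY PLAN. The sketch below (table and column names are invented) shows the plan switching from a full scan to an index search once an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, zip TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(i, "s%d" % i, str(10000 + i % 100)) for i in range(1000)])

query = "SELECT name FROM student WHERE zip = '10042'"

# Without an index, the only available plan is a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Once an index exists, the optimizer picks an index search instead.
conn.execute("CREATE INDEX idx_zip ON student(zip)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before[-1][-1])   # e.g. "SCAN student"
print(plan_after[-1][-1])    # e.g. "SEARCH student USING INDEX idx_zip (zip=?)"
```

The exact wording of the plan text varies between SQLite versions, but the scan-versus-search distinction is stable.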

Storage Manager:
• A storage manager is a program module that provides the interface between the low-
level data stored in the database and the application programs and queries submitted to the
system.
• The storage manager is responsible for the interaction with the file manager.
• The raw data are stored on the disk using the file system, which is usually provided by
a conventional operating system.
• The storage manager translates the various DML statements into low-level file system
commands.
• The storage manager is responsible for storing, retrieving, and updating data in the
database.
• A large amount of storage space is required for storing corporate databases
(which may range from hundreds of gigabytes to terabytes of data), and a storage
manager is required to manage it.
• Data are moved between disk storage and main memory as required, because
the main memory of the computer cannot hold this much information.

The storage manager components–


(i) File Manager
• It manages disk space allocation and the data structures used to store the data.
• The file manager maps disk pages of a file to memory pages and does the actual
disk I/O operations in case of page faults generated by the buffer manager
module.
(ii) Buffer Manager
• The buffer manager is responsible for loading (fetching) pages from disk into
main memory, managing the buffer pool based on a Least Recently Used (LRU)
algorithm, and deciding the caching strategy suitable for the application.
• It is a critical part of the database system: it enables the database to handle data
sizes that are much larger than the size of main memory. For this it has a
special-purpose allocator for storing control information, which is transient.
• The buffer pool is the memory space used by the buffer manager to cache disk
pages associated with records, index information, and metadata.
• Some database systems place limits on buffer pool size at the individual level,
and some at the global level.
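The LRU policy mentioned above can be sketched as a toy buffer pool (class and function names are invented; a real buffer manager also handles pinning, dirty pages, and write-back):

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: caches disk pages in memory, evicting the
    least recently used page when the pool is full."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page      # fetches a page from "disk"
        self.pages = OrderedDict()      # page_id -> contents, in LRU order
        self.hits = self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)     # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page_id] = self.read_page(page_id)
        return self.pages[page_id]

disk = {i: "page-%d" % i for i in range(10)}    # stand-in for disk storage
pool = BufferPool(capacity=2, read_page=disk.get)
pool.get(1); pool.get(2); pool.get(1)   # page 1 is now most recently used
pool.get(3)                             # evicts page 2, not page 1
print(sorted(pool.pages), pool.hits, pool.misses)   # [1, 3] 1 3
```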

(iii) Transaction Manager


• The transaction manager creates transaction objects and manages their atomicity
and durability.
• Applications request the creation of a transaction object by calling the
transaction manager's beginTransaction method.
• When a resource manager first participates in a transaction, it calls the Enlist
method to enlist in the transaction.
• The transaction manager tracks all the resource managers that enlist in the
transaction.
• It ensures that the database remains in a consistent (correct) state despite system
failures and that concurrent transaction executions proceed without conflict.
One of the following three results can occur:
1. The application either commits or aborts the transaction.
2. A resource manager aborts the transaction.
3. A failure occurs.

(iv) Authorization and Integrity Manager


• This manager is responsible for granting access to the database or portions thereof only to
authorized users and preventing access to unauthorized users.
• It tests for the satisfaction of integrity constraints and checks the authority of users to access
data.
• It uses all the integrity constraints and authorization rules specified by the DBA.
• The integrity manager must assure data integrity during normal database
operations as well as during database failures.

Disk Storage
• A DBMS can use several kinds of data structures as a part of physical system
implementation in the form of disk storage.
• Each structure has its own importance.
• Following are some common data structures.
• Disk storage is the central repository for storing all kinds of data in the database.

(i) Data
• It stores the database itself on the disk in the Data files.
(ii) Data Dictionary
• Information relating to the structure and usage of data contained in the database, the
metadata, is maintained in a data dictionary.
• The data dictionary is a database itself; it documents the data.
• Each database user can consult the data dictionary to look up what each piece of
data and the various synonyms of the data fields mean.
• In an integrated system, where the data dictionary is part of the DBMS, the data
dictionary stores information concerning the source of each data-field value, the
frequency of its use, and an audit trail (verification of account) concerning
updates, including the who and when of each update.
• Currently, data dictionary systems are also available as add-ons to the DBMS.

The data dictionary stores:


• Names of relations
• Names of the attributes of each relation
• Domains, and lengths of attributes
• Names of views defined on the database, and definitions of those views
• Names of authorized users
• Accounting information about users
• Number of tuples in each relation
• Method of storage used for each relation
• Name of the index
• Name of the relation being indexed
• Attributes on which the index is defined
• Type of index formed
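Most systems let users query the data dictionary itself. In SQLite, for example, the sqlite_master catalog lists every relation and index, and PRAGMA table_info exposes attribute names and declared domains (a small illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acct_id INTEGER PRIMARY KEY, balance NUMERIC)")
conn.execute("CREATE INDEX idx_balance ON account(balance)")

# sqlite_master is SQLite's data dictionary: one row per relation, index,
# view, or trigger, including the DDL text that defined each object.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# PRAGMA table_info exposes the attributes and declared domains of a relation.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]
print(catalog, columns)
```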

(iii) Indices
• Indices provide fast access to data items.
• A database index provides pointers to those data items that hold a particular
value.
• Hashing is an alternative to indexing that is faster in some cases, but not all.

(iv) Statistical Data
• It stores statistical information about the data stored in the database, such as the
number of records, blocks, etc. in a table.
• This information can be used to execute a query efficiently.
