DATABASE-1: Definitions and Concepts

Overview
• History of Database
• What is a Database?
• Components of a Database
• Different Database Users
• Data Abstraction
• Database Schema and Instances
• Database Languages – DDL, DML, DQL, DCL
• Data Models
• Transaction Management
• Database Management System
• Data Administrator (DA) & Database Administrator (DBA)
• Advantages and Disadvantages of a DBMS
History of Database
Data may be defined as a collection of related information. For example during a
census, information related to different people – like name, age, occupation,
income etc. may be collectively stored for each person, thus forming a list of
related data. To make use of this huge amount of data, it needs to be stored
to be accessed efficiently whenever needed.
[Figure: Book Index Cards and Member Index Cards]

Before the advent of computers, paper index cards were used to store information
and maintain a catalogue of different types of related data. For example a library
would use different sets of index cards – one set to keep a record of its books and
another set to keep a record of its members. When a new book was bought, a
separate index card was added to the existing list with information related to the
new book. Similarly when a new member joined the library, a new card was added
to the member set with information related to the member and the books borrowed.
With the emergence of computers, instead of using paper index cards, data were
stored in separate computer files in the form of records. Each file contained a group
of related records. Thus the library would now have a file containing records of the
different books and a separate file to contain records of the different members.
The figure shows one such application where a manufacturing company is using
three separate files for its business.
[Figure: the Supplier File User, Order File User and Payment File User each work
through their own program (Supplier Processing Application, Order Processing
Application, Payment Processing Application), which accesses its own data file
(Supplier Data File, Order Data File, Payment Data File)]

The first file, the Supplier file, stores a list of names, addresses, telephone
numbers, and contact persons of different suppliers who are supplying raw
materials to the company for making their products.
The second file, the Order file, is storing details of the purchase orders
placed on the different suppliers for raw materials.
The third file, the Payment file, is storing details of the payments made to the
different suppliers against their purchase orders based on their terms of
payment.
The users of the different files interact with the files by means of specific
application programs. Thus the Supplier file user interacts with the Supplier
file through the Supplier Processing Application program, the Order file user
uses the

DB01 – Definitions and Concepts Page 1 of 18 © Joyrup Bhattacharya


Order Processing Application program to access the Order file and the
Payment file user runs the Payment Processing Application program to access
the Payment file.
Though File Processing Systems were an improvement over the earlier manual
system of keeping records, they had some major drawbacks:
1. Data Duplication: The same data may need to be stored in different files. For
example in the above application when an order is placed on a supplier, the
Order file should also contain the Supplier Name and Address. Thus the same
data i.e. the Supplier Name and Address occurs in the Supplier file and also
in the Order file. Moreover if several orders are placed on the same supplier at
different times, the number of occurrences of the same data will be even more.
Two problems arise due to this. Firstly unnecessary space is used up to
store the same data at different places. The second problem is that of data
integrity. A collection of data is said to have integrity if it is logically
consistent everywhere. For example if a supplier changes its address, then the
same change needs to be updated in all the files where the address of the
supplier is stored. However chances are that due to manual or other mistakes the
same data may not get updated everywhere, leading to data inconsistency
and data integrity problems.
2. Separated and isolated Data: Usually different files are used to contain
different information. In case data needs to be combined from these
different files, the programmer must know which data need to be selected
from which file, before combining them to form a third file. For example the
Supplier file, Order file and the Payment file contain different information. If
the user wants to find the payments that are due in a particular month, he has to
refer to both the Order file and Payment file to get the required result. In case
data need to be combined from many files, the process becomes complex on
the part of the programmer.
3. File format dependent Applications: In file processing systems, application
programs are written based on the data files on which they work. Thus the actual
formats of the data in the files are an integral part of the application program
code i.e. there is a dependency between the data files and the application
programs that work on those files. In case any modification is made on the data
type in a file, then all application programs that use that data file are also
required to be modified. For example all application programs that process files
containing phone numbers need to be modified if the phone number changes
from 7 to 8 digits. Modifying several programs is a time consuming and
error prone job.
4. File Incompatibility: In case different programming platforms are used to
develop the application programs, then the formats of the data files on which
these programs act will also be different for each of the programming languages
used. Thus a C program data file will be different from a Visual Basic program
data file. Under such a situation, if a requirement is there to combine data
from the different data files, then the data files need to be first
converted to a common format and then used. This is both a complex and
time-consuming task.
5. User unfriendly data representation: In a file processing system it is
difficult to combine data from different files and display them in a user
friendly manner based on the specific requirements of the end user. This is
because it is difficult to process relationships between different data from
different files in a file processing system.
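The data duplication and integrity problem described in point 1 can be sketched in a few lines of Python. The two lists below merely stand in for the Supplier file and the Order file. The field names, the sample values and the helper functions are assumptions made for illustration only.

```python
# Two separate "files", each keeping its own copy of the supplier address.
# All names and values here are illustrative assumptions.
supplier_file = [
    {"name": "Godrej", "address": "Mumbai"},
]
order_file = [
    {"order_no": 1, "supplier_name": "Godrej", "supplier_address": "Mumbai"},
    {"order_no": 2, "supplier_name": "Godrej", "supplier_address": "Mumbai"},
]


def update_supplier_address(name, new_address):
    """Update the address in the Supplier file only (the easy mistake)."""
    for record in supplier_file:
        if record["name"] == name:
            record["address"] = new_address


def inconsistent_orders():
    """Return order numbers whose stored address disagrees with the Supplier file."""
    current = {r["name"]: r["address"] for r in supplier_file}
    return [
        o["order_no"]
        for o in order_file
        if o["supplier_address"] != current.get(o["supplier_name"])
    ]


update_supplier_address("Godrej", "Pune")
stale = inconsistent_orders()
print(stale)
```

After the update, every order placed on that supplier still carries the old address, so the more often the data is duplicated, the more copies can fall out of step.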
Database technologies were developed to overcome these difficulties. Unlike File
Processing systems, in Database Processing systems the user application
programs do not directly interact with the stored data, but through an intermediate
system called the Database Management System or DBMS. In doing so, the
application programs become independent of the way in which the data is actually
stored.

What is a Database?
To overcome the shortfalls of the File Processing systems, Database technologies
were developed during the 1960s, with major contributions from IBM Corporation. Unlike
File Processing systems whose main components are application program
dependent data files, the main component of a Database Processing System is
a Database. A database is usually a collection of information or data related to a
particular topic or subject. This data can be purely textual in nature like the name,
address, and contact number etc. of a person in a telephone directory. It can
contain graphical data like photograph of the person along with his name, or it can
be a collection of audio/video files as in a music album database. Moreover in a
database, the structure of the stored data should be independent of application
programs that may use this data. In general a database should have the following
characteristics:
1. Data Independence: In file processing systems, the application programs
are dependent on the data structures of the data files on which they act. Thus if
a data format is changed for better efficiency or accuracy or new data items
are added to accommodate changes, the application programs also need to
be changed to accommodate the changes. In a database system data items
are stored independent of any application program. Application programs
interact with the Database Management System which in turn interacts with the
database, making the application programs independent of any changes made to
the database.
This ability to modify the data scheme in one level without affecting the data
scheme in a higher level is called data independence. There are two kinds of
data independence. These are Physical Data Independence (also called program
data independence) and Logical Data Independence. The differences include:
Physical Data Independence
• It is the ability to modify the physical schema of data storage without the
  need to rewrite the application programs that access the database.
• It leaves the users’ views and methods of accessing the information
  unaffected by changes made to the physical organisation of data at the
  physical or internal level.
• Such modifications are usually done to improve the overall performance of a
  database.
• Application programs do not depend much on the physical structure of the
  data. Hence it is relatively easier to achieve physical data independence.
• Example: Changing the file organisation from sequential to random access to
  improve performance.
Logical Data Independence
• It is the ability to modify the underlying logical or conceptual schema of a
  database without the need to rewrite application programs.
• It leaves the users’ views and methods for accessing the information
  unaffected by changes made to the structure of database at the conceptual
  level.
• It allows the logical structure of the database to be altered dynamically in
  case a change is required.
• Since application programs are highly dependent on the logical structure of
  the data, it is a difficult job to achieve logical data independence.
• Example: Adding a new field like the mobile phone number to an existing
  record of a person in a company.
2. Data Integrity: In a database a particular data is kept at a single place
avoiding duplication of data. This ensures data updates need to be
implemented at a single point only, eliminating chances for any confusion.
3. Data Flexibility: In a database, the same data can be accessed from many
places simultaneously and in different ways based on the requirements of the
respective application programs.
4. User Friendly Interface: The end user is not required to bother about the
actual or physical storage of data. Highly technical software called the Database
Management System takes care of the low level data structures and the
relationship between the different data. Thus the complexity of the data and
implementation details are hidden by the DBMS and the end user can access the
required data with minimal technical knowledge.
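The idea of data independence from point 1 can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS. The Person table, its columns, the sample row and the list_names function are assumptions made for illustration. Creating an index plays the role of a physical change and adding a column plays the role of a logical change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT)")
conn.execute("INSERT INTO Person (PersonID, Name, Phone) VALUES (1, 'Asha', '21365871')")


def list_names(connection):
    """Stand-in for an application program: it names only the columns it needs."""
    rows = connection.execute("SELECT Name FROM Person ORDER BY PersonID")
    return [row[0] for row in rows]


names_before = list_names(conn)

# Physical-level change: add an access structure; the table definition is untouched.
conn.execute("CREATE INDEX PersonNameIndex ON Person (Name)")
names_after_physical_change = list_names(conn)

# Logical-level change: add a new field to the existing record structure.
conn.execute("ALTER TABLE Person ADD COLUMN MobilePhone TEXT")
names_after_logical_change = list_names(conn)

print(names_before, names_after_physical_change, names_after_logical_change)
```

The program survives both changes here only because it asks for columns by name. A program that depended on the exact number or order of columns would be affected by the second change, which is one reason logical data independence is harder to achieve.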
We can thus formally define a Database as stated below.
A database is a self-describing, shared collection of interrelated data from
which users can efficiently retrieve information in response to specific queries.
Components of a Database
Supplier
SupID  SupName  SupCity  SupPhone
0001   Godrej   Mumbai   21365871
0002   Bajaj    Kolkata  24175552
0003   Steelco  Chennai  32651478
0004   Philips  Kolkata  2422369
The self-describing nature of a database implies that we do not have to rely on
any external information to find out what the data in the database represent or
the relation between the different data components. In a File Processing
system, the data files contain only the data, and the description of the data is
a part of the Application Programs that access the data files. On
the other hand, in a Database Processing system, the data description is inbuilt
into the database along with the data. In the set of data shown above in the
form of a table, the data part consists of the values: 0001, Godrej, Mumbai,
21365871 etc. Whereas the headings: SupID, SupName, SupCity, and
SupPhone, describe the database structure and meaning of each data item. Both
of these are inbuilt into the database.
The data related to the structure or description of a database is called
Metadata or Data Dictionary. For example Metadata includes the table names,
the column names, the properties of the columns in the tables like the data-type,
the length of each data-type etc. This Metadata is usually stored in the form of
tables called System Tables.
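As a hedged illustration of metadata kept inside the database itself, the sketch below uses Python's built-in sqlite3 module. SQLite is only one possible DBMS and is chosen here as an assumption. It records table definitions in a system table named sqlite_master and reports column details through PRAGMA table_info. The column data types in the CREATE TABLE statement are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Supplier ("
    "SupID INTEGER PRIMARY KEY, SupName TEXT, SupCity TEXT, SupPhone INTEGER)"
)

# sqlite_master is SQLite's own system table listing the objects in the database.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2], row[5])
    for row in conn.execute("PRAGMA table_info(Supplier)")
]

print(table_names)
for name, data_type, primary_key_position in columns:
    print(name, data_type, primary_key_position)
```

No external description file is consulted: the table name, the column names, the data types and the primary key are all read back from the database itself.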
Thus in our previous example of the manufacturing company, we basically had
three separate tables of data viz. Supplier, Order, and Payment. Similarly each
table contained several columns in which data was divided and stored. The System
Tables that can be used to describe the database for the above company are shown
below.
Table Name  Number of Columns  Primary Key
Supplier    4                  SupID
Order       6                  OrdNo
Payment     4                  ReceiptNo

Column Name  Table Name  Data Type  Length
SupID        Supplier    Integer    2
SupName      Supplier    Text       20
SupCity      Supplier    Text       20
SupPhone     Supplier    Integer    2
OrdNo        Order       Integer    2
Date         Order       Text       10
SupIDF       Order       Integer    2
Item         Order       Text       30
Rate         Order       Float      4
Qty          Order       Integer    2
ReceiptNo    Payment     Integer    2
OrdNoF       Payment     Integer    2
Date         Payment     Text       10
Amount       Payment     Float      4
We have two System Tables containing the Metadata. These include System Tables
describing the different TABLES (first table) and the different COLUMNS in the
various tables (second table). The first table contains the names of the
different tables that comprise the database, the number of columns in each table
and the Primary Keys. Similarly the second table contains details about the
different columns that form each of the tables. The column names, data-types of
the data they contain and the lengths of each data type are included as
information needed to describe the database.
To improve database performance, a database contains another kind of data called
indexes. Suppose it is required to list the names of all suppliers in a
particular city. Since the database may be stored as a sorted file with respect
to the ID-number of the suppliers, it will be a time consuming process to query
each and every record and then find out the names located in a particular city.
To speed up the process, a special data structure called an index may also be
maintained by the database. It is similar to finding a name from the telephone
directory. The index stores the different cities in alphabetical order and
relates each city to the respective supplier ID as shown below. It is easier to
find the city in alphabetical order from the index and then find the supplier
name from the SupID given against each city name in the index. The DBMS looks
for the supplier name from the Supplier table by matching the SupID as given in
the index. We will discuss more about indexes in a later section.
CityIndex
SupCity  SupID
Chennai  0003
Kolkata  0002
Kolkata  0004
Mumbai   0001
Mumbai   0005
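The role of an index can be sketched with Python's built-in sqlite3 module, used here as an assumed stand-in DBMS. The table follows the Supplier example, with SupID stored as text so that the leading zeros survive. EXPLAIN QUERY PLAN is specific to SQLite and its wording varies between versions, so the sketch prints it without relying on its exact text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplier (SupID TEXT PRIMARY KEY, SupName TEXT, SupCity TEXT)")
conn.executemany(
    "INSERT INTO Supplier VALUES (?, ?, ?)",
    [
        ("0001", "Godrej", "Mumbai"),
        ("0002", "Bajaj", "Kolkata"),
        ("0003", "Steelco", "Chennai"),
        ("0004", "Philips", "Kolkata"),
    ],
)

# The index keeps the cities in sorted order, each pointing back to its row.
conn.execute("CREATE INDEX CityIndex ON Supplier (SupCity)")

query = "SELECT SupName FROM Supplier WHERE SupCity = ? ORDER BY SupID"
kolkata_suppliers = [row[0] for row in conn.execute(query, ("Kolkata",))]
print(kolkata_suppliers)

# Ask SQLite how it intends to run the query (output format is version dependent).
for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Kolkata",)):
    print(row[-1])
```

The query itself does not mention the index. The DBMS decides whether to use it, which is why an index can be added or dropped without changing application programs.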
A database may also contain data about the applications that use the database.
These may include the structure of the different data entry forms, the different
types of reports or queries etc. This last category of data is called Application
Metadata.
We can summarise the components of a database by the diagram below. The basic
unit of information stored in a database is the bit. These bits combine to
form characters (both strings and numbers). These strings and numbers are
collected to form different fields, which in turn form records. Several records
are collected to form data files. Data-files along with other special data
structures like Metadata, Application Metadata and Indexes form the Database.
[Figure: components of a database, from top to bottom: DATABASE,
FILES + METADATA + INDEXES, RECORDS, FIELDS, CHARACTERS, BITS]
The term “shared collection” in the description of a database implies that all data
is stored centrally in the database. This central data is then shared by every
individual who has access to the particular data. Data is not stored in different
individual files as per the need of different individuals with the same data
repeating in more than one file, as in a file processing system. Different application
programs fetch the data from the central database where a particular data is stored
only once.
The next term “interrelated data” implies that the data stored as different
relations or tables are not independent but are related to each other. For instance,
in the above example, the Order table is related to the Supplier table through the
SupID attribute. From the Order table if we know the SupID, we can find out the
phone number of the corresponding supplier from the Supplier table by matching
the SupID numbers in both the tables. Therefore the data stored in different tables
are related to each other by means of special attributes or keys (discussed in detail
in later sections).
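The lookup described above, from an order to the phone number of its supplier, can be sketched as a join. The example uses Python's built-in sqlite3 module as an assumed DBMS. The item name and the order number are made-up sample values, and the column list is trimmed to what the lookup needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplier (SupID TEXT PRIMARY KEY, SupName TEXT, SupPhone TEXT)")
# "Order" is a reserved word in SQL, so the table name is quoted.
conn.execute(
    'CREATE TABLE "Order" ('
    "OrdNo INTEGER PRIMARY KEY, Item TEXT, "
    "SupID TEXT REFERENCES Supplier (SupID))"
)
conn.execute("INSERT INTO Supplier VALUES ('0001', 'Godrej', '21365871')")
conn.execute("INSERT INTO Supplier VALUES ('0002', 'Bajaj', '24175552')")
conn.execute("INSERT INTO \"Order\" VALUES (1, 'Steel sheet', '0002')")

# Match the SupID stored with the order against the SupID in the Supplier table.
supplier_for_order = conn.execute(
    'SELECT s.SupName, s.SupPhone FROM "Order" AS o '
    "JOIN Supplier AS s ON s.SupID = o.SupID WHERE o.OrdNo = ?",
    (1,),
).fetchone()
print(supplier_for_order)
```

The Order table stores only the SupID. The supplier's name and phone number live in one place, the Supplier table, and are fetched through the shared key when needed.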
The final part of the definition indicates that the information stored in a database
can be efficiently updated and retrieved by the users by writing specific queries
in a data query language like SQL. The queries are submitted to the database
management system, which responds to these queries by combining data from
different tables and presents the required data to the end-user in a manner
convenient to the user.

Database Users
The main aim of a database is to provide ways of storing and retrieving information
in an efficient manner. To do this, different kinds of people may need to access or
handle the database both during the development and during the implementation
stage. These users include the general public accessing a public database like a
railway reservation database. They may include company executives handling
confidential data in the company database. At the lower end we have the computer
professionals engaged in developing a database and the data-entry operators
engaged in entering the raw data into the database. Depending upon the type of
use, we can classify database users into the following categories:
1. Application Programmers: These are people who are engaged in developing
general application programs to access databases. The application program
is usually written in a base or host language (like C, Visual Basic etc.).
Commands in a special Data Manipulation Language (like SQL) are then
embedded within the host language code, to access the database and perform
data manipulations.
2. Sophisticated Users: These are the people who interact with a database
without writing application programs but by requesting information from a
database by writing queries in Data Manipulation Languages like SQL.
These queries are then processed by a query processor and submitted to a
database storage manager to provide the necessary outputs. Analysts who may
be required to analyse data based on certain criteria and generate special
reports fall under this category.
3. Specialised Users: These people are engaged in writing specialised database
application programs involving complex data structures like graphics,
audio, or video data or are engaged in writing special application programs to
implement computer aided design systems.
4. Inexperienced Users: These are end users who interact with a database
through permanent application programs like menu driven interfaces in a
railway enquiry system, in an automated bank teller machine etc.
5. Database Administrators: In an organisation the Database Administrator is the
person who is responsible for overall control and fine tuning of the
database to get the best performance. The DBA is responsible for maintaining
the database server and providing users with access to their required
information as and when required.

Data Abstraction
In a database, the stored data needs to be retrieved and manipulated
efficiently. Complex algorithms and data structures have been developed to do
this. However not all users of the database are computer experts and hence may
not be expected to understand these complex data structures to manipulate the
data. To overcome this difficulty, the database approach provides some level of
data abstraction i.e. the developers of the database hide from the database
users the details of actually how the data is stored. Instead, it presents to
the user a view of the data that is readily understandable by him. This helps to
simplify the users’ interaction with the database system as it allows the user
to manipulate the data without being concerned about the underlying mechanism by
which the data gets actually stored.
[Figure: levels of data abstraction, with the View Level (View 1, View 2, …
View n) at the top, the Logical Level in the middle and the Physical Level at
the bottom]
In a database system thus different levels of data abstraction are used to simplify
the final data representation i.e. to connect the raw data type to the final user view
of the data. These levels include:
1. Physical Level: This is the lowest level of data abstraction. At this level the
complex low level data structures used to store the data are described. For
example at the byte level, the different records that comprise a database may
be stored as a linear linked list, as a binary tree structure, as fixed length records
or as variable length records. The data representation at the physical level thus
describes how blocks of data consisting of bytes of raw data are stored
in consecutive storage locations. The database system hides many of these
lowest level storage details from the database programmers and the end users.
2. Logical Level: This is the next higher level. At this level the data and the
relationships that exist between those data are defined. The entire database
is described in terms of relatively simple structures like data tables etc. though
at the physical level this may involve manipulation of complex data structures.
The logical level of abstraction is used by database administrators who decide
what information needs to be kept in the database and the relationship between
the different data.
For example in a student database, the different aspects related to a student,
like students’ personal data, students’ accounts related data, students’ academic
performance related data etc. need to be defined and the relationships that
exist between these different aspects are established at the logical level.
3. View Level: This is the highest level of data abstraction. In case of a large
database, some complexity may still remain at the logical level. Moreover
the majority of users will not be required to access the entire database, but will be
concerned with only a part of the database. Accordingly, depending upon the
nature of use and the type of user, different user-friendly views of the
database are defined. Apart from providing appropriate database views, this
level also provides security to the database by providing selective access to
different users.
For example in a student database, different views or forms may be provided at
the view level like the student personal data entry view, student fees entry view,
students’ marks entry and report card generation view etc. Of these teachers
may be given access to only the marks entry view, while the accounts
department may be given access to the fees related data view etc. thus
providing data security at these different view levels.
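The student database example can be sketched with SQL views, here through Python's built-in sqlite3 module as an assumed DBMS. The Student table, its columns and the view names MarksView and FeesView are assumptions for illustration. SQLite has no user accounts, so the step of granting a view to a particular group of users is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: one table describing every aspect of a student.
conn.execute(
    "CREATE TABLE Student ("
    "RollNo INTEGER PRIMARY KEY, Name TEXT, FeesDue REAL, Marks INTEGER)"
)
conn.execute("INSERT INTO Student VALUES (1, 'Asha', 500.0, 82)")

# View level: each group of users sees only the part it works with.
conn.execute("CREATE VIEW MarksView AS SELECT RollNo, Name, Marks FROM Student")
conn.execute("CREATE VIEW FeesView AS SELECT RollNo, Name, FeesDue FROM Student")

marks_cursor = conn.execute("SELECT * FROM MarksView")
marks_columns = [column[0] for column in marks_cursor.description]
marks_rows = marks_cursor.fetchall()

fees_cursor = conn.execute("SELECT * FROM FeesView")
fees_columns = [column[0] for column in fees_cursor.description]

print(marks_columns, marks_rows)
print(fees_columns)
```

A teacher working through MarksView never sees the fees column, and the accounts department working through FeesView never sees the marks.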

Database Schema and Instance


The overall design and description of a database is in general called the
database schema. The schema is used to define the following:
a) The physical structure of the database i.e. the data structures used to store
the data physically in the database. It also specifies the character sets or
symbols used to encode the data. ASCII is the best known character set used.
b) The logical structure of the database i.e. the different relations or tables
that comprise the database, the relationships between those tables and the
different attributes for the relations.
c) The different constraints or business rules that govern different transactions.
d) Rules to determine who has access to the schema.
Though the contents of a database may change over time, its schema, as
determined during design time, rarely changes. A database may have
different types of schema at the different levels of data abstraction discussed
earlier. Based on these, the Three-Schema Architecture has been developed to
construct a database system. It consists of the following:
[Figure: Three-Schema Architecture. User-1, User-2, … User-n work through
View-1, View-2, … View-n (Sub/External View Schema), which rest on the
Logical/Conceptual Schema, then the Physical/Internal Schema and finally the
Stored Database]
1. Physical/Internal schema: This corresponds mainly to the physical data
abstraction level and deals with the physical organisation of data. It forms
the lowest level and describes the different data structures used and how the
raw data gets stored at the byte level.
2. Logical/Conceptual schema: This corresponds mainly to the logical data
abstraction level. It is used to describe the logical structure of the
database based on the different data types and the relationships that exist
between those data types. It describes the different data operations possible
and any constraint or business rule to be imposed on those data. The logical
schema hides the details of physical storage structures from the developer or
database administrator.
3. Sub/External View schema: This corresponds to the view level of data
abstraction and deals with the way a particular user application views the data
from the database. It forms the highest level. Each view or external schema is
used to describe a part of the database that a particular user group is interested
in and hides the rest of the database from that user group.
In general a database system supports one physical schema, one logical
schema and several sub-schemas as shown in the diagram above.
When a new database is defined, we only specify the database schema to the
DBMS. At this stage the state of the database is empty as it contains no data. We
get the initial state of the database only when the database is first filled with the
initial data. Whereas a database schema describes the structure of a database,
the database state or database instance indicates the collection of
information stored at any particular moment in the database. At any point in
time, a database has a current state or instance. It is the responsibility of the
database management system to ensure that every instance of a database is a
valid instance satisfying the various constraints specified in the schema. For
example in case a bank allows a minimum account balance of Rs. 1000, then the
DBMS should take care of this constraint to ensure that at no instance can a bank
account have a balance less than Rs. 1000.
Unlike a database schema, an instance can change frequently as and when data
is added, updated or removed from the database. However changes may need to
be applied to a schema once in a while. For example the mobile phone number or
the email address may need to be incorporated into the existing database of
customers in a bank. This is known as schema evolution and is allowed by most
modern DBMSs during the time a database is operational.
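The difference between schema and instance, and the minimum balance rule from the bank example, can be sketched with Python's built-in sqlite3 module as an assumed DBMS. The Account table, its columns and the Email column added at the end are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the structure of the table plus the minimum-balance constraint.
conn.execute(
    "CREATE TABLE Account ("
    "AccNo INTEGER PRIMARY KEY, "
    "Balance REAL NOT NULL CHECK (Balance >= 1000))"
)

# Empty state: the schema exists but the database holds no data yet.
rows_in_empty_state = conn.execute("SELECT COUNT(*) FROM Account").fetchone()[0]

# Initial state: the first data is loaded, giving the first non-empty instance.
conn.execute("INSERT INTO Account VALUES (1, 5000)")

# The DBMS refuses a change that would produce an invalid instance.
try:
    conn.execute("UPDATE Account SET Balance = 500 WHERE AccNo = 1")
    update_rejected = False
except sqlite3.IntegrityError:
    update_rejected = True

balance = conn.execute("SELECT Balance FROM Account WHERE AccNo = 1").fetchone()[0]

# Schema evolution: the schema itself is changed while the data stays in place.
conn.execute("ALTER TABLE Account ADD COLUMN Email TEXT")
column_names = [row[1] for row in conn.execute("PRAGMA table_info(Account)")]

print(rows_in_empty_state, update_rejected, balance, column_names)
```

The constraint is declared once in the schema, so every application program that touches the table is held to it without repeating the check in its own code.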

Database Languages (DDL, DML, DQL, DCL)


To implement and use a database, three different classes of programming
languages are used in general. These can be broadly divided into Data Definition
Languages or DDL, Data Manipulation Languages or DML and the Data Control
Languages or DCL. The functions and examples of these are described below:
1. Data Definition Language (DDL): The design and structure of a database is
usually specified by a specific language called a Data Definition Language. The
DDL forms a link between the logical and physical structure of a
database i.e. the way the user views the data and the way the data is physically
stored. Once the DDL statements are written and compiled, they produce a set of
relations (tables), which are stored in a special file called a Data Dictionary or
Data Directory. The major functions of the DDL are thus:
a) To describe or create the logical schema or different relations in a
database.
b) To describe the data fields or attributes of each record i.e. to describe each
field’s logical name, data-type, field length, etc.
c) To describe the relationships between the different relations.
d) To describe the integrity constraints.
e) To describe the specific keys and indexes for accessing the data.
f) To provide means of data security and data restrictions.
g) To provide means of logical and physical data independence.
Examples of DDL statements in SQL include CREATE to establish a new table,
ALTER to alter the structure of the database, DROP to delete tables from the
database, TRUNCATE to remove all records from a table etc.
2. Data Manipulation Language (DML): Once the general structure of a
database is formed using a DDL, the database can be accessed, filled and
manipulated by the user using a Data Manipulation Language. The Data Query
Language or DQL is a subset of DML and is used to write specific queries to
retrieve specific data. DQL is very flexible and can be used to express quite
complicated queries, sometimes very concisely. The different functions and
characteristics of a DML include:
a) Insert new information into the database
b) Retrieve existing information from the database based on certain criteria
c) Delete information from the database
d) Modify, sort, and update information in the database
e) Enable a user and application programs to process data on a logical basis
rather than bother about how the data is physically organised.
f) Supports high-level languages (like COBOL, VB etc.) in which application
programs are generally written. In general DML statements are embedded
within high-level host languages in which application programs are written.
In general there are basically two types of DML. These are:
a) Procedural DMLs: In a procedural DML, to retrieve particular information, the
user has to specify both the specific data requirement and how to
get that data. Since the user controls how the data is fetched, procedural
DMLs can be more efficient than non-procedural languages. An example of a
procedural approach is Relational Algebra, which can be used to manipulate
data organised in relations (tables) using the various relational operators.
However relational algebra is hard to use and, due to its complexity, it is
generally not used directly in commercial databases.
b) Non-procedural DMLs: In a non-procedural DML, to retrieve particular
information, the user has to specify only the specific data requirement
without specifying the means to get that data. Since a user is not required to
specify the means of getting the data, these languages may not generate very
efficient code. Examples of non-procedural DMLs include Relational
Calculus, Transform-Oriented-Languages (e.g. SEQUEL, SQL), Query-by-Example
and Query-by-Form (e.g. MS-Access). Of these, due to its
complexity Relational Calculus is rarely used directly in commercial database
processing.
In Transform-Oriented-Languages like SQL, the input data may be
expressed as several relations (tables), which are then transformed to express
the required result as a single relation (table).
Query-by-Example and Query-by-Form are graphical languages. In these,
the user is presented with a graphical interface in the form of a Data-Entry-
Form. The database management system analyses the entries made by the
user and generates the required queries.
Examples of DML statements in SQL include SELECT to retrieve rows of data,
INSERT to place new rows of data in the database, UPDATE to replace existing
values in the database with new values, DELETE to delete rows of data etc.
3. Data Control Language (DCL): The Data Control Language defines activities
that are not part of DDL or DML. DCL commands are used to control the
distribution of access privileges to users. In many texts DCL also covers the
statements that define when proposed changes to a database are made
irreversible. DCL commands can be executed only by users holding the
necessary privileges, typically the database administrator.
Examples of DCL statements in SQL include GRANT to give access privileges to a
user and REVOKE to withdraw them. COMMIT, which makes proposed changes
permanent, and ROLLBACK, which undoes them, are grouped with DCL by some texts
and treated as a separate transaction control language by others.
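The difference between procedural and non-procedural DMLs described above can be sketched in a few lines. The example below is only an illustration: it uses Python's built-in sqlite3 module, and the sample rows and the function names procedural_items_from and declarative_items_from are assumptions made for this sketch. The first function states how to get the data (scan, select, join, project); the second states only what is wanted, in SQL, and leaves the access path to the DBMS.

```python
import sqlite3

# Hypothetical sample data, modelled on the Items/Supply example in these notes.
ITEMS = [("I0001", "Fridge"), ("I0002", "Almirah"), ("I0003", "Table")]
SUPPLY = [("Godrej", "Mumbai", "I0001"),
          ("Steelco", "Kolkata", "I0002"),
          ("Steelco", "Kolkata", "I0003")]

def procedural_items_from(city):
    """Procedural style: spell out HOW to find the data, step by step."""
    result = []
    for sup_name, sup_city, item_code in SUPPLY:    # scan the Supply rows
        if sup_city != city:                        # selection on SupCity
            continue
        for code, item in ITEMS:                    # join with Items on ItemCode
            if code == item_code:
                result.append((sup_name, item))     # projection of two columns
    return sorted(result)

def declarative_items_from(city):
    """Non-procedural style: state WHAT is wanted; the DBMS decides how."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Items (ItemCode TEXT PRIMARY KEY, Item TEXT)")
    con.execute("CREATE TABLE Supply (SupName TEXT, SupCity TEXT, ItemCode TEXT)")
    con.executemany("INSERT INTO Items VALUES (?, ?)", ITEMS)
    con.executemany("INSERT INTO Supply VALUES (?, ?, ?)", SUPPLY)
    rows = con.execute(
        "SELECT s.SupName, i.Item FROM Supply s "
        "JOIN Items i ON i.ItemCode = s.ItemCode "
        "WHERE s.SupCity = ? ORDER BY s.SupName, i.Item", (city,)).fetchall()
    con.close()
    return rows

print(procedural_items_from("Kolkata"))
print(declarative_items_from("Kolkata"))
```

Both calls return the same two rows for Kolkata; only the amount of "how" supplied by the user differs.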
Data Models
Data models are a collection of conceptual tools for describing the data, the
relationships between the data, the constraints applicable on the data etc. There
are various data models available, which can be broadly classified into the
following:
1. Physical Models: These data models are used to describe data at the lowest
level of data abstraction i.e. the way the data is physically stored in the
database. Two popular data models used to describe the physical architecture
are:
a) Unifying Model
b) Frame Memory Model
2. Record Based Logical Models: These data models are used to describe data
at the logical and view levels. They use concepts that may be understood by
the end users and at the same time are not too far from the way data is actually
organized within the computer. In these models, the database is formed using fixed
format records of several types with each record type containing a fixed
number of fixed length fields. The different record based models include:
a) Relational Model: In this model, data and the relationships between them
are represented as a collection of tables. Each table has multiple columns
and rows with each column having a unique name. All columns in a particular
row in the table form a record. The tables below show a relational database
consisting of the tables Items (2 columns) and Supply (3 columns). The
Relational model is discussed in detail in a later section.

Items
ItemCode   Item
I0001      Fridge
I0002      Almirah
I0003      Table

Supply
SupName    SupCity    ItemCode
Godrej     Mumbai     I0001
Steelco    Kolkata    I0002
Steelco    Kolkata    I0003
b) Hierarchical Model: The Hierarchical model is the oldest of database
models. Here records are logically organised into a hierarchy of
relationships forming an inverted tree pattern. All records in a hierarchy
are called nodes with each node related to the next in a Parent-Child
relationship. Records that own other records are called parent records.
The top parent record (here SupplyData) is called the root record. Each
parent record can have one or more child records. But any child record can
have only a single parent record.
[Figure: inverted tree with root record SupplyData; supplier records
Godrej|Mumbai, Modern|Kolkata and Steelco|Kolkata; item records I0001|Fridge,
I0002|Almirah and I0003|Table, with I0003|Table appearing twice]
c) Network Model: This model is used to store data similar to the hierarchy
model’s parent-child relationship. However unlike the hierarchical model, it
allows a record to be a child of more than one parent record. The relationship
between different records is then represented by links in the form of
pointers as shown in the figure. In the example, the I0003|Table record can be
seen to be a child of both the Steelco|Kolkata and the Modern|Kolkata records.
[Figure: supplier records Godrej|Mumbai, Steelco|Kolkata and Modern|Kolkata
linked by pointers to item records I0001|Fridge, I0002|Almirah and I0003|Table]
3. Object Based Logical Models: These data models are used in describing data
at the logical and view levels. These models are closer to human perception
and farther from system perception. Different object based logical models
include:
a) The Entity Relationship (ER) Model: The ER model views the real world as
a collection of basic objects called entities with relationships existing
between those entities. Each entity in turn is described by a set of
attributes. Entities and relationships of the same type are grouped together
to form an entity set and a relationship set. Several graphical shapes are
used to construct an ER diagram to express the overall logical structure of a
database.
[Figure: ER diagram in which the entity set Supplier (attributes such as
Sup-Name and Sup-City) is connected through the relationship set Supplies to
the entity set Items (attributes such as Item-Code and Name)]
b) The Object Oriented Model: By the middle of the 1980s it was observed
that relational databases were not practical for storing data in fields like
medicine, multimedia and high energy physics, all of which needed more
flexibility in how their data was represented and accessed. This led to object
oriented databases where users could define their own methods of access to
data and how it was represented and manipulated. The object oriented model is
based on a collection of objects and code, called methods, that operates on
these objects. Objects that contain the same type of values and the same
methods are grouped together into classes. Multimedia Databases, used for
storing several different types of files i.e. text, audio, video and images in
a single database, fall under this category.
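The three record based models above can be contrasted with a small sketch. The data and the helper names to_hierarchy and to_network below are assumptions for illustration only, loosely following the supplier and item records in the figures. The point is that the hierarchical view has to copy a record that has two parents, whereas the network view keeps one record and links it to both.

```python
# Illustrative data only: supplier/item records loosely following this section's figures.
ITEMS = {"I0001": "Fridge", "I0002": "Almirah", "I0003": "Table"}
SUPPLY = [  # relational view: flat rows related to ITEMS through the shared ItemCode
    ("Godrej", "Mumbai", "I0001"),
    ("Steelco", "Kolkata", "I0002"),
    ("Steelco", "Kolkata", "I0003"),
    ("Modern", "Kolkata", "I0003"),
]

def to_hierarchy(rows, items):
    """Hierarchical view: each child has exactly one parent, so shared items are copied."""
    tree = {}
    for name, city, code in rows:
        child = {"code": code, "item": items[code]}     # a fresh copy per parent
        tree.setdefault((name, city), []).append(child)
    return {"SupplyData": tree}

def to_network(rows, items):
    """Network view: one record per item, linked by reference to every parent."""
    item_records = {c: {"code": c, "item": i, "parents": []} for c, i in items.items()}
    suppliers = {}
    for name, city, code in rows:
        sup = suppliers.setdefault((name, city), {"children": []})
        sup["children"].append(item_records[code])      # pointer to the shared record
        item_records[code]["parents"].append((name, city))
    return suppliers, item_records

hierarchy = to_hierarchy(SUPPLY, ITEMS)
suppliers, item_records = to_network(SUPPLY, ITEMS)

# Count how many separate copies of I0003 the hierarchy had to create.
table_copies = sum(child["code"] == "I0003"
                   for children in hierarchy["SupplyData"].values()
                   for child in children)
print("copies of I0003 in the hierarchy:", table_copies)
print("parents of I0003 in the network:", item_records["I0003"]["parents"])
```

In the relational view no nesting or pointers are needed at all: the relationship is carried purely by the matching ItemCode values in the two tables.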

Transaction Management
When working with a database, there may arise certain situations, when a particular
transaction involves two or more separate operations which form one logical unit
of work. For example consider the situation in a stock transfer. Suppose ‘x’ units of
item-t are transferred from the store in a factory to the showroom for sale. For a
valid transfer, the stock of item-t in the factory should get reduced by ‘x’ units
and simultaneously the stock of item-t in the showroom should get increased by ‘x’
units to keep the total number of units constant before and after the transfer. The
transaction will be incomplete and erroneous if either the factory stock or the
showroom stock is not updated due to some errors during the transfer. Thus either
both the operations should occur or neither should occur. This all-or-none
requirement is called atomicity. A similar situation arises in case of money
transfer from one bank account to another. There the debit from one account must
be accompanied by a credit to another account simultaneously.
Moreover in case of money transfer, the total amount involved in the transaction
should be constant. Therefore an increase in the account A should correspond to a
decrease in the account B, i.e. the sum of the money in account A and that in
account B should be preserved. This requirement to maintain the correctness of
the transfer is called consistency. After a particular transfer is over, the database
should be able to preserve the new values in spite of any system snag or failure.
This property is called durability.
We call this collection of separate operations that form a single logical unit of
work, a transaction. Each transaction forms one unit of both atomicity and
consistency. In our above example, the change of records in the two accounts was
carried out by two separate operations or programs. Here each program by itself
does not transfer the database from one consistent state to another. Hence each
program by itself does not carry out a transaction as the atomicity property is not
satisfied in such an operation. Thus in case all the operations in a transaction do not
take place due to a system failure or any other mishap, a failed transaction should
have no effect on the state of the database and the database must be restored to
the previous state before the said transaction had started.
It is the responsibility of the Transaction Management Module of a DBMS to
preserve the state of the database in case of any failures. Moreover it is the
responsibility of the database programmer to design the database in such a manner
so as to maintain these two properties in a transaction.
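The all-or-none behaviour described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a full transaction manager: the table name stock, the quantities and the SimulatedFailure exception are assumptions made for the example. The two updates run inside one transaction, so a failure between them leaves the database in its previous state.

```python
import sqlite3

class SimulatedFailure(Exception):
    """Stands in for a crash or error between the two operations."""

def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stock (place TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    con.executemany("INSERT INTO stock VALUES (?, ?)",
                    [("factory", 50), ("showroom", 10)])
    con.commit()
    return con

def stock(con):
    """Return the current stock as {place: qty}."""
    return dict(con.execute("SELECT place, qty FROM stock"))

def transfer(con, units, fail_midway=False):
    """Move units from factory to showroom as ONE logical unit of work."""
    try:
        with con:  # commits if the block completes, rolls back if it raises
            con.execute("UPDATE stock SET qty = qty - ? WHERE place = 'factory'", (units,))
            if fail_midway:
                raise SimulatedFailure("showroom update never happened")
            con.execute("UPDATE stock SET qty = qty + ? WHERE place = 'showroom'", (units,))
        return True
    except SimulatedFailure:
        return False

con = make_db()
failed = transfer(con, 20, fail_midway=True)
after_failure = stock(con)      # unchanged: the factory debit was rolled back
succeeded = transfer(con, 20)
after_success = stock(con)      # both updates applied; the total is still 60
print(failed, after_failure)
print(succeeded, after_success)
```

The total stock (60 units here) is the same before and after each call, which is the consistency requirement; durability is not demonstrated because the sketch uses an in-memory database.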

DB01 – Definitions and Concepts Page 11 of 18 © Joyrup Bhattacharya


Database Management System (DBMS)
A Database Management System or DBMS is a collection of software programs that
enables users to define, create, maintain and manipulate a database for
various applications.
[Figure: layers of a database system. Users and Programmers; Application
Programs / Queries; DBMS Software comprising the Query Processor, Transaction
Manager and Storage Manager (File Manager + Buffer Manager); and the Database
holding Data (Database Data), Metadata (Database Definition) and Indexes]
The first step in handling a database is to define the database. This includes
specifying the physical and logical structure of the database, defining the
data types, the constraints imposed on the data, etc. This is usually done
using Data Definition Languages (DDL).
Once the logical and physical structure of the database is defined, the next
step is creating the database. This implies populating the database i.e.
actually entering data into a storage medium to form the database.
The final step is to manipulate the database to enter, retrieve or update data
using special application programs that incorporate statements in special Data
Manipulation Languages (DML).
We can thus summarise the different functions of a DBMS as:
1. Perform data storage and retrieval functions and handle user queries
2. Implement data manipulation procedures developed by the administrators
3. Enforce database security at the physical and logical level
4. Interface with the OS to allocate computer resources like printers etc. to
users
5. Implement back up and recovery in case of system crashes, power outages
etc.
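The define, create and manipulate steps described above can be tried out with a small sketch. It assumes SQLite (through Python's built-in sqlite3 module) as the DBMS, and the Items table and its rows are illustrative. SQLite keeps its metadata in a system table called sqlite_master, which the sketch reads back after the DDL step.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Step 1, define: DDL describes the structure, data types and constraints.
con.execute("CREATE TABLE Items (ItemCode TEXT PRIMARY KEY, Item TEXT NOT NULL)")

# The definition itself is stored as metadata in a system table.
metadata = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

# Step 2, create: populate the database with actual data.
con.executemany("INSERT INTO Items VALUES (?, ?)",
                [("I0001", "Fridge"), ("I0002", "Almirah"), ("I0003", "Table")])

# Step 3, manipulate: DML statements update, delete and retrieve rows.
con.execute("UPDATE Items SET Item = 'Refrigerator' WHERE ItemCode = 'I0001'")
con.execute("DELETE FROM Items WHERE ItemCode = 'I0003'")
rows = con.execute("SELECT ItemCode, Item FROM Items ORDER BY ItemCode").fetchall()
con.commit()

print(metadata)
print(rows)
```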
The above figure shows the essential parts of a Database Management
System. These are now described in detail.
1) Database: The lowest level forms the database where the raw data is stored.
At this level, we have the metadata and indexes stored along with the data.
As discussed earlier, metadata deals with information related to the structure of
the data. Apart from these the database level also contains another type of data
structure called indexes, which are used to find data items quickly in a database
and hence help to improve database performance.
2) DBMS Software: The next higher level is the DBMS software. It consists of
several modules used to manipulate and process the data in the database. The
different modules that are used include the following:
a) Storage Manager: The function of this module is to modify the
information in the database and retrieve information from the database,
when requested by the higher levels. It thus serves as an interface between
the low level data stored in the database and the application programs and
queries submitted to the database system. It translates the various DML
statements into low level file system commands. Thus in a simple
database, the storage manager may be the file system of the underlying
operating system itself. However in larger databases it may consist of the
following components:
i) Authorisation and Integrity Manager: It checks whether a user is
authorised to access the database. It is also responsible for maintaining the
integrity of the system. To maintain the integrity, it interacts with the
Query Processor to find out what data is being operated upon by the
current queries. In case of several queries running in the system, it takes
care so that no two queries interfere with each other.
ii) Transaction Manager: It keeps track of the changes made to the data to
recover lost data in case of a system failure and maintain a consistent
state of the database. It maintains a data log containing a record of the
changes made, so that after the system has recovered from a failure,
completed changes can be reapplied and incomplete ones undone. It also manages
the simultaneous execution of different transactions so that they do not
conflict.
iii) The Buffer Manager: The buffer manager is used to handle main
memory. It obtains blocks of data from the disk and allocates the blocks
to a portion of the main memory. The buffer manager will keep a block in
the main memory as long as it is required and will return the block to the
disk if the main memory is needed by another block.
iv) The File Manager: The file manager is used to keep track of file
locations on the disk. A file is stored in the storage device in a collection of
disk blocks. When requested by the buffer manager, the file manager
obtains the required block or blocks that contain a particular file.
b) Query Manager/Processor: The job of the Query Processor is to convert a
query as submitted by the user, and expressed in a high level language (like
SQL), into a sequence of low level commands to the Storage Manager to
retrieve the appropriate information. It also handles requests for
modification of data and metadata. It is usually made up of the
following modules:
i) DML Compiler: This module is used to translate DML statements in a
query language (like SQL) to a low level language that the query
evaluation engine understands. The DML compiler also optimises the user
queries to increase the efficiency of the queries.
ii) Embedded DML pre-compiler: This module interacts with the DML
Compiler to generate the appropriate codes for DML statements
embedded in an application program.
iii) DDL Interpreter: This module is responsible for interpreting DDL
statements and tabulating them in a set of tables called system tables
that contain the metadata.
iv) Query Evaluation Engine: This module receives low level instructions
from the DML Compiler and executes them to retrieve the required data
from the database.
3) Application Programs: Users interact with a database through application
program interfaces. A typical DBMS allows programmers to write application
programs that through system calls to the DBMS are able to manipulate
data in a database. The most frequent interaction with a database is to query a
database. Apart from queries, application programs are also written to modify
data or modify the database schema. However access is given only to
database administrators to modify an existing schema or create a new database.
There may be several application programs that are used by different user types.
4) Users: At the outermost level are the end users as described earlier, who are
responsible for maintaining and accessing the database. These include the
database administrator, the sophisticated users, the specialised users and the
inexperienced users.
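As a rough illustration of the indexes and the query processor described above, the sketch below asks SQLite how it intends to evaluate the same query before and after an index is created. It assumes SQLite as the DBMS; the table, the index name idx_items_code and the helper plan are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Items (ItemCode TEXT, Item TEXT)")
con.executemany("INSERT INTO Items VALUES (?, ?)",
                [(f"I{n:04d}", f"Item-{n}") for n in range(1, 1001)])

def plan(sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT Item FROM Items WHERE ItemCode = ?"

before = plan(query, ("I0500",))   # no index yet: every row must be scanned
con.execute("CREATE INDEX idx_items_code ON Items (ItemCode)")
after = plan(query, ("I0500",))    # the optimiser can now search via the index

print(before)
print(after)
```

The query text is identical in both cases; the user never says how to find the row, and the DBMS changes its access path once the index exists.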

Data Administrator (DA) & Database Administrator (DBA)


It is the job of a special category of people in an organisation to determine whether
a database technology has been successfully developed and implemented. These
people are termed as Data Administrators (DA). The job of a DA is to look after
the following:
1. Strategic Planning: The DA is the key person involved in strategic planning
of data resources and determines the major business areas or processes the
database should serve.
2. Determine Data Requirement: The DA decides what data will be stored in
the database to carry out these processes and their corresponding data sources.
3. Determine Access Policies: The DA lays down policies for accessing and
maintaining the database and determines the access rights of the different
database users.
4. The DA plays a business oriented role in determining the business
strategies and policies involved in using a DBMS. To do so he should have
access to the top-level management and should be granted a wide range of
authority in connection with the database.

A Database Administrator (DBA) on the other hand is a technical person who is
responsible for defining the internal model of a database. He is the person who
creates and maintains a database. To design a database, the DBA first has to
consult the users to determine their specific requirements. He then determines
the physical storage requirement of the data, the accuracy requirement, frequency
of data access, search strategies, and security levels of different data. The DBA also
identifies the different data sources and the persons responsible for entering and
updating the data. Finally with all the specifications available, the DBA converts
these requirements into a physical design which specifies the hardware
requirements of the database.
Depending upon the above functions, we can classify the different jobs of a DBA
as:
1. Schema Definition: The original database schema is created by the DBA
by writing a set of definitions. These are then translated by the DDL compiler to
form a set of tables consisting of metadata that is stored in the data dictionary.
2. Storage Structure & Access Method Definition: The storage structures of
different data types and their access mechanisms are defined, guided by the
need to efficiently store and retrieve the data. These definitions are then
translated by the DDL compiler to form the actual data structures.
3. Schema Modification: In case there is the rare need to modify the logical or
the physical schema, the DBA is responsible to write a set of definitions that
are translated by the DDL compiler to accomplish the required modification to
the internal system tables.
4. Data Access Authorisation: Every user of the database may not be required to
access the entire database. Moreover some users may be allowed to modify data
while others may be allowed to view data only. It is the DBA who is responsible
for granting rights to different classes of users. This authorisation data is
kept in a special system file which is consulted by the DBMS whenever a user
wants to access the database.
5. Integrity Constraint Specification: Based on certain business rules or other
criteria there may be certain constraints on certain data types. For example a
bank may require a minimum account balance, below which a customer will
not be able to withdraw money. The DBA is required to specify all such integrity
constraints explicitly.
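The minimum balance rule mentioned above can be declared once in the schema so that the DBMS itself rejects any update that breaks it. The sketch below is an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module, and the table account, the account number A-101 and the limit MIN_BALANCE are made up for the example.

```python
import sqlite3

MIN_BALANCE = 1000  # hypothetical business rule: the balance may never fall below this

con = sqlite3.connect(":memory:")
# The constraint is part of the table definition written by the DBA.
con.execute(
    "CREATE TABLE account ("
    " acc_no TEXT PRIMARY KEY,"
    f" balance INTEGER NOT NULL CHECK (balance >= {MIN_BALANCE}))")
con.execute("INSERT INTO account VALUES ('A-101', 5000)")
con.commit()

def withdraw(acc_no, amount):
    """Try to withdraw; the DBMS refuses if the constraint would be violated."""
    try:
        with con:  # commit on success, roll back on error
            con.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                        (amount, acc_no))
        return True
    except sqlite3.IntegrityError:
        return False

def balance(acc_no):
    return con.execute("SELECT balance FROM account WHERE acc_no = ?",
                       (acc_no,)).fetchone()[0]

first = withdraw("A-101", 3000)    # leaves 2000, which is allowed
second = withdraw("A-101", 1500)   # would leave 500, so it is rejected
print(first, second, balance("A-101"))
```

Because the rule lives in the schema rather than in each application program, every program that updates the table is subject to it.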

Advantages and Disadvantages of using a DBMS
The advantages of using a DBMS over a File Processing System are:
1. Minimised data duplication: In a DBMS, a particular data item is stored in
one place only. Whenever any application is required to access the data, the
DBMS retrieves the data for the application from that place. Since a
particular data item is stored at a single place, storage space is saved.
Moreover when an update is required, data needs to be updated at one place
only. This avoids the data integrity problem that arises when duplicate copies
of the same data disagree.
[Figure: the Supplier Processing, Order Processing and Payment Processing
Applications and their users (Supplier File User, Order File User, Payment
File User) all accessing a single Database through the DBMS]

2. Data remains together: In a Database system, all data are stored at a
single place called a database. Whenever an application program requires
some data, the DBMS retrieves the data from the database. In case data from
multiple locations need to be combined, the DBMS does the same by retrieving
the required data from the database.
3. File format independent application programs: In a Database system the
application programs that access the data, interact with the data
through the DBMS and not directly with the database. In case any change
occurs in the data formats, the DBMS takes care of the same. Thus physical and
logical data independence makes application programs independent of schema
modifications.
4. Compatibility between different files: In a Database Processing system, the
application programs do not interact directly with the data files, instead they
interact with the DBMS. The DBMS in turn interacts with the database files to
generate the required results. Hence, in case different programming
platforms are used to develop the application programs, they need to
interact only with the DBMS and not with the different data files. Thus the
question of compatibility in formats of different data files does not arise.
5. User Friendly Interfaces: Database technology makes it easier to represent
data in a user friendly manner by combining data from different tables as
required.
In spite of the huge success of a DBMS over a conventional file processing system,
there are certain limitations of the DBMS approach as described below:
1. Concurrency Problems: In case a DBMS package is not designed for multiple
users, problems can arise when more than one user wants to access the
database simultaneously. This problem of concurrently accessing the same
record in a database is known as the concurrency problem.
For example let two persons A and B have a joint bank account. Suppose two
of them simultaneously view their bank balance from two different ATMs. Let the
bank balance shown be Rs. 40,000/-. Suppose A withdraws Rs. 20,000/- and
closes the transaction whereby the DBMS program writes back a balance amount
of 20,000/-. However B still sees the bank balance as Rs. 40,000/- as no change
is made to the screen view of person B after the transaction by A. So B now
withdraws Rs 25,000/- and closes the transaction which writes back the balance
record as (40,000-25,000) = Rs.15,000/- by overwriting the previous record
balance of Rs. 20,000/- as entered by A. Thus at the end of the transaction, the
account shows a balance of Rs. 15,000/- when actually, there is a negative
balance of Rs. 5000/-.
One can avoid a concurrency problem by locking a file when it is used by one
person, so that it is not available for another person at the same time. Another
method is to lock the particular record that is accessed by one user, so that
the file may be available for another user for accessing other records.
2. Ownership Problem: In a file based system, generally data in a particular file is
handled by a particular individual. When a database is created using those files,
the data is no longer the specific property of the application user, but
instead is owned by the entire company. Any user with an access right should be
able to access or use the data. Giving up ownership of data may be
difficult for employees and managers to accept.
3. Resource Problem: When a DBMS is implemented, the amount of data that
needs to be accessed and manipulated also increases. To handle the new
database and run the DBMS programs, extra resources or upgradation of
existing resources may be required. Thus extra terminals, printers, storage
devices, servers, communication devices, etc. may need to be purchased. This
adds to the cost of setting up a DBMS.
4. Security Problem: The DBMS should be able to give access to the database to
authorised personnel only. Security considerations should include means of
controlling physical access to terminals, storage devices, and specific interface
forms for updating or deletion of records.
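The joint account example under Concurrency Problems above can be replayed in code. The sketch below is a sequential simulation, not real concurrent access, and it uses SQLite through Python's sqlite3 module; the table and function names are assumptions. The first half reproduces the lost update. The second half shows one common remedy, which is a single conditional UPDATE evaluated against the stored balance rather than the explicit file or record lock described in the text.

```python
import sqlite3

def open_joint_account(opening_balance):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
    con.execute("INSERT INTO account VALUES ('JOINT', ?)", (opening_balance,))
    con.commit()
    return con

def balance(con):
    return con.execute(
        "SELECT balance FROM account WHERE acc_no = 'JOINT'").fetchone()[0]

# Unsafe pattern: each ATM reads the balance, computes locally, then writes back.
con = open_joint_account(40000)
seen_by_a = balance(con)            # A sees 40,000
seen_by_b = balance(con)            # B also sees 40,000
con.execute("UPDATE account SET balance = ? WHERE acc_no = 'JOINT'",
            (seen_by_a - 20000,))   # A writes back 20,000
con.execute("UPDATE account SET balance = ? WHERE acc_no = 'JOINT'",
            (seen_by_b - 25000,))   # B overwrites it with 15,000
lost_update_balance = balance(con)

# Safer pattern: the DBMS applies the change to the CURRENT stored value.
def withdraw(con, amount):
    with con:
        cur = con.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE acc_no = 'JOINT' AND balance >= ?", (amount, amount))
    return cur.rowcount == 1        # False means there were insufficient funds

con2 = open_joint_account(40000)
a_ok = withdraw(con2, 20000)
b_ok = withdraw(con2, 25000)
safe_balance = balance(con2)

print(lost_update_balance)
print(a_ok, b_ok, safe_balance)
```

In the unsafe run Rs. 45,000 is paid out of a Rs. 40,000 account while the record shows Rs. 15,000; in the safer run the second withdrawal is refused and the balance stays at Rs. 20,000.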

Questions from this Section


1. State the major differences between a file processing system and a DBMS. 4
2. What are the disadvantages of a conventional file system? 3
3. What is the integrity problem? 4
4. What is a Database? 2
5. What are the levels of data abstraction? Explain each of them briefly. 2+4
6. What is a Database Schema? What is a DB instance? 3
7. Describe the three schema architecture of a database. 3
8. What is the difference between logical and physical data independence? 4
9. What are the different types of Database users? 3
10. State the different database languages. 3
11. Distinguish between DDL and DML. 4
12. What are the basic characteristics of DML? What are the types of DML? 3+3
13. What do you mean by atomicity and consistency? 2+2
14. What do you mean by a transaction? 3
15. Name different types of database models. 2
16. What is a DBMS? State the advantages and disadvantages of a DBMS. 2+4
17. What are the components of a Query Processor? 4
18. What are the components of a Storage Manager? 4
19. What are the major functions of a DBA? 4
20. What are the responsibilities of DBA and that of a database designer? 4