
MODULE – I

DATABASE SYSTEM ARCHITECTURE


Introduction
 Data are raw facts that can be recorded: facts and statistics that are stored or that flow over a
network. Data is generally raw and unprocessed.
 Information is processed data, that is, data turned into something meaningful. Information to one user
may be data for another.
 A database is an organized, persistent collection of data. It captures the essential properties of objects
and records the relationships among them. It is a collection of related data organized so that the data can
be easily accessed, managed and updated.

Database Applications:
Databases are widely used in various areas as below:
o Banking: all transactions information.
o Airlines: reservations, schedules information.
o Universities: registration, grades information.
o Sales: customers, products, purchases information.
o Online retailers: order tracking, customized recommendations
o Manufacturing: production, inventory, orders, supply chain
o Human resources: employee records, salaries, tax deductions.
o Credit card transactions: For purchases on credit cards and generation of monthly statements.
o Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
o Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds.

Disadvantages of earlier File System


 In the early days, database applications were built directly on top of file systems.
 Drawbacks of using file systems to store data:
 Data redundancy and inconsistency
 Difficulty in accessing data

 Integrity problems
 Atomicity of updates
 Concurrent access by multiple users
 Security problems

Overview of Database Management System (DBMS)


 A database-management system (DBMS) is software that manages a database.
 It allows creation, definition and manipulation of a database, allowing users to store, process and
analyze data easily.
 It is a collection of interrelated data and a set of programs to access those data.
 The collection of data, usually referred to as the database, contains information relevant to an
enterprise. DBMS controls access to the data and provides features for database creation, data
manipulation such as data value modification, data retrieval, data integrity and security etc.
 The primary goal of a DBMS is to provide a way to store and retrieve database information that is both
convenient and efficient.
 A Database System consists of
o DBMS (which organizes and maintains the information)
o Database Application (which allows operations on data stored in DBMS).

Advantages of Using the DBMS Approach
i. Controlling redundancy and saving space.
ii. Restricting Unauthorized Access
iii. Inconsistency is avoided, as all changes are made at one site.
iv. Providing Storage Structures and Search Techniques for Efficient Query Processing
v. Providing Backup and Recovery
vi. Providing Multiple User Interfaces.
vii. Providing data sharing.
viii. Representing Complex Relationships among Data
ix. Enforcing Integrity so that data fetched will be correct.

Limitations of DBMS
i. High initial investment in hardware, software, and training.
ii. Overhead for providing security, concurrency control, recovery, and integrity functions

Here are some examples of popular databases:


 MySQL
 Oracle
 SQL Server
 IBM DB2
 PostgreSQL
 Amazon SimpleDB (cloud based) etc.

Characteristics of Database Management System


Data stored into Tables:
 Data is stored into tables, created inside the database.
 A DBMS also allows relationships to be defined between tables, which makes the data more
meaningful and connected.
Reduced Redundancy:
 A DBMS follows normalisation, which divides the data so that repetition is kept to a minimum.
Data Consistency:
 When data is continuously being updated and added to, a DBMS makes it easy to keep the data
consistent.
Support for Multiple Users and Concurrent Access:
 A DBMS allows multiple users to work on it (update, insert, delete data) at the same time.
Query Language:
 DBMS provides users with a simple Query language, using which data can be easily fetched,
inserted, deleted and updated in a database.
Security:
 The DBMS takes care of the security of data by protecting them from un-authorised access.
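
The characteristics above (tables, relationships between tables, a query language) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the department and student tables and their columns are assumptions made for the example, not part of these notes.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE student ("
    " rollno INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)

conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "John", 1), (102, "Shyam", 1)],
)

# The department name is stored once and reached through the relationship,
# which is how redundancy is kept low.
rows = conn.execute(
    "SELECT s.name, d.name FROM student AS s"
    " JOIN department AS d ON d.dept_id = s.dept_id"
    " ORDER BY s.rollno"
).fetchall()
print(rows)
```

Storing the department name in one row and referring to it by key means a rename touches a single place, which is the point of the reduced-redundancy and consistency characteristics.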

DATA MODELS:
A data model is a collection of conceptual tools that describe the data, their relationships and
consistency constraints.

Various types of data models are:


i. High level or conceptual based logical model
 They are used in describing data at conceptual and view level.
 They have fairly flexible structuring capabilities and allow data constraints to be specified
explicitly.
 High-level conceptual data models provide concepts for presenting data in ways that are close to
the way people perceive data.
 A typical example is the entity relationship model, which uses main concepts like entities,
attributes and relationships.
ii. Low level or physical data model
They are used to describe the data at the lowest level (i.e. how data are stored), such as record formats,
access paths, etc.
iii. Record based logical model
 The database is structured in fixed format records.
 They specify the overall logical structure of the database and are used in describing the database
at the conceptual level.
 Record-based logical data models provide concepts users can understand but are not too far from
the way data is stored in the computer.

 Various models are:


a. File management system
Earliest method: data is stored and searched sequentially, but relationships between data items
cannot be represented.
b. Hierarchical database system
Data represented by a simple tree structure.
c. Network database system
Data represented by records and links, allowing relationships
d. Relational database system
Data is represented as relations, or tables.
e. Object Relational model
Supports both object oriented and relational concepts, allowing structures to be reused

ARCHITECTURE
There are two ways to view the architecture of a DBMS
i. Logical DBMS architecture
It deals with the way data is stored and presented to the user.
ii. Physical DBMS architecture
It is concerned with the software components that make up the DBMS.

LOGICAL OR THREE-LEVEL ARCHITECTURE


It describes how data in the database is perceived by the user. It is not concerned with how data
is handled and processed, but only with how it looks. The most popular architecture is the ANSI/SPARC
model (American National Standards Institute / Standards Planning and Requirements Committee).

Schema
The overall design or description of a database is the database schema, which is specified during the database
design and is not expected to change frequently.

Instance
The collection of the data stored in the database at a particular moment of time is known as a database instance
or database state or snapshot.
DATA ABSTRACTION
The main purpose of a database system is to provide users with an abstract view of data, i.e. it hides the
details of how data are stored and maintained (the complexities) through several levels of abstraction.

The ANSI / SPARC model divides the system into three levels of abstractions.
i. Physical or Internal level.
 It is the lowest level of abstraction and describes how the data are actually stored.
 The physical level describes complex low-level data structures in detail.
 It is expressed by an internal schema.
ii. Logical or Conceptual level.
 The next-higher level of abstraction describes what data are stored in the database, and what
relationships exist among those data.
 It presents a logical view of the entire database as a unified whole.
 It is expressed by a conceptual schema; there is only one conceptual schema per database.
 The DBMS provides a DDL (Data Definition Language) for specifying this schema; the DDL defines the
content only.
iii. View or External level
 It is the highest level of abstraction which allows the user to see only the data of their own
interest.
 There can be any number of external views, each described by an external schema.
 Thus the interactions are simplified and the system may provide many views for the same
database.

Mapping between the levels


The three levels of abstraction do not exist independently. There is a mapping, or correspondence, between
the levels. It is mainly of two types (a third, rare type is listed last):
i. Conceptual / Internal mapping:
 It defines the correspondence between the conceptual and physical levels, i.e. between the global
and internal levels.
 If the structure of the stored database is changed, then this mapping must also be changed
accordingly, so that the view from conceptual level remains constant.

ii. External / Conceptual mapping:
 It defines the correspondence between a particular external view and conceptual level.
 If the structure of the database at the conceptual level is changed, then this mapping must be
changed accordingly, so that the view from external level remains constant.

iii. External / External mapping (rare):


 It defines the correspondence between one external view and another external view.

DATA INDEPENDENCE
The three levels of abstraction, along with the mappings, provide two distinct levels of data independence.
The ability to modify a schema at one level without changing the schema at the next higher level is known as
data independence. It is of two types:
i. Logical data independence:
It is the ability to modify the conceptual or global schema without changing the external or user
schema or application programs. The change is absorbed by the external / conceptual mapping.
ii. Physical data independence:
It is the ability to modify the physical or internal schema without changing the conceptual (or
external) schema or application programs. The change is absorbed by the conceptual / internal
mapping.
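
Both kinds of data independence can be observed in a small sketch using Python's sqlite3 module. The table, view and index names are assumptions chosen for illustration; a view stands in for an external schema, adding a column stands in for a conceptual-level change, and creating an index stands in for a physical-level change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, mark REAL)")
conn.execute("INSERT INTO student VALUES (93045, 'John', 75.0)")

# External schema: what one group of users or programs sees.
conn.execute("CREATE VIEW student_names AS SELECT rollno, name FROM student")

def read_view():
    # Stands in for an application program written against the view.
    return conn.execute("SELECT rollno, name FROM student_names").fetchall()

before = read_view()

# Conceptual-level change: the table gains a column.
conn.execute("ALTER TABLE student ADD COLUMN address TEXT")
after_logical_change = read_view()

# Physical-level change: a new access path is added.
conn.execute("CREATE INDEX idx_student_name ON student(name)")
after_physical_change = read_view()

print(before, after_logical_change, after_physical_change)
```

The function read_view is never edited, yet it returns the same rows after each change, which is what the two forms of independence promise.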

Objectives of three level architecture


i. To separate each user’s view of database from the physically represented database.
ii. Support of multiple customized user views.
iii. Insulation between the user programs and the data that doesn’t concern them.
iv. Allows users to concentrate on general structure rather than low level implementation details.

Example:

User view        User1                   User2
                 Name   - John           Name - Shyam
                 Rollno - 093045         Mark - 75

Conceptual view  Name    - string
                 Rollno  - number
                 Mark    - number
                 Address - string

Physical view    Name    - string of length 25, starting address xxx and offset xxx
                 Rollno  - number without decimal, starting address xxx and offset xxx
                 Mark    - number with decimal, starting address xxx and offset xxx
                 Address - string of length 50, starting address xxx and offset xxx
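
The example above might be realised roughly as follows with Python's sqlite3 module. This is a sketch under assumptions: the view names user1_view and user2_view are made up, Rollno is stored as text only to keep the leading zero, and every value other than John's roll number and Shyam's mark is filler. The physical level is left to SQLite and is not visible in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one logical record type with all four fields.
conn.execute("CREATE TABLE student (name TEXT, rollno TEXT, mark REAL, address TEXT)")

# Only John's roll number and Shyam's mark come from the example; the rest is filler.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("John", "093045", 60.0, "address-1"), ("Shyam", "093046", 75.0, "address-2")],
)

# External level: each user gets a view exposing only the fields of interest.
conn.execute("CREATE VIEW user1_view AS SELECT name, rollno FROM student")
conn.execute("CREATE VIEW user2_view AS SELECT name, mark FROM student")

user1_row = conn.execute("SELECT * FROM user1_view WHERE name = 'John'").fetchone()
user2_row = conn.execute("SELECT * FROM user2_view WHERE name = 'Shyam'").fetchone()
print(user1_row)
print(user2_row)
```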

PHYSICAL ARCHITECTURE OR COMPONENTS


It describes the software components and their interconnection. It usually varies but the main components of
DBMS are as follows:

i. DML Precompiler :
Data Manipulation Language (DML) defines the set of commands that modify and process data for
output. The DML Precompiler converts DML statements embedded in an application program into
normal procedure calls in the host language. It interacts with the query processor to generate the code.

ii. DDL Compiler:


Data Definition Language (DDL) contains commands to define the format of the stored data. The DDL
Compiler converts the DDL statements into a set of tables containing data about the whole database,
which can be used by other components. These tables are stored in the Data Dictionary.

iii. File Manager:


It manages the allocation of space on storage location. It maintains the internal schema. It interacts
with the Disk manager, which is a part of the operating system, and transfers blocks of space.

iv. Database Manager:


It is the central software component of DBMS or its control system. The basic functions of Database
Manager are:
a. Interaction with the File manager
b. Enforcing constraints or checks
c. Enforcing security (i.e. only authorized access)
d. Concurrency control (i.e. simultaneous access by many users)
e. Backup and recovery

v. Query Processor:
It changes the query statement from its English-like syntax into a form the DBMS understands. It
usually consists of two parts:

a. Parser
It checks the syntax of the statements, by breaking it into basic units. It ensures that each
statement consists of proper component parts.

b. Query Optimizer
It tries to choose the best and most efficient way of executing the query, by generating several
query plans (i.e. different orderings of the operations) and estimating which plan will be
executed most efficiently. The factors taken into consideration are CPU time, disk time, network
time, sorting time and scan methods.
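
One way to watch an optimizer choose between plans is SQLite's EXPLAIN QUERY PLAN statement, called here from Python. This is a sketch, not a description of how any particular DBMS optimizes: the table, the index name idx_student_name and the generated rows are assumptions, and the exact wording of the plan text differs between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, mark REAL)")
conn.executemany(
    "INSERT INTO student (name, mark) VALUES (?, ?)",
    [(f"student{i}", float(i % 100)) for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT rollno FROM student WHERE name = 'student42'"

# With no index on name, the only available plan is to scan the table.
plan_without_index = plan(query)

# After an index is created, the optimizer can pick an index search instead.
conn.execute("CREATE INDEX idx_student_name ON student(name)")
plan_with_index = plan(query)

uses_index_before = any("idx_student_name" in step for step in plan_without_index)
uses_index_after = any("idx_student_name" in step for step in plan_with_index)
print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both runs; only the set of available access paths changed, and the optimizer picked a different plan accordingly.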

vi. Data Files and Data Dictionary


Data dictionary stores the information about the structure of the database. It provides the definition
of the things and where to find them. It contains the information or data about the data. It stores information
concerning external, conceptual, and internal levels of the database. Its documentation helps the end users
and managers. It acts as a road map which guides users through the large database.

DATABASE USERS
 A primary goal of a database system is to retrieve information from and store new information in the
database. People who work with a database can be categorized as database users or database
administrators.
 There are four different types of database-system users, differentiated by the way they expect to
interact with the system.
1. Naive users :
 They are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously.
For example, a bank teller who needs to transfer $50 from account A to account B
invokes a program called transfer. This program asks the teller for the amount of money to be
transferred, the account from which the money is to be transferred, and the account to which the money
is to be transferred.
Examples: people accessing a database over the web, bank tellers, clerical staff.
 The typical user interface for naive users is a forms interface, where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated from the
database.
2. Application programmers :
 They are computer professionals who write application programs.
 Application programmers can choose from many tools to develop user interfaces.
 Rapid application development (RAD) tools are tools that enable an application programmer
to construct forms and reports without writing a program.
 They must be familiar with the DBMSs to accomplish their task.

3. Sophisticated users :
 They interact with the system without writing programs.
 They form their requests in a database query language.
 They submit each such query to a query processor, whose function is to break down DML
statements into instructions that the storage manager understands.
 Analysts who submit queries to explore data in the database fall in this category.
 Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view
summaries of data in different ways.
For instance, an analyst can see total sales by region or by product, or by a combination
of region and product.

4. Specialized users :
 They are sophisticated users who write specialized database applications that do not fit into the
traditional data-processing framework.
 Among these applications are computer-aided design systems, knowledge base and expert
systems, systems that store data with complex data types etc.
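
The kind of summary mentioned under sophisticated users (total sales by region, by product, or by both) can be sketched as plain GROUP BY queries run through Python's sqlite3 module. The sales table and its rows are invented for illustration; real OLAP tools offer far more than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Pen", 100.0),
        ("North", "Book", 250.0),
        ("South", "Pen", 150.0),
        ("South", "Book", 50.0),
        ("South", "Pen", 25.0),
    ],
)

# Total sales by region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Total sales by product.
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()

# Total sales by the combination of region and product.
by_region_and_product = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales"
    " GROUP BY region, product ORDER BY region, product"
).fetchall()

print(by_region)
print(by_product)
print(by_region_and_product)
```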

Database Administrator(DBA)
Centralized control of the database is exerted by a person or group of persons, under the supervision
of a high-level administrator known as the Database Administrator (DBA). The DBA is responsible for
creating, modifying and maintaining the three database levels. The basic functions of the DBA are:
a. Schema definition:
DBA creates the original database schema by writing definitions, which are stored
permanently in the data dictionary.
b. Storage structure and access method definition

DBA creates appropriate storage structures and access methods by writing a set of
definitions.
c. Schema and physical-organization modification
DBA is involved in the rare modifications to the database schema or to the description of
physical organization.
d. Granting of authorization for data access.
DBA grants various types of authorization to various users.
e. Routine maintenance.
 Periodically backing up the database
 Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
 Monitoring jobs running on the database
f. Integrity constraints
DBA specifies the constraints, which are checked before any data addition, modification
etc.
g. Recovery
DBA is also responsible for recovery of database from failures.
h. Overall custodian
DBA is the overall custodian and controls the database.

DATABASE LANGUAGES
DBMS provides different types of languages. They are:

a. Data Definition Language (DDL)


 The database schema is specified by a set of definitions expressed by a special language called Data
Definition Language(DDL).
 These commands are basically used to change or modify the formats of various objects (tables, views
etc..) in the database.
 The compiled form of DDL is a set of tables stored in a file called Data Dictionary or Data Directory,
which is consulted before any data manipulation operation.

 A data dictionary contains metadata—that is, data about data. The schema of a table is an example of
metadata.
 The storage structure and the access methods used by the DBMS are specified by a set of definitions in a
special type of DDL called Data Storage and Definition language. The compiled forms of these
specify the implementation details of the database schemas, which are usually hidden from the users.
 The data values stored in the database must satisfy certain consistency constraints. For example, the
balance on an account may be required not to fall below $100. The database system checks these
constraints every time the database is updated.

Examples of DDL commands:


CREATE, ALTER and DROP commands.
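
A minimal sketch of the three DDL commands, run through Python's sqlite3 module. The account table and its columns are assumptions for the example; PRAGMA table_info is a SQLite-specific way of listing a table's columns and is used here only to show the effect of each command.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# CREATE defines a new object in the schema.
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")
after_create = columns("account")

# ALTER changes the format of an existing object.
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")
after_alter = columns("account")

# DROP removes the object, and its data, from the schema.
conn.execute("DROP TABLE account")
after_drop = columns("account")

print(after_create, after_alter, after_drop)
```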

b. Data Manipulation Language(DML)


 The language used to manipulate the data in the database is called Data Manipulation Language(DML).
 These commands are basically used to change or modify the content of the database.
 Data Manipulation involves :
o The retrieval of information stored in the database
o The insertion of new information into the database
o The deletion of information from the database
o The modification of information stored in the database
 A query is a statement requesting the retrieval of information. The portion of a DML that involves
information retrieval is called a query language. The terms query language and data manipulation
language are often used synonymously, although this is not technically correct.

 Example of DML commands are:


INSERT, DELETE, SELECT and UPDATE commands.
 DML can be of two types:
a) Procedural DMLs require a user to specify what data are needed and how to get those data. They
can be more efficient, because the user controls how the data are reached.
b) Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data
are needed without specifying how to get those data. They are usually easier to learn and use.
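
The four DML operations, and the contrast between declarative and procedural retrieval, can be sketched with Python's sqlite3 module. The student table and its rows are illustrative assumptions. SQL itself is declarative, so the procedural style is only imitated here by a hand-written loop in the host language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, mark REAL)")

# Insertion of new information.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "John", 40.0), (2, "Shyam", 75.0), (3, "Asha", 90.0)],
)
# Modification of stored information.
conn.execute("UPDATE student SET mark = mark + 5 WHERE rollno = 1")
# Deletion of information.
conn.execute("DELETE FROM student WHERE rollno = 3")

# Declarative retrieval: state what is wanted and let the DBMS decide how.
declarative = conn.execute(
    "SELECT name FROM student WHERE mark >= 50 ORDER BY rollno"
).fetchall()

# Procedural-style retrieval: walk every record and spell out the filtering.
procedural = []
for rollno, name, mark in conn.execute(
    "SELECT rollno, name, mark FROM student ORDER BY rollno"
):
    if mark >= 50:
        procedural.append((name,))

print(declarative, procedural)
```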
c. Data Control Language(DCL)
It is used to control access to the database and the data contained within it. It enforces data security.
Example of DCL commands:
GRANT, REVOKE

d. Transaction Control Language(TCL)


Transaction Control (TCL) statements are used to manage the changes made by DML
statements. These SQL commands are used for managing changes affecting the data.
Example of TCL commands:
COMMIT, ROLLBACK and SAVEPOINT.
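
The three commands can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the account table, the balances and the savepoint name before_credit are invented, and the connection is opened with isolation_level=None so that the transaction statements are issued explicitly rather than by the module.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so
# BEGIN, SAVEPOINT, ROLLBACK and COMMIT below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES ('A', 500.0), ('B', 200.0)")

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

# 1. COMMIT makes a completed transfer of 50 from A to B permanent.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
conn.execute("COMMIT")
after_commit = balances()

# 2. SAVEPOINT lets part of a transaction be undone without losing the rest.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
conn.execute("SAVEPOINT before_credit")
conn.execute("UPDATE account SET balance = balance + 9999 WHERE name = 'B'")  # mistaken amount
conn.execute("ROLLBACK TO SAVEPOINT before_credit")  # undo only the mistaken credit
conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
conn.execute("COMMIT")
after_savepoint = balances()

# 3. ROLLBACK abandons every change made since BEGIN.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 0")
conn.execute("ROLLBACK")
after_rollback = balances()

print(after_commit, after_savepoint, after_rollback)
```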

E. F. CODD'S RULES
Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems, came up with
twelve rules (numbered 1 to 12, together with a foundation Rule 0) which, according to him, a database
must obey in order to be regarded as a true relational database.

Rule 0 is a foundation rule, which acts as a base for all the other rules.
Rule 0 : Foundation Rule
A relational database management system must manage its stored data using only its relational capabilities.

Rule 1 : Information Rule


All information in the database should be represented in one and only one way - as values in a table.

Rule 2 : Guaranteed Access Rule


Each and every datum (atomic value) must be logically addressable by combination of table name, primary
key value and column name.

Rule 3 : Systematic Treatment of Null Values


Null values (distinct from empty character string or a string of blank characters and distinct from zero or any
other number) are supported in the fully relational DBMS for representing missing information in a systematic
way, independent of data type.
 NULL can be interpreted as one of the following:
− data is missing, data is not known, or data is not applicable.
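
The difference between NULL, an empty string and zero can be checked with a short sketch using Python's sqlite3 module. The table and rows are illustrative assumptions; Python's None is stored as SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, address TEXT, mark REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("John", None, None),  # address and mark are missing (NULL)
        ("Shyam", "", 0.0),    # empty string and zero are ordinary values
    ],
)

# IS NULL finds missing values; it does not match the empty string.
missing_address = conn.execute(
    "SELECT name FROM student WHERE address IS NULL"
).fetchall()
empty_address = conn.execute(
    "SELECT name FROM student WHERE address = ''"
).fetchall()

# A comparison with NULL using '=' is never true, so no row qualifies.
equals_null = conn.execute("SELECT name FROM student WHERE mark = NULL").fetchall()

# Aggregates skip NULL, so the average is taken over Shyam's mark only.
average_mark = conn.execute("SELECT AVG(mark) FROM student").fetchone()[0]

print(missing_address, empty_address, equals_null, average_mark)
```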

Rule 4 : Dynamic On-line Catalog Based on the Relational Model


 The structure description of the entire database must be stored in an online catalog, known
as data dictionary, which can be accessed by authorized users.
 Users can use the same query language to access the catalog which they use to access the
database itself.
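
SQLite gives a small illustration of this rule: its catalog is itself a table, sqlite_master, that is read with an ordinary SELECT. The sketch below uses Python's sqlite3 module; the student table and the student_names view are assumptions made for the example, and other systems expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")

# The catalog is queried with the same language as the data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# The stored definition of an object is also just a column value.
view_definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'student_names'"
).fetchone()[0]

print(catalog)
print(view_definition)
```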

Rule 5 : Comprehensive Data Sublanguage Rule


A relational system may support several languages and various modes of terminal use. However, there must be
at least one language whose statements are expressible, per some well-defined syntax, as character strings and
which is comprehensive in supporting all of the following:
a. data definition
b. view definition
c. data manipulation (interactive and by program)
d. integrity constraints
e. authorization
f. transaction boundaries (begin, commit, and rollback).

Rule 6 : View Updating Rule


All views that are theoretically updateable are also updateable by the system.

Rule 7 : High-level Insert, Update, and Delete


The capability of handling a relation as a single operand must hold not only for retrieval but also for insert,
update and delete operations. A database must support high-level insertion, update and deletion that are not
limited to a single row; it must also support union, intersection and minus operations to yield sets of data records.
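
Set-at-a-time operation can be sketched with Python's sqlite3 module. The two tables and their rows are invented for illustration, and SQLite spells the minus operation EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passed_maths (name TEXT)")
conn.execute("CREATE TABLE passed_physics (name TEXT)")
conn.executemany("INSERT INTO passed_maths VALUES (?)", [("John",), ("Shyam",)])
conn.executemany("INSERT INTO passed_physics VALUES (?)", [("Shyam",), ("Asha",)])

def names(sql):
    return [row[0] for row in conn.execute(sql)]

# Each operator combines two whole relations into a new set of rows.
union = names(
    "SELECT name FROM passed_maths UNION SELECT name FROM passed_physics ORDER BY name"
)
intersection = names(
    "SELECT name FROM passed_maths INTERSECT SELECT name FROM passed_physics ORDER BY name"
)
minus = names(
    "SELECT name FROM passed_maths EXCEPT SELECT name FROM passed_physics ORDER BY name"
)

# One UPDATE statement changes a whole set of rows, not one record at a time.
cursor = conn.execute("UPDATE passed_maths SET name = UPPER(name)")
rows_updated = cursor.rowcount

print(union, intersection, minus, rows_updated)
```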

Rule 8 : Physical Data Independence


Application programs and terminal activities remain logically unimpaired whenever any changes are made in
either storage representation or access methods.
Rule 9 : Logical Data Independence
Application programs must be independent of changes made to the tables. The design of tables may be changed
dynamically without the user's knowledge.

For example, if two tables are merged or one is split into two different tables, there should be no impact or
change on the user application. This is one of the most difficult rules to apply.

Rule 10 : Integrity Independence


A database must be independent of the application that uses it. All its integrity constraints can be independently
modified without the need of any change in the application. This rule makes a database independent of the
front-end application and its interface.
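
A sketch of this rule using Python's sqlite3 module: the constraint lives in the schema as a CHECK clause, and the application function contains no check of its own. The account table, the withdraw function and the minimum balance of 100 are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity constraint is declared once, in the database schema.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 100))"
)
conn.execute("INSERT INTO account VALUES (1, 500.0)")

def withdraw(acc_no, amount):
    # The application does not test the balance; it relies on the database.
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = withdraw(1, 300.0)  # leaves 200, which satisfies the constraint
rejected = withdraw(1, 150.0)  # would leave 50, so the database refuses it
final_balance = conn.execute(
    "SELECT balance FROM account WHERE acc_no = 1"
).fetchone()[0]

print(accepted, rejected, final_balance)
```

Because the rule is stored with the table, it applies to every program that touches the data, and it could be changed in the schema without editing withdraw.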

Rule 11 : Distribution Independence


The end-user must not be able to see that the data is distributed over various locations. Users should always get
the impression that the data is located at one site only. This rule has been regarded as the foundation of
distributed database systems.

Rule 12 : Non subversion Rule


If a relational system has or supports a low-level (single-record-at-a-time) language, that low-level language
cannot be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-
records-at-a-time) relational language.
