B.Sc. (CS/IT) Complete RDBMS Notes

MIT CIDCO DBMS B.Sc. (CS/IT) III Semester
UNIT-I BASIC CONCEPTS
Database Management System:

A database-management system (DBMS) is a collection of interrelated data and a
set of programs to access those data.
• The collection of data, usually referred to as the database, contains information relevant to
an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
• Database systems are designed to manage large bodies of information.
• Management of data involves both defining structures for storage of information and
providing mechanisms for the manipulation of information.
Database System Applications

Databases are widely used. Here are some representative applications:
• Banking: For customer information, accounts, loans, and banking transactions.
• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner—terminals situated around the world accessed
the central database system through phone lines and other data networks.
• Universities: For student information, course registrations, and grades.
• Credit card transactions: For purchases on credit cards and generation of monthly statements.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
• Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds.
Prepared by: Lect. Arohi Patil Page 1

• Sales: For customer, product, and purchase information.
• Manufacturing: For management of supply chain and for tracking production of items in
factories, inventories of items in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes and benefits, and
for generation of paychecks.
Database Systems versus File Systems

One way to keep the information on a computer is to store it in operating system files. To allow
users to manipulate the information, the system has a number of application programs that
manipulate the files, including:
• A program to debit or credit an account
• A program to add a new account
• A program to find the balance of an account
• A program to generate monthly statements
A typical file-processing system is supported by a conventional operating system. The system
stores permanent records in various files, and it needs different application programs to extract
records from, and add records to, the appropriate files. Before database management systems
(DBMSs) came along, organizations usually stored information in such systems. Keeping
organizational information in a file-processing system has a number of major disadvantages:
• Data redundancy and inconsistency. Since different programmers create the files and
application programs over a long period, the various files are likely to have different formats and
the programs may be written in several programming languages. Moreover, the same information
may be duplicated in several places (files). For example, the address and telephone number of a
particular customer may appear in a file that consists of savings-account records and in a file that
consists of checking-account records. This redundancy leads to higher storage and access cost. In
addition, it may lead to data inconsistency; that is, the various copies of the same data may no
longer agree. For example, a changed customer address may be reflected in savings-account
records but not elsewhere in the system.
• Difficulty in accessing data. Conventional file-processing environments do not allow needed
data to be retrieved in a convenient and efficient manner: each new kind of request typically
requires a new application program to be written. More responsive data-retrieval systems are
required for general use.
• Data isolation. Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints. For example, the balance of a bank account may never fall below a
prescribed amount (say, $25). Developers enforce these constraints in the system by adding
appropriate code in the various application programs. However, when new constraints are added,
it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
• Atomicity problems. A computer system, like any other mechanical or electrical device, is
subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored
to the consistent state that existed prior to the failure. For example, a transfer of $50 from
account A to account B must be atomic: it must happen in its entirety or not at all. It
is difficult to ensure atomicity in a conventional file-processing system.
• Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. In such an
environment, interaction of concurrent updates may result in inconsistent data.
• Security problems. Not every user of the database system should be able to access all the data.
For example, in a banking system, payroll personnel need to see only that part of the database
that has information about the various bank employees. They do not need access to information
about customer accounts. But, since application programs are added to the system in an ad hoc
manner, enforcing such security constraints is difficult.
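To make the atomicity problem concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table, the account names A and B, and the simulated crash are all hypothetical. The transfer runs as one transaction, so when a failure occurs after the debit but before the credit, the DBMS rolls the debit back instead of leaving the data inconsistent. A file-processing program would have to implement this undo logic itself.

```python
import sqlite3

# In-memory database standing in for the bank's data (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated crash after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the partial debit has already been rolled back by the with-block


def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))


transfer(conn, "A", "B", 50, fail_midway=True)
print(balances(conn))  # {'A': 100, 'B': 50}: the half-finished transfer left no trace
transfer(conn, "A", "B", 50)
print(balances(conn))  # {'A': 50, 'B': 100}
```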
ADVANTAGES OF USING THE DBMS

1. Controlling Redundancy
Storing the same data multiple times leads to several problems: duplicated effort when updating,
wasted storage space, and the risk of inconsistent copies. A DBMS therefore minimizes redundancy,
although controlled redundancy is sometimes used deliberately to improve the performance of queries.
2. Restricting Unauthorized Access
When multiple users share a large database, it is likely that most users will not be authorized to
access all information in the database. For example, financial data is often considered
confidential, and hence only authorized persons are allowed to access such data. A DBMS
should provide a security and authorization subsystem, which the DBA uses to create accounts
and to specify account restrictions.
3. Providing Persistent Storage for Program Objects
Databases can be used to provide persistent storage for program objects and data structures. This
is one of the main reasons for object-oriented database systems. Object-oriented database
systems typically offer data structure compatibility with one or more object-oriented
programming languages.
4. Providing Backup and Recovery
A DBMS must provide facilities for recovering from hardware or software failures. The backup
and recovery subsystem of the DBMS is responsible for recovery.
5. Providing Multiple User Interfaces
Because many types of users with varying levels of technical knowledge use a database, a
DBMS should provide a variety of user interfaces.
6. Representing Complex Relationships among Data
A database may include numerous varieties of data that are interrelated in many ways. A DBMS
must have the capability to represent a variety of complex relationships among the data as well
as to retrieve and update related data easily and efficiently.
7. Enforcing Integrity Constraints

Most database applications have certain integrity constraints that must hold for the data. A
DBMS should provide capabilities for defining and enforcing these constraints. The simplest
type of integrity constraint involves specifying a data type for each data item.
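As a sketch of declarative constraint enforcement, the minimum-balance rule from the file-system discussion ($25) can be stated once in the schema with a CHECK clause, instead of being re-coded in every application program. This assumes SQLite via Python's sqlite3 module; the table, the account number and the withdraw helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule is declared once in the schema; the DBMS enforces it for every program.
conn.execute("""
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 25)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")


def withdraw(conn, acct, amount):
    """Return True if the withdrawal is accepted, False if the DBMS rejects it."""
    try:
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, acct),
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


print(withdraw(conn, "A-101", 400))  # True: the balance becomes 100
print(withdraw(conn, "A-101", 90))   # False: a balance of 10 would violate the constraint
```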
Disadvantages of DBMS
1. Cost
A DBMS requires a high initial investment in hardware, software, and trained staff. The size of the
investment depends on the size and functionality of the organization. The organization also has
to pay recurring annual maintenance costs.
2. Complexity
A DBMS fulfills many requirements and solves many problems related to data management, but all
this functionality has made the DBMS an extremely complex piece of software. Developers, designers,
DBAs, and end users of the database must have the necessary skills to use it properly. If they do not
understand this complex system, it may cause loss of data or database failure.
3. Technical staff requirement
An organization's employees can often take on tasks outside their own domain, but working with a
DBMS is not easy for them. A team of technical staff who understand the DBMS is required, and the
company has to pay them a handsome salary.
4. Database Failure
Because all the data is stored in a single database, that database becomes a single point of failure.
Any accidental failure of a component may cause loss of valuable data. This is a serious concern
for large firms.
5. Extra Cost of Hardware
A DBMS requires disk storage for the data, and you may need to purchase extra space to store
your data. You may also need a dedicated machine for better database performance. These
machines and the storage space add to the hardware costs.
6. Size
Because of its many functions, a DBMS is large software that requires a lot of disk space and memory
to run efficiently, and it grows larger as more data is fed into it.
7. Cost of Data Conversion
Data conversion may be required at any time, and the organization has to carry it out. The cost of
data conversion can exceed the combined cost of the DBMS hardware and machines, and trained
staff are needed to convert the data to the new system. This high cost is a key reason why many
organizations are still working on their old DBMS.
8. Currency Maintenance
Because new threats appear daily, a DBMS must be updated regularly to keep it current with the
present situation.
9. Performance
Traditional file systems give very good performance for small organizations. For small-scale firms,
a DBMS may perform worse, because it adds the overhead of general-purpose features such as
security, concurrency control, and recovery.
Data Abstraction
A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained.
The need for efficiency has led designers to use complex data structures to represent data in the
database. Since many database-systems users are not computer trained, developers hide the
complexity from users through several levels of abstraction, to simplify users' interactions with
the system:
• Physical level. The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. Database
administrators, who must decide what information to keep in the database, use the logical level
of abstraction.
• View level. The highest level of abstraction describes only part of the entire database. Many
users of the database system do not need all this information; instead, they need to access only a
part of the database. The view level of abstraction exists to simplify their interaction with the
system. The system may provide many views for the same database.
Figure: The three levels of data abstraction.
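One way to see the view level in practice is a SQL view. In this sketch (SQLite through Python's sqlite3 module; the employee table, its columns and rows are hypothetical), the table definition belongs to the logical level, and the view exposes only part of it to a particular group of users. How SQLite lays the rows out on disk (the physical level) is never visible to either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full employee relation (hypothetical example schema).
conn.execute("CREATE TABLE employee (emp_id TEXT, name TEXT, branch TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("E1", "Hayes", "Downtown", 65000), ("E2", "Adams", "Perryridge", 90000)],
)

# View level: a staff directory that shows names and branches but hides salaries.
conn.execute("CREATE VIEW employee_directory AS SELECT name, branch FROM employee")

rows = conn.execute("SELECT * FROM employee_directory ORDER BY name").fetchall()
print(rows)  # [('Adams', 'Perryridge'), ('Hayes', 'Downtown')]
```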
Database Languages
A database system provides a data-definition language to specify the database schema and a
data-manipulation language to express database queries and updates.
• Data-Definition Language (DDL)
We specify a database schema by a set of definitions expressed by a special language called a
data-definition language (DDL).
The following statement in the SQL language defines the account table:

create table account
(account_number char(10),
balance integer);
Execution of the above DDL statement creates the account table. In addition, it updates a special
set of tables called the data dictionary or data directory.
A data dictionary contains metadata—that is, data about data.
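The data dictionary can be inspected directly in most systems. As an illustration, SQLite (used here through Python's sqlite3 module) records every schema object in its sqlite_master catalog table and reports column metadata through PRAGMA table_info; other DBMSs expose similar catalogs under different names. The hyphenated column name from the text is written with an underscore, since a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The DDL statement from the text (hyphen replaced by an underscore).
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# SQLite keeps its data dictionary in the sqlite_master catalog table.
for obj_type, name, sql in conn.execute("SELECT type, name, sql FROM sqlite_master"):
    print(obj_type, name)  # table account
    print(sql)             # the stored CREATE TABLE statement

# Column-level metadata: each row is (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]
print(columns)
```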
We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called a data storage and definition language. These statements
define the implementation details of the database schemas, which are usually hidden from the users.
• Data-Manipulation Language
Data manipulation is
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model.
There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.
The DML component of the SQL language is nonprocedural. A query is a statement requesting
the retrieval of information. The portion of a DML that involves information retrieval is called a
query language.
For example:

select customer.customer_name
from customer
where customer.customer_id = '192-83-7465';
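The difference between procedural and declarative DML can be sketched in Python (the customer rows are hypothetical sample data). The first function spells out how to find the row, step by step; the SQL query, run here on SQLite through Python's sqlite3 module, only states what is wanted and leaves the access strategy to the DBMS. The customer ID is a string, so it must be quoted or, as here, passed as a parameter; unquoted, 192-83-7465 would be evaluated as arithmetic.

```python
import sqlite3

customers = [("192-83-7465", "Johnson"), ("019-28-3746", "Smith")]


# Procedural style: we say *how* to find the row (scan every record and compare).
def find_name_procedural(rows, customer_id):
    for cid, name in rows:
        if cid == customer_id:
            return name
    return None


# Declarative style: we say *what* we want; the DBMS decides how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, customer_name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)
row = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    ("192-83-7465",),
).fetchone()

print(find_name_procedural(customers, "192-83-7465"))  # Johnson
print(row[0])                                          # Johnson
```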
SQL commands are classified into three types, as follows:
1. DDL (Data definition language)
2. DML (Data manipulation language)
3. DCL (Data control language)
1. DDL (Data definition language): DDL contains the commands used to specify the set of
definitions of a database schema. The results of these commands are stored in the data dictionary.
The most frequently used DDL commands are:
• CREATE TABLE
• ALTER TABLE
• CREATE INDEX
• CREATE VIEW
• DROP TABLE
• DROP VIEW
• DROP INDEX
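The DDL commands listed above can be exercised end to end in a small sketch, assuming SQLite through Python's sqlite3 module (the table, index, and view names are hypothetical). After each group of statements, the helper reads the data dictionary to show which schema objects exist. SQLite supports only limited forms of ALTER TABLE, such as ADD COLUMN, so other systems may allow more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def objects(conn):
    """Names of all schema objects currently recorded in the data dictionary."""
    return sorted(name for (name,) in conn.execute("SELECT name FROM sqlite_master"))


conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.execute("ALTER TABLE account ADD COLUMN branch_name TEXT")  # change the schema
conn.execute("CREATE INDEX idx_account_branch ON account (branch_name)")
conn.execute("CREATE VIEW rich_accounts AS SELECT * FROM account WHERE balance > 10000")
print(objects(conn))  # ['account', 'idx_account_branch', 'rich_accounts']

conn.execute("DROP VIEW rich_accounts")
conn.execute("DROP INDEX idx_account_branch")
conn.execute("DROP TABLE account")
print(objects(conn))  # []
```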

2. DML (Data manipulation language): DML contains commands that enable users to access
and manipulate the data stored in the database. Normally the commands are categorized into
procedural and non-procedural commands. The procedural DML commands require the user to
specify what data are needed and how to get them. The non-procedural DML commands require
the user to specify what data are needed without any specifications of how to get them.
Data manipulation is nothing but to retrieve, insert, delete and modify the data stored in a
database.
The most frequently used DML commands are:
• SELECT
• SELECT ... ORDER BY
• SELECT ... GROUP BY
• INSERT INTO
• DELETE FROM
• UPDATE
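A short sketch of these DML commands, again assuming SQLite via Python's sqlite3 module and a hypothetical account table. The UPDATE applies 5% interest using integer arithmetic, so fractional amounts are truncated (350 becomes 367, not 367.5).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")

# INSERT INTO: add new rows
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A-101", "Downtown", 500),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-305", "Downtown", 350),
])

# UPDATE: modify existing rows (5% interest, integer division truncates)
conn.execute("UPDATE account SET balance = balance * 105 / 100")

# DELETE FROM: remove rows
conn.execute("DELETE FROM account WHERE branch_name = 'Mianus'")

# SELECT ... ORDER BY: retrieve rows in a chosen order
by_balance = conn.execute(
    "SELECT account_number, balance FROM account ORDER BY balance DESC").fetchall()

# SELECT ... GROUP BY: one summary row per group
per_branch = conn.execute(
    "SELECT branch_name, SUM(balance) FROM account "
    "GROUP BY branch_name ORDER BY branch_name").fetchall()

print(by_balance)  # [('A-101', 525), ('A-102', 420), ('A-305', 367)]
print(per_branch)  # [('Downtown', 892), ('Perryridge', 420)]
```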
3. DCL (Data control language): DCL contains commands that are used to provide security on
the data contained in the database tables, protecting it from unauthorized access. Permissions
to read, write, and update the data in a table are granted and withdrawn with:
• GRANT
• REVOKE
The transaction-control commands, which are often grouped with DCL, are:
• COMMIT
• SAVEPOINT
• ROLLBACK
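SQLite has no user accounts, so GRANT and REVOKE cannot be shown with it; the sketch below (Python's sqlite3 module, hypothetical table and savepoint name) illustrates only the transaction-control commands. ROLLBACK TO a savepoint undoes the work done after that savepoint, while COMMIT makes the rest of the transaction permanent.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so the transaction-control statements below are sent exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.execute("SAVEPOINT before_second_insert")
conn.execute("INSERT INTO account VALUES ('A-102', 400)")
conn.execute("ROLLBACK TO before_second_insert")  # undo only the work after the savepoint
conn.execute("COMMIT")                            # make the remaining work permanent

rows = conn.execute("SELECT * FROM account").fetchall()
print(rows)  # [('A-101', 500)]
```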
DBMS Facilities
• Data Definition Facilities –
It allows a database designer to define the database using a Data Definition Language (DDL)
provided for the particular DBMS. The DDL allows the designer to specify the data types and
structures, and the constraints on the data to be stored in the database.
Example: CREATE TABLE, ALTER TABLE, CREATE INDEX, CREATE VIEW, DROP TABLE, DROP VIEW,
DROP INDEX.
• Data Manipulation Facilities –
It allows users to insert, update, delete and retrieve data from the database through a Data
Manipulation Language (DML). Having a central repository for all data and data
descriptions allows the DML to provide a general enquiry facility to this data, called a query
language. Using a query language, directly or indirectly, enables new lines of enquiry to be
constructed and satisfied quickly. A query language is sufficiently high level to allow
non-technical personnel to use it easily. The most common query language is the Structured
Query Language (SQL, pronounced 'S-Q-L').
Example: SELECT, SELECT ... ORDER BY, SELECT ... GROUP BY, INSERT INTO, DELETE FROM, UPDATE.
• Help Facilities –
MS Access provides helpful wizards that allow 'novice' users to perform a task and let 'expert'
users perform it more easily.
• Reporting Facilities –
Creating professional-looking reports from the results of SELECT statements; MS Access is
another good example of this.
• Data Control Facilities –
Controlling access to data through permissions (GRANT) and views (CREATE VIEW).

• Multi-user Functionality –
Allowing more than one user to access the database simultaneously, with concurrency controls
such as locking the part of the database that is being updated.
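Why locking is needed can be seen in a simplified in-memory sketch (plain Python, not a real DBMS; the Account class and amounts are hypothetical). In the first case, two deposits interleave their read and write steps and one update is lost. In the second, each read-modify-write runs while holding a lock, playing the role of the lock a DBMS places on the data being updated, so both deposits survive.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()  # stands in for a DBMS lock on this data item


def interleaved_without_lock(acct):
    """Two deposits interleave their read and write steps: a lost update."""
    t1_read = acct.balance        # T1 reads 100
    t2_read = acct.balance        # T2 reads 100 before T1 has written
    acct.balance = t1_read + 50   # T1 writes 150
    acct.balance = t2_read + 30   # T2 writes 130, overwriting T1's update


def deposit_with_lock(acct, amount):
    """Read-modify-write while holding the lock, so updates cannot interleave."""
    with acct.lock:
        current = acct.balance
        acct.balance = current + amount


a = Account(100)
interleaved_without_lock(a)
print(a.balance)  # 130: the deposit of 50 was lost

b = Account(100)
threads = [threading.Thread(target=deposit_with_lock, args=(b, amt)) for amt in (50, 30)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(b.balance)  # 180: both deposits applied
```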
• Distributed Databases –
Distributing the database over several sites.
• CASE Tools –
Computer-aided software engineering (CASE) is the automation of the development of software
systems. Examples are the tools provided in MS Access and Oracle for creating forms for a
database.
Database Users and Administrators

A primary goal of a database system is to retrieve information from and store new information in
the database. People who work with a database can be categorized as database users or database
administrators.
• Database Users and User Interfaces

There are four different types of database-system users, differentiated by the way they expect to
interact with the system. Different types of user interfaces have been designed for the different
types of users.
• Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. For example, a bank teller who needs to
transfer $50 from account A to account B invokes a program called transfer.
The typical user interface for naive users is a forms interface, where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated from the
database.
• Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program. There are also special types of
programming languages that combine imperative control structures (for example, for loops,
while loops and if-then-else statements) with statements of the data-manipulation language.
These languages are sometimes called fourth-generation languages.
• Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor,
whose function is to break down DML statements into instructions that the storage manager
understands.
Online analytical processing (OLAP) tools simplify analysts' tasks by letting them view
summaries of data in different ways.
Another class of tools for analysts is data mining tools, which help them find certain kinds of
patterns in data.
• Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are computer-aided
design systems, knowledge-base and expert systems, systems that store data with complex
data types (for example, graphics data and audio data), and environment-modeling systems.
• Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
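• Storage structure and access-method definition. The DBA creates appropriate storage structures
and access methods by writing a set of definitions in the data storage and definition language.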
• Schema and physical-organization modification. The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator's routine maintenance activities
are:
- Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss
of data in case of disasters such as flooding.
- Ensuring that enough free disk space is available for normal operations, and upgrading disk
space as required.
- Monitoring jobs running on the database and ensuring that performance is not degraded by
very expensive tasks submitted by some users.
Database System Structure / Components of DBMS

A database system is partitioned into modules that deal with each of the responsibilities of
the overall system. The functional components of a database system can be broadly divided
into the storage manager and the query processor components.

Storage Manager
A storage manager is a program module that provides the interface between the low level data
stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for storing, retrieving, and updating data in the database. The
storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.

• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in
particular the schema of the database.
• Indices, which provide fast access to data items that hold particular values.
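The buffer manager's caching decision can be sketched as a small buffer pool with a least-recently-used (LRU) replacement policy. LRU is one common choice, assumed here for illustration; real buffer managers also track modified ("dirty") pages and write them back to disk before eviction, which this sketch omits. The disk is simulated with a Python dictionary, and BufferManager is a hypothetical name.

```python
from collections import OrderedDict


class BufferManager:
    """Keeps at most `capacity` disk pages in memory, evicting the least recently used."""

    def __init__(self, disk, capacity):
        self.disk = disk            # page_id -> page contents (stands in for the disk)
        self.capacity = capacity
        self.pool = OrderedDict()   # in-memory buffer pool, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pool:
            self.pool.move_to_end(page_id)   # hit: mark as most recently used
            return self.pool[page_id]
        self.disk_reads += 1                 # miss: fetch the page from "disk"
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)    # evict the least recently used page
        self.pool[page_id] = self.disk[page_id]
        return self.pool[page_id]


disk = {i: f"page-{i}" for i in range(10)}
bm = BufferManager(disk, capacity=3)
for pid in [1, 2, 3, 1, 4, 1, 2]:
    bm.get_page(pid)
print(bm.disk_reads)   # 5
print(list(bm.pool))   # [4, 1, 2]
```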
The Query Processor

The query processor is important because it helps the database system simplify and facilitate
access to data. The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. A query can
usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization; that is, it picks the lowest-cost
evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
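Query optimization can be observed with SQLite's EXPLAIN QUERY PLAN statement (used here through Python's sqlite3 module; the table and index names are hypothetical). The same query gets a full table scan when no index exists and an index search once one is created, even though the result would be identical. The exact wording of the plan text varies between SQLite versions, so the code only prints it rather than predicting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
query = "SELECT balance FROM account WHERE account_number = 'A-101'"


def plan(conn, sql):
    """Return the optimizer's chosen steps; the last column of each row describes one step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(conn, query)  # no index yet: the only option is a full table scan
conn.execute("CREATE INDEX idx_account_number ON account (account_number)")
after = plan(conn, query)   # the optimizer now prefers a search through the index
print(before)
print(after)
```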
The Two and Three Tier Architectures

Database applications are usually partitioned into two or three parts.

In a two-tier architecture, the application is partitioned into a component that resides at the
client machine, which invokes database system functionality at the server machine through query
language statements. Application program interface standards like ODBC and JDBC are used for
interaction between the client and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a front end and does
not contain any direct database calls. Instead, the client end communicates with an application
server, usually through a forms interface. The application server in turn communicates with a
database system to access data. The business logic of the application, which says what actions to
carry out under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications are more appropriate for large
applications, and for applications that run on the World Wide Web.
Figure: DBMS Architecture
Many Web applications use an architecture called the three-tier architecture, which adds an
intermediate layer between the client and the database server, as illustrated in Figure.
This intermediate layer or middle tier is sometimes called the application server and sometimes
the Web server, depending on the application. This server plays an intermediary role by storing
business rules (procedures or constraints) that are used to access data from the database server. It
can also improve database security by checking a client's credentials before forwarding a request
to the database server. Clients contain GUI interfaces and some additional application-specific
business rules. The intermediate server accepts requests from the client, processes the request
and sends database commands to the database server, and then acts as a conduit for passing
(partially) processed data from the database server to the clients, where it may be processed
further and filtered to be presented to users in GUI format. Thus, the user interface, application
rules, and data access act as the three tiers.
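The division of work among the three tiers can be sketched in a single Python program. All names here (ApplicationServer, the session token, the account data) are hypothetical, and the tiers call each other directly, whereas in a real deployment they would communicate over a network. The client makes no database calls, and the application server checks credentials before any query reaches the database.

```python
import sqlite3

# Tier 3: database server (an in-memory SQLite database stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (account_number TEXT, owner TEXT, balance INTEGER)")
db.execute("INSERT INTO account VALUES ('A-101', 'alice', 500)")


# Tier 2: application server; holds the business rules and checks credentials.
class ApplicationServer:
    def __init__(self, db, sessions):
        self.db = db
        self.sessions = sessions  # hypothetical token -> user mapping

    def get_balance(self, token, account_number):
        user = self.sessions.get(token)
        if user is None:
            return {"error": "not authenticated"}  # rejected before reaching the database
        row = self.db.execute(
            "SELECT balance FROM account WHERE account_number = ? AND owner = ?",
            (account_number, user),
        ).fetchone()
        if row is None:
            return {"error": "no such account"}
        return {"account": account_number, "balance": row[0]}


# Tier 1: client; only sends requests and formats replies, with no database calls.
def client_show_balance(server, token, account_number):
    reply = server.get_balance(token, account_number)
    return reply.get("error") or f"Balance of {reply['account']}: {reply['balance']}"


server = ApplicationServer(db, sessions={"tok-123": "alice"})
print(client_show_balance(server, "tok-123", "A-101"))    # Balance of A-101: 500
print(client_show_balance(server, "bad-token", "A-101"))  # not authenticated
```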
Advances in encryption and decryption technology make it safer to transfer sensitive data from
server to client in encrypted form, where it will be decrypted. The latter can be done by the
hardware or by advanced software. This technology gives higher levels of data security, but the
network security issues remain a major concern. Various technologies for data compression are
also helping in transferring large amounts of data from servers to clients over wired and wireless
networks.
Data Independence
The three-schema architecture can be used to further explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data
independence:
1. Logical data independence is the capacity to change the conceptual schema without having
to change external schemas or application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), to change constraints, or to reduce
the database (by removing a record type or data item). Changes to constraints can be applied to
the conceptual schema without affecting the external schemas or application programs.
2. Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. Hence, the external schemas need not be changed as well.
Changes to the internal schema may be needed because some physical files had to be
reorganized (for example, by creating additional access structures) to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema.
Data independence occurs because when the schema is changed at some level, the schema at the
next higher level remains unchanged; only the mapping between the two levels is changed.
Hence, application programs referring to the higher-level schema need not be changed. The
three-schema architecture can make it easier to achieve true data independence, both physical
and logical.
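Both kinds of data independence can be demonstrated in a small sketch, assuming SQLite through Python's sqlite3 module (the account table, the view, and the application function are hypothetical). The application is written against a view that plays the role of an external schema. Adding an index changes only the internal schema, and adding a column expands the conceptual schema, yet the application returns the same result without any change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
# External schema used by the application program.
conn.execute("CREATE VIEW account_view AS SELECT account_number, balance FROM account")


def application(conn):
    """A hypothetical application program written only against the external schema."""
    return conn.execute(
        "SELECT balance FROM account_view WHERE account_number = 'A-101'"
    ).fetchone()[0]


before = application(conn)

# Physical change: add an access structure (an index) to the internal schema.
conn.execute("CREATE INDEX idx_acct ON account (account_number)")
# Logical change: expand the conceptual schema with a new data item.
conn.execute("ALTER TABLE account ADD COLUMN branch_name TEXT")

after = application(conn)
print(before, after)  # 500 500: the application needed no change
```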
Different Types of Databases

Relational Databases
This is the most common of all the different types of databases. The data in a relational
database is stored in tables. Each table has a key field that is used to connect it to other
tables, so all the tables are related to each other through key fields. These databases are
extensively used in various industries and are the ones you are most likely to come across
when working in IT.
Examples of relational databases are Oracle, Sybase and Microsoft SQL Server and they are
often key parts of the process of software development. Hence you should ensure you include
any work required on the database as part of your project when creating a project plan and
estimating project costs.
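Relating tables through a key field can be sketched as follows, assuming SQLite via Python's sqlite3 module and hypothetical customer and account tables. The customer_id column appears in both tables, and a join uses it to combine each customer with that customer's accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id   TEXT PRIMARY KEY,
        customer_name TEXT
    );
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        customer_id    TEXT REFERENCES customer(customer_id),  -- key field linking the tables
        balance        INTEGER
    );
    INSERT INTO customer VALUES ('C1', 'Johnson'), ('C2', 'Smith');
    INSERT INTO account VALUES ('A-101', 'C1', 500), ('A-215', 'C2', 700), ('A-102', 'C1', 400);
""")

# Relate the two tables through the shared key field.
rows = conn.execute("""
    SELECT c.customer_name, SUM(a.balance)
    FROM customer AS c JOIN account AS a ON a.customer_id = c.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(rows)  # [('Johnson', 900), ('Smith', 700)]
```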
Object Oriented Databases
Object oriented databases are also called Object Database Management Systems (ODBMS).
Object databases store objects rather than data such as integers, strings or real numbers. Objects
are used in object oriented languages such as Smalltalk, C++, Java, and others. Objects basically
consist of the following:
• Attributes - Attributes are data that define the characteristics of an object. This data may be
simple, such as integers, strings, and real numbers, or it may be a reference to a complex object.
• Methods - Methods define the behavior of an object and are what were formerly called procedures
or functions.

Therefore objects contain both executable code and data. There are other characteristics of
objects such as whether methods or data can be accessed from outside the object. We don't
consider this here, to keep the definition simple and to apply it to what an object database is. One
other term worth mentioning is classes. Classes are used in object oriented programming to
define the data and methods the object will contain. The class is like a template to the object. The
class does not itself contain data or methods but defines the data and methods contained in the
object. The class is used to create (instantiate) the object. Classes may be used in object
databases to recreate parts of the object that may not actually be stored in the database. Methods
may not be stored in the database and may be recreated by using a class.
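The idea that only the data is stored while the methods come from the class can be sketched in Python. This is an analogy, not a real ODBMS: the hypothetical store and load helpers write an object's attributes as JSON text and re-create the object from its class, so the deposit method is available again even though its code was never stored.

```python
import json


class Account:
    """The class is the template: it defines the attributes and the methods."""

    def __init__(self, number, balance):
        self.number = number    # attribute
        self.balance = balance  # attribute

    def deposit(self, amount):  # method: behaviour defined only in the class
        self.balance += amount
        return self.balance


def store(obj):
    """Persist only the object's data (its attributes), tagged with its class name."""
    return json.dumps({"class": type(obj).__name__, "attributes": vars(obj)})


def load(stored, classes):
    """Re-create (instantiate) the object from its class, which supplies the methods."""
    record = json.loads(stored)
    cls = classes[record["class"]]
    return cls(**record["attributes"])


stored = store(Account("A-101", 500))
print(stored)  # {"class": "Account", "attributes": {"number": "A-101", "balance": 500}}
restored = load(stored, {"Account": Account})
print(restored.deposit(100))  # 600
```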
Operational Databases
In its day to day operation, an organization generates a huge amount of data. Think of things
such as inventory management, purchases, transactions and financials. All this data is collected
in a database which is often known by several names such as operational/ production database,
subject-area database (SADB) or transaction databases.
An operational database is usually hugely important to organizations, as it includes the
customer database, the personnel database and the inventory database, i.e. the details of how
much of a product the company has as well as information on the customers who buy it. The data
stored in operational databases can be changed and manipulated depending on what the company
requires.
Data Warehouses
Organizations are required to keep all relevant data for several years. In the UK it can be as long
as 6 years. This data is also an important source of information for analyzing and comparing the
current year's data with that of past years, which makes it easier to determine key trends. All
this data from previous years is stored in a data warehouse. Since the stored data has gone
through all kinds of screening, editing and integration, it does not need any further editing or
alteration.
When building this kind of database, ensure that the software requirements specification (SRS) is
formally approved as part of the project quality plan.
Distributed Databases

Many organizations have several office locations, manufacturing plants, regional offices, branch
offices and a head office at different geographic locations. Each of these work groups may have
their own database which together will form the main database of the company. This is known as
a distributed database.
End-User Databases
There is a variety of data available at the workstation of all the end users of any organization.
Each workstation is like a small database in itself, which includes data in spreadsheets,
presentations, Word documents, notepad files and downloaded files. All such small databases form a
different type of database called the end-user database.
Data Association
Association
Association is a relationship between two objects. In other words, association defines the
multiplicity between objects. The terms one-to-one, one-to-many, many-to-one and
many-to-many all describe associations between objects. Aggregation is a special
form of association, and composition is a special form of aggregation.
Example: A Student and a Faculty member have an association.

Aggregation
Aggregation is a special case of association: a directional association between objects. When an
object 'has-a' another object, there is an aggregation between them. The direction between
them specifies which object contains the other object. Aggregation is also called a "Has-a"
relationship.
Composition

Composition is a special case of aggregation. More specifically, a restricted aggregation
is called composition. When an object contains another object and the contained object cannot
exist without the container object, the relationship is called composition.
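The three relationships can be sketched in Python. This is a minimal illustration only: the class names Faculty, Student, Department, House and Room and the sample values are hypothetical, and Python does not enforce these lifetime rules; the code just follows them by design.

```python
class Faculty:
    def __init__(self, name: str):
        self.name = name

class Student:
    def __init__(self, name: str):
        self.name = name
        self.advisors = []            # association: a Student refers to Faculty objects

class Department:
    # Aggregation ("has-a"): members are created elsewhere and are only
    # referenced here, so they survive if the Department is discarded.
    def __init__(self, name: str, members: list):
        self.name = name
        self.members = members

class Room:
    def __init__(self, label: str):
        self.label = label

class House:
    # Composition: the House creates its own Room objects and never shares
    # them, so by design a Room has no existence outside its House.
    def __init__(self, room_labels: list):
        self.rooms = [Room(label) for label in room_labels]

f1 = Faculty("Faculty F1")
s1 = Student("Miya")
s1.advisors.append(f1)                         # association
dept = Department("Computer Science", [f1])    # aggregation
del dept                                       # f1 is still usable
home = House(["kitchen", "bedroom"])           # composition
print(f1.name, [r.label for r in home.rooms])  # Faculty F1 ['kitchen', 'bedroom']
```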
• Entity –
An entity in an ER model is a real-world object having properties called attributes.
Every attribute is defined by its set of values, called its domain. For example, in a school database, a
student is considered an entity. A student has various attributes like name, age, class, etc.
• Relationship –
The logical association among entities is called a relationship. Relationships are mapped with
entities in various ways. Mapping cardinalities define the number of associations between two
entities.
Mapping cardinalities:
• one to one
• one to many
• many to one
• many to many
Data Models
A database model is a type of data model that determines the logical structure of a database and
fundamentally determines how data can be stored, organized, and manipulated.
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints.
A data model, a collection of concepts that can be used to describe the structure of a database,
provides the necessary means to achieve this abstraction. By structure of a database, we mean
the data types, relationships, and constraints that should hold for the data. Most data models also
include a set of basic operations for specifying retrievals and updates on the database.
• A model is a representation of reality: 'real world' objects, events and their associations. A data
model represents the organization itself.
• It is a collection of conceptual tools for describing data, data relationships, data semantics
and consistency constraints.
• Data models define how data items are connected to each other and how they are processed and
stored inside the system.
• It should provide the basic concepts and notations that allow database designers and end
users to communicate their understanding of the organizational data unambiguously and
accurately.
Types of Data Models:
1. High Level – Conceptual Data Model
2. Low Level – Physical Data Model
3. Representational or Record Based or Implementation Data Model
4. Object Based Data Model
1. High level – conceptual data model: The user-level data model is the high-level or
conceptual model. It provides concepts that are close to the way many users
perceive data.
2. Low level – physical data model: Physical data models describe how data is stored in
the computer, representing information such as record structures, record ordering, and
access paths. There are not as many physical data models as logical data models; the most
common one is the Unifying Model.
Low-level data models are meant for computer specialists, not for end users.
3. Representational data model: It lies between the high-level and low-level data models and
provides concepts that may be understood by end users but that are not too far removed
from the way data is organized within the computer.

Record based logical models are used in describing data at the logical and view levels. In
contrast to object based data models, they are used to specify the overall logical structure
of the database and to provide a higher-level description of the implementation. Record
based models are so named because the database is structured in fixed format records of
several types. Each record type defines a fixed number of fields, or attributes, and each
field is usually of a fixed length.
The three most widely accepted record based data models are:
• Hierarchical Model
• Network Model
• Relational Model
4. Object Based Data Model
Object based data models use concepts such as entities, attributes, and relationships. An
entity is a distinct object (a person, place, concept or event) in the organization that is
to be represented in the database. An attribute is a property that describes some aspect of
the object that we wish to record, and a relationship is an association between entities.
Some of the more common types of object based data model are:
• Entity-Relationship
• Object Oriented
• Object Relational
Relational Model
The relational model uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name.
The relational model was proposed by E. F. Codd at IBM in 1970. It represents data and
relationships among data by a collection of tables, each of which has a number of columns with
unique names. The relational data model is used widely around the world for data storage and
processing. The model is simple, and it has the properties and capabilities required to process
data with storage efficiency.
For example, the following figure shows a relational database of customers and their
accounts. The customer Neena has two accounts, with balances of Rs. 50000 and Rs. 30000.
Table Name: Customer
Name         Street            City    Account Number
Neena        Tagore Garden     Delhi   101
Neena        Tagore Garden     Delhi   201
Jacqueline   Janakpuri         Delhi   402
Peter        Connaught Place   Delhi   506
Table Name: Account Info
Account Number   Balance
101              50000
201              30000
402              150000
506              80000
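The two tables can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the lowercase table and column names are our own, and the second Neena row (account 201) is assumed from the statement that Neena holds two accounts. The join on the shared account number is what relates a customer to a balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    name TEXT, street TEXT, city TEXT, account_number INTEGER
);
CREATE TABLE account_info (
    account_number INTEGER PRIMARY KEY, balance INTEGER
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    ("Neena", "Tagore Garden", "Delhi", 101),
    ("Neena", "Tagore Garden", "Delhi", 201),   # assumed second account
    ("Jacqueline", "Janakpuri", "Delhi", 402),
    ("Peter", "Connaught Place", "Delhi", 506),
])
conn.executemany("INSERT INTO account_info VALUES (?, ?)",
                 [(101, 50000), (201, 30000), (402, 150000), (506, 80000)])

# The common column account_number links each customer row to a balance.
rows = conn.execute("""
    SELECT c.account_number, a.balance
    FROM customer AS c JOIN account_info AS a
         ON c.account_number = a.account_number
    WHERE c.name = 'Neena'
    ORDER BY c.account_number
""").fetchall()
print(rows)  # [(101, 50000), (201, 30000)]
```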

Advantages
• It is much easier to understand.
• In a relational database, changes in the database structure do not affect data access, so a
relational database has structural independence.
• The relational database model achieves both data independence and structural independence.
• Database design, maintenance, administration and usage are much easier than in the other
models.
• It is simpler to navigate.
Disadvantages
• A major disadvantage of relational database systems is machine performance. If
relationships have to be established among a large number of tables, and the
tables themselves are large, performance in responding to SQL queries suffers.
• Processing can be slower than in the hierarchical and network models.
Hierarchical Model
A hierarchical data model is a data model in which the data is organized into a tree-like structure.
The structure allows repeating information using parent/child relationships: each parent can have
many children, but each child has only one parent. All attributes of a specific record are listed
under an entity type.

In the hierarchical model, data elements are linked as an inverted tree structure (root at the top
with branches formed below). Below the single root data element are subordinate elements, each of
which in turn has its own subordinate elements, and so on; the tree can grow to multiple levels.
Data elements have a parent-child relationship, as in a family tree.
For Example in an organization employees are categorized by their department and within a
department they are categorized by their job function such as managers, engineers, technicians
and support staff.
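A minimal Python sketch of such a tree, assuming hypothetical department, job-function and employee names. Each node stores exactly one parent reference, which is the defining rule of the hierarchical model, so a record's full path is found by walking up from child to parent.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def add(self, name: str) -> "Node":
        child = Node(name, parent=self)   # the new record gets exactly one parent
        self.children.append(child)
        return child

def path(node: Optional[Node]) -> list:
    names = []
    while node is not None:               # follow the single parent link up to the root
        names.append(node.name)
        node = node.parent
    return names[::-1]

root = Node("Organization")
engineering = root.add("Engineering")         # department
engineers = engineering.add("Engineers")      # job function
e1 = engineers.add("Employee E1")             # employee record
engineering.add("Managers").add("Employee E2")
support_staff = root.add("Support").add("Support Staff")

print(path(e1))  # ['Organization', 'Engineering', 'Engineers', 'Employee E1']
```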
Advantages
1. Records are represented using an ordered tree, which is a natural way of
implementing one-to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
3. It allows the use of virtual records. This results in a stable database, especially when
the database is modified.
4. The hierarchical model was the first database model that offered data security provided
and enforced by the DBMS.
Disadvantages
1. Although the hierarchical database model is conceptually simple and easy to design, it is quite
complex to implement.
2. If you make any change to the structure of a hierarchical database, you need to
make the corresponding changes in all the application programs that access the database. Thus
maintaining the database and the applications can become very difficult.
Network model
The data in the network model are represented by a collection of records, and relationships among
data are represented by links, which can be viewed as pointers.
This model is an extension of the hierarchical data model. In this model there is also a
parent-child relationship, but a child data element can have more than one parent element or no
parent at all. The main difference between the network model and the hierarchical model is its
ability to handle many-to-many (M:N) relationships; in other words, it allows a record to have
more than one parent.
An example of the network model is given below, where there are relationships between courses
offered and students enrolled in each course in a college. Each student can be enrolled in several
courses and each course may have a number of students enrolled in it. The students enrolled in
English are Miya and Priyanka, and Miya has taken three courses: English, Math and Science. The
example also shows a child element that has no parent element, i.e. a student who has not taken
any course this semester (he might be a research student).
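The many-to-many links in this example can be sketched with two Python dictionaries of sets, one for each direction of the pointers. Miya, Priyanka and the course names come from the text; "Student R" is a hypothetical name for the research student with no course.

```python
from collections import defaultdict

enrollments = [                       # each pair is one link between two records
    ("Miya", "English"),
    ("Miya", "Math"),
    ("Miya", "Science"),
    ("Priyanka", "English"),
]
students = {"Miya", "Priyanka", "Student R"}  # "Student R" is hypothetical

courses_of = defaultdict(set)         # student -> courses (many)
students_in = defaultdict(set)        # course -> students (many)
for student, course in enrollments:
    courses_of[student].add(course)
    students_in[course].add(student)

print(sorted(students_in["English"]))  # ['Miya', 'Priyanka']
print(sorted(courses_of["Miya"]))      # ['English', 'Math', 'Science']
no_course = sorted(s for s in students if not courses_of[s])
print(no_course)                       # ['Student R']
```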
Advantages
1. It is conceptually simple and easy to design.
2. It can handle one-to-many (1:N) and many-to-many (M:N) relationships.
3. Changes in data characteristics do not require changes to the application programs.
4. Data access is easier and more flexible than in the hierarchical model.
Disadvantages
1. Detailed structural knowledge is required.
2. There is a lack of structural independence.
3. The insertion, deletion and updating operations on any record require a large number of pointer
adjustments.
Entity Relationship Model

The entity relationship (E-R) model consists of a collection of basic objects, called entities and of
relationships among these entities.
Entity
An entity can be a real-world object, either animate or inanimate, that is easily identifiable.
For example, in a school database, students, teachers, classes and courses offered can be
considered as entities. Entities are represented by means of rectangles.
Relationship
A relationship is an association among several entities. For example, an employee works at a
department, and a student enrolls in a course. Here, works at and enrolls in are called relationships.
Relationships are represented by diamond-shaped boxes.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, and age as attributes. Attributes are
represented by means of ellipses. Every ellipse represents one attribute and is directly connected
to its entity (rectangle).
The ER Model can be diagrammatically represented as follows:

Example of ER model
Let us consider an ER model for a banking system consisting of customers and accounts. The
diagram shown below indicates that there are two entity sets, customer and account, with
attributes customer name, address, account no. and balance. The diagram also shows a
relationship set, depositor, between customer and account.
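One common way to turn this ER diagram into relational tables is sketched below with Python's sqlite3 module: each entity set becomes a table, and the relationship set depositor becomes a table whose key combines the keys of customer and account. Column names and sample values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
CREATE TABLE customer (
    customer_name TEXT PRIMARY KEY,
    address       TEXT
);
CREATE TABLE account (
    account_no INTEGER PRIMARY KEY,
    balance    INTEGER
);
-- relationship set: its key combines the keys of both entity sets
CREATE TABLE depositor (
    customer_name TEXT    REFERENCES customer(customer_name),
    account_no    INTEGER REFERENCES account(account_no),
    PRIMARY KEY (customer_name, account_no)
);
""")
conn.execute("INSERT INTO customer VALUES ('Neena', 'Tagore Garden, Delhi')")
conn.execute("INSERT INTO account VALUES (101, 50000)")
conn.execute("INSERT INTO depositor VALUES ('Neena', 101)")

try:
    conn.execute("INSERT INTO depositor VALUES ('Neena', 999)")  # account 999 does not exist
    dangling_link_accepted = True
except sqlite3.IntegrityError:
    dangling_link_accepted = False
print(dangling_link_accepted)  # False
```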
Advantages of ER model
• The E-R model gives a graphical, diagrammatic representation of entities,
their attributes and the relationships between entities. It therefore helps in clearly understanding
the data structure and in minimizing redundancy and other problems.
• It is an effective communication tool among users, domain experts and database designers.
• Converting an ER diagram to another data model, such as the network, hierarchical
or relational model, is very easy.
• The ER model specifies mapping cardinalities.
Disadvantages
• It is used only for database design, not for implementation.
• There is no industry-standard notation for drawing ER diagrams.
• The E-R data model is suited mainly to high-level design.
• There is no representation of data manipulation.
• A physical design derived from an E-R model may have some ambiguities or
inconsistencies.
Object-oriented Data Models

Object oriented models were introduced to overcome the shortcomings of conventional models
like the relational, hierarchical and network models. An object oriented database is a collection of
objects whose behavior, state, and relationships are defined in accordance with object oriented
concepts (such as objects, classes, class hierarchies, etc.).
One family of these models consists of persistent object-oriented programming languages such as
C++ (e.g., in OBJECTSTORE or VERSANT) and Smalltalk (e.g., in GEMSTONE).
The following diagram represents an example of an object oriented database structure. Here, class
Vehicle is the root of a class composition hierarchy including the classes VehicleSpecs, Company and
Employee. Class Vehicle is also the root of a class hierarchy involving the classes TwoWheeler and
FourWheeler. Class Company is, in turn, the root of a class hierarchy with subclasses
DomesticCompany and ForeignCompany. It is also the root of a class composition hierarchy
involving class Employee.

For the above database structure, a typical query may be "President's and Company's names for all
companies that manufacture two wheeler vehicles and are located in Pune, India".
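The class structure and the sample query can be approximated in Python. This is a sketch only: attribute names such as location, president and manufacturer, and all sample objects, are hypothetical, and a real OODBMS would evaluate the query over persistent objects rather than a Python list.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    name: str

@dataclass
class Company:
    name: str
    location: str
    president: Employee            # composition hierarchy: Company -> Employee

class DomesticCompany(Company): pass   # class hierarchy under Company
class ForeignCompany(Company): pass

@dataclass
class VehicleSpecs:
    engine_cc: int

@dataclass
class Vehicle:
    model: str
    specs: VehicleSpecs            # composition hierarchy: Vehicle -> VehicleSpecs
    manufacturer: Company          # composition hierarchy: Vehicle -> Company

class TwoWheeler(Vehicle): pass        # class hierarchy under Vehicle
class FourWheeler(Vehicle): pass

def pune_two_wheeler_makers(vehicles):
    """(president name, company name) for companies in Pune, India that make two-wheelers."""
    result = set()
    for v in vehicles:
        c = v.manufacturer
        if isinstance(v, TwoWheeler) and c.location == "Pune, India":
            result.add((c.president.name, c.name))
    return sorted(result)

a = DomesticCompany("Company A", "Pune, India", Employee("President A"))
b = ForeignCompany("Company B", "Tokyo, Japan", Employee("President B"))
fleet = [
    TwoWheeler("Model X", VehicleSpecs(110), a),
    FourWheeler("Model Y", VehicleSpecs(1200), a),
    TwoWheeler("Model Z", VehicleSpecs(150), b),
]
print(pune_two_wheeler_makers(fleet))  # [('President A', 'Company A')]
```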
Advantages
• Data access is easy.
• It provides higher-performance management of objects and complex interrelationships
between objects.
• Unlike traditional databases (such as hierarchical, network or relational), object
oriented databases are capable of storing different types of data, for example pictures,
voice and video, as well as text, numbers and so on.
Disadvantages
• There is no universally agreed data model for an OODBMS, and most models lack a
theoretical foundation.
• The increased functionality provided by an OODBMS makes the system more complex
than traditional DBMSs.
Object-Relational Models
• Most recent trend; started with the Informix Universal Server.
The object-relational model is designed to provide relational database management that allows
developers to integrate their own data types and methods into the database. It is essentially a
relational model that allows users to integrate object-oriented features into it. The
object-relational model combines the advantages of modern object-oriented programming languages
with relational database features such as multiple views of data and a high-level, non-procedural
query language.
Some of these systems available in the market are IBM's DB2 Universal Server, Oracle Corporation's
Oracle 8, Microsoft Corporation's SQL Server 7 and so on.
Advantages
• It allows users to define new data types that combine one or more of the currently
existing data types. Complex types give more flexibility in organizing the data in a
structure made up of columns and tables.
• Users are able to extend the capability of the database server by defining new data types
as well as user-defined patterns. This allows users to store and manage data of these new
types.
Disadvantages
• Storage structures and access methods become quite complex.
• Indexing on user-defined types raises issues.
UNIT-II ENTITY RELATIONSHIP MODEL and
RELATIONAL DATA MODEL

ENTITY RELATIONSHIP MODEL

The entity relationship (E-R) model consists of a collection of basic objects, called entities and of
relationships among these entities.
Entity
An entity can be a real-world object, either animate or inanimate, that is
easily identifiable. For example, in a school database, students, teachers,
classes and courses offered can be considered as entities. Entities are
represented by means of rectangles.
Entity set:
Same as an entity type, but defined at a particular point in time, such as students enrolled in a
class on the first day. Other examples: Customers who purchased last month, cars currently
registered in Florida. A related term is instance, in which the specific person or car would be an
instance of the entity set.
Entity categories:
Entities are categorized as strong, weak or associative. A strong entity can be defined solely by
its own attributes, while a weak entity cannot. An associative entity associates entities (or
elements) within an entity set.
Weak Entity Set:
• An entity set that does not have a primary key is referred to as a weak entity set.
• The existence of a weak entity set depends on the existence of a strong entity set.
• The discriminator of a weak entity set is the set of attributes that distinguishes
among all the entities of a weak entity set that depend on the same strong entity.
• The primary key of a weak entity set is formed by the primary key of the strong
entity set on which the weak entity set is existence-dependent, plus the weak entity set's
discriminator.
• A double rectangle represents a weak entity set.
• In the following diagram, payment is a weak entity set, shown in a double rectangle.
Example:
• Payment-number – discriminator of the payment entity set
• Primary key for payment – (loan-number, payment-number)
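The payment example can be expressed in SQL, run here through Python's sqlite3 module. This is a sketch under assumptions: the loan table, the amount columns and the sample values are hypothetical, and ON DELETE CASCADE is just one way to model existence dependency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE loan (                       -- strong entity set
    loan_number TEXT PRIMARY KEY,
    amount      INTEGER
);
CREATE TABLE payment (                    -- weak entity set
    loan_number    TEXT    NOT NULL
                   REFERENCES loan(loan_number) ON DELETE CASCADE,
    payment_number INTEGER NOT NULL,      -- discriminator
    payment_amount INTEGER,
    PRIMARY KEY (loan_number, payment_number)
);
""")
conn.executemany("INSERT INTO loan VALUES (?, ?)", [("L-17", 10000), ("L-23", 20000)])
# payment_number 1 appears under two loans: it only has to be unique per loan
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)",
                 [("L-17", 1, 500), ("L-17", 2, 500), ("L-23", 1, 800)])

conn.execute("DELETE FROM loan WHERE loan_number = 'L-17'")   # payments of L-17 go too
remaining = conn.execute("SELECT loan_number, payment_number FROM payment").fetchall()
print(remaining)  # [('L-23', 1)]
```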
Strong Entity Set:
• It exists independently of other entity types.
• It always possesses one or more attributes that uniquely distinguish each
occurrence of the entity.
Relationship
A relationship is an association among several entities. For example an
employee works at a department, a student enrolls in a course. Here, works
at and enrolls are called relationship. Relationships are represented by
diamond-shaped box.
Attributes

Entities are represented by means of their properties, called attributes. All
attributes have values. For example, a student entity may have name, class, and
age as attributes. Attributes are represented by means of ellipses. Every ellipse
represents one attribute and is directly connected to its entity (rectangle).
Attribute categories:
Attributes are categorized as simple, composite, derived, as well as single-value or multi-value.
Simple: The attribute value is atomic and can't be further divided, such as a phone
number.
Composite: The attribute is made up of sub-attributes, such as a name made up of first name and
last name.
Derived: The attribute is calculated or otherwise derived from another attribute, such as age from a
birthdate.
Multi-value: The attribute can hold more than one value, such as multiple phone numbers for a
person.
Single-value: The attribute holds just one value. The types can be combined, such as simple
single-value attributes or composite multi-value attributes.
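A Python dataclass can show the attribute categories side by side. The Person and Name classes and the sample values (including the placeholder phone strings) are hypothetical; the derived attribute age is computed on demand rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:                      # composite attribute: made of sub-attributes
    first: str
    last: str

@dataclass
class Person:
    person_id: int               # simple, single-valued
    name: Name                   # composite, single-valued
    birth_date: date             # simple, stored
    phones: list = field(default_factory=list)   # multi-valued

    def age(self, on: date) -> int:
        # derived attribute: computed from birth_date, never stored
        years = on.year - self.birth_date.year
        if (on.month, on.day) < (self.birth_date.month, self.birth_date.day):
            years -= 1           # birthday not yet reached in that year
        return years

p = Person(1, Name("Neena", "Sharma"), date(2000, 8, 15), ["phone-1", "phone-2"])
print(p.age(date(2024, 8, 14)))  # 23
print(p.age(date(2024, 8, 15)))  # 24
```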
Entity keys:
An entity key is an attribute, or set of attributes, that uniquely identifies an entity in an entity
set. Entity keys can be super, candidate or primary keys.
Super key: A set of attributes (one or more) that together uniquely identify an entity in an entity set.
Candidate key: A minimal super key, meaning no proper subset of its attributes is still a super key.
An entity set may have more than one candidate key.
Primary key: A candidate key chosen by the database designer to uniquely identify entities in the
entity set.
Foreign key: An attribute (or set of attributes) that refers to the primary key of another entity set;
it identifies the relationship between entities.
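The super key and candidate key definitions can be checked mechanically against a sample relation. This Python sketch is illustrative only: it tests uniqueness in one instance, whereas real keys are design rules that must hold for every legal instance. The student rows and e-mail addresses are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    # attrs is a super key *for this instance* if no two rows agree on all of attrs
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    # candidate keys = super keys that contain no smaller super key already found
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if is_superkey(rows, combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

students = [
    {"roll_no": 1, "email": "miya@example.edu",     "name": "Miya",     "class": "SY"},
    {"roll_no": 2, "email": "priyanka@example.edu", "name": "Priyanka", "class": "SY"},
    {"roll_no": 3, "email": "miya2@example.edu",    "name": "Miya",     "class": "TY"},
]
attrs = ["roll_no", "email", "name", "class"]
print(candidate_keys(students, attrs))  # [('roll_no',), ('email',), ('name', 'class')]
```

In this tiny instance (name, class) also passes the test, which is exactly why keys are chosen from the meaning of the data rather than from one snapshot.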
The ER Model can be diagrammatically represented as follows:
Example of ER model
Let us consider an ER model for a banking system consisting of customers and accounts. The
diagram shown below indicates that there are two entity sets, customer and account, with
attributes customer name, address, account no. and balance. The diagram also shows a
relationship set, depositor, between customer and account.

Relationship Types:
There are three types of relationship that exist between Entities.
a) Binary Relationship
b) Recursive Relationship
c) Ternary Relationship
a) Binary Relationship
Binary Relationship means relation between two Entities.
This is further divided into three types.

1. One to One: This type of relationship is rarely seen in the real world.
The above example describes that one student can enroll for only one course, and a course
will also have only one student. This is not what you will usually see in real relationships.
2. One to Many: It reflects the business rule that one entity is associated with many entities of
another entity set. The example for this relationship might sound a little odd, but it means that
one student can enroll in many courses, but one course will have only one student.
3. Many to One: It reflects the business rule that many entities can be associated with just one entity.
For example, a student enrolls for only one course, but a course can have many students.
The arrows in the diagram show that one student can enroll for only one course.
4. Many to Many:
The above diagram represents that a student can enroll in more than one course, and a course can
have many students.
b) Recursive Relationship
When an entity set is related to itself, it is known as a recursive relationship; for example, an
employee manages other employees.
c) Ternary Relationship
Relationship of degree three is called Ternary relationship.
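A recursive relationship such as "employee manages employee" is usually stored as a foreign key that points back into the same table. In this sqlite3 sketch the employee table, its columns and the rows are hypothetical; the self-join pairs each employee with their manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee(emp_id)  -- refers back to the same table
)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "E1", None),   # top-level manager has no manager
    (2, "E2", 1),
    (3, "E3", 1),
])
pairs = conn.execute("""
    SELECT w.name, m.name
    FROM employee AS w JOIN employee AS m ON w.manager_id = m.emp_id
    ORDER BY w.emp_id
""").fetchall()
print(pairs)  # [('E2', 'E1'), ('E3', 'E1')]
```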
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
1. Binary = degree 2
2. Ternary = degree 3
3. n-ary = degree n
Cardinalities or Mapping Cardinalities or Cardinality Constraints:
Cardinality is a constraint on a relationship specifying the number of entity instances that a
specific entity may be related to via the relationship.
Cardinality defines the number of entities in one entity set, which can be associated with the
number of entities of other set via relationship set.
• One-to-one − One entity from entity set A can be associated with at most one entity of
entity set B, and vice versa.
• One-to-many − One entity from entity set A can be associated with more than one
entity of entity set B; however, an entity from entity set B can be associated with at most
one entity of entity set A.
• Many-to-one − An entity from entity set A can be associated with at most one entity of
entity set B; however, an entity from entity set B can be associated with more than one
entity from entity set A.
• Many-to-many − One entity from A can be associated with more than one entity from B,
and vice versa.
Participation Constraints:
• Total participation – Each entity in the entity set takes part in at least one relationship in the
relationship set. Total participation is represented by double lines.
• Partial participation – Not all entities are involved in the relationship. Partial
participation is represented by single lines.
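Both ideas can be checked on a concrete relationship set written as (a, b) pairs. In this Python sketch the function names and the student/course data are hypothetical; it classifies one instance, while a cardinality or participation constraint in an ER design applies to every possible instance.

```python
def mapping_cardinality(pairs):
    """Classify one relationship-set instance given as (a, b) pairs."""
    b_of, a_of = {}, {}
    for a, b in pairs:
        b_of.setdefault(a, set()).add(b)
        a_of.setdefault(b, set()).add(a)
    a_has_many = any(len(bs) > 1 for bs in b_of.values())   # some a linked to >1 b
    b_has_many = any(len(as_) > 1 for as_ in a_of.values())  # some b linked to >1 a
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

def is_total(entity_set, pairs, side):
    """True if every entity appears on `side` (0 = A, 1 = B) of at least one pair."""
    return set(entity_set) <= {pair[side] for pair in pairs}

students = {"S1", "S2", "S3"}
courses = {"C1", "C2"}
enrolls = [("S1", "C1"), ("S1", "C2"), ("S2", "C1")]
print(mapping_cardinality(enrolls))    # many-to-many
print(is_total(courses, enrolls, 1))   # True: every course has a student
print(is_total(students, enrolls, 0))  # False: S3 takes no course
```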

ER Design Issues
The following issues need to be considered for a better ER design:
1. Use of Entity set vs. Attributes

In the real world situations, sometimes it is difficult to select the property as an attribute or an
entity set.
2. Use of Entity sets vs. Relationship sets

Sometimes an entity set can be better expressed as a relationship set. Thus, it is not always clear
whether an object is best expressed by an entity set or a relationship set.

3. Binary vs. n-ary relationship sets

Relationships in databases are often binary. Some relationships that appear to be non-binary
could actually be better represented by several binary relationships.
4. Placement of Relationship Attributes

The cardinality ratio of a relationship can affect the placement of relationship attributes:

• One-to-Many: Attributes of a 1:M relationship set can be moved only to the entity set on
the many side of the relationship.
• One-to-One: The relationship attribute can be associated with either one of the participating
entities.
• Many-to-Many: Here, the relationship attributes cannot be moved to the entity sets; rather,
they are represented by the entity set created for the relationship set.
ER Design Methodologies: (To resolve design issues)
The guidelines that should be followed while designing an ER diagram are discussed below:
• Recognize entity sets
• Recognize relationship sets and participating entity sets
• Recognize attributes of entity sets and attributes of relationship sets
• Define binary relationship types and existence dependencies
• Define general cardinality, constraints, keys, and discriminators
• Design diagram.
ER Model Symbols or ERD Symbols and Notations
There are several notation systems, which are similar but vary in a few specifics.
Chen notation style


RELATIONAL DATA MODEL

The relational model uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name.
The relational model was proposed by E. F. Codd at IBM in 1970. It represents data and
relationships among data by a collection of tables, each of which has a number of columns with
unique names. The relational data model is the primary data model used widely around the world
for data storage and processing. The model is simple, and it has the properties and capabilities
required to process data with storage efficiency.
For example, the following figure shows a relational database of customers and their
accounts. The customer Neena has two accounts, with balances of Rs. 50000 and Rs. 30000.
Table Name: Customer
Name         Street            City    Account Number
Neena        Tagore Garden     Delhi   101
Neena        Tagore Garden     Delhi   201
Jacqueline   Janakpuri         Delhi   402
Peter        Connaught Place   Delhi   506
Table Name: Account Info
Account Number   Balance
101              50000
201              30000
402              150000
506              80000
Concepts of Relational Model

Tables − In the relational data model, relations are saved in the format of tables. This format
stores the relation among entities. A table has rows and columns, where rows represent records
and columns represent attributes.
Tuple − A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents a relation
instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name) and the names of
its attributes.
Relation key − Each row has one or more attributes, known as the relation key, which can identify
the row in the relation (table) uniquely.
Attribute domain − Every attribute has some predefined value scope, known as the attribute
domain.
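These terms map naturally onto Python types: a NamedTuple plays the role of the relation schema (relation name plus attribute names), and a set of such tuples plays the role of a relation instance, so duplicate tuples collapse automatically. The attribute names follow the Customer_schema example used in these notes; the rows are sample values. Python does not enforce attribute domains here.

```python
from typing import NamedTuple

class Customer(NamedTuple):
    # relation schema: relation name + attribute names (type hints are not enforced)
    customer_name: str
    customer_street: str
    customer_city: str

# relation instance: a finite set of tuples, so an exact duplicate collapses
customer = {
    Customer("Neena", "Tagore Garden", "Delhi"),
    Customer("Peter", "Connaught Place", "Delhi"),
    Customer("Neena", "Tagore Garden", "Delhi"),   # duplicate tuple
}
print(Customer._fields)  # ('customer_name', 'customer_street', 'customer_city')
print(len(customer))     # 2
```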
Constraints in Relational Model:

Every relation has some conditions that must hold for it to be a valid relation. These conditions
are called Relational Integrity Constraints.
There are three main integrity constraints:
• Key constraints
• Domain constraints
• Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation, which can identify a tuple
uniquely. This minimal subset of attributes is called key for that relation. If there are more than
one such minimal subsets, these are called candidate keys.
a) Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a non-negative
integer. The same kind of constraint is applied to the attributes of a relation. Every
attribute is bound to have a specific range of values. For example, age cannot be less than zero
and telephone numbers cannot contain a character outside 0-9.
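In SQL, domain constraints are expressed through column types and CHECK clauses. A minimal sqlite3 sketch, assuming a hypothetical person table; the GLOB pattern rejects any phone value containing a non-digit character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE person (
    name  TEXT NOT NULL,
    age   INTEGER CHECK (age >= 0),                   -- age cannot be negative
    phone TEXT    CHECK (phone NOT GLOB '*[^0-9]*')   -- digits 0-9 only
)""")
conn.execute("INSERT INTO person VALUES ('Neena', 24, '9800000001')")

def rejected(sql):
    """Run an INSERT and report whether a constraint stopped it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

bad_age = rejected("INSERT INTO person VALUES ('Peter', -3, '9800000002')")
bad_phone = rejected("INSERT INTO person VALUES ('Peter', 30, '98-000')")
print(bad_age, bad_phone)  # True True
```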
b) Integrity Rules
Relational database integrity rules are very important to good database design. Many (but by no
means all) RDBMS enforce integrity rules automatically. Those rules are:
1- Integrity Rule 1 OR Entity Integrity OR Key Constraints
• In a relation with a key attribute, no two tuples can have identical values for the key
attributes.
• A key attribute cannot have NULL values.
All primary key entries are unique, and no part of a primary key may be null. Each row will have a
unique identity, and foreign key values can properly reference primary key values. For example,
no invoice can have a duplicate number, nor can it be null. In short, all invoices are uniquely
identified by their invoice number.
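Entity integrity in practice, using a hypothetical invoice table in SQLite. NOT NULL is written explicitly because SQLite, unlike the SQL standard, allows NULL in most PRIMARY KEY columns unless it is stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE invoice (
    invoice_no TEXT NOT NULL PRIMARY KEY,
    amount     INTEGER
)""")
conn.execute("INSERT INTO invoice VALUES ('INV-1', 1200)")

errors = []
for row in [("INV-1", 500), (None, 700)]:   # duplicate key, then NULL key
    try:
        conn.execute("INSERT INTO invoice VALUES (?, ?)", row)
    except sqlite3.IntegrityError as e:
        errors.append(str(e))                # both inserts are refused
print(len(errors))  # 2
```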
2- Integrity Rule 2 OR Referential Integrity
Referential integrity constraints work on the concept of foreign keys. A foreign key is an
attribute (or set of attributes) of a relation that refers to the key of another relation.
The referential integrity constraint states that if a relation refers to a key attribute of a different
or the same relation, then that key value must exist.
A foreign key may have either a null entry, as long as it is not part of its table's primary key, or
an entry that matches the primary key value in the table to which it is related (every non-null
foreign key value must reference an existing primary key value). It is possible for an attribute not
to have a corresponding value, but it is impossible to have an invalid entry. For example, a
customer might not yet have an assigned sales representative (number), but it is impossible
to have an invalid sales representative (number).
To avoid nulls, some designers use special codes, known as flags, to indicate the absence of
some value.
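The sales-representative example can be reproduced in SQLite. The table and column names are hypothetical; note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON. A NULL foreign key is accepted, while a reference to a non-existent representative is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE sales_rep (
    rep_no INTEGER PRIMARY KEY,
    name   TEXT
);
CREATE TABLE customer (
    cust_no INTEGER PRIMARY KEY,
    name    TEXT,
    rep_no  INTEGER REFERENCES sales_rep(rep_no)   -- foreign key, NULL allowed
);
""")
conn.execute("INSERT INTO sales_rep VALUES (10, 'R1')")
conn.execute("INSERT INTO customer VALUES (1, 'C1', 10)")     # valid reference
conn.execute("INSERT INTO customer VALUES (2, 'C2', NULL)")   # no rep assigned yet: allowed

try:
    conn.execute("INSERT INTO customer VALUES (3, 'C3', 99)")  # rep 99 does not exist
    invalid_accepted = True
except sqlite3.IntegrityError:
    invalid_accepted = False
print(invalid_accepted)  # False
```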

Other integrity rules that can be enforced in the relational model are the NOT NULL and
UNIQUE constraints.
The NOT NULL constraint can be placed on a column to ensure that every row in the table has a
value for that column. The UNIQUE constraint is a restriction placed on a column to ensure that
no duplicate values exist for that column.
Instances and Schemas

The collection of information stored in the database at a particular moment is called an instance
of the database. The overall design of the database is called the database schema. Schemas are
changed infrequently, if at all.
A database schema corresponds to the variable declarations (along with associated type
definitions) in a program. Each variable has a particular value at a given instant. The values of
the variables in a program at a point in time correspond to an instance of a database schema.
The physical schema describes the database design at the physical level, while the logical
schema describes the database design at the logical level. A database may also have several
schemas at the view level, sometimes called subschemas, that describe different views of the
database.
Schema:
If A1, A2, …, An are attributes, then
R = (A1, A2, …, An) is a relation schema.
Example:
Customer_schema = (customer_name, customer_street, customer_city)
Instance:
The current values (relation instance) of a relation are specified by a table.
An element t of r is a tuple, represented by a row in a table.
Example:
Instance For Customer
INTEGRITY CONSTRAINTS
Constraints are the rules enforced on the data columns of a table. They are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the database.
Constraints could be column level or table level. Column level constraints are applied only to
one column, whereas table level constraints are applied to the whole table.
Following are commonly used constraints available in SQL.
1. NOT NULL Constraint: Ensures that a column cannot have NULL value.
By default, a column can hold NULL values. If you do not want a column to have a NULL
value, then you need to define such constraint on this column specifying that NULL is now
not allowed for that column.
A NULL is not the same as no data, rather, it represents unknown data.
Example:
For example, the following SQL creates a new table called CUSTOMERS and adds five
columns, three of which, ID and NAME and AGE, specify not to accept NULLs:

CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
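The CUSTOMERS definition above can be run unchanged in SQLite through Python's sqlite3 module (SQLite accepts these type names, though it does not enforce VARCHAR lengths). The sample names are hypothetical. Omitting ADDRESS and SALARY is allowed, but omitting AGE is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CUSTOMERS(
   ID INT NOT NULL,
   NAME VARCHAR (20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25) ,
   SALARY DECIMAL (18, 2),
   PRIMARY KEY (ID)
)""")
# ADDRESS and SALARY may be left out; they become NULL
conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Neena', 24)")
# leaving out AGE violates its NOT NULL constraint
try:
    conn.execute("INSERT INTO CUSTOMERS (ID, NAME) VALUES (2, 'Peter')")
    missing_age_accepted = True
except sqlite3.IntegrityError:
    missing_age_accepted = False
print(missing_age_accepted)  # False
```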
2. DEFAULT Constraint: Provides a default value for a column when none is specified.
The DEFAULT constraint provides a default value for a column when the INSERT INTO
statement does not provide a specific value.
Example:
The following SQL creates a new table called CUSTOMERS with five columns. Here, the
SALARY column is set to 5000.00 by default, so if an INSERT INTO statement does not
provide a value for this column, it is set to 5000.00.
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2) DEFAULT 5000.00,
PRIMARY KEY (ID)
);
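A sketch using Python's sqlite3 module, with SQLite standing in for the DBMS and hypothetical customer names, shows the default being applied when SALARY is left out of the INSERT column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CUSTOMERS(
  ID INT NOT NULL,
  NAME VARCHAR (20) NOT NULL,
  AGE INT NOT NULL,
  ADDRESS CHAR (25),
  SALARY DECIMAL (18, 2) DEFAULT 5000.00,
  PRIMARY KEY (ID)
)""")

# SALARY is omitted from the column list, so the DEFAULT value is stored.
conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Ramesh', 32)")
# An explicit value overrides the default.
conn.execute(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE, SALARY) VALUES (2, 'Khilan', 25, 1500.00)"
)

# Map each ID to its stored SALARY.
salaries = dict(conn.execute("SELECT ID, SALARY FROM CUSTOMERS"))
```

Note that an explicit NULL in the INSERT is stored as NULL; the default applies only when the column is omitted.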
If the CUSTOMERS table has already been created, then to add a DEFAULT constraint to the
SALARY column, you would write a statement similar to the following (MySQL syntax):
ALTER TABLE CUSTOMERS
MODIFY SALARY DECIMAL (18, 2) DEFAULT 5000.00;
3. UNIQUE Constraint: Ensures that all values in a column are different.
The UNIQUE constraint prevents two records from having identical values in a
particular column. In the CUSTOMERS table, for example, you might want to prevent
two or more people from having the same age.
Example:
The following SQL creates a new table called CUSTOMERS with five columns. Here, the
AGE column is set to UNIQUE, so you cannot have two records with the same age:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL UNIQUE,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
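The UNIQUE rule can be checked the same way in SQLite through Python's sqlite3 module (a sketch; SQLite stands in for the DBMS and the names are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CUSTOMERS(
  ID INT NOT NULL,
  NAME VARCHAR (20) NOT NULL,
  AGE INT NOT NULL UNIQUE,
  ADDRESS CHAR (25),
  SALARY DECIMAL (18, 2),
  PRIMARY KEY (ID)
)""")

conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Ramesh', 32)")

# A second customer with AGE 32 violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (2, 'Khilan', 32)")
    duplicate_age_rejected = False
except sqlite3.IntegrityError:
    duplicate_age_rejected = True

# A different age is accepted.
conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (3, 'Kaushik', 23)")
ages = sorted(a for (a,) in conn.execute("SELECT AGE FROM CUSTOMERS"))
```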
4. PRIMARY Key: Uniquely identifies each row/record in a database table.
A primary key is a field in a table which uniquely identifies each row/record in a
database table. Primary keys must contain unique values. A primary key column cannot
have NULL values.
A table can have only one primary key, which may consist of single or multiple fields.
When multiple fields are used as a primary key, they are called a composite key.
If a table has a primary key defined on any field(s), then you cannot have two records
having the same value of that field(s).
Note: You would use these concepts while creating database tables.
Create Primary Key:
Here is the syntax to define the ID attribute as a primary key in a CUSTOMERS table.
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
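The composite-key case described above can be sketched in SQLite through Python's sqlite3 module. The STUDENT_PROJECT table and its values are hypothetical. Only the combination (STU_ID, PROJ_ID) has to be unique, not each column on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a composite primary key (STU_ID, PROJ_ID).
conn.execute("""
CREATE TABLE STUDENT_PROJECT(
  STU_ID INT NOT NULL,
  PROJ_ID INT NOT NULL,
  PRIMARY KEY (STU_ID, PROJ_ID)
)""")

conn.execute("INSERT INTO STUDENT_PROJECT VALUES (1, 10)")
conn.execute("INSERT INTO STUDENT_PROJECT VALUES (1, 20)")  # same student, other project
conn.execute("INSERT INTO STUDENT_PROJECT VALUES (2, 10)")  # other student, same project

# Repeating the full key (1, 10) is rejected.
try:
    conn.execute("INSERT INTO STUDENT_PROJECT VALUES (1, 10)")
    duplicate_key_rejected = False
except sqlite3.IntegrityError:
    duplicate_key_rejected = True

rows = conn.execute("SELECT COUNT(*) FROM STUDENT_PROJECT").fetchone()[0]
```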
5. FOREIGN Key: References a row/record in another database table.
A foreign key is a key used to link two tables together. It is sometimes called a
referencing key.
A foreign key is a column or a combination of columns whose values match a primary key in
a different table.
The relationship between two tables matches the primary key in one of the tables with a
foreign key in the second table.
Example:
Consider the structure of the two tables as follows:
CUSTOMERS table:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
ORDERS table:
CREATE TABLE ORDERS (
ID INT NOT NULL,
DATE DATETIME,
CUSTOMER_ID INT references CUSTOMERS(ID),
AMOUNT double,
PRIMARY KEY (ID)
);
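In this sketch, SQLite (through Python's sqlite3) stands in for the DBMS. The tables are trimmed to the columns that matter, and the data values are hypothetical. CUSTOMER_ID in ORDERS references the primary key ID of CUSTOMERS, so an order for a non-existent customer is rejected. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE CUSTOMERS(
  ID INT NOT NULL,
  NAME VARCHAR (20) NOT NULL,
  PRIMARY KEY (ID)
)""")
conn.execute("""
CREATE TABLE ORDERS(
  ID INT NOT NULL,
  CUSTOMER_ID INT REFERENCES CUSTOMERS(ID),
  AMOUNT DOUBLE,
  PRIMARY KEY (ID)
)""")

conn.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ramesh')")
conn.execute("INSERT INTO ORDERS VALUES (100, 1, 3000)")  # customer 1 exists

# Customer 99 does not exist, so the referencing row is rejected.
try:
    conn.execute("INSERT INTO ORDERS VALUES (101, 99, 1500)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting a customer that still has orders is rejected too.
try:
    conn.execute("DELETE FROM CUSTOMERS WHERE ID = 1")
    parent_delete_rejected = False
except sqlite3.IntegrityError:
    parent_delete_rejected = True
```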
6. CHECK Constraint: The CHECK constraint ensures that all values in a column satisfy certain
conditions.
The CHECK constraint specifies a condition that is tested for the value being entered into a
record. If the condition evaluates to false, the record violates the constraint and is not
entered into the table.
Example:
The following SQL creates a new table called CUSTOMERS with five columns. Here, we add a
CHECK on the AGE column, so that you cannot have any customer below 18 years of age:
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL CHECK (AGE >= 18),
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
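A short SQLite sketch (through Python's sqlite3; SQLite stands in for the DBMS and the names are hypothetical) shows the CHECK condition rejecting an under-age customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CUSTOMERS(
  ID INT NOT NULL,
  NAME VARCHAR (20) NOT NULL,
  AGE INT NOT NULL CHECK (AGE >= 18),
  ADDRESS CHAR (25),
  SALARY DECIMAL (18, 2),
  PRIMARY KEY (ID)
)""")

conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Ramesh', 32)")

# AGE >= 18 is false for 17, so this row is not entered.
try:
    conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (2, 'Komal', 17)")
    minor_rejected = False
except sqlite3.IntegrityError:
    minor_rejected = True

count = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
```

In SQL, a CHECK whose condition evaluates to unknown (for example because of a NULL) does not reject the row. Here AGE is also NOT NULL, so that case cannot arise.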
Functional Dependency (FD):
Attributes of a table are said to be dependent on each other when an attribute of the table
uniquely identifies another attribute of the same table.
A functional dependency (FD) is a constraint between two sets of attributes in a relation.
A functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An,
then those two tuples must have the same values for attributes B1, B2, ..., Bm.
A functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally
determines Y. The left-hand side attributes determine the values of the attributes on the
right-hand side.
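The definition above can be tested against a table instance with a short Python function. `fd_holds` is a hypothetical helper, and the Student rows are made up. An instance can only show that an FD is violated. Whether an FD holds is a rule about every possible instance.

```python
def fd_holds(rows, X, Y):
    """Return True if X -> Y holds in this instance.

    rows: list of dicts (one per tuple); X, Y: tuples of attribute names.
    Two tuples that agree on every attribute of X must agree on every attribute of Y.
    """
    seen = {}
    for t in rows:
        left = tuple(t[a] for a in X)
        right = tuple(t[b] for b in Y)
        # Same X-values seen before with different Y-values: the FD is violated.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical Student instance.
student = [
    {"SID": 1, "Sname": "Asha", "City": "Pune"},
    {"SID": 2, "Sname": "Ravi", "City": "Pune"},
    {"SID": 3, "Sname": "Asha", "City": "Mumbai"},
]

sid_to_name = fd_holds(student, ("SID",), ("Sname",))    # SID is a key
name_to_city = fd_holds(student, ("Sname",), ("City",))  # "Asha" has two cities
```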
Single Valued Functional Dependency –
A simple example of a single-valued functional dependency is when A is the primary key of an
entity (e.g. SID) and B is some single-valued attribute of the entity (e.g. Sname). Then A → B
must always hold.
Partial Functional Dependency –
A functional dependency in which one or more non-key attributes depend on a part (a proper
subset) of the primary key is called a partial functional dependency.
Transitive Dependency –
Given a relation R(A, B, C), if A → B and B → C hold, then A → C is a transitive dependency,
since it is implied by the other two.
Trivial Dependency –
If a functional dependency (FD) X → Y holds where Y is a subset of X, it is called a trivial
FD. Trivial FDs always hold.
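The transitive and trivial cases can be derived mechanically with the standard attribute-closure algorithm. This algorithm is not described in these notes, and `closure` is a hypothetical helper name. Starting from X, it keeps adding the right-hand side of every FD whose left-hand side is already contained in the result. X → Y is implied exactly when Y is contained in the closure of X.

```python
def closure(attrs, fds):
    """Compute the closure X+ of a set of attributes under a list of FDs.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# R(A, B, C) with A -> B and B -> C, as in the transitive-dependency example.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

a_plus = closure({"A"}, fds)
implies_a_to_c = "C" in a_plus                # A -> C is implied (transitive)
trivial = {"A"} <= closure({"A", "B"}, [])    # AB -> A holds even with no FDs
```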
Normalization
Normalization is the process of organizing the data in a database to avoid data redundancy,
insertion anomalies, update anomalies and deletion anomalies.
Database normalization is a technique for organizing the data in the database. It is a
systematic approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like insertion, update and deletion anomalies. It is a multi-step process that
puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used mainly for the following purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.
• Eliminating insertion, update and deletion anomalies.
Anomalies:
Update anomalies − If data items are scattered and are not linked to each other properly, this
can lead to strange situations. For example, when we try to update one data item whose copies
are scattered over several places, a few instances get updated properly while a few others are
left with old values. Such instances leave the database in an inconsistent state.
Deletion anomalies − We try to delete a record, but parts of it are left undeleted because,
unknown to us, the data is also saved somewhere else.
Insert anomalies − We try to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
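The following toy Python sketch shows the update anomaly. It uses hypothetical Student/Project rows in which the student's name is repeated in every row for that student.

```python
# Hypothetical unnormalized table: Stu_Name is copied into every project row.
student_project = [
    {"Stu_ID": 1, "Stu_Name": "Asha", "Proj_ID": 10},
    {"Stu_ID": 1, "Stu_Name": "Asha", "Proj_ID": 20},
    {"Stu_ID": 2, "Stu_Name": "Ravi", "Proj_ID": 10},
]

# Update anomaly: the rename reaches only one of the two copies.
student_project[0]["Stu_Name"] = "Asha Rao"

names_for_student_1 = {r["Stu_Name"] for r in student_project if r["Stu_ID"] == 1}
inconsistent = len(names_for_student_1) > 1  # one student, two names

# With the name stored once (a separate Student table), one update is enough.
student = {1: "Asha", 2: "Ravi"}
student[1] = "Asha Rao"
```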
First Normal Form:
First Normal Form is defined in the definition of relations (tables) itself. This rule requires
that all the attributes in a relation have atomic domains. The values in an atomic domain are
indivisible units.
• As per the rule of first normal form, an attribute (column) of a table cannot hold multiple
values. It should hold only atomic values.
We re-arrange the relation (table) as below to convert it to First Normal Form.
Each attribute must contain only a single value from its pre-defined domain.
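The 1NF rearrangement can be sketched in Python. The table below is hypothetical, and its Phone column holds a list of values. Producing one row per atomic value, with the other attributes repeated, brings it into First Normal Form.

```python
# Hypothetical table that violates 1NF: Phone holds several values in one cell.
unnormalized = [
    {"Stu_ID": 1, "Name": "Asha", "Phone": ["98200", "98211"]},
    {"Stu_ID": 2, "Name": "Ravi", "Phone": ["99300"]},
]

# 1NF: one row per atomic Phone value, repeating the other attributes.
first_nf = [
    {"Stu_ID": r["Stu_ID"], "Name": r["Name"], "Phone": p}
    for r in unnormalized
    for p in r["Phone"]
]

atomic = all(isinstance(r["Phone"], str) for r in first_nf)
```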
Second Normal Form:
Before we learn about the second normal form, we need to understand the following −
Prime attribute − An attribute which is a part of a candidate key is known as a prime attribute.
Non-prime attribute − An attribute which is not a part of any candidate key is said to be a
non-prime attribute.
A table is said to be in 2NF if both the following conditions hold:
• The table is in 1NF (First Normal Form).
• No non-prime attribute is dependent on a proper subset of any candidate key of the table,
i.e. there is no partial dependency.
If we follow second normal form, then every non-prime attribute should be fully functionally
dependent on the candidate key. That is, if X → A holds, then there should not be any proper
subset Y of X for which Y → A also holds.
We see here in the Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be dependent
upon both and not on either of the prime key attributes individually. But we find that Stu_Name
can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
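The 2NF test can be sketched in Python for the Student_Project example. The helper `partial_dependencies` is hypothetical and assumes a single candidate key. FDs are written as (left side, right side) pairs of attribute sets. The decomposed relation name Project is also an assumption.

```python
def partial_dependencies(key, fds):
    """Return (lhs, non_prime_rhs) pairs where lhs is a proper subset of the key."""
    found = []
    for lhs, rhs in fds:
        non_prime = rhs - key        # assumes `key` is the only candidate key
        if lhs < key and non_prime:  # '<' is the proper-subset test on sets
            found.append((lhs, non_prime))
    return found


key = {"Stu_ID", "Proj_ID"}
fds = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]
violations = partial_dependencies(key, fds)
in_2nf = not violations

# After decomposition, Project(Proj_ID, Proj_Name) has key {Proj_ID}: nothing partial remains.
project_ok = not partial_dependencies({"Proj_ID"}, [({"Proj_ID"}, {"Proj_Name"})])
```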
Third Normal Form:
A table design is said to be in 3NF if both the following conditions hold:
• The table must be in 2NF.
• Transitive functional dependency of non-prime attributes on any super key should be
removed.
An attribute that is not part of any candidate key is known as a non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for each
non-trivial functional dependency X → Y at least one of the following conditions holds:
• X is a super key of the table.
• Y is a prime attribute of the table.
An attribute that is a part of one of the candidate keys is known as a prime attribute.
We find that in the above Student_Detail relation, Stu_ID is the key and the only prime key
attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Zip is not a
super key and City is not a prime attribute. Additionally, Stu_ID → Zip → City, so there exists
a transitive dependency.
To bring this relation into third normal form, we break it into two relations:
Student_Detail (Stu_ID, Stu_Name, Zip) and ZipCodes (Zip, City).
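The 3NF condition can be checked mechanically. The following Python sketch uses the attribute-closure method to decide whether X is a super key. `closure` and `violates_3nf` are hypothetical helpers. The FDs are the ones stated for Student_Detail, with Stu_ID taken as the only candidate key.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs (attribute closure)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_3nf(relation, fds, prime):
    """FDs X -> Y where X is not a super key and Y - X contains a non-prime attribute."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= relation
        if not is_superkey and (rhs - lhs) - prime:
            bad.append((lhs, rhs))
    return bad


student_detail = {"Stu_ID", "Stu_Name", "Zip", "City"}
fds = [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})]
prime = {"Stu_ID"}  # Stu_ID is the only candidate key
problems = violates_3nf(student_detail, fds, prime)  # Zip -> City is the culprit
```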

Boyce-Codd Normal Form:
Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form.
BCNF states that −
For any non-trivial functional dependency X → A, X must be a super key.
In the above image, Stu_ID is the super key in the relation Student_Detail and Zip is the super
key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
which confirms that both relations are in BCNF.
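A minimal Python sketch of the BCNF test follows. It assumes the listed FDs are exactly the FDs that hold on each relation. `is_bcnf` and `closure` are hypothetical helper names. The check confirms the two decomposed relations and rejects the original four-attribute Student_Detail.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs (attribute closure)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(relation, fds):
    """True if every non-trivial FD X -> Y has X as a super key of the relation."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= relation:
            return False
    return True


detail_ok = is_bcnf({"Stu_ID", "Stu_Name", "Zip"}, [({"Stu_ID"}, {"Stu_Name", "Zip"})])
zip_ok = is_bcnf({"Zip", "City"}, [({"Zip"}, {"City"})])
original_ok = is_bcnf(
    {"Stu_ID", "Stu_Name", "Zip", "City"},
    [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})],
)
```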
Relational Algebra
• A query language is a language in which a user requests information from the database. It
can be categorized as either procedural or nonprocedural.
• In a procedural language, the user instructs the system to perform a sequence of operations
on the database to compute the desired result. In a nonprocedural language, the user describes
the desired information without giving a specific procedure for obtaining that information.
• Relational algebra is a procedural query language. It consists of a set of operations
that take one or two relations as input and produce a new relation as output.
It uses operators to perform queries.
• An operator can be either unary or binary. Relational algebra operations can be applied
recursively, and intermediate results are also considered relations.
Fundamental Operations
• SELECT
• PROJECT
• UNION
• SET DIFFERENCE
• CARTESIAN PRODUCT
• RENAME
Select, project and rename are unary operations, as they operate on a single relation. Union,
set difference and Cartesian product are binary operations, as they operate on pairs of
relations.
Other Operations
• SET INTERSECTION
• NATURAL JOIN
• DIVISION
• ASSIGNMENT
1. SELECT (σ):-
It selects tuples that satisfy the given predicate from a relation.
σ condition (relation)
Notation – σ p (r)
where σ stands for the selection operation, r stands for a relation, and p is a propositional
logic formula (the selection predicate), which may use connectives like and, or, and not. Its
terms may use relational operators such as =, ≠, ≥, <, >, ≤.
For example −
σ subject = "database" (Books)
Output − Selects tuples from Books where subject is 'database'.
σ subject = "database" and price = "450" (Books)
Output − Selects tuples from Books where subject is 'database' and price is 450.
σ subject = "database" and price = "450" or year > "2010" (Books)
Output − Selects tuples from Books where subject is 'database' and price is 450, or those books
published after 2010.
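For illustration, a relation can be modeled in Python as a set of tuples, and σ then becomes a filter. The Books rows are hypothetical. Prices and years are stored as numbers rather than the quoted strings used in the notation above.

```python
# Hypothetical Books relation; each tuple is (title, author, subject, price, year).
ATTRS = ("title", "author", "subject", "price", "year")
books = {
    ("DBMS Guide", "Rao", "database", 450, 2009),
    ("SQL Primer", "Shah", "database", 300, 2012),
    ("Networks", "Iyer", "networking", 450, 2008),
    ("AI Intro", "Mehta", "ai", 600, 2015),
}


def select(relation, predicate):
    """σ p (r): keep exactly the tuples for which the predicate p is true."""
    return {t for t in relation if predicate(dict(zip(ATTRS, t)))}


db_books = select(books, lambda t: t["subject"] == "database")
db_450 = select(books, lambda t: t["subject"] == "database" and t["price"] == 450)
# Python's 'and' binds tighter than 'or', so this reads (subject and price) or year.
db_450_or_new = select(
    books,
    lambda t: t["subject"] == "database" and t["price"] == 450 or t["year"] > 2010,
)
```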
2. Project Operation (∏)
o Produces a subset of attributes from a relation
o Unselected columns are eliminated
o Duplicate rows are eliminated
o The result is a relation
π attribute-list (relation)
It projects the listed column(s) of a relation.
Notation − ∏ A1, A2, ..., An (r)
where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
For example −
∏ subject, author (Books)
Projects the columns named subject and author from the relation Books.
• π Name, Hobby (Person)
• π Name, Address (Person)
Combination of Select and Project:
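A Python sketch with hypothetical Books data models relations as sets of tuples. It shows projection dropping duplicate rows, and a selection whose result is then projected.

```python
# Hypothetical Books relation; each tuple is (title, author, subject, price, year).
ATTRS = ("title", "author", "subject", "price", "year")
books = {
    ("DBMS Guide", "Rao", "database", 450, 2009),
    ("SQL Primer", "Shah", "database", 300, 2012),
    ("Networks", "Iyer", "networking", 450, 2008),
    ("AI Intro", "Mehta", "ai", 600, 2015),
}


def select(relation, predicate):
    """σ p (r): keep the tuples for which the predicate is true."""
    return {t for t in relation if predicate(dict(zip(ATTRS, t)))}


def project(relation, columns):
    """∏ columns (r): keep only the listed columns; building a set drops duplicates."""
    positions = [ATTRS.index(c) for c in columns]
    return {tuple(t[i] for i in positions) for t in relation}


subjects = project(books, ("subject",))  # 4 books, only 3 distinct subjects
# ∏ title (σ subject = "database" (Books)): select the rows first, then project.
db_titles = project(select(books, lambda t: t["subject"] == "database"), ("title",))
```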

3. Union Operation (∪)
It performs the binary union of two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s }
Notation − r ∪ s
where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold −
• r and s must have the same number of attributes.
• Attribute domains must be compatible.
Duplicate tuples are automatically eliminated from the result.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have written either a book or an article or both.
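Here is a Python sketch of union over relations stored as sets of tuples. The author names are hypothetical, and `union` is a hypothetical helper. It checks only the first condition, equal numbers of attributes, because it cannot see attribute domains.

```python
def union(r, s):
    """r ∪ s for relations given as sets of tuples of equal arity."""
    arities = {len(t) for t in r} | {len(t) for t in s}
    if len(arities) > 1:
        raise ValueError("relations are not union compatible")
    return r | s  # a set keeps one copy of each tuple


# Hypothetical results of ∏ author (Books) and ∏ author (Articles).
book_authors = {("Rao",), ("Shah",), ("Iyer",)}
article_authors = {("Shah",), ("Mehta",)}
either_or_both = union(book_authors, article_authors)

try:
    union({("Rao",)}, {("Rao", 2009)})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```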

4. Set Difference (−)
The result of the set difference operation is the set of tuples that are present in one relation
but not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the names of authors who have written books but not articles.
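Set difference on the same kind of hypothetical author sets can be sketched in Python as follows. The second line shows that r − s and s − r are different.

```python
# Hypothetical results of ∏ author (Books) and ∏ author (Articles).
book_authors = {("Rao",), ("Shah",), ("Iyer",)}
article_authors = {("Shah",), ("Mehta",)}

books_only = book_authors - article_authors     # wrote books but not articles
articles_only = article_authors - book_authors  # difference is not commutative
```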

5. Cartesian Product (×)
Combines information from two different relations into one.
Notation − r × s
where r and s are relations, and their output is defined as −
r × s = { q t | q ∈ r and t ∈ s }
σ author = 'tutorialspoint' (Books × Articles)
Output − Yields a relation which shows all the books and articles written by tutorialspoint.
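The definition r × s = { q t | q ∈ r and t ∈ s } translates directly into Python with itertools.product. The relations and author names below are hypothetical. Because both inputs have an author column, the selection has to name both copies.

```python
from itertools import product

# Hypothetical relations; each tuple is (title, author).
books = {("DBMS Guide", "Rao"), ("SQL Primer", "Shah")}
articles = {("Indexing", "Rao"), ("Joins", "Iyer")}

# r × s: concatenate every tuple of r with every tuple of s.
cross = {q + t for q, t in product(books, articles)}

# The product has two author columns (positions 1 and 3).
by_rao = {row for row in cross if row[1] == "Rao" and row[3] == "Rao"}
```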
6. Rename Operation (ρ)
The results of relational algebra are also relations but without any name. The rename operation
allows us to rename the output relation. The rename operation is denoted by the small Greek
letter rho (ρ).
Notation − ρ x (E)
where the result of expression E is saved with the name x.

7. Set Intersection
• R ∩ S
• Includes all tuples that are in both R and S.
• R and S must be union compatible.
• The schema of the result is that of R.
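In Python, intersection of two hypothetical union-compatible relations is a plain set intersection. The sketch also checks the identity R ∩ S = R − (R − S), which shows that intersection can be built from set difference.

```python
# Hypothetical union-compatible relations (one attribute each).
r = {("Rao",), ("Shah",), ("Iyer",)}
s = {("Shah",), ("Mehta",), ("Rao",)}

both = r & s                  # R ∩ S
via_difference = r - (r - s)  # the same relation, using set difference only
```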
8. Join:
• Can be defined as a cross-product followed by selection and projection.
• There are several variants of join:
o Condition join
o Equijoin
o Natural join
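The natural join can be sketched in Python as a cross-product, then a selection on the common attributes, then a projection that keeps one copy of each common column. `natural_join` and the employee/department data are hypothetical.

```python
def natural_join(r_attrs, r, s_attrs, s):
    """r ⋈ s: combine tuples that agree on every common attribute."""
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in r_attrs]
    result = set()
    for t in r:
        left = dict(zip(r_attrs, t))
        for u in s:
            right = dict(zip(s_attrs, u))
            if all(left[a] == right[a] for a in common):       # selection on r × s
                result.add(t + tuple(right[a] for a in extra))  # drop repeated columns
    return tuple(r_attrs) + tuple(extra), result


# Hypothetical relations: employee(name, dept) and department(dept, city).
employee = {("Asha", "Sales"), ("Ravi", "HR"), ("Neha", "Sales")}
department = {("Sales", "Pune"), ("HR", "Mumbai"), ("IT", "Delhi")}
attrs, joined = natural_join(("name", "dept"), employee, ("dept", "city"), department)
```

If the two relations share no attribute, the condition is always true and the result is the plain Cartesian product.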



9. Division:
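Division r ÷ s applies when the attributes of s are a subset of those of r. The result contains the tuples t over the remaining attributes such that, for every tuple u in s, the combination of t and u is in r. A typical query is "students who have taken all the required courses". The Python sketch below covers only the two-column case. `divide` and the enrolment data are hypothetical.

```python
def divide(r, s):
    """r ÷ s for r over (A, B) and s over (B): A-values paired in r with every B in s."""
    candidates = {a for (a, _) in r}
    return {(a,) for a in candidates if all((a, b) in r for (b,) in s)}


# Hypothetical (student, course) enrolments and the set of required courses.
enrolled = {
    ("Asha", "DBMS"), ("Asha", "OS"),
    ("Ravi", "DBMS"),
    ("Neha", "DBMS"), ("Neha", "OS"), ("Neha", "CN"),
}
required = {("DBMS",), ("OS",)}
took_all = divide(enrolled, required)  # Ravi is missing OS
```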
10. Assignment:
Sometimes it is useful to be able to write a relational algebra expression in parts using a
temporary relation variable.
The assignment operation, denoted by ←, works like assignment in a programming language.
Example:
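Here is a minimal Python analogy using hypothetical Books data. A variable plays the role of the temporary relation, and later expressions reuse it. The original relation is left unchanged.

```python
# Hypothetical Books tuples: (title, subject).
books = {("DBMS Guide", "database"), ("SQL Primer", "database"), ("Networks", "networking")}

# temp ← σ subject = "database" (Books)
temp = {t for t in books if t[1] == "database"}

# result ← ∏ title (temp)
result = {(title,) for title, _ in temp}
```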
No extra relation is added to the database, but the relation variable created can be used in
subsequent expressions. Assignment to a permanent relation would constitute a modification to
the database.
SQL Data Types
CHAR(size): This data type is used to store character string values of fixed length. The size in
brackets determines the number of characters the cell can hold. The data held is right-padded
with spaces to whatever length is specified.
VARCHAR(size) / VARCHAR2(size): This data type is used to store variable-length alphanumeric
data. It is a more flexible form of the CHAR data type. It can hold up to 4000 bytes.
DATE: This data type is used to represent date and time. The standard format is DD-MON-YY,
as in 21-JUN-04. Date-time values store the time in the 24-hour format.
NUMBER(P,S): The NUMBER data type is used to store numbers (fixed or floating point). Numbers
of virtually any magnitude may be stored, with up to 38 digits of precision.
LONG: This data type is used to store variable-length character strings of up to 2 GB.
RAW/LONG RAW: The RAW/LONG RAW data types are used to store binary data, such as a
digitized picture or image. The RAW data type can have a maximum length of 2000 bytes.
SQL Commands:
1. Create Table:
Command Type : DDL
Command Syntax : CREATE TABLE <Table Name>
(<Column Name1> <datatype> (<size>),
<Column Name2> <datatype> (<size>),
:
:
<Column Name n> <datatype> (<size>)
);
Use : This DDL command is used to create a table.
Example : CREATE TABLE CLIENT_MASTER
(
CLIENTNO VARCHAR2(6),
NAME VARCHAR2(20),
ADDRESS VARCHAR2(30),
CITY VARCHAR2(30),
PINCODE NUMBER(15),
STATE VARCHAR2(5),
BALDUE NUMBER(10,2)
);
Output : Table created.
2. Describe Table:
Command Type : SQL*Plus command (usually taught alongside DDL)
Command Syntax : DESCRIBE <Table Name>;
Use : This command displays the structure of a table (column names, NULL constraints and
data types).
Example : DESCRIBE CLIENT_MASTER
NAME                            NULL?    TYPE
------------------------------- -------- ------------
CLIENTNO                                 VARCHAR2(6)
NAME                                     VARCHAR2(20)
ADDRESS                                  VARCHAR2(30)
CITY                                     VARCHAR2(30)
PINCODE                                  NUMBER(15)
STATE                                    VARCHAR2(5)
BALDUE                                   NUMBER(10,2)
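The CREATE TABLE and DESCRIBE steps can be tried in SQLite through Python's sqlite3 module. SQLite accepts the Oracle type names VARCHAR2 and NUMBER as declared types. SQLite has no DESCRIBE command, so this sketch swaps in PRAGMA table_info, which returns one row per column: (cid, name, type, notnull, dflt_value, pk).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CLIENT_MASTER (
  CLIENTNO VARCHAR2(6),
  NAME VARCHAR2(20),
  ADDRESS VARCHAR2(30),
  CITY VARCHAR2(30),
  PINCODE NUMBER(15),
  STATE VARCHAR2(5),
  BALDUE NUMBER(10,2)
)""")

# Print a DESCRIBE-like listing: name, NOT NULL flag, declared type.
for cid, name, col_type, notnull, default, pk in conn.execute("PRAGMA table_info(CLIENT_MASTER)"):
    print(f"{name:<12} {'NOT NULL' if notnull else '':<9} {col_type}")

column_names = [row[1] for row in conn.execute("PRAGMA table_info(CLIENT_MASTER)")]
```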
3. Insert into:
Command Type : DML
Command Syntax : INSERT INTO <table name> (<columnname1>, <columnname2>)
VALUES (<expression1>, <expression2>);
Use : This DML command is used to insert rows into a table.
Example : INSERT INTO CLIENT_MASTER
VALUES('C00001','IVAN','KANDEWALI','MUMBAI',400054,'M.H',5000);
Output : 1 row created.
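In SQLite through Python's sqlite3, the same INSERT reports a row count of 1. This is a sketch in which SQLite stands in for Oracle. The second client, 'C00002', is hypothetical. It shows that when a column list is given, columns left out of the list are set to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CLIENT_MASTER (
  CLIENTNO VARCHAR2(6), NAME VARCHAR2(20), ADDRESS VARCHAR2(30), CITY VARCHAR2(30),
  PINCODE NUMBER(15), STATE VARCHAR2(5), BALDUE NUMBER(10,2))""")

# Without a column list, the values must follow the table's column order.
cur = conn.execute(
    "INSERT INTO CLIENT_MASTER "
    "VALUES('C00001','IVAN','KANDEWALI','MUMBAI',400054,'M.H',5000)"
)
rows_created = cur.rowcount  # corresponds to "1 row created."

# With a column list, only the named columns get values; the rest are NULL.
conn.execute("INSERT INTO CLIENT_MASTER (CLIENTNO, NAME) VALUES ('C00002', 'CHHAYA')")
second = conn.execute(
    "SELECT CITY, BALDUE FROM CLIENT_MASTER WHERE CLIENTNO = 'C00002'"
).fetchone()
```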

4. Select Command:
Command Type : DML
Command Syntax : SELECT [DISTINCT] * | Column Name1 [, Column Name2 ……..]
| [Functions]
FROM tablename
[WHERE ConditionalExpr [AND | OR ConditionalExpr ……..]]
[GROUP BY Column_Name
[HAVING ConditionalExpr]]
[ORDER BY Column_Name [ASC | DESC] [, Column_Name [ASC | DESC] ….]]
Use : This DML command is used to retrieve rows from one or more tables, using the clauses
"FROM", "WHERE", "GROUP BY", "HAVING" and "ORDER BY".
Example :
SELECT NAME FROM CLIENT_MASTER;
Output : NAME
--------------------
CHHAYA
ASHWINI
HANSEL
SELECT SALESMANNAME FROM SALESMAN_MASTER WHERE SALAMT=3000;
Output : SALESMANNAME
--------------------
AMAN
OMKAR
RAJ
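The following SQLite sketch (through Python's sqlite3) runs the WHERE query and a grouped query that uses the clauses in their required order. The CITY column and all SALESMAN_MASTER rows are hypothetical. The SALAMT values were chosen so that the first query returns the three names shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALESMAN_MASTER (SALESMANNAME VARCHAR2(20), CITY VARCHAR2(20), SALAMT NUMBER(8,2))"
)
conn.executemany(
    "INSERT INTO SALESMAN_MASTER VALUES (?, ?, ?)",
    [("AMAN", "MUMBAI", 3000), ("OMKAR", "PUNE", 3000), ("RAJ", "MUMBAI", 3000),
     ("KIRAN", "PUNE", 4000), ("SONIA", "DELHI", 3500)],
)

# WHERE filters rows; ORDER BY fixes the output order (otherwise it is unspecified).
names = [n for (n,) in conn.execute(
    "SELECT SALESMANNAME FROM SALESMAN_MASTER WHERE SALAMT = 3000 ORDER BY SALESMANNAME")]

# Clause order: WHERE, then GROUP BY, then HAVING, then ORDER BY.
busy_cities = conn.execute("""
    SELECT CITY, COUNT(*) FROM SALESMAN_MASTER
    WHERE SALAMT >= 3000
    GROUP BY CITY
    HAVING COUNT(*) > 1
    ORDER BY CITY DESC""").fetchall()
```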
5. Delete and Update:
Command Type : DML
Command Syntax : 1) DELETE FROM <TableName> WHERE <Condition>;
2) UPDATE <TableName>
SET <ColumnName1> = <Expression1>,
<ColumnName2> = <Expression2>
WHERE <Condition>;
Use : The DELETE command is used to delete records from a table.
The UPDATE command is used to modify existing records.
Example : DELETE and UPDATE on CLIENT_MASTER, PRODUCT_MASTER and SALESMAN_MASTER.
(a) UPDATE CLIENT_MASTER SET CITY = 'BANGLORE' WHERE CLIENTNO='C00005';
Output : 1 row updated.
(b) UPDATE CLIENT_MASTER SET BALDUE = 1000 WHERE CLIENTNO='C00001';
(c) UPDATE PRODUCT_MASTER SET COSTPRICE=950.00 WHERE DESCRIPTION='TROUSERS';
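The following sketch runs updates (a) and (b) and a DELETE in SQLite through Python's sqlite3. The trimmed CLIENT_MASTER columns and the 'C00005' row are hypothetical. The rowcount attribute plays the role of the "1 row updated." message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CLIENT_MASTER (CLIENTNO VARCHAR2(6), NAME VARCHAR2(20), "
    "CITY VARCHAR2(30), BALDUE NUMBER(10,2))"
)
conn.executemany("INSERT INTO CLIENT_MASTER VALUES (?, ?, ?, ?)", [
    ("C00001", "IVAN", "MUMBAI", 5000),
    ("C00005", "HANSEL", "MUMBAI", 0),  # hypothetical second client
])

# (a) UPDATE ... WHERE changes only the matching row.
updated = conn.execute(
    "UPDATE CLIENT_MASTER SET CITY = 'BANGLORE' WHERE CLIENTNO = 'C00005'"
).rowcount
# (b)
conn.execute("UPDATE CLIENT_MASTER SET BALDUE = 1000 WHERE CLIENTNO = 'C00001'")

# DELETE ... WHERE removes only the matching rows.
deleted = conn.execute("DELETE FROM CLIENT_MASTER WHERE CITY = 'BANGLORE'").rowcount
remaining = conn.execute("SELECT CLIENTNO, BALDUE FROM CLIENT_MASTER").fetchall()
```

Leaving out the WHERE clause would make either statement act on every row of the table.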
6. Alter Table Command:
Command Type : DDL
Command Syntax : 1) ALTER TABLE <TableName>
ADD (<NewColumnName> <Datatype> (<Size>),
<NewColumnName> <Datatype> (<Size>) …...);
2) ALTER TABLE <TableName>
MODIFY (<ColumnName> <New Datatype> (<New Size>),
<ColumnName> <New Datatype> (<New Size>) …...);
Use : To add new columns to a table, or to change the data type or size of existing columns.
Examples :
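The following sketch shows the ADD form in SQLite through Python's sqlite3. The TELEPHONE column is hypothetical. Following the syntax above, the Oracle form would be ALTER TABLE CLIENT_MASTER ADD (TELEPHONE NUMBER(10)). SQLite adds one column per ALTER TABLE ... ADD COLUMN statement and does not support MODIFY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CLIENT_MASTER (CLIENTNO VARCHAR2(6), NAME VARCHAR2(20))")
conn.execute("INSERT INTO CLIENT_MASTER VALUES ('C00001', 'IVAN')")

# Add a new column; existing rows get NULL in it.
conn.execute("ALTER TABLE CLIENT_MASTER ADD COLUMN TELEPHONE NUMBER(10)")

columns = [row[1] for row in conn.execute("PRAGMA table_info(CLIENT_MASTER)")]
new_value = conn.execute("SELECT TELEPHONE FROM CLIENT_MASTER").fetchone()[0]
```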

Inbuilt functions in SQL.
Basically, a function takes one or more inputs and returns one value as a result. In SQL, a
function works on one or more rows of data and returns a result.
• Character/String/Text Functions: Character functions are used to modify a CHAR or
VARCHAR column.
• Number Functions: Number functions allow you to present a number in a manner that is
useful to the reader.
• Date Functions: Dates are stored in the database as a number that contains the calendar date
information and time information. Date functions allow you to modify and compare date
data types.
• Conversion Functions: These functions are used to change data from one data type to
another.
• Aggregate or Multi-Row SQL Functions: They operate on a set of rows and return one
result or one result per group.
String/Character/Text Functions:
1. LENGTH(S):
2. UPPER(…):
3. LOWER(…):

4. INITCAP(…):
5. CONCAT(S1,S2):
6. SUBSTR(S1,B,N):
7. INSTR(S1,S2,ST,T):

8. LPAD(S1,S,C) / RPAD(S1,S,C)
9. LTRIM(S1,S2) / RTRIM(S1,S2) / TRIM( ):
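Most of these string functions have direct SQLite equivalents, which can be tried through Python's sqlite3 module. This is a sketch with SQLite standing in for Oracle. SQLite has no INITCAP, LPAD or RPAD. Its INSTR takes only two arguments. CONCAT is written here with the standard || operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
SELECT LENGTH('DATABASE'),
       UPPER('dbms'),
       LOWER('DBMS'),
       'RDB' || 'MS',            -- CONCAT(S1, S2) written with ||
       SUBSTR('DATABASE', 5, 4), -- 4 characters starting at position 5
       INSTR('DATABASE', 'BASE'),
       LTRIM('xxDBMS', 'x'),
       RTRIM('DBMSyy', 'y'),
       TRIM('  DBMS  ')
""").fetchone()
```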

Number Functions:
1. ROUND(N,D):
2. TRUNC(N,D):
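ROUND(N,D) and TRUNC(N,D) can be emulated with Python's decimal module. The function names oracle_round and oracle_trunc are hypothetical. The sketch assumes Oracle-style behaviour: ROUND sends halves away from zero, TRUNC cuts toward zero, and a negative D works to the left of the decimal point. Numbers are passed as strings to avoid binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN


def oracle_round(n, d=0):
    """ROUND(N, D): round to D decimal places, halves away from zero."""
    return Decimal(n).quantize(Decimal(1).scaleb(-d), rounding=ROUND_HALF_UP)


def oracle_trunc(n, d=0):
    """TRUNC(N, D): cut off after D decimal places, toward zero."""
    return Decimal(n).quantize(Decimal(1).scaleb(-d), rounding=ROUND_DOWN)


r1 = oracle_round("15.193", 1)   # 15.2
t1 = oracle_trunc("15.79", 1)    # 15.7
r2 = oracle_round("15.193", -1)  # 20 (rounded to tens)
t2 = oracle_trunc("15.79", -1)   # 10
```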
Date Functions:
1. Months_Between(st,ed):

2. Next_day(d,day_of_week):
3. Round(d, format):

4. Trunc(d, format):
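Two of these date functions can be emulated in Python for plain dates without a time part. `months_between` and `next_day` are hypothetical names, and the rules follow Oracle's documented behaviour. MONTHS_BETWEEN returns a whole number when both dates fall on the same day of the month or both are month-ends. Otherwise it counts the leftover days in 31-day months. NEXT_DAY returns the first date strictly after d that falls on the given weekday.

```python
import calendar
from datetime import date, timedelta


def _is_last_day(d):
    return d.day == calendar.monthrange(d.year, d.month)[1]


def months_between(d1, d2):
    """Sketch of MONTHS_BETWEEN(d1, d2); positive when d1 is later."""
    whole = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    if d1.day == d2.day or (_is_last_day(d1) and _is_last_day(d2)):
        return whole
    return whole + (d1.day - d2.day) / 31


WEEKDAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]


def next_day(d, day_of_week):
    """Sketch of NEXT_DAY(d, 'FRIDAY'): first matching weekday after d."""
    target = WEEKDAYS.index(day_of_week.upper())
    ahead = (target - d.weekday() - 1) % 7 + 1  # 1..7 days ahead
    return d + timedelta(days=ahead)
```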
Conversion Functions:
1. To_Char(date, format) / To_Char(num, format):

2. To_Date(text, format):
3. To_Number(text, format):
Aggregate Functions:
1. Count( ):
2. Sum( ):
3. Avg( ):
4. Max( ):
5. Min( ):
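The five aggregate functions can be run in SQLite through Python's sqlite3, with SQLite standing in for Oracle. The SALESMAN_MASTER rows are hypothetical. One salary is NULL to show that aggregates over a column skip NULLs, while COUNT(*) counts rows. With GROUP BY, the result has one row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALESMAN_MASTER (SALESMANNAME VARCHAR2(20), CITY VARCHAR2(20), SALAMT NUMBER(8,2))"
)
conn.executemany("INSERT INTO SALESMAN_MASTER VALUES (?, ?, ?)", [
    ("AMAN", "MUMBAI", 3000), ("OMKAR", "PUNE", 3000), ("RAJ", "MUMBAI", 3000),
    ("KIRAN", "PUNE", 4000), ("SONIA", "DELHI", None),
])

# Over the whole table: a single result row.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(SALAMT), SUM(SALAMT), AVG(SALAMT), MAX(SALAMT), MIN(SALAMT) "
    "FROM SALESMAN_MASTER"
).fetchone()

# With GROUP BY: one result row per city (DELHI has only a NULL salary).
per_city = conn.execute(
    "SELECT CITY, SUM(SALAMT) FROM SALESMAN_MASTER GROUP BY CITY ORDER BY CITY"
).fetchall()
```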


