Introduction:
What is Data?
In simple words, data can be facts related to any object under consideration. For example, your name,
age, height, and weight are data related to you. A picture, image, file, or PDF can also be
considered data.
What is a Database?
A database is a systematic collection of data. Databases support the storage and manipulation of data
and make data management easy. Let's discuss a few examples.
An online telephone directory would use a database to store data pertaining to people,
phone numbers, and other contact details.
Your electricity service provider uses a database to manage billing, client-related
issues, fault data, and so on.
Consider Facebook as well. It needs to store, manipulate, and present data related to
members, their friends, member activities, messages, advertisements, and a lot more.
A Database Management System (DBMS) is a collection of programs that enables its users to
access a database, manipulate data, and report on or represent data.
It also helps to control access to the database.
Database Management Systems are not a new concept; they were first implemented in the
1960s.
Charles Bachman's Integrated Data Store (IDS) is said to be the first DBMS in history.
Over time, database technologies have evolved considerably, while the usage and expected functionality of
databases have increased immensely.
Types of DBMS
Let's see how the DBMS family evolved over time. The following diagram shows the
evolution of DBMS categories.
There are 4 major types of DBMS. Let's look into them in detail.
✓ Hierarchical - this type of DBMS employs the "parent-child" relationship of storing data.
This type of DBMS is rarely used nowadays. Its structure is like a tree with nodes
representing records and branches representing fields. The Windows Registry used in
Windows XP is an example of a hierarchical database. Configuration settings are stored
as tree structures with nodes.
✓ Network DBMS - this type of DBMS supports many-to-many relationships. This usually
results in complex database structures. RDM Server is an example of a database
management system that implements the network model.
✓ Relational DBMS - this type of DBMS defines database relationships in the form of tables,
also known as relations. Unlike a network DBMS, an RDBMS does not represent many-to-many
relationships directly; they are modelled through an intermediate table. Relational DBMSs
usually have pre-defined data types that they can
support. This is the most popular DBMS type in the market. Examples of relational
database management systems include MySQL, Oracle, and Microsoft SQL Server.
✓ Object-Oriented Relational DBMS - this type supports storage of new data types. The data
to be stored is in the form of objects. The objects to be stored in the database have attributes
(e.g. gender, age) and methods that define what to do with the data. PostgreSQL is an
example of an object-oriented relational DBMS.
Database Architecture
1 tier Architecture
The simplest database architecture is 1-tier, where the client, server, and database all reside
on the same machine. Any time you install a DB on your system and access it to practise SQL
queries, you are using a 1-tier architecture. Such an architecture is rarely used in production.
A 2-tier DBMS architecture includes an application layer between the user and the DBMS, which
is responsible for communicating the user's request to the database management system and then
sending the response from the DBMS back to the user.
An application interface known as ODBC (Open Database Connectivity) provides an API that
allows client-side programs to call the DBMS. Most DBMS vendors provide ODBC drivers for
their DBMS.
3-tier DBMS Architecture
3-tier DBMS architecture is the most commonly used architecture for web applications.
It is an extension of the 2-tier architecture. In the 2-tier architecture, we have an application layer
which can be accessed programmatically to perform various operations on the DBMS. The
application generally understands the Database Access Language and processes end users'
requests to the DBMS.
In the 3-tier architecture, an additional presentation or GUI layer is added, which provides a
graphical user interface for the end user to interact with the DBMS.
For the end user, the GUI layer is the database system; the end user has no knowledge of the
application layer or the DBMS.
If you have used MySQL, you have probably seen phpMyAdmin, which is a good example of a
3-tier DBMS architecture.
Types of Data Models
Data models are collections of concepts used to describe the structure of a database. The types of data
models are:
In the hierarchical model, a database record is a tree that consists of one or more groupings of fields called
segments, which make up the individual nodes of the tree. This model uses one-to-many
relationships.
Advantage: Data access is quite predictable in structure, so retrieval and updates can be
highly optimized by a DBMS.
Disadvantage: The links are permanently established and cannot be modified, which makes this
model rigid.
The network database model was developed as an alternative to the hierarchical database. This
model expands on the hierarchical model by providing multiple paths among segments, i.e. more
than one parent-child relationship. Hence this model allows one-to-one, one-to-many, and
many-to-many relationships.
Although supporting multiple paths in the data structure eliminates some of the drawbacks of the
hierarchical model, the network model is not very practical.
The key difference between the previous database models and the relational database model is
flexibility.
A relational database represents all data in the database as simple two-dimensional tables called
relations. Each row of a relational table, called a tuple, represents a data entity, with the columns of the
table representing attributes (fields). The allowable values for an attribute are called
its domain.
Each row in a relational table must have a unique primary key. A table may also have secondary (foreign) keys,
which correspond to primary keys in other tables.
Advantage: Provides flexibility that allows changes to the database structure to be easily
accommodated. It facilitates multiple views of the same database for different users.
For example, the COLLEGE table has Batch_Year as its primary key and has secondary keys
Student_ID and Course_ID; these keys serve as primary keys for the STUDENT and COURSE
tables.
Student Table
Student_ID   Student_Name
101          Shubham
102          Rajat
Course Table
Course_ID    Course_Name
14           Java
16           Android
College Table
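The STUDENT, COURSE, and COLLEGE tables described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column types, the use of SQLite, and the single COLLEGE row (batch year 2019) are assumptions, since the source gives only the key names and the STUDENT and COURSE rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT NOT NULL
);
CREATE TABLE course (
    course_id   INTEGER PRIMARY KEY,
    course_name TEXT NOT NULL
);
CREATE TABLE college (
    batch_year INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id)
);
""")

conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(101, "Shubham"), (102, "Rajat")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(14, "Java"), (16, "Android")])
# Hypothetical COLLEGE row, invented for this sketch.
conn.execute("INSERT INTO college VALUES (?, ?, ?)", (2019, 101, 14))

# Follow the secondary keys back to the tables where they are primary keys.
rows = conn.execute("""
    SELECT c.batch_year, s.student_name, k.course_name
    FROM college AS c
    JOIN student AS s ON s.student_id = c.student_id
    JOIN course  AS k ON k.course_id  = c.course_id
""").fetchall()
print(rows)  # [(2019, 'Shubham', 'Java')]

# A COLLEGE row that points at a non-existent student is refused.
try:
    conn.execute("INSERT INTO college VALUES (?, ?, ?)", (2020, 999, 14))
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```

The join shows how a secondary key in one table is resolved through the primary key of another, and the rejected insert shows the DBMS keeping the two in step.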
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relations among them are associated.
It formulates all the constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. It’s the database
designers who design the schema to help programmers understand the database and make it
useful.
Database Instance
• It is important to distinguish these two terms. A database schema is the
skeleton of the database. It is designed before the database exists. Once the
database is operational, it is very difficult to make any changes to the schema. A database schema
does not contain any data or information.
• A database instance is the state of an operational database, with its data, at a given time. It
is a snapshot of the database. Database instances tend to change with time. A
DBMS ensures that every instance (state) is valid by diligently enforcing all
the validations, constraints, and conditions that the database designers have imposed.
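The difference between a schema and an instance can be made concrete with a small sketch. It assumes SQLite (through Python's sqlite3 module) and a hypothetical student table; the helper names schema and instance are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

def schema(connection):
    # The schema: table definitions only, no data.
    return [row[0] for row in connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'")]

def instance(connection):
    # The instance: whatever rows are stored at this moment.
    return connection.execute(
        "SELECT sid, name FROM student ORDER BY sid").fetchall()

schema_before = schema(conn)
instance_before = instance(conn)

conn.execute("INSERT INTO student VALUES (101, 'Shubham')")
conn.execute("INSERT INTO student VALUES (102, 'Rajat')")

schema_after = schema(conn)
instance_after = instance(conn)

print(schema_before == schema_after)    # True: the schema did not change
print(instance_before, instance_after)  # [] [(101, 'Shubham'), (102, 'Rajat')]

# The NOT NULL constraint from the schema keeps every instance valid.
try:
    conn.execute("INSERT INTO student VALUES (103, NULL)")
    invalid_row_rejected = False
except sqlite3.IntegrityError:
    invalid_row_rejected = True
print(invalid_row_rejected)  # True
```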
Database Languages
A database system provides a Data Definition Language to specify the database schema and
a Data Manipulation Language to express database queries and updates.
Data-Definition Language
Data-Manipulation Language
A data manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model.
DML commands are typically not auto-committed: their changes do not become permanent until they are
committed, and until then they can be rolled back.
Data manipulation is retrieval of information, insertion of new information, deletion of
information or modification of information stored in the database.
Examples of DML are:
SELECT : retrieves data from the database
INSERT : inserts data into a table
UPDATE : updates existing data in a table
DELETE : deletes records from a table (all records if no condition is given)
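The four DML commands listed above can be tried out in a short sketch. It assumes SQLite through Python's sqlite3 module; the course table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, course_name TEXT)"
)

# INSERT: add new rows.
conn.execute("INSERT INTO course VALUES (14, 'Java')")
conn.execute("INSERT INTO course VALUES (16, 'Android')")

# UPDATE: modify an existing row.
conn.execute(
    "UPDATE course SET course_name = 'Android Development' WHERE course_id = 16"
)

# DELETE with a condition removes only the matching rows.
conn.execute("DELETE FROM course WHERE course_id = 14")

# SELECT: retrieve what is left.
remaining = conn.execute(
    "SELECT course_id, course_name FROM course").fetchall()
print(remaining)  # [(16, 'Android Development')]

# DELETE without a condition removes every row in the table.
conn.execute("DELETE FROM course")
count_after_delete_all = conn.execute(
    "SELECT COUNT(*) FROM course").fetchone()[0]
print(count_after_delete_all)  # 0
```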
There are two types of DML:
Procedural DML : requires a user to specify what data are needed and how to get those data.
Non-Procedural DML : requires a user to specify what data are needed without specifying how
to get those data.
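The contrast between the two styles can be illustrated as follows. This is only an analogy: SQL stands in for a non-procedural DML, and a hand-written Python loop stands in for a procedural one. The student table, the ages, and the name Asha are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Shubham", 21), (102, "Rajat", 19), (103, "Asha", 23)],
)

# Non-procedural: state WHAT is wanted; the DBMS decides how to find it.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM student WHERE age > 20 ORDER BY name")]

# Procedural style: spell out HOW, record by record
# (scan every row, test it, collect it, then sort).
procedural = []
for sid, name, age in conn.execute("SELECT sid, name, age FROM student"):
    if age > 20:
        procedural.append(name)
procedural.sort()

print(declarative)  # ['Asha', 'Shubham']
print(procedural)   # ['Asha', 'Shubham']
```

Both give the same answer; the difference is who decides on the access path.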
Transaction Control Language (TCL) commands manage the changes made by other commands and their effect on the database.
These commands can annul changes made by other commands by rolling back to the original state. They
can also make changes permanent.
Examples of TCL are:
COMMIT : permanently saves the changes
ROLLBACK : undoes the changes
SAVEPOINT : marks a point to which a transaction can later be rolled back
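The three TCL commands can be seen working together in a short sketch. It assumes SQLite through Python's sqlite3 module, and the account table and balances are hypothetical. The connection is opened with isolation_level=None so that the module issues no implicit BEGIN or COMMIT and every transaction statement is explicit.

```python
import sqlite3

# isolation_level=None: no implicit transaction handling by the module.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")

def balance(connection):
    return connection.execute(
        "SELECT balance FROM account WHERE id = 1").fetchone()[0]

# COMMIT makes the insert permanent.
conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("COMMIT")

conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 150 WHERE id = 1")
conn.execute("SAVEPOINT before_second_update")
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")

# Undo only the work done after the savepoint.
conn.execute("ROLLBACK TO SAVEPOINT before_second_update")
balance_after_savepoint_rollback = balance(conn)
print(balance_after_savepoint_rollback)  # 150

# Undo the whole uncommitted transaction.
conn.execute("ROLLBACK")
balance_after_full_rollback = balance(conn)
print(balance_after_full_rollback)  # 100
```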
DBMS | Interfaces
A database management system (DBMS) interface is a user interface that makes it possible to input
queries to a database without using the query language itself. User-friendly interfaces provided by a DBMS
may include the following:
A DBMS acts as an interface between the user and the database. DBMSs are very large and are typically
divided into modules:
DDL Compiler –
Converts DDL statements into a set of tables containing metadata stored in a data
dictionary. Metadata can include the names of files, data items, storage details of each
file, mapping information, constraints, etc.
DML Compiler and Query Optimizer - The DML compiler translates Data Manipulation
Language statements into query engine instructions. It may also optimize the query.
The query processor/optimizer translates statements in a query language into low-level instructions
that the database manager understands, and it tries to find an equivalent but more efficient form.
Data Manager –
• The data manager is the central software component of the DBMS. It is sometimes
referred to as the database control system.
One of the functions of the data manager is to convert operations in the user's queries,
coming directly via the query processor or indirectly via an application program, from the
user's logical view to a physical file system.
• The data manager is responsible for interfacing with the file system, as shown. In addition,
the tasks of enforcing constraints to maintain the consistency and integrity of the data, as
well as its security, are also performed by the data manager.
It is also the responsibility of the data manager to synchronize the
simultaneous operations performed by concurrent users and to maintain the backup and
recovery operations.
Data Dictionary –
The data dictionary is a repository of descriptions of the data in the database. A data dictionary contains
a list of all files in the database, the number of records in each file, and the names and types of
each field. Most database management systems keep the data dictionary hidden from users to
prevent them from accidentally destroying its contents.
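As a rough illustration, the kind of information described above (the list of tables, the number of records in each, and the names and types of the fields) can be read from SQLite's catalog. The sketch assumes SQLite through Python's sqlite3 module; the describe helper and the sample tables are hypothetical, and other DBMSs expose their data dictionary differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT)"
)
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, course_name TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(101, "Shubham"), (102, "Rajat")])

def describe(connection):
    """Build a small data-dictionary view from SQLite's own catalog."""
    dictionary = {}
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [(col[1], col[2]) for col in connection.execute(
            f'PRAGMA table_info("{table}")')]
        rows = connection.execute(
            f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        dictionary[table] = {"columns": columns, "rows": rows}
    return dictionary

data_dictionary = describe(conn)
for table, info in data_dictionary.items():
    print(table, info)
```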
Functions of the Data Dictionary -
1. Defines the data elements.
2. Helps in scheduling.
3. Helps in control.
4. Lets the various users know which data is available and how it can be obtained.
5. Helps in identifying irregularities in organizational data.
6. Acts as an essential data management tool.
7. Provides a good standardization mechanism.
8. Acts as the corporate glossary of the ever-growing information resource.
9. Provides the report facility, the control facility, and the excerpt facility.
Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
End Users - End Users are the people who interact with the database through applications or
utilities. The various categories of end users are:
1. Casual End Users - These users access the database occasionally but may need different
information each time. They use a sophisticated database query language to specify their requests.
For example: high-level managers who access the data weekly or biweekly.
2. Naive End Users - These users frequently query and update the database using standard
types of queries. The operations that can be performed by this class of users are very limited and
affect a precise portion of the database. For example: reservation clerks for airlines/hotels check
availability for a given request and make reservations. People using Automated Teller
Machines (ATMs) also fall under this category, as they have access to a limited portion of the database.
3. Standalone End Users/On-line End Users - End users who interact with the database
directly via an on-line terminal or indirectly through menu- or graphics-based interfaces. Example:
Library Management System.
ER Model Concepts
ER Diagram Notations:
The ER model defines the conceptual view of a database. It works around real-world entities and
the associations among them. At view level, the ER model is considered a good option for
designing databases.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they
represent.
Attributes
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every
ellipse represents one attribute and is directly connected to its entity (rectangle).
If the attributes are composite, they are further divided in a tree like structure. Every node is then
connected to its attribute. That is, composite attributes are represented by ellipses that are
connected with an ellipse.
For example, the roll_number of a student makes him/her identifiable among students.
• Super Key − A set of attributes (one or more) that collectively identifies an entity in an
entity set.
• Candidate Key − A minimal super key is called a candidate key. An entity set may have
more than one candidate key.
• Primary Key − A primary key is one of the candidate keys chosen by the database
designer to uniquely identify the entity set.
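The three definitions above can be checked mechanically on a small sample. The sketch below is an illustration only: the rows and the email attribute are hypothetical, and a real key is a property of every valid state of the entity set, not of one sample, so a test like this can rule a key out but cannot prove one.

```python
from itertools import combinations

# Hypothetical sample of a student entity set.
rows = [
    {"roll_number": 1, "email": "a@example.com", "name": "Shubham"},
    {"roll_number": 2, "email": "b@example.com", "name": "Rajat"},
    {"roll_number": 3, "email": "c@example.com", "name": "Rajat"},
]
attributes = ["roll_number", "email", "name"]

def is_super_key(attrs, data):
    # A super key gives every row a distinct combination of values.
    seen = {tuple(row[a] for a in attrs) for row in data}
    return len(seen) == len(data)

super_keys = [
    set(combo)
    for size in range(1, len(attributes) + 1)
    for combo in combinations(attributes, size)
    if is_super_key(combo, rows)
]

# A candidate key is a minimal super key: no proper subset is a super key.
candidate_keys = [
    key for key in super_keys
    if not any(other < key for other in super_keys)
]

# The primary key is the candidate key the designer chooses.
primary_key = {"roll_number"}

print(len(super_keys))  # 6
print(candidate_keys)   # the two minimal keys: roll_number and email
```

Here name alone is not a super key because two students share it, while every set containing roll_number or email is.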
Relationship
The association among entities is called a relationship. For example, an employee works_at a
department, a student enrolls in a course. Here, Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship too
can have attributes. These attributes are called descriptive attributes.
Participation Constraints
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n
• One-to-many − One entity from entity set A can be associated with more than one
entity of entity set B; however, an entity from entity set B can be associated with at most
one entity of entity set A.
• Many-to-one − More than one entity from entity set A can be associated with at most
one entity of entity set B; however, an entity from entity set B can be associated with
more than one entity from entity set A.
• Many-to-many − One entity from A can be associated with more than one entity from B
and vice versa.
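The definitions above can be turned into a small check over the (A, B) pairs of a relationship set. This is a sketch under an assumption worth stating: it classifies only the pairs it is given, whereas the cardinality of a relationship is a design rule about what is allowed. The function name and the sample identifiers are hypothetical.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a list of (a, b) pairs by how many partners each side has."""
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)

    # Does some A entity have several B partners, and the other way round?
    a_has_many = any(len(partners) > 1 for partners in a_to_b.values())
    b_has_many = any(len(partners) > 1 for partners in b_to_a.values())

    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

print(mapping_cardinality([("d1", "e1"), ("d1", "e2")]))                # one-to-many
print(mapping_cardinality([("e1", "d1"), ("e2", "d1")]))                # many-to-one
print(mapping_cardinality([("s1", "c1"), ("s1", "c2"), ("s2", "c1")]))  # many-to-many
print(mapping_cardinality([("p1", "passport1")]))                       # one-to-one
```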
1. Super Class
• A super class is an entity type that has a relationship with one or more subtypes.
• An entity cannot exist in the database merely by being a member of a super class.
For example: the Shape super class has the sub groups Square, Circle, and Triangle.
2. Sub Class
• A sub class is a group of entities with unique attributes.
• A sub class inherits properties and attributes from its super class.
For example: Square, Circle, and Triangle are sub classes of the Shape super class.
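The Shape example maps loosely onto class inheritance in a programming language. The sketch below is an analogy rather than a database design: the attributes side and radius and the methods area and describe are assumptions added for illustration.

```python
import math

class Shape:
    """Super class: holds what every shape has in common."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("each sub class supplies its own area")

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"

class Square(Shape):
    """Sub class: inherits name and describe, adds its own attribute."""

    def __init__(self, side):
        super().__init__("Square")
        self.side = side

    def area(self):
        return self.side ** 2

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("Circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

shapes = [Square(2), Circle(1)]
descriptions = [shape.describe() for shape in shapes]
print(descriptions)  # ['Square with area 4.00', 'Circle with area 3.14']
```

Each sub class inherits name and describe from Shape and adds the attribute that makes it distinct.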
1. Generalization
Generalization is the process of forming a general entity that contains the properties common to all the
generalized entities.
It is a bottom-up approach, in which two or more lower-level entities combine to form a higher-level entity.
Generalization is the reverse process of specialization.
It defines a general entity type from a set of specialized entity types.
It minimizes the difference between the entities by identifying the common features.
For example:
In the above example, Tiger, Lion, and Elephant can all be generalized as Animals.
2. Specialization
Specialization is a process that divides a group of entities into sub groups based on
their characteristics.
It is a top-down approach, in which one higher-level entity can be broken down into two or more lower-level
entities.
It maximizes the difference between the members of an entity by identifying the unique
characteristics or attributes of each member.
It defines one or more sub classes for the super class and also forms the superclass/subclass
relationship.
For example:
In the above example, Employee can be specialized as Developer or Tester, based on what role
they play in an organization.
C. Category or Union
A category represents a single sub class that has a relationship with more than one super class.
It can have total or partial participation.
For example, in car booking, a car owner can be a person, a bank (which holds possession of a car), or a
company. The category (sub class) Owner is a subset of the union of the three super classes
Company, Bank, and Person. A category member must exist in at least one of its super classes.
D. Aggregation
Aggregation is a process that represents a relationship between a whole object and its component
parts.
It abstracts a relationship between objects by viewing the relationship as an object.
It is a process in which two entities are treated as a single entity.
In the above example, the relation between College and Course acts as an entity in a relation
with Student.
For example, a student can be enrolled in only one course, but a course can have many
students enrolled.
For Student (SID, Name), SID is the primary key. For Course (CID, C_name), CID is the
primary key.
Student            Course              Enroll
(SID, Name)        (CID, C_name)       (SID, CID)
--------------     -----------------   ---------------
Now the question is: what should the primary key for Enroll be, SID, CID, or the two combined? We
can't use CID as the primary key because, as you can see in Enroll, the same CID appears with multiple SIDs.
(SID, CID) identifies each row uniquely, but it is not minimal. So SID is the primary key for
the relation Enroll.
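The reasoning above can be checked with a small sketch, assuming SQLite through Python's sqlite3 module. The rows are hypothetical, since the original table contents are not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SID alone is the primary key: each student appears at most once in Enroll.
conn.execute(
    "CREATE TABLE enroll (sid INTEGER PRIMARY KEY, cid INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO enroll VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 20)])

# The same CID occurs with several SIDs, so CID cannot be the key.
students_in_course_10 = conn.execute(
    "SELECT COUNT(*) FROM enroll WHERE cid = 10").fetchone()[0]
print(students_in_course_10)  # 2

# A second course for student 1 breaks the one-course-per-student rule,
# and the primary key on SID rejects it.
try:
    conn.execute("INSERT INTO enroll VALUES (1, 20)")
    second_enrollment_rejected = False
except sqlite3.IntegrityError:
    second_enrollment_rejected = True
print(second_enrollment_rejected)  # True
```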
Degrees of Relationship (Cardinality)
The degree of relationship (also known as cardinality) is the number of occurrences in one entity
that are associated (or linked) with the number of occurrences in another.
There are four degrees of relationship, known as:
• one-to-one (1:1)
• one-to-many (1:N)
• many-to-one (N:1)
• many-to-many (M:N)
• Note that the last one is written M:N, not M:M.
Binary Relationship and Cardinality
A relationship in which two entities participate is called a binary relationship. Cardinality is
the number of instances of an entity that can be associated with the relationship.
One-to-one − When only one instance of each entity is associated with the relationship, it is
marked as '1:1'. The following image reflects that only one instance of each entity should be
associated with the relationship. It depicts a one-to-one relationship.
One-to-many − When more than one instance of an entity is associated with a relationship, it is
marked as '1:N'. The following image reflects that only one instance of the entity on the left and
more than one instance of the entity on the right can be associated with the relationship. It depicts
a one-to-many relationship.
Many-to-one − When more than one instance of an entity is associated with the relationship, it is
marked as 'N:1'. The following image reflects that more than one instance of the entity on the left
and only one instance of the entity on the right can be associated with the relationship. It depicts
a many-to-one relationship.
Many-to-many − The following image reflects that more than one instance of the entity on the
left and more than one instance of the entity on the right can be associated with the relationship. It
depicts a many-to-many relationship.