
DBMS Basics

Dr. Rajesh Chauhan


Database Management System
(DBMS)
• Collection of interrelated data
• Set of programs to access the data
• DBMS contains information about a particular enterprise
• DBMS provides an environment that is both convenient and
efficient to use.
• Database Applications:
– Banking: all transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax deductions
Traditional Database System
• In the early days, database applications were
built on top of file systems
• Drawbacks of using file systems to store data:
– Data redundancy and inconsistency
• Multiple file formats, duplication of information in
different files
– Difficulty in accessing data
• Need to write a new program to carry out each new
task
– Data isolation — multiple files and formats
– Integrity problems
• Integrity constraints (e.g. account balance > 0) become
part of program code
• Hard to add new constraints or change existing ones
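A minimal sketch of the alternative a DBMS offers: the constraint is declared once in the schema instead of being repeated in every program. The `account` table, its columns, and the `try_insert` helper are hypothetical, and SQLite (through Python's `sqlite3` module) stands in for a full DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance > 0" lives in the schema, not in application code.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance > 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100.0)")


def try_insert(account_id, balance):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO account (id, balance) VALUES (?, ?)",
            (account_id, balance),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(2, 50.0)    # satisfies the constraint
rejected = try_insert(3, -10.0)   # violates balance > 0
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Changing the rule then means altering one table definition rather than hunting through every program that writes balances.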
Traditional Database Systems (Cont.)
• Drawbacks of using file systems (cont.)
– Atomicity of updates
• Failures may leave database in an inconsistent state with partial
updates carried out
• E.g. transfer of funds from one account to another should either
complete or not happen at all
– Concurrent access by multiple users
• Concurrent access is needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
– E.g. two people reading a balance and updating it at the same time
– Security problems
• Database systems offer solutions to all the above
problems
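The funds-transfer case can be sketched as follows, assuming SQLite via Python's `sqlite3` module as a stand-in DBMS. The `account` table and the `transfer` and `balances` helpers are hypothetical. The credit is deliberately applied before the debit so that a failure leaves a partial update for the DBMS to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


failed = transfer(1, 2, 500.0)   # the debit would make the balance negative
after_failure = balances()       # the credit to account 2 has been undone
succeeded = transfer(1, 2, 30.0)
after_success = balances()
```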
DBMS Standardization (Reference
Model)
• Reference model for DBMS can be described
according to three approaches
– Based on components
• The system is seen as a set of interrelated
components, each with specific functionality.
– Based on functions
• Different classes of users are identified
• Different functions are defined for each class of users
• Resultant system will be hierarchical based on the classes of
users
– Based on data (most commonly used)
• Different types of data are identified
• Functional units are defined that will use data according to
different views.
Know Your Data (Present Scenario)
Traditional databases are used for
• Data storage
• Data management
• Ad-hoc Queries
• Timely Information retrieval
• Report writing & Decision making
• Security and privacy
Know Your Data (Present Scenario)
Modern databases are used for
• All tasks of traditional databases
• Development of training models
• Machine learning
• Predictive analysis
• Exploratory data analysis
• Descriptive data analysis
Know Your Data (Present Scenario)
• Storage and management of unstructured
data
• Knowledge discovery in unstructured
data
• Managing heterogeneous data
resources
• Distributed data analysis.
Know Your Data
▪ Types of data
▪ Structured
▪ Structured data refers to information with a high
degree of organization.
▪ Semi-structured
▪ Has no formal structure as in an RDBMS but
still carries some structuring information,
such as in XML
▪ Unstructured
▪ Data has no organized structure. The
lack of structure makes compilation a time- and
energy-consuming task.
Know your Data

Qualitative / Categorical data
– Nominal (e.g., Name)
– Ordinal (e.g., Performance)
Quantitative data
– Discrete (e.g., Grade)
– Continuous (e.g., Age)


Three dimensions of distribution
• Distribution (0, 1, 2): No, Partial, Full
• Heterogeneity (0, 1): Homogeneous, Heterogeneous
• Autonomy (0, 1, 2): No, Partial, Full
(Figure labels: multi-database, i.e., multiple databases with any engine and any format; database with different engines)
Three dimensions of distribution
• Based on autonomy (A0: tight integration, A1:
semi-autonomous, A2: total isolation)
– Refers to the distribution of control in a DBMS.
– It is the degree to which individual DBMSs can be
controlled independently
– Autonomy dimensions are
• Design autonomy
• Communication autonomy
• Execution autonomy
Three dimensions of distribution
• Based on distribution (D0: no distribution, D1:
client/server, D2: peer-to-peer)
– This dimension deals with data
– Has two classes
• Client/server
• Peer-to-peer

• Based on heterogeneity (H0: homogeneous, H1: heterogeneous)
– Heterogeneity in network protocols
– Heterogeneity in hardware
– Heterogeneity in engines
Architecture alternatives based on
dimensions
A: Autonomy, D: Distribution, H: Heterogeneity
• A0,D0,H0
– Generic name is composite system
– No Autonomy, no Distribution, no Heterogeneity
• A0,D0,H1
– Multiple data managers that provide an integrated view to
the users
• A0,D1,H0
– The client-server model is a typical example of this type
• A0,D2,H0
– Fully distributed
– Identical functionality at each site
Architecture alternatives based on
dimensions
• A1,D0,H0
– Common term is federated databases
• A1,D0,H1
– Heterogeneous federated databases
• A1,D1,H1
– Distributed heterogeneous federated databases
• A2,D0,H0
– Multi-database system architecture
• A2,D0,H1
– Data in multi-storage system each with different characteristics
• A2,D1,H1 and A2,D2,H1
– Distributed multi-database systems
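The (A, D, H) classification can be read as a lookup over three small integer codes. The sketch below records only the combinations named or described in these slides. The `Architecture` type and the `label` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Architecture(NamedTuple):
    autonomy: int       # 0 = tight integration, 1 = semi-autonomous, 2 = total isolation
    distribution: int   # 0 = none, 1 = client/server, 2 = peer-to-peer
    heterogeneity: int  # 0 = homogeneous, 1 = heterogeneous


# Names and descriptions taken from the slides; other combinations are left out.
NAMES = {
    Architecture(0, 0, 0): "composite system",
    Architecture(0, 0, 1): "multiple data managers with an integrated view",
    Architecture(0, 1, 0): "client-server",
    Architecture(0, 2, 0): "fully distributed",
    Architecture(1, 0, 0): "federated databases",
    Architecture(1, 0, 1): "heterogeneous federated databases",
    Architecture(1, 1, 1): "distributed heterogeneous federated databases",
    Architecture(2, 0, 0): "multi-database system",
    Architecture(2, 0, 1): "multi-storage system",
    Architecture(2, 1, 1): "distributed multi-database system",
    Architecture(2, 2, 1): "distributed multi-database system",
}


def label(a, d, h):
    """Return 'A?,D?,H?' followed by the slide's name for that combination, if any."""
    name = NAMES.get(Architecture(a, d, h), "not named in the slides")
    return f"A{a},D{d},H{h}: {name}"
```

For example, `label(1, 0, 0)` gives `"A1,D0,H0: federated databases"`. Of the 3 x 3 x 2 = 18 possible combinations, the slides cover 11.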
Overall structure of Database
Levels of Abstraction
• Physical level describes how a record (e.g., customer)
is stored.
• Logical level: describes data stored in database, and
the relationships among the data.
type customer = record
    name : string;
    street : string;
    city : integer;
end;
• View level: application programs hide details of data
types. Views can also hide information (e.g., salary) for
security purposes.
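A minimal sketch of the view level hiding a column, assuming SQLite via Python's `sqlite3` module. The `employee` table, the `employee_public` view, and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full record, including the sensitive salary column.
conn.execute("CREATE TABLE employee (name TEXT, city TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES ('Asha', 'Pune', 52000.0)")
# View level: expose only the non-sensitive columns.
conn.execute("CREATE VIEW employee_public AS SELECT name, city FROM employee")

cursor = conn.execute("SELECT * FROM employee_public")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```

SQLite has no user accounts, so here the hiding is by convention only. In a multi-user DBMS, access would typically be granted on the view and not on the underlying table.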
View of Data

(Figure: levels of data abstraction; the logical level is also known as the conceptual level.)
Levels of Data
Types of users
• Naïve users
– Interact with the system by invoking one of application
programmes
• Application programmers
– They are the computer professionals who write applications using
the database. They have limited access to the database.
• Sophisticated users
– They interact with the system directly using SQL. This class includes
database administrators, who play an active role in every kind of
database task
• Specialized users
– These are users who write specialised applications for different
databases
Role of Database Administrator
• Schema Definition
• To change Physical Organization of data
• Granting authorization
• Routine maintenance
– Backup
– Upgrading disk space
– Job monitoring (expensive tasks should not
hamper the overall performance)
Instances and Schemas
• Schema – the logical structure of the database
– e.g., the database consists of information about a set
of customers and accounts and the relationship
between them
– Physical schema: database design at the physical
level
– Logical schema: database design at the logical level
• Instance – the actual content of the database at a
particular point in time
Data Independence
Data Independence is defined as a property of a
DBMS that lets you change the database
schema at one level of a database system
without having to change the schema at
the next higher level.
Importance of Data Independence
• Helps in improving the quality of the data
• Database maintenance becomes affordable
• Enforcement of standards
• Improved database security
• Avoids repetitive alteration of data structures in application
programs
• Permits developers to focus on the general structure of the
database rather than worrying about the internal
implementation
• Database incongruity or incompatibility is vastly reduced
• Modifications at the physical level become easier
Physical Independence
• Physical Data Independence – the ability to modify
the physical schema without changing the logical
schema
– With Physical independence, one can easily change
the physical storage structures or devices without an
effect on the conceptual schema.
– Any change done would be absorbed by the mapping
between the conceptual and internal levels.
– Applications that depend on the logical schema do not need to change
– In general, the interfaces (mapping) between the various
levels and components should be well defined so that
changes in some parts do not seriously influence others.
Physical Data Independence
Example
• Using a new storage device such as a hard drive or magnetic
tape
• Modifying the file organization technique in the Database
• Switching to different data structures.
• Changing the access method.
• Modifying indexes.
• Changes to compression techniques or hashing algorithms.
• Changing the location of the database, say from the C drive to the D drive
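One of the physical changes listed above, modifying indexes, can be sketched as follows, assuming SQLite via Python's `sqlite3` module. The `customer` table, the index name, and the sample rows are hypothetical. The point is that the query text and its result are the same before and after the physical change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("Ravi", "Delhi"), ("Meena", "Pune")],
)

# The application's query, written against the logical schema only.
QUERY = "SELECT name FROM customer WHERE city = ? ORDER BY name"

before = conn.execute(QUERY, ("Pune",)).fetchall()
# Physical change: add an index. The logical schema and the query are untouched.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
after = conn.execute(QUERY, ("Pune",)).fetchall()
```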
Logical Data Independence
• Logical Data Independence
• the ability to modify the logical schema without
changing the external views, external APIs, or
related application programs
• any change made is absorbed by the mapping between
the external and conceptual levels
• examples are
• Adding, modifying, or deleting an attribute, entity, or relationship
without rewriting existing application programs
• Merging two records into one
• Breaking an existing record into two or more records
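The last example, breaking a record into two, can be sketched with a view acting as the external level. This assumes SQLite via Python's `sqlite3` module. All table, view, and column names are hypothetical. Only the view definition (the external-to-conceptual mapping) is rewritten, and the application's query stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'MG Road', 'Pune')")

# The application only ever queries the external view.
APP_QUERY = "SELECT name, city FROM customer_view WHERE id = 1"
conn.execute("CREATE VIEW customer_view AS SELECT id, name, street, city FROM customer")
before = conn.execute(APP_QUERY).fetchall()

# Logical change: break the customer record into two tables.
conn.execute("CREATE TABLE customer_core (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE customer_address (id INTEGER PRIMARY KEY, street TEXT, city TEXT)"
)
conn.execute("INSERT INTO customer_core SELECT id, name FROM customer")
conn.execute("INSERT INTO customer_address SELECT id, street, city FROM customer")
conn.execute("DROP VIEW customer_view")
conn.execute("DROP TABLE customer")

# Redefine the mapping so the view presents the same columns as before.
conn.execute(
    "CREATE VIEW customer_view AS "
    "SELECT c.id, c.name, a.street, a.city "
    "FROM customer_core AS c JOIN customer_address AS a ON a.id = c.id"
)
after = conn.execute(APP_QUERY).fetchall()
```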
How to Achieve Data
Independence
Physical Data Independence is achieved by
modifying the physical layer to logical layer
mapping (PL-LL mapping).

Logical Data Independence is achieved by
modifying the view layer to logical layer
mapping (VL-LL mapping).
Advantages of Data Independence
• Redundancy can be reduced
• Inconsistency can be avoided
• Data can be shared
• Standards can be enforced
• Security can be enforced
• Integrity can be maintained
Types of databases
• Centralised database
– A database that is located, stored, and maintained in a single
location. This location is most often a central computer or database
system
• Distributed database
– A database in which storage devices are not all attached to a
common processing unit such as the CPU, and which is controlled by a
distributed database management system. It may be stored on multiple
computers located in the same physical location, or may be dispersed
over a network of interconnected computers.
• Parallel Databases
– A parallel database system seeks to improve performance through
parallelization of various operations, such as loading data, building
indexes and evaluating queries.
Data Models (ER Model)
• E-R Model
– Model is based on the entities and their relationships
• Advantages
– Easy to develop
– Easy to visualize
– Ability to specify the keys
– Specify generalization and specialization
• Disadvantages
– Used for design not for implementation
– Need more specialized persons
Data Models (Relational Model)
• Relational Model
– Data is represented in collection of tables and columns
• Advantages
– Structural Independence
– Conceptual simplicity
– Design, maintenance is easy
– Good for ad hoc requests
– flexible
• Disadvantages
– Hardware & software overheads
– Slower
– Need more technical expertise
Relational Model
Data Models (Hierarchical)
• Hierarchical Model
– It links the records together in a tree structure and each record
has only one owner
• Advantages
– Great speed
– Easy updates
– Simplicity
– Data security: the first database model to provide security enforced by the DBMS
– Efficiency: good for large number of transactions
• Disadvantages
– Implementation complexity
– Not structurally independent
– Not physically independent
– M:N relationship is not possible
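The "one owner per record" rule can be sketched as a tree in which every node holds a single owner reference. The `Record` class and the sample names are hypothetical and are meant only to illustrate the structure, not any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Record:
    name: str
    owner: Optional["Record"] = None  # exactly one owner (None for the root)
    children: List["Record"] = field(default_factory=list)

    def add_child(self, name):
        """Create a child record owned by this record."""
        child = Record(name, owner=self)
        self.children.append(child)
        return child

    def path(self):
        """Walk up the single owner chain to the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.owner
        return "/".join(reversed(parts))


university = Record("University")
department = university.add_child("Department")
course = department.add_child("Course")
course_path = course.path()
```

Because `owner` is a single reference, a course shared by two departments would have to be duplicated under each one. This is why M:N relationships do not fit the model.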
Hierarchical Model
Data Models (Network)
● There can be more than one path from
a predecessor node to its successor nodes.
● The operations of the network model
are supported by an indexing structure of
circular linked lists.
Network Model
Data Models (Network)
• Based on directed graph theory
• Advantages:
– Conceptual simplicity
– Handles M:N relationships
– Data independence
• Disadvantages
– Detailed structural knowledge is required
– Lack of structural independence
Data Models (Object Oriented)
• Based on the collection of objects
• Object stores instance variables & functions
• Advantages
– Require less code to develop applications
– More natural data model
– Code is easier to maintain
– Provide higher performance
– Improve productivity
– Easy data access
• Disadvantages
– Difficult to maintain the enormous number of objects
– Suited mainly to non-conventional systems
– Data Migration problems
– Data transformation problems
Database Languages
Database Languages (DDL)

• DDL stands for Data Definition Language. It
is used to define the database structure or
pattern.
• DDL is a set of commands used to create,
modify, and delete database structures but
not data
• It is used to create schema, tables, indexes,
constraints, etc. in the database.
DDL
• Using the DDL statements, you can create the
skeleton of the database.
• DDL statements record metadata such as the
number of tables and schemas, their names, indexes,
the columns in each table, constraints, etc.
Use of DDL
• Create: It is used to create objects in the
database.
• Alter: It is used to alter the structure of the
database.
• Drop: It is used to delete objects from the
database.
• Truncate: It is used to remove all records from
a table.
• Rename: It is used to rename an object.
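The DDL commands above can be tried out as follows, assuming SQLite via Python's `sqlite3` module. The `student` and `learner` table names and the `tables` and `columns` helpers are hypothetical. SQLite has no TRUNCATE statement, so it is not shown. In SQLite, removing all records is done with `DELETE FROM` without a condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def tables():
    """Names of all tables currently defined in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


def columns(table):
    """Column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")  # Create
conn.execute("ALTER TABLE student ADD COLUMN grade TEXT")                 # Alter
conn.execute("ALTER TABLE student RENAME TO learner")                     # Rename
after_rename = tables()
learner_columns = columns("learner")
conn.execute("DROP TABLE learner")                                        # Drop
after_drop = tables()
```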
Data Manipulation Language (DML)
• Language for accessing and manipulating the
data organized by the appropriate data model
– DML also known as query language
• Two classes of languages
– Procedural – user specifies what data is required
and how to get those data
– Nonprocedural – user specifies what data is
required without specifying how to get those data
• SQL is the most widely used query language
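The two classes can be contrasted on one small task: finding accounts with a balance above 500. This is a sketch assuming SQLite via Python's `sqlite3` module, with a hypothetical `account` table. The loop stands in for the procedural style and the single query for the nonprocedural style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 2500.0), (3, 900.0)]
)

# Procedural style: the program spells out how to find the rows.
procedural = []
for account_id, balance in conn.execute("SELECT id, balance FROM account"):
    if balance > 500:
        procedural.append(account_id)
procedural.sort()

# Nonprocedural style: the query states what is wanted; the DBMS decides how.
declarative = [
    row[0]
    for row in conn.execute("SELECT id FROM account WHERE balance > 500 ORDER BY id")
]
```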
Use of DML
• Select: It is used to retrieve data from a
database.
• Insert: It is used to insert data into a table.
• Update: It is used to update existing data
within a table.
• Delete: It is used to delete records from a
table (all records if no condition is given).
• Merge: It performs UPSERT operation, i.e.,
insert or update operations.
Use of DML
• Call: It is used to call a PL/SQL
or Java subprogram.
• Lock Table: It controls concurrency.
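A short sketch of the main DML commands, assuming SQLite via Python's `sqlite3` module and a hypothetical `product` table. SQLite has no MERGE statement, so `INSERT OR REPLACE` is used here as a rough, SQLite-specific stand-in for the upsert behaviour. CALL and LOCK TABLE are not available in SQLite and are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

conn.execute("INSERT INTO product VALUES (1, 'pen', 10.0)")    # Insert
conn.execute("INSERT INTO product VALUES (2, 'book', 120.0)")
conn.execute("UPDATE product SET price = 12.0 WHERE id = 1")   # Update
conn.execute("DELETE FROM product WHERE id = 2")               # Delete (one row)

# Upsert stand-in: an existing key is replaced, a new key is inserted.
conn.execute("INSERT OR REPLACE INTO product VALUES (1, 'pen', 15.0)")
conn.execute("INSERT OR REPLACE INTO product VALUES (3, 'ink', 40.0)")

rows = conn.execute(
    "SELECT id, name, price FROM product ORDER BY id"          # Select
).fetchall()
```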
Data Control Language
● DCL stands for Data Control
Language.
● It is used to control access to the stored
data by granting and revoking privileges.
● In some systems, DCL execution is transactional
and can be rolled back.
Use of DCL
Here are some tasks that come under DCL:
● Grant: It is used to give user access privileges to a
database.
● Revoke: It is used to take back permissions from the
user.
The following privileges can be granted or revoked:
CONNECT, INSERT, USAGE, EXECUTE, DELETE,
UPDATE and SELECT.
Transaction Control Language
TCL is used to manage the changes made by DML
statements.

Here are some tasks that come under TCL:
● Commit: It is used to save the transaction on the
database.
● Rollback: It is used to restore the database to
its state as of the last Commit.
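Commit and Rollback can be seen side by side in a minimal sketch, assuming SQLite via Python's `sqlite3` module, where `commit()` and `rollback()` on the connection issue the corresponding TCL commands. The `log` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("INSERT INTO log VALUES ('kept')")
conn.commit()      # Commit: make the change permanent

conn.execute("INSERT INTO log VALUES ('discarded')")
conn.rollback()    # Rollback: return to the state of the last commit

messages = [row[0] for row in conn.execute("SELECT msg FROM log")]
```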
SQL
• SQL is the standard command set used to
communicate with the RDBMS
• Advantages:
– High level
– Portable
– Non-procedural
– Easy
Storage Manager
• Storage manager is a program module that
provides the interface between the low-level
data stored in the database and the
application programs and queries submitted
to the system.
• The storage manager is responsible for the
following tasks:
– interaction with the file manager
– efficient storing, retrieving and updating of data
Storage Manager (Components)
• Authorization and integrity manager
– It tests for various integrity constraints
• Transaction Manager
– Ensures that the database remains consistent during transactions
• File Manager
– Manages allocation of disk space and data structures to represent information
• Buffer Manager
– Responsible for fetching of data from disk to main memory
• Data Files
– Stores data
• Data Dictionary
– Contains metadata
• Indices
– Provides faster access
Data Dictionary
Data Dictionary Contains Metadata and consists
of the following information
● Name of the tables in the database
● Constraints of a table i.e. keys, relationships, etc.
● Columns of the tables that are related to each other
● Owner of the table
● Last accessed information of the object
● Last updated information of the object
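As a concrete illustration, SQLite keeps a small built-in catalog that can be queried like any other table. The sketch below uses Python's `sqlite3` module. The `customer` table and its index are hypothetical. SQLite's catalog holds names, types, and constraints but not owner or last-access information, so those items from the list are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"
).fetchall()

# PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
column_info = [
    (row[1], row[2], bool(row[3]), bool(row[5]))
    for row in conn.execute("PRAGMA table_info(customer)")
]
```

The catalog rows appear without any explicit bookkeeping by the user. The DBMS updates them as a side effect of the CREATE statements.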
Types of Data Dictionary
Active Data Dictionary
• The DBMS software manages the active
data dictionary automatically.
• Modification is automatic, and
most RDBMSs have an active data dictionary.
• It is also known as integrated data
dictionary.
Types of Data Dictionary
Passive Data Dictionary
• Managed by the users and modified
manually when the database structure
changes.
• Known as non-integrated data dictionary.
End of Basics

Thanks
