
Database Management Systems

[MCC511]
Winter, 2023-24

Lecture 1.4
RDBMS Architecture – Part 1
ABHISHEK KUMAR SINGH
ASSOCIATE PROFESSOR
DEPARTMENT OF MATHEMATICS & COMPUTING
IIT(ISM) DHANBAD
Last time …
We talked about the different ways you could model the data requirements of a software solution
We discussed, in brief,
◦ The Object Oriented Data Model – where we represent the data in the form of communicating objects
◦ The Graph Model – where the data is represented by nodes in a graph, while edges represent relationships
◦ The Relational Model – where the data is expressed in the form of “join-able” tables having rows and columns
◦ The ER Model – where the data requirements are mapped to entities and inter-relationships between them

We also discussed that the ER model is at a somewhat higher level of abstraction …
◦ … whereas the other models could be seen as peers

We also discussed that, from now on, we will mostly cover the Relational Model along with the ER Model
In continuation, for this lecture and the next, we will have a look at the overall architecture of a DBMS …
◦ … in particular, an RDBMS

ABHISHEK KUMAR SINGH | DEPARTMENT OF MATHEMATICS & COMPUTING| IIT (ISM) DHANBAD
Logical View of an RDBMS
The Logical View of a system shows a logical decomposition of the system into its sub-systems
◦ It is an architectural diagram, often the first in a series of such diagrams

The information that the Logical View provides is straightforward …
◦ … it conveys the major tasks performed by the system at a high level of abstraction

Assume that you are the architect of a project that involves building a new RDBMS
◦ What could the Logical View of your RDBMS look like?

[Figure: A Logical View of a Typical RDBMS – the RDBMS is decomposed into four sub-systems: Query Processor, Transaction Manager, Security Manager and Storage Manager]

In essence, you should look towards building these four sub-systems for your DBMS
◦ Query Processor – the section of the RDBMS that “talks” to your users and processes their requirements
◦ Transaction Manager – the element that makes sure that the database is always in a consistent state
◦ Security Manager – the part of your DBMS that enforces security paradigms on its usage
◦ Storage Manager – the piece of the overall system which manages the data at the physical level

◦ We will discuss the first and last subsystems today, and the rest in the next lecture

The Query Processor Sub-system
For a moment, assume that our RDBMS is ready to use …
◦ … i.e., it can be used to create, update and delete tables

Who is going to use it now?
◦ The Database Administrator will certainly be a user – the user with all the administrative rights!
◦ But there will be other users as well – most commonly, any user that produces or needs the data in the DBMS

For example, the software solution that we discussed last time …
◦ … the one which maps Doctors to Hospitals and vice versa …
◦ … would require an RDBMS where the tables could be “queried” to know more about a Doctor or a Hospital

Thus, we need a communication medium – a way to tell the DBMS what a user wants
◦ This is where the Structured Query Language, or SQL, comes into the picture

◦ Let us spend a few minutes on SQL now, although you’ll be seeing it all the time during the labs

Structured Query Language
Modern RDBMSs are capable of understanding an SQL statement (or an SQL command) from a user
SQL is an English-like language, in which statements can be created to convey a variety of information
Based on the type of operation that is required, the SQL statements can be categorised under
◦ Data Definition Language or DDL category
◦ Data Manipulation Language or DML category
◦ Data Control Language or DCL category
◦ Transaction Control Language or TCL

As a Database Administrator, you may often have to use statements from the first three categories …
◦ … while the statements in the fourth category are more important for software developers

DDL and DML statements are usually the most frequently used …
◦ … and may be used by software developers or, in some cases, by the end user as well

DDL and DML
The statements that are a part of the DDL are used to create, change or remove schemas
◦ A collection of tables, along with some additional information, is called a schema
◦ This information includes the relationships between the tables and any applicable domain constraints
◦ A database is a collection of one or more schemas
◦ Schemas can thus be seen as a logical group of tables, within a larger group, i.e., the database
◦ DDL statements affect the structure of a database, not the data itself …
◦ … although the data may get changed as a side-effect

The statements that constitute DML are those which read, add, change or remove data items
◦ These data items are essentially the rows in one or more tables
◦ DML statements can also be used to report derived data – data that is processed from the stored data …
◦ … e.g., you can report the average CGPA of a batch by averaging all CGPA values stored in the table(s) …
◦ … using an SQL function provided by your DBMS – it is usually called AVG()
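
The DDL/DML split described above can be sketched with Python's built-in sqlite3 module and an in-memory database. The table name student, its columns and the CGPA values are made up for illustration; SQLite is only a convenient stand-in, and other RDBMSs may differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: defines the structure of the table; no rows exist yet
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, batch TEXT, cgpa REAL)"
)

# DML: add rows ...
conn.executemany(
    "INSERT INTO student (roll_no, batch, cgpa) VALUES (?, ?, ?)",
    [(1, "2023", 8.0), (2, "2023", 9.0), (3, "2022", 7.0)],
)
# ... change a row ...
conn.execute("UPDATE student SET cgpa = 7.5 WHERE roll_no = 3")

# ... and report derived data with the AVG() function
avg_cgpa = conn.execute(
    "SELECT AVG(cgpa) FROM student WHERE batch = ?", ("2023",)
).fetchone()[0]
print(avg_cgpa)  # 8.5, the mean of 8.0 and 9.0

# DML: remove a row, then count what is left
conn.execute("DELETE FROM student WHERE roll_no = 3")
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Note that the average is never stored anywhere; it is computed from the stored rows each time the query runs.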

DCL and TCL
The DCL is a collection of SQL commands that perform authorisation operations over the data
◦ For example, “a particular table should only be accessible to a particular user”
◦ DCL commands are meant almost exclusively for the Database Administrator

The TCL, or Transaction Control Language, commands are meant for the software developers using the DBMS
◦ These commands are useful if the developer wishes to take the data from one consistent state …
◦ … to another consistent state
◦ We will revisit TCL in the next part of this lecture, when we have discussed Transactions
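
As a small preview of the “one consistent state to another” idea, here is a sketch using Python's built-in sqlite3 module. The account table, the CHECK rule and the transfer() helper are hypothetical, chosen only to show BEGIN, COMMIT and ROLLBACK at work; SQLite has no DCL commands, so only the TCL side is shown.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the TCL statements below are issued explicitly by us
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

def transfer(src, dst, amount):
    """Move money between two accounts; either both updates happen or neither."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute("COMMIT")    # new consistent state becomes permanent
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # back to the previous consistent state
        return False

def balances():
    return [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]

ok = transfer(1, 2, 30)       # succeeds
after_ok = balances()
failed = transfer(1, 2, 500)  # would make account 1 negative, so it is undone
after_fail = balances()
```

In the failed transfer the credit to account 2 has already been applied when the debit violates the CHECK rule; the ROLLBACK is what undoes that half-finished work.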


We will study SQL at some depth in this course, so we will look at these statements at that point

Overview of Query Processing Tasks
Let us summarise the expected tasks from the Query Processor sub-system of our planned RDBMS
The most prominent ability is to parse SQL statements for syntactic validity
◦ Keep in mind that while there is a standard, called ANSI SQL, vendors are not bound to implement it as is
◦ Thus, every RDBMS vendor has variations in “their version of SQL”
◦ Still, most SQL commands have similar syntax across RDBMSs, making it slightly easier for DBMS users

Some other issues that the Query Processor sub-system may have to take care of include
◦ Query Caching – trying to store results of “frequently asked queries” for faster response time
◦ Query Evaluation – attempting to represent a query in one or more “formal, unambiguous forms” …
◦ … which could then be evaluated, e.g., by performing the CRUD operations and any other processing
◦ Query Optimisation – there may be more ways than one to evaluate a query …
◦ … and finding out which way is the best out of the given options
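
Two of these tasks can be observed from the outside in an existing RDBMS. The sketch below uses Python's built-in sqlite3 module; the student table and the index name idx_student_batch are made up for illustration, and the exact wording of the plan text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, batch TEXT)"
)

# Parsing: a syntactically invalid statement is rejected before any evaluation
try:
    conn.execute("SELEC name FROM student")
    syntax_error = None
except sqlite3.OperationalError as exc:
    syntax_error = str(exc)

query = "SELECT name FROM student WHERE batch = '2023'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one step of the plan
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Optimisation: the same query is evaluated differently once an index exists
plan_before = plan(query)
conn.execute("CREATE INDEX idx_student_batch ON student (batch)")
plan_after = plan(query)

print(syntax_error)
print(plan_before)
print(plan_after)
```

The query text does not change between the two calls to plan(); only the set of available evaluation strategies does, and the optimiser picks among them.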


This is also an important field of research, which we will cover briefly a little later in the course

The Storage Manager Sub-system
The Storage Manager of an RDBMS is the hero behind the curtains …
◦ … since it does its job with little recognition from the DBMS users, thanks to Physical Data Independence

There are three major tasks for which the Storage Manager is responsible
◦ Storing data on physical media – which is most probably done as files
◦ Keeping information about the Logical Level structures that describe the stored data
◦ Managing the data in such a fashion that the CRUD operations could be performed efficiently …
◦ … which, by the way, may require storing additional data


We will spend some more time on these details towards the end of the course
◦ But let us have an overview of these tasks for now

Storing data in Files
Files are just a “sequence of bytes” – however, the bytes may or may not be contiguous
If they are not contiguous, there are some additional issues
◦ For instance, some space has to be used for “linking” the chunks of data together (via pointers)

At the physical level, the data must be seen as a “collection of records”
◦ Each record loosely represents one row in a table – comprising values of different types …
◦ … where each value corresponds to an attribute or field of the row

The Storage Manager needs to know finer storage details to efficiently use the storage space
◦ For example, what is the “block” size – the unit of reading or writing data on the medium
◦ It is important to figure out if a record should span across multiple blocks …
◦ … since if a record spans across blocks, multiple reads will be required to construct the logical data
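
The trade-off above comes down to simple arithmetic. The sketch below assumes a block size of 4096 bytes and fixed-size records of 13 bytes; both numbers and the function names are illustrative, not properties of any particular system.

```python
BLOCK_SIZE = 4096   # bytes per block (assumed)
RECORD_SIZE = 13    # bytes per fixed-size record (assumed)

def unspanned_layout(block_size, record_size):
    """Records never cross a block boundary: return (records per block, wasted bytes)."""
    per_block = block_size // record_size
    wasted = block_size - per_block * record_size
    return per_block, wasted

def blocks_touched(record_no, block_size, record_size):
    """Records are packed back to back (spanned): blocks read to fetch one record."""
    first_byte = record_no * record_size
    last_byte = first_byte + record_size - 1
    return last_byte // block_size - first_byte // block_size + 1

per_block, wasted = unspanned_layout(BLOCK_SIZE, RECORD_SIZE)
print(per_block, wasted)                            # 315 records, 1 byte unused per block
print(blocks_touched(314, BLOCK_SIZE, RECORD_SIZE)) # 1: bytes 4082..4094 fit in block 0
print(blocks_touched(315, BLOCK_SIZE, RECORD_SIZE)) # 2: bytes 4095..4107 cross into block 1
```

Not spanning wastes a little space in every block, while spanning saves that space at the cost of an extra read for the records that straddle a boundary.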

Keeping “Data about Data” – Metadata (1/2)
The Storage Manager also needs to maintain some information to interpret the stored bytes
For instance, assume that you have a record of size 13 bytes
◦ The bytes can be interpreted in multiple ways
◦ Possible interpretation 1: int (4 bytes), double (8 bytes), char (1 byte)
◦ Possible interpretation 2: char (1 byte), double (8 bytes), int (4 bytes)
◦ There could be other ways to interpret this record as well, but only one of them is correct!
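
The ambiguity is easy to reproduce with Python's built-in struct module. The sample values (25, 70.5 and "M") and the little-endian, unpadded layout selected by "<" are assumptions made for this sketch.

```python
import struct

# Write one record as int (4 bytes), double (8 bytes), char (1 byte);
# "<" means little-endian with no padding, so the record is exactly 13 bytes
raw = struct.pack("<idc", 25, 70.5, b"M")
record_size = len(raw)

# Interpretation 1 matches how the bytes were written
as_int_double_char = struct.unpack("<idc", raw)

# Interpretation 2 also consumes exactly 13 bytes, but the values are meaningless
as_char_double_int = struct.unpack("<cdi", raw)

print(record_size)
print(as_int_double_char)
print(as_char_double_int)
```

Both calls succeed without any error, which is exactly the problem: the bytes themselves carry no hint about which interpretation was intended.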

Keeping “Data about Data” – Metadata (2/2)
The Storage Manager, thus, needs to store a template to map parts of the records to fields
◦ For instance:

Serial# in the Record    Logical Name of Field    Type and Space
1                        AGE                      Integer, 4 bytes
2                        WEIGHT                   Double, 8 bytes
3                        GENDER                   Char, 1 byte

◦ This type of information about a record is called Metadata, or “data about data”

It is also a function of the Storage Manager to keep this data somewhere …
◦ … to map data structures at the logical level to records at the physical level

This requires some additional storage as well, but it is usually negligible compared to the data
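
A rough sketch of how such a template could drive the mapping from physical bytes to logical fields, using Python's built-in struct module. The TEMPLATE list mirrors the AGE / WEIGHT / GENDER example; the helper names and the little-endian, unpadded layout are assumptions of this sketch, not how any particular RDBMS stores its catalogue.

```python
import struct

# Metadata: (logical field name, struct type code, size in bytes), in record order
TEMPLATE = [("AGE", "i", 4), ("WEIGHT", "d", 8), ("GENDER", "c", 1)]

def field_offsets(template):
    """Byte offset of each field inside the record, derived from the sizes."""
    offsets, position = {}, 0
    for name, _code, size in template:
        offsets[name] = position
        position += size
    return offsets

def decode(raw, template):
    """Turn the raw bytes of one record into a {field name: value} mapping."""
    fmt = "<" + "".join(code for _name, code, _size in template)
    values = struct.unpack(fmt, raw)
    return {name: value for (name, _code, _size), value in zip(template, values)}

raw = struct.pack("<idc", 25, 70.5, b"M")  # one stored record of 13 bytes
offsets = field_offsets(TEMPLATE)
row = decode(raw, TEMPLATE)
print(offsets)
print(row)
```

The template is tiny compared to the data it describes: three entries are enough to interpret every record of this type, however many there are.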

Creating Efficient Retrieval Agents – Indices
The physical data, or the records, should ideally be stored only once to avoid wastage
◦ But logically, we may be expected to present the records in some arbitrary order
◦ For instance, the records of the example we saw just now may have to be sorted by WEIGHT or GENDER

How can we achieve it?
◦ One way to do so is through Indices

Assuming that we have a mechanism to reach a disk block instantaneously …
◦ … which is true for most of the secondary storage devices today (ask people who lived with Tapes :P) …
◦ … we can read the records in any order, provided that we know where they are on the disk

An index is a collection of Key-Value pairs, which are sorted on “some criterion”
◦ Commonly, this criterion is the value of a field whose values may be required in a sequential fashion
◦ For example, in the previous example, the keys could be the set of all WEIGHT or GENDER entries …
◦ … while the values could be addresses of the records with those entries
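
A toy version of this idea in Python. The dictionary records stands in for the disk, its keys play the role of record addresses, and the sample rows are made up; real indices use more elaborate structures, so treat this only as a sketch of the key-to-address mapping.

```python
import bisect

# "Disk": record address -> (AGE, WEIGHT, GENDER); each record is stored once
records = {
    0: (25, 70.5, "M"),
    13: (31, 58.0, "F"),
    26: (19, 82.3, "M"),
    39: (44, 64.2, "F"),
}

# Index on WEIGHT: (key, address) pairs kept sorted by key
weight_index = sorted((rec[1], addr) for addr, rec in records.items())

# Present the records in WEIGHT order without moving or copying the stored data
in_weight_order = [records[addr] for _key, addr in weight_index]

# The sorted keys also allow a quick range lookup, e.g. 60 <= WEIGHT <= 75
keys = [key for key, _addr in weight_index]
lo = bisect.bisect_left(keys, 60)
hi = bisect.bisect_right(keys, 75)
addresses_60_to_75 = [addr for _key, addr in weight_index[lo:hi]]

print(in_weight_order)
print(addresses_60_to_75)
```

A second index on GENDER could be built in the same way over the same records, which is how one stored copy can be presented in several logical orders.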

In the next lecture …
We are currently going through the sub-systems of an imaginary RDBMS …
◦ … unless you decide to build it, in which case, All the best :D

We have seen the Query Processor and the Storage Manager sub-systems in this part …
◦ … whereas in the next part, we will check out the Security Manager and Transaction Manager sub-systems

Homework
The Logical View of a system shows the major sub-systems of the system
◦ It helps us gain some perspective on how to componentise the problem of building it
◦ Another view of a system is called the Process View
◦ Try to understand what that view is, and how it can help you in understanding the system
◦ Reading this link may be a good start:
https://en.wikipedia.org/wiki/4%2B1_architectural_view_model

Although we will discuss the four types of SQL commands in detail …
◦ … there is no harm in trying to read this short and simple explanation:
https://www.geeksforgeeks.org/sql-ddl-dml-tcl-dcl/
◦ It is fine if you do not understand everything … see if you can get some idea!
