A database is a collection of data, facts and figures that can be processed to produce information relevant to an enterprise.
Eg.: The name of a student, her age, class and subjects can be counted as data for recording purposes.
For example, if we have data about the marks obtained by all students, we can then draw conclusions about toppers, average marks, etc.
A DBMS is a collection of interrelated data and a set of programs to access those data. (OR)
A software system that is used to manage databases is called a DBMS.
The primary goal of a database management system is to store data in a way that makes it easier to retrieve and manipulate, and that helps produce information in a form that is both convenient and efficient.
DBMS examples include: MySQL, SQL Server, Oracle, dBASE, FoxPro.
Database systems – a database and a DBMS are collectively known as a database system.
Characteristics of DBMS:
Traditionally, data was organized in file formats. The DBMS was a new concept developed to overcome the deficiencies of the traditional style. A modern DBMS has the following characteristics (features):
Real-world entity
Relation-based tables
Isolation of data and application
Less redundancy
Consistency
Query language
ACID (Atomicity, Consistency, Isolation and Durability) properties, maintained in a multi-transactional environment and in case of failure
Multiuser and concurrent access
Multiple views
Security: since a DBMS is not saved on disk like a traditional file system, it is very hard for a thief to break into the data.
A DBMS also stores metadata, which is data about data, to ease its own processing.
Objectives:
1. Allow multiple users to be active at one time.
2. Provide data integrity.
3. Protect the data from physical harm and unauthorized access.
4. Provide security with user access privileges.
Types of DBMS: (refer to data models)
Hierarchical DBMS
Network DBMS
Relational DBMS
Object-Oriented DBMS
Advantages:
o Controls database redundancy
o Data sharing
o Easy maintenance
o Reduced time
o Backup
o Multiple user interfaces
o Data security
Disadvantages:
o Cost of hardware and software is high
o Large size and takes time to set up
o Complexity
o Higher impact of failure
Users:
Administrators: They create access profiles for users and apply limitations to maintain isolation and enforce security. They also maintain DBMS resources.
End Users: They can be simple viewers who pay attention to logs or market rates, or they can be as sophisticated as business analysts who make the most of the system.
DBMS - Architecture
DBMS architecture depends on how users connect to the database to get their requests done. The design of a database management system highly depends on its architecture, which can be centralized, decentralized or hierarchical.
Types:
In 1-tier architecture, the DBMS is the only entity: the user sits directly on the DBMS and uses it. Any changes done here are made directly on the DBMS itself. It does not provide handy tools for end users; database designers and programmers generally use single-tier architecture.
If the architecture of a DBMS is 2-tier, then it must have some application that uses the DBMS. Programmers use 2-tier architecture to access the DBMS by means of an application. Here the application tier is entirely independent of the database in terms of operation, design and programming.
3-tier architecture
The most widely used architecture is 3-tier architecture, which separates its tiers from each other on the basis of users. It is described as follows:
Database (Data) Tier: At this tier, the database resides along with its query processing
languages. We also have the relations that define the data and their constraints at this level.
Application (Middle) Tier :At this tier the application server and program, which access
database, resides. For a user, this application tier presents an abstracted view of the
database. End-users are unaware of any existence of the database beyond the application.
At the other end, the database tier is not aware of any other user beyond the application
tier. Hence, the application layer sits in the middle and acts as a mediator between the end-
user and the database.
User (Presentation) Tier: End-users operate on this tier and they know nothing about any
existence of the database beyond this layer. At this layer, multiple views of the database
can be provided by the application. All views are generated by applications that reside in the
application tier.
Multiple tier database architecture is highly modifiable as almost all its components are
independent and can be changed independently.
DBMS - Data Models
Data models define how the logical structure of a database is modeled. Data models are fundamental entities for introducing abstraction in a DBMS. They define how data items are connected to each other and how they are processed and stored inside the system.
Data models describe data at the conceptual and view levels, and specify the logical structure of a database with records, fields and attributes.
The most popular data model in DBMS is the relational model. It is a more scientific model than the others. In this model, data is maintained in the form of two-dimensional tables, and all information is stored as rows and columns. The basic structure of the relational model is the table, so tables are also called relations in the relational model. Example: in this example, we have a Student table.
Features of the Relational Model
It describes data at the conceptual level (the logical level – the design of the database) and the view levels.
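The Student relation described above can be sketched with Python's built-in sqlite3 module; the table name, column names and sample rows below are illustrative, not taken from a real schema:

```python
# A minimal sketch of the relational model: data lives in a
# two-dimensional table (relation) of rows and columns.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (reg INTEGER, name TEXT, percentage INTEGER)")
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, "Mathi", 90), (2, "Arjun", 92)])
rows = cur.execute("SELECT reg, name, percentage FROM student").fetchall()
print(rows)  # each tuple is one row of the relation
```

Each fetched tuple corresponds to one row of the relation, and each position in the tuple to one column.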
Entity-Relationship Model
Entity-Relationship model is based on the notion of real world entities and relationship
among them.
ER Model is best used for the conceptual design of a database.
ER Model is based on- (1)Entities and their attributes (2)Relationships among entities.
These concepts are explained below.
Entity − real-world entity having properties called attributes. Every attribute is defined
by its set of values called domain.
Relationship − The logical association among entities is called relationship.
Relationships are mapped with entities in various ways.
Mapping cardinalities – the number of associations between two entities. Cardinality defines the number of entities in one entity set which can be associated with entities of the other set via a relationship set. The mapping cardinalities are:
one to one
one to many
many to one
many to many
One-to-one: one entity from entity set A can be associated with at most one entity of entity set B, and vice versa.
One-to-many: one entity from entity set A can be associated with more than one entity of entity set B; however, an entity from entity set B can be associated with at most one entity of set A.
QUERY PROCESSING
It is a step-by-step process covering translation at the physical level of the file system, query optimization and actual execution of the query to get the result. It requires the basic concepts of relational algebra and file structure.
It refers to the range of activities that are involved in extracting data from the database.
It includes translation of queries in high-level database languages into expressions that can be
implemented at the physical level of the file system.
In query processing, we will actually understand how these queries are processed and how
they are optimized.
Each relational algebra operation can be evaluated using one of several different algorithms.
Correspondingly, a relational-algebra expression can be evaluated in many ways.
Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.
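The claim that one relational-algebra expression admits several evaluation plans can be illustrated with plain Python lists standing in for relations; the relation names and attributes here are invented for the sketch:

```python
# Two evaluation plans for the same expression
# sigma_{sal>4000}(emp join dept): filter-after-join vs filter-before-join.
emp = [{"ename": "Mathi", "deptno": 1, "sal": 10000},
       {"ename": "Ram",   "deptno": 2, "sal": 3000}]
dept = [{"deptno": 1, "dname": "CS"}, {"deptno": 2, "dname": "IT"}]

def join(r, s, key):
    # Natural join on a single shared attribute.
    return [{**a, **b} for a in r for b in s if a[key] == b[key]]

# Plan 1: join first, then select.
plan1 = [t for t in join(emp, dept, "deptno") if t["sal"] > 4000]
# Plan 2: select first (usually cheaper), then join -- same result.
plan2 = join([t for t in emp if t["sal"] > 4000], dept, "deptno")
print(plan1 == plan2)  # True
```

Both plans produce the same relation; a query optimizer's job is to pick the cheaper annotated plan.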
SQL
SQL (Structured Query Language) is a computer language for storing, manipulating and retrieving data stored in relational databases.
SQL is the standard language for relational database systems. All relational database management systems, such as MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server, use SQL as the standard database language.
SQL Process
When you execute an SQL command for any RDBMS, the system determines the best way to carry out your request, and the SQL engine figures out how to interpret the task.
Various components are included in the process, such as the Query Dispatcher, Optimization Engines, Classic Query Engine and SQL Query Engine. The classic query engine handles all non-SQL queries, but the SQL query engine won't handle logical files.
SQL Commands
The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT, UPDATE, DELETE and DROP. These commands can be classified into groups based on their nature.
1. DDL - Data Definition Language
Command Description
CREATE Creates a new table, a view of a table, or other object in database
ALTER Modifies an existing database object, such as a table.
DROP Deletes an entire table, a view of a table or other object in the database.
TRUNCATE Removes all records from a table, including all spaces allocated for the records
RENAME Renames an object
2. DML - Data Manipulation Language
Command Description
SELECT Retrieves records from one or more tables
INSERT Creates new records in a table
UPDATE Modifies existing records
DELETE Deletes records from a table
3. DCL - Data Control Language
Command Description
GRANT Gives a privilege to user
REVOKE Takes back privileges granted from user
Example:
CREATE:
Syntax: Create table tablename (column_name1 data_type constraints, column_name2 data_type constraints, …);
Eg.: Create table stud (sname varchar2(20), rollno number(10) not null, dob date not null);
ALTER:
Syntax: alter table tablename add/modify (attribute datatype(size));
Eg.1: Alter table stud add (phone_no char (20));
Eg.2: Alter table stud modify(phone_no number (10));
DROP:
Syntax: drop table tablename; Eg.: drop table stud;
To drop column- Eg.: alter table emp drop column experience;
TRUNCATE:
Syntax: Truncate table tablename; Eg.: Truncate table stud;
DESC - This is used to view the structure of the table.
Syntax: desc tablename; Eg.: desc emp;
Name Null? Type
--------------------------------- -------- --------
EmpNo NOT NULL number(5)
EName VarChar(15)
Job NOT NULL Char(10)
DeptNo NOT NULL number(3)
PHONE_NO number(10)
INSERT:
Inserting a single row into a table –
Syntax: insert into <table name> values (value list);
Example: insert into s values('s3','sup3','blore',10);
1 row created.
Inserting more than one record using a single insert command –
Syntax: insert into <table name> values (&col1, &col2, ….);
Example: insert into stud values(&reg, '&name', &percentage);
Enter value for reg: 1
Enter value for name: Mathi
Enter value for percentage: 90
1 row created.
SQL> /
Enter value for reg: 2
Enter value for name: Arjun
Enter value for percentage: 92
1 row created.
SELECT:
Selects all rows from the table.
Syntax: Select * from tablename; Eg.: Select * from stud;
REG NAME PERCENTAGE
---------- -------------------- -------------
1 Mathi 90
2 Arjun 92
The retrieval of specific columns from a table.
Syntax: Select column_name1, ….., column_namen from tablename;
Eg.: Select reg, name from stud;
REG NAME
---------- --------------------
1 Mathi
2 Arjun
Select command with where clause
Example: Select empno, empname from emp where sal>4000;
UPDATE COMMAND
Syntax:update tablename set field=values where condition;
Example:Update emp set sal = 10000 where empno=135;
DELETE COMMAND
Syntax: Delete from table where conditions; Eg.: delete from emp where empno=135;
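The CREATE/INSERT/SELECT/UPDATE/DELETE commands above can be exercised end-to-end using Python's sqlite3 (SQLite's syntax differs slightly from Oracle's; table and column names here are illustrative):

```python
# End-to-end DDL + DML round trip on an in-memory database.
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal INTEGER)")  # DDL
cur.execute("INSERT INTO emp VALUES (135, 'Gugan', 4500)")
cur.execute("INSERT INTO emp VALUES (136, 'Karthik', 3000)")
cur.execute("UPDATE emp SET sal = 10000 WHERE empno = 135")  # modify a record
cur.execute("DELETE FROM emp WHERE empno = 136")             # remove a record
result = cur.execute("SELECT empno, ename, sal FROM emp").fetchall()
print(result)  # [(135, 'Gugan', 10000)]
```

Only the updated row for empno 135 survives, matching the UPDATE and DELETE examples above.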
GRANT: Eg.: Grant all on employees to departments;
Grant succeeded.
Granting some options: Eg.: Grant select, update, insert on employees to departments with grant option;
Grant succeeded.
REVOKE: Eg.: Revoke all on employees from departments;
Revoke succeeded.
Revoking some options: Revoke select, update, insert on employees from departments;
Revoke succeeded.
SAVEPOINT: Syntax: SAVEPOINT <savepoint name>; Eg.: SAVEPOINT s1;
ROLLBACK: Syntax: ROLLBACK TO <savepoint name>; Eg.: rollback to s1;
COMMIT: Commit; Commit complete.
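SAVEPOINT, ROLLBACK TO and COMMIT can be tried out with sqlite3 as well, since SQLite supports the same transaction-control statements; the savepoint and table names are illustrative:

```python
# Work done after a savepoint is undone by ROLLBACK TO;
# work done before it survives and is made durable by COMMIT.
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual txn control
cur = conn.cursor()
cur.execute("CREATE TABLE stud (reg INTEGER, name TEXT)")
cur.execute("BEGIN")
cur.execute("INSERT INTO stud VALUES (1, 'Mathi')")
cur.execute("SAVEPOINT s1")
cur.execute("INSERT INTO stud VALUES (2, 'Arjun')")
cur.execute("ROLLBACK TO s1")  # undoes only the insert after the savepoint
cur.execute("COMMIT")
rows = cur.execute("SELECT * FROM stud").fetchall()
print(rows)  # [(1, 'Mathi')]
```

Only the row inserted before SAVEPOINT s1 is committed.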
GROUP FUNCTIONS:
A group function returns a result based on group of rows.
1. avg - Example: select avg (total) from student;
2. max - Example: select max (percentage) from student;
3. min - Example: select min (percentage) from student;
4. sum - Example: select sum(price) from product;
COUNT FUNCTION:
In order to count the number of rows, the count function is used.
1. count(*) – counts all rows, inclusive of duplicates and nulls.
Example: select count(*) from student;
2. count(col_name) – ignores null values.
Example: select count(name) from stud;
COUNT(*) COUNT(NAME)
--------- ----------
5 4
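The difference between count(*) and count(column) can be demonstrated with sqlite3; the table below is a small invented sample with one NULL name:

```python
# count(*) counts every row; count(name) skips rows whose name is NULL.
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE student (name TEXT, total INTEGER)")
cur.executemany("INSERT INTO student VALUES (?, ?)",
                [("Mathi", 90), ("Arjun", 92), (None, 75)])
counts = cur.execute("SELECT count(*), count(name) FROM student").fetchone()
print(counts)  # (3, 2): count(name) ignored the NULL name
```

The same mechanism explains the 5 vs 4 result shown above.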
Display all the details of the records whose employee name starts with ‘A’.
select * from emp where ename like 'A%';
EMPNO ENAME JOB DEPTNO SAL
---------- -------------------- ------------- ---------- ----------
2 Arjun ASP 2 15000
5 Akalya AP 1 10000
Display all the details of the records whose employee name does not start with ‘A’.
Ans:
SQL> select * from emp where ename not like 'A%';
EMPNO ENAME JOB DEPTNO SAL
---------- -------------------- ------------- ---------- ----------
1 Mathi AP 1 10000
3 Gugan ASP 1 15000
4 Karthik Prof 2 30000
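Pattern queries like the LIKE 'A%' examples above can be reproduced with sqlite3; the employee names here are a small illustrative subset:

```python
# LIKE 'A%' matches names beginning with 'A'; NOT LIKE inverts it.
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE emp (ename TEXT)")
cur.executemany("INSERT INTO emp VALUES (?)",
                [("Mathi",), ("Arjun",), ("Gugan",), ("Akalya",)])
starts_a = cur.execute(
    "SELECT ename FROM emp WHERE ename LIKE 'A%' ORDER BY ename").fetchall()
print(starts_a)  # [('Akalya',), ('Arjun',)]
```

Swapping LIKE for NOT LIKE returns the remaining names instead.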
CONCURRENCY MANAGEMENT
When more than one user utilizes a DBMS, problems can occur if the system is not designed for multiple users.
In a multiprogramming environment where more than one transactions can be concurrently executed,
there exists a need of protocols to control the concurrency of transaction to ensure atomicity and isolation
properties of transactions.
Concurrency control protocols in a DBMS are procedures used for managing multiple simultaneous operations without conflict with each other.
Concurrency control protocols that ensure serializability of transactions are the most desirable.
Concurrency control protocols can be broadly divided into two categories: lock-based protocols and timestamp-based protocols.
A lock controls access to a data item (the operations that can be performed on it).
Locks help synchronize access to the database items by concurrent transactions.
All lock requests are made to the concurrency-control manager. Transactions proceed only once
the lock request is granted.
It ensures that one transaction does not retrieve and update a record while another transaction is performing a write operation on it.
Example:
In a traffic light signal that indicates stop and go, one lane is allowed to pass at a time while the other lanes are locked; in the same way, in a database only one transaction is performed on an item at a time while other transactions are locked out.
Locks are of two kinds:
Binary Locks: a binary lock on a data item can be in either the locked or the unlocked state.
Shared/exclusive: This type of locking mechanism separates the locks based on their uses.
The lock compatibility matrix for shared/exclusive locks is given below:
          Shared   Exclusive
Shared    True     False
Exclusive False    False
Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on it.
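The compatibility matrix above can be expressed as a small lookup table; this is a sketch only, and the function and mode names ("S"/"X") are illustrative:

```python
# Lock compatibility for shared (S) / exclusive (X) locks,
# mirroring the matrix above: only S-with-S is compatible.
COMPAT = {("S", "S"): True, ("S", "X"): False,
          ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_modes):
    """A requested lock is granted only if it is compatible
    with every lock currently held on the item."""
    return all(COMPAT[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # True: shared locks coexist
print(can_grant("X", ["S"]))       # False: exclusive conflicts with shared
print(can_grant("X", []))          # True: no holders, anything is granted
```

A real concurrency-control manager queues the rejected request until the conflicting locks are released.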
Shared Lock (S): If a lock is acquired on a data item to perform a read operation, it is called a shared lock.
Exclusive Lock (X): If a lock is acquired on a data item to perform a write operation, it is called an exclusive lock.
Two-Phase Locking (2PL): In the growing phase, a transaction goes on acquiring the locks it needs. Once the transaction releases its first lock, it cannot demand any new locks; it only releases the acquired locks, and hence has a descending (shrinking) phase of locks. Thus this protocol has two phases – a growing phase of locks and a shrinking phase of locks.
For example, if we have to calculate the total marks of 3 subjects, this protocol will go on asking for the locks on subject1 marks, subject2 marks and then subject3 marks. As and when it gets a lock on the subject marks it reads them; it does not wait until all the locks are received. Then it computes the total. Once that is complete, it releases the locks on subject3 marks, subject2 marks and subject1 marks.
d) Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL. After acquiring all the locks in the first
phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL
does not release a lock after using it. Strict-2PL holds all the locks until the commit point
and releases all the locks at a time.
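The core 2PL rule (no new locks once any lock has been released) can be sketched in Python; the class and the subject names are illustrative, not part of any real lock manager:

```python
class TwoPhaseTxn:
    """Sketch of 2PL: once a lock is released, the shrinking
    phase begins and no new locks may be acquired."""
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: acquire after release")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True  # first release starts the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("subject1")
t.acquire("subject2")
t.release("subject1")          # shrinking phase begins here
try:
    t.acquire("subject3")      # forbidden under 2PL
except RuntimeError as err:
    print(err)
```

Strict 2PL, by contrast, would simply defer every `release` until the commit point.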
Timestamp-Based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions
at the time of execution, whereas timestamp-based protocols start working as soon as a
transaction is created.
Example:
Suppose there are three transactions T1, T2, and T3.
T1 has entered the system at time 0010
T2 has entered the system at 0020
T3 has entered the system at 0030
Priority will be given to transaction T1, then transaction T2 and lastly Transaction T3.
The timestamp-ordering protocol: The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. It is the responsibility of the protocol system that the conflicting pairs of tasks be executed according to the timestamp values of the transactions.
The time-stamp of transaction Ti is denoted as TS(Ti).
The read time-stamp of data-item X is denoted by R-timestamp(X).
The write time-stamp of data-item X is denoted by W-timestamp(X).
The timestamp-ordering protocol works as follows:
If a transaction Ti issues a read(X) operation:
If TS(Ti) < W-timestamp(X), the operation is rejected.
If TS(Ti) >= W-timestamp(X), the operation is executed and all data-item timestamps are updated.
If a transaction Ti issues a write(X) operation:
If TS(Ti) < R-timestamp(X), the operation is rejected.
If TS(Ti) < W-timestamp(X), the operation is rejected and Ti is rolled back.
Otherwise, the operation is executed.
Advantages:
Schedules are serializable, just like with 2PL protocols.
There is no waiting for a transaction, which eliminates the possibility of deadlocks.
Disadvantages:
Starvation is possible if the same transaction is restarted and continually aborted.
Deadlock: a situation where two or more transactions wait indefinitely for each other to give up their locks.
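The read and write checks of the timestamp-ordering protocol can be written as two small functions; the integer timestamps and function names here are illustrative:

```python
# Timestamp-ordering checks: a transaction older than the latest
# write (or read, for writes) of an item is turned away.
def check_read(ts_ti, w_ts_x):
    """read(X) by Ti: rejected if Ti is older than X's last writer."""
    return "reject" if ts_ti < w_ts_x else "execute"

def check_write(ts_ti, r_ts_x, w_ts_x):
    """write(X) by Ti: rejected if a younger transaction already
    read X; rejected with rollback if one already wrote X."""
    if ts_ti < r_ts_x:
        return "reject"
    if ts_ti < w_ts_x:
        return "reject and roll back"
    return "execute"

print(check_read(10, 20))       # older reader vs newer write -> reject
print(check_write(30, 20, 10))  # newest transaction -> execute
```

With TS(T1)=0010, TS(T2)=0020 and TS(T3)=0030 as in the example above, the checks naturally give priority to the older transactions.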
DATA WAREHOUSE
A data warehouse is separate from a DBMS; it stores a huge amount of data, which is typically collected from multiple heterogeneous sources like files, DBMSs, etc.
The goal is to produce statistical results that may help in decision making.
(OR)
A data warehouse is a centralized repository that stores data from multiple information
sources and transforms them into a common, multidimensional data model for efficient
querying and analysis.
For example, a college might want quick results on different questions, such as how the placement of CS students has improved over the last 10 years in terms of salaries, counts, etc.
Data flow
Inflow – the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
Upflow – the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
Downflow – the processes associated with archiving and backing up data in the warehouse.
Advantages:
A data warehouse reduces the cost of accessing historical data.
It allows others to access and share the data.
It improves turnaround time for analysis and reporting.
Disadvantages:
A data warehouse is not easy to maintain, and maintaining it can be costly.
A data warehouse has security issues.
Loading it is a time-consuming process.
It is not an ideal option for unstructured data.
Staging Area
Tools and Technologies
After the critical steps of extraction, cleansing and transformation, loading the results into the target system can be carried out either by separate products or by a single integrated one.
For the various types of meta-data and the day-to-day operations of the data warehouse,
the administration and management tools must be capable of supporting those tasks:
Monitoring data loading from multiple sources
Data quality and integrity checks
Managing and updating meta-data
Monitoring database performance to ensure efficient query response times and
resource utilization
Auditing data warehouse usage to provide user chargeback information
Replicating, sub-setting, and distributing data
Maintaining efficient data storage management
Purging data
Archiving and backing-up data
Implementing recovery following failure
Some of the prominent Data Warehousing tools are MarkLogic, Oracle, Amazon RedShift
DATA MART
A data mart is a simple form of a data warehouse that is focused on a single subject (or
functional area), such as sales, finance or marketing. Data marts are often built and controlled by
a single department within an organization.
Given their single-subject focus, data marts usually draw data from only a few sources.
The sources could be internal operational systems, a central data warehouse, or external data.
Types of data mart:
1. Dependent: Draw data from a central data warehouse that has already been created.
2. Independent: Independent data mart is created without the use of a central data
warehouse. Draw data directly from operational or external sources of data or both.
3. Hybrid: This type of data marts can take data from data warehouses or operational
systems.
The Extraction, Transformation and Loading (ETL) process is how you get data out of the sources and feed it into the data mart; it involves moving data from operational systems, filtering it, and loading it into the data mart.
Dependent data marts draw data from a central data warehouse that has already been
created. This gives you the usual advantages of centralization.
ETL process is somewhat simplified because formatted and summarized (clean) data has
already been loaded into the central data warehouse.
The ETL process for dependent data marts is mostly a process of identifying the right
subset of data relevant to the chosen data mart subject and moving a copy of it, perhaps in
a summarized form.
Motivation- usually built to achieve improved performance and availability, better
control, and lower telecommunication costs resulting from local access of data relevant to
a specific department.
Independent data marts are standalone systems built by drawing data directly from
operational or external sources of data, or both.
Independent data marts deal with all aspects of the ETL process, as you do with a central data warehouse. The number of sources is likely to be fewer, and the amount of data associated with the data mart is less than with the warehouse, given the focus on a single subject.
This could be desirable for smaller groups within an organization.
Motivation- The creation of independent data marts is often driven by the need to have a
solution within a shorter time.
A hybrid data mart allows you to combine input from sources other than a data
warehouse.
This could be useful for many situations, especially when you need ad hoc integration,
such as after a new group or product is added to the organization.
Hybrid data marts simply combine the issues of dependent and independent data marts.
It is best suited for multiple-database environments and fast implementation turnaround for any organization, and it requires the least data cleansing effort. A hybrid data mart also supports large storage structures, and it is flexible and best suited for smaller data-centric applications.
STEPS IN IMPLEMENTING A DATA MART
The major steps in implementing a data mart are:
Designing - design the schema
Constructing - construct the physical storage
Populating - populate the data mart with data from source systems
Accessing - access it to make informed decisions
Managing - manage it over time
1. Designing
The design step is first in the data mart process. This step covers all of the tasks from
initiating the request for a data mart through gathering information about the requirements, and
developing the logical and physical design of the data mart. The design step involves the
following tasks:
Gathering the business and technical requirements
Identifying data sources
Selecting the appropriate subset of data
Designing the logical and physical structure of the data mart
Data mart issues: capability, and performance (which decreases as the mart grows).
2. Constructing
This step includes creating the physical database and the logical structures associated with
the data mart to provide fast and efficient access to the data. This step involves the following
tasks:
Creating the physical database and storage structures, such as table spaces, associated
with the data mart
Creating the schema objects, such as tables and indexes defined in the design step
Determining how best to set up the tables and the access structures
3. Populating
The populating step covers all of the tasks related to getting the data from the source,
cleaning it up, modifying it to the right format and level of detail, and moving it into the data
mart. More formally stated, the populating step involves the following tasks:
Mapping data sources to target data structures
Extracting data
Cleansing and transforming the data
Loading data into the data mart
Creating and storing metadata
4. Accessing
The accessing step involves putting the data to use: querying the data, analyzing it, creating
reports, charts, and graphs, and publishing these. Typically, the end user uses a graphical front-
end tool to submit queries to the database and display the results of the queries. The accessing
step requires that you perform the following tasks:
Set up an intermediate layer for the front-end tool to use. This layer, the Meta layer,
translates database structures and object names into business terms, so that the end user
can interact with the data mart using terms that relate to the business function.
Maintain and manage these business interfaces.
Set up and manage database structures, like summarized tables that help queries
submitted through the front-end tool execute quickly and efficiently.
5. Managing
This step involves managing the data mart over its lifetime. In this step, you perform
management tasks such as the following:
Providing secure access to the data
Managing the growth of the data
Optimizing the system for better performance
Ensuring the availability of data even with system failures
Data Warehouse: a central repository for all of an organization's data. Data Mart: related only to a specific group of users within the organization.
Data Warehouse: takes data from many data sources. Data Mart: takes data from a few data sources.
Data Warehouse: size required can be 100 GB – TB and above. Data Mart: size required can be less than 100 GB.
Data Warehouse: implementation time can be from months to years. Data Mart: implementation time can be in months.