
UNIT III - DATABASE MANAGEMENT SYSTEMS

DBMS – HDBMS, NDBMS, RDBMS, OODBMS, Query Processing, SQL, Concurrency


Management, Data warehousing and Data Mart

DBMS- Data Base Management System

 A database is a collection of data, facts and figures that can be processed to produce
information relevant to an enterprise.
Eg.: A student's name, age, class and subjects can be counted as data for recording purposes.
For example, if we have data about the marks obtained by all students, we can then draw
conclusions about toppers, average marks, etc.

A DBMS is a collection of interrelated data and a set of programs to access those data. (OR)
A software system that is used to manage databases is called a DBMS.
 The primary goal of a database management system is to store data in a way that makes it
easy to retrieve and manipulate, and that helps produce information that is both convenient
and efficient.
 DBMS examples include: MySQL, SQL Server, Oracle, dBASE, FoxPro
Database systems – a database and the DBMS together are collectively known as a database system.

Database system Applications:

 Banking: all transactions, loans, customer information, accounts
 Airlines: reservations, schedules
 Universities: registrations, grades, student information
 Sales: customer, product and purchase information
 Online retailers: order tracking, customized recommendations
 Manufacturing: production, inventory, orders, supply chain
 Human resources: employee records, salaries, tax deductions
 Credit card transactions: purchases on credit cards, generation of monthly statements
 Telecommunications: call records, generating monthly bills
 Finance: holdings, sales and purchases of financial instruments such as stocks and bonds

Characteristics of DBMS:

Traditionally, data was organized in file formats. The DBMS was a new concept designed to
overcome the deficiencies of the traditional style. A modern DBMS has the following
characteristics (or) features:
 Real-world entity
 Relation-based tables
 Isolation of data and application
 Less redundancy
 Consistency
 Query Language
 ACID (Atomicity, Consistency, Isolation and Durability) properties: in a multi-
transactional environment and in case of failure.
 Multiuser and Concurrent Access
 Multiple views
 Security: since a DBMS is not saved on disk like a traditional file system, it is very hard for
a thief to break the code.
 A DBMS also stores metadata, which is data about data, to ease its own processes.

Objectives:
1.Allow multiple users to be active at one time.
2.Provide data integrity.
3.Protect the data from physical harm and unauthorized access.
4.Provide security with a user access privilege.

Types of DBMS: (Refer data models)
 Hierarchical DBMS
 Network DBMS
 Relational DBMS
 Object-Oriented DBMS

Advantages:
o Controls database redundancy
o Data sharing
o Easy maintenance
o Reduced time
o Backup
o Multiple user interfaces
o Data security

Disadvantages:
o Cost of hardware and software is high
o Large size and takes time to set up
o Complexity
o Higher impact of failure
Users:
Administrators: They create access profiles for users and apply limitations to
maintain isolation and enforce security. They maintain DBMS resources.

Designers: They identify and design the whole set of entities, relations,
constraints, and views.

End Users: They can be simple viewers who pay attention to logs or market
rates, or end users as sophisticated as a business analyst who makes the
most of the system.

DBMS - Architecture

DBMS architecture depends upon how users are connected to the database to get their
requests done. The design of a Database Management System depends highly on its
architecture, which can be centralized, decentralized or hierarchical.

Types:
In 1-tier architecture, the DBMS is the only entity: the user sits directly on the DBMS and
uses it. Any changes done here are made directly on the DBMS itself. It does not provide handy
tools for end users; preferably, database designers and programmers use single-tier architecture.

If the architecture of a DBMS is 2-tier, then it must have some application which uses the
DBMS. Programmers use 2-tier architecture, where they access the DBMS by means of an
application. Here the application tier is entirely independent of the database in terms of
operation, design and programming.

3-tier architecture

The most widely used architecture is 3-tier architecture, which separates its tiers
from each other on the basis of users. It is described as follows:

Database (Data) Tier: At this tier, the database resides along with its query processing
languages. We also have the relations that define the data and their constraints at this level.
Application (Middle) Tier: At this tier reside the application server and the programs
that access the database. For a user, this application tier presents an abstracted view of
the database. End-users are unaware of any existence of the database beyond the application.
At the other end, the database tier is not aware of any other user beyond the application
tier. Hence, the application layer sits in the middle and acts as a mediator between the end-
user and the database.

User (Presentation) Tier: End-users operate on this tier and they know nothing about any
existence of the database beyond this layer. At this layer, multiple views of the database
can be provided by the application. All views are generated by applications that reside in the
application tier.

Multiple tier database architecture is highly modifiable as almost all its components are
independent and can be changed independently.

(Figures: 2-tier and 3-tier architecture diagrams.)
DBMS - Data Models
Data models define how the logical structure of a database is modeled. Data Models
are fundamental entities to introduce abstraction in a DBMS. Data models define how data is
connected to each other and how they are processed and stored inside the system.

Types of data models:

Categories of Data models:

Record Based Models:
 Relational Model
 Network Model
 Hierarchical Model

Object Based Models:
 Entity-Relationship Model
 Object-Oriented Model

RECORD BASED MODELS:

Describe data at the conceptual and view levels. These models specify logical structure of
database with records, fields and attributes.

 RDBMS-Relational Database Management System

The most popular data model in DBMS is the relational model; it is a more scientific model
than the others. In this model, the data is maintained in the form of two-dimensional tables,
with all information stored as rows and columns. The basic structure of the relational model
is the table, so tables are also called relations in the relational model. Example: here we
have a Student table.
Features of the Relational Model:

 Tuples: Each row in the table is called a tuple.
 Attribute or field: Attributes are the properties which define the table or relation.

Properties of an RDBMS:

 Data is stored in tables called relations.
 Relations can be normalized (large tables split into small tables).
 Its values are atomic.
 Each row is unique.
 Column values are of the same kind.
 Columns are undistinguished.
 The sequence of rows is insignificant.
 Each column has a common name.

The most popular RDBMSs are Oracle, SQL Server, DB2, Sybase, etc.
Most RDBMSs use the Structured Query Language (SQL) to access data from the database.

Steps to create an RDBMS:

1. Step 1: Define the purpose of the database (requirement analysis).
2. Step 2: Gather data, organize it in tables and specify the primary keys.
3. Step 3: Create relationships among tables.
4. Step 4: Refine and normalize the design.
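As a sketch, the four steps can be walked through with Python's built-in sqlite3 module (the dept/stud schema below is made up for illustration, not taken from the notes):

```python
import sqlite3

# Steps 1-2: purpose decided, data organized into tables with primary keys.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("""CREATE TABLE dept (
    deptno INTEGER PRIMARY KEY,
    dname  TEXT NOT NULL)""")
# Step 3: a relationship, via a foreign key from stud to dept.
con.execute("""CREATE TABLE stud (
    rollno INTEGER PRIMARY KEY,
    sname  TEXT NOT NULL,
    deptno INTEGER REFERENCES dept(deptno))""")
con.execute("INSERT INTO dept VALUES (1, 'CS')")
con.execute("INSERT INTO stud VALUES (10, 'Mathi', 1)")
# Step 4: the design is normalized -- each department name is stored
# once in dept rather than repeated per student.
rows = con.execute("""SELECT s.sname, d.dname
                      FROM stud s JOIN dept d ON s.deptno = d.deptno""").fetchall()
```

The join recombines the two normalized tables, returning each student with the department name stored only once.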


 HDBMS-Hierarchical Database Management System
 The data is organized like a tree structure.
 Represents the data using a parent-child relationship.
 Follows one-to-many relationships (each child record has only one parent, whereas each
parent record can have one or more child records).

 NDBMS-Network Database Management System


 The network model is the same as the hierarchical model except that it has a graph-like
structure rather than a tree-based structure.
 Unlike the hierarchical model, this model allows each record to have more than one
parent record.
 It supports many-to-many relationships.
 Most of the databases use SQL for manipulation of their data.
OBJECT BASED MODELS

Describe data at the conceptual( logical level - design of the database) and view levels.

 OODBMS – Object oriented Database Management System


 Data is stored in the form of objects, which are instances of classes.
 Each object contains two elements:
1. A piece of data (e.g., sound, video, text, or graphics).
2. Instructions, or software programs called methods, for what to do with the data.
 These classes and objects together make an object-oriented data model.
 An extension of the E-R model.
 Also called an object database management system (ODMS).


 Entity-Relationship Model
 The Entity-Relationship model is based on the notion of real-world entities and the
relationships among them.
 The ER model is best used for the conceptual design of a database.
 The ER model is based on (1) entities and their attributes, and (2) relationships among
entities. These concepts are explained below.

 Entity − a real-world entity having properties called attributes. Every attribute is defined
by its set of values, called its domain.
 Relationship − The logical association among entities is called a relationship.
Relationships are mapped with entities in various ways.
 Mapping cardinalities – the number of associations between two entities.
Cardinality defines the number of entities in one entity set which can be associated with the
number of entities of the other set via the relationship set.

Mapping cardinalities:
 One-to-one − One entity from entity set A can be associated with at most one entity of
entity set B, and vice versa.
 One-to-many − One entity from entity set A can be associated with more than one entity
of entity set B; however, an entity from entity set B can be associated with at most one
entity from A.
 Many-to-one − More than one entity from entity set A can be associated with at most one
entity of entity set B; however, an entity from entity set B can be associated with more
than one entity from entity set A.
 Many-to-many − One entity from A can be associated with more than one entity from B,
and vice versa.
QUERY PROCESSING

Query processing is the translation of high-level queries (like SQL) into low-level expressions.

 It is a step-wise process spanning the physical level of the file system, query
optimization and the actual execution of the query to get the result.
 It requires the basic concepts of relational algebra and file structure.
 It refers to the range of activities that are involved in extracting data from the database.
 It includes translation of queries in high-level database languages into expressions that can be
implemented at the physical level of the file system.
 In query processing, we will actually understand how these queries are processed and how
they are optimized.

Basic Steps In Query Processing


1. Parsing and translation
2. Optimization
3. Evaluation
1. Parsing and translation
 Parser checks syntax, verifies relations and the attributes which are used in the query.
 Translate the query into its internal form. This is then translated into relational algebra.
2. Query Optimization

 SQL is a very high-level language:
o Users specify what to search for, not how the search is actually done.
o The algorithms are chosen automatically by the DBMS.
 For a given SQL query there may be many possible execution plans.
 Among all equivalent evaluation plans, the one with the lowest cost is chosen. Cost is
estimated using statistical information from the database catalog.
 Optimization transforms the query into equivalent expressions that are more efficient to execute.
3. Evaluation
 The query-execution engine takes a query-evaluation plan, executes that plan, and returns the
answers to the query.
 A relational algebra expression may have many equivalent expressions.

 Each relational algebra operation can be evaluated using one of several different algorithms.
Correspondingly, a relational-algebra expression can be evaluated in many ways.
 Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.
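One concrete way to see an evaluation plan is SQLite's EXPLAIN QUERY PLAN statement, sketched here through Python's built-in sqlite3 module (the emp table and idx_sal index are invented for illustration):

```python
import sqlite3

# A small table plus an index gives the optimizer a real choice to make.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal REAL)")
con.execute("CREATE INDEX idx_sal ON emp(sal)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "Mathi", 10000), (2, "Arjun", 15000), (3, "Gugan", 15000)])
# EXPLAIN QUERY PLAN exposes the evaluation plan the optimizer chose
# (typically a SEARCH using idx_sal for this range predicate).
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT ename FROM emp WHERE sal > 12000").fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
```

The returned rows annotate each step of the chosen plan, which is exactly the "annotated expression specifying detailed evaluation strategy" described above.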
SQL
SQL (Structured Query Language) is a computer language for storing, manipulating and
retrieving data stored in a relational database.

SQL is the standard language for relational database systems. All relational database management
systems, such as MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server, use SQL
as the standard database language.

Why SQL? / Applications of SQL


 Allows users to access data in relational database management systems.
 Allows users to describe the data.
 Allows users to define the data in database and manipulate that data.
 Allows to embed within other languages using SQL modules, libraries & pre-compilers.
 Allows users to create and drop databases and tables.
 Allows users to create view, stored procedure, functions in a database.
 Allows users to set permissions on tables, procedures, and views
History
 1970 -- Dr. Edgar F. "Ted" Codd of IBM, known as the father of relational databases,
described a relational model for databases.
 1974 -- Structured Query Language appeared.
 1978 -- IBM worked to develop Codd's ideas and released a product named System R.
 1986 -- SQL was standardized by ANSI.
The first relational database was released by Relational Software, which later became Oracle.

SQL Process
 When you execute an SQL command in any RDBMS, the system determines the best way to
carry out your request, and the SQL engine figures out how to interpret the task.
 There are various components included in the process, such as the Query Dispatcher,
Optimization Engine, Classic Query Engine and SQL Query Engine. The classic query engine
handles all non-SQL queries, but the SQL query engine won't handle logical files.

SQL Commands
The standard SQL commands to interact with relational databases are CREATE, SELECT,
INSERT,UPDATE, DELETE and DROP. These commands can be classified into groups based on their
nature.
1. DDL - Data Definition Language
Command Description
CREATE Creates a new table, a view of a table, or other object in database
ALTER Modifies an existing database object, such as a table.
DROP Deletes an entire table, a view of a table or other object in the database.
TRUNCATE Removes all records from a table, including all space allocated for the records
RENAME Renames an object

2. DML - Data Manipulation Language


Command Description
SELECT Retrieves certain records from one or more tables
INSERT Creates a record
UPDATE Modifies records
DELETE Deletes records

3. DCL - Data Control Language
Command Description
GRANT Gives a privilege to user
REVOKE Takes back privileges granted from user

4. TCL - Transaction Control Language


Command Description
COMMIT Commits a transaction
ROLLBACK Rolls back a transaction in case an error occurs
SAVEPOINT Sets a point within a transaction to which you can later roll back

Example:
 CREATE:
Syntax: Create table tablename (column_name1 data_type constraints,
column_name2 data_type constraints, …);
Eg.: Create table stud (sname varchar2(20), rollno number(10) not null, dob date not null);
 ALTER:
Syntax: alter table tablename add/modify (attribute datatype(size));
Eg.1: Alter table stud add (phone_no char (20));
Eg.2: Alter table stud modify(phone_no number (10));
 DROP:
Syntax: drop table tablename; Eg.: drop table stud;
To drop column- Eg.: alter table emp drop column experience;

 TRUNCATE:
Syntax: Truncate table tablename; Eg.: Truncate table stud;
 DESC - This is used to view the structure of the table.
Syntax: desc tablename; Eg.: desc emp;
Name Null? Type
--------- -------- -----------
EmpNo NOT NULL number(5)
EName VarChar(15)
Job NOT NULL Char(10)
DeptNo NOT NULL number(3)
PHONE_NO number(10)
 INSERT:
 Inserting a single row into a table –
Syntax: insert into <table name> values (value list);
Example: insert into s values(‘s3’,’sup3’,’blore’,10);
1 row created.
 Inserting more than one record using a single insert command –
Syntax: insert into <table name> values (&col1, &col2, …);
Example: insert into stud values(&reg, ‘&name’, &percentage);
Enter value for reg: 1
Enter value for name: Mathi
Enter value for percentage: 90
1 row created.
SQL> /
Enter value for reg: 2
Enter value for name: Arjun
Enter value for percentage: 92
1 row created.

 SELECT:
 Selects all rows from the table
Syntax: Select * from tablename; Eg.: Select * from stud;
REG NAME PERCENTAGE
---- ------- ----------
1 Mathi 90
2 Arjun 92
 The retrieval of specific columns from a table
Syntax: Select column_name1, …, column_namen from tablename;
Eg.: Select reg, name from stud;
REG NAME
---- -------
1 Mathi
2 Arjun
 Select command with where clause
Example: Select empno, empname from emp where sal>4000;
 UPDATE COMMAND
Syntax: update tablename set field=value where condition;
Example: Update emp set sal = 10000 where empno=135;
 DELETE COMMAND
Syntax: Delete from table where conditions; Eg.: delete from emp where empno=135;
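The DML commands above can be exercised end-to-end with Python's built-in sqlite3 module (SQLite accepts the same basic INSERT/UPDATE/DELETE syntax; the emp rows below are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal REAL)")
# INSERT creates records.
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(135, "Gugan", 4000), (136, "Karthik", 30000)])
# UPDATE modifies the records matching the WHERE condition.
con.execute("UPDATE emp SET sal = 10000 WHERE empno = 135")
new_sal = con.execute("SELECT sal FROM emp WHERE empno = 135").fetchone()[0]
# DELETE removes the matching records.
con.execute("DELETE FROM emp WHERE empno = 135")
remaining = con.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
```

After the UPDATE the employee's salary reads 10000, and after the DELETE only one row remains.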
 GRANT Eg.: Grant all on employees to departments;
Grant succeeded.
Grant some options: Eg.: Grant select, update , insert on employees to departments with grant option;
Grant succeeded.
 REVOKE Eg.: Revoke all on employees from departments; Revoke succeeded.
Revoke some options: Revoke select, update , insert on employees from departments;
Revoke succeeded.
 SAVEPOINT: Syntax : SAVEPOINT <SAVE POINT NAME>; Eg.: SAVEPOINT S1;
 ROLLBACK: Syntax: ROLL BACK <SAVE POINT NAME>; Eg.: rollback s1;
 COMMIT: Commit; Commit complete.
GROUP FUNCTIONS:
A group function returns a result based on group of rows.
1. avg - Example: select avg (total) from student;
2. max - Example: select max (percentage) from student;
3. min - Example: select min (percentage) from student;
4. sum - Example: select sum(price) from product;
COUNT FUNCTION:
In order to count the number of rows, the count function is used.
1. count(*) – Counts all rows, inclusive of duplicates and nulls.
Example: select count(*) from student;
2. count(col_name) – Ignores null values.
Example: select count(name) from stud;
COUNT(*) COUNT(NAME)
-------- -----------
5 4
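A minimal sketch of the count(*) versus count(col_name) distinction, using Python's sqlite3 module with one NULL name inserted deliberately (the rows are invented to reproduce the 5-vs-4 output shown above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (name TEXT, percentage REAL)")
# Five rows, one of them with a NULL name.
con.executemany("INSERT INTO student VALUES (?, ?)",
                [("Mathi", 90), ("Arjun", 92), ("Gugan", 85),
                 ("Akalya", 88), (None, 70)])
# count(*) counts every row; count(name) skips the NULL.
count_all = con.execute("SELECT COUNT(*) FROM student").fetchone()[0]
count_name = con.execute("SELECT COUNT(name) FROM student").fetchone()[0]
# A group function such as AVG returns one result for the whole group.
avg_pct = con.execute("SELECT AVG(percentage) FROM student").fetchone()[0]
```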

Display all the details of the records whose employee name starts with ‘A’.
select * from emp where ename like 'A%';
EMPNO ENAME JOB DEPTNO SAL
---------- -------------------- ------------- ---------- ----------
2 Arjun ASP 2 15000
5 Akalya AP 1 10000

Display all the details of the records whose employee name does not start with ‘A’.
Ans:
SQL> select * from emp where ename not like 'A%';
EMPNO ENAME JOB DEPTNO SAL
---------- -------------------- ------------- ---------- ----------
1 Mathi AP 1 10000
3 Gugan ASP 1 15000
4 Karthik Prof 2 30000
CONCURRENCY MANAGEMENT

Concurrency (parallelism) – a property of systems in which several computations are executing
simultaneously.

When more than one user utilizes a DBMS, problems can occur if the system is not designed for
multiple users.


In a multiprogramming environment where more than one transactions can be concurrently executed,
there exists a need of protocols to control the concurrency of transaction to ensure atomicity and isolation
properties of transactions.

Concurrency control protocols in DBMS are procedures that are used for managing multiple
simultaneous operations without conflict with each other.

Concurrency control protocols that ensure serializability of transactions without sacrificing
speed are the most desirable. They can be broadly divided into two categories:

1. Lock based protocols
2. Time stamp based protocols

Reasons for concurrency management:
 To improve throughput and resource utilization
 To reduce waiting time
 To resolve read-write and write-write conflict issues

Methods to avoid concurrency problems:
1. Locking file – lock the file being used
2. Locking record – lock the record being used
3. Locking data field – lock the data element/field
4. Versioning – view the record and make updates

1.Lock based protocols

 A lock controls access to a data item (the operations that can be performed on it).
 Locks help synchronize access to database items by concurrent transactions.
 All lock requests are made to the concurrency-control manager. Transactions proceed only once
the lock request is granted.
 It ensures that one transaction does not retrieve and update a record while another transaction
is performing a write operation on it.

Example:
A traffic light indicates stop and go: one lane is allowed to pass at a time while the other
signals are locked. In the same way, in a database only one transaction operates on an item at
a time, while other transactions are locked out.
Locks are of two kinds:

 Binary locks: A binary lock on a data item can be in either a locked or an unlocked state.
 Shared/exclusive: This type of locking mechanism separates the locks based on their uses.

The lock compatibility matrix for shared/exclusive locks is given below. Any number of
transactions can hold shared locks on an item, but if any transaction holds an exclusive
lock on the item, no other transaction may hold any lock on it.

          Shared  Exclusive
Shared    True    False
Exclusive False   False

i) Exclusive Lock (X):

 If a lock is acquired on a data item to perform a write operation, it is called an exclusive lock.

ii) Shared Lock (S):

 A shared lock is also called a read-only lock.

 With a shared lock, the data item can be shared between transactions, because no holder
has permission to update the data item.
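A toy sketch of these compatibility rules (not a real DBMS lock manager): any number of shared holders are allowed, but an exclusive request is denied whenever any lock is already held, and a shared request is denied while an exclusive lock is held:

```python
class SharedExclusiveLock:
    """Toy lock-table entry following the compatibility matrix:
    any number of shared holders, at most one exclusive holder."""
    def __init__(self):
        self.shared_holders = set()
        self.exclusive_holder = None

    def acquire_shared(self, txn):
        # S is compatible with S, but not with X.
        if self.exclusive_holder is not None:
            return False
        self.shared_holders.add(txn)
        return True

    def acquire_exclusive(self, txn):
        # X is compatible with nothing.
        if self.exclusive_holder is not None or self.shared_holders:
            return False
        self.exclusive_holder = txn
        return True

lock = SharedExclusiveLock()
r1 = lock.acquire_shared("T1")     # granted: first reader
r2 = lock.acquire_shared("T2")     # granted: S + S is allowed
r3 = lock.acquire_exclusive("T3")  # denied: readers hold the item
```

In a real DBMS the denied transaction would wait in a queue at the concurrency-control manager rather than receive False.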

TYPES OF LOCK PROTOCOLS:

a) Simplistic Lock Protocol


Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write'
operation is performed. Transactions may unlock the data item after completing the ‘write’ operation.

b) Pre-claiming Lock Protocol


 Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks.
 Before initiating an execution, the transaction requests the system for all the locks it needs
beforehand.
 If all the locks are granted, the transaction executes and releases all the locks when all its
operations are over.
 If all the locks are not granted, the transaction rolls back and waits until all the locks are
granted.
For example, if we have to calculate total marks of 3
subjects, then this protocol will evaluate the transaction
and list the locks on subject1 marks, subject2 marks and
then subject3 marks. Once it gets all the locks, it will start
the transaction.
c) Two-Phase Locking (2PL)
 In this type of protocol, as the transaction begins to execute, it starts requesting the locks
that it needs.
 It goes on requesting locks as and when they are needed. Hence it has a growing phase of
locks.
 At one stage it will have all the locks.
 Once the transaction is complete and releases its first lock, it goes on releasing the locks.
 The transaction cannot demand any new locks; it only releases the acquired locks.
 Hence it has a descending (shrinking) phase of locks.
 Thus this protocol has two phases – a growing phase of locks and a shrinking phase of locks.
Advantage: transactions are serialized. Disadvantage: deadlock can arise.
For example, if we have to calculate the total marks of 3 subjects, then this
protocol will go on asking for the locks on subject1 marks, subject2
marks and then subject3 marks. As and when it gets the lock on each
subject's marks it reads the marks; it does not wait until all the locks are
received. Then it computes the total. Once it is complete, it
releases the locks on subject3 marks, subject2 marks and subject1 marks.
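The growing and shrinking phases can be sketched as a small Python class that raises an error if a transaction tries to acquire a lock after its first release (a simplified model, not a real lock manager):

```python
class TwoPhaseTxn:
    """Toy transaction enforcing 2PL: all acquires (growing phase)
    must precede all releases (shrinking phase)."""
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True   # the first release ends the growing phase
        self.held.discard(item)

t = TwoPhaseTxn()
for subject in ("subject1", "subject2", "subject3"):
    t.lock(subject)             # growing phase
total_locks = len(t.held)       # at one stage it holds all the locks
t.unlock("subject3")            # shrinking phase begins
try:
    t.lock("subject4")          # not allowed: no new locks after a release
    violated = False
except RuntimeError:
    violated = True
```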
d) Strict Two-Phase Locking
 The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the
first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL
does not release a lock right after using it: it holds all the locks until the commit point
and releases them all at once.

In the example of calculating total marks of 3 subjects,


locks are achieved at growing phase of the transaction
and once it receives all the locks, it executes the
transaction. Once the transaction is fully complete, it
releases all the locks together.

2. Time stamp based protocols:

 The most commonly used concurrency protocol is the timestamp based protocol.
 This protocol uses either system time or logical counter as a timestamp.
 Lock-based protocols manage the order between the conflicting pairs among transactions
at the time of execution, whereas timestamp-based protocols start working as soon as a
transaction is created.
 Example:
Suppose there are three transactions T1, T2, and T3.
T1 has entered the system at time 0010
T2 has entered the system at 0020
T3 has entered the system at 0030
Priority will be given to transaction T1, then transaction T2 and lastly Transaction T3.
The timestamp-ordering protocol: The timestamp-ordering protocol ensures serializability among
transactions in their conflicting read and write operations. It is the responsibility of the
protocol system that each conflicting pair of tasks be executed according to the timestamp
values of the transactions.
 The timestamp of transaction Ti is denoted as TS(Ti).
 The read timestamp of data item X is denoted by R-timestamp(X).
 The write timestamp of data item X is denoted by W-timestamp(X).

The timestamp-ordering protocol works as follows:
 If a transaction Ti issues a read(X) operation:
  If TS(Ti) < W-timestamp(X): the operation is rejected.
  If TS(Ti) >= W-timestamp(X): the operation is executed, and the data item's
timestamps are updated.
 If a transaction Ti issues a write(X) operation:
  If TS(Ti) < R-timestamp(X): the operation is rejected.
  If TS(Ti) < W-timestamp(X): the operation is rejected and Ti is rolled back.
  Otherwise, the operation is executed.

Advantages:
 Schedules are serializable, just as with 2PL protocols.
 No waiting for the transaction, which eliminates the possibility of deadlocks!

Disadvantages:
 Starvation is possible if the same transaction is restarted and continually aborted.

Deadlock: a situation where two or more transactions are waiting indefinitely for each other
to give up their locks.
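The read/write rules above can be simulated in a few lines of Python (a toy model: timestamps are plain integers, and recording a rejection stands in for rolling the transaction back):

```python
def read(item, ts, rejected):
    """Timestamp-ordering check for read(X) by a transaction with TS = ts."""
    if ts < item["W"]:                   # a younger transaction already wrote X
        rejected.append(("read", ts))
    else:
        item["R"] = max(item["R"], ts)   # update R-timestamp(X)

def write(item, ts, rejected):
    """Timestamp-ordering check for write(X)."""
    if ts < item["R"] or ts < item["W"]:
        rejected.append(("write", ts))   # Ti would be rolled back
    else:
        item["W"] = ts                   # update W-timestamp(X)

X = {"R": 0, "W": 0}
rejected = []
read(X, 10, rejected)    # T1 (TS=10) reads X: ok, R-timestamp becomes 10
write(X, 20, rejected)   # T2 (TS=20) writes X: ok, W-timestamp becomes 20
read(X, 10, rejected)    # T1 reads again: rejected, since 10 < W-timestamp = 20
```

Note there is no waiting anywhere in this scheme, which is why deadlock cannot occur, while the repeated rejection of T1 illustrates how starvation can.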

Thomas Write Rule:

If TS(Ti) < W-timestamp(X), then the operation is rejected and Ti is rolled back. Under the
Thomas write rule, instead of rolling Ti back, the 'write' operation itself is ignored.
DATA WAREHOUSE

A data warehouse supports both data management and data analysis.

 A data warehouse is separate from a DBMS; it stores a huge amount of data, which is
typically collected from multiple heterogeneous sources like files, DBMSs, etc.
 The goal is to produce statistical results that may help in decision making.
(OR)
 A data warehouse is a centralized repository that stores data from multiple information
sources and transforms them into a common, multidimensional data model for efficient
querying and analysis.

For example, a college might want to see quick results, such as how the placement of
CS students has improved over the last 10 years, in terms of salaries, counts, etc.

FEATURES OF DATA WAREHOUSE:


 Subject Oriented - Data warehouse provides the information about major subject areas of the
organization.
 Integrated - It is constructed by integrating data and enhances the effective analysis of data.
 Non-Volatile - Data warehouse does not erase the previous data when new data is added to it.
 Time Variant - It provides the information in a historical view.

According to Bill Inmon (1990):


“A data warehouse is a subject oriented, integrated, time-variant and non-volatile collection of data. This data
helps analysts to make informed decisions in an organization.”

DATA WAREHOUSE ARCHITECTURE:


Main components
 Operational data sources: data for the DW is supplied from mainframe operational data held
in first-generation hierarchical and network databases, departmental data held in
proprietary file systems, private data held on workstations and private servers, and
external systems such as the Internet, commercially available databases, or databases
associated with an organization's suppliers or customers.
 Operational data store (ODS): a repository of current and integrated operational data
used for analysis. It simply acts as a staging area.
 Load manager: also called the front-end component, it performs all the operations
associated with the extraction and loading of data into the warehouse, followed by
transformations to prepare the data for entry into the data warehouse.
 Warehouse manager: responsible for the warehouse management process. Its operations
include backing up the data in the data warehouse and archiving data that has reached
the end of its capture life.
 Query manager: also called the back-end component, it is responsible for directing
queries to the suitable tables and scheduling the execution of queries.
 End-user access tools: these are categorized into five main groups: 1. data reporting
and query tools, 2. application development tools, 3. EIS (Executive Information System)
tools, 4. OLAP (On-Line Analytical Processing) tools, and 5. data mining tools.

Data flow

 Inflow – the processes associated with the extraction, cleansing, and loading of data
from the source systems into the data warehouse.
 Upflow – the processes associated with adding value to the data in the warehouse through
summarizing, packaging, and distribution of the data.
 Downflow – the processes associated with archiving and backing up data in the
warehouse.

Advantages of Data Warehouse:
 Reduces the cost of accessing historical data.
 Allows others to access and share the data.
 Improves turnaround time for analysis and reporting.

Disadvantages of Data Warehouse:
 Not easy to maintain; maintenance can be costly.
 Has security issues.
 Building it is a time-consuming process.
 Not an ideal option for unstructured data.

Tools and Technologies

 The critical steps in the construction of a data warehouse are extraction, cleansing
and transformation.
 After the critical steps, loading the results into the target system can be carried out
either by separate products or by a single product in one of these categories: code
generators, database data replication tools, and dynamic transformation engines.

 For the various types of meta-data and the day-to-day operations of the data warehouse,
the administration and management tools must be capable of supporting those tasks:
 Monitoring data loading from multiple sources
 Data quality and integrity checks
 Managing and updating meta-data
 Monitoring database performance to ensure efficient query response times and
resource utilization
 Auditing data warehouse usage to provide user chargeback information
 Replicating, sub-setting, and distributing data
 Maintaining efficient data storage management
 Purging data;
 Archiving and backing-up data
 Implementing recovery following failure
Some of the prominent Data Warehousing tools are MarkLogic, Oracle, Amazon RedShift

DATA MART

A data mart is a simple form of a data warehouse that is focused on a single subject (or
functional area), such as sales, finance or marketing. Data marts are often built and controlled by
a single department within an organization.

Given their single-subject focus, data marts usually draw data from only a few sources.
The sources could be internal operational systems, a central data warehouse, or external data.

(Figure: data warehouse architecture with a staging area and data marts.)

TYPES OF DATA MART - The categorization is based primarily on the data source that feeds
the data mart. The three main types of data marts are:

1. Dependent: Draw data from a central data warehouse that has already been created.
2. Independent: Independent data mart is created without the use of a central data
warehouse. Draw data directly from operational or external sources of data or both.
3. Hybrid: This type of data marts can take data from data warehouses or operational
systems.

The Extraction, Transformation, and Loading (ETL) process is how you get data out of the
sources and feed it into the data mart; it involves moving data from operational systems,
filtering it, and loading it into the data mart.

1. Dependent Data Marts:

 Dependent data marts draw data from a central data warehouse that has already been
created. This gives you the usual advantages of centralization.
 ETL process is somewhat simplified because formatted and summarized (clean) data has
already been loaded into the central data warehouse.
 The ETL process for dependent data marts is mostly a process of identifying the right
subset of data relevant to the chosen data mart subject and moving a copy of it, perhaps in
a summarized form.
 Motivation- usually built to achieve improved performance and availability, better
control, and lower telecommunication costs resulting from local access of data relevant to
a specific department.
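A minimal sketch of a dependent mart's ETL, assuming a hypothetical fact_sales table already loaded and cleansed in the central warehouse, with "sales in the north region" as the mart's chosen subject:

```python
# Sketch: a dependent data mart as a summarised subset of a central
# warehouse. The schema and the "north region sales" subject are
# assumptions for the example; SQLite stands in for both systems.
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("""CREATE TABLE fact_sales
                     (region TEXT, product TEXT, amount REAL)""")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                      [("north", "widget", 100.0),
                       ("north", "gadget", 50.0),
                       ("south", "widget", 75.0)])

# The mart's ETL only selects and summarises data already cleansed in
# the warehouse -- no cleansing of raw operational data is needed.
mart_rows = warehouse.execute(
    """SELECT product, SUM(amount) FROM fact_sales
       WHERE region = 'north' GROUP BY product ORDER BY product"""
).fetchall()
print(mart_rows)   # [('gadget', 50.0), ('widget', 100.0)]
```

The copy in the mart is summarized to product level, reflecting the "perhaps in a summarized form" point above.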

[Figure: Dependent, Independent, and Hybrid Data Marts]


2. Independent Data Marts:

 Independent data marts are standalone systems built by drawing data directly from
operational or external sources of data, or both.
 An independent data mart deals with all aspects of the ETL process, just as a central
data warehouse does. However, the number of sources is likely to be fewer, and the amount
of data associated with the data mart is less than in the warehouse, given the focus on a
single subject.
 This could be desirable for smaller groups within an organization.
 Motivation- The creation of independent data marts is often driven by the need to have a
solution within a shorter time.

3. Hybrid Data Marts:

 A hybrid data mart allows you to combine input from sources other than a data
warehouse.
 This could be useful for many situations, especially when you need ad hoc integration,
such as after a new group or product is added to the organization.
 Hybrid data marts simply combine the issues of dependent and independent data marts.
 They are best suited for multiple-database environments and fast implementation
turnaround, and they require the least data cleansing effort.
 Hybrid data marts also support large storage structures and are well suited to flexible,
smaller data-centric applications.
STEPS IN IMPLEMENTING A DATA MART
The major steps in implementing a data mart are:
 Designing - design the schema
 Constructing - construct the physical storage
 Populating - populate the data mart with data from source systems
 Accessing - access it to make informed decisions
 Managing - manage it over time

1. Designing
The design step is first in the data mart process. This step covers all of the tasks from
initiating the request for a data mart through gathering information about the requirements, and
developing the logical and physical design of the data mart. The design step involves the
following tasks:
 Gathering the business and technical requirements
 Identifying data sources
 Selecting the appropriate subset of data
 Designing the logical and physical structure of the data mart
2. Constructing
This step includes creating the physical database and the logical structures associated with
the data mart to provide fast and efficient access to the data. This step involves the following
tasks:
 Creating the physical database and storage structures, such as table spaces, associated
with the data mart
 Creating the schema objects, such as tables and indexes defined in the design step
 Determining how best to set up the tables and the access structures
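The constructing tasks can be sketched with SQLite as a stand-in database. The table names, columns, and index are assumptions for the example; a production mart would use the target database's own DDL and storage clauses (table spaces, etc.).

```python
# Sketch of the constructing step: creating physical tables and an
# index for the mart. Names and columns are assumptions.
import sqlite3

mart = sqlite3.connect(":memory:")
mart.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product,
                              sale_date TEXT, amount REAL);
    -- an access structure (index) on the join/filter column
    -- to keep query response times efficient
    CREATE INDEX idx_sales_product ON fact_sales(product_id);
""")
objects = [r[0] for r in mart.execute(
    "SELECT name FROM sqlite_master WHERE type IN ('table','index') ORDER BY name")]
print(objects)   # ['dim_product', 'fact_sales', 'idx_sales_product']
```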
3. Populating
The populating step covers all of the tasks related to getting the data from the source,
cleaning it up, modifying it to the right format and level of detail, and moving it into the data
mart. More formally stated, the populating step involves the following tasks:
 Mapping data sources to target data structures
 Extracting data
 Cleansing and transforming the data
 Loading data into the data mart
 Creating and storing metadata
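The first populating task, mapping data sources to target data structures, might look like this in outline. The source field names, target columns, and conversions are assumptions for illustration.

```python
# Sketch of the populating step: a source-to-target mapping applied
# to extracted records. Field names and conversions are assumptions.
source_rows = [{"cust_nm": "Alice", "tot_amt": "15.0"},
               {"cust_nm": "Bob",   "tot_amt": "20.0"}]

# Mapping of source field -> (target column, conversion function);
# this mapping itself is a piece of metadata worth storing.
mapping = {"cust_nm": ("customer", str),
           "tot_amt": ("amount",   float)}

def to_target(row):
    """Apply the source-to-target mapping to one extracted record."""
    return {col: conv(row[src]) for src, (col, conv) in mapping.items()}

loaded = [to_target(r) for r in source_rows]
print(loaded)   # [{'customer': 'Alice', 'amount': 15.0}, {'customer': 'Bob', 'amount': 20.0}]
```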
4. Accessing
The accessing step involves putting the data to use: querying the data, analyzing it, creating
reports, charts, and graphs, and publishing these. Typically, the end user uses a graphical front-
end tool to submit queries to the database and display the results of the queries. The accessing
step requires that you perform the following tasks:
 Set up an intermediate layer for the front-end tool to use. This layer, the Meta layer,
translates database structures and object names into business terms, so that the end user
can interact with the data mart using terms that relate to the business function.
 Maintain and manage these business interfaces.
 Set up and manage database structures, like summarized tables that help queries
submitted through the front-end tool execute quickly and efficiently.
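A toy illustration of the meta layer idea, assuming hypothetical physical column names (prod_cd, amt) mapped to business terms so the end user never sees the physical schema:

```python
# Sketch of a meta layer: business terms mapped to physical columns so
# end users can query in business language. Names are assumptions.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fact_sales (prod_cd TEXT, amt REAL)")
db.executemany("INSERT INTO fact_sales VALUES (?, ?)",
               [("W1", 100.0), ("W1", 25.0), ("G2", 50.0)])

# The meta layer: business term -> physical column name
meta = {"Product": "prod_cd", "Revenue": "amt"}

def business_query(db, group_term, measure_term):
    """Translate business terms into a physical SQL query."""
    sql = (f"SELECT {meta[group_term]}, SUM({meta[measure_term]}) "
           f"FROM fact_sales GROUP BY {meta[group_term]} ORDER BY 1")
    return db.execute(sql).fetchall()

print(business_query(db, "Product", "Revenue"))   # [('G2', 50.0), ('W1', 125.0)]
```

A front-end tool would issue such translated queries, possibly against pre-built summary tables rather than the detail table shown here.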
5. Managing
This step involves managing the data mart over its lifetime. In this step, you perform
management tasks such as the following:
 Providing secure access to the data
 Managing the growth of the data
 Optimizing the system for better performance
 Ensuring the availability of data even with system failures

Data Warehouse vs. Data Mart

 A data warehouse is a central repository for all of an organization's data; a data mart
relates only to a specific group of users within the organization.
 A data warehouse holds multiple subjects; a data mart holds a single subject.
 A data warehouse takes data from many data sources; a data mart takes data from a few
data sources.
 The size of a data warehouse can range from 100 GB to terabytes and above; a data mart
usually requires less than 100 GB.
 Implementation time for a data warehouse can run from months to years; a data mart can
be implemented in months.
 A data warehouse holds detailed information; a data mart holds summarized data.
