Advanced Concepts of DBMS
Presented by Suvendu Chattaraj
Contents:
• Online Analytical Processing (OLAP)
• Online Transaction Processing (OLTP)
• Data Warehousing and Data Mining
  • Concept
  • Features
  • Components
  • Application areas
• Data Backup
  • Concept
  • Types
Online Transaction Processing (OLTP)
• Refers to a class of systems that facilitate and manage transaction-
oriented applications
• Typically used for data entry and retrieval transaction processing
  • e.g. banking, airlines, mail-order, supermarkets,
  manufacturers…
• Also refers to commercial transaction processing in
which the system must respond immediately to user requests
  • e.g. an automatic teller machine (ATM) for a bank
• An OLTP system captures and maintains transaction data in a
database.
• OLTP enables the real-time execution of large numbers of
database transactions by large numbers of people, typically over
the internet
• Emphasis is on fast processing, because OLTP databases are read,
written, and updated frequently
Concept:
• With the advent of digital customer interactions, OLTP systems
are coming into greater demand
• To manage interactions with digital customers, companies
need to store and manage huge volumes of data related to
customers as well as operations
• Managing such activities manually is infeasible because of the
risk of human error, the high cost involved, etc.
• OLTP is more reliable in this context
• It replaces paper-based processes to make businesses more
efficient
Concept (continued):
• OLTP systems allow businesses to support thousands or even
millions of transactions for many users concurrently in near real
time.
• OLTP plays a central role in delivering everyday services, from
ATM withdrawals to online purchases.
• Defining characteristics of OLTP are short response time,
high concurrency, small and simple transactions, assured
data integrity, high availability...
• OLTP systems are supported by relational databases, which are
built on the principles of ACID (atomicity, consistency, isolation,
durability)
Characteristics of OLTP Systems:
• Short response time:- A very short response time for real-time
transactions helps users of OLTP systems remain productive. If a
user had to wait 10-15 minutes at an ATM to withdraw cash, such a
system would quickly lose popularity.

• High concurrency:- Because of the large user population, short
response times, and small transactions, OLTP environments must
support a very high number of concurrent transactions. A requirement
for thousands of concurrent users is common. For example, an online
auction website can have hundreds of thousands (if not millions) of
users accessing data on its website at the same time. The OLTP system
processes each user's transactions concurrently while keeping the
data consistent.

Reference:- https://docs.oracle.com/database/121/VLDBG/GUID-0BC75680-5BD4-43A9-826F-CD8837D30EB2.htm#VLDBG1367
Characteristics of OLTP Systems (continued):
• Small and simple transactions:- OLTP workloads typically consist
of simple transactions, such as data updates, insertions, deletions, and
simple queries. Each transaction manipulates a small, highly selective
amount of data.
For example, one of many call center employees retrieves customer
details for every call and enters customer complaints while reviewing
past communications with the customer.
• High availability:- The availability requirements for OLTP systems
are often extremely high. An unavailable OLTP system can impact a
very large user population, and organizations can suffer major losses if
OLTP systems are unavailable. For example, a stock exchange system
has extremely high availability requirements during trading hours.
• Data integrity:- In OLTP systems, transactions are applied in a
specific order and users cannot change the same data simultaneously,
which ensures data integrity. If any step of a transaction fails, the
whole transaction is cancelled (rolled back) and none of its changes
are kept.
Reference:- https://docs.oracle.com/database/121/VLDBG/GUID-0BC75680-5BD4-43A9-826F-CD8837D30EB2.htm#VLDBG1367
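The data-integrity behaviour described above (a failed step cancels the whole transaction) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than a production OLTP database; the accounts table, the sample balances, and the helper names open_bank, transfer, and balances are assumptions made for this illustration only.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)",
        [(1, 100), (2, 50)],
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This step violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict[int, int]:
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Calling transfer(conn, 1, 2, 500) on the sample data fails at the second step, and the credit already applied to account 2 is undone, so both balances stay as they were before the call.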
Examples of OLTP
• Cash withdrawals from ATMs.
• The purchasing journey and transactions taking place on large
e-commerce websites.
• Taking restaurant orders via a food delivery app, even with many
customers ordering at once.
• Managing call center data to enable associates to take customers'
latest interactions into account.
• Online bookings (ticketing, reservation systems, etc.)
• Record keeping (including health records, inventory control,
production scheduling, claims processing, customer service
ticketing, and many other applications)
Online Analytical Processing (OLAP)
• Software technology for analyzing business data from different
points of view
• Organizations collect and store data from multiple data sources
(such as websites and applications), which OLAP combines and groups
into categories to facilitate strategic planning
• Example:
  • Assume a retailer stores data about the color, size, cost, and
  location of all the products it sells
  • The retailer also collects customer purchase data, such as the
  product sold and the cost of each product, in a different
  application
  • OLAP can combine these data sets and answer queries such as:
  which product color is most popular?
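The retailer example above can be sketched as a join followed by an aggregation. This is only an illustration using Python's built-in sqlite3 module, not an OLAP product. The table names, column names, and sample rows are made up, and the "different application" is simulated by a second in-memory database attached under the hypothetical name sales_app.

```python
import sqlite3


def build_sample_data() -> sqlite3.Connection:
    """Create two separate in-memory databases with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # Purchase data lives in a second database, standing in for the
    # retailer's separate sales application.
    conn.execute("ATTACH DATABASE ':memory:' AS sales_app")
    conn.execute(
        "CREATE TABLE products ("
        " product_id INTEGER PRIMARY KEY, color TEXT, size TEXT,"
        " cost REAL, location TEXT)"
    )
    conn.execute(
        "CREATE TABLE sales_app.sales (product_id INTEGER, quantity INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
        [
            (1, "red", "M", 10.0, "store A"),
            (2, "blue", "L", 12.0, "store B"),
            (3, "red", "S", 9.0, "store B"),
        ],
    )
    conn.executemany(
        "INSERT INTO sales_app.sales VALUES (?, ?)",
        [(1, 3), (2, 4), (3, 2)],
    )
    conn.commit()
    return conn


def most_popular_color(conn: sqlite3.Connection) -> tuple[str, int]:
    """Combine both data sets and return (color, units sold) for the top color."""
    return conn.execute(
        "SELECT p.color, SUM(s.quantity) AS units"
        " FROM sales_app.sales AS s"
        " JOIN products AS p ON p.product_id = s.product_id"
        " GROUP BY p.color"
        " ORDER BY units DESC, p.color"
        " LIMIT 1"
    ).fetchone()
```

With the sample rows, red products sell 3 + 2 = 5 units and blue products sell 4, so the query reports red as the most popular color.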
More examples:
• A company might compare its mobile phone sales in
September with sales in October, then compare those results
with another location, whose data may be stored in a separate
database.
• Amazon analyzes purchases by its customers to build a
personalized homepage with products that are likely to interest
each customer.
Concept:

• OLAP provides a user-friendly environment for interactive data
analysis.
• An OLAP system is market-oriented and is used for data
analysis by knowledge workers, including managers,
executives, and analysts.
• An OLAP system manages large amounts of historical data,
provides facilities for summarization and aggregation, and stores
and manages information at different levels of granularity.
• OLAP systems can organize and present data in various
formats in order to accommodate the diverse needs of
different users.
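Summarization at different levels of granularity, as mentioned above, is often called a roll-up. The following is a minimal pure-Python sketch of the idea, assuming sales are stored as (ISO date, amount) pairs; the function name roll_up and the sample figures are hypothetical.

```python
from collections import defaultdict

# Number of leading characters of an ISO date (YYYY-MM-DD) that identify
# each level of granularity.
PREFIX_LENGTH = {"year": 4, "month": 7, "day": 10}

# Made-up daily sales figures: (ISO date, amount).
SALES = [
    ("2023-09-01", 120),
    ("2023-09-15", 80),
    ("2023-10-02", 200),
    ("2024-01-10", 50),
]


def roll_up(sales, level):
    """Aggregate sale amounts at the requested granularity level."""
    prefix = PREFIX_LENGTH[level]
    totals = defaultdict(int)
    for iso_date, amount in sales:
        totals[iso_date[:prefix]] += amount
    return dict(totals)
```

Here roll_up(SALES, "month") gives one total per month (200 for both September and October 2023, and 50 for January 2024), while roll_up(SALES, "year") collapses the same data to one total per year. A real OLAP system would typically pre-compute such aggregates across several dimensions at once.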
Importance of OLAP
• The following are some benefits of OLAP:
  • Faster decision making
    • OLAP systems can pre-calculate and integrate the data collected
    by the business, which in turn helps in faster decision making
  • Non-technical user support
    • Business users and managers are usually better at business
    calculations than at operating a DBMS. An OLAP system makes
    complex analysis easier without requiring much direct work with
    the DBMS
  • Integrated data view
    • OLAP provides a unified platform for marketing, finance,
    production, and other business units. This helps managers and
    decision makers perform what-if analysis, which shows the
    impact of decisions taken by one department on other areas of
    the business.
Some applications:
• Business Reporting for sales
• Business Process Management
• Marketing analysis
• Customer and product profitability
• Supply and Demand forecasting
• Human resources analysis
• Resource analysis and capacity planning
• Variance analysis
• Claims experience analysis
OLAP vs. OLTP

Reference:- https://www.techtarget.com/searchdatamanagement/definition/OLAP
Introduction to Data Mining
• Data is a collection of raw, disconnected facts.
• Information is processed data.
• Knowledge is derived from information by applying rules to it.
• Data mining is the process of extracting hidden, valid, and
potentially useful patterns from huge data sets.
• Data mining is all about discovering unsuspected or previously
unknown relationships in the data.
• Data mining, also popularly known as Knowledge Discovery in
Databases (KDD), refers to the nontrivial extraction of implicit,
previously unknown, and potentially useful information from data
in databases
Data mining concept
• We are in the information age…
• It is believed that information leads to power and success
• A tremendous amount of information is collected by means of
computers, satellites, various types of sensors, etc., ranging from
business transactions and scientific data to satellite pictures, text
reports, military intelligence…
• However, information retrieval alone is not enough for decision making
• Data mining is adopted to meet this need and to support better
managerial decisions.
Data mining advantages
• Yields knowledge-based information that helps in making better
decisions
• Enables profitable adjustments in operation and production
• More cost-efficient than other statistical methods
• Facilitates prediction of business trends and
behaviors by discovering hidden patterns
• Compatible with new systems as well as
existing platforms.
• Makes analysis of huge amounts of data easy and
convenient
Features of data mining
1. Data Ingestion - Supporting different formats in the industry
can be a significant competitive advantage.
  • Need to understand how data is collected and stored
  • Need to understand the variety, velocity, and volume (3 Vs)
  of the data and how to handle them in the application
  • What will be the data lifecycle?
  • How is old data archived?
2. Actual Analysis
  • What variables are important?
  • What types of analysis do your customers need?
  • How flexible is the analysis?
  • Can the user change the time window or the variables being
  analyzed?
3. Visualization
  • What is the best visualization for all this data and analysis?
Components of data mining

Reference:- https://static.javatpoint.com/tutorial/data-mining/images/data-mining-architecture.png
Components of data mining (continued)
• Databases − One or a set of databases, data warehouses,
spreadsheets, or other types of data repositories on which
data cleaning and integration techniques can be
applied.
• Data warehouse server − A server that fetches the relevant
records from a data warehouse based on the user's request
• Knowledge base − A knowledge repository that helps in
discovering important patterns.
• Data mining engine − A module that performs machine
learning tasks (classification, association, cluster analysis,
etc.)
• Pattern evaluation module − Evaluates discovered patterns
against a threshold value
• User interface − Enables users to interact with the system
through a graphical user interface.
Types of Data for Mining

1. Flat files (Transaction data, time-series data,
scientific measurements, etc. can be represented in these
files.)
2. Database data (Relational databases are one of the most
commonly available and richest information repositories.)
3. Data warehouse data (A data warehouse is a repository of
information collected from multiple sources, stored under a
unified schema, and usually residing at a single site.)
4. Transactional data (A transaction typically includes a unique
transaction identity number (trans ID) and a list of the items
making up the transaction, such as the items purchased in the
transaction.)
Application Domains
• Primarily used by
organizations with intense
consumer
• Enables a retailer to use point-of-sale records of customer
purchases to develop products and promotions that help the
organization attract customers.
• Two highly successful and popular application examples
of data mining:
  • Business intelligence
  • Search engines
Applications of Data Mining in Marketing
• Market basket analysis
analyzes customer buying
habits by finding associations
between the different items
that customers place in their
“shopping baskets”
• The discovery of these
associations can help retailers
develop marketing strategies,
improve stocking, store layout
strategies, and promotions by
analyzing which items are
frequently purchased together
by customers.
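Market basket analysis as described above can be sketched by counting how often pairs of items appear together. The code below is a simplified illustration rather than a full association-rule miner such as Apriori. The sample baskets, the function names, and the min_support parameter are assumptions made for this example. Support is the fraction of baskets containing both items, and confidence is the fraction of baskets with the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets; each set is one customer transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def frequent_pairs(baskets, min_support):
    """Return {(item_a, item_b): support} for pairs meeting min_support."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)
```

With min_support set to 0.5, only the pairs (bread, butter) and (bread, milk) qualify, each appearing in 2 of the 4 baskets. Every basket with butter also contains bread, so confidence(BASKETS, "butter", "bread") is 1.0, which is the kind of association a retailer could use for store layout or promotions.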
Applications of Data Mining in Marketing (continued)
• Companies use mined data to tailor their coupons,
advertisements, and sales to consumers
• This marketing tactic is more effective and efficient, and it can
save the company money

Amazon Case Study
• Amazon uses mined data to improve customer service
• The mined data includes name, address, and basic personal
information, as well as consumer preferences and the specific
issue the consumer is trying to fix
• Synchronized data brings together everything collected about an
individual by various departments, giving the customer service
representative the information needed for an effective human
conversation
Starbucks Case Study
• Starbucks Corporation is an American multinational
chain of coffeehouses
• Uses data to determine the best locations for its
stores
• Multiple Starbucks locations are able to do well in
close proximity thanks to data mining and modeling
• Uses location-based data, street traffic analysis, and
demographic information to determine where its
locations will have the most success
Disadvantages of Data Mining
• There is a risk that organizations may sell their
customers' data to other organizations for
money. American Express has reportedly sold its
customers' credit card purchase data to other
organizations.
• Many data mining analytics tools are difficult to
operate and require advanced training.
• Different data mining tools operate in distinct
ways because of the different algorithms used in their
design. Selecting the right data mining tool is therefore
a challenging task.
• Data mining techniques are not perfectly accurate, so
they may lead to severe consequences in certain
conditions.
Data Warehouse
• A single, complete, and consistent store of data obtained from
a variety of different sources, made available to end users in a
way that they can understand and use in a business context
• A central repository of information that can be analyzed to
make more informed decisions
• Data warehouses are very large databases.
What is Data Warehousing?
• The process of transforming data into information and
making it available to users in a timely enough manner to
make a difference is known as data warehousing.
• “A data warehouse is a subject-oriented, integrated,
time-variant, and nonvolatile collection of data in
support of management's decision making process”
• Data warehouses provide online analytical processing
(OLAP) tools for the interactive analysis of
multidimensional data of varied granularities, which
facilitates effective data generalization and data mining.
Difference between data warehouse and database
Features of data warehouse
The key features of a data warehouse include:

• Subject-oriented: Provides information about a specific
subject rather than the organization's ongoing operations as a whole.
• Integrated: Built by combining data from multiple
sources, such as flat files and relational databases, which enables
better data analysis.
• Time-variant: The data in a data warehouse (DWH) describes a
specific historical point in time; each record is therefore
associated with a particular time frame.
• Non-volatile: Historical data is not
removed when newer data is added.
Source:- https://www.astera.com/type/blog/what-is-data-warehousing/
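The four features above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse. The customer_balance table, the source labels, and the helper names are hypothetical. Each load appends a dated snapshot (time-variant), existing rows are never updated or deleted (non-volatile), rows from different sources share one schema (integrated), and the table covers a single subject (subject-oriented).

```python
import sqlite3


def create_warehouse() -> sqlite3.Connection:
    """Create an in-memory table holding dated customer balance snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_balance ("
        " snapshot_date TEXT NOT NULL,"
        " customer_id INTEGER NOT NULL,"
        " source TEXT NOT NULL,"
        " balance REAL NOT NULL,"
        " PRIMARY KEY (snapshot_date, customer_id))"
    )
    return conn


def load_snapshot(conn, snapshot_date, source, rows):
    """Append one dated batch; earlier snapshots are left untouched."""
    with conn:
        conn.executemany(
            "INSERT INTO customer_balance VALUES (?, ?, ?, ?)",
            [(snapshot_date, cust_id, source, bal) for cust_id, bal in rows],
        )


def history(conn, customer_id):
    """Return every (snapshot_date, balance) recorded for one customer."""
    return conn.execute(
        "SELECT snapshot_date, balance FROM customer_balance"
        " WHERE customer_id = ? ORDER BY snapshot_date",
        (customer_id,),
    ).fetchall()
```

After loading a January snapshot from a flat file and a February snapshot from a relational source, history(conn, 1) returns both balances in date order, so analysts can see how the value changed over time instead of only its latest state.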
Applications of a Data Warehouse
• Banking
  • Identify the potential risk of default and manage and control
  collections
  • Analyze the performance of each product, service, interchange,
  and exchange rate
  • Track performance of accounts and user data
  • Provide feedback to bankers regarding customer relationships
  and profitability
• Finance
  • Evaluate customer expense trends
  • Maintain transparency in transactions
  • Predict/spot defaulters and act accordingly
  • Analyze and forecast different aspects of business, stock, and
  bond performance
Applications of a Data Warehouse (continued)
• Government
  • Maintain and analyze tax records, health policy records, and their
  respective providers
  • Predict criminal activities from patterns and trends
  • Search for terrorist profiles
  • Threat assessment and fraud detection
• Education
  • Store and analyze information about faculty and students
  • Maintain student portals to facilitate student activities
  • Extract information for research grants
• Retail
  • Maintain records of producers and consumers
  • Track items, their promotion strategies, and consumer buying
  trends (trend analysis)
  • Analyze sales to determine shelf space
  • Understand the patterns of complaints, claims, and returns
Database Backup & Recovery
Failure Classification
• Transaction failure:
  • Logical errors: the transaction cannot complete due to some
  internal error condition
  • System errors: the database system must terminate an active
  transaction due to an error condition (e.g., deadlock)
• System crash: a power failure or other hardware or
software failure causes the system to crash.
  • Fail-stop assumption: non-volatile storage contents are
  assumed not to be corrupted by a system crash
  • Database systems have numerous integrity checks to
  prevent corruption of disk data
• Disk failure: a head crash or similar disk failure destroys all
or part of disk storage
  • Destruction is assumed to be detectable: disk drives use
  checksums to detect failures
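The idea of using a checksum to detect a damaged block can be sketched as follows. The example uses CRC-32 from Python's zlib module purely as an illustration. Real disk drives use their own error-detecting and error-correcting codes, and the function names and sample block contents here are hypothetical.

```python
import zlib


def write_block(data: bytes) -> tuple[bytes, int]:
    """Return the block together with the checksum stored alongside it."""
    return data, zlib.crc32(data)


def is_intact(data: bytes, stored_checksum: int) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    return zlib.crc32(data) == stored_checksum
```

If a block is written as b"account=1;balance=70" and later read back as b"account=1;balance=90", the recomputed checksum no longer matches the stored one, so the damage is detected rather than silently returned to the database.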
Database backup:
• Backing up copies the data or log records from a server database
or its transaction log to a backup device, such as a disk, to create
a data backup or log backup.
• A backup is a copy of server data that can be used to restore and
recover the data after a failure.
• A backup of server data is created at the level of a database or
one or more of its files or filegroups.
• In SQL Server, table-level backups cannot be created.
• In addition to data backups, the full recovery model requires
creating backups of the transaction log.
• A backup is a safeguard against unexpected data loss and
application errors.
• If the original data is lost, it can be reconstructed from the
backup.
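The backup-then-reconstruct cycle above can be tried out on a small scale. The slides describe SQL Server; this sketch instead uses SQLite through Python's built-in sqlite3 module, whose Connection.backup method copies a whole database. The orders table and the helper names are hypothetical, and an in-memory connection stands in for a backup device such as a disk file.

```python
import sqlite3


def make_database() -> sqlite3.Connection:
    """Create a small in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO orders (item) VALUES (?)", [("coffee",), ("tea",)]
    )
    conn.commit()
    return conn


def copy_database(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy every page of `source` into a fresh database and return it."""
    target = sqlite3.connect(":memory:")
    source.backup(target)
    return target


def items(conn: sqlite3.Connection) -> list[str]:
    """List the ordered items in primary-key order."""
    return [row[0] for row in conn.execute("SELECT item FROM orders ORDER BY id")]
```

Taking a copy with copy_database(db), then deleting every row from the original, and finally calling copy_database on the backup yields a restored database that again contains both sample orders.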
Backup Types
• Full Backup - Backs up all the data in the database, records all database
file locations, and includes enough log to allow that data to be recovered.
• Differential Backup - Backs up only the data that has changed since the last
full backup. Includes the portion of the transaction log that contains database
modifications that occurred during the backup.
• File/Filegroup Backup - A backup of one or more database files or
filegroups.
• Transaction Log Backup - A backup of transaction logs that includes all log
records that were not backed up in a previous log backup. (full recovery
model)
• Partial Backup - Contains data from only some of the filegroups in a
database, including the data in the primary filegroup, every read/write
filegroup, and any optionally specified read-only files.
• Copy-Only Backup - A special-use backup that is independent of the regular
sequence of SQL Server backups.
Reference:- https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/backup-overview-sql-server?view=sql-server-ver16
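The relationship between full, differential, and copy-only backups can be illustrated with a toy model. This is not how SQL Server is implemented. The ToyDatabase class, its page dictionary, and the restore helper are assumptions for illustration, and the model ignores deletions and the transaction log.

```python
class ToyDatabase:
    """A toy page store that tracks which pages changed since the last full backup."""

    def __init__(self):
        self.pages = {}
        self.changed_since_full = set()

    def write(self, page_id, content):
        """Store a page and remember that it has changed."""
        self.pages[page_id] = content
        self.changed_since_full.add(page_id)

    def full_backup(self, copy_only=False):
        """Copy every page. A regular full backup resets the change tracking;
        a copy-only backup leaves the regular backup sequence untouched."""
        if not copy_only:
            self.changed_since_full.clear()
        return dict(self.pages)

    def differential_backup(self):
        """Copy only the pages changed since the last regular full backup."""
        return {pid: self.pages[pid] for pid in self.changed_since_full}


def restore(full, differential=None):
    """Rebuild the pages from a full backup plus an optional differential."""
    pages = dict(full)
    pages.update(differential or {})
    return pages
```

In this model a differential backup taken after two page writes contains only those two pages, and restoring the full backup followed by the differential reproduces the current state. Taking a copy-only full backup in between does not change what the next differential contains.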
Thank You