SAMIR SIDDIQUI
CR(FINAL YEAR)
Department of Information Technology
Types of data
Operational data (OLTP application)
• Data that ‘works’.
• Frequent updates and queries
• Normalized (standardized) for efficient search
and updates
• Fragmented and local reference
• Point queries: queries accessing individual
tuples.
Historical data (OLAP application)
• Data that ‘tells’.
• Very infrequent updates
• Analytical queries that require huge amounts
of aggregation.
• Integrated data set with global relevance
• Performance issues mainly in query response
time (not in updates)
Examples of OLTP queries
• What is the salary of Mr. X?
• What is the address and phone no. of the
person in charge of the supplies department?
Examples of OLAP queries
• How has employee attrition changed
over the years across the company?
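The contrast between the two query styles can be sketched with Python's built-in sqlite3 module. The employee table, its columns, and every row below are hypothetical, invented only to show one point query and one aggregating query side by side.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, salary REAL, left_year INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "X", 52000.0, None),   # still employed
        (2, "Y", 48000.0, 2003),
        (3, "Z", 61000.0, 2003),
        (4, "W", 45000.0, 2004),
    ],
)

# OLTP-style point query: reads a single tuple, located by primary key.
point_salary = conn.execute(
    "SELECT salary FROM employee WHERE id = ?", (1,)
).fetchone()[0]

# OLAP-style analytical query: scans the table and aggregates per year.
attrition_by_year = conn.execute(
    "SELECT left_year, COUNT(*) FROM employee "
    "WHERE left_year IS NOT NULL "
    "GROUP BY left_year ORDER BY left_year"
).fetchall()
```

The first query benefits from an index on the key; the second has to read every row that matters to the trend, which is why response time rather than update cost dominates on the OLAP side.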
Data Warehouse vs. Operational DBMS
• OLTP (on-line transaction processing)
– Major task of traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking, manufacturing,
payroll, registration, accounting, etc.
• OLAP (on-line analytical processing)
– Major task of data warehouse system
– Data analysis and decision making
• Distinct features (OLTP vs. OLAP):
– User and system orientation: customer vs. market
– Data contents: current, detailed vs. historical, consolidated
– Database design: ER + application vs. star + subject
– View: current, local vs. evolutionary, integrated
– Access patterns: update vs. read-only but complex queries
December 22, 2022 Data Mining: Concepts and Techniques 5
OLTP vs. OLAP
                    OLTP                          OLAP
users               clerk, IT professional        knowledge worker
function            day to day operations         decision support
DB design           application-oriented          subject-oriented
data                current, up-to-date,          historical,
                    detailed, flat relational,    summarized, multidimensional,
                    isolated                      integrated, consolidated
usage               repetitive                    ad-hoc
access              read/write,                   lots of scans
                    index/hash on prim. key
unit of work        short, simple transaction     complex query
# records accessed  tens                          millions
# users             thousands                     hundreds
DB size             100MB-GB                      100GB-TB
metric              transaction throughput        query throughput, response
[Figure: dimension hierarchies; surviving labels: all, Office, Day, Month]
Han: Data Cubes 22
A Sample Data Cube
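[Figure: a three-dimensional data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum) and Country (U.S.A., Canada, Mexico, sum). The highlighted cell holds the total annual sales of TV in U.S.A.]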
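A minimal sketch of how the "sum" cells of such a cube could be filled in, assuming the base data is a mapping from (quarter, product, country) to a sales figure. The function name build_cube and all numbers are made up for illustration; real OLAP servers use far more elaborate precomputation strategies.

```python
from collections import defaultdict
from itertools import product

# Made-up base facts: (quarter, product, country) -> sales amount.
facts = {
    ("1Qtr", "TV", "U.S.A."): 100,
    ("2Qtr", "TV", "U.S.A."): 120,
    ("3Qtr", "TV", "U.S.A."): 90,
    ("4Qtr", "TV", "U.S.A."): 150,
    ("1Qtr", "PC", "Canada"): 80,
    ("2Qtr", "VCR", "Mexico"): 30,
}


def build_cube(facts):
    """Return every base cell plus every cell with 'sum' on some dimensions."""
    cube = defaultdict(int)
    for coords, value in facts.items():
        # Each coordinate is either kept or collapsed into "sum",
        # so one fact contributes to 2**3 = 8 cells of the cube.
        for mask in product((False, True), repeat=len(coords)):
            key = tuple("sum" if collapsed else c
                        for c, collapsed in zip(coords, mask))
            cube[key] += value
    return dict(cube)


cube = build_cube(facts)
tv_usa_annual = cube[("sum", "TV", "U.S.A.")]   # total annual sales of TV in U.S.A.
grand_total = cube[("sum", "sum", "sum")]       # apex of the cube
```

Looking up ("sum", "TV", "U.S.A.") corresponds to the highlighted cell in the figure: the Date dimension is collapsed while Product and Country stay fixed.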
• Visualization
• OLAP capabilities
• Interactive manipulation
Representation of Multi-dimensional Data
• Example of a two-dimensional query:
• ‘What is the total revenue generated by property sales in
each city, in each quarter of 2004?’
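The two-dimensional query above can be sketched as a pivot from a flat list of records into a city-by-quarter matrix. The city names, the quarter labels and the revenue figures are hypothetical, and revenue_table is an illustrative helper, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical property-sale records for 2004: (city, quarter, revenue).
sales = [
    ("Glasgow", "Q1", 29000),
    ("Glasgow", "Q1", 11000),
    ("Glasgow", "Q2", 31000),
    ("London", "Q1", 43000),
    ("London", "Q2", 39000),
    ("London", "Q2", 5000),
]


def revenue_table(records):
    """Pivot flat records into {city: {quarter: total revenue}}."""
    table = defaultdict(lambda: defaultdict(int))
    for city, quarter, revenue in records:
        table[city][quarter] += revenue
    # Convert to plain dicts so missing keys raise instead of silently adding cells.
    return {city: dict(by_quarter) for city, by_quarter in table.items()}


table = revenue_table(sales)
```

Here table["Glasgow"]["Q1"] plays the role of one cell of the matrix: the row is the city, the column is the quarter, and the cell value is the aggregated revenue.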
Multi-dimensional OLAP (MOLAP)
• Use array technology and efficient storage
techniques that minimize the disk space
requirements through sparse data
management.
• Traditionally requires tight coupling with
the application layer and presentation layer.
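As a rough illustration of array storage with sparse data management, the sketch below addresses cells by a linear array offset but stores only the non-empty ones. SparseCube is a hypothetical class written for this example; it is not how any particular MOLAP product lays out its data.

```python
class SparseCube:
    """Array-addressed cube that keeps only non-empty cells in memory."""

    def __init__(self, shape):
        self.shape = tuple(shape)   # number of members on each dimension
        self.cells = {}             # linear offset -> stored value

    def _offset(self, coords):
        # Row-major linearization, the same arithmetic a dense array would use.
        offset = 0
        for index, size in zip(coords, self.shape):
            if not 0 <= index < size:
                raise IndexError(coords)
            offset = offset * size + index
        return offset

    def set(self, coords, value):
        if value == 0:
            self.cells.pop(self._offset(coords), None)  # empty cells take no space
        else:
            self.cells[self._offset(coords)] = value

    def get(self, coords):
        return self.cells.get(self._offset(coords), 0)

    def dense_size(self):
        total = 1
        for size in self.shape:
            total *= size
        return total


# 4 quarters x 3 products x 3 countries, with only two populated cells.
sparse = SparseCube((4, 3, 3))
sparse.set((0, 0, 0), 100)
sparse.set((3, 2, 1), 25)
stored, dense = len(sparse.cells), sparse.dense_size()
```

A dense array for this shape would reserve 36 cells, while the sparse version holds 2; the gap widens quickly as dimensions are added, which is the motivation for sparse data management.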
Typical Architecture for MOLAP Tools
MOLAP Tools - Development Issues
• Underlying data structures are limited in
their ability to support multiple subject areas
and to provide access to detailed data.
• MOLAP products require a different set of
skills and tools to build and maintain the
database, thus increasing the cost and
complexity of support.
Relational OLAP (ROLAP)
• Fastest-growing style of OLAP technology
due to requirements to analyze ever-
increasing amounts of data and the
realization that users cannot store all the
data they require in MOLAP databases.
• Supports RDBMS products through a metadata
layer, which avoids the need to create a static
multi-dimensional data structure and facilitates
the creation of multiple multi-dimensional views
of the two-dimensional relation.
• To improve performance, some products use
SQL engines to support the complexity of
multi-dimensional analysis, while others
recommend, or require, the use of highly
denormalized database designs such as the
star schema.
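The ROLAP idea of a metadata layer over a star schema can be sketched with sqlite3. Everything here is an assumption made for illustration: the table names (fact_sales, dim_product, dim_time), the DIMENSIONS mapping that stands in for the metadata layer, the helper aggregate_by, and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star schema: one fact table referencing two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'TV'), (2, 'PC');
INSERT INTO dim_time VALUES (1, '1Qtr'), (2, '2Qtr');
INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 120), (2, 1, 80), (1, 1, 50);
""")

# Toy "metadata layer": maps business dimension names to relational columns.
DIMENSIONS = {"product": "p.name", "quarter": "t.quarter"}


def aggregate_by(conn, *dims):
    """Generate and run the SQL for one multi-dimensional view of the facts."""
    cols = ", ".join(DIMENSIONS[d] for d in dims)  # KeyError on unknown names
    sql = (
        f"SELECT {cols}, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_product p ON p.product_id = f.product_id "
        "JOIN dim_time t ON t.time_id = f.time_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


by_product = aggregate_by(conn, "product")
by_product_quarter = aggregate_by(conn, "product", "quarter")
```

No multi-dimensional structure is stored anywhere: each view is produced on demand by generated SQL against the same relational tables, which is also why complex analyses can require several passes over the data.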
Typical Architecture for ROLAP Tools
ROLAP Tools - Development Issues
• Performance problems associated with the
processing of complex queries that require
multiple passes through the relational data.
• Development of an option to create
persistent, multi-dimensional structures with
facilities to assist in the administration of
these structures.
Hybrid OLAP (HOLAP)
• Provide limited analysis capability, either
directly against RDBMS products, or by using
an intermediate MOLAP server.
• Promoted as being relatively simple to install
and administer with reduced cost and
maintenance.
Typical Architecture for HOLAP Tools
HOLAP Tools - Development Issues
• Architecture results in significant data redundancy
and may cause problems for networks that
support many users.
Desktop OLAP (DOLAP)
• As with multi-dimensional databases on the
server, OLAP data may be held on disk or in
RAM; however, some DOLAP products allow
only read access.
© Pearson Education Limited 1995, 2005
DOLAP Tools - Development Issues
• Provision of appropriate security controls to
support all parts of the DOLAP environment.
Since the data is physically extracted from the
system, security is generally implemented by
limiting the information compiled into each
cube.
• Once each cube is uploaded to the user's
desktop, all additional metadata becomes the
property of the local user.
• Reduction in the effort involved in deploying
and maintaining the DOLAP tools. Some DOLAP
vendors now provide a range of alternative
ways of deploying OLAP data such as through
e-mail, the Web or using traditional
client/server architecture.
OLAP Extensions to SQL
• In response, ANSI adopted a set of OLAP
functions as an extension to SQL to enable
these calculations, as well as many others that
used to be impractical or even impossible
within SQL.
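Ranking and running totals are typical of the calculations these extensions make straightforward. The sketch below runs standard window functions, RANK() OVER and SUM() OVER, through Python's sqlite3 module; it assumes the bundled SQLite library is version 3.25 or later (the first with window-function support), and the quarterly_sales table and its figures are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and figures, for illustration only.
conn.execute("CREATE TABLE quarterly_sales (quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO quarterly_sales VALUES (?, ?)",
    [("Q1", 100), ("Q2", 120), ("Q3", 90), ("Q4", 150)],
)

# One pass over the table yields each row, its rank by amount,
# and a cumulative total in quarter order.
rows = conn.execute(
    """
    SELECT quarter,
           amount,
           RANK() OVER (ORDER BY amount DESC) AS sales_rank,
           SUM(amount) OVER (ORDER BY quarter) AS running_total
    FROM quarterly_sales
    ORDER BY quarter
    """
).fetchall()
```

Without window functions, the same result would typically need self-joins or correlated subqueries, one per derived column.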