Data Warehousing
T-03 Paulraj Ponniah, Data Warehousing Fundamentals, Wiley, 2001
R-01 Anil Maheshwari, Data Analytics, McGraw Hill Education, First edition, 2017
Historical data is like the "fingerprint" or DNA of an enterprise and can potentially serve as
a smart guidance system.
1.1 Business Enterprise Organization, Its
Functions and Core Business Processes
[Figure: organization of an enterprise into business units and internal/support units; labels include Accounting, Technology, Human Resource Management, and Supplier Management.]
Online Transaction Processing System
• Online Airline Booking
Business Process/Model Innovation Applications
• Amazon book store
Support single sign-on and support special authentication and authorization requirements
[Figure: core business processes: Customer Relationship Management, Customer Order Processing, Supply Chain Management, Human Capital Management, Financial Accounting, Service or Repair Order Management, Inventory Management, Innovation Management.]
1.5 Bespoke IT Applications
Office goers, mobile users, home office users, remote securely connected
users, casual visitors to website, and digital users who conduct transactions
using the Internet
Role-based users who have access to certain categories of IT applications, certain levels
of classified information, specific systems, and even the specific operations they
are allowed to perform.
Securely connected users who may be allowed to access specific servers from a
specific location during specified hours
Administrative users, who configure the IT environment, manage user access control,
execute anti-virus programs, perform anti-theft checks, install updates/upgrades,
back up enterprise data, and restore it in the event of data corruption
1.6 Information users
Users who have permission to read or update or have full control over the
enterprise information
Multi-device access users who sometimes work in the office, sometimes move in the field,
and use different devices, ranging from desktop systems to handheld smartphones,
to connect to the enterprise IT applications.
1.7 Information users' requirements
Smooth authentication and access to authorized resources
Fast information delivery, without long waits for responses from systems
Unit-2 Digital Data
Digital data is generated with the help of computers, the Internet, and digital
communication devices.
Digital data can be stored on digital storage such as drives, data
warehouses, and clouds.
There are three types of digital data: structured, semi-structured, and
unstructured.
Structured data can be organized in matrix form (e.g. rows and columns) and can be easily
used by computers.
Relationships exist between entities of the data, such as classes and their objects. Example –
data stored in databases.
Semi-structured data does not conform to a data model but has some structure; it
cannot be easily used by computers. Example – email.
Unstructured data does not conform to a data model, has no structure, and
cannot be easily used by computers. Examples – chats, images, videos, PPTs, letters,
research documents, the body of an email, etc.
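The three definitions above can be made concrete with a small sketch. It uses only the Python standard library, and all sample records (the order fields, names, and chat text) are hypothetical illustrations rather than data from the course material.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: rows and columns with a fixed schema (hypothetical sample data).
csv_text = "order_id,customer,amount\n1,Asha,250\n2,Ravi,120\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_amount = sum(int(r["amount"]) for r in rows)

# Semi-structured: no fixed schema, but tags mark up whichever fields are present.
xml_text = """
<orders>
  <order id="1"><customer>Asha</customer><amount>250</amount></order>
  <order id="2"><customer>Ravi</customer><note>gift wrap</note></order>
</orders>
"""
root = ET.fromstring(xml_text)
customers = [o.findtext("customer") for o in root.findall("order")]
# A field may be missing from a record, so reads need a default value.
amounts = [o.findtext("amount", default=None) for o in root.findall("order")]

# Unstructured: free text; even a simple question needs ad hoc text processing.
chat = "Hi, my order arrived late and the box was damaged. Please refund."
mentions_refund = "refund" in chat.lower()
```

The structured rows can be summed directly, the XML records need a check for missing fields, and the chat text only yields information through text search or mining.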
2.1 Characteristics of Structured Data
Conforms to data model
Spreadsheet
SQL
OLTP systems
2.2 Characteristics of Semi-structured Data
Does not conform to a data model but contains tags and elements.
XML
TCP/IP packets
Zipped files
Binary executables
Mark Up Languages
RDBMS
Implicit Structure
Evolving schemas
XML
RDBMS
OEM
2.2 Challenges for extracting information
from Semi-structured Data
Heterogeneous sources
Indexing
OEM
XML
Mining Tools
2.2 Solutions For Analyzing Semi-structured
Data
XML (eXtensible Markup Language)
Memos
Videos
Images
Body of an email
Word documents
PPT’s
Chats
Reports
2.3 Manage Unstructured data
Indexing
Tags / Metadata
Classification / Taxonomy
Scalability
Retrieve Information
Security
Changing format
Storing in RDBMS/BLOBs
Tags
Indexing
Deriving meaning
File formats
Classification / Taxonomy
2.3 Solutions for extracting information from
Unstructured Data
Tags
Text mining
Applications Platforms
Classification / Taxonomy
2.4 ONLINE TRANSACTION PROCESSING
SYSTEM
OLTP systems are a class of systems that manage transaction-oriented applications.
These applications are mainly concerned with the entry, storage, and retrieval of data.
They are designed to cover most of the day-to-day operations of an organization, such as purchasing, inventory,
manufacturing, payroll, and accounting.
Airlines, mail-order firms, supermarkets, banks, and insurance companies use OLTP systems to record transactional data.
Advantages of an OLTP System
• Simplicity
• Efficiency
• Fast query processing
Challenges of an OLTP System
• Security
• OLTP system data content not suitable for decision making
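As a rough illustration of the entry, storage, and retrieval work that an OLTP system performs, the sketch below records an order and updates stock in a single transaction using Python's built-in sqlite3 module. The table names, columns, and the `place_order` helper are hypothetical, chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "item TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
)
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, item TEXT NOT NULL, qty INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('book', 5)")
conn.commit()


def place_order(conn, item, qty):
    """Record an order and reduce stock as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
            conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE item = ?", (qty, item)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative stock level.
        return False


ok = place_order(conn, "book", 2)         # enough stock: both writes are kept
rejected = place_order(conn, "book", 10)  # not enough stock: both writes are undone
stock = conn.execute("SELECT qty FROM inventory WHERE item = 'book'").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Each call touches only a few rows and either completes fully or leaves no trace, which is the short, record-level workload that OLTP systems are built for.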
2.5 ONLINE ANALYTICAL PROCESSING SYSTEM
OLAP differs from traditional databases in the way data is conceptualized and
stored.
OLAP data is held in dimensional form rather than relational form.
The multidimensional data model views data in the form of a data cube.
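One way to picture a data cube is as a mapping from one coordinate per dimension to an aggregated measure. The sketch below is a minimal in-memory illustration under that assumption; the dimensions (product, region, quarter) and the sales figures are hypothetical, and real OLAP servers use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical fact records: (product, region, quarter, sales amount).
facts = [
    ("laptop", "north", "Q1", 100),
    ("laptop", "south", "Q1", 80),
    ("phone", "north", "Q1", 60),
    ("phone", "north", "Q2", 90),
    ("laptop", "north", "Q2", 120),
]

# The cube maps one coordinate per dimension to an aggregated measure.
cube = defaultdict(int)
for product, region, quarter, amount in facts:
    cube[(product, region, quarter)] += amount

# A cell is addressed by its position along every dimension.
cell = cube[("laptop", "north", "Q2")]

# Summing over region and quarter leaves totals along the product dimension.
sales_by_product = defaultdict(int)
for (product, _region, _quarter), amount in cube.items():
    sales_by_product[product] += amount
```

Here `cell` holds the sales for one product, region, and quarter, while `sales_by_product` is the kind of summary an analyst would read off one edge of the cube.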
HOLAP (hybrid OLAP) combines two approaches: it leverages the greater scalability of
ROLAP (relational OLAP) and it leverages cube technology for faster performance
on summary-type information.
HOLAP can also "drill through" from the cube into the underlying relational
data.
2.7 Data Model for OLTP - ER
2.7 Data Model for OLAP - Star
2.7 Data Model for OLAP - Snowflake
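A star schema places a central fact table, holding the measures, among dimension tables that describe them; a snowflake schema further normalizes those dimension tables. The sketch below builds a tiny star schema with Python's built-in sqlite3 module. The table and column names (`fact_sales`, `dim_product`, `dim_store`) and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     INTEGER
);
INSERT INTO dim_product VALUES (1, 'laptop', 'electronics'), (2, 'novel', 'books');
INSERT INTO dim_store VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 80), (2, 1, 20);
""")

# A typical analytical query joins the fact table to a dimension and aggregates.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

In a snowflake variant of this sketch, `category` would move out of `dim_product` into its own table, at the cost of one more join per query.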
2.8 OLAP Operations in Multidimensional
Data
Slice
Dice
Roll-up or Drill-up
Drill-down
Pivot
Drill-across
Drill-through
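Several of the operations listed above can be sketched on a small in-memory cube. This is an illustrative simplification, not how an OLAP engine implements them: the cube is a plain dictionary, the data is hypothetical, and roll-up is shown as dimension reduction rather than as climbing a concept hierarchy. The helper names are invented for this example.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, city, quarter) -> sales.
cube = {
    ("laptop", "Pune", "Q1"): 100,
    ("laptop", "Delhi", "Q1"): 80,
    ("phone", "Pune", "Q1"): 60,
    ("phone", "Pune", "Q2"): 90,
    ("laptop", "Pune", "Q2"): 120,
}


def slice_cube(cube, quarter):
    """Slice: fix one dimension to a single value, dropping it from the key."""
    return {(p, c): v for (p, c, q), v in cube.items() if q == quarter}


def dice_cube(cube, products, cities):
    """Dice: keep a sub-cube by restricting two or more dimensions."""
    return {k: v for k, v in cube.items() if k[0] in products and k[1] in cities}


def roll_up(cube, keep):
    """Roll-up: aggregate away every dimension except those at indexes in `keep`."""
    out = defaultdict(int)
    for key, v in cube.items():
        out[tuple(key[i] for i in keep)] += v
    return dict(out)


def pivot(table):
    """Pivot: swap the two axes of a two-dimensional view."""
    return {(b, a): v for (a, b), v in table.items()}


q1 = slice_cube(cube, "Q1")
pune_laptops = dice_cube(cube, {"laptop"}, {"Pune"})
by_product = roll_up(cube, keep=(0,))
by_product_quarter = roll_up(cube, keep=(0, 2))  # drill-down: finer than by_product
q1_by_city = pivot(q1)
```

Drill-down is the reverse of roll-up: moving from `by_product` to `by_product_quarter` adds the quarter dimension back and exposes more detail.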