ETL and Datawarehousing PDF
Platforms
Module 2
Agenda
• What is a Data Warehouse?
• Typical Data Warehouse
• Modern Data Warehouse
• Terminologies used in Big Data Environment
• Parallel Vs Distributed System
• CAP Theorem
• Big Data and Technologies Landscape
Data Warehouse
• Data warehousing emphasizes the capture of data from different sources for access and
analysis.
• Three main components of a data warehouse:
– Data sources from operational systems, such as Excel, ERP, CRM or financial applications
– A data staging area where data is cleaned and ordered
– A presentation area where data is warehoused.
• Data may be:
– Structured
– Semi-structured
– Unstructured data
• Types of Data Warehouse
– Enterprise Data Warehouse
– Operational Data Store
– Data Mart
Typical Data Warehouse
• Step 1: Operational, transactional, or day-to-day
business data is gathered.
• Step 2: This data is then integrated, cleaned up,
transformed, and standardized through the
process of Extraction, Transformation, and
Loading (ETL).
• Step 3: The transformed data is then loaded
into enterprise data warehouse or data marts.
• Step 4: A host of market-leading business
intelligence and analytics tools are then used to
support decision making through ad-hoc
queries, SQL, enterprise dashboards, data
mining, etc.
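The four steps above can be sketched in a few lines of Python. This is a minimal illustration only, not a production pipeline: the sample records, the `fact_orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the enterprise data warehouse.

```python
import sqlite3

# Step 1 input: hypothetical day-to-day operational records (made-up values).
raw_orders = [
    {"order_id": "1", "customer": " alice ", "amount": "20.00", "country": "us"},
    {"order_id": "2", "customer": "Bob", "amount": "5.50", "country": "US"},
    {"order_id": "2", "customer": "Bob", "amount": "5.50", "country": "US"},  # duplicate
    {"order_id": "3", "customer": "carol", "amount": "", "country": "de"},  # missing amount
]


def extract(source):
    """Step 1: gather records from an operational source."""
    return list(source)


def transform(rows):
    """Step 2: clean, standardize, and de-duplicate the records."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # drop rows that fail a basic quality check
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # drop duplicate orders
        seen.add(order_id)
        cleaned.append(
            (
                order_id,
                row["customer"].strip().title(),  # normalize names
                float(row["amount"]),
                row["country"].upper(),  # standardize country codes
            )
        )
    return cleaned


def load(conn, rows):
    """Step 3: load the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def revenue_by_country(conn):
    """Step 4: an ad-hoc SQL query of the kind a BI tool might issue."""
    return conn.execute(
        "SELECT country, SUM(amount) FROM fact_orders "
        "GROUP BY country ORDER BY country"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(conn, transform(extract(raw_orders)))
report = revenue_by_country(conn)
print(report)
```

Keeping extract, transform, and load as separate functions mirrors the staging-area idea: the cleaning rules can change without touching how data is gathered or stored.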
Modern Data Warehouse
Terminologies used in Big Data
• In-Memory Analytics
• In-Database Processing
• Symmetric Multiprocessor System (SMP)
• Massively Parallel Processing (MPP)
Parallel Vs Distributed System
Distributed System
CAP Theorem
Hadoop Ecosystem
Big Data and Technologies Landscape
THANK YOU