Big Data Fundamentals and Platforms
Module 2
Agenda
• What is a Data Warehouse?
• Typical Data Warehouse
• Modern Data Warehouse
• Terminology used in the Big Data Environment
• Parallel vs. Distributed Systems
• CAP Theorem
• Big Data and Technologies Landscape
Data Warehouse
• Data warehousing emphasizes capturing data from different sources for access and analysis.
• Three main components of a data warehouse:
– Data sources from operational systems, such as Excel, ERP, CRM, or financial applications
– A data staging area where data is cleaned and ordered
– A presentation area where data is warehoused.
• Data may be:
– Structured
– Semi-structured
– Unstructured
• Types of Data Warehouse
– Enterprise Data Warehouse
– Operational Data Store
– Data Mart
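The three data categories listed above differ mainly in how much schema the data carries. The minimal Python sketch below illustrates this with hypothetical sample records (the names, values and loader functions are invented for illustration): CSV rows share one fixed schema, JSON is self-describing and may nest or omit fields, and free text has no schema at all.

```python
import csv
import io
import json

# Hypothetical sample records, one per data category named on the slide.
structured_csv = "id,name,amount\n1,Alice,120.50\n2,Bob,75.00\n"
semi_structured_json = '{"id": 3, "name": "Carol", "tags": ["vip"], "address": {"city": "Springfield"}}'
unstructured_text = "Customer called to say the last delivery arrived late but intact."

def load_structured(text):
    """Structured data: fixed columns, every row follows the same schema."""
    return list(csv.DictReader(io.StringIO(text)))

def load_semi_structured(text):
    """Semi-structured data: self-describing keys, nesting, optional fields."""
    return json.loads(text)

def load_unstructured(text):
    """Unstructured data: no schema; any structure must be derived (here, crude word tokens)."""
    return text.lower().split()

rows = load_structured(structured_csv)
doc = load_semi_structured(semi_structured_json)
tokens = load_unstructured(unstructured_text)

print(rows[0]["name"], doc["address"]["city"], len(tokens))  # Alice Springfield 11
```

A warehouse typically stores the first kind directly; the other two usually need parsing or feature extraction during staging before they fit a table.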
Typical Data Warehouse
• Step 1: Operational, transactional, or day-to-day business data is gathered.
• Step 2: This data is then integrated, cleaned up, transformed, and standardized through the process of Extraction, Transformation, and Loading (ETL).
• Step 3: The transformed data is then loaded into the enterprise data warehouse or data marts.
• Step 4: A range of market-leading business intelligence (BI) and analytics tools is then used to support decision making through ad-hoc SQL queries, enterprise dashboards, data mining, etc.
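The four steps above can be sketched end to end in Python. This is a toy illustration under stated assumptions, not a production ETL design: the two source systems, their field names and formats, and the `fact_orders` table are all hypothetical, and an in-memory SQLite database stands in for the enterprise data warehouse.

```python
import sqlite3

# Step 1 (Gather): hypothetical operational records from two source systems.
# Field names and formats differ between sources, as they often do in practice.
erp_orders = [
    {"order_id": "A-1", "cust": " alice ", "amount": "120.50", "date": "2024-01-05"},
    {"order_id": "A-2", "cust": "BOB", "amount": "75", "date": "2024-01-06"},
]
crm_orders = [
    {"id": "C-9", "customer_name": "Alice", "total_cents": 4000, "day": "05/01/2024"},
]

def transform_erp(rec):
    # Step 2 (Transform): trim whitespace, normalize name case, convert text to numbers.
    return (rec["order_id"], rec["cust"].strip().title(), float(rec["amount"]), rec["date"])

def transform_crm(rec):
    # Step 2 (Transform): standardize units (cents -> currency) and dates (DD/MM/YYYY -> ISO).
    d, m, y = rec["day"].split("/")
    return (rec["id"], rec["customer_name"].strip().title(), rec["total_cents"] / 100, f"{y}-{m}-{d}")

clean_rows = [transform_erp(r) for r in erp_orders] + [transform_crm(r) for r in crm_orders]

# Step 3 (Load): write the standardized rows into a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_orders (order_id TEXT PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
)
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", clean_rows)
conn.commit()

# Step 4 (Analyze): an ad-hoc SQL query of the kind a BI tool would issue.
report = conn.execute(
    "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer ORDER BY customer"
).fetchall()
print(report)  # [('Alice', 160.5), ('Bob', 75.0)]
```

Note that the two "Alice" records only aggregate correctly because the transform step standardized names and units first; that is the point of Step 2.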
Modern Data Warehouse
Terminology used in Big Data
• In-Memory Analytics
• In-Database Processing
• Symmetric Multiprocessor System (SMP)
• Massively Parallel Processing (MPP)
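Massively Parallel Processing splits a query across many nodes that each work only on their own slice of the data, followed by a step that combines the partial results. The Python sketch below mimics that two-phase aggregation under loud assumptions: the sales rows are hypothetical, the round-robin partitioning is one of several possible schemes, and threads merely stand in for separate MPP nodes (they do not give true CPU parallelism in CPython).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows: (region, amount).
rows = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 8), ("east", 2)]

NUM_NODES = 3  # assumed number of "nodes"; threads stand in for separate machines

def local_aggregate(part):
    """Phase 1: each node computes a partial SUM per region over only its own rows."""
    totals = {}
    for region, amount in part:
        totals[region] = totals.get(region, 0) + amount
    return totals

def merge(partials):
    """Phase 2: a coordinator combines the partial sums into the final answer."""
    final = {}
    for totals in partials:
        for region, amount in totals.items():
            final[region] = final.get(region, 0) + amount
    return final

# Round-robin partitioning: node i owns rows i, i + NUM_NODES, i + 2*NUM_NODES, ...
parts = [rows[i::NUM_NODES] for i in range(NUM_NODES)]

with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    partials = list(pool.map(local_aggregate, parts))

result = merge(partials)
print(dict(sorted(result.items())))  # {'east': 5, 'north': 17, 'south': 13}
```

Because no partition shares rows with another, the nodes never need to coordinate until the merge; an SMP system, by contrast, runs its processors against one shared memory and data set.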
Parallel vs. Distributed Systems
Distributed System
CAP Theorem
Hadoop Ecosystem
Big Data and Technologies Landscape
THANK YOU

You might also like