
Data Warehouse Applications
 Presentation of standard reports and graphs
 Dimensional analysis
 Data mining
Roadmap to Data Warehousing
 Data is extracted, transformed, and cleansed
 Stored in a database - RDBMS or MDD
 Query and reporting systems
 Executive Information System (EIS) and Decision Support System (DSS)
Data Warehousing Process Overview
The major components of a data warehousing process:
 Data sources: internal, external (data providers), OLAP, ERP, Web data
 Data extraction: using custom-written or commercial software (ETL tools)
 Data loading: data is loaded into a staging area to be transformed and cleansed, then loaded into the warehouse
 Comprehensive database: the enterprise data warehouse (EDW) that supports all decision analysis
 Metadata: to ease indexing and searching
 Middleware tools: to enable access to the DW; these include data mining tools, OLAP, reporting tools, and data visualization tools
Data Integration and the Extraction, Transformation, and Load (ETL) Process
 Extraction, transformation, and load (ETL) is a data warehousing process that consists of:
 extraction (i.e., reading data from a database),
 transformation (i.e., converting the extracted data from its previous form into the form it needs to be in so that it can be placed into a data warehouse or simply another database), and
 load (i.e., putting the data into the data warehouse)
 During the extraction process, the input files are written to a set of staging tables to facilitate the load process (a minimal sketch follows).
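To make the flow concrete, here is a minimal extract -> stage -> transform -> load sketch in Python with SQLite. The source file, column names, and table names are invented for illustration; real ETL tools add scheduling, logging, and error handling on top of this pattern.

import csv
import sqlite3

# Hypothetical source file, for illustration only.
SOURCE_FILE = "daily_sales.csv"  # columns: order_id, amount, region

conn = sqlite3.connect("warehouse.db")
cur = conn.cursor()

# Staging table: raw extracted rows land here before cleansing.
cur.execute("CREATE TABLE IF NOT EXISTS stg_sales (order_id TEXT, amount TEXT, region TEXT)")
cur.execute("CREATE TABLE IF NOT EXISTS fact_sales (order_id TEXT PRIMARY KEY, amount REAL, region TEXT)")

# Extract: read the source file into the staging area as-is.
with open(SOURCE_FILE, newline="") as f:
    rows = [(r["order_id"], r["amount"], r["region"]) for r in csv.DictReader(f)]
cur.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", rows)

# Transform: cleanse and convert types; drop rows that fail validation.
cleaned = []
for order_id, amount, region in cur.execute("SELECT * FROM stg_sales"):
    try:
        cleaned.append((order_id.strip(), float(amount), region.strip().upper()))
    except (ValueError, AttributeError):
        continue  # a real job would log rejected rows instead of skipping silently

# Load: move the cleansed rows into the warehouse table.
cur.executemany("INSERT OR REPLACE INTO fact_sales VALUES (?, ?, ?)", cleaned)
conn.commit()
conn.close()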
Sample ETL Tools
 PowerMart/PowerCenter from Informatica
 Teradata Warehouse Builder from Teradata
 DataStage from Ascential Software
 SAS System from SAS Institute
 Sagent Solution from Sagent Software
 Hummingbird Genio Suite from Hummingbird Communications
 Talend Open Studio (open source)
ETL Process Flow
ETL -> Big Data
 Bottom line: any data that exceeds our current processing capability can be regarded as "big"
 Big Data: high-volume, high-velocity, and/or high-variety
 Walmart handles more than 1 million (10 lakh) customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (1 PB = 1,000,000 GB) of data
 Facebook handles 40 billion (1 billion = 100 crore) photos from its user base
Challenges in the traditional approach:
 Recent surveys suggest that 80% of the data created in the world is unstructured.
 One challenge is how this unstructured data can be structured before we attempt to understand and capture the most important data.
 Another challenge is how we can store it.
 Example workload: sentiment analysis on Facebook and Twitter data (a toy sketch follows this list).
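As a toy illustration of structuring unstructured data for sentiment analysis, the Python sketch below turns free-text posts into structured records. The posts and word lists are made up, and real systems use far richer models than word counting.

import re

# Made-up raw posts standing in for unstructured social media data.
raw_posts = [
    "Loving the new checkout flow! #happy #shopping",
    "Delivery was late again, terrible service #angry",
]

POSITIVE = {"loving", "great", "happy"}
NEGATIVE = {"late", "terrible", "angry"}

def structure(post: str) -> dict:
    """Turn one free-text post into a structured record."""
    words = set(re.findall(r"[a-z]+", post.lower()))
    hashtags = re.findall(r"#(\w+)", post)
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"text": post, "hashtags": hashtags, "sentiment": sentiment}

for record in map(structure, raw_posts):
    print(record)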
Top tools used to store and analyze Big Data
 Apache Hadoop
 Hive
 Sqoop
 Flume
 Presto
 Spark (see the sketch below)
 Kafka
 NiFi, and many more
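As a taste of one of these tools, here is a minimal PySpark sketch that aggregates a hypothetical transactions file (it assumes pyspark is installed; the file name and its columns are invented). Spark distributes the aggregation across a cluster, which is what makes Walmart-scale hourly volumes tractable.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a hypothetical transactions.csv with columns: store_id, amount.
spark = SparkSession.builder.appName("retail-demo").getOrCreate()

txns = (spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv("transactions.csv"))

# The aggregation is split across executors and combined at the end.
per_store = (txns.groupBy("store_id")
             .agg(F.sum("amount").alias("total_sales"),
                  F.count("*").alias("txn_count")))

per_store.orderBy(F.desc("total_sales")).show(10)
spark.stop()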
To Conclude:
 Data = understanding. "The goal is to turn data into information, and information into insight."
 Data drives the business and decision making.
 Choosing a career path in data handling will be a worthwhile decision for the future.
Data Modelling - Points to Remember
 What are the dimension and fact tables? (How many?)
 Which schema is used: star or snowflake? (see the sketch after this list)
 Primary keys and foreign keys used
 Types of indexes used - bitmap, function-based, etc.
 Performance tuning
 Bottlenecks
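To ground these points, below is a minimal star-schema sketch in Python with SQLite: one fact table carrying measures and foreign keys, two dimension tables, and a typical dimensional join query. All table and column names are invented for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")

# Dimension tables: descriptive attributes, one row per member.
cur.execute("""CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT)""")
cur.execute("""CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)""")

# Fact table: measures plus a foreign key to each dimension (star schema).
cur.execute("""CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units INTEGER, revenue REAL)""")

cur.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01', 'Jan')")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Gadgets')")
cur.execute("INSERT INTO fact_sales VALUES (20240101, 1, 3, 29.97)")

# Typical dimensional query: join the fact table to its dimensions.
for row in cur.execute("""
    SELECT d.month, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category"""):
    print(row)
conn.close()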
A word to your heart:
Everything You’ve Ever Wanted Is On The
Other Side Of Fear.

Thank you. All the very best!
