com 02/07/2023 1
What is Business Intelligence
The term Business Intelligence refers collectively to the tools and technologies used for
the collection, integration, analysis, and visualization of data. The raw data which we
simpo313@gmail.com 02/07/2023 2
To simplify the concept, we collect raw data from various sources and with the
can store such data in data warehouses or data lakes in specific data structures
From the data warehouses, we can retrieve stored data in the form of a report or
query, or build a dashboard to conduct data analysis. We do this with the process
So What is Data Warehousing?
Data warehousing is the process of storing data in data warehouses, which are
A data warehouse is known by several other terms like Decision Support
How Does Data Warehousing Work?
In data warehousing, data is denormalized, i.e., it is converted from 3NF to 2NF.
Denormalization increases data redundancy, and so the size of the stored data
increases. This growth is why the result is sometimes loosely called big data,
although that term usually refers to data sets too large for conventional tools.
The sole purpose of creating data warehouses is to
Also, to provide aggregate data such as totals, averages, and general trends, which
enterprises analyze to make decisions that benefit their business and its standing in
the industry.
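As a minimal sketch of what "aggregate data" means here, the snippet below computes a total and an average per region from detail rows. The region names, amounts, and the function name `aggregate_by_region` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical detail rows: (region, sale_amount). Values are made up.
sales = [
    ("North", 120.0),
    ("North", 80.0),
    ("South", 200.0),
    ("South", 100.0),
    ("South", 60.0),
]

def aggregate_by_region(rows):
    """Return {region: {"total": ..., "average": ...}} for the given rows."""
    grouped = defaultdict(list)
    for region, amount in rows:
        grouped[region].append(amount)
    return {
        region: {"total": sum(amounts), "average": mean(amounts)}
        for region, amounts in grouped.items()
    }

summary = aggregate_by_region(sales)
```

A warehouse would typically precompute and store such summaries so that reports do not have to scan every detail row.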
Components of Data Warehouse
enterprise which serve a unique purpose and contribute in their ways for the
Planning (ERP), etc. All of these systems have their own normalized database
Integration Layer:
The normalized data present in the operational systems must not be manipulated.
Instead, we take a copy of that data into an integration-layer staging area where
One basic operation is to bring the copied data into a single standardized format,
because data in the operational systems is not all in the same format. For
instance, the same data field can hold values in pounds in one table and in dollars in another.
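The standardization step can be sketched as follows. This assumes the warehouse standard is US dollars and uses a fixed, made-up exchange rate; the names `GBP_TO_USD`, `to_usd`, and the row layout are hypothetical.

```python
# Assumed exchange rate, for illustration only; a real pipeline would look it up.
GBP_TO_USD = 1.25

def to_usd(amount, currency):
    """Convert an amount to US dollars, the (assumed) warehouse standard."""
    if currency == "USD":
        return amount
    if currency == "GBP":
        return amount * GBP_TO_USD
    raise ValueError(f"unsupported currency: {currency}")

# Copies of rows from two hypothetical operational tables.
staged_rows = [
    {"order_id": 1, "amount": 100.0, "currency": "GBP"},
    {"order_id": 2, "amount": 50.0, "currency": "USD"},
]

# Build new standardized rows; the staged copies themselves are left unchanged.
standardized = [
    {"order_id": r["order_id"], "amount_usd": to_usd(r["amount"], r["currency"])}
    for r in staged_rows
]
```

Raising an error on an unknown currency, rather than passing the value through, keeps mixed units from silently reaching the warehouse.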
Data Warehouse:
The transformed and standardized data flows into the next element, known
as the data warehouse, which is a very large database. Data from all over
the enterprise is stored in this data vault in second normal form.
Data Marts:
These are purpose-specific sub-databases of the data warehouse, each containing only some
part of the entire data. In each data mart, only the data that is useful for a particular
purpose is available; for example, there will be different data marts for analysis related to marketing,
These databases do not overlap or share data with each other, and operations
performed in one do not influence the others. This makes fetching
data from the data marts much faster than doing it from the much larger data warehouse.
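A minimal sketch of this idea, assuming each warehouse row carries a subject tag: a data mart is built by copying only the rows for one subject, so a change in one mart does not affect another. The row fields and the function name `build_data_mart` are hypothetical.

```python
# Hypothetical warehouse rows tagged with the subject area they belong to.
warehouse = [
    {"subject": "marketing", "campaign": "spring", "clicks": 340},
    {"subject": "finance", "account": "payroll", "amount": 9000},
    {"subject": "marketing", "campaign": "autumn", "clicks": 120},
]

def build_data_mart(rows, subject):
    """Copy only the rows for one subject area into a separate list."""
    return [dict(row) for row in rows if row["subject"] == subject]

marketing_mart = build_data_mart(warehouse, "marketing")
finance_mart = build_data_mart(warehouse, "finance")

# Changing one mart leaves the other mart and the warehouse untouched.
marketing_mart[0]["clicks"] = 0
```

Because each mart holds fewer rows than the warehouse, a query against it has less data to scan.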
Business Intelligence and Data
Warehousing
Data warehousing and Business Intelligence often go hand in hand, because the
data made available in the data warehouses are central to the Business
BI tools like Tableau, Sisense, Chartio, and Looker use data from the data
warehouses for purposes like querying, reporting, analytics, and data mining.
In any enterprise, Business Intelligence plays a central
role in its smooth and cost-effective functioning.
From our prior discussions, we know that data warehouses store processed
and aggregated data. Business Intelligence tools require such data from the
data warehouses.
The data is delivered through Online Analytical Processing (OLAP). Data
warehousing and OLAP have proved to be a much-needed jump from the old
Architecture and Process of Data
Warehousing and BI
In this section, we will see how to extract, transform and load raw data into
data warehouses. Also, we discuss how BI tools use it for analytical purposes.
Step 1: Raw data is extracted from data sources like traditional data, workbooks, Excel
files, etc.
Step 2: The raw data collected from different data sources is consolidated
and integrated to be stored in a special database called a data warehouse. The process
by which we fetch the data into data warehouses from the source is ETL (Extract,
Transform, Load). This extracts raw data from the original sources, transforms or
Step 3: If you wish to use data from the data warehouse for specific purposes
like marketing analysis, financial analysis, etc., subsets of the data warehouse,
known as data marts and data cubes, are created. Data from the data
Step 4: From both the data warehouse and the data marts, data is redirected to data
cubes, or OLAP cubes, which are multi-dimensional data sets whose data is ready to
At the front end are BI tools for querying, reporting, analysis, and
data mining. These BI tools query data from OLAP cubes and use it for
analysis.
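The four steps can be sketched end to end with an in-memory SQLite database standing in for the warehouse. This is an illustration under stated assumptions, not a production design: the table names `facts` and `marketing_mart`, the columns, and the sample rows are all hypothetical, and the "cube" is approximated by a grouped aggregate.

```python
import sqlite3

# Step 1: extract. Hypothetical raw rows, e.g. read from a workbook.
raw_rows = [
    ("2023-01", "marketing", " 100 "),
    ("2023-01", "finance", "250"),
    ("2023-02", "marketing", "300"),
]

# Step 2: transform (trim and convert amounts) and load into the warehouse.
clean_rows = [(month, dept, float(amount.strip())) for month, dept, amount in raw_rows]
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE facts (month TEXT, department TEXT, amount REAL)")
warehouse.executemany("INSERT INTO facts VALUES (?, ?, ?)", clean_rows)

# Step 3: a data mart as a purpose-specific subset of the warehouse.
warehouse.execute(
    "CREATE TABLE marketing_mart AS SELECT * FROM facts WHERE department = 'marketing'"
)

# Step 4: a cube-like aggregate over two dimensions, which a BI tool would query.
cube = warehouse.execute(
    "SELECT month, department, SUM(amount) FROM facts "
    "GROUP BY month, department ORDER BY month, department"
).fetchall()
mart_total = warehouse.execute("SELECT SUM(amount) FROM marketing_mart").fetchone()[0]
```

A real OLAP engine precomputes and indexes such aggregates; the `GROUP BY` query here only shows the shape of the result.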
Summary
Thus, Business Intelligence and Data Warehousing are two important pillars in the
Definition
Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer, product, sales,
employees.
Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources
Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among
E.g., when shortlisting your top 20 customers, you must know that “HAL” and “Hindustan Aeronautics
Limited” are one and the same. Much of the transformation and loading work that goes into the data
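One way to picture this integration step is an alias table that maps every known spelling of a customer to a single canonical name before totals are computed. The sketch below is an assumption about how that could look; `CANONICAL_NAMES`, `canonical_customer`, the customer "Acme", and the order amounts are made up.

```python
# Hypothetical alias table mapping name variants to one canonical customer name.
CANONICAL_NAMES = {
    "hal": "Hindustan Aeronautics Limited",
    "hindustan aeronautics limited": "Hindustan Aeronautics Limited",
}

def canonical_customer(name):
    """Return the canonical name, or the trimmed input if no alias is known."""
    cleaned = name.strip()
    return CANONICAL_NAMES.get(cleaned.lower(), cleaned)

# Made-up order amounts recorded under different spellings in source systems.
orders = [("HAL", 500), ("Hindustan Aeronautics Limited", 700), ("Acme", 300)]

totals = {}
for customer, amount in orders:
    key = canonical_customer(customer)
    totals[key] = totals.get(key, 0) + amount
```

Without the mapping, the same customer would appear twice in a top-customers ranking, each time with only part of its total.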
Data Warehouse—Time Variant/time
referenced data
The time horizon for the data warehouse is significantly longer than
that of operational systems.
Operational database: current value data.
Data warehouse data: provide information from a historical perspective (e.g.,
past 5-10 years)
For example, the user may ask “What were the total sales of product
A for the past three years on New Year’s Day across region Y?”
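A question like this only works if the warehouse keeps dated history. A minimal sketch, assuming each stored row carries its sale date; the rows, the function name `new_year_sales`, and its parameters are hypothetical.

```python
from datetime import date

# Made-up historical sales rows: (sale_date, product, region, amount).
history = [
    (date(2021, 1, 1), "A", "Y", 10),
    (date(2022, 1, 1), "A", "Y", 20),
    (date(2023, 1, 1), "A", "Y", 30),
    (date(2023, 1, 2), "A", "Y", 99),   # not New Year's Day
    (date(2023, 1, 1), "B", "Y", 50),   # different product
    (date(2020, 1, 1), "A", "Y", 40),   # outside the three-year window
]

def new_year_sales(rows, product, region, last_year, years=3):
    """Total sales on 1 January over `years` years ending at `last_year`."""
    first_year = last_year - years + 1
    return sum(
        amount
        for sale_date, prod, reg, amount in rows
        if prod == product and reg == region
        and sale_date.month == 1 and sale_date.day == 1
        and first_year <= sale_date.year <= last_year
    )

total = new_year_sales(history, "A", "Y", last_year=2023)
```

An operational database that holds only current values could not answer this, because the 2021 and 2022 rows would already have been overwritten or purged.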
And………
Time-referenced data when analyzed can also help in spotting the hidden
obvious to the naked eye. This exploration activity is termed “data mining”.
Data Warehouse—Non-Volatile
Metadata
Metadata is simply defined as data about data. For example, the index of a book serves
as metadata for the contents of the book. In other words, we can say that metadata
Metadata acts as a directory. This directory helps the decision support system to
Metadata Repository
The Metadata Repository is an integral part of a data warehouse system. The
Business Metadata - This metadata has the data ownership information, business
Currency of data indicates whether the data is active or archived. Lineage of data means
Data for mapping from operational environment to data warehouse - This
metadata includes source databases and their contents, data extraction, data
Data cube
A data cube helps us represent data in multiple dimensions. The data cube is defined by
dimensions and facts.
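Dimensions and facts can be sketched as two kinds of records: a dimension row describes something (here, an item), and a fact row holds keys into the dimensions plus a numeric measure. The sketch assumes a sales setting; the class names, the `item_key` and `units_sold` fields, and all values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """Row of a hypothetical "item" dimension table."""
    item_key: int      # assumed key column linking facts to this dimension
    item_name: str
    item_type: str
    item_brand: str

@dataclass(frozen=True)
class SalesFact:
    """One fact row: a value for each dimension plus a numeric measure."""
    time: str
    item_key: int
    branch: str
    location: str
    units_sold: int    # assumed measure

items = {1: Item(1, "Laptop X", "computer", "BrandCo")}
facts = [
    SalesFact("2023-01", 1, "B1", "New Delhi", 5),
    SalesFact("2023-02", 1, "B1", "New Delhi", 7),
]

# Join each fact to its dimension row to get a descriptive report line.
report = [(f.time, items[f.item_key].item_name, f.units_sold) for f in facts]
```

Keeping descriptive attributes in the dimension table means the fact rows stay small and each attribute is stored once.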
Illustration of Data cube
Suppose a company wants to keep track of sales records, with the help of a sales data warehouse, with
respect to time, item, branch and location. These dimensions make it possible to keep track of monthly
sales and of the branch at which the items were sold. There is a dimension table associated
with each dimension. This dimension table further describes the dimension. For example, the
"item" dimension table may have attributes such as item_name, item_type and item_brand.
The following table represents a 2-D view of sales data for a company with
respect to time, item and location dimensions.
But here in this 2-D table we have records with respect to time and item only.
The sales for New Delhi are shown with respect to the time and item dimensions.
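A 2-D view for a single location can be thought of as a slice of data that also carries a location dimension. The sketch below assumes cells keyed by (time, item, location); the quarters, items, the second city, and all numbers are made up, and `slice_by_location` is a hypothetical helper.

```python
# Made-up 3-D sales data keyed by (time, item, location).
cube = {
    ("Q1", "Mobile", "New Delhi"): 10,
    ("Q1", "Modem", "New Delhi"): 20,
    ("Q2", "Mobile", "New Delhi"): 30,
    ("Q1", "Mobile", "Mumbai"): 40,   # assumed second location
}

def slice_by_location(cells, location):
    """Fix the location dimension and return a 2-D view {(time, item): value}."""
    return {
        (time, item): value
        for (time, item, loc), value in cells.items()
        if loc == location
    }

new_delhi_view = slice_by_location(cube, "New Delhi")
```

Fixing one dimension to a single value in this way is commonly called slicing the cube.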
Suppose we want to view the sales data with one more dimension, say the location
dimension. The 3-D view of the sales data with respect to time, item, and
The above 3-D table can be represented as a 3-D data cube, as
shown in the following figure:
Data mart
A data mart contains a subset of organisation-wide data. This subset is valuable to a
specific group within the organisation. In other words, we can say that a data mart contains only that
For example, the marketing data mart may contain only data related to item, customers
Points to remember about data marts:
Graphical Representation of data mart.
Process Flow in Data Warehouse:
The ETL Process
You get the data out of its original source location (E), you
The ETL Process
Extract, transform, and load (ETL) is a process in data warehousing that
involves:
extracting data from source systems (these are the Online Transaction
Processing (OLTP) systems)
Extract
The first part of an ETL process is to extract data from the source systems.
Transform
Some data sources will require very little manipulation of data. In other
POSSIBLE DATA TRANSFORMATIONS
1. Selecting only certain columns to load (or selecting null columns not to
load)
2. Translating coded values (e.g., if the source system stores M for male and F
for female, but the warehouse stores 1 for male and 2 for female)
Summarizing multiple rows of data (e.g., total sales for each region)
etc.)
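Three of the transformations listed above (selecting columns, translating coded values, and summarizing rows) can be sketched as plain functions. The source rows, column names, and function names are assumptions made for illustration; only the M/F to 1/2 coding comes from the list.

```python
# Made-up source rows; column names are assumptions for illustration.
source_rows = [
    {"name": "Ann", "gender": "F", "region": "East", "sales": 100, "fax": None},
    {"name": "Bob", "gender": "M", "region": "East", "sales": 50, "fax": None},
    {"name": "Cy", "gender": "M", "region": "West", "sales": 70, "fax": None},
]

GENDER_CODES = {"M": 1, "F": 2}                # warehouse coding from the list
KEEP_COLUMNS = ("gender", "region", "sales")   # drops "name" and the null "fax"

def transform(rows):
    """Select the wanted columns and translate coded values for each row."""
    out = []
    for row in rows:
        kept = {col: row[col] for col in KEEP_COLUMNS}
        kept["gender"] = GENDER_CODES[kept["gender"]]
        out.append(kept)
    return out

def total_sales_by_region(rows):
    """Summarize multiple rows into one total per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

transformed = transform(source_rows)
region_totals = total_sales_by_region(transformed)
```

An unknown gender code raises a `KeyError` here, which surfaces bad source data instead of loading it silently.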
Transformation types
Data must be merged from different systems, e.g. one source may store the
data warehouse that are independent of keys from the data sources.
Load
The load phase loads the data into the data warehouse. The data loaded can