
Acquiring knowledge of

In Business Intelligence
Objectives
Our goal is to understand the basic concepts of data warehousing and to be able to create different models for building OLAP cubes.

To understand the ETL (Extract, Transform, and Load) process.

And to be able to find patterns using data mining techniques.
Data Warehousing

Data warehousing is a process and technology that involves collecting, storing, and managing data from various sources within an organization in a central repository.

This central repository is known as a data warehouse, and it is designed to facilitate data analysis, reporting, and business intelligence activities.
Key Aspects of Data Warehousing

01 Data Integration: Data warehousing involves gathering data from diverse sources, such as operational databases, spreadsheets, and external systems. This data is integrated into a unified format and structure for analysis.

02 Data Transformation: Data is often transformed and cleansed to ensure consistency and accuracy within the data warehouse. This includes tasks like data validation, data cleansing, and data enrichment.

03 Data Storage: Data warehouses are optimized for storage and query performance. They typically use a specific schema design, like a star or snowflake schema, to efficiently store and organize data.

04 Query and Reporting: Data stored in the data warehouse can be easily queried and used for generating reports and performing data analysis. This supports business intelligence and decision-making processes.

05 Historical Data: Data warehouses often maintain historical data, allowing organizations to analyze trends and make informed decisions based on historical performance.

06 Decision Support: Data warehousing supports strategic decision-making by providing a consolidated and reliable source of data for business analysts, managers, and decision-makers.
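The integration and transformation aspects can be illustrated with a minimal sketch. The two source layouts, the field names, and the helper functions below are hypothetical and exist only for illustration. The sketch maps two differently shaped sources onto one unified format, then validates and cleanses each record before it would be stored.

```python
from datetime import date

# Hypothetical rows from two source systems with different layouts.
crm_rows = [{"CustomerName": " alice ", "SignupDate": "2023-01-15"}]
sheet_rows = [("bob", "2023/02/01"), ("", "2023/02/03")]


def from_crm(row):
    """Integration: map an operational-database row to the unified format."""
    return {"name": row["CustomerName"], "signup": row["SignupDate"]}


def from_sheet(row):
    """Integration: map a spreadsheet row to the unified format."""
    name, signup = row
    return {"name": name, "signup": signup.replace("/", "-")}


def cleanse(record):
    """Transformation: validate and normalise one record.

    Returns None when the record fails validation (here: an empty name).
    """
    name = record["name"].strip().title()
    if not name:
        return None
    return {"name": name, "signup": date.fromisoformat(record["signup"])}


# Integrate both sources, then keep only the records that pass cleansing.
unified = [from_crm(r) for r in crm_rows] + [from_sheet(r) for r in sheet_rows]
warehouse_rows = [c for c in map(cleanse, unified) if c is not None]
```

In this sketch the record with an empty name is rejected, so only two cleansed rows would reach the warehouse.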
Where is a Data Warehouse commonly used?

Retail
Retailers use data warehousing to store and analyze sales data, customer information, inventory levels, and more. This helps them make informed decisions about pricing, inventory management, and customer preferences.

Finance
Financial institutions, such as banks and investment firms, use data warehousing to store transaction data, customer accounts, and market data. This enables risk analysis, fraud detection, and regulatory compliance.

Healthcare
Healthcare organizations use data warehousing to consolidate patient records, medical histories, billing information, and clinical data. This supports medical research, patient care, and compliance with healthcare regulations.

Marketing
Marketing firms and departments use data warehousing to consolidate data from various marketing channels, such as digital marketing campaigns, social media, and customer surveys. This data supports marketing analytics, customer segmentation, and campaign optimization.

Telecom
Telecommunication companies use data warehousing to store call records, network performance data, and customer usage data. This allows them to monitor network health, optimize service delivery, and analyze customer behavior.

Education
In the field of education, data warehousing is increasingly used to gather and analyze data related to student performance, enrollment trends, faculty information, and institutional operations. This data aids in improving educational outcomes, resource allocation, and administrative decision-making.
Data Mart
Master Data Management
Data Mart

A data mart is a smaller, focused subset of a data warehouse that caters to the specific needs of a particular department, team, or group within an organization. Data marts are often created to improve performance and simplify access to data.
Difference between a Data Mart and a Data Warehouse

TYPE OF DATA
  Data mart: Summarized historical.
  Data warehouse: Summarized historical (in traditional DWs).

DATA SOURCES
  Data mart: Fewer source systems, which are operationally focused.
  Data warehouse: Wide variety of source systems from all across the enterprise.

USE CASE OR SCOPE
  Data mart: Analyzing smaller data sets (typically <100 GB) focused on a particular subject to support analytics and business intelligence (BI).
  Data warehouse: Analyzing large (typically 100+ GB), complex, enterprise-wide datasets to support data mining, BI, artificial intelligence, and machine learning.

DATA GOVERNANCE
  Data mart: Easier because data is already partitioned.
  Data warehouse: Requires strict governance rules and systems to access data.
3 Types of Data Marts
Dependent

A data mart that relies on the central data warehouse for its data
source, ensuring consistency and reducing data duplication.
Independent

A standalone data repository created for specific departmental needs, separate from the central data warehouse.
Hybrid

Combines elements of both independent and dependent data marts, using the central data
warehouse as the primary source while allowing for some local data storage and
processing.
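As a rough illustration of a dependent data mart, the sketch below defines the mart as a view over a warehouse table, so the mart holds no copy of the data. The table, view, and column names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# In-memory database standing in for the central data warehouse (assumption).
wh = sqlite3.connect(":memory:")
wh.executescript("""
CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL);
INSERT INTO warehouse_sales VALUES
    ('marketing', 'ads',    200.0),
    ('finance',   'audit',  900.0),
    ('marketing', 'events', 300.0);

-- Dependent data mart: a view that reads directly from the warehouse,
-- so there is no duplicated data to keep in sync.
CREATE VIEW marketing_mart AS
    SELECT product, amount
    FROM warehouse_sales
    WHERE department = 'marketing';
""")

mart_rows = wh.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
```

An independent data mart would instead load its own tables from the source systems, and a hybrid one would combine a view like this with some locally stored tables.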
Master Data Management

Master Data Management is a comprehensive method for managing an organization's critical data assets, often referred to as master data. This data includes information about customers, products, employees, and other entities that need to be consistent and accurate across the organization.
Data Warehouse Dimensions
What is a Dimension in Data Warehousing?

In data warehousing, a dimension is a collection of reference information that supports a measurable event, such as a customer transaction. In this context, events are referred to as "facts." Dimensions provide the details necessary to understand and analyze a set of related facts.
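The relationship between facts and dimensions can be sketched with a tiny star-style layout. The table names, columns, and sample rows below are hypothetical. Each fact row records a measurable event (a sale amount), and the dimension row it points to supplies the descriptive detail (customer name and region) needed to analyze it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension: reference information about customers.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT,
    region       TEXT
);

-- Fact: one row per measurable event (a sale), keyed to the dimension.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    amount       REAL
);

INSERT INTO dim_customer VALUES (1, 'Alice', 'North'), (2, 'Bob', 'South');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# The dimension gives the facts their meaning: total sales per region.
totals = conn.execute("""
    SELECT d.region, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS d ON d.customer_key = f.customer_key
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
```

Without the join to the dimension table, the fact rows would only show amounts against opaque keys.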
3 Types of Data Dimensions
Conformed Dimension

A conformed dimension is a dimension (like "customer" or "product") that multiple reports or analyses can use consistently, without any confusion. It's like having a shared vocabulary.
Junk Dimension

A junk dimension is like a drawer where you put things that don't belong anywhere else. It holds attributes such as flags or yes/no values that don't need their own special category.
Degenerate Dimension

A degenerate dimension is information that's right there in the data you're looking at. For example, in a sales report, you might see invoice or order numbers without needing a separate document to explain them. The identifier lives in the fact record itself, with no separate dimension table behind it.
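The junk and degenerate dimensions can be sketched together in a few lines. The flag names, the order number format, and the `build_fact` helper are hypothetical. The junk dimension enumerates every combination of two yes/no flags under one key, and the order number is kept directly on the fact row as a degenerate dimension.

```python
from itertools import product

# Hypothetical yes/no flags that do not deserve their own dimensions.
flag_names = ("is_gift", "is_online")

# Junk dimension: one surrogate key per combination of flag values.
junk_dimension = {
    combo: key
    for key, combo in enumerate(
        product((False, True), repeat=len(flag_names)), start=1
    )
}


def build_fact(order_number, amount, is_gift, is_online):
    """Build one fact row for a sale."""
    return {
        # Degenerate dimension: stored on the fact row, no lookup table.
        "order_number": order_number,
        # Single key pointing into the junk dimension.
        "junk_key": junk_dimension[(is_gift, is_online)],
        "amount": amount,
    }


fact = build_fact("INV-1001", 59.9, is_gift=True, is_online=False)
```

With two flags the junk dimension has four rows, so the fact row carries one small key instead of two separate flag columns.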
Data Vault Modelling
What is Data Vault Modelling?
Data Vault modeling is a highly flexible and scalable methodology used in data warehousing. It is designed to address the challenges posed by rapidly changing data sources, evolving business requirements, and the need for adaptability in the data warehousing environment. Data Vault modeling provides a structured approach to data warehouse design that accommodates various data types and allows for seamless data integration.
The 3 Core Components of Data Vault Modelling

Hubs
These represent the core business entities, such as customers or products. Hubs act as a central repository for unique identifiers and links to related satellite data.

Satellites
Satellites contain descriptive information about the hubs, capturing attributes and historical changes over time. They are linked to hubs and can be updated independently without altering the hubs.

Links
Links establish relationships between hubs and are used to connect different business entities or events within the data.
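The three components can be sketched as plain data structures. The class layout, field names, and sample keys below are assumptions made for illustration, not a standard Data Vault schema. The sketch shows a hub holding only a unique identifier, satellites accumulating history for that hub without the hub changing, and a link connecting two hubs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hub:
    """Core business entity, identified only by its unique business key."""
    business_key: str


@dataclass(frozen=True)
class Satellite:
    """Descriptive attributes of one hub as of a given load date."""
    hub: Hub
    load_date: date
    attributes: tuple  # tuple of (name, value) pairs


@dataclass(frozen=True)
class Link:
    """Relationship between two or more hubs."""
    hubs: tuple


customer = Hub("CUST-001")
product = Hub("PROD-042")

# History is captured by adding satellite rows; the hub is never modified.
history = [
    Satellite(customer, date(2023, 1, 1), (("city", "Manila"),)),
    Satellite(customer, date(2024, 6, 1), (("city", "Cebu"),)),
]

purchase = Link((customer, product))


def current_attributes(hub, satellites):
    """Return the most recently loaded attributes for a hub."""
    rows = [s for s in satellites if s.hub == hub]
    return dict(max(rows, key=lambda s: s.load_date).attributes)


latest = current_attributes(customer, history)
```

Because both satellite rows stay in `history`, earlier states remain available for analysis while `latest` reflects the newest load.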
OLAP Cube
What is an OLAP Cube?
An OLAP cube is a multi-dimensional array of data. Online analytical processing (OLAP) is a computer-based technique of analyzing data to look for insights. The term cube here refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than three.
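A minimal sketch of the idea, assuming a three-dimensional cube with hypothetical dimensions (product, region, quarter) and made-up unit counts. Each cell is addressed by one member from every dimension, and analysis consists of aggregating or filtering along those dimensions.

```python
from collections import defaultdict

DIMENSIONS = ("product", "region", "quarter")

# Each cell of the cube: (product, region, quarter) -> units sold.
cube = {
    ("Laptop", "North", "Q1"): 10,
    ("Laptop", "South", "Q1"): 4,
    ("Phone", "North", "Q1"): 7,
    ("Phone", "North", "Q2"): 9,
}


def roll_up(cube, keep):
    """Aggregate the cube, keeping only the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(int)
    for coords, value in cube.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)


def slice_cube(cube, dimension, member):
    """Keep only the cells where `dimension` equals `member`."""
    i = DIMENSIONS.index(dimension)
    return {c: v for c, v in cube.items() if c[i] == member}


by_product = roll_up(cube, ["product"])
q1_only = slice_cube(cube, "quarter", "Q1")
```

Adding a fourth dimension would only mean longer coordinate tuples, which is the hypercube case mentioned above.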
Parallel Processing
What is Parallel Processing?
Processing large amounts of data is typical for data warehouse environments. Depending on the available hardware resources, sooner or later the point is reached where a job can no longer be processed on a single processor, or can no longer be represented by a single process.
3 main reasons why data warehouse jobs cannot be processed on a single processor

Time Requirements
It demands the use of multiple processors.

System Resources
System resources (memory, disk space, temporary tablespace, rollback segments, etc.) are limited.

Recurrent Errors
Requires the repetition of the process.
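The idea of splitting one job into several smaller ones can be sketched as follows. This is only an illustration under stated assumptions: the `split` and `load_chunk` helpers and the sample data are hypothetical, and a thread pool is used to keep the example short. For CPU-bound work spread over multiple processors, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def split(rows, n_chunks):
    """Split rows into at most n_chunks pieces of near-equal size."""
    size = -(-len(rows) // n_chunks)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def load_chunk(chunk):
    """Stand-in for one unit of warehouse work (here: summing amounts)."""
    return sum(chunk)


amounts = list(range(1, 101))  # hypothetical job: 100 rows to aggregate
chunks = split(amounts, 4)

# Each chunk is handled by its own worker. A failed chunk could be
# re-run on its own instead of repeating the whole job.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(load_chunk, chunks))

total = sum(partials)
```

Smaller chunks also keep each worker's memory and temporary space needs below what a single large job would require.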
Advantages or Pros of Data Warehousing

Data Warehouse: Centralized data repository.
Data Mart: Departmental focus.
Data Warehouse Dimensions: Improved data analysis.
Data Vault Modelling: Flexibility and scalability.
Parallel Processing: Enhanced performance.
