UNIT 04 - Data Science - Final
All Rights Reserved, Copyright © 2018 Prof. Amresh Kumar, GHRCE, Nagpur
Outline
1. Introduction to Data Warehouse,
2. Data Warehouse Architecture,
3. Data Warehouse Models,
4. Need for Data Warehousing,
5. OLTP and OLAP system design,
6. Introduction to data mining,
7. KDD Process,
8. Relational Vs Non-Relational databases.
Introduction to Data Warehouse
Introduction
• The term "Data Warehouse" was coined by Bill Inmon in 1990.
• Data Warehouse:
o They are central repositories of integrated data from one or more disparate
sources.
o They store current and historical data in a single place, and that data is
used to create analytical reports for workers throughout the enterprise
(helping the organization analyze its business).
o A data warehouse helps executives organize, understand, and use their data
to make strategic business decisions.
• Data warehousing:
o Data warehousing is the process that uses a data warehouse to analyze and
transform data into information, thereby enabling the business to examine its
operations and performance.
o Data warehousing is the process of designing, constructing, and using data
warehouses for the ETL process (Extraction, Transformation, Loading) and
for reporting.
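The ETL-and-reporting flow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `RAW_SALES` records, the `sales` table, and the function names are hypothetical and do not come from the slides, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records from an operational source (assumed for illustration).
RAW_SALES = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "South", "amount": "80.00"},
    {"order_id": "3", "region": "NORTH", "amount": "not available"},
]

def extract():
    """Extraction: read raw records from the source system."""
    return list(RAW_SALES)

def transform(rows):
    """Transformation: normalize text, convert types, drop unusable rows."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip records whose amount cannot be parsed
        region = row["region"].strip().lower()
        cleaned.append((int(row["order_id"]), region, amount))
    return cleaned

def load(conn, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

# In-memory database used as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# Reporting: total sales per region.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

The third record is dropped during transformation because its amount cannot be parsed, so the report covers only the two valid orders.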
Introduction to Data Warehouse Conti…
Data Warehouse Applications
• As discussed before, a data warehouse helps business executives to
organize, analyze, and use their data for decision making.
• Star schema:
o The star schema is a simple and common modeling paradigm.
o The schema resembles a star, with the dimension tables arranged in a radial
pattern around the central fact table.
o The dimensions in the fact table are connected to the dimension tables
through primary and foreign keys.
• Snowflake schema:
o The snowflake schema is a variant of the star schema in which the dimension
tables are organized hierarchically.
o In this schema, the fact table is connected to various dimension and
sub-dimension tables through primary and foreign keys.
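To make the star layout concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`), columns, and sample rows are all hypothetical and chosen only for illustration. The fact table holds foreign keys that point at the primary keys of the surrounding dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per product / store (hypothetical data).
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

-- Central fact table: each row references the dimensions by foreign key.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER
);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Notebook');
INSERT INTO dim_store   VALUES (10, 'Nagpur'), (20, 'Pune');
INSERT INTO fact_sales  VALUES (100, 1, 10, 5), (101, 2, 10, 3), (102, 1, 20, 7);
""")

# An analytical query joins the fact table to its dimensions.
rows = conn.execute("""
    SELECT s.city, p.name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY s.city, p.name
    ORDER BY s.city, p.name
""").fetchall()
print(rows)
```

A snowflake variant of this sketch would split a dimension further, for example by moving a product category into its own sub-dimension table referenced from `dim_product`.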
Star and Snowflake Schema Conti…
Types of Clustering
1. Hard Clustering: In hard clustering, each data point either belongs to a
cluster completely or does not belong to it at all. For example, in the above
example each customer is put into exactly one of the 10 groups.
2. Soft Clustering: In soft clustering, instead of putting each data point into
a single cluster, each data point is assigned a probability or likelihood of
belonging to each cluster. For example, in the above scenario each
customer is assigned a probability of being in each of the 10 clusters of the
retail store.
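The difference between the two assignment styles can be shown with a short Python sketch. The centroids and the customer point below are hypothetical. The soft scores use a softmax over negative distances, which is just one illustrative way to turn distances into probabilities and is an assumption, not a method taken from the slides.

```python
import math

def distances(point, centroids):
    """Euclidean distance from one point to every cluster centre."""
    return [math.dist(point, c) for c in centroids]

def hard_assign(point, centroids):
    """Hard clustering: the point belongs to exactly one cluster (the nearest)."""
    d = distances(point, centroids)
    return d.index(min(d))

def soft_assign(point, centroids):
    """Soft clustering: one membership probability per cluster.

    Assumption: probabilities come from a softmax over negative distances.
    """
    weights = [math.exp(-d) for d in distances(point, centroids)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical cluster centres and a single customer described by two features.
centroids = [(0.0, 0.0), (4.0, 0.0)]
customer = (1.0, 0.0)

label = hard_assign(customer, centroids)   # index of a single cluster
probs = soft_assign(customer, centroids)   # probabilities that sum to 1
print(label, probs)
```

The hard result is a single cluster index, while the soft result keeps a degree of membership for every cluster, with the nearer cluster receiving the larger share.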
Clustering Techniques Conti…
Types of clustering algorithms
• Connectivity models: Examples of these models are hierarchical clustering
algorithm and its variants.
• Centroid models: K-Means clustering algorithm is a popular algorithm that
falls into this category.
• Distribution models: A popular example of these models is the Expectation-
Maximization algorithm, which uses multivariate normal distributions.
• Density models: Popular examples of density models are DBSCAN and
OPTICS.
Step 02: Randomly assign each data point to a cluster.
Let’s assign three points to cluster 1, shown in red,
and two points to cluster 2, shown in grey.
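The random assignment in this step can be sketched as follows. The five points are hypothetical coordinates, since the slide does not list them. Fixing the cluster sizes at three and two is done only to mirror the slide's example; an implementation might instead draw each label independently.

```python
import random

def random_assignment(points, sizes, seed=None):
    """Randomly label points so that cluster i (1-based) receives sizes[i-1] points."""
    labels = [cluster for cluster, n in enumerate(sizes, start=1) for _ in range(n)]
    if len(labels) != len(points):
        raise ValueError("cluster sizes must add up to the number of points")
    rng = random.Random(seed)  # a seed makes the example repeatable
    rng.shuffle(labels)        # which point lands in which cluster is random
    return labels

# Hypothetical data: five 2-D points (assumed, not taken from the slide).
points = [(1, 1), (2, 1), (4, 3), (5, 4), (3, 2)]

# Three points go to cluster 1 and two points to cluster 2, as in the slide.
labels = random_assignment(points, sizes=(3, 2), seed=0)
for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```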