Exploratory Data Analysis
Regression
Data Visualization
Advanced Data Visualization
01
DESCRIPTIVE
ANALYTICS
CASE STUDY
Exploratory Data Analysis (EDA) refers to the critical process of performing initial investigations
on data in order to discover patterns, spot anomalies, test hypotheses and check
assumptions with the help of summary statistics and graphical representations.
It is good practice to understand the data first and to gather as many insights from it as possible.
EDA is all about making sense of the data in hand before getting your hands dirty with it.
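As a minimal sketch of the "summary statistics" and "spot anomalies" steps described above, the snippet below summarizes a numeric series per store and flags unusual values with the common 1.5 x IQR rule of thumb. The revenue figures, the `summarize` and `flag_anomalies` helpers, and the choice of the IQR rule are all assumptions made for illustration; they are not taken from the case study.

```python
import statistics

# Hypothetical daily revenue per store (illustrative numbers, not from the case study).
daily_revenue = {
    "Store 1": [520, 480, 510, 495, 530, 505, 515, 500],
    "Store 2": [610, 590, 605, 600, 595, 615, 1200, 608],
}


def summarize(values):
    """Return basic summary statistics for one numeric series."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def flag_anomalies(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (a rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


for store, values in daily_revenue.items():
    print(store, summarize(values), "anomalies:", flag_anomalies(values))
```

In practice the flagged values would be inspected (data entry error, promotion day, and so on) rather than dropped automatically.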
[Figure: Store 1 through Store 8]
EXPLORATORY DATA ANALYSIS
STEP BY STEP
REVENUE BY STORE
CUSTOMER BY STORE
• Flagship stores are the segment with the steepest revenue decline compared to a year ago. The decline
in value resulted from a significant drop in store visitors compared to both the last period and a year ago.
• Mass is the best performer: despite a decrease compared to Q4’19, it gained 6%
versus Q1’19, and its share leapt from 26% to 32%.
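The two kinds of figures quoted in these bullets, a period-over-period change and a share of total, come from simple arithmetic. The sketch below shows both calculations. The revenue amounts are hypothetical and were chosen only so that the arithmetic reproduces the 6%, 26% and 32% quoted above; the helper names `pct_change` and `share` are likewise illustrative.

```python
def pct_change(previous, current):
    """Relative change from a previous period to the current one."""
    return (current - previous) / previous


def share(part, total):
    """Contribution of one segment to the total."""
    return part / total


# Hypothetical revenue figures (not from the case study), picked so that the
# results match the percentages quoted in the bullets.
mass_q1_19, total_q1_19 = 260.0, 1000.0
mass_q1_20, total_q1_20 = 275.6, 861.25

print(f"Mass change vs. Q1'19: {pct_change(mass_q1_19, mass_q1_20):+.1%}")
print(f"Mass share in Q1'19:   {share(mass_q1_19, total_q1_19):.0%}")
print(f"Mass share in Q1'20:   {share(mass_q1_20, total_q1_20):.0%}")
```

Note that a segment's share can rise sharply even with modest growth when the total shrinks, which is consistent with the decline described for Flagship stores.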
DIAGNOSTIC
ANALYTICS
BUSINESS ANALYTICS TECHNIQUE
HYPOTHESIS TESTING
[Figure: POPULATION vs. SAMPLE, with example values 490k, 575k, 520k, đ510.000 and đ530.000]
HYPOTHESIS TESTING
TECHNIQUE
Hypothesis testing
• Paired samples test
• Independent samples test
• Goodness of Fit Test
• Independence test of 2 nominal variables
HYPOTHESIS TESTING
TECHNIQUE
Z-test

One population proportion test
Purpose: Compare a population proportion with a given number.
• Test whether the number of defective products is under the threshold

Two population proportion test
Purpose: Compare the proportions of 2 independent populations with the same variance.
• Test whether more customers visit our stores on weekends than on weekdays

Non-parametric tests (Chi-square tests)

Goodness of Fit Test
Purpose: Determine whether or not a categorical variable follows a hypothesized distribution.
• Test if the ratio of male to female customers is 50:50

Independence test of 2 nominal variables
Purpose: Determine whether or not there is a significant association between two categorical variables.
• Test whether males prefer blue while females like pink
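The tests listed above can be sketched with the Python standard library alone. The snippet is a minimal illustration under stated assumptions: all counts are hypothetical, the function names are invented for this example, and the p-value helper is only valid for a chi-square statistic with 1 degree of freedom (which covers the 50:50 and 2x2 examples here). For more categories, a statistics package would normally be used.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, p_value) for H0: p = p0 against H1: p < p0 (lower tail)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    return z, NormalDist().cdf(z)


def chi_square_goodness_of_fit(observed, expected_ratios):
    """Chi-square statistic comparing observed counts with hypothesized ratios."""
    total = sum(observed)
    expected = [total * ratio for ratio in expected_ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_independence(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi_square_p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical data: 12 defective items out of 400, threshold 5%.
z, p_defects = one_proportion_z_test(successes=12, n=400, p0=0.05)

# Hypothetical data: 230 male and 170 female customers, hypothesized 50:50.
gof_stat = chi_square_goodness_of_fit([230, 170], [0.5, 0.5])

# Hypothetical 2x2 table: rows = male/female, columns = blue/pink.
independence_stat = chi_square_independence([[60, 40], [30, 70]])

print("defect rate:", z, p_defects)
print("goodness of fit:", gof_stat, chi_square_p_value_df1(gof_stat))
print("independence:", independence_stat, chi_square_p_value_df1(independence_stat))
```

A p-value below the chosen significance level (commonly 0.05) leads to rejecting the null hypothesis in each case.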
HYPOTHESIS TESTING APPLICATION
• A/B testing.
• Evaluate new campaign effectiveness.
• Product test: Compare 2 new products to
decide which one will be launched.
• Figure out different characteristics among
customer groups.
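An A/B test on conversion rates is commonly analysed as a two population proportion test. The sketch below uses a pooled two-proportion z-test with a two-sided alternative. The visitor and conversion counts, the function name and the 5% significance level are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: p_a = p_b, using a pooled proportion."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant A converts 120 of 1000 visitors, variant B 160 of 1000.
z_ab, p_ab = two_proportion_z_test(120, 1000, 160, 1000)
print(f"z = {z_ab:.3f}, p-value = {p_ab:.4f}")
print("significant at 5%" if p_ab < 0.05 else "not significant at 5%")
```

The same function could be reused for the campaign and product comparisons listed above, provided the outcome is a yes/no proportion and the two groups are independent.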
CORRELATION & REGRESSION
Correct. But…
H2: Basket size does increase as customers get older.
Correlation coefficient
CORRELATION
[Scatter plot: Basket size (200,000 to 700,000) against Age (15 to 75). Correlation coefficient = 0.9]
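For reference, a correlation coefficient like the one shown for basket size and age can be computed as Pearson's r. The sketch below implements the textbook formula directly. The age and basket-size values are hypothetical and will not reproduce the 0.9 from the slide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical customers: age and average basket size.
ages = [20, 30, 40, 50, 60]
basket_sizes = [300_000, 350_000, 420_000, 480_000, 560_000]

r = pearson_r(ages, basket_sizes)
print(f"correlation coefficient = {r:.3f}")
```

A value close to +1 or -1 indicates a strong linear relationship; it does not by itself say that one variable causes the other.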
CORRELATION
WHY DO OLDER PEOPLE PAY MORE FOR AN ORDER?
[Diagram: factors connected by arrows labelled "Affect"]
REGRESSION
[Plot: Basket size (200,000 to 700,000) against Age (15 to 50)]
• Regression is a statistical technique that identifies how variables affect each other.
• Regression analysis attempts to determine how independent variables (aka explanatory
variables or predictor variables) affect a dependent variable (aka response variable or
outcome variable).
• Linear regression is the most popular method; it fits the straight line that lies closest to the data points.
Variables:
Y: the dependent variable, which is numerical data (ordinal data is sometimes also
acceptable)
X1, X2: independent variables, which are numerical or ordinal data
β: the magnitude of the effect of X on Y
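To make the roles of Y, X and β concrete, here is a minimal sketch of ordinary least squares with a single independent variable, that is, Y = β0 + β1·X. The closed-form estimates are standard, but the data, function names and the restriction to one predictor are assumptions for illustration. With X1 and X2 together, a statistics package would normally be used.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance of ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical customers: X = age, Y = average basket size.
ages = [20, 30, 40, 50, 60]
basket_sizes = [300_000, 350_000, 420_000, 480_000, 560_000]

beta0, beta1 = fit_simple_linear_regression(ages, basket_sizes)
r2 = r_squared(ages, basket_sizes, beta0, beta1)
print(f"Basket size = {beta0:,.0f} + {beta1:,.0f} * Age (R squared = {r2:.1%})")
```

Here β1 is read as the expected change in basket size for each additional year of age, within the range of ages observed.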
REGRESSION
[Plot: Basket size against Age]
REGRESSION
NON-LINEAR REGRESSION
Quadratic
Y = β0 + β1·X1 + β2·X1^2
Cubic
Y = β0 + β1·X1 + β2·X1^2 + β3·X1^3
REGRESSION
[Plot: Basket size (200,000 to 700,000) against Age (15 to 75)]
Linear: Basket size = 407,638 + 1,849 * Age (R^2 = 19%)
Quadratic: Basket size = -133 + 13,305 * Age + 19,738 * Age^2 (R^2 = 40%)
R^2: the explanatory power of the model
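The comparison above, where adding a squared Age term raises R2 (R squared), can be reproduced in miniature. The sketch below fits a linear and a quadratic model by least squares (normal equations solved with Gaussian elimination) and compares their explanatory power. The data are hypothetical, deliberately generated from a curve that rises and then falls, so the coefficients and R2 values will not match those on the slide. The function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(a, b)


def r_squared(xs, ys, coeffs):
    """Share of the variance of ys explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    predicted = [sum(c * x ** p for p, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data (basket size in thousands) following a curve that peaks at age 50.
ages = [20, 30, 40, 50, 60, 70]
basket = [420, 520, 580, 600, 580, 520]

linear = fit_polynomial(ages, basket, 1)
quadratic = fit_polynomial(ages, basket, 2)
r2_linear = r_squared(ages, basket, linear)
r2_quadratic = r_squared(ages, basket, quadratic)
print("linear:", linear, "R squared:", r2_linear)
print("quadratic:", quadratic, "R squared:", r2_quadratic)
```

A higher R2 on the same data does not automatically mean a better model for forecasting, since adding terms never lowers R2 on the data used for fitting.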
CORRELATION & REGRESSION
APPLICATION
Correlation
• Measures the relationship between 2 factors that have no causal relationship.
• Measures the relationship between 2 factors before conducting regression analysis.
Regression
• Figures out which factors most strongly affect customer satisfaction.
• Forecast: Forecast market demand if we increase our price while competitors maintain
theirs.
• Optimization: Measure how much turnover will increase if we invest one more dollar in
advertising activities vs. one more dollar in promotions.
03
PREDICTIVE
ANALYTICS
BUSINESS ANALYTICS TECHNIQUE
Sales forecast for the next 6 months?
FORECASTING
Exponential smoothing
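A minimal sketch of simple (single) exponential smoothing follows, assuming the basic form s_t = α·x_t + (1 − α)·s_(t−1) with the first observation as the starting value. The monthly sales figures, the smoothing factor α = 0.5 and the function names are hypothetical. Simple exponential smoothing produces a flat forecast, so trend or seasonality would need an extended variant.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series; the first smoothed value is the first observation."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def forecast(series, alpha, horizon):
    """Flat forecast: the last smoothed level repeated for each future period."""
    level = exponential_smoothing(series, alpha)[-1]
    return [level] * horizon


# Hypothetical monthly sales.
monthly_sales = [100, 110, 105, 120]

print("smoothed:", exponential_smoothing(monthly_sales, alpha=0.5))
print("next 6 months:", forecast(monthly_sales, alpha=0.5, horizon=6))
```

A larger α reacts faster to recent months, while a smaller α gives a smoother and slower-moving forecast.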
The objective of classification is to predict the class of a target variable based on the values of
explanatory variables (predictors).
CLASSIFICATION TECHNIQUE
• Logistic Regression
• Naïve Bayes
• Stochastic Gradient Descent
• K-Nearest Neighbours (KNN)
• Decision Trees
• Random Forest
• Support Vector Machine (SVM)
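To illustrate what "predicting the class from predictors" means, here is a minimal sketch of one technique from the list, K-Nearest Neighbours: a new item gets the majority label of its k closest training items. The training data (age, visits per month, and whether the customer responded to a campaign) and the function name are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled customers: (age, visits per month) -> responded to campaign?
training = [
    ((22, 1), "no"), ((25, 2), "no"), ((30, 1), "no"),
    ((45, 6), "yes"), ((50, 8), "yes"), ((55, 7), "yes"),
]

print(knn_predict(training, (48, 7)))
print(knn_predict(training, (24, 1)))
```

Because the prediction relies on distances, predictors measured on very different scales are usually standardized first; that step is omitted here for brevity.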
CLASSIFICATION
DECISION TREES
[Decision tree diagram: root node "Weather", with leaves labelled No / Yes / No / Yes]
CLUSTERING
CUSTOMER SEGMENTATION
CLUSTERING
CLASSIFICATION VS. CLUSTERING
Classification
• Classification algorithms are supervised learning algorithms (classes are known a
priori in the training data).
• Classification algorithms can sometimes misclassify items.
• Classification is mostly applied to predictive problems.
Clustering
• Clustering algorithms are unsupervised learning algorithms (classes are not known a
priori).
• Clustering algorithms create complex criteria that can be difficult for Business
Analysts to decipher.
• Clustering is usually used for descriptive problems.
CLUSTERING TECHNIQUE
Hierarchical clustering
K Means
• Customer segmentation
• Social network analysis
• Search result grouping
• Image segmentation
• Anomaly detection
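As an illustration of the K Means technique applied to customer segmentation, the sketch below alternates between assigning each customer to the nearest centroid and moving each centroid to the mean of its members. The customer data (age, average basket size in thousands) are hypothetical, and initializing the centroids with the first k points is a simplifying assumption made so that the example is deterministic; real implementations usually use random or smarter initialization.

```python
import math


def k_means(points, k, max_iterations=100):
    """Return (centroids, clusters) after running K Means until assignments stabilize."""
    centroids = [tuple(float(c) for c in p) for p in points[:k]]  # simplistic init
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (age, average basket size in thousands).
customers = [(25, 200), (60, 600), (27, 220), (62, 640), (23, 210), (58, 620)]

centroids, clusters = k_means(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print("segment centre:", centroid, "members:", members)
```

No labels are supplied, which is the point of the contrast with classification: the segments emerge from the data and still need a business interpretation afterwards.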
ASSOCIATION
Association Application:
• Recommendation system
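Association analysis behind a recommendation system is often expressed through three measures for a rule "A implies B": support, confidence and lift. The sketch below computes them from a list of baskets. The transactions, the product names and the helper names are hypothetical, and this is only the scoring step, not a full rule-mining algorithm.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among baskets with the antecedent, the fraction that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to how often the consequent is bought anyway."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

print("support(bread, milk):", support(transactions, {"bread", "milk"}))
print("confidence(bread -> milk):", confidence(transactions, {"bread"}, {"milk"}))
print("lift(bread -> milk):", lift(transactions, {"bread"}, {"milk"}))
```

A lift above 1 suggests the two products are bought together more often than chance would predict, which is the kind of pair a recommendation system would surface.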
Combine insights from the previous analyses,
and then make data-driven decisions.
PRESCRIPTIVE
ANALYTICS
BUSINESS ANALYTICS TECHNIQUE
OPTIMIZATION