
Module 03

ANALYTICS TECHNIQUE & APPLICATION


BUSINESS ANALYTICS TECHNIQUE

Descriptive Analytics
• Data sources & Data type
• Data Preparation
• Descriptive Statistics
• Exploratory Data Analysis
• Data Visualization

Diagnostics Analytics
• Data mining
• Inferential Statistics
• Hypothesis testing
• Correlation
• Regression
• Advanced Data Visualization

Predictive Analytics
• Forecasting
• Classification
• Clustering
• Association

Prescriptive Analytics
• Optimization
• Simulation
• Machine learning & Artificial Intelligence
01

DESCRIPTIVE
ANALYTICS
CASE STUDY

A fashion company which owns 10 stores

1. How to increase revenue?


2. Should we open more stores?
3. How to attract new customers & retain current customers?
4. What is the right Product strategy?
5. How to optimize cost?
EXPLORATORY DATA ANALYSIS

Exploratory Data Analysis (EDA) is the critical process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses and check assumptions with the help of summary statistics and graphical representations.

It is good practice to understand the data first and gather as many insights from it as possible. EDA is all about making sense of the data in hand before getting your hands dirty with it.

Practically, EDA is a professional way of saying “reporting”.


The output of EDA mainly consists of crosstab tables and charts.
EXPLORATORY DATA ANALYSIS
CROSSTAB

[Crosstab table for Store 1 to Store 8]
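
A crosstab breaks one measure down by two fields at once. The sketch below builds one by hand, assuming a list of transactions; the store names follow the slide, while the gender field and all revenue figures are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical transactions: (store, customer gender, revenue in VND).
transactions = [
    ("Store 1", "Female", 530_000),
    ("Store 1", "Male", 510_000),
    ("Store 2", "Female", 420_000),
    ("Store 2", "Female", 610_000),
    ("Store 2", "Male", 300_000),
]

def crosstab(rows):
    """Sum revenue for every (store, gender) pair: rows are stores, columns are genders."""
    table = defaultdict(lambda: defaultdict(int))
    for store, gender, revenue in rows:
        table[store][gender] += revenue
    return table

table = crosstab(transactions)
columns = sorted({gender for _, gender, _ in transactions})

# Print the crosstab as a small text table.
print("Store".ljust(10) + "".join(c.rjust(12) for c in columns))
for store in sorted(table):
    print(store.ljust(10) + "".join(f"{table[store][c]:,}".rjust(12) for c in columns))
```

The same shape works for counts or averages; only the aggregation inside `crosstab` changes.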
EXPLORATORY DATA ANALYSIS
STEP BY STEP

Step 1: Understand data structure
• Available data field
• Data type

Step 2: Explore data one by one
• Average (Mean/ Median)
• Frequency & percentage of total (Count/ Distinct count/ Mode)
• Percentage of change
• Percentile

Step 3: Crosstab multi data field
• Average (Mean/ Median)
• Frequency & percentage of total (Count/ Distinct count/ Mode)
• Percentage of change

Step 4: Summarize valuable information
• Executive summary & Report
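
The single-field measures listed above (average, frequency and percentage of total, percentage of change, percentile) can be sketched with the Python standard library. All store figures and segment labels below are hypothetical and exist only to make the example runnable.

```python
from collections import Counter
from statistics import mean, median, quantiles

# Hypothetical revenue per store (million VND) for two consecutive quarters.
revenue_q1 = {"Store 1": 120, "Store 2": 95, "Store 3": 150, "Store 4": 80}
revenue_q2 = {"Store 1": 132, "Store 2": 95, "Store 3": 135, "Store 4": 88}
# Hypothetical segment of each store (a categorical field).
segment = ["Mass", "Mass", "Flagship", "Mainstream"]

values = list(revenue_q2.values())

# Average: mean and median of a numerical field.
avg, mid = mean(values), median(values)

# Frequency, percentage of total and mode of a categorical field.
counts = Counter(segment)
share = {name: n / len(segment) for name, n in counts.items()}
mode = counts.most_common(1)[0][0]

# Percentage of change versus the previous period.
pct_change = {s: (revenue_q2[s] - revenue_q1[s]) / revenue_q1[s] for s in revenue_q1}

# Percentiles: the three quartile cut points (25th, 50th, 75th).
q1, q2, q3 = quantiles(values, n=4)

print(avg, mid, mode, share["Mass"])
print({s: f"{c:+.0%}" for s, c in pct_change.items()})
```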
EXPLORATORY DATA ANALYSIS
STEP BY STEP – EXPLORE DATA ONE BY ONE

[Charts: Annual revenue; Revenue by store; Number of stores; Customers by store (Store 1 to Store 8)]


EXPLORATORY DATA ANALYSIS
STEP BY STEP – CROSSTAB MULTI-DATA FIELD

[Charts: Revenue by store; Customers by store (Store 1 to Store 8)]


EXPLORATORY DATA ANALYSIS
STEP BY STEP – SUMMARIZE VALUABLE INFORMATION

• Flagship stores are the segment with the steepest revenue decline compared to a year ago. The decline in value resulted from a significant drop in store visitors compared to both the last period and a year ago.
• Mass is the best performer: despite a decrease compared to Q4’19, it gained 6% versus Q1’19, and its share leapt from 26% to 32%.

[Charts: Revenue per store; Revenue contribution; Transactions per store, by segment (Mass, Mainstream, Flagship)]


EXPLORATORY DATA ANALYSIS
STEP BY STEP – SUMMARIZE VALUABLE INFORMATION

[Charts: Buyers per store; Number of visits per store; Conversion rate, by segment (Mass, Mainstream, Flagship)]


02

DIAGNOSTICS
ANALYTICS
HYPOTHESIS TESTING

Hypothesis: Do female customers tend to spend more than male customers?

This is a hypothesis that needs to be tested.
→ Further analysis needs to be done before concluding.

[Figure: Average basket size, đ510.000 vs. đ530.000]
HYPOTHESIS TESTING
Statistical Error

[Figure: average basket size, population vs. sample; values shown: đ510.000, đ530.000, 490k, 520k, 575k]
HYPOTHESIS TESTING
TECHNIQUE

Hypothesis testing
• One population mean test
• Two population mean test
  - Paired samples test
  - Independent samples test
• Mean test for more than 2 populations
• One population proportion test
• Two population proportion test
• Non-parametric tests
  - Goodness of Fit Test
  - Independence test of 2 nominal variables
HYPOTHESIS TESTING
TECHNIQUE

Testing case | Description | Method | Application
One population mean test | Compare a population mean with a given number | Z-test/ t-test | Check whether the average net weight of random products meets the requirement or not
Two population mean test: Paired test | Compare 2 means calculated from the same population | t-test | Evaluate effectiveness of 2 marketing campaigns (A/B testing); compare liking level of 2 new prototypes
Two population mean test: Independent test | Compare the means of 2 independent populations | Z-test/ t-test | Compare basket size between male & female
Mean test for more than 2 populations | Compare the means of >2 independent populations | ANOVA test | Compare basket size among age groups
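
As an illustration of the independent two-population mean test (basket size of male vs. female customers), here is a minimal sketch of a large-sample Z-test. The basket sizes are simulated, not real data, and the function name is hypothetical; for small samples a t-test would be the usual choice instead.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_z_test(a, b):
    """Z statistic and two-sided p-value for the difference of two independent means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Simulated basket sizes in VND (assumption: roughly normal, same spread).
random.seed(42)
female = [random.gauss(530_000, 120_000) for _ in range(300)]
male = [random.gauss(510_000, 120_000) for _ in range(300)]

z, p_value = two_sample_z_test(female, male)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

The difference is declared significant only when the p-value falls below the chosen significance level (commonly 0.05); otherwise the gap between the two averages may simply be sampling error.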
HYPOTHESIS TESTING
TECHNIQUE

Testing case | Description | Method | Application
One population proportion test | Compare a population proportion with a given number | Z-test | Test whether the share of defective products is under a threshold
Two population proportion test | Compare the proportions of 2 independent populations with the same variance | Z-test | Test whether more customers visit our stores on weekends than on weekdays
Non-parametric tests: Goodness of Fit Test | Determine whether or not a categorical variable follows a hypothesized distribution | Chi-square tests | Test if the ratio of male vs. female customers is 50:50
Non-parametric tests: Independence test of 2 nominal variables | Determine whether or not there is a significant association between two categorical variables | Chi-square tests | Test whether males prefer blue while females like pink
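
The goodness-of-fit case (is the male vs. female ratio 50:50?) can be sketched as follows. The customer counts are hypothetical, and the p-value shortcut used here is valid only for two categories (1 degree of freedom).

```python
from math import sqrt
from statistics import NormalDist

def chi_square_gof_two_groups(observed, expected_share=(0.5, 0.5)):
    """Chi-square goodness-of-fit test for two categories (1 degree of freedom)."""
    total = sum(observed)
    expected = [total * share for share in expected_share]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the chi-square variable is a squared standard normal,
    # so the p-value can be read off the normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(sqrt(statistic)))
    return statistic, p_value

# Hypothetical counts: 230 male and 270 female customers.
statistic, p_value = chi_square_gof_two_groups([230, 270])
print(f"chi-square = {statistic:.2f}, p-value = {p_value:.3f}")
```

For these made-up counts the statistic is 3.2, below the 5% critical value of 3.84 for 1 degree of freedom, so the 50:50 hypothesis would not be rejected.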
HYPOTHESIS TESTING APPLICATION

• A/B testing.
• Evaluate new campaign effectiveness.
• Product test: Compare 2 new products to
decide which one will be launched.
• Figure out different characteristics among
customer groups.
CORRELATION & REGRESSION

H1: Basket size is different among different Age groups.
Correct. But…

H2: Basket size does increase as Customers get older.
This cannot be concluded... yet.

→ Relationship: Age, Basket size

H1, H2: Hypothesis 1, 2


CORRELATION

Correlation is a measure of the degree of association and the direction of the relationship that exists between two random variables.

Correlation coefficient
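
A minimal sketch of the correlation coefficient, assuming the Pearson coefficient is the one meant (it is also what Excel's CORREL function returns). The ages and basket sizes are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)

# Hypothetical customers: age and basket size (VND).
ages = [20, 25, 30, 35, 40, 45]
basket = [400_000, 430_000, 480_000, 470_000, 540_000, 560_000]

r = pearson_r(ages, basket)
print(f"correlation coefficient = {r:.2f}")
```

The coefficient ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship).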
CORRELATION

[Figure: scatter plot of Basket size (200,000 to 800,000) against Age (15 to 75), annotated "Correlation coefficient = 0.4" and "Correlation coefficient = 0.9"; the coefficient is computed with an Excel formula]
CORRELATION
WHY DO OLDER PEOPLE PAY MORE FOR AN ORDER?

Hypothesis → Interpreting Hypothesis
• Because they are older? → Age affects Basket size.
• Because older people often have higher income? → Income affects Basket size.
• Because products for older people are more expensive? → Product price affects Basket size.
CORRELATION

Affect and Correlation

Correlation is only an association relationship and not a causal relationship.

[Diagram contrasting "Affect" with "Correlation"]
REGRESSION

[Figure: scatter plot of Basket size (200,000 to 800,000) against Age (15 to 50). Correlation coefficient = 0.9]

If the customer is 1 year older, how much more will their Basket size gain?

Chart is now limited to 50 years old, not 75.


REGRESSION
FORMULA

• Regression is a statistical technique that identifies how variables affect each other.
• Regression analysis attempts to determine how independent variables (aka explanatory variables or predictor variables) affect a dependent variable (aka response variable or outcome variable).
• Linear Regression analysis is the most popular method; it fits the straight line that lies closest to the data points.

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$

Variables:
$Y$: dependent variable, which is Numerical data (sometimes Ordinal is also acceptable)
$X_1, X_2, \dots, X_n$: independent variables, which are Numerical/ Ordinal data
$\beta$: the magnitude with which X affects Y
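
For the single-variable case Y = β0 + β1X, the coefficients can be estimated by ordinary least squares. The sketch below assumes that estimation method (the slides do not name one) and uses hypothetical age and basket-size data.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates (b0, b1) for the line Y = b0 + b1 * X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: change in Y per unit of X
    b0 = mean_y - b1 * mean_x    # intercept: value of Y when X = 0
    return b0, b1

# Hypothetical customers: age and basket size (VND).
ages = [20, 25, 30, 35, 40, 45]
basket = [400_000, 430_000, 480_000, 470_000, 540_000, 560_000]

b0, b1 = fit_simple_linear_regression(ages, basket)
print(f"Basket size = {b0:,.0f} + {b1:,.0f} * Age")
```

With several independent variables the same idea applies, but the estimates are usually obtained with a statistics package rather than by hand.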
REGRESSION

[Figure: scatter plot of Basket size (200,000 to 800,000) against Age (15 to 50) with the fitted regression line. Correlation coefficient = 0.9]

If the customer is 1 year older, how much more will their Basket size gain?

Basket size = 288,690 + 6,013 * Age
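
The fitted line on this slide answers the question directly: the slope is the expected gain per extra year of age. A small sketch using the slide's coefficients (the function name is hypothetical):

```python
def predicted_basket_size(age):
    """Fitted line from the slide: Basket size = 288,690 + 6,013 * Age."""
    return 288_690 + 6_013 * age

# The slope is the expected gain in basket size for one extra year of age.
gain_per_year = predicted_basket_size(31) - predicted_basket_size(30)
print(gain_per_year)               # 6013
print(predicted_basket_size(40))   # 529210
```

Because the line was fitted on customers aged 15 to 50, predictions outside that range should be treated with caution.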
REGRESSION
NON-LINEAR REGRESSION

LOGISTIC REGRESSION

POLYNOMIAL REGRESSION

Quadratic
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2$

Cubic
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_1^3$
REGRESSION

[Figure: scatter plot of Basket size (200,000 to 800,000) against Age (15 to 75) with the linear and polynomial fits]

LINEAR REGRESSION
Basket size = 407,638 + 1,849 * Age
R^2 = 19%

POLYNOMIAL REGRESSION
Basket size = -133 + 13,305 * Age + 19,738 * Age^2
R^2 = 40%

R^2: the explanatory power of the model
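
R^2 can be computed from actual and predicted values as one minus the ratio of unexplained to total variation. The sketch below assumes this standard definition; the numbers are illustrative only.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions explain."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in actual)                # total variation
    return 1 - ss_res / ss_tot

# Illustrative values: a model that is close but not perfect.
actual = [1, 2, 3, 4]
predicted = [1.5, 1.5, 3.5, 3.5]
print(r_squared(actual, predicted))
```

A model that predicts every point exactly scores 1, and a model that always predicts the average scores 0, which is why the polynomial fit (40%) is said to explain more than the linear fit (19%).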
CORRELATION & REGRESSION
APPLICATION

Correlation
• Measures the relationship of 2 factors that have no causal relationship.
• Measures the relationship of 2 factors before conducting regression analysis.

Regression
• Figures out which factors are the most important factors affecting customer satisfaction.
• Forecast: Forecast market demand if we increase our price while competitors maintain theirs.
• Optimization: Measure how much turnover will increase if we invest one more dollar in advertising activities vs. one more dollar in promotions.
03

PREDICTIVE
ANALYTICS
BUSINESS ANALYTICS TECHNIQUE

Descriptive Analytics
• Data sources & Data type
• Data Preparation
• Descriptive Statistics
• Exploratory Data Analysis
• Data Visualization

Diagnostics Analytics
• Data mining
• Inferential Statistics
• Hypothesis testing
• Correlation
• Regression
• Advanced Data Visualization

Predictive Analytics
• Regression
• Forecasting
• Classification
• Clustering
• Association

Prescriptive Analytics
• Optimization
• Simulation
• Machine learning & Artificial Intelligence
FORECASTING

Sales forecast for the next 6 months?
FORECASTING

• Forecasting is one of the most important and frequently addressed problems in analytics.
• Inaccurate forecasting can have a significant impact on both the top line and the bottom line of an organization.
Forecasting applications:
• Sales forecast.
• Demand forecast.
• Economic forecasting (GDP, CPI, Inflation,…).
• Price forecasting (Fuel, Stock,…).
• Transportation forecasting.
• Disaster prediction (earthquake, flood,…).
FORECASTING TECHNIQUE

Moving average method
Exponential smoothing
Auto-regressive moving average (ARMA)
Auto-regressive integrated moving average (ARIMA)
Autoregressive conditional heteroskedasticity (ARCH)
Generalized ARCH (GARCH)
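
The first two techniques in this list are simple enough to sketch by hand. The monthly sales figures, the window of 3 and the smoothing factor of 0.5 are all assumptions chosen for illustration; the ARMA/ARIMA and ARCH/GARCH families normally require a statistics library.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window

def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the forecast for the next period."""
    level = series[0]
    for value in series[1:]:
        # Recent observations get weight alpha, the running level gets the rest.
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical monthly sales.
sales = [100, 110, 105, 120, 130, 125]

ma = moving_average_forecast(sales, window=3)
es = exponential_smoothing_forecast(sales, alpha=0.5)
print(ma, es)
```

Both methods produce a one-step-ahead forecast; extending them to a 6-month horizon yields a flat line, which is one reason trend-aware and seasonal methods are used in practice.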


FORECASTING
CLASSIFICATION

• A company wants to predict which customers are likely to churn.
• A marketing team wants to identify customers who are likely to respond to a marketing campaign through phone calls/emails.
• An HR department tries to predict whether an employee will leave the company or not.
• Many organizations such as banks, e-commerce and insurance companies have to deal with fraudulent transactions. They may want to predict whether a transaction is fraudulent or not.
• A bank may want to classify its customers by risk, such as low-, medium- and high-risk customers in its loan portfolio.
• Sentiment about a product or service in social media can be classified as positive, negative,
or neutral.
• Health service providers based on diagnostic tests may classify the patients as positive or
negative.
• Predict outcome of any sporting event, for example, in case of football the outcome will be
Win, Draw or Loss.
CLASSIFICATION

The objective of classification is to predict the class of a target variable based on the values of explanatory variables or predictors.
CLASSIFICATION TECHNIQUE

Logistic Regression
Naïve Bayes
Stochastic Gradient Descent
K-Nearest Neighbours (KNN)
Decision Trees
Random Forest
Support Vector Machine (SVM)
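
Of the techniques listed, K-Nearest Neighbours is the shortest to sketch: a new item receives the majority class of its k closest training items. The churn data, the two features and k = 3 are hypothetical choices for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical customers: (months since last purchase, purchases last year) -> label.
train = [
    ((1, 12), "stay"), ((2, 10), "stay"), ((3, 8), "stay"),
    ((9, 1), "churn"), ((10, 2), "churn"), ((11, 1), "churn"),
]

print(knn_predict(train, (2, 9)))    # stay
print(knn_predict(train, (10, 1)))   # churn
```

In real use the features would normally be scaled first so that no single variable dominates the distance.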
CLASSIFICATION
DECISION TREES

Should we go out for sightseeing?

Weather
• Sunny → Humidity
  - High → No
  - Low → Yes
• Cloudy → Yes
• Rainy → Wind
  - Strong → No
  - Weak → Yes
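
A decision tree is a chain of if/else questions. The sketch below hand-codes the sightseeing tree, assuming the leaves read left to right as on the slide (Sunny and high humidity: No; Sunny and low humidity: Yes; Cloudy: Yes; Rainy and strong wind: No; Rainy and weak wind: Yes). In practice the tree is learned from data rather than written by hand.

```python
def go_sightseeing(weather, humidity=None, wind=None):
    """Follow the tree from the root (Weather) down to a Yes/No leaf."""
    if weather == "Sunny":
        return humidity == "Low"    # High humidity -> No, Low humidity -> Yes
    if weather == "Cloudy":
        return True                 # always Yes
    if weather == "Rainy":
        return wind == "Weak"       # Strong wind -> No, Weak wind -> Yes
    raise ValueError(f"unknown weather: {weather}")

print(go_sightseeing("Sunny", humidity="High"))   # False
print(go_sightseeing("Rainy", wind="Weak"))       # True
```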
CLUSTERING
CUSTOMER SEGMENTATION
CLUSTERING
CLASSIFICATION VS. CLUSTERING

Classification
• Classification algorithms are supervised learning algorithms (where classes are known a priori in the training data).
• Sometimes Classification algorithms can mistake one item for another.
• Classification is mostly applied to predictive problems.

Clustering
• Clustering algorithms are unsupervised learning algorithms (classes are not known a priori).
• Clustering algorithms create complex criteria that can be difficult for Business Analysts to decipher.
• Clustering is usually used for descriptive problems.
CLUSTERING TECHNIQUE

Hierarchical clustering

K Means

Fuzzy C Means Algorithm

Mean Shift Clustering

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

Expectation-Maximization Clustering with Gaussian Mixture Models (GMM)


CLUSTERING TECHNIQUE
HIERARCHICAL CLUSTERING & K MEANS

[Figures: Hierarchical clustering; K Means]
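
K Means alternates two steps until nothing changes: assign every point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is a bare-bones version with a naive initialization (the first k points); the customer data and k = 2 are hypothetical.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain K-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean of the cluster (kept as-is if empty).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits per year, average basket in 100k VND).
points = [(2, 3.0), (12, 6.5), (3, 2.5), (11, 7.0), (1, 3.5), (13, 6.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```

Production implementations use smarter initialization and several restarts, since the result depends on the starting centroids.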


CLUSTERING APPLICATION

• Customer segmentation
• Social network analysis
• Search result grouping
• Image segmentation
• Anomaly detection
ASSOCIATION

Association analysis is an unsupervised data mining technique where there is no target variable to predict. Instead, the algorithm discovers interesting relationships between data items in the form of rules.

Association Application:

• Market Basket analysis

• Recommendation system
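
In market basket analysis, a rule such as "shirt → tie" is commonly scored with support, confidence and lift. These three measures are standard but are not named on the slides, and the baskets below are hypothetical.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    has_antecedent = sum(1 for b in baskets if antecedent in b)
    has_consequent = sum(1 for b in baskets if consequent in b)
    support = both / n                        # how often the pair occurs at all
    confidence = both / has_antecedent        # P(consequent | antecedent)
    lift = confidence / (has_consequent / n)  # > 1 means bought together more than by chance
    return support, confidence, lift

# Hypothetical baskets from a fashion store.
baskets = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt", "jeans"},
    {"jeans", "belt"},
    {"shirt", "tie", "jeans"},
]

support, confidence, lift = rule_metrics(baskets, "shirt", "tie")
print(support, confidence, lift)
```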
Combine insights from previous analysis and then make data-driven decisions.

WHAT? Business Analytics is meaningless without a proper actionable plan.
04

PRESCRIPTIVE
ANALYTICS
OPTIMIZATION

Area | Decision Variables
Financial investment | Investment alternatives and amounts
Marketing | Advertising budget; where to advertise
Manufacturing | What and how much to produce; inventory levels; compensation programs
Accounting | Use of computers; audit schedule
Transportation | Shipments schedule; use of smart cards
Services | Staffing levels
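
To make the manufacturing row concrete ("what and how much to produce"), here is a toy optimization solved by brute-force enumeration. The two products, their profits (40 and 30 per unit) and the two capacity constraints are invented for illustration; real problems of this kind are usually handed to a linear programming solver.

```python
def best_production_plan(max_units=60):
    """Enumerate integer plans and keep the most profitable feasible one."""
    best = (0, 0, 0)  # (profit, units of product A, units of product B)
    for a in range(max_units + 1):
        for b in range(max_units + 1):
            # Assumed constraints: 40 units of storage, 60 machine hours (A needs 2, B needs 1).
            feasible = a + b <= 40 and 2 * a + b <= 60
            profit = 40 * a + 30 * b
            if feasible and profit > best[0]:
                best = (profit, a, b)
    return best

print(best_production_plan())  # (1400, 20, 20)
```

The decision variables are the quantities `a` and `b`; the optimizer's job is to pick the values that maximize the objective without breaking any constraint.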
SIMULATION

Simulation has emerged as the next generation in prescriptive analytics by answering “What if…” questions at the speed of business. The algorithm is used to find a good enough solution, or the best solution among alternatives, using experimentation.
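
A "What if…" question can be sketched as a small Monte Carlo simulation: what happens to average profit if the price changes? The demand model, unit cost and noise level below are assumptions made up for this example, not figures from the case study.

```python
import random

def simulate_profit(price, unit_cost=300_000, trials=10_000, seed=0):
    """Average monthly profit at a given price under uncertain demand."""
    rng = random.Random(seed)
    # Assumed demand model: fewer units sold as the price rises, plus random noise.
    expected_demand = 2_000 - 0.002 * price
    total = 0.0
    for _ in range(trials):
        demand = max(0.0, rng.gauss(expected_demand, 150))
        total += demand * (price - unit_cost)
    return total / trials

# Experiment with several candidate prices and compare the outcomes.
results = {price: simulate_profit(price) for price in (500_000, 600_000, 650_000, 700_000)}
for price, profit in results.items():
    print(f"price {price:,}: average profit {profit:,.0f}")
print("best price among alternatives:", max(results, key=results.get))
```

The simulation does not prove an optimum; it compares alternatives by experimentation, which matches the "good enough solution, or the best solution among alternatives" framing above.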
AI – ML - DL
MACHINE LEARNING vs. ARTIFICIAL INTELLIGENCE

Machine learning
• Machine learning is a subset of AI that uses algorithms/ mathematical models to help a computer learn from past data, enabling the computer to continue learning and improving on its own, based on experience.
• The goal of ML is to allow machines to learn from data so that they can give accurate output.

Artificial intelligence
• Artificial intelligence is the capability of a computer system to mimic human cognitive functions such as learning, planning and problem-solving. Artificial Intelligence applies machine learning, deep learning and other techniques to solve actual problems.
• The goal of AI is to make a smart computer system that, like humans, can solve complex problems.
MACHINE LEARNING vs. ARTIFICIAL INTELLIGENCE

Machine learning
• ML is working to create machines that can perform only those specific tasks for which they are trained. The main applications of ML are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.

Artificial intelligence
• AI is working to create an intelligent system that can perform various complex tasks. The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc.
MACHINE LEARNING vs. DEEP LEARNING

Machine Learning is old…

Deep Learning — The next big Thing


THANK YOU
