
The First Step in Information Management

MONTHLY SERIES

Descriptive, Prescriptive and Predictive Analytics


March 2, 2017


www.firstsanfranciscopartners.com
Polling Questions
§ What type of statistical analyses do you use or plan to use (can choose multiple answers)?
− Descriptive
− Predictive
− Prescriptive
− I don’t use any of these
− I don’t know the difference between these
§ How frequently do you use statistical analyses in your work?
− I don’t currently do any type of statistical analysis
− Less than once a week
− Once or a few times a week
− At least once a day

© 2017 First San Francisco Partners www.firstsanfranciscopartners.com pg 2


Topics For Today’s Webinar

§ Overview of statistical analysis process
− Forming a hypothesis
− Identifying appropriate sources
− Proving/Disproving the hypothesis
§ Types of data analysis
− Descriptive data analytics
− Predictive data analytics
− Prescriptive data analytics
§ How these types compare within the analytic environment
§ Key takeaways and suggested resources



The Process of Statistical Analysis
When resource constraints keep us from analyzing an entire population, statistical analysis enables us to make quantitative inferences about that population from the amount of information we can analyze (a sample).
§ Form hypotheses
− Null: nothing special
− Alternative: something unique, an actionable finding, etc.
§ Identify data source
− Don’t go overboard!
− Collect your own, OR use secondary data
§ Prove/disprove hypothesis
− Is Type I or Type II error worse?
− Choose confidence level
− Reject/not reject null
Step 1: Forming a Hypothesis
§ In statistical analysis, we have two hypotheses:
− Null hypothesis: Claims that any irregularities in the sample are due to chance
− Alternative hypothesis: Claims that irregularities in the sample are due to non-random causes (and would therefore reflect the population)
§ What are you really looking to discover/prove?
− Experiment 1:
§ Null: There is no difference in the amount sold when comparing salespeople who did
and did not receive training.
§ Alternative: There is a difference in the amount sold when comparing salespeople who
did and did not receive training.
− Experiment 2:
§ Null: The salespeople who received training do not sell more on average than the
salespeople who did not receive training.
§ Alternative: Salespeople who received the training sell more on average than those who
did not receive the training.
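Experiment 1 corresponds to a two-tailed test (a difference in either direction counts), while Experiment 2 is a one-tailed test (only "trained sells more" counts). A minimal Python sketch of that distinction, assuming a permutation test (a swapped-in technique, not necessarily the method used in this webinar) and hypothetical percent-of-quota figures; the function name `permutation_p_value` is illustrative only.

```python
import random


def permutation_p_value(group_a, group_b, alternative="two-sided",
                        n_resamples=10_000, seed=0):
    """Permutation-test p-value for the difference in means (group_a - group_b).

    alternative="two-sided": like Experiment 1 (a difference in either direction)
    alternative="greater":   like Experiment 2 (group_a has the higher mean)
    """
    if alternative not in ("two-sided", "greater"):
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    rng = random.Random(seed)  # fixed seed: the same shuffles on every call
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    def mean_diff(values):
        return sum(values[:n_a]) / n_a - sum(values[n_a:]) / n_b

    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # under the null, group labels are interchangeable
        diff = mean_diff(pooled)
        if alternative == "two-sided":
            hits += abs(diff) >= abs(observed)
        else:
            hits += diff >= observed
    # add-one correction keeps the estimate above zero
    return (hits + 1) / (n_resamples + 1)


# Hypothetical percent-of-quota figures, invented for illustration only
trained = [108, 112, 104, 110, 107, 115, 103, 109]
untrained = [101, 106, 99, 104, 100, 108, 97, 103]

p_two = permutation_p_value(trained, untrained, "two-sided")  # Experiment 1
p_one = permutation_p_value(trained, untrained, "greater")    # Experiment 2
print(f"Experiment 1 (two-tailed) p = {p_two:.4f}")
print(f"Experiment 2 (one-tailed) p = {p_one:.4f}")
```

When the observed difference points the way Experiment 2 predicts, the one-tailed p-value is never larger than the two-tailed one, which is why the hypothesis has to be fixed before looking at the data.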
Step 2: Identifying Appropriate Sources
§ Remember, you don’t need Big Data for every decision!
§ Sometimes, knowing what data you don’t need is just as important as knowing what you do need. Keep your end decision in mind.
§ Potential sources of data:
− Primary data − collect new data
§ Who to include: Random sample, stratified random sample, etc.
§ How many to include: Sample size calculators online (free)
§ Determine the level of measurement needed for your desired analysis: nominal (categorical), ordinal, interval, ratio
§ As necessary, design a control group
− Secondary data − utilize existing data
§ Census records, syndicated data, government data, etc.
§ Consider your data needs, data cleanliness, cost, etc., when determining
appropriate sources.
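Online sample size calculators differ in the formulas they use. One common choice, assumed here, is Cochran's formula for estimating a proportion, n = z²·p(1−p)/e², with an optional finite population correction. A minimal sketch; `sample_size_for_proportion` and its defaults are illustrative, not taken from any particular calculator.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(confidence=0.95, margin_of_error=0.05,
                               expected_proportion=0.5, population=None):
    """Cochran's sample size for estimating a proportion.

    expected_proportion=0.5 is the most conservative choice (largest n).
    population: if given, apply the finite population correction.
    """
    # two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_proportion
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)  # round up: a fractional respondent is not possible


print(sample_size_for_proportion())                 # 385 (large population)
print(sample_size_for_proportion(population=1000))  # 278 (e.g., 1,000 customers)
```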



Step 3: Proving/Disproving the Hypothesis
§ Establish a confidence level prior to analysis.
§ Confidence levels:
1. Determine how significant a difference/irregularity must be for you to reject the null hypothesis in favor of your alternative hypothesis.
2. Determine how confident you can be in your decision.
§ Even with a high confidence level, you aren’t always right:
− Type I error: You reject the null hypothesis but shouldn’t have (a false positive).
− Type II error: You do not reject the null hypothesis but should have (a false negative).
− How to decrease the likelihood of these errors: change the confidence level (raising it lowers the risk of a Type I error but raises the risk of a Type II error), increase sample size (be aware of effect size), etc.
§ Determine which type of error is more detrimental to your investigation and set
up your study accordingly.
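The trade-off between the two error types can be seen by simulation. A minimal sketch, assuming a two-sided z-test with a known standard deviation (a simplification) and hypothetical sales figures centered at 100; `rejection_rate` is an illustrative name. With no true difference, every rejection is a Type I error; with a real difference, every non-rejection is a Type II error.

```python
import random
from statistics import NormalDist


def rejection_rate(true_difference, n=30, sigma=10.0, alpha=0.05,
                   trials=4000, seed=1):
    """Share of simulated studies that reject the null of 'no difference'.

    Two-sided z-test with a known standard deviation (simplifying assumption).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sigma * (2 / n) ** 0.5                   # std. error of the mean difference
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(100, sigma) for _ in range(n)]
        treated = [rng.gauss(100 + true_difference, sigma) for _ in range(n)]
        z = (sum(treated) / n - sum(control) / n) / se
        rejections += abs(z) > z_crit
    return rejections / trials


type_1 = rejection_rate(true_difference=0)               # lands near alpha
type_2 = 1 - rejection_rate(true_difference=5)           # n = 30 per group
type_2_n120 = 1 - rejection_rate(true_difference=5, n=120)
print(f"Type I error rate:          {type_1:.3f}")
print(f"Type II error rate (n=30):  {type_2:.3f}")
print(f"Type II error rate (n=120): {type_2_n120:.3f}")
```

Quadrupling the sample size for the same (hypothetical) effect size sharply reduces the Type II error rate while the Type I rate stays near alpha.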



Step 3: Proving/Disproving the Hypothesis
§ Confidence level = 95%
§ Alpha = 0.05

Group statistics for QPctQ3 (percent of 3rd quarter quota sold):
Training      N    Mean       Std. Deviation   Std. Error Mean
No training   74   102.643    9.95482          1.15722
Training      74   106.3889   9.83445          1.14323

Independent samples test (Levene's Test for Equality of Variances: F, Sig.; t-test for Equality of Means: remaining columns):
                              F       Sig.    t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference (Lower, Upper)
Equal variances assumed       0.029   0.865   -2.303   146       0.023             -3.74595          1.6267                  (-6.96086, -0.53103)
Equal variances not assumed                   -2.303   145.978   0.023             -3.74595          1.6267                  (-6.96087, -0.53102)

[Chart: Percent of 3rd Quarter Quota Sold by Trained vs. Untrained Salespeople (No training vs. Training)]

§ Because Sig. (2-tailed) = 0.023 is below alpha = 0.05, we reject the null hypothesis of Experiment 1: trained and untrained salespeople differ in the percent of quota sold.
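The t, df and Std. Error Difference columns can be recomputed from the group statistics alone. A minimal sketch: the pooled (equal variances) and Welch (unequal variances) formulas are standard, while `t_tests_from_summary` and the critical value 1.976 (read from a t table for 146 df, two-tailed 5%) are assumptions of this sketch; exact p-values would need a t-distribution function, for example from SciPy, which is avoided here.

```python
import math


def t_tests_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled (equal variances) and Welch (unequal variances) t statistics."""
    diff = mean1 - mean2
    # pooled: both groups are assumed to share one variance
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Welch: each group keeps its own variance
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {
        "diff": diff,
        "pooled": (diff / se_pooled, n1 + n2 - 2, se_pooled),
        "welch": (diff / se_welch, df_welch, se_welch),
    }


result = t_tests_from_summary(74, 102.643, 9.95482, 74, 106.3889, 9.83445)
T_CRIT = 1.976  # approx. two-tailed 5% critical value of t with 146 df
for name in ("pooled", "welch"):
    t, df, se = result[name]
    lower = result["diff"] - T_CRIT * se
    upper = result["diff"] + T_CRIT * se
    print(f"{name}: t = {t:.3f}, df = {df:.3f}, SE = {se:.4f}, "
          f"95% CI = ({lower:.3f}, {upper:.3f}), reject null: {abs(t) > T_CRIT}")
```

Up to rounding of the reported means and of the critical value, this reproduces t = -2.303, df = 146 and 145.978, and the confidence interval in the table.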



Types of Data Analysis


§ Descriptive
− Aims to help uncover valuable insight from the data being analyzed
− Answers the question “What happened?”
§ Predictive
− Helps forecast behavior of people and markets
− Answers the question “What could happen?”
§ Prescriptive
− Suggests conclusions or actions that may be taken based on the analysis
− Answers the question “What should be done?”



Descriptive Data Analytics
§ Though it is the simplest type, descriptive analysis is the most commonly used.
§ Two types of descriptive analysis:
1. Measures of central tendency (tell us about the middle)
§ Mean − the average
§ Median − the midpoint of the responses
§ Mode − the response with the highest frequency
2. Measures of dispersion
§ Range − the min, the max and the distance between the two
§ Variance − the average squared distance of each data point from the mean
§ Standard Deviation − the square root of the variance; the most common/standard way of expressing the spread of data

[Chart: Mean, Median and Mode Amounts of Items Purchased (Mean 6.5, Median 2, Mode 1)]

Customer_ID   Items Purchased   Amount Spent
29304         1                 $1.09
28308         3                 $44.43
19962         21                $218.58
30281         1                 $73.02
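The measures above can be computed with Python's standard `statistics` module. This sketch uses the Items Purchased column of the sample table; the mean, median and mode match the chart. Whether to use the population or the sample variance is a choice the slide leaves open, so both are shown.

```python
import statistics

# Items Purchased for the four customers in the sample table
items = [1, 3, 21, 1]

print("Mean:  ", statistics.mean(items))     # 6.5
print("Median:", statistics.median(items))   # 2.0
print("Mode:  ", statistics.mode(items))     # 1
print("Range: ", max(items) - min(items))    # 20 (from 1 to 21)
# pvariance/pstdev treat the data as the whole population (divide by n);
# variance/stdev treat it as a sample (divide by n - 1)
print("Population variance:", statistics.pvariance(items))  # 70.75
print("Sample variance:    ", statistics.variance(items))
print("Sample std. dev.:   ", round(statistics.stdev(items), 2))
```

The single large order (21 items) pulls the mean well above the median, a quick sign of skew worth checking before choosing a model.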



Predictive Analysis

Predictive Data Analytics

§ Some people mistakenly assume that predictive analysis applies only to predicting future events.
− However, in cases such as sentiment analysis, existing data (e.g., the text of a tweet) is used to predict unknown data (whether the tweet is positive or negative).
§ Several of the models that can be used for predictive analysis are:
− Forecasting
− Simulation
− Regression
− Classification
− Clustering



Predictive Forecasting
§ Forecasting:
− Moving average technique: use the mean of prior periods to predict the next
§ The forecast for period 5 is the mean of periods 1−4
§ The forecast for period 6 is the mean of periods 2−5
− Exponential smoothing technique: similar, but more recent data points are weighted more heavily due to their relevance
− Regression techniques
§ Use caution in forecasting: the longer the forecasted time period, the less accurate the projections.

[Chart: Net Income of Store C, Projected 2017-2020 (net income from $0 to $25,000, years 2006 to 2022)]
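A minimal sketch of the two smoothing techniques, assuming a window of four periods, a smoothing weight alpha = 0.5, and hypothetical net income figures (not the Store C data, which is only shown as a chart). The function names are illustrative.

```python
def moving_average_forecast(history, window=4):
    """Forecast the next period as the mean of the last `window` periods."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return sum(history[-window:]) / window


def exponential_smoothing_forecast(history, alpha=0.5):
    """Simple exponential smoothing: newer observations get more weight.

    alpha (0 < alpha <= 1) is the weight on the newest observation; the
    smoothed level after the last period is the forecast for the next one.
    """
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical net income for periods 1-4
income = [10_000, 12_000, 14_000, 16_000]
print(moving_average_forecast(income))         # 13000.0: period 5 = mean of periods 1-4
print(exponential_smoothing_forecast(income))  # 14250.0
# Once period 5 is known (hypothetically 15,000), period 6 = mean of periods 2-5
print(moving_average_forecast(income + [15_000]))  # 14250.0
```

On this steadily rising series both forecasts fall below the latest value; averaging methods lag a trend, which is one reason regression techniques are also listed.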



Predictive Simulation

§ Simulation
− Queuing models: used to predict wait time and queue length
§ Results can be used to create staff schedules in a way that reduces inefficiencies, etc.
− Discrete event model: used in special situations when queuing cannot be used
§ Results can be used to identify bottlenecks, etc.
− Monte Carlo simulations: used to estimate how likely each possible outcome of a scenario is by running many iterations of the scenario with randomly generated inputs.
§ Results can be used to predict the likelihood of profitability within the first two years, etc.



Predictive Queuing Model Example

[Figure: Scenario 1 vs. Scenario 2]
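As a rough illustration of the comparison a queuing model supports, here is a minimal sketch assuming the simplest textbook model, an M/M/1 queue (one server, random arrivals, exponentially distributed service times). Both scenarios' rates are hypothetical, not the values behind the slide; real staffing problems often need multi-server (M/M/c) models.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state metrics for an M/M/1 queue. Rates are per hour."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrivals must be slower than service")
    rho = arrival_rate / service_rate  # server utilization
    lq = rho ** 2 / (1 - rho)          # average number waiting in line
    wq = lq / arrival_rate             # average wait in line (Little's law), hours
    return {"utilization": rho, "avg_queue_length": lq, "avg_wait_minutes": wq * 60}


# Hypothetical: 10 customers/hour arrive in both scenarios;
# Scenario 2 has a faster server (15/hour instead of 12/hour)
for name, service_rate in (("Scenario 1", 12), ("Scenario 2", 15)):
    m = mm1_metrics(10, service_rate)
    print(f"{name}: utilization {m['utilization']:.0%}, "
          f"queue length {m['avg_queue_length']:.2f}, "
          f"wait {m['avg_wait_minutes']:.1f} min")
```

A modest increase in service capacity cuts the average wait from 25 to 8 minutes in this example, the kind of result that feeds into staff scheduling.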



Predictive Monte Carlo Simulation Example
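A minimal Monte Carlo sketch of the profitability question from the previous slide. Every input is a hypothetical assumption: a $50,000 startup cost and monthly profit drawn independently from a normal distribution with mean $2,500 and standard deviation $1,500. `probability_profitable` is an illustrative name.

```python
import random


def probability_profitable(months=24, startup_cost=50_000, mean_monthly=2_500,
                           sd_monthly=1_500, trials=20_000, seed=42):
    """Monte Carlo estimate of P(cumulative profit > 0 after `months`)."""
    rng = random.Random(seed)
    profitable = 0
    for _ in range(trials):
        total = -startup_cost
        for _ in range(months):  # one random draw per simulated month
            total += rng.gauss(mean_monthly, sd_monthly)
        profitable += total > 0
    return profitable / trials


print(f"P(profitable within two years) ~ {probability_profitable():.2f}")
```

Because this toy model sums normal draws, the exact answer is known (about 0.91), which makes it a convenient check; the value of Monte Carlo comes when inputs are skewed or interdependent and no formula exists.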



Predictive Regression

§ Regression − generally speaking, used to understand the relationship between independent and dependent variables
§ Types of regression models:
− Logistic: used when the dependent variable is categorical (e.g., will customers shop at your store or a competitor’s?)
− Linear: used to identify a linear relationship between the dependent variable and at least one independent variable (e.g., daily store revenue predicted by the number of customers entering the store)
− Step-wise: used to select which independent variables belong in a model. This is done by adding/removing variables based on how those variables impact the overall strength of the model.
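The linear example above (daily revenue predicted by customer count) can be fitted by ordinary least squares. A minimal sketch with hypothetical data; `fit_simple_linear_regression` is an illustrative name, and logistic and step-wise regression are not shown.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variance in y explained by the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: customers entering the store vs. daily revenue ($)
customers = [120, 150, 90, 200, 170, 130]
revenue = [2400, 2950, 1900, 3900, 3300, 2650]
slope, intercept, r2 = fit_simple_linear_regression(customers, revenue)
print(f"revenue ~ {intercept:.0f} + {slope:.2f} * customers (R^2 = {r2:.3f})")
print(f"predicted revenue for 160 customers: ${intercept + slope * 160:,.0f}")
```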
Predictive Classification & Clustering

§ Classification: used to assign objects to one of several predefined categories
− Sentiment analysis of social media postings
§ Clustering: another method of forming groups, but without predefined categories
− Intragroup differences are minimized
− Intergroup differences are maximized
− Commonly used to create and better understand customer groups
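The slide does not name a clustering algorithm; k-means, assumed here, is one common choice that directly minimizes intragroup (within-cluster) distances. A minimal sketch on hypothetical customer features (visits per month, average basket in $), with centroids initialized from the first k points as a simplification (k-means++ initialization is more robust in practice).

```python
import math


def k_means(points, k, iterations=100):
    """Basic k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its points, until assignments stop changing."""
    centroids = [list(p) for p in points[:k]]  # simplistic initialization
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical customers: (visits per month, average basket in $)
points = [(2, 15), (20, 80), (3, 20), (22, 85), (1, 12), (19, 78)]
labels, centroids = k_means(points, k=2)
print(labels)     # cluster index for each customer
print(centroids)  # the "typical" customer of each group
```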



Prescriptive Analysis

Prescriptive Data Analytics

§ Decisions can be formulated from descriptive and predictive analysis
− If I need to cut a product and I know that product C is least preferred and least profitable, I will cut product C.
§ However, prescriptive analytics explicitly tells you the decisions that should be made. This can be done using a variety of techniques:
− Linear programming
− Integer programming
− Mixed integer programming
− Nonlinear programming



Prescriptive Linear Programming Example

Model setup:
                    Product A   Product B   Product C   Product D   Product E
Quantity to Order
Profit per Unit     $5          $3          $20         $50         $200
Total Profit: $-

Constraint          Product A   Product B   Product C   Product D   Product E   Used    Available
Storage Space       0.05        0.5         1           5           10                  1000
Selling Effort      0.25        5           0.5         2           7                   500
Minimum Order       100         15          20          60          5

Solution:
                    Product A   Product B   Product C   Product D   Product E
Quantity to Order   100         15          490         60          5
Profit per Unit     $5          $3          $20         $50         $200
Total Profit: $14,345.00

Constraint          Product A   Product B   Product C   Product D   Product E   Used    Available
Storage Space       0.05        0.5         1           5           10          852.5   1000
Selling Effort      0.25        5           0.5         2           7           500     500
Minimum Order       100         15          20          60          5
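The same model can be solved in code. A minimal sketch: maximize total profit subject to the storage and selling-effort limits, with the minimum orders as lower bounds. The small dense-tableau simplex solver below is written for illustration (exact fractions, most-negative-reduced-cost rule, no anti-cycling safeguards); production work would normally use a dedicated LP solver library.

```python
from fractions import Fraction as F


def simplex_max(c, A, b):
    """Maximize c.y subject to A y <= b and y >= 0, assuming b >= 0."""
    m, n = len(A), len(c)
    if any(F(v) < 0 for v in b):
        raise ValueError("infeasible: minimum orders already exceed a limit")
    # each row: constraint coefficients, slack columns, right-hand side
    T = [[F(v) for v in A[i]] + [F(int(i == j)) for j in range(m)] + [F(b[i])]
         for i in range(m)]
    T.append([-F(v) for v in c] + [F(0)] * (m + 1))  # objective row
    basis = [n + i for i in range(m)]                # start with all slacks (y = 0)
    while True:
        col = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][col] >= 0:
            break  # no column improves profit: optimal
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("problem is unbounded")
        row = min(rows, key=lambda i: T[i][-1] / T[i][col])  # ratio test
        pivot = T[row][col]
        T[row] = [v / pivot for v in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col] != 0:
                factor = T[i][col]
                T[i] = [a - factor * r for a, r in zip(T[i], T[row])]
        basis[row] = col
    y = [F(0)] * n
    for i, j in enumerate(basis):
        if j < n:
            y[j] = T[i][-1]
    return y


profit = [5, 3, 20, 50, 200]
storage = [F("0.05"), F("0.5"), 1, 5, 10]
effort = [F("0.25"), 5, F("0.5"), 2, 7]
minimum = [100, 15, 20, 60, 5]

# Substitute quantity = minimum + extra so every lower bound becomes extra >= 0
used_s = sum(s * q for s, q in zip(storage, minimum))
used_e = sum(e * q for e, q in zip(effort, minimum))
extra = simplex_max(profit, [storage, effort], [1000 - used_s, 500 - used_e])
quantity = [q + x for q, x in zip(minimum, extra)]
total_profit = sum(p * q for p, q in zip(profit, quantity))

print("Quantity to order:", [int(q) for q in quantity])
print(f"Total profit: ${float(total_profit):,.2f}")
print("Storage used:", float(sum(s * q for s, q in zip(storage, quantity))))
print("Selling effort used:", float(sum(e * q for e, q in zip(effort, quantity))))
```

Selling effort is the binding constraint, so the extra effort all goes to Product C, which earns the most profit per unit of effort ($20 / 0.5 = $40).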



Comparing the Three Types of Data Analytics
§ Descriptive analysis is most common.
− It is best practice to perform descriptive analyses before predictive/prescriptive analyses
§ Understand that the distribution, variance, skew, etc., of your data may rule out certain models
§ How to know which type of analysis to
pursue:
− How much time do you have?
− What resources are available to you?
− How accurate is your data? How accurate
do you need the model/analysis to be?
− How popular/accepted is the model you are considering?
§ Don’t subscribe to “that’s how we’ve always done it,” but
remember to use a model that stakeholders will accept.



Key Takeaways and Suggested Resources
§ Gaining meaningful insights from data requires planning, technical awareness and consistency.
§ Statistical analysis isn’t a replacement for your own logic (don’t go on statistical autopilot).
§ Utilize available resources (blogs, podcasts, articles, webinars and online courses) to learn more.
− Look for APPLIED statistics topics
§ Big data is not always required.
§ Basic understanding of the statistical
analysis process goes a long way!

− Guide: When Predictive Models Fail, searchdatamanagement.techtarget.com/ezine/Business-Information/When-predictive-analytics-models-produce-false-outcomes
− Book: Statistics in Plain English, Timothy C. Urdan
− Podcast: Not So Standard Deviations, https://soundcloud.com/nssd-podcast



Closing Q&A




Thank you!
See you Thursday, April 6 for our next DIA webinar,
Building a Flexible and Scalable Analytics Architecture

Catch our webinar recap next week here: firstsanfranciscopartners.com/blog

John Ladley, @jladley, john@firstsanfranciscopartners.com
Kelle O’Neal, @kellezoneal, kelle@firstsanfranciscopartners.com

