THE STATISTICS WORKFLOW
1) Understand what your sample data looks like
2) If the sample data fits a probability distribution, use it as a model for the entire population
3) If the sample doesn’t fit a distribution, use the central limit theorem to make estimates about population parameters
4) Continue to leverage the central limit theorem to draw conclusions about what a population looks like based on a sample
5) Use additional variables to increase the accuracy of your estimates and make predictions based on their relationships
PRO TIP: do what is required for the task and don’t go
overboard. If you have all the population data, or simply need
a bit of inspiration to make an “unimportant” decision, then
descriptive statistics may be all you need!
Maven Analytics
DESCRIPTIVE STATS: 3 TYPES
1) Represents the frequency of each value
Examples:
• Frequency Tables
• Histograms
2) Represents the middle of the values
Examples:
• Mean, Median, and Mode
• Skew
3) Represents the dispersion of the values
Examples:
• Min, Max, and Range
• Quartiles & Interquartile Range
• Box & Whisker Plots
• Variance & Standard Deviation
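As a rough illustration of the three types, here is a minimal Python sketch using only the standard library. The `scores` list is invented for this example and is not course data; the course itself uses Excel for these calculations.

```python
import statistics
from collections import Counter

# Hypothetical sample values, invented for illustration only
scores = [70, 72, 72, 75, 78, 80, 85, 85, 85, 98]

# 1) Frequency: how often each value occurs
frequency_table = Counter(scores)

# 2) Middle of the values
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# 3) Dispersion of the values
value_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print("Frequency table:", dict(frequency_table))
print("Mean, median, mode:", mean, median, mode)
print("Range and IQR:", value_range, iqr)
print("Variance and standard deviation:", variance, std_dev)
```

The `variance` and `stdev` functions treat the data as a sample. If you have the whole population, `statistics.pvariance` and `statistics.pstdev` are the population versions.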
PROBABILITY DISTRIBUTIONS
1) Discrete probability distributions
Examples: Uniform, Binomial, Poisson
• The height of each bar is its probability
• There are “gaps” between the numbers
2) Continuous probability distributions
Examples: Uniform, Exponential, Normal
• The height of the curve is NOT its probability, the area under the curve is (more on this later!)
• The numbers can take any value
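To make the bar height vs. area distinction concrete, here is a small Python sketch. The parameters (10 trials at 50%, a mean of 100 with a standard deviation of 15, and a narrow curve with a standard deviation of 0.1) are assumptions chosen for illustration, not values from the course.

```python
from math import comb
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Discrete: the bar height IS the probability (hypothetical 10 trials, p = 0.5)
p_exactly_5 = binomial_pmf(5, 10, 0.5)
total_probability = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

# Continuous: the probability is an AREA under the curve
# (hypothetical normal distribution with mean 100 and standard deviation 15)
dist = NormalDist(mu=100, sigma=15)
p_within_one_sd = dist.cdf(115) - dist.cdf(85)

# The curve height is a density, so it can even exceed 1
# (hypothetical narrow normal distribution with standard deviation 0.1)
narrow = NormalDist(mu=0, sigma=0.1)
height_at_mean = narrow.pdf(0)

print("P(exactly 5 successes):", p_exactly_5)
print("Sum of all bar heights:", total_probability)
print("P(85 <= X <= 115):", p_within_one_sd)
print("Curve height at the mean of the narrow curve:", height_at_mean)
```

A curve height above 1 is a quick way to see why height cannot be read as a probability for continuous distributions.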
CONFIDENCE INTERVALS
A confidence interval is an estimate of an unknown population value using a sample
• It is a range defined by a point estimate, like the sample mean, plus/minus a margin of error
• It includes a confidence level, or probability of including the population value (can’t be
certain!)
[Figure: normal curve of sample means centered on the unknown population mean μ, with several sample intervals drawn around x̄ = 239.9. The area under the curve between the bounds is the confidence level, and the distance between the mean and the bounds is the margin of error.]
Estimating the population mean: x̄ = 239.9, n = 95, μ = ?
Remember, the sample means are normally distributed around the population mean
It’s possible, but not probable, that the interval won’t include the mean!
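The point estimate plus/minus margin of error idea can be sketched in Python as follows. The sample mean (239.9) and sample size (95) come from the slide; the sample standard deviation and the 95% confidence level are assumptions added for illustration, since the slide does not give them. This sketch uses a normal (z) critical value as a simplification; a t critical value would give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 239.9  # from the slide
n = 95               # from the slide
sample_sd = 50.0     # ASSUMPTION: not given in the slide
confidence = 0.95    # ASSUMPTION: not given in the slide

# Critical value that leaves (1 - confidence) / 2 in each tail
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# The standard error shrinks as the sample size grows
standard_error = sample_sd / sqrt(n)
margin_of_error = z * standard_error

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"{confidence:.0%} confidence interval: {lower:.1f} to {upper:.1f}")
print(f"Margin of error: {margin_of_error:.2f}")
```

Raising the confidence level or lowering the sample size widens the interval, which is the trade-off behind "can’t be certain!".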
HYPOTHESIS TESTING
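1) Two-tailed test
H0: μ = μ0
Ha: μ ≠ μ0
The p-value is split across both tails: p/2 below tlower and p/2 above tupper
Excel p-value formulas:
=T.DIST(tlower, df, TRUE)*2
=T.DIST.2T(tupper, df)
2) Left-tailed test
H0: μ ≥ μ0
Ha: μ < μ0
The p-value is the area to the left of t
Excel p-value formulas:
=T.DIST(t, df, TRUE)
3) Right-tailed test
H0: μ ≤ μ0
Ha: μ > μ0
The p-value is the area to the right of t
Excel p-value formulas:
=1-T.DIST(t, df, TRUE)
=T.DIST.RT(t, df)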
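For readers working outside Excel, the three p-value cases can be sketched in Python. The standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule; that is a substitute chosen for this example, not how Excel computes its results. The function names and the t and df values in the usage lines are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): 0.5 plus the signed area from 0 to t (Simpson's rule)."""
    if t == 0:
        return 0.5
    h = t / steps  # negative when t is negative, so the area is signed
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def p_left_tailed(t, df):
    """Area to the left of t (Ha: mu < mu0)."""
    return t_cdf(t, df)


def p_right_tailed(t, df):
    """Area to the right of t (Ha: mu > mu0)."""
    return 1 - t_cdf(t, df)


def p_two_tailed(t, df):
    """Area in both tails beyond |t| (Ha: mu != mu0)."""
    return 2 * (1 - t_cdf(abs(t), df))


# Hypothetical test statistic and degrees of freedom
print("Two-tailed p-value:", p_two_tailed(2.228, 10))
print("Left-tailed p-value:", p_left_tailed(-1.5, 20))
print("Right-tailed p-value:", p_right_tailed(1.5, 20))
```

Because the t-distribution is symmetric, the left-tailed p-value for -t equals the right-tailed p-value for t.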
REGRESSION ANALYSIS
The goal of regression is to predict a dependent variable using independent variables
• This is achieved by fitting a line through the sample data points that models the population
This line is a model that can be used
to predict site traffic in a given month
based on the advertising budget!
This is the dependent variable (y), which is what you’re trying to predict
This is the independent variable (x), which helps you predict the dependent variable
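A minimal sketch of fitting such a line with ordinary least squares in Python. The budget and traffic numbers are invented for illustration and are not course data, and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly data, invented for illustration only
ad_budget = [1000, 2000, 3000, 4000, 5000]           # independent variable (x)
site_traffic = [12000, 15500, 18500, 22500, 25000]   # dependent variable (y)

slope, intercept = fit_line(ad_budget, site_traffic)

# Predict site traffic for a hypothetical budget of 3,500
predicted_traffic = intercept + slope * 3500

print(f"traffic = {intercept:.0f} + {slope:.2f} * budget")
print(f"Predicted traffic at a budget of 3500: {predicted_traffic:.0f}")
```

The slope is the estimated change in site traffic for each additional unit of advertising budget.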
NEW COURSE: STATS FOR DATA ANALYSIS!
• Why Statistics?: Discuss the role of statistics in the context of business intelligence and decision-making, and introduce the statistics workflow
• Descriptive Statistics: Understand data using descriptive statistics, including frequency distributions and measures of central tendency & variability
• Probability Distributions: Model data with probability distributions, and use the normal distribution to calculate probabilities and make value estimates
• Central Limit Theorem: Introduce the Central Limit Theorem, which leverages the normal distribution to make inferences on populations with any distribution
• Confidence Intervals: Make estimates with confidence intervals, which use sample statistics to define a range where an unknown population parameter likely lies
• Hypothesis Tests: Draw conclusions with hypothesis tests, which let you evaluate assumptions about population parameters using sample statistics
• Regression Analysis: Make predictions with regression analysis, and estimate the values of a dependent variable via its relationship with independent variables
THE COURSE PROJECT
You’ve just been hired as a Recruitment Analyst by Maven Business School, an
online startup that’s looking to disrupt the postgraduate programs offered by
traditional universities
You have data from the first graduating class of their MBA program, including
details & scores from their application, the program itself, and their employment
status 2 months later
Your goal is to leverage statistics to evaluate the results of this class, predict the
performance of future classes, and propose changes in recruitment to improve
graduate outcomes
• Understand the data with descriptive statistics
• Model the data with probability distributions
• Make estimates with confidence intervals
• Draw conclusions with hypothesis tests
• Make predictions with regression analysis
COURSE EXPECTATIONS
This course is about introducing & demystifying essential statistics concepts
• Our goal is to break down seemingly complex techniques with simple explanations
that build your intuition for when, why, and how to use them in the real world
It’s also about applying those concepts to real-world use cases
• As we introduce each topic, we’ll use Microsoft Excel to apply it through
hands-on demos & assignments, and include additional projects to test your
knowledge in different scenarios
We’ll be using Excel for Office 365 on a PC for the course demos
• What you see on your screen may not always match what you see on ours, especially
if you are running a different operating system or following along with an older
version of Excel
You do NOT need a math or stats background to take this course
• Although we will cover many statistical equations (and their equivalent Excel
functions), the focus will be on the meaning behind them and not on the
technical details or proofs
WHERE TO FIND THE COURSE
Like all Maven Analytics courses, Statistics
for Data Analysis is included with an
unlimited access subscription at
[Link]
For those who prefer to purchase individual
courses, this one just went live on Udemy
and you can get it for $9.99 with code:
STATSFORDATA
[Link]/course/essential-statistics-for-data-analysis/