
Inferential Statistics

Descriptive Statistics

• It summarizes the characteristics of a data set.


Inferential Statistics

• It helps you come to conclusions and make predictions based on your data.
• When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample is taken.
Inferential statistics have two main uses:

• making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
• testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).
Using descriptive statistics, you can report characteristics of your data:

• The distribution concerns the frequency of each value.
• The central tendency concerns the averages of the values.
• The variability concerns how spread out the values are (see the sketch below).
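A minimal Python sketch of these three summaries, using made-up SAT scores purely for illustration:

```python
import numpy as np
from collections import Counter

# Hypothetical SAT scores for one class (illustrative values only)
scores = np.array([1100, 1150, 1150, 1200, 1230, 1230, 1230, 1300, 1350, 1400])

# Distribution: frequency of each value
print("Frequencies:", Counter(scores.tolist()))

# Central tendency: mean and median
print("Mean:", scores.mean(), "Median:", np.median(scores))

# Variability: range and standard deviation
print("Range:", scores.max() - scores.min(), "Std dev:", round(scores.std(ddof=1), 1))
```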
Example

• Descriptive statistics
• You collect data on the SAT scores of all 11th
graders in a school for three years. You can use
descriptive statistics to get a quick overview of
the school’s scores in those years. You can then
directly compare the mean SAT score with the
mean scores of other schools.
Inferential statistics

• Use your sample to make reasonable guesses about the larger population.
• It's important to use random and unbiased sampling methods.
• Example: You randomly select a sample of 11th graders in your state and collect data on their SAT scores and other characteristics. You can use inferential statistics to make estimates and test hypotheses about the whole population of 11th graders in the state based on your sample data.
Sampling error in inferential statistics

• Since the size of a sample is always smaller than the size of the
population, some of the population isn’t captured by sample
data.
• This creates sampling error, which is the difference between
the true population values (called parameters) and the
measured sample values (called statistics).
• Sampling error arises any time you use a sample, even if your
sample is random and unbiased. For this reason, there is always
some uncertainty in inferential statistics. However, using
probability sampling methods reduces this uncertainty.
Estimating population parameters from
sample statistics

• The characteristics of samples and populations are described by numbers called statistics and parameters:
• A statistic is a measure that describes the sample (e.g., sample mean).
• A parameter is a measure that describes the whole population (e.g., population mean).
Sampling error

• It is the difference between a parameter and a corresponding statistic. Since in most cases you don't know the real population parameter, you can use inferential statistics to estimate these parameters in a way that takes sampling error into account (a quick simulation of this idea follows below).
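A small Python sketch of sampling error, assuming an entirely made-up population: each random sample yields a statistic that differs a little from the fixed parameter.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of 100,000 scores (illustrative only)
population = rng.normal(loc=1050, scale=200, size=100_000)
print("Population mean (parameter):", round(population.mean(), 1))

# Each random sample gives a slightly different sample mean (statistic);
# the gap between statistic and parameter is the sampling error.
for i in range(3):
    sample = rng.choice(population, size=100, replace=False)
    print(f"Sample {i + 1} mean (statistic): {sample.mean():.1f}, "
          f"sampling error: {sample.mean() - population.mean():+.1f}")
```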
There are two important types of estimates you can
make about the population:

• point estimates; and
• interval estimates.
Point Estimate

• It is a single value estimate of a parameter.


• For instance, a sample mean is a point estimate
of a population mean.
Interval Estimate

• It gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.
Confidence intervals

• A confidence interval uses the variability around a statistic to come up with an interval estimate for a parameter.
• Confidence intervals are useful for estimating parameters because they take sampling error into account.
• Each confidence interval is associated with a confidence level.
Confidence Level

• It tells you the probability (as a percentage) that the interval will contain the true parameter if you repeat the study.
95% Confidence Interval

• It means that if you repeat your study with a new sample in exactly the same way 100 times, you can expect the calculated interval to contain the true parameter in about 95 of those studies.
Example: Point estimate and
confidence interval

• You want to know the average number of paid vacation days that
employees at an international company receive.
• After collecting survey responses from a random sample, you
calculate a point estimate and a confidence interval.
• Your point estimate of the population mean paid vacation days is
the sample mean of 19 paid vacation days.
• With random sampling, a 95% confidence interval of [16, 22] means you can be reasonably confident that the average number of vacation days is between 16 and 22.
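A minimal SciPy sketch of this kind of calculation. The survey responses below are hypothetical placeholders, not the actual data behind the example:

```python
import numpy as np
from scipy import stats

# Hypothetical survey responses: paid vacation days for a random sample of employees
vacation_days = np.array([15, 22, 18, 20, 17, 21, 19, 23, 16, 19])

n = len(vacation_days)
point_estimate = vacation_days.mean()   # sample mean as the point estimate
sem = stats.sem(vacation_days)          # standard error of the mean

# 95% confidence interval based on the t-distribution with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=point_estimate, scale=sem)

print(f"Point estimate: {point_estimate:.1f} days")
print(f"95% CI: [{ci_low:.1f}, {ci_high:.1f}] days")
```

The t-distribution is used here because the population standard deviation is unknown and must be estimated from the sample.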
Hypothesis testing

• It is a formal process of statistical analysis using inferential statistics.
• The goal of hypothesis testing is to compare populations or assess relationships between variables using samples.
Hypotheses, or predictions

• These are tested using statistical tests. Statistical tests also estimate sampling errors so that valid inferences can be made.
Statistical tests can be:

• parametric or non-parametric.
Parametric tests

• These are considered more statistically powerful because they are more likely to detect an effect if one exists.
Parametric tests make assumptions that
include the following:

• the population that the sample comes from follows a normal distribution of scores;
• the sample size is large enough to represent the population;
• the variances, a measure of spread, of each group being compared are similar.
Non-parametric

• When your data violates any of these assumptions, non-parametric tests are more suitable.
• Non-parametric tests are called “distribution-free tests” because they don’t assume anything about the distribution of the population data.
Statistical tests

• These come in three forms:


1. Tests of Comparison,
2. Correlation or
3. Regression.
Comparison tests

• These assess whether there are differences in means, medians or rankings of scores of two or more groups.
• To decide which test suits your aim, consider whether your data meets the conditions necessary for parametric tests, the number of samples, and the levels of measurement of your variables.
• Means can only be found for interval or ratio data, while medians and rankings are more appropriate measures for ordinal data.
T-Test

• Parametric? Yes
• What’s being compared? Means
• Samples: 2
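A minimal sketch of a two-sample t-test using SciPy, with made-up scores for two independent groups:

```python
from scipy import stats

# Hypothetical scores from two independent groups (illustrative values only)
group_a = [88, 92, 79, 85, 90, 84, 91, 87]
group_b = [82, 78, 85, 80, 77, 83, 79, 81]

# Independent two-sample t-test comparing the group means
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```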
ANOVA

• Parametric? Yes
• What’s being compared? Means
• Samples: 3+ samples
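A minimal sketch of a one-way ANOVA with SciPy, again with made-up group scores:

```python
from scipy import stats

# Hypothetical scores from three independent groups (illustrative values only)
group_a = [88, 92, 79, 85, 90]
group_b = [82, 78, 85, 80, 77]
group_c = [70, 75, 72, 68, 74]

# One-way ANOVA comparing the means of three or more groups
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```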
Mood’s Median

• Parametric? No
• What’s being compared? Medians
• Samples: 2+ samples
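A minimal sketch using SciPy's median_test, which implements Mood's median test; the group values are made up:

```python
from scipy import stats

# Hypothetical scores from two independent groups (illustrative values only)
group_a = [12, 15, 14, 10, 18, 16, 13]
group_b = [9, 11, 8, 14, 10, 7, 12]

# Mood's median test: do the groups share a common median?
stat, p_value, grand_median, table = stats.median_test(group_a, group_b)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}, grand median = {grand_median}")
```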
Wilcoxon Signed-rank

• Parametric? No
• What’s being compared? Distributions
• Samples: 2
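A minimal sketch with SciPy's wilcoxon function. The signed-rank test is for two paired (dependent) samples, so the made-up values below represent before/after measurements on the same subjects:

```python
from scipy import stats

# Hypothetical paired measurements, e.g. before and after an intervention
before = [20, 22, 19, 24, 25, 18, 21, 23]
after  = [23, 24, 18, 27, 29, 20, 22, 26]

# Wilcoxon signed-rank test for paired samples
w_stat, p_value = stats.wilcoxon(before, after)
print(f"W = {w_stat:.2f}, p = {p_value:.3f}")
```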
Wilcoxon rank-sum (Mann-Whitney U)

• Parametric? No
• What’s being compared? Sums of Rankings
• Samples: 2
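A minimal sketch with SciPy's mannwhitneyu function, using made-up scores from two independent groups:

```python
from scipy import stats

# Hypothetical scores from two independent groups (illustrative values only)
group_a = [14, 18, 16, 21, 19, 17]
group_b = [12, 11, 15, 13, 10, 14]

# Wilcoxon rank-sum / Mann-Whitney U test for two independent samples
u_stat, p_value = stats.mannwhitneyu(group_a, group_b)
print(f"U = {u_stat:.2f}, p = {p_value:.3f}")
```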
Kruskal-Wallis H

• Parametric? No
• What’s being compared? Mean rankings
• Samples: 3+ samples
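A minimal sketch with SciPy's kruskal function, using made-up scores from three independent groups:

```python
from scipy import stats

# Hypothetical scores from three independent groups (illustrative values only)
group_a = [14, 18, 16, 21, 19]
group_b = [12, 11, 15, 13, 10]
group_c = [22, 25, 20, 24, 23]

# Kruskal-Wallis H test for three or more independent samples
h_stat, p_value = stats.kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
```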
Correlation tests

• These determine the extent to which two variables are associated.
• Although Pearson’s r is the most statistically powerful test, Spearman’s r is appropriate for ordinal variables and for interval/ratio variables when the data doesn’t follow a normal distribution.
• The chi square test of independence is the only one of these tests that can be used with nominal variables.
Pearson’s r

• Parametric? Yes
• Variables: Interval/ratio variables
Spearman’s r

• Parametric? No
• Variables: Ordinal/Interval/ratio variables
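A minimal sketch of both correlation tests with SciPy, on one made-up pair of variables:

```python
from scipy import stats

# Hypothetical paired observations (illustrative values only)
hours_studied = [2, 4, 5, 7, 8, 10, 11, 13]
exam_score    = [52, 58, 60, 68, 71, 80, 83, 90]

# Pearson's r: linear association between interval/ratio variables
r, p_r = stats.pearsonr(hours_studied, exam_score)

# Spearman's rho: rank-based association, no normality assumption
rho, p_rho = stats.spearmanr(hours_studied, exam_score)

print(f"Pearson r = {r:.2f} (p = {p_r:.3f}), Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```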
Chi square test of independence

• Parametric? No
• Variables: Nominal/ordinal variables
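A minimal sketch with SciPy's chi2_contingency, using a made-up contingency table of counts for two nominal variables:

```python
from scipy import stats

# Hypothetical contingency table: rows = group, columns = preferred subject
observed = [[30, 10, 20],
            [20, 25, 15]]

# Chi-square test of independence between the two nominal variables
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.3f}")
```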
Regression Tests

• These assess whether changes in predictor variables predict changes in an outcome variable.
• You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes.
• Most of the commonly used regression tests are parametric.
• If your data is not normally distributed, you can perform data transformations.
• Data transformations help you make your data normally distributed using mathematical operations, like taking the square root of each value.
Simple Linear Regression

• Predictor: 1 interval/ratio variable
• Outcome: 1 interval/ratio variable
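A minimal sketch with SciPy's linregress, using made-up interval/ratio data:

```python
from scipy import stats

# Hypothetical interval/ratio data (illustrative values only)
hours_studied = [2, 4, 5, 7, 8, 10, 11, 13]
exam_score    = [52, 58, 60, 68, 71, 80, 83, 90]

# Simple linear regression: one predictor, one outcome
result = stats.linregress(hours_studied, exam_score)
print(f"score = {result.intercept:.1f} + {result.slope:.2f}*hours, "
      f"R^2 = {result.rvalue**2:.3f}")
```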
Multiple Linear Regression

• Predictor: 2+ interval/ratio variables
• Outcome: 1 interval/ratio variable
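A minimal sketch using scikit-learn's LinearRegression (one of several common implementations), with made-up data for two predictors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two interval/ratio predictors (hours studied, hours slept)
X = np.array([[2, 7], [4, 6], [5, 8], [7, 5], [8, 9], [10, 4], [11, 7], [13, 6]])
y = np.array([52, 58, 63, 66, 75, 74, 85, 88])  # interval/ratio outcome (exam score)

# Multiple linear regression with two predictors
model = LinearRegression().fit(X, y)
print("Intercept:", round(model.intercept_, 1))
print("Coefficients:", np.round(model.coef_, 2))
print("R^2:", round(model.score(X, y), 3))
```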
Logistic Regression

• Predictor: 1+ variable(s) of any type
• Outcome: 1 binary variable
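A minimal sketch using scikit-learn's LogisticRegression, with a made-up predictor and a made-up binary outcome:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one predictor (hours studied), one binary outcome (passed: 0/1)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

# Logistic regression models the probability of the binary outcome
model = LogisticRegression().fit(X, y)
print("Predicted pass probability for 5.5 hours:",
      round(model.predict_proba([[5.5]])[0, 1], 2))
```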
Nominal Regression

• Predictor: 1+ variable(s) of any type
• Outcome: 1 nominal variable
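Nominal (multinomial logistic) regression can be sketched with scikit-learn's LogisticRegression, which fits a model for an unordered categorical outcome with more than two classes; the data below is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: two predictors, one nominal outcome with three categories
X = np.array([[2, 1], [3, 0], [5, 1], [6, 0], [8, 1], [9, 0], [4, 1], [7, 0]])
y = np.array(["bus", "car", "bike", "car", "bike", "car", "bus", "bike"])  # commute mode

# With the default lbfgs solver this fits a multinomial (softmax) model
model = LogisticRegression(max_iter=1000).fit(X, y)
print("Classes:", model.classes_)
print("Predicted mode for [5, 1]:", model.predict([[5, 1]])[0])
```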
Ordinal Regression

• Predictor: 1+ variable(s) of any type
• Outcome: 1 ordinal variable
End of the Presentation
