
Descriptive & Inferential Statistics
Prepared by Dr. Ali Bavik, adapted by Shanshan
Learning Objectives
• What is SPSS?
• Basic Analysis
• Types of Statistics
– Descriptive analysis
• Central Tendency
– Mean; Median; Mode
• Normal Distribution
• Standard Deviation
SPSS

Statistical Package for the Social Sciences


We Can Analyse Data in 3 Basic Ways

1) Descriptive Statistics
• Frequencies
• Minimum
• Maximum
• Mean
• Median
• Mode
• Standard Deviation
We Can Analyse Data in 3 Basic Ways

2) Examine Relationships (Level of Association)


• Correlation
• Regression

3) Compare Groups / Cause & Effect

• T-Test
• One-Way ANOVA
Types of Statistics

Descriptive Statistics
• Characterize the attributes of a set of
measurements
• Used to summarize data
• Used to explore patterns of variation
• Used to describe changes over time
Central Tendency

Measures of central tendency represent the “typical” attributes of the data.
Mean

The mean (M) is the arithmetic average of a group of scores: the sum of the scores divided by the number of scores.
Median

The median (Mdn) is the middle score of all the scores in a distribution arranged from highest to lowest.

It is the mid-point of the distribution when the distribution has an odd number of scores, and the number halfway between the two middle scores when the distribution has an even number of scores.
Mode

The mode (Mo) is the value with the greatest frequency in the distribution.
Central Tendency

Mode
Most Frequently Occurring Score

Median
Middle Score

Mean
Arithmetic Average
Levels of Measurement & the Best Measure of Central Tendency

Nominal – Mode. Example: Male / Female
Ordinal – Median. Example: Likert-scale type
Interval – Mean. Example: Likert-scale type, Temperature
Ratio – Mean. Example: Weight
Frequency Distribution

• The pattern of frequencies of the observations, or a listing of case counts by category

• 10 students’ scores on a math test, arranged in order from lowest to highest:

69, 77, 77, 77, 84, 85, 85, 87, 92, 98
Frequency Distribution

The frequency (f) of a particular value is the number of times that observation occurs in the data.
Frequency Table
A chart presenting statistical data that categorizes the values along with the number of times each value appears in the data set:

69, 77, 77, 77, 84, 85, 85, 87, 92, 98
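The slides build this frequency table in SPSS; as an aside, the same counts can be reproduced in a few lines of Python with the standard library:

```python
from collections import Counter

# The ten math-test scores from the slide
scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

# Count how many times each value occurs (the frequency, f)
freq = Counter(scores)

for value in sorted(freq):
    print(f"{value}: f = {freq[value]}")
# 77 appears 3 times, 85 twice, every other score once
```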


Mean

The mean (M) is the arithmetic average of a group of scores: the sum of the scores divided by the number of scores.

For example, in our distribution of 10 test scores:

(69 + 77 + 77 + 77 + 84 + 85 + 85 + 87 + 92 + 98) / 10 = 83.1

Median

The median (Mdn) is the middle score of all the scores in a distribution arranged from highest to lowest.

69, 77, 77, 77, 84, | 85, 85, 87, 92, 98

With an even number of scores, the median is halfway between the two middle scores: (84 + 85) / 2 = 84.5
Mode

The mode (Mo) is the value with the greatest frequency in the distribution.

For example, in our distribution of 10 test scores, 77 is the mode because it is observed most frequently.

69, 77, 77, 77, 84, 85, 85, 87, 92, 98
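All three measures of central tendency for this distribution can be checked with Python’s standard `statistics` module (an aside; the slides use SPSS):

```python
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

print(statistics.mean(scores))    # 83.1
print(statistics.median(scores))  # 84.5 (halfway between 84 and 85)
print(statistics.mode(scores))    # 77 (occurs three times)
```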


Normal Distribution
Bell-shaped curve
Total area = 1
Symmetrical: 50% of the values fall below the mean & 50% above

Standard deviation is a measure of the spread of scores. That is, how spread out is the data set? How much does the data vary from the average?
EXAMPLE

Test Scores

69, 77, 77, 77, 84, 85, 85, 87, 92, 98

A low standard deviation indicates that the data are closely clustered around the mean: most students achieved close to the average score, with few achieving very high or very low scores.

A high standard deviation indicates that the data are dispersed over a wide range of values: scores are spread out from the mean, with individuals achieving very high or very low scores on the test.
Set of Data | Mean | Standard Deviation
Student test scores: 69, 77, 77, 77, 84, 85, 85, 87, 92, 98 | 83.1 | 8.39

Student Score Example
(69 + 77 + 77 + 77 + 84 + 85 + 85 + 87 + 92 + 98) / 10 = 83.1

Mean = 83.1   Standard Deviation = 8.39
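The 8.39 reported above is the sample standard deviation (n − 1 denominator). As a quick check outside SPSS, Python’s `statistics.stdev` uses the same n − 1 formula:

```python
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

m = statistics.mean(scores)    # 83.1
sd = statistics.stdev(scores)  # sample standard deviation, n - 1 denominator

print(m, sd)  # sd is approximately 8.399 (the slide reports 8.39, truncated)
```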


Types of Statistics

Inferential Analysis
• Used to generate conclusions about the
population’s characteristics based on the sample
data
– Determine population parameters
– Test hypotheses, e.g., the null hypothesis and the alternative
hypothesis
– That is, results are generalisable to the population
– Only possible when using a random sample
Types of Statistics

Differences Analysis:
Used to compare the mean of the responses of one
group to that of another group
– Determine if differences exist between
groups
– Evaluate statistical significance of difference
in the means of two groups in a sample
– E.g., T-test, Paired Samples T-test, One-way
ANOVA
Types of Statistics

Associative Analysis
Determines the strength & direction of relationships
between two or more variables
–Chi-square Analysis (Cross-Tabulation)
–Correlation
–Regression Analysis
–Multiple Regression Analysis
Types of Statistics

Predictive Analysis
Allows one to make forecasts for future events based on a
statistical model
• Estimate the level of Y, given the amount of X
• For example:
Independent T-test
Paired Samples T-test
ANOVA
Regression Analysis
Determining the Test
Parameter Estimation

Parameter estimation involves three values:

1. Sample statistic (mean or percentage generated from sample data)

2. Standard error (the standard deviation divided by the square root of the sample size; there is one formula for the standard error of the mean and another for the standard error of the percentage)

3. Confidence interval (gives us a range within which the sample statistic will fall if we were to repeat the study many times over)
– E.g., 95%, 99%
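Putting the three values together for the test-score data used throughout these slides, a minimal Python sketch (an aside; the slides compute this in SPSS) of the standard error of the mean and a 95% confidence interval:

```python
import math
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

n = len(scores)
mean = statistics.mean(scores)  # 83.1, the sample statistic
sd = statistics.stdev(scores)   # sample standard deviation

# Standard error of the mean: sd / sqrt(n)
se = sd / math.sqrt(n)

# 95% confidence interval: mean +/- 1.96 standard errors
lower = mean - 1.96 * se
upper = mean + 1.96 * se
print(f"SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# SE is approximately 2.66; the CI is approximately (77.89, 88.31)
```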
Parameter Estimation – Confidence Interval

• Confidence interval: the degree of accuracy desired by the researcher, stipulated as a level of confidence in the form of a percentage

• Most commonly used level of confidence: 95%, corresponding to 1.96 standard errors

• Other levels of confidence:
– 90% (1.64 standard errors)
– 99% (2.58 standard errors)
Confidence Interval

What does this mean?

• It means that if we did our study 100 times, we could determine a range within which the sample statistic would fall 95 times out of 100 (95% level of confidence)

• This gives us confidence that the real population value falls within this range
Why Differences are Important

• Market segmentation holds that within a market there are different types of consumers who have different requirements, and these differences can be the basis of marketing strategies

• Some differences are obvious – differences between teens’ & baby boomers’ music preferences

• Other differences are not so obvious, and marketers who “discover” these subtle differences may realize huge gains in the marketplace.
Why Differences are Important
Market Segmentation
• Differences must be statistically significant
– Statistical significance of differences: the differences in the
sample(s) may be assumed to exist in the population(s) from which
the random samples are drawn
– Statistically significant differences should be demonstrated between
groups

• Differences must be meaningful


– Meaningful difference: one that the marketing manager can
potentially use as a basis for marketing decisions
– The outcome must be interpretable (reasonable)
• Makes sense to you
Why Differences are Important
Market Segmentation
• Differences should be stable
– Stable difference: one that will be in place for the foreseeable future
– The differences should not be short term or changed easily

• Differences must be actionable


– Actionable difference: the marketer can focus various marketing strategies
and tactics, such as advertising, on the market segments to accentuate the
differences between segments
– Example of segmentation bases that are actionable: demographics,
lifestyles, product benefits, usage, opinions, attitudes
Parametric Statistical Test Assumptions
(aside)
• Normality
– The populations from which samples are drawn are normally distributed

• Homogeneity of Variance
– In ANOVA and the T-test, the variances within the groups are statistically the same

• Continuity & Equal Intervals of Measures
– The dependent variables are continuous (interval or ratio scale) and the intervals have equal distance

• Independence of Observations
– One observation does not influence another observation (except for repeated-measures analyses)
Determining Statistical Significance: The ‘p’ value

• Statistical tests generate a critical value, usually identified by some letter, i.e., z, t or F.

• Associated with that value is a p value: the probability of obtaining a result at least this extreme if the null hypothesis (no difference or no association) were true.

• If that probability is low, say 0.05 or less, we have significance!
Determining Statistical Significance:
The ‘p’ value
• The significance threshold α is the probability of committing a Type I error
– i.e., declaring a result significant that is in fact wrong

• p values are often identified in SPSS with abbreviations such as “Sig.” or “Prob.”

• p values range from 0 to 1.0

• First, we MUST determine the amount of sampling error we are willing to accept and still say the results are significant.

• Convention is 5% (α = 0.05), and this is known as the “alpha error”
– i.e., 1 - 0.05 = 0.95 (95% confidence level)
Testing Differences: Percentages or Means?

• There are statistical tests for when a researcher wants to compare the means or percentages of 2 different groups or samples

• Percentages are calculated for questions with a nominal or ordinal level of measurement

• Means are calculated for questions with an interval or ratio (metric) level of measurement
Testing the Difference Between Two Groups (Mean Differences)

• Null hypothesis (H0): no difference between the means being compared
– H0: µ1 = µ2 or µ1 - µ2 = 0

• Alternative hypothesis (H1): a true difference between the compared means
– H1: µ1 ≠ µ2
Golden Rule!!!
We have to decide whether to ‘Reject’ or ‘Do Not Reject’ the Null Hypothesis

• If the p-value > 0.05, then do not reject the null hypothesis

• If the p-value ≤ 0.05, then reject the null hypothesis
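This decision rule is mechanical, so it is easy to express in code. A minimal Python sketch (an aside; in SPSS you read the p-value from the “Sig.” column), using the conventional α = 0.05 and the usual convention that p exactly equal to α also leads to rejection:

```python
ALPHA = 0.05  # conventional significance level

def decide(p_value: float) -> str:
    """Apply the golden rule: reject H0 only when p <= alpha."""
    return "Reject H0" if p_value <= ALPHA else "Do not reject H0"

print(decide(0.03))  # Reject H0
print(decide(0.20))  # Do not reject H0
```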

How do you know when the results are significant?

• If the null hypothesis is true, we would expect there to be 0 difference between the two means

• Yet we know that, in any given study, differences may be expected due to sampling error

• If the null hypothesis were true, we would expect 95% of the t-ratios (t-values) computed from 100 samples to fall between ±1.96 standard errors
How do you know when the results are significant?

• If the computed t value is greater than ±1.96, it is not likely that the null hypothesis of no difference is true

• Rather, it is likely that there is a real statistical difference between the two means
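A worked sketch of this comparison in Python (an aside; the slides use SPSS). The second group’s scores are hypothetical, invented for illustration, and the t statistic uses the pooled-variance formula, which assumes homogeneity of variance as noted earlier. Note that the ±1.96 cut-off is the large-sample (z) approximation the slides use; with samples this small the exact critical t is slightly larger (about 2.10 at 18 degrees of freedom).

```python
import math
import statistics

# Group 1: the test scores from the slides; Group 2: hypothetical data
group1 = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]
group2 = [60, 64, 70, 71, 73, 75, 76, 78, 80, 83]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled variance (assumes the two groups have equal population variance)
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# t statistic for the difference between two independent means
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"t = {t:.2f}")  # approximately 2.91
if abs(t) > 1.96:
    print("Likely a real difference: reject H0")
else:
    print("Do not reject H0")
```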
