
Descriptive & Inferential Statistics
Prepared by Dr. Ali Bavik, adapted by Shanshan
Learning Objectives
• What is SPSS?
• Basic Analysis
• Types of Statistics
– Descriptive analysis
• Central Tendency
– Mean; Median; Mode
• Normal Distribution
• Standard Deviation
SPSS

Statistical Package for the Social Sciences


We Can Analyse Data in 3 Basic Ways

1) Descriptive Statistics
• Frequencies
• Minimum
• Maximum
• Mean
• Median
• Mode
• Standard Deviation
We Can Analyse Data in 3 Basic Ways

2) Examine Relationships (Level of Association)


• Correlation
• Regression

3) Compare Groups / Cause & Effect

• T-Test
• One-Way ANOVA
Types of Statistics

Descriptive Statistics
• Characterize the attributes of a set of
measurements
• Used to summarize data
• Used to explore patterns of variation
• Used to describe changes over time
Central Tendency

Measures of central tendency represent the “typical” attributes of the data.
Mean

The mean (M) is the arithmetic average of a group of scores: the sum of the scores divided by the number of scores.
Median

The median (Mdn) is the middle score of all the scores in a distribution arranged from highest to lowest.

It is the mid-point of the distribution when the distribution has an odd number of scores, and the number halfway between the two middle scores when the distribution has an even number of scores.
Mode

The mode (Mo) is the value with the greatest frequency in the distribution.
Central Tendency

Mode
Most Frequently Occurring Score

Median
Middle Score

Mean
Arithmetic Average
Levels of Measurement & the Best Measure of Central Tendency

Nominal – Mode. Example: Male / Female
Ordinal – Median. Example: Likert-scale type
Interval – Mean. Example: Likert-scale type, Temperature
Ratio – Mean. Example: Weight
Frequency Distribution

• The pattern of frequencies of the observations, or a listing of case counts by category

• 10 students’ scores on a math test, arranged in order from lowest to highest:

69, 77, 77, 77, 84, 85, 85, 87, 92, 98
Frequency Distribution

The frequency (f) of a particular value is the number of times that observation occurs in the data.
Frequency Table
A chart presenting statistical data that categorizes the values along with the number of times each value appears in the data set:

69, 77, 77, 77, 84, 85, 85, 87, 92, 98
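The slides build this frequency table in SPSS; as an aside, the same counts can be reproduced in a few lines of Python with the standard library:

```python
from collections import Counter

# The ten math-test scores from the slide
scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

# Count how many times each value occurs (the frequency, f)
freq = Counter(scores)

for value in sorted(freq):
    print(f"{value}: f = {freq[value]}")
# 77 appears 3 times, 85 twice, every other score once
```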


Mean

The mean (M) is the arithmetic average of a group of scores: the sum of the scores divided by the number of scores.

For example, in our distribution of 10 test scores:

(69 + 77 + 77 + 77 + 84 + 85 + 85 + 87 + 92 + 98) / 10 = 83.1

Median

The median (Mdn) is the middle score of all the scores in a distribution arranged from highest to lowest.

69, 77, 77, 77, 84, | 85, 85, 87, 92, 98

With an even number of scores, the median is halfway between the two middle scores: (84 + 85) / 2 = 84.5
Mode

The mode (Mo) is the value with the greatest frequency in the distribution.

For example, in our distribution of 10 test scores, 77 is the mode because it is observed most frequently.

69, 77, 77, 77, 84, 85, 85, 87, 92, 98
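All three measures of central tendency for this distribution can be checked with Python’s standard `statistics` module (an aside; the slides use SPSS):

```python
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

print(statistics.mean(scores))    # 83.1
print(statistics.median(scores))  # 84.5 (halfway between 84 and 85)
print(statistics.mode(scores))    # 77 (occurs three times)
```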


Normal Distribution
Bell-shaped curve
Total area = 1
Symmetrical: 50% of the values fall below the mean & 50% above

Standard deviation is a measure of the spread of scores. That is, how spread out is the data set? How much does the data vary from the average?
EXAMPLE

Test Scores

69, 77, 77, 77, 84, 85, 85, 87, 92, 98

A low standard deviation indicates that the data are closely clustered around the mean: most students achieved close to the average score, with few achieving very high or very low scores.

A high standard deviation indicates that the data are dispersed over a wide range of values: scores are spread out from the mean, with individuals achieving very high or very low scores on the test.
Set of Data | Mean | Standard Deviation
Student test scores: 69, 77, 77, 77, 84, 85, 85, 87, 92, 98 | 83.1 | 8.39

Student Score Example
(69 + 77 + 77 + 77 + 84 + 85 + 85 + 87 + 92 + 98) / 10 = 83.1

Mean = 83.1   Standard Deviation = 8.39
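The 8.39 reported above is the sample standard deviation (n − 1 denominator). As a quick check outside SPSS, Python’s `statistics.stdev` uses the same n − 1 formula:

```python
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

m = statistics.mean(scores)    # 83.1
sd = statistics.stdev(scores)  # sample standard deviation, n - 1 denominator

print(m, sd)  # sd is approximately 8.399 (the slide reports 8.39, truncated)
```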


Types of Statistics

Inferential Analysis
• Used to generate conclusions about the
population’s characteristics based on the sample
data
– Determine population parameters
– Test hypotheses, e.g., the null hypothesis and the alternative
hypothesis
– That is, results are generalisable to the population
– Only possible when using a random sample
Types of Statistics

Differences Analysis:
Used to compare the mean of the responses of one
group to that of another group
– Determine if differences exist between
groups
– Evaluate statistical significance of difference
in the means of two groups in a sample
– E.g., T-test, Paired Samples T-test, One-way
ANOVA
Types of Statistics

Associative Analysis
Determines the strength & direction of relationships
between two or more variables
–Chi-square Analysis (Cross-Tabulation)
–Correlation
–Regression Analysis
–Multiple Regression Analysis
Types of Statistics

Predictive Analysis
Allows one to make forecasts for future events based on a
statistical model
• Estimate the level of Y, given the amount of X
• For example:
Independent T-test
Paired Samples T-test
ANOVA
Regression Analysis
Determining the Test
Parameter Estimation

Parameter estimation involves three values:

1. Sample statistic (mean or percentage generated from sample data)

2. Standard error (the standard deviation divided by the square root of the sample size; there is one formula for the standard error of the mean and another for the standard error of the percentage)

3. Confidence interval (gives us a range within which the sample statistic will fall if we were to repeat the study many times over)
– E.g., 95%, 99%
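Putting the three values together for the test-score data used throughout these slides, a minimal Python sketch (an aside; the slides compute this in SPSS) of the standard error of the mean and a 95% confidence interval:

```python
import math
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

n = len(scores)
mean = statistics.mean(scores)  # 83.1, the sample statistic
sd = statistics.stdev(scores)   # sample standard deviation

# Standard error of the mean: sd / sqrt(n)
se = sd / math.sqrt(n)

# 95% confidence interval: mean +/- 1.96 standard errors
lower = mean - 1.96 * se
upper = mean + 1.96 * se
print(f"SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# SE is approximately 2.66; the CI is approximately (77.89, 88.31)
```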
Parameter Estimation – Confidence Interval

• Confidence interval: the degree of accuracy desired by the researcher, stipulated as a level of confidence in the form of a percentage

• Most commonly used level of confidence: 95%, corresponding to 1.96 standard errors

• Other levels of confidence:
– 90% (1.64 standard errors)
– 99% (2.58 standard errors)
Confidence Interval

What does this mean?

• It means that if we did our study 100 times, we could determine a range within which the sample statistic would fall 95 times out of 100 (95% level of confidence)

• This gives us confidence that the real population value falls within this range
Why Differences are Important

• Market segmentation holds that within a market there are different types of consumers who have different requirements, and these differences can be the basis of marketing strategies

• Some differences are obvious – differences between teens’ & baby boomers’ music preferences

• Other differences are not so obvious, and marketers who “discover” these subtle differences may realize huge gains in the marketplace.
Why Differences are Important
Market Segmentation
• Differences must be statistically significant
– Statistical significance of differences: the differences in the
sample(s) may be assumed to exist in the population(s) from which
the random samples are drawn
– Statistically significant differences should be demonstrated between
groups

• Differences must be meaningful


– Meaningful difference: one that the marketing manager can
potentially use as a basis for marketing decisions
– The outcome must be interpretable (reasonable)
• Makes sense to you
Why Differences are Important
Market Segmentation
• Differences should be stable
– Stable difference: one that will be in place for the foreseeable future
– The differences should not be short term or changed easily

• Differences must be actionable


– Actionable difference: the marketer can focus various marketing strategies
and tactics, such as advertising, on the market segments to accentuate the
differences between segments
– Example of segmentation bases that are actionable: demographics,
lifestyles, product benefits, usage, opinions, attitudes
Parametric Statistical Test Assumptions
(aside)
• Normality
– The populations from which samples are drawn are normally distributed

• Homogeneity of Variance
– In ANOVA and the T-test, the variances within the groups are statistically the same

• Continuity & Equal Intervals of Measures
– The dependent variables are continuous (interval or ratio scale) and the intervals have equal distance

• Independence of Observations
– One observation does not influence another observation (except for repeated-measures analyses)
Determining Statistical Significance: The ‘p’ value

• Statistical tests generate a critical value, usually identified by some letter, i.e., z, t or F.

• Associated with that value is a p value: the probability of obtaining a result at least this extreme if the null hypothesis (no difference or no association) were true.

• If that probability is low, say 0.05 or less, we have significance!
Determining Statistical Significance:
The ‘p’ value
• The significance threshold α is the probability of committing a Type I error
– i.e., declaring a result significant that is in fact wrong

• p values are often identified in SPSS with abbreviations such as “Sig.” or “Prob.”

• p values range from 0 to 1.0

• First, we MUST determine the amount of sampling error we are willing to accept and still say the results are significant.

• Convention is 5% (α = 0.05), and this is known as the “alpha error”
– i.e., 1 - 0.05 = 0.95 (95% confidence level)
Testing Differences: Percentages or Means?

• There are statistical tests for when a researcher wants to compare the means or percentages of 2 different groups or samples

• Percentages are calculated for questions with a nominal or ordinal level of measurement

• Means are calculated for questions with an interval or ratio (metric) level of measurement
Testing the Difference Between Two Groups (Mean Differences)

• Null hypothesis (H0): no difference between the means being compared
– H0: µ1 = µ2 or µ1 - µ2 = 0

• Alternative hypothesis (H1): a true difference between the compared means
– H1: µ1 ≠ µ2
Golden Rule!!!
We have to decide whether to ‘Reject’ or ‘Do Not Reject’ the Null Hypothesis

• If the p-value > 0.05, then do not reject the null hypothesis

• If the p-value ≤ 0.05, then reject the null hypothesis
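This decision rule is mechanical, so it is easy to express in code. A minimal Python sketch (an aside; in SPSS you read the p-value from the “Sig.” column), using the conventional α = 0.05 and the usual convention that p exactly equal to α also leads to rejection:

```python
ALPHA = 0.05  # conventional significance level

def decide(p_value: float) -> str:
    """Apply the golden rule: reject H0 only when p <= alpha."""
    return "Reject H0" if p_value <= ALPHA else "Do not reject H0"

print(decide(0.03))  # Reject H0
print(decide(0.20))  # Do not reject H0
```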

How do you know when the results are significant?

• If the null hypothesis is true, we would expect there to be 0 difference between the two means

• Yet we know that, in any given study, differences may be expected due to sampling error

• If the null hypothesis were true, we would expect 95% of the t-ratios (t-values) computed from 100 samples to fall between ±1.96 standard errors
How do you know when the results are significant?

• If the computed t value is greater than ±1.96, it is not likely that the null hypothesis of no difference is true

• Rather, it is likely that there is a real statistical difference between the two means
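A worked sketch of this comparison in Python (an aside; the slides use SPSS). The second group’s scores are hypothetical, invented for illustration, and the t statistic uses the pooled-variance formula, which assumes homogeneity of variance as noted earlier. Note that the ±1.96 cut-off is the large-sample (z) approximation the slides use; with samples this small the exact critical t is slightly larger (about 2.10 at 18 degrees of freedom).

```python
import math
import statistics

# Group 1: the test scores from the slides; Group 2: hypothetical data
group1 = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]
group2 = [60, 64, 70, 71, 73, 75, 76, 78, 80, 83]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled variance (assumes the two groups have equal population variance)
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# t statistic for the difference between two independent means
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"t = {t:.2f}")  # approximately 2.91
if abs(t) > 1.96:
    print("Likely a real difference: reject H0")
else:
    print("Do not reject H0")
```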
