
MSC 102

Comprehensive Exam
Review
Ere Lee Q. Salang
Statistician, WMSU
erelee.salang@gmail.com
LEARNING OUTCOMES
At the end of the review, you are expected to
master identification and performance of the
following items:
○ Level of measurement classification
○ Determining measures of central tendencies
○ Presenting graphical data
○ Statistical Process involving:
o Chi-Square Test for Ordinal Levels or Chi-Square Test for Nominal Levels
o Single-Sample t-Test for Means
o Dependent Samples t-Test or Independent Samples t-Test
o ANOVA or Kruskal Wallis Test
Statistical test: level of measurement and number of groups
o Chi-Square Test for Ordinal Levels or Chi-Square Test for Nominal Levels: ordinal / nominal categories
o Single-Sample t-Test for Means: interval, one sample group
o Dependent Samples t-Test or Independent Samples t-Test: interval, two sample groups
o ANOVA: interval, three samples
o Kruskal Wallis Test: ordinal, three samples
Review?!
○ All references can be downloaded
at:
https://tinyurl.com/msc201review

4
Let’s review some
concepts
STATISTICS
The branch of science that deals with the collection, presentation, organization, analysis and interpretation of data

Population
Collection of all elements under consideration in a statistical inquiry

Sample
Subset of the population

Variable
A characteristic or attribute of the elements in a collection that assumes different values for the different elements

Observation
Realized value of the variable
Let’s review some
concepts
STATISTICS

Descriptive Stat
All techniques used in organizing, summarizing, and presenting data

Inferential Stat
All techniques used in analyzing the sample data that lead to generalizations about the population where the sample was taken

Variable

Qualitative
Categories used as labels to distinguish classification

Quantitative
Quantity or amount, expressed numerically
➢ Discrete
Integral values or whole numbers
➢ Continuous
Values with fractions or decimals
Let’s review some
concepts
➢ Measurement
The process of determining the value or label of the variable based on what has been observed.

Variable
Qualitative: Nominal, Ordinal
Quantitative: Interval, Ratio
1.
Classifying the level
of measurement:
nominal, ordinal, or
interval/ratio

8
Levels of Measurement

Exercises
Let's test your understanding of
levels of measurement
For each of the following variables, indicate whether it is qualitative (QL)
or quantitative (QN), and identify its level of measurement.

1. Gender: QL, Nominal
2. Age: QN, Interval
3. Height: QN, Ratio
4. Per capita income (Php): QN, Ratio
5. Barangay: QL, Nominal
6. Number of firearms: QN, Ratio
7. Distance traveled by car (kms/hr): QN, Ratio
8. Socio-economic status: QL, Ordinal
9. Educational attainment: QL, Ordinal
10. Drug Dependence score: QN, Interval

15
For each of the situations, state the level of measurement and
explain your answer.

16
Describe how each of the following variables can be measured
using nominal, ordinal or ratio/interval scale of measurements:
Variables Nominal Ordinal Ratio/Interval

socio-economic status

Educational attainment

Distance of crime scene from


victims house

Drug dependence

Age of perpetrator

17
Describe how each of the following variables can be measured
using nominal, ordinal or ratio/interval scale of measurements:
Variables: Nominal / Ordinal / Ratio/Interval

Socio-economic status
Nominal: Employed, Unemployed
Ordinal: Low income, Average income, High income
Ratio/Interval: Actual monthly income

Educational attainment
Nominal: Educated, Not educated
Ordinal: Elementary, High school, College level, Post grad level
Ratio/Interval: Number of years in school

Distance of crime scene from victim's house
Nominal: Near, Far
Ordinal: Less than 5 kms, 5 kms to 10 kms, Greater than 10 kms
Ratio/Interval: Actual distance in kms

Drug dependence
Nominal: Drug addict, Non-addict
Ordinal: Low dependence, Regular users, Risky use, High dependence
Ratio/Interval: Scores in Drug Dependence Exam

Age of perpetrator
Nominal: Old, Young
Ordinal: Less than 13 years old, 13 to 25 years old, 26 to 40 years old, Above 40 years old
Ratio/Interval: Age in years
2.
Determining the
mode, median, and
the mean

19
Descriptive | Inferential
Examples of descriptive statistics: Counts, proportions, tables, graphs, summary measures

Summary Measures
Methods of compressing a mass of data for better comprehension and description of what it intends to portray

Measures of Central Tendency (Mode, Median, Mean)
Example: Mean number of prior arrests of 20 individuals arrested for robbery

Measures of dispersion
Example: Range of the number of prior arrests is 10

Measures of location
Example: 3rd quartile of the number of arrests is 7
Measures of Central
Tendency

21
Mode

Mode
For qualitative variables, we usually just present the
mode by grouping frequency of similar characteristics
and determining the characteristic with the greatest
number of counts.

✓ Mode is Private
attorney with 380

23
Median
○ The middlemost value in a set of
observations put in an array
○ For ordinal scale, it is the category where
the middle score lies
○ For interval scale, it is the value that splits
the distribution into half.

24
Median

141st observation is within the somewhat serious level


Compare the mode and the median. Which is the better measure for this level of
measurement in this scenario?
Median
Median obs is 16 with hot
spot score of 2.12.
If we delete the 31st
observation for even
numbers:

26
Median
Example of Interval-Scale Case:
We wish to find the median age of the subjects represented in the sample:
43, 66, 61, 64, 65, 38, 59, 57, 57, 50
Arranging in array: 38, 43, 50, 57, 57, 59, 61, 64, 65, 66
Since we have even number of ages, there is no mid-point value but the two middle values
are 57 and 59. The median then is (57 + 59) / 2 = 58
Interpretation: 50% of the subjects have ages less than 58 years, while the other half have
ages above 58 years.
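The median example above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import median

# Ages of the ten subjects from the example above.
ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

# Step 1: arrange the observations in an array (ascending order).
ordered = sorted(ages)
n = len(ordered)

# Step 2: with an even number of observations, take the two middle values.
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])

# Step 3: the median is the average of the two middle values.
median_age = median(ages)

print("Array:", ordered)
print("Middle values:", middle_pair)
print("Median age:", median_age)
```

With an odd number of observations, `median` simply returns the single middle value instead of averaging two.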

27
Mean

Mean
Calculated by dividing the sum of
the scores by the number of cases

The average number of arrests for the 20 subjects is 4.3.


Exercises
Let's test your understanding of
finding the mode, median and mean
Calculate the mode, median and mean for the following data:

a b c
mean 2.32 12.5 1.770
median 1 12 1.755
mode 1 16 1.72

31
Calculate the mode, median and mean for the following data:

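Prior stat classes (x)   n    cumulative n   x × n
none (0)                 17   17             0
one (1)                  9    26             9
two (2)                  3    29             6
three (3)                2    31             6
four (4)                 1    32             4
five (5)                 1    33             5
Total                    33                  30

MODE: none (highest frequency, n = 17)
MEDIAN: none; the middle position is (33 + 1) / 2 = 17, and the 17th observation falls in "none"
MEAN: 30 / 33 = 0.91, or about 1 prior class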
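The same mode, median and mean can be reproduced from the frequency counts. The sketch below assumes the categories "none" to "five" are coded 0 to 5; the variable names are illustrative.

```python
# Frequency table: number of prior stat classes -> number of students.
freq = {0: 17, 1: 9, 2: 3, 3: 2, 4: 1, 5: 1}

n = sum(freq.values())  # total number of observations

# Mode: the category with the greatest count.
mode = max(freq, key=freq.get)

# Mean: sum of (value x frequency) divided by the number of cases.
mean = sum(value * count for value, count in freq.items()) / n

# Median: find the category containing the middle position (n + 1) / 2
# by walking through the cumulative frequencies.
median_position = (n + 1) / 2
cumulative = 0
median_value = None
for value in sorted(freq):
    cumulative += freq[value]
    if cumulative >= median_position:
        median_value = value
        break

print("n =", n)
print("mode =", mode)
print("median position =", median_position, "-> median =", median_value)
print("mean =", round(mean, 2))
```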
3.
Presenting the
information in
graphical forms

33
Descriptive | Inferential
Examples of descriptive statistics: Counts, proportions, tables, graphs, summary measures

Graphical Display of Data
Portrays numerical figures or relationships among variables in pictorial form
Key features or characteristics of an analysis are highlighted
• Line Charts
• Column charts
• Horizontal bar charts
• Pie charts
• Pictograph
• Statistical maps

34
Line Chart
Useful for presenting historical data, showing movement
of a series over time (Time Series Data)

35
Column Chart
Compare amounts in a time series
data or frequency distribution of a
quantitative variable; emphasize
differences in magnitude rather than
movement of a series
(Frequency histograms)


37
Horizontal bar charts
Horizontally arranged bars useful for showing
distribution of categorical data

38
Pie Chart
Useful for displaying data graphically if the number of
categories is relatively small

39
Pictograph
Like horizontal bar charts but uses
symbols or pictures to represent
magnitude, used to get attention of the
reader

40
Statistical maps
Shows statistical
data in
geographical areas,
can use shaded
maps or dot maps

41
Types of Graphs commonly used in
presenting research data
Type: Nature of variable; Function

1. Bar graph/chart/diagram (vertical or horizontal): Qualitative or discrete quantitative; for comparisons of absolute counts or relative counts, rates, etc. between categories
2. Pie chart: Qualitative; shows breakdown of groups (fewer categories)
3. Histogram (column chart): Continuous quantitative; frequency distribution of a continuous variable or measurement
4. Line diagram/chart: Time series; shows trend data or changes with time or age with respect to another variable
5. Scatterplot / scatterpoint, dot diagram: Quantitative; shows correlation between two quantitative variables
Exercises
Let's test your understanding of
graphical presentation of data
Present this information graphically in a way you think is
informative; explain the choice of chart.

44
Present this information graphically in a way you think is
informative; explain the choice of chart.

45
Present this information graphically in a way you think is
informative; explain the choice of chart.

46
Indicate the type of chart you would choose to present the
information in each of the following cases, and briefly give the
reason for your choice:

1. Percentage distribution of monthly expenditures of a
Filipino family for clothing, house maintenance, and
food (further breakdown into rice, vegetables, meat,
others)
2. The number of stores with reported robberies in
Zamboanga City
3. Rice production for 2002 to 2012
4. Location of Jollibee stores in Manila
5. Distribution of employees by civil status and length of
service

47
4.
Solving and
interpreting
statistical analysis
for Chi-Square Test
for Ordinal or
Nominal Levels

48
Branches of Statistics
Descriptive Inferential
Inferential statistics covers Estimation and Hypothesis Testing.

Estimation
The process by which a value computed for a sample (called statistic) is used to approximate (“estimate”) the corresponding value for the population (called parameter).
• Point estimation (e.g. sample mean, sample proportion)
• Interval estimation (95% confidence interval)

Hypothesis Testing
• Chi-square test for nominal and ordinal data
• Single population mean
• Two Independent or Two Related population mean
• K-independent sample mean
• Non-parametric equivalent for ANOVA
The Hypothesis Testing Process

Hypothesis Testing Procedure:
1. State the null hypothesis
2. State the alternative hypothesis
3. Select the test statistic
4. Determine the distribution of test stat
5. State the decision rule (sig. level / rejection region)
6. Calculate test statistic
7. State the Statistical Decision
8. State Conclusion
Chi-square test for
nominal or ordinal data
Chi-square test for association/independence:
used to test for the existence of a relationship
or an association between two qualitative
variables
Chi-square test for homogeneity: used to test if
two or more populations have the same
proportions for the different categories of a
particular variable.

51
Chi-square test for
nominal or ordinal data
Chi-square test for association/independence:
* Is there an association between religion
(Christianity and Muslim) and Assignment in
Cell Blocks?
Chi-square test for homogeneity:
* Are there the same proportions of Christians
and Muslims in Seven Cell Blocks of the prison
facility?

52
Chi-square test for
nominal or ordinal data
A preliminary test for measures of association for nominal or ordinal data
Measure of Association: a numerical index summarizing the strength or degree of
relationship in a two-dimensional cross-classification; it shows the strength of
the relationship between two variables
The chi-square test is needed to test statistical significance; its value is then
used to compute the measure of association
Chi-square test
for independence
Step 1. State the null and alternative hypothesis
H0: The random variables, X and Y, are independent.
Ha: X and Y are not independent.
Step 2. Choose the level of significance, (α) and determine critical region
Step 3. Describe sampling distribution of the test statistic.
Step 4. Compute the test statistic by constructing r x c contingency table
with corresponding expected frequencies
Step 5. Make the statistical decision
Step 6. State the conclusion

54
Computing chi-square test of
independence

○ A researcher wishes to determine if a significant association exists
between the gender of a resident in a high-crime area and whether they
have ever stolen anything before. They asked 29 random residents
the following question: “Have you ever tried to steal anything
before?”, which was answerable by Yes or No, and categorized the responses
by gender.

Males Females Total


Yes 4 6 10
No 11 8 19
Total 15 14 29
1. State the Hypothesis

Ho: There is no association between gender and if they have ever


stolen before.
Ha: There is an association between gender and if they have ever
stolen before.

2. Level of Significance: 0.05


Using the Chi Square Table with
df=1
(Rows-1)(Columns-1)=
(2-1)*(2-1)=1

Critical Region: x2 > 3.84


Decision Rule: Reject H0 if computed x2 is greater than 3.84
3. Distribution of statistics

When H0 is true, X2 is distributed approximately as x2 with (r-1)(c-1)=


(2-1)(2-1)= 1 degrees of freedom

4. Compute the test statistic


Observed frequencies:
        Males   Females   Total
Yes     4       6         10
No      11      8         19
Total   15      14        29

Cell           O    E          O-E        (O-E)2     (O-E)2/E
Yes, Males     4    5.172414   -1.17241   1.374554   0.265747
No, Males      11   9.827586   1.172414   1.374554   0.139867
Yes, Females   6    4.827586   1.172414   1.374554   0.284729
No, Females    8    9.172414   -1.17241   1.374554   0.149857
x2 = 0.840201

$E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
Alternative Computation (only for 2x2 tables):

        Males   Females   Total
Yes     4       6         10
No      11      8         19
Total   15      14        29

        Males   Females   Total
Yes     a       b         e
No      c       d         f
Total   g       h

$\Phi = \dfrac{|ad - bc|}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} = \dfrac{|bc - ad|}{\sqrt{efgh}}$

$\Phi = \dfrac{(6 \times 11) - (4 \times 8)}{\sqrt{10 \times 19 \times 15 \times 14}} = \dfrac{34}{199.75} = 0.17$

○ Convert the phi result to a chi-square value by multiplying N by (phi coeff)2

$x^2 = N \Phi^2 = 29 \times 0.17^2 = 0.84$
5. Statistical Decision:
• Since the computed statistic (0.84) is less than 3.84, do not reject
the null hypothesis

6. Conclusion:
• There is no sufficient evidence to say that a significant association
exists between the gender of residents and if they have ever stolen
anything before.
• The variables gender and ever stolen anything before are
independent.
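The manual computation above can be sketched in a few lines of Python. This is an illustrative helper (the function name `chi_square_independence` is an assumption, not something from the slides or from SPSS); the critical value 3.84 is the table value quoted above for df = 1 at alpha 0.05.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = (row total x column total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: Yes / No; columns: Males / Females.
observed = [[4, 6], [11, 8]]
chi2, df = chi_square_independence(observed)

CRITICAL_VALUE = 3.84  # chi-square table, df = 1, alpha = 0.05
reject_h0 = chi2 > CRITICAL_VALUE

print(f"x2 = {chi2:.4f}, df = {df}, reject H0: {reject_h0}")
```

The same function accepts larger r x c tables, as long as the critical value is looked up for the matching degrees of freedom.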

SPSS Sample
○ Input data into two columns: sex and stolen
Sex: 1=male, 2=female
Stolen: 1=Yes, 2=No
○ Label values accordingly in variable view
○ Run crosstab analysis:

60
○ Input sex to row and
stolen to column or vice-
versa
○ Click Statistics, check
Chi-square
○ Click Continue

61
Exercises
Let's test your understanding of the
Chi-square test
Create the 2 x2 Tabular Data

63
Imprisoned Not imprisoned Row total

Rearrested

Not arrested

Col total

64
Imprisoned Not imprisoned Row total

Rearrested 33 19 52

Not arrested 67 48 115

Col total 100 67 167

a. State the Null Hypothesis


H0: The likelihood of arrest and imprisonment is not
associated.
H0: There is no difference in the likelihood of re-arrest among
offenders who were imprisoned or not imprisoned.
H0: The likelihood of arrest and imprisonment is independent.

b. State the Alternative Hypothesis


65
Determine the rejection region given alpha of 0.05
df = (r-1)(c-1)

Critical Region: x2 > 3.84


Decision Rule: Reject H0 if computed x2 is greater than 3.84
Compute the test statistic

                Imprisoned   Not imprisoned   Row total
Rearrested      33           19               52
Not arrested    67           48               115
Col total       100          67               167

$E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$

Computation table columns: O, E, O-E, (O-E)2, (O-E)2/E
State the Statistical Decision:
• Since the computed statistic (0.40) is less than 3.84, do not reject
the null hypothesis

State the Conclusion:


• There is no significant difference in the likelihood of re-arrest among
offenders who were imprisoned or not imprisoned.
• There is no sufficient evidence to say that a significant association
exists between the likelihood of arrest and imprisonment.
• The likelihood of arrest and imprisonment are independent.

Generate software analysis sample (SPSS)


SPSS Output

69
Perform the hypothesis testing procedure

70
$E = \dfrac{N}{k} = \dfrac{60}{6} = 10$
N is the total number of observed frequencies
k is the number of categories

Neighborhood   O    E    O-E   (O-E)2   (O-E)2/E
A              14   10   4     16       1.6
B              9    10   -1    1        0.1
C              17   10   7     49       4.9
D              3    10   -7    49       4.9
E              7    10   -3    9        0.9
F              10   10   0     0        0
∑              60                       12.4

a. State the Null Hypothesis


H0: The distribution of homicides among all the six
neighborhoods in the city is randomly distributed.
b. State the Alternative Hypothesis
Ha: The distribution of homicides among all the six
neighborhoods in the city is not randomly distributed.
71
Determine the rejection region given alpha of 0.05
df = k-1 = 6-1 = 5
df for single nominal
variables

Critical Region: x2 > 11.07


Decision Rule: Reject H0 if computed x2 is equal or greater than 11.07
Describe sampling distribution

The distribution of cases for a nominal variable can be assessed


using the chi-square test for homogeneity across the six
neighborhood areas, with degrees of freedom at 5 (k-1).

Compute the test statistic

Neighborhood   O    E    O-E   (O-E)2   (O-E)2/E
A              14   10   4     16       1.6
B              9    10   -1    1        0.1
C              17   10   7     49       4.9
D              3    10   -7    49       4.9
E              7    10   -3    9        0.9
F              10   10   0     0        0
∑              60                       12.4

$X^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$

$X^2 = \dfrac{(14-10)^2}{10} + \dfrac{(9-10)^2}{10} + \dfrac{(17-10)^2}{10} + \dfrac{(3-10)^2}{10} + \dfrac{(7-10)^2}{10} + \dfrac{(10-10)^2}{10} = 12.4$
State the Statistical Decision:
• Since the computed statistic (12.4) is greater than 11.07, we reject
the null hypothesis.

State the Conclusion:


The distribution of homicides among all the six neighborhoods
in the city is not randomly distributed.
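The single-variable computation above can be sketched as follows. The labels A to F and the critical value 11.07 come from the worked example; the variable names are illustrative.

```python
# Observed homicides in each of the six neighborhoods.
observed = {"A": 14, "B": 9, "C": 17, "D": 3, "E": 7, "F": 10}

n = sum(observed.values())   # N: total observed frequency
k = len(observed)            # k: number of categories
expected = n / k             # E = N / k, the same for every category

# X2 = sum of (O - E)^2 / E over all categories.
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1

CRITICAL_VALUE = 11.07  # chi-square table, df = 5, alpha = 0.05
reject_h0 = chi2 >= CRITICAL_VALUE

print(f"E = {expected}, x2 = {chi2:.1f}, df = {df}, reject H0: {reject_h0}")
```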

Generate software analysis sample (SPSS)


SPSS

75
SPSS Output

76
Perform the hypothesis testing procedure
Determine if affectional identification of father and the level of delinquency act are
associated.

77
a. State the Null Hypothesis
H0: Affectional identification of father and delinquency are
independent.
b. State the Alternative Hypothesis
Ha: Affectional identification of father and delinquency are
not independent.
78
Determine the rejection region given alpha of 0.01
df = (r-1)(c-1)
=(5-1)(3-1) = 8

Critical Region: x2 > 20.09


Decision Rule: Reject H0 if computed x2 is equal or greater than 20.09
Describe sampling distribution

To determine if both ordinal variables are not independent, the


distribution can be assessed using the chi-square test for
independence. The assumption of independence is computed at
the chi-square statistic with degrees of freedom at 8.
Compute the test statistic

$X^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i} = 61.53$

O     E          O-E        (O-E)2     (O-E)2/E
77    69.51064   7.489362   56.09054   0.806935
263   232.0851   30.91489   955.7306   4.118018
224   222.3191   1.680851   2.82526    0.012708
82    98.80851   -16.8085   282.526    2.859329
56    79.2766    -23.2766   541.7999   6.834298
25    29.80442   -4.80442   23.08244   0.774464
97    99.51227   -2.51227   6.311525   0.063425
97    95.32488   1.675123   2.806036   0.029437
52    42.36661   9.633388   92.80216   2.190455
30    33.99182   -3.99182   15.9346    0.468778
19    21.68494   -2.68494   7.208917   0.332439
44    72.40262   -28.4026   806.7087   11.14198
66    69.35597   -3.35597   11.26256   0.162388
38    30.82488   7.175123   51.48239   1.670157
52    24.73159   27.26841   743.5663   30.06545
Total                                  61.53026
State the Statistical Decision:
• Since the computed statistic (61.53) is greater than 20.09, we reject
the null hypothesis.

State the Conclusion:


Affectional identification of father and delinquency are not
independent. At 99% confidence, we can assume that the level
of affectional identification to a father and the commission of
more delinquent acts are associated.

Generate software analysis sample (SPSS)


SPSS

82
SPSS
○ Analyze / Descriptive Statistics / Crosstabs
○ Input both variables to row and column
○ Statistics - ✓ Chi-square

83
5.
Solving and
interpreting
statistical analysis
for Single sample t-
test for means

84
Single sample t-test for
means
Estimates the population mean of one
sample group
Assumptions:
Level of measurement: interval scale
Population distribution: Normal distribution
with unknown population variance
Sampling method: Independent random
One population group inference

85
Single Sample t-test

μ (population mean) = 65
x̄ (sample mean) = 60
n = 51
σ̂ = 15
1. Hypothesis

2. Test statistic: t-test for one sample mean

3. Distribution of the statistic


• Because σ is unknown and cannot be inferred from our null hypothesis,
we use the t distribution, with the number of degrees of freedom at 50
(N-1)=51-1 = 50.

4. Significance level: α = 0.05 (two-tailed)

• Critical t-score of 2.008
• Decision rule: reject H0 if the computed t is greater than 2.008 or
less than -2.008
5. Calculate test statistic

6. Statistical Decision
• Since the computed statistic (-2.3570) is less than
-2.008, we reject the null hypothesis

7. Conclusion:

• The mean test scores of the prisoners is not 65. The


test scores of the population of prisoners who
completed the program is below the goal of 65, at
95% confidence level.
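The computed statistic of -2.3570 can be reproduced with the one-sample formula this review uses, t = (x̄ − μ) / (s / √(N − 1)). The helper name `one_sample_t` is illustrative; the critical value 2.008 is the table value quoted above.

```python
from math import sqrt


def one_sample_t(sample_mean, mu, s, n):
    """t = (sample mean - population mean) / (s / sqrt(N - 1))."""
    return (sample_mean - mu) / (s / sqrt(n - 1))


# Prisoner test scores: goal of 65, sample mean 60, s = 15, n = 51.
t_stat = one_sample_t(sample_mean=60, mu=65, s=15, n=51)
df = 51 - 1

T_CRITICAL = 2.008  # two-tailed, alpha = 0.05, df = 50 (from the t table)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```

Other texts divide by √N and use a standard deviation already computed with N − 1; the two forms give the same t when applied consistently.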
SPSS
○ Sample data set

89
○ Since p-value (0.028) is less than alpha
of 0.05, we reject the null hypothesis.

Note: the t-statistic in SPSS differs from the manual computation because the actual
dataset was not provided. This is just a simulated dataset closely similar to the actual
values.
Exercises
Let's test your understanding of the
one sample t-test for means
Perform the hypothesis testing procedure

μ (population mean) = 67
x̄ (sample mean) = 72
n = 13
σ̂ = 8.4

92
1. Hypothesis
• H0: The Hispanic drug offenders’ average months in prison is equal to 67.
• H1: The Hispanic drug offenders’ average months in prison is greater than 67.

2. Test statistic: t-test for one sample mean

3. Distribution of the statistic


• Because σ is unknown and cannot be inferred from our null hypothesis,
we use the t distribution, with the number of degrees of freedom at ___
(N-1)= 13-1 = 12.

4. Significance level: α = 0.05 (one-tailed)

• Critical t-score of 1.782
• Decision rule: Reject H0 if the
computed statistic is
greater than 1.782
5. Calculate test statistic

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{N - 1}}$

$t = \dfrac{72 - 67}{8.4 / \sqrt{13 - 1}} = \dfrac{5}{2.424871} = 2.06$

6. Statistical Decision
• Since the computed statistic (2.06) is greater
than 1.782, we reject the null hypothesis

7. Conclusion:
• The Hispanic drug offenders in Border State are
sentenced more severely than the average of 67 months in
prison.
6.
Solving and
interpreting
statistical analysis
for Dependent
Samples t-Test or
Independent
Samples t-Test

95
Dependent / Independent Samples

Compare means of two populations

Assumptions:
Level of measurement: interval scale
Population distribution: Normal distribution in both populations
Sampling method: Independent random
No. of groups: two

Formulas for independent samples: Separate variance; Pooled variance

96
Independent Samples

97
Independent Samples

98
Dependent / Related Samples
Pre-post measurements

99
Dependent / Related Samples
Matched samples

100
Independent Two- Sample t-test
In a study conducted by researchers, police officers were
compared to firefighters in terms of amount of stress and anxiety
experienced on the job. They measured this using an interval-
scale index. Randomly selected subjects were chosen among
police officers and among fire fighters. The final sample included
127 fire fighter and 197 police officers. Among the samples,
researchers found that the mean anxiety-on-the-job score for police
officers was 12.8 (s1 = 2.76), whereas for fire fighters it was 8.8 (s2 =
2.85). What can be concluded regarding the
populations of firefighters and police officers in the city?

Group 1 (Police)
μ1 (population mean) = ?
x̄1 (sample mean) = 12.8
n1 = 197
s1 = 2.76

Group 2 (Fire fighters)
μ2 (population mean) = ?
x̄2 (sample mean) = 8.8
n2 = 127
s2 = 2.85
1. Hypothesis

2. Test statistic: t-test for two independent sample mean

3. Distribution of the statistic


Since the null hypothesis states that μ1 = μ2, the mean of the
sampling distribution is 0 (since mean difference of μ1 – μ2
should equal to 0) at 322 degrees of freedom.

4. Significance level: α = 0.05 (two-tailed)

• Critical t-score of about 1.97 at 322 df, rounded up to 1.98 in the decision rule.
• Decision rule: Reject Ho if
computed t-test is
equal/greater than 1.98 or
equal/less than -1.98.
5. Calculate test statistic
6. Statistical Decision

• Since the computed statistic (12.45) is greater


than 1.98, we reject the null hypothesis

7. Conclusion:

• The mean anxiety-on-the job scores for the population of police


officers is different from that of the population of fire fighters. Or
there is a statistical difference in the mean anxiety-on-the job
scores of both population groups.
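The test statistic for this example can be sketched with the separate-variance form this review uses for independent samples, where each variance is divided by n − 1. The helper name `separate_variance_t` is illustrative, and the critical value 1.98 is the one used in the decision rule above.

```python
from math import sqrt


def separate_variance_t(mean1, s1, n1, mean2, s2, n2):
    """t = (mean1 - mean2 - 0) / sqrt(s1^2/(n1 - 1) + s2^2/(n2 - 1))."""
    standard_error = sqrt(s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / standard_error


# Group 1: police officers; Group 2: fire fighters.
t_stat = separate_variance_t(12.8, 2.76, 197, 8.8, 2.85, 127)
df = 197 + 127 - 2

T_CRITICAL = 1.98  # two-tailed, alpha = 0.05, value used in the decision rule
reject_h0 = abs(t_stat) >= T_CRITICAL

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

This gives a value of roughly 12.44, in line with the 12.45 reported in the statistical decision; the small gap is presumably rounding in the manual work.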
SPSS ○ Sample dataset only. Not actual values.

Encode first all score entries in one column, and


the type of group in another column. Label
accordingly.

To analyze, click Analyze>>Compare Means>>


Independent-Samples T Test
Click score in test variable and group in grouping
variable, specify group 1 and 2>>Continue>>OK

105
○ SAMPLE INDEPENDENT T-TEST OUTPUT. Again, this is not the actual data set
from the manual computation. This is just made to show you how to run the
analysis and generate the SPSS output.

** If the actual data were used, the t statistic should be close to 12.45. However, the decision rule
using software is still based on the p-value [Sig.(2-tailed)], and since 0.000 is less than alpha of 0.05, the
null hypothesis is still rejected, as in the manual computation.
Dependent Two-Sample t-test
A study was conducted by taking an independent random sample
of 35 high-crime addresses from a city. Police officers were then
assigned to conduct beat patrols at each address for a full
month to test if the strategy of police presence was effective in
reducing calls for police service. The number of emergency calls
was collected before the beat patrol and after one month of
implementation. The mean number of calls for police service
before the month was 30; after a month, it was 20. Can we
conclude that the program was effective in reducing calls for
police service?

107
BEFORE
μ1 (population mean) = ?
x̄1 (sample mean) = 30
n1 = 35
s1 = 9.21

AFTER
μ2 (population mean) = ?
x̄2 (sample mean) = 20
n2 = 35
s2 = 7.52

108
1. Hypothesis

2. Test statistic: t-test for 2 dependent sample mean

3. Distribution of the statistic


• Because the null hypothesis assumes there is no difference in the
crime calls before and after the beat patrol walks, μd or mean
difference is 0 with degrees of freedom for the distribution at 34.
• df = N-1 (degrees of freedom for 2-dependent sample mean)
• df = 35 – 1 = 34
4. Significance level: α = 0.05
• t-score of 1.691 (one-tailed)
• Decision rule: Reject H0 if
computed t-test is less than
or equal to -1.691.
• Reject H0 if t-test ≤-1.691

5. Calculate test statistic

Formula for 2-samples dependent t-test



6. Statistical Decision
• Since the computed statistic (-6.4) is less than
-1.69, we reject the null hypothesis

7. Conclusion:

• There is a statistical decrease in the number of


police calls at high-crime addresses in the city if
police conducts beat patrols.

112
SPSS To analyze, click Analyze>>Compare Means>>
Paired-Samples T Test
Encode data in two
separate columns:
a. calls made before
b. calls made during

113
Click the first variable “before”, then click the arrow; this places it in Pair 1 Variable 1.
Click the second variable “during”, then click the arrow again; this places it
in Pair 1 Variable 2.
Click OK

114
SPSS Output

**SPSS Output for paired samples has a default


of 2-tailed test and deducts first number then
second. Since we used one-tail only and
computation of mean difference was during
minus before, we used the negative part of the
output (-6.412).

115
Exercises
Let's test your understanding of the
two independent sample t-test for means and
two dependent sample t-test for means
Perform the hypothesis testing procedure

Number of groups?
• Two: United fans and City fans
Are the groups dependent or independent?
• Independent
What is the mean and sd of group 1?
• 15 and 4.7
What is the mean and sd of group 2?
• 8 and 4.2

117
Perform the hypothesis testing procedure

1. Hypothesis
H0: The mean number of violent crowd behaviors observed
between the United fans is the same as those of the City fans.
H1: The mean number of violent crowd behaviors observed
between the United fans is not the same as those of the City fans.

118
2. Test statistic: t-test for two independent sample mean
3. Distribution of the statistic
$df = N_1 + N_2 - 2 = 110 + 139 - 2 = 247$
Since the null hypothesis states that μ1 = μ2, the mean of the sampling
distribution is 0 (since mean difference of μ1 – μ2 should equal to 0) at
_____ degrees of freedom.
4. If the significance level is α = 0.10 (two-tailed)
• Decision rule: Reject Ho if computed t-test is equal/greater than 1.658
or equal/less than -1.658.
5. Calculate test statistic

United fans
μ1 (population mean) = ?
x̄1 (sample mean) = 15
n1 = 110
s1 = 4.7

City fans
μ2 (population mean) = ?
x̄2 (sample mean) = 8
n2 = 139
s2 = 4.2

Separate variance:

$t = \dfrac{(15 - 8) - 0}{\sqrt{\dfrac{4.7^2}{110 - 1} + \dfrac{4.2^2}{139 - 1}}} = \dfrac{7}{\sqrt{\dfrac{22.09}{109} + \dfrac{17.64}{138}}} = 12.1765$

Pooled variance:

$t = \dfrac{(15 - 8) - 0}{\sqrt{\dfrac{110(4.7^2) + 139(4.2^2)}{110 + 139 - 2}} \sqrt{\dfrac{110 + 139}{110 \times 139}}} = 12.3384$
5. Calculate test statistic: what if σ̂ = 4.45?

United fans
μ1 (population mean) = ?
x̄1 (sample mean) = 15
n1 = 110

City fans
μ2 (population mean) = ?
x̄2 (sample mean) = 8
n2 = 139

$t = \dfrac{(15 - 8) - 0}{4.45 \sqrt{\dfrac{110 + 139}{110 \times 139}}} = 12.3266$
6. Statistical Decision

• Since the computed statistic (12.18) is greater than 1.658, we
reject the null hypothesis

7. Conclusion:

• The mean number of violent crowd behaviors observed between


the United fans is not the same as those of the City fans. Or there
is a statistical difference in the mean number of violent crowd
behaviors observed in both fan groups.
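The pooled-variance result for this exercise can be checked with a short sketch. The helper name `pooled_variance_t` is illustrative; the formula follows the pooled form used in this review, with n·s² in the numerator and n1 + n2 − 2 in the denominator.

```python
from math import sqrt


def pooled_variance_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic using a pooled estimate of the standard deviation."""
    # Pooled standard deviation estimate.
    pooled_sd = sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2))
    # Standard error of the difference between the two sample means.
    standard_error = pooled_sd * sqrt((n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / standard_error


# United fans vs. City fans.
t_pooled = pooled_variance_t(15, 4.7, 110, 8, 4.2, 139)
df = 110 + 139 - 2

T_CRITICAL = 1.658  # two-tailed, alpha = 0.10, value used in the decision rule
reject_h0 = abs(t_pooled) >= T_CRITICAL

print(f"pooled t = {t_pooled:.4f}, df = {df}, reject H0: {reject_h0}")
```

Both the separate-variance and pooled versions lead to the same decision here, because either statistic is far beyond the critical value.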

***SPSS Sample not generated, no


available dataset from problem in book.
SAMPLE PROBLEM SET FOR DEPENDENT SAMPLES

1. Hypothesis H0: The anger control index of prisoners is the same in the first and
last lesson of the course. (Or no significant difference in anger
control index during the first and last lesson)
H1: The anger control index of prisoners is higher in the last lesson.

2. Test statistic: t-test for two dependent sample mean

3. Distribution of the statistic


𝑑𝑓 = 𝑁 − 1 𝑑𝑓 = 41 − 1 = 40
Since the null hypothesis states that μ1 = μ2, the mean of the sampling
distribution is 0 (since mean difference of μ1 – μ2 should equal to 0) at 40
degrees of freedom.
t-test Table:

4. If the significance level is α = 0.05 (one-tailed)


• Decision rule: Reject Ho if computed t-test is equal/greater than 1.684.
**One-tailed test is used, since the
hypothesis is testing that the last lesson is
greater. 126
1 4
6 4
2 4
○ To calculate t-test statistic, we need the sample mean and sample standard
6
4
6
4
deviation. Use Excel to compute for this.
9 9
first last 9 9
1 2 0 0
2 4 1 1
FIRST
1 6 1 1 μ1 (population mean) = ?
3 2 3 3
2 2 𝑥ҧ 1 (sample mean) = 3.4
4 5
7 9 2 2 n1 = 41
1 1
6 6
0 0 s1= 2.4
4 3
4 4
4 7
4 4
1 1
6 6 LAST
2 1
3 4
6 6 μ2 (population mean) =?
7 7
4 9 3 3 𝑥ҧ 2 (sample mean) = 4.1
6 7 1 1 n2 = 41
7 8 1 1
2 2 3.439024 4.097561 MEAN s2= 2.7
2 7 2.389497 2.666915 STDDEV
3 3
1 4 Thus, sample mean difference (overall)
6 4 formula used: = sample mean of last – sample mean of first
Mean in first: =AVERAGE(A2:A42)
2 4 = 4.1 – 3.4 = 0.7
6 6 Mean in last: =AVERAGE(B2:B42)
Standard dev in first: =STDEV.P(A2:A42) 127
4 4
Standard dev in last: =STDEV.P(B2:B42)
9 9
○ Use the Excel table to generate the difference per entry:

mean diff = last – first
mean diff – sample mean diff = answers in mean diff – 0.7 (overall sample mean)
mean diff² = square of answers in [mean diff – overall sample mean]

first  last  mean diff  mean diff – sample mean diff  mean diff²
1      2      1          0.3                           0.09
2      4      2          1.3                           1.69
1      6      5          4.3                          18.49
3      2     -1         -1.7                           2.89
4      5      1          0.3                           0.09
7      9      2          1.3                           1.69
6      6      0         -0.7                           0.49
4      3     -1         -1.7                           2.89
4      7      3          2.3                           5.29
1      1      0         -0.7                           0.49
2      1     -1         -1.7                           2.89
3      4      1          0.3                           0.09
4      9      5          4.3                          18.49
6      7      1          0.3                           0.09
7      8      1          0.3                           0.09
2      2      0         -0.7                           0.49
2      7      5          4.3                          18.49
3      3      0         -0.7                           0.49
1      4      3          2.3                           5.29
6      4     -2         -2.7                           7.29
2      4      2          1.3                           1.69
6      6      0         -0.7                           0.49
4      4      0         -0.7                           0.49
9      9      0         -0.7                           0.49
9      9      0         -0.7                           0.49
0      0      0         -0.7                           0.49
1      1      0         -0.7                           0.49
1      1      0         -0.7                           0.49
3      3      0         -0.7                           0.49
2      2      0         -0.7                           0.49
2      2      0         -0.7                           0.49
1      1      0         -0.7                           0.49
0      0      0         -0.7                           0.49
4      4      0         -0.7                           0.49
4      4      0         -0.7                           0.49
6      6      0         -0.7                           0.49
6      6      0         -0.7                           0.49
7      7      0         -0.7                           0.49
3      3      0         -0.7                           0.49
1      1      0         -0.7                           0.49
1      1      0         -0.7                           0.49
TOTAL        27                                       99.29
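The per-entry columns and their totals can be reproduced with a short script. This is a sketch only, assuming the 41 score pairs as read from the table above; the variable names are illustrative and not part of the original Excel workflow. It also shows that the total of 99.29 comes from using the rounded mean difference 0.7, while the unrounded mean difference gives a slightly smaller sum of squares.

```python
# Scores for the 41 prisoners, as read from the table above.
first = [1, 2, 1, 3, 4, 7, 6, 4, 4, 1, 2, 3, 4, 6, 7, 2, 2, 3, 1, 6, 2,
         6, 4, 9, 9, 0, 1, 1, 3, 2, 2, 1, 0, 4, 4, 6, 6, 7, 3, 1, 1]
last = [2, 4, 6, 2, 5, 9, 6, 3, 7, 1, 1, 4, 9, 7, 8, 2, 7, 3, 4, 4, 4,
        6, 4, 9, 9, 0, 1, 1, 3, 2, 2, 1, 0, 4, 4, 6, 6, 7, 3, 1, 1]

# Column "mean diff": last minus first, one value per prisoner.
diffs = [l - f for f, l in zip(first, last)]
n = len(diffs)
sum_diff = sum(diffs)
mean_diff = sum_diff / n

# Squared deviations using the rounded mean difference (0.7), as in the table.
rounded_mean = 0.7
ss_rounded = sum((d - rounded_mean) ** 2 for d in diffs)

# Squared deviations using the unrounded mean difference, for comparison.
ss_exact = sum((d - mean_diff) ** 2 for d in diffs)

print(f"n = {n}, sum of differences = {sum_diff}, mean difference = {mean_diff:.6f}")
print(f"sum of squares with 0.7: {ss_rounded:.2f}")
print(f"sum of squares with unrounded mean: {ss_exact:.4f}")
```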
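5. Calculate test statistic

FIRST                             LAST
μ1 (population mean) = ?          μ2 (population mean) = ?
x̄1 (sample mean) = 3.4            x̄2 (sample mean) = 4.1
n1 = 41                           n2 = 41
s1 = 2.4                          s2 = 2.7

Mean of the differences and variance of the differences:

$$\bar{X}_d = \frac{27}{41} = 0.658537 \qquad s_d^2 = \frac{99.29}{41} = 2.4217$$

Test statistic:

$$t = \frac{\bar{X}_d - 0}{\sqrt{\dfrac{s_d^2}{N - 1}}} = \frac{0.658537 - 0}{\sqrt{\dfrac{2.4217}{41 - 1}}} = 2.676$$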
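The arithmetic in this step can be checked with a few lines of Python. This is a minimal sketch, assuming the totals 27 and 99.29 from the difference table and the critical value 1.684 from the decision rule; the variable names are illustrative.

```python
import math

n = 41                    # number of paired observations
sum_diff = 27             # total of the (last - first) differences
sum_sq_dev = 99.29        # total of the squared deviations from 0.7
critical_t = 1.684        # one-tailed critical value, alpha = 0.05, df = 40

mean_diff = sum_diff / n                      # 27 / 41
var_diff = sum_sq_dev / n                     # 99.29 / 41
std_error = math.sqrt(var_diff / (n - 1))     # sqrt(2.4217 / 40)
t_stat = (mean_diff - 0) / std_error          # hypothesized mean difference is 0

reject_h0 = t_stat >= critical_t
print(f"mean diff = {mean_diff:.6f}, variance = {var_diff:.4f}")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Dividing the variance by N and then by N − 1 inside the square root is algebraically the same as the more common form that divides the sum of squares by N − 1 and then by N.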

6. Statistical Decision
• Since the computed statistic (2.676) is greater than 1.684, we
reject the null hypothesis
7. Conclusion:
• The anger control index of prisoners is higher in the last lesson.
SPSS

**The SPSS output differs minimally from the manual result of 2.676. The SPSS
result is also negative because the paired difference was computed as first
minus last, whereas the manual computation used last minus first.
7. Solving and interpreting statistical analysis for ANOVA

K-Independent Samples
Analysis Of Variance
Compares the means of more than two
independent population groups
Extension of the t-test for two independent population
means; the simplest form is One-Way ANOVA
Assumptions:
Level of measurement: interval scale
Population distribution: Normal distribution
Sampling method: Independent random
No. of groups: three or more

One Way ANOVA
○ Only one factor / variable / treatment is investigated
○ Example:
○ Compare mean depression scores of inmates in High,
Moderate and Low Security prisons

1. Hypothesis • H0: The mean depression scores of the inmates in the
low, moderate and high security prison are equal.
• H1: The mean depression scores of the inmates in the
low, moderate and high security prison are not equal.

2. Test statistic: One way ANOVA

3. Distribution of the statistic


• We use the F-distribution to test for differences in means in the three
groups with degrees of freedom for between group variance and
within group variance at:
df between-group variance = k − 1 = 3 − 1 = 2
df within-group variance = N − k = 12 − 3 = 9
4. Significance level: α = 0.05
• Critical F value at df = (2, 9): 4.26
• Decision rule: Reject H0 if the computed F statistic is equal to or
greater than 4.26

5. Calculate test statistic

Complete first table: Estimated Effects

Group      Measures        Group Mean   Group Mean − Overall Mean
Low        3   5   4   4   4            −3
Moderate   9   9   8   6   8             1
High       9  10   7  10   9             2

Overall Mean = 7

k = 3; N = 12

Source of Variation   df            SS   MS     F
Between               3 − 1 = 2     56   28     18
Within                12 − 3 = 9    14   1.56
Total                 12 − 1 = 11   70

F-table: 4.26     Computed F = 18
6. Statistical Decision
• Since the computed statistic (18) is greater than
4.26, we reject the null hypothesis
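The ANOVA table entries can be reproduced from the twelve depression scores. The following is a minimal sketch, assuming the Low, Moderate and High measures shown in the Estimated Effects table; the dictionary and variable names are illustrative only.

```python
# Depression scores per security level, from the Estimated Effects table.
groups = {
    "Low": [3, 5, 4, 4],
    "Moderate": [9, 9, 8, 6],
    "High": [9, 10, 7, 10],
}

all_scores = [x for scores in groups.values() for x in scores]
N = len(all_scores)          # total number of observations
k = len(groups)              # number of groups
overall_mean = sum(all_scores) / N

# Between-group sum of squares: group size times squared (group mean - overall mean).
ss_between = sum(
    len(scores) * (sum(scores) / len(scores) - overall_mean) ** 2
    for scores in groups.values()
)

# Within-group sum of squares: squared deviation of each score from its group mean.
ss_within = sum(
    (x - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for x in scores
)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_stat = ms_between / ms_within

print(f"SS between = {ss_between}, SS within = {ss_within}, SS total = {ss_between + ss_within}")
print(f"MS between = {ms_between}, MS within = {ms_within:.2f}, F = {f_stat:.2f}")
```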

7. Conclusion:

• The mean depression scores of the inmates in the low, moderate and high
security prisons are not equal. There is a significant difference in the mean
depression scores of inmates across the three security prisons.

SPSS

p-value from SPSS is 0.001
6. Statistical Decision
• Since the p-value (0.001) is less than alpha of 0.05,
we reject the null hypothesis

7. Conclusion:

• The mean depression scores of the inmates in the low, moderate and high
security prisons are not equal. There is a significant difference in the mean
depression scores of inmates across the three security prisons.

8. Solving and interpreting statistical analysis for Kruskal-Wallis

Kruskal-Wallis Test
Non-Parametric ANOVA
Compares the medians of more than two
independent population groups
Non-parametric counterpart of One-Way ANOVA
Assumptions:
Level of measurement: ordinal scale
Population distribution: No assumptions
Sampling method: Independent random
No. of groups: three or more

Kruskal-Wallis Test or H-Test
○ Used for comparing more than two independent
or unrelated samples under the hypothesis of
equal location parameters
○ Non-parametric equivalent of One-Way ANOVA
○ Data consist of k (≥ 3) random samples, one
sample from each of the k populations
○ The primary interest is to determine differences
across the k populations by comparing their
relative central locations
○ The outcome of interest has an underlying
continuous distribution and is measured at least
at the ordinal level
Assumptions of the Test
1. The samples are randomly collected, and the observations are
independent both within each group and between
groups
2. The dependent variable is at least at the ordinal level
of measurement
3. The independent variable or grouping variable is
nominal with more than two levels
4. The distributions of the values in the sampled
populations have identical shapes,
except for a possible difference in the measure of
central tendency of at least one of the groups
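Test Statistic (H-Test)
For small samples (k=3 and ni < 8; k=4 and ni < 4; k=5 and ni < 3)

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

where:
N = total number of observations across all samples
Ri = sum of the ranks in the i-th group
ni = number of values in the group that produced rank sum Ri
k = number of groups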
Test Statistic (H-Test)
For larger samples, H is distributed approximately as chi-square under the null
hypothesis, with degrees of freedom equal to k − 1

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

where:
N = total number of observations across all samples
Ri = sum of the ranks in the i-th group
ni = number of values in the group that produced rank sum Ri
k = number of groups
Steps in manual procedure
1. Combine the observations from the k samples into a single
series of size N, arrange them in order of magnitude from smallest
to largest, then rank all observations.
2. Return each rank to its original group and compute the
sum of ranks per group (Ri)
3. Compute the test statistic, and use the correction factor formula if
the ranks have ties
4. Determine the significance of the computed H statistic by comparing it with the
values in the table of critical values for the Kruskal-Wallis test if k=3 and
ni ≤ 5 in each group, or with the tabulated values of χ² with k−1 df if there are
more than 5 observations in one or more of the sampled groups
Steps in software
computation (SPSS)
Steps in manual procedure
Sample Case:
Researchers were interested in studying
○ STATE THE HYPOTHESIS:
H0: All social interaction groups have the same population median
self-confidence rating
HA: At least one social interaction group has a different population
median self-confidence rating

or

H0: There is no tendency for self-confidence to rank systematically higher
or lower for any of the levels of social interaction.
HA: There is a tendency for self-confidence to rank systematically higher
or lower for at least one level of social interaction when compared with
the other levels.
○ LEVEL OF SIGNIFICANCE
α = 0.05
○ COMPUTE FOR TEST STATISTIC
1. For manual procedure:
a. Line up all samples in one column, get ranks.

Group    Self confidence   Rank
LOW      3                 1
LOW      4                 2
MEDIUM   5                 3
LOW      6                 4
LOW      7                 5
LOW      8                 6
MEDIUM   9                 7
MEDIUM   10                8
MEDIUM   11                9
HIGH     12                10
LOW      15                11
HIGH     18                12
MEDIUM   19                13.5
HIGH     19                13.5
HIGH     20                15
HIGH     21                16
HIGH     23                17

b. Regroup ranks by population samples

Group    Self confidence   Rank   Sum of Ranks
HIGH     12                10
HIGH     18                12
HIGH     19                13.5
HIGH     20                15
HIGH     21                16
HIGH     23                17     83.5
LOW      3                 1
LOW      4                 2
LOW      6                 4
LOW      7                 5
LOW      8                 6
LOW      15                11     29
MEDIUM   5                 3
MEDIUM   9                 7
MEDIUM   10                8
MEDIUM   11                9
MEDIUM   19                13.5   40.5
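The ranking and regrouping in steps a and b can be sketched in Python. This is an illustration only, assuming the 17 group and score pairs shown in the table; the helper logic gives tied scores the average of the rank positions they occupy, which is how the two scores of 19 both receive rank 13.5.

```python
# (group, self-confidence score) pairs from the sample case above.
data = [
    ("LOW", 3), ("LOW", 4), ("MEDIUM", 5), ("LOW", 6), ("LOW", 7), ("LOW", 8),
    ("MEDIUM", 9), ("MEDIUM", 10), ("MEDIUM", 11), ("HIGH", 12), ("LOW", 15),
    ("HIGH", 18), ("MEDIUM", 19), ("HIGH", 19), ("HIGH", 20), ("HIGH", 21),
    ("HIGH", 23),
]

# Step a: line up all samples in one column, smallest to largest.
ordered = sorted(data, key=lambda pair: pair[1])

# Assign ranks 1..N; tied scores share the average of their positions.
ranks = [0.0] * len(ordered)
start = 0
while start < len(ordered):
    end = start
    while end + 1 < len(ordered) and ordered[end + 1][1] == ordered[start][1]:
        end += 1
    average_rank = (start + 1 + end + 1) / 2
    for i in range(start, end + 1):
        ranks[i] = average_rank
    start = end + 1

# Step b: regroup the ranks by sample and add them up.
rank_sums = {}
for (group, _score), rank in zip(ordered, ranks):
    rank_sums[group] = rank_sums.get(group, 0.0) + rank

for group, total in rank_sums.items():
    print(f"{group}: sum of ranks = {total}")
```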
○ COMPUTE FOR TEST STATISTIC
Manual computation

Based on the critical value table for the H-Test, the critical value for k=3 and
n1=6, n2=6, n3=5 at α = 0.05 is 5.76
▪ STATISTICAL DECISION
Since H = 9.94 is greater than 5.76, we reject H0
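The value H = 9.94 can be reproduced from the rank sums. The sketch below assumes the standard H formula, H = 12 / (N(N+1)) × Σ(Ri² / ni) − 3(N+1), and the usual tie correction, which divides H by 1 − Σ(t³ − t) / (N³ − N), where t is the size of each group of tied scores. The slides do not show the correction formula, so treating it as the one used here is an assumption; it does reproduce the reported value.

```python
# Rank sums and group sizes from the regrouped table: group -> (Ri, ni).
rank_sums = {"HIGH": (83.5, 6), "LOW": (29.0, 6), "MEDIUM": (40.5, 5)}
N = sum(n for _r, n in rank_sums.values())    # total observations (17)

# Uncorrected H statistic.
h_uncorrected = (
    12 / (N * (N + 1)) * sum(r ** 2 / n for r, n in rank_sums.values())
    - 3 * (N + 1)
)

# Tie correction: one pair of tied scores (the two 19s), so t = 2.
tie_sizes = [2]
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (N ** 3 - N)
h_corrected = h_uncorrected / correction

critical_value = 5.76    # table value for n = 6, 6, 5 at alpha = 0.05
reject_h0 = h_corrected > critical_value

print(f"H (uncorrected) = {h_uncorrected:.2f}")
print(f"H (tie-corrected) = {h_corrected:.2f}, reject H0: {reject_h0}")
```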
○ COMPUTE FOR TEST STATISTIC
SPSS software

▪ STATISTICAL DECISION
Since p-value (0.012) is less than alpha of
0.05, we reject the null hypothesis.
○ CONCLUSION
At least one social interaction group has a different population
median self-confidence rating; OR
There is a tendency for self-confidence to rank systematically
higher or lower for at least one level of social interaction when
compared with the other levels.
Thanks!

References
Almeda JV et al. (2010). Elementary Statistics. UP Press.
Weisburd D et al. (2014). Statistics in Criminal Justice. 4th Ed.
Daniel WW (2009). Biostatistics: A Foundation for Analysis in
the Health Sciences. 9th Ed.
Mendoza OM et al. (2000). Foundations of Statistical
Analysis for the Health Sciences.
Palatino MC (2016). Lecture Notes in STATA Training: Statistical
Analysis of Quantitative Data. UP Manila.
