Modules 8 - 12 Stat

Modules 8 and 9
Module 8
TESTING HYPOTHESIS
A hypothesis is a conjecture, proposition, assumption, or supposition that is
temporarily and provisionally accepted to describe a certain event and that is still
to be verified or disproven by the facts to be gathered. A hypothesis is a tentative
insight that serves as the basis of an investigation and as guidance in proving or
disproving certain characteristics of a population.
In statistics, a hypothesis is classified as either a null hypothesis, denoted by
Ho, or an alternative hypothesis, denoted by H1. Etymologically, the term “null”
comes from the Latin word “nullus”, which means ‘none’. Semantically, “null”
means ‘nil’, ‘no value’, or ‘void’.
The null hypothesis is the hypothesis of no difference. For two population
means, it states that μ1 − μ2 = 0, that is, μ1 = μ2; therefore, Ho: μ1 − μ2 = 0, or
there is no significant difference. Otherwise, if μ1 − μ2 ≠ 0, that is, μ1 ≠ μ2
(either μ1 > μ2 or μ1 < μ2), then reject Ho in favor of H1: μ1 − μ2 ≠ 0, or there is
a significant difference.
However, rejecting the null hypothesis, Ho, in favor of the alternative
hypothesis, H1, when Ho is in fact true and should have been accepted is called a
Type I Error or False Positive. The probability of committing a Type I Error is
denoted by α, the significance level of the test (commonly 0.05).
On the other hand, accepting the null hypothesis, Ho, when it is in fact false
and should have been rejected is called a Type II Error or False Negative. In other
words, a Type II Error is a failure to detect a difference that really exists. The
probability of committing a Type II Error is denoted by β.
For example, suppose Ho states that Reading Remediation has no effect on the
performance of students. If the data lead us to reject Ho in favor of H1 when Ho is
in fact true, a Type I Error is committed. However, if the data lead us to accept
Ho when it is in fact false, a Type II Error is committed.
Module 9
CORRELATION AND REGRESSION
There are instances when we are interested in knowing whether one variable
(the first or independent variable) has a significant relationship with another
variable (the second or dependent variable). For example, a teacher would like to
establish whether a relationship exists between the reading ability and the National
Achievement Test (NAT) scores of students, hoping to improve student
achievement in the future. In other words, we would like to determine whether
NAT scores depend on the reading ability of students. To get a good estimate of
the strength of the relationship between the two variables, the teacher needs to
compute the Pearson Product-Moment Correlation Coefficient, r.
Lesson 1 – Simple Linear Regression
Regression analysis involves the prediction of one variable (the dependent
variable) from another variable (the independent variable). It also helps in
deciding whether the independent variable (x) significantly influences the
dependent variable (y).
Example 1
The following data represent the math grades for a random sample of 12
freshmen at a certain college along with their scores on an intelligence test while
they were still seniors in high school.
STUDENT   INTELLIGENCE        MATH GRADE    Xi²       XiYi
          TEST SCORE (Xi)     (Yi)
1         65                  85            4225      5525
2         50                  74            2500      3700
3         55                  76            3025      4180
4         65                  90            4225      5850
5         55                  85            3025      4675
6         70                  87            4900      6090
7         65                  94            4225      6110
8         70                  98            4900      6860
9         55                  81            3025      4455
10        70                  91            4900      6370
11        50                  76            2500      3800
12        55                  74            3025      4070
TOTAL     725                 1011          44475     61685

$$\bar{X} = \frac{725}{12} = 60.42 \qquad \bar{Y} = \frac{1011}{12} = 84.25$$
Substituting into the formulas for b and a:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$

n∑XY = 12(61,685) = 740,220
∑X∑Y = 725(1011) = 732,975
n∑X² = 12(44,475) = 533,700
(∑X)² = (725)² = 525,625

$$b = \frac{740{,}220 - 732{,}975}{533{,}700 - 525{,}625} = \frac{7{,}245}{8{,}075}$$

b = 0.897 or b = 0.90
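$$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - (\sum X)^2} \qquad \text{or} \qquad a = \bar{Y} - b\bar{X}$$

Using the first formula:

$$a = \frac{1011(44{,}475) - 725(61{,}685)}{8{,}075} = \frac{44{,}964{,}225 - 44{,}721{,}625}{8{,}075} = \frac{242{,}600}{8{,}075} = 30.04$$

Using the second formula with the rounded slope b = 0.90:
a = 84.25 – 0.90(60.42)
a = 84.25 – 54.378
a = 29.87
The small gap between 29.87 and 30.04 comes from rounding b. With b = 0.897,
the second formula gives a = 84.25 – 54.20 = 30.05.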
Therefore the linear regression equation from the general form of
Y = a + bX is
Y = 30.04 + 0.90X
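The computation of a and b can be checked with a short Python sketch. It applies the same raw-sum formulas to the twelve pairs of Example 1; the function name `least_squares` is only illustrative and is not part of the module.

```python
# Intelligence test scores (X) and math grades (Y) from Example 1.
X = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
Y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]


def least_squares(x, y):
    """Return (a, b) for the line Y = a + bX using the raw-sum formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2          # n∑X² − (∑X)²
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return a, b


a, b = least_squares(X, Y)
print(f"a = {a:.2f}, b = {b:.3f}")

# Predicted math grade for an intelligence test score of 60.
print(f"Y at X = 60: {a + b * 60:.2f}")
```

Because the sketch keeps the unrounded slope, its prediction at X = 60 differs slightly from the value obtained with the rounded equation Y = 30.04 + 0.90X.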
The regression line plotted on the scatter diagram passes through the points
tabulated below.
[Figure: regression line Y = 30.04 + 0.90X on the scatter diagram. Horizontal axis:
INTELLIGENCE TEST SCORE. Vertical axis: MATH GRADE.]
X      Y
0      30.04
10     39.04
30     57.04
50     75.04
60     84.04
70     93.04
80     102.04
To test the significance of b, compute the value of tb, which is

$$t_b = \frac{b}{\sqrt{s_{y.x}^2 \Big/ \sum_{i=1}^{n} x_i^2}}$$

where tb = computed t value
s²y.x = residual mean square or residual variance
n = the number of paired data
b = slope of the line
xi = Xi − X̄, the independent variable expressed as a deviation from its mean
(similarly, yi = Yi − Ȳ)

$$s_{y.x}^2 = \frac{\sum y^2 - \left(\sum xy\right)^2 / \sum x^2}{n-2}$$

The null hypothesis is Ho: b = 0, that is, Y does not depend on X. Reject Ho if
|tb| > tα(n−2).
From Example 1, ∑Y² = 85,905, so the sums of deviations are
∑x² = ∑X² − (∑X)²/n = 44,475 − 525,625/12 = 672.917
∑y² = ∑Y² − (∑Y)²/n = 85,905 − 1,022,121/12 = 728.250
∑xy = ∑XY − ∑X∑Y/n = 61,685 − 732,975/12 = 603.750

$$s_{y.x}^2 = \frac{728.250 - (603.750)^2/672.917}{10} = \frac{728.250 - 541.692}{10} = \frac{186.558}{10} = 18.656$$

Therefore

$$t_b = \frac{0.897}{\sqrt{18.656/672.917}} = \frac{0.897}{0.1665} = 5.39$$

From table A.2, the tabular value tt = tα(n−2) with α = 0.05 and df = n − 2 = 10 is
2.23. Since tb > tt, we reject Ho: b = 0; that is, Y depends on X.
Lesson 2 – Correlation
The correlation coefficient r was first introduced by Karl Pearson and is often called
the product-moment correlation. This correlation coefficient measures the amount of
spread about the linear least-squares equation. Its range is from −1.0 to +1.0. If all
the points lie exactly on a straight line, r will be either +1.0 or −1.0, depending on
whether the relationship is positive or negative. As the absolute value of r approaches
1.0, the points lie closer to the line. The value of r approaches 0 (zero) as the points
become more randomly scattered, as indicated in the figures below:
To compute the value of the correlation coefficient, r, the formula is

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
STUDENT   TEST SCORE (X)   MATH GRADE (Y)   Y²
1         65               85               7225
2         50               74               5476
3         55               76               5776
4         65               90               8100
5         55               85               7225
6         70               87               7569
7         65               94               8836
8         70               98               9604
9         55               81               6561
10        70               91               8281
11        50               76               5776
12        55               74               5476
TOTAL     725              1011             85,905
n∑XY = 740,220            n∑Y² = 12(85,905) = 1,030,860
(∑X)(∑Y) = 732,975        (∑Y)² = (1011)² = 1,022,121
n∑X² = 533,700
(∑X)² = 525,625
Computing for r:

$$r = \frac{740{,}220 - 732{,}975}{\sqrt{(533{,}700 - 525{,}625)(1{,}030{,}860 - 1{,}022{,}121)}} = \frac{7{,}245}{\sqrt{(8{,}075)(8{,}739)}} = \frac{7{,}245}{8{,}400.442}$$

r = 0.8625 or r² = 0.7439
We can say that approximately 74.39% of the variation in the values of Y is
accounted for by a linear relationship with X.
To test the significance of the correlation coefficient, the following relationships
are used:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$ with n − 2 degrees of freedom, if n < 50

$$z = \frac{r}{1/\sqrt{n-2}}$$ if n > 50, where the critical value is Z = 1.645
From Example 1, n = 12; therefore

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.8625\sqrt{12-2}}{\sqrt{1-0.7439}} = \frac{0.8625(3.162)}{\sqrt{0.2561}} = \frac{2.727}{0.506}$$

t = 5.389
The tabular value tt with n − 2 = 10 df at α = 0.05 is
tt = 2.228
Accept Ho if tc is less than tt; reject Ho if tc is equal to or greater than tt.
Since tc = 5.389 is greater than tt = 2.228, reject Ho; we can say that there is a
significant correlation between the two variables.
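The correlation coefficient and its t value can be reproduced with the Python sketch below. It uses the same raw-sum formula for r and the t relationship for n < 50; the helper name `pearson_r` and the constant `T_TABLE` (the tabular value 2.228 quoted above) are illustrative choices, not part of the module.

```python
import math

# Intelligence test scores (X) and math grades (Y) from Example 1.
X = [65, 50, 55, 65, 55, 70, 65, 70, 55, 70, 50, 55]
Y = [85, 74, 76, 90, 85, 87, 94, 98, 81, 91, 76, 74]


def pearson_r(x, y):
    """Pearson product-moment correlation from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


n = len(X)
r = pearson_r(X, Y)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_TABLE = 2.228  # tabular t for df = 10, alpha = 0.05, as quoted in the text
decision = "reject Ho" if abs(t) >= T_TABLE else "accept Ho"
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, t = {t:.3f}, decision: {decision}")
```

The unrounded r² printed here differs from 0.7439 in the fourth decimal place because the text squares the already rounded r = 0.8625.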
Exercise 9
Correlation and Regression
Name:________________________________________Score:___________
Course/Year:_____________________Time:________Date:_____________
Code No.____________
1. The data below show the final grades in algebra and physics obtained by
10 students selected at random from a large group of students.
Algebra (X) 75 80 93 65 78 71 98 68 84 77
Physics (Y) 82 78 86 72 81 80 95 72 89 74
a. Graph the data above

b. Find the linear regression equation and locate it on the same graph
c. Test the linearity of the data.
d. If a student receives a grade of 75 in algebra, what is his expected grade
in physics?
2. The following data are indexed prices of gold and copper over the last 10
years. Assuming these indexed values constitute a random sample from
the population of possible values, test for the existence of correlation
between the indexed prices of the two metals.
Gold: 76 62 70 58 52 53 53 56 57 56
Copper: 80 68 73 63 65 68 65 63 65 66
3. An exclusive school in Manila conducted a study on the relationship between the
age and the teaching performance of teachers, as evaluated by the students using
a 5-point scale, where 5 is the highest. With a random sample of 16
teachers, the results are shown below.
TEACHER AGE(X) PERFORMANCE(Y)

1 38 3.89
2 30 4.12
3 37 4.05
4 42 3.61
5 45 3.08
6 52 2.58
7 48 3.42
8 33 4.47
9 36 4.68
10 32 4.39
11 29 2.95
12 48 3.01
13 56 2.80
14 30 4.34
15 44 3.67
16 43 3.65
a. Determine the degree of relationship existing between the two
variables and the significance of the obtained r using the 0.05 level of
significance.
b. What percent of the total variation in the performance of the teachers is
explained by their ages?
c. Find the regression equation and interpret it.
d. Estimate the teaching performance if the age of the teacher is 46.
Module 10
CHI-SQUARE, χ²
The chi-square test, χ², is used when the population is not assumed to follow a
normal distribution. The statistics involved are known as nonparametric or
distribution-free statistics. Chi-square is used as a test of significance when we
have data that are expressed in frequencies, or data in terms of percentages or
proportions that can be reduced to frequencies. The data must be independent;
that is, no response is related to any other response.
The general formula for chi-square is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = the observed cell frequency
E = the expected or theoretical frequency
The χ² test is a general test used to determine whether there is a significant
difference between the observed frequencies and the expected frequencies. It can be
used to compare frequencies between two categories or to interrelate nominal
categories with any number of other categories.
Lesson 1 – The One-Sample Chi-Square Test
Suppose a sample of 100 barangay residents (one sample only) were
asked whether they are in favor or not in favor (two categories of response) of
putting up a certain project in their community. The null hypothesis, Ho (the
hypothesis of no difference), would be: the proportions of residents who are in
favor and not in favor of establishing the project in their barangay are equal.
The actual or observed frequencies (O) are 55 in favor and 45 not in favor.
Since Ho assumes an equal split, the expected frequencies, E, are equal for both:
½N = 50 for in favor and ½N = 50 for not in favor. To facilitate the computation, a
working table is recommended:
               Observed        Expected        O − E           (O − E)²   (O − E)²/E
               Frequency (O)   Frequency (E)
In favor       55              ½(100) = 50     55 − 50 = 5     25         25/50 = 0.5
Not in favor   45              ½(100) = 50     45 − 50 = −5    25         25/50 = 0.5
Computed chi-square value, χ²c = 1.0
Referring to Table A., the tabular value, χ²t, with df = (number of
categories − 1) = 2 − 1 = 1 and α = 0.05 is 3.841.
The null hypothesis, Ho, is accepted if χ²c < χ²t. It is rejected if χ²c > χ²t. From
our example, since χ²c < χ²t, or 1.0 < 3.841, the decision is to accept Ho; that
is, the data are consistent with equal proportions of residents who are in favor and
not in favor of putting up the project in their barangay.
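The working table can be expressed as a few lines of Python. This is a minimal sketch of the one-sample computation under the equal-split null hypothesis; the function name `chi_square` and the constant `CHI2_TABLE` (the tabular value 3.841 quoted above) are illustrative.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [55, 45]                      # in favor, not in favor
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # equal split under Ho

chi2 = chi_square(observed, expected)
df = len(observed) - 1

CHI2_TABLE = 3.841  # tabular value for df = 1, alpha = 0.05, as quoted in the text
decision = "reject Ho" if chi2 > CHI2_TABLE else "accept Ho"
print(f"chi-square = {chi2:.1f}, df = {df}, decision: {decision}")
```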
Lesson 2 – The Chi-Square Test for Two Independent Samples
(In a 2x2 Contingency Table). Suppose that age (independent variable) is being
tested for association with the level of job satisfaction (dependent variable). Age
can be categorized into two independent samples, Young and Old, while the
dependent variable can be categorized into low and high levels of job satisfaction.
A 2x2 contingency table can be constructed as follows:
            LEVEL OF JOB SATISFACTION
AGE         HIGH      LOW       TOTAL
Young       A         B         A+B
Old         C         D         C+D
TOTAL       A+C       B+D       N
Chi-square is obtained using the formula:

$$\chi^2 = \frac{N(AD - BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$ where df = 1
When df = 1 and any expected frequency is small (less than 10), the formula
below, which applies Yates' correction for continuity, should be used:

$$\chi^2 = \frac{N\left(|AD - BC| - N/2\right)^2}{(A+C)(B+D)(A+B)(C+D)}$$
Suppose our 2x2 contingency table is:
            LEVEL OF JOB SATISFACTION
AGE         HIGH        LOW         TOTAL
Young       A = 25      B = 30      55
Old         C = 60      D = 25      85
TOTAL       85          55          140
Using the formula:

$$\chi^2_c = \frac{N(AD - BC)^2}{(A+C)(B+D)(A+B)(C+D)} = \frac{140[(25)(25) - (30)(60)]^2}{85(55)(55)(85)} = \frac{193{,}287{,}500}{21{,}855{,}625}$$

χ²c = 8.844
The tabular value χ²t with df = 1 and α = 0.05 is 3.841. Since χ²c > χ²t, or
8.844 > 3.841, reject Ho, which states that there is no association between age and
level of job satisfaction.
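As a quick check of the arithmetic, the shortcut formula for a 2x2 table can be written as a small Python function. The name `chi_square_2x2` is illustrative; the cell labels a, b, c, d follow the A, B, C, D layout of the contingency table above.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2x2 table (df = 1), without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator


# Young: 25 high, 30 low; Old: 60 high, 25 low.
chi2 = chi_square_2x2(25, 30, 60, 25)

CHI2_TABLE = 3.841  # tabular value for df = 1, alpha = 0.05, as quoted in the text
decision = "reject Ho" if chi2 > CHI2_TABLE else "accept Ho"
print(f"chi-square = {chi2:.3f}, decision: {decision}")
```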
Lesson 3 – The Chi-Square Test for Tables Having More Than
Four Cells
In this case, the basic formula of χ² below is used:

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$ where E = (Row Total)(Column Total)/N

Suppose we randomly selected members of four ethnic groups and
determined their party preference. The null hypothesis, Ho, states that there is no
relationship between ethnic group and political party preference.
The table below shows the summary of the findings:
                      POLITICAL PARTY PREFERENCE
ETHNIC GROUP   PARTY A   PARTY B   PARTY C   PARTY D   TOTAL
Ilongo         50        45        30        20        145
Cebuano        30        45        45        50        170
Ilocano        10        10        25        35        80
Tagalog        20        30        45        10        105
TOTAL          110       130       145       115       500
The computed chi-square can easily be obtained by using a table similar to the one
below:
O      E = RxC/N    (O-E)²     (O-E)²/E
50     31.90        327.61     10.270
30     37.40        54.76      1.464
10     17.60        57.76      3.282
20     23.10        9.61       0.416
45     37.70        53.29      1.414
45     44.20        0.64       0.014
10     20.80        116.64     5.608
30     27.30        7.29       0.267
30     42.05        145.20     3.453
45     49.30        18.49      0.375
25     23.20        3.24       0.140
45     30.45        211.70     6.952
20     33.35        178.22     5.344
50     39.10        118.81     3.039
35     18.40        275.56     14.976
10     24.15        200.22     8.291
∑(O-E)²/E = 65.305, the computed chi-square
To determine whether to accept or reject Ho, we compare the computed
chi-square, χ²c, with the tabulated chi-square value, χ²t, with df =
(rows − 1)(columns − 1) and α = 0.05.
From table A.3, χ²t = 16.919 with df = 9 and α = 0.05. Since χ²c > χ²t, we
reject the null hypothesis, Ho, in favor of the alternative hypothesis, H1, that there
is a relationship between ethnic group and political party preference.
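The sixteen-cell working table can be generated programmatically. The Python sketch below computes each expected frequency as (row total)(column total)/N and accumulates (O − E)²/E; the function name `chi_square_independence` and the constant `CHI2_TABLE` (the tabular value 16.919 quoted above) are illustrative.

```python
# Rows: Ilongo, Cebuano, Ilocano, Tagalog. Columns: Party A, B, C, D.
table = [
    [50, 45, 30, 20],
    [30, 45, 45, 50],
    [10, 10, 25, 35],
    [20, 30, 45, 10],
]


def chi_square_independence(observed):
    """Return (chi-square, df) for an r x c contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence(table)

CHI2_TABLE = 16.919  # tabular value for df = 9, alpha = 0.05, as quoted in the text
decision = "reject Ho" if chi2 > CHI2_TABLE else "accept Ho"
print(f"chi-square = {chi2:.3f}, df = {df}, decision: {decision}")
```

The unrounded total may differ from the hand-computed 65.305 in the third decimal place, since the working table rounds each cell before summing.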
Lesson 4 – Application of the Chi-Square Test for the Effect of
Two or more Independent Variables
There are instances when an independent variable has no relationship with
the dependent variable. However, when the same independent variable is
combined with another variable, the situation reverses. For example: are age and
level of job satisfaction related under low or high length of service?
This can clearly be seen in a tabular form as shown below:
LENGTH OF                  LEVEL OF SATISFACTION
SERVICE       AGE          LOW          HIGH          TOTAL
High          High
              Low
Low           High
              Low
Total
In this case, there are two 2x2 contingency tables. Combining the df's of the
two tables, the final df is 3 when referring to the tabular values.
Exercise 10
Chi-Square, χ²
Name:________________________________________Score:___________
Code No.____________
1. Two groups, A and B, consist of 100 people each who have a disease. A
serum is given to group A but not to group B (which is called the control);
otherwise, the two groups are treated identically. It is found that in groups A
and B, 75 and 65 people, respectively, recover from the disease. Under the
hypothesis that the serum has no effect, we would expect 70 people in each of the
groups to recover and 30 in each group not to recover, as shown in the tables below:
FREQUENCIES OBSERVED
                               Recover    Do not Recover    TOTAL
Group A (using serum)          75         25                100
Group B (not using serum)      65         35                100
TOTAL                          140        60                200
FREQUENCIES EXPECTED UNDER Ho
                               Recover    Do not Recover    TOTAL
Group A (using serum)          70         30                100
Group B (not using serum)      70         30                100
TOTAL                          140        60                200
Using the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
2. Students, teachers, and school employees are asked to respond to a proposed
plan of action, with the responses in 3 categories. Determine whether
there are significant differences in their responses.
                       Plan of Action
                       A       B       C
Students               20      20      60
Teachers               40      40      20
School Employees       10      70      20
Module 11
TEST OF SIGNIFICANCE WITH “t” DISTRIBUTION
To test the equality of means between two samples, the t-test is used.
There are two applications: one for comparing the means of paired samples and
the other for comparing the means of two non-paired samples. The latter has two
methods: a) samples with the same number of observations, and b) samples with
different numbers of observations.
Lesson 1 – Means of Paired Samples
Example: you want to compare the scores of the students in the pre-test and the
post-test.
Each value of x1 is paired in some way with a value of x2. In other words, there is
a basis for pairing the observations. The formulas used are:

$$t_c = \frac{\bar{d}}{\sqrt{s^2/n}}$$

where n = number of pairs
d = the difference between paired values, i.e., x1 − x2
d̄ = mean difference
s² = estimated variance of the differences

$$s^2 = \frac{\sum d^2 - \left(\sum d\right)^2/n}{n-1}$$
The null hypothesis, Ho: µ1 − µ2 = 0 (there is no difference between the
means of population 1 and population 2), is accepted if |tc| < tabulated tα(n−1);
otherwise, Ho is rejected in favor of the alternative hypothesis, H1 (there is a
significant difference), if |tc| > tabulated tα(n−1).
Example 1
Suppose we are going to compare the mean intelligence scores of husbands (x1)
and wives (x2), with n = 15 pairs.
         x1      x2      d      d²      x1²      x2²
         70      85      -15    225     4900     7225
         65      70      -5     25      4225     4900
         83      83      0      0       6889     6889
         82      88      -6     36      6724     7744
         86      90      -4     16      7396     8100
         75      84      -9     81      5625     7056
         74      75      -1     1       5476     5625
         74      88      -14    196     5476     7744
         65      79      -14    196     4225     6241
         68      68      0      0       4624     4624
         84      88      -4     16      7056     7744
         90      88      2      4       8100     7744
         81      84      -3     9       6561     7056
         70      85      -15    225     4900     7225
         66      70      -4     16      4356     4900
TOTAL    1133    1225    -92    1046    86533    100817
The sample means are

$$\bar{X}_1 = \frac{1133}{15} = 75.533 \qquad \bar{X}_2 = \frac{1225}{15} = 81.667 \qquad \bar{d} = \frac{-92}{15} = -6.133$$

The estimated variance of the differences is

$$s^2 = \frac{\sum d^2 - (\sum d)^2/n}{n-1} = \frac{1046 - (-92)^2/15}{14} = \frac{1046 - 564.267}{14} = 34.41$$

and the computed t value is

$$t_c = \frac{\bar{d}}{\sqrt{s^2/n}} = \frac{-6.133}{\sqrt{34.41/15}} = \frac{-6.133}{\sqrt{2.294}} = -4.05$$

From table A.2, the tabulated value of t with df = n − 1 = 14 (15 pairs minus 1) at
α = 0.05 is 2.145. Since |tc| > tt, reject Ho. There is indeed a significant
difference between the intelligence test scores of husbands and wives.
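The paired-samples computation can be sketched in Python as follows. It forms the differences d = x1 − x2 and applies the formulas for s² and tc given in this lesson; the function name `paired_t` and the constant `T_TABLE` (the tabular value 2.145 quoted above) are illustrative.

```python
import math

# Intelligence scores of husbands (x1) and wives (x2) from Example 1.
husbands = [70, 65, 83, 82, 86, 75, 74, 74, 65, 68, 84, 90, 81, 70, 66]
wives = [85, 70, 83, 88, 90, 84, 75, 88, 79, 68, 88, 88, 84, 85, 70]


def paired_t(x1, x2):
    """Return (t, df) for the paired-samples t-test."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    mean_d = sum(d) / n
    # Estimated variance of the differences: [sum(d^2) - (sum d)^2 / n] / (n - 1)
    var_d = (sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1


t, df = paired_t(husbands, wives)

T_TABLE = 2.145  # tabular t for df = 14, alpha = 0.05, as quoted in the text
decision = "reject Ho" if abs(t) > T_TABLE else "accept Ho"
print(f"t = {t:.2f}, df = {df}, decision: {decision}")
```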
Lesson 2 – Means of Two Non-Paired Samples
a) Samples with the same number of observations, n1 = n2 = n, with no basis for
pairing

$$t_c = \frac{\bar{X}_1 - \bar{X}_2}{s}\sqrt{\frac{n}{2}}$$

where tc = computed t value
X̄1 = sample mean of the first group
X̄2 = sample mean of the second group
s² = pooled variance; its square root, s, is the pooled standard deviation

$$s^2 = \frac{\left[\sum x_1^2 - (\sum x_1)^2/n\right] + \left[\sum x_2^2 - (\sum x_2)^2/n\right]}{2(n-1)}$$
Using the data of Example 1, we have

$$s^2 = \frac{[86{,}533 - (1133)^2/15] + [100{,}817 - (1225)^2/15]}{2(15-1)} = \frac{[86{,}533 - 1{,}283{,}689/15] + [100{,}817 - 1{,}500{,}625/15]}{28}$$

$$s^2 = \frac{953.733 + 775.333}{28} = 61.752$$

$$s = \sqrt{s^2} = \sqrt{61.752} = 7.858$$

$$t_c = \frac{\bar{X}_1 - \bar{X}_2}{s}\sqrt{\frac{n}{2}} = \frac{75.533 - 81.667}{7.858}\sqrt{\frac{15}{2}} = \frac{-6.134(2.739)}{7.858} = -2.138$$
From table A.2, the value tt with df = 2(n−1) = 28 and α = 0.05 is 2.048. Since
|tc| > tt, reject Ho in favor of the alternative hypothesis, H1; that is, there is
a significant difference between the scores of husbands and wives.
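For comparison with the paired analysis, the equal-n formula can be sketched in Python on the same husband and wife scores, this time ignoring the pairing. The function name `pooled_t_equal_n` is illustrative.

```python
import math

husbands = [70, 65, 83, 82, 86, 75, 74, 74, 65, 68, 84, 90, 81, 70, 66]
wives = [85, 70, 83, 88, 90, 84, 75, 88, 79, 68, 88, 88, 84, 85, 70]


def pooled_t_equal_n(x1, x2):
    """Return (t, df) for two non-paired samples of equal size n."""
    n = len(x1)
    if len(x2) != n:
        raise ValueError("this formula requires n1 = n2")
    # Corrected sums of squares for each sample.
    ss1 = sum(v * v for v in x1) - sum(x1) ** 2 / n
    ss2 = sum(v * v for v in x2) - sum(x2) ** 2 / n
    s = math.sqrt((ss1 + ss2) / (2 * (n - 1)))   # pooled standard deviation
    t = (sum(x1) / n - sum(x2) / n) / s * math.sqrt(n / 2)
    return t, 2 * (n - 1)


t, df = pooled_t_equal_n(husbands, wives)
print(f"t = {t:.3f}, df = {df}")
```

The magnitude of t is smaller here than in the paired test because the pooled variance includes the variation between couples, which pairing removes.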
b) Samples with different numbers of observations

$$t_c = \frac{\bar{X}_1 - \bar{X}_2}{s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

$$s^2 = \frac{\left[\sum X_1^2 - (\sum X_1)^2/n_1\right] + \left[\sum X_2^2 - (\sum X_2)^2/n_2\right]}{(n_1 - 1) + (n_2 - 1)}$$
Example 2
Suppose we are going to compare the yield per plot of 2 varieties of rice:
Variety A (X1) = 38, 40, 40, 42, 39, 35, 32, 28, 42, 44
Variety B (X2) = 37, 37, 40, 40, 32, 30, 31
               Variety A (X1)    Variety B (X2)
n              10                7
∑X             380               247
X̄              38.00             35.29
∑X²            14,662            8,823
(∑X)²/n        14,440            8,716
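Substituting into the formulas above:

$$s^2 = \frac{(14{,}662 - 14{,}440) + (8{,}823 - 8{,}716)}{9 + 6} = \frac{222 + 107}{15} = 21.93$$

$$s = \sqrt{s^2} = \sqrt{21.93} = 4.68$$

Substituting the values into t:

$$t_c = \frac{\bar{X}_1 - \bar{X}_2}{s}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} = \frac{38.00 - 35.29}{4.68}\sqrt{\frac{70}{17}} = \frac{2.71\sqrt{4.118}}{4.68} = 1.18$$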
The tabular value of t with df = (n1−1) + (n2−1) = 15 and α = 0.05 is 2.131.
Since |tc| < tt (1.18 < 2.131), Ho is accepted, which means that the yields of
varieties A and B do not differ significantly.
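The unequal-n formulas can be checked with the Python sketch below, which works directly from the raw yields. The function name `pooled_t` and the constant `T_TABLE` (the tabular value 2.131 quoted above) are illustrative.

```python
import math

variety_a = [38, 40, 40, 42, 39, 35, 32, 28, 42, 44]
variety_b = [37, 37, 40, 40, 32, 30, 31]


def pooled_t(x1, x2):
    """Return (t, df) for two non-paired samples with possibly different sizes."""
    n1, n2 = len(x1), len(x2)
    # Corrected sums of squares for each sample.
    ss1 = sum(v * v for v in x1) - sum(x1) ** 2 / n1
    ss2 = sum(v * v for v in x2) - sum(x2) ** 2 / n2
    df = (n1 - 1) + (n2 - 1)
    s = math.sqrt((ss1 + ss2) / df)              # pooled standard deviation
    t = (sum(x1) / n1 - sum(x2) / n2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, df


t, df = pooled_t(variety_a, variety_b)

T_TABLE = 2.131  # tabular t for df = 15, alpha = 0.05, as quoted in the text
decision = "reject Ho" if abs(t) > T_TABLE else "accept Ho"
print(f"t = {t:.2f}, df = {df}, decision: {decision}")
```

When n1 = n2 = n, the factor n1n2/(n1 + n2) reduces to n/2 and the divisor of s² becomes 2(n − 1), so the same function also covers case a).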
Exercise 11
Test of Significance with “t” Distribution
Name:________________________________________Score:___________
Code No.____________
1. Test the difference between the paired heights of fathers and sons.
Height of fathers (inches) 65 63 67 64 68 62 70 66 68 67 69 71
Height of sons (inches) 68 66 65 69 66 66 68 65 71 67 68 70
2. The following sets of scores were made by 16 individuals in a laboratory
experiment on perception. Is there a significant difference between the means of
the two distributions?
TEST I TEST II
18 16
12 14
8 8
6 8
3 8
12 10
16 3
7 14
8 12
12 8
15 14
7 4
5 6
3 6
11 7
Module 12
ANALYSIS OF VARIANCE (ANOVA)
The analysis of variance utilizes the F-test. Despite its name, ANOVA is used to
test hypotheses about population means rather than population variances.
Lesson 1 – Single-Factor Analysis of Variance
This type of ANOVA tests whether two or more groups are affected differently by
various treatments. In an actual experiment we can see that the means of the
groups vary. The variation of the group means from the grand mean is called the
between-groups variance. The variation of the scores within each group is
called the within-groups variance, and the variation of all individual scores about
the grand mean is called the total variance.
Symbolic representation of data for analysis of variance:
                          CATEGORIES
              A1         A2         ...    Ak         TOTAL
Scores        x11        x12        ...    x1k
              x21        x22        ...    x2k
              x31        x32        ...    x3k
              ...        ...        ...    ...
              xr1        xr2        ...    xrk
Sums          ∑i xi1     ∑i xi2     ...    ∑i xik     ∑i ∑j xij
Means         X̄.1        X̄.2        ...    X̄.k        X̄..
# of Cases    r          r          ...    r          N
$$\text{Total sum of squares (TSS)} = \sum_i \sum_j x_{ij}^2 - CF$$

where CF = correction factor

$$CF = \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{N}$$

$$\text{Between sum of squares (BSS)} = \sum_j \frac{\left(\sum_i x_{ij}\right)^2}{r} - CF$$

Within sum of squares (WSS) = TSS – BSS
Degrees of freedom (df)
Total df (Tdf) = Total number of cases – 1 = N – 1
Between df (Bdf) = Number of factors – 1 = k – 1
Within df (Wdf) = Total df – Between df = N – k
Mean Square (MS)
Since there is no total mean square, only the following are computed:
Between Mean Square (BMS) = BSS / Bdf
Within Mean Square (WMS) = WSS / Wdf
F Test:
F computed (Fc) = BMS / WMS, which has to be compared with the tabulated
value of F (Ft) based on Bdf and Wdf at α = 0.05.
Accept Ho if Fc < Ft
Reject Ho if Fc ≥ Ft
Example 1
Suppose we want to find out if there is a significant difference between the
lengths of life (in years) of 3 brands of fluorescent tubes. Samples of eight
tubes of each brand were selected at random, and their corresponding lengths of
life were as follows:
              FLUORESCENT BRAND
         A        B        C        TOTAL
         2.5      2.0      2.0
         2.6      2.4      2.2
         3.5      2.6      2.0
         2.4      3.0      2.1
         3.7      1.8      1.2
         3.7      1.5      1.5
         2.8      2.1      1.5
         2.5      2.2      2.0
TOTAL    23.7     17.6     14.5     55.8
MEAN     2.96     2.20     1.81

$$CF = \frac{(55.8)^2}{24} = \frac{3113.64}{24} = 129.735$$

$$TSS = 2.5^2 + 2.0^2 + 2.0^2 + 2.6^2 + \dots + 2.0^2 - CF = 139.940 - 129.735 = 10.205$$

$$BSS = \sum_j \frac{\left(\sum_i X_{ij}\right)^2}{r} - CF = \frac{23.7^2}{8} + \frac{17.6^2}{8} + \frac{14.5^2}{8} - CF = \frac{1081.7}{8} - 129.735 = 5.478$$

$$WSS = TSS - BSS = 10.205 - 5.478 = 4.727$$
We summarize the computation in the ANOVA table below:
ANOVA
Source of      Degrees of      Sum of          Mean            Fc          Ft
Variation      Freedom (df)    Squares (SS)    Square (MS)                 (α = 0.05)
TOTAL          23              10.205
BETWEEN        2               5.478           2.739           12.173*     3.47
WITHIN         21              4.727           0.225
* significant at α = 0.05
Since Fc > Ft, we can reject Ho, which means that there is a significant
difference between the lengths of life of the three brands of fluorescent tubes.
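The ANOVA table can be reproduced with a short Python sketch that follows the correction-factor method used in this lesson. The function name `one_way_anova` and the constant `F_TABLE` (the tabular value 3.47 quoted above) are illustrative.

```python
# Lengths of life (in years) of eight tubes from each brand.
brands = {
    "A": [2.5, 2.6, 3.5, 2.4, 3.7, 3.7, 2.8, 2.5],
    "B": [2.0, 2.4, 2.6, 3.0, 1.8, 1.5, 2.1, 2.2],
    "C": [2.0, 2.2, 2.0, 2.1, 1.2, 1.5, 1.5, 2.0],
}


def one_way_anova(groups):
    """Return the sums of squares, mean squares, df, and F for a list of groups."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    cf = sum(scores) ** 2 / n_total                      # correction factor
    tss = sum(x * x for x in scores) - cf                # total sum of squares
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf # between sum of squares
    wss = tss - bss                                      # within sum of squares
    bdf, wdf = k - 1, n_total - k
    bms, wms = bss / bdf, wss / wdf
    return {"TSS": tss, "BSS": bss, "WSS": wss, "Bdf": bdf, "Wdf": wdf,
            "BMS": bms, "WMS": wms, "F": bms / wms}


result = one_way_anova(list(brands.values()))

F_TABLE = 3.47  # tabular F for (2, 21) df, alpha = 0.05, as quoted in the text
decision = "reject Ho" if result["F"] >= F_TABLE else "accept Ho"
for name, value in result.items():
    print(f"{name} = {value:.3f}")
print("decision:", decision)
```

The F value printed by the sketch differs from the tabulated 12.173 in the second decimal place because the hand computation divides the rounded mean squares 2.739 and 0.225.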
Lesson 2 – Tests After the F-Test
If, after using the ANOVA, Ho is rejected, it is imperative
for us to test where the difference or differences lie. There are several tests to
determine this, and one of them is Scheffe’s Test (1957). This is done by
arranging the individual means in order and comparing them with each other.
Comparing, we have:
X̄C = 1.81      X̄C vs X̄B
X̄B = 2.20      X̄C vs X̄A
X̄A = 2.96      X̄B vs X̄A
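Then compute the F ratio for each pair of groups:

$$F = \frac{\left(\bar{X}_1 - \bar{X}_2\right)^2}{WMS\,(N_1 + N_2)/(N_1 N_2)}$$

where WMS is the within mean square from the ANOVA table and N1 and N2 are the
numbers of cases in the two groups being compared.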
As pointed out in the ANOVA table, the Ft for 2 and 21 degrees of freedom at α
= 0.05 is 3.47. This value is multiplied by (k−1), where k is the number of factors
or treatments; therefore:
FS = Ft (k−1)
= 3.47 (2)
= 6.94
Each of the three computed F values is then compared with the value of
6.94. If a computed F value is larger than 6.94, then the two means being
compared differ significantly from each other:
Since F < FS, or 2.714 < 6.94, the lengths of life of brands B and
C are not significantly different from each other.
Since F > FS, or 23.625 > 6.94, the lengths of life of brands
C and A are significantly different from each other.
Finally, since F > FS, or 10.314 > 6.94, the lengths of life of
brands B and A are significantly different from each other.
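The pairwise comparisons can be sketched in Python as below. The sketch assumes the within mean square (WMS) in the denominator of each F ratio, which is what reproduces F values close to those quoted above; the function names `within_mean_square` and `scheffe_f` are illustrative, and `F_TABLE` is the tabular value 3.47 taken from the ANOVA table.

```python
from itertools import combinations

brands = {
    "A": [2.5, 2.6, 3.5, 2.4, 3.7, 3.7, 2.8, 2.5],
    "B": [2.0, 2.4, 2.6, 3.0, 1.8, 1.5, 2.1, 2.2],
    "C": [2.0, 2.2, 2.0, 2.1, 1.2, 1.5, 1.5, 2.0],
}


def within_mean_square(groups):
    """Within sum of squares divided by its degrees of freedom (N - k)."""
    n_total = sum(len(g) for g in groups)
    wss = sum(sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups)
    return wss / (n_total - len(groups))


def scheffe_f(g1, g2, wms):
    """F ratio for comparing the means of two groups (assumes WMS denominator)."""
    n1, n2 = len(g1), len(g2)
    diff = sum(g1) / n1 - sum(g2) / n2
    return diff ** 2 / (wms * (n1 + n2) / (n1 * n2))


F_TABLE = 3.47                         # tabular F for (2, 21) df, alpha = 0.05
k = len(brands)
f_scheffe = F_TABLE * (k - 1)          # critical value FS = Ft(k - 1)

wms = within_mean_square(list(brands.values()))
results = {
    (a, b): scheffe_f(brands[a], brands[b], wms)
    for a, b in combinations(brands, 2)
}
for (a, b), f in results.items():
    verdict = "significant" if f > f_scheffe else "not significant"
    print(f"{a} vs {b}: F = {f:.3f} ({verdict} against FS = {f_scheffe:.2f})")
```

Because the sketch uses unrounded means and an unrounded WMS, its F values differ slightly from 2.714, 23.625, and 10.314, but the three decisions are the same.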
Exercise 12
Analysis of Variance (ANOVA)
Name:________________________________________Score:___________
Code No.____________
1. An investigation was conducted to determine the source of reduction in
yield of a certain chemical product. It was known that the loss in yield occurred
in the mother liquor, that is, the material removed at the filtration stage. It was
felt that different blends of the original material may result in different
percentage reductions at the mother liquor stage. The following are the results of
the percentage reduction for three batches at each of four pre-selected blends:
BLEND    1        2        3        4
         25.6     25.2     20.8     31.6
         24.3     28.6     26.7     29.8
         27.9     24.7     22.2     34.3
a. Is there a significant difference in the average percentage reduction
in yield for the different blends?
2. The sales data of Redent Cars, Inc. are shown below.
SALES PERSON    AREA 1    AREA 2    AREA 3    AREA 4
1               3         9         7         12
2               5         8         5         8
3               9         8         4         7
4               10        7         9         7
5               7         10        9         8
6               3         9         6         10
7               5         9         8         5
8               6         4         8         7
Test if there are significant differences in the sales between areas. If there are,
further test where the significant differences lie.