The Statistical Tools: Mathematics in The Modern World
Chapter 6
Learning Objectives
Lesson Proper
Hypothesis
A hypothesis is a conjecture or statement that aims to explain certain
phenomena in the real world. Many hypotheses, statistical or not, are products of human
curiosity. To answer a question, a researcher gathers and presents evidence, then tests
the resulting hypothesis using statistical tools and analysis. In statistical analysis,
a hypothesis is either accepted or rejected depending on whether the computed test
statistic falls within a certain critical region.
The hypothesis that is subjected to testing to determine whether it can be
accepted or rejected is the null hypothesis, denoted by Ho. This hypothesis states that
there is no significant relationship or no significant difference between two or more
variables, or that one variable does not affect another variable. In statistical research,
the hypotheses should be written in null form. For example, suppose you want to know
whether method A is more effective than method B in teaching high school mathematics.
The null hypothesis for this study will be: “There is no significant difference between
the effectiveness of method A and method B.”
Another type of hypothesis is the alternative hypothesis, denoted by Ha. This is
the hypothesis that challenges the null hypothesis. The alternative hypothesis for the
example above can be: “There is a significant difference between the effectiveness of
method A and method B,” or “Method A is more effective than method B,” or “Method
A is less effective than method B,” depending on whether the test is two-tailed or
one-tailed. These will be discussed in the succeeding lessons.
Significance Level
To test the null hypothesis of no significant difference between the two
methods in the above example, one must first set the level of significance. This is the
probability of committing a Type I error and is denoted by the symbol 𝛼. A Type I error
is committed when the null hypothesis is rejected (and the alternative hypothesis
accepted) when, in fact, the null hypothesis is true. A Type II error is committed when
the null hypothesis is accepted when, in fact, it is false; the probability of committing
it is denoted by the symbol 𝛽. The most common level of significance is 5%.
Table 1. Four Possible Outcomes in Decision-Making
                        Decisions about the Ho
Reality         Reject Ho           Do not Reject Ho (or Accept Ho)
Ho is true.     Type I error        Correct Decision
Ho is false.    Correct Decision    Type II error
If the null hypothesis is true and accepted, or if it is false and rejected, the
decision is correct. If the null hypothesis is true and rejected, the decision is incorrect
and this is a Type I error. If the null hypothesis is false and accepted, the decision is
incorrect and this is a Type II error. For instance, Sarah insists that she is 31 years old
when, in fact, she is 35 years old. What error is Sarah committing? Sarah is rejecting
the truth. She is committing a Type I error. As another example, a man plans to go hunting
the Philippine monkey-eating eagle, believing that it is proof of his mettle. What type
of error is this? Hunting the Philippine eagle is prohibited by law; thus, it is not a good
sport. The man is accepting a false belief, so this is a Type II error. Since hunting the
Philippine monkey-eating eagle is against the law, the man may find himself in jail if he
goes out of his way to hunt an endangered species.
In the decisions that we make, we form conclusions, and these conclusions are the
bases of our actions. In Statistics, however, our conclusions are not always correct
because we make decisions based on sample information. The best that we can do is to
control the probability with which each error occurs. This is the reason why we assign
small probability values to 𝛼 and 𝛽.
In Figure 1.A (two-tailed), the rejection region consists of the areas at the extreme
left and right of the curve, marked by the two vertical lines. In Figures 1.B and 1.C
(both one-tailed), the rejection region is the area to the left (left tail) and to the
right (right tail) of the vertical line under the bell curve, respectively.
$$ z = \frac{\bar{x} - \mu}{\sigma} \cdot \sqrt{n} $$
where n is the sample size.
Decision Rule:
Reject Ho if |𝑧| ≥ |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
Example 1
A company that makes a battery-operated toy car claims that its products
have a mean life span of 5 years with a standard deviation of 2 years. Test the null
hypothesis that 𝜇 = 5 years against the alternative hypothesis that 𝜇 ≠ 5 years if a
random sample of 40 toy cars was tested and found to have a mean life span of only 3
years. Use a 5% level of significance.
Solution:
1. Ho : The mean lifespan of battery-operated toy cars is 5 years. (𝜇 = 5)
Ha : The mean lifespan of battery-operated toy cars is not 5 years. (𝜇 ≠ 5)
2. 𝛼 = 0.05, two-tailed
3. Use z-test as test statistic.
4. Computation:
Given 𝑥̅ = 3, 𝜇 = 5, 𝑛 = 40, 𝜎 = 2
$$ z = \frac{\bar{x} - \mu}{\sigma} \cdot \sqrt{n} = \frac{3 - 5}{2} \cdot \sqrt{40} = -6.32 $$
5. Critical Values: 𝑧 = ±1.96 (the rejection region is 𝑧 < −1.96 or 𝑧 > 1.96)
6. Decision Rule: Reject Ho if |𝑧| ≥ |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
7. Since the computed |𝑧|, which is 6.32, is greater than |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |, which is 1.96,
reject Ho. Hence, there is a significant difference between the
population and sample mean lifespan of battery-operated toy cars.
Example 2
A manufacturer of bicycle tires has developed a new design which he claims
has an average lifespan of 5 years with a standard deviation of 1.2 years. A dealer of
the product claims that the average lifespan of 150 samples of the tires is only 3.5 years.
Test the difference of the population and sample means at 5% level of significance.
Solution:
1. Ho : There is no significant difference between the population and sample
mean of bicycle tires’ lifespan. (𝑥̅ = 𝜇)
Ha : There is a significant difference between the population and sample mean
of bicycle tires’ lifespan. (𝑥̅ < 𝜇)
2. 𝛼 = 0.05, one-tailed, left tail
3. Use z-test as test statistic.
4. Computation:
Given 𝑥̅ = 3.5, 𝜇 = 5, 𝑛 = 150, 𝜎 = 1.2
$$ z = \frac{\bar{x} - \mu}{\sigma} \cdot \sqrt{n} = \frac{3.5 - 5}{1.2} \cdot \sqrt{150} = -15.31 $$
5. Critical Value: 𝑧 < −1.645
6. Decision Rule: Reject Ho if |𝑧| ≥ |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
7. Since the computed|𝑧|, which is 15.31, is greater than |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |, which is 1.645,
therefore, reject Ho. Hence, there is a significant difference between the
population and sample mean of bicycle tires’ lifespan.
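The two z-test examples above can be checked with a short script. This is a minimal sketch, not part of the original module: the function names are illustrative, and the tabular z is obtained from Python's standard-library normal distribution instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Return z = (x_bar - mu) / sigma * sqrt(n)."""
    return (sample_mean - pop_mean) / pop_sd * sqrt(n)


def z_tabular(alpha, tails):
    """Return the positive tabular z for a one-tailed (1) or two-tailed (2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


def reject_h0(z, alpha, tails):
    """Decision rule from the text: reject Ho if |z| >= |z tabular|."""
    return abs(z) >= z_tabular(alpha, tails)


# Example 1 (toy cars): two-tailed test at the 5% level
z_toy = z_statistic(3, 5, 2, 40)
# Example 2 (bicycle tires): one-tailed test at the 5% level
z_tire = z_statistic(3.5, 5, 1.2, 150)

print(round(z_toy, 2), reject_h0(z_toy, 0.05, tails=2))
print(round(z_tire, 2), reject_h0(z_tire, 0.05, tails=1))
```

Both computed values fall far inside the rejection region, in agreement with the worked solutions.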
6.1.1.2 t-test on the Comparison between the Population Mean and the
Sample Mean
The t-test can be used to compare the means when the population mean
(𝜇) is known but the population standard deviation (𝜎) is unknown.
In that case, the sample standard deviation (s) is computed and used in
place of 𝜎, and the t-test is used instead of the z-test. The formula is
given below:
$$ t = \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} $$
The quantity s/√𝑛 is called the standard error of the mean. It estimates
the standard deviation of the sampling distribution of the mean for
random samples of size n.
Decision Rule:
Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
Example 1
The average length of time for people to vote using the old procedure during a
presidential election period in precinct A is 55 minutes. Using computerization as a new
election method, a random sample of 20 registrants was used and found to have a mean
length of voting time of 30 minutes with a standard deviation of 1.5 minutes. Test the
significance of the difference between the population mean and the sample mean.
Solution:
1. Ho : There is no significant difference between the population and sample
mean of length of time for people to vote using the old and new procedure.
(𝑥̅ = 𝜇)
Ha : There is a significant difference between the population and sample mean
length of time for people to vote using the old and new procedure.
(𝑥̅ < 𝜇)
2. 𝛼 = 0.05, one-tailed, left tail
3. Use t-test as test statistic.
4. Computation:
Given 𝑥̅ = 30, 𝜇 = 55, 𝑛 = 20, 𝑠 = 1.5
$$ t = \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} = \frac{30 - 55}{1.5} \cdot \sqrt{20} = -74.54 $$
5. df = n – 1 = 20 – 1 = 19
6. Tabular Value: t = 1.729 (from Appendix)
7. Decision Rule: Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
8. Since the computed |𝑡|, which is 74.54, is greater than |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |, which is 1.729,
reject Ho. Hence, there is a significant difference between the
population and sample mean length of time for people to vote using the old and
new procedures. This implies that voting with the computerized method takes a
shorter time than voting with the old procedure.
Example 2
An experimental study was conducted by a researcher to determine if a new time
slot has an effect on the performance of pupils in Mathematics. Fifteen randomly
selected learners participated in the study. Toward the end of the investigation, a
standardized assessment was conducted. The sample mean was 85 and the sample standard
deviation was 3. In the standardization of the test, the mean was 75 and the standard
deviation was 10. Based on the evidence at hand, is the new time slot effective? Use
a 5% level of significance.
Solution:
1. Ho: There is no significant difference between the population and sample mean
of performance in Mathematics in a new time slot. (𝑥̅ = 𝜇)
Ha: There is a significant difference between the population and sample mean
of performance in Mathematics in a new time slot. (𝑥̅ > 𝜇)
2. 𝛼 = 0.05, one-tailed, right tail
3. Use t-test as test statistic.
4. Computation:
Given 𝑥̅ = 85, 𝜇 = 75, 𝑛 = 15, 𝑠 = 3
$$ t = \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} = \frac{85 - 75}{3} \cdot \sqrt{15} = 12.91 $$
5. df = n – 1 = 15 – 1 = 14
6. Tabular Value: t = 1.761 (from Appendix)
7. Decision Rule: Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
8. Since the computed |𝑡|, which is 12.91, is greater than |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |, which is 1.761,
reject Ho. Hence, there is a significant difference between the
population and sample mean of performance in Mathematics in a new time slot.
This implies that changing the time slot has an effect on the pupils’ performance
in Mathematics.
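The one-sample t computations in the two examples above can be reproduced as follows. This is a small sketch with illustrative function names. The Python standard library has no t distribution, so the tabular values (1.729 and 1.761) are taken from the text and passed in as plain numbers.

```python
from math import sqrt


def t_statistic(sample_mean, pop_mean, sample_sd, n):
    """Return (t, df) where t = (x_bar - mu) / (s / sqrt(n)) and df = n - 1."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error, n - 1


def reject_h0(t, t_tabular):
    """Decision rule from the text: reject Ho if |t| >= |t tabular|."""
    return abs(t) >= abs(t_tabular)


# Example 1 (voting time): tabular t = 1.729 at df = 19
t_vote, df_vote = t_statistic(30, 55, 1.5, 20)
# Example 2 (new time slot): tabular t = 1.761 at df = 14
t_slot, df_slot = t_statistic(85, 75, 3, 15)

print(round(t_vote, 2), df_vote, reject_h0(t_vote, 1.729))
print(round(t_slot, 2), df_slot, reject_h0(t_slot, 1.761))
```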
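$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{n_1 + n_2}{n_1 n_2}\right]}} $$
where 𝑥̅1, 𝑥̅2 = sample means
𝑛1, 𝑛2 = sample sizes
𝑠1², 𝑠2² = sample variances (𝑠1, 𝑠2 = sample standard deviations)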
Decision Rule:
Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
Example 1
A course in Physics was taught to 10 students using the traditional method.
Another group of students went through the same course using another method. At the
end of the semester, the same test was administered to each group. The 10 students
under method A got an average of 82 with a standard deviation of 5, while the 11
students under method B got an average of 78 with a standard deviation of 6. Test the
null hypothesis of no significant difference in the performance of the two groups of
students at 5% level of significance.
Solution:
1. Ho: There is no significant difference between the average scores of the two
groups of students. (𝑥̅1 = 𝑥̅2)
Ha: There is a significant difference between the average scores of the two
groups of students. (𝑥̅1 > 𝑥̅2)
2. 𝛼 = 0.05, one-tailed, right tail
3. Use the t-test as test statistic.
4. Computation:
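Given: 𝑥̅1 = 82, 𝑥̅2 = 78, 𝑛1 = 10, 𝑛2 = 11, 𝑠1 = 5, 𝑠2 = 6
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{n_1 + n_2}{n_1 n_2}\right]}} $$
$$ = \frac{82 - 78}{\sqrt{\left[\frac{(10 - 1)(5)^2 + (11 - 1)(6)^2}{10 + 11 - 2}\right]\left[\frac{10 + 11}{(10)(11)}\right]}} $$
$$ = \frac{4}{\sqrt{\left[\frac{(9)(25) + (10)(36)}{19}\right]\left[\frac{21}{110}\right]}} $$
$$ = \frac{4}{2.4245} = 1.65 $$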
5. df = 𝑛1 + 𝑛2 − 2 = 10 + 11 – 2 = 19
6. Tabular Value: t = 1.729 (from Appendix)
7. Decision Rule: Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
8. Since the computed |𝑡|, which is 1.65, is less than |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |, which is 1.729,
accept Ho. Hence, there is no significant difference between the
average scores of the two groups of students. This implies that method A and
method B do not differ significantly in their effect on the students’
performance in Physics.
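The pooled-variance computation in this example can be sketched in a few lines. The function name is illustrative, and the tabular value 1.729 is taken from the text, not computed.

```python
from math import sqrt


def pooled_t(mean1, mean2, sd1, sd2, n1, n2):
    """Return (t, df) for two independent samples using the pooled variance."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = sqrt(pooled_variance * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / standard_error, df


# Method A: mean 82, sd 5, n = 10; Method B: mean 78, sd 6, n = 11
t, df = pooled_t(82, 78, 5, 6, 10, 11)
t_tabular = 1.729  # from the text, df = 19, one-tailed, alpha = 0.05

print(round(t, 2), df, abs(t) >= t_tabular)
```

Note that the standard deviations are squared inside the function, so the inputs are 𝑠1 and 𝑠2, not the variances.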
Solution:
1. Ho: There is no significant difference between the pre-test and post-test of the
students’ performance in College Algebra. (𝜇1 = 𝜇2 )
Ha: There is a significant difference between the pre-test and post-test of the
students’ performance in College Algebra. (𝜇1 < 𝜇2 )
2. 𝛼 = 0.05, one-tailed, left tail
3. Use the t-test as test statistic.
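4. Computation:
$$ t = \frac{\sum d}{\sqrt{\frac{n \sum d^2 - (\sum d)^2}{n - 1}}} $$
$$ = \frac{-77}{\sqrt{\frac{10(1{,}185) - (-77)^2}{10 - 1}}} $$
$$ = \frac{-77}{\sqrt{\frac{5{,}921}{9}}} $$
$$ = \frac{-77}{25.65} $$
$$ = -3.002 $$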
5. df = n – 1 = 10 – 1 = 9
6. Tabular Value: t = 2.821(from Appendix)
7. Decision Rule: Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
8. Since the computed |𝑡|, which is 3.002, is greater than |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |, which is 2.821,
reject Ho. Hence, there is a significant difference between the pre-test
and post-test of the students’ performance in College Algebra. This implies that
the performance of the students in College Algebra significantly improved.
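The paired (dependent-samples) computation above needs only ∑d, ∑d² and n. The sketch below reproduces it from those sums. The helper `paired_t`, which builds the sums from two score lists, is an illustrative addition and assumes d = pre-test minus post-test.

```python
from math import sqrt


def paired_t_from_sums(sum_d, sum_d_sq, n):
    """Return (t, df) with t = sum(d) / sqrt((n*sum(d^2) - sum(d)^2) / (n - 1))."""
    denominator = sqrt((n * sum_d_sq - sum_d**2) / (n - 1))
    return sum_d / denominator, n - 1


def paired_t(pre, post):
    """Build the difference sums from raw scores (d = pre - post), then compute t."""
    d = [a - b for a, b in zip(pre, post)]
    return paired_t_from_sums(sum(d), sum(x * x for x in d), len(d))


# Sums used in the worked solution: sum(d) = -77, sum(d^2) = 1,185, n = 10
t, df = paired_t_from_sums(-77, 1185, 10)
print(round(t, 3), df)
```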
6.1.1.5 z-test on the Significance of the Difference Between Two Independent
Proportions
There are certain situations when the data to be analyzed involve
population proportions or percentages. For instance, a shoe company may
want to know the proportions of defective shoes to be delivered in other
countries. To determine if there is a significant difference between
proportions of two variables, the z-test will be used.
$$ z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}} $$
Solution:
1. Ho: There is no significant difference between the proportion of the male votes
and the proportion of female votes. (𝑝1 = 𝑝2 )
Ha: There is a significant difference between the proportion of the male votes
and the proportion of female votes. (𝑝1 ≠ 𝑝2 )
2. 𝛼 = 0.01, two-tailed
3. Use the z-test as test statistic.
4. Computation:
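Given: 𝑝1 = 120/200 = 0.60, 𝑝2 = 175/250 = 0.70, 𝑛1 = 200, 𝑛2 = 250
$$ z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}} $$
$$ = \frac{\frac{120}{200} - \frac{175}{250}}{\sqrt{\frac{\left(\frac{120}{200}\right)\left(1 - \frac{120}{200}\right)}{200} + \frac{\left(\frac{175}{250}\right)\left(1 - \frac{175}{250}\right)}{250}}} $$
$$ = \frac{-0.1}{\sqrt{\frac{\left(\frac{120}{200}\right)\left(\frac{80}{200}\right)}{200} + \frac{\left(\frac{175}{250}\right)\left(\frac{75}{250}\right)}{250}}} $$
$$ = \frac{-0.1}{\sqrt{\frac{0.24}{200} + \frac{0.21}{250}}} $$
$$ = \frac{-0.1}{0.0452} $$
$$ = -2.21 $$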
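The two-proportion computation can be verified with the sketch below. Carrying full precision in the denominator gives z ≈ −2.21, while rounding the denominator to 0.045 first gives −2.22. The comparison with the tabular z for 𝛼 = 0.01 (two-tailed) is an added illustration of the usual decision rule and is not shown in the text. Function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with q = 1 - p."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


z = two_proportion_z(120, 200, 175, 250)
z_tabular = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-tailed, alpha = 0.01

print(round(z, 2), round(z_tabular, 3), abs(z) >= z_tabular)
```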
Example 1
Determine who among the three salesmen will most likely be promoted based
on their monthly sales in pesos. Use 5% level of significance.
Sales of Three Candidates for Promotion (A, B, C)
A B C
12,000 15,500 12,800
10,000 12,500 16,000
10,900 12,000 15,000
18,000 13,000 12,700
16,000 14,000 15,000
14,400 15,000 13,000
14,400 12,300 12,000
15,500 15,000 16,000
18,800 19,000 16,000
Solution:
1. Ho: There is no significant difference between the mean sales of the three
candidates for promotion.
Ha: There is a significant difference between the mean sales of the three
candidates for promotion.
2. 𝛼 = 0.05
3. Accomplish the ANOVA Table.
A B C A² B² C²
12,000 15,500 12,800 144,000,000 240,250,000 163,840,000
10,000 12,500 16,000 100,000,000 156,250,000 256,000,000
10,900 12,000 15,000 118,810,000 144,000,000 225,000,000
18,000 13,000 12,700 324,000,000 169,000,000 161,290,000
16,000 14,000 15,000 256,000,000 196,000,000 225,000,000
14,400 15,000 13,000 207,360,000 225,000,000 169,000,000
14,400 12,300 12,000 207,360,000 151,290,000 144,000,000
15,500 15,000 16,000 240,250,000 225,000,000 256,000,000
18,800 19,000 16,000 353,440,000 361,000,000 256,000,000
∑A = 130,000  ∑B = 128,300  ∑C = 128,500  ∑A² = 1,951,220,000  ∑B² = 1,867,790,000  ∑C² = 1,856,130,000
Find SSB:
$$ SSB = \frac{(\sum A)^2}{n_A} + \frac{(\sum B)^2}{n_B} + \frac{(\sum C)^2}{n_C} - \frac{(\sum A + \sum B + \sum C)^2}{N} $$
$$ = \frac{(130{,}000)^2 + (128{,}300)^2 + (128{,}500)^2}{9} - \frac{(386{,}800)^2}{27} $$
= 5,541,460,000 − 5,541,268,148.15
SSB = 191,851.85
Find TSS:
$$ TSS = \sum X_i^2 - \frac{(\sum X_i)^2}{N} $$
$$ = \sum A^2 + \sum B^2 + \sum C^2 - \frac{(\sum A + \sum B + \sum C)^2}{N} $$
= 1,951,220,000 + 1,867,790,000 + 1,856,130,000 − 5,541,268,148.15
= 5,675,140,000 − 5,541,268,148.15
= 133,871,851.85
Find SSW:
SSW = TSS − SSB
= 133,871,851.85 − 191,851.85
= 133,680,000
MSB = SSB / dfB = 191,851.85 / 2 = 95,925.93
MSW = SSW / dfW = 133,680,000 / 24 = 5,570,000
3.4. F – Value
F = MSB / MSW = 95,925.93 / 5,570,000 = 0.0172
Source of Variation    SS                df    MS           F
Between groups         191,851.85        2     95,925.93    0.0172
Within groups          133,680,000       24    5,570,000
Total                  133,871,851.85    26
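The sums of squares above can be recomputed directly from the raw sales table. The sketch below squares each raw value itself and does not copy the tabulated squares, so it also serves as a check on those entries (for instance, 18,800² = 353,440,000). The function name and the dictionary keys are illustrative.

```python
sales = {
    "A": [12000, 10000, 10900, 18000, 16000, 14400, 14400, 15500, 18800],
    "B": [15500, 12500, 12000, 13000, 14000, 15000, 12300, 15000, 19000],
    "C": [12800, 16000, 15000, 12700, 15000, 13000, 12000, 16000, 16000],
}


def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of groups of numbers."""
    values = [x for group in groups for x in group]
    total_n, k = len(values), len(groups)
    correction = sum(values) ** 2 / total_n            # (sum of all X)^2 / N
    tss = sum(x * x for x in values) - correction      # total sum of squares
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ssw = tss - ssb
    df_between, df_within = k - 1, total_n - k
    msb, msw = ssb / df_between, ssw / df_within
    return {
        "SSB": ssb, "SSW": ssw, "TSS": tss,
        "dfB": df_between, "dfW": df_within,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


result = one_way_anova(list(sales.values()))
for name, value in result.items():
    print(name, round(value, 4))
```

The computed F is then compared with the tabular F at (dfB, dfW) degrees of freedom in the usual way.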
Correlation Analysis
Why do some students who excel in English not do well in Mathematics?
Have you ever wondered why some of your friends who are good in Mathematics do
not have high grades in English? Did it occur to you to find out if there exists a
relationship between academic performance in English and achievement in
Mathematics? The statistical procedure that is used to determine whether a relationship
exists between two variables is called correlation analysis.
Correlation
Correlation analysis measures the association or the strength of the
relationship between two variables say, x and y.
Example 1
Suppose a ten-item test in English and a ten-item test in Mathematics were
administered to ten students. The scores of the students are tabulated below. It must be
determined if the scores in Mathematics quiz (here labelled variable x) and the English
quiz (labelled variable y) are correlated or not.
Student    Mathematics Score (x)    English Score (y)
1 4 5
2 5 4
3 9 8
4 2 3
5 8 9
6 1 2
7 2 1
8 7 6
9 6 7
10 4 5
The scatter graph of the data above is given below. Note that x-axis represents
the scores in Mathematics and y-axis shows the scores in English. Each point in the
graph below is an ordered pair (x, y) corresponding to the score obtained by a student
in the two subjects.
[Figure: scatter plot of English Score (y-axis) against Mathematics Score (x-axis)]
The graph above shows an increasing trend, which indicates a direct (positive)
correlation between variables x and y.
Example 2
Suppose the scores of the students in those two subjects happen to be as follows:
Student    Mathematics Score (x)    English Score (y)
1 9 3
2 3 6
3 4 7
4 7 4
5 6 2
6 1 9
7 2 8
8 5 4
9 10 2
10 2 10
The scatter graph of the data above looks like this:
[Figure: scatter plot of English Score (y-axis) against Mathematics Score (x-axis)]
This time the trend of the data is decreasing, hence, the variables are negatively
correlated.
Example 3
Suppose the same students have the following scores.
Student    Mathematics Score (x)    English Score (y)
1 9 8
2 2 9
3 6 3
4 3 7
5 4 7
6 5 5
7 3 6
8 6 7
9 8 4
10 2 2
[Figure: scatter plot of English Score (y-axis) against Mathematics Score (x-axis)]
The scatter of the data is neither increasing nor decreasing. It represents a zero
correlation.
While a scatter plot may be a convenient way of inspecting correlation between
two variables, it does not offer a measure of the strength of the correlation. Fortunately,
Karl Pearson (1857-1936) developed and perfected a formula that can give a numerical
value to measure the strength of correlation. This formula does not only show how
greatly two data sets are correlated but also reveals if the correlation is direct or inverse,
or if the data sets are not correlated. The formula named after him is called the Pearson
Product-Moment Correlation Coefficient.
Consider the data in Example 1 of this section. Organize the data as shown in
the table below.
This result is in conformity with the scatter plot in Example 1 of this section.
The computed r is almost 1; hence, there is a very high positive correlation. This is the
reason why the scatter plot in Example 1 of this section is increasing from left to right.
Using the data in Example 2 of this section, we have the following
computations.
Mathematics Score (x)    English Score (y)    x²    y²    xy
9 3 81 9 27
3 6 9 36 18
4 7 16 49 28
7 4 49 16 28
6 2 36 4 12
1 9 1 81 9
2 8 4 64 16
5 4 25 16 20
10 2 100 4 20
2 10 4 100 20
∑ 𝑥 = 49 ∑ 𝑦 = 55 ∑ 𝑥 2 = 325 ∑ 𝑦 2 = 379 ∑ 𝑥𝑦 = 198
Solution:
$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} $$
$$ = \frac{(10)(198) - (49)(55)}{\sqrt{[(10)(325) - (49)^2][(10)(379) - (55)^2]}} $$
$$ = \frac{-715}{\sqrt{(849)(765)}} $$
$$ = \frac{-715}{805.906322} $$
$$ = -0.89 $$
The computed r is −0.89; hence, there is a very high negative correlation. This is the
reason why the scatter plot in Example 2 of this section is decreasing from left to right.
We now compute the r of the data on two non-correlated variables in Example
3 of this section.
Mathematics Score (x)    English Score (y)    x²    y²    xy
9 8 81 64 72
2 9 4 81 18
6 3 36 9 18
3 7 9 49 21
4 7 16 49 28
5 5 25 25 25
3 6 9 36 18
6 7 36 49 42
8 4 64 16 32
2 2 4 4 4
∑x = 48  ∑y = 58  ∑x² = 284  ∑y² = 382  ∑xy = 278
Solution:
$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} $$
$$ = \frac{(10)(278) - (48)(58)}{\sqrt{[(10)(284) - (48)^2][(10)(382) - (58)^2]}} $$
$$ = \frac{-4}{\sqrt{(536)(456)}} $$
$$ = \frac{-4}{494.38} $$
$$ = -0.01 $$
Since the computed r is almost zero, then it has little or zero linear correlation.
This conforms with the scatter plot in Example 3 in this section. The graph is neither
increasing nor decreasing and therefore the two sets of data are not correlated.
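The three correlation coefficients discussed in this section can be recomputed from the raw score tables of Examples 1 to 3. The sketch below applies the raw-sum form of the Pearson formula; the function and variable names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation using the raw-sum formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x**2) * (n * sum_yy - sum_y**2))
    return numerator / denominator


# (Mathematics scores, English scores) from Examples 1, 2 and 3
examples = {
    "Example 1": ([4, 5, 9, 2, 8, 1, 2, 7, 6, 4], [5, 4, 8, 3, 9, 2, 1, 6, 7, 5]),
    "Example 2": ([9, 3, 4, 7, 6, 1, 2, 5, 10, 2], [3, 6, 7, 4, 2, 9, 8, 4, 2, 10]),
    "Example 3": ([9, 2, 6, 3, 4, 5, 3, 6, 8, 2], [8, 9, 3, 7, 7, 5, 6, 7, 4, 2]),
}

r_values = {name: pearson_r(x, y) for name, (x, y) in examples.items()}
for name, r in r_values.items():
    print(name, round(r, 2))
```

The signs and magnitudes match the three scatter plots: strongly increasing, strongly decreasing, and no clear trend.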
Example 4
Test the hypothesis that there is no significant relationship between mental
ability and English proficiency at 5% level of significance.
Mental Ability (x) English Proficiency (y)
50 200
54 198
50 200
51 203
49 186
46 205
48 185
47 197
44 183
44 171
46 179
45 185
48 184
53 190
54 191
33 170
34 168
Solution:
1. Ho: There is no significant relationship between mental ability and English
proficiency.
Ha: There is a significant relationship between mental ability and English
proficiency.
2. 𝛼 = 5% 𝑜𝑟 0.05
3. Pearson r will be used to test the hypothesis.
4. Computation
Mental Ability (x)    English Proficiency (y)    x²    y²    xy
50 200 2,500 40,000 10,000
54 198 2,916 39,204 10,692
50 200 2,500 40,000 10,000
51 203 2,601 41,209 10,353
49 186 2,401 34,596 9,114
46 205 2,116 42,025 9,430
48 185 2,304 34,225 8,880
47 197 2,209 38,809 9,259
44 183 1,936 33,489 8,052
44 171 1,936 29,241 7,524
46 179 2,116 32,041 8,234
45 185 2,025 34,225 8,325
48 184 2,304 33,856 8,832
53 190 2,809 36,100 10,070
54 191 2,916 36,481 10,314
33 170 1,089 28,900 5,610
34 168 1,156 28,224 5,712
∑x = 796  ∑y = 3,195  ∑x² = 37,834  ∑y² = 602,625  ∑xy = 150,401
$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} $$
$$ = \frac{(17)(150{,}401) - (796)(3{,}195)}{\sqrt{[(17)(37{,}834) - (796)^2][(17)(602{,}625) - (3{,}195)^2]}} $$
$$ = \frac{13{,}597}{\sqrt{(9{,}562)(36{,}600)}} $$
$$ = \frac{13{,}597}{18{,}707.46375} $$
$$ = 0.73 $$
5. df = N – 2 = 17 – 2 = 15
6. Tabular Value: r = 0.482 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the
null hypothesis. If the computed value is greater than the tabular value, reject the
null hypothesis.
8. Since the computed r (0.73) is greater than the tabular value (0.482), so reject
the null hypothesis. Hence, there is a significant relationship between mental
ability and English proficiency. It shows that there is a moderately high positive
relationship between the two variables.
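As a cross-check on Example 4, the sketch below computes r from deviations about the means, which is algebraically equivalent to the raw-sum formula used in the solution, and then applies the decision rule with the tabular value 0.482 taken from the text. The function names are illustrative.

```python
from math import sqrt
from statistics import fmean

mental_ability = [50, 54, 50, 51, 49, 46, 48, 47, 44, 44, 46, 45, 48, 53, 54, 33, 34]
english = [200, 198, 200, 203, 186, 205, 185, 197, 183, 171,
           179, 185, 184, 190, 191, 170, 168]


def pearson_r_centered(x, y):
    """Pearson r from deviations about the means (same value as the raw-sum form)."""
    mean_x, mean_y = fmean(x), fmean(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def is_significant(r, r_tabular):
    """Decision rule from the text: reject Ho when |r| exceeds the tabular r."""
    return abs(r) > r_tabular


r = pearson_r_centered(mental_ability, english)
df = len(mental_ability) - 2
print(round(r, 2), df, is_significant(r, 0.482))
```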
6.2.2 Regression Analysis
Regression analysis is used when predicting the behavior of a variable. The
regression equation explains the amount of variation observable in the dependent
variable y in terms of the independent variable x. It is an equation of a straight line
in the form:
𝑦 = 𝑏𝑥 + 𝑎
where y = criterion measure
x = predictor
a = y-intercept, or the ordinate of the point where the regression line crosses the y-axis
b = the slope of the line.
To get the regression equation, the values of a and b are computed using the
formulas below.
$$ a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - (\sum x)^2} $$
and
$$ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} $$
where n = number of pairs
Example 1
The data in the table represent the membership at a university Mathematics club
during the past 5 years. Find the regression equation to predict the membership 5 years
from now.
Number of Years (x) Membership (y)
1 25
2 30
3 32
4 45
5 50
Solution:
Number of Years (x)    Membership (y)    x²    xy
1 25 1 25
2 30 4 60
3 32 9 96
4 45 16 180
5 50 25 250
∑x = 15  ∑y = 182  ∑x² = 55  ∑xy = 611
Find a:
$$ a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - (\sum x)^2} = \frac{182(55) - 15(611)}{5(55) - (15)^2} = \frac{845}{50} = 16.9 $$
Find b:
$$ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{5(611) - 15(182)}{5(55) - (15)^2} = \frac{325}{50} = 6.5 $$
Regression equation: 𝑦 = 6.5𝑥 + 16.9
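The values of a and b for the membership data can be checked, and the fitted line used for prediction, with the sketch below. Taking "5 years from now" to mean x = 10 (five years after year 5) is an assumption made here for illustration; the function name is also illustrative.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = b*x + a."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(p * q for p, q in zip(x, y))
    denominator = n * sum_xx - sum_x**2   # n * sum(x^2) - (sum x)^2
    a = (sum_y * sum_xx - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


years = [1, 2, 3, 4, 5]
members = [25, 30, 32, 45, 50]

a, b = fit_line(years, members)
predicted = b * 10 + a  # assumed: x = 10 represents five years after year 5

print(round(a, 2), round(b, 2), round(predicted, 1))
```

Note that both a and b share the same denominator, n∑x² − (∑x)².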
$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} $$
$$ = \frac{(10)(44{,}947) - (659)(680)}{\sqrt{[(10)(43{,}601) - (659)^2][(10)(46{,}356) - (680)^2]}} $$
$$ = 0.95 $$
5. df = N – 2 = 10 – 2 = 8
6. Tabular Value: r = 0.632 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the
null hypothesis. If the computed value is greater than the tabular value, reject the
null hypothesis.
8. Since the computed r (0.95) is greater than the tabular value (0.632), so reject
the null hypothesis. Hence, there is a significant relationship between heights of
father and their eldest sons. It shows that there is a very high positive relationship
between the two variables.
We can now proceed to regression analysis since there was a significant
relationship between heights of father and their eldest sons.
Find a:
$$ a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - (\sum x)^2} = \frac{680(43{,}601) - 659(44{,}947)}{(10)(43{,}601) - (659)^2} = 16.55 $$
Find b:
$$ b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{10(44{,}947) - 659(680)}{(10)(43{,}601) - (659)^2} = 0.78 $$
Solution:
Candidate    Judge 1 (x)    Judge 2 (y)    Rx    Ry    d    d²
1 98 94 1 4 -3 9
2 97 97 2 2 0 0
3 95 98 3 1 2 4
4 90 95 4 3 1 1
5 89 92 5 5 0 0
6 88 90 6 6 0 0
7 85 89 7.5 7 0.5 0.25
8 85 85 7.5 8 -0.5 0.25
∑d² = 14.5
$$ \rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(14.5)}{8(8^2 - 1)} = 0.83 $$
Example 2
Ten instructors were rated by third- and fourth-year students on their “mastery
of subject matter” and the results were tabulated. What is the Spearman rho value for
the data? At 5% level of significance, determine if there is a significant relationship
between the ratings given by the two groups of students.
Instructor 3rd Year (x) 4th Year (y)
1 44 46
2 45 43
3 38 40
4 32 30
5 46 39
6 47 37
7 37 44
8 35 46
9 27 48
10 40 50
Solution:
1. Ho: There is no significant relationship between the ratings given to the ten
instructors by third- and fourth-year students.
Ha: There is a significant relationship between the ratings given to the ten
instructors by third- and fourth-year students.
2. 𝛼 = 5% 𝑜𝑟 0.05
3. Spearman rho (𝜌)will be used to test the hypothesis.
4. Computation
Instructor    3rd Year (x)    4th Year (y)    Rx    Ry    d    d²
1 44 46 4 3.5 0.5 0.25
2 45 43 3 6 -3 9
3 38 40 6 7 -1 1
4 32 30 9 10 -1 1
5 46 39 2 8 -6 36
6 47 37 1 9 -8 64
7 37 44 7 5 2 4
8 35 46 8 3.5 4.5 20.25
9 27 48 10 2 8 64
10 40 50 5 1 4 16
∑d² = 215.5
$$ \rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$
$$ = 1 - \frac{6(215.5)}{10(10^2 - 1)} $$
$$ = 1 - 1.31 $$
$$ = -0.31 $$
5. df = N – 2 = 10 – 2 = 8
6. Tabular Value: 𝜌 = 0.643 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the
null hypothesis. If the computed value is greater than the tabular value, reject the
null hypothesis.
8. Since the absolute value of the computed 𝜌 (0.31) is less than the tabular value
(0.643), the null hypothesis is accepted. Hence, there is no significant
relationship between the ratings given to the ten instructors by third- and
fourth-year students. This implies that the ratings given by the two groups of
students do not agree.
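The ranking and the rho computation in the two Spearman examples can be reproduced with the sketch below. It ranks scores from highest (rank 1) to lowest and gives tied scores the average of the ranks they occupy, as in the solution tables. Function names are illustrative.

```python
def average_ranks(scores):
    """Rank from highest (1) to lowest; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every position holding the same score as position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d = Rx - Ry."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_sq / (n * (n**2 - 1))


# Example 1: two judges, eight candidates
rho_judges = spearman_rho([98, 97, 95, 90, 89, 88, 85, 85],
                          [94, 97, 98, 95, 92, 90, 89, 85])
# Example 2: third- and fourth-year ratings of ten instructors
rho_ratings = spearman_rho([44, 45, 38, 32, 46, 47, 37, 35, 27, 40],
                           [46, 43, 40, 30, 39, 37, 44, 46, 48, 50])

print(round(rho_judges, 2), round(rho_ratings, 2))
```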
Activity Sheet 19
2. The owner of a factory that sells a particular bottled fruit juice claims that the
average capacity of their product is 250 ml. To test the claim, a consumer group
gets a sample of 100 such bottles, calculates the capacity of each bottle, and
then finds the mean capacity to be 248 ml. The standard deviation is 5 ml. Is the
claim true?
3. In a plant nursery, the owner thinks that the seedlings in a box sprayed
with a new kind of fertilizer have an average height of 26 cm after three days,
with a standard deviation of 10 cm. One researcher randomly selected 80 such
seedlings and calculated the mean height to be 20 cm and the standard deviation
to be 10 cm. Will you conduct a one-tailed test or a two-tailed test? Proceed with
the test using 𝛼 = 0.05.
Activity Sheet 20
2. An investigator predicts that dog owners in the country spend more time
walking their dogs than do dog owners in the city. The investigator gets a sample
of 21 country owners and 23 city owners. The mean number of hours per week
that city owners spend walking their dogs is 10.0, with a standard deviation of
3.0. The mean number of hours per week that country owners spend walking
their dogs is 15.0, with a standard deviation of 4.0. Do dog owners in the
country spend more time walking their dogs than do dog owners in the city?
Males Females
2. Yvonne is a good student, but at times she doesn’t get enough sleep. She
hypothesizes that when she gets more sleep she does better on tests. To test her
hypothesis, she tracked how she did on a number of tests, based on how many
hours of sleep she got on the previous night. She inputs the following data into
her Excel file to compute the correlation coefficient.
History Algebra
34 31
25 32
16 46
9 23
40 9
7 48
28 31
9 4
2. The left side of Figure 1 displays the association between the IQ of each
adolescent in a sample with the number of hours they listen to rock music per
month. Determine the strength of the correlation between IQ and rock music
using both the Pearson’s correlation coefficient and Spearman’s rank
correlation. Compare the results.
IQ Rock
99 2
120 0
98 25
102 45
123 14
105 20
85 15
110 19
117 22
90 4
Learner’s Feedback Form
In what particular portion of this learning packet do you feel that you are struggling
or lost?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
Did you raise your concern to your instructor? □ Yes □ No
To further improve this learning packet, what part do you think should be enhanced?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
NOTE: This is an essential part of the course module. This must be submitted to the
subject teacher (within the 1st week of the class).