The Statistical Tools: Mathematics in The Modern World


Chapter 6: THE STATISTICAL TOOLS
Introduction
Statistics involves the collection, organization, summarization, presentation,
and interpretation of data. It has two branches: descriptive statistics and inferential
statistics. Descriptive statistics is the analysis of data that helps
describe, show, or summarize data in a meaningful way. When using descriptive
statistics, it is useful to summarize a group of data using a combination of tabulated
description (i.e., tables), graphical description (i.e., graphs and charts), and statistical
commentary (i.e., a discussion of the results). The branch that allows us to make
predictions (“inferences”) from the data is called inferential statistics. Inferential
statistics takes data from samples and makes generalizations about a population.
For instance, you might stand in a mall and ask a sample of 100 people if they
like shopping at SM. You could make a bar chart of the yes or no answers (that would be
descriptive statistics), or you could use your research (and inferential statistics) to
reason that around 75-80% of the population (all shoppers in all malls) like shopping at
SM.
Testing the significance of the difference between two means, two standard
deviations, two proportions, or two percentages, is an important area of inferential
statistics. Comparison between two or more variables often arises in research or
experiments and to be able to make valid conclusions regarding the results of the study,
one has to apply an appropriate test statistic. This chapter deals with the discussion of
the different test statistics that are commonly used in research studies under inferential
statistics.
Learning Objectives
At the end of this chapter, the student is expected to:

 apply a variety of statistical tools to process and manage numerical data;
 use the methods of linear regression and correlations to predict the value of
a variable given certain conditions; and
 recognize the importance of testing of hypotheses in making decisions.
Duration
Topic 1: Testing of Hypothesis = 6 hours

Topic 2: Correlation and Regression Analysis = 3 hours
Lesson Proper
6.1 Hypothesis Testing

In Statistics, decision-making starts with a concern about a population regarding
its characteristics, denoted by parameter values. We might be interested in a
population parameter like the mean or the proportion. For instance, suppose you are planning to
put up a business selling cars. Your first step before spending money on the business is
to know which car sells the most these days. Before you open a business selling
Toyota, Mitsubishi, Hyundai, Honda, Nissan, or Suzuki, you need to gather information on
which among these gets the highest number of sales. How many existing distributors of
these cars are out there? Do you want to compete? To answer these questions, you need
to gather data. What type of data? And where will you get them? You simply need to
do a survey. These concerns can be addressed through a procedure in Statistics called
hypothesis testing.
Hypothesis
A hypothesis is a conjecture or statement which aims to explain certain
phenomena in the real world. Many hypotheses, statistical or not, are products of human
curiosity. To seek answers to a question, a researcher gathers and presents
evidence, then tests the resulting hypothesis using statistical tools and analysis. In
statistical analysis, the hypothesis is either accepted or rejected within a certain
critical interval.
The hypothesis that is subjected to testing to determine whether it can be
accepted or rejected is the null hypothesis, denoted by Ho. This hypothesis states that there is no
significant relationship or no significant difference between two or more variables, or
that one variable does not affect another variable. In statistical research, the hypotheses
should be written in null form. For example, suppose you want to know whether method
A is more effective than method B in teaching high school mathematics. The null
hypothesis for this study will be: “There is no significant difference between the
effectiveness of method A and method B.”
Another type of hypothesis is the alternative hypothesis, denoted by Ha. This is
the hypothesis that challenges the null hypothesis. The alternative hypothesis for the
example above can be: “There is a significant difference between the effectiveness of
method A and method B,” or “Method A is more effective than method B,” or “Method
A is less effective than method B,” depending on whether the test is one-tailed
or two-tailed. These will be discussed in the succeeding lessons.
Significance Level
To test the null hypothesis of no significant difference between the two
methods in the above example, one must set the level of significance first. This is the
probability of committing a Type I error and is denoted by the symbol 𝛼. A Type I error is
committed when the null hypothesis is rejected (that is, the alternative hypothesis is
accepted) when, in fact, the null hypothesis is true. Accepting the null hypothesis when,
in fact, it is false is called a Type II error, and its probability is denoted by the
symbol 𝛽. The most common level of significance is 5%.
Table 1. Four Possible Outcomes in Decision-Making

                      Decisions about the Ho
Reality               Reject Ho            Do not Reject Ho (or Accept Ho)
Ho is true.           Type I error         Correct Decision
Ho is false.          Correct Decision     Type II error
If the null hypothesis is true and accepted, or if it is false and rejected, the
decision is correct. If the null hypothesis is true and rejected, the decision is incorrect and
this is a Type I error. If the null hypothesis is false and accepted, the decision is
incorrect and this is a Type II error. For instance, Sarah insists that she is 31 years old
when, in fact, she is 35 years old. What error is Sarah committing? Sarah is rejecting
the truth. She is committing a Type I error. As another example, a man plans to go hunting
the Philippine monkey-eating eagle, believing that it is proof of his mettle. What type
of error is this? Hunting the Philippine eagle is prohibited by law; thus, it is not a good
sport. The man is accepting a belief that is false, so it is a Type II error. Since hunting
the Philippine monkey-eating eagle is against the law, the man may find himself in jail if
he goes out of his way to hunt an endangered species.
In decisions that we make, we form conclusions and these conclusions are the
bases of our actions. In Statistics, however, decisions are based on sample
information, so a conclusion may turn out to be wrong. The best that we can do is to control the
probability with which an error occurs. This is the reason why small
probability values are assigned to 𝛼 and 𝛽.
One-Tailed and Two-Tailed Tests

A test is called a one-tailed test if the rejection region lies on one extreme side
of the distribution and two-tailed if the rejection region is located on both ends of the
distribution.
Figure 1. Two-tailed (A) and One-tailed (B & C) tests
In Figure 1.A (two-tailed), the rejection region consists of the areas to the extreme left
and right of the curve marked by the two vertical lines. In Figures 1.B and 1.C (both
one-tailed), the rejection region is the area to the left (left tail) and to the right (right tail)
of the vertical line under the bell curve, respectively.
Steps in Testing Hypothesis

Below are the steps when testing the truth of a hypothesis.
1. Formulate the null hypothesis. Denote it as Ho and the alternative hypothesis
as Ha.
2. Set the desired level of significance (𝛼).
3. Determine the appropriate test statistic to be used in testing the null
hypothesis.
4. Compute the value of the test statistic.
5. Compute the degrees of freedom.
6. Find the tabular value using the table of values for different tests from the
appendix tables.
7. State the Decision Rule: If the absolute value of the computed statistic is less than
the tabular value, accept the null hypothesis. If it is greater than or equal to the tabular
value, reject the null hypothesis.
8. Compare the computed value to the tabular value. Make a conclusion using the
result of the comparison.
Degrees of Freedom (df)
The degrees of freedom give the number of pieces of independent information
available for computing variability. For any statistical tool used in testing hypotheses,
the number of degrees of freedom will vary depending on the size of the
sample. For a single group, the number of degrees of freedom is N – 1,
where N is the number of observations in the group. For two groups, the formula for df is
N1 + N2 – 2 for the t-test and N – 2 for Pearson r. These test statistics will be discussed
later in this chapter.
6.1.1 Tests Concerning Means

6.1.1.1 z-test on the Comparison between the Population Mean and the
Sample Mean
If the population mean (𝜇) and the population standard deviation (𝜎) are
known, and 𝜇 will be compared to a sample mean (𝑥̅), use the formula
below.
$$z = \frac{\bar{x} - \mu}{\sigma} \cdot \sqrt{n}$$
where n is the sample size.
The tabular values of 𝑧 can be obtained from the following table:
Table 2. Summary Table of Critical Values

                     Level of Significance
Test Type            0.10       0.05       0.025      0.01
One-tailed Test      ±1.28      ±1.645     ±1.96      ±2.33
Two-tailed Test      ±1.645     ±1.96      ±2.24      ±2.58
Decision Rule:
Reject Ho if |𝑧| ≥ |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
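The critical values in Table 2 can also be computed from the standard normal distribution instead of being read from a table. The following is a minimal sketch using only Python's standard library; the helper names `z_critical` and `reject_h0` are illustrative assumptions and do not come from this chapter.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool = False) -> float:
    """Return the positive critical z value for a significance level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


def reject_h0(z: float, alpha: float, two_tailed: bool = False) -> bool:
    """Decision rule from the text: reject Ho if |z| >= |z tabular|."""
    return abs(z) >= z_critical(alpha, two_tailed)


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.025, 0.01):
        one_tail = z_critical(alpha)
        two_tail = z_critical(alpha, two_tailed=True)
        print(f"alpha = {alpha}: one-tailed {one_tail:.3f}, two-tailed {two_tail:.3f}")
```

Computing the value directly is convenient when a significance level is not listed in the table.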
Example 1
A company, which makes a battery-operated toy car, claims that its products
have a mean life span of 5 years with a standard deviation of 2 years. Test the null
hypothesis that 𝜇 = 5 years against the alternative hypothesis that 𝜇 ≠ 5 years if a random
sample of 40 toy cars was tested and found to have a mean life span of only 3 years.
Use a 5% level of significance.
Solution:
1. Ho : The mean lifespan of battery-operated toy cars is 5 years. (𝜇 = 5)
Ha : The mean lifespan of battery-operated toy cars is not 5 years. (𝜇 ≠ 5)
2. 𝛼 = 0.05, two-tailed
3. Use z-test as test statistic.
4. Computation:
Given 𝑥̅ = 3, 𝜇 = 5, 𝑛 = 40, 𝜎 = 2
$$z = \frac{\bar{x} - \mu}{\sigma} \cdot \sqrt{n} = \frac{3 - 5}{2} \cdot \sqrt{40} = -6.32$$
5. Critical Value: 𝑧 < −1.96 or 𝑧 > 1.96
6. Decision Rule: Reject Ho if |𝑧| ≥ |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟|.
7. Since the computed |𝑧|, which is 6.32, is greater than |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟|, which is 1.96,
reject Ho. Hence, there is a significant difference between the
population and sample mean lifespan of battery-operated toy cars.
Example 2
A manufacturer of bicycle tires has developed a new design which he claims
has an average lifespan of 5 years with a standard deviation of 1.2 years. A dealer of
the product claims that the average lifespan of 150 samples of the tires is only 3.5 years.
Test the difference of the population and sample means at 5% level of significance.
Solution:
1. Ho : There is no significant difference between the population and sample
mean of bicycle tires’ lifespan. (𝑥̅ = 𝜇)
Ha : There is a significant difference between the population and sample mean
of bicycle tires’ lifespan. (𝑥̅ < 𝜇)
2. 𝛼 = 0.05, one-tailed, left tail
3. Use z-test as test statistic.
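4. Computation:
Given 𝑥̅ = 3.5, 𝜇 = 5, 𝑛 = 150, 𝜎 = 1.2
$$z = \frac{\bar{x} - \mu}{\sigma} \cdot \sqrt{n} = \frac{3.5 - 5}{1.2} \cdot \sqrt{150} = -15.31$$
5. Critical Value: 𝑧 < −1.645
6. Decision Rule: Reject Ho if |𝑧| ≥ |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟|.
7. Since the computed |𝑧|, which is 15.31, is greater than |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟|, which is 1.645,
reject Ho. Hence, there is a significant difference between the
population and sample mean of bicycle tires’ lifespan.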
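The two z-test examples above can be checked with a few lines of Python. This is a minimal sketch; the function name `one_sample_z` is an illustrative assumption, and the tabular values are the ones quoted in the solutions.

```python
import math


def one_sample_z(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """z = (x_bar - mu) / sigma * sqrt(n)."""
    return (sample_mean - pop_mean) / pop_sd * math.sqrt(n)


# Example 1: toy cars, two-tailed test at alpha = 0.05 (tabular z = 1.96)
z_toy = one_sample_z(3, 5, 2, 40)
# Example 2: bicycle tires, left-tailed test at alpha = 0.05 (tabular z = 1.645)
z_tire = one_sample_z(3.5, 5, 1.2, 150)

for label, z, z_tabular in [("toy cars", z_toy, 1.96), ("bicycle tires", z_tire, 1.645)]:
    decision = "reject Ho" if abs(z) >= z_tabular else "accept Ho"
    print(f"{label}: z = {z:.2f}, tabular = {z_tabular}, {decision}")
```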
6.1.1.2 t-test on the Comparison between the Population Mean and the
Sample Mean
The t-test can be used to compare the means when the population mean
(𝜇) is known but the population standard deviation (𝜎) is unknown.
In this case, the sample standard deviation (s) is computed from the data
and the t-test is used instead of the z-test. The formula is given below:
$$t = \frac{\bar{x} - \mu}{s} \cdot \sqrt{n}$$
The quantity s divided by √𝑛 is called the
standard error of the statistic. It is the standard deviation of the sampling
distribution of the statistic for random samples of size n.
Decision Rule:
Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
Example 1
The average length of time for people to vote using the old procedure during a
presidential election period in precinct A is 55 minutes. Using computerization as a new
election method, a random sample of 20 registrants was used and found to have a mean
length of voting time of 30 minutes with a standard deviation of 1.5 minutes. Test the
significance of the difference between the population mean and the sample mean.
Solution:
1. Ho : There is no significant difference between the population and sample
mean of length of time for people to vote using the old and new procedure.
(𝑥̅ = 𝜇)
Ha : There is a significant difference between the population and sample mean
length of time for people to vote using the old and new procedure.
(𝑥̅ < 𝜇)
2. 𝛼 = 0.05, one-tailed, left tail
3. Use t-test as test statistic.
4. Computation:
Given 𝑥̅ = 30, 𝜇 = 55, 𝑛 = 20, 𝑠 = 1.5
$$t = \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} = \frac{30 - 55}{1.5} \cdot \sqrt{20} = -74.54$$
5. df = n – 1 = 20 – 1 = 19
6. Tabular Value: t = 1.729 (from Appendix)
7. Decision Rule: Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟|.
8. Since the computed |𝑡|, which is 74.54, is greater than |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟|, which is 1.729,
reject Ho. Hence, there is a significant difference between the
population and sample mean length of time for people to vote using the old and
new procedure. It implies that the computerized election method results in a
shorter voting time than the old procedure.
Example 2
An experimental study was conducted by a researcher to determine if a new time
slot has an effect on the performance of pupils in Mathematics. Fifteen randomly
selected learners participated in the study. Toward the end of the investigation, a
standardized assessment was conducted. The sample mean was 85 and the sample standard
deviation was 3. In the standardization of the test, the mean was 75 and the standard
deviation was 10. Based on the evidence at hand, is the new time slot effective? Use
5% level of significance.
Solution:
1. Ho: There is no significant difference between the population and sample mean
of performance in Mathematics in a new time slot. (𝑥̅ = 𝜇)
Ha: There is a significant difference between the population and sample mean
of performance in Mathematics in a new time slot. (𝑥̅ > 𝜇)
2. 𝛼 = 0.05, one-tailed, right tail
3. Use t-test as test statistic.
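4. Computation:
Given 𝑥̅ = 85, 𝜇 = 75, 𝑛 = 15, 𝑠 = 3
$$t = \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} = \frac{85 - 75}{3} \cdot \sqrt{15} = 12.91$$
5. df = n – 1 = 15 – 1 = 14
6. Tabular Value: t = 1.761 (from Appendix)
7. Decision Rule: Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟|.
8. Since the computed |𝑡|, which is 12.91, is greater than |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟|, which is 1.761,
reject Ho. Hence, there is a significant difference between the
population and sample mean of performance in Mathematics in a new time slot.
It implies that changing the time slot has an effect on the students’ performance
in Mathematics.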
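The one-sample t statistic for both examples above can be reproduced as follows. This is a minimal sketch; the function name `one_sample_t` is an illustrative assumption, and the tabular value 1.729 is the one quoted in Example 1 rather than something computed by the code.

```python
import math


def one_sample_t(sample_mean: float, pop_mean: float, sample_sd: float, n: int):
    """Return (t, df), where t = (x_bar - mu) / s * sqrt(n) and df = n - 1."""
    t = (sample_mean - pop_mean) / sample_sd * math.sqrt(n)
    return t, n - 1


# Example 1: voting time, old procedure versus computerized procedure
t_vote, df_vote = one_sample_t(30, 55, 1.5, 20)
# Example 2: Mathematics performance in a new time slot
t_slot, df_slot = one_sample_t(85, 75, 3, 15)

T_TABULAR_VOTE = 1.729  # tabular t at df = 19, as stated in Example 1
decision_vote = "reject Ho" if abs(t_vote) >= T_TABULAR_VOTE else "accept Ho"

print(f"Example 1: t = {t_vote:.2f}, df = {df_vote}, {decision_vote}")
print(f"Example 2: t = {t_slot:.2f}, df = {df_slot}")
```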
6.1.1.3 t-test Concerning Means of Independent Samples

When two samples are drawn from normally distributed populations with
the assumption that their variances are equal, the t-test with the given
formula should be used.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{n_1 + n_2}{n_1 n_2}\right]}}$$
where $\bar{x}_1, \bar{x}_2$ = means
$n_1, n_2$ = sample sizes
$s_1, s_2$ = standard deviations (so that $s_1^2, s_2^2$ are the variances)
Decision Rule:
Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟 |.
Example 1
A course in Physics was taught to 10 students using the traditional method.
Another group of students went through the same course using another method. At the
end of the semester, the same test was administered to each group. The 10 students
under method A got an average of 82 with a standard deviation of 5, while the 11
students under method B got an average of 78 with a standard deviation of 6. Test the
null hypothesis of no significant difference in the performance of the two groups of
students at 5% level of significance.
Solution:
1. Ho: There is no significant difference between the average scores of the two
groups of students. ($\bar{x}_1 = \bar{x}_2$)
Ha: There is a significant difference between the average scores of the two
groups of students. ($\bar{x}_1 > \bar{x}_2$)
2. 𝛼 = 0.05, one-tailed, right tail
3. Use the t-test as test statistic.
4. Computation:
Given: $\bar{x}_1 = 82$, $\bar{x}_2 = 78$, $n_1 = 10$, $n_2 = 11$, $s_1 = 5$, $s_2 = 6$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{n_1 + n_2}{n_1 n_2}\right]}}$$
$$= \frac{82 - 78}{\sqrt{\left[\frac{(10 - 1)(5)^2 + (11 - 1)(6)^2}{10 + 11 - 2}\right]\left[\frac{10 + 11}{(10)(11)}\right]}}$$
$$= \frac{4}{\sqrt{\left[\frac{(9)(25) + (10)(36)}{19}\right]\left[\frac{21}{110}\right]}}$$
$$= \frac{4}{2.4245} = 1.65$$
5. df = 𝑛1 + 𝑛2 − 2 = 10 + 11 – 2 = 19
6. Tabular Value: t = 1.729 (from Appendix)
7. Decision Rule: Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟|.
8. Since the computed |𝑡|, which is 1.65, is less than |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟|, which is 1.729,
accept Ho. Hence, there is no significant difference between the
average scores of the two groups of students. It implies that method A and
method B do not differ significantly in their effect on the students’
performance in Physics.
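The pooled-variance computation in this example can be sketched in Python as follows. The function name `pooled_t` is an illustrative assumption; the inputs are the standard deviations, which the function squares to obtain the variances, and the tabular value 1.729 is the one quoted in the solution.

```python
import math


def pooled_t(mean1: float, mean2: float, sd1: float, sd2: float, n1: int, n2: int):
    """Two-sample t statistic assuming equal population variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / std_error, df


t_physics, df_physics = pooled_t(82, 78, 5, 6, 10, 11)
T_TABULAR = 1.729  # tabular t at df = 19, one-tailed, alpha = 0.05 (from the text)
decision = "reject Ho" if abs(t_physics) >= T_TABULAR else "accept Ho"

print(f"t = {t_physics:.2f}, df = {df_physics}, {decision}")
```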
6.1.1.4 t-test on the Significance of the Difference Between Two Correlated
Means
When comparing two correlated means, the t-test is the appropriate
statistic. A typical example is when comparing the results of the pre-test
and post-test administered to a group of individuals. The two tests must be
the same and the given formula should be used.
$$t = \frac{\sum d}{\sqrt{\frac{n \sum d^2 - \left(\sum d\right)^2}{n - 1}}}$$
where d = difference between the pre-test and post-test scores
n = number of samples
Example 1
To determine whether the students’ performance in College Algebra improved
after enrolling in the subject for one term, a 60-item pre-test and post-test were
administered to them on the first and the last days of classes, respectively. The same
test was given as pre-test and post-test.
The results are as follows:

Student    Pre-Test Score    Post-Test Score    d       d²
A          34                45                 -11     121
B          23                32                 -9      81
C          40                46                 -6      36
D          31                57                 -26     676
E          24                39                 -15     225
F          45                48                 -3      9
G          27                27                 0       0
H          32                33                 -1      1
I          12                18                 -6      36
J          45                45                 0       0
                                                ∑d = −77    ∑d² = 1,185
Solution:
1. Ho: There is no significant difference between the pre-test and post-test of the
students’ performance in College Algebra. (𝜇1 = 𝜇2 )
Ha: There is a significant difference between the pre-test and post-test of the
students’ performance in College Algebra. (𝜇1 < 𝜇2 )
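2. 𝛼 = 0.01, one-tailed, left tail
3. Use the t-test as test statistic.
4. Computation:
$$t = \frac{\sum d}{\sqrt{\frac{n \sum d^2 - \left(\sum d\right)^2}{n - 1}}}$$
$$= \frac{-77}{\sqrt{\frac{10(1,185) - (-77)^2}{10 - 1}}}$$
$$= \frac{-77}{\sqrt{\frac{5,921}{9}}}$$
$$= \frac{-77}{25.65}$$
$$= -3.002$$
5. df = n – 1 = 10 – 1 = 9
6. Tabular Value: t = 2.821 (from Appendix)
7. Decision Rule: Reject Ho if |𝑡| ≥ |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟|.
8. Since the computed |𝑡|, which is 3.002, is greater than |𝑡𝑡𝑎𝑏𝑢𝑙𝑎𝑟|, which is 2.821,
reject Ho. Hence, there is a significant difference between the pre-test
and post-test of the students’ performance in College Algebra. It implies that
the performance of the students in Algebra significantly improved.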
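The paired computation can be reproduced from the raw pre-test and post-test scores. This is a minimal sketch; the function name `paired_t` is an illustrative assumption, and the tabular value 2.821 is the one quoted in the solution.

```python
import math

pre_test = [34, 23, 40, 31, 24, 45, 27, 32, 12, 45]
post_test = [45, 32, 46, 57, 39, 48, 27, 33, 18, 45]


def paired_t(before, after):
    """t for correlated means, with d = pre-test score minus post-test score."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(x * x for x in d)
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1, sum_d, sum_d2


t_algebra, df_algebra, sum_d, sum_d2 = paired_t(pre_test, post_test)
T_TABULAR = 2.821  # tabular t at df = 9, as given in the text
decision = "reject Ho" if abs(t_algebra) >= T_TABULAR else "accept Ho"

print(f"sum d = {sum_d}, sum d^2 = {sum_d2}")
print(f"t = {t_algebra:.3f}, df = {df_algebra}, {decision}")
```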
6.1.1.5 z-test on the Significance of the Difference Between Two Independent
Proportions
There are certain situations when the data to be analyzed involve
population proportions or percentages. For instance, a shoe company may
want to know the proportions of defective shoes to be delivered in other
countries. To determine if there is a significant difference between
proportions of two variables, the z-test will be used.
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
where $p_1$ = proportion of first sample
$p_2$ = proportion of second sample
$q_1 = 1 - p_1$
$q_2 = 1 - p_2$
$n_1$ = number of cases in the first sample
$n_2$ = number of cases in the second sample
Example 1
A sample survey of a presidential candidate in the Philippines shows that 120
of 200 male voters dislike candidate X and 175 of 250 female voters dislike the same
candidate. Determine whether the difference between the two sample proportions, 120/200
and 175/250, is significant or not at 1% level of significance.
Solution:
1. Ho: There is no significant difference between the proportion of the male votes
and the proportion of female votes. (𝑝1 = 𝑝2 )
Ha: There is a significant difference between the proportion of the male votes
and the proportion of female votes. (𝑝1 ≠ 𝑝2 )
2. 𝛼 = 0.01, two-tailed
3. Use the z-test as test statistic.
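4. Computation:
Given: $p_1 = \frac{120}{200}$, $p_2 = \frac{175}{250}$
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
$$= \frac{\frac{120}{200} - \frac{175}{250}}{\sqrt{\frac{\left(\frac{120}{200}\right)\left(1 - \frac{120}{200}\right)}{200} + \frac{\left(\frac{175}{250}\right)\left(1 - \frac{175}{250}\right)}{250}}}$$
$$= \frac{-0.1}{\sqrt{\frac{\left(\frac{120}{200}\right)\left(\frac{80}{200}\right)}{200} + \frac{\left(\frac{175}{250}\right)\left(\frac{75}{250}\right)}{250}}}$$
$$= \frac{-0.1}{\sqrt{\frac{0.24}{200} + \frac{0.21}{250}}}$$
$$= \frac{-0.1}{0.045}$$
$$= -2.22$$
5. Tabular Value: z = 2.58
6. Decision Rule: Reject Ho if |𝑧| ≥ |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟|.
7. Since the computed |𝑧|, which is 2.22, is less than |𝑧𝑡𝑎𝑏𝑢𝑙𝑎𝑟|, which is 2.58,
accept Ho. Hence, there is no significant difference between the
proportion of the male votes and the proportion of female votes in their dislike for
candidate X.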
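The proportion test can be checked with a short Python sketch. The function name `two_proportion_z` is an illustrative assumption. Because the code does not round the denominator to 0.045 as the hand computation does, it gives a z of about −2.21 rather than −2.22; the decision is the same either way.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z for the difference between two independent sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


z_votes = two_proportion_z(120, 200, 175, 250)
Z_TABULAR = 2.58  # two-tailed critical value at alpha = 0.01 (Table 2)
decision = "reject Ho" if abs(z_votes) >= Z_TABULAR else "accept Ho"

print(f"z = {z_votes:.2f}, tabular = {Z_TABULAR}, {decision}")
```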
6.1.2 Significance of the Difference Between Variances

6.1.2.1 Analysis of Variance
The analysis of variance (ANOVA) is the appropriate test for determining whether
two or more independent samples differ significantly. It makes use of the F ratio, or
variance ratio, which compares the variance between groups with the variance
within groups. The various groups being compared are assumed to belong
to a population with a normal distribution, each group randomly selected
and independent from the other groups. The variables from each group are also
assumed to have standard deviations that are approximately equal.
Steps in Solving the Analysis of Variance
1. State the null hypothesis.

2. Set the level of significance.
3. Accomplish the ANOVA table.
The ANOVA Table

Source of Variance    Sum of Squares    df             Mean Square        F
Between               SSB               dfB = k – 1    MSB = SSB / dfB    F = MSB / MSW
Within                SSW               dfW = N – k    MSW = SSW / dfW
Total                 TSS               dfT = N – 1

where
$$SSB = \sum \frac{\left(\sum X_{Ai}\right)^2}{n_{Ai}} - \frac{\left(\sum X_i\right)^2}{N}$$
$$TSS = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}$$
$$SSW = TSS - SSB$$
𝑁 = sample size
𝑘 = number of columns
𝑋 = observed value
𝑛 = number of rows
𝐴 = given factor or category
𝑖 = individual observation of cell
4. Find the tabular value of F at the given level of significance (from
Appendix).
5. State the Decision Rule: If the computed value is less than the tabular value,
accept the null hypothesis. If the computed value is greater than the tabular
value, reject the null hypothesis.
6. Interpret the result.
Example 1
Determine who among the three salesmen will most likely be promoted based
on their monthly sales in pesos. Use 5% level of significance.
Sales of Three Candidates for Promotion (A, B, C)
A B C
12,000 15,500 12,800
10,000 12,500 16,000
10,900 12,000 15,000
18,000 13,000 12,700
16,000 14,000 15,000
14,400 15,000 13,000
14,400 12,300 12,000
15,500 15,000 16,000
18,800 19,000 16,000
Solution:
1. Ho: There is no significant difference between the mean sales of the three
candidates for promotion.
Ha: There is a significant difference between the mean sales of the three
candidates for promotion.
2. 𝛼 = 0.05
3. Accomplish the ANOVA Table.
A          B          C          A²               B²               C²
12,000     15,500     12,800     144,000,000      240,250,000      163,840,000
10,000     12,500     16,000     100,000,000      156,250,000      256,000,000
10,900     12,000     15,000     118,810,000      144,000,000      225,000,000
18,000     13,000     12,700     324,000,000      169,000,000      161,290,000
16,000     14,000     15,000     256,000,000      196,000,000      225,000,000
14,400     15,000     13,000     207,360,000      225,000,000      169,000,000
14,400     12,300     12,000     207,360,000      151,290,000      144,000,000
15,500     15,000     16,000     240,250,000      225,000,000      256,000,000
18,800     19,000     16,000     353,440,000      361,000,000      256,000,000
∑A = 130,000    ∑B = 128,300    ∑C = 128,500    ∑A² = 1,951,220,000    ∑B² = 1,867,790,000    ∑C² = 1,856,130,000
3.1 Sum of Squares

Find SSB:
$$SSB = \sum \frac{\left(\sum X_{Ai}\right)^2}{n_{Ai}} - \frac{\left(\sum X_i\right)^2}{N}$$
$$= \frac{\left(\sum A\right)^2 + \left(\sum B\right)^2 + \left(\sum C\right)^2}{n_{Ai}} - \frac{\left(\sum A + \sum B + \sum C\right)^2}{N}$$
$$= \frac{(130,000)^2 + (128,300)^2 + (128,500)^2}{9} - \frac{(130,000 + 128,300 + 128,500)^2}{27}$$
$$= 5,541,460,000 - \frac{(386,800)^2}{27}$$
$$= 5,541,460,000 - 5,541,268,148.15$$
$$SSB = 191,851.85$$
Find TSS:
$$TSS = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}$$
$$= \sum A^2 + \sum B^2 + \sum C^2 - \frac{\left(\sum A + \sum B + \sum C\right)^2}{N}$$
$$= 1,951,220,000 + 1,867,790,000 + 1,856,130,000 - 5,541,268,148.15$$
$$= 5,675,140,000 - 5,541,268,148.15$$
$$= 133,871,851.85$$
Find SSW:
$$SSW = TSS - SSB$$
$$= 133,871,851.85 - 191,851.85$$
$$= 133,680,000$$
3.2. Degrees of Freedom

dfB = 𝑘 – 1 = 3 – 1 = 2
dfW = 𝑁 – 𝑘 = 27 – 3 = 24
dfT = 𝑁 – 1 = 27 – 1 = 26
3.3. Mean Squares
$$MSB = \frac{SSB}{df_B} = \frac{191,851.85}{2} = 95,925.93$$
$$MSW = \frac{SSW}{df_W} = \frac{133,680,000}{24} = 5,570,000$$
3.4. F – Value
$$F = \frac{MSB}{MSW} = \frac{95,925.93}{5,570,000} = 0.0172$$
The ANOVA Table

Source of Variance    Sum of Squares      df    Mean Square      F
Between               191,851.85          2     95,925.93        0.0172
Within                133,680,000         24    5,570,000
Total                 133,871,851.85      26
4. Tabular Value: F = 3.40

5. Decision Rule: If the computed value is less than the tabular value, accept the
null hypothesis. If the computed value is greater than the tabular value, reject
the null hypothesis.
6. Since the computed F-value, which is 0.0172, is less than the tabular value,
which is 3.40, the null hypothesis is accepted. Hence, there is no significant
difference between the mean sales of the three candidates for promotion. It
implies that the three salesmen have almost equal chances of promotion.
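The sums of squares are tedious to compute by hand, so it helps to recompute them directly from the raw sales figures. The following is a minimal sketch of the SSB, TSS, and SSW formulas from this section; the function name `one_way_anova` and the dictionary layout are illustrative assumptions, and the tabular F of 3.40 is taken from the text. Computed this way, the F ratio is well below 3.40.

```python
sales = {
    "A": [12000, 10000, 10900, 18000, 16000, 14400, 14400, 15500, 18800],
    "B": [15500, 12500, 12000, 13000, 14000, 15000, 12300, 15000, 19000],
    "C": [12800, 16000, 15000, 12700, 15000, 13000, 12000, 16000, 16000],
}


def one_way_anova(groups):
    """Build the ANOVA table entries using SSB, TSS and SSW = TSS - SSB."""
    all_values = [x for values in groups.values() for x in values]
    total_n = len(all_values)   # N, the total sample size
    k = len(groups)             # k, the number of columns (groups)
    correction = sum(all_values) ** 2 / total_n
    ssb = sum(sum(v) ** 2 / len(v) for v in groups.values()) - correction
    tss = sum(x * x for x in all_values) - correction
    ssw = tss - ssb
    df_between, df_within = k - 1, total_n - k
    msb, msw = ssb / df_between, ssw / df_within
    return {
        "SSB": ssb, "SSW": ssw, "TSS": tss,
        "dfB": df_between, "dfW": df_within,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


result = one_way_anova(sales)
F_TABULAR = 3.40  # tabular F at df = (2, 24), alpha = 0.05 (from the text)
decision = "reject Ho" if result["F"] > F_TABULAR else "accept Ho"

for name, value in result.items():
    print(f"{name} = {value:,.4f}")
print(decision)
```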
6.2 Correlation and Regression Analysis
Look at these pictures. What do they show?

When we say “healthy students are better students,” are we saying that the
academic performance of a student depends on his health?
If you know the monthly net profit of a company for a period of time, can you
predict its net profit for the coming months?
When one applies for a job, what requirements are needed for submission? Why are
they required? Can a hiring officer predict the kind of worker an applicant will be based
on the submitted requirements?
These are some of the real-life situations that require decision-making and that
will be discussed in this topic. We will learn how to determine whether there is a
relationship between two variables using correlation analysis. We will also learn how
to predict the value of one variable in terms of the other variable using regression
analysis.
Correlation Analysis
Why do most students who excel in English not do well in Mathematics?
Have you ever wondered why some of your friends who are good in Mathematics do
not have high grades in English? Did it occur to you to find out if there exists a
relationship between academic performance in English and achievement in
Mathematics? The statistical procedure that is used to determine whether a relationship
exists between two variables is called correlation analysis.
Correlation
Correlation analysis measures the association or the strength of the
relationship between two variables say, x and y.
The relationship or correlation between two variables may be described in terms of
direction and strength.
The direction of correlation may be positive, negative, or zero.
 Two variables are positively correlated if the values of the two
variables both increase or both decrease.
 Two variables are negatively correlated if the values of one variable
increase while the values of the other decrease.
 Two variables are not correlated, or have zero correlation, if one
variable neither increases nor decreases while the other increases.
The strength of correlation may be perfect, very high, moderately high,
moderately low, very low, and zero. The discussion of the strength is found in
the succeeding box.
Example 1
Suppose a ten-item test in English and a ten-item test in Mathematics were
administered to ten students. The scores of the students are tabulated below. It must be
determined if the scores in Mathematics quiz (here labelled variable x) and the English
quiz (labelled variable y) are correlated or not.
Student    Mathematics Score (x)    English Score (y)
1          4                        5
2          5                        4
3          9                        8
4          2                        3
5          8                        9
6          1                        2
7          2                        1
8          7                        6
9          6                        7
10         4                        5
The scatter graph of the data above is given below. Note that x-axis represents
the scores in Mathematics and y-axis shows the scores in English. Each point in the
graph below is an ordered pair (x, y) corresponding to the score obtained by a student
in the two subjects.
[Scatter plot: English Score (vertical axis) against Mathematics Score (horizontal axis)]
The graph above indicates a direct correlation between variables x and y which
appears to be increasing.
Example 2
Suppose the scores of the students in those two subjects happen to be as follows:
Student    Mathematics Score (x)    English Score (y)
1          9                        3
2          3                        6
3          4                        7
4          7                        4
5          6                        2
6          1                        9
7          2                        8
8          5                        4
9          10                       2
10         2                        10
The scatter graph of the data above looks like this:
[Scatter plot: English Score (vertical axis) against Mathematics Score (horizontal axis)]
This time the trend of the data is decreasing, hence, the variables are negatively
correlated.
Example 3
Suppose the same students have the following scores.
Student    Mathematics Score (x)    English Score (y)
1          9                        8
2          2                        9
3          6                        3
4          3                        7
5          4                        7
6          5                        5
7          3                        6
8          6                        7
9          8                        4
10         2                        2
The scatter graph of the data above looks like this:

[Scatter plot: English Score (vertical axis) against Mathematics Score (horizontal axis)]
The scatter of the data is neither increasing nor decreasing. It represents a zero
correlation.
While a scatter plot may be a convenient way of inspecting correlation between
two variables, it does not offer a measure of the strength of the correlation. Fortunately,
Karl Pearson (1857-1936) developed and perfected a formula that can give a numerical
value to measure the strength of correlation. This formula does not only show how
greatly two data sets are correlated but also reveals if the correlation is direct or inverse,
or if the data sets are not correlated. The formula named after him is called the Pearson
Product-Moment Correlation Coefficient.
6.2.1 Pearson Product-Moment Correlation Coefficient

The most common statistical tool in measuring the linear relationship between
two random variables, x and y, is the linear correlation coefficient commonly called the
Pearson Product-Moment Correlation Coefficient or Pearson r for short. It became the
basis of different theories in the fields of heredity, psychology, anthropometry, and
statistics. It can be used to determine the linearity of the relationship between two
variables. The Pearson r formula is given by
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
Note that the value of r should be interpreted only after it has been found
to be significant. We will use the scale shown below to describe the strength of the
computed r.
Pearson r           Qualitative Description
±1.0                Perfect Correlation/Relationship
±0.75 to ±0.99      Very High Correlation/Relationship
±0.50 to ±0.74      Moderately High Correlation/Relationship
±0.25 to ±0.49      Moderately Low Correlation/Relationship
±0.01 to ±0.24      Very Low Correlation/Relationship
0                   Zero or No Correlation/Relationship
Consider the data in Example 1 of this section. Organize the data as shown in
the table below.

x      y      x²     y²     xy
4      5      16     25     20
5      4      25     16     20
9      8      81     64     72
2      3      4      9      6
8      9      64     81     72
1      2      1      4      2
2      1      4      1      2
7      6      49     36     42
6      7      36     49     42
4      5      16     25     20
∑x = 48    ∑y = 50    ∑x² = 296    ∑y² = 310    ∑xy = 298
Solution:
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
$$= \frac{(10)(298) - (48)(50)}{\sqrt{\left[(10)(296) - (48)^2\right]\left[(10)(310) - (50)^2\right]}}$$
$$= \frac{580}{\sqrt{(656)(600)}}$$
$$= \frac{580}{627.3755}$$
$$= 0.92$$
This result is in conformity with the scatter plot in Example 1 of this section.
The computed r is almost 1; hence, it has a very high positive correlation. This is the
reason why the scatter plot in Example 1 of this section is increasing from left to right.
Using the data in Example 2 of this section, we have the following
computations.
x      y      x²     y²     xy
9      3      81     9      27
3      6      9      36     18
4      7      16     49     28
7      4      49     16     28
6      2      36     4      12
1      9      1      81     9
2      8      4      64     16
5      4      25     16     20
10     2      100    4      20
2      10     4      100    20
∑x = 49    ∑y = 55    ∑x² = 325    ∑y² = 379    ∑xy = 198
Solution:
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
$$= \frac{(10)(198) - (49)(55)}{\sqrt{\left[(10)(325) - (49)^2\right]\left[(10)(379) - (55)^2\right]}}$$
$$= \frac{-715}{\sqrt{(849)(765)}}$$
$$= \frac{-715}{805.906322}$$
$$= -0.89$$
The computed r is −0.89; hence, it has a very high negative correlation. This is the reason
why the scatter plot in Example 2 of this section is decreasing from left to right.
We now compute the r of the data on two non-correlated variables in Example
3 of this section.
x      y      x²     y²     xy
9      8      81     64     72
2      9      4      81     18
6      3      36     9      18
3      7      9      49     21
4      7      16     49     28
5      5      25     25     25
3      6      9      36     18
6      7      36     49     42
8      4      64     16     32
2      2      4      4      4
∑x = 48    ∑y = 58    ∑x² = 284    ∑y² = 382    ∑xy = 278
Solution:
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
$$= \frac{(10)(278) - (48)(58)}{\sqrt{\left[(10)(284) - (48)^2\right]\left[(10)(382) - (58)^2\right]}}$$
$$= \frac{-4}{\sqrt{(536)(456)}}$$
$$= \frac{-4}{494.38}$$
$$= -0.01$$
Since the computed r is almost zero, then it has little or zero linear correlation.
This conforms with the scatter plot in Example 3 in this section. The graph is neither
increasing nor decreasing and therefore the two sets of data are not correlated.
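The three computations above all apply the same formula, so they can be checked together. This is a minimal sketch of the Pearson r formula from this section; the function name `pearson_r` and the variable names are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson r using the raw-score (sums) formula given in this section."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# (Mathematics scores x, English scores y) for Examples 1 to 3
example1 = ([4, 5, 9, 2, 8, 1, 2, 7, 6, 4], [5, 4, 8, 3, 9, 2, 1, 6, 7, 5])
example2 = ([9, 3, 4, 7, 6, 1, 2, 5, 10, 2], [3, 6, 7, 4, 2, 9, 8, 4, 2, 10])
example3 = ([9, 2, 6, 3, 4, 5, 3, 6, 8, 2], [8, 9, 3, 7, 7, 5, 6, 7, 4, 2])

for label, (x, y) in [("Example 1", example1), ("Example 2", example2), ("Example 3", example3)]:
    print(f"{label}: r = {pearson_r(x, y):.2f}")
```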
Example 4
Test the hypothesis that there is no significant relationship between mental
ability and English proficiency at 5% level of significance.
Mental Ability (x) English Proficiency (y)
50 200
54 198
50 200
51 203
49 186
46 205
48 185
47 197
44 183
44 171
46 179
45 185
48 184
53 190
54 191
33 170
34 168
Solution:
1. Ho: There is no significant relationship between mental ability and English
proficiency.
Ha: There is a significant relationship between mental ability and English
proficiency.
2. 𝛼 = 5% or 0.05
3. Pearson r will be used to test the hypothesis.
4. Computation
Mental Ability (x)   English Proficiency (y)   x²       y²        xy
50                   200                       2,500    40,000    10,000
54                   198                       2,916    39,204    10,692
50                   200                       2,500    40,000    10,000
51                   203                       2,601    41,209    10,353
49                   186                       2,401    34,596     9,114
46                   205                       2,116    42,025     9,430
48                   185                       2,304    34,225     8,880
47                   197                       2,209    38,809     9,259
44                   183                       1,936    33,489     8,052
44                   171                       1,936    29,241     7,524
46                   179                       2,116    32,041     8,234
45                   185                       2,025    34,225     8,325
48                   184                       2,304    33,856     8,832
53                   190                       2,809    36,100    10,070
54                   191                       2,916    36,481    10,314
33                   170                       1,089    28,900     5,610
34                   168                       1,156    28,224     5,712
∑x = 796   ∑y = 3,195   ∑x² = 37,834   ∑y² = 602,625   ∑xy = 150,401
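$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \\
  &= \frac{(17)(150{,}401) - (796)(3{,}195)}{\sqrt{\left[(17)(37{,}834) - (796)^2\right]\left[(17)(602{,}625) - (3{,}195)^2\right]}} \\
  &= \frac{13{,}597}{\sqrt{(9{,}562)(36{,}600)}} \\
  &= \frac{13{,}597}{18{,}707.46375} \\
  &= 0.73
\end{aligned}
$$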
5. df = N – 2 = 17 – 2 = 15
6. Tabular Value: r = 0.482 (from Appendix)
7. Decision Rule: If the computed value is less than or equal to the tabular value, accept
the null hypothesis. If the computed value is greater than the tabular value, reject the
null hypothesis.
8. Since the computed r (0.73) is greater than the tabular value (0.482), reject
the null hypothesis. Hence, there is a significant relationship between mental
ability and English proficiency. The result shows a moderately high positive
relationship between the two variables.
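The eight steps of Example 4 can be sketched in code. This is an illustrative sketch only: the function names `pearson_r_from_sums` and `is_significant` are assumptions of this example, the column totals are those of the computation table, and the tabular value 0.482 is simply the figure quoted above from the Appendix, not something the code derives. Comparing the absolute value of r against the tabular value is also an assumption, made so that the same rule handles negative coefficients.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r computed directly from the column totals of the table."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def is_significant(r, tabular_value):
    """Reject Ho when |r| exceeds the tabular value (assumed decision rule)."""
    return abs(r) > tabular_value

# Column totals from the Example 4 computation table.
n = 17
r = pearson_r_from_sums(n, 796, 3195, 37834, 602625, 150401)

df = n - 2             # degrees of freedom, as in step 5
tabular_value = 0.482  # quoted in the text for df = 15 at the 5% level

reject_null = is_significant(r, tabular_value)
print(round(r, 2), df, reject_null)  # 0.73 15 True
```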
6.2.2 Regression Analysis
Regression analysis is used to predict the behavior of a variable. The
regression equation explains the amount of variation observable in the dependent
variable y in terms of the independent variable x. It is the equation of a straight line
of the form:
y = bx + a
where y = criterion measure (the variable being predicted)
x = predictor
a = the y-intercept, the point where the regression line crosses the y-axis
b = the slope of the line.
To get the regression equation, compute the values of a and b using the
formulas below.
$$
a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}
\qquad \text{and} \qquad
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
$$
where n = number of pairs
Example 1
The data in the table represent the membership at a university Mathematics club
during the past 5 years. Find the regression equation to predict the membership 5 years
from now.
Number of Years (x) Membership (y)
1 25
2 30
3 32
4 45
5 50
Solution:
Number of Years (x)   Membership (y)   x²    xy
1                     25                1     25
2                     30                4     60
3                     32                9     96
4                     45               16    180
5                     50               25    250
∑x = 15   ∑y = 182   ∑x² = 55   ∑xy = 611
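Find a:
$$
a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}
  = \frac{182(55) - 15(611)}{5(55) - (15)^2}
  = \frac{845}{50}
  = 16.9
$$
Find b:
$$
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
  = \frac{5(611) - 15(182)}{5(55) - (15)^2}
  = \frac{325}{50}
  = 6.5
$$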
Substitute the values of a and b in the equation y = bx + a.

y = 6.5x + 16.9
Since you need to predict the membership five years from now, or at year 10, substitute
10 for x in the equation.
y = 6.5(10) + 16.9
= 81.9
≈ 82
Thus, five years from now, the Mathematics club would have 82 members.
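The same intercept, slope, and prediction can be reproduced programmatically. The sketch below is an illustration under stated assumptions: `regression_coefficients` is a hypothetical helper name, and both coefficients share the denominator n∑x² − (∑x)², as in the worked computation above.

```python
def regression_coefficients(x, y):
    """Return (a, b) for the line y = b*x + a using the summation formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(p * q for p, q in zip(x, y))
    denominator = n * sum_x2 - sum_x ** 2   # shared by a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# Membership data from Example 1.
years = [1, 2, 3, 4, 5]
members = [25, 30, 32, 45, 50]

a, b = regression_coefficients(years, members)
predicted = b * 10 + a   # year 10 is five years after the last observation
print(a, b, round(predicted))  # 16.9 6.5 82
```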
Example 2
The following data pertain to the heights, in inches, of fathers and their eldest
sons. If there is a significant relationship between the two variables, predict the height of
the son if the height of his father is 78 inches. Use the 5% level of significance.
Height of the Father (x)   Height of the Son (y)
71 71
69 69
69 71
65 68
66 68
63 66
68 70
70 72
60 65
58 60
Solution:
1. Ho: There is no significant relationship between the heights of fathers and their
eldest sons.
Ha: There is a significant relationship between the heights of fathers and their
eldest sons.
2. 𝛼 = 5% or 0.05
3. Pearson r will be used to test the hypothesis.
4. Computation
Height of the Father (x)   Height of the Son (y)   x²      y²      xy
71                         71                      5041    5041    5041
69                         69                      4761    4761    4761
69                         71                      4761    5041    4899
65                         68                      4225    4624    4420
66                         68                      4356    4624    4488
63                         66                      3969    4356    4158
68                         70                      4624    4900    4760
70                         72                      4900    5184    5040
60                         65                      3600    4225    3900
58                         60                      3364    3600    3480
∑x = 659   ∑y = 680   ∑x² = 43,601   ∑y² = 46,356   ∑xy = 44,947
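$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \\
  &= \frac{(10)(44{,}947) - (659)(680)}{\sqrt{\left[(10)(43{,}601) - (659)^2\right]\left[(10)(46{,}356) - (680)^2\right]}} \\
  &= \frac{1{,}350}{\sqrt{(1{,}729)(1{,}160)}} \\
  &= \frac{1{,}350}{1{,}416.2062} \\
  &= 0.95
\end{aligned}
$$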
5. df = N – 2 = 10 – 2 = 8
6. Tabular Value: r = 0.632 (from Appendix)
7. Decision Rule: If the computed value is greater than the tabular value, reject the
null hypothesis.
8. Since the computed r (0.95) is greater than the tabular value (0.632), reject
the null hypothesis. Hence, there is a significant relationship between the heights of
fathers and their eldest sons. The result shows a very high positive relationship
between the two variables.
We can now proceed to regression analysis, since there is a significant
relationship between the heights of fathers and their eldest sons.
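Find a:
$$
a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - \left(\sum x\right)^2}
  = \frac{680(43{,}601) - 659(44{,}947)}{(10)(43{,}601) - (659)^2}
  = \frac{28{,}607}{1{,}729}
  = 16.55
$$
Find b:
$$
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
  = \frac{10(44{,}947) - 659(680)}{(10)(43{,}601) - (659)^2}
  = \frac{1{,}350}{1{,}729}
  = 0.78
$$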
Substitute the values of a and b in the equation y = bx + a.

y = 0.78x + 16.55
Since you need to predict the height of the son if the height of the father is 78 inches,
substitute 78 for x in the equation.
y = 0.78(78) + 16.55
= 77.39
≈ 77 inches
Thus, the predicted height of the son whose father’s height is 78 inches is 77 inches.
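The worked solution rounds a and b to two decimal places before substituting x = 78. The short sketch below, which assumes only the column totals of the father and son table (the variable names are hypothetical), compares that prediction with the one obtained from the unrounded coefficients. Both round to 77 inches.

```python
# Column totals from the father/son computation table.
n = 10
sum_x, sum_y = 659, 680
sum_x2, sum_xy = 43601, 44947

denominator = n * sum_x2 - sum_x ** 2                 # 1729
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # intercept, about 16.55
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope, about 0.78

father_height = 78
rounded_prediction = round(b, 2) * father_height + round(a, 2)
unrounded_prediction = b * father_height + a

print(round(rounded_prediction, 2))    # 77.39
print(round(unrounded_prediction, 2))  # 77.45
```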
6.2.3 Spearman’s Rank Correlation Coefficient Spearman rho (𝝆)

Beauty contests are very popular not only among Filipinos but also among many
people around the world. Normally, when the names of the five finalists are announced,
people place their own bets on who will be the queen and the runners-up. Very often,
they are happy about the results. This happens when their ranks agree with the ranks
assigned by the board of judges. There might be some slight differences between the
ranks assigned by the people and those assigned by the board of judges, but if, overall,
there is a positive correlation (or agreement) between these ranks, then everyone will be
happy about the results.
In this next statistical measure, we shall be concerned with the correlation between
ranks. As in simple correlation, we have cases of positive correlation, zero correlation,
and negative correlation. A positive rank correlation indicates that the categories that
are given high ranks by one judge (or rater) are also the categories that are assigned
high ranks by the other rater, and those with low ranks in one also have low ranks in the
other. A negative rank correlation is the reverse. It means that the categories that
were assigned high ranks by the first rater are given low ranks by the second rater, or
vice versa.
The most common method used in rank correlation is the statistic developed
by Spearman, whose coefficient is symbolized by 𝜌 (rho, the Greek letter for r). To
compute 𝜌, we use the formula:
$$
\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
$$
where d = difference between ranks
n = number of categories given ranks.
In interpreting the computed 𝜌, we use the same qualitative interpretation as the
one we use in interpreting Pearson r.
Example 1
In a contest for Mr. Campus Personality, two judges gave their ratings to 8
candidates. Transform the ratings to ranks and compute the coefficient of rank
correlation. Interpret the result.
Candidate Judge 1 Judge 2
1 98 94
2 97 97
3 95 98
4 90 95
5 89 92
6 88 90
7 85 89
8 85 85
Solution:
Candidate   Judge 1 (x)   Judge 2 (y)   Rx     Ry    d      d²
1           98            94            1      4     -3     9
2           97            97            2      2     0      0
3           95            98            3      1     2      4
4           90            95            4      3     1      1
5           89            92            5      5     0      0
6           88            90            6      6     0      0
7           85            89            7.5    7     0.5    0.25
8           85            85            7.5    8     -0.5   0.25
∑d² = 14.5
$$
\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(14.5)}{8(8^2 - 1)} = 0.83
$$
Interpretation: The computed 𝜌 (0.83) indicates a “very high positive correlation”
between the ranks. This means that those candidates who received high ranks from the
first judge also received high ranks from the second judge. Similarly, those candidates
who were ranked low by the first judge were also ranked low by the other judge. This
means that the rankings of the two judges have a very high degree of agreement. It also
implies that, as to the selection of Mr. Campus Personality, the two judges have more or
less the same taste.
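The two stages of this example, turning ratings into ranks and then applying the formula for 𝜌, can be sketched as follows. The helper names `average_ranks` and `spearman_rho` are assumptions made for illustration. Tied ratings receive the mean of the positions they occupy, which is how candidates 7 and 8 both obtain rank 7.5 under Judge 1.

```python
def average_ranks(scores):
    """Rank from highest (rank 1) to lowest; tied scores share the mean rank."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1            # first position held by s (1-based)
        count = ordered.count(s)                # how many entries are tied at s
        ranks.append(first + (count - 1) / 2)   # mean of the tied positions
    return ranks

def spearman_rho(x, y):
    """Spearman rho from the rank differences d, using 1 - 6*sum(d^2)/(n(n^2-1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ratings from Example 1.
judge_1 = [98, 97, 95, 90, 89, 88, 85, 85]
judge_2 = [94, 97, 98, 95, 92, 90, 89, 85]

print(average_ranks(judge_1))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.5, 7.5]
rho = spearman_rho(judge_1, judge_2)
print(round(rho, 2))  # 0.83
```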
Example 2
Ten instructors were rated by third- and fourth-year students on their “mastery
of subject matter” and the results were tabulated. What is the Spearman rho value for
the data? At the 5% level of significance, determine if there is a significant relationship
in the scores obtained by the teachers.
Instructor 3rd Year (x) 4th Year (y)
1 44 46
2 45 43
3 38 40
4 32 30
5 46 39
6 47 37
7 37 44
8 35 46
9 27 48
10 40 50
Solution:
1. Ho: There is no significant relationship between the ratings given to the ten
instructors by third- and fourth-year students.
Ha: There is a significant relationship between the ratings given to the ten
instructors by third- and fourth-year students.
2. 𝛼 = 5% or 0.05
3. Spearman rho (𝜌) will be used to test the hypothesis.
4. Computation
Instructor   3rd Year (x)   4th Year (y)   Rx    Ry     d      d²
1            44             46             4     3.5    0.5    0.25
2            45             43             3     6      -3     9
3            38             40             6     7      -1     1
4            32             30             9     10     -1     1
5            46             39             2     8      -6     36
6            47             37             1     9      -8     64
7            37             44             7     5      2      4
8            35             46             8     3.5    4.5    20.25
9            27             48             10    2      8      64
10           40             50             5     1      4      16
∑d² = 215.5
$$
\begin{aligned}
\rho &= 1 - \frac{6\sum d^2}{n(n^2 - 1)} \\
     &= 1 - \frac{6(215.5)}{10(10^2 - 1)} \\
     &= 1 - 1.31 \\
     &= -0.31
\end{aligned}
$$
5. df = N – 2 = 10 – 2 = 8
6. Tabular Value: 𝜌 = 0.643 (from Appendix)
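7. Decision Rule: If the absolute value of the computed value is greater than the tabular
value, reject the null hypothesis.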
8. Since the absolute value of the computed 𝜌 (0.31) is less than the tabular value
(0.643), the null hypothesis is accepted. Hence, there is no significant
relationship between the ratings given to the ten instructors by third- and fourth-
year students. It implies that the ratings of the third- and fourth-year students
do not significantly agree.
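The computation and decision in this example can be checked from the rank columns alone. The sketch below is illustrative: the variable names are assumptions, the ranks are copied from the Rx and Ry columns of the table, and the tabular value 0.643 is the figure quoted in the text from the Appendix.

```python
# Ranks copied from the Rx and Ry columns of the Example 2 table.
rank_x = [4, 3, 6, 9, 2, 1, 7, 8, 10, 5]
rank_y = [3.5, 6, 7, 10, 8, 9, 5, 3.5, 2, 1]

n = len(rank_x)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

tabular_value = 0.643                    # quoted in the text, not computed here
reject_null = abs(rho) > tabular_value   # compare the absolute value, as in step 8

print(sum_d2, round(rho, 2), reject_null)  # 215.5 -0.31 False
```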
References/Additional Resources/Readings
Aufmann, R. et al. (2018). Mathematical Excursions 4th Edition. www.cenage.com/students/MINDTAP
Baltazar, E. C. et al. (2013). Mathematics in the Modern World. Quezon City: C&E
Publishing, Inc.
Nocon, R.C. & Nocon, E.G. (2018). Essential Mathematics for the Modern World. Quezon
City: C&E Publishing, Inc.
Quintos, R.T. et al. (2018). Mathematics in the Modern World. St. Andrew Publishing
House
Activity Sheet 19
Name: __________________________________________ Date: ________________

Year & Section: ___________________________________ Score: _______________
Direction: Solve the following problems.
1. A researcher administered a developed problem-solving test to 50 randomly selected
Grade 6 pupils. In this sample, 𝑥̅ = 80 and s = 10. The mean 𝜇 and the standard deviation
of the population used in the standardization of the test were 75 and 15,
respectively. Use the 95% confidence level to answer the following questions:
a. Does the sample mean differ significantly from the population mean?
b. Can it be said that the sample mean is above average?
2. The owner of a factory that sells a particular bottled fruit juice claims that the
average capacity of their product is 250 ml. To test the claim, a consumer group
gets a sample of 100 such bottles, calculates the capacity of each bottle, and
then finds the mean capacity to be 248 ml. The standard deviation is 5 ml. Is the
claim true?
3. In a plant nursery, the owner thinks that the seedlings in a box sprayed
with a new kind of fertilizer have an average height of 26 cm after three days and
a standard deviation of 10 cm. One researcher randomly selected 80 such
seedlings and calculated the mean height to be 20 cm and the standard deviation
to be 10 cm. Will you conduct a one-tailed test or a two-tailed test? Proceed with
the test using 𝛼 = 0.05.
Activity Sheet 20
Name: __________________________________________ Date: ________________


1. Drinking water has become an important concern among people. The quality of
drinking water must be monitored as often as possible during the day for
possible contamination. Another variable of concern is the pH: a pH below 7.0 is
acidic, a pH above 7.0 is alkaline, and a pH of 7.0 is neutral. A water-treatment
plant has a target pH of 8.0. Based on 16 random water samples, the mean and
the standard deviation were found to be 7.6 and 0.4, respectively. Does the
sample mean provide enough evidence that it differs significantly from the
target mean? Use 𝛼 = 0.05, two-tailed test.
2. The following sample of eight measurements was randomly selected from a
normally distributed population: 12, 10, 9, 8, 15, 10, 11, and 13. Test for
significant difference between the sample mean and the population mean of 10.
Use 𝛼 = 0.05.
3. An experimental study was conducted by a researcher to determine if a new time
slot has an effect on the performance of pupils in Mathematics. Fifteen
randomly selected learners participated in the study. Toward the end of the
investigation, a standardized assessment was conducted. The sample mean was
85 and the standard deviation was 3. In the standardization of the test, the mean
was 75 and the standard deviation was 10. Based on the evidence at hand, is the
new time slot effective? Use 𝛼 = 0.05.
Activity Sheet 21
Name: __________________________________________ Date: ________________


1. An investigator thinks that people under the age of forty have vocabularies that
are different than those of people over sixty years of age. The investigator
administers a vocabulary test to a group of 31 younger subjects and to a group
of 31 older subjects. Higher scores reflect better performance. The mean score
for younger subjects was 14.0 and the standard deviation of younger subjects'
scores was 5.0. The mean score for older subjects was 20.0 and the standard
deviation of older subjects' scores was 6.0. Does this experiment provide
evidence for the investigator's theory?
2. An investigator predicts that dog owners in the country spend more time
walking their dogs than do dog owners in the city. The investigator gets a sample
of 21 country owners and 23 city owners. The mean number of hours per week
that city owners spend walking their dogs is 10.0. The standard deviation of
hours spent walking the dog by city owners is 3.0. The mean number of hours
country owners spent walking their dogs per week was 15.0. The standard
deviation of the number of hours spent walking the dog by owners in the country
was 4.0. Do dog owners in the country spend more time walking their dogs than
do dog owners in the city?
3. An investigator theorizes that people who participate in a regular program of
exercise will have levels of systolic blood pressure that are significantly
different from that of people who do not participate in a regular program of
exercise. To test this idea, the investigator randomly assigns 21 subjects to an
exercise program for 10 weeks and 21 subjects to a non-exercise comparison
group. After ten weeks the mean systolic blood pressure of subjects in the
exercise group is 137 and the standard deviation of blood pressure values in the
exercise group is 10. After ten weeks, the mean systolic blood pressure of
subjects in the non-exercise group is 127 and the standard deviation on subjects
in the non-exercise group is 9.0. Please test the investigator's theory using an
alpha level of 0.05.
Activity Sheet 22
Name: __________________________________________ Date: ________________


1. Suppose we were interested in determining whether two types of music, A and
B, differ with respect to their effects on sensory-motor coordination. We test
some subjects in the presence of Type-A music and other subjects in the
presence of Type-B music. With the design for correlated samples, we test all
subjects in both conditions and focus on the difference between the two
measures for each subject. To obviate the potential effects of practice and test
sequence in this case, we would also want to arrange that half the subjects are
tested first in the Type-A condition, then later in the Type-B condition, and vice
versa for the other half.
Student A B
1 10.2 13.2
2 8.4 7.4
3 17.8 16.6
4 25.2 27.0
5 23.8 27.5
6 25.7 26.6
7 16.2 18.0
8 21.5 23.4
9 21.1 23.4
10 16.9 21.1
11 24.6 23.8
12 20.4 20.2
13 25.8 29.1
14 17.1 17.7
15 14.4 19.2
2. Consider an experimenter interested in whether the time it takes to respond to a
visual signal is different from the time it takes to respond to an auditory signal. Ten
subjects are tested with both the visual signal and with the auditory signal. (To
avoid confounding with practice effects, half are in the auditory condition first and
the other half are in the visual task first.) The reaction times (in milliseconds) of the
ten subjects in the two conditions are shown below.
Subject   Visual   Auditory
1         420      380
2         235      230
3         280      300
4         360      260
5         305      295
6         215      190
7         200      200
8         460      410
9         345      330
10        375      380
Activity Sheet 23
Name: __________________________________________ Date: ________________


1. Two types of medication for hives are being tested to determine if there is a
difference in the proportions of adult patient reactions. Twenty out of a random
sample of 200 adults given medication A still had hives 30 minutes after taking
the medication. Twelve out of another random sample of 200 adults given
medication B still had hives 30 minutes after taking the medication. Test at a
5% level of significance.
2. A research study was conducted about gender differences in “sexting.” The
researcher believed that the proportion of girls involved in “sexting” is less than
the proportion of boys involved. The data collected in the spring of 2010 among
a random sample of middle and high school students in a large school district in
the southern United States is summarized in the table. Is the proportion of girls
sending sexts less than the proportion of boys “sexting?” Test at a 5% level of
significance.
Males Females
Sent “sexts” 183 156
Total number surveyed 2231 2169
3. Researchers conducted a study of smartphone use among adults. A cell phone
company claimed that iPhone smartphones are more popular with whites (non-
Hispanic) than with African-Americans. The results of the survey indicate that
of the 232 African-American cell phone owners randomly sampled, 5% have an
iPhone. Of the 1,343 white cell phone owners randomly sampled, 10% own an
iPhone. Test at the 5% level of significance. Is the proportion of white iPhone
owners greater than the proportion of African-American iPhone owners?
Activity Sheet 24
Name: __________________________________________ Date: ________________

1. Zelazo et al. (1972) investigated the variability in age at first walking in infants.
Study infants were grouped into four groups, according to reinforcement of
walking and placement: (1) active; (2) passive; (3) no exercise; and (4) 8-week
control. Sample sizes were 6 per group, for a total of n = 24. For each infant,
study data included group assignment and age at first walking, in months.
The following data are the recorded values of age (months) by group:
Active Group   Passive Group   No-Exercise Group   8-Week Control
9.00 11.00 11.50 13.25
9.50 10.00 12.00 11.50
9.75 10.00 9.00 12.00
10.00 11.75 11.50 13.50
13.00 10.50 13.25 11.50
9.50 15.00 13.00 12.35
2. Four brands of flashlight batteries are to be compared by testing each brand in
five flashlights. Twenty flashlights are randomly selected and divided randomly
into four groups of five flashlights each. Then each group of flashlights uses a
different brand of battery. The lifetimes of the batteries, to the nearest hour, are
as follows:
Brand A Brand B Brand C Brand D
42 28 24 20
30 36 36 32
39 21 28 38
28 32 28 28
29 27 33 25
Preliminary data analyses indicate that the independent samples come from
normal populations with equal standard deviations. At the 5% significance
level, does there appear to be a difference in mean lifetime among the four
brands of batteries?
3. The times required by three workers to perform an assembly-line task were
recorded on five randomly selected occasions. Here are the times, to the nearest
minute.
Hank Joseph Susan
8 8 10
10 9 9
9 9 10
11 8 11
10 10 9
Activity Sheet 25
Name: __________________________________________ Date: ________________

1. Below are the data for six participants giving their number of years in college
(X) and their subsequent yearly income (Y). Income here is in thousands of
pesos, but this fact does not require any changes in our computations. Test
whether there is a relationship with Alpha = .05.
No. of Years of College (x)   Income (y)
0 15
1 15
3 20
4 25
4 30
6 35
2. Yvonne is a good student, but at times she doesn’t get enough sleep. She
hypothesizes that when she gets more sleep she does better on tests. To test her
hypothesis, she tracked how she did on a number of tests, based on how many
hours of sleep she got on the previous night. She inputs the following data into
her Excel file to compute the correlation coefficient.
Hours of Sleep (x)   Test Score (y)
8 81
8 80
6 75
5 65
7 91
6 80
Activity Sheet 26
Name: __________________________________________ Date: ________________

1. The scores for nine students in history and algebra are as follows:
History Algebra
34 31
25 32
16 46
9 23
40 9
7 48
28 31
9 4
Compute the Spearman rank correlation.
2. The left side of Figure 1 displays the association between the IQ of each
adolescent in a sample with the number of hours they listen to rock music per
month. Determine the strength of the correlation between IQ and rock music
using both the Pearson’s correlation coefficient and Spearman’s rank
correlation. Compare the results.
IQ Rock
99 2
120 0
98 25
102 45
123 14
105 20
85 15
110 19
117 22
90 4
Learner’s Feedback Form
Name of Student: ___________________________________________________

Program : ___________________________________________________
Year Level : ______________________Section: ______________________
Faculty : ___________________________________________________
Schedule : ___________________________________________________
Learning Module: ________ Number: _________ Title : ______________________
How do you feel about the topic or concept presented?

□ I completely get it. □ I’m struggling.
□ I’ve almost got it. □ I’m lost.
In what particular portion of this learning packet do you feel that you are struggling or
lost?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
Did you raise your concern to your instructor? □ Yes □ No
If Yes, what did he/she do to help you?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
If No, state your reason.

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
To further improve this learning packet, what part do you think should be enhanced?
_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
How do you want it to be enhanced?

_____________________________________________________________________
_____________________________________________________________________
_____________________________________________________________________
NOTE: This is an essential part of the course module. This must be submitted to the subject
teacher (within the 1st week of the class).
