
Mathematics in the Modern World

Chapter 6: The Statistical Tools

Introduction
Statistics involves the collection, organization, summarization, presentation, and interpretation of data. It has two branches: descriptive statistics and inferential statistics. Descriptive statistics is the term given to the analysis of data that helps describe, show, or summarize data in a meaningful way. When using descriptive statistics, it is useful to summarize a group of data using a combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts), and statistical commentary (i.e., a discussion of the results). The branch that allows us to make predictions (“inferences”) from the data is called inferential statistics. Inferential statistics takes data from samples and makes generalizations about a population.
For instance, you might stand in a mall and ask a sample of 100 people if they like shopping at SM. You could make a bar chart of the yes or no answers (that would be descriptive statistics), or you could use your research (and inferential statistics) to reason that around 75-80% of the population (all shoppers in all malls) like shopping at SM.
Testing the significance of the difference between two means, two standard deviations, two proportions, or two percentages is an important area of inferential statistics. Comparison between two or more variables often arises in research or experiments, and to be able to make valid conclusions regarding the results of the study, one has to apply an appropriate test statistic. This chapter discusses the different test statistics that are commonly used in research studies under inferential statistics.

Learning Objectives
At the end of this chapter, the student is expected to:
• apply a variety of statistical tools to process and manage numerical data;
• use the methods of linear regression and correlation to predict the value of a variable given certain conditions; and
• recognize the importance of testing of hypotheses in making decisions.

Duration
Topic 1: Testing of Hypothesis = 6 hours
Topic 2: Correlation and Regression Analysis = 3 hours

Lesson Proper

6.1 Hypothesis Testing
In Statistics, decision-making starts with a concern about a population regarding its characteristics, denoted by parameter values. We might be interested in a population parameter like the mean or the proportion. For instance, suppose you are deciding to put up a business selling cars. Your first course of action before spending money on the business is to know which car sells the most these days. Before you open a business selling Toyota, Mitsubishi, Hyundai, Honda, Nissan, or Suzuki, you need to gather information on which among these gets the most sales. How many existing distributors of these cars are out there? Do you want to compete? To answer these questions, you need to gather data. What type of data? And where will you get them? You simply need to do a survey. These concerns can be addressed by a procedure in Statistics called hypothesis testing.

Hypothesis
A hypothesis is a conjecture or statement which aims to explain certain phenomena in the real world. Many hypotheses, statistical or not, are products of man’s curiosity. To seek the answers to his questions, he tries to find and present evidence, then tests the resulting hypothesis using statistical tools and analysis. In statistical analysis, the hypothesis is either accepted or rejected within a certain critical interval.
The hypothesis that is subjected to testing to determine whether its truth can be accepted or rejected is the null hypothesis, denoted by Ho. This hypothesis states that there is no significant relationship or no significant difference between two or more variables, or that one variable does not affect another variable. In statistical research, the hypotheses should be written in null form. For example, suppose you want to know whether method A is more effective than method B in teaching high school mathematics. The null hypothesis for this study will be: “There is no significant difference between the effectiveness of method A and method B.”
Another type of hypothesis is the alternative hypothesis, denoted by Ha. This is the hypothesis that challenges the null hypothesis. The alternative hypothesis for the example above can be: “There is a significant difference between the effectiveness of method A and method B,” or “Method A is more effective than method B,” or “Method A is less effective than method B,” depending on whether the type of test is one-tailed or two-tailed. These will be discussed in the succeeding lessons.

Significance Level
To test the null hypothesis of no significant difference between the two methods in the above example, one must first set the level of significance. This is the probability of committing a Type I error and is denoted by the symbol α. A Type I error is committed when the null hypothesis is rejected (and the alternative hypothesis accepted) when, in fact, the null hypothesis is true. Accepting the null hypothesis when, in fact, it is false is called a Type II error, and its probability is denoted by the symbol β. The most common level of significance is 5%.
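
As a rough numerical check on what α means, the sketch below computes the area of the rejection region under the standard normal curve for a given critical value. The helper names (normal_cdf, two_tailed_alpha, one_tailed_alpha) are illustrative and not part of the chapter; 1.96 and 1.645 are the usual 5% critical z-values.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_tailed_alpha(z_crit: float) -> float:
    """Probability of a Type I error when rejecting for |z| >= z_crit."""
    return 2.0 * (1.0 - normal_cdf(abs(z_crit)))


def one_tailed_alpha(z_crit: float) -> float:
    """Probability of a Type I error when rejecting in a single tail."""
    return 1.0 - normal_cdf(abs(z_crit))


print(round(two_tailed_alpha(1.96), 4))   # 0.05
print(round(one_tailed_alpha(1.645), 4))  # 0.05
```

Both calls return about 0.05, which is why these critical values go with a 5% level of significance.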

If the null hypothesis is true and accepted, or if it is false and rejected, the decision is correct. If the null hypothesis is true and rejected, the decision is incorrect and this is a Type I error. If the null hypothesis is false and accepted, the decision is incorrect and this is a Type II error. For instance, Sarah insists that she is 31 years old when, in fact, she is 35 years old. What error is Sarah committing? Sarah is rejecting the truth. She is committing a Type I error. As another example, a man plans to go hunting the Philippine monkey-eating eagle, believing that it is a proof of his mettle. What type of error is this? Hunting the Philippine eagle is prohibited by law. Thus, it is not a good sport, and the man is accepting a belief that is false. It is a Type II error. Since hunting the Philippine monkey-eating eagle is against the law, the man may find himself in jail if he goes out of his way hunting endangered species.
In the decisions that we make, we form conclusions, and these conclusions are the bases of our actions. But this is not always the case in Statistics because we make decisions based on sample information. The best that we can do is to control the probability with which an error occurs. This is the reason why we assign small probability values to each of them.

One-Tailed and Two-Tailed Tests
A test is called a one-tailed test if the rejection region lies on one extreme side of the distribution, and two-tailed if the rejection region is located on both ends of the distribution.
In Figure 1.A (two-tailed), the rejection region consists of the areas to the extreme left and right of the curve marked by the two vertical lines. In Figures 1.B and 1.C (both one-tailed), the rejection region is the area to the left (left tail) and to the right (right tail) of the vertical line under the bell curve, respectively.

Steps in Testing Hypothesis
Below are the steps when testing the truth of a hypothesis.
1. Formulate the null hypothesis. Denote it as Ho and the alternative hypothesis as Ha.
2. Set the desired level of significance (α).
3. Determine the appropriate test statistic to be used in testing the null hypothesis.
4. Compute the value of the statistic to be used.
5. Compute the degrees of freedom.
6. Find the tabular value using the table of values for different tests from the appendix tables.
7. State the Decision Rule: If the absolute value of the computed value is less than the tabular value, accept the null hypothesis. If it is greater than or equal to the tabular value, reject the null hypothesis.
8. Compare the computed value to the tabular value. Make a conclusion using the result of the comparison.

Degree of Freedom (df)
The degree of freedom gives the number of pieces of independent information available for computing variability. For any statistical tool used in testing hypotheses, the number of degrees of freedom required will vary depending on the size of the sample. For a single group, the number of degrees of freedom is N – 1, where N is the number of observations in the group. For two groups, the formula for df is N1 + N2 – 2 for the t-test and N – 2 for Pearson r. These test statistics will be discussed later in this chapter.
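
The steps above can be sketched in a few lines of code. This is only an illustration: it uses the figures of the toy-car example in Section 6.1.1.1 (μ = 5, σ = 2, n = 40, x̄ = 3, tabular z = 1.96), and the function names z_statistic and decide are made up for this sketch.

```python
from math import sqrt


def z_statistic(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """Step 4: compute z = (x̄ − μ) / (σ / √n)."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


def decide(computed: float, tabular: float) -> str:
    """Steps 7 and 8: reject Ho when |computed| >= |tabular|."""
    return "reject Ho" if abs(computed) >= abs(tabular) else "accept Ho"


# Steps 1 to 3: Ho: μ = 5, Ha: μ ≠ 5, α = 0.05 (two-tailed), z-test.
z = z_statistic(sample_mean=3, pop_mean=5, pop_sd=2, n=40)

# Step 6: tabular value for a two-tailed z-test at α = 0.05.
decision = decide(z, tabular=1.96)

print(round(z, 2), decision)  # -6.32 reject Ho
```

The z-test needs no degrees of freedom, so step 5 is skipped here; for a t-test the tabular value would be looked up using df first.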

6.1.1 Tests Concerning Means

6.1.1.1 z-test on the Comparison between the Population Mean and the Sample Mean
The test statistic is z = (x̄ − μ) / (σ / √n), where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size.

Example 1
A company, which makes a battery-operated toy car, claims that its products have a mean life span of 5 years with a standard deviation of 2 years. Test the null hypothesis that μ = 5 years against the alternative hypothesis that μ ≠ 5 years if a random sample of 40 toy cars was tested and found to have a mean life span of only 3 years. Use a 5% level of significance.
Solution:
1. Ho: The mean lifespan of battery-operated toy cars is 5 years. (μ = 5)
Ha: The mean lifespan of battery-operated toy cars is not 5 years. (μ ≠ 5)
2. α = 0.05, two-tailed
3. Use z-test as test statistic.
4. Computation: z = (3 − 5) / (2 / √40) = −6.32
5. Critical Value: z = ±1.96 (reject Ho if z < −1.96 or z > 1.96)
6. Decision Rule: Reject Ho if |z| ≥ |z tabular|.
7. Since the computed |z|, which is 6.32, is greater than |z tabular|, which is 1.96, reject Ho. Hence, there is a significant difference between the population and sample mean lifespan of battery-operated toy cars.

Example 2
A manufacturer of bicycle tires has developed a new design which he claims has an average lifespan of 5 years with a standard deviation of 1.2 years. A dealer of the product claims that the average lifespan of 150 samples of the tires is only 3.5 years. Test the difference of the population and sample means at 5% level of significance.
Solution:
1. Ho: There is no significant difference between the population and sample mean of bicycle tires’ lifespan. (x̄ = μ)
Ha: There is a significant difference between the population and sample mean of bicycle tires’ lifespan. (x̄ < μ)
2. α = 0.05, one-tailed, left tail
3. Use z-test as test statistic.
4. Computation: z = (3.5 − 5) / (1.2 / √150) ≈ −15.31

6.1.1.2 t-test on the Comparison between the Population Mean and the Sample Mean
The t-test can be used to compare the means when the population mean (μ) is known but the population standard deviation (σ) is unknown. When the population standard deviation is unknown but the sample standard deviation can be computed, the t-test can be used instead of the z-test. The formula is given below:
t = (x̄ − μ) / (s / √n)
The denominator of the formula, s divided by √n, is called the standard error of the statistic. It is the standard deviation of the sampling distribution of the statistic for random samples of size n.
Decision Rule: Reject Ho if |t| ≥ |t tabular|.

Example 1
The average length of time for people to vote using the old procedure during a presidential election period in precinct A is 55 minutes. Using computerization as a new election method, a random sample of 20 registrants was used and found to have a mean length of voting time of 30 minutes with a standard deviation of 1.5 minutes. Test the significance of the difference between the population mean and the sample mean.
Solution:
1. Ho: There is no significant difference between the population and sample mean length of time for people to vote using the old and new procedure. (x̄ = μ)
Ha: There is a significant difference between the population and sample mean length of time for people to vote using the old and new procedure. (x̄ < μ)
2. α = 0.05, one-tailed, left tail
3. Use t-test as test statistic.
4. Computation: t = (30 − 55) / (1.5 / √20) = −74.54
5. df = n – 1 = 20 – 1 = 19
6. Tabular Value: t = 1.729 (from Appendix)
7. Decision Rule: Reject Ho if |t| ≥ |t tabular|.
8. Since the computed |t|, which is 74.54, is greater than |t tabular|, which is 1.729, reject Ho. Hence, there is a significant difference between the population and sample mean length of time for people to vote using the old and new procedure. It implies that voting with the computerized method takes a shorter time than voting with the old procedure.

Example 2
An experimental study was conducted by a researcher to determine if a new time slot has an effect on the performance of pupils in Mathematics. Fifteen randomly selected learners participated in the study. Toward the end of the investigation, a standardized assessment was conducted. The sample mean was 85 and the standard deviation was 3. In the standardization of the test, the mean was 75 and the standard deviation was 10. Based on the evidence at hand, is the new time slot effective? Use 5% level of significance.
Solution:
1. Ho: There is no significant difference between the population and sample mean of performance in Mathematics in a new time slot. (x̄ = μ)
Ha: There is a significant difference between the population and sample mean of performance in Mathematics in a new time slot. (x̄ > μ)
2. α = 0.05, one-tailed, right tail
3. Use t-test as test statistic.
4. Computation: Given x̄ = 85, μ = 75, n = 15, s = 3,
t = (85 − 75) / (3 / √15) = 12.91
5. df = n – 1 = 15 – 1 = 14
6. Tabular Value: t = 1.761 (from Appendix)
7. Decision Rule: Reject Ho if |t| ≥ |t tabular|.
8. Since the computed |t|, which is 12.91, is greater than |t tabular|, which is 1.761, reject Ho. Hence, there is a significant difference between the population and sample mean of performance in Mathematics in a new time slot. It implies that changing the time slot has an effect on the students’ performance in Mathematics.

6.1.1.3 t-test Concerning Means of Independent Samples
When two samples are drawn from normally distributed populations with the assumption that their variances are equal, the t-test with the given formula should be used.
t = (x̄1 − x̄2) / √{ [ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ] [ (n1 + n2) / (n1 n2) ] }

Example 1
A course in Physics was taught to 10 students using the traditional method. Another group of students went through the same course using another method. At the end of the semester, the same test was administered to each group. The 10 students under method A got an average of 82 with a standard deviation of 5, while the 11 students under method B got an average of 78 with a standard deviation of 6. Test the null hypothesis of no significant difference in the performance of the two groups of students at 5% level of significance.
Solution:
1. Ho: There is no significant difference between the average scores of the two groups of students. (x̄1 = x̄2)
Ha: There is a significant difference between the average scores of the two groups of students. (x̄1 > x̄2)
2. α = 0.05, one-tailed, right tail
3. Use the t-test as test statistic.
4. Computation:
t = (82 − 78) / √{ [ ((10 − 1)(5)² + (11 − 1)(6)²) / (10 + 11 − 2) ] [ (10 + 11) / ((10)(11)) ] }
= 4 / √{ [ ((9)(25) + (10)(36)) / 19 ] [ 21 / 110 ] }
= 4 / 2.4245
= 1.65
5. df = n1 + n2 − 2 = 10 + 11 – 2 = 19
6. Tabular Value: t = 1.729 (from Appendix)
7. Decision Rule: Reject Ho if |t| ≥ |t tabular|.
8. Since the computed |t|, which is 1.65, is less than |t tabular|, which is 1.729, accept Ho. Hence, there is no significant difference between the average scores of the two groups of students. It implies that there is no significant difference between method A and method B in terms of the students’ performance in Physics.

6.1.1.4 t-test on the Significance of the Difference Between Two Correlated Means
When comparing two correlated means, the t-test is the appropriate statistic. A typical example is when comparing the results of the pre-test and post-test administered to a group of individuals. The two tests must be the same and the given formula should be used.

Example 1
To determine whether the students’ performance in College Algebra improved after enrolling in the subject for one term, a 60-item pre-test and post-test were administered to them on the first and the last days of classes, respectively. The same test was given as pre-test and post-test.
Solution:
1. Ho: There is no significant difference between the pre-test and post-test of the students’ performance in College Algebra. (μ1 = μ2)
Ha: There is a significant difference between the pre-test and post-test of the students’ performance in College Algebra. (μ1 < μ2)
2. α = 0.05, one-tailed, left tail
3. Use the t-test as test statistic.
4. Computation:
5. df = n – 1 = 10 – 1 = 9
6. Tabular Value: t = 2.821 (from Appendix)
7. Decision Rule: Reject Ho if |t| ≥ |t tabular|.
8. Since the computed |t|, which is 3.002, is greater than |t tabular|, which is 2.821, reject Ho. Hence, there is a significant difference between the pre-test and post-test of the students’ performance in College Algebra. It implies that the performance of the students in Algebra significantly improved.
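
The t values quoted in Sections 6.1.1.2 and 6.1.1.3 can be re-derived from the summary figures given in the examples. The sketch below is a plain-Python illustration; the function names are invented here, and the paired-test helper assumes the common form t = d̄ / (s_d / √n), which the chapter does not spell out. The pre-test and post-test scores passed to it are hypothetical.

```python
from math import sqrt


def one_sample_t(sample_mean, pop_mean, s, n):
    """t = (x̄ − μ) / (s / √n), with df = n − 1."""
    return (sample_mean - pop_mean) / (s / sqrt(n))


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Independent-samples t with equal variances assumed, df = n1 + n2 − 2."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var * (n1 + n2) / (n1 * n2))


def paired_t(first, second):
    """Correlated-means t on the differences first − second (assumed form)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / sqrt(var_d / n)


t_voting = one_sample_t(30, 55, 1.5, 20)       # voting-time example
t_timeslot = one_sample_t(85, 75, 3, 15)       # new time slot example
t_physics = pooled_t(82, 5, 10, 78, 6, 11)     # method A vs method B

# Hypothetical pre-test and post-test scores, for illustration only.
t_paired = paired_t([30, 28, 35, 40], [34, 30, 39, 41])

print(round(t_voting, 2), round(t_timeslot, 2), round(t_physics, 2))
# -74.54 12.91 1.65
```

Each computed value is then compared in absolute value with the tabular t for the matching degrees of freedom, as in step 7 of the worked solutions.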

6.1.1.5 z-test on the Significance of the Difference Between Two Independent Proportions
There are certain situations when the data to be analyzed involve population proportions or percentages. For instance, a shoe company may want to know the proportion of defective shoes to be delivered to other countries. To determine if there is a significant difference between the proportions of two variables, the z-test will be used.

Example 1
A sample survey of a presidential candidate in the Philippines shows that 120 of 200 male voters dislike candidate X and 175 of 250 female voters dislike the same candidate. Determine whether the difference between the two sample proportions, 120/200 and 175/250, is significant or not at 1% level of significance.

Solution:
1. Ho: There is no significant difference between the proportion of the male votes and the proportion of female votes. (p1 = p2)
Ha: There is a significant difference between the proportion of the male votes and the proportion of female votes. (p1 ≠ p2)
2. α = 0.01, two-tailed
3. Use the z-test as test statistic.
4. Computation: p1 = 120/200 = 0.60, p2 = 175/250 = 0.70, pooled p = (120 + 175)/(200 + 250) ≈ 0.656,
z = (p1 − p2) / √[ p(1 − p)(1/n1 + 1/n2) ] ≈ −2.22
5. Tabular Value: z = 2.58
6. Decision Rule: Reject Ho if |z| ≥ |z tabular|.
7. Since the computed |z|, which is 2.22, is less than |z tabular|, which is 2.58, accept Ho. Hence, there is no significant difference between the proportion of the male votes and the proportion of female votes in their dislike for candidate X.
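
The computed z of 2.22 in the example above can be reproduced with a pooled-proportion z statistic. The sketch below assumes that pooled form, since it matches the reported value; the function name two_proportion_z is made up for illustration.

```python
from math import sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z for the difference of two independent proportions (pooled estimate)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# 120 of 200 male voters and 175 of 250 female voters dislike candidate X.
z = two_proportion_z(120, 200, 175, 250)
decision = "reject Ho" if abs(z) >= 2.58 else "accept Ho"

print(round(z, 2), decision)  # -2.22 accept Ho
```

The sign is negative only because the male proportion is the smaller one; the decision rule uses the absolute value.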

6.1.2 Significance of the Difference Between Variances

6.1.2.1 Analysis of Variance
When the variances of two or more independent samples differ, the appropriate test statistic to determine the significance of such difference is the analysis of variance (ANOVA), which makes use of the F ratio or variance ratio. The various groups being compared are assumed to belong to a population with a normal distribution, each group randomly selected and independent from the other groups. The variables from each group also have standard deviations that are approximately equal.

Steps in Solving the Analysis of Variance
1. State the null hypothesis.
2. Set the level of significance.
3. Accomplish the ANOVA table.
4. Find the tabular value of F at the given level of significance (from Appendix).
5. State the Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.
6. Interpret the result.

Example 1
Determine who among the three salesmen will most likely be promoted based on their monthly sales in pesos. Use 5% level of significance.
Solution:
1. Ho: There is no significant difference between the mean sales of the three candidates for promotion.
Ha: There is a significant difference between the mean sales of the three candidates for promotion.
2. α = 0.05
3. Accomplish the ANOVA Table.
3.1. Sum of Squares
Find SSB:
3.2. Degrees of Freedom
3.3. Mean of Squares
3.4. F-Value
4. Tabular Value: F = 3.40
5. Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.
6. Since the computed F-value, which is 0.0221, is less than the tabular value, which is 3.40, the null hypothesis is accepted. Hence, there is no significant difference between the mean sales of the three candidates for promotion. It implies that the three salesmen have almost equal chances of promotion.

6.2 Correlation and Regression Analysis

Look at these pictures. What do they show?
When we say “healthy students are better students,” are we saying that the academic performance of a student depends on his health?
If you know the monthly net profit of a company for a period of time, can you predict its net profit for the coming months?
When one applies for a job, what requirements are needed for submission? Why are they required? Can a hiring officer predict the kind of worker an applicant will be based on the submitted requirements?
These are some of the real-life situations requiring decision-making that will be discussed in this topic. We will learn how to determine whether there is a relationship between two variables using correlation analysis. We will also learn how to predict the value of one variable in terms of the other variable using regression analysis.

Correlation Analysis
Why do most students who excel in English not do well in Mathematics? Have you ever wondered why some of your friends who are good in Mathematics do not have high grades in English? Did it occur to you to find out if there exists a relationship between academic performance in English and achievement in Mathematics? The statistical procedure that is used to determine whether a relationship exists between two variables is called correlation analysis.

Correlation
Correlation analysis measures the association or the strength of the relationship between two variables, say, x and y.
The relationship or correlation between two variables may be described in terms of direction and strength.

The direction of correlation may be positive, negative, or zero.
• Two variables are positively correlated if the values of the two variables both increase or both decrease.
• Two variables are negatively correlated if the values of one variable increase while the values of the other decrease.
• Two variables are not correlated, or they have zero correlation, if one variable neither increases nor decreases while the other increases.

The strength of correlation may be perfect, very high, moderately high, moderately low, very low, and zero. The discussion of the strength is found in the succeeding box.

Example 1
Suppose a ten-item test in English and a ten-item test in Mathematics were administered to ten students. The scores of the students are tabulated below. It must be determined if the scores in the Mathematics quiz (here labelled variable x) and the English quiz (labelled variable y) are correlated or not.

The scatter graph of the data above is given below. Note that the x-axis represents the scores in Mathematics and the y-axis shows the scores in English. Each point in the graph below is an ordered pair (x, y) corresponding to the scores obtained by a student in the two subjects.

The graph above indicates a direct correlation between variables x and y, which appears to be increasing.

Example 2
Suppose the scores of the students in those two subjects happen to be as follows:

The scatter graph of the data above looks like this:

This time the trend of the data is decreasing; hence, the variables are negatively correlated.

Example 3
Suppose the same students have the following scores.

The scatter graph of the data above looks like this:

The scatter of the data is neither increasing nor decreasing. It represents a zero correlation.

While a scatter plot may be a convenient way of inspecting correlation between two variables, it does not offer a measure of the strength of the correlation. Fortunately, Karl Pearson (1857-1936) developed and perfected a formula that can give a numerical value to measure the strength of correlation. This formula not only shows how greatly two data sets are correlated but also reveals if the correlation is direct or inverse, or if the data sets are not correlated. The formula named after him is called the Pearson Product-Moment Correlation Coefficient.

6.2.1 Pearson Product-Moment Correlation Coefficient
The most common statistical tool in measuring the linear relationship between two random variables, x and y, is the linear correlation coefficient commonly called the Pearson Product-Moment Correlation Coefficient, or Pearson r for short. It became the basis of different theories in the fields of heredity, psychology, anthropometry, and statistics. It can be used to determine the linearity of the relationship between two variables. The Pearson r formula is given by,

Note that the results of r should be interpreted only after its value has been found to be significant. We will use the measuring device to determine the strength of the computed r, as shown below.

Consider the data in Example 1 of this section. Organize the data as shown in the table below.

Solution:
This result is in conformity with the scatter plot in Example 1 of this section. The computed r is almost 1; hence, it has a very high positive correlation. This is the reason why the scatter plot in Example 1 of this section is increasing from left to right.

Using the data in Example 2 of this section, we have the following computations.

Solution:
The computed r is –0.89; hence, it has a very high negative correlation. This is the reason why the scatter plot in Example 2 of this section is decreasing from left to right.

We now compute the r of the data on two non-correlated variables in Example 3 of this section.

Solution:
Since the computed r is almost zero, it has little or zero linear correlation. This conforms with the scatter plot in Example 3 of this section. The graph is neither increasing nor decreasing, and therefore the two sets of data are not correlated.

Example 4
Test the hypothesis that there is no significant relationship between mental ability and English proficiency at 5% level of significance.

Solution:
1. Ho: There is no significant relationship between mental ability and English proficiency.
Ha: There is a significant relationship between mental ability and English proficiency.
2. α = 5% or 0.05
3. Pearson r will be used to test the hypothesis.
4. Computation
5. df = N – 2 = 17 – 2 = 15
6. Tabular Value: r = 0.482 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.
8. Since the computed r (0.73) is greater than the tabular value (0.482), reject the null hypothesis. Hence, there is a significant relationship between mental ability and English proficiency. It shows that there is a moderately high positive relationship between the two variables.

6.2.2 Regression Analysis
Regression analysis is used when predicting the behavior of a variable. The regression equation explains the amount of variation observable in the dependent variable y in terms of the independent variable x. It is actually an equation of a straight line in the form:
y = bx + a
where y = criterion measure
x = predictor
a = y-intercept, or the ordinate of the point where the regression line crosses the y-axis
b = the slope of the line.
To get the regression equation, the values of a and b are computed using the formula below.

Example 1
The data in the table represent the membership at a university Mathematics club during the past 5 years. Find the regression equation to predict the membership 5 years from now.

Solution:
Substitute the values of a and b in the equation y = bx + a:
y = 6.5x + 16.9
Since you need to predict the membership five years from now, or at year 10, substitute 10 for x in the equation.
y = 6.5(10) + 16.9
= 81.9
≈ 82
Thus, five years from now, the Mathematics club would have 82 members.

Example 2
The following data pertain to the heights of fathers and their eldest sons in inches. If there is a significant relationship between the two variables, predict the height of the son if the height of his father is 78 inches. Use 5% level of significance.

Solution:
1. Ho: There is no significant relationship between heights of fathers and their eldest sons.
Ha: There is a significant relationship between heights of fathers and their eldest sons.
2. α = 5% or 0.05
3. Pearson r will be used to test the hypothesis.
4. Computation
5. df = N – 2 = 10 – 2 = 8
6. Tabular Value: r = 0.632 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.
8. Since the computed r (0.95) is greater than the tabular value (0.632), reject the null hypothesis. Hence, there is a significant relationship between heights of fathers and their eldest sons. It shows that there is a very high positive relationship between the two variables.

We can now proceed to regression analysis since there was a significant relationship between heights of fathers and their eldest sons.

Substitute the values of a and b in the equation y = bx + a:
y = 0.78x + 16.55
Since you need to predict the height of the son if the height of the father is 78 inches, substitute 78 for x in the equation.
y = 0.78(78) + 16.55 = 77.39 ≈ 77 inches
Thus, the predicted height of the son whose father’s height is 78 inches is 77 inches.
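
The correlation-then-regression workflow of this section can be sketched as follows. The formulas used are the usual computational forms for Pearson r and the least-squares line y = bx + a; they are assumed here, since the chapter's own formula images are not reproduced. The data set is hypothetical and the function names are illustrative only. The last lines simply evaluate the two fitted equations reported in the examples above.

```python
from math import sqrt


def _sums(xs, ys):
    """Return N, Σx, Σy, Σxy, Σx², Σy² for paired data."""
    n = len(xs)
    return (n, sum(xs), sum(ys),
            sum(x * y for x, y in zip(xs, ys)),
            sum(x * x for x in xs),
            sum(y * y for y in ys))


def pearson_r(xs, ys):
    """Pearson r from raw sums (assumed computational formula)."""
    n, sx, sy, sxy, sxx, syy = _sums(xs, ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))


def regression_line(xs, ys):
    """Least-squares intercept a and slope b of y = bx + a."""
    n, sx, sy, sxy, sxx, _ = _sums(xs, ys)
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)
    a = (sy - b * sx) / n
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
a, b = regression_line(xs, ys)
print(round(r, 4), round(a, 2), round(b, 2))  # 0.7746 2.2 0.6

# Predictions from the fitted equations reported in Examples 1 and 2.
members_year_10 = 6.5 * 10 + 16.9    # about 81.9, rounded to 82 members
son_height = 0.78 * 78 + 16.55       # about 77.39, rounded to 77 inches
```

As in Example 2, the regression line is worth using for prediction only after the computed r has been found significant against the tabular value for df = N – 2.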

6.2.3 Spearman’s Rank Correlation Coefficient or Spearman rho (ρ)
Beauty contests are very popular not only among Filipinos but also among many people around the world. Normally, when the names of the five finalists are announced, people place their own bets on who will be the queen and the runners-up. Very often, they are happy about the results. This happens when their ranks agree with the ranks assigned by the board of judges.
There might be some slight differences between the ranks assigned by the people and those assigned by the board of judges, but if overall there is a positive correlation (or agreement) between these ranks, then everyone will be happy about the results.
In this next statistical measure, we shall be concerned with correlation between ranks. As in simple correlation, we have cases of positive correlation, zero correlation, or negative correlation. A positive rank correlation indicates that those categories that are given high ranks by one judge (or rater) are also the categories that are assigned high ranks by the other rater, or that those with low ranks in one also have low ranks in the other. A negative rank correlation is the reverse. It means that those categories that were assigned high ranks by the first rater are given low ranks by the second rater, or vice versa.
The most common method used in rank correlation is the statistic developed by Spearman, where the coefficient used is symbolized by ρ (rho, the Greek letter for r). To compute ρ, we use the formula:

In interpreting the computed ρ, we use the same qualitative interpretation as the one we use in interpreting Pearson r.

Example 1
In a contest for Mr. Campus Personality, two judges gave their ratings to 8 candidates. Transform the ratings to ranks and compute the coefficient of rank correlation. Interpret the result.

Solution:
Interpretation: The computed ρ (0.83) indicates a “very high positive correlation” between the ranks. This means that those candidates who received high ranks from the first judge are also the candidates who received high ranks from the second judge. Similarly, those candidates who were ranked low by the first judge were also ranked low by the other judge. This means that the rankings of the two judges have a very high degree of agreement. It also implies that, as to the selection of Mr. Campus Personality, the two judges have more or less the same taste.

Example 2
Ten instructors were rated by third- and fourth-year students on their “mastery of subject matter” and the results were tabulated. What is the Spearman rho value for the data? At 5% level of significance, determine if there is a significant relationship in the scores obtained by the teachers.

Solution:
1. Ho: There is no significant relationship between the ratings given to the ten instructors by third- and fourth-year students.
Ha: There is a significant relationship between the ratings given to the ten instructors by third- and fourth-year students.
2. α = 5% or 0.05
3. Spearman rho (ρ) will be used to test the hypothesis.
4. Computation
5. df = N – 2 = 10 – 2 = 8
6. Tabular Value: ρ = 0.643 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.
8. Since the absolute value of the computed ρ (0.31) is less than the tabular value (0.643), the null hypothesis is accepted. Hence, there is no significant relationship between the ratings given to the ten instructors by third- and fourth-year students. It implies that the ratings of the third- and fourth-year students do not agree with each other.
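
Transforming ratings to ranks and computing ρ, as asked in Example 1, can be sketched as below. The code assumes the usual formula ρ = 1 − 6ΣD² / (N(N² − 1)), where D is the difference between paired ranks; the chapter's own formula is not reproduced in the text, so treat this as an assumption. The ratings are hypothetical and the function names are illustrative.

```python
def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(ratings1, ratings2):
    """ρ = 1 − 6ΣD² / (N(N² − 1)) computed on the ranks of the two ratings."""
    r1, r2 = average_ranks(ratings1), average_ranks(ratings2)
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))


# Hypothetical ratings from two judges for five candidates.
judge1 = [90, 85, 80, 75, 70]
judge2 = [88, 90, 78, 70, 75]

rho = spearman_rho(judge1, judge2)
print(round(rho, 2))  # 0.8
```

With tied ratings this shortcut formula is only approximate, which is a known limitation of the simple ΣD² form.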
