Chapter 6: The Statistical Tools

Introduction
Statistics involves the collection, organization, summarization, presentation, and interpretation of data. It has two branches: descriptive statistics and inferential statistics. Descriptive statistics is the analysis of data that helps describe, show, or summarize data in a meaningful way. When using descriptive statistics, it is useful to summarize a group of data using a combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts), and statistical commentary (i.e., a discussion of the results). The branch that allows one to make predictions (“inferences”) from the data is called inferential statistics. Inferential statistics takes data from samples and makes generalizations about a population.
For instance, you might stand in a mall and ask a sample of 100 people if they like shopping at SM. You could make a bar chart of the yes or no answers (that would be descriptive statistics), or you could use your research (and inferential statistics) to reason that around 75-80% of the population (all shoppers in all malls) like shopping at SM.
Testing the significance of the difference between two means, two standard deviations, two proportions, or two percentages is an important area of inferential statistics. Comparisons between two or more variables often arise in research or experiments, and to make valid conclusions regarding the results of a study, one has to apply an appropriate test statistic. This chapter discusses the different test statistics that are commonly used in research studies under inferential statistics.

Learning Objectives
At the end of this chapter, the student is expected to:
• apply a variety of statistical tools to process and manage numerical data;
• use the methods of linear regression and correlation to predict the value of a variable given certain conditions; and
• recognize the importance of testing of hypotheses in making decisions.

Duration
Topic 1: Testing of Hypothesis = 6 hours
Topic 2: Correlation and Regression Analysis = 3 hours

Lesson Proper
6.1 Hypothesis Testing
In Statistics, decision-making starts with a concern about a population regarding its characteristics, denoted by parameter values. We might be interested in a population parameter like the mean or the proportion. For instance, suppose you are deciding to put up a business selling cars. Your first course of action before spending money on the business is to know which car sells the most these days. Before you open a business selling Toyota, Mitsubishi, Hyundai, Honda, Nissan, or Suzuki, you need to gather information on which among these gets the most sales. How many existing distributors of these cars are out there? Do you want to compete? To answer these questions, you need to gather data. What type of data? And where will you get them? You simply need to do a survey. These concerns can be addressed by a procedure in Statistics called hypothesis testing.

Hypothesis
A hypothesis is a conjecture or statement which aims to explain certain phenomena in the real world. Many hypotheses, statistical or not, are products of man’s curiosity. To seek the answers to his questions, he tries to find and present evidence, then tests the resulting hypothesis using statistical tools and analysis. In statistical analysis, a hypothesis is either accepted or rejected within a certain critical interval.
The hypothesis that is subjected to testing to determine whether its truth can be accepted or rejected is the null hypothesis, denoted by Ho. This hypothesis states that there is no significant relationship or no significant difference between two or more variables, or that one variable does not affect another variable. In statistical research, the hypotheses should be written in null form. For example, suppose you want to know whether method A is more effective than method B in teaching high school mathematics. The null hypothesis for this study will be: “There is no significant difference between the effectiveness of method A and method B.”
Another type of hypothesis is the alternative hypothesis, denoted by Ha. This is the hypothesis that challenges the null hypothesis. The alternative hypothesis for the example above can be: “There is a significant difference between the effectiveness of method A and method B,” “Method A is more effective than method B,” or “Method A is less effective than method B,” depending on whether the test is one-tailed or two-tailed. These will be discussed in the succeeding lessons.

Significance Level
To test the null hypothesis of no significant difference between the two methods in the above example, one must first set the level of significance. This is the probability of committing a Type I error and is denoted by the symbol 𝛼. A Type I error is committed when the null hypothesis is rejected (and the alternative hypothesis accepted) when, in fact, the null hypothesis is true. Accepting the null hypothesis when, in fact, it is false is called a Type II error, and its probability is denoted by the symbol 𝛽. The most common level of significance is 5%.
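The link between a significance level and the tabular (critical) z value can be illustrated with a short Python sketch. It uses only the standard library; the helper name `z_critical` is an assumption made for this illustration and does not come from the chapter.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool = True) -> float:
    """Return the positive critical z value for a significance level alpha.

    For a two-tailed test, alpha is split equally between both tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf gives the z value below which the stated area lies.
    return NormalDist().inv_cdf(1 - tail_area)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        two = z_critical(alpha)
        one = z_critical(alpha, two_tailed=False)
        print(f"alpha={alpha}: two-tailed z={two:.2f}, one-tailed z={one:.2f}")
```

At 𝛼 = 0.05 the two-tailed value rounds to 1.96, and at 𝛼 = 0.01 it rounds to 2.58, which are the tabular z values commonly read from a normal table.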
Example 1
A company, which makes a battery-operated toy car, claims that its products have a mean life span of 5 years with a standard deviation of 2 years. Test the null hypothesis that 𝜇 = 5 years against the alternative hypothesis that 𝜇 ≠ 5 years if a random sample of 40 toy cars was tested and found to have a mean life span of only 3 years. Use a 5% level of significance.

Solution:
1. Ho: The mean lifespan of battery-operated toy cars is 5 years. (𝜇 = 5)
Ha: The mean lifespan of battery-operated toy cars is not 5 years. (𝜇 ≠ 5)
2. 𝛼 = 0.05, two-tailed
3. Use z-test as test statistic.
4. Computation:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{3 - 5}{2 / \sqrt{40}} = \frac{-2}{0.3162} = -6.32$$
5. Critical Value: z < −1.96 or z > 1.96
6. Decision Rule: Reject Ho if |z| ≥ |z_tabular|.
7. Since the computed |z|, which is 6.32, is greater than |z_tabular|, which is 1.96, reject Ho. Hence, there is a significant difference between the population and sample mean lifespan of battery-operated toy cars.

6.1.1.2 t-test on the Comparison between the Population Mean and the Sample Mean
The t-test can be used to compare the means when the population mean (𝜇) is known but the population standard deviation (𝜎) is unknown. When the population standard deviation is unknown but the sample standard deviation can be computed, the t-test can be used instead of the z-test. The formula is given below:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
The denominator of the formula, s divided by √n, is called the standard error of the statistic. It is the standard deviation of the sampling distribution of a statistic for random samples of size n.
Decision Rule:
Reject Ho if |t| ≥ |t_tabular|.

Example 1
The average length of time for people to vote using the old procedure during a presidential election period in precinct A is 55 minutes. Using computerization as a new election method, a random sample of 20 registrants was used and found to have a mean length of voting time of 30 minutes with a standard deviation of 1.5 minutes. Test the significance of the difference between the population mean and the sample mean.

Solution:
1. Ho: There is no significant difference between the population and sample mean length of time for people to vote using the old and new procedure. (𝑥̅ = 𝜇)
Ha: There is a significant difference between the population and sample mean length of time for people to vote using the old and new procedure. (𝑥̅ < 𝜇)
2. 𝛼 = 0.05, one-tailed, left tail
3. Use t-test as test statistic.
4. Computation:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{30 - 55}{1.5 / \sqrt{20}} = \frac{-25}{0.3354} = -74.54$$
5. df = n − 1 = 20 − 1 = 19
6. Tabular Value: t = 1.729 (from Appendix)
7. Decision Rule: Reject Ho if |t| ≥ |t_tabular|.
8. Since the computed |t|, which is 74.54, is greater than |t_tabular|, which is 1.729, reject Ho. Hence, there is a significant difference between the population and sample mean length of time for people to vote using the old and new procedure. It implies that the computerized method takes a shorter time to vote compared to the old procedure.

Example 2
An experimental study was conducted by a researcher to determine if a new time slot has an effect on the performance of pupils in Mathematics. Fifteen randomly selected learners participated in the study. Toward the end of the investigation, a standardized assessment was conducted. The sample mean was 85 and the standard deviation was 3. In the standardization of the test, the mean was 75 and the standard deviation was 10. Based on the evidence at hand, is the new time slot effective? Use 5% level of significance.

Solution:
1. Ho: There is no significant difference between the population and sample mean of performance in Mathematics in a new time slot. (𝑥̅ = 𝜇)
Ha: There is a significant difference between the population and sample mean of performance in Mathematics in a new time slot. (𝑥̅ > 𝜇)
2. 𝛼 = 0.05, one-tailed, right tail
3. Use t-test as test statistic.
4. Computation:
Given 𝑥̅ = 85, 𝜇 = 75, 𝑛 = 15, 𝑠 = 3
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{85 - 75}{3 / \sqrt{15}} = \frac{10}{0.7746} = 12.91$$
5. df = n − 1 = 15 − 1 = 14
6. Tabular Value: t = 1.761 (from Appendix)
7. Decision Rule: Reject Ho if |t| ≥ |t_tabular|.
8. Since the computed |t|, which is 12.91, is greater than |t_tabular|, which is 1.761, reject Ho. Hence, there is a significant difference between the population and sample mean of performance in Mathematics in a new time slot. It implies that changing the time slot has an effect on the students’ performance in Mathematics.

6.1.1.3 t-test Concerning Means of Independent Samples
When two samples are drawn from normally distributed populations with the assumption that their variances are equal, the t-test with the given formula should be used.
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{n_1 + n_2}{n_1 n_2}\right]}}$$

Example 1
A course in Physics was taught to 10 students using the traditional method. Another group of students went through the same course using another method. At the end of the semester, the same test was administered to each group. The 10 students under method A got an average of 82 with a standard deviation of 5, while the 11 students under method B got an average of 78 with a standard deviation of 6. Test the null hypothesis of no significant difference in the performance of the two groups of students at 5% level of significance.

Solution:
1. Ho: There is no significant difference between the average scores of the two groups of students. (𝑥̅1 = 𝑥̅2)
Ha: There is a significant difference between the average scores of the two groups of students. (𝑥̅1 > 𝑥̅2)
2. 𝛼 = 0.05, one-tailed, right tail
3. Use the t-test as test statistic.
4. Computation:
$$t = \frac{82 - 78}{\sqrt{\left[\frac{(10 - 1)(5)^2 + (11 - 1)(6)^2}{10 + 11 - 2}\right]\left[\frac{10 + 11}{(10)(11)}\right]}} = \frac{4}{\sqrt{\left[\frac{(9)(25) + (10)(36)}{19}\right]\left[\frac{21}{110}\right]}} = \frac{4}{2.4245} = 1.65$$
5. df = 𝑛1 + 𝑛2 − 2 = 10 + 11 − 2 = 19
6. Tabular Value: t = 1.729 (from Appendix)
7. Decision Rule: Reject Ho if |t| ≥ |t_tabular|.
8. Since the computed |t|, which is 1.65, is less than |t_tabular|, which is 1.729, do not reject Ho. Hence, there is no significant difference between the average scores of the two groups of students. It implies that method A and method B do not differ significantly in their effect on the students’ performance in Physics.

6.1.1.4 t-test on the Significance of the Difference Between Two Correlated Means
When comparing two correlated means, the t-test is the appropriate statistic. A typical example is comparing the results of a pre-test and a post-test administered to the same group of individuals. The two tests must be the same and the given formula should be used.

Example 1
To determine whether the students’ performance in College Algebra improved after enrolling in the subject for one term, a 60-item pre-test and post-test were administered to them on the first and the last days of classes, respectively. The same test was given as pre-test and post-test.

Solution:
1. Ho: There is no significant difference between the pre-test and post-test of the students’ performance in College Algebra. (𝜇1 = 𝜇2)
Ha: There is a significant difference between the pre-test and post-test of the students’ performance in College Algebra. (𝜇1 < 𝜇2)
2. 𝛼 = 0.05, one-tailed, left tail
3. Use the t-test as test statistic.
4. Computation:
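The test statistics of this section can be sketched in Python as a way to check hand computations. The function names below are illustrative assumptions, not part of the chapter. The paired statistic is written in the common form t = D̄ / (s_D / √n), which is assumed here to be equivalent to the formula the chapter intends, and the pre-test and post-test scores are hypothetical because the data table for that example is not reproduced in the text.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_stat(sample_mean, pop_mean, sd, n):
    """z or t statistic for one sample: (sample mean - mu) / (sd / sqrt(n))."""
    return (sample_mean - pop_mean) / (sd / sqrt(n))


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent samples, assuming equal variances."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var * (n1 + n2) / (n1 * n2))


def paired_t(first, second):
    """t statistic for correlated means, using differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    # stdev is the sample standard deviation (divisor n - 1).
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


if __name__ == "__main__":
    # Summary figures taken from the worked examples in the text.
    print(round(one_sample_stat(3, 5, 2, 40), 2))      # toy cars, z-test
    print(round(one_sample_stat(30, 55, 1.5, 20), 2))  # voting time, t-test
    print(round(one_sample_stat(85, 75, 3, 15), 2))    # time slot, t-test
    print(round(pooled_t(82, 5, 10, 78, 6, 11), 2))    # Physics methods

    # Hypothetical pre-test and post-test scores, for illustration only.
    pre_test = [30, 35, 28, 40, 33]
    post_test = [38, 41, 30, 45, 39]
    print(round(paired_t(pre_test, post_test), 2))
```

With pre-test minus post-test differences, a negative t points in the direction of the left-tailed alternative 𝜇1 < 𝜇2. Its absolute value is then compared with the tabular t at df = n − 1.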
Example 1
A sample survey of a presidential candidate in the Philippines shows that 120 of 200 male voters dislike candidate X and 175 of 250 female voters dislike the same candidate. Determine whether the difference between the two sample proportions, 120/200 and 175/250, is significant or not at 1% level of significance.

Solution:
1. Ho: There is no significant difference between the proportion of male voters and the proportion of female voters who dislike candidate X. (𝑝1 = 𝑝2)
Ha: There is a significant difference between the proportion of male voters and the proportion of female voters who dislike candidate X. (𝑝1 ≠ 𝑝2)
2. α = 0.01, two-tailed
3. Use the z-test as test statistic.
4. Computation: with the pooled proportion p = (120 + 175) / (200 + 250) = 0.6556 and q = 1 − p = 0.3444,
$$z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.60 - 0.70}{\sqrt{(0.6556)(0.3444)\left(\frac{1}{200} + \frac{1}{250}\right)}} = \frac{-0.10}{0.0451} = -2.22$$
5. Tabular Value: z = 2.58
6. Decision Rule: Reject Ho if |z| ≥ |z_tabular|.
7. Since the computed |z|, which is 2.22, is less than |z_tabular|, which is 2.58, do not reject Ho. Hence, there is no significant difference between the proportion of male voters and the proportion of female voters in their dislike for candidate X.
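The z statistic for the difference between two sample proportions can be checked with a minimal Python sketch. It assumes the pooled-proportion form of the test, which reproduces the computed value of 2.22 reported in the solution; the function name is an illustrative assumption.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for two sample proportions using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # combined proportion under Ho: p1 = p2
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


if __name__ == "__main__":
    # 120 of 200 male voters and 175 of 250 female voters dislike candidate X.
    z = two_proportion_z(120, 200, 175, 250)
    print(round(z, 2))
    print("reject Ho" if abs(z) >= 2.58 else "do not reject Ho")
```

The sign of z only reflects which group is listed first; the two-tailed decision uses its absolute value.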
4. Tabular Value: F = 3.40
5. Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.

Correlation Analysis:
Why do most students who excel in English not do well in Mathematics? Have you ever wondered why some of your friends who are good in Mathematics do not have high grades in English? Did it occur to you to find out if there exists a relationship between academic performance in English and achievement in Mathematics? The statistical procedure that is used to determine whether a relationship exists between two variables is called correlation analysis.

Correlation
Correlation analysis measures the association or the strength of the relationship between two variables, say, x and y.
The relationship or correlation between two variables may be described in terms of direction and strength.
The direction of correlation may be positive, negative, or zero.
• Two variables are positively correlated if the values of the two variables both increase or both decrease.
• Two variables are negatively correlated if the values of one variable increase while the values of the other decrease.
• Two variables are not correlated, or they have zero correlation, if one variable neither increases nor decreases while the other increases.

Example 1
Suppose a ten-item test in English and a ten-item test in Mathematics were administered to ten students. The scores of the students are tabulated below. It must be determined if the scores in the Mathematics quiz (here labelled variable x) and the English quiz (labelled variable y) are correlated or not.
The scatter graph of the data above is given below. Note that the x-axis represents the scores in Mathematics and the y-axis shows the scores in English. Each point in the graph below is an ordered pair (x, y) corresponding to the scores obtained by a student in the two subjects.

Example 2
Suppose the scores of the students in those two subjects happen to be as follows:

Example 3
Suppose the same students have the following scores.

Solution:
1. Ho: There is no significant relationship between mental ability and English proficiency.
Ha: There is a significant relationship between mental ability and English proficiency.
2. 𝛼 = 5% or 0.05
3. Pearson r will be used to test the hypothesis.
4. Computation
5. df = N − 2 = 17 − 2 = 15
6. Tabular Value: r = 0.482 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.
8. Since the computed r (0.73) is greater than the tabular value (0.482), reject the null hypothesis. Hence, there is a significant relationship between mental ability and English proficiency. It shows that there is a moderately high positive relationship between the two variables.

6.2.2 Regression Analysis
Regression analysis is used when predicting the behavior of a variable. The regression equation explains the amount of variation observable in the dependent variable y that is accounted for by the independent variable x. It is an equation of a straight line in the form:
𝑦 = 𝑏𝑥 + a
where y = criterion measure
x = predictor
a = y-intercept, or the point where the regression line crosses the y-axis
b = the slope of the line.
To get the regression equation, the values of a and b are computed using the formula below.

Example 1
The data in the table represent the membership at a university Mathematics club during the past 5 years. Find the regression equation to predict the membership 5 years from now.

Solution:
Substitute the values of a and b in the equation y = bx + a.
y = 6.5x + 16.9
Since you need to predict the membership five years from now, or at year 10, substitute 10 for x in the equation.
y = 6.5(10) + 16.9
= 81.9
≈ 82
Thus, five years from now, the Mathematics club would have 82 members.

Example 2
The following data pertain to the heights of fathers and their eldest sons in inches. If there is a significant relationship between the two variables, predict the height of the son if the height of his father is 78 inches. Use 5% level of significance.

Solution:
1. Ho: There is no significant relationship between the heights of fathers and their eldest sons.
Ha: There is a significant relationship between the heights of fathers and their eldest sons.
2. 𝛼 = 5% or 0.05
3. Pearson r will be used to test the hypothesis.
4. Computation
5. df = N − 2 = 10 − 2 = 8
6. Tabular Value: r = 0.632 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.
8. Since the computed r (0.95) is greater than the tabular value (0.632), reject the null hypothesis. Hence, there is a significant relationship between the heights of fathers and their eldest sons. It shows that there is a very high positive relationship between the two variables.
Substitute the values of a and b in the equation y = bx + a.
y = 0.78x + 16.55
Since you need to predict the height of the son if the height of the father is 78 inches, substitute 78 for x in the equation.
y = 0.78(78) + 16.55 = 77.39 ≈ 77 inches
Thus, the predicted height of the son whose father’s height is 78 inches is 77 inches.
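The Pearson r and the regression coefficients a and b can be computed with a short Python sketch. It assumes the usual raw-score (least-squares) formulas, which may be arranged differently from the formulas in the chapter, and the small data set is hypothetical because the example tables are not reproduced in the text. The last two lines only re-evaluate the fitted equations quoted in the solutions.

```python
from math import sqrt


def _sums(x, y):
    """Return n and the raw sums used by both formulas."""
    n = len(x)
    return (n, sum(x), sum(y),
            sum(a * b for a, b in zip(x, y)),
            sum(a * a for a in x),
            sum(b * b for b in y))


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient (raw-score formula)."""
    n, sx, sy, sxy, sxx, syy = _sums(x, y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))


def regression_line(x, y):
    """Return (a, b) for the least-squares line y = bx + a."""
    n, sx, sy, sxy, sxx, _ = _sums(x, y)
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)  # slope
    a = (sy - b * sx) / n                        # y-intercept
    return a, b


def predict(a, b, x):
    """Evaluate y = bx + a at a given x."""
    return b * x + a


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    a, b = regression_line(x, y)
    print(round(pearson_r(x, y), 4), round(a, 2), round(b, 2))

    # Fitted equations quoted in the two worked examples.
    print(round(predict(16.9, 6.5, 10), 1))    # club membership at year 10
    print(round(predict(16.55, 0.78, 78), 2))  # son's height, father at 78 in.
```

A prediction from the line is meaningful only when the correlation has first been found significant, which is why Example 2 tests r before substituting into the equation.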
Solution:
1. Ho: There is no significant relationship between the ratings given to the ten instructors by third- and fourth-year students.
Ha: There is a significant relationship between the ratings given to the ten instructors by third- and fourth-year students.
2. 𝛼 = 5% or 0.05
3. Spearman rho (𝜌) will be used to test the hypothesis.
4. Computation
5. df = N − 2 = 10 − 2 = 8
6. Tabular Value: 𝜌 = 0.643 (from Appendix)
7. Decision Rule: If the computed value is less than the tabular value, accept the null hypothesis. If the computed value is greater than the tabular value, reject the null hypothesis.
8. Since the absolute value of the computed 𝜌 (0.31) is less than the tabular value (0.643), the null hypothesis is not rejected. Hence, there is no significant relationship between the ratings given to the ten instructors by third- and fourth-year students. It implies that the ratings given by the third-year students are not significantly related to those given by the fourth-year students.
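The Spearman rho used above can be sketched in Python. The sketch assumes the standard rank-difference formula 𝜌 = 1 − 6Σd² / (n(n² − 1)), which holds when there are no tied ranks, and it uses hypothetical ranks because the ratings table of the example is not reproduced in the text.

```python
def spearman_rho(ranks_1, ranks_2):
    """Spearman rank correlation from two lists of ranks (no ties assumed)."""
    n = len(ranks_1)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_d_squared) / (n * (n**2 - 1))


if __name__ == "__main__":
    # Hypothetical ranks given to five instructors by two groups of raters.
    group_a = [1, 2, 3, 4, 5]
    group_b = [2, 1, 4, 3, 5]
    print(spearman_rho(group_a, group_b))
```

Identical rankings give 𝜌 = 1 and exactly reversed rankings give 𝜌 = −1; the computed value is compared with the tabular 𝜌 in the same way as in the solution.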