SENIOR HIGH SCHOOL

Do not write anything in this Learning Activity Sheet.

STATISTICS AND
PROBABILITY
Learning Activity Sheets for
Grade 11 – Quarter 4
CHAPTER IV: Parametric Statistical Inference: Estimation
CHAPTER V: Parametric Statistical Inference: Hypothesis Testing

Reference:
Tomakin, F. Y. (2010). Topics in Applied Statistics 2nd Edition. Block 2 Lot 1 Vel Pal 2 Estates, Minglanilla, Cebu
6046: STATLINK RESEARCH TRAINING AND DEVELOPMENT.

CHAPTER IV: PARAMETRIC STATISTICAL INFERENCE: ESTIMATION

Learning Objectives:
Researchers of the medical, social, behavioral and educational sciences are at times
confronted with problems of estimation. This unit presents a discussion on the procedures for
estimating the values of unknown parameters from information provided by simple statistics. After
completing this Chapter, the student shall be able to:
 Identify the two kinds of estimators;
 Construct the confidence intervals for the population mean and population proportion.

LESSON 4.1: ESTIMATION OF PARAMETERS


Key Concepts:
 There are two areas of statistical inference. These are estimation of parameters and
hypothesis testing. In particular, estimation of parameters involves the estimation of
unknown population values (parameters) by the known sample values (statistics). Statistics
therefore are used to estimate the unknown parameters.
 Estimation can be classified into two namely: point estimation and interval estimation. A point
estimate consists of a single value used to estimate population parameters.
Example:
$\bar{X}$ can be used to estimate $\mu$
$s^2$ can be used to estimate $\sigma^2$

A confidence-interval estimate of a parameter consists of an interval of numbers obtained
from a point estimate, together with a percentage specifying how confident we are that the parameter
lies in that interval.
Example: Consider the following statement:
A 90% confidence interval for the mean income of teachers is (15000, 28000).
In this statement, the number 90% or 0.90 is called the confidence coefficient or the degree
of confidence. The endpoints 15000 and 28000 are respectively called the lower and
upper confidence limits.

 In general, we can construct a (1 − 𝛼) 100% confidence interval. The Greek letter 𝛼 is referred
to as the level of significance and fraction (1 − 𝛼) is called confidence coefficient which is
interpreted as the probability that the interval estimator encloses the true value of the
parameter.

 The following table presents the most commonly used confidence coefficient and the
corresponding Z – values.
Confidence coefficient    $\alpha$    $Z_{\alpha}$    $\alpha/2$    $Z_{\alpha/2}$
90%                       0.10        1.282           0.05          1.645
95%                       0.05        1.645           0.025         1.960
99%                       0.01        2.326           0.005         2.576

 Ideally, we want a confidence interval that is as narrow as possible and has a large
confidence coefficient. These two goals pull against each other: for the same sample, raising
the confidence coefficient widens the interval. The following diagrams show a comparison
between the 90% and 99% confidence intervals.
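The Z-values in the table above can be checked with a few lines of code. The sketch below is only an illustration: the function name `z_critical` is an assumption made for this example, and it relies on `NormalDist` from Python's standard library to invert the standard normal distribution.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_sided: bool = True) -> float:
    """Return Z_(alpha/2) for a two-sided case, or Z_alpha for a one-sided case."""
    tail = alpha / 2 if two_sided else alpha
    # inv_cdf gives the point with the requested area to its left.
    return NormalDist().inv_cdf(1 - tail)


# Reproduce the rows of the table for the three usual confidence coefficients.
for alpha in (0.10, 0.05, 0.01):
    one_sided = z_critical(alpha, two_sided=False)
    two_sided = z_critical(alpha)
    print(f"alpha={alpha:.2f}  Z_alpha={one_sided:.3f}  Z_alpha/2={two_sided:.3f}")
```

A larger confidence coefficient gives a larger Z-value, which is why the 99% interval is wider than the 90% interval for the same data.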

LESSON 4.2: ESTIMATING THE POPULATION MEAN
1. Computes for the confidence interval estimate based on the appropriate form of the
estimator for the population mean M11/12SP-IIIh-1,
2. Solves problems involving confidence interval estimation of the population mean. M11/12SP-
IIIh-2,
3. Draws conclusion about the population mean based on its confidence interval estimate.
M11/12SP-IIIh-3

Key Concepts:
 The point estimator of 𝜇 is 𝑋̅. The interval estimator of 𝜇 is the (1 − 𝛼) 100% confidence
interval given by:
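a. $\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ when $\sigma$ is known (n > 30)

b. $\left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$ when $\sigma$ is unknown (n < 30); where $t_{\alpha/2}$ is the t-value with
v = n - 1 degrees of freedom.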


 It should be stressed that even if 𝝈 is unknown but for as long as n >30, we still use (a)
instead of (b). This explains the notion that the t is used only for small sample cases (n < 30)
Example: The laboratory department of a certain hospital collects data on leukemia patients,
including each patient’s hematocrit. The patients’ hematocrit values are assumed to be
approximately normally distributed with a standard deviation of 11.07.
If a random sample of 50 patients has an average hematocrit of 36.38, construct a 90%
confidence interval for the mean hematocrit 𝜇 of all leukemia patients of that hospital.

Solution: We are given the following information:

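Given: n = 50, $\bar{X} = 36.38$, $\sigma = 11.07$ (known)
$(1 - \alpha)100\% = 90\%$, so $\alpha = 1 - 0.90 = 0.10$

Since $\sigma$ is known, we use (a). Hence, $Z_{\alpha/2} = Z_{0.10/2} = Z_{0.05} = 1.645$. Thus, the 90%
confidence interval for the mean hematocrit $\mu$ of all leukemia patients is given by:

$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
$= \left(36.38 - 1.645\left(\frac{11.07}{\sqrt{50}}\right),\ 36.38 + 1.645\left(\frac{11.07}{\sqrt{50}}\right)\right)$
$= \left(36.38 - 1.645\left(\frac{11.07}{7.07106}\right),\ 36.38 + 1.645\left(\frac{11.07}{7.07106}\right)\right)$
$= (36.38 - 2.5753,\ 36.38 + 2.5753)$
$= (33.80,\ 38.96)$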
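The computation for form (a) can be sketched in code as a cross-check. This is a minimal illustration, not part of the original module: the function name `z_interval` and its parameters are assumptions, and the Z-value is taken from the standard library's `NormalDist` instead of the printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the mean when sigma is known (or n > 30)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_(alpha/2)
    margin = z * sigma / sqrt(n)              # half-width of the interval
    return x_bar - margin, x_bar + margin


# Hematocrit example: n = 50, sample mean 36.38, sigma = 11.07, 90% confidence.
low, high = z_interval(36.38, 11.07, 50, 0.90)
print(f"({low:.2f}, {high:.2f})")
```

Only the four inputs change from problem to problem, so the same function can be reused for other exercises of this type.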

Example: The experiment scores of 7 chemistry students are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2,
and 9.6. Find the 95% confidence interval for the mean score of all chemistry students,
assuming an approximate normal distribution for students’ scores.

Solution: Notice here that 𝝈 is unknown and n = 7 < 30. Thus the second formula (b) will be used. From the
given scores, we first compute the mean and the standard deviation. (Refer to Chapter 3 – Lesson 2:
Parameter and Statistics)

Step 1: Since the mean is not given, we solve for it using the formula $\bar{X} = \frac{\sum x}{n}$:

$\bar{X} = \frac{9.8 + 10.2 + 10.4 + 9.8 + 10.0 + 10.2 + 9.6}{7} = \frac{70.0}{7} = 10.0$

Step 2: Solve for the sample standard deviation.

x       $x - \bar{X}$    $(x - \bar{X})^2$
9.8     -0.2             0.04
10.2    0.2              0.04
10.4    0.4              0.16
9.8     -0.2             0.04
10.0    0                0
10.2    0.2              0.04
9.6     -0.4             0.16
                         $\sum (x - \bar{X})^2 = 0.48$

$s^2 = \frac{\sum (x - \bar{X})^2}{n - 1} = \frac{0.48}{7 - 1} = \frac{0.48}{6} = 0.08$

$s = \sqrt{0.08} = 0.2828 \approx 0.283$
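Given: n = 7, $\bar{X} = 10.0$, $s = 0.283$
$(1 - \alpha)100\% = 95\%$, so $\alpha = 1 - 0.95 = 0.05$
Since $\sigma$ is unknown and n = 7 < 30, we use (b):

$\left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$, where $t_{\alpha/2}$ is the t-value with v = n - 1 degrees of freedom.

Hence, $t_{\alpha/2,\ v = n-1} = t_{0.05/2,\ v = 7-1} = t_{0.025,\ v = 6} = 2.447$.

$\left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$
$= \left(10.0 - 2.447\left(\frac{0.283}{\sqrt{7}}\right),\ 10.0 + 2.447\left(\frac{0.283}{\sqrt{7}}\right)\right)$
$= \left(10.0 - 2.447\left(\frac{0.283}{2.64575}\right),\ 10.0 + 2.447\left(\frac{0.283}{2.64575}\right)\right)$
$= (10.0 - 0.2617,\ 10.0 + 0.2617)$
$= (9.74,\ 10.26)$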
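The steps of this small-sample example can also be sketched in code. The function name `t_interval` is an assumption made for illustration. Python's standard library has no inverse t distribution, so the critical value 2.447 (v = 6, $\alpha/2$ = 0.025) is passed in from the t-table, exactly as in the hand solution.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_crit: float):
    """Confidence interval for the mean when sigma is unknown and n < 30.

    t_crit is the tabulated t-value for alpha/2 with n - 1 degrees of freedom.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                 # sample standard deviation (divides by n - 1)
    margin = t_crit * s / sqrt(n)   # half-width of the interval
    return x_bar - margin, x_bar + margin


scores = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
low, high = t_interval(scores, t_crit=2.447)
print(f"({low:.2f}, {high:.2f})")
```

Here `stdev` carries out Step 2 of the solution (the table of squared deviations) in a single call.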

LESSON 4.3: ESTIMATING THE POPULATION PROPORTION
1. Identifies the appropriate form of the confidence interval estimator for the population
proportion based on the CLT M11/12SP-IIIi-3
2. Computes for the confidence interval estimate of the population proportion M11/12SP-IIIi-4
3. Solves problems involving confidence interval estimation of the population proportion
M11/12SP-IIIi-5
4. Draws conclusion about the population proportion based on its confidence interval estimate.
M11/12SP-IIIi-6

Key Concepts:
 In a binomial experiment, the point estimator of the population proportion p is $\hat{p} = \frac{X}{n}$, where
X represents the number of successes in n trials. On the other hand, a (1 − 𝛼) 100%
confidence interval for the population proportion p is given by:

$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$

where $\hat{p} = \frac{X}{n}$ and $\hat{q} = 1 - \hat{p}$

Example: In a random sample of 200 Covid patients, 138 were found to be negative for Covid.
Construct a 95% confidence interval for the population proportion of all Covid patients who are
negative for Covid.
Solution: We are given X = 138 and n = 200

Hence, $\hat{p} = \frac{X}{n} = \frac{138}{200} = 0.69$ and $\hat{q} = 1 - \hat{p} = 1 - 0.69 = 0.31$

$(1 - \alpha)100\% = 95\%$, so $\alpha = 1 - 0.95 = 0.05$

Also, $Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$

Thus, the 95% confidence interval is given by:

$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$

$= \left(0.69 - 1.96\sqrt{\frac{(0.69)(0.31)}{200}},\ 0.69 + 1.96\sqrt{\frac{(0.69)(0.31)}{200}}\right)$

$= (0.69 - 1.96(0.0327032),\ 0.69 + 1.96(0.0327032))$

$= (0.63,\ 0.75)$
This means that we are 95% confident that the proportion of patients who are negative for
Covid lies between 63% and 75%.
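The interval for a proportion follows the same pattern as the interval for a mean, as the following sketch shows. The name `proportion_interval` is an assumption made for this example, and the Z-value comes from the standard library's `NormalDist` instead of the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_(alpha/2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Example from the text: X = 138 negatives out of n = 200 patients, 95% confidence.
low, high = proportion_interval(138, 200, 0.95)
print(f"({low:.2f}, {high:.2f})")
```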

CHAPTER V: PARAMETRIC STATISTICAL INFERENCE: HYPOTHESIS TESTING

LESSON 5.1: PARAMETRIC HYPOTHESIS TESTING

1. To illustrate null hypothesis and alternative hypothesis. (M11/12SP-IVa-1)
2. To illustrate types of errors in hypothesis testing. (M11/12SP-IVa-1)

Key Concepts:
 Hypothesis testing is a decision-making process for evaluating claims about a population
based on the characteristics of a sample purportedly coming from that population. The
decision is whether the characteristic is acceptable or not. It involves making a decision
between the two opposing hypotheses which are the null hypothesis and alternative
hypothesis.

 A statistical hypothesis is any statement


 The null hypothesis, denoted by 𝐻0, is a statement that there is no difference between
a parameter and a specific value, or that there is no difference between two parameters.
Example:
If the null hypothesis states that the average grade of the students is 50, then in
symbols, we write 𝐻0 : 𝜇 = 50
 The alternative hypothesis, denoted by 𝐻𝑎 , is a statement that there is a difference
between a parameter and a specific value, or that there is a difference between two
parameters. When the alternative hypothesis utilizes the ≠ symbol, the test is said to be
non-directional.
Example:
There are three possible alternative hypotheses which may be formulated from the
above null hypothesis 𝐻0
i. 𝐻𝑎 : 𝜇 < 50 (the average grade of students is less than 50)
ii. 𝐻𝑎 : 𝜇 > 50 (the average grade of students is greater than 50)
iii. 𝐻𝑎 : 𝜇 ≠ 50 (the average grade of students is not equal to 50)
The first two alternative hypotheses each lead to a one-tailed or directional test. The third
alternative hypothesis, on the other hand, leads to a two-tailed or non-directional test.

In problems that involve hypothesis testing, words like greater, efficient, improves,
effective, increases and so on suggest a right-tailed direction. Words like decrease, less than,
smaller, and the like suggest a left-tailed direction.

Example:
Formulate a null hypothesis and its alternative hypothesis and identify whether the alternative
hypothesis is directional or non-directional for each of the following:
1. The average TV viewing time of all five-year old children is 4 hours daily.
Solution:
Null Hypothesis: The average TV viewing time of all five-year old children is 4 hours daily.
(This is the claim.) In symbols, 𝐻0 : 𝜇 = 4
Alternative Hypothesis: The average TV viewing time of all five-year old children is not equal to 4
hours daily. (This is the opposite of the claim.) In symbols, 𝐻𝑎 : 𝜇 ≠ 4
Since 𝐻𝑎 uses the ≠ symbol, the alternative hypothesis is non-directional (two-tailed).

 A decision rule is a criterion that specifies whether or not the null hypothesis should be
rejected in favor of the alternative hypothesis. This decision is based on the value of a test
statistic, the value of which is determined from the sample measurements.

 The critical region is the area under the sampling distribution that includes unlikely sample
outcomes. Also known as the rejection region, this is the area where the null hypothesis is
rejected. The acceptance region, on the other hand, is the region that lies outside the
rejection region. This is the region where the null hypothesis is accepted. The value that separates
the critical region from the acceptance region is called the critical value.

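Example: In a one-tailed test, we have the following diagram:

[Diagram: two curves, one for each direction of a one-tailed test. In each, a single critical value separates the acceptance region from the rejection region in one tail.]

In a two-tailed test, we have the following diagram:

[Diagram: one curve with two critical values. The acceptance region lies between them and a rejection region lies in each tail.]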

 In a hypothesis testing procedure, we take into account the two types of errors that may be
committed in rejecting or accepting the null hypothesis. The Type I error occurs when we
reject the null hypothesis when it is true. The probability of this error is denoted by 𝜶. The Type II error, on the
other hand, occurs when we accept the null hypothesis when it is false. The probability of this error is denoted by 𝜷.
The following table displays the possible consequences of the decision to reject or accept the null
hypothesis.
Decision        Null Hypothesis (𝑯𝟎) is TRUE    Null Hypothesis (𝑯𝟎) is FALSE
Reject 𝐻0       Type I error (𝜶)                Correct Decision
Accept 𝐻0       Correct Decision                Type II error (𝜷)

 Consequently, 𝜶 is called the level of significance, which is interpreted as the maximum
probability of committing a Type I error that the researcher is willing to accept. It should be stressed at this
point that accepting the null hypothesis 𝐻0 does not mean that it is true; it only means that there is
insufficient evidence to reject it.

 In hypothesis testing procedure, the following steps are suggested.

1. State the null hypothesis (𝐻0) and the alternative hypothesis (𝐻𝑎).
2. Decide on the level of significance 𝜶.
3. Determine the decision rule, the appropriate test statistic and the critical region.
4. Gather the given data and compute the value of the test statistic. Check whether the
computed value falls inside the critical region or in the acceptance region.
5. Make the decision and state the conclusion in words.
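Steps 3 to 5 amount to comparing a computed test statistic with a critical value. The sketch below shows one way to express that decision rule. The function name `decide` and the labels "left", "right" and "two" are assumptions made for this illustration, and the string "fail to reject H0" corresponds to what this module calls accepting 𝐻0.

```python
def decide(statistic: float, critical: float, tail: str) -> str:
    """Apply the decision rule for a left-, right-, or two-tailed test.

    `critical` is the positive tabulated value: Z_alpha or t_alpha for a
    one-tailed test, Z_(alpha/2) or t_(alpha/2) for a two-tailed test.
    """
    if tail == "left":
        reject = statistic < -critical
    elif tail == "right":
        reject = statistic > critical
    elif tail == "two":
        reject = abs(statistic) > critical
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return "reject H0" if reject else "fail to reject H0"


# A right-tailed test with critical value 2.326 and a computed statistic of 3.95.
print(decide(3.95, 2.326, "right"))
```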

LESSON 5.2: TEST ON MEANS FOR ONE SAMPLE CASE


1. Formulates the appropriate null and alternative hypotheses on a population mean. M11/12SP-IVb-1
2. Identifies the appropriate form of the test-statistic when: (a) the population variance is assumed
to be known; (b) the population variance is assumed to be unknown; and (c) the Central Limit Theorem
is to be used. M11/12SP-IVb-2
3. Identifies the appropriate rejection region for a given level of significance when: (a) the population
variance is assumed to be known; (b) the population variance is assumed to be unknown; and (c) the
Central Limit Theorem is to be used. M11/12SP-IVc-1
4. Computes for the test-statistic value (population mean). M11/12SP-IVd-1
5. Draws conclusion about the population mean based on the test-statistic value and the rejection
region. M11/12SP-IVd-2
6. Solves problems involving test of hypothesis on the population mean. M11/12SP-IVe-1

Key Concepts:
 Hypothesis testing for the mean of one sample case is concerned with testing the
significance of the deviation of a sample mean from the population mean. This involves
determining whether 𝜎 is known or unknown. The following table shows the form of the null
hypothesis 𝐻0 , the test statistics to be used, the kind of alternative hypothesis and the
corresponding critical region for each type of alternative hypothesis.

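a. $\sigma$ known
   $H_0$: $\mu = \mu_0$
   Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
   $H_a$               Critical region
   $\mu < \mu_0$       $Z < -Z_{\alpha}$
   $\mu > \mu_0$       $Z > Z_{\alpha}$
   $\mu \neq \mu_0$    $|Z| > Z_{\alpha/2}$

b. $\sigma$ unknown
   $H_0$: $\mu = \mu_0$
   Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}}$, with v = n - 1 degrees of freedom
   $H_a$               Critical region
   $\mu < \mu_0$       $t < -t_{\alpha}$
   $\mu > \mu_0$       $t > t_{\alpha}$
   $\mu \neq \mu_0$    $|t| > t_{\alpha/2}$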

 Observe that if 𝜎 is known, the test statistic is the Z-test and if 𝜎 is unknown, the test
statistic is the t-test. These are the Z-test and t-test for the one sample case. Furthermore, it
should be stressed that even if 𝜎 is unknown, as long as n > 30 we use the Z-test
instead of the t-test. Hence, the t-test is reserved for small samples (n < 30) and the Z-test for
sufficiently large samples (n > 30).

Example: An instructor gives his class an achievement test, which as he knows from years of
experience, yields a mean of 80. His present class of 40 students obtains a mean of
85 and a standard deviation of 8. Can he claim that his present class is a superior
class? Use 𝜶 =0.01

Solution: Let 𝝁 = mean grade of the present class.

Step 1: Set up the null and alternative hypotheses.

𝐻0 : 𝝁 = 80 (There is no significant difference between the mean grade of the present
class and that of previous classes)

𝐻𝑎 : 𝝁 > 80 (The present class is superior to the previous classes)

Step 2: The level of significance is set at 𝜶 = 0.01

This means that we are willing to accept a 1% chance of committing a Type I error,
that is, rejecting the null hypothesis when it is true.

Step 3: Establish the test statistic and the critical region.

Here $\sigma$ is unknown, but since n = 40 > 30 we take $s \approx \sigma$ and the test statistic that will be used is

$Z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$

Also, the critical region is given by (using the table of Z-values in Lesson 4.1):

$Z > Z_{\alpha}$
$Z > Z_{0.01}$
$Z > 2.326$

[Diagram: standard normal curve with the acceptance region to the left of the critical value 2.326 and the rejection region to its right.]
Step 4: Computations

Given: $\mu_0 = 80$, $s = 8 \approx \sigma$, $\bar{X} = 85$, n = 40

$Z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{85 - 80}{8 / \sqrt{40}} = \dfrac{5}{1.2649}$
Z = 3.95

Step 5: Decision and Conclusion

The computed Z-value lies in the critical region. Hence, we reject the null hypothesis 𝐻0
and conclude that the population mean 𝝁 > 80. Thus, we can say that his
present class is superior to his previous classes.
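The five steps of this example can be condensed into a short script. This is a sketch under stated assumptions: the function name `one_sample_z` is hypothetical, and the critical value is computed with the standard library's `NormalDist` instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Z statistic for a test on one mean (sigma known, or s used when n > 30)."""
    return (x_bar - mu0) / (sigma / sqrt(n))


# Achievement test example: mu0 = 80, sample mean 85, s = 8, n = 40, alpha = 0.01.
alpha = 0.01
z = one_sample_z(85, 80, 8, 40)
z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value Z_alpha

print(f"Z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if z > z_crit else "fail to reject H0")
```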

Example: According to the College Entrance Examination Board, the mean verbal score on the
Scholastic Aptitude Test (SAT) in 1983 was 425 points out of a possible 800. A random
sample of 25 students last year had mean SAT score of 438 and a standard deviation
of 85. At 𝛼 =0.01 level of significance, does it appear that last year’s mean for verbal
SAT scores has increased over the 1983 mean of 425 points?

Solution: Let 𝝁 = last year’s mean for verbal SAT score.

Step 1: Set up the null and alternative hypotheses.

𝐻0 : 𝝁 = 425 (There is no significant difference between the last year’s mean verbal
SAT score and that of 1983)

𝐻𝑎 : 𝝁 > 425 (The last year’s mean verbal SAT score has increased over that of 1983)

Step 2: The level of significance is set at 𝜶 =0.01

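Step 3: Establish the test statistic and the critical region.

Since n = 25 < 30, the test statistic that will be used is

$t = \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}}$, with v = n - 1 degrees of freedom

Also, the critical region is given by:

$t > t_{\alpha}$
$t > t_{0.01,\ v = n - 1}$
$t > t_{0.01,\ v = 25 - 1}$
$t > t_{0.01,\ v = 24}$
$t > 2.492$

[Diagram: t curve with the acceptance region to the left of the critical value 2.492 and the rejection region to its right.]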
Step 4: Computations

Given: $\mu_0 = 425$, s = 85, $\bar{X} = 438$, n = 25

$t = \dfrac{\bar{X} - \mu_0}{s / \sqrt{n}} = \dfrac{438 - 425}{85 / \sqrt{25}} = \dfrac{13}{17}$
t = 0.7647

Step 5: Decision and Conclusion

The computed t-value is 0.7647 which lies in the acceptance region. Hence, we
accept the null hypothesis and conclude that there is no significant difference between
last year’s mean verbal SAT score and that of 1983.
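The small-sample case differs from the Z-test only in where the critical value comes from. In the sketch below, `one_sample_t` is a hypothetical helper name, and the critical value 2.492 is the tabulated t-value for $\alpha$ = 0.01 with 24 degrees of freedom quoted in Step 3, since Python's standard library has no t-table.

```python
from math import sqrt


def one_sample_t(x_bar: float, mu0: float, s: float, n: int):
    """Return the t statistic and its degrees of freedom for a test on one mean."""
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, n - 1


# SAT example: mu0 = 425, sample mean 438, s = 85, n = 25.
t, df = one_sample_t(438, 425, 85, 25)
t_crit = 2.492   # tabulated t for alpha = 0.01, v = 24 (right-tailed test)

print(f"t = {t:.4f} with v = {df}")
print("reject H0" if t > t_crit else "fail to reject H0")
```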

LESSON 5.3: TEST ON MEANS FOR TWO SAMPLE CASE

 For two independent samples, the following table presents the structure of the null
hypothesis, the test statistics to be used, the different forms of alternative hypothesis and
their corresponding critical region.
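a. $\sigma_1^2$ and $\sigma_2^2$ known
   $H_0$: $\mu_1 - \mu_2 = d_0$
   Test statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
   $H_a$                       Critical region
   $\mu_1 - \mu_2 < d_0$       $Z < -Z_{\alpha}$
   $\mu_1 - \mu_2 > d_0$       $Z > Z_{\alpha}$
   $\mu_1 - \mu_2 \neq d_0$    $|Z| > Z_{\alpha/2}$

b. $\sigma_1^2$ and $\sigma_2^2$ unknown (assumed equal)
   $H_0$: $\mu_1 - \mu_2 = d_0$
   Test statistic: $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - d_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, with $v = n_1 + n_2 - 2$ degrees of freedom,
   where $s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
   $H_a$                       Critical region
   $\mu_1 - \mu_2 < d_0$       $t < -t_{\alpha}$
   $\mu_1 - \mu_2 > d_0$       $t > t_{\alpha}$
   $\mu_1 - \mu_2 \neq d_0$    $|t| > t_{\alpha/2}$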

Example: A statistics test was given to 50 girls and 75 boys. The girls made an average grade of
80 with a standard deviation of 4 and the boys had an average of 86 with a standard
deviation of 6. Is there sufficient evidence at 0.05 level of significance that the average
grades of girls and boys differ?

Solution: We formally follow the steps in testing hypothesis


Let 𝜇1 = average grade of girls
𝜇2 = average grade of boys

Step 1: Set up the null and alternative hypotheses.

𝐻0 : 𝝁𝟏 − 𝝁𝟐 = 𝟎 (There is no significant difference between the average grades of girls
and boys)

𝐻𝑎 : 𝝁𝟏 − 𝝁𝟐 ≠ 𝟎 (There is a significant difference between the average grades of girls
and boys)

Step 2: The level of significance is set at 𝜶 =0.05

Step 3: Establish the test statistics and the critical region.

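We use the Z-statistic even if $\sigma_1^2$ and $\sigma_2^2$ are unknown because the sample sizes of
the two samples are greater than 30. Hence $s_1^2 \approx \sigma_1^2$ and $s_2^2 \approx \sigma_2^2$.

The test statistic is:

$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$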

Also, the critical region is given by:

$|Z| > Z_{\alpha/2}$
$|Z| > Z_{0.025}$
$|Z| > 1.96$

[Diagram: standard normal curve with rejection regions below -1.96 and above 1.96, and the acceptance region between them.]
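Step 4: Computations
We are given the following:

$\bar{X}_1 = 80$, $\bar{X}_2 = 86$, $n_1 = 50$, $n_2 = 75$, $s_1 = 4 \approx \sigma_1$, $s_2 = 6 \approx \sigma_2$, $d_0 = 0$

Computing for the Z-statistic

$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(80 - 86) - 0}{\sqrt{\dfrac{4^2}{50} + \dfrac{6^2}{75}}} = \dfrac{-6}{\sqrt{\dfrac{16}{50} + \dfrac{36}{75}}} = \dfrac{-6}{\sqrt{0.8}} = \dfrac{-6}{0.8944271}$

Z = -6.71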

Step 5: Decision and Conclusion

Notice that Z = -6.71 is located in the rejection region. Thus, we reject the null hypothesis
and conclude that there is a significant difference between the average
grades of girls and boys.
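The two-sample Z computation can be sketched as follows. The function name `two_sample_z` is an assumption made for illustration. As in the example, the sample standard deviations stand in for $\sigma_1$ and $\sigma_2$ because both samples have more than 30 observations.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, x2, sd1, sd2, n1, n2, d0=0.0):
    """Z statistic for the difference between two independent means."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((x1 - x2) - d0) / standard_error


# Girls: mean 80, sd 4, n = 50. Boys: mean 86, sd 6, n = 75. Two-tailed, alpha = 0.05.
alpha = 0.05
z = two_sample_z(80, 86, 4, 6, 50, 75)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_(alpha/2)

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```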

Example: A cardiologist wishes to determine which of the two drugs A or B is more effective in
lowering diastolic blood pressure (BP). In a group of 11 patients, he administered Drug
A and found that the mean diastolic BP was 85 with a standard deviation of 4.7, while
17 patients who took Drug B had a mean diastolic BP of 79 with a standard deviation
of 6.1. Would you say that the mean diastolic BP of those taking Drug A exceeds that of those
taking Drug B by more than 8 mmHg? Use 𝜶 = 0.01 and assume the populations to be
approximately normally distributed with equal variances.

Solution: We formally follow the steps in testing hypothesis


Let 𝜇1 = average diastolic BP of patients taking Drug A
𝜇2 = average diastolic BP of patients taking Drug B

Step 1: Set up the null and alternative hypotheses.

𝐻0 : 𝝁𝟏 − 𝝁𝟐 = 𝟖
𝐻𝑎 : 𝝁𝟏 − 𝝁𝟐 > 𝟖

Step 2: The level of significance is set at 𝜶 =0.01

Step 3: Establish the test statistics and the critical region.

The test statistic that will be used is

$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - d_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

with $v = n_1 + n_2 - 2$ degrees of freedom and where the pooled variance is given by

$s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$

Also, the critical region is given by:

$t > t_{\alpha}$
$t > t_{0.01,\ v = n_1 + n_2 - 2}$
$t > t_{0.01,\ v = 11 + 17 - 2}$
$t > t_{0.01,\ v = 26}$
$t > 2.479$

[Diagram: t curve with the acceptance region to the left of the critical value 2.479 and the rejection region to its right.]

Step 4: Computations
We are given the following:

Drug A               Drug B
$n_1 = 11$           $n_2 = 17$
$\bar{X}_1 = 85$     $\bar{X}_2 = 79$
$s_1 = 4.7$          $s_2 = 6.1$
$d_0 = 8$

Let us first compute:

$s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
$= \dfrac{(11 - 1)(4.7)^2 + (17 - 1)(6.1)^2}{11 + 17 - 2}$
$= \dfrac{(10)(4.7)^2 + (16)(6.1)^2}{11 + 17 - 2}$
$= \dfrac{220.9 + 595.36}{26}$
$= \dfrac{816.26}{26}$
$s_p^2 = 31.394615$
Hence, $s_p = \sqrt{31.394615} = 5.60$

Substituting these values to compute for t, we get:

$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - d_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{(85 - 79) - 8}{5.6 \sqrt{\dfrac{1}{11} + \dfrac{1}{17}}} = \dfrac{-2}{5.6\,(0.3869528)}$

t = -0.923
Step 5: Decision and Conclusion

Clearly, the computed t-value is within the acceptance region, and so we decide
to accept 𝐻0 and conclude that the mean diastolic BP of patients taking Drug A does not
exceed that of patients taking Drug B by more than 8 mmHg.
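The pooled-variance computation is easy to get wrong by hand, so a short sketch may help as a cross-check. The name `pooled_t` is an assumption made for this example, and the critical value 2.479 is the tabulated value quoted in Step 3. Because the code keeps the unrounded value of $s_p$, its t may differ from the hand computation in the third decimal place.

```python
from math import sqrt


def pooled_t(x1, x2, s1, s2, n1, n2, d0=0.0):
    """t statistic for two independent samples with equal (pooled) variances.

    Returns the statistic and its degrees of freedom n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    sp_squared = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
    standard_error = sqrt(sp_squared) * sqrt(1 / n1 + 1 / n2)
    return ((x1 - x2) - d0) / standard_error, df


# Drug A: n = 11, mean 85, s = 4.7. Drug B: n = 17, mean 79, s = 6.1. d0 = 8.
t, df = pooled_t(85, 79, 4.7, 6.1, 11, 17, d0=8)
t_crit = 2.479   # tabulated t for alpha = 0.01, v = 26 (right-tailed test)

print(f"t = {t:.3f} with v = {df}")
print("reject H0" if t > t_crit else "fail to reject H0")
```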

 For two related or paired samples, the following table presents the structure of the null
hypothesis, the test statistic to be used, the different forms of alternative hypotheses and their
corresponding critical region.
$H_0$: $\mu_D = d_0$
Test statistic: $t = \dfrac{\bar{d} - d_0}{s_d / \sqrt{n}}$, where v = n - 1
$H_a$                 Critical region
$\mu_D < d_0$         $t < -t_{\alpha}$
$\mu_D > d_0$         $t > t_{\alpha}$
$\mu_D \neq d_0$      $|t| > t_{\alpha/2}$

Example: An exercise therapist measured the heart rates of 15 randomly selected patients. The
patients were then placed on a running program and their heart rates were measured
again after a week. The results are as follows:
Patient Heart Rate Before Heart Rate After
1 68 67
2 76 77
3 74 74
4 71 74
5 71 69
6 72 70
7 75 71
8 83 77
9 75 71
10 74 74
11 76 73
12 77 68
13 78 71
14 75 72
15 75 77
Do the data provide sufficient evidence that the running program will reduce heart
rates? Use 𝜶 =0.01 level of significance.

Solution: Let 𝜇1 = average heart rate of the patients before the program
𝜇2 = average heart rate of the patients after the program

Step 1: Set up the null and alternative hypotheses.


𝐻0 : 𝝁𝟏 = 𝝁𝟐 (equivalently, $\mu_D = \mu_1 - \mu_2 = 0$, so $d_0 = 0$)
𝐻𝑎 : 𝝁𝟏 > 𝝁𝟐 (equivalently, $\mu_D > 0$)

Step 2: The level of significance is set at 𝜶 =0.01

Step 3: Establish the test statistic and the critical region.

The test statistic is

$t = \dfrac{\bar{d} - d_0}{s_d / \sqrt{n}}$

Also, the critical region is given by:

$t > t_{\alpha}$
$t > t_{0.01,\ v = n - 1}$
$t > t_{0.01,\ v = 15 - 1}$
$t > t_{0.01,\ v = 14}$
$t > 2.624$

[Diagram: t curve with the acceptance region to the left of the critical value 2.624 and the rejection region to its right.]
Step 4: Computations

Patient    Heart Rate Before (x)    Heart Rate After (y)    d = x - y    $d^2$
1          68                       67                      1            1
2          76                       77                      -1           1
3          74                       74                      0            0
4          71                       74                      -3           9
5          71                       69                      2            4
6          72                       70                      2            4
7          75                       71                      4            16
8          83                       77                      6            36
9          75                       71                      4            16
10         74                       74                      0            0
11         76                       73                      3            9
12         77                       68                      9            81
13         78                       71                      7            49
14         75                       72                      3            9
15         75                       77                      -2           4
                                                            $\sum d = 35$    $\sum d^2 = 239$

Computing for $\bar{d}$ and $s_d$, we have

$\bar{d} = \dfrac{\sum d}{n} = \dfrac{35}{15} = 2.33$

$s_d = \sqrt{\dfrac{n \sum d^2 - (\sum d)^2}{n(n - 1)}} = \sqrt{\dfrac{15(239) - (35)^2}{15(15 - 1)}} = \sqrt{\dfrac{3585 - 1225}{210}} = \sqrt{\dfrac{2360}{210}} = \sqrt{11.24} = 3.35$

Computing for t, we obtain:

$t = \dfrac{\bar{d} - d_0}{s_d / \sqrt{n}} = \dfrac{2.33 - 0}{3.35 / \sqrt{15}} = \dfrac{2.33}{3.35 / 3.87} = \dfrac{2.33}{0.8656} = 2.69$

t = 2.69

Step 5: Decision and Conclusion

Since the computed t-value lies within the rejection region, we decide to
reject the null hypothesis. Equivalently, we accept 𝐻𝑎 : 𝝁𝟏 > 𝝁𝟐 and conclude that the
running program will, on average, reduce heart rates.
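The paired-sample computation can be sketched directly from the heart-rate data. This is an illustration only: the variable names are assumptions, and the critical value 2.624 is the tabulated t-value from Step 3. Because the code does not round $\bar{d}$ and $s_d$ along the way, its t comes out slightly above the hand-computed 2.69, which does not change the decision.

```python
from math import sqrt
from statistics import mean, stdev

before = [68, 76, 74, 71, 71, 72, 75, 83, 75, 74, 76, 77, 78, 75, 75]
after = [67, 77, 74, 74, 69, 70, 71, 77, 71, 74, 73, 68, 71, 72, 77]

# Differences d = x - y for each patient (before minus after).
d = [x - y for x, y in zip(before, after)]

n = len(d)
d_bar = mean(d)    # mean of the differences
s_d = stdev(d)     # sample standard deviation of the differences
d0 = 0             # hypothesized mean difference under H0
t = (d_bar - d0) / (s_d / sqrt(n))

t_crit = 2.624     # tabulated t for alpha = 0.01, v = 14 (right-tailed test)
print(f"sum d = {sum(d)}, d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.2f}")
print("reject H0" if t > t_crit else "fail to reject H0")
```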

LESSON 5.4: LINEAR CORRELATION
1. Constructs a scatter plot. M11/12SP-IVg-3
2. Describes shape (form), trend (direction), and variation (strength) based on a scatter plot.
M11/12SP-IVg-4
3. Calculates the Pearson’s sample correlation coefficient. M11/12SP-IVh-2
4. Solves problems involving correlation analysis. M11/12SP-IVh-3
 The linear correlation coefficient is a measure that is used to decide the strength of linear
relationship between two variables X and Y. The population linear correlation is denoted by
the Greek letter rho (𝜌). Since this is a parameter, 𝜌 is estimated by the statistic r. This is
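called Pearson’s coefficient of correlation, given by the formula:

$r = \dfrac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$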

 The correlation coefficient is the single number that represents the degree of relation
between two variables.
 The Pearson Product-Moment Correlation Coefficient (symbolized by r) is the most
common measure of correlation; researchers calculate it when both the X variable and
the Y variable are interval or ratio scale measurements. Mathematically, it can be
defined as the average of the cross-products of z-scores.
 The value of r ranges between ( -1) and ( +1)
 The value of r denotes the strength of the association as illustrated
 The sign of r denotes the nature of association

 If the sign is positive, the relation is direct (an increase in one variable is
associated with an increase in the other variable and a decrease in one variable
is associated with a decrease in the other variable).
 If the sign is negative, the relation is inverse or indirect (an
increase in one variable is associated with a decrease in the other).

Example:
1. The more it rains, the more sales for umbrellas go up.- Positive Correlation
2. A student who has many absences has a decrease in grades- Negative Correlation
3. The longer your hair grows, the more shampoo you will need.- Positive Correlation
4. The older a man gets, the less hair that he has.- Negative Correlation
5. Time spent playing in ML and money spent in Load. _________________________________
6. 0.75 – Strong Positive Correlation
7. -0.32 – Weak Negative Correlation

Example:
A sample of 6 children was selected, and their ages in years and weights in kilograms
were recorded as shown in the following table. Find the relation between age and weight by
computing the simple correlation coefficient:
Age Weight
(years) (Kg)
7 12
6 8
8 12
5 10
6 11
9 13

Make a table and solve for xy, x², and y².

Age in years (x)    Weight in kg (y)    xy     x²     y²
7                   12                  84     49     144
6                   8                   48     36     64
8                   12                  96     64     144
5                   10                  50     25     100
6                   11                  66     36     121
9                   13                  117    81     169
∑x = 41             ∑y = 66             ∑xy = 461    ∑x² = 291    ∑y² = 742

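We are given the following:

n = 6, ∑x = 41, ∑y = 66, ∑xy = 461, ∑x² = 291, ∑y² = 742

Using this formula:

$r = \dfrac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$

Solution:

$r = \dfrac{6(461) - (41)(66)}{\sqrt{\left[6(291) - (41)^2\right]\left[6(742) - (66)^2\right]}} = \dfrac{2766 - 2706}{\sqrt{(65)(96)}} = \dfrac{60}{78.99} = 0.76$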

Thus, the relationship between the age and weight of the 6 selected children is strongly positive.
As the age of the children increases, their weight also tends to increase.
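The computational formula for r used in this example can be sketched as a small function. The name `pearson_r` is an assumption made for illustration, and the function works from the raw (x, y) pairs so that the column totals in the table are rebuilt by the code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the sums n, sum x, sum y, sum xy, sum x^2 and sum y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]
print(f"r = {pearson_r(age, weight):.2f}")
```

The same function applies to any other paired data set, such as anxiety and test scores, by passing the two columns as lists.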

Example: Relationship between Anxiety and Test Scores


Anxiety(X) Test score(Y)
10 2
8 3
2 9
1 7
5 6
6 5

Anxiety (X)    Test score (Y)    X²     Y²     XY
10             2                 100    4      20
8              3                 64     9      24
2              9                 4      81     18
1              7                 1      49     7
5              6                 25     36     30
6              5                 36     25     30
∑X = 32        ∑Y = 32           ∑X² = 230    ∑Y² = 204    ∑XY = 129

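We are given the following:

n = 6, ∑X = 32, ∑Y = 32, ∑XY = 129, ∑X² = 230, ∑Y² = 204

Using this formula:

$r = \dfrac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n \sum X^2 - (\sum X)^2\right]\left[n \sum Y^2 - (\sum Y)^2\right]}}$

Solution:

$r = \dfrac{6(129) - (32)(32)}{\sqrt{\left[6(230) - (32)^2\right]\left[6(204) - (32)^2\right]}} = \dfrac{774 - 1024}{\sqrt{(356)(200)}} = -0.94$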

Thus, the relationship between Anxiety and Test Scores is strongly negative.
As anxiety increases, test scores tend to decrease. This suggests that higher anxiety is
associated with lower test scores among the students.
