SENIOR HIGH SCHOOL Learning
Activity Sheets
STATISTICS AND
PROBABILITY
Learning Activity Sheets for
Grade 11 – Quarter 4
CHAPTER IV: Parametric Statistical Inference: Estimation(page 1-5)
CHAPTER V: Parametric Statistical Inference: Hypothesis Testing(page 6-20)
Reference:
Tomakin, F. Y. (2010). Topics in Applied Statistics 2nd Edition. Block 2 Lot 1 Vel Pal 2 Etsates, Minglanilla, Cebu
6046: STATLINK RESEARCH TRAINING AND DEVELOPMENT.
CHAPTER IV: PARAMETRIC STATISTICAL INFERENCE: ESTIMATION
Learning Objectives:
Researchers of the medical, social, behavioral and educational sciences are at times
confronted with problems of estimation. This unit presents a discussion on the procedures for
estimating the values of unknown parameters from information provided by simple statistics. After
completing this Chapter, the student shall be able to:
Identify the two kinds of estimators;
Construct the confidence intervals for the population mean and population proportion.
In general, we can construct a (1 − 𝛼) 100% confidence interval. The Greek letter 𝛼 is referred
to as the level of significance, and the fraction (1 − 𝛼) is called the confidence coefficient. It is
interpreted as the probability that the interval estimator encloses the true value of the
parameter.
The following table presents the most commonly used confidence coefficients and the
corresponding Z-values.
Confidence coefficient    $\alpha$    $\alpha/2$    $Z_{\alpha}$    $Z_{\alpha/2}$
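The Z-values for a given confidence coefficient can also be computed instead of read from a table. The following is a minimal sketch, assuming Python 3.8 or later and its standard `statistics.NormalDist` class; the function name `z_values` is illustrative and does not come from this activity sheet.

```python
from statistics import NormalDist


def z_values(confidence):
    """Return (alpha, Z_alpha, Z_alpha_over_2) for a confidence coefficient."""
    alpha = 1 - confidence
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha)      # one-tailed critical value
    z_half = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    return alpha, z_alpha, z_half


# Print the values for the three most common confidence coefficients.
for conf in (0.90, 0.95, 0.99):
    alpha, z_a, z_h = z_values(conf)
    print(f"{conf:.2f}  alpha={alpha:.2f}  Z_alpha={z_a:.3f}  Z_alpha/2={z_h:.3f}")
```

Printed tables usually round these values, so small differences in the last digit are expected.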
LESSON 4.2: ESTIMATING THE POPULATION MEAN
1. Computes for the confidence interval estimate based on the appropriate form of the
estimator for the population mean M11/12SP-IIIh-1,
2. Solves problems involving confidence interval estimation of the population mean. M11/12SP-
IIIh-2,
3. Draws conclusion about the population mean based on its confidence interval estimate.
M11/12SP-IIIh-3
Key Concepts:
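The point estimator of 𝜇 is 𝑋̅. The interval estimator of 𝜇 is the (1 − 𝛼) 100% confidence
interval given by:
a. $\left(\bar{X} - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$ when 𝜎 is known (n > 30)
b. $\left(\bar{X} - t_{\alpha/2}\dfrac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\dfrac{s}{\sqrt{n}}\right)$ when 𝜎 is unknown (n < 30), where $t_{\alpha/2}$ is the t-value with
v = n − 1 degrees of freedom.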
Given: n = 50, $\bar{X}$ = 36.38, 𝜎 = 11.07 (known)
(1 − 𝛼) 100% = 90%, so 𝛼 = 1 − 0.90 = 0.10
Since 𝜎 is known, we use (a). Hence, $Z_{\alpha/2} = Z_{0.10/2} = Z_{0.05} = 1.645$. Thus, the 90%
confidence interval for the mean hematocrit 𝜇 of all leukemia patients is given by:
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
$$= \left(36.38 - 1.645\left(\frac{11.07}{\sqrt{50}}\right),\ 36.38 + 1.645\left(\frac{11.07}{\sqrt{50}}\right)\right)$$
$$= \left(36.38 - 1.645\left(\frac{11.07}{7.07106}\right),\ 36.38 + 1.645\left(\frac{11.07}{7.07106}\right)\right)$$
$$= (36.38 - 2.5753,\ 36.38 + 2.5753)$$
$$= (33.80,\ 38.96)$$
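The arithmetic in this example can be checked with a short script. This is a sketch under the same assumptions as formula (a), namely that 𝜎 is known; the function name `z_interval` is hypothetical and `statistics.NormalDist` supplies the Z-value instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known (formula a)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    margin = z * sigma / sqrt(n)             # half-width of the interval
    return sample_mean - margin, sample_mean + margin


# Hematocrit example: n = 50, mean = 36.38, sigma = 11.07, 90% confidence.
low, high = z_interval(36.38, 11.07, 50, 0.90)
print(f"({low:.2f}, {high:.2f})")
```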
Example: The experiment scores of 7 chemistry students are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2,
and 9.6. Find the 95% confidence interval for the mean score of all chemistry students,
assuming an approximate normal distribution for students’ scores.
Solution: Notice here that 𝝈 is unknown and n = 7 < 30. Thus, the second formula (b) will be used. From the
given data, we will compute the mean and the standard deviation. (Refer to Chapter 3 – Lesson 2:
Parameter and Statistics)
Step 1:
Since the mean is not yet given, we solve for it using the formula
$$\bar{X} = \frac{\sum x}{n} = \frac{9.8 + 10.2 + 10.4 + 9.8 + 10.0 + 10.2 + 9.6}{7} = \frac{70.0}{7} = 10.0$$
Step 2:
Solve for the sample standard deviation.
x        x − 𝑋̅     (x − 𝑋̅)²
9.8      -0.2      0.04
10.2     0.2       0.04
10.4     0.4       0.16
9.8      -0.2      0.04
10.0     0         0
10.2     0.2       0.04
9.6      -0.4      0.16
Σ(𝑥 − 𝑋̅)² = 0.48
$$s^2 = \frac{\sum (x - \bar{X})^2}{n - 1} = \frac{0.48}{7 - 1} = \frac{0.48}{6} = 0.08, \qquad s = \sqrt{0.08} \approx 0.283$$
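With s = 0.283 and $t_{\alpha/2} = t_{0.025} = 2.447$ (v = 7 − 1 = 6 degrees of freedom), the 95% confidence
interval for the mean score is:
$$\left(10.0 - 2.447\left(\frac{0.283}{\sqrt{7}}\right),\ 10.0 + 2.447\left(\frac{0.283}{\sqrt{7}}\right)\right)$$
$$= \left(10.0 - 2.447\left(\frac{0.283}{2.64575}\right),\ 10.0 + 2.447\left(\frac{0.283}{2.64575}\right)\right)$$
$$= (10.0 - 0.2617,\ 10.0 + 0.2617)$$
$$= (9.74,\ 10.26)$$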
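The same steps can be sketched in code. The t-value 2.447 is taken from the worked solution above (a t-table lookup for 6 degrees of freedom), because Python's standard library has no t-distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

scores = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
T_CRITICAL = 2.447  # t_{0.025} with v = 6, read from a t-table (not computed here)

n = len(scores)
x_bar = mean(scores)               # Step 1: sample mean
s = stdev(scores)                  # Step 2: sample standard deviation (divides by n - 1)
margin = T_CRITICAL * s / sqrt(n)  # half-width of the interval
low, high = x_bar - margin, x_bar + margin

print(f"mean={x_bar:.1f}  s={s:.3f}  interval=({low:.2f}, {high:.2f})")
```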
LESSON 4.3: ESTIMATING THE POPULATION PROPORTION
1. Identifies the appropriate form of the confidence interval estimator for the population
proportion based on the CLT M11/12SP-IIIi-3
2. Computes for the confidence interval estimate of the population proportion M11/12SP-IIIi-4
3. Solves problems involving confidence interval estimation of the population proportion
M11/12SP-IIIi-5
4. Draws conclusion about the population proportion based on its confidence interval estimate.
M11/12SP-IIIi-6
Key Concepts:
In a binomial experiment, the point estimator of the population proportion p is $\hat{p} = \dfrac{X}{n}$, where
X represents the number of successes in n trials. On the other hand, a (1 − 𝛼) 100%
confidence interval for the population proportion p is given by:
$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$
where $\hat{p} = \dfrac{X}{n}$ and $\hat{q} = 1 - \hat{p}$.
Example: In a random sample of 200 Covid patients, 138 were found to be negative for Covid.
Construct a 95% confidence interval for the proportion of all Covid patients who are negative
for Covid.
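Solution: We are given X = 138 and n = 200.
Hence, $\hat{p} = \dfrac{X}{n} = \dfrac{138}{200} = 0.69$ and $\hat{q} = 1 - \hat{p} = 1 - 0.69 = 0.31$.
$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$
$$= \left(0.69 - 1.96\sqrt{\frac{(0.69)(0.31)}{200}},\ 0.69 + 1.96\sqrt{\frac{(0.69)(0.31)}{200}}\right)$$
$$= (0.63,\ 0.75)$$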
This means that we are 95% confident that the proportion of patients who are negative for
Covid lies between 63% and 75%.
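The proportion interval can be verified the same way. This is a minimal sketch; the function name `proportion_interval` is hypothetical, and the Z-value comes from `statistics.NormalDist` rather than the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Covid example: X = 138 negatives out of n = 200, 95% confidence.
p_low, p_high = proportion_interval(138, 200, 0.95)
print(f"({p_low:.2f}, {p_high:.2f})")
```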
CHAPTER V: PARAMETRIC STATISTICAL INFERENCE: HYPOTHESIS TESTING
Key Concepts:
Hypothesis testing is a decision-making process for evaluating claims about a population
based on the characteristics of a sample purportedly coming from that population. The
decision is whether the characteristic is acceptable or not. It involves making a decision
between the two opposing hypotheses which are the null hypothesis and alternative
hypothesis.
In problems that involve hypothesis testing, words like greater, efficient, improves,
effective, increases, and so on suggest a right-tailed direction. Words like decrease, less than,
smaller, and the like suggest a left-tailed direction.
Example:
Formulate a null hypothesis and its alternative hypothesis and identify whether the alternative
hypothesis is directional or non-directional for each of the following:
1. The average TV viewing time of all five-year-old children is 4 hours daily.
Solution:
Null Hypothesis: The average TV viewing time of all five-year-old children is 4 hours daily.
(This is the claim.) In symbols, 𝐻0 : 𝜇 = 4
Alternative Hypothesis: The average TV viewing time of all five-year-old children is not equal to 4
hours daily. (This is the opposite of the claim.) In symbols, 𝐻𝑎 : 𝜇 ≠ 4
Since 𝐻𝑎 uses ≠ and does not specify a direction, it is non-directional (two-tailed).
A decision rule is a criterion that specifies whether or not the null hypothesis should be
rejected in favor of the alternative hypothesis. This decision is based on the value of a test
statistic, the value of which is determined from the sample measurements.
The critical region is the area under the sampling distribution that includes unlikely sample
outcomes. Also known as the rejection region, this is the area where the null hypothesis is
rejected. The acceptance region, on the other hand, is the region that lies outside the
rejection region. This is the region where the null hypothesis is accepted. The value that separates
the critical region from the acceptance region is called the critical value.
[Figure: sampling distribution curves showing the critical value(s), the critical (rejection) region, and the acceptance region]
In a hypothesis testing procedure, we take into account the two types of errors that may be
committed in rejecting or accepting the null hypothesis. The Type I error occurs when we
reject the null hypothesis when it is true. The probability of this error is denoted by 𝜶. The Type II error, on the
other hand, occurs when we accept the null hypothesis when it is false. The probability of this error is denoted by 𝜷.
The following table displays the possible consequences of the decision to reject or accept the null
hypothesis.
Decision        Null Hypothesis (𝑯𝟎) is TRUE     Null Hypothesis (𝑯𝟎) is FALSE
Reject 𝑯𝟎       Type I error (𝜶)                 Correct decision
Accept 𝑯𝟎       Correct decision                 Type II error (𝜷)
In hypothesis testing procedure, the following steps are suggested.
Key Concepts:
Hypothesis testing for the mean of one sample case is concerned with testing the
significance of the deviation of a sample mean from the population mean. This involves
determining whether 𝜎 is known or unknown. The following table shows the form of the null
hypothesis 𝐻0 , the test statistic to be used, the kind of alternative hypothesis and the
corresponding critical region for each type of alternative hypothesis.
Observe that if 𝜎 is known, the test statistic is the Z-test, and if 𝜎 is unknown, the test
statistic is the t-test. These are the Z-test and the t-test for the one sample case. Furthermore, it
should be stressed that even if 𝜎 is unknown, as long as n > 30 we use the Z-test
instead of the t-test. Hence, the t-test is reserved for small samples (n < 30) and the Z-test for
sufficiently large samples (n > 30).
Example: An instructor gives his class an achievement test, which as he knows from years of
experience, yields a mean of 80. His present class of 40 students obtains a mean of
85 and a standard deviation of 8. Can he claim that his present class is a superior
class? Use 𝜶 =0.01
This means that we are willing to accept a 1% chance of committing a Type I error,
that is, rejecting the null hypothesis when it is true.
Here 𝜎 is unknown, but since n = 40 > 30, the test statistic that will be used is
$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$
with the sample standard deviation s = 8 used in place of 𝜎.
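The critical region is given by:
$$Z > Z_{\alpha} \quad\Rightarrow\quad Z > Z_{0.01} \quad\Rightarrow\quad Z > 2.32$$
[Figure: standard normal curve with the acceptance region to the left of 2.32 and the rejection region to the right]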
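Step 4: Computations
$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{85 - 80}{8 / \sqrt{40}} = \frac{5}{1.2649} \approx 3.95$$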
Step 5: Decision and Conclusion
The computed Z-value lies in the critical region. Hence, the null hypothesis 𝐻0
is rejected, and we conclude that the population mean 𝝁 > 80. Thus, we can say that his
present class is superior to his previous classes.
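The decision in this example can be reproduced with a short script. This is a sketch, assuming the sample standard deviation stands in for 𝜎 because n > 30; the function name `one_sample_z` is hypothetical. The computed critical value is about 2.326, which the sheet's table rounds to 2.32.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sd, n):
    """Z statistic for testing a single mean against mu0."""
    return (sample_mean - mu0) / (sd / sqrt(n))


alpha = 0.01
z = one_sample_z(sample_mean=85, mu0=80, sd=8, n=40)
z_critical = NormalDist().inv_cdf(1 - alpha)  # right-tailed test
reject_h0 = z > z_critical

print(f"Z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```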
Example: According to the College Entrance Examination Board, the mean verbal score on the
Scholastic Aptitude Test (SAT) in 1983 was 425 points out of a possible 800. A random
sample of 25 students last year had mean SAT score of 438 and a standard deviation
of 85. At 𝛼 =0.01 level of significance, does it appear that last year’s mean for verbal
SAT scores has increased over the 1983 mean of 425 points?
𝐻0 : 𝝁 = 425 (There is no significant difference between the last year’s mean verbal
SAT score and that of 1983)
𝐻𝑎 : 𝝁 > 425 (The last year’s mean verbal SAT score has increased over that of 1983)
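Since n = 25 < 30 and 𝜎 is unknown, the test statistic that will be used is
$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}, \qquad v = n - 1$$
Also, the critical region is given by:
$$t > t_{\alpha} \quad\Rightarrow\quad t > t_{0.01,\ v = n - 1} \quad\Rightarrow\quad t > t_{0.01,\ v = 25 - 1}$$
[Figure: t distribution showing the acceptance region and the rejection region in the right tail]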
The computed t-value is 0.7647 which lies in the acceptance region. Hence, we
accept the null hypothesis and conclude that there is no significant difference between
last year’s mean verbal SAT score and that of 1983.
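The computed t-value can be checked as follows. This is a sketch; the critical value 2.492 for 𝛼 = 0.01 with 24 degrees of freedom is an assumption taken from a standard t-table, since it does not appear in the text above, and the function name `one_sample_t` is illustrative.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for testing a single mean when sigma is unknown."""
    return (sample_mean - mu0) / (s / sqrt(n))


T_CRITICAL = 2.492  # assumed t-table value: alpha = 0.01, v = 24 (right-tailed)

t_stat = one_sample_t(sample_mean=438, mu0=425, s=85, n=25)
reject_h0 = t_stat > T_CRITICAL

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```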
LESSON 5.3: TEST ON MEANS FOR TWO SAMPLE CASE
For two independent samples, the following table presents the structure of the null
hypothesis, the test statistic to be used, the different forms of the alternative hypothesis and
their corresponding critical regions.
a. $\sigma_1^2$ and $\sigma_2^2$ known
𝐻0: $\mu_1 - \mu_2 = d_0$
Test Statistic:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
𝐻𝑎                          Critical Region
$\mu_1 - \mu_2 < d_0$       $Z < -Z_{\alpha}$
$\mu_1 - \mu_2 > d_0$       $Z > Z_{\alpha}$
$\mu_1 - \mu_2 \ne d_0$     $|Z| > Z_{\alpha/2}$
Example: A statistics test was given to 50 girls and 75 boys. The girls made an average grade of
80 with a standard deviation of 4 and the boys had an average of 86 with a standard
deviation of 6. Is there sufficient evidence at the 0.05 level of significance that the average
grades of girls and boys differ?
We use the Z-statistic even if $\sigma_1^2$ and $\sigma_2^2$ are unknown because the sizes of
the two samples are both greater than 30. Hence $s_1^2 \approx \sigma_1^2$ and $s_2^2 \approx \sigma_2^2$.
Also, the critical region is given by:
$$|Z| > Z_{\alpha/2} \quad\Rightarrow\quad |Z| > Z_{0.025} \quad\Rightarrow\quad |Z| > 1.96$$
[Figure: standard normal curve with rejection regions below −1.96 and above 1.96 and the acceptance region between them]
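Step 4: Computations
We are given the following:
$\bar{X}_1 = 80$, $\bar{X}_2 = 86$, $n_1 = 50$, $n_2 = 75$, $\sigma_1 = 4$, $\sigma_2 = 6$, $d_0 = 0$
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(80 - 86) - 0}{\sqrt{\dfrac{4^2}{50} + \dfrac{6^2}{75}}} = \frac{-6}{\sqrt{\dfrac{16}{50} + \dfrac{36}{75}}} = \frac{-6}{\sqrt{0.8}} = \frac{-6}{0.8944271}$$
𝒁 = −6.71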
Notice that 𝒁 = −6.71 is located in the rejection region. Thus, the null hypothesis
is rejected, and we conclude that there is a significant difference between the average
grades of girls and boys.
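A short script can confirm this two-sample computation. It is a sketch only, assuming as above that the sample standard deviations approximate $\sigma_1$ and $\sigma_2$; the function name `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, d0=0.0):
    """Z statistic for the difference of two independent means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - d0) / standard_error


alpha = 0.05
z = two_sample_z(mean1=80, mean2=86, sd1=4, sd2=6, n1=50, n2=75)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed test
reject_h0 = abs(z) > z_critical

print(f"Z = {z:.2f}, critical value = {z_critical:.2f}, reject H0: {reject_h0}")
```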
Example: A cardiologist wishes to determine which of the two drugs, A or B, is more effective in
lowering diastolic blood pressure (BP). In a group of 11 patients, he administered Drug
A and found that the mean diastolic BP was 85 with a standard deviation of 4.7, while
17 patients who took Drug B had a mean diastolic BP of 79 with a standard deviation
of 6.1. Would you say that the mean diastolic BP of those taking Drug A exceeds that of those taking
Drug B by more than 8 mmHg? Use 𝜶 = 0.01 and assume the populations to be
approximately normally distributed with equal variances.
𝐻0 : 𝝁𝟏 − 𝝁𝟐 = 𝟖
𝐻𝑎 : 𝝁𝟏 − 𝝁𝟐 > 𝟖
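The test statistic that will be used is
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - d_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
with v = $n_1 + n_2 - 2$ degrees of freedom and where the pooled
variance is given by
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
[Figure: t distribution with the rejection region to the right of the critical value 2.479]
Step 4: Computations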
We are given the following:
Drug A Drug B 𝑑0 = 8
𝑛1 = 11 𝑛2 = 17
𝑋̅1 = 85 𝑋̅2 = 79
𝑠1= 4.7 𝑠2= 6.1
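$$s_p = \sqrt{\frac{(11 - 1)(4.7)^2 + (17 - 1)(6.1)^2}{11 + 17 - 2}} = \sqrt{\frac{816.26}{26}} \approx 5.6$$
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - d_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(85 - 79) - 8}{5.6\sqrt{\dfrac{1}{11} + \dfrac{1}{17}}} = \frac{-2}{5.6\,(0.3869528)}$$
t = −0.923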
Step 5: Decision and Conclusion
Clearly, the computed t-value is within the acceptance region, and so we decide
to accept 𝐻0 and conclude that Drug A does not increase the diastolic BP compared to
Drug B by more than 8 mmHg.
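The pooled-variance computation can be sketched in code. The critical value 2.479 is the one shown for this example (𝛼 = 0.01, v = 26); the function name `pooled_t` is hypothetical. Because the script does not round $s_p$ to 5.6, its t-value may differ from −0.923 in the third decimal place.

```python
from math import sqrt


def pooled_t(mean1, mean2, s1, s2, n1, n2, d0=0.0):
    """Return (pooled standard deviation, t statistic) for two independent samples."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    s_p = sqrt(pooled_var)
    t = ((mean1 - mean2) - d0) / (s_p * sqrt(1 / n1 + 1 / n2))
    return s_p, t


T_CRITICAL = 2.479  # from the example: alpha = 0.01, v = 11 + 17 - 2 = 26

s_p, t_stat = pooled_t(mean1=85, mean2=79, s1=4.7, s2=6.1, n1=11, n2=17, d0=8)
reject_h0 = t_stat > T_CRITICAL  # right-tailed test

print(f"s_p = {s_p:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```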
For two related or paired samples, the following table presents the structure of the null
hypothesis, the test statistic to be used, the different forms of alternative hypotheses and their
corresponding critical regions.
𝐻0: $\mu_D = d_0$
Test Statistic:
$$t = \frac{\bar{d} - d_0}{s_d / \sqrt{n}}, \qquad \text{where } v = n - 1$$
𝐻𝑎                   Critical Region
$\mu_D < d_0$        $t < -t_{\alpha}$
$\mu_D > d_0$        $t > t_{\alpha}$
$\mu_D \ne d_0$      $|t| > t_{\alpha/2}$
Example: An exercise therapist measured the heart rates of 15 randomly selected patients. The
patients were then placed on a running program and their heart rates were measured
again after a week. The results are as follows:
Patient Heart Rate Before Heart Rate After
1 68 67
2 76 77
3 74 74
4 71 74
5 71 69
6 72 70
7 75 71
8 83 77
9 75 71
10 74 74
11 76 73
12 77 68
13 78 71
14 75 72
15 75 77
Do the data provide sufficient evidence that the running program will reduce heart
rates? Use 𝜶 =0.01 level of significance.
Solution: Let 𝜇1 = average heart rate of the patients before the program
𝜇2 = average heart rate of the patients after the program
Step 2: The level of significance is set at 𝜶 =0.01
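$$\bar{d} = \frac{\sum d}{n} = \frac{35}{15} = 2.33$$
$$s_d = \sqrt{\frac{n\sum d^2 - \left(\sum d\right)^2}{n(n - 1)}} = \sqrt{\frac{15(239) - (35)^2}{15(15 - 1)}} = \sqrt{\frac{3585 - 1225}{210}} = \sqrt{\frac{2360}{210}} = \sqrt{11.24} = 3.35$$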
Computing for t, we obtain:
$$t = \frac{\bar{d} - d_0}{s_d / \sqrt{n}} = \frac{2.33 - 0}{3.35 / \sqrt{15}} = \frac{2.33}{3.35 / 3.87} = \frac{2.33}{0.8656} = 2.69$$
𝒕 = 2.69
Since the computed t-value lies within the rejection region, we decide to
reject the null hypothesis. Equivalently, we accept 𝐻𝑎 : 𝝁𝟏 > 𝝁𝟐 and conclude that the
running program will, on the average, reduce heart rates.
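The paired computation can be reproduced from the heart-rate table. This is a sketch; the critical value 2.624 (𝛼 = 0.01, v = 14) is an assumption taken from a standard t-table, since it is not shown in the text above. Without rounding the intermediate results, the script gives t of about 2.70 rather than 2.69, which does not change the decision.

```python
from math import sqrt
from statistics import mean, stdev

before = [68, 76, 74, 71, 71, 72, 75, 83, 75, 74, 76, 77, 78, 75, 75]
after = [67, 77, 74, 74, 69, 70, 71, 77, 71, 74, 73, 68, 71, 72, 77]

# d = before - after, so a positive d means the heart rate went down.
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation of the differences

t_stat = (d_bar - 0) / (s_d / sqrt(n))
T_CRITICAL = 2.624  # assumed t-table value: alpha = 0.01, v = 14 (right-tailed)
reject_h0 = t_stat > T_CRITICAL

print(f"sum d = {sum(d)}, d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t_stat:.2f}")
```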
LESSON 5.4: LINEAR CORRELATION
1. Constructs a scatter plot. M11/12SP-IVg-3
2. Describes shape (form), trend (direction), and variation (strength) based on a scatter plot.
M11/12SP-IVg-4
3. Calculates the Pearson’s sample correlation coefficient. M11/12SP-IVh-2
4. Solves problems involving correlation analysis. M11/12SP-IVh-3
The linear correlation coefficient is a measure that is used to decide the strength of the linear
relationship between two variables X and Y. The population linear correlation is denoted by
the Greek letter rho (𝜌). Since this is a parameter, 𝜌 is estimated by the statistic r. This is
called Pearson’s coefficient of correlation, given by the formula:
$$r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
The correlation coefficient is the single number that represents the degree of relation
between two variables.
The Pearson Product-Moment Correlation Coefficient (symbolized by r) is the most
common measure of correlation; researchers calculate it when both the X variable and
the Y variable are interval or ratio scale measurements. Mathematically, it can be
defined as the average of the cross-products of z-scores.
The value of r ranges between −1 and +1.
The value of r denotes the strength of the association: the closer |r| is to 1, the stronger the association.
The sign of r denotes the nature of the association.
If the sign is positive, the relation is direct (an increase in one variable is
associated with an increase in the other variable, and a decrease in one variable
is associated with a decrease in the other variable).
If the sign is negative, the relation is inverse or indirect (an
increase in one variable is associated with a decrease in the other).
Example:
1. The more it rains, the more sales for umbrellas go up.- Positive Correlation
2. A student who has many absences has a decrease in grades- Negative Correlation
3. The longer your hair grows, the more shampoo you will need.- Positive Correlation
4. The older a man gets, the less hair that he has.- Negative Correlation
5. Time spent playing in ML and money spent in Load. _________________________________
6. 0.75 – Strong Positive Correlation
7. -0.32 – Weak Negative Correlation
Example:
A sample of 6 children was selected, and data about their age in years and weight in kilograms
were recorded as shown in the following table. Find the relation between age and weight by computing
the simple correlation coefficient:
Age Weight
(years) (Kg)
7 12
6 8
8 12
5 10
6 11
9 13
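Solution: Make a table and solve for XY, X², and Y², where X is age and Y is weight. The column totals are
ΣX = 41, ΣY = 66, ΣXY = 461, ΣX² = 291, and ΣY² = 742.
$$r = \frac{6(461) - (41)(66)}{\sqrt{\left[6(291) - (41)^2\right]\left[6(742) - (66)^2\right]}} = \frac{60}{\sqrt{(65)(96)}} \approx 0.76$$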
Thus, the relationship between age and weight of the 6 selected children is strongly positive.
As the age of the children increases, their weight also tends to increase. We can say that as
the children grow older, their weight tends to increase as well.
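The column totals and the value of r in this example can be checked with a short script. This is a minimal sketch of the computational formula used in the solution; the function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the raw sums, as in the table-based solution."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


age = [7, 6, 8, 5, 6, 9]           # years
weight = [12, 8, 12, 10, 11, 13]   # kilograms

r_age_weight = pearson_r(age, weight)
print(f"r = {r_age_weight:.2f}")
```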
Anxiety (X)    Test score (Y)    X²     Y²    XY
10             2                 100    4     20
8              3                 64     9     24
2              9                 4      81    18
1              7                 1      49    7
5              6                 25     36    30
6              5                 36     25    30
Solution:
$$r = \frac{6(129) - (32)(32)}{\sqrt{\left[6(230) - (32)^2\right]\left[6(204) - (32)^2\right]}} = \frac{774 - 1024}{\sqrt{(356)(200)}} = -0.94$$
Thus, the relationship between anxiety and test scores is strongly negative.
As anxiety increases, test scores tend to decrease. We can say that higher anxiety is associated
with lower test scores among the students.
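Earlier, r was described as the average of the cross-products of z-scores. The sketch below applies that definition to the anxiety data and should agree with the value from the computational formula. It assumes the z-scores use the population standard deviation (dividing by n), which is what makes the two forms match; the helper name `z_scores` is hypothetical.

```python
from statistics import mean, pstdev

anxiety = [10, 8, 2, 1, 5, 6]
test_score = [2, 3, 9, 7, 6, 5]


def z_scores(values):
    """Standardize values using the population standard deviation (divides by n)."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]


# r as the average of the cross-products of the paired z-scores.
cross_products = [zx * zy for zx, zy in zip(z_scores(anxiety), z_scores(test_score))]
r_anxiety = mean(cross_products)

print(f"r = {r_anxiety:.2f}")
```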