
Probability and Statistics


Module 3: Sampling and Sampling Distribution

Objectives: At the end of the module, I can:

1. illustrate random sampling.


2. distinguish between a parameter and a statistic.
3. identify sampling distributions of statistics (sample mean).
4. find the mean and variance of the sampling distribution of the sample mean.
5. define the sampling distribution of the sample mean for normal population when the
variance is a) known b) unknown.
6. illustrate the Central Limit Theorem.
7. define the sampling distribution of the sample mean using the Central Limit Theorem.
8. solve problems involving sampling distributions of the sample mean.

Introduction

In the previous modules, you have encountered some unfamiliar terms including population, sample,
parameter, and statistic. These terms are commonly used in probability and statistics, and a deeper knowledge of
them helps facilitate your understanding of the subject. This module focuses on these terms as well as the different
sampling techniques used in conducting research and studies.

Population and Sample

In statistics, the term population refers to the totality of observations or elements
from a set of data. Suppose a teacher conducts a study on the correlation of the
students’ entrance examination scores and their respective academic performance. To
ensure the validity of his findings, he decided to include all the students who are enrolled
for the current school year under a certain program or course, hence, the entire
population.
On the other hand, a sample refers to one or more elements taken from the population
for a specific purpose. In other words, this is a subset or a representative of the
population that is selected for a specific purpose. For example, take the scenario given
above. Because of budget issues and feasibility concerns, the teacher decided to include
only a group of 200 students to participate in his study. This illustrates a sample.

Fig. 7.1 Illustration of a population and a sample
In most studies, only a sample of the population is considered. This saves time,
money, and effort on the part of the researcher. In medical technology, only a sample of
blood is needed to determine the blood type of a patient. Also, in microbiology, only a
sample of water is needed to determine the potable quality of water from a pumping
station. In some studies, however, it is inevitable to include all the members of the
population. One illustration is a government census, which surveys the whole population
in order to get necessary information.

Parameter Versus Statistic

The useful information from a population and a sample may be derived if proper
descriptions are made. These descriptions are usually given as numerical measurements
such as the mean or standard deviation.
A numerical measure that describes the whole population is called a parameter. For
example, if all the students in a school are surveyed about their heights and an average
height of 65 inches (in) is determined, then 65 in is called a population parameter. A
numerical measure that describes a sample is called a statistic. If only 50 students out of
230 students are surveyed and their average height is found to be 65 in, then 65 in is
called a sample statistic.
To further understand the difference between parameter and statistic, study the
following examples.

Example 1: In the following statements, identify the parameter and the statistic used in the study.
1. The Food and Nutrition Research Institute of the Department of Science and Technology (FNRI-
DOST) surveyed 14 million Filipino adults aged 20 and above and determined that 80% of Filipino
adults are at risk of hypertension.
2. A researcher wants to estimate the average death age of Filipino women in the last decade and
from a sample of 100 deaths, he obtained a sample mean age of 73.
3. Capvex is a drug used to treat patients with metastatic breast cancer. An oncologist wants to
determine the proportion of patients taking Capvex who are healed within 10 weeks. A random
sample of 300 breast cancer patients was selected and 250 of them were healed after 10 weeks.
Solution:
Parameter and statistic for each statement:
1. a. The parameter is the percentage of adults at risk of hypertension out of all Filipino adults aged 20 and above.
   b. The statistic is the 80% obtained from the sample of 14 million adults.
2. a. The parameter is the mean death age taken from the data including all Filipino women who died in the last 10 years.
   b. The statistic is the mean age of 73 obtained from the sample of 100 deaths.
3. a. The parameter is the proportion of patients healed by Capvex within 10 weeks out of all patients taking it.
   b. The statistic is the proportion $\frac{250}{300} \approx 0.833$ obtained from the sample of 300 patients.

Assessment 3.1

Sampling Techniques

When conducting studies where only a few members of the population can
participate, the selection of a sample is crucial, as wrong sampling can lead to invalid
results. Researchers need to guarantee that the sample chosen to take part in a study is
representative of the entire population. Thus, a proper sampling technique must be
carried out in order to ensure that the results of the study will not be put to waste.
Suppose you are conducting a research which will be submitted as part of your final
requirement in your English subject. Your research aims to identify the English language

proficiency of the junior high school students in your school. To implement your research,
you chose a sample which consists of all students belonging to the top ten of each grade
level. Do you think you will get valid results considering that there is bias in the selection
of your sample? In general, there are two categories of sampling techniques which
depend on the consideration of the chances of selecting one member of the population
to be included in the sample. These are probability sampling and nonprobability sampling
techniques. The difference between the two sampling techniques is very distinct.

In probability sampling, each member of the population has a known probability of
being selected in the sample, while in nonprobability sampling, there is bias in the
selection and there is no recognized probability that one member will be included in
the sample.

Probability Versus Nonprobability Sampling


A sample is a small, representative part of the population. Samples may be selected from the
population using either probability (unbiased) or nonprobability (biased) sampling.

Simple Random Sampling

Simple random sampling is the most commonly used sampling technique. In
this technique, each member of the population has an equal chance to be selected
as a participant. The process is done by choosing the members of the sample one by
one, using either the lottery method or the tables of random numbers.
In the lottery method, all the members of the population are assigned
specific numbers, which are then written on pieces of paper and placed in a fishbowl
(or box). The researcher then selects numbered pieces from the bowl, one at a time.
All members that correspond to the selected numbers will make up the sample.
The second method is through the use of the tables of random numbers.
The random number table is composed of seven pages (tables) with digits from 0 to 9
having the same frequency of occurrence.

Here are the steps when using the tables of random numbers in selecting the samples:
1. Assign numbers to each member of the population.
2. Choose a table number arbitrarily.
3. With eyes closed and a pencil or pen in hand, choose the set of numbers from which to
start. For example, if you selected Table 1 and your pen pointed to the 3rd column of the
2nd row, then you have the following numbers: 43886, 94107, 73847, 38244, and 61157
(see the table below).

Table 7.1 Table 1 of the Random Number Table

4. The number of digits to be considered in the random numbers selected depends on the
number of digits needed. In the example given above, if you need two-digit numbers,
then you will choose the members numbered 43, 94, 73, 38, 61. If three-digit numbers
are needed, then you will have 438, 941, 738, 382, 611. Ignore the numbers which do
not exist in the given population.
5. Repeat the process until you reach the desired number of members in the sample.
Ignore duplicates.
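
The lottery method can also be imitated on a computer. The following is a minimal sketch, assuming the members are numbered 1 to N; the function name `simple_random_sample` and the `seed` parameter are illustrative choices and are not part of the module.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct member numbers from 1..population_size."""
    # The seed is optional and only makes the draw repeatable.
    rng = random.Random(seed)
    # random.sample draws without replacement, so duplicates cannot occur
    # and every member has the same chance of being selected.
    members = range(1, population_size + 1)
    return sorted(rng.sample(members, sample_size))


# Select 5 students from a class list numbered 1 to 40.
sample = simple_random_sample(population_size=40, sample_size=5, seed=1)
print(sample)
```

Because the draw is made without replacement, there is no need to skip out-of-range numbers or duplicates as in the table method.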

Example 2: Suppose five students will be selected from the list of 40 students in a class numbered 1 to 40.
Using Table 3 of the tables of random numbers and reading downward from the 4th column of the
6th row, the two-digit numbers are 11, 63, 79, 87, and 22, so students 11 and 22 could be
members of the sample. You ignore 63, 79, and 87 since there are only 40 students.
To complete the sampling process, choose another column and row, say the 9th column of the 1st row.
Reading downward gives 44, 07, 21, 55, and 27, so students numbered 07, 21, and 27 will be part of
the sample.

Table 7.2. Table 3 of the Random Number Table.

37100 62492 63642 47638 13925 80113 88076 42575 44078 62703
53406 13855 38519 29500 62479 01036 87964 44498 07793 21599
55172 81556 18856 59043 64315 38270 25677 01965 21310 28115
40353 84807 47767 46890 16053 32415 60259 99788 55924 22077
18899 09612 77541 57675 70153 41179 97535 82889 27214 03482
68141 25340 92551 11326 60939 79355 41544 88926 09111 86431
51559 91159 81310 63251 91799 41215 87412 35317 74271 11603
92214 33386 73459 79359 65867 39269 57527 69551 17495 91456
15089 50557 33166 87094 52425 21211 41876 42525 36625 63964
96461 00604 11120 22254 16763 19206 67790 88362 01880 37911
A drawback of the lottery method and the tables of random numbers is that when the population is very large, e.g.,
1 000 members, it becomes very difficult and time consuming to write all the numbers and to pick the
samples. In this case, systematic random sampling is preferable.

Systematic Random Sampling

Systematic sampling is a random sampling technique which includes every qth element of the population in
the sample, beginning from a random starting point selected among the first q members. To carry out this sampling
technique, follow this three-step procedure:
1. Assign a number to each member of the population.
2. Compute the sampling interval (q) by dividing the number of members in the population by the desired
number of samples. The quotient q marks off the first q members of the population, and the random
starting point (s) is chosen from them by the lottery method.
3. From member number s, skip count by q repeatedly until the desired number of samples is completed.

Example 3: A sample of 10 will be selected from a population of 40 patients. Since q = 40/10 = 4, choose
the random starting point from the first four patients using the lottery method. Suppose 3 was selected. Then
every 4th element of the population, starting from patient 3, will be included in the sample, that is, the
patients numbered 3, 7, 11, 15, 19, 23, 27, 31, 35, and 39.
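
A minimal sketch of systematic sampling in its usual textbook form, assuming the skip interval is the quotient q = N/n and the starting point is drawn from the first q members. The function name `systematic_sample` and its `start` parameter are hypothetical and serve only for illustration.

```python
import random


def systematic_sample(population_size, sample_size, start=None):
    """Return the member numbers selected by systematic sampling."""
    interval = population_size // sample_size  # the quotient q
    if start is None:
        # Random starting point among the first q members.
        start = random.randint(1, interval)
    # Take the starting member, then every q-th member after it.
    return [start + i * interval for i in range(sample_size)]


# 10 patients out of 40, with the lottery giving starting point 3.
print(systematic_sample(40, 10, start=3))
# → [3, 7, 11, 15, 19, 23, 27, 31, 35, 39]
```

Skipping by the interval q rather than by the starting number spreads the selected members over the whole list, so members near the end of the list can also be chosen.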

Stratified Sampling

Sometimes, a given population is purposively divided into homogeneous partitions (or groups) depending on
certain factors that might be affecting the results of the study. These homogeneous partitions are also called strata
(singular stratum). The sampling for this partitioned population is usually done through a stratified random sampling
technique.
Under stratified random sampling conditions, one has to make sure that each stratum of the population is
properly represented in the sample. This is achieved by randomly selecting the samples proportionally from all the
strata. Study the following example.

Example 4: A sample of 100 students is to be selected from a junior high school population of 1 000, of which
250 are in grade 7, 200 are in grade 8, 300 are in grade 9 and 250 are in grade 10. If the sample
size is to be proportionally distributed, how many samples are to be taken from each stratum?

Solution:

Partition    Size of the Partition    Number of Samples
Grade 7      250                      $\frac{250}{1\,000} \times 100 = 25$
Grade 8      200                      $\frac{200}{1\,000} \times 100 = 20$
Grade 9      300                      $\frac{300}{1\,000} \times 100 = 30$
Grade 10     250                      $\frac{250}{1\,000} \times 100 = 25$
Total        1 000                    100
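
The proportional allocation in the table above can be sketched in a few lines. The function name `proportional_allocation` is an illustrative assumption; the strata sizes are those of Example 4.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate a sample to strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    # Each stratum gets (stratum size / population size) * sample size.
    # Rounding is needed in general; the rounded shares may then not add up
    # exactly to sample_size and would need a manual adjustment.
    return {
        name: round(size * sample_size / total)
        for name, size in strata_sizes.items()
    }


strata = {"Grade 7": 250, "Grade 8": 200, "Grade 9": 300, "Grade 10": 250}
allocation = proportional_allocation(strata, sample_size=100)
print(allocation)
# → {'Grade 7': 25, 'Grade 8': 20, 'Grade 9': 30, 'Grade 10': 25}
```

After the allocation, the members within each stratum would still be chosen at random, for example by simple random sampling.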

Note that stratified random sampling is particularly useful only in conditions when the population is divided into
homogeneous groups, that is, the members are grouped similarly based on a controlling variable in the study such as
gender, race, civil status, or nationality.

Cluster Sampling

Like stratified sampling, the population is divided into groups, called clusters, in another probability sampling
technique called cluster sampling. However, unlike stratified sampling, the clusters are heterogeneous groups of the
population. This means that they are grouped differently according to the controlling variables of the study. The sample
is taken through a random selection of cluster(s) and then, all the members of the chosen cluster(s) will be part of the
samples.

Example 5: Suppose a researcher wants to study the effect of a certain teaching methodology among the
students coming from a particular town. Since there are many schools in the town, it will be very
inefficient and impractical to consider all the schools in the study. Instead, the researcher will
randomly choose a few schools and then, the students in these schools will be surveyed.

Assessment 3.2:

The Sampling Distribution

Consider a population with 50 members and suppose a sample of 15 members is selected. At first, you can
randomly take a sample of 15 and you know that this sample may be described using its mean, variance, and standard
deviation. The mean, variance, and standard deviation are called sample statistics. After returning the 15 members
previously chosen, choose another sample of 15 and compute its corresponding mean, variance, and standard deviation.
The new sample statistics could be different from the ones you identified earlier. Repeat this many times over and you
will find that the mean, variance, and standard deviation vary from sample to sample. Therefore, sample statistics can
be classified as random variables and thus have a corresponding distribution, which is called the sampling distribution.

Sampling Distribution
A sampling distribution is a distribution that shows the frequency with which values of statistics are
observed when all the possible random samples are drawn from a given population.

Example 6: A sample with the size n = 3 is drawn from the set 5, 6, 8, 12, and 20. Construct the sampling
distribution of the medians.

Solution:
The number of possible samples is found using combinations (recall the lesson on combinations in junior high):
$_nC_r = \frac{n!}{r!\,(n-r)!}$
where n = 5 is the number of values in the set (5, 6, 8, 12, and 20) and r = 3 is the sample size. Substitute the
values of n and r, then simplify:
$_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5!}{3!\,2!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = \frac{120}{6(2)} = 10$
Hence, there are $_5C_3 = 10$ possible samples of size 3. The table below gives all these 10 samples and their
corresponding medians. To find each median, arrange the values of the sample from smallest to largest and locate
the middle number. For example, the median of 5, 6, 8 is 6. Do this step for all the samples.

Sample    Values        Median
1         5, 6, 8       6
2         5, 6, 12      6
3         5, 6, 20      6
4         5, 8, 12      8
5         5, 8, 20      8
6         5, 12, 20     12
7         6, 8, 12      8
8         6, 8, 20      8
9         6, 12, 20     12
10        8, 12, 20     12

The sampling distribution of the medians is shown below.

Sample median ($\tilde{x}$)    Frequency
6                              3
8                              4
12                             3

The sample median $\tilde{x}$ = 6 with a frequency of 3 means that three of the 10 samples have a median of 6.
Likewise, 8 has a frequency of 4 and 12 has a frequency of 3. Similar to the probability distributions you
encountered in the previous modules, sampling distributions have a mean and a standard deviation. The standard
deviation of a sampling distribution is usually called the standard error.
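
The enumeration in Example 6 can be checked with a short script. This is a sketch using only the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations
from statistics import median

population = [5, 6, 8, 12, 20]

# All possible samples of size 3, drawn without replacement.
samples = list(combinations(population, 3))
print(len(samples))  # → 10

# Sampling distribution of the medians: each median with its frequency.
median_counts = Counter(median(sample) for sample in samples)
for value in sorted(median_counts):
    print(value, median_counts[value])
# → 6 3
# → 8 4
# → 12 3
```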

Example 7: Find the mean and the standard error of the sampling distribution of the medians given in
example 6.
Solution:
The mean of the sampling distribution of the medians, $\mu_{\tilde{x}}$ (read as “mean of the medians”), is solved
using the formula
$\mu_{\tilde{x}} = \frac{\sum x_i f_i}{n}$
where $x_i$ is a sample median, $f_i$ is its frequency, and n is the total number of samples. Substitute
$x_1 = 6$, $f_1 = 3$; $x_2 = 8$, $f_2 = 4$; and $x_3 = 12$, $f_3 = 3$, then divide by n = 10:
$\mu_{\tilde{x}} = \frac{6(3) + 8(4) + 12(3)}{10} = \frac{86}{10} = 8.6$
Hence, 8.6 is the mean of the medians.

To calculate the standard error of the distribution, construct the table below.

$\tilde{x}$    f    $\tilde{x} - \mu_{\tilde{x}}$    $(\tilde{x} - \mu_{\tilde{x}})^2$    $f(\tilde{x} - \mu_{\tilde{x}})^2$
6              3    -2.6                             6.76                                 20.28
8              4    -0.6                             0.36                                 1.44
12             3    3.4                              11.56                                34.68
Total          10                                                                         56.4

$\tilde{x} - \mu_{\tilde{x}}$ means subtract the mean of the sampling distribution from the sample median:
6 - 8.6 = -2.6; 8 - 8.6 = -0.6; 12 - 8.6 = 3.4.
$(\tilde{x} - \mu_{\tilde{x}})^2$ means square each difference: $(-2.6)^2 = 6.76$; $(-0.6)^2 = 0.36$; $(3.4)^2 = 11.56$.
$f(\tilde{x} - \mu_{\tilde{x}})^2$ means multiply each squared difference by its frequency: 3(6.76) = 20.28;
4(0.36) = 1.44; 3(11.56) = 34.68.
$\sum f(\tilde{x} - \mu_{\tilde{x}})^2$ means add all the results of $f(\tilde{x} - \mu_{\tilde{x}})^2$, which gives 56.4.
Substitute these values in the formula for the variance, then take the square root to get the standard error:
$\sigma_{\tilde{x}}^2 = \frac{\sum f(\tilde{x} - \mu_{\tilde{x}})^2}{n} = \frac{56.4}{10} = 5.64$
$\sigma_{\tilde{x}} = \sqrt{5.64} \approx 2.37$
The standard error of the sample medians is approximately 2.37.
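
The arithmetic of Example 7 can be reproduced from the frequency table. This is a minimal sketch; the dictionary `distribution` simply restates the sampling distribution of the medians from Example 6.

```python
from math import sqrt

# Sample median -> frequency, taken from Example 6.
distribution = {6: 3, 8: 4, 12: 3}

n = sum(distribution.values())  # total number of samples (10)

# Mean of the sampling distribution: sum of x * f divided by n.
mean = sum(x * f for x, f in distribution.items()) / n

# Variance: sum of f * (x - mean)^2 divided by n.
variance = sum(f * (x - mean) ** 2 for x, f in distribution.items()) / n

# The standard error is the square root of the variance.
standard_error = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(standard_error, 2))
# → 8.6 5.64 2.37
```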

Example 8: Random samples with size 4 are drawn from the population containing the values 14, 19, 26, 31,
48, and 53.
a. Construct a sampling distribution of the sample means.
b. Find the mean of the sample means.
c. Compute the standard error of the sample means.
Solution:
The number of possible samples is found using combinations:
$_nC_r = \frac{n!}{r!\,(n-r)!}$
where n = 6 is the number of values in the population (14, 19, 26, 31, 48, and 53) and r = 4 is the sample size.
Substitute the given values:
$_6C_4 = \frac{6!}{4!\,(6-4)!} = \frac{6!}{4! \times 2!} = \frac{720}{48} = 15$
There are $_6C_4 = 15$ samples of size 4 that can be drawn from the population of size 6. To find the mean of each
sample, add its four values and divide the sum by 4. For example, 14 + 19 + 26 + 31 = 90, and 90 divided by 4 is
22.5. Follow this procedure to get the mean of each sample.

Sample             Mean ($\bar{x}$)
14, 19, 26, 31     22.5
14, 19, 26, 48     26.75
14, 19, 26, 53     28
14, 19, 31, 48     28
14, 19, 31, 53     29.25
14, 19, 48, 53     33.5
14, 26, 31, 48     29.75
14, 26, 31, 53     31
14, 26, 48, 53     35.25
14, 31, 48, 53     36.5
19, 26, 31, 48     31
19, 26, 31, 53     32.25
19, 26, 48, 53     36.5
19, 31, 48, 53     37.75
26, 31, 48, 53     39.5

a. Sampling distribution of the sample means. Make a table for the distribution of the sample means.

$\bar{x}$    22.5   26.75   28   29.25   29.75   31   32.25   33.5   35.25   36.5   37.75   39.5   Total
f            1      1       2    1       1       2    1       1      1       2      1       1      15

$\bar{x}$ = 22.5 with f = 1 means that only one sample has a mean of 22.5. $\bar{x}$ = 28 with f = 2 means that two
samples have a mean of 28, and so on for the rest of the distribution.

b. Mean of the sample means. To solve for the mean of the sample means, use the formula
$\mu_{\bar{x}} = \frac{\sum x_i f_i}{n}$
where n = 15 is the total number of samples.
$\mu_{\bar{x}} = \frac{22.5(1) + 26.75(1) + 28(2) + 29.25(1) + 29.75(1) + 31(2) + 32.25(1) + 33.5(1) + 35.25(1) + 36.5(2) + 37.75(1) + 39.5(1)}{15}$
$\mu_{\bar{x}} = \frac{477.5}{15} \approx 31.83$
Hence, the mean of the sample means is 31.83.

c. Standard error of the sample means. To solve for the standard error, build a table with five columns and use the
formula
$\sigma_{\bar{x}}^2 = \frac{\sum f(\bar{x} - \mu_{\bar{x}})^2}{n}$
$\bar{x} - \mu_{\bar{x}}$ means 22.5 - 31.83 = -9.33; 26.75 - 31.83 = -5.08; 28 - 31.83 = -3.83; and so on.
$(\bar{x} - \mu_{\bar{x}})^2$ means $(-9.33)^2 = 87.0489$; $(-5.08)^2 = 25.8064$; $(-3.83)^2 = 14.6689$; and so on.
$f(\bar{x} - \mu_{\bar{x}})^2$ means 1(87.0489) = 87.0489; 1(25.8064) = 25.8064; 2(14.6689) = 29.3378; and so on.
$\sum f(\bar{x} - \mu_{\bar{x}})^2$ means the sum of the $f(\bar{x} - \mu_{\bar{x}})^2$ column, which is 306.7085.
The complete table for the distribution of the sample means is shown below.

$\bar{x}$    f     $\bar{x} - \mu_{\bar{x}}$    $(\bar{x} - \mu_{\bar{x}})^2$    $f(\bar{x} - \mu_{\bar{x}})^2$
22.5         1     -9.33                        87.0489                          87.0489
26.75        1     -5.08                        25.8064                          25.8064
28           2     -3.83                        14.6689                          29.3378
29.25        1     -2.58                        6.6564                           6.6564
29.75        1     -2.08                        4.3264                           4.3264
31           2     -0.83                        0.6889                           1.3778
32.25        1     0.42                         0.1764                           0.1764
33.5         1     1.67                         2.7889                           2.7889
35.25        1     3.42                         11.6964                          11.6964
36.5         2     4.67                         21.8089                          43.6178
37.75        1     5.92                         35.0464                          35.0464
39.5         1     7.67                         58.8289                          58.8289
Total        15                                                                  306.7085

Substitute $\sum f(\bar{x} - \mu_{\bar{x}})^2 = 306.7085$ and n = 15:
$\sigma_{\bar{x}}^2 = \frac{306.7085}{15} \approx 20.45$
$\sigma_{\bar{x}} = \sqrt{\frac{306.7085}{15}} \approx 4.52$
Hence, the standard error of the sample mean is 4.52.

The Sampling Distribution of the Sample Means

Suppose that all possible samples with size n are drawn from a population with the
mean 𝜇 and standard deviation 𝜎. Then, the means obtained from all these possible
samples will make up a sampling distribution of the sample means exhibiting the following
properties:
1. The mean of the sample means ($\mu_{\bar{x}}$) is the same as the population mean $\mu$.
2. The standard error of the sample mean ($\sigma_{\bar{x}}$) is equal to the population's standard
deviation $\sigma$ divided by the square root of the sample size, thus,
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Mean and Standard Error of the Sampling Distribution of Sample Means


The mean and standard error of the distribution of sample means taken from a population with
mean $\mu$ and standard deviation $\sigma$ can be computed as
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Example 9: Random samples with size 4 are drawn from the population containing the values 14, 19, 26, 31,
48, and 53. Find the mean and the standard error of the sample means.

Solution:
The mean of the population is
$\mu = \frac{14 + 19 + 26 + 31 + 48 + 53}{6} = \frac{191}{6} \approx 31.83$
The mean of the sample means is equal to the population mean. Therefore,
$\mu_{\bar{x}} = \mu \approx 31.83$

To get the standard error, first compute the population standard deviation using the formula
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
where x takes the values 14, 19, 26, 31, 48, and 53; the mean is $\mu \approx 31.83$; and N = 6 is the number of
values in the population. Substitute the values of x, $\mu$, and N in the formula. Hence,
$\sigma = \sqrt{\frac{(14 - 31.83)^2 + (19 - 31.83)^2 + (26 - 31.83)^2 + (31 - 31.83)^2 + (48 - 31.83)^2 + (53 - 31.83)^2}{6}}$
$\sigma \approx 14.3$
The population standard deviation is 14.3.

Thus, using the formula for the standard error with $\sigma \approx 14.3$ and sample size n = 4, the standard error
of the mean is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{14.3}{\sqrt{4}} \approx 7.15$

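Observe that the previous example uses the same population as Example 8, and from this you can verify that
the mean of the sample means is the same as the population mean. However, you might be
wondering why the computed standard errors for Examples 8 and 9 are different. The formula
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ assumes that the population is very large compared with the sample, or
that sampling is done with replacement. In Example 8, the samples were drawn without replacement from a
population of only N = 6 values, so the result of the formula must be multiplied by the finite population
correction factor $\sqrt{\frac{N - n}{N - 1}}$: $7.15 \times \sqrt{\frac{6 - 4}{6 - 1}} \approx 4.52$, which agrees
with Example 8. In general, the standard error is influenced by two factors, the population variance and the
sample size. The larger the size of the sample, the smaller the standard error.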
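
The two standard errors can be compared numerically. The sketch below assumes the samples are drawn without replacement, as in the enumeration of Example 8, and applies the finite population correction factor $\sqrt{(N-n)/(N-1)}$ from standard sampling theory; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import pstdev

population = [14, 19, 26, 31, 48, 53]
N, n = len(population), 4

# Enumerate all 15 samples of size 4 and take the spread of their means.
sample_means = [sum(sample) / n for sample in combinations(population, n)]
se_enumerated = pstdev(sample_means)

# Formula value, which ignores the small population size.
sigma = pstdev(population)  # population standard deviation
se_formula = sigma / sqrt(n)

# Formula value with the finite population correction applied.
se_corrected = se_formula * sqrt((N - n) / (N - 1))

print(round(se_enumerated, 2), round(se_formula, 2), round(se_corrected, 2))
# → 4.52 7.15 4.52
```

The enumerated and corrected values agree, while the uncorrected formula gives the larger value obtained in Example 9.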

Example 10: A school has 900 junior high school students. The average height of these students is 68 in with a
standard deviation of 6 in. Suppose you draw a random sample of 50 students. Find a) the mean,
b) the standard deviation, and c) the variance of the distribution of all sample means that can be derived
from the samples.
Solution:
a) The mean. The population mean is $\mu = 68$. The mean of the distribution of the sample means is the
same as the population mean. Thus,
$\mu_{\bar{x}} = \mu = 68$

b) The standard deviation. The standard error (or standard deviation) of the sample means is equal to
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{50}} \approx 0.8485$

c) The variance. The variance is the square of the standard deviation. If $\sigma = 6$, then
$\sigma^2 = 6^2 = 36$. With n = 50, use the formula
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{36}{50} = 0.72$
Hence, $\sigma_{\bar{x}}^2 = 0.72$.

Assessment 3.3:

Distribution of the Sample Mean of a Normal Variable

Suppose a random variable X is given and the population distribution of X is known to
be normal with mean $\mu$ and variance $\sigma^2$. Then, it follows that the sampling distribution of
the mean of all samples of size n selected from X is normal with mean $\mu_{\bar{x}} = \mu$ and variance
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$.
Knowing the normality of the sampling distribution of the sample mean allows you to
compute probabilities involving means of samples using the standard normal distribution.

Any score from a normal distribution can be converted into its equivalent standard normal
score using the formula $z = \frac{x - \mu}{\sigma}$. Since in a sampling distribution of the sample mean the
random variable is the statistic $\bar{x}$ with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, the
formula becomes
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation,
and n is the sample size.

Study the examples below.

Example 11: Suppose that the systolic blood pressures of a certain population are normally distributed with
mean µ = 125 and standard deviation $\sigma$ = 8, and a sample of size 36 is taken from this population. What is
the probability that a single sample will have a mean blood pressure of less than 122?
Solution:
Given:
Sample mean ($\bar{x}$) = 122
Population mean (µ) = 125
Population standard deviation ($\sigma$) = 8
Sample size (n) = 36

The sampling distribution of the means of all samples taken from the population is normal since the population
itself is normally distributed. The sample mean must be converted to its corresponding standard score (z-score).
Use the formula and substitute the given values:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{122 - 125}{8 / \sqrt{36}} = \frac{-3}{8/6}$
Get the reciprocal of the divisor and proceed to multiplication:
$z = -3 \div \frac{8}{6} = \frac{-3}{1} \times \frac{6}{8} = \frac{-18}{8} = -2.25$

Locate z = 2.25 in the z-table. Here ɸ(z) denotes the z-table area between 0 and z. The negative sign only
indicates direction (to the left of the mean), and because the normal curve is symmetric, ɸ(-2.25) = ɸ(+2.25).
The area is located at row 2.2, column 0.05 of the z-table, which is 0.4878.

“Less than 122” is converted to a z-score of less than -2.25, so the region under the curve is shaded to the left
of z = -2.25. Hence,
ɸ(-2.25) = ɸ(+2.25) = 0.4878
$P(\bar{X} < 122) = P(Z < -2.25) = 0.5 - 0.4878 = 0.0122$
This means that the probability that a randomly chosen sample from the population will have a mean systolic
blood pressure of less than 122 is 1.22% (move the decimal point 2 places to the right and affix the percent sign).
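
The z-table lookup can be cross-checked in code. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Note that its `cdf` method returns the whole area to the left of z, not the area between 0 and z that the z-table in this module reports.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 125, 8, 36  # values given in the blood pressure example
x_bar = 122

# Standard error of the sample mean, then the z-score of 122.
standard_error = sigma / sqrt(n)
z = (x_bar - mu) / standard_error

# Area to the left of z under the standard normal curve.
probability = NormalDist().cdf(z)

print(round(z, 2), round(probability, 4))
# → -2.25 0.0122
```

The same steps apply to a probability between two sample means: compute the area to the left of each z-score and subtract the smaller area from the larger.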

Example 12: The average time a person spends in a museum follows a normal distribution with µ = 88
minutes (min) and a variance of 16 min². These parameters are taken from a survey conducted among
the visitors of the museum on a particular day. Suppose a sample consisting of 9 visitors is
selected. What is the probability that the average time these visitors spent in the museum is
between 87 and 92 min?
Solution:
Given: µ = 88; $\sigma^2$ = 16; n = 9; $\bar{x}_1$ = 87; $\bar{x}_2$ = 92.

The variance is $\sigma^2 = 16$. Take the square root of both sides of the equation to solve for $\sigma$:
$\sqrt{\sigma^2} = \sqrt{16}$
$\sigma = 4$

Use the z-score formula and substitute the given values. For $\bar{x}_1$ = 87:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{87 - 88}{4 / \sqrt{9}} = \frac{-1}{4/3} = -1 \times \frac{3}{4} = \frac{-3}{4} = -0.75$
For $\bar{x}_2$ = 92:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{92 - 88}{4 / \sqrt{9}} = \frac{4}{4/3} = 4 \times \frac{3}{4} = \frac{12}{4} = 3$

Writing in probability form, the two areas are added because the shaded region runs from z = -0.75 on the left of
the mean to z = 3 on the right:
$P(87 < \bar{X} < 92) = P(-0.75 < Z < 3) = ɸ(0.75) + ɸ(3) = 0.2734 + 0.4987 = 0.7721$
This means the probability that the mean time of 9 visitors falls between 87 and 92 minutes is 77.21%.

Central Limit Theorem

Recall that when samples of any size n are taken from a normally distributed
population, the distribution of all sample means also follows a normal distribution. But
what if the population from which the samples are taken is not normally distributed? Can you
conclude that the sampling distribution of the sample mean will be normal? The answer to
this question is “yes”, at least approximately, provided that the sample size is large enough.
This can be explained using the Central Limit Theorem.

The Central Limit Theorem (CLT)


Given a random variable X with mean µ and variance $\sigma^2$, then, regardless of whether the
population distribution of X is normal or not, as the sample size n gets larger, the shape of
the distribution of the sample means taken from the population approaches a normal distribution with
mean µ and standard deviation $\frac{\sigma}{\sqrt{n}}$.

Note that the emphasis of the central limit theorem is on the phrase “as the
sample size gets larger”. You can only assume the sampling distribution to be
approximately normal when the sample is large enough. As a rule of thumb, n ≥ 30 is
usually considered sufficient for the sampling distribution of the sample mean to be
approximately normal.
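
The theorem can be illustrated with a small simulation. The sketch below assumes a hypothetical skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and draws many samples of size 30. The seed and the number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

sample_size = 30
repetitions = 5000

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
    for _ in range(repetitions)
]

# The CLT predicts a mean near 1 and a standard error near 1 / sqrt(30).
print("mean of sample means:", round(mean(sample_means), 3))
print("observed standard error:", round(pstdev(sample_means), 3))
print("predicted standard error:", round(1 / sqrt(sample_size), 3))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is strongly skewed.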

Study the examples below:

Example 13: The average age of teachers in a certain town is 34 with a standard deviation of 4. If the principal
of ABC College employs 100 teachers, find the probability that the average age of these teachers
is less than 35.
Solution:
Given: $\bar{x}$ = 35; µ = 34; $\sigma$ = 4; n = 100

It is not given that the population is normally distributed, but since n > 30, you can assume that the sampling
distribution of the mean ages of 100 teachers is normal according to the CLT. Substitute the given values in the
formula and simplify:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{35 - 34}{4 / \sqrt{100}} = \frac{1}{4/10}$
Get the reciprocal of the divisor and proceed to multiplication:
$z = 1 \div \frac{4}{10} = 1 \times \frac{10}{4} = 2.5$
Hence, the equivalent of 35 is z = 2.5. Writing in probability form and locating the area ɸ(2.5) in the z-table,
we have
$P(\bar{X} < 35) = P(Z < 2.5) = 0.5 + ɸ(2.5) = 0.5 + 0.4938 = 0.9938$
Therefore, the probability that the average age of the teachers is less than 35 is 99.38%.

Assessment 3.4

In real life, you often encounter parameters and statistics in the form of descriptions (or
labels) about a certain group. Such descriptions are necessary as they give you relevant
information about the group, allowing you to decide whether or not to join them. Remember that
“birds of the same feather flock together”.

Congratulations! You are now ready to proceed to our next module…

REFERENCES AND WEBSITE USED IN THIS LESSON

BOOK/S

Canlapan, R. & Campena, F. (2016). Statistics and Probability. Makati City, Philippines: Diwa Learning Center.
