STATISTICS AND PROBABILITY
RANDOM VARIABLES AND PROBABILITY DISTRIBUTION
OBJECTIVES:
At the end of this lesson, you should be able to:
Before you proceed with this lesson, you should be able to identify the elements of a
set and the domain and the range of a function.
Consider the set of colors in a rainbow. The elements of that set are red,
orange, yellow, green, blue, indigo, and violet. Together, those elements make up
the whole set; in the same way, the set of all possible outcomes of an experiment
is called its sample space.
Recall that the domain of a function is the set of values of x while the range is
the set of values of y. For example, the set of ordered pairs {(2,5),(3,7),(4,9),
(5,11)} is a function with domain {2,3,4,5} and range {5,7,9,11}.
Example
Consider the random experiment of tossing a fair coin three times. In this scenario, the
domain can be defined as the set of all possible outcomes of the experiment, and the
range of the random variable as the set of possible numbers of tails that come up in the
three tosses.
Let Y be the random variable that counts the number of tails when a fair coin is
tossed three times.
The set of possible outcomes (domain) of the experiment is as follows:
{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
For each element in the domain, there is a corresponding value of the random
variable Y. A specific value of the random variable is denoted by the lowercase letter y. The
domain and range of the random variable Y are shown in the table below:
Outcome (domain):             HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
Value of Y (number of tails):   0    1    1    2    1    2    2    3
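To check the pairing of outcomes and values, you can enumerate the sample space in code. The minimal Python sketch below uses itertools.product to list every outcome of three tosses and count the tails in each one. The function name tails_table is illustrative and does not come from the lesson.

```python
from itertools import product

def tails_table():
    """Map each outcome of three coin tosses to Y, the number of tails."""
    table = {}
    for toss in product("HT", repeat=3):   # HHH, HHT, HTH, ..., TTT
        outcome = "".join(toss)
        table[outcome] = outcome.count("T")
    return table

for outcome, y in tails_table().items():
    print(outcome, y)
```

Running it prints each outcome next to its value of Y. The set of distinct values, {0, 1, 2, 3}, is the range.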
A discrete random variable is a random variable whose set of possible values is
finite or countably infinite. Its values can be represented as separate points on a
number line.
Explore
Consider the place where you are right now, be it a classroom, living room, or a
library. Can you name one discrete random variable and one continuous random
variable related to the things that you can see? How do you get the values of those
random variables?
Try it!
Determine whether each of the following variables is discrete or continuous.
4. It is a discrete variable because the rating can only be a whole number from 1
to 10.
5. It is a discrete variable because the number of coins is countable. Hence, it is
represented by a whole number.
Tip
Some discrete random variables can take a very large, or even countably infinite,
number of values. For instance, consider an experiment that counts the grains of sand
on a beach. Although it would be impractical and senseless to conduct that experiment,
the random variable it would generate is still discrete.
Key Points
Objectives
At the end of this lesson, you should be able to:
Before you proceed with this lesson, you should be able to recall random variables.
A die is rolled and the score shown on the top face is observed. The random
variable x is the score shown. x could take on the values from 1 to 6, which
are the numbers that the die shows.
Note that a probability distribution must satisfy the two properties of probability.
The probability P(x) of each value of a random variable must be between zero and one, that
is, 0≤P(x)≤1. This means that a probability can never exceed one or be negative.
The sum of the probabilities of all the values of the random variable must be
equal to one, that is,
$\sum_{i=1}^{n} P(x_i) = 1$
where n is the number of possible values of the random variable and P(x_i) is the probability of the i-th value.
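Both properties can be checked mechanically. The minimal Python sketch below assumes the probabilities are given as a list of numbers. The helper name is_probability_distribution is hypothetical, and the small tolerance absorbs floating-point rounding in the sum.

```python
from math import isclose

def is_probability_distribution(probs, tol=1e-9):
    """Check that each P(x) is in [0, 1] and that the P(x) values sum to 1."""
    probs = list(probs)
    each_in_range = all(0 <= p <= 1 for p in probs)        # first property
    sums_to_one = isclose(sum(probs), 1, abs_tol=tol)      # second property
    return each_in_range and sums_to_one

print(is_probability_distribution([0.25, 0.5, 0.25]))  # True
print(is_probability_distribution([0.5, 0.6]))         # False: the sum exceeds 1
print(is_probability_distribution([1.2, -0.2]))        # False: values out of range
```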
How to Do
Step 1: Determine the sample space.
The sample space is {HH, HT, TH, TT}. There are four elements in the sample
space.
Step 2: Identify the probability of each value of the random variable in relation to the sample space.
$P(x=0) = P(0T) = P(HH) = \frac{1}{4} = 0.25$
$P(x=1) = P(1T) = P(TH \text{ or } HT) = \frac{2}{4} = 0.5$
$P(x=2) = P(2T) = P(TT) = \frac{1}{4} = 0.25$
Step 3: Construct a table for the probability distribution.
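The three steps can be sketched in Python for the two-toss experiment. The sketch assumes a fair coin, so every outcome in the sample space is equally likely. Exact fractions from the fractions module keep the probabilities as 1/4 and 1/2 instead of rounded decimals. The name tails_distribution is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def tails_distribution(n_tosses):
    """Probability distribution of the number of tails in n fair-coin tosses."""
    outcomes = ["".join(t) for t in product("HT", repeat=n_tosses)]  # Step 1: sample space
    counts = Counter(o.count("T") for o in outcomes)                 # outcomes per value of x
    total = len(outcomes)
    return {x: Fraction(counts[x], total) for x in sorted(counts)}   # Step 2: P(x)

# Step 3: print the table for two tosses
dist = tails_distribution(2)
print("x    P(x)")
for x, p in dist.items():
    print(f"{x}    {p} = {float(p)}")
```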
The random variable x for this experiment is the number of dots seen when the die is
rolled. Thus, x can take the values 1, 2, and 3.
Rolling the modified five-sided die will yield the sample space {1, 2, 2, 3, 3}.
$P(x=1) = P(1 \text{ dot}) = \frac{1}{5} = 0.2$
$P(x=2) = P(2 \text{ dots}) = \frac{2}{5} = 0.4$
$P(x=3) = P(3 \text{ dots}) = \frac{2}{5} = 0.4$
Step 3: Construct a table for the probability distribution.
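The same idea works for the modified die: count how many sides show each number of dots, then divide by the five sides. Like the example, this sketch assumes that each side is equally likely to land face up.

```python
from collections import Counter
from fractions import Fraction

# Faces of the modified five-sided die: one side with 1 dot, two with 2, two with 3
faces = [1, 2, 2, 3, 3]
counts = Counter(faces)
distribution = {x: Fraction(c, len(faces)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(x={x}) = {p} = {float(p)}")
```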
Key Points
Objectives
At the end of the lesson, you should be able to:
construct the probability mass function of a discrete random variable and its
corresponding histogram, and
compute probabilities corresponding to a given random variable.
The table below shows the possible outcomes of tossing a coin twice.
Based on the table, what is the probability of getting two consecutive heads? What is
the probability of getting two consecutive tails?
Which is more likely to happen: getting two consecutive heads or getting two
consecutive tails?
How to Do
Here are the steps in creating a probability mass function and its corresponding
histogram:
Step 1: Construct the table containing the random variables. Identify all possible
outcomes.
The experiment is concerned with the number of heads. Thus, count the number of
heads for each possible outcome.
Step 5: Create the corresponding histogram, with the values of x on the x-axis
and the probability of each value on the y-axis.
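Without a plotting library, you can approximate a histogram with text bars whose lengths are proportional to P(x). This sketch is illustrative only: the name text_histogram and the bar width of 40 characters are arbitrary choices, and a charting tool would normally draw the actual histogram. The usage line assumes a fair coin tossed twice, with x as the number of heads.

```python
def text_histogram(pmf, width=40):
    """Return one line per value x, with a bar whose length is proportional to P(x)."""
    lines = []
    for x in sorted(pmf):
        bar = "#" * round(pmf[x] * width)
        lines.append(f"{x:>3} | {bar} {pmf[x]:.4f}")
    return lines

# PMF of the number of heads in two tosses of a fair coin
heads_pmf = {0: 0.25, 1: 0.5, 2: 0.25}
print("\n".join(text_histogram(heads_pmf)))
```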
Try it!
Find the probability mass function of the sum of the numbers that appear when a pair
of dice is thrown. Then draw its corresponding histogram.
Group the possible outcomes in such a way that the sums of the two numbers
appearing on the dice are the same.
Let x be the sum of the numbers appearing on the dice and P(x) be the probability of
each outcome.
The total number of possible outcomes is 36.
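When x = 2, $P(2) = \frac{1}{36}$
When x = 3, $P(3) = \frac{1+1}{36} = \frac{2}{36} = \frac{1}{18}$
When x = 4, $P(4) = \frac{1+1+1}{36} = \frac{3}{36} = \frac{1}{12}$
When x = 5, $P(5) = \frac{1+1+1+1}{36} = \frac{4}{36} = \frac{1}{9}$
When x = 6, $P(6) = \frac{1+1+1+1+1}{36} = \frac{5}{36}$
When x = 7, $P(7) = \frac{1+1+1+1+1+1}{36} = \frac{6}{36} = \frac{1}{6}$
When x = 8, $P(8) = \frac{1+1+1+1+1}{36} = \frac{5}{36}$
When x = 9, $P(9) = \frac{1+1+1+1}{36} = \frac{4}{36} = \frac{1}{9}$
When x = 10, $P(10) = \frac{1+1+1}{36} = \frac{3}{36} = \frac{1}{12}$
When x = 11, $P(11) = \frac{1+1}{36} = \frac{2}{36} = \frac{1}{18}$
When x = 12, $P(12) = \frac{1}{36}$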
Step 4: Construct a table of the possible outcomes, x, and their corresponding
probabilities, P(x).
Step 5: Create the corresponding histogram, with the values of x on the x-axis
and the probability of each value on the y-axis.
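You can check the grouping of outcomes by sum by enumerating all 36 equally likely pairs. This assumes two fair six-sided dice. The Python sketch stores each P(x) as an exact fraction, so reduced forms such as 1/18 and 1/12 appear directly. The name dice_sum_pmf is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def dice_sum_pmf():
    """PMF of x = sum of two fair six-sided dice, as exact fractions."""
    sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))  # 36 pairs
    return {x: Fraction(c, 36) for x, c in sorted(sums.items())}

pmf = dice_sum_pmf()
for x, p in pmf.items():
    print(f"x = {x:>2}   P(x) = {p}")
```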
Key Points
Objectives
At the end of this lesson, you should be able to:
illustrate, calculate, and interpret the mean of a discrete random variable, and
solve problems involving the mean of probability distributions.
Before you proceed to the lesson, you must be able to recall the definition of random
variable and discrete random variable.
A random variable is a variable whose values are numbers determined by chance.
When the value of a variable is the outcome of a random experiment, that
variable is a random variable. The following are examples of random variables:
Random variables that are countable are discrete, while those that usually arise from
measurement and are not countable are continuous.
It is important that you can distinguish between discrete and continuous random
variables because different statistical techniques are used to analyze each.
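The mean, or expected value, of a discrete random variable is given by
$E(x) = \mu_x = \sum [x_i \cdot P(x_i)]$
where x_i is a value of the random variable and P(x_i) is its probability.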
How to Do
Step 1: Identify what is asked.
$E(x) = \mu_x = (4 \cdot 0.50) + (8 \cdot 0.25) + (12 \cdot 0.15) + (16 \cdot 0.05) + (20 \cdot 0.05)$
$E(x) = \mu_x = 2 + 2 + 1.8 + 0.8 + 1$
$E(x) = \mu_x = 7.6$
Therefore, the expected value of x is 7.6.
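The computation above is a weighted sum, which a short Python function can express. The helper expected_value is hypothetical. It assumes the values and their probabilities are given in matching order.

```python
def expected_value(values, probs):
    """E(x) = sum of x_i * P(x_i) for a discrete random variable."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    return sum(x * p for x, p in zip(values, probs))

x = [4, 8, 12, 16, 20]
p = [0.50, 0.25, 0.15, 0.05, 0.05]
print(round(expected_value(x, p), 2))  # 7.6
```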
Try it!
Lloyd repairs computers for money on weekday mornings. He has compiled the
following probability distribution of the number of customers he is likely to get each
day. How many customers does he expect per day?
The possible numbers of customers are 20, 25, 30, 35, and 40.
The probabilities of occurrence, P(x), are 0.15, 0.35, 0.30, 0.15, and 0.05, respectively.
Step 3: Use the formula to solve for the unknown.
$E(x) = \mu_x = (20 \cdot 0.15) + (25 \cdot 0.35) + (30 \cdot 0.30) + (35 \cdot 0.15) + (40 \cdot 0.05)$
$E(x) = \mu_x = 3 + 8.75 + 9 + 5.25 + 2$
$E(x) = \mu_x = 28$
The expected value of x is 28. Therefore, Lloyd can expect 28 customers per day.
Tips
If the possible outcomes are given as frequencies, you can find the probability of
each outcome by dividing its frequency by the sum of all the
frequencies. Then check that each probability is between zero and one and
that the sum of all the probabilities is one.
You may use a table like the one shown below to organize your work in computing
the mean.
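The two tips can be combined in a small Python sketch. It converts frequencies to probabilities, checks the two properties, and then builds the x, P(x), and x·P(x) columns; the last column sums to the mean. The frequencies below are hypothetical and are chosen only for illustration.

```python
from fractions import Fraction

def probabilities_from_frequencies(freqs):
    """Divide each frequency by the total frequency to get P(x)."""
    total = sum(freqs.values())
    return {x: Fraction(f, total) for x, f in freqs.items()}

def mean_table(pmf):
    """Rows of (x, P(x), x * P(x)) plus the mean, i.e. the sum of the last column."""
    rows = [(x, p, x * p) for x, p in sorted(pmf.items())]
    return rows, sum(row[2] for row in rows)

# Hypothetical frequencies, for illustration only
freqs = {1: 3, 2: 5, 3: 2}
pmf = probabilities_from_frequencies(freqs)
assert all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

rows, mean = mean_table(pmf)
print("x   P(x)   x*P(x)")
for x, p, xp in rows:
    print(f"{x}   {p}   {xp}")
print("mean =", mean)
```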
Key Point
The mean or expected value of a discrete random variable x is given by the
formula
$E(x) = \mu_x = \sum [x_i \cdot P(x_i)]$
where x_i is a value of the random variable and P(x_i) is the probability that the
random variable takes the value x_i.
Variance of a Discrete Random Variable
Objectives
At the end of this lesson, you should be able to:
In the previous lesson, you learned that the mean of a probability distribution gives
the expected outcome over repeated trials. If the experiment is performed
many times, the mean of all the outcomes will be close to the mean of the
random variable.
In this lesson, you will learn how to quantify the spread of a probability
distribution and use the calculated variance to measure how widely the values of
the random variable are spread around the mean.
$\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
where:
σ² = variance,
x = random variable,
P(x) = probability of x, and
μ = mean of the random variable
The standard deviation σ is the square root of the variance. A smaller standard
deviation indicates that more of the data are clustered about the mean, while a
larger one indicates that the data are more spread out.
How to Do
Step 1: Find the known facts of the problem.
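Game A
x = 20, 8, −10
$P(x) = \frac{1}{6}, \frac{1}{6}, \frac{4}{6}$
Game B
x = 24, −15
$P(x) = \frac{2}{6}, \frac{4}{6}$
Step 2: Solve for the mean.
Mean of game A
$\mu = 20\left(\frac{1}{6}\right) + 8\left(\frac{1}{6}\right) - 10\left(\frac{4}{6}\right) = -2$
Mean of game B
$\mu = 24\left(\frac{2}{6}\right) - 15\left(\frac{4}{6}\right) = -2$
Step 3: Solve for the variance.
Game A
$\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
$\sigma^2 = \left[20^2\left(\frac{1}{6}\right) + 8^2\left(\frac{1}{6}\right) + (-10)^2\left(\frac{4}{6}\right)\right] - (-2)^2$
$\sigma^2 = 140$
Game B
$\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
$\sigma^2 = \left[24^2\left(\frac{2}{6}\right) + (-15)^2\left(\frac{4}{6}\right)\right] - (-2)^2$
$\sigma^2 = 338$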
To interpret the results, calculate the standard deviations, which are the square roots
of the variances. The standard deviations of game A and game B are √140 ≈ 11.83 and
√338 ≈ 18.38, respectively. Since game A has the smaller standard deviation, its
outcomes are clustered more closely around the mean. If both games are played many
times, the winnings and losses in game A will vary less than those in game B.
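You can verify the mean and variance of both games with exact fractions, so the results come out as whole numbers instead of rounded decimals. The helper mean_variance is illustrative and applies the shortcut formula from this lesson. The standard deviation is the square root of the variance.

```python
from fractions import Fraction
from math import sqrt

def mean_variance(values, probs):
    """Mean and variance via sigma^2 = sum(x^2 * P(x)) - mu^2."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum(x * x * p for x, p in zip(values, probs)) - mu ** 2
    return mu, var

games = {
    "A": ([20, 8, -10], [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)]),
    "B": ([24, -15], [Fraction(2, 6), Fraction(4, 6)]),
}
for name, (xs, ps) in games.items():
    mu, var = mean_variance(xs, ps)
    print(f"Game {name}: mean = {mu}, variance = {var}, SD = {sqrt(var):.2f}")
```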
Try it!
The students in an English class took a quiz with five questions. The random
variable x represents the number of questions answered correctly. The probability
distribution is given by the table below.
x=0,1,2,3,4,5
P(x)=0.08,0.07,0.16,0.21,0.25,0.23
Step 2: Solve for the mean.
μ=0(0.08)+1(0.07)+2(0.16)+3(0.21)+4(0.25)+5(0.23)
μ=3.17
Step 3: Solve for the variance
$\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
$\sigma^2 = [0^2(0.08) + 1^2(0.07) + 2^2(0.16) + 3^2(0.21) + 4^2(0.25) + 5^2(0.23)] - (3.17)^2$
$\sigma^2 = 2.30$
To calculate the standard deviation, take the square root of the variance: √2.30 ≈ 1.52.
This means that most of the scores lie within about 1.52 correct answers above or below
the mean of 3.17.
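The shortcut formula used in this lesson agrees with the defining formula σ² = Σ[(x − μ)²·P(x)] whenever the probabilities sum to one. The following Python sketch computes both for the quiz distribution as a cross-check. The variable names are illustrative.

```python
from math import isclose, sqrt

x = [0, 1, 2, 3, 4, 5]
p = [0.08, 0.07, 0.16, 0.21, 0.25, 0.23]

mu = sum(xi * pi for xi, pi in zip(x, p))
# Shortcut formula used in this lesson: sum(x^2 * P(x)) - mu^2
var_shortcut = sum(xi ** 2 * pi for xi, pi in zip(x, p)) - mu ** 2
# Definition: sum((x - mu)^2 * P(x)); the two agree whenever the P(x) sum to 1
var_definition = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))

assert isclose(var_shortcut, var_definition)
print(f"mean = {mu:.2f}, variance = {var_shortcut:.2f}, SD = {sqrt(var_shortcut):.2f}")
```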
Key Points
In a probability distribution, the variance of a random variable is solved by
subtracting the square of the mean from the sum of the products of the
squares of the random variable and the probabilities.
The variance of a random variable x is denoted by σ² and is given by the
formula $\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$.
A small standard deviation indicates that more of the data is clustered about
the mean while a large one indicates the data are more spread out.
The variance is the square of the standard deviation.
Objectives
At the end of this lesson, you should be able to:
The normal distribution is the most commonly used distribution because of its
application in many different fields including those that have unknown distributions. It
is a continuous probability distribution of a normal random variable.
Many measurable quantities, such as height, weight, temperature, and test
scores, approximately follow a normal distribution. This makes the normal distribution
very useful for drawing conclusions about a population using only a single sample.
Examples
Describe the curve of the normal distribution N(3,10).
This is a normal distribution denoted by N(μ,σ). The normal curve for this distribution
has its center at 3 with a standard deviation of 10.
Describe the curve of the normal distribution N(0,2).
This is a normal distribution denoted by N(μ,σ). The normal curve for this distribution
has its center at 0 with a standard deviation of 2.
Explore
Test scores of a group of 200 students have a mean of 85 with a standard deviation
of 5. The passing score is 70. Estimate the passing rate of the students. (Note:
Remember that about 99.7% of the area under the normal curve falls within three
standard deviations from the mean.)
Try it!
The average weight of the 100 g variant of chocolate bar in a chocolate factory is
100 g with a standard deviation of 0.1 g. The Quality Control team wants at least
95% of the chocolate bars to be in the range of 99.5 - 100.5 g. Illustrate the normal
curve of the chocolate bars and determine if the requirement of Quality Control is
fulfilled.
Try it! Solution
The normal curve of the weight of the chocolate bars has its center at 100 with a
standard deviation of 0.1.
About 95% of the area under the normal curve falls within two standard deviations
of the mean. Using this property, about 95% of the chocolate bars weigh between
99.8 g and 100.2 g. Since this range lies inside the 99.5 - 100.5 g range required by
Quality Control, at least 95% of the bars fall within the required range. Thus, the
requirement set by Quality Control is fulfilled.
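You can check the chocolate-bar argument numerically with statistics.NormalDist from Python's standard library. This assumes the weights are exactly normal with mean 100 g and standard deviation 0.1 g. The required range of 99.5 to 100.5 g extends five standard deviations on either side of the mean, so nearly all bars fall inside it.

```python
from statistics import NormalDist

weights = NormalDist(mu=100, sigma=0.1)  # bar weights in grams

two_sd = weights.cdf(100.2) - weights.cdf(99.8)    # within 2 SDs of the mean
required = weights.cdf(100.5) - weights.cdf(99.5)  # within 5 SDs of the mean

print(f"P(99.8 <= X <= 100.2) = {two_sd:.4f}")
print(f"P(99.5 <= X <= 100.5) = {required:.6f}")
print("Requirement met:", required >= 0.95)
```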
Key Points
Objective
At the end of this lesson, you should be able to identify regions under the normal
curve corresponding to different standard normal values.
Data can be distributed in many different ways. The most common distribution that
applies to many real-life data sets is the normal distribution. Heights and weights of
people, exam scores, and blood pressure approximately follow the normal distribution.
In a normal distribution, most of the data cluster around the mean, and values become
less common the farther they are from it.
If the region is to the left of the z-score, then the z-table value is the area.
If the region is to the right of the z-score, then subtract the z-table value from
1 to get the area.
If the region is between two z-scores, subtract the z-table value of the leftmost
z-score from the z-table value of the rightmost z-score to get the area.
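The three rules for reading areas from the z-table can be sketched in Python with statistics.NormalDist. Its cdf method returns the same cumulative area that a z-table lists, to more decimal places. The helper names are illustrative. The printed values should match the worked examples in this lesson, about 0.0869 and 0.6944.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_left(z):
    """Region to the left of z: the z-table value itself."""
    return Z.cdf(z)

def area_right(z):
    """Region to the right of z: 1 minus the z-table value."""
    return 1 - Z.cdf(z)

def area_between(z_left, z_right):
    """Region between two z-scores: right table value minus left table value."""
    return Z.cdf(z_right) - Z.cdf(z_left)

print(round(area_right(1.36), 4))          # region to the right of Z = 1.36
print(round(area_between(-0.56, 2.1), 4))  # region between Z = -0.56 and Z = 2.1
```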
In the standard normal distribution, the mean is equal to zero and the standard
deviation is equal to one. The values under the normal curve indicate the proportion
of area in each region. For instance, the area between the mean and 1 standard
deviation below or above the mean is approximately 0.34. Additionally, the absolute
value of a z-score tells how many standard deviations a value lies from the mean.
How to Do
Step 1: Illustrate the z-score and the area that is asked.
Step 2: Look for the value of the z-score on the z-table. If there are two z-scores,
look for both values on the z-table.
The area of the entire region under the standard normal curve is one. Thus, the area
of the region to the right of Z=1.36 is equal to 1−P(Z<1.36).
Since the region is to the right of the z-score, subtract the z-table value for Z=1.36,
which is 0.9131, from 1. This gives:
1 - 0.9131 = 0.0869.
Try it!
Find the area of the region between Z=-0.56 and Z=2.1 under the standard normal
curve.
The corresponding values for the z-scores -0.56 and 2.1 are 0.2877 and 0.9821,
respectively.
Step 2: Solve for the area of the region using the value/s obtained from the z-table.
Since the region is between two z-scores, subtract the z-table value of the leftmost z-
score which is 0.2877 from the z-table value of the rightmost z-score which is
0.9821. This gives us:
0.9821-0.2877=0.6944
Thus, P(−0.56<Z<2.1)=0.6944.
Key Points
The area of a region under the standard normal curve cannot be negative.
The total area of the entire region under the standard normal curve is one.
The standard normal curve is symmetric. Thus, the area of the region to the
left of Z=z is equal to the area of the region to the right of Z=-z.
Objective
At the end of this lesson, you should be able to convert a normal random variable to
a standard normal variable and vice versa.
Normal distribution is the most commonly used distribution. Its graph is shaped like
a bell and is symmetric about its center. Thus, the mean, median, and mode are equal
and are located at the center.
What are the mean and standard deviation of a standard normal distribution?
The standard normal distribution has a mean equal to zero and standard deviation
equal to one as shown in the illustration above.
How to Do
Step 1: Find the known and unknown facts of the problem.
unknown: z
Step 2: Substitute the values to the formula and solve for the unknown.
$z = \frac{x - \mu}{\sigma}$
$z = \frac{85 - 80}{5}$
$z = \frac{5}{5} = 1$
If Allan got 85 in the Math final exam, the equivalent z-score is 1.
The z-score shows how many standard deviations from the mean Allan's score is. In
this example, Allan’s score is one standard deviation above the mean.
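The z-score formula and its inverse, x = zσ + μ, can be written as two small Python functions. The function names are illustrative. The calls reproduce the worked examples for Allan, Alexa, and Ana, all of which use a mean of 80 and a standard deviation of 5.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Inverse of z_score: x = z * sigma + mu."""
    return z * sigma + mu

print(z_score(85, 80, 5))    # Allan: 1.0
print(z_score(75, 80, 5))    # Alexa: -1.0
print(raw_score(-2, 80, 5))  # Ana: 70
```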
Try it!
Suppose you have a set of test scores that are normally distributed with mean equal
to 80 and standard deviation equal to 5. If Alexa got 75, what is her z-score?
Alexa’s score = 75
mean of test scores = 80
standard deviation = 5
unknown: z
Step 2: Substitute the values into the formula and solve for the unknown.
$z = \frac{x - \mu}{\sigma}$
$z = \frac{75 - 80}{5}$
$z = \frac{-5}{5} = -1$
What does it mean to have a negative z-score?
Try it!
Math exam scores are normally distributed with a mean of 80 and a standard
deviation of 5. If Ana got a z-score of -2, what is her exam score?
$z = \frac{x - \mu}{\sigma}$
$-2 = \frac{x - 80}{5}$
x=(−2)(5)+80
x=−10+80
x=70
Thus, the score of Ana is 70.
Key Points
Any value from the normal distribution can be converted into its corresponding
value on a standard normal distribution.
A positive z-score indicates the number of standard deviations above the
mean, while a negative z-score indicates the number of standard deviations
below the mean.
To find the z-score, use the formula $z = \frac{x - \mu}{\sigma}$.
To find the normal value x, use the formula x=zσ+μ.
Objective
At the end of this lesson, you should be able to compute the probabilities and
percentiles using the standard normal table.
Before you proceed with this lesson, you should be able to recall the normal curve
and its properties.
The pulse rates of adult females are normally distributed with a mean of 75
beats per minute (bpm) and a standard deviation of 8 bpm. Find the probability that a
randomly selected female has a pulse rate greater than 85 bpm.
The mean for IQ scores is 100 and the standard deviation is 15. What proportion of
IQ scores falls between 100 and 130?
When a random variable x is normally distributed, you can find the probability
that x will lie in an interval by calculating the area under the normal curve for
the interval.
To transform an x value into a z-score, use the formula
$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$ or $z = \frac{x - \mu}{\sigma}$.
Note: Round the z-score to the nearest hundredth.
Step 1: Draw the normal distribution curve and shade the area.
$z = \frac{x - \mu}{\sigma}$
$z = \frac{73 - 65}{5}$
$z = 1.60$
Step 3: Find the corresponding area of the z-score.
Using the z-table, we can determine the proportion of the area under the curve below 73 mph.
Because z = 1.60, look in the 1.6 row and the 0.00 column (1.6 plus 0.00 equals 1.60).
Step 1: Draw the normal distribution curve and shade the area
Step 2: Compute for the z-score for observation x.
$z = \frac{x - \mu}{\sigma}$
$z = \frac{85 - 75}{8}$
$z = 1.25$
Step 3: Find the corresponding area of the z-score.
Given that z=1.25, use the z-table to determine the cumulative probability.
The cumulative probability for z=1.25 is 0.8944 which is the proportion below a pulse
rate of 85 bpm.
To find the proportion above a pulse rate of 85, subtract the area from 1.
1.000 - 0.8944=0.1056
The probability that a randomly selected female will have a pulse rate above 85 bpm
is 0.1056, or 10.56%.
Step 1: Draw the normal distribution curve and shade the area.
Step 2: Compute the z-score for each observation x.
For an IQ of 100,
$z = \frac{100 - 100}{15}$
$z = 0$
For an IQ of 130,
$z = \frac{130 - 100}{15}$
$z = 2$
Step 3: Find the corresponding area of the z-score.
This problem is asking for the proportion of observations that fall between a z-score
of 0 and a z-score of 2. Using the z-table,
$P(z \le 0) = 0.5000$ and $P(z \le 2) = 0.9772$
$P(0 \le z \le 2) = P(z \le 2) - P(z \le 0) = 0.9772 - 0.5000 = 0.4772$
The proportion of IQ scores between 100 and 130 is 0.4772, or 47.72%.
Try it!
A mobile company survey indicates that mobile phone users keep their mobile phone an
average of 1.5 years before replacing it with a new one. The standard deviation is
0.25 year. A mobile phone user is randomly selected. Find the probability that
the user will keep his/her mobile phone for less than a year before replacing it with a
new one.
$z = \frac{1 - 1.5}{0.25} = -2$
Using the z-table, because z = -2, look in the -2.0 row and the 0.00 column.
The cumulative probability for z = -2 is 0.0228.
Therefore, 2.28% of mobile phone users will keep their phone for less than a year
before they buy a new one.
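You can also compute the pulse-rate, IQ, and mobile-phone probabilities directly from the raw values with statistics.NormalDist. It standardizes internally, so no table lookup is needed. This sketch assumes each variable is exactly normal with the stated mean and standard deviation, and the helper names are illustrative. Any tiny differences from the table answers come only from the table's four-decimal rounding.

```python
from statistics import NormalDist

def prob_below(x, mu, sigma):
    """P(X < x): the cumulative probability."""
    return NormalDist(mu, sigma).cdf(x)

def prob_above(x, mu, sigma):
    """P(X > x): one minus the cumulative probability."""
    return 1 - NormalDist(mu, sigma).cdf(x)

def prob_between(a, b, mu, sigma):
    """P(a < X < b): difference of two cumulative probabilities."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

print(f"Pulse above 85 bpm:      {prob_above(85, 75, 8):.4f}")
print(f"IQ between 100 and 130:  {prob_between(100, 130, 100, 15):.4f}")
print(f"Phone kept under 1 year: {prob_below(1, 1.5, 0.25):.4f}")
```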
Tips
An x-value should be standardized first by using the
formula $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$.
It is helpful to begin by sketching a normal distribution and shading in the
appropriate region.
Key Points
If the problem asks for the probability that the variable is greater than a given
x-value, subtract the cumulative probability from one.
Objectives
At the end of this lesson, you should be able to:
Think about this! Imagine yourself conducting a study about a certain characteristic
of a very big population. Is it enough to just get one sample and make conclusions
about the population? Why? Why not? In this lesson, you will learn how taking more
random samples provides better data on the actual characteristic of the population.
Say you are interested in studying a particular population. You need to look into
certain parameters such as the population mean (μ) or standard deviation (σ) to
describe it. A parameter is a numerical value that summarizes the data of an entire
population. In practice, however, complete information about a population is rarely
available, so the exact value of a parameter usually cannot be obtained.
Sampling Distribution
To address this concern, a sample is taken from the population, typically by random
sampling, and a statistic is computed from it. A statistic is a numerical value that
summarizes the sample data. From the sample data collected, statistics such as sample
means (x̄) or sample standard deviations (s) can be computed and used to make
predictions or approximations about the parameters of the population. However, taking
one or two samples is not enough to know whether a statistic is close to the
corresponding parameter of the population. Hence, repeated samples must be taken, and
the sampling distribution of the sample statistic must be examined.
If repeated random samples of the same size (n) are taken from the same
population, the values of the sample statistics vary from sample to sample. This
variation is called sampling variability. The distribution of the values of these sample
statistics, called the sampling distribution, should be examined to see how closely
they describe the parameters of the population.
How to Do
Note: The sample problem is of a finite population.
Here, simply find the mean of the two numbers in each sample.
There are several ways of presenting the sampling distribution: (a) tabular method
(b) graphical method.
a. Tabular form
b. Graphical form
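To generate the tabular form of a sampling distribution, list every sample of size n, compute each sample mean, and count how often each mean occurs. This Python sketch assumes samples of distinct members drawn without replacement from a finite population. The population 1, 2, 3, 4, 5 is hypothetical, because the lesson's own population is not reproduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sampling_distribution(population, n):
    """Sampling distribution of the sample mean over all samples of size n
    drawn without replacement from a finite population."""
    samples = list(combinations(population, n))
    means = Counter(Fraction(sum(s), n) for s in samples)
    total = len(samples)
    return {m: Fraction(c, total) for m, c in sorted(means.items())}

# Hypothetical population, for illustration only
population = [1, 2, 3, 4, 5]
for m, p in sampling_distribution(population, 2).items():
    print(f"sample mean {float(m):.1f}   probability {p}")
```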
Try it!
A five-sided die has been modified so that one side has one dot, two sides have two
dots, and two sides have three dots. Construct a sampling distribution of the 20
sample means of size 3.
Here, simply find the mean of the three numbers in each sample.
a. Tabular form
b. Graphical form
Objective
At the end of the lesson, you should be able to define the sampling distribution of the
sample mean for normal population when the variance is known.
How can the sampling distribution of the sample mean of a normal population be
defined when the variance is known?
In a normal population, the mean μM of the sampling distribution of the sample mean
is always equal to the population mean μ.
$\mu_M = \mu$
To compute the standard deviation σM of the sampling distribution, the population
standard deviation σ and the sample size n are needed.
$\sigma_M = \frac{\sigma}{\sqrt{n}}$
The standard deviation of the sampling distribution is commonly called
the standard error. The standard error measures how accurately the sample mean
represents the population mean.
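For a normal population, these results hold for any sample size. When the population
is not normal, the sampling distribution of the sample mean is only approximately
normal, and the approximation is considered reliable only if the sample size is large
enough. A sample size above 30 is generally accepted in practice.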
Examples
1). A group of 1000 students took an achievement test. The scores have a normal
distribution and the population mean and variance of the scores are 85 and 9,
respectively. Define the sampling distribution of the sample mean of the scores with
a sample size of 36. (Note: The standard deviation is the square root of the
variance.)
In this example, the population mean μ is equal to 85. Thus, μM is also equal to 85.
For the standard error of the mean, it can be seen that n=36 and σ=3. Solve for σM.
$\sigma_M = \frac{\sigma}{\sqrt{n}}$
$\sigma_M = \frac{3}{\sqrt{36}} = \frac{3}{6} = 0.5$
Therefore, the sampling distribution of the sample mean of the scores with a sample
size of 36 has a mean of 85 and standard deviation of 0.5.
2). The heights of men aged 20 to 30 in a city are normally distributed with a
mean of 67 inches and a variance of 16. Define the sampling distribution of the
sample mean with a sample size of 400.
In this example, the population mean μ is equal to 67. Thus, μM is also equal to 67.
For the standard error of the mean, it can be noticed that n=400 and σ=4. Solve
for σM.
$\sigma_M = \frac{\sigma}{\sqrt{n}}$
$\sigma_M = \frac{4}{\sqrt{400}} = \frac{4}{20} = 0.2$
Therefore, the sampling distribution of the sample mean of the heights with a sample
size of 400 has a mean of 67 and standard deviation of 0.2.
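In both examples, the standard error is σ divided by √n, where σ is the square root of the given variance. Here is a minimal Python sketch; the name standard_error is illustrative.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Example 1: variance 9, so sigma = 3; n = 36
print(standard_error(sqrt(9), 36))
# Example 2: variance 16, so sigma = 4; n = 400
print(standard_error(sqrt(16), 400))
```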
Explore
Consider the same data on the height of men aged 20 to 30 in a city which is
normally distributed with a mean of 67 inches and a variance of 16. The standard
error is 0.2 when the sample size is 400. What if the sample size is 900? What if it is
further increased to 1600? What does this say about the relationship between the
sample size and standard error?
Try it!
The average household income in a municipality is ₱14 000 with a standard
deviation of ₱400. Define the sampling distribution of the sample mean with a
sample size of 100 households if the population household income follows a normal
distribution.
Try it! Solution
In this example, the population mean μ is equal to 14 000. Thus, μM is also equal to
14 000.
For the standard error of the mean, it can be seen that n=100 and σ=400. Solve
for σM.
$\sigma_M = \frac{\sigma}{\sqrt{n}}$
$\sigma_M = \frac{400}{\sqrt{100}} = \frac{400}{10} = 40$
Therefore, the sampling distribution of the sample mean of household incomes with
a sample size of 100 has a mean of ₱14 000 and a standard deviation of ₱40.
Key Points
Objective
At the end of this lesson, you should be able to define the sampling distribution of the
sample mean for normal population when the variance is unknown.
How can the sampling distribution of the sample mean of a normal population be
defined when the variance is unknown?
To solve for the population mean, use the formula $\mu = \frac{\sum x}{N}$, where μ is the
population mean, ∑x is the total of the data values, and N is the number of data values
in the population.
Five students took an exam on Statistics and Probability and got the following
scores: 5, 10, 17, 19, and 22. Find the mean of the sampling distribution of the
sample mean with a sample size of 3.
Solve for the population mean.
$\mu = \frac{\sum x}{N}$
$\mu = \frac{5 + 10 + 17 + 19 + 22}{5}$
μ=14.6
Now, solve for the sample mean of each of the possible samples of size 3 (there are
10 such samples). Then, solve for the mean of the sample means using the
formula $\mu_M = \frac{\sum x}{N}$, where μM is the mean of the sample means, ∑x is the total
of the sample means, and N is the number of sample means.
$\mu_M = \frac{10.67 + 11.33 + 12.33 + 13.67 + 14.67 + 15.33 + 15.33 + 16.33 + 17 + 19.33}{10}$
$\mu_M = 14.6$
In this example, the mean of the sampling distribution of the sample mean is equal to
the population mean. This equality always holds when all possible samples of the
same size are included, because every member of the population appears in the same
number of samples. When only some of the samples are available, the mean of the
sample means is approximately equal to the population mean.
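Enumerating every sample of size 3 confirms the result. This Python sketch uses itertools.combinations to list the 10 samples. It uses exact fractions so that the equality between the mean of the sample means and the population mean is exact, not approximate.

```python
from fractions import Fraction
from itertools import combinations

scores = [5, 10, 17, 19, 22]
n = 3

population_mean = Fraction(sum(scores), len(scores))
sample_means = [Fraction(sum(s), n) for s in combinations(scores, n)]
mean_of_means = sum(sample_means) / len(sample_means)

print("number of samples:", len(sample_means))
print("sample means:", [round(float(m), 2) for m in sample_means])
print("population mean:", float(population_mean))
print("mean of sample means:", float(mean_of_means))
```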
However, if the population is very large, it is not possible to solve for all the possible
combinations of samples of size n. In most cases, the mean of the sampling
distribution of the sample mean must be estimated from a single sample. In these
cases, a simple assumption can be made based on the results of the single sample.
In the example, notice that μM=14.6 is also at the center of the sampling distribution,
which is approximately normally distributed. Thus, it can be assumed that the mean
from a single sample is likely to be close to the mean of the sample means and to the
population mean.
Moreover, the same assumption can be made about the standard deviation of the
sampling distribution of the sample mean and the population standard deviation
using a single sample. Thus, using the sample size and the sample standard deviation
instead of the population standard deviation, σM (the standard deviation of the sample
means) can be estimated as $\frac{s}{\sqrt{n}}$, where s is the standard deviation of
the sample and n is the sample size. Remember that in solving for the standard
deviation of the sample, the sum of squared deviations is divided by n − 1 rather than n.
Note that these assumptions only work for large sample sizes. A sample size of at
least 30 is considered large enough in practice.
Example
The heights of a sample of 100 children aged 6 to 8 in a rural area are measured.
Their average height is 50 inches with a standard deviation of 14 inches. Estimate
the mean and standard deviation of the sampling distribution of the sample mean.
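The sample mean is 50 inches. Thus, the mean of the sampling distribution of the
sample mean is also 50 inches.
It is given that s=14 and n=100, so the standard error of the mean is
$\sigma_M = \frac{s}{\sqrt{n}} = \frac{14}{\sqrt{100}} = 1.4$ inches.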
Explore
Data from a sample of 30 social media users were collected to determine the
average time a person spends visiting social media every day. The average time
spent on social media every day by 29 of the 30 respondents is 25 minutes.
The remaining one respondent, who admits to being a social media addict, spent
408 minutes every day. How would this value affect the sampling distribution of the
sample mean?
Try it!
The average weight of women aged 40 to 50 who are working in a business district
is computed using a sample size of 60. Their average weight is 62 kg with a
standard deviation of 15 kg. Estimate the mean and standard deviation of the
sampling distribution of the sample mean.
Thus, the mean of the sampling distribution of the sample mean is also 62 kg.
It is given that s=15 and n=60. Compute for the standard error of the mean using the
formula.
$\sigma_M = \frac{s}{\sqrt{n}}$
$\sigma_M = \frac{15}{\sqrt{60}}$
$\sigma_M = 1.94$
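When the population standard deviation is unknown, the standard error is estimated as s/√n. This Python sketch reproduces both examples in this lesson; the name estimated_standard_error is illustrative. It also shows that statistics.stdev divides by n − 1, which matches the sample standard deviation s. The data list is hypothetical.

```python
from math import sqrt
from statistics import stdev

def estimated_standard_error(s, n):
    """Estimate of sigma_M from the sample standard deviation s and sample size n."""
    return s / sqrt(n)

print(round(estimated_standard_error(14, 100), 2))  # children's heights: 1.4
print(round(estimated_standard_error(15, 60), 2))   # women's weights: 1.94

# statistics.stdev divides by n - 1, matching the sample standard deviation s
sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(stdev(sample))
```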
Key Points
Objective
At the end of this lesson, you should be able to understand how a sampling
distribution of the mean of large sample sizes approaches a normal distribution.
In this lesson, you need to remember the properties of a sampling distribution and
how to get the mean and variance of the sampling distribution of sample means.
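To get the standard deviation of the population, use the following formula:
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
where N is the number of data values in the population, x_i is the i-th data value, and
μ is the population mean.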
Statisticians differ in opinion as to which sample size to take. Some would suggest
taking a sample size of at least 30 to get a close approximation of a normal
distribution. Others would suggest a sample size as large as 50 or even more,
especially when the parent population being sampled does not appear to be normal.
Example
To illustrate the Central Limit Theorem, take the population that contains five
numbers 1, 2, 3, 4, and 5 and consider the samples of size two.
a. Mean
$\mu = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
b. Standard Deviation
$\sigma = \sqrt{\frac{1}{5}\sum_{i=1}^{5}(x_i - 3)^2} = 1.414$
Sampling distribution of sample means of sample size n=2
The histogram below shows the sampling distribution of the sample means for
n = 2.
Summary:
1. The mean of the sampling distribution is equal to the mean of the population,
i.e., $\mu = \mu_{\bar{x}} = 3$.
2. The standard error of the mean for the sampling distribution is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.414}{\sqrt{2}} = 1$.
3. The histogram of the sampling distribution suggests an approximately normal shape.
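You can check the summary by listing every sample of size 2. This Python sketch assumes the samples are drawn with replacement (all 25 ordered pairs), which is the setting in which the standard error equals σ/√n = 1.414/√2 = 1. Without replacement there would be only 10 samples, and the standard deviation of their means would be about 0.866 instead.

```python
from itertools import product
from math import sqrt

population = [1, 2, 3, 4, 5]
n = 2

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Population standard deviation: divide by N, as in the formula for sigma."""
    mu = mean(values)
    return sqrt(sum((v - mu) ** 2 for v in values) / len(values))

# All ordered samples of size 2 drawn WITH replacement (5 * 5 = 25 samples)
sample_means = [mean(s) for s in product(population, repeat=n)]

print("population mean:", mean(population), " population SD:", round(pop_sd(population), 3))
print("mean of sample means:", mean(sample_means))
print("SD of sample means:", round(pop_sd(sample_means), 3))
print("sigma / sqrt(n):", round(pop_sd(population) / sqrt(n), 3))
```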
Try it!
Test the normality of a sampling distribution of sample means for 30 random
samples with sample size n=3 from the population that contains the five numbers 1, 2,
3, 4, and 5.