
INFORMATIC COMPUTER INSTITUTE OF AGUSAN DEL SUR INC.

San Francisco, Agusan del Sur

STATISTICS
AND
PROBABILITY
RANDOM VARIABLES AND PROBABILITY DISTRIBUTION

Lesson 1: Distinguishing Continuous and Discrete Random Variables

OBJECTIVES:
At the end of this lesson, you should be able to:

• illustrate random variables, and
• distinguish between a discrete and a continuous random variable.

Before you proceed with this lesson, you should be able to identify the elements of a
set and the domain and the range of a function.

Identifying the Elements of a Set

• Consider the set of colors in a rainbow. The elements of that set are red,
orange, yellow, green, blue, indigo, and violet. Together, those seven elements make up
the set. In the same way, the sample space of an experiment is the set of all its possible outcomes.

Domain and Range of a Function

• Recall that the domain of a function is the set of values of x while the range is
the set of values of y. For example, the set of ordered pairs {(2,5), (3,7), (4,9),
(5,11)} is a function with domain {2, 3, 4, 5} and range {5, 7, 9, 11}.

Learn about it!


A random variable is a function whose domain is the sample space of a random
experiment and whose range is a set of real numbers.

Example
Consider a random experiment of tossing a fair coin three times. In this scenario, the
domain can be defined as the set of all possible outcomes of the experiment and the
range of the random variable as the total number of tails that comes out after tossing
a coin three times.

Let Y be the number of tails in three tosses of a fair coin (the random
variable).
The set of possible outcomes (domain) of the experiment is as follows:
{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

For each element in the domain, there is a corresponding value of the random
variable Y. A specific value of a random variable is denoted by the lowercase letter y. The
domain and range of the random variable Y are shown in the table below:

Outcome    Number of tails (y)
HHH        0
HHT        1
HTH        1
HTT        2
THH        1
THT        2
TTH        2
TTT        3

Therefore, the possible values of the random variable are 0, 1, 2, and 3.
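
The mapping from outcomes to values can also be sketched in code. The following is a minimal Python illustration, not part of the original lesson; the names `sample_space` and `number_of_tails` are chosen here only for the example.

```python
from itertools import product

# Sample space: every sequence of three tosses, e.g. "HHT".
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

def number_of_tails(outcome: str) -> int:
    """The random variable Y: maps an outcome to its number of tails."""
    return outcome.count("T")

# Print the domain-to-range mapping, one line per outcome.
for outcome in sample_space:
    print(outcome, "->", number_of_tails(outcome))

# Collect the distinct values that Y can take.
possible_values = sorted({number_of_tails(o) for o in sample_space})
print(possible_values)
```

The function plays the role of the random variable: its inputs are the eight outcomes and its outputs are the real numbers 0 to 3.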

Learn about it!


Random variables are classified as either discrete or continuous.

A discrete random variable is a random variable whose set of possible values
is finite or countably infinite. Its values can be represented as separate points on a
number line.

The following are examples of discrete random variables:

• the number of correct answers in a 5-item true or false quiz
• the number of siblings of each of your classmates
• the number of people in each country

A continuous random variable is a random variable whose set of possible
values is uncountably infinite. Its values can be represented as an interval on a number line.

The following are examples of continuous random variables:

• the height of each student in a class
• the weight of each piece of airline baggage
• the waiting time before a person gets a taxi at a taxi stand

Explore
Consider the place where you are right now, be it a classroom, living room, or a
library. Can you name one discrete random variable and one continuous random
variable related to the things that you can see? How do you get the values of those
random variables?

Try it!
Determine whether each of the following variables is discrete or continuous.

1. number of leaves of a 10-week-old plant
2. amount of time spent using social media applications
3. temperature of the air at different times of the day
4. rating of an employee on a 10-point scale
5. number of coins in a piggy bank

Try it! Answers
1. It is a discrete variable because the number of leaves can only be a whole number.
2. It is a continuous variable because the amount of time can be any value within a
certain range. Thus, its possible values cannot be counted.
3. It is a continuous variable because air temperature can be any value within a
range. Thus, it cannot be represented by whole numbers alone.
4. It is a discrete variable because the rating can only be one of the whole numbers from 1
to 10.
5. It is a discrete variable because the coins can be counted. Hence, their number is
always a whole number.

Tip
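Some discrete random variables have a very large, or even countably infinite, number of
possible values. For instance, consider counting the grains of sand on a beach. Although it
would be impractical to conduct that experiment, the random variable it would generate is
still discrete because its values are whole numbers. An example with infinitely many possible
values is the number of coin tosses needed to get the first tail, which can be 1, 2, 3, and so on without end.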

Key Points

• A random variable is a function that takes each possible outcome of an
experiment and assigns to it a corresponding real number.
• Discrete random variables have possible values that are finite or countably infinite.
• Continuous random variables have possible values that are uncountably infinite.

Objectives
At the end of this lesson, you should be able to:

• find the values of a random variable, and
• illustrate a probability distribution for a discrete random variable and its
properties.

Before you proceed with this lesson, you should be able to recall random variables.

• In conducting an experiment, each possible result is called an outcome, and
the set of all possible outcomes is called the sample space.
• There are two types of random variables: discrete and continuous. Discrete
random variables assume a countable number of values (often integers that arise from counting), while
continuous random variables assume an uncountable number of values (they arise from
measurement).
Examples:

Discrete Random Variable

• A die is rolled and the score shown on the top face is observed. The random
variable x is the score shown. x can take on the whole-number values from 1 to 6, which
are the numbers that the die shows.

Continuous Random Variable

• Let the lifetime of a cell phone battery be a random variable. If it is measured
exactly, with decimals and no rounding off, the random variable can take on
any value within an interval.

Random Variables and Probability Distributions, Statistics and Probability


Random Variables and Probability Distribution


Learn about it!


Consider the experiment of tossing two coins. Let the random variable x be the
number of tails observed upon tossing the two coins at the same time.
Determine the probability of the random variable x and construct its probability
distribution.
Probability Distribution of a Discrete Random Variable
A probability distribution is a distribution of the probabilities associated with the
values of a random variable.

A probability distribution must exhibit the two properties of probability.

The probability P(x) of each value of a random variable must be between zero and one, that
is, $0 \le P(x) \le 1$. This means that a probability must not exceed one or have a negative
value.
The sum of the probabilities of all values of the random variable in an experiment should be
equal to one, that is, $\sum_{i=1}^{n} P(x_i) = 1$
where:

$x_i$ is the ith value of the random variable in the experiment;
i is an integer index from 1 to n;
$P(x_i)$ is the probability of the value $x_i$; and
n is the total number of possible values of the random variable in the experiment.

How to Do
Step 1: Determine the sample space.
The sample space is {HH, HT, TH, TT}. There are four elements in the sample
space.

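Step 2: Calculate the probability of each value of the random variable.

Identify the probability of each value of the random variable in relation to the sample space.

$P(x=0) = P(\text{0 tails}) = P(HH) = \frac{1}{4} = 0.25$
$P(x=1) = P(\text{1 tail}) = P(TH \text{ or } HT) = \frac{2}{4} = 0.5$
$P(x=2) = P(\text{2 tails}) = P(TT) = \frac{1}{4} = 0.25$

Step 3: Construct a table for the probability distribution.

x       0       1       2
P(x)    0.25    0.5     0.25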
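
The three steps above can be sketched in Python. This is only an illustration under the assumption that the four outcomes are equally likely; the variable names are invented for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: sample space of tossing two coins: HH, HT, TH, TT.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Step 2: count how many outcomes give each value of x (number of tails).
counts = Counter(outcome.count("T") for outcome in sample_space)

# Outcomes are equally likely, so P(x) = count / size of the sample space.
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

# Step 3: print the probability distribution as a small table.
for x, p in distribution.items():
    print(f"P(x={x}) = {p} = {float(p)}")

# Check the two properties of a probability distribution.
assert all(0 <= p <= 1 for p in distribution.values())
assert sum(distribution.values()) == 1
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to one does not depend on rounding.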

Try it! Solution


Step 1: Determine the sample space.

The random variable x for this experiment is the number of dots seen when the die is
rolled. Thus, x can take the values 1, 2, and 3.
Rolling the modified five-sided die will yield the sample space {1, 2, 2, 3, 3}.

Step 2: Calculate the probability of each value of the random variable.

Based on the sample space, the following probabilities can be established:

$P(x=1) = P(\text{1 dot}) = \frac{1}{5} = 0.2$
$P(x=2) = P(\text{2 dots}) = \frac{2}{5} = 0.4$
$P(x=3) = P(\text{3 dots}) = \frac{2}{5} = 0.4$

Step 3: Construct a table for the probability distribution.

x       1      2      3
P(x)    0.2    0.4    0.4

What do you think?


What other ways could you present a probability distribution of discrete random
variables?

Key Points

• Discrete random variables assume a countable number of values, often integer
values. Examples are the number of male students in a classroom and the number of
cellphones owned.
• Continuous random variables assume an uncountable number of values.
Examples are body temperature, life span, and distance.
• Probability distributions are used to represent the probabilities of the values of random
variables in a population.

Constructing Probability Mass Function and Computing for the Probability of a Random Variable


Objectives
At the end of the lesson, you should be able to:

• construct the probability mass function of a discrete random variable and its
corresponding histogram, and
• compute probabilities corresponding to a given random variable.

The table below shows the possible outcomes of tossing a coin twice.

First toss    Second toss
H             H
H             T
T             H
T             T

Based on the table, what is the probability of getting two consecutive heads? What is
the probability of getting two consecutive tails?

Which is more likely to happen: getting two consecutive heads or getting two
consecutive tails?

Is the table sufficient to answer the previous questions?

Learn about it!


Construct a probability mass function and its corresponding histogram for the
number of heads when a coin is tossed twice.

Probability Mass Function


There is an easier way to know which event is more likely to occur: the
probability mass function.

A random variable x takes on a set of values, each with its own probability. It is said to
be discrete when the number of possible values is finite or countable. Some examples of
discrete random variables are the result of a coin toss (heads or tails), a survival status
(dead or alive), and a test result (positive or negative), each coded as a number, as well as
the number shown when a die is rolled.
The probability mass function is a function that maps each possible
value of the random variable x to its probability of occurrence. Let
the probability of occurrence of a value be denoted by P(x). The value of P(x)
ranges from zero to one.
The graph of a probability mass function is called a histogram.

How to Do
Here are the steps in creating a probability mass function and its corresponding
histogram:

Step 1: Construct the table containing the random variables. Identify all possible
outcomes.

Step 2: Count the occurrence of each possible outcome.

The experiment is concerned with the number of heads. Thus, count the number of
heads for each possible outcome.

Step 3: Determine the probability of each outcome.

The formula for finding the probability, P(x), is
$P(x) = \frac{\text{number of occurrences of the event}}{\text{total number of possible outcomes}}$.
In the experiment above, there are four possible outcomes. Let x be the number of
heads and P(x) be the probability of each outcome.
When x=0, $P(0) = \frac{1}{4}$
When x=1, $P(1) = \frac{1+1}{4} = \frac{2}{4} = \frac{1}{2}$
When x=2, $P(2) = \frac{1}{4}$

Step 4: Construct a table for the expected outcomes, x, and their corresponding
probabilities, P(x).

x       0      1      2
P(x)    1/4    1/2    1/4

Note: The sum of all the probabilities should be equal to one.

Step 5: Create the corresponding histogram with the x-axis as the expected outcome
and y-axis as the probability of each outcome.

Try it!
Find the probability mass function of the sum of the numbers that appear when a pair of dice is thrown. Then, draw its
corresponding histogram.

Try it! Solution


Step 1: Construct the table containing the random variables. Then, identify all
possible outcomes.

Group the possible outcomes in such a way that the sums of the two numbers
appearing on the dice are the same.

Step 2: Count the occurrence of each possible outcome.


The expected outcome is the sum of the numbers appearing on the dice.

Step 3: Determine the probability of each outcome.

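Let x be the sum of the numbers appearing on the dice and P(x) be the probability of
each outcome.
The total number of possible outcomes is 36.
When x=2, $P(2) = \frac{1}{36}$
When x=3, $P(3) = \frac{1+1}{36} = \frac{2}{36} = \frac{1}{18}$
When x=4, $P(4) = \frac{1+1+1}{36} = \frac{3}{36} = \frac{1}{12}$
When x=5, $P(5) = \frac{1+1+1+1}{36} = \frac{4}{36} = \frac{1}{9}$
When x=6, $P(6) = \frac{1+1+1+1+1}{36} = \frac{5}{36}$
When x=7, $P(7) = \frac{1+1+1+1+1+1}{36} = \frac{6}{36} = \frac{1}{6}$
When x=8, $P(8) = \frac{1+1+1+1+1}{36} = \frac{5}{36}$
When x=9, $P(9) = \frac{1+1+1+1}{36} = \frac{4}{36} = \frac{1}{9}$
When x=10, $P(10) = \frac{1+1+1}{36} = \frac{3}{36} = \frac{1}{12}$
When x=11, $P(11) = \frac{1+1}{36} = \frac{2}{36} = \frac{1}{18}$
When x=12, $P(12) = \frac{1}{36}$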
Step 4: Construct a table for the expected outcomes, x, and their corresponding
probabilities, P(x).

Step 5: Create the corresponding histogram with the x-axis as the expected outcome
and y-axis as the probability of each outcome.
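
The counting in Steps 1 to 5 can be checked with a short Python sketch. It assumes two fair six-sided dice and uses a row of `#` characters as a stand-in for the bars of the histogram; the names `rolls`, `counts`, and `pmf` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
rolls = list(product(range(1, 7), repeat=2))

# Count how many pairs produce each sum.
counts = Counter(a + b for a, b in rolls)

# Probability mass function: P(x) = occurrences of x / 36.
pmf = {x: Fraction(counts[x], len(rolls)) for x in range(2, 13)}

# Text histogram: one '#' per favorable outcome out of 36.
for x, p in pmf.items():
    print(f"{x:>2} {str(p):>5} {'#' * counts[x]}")
```

The printed bars rise from a sum of 2 up to a sum of 7 and then fall symmetrically, which is the shape the drawn histogram should have.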

What do you think?


Would probability mass function work for continuous random variables?

Key Points

• A random variable x takes on a set of values, each with its own probability.
• A random variable is discrete when the number of possible values is finite or
countable. Some examples are the result of a coin toss (heads or tails), a survival
status (dead or alive), and a test result (positive or negative), each coded as a number,
as well as the number shown when a die is rolled.
• The probability mass function is a function that maps each
possible value of the random variable x to its probability of
occurrence. Let the probability of occurrence be denoted by P(x). The value
of P(x) ranges from zero to one.
• A histogram is the graph of the probability mass function. When each bar has a width of one,
the total area covered by the histogram is one.



Mean of a Discrete Random Variable


Objectives
At the end of this lesson, you should be able to:

• illustrate, calculate, and interpret the mean of a discrete random variable, and
• solve problems involving the mean of probability distributions.

Before you proceed to the lesson, you must be able to recall the definition of random
variable and discrete random variable.

A random variable is a variable whose values are numbers determined by
chance. When the value of a variable is the outcome of a random experiment, that
variable is a random variable. The following are examples of random variables:

• number of wins that a basketball team has in a given season
• weight of the students
• number of people at a seminar
• time it takes to get to the office

Random variables that are countable are discrete, while those that usually arise from
measurement and are not countable are continuous.

It is important that you can distinguish between discrete and continuous random
variables because different statistical techniques are used to analyze each.

Learn about it!


Calculate the mean of the random variable with the given probability
distribution.

Mean of a Discrete Random Variable

• The mean $\mu_x$ of the discrete random variable x is called the expected value
of x.
• It is denoted by E(x).
• The expected value of a discrete random variable is equal to the mean of the
random variable.
• The mean of the random variable x is the sum of the products of each possible
value of x and its probability of occurrence.
• To compute the mean of a discrete random variable, or the expected value of x,
the following formula is used:

$E(x) = \mu_x = \sum_i [x_i \cdot P(x_i)]$

where:

$x_i$ is the value of the random variable for outcome i,
$P(x_i)$ is the probability that the random variable takes the value $x_i$, and
$\mu_x$ is the mean of the random variable x.

How to Do
Step 1: Identify what is asked.

The expected value of the given probability distribution

Step 2: Identify what is given.

The values of x are 4, 8, 12, 16, and 20.
The probabilities of occurrence P(x) are 0.50, 0.25, 0.15, 0.05, and 0.05, respectively.

Step 3: Use the formula to solve for the unknown.

$E(x) = \mu_x = (4 \cdot 0.50) + (8 \cdot 0.25) + (12 \cdot 0.15) + (16 \cdot 0.05) + (20 \cdot 0.05)$
$E(x) = \mu_x = 2 + 2 + 1.8 + 0.8 + 1$
$E(x) = \mu_x = 7.6$

Therefore, the expected value of x is 7.6.
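
The same computation can be written as a small Python function. This is a sketch only; `expected_value` is a name chosen for this example, and the tolerance used to check that the probabilities sum to one is an assumption.

```python
def expected_value(values, probabilities):
    """Return E(x), the sum of x_i * P(x_i), for a discrete distribution."""
    # Guard against an invalid distribution (allowing for rounding error).
    if abs(sum(probabilities) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probabilities))

x = [4, 8, 12, 16, 20]
p = [0.50, 0.25, 0.15, 0.05, 0.05]

mean = expected_value(x, p)
print(round(mean, 2))
```

The function pairs each value with its probability, multiplies, and adds the products, which mirrors Step 3 term by term.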

Try it!
Lloyd repairs computers for money on weekday mornings. He has compiled the
following probability distribution of the number of customers he is likely to get each
day. How many customers does he expect per day?

Try it! Solution


Step 1: Identify what is asked.

The average number of customers he expects per day

Step 2: Identify what is given.

The numbers of customers are 20, 25, 30, 35, and 40.
The probabilities of occurrence P(x) are 0.15, 0.35, 0.30, 0.15, and 0.05, respectively.

Step 3: Use the formula to solve for the unknown.

$E(x) = \mu_x = (20 \cdot 0.15) + (25 \cdot 0.35) + (30 \cdot 0.30) + (35 \cdot 0.15) + (40 \cdot 0.05)$
$E(x) = \mu_x = 3 + 8.75 + 9 + 5.25 + 2$
$E(x) = \mu_x = 28$

The expected value of x is 28. Therefore, Lloyd expects 28 customers per day.

Tips

• If the possible outcomes are given as frequencies, you can find the probability of
each possible outcome by dividing its frequency by the sum of the
frequencies. Then check that each probability is between zero and one, and
that the sum of all probabilities is one.
• You may use a table to organize your work in computing the mean, with
columns for x, P(x), and x⋅P(x).
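
Both tips can be tried out in a few lines of Python. The frequencies below are hypothetical and were made up for this illustration; they do not come from the lesson.

```python
from fractions import Fraction

# Hypothetical data (not from the lesson): value -> observed frequency.
frequencies = {0: 5, 1: 10, 2: 20, 3: 15}

# Divide each frequency by the sum of the frequencies.
total = sum(frequencies.values())
probabilities = {x: Fraction(f, total) for x, f in frequencies.items()}

# Check that each probability is in [0, 1] and that they sum to one.
assert all(0 <= p <= 1 for p in probabilities.values())
assert sum(probabilities.values()) == 1

# Work table with columns x, P(x), and x * P(x).
print(f"{'x':>3} {'P(x)':>6} {'x*P(x)':>7}")
mean = Fraction(0)
for x, p in probabilities.items():
    mean += x * p
    print(f"{x:>3} {float(p):>6.2f} {float(x * p):>7.2f}")
print("mean =", float(mean))
```

The last column of the printed table holds the products x⋅P(x), and their total is the mean.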

Key Point
The mean, or expected value, of a discrete random variable x is given by the
formula

$E(x) = \mu_x = \sum_i [x_i \cdot P(x_i)]$

where $x_i$ is a value of the random variable and $P(x_i)$ is the probability of observing
that value.

Variance of a Discrete Random Variable


Objectives
At the end of this lesson, you should be able to:

• calculate and interpret the variance of a discrete random variable, and
• solve problems involving the variance of a probability distribution.

In the previous lesson, you learned that the mean of a probability distribution gives
the expected outcome over repeated trials. If the experiment is performed
many times, the average of all the outcomes would be close to the mean of the
random variable.

In this lesson, you will learn how to quantify the spread of a probability
distribution by using the variance, which measures how far the values of the
random variable tend to be from the mean.

Learn about it!


In a school fair, two games are played using a six-sided die. For game A,
you win ₱20 if the result of rolling the die is “1” and ₱8 if the result is “2”.
Otherwise, you lose ₱10. For game B, you win ₱24 if the result is “1” or
“2”. Otherwise, you lose ₱15. The tables below show the probability
distribution of the amount won or lost in each game. What is the
variance of each game?

Game A
x       20     8      −10
P(x)    1/6    1/6    4/6

Game B
x       24     −15
P(x)    2/6    4/6

Variance of a Discrete Random Variable


• The variance of a discrete random variable x is denoted by $\sigma^2$. The variance
is the square of the standard deviation, which measures how close the
values of the random variable tend to be to the mean.
• The variance of a discrete random variable is given by the formula:

$\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$

where:

$\sigma^2$ = variance,
x = value of the random variable,
P(x) = probability of x, and
$\mu$ = mean of the random variable.

• A smaller standard deviation indicates that the values are clustered more closely about
the mean. A larger one indicates that the values are more spread out.

How to Do
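Step 1: Find the known facts of the problem.

Game A

$x = 20, 8, -10$
$P(x) = \frac{1}{6}, \frac{1}{6}, \frac{4}{6}$

Game B

$x = 24, -15$
$P(x) = \frac{2}{6}, \frac{4}{6}$

Step 2: Solve for the mean.

Mean of game A
$\mu = 20\left(\frac{1}{6}\right) + 8\left(\frac{1}{6}\right) - 10\left(\frac{4}{6}\right) = -2$
Mean of game B
$\mu = 24\left(\frac{2}{6}\right) - 15\left(\frac{4}{6}\right) = -2$

Step 3: Solve for the variance.

Game A
$\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
$\sigma^2 = \left[20^2\left(\frac{1}{6}\right) + 8^2\left(\frac{1}{6}\right) + (-10)^2\left(\frac{4}{6}\right)\right] - (-2)^2$
$\sigma^2 = 140$

Game B
$\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
$\sigma^2 = \left[24^2\left(\frac{2}{6}\right) + (-15)^2\left(\frac{4}{6}\right)\right] - (-2)^2$
$\sigma^2 = 338$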
To interpret the results, calculate the standard deviation, which is the square root of the
variance. The standard deviations of
game A and game B are 11.83 and 18.38, respectively. Since game A has the smaller
standard deviation, its outcomes tend to be closer to the mean.
When game A is played many times, the amounts won or lost vary less
than when game B is played.
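
The mean, variance, and standard deviation of both games can be verified with a short Python sketch. The function name `mean_and_variance` is illustrative, and exact fractions are used so that the results are not affected by rounding.

```python
from fractions import Fraction
from math import sqrt

def mean_and_variance(values, probabilities):
    """Return (mu, sigma^2) using sigma^2 = sum(x^2 * P(x)) - mu^2."""
    mu = sum(x * p for x, p in zip(values, probabilities))
    variance = sum(x**2 * p for x, p in zip(values, probabilities)) - mu**2
    return mu, variance

# Amounts won or lost and their probabilities for each game.
game_a = mean_and_variance([20, 8, -10], [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)])
game_b = mean_and_variance([24, -15], [Fraction(2, 6), Fraction(4, 6)])

for name, (mu, var) in (("A", game_a), ("B", game_b)):
    print(f"Game {name}: mean = {mu}, variance = {var}, sd = {sqrt(var):.2f}")
```

Both games have the same mean, so the comparison between them rests entirely on the spread.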

Try it!
The students in an English class took a quiz with five questions. The random
variable x represents the number of questions answered correctly. The probability
distribution is given by the table below.

x       0       1       2       3       4       5
P(x)    0.08    0.07    0.16    0.21    0.25    0.23

What is the variance?

Try it! Solution


Step 1: Find the known facts of the problem.

$x = 0, 1, 2, 3, 4, 5$
$P(x) = 0.08, 0.07, 0.16, 0.21, 0.25, 0.23$

Step 2: Solve for the mean.

$\mu = 0(0.08) + 1(0.07) + 2(0.16) + 3(0.21) + 4(0.25) + 5(0.23)$
$\mu = 3.17$

Step 3: Solve for the variance.

$\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$
$\sigma^2 = [0^2(0.08) + 1^2(0.07) + 2^2(0.16) + 3^2(0.21) + 4^2(0.25) + 5^2(0.23)] - (3.17)^2$
$\sigma^2 = 12.35 - 10.05 = 2.30$

To calculate the standard deviation, get the square root of 2.30, which is about 1.52. This means that
the scores typically differ from the mean of 3.17 by about 1.52 points above or below.

What do you think?


What does it mean when the variance of a discrete random variable is equal to zero?

Key Points
• In a probability distribution, the variance of a random variable is found by
subtracting the square of the mean from the sum of the products of the
squares of the values of the random variable and their probabilities.
• The variance of a random variable x is denoted by $\sigma^2$ and given by the
formula $\sigma^2 = \sum [x^2 \cdot P(x)] - \mu^2$.
• A small standard deviation indicates that the values are clustered closely about
the mean, while a large one indicates that the values are more spread out.
• The variance is the square of the standard deviation.

Normal Distribution, Statistics and Probability


The Normal Curve


Objectives
At the end of this lesson, you should be able to:

• illustrate a normal random variable and its characteristics, and
• construct a normal curve.

The normal distribution is the most commonly used distribution because it
applies in many different fields, including cases where the underlying distribution is unknown. It
is the continuous probability distribution of a normal random variable.

Many measurable quantities like height, weight, temperature, and test
scores approximately follow a normal distribution, which makes it very useful in drawing
conclusions about a population from a single sample.

What does the graph of a normal distribution look like?

Learn about it!


Normal Random Variable
The normal random variable is a continuous random variable that follows the normal
distribution with mean μ and standard deviation σ, which is denoted in this module
by N(μ,σ). It can take any real value and has the density function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
where x is a value of the normal random variable.
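
To make the density function concrete, here is a minimal Python sketch that evaluates it directly from the formula. The function name `normal_pdf` is an assumption made for this example.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# The curve is highest at the mean and symmetric about it.
print(normal_pdf(0, 0, 1))
print(normal_pdf(-1, 0, 1) == normal_pdf(1, 0, 1))

# A larger standard deviation gives a lower peak at the mean.
print(normal_pdf(0, 0, 2) < normal_pdf(0, 0, 1))
```

Note that the value of the density at a point is the height of the curve there, not a probability. Probabilities come from areas under the curve.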
Normal Curve

• The graph of a normal distribution is bell-shaped.
• It depends on two factors: the mean and the standard deviation.
• The mean determines the center of the graph, and the standard deviation
determines its height and width.
• Normal distributions with a higher standard deviation have curves that are
shorter and wider.
• The mean of a normal distribution lies at the center of the bell curve along the
horizontal axis.
• A normal curve looks like this:
[Figure: a bell-shaped normal curve centered at the mean]

Characteristics of a Normal Curve

• The total area under the normal curve is equal to one.
• The probability that x equals any single value is zero. The
normal distribution is a continuous probability distribution, so probability is
only found for intervals. Remember that in finding probabilities in a
continuous probability distribution, the area between two values is calculated.
If there is only a single value, the area above it is zero.
• About 68% of the area under the normal curve falls within one standard
deviation of the mean.
• About 95% of the area under the normal curve falls within two standard
deviations of the mean.
• About 99.7% of the area under the normal curve falls within three standard
deviations of the mean.
• It is asymptotic to the horizontal axis.
• It is symmetric about the mean.
• It has a maximum point at the mean.
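
The 68%, 95%, and 99.7% figures can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the choice of the standard normal distribution (mean 0, standard deviation 1) is only for convenience, since the same proportions hold for any normal distribution.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

for k in (1, 2, 3):
    # Area between k standard deviations below and above the mean.
    area = z.cdf(k) - z.cdf(-k)
    print(f"within {k} standard deviation(s): {area:.4f}")
```

The printed areas round to the percentages listed above, which is why they are stated with "about".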

Examples
Describe the curve of the normal distribution N(3,10).
This is a normal distribution denoted by N(μ,σ). The normal curve for this distribution
has its center at 3 with a standard deviation of 10.
Describe the curve of the normal distribution N(0,2).
This is a normal distribution denoted by N(μ,σ). The normal curve for this distribution
has its center at 0 with a standard deviation of 2.

Explore
Test scores of a group of 200 students have a mean of 85 with a standard deviation
of 5. The passing score is 70. Estimate the passing rate of the students. (Note:
Remember that about 99.7% of the area under the normal curve falls within three
standard deviations from the mean.)

Try it!
The average weight of the 100 g variant of chocolate bar in a chocolate factory is
100 g with a standard deviation of 0.1 g. The Quality Control team wants at least
95% of the chocolate bars to be in the range of 99.5 g to 100.5 g. Illustrate the normal
curve of the chocolate bars and determine if the requirement of Quality Control is
fulfilled.

Try it! Solution
The normal curve of the weight of the chocolate bars has its center at 100 with a
standard deviation of 0.1.

About 95% of the area under the normal curve falls within two standard deviations
of the mean. Two standard deviations are 2(0.1) = 0.2 g, so about 95% of the chocolate
bars weigh between 99.8 g and 100.2 g. This interval lies inside the range of 99.5 g to
100.5 g required by Quality Control, so at least 95% of the bars are within the required
range. Thus, the requirement set by Quality Control
is fulfilled.
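
The conclusion can be cross-checked by computing the share of bars in the required range directly. This sketch assumes the weights are exactly normally distributed and uses `NormalDist` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

weights = NormalDist(mu=100, sigma=0.1)  # weights in grams

# Share of bars expected to weigh between 99.5 g and 100.5 g.
share_in_range = weights.cdf(100.5) - weights.cdf(99.5)

print(f"{share_in_range:.6f}")
print(share_in_range >= 0.95)
```

Because the limits are five standard deviations from the mean, the computed share is far above the 95% that Quality Control requires.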

What do you think?


Normal distributions follow a complicated probability density function. Given that
normal distributions are continuous, how can the probabilities of intervals be
computed without using the formula?

Key Points

 The normal distribution follows a normal curve that is bell-shaped.


 The mean is the center of the normal curve.
 The standard deviation determines the height and width of the normal curve.



Standard Scores and Area under the Normal Curve


Objective
At the end of this lesson, you should be able to identify regions under the normal
curve corresponding to different standard normal values.

Data can be distributed in many different ways. The most common distribution that
applies to many real-life data is the normal distribution. Heights and weights of
people, exam scores, and blood pressure all approximately follow the normal distribution. In the
normal distribution, most values cluster around the mean, and values become less
frequent the farther they are from the mean.

Learn about it!


Find the area of the region to the right of Z=1.36 under the standard normal
curve.

Standard Scores or Z-Scores


In a normal distribution, the number of standard deviations that a value lies away
from the mean can be found. By converting values to standard scores
(z-scores), it becomes easier to draw conclusions from the data using the Standard
Normal Distribution Table (z-table). Here are important things you should know when
looking at the z-table.

• If the region is to the left of the z-score, then the z-table value is the area.
• If the region is to the right of the z-score, then subtract the z-table value from
1 to get the area.
• If the region is between two z-scores, subtract the z-table value of the leftmost
z-score from the z-table value of the rightmost z-score to get the area.

In the standard normal distribution, the mean is equal to zero and the standard
deviation is equal to one. The values under the normal curve indicate the proportion
of area in each region. For instance, the area between the mean and 1 standard
deviation below or above the mean is approximately 0.34. Additionally, the absolute
value of a z-score gives the distance of a value from the mean in terms of standard
deviations.

How to Do
Step 1: Illustrate the z-score and the area that is asked.

The illustration below depicts the area to the right of Z=1.36.

Step 2: Look for the value of the z-score on the z-table. If there are two z-scores,
look for both values on the z-table.

The area under the curve to the left of Z=1.36 is 0.9131.


Step 3: Solve for the area of the region using the value/s obtained from the z-table.

The area of the entire region under the standard normal curve is one. Thus, the area
of the region to the right of Z=1.36 is equal to 1−P(Z<1.36).

Since the region is to the right of the z-score, subtract 0.9131 from
1. This gives us:

1 − 0.9131 = 0.0869
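
The table lookup and subtraction can be reproduced in Python. In this sketch, `NormalDist().cdf` from the standard `statistics` module plays the role of the z-table; that substitution is an assumption of the example, not the method used in the lesson.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the left of Z = 1.36 (the z-table value).
left_area = standard_normal.cdf(1.36)

# Area to the right is the total area (one) minus the left area.
right_area = 1 - left_area

print(round(left_area, 4), round(right_area, 4))
```

Rounded to four decimal places, the two areas agree with the z-table value and the answer obtained above.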

Try it!
Find the area of the region between Z=-0.56 and Z=2.1 under the standard normal
curve.

Try it! Solution


Step 1: Look for the value of the z-score on the z-table. If there are two z-scores,
look for both values on the z-table.

The corresponding values for the z-scores -0.56 and 2.1 are 0.2877 and 0.9821,
respectively.

Step 2: Solve for the area of the region using the value/s obtained from the z-table.

Since the region is between two z-scores, subtract the z-table value of the leftmost
z-score, which is 0.2877, from the z-table value of the rightmost z-score, which is
0.9821. This gives us:

0.9821 − 0.2877 = 0.6944

Thus, $P(-0.56 < Z < 2.1) = 0.6944$.
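
The rule for a region between two z-scores can be wrapped in a small helper. This is a sketch in Python; `area_between` is a hypothetical name, and the cumulative distribution function of `NormalDist` stands in for the z-table.

```python
from statistics import NormalDist

def area_between(z_left: float, z_right: float) -> float:
    """Area under the standard normal curve between two z-scores."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Left area of the rightmost z-score minus left area of the leftmost one.
    return standard_normal.cdf(z_right) - standard_normal.cdf(z_left)

print(round(area_between(-0.56, 2.1), 4))
```

The helper subtracts in the same order as the solution: rightmost value first, then the leftmost value.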

What do you think?


The area bounded by two z-scores is 0.8413. What are these two z-scores?

Key Points

• The area of a region under the standard normal curve cannot be negative.
• The total area of the entire region under the standard normal curve is one.
• The standard normal curve is symmetric. Thus, the area of the region to the
left of Z=z is equal to the area of the region to the right of Z=−z.



Converting a Normal Random Variable to a Standard Normal Variable and Vice Versa


Objective
At the end of this lesson, you should be able to convert a normal random variable to
a standard normal variable and vice versa.

A continuous random variable that is normally distributed is called a normal random
variable. The graph of a normal distribution is called a normal curve.

The normal distribution is the most commonly used distribution. Its graph
is shaped like a bell and is symmetric about its center. Thus, the mean, median,
and mode are equal and are located at the center.

What are the mean and standard deviation of a standard normal distribution?

The standard normal distribution has a mean equal to zero and a standard deviation
equal to one.

A standard normal variable is a normally distributed random variable with a mean of
zero and a standard deviation of one. It is denoted by z.

Learn about it!


Suppose that Math final exam scores are normally distributed with a mean of
80 and a standard deviation of 5. If Allan got 85 in the Math final exam, what is
its equivalent z-score?

Converting a Normal Random Variable to a Standard Normal Variable


If each value of a normal random variable x is converted to standard
deviation units, called the standard normal variable z, the result is the standard
normal distribution.

A normal random variable x in a normal distribution has a corresponding z value
(z-score) in the standard normal distribution. It shows how many standard deviations the
value of x is from the mean.
The following formula is used to convert any normal variable to its
corresponding z value on a standard normal distribution.

$z = \frac{x - \mu}{\sigma}$

where

z is the standard normal value,
x is the value of the normal random variable,
μ is the mean of the normal distribution, and
σ is the standard deviation of the normal distribution.

How to Do
Step 1: Find the known and unknown facts of the problem.

Allan’s exam score = 85
mean of the Math final exam scores = 80
standard deviation = 5

unknown: z

Step 2: Substitute the values into the formula and solve for the unknown.

$z = \frac{x - \mu}{\sigma}$
$z = \frac{85 - 80}{5}$
$z = \frac{5}{5} = 1$

If Allan got 85 in the Math final exam, the equivalent z-score is 1.

The problem can be illustrated using the figure below.

What does it mean if Allan got a z-score of one?

The z-score shows how many standard deviations from the mean Allan's score is. In
this example, Allan’s score is one standard deviation above the mean.

Try it!
Suppose you have a set of test scores that are normally distributed with mean equal
to 80 and standard deviation equal to 5. If Alexa got 75, what is her z-score?

Try it! Solution


Step 1: Find the known and unknown facts of the problem.

Alexa’s score = 75
mean of test scores = 80
standard deviation = 5

unknown: z

Step 2: Substitute the values into the formula and solve for the unknown.

$z = \frac{x - \mu}{\sigma}$
$z = \frac{75 - 80}{5}$
$z = \frac{-5}{5} = -1$

What does it mean to have a negative z-score?

It means that Alexa's score is one standard deviation below the mean.

Try it!
Math exam scores are normally distributed with a mean of 80 and a standard
deviation of 5. If Ana got a z-score of -2, what is her exam score?

Try it! Solution


Step 1: Find the known and unknown facts of the problem.
Ana’s z-score = –2
mean of the Math exam scores = 80
standard deviation = 5

unknown: x (Ana’s math score)


Step 2: Substitute the values to the formula and solve for the unknown.

$z = \frac{x - \mu}{\sigma}$
$-2 = \frac{x - 80}{5}$
$x = (-2)(5) + 80$
$x = -10 + 80$
$x = 70$
Thus, the score of Ana is 70.
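
The reverse conversion, from a z-score back to the original value, can be sketched the same way. The function name from_z_score is hypothetical and used only for illustration.

```python
def from_z_score(z, mu, sigma):
    """Return the value x that lies z standard deviations from the mean mu."""
    if sigma <= 0:
        raise ValueError("the standard deviation must be positive")
    return z * sigma + mu


# Ana's z-score of -2, with a mean of 80 and a standard deviation of 5
print(from_z_score(-2, 80, 5))
```

This is simply the z-score formula solved for x, so converting a value to a z-score and back returns the original value.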

What do you think?


Which would you prefer: a math score of 80 with equivalent z-score of 2 or a math
score of 83 with equivalent z-score of – 2?

Key Points

 Any value from the normal distribution can be converted into its corresponding
value on a standard normal distribution.
 A positive z-score indicates the number of standard deviations above the
mean, while a negative z-score indicates the number of standard deviations
below the mean.
 To find the z-score, use the formula $z = \frac{x - \mu}{\sigma}$.
 To find the normal value x, use the formula $x = z\sigma + \mu$.

Normal Distribution, Statistics and Probability


Computing Probabilities and Percentiles Using the Standard Normal Table


Objective
At the end of this lesson, you should be able to compute the probabilities and
percentiles using the standard normal table.

Before you proceed with this lesson, you should be able to recall the normal curve
and its properties.

A normal distribution is represented by a normal curve. The area under a
normal curve indicates probability. Thus, as the area becomes larger, the probability
also becomes greater. The graph of a normal distribution is a bell-shaped curve that
extends indefinitely in both directions.
The standard normal distribution is a normal distribution with a mean μ of zero and
standard deviation σ of one and has a total area under its normal curve of one.
This lesson will show you how to calculate the probability (area under the curve) and
percentiles of a standard normal distribution.

Learn about it!


Case 1: Left of any z value or “less than” the probability

Vehicles' speeds at McArthur Hi-way have a normal distribution with a mean of 65
mph and a standard deviation of 5 mph. What is the probability that a randomly
selected car is going 73 mph or less?

Case 2: Right of any z value or “greater than” the probability

The pulse rates of adult females have a normal distribution with a mean of 75
beats per minute (bpm) and a standard deviation of 8 bpm. Find the probability that a
randomly selected female has a pulse rate greater than 85 bpm.

Case 3: Between any two z values or “between” the probability

The mean for IQ scores is 100 and the standard deviation is 15. What proportion of
IQ scores falls between 100 and 130?

 When a random variable x is normally distributed, you can find the probability
that x will lie in an interval by calculating the area under the normal curve for
the interval.
 To transform an x value to a z-score, use the
formula $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$ or $z = \frac{x - \mu}{\sigma}$.
Note: Round the z-score to the nearest hundredth.

 The cumulative standard normal table (also known as the z-table) gives the
area (to four decimal places) under the standard normal curve to the left of
any z value from -3.49 to 3.49.
 For example, given a z-score 1.55, the area of this z-score can be found by
looking up 1.5 in the first column and 0.05 in the top row. Where the two lines
meet gives the area of the z-score 1.55 which is 0.9394.
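
When a printed z-table is not at hand, the same cumulative area can be computed directly. The sketch below uses the error function from Python's standard math module; the function name standard_normal_cdf is an assumption made for illustration.

```python
import math


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Same lookup as the 1.5 row and 0.05 column of the z-table
print(round(standard_normal_cdf(1.55), 4))  # 0.9394
```

Rounding to four decimal places mirrors the precision of the printed table.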
How to Do
Case 1: Left of any z value or “less than” the probability

Step 1: Draw the normal distribution curve and shade the area.

Step 2: Compute for the z-score for observation x.

$z = \frac{x - \mu}{\sigma}$
$z = \frac{73 - 65}{5}$
$z = 1.60$
Step 3: Find the corresponding area of the z-score.

Using the z-table, we can determine the proportion of the curve below 73 mph.
Because z = 1.60, look in the 1.6 row and the 0.00 column (1.6 plus 0.00 equals 1.60).

The cumulative probability for z = 1.60 is 0.9452.

Therefore, there is a 94.52% chance of randomly selecting a vehicle that is going 73
mph or less.

Case 2: Right of any z value or “greater than” the probability

Step 1: Draw the normal distribution curve and shade the area.

Step 2: Compute for the z-score for observation x.

$z = \frac{x - \mu}{\sigma}$
$z = \frac{85 - 75}{8}$
$z = 1.25$
Step 3: Find the corresponding area of the z-score.

Given that z=1.25, use the z-table to determine the cumulative probability.

The cumulative probability for z=1.25 is 0.8944 which is the proportion below a pulse
rate of 85 bpm.

To find the proportion above a pulse rate of 85, subtract the area from 1.

$1.0000 - 0.8944 = 0.1056$

The probability that a randomly selected female will have a pulse rate above 85 bpm
is 0.1056 or 10.56%.

Case 3: Between any two z values or “between” the probability

Step 1: Draw the normal distribution curve and shade the area.

Step 2: Compute for the z-score for observation x.


For an IQ of 100,

$z = \frac{100 - 100}{15}$
$z = 0$
For an IQ of 130,

$z = \frac{130 - 100}{15}$
$z = 2$
Step 3: Find the corresponding area of the z-score.

This problem is asking for the proportion of observations that fall between a z-score
of 0 and a z-score of 2. Using the z-table,

$P(z \le 0) = 0.5000$ and $P(z \le 2) = 0.9772$
$P(0 \le z \le 2) = P(z \le 2) - P(z \le 0) = 0.9772 - 0.5000 = 0.4772$
The proportion of IQ scores between 100 and 130 is 0.4772 or 47.72%.

Try it!
A mobile company's survey indicates that its employees keep their mobile phones an
average of 1.5 years before replacing them with new ones. The standard deviation is
0.25 year. The mobile phone users are randomly selected. Find the probability that
a user will keep his/her mobile phone for less than a year before replacing it with a
new one.

Try it! Solution

Step 1: Draw the normal distribution curve and shade the area.

Step 2: Compute for the z-score for observation x.

$z = \frac{x - \mu}{\sigma}$
$z = \frac{1 - 1.5}{0.25}$
$z = -2$
Step 3: Find the corresponding area of the z-score.

Using the z-table, because z = -2, look in the -2.0 row and the 0.00 column.
The cumulative probability for z = -2 is 0.0228.

Therefore, 2.28% of mobile phone users will keep their phone for less than a year
before they buy a new one.
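
The three cases worked in this lesson, and the mobile phone Try it! problem, can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the three helper names are hypothetical and chosen only to match the "less than", "greater than", and "between" cases.

```python
from statistics import NormalDist


def prob_less_than(x, mu, sigma):
    """Case 1: area to the left of x."""
    return NormalDist(mu, sigma).cdf(x)


def prob_greater_than(x, mu, sigma):
    """Case 2: area to the right of x, found by subtracting from one."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


def prob_between(a, b, mu, sigma):
    """Case 3: area between a and b, where a is less than b."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


print(round(prob_less_than(73, 65, 5), 4))        # vehicle speeds
print(round(prob_greater_than(85, 75, 8), 4))     # pulse rates
print(round(prob_between(100, 130, 100, 15), 4))  # IQ scores
print(round(prob_less_than(1, 1.5, 0.25), 4))     # mobile phone problem
```

NormalDist standardizes the value internally, so the z-score does not have to be computed first. Small differences from a table-based answer can appear because the table rounds z to two decimal places.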

Tips
 An x-value should be standardized first by using the
formula $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$.
 It is helpful to begin by sketching a normal distribution and shading in the
appropriate region.

Key Points
 If the probability asked for is that of a value greater than x, subtract the
cumulative probability of x from one.

 If the probability is between values x and y (with x less than y), subtract the
cumulative probability of x from the cumulative probability of y.

Sampling and Sampling Distributions, Statistics and Probability


Sampling Distribution


Objectives
At the end of this lesson, you should be able to:

 illustrate random sampling
 distinguish between parameter and statistic
 identify sampling distributions of statistics (sample mean)

Think about this! Imagine yourself conducting a study about a certain characteristic
of a very big population. Is it enough to just get one sample and make conclusions
about the population? Why? Why not? In this lesson, you will learn how taking more
random samples provides better data on the actual characteristic of the population.

Learn about it!


Consider the set of single-digit odd numbers {1, 3, 5, 7, 9}. List all possible samples
of size 2 and create a sampling distribution of the sample means.

Say you are interested in studying a particular population. You need to look into
certain parameters such as the population mean (μ) or standard deviation (σ) to
describe it. A parameter is a numerical value that summarizes the data of an entire
population. However, in reality, complete information about the population is rarely
accessible, so the exact value of a parameter usually cannot be obtained.

Sampling Distribution
To address this concern, a sample of the population is taken, typically using random
sampling, and a statistic is obtained. A statistic is a numerical value that summarizes
the sample data. From the sample data collected, statistics such as the sample mean (x̄)
or the sample standard deviation (s) can be computed to estimate the parameters of the
population. However, taking one or two samples is not enough to know whether the
statistic is close to the parameter of the population. Hence, repeated samples are
taken and the sampling distribution of the sample statistic is examined.

If repeated random samples of the same size (n) are taken from the same
population, the values of the sample statistics vary from sample to sample. This
is called sampling variability. The distribution of the values of these sample
statistics should be examined to see how closely they describe the parameters of the
population.

The sampling distribution of a statistic is the distribution of values of the statistic
for all possible samples of the same size from the same population.

How to Do
Note: The sample problem is of a finite population.

Step 1: List the required samples.

Step 2: Compute for the sample means.

Here, simply find the mean of the two numbers in each sample.

Step 3: Construct the sampling distribution of the sample mean.

There are several ways of presenting the sampling distribution: (a) tabular method
(b) graphical method.

a. Tabular form
b. Graphical form
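
The three steps above can be sketched in Python for the population {1, 3, 5, 7, 9}. The lesson does not state how the samples are drawn, so this sketch assumes sampling without replacement with order ignored, which gives 10 samples of size 2. The variable names are illustrative only.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

population = [1, 3, 5, 7, 9]

# Step 1: list the samples of size 2
# (assumption: without replacement, order ignored)
samples = list(combinations(population, 2))

# Step 2: compute the mean of each sample
sample_means = [fmean(sample) for sample in samples]

# Step 3: count how often each sample mean occurs
distribution = Counter(sample_means)

# Tabular form with a simple text bar as the graphical form
for mean in sorted(distribution):
    count = distribution[mean]
    print(f"{mean:>4}  {count}/{len(samples)}  {'*' * count}")
```

Under this assumption the mean of the 10 sample means is 5, the same as the mean of the population.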

Try it!
A five-sided die has been modified so that one side shows one dot, two sides show
two dots, and two sides show three dots. Construct a sampling distribution of the
sample means of 20 samples of size 3.

Try it! Solution


This is an example of a finite but quite a big population. You may choose which 20
samples you would like, but for illustration purposes, this particular set is chosen.

Step 1: List the required samples.

Step 2: Compute for the sample means.

Here, simply find the mean of the three numbers in each sample.

Step 3: Construct the sampling distribution of the sample mean.

a. Tabular form

b. Graphical form
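
A comparable set of 20 samples can be generated by simulation. The sketch below is not the particular set chosen in this solution: it draws its own 20 random samples of three rolls each, and the seed value is a hypothetical choice made only so that the run can be repeated.

```python
import random
from collections import Counter
from statistics import fmean

# Dots on each of the five sides of the modified die
faces = [1, 2, 2, 3, 3]

# Hypothetical seed, used only to make the simulation repeatable
rng = random.Random(2024)

# Step 1: draw 20 samples, each made of 3 rolls of the die
samples = [tuple(rng.choice(faces) for _ in range(3)) for _ in range(20)]

# Step 2: compute the mean of the three numbers in each sample
sample_means = [round(fmean(sample), 2) for sample in samples]

# Step 3: tabulate the sampling distribution of the sample mean
distribution = Counter(sample_means)
for mean in sorted(distribution):
    count = distribution[mean]
    print(f"{mean:>5}  {count}/20  {'*' * count}")
```

A different seed gives a different set of 20 samples, which is the sampling variability described in this lesson.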

What do you think?


Given that you can compute for the sample mean of repeated samples, will you be
able to approximate the value of the population mean?
Key Points

 A parameter is a numerical value that summarizes the data of an entire
population.
 A statistic is a numerical value that summarizes the sample data.
 Collecting more samples from the population to be able to form a sampling
distribution gives us a better idea of the population.



Sampling Distribution of the Sample Mean When Variance Is Known


Objective
At the end of the lesson, you should be able to define the sampling distribution of the
sample mean for normal population when the variance is known.

The sampling distribution of a statistic is defined to be the distribution of a statistic
when taken from a random sample of size n from the population. In other words, if all
the possible samples of size n from the population are taken and a particular statistic
from each sample is computed, then the sampling distribution of that statistic is
found.
For normal populations where the population mean and variance are known, the
sampling distribution of the sample mean can be defined with the help of these
parameters.

How can the sampling distribution of the sample mean of a normal population be
defined when the variance is known?

Learn about it!


To define the sampling distribution, the two most important things to know are the
mean and standard deviation.

In a normal population, the mean $\mu_M$ of the sampling distribution of the sample mean
is always equal to the population mean μ.
$\mu_M = \mu$
To compute the standard deviation $\sigma_M$ of the sampling distribution, the population
standard deviation σ and the sample size n are needed.
$\sigma_M = \frac{\sigma}{\sqrt{n}}$
The standard deviation of the sampling distribution is commonly called
the standard error. The standard error measures how accurately
the sample represents the population.
However, the mean and standard deviation of sampling distributions are only
considered to be true if the sample size is large enough. A sample size above 30 is
generally accepted in practice.

Examples
1). A group of 1000 students took an achievement test. The scores have a normal
distribution and the population mean and variance of the scores are 85 and 9,
respectively. Define the sampling distribution of the sample mean of the scores with
a sample size of 36. (Note: The standard deviation is the square root of the
variance.)

In this example, the population mean μ is equal to 85. Thus, $\mu_M$ is also equal to 85.
For the standard error of the mean, it can be seen that n = 36 and σ = 3. Solve for $\sigma_M$.
$\sigma_M = \frac{\sigma}{\sqrt{n}}$
$\sigma_M = \frac{3}{\sqrt{36}} = \frac{3}{6} = 0.5$
Therefore, the sampling distribution of the sample mean of the scores with a sample
size of 36 has a mean of 85 and standard deviation of 0.5.

2). The heights of men aged 20 to 30 in a city are normally distributed with a
mean of 67 inches and a variance of 16. Define the sampling distribution of the
sample mean with a sample size of 400.
In this example, the population mean μ is equal to 67. Thus, $\mu_M$ is also equal to 67.
For the standard error of the mean, it can be noticed that n = 400 and σ = 4. Solve
for $\sigma_M$.
$\sigma_M = \frac{\sigma}{\sqrt{n}}$
$\sigma_M = \frac{4}{\sqrt{400}} = \frac{4}{20} = 0.2$
Therefore, the sampling distribution of the sample mean of the heights with a sample
size of 400 has a mean of 67 and standard deviation of 0.2.
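
The two examples above follow the same two steps: take the square root of the variance, then divide by the square root of the sample size. A minimal Python sketch, with the hypothetical function name standard_error, is shown below.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    if n <= 0:
        raise ValueError("the sample size must be positive")
    return sigma / math.sqrt(n)


# Example 1: variance 9 (so sigma is 3) and a sample size of 36
print(standard_error(math.sqrt(9), 36))

# Example 2: variance 16 (so sigma is 4) and a sample size of 400
print(standard_error(math.sqrt(16), 400))
```

The function takes the standard deviation rather than the variance, so the square root of the variance is taken before the call.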

Explore
Consider the same data on the height of men aged 20 to 30 in a city which is
normally distributed with a mean of 67 inches and a variance of 16. The standard
error is 0.2 when the sample size is 400. What if the sample size is 900? What if it is
further increased to 1600? What does this say about the relationship between the
sample size and standard error?

Try it!
The average household income in a municipality is ₱14 000 with a standard
deviation of ₱400. Define the sampling distribution of the sample mean with a
sample size of 100 households if the population household income follows a normal
distribution.
Try it! Solution
In this example, the population mean μ is equal to 14 000. Thus, $\mu_M$ is also equal to
14 000.
For the standard error of the mean, it can be seen that n = 100 and σ = 400. Solve
for $\sigma_M$.
$\sigma_M = \frac{\sigma}{\sqrt{n}}$
$\sigma_M = \frac{400}{\sqrt{100}} = \frac{400}{10} = 40$
Therefore, the sampling distribution of the sample mean of household incomes with
a sample size of 100 has a mean of ₱14 000 and a standard deviation of ₱40.

What do you think?


A local government that has no population data on the average household
income in its municipality decided to conduct a survey with a sample of 400
households. Assuming that the data are normally distributed, how can they define
the sampling distribution of the sample mean if population parameters are not
available?

Key Points

 The mean of the sampling distribution of the sample mean is equal to the
population mean in a normal population.
 The standard deviation of the sampling distribution of the sample mean is
equal to the population standard deviation divided by the square root of the
sample size.
 The standard deviation of the sampling distribution of the sample mean is also
called the standard error.



Sampling Distribution of the Sample Mean When Variance Is Unknown


Objective
At the end of this lesson, you should be able to define the sampling distribution of the
sample mean for normal population when the variance is unknown.

The sampling distribution of a statistic is defined to be the distribution of a statistic
when taken from a random sample of size n from the population. In other words, if
you take all the possible samples of size n from the population and compute for a
particular statistic from each sample, then you get the sampling distribution of that
statistic.
For normal populations where the population mean and variance are known, the
sampling distribution of the sample mean can be defined with the help of these
parameters. However, in most real-life applications, the
parameters are usually unknown.

How can the sampling distribution of the sample mean of a normal population be
defined when the variance is unknown?

Learn about it!


To grasp how the mean of the sampling distribution of the sample mean behaves
when the population mean and variance are unknown, consider a small
population and solve for the sample mean of every possible sample drawn without
replacement.

To solve for the population mean, use the formula $\mu = \frac{\sum x}{N}$ where μ is the
population mean, $\sum x$ is the total of the data values, and N is the number of data
values in the population.
Five students took an exam on Statistics and Probability and got the following
scores: 5, 10, 17, 19, and 22. Find the mean of the sampling distribution of the
sample mean with a sample size of 3.
Solve for the population mean.

$\mu = \frac{\sum x}{N}$
$\mu = \frac{5 + 10 + 17 + 19 + 22}{5}$
$\mu = 14.6$
Now, solve for the sample mean of all possible combinations of samples of size 3.

Then, solve for the mean of the sample means using the
formula $\mu_M = \frac{\sum \bar{x}}{N}$ where $\mu_M$ is the mean of the sample means, $\sum \bar{x}$ is the
total of the sample means, and N is the number of sample means (here, the 10 possible
samples).
$\mu_M = \frac{10.67 + 11.33 + 12.33 + 13.67 + 14.67 + 15.33 + 15.33 + 16.33 + 17 + 19.33}{10}$
$\mu_M = 14.6$
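
The listing of all possible samples and their means can be reproduced with a short Python sketch. It uses the five exam scores from this example and draws every sample of size 3 without replacement; the variable names are illustrative only.

```python
from itertools import combinations
from statistics import fmean

scores = [5, 10, 17, 19, 22]

# Population mean of the five scores
population_mean = fmean(scores)

# Every possible sample of size 3, drawn without replacement
samples = list(combinations(scores, 3))
sample_means = [fmean(sample) for sample in samples]

for sample, mean in zip(samples, sample_means):
    print(sample, round(mean, 2))

# Mean of the sample means
mean_of_sample_means = fmean(sample_means)
print(round(population_mean, 2), round(mean_of_sample_means, 2))
```

There are 10 samples because there are 10 ways to choose 3 scores out of 5.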
In this example, the mean of the sampling distribution of the sample mean is equal to
the population mean. This is not a coincidence: when every possible sample of a given
size is included, the mean of the sample means always equals the population mean. The
mean of any single sample, however, is only approximately equal to the population mean.
However, if the population is very large, it is not practical to list all the possible
combinations of samples of size n. In most cases, the mean of the sampling
distribution of the sample mean must be estimated using a single sample. In these
cases, a simple assumption can be made based on the results of the single sample.
In the example, notice that $\mu_M = 14.6$ lies at the center of the sampling distribution.
Thus, the mean from a single sample can be taken as an estimate of the mean of the
sample means and of the population mean.
Moreover, the same assumption can be made about the standard deviation of the
sampling distribution of the sample mean and the population standard deviation
using a single sample. Thus, using the sample standard deviation
in place of the population standard deviation, $\sigma_M$ (the standard deviation of the
sample means) can be estimated as $\frac{s}{\sqrt{n}}$ where s is the standard deviation of
the sample and n is the sample size. Remember that in solving for the standard
deviation of the sample, the divisor used is not n but n − 1.
Note that these assumptions only work for large sample sizes. A sample size of at
least 30 is considered large enough in practice.

Example
The heights of a sample of 100 children aged 6 to 8 in a rural area are measured.
Their average height is 50 inches with a standard deviation of 14 inches. Estimate
the mean and standard deviation of the sampling distribution of the sample mean.

In this example, no parameters were given, so an assumption about the sampling
distribution of the sample mean can be made using the data of the single sample of
100 children.

Thus, the mean of the sampling distribution of the sample mean is also 50 inches.

It is given that s = 14 and n = 100. Compute for the standard error of the mean using
the formula.
$\sigma_M = \frac{s}{\sqrt{n}}$
$\sigma_M = \frac{14}{\sqrt{100}}$
$\sigma_M = 1.4$
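
The estimate above can be sketched in Python in two forms: one that starts from a given sample standard deviation, as in this example, and one that starts from raw sample data. Both function names are hypothetical. The second form relies on statistics.stdev from the standard library, which divides by n − 1.

```python
import math
from statistics import stdev


def estimated_standard_error(s, n):
    """Estimate the standard error from a sample standard deviation s."""
    if n < 2:
        raise ValueError("at least two observations are needed")
    return s / math.sqrt(n)


def standard_error_from_data(data):
    """Estimate the standard error directly from raw sample data."""
    # stdev() is the sample standard deviation (divisor n - 1)
    return estimated_standard_error(stdev(data), len(data))


# The children's heights example: s = 14 inches and n = 100
print(estimated_standard_error(14, 100))
```

The raw-data form is the one needed in practice, because a survey usually yields individual measurements rather than a ready-made standard deviation.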

Explore
Data from a sample of 30 social media users were collected to determine the
average time a person spends visiting social media every day. The average time
spent on social media every day by 29 of the 30 respondents is 25 minutes.
The remaining respondent, who admits to being a social media addict, spends
408 minutes every day. How would this value affect the sampling distribution of the
sample mean?

Try it!
The average weight of women aged 40 to 50 who are working in a business district
is computed using a sample size of 60. Their average weight is 62 kg with a
standard deviation of 15 kg. Estimate the mean and standard deviation of the
sampling distribution of the sample mean.

Try it! Solution


No parameters were given so an assumption about the sampling distribution of the
sample mean can be made using the data of the single sample of 60 women.

Thus, the mean of the sampling distribution of the sample mean is also 62 kg.

It is given that s = 15 and n = 60. Compute for the standard error of the mean using the
formula.
$\sigma_M = \frac{s}{\sqrt{n}}$
$\sigma_M = \frac{15}{\sqrt{60}}$
$\sigma_M \approx 1.94$

What do you think?


The sampling distribution of the mean is normally distributed if the population follows
a normal distribution and the sample size is large. What if the population distribution
is not normal but the sample size n is very large?

Key Points

 If no parameters are available, the mean of the sampling distribution of the
sample mean is estimated by the mean of a single sample, and its standard
deviation is estimated by $\frac{s}{\sqrt{n}}$, given that the sample size is large enough.
 The assumption that the mean of a single sample is approximately equal to
the mean of the sampling distribution of the sample mean and the population
mean is based on probability, given that the population follows a normal
distribution.
 The assumptions regarding the sampling distribution of the sample mean of a
normally distributed population are reliable only if the sample size is large
enough. Whether the sample size is large enough depends on the nature of
the data and the judgment of the researcher or statistician. A
sample size of at least 30 is considered large enough in most cases.



The Central Limit Theorem


Objective
At the end of this lesson, you should be able to understand how a sampling
distribution of the mean of large sample sizes approaches a normal distribution.
In this lesson, you need to remember the properties of a sampling distribution and
how to get the mean and variance of the sampling distribution of sample means.

To get the standard deviation of the population, use the following formula:

$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
where:

N is the population size
$x_i$ is the i-th value in the population
μ is the population mean

Learn about it!


The Central Limit Theorem states that the sampling distribution of the mean
approximates a normal distribution with a mean of μ and a standard deviation of $\frac{\sigma}{\sqrt{n}}$ if
the sample size n of the random samples is large enough.
This means that as we take more samples with large sample sizes, the
sampling distribution obtained will closely resemble a normal distribution.
But how large should the sample size be?

Statisticians differ in opinion as to which sample size to take. Some would suggest
taking a sample size of at least 30 to get a close approximation of a normal
distribution. Others would suggest a sample size as large as 50 or even more. This
happens when the parent population sampled does not appear to be normal.

When a population is already normally distributed, the sampling distribution of the
mean is normal as well. In cases where the parent population does not show normality,
the central limit theorem guarantees that the distribution of sample means approaches
normality given that large sample sizes are taken.

Example
To illustrate the Central Limit Theorem, take the population that contains five
numbers 1, 2, 3, 4, and 5 and consider the samples of size two.

The histogram below shows the distribution (N=1) of the population.

The following are the population parameters:

a. Mean

$\mu = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
b. Standard Deviation

$\sigma = \sqrt{\frac{1}{5}\sum_{i=1}^{5}(x_i - 3)^2} \approx 1.414$
Sampling distribution of sample means of sample size n=2
The histogram below shows the distribution (n=2) of the sampling distribution of the
sample means.

Summary:

1. The mean of the sampling distribution equals the mean of the population,
i.e., $\mu = \mu_{\bar{x}} = 3$.
2. The standard error of the mean for the sampling distribution is $\sigma_{\bar{x}} = 1$.
3. The histogram of the sampling distribution very strongly suggests normality.
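
The figures in this summary can be reproduced by listing every sample and computing its mean. The sketch below assumes that samples are drawn with replacement and that order matters, which gives 25 samples of size two; the lesson does not state this, but it is the assumption under which the standard error comes out to 1. The function name sampling_distribution is hypothetical.

```python
from itertools import product
from statistics import fmean, pstdev

population = [1, 2, 3, 4, 5]


def sampling_distribution(values, n):
    """Means of every ordered sample of size n drawn with replacement."""
    return [fmean(sample) for sample in product(values, repeat=n)]


means = sampling_distribution(population, 2)

# Population mean and population standard deviation
print(fmean(population), round(pstdev(population), 3))

# Mean and standard deviation of the 25 sample means
print(round(fmean(means), 3), round(pstdev(means), 3))
```

The same function accepts other sample sizes, so the effect of a larger n on the spread of the sample means can be explored by changing the second argument.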

Try it!
Test the normality of a sampling distribution of sample means for 30 random
samples with sample size n=3 from the population that contains the five numbers
1, 2, 3, 4, and 5.

Try it! Solution


The population parameters are μ = 3 and σ = 1.414.
The sampling distribution statistics (n = 3) are the following:
a. mean
$\mu = \mu_{\bar{x}} = 3$
b. standard deviation
$\sigma_{\bar{x}} = \frac{1.414}{\sqrt{3}} \approx 0.816$
The histogram below shows the distribution (n=3) of the sampling distribution of the
sample means. The histogram also suggests normality.

What do you think?


How does the sample size of random samples affect the variability of the
distribution?
Key Point
The Central Limit Theorem guarantees that the sampling distribution of the mean
approximates a normal distribution given that the sample size of the random samples
taken is large enough.
