
02.

1 Review on Probability
and Sampling Distributions
ANNA MARIA LOURDES S. LATONIO, Ph.D.
BA 705
2nd Semester, 2020-2021

CENTRAL LUZON STATE UNIVERSITY


Parameters of a Probability Distribution
A group or family of probability distributions refers to a
collection of probability distributions that is indexed by a quantity
called a parameter. A parameter is a numerical characteristic of a
probability distribution.

Also, the parameter(s) of the population is (are) the
parameter(s) of the distribution of the population.

Very often, observations generated by different random
experiments behave similarly, and as a result, the random
variables associated with these experiments can be expressed
essentially by the same probability distribution and therefore can
be represented by a common formula.
Special Probability Distributions
Perhaps you were introduced to the following Discrete Probability
Distributions:

1. Discrete Uniform UD(a, b)
2. Binomial B(n, p)
3. Poisson G(λ)
4. Hypergeometric H(N, D, n)

and Continuous Probability Distributions such as:

1. Continuous Uniform UC(a, b)
2. Normal N(μ, σ²)
3. Standard Normal N(0, 1)
4. Student's t t(v)
For this chapter, we will review some Continuous Probability
distributions to have a better appreciation of the different tests of
hypothesis and significance that will be later discussed in the
course.

The following continuous probability distributions will be
described:
1. Normal N(μ, σ²)
2. Standard Normal N(0, 1)
3. Student's t t(v)
4. F Distribution F(v1, v2)
5. Chi-square Distribution χ²(k)
The Normal Distribution
One of the most widely used continuous probability
distributions is the normal probability distribution.
Many variables are approximately normally distributed
and therefore can be represented by the normal
distribution.
The graph of this distribution has the following characteristics:
1. It is bell-shaped and has a single peak at the center of the
distribution.
2. The mean, median, and mode are at the center of the
distribution.
3. It is symmetric about the mean.
4. It is a continuous curve that is asymptotic to the x-axis
(it approaches but never touches the axis).
5. The total area under the curve is equal to 1 or 100%.
6. The position of the normal distribution on the x-axis is
determined by the mean μ, and the spread of the
distribution is determined by the standard deviation σ.

Since the normal distribution is very often used, a
special notation is used for it.
If the random variable X is normally distributed, with
mean μ and variance σ², it is written as
X ~ N(μ, σ²)
The parameters of the normal distribution are
μ and σ².

Example. Suppose the normal population consisting of
the production X of all rice farmers in 2020 has a mean of 105
cav/ha and a variance of 400 (cav/ha)². Then the parameters of
the distribution of the population are μ = 105 and σ² = 400.
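
The rice example can be explored with a short Python sketch. It assumes the standard-library class statistics.NormalDist, which takes the standard deviation rather than the variance; the variable name yield_dist is only illustrative and is not part of the original slides.

```python
from statistics import NormalDist

# Rice-yield example from the text: mean 105 cav/ha, variance 400 (cav/ha)^2.
# NormalDist expects the standard deviation, so sigma = sqrt(400) = 20.
yield_dist = NormalDist(mu=105, sigma=400 ** 0.5)

print("mean     :", yield_dist.mean)
print("variance :", yield_dist.variance)

# Symmetry about the mean: the density is the same 15 cav/ha below and above 105.
print("density at 90 and 120:", yield_dist.pdf(90), yield_dist.pdf(120))

# Half of the total area lies below the mean (the mean is also the median).
print("area below the mean  :", yield_dist.cdf(105))
```

Once the two parameters are fixed, every other feature of the curve (its center, spread, and areas) follows from them.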
Below are some examples of graphs of normal distributions.
These graphs are called normal curves.
Figure 1 is a sketch of two normal curves having the same
variances but different means. The two curves have the same
form but are located at different positions along the
horizontal axis.

Figure 1. Normal curves with μ1 ≠ μ2 and σ1² = σ2²

In Figure 2 are two normal curves with the same mean but different
variances, thus they are centered at the same position on the
horizontal axis but have different forms. The normal curve with the
larger variance is lower and has a wider spread.

Figure 2. Normal curves with μ1 = μ2 and σ1² ≠ σ2²

Figure 3 sketches two normal curves with different means and
different variances.

Figure 3. Normal curves with μ1 ≠ μ2 and σ1² ≠ σ2²


Areas under the Normal Curve
The graph of any continuous probability distribution may be
constructed so that the areas under the curve, bounded by the lines
at X = x1 and X = x2 may be obtained. This area is actually equal to
the probability that the random variable will assume a value
between x1 and x2. Thus for the normal curve in Figure 4, the
shaded area represents the probability
P(x1 < X < x2).

Figure 4. Shaded area is equal to P(x1 < X < x2).


The Area under the Normal Curve
Let X ~ N(μ, σ²).

Properties:
1. The curve is symmetric about a vertical axis
through the mean μ.
2. The median and the mode occur at X = μ.
3. The total area under the curve and above the
horizontal axis is equal to 1.
4. The normal curve approaches the horizontal axis
asymptotically as we proceed in either direction
away from the mean.
Once the parameters μ and σ² are specified, the graph
of the probability density function N(μ, σ²) is completely
determined. If a table of probabilities is available for the
normal distribution under study, it would be easier to
make use of this table than to use integral calculus.
However, it would be a very tedious task to attempt to
set up separate tables or curves for every conceivable
pair of μ and σ².
Fortunately, every normal random variable X may be
transformed to a new set of observations Z that is also
normally distributed but with mean 0 and
variance equal to 1.
Note:
For continuous random variables,
P(X = c) = 0
Thus, P(X ≤ c) = P(X < c)
P(X ≥ c) = P(X > c)
P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)
where a, b, and c are real numbers.
The Standard Normal Distribution
The distribution of a normal random variable Z
with mean 0 and variance equal to 1 is
called the standard normal distribution.

Z is called the standard normal random
variable or standard score.

In symbols, Z ~ N(0, 1)
Finding Areas under the
Standard Normal Curve
Let Z ~ N(0, 1). Using Table A1, find the
following probabilities:
1. P(0 < Z < 1.27)
2. P(Z > 1.61)
3. P(|Z| < 1.57)
4. P(|Z| > 1.96)
Table A1. Areas under the
Standard Normal Curve
1. P(0 < Z < 1.27)
= 0.3980
The probability that Z will assume a
value between 0 and 1.27 is equal to
0.3980.
2. P(Z > 1.61)
= P(Z > 0) − P(0 < Z < 1.61)
= 0.5000 − 0.4463
= 0.0537
The probability that Z will assume a value
greater than 1.61 is equal to 0.0537.
3. P(|Z| < 1.57)
= P(−1.57 < Z < 1.57)
= 2[P(0 < Z < 1.57)], since by symmetry
P(−1.57 < Z < 0) = P(0 < Z < 1.57) = 0.4418
= 2[0.4418] = 0.8836
The probability that Z will assume an absolute
value less than 1.57 is equal to 0.8836.

ASLatonio 1st Semester 2020-2021


4. P(|Z| > 1.96)
= P(Z < −1.96) + P(Z > 1.96)
= 2[P(Z > 1.96)], since by symmetry
P(Z < −1.96) = P(Z > 1.96)
= 2[0.5 − P(0 < Z < 1.96)]
= 2[0.5 − 0.4750] = 2[0.025] = 0.05
The probability that Z will assume an
absolute value greater than 1.96
is equal to 0.05.
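
The four table look-ups can also be checked numerically. The sketch below assumes Python's standard-library statistics.NormalDist in place of Table A1; its cdf(z) gives the area to the left of z, so each probability is rebuilt from left-tail areas. The names p1 to p4 are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# cdf(z) is the area to the left of z, i.e. P(Z < z)
p1 = Z.cdf(1.27) - Z.cdf(0)       # 1. P(0 < Z < 1.27)
p2 = 1 - Z.cdf(1.61)              # 2. P(Z > 1.61)
p3 = Z.cdf(1.57) - Z.cdf(-1.57)   # 3. P(|Z| < 1.57)
p4 = 2 * (1 - Z.cdf(1.96))        # 4. P(|Z| > 1.96), using symmetry

for label, p in [("P(0 < Z < 1.27)", p1), ("P(Z > 1.61)", p2),
                 ("P(|Z| < 1.57)", p3), ("P(|Z| > 1.96)", p4)]:
    print(f"{label} = {p:.4f}")
```

Small differences in the fourth decimal place can occur because the table values are rounded before they are added or doubled.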



The Z-transformation
Any value of a normal random variable θ with
mean μ_θ and variance σ_θ² may be transformed to
its standard score or standard normal value Z
using the formula

$$Z = \frac{\theta - \mu_\theta}{\sigma_\theta}$$

For X ~ N(μ, σ²):

$$Z = \frac{X - \mu_X}{\sigma_X} = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
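
As a small illustration, the transformation can be written as a one-line function. The name z_score is hypothetical (it does not come from the slides), and the numbers reuse the rice-yield population with mean 105 and standard deviation 20.

```python
def z_score(x, mu, sigma):
    """Standard score of x for a variable with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Rice-yield population: mu = 105 cav/ha, sigma = 20 cav/ha
for x in (85, 105, 125):
    print(f"x = {x} cav/ha  ->  z = {z_score(x, 105, 20)}")
```

A yield one standard deviation above the mean maps to z = 1, the mean itself maps to z = 0, and so on.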
Finding probabilities for a normal random variable X
using Z-transformation
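Given X ~ N(μ, σ²)
To determine P(x1 < X < x2), solve for

$$z_1 = \frac{x_1 - \mu}{\sigma} \quad \text{and} \quad z_2 = \frac{x_2 - \mu}{\sigma}$$

Then P(x1 < X < x2) = P(z1 < Z < z2)

Find P(z1 < Z < z2) using Table A1.

[Figure: the points x1, μ, x2 on the X axis correspond to z1, 0, z2 on the Z axis.]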
Let X be a normal random variable with mean 5
and variance 16.

Using the Z-transformation Z = (X − μ_X) / σ_X, find:
(a) P(5 < X < 10)
(b) P(X > 12)
(a) Let X be a normal random variable with mean 5 and variance 16.
Based on the given, X ~ N(5, 16)
mean μ = 5
variance σ² = 16
standard deviation σ = √σ² = √16 = 4

We are required to find P(5 < X < 10).

$$P(x_1 < X < x_2) = P\left(\frac{x_1 - \mu}{\sigma} < Z < \frac{x_2 - \mu}{\sigma}\right) = P(z_1 < Z < z_2)$$

$$P(5 < X < 10) = P\left(\frac{5 - 5}{4} < Z < \frac{10 - 5}{4}\right) = P(0 < Z < 1.25) = 0.3944$$
(b) Let X be a normal random variable with mean 5 and variance 16.
Based on the given, X ~ N(5, 16)
mean μ = 5
variance σ² = 16
standard deviation σ = √σ² = √16 = 4

We are required to find P(X > 12).

$$P(X > 12) = P\left(Z > \frac{12 - 5}{4}\right) = P(Z > 1.75)$$

= 0.5 − P(0 < Z < 1.75)
= 0.5 − 0.4599 = 0.0401
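
Both answers can be cross-checked in two ways: directly on X, and through the Z-transformation. This is a sketch that assumes Python's statistics.NormalDist as a stand-in for Table A1; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=4)   # variance 16 means standard deviation 4
Z = NormalDist()                # standard normal

# (a) P(5 < X < 10): directly, and via z1 = (5 - 5)/4 and z2 = (10 - 5)/4
p_a_direct = X.cdf(10) - X.cdf(5)
z1, z2 = (5 - 5) / 4, (10 - 5) / 4
p_a_via_z = Z.cdf(z2) - Z.cdf(z1)

# (b) P(X > 12): directly, and via z = (12 - 5)/4
p_b_direct = 1 - X.cdf(12)
p_b_via_z = 1 - Z.cdf((12 - 5) / 4)

print(f"(a) direct {p_a_direct:.4f}, via Z {p_a_via_z:.4f}")
print(f"(b) direct {p_b_direct:.4f}, via Z {p_b_via_z:.4f}")
```

The two routes agree because standardizing X does not change the area being measured; it only relabels the horizontal axis.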
Sampling Distribution of a Statistic

The distribution of a statistic is called the
sampling distribution of the statistic.

What again is a statistic?


The Central Limit Theorem
If random samples of size n are drawn
from a large or infinite population
with mean μ and variance σ², then the sampling
distribution of the sample mean X̄ is
approximately normal with
mean μ_X̄ = μ and variance σ_X̄² = σ²/n, provided
that n is large (n > 30).

Note: In some references, it is stated as n ≥ 30.


The Central Limit Theorem
The theorem describes the distribution of the sample mean X̄
when the sample size n > 30.

The Central Limit Theorem says nothing about the
form of the original or parent population from which the
random sample was drawn. Provided that the
population has a mean μ and variance σ², and the
sample size is large, the sample mean will have
approximately the normal distribution. In symbols, this
can be written as

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
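
A small simulation can make the theorem concrete. The sketch below is only an illustration under stated assumptions: the parent population is taken to be exponential with rate 1 (mean 1, variance 1, clearly not normal), the sample size is n = 36, and the seed and number of samples are arbitrary choices, none of which come from the slides.

```python
import random
from statistics import fmean, pvariance

random.seed(705)  # arbitrary seed so the run is repeatable

n = 36                  # sample size (large: n > 30)
num_samples = 5000      # number of random samples drawn
mu, sigma2 = 1.0, 1.0   # exponential population with rate 1: mean 1, variance 1

# Draw many samples from a skewed population and keep each sample mean.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

# If the sample mean is roughly normal, about 95% of the sample means
# should fall within 1.96 standard errors of mu.
standard_error = (sigma2 / n) ** 0.5
share_within = sum(abs(m - mu) < 1.96 * standard_error
                   for m in sample_means) / num_samples

print(f"mean of sample means     : {mean_of_means:.4f}  (theory: {mu})")
print(f"variance of sample means : {var_of_means:.4f}  (theory: {sigma2 / n:.4f})")
print(f"share within 1.96 SE     : {share_within:.3f}  (normal theory: 0.95)")
```

The simulated values will not match the theoretical ones exactly, but they should be close, even though the parent population is far from bell-shaped.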
[Figure: The Central Limit Theorem and the sampling distribution of X̄. Samples of size n > 30 are drawn from a population of X values with mean μ and variance σ², and the sample mean X̄ is computed from each sample.]
By the Central Limit Theorem
The distribution of the sample mean X̄ when the
sample size n > 30 is

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

and by the Z-transformation we have the statistic
Z given by

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where Z is a standard normal random variable.
(Recall the Z-transformation: Z = (θ − μ_θ) / σ_θ.)
The statistic Z = (X̄ − μ) / (σ/√n) is used in statistical inference
regarding the value of the population mean.
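
The statistic can be sketched in a few lines of Python. The function name z_statistic and the numbers used (a sample of n = 36 farms with sample mean 110 cav/ha, from a population with μ = 105 and σ = 20) are hypothetical values chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu, sigma, n):
    """Z = (sample mean - mu) / (sigma / sqrt(n)) for a sample of size n."""
    return (sample_mean - mu) / (sigma / sqrt(n))

# Hypothetical numbers: population mu = 105, sigma = 20; sample of 36 farms
z = z_statistic(sample_mean=110, mu=105, sigma=20, n=36)

# Area to the right of z under the standard normal curve
upper_tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(Z > z) = {upper_tail:.4f}")
```

Note that the divisor is the standard deviation of the sample mean, σ/√n, not the population standard deviation σ.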
Sampling from the
Normal Population
The normal distribution plays a very important role in
statistics. In the succeeding slides are sampling
distributions of some statistics from observed values of
random samples from normal populations. These
sampling distributions are used for inferences concerning
the mean and variance of a population, and because of
their importance in statistical inference, their
distributions have been tabulated extensively, that is,
statistical tables on these sampling distributions are
available in statistics books. Important values related to
these variables can be easily obtained using Excel.
Student’s t Distribution
If X̄ and S² are the sample mean and sample variance,
respectively, of a random sample of size n taken from a
population that is normally distributed with mean μ and
variance σ², then

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$

is a value of a random variable t having the t-distribution
with v = n − 1 degrees of freedom.
An important use of the t-distribution concerns inferences about the
value of the population mean.
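
The t statistic differs from the Z statistic only in using the sample standard deviation S in place of σ. A minimal sketch follows; the function name t_statistic and the nine yields are hypothetical, and statistics.stdev is used because it divides by n − 1, matching the sample variance S².

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """Return (t, v): t = (xbar - mu) / (S / sqrt(n)) and v = n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    t = (mean(sample) - mu) / (stdev(sample) / sqrt(n))  # stdev divides by n - 1
    return t, n - 1

# Hypothetical yields (cav/ha) of nine farms, compared with mu = 105
yields = [102, 98, 110, 105, 95, 108, 112, 100, 104]
t, v = t_statistic(yields, mu=105)
print(f"t = {t:.3f} with v = {v} degrees of freedom")
```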
Like the normal distribution, the t-distribution is also
symmetric.

The critical value t_α(v) is the value with an area of α to its
right under the t-distribution with v degrees of freedom:

P(t_v > t_α(v)) = α.
Table A2.
Critical values of the
Student's t Distribution

P(t > t_α(v)) = α

t_0.01(16) = 2.583
P(t > 2.583) = 0.01

t_0.005(7) = 3.499
P(t > 3.499) = 0.005

t_0.05(10) = 1.812
P(t > 1.812) = 0.05
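
The meaning of a tabled critical value can be checked by simulation. The sketch below assumes a standard normal population and samples of size n = 11 (so v = 10); it estimates how often the t statistic exceeds 1.812. The seed and number of trials are arbitrary, and the result is an estimate, not an exact value.

```python
import random
from math import sqrt

random.seed(2021)   # arbitrary seed so the run is repeatable

v = 10              # degrees of freedom
n = v + 1           # sample size, since v = n - 1
critical = 1.812    # t_0.05(10) from Table A2
trials = 20000

exceed = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]   # normal population, mu = 0
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)   # sample variance S^2
    t = (xbar - 0) / sqrt(s2 / n)
    if t > critical:
        exceed += 1

tail = exceed / trials
print(f"estimated P(t > {critical}) = {tail:.4f}  (table: 0.05)")
```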
The F Distribution
If S1² and S2² are the sample variances of independent
random samples of size n1 and n2 taken from normal
populations with variances σ1² and σ2², respectively, then for
S1² > S2²,

$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$$

is a value of a random variable F having the F distribution
with v1 = n1 − 1 and v2 = n2 − 1 degrees of freedom.

An important application of the F-distribution is inference
on the equality of the variances of two populations.
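
When the two population variances are assumed equal, σ1² and σ2² cancel and F reduces to the ratio of the sample variances. The sketch below works under that assumption only; the function name f_statistic and the data are hypothetical. The larger sample variance is placed in the numerator, as in the statement above.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Return (F, v1, v2) assuming equal population variances.

    F is the larger sample variance over the smaller one; v1 and v2 are the
    degrees of freedom of the numerator and denominator samples.
    """
    var_a, var_b = variance(sample_a), variance(sample_b)  # divide by n - 1
    if min(var_a, var_b) == 0:
        raise ValueError("sample variances must be positive")
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical data for two groups
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 4, 6, 8, 10, 12]
F, v1, v2 = f_statistic(group_1, group_2)
print(f"F = {F:.3f} with v1 = {v1} and v2 = {v2} degrees of freedom")
```

Because group_2 has the larger sample variance, it supplies the numerator and its degrees of freedom come first.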
The critical value F_α(v1, v2) has an area of α to its right
under the F distribution with v1 and v2 degrees of freedom:

P(F_{v1,v2} > F_α(v1, v2)) = α.
Table A3.
Critical values of F for α = 0.05

P(F > F_α(v1, v2)) = α

F_0.05(5, 8) = 3.687
P(F > 3.687) = 0.05

F_0.05(7, 12) = 2.913
P(F > 2.913) = 0.05

F_0.05(5, 3) = 9.013
P(F > 9.013) = 0.05
The Chi-square Distribution
If S² is the sample variance of a random sample of size
n taken from a normal population having the variance
σ², then

$$Q = \frac{(n - 1) S^2}{\sigma^2}$$

is a value of a random variable having the chi-square
distribution with k = n − 1 degrees of freedom.

An important use of the chi-square distribution concerns a statistic
used in inference regarding the value of the population variance.
The critical value χ²_α(k) has an area of α to its right
under the chi-square distribution with k degrees of freedom:

P(Q > χ²_α(k)) = α.
Table A4.
Critical values of the Chi-square Distribution

P(Q > χ²_α(k)) = α

χ²_0.05(6) = 12.592
P(Q > 12.592) = 0.05

χ²_0.01(13) = 27.688
P(Q > 27.688) = 0.01

χ²_0.025(14) = 26.119
P(Q > 26.119) = 0.025
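
The statistic Q = (n − 1)S²/σ² and one tabled value can be illustrated together. The sketch below assumes a normal population with mean 10 and variance 4 (hypothetical values) and samples of size n = 7, so k = 6; it estimates how often Q exceeds 12.592. The function name q_statistic, the seed, and the number of trials are illustrative choices.

```python
import random

def q_statistic(sample, sigma2):
    """Q = (n - 1) * S^2 / sigma^2, where S^2 is the sample variance."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (n - 1) * s2 / sigma2

random.seed(14)      # arbitrary seed so the run is repeatable

k = 6                # degrees of freedom
n = k + 1            # sample size, since k = n - 1
sigma2 = 4.0         # assumed population variance
critical = 12.592    # chi-square critical value for alpha = 0.05, k = 6
trials = 20000

exceed = sum(
    q_statistic([random.gauss(10, sigma2 ** 0.5) for _ in range(n)], sigma2) > critical
    for _ in range(trials)
)
tail = exceed / trials
print(f"estimated P(Q > {critical}) = {tail:.4f}  (table: 0.05)")
```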
Mazel Tov ☺
