

CHAPTER 4
PROBABILITY AND SAMPLING
DISTRIBUTIONS
Outline of the Subject

Chapter 1: Introduction to Applied Probability and Statistics
Chapter 2: Numerical Summary Measures

Chapter 3: Bivariate and Multivariate Data Distributions

Chapter 4: Probability and Sampling Distributions

Chapter 5: Estimation and Statistical Intervals


Chapter 6: Testing Statistical Hypotheses
Outline of Chapter 4

4.1 Chance Experiments and Probability Concepts


4.2 Independence

4.3 Random Variables

4.4 Sampling Distributions

4.5 Describing Sampling Distributions and the Central Limit Theorem
4.1 Chance Experiments and Probability Concepts
à Chance Experiments

• A chance experiment, also called a random experiment, is simply an activity or situation whose outcomes, to some degree, depend on chance.
• To decide whether a given activity qualifies as a chance experiment, ask:
Ø Would you get exactly the same result if you repeated the experiment more than once?
Ø If the answer is “no,” then the activity qualifies as a chance experiment.
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Events

• Events are outcomes of chance experiments. These outcomes can be divided into two types:
Ø (1) simple events, which are the individual outcomes of an experiment;
Ø (2) events, which consist of collections of simple events.

q For instance, the chance experiment of conducting a series of stress tests on three
metal parts has the eight possible outcomes PPP, PPF, PFP, FPP, PFF, FPF, FFP,
and FFF, where P and F denote the test results “pass” and “fail,” and the order in
which the letters appear corresponds to the part number tested.

• Each of these outcomes is a simple event; taken together, they form the sample space of the experiment.
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Events

• Events are often denoted by single uppercase letters.
• Events can also be described by just listing, in braces, the simple events that comprise them.
For example:
Ø The event that at least two parts pass the stress test corresponds to
the set of outcomes {PPP, PPF, PFP, FPP}.
Ø If we had also chosen to denote this event by the letter A, then we
could also write A = {PPP, PPF, PFP, FPP}.
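These listings can be generated mechanically. A minimal Python sketch (only the pass/fail labels and the “at least two pass” condition come from the example above; everything else is illustrative):

```python
from itertools import product

# Sample space for stress-testing three metal parts: all sequences of P (pass) / F (fail)
sample_space = ["".join(outcome) for outcome in product("PF", repeat=3)]
print(sample_space)   # ['PPP', 'PPF', 'PFP', 'PFF', 'FPP', 'FPF', 'FFP', 'FFF']

# Event A: at least two parts pass the stress test
A = {s for s in sample_space if s.count("P") >= 2}
print(A)              # {'PPP', 'PPF', 'PFP', 'FPP'} (set order may vary)
```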
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Depicting Events

• Tree diagrams are especially useful for depicting experiments that


are conducted in a sequence of steps.

Tree diagram for the experiment of selecting and testing three metal parts (branches forming the simple event PPF are shown shaded)
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Depicting Events

• The Venn diagram is especially useful for depicting relationships between events.
• Venn diagrams are simple two-dimensional figures, often rectangles or circles, whose enclosed regions are intended to depict a collection of simple events, called points, in a sample space.
E.g.: Event A = at least two parts pass the test contains all of the simple events {PPP, PPF, PFP, FPP}.
Event C = exactly three parts pass the test is contained in A, so C is shown inside of A.
Event B = at most two parts pass the test.
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Forming New Events

• One of the primary methods for creating complex events and,


therefore, for unraveling them, involves the use of the words and, or,
and not.
DEFINITIONS

For a chance experiment and any two events A and B:


1. The event A or B consists of all simple events that are contained in
either A or B. A or B can also be described as the event that at least one
of A or B occurs.
2. The event A and B consists of all simple events common to both A
and B. A and B can be described as the event that both A and B occur.
3. The event A’, called the complement of A, consists of all simple
events that are not contained in A.
A’ is the event that A does not occur.
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Forming New Events

• When two events A and B have no simple events in common, we say that they are mutually exclusive or disjoint (they cannot occur simultaneously).
• Several of the previous definitions can be extended to include events
formed from more than two events.
DEFINITIONS
Given a chance experiment and any events A1, A2, A3, . . . , Ak:
1. The event A1 or A2 or A3 or . . . or Ak consists of all the simple events that are
contained in at least one of the events A1, A2, A3, . . . , or Ak. It can also be described
as the event that at least one of the events A1, A2, A3, . . . , or Ak occurs.
2. The event A1 and A2 and A3 and . . . and Ak consists of all simple events
common to all the events A1, A2, A3, . . . , and Ak. This event can be described as
the event that all of the events A1, A2, A3, . . . , and Ak occur.
3. Several events A1, A2, A3, . . . , and Ak are said to be mutually exclusive or
disjoint if no two of them have any simple events in common.
4.1 Chance Experiments and Probability Concepts
à Probability Concepts

• Probability allows us to quantify the likelihood associated with


uncertain events, that is, events that result from chance experiments.

• Generally speaking, the probability of an event can be thought of as


the proportion of times that the event is expected to occur in the long
run.

• Probabilities are reported either as proportions (between 0 and 1) or


as percentages (between 0% and 100%).
4.1 Chance Experiments and Probability Concepts
à Probability Concepts à Assigning Probabilities

Probability Axioms

1. The probability of any event must lie between 0 and 1. That is,
0 ≤ P(A) ≤ 1 for any event A.
2. The total probability assigned to the sample space of an
experiment must be 1.

à There are several ways to determine probabilities:


Ø (1) as frequencies of occurrence,
Ø (2) by using density and mass functions
4.1 Chance Experiments and Probability Concepts
à Probability Concepts à Assigning Probabilities

• Depending on the circumstances, each method has its merits.

• For example, when it is possible to repeat a chance experiment, the “frequentist” approach defines the probability of an event A to be the long-run ratio

P(A) ≈ (number of times A occurs) / (number of trials)

• As the number of trials increases, we expect this ratio to stabilize and eventually approach a limiting value.
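To illustrate the frequentist idea, the sketch below (a simulation added here for illustration, not part of the original slides) estimates P(at least two parts pass) for the three-part stress test, assuming each part passes independently with probability 0.9, an arbitrary value chosen only for the demonstration:

```python
import random

random.seed(1)
p_pass = 0.9        # assumed single-part pass probability (illustrative only)
trials = 100_000
count_A = 0         # number of trials in which event A (at least 2 passes) occurs

for _ in range(trials):
    passes = sum(random.random() < p_pass for _ in range(3))
    if passes >= 2:
        count_A += 1

# Long-run relative frequency; stabilizes near the true value P(A) = 0.972
print(count_A / trials)
```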
4.1 Chance Experiments and Probability Concepts
à The Addition Rule for Disjoint Events

• Disjoint, or mutually exclusive, events are events that cannot occur simultaneously. For any two disjoint events A and B,

P(A or B) = P(A) + P(B)

• More generally, for any collection of disjoint events A1, A2, A3, . . . , Ak,

P(A1 or A2 or . . . or Ak) = P(A1) + P(A2) + . . . + P(Ak)
4.1 Chance Experiments and Probability Concepts
à Complementary Events

• The complement A′ of an event A was defined to be the collection of simple events that are not in A.
Ø P(A′) = 1 − P(A)
Ø (A′)′ = A

• General addition rule: for any two events A and B,

P(A or B) = P(A) + P(B) − P(A and B)


4.2 Independence
à Independent Events

Two events, A and B, are independent events if the probability that either one occurs is not affected by the occurrence of the other. In this case,

P(A and B) = P(A) · P(B)

Several events, A1, A2, A3, . . . , Ak, are independent if the probability of each event is unaltered by the occurrence of any subset of the remaining events.

In this case, the product rule can be applied to any subset of the k events. That is, the probability that all the events in any subset occur equals the product of their individual probabilities of occurring. In particular, for all k events,

P(A1 and A2 and . . . and Ak) = P(A1) · P(A2) · · · P(Ak)
4.3 Random variables
à Introduction

ü Random Variable (RV): A numerical characteristic whose value depends on


the outcome of a chance experiment is called a random variable.
• Example: In a chemical reaction, any quantifiable feature associated with
the reaction is a random variable (e.g., yield, density, weight, viscosity,
volume, and translucence of the material produced)

• Notation: capital letters denote random variables, e.g., X or Y, and lower


case letters denote possible values of the random variables, e.g., x or y.
4.3 Random variables
à Introduction à Example - Coin Tosses

Example 4.1: A coin is tossed three times and the sequence of heads and tails is noted. The sample space for this experiment is

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let X be the number of heads in the three tosses. X assigns each outcome z in S a number from the set {0, 1, 2, 3}.

X is a random variable taking on values in the set {0, 1, 2, 3}.
4.3 Random variables
à Probability Mass Function - pmf
à Example - Coin Tosses

What is the pmf of X?

What is the CDF of X?


4.3 Random variables
à The Cumulative Distribution Function - cdf

The cumulative distribution function (cdf) of a random variable X is defined as the probability of the event {X ≤ x}:

F_X(x) = P(X ≤ x), for −∞ < x < +∞

• The cdf is the probability of the event {X ≤ x}.
• It is the probability that the random variable X takes on a value in the set (−∞, x].
4.3 Random variables
à The Cumulative Distribution Function – cdf
à Example : Three Coin Tosses
• FX(x) is simply the sum of the probabilities of the outcomes from {0,1,2,3}
that are less than or equal to x.
• The cdf has jumps at the points 0, 1, 2, 3 of magnitudes 1/8, 3/8, 3/8, and
1/8, respectively

F_X(x) = (1/8) u(x) + (3/8) u(x − 1) + (3/8) u(x − 2) + (1/8) u(x − 3), where u(t) is the unit step function.

Figure: staircase graph of the cdf F_X(x)
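A small sketch of the pmf and cdf of X for this example; the step-function cdf reproduces the jumps of 1/8, 3/8, 3/8, and 1/8 noted above:

```python
from itertools import product

# All 8 equally likely outcomes of three fair coin tosses
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# pmf of X = number of heads: p(x) = (# outcomes with x heads) / 8
pmf = {x: sum(o.count("H") == x for o in outcomes) / 8 for x in range(4)}
print(pmf)                 # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

def cdf(x):
    """F_X(x) = P(X <= x): the sum of pmf values at points <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(1.5), cdf(3))    # 0.5 and 1.0; the cdf jumps at 0, 1, 2, 3
```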
4.3 Random variables
à The Cumulative Distribution Function – cdf
à Basic properties of the cdf
4.3 Random variables
à Joint random variable à Joint distribution function

X and Y are two random variables defined on the same sample space S.

¨ If x and y are both discrete, their joint distribution is specified by a joint mass function f(x, y) satisfying:

1. f(x, y) ≥ 0
2. ∑ over all (x, y) of f(x, y) = 1

¨ Often, there is no nice formula for 𝑓(𝑥,𝑦).


¨ When there are only a few possible values of 𝑥 and 𝑦, the mass
function is most conveniently displayed in a rectangular table.
4.3 Random variables
à Joint random variable à Joint distribution function

¨ Example 4.2: A certain market has both an express checkout register


and a super-express register.
¨ Let x denote the number of customers queueing at the express register
at a particular weekday time, and let y denote the number of
customers in line at the super-express register at that same time.
¨ The joint mass function is given as follows:
Q:
1. What is the probability that the number of customers at the express register equals the number of customers at the super-express register?
2. What is the probability that there are 2 customers in total at these two registers at the same time?
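The joint mass table itself is not reproduced in this text version, so the sketch below uses hypothetical probabilities (summing to 1) purely to show how the two questions would be answered from such a table:

```python
# Hypothetical joint pmf f(x, y); keys are (x, y) = (express queue, super-express queue).
# These numbers are placeholders, not the actual table of Example 4.2.
f = {
    (0, 0): 0.10, (0, 1): 0.05, (0, 2): 0.05,
    (1, 0): 0.10, (1, 1): 0.20, (1, 2): 0.10,
    (2, 0): 0.05, (2, 1): 0.15, (2, 2): 0.20,
}
assert abs(sum(f.values()) - 1.0) < 1e-9     # condition 2: probabilities sum to 1

# Q1: P(X = Y) -- same number of customers at both registers
p_equal = sum(p for (x, y), p in f.items() if x == y)

# Q2: P(X + Y = 2) -- two customers in total at the two registers
p_total2 = sum(p for (x, y), p in f.items() if x + y == 2)

print(p_equal, p_total2)   # 0.5 and 0.3 for these placeholder values
```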
4.3 Joint Distributions
Distributions for two discrete variables

¨ Example 4.2:
4.3 Joint Distributions
Distributions for two discrete variables
à The marginal probability mass functions

¨ Def: The marginal probability mass functions of X and Y are
f_X(x) = ∑_y f(x, y) and f_Y(y) = ∑_x f(x, y)

Q: What is the marginal pmf of X and of Y?


4.3 Joint Distributions
Distributions for two continuous variables

¨ If x and y are both continuous, their joint distribution is specified by a joint density function f(x, y) satisfying:
1. f(x, y) ≥ 0
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
¨ The graph of f (x, y) is a surface in three-dimensional space.

¨ The second condition indicates that the total volume under this density surface
is 1.
à Figure: Volume representing the proportion of (x, y) in the region A
4.3 Joint Distributions
Distributions for two continuous variables

¨ Example 4.3: Let x denote the proportion of time a drive-up facility is busy and y the proportion of time the associated walk-up window is busy, with a given joint pdf f(x, y).

à 1. Verify that the joint pdf satisfies the two conditions.

à 2. What is the probability that neither facility is busy more than one-quarter of the time?
4.3 Joint Distributions
Distributions for two continuous variables

1.

2.
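The specific joint pdf of Example 4.3 is not reproduced in this text version. The sketch below assumes, purely for illustration, f(x, y) = 1.2(x + y²) on 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and checks the two conditions and P(X ≤ 1/4, Y ≤ 1/4) numerically:

```python
from scipy.integrate import dblquad

# Illustrative joint pdf (an assumption, not necessarily the pdf used in Example 4.3)
def f(x, y):
    return 1.2 * (x + y ** 2) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

# Condition 2: the total volume under the density surface should equal 1.
# dblquad integrates func(y, x) with y as the inner variable, so the arguments are swapped.
total, _ = dblquad(lambda y, x: f(x, y), 0, 1, 0, 1)
print(total)    # ~1.0

# P(neither facility is busy more than one-quarter of the time) = P(X <= 1/4, Y <= 1/4)
prob, _ = dblquad(lambda y, x: f(x, y), 0, 0.25, 0, 0.25)
print(prob)     # ~0.011 for this assumed pdf
```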
4.3 Joint Distributions
Distributions for two continuous variables
à The marginal probability density functions

¨ Def: The marginal probability density functions of X and Y are
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy and f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx

Back to Example 4.3 :

à The marginal pdf of X, which gives the probability distribution of busy


time for the drive-up facility without reference to the walk-up window, is
4.3 Joint Distributions
Distributions for two continuous variables
à The marginal probability density functions

¨ Back to Example 4.3 :

What is the probability that the walk-up window is busy between one-quarter and three-quarters of the time?
à The marginal pdf of Y, which gives the probability distribution of busy time for the walk-up window without reference to the drive-up facility, is

Leads to
4.3 Joint Distributions
The Bivariate Normal Distribution

¨ The bivariate normal joint density function is given by


f(x, y) = [1 / (2π σ_x σ_y √(1 − ρ²))] exp{ −[1 / (2(1 − ρ²))] [((x − μ_x)/σ_x)² − 2ρ((x − μ_x)/σ_x)((y − μ_y)/σ_y) + ((y − μ_y)/σ_y)²] }     (1)
where −∞ < 𝑥 < ∞ and −∞ < 𝑦 < ∞
¨ When x and y are statistically independent, the joint density function f(x, y) must satisfy
f(x, y) = f1(x) f2(y)
where f1(x) and f2(y) denote the marginal distributions of x and y, respectively.

Note that once independence is assumed, one has only to select appropriate
distributions for 𝑥 and 𝑦 separately and then use (1) to yield the joint
distribution.
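A direct sketch of formula (1): the function below evaluates the bivariate normal density and, when ρ = 0, agrees with the product of the two normal marginals (the numerical parameter values are arbitrary):

```python
import math

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal density, formula (1)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    coeff = 1.0 / (2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho ** 2))
    expo = -(zx ** 2 - 2 * rho * zx * zy + zy ** 2) / (2 * (1 - rho ** 2))
    return coeff * math.exp(expo)

def normal_pdf(t, mu, sigma):
    return math.exp(-((t - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

# With rho = 0 the joint density factors: f(x, y) = f1(x) * f2(y)
joint = bivariate_normal_pdf(1.0, 2.0, 0.0, 1.5, 1.0, 0.5, rho=0.0)
product = normal_pdf(1.0, 0.0, 1.0) * normal_pdf(2.0, 1.5, 0.5)
print(joint, product)   # the two values coincide
```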
4.3 Joint Distributions
Correlation and the Bivariate Normal Distribution

¨ The correlation coefficient 𝜌 is defined by


ρ = σ_xy / (σ_x σ_y)
Ø 𝜌 does not depend on the 𝑥 or 𝑦 units of measurement.
−1 ≤ 𝜌 ≤ 1
Ø The closer 𝜌 is to +1 or -1, the stronger the linear relationship between the
two variables.

¨ The covariance between x and y is defined by:

σ_xy = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μ_x)(y − μ_y) f(x, y) dx dy

where μ_x and μ_y denote the mean values of x and y, respectively.
4.3 Joint Distributions
The Bivariate Normal Distribution

¨ Example: Images displayed on computer screens consist of thousands of


pixels. The intensity of the electron beam focused at a given point (x0,
y0) on a flat screen is usually described by two independent normal
random variables x and y, with means x0 and y0, respectively.

à The intensity of the beam is described by a joint density function of two independent random variables.
à The figure shows a graph of the joint density function describing an electron beam focused on the point (x0, y0) = (30, 50).
à The standard deviations of the two normal distributions are σ_x = 0.2 and σ_y = 0.2.
4.3 Joint Distributions
The Bivariate Normal Distribution

¨ Example (cont’):
àBecause x and y are
independent, we can write the
joint density as the product:

Q: What is the proportion of time


that the beam spends in the region
where x < 29.5 and y < 49.6?
à Instead of integrating the density over the region B = {(x, y) | x < 29.5, y < 49.6}, we can simply use the independence of x and y to obtain P(B) = P(x < 29.5) · P(y < 49.6).
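With the stated parameters (means 30 and 50, σ_x = σ_y = 0.2), the probability follows directly from the two one-dimensional normal cdfs; a minimal sketch using scipy:

```python
from scipy.stats import norm

# Independent normal beam coordinates: x ~ N(30, 0.2), y ~ N(50, 0.2)
p_x = norm.cdf(29.5, loc=30, scale=0.2)   # P(x < 29.5) ~ 0.0062
p_y = norm.cdf(49.6, loc=50, scale=0.2)   # P(y < 49.6) ~ 0.0228

# Independence: P(x < 29.5 and y < 49.6) = P(x < 29.5) * P(y < 49.6)
print(p_x * p_y)                          # roughly 1.4e-4
```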
4.3 Joint Distributions: Exercise
4.3 Random variables
àMean of Random Variables

The mean of a random variable X can be thought of as the long-run average


value of X that should occur in many repeated trials of a chance experiment.
à Fortunately, when the probability distribution of X is known, there is no
need to actually perform repeated experimental trials.
à Instead, we define the mean to be the mean of the population described
by the mass or density function and then use the methods of Chapter 2 to
compute it.
à The same notation 𝜇 used to describe the mean of a population is now
used to denote the mean of a random variable.
4.3 Random variables
àMoments of Random Variables
à Variance and Standard deviation

• Variance is a measure of the dispersion of the random variable about the mean.
- For a discrete RV: σ² = ∑_x (x − μ)² p(x)

- For a continuous RV: σ² = ∫_{−∞}^{∞} (x − μ)² f(x) dx

- The standard deviation σ is the square root of the variance.
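For a discrete RV both quantities are weighted sums over the pmf; a short sketch using X = number of heads in three fair tosses (Example 4.1):

```python
# pmf of X = number of heads in three fair coin tosses (Example 4.1)
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

mu = sum(x * p for x, p in pmf.items())                # mean: sum of x * p(x)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # variance about the mean
sd = var ** 0.5                                        # standard deviation

print(mu, var, sd)    # 1.5, 0.75, ~0.866
```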
4.3 Random variables
à Covariance and correlation coefficient

For two random variables X and Y, the joint expectation is defined as

E[XY] = ∑_x ∑_y x y f(x, y) (discrete case) or ∫∫ x y f(x, y) dx dy (continuous case)

The correlation between random variables X and Y, measured by the covariance, is given by

Cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y

ρ_XY = σ_XY / (σ_X σ_Y) is called the correlation coefficient. The correlation coefficient measures the strength of the linear relationship between the two random variables.
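Continuing with the hypothetical joint table used in the earlier checkout-register sketch (placeholder numbers only), the covariance and correlation coefficient can be computed directly from these definitions:

```python
# Hypothetical joint pmf (same placeholder values as the earlier sketch)
f = {
    (0, 0): 0.10, (0, 1): 0.05, (0, 2): 0.05,
    (1, 0): 0.10, (1, 1): 0.20, (1, 2): 0.10,
    (2, 0): 0.05, (2, 1): 0.15, (2, 2): 0.20,
}

mu_x = sum(x * p for (x, y), p in f.items())
mu_y = sum(y * p for (x, y), p in f.items())
e_xy = sum(x * y * p for (x, y), p in f.items())       # joint expectation E[XY]

cov = e_xy - mu_x * mu_y                               # Cov(X, Y) = E[XY] - mu_x * mu_y
sd_x = sum((x - mu_x) ** 2 * p for (x, y), p in f.items()) ** 0.5
sd_y = sum((y - mu_y) ** 2 * p for (x, y), p in f.items()) ** 0.5
rho = cov / (sd_x * sd_y)                              # correlation coefficient

print(cov, rho)
```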
4.4 Sampling Distributions
àIntroduction

• As in chapters 1–3, statistics such as the sample mean, standard deviation,


and correlation coefficient are useful tools for describing sets of data.
Ø Similarly, density and mass functions provide concise descriptions of
populations and ongoing processes.
à One important question left unanswered in those chapters: how do we know
what parameter values to use in a mass function or density function?
Ø For example, the Weibull density is commonly used for modeling the lifetimes of products, but how do you go about selecting numerical values for the Weibull parameters α and β that best describe the lifetimes of a particular product?
4.4 Sampling Distributions
àIntroduction

à One way to answer such questions is to use statistical inference, a technique


that converts the information from random samples into reliable estimates
of, and conclusions about, population or process parameters.
à It is important to keep in mind the key role played by random sampling!
• For example:
à when testing a large shipment of parts for defective items, most
people would agree that finding two defective items in a random sample
of 10 is very different from finding 200 defectives in a random sample
of 1000. Although the sample percentage (i.e., the statistic calculated
from the data) is the same in both cases, the 20% defect rate in the larger
sample seems much more credible than the 20% defect rate in the smaller
sample!
à Random sampling provides credible information!
4.4 Sampling Distributions
àIntroduction

• Figure shows Statistical inference:


(a) descriptive statistics;
(b) inferential statistics

Ø Without random sampling, statistics


can only provide descriptive
summaries of the data itself.
Ø With random sampling, our
conclusions can be reliably extended
beyond the data, to the population!!
4.4 Sampling Distributions
àIntroduction

• Statistical inference is based on the interplay between random samples


(used to obtain data and calculate statistics), sampling distributions (which
describe the behavior of such statistics), and probability (which gives
quantitative measures of reliability about what the statistics say)
à The sampling distribution of a statistic is a mass or density function that
characterizes all the possible values that the statistic can assume in repeated
random samples.
4.4 Sampling Distributions
à Definition

à How Are Sampling Distributions Used?


• One way to approximate the sampling distribution of a statistic is to
repeatedly select a large number of random samples of size n from a given
population.
• By calculating the value of the statistic for each sample and forming a
histogram of the results, we get an approximate picture of the sampling
distribution of the statistic.
• In turn, this picture can be used to describe the values of the statistic that are
likely to occur in any random sample of size n.
4.4 Sampling Distributions
à Example

Suppose that we draw 1000 random samples, each of size n = 25, from a
normal population with a mean of 50 and a standard deviation of 2. If we
calculate the mean 𝑥̅ of each sample, then the distribution of all 1000 𝑥̅ values
gives a good approximation to the sampling distribution of 𝑥̅.

A histogram of the results of such an experiment


4.4 Sampling Distributions
à Example

• Notice that the 1000 sample means stack up around the population mean (µ = 50).
• Variation among the sample means is smaller than variation in the population.
• In particular, none of the sample means fall outside the range of 48.5 to 51.5 (i.e.,
none are more than 1.5 units away from µ).
• In fact, it also appears that very few sample means fall outside the interval 49 to 51;
that is, they are generally within 1 unit of µ.

A histogram of the results of such an experiment


4.4 Sampling Distributions
à Example

• From the shape and location of the sampling distribution, we can begin to see
which values of the sample statistic are more likely to occur than others.

• In this sense, the information in a sampling distribution provides a template for


evaluating any sample, even future samples, from a population or process.

A histogram of the results of such an experiment


4.4 Sampling Distributions
à General Properties of Sampling Distributions

• Sampling distributions can be created for any statistic: x̄, s, s², x̃.

• Figure: the approximate sampling distributions of the statistics x̄, s, s², and x̃ for the same 1000 samples of size n = 25
4.4 Sampling Distributions
à General Properties of Sampling Distributions

• What about sampling from discrete populations?


• In particular, suppose we want to use samples of size 25 to estimate
the proportion p of defectives being made by a certain process.
• Denoting defective items by a “1” and nondefectives by a “0,” the mass function

p(1) = 0.20, p(0) = 0.80

describes such a process in which the proportion of defective items is 20%.
• By calculating the sample proportion defective p for each of 1000
random samples of size n = 25, an approximate sampling distribution
for the statistic p can be formed
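A matching simulation sketch for this discrete case: 1000 random samples of n = 25 from a 0/1 process with 20% defectives, each reduced to its sample proportion p:

```python
import numpy as np

rng = np.random.default_rng(0)
n, pi = 25, 0.20                   # sample size and process proportion defective

# Each sample is 25 draws of 0/1; the sample proportion is the sample mean
props = rng.binomial(1, pi, size=(1000, n)).mean(axis=1)

print(sorted(set(props))[:5])      # possible values are 0/25, 1/25, 2/25, ...
print(props.mean())                # close to pi = 0.20
print(props.std(ddof=1))           # close to sqrt(pi * (1 - pi) / n) = 0.08
```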
4.4 Sampling Distributions
à General Properties of Sampling Distributions

Note that this distribution has many more possible


values than just the values x = 0 and x = 1 in the
population (each of the values 0/25, 1/25, 2/25, 3/25, .
. . , 25/25 is a possible value of p)

Sampling distribution of p when n = 25


from a process with p = 0.20
4.4 Sampling Distributions
à General Properties of Sampling Distributions
à General conclusions
4.4 Sampling Distributions
à Section 4.4 Exercises
4.5 Describing Sampling Distributions
à Sampling Distribution of x̄

• The sampling distribution of x̄, also called the sampling distribution of the mean, is the probability distribution that describes the behavior of x̄ in repeated random samples from a population or process.
• Like any distribution, the sampling distribution of x̄ has its own unique mean and standard deviation, which we denote by μ_x̄ and σ_x̄, respectively.
4.5 Describing Sampling Distributions
à Sampling Distribution of x̄

• For a random sample of size n from a population with mean μ and standard deviation σ, the sampling distribution of x̄ has

μ_x̄ = μ and σ_x̄ = σ/√n

• These equations hold regardless of the particular form of the population distribution.

• To emphasize the fact that it describes a sampling distribution, not a population, σ_x̄ is also called the standard error of x̄, or the standard error of the mean.

• One of the key features of the standard error of the mean σ_x̄ is that it decreases as the sample size increases. In fact, many statistics have this property.

• This makes intuitive sense, since we expect that more information ought to provide better estimates (i.e., smaller standard errors). As a result, increasing the size of a random sample has the desirable effect of increasing the probability that the estimate x̄ will lie close to the population mean μ.
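A one-line check of how the standard error σ_x̄ = σ/√n shrinks as the sample size grows (σ = 2, as in the earlier normal-population example):

```python
import math

sigma = 2.0
for n in (4, 25, 100, 400):
    # standard error of the mean for each sample size: 1.0, 0.4, 0.2, 0.1
    print(n, sigma / math.sqrt(n))
```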
4.5 Describing Sampling Distributions
à Sampling Distribution of x̄
à Sampling from a Normal Population

• When a population follows a normal distribution, it can be shown that the sampling distribution of x̄ is also normal, for any sample size n.

• The normality of x̄, along with the fact that its mean μ_x̄ and standard error σ_x̄ can be determined from μ and σ, is enough to completely characterize the sampling distribution of x̄ in this case.

• As a result, with the normal distribution, probabilities of events involving x̄ reduce to straightforward calculations.
4.5 Describing Sampling Distributions
à Sampling Distribution of x̄
à Sampling from a Normal Population

The probability that x̄ falls within a fixed distance from μ increases as n increases.

When a population distribution is normal, the sampling distribution of x̄ is also normal, regardless of the size of the sample.
4.5 Describing Sampling Distributions
à The Central Limit Theorem

By using a moderately large sample size n, it can be shown that the sampling distribution of x̄ is approximately normal, regardless of the particular population distribution.
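A quick illustration of the theorem (a simulation added here, with arbitrary parameters): sample means drawn from a strongly skewed exponential population look increasingly normal, i.e. less skewed, as n grows:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
for n in (2, 10, 50):
    # 5000 sample means of size-n samples from a skewed (exponential) population
    means = rng.exponential(scale=1.0, size=(5000, n)).mean(axis=1)
    print(n, round(float(skew(means)), 2))   # skewness shrinks toward 0 as n grows
```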
4.5 Describing Sampling Distributions
à The Central Limit Theorem

The closer the population is to being normal, the more rapidly the sampling distribution of x̄ approaches normality.

The less symmetric a population is, the larger the sample size will have to be to ensure normality of x̄.
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion

Qualitative information can also be included in statistical studies. To do this, we first


numerically code such information using the following simple device: The number “1”
is assigned to population members having a specified characteristic and “0” is assigned
to those that do not. The population that results from this 0–1 coding scheme is
pictured in Figure.

The parameter of interest in this situation is π, the proportion of the population that has the characteristic of interest. Notice that π is also the height of the bar associated with the value of 1 in the figure.

The distribution of coded values of a qualitative


characteristic: “1” denotes that the specified
characteristic is present; “0” indicates that it is not
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion

Every random sample drawn from such a population will consist entirely of 0s and
1s. Suppose, for instance, that a particular sample of size 10 contains the observations
{0, 0, 1, 1, 0, 1, 0, 0, 1, 0}. Then the sample mean is (0+0 +1+1+0+1+0 +0+1+0)/10 =
.40. That is, the sample mean is simply the proportion of 1s in the sample.
à We use the notation p to denote the proportion of successes, also called the sample
proportion, in a random sample of size n.

Since p is actually a sample mean, we can use the earlier results in this section to determine its sampling distribution. For example, the mean and standard error of the sampling distribution of p are given by

μ_p = π and σ_p = √(π(1 − π)/n)
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion

à As a general rule, the accuracy of the normal approximation is best


when both n𝜋 ≥ 5 and n(1 − 𝜋) ≥ 5 .
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion

Example: A p chart is often used to monitor the proportion of nonconforming products in a manufacturing process.
à A value of π is selected as being representative of the long-run behavior of the process.
à Suppose, for example, that a certain process consistently generates an average of about 5% nonconforming products and that samples of size 100 are taken each day to test whether the 5% nonconformance rate has changed. On one particular day, 12 nonconforming products appear in the sample. How do we interpret this information?
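One way to interpret the 12 nonconforming items is to ask how unlikely a sample proportion of 0.12 would be if the process rate were still π = 0.05. A sketch using the normal approximation (and, for comparison, the exact binomial tail):

```python
from math import sqrt
from scipy.stats import norm, binom

n, pi = 100, 0.05
p_hat = 12 / n                          # observed sample proportion

se = sqrt(pi * (1 - pi) / n)            # standard error of p under pi = 0.05
z = (p_hat - pi) / se                   # about 3.2 standard errors above pi
print(z, 1 - norm.cdf(z))               # upper-tail probability ~ 0.0007

print(binom.sf(11, n, pi))              # exact P(X >= 12); also very small
```

Both calculations suggest that 12 nonconforming items in a sample of 100 would be very unusual if the rate were still 5%, so the process has likely changed.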
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion
4.5 Describing Sampling Distributions
à Section 4.5 Exercises
