

CHAPTER 4
PROBABILITY AND SAMPLING
DISTRIBUTIONS
Outline of the Subject

Chapter 1: Introduction to Applied Probability and Statistics
Chapter 2: Numerical Summary Measures

Chapter 3: Bivariate and Multivariate Data Distributions

Chapter 4: Probability and Sampling Distributions

Chapter 5: Estimation and Statistical Intervals


Chapter 6: Testing Statistical Hypotheses
Outline of Chapter 4

4.1 Chance Experiments and Probability Concepts


4.2 Independence

4.3 Random Variables

4.4 Sampling Distributions

4.5 Describing Sampling Distributions and the Central Limit Theorem
4.1 Chance Experiments and Probability Concepts
à Chance Experiments

• A chance experiment, also called a random experiment, is simply an activity or situation whose outcomes, to some degree, depend on chance.
• To decide whether a given activity qualifies as a chance experiment, ask:
Ø Would you get exactly the same result if you repeated the experiment more than once?
Ø If the answer is “no,” then the activity qualifies as a chance experiment.
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Events

• Events are outcomes of chance experiments. These outcomes can be divided into two types:
Ø (1) simple events, which are the individual outcomes of an experiment;
Ø (2) events, which consist of collections of simple events.

q For instance, the chance experiment of conducting a series of stress tests on three
metal parts has the eight possible outcomes PPP, PPF, PFP, FPP, PFF, FPF, FFP,
and FFF, where P and F denote the test results “pass” and “fail,” and the order in
which the letters appear corresponds to the part number tested.

• Each of these outcomes is a simple event; taken together, they form the sample space of the experiment.
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Events

• Events are often denoted by single uppercase letters.
• Events can also be described by just listing, in braces, the simple events that comprise them.
For example:
Ø The event that at least two parts pass the stress test corresponds to
the set of outcomes {PPP, PPF, PFP, FPP}.
Ø If we had also chosen to denote this event by the letter A, then we
could also write A = {PPP, PPF, PFP, FPP}.
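These listings can be generated mechanically. A minimal Python sketch (only the pass/fail labels and the “at least two pass” condition come from the example above; everything else is illustrative):

```python
from itertools import product

# Sample space for stress-testing three metal parts: all sequences of P (pass) / F (fail)
sample_space = ["".join(outcome) for outcome in product("PF", repeat=3)]
print(sample_space)   # ['PPP', 'PPF', 'PFP', 'PFF', 'FPP', 'FPF', 'FFP', 'FFF']

# Event A: at least two parts pass the stress test
A = {s for s in sample_space if s.count("P") >= 2}
print(A)              # {'PPP', 'PPF', 'PFP', 'FPP'} (set order may vary)
```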
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Depicting Events

• Tree diagrams are especially useful for depicting experiments that


are conducted in a sequence of steps.

Tree diagram for the experiment of selecting and testing three metal parts (branches forming the simple event PPF are shown shaded)
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Depicting Events

• The Venn diagram is especially useful for depicting relationships between events.
• Venn diagrams are simple two-dimensional figures, often rectangles or circles, whose enclosed regions are intended to depict a collection of simple events, called points, in a sample space.
E.g.: Event A = at least two parts pass the test contains all of the simple events {PPP, PPF, PFP, FPP}.
Event C = exactly three parts pass the test is contained in A, so C is shown inside of A.
Event B = at most two parts pass the test.
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Forming New Events

• One of the primary methods for creating complex events and,


therefore, for unraveling them, involves the use of the words and, or,
and not.
DEFINITIONS

For a chance experiment and any two events A and B:


1. The event A or B consists of all simple events that are contained in
either A or B. A or B can also be described as the event that at least one
of A or B occurs.
2. The event A and B consists of all simple events common to both A
and B. A and B can be described as the event that both A and B occur.
3. The event A’, called the complement of A, consists of all simple
events that are not contained in A.
A’ is the event that A does not occur.
4.1 Chance Experiments and Probability Concepts
à Chance Experiments à Forming New Events

• When two events A and B have no simple events in common, we say that they are mutually exclusive or disjoint (they cannot occur simultaneously).
• Several of the previous definitions can be extended to include events
formed from more than two events.
DEFINITIONS
Given a chance experiment and any events A1, A2, A3, . . . , Ak:
1. The event A1 or A2 or A3 or . . . or Ak consists of all the simple events that are
contained in at least one of the events A1, A2, A3, . . . , or Ak. It can also be described
as the event that at least one of the events A1, A2, A3, . . . , or Ak occurs.
2. The event A1 and A2 and A3 and . . . and Ak consists of all simple events
common to all the events A1, A2, A3, . . . , and Ak. This event can be described as
the event that all of the events A1, A2, A3, . . . , and Ak occur.
3. Several events A1, A2, A3, . . . , and Ak are said to be mutually exclusive or
disjoint if no two of them have any simple events in common.
4.1 Chance Experiments and Probability Concepts
à Probability Concepts

• Probability allows us to quantify the likelihood associated with


uncertain events, that is, events that result from chance experiments.

• Generally speaking, the probability of an event can be thought of as


the proportion of times that the event is expected to occur in the long
run.

• Probabilities are reported either as proportions (between 0 and 1) or


as percentages (between 0% and 100%).
4.1 Chance Experiments and Probability Concepts
à Probability Concepts à Assigning Probabilities

Probability Axioms

1. The probability of any event must lie between 0 and 1. That is,
0 ≤ P(A) ≤ 1 for any event A.
2. The total probability assigned to the sample space of an
experiment must be 1.

à There are several ways to determine probabilities:


Ø (1) as frequencies of occurrence,
Ø (2) by using density and mass functions
4.1 Chance Experiments and Probability Concepts
à Probability Concepts à Assigning Probabilities

• Depending on the circumstances, each method has its merits.

• For example, when it is possible to repeat a chance experiment, the “frequentist” approach defines the probability of an event A to be the long-run ratio

P(A) ≈ (number of times A occurs) / (number of trials)

• As the number of trials increases, we expect this ratio to stabilize and eventually approach a limiting value.
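To illustrate the frequentist idea, the sketch below (a simulation added here for illustration, not part of the original slides) estimates P(at least two parts pass) for the three-part stress test, assuming each part passes independently with probability 0.9, an arbitrary value chosen only for the demonstration:

```python
import random

random.seed(1)
p_pass = 0.9        # assumed single-part pass probability (illustrative only)
trials = 100_000
count_A = 0         # number of trials in which event A (at least 2 passes) occurs

for _ in range(trials):
    passes = sum(random.random() < p_pass for _ in range(3))
    if passes >= 2:
        count_A += 1

# Long-run relative frequency; stabilizes near the true value P(A) = 0.972
print(count_A / trials)
```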
4.1 Chance Experiments and Probability Concepts
à The Addition Rule for Disjoint Events

• Disjoint, or mutually exclusive, events are events that cannot occur simultaneously. For any two disjoint events A and B,

P(A or B) = P(A) + P(B)

• More generally, for any collection of disjoint events A1, A2, A3, . . . , Ak,

P(A1 or A2 or . . . or Ak) = P(A1) + P(A2) + . . . + P(Ak)
4.1 Chance Experiments and Probability Concepts
à Complementary Events

• The complement A′ of an event A was defined to be the collection of simple events that are not in A.
Ø P(A′) = 1 − P(A)
Ø (A′)′ = A

• General addition rule: for any two events A and B,

P(A or B) = P(A) + P(B) − P(A and B)


4.2 Independence
à Independent Events

Two events, A and B, are independent events if the probability that either one occurs is not affected by the occurrence of the other. In this case,

P(A and B) = P(A) · P(B)

Several events, A1, A2, A3, . . . , Ak, are independent if the probability of each event is unaltered by the occurrence of any subset of the remaining events.

In this case, the product rule can be applied to any subset of the k events. That is, the probability that all the events in any subset occur equals the product of their individual probabilities of occurring. In particular, for all k events,

P(A1 and A2 and . . . and Ak) = P(A1) · P(A2) · · · P(Ak)
4.3 Random variables
à Introduction

ü Random Variable (RV): A numerical characteristic whose value depends on


the outcome of a chance experiment is called a random variable.
• Example: In a chemical reaction, any quantifiable feature associated with
the reaction is a random variable (e.g., yield, density, weight, viscosity,
volume, and translucence of the material produced)

• Notation: capital letters denote random variables, e.g., X or Y, and lower


case letters denote possible values of the random variables, e.g., x or y.
4.3 Random variables
à Introduction à Example - Coin Tosses

Example 4.1: A coin is tossed three times and the sequence of heads and tails is noted. The sample space for this experiment is

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let X be the number of heads in the three tosses. X assigns each outcome z in S a number from the set {0, 1, 2, 3}.

X is a random variable taking on values in the set {0, 1, 2, 3}.
4.3 Random variables
à Probability Mass Function - pmf
à Example - Coin Tosses

What is the pmf of X?

What is the CDF of X?


4.3 Random variables
à The Cumulative Distribution Function - cdf

The cumulative distribution function (cdf) of a random variable X is defined as the probability of the event {X ≤ x}:

F_X(x) = P(X ≤ x), for −∞ < x < +∞

• The cdf is the probability of the event {X ≤ x}.
• It is the probability that the random variable X takes on a value in the set (−∞, x].
4.3 Random variables
à The Cumulative Distribution Function – cdf
à Example : Three Coin Tosses
• FX(x) is simply the sum of the probabilities of the outcomes from {0,1,2,3}
that are less than or equal to x.
• The cdf has jumps at the points 0, 1, 2, 3 of magnitudes 1/8, 3/8, 3/8, and
1/8, respectively

F_X(x) = (1/8) u(x) + (3/8) u(x − 1) + (3/8) u(x − 2) + (1/8) u(x − 3), where u(t) is the unit step function.

Figure: staircase graph of the cdf F_X(x)
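A small sketch of the pmf and cdf of X for this example; the step-function cdf reproduces the jumps of 1/8, 3/8, 3/8, and 1/8 noted above:

```python
from itertools import product

# All 8 equally likely outcomes of three fair coin tosses
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# pmf of X = number of heads: p(x) = (# outcomes with x heads) / 8
pmf = {x: sum(o.count("H") == x for o in outcomes) / 8 for x in range(4)}
print(pmf)                 # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

def cdf(x):
    """F_X(x) = P(X <= x): the sum of pmf values at points <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(1.5), cdf(3))    # 0.5 and 1.0; the cdf jumps at 0, 1, 2, 3
```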
4.3 Random variables
à The Cumulative Distribution Function – cdf
à Basic properties of the cdf
4.3 Random variables
à Joint random variable à Joint distribution function

X and Y are two random variables defined on the same sample space S.

¨ If x and y are both discrete, their joint distribution is specified by a joint mass function f(x, y) satisfying:

1. f(x, y) ≥ 0
2. ∑ over all (x, y) of f(x, y) = 1

¨ Often, there is no nice formula for 𝑓(𝑥,𝑦).


¨ When there are only a few possible values of 𝑥 and 𝑦, the mass
function is most conveniently displayed in a rectangular table.
4.3 Random variables
à Joint random variable à Joint distribution function

¨ Example 4.2: A certain market has both an express checkout register


and a super-express register.
¨ Let x denote the number of customers queueing at the express register
at a particular weekday time, and let y denote the number of
customers in line at the super-express register at that same time.
¨ The joint mass function is given as follows:
Q:
1. What is the probability that the number of customers at the express register equals the number of customers at the super-express register?
2. What is the probability that there are 2 customers in total at these two registers at the same time?
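The joint mass table itself is not reproduced in this text version, so the sketch below uses hypothetical probabilities (summing to 1) purely to show how the two questions would be answered from such a table:

```python
# Hypothetical joint pmf f(x, y); keys are (x, y) = (express queue, super-express queue).
# These numbers are placeholders, not the actual table of Example 4.2.
f = {
    (0, 0): 0.10, (0, 1): 0.05, (0, 2): 0.05,
    (1, 0): 0.10, (1, 1): 0.20, (1, 2): 0.10,
    (2, 0): 0.05, (2, 1): 0.15, (2, 2): 0.20,
}
assert abs(sum(f.values()) - 1.0) < 1e-9     # condition 2: probabilities sum to 1

# Q1: P(X = Y) -- same number of customers at both registers
p_equal = sum(p for (x, y), p in f.items() if x == y)

# Q2: P(X + Y = 2) -- two customers in total at the two registers
p_total2 = sum(p for (x, y), p in f.items() if x + y == 2)

print(p_equal, p_total2)   # 0.5 and 0.3 for these placeholder values
```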
4.3 Joint Distributions
Distributions for two discrete variables

¨ Example 4.2:
4.3 Joint Distributions
Distributions for two discrete variables
à The marginal probability mass functions

¨ Def: The marginal probability mass functions of X and Y are
f_X(x) = ∑_y f(x, y) and f_Y(y) = ∑_x f(x, y)

Q: What is the marginal pmf of X and of Y?


4.3 Joint Distributions
Distributions for two continuous variables

¨ If x and y are both continuous, their joint distribution is specified by a joint density function f(x, y) satisfying:
1. f(x, y) ≥ 0
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
¨ The graph of f (x, y) is a surface in three-dimensional space.

¨ The second condition indicates that the total volume under this density surface
is 1.
à Figure: Volume representing the proportion of (x, y) in the region A
4.3 Joint Distributions
Distributions for two continuous variables

¨ Example 4.3: Let x denote the proportion of time a drive-up facility is busy and y the proportion of time the associated walk-up window is busy, with a given joint pdf f(x, y).

à 1. Verify that the joint pdf satisfies the two conditions.

à 2. What is the probability that neither facility is busy more than one-quarter of the time?
4.3 Joint Distributions
Distributions for two continuous variables

1.

2.
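The specific joint pdf of Example 4.3 is not reproduced in this text version. The sketch below assumes, purely for illustration, f(x, y) = 1.2(x + y²) on 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and checks the two conditions and P(X ≤ 1/4, Y ≤ 1/4) numerically:

```python
from scipy.integrate import dblquad

# Illustrative joint pdf (an assumption, not necessarily the pdf used in Example 4.3)
def f(x, y):
    return 1.2 * (x + y ** 2) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

# Condition 2: the total volume under the density surface should equal 1.
# dblquad integrates func(y, x) with y as the inner variable, so the arguments are swapped.
total, _ = dblquad(lambda y, x: f(x, y), 0, 1, 0, 1)
print(total)    # ~1.0

# P(neither facility is busy more than one-quarter of the time) = P(X <= 1/4, Y <= 1/4)
prob, _ = dblquad(lambda y, x: f(x, y), 0, 0.25, 0, 0.25)
print(prob)     # ~0.011 for this assumed pdf
```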
4.3 Joint Distributions
Distributions for two continuous variables
à The marginal probability density functions

¨ Def: The marginal probability density functions of X and Y are
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy and f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx

Back to Example 4.3 :

à The marginal pdf of X, which gives the probability distribution of busy


time for the drive-up facility without reference to the walk-up window, is
4.3 Joint Distributions
Distributions for two continuous variables
à The marginal probability density functions

¨ Back to Example 4.3 :

What is the probability that the walk-up window is busy between one-quarter and three-quarters of the time?
à The marginal pdf of Y, which gives the probability distribution of busy time for the walk-up window without reference to the drive-up facility, is

Leads to
4.3 Joint Distributions
The Bivariate Normal Distribution

¨ The bivariate normal joint density function is given by


f(x, y) = [1 / (2π σ_x σ_y √(1 − ρ²))] exp{ −[1 / (2(1 − ρ²))] [((x − μ_x)/σ_x)² − 2ρ((x − μ_x)/σ_x)((y − μ_y)/σ_y) + ((y − μ_y)/σ_y)²] }     (1)
where −∞ < 𝑥 < ∞ and −∞ < 𝑦 < ∞
¨ When x and y are statistically independent, the joint density function f(x, y) must satisfy
f(x, y) = f1(x) f2(y)
where f1(x) and f2(y) denote the marginal distributions of x and y, respectively.

Note that once independence is assumed, one has only to select appropriate
distributions for 𝑥 and 𝑦 separately and then use (1) to yield the joint
distribution.
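A direct sketch of formula (1): the function below evaluates the bivariate normal density and, when ρ = 0, agrees with the product of the two normal marginals (the numerical parameter values are arbitrary):

```python
import math

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal density, formula (1)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    coeff = 1.0 / (2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho ** 2))
    expo = -(zx ** 2 - 2 * rho * zx * zy + zy ** 2) / (2 * (1 - rho ** 2))
    return coeff * math.exp(expo)

def normal_pdf(t, mu, sigma):
    return math.exp(-((t - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

# With rho = 0 the joint density factors: f(x, y) = f1(x) * f2(y)
joint = bivariate_normal_pdf(1.0, 2.0, 0.0, 1.5, 1.0, 0.5, rho=0.0)
product = normal_pdf(1.0, 0.0, 1.0) * normal_pdf(2.0, 1.5, 0.5)
print(joint, product)   # the two values coincide
```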
4.3 Joint Distributions
Correlation and the Bivariate Normal Distribution

¨ The correlation coefficient 𝜌 is defined by


ρ = σ_xy / (σ_x σ_y)
Ø 𝜌 does not depend on the 𝑥 or 𝑦 units of measurement.
−1 ≤ 𝜌 ≤ 1
Ø The closer 𝜌 is to +1 or -1, the stronger the linear relationship between the
two variables.

¨ The covariance between x and y is defined by:

σ_xy = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μ_x)(y − μ_y) f(x, y) dx dy

where μ_x and μ_y denote the mean values of x and y, respectively.
4.3 Joint Distributions
The Bivariate Normal Distribution

¨ Example: Images displayed on computer screens consist of thousands of


pixels. The intensity of the electron beam focused at a given point (x0,
y0) on a flat screen is usually described by two independent normal
random variables x and y, with means x0 and y0, respectively.

à The intensity of the beam is described by a joint density function of two independent random variables.
à The figure shows a graph of the joint density function describing an electron beam focused on the point (x0, y0) = (30, 50).
à The standard deviations of the two normal distributions are σ_x = 0.2 and σ_y = 0.2.
4.3 Joint Distributions
The Bivariate Normal Distribution

¨ Example (cont’):
àBecause x and y are
independent, we can write the
joint density as the product:

Q: What is the proportion of time


that the beam spends in the region
where x < 29.5 and y < 49.6?
à Instead of integrating the density over the region B = {(x, y) | x < 29.5, y < 49.6}, we can simply use the independence of x and y to obtain P(B) = P(x < 29.5) · P(y < 49.6).
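With the stated parameters (means 30 and 50, σ_x = σ_y = 0.2), the probability follows directly from the two one-dimensional normal cdfs; a minimal sketch using scipy:

```python
from scipy.stats import norm

# Independent normal beam coordinates: x ~ N(30, 0.2), y ~ N(50, 0.2)
p_x = norm.cdf(29.5, loc=30, scale=0.2)   # P(x < 29.5) ~ 0.0062
p_y = norm.cdf(49.6, loc=50, scale=0.2)   # P(y < 49.6) ~ 0.0228

# Independence: P(x < 29.5 and y < 49.6) = P(x < 29.5) * P(y < 49.6)
print(p_x * p_y)                          # roughly 1.4e-4
```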
4.3 Joint Distributions: Exercise
4.3 Random variables
àMean of Random Variables

The mean of a random variable X can be thought of as the long-run average


value of X that should occur in many repeated trials of a chance experiment.
à Fortunately, when the probability distribution of X is known, there is no
need to actually perform repeated experimental trials.
à Instead, we define the mean to be the mean of the population described
by the mass or density function and then use the methods of Chapter 2 to
compute it.
à The same notation 𝜇 used to describe the mean of a population is now
used to denote the mean of a random variable.
4.3 Random variables
àMoments of Random Variables
à Variance and Standard deviation

• Variance is a measure of the dispersion of the random variable about the mean.
- For a discrete RV: σ² = ∑_x (x − μ)² p(x)

- For a continuous RV: σ² = ∫_{−∞}^{∞} (x − μ)² f(x) dx

- The standard deviation σ is the square root of the variance.
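For a discrete RV both quantities are weighted sums over the pmf; a short sketch using X = number of heads in three fair tosses (Example 4.1):

```python
# pmf of X = number of heads in three fair coin tosses (Example 4.1)
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

mu = sum(x * p for x, p in pmf.items())                # mean: sum of x * p(x)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # variance about the mean
sd = var ** 0.5                                        # standard deviation

print(mu, var, sd)    # 1.5, 0.75, ~0.866
```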
4.3 Random variables
à Covariance and correlation coefficient

For two random variables X and Y, the joint expectation is defined as

E[XY] = ∑_x ∑_y x y f(x, y) (discrete case) or ∫∫ x y f(x, y) dx dy (continuous case)

The correlation between random variables X and Y, measured by the covariance, is given by

Cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y

ρ_XY = σ_XY / (σ_X σ_Y) is called the correlation coefficient. The correlation coefficient measures the strength of the linear relationship between the two random variables.
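Continuing with the hypothetical joint table used in the earlier checkout-register sketch (placeholder numbers only), the covariance and correlation coefficient can be computed directly from these definitions:

```python
# Hypothetical joint pmf (same placeholder values as the earlier sketch)
f = {
    (0, 0): 0.10, (0, 1): 0.05, (0, 2): 0.05,
    (1, 0): 0.10, (1, 1): 0.20, (1, 2): 0.10,
    (2, 0): 0.05, (2, 1): 0.15, (2, 2): 0.20,
}

mu_x = sum(x * p for (x, y), p in f.items())
mu_y = sum(y * p for (x, y), p in f.items())
e_xy = sum(x * y * p for (x, y), p in f.items())       # joint expectation E[XY]

cov = e_xy - mu_x * mu_y                               # Cov(X, Y) = E[XY] - mu_x * mu_y
sd_x = sum((x - mu_x) ** 2 * p for (x, y), p in f.items()) ** 0.5
sd_y = sum((y - mu_y) ** 2 * p for (x, y), p in f.items()) ** 0.5
rho = cov / (sd_x * sd_y)                              # correlation coefficient

print(cov, rho)
```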
4.4 Sampling Distributions
àIntroduction

• As in chapters 1–3, statistics such as the sample mean, standard deviation,


and correlation coefficient are useful tools for describing sets of data.
Ø Similarly, density and mass functions provide concise descriptions of
populations and ongoing processes.
à One important question left unanswered in those chapters: how do we know
what parameter values to use in a mass function or density function?
Ø For example, the Weibull density is commonly used for modeling the lifetimes of products, but how do you go about selecting numerical values for the Weibull parameters α and β that best describe the lifetimes of a particular product?
4.4 Sampling Distributions
àIntroduction

à One way to answer such questions is to use statistical inference, a technique


that converts the information from random samples into reliable estimates
of, and conclusions about, population or process parameters.
à It is important to keep in mind the key role played by random sampling!
• For example:
à when testing a large shipment of parts for defective items, most
people would agree that finding two defective items in a random sample
of 10 is very different from finding 200 defectives in a random sample
of 1000. Although the sample percentage (i.e., the statistic calculated
from the data) is the same in both cases, the 20% defect rate in the larger
sample seems much more credible than the 20% defect rate in the smaller
sample!
à Random sampling provides credible information!
4.4 Sampling Distributions
àIntroduction

• Figure shows Statistical inference:


(a) descriptive statistics;
(b) inferential statistics

Ø Without random sampling, statistics


can only provide descriptive
summaries of the data itself.
Ø With random sampling, our
conclusions can be reliably extended
beyond the data, to the population!!
4.4 Sampling Distributions
àIntroduction

• Statistical inference is based on the interplay between random samples


(used to obtain data and calculate statistics), sampling distributions (which
describe the behavior of such statistics), and probability (which gives
quantitative measures of reliability about what the statistics say)
à The sampling distribution of a statistic is a mass or density function that
characterizes all the possible values that the statistic can assume in repeated
random samples.
4.4 Sampling Distributions
à Definition

à How Are Sampling Distributions Used?


• One way to approximate the sampling distribution of a statistic is to
repeatedly select a large number of random samples of size n from a given
population.
• By calculating the value of the statistic for each sample and forming a
histogram of the results, we get an approximate picture of the sampling
distribution of the statistic.
• In turn, this picture can be used to describe the values of the statistic that are
likely to occur in any random sample of size n.
4.4 Sampling Distributions
à Example

Suppose that we draw 1000 random samples, each of size n = 25, from a
normal population with a mean of 50 and a standard deviation of 2. If we
calculate the mean 𝑥̅ of each sample, then the distribution of all 1000 𝑥̅ values
gives a good approximation to the sampling distribution of 𝑥̅.

A histogram of the results of such an experiment


4.4 Sampling Distributions
à Example

• Notice that the 1000 sample means stack up around the population mean (µ = 50).
• Variation among the sample means is smaller than variation in the population.
• In particular, none of the sample means fall outside the range of 48.5 to 51.5 (i.e.,
none are more than 1.5 units away from µ).
• In fact, it also appears that very few sample means fall outside the interval 49 to 51;
that is, they are generally within 1 unit of µ.

A histogram of the results of such an experiment


4.4 Sampling Distributions
à Example

• From the shape and location of the sampling distribution, we can begin to see
which values of the sample statistic are more likely to occur than others.

• In this sense, the information in a sampling distribution provides a template for


evaluating any sample, even future samples, from a population or process.

A histogram of the results of such an experiment


4.4 Sampling Distributions
à General Properties of Sampling Distributions

• Sampling distributions can be created for any statistic: x̄, s, s², x̃.

• Figure: the approximate sampling distributions of the statistics x̄, s, s², and x̃ for the same 1000 samples of size n = 25
4.4 Sampling Distributions
à General Properties of Sampling Distributions

• What about sampling from discrete populations?


• In particular, suppose we want to use samples of size 25 to estimate
the proportion p of defectives being made by a certain process.
• Denoting defective items by a “1” and nondefectives by a “0,” the mass function

p(1) = 0.20, p(0) = 0.80

describes such a process in which the proportion of defective items is 20%.
• By calculating the sample proportion defective p for each of 1000
random samples of size n = 25, an approximate sampling distribution
for the statistic p can be formed
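A matching simulation sketch for this discrete case: 1000 random samples of n = 25 from a 0/1 process with 20% defectives, each reduced to its sample proportion p:

```python
import numpy as np

rng = np.random.default_rng(0)
n, pi = 25, 0.20                   # sample size and process proportion defective

# Each sample is 25 draws of 0/1; the sample proportion is the sample mean
props = rng.binomial(1, pi, size=(1000, n)).mean(axis=1)

print(sorted(set(props))[:5])      # possible values are 0/25, 1/25, 2/25, ...
print(props.mean())                # close to pi = 0.20
print(props.std(ddof=1))           # close to sqrt(pi * (1 - pi) / n) = 0.08
```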
4.4 Sampling Distributions
à General Properties of Sampling Distributions

Note that this distribution has many more possible


values than just the values x = 0 and x = 1 in the
population (each of the values 0/25, 1/25, 2/25, 3/25, .
. . , 25/25 is a possible value of p)

Sampling distribution of p when n = 25


from a process with p = 0.20
4.4 Sampling Distributions
à General Properties of Sampling Distributions
à General conclusions
4.4 Sampling Distributions
à Section 4.4 Exercises
4.5 Describing Sampling Distributions
à Sampling Distribution of x̄

• The sampling distribution of x̄, also called the sampling distribution of the mean, is the probability distribution that describes the behavior of x̄ in repeated random samples from a population or process.
• Like any distribution, the sampling distribution of x̄ has its own unique mean and standard deviation, which we denote by μ_x̄ and σ_x̄, respectively.
4.5 Describing Sampling Distributions
à Sampling Distribution of x̄

• For a random sample of size n from a population with mean μ and standard deviation σ, the sampling distribution of x̄ has

μ_x̄ = μ and σ_x̄ = σ/√n

• These equations hold regardless of the particular form of the population distribution.

• To emphasize the fact that it describes a sampling distribution, not a population, σ_x̄ is also called the standard error of x̄, or the standard error of the mean.

• One of the key features of the standard error of the mean σ_x̄ is that it decreases as the sample size increases. In fact, many statistics have this property.

• This makes intuitive sense, since we expect that more information ought to provide better estimates (i.e., smaller standard errors). As a result, increasing the size of a random sample has the desirable effect of increasing the probability that the estimate x̄ will lie close to the population mean μ.
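A one-line check of how the standard error σ_x̄ = σ/√n shrinks as the sample size grows (σ = 2, as in the earlier normal-population example):

```python
import math

sigma = 2.0
for n in (4, 25, 100, 400):
    # standard error of the mean for each sample size: 1.0, 0.4, 0.2, 0.1
    print(n, sigma / math.sqrt(n))
```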
4.5 Describing Sampling Distributions
à Sampling Distribution of x̄
à Sampling from a Normal Population

• When a population follows a normal distribution, it can be shown that the sampling distribution of x̄ is also normal, for any sample size n.

• The normality of x̄, along with the fact that its mean μ_x̄ and standard error σ_x̄ can be determined from μ and σ, is enough to completely characterize the sampling distribution of x̄ in this case.

• As a result, with the normal distribution, probabilities of events involving x̄ reduce to straightforward calculations.
4.5 Describing Sampling Distributions
à Sampling Distribution of x̄
à Sampling from a Normal Population

The probability that x̄ falls within a fixed distance from μ increases as n increases.

When a population distribution is normal, the sampling distribution of x̄ is also normal, regardless of the size of the sample.
4.5 Describing Sampling Distributions
à The Central Limit Theorem

By using a moderately large sample size n, it can be shown that the sampling distribution of x̄ is approximately normal, regardless of the particular population distribution.
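A quick illustration of the theorem (a simulation added here, with arbitrary parameters): sample means drawn from a strongly skewed exponential population look increasingly normal, i.e. less skewed, as n grows:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
for n in (2, 10, 50):
    # 5000 sample means of size-n samples from a skewed (exponential) population
    means = rng.exponential(scale=1.0, size=(5000, n)).mean(axis=1)
    print(n, round(float(skew(means)), 2))   # skewness shrinks toward 0 as n grows
```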
4.5 Describing Sampling Distributions
à The Central Limit Theorem

The closer the population is to being normal, the more rapidly the sampling distribution of x̄ approaches normality.

The less symmetric a population is, the larger the sample size will have to be to ensure normality of x̄.
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion

Qualitative information can also be included in statistical studies. To do this, we first


numerically code such information using the following simple device: The number “1”
is assigned to population members having a specified characteristic and “0” is assigned
to those that do not. The population that results from this 0–1 coding scheme is
pictured in Figure.

The parameter of interest in this situation is π, the proportion of the population that has the characteristic of interest. Notice that π is also the height of the bar associated with the value of 1 in the figure.

The distribution of coded values of a qualitative


characteristic: “1” denotes that the specified
characteristic is present; “0” indicates that it is not
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion

Every random sample drawn from such a population will consist entirely of 0s and
1s. Suppose, for instance, that a particular sample of size 10 contains the observations
{0, 0, 1, 1, 0, 1, 0, 0, 1, 0}. Then the sample mean is (0+0 +1+1+0+1+0 +0+1+0)/10 =
.40. That is, the sample mean is simply the proportion of 1s in the sample.
à We use the notation p to denote the proportion of successes, also called the sample
proportion, in a random sample of size n.

Since p is actually a sample mean, we can use the earlier results in this section to determine its sampling distribution. For example, the mean and standard error of the sampling distribution of p are given by

μ_p = π and σ_p = √(π(1 − π)/n)
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion

à As a general rule, the accuracy of the normal approximation is best


when both n𝜋 ≥ 5 and n(1 − 𝜋) ≥ 5 .
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion

Example: A p chart is often used to monitor the proportion of nonconforming products in a manufacturing process.
à A value of π is selected as being representative of the long-run behavior of the process.
à Suppose, for example, that a certain process consistently generates an average of about 5% nonconforming products and that samples of size 100 are taken each day to test whether the 5% nonconformance rate has changed. On one particular day, 12 nonconforming products appear in the sample. How do we interpret this information?
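One way to interpret the 12 nonconforming items is to ask how unlikely a sample proportion of 0.12 would be if the process rate were still π = 0.05. A sketch using the normal approximation (and, for comparison, the exact binomial tail):

```python
from math import sqrt
from scipy.stats import norm, binom

n, pi = 100, 0.05
p_hat = 12 / n                          # observed sample proportion

se = sqrt(pi * (1 - pi) / n)            # standard error of p under pi = 0.05
z = (p_hat - pi) / se                   # about 3.2 standard errors above pi
print(z, 1 - norm.cdf(z))               # upper-tail probability ~ 0.0007

print(binom.sf(11, n, pi))              # exact P(X >= 12); also very small
```

Both calculations suggest that 12 nonconforming items in a sample of 100 would be very unusual if the rate were still 5%, so the process has likely changed.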
4.5 Describing Sampling Distributions
à Sampling Distribution of the Sample Proportion
4.5 Describing Sampling Distributions
à Section 4.5 Exercises
