distribution research

COLLEGE OF SCIENCE

Research Paper

In Statistics

Discrete probability distribution

Discrete probability distributions describe a finite set of possible occurrences, i.e., discrete count data. For example, the number of successful treatments out of 2 patients is discrete, because the random variable representing the number of successes can be only 0, 1, or 2. The probabilities of all possible occurrences, Pr(0 successes), Pr(1 success), and Pr(2 successes), constitute the probability distribution for this discrete random variable.

Continuous probability distribution

Continuous probability distributions describe an unbroken continuum of possible occurrences. For example, a birthweight can be anything from, say, half a pound to more than 12 pounds. Thus, the random variable of birthweight is continuous, with an infinite number of possible points between any two values. (Think in terms of Zeno's Paradox.)

Define each and give its properties and formulas

TYPES OF DISCRETE DISTRIBUTION

Binomial Distribution

A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution. (The special case with n = 1 is known as a Bernoulli distribution.)

Suppose we flip a coin two times and count the number of heads (successes). The binomial random

variable is the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is

presented below.

Number of heads    Probability
       0              0.25
       1              0.50
       2              0.25

Binomial Probability

The binomial probability refers to the probability that a binomial experiment results in exactly x

successes. For example, in the above table, we see that the binomial probability of getting

exactly one head in two coin flips is 0.50.

Given x, n, and P, we can compute the binomial probability based on the following formula:

Binomial Formula. Suppose a binomial experiment consists of n trials and results in x successes.

If the probability of success on an individual trial is P, then the binomial probability is:

b(x; n, P) = nCx * P^x * (1 - P)^(n - x)

Example 1

Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?

Solution: This is a binomial experiment in which the number of trials is equal to 5, the number of

successes is equal to 2, and the probability of success on a single trial is 1/6 or about 0.167.

Therefore, the binomial probability is:

b(2; 5, 0.167) = 5C2 * (0.167)^2 * (0.833)^3

b(2; 5, 0.167) = 0.161
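As a check, the binomial formula can be evaluated directly. The sketch below uses only Python's standard library; the function name binomial_pmf is my own, not part of the original paper.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials: nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example 1: probability of exactly 2 fours in 5 die tosses (p = 1/6)
print(round(binomial_pmf(2, 5, 1/6), 3))  # 0.161
```

Using the exact probability 1/6 rather than the rounded 0.167 gives 0.161, matching the worked example.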

Cumulative Binomial Probability

A cumulative binomial probability refers to the probability that the binomial random variable

falls within a specified range (e.g., is greater than or equal to a stated lower limit and less than or

equal to a stated upper limit).

For example, we might be interested in the cumulative binomial probability of obtaining 45 or fewer heads in 100 tosses of a coin. This would be the sum of all the individual binomial probabilities:

b(x ≤ 45; 100, 0.5) =

b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + ... + b(x = 44; 100, 0.5) + b(x = 45; 100, 0.5)
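The sum above can be computed term by term. This is a minimal sketch with a helper name (binomial_pmf) of my own choosing:

```python
from math import comb

def binomial_pmf(x, n, p):
    """nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Cumulative probability of 45 or fewer heads in 100 fair-coin tosses
cum = sum(binomial_pmf(x, 100, 0.5) for x in range(46))
print(round(cum, 3))  # about 0.184
```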

Hypergeometric Distribution

A hypergeometric random variable is the number of successes that result from a hypergeometric experiment. The probability distribution of a hypergeometric random variable is called a hypergeometric distribution.

Given x, N, n, and k, we can compute the hypergeometric probability based on the following

formula:

Hypergeometric Formula. Suppose a population consists of N items, k of which are successes.

And a random sample drawn from that population consists of n items, x of which are successes.

Then the hypergeometric probability is:

h(x; N, n, k) = [ kCx ] * [ (N-k)C(n-x) ] / [ NCn ]

The hypergeometric distribution has the following properties:

The mean is n * k / N.

The variance is n * k * ( N - k ) * ( N - n ) / [ N^2 * ( N - 1 ) ].

Example 1

Suppose we randomly select 5 cards without replacement from an ordinary deck of playing cards.

What is the probability of getting exactly 2 red cards (i.e., hearts or diamonds)?

h(x; N, n, k) = [ kCx ] * [ (N-k)C(n-x) ] / [ NCn ]

h(2; 52, 5, 26) = [ 26C2 ] * [ 26C3 ] / [ 52C5 ]

h(2; 52, 5, 26) = [ 325 ] * [ 2600 ] / [ 2,598,960 ] = 0.32513

Thus, the probability of randomly selecting 2 red cards is 0.32513.
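The same computation in code, as a sketch using Python's math.comb (the function name hypergeom_pmf is my own):

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """kCx * (N-k)C(n-x) / NCn: x successes in n draws without replacement."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Example 1: exactly 2 red cards in 5 draws from a 52-card deck (26 red)
print(round(hypergeom_pmf(2, 52, 5, 26), 5))  # 0.32513
```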

Multinomial Distribution

A multinomial distribution is the probability distribution of the outcomes from a multinomial

experiment. The multinomial formula defines the probability of any outcome from a multinomial

experiment.

Multinomial Formula. Suppose a multinomial experiment consists of n trials, and each trial can

result in any of k possible outcomes: E1, E2, . . . , Ek. Suppose, further, that each possible outcome

can occur with probabilities p1, p2, . . . , pk. Then, the probability (P) that E1 occurs n1 times, E2

occurs n2 times, . . . , and Ek occurs nk times is

P = [ n! / ( n1! * n2! * ... * nk! ) ] * ( p1^n1 * p2^n2 * ... * pk^nk )

where n = n1 + n2 + . . . + nk.

The example below illustrates how to use the multinomial formula to compute the probability of an outcome from a multinomial experiment.

Example 1

Suppose a card is drawn randomly from an ordinary deck of playing cards, and then put back in

the deck. This exercise is repeated five times. What is the probability of drawing 1 spade, 1

heart, 1 diamond, and 2 clubs?

Solution: To solve this problem, we apply the multinomial formula. We know the following:

1. The experiment consists of 5 trials, so n = 5; the outcomes of interest occur n1 = 1, n2 = 1, n3 = 1, and n4 = 2 times.

2. On any particular trial, the probability of drawing a spade, heart, diamond, or club is 0.25, 0.25, 0.25, and 0.25, respectively. Thus, p1 = 0.25, p2 = 0.25, p3 = 0.25, and p4 = 0.25.

Plugging these values into the multinomial formula:

P = [ n! / ( n1! * n2! * ... * nk! ) ] * ( p1^n1 * p2^n2 * ... * pk^nk )

P = [ 5! / ( 1! * 1! * 1! * 2! ) ] * [ (0.25)^1 * (0.25)^1 * (0.25)^1 * (0.25)^2 ]

P = 0.05859

Thus, if we draw five cards with replacement from an ordinary deck of playing cards, the

probability of drawing 1 spade, 1 heart, 1 diamond, and 2 clubs is 0.05859.
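The multinomial formula translates directly into code. A minimal sketch (the helper name multinomial_pmf is my own):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """[ n! / (n1! * ... * nk!) ] * (p1^n1 * ... * pk^nk)."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)   # multinomial coefficient
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p**c
    return coef * prob

# Example 1: 1 spade, 1 heart, 1 diamond, 2 clubs in 5 draws with replacement
print(round(multinomial_pmf([1, 1, 1, 2], [0.25] * 4), 5))  # 0.05859
```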

Poisson Distribution

A Poisson random variable is the number of successes that result from a Poisson experiment.

The probability distribution of a Poisson random variable is called a Poisson distribution.

Given the mean number of successes (μ) that occur in a specified region, we can compute the Poisson probability based on the following formula:

Poisson Formula. Suppose we conduct a Poisson experiment, in which the average number of successes within a given region is μ. Then, the Poisson probability is:

P(x; μ) = (e^-μ) (μ^x) / x!

where x is the actual number of successes that result from the experiment, and e is approximately equal to 2.71828.

The Poisson distribution has the following properties:

The mean of the distribution is equal to μ.

The variance is also equal to μ.

Example 1

The average number of homes sold by the Acme Realty company is 2 homes per day. What is the

probability that exactly 3 homes will be sold tomorrow?

Solution: We know that μ = 2, since an average of 2 homes are sold per day, and x = 3, since we want to find the likelihood that 3 homes will be sold tomorrow. Plugging these values into the Poisson formula:

P(x; μ) = (e^-μ) (μ^x) / x!

P(3; 2) = (2.71828^-2) (2^3) / 3!

P(3; 2) = (0.13534) (8) / 6

P(3; 2) = 0.180

Thus, the probability of selling 3 homes tomorrow is 0.180.
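The Poisson formula above can be checked with a few lines of standard-library Python (the function name poisson_pmf is my own):

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """(e^-mu) * (mu^x) / x!"""
    return exp(-mu) * mu**x / factorial(x)

# Example 1: exactly 3 homes sold when the daily average is 2
print(round(poisson_pmf(3, 2), 3))  # 0.18
```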

Define each and give its properties and formulas

TYPES OF CONTINUOUS DISTRIBUTION

The Normal Equation

The normal distribution is defined by the following equation:

Normal equation. The value of the random variable Y is:

Y = { 1 / [ σ * sqrt(2π) ] } * e^[ -(x - μ)^2 / (2σ^2) ]

where X is a normal random variable, μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately 2.71828.

The random variable X in the normal equation is called the normal random variable. The normal

equation is the probability density function for the normal distribution.

The Normal Curve

The graph of the normal distribution depends on two factors - the mean and the standard

deviation. The mean of the distribution determines the location of the center of the graph, and

the standard deviation determines the height and width of the graph. When the standard

deviation is large, the curve is short and wide; when the standard deviation is small, the curve is

tall and narrow. All normal distributions look like a symmetric, bell-shaped curve. Of two normal curves, the one with the bigger standard deviation is shorter and wider than the one with the smaller standard deviation.

Probability and the Normal Curve

The normal distribution is a continuous probability distribution. This has several implications for

probability.

The probability that a normal random variable X equals any particular value is 0.

The probability that X is greater than a equals the area under the normal curve between a and plus infinity.

The probability that X is less than a equals the area under the normal curve between minus infinity and a.

Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the

following "rule".

About 68% of the area under the curve falls within 1 standard deviation of the mean.

About 95% of the area under the curve falls within 2 standard deviations of the mean.

About 99.7% of the area under the curve falls within 3 standard deviations of the mean.

Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly, given a

normal distribution, most outcomes will be within 3 standard deviations of the mean.

To find the probability associated with a normal random variable, use a graphing calculator, an online normal distribution calculator, or a normal distribution table. In the examples below, we illustrate the use of Stat Trek's Normal Distribution Calculator, a free online tool.

Example 1

An average light bulb manufactured by the Acme Corporation lasts 300 days with a standard

deviation of 50 days. Assuming that bulb life is normally distributed, what is the probability that

an Acme light bulb will last at most 365 days?

Solution: Given a mean of 300 days and a standard deviation of 50 days, we want to find the cumulative probability that bulb life is less than or equal to 365 days. Thus, we know the following:

The value of the normal random variable is 365 days.
The mean is equal to 300 days.
The standard deviation is equal to 50 days.

We enter these values into the Normal Distribution Calculator and compute the cumulative probability. The answer is: P( X < 365 ) = 0.90. Hence, there is a 90% chance that a light bulb will burn out within 365 days.
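Instead of the calculator, the cumulative probability can be obtained from Python's standard library, which provides a normal distribution via statistics.NormalDist. A minimal sketch:

```python
from statistics import NormalDist

# P(X <= 365) for bulb life ~ Normal(mean=300, sd=50)
p = NormalDist(mu=300, sigma=50).cdf(365)
print(round(p, 2))  # 0.9
```

The z-score here is (365 - 300) / 50 = 1.3, and the cumulative probability at z = 1.3 is about 0.90.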

Student's T Distribution

According to the central limit theorem, the sampling distribution of a statistic will follow a normal

distribution, as long as the sample size is sufficiently large. Therefore, when we know the

standard deviation of the population, we can compute a z-score, and use the normal distribution to

evaluate probabilities with the sample mean.

But sample sizes are sometimes small, and often we do not know the standard deviation of the

population. When either of these problems occurs, statisticians rely on the distribution of the t

statistic (also known as the t score), whose values are given by:

t = [ x - μ ] / [ s / sqrt( n ) ]

where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is the sample size. The distribution of the t statistic is called the t distribution or the Student t distribution.

Degrees of Freedom

There are actually many different t distributions. The particular form of the t distribution is

determined by its degrees of freedom. The degrees of freedom refers to the number of

independent observations in a set of data.

When estimating a mean score or a proportion from a single sample, the number of independent

observations is equal to the sample size minus one. Hence, the distribution of the t statistic from

samples of size 8 would be described by a t distribution having 8 - 1 or 7 degrees of freedom.

Similarly, a t distribution having 15 degrees of freedom would be used with a sample of size 16.

For other applications, the degrees of freedom may be calculated differently. We will describe

those computations as they come up.

Properties of the t Distribution

The t distribution has the following properties:

The mean of the distribution is equal to 0 (for degrees of freedom v > 1).

The variance is equal to v / ( v - 2 ), where v is the degrees of freedom (see last section) and v > 2.

The variance is always greater than 1, although it is close to 1 when there are many degrees of freedom. With infinite degrees of freedom, the t distribution is the same as the standard normal distribution.

When a sample of size n is drawn from a population having a normal (or nearly normal)

distribution, the sample mean can be transformed into a t score, using the equation presented at

the beginning of this lesson. We repeat that equation below:

t = [ x - μ ] / [ s / sqrt( n ) ]

where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, n is the sample size, and degrees of freedom are equal to n - 1.

The t score produced by this transformation can be associated with a unique cumulative

probability. This cumulative probability represents the likelihood of finding a sample mean less

than or equal to x, given a random sample of size n.

The easiest way to find the probability associated with a particular t score is to use the T

Distribution Calculator, a free tool provided by Stat Trek.

Notation and t Scores

Statisticians use tα to represent the t-score that has a cumulative probability of (1 - α). For example, suppose we were interested in the t-score having a cumulative probability of 0.95. In this example, α would be equal to (1 - 0.95) or 0.05. We would refer to the t-score as t0.05.

Of course, the value of t0.05 depends on the number of degrees of freedom. For example, with 2 degrees of freedom, t0.05 is equal to 2.920; but with 20 degrees of freedom, t0.05 is equal to 1.725.

Note: Because the t distribution is symmetric about a mean of zero, the following is true:

tα = -t(1 - α)

and

t(1 - α) = -tα

Problem 1

Acme Corporation manufactures light bulbs. The CEO claims that an average Acme light bulb lasts

300 days. A researcher randomly selects 15 bulbs for testing. The sampled bulbs last an average

of 290 days, with a standard deviation of 50 days. If the CEO's claim were true, what is the

probability that 15 randomly selected bulbs would have an average life of no more than 290 days?

Solution

The first thing we need to do is compute the t score, based on the following equation:

t = [ x - μ ] / [ s / sqrt( n ) ]

t = ( 290 - 300 ) / [ 50 / sqrt( 15 ) ] = -10 / 12.909945 = -0.7745966

where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is the sample size.

Now, we are ready to use the T Distribution Calculator. Since we know the t score, we select "T score" from the Random Variable dropdown box. Then, we enter the following data:

The degrees of freedom are equal to 15 - 1 = 14.
The t score is equal to -0.7745966.

The calculator displays the cumulative probability: 0.226. Hence, if the true bulb life were 300 days, there is a 22.6% chance that the average bulb life for 15 randomly selected bulbs would be less than or equal to 290 days.
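Without the calculator, the cumulative probability can be checked numerically. The sketch below (the names t_pdf and t_cdf are my own, and Simpson's rule stands in for a statistics package) integrates the Student t density with 14 degrees of freedom:

```python
import math

def t_pdf(x, v):
    # Student t density with v degrees of freedom
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1.0 + x * x / v) ** (-(v + 1) / 2)

def t_cdf(t, v, lo=-40.0, steps=20000):
    # P(T <= t): Simpson's rule on [lo, t]; lo is far enough into the tail
    h = (t - lo) / steps
    s = t_pdf(lo, v) + t_pdf(t, v)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(lo + i * h, v)
    return s * h / 3

t = (290 - 300) / (50 / math.sqrt(15))   # -0.7746, with 14 degrees of freedom
print(round(t_cdf(t, 14), 3))  # about 0.226
```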

Chi-Square Distribution

Suppose we conduct the following statistical experiment. We select a random sample of size n from a normal population, having a standard deviation equal to σ. We find that the standard deviation in our sample is equal to s. Given these data, we can define a statistic, called chi-square, using the following equation:

χ^2 = [ ( n - 1 ) * s^2 ] / σ^2

If we repeated this experiment an infinite number of times, we could obtain a sampling

distribution for the chi-square statistic. The chi-square distribution is defined by the following

probability density function:

Y = Y0 * ( χ^2 )^( v/2 - 1 ) * e^( -χ^2 / 2 )

where Y0 is a constant that depends on the number of degrees of freedom, χ^2 is the chi-square statistic, v = n - 1 is the number of degrees of freedom, and e is a constant equal to the base of the natural logarithm system (approximately 2.71828). Y0 is defined so that the area under the chi-square curve is equal to one.

For example, the distribution of chi-square values computed from all possible samples of size 3 has n - 1 = 3 - 1 = 2 degrees of freedom; the distribution for samples of size 5 has 4 degrees of freedom; and the distribution for samples of size 11 has 10 degrees of freedom. The chi-square distribution has the following properties:

The mean of the distribution is equal to the number of degrees of freedom: μ = v.

The variance is equal to two times the number of degrees of freedom: σ^2 = 2 * v.

When the degrees of freedom are greater than or equal to 2, the maximum value for Y occurs when χ^2 = v - 2.

As the degrees of freedom increase, the chi-square curve approaches the normal distribution.

The chi-square distribution is constructed so that the total area under the curve is equal to 1. The area under the curve between 0 and a particular chi-square value is a cumulative probability associated with that chi-square value. For example, the cumulative probability associated with a chi-square statistic equal to A is the probability that the value of a chi-square statistic will fall between 0 and A.

Problem 1

The Acme Battery Company has developed a new cell phone battery. On average, the battery lasts

60 minutes on a single charge. The standard deviation is 4 minutes.

Suppose the manufacturing department runs a quality control test. They randomly select 7 batteries. The standard deviation of the selected batteries is 6 minutes. What would be the chi-square statistic represented by this test?

Solution

We know the following:

The standard deviation of the population is 4 minutes, so σ = 4.
The standard deviation of the sample is 6 minutes, so s = 6.
The sample size is 7, so n = 7.

To compute the chi-square statistic, we plug these data into the chi-square equation, as shown below.

χ^2 = [ ( n - 1 ) * s^2 ] / σ^2

χ^2 = [ ( 7 - 1 ) * 6^2 ] / 4^2 = 13.5

where χ^2 is the chi-square statistic, n is the sample size, s is the standard deviation of the sample, and σ is the standard deviation of the population.
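The arithmetic is a one-liner; a minimal sketch of the battery-test computation:

```python
# Chi-square statistic for the battery test: (n - 1) * s^2 / sigma^2
n, s, sigma = 7, 6.0, 4.0
chi_square = (n - 1) * s**2 / sigma**2
print(chi_square)  # 13.5
```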

F Distribution

The f statistic, also known as an f value, is a random variable that has an F distribution. (We

discuss the F distribution in the next section.)

Here are the steps required to compute an f statistic:

Select a random sample of size n1 from a normal population, having a standard deviation equal to σ1.

Select an independent random sample of size n2 from a normal population, having a standard deviation equal to σ2.

The f statistic is the ratio of s1^2/σ1^2 and s2^2/σ2^2. The following equivalent equations are commonly used to compute an f statistic:

f = [ s1^2 / σ1^2 ] / [ s2^2 / σ2^2 ]

f = [ s1^2 * σ2^2 ] / [ s2^2 * σ1^2 ]

f = [ χ1^2 / v1 ] / [ χ2^2 / v2 ]

f = [ χ1^2 * v2 ] / [ χ2^2 * v1 ]

where σ1 is the standard deviation of population 1, s1 is the standard deviation of the sample drawn from population 1, σ2 is the standard deviation of population 2, s2 is the standard deviation of the sample drawn from population 2, χ1^2 is the chi-square statistic for the sample drawn from population 1, v1 is the degrees of freedom for χ1^2, χ2^2 is the chi-square statistic for the sample drawn from population 2, and v2 is the degrees of freedom for χ2^2. Note that degrees of freedom v1 = n1 - 1, and degrees of freedom v2 = n2 - 1.

The F Distribution

The distribution of all possible values of the f statistic is called an F distribution, with v1 = n1 - 1

and v2 = n2 - 1 degrees of freedom.

The curve of the F distribution depends on the degrees of freedom, v1 and v2. When describing an

F distribution, the number of degrees of freedom associated with the standard deviation in the

numerator of the f statistic is always stated first. Thus, f(5, 9) would refer to an F distribution

with v1 = 5 and v2 = 9 degrees of freedom; whereas f(9, 5) would refer to an F distribution with v1

= 9 and v2 = 5 degrees of freedom. Note that the curve represented by f(5, 9) would differ from

the curve represented by f(9, 5).

The F distribution has the following properties:

The mean of the distribution is equal to v2 / ( v2 - 2 ), for v2 > 2.

The variance is equal to [ 2 * v2^2 * ( v1 + v2 - 2 ) ] / [ v1 * ( v2 - 2 )^2 * ( v2 - 4 ) ], for v2 > 4.

Every f statistic can be associated with a unique cumulative probability. This cumulative

probability represents the likelihood that the f statistic is less than or equal to a specified value.

Statisticians use fα to represent the value of an f statistic having a cumulative probability of (1 - α). For example, suppose we were interested in the f statistic having a cumulative probability of 0.95. We would refer to that f statistic as f0.05, since (1 - 0.95) = 0.05.

Of course, to find the value of fα, we would need to know the degrees of freedom, v1 and v2. Notationally, the degrees of freedom appear in parentheses as follows: fα(v1,v2). Thus, f0.05(5, 7) refers to the value of the f statistic having a cumulative probability of 0.95, v1 = 5 degrees of freedom, and v2 = 7 degrees of freedom.

The easiest way to find the value of a particular f statistic is to use the F Distribution Calculator,

a free tool provided by Stat Trek. For example, the value of f0.05(5, 7) is 3.97. The use of the F

Distribution Calculator is illustrated in the examples below.

Example 1

Suppose you randomly select 7 women from a population of women, and 12 men from a population of men. The table below shows the standard deviation in each sample and in each population. Compute the f statistic.

Group    Population standard deviation    Sample standard deviation
Women                 30                            35
Men                   50                            45

Solution A: The f statistic can be computed from the population and sample standard deviations,

using the following equation:

f = [ s1^2 / σ1^2 ] / [ s2^2 / σ2^2 ]

where σ1 is the standard deviation of population 1, s1 is the standard deviation of the sample drawn from population 1, σ2 is the standard deviation of population 2, and s2 is the standard deviation of the sample drawn from population 2.

As you can see from the equation, there are actually two ways to compute an f statistic from

these data. If the women's data appears in the numerator, we can calculate an f statistic as

follows:

f = ( 35^2 / 30^2 ) / ( 45^2 / 50^2 ) = (1225 / 900) / (2025 / 2500) = 1.361 / 0.81 = 1.68

For this calculation, the numerator degrees of freedom v1 are 7 - 1 or 6; and the denominator

degrees of freedom v2 are 12 - 1 or 11.

On the other hand, if the men's data appears in the numerator, we can calculate an f statistic as

follows:

f = ( 45^2 / 50^2 ) / ( 35^2 / 30^2 ) = (2025 / 2500) / (1225 / 900) = 0.81 / 1.361 = 0.595

For this calculation, the numerator degrees of freedom v1 are 12 - 1 or 11; and the denominator

degrees of freedom v2 are 7 - 1 or 6.
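Both arrangements of the f statistic can be computed with a small helper (the name f_statistic is my own):

```python
def f_statistic(s1, sigma1, s2, sigma2):
    """f = (s1^2 / sigma1^2) / (s2^2 / sigma2^2)."""
    return (s1**2 / sigma1**2) / (s2**2 / sigma2**2)

# Women's data (sample sd 35, population sd 30) in the numerator:
print(round(f_statistic(35, 30, 45, 50), 2))  # 1.68
# Men's data (sample sd 45, population sd 50) in the numerator:
print(round(f_statistic(45, 50, 35, 30), 3))  # 0.595
```

Note that the two results are reciprocals of one another, which is why the order of the degrees of freedom matters.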

When you are trying to find the cumulative probability associated with an f statistic, you need to

know v1 and v2. This point is illustrated in the next example.

What is a normal distribution?

What are the applications of the normal distribution?

Normal Distribution

The general formula for the probability density function of the normal distribution is

f(x) = e^[ -(x - μ)^2 / (2σ^2) ] / [ σ * sqrt(2π) ]

where μ is the location parameter and σ is the scale parameter. The case where μ = 0 and σ = 1 is called the standard normal distribution. The equation for the standard normal distribution is

f(x) = e^( -x^2 / 2 ) / sqrt(2π)

Since the general form of the probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.

The formula for the cumulative distribution function of the normal distribution does not exist in a simple closed form. It is computed numerically.

The formula for the percent point function of the normal distribution does not exist in a simple closed form. It is computed numerically.

Hazard Function

The formula for the hazard function of the normal distribution is

h(x) = φ(x) / ( 1 - Φ(x) )

where Φ is the cumulative distribution function of the standard normal distribution and φ is the probability density function of the standard normal distribution.

The normal cumulative hazard function can be computed from the normal cumulative distribution function.

Survival Function

The normal survival function can be computed from the normal cumulative distribution function. The normal inverse survival function can be computed from the normal percent point function.

Common Statistics

The mean, median, and mode are all equal to the location parameter μ, and the standard deviation is equal to the scale parameter σ.

Parameter Estimation

The location and scale parameters of the normal distribution can be estimated with the sample mean and sample standard deviation, respectively.

Comments

For both theoretical and practical reasons, the normal distribution is probably the most important distribution in statistics. For example,

Many classical statistical tests are based on the assumption that the data follow a normal

distribution. This assumption should be tested before applying these tests.

In modeling applications, such as linear and non-linear regression, the error term is often

assumed to follow a normal distribution with fixed location and scale.

The normal distribution is used to find significance levels in many hypothesis tests and

confidence intervals.

Theoretical Justification - Central Limit Theorem

The normal distribution is widely used. Part of the appeal is that it is well behaved and mathematically tractable. However, the central limit theorem provides a theoretical basis for why it has wide applicability.

The central limit theorem basically states that as the sample size (N) becomes large, the following occur:

1. The sampling distribution of the mean becomes approximately normal regardless of the distribution of the original variable.

2. The sampling distribution of the mean is centered at the population mean, μ, of the original variable. In addition, the standard deviation of the sampling distribution of the mean approaches σ / sqrt(N).

Example

Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard

deviation of 10, what is the probability that a person who takes the test will score between 90 and 110?

Solution: Here, we want to know the probability that the test score falls between 90 and 110. The "trick"

to solving this problem is to realize the following:

P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )

We use the Normal Distribution Calculator to compute both probabilities on the right side of the above

equation.

To compute P( X < 110 ), we enter the following inputs into the calculator: The value of the

normal random variable is 110, the mean is 100, and the standard deviation is 10. We find that P(

X < 110 ) is 0.84.

To compute P( X < 90 ), we enter the following inputs into the calculator: The value of the

normal random variable is 90, the mean is 100, and the standard deviation is 10. We find that

P( X < 90 ) is 0.16.

P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )

P( 90 < X < 110 ) = 0.84 - 0.16

P( 90 < X < 110 ) = 0.68

Thus, about 68% of the test scores will fall between 90 and 110.
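The same "trick", P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 ), can be checked with the standard library. A minimal sketch:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=10)
p = iq.cdf(110) - iq.cdf(90)   # P(90 < X < 110)
print(round(p, 2))  # 0.68
```

Since 90 and 110 lie exactly one standard deviation from the mean, this also confirms the 68% figure from the empirical rule.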

What is a sampling distribution? Give its properties and give an example.

Sampling Distributions

Suppose that we draw all possible samples of size n from a given population. Suppose

further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample.

The probability distribution of this statistic is called a sampling distribution.

Variability of a Sampling Distribution

The variability of a sampling distribution is measured by its variance or its standard deviation. The variability of a sampling distribution depends on three factors:

N: the number of observations in the population.
n: the number of observations in the sample.
The way that the random sample is chosen.

If the population size is much larger than the sample size, then the sampling distribution has roughly the same sampling error, whether we sample with or without replacement. On the other hand, if the sample represents a significant fraction (say, 1/10) of the population size, the sampling error will be noticeably smaller when we sample without replacement.

Central Limit Theorem

The central limit theorem states that the sampling distribution of any statistic will be normal or

nearly normal, if the sample size is large enough.

How large is "large enough"? As a rough rule of thumb, many statisticians say that a sample size of 30 is large enough. If you know something about the shape of the sample distribution, you can refine that rule. The sample size is large enough if any of the following conditions apply:

The sample distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.

The sample distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.

The sample size is greater than 40, and the sample distribution is without outliers.

The exact shape of any normal curve is totally determined by its mean and standard deviation.

Therefore, if we know the mean and standard deviation of a statistic, we can find the mean and

standard deviation of the sampling distribution of the statistic (assuming that the statistic came

from a "large" sample).

Suppose we draw all possible samples of size n from a population of size N. Suppose further that

we compute a mean score for each sample. In this way, we create a sampling distribution of the

mean.

We know the following. The mean of the population (μ) is equal to the mean of the sampling distribution (μx). And the standard error of the sampling distribution (σx) is determined by the standard deviation of the population (σ), the population size (N), and the sample size (n). These relationships are shown in the equations below:

μx = μ

and

σx = σ * sqrt( 1/n - 1/N )

Therefore, we can specify the sampling distribution of the mean whenever two conditions are met:

The population is normally distributed, or the sample size is sufficiently large.
The population standard deviation σ is known.

Note: When the population size is very large, the factor 1/N is approximately equal to zero, and the standard deviation formula reduces to: σx = σ / sqrt(n). You often see this formula in introductory statistics texts.

Sampling Distribution of the Proportion

In a population of size N, suppose that the probability of the occurrence of an event (dubbed a "success") is P; and the probability of the event's non-occurrence (dubbed a "failure") is Q. From this population, suppose that we draw all possible samples of size n. And finally, within each sample, suppose that we determine the proportion of successes p and failures q. In this way, we create a sampling distribution of the proportion.

We find that the mean of the sampling distribution of the proportion (μp) is equal to the probability of success in the population (P). And the standard error of the sampling distribution (σp) is determined by the standard deviation of the population (σ), the population size, and the sample size. These relationships are shown in the equations below:

μp = P

and

σp = σ * sqrt( 1/n - 1/N )

where σ = sqrt[ PQ ].

Note: When the population size is very large, the factor PQ/N is approximately equal to zero, and the standard deviation formula reduces to: σp = sqrt( PQ/n ). You often see this formula in introductory statistics texts.

Example

Assume that a school district has 10,000 6th graders. In this district, the average weight of a

6th grader is 80 pounds, with a standard deviation of 20 pounds. Suppose you draw a random

sample of 50 students. What is the probability that the average weight of a sampled student will

be less than 75 pounds?

Solution: To solve this problem, we need to define the sampling distribution of the mean. Because

our sample size is greater than 40, the Central Limit Theorem tells us that the sampling

distribution will be normally distributed.

To define our normal distribution, we need to know both the mean of the sampling distribution

and the standard deviation. Finding the mean of the sampling distribution is easy, since it is equal

to the mean of the population. Thus, the mean of the sampling distribution is equal to 80.

The standard deviation of the sampling distribution can be computed using the following formula.

σx = 20 * sqrt( 1/50 - 1/10000 ) = 20 * sqrt( 0.0199 ) = 20 * 0.141 = 2.82

Let's review what we know and what we want to know. We know that the sampling distribution of

the mean is normally distributed with a mean of 80 and a standard deviation of 2.82. We want to

know the probability that a sample mean is less than or equal to 75 pounds. To solve the problem,

we plug these inputs into the Normal Probability Calculator: mean = 80, standard deviation = 2.82,

and value = 75. The Calculator tells us that the probability that the average weight of a sampled

student is less than 75 pounds is equal to 0.038.
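The same answer can be reproduced without the online calculator, using only the standard normal CDF (expressed here via Python's built-in error function). The variable names are illustrative; the numbers come from the example above.

```python
import math

# Population parameters and sample size from the example
mu, sigma, N, n = 80, 20, 10000, 50

# Standard error of the mean, with the finite-population correction
se = sigma * math.sqrt(1 / n - 1 / N)   # ~2.82

# P(sample mean < 75): standard normal CDF evaluated at the z-score
z = (75 - mu) / se
p = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(se, 2), round(p, 3))   # 2.82 0.038
```

The z-score works out to about -1.77, and the area to its left under the standard normal curve is the 0.038 reported above.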
