Lecture 8

Random Sampling

• A population consists of the totality of the observations with which we are concerned.
• Let $X_1, X_2, \ldots, X_n$ be n independent random variables, each having the same probability distribution f(x). We then define $X_1, X_2, \ldots, X_n$ to be a random sample of size n from the population f(x) and write its joint probability distribution as
• $f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$.
• Sampling Theory
• A statistic is a random variable that depends only on the observed
random sample.
• The most common statistics for measuring the center of a set of data, arranged in order of magnitude, are the mean, the median, and the mode. The most important of these is the mean.
If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the sample mean is defined by the statistic
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Example
Find the mean of the random sample whose
observations are 20, 27, and 25.
Solution
The observed value of the sample mean is $\bar{x} = (20 + 27 + 25)/3 = 24$.
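The second most useful statistic for measuring the center of a set of data is the median.
If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, arranged in increasing order of magnitude, then the sample median is defined by the statistic
$\tilde{X} = X_{(n+1)/2}$ if n is odd, and $\tilde{X} = \frac{1}{2}\left(X_{n/2} + X_{n/2+1}\right)$ if n is even.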

Example
Find the median for the random sample
whose observations are 8, 3, 9, 5, 6, 8, and 5.
Solution
Arranging the observations in order of magnitude, 3, 5, 5, 6, 8, 8, 9, gives the median as 6.
Example
Find the median for the random sample whose observations are 10, 8, 4, and 7.
Solution
Arranging the observations in order of magnitude, 4, 7, 8, 10,
the median is the arithmetic mean of the two middle values.
Therefore the median is (7+8)/2 = 7.5.
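The mean and median examples above can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean, median

# Observations taken from the three worked examples above.
mean_sample = [20, 27, 25]
odd_sample = [8, 3, 9, 5, 6, 8, 5]      # n = 7 (odd)
even_sample = [10, 8, 4, 7]             # n = 4 (even)

sample_mean = mean(mean_sample)         # sum of the observations divided by n
median_odd = median(odd_sample)         # middle value of the sorted sample
median_even = median(even_sample)       # average of the two middle values

print(sample_mean, median_odd, median_even)
```

Note that `median` sorts the data internally, so the observations do not need to be arranged in order of magnitude first.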
• The third and final statistic for measuring the center of a random sample is the mode.
• If $X_1, X_2, \ldots, X_n$, not necessarily all different, represent a random sample of size n, then the sample mode M is the value of the sample that occurs most often, that is, with the greatest frequency. The mode may not exist, and when it does it is not necessarily unique.
• Example
• The mode of the random sample whose observations are
2,4,4,5,6,6,6,7,7, and 8 is 6.
• Example
• The observations 3, 4, 4, 4, 4, 7, 7, 8, 8, 8, 8, and 9 have two modes, 4 and 8, since both occur with the greatest frequency. The distribution of the sample is said to be bimodal.
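A possible way to reproduce the two mode examples is sketched below with `statistics.multimode` (Python 3.8+). The helper `sample_modes` is hypothetical, and its rule that no mode exists when every distinct value is equally frequent is an assumption made here to match the remark that the mode may not exist.

```python
from statistics import multimode

def sample_modes(data):
    """Return the list of modes, or [] when no value stands out.

    Assumption for this sketch: if all distinct values are equally
    frequent, the sample is treated as having no mode.
    """
    modes = multimode(data)
    if len(modes) == len(set(data)):
        return []
    return modes

unimodal = [2, 4, 4, 5, 6, 6, 6, 7, 7, 8]
bimodal = [3, 4, 4, 4, 4, 7, 7, 8, 8, 8, 8, 9]
no_mode = [1, 2, 3, 4]                  # hypothetical sample, all values distinct

print(sample_modes(unimodal), sample_modes(bimodal), sample_modes(no_mode))
```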
• The relative merits of the three measures (mean, median, mode) are:
• (1) The mean is the most commonly used measure of central tendency in statistics.
• (2) The only real disadvantage of the mean is that it may be affected adversely by extreme values.
• (3) The median has the advantage of being easy to compute. It is not influenced by extreme values.
• (4) The mode is the least used measure of the three. For small sets of data its value is almost useless, if in fact it exists at all. Its only advantage is that it requires no calculation.
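Points (2) and (3) can be illustrated with a small experiment. In the sketch below, the second data set is hypothetical: it is the earlier median example with its largest observation replaced by an extreme value of 90.

```python
from statistics import mean, median

sample = [3, 5, 5, 6, 8, 8, 9]          # data from the median example
with_outlier = [3, 5, 5, 6, 8, 8, 90]   # hypothetical: 9 replaced by 90

# The mean is pulled toward the extreme value; the median is not.
mean_before, mean_after = mean(sample), mean(with_outlier)
median_before, median_after = median(sample), median(with_outlier)

print(round(mean_before, 2), round(mean_after, 2))
print(median_before, median_after)
```

The mean moves from about 6.3 to about 17.9, while the median stays at 6.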
Variability Measurement

• The most important statistics for measuring the variability of a random sample are the range and the variance. The simplest of these to compute is the range.
• The range of a random sample $X_1, X_2, \ldots, X_n$, arranged in increasing order of magnitude, is defined by the statistic $X_n - X_1$.
• Example
• The range of the set of observations 10, 12, 12, 18, 19, 22, and 24 is 24 - 10 = 14.
• The range is a poor measure of variability, particularly if the size of the sample is large. It considers only the extreme values and tells nothing about the distribution of values in between.
To overcome this disadvantage of the range, the sample variance is used, which considers the position of each observation relative to the sample mean.
If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the sample variance is defined by the statistic
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

If $S^2$ is the variance of a random sample of size n, we may write
$S^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$

Example
Find the variance of the sample whose observations are 3, 4, 5, 6, 6, and 7.
Solution
$\sum_{i=1}^{6} x_i^2 = 171$, $\sum_{i=1}^{6} x_i = 31$, and n = 6. Hence
$s^2 = \frac{(6)(171) - (31)^2}{(6)(5)} = \frac{13}{6}$.
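The range and variance examples can be verified as follows. This is a minimal sketch: the function names are illustrative, and exact fractions are used only so that the result can be compared with 13/6 without rounding.

```python
from fractions import Fraction
from statistics import variance

def sample_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)

def variance_definition(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    xbar = Fraction(sum(data), n)
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def variance_shortcut(data):
    """Computational form: [n * sum(x^2) - (sum x)^2] / [n (n - 1)]."""
    n = len(data)
    return Fraction(n * sum(x * x for x in data) - sum(data) ** 2, n * (n - 1))

range_obs = [10, 12, 12, 18, 19, 22, 24]
var_obs = [3, 4, 5, 6, 6, 7]

print(sample_range(range_obs))
print(variance_definition(var_obs), variance_shortcut(var_obs))
print(variance(var_obs))                # library routine, same n - 1 divisor
```

Both forms give 13/6, and `statistics.variance` returns the same value as a float.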

The probability distribution of a statistic is called a sampling distribution.


The standard deviation of the sampling distribution of a statistic is called the standard error of the statistic.

The probability distribution of $\bar{X}$ is called the sampling distribution of the mean, and the standard error of the mean is the standard deviation of the sampling distribution of $\bar{X}$.
Sampling Distributions of Means

If we are sampling from a population with unknown distribution, either finite or infinite, the sampling distribution of $\bar{X}$ will still be approximately normal with mean μ and variance $\sigma^2/n$, provided that the sample size is large.

If $\bar{X}$ is the mean of a random sample of size n taken from a population with mean μ and variance $\sigma^2$, then the limiting form of the distribution of
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
as n → ∞ is the standard normal distribution n(z; 0, 1).



• The normal approximation for the sample mean $\bar{X}$ will generally be good if n ≥ 30, regardless of the shape of the population. If n < 30, the approximation is good only if the population is not too different from a normal population.
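A small simulation can make the statement about the mean and variance of $\bar{X}$ concrete. The sketch below is only an illustration: the exponential population with μ = 1 and σ² = 1, the sample size n = 30, the number of repetitions, and the seed are all assumptions chosen for the demonstration.

```python
import random

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an exponential population
    (mean 1, variance 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

n, reps = 30, 20000
means = simulate_sample_means(n, reps)

# Empirical mean and variance of the simulated sample means.
mean_of_means = sum(means) / reps
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / reps

print(mean_of_means)        # should be close to mu = 1
print(var_of_means)         # should be close to sigma^2 / n = 1/30
```

Even though the exponential population is strongly skewed, the simulated means cluster around μ with variance close to σ²/n.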
Example

• An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean 800 hours and standard deviation 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.
• Solution
• The sampling distribution of $\bar{X}$ is normal with mean 800 and standard error 40/√16 = 10.
• z = (775 - 800)/10 = -2.5
• $P(\bar{X} < 775) = P(Z < -2.5) = 0.0062$
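The light bulb calculation can be reproduced without tables, for instance with `statistics.NormalDist` from the Python standard library. The variable names below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16
threshold = 775

standard_error = sigma / sqrt(n)                # standard deviation of the sample mean
z = (threshold - mu) / standard_error           # standardized value
p_less = NormalDist().cdf(z)                    # P(Z < z) for the standard normal

print(standard_error, z, round(p_less, 4))
```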
• Example
• Given the discrete uniform population
• f(x) = 1/4 for x = 0, 1, 2, 3, and f(x) = 0 elsewhere,
• find the probability that a random sample of size 36, selected with replacement, will yield a sample mean greater than 1.4 but less than 1.8 if the mean is measured to the nearest tenth.
Solution
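• μ = (0 + 1 + 2 + 3)/4 = 1.5
• $\sigma^2 = \frac{0^2 + 1^2 + 2^2 + 3^2}{4} - (1.5)^2 = 5/4$
• $\sigma_{\bar{X}}^2 = \sigma^2/n = (5/4)/36 = 5/144$
• $\sigma_{\bar{X}} = \sqrt{5/144} = 0.186$
• Because the mean is measured to the nearest tenth, the recorded values between 1.4 and 1.8 are 1.5, 1.6, and 1.7, so the limits used in the normal approximation are $\bar{x}_1 = 1.45$ and $\bar{x}_2 = 1.75$.
• $z_1$ = (1.45 - 1.5)/0.186 = -0.269
• $z_2$ = (1.75 - 1.5)/0.186 = 1.344
• $P(1.4 < \bar{X} < 1.8) \approx P(-0.269 < Z < 1.344)$
• = P(Z < 1.344) - P(Z < -0.269) = 0.9105 - 0.3932 = 0.5173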
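As a cross-check of this solution, the same steps can be carried out numerically. The sketch below assumes the limits 1.45 and 1.75 used in the solution and relies on `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

values = [0, 1, 2, 3]                   # discrete uniform population, f(x) = 1/4
n = 36

mu = sum(values) / len(values)                              # population mean
sigma2 = sum(v * v for v in values) / len(values) - mu ** 2 # population variance
se = sqrt(sigma2 / n)                                       # standard error of the mean

lower, upper = 1.45, 1.75               # limits for means recorded to the nearest tenth
sampling_dist = NormalDist(mu, se)      # normal approximation for the sample mean
prob = sampling_dist.cdf(upper) - sampling_dist.cdf(lower)

print(mu, sigma2, round(se, 3), round(prob, 3))
```

The computed probability agrees with the table-based answer to about two decimal places; small differences in the third decimal come from rounding the standard error to 0.186 and from reading the normal table.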
Sampling Distribution of the Difference of Means
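If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.

Hence
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
is approximately a standard normal variable.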


• Example
• The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?
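Solution
Let sample 1 come from manufacturer A and sample 2 from manufacturer B. Then
$\mu_{\bar{X}_1 - \bar{X}_2} = 6.5 - 6.0 = 0.5$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{0.81}{36} + \frac{0.64}{49}} = 0.189$
z = (1.0 - 0.5)/0.189 = 2.646
$P(\bar{X}_1 - \bar{X}_2 \geq 1.0) = P(Z > 2.646) = 1 - P(Z < 2.646)$
= 1 - 0.9959 = 0.0041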
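The picture tube example can be checked numerically along the same lines. This is a sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_a, sigma_a, n_a = 6.5, 0.9, 36       # manufacturer A
mu_b, sigma_b, n_b = 6.0, 0.8, 49       # manufacturer B

mean_diff = mu_a - mu_b                                   # mean of the difference of sample means
se_diff = sqrt(sigma_a ** 2 / n_a + sigma_b ** 2 / n_b)   # its standard error

z = (1.0 - mean_diff) / se_diff         # standardize a difference of 1 year
p_at_least_one_year = 1 - NormalDist().cdf(z)

print(round(se_diff, 3), round(z, 2), round(p_at_least_one_year, 4))
```

Working with the unrounded standard error gives a z value slightly above 2.646 and a probability that matches the table-based answer to about three decimal places.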
Sampling Distribution of $(n-1)S^2/\sigma^2$

• If $S^2$ is the variance of a random sample of size n taken from a normal population having variance $\sigma^2$, then the random variable
• $\chi^2 = (n-1)S^2/\sigma^2$
• has a chi-square distribution with ν = n - 1 degrees of freedom.
• The probability that a random sample produces a $\chi^2$ value greater than some specified value is equal to the area under the curve to the right of this value.
• Example
• A manufacturer of car batteries guarantees that his batteries will last, on the average, 3 years with a standard deviation of 1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, is the manufacturer still convinced that his batteries have a standard deviation of 1 year?
• Solution
• $s^2 = \frac{(5)(48.26) - (15)^2}{(5)(4)} = 0.815$
• Then $\chi^2 = (4)(0.815)/1 = 3.26$, with ν = 4 degrees of freedom.
• From tables, 95% of the $\chi^2$ values with 4 degrees of freedom fall between 0.484 and 11.143. The computed value lies in this interval, so it is reasonable, and the manufacturer has no reason to doubt that the standard deviation is 1 year.
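The battery example can be checked numerically, and the 95% interval quoted from the tables can be examined by simulation. The sketch below assumes a normal population with mean 3 and standard deviation 1 for the simulated samples; the number of repetitions and the seed are arbitrary choices.

```python
import random
from statistics import variance

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]
sigma_claimed = 1.0
n = len(lifetimes)

s2 = variance(lifetimes)                        # sample variance (divisor n - 1)
chi2 = (n - 1) * s2 / sigma_claimed ** 2        # chi-square statistic, nu = n - 1

def simulate_chi2(n, mu, sigma, reps, seed=0):
    """Simulate (n - 1) S^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2_sim = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        stats.append((n - 1) * s2_sim / sigma ** 2)
    return stats

draws = simulate_chi2(n, mu=3.0, sigma=sigma_claimed, reps=20000)
# Fraction of simulated values inside the tabulated 95% limits for nu = 4.
inside = sum(0.484 <= d <= 11.143 for d in draws) / len(draws)

print(round(s2, 3), round(chi2, 2), round(inside, 3))
```

The simulated fraction should be close to 0.95, and the observed value 3.26 lies well inside the interval.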
t Distribution

• If the sample size is small (n < 30), the values of $S^2$ fluctuate considerably from sample to sample, and the distribution of the random variable $(\bar{X} - \mu)/(S/\sqrt{n})$ is no longer a standard normal distribution. We are now dealing with the distribution of a statistic that we shall call T, where
• $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
• The distribution of T is similar to the distribution of Z in that both are symmetric about a mean of zero. Both distributions are bell-shaped, but the t distribution is more variable, owing to the fact that the T values depend on the fluctuations of two quantities, $\bar{X}$ and $S^2$, whereas the Z values depend only on the changes of $\bar{X}$ from sample to sample. The distribution of T differs from that of Z in that the variance of T depends on the sample size and is always greater than 1.
Example
• A manufacturer of light bulbs claims that his bulbs will burn on the average 500 hours. To maintain this average, he tests 25 bulbs each month. If the computed t value falls between $-t_{0.05}$ and $t_{0.05}$, he is satisfied with his claim. What conclusion should he draw from a sample that has mean $\bar{x}$ = 518 hours and standard deviation s = 40 hours? Assume the distribution of burning times is approximately normal.
• Solution
• From Table V we find $t_{0.05}$ = 1.711 for 24 degrees of freedom. Therefore, the manufacturer is satisfied with his claim if a sample of 25 bulbs yields a t value between -1.711 and 1.711. If μ = 500, then
• t = (518 - 500)/(40/√25) = 2.25,
• a value well above 1.711. The probability of obtaining a t value, with ν = 24, equal to or greater than 2.25 is approximately 0.02. If μ > 500, the value of t computed from the sample would be more reasonable. Hence the manufacturer is likely to conclude that his bulbs are a better product than he thought.
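The t value and the tail probability quoted in the solution can be approximated without tables. The sketch below integrates the t density numerically with Simpson's rule; the function names and the number of integration steps are choices made for this illustration, not part of the lecture.

```python
from math import gamma, pi, sqrt

def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_upper_tail(t0, nu, steps=1000):
    """P(T > t0) for t0 >= 0, using symmetry: 0.5 minus the area on [0, t0].

    The area is computed with Simpson's rule; `steps` must be even.
    """
    h = t0 / steps
    total = t_pdf(0.0, nu) + t_pdf(t0, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

xbar, mu0, s, n = 518, 500, 40, 25
t_value = (xbar - mu0) / (s / sqrt(n))
nu = n - 1

print(t_value)
print(round(t_upper_tail(t_value, nu), 3))      # tail area beyond the observed t
print(round(t_upper_tail(1.711, nu), 3))        # should be close to 0.05
```

The tail area beyond 2.25 comes out a little under 0.02, in line with the approximate value given in the solution.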
F Distribution

• One of the most important distributions in applied statistics is the F distribution. The statistic F is defined to be the ratio of two independent chi-square random variables, each divided by its degrees of freedom. Hence
• $F = \frac{U/\nu_1}{V/\nu_2}$
• where U and V are independent random variables having chi-square distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
• Let us define $f_\alpha$ to be a particular value f of the random variable F above which we find an area equal to α.
• Writing $f_\alpha(\nu_1, \nu_2)$ for $f_\alpha$ with $\nu_1$ and $\nu_2$ degrees of freedom, we obtain
• $f_{1-\alpha}(\nu_1, \nu_2) = 1/f_\alpha(\nu_2, \nu_1)$
• For example, $f_{0.95}(6, 10) = 1/f_{0.05}(10, 6) = 1/4.06 = 0.246$.
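The reciprocal relation can be checked by simulation, building F directly from its definition as a ratio of scaled chi-square variables. In this sketch the chi-square draws are generated with `random.gammavariate` (a chi-square variable with ν degrees of freedom is a gamma variable with shape ν/2 and scale 2); the repetition count, the seeds, and the helper names are assumptions for the illustration.

```python
import random

def simulate_f(nu1, nu2, reps, seed):
    """Return sorted simulated values of F = (U/nu1) / (V/nu2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        u = rng.gammavariate(nu1 / 2, 2.0)      # chi-square with nu1 d.f.
        v = rng.gammavariate(nu2 / 2, 2.0)      # chi-square with nu2 d.f.
        draws.append((u / nu1) / (v / nu2))
    draws.sort()
    return draws

def upper_point(sorted_draws, alpha):
    """Empirical f_alpha: the value with a fraction alpha of draws above it."""
    k = int((1 - alpha) * len(sorted_draws))
    return sorted_draws[min(k, len(sorted_draws) - 1)]

reps = 100000
f_095_6_10 = upper_point(simulate_f(6, 10, reps, seed=1), 0.95)
f_005_10_6 = upper_point(simulate_f(10, 6, reps, seed=2), 0.05)

print(round(f_095_6_10, 3))             # compare with 0.246
print(round(1 / f_005_10_6, 3))         # compare with 1/4.06
```

Two independent simulations are used so that the agreement between the two printed values is a genuine check of the relation rather than an identity.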
• If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then
• $F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$