
Statistics

Lecture 3: Chapter 8 Fundamental Sampling Distributions and Data Descriptions


- A population consists of the totality of the observations with which we are concerned.
- A sample is a subset of a population.
- Any sampling procedure that produces inferences that consistently overestimate or
consistently underestimate some characteristic of the population is said to be biased.
- To eliminate any possibility of bias in the sampling procedure it is desirable to choose a random
sample in the sense that observations are made independently and at random.
- The random variables $X_1, X_2, \ldots, X_n$ constitute a random sample from the population $f(x)$,
with observed numerical values $x_1, x_2, \ldots, x_n$.
- Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables, each having the same probability
distribution $f(x)$. Then $X_1, X_2, \ldots, X_n$ is defined to be a random sample of size $n$ from the
population $f(x)$, with joint probability distribution $f(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n)$.
- Any function of the random variables constituting a random sample is called a statistic.

Location measures of a sample: the sample mean, median, and mode.


- Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
- Sample median (with the observations arranged in increasing order):
  $\tilde{x} = x_{(n+1)/2}$ if $n$ is odd, and $\tilde{x} = \frac{1}{2}\left(x_{n/2} + x_{(n/2)+1}\right)$ if $n$ is even.

Variability measures of a sample: the sample variance, standard deviation, and range
- Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$.
- Sample standard deviation: $S = \sqrt{S^2}$.
- Sample range: $R = X_{\max} - X_{\min}$.
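
As a quick numerical illustration, the location and variability measures above can be computed in a few lines of Python. This is a minimal sketch assuming NumPy is available; the data values are hypothetical. Note the ddof=1 argument, which gives the $n - 1$ divisor used in the definition of $S^2$:

    # Minimal sketch (assumes NumPy): sample measures for a small
    # hypothetical data set.
    import numpy as np

    x = np.array([2.3, 1.9, 3.1, 2.7, 2.2, 2.8])  # hypothetical sample

    mean   = x.mean()              # sample mean, X-bar
    median = np.median(x)          # sample median
    var    = x.var(ddof=1)         # sample variance S^2 (n - 1 divisor)
    std    = x.std(ddof=1)         # sample standard deviation S
    rng    = x.max() - x.min()     # sample range R

    print(mean, median, var, std, rng)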

Sampling Distribution
- The probability distribution of a statistic is called a sampling distribution.
- Proof: $\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$ has $\mu_{\bar{X}} = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \mu$ and $\sigma^2_{\bar{X}} = \frac{1}{n^2}(\sigma^2 + \sigma^2 + \cdots + \sigma^2) = \frac{\sigma^2}{n}$.
  o Since the $X_i$ are identically distributed, $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$ for each $i$.
  o $\mu_{\bar{X}} = E(\bar{X}) = \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{n\mu}{n} = \mu$.
  o $\sigma^2_{\bar{X}} = \mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)\right] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$, where the variances add because the $X_i$ are independent.
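
Both results can be checked by simulation. The sketch below (assuming NumPy; the exponential population, parameter values, and seed are arbitrary illustrative choices) draws many samples of size $n$ and compares the empirical mean and variance of $\bar{X}$ with $\mu$ and $\sigma^2/n$:

    # Simulation check: for an Exponential population with
    # mu = sigma = 2, the mean of X-bar should be near mu and its
    # variance near sigma^2 / n. (Population choice is illustrative.)
    import numpy as np

    gen = np.random.default_rng(0)
    mu, sigma, n = 2.0, 2.0, 25
    xbars = gen.exponential(scale=mu, size=(100_000, n)).mean(axis=1)

    print(xbars.mean())   # close to mu = 2
    print(xbars.var())    # close to sigma^2 / n = 4 / 25 = 0.16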

The central limit theorem


- If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite
variance $\sigma^2$, then the limiting form of the distribution of
  ▪ $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, as $n \to \infty$, is the standard normal distribution $n(z; 0, 1)$.
- The normal approximation for $\bar{X}$ will generally be good if $n \ge 30$, provided the population
distribution is not terribly skewed.
- If $n < 30$, the approximation is good only if the population is not too different from a normal
distribution.
- If the population is known to be normal, the sampling distribution of $\bar{X}$ will follow a normal
distribution exactly.
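
A short simulation makes the theorem concrete. In the sketch below (assuming NumPy and SciPy; the skewed Exponential(1) population and $n = 40$ are illustrative choices), the standardized sample means are compared against the standard normal CDF:

    # CLT sketch: standardized means of samples from a skewed
    # Exponential(1) population should behave like N(0, 1).
    import numpy as np
    from scipy.stats import norm

    gen = np.random.default_rng(1)
    mu, sigma, n = 1.0, 1.0, 40             # Exponential(1): mu = sigma = 1
    xbar = gen.exponential(scale=1.0, size=(200_000, n)).mean(axis=1)
    z = (xbar - mu) / (sigma / np.sqrt(n))  # standardized sample means

    # Empirical probabilities should be close to the N(0, 1) values.
    for c in (-1.96, 0.0, 1.96):
        print(c, (z <= c).mean(), norm.cdf(c))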

Sampling distribution of 𝑆 2
- If $S^2$ is the variance of a random sample of size $n$ taken from a normal population with mean $\mu$
and variance $\sigma^2$, then the statistic
  $\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$
  has a chi-squared distribution with $v = n - 1$ degrees of freedom.
- The probability that a random sample produces a $\chi^2$ value greater than some specified value is
equal to the area under the curve to the right of this value. It is customary to let $\chi^2_\alpha$ represent
the $\chi^2$ value above which we find an area of $\alpha$.
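
This result can also be checked empirically. The sketch below (assuming NumPy and SciPy; $\mu$, $\sigma$, $n$, and the seed are illustrative choices) simulates sample variances from a normal population and verifies that $(n-1)S^2/\sigma^2$ exceeds $\chi^2_\alpha$ with probability close to $\alpha$:

    # Sketch: (n - 1) S^2 / sigma^2 should follow a chi-squared
    # distribution with v = n - 1 d.f. for samples from a normal population.
    import numpy as np
    from scipy.stats import chi2

    gen = np.random.default_rng(2)
    mu, sigma, n = 5.0, 2.0, 10
    s2 = gen.normal(mu, sigma, size=(100_000, n)).var(axis=1, ddof=1)
    stat = (n - 1) * s2 / sigma**2

    alpha = 0.05
    crit = chi2.ppf(1 - alpha, df=n - 1)   # chi^2_alpha with v = n - 1
    print((stat > crit).mean())            # close to alpha = 0.05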

t-Distribution
- A natural statistic to consider for dealing with inferences on $\mu$ is $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.
Comparison of the $t$-distribution and the $Z$-distribution:
- $t$-distribution: $T$-values depend on the fluctuations of two quantities, $\bar{X}$ and $S^2$. The variance of $T$ depends on the sample size $n$ and is always greater than 1.
- $Z$-distribution: $Z$-values depend only on the change in $\bar{X}$ from sample to sample.
- Similarities: both are symmetric about a mean of zero; both distributions are bell-shaped; as $n \to \infty$, the two distributions become the same.

- It is customary to let $t_\alpha$ represent the $t$-value above which we find an area equal to $\alpha$.
- The $t$-value with 10 degrees of freedom that leaves an area of 0.025 to the right is $t = 2.228$ (checked numerically in the sketch after this list).
- Since the $t$-distribution is symmetric about a mean of zero, $t_{1-\alpha} = -t_\alpha$.
  o The $t$-value leaving an area of $1 - \alpha$ to the right, and therefore an area of $\alpha$ to the
  left, is equal to the negative $t$-value that leaves an area of $\alpha$ in the right tail of the
  distribution.
- The $t$-distribution is used extensively in problems that deal with inference about the population
mean, as well as in problems that involve comparative samples.
- $T$ requires that $X_1, X_2, \ldots, X_n$ be normal.
- The use of the $t$-distribution and the sample-size considerations do not relate to the central limit
theorem.
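
The quoted table value and the symmetry relation can be confirmed with SciPy. This is a minimal sketch; scipy.stats.t.ppf is the inverse CDF, so the value with area $\alpha$ to the right is ppf($1 - \alpha$):

    # Check of the t-values quoted above: t with area 0.025 to the
    # right at 10 d.f., and the symmetry relation t_{1-alpha} = -t_alpha.
    from scipy.stats import t

    alpha, df = 0.025, 10
    t_alpha = t.ppf(1 - alpha, df)   # area alpha to the right
    print(t_alpha)                   # approximately 2.228
    print(t.ppf(alpha, df))          # approximately -2.228 = -t_alpha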
𝐹-distribution
- The statistic 𝐹 is defined to be the ratio of two independent chi-squared random variables, each
divided by its number of degrees of freedom.
- $F = \dfrac{U/v_1}{V/v_2}$,
- where $U$ and $V$ are independent random variables having chi-squared distributions with $v_1$ and
$v_2$ degrees of freedom, respectively.
- Let $U$ and $V$ be two independent random variables having chi-squared distributions with $v_1$ and
$v_2$ degrees of freedom, respectively. Then the distribution of the random variable $F$ is given by
the density function:
  o $h(f) = \begin{cases} \dfrac{\Gamma\left(\frac{v_1+v_2}{2}\right)(v_1/v_2)^{v_1/2}}{\Gamma\left(\frac{v_1}{2}\right)\Gamma\left(\frac{v_2}{2}\right)} \cdot \dfrac{f^{\,v_1/2-1}}{\left(1 + \frac{v_1 f}{v_2}\right)^{(v_1+v_2)/2}}, & f > 0 \\ 0, & f \le 0 \end{cases}$
▪ This is known as the 𝐹-distribution with 𝑣1 and 𝑣2 degrees of freedom (d.f.).
- The curve of the 𝐹-distribution depends not only on the two parameters 𝑣1 and 𝑣2 but also on
the order in which we state them.
- Writing $f_\alpha(v_1, v_2)$ for $f_\alpha$ with $v_1$ and $v_2$ degrees of freedom, we obtain:
  o $f_{1-\alpha}(v_1, v_2) = \dfrac{1}{f_\alpha(v_2, v_1)}$
- If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ taken from
normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then
  o $F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \dfrac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$ has an $F$-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of
  freedom.
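
The reciprocal relation above can be verified numerically with SciPy. This is a minimal sketch; $\alpha$, $v_1$, and $v_2$ are arbitrary illustrative choices, and scipy.stats.f.ppf is the inverse CDF:

    # Sketch: verify f_{1-alpha}(v1, v2) = 1 / f_alpha(v2, v1).
    # Since f_alpha is the value with area alpha to the RIGHT,
    # f_alpha(v, w) corresponds to f.ppf(1 - alpha, v, w).
    from scipy.stats import f

    alpha, v1, v2 = 0.05, 6, 10
    lhs = f.ppf(alpha, v1, v2)             # f_{1-alpha}(v1, v2)
    rhs = 1.0 / f.ppf(1 - alpha, v2, v1)   # 1 / f_alpha(v2, v1)
    print(lhs, rhs)                        # the two values agree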
Examples
