Distributions and
Statistical Concepts
Prof. Sayak Roychowdhury
Descriptive and Inferential Statistics
Statistics
For a continuous random variable X with density f, the cumulative distribution function is
Pr(X ≤ x) = F(x) = ∫₋∞ˣ f(t) dt
(for a discrete variable, F(x) is the sum of f(t) over all values t ≤ x)
• μ = nD/N,  σ = √[ (nD/N)(1 − D/N)((N − n)/(N − 1)) ]
(mean and standard deviation of the hypergeometric distribution)
• If the xᵢ are independent and identically distributed (IID), and the distribution of each xᵢ does not depart radically from the normal distribution, then the CLT works quite well even for n ≥ 3 or 4 (common in SQC problems).
Central Limit Theorem
• Suppose that we have a population with mean 𝜇 and
standard deviation 𝜎. If random samples of size n are
selected from this population, the following holds if the
sample size is large:
• 1. The sampling distribution of the sample mean will be
approximately normal.
• 2. The mean of the sampling distribution of the sample
mean 𝜇𝑋ത will be equal to the population mean 𝜇.
• 3. The standard deviation of the sample mean is σ_X̄ = σ/√n, known as the standard error.
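The three CLT properties above can be checked numerically. The sketch below (Python standard library only; the uniform population is an assumed example) draws many samples of size n and verifies that the sample means are centered at μ with spread close to σ/√n:

```python
import random
import statistics

random.seed(1)

# Population: uniform on [0, 1], so mu = 0.5 and sigma = sqrt(1/12) ~ 0.2887
mu, sigma = 0.5, (1 / 12) ** 0.5
n = 30          # sample size
reps = 20000    # number of samples drawn

# Sampling distribution of the sample mean
means = [statistics.fmean([random.random() for _ in range(n)]) for _ in range(reps)]

center = statistics.fmean(means)   # should be close to mu
spread = statistics.stdev(means)   # should be close to sigma / sqrt(n), the standard error
```

Even though the uniform population is far from normal, the distribution of the 20,000 sample means is approximately normal, illustrating property 1.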
Important Sampling Distributions Derived
from Normal Distribution
1. χ² distribution: If x₁, …, xₙ are standard normal and independently distributed, then y = x₁² + x₂² + … + xₙ² follows a chi-squared distribution with n degrees of freedom.
2. t-distribution: If x is a standard normal variable, y is a chi-squared random variable with k degrees of freedom, and x and y are independent, then the random variable t = x/√(y/k) is distributed as t with k degrees of freedom.
3. F-distribution: If w and y are two independent chi-squared random variables with u and v degrees of freedom respectively, then the ratio F = (w/u)/(y/v) follows an F distribution with (u, v) degrees of freedom.
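These three constructions can be simulated directly from standard normals (a stdlib-only sketch; the degrees of freedom chosen are arbitrary examples). Each derived variable is built exactly as defined above, and known moments are used as sanity checks:

```python
import random
import statistics

random.seed(2)

def chisq(df):
    """Sum of df squared standard normals -> chi-squared with df degrees of freedom."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

u, v, k, reps = 5, 8, 10, 20000

# Chi-squared with u df has mean u
chi_mean = statistics.fmean(chisq(u) for _ in range(reps))

# t with k df: standard normal divided by sqrt(chi-squared / k); symmetric about 0
t_mean = statistics.fmean(random.gauss(0, 1) / (chisq(k) / k) ** 0.5
                          for _ in range(reps))

# F with (u, v) df: ratio of two independent chi-squareds, each divided by its df;
# mean of F(u, v) is v / (v - 2) for v > 2
f_mean = statistics.fmean((chisq(u) / u) / (chisq(v) / v) for _ in range(reps))
```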
Tests of Normality
• Normal Probability Plot of Residuals
• Histogram
• Boxplot
• Skewness and Kurtosis
• Chi-sq. Test
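Skewness and kurtosis from the list above can be computed without any statistics package (a stdlib sketch using the standardized third and fourth moments; the simulated normal data are an assumed example):

```python
import random
import statistics

random.seed(3)

def skew_kurt(xs):
    """Sample skewness and (non-excess) kurtosis via standardized moments."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)                       # population-style divisor n
    skew = sum((x - m) ** 3 for x in xs) / (n * s ** 3)
    kurt = sum((x - m) ** 4 for x in xs) / (n * s ** 4)
    return skew, kurt

data = [random.gauss(50, 5) for _ in range(50000)]
skew, kurt = skew_kurt(data)
# For normal data, skewness is near 0 and kurtosis near 3;
# large departures from (0, 3) suggest non-normality
```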
Normal Probability Plot of Residuals
Minitab > Stat > Basic Statistics > Normality Test > Select data > OK
Anderson-Darling Test
[Figure: standard normal curve with ±σ, ±2σ, ±3σ probability bands; source: http://whatilearned.wikia.com/wiki/File:Normal_curve_probability.jpg]
Histograms
• Tally grouped or ungrouped data
• Determine range 𝑅 = 𝑋ℎ − 𝑋𝑙
• Determine cell (or bin) interval i (applying Sturges’
rule is optional)
• Determine cell midpoints (MP_l = X_l + i/2)
• Determine cell boundaries (extra decimal place)
• Post cell/bins and the frequencies
• Plot (X-axis: midpoints, Y-axis: frequencies/Relative
frequencies)
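The steps above can be sketched as code (stdlib only; the 16-point data set is hypothetical, and Sturges' rule 1 + 3.322·log₁₀(n) is used to pick the number of cells):

```python
import math

data = [12, 18, 23, 25, 31, 34, 36, 41, 44, 47, 52, 55, 58, 63, 67, 71]

# Range R = Xh - Xl
x_l, x_h = min(data), max(data)
R = x_h - x_l

# Sturges' rule for the number of cells, then the cell interval i
n_cells = round(1 + 3.322 * math.log10(len(data)))
i = math.ceil(R / n_cells)

# Cell boundaries and midpoints MP_l = X_l + i/2, MP_l + i, ...
boundaries = [x_l + k * i for k in range(n_cells + 1)]
midpoints = [x_l + i / 2 + k * i for k in range(n_cells)]

# Tally frequencies per cell (plot midpoints on X, frequencies on Y)
freq = [sum(b0 <= x < b1 for x in data)
        for b0, b1 in zip(boundaries, boundaries[1:])]
```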
Histograms
Minitab > Graph > Histogram >Simple > Select Variable > OK
Histograms
Minitab > Graph > Histogram >With Fit > Select Variable > OK
[Figure: histogram of the data (X-axis 0–120) with fitted normal curve; Y-axis: frequency]
Boxplot
Minitab > Graph > Boxplot > Simple > Select Variable
[Figure: boxplot annotated with the 75th %ile (Q3), median, and 25th %ile (Q1)]
If the boxplot is symmetric about the median, the data are more likely to be normal
Graphical Summary
Minitab > Stat > Basic Statistics > Graphical Summary
[Figure: graphical summary annotated with the mean and median]
1-sample t-test
• Null Hypothesis: 𝐻0 : 𝜇 = 𝜇0
• Alternate Hypothesis: 𝐻1 : 𝜇 ≠ 𝜇0 (2 sided)
𝐻1 : 𝜇 > 𝜇0 (right tail)
𝐻1 : 𝜇 < 𝜇0 (left tail)
• Test Statistic t₀ = (ȳ − μ₀)/(s/√n)
where n = sample size,
sample mean ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ,
sample standard deviation s = √[ Σᵢ₌₁ⁿ (yᵢ − ȳ)² / (n − 1) ]
t₀ is a random variable following a t-distribution with ν degrees of freedom, where ν = n − 1
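The test statistic is straightforward to compute by hand (a stdlib sketch; the yield data and the hypothesized mean μ₀ = 50 are assumed examples):

```python
import statistics

# Hypothetical data, testing H0: mu = 50
y = [48.2, 51.4, 50.1, 49.3, 52.0, 47.8, 50.6, 49.9]
mu0 = 50.0

n = len(y)
ybar = statistics.fmean(y)       # sample mean
s = statistics.stdev(y)          # sample standard deviation, divisor n - 1
t0 = (ybar - mu0) / (s / n ** 0.5)
nu = n - 1                       # degrees of freedom
```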
Example 1: 1-sample T-test
Procedure (2-sided test): if |t₀| ≥ t_{α/2, n−1}, the null hypothesis H₀: μ = μ₀ is rejected (conclusion: μ ≠ μ₀, significant)
[Figure: t-distribution with critical value t₀.₀₂₅,₄ marked, i.e. α = 0.05, n = 5]
Confidence Interval
• The 100(1 − α)% CI on the true mean μ is
ȳ − t_{α/2, n−1} · s/√n ≤ μ ≤ ȳ + t_{α/2, n−1} · s/√n
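Continuing the hypothetical 8-point sample from above, the interval can be computed directly; the critical value t₀.₀₂₅,₇ ≈ 2.365 is taken from a standard t-table:

```python
import statistics

y = [48.2, 51.4, 50.1, 49.3, 52.0, 47.8, 50.6, 49.9]   # hypothetical sample
n = len(y)
ybar = statistics.fmean(y)
s = statistics.stdev(y)

t_crit = 2.365                         # t_{0.025, 7} (alpha = 0.05, n - 1 = 7)
half_width = t_crit * s / n ** 0.5
ci = (ybar - half_width, ybar + half_width)   # 95% CI on mu
```

Because the interval contains μ₀ = 50, the 2-sided test at α = 0.05 would not reject H₀ for these data, mirroring the small t₀.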
Example 1: 1-sample Z-test (Known Variance)
• Null Hypothesis: 𝐻0 : 𝜇 = 𝜇0
• Alternate Hypothesis: 𝐻1 : 𝜇 ≠ 𝜇0 (2 sided)
𝐻1 : 𝜇 > 𝜇0 (right tail)
𝐻1 : 𝜇 < 𝜇0 (left tail)
• Test Statistic z₀ = (ȳ − μ₀)/(σ/√n)
where n = sample size, σ = known population standard deviation,
sample mean ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ
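The only change from the t-test is that the known σ replaces the sample s (same hypothetical data; the value σ = 1.5 is an assumed long-run process standard deviation):

```python
# Known-variance case: sigma assumed known from long process history
y = [48.2, 51.4, 50.1, 49.3, 52.0, 47.8, 50.6, 49.9]
mu0, sigma = 50.0, 1.5

n = len(y)
ybar = sum(y) / n
z0 = (ybar - mu0) / (sigma / n ** 0.5)
# Compare |z0| with z_{alpha/2} (1.96 for alpha = 0.05)
```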
2-sample t-test (Unequal Variance)
• Degrees of freedom ν = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
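This Welch–Satterthwaite formula is easy to encode; a useful sanity check is that equal variances and equal sample sizes reduce it to ν = 2(n − 1):

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite degrees of freedom for the unequal-variance t-test."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Equal variances and equal sizes give nu = 2(n - 1)
nu = welch_df(4.0, 10, 4.0, 10)   # -> 18.0
```

In general ν lies between min(n₁ − 1, n₂ − 1) and n₁ + n₂ − 2, and it is usually not an integer.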
Confidence Intervals
• The 100(1 − α)% CI on the difference in means μ₁ − μ₂ is:
• Equal-variance case:
(ȳ₁ − ȳ₂) − t_{α/2, n₁+n₂−2} · s_p √(1/n₁ + 1/n₂) ≤ μ₁ − μ₂ ≤ (ȳ₁ − ȳ₂) + t_{α/2, n₁+n₂−2} · s_p √(1/n₁ + 1/n₂)
• Unequal-variance case:
(ȳ₁ − ȳ₂) − t_{α/2, ν} √(s₁²/n₁ + s₂²/n₂) ≤ μ₁ − μ₂ ≤ (ȳ₁ − ȳ₂) + t_{α/2, ν} √(s₁²/n₁ + s₂²/n₂)
(ν = degrees of freedom in the unequal-variance case)
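A sketch of the equal-variance interval (hypothetical samples from two processes; t₀.₀₂₅,₁₀ ≈ 2.228 is taken from a t-table for n₁ + n₂ − 2 = 10 df):

```python
import statistics

# Hypothetical samples from two processes
y1 = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
y2 = [9.6, 9.9, 9.5, 10.0, 9.7, 9.8]
n1, n2 = len(y1), len(y2)

s1_sq, s2_sq = statistics.variance(y1), statistics.variance(y2)
# Pooled standard deviation
sp = (((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)) ** 0.5

t_crit = 2.228   # t_{0.025, 10} (alpha = 0.05, n1 + n2 - 2 = 10)
diff = statistics.fmean(y1) - statistics.fmean(y2)
half = t_crit * sp * (1 / n1 + 1 / n2) ** 0.5
ci = (diff - half, diff + half)   # 95% CI on mu1 - mu2
```

For these data the interval excludes 0, so the difference in means would be judged significant at α = 0.05.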
Errors
• Type I error: Null hypothesis is rejected when it is
true.
• Type II error: Null hypothesis is not rejected when it is
false.
• 𝛼 = 𝑃 𝑇𝑦𝑝𝑒 𝐼 𝑒𝑟𝑟𝑜𝑟 = 𝑃 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 𝐻0 𝑖𝑠 𝑡𝑟𝑢𝑒
• Also called producer’s risk, probability that a good lot is rejected
• 𝛽 = 𝑃 𝑇𝑦𝑝𝑒 𝐼𝐼 𝑒𝑟𝑟𝑜𝑟 = 𝑃 𝑓𝑎𝑖𝑙 𝑡𝑜 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 𝐻0 𝑖𝑠 𝑓𝑎𝑙𝑠𝑒
• Also called consumer’s risk, probability that a poor lot is accepted
• Power= 1 − 𝛽= 𝑃 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 𝐻0 𝑖𝑠 𝑓𝑎𝑙𝑠𝑒
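The definition α = P(reject H₀ | H₀ is true) can be verified by simulation (a stdlib sketch using the known-variance z-test; the normal population parameters are assumed): when H₀ is true, the long-run rejection rate should settle near the chosen α.

```python
import random

random.seed(10)

# Simulate the z-test when H0 is true: rejection rate should approach alpha
mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
z_crit = 1.96            # z_{alpha/2} for alpha = 0.05
reps = 20000

rejections = 0
for _ in range(reps):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    ybar = sum(sample) / n
    z0 = (ybar - mu0) / (sigma / n ** 0.5)
    if abs(z0) > z_crit:
        rejections += 1

type1_rate = rejections / reps   # estimate of P(Type I error), close to 0.05
```

Repeating the simulation with a true mean shifted away from μ₀ would instead estimate the power 1 − β.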
P-Value
• 𝛼 = 0.05 (typically) Level of Significance of the test,
probability of Type I error.
• P-value : Smallest level of significance that would lead
to the rejection of 𝐻0 .
• P-value is the probability that the test statistic would take
on a value that is as extreme a value as the observed value
of the statistic when 𝐻0 is true.
• Intuitively, the p-value is the smallest level of α at which the data are significant; it conveys how significant the data are.
• At 𝛼 = 0.05 , 𝑝 ≤ 0.05 shows significance (i.e. reject
𝐻0 )
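For a z-statistic the two-sided p-value has a closed form via the complementary error function, P(|Z| > |z₀|) = erfc(|z₀|/√2) (a stdlib sketch; the z₀ = 1.96 input is just the textbook boundary case):

```python
import math

def two_sided_p_from_z(z0):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z0) / math.sqrt(2))

p = two_sided_p_from_z(1.96)   # about 0.05, the alpha = 0.05 boundary
reject = p <= 0.05             # decision rule: reject H0 when p <= alpha
```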
Paired Comparison
• Used when paired data are encountered.
• E.g., two machines measure the tensile strength of 8 fiber specimens; test whether the difference between the measurements from the two machines is significant.
• Two tips are used to test the hardness of 10 steel specimens; test whether the difference between the measurements from the two tips is significant.
• 𝑦𝑖𝑗 = 𝜇𝑖 + 𝛽𝑗 + 𝜖𝑖𝑗 𝑖 = 1,2 ; 𝑗 = 1,2, . . 𝑘
• 𝜇𝑖 is the true mean response of 𝑖 𝑡ℎ treatment
• 𝛽𝑗 is the effect on response due to 𝑗𝑡ℎ specimen
Paired Comparison
• 𝑑𝑗 = 𝑦1𝑗 − 𝑦2𝑗 for 𝑗 = 1,2, . . 𝑘
• 𝜇𝑑 = 𝐸 𝑑𝑗 = 𝜇1 + 𝛽𝑗 − 𝜇2 − 𝛽𝑗 = 𝜇1 − 𝜇2
• 𝐻0 : 𝜇𝑑 = 0; 𝐻𝑎 : 𝜇𝑑 ≠ 0
• Test statistic t₀ = d̄/(S_d/√n)
where n is the number of pairs and d̄ = (1/n) Σⱼ₌₁ⁿ dⱼ
• S_d = √[ Σⱼ₌₁ⁿ (dⱼ − d̄)² / (n − 1) ]
• H₀ is rejected if |t₀| > t_{α/2, n−1}
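The hardness example above can be sketched as a paired test (the readings for the two tips are hypothetical; t₀.₀₂₅,₉ ≈ 2.262 is taken from a t-table):

```python
import statistics

# Hypothetical hardness readings by two tips on the same 10 specimens
tip1 = [63, 52, 58, 60, 55, 57, 61, 54, 59, 56]
tip2 = [61, 51, 59, 58, 54, 56, 60, 53, 57, 55]

# Work with the within-specimen differences, which cancel the specimen effect
d = [a - b for a, b in zip(tip1, tip2)]
n = len(d)
dbar = statistics.fmean(d)
sd = statistics.stdev(d)
t0 = dbar / (sd / n ** 0.5)

t_crit = 2.262                 # t_{0.025, 9}
reject = abs(t0) > t_crit      # reject H0: mu_d = 0 if True
```

Pairing is what makes the specimen effect βⱼ drop out of dⱼ, exactly as in the derivation μ_d = μ₁ − μ₂ above.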
The 100(1 − α)% confidence interval on σ² is
(n − 1)S²/χ²_{α/2, n−1} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2, n−1}
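A sketch of this interval for a hypothetical 8-point sample; the chi-squared percentiles χ²₀.₀₂₅,₇ ≈ 16.013 and χ²₀.₉₇₅,₇ ≈ 1.690 are taken from a standard table:

```python
import statistics

y = [48.2, 51.4, 50.1, 49.3, 52.0, 47.8, 50.6, 49.9]   # hypothetical sample
n = len(y)
s_sq = statistics.variance(y)    # sample variance S^2, divisor n - 1

chi2_upper = 16.013   # chi2_{0.025, 7}
chi2_lower = 1.690    # chi2_{0.975, 7}

# Note the larger percentile gives the lower bound, and vice versa
ci = ((n - 1) * s_sq / chi2_upper, (n - 1) * s_sq / chi2_lower)   # 95% CI on sigma^2
```

Unlike the CI on the mean, this interval is not symmetric about S², because the chi-squared distribution is skewed.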
Inferences about Variance
• Testing the equality of the variances of two normal
populations. If independent random samples of size 𝑛1 and 𝑛2
are taken from populations 1 and 2, respectively,
• 𝐻0 : 𝜎12 = 𝜎22
𝐻1 : 𝜎12 ≠ 𝜎22
𝑆12
Test statistic: 𝐹0 =
𝑆22
The null hypothesis is rejected if F₀ > F_{α/2, n₁−1, n₂−1} or F₀ < F_{1−α/2, n₁−1, n₂−1}
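The F-test reduces to a ratio of sample variances (hypothetical samples below; F₀.₀₂₅,₇,₇ ≈ 4.99 is an assumed table value, and the lower critical value uses the reciprocal identity F_{1−α, u, v} = 1/F_{α, v, u}):

```python
import statistics

# Hypothetical samples from two processes, n1 = n2 = 8
y1 = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]
y2 = [9.6, 9.9, 9.5, 10.0, 9.7, 9.8, 10.1, 9.4]

s1_sq = statistics.variance(y1)
s2_sq = statistics.variance(y2)
f0 = s1_sq / s2_sq

# Two-sided test at alpha = 0.05 with (7, 7) degrees of freedom
f_upper = 4.99            # F_{0.025, 7, 7} from a table
f_lower = 1 / f_upper     # F_{0.975, 7, 7}, since the df are equal here
reject = f0 > f_upper or f0 < f_lower
```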