1/25/2021
What is “Statistical Inference”
Why we need “Statistical Inference”
3.1 Random Sample and Statistics
Statistics
A statistic is defined as any function of the sample data that does not contain
unknown parameters.
For example, let $\{x_1, x_2, \ldots, x_n\}$ represent the observations in a sample. The
sample average or sample mean,
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
and the sample variance,
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
are statistics. The statistics $\bar{x}$ and $s^2$ describe the central tendency and variability,
respectively, of the sample.
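The two statistics can be computed directly from their definitions. The sketch below is only an illustration: the function names and the data values are hypothetical, not taken from the slides.

```python
import statistics


def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


# Hypothetical observations, for illustration only.
data = [2.0, 4.0, 4.0, 5.0, 7.0]

xbar = sample_mean(data)
s2 = sample_variance(data)
print("sample mean:", xbar)
print("sample variance:", s2)

# The standard library uses the same n - 1 definition.
print("statistics.variance:", statistics.variance(data))
```

Neither function needs any unknown parameter, which is what makes both quantities statistics.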
Sampling distribution
3.2 Parameter Estimation
3.2.1 Point Estimation
Good point estimators
Useful point estimators
3.2.2 Interval Estimation
Example (Demo)
Question?
What is the relationship between $\alpha$ and the width of the confidence interval (C.I.)?
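One way to explore this question numerically is sketched below. It assumes the two-sided interval for a normal mean with known standard deviation, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, whose width is $2 z_{\alpha/2}\,\sigma/\sqrt{n}$. The function name and the values of `sigma` and `n` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(alpha, sigma, n):
    """Width of the two-sided known-sigma interval for the mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point
    return 2 * z * sigma / sqrt(n)


# Hypothetical sigma and n; only alpha changes between rows.
for alpha in (0.10, 0.05, 0.01):
    width = ci_width(alpha, sigma=1.0, n=25)
    print(f"alpha = {alpha:.2f}  width = {width:.3f}")
```

Under this assumption, a smaller $\alpha$ (higher confidence) gives a larger $z_{\alpha/2}$ and therefore a wider interval.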
Example (Application)
Exercises
A chemical process converts lead to gold. However, the production varies due
to the powers of the alchemist. It is known that the process yield is normally
distributed, with a standard deviation of 2.5 g. How many observations must be
taken to be 90% confident that the estimate of the process mean is within 1.5 g of
the true but unknown mean yield?
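A possible approach to this kind of question, assuming the known-$\sigma$ interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, is to require the half-width to be no larger than the allowed error $E$, which gives $n \ge (z_{\alpha/2}\,\sigma / E)^2$. The function and parameter names below are illustrative, not from the slides.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, max_error, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= max_error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / max_error) ** 2)


# Values from the exercise: sigma = 2.5 g, error 1.5 g, 90% confidence.
n = required_sample_size(sigma=2.5, max_error=1.5, confidence=0.90)
print("required sample size:", n)
```

The result is rounded up because the sample size must be a whole number and rounding down would violate the error bound.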
3.3 Hypothesis Testing
Conclusion:
By comparing the test statistic with the critical value (i.e., a threshold value),
decide whether or not to reject the null hypothesis.
Procedures
1) State the null and alternative hypotheses, and define the test statistic.
2) Specify the significance level $\alpha$.
3) Find the distribution of the test statistic and the rejection region of $H_0$.
4) Collect data and calculate the test statistic.
5) Compare the test statistic with the rejection region.
6) Assess the risk.
3.3.1 Testing for Single Sample
Inference on mean
Variance is known
Variance is unknown
Inference on variance
Test population mean if variance is known
Suppose that $x$ is a random variable with unknown mean $\mu$ and known variance
$\sigma^2$. Test the hypothesis that the mean is equal to a standard value $\mu_0$.
The hypotheses may be formally stated as
$$H_0: \mu = \mu_0$$
$$H_1: \mu \neq \mu_0$$
Test statistic:
$$Z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
Reject $H_0$ if $|Z_0| > Z_{\alpha/2}$, where $Z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the
standard normal distribution. This is sometimes called the one-sample Z-test.
Think: How to perform the test if $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$?
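The two-sided test above can be sketched in a few lines. The function name and the numbers in the usage example are hypothetical, and only the two-sided alternative is implemented, so the one-sided cases remain open.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample Z-test with known sigma.

    Returns the test statistic, the critical value, and the decision.
    """
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point
    reject = abs(z0) > z_crit
    return z0, z_crit, reject


# Hypothetical summary data, for illustration only.
z0, z_crit, reject = one_sample_z_test(xbar=10.5, mu0=10.0, sigma=1.0, n=16)
print(f"Z0 = {z0:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```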
Example
Test population mean if variance is unknown
Think: How to perform the test if $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$?
Example
Rubber can be added to asphalt to reduce road noise when the material is used
as pavement. The table shows the stabilized viscosity (cP) of 15 specimens of
asphalt paving material. To be suitable for the intended pavement application,
the mean stabilized viscosity should be equal to 3200. Test this hypothesis
using $\alpha = 0.05$. Based on experience, we are willing to initially assume that
stabilized viscosity is normally distributed. (Example in textbook)
Example (cont’d)
The use of P-value in hypothesis testing
P-value approach
P-value: the smallest level of significance that would lead to rejection of $H_0$
If the predefined $\alpha > P = \alpha_{\min}$, reject $H_0$
Underlying idea: “If $H_0$ is really true, how likely is it for the test statistic to be
this large (or this small)?”
Example: Normal distribution
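If $Z_0$ is the computed value of the test statistic, then the $P$-value is
$$P = \begin{cases} 2[1 - \Phi(|Z_0|)] & \text{two-tailed test, } H_1: \mu \neq \mu_0 \\ 1 - \Phi(Z_0) & \text{upper-tailed test, } H_1: \mu > \mu_0 \\ \Phi(Z_0) & \text{lower-tailed test, } H_1: \mu < \mu_0 \end{cases}$$
where $\Phi$ is the standard normal cumulative distribution function.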
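For a test statistic that is standard normal under $H_0$, the $P$-value can be computed from the normal CDF. The sketch below uses the usual tail-area definitions. The function name and the `alternative` labels are illustrative choices, not from the slides.

```python
from statistics import NormalDist


def z_p_value(z0, alternative="two-sided"):
    """P-value for a statistic that is standard normal under H0."""
    phi = NormalDist().cdf
    if alternative == "two-sided":   # H1: mu != mu0
        return 2 * (1 - phi(abs(z0)))
    if alternative == "greater":     # H1: mu > mu0
        return 1 - phi(z0)
    if alternative == "less":        # H1: mu < mu0
        return phi(z0)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Hypothetical test statistic value.
p = z_p_value(1.96)
print(f"two-sided P-value for Z0 = 1.96: {p:.4f}")
```

Rejecting $H_0$ whenever the $P$-value is below the predefined $\alpha$ leads to the same decision as comparing $Z_0$ with the critical value.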
Confidence interval & hypothesis testing
Understand the hypothesis testing results
Test population variance
3.3.2 Testing for Two Samples
Difference in means of two samples
Case 1: If the variances $\sigma_1^2$ and $\sigma_2^2$ are known (table from textbook)
Example
Example (cont’d)
Example
A bakery has a line making Binkies, a big-selling junk food. Another line has
just been installed, and the plant manager wants to know if the output of the
new line is greater than that of the old line, as promised by the bakery
equipment firm. 12 days of data are selected at random from line 1 and 10
days of data are selected at random from line 2, with $\bar{x}_1 = 1124.3$ and $\bar{x}_2 =
1138.7$. It is known that $\sigma_1^2 = 52$ and $\sigma_2^2 = 60$. Test the appropriate hypotheses
at $\alpha = 0.05$, given that the outputs are normally distributed.
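A sketch of how the known-variance two-sample statistic $Z_0 = (\bar{x}_1 - \bar{x}_2)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ could be applied here. It assumes that line 2 is the new line, so the hypotheses are $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 < \mu_2$ (a lower-tailed test). That reading of the problem and the function name are assumptions, not stated in the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, var1, var2, n1, n2):
    """Z statistic for H0: mu1 = mu2 when both variances are known."""
    return (xbar1 - xbar2) / sqrt(var1 / n1 + var2 / n2)


# Summary data from the example; line 2 is ASSUMED to be the new line.
z0 = two_sample_z(xbar1=1124.3, xbar2=1138.7, var1=52, var2=60, n1=12, n2=10)

alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value, negative
reject = z0 < z_crit                  # H1: mu1 < mu2

print(f"Z0 = {z0:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Under this reading, $Z_0 \approx -4.48$ falls well below the critical value of about $-1.645$, so $H_0$ would be rejected in favor of the new line having the greater mean output.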
Difference in means of two samples
Case 2: If the variances are unknown, but assume $\sigma^2 = \sigma_1^2 = \sigma_2^2$ (equal variances)
Construct the test statistic:
Combine the two sample variances $s_1^2$ and $s_2^2$ to form an estimator of $\sigma^2$. The pooled
estimator of $\sigma^2$ is defined as
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
If $H_0$ is true, the quantity
$$t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
has a $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom.
Reject $H_0$ if?
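The pooled variance and the resulting statistic can be computed from summary data as sketched below, taking $\mu_1 - \mu_2 = 0$ under $H_0$. The function name and the numbers are hypothetical. The Python standard library has no $t$ distribution, so the critical value is taken from a $t$ table and passed in by hand.

```python
from math import sqrt


def pooled_t_statistic(xbar1, xbar2, s1_sq, s2_sq, n1, n2):
    """Pooled two-sample t statistic for H0: mu1 = mu2.

    Returns the statistic and its degrees of freedom.
    """
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df  # pooled variance
    t0 = (xbar1 - xbar2) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
    return t0, df


# Hypothetical summary data, for illustration only.
t0, df = pooled_t_statistic(xbar1=10.0, xbar2=8.0, s1_sq=4.0, s2_sq=4.0, n1=8, n2=8)

# Table value t_{0.025, 14} for a two-sided test at alpha = 0.05.
t_crit = 2.145
reject = abs(t0) > t_crit

print(f"t0 = {t0:.3f}, df = {df}, reject H0: {reject}")
```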
Two-sample pooled t-test
Think: Do we need the normal assumption here?
Example
Example in textbook
Two catalysts are being analyzed to determine how they affect the mean yield of a
chemical process. Specifically, catalyst 1 is currently in use, but catalyst 2 is
acceptable. Since catalyst 2 is cheaper, it should be adopted, providing it does not
change the process yield. An experiment is run in the pilot plant and results in the
data shown in the Table. Is there any difference between the mean yields? Use
$\alpha = 0.05$ and assume equal variances.
Example
Example
Assuming that the variances are equal, construct a 95% confidence interval on
the mean difference in surface-finish measurements. Test the hypothesis that
the mean surface finish measurements made by the two technicians are equal.
Use $\alpha = 0.05$.
Difference in variances of two samples
Consider testing the hypothesis that the variances of two independent normal
distributions are equal.
If random samples of sizes $n_1$ and $n_2$ are taken from populations 1 and 2,
respectively, then the test statistic for
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_1: \sigma_1^2 \neq \sigma_2^2$$
is simply the ratio of the sample variances,
$$F_0 = \frac{s_1^2}{s_2^2}$$
What is the distribution of this test statistic if $H_0$ is true?
Cont’d
3.3.3 Hypothesis Test Errors
Properties of Type I & Type II Errors
Both types of errors can be reduced by increasing the sample size at the
price of increased inspection costs.
For a given sample size, one risk can only be reduced at the expense of
increasing the other risk. That is, if $\alpha$ is reduced, then $\beta$ increases.
Operating-Characteristic (OC) Curves
Type II error
$d = |\delta|/\sigma$
OC Curves (cont’d)
The larger the mean shift, the smaller the type II error
The larger the sample size, the smaller the type II error
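Both statements can be checked numerically. The sketch below assumes a two-sided one-sample Z-test with known $\sigma$, for which the type II error at a standardized mean shift $d = |\delta|/\sigma$ is $\beta = \Phi(Z_{\alpha/2} - d\sqrt{n}) - \Phi(-Z_{\alpha/2} - d\sqrt{n})$. Which test the OC curves on the slides refer to is not stated, so this choice, the function name, and the values of `d` and `n` are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(d, n, alpha=0.05):
    """Beta for a two-sided Z-test at standardized shift d = |delta| / sigma."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n)
    return nd.cdf(z - shift) - nd.cdf(-z - shift)


# Larger mean shift at a fixed sample size (hypothetical values).
for d in (0.5, 1.0, 1.5):
    print(f"n = 9,  d = {d:.1f}  beta = {type_ii_error(d, 9):.4f}")

# Larger sample size at a fixed mean shift (hypothetical values).
for n in (4, 9, 25):
    print(f"d = 0.5, n = {n:2d}  beta = {type_ii_error(0.5, n):.4f}")
```

With no shift ($d = 0$) the formula reduces to $\beta = 1 - \alpha$, and lowering $\alpha$ at fixed $d$ and $n$ raises $\beta$, which is the trade-off between the two risks.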
Thank you!
Any Questions?