
INSE 6220 -- Week 4

• Sampling and Estimation
• Confidence intervals
• Control Charts and hypothesis testing
• Statistical basis for Control Charts

Dr. A. Ben Hamza Concordia University


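Using the normal cdf and pdf
• We often want to talk about "percentage points" of the distribution, i.e., the portion of probability in the tails. With Φ the standard normal cdf, the upper α/2 percentage point \( z_{\alpha/2} \) satisfies:

\[ P(Z > z_{\alpha/2}) = 1 - P(Z \le z_{\alpha/2}) = 1 - \Phi(z_{\alpha/2}) = \frac{\alpha}{2} \]
\[ \Rightarrow\ \Phi(z_{\alpha/2}) = 1 - \frac{\alpha}{2} \quad\Rightarrow\quad z_{\alpha/2} = \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right) \]

>> alpha = 0.05;
>> icdf('normal',1-alpha/2,0,1)

Also, we have: \( P(Z < -z_{\alpha/2}) = \Phi(-z_{\alpha/2}) = \frac{\alpha}{2} \)

Example: \( z_{0.20/2} = z_{0.10} = 1.2816 \) and \( z_{0.05/2} = z_{0.025} = 1.96 \)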
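
The inverse-cdf step above can be sketched in Python as well. This version uses only the standard library (`statistics.NormalDist.inv_cdf`), which stands in for the MATLAB `icdf` call. It computes \( z_{\alpha/2} \) for the two example values of α and checks the lower-tail identity.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # standard normal Z ~ N(0, 1)

def z_upper(alpha: float) -> float:
    """Return z_{alpha/2}, the point with area alpha/2 in the upper tail."""
    # P(Z > z) = alpha/2  <=>  Phi(z) = 1 - alpha/2  <=>  z = Phi^{-1}(1 - alpha/2)
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)

for alpha in (0.20, 0.05):
    z = z_upper(alpha)
    lower_tail = STD_NORMAL.cdf(-z)  # P(Z < -z_{alpha/2}), equal to alpha/2 by symmetry
    print(f"alpha={alpha:.2f}  z={z:.4f}  P(Z < -z)={lower_tail:.4f}")
```

Running it should print z=1.2816 for α = 0.20 and z=1.9600 for α = 0.05, which match the example values.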

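Moments of the population vs. sample statistics
• Mean
  – Population: \( \mu_X = E(X) \)
  – Sample: \( \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \)
• Variance
  – Population: \( \sigma_X^2 = \mathrm{Var}(X) = E\big[(X - \mu_X)^2\big] = E(X^2) - [E(X)]^2 \)
  – Sample: \( S_X^2 = S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \bar{X}\big)^2 \)
• Standard deviation
  – Population: \( \sigma = \sqrt{\sigma^2} \)
  – Sample: \( S = \sqrt{S^2} \)
• Covariance
  – Population: \( \sigma_{XY}^2 = \mathrm{Cov}(X,Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big] = E(XY) - E(X)E(Y) \)
  – Sample: \( S_{XY}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \bar{X}\big)\big(Y_i - \bar{Y}\big) \)
• Correlation coefficient
  – Population: \( \rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\sigma_{XY}^2}{\sigma_X\,\sigma_Y} \)
  – Sample: \( r_{XY} = \frac{S_{XY}^2}{S_X\,S_Y} \)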
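
The sample-statistic column of the table can be computed directly from the formulas. The Python sketch below uses the n − 1 divisor throughout and checks the variance and standard deviation against the standard library. The paired data are hypothetical and serve only to illustrate.

```python
from math import sqrt
from statistics import mean, stdev, variance

def sample_moments(x, y):
    """Sample mean, variance, std dev, covariance and correlation (n - 1 divisor)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    x_bar, y_bar = mean(x), mean(y)
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    r_xy = s_xy / (sqrt(s2_x) * sqrt(s2_y))  # sample correlation coefficient
    return {"mean_x": x_bar, "var_x": s2_x, "std_x": sqrt(s2_x),
            "cov_xy": s_xy, "corr_xy": r_xy}

# Hypothetical paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
m = sample_moments(x, y)
# Cross-check the hand-written formulas against the standard library.
assert abs(m["var_x"] - variance(x)) < 1e-12
assert abs(m["std_x"] - stdev(x)) < 1e-12
print(m)
```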

Statistical Inference
• The purpose of statistical inference is to obtain information about a population from
information contained in a sample.
• A population is the set of all the elements of interest.
• A sample is a subset of the population.
• The sample results provide only estimates of the values of the population
characteristics.
• A parameter is a numerical characteristic of a population.
• With proper sampling methods, the sample results will provide “good” estimates of
the population characteristics.
• In point estimation we use the data from the sample to compute a value of a
sample statistic that serves as an estimate of a population parameter.
• We refer to \( \bar{X} \) as the point estimator of the population mean μ.
• The sample standard deviation s is the point estimator of the population standard deviation σ.
• When the expected value of a point estimator is equal to the population parameter,
the point estimator is said to be unbiased.
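
Unbiasedness can be checked empirically by averaging an estimator over many samples. The sketch below assumes a hypothetical normal population (μ = 10, σ = 2) and samples of size n = 5. It compares \( \bar{X} \), \( S^2 \) with the n − 1 divisor, and the variant with the n divisor, whose expected value is \( \frac{n-1}{n}\sigma^2 \).

```python
import random
from statistics import fmean

random.seed(1)                              # fixed seed for a reproducible run
MU, SIGMA, N, REPS = 10.0, 2.0, 5, 20_000   # hypothetical population and sample size

xbars, s2_unbiased, s2_biased = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    ss = sum((xi - xbar) ** 2 for xi in sample)  # sum of squared deviations
    xbars.append(xbar)
    s2_unbiased.append(ss / (N - 1))  # S^2 with the n - 1 divisor
    s2_biased.append(ss / N)          # n divisor: expected value is (n-1)/n * sigma^2

print(f"average Xbar       : {fmean(xbars):.3f}   (mu = {MU})")
print(f"average S^2, n - 1 : {fmean(s2_unbiased):.3f}   (sigma^2 = {SIGMA ** 2})")
print(f"average S^2, n     : {fmean(s2_biased):.3f}   ((n-1)/n * sigma^2 = {SIGMA ** 2 * (N - 1) / N})")
```

The averages of \( \bar{X} \) and of \( S^2 \) with n − 1 should land close to μ and σ². The n-divisor version should settle near 3.2, which is biased low.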

Sampling and Estimation
• Sampling: the act of making observations from populations
• Random sampling: each observation is independently and identically distributed (i.i.d.)
• Statistic: a function of sample data; a value that can be computed from the data (it contains no unknowns), e.g., average, median, standard deviation
• A statistic is a random variable, which itself has a sampling distribution
  – i.e., if we take multiple random samples, the value of the statistic will be different for each set of samples, but will be governed by the same sampling distribution
• If we know the appropriate sampling distribution, we can reason about the population based on the observed value of a statistic
  – E.g., we calculate a sample mean from a random sample: in what range do we think the actual (population) mean really sits?

Point and Interval Estimators
• A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.
• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
• That is, we say (with some ___% certainty) that the population parameter of interest is between some lower and upper bounds.

Point & Interval Estimation
For example, suppose we want to estimate the mean summer income of a class of Quality Systems Engineering students. For n = 25 students, \( \bar{X} \) is calculated to be 400 $/week (point estimate).
An alternative statement is:
• The mean income is between 380 and 420 $/week (interval estimate).

Population vs. Sampling Distribution

Sampling Distributions
The probability distribution of a statistic is called a sampling distribution.
• Sampling distribution for the sample mean when the sample is large → Normal distribution
• Sampling distribution for the sample mean when the sample is small (normal data, σ unknown) → Student-t distribution
• Sampling distribution for the sample variance (normal data) → Chi-squared distribution
• Sampling distribution of the ratio of two sample variances (independent normal samples) → F distribution
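
A quick simulation shows what "sampling distribution of the sample mean" means for a large sample. The sketch assumes a hypothetical Uniform(0, 1) population, which is deliberately non-normal, and samples of size n = 30. It compares the spread of the simulated means with \( \sigma/\sqrt{n} \) and checks how closely a normal approximation holds.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(2)
N, REPS = 30, 20_000              # hypothetical sample size and number of samples
MU, SIGMA = 0.5, sqrt(1 / 12)     # mean and std dev of a Uniform(0, 1) population

# Sampling distribution of X-bar: one sample mean per repetition.
xbars = [fmean(random.random() for _ in range(N)) for _ in range(REPS)]
se = SIGMA / sqrt(N)              # theoretical standard error sigma / sqrt(n)
inside = fmean(abs(xb - MU) <= 1.96 * se for xb in xbars)

print(f"mean of X-bar {fmean(xbars):.4f} (theory {MU})")
print(f"sd of X-bar   {stdev(xbars):.4f} (theory {se:.4f})")
print(f"share within mu +/- 1.96 se: {inside:.3f} (normal approximation: 0.950)")
```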

Estimation Process
[Figure: a Random Sample drawn from the Population (mean, μ, is unknown) gives the sample mean \( \bar{X} = 50 \), which supports the statement "I am 95% confident that μ is between 40 & 60."]

General Formula
• The general formula for the (symmetric) confidence intervals on a mean is:
  Point Estimate ± (Critical Value)(Standard Error)
Where:
• Point Estimate is the sample statistic estimating the population parameter of interest
• Critical Value is a table value based on the sampling distribution of the point estimate and the desired confidence level
• Standard Error is the standard deviation of the point estimate

Confidence Level, (1 − α)
• Confidence Level
  – The confidence that the interval will contain the unknown population parameter
  – A percentage (less than 100%)
• Suppose confidence level = 95%
  – Also written (1 − α) = 0.95 (so α = 0.05)
• A relative frequency interpretation:
  – 95% of all the confidence intervals that can be constructed will contain the unknown true parameter
• A specific interval either will contain or will not contain the true parameter
  – No probability is involved in a specific interval

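Confidence interval on the mean: variance known
• We know σ, e.g., from historical data
• Estimate the mean in some interval to (1 − α)100% confidence:
\[ \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
• The width of this interval is \( 2\,z_{\alpha/2}\,\sigma/\sqrt{n} \).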
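
The relative frequency interpretation of the confidence level can be checked by simulation, using the variance-known interval. The sketch assumes a hypothetical normal population (μ = 50, σ = 8, treated as known) and samples of size n = 25. It builds 10,000 intervals and counts how many contain μ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(3)
MU, SIGMA, N = 50.0, 8.0, 25        # hypothetical true mean; sigma treated as known
ALPHA, REPS = 0.05, 10_000
half_width = NormalDist().inv_cdf(1 - ALPHA / 2) * SIGMA / sqrt(N)   # z_{alpha/2} sigma / sqrt(n)

covered = 0
for _ in range(REPS):
    xbar = fmean(random.gauss(MU, SIGMA) for _ in range(N))
    if xbar - half_width <= MU <= xbar + half_width:   # does this interval contain mu?
        covered += 1

coverage = covered / REPS
print(f"{coverage:.3f} of {REPS} intervals contain mu")
```

The coverage should come out close to 0.95. Any single interval either contains μ or does not.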

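Finding the Critical Value, \( z_{\alpha/2} \)
• Consider a 95% confidence interval: 1 − α = 0.95, so α = 0.05, α/2 = 0.025 and \( z_{\alpha/2} = 1.96 \)
[Figure: standard normal curve with area α/2 = 0.025 in each tail and 0.95 in the middle. On the Z scale, \( -z_{\alpha/2} = -1.96 \), 0 and \( z_{\alpha/2} = 1.96 \) correspond on the X scale to the lower confidence limit, the point estimate and the upper confidence limit.]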

Example
• A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms. Determine a 95% confidence interval for the true mean resistance of the population.
• Solution:
\[ \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,\frac{0.35}{\sqrt{11}} = 2.20 \pm 0.2068 \]
\[ 1.9932 \le \mu \le 2.4068 \]
• We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms.
• Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean.
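
The resistance example can be reproduced in a few lines. This Python sketch uses the standard library's `NormalDist`, so \( z_{0.025} \) is computed rather than read from a table. The function name `z_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha) CI on mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z * sigma / sqrt(n)             # (critical value)(standard error)
    return xbar - half_width, xbar + half_width

lo, hi = z_interval(xbar=2.20, sigma=0.35, n=11)
print(f"{lo:.4f} <= mu <= {hi:.4f}")   # 1.9932 <= mu <= 2.4068
```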

Do You Ever Truly Know σ?
• Probably not!
• In virtually all real world situations, σ is not known.
• If there is a situation where σ is known, then µ is also known (since to calculate σ you need to know µ).
• If you truly know µ, there would be no need to gather a sample to estimate it.

Confidence Interval for μ (σ Unknown)
• If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S.
• This introduces extra uncertainty, since S varies from sample to sample.
• So we use the t distribution instead of the normal distribution.

Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
• Want to estimate the population mean of data that is Normally distributed with unknown variance (typically with a small sample)
• Take a sample of size n from Normally distributed data (Note: n < 40)
• Find the sample mean, \( \bar{X} \), and the sample standard deviation, S
• Calculate the upper and lower CI limits using the Student-t distribution and 1 − α, such that P(LCL ≤ μ ≤ UCL) = 1 − α
• \( \frac{\bar{X} - \mu}{S/\sqrt{n}} \) will be Student-t distributed with n − 1 degrees of freedom
  – How do I know this? Remember that the Student t distribution is the sampling distribution of the (standardized) sample mean when the sample size, n, is small and the underlying distribution is Normal (or close to Normal).

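Sampling: the Chi-Square distribution
• For a random sample of size n from a normal population with variance σ²:
\[ S^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}, \quad\text{i.e.,}\quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]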
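
One way to see the chi-square result is to simulate it. The sketch below assumes a hypothetical normal population (σ = 3) and samples of size n = 6. It collects many values of \( (n-1)S^2/\sigma^2 \) and compares their mean and variance with those of a \( \chi^2_{n-1} \) variable, which are k and 2k for k degrees of freedom.

```python
import random
from statistics import fmean

random.seed(4)
MU, SIGMA, N, REPS = 0.0, 3.0, 6, 20_000   # hypothetical normal population, n = 6

q = []  # realizations of (n - 1) S^2 / sigma^2
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    s2 = sum((x - xbar) ** 2 for x in sample) / (N - 1)
    q.append((N - 1) * s2 / SIGMA ** 2)

q_mean = fmean(q)
q_var = fmean((qi - q_mean) ** 2 for qi in q)
# A chi-square variable with k degrees of freedom has mean k and variance 2k.
print(f"mean {q_mean:.2f} (theory {N - 1}),  variance {q_var:.2f} (theory {2 * (N - 1)})")
```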

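Sampling: the t student distribution
• If Z ~ N(0,1) and Y ~ χ²_k are independent, then
\[ t_k = \frac{Z}{\sqrt{Y/k}} \]
is distributed as a Student t distribution with k degrees of freedom.
• Typical use: find the distribution of the average when σ is NOT known.
• For k → ∞, \( t_k \to N(0,1) \).
• Consider \( X_i \sim N(\mu, \sigma^2) \). Then
\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \quad\text{and}\quad \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{s/\sigma} = \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1} \]
• This is just the "normalized" distance of the sample mean from μ, normalized by our estimate \( s/\sqrt{n} \) of its standard deviation.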
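
The construction \( Z/\sqrt{Y/k} \) can be simulated directly. This shows why the t distribution has heavier tails than the normal. The sketch assumes k = 4 degrees of freedom (a hypothetical choice) and builds Y as the sum of k squared independent N(0,1) draws.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(5)
K, REPS = 4, 50_000     # hypothetical degrees of freedom and number of draws

def draw_t(k):
    """One draw of T = Z / sqrt(Y / k), with Z ~ N(0,1) independent of Y ~ chi^2_k."""
    z = random.gauss(0.0, 1.0)
    y = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))   # sum of k squared N(0,1)
    return z / sqrt(y / k)

t_draws = [draw_t(K) for _ in range(REPS)]
tail_t = fmean(abs(t) > 1.96 for t in t_draws)
print(f"P(|T| > 1.96) with k={K}: {tail_t:.3f}  vs  P(|Z| > 1.96) = 0.050")
```

For k = 4, the exact two-sided tail beyond 1.96 is about 0.12, more than twice the normal value. As k grows, the simulated tail shrinks toward 0.05.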

n 1 n n • Note that the t distribution is slightly wider than the normal distribution. confidence intervals bounds not symmetric. Confidence intervals: Estimate of variance ( n  1) s 2 ( n  1) s 2 2  2 / 2.n 1    X  t / 2. but also estimate the variance • Our estimate of the mean to some interval with (1-)100% confidence becomes s s X  t / 2. . so that our confidence interval on the true mean is not as tight as when we know the variance.n 1 • The appropriate sampling distribution is the Chi-square • Because chi-square is asymmetric.n 1 12 / 2. 20 Confidence intervals: variance unknown • Case where we don’t know variance a priori • Now we have to estimate not only the mean based on our data.

Confidence Interval on the Mean of a Normal Distribution, Small Sample Size
Summary of Procedure
• Data Distribution: Normal, i.e., X ~ N(μ, σ²)
• Population parameters:
  – Mean, μ: unknown → to be estimated
  – Variance, σ²: unknown → estimate using S²
• Sample Size: small (i.e., n < 40)
• Sample Statistic: Sample Mean
• Sampling Distribution: Student-t Distribution
• Confidence Intervals (CIs):
  – 2-sided CI: \( \bar{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} \)
  – Upper 1-sided CI: \( \mu \le \bar{X} + t_{\alpha,n-1}\frac{S}{\sqrt{n}} \)
  – Lower 1-sided CI: \( \bar{X} - t_{\alpha,n-1}\frac{S}{\sqrt{n}} \le \mu \)
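
The three CIs in the summary differ only in which t value they use and on which side they apply it. The Python sketch below assumes a hypothetical sample summary (n = 16, \( \bar{X} \) = 50, S = 8) and takes t values from a table, because the standard library has no t quantile function. \( t_{0.025,15} = 2.131 \) is the t-table example; \( t_{0.05,15} = 1.753 \) is the standard one-sided table value.

```python
from math import sqrt

def t_intervals(xbar, s, n, t_half_alpha, t_alpha):
    """Two-sided and one-sided CIs on mu with sigma unknown (normal data).

    t_half_alpha = t_{alpha/2, n-1} and t_alpha = t_{alpha, n-1} come from a t table.
    """
    se = s / sqrt(n)                      # estimated standard error S / sqrt(n)
    two_sided = (xbar - t_half_alpha * se, xbar + t_half_alpha * se)
    upper_bound = xbar + t_alpha * se     # upper 1-sided CI: mu <= upper_bound
    lower_bound = xbar - t_alpha * se     # lower 1-sided CI: lower_bound <= mu
    return two_sided, upper_bound, lower_bound

# Hypothetical sample summary: n = 16, X-bar = 50, S = 8, alpha = 0.05.
(lo, hi), ub, lb = t_intervals(50.0, 8.0, 16, t_half_alpha=2.131, t_alpha=1.753)
print(f"two-sided:     {lo:.3f} <= mu <= {hi:.3f}")
print(f"upper 1-sided: mu <= {ub:.3f}")
print(f"lower 1-sided: {lb:.3f} <= mu")
```

The two-sided half-width here is 2.131 × 2 = 4.262. With z = 1.96 and the same standard error, it would be 3.92, so the t interval is wider, as the slides note.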

t-distribution table
• The shaded area is equal to α for \( t = t_{\alpha,\nu} \), where ν = degrees of freedom.
• Example: n = 16, α = 0.05 ⇒ \( t_{\alpha/2,n-1} = t_{0.025,15} = 2.131 \)

Confidence Interval on the Variance of a Normal Distribution
Summary of Procedure
• Data Distribution: Normal, i.e., X ~ N(μ, σ²)
• Population parameters:
  – Variance, σ²: unknown → to be estimated
• Sample Size: any
• Sample Statistic: Sample variance
• Sampling Distribution: Chi-squared Distribution
• Because the chi-square distribution is asymmetric, the confidence interval bounds are not symmetric.
• Confidence Intervals (CIs):
  – 2-sided CI: \( \frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}} \)
  – Upper 1-sided CI: \( \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha,n-1}} \)
  – Lower 1-sided CI: \( \frac{(n-1)S^2}{\chi^2_{\alpha,n-1}} \le \sigma^2 \)
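
A small worked instance of the two-sided variance CI shows the asymmetry. The sketch assumes a hypothetical sample with n = 16 and S² = 64. The chi-square points for 15 degrees of freedom are taken from a standard table (27.488 and 6.262), because the standard library does not provide chi-square quantiles.

```python
# Chi-square table values for n - 1 = 15 df, alpha = 0.05 (from a standard table).
CHI2_UPPER = 27.488   # chi^2_{alpha/2, n-1}:   upper 2.5% point
CHI2_LOWER = 6.262    # chi^2_{1-alpha/2, n-1}: lower 2.5% point

def variance_ci(s2, n, chi2_upper, chi2_lower):
    """Two-sided CI on sigma^2: (n-1)S^2/chi2_upper <= sigma^2 <= (n-1)S^2/chi2_lower."""
    ss = (n - 1) * s2
    return ss / chi2_upper, ss / chi2_lower

lo, hi = variance_ci(s2=64.0, n=16, chi2_upper=CHI2_UPPER, chi2_lower=CHI2_LOWER)
print(f"{lo:.2f} <= sigma^2 <= {hi:.2f}")   # 34.92 <= sigma^2 <= 153.31
```

The point estimate 64 is much closer to the lower limit than to the upper one. This is the asymmetry caused by the skewed chi-square distribution.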

05/2.19) = 2.0.1-0.19.19) = 0. 24 Sampling: the F distribution MATLAB >> icdf('F'.5265 >> icdf('F'.05/2.19.3958 .

Hypothesis Testing
• Null hypothesis (H0, pronounced "H nought"): \( H_0: \mu = 1.10 \)
• Alternative hypothesis: \( H_1: \mu \ne 1.10 \)
A hypothesis test is a procedure for determining if an assertion about a characteristic of a population is reasonable.
Example 1: The mean monthly cell phone bill in this city is μ = $42.
Example 2: The proportion of adults in this city with cell phones is p = 0.68.
Example 3: Suppose that someone says that the average price of a liter of regular unleaded gas in Montreal is $1.10. How would you decide whether this statement is true? You could try to find out what every gas station in the city was charging and how many liters they were selling at that price. That approach might be definitive, but it could end up costing more than the information is worth. A simpler approach is to find out the price of gas at a small number of randomly chosen stations around the city and compare the average price to $1.10. Of course, the average price you get will probably not be exactly $1.10, due to variability in price from one station to the next. Suppose your average price was $1.18. Is this eight-cent difference a result of chance variability, or is the original assertion incorrect? A hypothesis test can provide an answer.

Hypothesis Test Terminology
• The null hypothesis is always about a population parameter, not about a sample statistic: write \( H_0: \mu = 3 \), not \( H_0: \bar{X} = 3 \).
• The significance level is related to the degree of certainty you require in order to reject the null hypothesis in favor of the alternative. By taking a small sample you cannot be certain about your conclusion, so you decide in advance to reject the null hypothesis if the probability of observing a result at least as extreme as your sampled result is less than the significance level. For a typical significance level of 5%, the notation is α = 0.05. For this significance level, the probability of incorrectly rejecting the null hypothesis when it is actually true is 5%. If you need more protection from this error, choose a lower value of α.
• The p-value is the probability of observing a result at least as extreme as the given sample result, under the assumption that the null hypothesis is true. If the p-value is less than α, then you reject the null hypothesis. For example, if α = 0.05 and the p-value is 0.03, then you reject the null hypothesis. The converse is not true: if the p-value is greater than α, you have insufficient evidence to reject the null hypothesis, which does not show that it is true.

Type I and Type II Errors
• Since hypothesis tests are based on sample data, we must allow for the possibility of errors.
• A Type I error is rejecting H0 when it is true.
  – The person conducting the hypothesis test specifies the maximum allowable probability of making a Type I error, denoted by α and called the level of significance.
• A Type II error is accepting H0 when it is false.
  – Generally, we cannot control the probability of making a Type II error, denoted by β.
  – Statisticians avoid the risk of making a Type II error by using "do not reject H0" and not "accept H0".
• P(Type I error) = α
• P(Type II error) = β
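
The claim that P(Type I error) = α can be checked by running a two-sided z-test many times on data generated with H0 true. In that setting every rejection is a Type I error. The population values below (μ0 = 100, σ = 15, n = 20) are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(6)
MU0, SIGMA, N = 100.0, 15.0, 20     # hypothetical H0 mean, known sigma, sample size
ALPHA, REPS = 0.05, 20_000
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # Z_{alpha/2}

rejections = 0
for _ in range(REPS):
    # Data are generated with H0 true (mu = MU0), so every rejection is a Type I error.
    xbar = fmean(random.gauss(MU0, SIGMA) for _ in range(N))
    z0 = (xbar - MU0) / (SIGMA / sqrt(N))
    if abs(z0) > z_crit:
        rejections += 1

type1_rate = rejections / REPS
print(f"estimated P(Type I error) = {type1_rate:.4f}   (alpha = {ALPHA})")
```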

Inference on the mean of a population, variance known
\[ H_0: \mu = \mu_0 \qquad H_1: \mu \ne \mu_0 \tag{3-22} \]
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \tag{3-23} \]
• H1 in equation (3-22) is a two-sided alternative hypothesis.
• The procedure for testing this hypothesis is to:
  – take a random sample of n observations on the random variable x,
  – compute the test statistic Z0, and
  – reject H0 if |Z0| > Z_{α/2}, where Z_{α/2} is the upper α/2 percentage point of the standard normal distribution.
• In some situations we may wish to reject H0 only if the true mean is larger than μ0.
  – Thus, the one-sided alternative hypothesis is H1: μ > μ0, and we would reject H0: μ = μ0 only if Z0 > Z_α.
• If rejection is desired only when μ < μ0:
  – Then the alternative hypothesis is H1: μ < μ0, and we reject H0 only if Z0 < −Z_α.
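
The three rejection rules can be collected in a single function that also reports the p-value. The Python sketch uses the standard library's `NormalDist`, and the function name `z_test` is illustrative. To keep the numbers familiar, it reuses the resistance data with a hypothetical null value μ0 = 2.0 ohms.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Test H0: mu = mu0 with sigma known; return (Z0, p-value, reject_H0)."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))                 # test statistic, eq. (3-23)
    if alternative == "two-sided":                        # H1: mu != mu0
        reject = abs(z0) > STD_NORMAL.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z0)))
    elif alternative == "greater":                        # H1: mu > mu0
        reject = z0 > STD_NORMAL.inv_cdf(1 - alpha)
        p_value = 1 - STD_NORMAL.cdf(z0)
    elif alternative == "less":                           # H1: mu < mu0
        reject = z0 < -STD_NORMAL.inv_cdf(1 - alpha)
        p_value = STD_NORMAL.cdf(z0)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z0, p_value, reject

# Hypothetical test on the resistance data: H0: mu = 2.0 ohms.
for alt in ("two-sided", "greater", "less"):
    z0, p, reject = z_test(2.20, 2.0, 0.35, 11, alternative=alt)
    print(f"{alt:>9}: Z0 = {z0:.4f}, p = {p:.4f}, reject H0: {reject}")
```

Here Z0 ≈ 1.895. The two-sided test does not reject (p ≈ 0.058), which agrees with 2.0 lying inside the 95% CI [1.9932, 2.4068]. The one-sided test against H1: μ > 2.0 does reject (p ≈ 0.029).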

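Confidence interval on the mean, variance known
Furthermore, a 100(1 − α)% upper confidence bound on µ is
\[ \mu \le \bar{X} + z_{\alpha}\frac{\sigma}{\sqrt{n}} \]
whereas a 100(1 − α)% lower confidence bound on µ is
\[ \bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \le \mu \]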
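
The one-sided bounds use \( z_{\alpha} \) in place of \( z_{\alpha/2} \). The sketch below applies them to the resistance example (\( \bar{X} \) = 2.20, σ = 0.35, n = 11). The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_one_sided_bounds(xbar, sigma, n, alpha=0.05):
    """100(1 - alpha)% upper and lower confidence bounds on mu, sigma known."""
    z_a = NormalDist().inv_cdf(1 - alpha)     # z_alpha, about 1.645 for alpha = 0.05
    se = sigma / sqrt(n)
    upper = xbar + z_a * se    # upper bound: mu <= upper
    lower = xbar - z_a * se    # lower bound: lower <= mu
    return upper, lower

upper, lower = z_one_sided_bounds(2.20, 0.35, 11)
print(f"mu <= {upper:.4f}   and   {lower:.4f} <= mu")
```

Because \( z_{0.05} < z_{0.025} \), each one-sided bound (about 2.3736 and 2.0264) sits closer to \( \bar{X} \) than the corresponding two-sided limit.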

A Summary of Forms for Null and Alternative Hypotheses about a Population Mean
• The equality part of the hypotheses always appears in the null hypothesis.
• In general, a hypothesis test about the value of a population mean μ must take one of the following three forms (where μ0 is the hypothesized value of the population mean):
  – One-tailed: H0: μ ≥ μ0 vs. H1: μ < μ0
  – One-tailed: H0: μ ≤ μ0 vs. H1: μ > μ0
  – Two-tailed: H0: μ = μ0 vs. H1: μ ≠ μ0

Example

Example

F-test statistic

Introduction to control charts
• Principal purpose: early detection of an "out-of-control" process
• A process is out of control if it is producing items which are
  – off target, or
  – too variable
• An out-of-control process is likely to produce many nonconforming items
• If an assignable cause can be found, the process can be corrected and brought back into control
• A capable, in-control process will produce fewer nonconforming items
• Basic principle: samples of measurements are periodically taken at one or more stages of a production process to provide data for the monitoring of the process
• Based on each sample, a statistic is computed and plotted against time
• The result is a time series of the observed statistic values

Control Charts
• A Control Chart is a graphical method to spot assignable cause variation quickly.
• A control chart is like a hypothesis test.
• Two important lines on a control chart are the upper control limit (UCL) and lower control limit (LCL).
• These lines are chosen so that when the process is in control, there will be a high probability that the sample finding will be between the two lines.
• A range is specified within which the statistic is likely to have come from the same distribution as the preceding data.
• Values outside of the control limits provide strong evidence that the process is out of control.
• Control Limits are used to determine if the process is in a state of statistical control (i.e., is producing consistent output).
• Specification Limits are used to determine if the product will function in the intended fashion.

Control charts and hypothesis testing
• Null hypothesis H0: process is in-control
• Alternative hypothesis H1: process is out-of-control
• When a point plots within the control limits, the null hypothesis is not rejected
• When a point plots outside the control limits, the null hypothesis is rejected
• Type I error:
  1. Rejecting the null hypothesis when it is true
  2. Concluding the process is out of control when it isn't
  3. False alarm: an in-control point plots outside the control limits
• Type II error:
  1. Not rejecting the null hypothesis when it is false
  2. Failing to detect an out-of-control condition: an out-of-control point plots inside the control limits
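
Both error types can be quantified for a chart with 3-sigma limits. The sketch rests on three assumptions: normally distributed subgroup means, known center line and σ, and a sustained mean shift of δσ. It computes the false-alarm probability α per plotted point. It also computes β, the chance that a subgroup mean of size n still plots inside the limits after the shift. The shift size and n = 5 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal cdf

# Type I error (false alarm) per plotted point with 3-sigma limits, process in control.
alpha = 2 * (1 - PHI(3.0))

def beta_for_shift(delta, n, k_sigma=3.0):
    """Type II error for one subgroup mean after the mean shifts by delta * sigma.

    With sigma_xbar = sigma / sqrt(n), the shifted mean lies delta * sqrt(n)
    standard errors from the center line; beta is the probability that the
    plotted mean still falls inside the +/- k_sigma limits.
    """
    shift = delta * sqrt(n)
    return PHI(k_sigma - shift) - PHI(-k_sigma - shift)

print(f"alpha = {alpha:.4f}")
print(f"beta (1-sigma shift, n = 5) = {beta_for_shift(1.0, 5):.4f}")
```

With these settings, α ≈ 0.0027, so false alarms are rare. A single subgroup still misses a 1σ shift about 78% of the time.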

Types of Control Charts
• An \( \bar{x} \) chart is used if the quality of the output is measured in terms of a variable such as length, weight, temperature, and so on.
  – \( \bar{x} \) represents the mean value found in a sample of the output.
• An R chart is used to monitor the range of the measurements in the sample.
• A p chart is used to monitor the proportion defective in the sample.
• An np chart is used to monitor the number of defective items in the sample.

>> controlchart(data,'chart','xbar','sigma','range','rules','we6');

Shewhart Control Charts
• Suppose we have a general statistic W
• We plot W over time
• We specify control limits of the form
\[ \mathrm{UCL} = \mu_W + 3\sigma_W, \qquad \mathrm{CL} = \mu_W, \qquad \mathrm{LCL} = \mu_W - 3\sigma_W \]
  where μ_W is the mean of W and σ_W is the standard deviation of W
• A control chart based on a number of standard deviations of the statistic from the mean of the statistic is called a Shewhart Control Chart
• Some commonly used W's:
  – X-bar: average
  – R: range
  – s: standard deviation
• We can also specify control charts using probability limits
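
The general Shewhart recipe needs only the mean and standard deviation of the plotted statistic W. The sketch below computes the three lines and flags points outside them. The values of μ_W, σ_W and the series of W are hypothetical, and the function names are illustrative.

```python
def shewhart_limits(mu_w, sigma_w, k=3.0):
    """Return (LCL, CL, UCL) = (mu_W - k*sigma_W, mu_W, mu_W + k*sigma_W)."""
    return mu_w - k * sigma_w, mu_w, mu_w + k * sigma_w

def out_of_limits(points, lcl, ucl):
    """Indices of plotted values of W that fall outside [LCL, UCL]."""
    return [i for i, w in enumerate(points) if w < lcl or w > ucl]

# Hypothetical example: W has mean 10 and standard deviation 0.5.
lcl, cl, ucl = shewhart_limits(10.0, 0.5)
w_series = [10.2, 9.8, 10.9, 11.7, 9.1, 8.4]
print((lcl, cl, ucl), out_of_limits(w_series, lcl, ucl))   # (8.5, 10.0, 11.5) [3, 5]
```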

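X-bar Control Charts
• We don't know μ and σ, so we must estimate them
• If we have m subgroups, with averages \( \bar{X}_1, \bar{X}_2, \ldots, \bar{X}_m \), then the best estimate for μ is
\[ \bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \cdots + \bar{X}_m}{m} \]
• Suppose we have subgroup ranges \( R_1, R_2, \ldots, R_m \), each computed as \( X_{\max} - X_{\min} \), with average
\[ \bar{R} = \frac{R_1 + R_2 + \cdots + R_m}{m} \]
• It turns out that \( \bar{R} \) is a biased estimator of σ, with biasing term d2 (that is, \( E(\bar{R}) = d_2\sigma \))
• So an unbiased estimator is given by \( \hat{\sigma} = \bar{R}/d_2 \)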

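X-bar Control Charts
• Control Limits
\[ \mathrm{UCL} = \mu_{\bar{X}} + 3\sigma_{\bar{X}}, \qquad \mathrm{CL} = \mu_{\bar{X}}, \qquad \mathrm{LCL} = \mu_{\bar{X}} - 3\sigma_{\bar{X}} \]
• Therefore, estimating μ by \( \bar{\bar{X}} \) and \( \sigma_{\bar{X}} = \sigma/\sqrt{n} \) by \( \bar{R}/(d_2\sqrt{n}) \):
\[ \mathrm{UCL} = \bar{\bar{X}} + \frac{3}{d_2\sqrt{n}}\bar{R} = \bar{\bar{X}} + A_2\bar{R}, \qquad \mathrm{CL} = \bar{\bar{X}}, \qquad \mathrm{LCL} = \bar{\bar{X}} - \frac{3}{d_2\sqrt{n}}\bar{R} = \bar{\bar{X}} - A_2\bar{R} \]
  where \( A_2 = \frac{3}{d_2\sqrt{n}} \)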
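
Putting the two X-bar slides together, the limits can be computed straight from raw subgroups. The sketch assumes equal subgroup sizes and uses tabulated d2 constants for n = 2 to 5 from standard SPC tables. The measurements are hypothetical.

```python
from math import sqrt
from statistics import fmean

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}   # tabulated d2 constants by subgroup size n

def xbar_chart_limits(subgroups):
    """X-bar chart limits from m equal-size subgroups, using R-bar / d2 to estimate sigma."""
    n = len(subgroups[0])
    grand_mean = fmean(fmean(g) for g in subgroups)      # X-double-bar, estimates mu
    r_bar = fmean(max(g) - min(g) for g in subgroups)    # R-bar, average subgroup range
    a2 = 3 / (D2[n] * sqrt(n))                           # A2 = 3 / (d2 * sqrt(n))
    return {
        "LCL": grand_mean - a2 * r_bar,
        "CL": grand_mean,
        "UCL": grand_mean + a2 * r_bar,
        "sigma_hat": r_bar / D2[n],                      # unbiased estimate of sigma
    }

# Hypothetical measurements: m = 3 subgroups of size n = 4.
subgroups = [[5.0, 5.2, 4.9, 5.1], [5.1, 5.3, 5.0, 5.2], [4.8, 5.2, 5.1, 4.9]]
limits = xbar_chart_limits(subgroups)
print(limits)
```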

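R Control Charts
• We are looking to make control charts of the form
\[ \mathrm{LCL} = \mu_R - 3\sigma_R, \qquad \mathrm{UCL} = \mu_R + 3\sigma_R \]
• The best estimate for μ_R is \( \bar{R} \)
• What about σ_R? It turns out that \( \sigma_R = d_3\sigma \), so
\[ \hat{\sigma}_R = d_3\frac{\bar{R}}{d_2} \]
• Thus
\[ \mathrm{LCL} = \bar{R} - 3d_3\frac{\bar{R}}{d_2} = \left(1 - 3\frac{d_3}{d_2}\right)\bar{R} = D_3\bar{R}, \qquad \mathrm{UCL} = \bar{R} + 3d_3\frac{\bar{R}}{d_2} = \left(1 + 3\frac{d_3}{d_2}\right)\bar{R} = D_4\bar{R} \]
• When \( 1 - 3d_3/d_2 < 0 \) (subgroup sizes n ≤ 6), D3 is set to 0, since a range cannot be negative.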
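
The chart factors A2, D3 and D4 follow from d2 and d3 by the formulas above. The sketch below recomputes them for n = 2 to 5 from tabulated (d2, d3) values taken from standard SPC tables. The results agree with the usual published factors to within rounding of the inputs.

```python
from math import sqrt

# Tabulated (d2, d3) bias-correction constants by subgroup size n.
D2_D3 = {2: (1.128, 0.853), 3: (1.693, 0.888), 4: (2.059, 0.880), 5: (2.326, 0.864)}

def chart_factors(n):
    """Return (A2, D3, D4) computed from d2 and d3 for subgroup size n."""
    d2, d3 = D2_D3[n]
    a2 = 3 / (d2 * sqrt(n))
    lower = max(0.0, 1 - 3 * d3 / d2)   # D3, truncated at 0 because a range cannot be negative
    upper = 1 + 3 * d3 / d2             # D4
    return a2, lower, upper

for n in sorted(D2_D3):
    a2, d3_factor, d4_factor = chart_factors(n)
    print(f"n={n}: A2={a2:.3f}  D3={d3_factor:.3f}  D4={d4_factor:.3f}")
```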


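Example
• \( \bar{x} \) chart:
\[ \mathrm{UCL} = \bar{\bar{x}} + A_2\bar{R} = 1.5056 + (0.577)(0.32521) = 1.69325 \]
\[ \text{Central line} = \bar{\bar{x}} = 1.5056 \]
\[ \mathrm{LCL} = \bar{\bar{x}} - A_2\bar{R} = 1.5056 - (0.577)(0.32521) = 1.31795 \]
• R chart:
\[ \mathrm{UCL} = D_4\bar{R} = (2.114)(0.32521) = 0.68749 \]
\[ \text{Central line} = \bar{R} = 0.32521 \]
\[ \mathrm{LCL} = D_3\bar{R} = (0)(0.32521) = 0 \]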
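
The arithmetic of the example can be reproduced directly. The sketch takes the example's \( \bar{\bar{x}} \) and \( \bar{R} \), together with the tabulated constants it uses (A2 = 0.577, D3 = 0, D4 = 2.114, the standard values for subgroups of size 5).

```python
A2, D3, D4 = 0.577, 0.0, 2.114    # tabulated constants used in the example (subgroup size 5)
xbarbar, rbar = 1.5056, 0.32521   # grand mean and average range from the example

xbar_limits = (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar)   # (LCL, CL, UCL)
r_limits = (D3 * rbar, rbar, D4 * rbar)                             # (LCL, CL, UCL)
print("X-bar chart LCL, CL, UCL:", [round(v, 5) for v in xbar_limits])
print("R chart     LCL, CL, UCL:", [round(v, 5) for v in r_limits])
```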

X-bar chart using MATLAB
>> load parts
>> controlchart(runout,'chart','xbar','sigma','range','rules','we6');
Interpreting Charts:
• Observations outside control limits indicate the process is probably "out-of-control"
• Significant patterns in the observations indicate the process is probably "out-of-control"
• Random causes will on rare occasions indicate the process is probably "out-of-control" when it actually is not