Nov 22, 2018

❖ Y = f(X)

❖ where Y is the dependent variable, or the result (output)

❖ X is the independent variable: the input, or the controllable variable

❖ Example: marks obtained by students in a subject (Y) vs hours of study (X)

Correlation

❖ Demonstration:

❖ Calculate Pearson’s Correlation coefficient using MS Excel

Excel output (correlation matrix):

           Column 1      Column 2
Column 1   1
Column 2   0.879350768   1
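The same coefficient can be computed outside Excel. A minimal Python sketch, using the hours-studied vs test-score data tabulated in the regression section of this deck:

```python
import math

# Hours studied (X) vs test score % (Y): the data set used in the
# regression section of this deck.
x = [20, 24, 46, 62, 22, 37, 45, 27, 65, 23]
y = [40, 55, 69, 83, 27, 44, 61, 33, 71, 37]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Pearson's r from the raw-score (computational) formula
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 4))  # 0.8794, matching the Excel output above
```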

Correlation Coefficient

❖ Correlation

❖ Measures the strength of linear

relationship between Y and X

❖ Pearson Correlation Coefficient, r (r

varies between -1 and +1)

❖ Perfect positive relationship: r = 1

❖ No relationship: r = 0

❖ Perfect negative relationship: r = -1


Correlation vs Causation

❖ Correlation does not imply causation

❖ a correlation between two variables does

not imply that one causes the other

Correlation – Confidence Interval

❖ Population correlation (ρ) – usually

unknown

❖ Sample correlation (r)

Correlation – Confidence Interval

❖ Since r is not normally distributed, there

are three steps to find out confidence

interval

❖ Convert r to z’ (Fisher’s Transformation)

❖ Calculate confidence interval in terms of z’

❖ Convert confidence interval back to r

❖ z’ = .5[ln(1+r) – ln(1-r)]

❖ Variance of z’ = 1/(N-3)

Correlation – Confidence Interval

❖ N=10, r=0.88 find confidence interval

❖ Step 1.

❖ Convert r to z’

❖ z’ = .5[ln(1+r) – ln(1-r)]

❖ z’ = .5[ln(1+0.88) – ln(1-0.88)]

❖ z’ = .5[0.63 – (-2.12)] = 1.375

Correlation – Confidence Interval

❖ N=10, r=0.88 find confidence interval

❖ Step 2. Confidence interval for z’

❖ Variance = 1/(N-3) = 1/7 = 0.1428

❖ Standard error = Sqrt (0.1428) = 0.378

❖ 95% confidence Z = 1.96

❖ CI = 1.375 +/- (1.96)(0.378)

❖ Lower Limit = 0.635

❖ Upper Limit = 2.11

Correlation – Confidence Interval

❖ N=10, r=0.88 find confidence interval

❖ Step 3. Convert back to r

❖ z’ Lower Limit = 0.635

❖ z’ Upper Limit = 2.11

❖ Convert back using z’ = .5[ln(1+r) – ln(1-r)]

❖ r Lower Limit = 0.56

❖ r Upper Limit = 0.97
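The three steps can be verified numerically. A minimal Python sketch of the Fisher-transformation interval (the back-transform of z’ to r is the hyperbolic tangent):

```python
import math

r, n = 0.88, 10

# Step 1: Fisher's transformation of r
z = 0.5 * (math.log(1 + r) - math.log(1 - r))   # about 1.376

# Step 2: 95% CI in z' space; the variance of z' is 1/(n - 3)
se = math.sqrt(1 / (n - 3))                     # about 0.378
lo_z, hi_z = z - 1.96 * se, z + 1.96 * se

# Step 3: convert back to r; the inverse transform is tanh
lo_r, hi_r = math.tanh(lo_z), math.tanh(hi_z)
print(round(lo_r, 2), round(hi_r, 2))  # 0.56 0.97
```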

Coefficient of Determination

❖ Coefficient of Determination, r²

❖ Proportion of the variance in the

dependent variable that is predictable

from the independent variable

❖ (varies from 0.0 to 1.0 or zero to 100%)

❖ None of the variation in Y is explained by X: r² = 0.0

❖ All of the variation in Y is explained by X: r² = 1.0

❖ r = 0.88 → r² = 0.77

Regression Analysis

❖ Quantifies the relationship between Y

and X (Y = a + bX)

Regression Analysis

❖ Quantifies the relationship between Y

and X (Y = a + bX)

Hours Studied (X)  Test Score % (Y)  XY  X²  Y²

20 40 800 400 1600

24 55 1320 576 3025

46 69 3174 2116 4761

62 83 5146 3844 6889

22 27 594 484 729

37 44 1628 1369 1936

45 61 2745 2025 3721

27 33 891 729 1089

65 71 4615 4225 5041

23 37 851 529 1369

SUM 371 520 21764 16297 30160

Regression Analysis

❖ Quantifies the relationship between Y

and X (Y = 15.79 + 0.97X)

Hours Studied (X)  Test Score % (Y)  XY  X²  Y²

20 40 800 400 1600

24 55 1320 576 3025

46 69 3174 2116 4761

62 83 5146 3844 6889

22 27 594 484 729

37 44 1628 1369 1936

45 61 2745 2025 3721

27 33 891 729 1089

65 71 4615 4225 5041

23 37 851 529 1369

SUM 371 520 21764 16297 30160

Regression Analysis

❖ For a student studying 50 hrs what is the

expected test score %?
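The least-squares coefficients, and the prediction for 50 hours of study, can be computed directly from the column sums above. A minimal Python sketch:

```python
x = [20, 24, 46, 62, 22, 37, 45, 27, 65, 23]   # hours studied
y = [40, 55, 69, 83, 27, 44, 61, 33, 71, 37]   # test score %
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Least-squares slope and intercept for Y = a + bX
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # about 0.976
a = (sum_y - b * sum_x) / n                                    # about 15.79

score_at_50 = a + b * 50
print(round(score_at_50, 1))  # 64.6
```

This matches the deck’s fitted line Y = 15.79 + 0.97X (the unrounded slope is about 0.976): a student studying 50 hrs can expect a score of about 64.6%.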

Residual Analysis

❖ Y = 15.79 + 0.97X

Residual Analysis – No pattern

[Residual plot: residuals vs X (0 to 70) scatter between about -15 and +20 with no visible pattern]

Confidence Interval - Slope

❖ Confidence interval

❖ 95% confidence interval, representing a

range of likely values for the mean

response.

❖ Prediction interval

❖ 95% prediction interval, represents a range

of likely values for a single new

observation.

Multivariate Tools

❖ Simple Linear Relation

❖ Y = a + bX

❖ Multiple Linear Regression

❖ Y = a + b₁X₁ + b₂X₂ + … + bₙXₙ

❖ Multicollinearity

❖ When two input variables (predictor

variables - Xs) are correlated.

❖ Multivariate

❖ Two or more dependent variables (Ys)

Multivariate Tools

❖ Factor analysis / Principal Component

Analysis

❖ Discriminant analysis

❖ Multiple analysis of variance (MANOVA)

Multivariate

❖ Application

❖ Climate: Min temp, max temp, humidity,

precipitation – for a day

❖ Medical – Systolic BP, Diastolic BP, Pulse

rate, Age – of a patient

Multivariate

❖ Application

❖ Classification of individuals – easy when there is a limited number of variables.

❖ Dimension reduction – reduce the number of variables to a manageable number.

❖ Cause-effect relationship

Multivariate

❖ Tools

❖ Classification of individuals

❖ Discriminant Analysis

❖ Dimension reduction

❖ Principal Component Analysis/ Factor

Analysis

❖ Cause-effect relationship

❖ MANOVA

Discriminant Analysis

❖ Explains how clusters are different

PCA / Factor Analysis

❖ Principal Component Analysis/ Factor

Analysis

❖ To reduce number of variables

❖ By grouping highly correlated variables

together

MANOVA

❖ The MANOVA (multivariate analysis of

variance)

❖ To analyze data that involves more than

one dependent variable at a time.

❖ Tests the effect of one or more

independent variables on two or more

dependent variables.

❖ MANOVA is simply an ANOVA with

several dependent variables.

Errors of Statistical Tests

True State of Nature

                           H0 is true           Ha is true
Conclusion
  Support H0 / Reject Ha   Correct Conclusion   Type II Error
  Support Ha / Reject H0   Type I Error         Correct Conclusion (Power)

Errors of Statistical Tests

                      Type I error (alpha)                Type II error (beta)
Name                  Producer’s risk /                   Consumer’s risk
                      Significance level
1 minus error is      Confidence level                    Power of the test
called
Example of fire       False fire alarm leading to         Missed fire leading to
alarm                 inconvenience                       disaster
Effect on process     Unnecessary cost increase due       Defects may be produced
                      to frequent changes
Control method        Usually fixed at a pre-determined   Usually controlled to < 10%
                      level: 1%, 5% or 10%                by appropriate sample size
Simple definition     Innocent declared guilty            Guilty declared innocent

Significance Level

Level of Confidence / Confidence Interval:

C = 0.90, 0.95, 0.99 (90%, 95%, 99%)

Level of Significance:

α = 1 – C (0.10, 0.05, 0.01)

Power

❖ Power = 1 – β (or 1 - type II error)

❖ Type II Error: Failing to reject null

hypothesis when null hypothesis is false.

❖ Power: Likelihood of rejecting the null hypothesis when the null hypothesis is false, i.e. the probability that the test will correctly reject the null hypothesis.

Alpha vs Beta

❖ On any single test the researcher cannot commit both a Type I and a Type II error; only one of the two can occur.

❖ As the value of α increases (say 0.01 to

0.05) β goes down and the Power of test

increases.

❖ To reduce both Type I and II errors

increase sample size.


Statistical Significance

❖ Case of a perfume making company:

❖ Mean Volume 150 cc and sd=2 cc

Practical Significance

❖ Practical significance of an experiment

tells us if there is any actionable

information from the result.

❖ Large samples can detect a statistically significant result for a very small difference. Such a small difference might not have practical significance.

Hypothesis Testing

1. State the Alternate Hypothesis.

2. State the Null Hypothesis.

3. Select a probability of error level (alpha

level). Generally 0.05

4. Select and compute the test statistic

(e.g t or z score)

5. Find the critical value of the test statistic.

6. Interpret the results.

Hypothesis Testing

❖ Lower Tail Test

❖ H0: μ ≥ 150cc

❖ Ha: μ < 150cc

❖ Upper Tail Test

❖ H0: μ ≤ 150cc

❖ Ha: μ > 150cc

Hypothesis Testing

❖ Two Tail Tests

❖ H0: μ = 150cc

❖ Ha: μ ≠ 150cc

Calculate Test Statistic

❖ Single observation: z = (x - μ) / σ

❖ Sample mean: z = (x̄ - μ) / (σ / √n)

Z Critical

❖ α = 0.05, single tail

❖ Z Critical = 1.645

Z Critical

❖ α = 0.01 two-tail means 0.005 on each tail. Z Critical = 2.575

❖ α = 0.05 two-tail means 0.025 on each tail. Z Critical = 1.96

❖ α = 0.10 two-tail means 0.05 on each tail. Z Critical = 1.645

•95% – Z Score = 1.96

•99% – Z Score = 2.576

p Value

❖ The p value is the lowest value of alpha for which the null hypothesis can be rejected. (It is the probability of observing a result at least this extreme if the null hypothesis were true, not the probability that the null hypothesis is correct.)

❖ If p = 0.01 you can reject the null hypothesis at α = 0.05.

❖ “If p is low, the null must go; if p is high, the null will fly.”

Sample Size

❖ n = (zα/2 · σ / ME)²

❖ n is sample size

❖ zα/2 is standard score

❖ α = 0.01 Z Critical = 2.575

❖ α = 0.05 Z Critical = 1.96

❖ α = 0.10 Z Critical = 1.645

❖ σ is standard deviation

❖ ME is the Margin of Error (shift to be

detected)

Sample Size

❖ In a perfume bottle filling machine with a mean of 150cc and s.d. of 2cc, what is the minimum sample size which at 95% confidence will confirm a mean shift greater than 0.5cc?

❖ n = (zα/2 · σ / ME)²

❖ zα/2 = 1.96, σ = 2cc, ME=0.5cc

❖ n = 61.47 → round up to 62
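A quick arithmetic check in Python (values as on this slide):

```python
import math

z = 1.96       # z(alpha/2) for 95% confidence
sigma = 2.0    # standard deviation, cc
me = 0.5       # shift to be detected, cc

n = (z * sigma / me) ** 2
print(round(n, 2), math.ceil(n))  # 61.47 62; round up so the margin is met
```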

Sample Size – SigmaXL Demo

Sample Size - Proportion

❖ n = (zα/2)² · p̂(1-p̂) / (Δp)²

❖ n is sample size

❖ zα/2 is standard score

❖ α = 0.01 Z Critical = 2.575

❖ α = 0.05 Z Critical = 1.96

❖ α = 0.10 Z Critical = 1.645

❖ p̂ is proportion rate

❖ Δp is the desired proportion interval

Sample Size - Proportion

Point vs Interval Estimates

❖ Point estimate:

❖ Summarize the sample by a single number

that is an estimate of the population

parameter.

❖ Interval estimate:

❖ A range of values within which, we believe,

the true parameter lies with high

probability.

Point Estimates

❖ Point estimate:

❖ Summarize the sample by a single number

that is an estimate of the population

parameter.

❖ The sample mean x̄ is a point estimate of

the population mean μ. The sample

proportion p is a point estimate of the

population proportion P.

Point vs Interval Estimates

❖ Interval estimate:

❖ A range of values within which, we believe,

the true parameter lies with high

probability.

❖ For example, a < x̄ < b is an interval

estimate of the population mean μ. It

indicates that the population mean is

greater than a but less than b.

Confidence Interval

❖ Factors affecting the width of

confidence interval

❖ sample size

❖ standard deviation

❖ confidence level

Confidence Interval

❖ When the population standard deviation is known, or sample size is ≥ 30:

❖ CI = x̄ +/- (Zα/2) * σ/√(n)

❖ Zα/2 = z table value for confidence level

❖ σ = standard deviation

❖ n = sample size


Confidence Interval

❖ The average income of 100 random residents of a city was found to be $42,000 per annum with a standard deviation of $5,000. Find the 95% confidence interval for the mean income.

❖ CI = x̄ +/- (Zα/2 )* σ/√(n).

❖ Zα/2 = z table value for confidence level,

❖ σ = standard deviation

❖ n = sample size.

Confidence Interval

❖ The average income of 100 random residents of a city was found to be $42,000 per annum with a standard deviation of $5,000. Find the 95% confidence interval for the mean income.

❖ CI = x̄ +/- (Zα/2 )* σ/√(n).

❖ Zα/2 = z table value for confidence level =

1.96

❖ σ = standard deviation = 5,000

❖ n = sample size = 100

•90% – Z Score = 1.645

•95% – Z Score = 1.96

•99% – Z Score = 2.576

Confidence Interval

❖ The average income of 100 random residents of a city was found to be $42,000 per annum with a standard deviation of $5,000. Find the 95% confidence interval for the mean income.

❖ CI = x̄ +/- (Zα/2 )* σ/√(n)

❖ CI = 42,000 +/- 1.96 * 5,000/√(100)

❖ CI = 42,000 +/- 980

❖ CI = 41020 to 42980
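The same interval in Python, pulling the z value from the standard normal distribution instead of a table. A minimal sketch:

```python
import math
from statistics import NormalDist

xbar, sigma, n = 42_000, 5_000, 100

# z value for 95% confidence from the standard normal distribution
z = NormalDist().inv_cdf(0.975)      # about 1.96

margin = z * sigma / math.sqrt(n)    # about 980
lo, hi = xbar - margin, xbar + margin
print(round(lo), round(hi))  # 41020 42980
```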

Confidence Interval

❖ When the population standard deviation is unknown and sample size is < 30:

❖ CI = x̄ +/- (tα/2) * s/√(n)

❖ tα/2 = t distribution value for the confidence level and (n-1) degrees of freedom

❖ s = sample standard deviation

❖ n = sample size

Confidence Interval

❖ The average income of 25 random residents of a city was found to be $42,000 per annum with a standard deviation of $5,000. Find the 95% confidence interval for the mean income.

❖ CI = x̄ +/- (tα/2 )* s/√(n).

❖ tα/2 = t distribution value for the confidence

level and (n-1) degrees of freedom

❖ s = sample standard deviation

❖ n = sample size.


Introducing t distribution

❖ Also known as Student’s t distribution

❖ Used when the sample size is small

and/or when the population variance is

unknown

❖ Calculated value

❖ t = [x̄ - μ ] / [ s / sqrt( n ) ]

❖ The form of the t distribution is

determined by its degrees of freedom

(n-1)

Confidence Interval

❖ The average income of 25 random residents of a city was found to be $42,000 per annum with a standard deviation of $5,000. Find the 95% confidence interval for the mean income.

❖ CI = x̄ +/- (tα/2 )* s/√(n).

❖ tα/2 = t distribution value for the confidence

level and (n-1) degrees of freedom = 2.064

❖ s = sample standard deviation = 5000

❖ n = sample size = 25

❖ CI = 42,000 +/- 2.064* 5000/√(25)

❖ CI = 42,000 +/- 2064

❖ CI = 39,936 to 44,064

Confidence Interval - Proportion

❖ CI = x̄ +/- (Zα/2 )* σ/√(n)

❖ CI = p +/- (Zα/2 )* √((p)(1-p)/n)

❖ np ≥ 5 and

❖ n(1 − p) ≥ 5

Confidence Interval - Proportion

❖ Out of 100 pieces sample inspected 10

were found to be defective. What is the

95% confidence interval for

proportions?

❖ CI = p +/- (Zα/2 )* √((p)(1-p)/n)

❖ p = 0.10, np = 100x0.10=10, n(1-p)=90

❖ Conditions np ≥ 5 and n(1 − p) ≥ 5

satisfied

❖ CI = 0.10+/- 1.96 √((0.10)(1-0.10)/100)

❖ CI = 0.10 +/- 0.0588 = 0.0412 to 0.1588
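The proportion interval in Python, including the normal-approximation check. A minimal sketch:

```python
import math

p, n = 10 / 100, 100   # 10 defectives out of 100
z = 1.96               # 95% confidence

# Normal approximation check: np >= 5 and n(1 - p) >= 5
assert n * p >= 5 and n * (1 - p) >= 5

margin = z * math.sqrt(p * (1 - p) / n)   # about 0.0588
lo, hi = p - margin, p + margin
print(round(lo, 4), round(hi, 4))  # 0.0412 0.1588
```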

Confidence Interval - Variation

❖ Confidence interval for variance: we first need the Chi-Square distribution.

Chi-Square Distribution

❖ Select a random sample of size n from a

normal population, having a standard

deviation equal to σ.

❖ The standard deviation in sample is

equal to s.

❖ The chi-square statistic for this sample can be calculated by:

❖ Χ² = (n - 1) · s² / σ²

Chi-Square Distribution

❖ Χ² = (n - 1) · s² / σ²

Chi-Square Distribution

❖ Χ² = (n - 1) · s² / σ²

❖ df = 24, Χ²(0.05) = 36.42, Χ²(0.95) = 13.848

Chi-Square Distribution

❖ Χ² = (n - 1) · s² / σ²

❖ df = 24, Χ²(0.05) = 36.42, Χ²(0.95) = 13.848

❖ For a sample of 25 perfume bottles, the variance was found to be 4. Find the CI of the population variance at 90% confidence.

❖ (25-1)(4)/36.42 and (25-1)(4)/13.848

❖ Between 2.636 and 6.93
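In Python, with the two chi-square critical values hard-coded from the table at df = 24. A minimal sketch:

```python
# 90% CI for the population variance, sample variance 4 from n = 25 bottles.
# Chi-square critical values are read from a table at df = 24:
chi2_upper = 36.42    # X2(0.05, 24)
chi2_lower = 13.848   # X2(0.95, 24)

n, s2 = 25, 4.0
lo = (n - 1) * s2 / chi2_upper   # about 2.64
hi = (n - 1) * s2 / chi2_lower   # about 6.93
print(round(lo, 3), round(hi, 3))  # 2.636 6.932
```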

Tests for Mean, Variance & Proportion

[Overview chart] Tests:

❖ One sample – z test, t test, p test, chi-square test (standard deviation)

❖ Two samples – z test, t test, paired t test, p test, F test (standard deviation)

❖ More than 2 samples – ANOVA

One Sample z Test

❖ Calculated value

❖ z = [x̄ - μ ] / [σ / sqrt( n ) ]

❖ Example: A machine fills perfume bottles at a mean of 150cc with an sd of 2cc. 100 bottles are randomly picked and the average volume was found to be 152cc. Has the mean volume changed? (95% confidence)

❖ zcalculated = (152-150)/[2 / sqrt( 100 ) ] =

2/0.2 = 10

❖ zcritical = ?

One Sample z Test

zcritical = 1.96

One Sample z Test

❖ Calculated value

❖ z = [x̄ - μ ] / [σ / sqrt( n ) ]

❖ Example: A machine fills perfume bottles at a mean of 150cc with an sd of 2cc. 100 bottles are randomly picked and the average volume was found to be 152cc. Has the mean volume changed? (95% confidence)

❖ zcalculated = (152-150)/[2 / sqrt( 100 ) ] =

2/0.2 = 10

❖ zcritical = 1.96; zcal > zcritical → Reject H0
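The complete test in Python. A minimal sketch:

```python
import math

mu0, sigma = 150, 2    # established mean and sd (cc)
xbar, n = 152, 100     # sample mean and sample size

z_cal = (xbar - mu0) / (sigma / math.sqrt(n))   # 10.0
z_crit = 1.96                                   # two tail, alpha = 0.05

print(z_cal, abs(z_cal) > z_crit)  # 10.0 True, so reject H0
```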

One Sample t Test

❖ Calculated value

❖ t = [x̄ - μ ] / [s / sqrt( n ) ]

❖ Example: A machine fills perfume bottles at a mean of 150cc. 4 bottles are randomly picked; the average volume was found to be 151cc and the sd of the sample was 2cc. Has the mean volume changed? (95% confidence)

❖ tcal = (151-150)/[2 / sqrt( 4 ) ] = 1/1 = 1

❖ tcritical = ?

One Sample t Test

tcritical = 3.182

One Sample t Test

❖ Calculated value

❖ t = [x̄ - μ ] / [s / sqrt( n ) ]

❖ Example: A machine fills perfume bottles at a mean of 150cc. 4 bottles are randomly picked; the average volume was found to be 151cc and the sd of the sample was 2cc. Has the mean volume changed? (95% confidence)

❖ tcal = (151-150)/[2 / sqrt( 4 ) ] = 1/1 = 1

❖ tcritical = 3.182; tcal < tcritical → Fail to reject H0

One Sample p Test

❖ H0: p = p0

❖ Calculated value

was 21%, 100 samples were picked and

found 14 smokers. Has smoking habit

changed?

One Sample p Test

❖ Example: Smoking rate in a town in past

was 21%, 100 samples were picked and

found 14 smokers. Has smoking habit

changed at 95% confidence? (two tail)

❖ p0 = 0.21, p=0.14

❖ np0 = 0.21x100 = 21 and n(1-p0)= 0.79x100 = 79

❖ Both are ≥ 5, so the sample size is sufficient.

❖ z = (0.14 - 0.21)/sqrt(0.21×0.79/100)

❖ z = -0.07/0.0407 = -1.719

❖ z critical = 1.96; |zcal| < zcritical → Fail to reject H0

One Sample p Test

❖ Example: Smoking rate in a town in past

was 21%, 100 samples were picked and

found 14 smokers. Has smoking habit

reduced at 95% confidence? (one tail)

❖ H0: p ≥ p0, Ha: p < p0

❖ p0 = 0.21, p = 0.14

❖ z = (0.14 - 0.21)/sqrt(0.21×0.79/100)

❖ z = -0.07/0.0407 = -1.719

❖ z critical = -1.645; zcal < zcritical → Reject H0: the smoking rate has reduced

Tests for Mean, Variance & Proportion


Two Sample z Test

❖ Null hypothesis: H0: μ1 = μ2

❖ or H0: μ1 - μ2 = 0

❖ Alternative hypothesis: Ha: μ1 ≠ μ2

Two Sample z Test

❖ Example: From two machines 100

samples each were drawn.

❖ Machine 1: Mean = 151.2 / sd = 2.1

❖ Machine 2: Mean = 151.9 / sd = 2.2

❖ Is there a difference between these two machines? Check at 95% confidence level.

Two Sample z Test

❖ Example: From two machines 100

samples each were drawn.

❖ Machine 1: Mean = 151.2 / sd = 2.1

❖ Machine 2: Mean = 151.9 / sd = 2.2

❖ Is there a difference between these two machines? Check at 95% confidence level.

❖ Zcal = -0.7 / 0.304 = -2.30

❖ Zcritical = 1.96

❖ |Zcal| > Zcritical → Reject Null. There is a difference.
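The same test in Python. A minimal sketch:

```python
import math

m1, s1, n1 = 151.2, 2.1, 100   # machine 1
m2, s2, n2 = 151.9, 2.2, 100   # machine 2

se = math.sqrt(s1**2 / n1 + s2**2 / n2)    # about 0.304
z_cal = (m1 - m2) / se                     # about -2.30
print(round(z_cal, 2), abs(z_cal) > 1.96)  # -2.3 True, so reject H0
```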

Two Sample z Test

❖ Example: From two machines 100

samples each were drawn.

❖ Machine 1: Mean = 151.2 / sd = 2.1

❖ Machine 2: Mean = 151.9 / sd = 2.2

❖ Is there a difference of more than 0.3 cc between these two machines? Check at 95% confidence level.

❖ H0: μ2 - μ1 ≤ 0.3

❖ Ha: μ2 - μ1 > 0.3

Two Sample z Test

❖ Example: From two machines 100

samples each were drawn.

❖ Machine 1: Mean = 151.2 / sd = 2.1

❖ Machine 2: Mean = 151.9 / sd = 2.2

❖ Is there a difference of more than 0.3 cc between these two machines? Check at 95% confidence level.

❖ Zcal = [(151.2 - 151.9) - (-0.3)] / 0.304

❖ = -0.4 / 0.304 = -1.316

❖ Zcritical = 1.645 (one tail); |Zcal| < Zcritical

❖ Fail to reject the Null Hypothesis.

Two Sample t Test

❖ Check whether the two sets of data are independent or dependent.

❖ If the values in one sample reveal no information about those of the other sample, then the samples are independent.

❖ Example: Blood pressure of males vs females

❖ If the values in one sample affect those in the other sample, then the samples are dependent.

❖ Example: Blood pressure before and after a specific medicine

Two Sample t Test

❖ Check whether the two sets of data are independent or dependent.

❖ If the values in one sample reveal no information about those of the other sample, then the samples are independent.

❖ Example: Blood pressure of males vs females → Two sample t test

❖ If the values in one sample affect those in the other sample, then the samples are dependent.

❖ Example: Blood pressure before and after a specific medicine → Paired t test

Two Sample t Test

❖ Is the variance of the two samples equal? This decides which formula is used for finding out t.

Two Sample t Test

❖ Example: Samples from two machines A and B have the following volumes in bottles. Is the mean different? Calculate with 95% confidence.

Two Sample t Test

tcritical = 2.306

Two Sample t Test

❖ Assumptions: Normality, independent

random samples, population variances

are equal

Two Sample t Test

❖ What if the variance of the two samples is not equal?

 A    C
150  144
152  162
154  177
152  150
151  140

2 Sample t-Test

Test Information

H0: Mean Difference = 0

Ha: Mean Difference Not Equal To 0

Assume Unequal Variance

Results: A C

Count 5 5

Mean 151.80 154.60

Standard Deviation 1.483 15.027

Std Error Difference 6.753

DF 4.078

t -0.414644

P-Value (2-sided) 0.6997

Two Sample t Test

❖ Degrees of freedom are calculated by:

Results: A C

Count 5 5

Mean 151.80 154.60

Standard Deviation 1.483 15.027

Two Sample t Test

tcritical = 2.776

Two Sample t Test

❖ Minitab 17 output:

Two-sample T for A vs C

   N    Mean  StDev  SE Mean
A  5  151.80   1.48     0.66
C  5   154.6   15.0      6.7

Estimate for difference: -2.80

95% CI for difference: (-21.55, 15.95)

T-Test of difference = 0 (vs ≠): T-Value = -0.41 P-Value = 0.700

DF = 4

Paired t Test

❖ Where you have two samples in which

observations in one sample can be

paired with observations in the other

sample.

❖ Or

❖ If the values in one sample affect the values in the other sample, the samples are dependent.

❖ Example: Blood pressure before and after a

specific medicine

Paired t Test

❖ Find the difference between two set of

readings as d1, d2 …. dn.

❖ Find the mean and standard deviation of

these differences.

Paired t Test

❖ Example: Before and after medicine BP

was measured. Is there a difference at

95% confidence level?

Patient  Before  After
1        120     122
2        122     120
3        143     141
4        100     109
5        109     109

Paired t Test

❖ Example: Before and after medicine BP

was measured. Is there a difference at

95% confidence level?

Patient Before After difference

1 120 122 2

2 122 120 -2

3 143 141 -2

4 100 109 9

5 109 109 0

❖ tcal. = 1.4/2.04 = 0.69

Paired t Test

❖ Example: Before and after medicine BP

was measured. Is there a difference at

95% confidence level?

Patient Before After difference

1 120 122 2

2 122 120 -2

3 143 141 -2

4 100 109 9

5 109 109 0

❖ tcal = 1.4/2.04 = 0.69

❖ t0.025,4 = 2.776

❖ tcal < tcritical → Fail to reject the null hypothesis
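The paired test in Python, computing the differences and their standard deviation with the statistics module. A minimal sketch (the critical value is hard-coded from the t table):

```python
import math
from statistics import mean, stdev

before = [120, 122, 143, 100, 109]
after  = [122, 120, 141, 109, 109]

d = [a - b for a, b in zip(after, before)]   # 2, -2, -2, 9, 0
d_bar, s_d, n = mean(d), stdev(d), len(d)

t_cal = d_bar / (s_d / math.sqrt(n))   # about 0.69
t_crit = 2.776                         # t(0.025, 4 df) from the table
print(round(t_cal, 2), abs(t_cal) > t_crit)  # 0.69 False, fail to reject H0
```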

Tests for Mean, Variance & Proportion


Two Sample p Test

❖ Null hypothesis: H0: p1 = p2

❖ or H0: p1 - p2 = 0

❖ Alternative hypothesis: Ha: p1 ≠ p2

Two Sample p Test

❖ Normal approximation – Pooled

❖ p1 = 30/200 = 0.15, p2 = 10/100 = 0.10

❖ Pooled p = (30+10)/(200+100) = 0.1333

❖ Expected counts (n·p and n·(1-p)) are all ≥ 5

❖ Z = 0.0500 / sqrt((0.1333×0.8667)(1/200+1/100))

❖ Z = 0.0500/0.0416 = 1.20

Tests for Variance

❖ F-test

❖ for testing equality of two variances from

different populations

❖ for testing equality of several means with

technique of ANOVA.

❖ Chi-square test

❖ For testing the population variance against

a specified value

❖ testing goodness of fit of some probability

distribution

Two Sample Variance – F Test

❖ F-test

❖ for testing equality of two variances from

different populations

❖ F calculated = s1²/s2², with the larger variance on top

❖ Remember: Variance is square of standard

deviation

Two Sample Variance – F Test

❖ F critical

❖ Use table with appropriate degrees of

freedom

❖ For two tail test use the table for α/2

Two Sample Variance – F Test

❖ Example: We took 8 samples from

machine A and the standard deviation

was 1.1. For machine B we took 5

samples and the variance was 11. Is

there a difference in variance at 90%

confidence level?

❖ n1 = 8, s1 = 1.1, s1² = 1.21, df = 7 (denominator)

❖ n2 = 5, s2² = 11, df = 4 (numerator)

❖ F calculated = 11/1.21 = 9.09 (higher value on top)

Two Sample Variance – F Test

F critical = 4.1203

Two Sample Variance – F Test

❖ Example: We took 8 samples from machine A and the standard deviation was 1.1. For machine B we took 5 samples and the variance was 11. Is there a difference in variance at 90% confidence level?

❖ F critical = 4.1203

❖ n1 = 8, s1 = 1.1, s1² = 1.21, df = 7 (denominator)

❖ n2 = 5, s2² = 11, df = 4 (numerator)

❖ F calculated = 11/1.21 = 9.09 (higher value on top)

❖ Fcal > Fcritical → Reject H0

Two Sample Variance – F Test

❖ Right tail F critical = 4.1203

❖ Left tail F critical = ?

❖ Reverse the degrees of freedom and then take the inverse:

❖ F(4,7) = 4.1203

❖ F(7,4) = 6.0942

❖ The inverse, 1/6.0942, gives F = 0.164
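The F test in Python, with the critical value hard-coded from the F table. A minimal sketch:

```python
n1, s1 = 8, 1.1     # machine A: sd 1.1, so variance 1.21 (df = 7)
n2, var2 = 5, 11.0  # machine B: variance 11 (df = 4)

# Larger variance on top, so this is F with (4, 7) degrees of freedom
f_cal = var2 / s1**2    # about 9.09
f_crit = 4.1203         # F(0.05; 4, 7) from the table

print(round(f_cal, 2), f_cal > f_crit)  # 9.09 True, so reject H0
```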


One Sample Chi Square

❖ For testing the population variance against

a specified value σ

One Sample Chi Square

❖ Example: A sample of 25 bottles was selected. The variance of these 25 bottles was 5. Has it increased from the established 4? (95% confidence level)

❖ H0: σ² ≤ 4 / Ha: σ² > 4

❖ Χ² = 24×5/4 = 30

❖ What is the critical value at 24 degrees of freedom?


One Sample Chi Square

One Sample Chi Square

❖ Example: A sample of 25 bottles was selected. The variance of these 25 bottles was 5. Has it increased from the established 4? (95% confidence level)

❖ H0: σ² ≤ 4 / Ha: σ² > 4

❖ Χ² = 24×5/4 = 30

❖ Χ² critical at 24 degrees of freedom = 36.42

❖ Χ²cal < Χ²critical → Fail to reject H0

One Sample Chi Square

❖ SigmaXL Output


ANOVA

❖ F-test

❖ for testing equality of two variances from

different populations

❖ F calculated = a ratio of two variances

❖ Remember: Variance is square of standard

deviation

ANOVA

❖ Why ANOVA?

❖ We used t test to compare the means of

two populations.

❖ What if we need to compare more than two populations? With ANOVA we can find out if one or more populations have a different mean, or come from a different population.

❖ We could have conducted multiple t Test.

❖ How many t Tests would we need to conduct to compare 4 samples? … 6

ANOVA

❖ Why ANOVA?

❖ How many t Tests would we need to conduct to compare 4 samples? … 6

❖ Pairwise comparisons: 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4

ANOVA

❖ Why ANOVA?

❖ How many t Tests would we need to conduct to compare 4 samples? … 6

❖ Each test is done with alpha = 0.05 or 95%

confidence.

❖ 6 tests will result in confidence level of

0.95x0.95x0.95x0.95x0.95x0.95 = 0.735

ANOVA

❖ Comparing three machines:

Machine 1 Machine 2 Machine 3

150 153 156

151 152 154

152 148 155

152 151 156

151 149 157

150 152 155

x̄1 = 151 x̄2 = 150.83 x̄3 = 155.50

ANOVA

❖ Comparing three machines:

Machine 1  Machine 2  Machine 3
150        153        156
151        152        154
152        148        155
152        151        156
151        149        157
150        152        155
x̄1 = 151.00  x̄2 = 150.83  x̄3 = 155.50

[Box plot: Machines 1 to 3 with mean, median and 25th/75th percentiles; Machine 3 is clearly higher]

ANOVA

❖ Comparing six machines:

Machine 1  Machine 2  Machine 3  Machine 4  Machine 5  Machine 6
150        153        156        130        163        166
151        152        154        155        152        154
152        148        155        160        143        155
152        151        156        158        141        151
151        149        157        152        149        152
150        152        155        145        157        155

❖ Machines 4 to 6 have roughly the same means as Machines 1 to 3 (151.00, 150.83, 155.50) but a much wider spread.

[Box plots: Machines 1 to 3 vs Machines 4 to 6; similar means and medians, but Machines 4 to 6 show far larger spread and outliers]

ANOVA

❖ ANOVA is Analysis of Variance

❖ Variance is computed from sums of squares

❖ Total Sum of Squares (SST) = SS between (or treatment) + SS within (or error)

ANOVA

❖ SST = SS between(or treatment) +SS within(or error)

❖ Ratio:

SS between(or treatment) / SS within(or error)

ANOVA

❖ SST = SS between (or treatment) + SS within (or error)

Machine 1  Machine 2  Machine 3
150        153        156
151        152        154
152        148        155
152        151        156
151        149        157
150        152        155
x̄1 = 151.00  x̄2 = 150.83  x̄3 = 155.50

ANOVA

Machine 1  x1 - x̄1  Sqr(x1 - x̄1)  Machine 2  x2 - x̄2  Sqr(x2 - x̄2)  Machine 3  x3 - x̄3  Sqr(x3 - x̄3)
150.00     -1.00     1.00          153.00     2.17      4.69          156.00     0.50      0.25
151.00     0.00      0.00          152.00     1.17      1.36          154.00     -1.50     2.25
152.00     1.00      1.00          148.00     -2.83     8.03          155.00     -0.50     0.25
152.00     1.00      1.00          151.00     0.17      0.03          156.00     0.50      0.25
151.00     0.00      0.00          149.00     -1.83     3.36          157.00     1.50      2.25
150.00     -1.00     1.00          152.00     1.17      1.36          155.00     -0.50     0.25
Mean       151.00                  150.83                             155.50    (grand mean = 152.44)
Sum        4.00                    18.83                              5.50

❖ SS within = 4.00 + 18.83 + 5.50 = 28.33


ANOVA

❖ Degrees of freedom

❖ Total df = df treatment + df error

❖ (N-1) = (C-1) + (N-C)

❖ df treatment = 3-1 = 2, df error = 18-3 = 15

❖ df total = 17

ANOVA

❖ MS between = SS between (or treatment) / df treatment = 84.111/2 = 42.056

❖ MS within = SS within (or error) / df error = 28.333/15 = 1.889

ANOVA

❖ F = MS between / MS within = 42.056/1.889 = 22.265

❖ F (2, 15, 0.95) = 3.68 → Fcal > Fcritical: at least one mean differs

❖ DEMONSTRATE MS Excel

ANOVA

One-Way ANOVA & Means Matrix:
H0: Mean 1 = Mean 2 = ... = Mean k
Ha: At least one pair Mean i ≠ Mean j

                            Machine 1  Machine 2  Machine 3
Count                       6          6          6
Mean                        151.00     150.83     155.50
Standard Deviation          0.894427   1.941      1.048809
UC (2-sided, 95%, pooled)   152.20     152.03     156.70
LC (2-sided, 95%, pooled)   149.80     149.64     154.30

ANOVA Table
Source    SS       DF   MS       F        P-Value
Between   84.111   2    42.056   22.265   0.0000
Within    28.333   15   1.889
Total     112.44   17
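The whole ANOVA table can be reproduced from the raw machine data. A minimal Python sketch:

```python
from statistics import mean

machines = [
    [150, 151, 152, 152, 151, 150],   # machine 1
    [153, 152, 148, 151, 149, 152],   # machine 2
    [156, 154, 155, 156, 157, 155],   # machine 3
]

grand = mean(x for g in machines for x in g)
k = len(machines)
n_total = sum(len(g) for g in machines)

ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in machines)
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in machines)

ms_between = ss_between / (k - 1)        # df treatment = 2
ms_within = ss_within / (n_total - k)    # df error = 15
f = ms_between / ms_within
print(round(ss_between, 3), round(ss_within, 3), round(f, 3))  # 84.111 28.333 22.265
```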

ANOVA

❖ Practice Exercise: Fill in the values for ?A to ?E in this ANOVA Table:

ANOVA Table
Source    SS       DF   MS      F
Between   84.111   ?C   ?D      ?E
Within    ?A       15   1.889
Total     ?B       17

Pause the video and fill in the values. Once done, go ahead and start the video.

ANOVA

❖ ?A = 15x1.889 = 28.333

❖ ?B = 84.111+28.33 = 112.44

❖ ?C = 17-15 = 2

❖ ?D = 84.111/2 = 42.056

❖ ?E = 42.056/1.889 = 22.265

ANOVA Table
Source    SS            DF      MS           F
Between   84.111        ?C = 2  ?D = 42.056  ?E = 22.265
Within    ?A = 28.333   15      1.889
Total     ?B = 112.44   17

Goodness of Fit Test (Chi Square)

❖ To test if the sample is coming from a

population with specific distribution.

❖ Other goodness-of-fit tests are

❖ Anderson-Darling

❖ Kolmogorov-Smirnov

❖ Chi Square Goodness of Fit can be used for any type of data: continuous or discrete.

Goodness of Fit Test (Chi Square)

❖ H0: The data follow a specified

distribution.

❖ Ha: The data do not follow the specified

distribution.

❖ Calculated statistic: Χ² = Σ (O - E)²/E, compared with the Χ² critical value at the appropriate degrees of freedom for the specified alpha.

Goodness of Fit Test (Chi Square)

❖ A coin is flipped 100 times and the number of heads noted; this is repeated over 5 sets. Is this coin biased?

Expected Observed

50 51

50 52

50 56

50 82

50 65

Goodness of Fit Test (Chi Square)

❖ A coin is flipped 100 times and the number of heads noted; this is repeated over 5 sets. Is this coin biased?

Expected  Observed  O-E  (O-E)²  (O-E)²/E

50 51 1 1 0.02

50 52 2 4 0.08

50 56 6 36 0.72

50 82 32 1024 20.48

50 65 15 225 4.5

Χ² = 25.8

Goodness of Fit Test (Chi Square)

❖ A coin is flipped 100 times and the number of heads noted; this is repeated over 5 sets. Is this coin biased?

❖ Χ²cal = 25.8

❖ Χ²(4, 0.95) = 9.49

Goodness of Fit Test (Chi Square)

❖ A coin is flipped 100 times and the number of heads noted; this is repeated over 5 sets. Is this coin biased?

❖ Χ²cal = 25.8 > Χ²(4, 0.95) = 9.49

❖ Reject H0: the coin is biased
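The goodness-of-fit computation in Python. A minimal sketch:

```python
observed = [51, 52, 56, 82, 65]   # heads in each of 5 sets of 100 flips
expected = [50] * 5               # a fair coin expects 50 heads per set

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_crit = 9.49   # X2(0.05, 4 df) from the table

print(round(chi2, 1), chi2 > chi2_crit)  # 25.8 True, so reject H0
```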

Contingency Tables

❖ To find relationship between two

discrete variables.

           Smoker   Non-Smoker
Male       60       40           100
Female     35       40            75
           95       80           175

           Operator 1   Operator 2   Operator 3
Shift 1    22           26           23           71
Shift 2    28           62           26           116
Shift 3    72           22           66           160
           122          112          115          347

Contingency Tables

❖ Null hypothesis is that there is no

relationship between the row and

column variables.

❖ Alternate hypothesis is that there is a

relationship. Alternate hypothesis does

not tell what type of relationship exists.

           Operator 1   Operator 2   Operator 3
Shift 1    22           26           23           71
Shift 2    28           62           26           116
Shift 3    72           22           66           160
           122          112          115          347

Contingency Tables

❖ Calculate Chi square statistic.

           Operator 1   Operator 2   Operator 3
Shift 1    22           26           23           71
Shift 2    28           62           26           116
Shift 3    72           22           66           160
           122          112          115          347

Contingency Tables

❖ Calculate Chi square statistic.

OBSERVED
           Operator 1   Operator 2   Operator 3
Shift 1    22           26           23           71
Shift 2    28           62           26           116
Shift 3    72           22           66           160
           122          112          115          347

EXPECTED
           Operator 1      Operator 2      Operator 3
Shift 1    122x71/347      112x71/347      115x71/347      71
Shift 2    122x116/347     112x116/347     115x116/347     116
Shift 3    122x160/347     112x160/347     115x160/347     160
           122             112             115             347

Contingency Tables

❖ Calculate Chi square statistic.

EXPECTED
           Operator 1      Operator 2      Operator 3
Shift 1    122x71/347      112x71/347      115x71/347      71
Shift 2    122x116/347     112x116/347     115x116/347     116
Shift 3    122x160/347     112x160/347     115x160/347     160
           122             112             115             347

EXPECTED
           Operator 1   Operator 2   Operator 3
Shift 1    24.96        22.91        23.53        71
Shift 2    40.78        37.44        38.44        116
Shift 3    56.25        51.64        53.02        160
           122          112          115          347

Contingency Tables

❖ Calculate Chi square statistic.

OBSERVED
           Operator 1   Operator 2   Operator 3
Shift 1    22           26           23           71
Shift 2    28           62           26           116
Shift 3    72           22           66           160
           122          112          115          347

EXPECTED
           Operator 1   Operator 2   Operator 3
Shift 1    24.96        22.91        23.53        71
Shift 2    40.78        37.44        38.44        116
Shift 3    56.25        51.64        53.02        160
           122          112          115          347

(O-E)2/E
           Operator 1                   Operator 2   Operator 3
Shift 1    (22-24.96)2/24.96 = 0.35     0.42         0.01
Shift 2    (28-40.78)2/40.78 = 4.00     16.11        4.03
Shift 3    (72-56.25)2/56.25 = 4.41     17.01        3.18

X2 = 49.52

Contingency Tables

❖ Calculate Chi square statistic = 49.52

❖ Degrees of freedom = (r-1)(c-1) = 4

❖ Chi square critical = 9.49

❖ Reject null hypothesis

❖ There is a relationship between the shift

and the operator.

Contingency Tables

❖ Practice Exercise:

❖ Calculate the expected count for Male Non-Smokers.

❖ What will be the degrees of freedom in

this example?

           Smoker   Non-Smoker
Male       60       40           100
Female     35       40            75
           95       80           175

Contingency Tables

❖ Practice Exercise:

❖ Expected count for Male Non-Smokers = 80x100/175 = 45.71

❖ What will be the degrees of freedom in

this example? (2-1)(2-1)=1

           Smoker   Non-Smoker
Male       60       40           100
Female     35       40            75
           95       80           175

Parametric vs Non Parametric

❖ Parametric

❖ Makes assumptions about the population from which the sample has been drawn (e.g. normally distributed)

❖ Data is ratio or interval level

❖ Non Parametric

❖ Makes no assumption about the population

from which the sample has been drawn

❖ Works even with non-normal or small-sized data; no minimum sample size.

❖ Data is ratio, interval, nominal or ordinal level

❖ Less power (More likely to make Type II error)

Parametric vs Non Parametric

                                  Parametric        Non Parametric
Data Level                        Interval, Ratio   Nominal, Ordinal, Interval, Ratio
Measurement of central tendency   Mean              Median
Distribution                      Normal            Unknown

Parametric vs Non Parametric Tests

Parametric                  Non Parametric
1-sample t-test             1-sample Wilcoxon test
Independent Sample T Test   Mann-Whitney Test
Paired Sample T Test        Wilcoxon Signed-Rank Test
One-way ANOVA               Kruskal-Wallis Test

FMEA

❖ Failure Mode and Effect Analysis:

❖ The FMEA is a design tool used to systematically analyze potential failures and identify their effects.

❖ Identify

❖ Prioritize

FMEA Concept

Design FMEA: identifies failures associated with product life and safety hazards.
Process FMEA: identifies failures associated with process reliability and customer dissatisfaction.

Design FMEA covers System, Subsystem, and Component FMEAs.
Process FMEA covers Production and Assembly FMEAs (each at the System, Subsystem, and Component level).

FMEA

❖ Failure Mode and Effect Analysis:

❖ It is a proactive tool (used before the problem happens, not an after-the-fact analysis)

❖ It is a living document

FMEA

FMEA columns: Process / Requirement (KPIVs) | Failure Mode | Failure Effect | Severity (1-10) | Cause(s) of failure mode | Occurrence (1-10) | Current Controls | Detection (1-10) | RPN | Recommended actions

Example – Perfume Making (Receiving ingredients, Mixing):
• Failure mode: Inconsistent quality (Severity 8)
• Cause: Unclear specification (Occurrence 3); Control: review and approve specification by design (Detection 4); RPN = 8x3x4 = 96
• Cause: Substandard material supplied by supplier (Occurrence 6); Controls: third-party certification, in-house test lab (Detection 4); RPN = 8x6x4 = 192

FMEA

❖ Risk Priority Number (RPN)

❖ Severity (1-10) x Occurrence (1-10) x

Detection (1-10)

❖ Severity

❖ Severity 1 – No effect/ client might not

even notice it

❖ Severity 10 – Serious safety hazard

without warning

FMEA

❖ Occurrence

❖ Occurrence 1 – Rare event, no data of such

type of failure in past

❖ Occurrence 10 – Failure almost inevitable

❖ Detection

❖ Detection 1 – Current system almost certainly detects the problem (automation)

❖ Detection 10 – Current system cannot detect the problem

FMEA

❖ Identify key process steps

❖ Identify failure mode

❖ Identify failure effects/severity

❖ Identify causes/occurrence

❖ Identify controls /detection

❖ Calculate Risk Priority Number (RPN)

❖ Prioritize by RPN – Higher RPN first

❖ Determine action plan

❖ Recalculate RPN
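The RPN calculation and prioritization steps above can be sketched in Python (the failure causes and ratings are taken from the perfume-making example on the earlier slide; the data layout is illustrative):

```python
# Each entry: (cause, severity, occurrence, detection), all rated on a 1-10 scale
failure_modes = [
    ("Unclear specification",              8, 3, 4),
    ("Substandard material from supplier", 8, 6, 4),
]

# RPN = Severity x Occurrence x Detection; address the highest RPN first
ranked = sorted(
    ((cause, s * o * d) for cause, s, o, d in failure_modes),
    key=lambda item: item[1],
    reverse=True,
)

for cause, rpn in ranked:
    print(f"RPN {rpn:4d}  {cause}")
```

After the recommended actions are implemented, the same calculation is repeated with the new ratings to recalculate the RPN.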

FMEA

❖ Update FMEA when there is plan to

change / actual change of :

❖ Design

❖ Application

❖ Material

❖ Process

Gap Analysis

❖ Difference between

❖ Where we are, and

❖ Where we want to be

[Figure: Gap = the difference between the Current State and the Desired State]

Gap Analysis

❖ Defining Current State

❖ Internal Measurements

❖ SWOT Analysis (Strengths, Weaknesses, Opportunities and Threats)

❖ PEST Analysis (Political, Economic, Social, and Technological factors):

• Political – government intervention
• Economic – how the business operates
• Social – cultural aspects
• Technological – automation and innovation

Gap Analysis

❖ Defining Future State

❖ Benchmarking

Gap Analysis

❖ Bridging the gap

❖ Prioritization

❖ Hoshin Kanri (X Matrix) for Strategy

deployment

Commonly Used Gap Analysis

❖ Implementing ISO 9001 or other

Management Systems

❖ MBNQA, EFQM Excellence Model,

Deming Prize

Root Cause Analysis (RCA)

❖ RCA is a structured process to identify

root causes of an event that resulted in

an undesired outcome and develop

corrective actions.

Root Cause Analysis (RCA)

❖ 1. Identify the event to be investigated

and gather preliminary information (D)

❖ 2. Charter and select the team (D)

❖ 3. Describe what happened (M)

❖ 4. Identify the contributing factors (A)

❖ 5. Identify the root causes (A)

❖ 6. Identify and implement changes to

eliminate the root causes (I)

❖ 7. Measure and monitor the success (C)

Five Whys

Example: Oil spill on floor
Why? → Leakage from pump
Why? → Gasket damaged
Why? → Substandard gasket
Why? → Policy of ordering to the lowest bidder

[Figure: Five Whys diagram; side branches also note poor housekeeping and an old pump as contributing causes.]

Pareto Chart

[Figure: Pareto chart of course-feedback categories – Complaint, Concepts clear, Engaging instructor, Application opportunity, Meeting expectations, Instructor knowledgeable, Learning valuable – with frequency bars (0-10) and a cumulative-percentage line (0%-100%).]

Fault Tree Analysis

[Figure: Fault tree analysis diagram – image from Wikipedia]

Cause and Effect Diagrams

Types of Wastes

Philosophy

• Waste exists in all processes at all levels in the organization.

• Waste elimination is the key to successful

implementation of lean.

• Waste reduction is an effective way to

increase profitability.

Muda, Mura, Muri

Muda

an activity that is wasteful and

doesn't add value or is unproductive

Mura

Any variation leading to

unbalanced situations.

Muri

Any activity asking unreasonable stress or effort from personnel, material or equipment.

Muda

• Muda is a traditional Japanese term for

an activity that is wasteful and doesn't

add value or is unproductive

• Type I Muda: (Incidental Work)

• Non-value-added tasks which seem to be

essential. Business conditions need to be

changed to eliminate this type of waste.


• Type II Muda: (Non-Value-Added Work)

• Non-value-added tasks which can be

eliminated immediately.

Mura

• MURA: Any variation leading to

unbalanced situations.

• Mura exists when

• workflow is out of balance

• workload is inconsistent

• not in compliance with the standard.


Muri

• MURI: Any activity asking unreasonable

stress or effort from personnel, material

or equipment.

• For people, Muri means too heavy a mental

or physical burden.

• For machinery, Muri means expecting a machine to do more than it is capable of or has been designed to do.

Eight Types of Wastes

• Transportation – Unnecessary movement of people or parts between processes.
• Inventory – Materials parked and not having value added to them.
• Motion – Unnecessary movement of people or parts within a process.
• Waiting – People or parts waiting for a work cycle to finish.
• Over Processing – Processing beyond the demand from the customers.
• Over Production – Producing too much, too early and/or too fast.
• Defects – Sorting, repetition or making scrap.
• Under-utilized staff – Failure when it comes to exploiting the knowledge and talent of the employees.

Types of Wastes (TIMWOOD +1)

1. Transportation

2. Inventory

3. Motion

4. Wait time

5. Over-Processing

6. Over-Production

7. Defects

8. Under-utilized staff

1. Transportation

• Unnecessary movement of people or

parts between processes.


2. Inventory

• Materials parked and not having value added

to them.


3. Motion

• Unnecessary movement of people or parts

within a process.


4. Waiting time

• People or parts waiting for a work cycle to

finish.


5. Over processing

• Processing beyond the demand from the

customers.


6. Overproduction

• Producing too much, too early and/or too

fast.


7. Defects

• Sorting, repetition or making scrap


8. Unexploited knowledge

• Failure when it comes to exploiting the

knowledge and talent of the employees.

