
Basic Statistics

Contents
1. Data Types
2. Histogram
3. Pareto Chart
4. Measures of Central Tendency
5. Measures of Dispersion
6. Frequency Distribution
7. Variation
8. Cause and Effect Diagram
9. Process Stability
10. Process Capability Indices
11. Control Charts
12. Scatter Diagram

Chapter 1

Data Types

Types of Data
Data types:
- Variable data: can be measured on a continuous scale and is often modelled with a normal distribution. E.g., length, weight, temperature.
- Attribute data:
  - Binary data, e.g., yes or no (modelled with a binomial distribution)
  - Count data, e.g., number of defects (modelled with a Poisson distribution)
  - Categorical data, e.g., C-Sat ratings

Types Of Data
Continuum of Data Types
Exercise on Data Type

Discrete data:
- Binary: classified into one of two categories. Example: good or bad; pass or fail; yes or no.
- Ordered categories: rankings or ratings. Example: customer satisfaction rating of call center service.
- Count: counted discretely. Example: number of errors in an application.

Continuous data:
- Measured on a continuum or scale. Example: time (in hours) to process an application.

Chapter 2

Histograms

Maintenance Project
Time to Resolve Ticket (TTR) in hours

1.7 1.1 2.2 1.9 2.3 1.7 1.2 2.0 1.5 2.1

1.5 1.2 1.7 1.8 1.4 1.5 1.8 1.7 1.6 2.0

1.9 1.6 1.5 1.7 1.8 1.4 1.6 1.5 2.0 1.6

2.0 1.8 1.2 1.9 1.6 1.7 1.8 1.3 1.7 1.5

1.6 1.9 2.1 1.7 1.4 1.6 1.5 1.7 1.9 1.4

1.7 2.3 1.5 1.6 1.3 2.0 2.2 1.6 1.4 1.6

2.0 1.6 1.1 1.8 1.7 1.9 1.8 1.5 1.9 1.7

1.4 1.7 1.8 1.5 2.0 1.8 1.6 1.9 1.8 1.6

2.1 1.9 1.6 1.8 1.8 1.7 1.3 1.8 1.5 1.9

1.7 1.3 1.8 1.9 1.6 1.4 1.8 1.7 2.1 2.2

The Making of A Distribution Curve


Plotting data on a horizontal scale (using the TTR example).

[Figure: horizontal scale of TTR in hours, from 0.8 to 2.5]

The first ticket closed in 1.7 hrs, and the value is plotted on the scale.

As each new piece of data is collected, it too is plotted onto the scale. Note how variation is already showing.

Distribution Curve continued


A shape emerges: as each new measurement is plotted, a shape, or distribution, emerges. Note how most data gather near the center and the frequencies taper off away from the center.

[Figure: TTR values stacked on the 0.8 to 2.5 hr scale]

Distribution Curve continued

As each piece of data may be represented as a box, the shape tends to look like a pyramid.

In a histogram these boxes are normally replaced with bars to make it easier to see.

Distribution Curves continued

DISTRIBUTIONS may be shown as CURVES

A line can be drawn over the bars to give an overall shape of the distribution. This is called a DISTRIBUTION CURVE.

Histogram
The purpose of a histogram is to graphically summarize the distribution of a univariate data set. A histogram is a mapping that counts the number of observations that fall into various categories known as bins.
Interval | Freq.
<=0 | 0
(0, 2] | 0
(2, 4] | 22
(4, 6] | 15
(6, 8] | 14
(8, 10] | 15
(10, 12] | 9
(12, 14] | 8
(14, 16] | 11
(16, 18] | 13
(18, 20] | 10
(20, 22] | 7
(22, 24] | 0
(24, 26] | 0
>26 | 0

[Figure: histogram of the frequencies above; x-axis: interval, y-axis: Frequency]
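The frequency table above uses right-closed intervals such as (2, 4], plus catch-all bins at each end. Below is a minimal Python sketch of that counting rule. `right_closed_counts` is a hypothetical helper name, and the sample values are made up for illustration.

```python
from bisect import bisect_left


def right_closed_counts(data, edges):
    """Count observations into <=edges[0], (edges[i-1], edges[i]], ..., >edges[-1].

    `edges` must be sorted in ascending order. Returns (labels, counts).
    """
    labels = [f"<={edges[0]}"]
    labels += [f"({lo}, {hi}]" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">{edges[-1]}")

    counts = [0] * (len(edges) + 1)
    for x in data:
        # bisect_left returns i with edges[i-1] < x <= edges[i], which is
        # exactly the right-closed bin (edges[i-1], edges[i]].
        counts[bisect_left(edges, x)] += 1
    return labels, counts


# Edges 0, 2, ..., 26 reproduce the 15 intervals of the table above.
edges = list(range(0, 27, 2))
labels, counts = right_closed_counts([0, 1, 2, 2.5, 27], edges)
for label, count in zip(labels, counts):
    if count:
        print(label, count)
```

A value that falls exactly on an edge, such as 2, lands in the bin that closes at that edge, (0, 2]. This matches the bracket notation used in the table.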

Message
The histogram graphically shows:
- the central tendency of the data
- the spread of the data
- the skewness of the data
- the presence of outliers
- the presence of multiple modes in the data

[Figure: example histogram; x-axis intervals <=0, (0, 10], (10, 20], (20, 30], >30; y-axis: Frequency, 0 to 70]

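Number of Bins
$$N_{class} = \sqrt{n}$$
where n is the number of observations in the sample.
$$\text{Bin width (class interval)} = \frac{\text{Maximum value} - \text{Minimum value}}{N_{class}}$$
For n = 100 observations, such as the TTR data above, N_class = 10, so the bin width is (Maximum value - Minimum value) / 10.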
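The sketch below applies the bin rule to the 100 TTR readings from the maintenance project. Two choices here are assumptions of this example, not part of the slide: the bins are left-closed, and the bin index is rounded to guard against floating-point noise at the edges.

```python
import math

# Time to Resolve Ticket (TTR) in hours: the 100 readings of the maintenance project.
ttr = [
    1.7, 1.1, 2.2, 1.9, 2.3, 1.7, 1.2, 2.0, 1.5, 2.1,
    1.5, 1.2, 1.7, 1.8, 1.4, 1.5, 1.8, 1.7, 1.6, 2.0,
    1.9, 1.6, 1.5, 1.7, 1.8, 1.4, 1.6, 1.5, 2.0, 1.6,
    2.0, 1.8, 1.2, 1.9, 1.6, 1.7, 1.8, 1.3, 1.7, 1.5,
    1.6, 1.9, 2.1, 1.7, 1.4, 1.6, 1.5, 1.7, 1.9, 1.4,
    1.7, 2.3, 1.5, 1.6, 1.3, 2.0, 2.2, 1.6, 1.4, 1.6,
    2.0, 1.6, 1.1, 1.8, 1.7, 1.9, 1.8, 1.5, 1.9, 1.7,
    1.4, 1.7, 1.8, 1.5, 2.0, 1.8, 1.6, 1.9, 1.8, 1.6,
    2.1, 1.9, 1.6, 1.8, 1.8, 1.7, 1.3, 1.8, 1.5, 1.9,
    1.7, 1.3, 1.8, 1.9, 1.6, 1.4, 1.8, 1.7, 2.1, 2.2,
]

n = len(ttr)
n_class = round(math.sqrt(n))        # N_class = sqrt(n), rounded to a whole number of bins
lo, hi = min(ttr), max(ttr)
bin_width = (hi - lo) / n_class      # (maximum - minimum) / N_class

counts = [0] * n_class
for x in ttr:
    # Left-closed bins [lo + i*w, lo + (i+1)*w); rounding absorbs float noise
    # at bin edges, and the maximum value is placed in the last bin.
    i = min(int(round((x - lo) / bin_width, 9)), n_class - 1)
    counts[i] += 1

print(n, n_class, lo, hi, round(bin_width, 2))   # 100 10 1.1 2.3 0.12
print(counts)
```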

Histogram

HISTOGRAM TEMPLATE

Chapter 3

Pareto Chart


Pareto Chart
Pareto charts are a type of bar chart in which the horizontal axis represents categories of interest rather than a continuous scale. The categories are often "defects." By ordering the bars from largest to smallest, a Pareto chart can help you determine which of the defects comprise the "vital few" and which are the "trivial many." The Pareto Principle states that only a "vital few" factors are responsible for producing most of the problems.


Pareto Chart Contd..


This principle can be applied to quality improvement to the extent that a great majority of problems (80%) are produced by a few key causes (20%). If we correct these few key causes, we will have a greater probability of success.

Pareto Chart
[Figure: Pareto chart of No. of Defects by Defect Causes. Bar heights shown: 20, 15, 6, 4, 2; the cumulative percentage line reads 41.67%, 72.92%, 85.42%, 93.75%, 97.92% and 100.00%. Defect cause labels include coding/implementation standard not followed, improper communication, incorrect environmental set-up, inadequate training and inadequate review.]
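The cumulative-percentage line on a Pareto chart is a running total divided by the grand total. The sketch below assumes the counts 20, 15, 6, 4 and 2 read from the chart, plus a final count of 1. That last count is inferred: the line rises from 97.92% to 100% over 48 defects in total. The category names are placeholders, and the 80% cut-off for the "vital few" follows the 80/20 convention described above.

```python
# Counts read from the Pareto chart; the last count (1) is inferred, and the
# category names are placeholders rather than the chart's actual labels.
counts = {"cause 1": 20, "cause 2": 15, "cause 3": 6,
          "cause 4": 4, "cause 5": 2, "cause 6": 1}


def pareto_table(counts):
    """Return (category, count, cumulative %) rows, largest count first."""
    items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in items)
    rows, running = [], 0
    for name, count in items:
        running += count
        rows.append((name, count, round(100 * running / total, 2)))
    return rows


rows = pareto_table(counts)
for name, count, cum in rows:
    print(f"{name:8} {count:3d} {cum:7.2f}%")

# "Vital few": categories up to the one where the cumulative share first reaches 80%.
vital_few = []
for name, _, cum in rows:
    vital_few.append(name)
    if cum >= 80:
        break
print(vital_few)  # ['cause 1', 'cause 2', 'cause 3']
```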

Chapter 4

Measures of Central Tendency


Measuring the Location


Mean
Mean is the arithmetic average of all data points in a data set:
$$\bar{Y} = \frac{Y_1 + Y_2 + Y_3 + \dots + Y_n}{n}$$
where n = number of data points.

Mode
Mode is the most frequently occurring data point in a data set.

Median
Median is the middle data point of a data set arranged in ascending or descending order.
- Odd number of data points: the median is the middle value.
- Even number of data points: the median is the average of the two middle values.
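Python's standard `statistics` module computes all three measures. The sketch below uses made-up values and also spells out the odd/even median rule by hand. `median_by_hand` is an illustrative name.

```python
from statistics import mean, median, mode


def median_by_hand(values):
    """Middle value for an odd count; average of the two middle values for an even count."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


odd = [7, 3, 9, 5, 3]        # five data points
even = [7, 3, 9, 5, 3, 11]   # six data points

print(mean(odd))                            # 5.4
print(mode(odd))                            # 3
print(median(odd), median_by_hand(odd))     # 5 5
print(median(even), median_by_hand(even))   # 6.0 6.0
```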

Median
Numerical value in the middle of a linear array of data
The middle value in a distribution.

The median divides the distribution into two equal areas.

[Figure: distribution divided at the median into Area A and Area B of equal size]

MAINT DATA MEDIAN

Mean Vs Median
The mean uses every data point, so it is influenced by every data value, including extreme values.

The median does not depend on the magnitude of every data point; it represents the central location of the data array. Hence the median is not biased by extreme values in the data set.

Exercise
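Here is a small made-up example of the point above. Replacing one value with an extreme one moves the mean a long way but leaves the median unchanged.

```python
from statistics import mean, median

typical = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 50]   # the 5 replaced by an extreme value

print(mean(typical), median(typical))            # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 12.4 3
```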

Mode


Chapter 5

Measures of Dispersion


Importance of Spread

[Figure: three distributions labelled A, B and C with different spreads]

- The mean of Curve A is more representative of its data set than the means of Curves B and C.
- Spread outside the specifications may result in defects; the mean does not provide this information.
- From a process perspective, individual customers are subject to different behaviors of the process.

Range
Range = Maximum value - Minimum value

Standard Deviation
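A measure of deviation from the mean. If d is the deviation of a data point from the mean, $d = Y_i - \bar{Y}$, then:
$$\text{Variance} = s^2 = \frac{(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \dots + (Y_n - \bar{Y})^2}{n - 1} = \frac{\sum d^2}{n - 1}$$
$$\text{Standard deviation} = s = \sqrt{s^2}$$
Note: Variance and standard deviation measure how individual data points are spread around the mean.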
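This is a direct transcription of the variance and standard deviation formulas, with the range alongside. It is checked against Python's `statistics.variance`, which also divides by n - 1. The data set is made up for illustration.

```python
import math
import statistics


def sample_variance(values):
    """s^2 = sum((Y_i - Y_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)


def sample_std_dev(values):
    """s = sqrt(s^2)."""
    return math.sqrt(sample_variance(values))


def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared deviations sum to 32
print(data_range(data))                 # 7
print(round(sample_variance(data), 4))  # 32 / 7 = 4.5714
print(round(sample_std_dev(data), 4))   # 2.1381
print(math.isclose(sample_variance(data), statistics.variance(data)))  # True
```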

Standard Deviation
Advantages of standard deviation:
- Standard deviation uses all data, not just the extremes.
- It can be a more reliable indicator of spread than the range.

Disadvantages of standard deviation:
- It is not the easiest formula to use.
- It is not friendly to manual charting.

Class Exercise
Given below is sample data on customer complaint closure time in hours. Compute the mean and standard deviation for each quarter.

Sample | Quarter 1 | Quarter 2
1 | 204 | 145
2 | 202 | 150
3 | 205 | 140
4 | 196 | 165
5 | 198 | 134
6 | 190 | 130
7 | 196 | 170
8 | 205 | 132
9 | 200 | 145
10 | 199 | 164
Mean | 199.5 | 147.5
Standard deviation | 5 | 14
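Here is one way to check the class exercise, assuming the sample (n - 1) standard deviation is intended. The values round to the 5 and 14 shown in the table.

```python
from statistics import mean, stdev

# Customer complaint closure time in hours (ten samples per quarter).
q1 = [204, 202, 205, 196, 198, 190, 196, 205, 200, 199]
q2 = [145, 150, 140, 165, 134, 130, 170, 132, 145, 164]

for name, data in (("Quarter 1", q1), ("Quarter 2", q2)):
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(name, mean(data), round(stdev(data), 2))
# Quarter 1 199.5 4.77
# Quarter 2 147.5 14.49
```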

Chapter 6

Frequency Distribution


Types of Distributions
Discrete distributions:
- Binomial
- Hypergeometric
- Poisson

Continuous distributions:
- Normal
- Exponential
- Lognormal
- Weibull

Normal Distribution
The normal probability density function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty$$
where σ is the population standard deviation, μ is the population mean, e = 2.7182818... is the base of the natural logarithm, and π = 3.14159265.... This function is called the "normal" curve. If μ = 0 and σ = 1, the following graph results:
[Figure: standard normal curve]

EFFORT VARIANCE

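The density formula can be evaluated directly and cross-checked against `statistics.NormalDist.pdf`, which is available in Python 3.8 and later. `normal_pdf` is an illustrative helper.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


print(round(normal_pdf(0.0), 4))   # peak of the standard normal curve: 0.3989
print(math.isclose(normal_pdf(1.5, mu=1.0, sigma=2.0), NormalDist(1.0, 2.0).pdf(1.5)))  # True
```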

Normal Distribution
The normal distribution curve has the following properties:
- symmetrical about the mean (mean = 0 for the standard normal curve)
- "bell" shaped
- highest probability density at the mean
- approaches the x-axis but never touches it (asymptotic to the x-axis)
- the numbers on the x-axis are the number of standard deviations away from the mean

Normal Distribution
The area under each "section" of the normal curve can be seen in the following diagram.


Normal Distribution Curve

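[Figure: normal curve divided at -3, -2, -1, the mean (X), +1, +2 and +3 standard deviations, with areas of 0.13%, 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14% and 0.13% from left to right]

- About 68.27% fall within +/- 1 standard deviation.
- About 95.45% fall within +/- 2 standard deviations.
- About 99.73% fall within +/- 3 standard deviations.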
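The band percentages can be recomputed from the standard normal cumulative distribution function, here via `statistics.NormalDist`.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)
    print(f"within +/-{k} standard deviations: {within:.2%}")

# Area of a single band, here between +1 and +2 standard deviations.
print(f"between +1 and +2: {z.cdf(2) - z.cdf(1):.2%}")
```

This prints 68.27%, 95.45% and 99.73% for the three bands and 13.59% for the single band.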

Normal Distribution

The standard deviation controls the shape of the normal curve: a larger standard deviation gives a wider, flatter curve.

Changes to the mean shift the normal curve horizontally.

Measuring the Shape


Symmetric data set
A data set whose spread on either side of its mean is identical. For such a (unimodal) data set, mean = mode = median.
[Figure: symmetric curve with mean, mode and median at the same point]

Asymmetric data set
- Positive / right skewed: greater spread on the right side of the mean.
- Negative / left skewed: greater spread on the left side of the mean.
[Figure: right-skewed and left-skewed curves with the mode, median and mean marked]

Measuring the Shape

Karl Pearson suggested a simple calculation as a measure of skewness:
$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$
Note: for moderately skewed distributions, Mode ≈ 3 × Median - 2 × Mean.
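Below is a sketch of Pearson's measure on a made-up, right-skewed data set. The second function substitutes the empirical relation Mode ≈ 3 × Median - 2 × Mean, which turns (Mean - Mode) into 3 × (Mean - Median). Both function names are illustrative.

```python
from statistics import mean, median, mode, stdev


def pearson_mode_skewness(values):
    """(mean - mode) / standard deviation."""
    return (mean(values) - mode(values)) / stdev(values)


def pearson_median_skewness(values):
    """3 * (mean - median) / standard deviation, using Mode ~ 3*Median - 2*Mean."""
    return 3 * (mean(values) - median(values)) / stdev(values)


right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8]   # long tail to the right
print(round(pearson_mode_skewness(right_skewed), 3))    # 0.629
print(round(pearson_median_skewness(right_skewed), 3))  # 0.471
```

Both values are positive, which marks the data as right skewed. The two versions need not agree exactly because the mode relation is only approximate.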

Chapter 7

Variation


Process

[Figure: Suppliers → Inputs → Business Processes → Process Outputs → Critical Customer Requirements (Market); variation in the outputs leads to Defects]

Variation in the Process Output causes Defects that are seen by the customer

Output Variation is caused by Variation in Process Inputs and by Variation in the Process itself


Elements of Variation
What causes variation? The 6Ms:
- Man
- Material
- Method
- Machine
- Measurement
- Mother Nature (environment)

[Figure: elements of variation: Measurement, Machine, Method, Material, Man]

Two types of VARIATION

Common causes
Examples: traffic lights, amount of traffic
Examples from the job:

Special causes
Examples: car breaks down, accident on freeway, weather
Examples from the job:

Causes of Variation

Special causes are those causes:
- which can be easily identified
- which are significantly visible in the process
- which are not expected
- which, when present, make the process unstable and unpredictable

Random (common) causes are those causes:
- which cannot be easily identified
- which are present in the process without having a significant impact on it
- which are expected
- which, even when present, leave the process stable and predictable

Random causes are the voice of the process.

When we display data, variation becomes visible.

Chapter 8

Cause and Effect Diagram


Cause & Effect Diagram


Also known as the fishbone diagram or Ishikawa diagram. The fishbone diagram is an analysis tool that provides a systematic way of looking at effects and the causes that create or contribute to those effects. The design of the diagram looks much like the skeleton of a fish. It is used to study a problem or issue to determine its root cause.

Cause & Effect Diagram Contd..


How is it done?
- Decide which quality characteristic, outcome or effect you want to examine (you may use a Pareto chart).
- Backbone: draw a straight line.
- Ribs: categories.
- Medium-size bones: secondary causes.
- Small bones: root causes.

Cause & Effect Diagram Contd..

[Figure: cause and effect diagram for the effect "Delay in Pizza Delivery", with categories Personnel (Men), Machines, Material, Environment, Measurements and Methods. Causes shown: weak battery at counter, weak battery at customer, topping not available, wrong route followed, health problem, absenteeism, packing problem, heavy rain or fog, bad roads, heavy traffic, new apartments developing, mix-up of orders, incomplete address, breakdown of scooter, computer not working, furnace temperature low.]

Cause & Effect Diagram Contd..


[Figure: cause and effect diagram for the effect "Customer Dissatisfaction" in a software project, with categories including Customer Issues, Requirements, Skills & Training, Project Management, Process, Design, Coding and Others. Causes shown include: lack of review/mentoring of freshers; no training on VB and XML; reviewers did not have domain knowledge; documents kept changing; ambiguity in requirements; untested code sent on customer request; test plan prepared by a fresher not reviewed; ad hoc testing; coding guidelines not followed; no internal review; design review done at the end or not effective; tight schedule (no buffer); no risk/contingency plan; design not baselined; detail design ineffective (error handling, transaction management); poor reporting by team members; team reluctant to follow the process.]

Chapter 9

Process Stability


Statistical Process Control


A histogram (or frequency distribution) is a snapshot of our process outcomes at a particular point in time. If we took a number of distributions for a given process (e.g., on a weekly basis), they might look like this:

Can you predict Friday's output?

[Figure: output distributions for Monday, Tuesday, Wednesday and Thursday]

Can you predict Friday's output?

[Figure: output distributions for Monday, Tuesday, Wednesday and Thursday]

A Process in Statistical Control - Stable


The recorded variation is due to common causes only. Common causes refer to the many sources of variation within a process that has a stable and repeatable distribution over time. This is the natural variation in the process, such as tool wear of a machine or the varying time taken to perform a task. If the process inputs remain unchanged, you may be confident that the process output will remain relatively the same. The process (outcome) is predictable. With the graph showing a random pattern that fits the statistical model, this process is deemed to be in control, that is, in a state of statistical control.

Statistical Process Control- Stable

This diagram shows that the average (or mean) for our process has not changed, and our spread (variation) has not changed either. So we can be confident that, if we leave the process unchanged, the outcome for Week 4 will be similar to Weeks 1, 2 and 3. We can predict what may happen. We would say this process is IN STATISTICAL CONTROL. Notice that the process still has some variation; this is due to common causes (over which the operator has no control).

A Process Not in Statistical Control - Unstable


Special causes (often called assignable causes) refer to any factors causing variation that are not always acting on the process. That is, when they occur, they make the (overall) process distribution change. Examples may be using old stock in production or a new operator that has not been trained to perform the job properly. Unless all the special causes of variation are identified and acted upon, they will continue to affect the process output in unpredictable ways. This process is unpredictable.

A Process Out of Statistical Control - Unstable


If special causes of variation are present, the process output is not stable over time. If the process inputs change constantly, or you are unable to control the variation in the process, it is impossible to PREDICT what will happen next. This process is OUT OF STATISTICAL CONTROL.

A Process Out of Statistical Control.. Continued.


[Figure: output distributions for weeks 1 to 5]

If special causes of variation are present, the process output (described by the distribution) may not be stable over time.

Statistical Process Control.. Unstable

If special causes exist, distributions taken over time may differ considerably from one another. The process being examined is then said to be OUT OF STATISTICAL CONTROL. It is not producing predictable outcomes. Weeks 2 and 3 show a difference in the average and the range (spread). How can we confidently predict what is going to happen? (We can't!)

Tools for Understanding Variation


- Time plots (run charts)
- Control charts
- Frequency plots
- Pareto charts

Objective of Six Sigma


Which pilot would you trust your life with?

Reduce the Variation & Shift the Mean

Chapter 10

Process Capability Indices


Index #1 Process Potential Index


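$$C_p = \frac{\text{USL} - \text{LSL}}{\text{Process width}} = \frac{\text{Specification window}}{6\sigma}$$
Process width is measured in terms of the population standard deviation: process width = 6σ, where σ = population standard deviation.

[Figure: process distribution between LSL, target and USL, comparing the specification window with the process width]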

Index #2 Process Capability Index

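$$C_{pk} = \min\left(\frac{\text{USL} - \text{Mean}}{3\sigma},\ \frac{\text{Mean} - \text{LSL}}{3\sigma}\right)$$
That is, (USL - Mean) / 3σ or (Mean - LSL) / 3σ, whichever is smaller.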

Interpreting the Indices

Process | Cp | Cpk | Inference
A | >2 | >2 | 6σ process
B | >2 | <1 | Good process, but requires centering
C | <1 | <1 | Inadequate process
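The two indices can be sketched as small functions. The specification limits, mean and σ below are hypothetical. They are chosen so that one case resembles process A (centered) and the other process B (capable but off-center) in the table above.

```python
def cp(usl, lsl, sigma):
    """Process potential index: specification window / process width (6 sigma)."""
    return (usl - lsl) / (6 * sigma)


def cpk(usl, lsl, mean, sigma):
    """Process capability index: the smaller of (USL - mean)/3 sigma and (mean - LSL)/3 sigma."""
    return min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))


# Hypothetical specification limits and process standard deviation.
usl, lsl, sigma = 10.0, 4.0, 0.45

print(round(cp(usl, lsl, sigma), 2))         # 2.22
print(round(cpk(usl, lsl, 7.0, sigma), 2))   # centered mean: 2.22 (like process A)
print(round(cpk(usl, lsl, 9.2, sigma), 2))   # off-center mean: 0.59 (like process B)
```

Cp ignores where the mean sits, so it is the same in both cases. Cpk drops as soon as the mean drifts toward one specification limit.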

APPLICATION IN SCORECARD

DEMO

SCORECARD SHOWING CENTRAL TENDENCY, TARGET, DISPERSION, LSL, USL, Cp, Cpk

M.No | Metric | Metric name | Unit | Mean | Target | Sigma | LSL | USL | Cp | Cpk
1 | EVa | Effort Variation - Initial | % | 45 | 0 | 30 | -20 | 40 | 0.33 | -0.04
2 | EVb | Effort Variation - Latest Baselined | % | 25 | 0 | 20 | -10 | 10 | 0.17 | -0.25
3 | SVa | Schedule Variation - Initial | % | 12 | 0 | 30 | -5 | 30 | 0.19 | 0.12
4 | SVb | Schedule Variation - Latest Baseline | % | 3 | 0 | 10 | -3 | 10 | 0.22 | 0.15
5 | SzVa | Size Variance | % | 40 | 0 | 20 | -10 | 20 | 0.25 | -0.25
6 | DRE | Defect Removal / Containment Efficiency | % | 85 | 98 | 10 | 90 | 100 | 0.17 | -0.10
7 | DD | Defect Density | D/FP | 0.4 | 0.1 | 0.2 | 0 | 0.3 | 0.25 | -0.13
8 | RE | Review Effectiveness | % | 55 | 70 | 10 | 60 | 80 | 0.33 | -0.17
9 | PROD | Productivity | FP/PD | 24 | 20 | 4 | 10 | 40 | 1.25 | 1.00
10 | PPI | Project Performance Index | - | 0.77 | 0.9 | 0.24 | 0.7 | 0.99 | 0.20 | 0.07
11 | SLAC | SLA Compliance | % | 79 | 99 | 2.4 | 90 | 100 | 0.69 | -0.85
12 | MTTR | Mean Time To Repair (Cat Small) | HRS | 5 | 7 | 1 | 6 | 8 | 0.33 | -0.33

SCORECARD 4


Chapter 11

Control Charts


Specification Limits vs. Control Limits


Specification limits:
- Are set by the customer, management, or engineering requirements.
- Describe what you want a process to achieve.

Control limits:
- Are calculated from the data.
- Describe what the process is capable of achieving.

[Figure: run chart of 35 observations (values between 70 and 120) showing the UCL and LCL together with the upper and lower specification limits]

Adding the Three Sigma Limits


Walter A. Shewhart, the father of SPC, introduced 3σ limits and started a tradition. He coined the terms assignable cause, chance cause and statistical control.
$$\text{UCL} = \text{Mean} + 3\sigma, \qquad \text{LCL} = \text{Mean} - 3\sigma$$

Note: Control limits are not specification limits.
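The slide gives the limits as Mean ± 3σ but does not say how σ is estimated. For an individuals chart, a common convention is to estimate σ as the average moving range divided by d2 = 1.128. That convention is assumed here and is not taken from the slide. The readings are hypothetical, and `individuals_limits` is an illustrative name.

```python
def individuals_limits(xs):
    """Return (LCL, center line, UCL) for an individuals chart.

    Sigma is estimated as average moving range / 1.128 (d2 for subgroups of 2),
    a common convention; the slide itself only says MEAN +/- 3 SIGMA.
    """
    center = sum(xs) / len(xs)
    moving_ranges = [abs(b - a) for a, b in zip(xs, xs[1:])]
    sigma = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


# Hypothetical readings; the last one is deliberately unusual.
readings = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 5.0, 4.9, 7.0]
lcl, cl, ucl = individuals_limits(readings)
outside = [i for i, x in enumerate(readings) if x < lcl or x > ucl]
print(f"LCL={lcl:.3f} CL={cl:.3f} UCL={ucl:.3f} outside={outside}")
```

The moving-range estimate reflects short-term, point-to-point variation. This is why it is usually preferred over the overall standard deviation, which any shift in the data would inflate.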

Adding the Three Sigma Limits


[Figure: control chart of observation value (0 to 10) against observation number (0 to 20). The random variation region lies between LCL (Mean - 3σ) and UCL (Mean + 3σ) and covers 99.73% of the area; the nonrandom variation region lies outside the limits.]

Western Electric Rules for Control Overview


[Figure: control chart divided into zones A, B and C on each side of the centerline, between LCL and UCL]

Established rules for run and trend analysis:
1. Any point outside the control limits
2. 7 consecutive points on the same side of the centerline
3. 7 consecutive points increasing or decreasing
4. 2 of 3 points in the same zone A or beyond
5. 4 of 5 points in the same zone B or beyond
6. 14 consecutive points alternating up and down
7. 14 consecutive points in either zone C

Rules
1. One point beyond zone A
2. Nine points in a row in zone C or beyond, on the same side of the centerline
3. Six points in a row, all increasing or decreasing
4. Fourteen points in a row, alternating up and down
5. Two out of three points in a row in zone A or beyond, on the same side of the centerline
6. Four out of five points in a row in zone B or beyond, on the same side of the centerline
7. Fifteen points in a row in zone C, above or below the centerline
8. Eight points in a row beyond zone C, above or below the centerline

[Figure: zones A, B and C between +3σ and -3σ on either side of the centerline]

A Process is Unstable if
Rule 1: One or more points fall outside the control limits.
Rule 2: Nine consecutive points on one side of the average (shift).
Rule 3: Six consecutive points steadily increasing or decreasing (trend).
Rule 4: Fourteen consecutive points in an up-and-down pattern (sawtooth or cycle).

[Figure: example control charts for Rules 1 to 4, each with zones A, B and C between LCL and UCL around the average]

A Process is Unstable if
Rule 5: Two out of three consecutive points in Zone A or beyond.
Rule 6: Four out of five consecutive points in Zone B or beyond.
Rule 7: Fifteen consecutive points in Zone C.

[Figure: example control charts for Rules 5 to 7, with zones A, B and C]
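Once the centerline and σ are known, the first three rules can be checked mechanically, as sketched below. The function names and the `run` parameters are illustrative. The zone-based rules (5 to 7) would follow the same pattern, comparing each point with 1σ and 2σ bands.

```python
def rule1_beyond_limits(xs, mean, sigma):
    """Rule 1: indices of points outside mean +/- 3 sigma."""
    return [i for i, x in enumerate(xs) if abs(x - mean) > 3 * sigma]


def rule2_shift(xs, mean, run=9):
    """Rule 2: indices where `run` consecutive points sit on the same side of the mean."""
    hits, side_prev, length = [], 0, 0
    for i, x in enumerate(xs):
        side = (x > mean) - (x < mean)   # +1 above, -1 below, 0 exactly on the line
        if side != 0 and side == side_prev:
            length += 1
        else:
            length = 1 if side != 0 else 0
        side_prev = side
        if length >= run:
            hits.append(i)
    return hits


def rule3_trend(xs, run=6):
    """Rule 3: indices where `run` consecutive points are steadily increasing or decreasing."""
    hits, up, down = [], 1, 1
    for i in range(1, len(xs)):
        up = up + 1 if xs[i] > xs[i - 1] else 1
        down = down + 1 if xs[i] < xs[i - 1] else 1
        if up >= run or down >= run:
            hits.append(i)
    return hits


print(rule1_beyond_limits([0.5, 3.5, -4.0], mean=0.0, sigma=1.0))  # [1, 2]
print(rule2_shift([0.2] * 9, mean=0.0))                             # [8]
print(rule3_trend([1, 2, 3, 4, 5, 6, 5]))                           # [5]
```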

How to Calculate Control Limits

Control limit Calculations


Control Charts: select the control chart using the following flow chart.

Start: what is the data type?
- Continuous data:
  - Need to detect minor shifts? Yes: EWMA chart.
  - No: individual or in-group comparison?
    - Individual: I-MR chart.
    - Group: subgroup size < 10? Yes: X-bar and R chart. No: X-bar and s chart.
- Attribute data: measuring defects or defectives?
  - Defects: same opportunity for defects? Yes: c chart. No: u chart.
  - Defectives: subgroup size same or different? Same: np chart. Different: p chart.

Legend: the flow chart distinguishes control charts used with continuous data from control charts used with attribute data.
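The selection flow chart can be expressed as a small decision function. The function name and its keyword arguments are hypothetical, one for each question in the chart.

```python
def choose_control_chart(data_type, *, detect_minor_shifts=False, grouped=False,
                         subgroup_size=1, measuring="defects",
                         same_opportunity=True, same_subgroup_size=True):
    """Walk the control chart selection flow chart and return the suggested chart."""
    if data_type == "continuous":
        if detect_minor_shifts:
            return "EWMA chart"
        if not grouped:
            return "I-MR chart"
        return "X-bar and R chart" if subgroup_size < 10 else "X-bar and s chart"
    if data_type == "attribute":
        if measuring == "defects":
            return "c chart" if same_opportunity else "u chart"
        if measuring == "defectives":
            return "np chart" if same_subgroup_size else "p chart"
    raise ValueError("unsupported combination of answers")


print(choose_control_chart("continuous", grouped=True, subgroup_size=5))   # X-bar and R chart
print(choose_control_chart("attribute", measuring="defectives",
                           same_subgroup_size=False))                      # p chart
```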

Chapter 12

Scatter Plots


Scatter Plot
A scatter plot is a graph that helps to visualize the relationship between two variables. It can be used to check whether one variable is related to another and is an effective way to communicate the relationship we find.

Scatter Plot Contd..


Why and when to use scatter plots:
- To study and identify possible relationships between the changes observed in two different sets of variables
- To discover whether two variables are related
- To understand the relationships between variables
- To find out if changes in one variable are associated with changes in the other
- To look for a possible cause-and-effect relationship; however, finding such an apparent relationship does not necessarily imply causation. Even strong correlations do not imply causation.

Scatter Plot Contd..

[Figure: scatter plot panels: Strong Positive Correlation; Strong Negative Correlation; Possible Positive Correlation; Possible Negative Correlation]

Scatter Plot Contd..

[Figure: scatter plot panels: No Correlation; Other Pattern]

Scatter Plot Contd..


[Figure: scatter plot of the original data, Yield versus Temperature]

Scatter Plot Contd..


Simple Regression
Correlation (r): The Strength of the Relationship
The correlation, r:
- Ranges from -1 to +1
  - r = -1: perfect negative or inverse relationship
  - r = 0: no linear relationship
  - r = +1: perfect positive relationship
- Measures the strength of the linear relationship
- R² is equal to the square of r
- r is known as Pearson's correlation coefficient

Scatter Plot Contd..


R-Squared (R-Sq or R²): The % Explained Variation
R-Squared measures the percent of variation in the Y-values that is explained by the linear relationship with X. It ranges from 0 to 1 (0% to 100%).
$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} \times 100 = \%\ \text{Explained variation}$$
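Both measures can be sketched from their definitions. r is the cross-product of deviations normalized by the spread of each variable, and R² is explained variation over total variation for the least-squares line. The data are made up. For a simple linear fit, R² equals r².

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Explained variation / total variation for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    explained = sum((f - y_bar) ** 2 for f in fitted)
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(r_squared(x, y), 4), round(r ** 2, 4))  # 0.7746 0.6 0.6
```

Here the fitted line explains 60% of the variation in y. A moderately strong r of about 0.77 therefore still leaves 40% of the variation unexplained.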

Thanks