
# Senior Management Programme

## Quality With Statistics

Ltd.
26-27 June 2002

Conducted by: Indian Statistical Institute, 98 Sampatrao Colony, Baroda 390 007

## Defining the Ideal Quality Value

1) List the six factors which you believe are the major determinants of quality

2) For each factor, place a rating on the following statements

- M: Performance of the listed factor should be measured
- R: The performance measure should be reported
- R: The management should review the performance reports
- I: Improvement actions should stem from the reviews

Rating scale: 5 = Always, 4 = Often, 3 = Occasionally, 2 = Rarely, 1 = Never

| Factor | Measure | Report | Review | Improve | Total |
|---|---|---|---|---|---|
| | | | | | |
| Total | | | | | |

Guidelines for scoring

1. Use same time scale: Frequency of improvement actions <= frequency of review <= frequency of reporting <= frequency of measurement.
2. A factor is measured always if it is measured as frequently as is practically possible.
3. Ideal value for each factor need not be 5. For example, scrap% may be measured every hour (always) but the ideal frequency may be every shift (often).
4. A factor which is not measurable (e.g. integrity) gets a score of 1 for all the four actions (M-R-R-I).

## Defining the Real Quality Value

1) List the 6 factors you believe are the major determinants of quality

2) For each factor place a rating on the following statements

- M: Performance of the listed factor is measured
- R: The performance measure is reported
- R: The management reviews the performance reports
- I: Improvement actions stem from the reviews

Rating scale: 5 = Always, 4 = Often, 3 = Occasionally, 2 = Rarely, 1 = Never

| Factor | Measure | Report | Review | Improve | Total |
|---|---|---|---|---|---|
| | | | | | |
| Total | | | | | |

Guidelines for scoring

1. Use the same 6 factors and the time scale as was used while defining ideal quality value.
2. Real score can be equal to or more or less than the ideal score.

[Figure: Behavior Score (vertical axis, 0 to 120) plotted against Belief Score (horizontal axis, 0 to 120)]

## Cutting to the Core

Behavior is a function of values: B = f(V)

- Behavior: The way in which a person or group of people responds
- Values: The complex of beliefs, ideals or standards which characterizes a person or group of people

## The Cost of Remaining Average

Waste as a proportion of total sales volume

- Typical Company: 30%

## The Classical View of Performance

Practical Meaning of 99% Good

- 20,000 lost articles of mail per hour
- Unsafe drinking water almost 15 minutes each day
- 5000 incorrect surgical operations per week
- 2 short or long landings at major airports each day
- 2,00,000 wrong drug prescriptions every year
- No electricity for about 7 hours each month

## The Need for Knowledge

Knowledge

- We don't know what we don't know
- If we can't express what we know in terms of numbers, we really don't ...

The Need

- If we don't know, we cannot act
- ... loss is high
- ... is managed

"In God we believe, all else must have data" (Hewlett Packard)

If we do know and do not act, we deserve the loss

## The Role of Questions

- ... the same answers which invariably produce the same result. To change the result means to change the question.
- New measures lead to new questions. [Management needs to focus on new measures like .. rather than outputs and budgets].
- As questions arise, vision emerges, direction becomes apparent and ambiguity diminishes. In turn, people become organized and mobilized to common action.
- When people take common action, the organization's ability to survive and prosper will increase, owing to the discovery of answers to problems heretofore not known.

"Insanity is doing the same thing over and over again but expecting different results" (Rita Mae Brown, Author)

## The Value of Measurement

[Diagram: cycle linking Measurement, Question, Search, Knowledge and Improved Measurement]

- We don't know what we don't know
- We can't act on what we don't know
- We won't know until we search
- We won't search for what we don't question
- We don't question what we don't measure
- Hence, we just don't know

(Mikel J. Harry)

## The Role of Training

Undoubtedly the most important aspect of Quality is people and their knowledge. Without this golden asset all is for nothing. At the risk of redundancy, you don't know what you don't know, and if you don't know something, nothing will happen. Obviously the key is knowledge. Successful change cannot occur without it.

Today, the best-in-class companies provide a tremendous amount of training and education to their employees. Many such companies have made significant investments in training and are discovering the rewards. For example, Motorola Inc. has discovered a 10:1 return on their budget. In fact, they require every employee to receive 40 hours or more of training annually, of which 40% must be in the area of Quality.

## What is Quality

Quality means different things to different people. There is no universally accepted definition. However, there is broad agreement on the following:

- Very difficult to define
- Determined by customer
- Multi-dimensional
- Dynamic
- Needs to be TOTAL

Usually, TOTAL QUALITY refers to the fact that all departments have roles in quality.

## ISO 9000:2000 Definition of Quality

Degree to which a set of inherent characteristics fulfills requirements

- Requirements are needs or expectations that are stated or implied
- Requirements can be generated by different interested parties
- Inherent characteristics are the distinguishing features that exist in the product/process/system, especially as a permanent characteristic
- Inherent characteristics are called quality characteristics
- Assigned characteristics (e.g. product price) are not quality characteristics

Note: This definition is an improvement over its 1994 version. However, it can still be argued that not all inherent characteristics are quality characteristics.

## How to Measure Quality

Product Quality = Marketing Quality + Design Quality + Mfg. Quality + ... + Service Quality

Customer Satisfaction = f(Appropriateness of requirements, Degree of conformance to requirements, Cost of identifying and meeting the requirements)

## How to Measure Quality (Contd.)

- Customer satisfaction can be measured but it is not very useful as a stand-alone measure.
- Establishing the function f is a highly challenging task.
- Presently, all quality measures (e.g. Defect Rate, Process Capability, Quality Cost, Cycle Time) address only a part of the whole.

Points to remember

- Quality is customer satisfaction but customer satisfaction is not quality
- Reducing internal rejection and rework reduces the producer's cost but not that of the customer

## Components of Quality

| Main component | Subcomponent | Examples of features |
|---|---|---|
| Quality of Design | Product Design | Power rating of an engine; robustness; operating cost; ease of use |
| Quality of Design | Process Design | Rated efficiency; process capability; cycle time; downtime for regulatory inspection |
| Quality of Conformance | Process Conformance | Process instability; process failures; late deliveries; loss of efficiency/yield |
| Quality of Conformance | Product Conformance | Field failures; factory scrap and rework; deviation from target; incorrect invoices |

Quality of Design

- Decides the level of customer attraction
- Related to market segmentation
- Improving design quality may lead to higher cost, but this need not always be the case.

Quality of Conformance

- Refers to the deficiencies resulting from lack of control
- Decides the level of customer dissatisfaction
- Improving quality of conformance leads to reduction of costs. It is in this sense that Crosby says "quality is free".

## Quality with Statistics

[Diagram: Quality splits into Quality of Design and Quality of Conformance, each linked to Statistical Tools]

- Quality of Design: System Design, Parameter Design, Tolerance Design
- Quality of Conformance: Process Monitoring, Problem Solving, Product Disposal

## This Programme

| Quality by | Stage | Scope in AECL | Statistical Tools | This Programme |
|---|---|---|---|---|
| Product Design | System Design | Limited* | Engineering | Nil |
| Product Design | Parameter Design | -do- | ... Ratio, ANOVA | The concept of robustness only |
| Product Design | Tolerance Design | -do- | Statistical Designs, Loss Function, Simulation, Regression | Nil |
| Process Design | System Design | Limited** | Same as those mentioned against product design PLUS optimization tools for inventory management, transportation, scheduling etc. | Nil |
| Process Design | Parameter Design | Very High | -do- | The concept of robustness only |
| Process Design | Tolerance Design | High | -do- | Illustration with an ... |

\* Applicable only for intermediate products and services

\*\* Applicable mostly for management and service delivery processes

## This Programme (Contd.)

| Quality by | Stage | Scope in AECL | Statistical Tools | This Programme |
|---|---|---|---|---|
| Process Conformance | Process Monitoring and ... | Very High | Probability Distributions, Control Charts, GR&R Studies, PCA, Process ... | Principles and tools of process monitoring only |
| Process Conformance | Problem Solving | Very High | Simple tools like Histogram and C&E diagram, (Z, t, χ², F) tests, Advanced tools PLUS all the tools mentioned above | Concepts, disciplines and simple tools of problem solving |
| Product Conformance | Product Disposal | High | Bulk Sampling, Acceptance Sampling, Loss Function | Issues in bulk sampling only |
| Product Conformance | Field ... | High | Nil | Nil |

## Chapter 2: Data and Data Collection

Data

- Data are facts or figures related to any characteristic of an individual
- A characteristic is also called a variable
- Examples of individuals: a m/c, a year, a casting, a dimension, a person

VARIABLES
Station

Date of
commissioning

Availability
(%)

No. of
outages

C:15

12/11/98

92.5
9

30

C:16

10/05/97

93.0
4

12/10/78

88.3
2

INDIVIDUALS

Average
duration of
non-stop
operation
(days)

## Average loss per

outage (hours)

Main
cause of
outage

Capacity
utilization

Forced

Planned

27

64

52

Leakag
e

High

47

28

52

52

Leakag
e

Mod.

124

58

261
164
Gen*
V. Low
* Generator stator / rotor problem


## Types of Data/Variable

- Data/Variable
  - Numerical/Quantitative: Continuous, Discrete
  - Categorical/Qualitative: Ordinal, Nominal

Definitions

- Continuous: An infinite number of values (positive or negative) are possible, e.g. measurements of weight, length, chemical composition.
- Discrete: The variable can take values 0, 1, 2, 3, ..., e.g. count or frequency (# of defects, breakdowns etc.)
- Ordinal: Data classified in ordered categories, e.g. quality of service provided is classified as poor, moderate, good, or yearly rainfall classified as very low, low, moderate, good and very good.
- Nominal: Data classified in categories having no inherent or explicit order, e.g. location classified as east, west, north, south, or names of departments.

## Types of Data - Outage Data Example

| Variable Name | Variable Type |
|---|---|
| 1. Date of commissioning | |
| 2. Availability (%) | |
| 3. Number of outages since commissioning | |
| 4. Average duration of non-stop operation (days) | |
| 5. Average loss per outage (hours) | |
| 6. Main cause of outage | |
| 7. Capacity utilization | |

## Types of Data - Further Considerations

- Continuous data may appear as discrete either due to rounding (see the outage data example) or due to measurement limitations. We should treat such data as continuous unless the number of levels in the data set is very small (say 2-4).
- However, hourly records of steam pressure at turbine inlet (station F) show that the values are either 126 or 127 or 128. Great care must be exercised while analyzing such data.
- Discrete data having seven or more levels may be treated as continuous data.
- Dichotomous data (O.K/Not O.K, Pass/Fail etc.) may be treated as discrete data after coding the two categories as 1 (O.K) and 0 (Not O.K).
- In the field of Quality Control, various types of data are classified as
  - VARIABLE DATA: Continuous data
  - ATTRIBUTE DATA: Others - Discrete, Dichotomous, Ordinal and Nominal
- Henceforth we shall use this latter classification.
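The rules of thumb on this slide can be sketched in Python. This is only an illustration under stated assumptions: the function names, the type labels `"continuous"` and `"discrete"`, and the sample values are hypothetical; the thresholds (2-4 levels, seven or more levels) are the ones quoted in the text.

```python
def treat_as_continuous(declared_type, values, min_levels=7):
    """Return True if the data may be analysed as continuous data.

    declared_type is 'continuous' or 'discrete' (hypothetical labels).
    min_levels=7 follows the rule of thumb quoted in the text.
    """
    levels = len(set(values))
    if declared_type == "continuous":
        # Rounded continuous data stay continuous unless only 2-4 levels remain.
        return levels > 4
    if declared_type == "discrete":
        # Discrete data with seven or more levels may be treated as continuous.
        return levels >= min_levels
    raise ValueError("declared_type must be 'continuous' or 'discrete'")


def code_dichotomous(values, ok_label="O.K"):
    """Code dichotomous data as 1 (O.K) and 0 (anything else)."""
    return [1 if v == ok_label else 0 for v in values]


# Illustrative values only: steam pressure recorded at just three levels.
steam_pressure = [126, 127, 128, 127, 126, 128]
outages_per_year = [0, 1, 2, 3, 4, 5, 6, 7]

print(treat_as_continuous("continuous", steam_pressure))
print(treat_as_continuous("discrete", outages_per_year))
print(code_dichotomous(["O.K", "Not O.K", "O.K"]))
```

The steam pressure example returns `False`, which matches the caution in the text that such coarsely recorded data need great care.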

## Data Gateway

Problem/Hypothesis -> DATA COLLECTION -> Data -> DATA ANALYSIS -> Solution/Fact

- Quality problems cannot be solved merely based on experience. Any claim not backed by data is only a hypothesis.
- Data Gates: Quality of the data gates and their placement at appropriate locations of a process are extremely important for process control.
- Data Quality: Data collection step is vital: garbage in, garbage out.

## Data Quality Scale

Most data are of poor quality. Whenever you see data, doubt it.

| Quality category | Impact | Example | Rank* |
|---|---|---|---|
| Wrong data | ... information | Cooked data | |
| Noisy data | Potentially ... information | High gauge R&R | |
| Irrelevant data | Useless information | Old data | |
| Inadequate data | Partial information | Small sample | |
| Hard data | Difficult to process | Censored data | |
| Redundant data | Useful but ... to ... | Multiple ... | |

## Information Content in Data for Process Control

| Source of Data | Attribute Data | Variable Data |
|---|---|---|
| General literature | Very low | Low |
| Past data: In-house routine Q.C records | Low | Moderate |
| Past data: Statistically designed experiments | Moderate | High |
| Live data: Passive observation of the process | Moderate | High |
| Live data: Statistically designed experiments | High | Very High |

Do not transform variable data to attribute data. That will be like burning diamond for heat.

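## Data Collection Process

Population -> Sample -> Measurement -> Recording -> Editing, Storage, Retrieval

The collected data form a table of INDIVIDUALS (rows) by VARIABLES (columns):

| | Var. 1 | Var. 2 | Var. 3 | ... | Var. p |
|---|---|---|---|---|---|
| Ind. 1 | Data | Data | Data | ... | Data |
| Ind. 2 | Data | Data | Data | ... | Data |
| Ind. 3 | Data | Data | Data | ... | Data |
| ... | | | | | |
| Ind. n | Data | Data | Data | ... | Data |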

## Linking Data Quality to Data Collection Process

[Matrix: process elements (rows) against data quality categories (columns)]

- Process elements: Population; Sample (Individual, Variables, Size, Procedure); Measurement (Gauge, Appraiser, Others); Recording (Format, Recorder); Editing, Storage, Retrieval
- Data quality categories: Wrong, Irrelevant, Inadequate, Hard, Noisy, Redundant
- Editing, Storage, Retrieval: issues related to data base mgmt.

## Poor Data Quality - Cause and Effect Diagram

Effect: Poor Data Quality

- Population
- Sample: Individual, Variable, Sampling Method, Size
- Measurement: Measurand, Gauge, Method, Appraiser
- Recording: Format, Recorder
- Editing, storage, retrieval: Operator, Software, Hardware, Data base Mgmt. policy

Note: Due to limitations of space, only the main subcauses are shown in the CE diagram.

## Measurement Related Causes for Poor Data Quality
Calibratio
n
Not

Operatio
n

done
Statu
Breakdow
s
Done long back n
Not
Resultsused

Measuremen
t
Bia
s

Malfunctionin
g

error

Numbe

Appraiser
s
Reproducibili
ty

Unstabl
e

Not
traceable

Gauges

Different
makes
Many
Variable
least
count

Numbe
r

Operating
range
Beyond
limit

Capabilit
y

Low
repeatabilit
y Precision
Low least
count

Measuran
d

Inhomogeneou
s
Standard
procedure
Not
availabl
e

Type of
data
Unwante
d

Not
followed
Communicatio
n

Metho
d

Poor
Data
Qualit
y

31

## Data Collection Planning

The planning questions are answered in the order Plan -> Execute.

| The Planning Questions | Illustration |
|---|---|
| 1) What do you want to know? | Has X any effect on Y? |
| 2) How do you want to see what it is that you need to know? | [Sketches: distribution of Y for X1, X2, X3; plot of Y against X] |
| 3) What type of tool will generate what it is that you need to see? | Histogram; Scatter diagram |
| 4) What type of data is required of the selected tool? | Histogram: columns X1, X2, X3 holding Y11 ... Y1n, Y21 ... Y2p, Y31 ... Y3q. Scatter diagram: pairs (X1, Y1) ... (Xn, Yn) |
| 5) Where can you get the required type of data? | Final inspection and production log book; Nowhere, to be collected |

## Data Collection Tools

The foregoing discussion indicates that collecting the right data is by no means a trivial task. One can go wrong in various ways at different stages of the data collection process.

The two basic requirements for data collection are

- Clarity of purpose
- Use of a structured approach

Commonly used data collection tools that satisfy the two requirements are

- Check Sheet
- Data Sheet

Check Sheet: Checks (/, ✓, x etc.) are made against a category of a variable or combination of categories of several variables. Used primarily for collecting attribute data.

Data Sheet: Measurement results are recorded against an individual and its characteristics. Used for collecting both attribute and variable data.

Many consider all check sheets as data sheets and vice versa. However, we shall distinguish between the two as above.
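A check sheet amounts to counting one mark per observation against pre-printed categories. The sketch below illustrates that idea in Python; the function names are hypothetical and the observations are made up for illustration (only the four category names come from the defect cause check sheet in this deck).

```python
from collections import Counter


def check_sheet(observations, categories):
    """Count one check per observation against the pre-printed categories."""
    tally = Counter(observations)
    unknown = set(tally) - set(categories)
    if unknown:
        raise ValueError(f"categories missing from the sheet: {sorted(unknown)}")
    return {category: tally[category] for category in categories}


def tally_marks(count):
    """Render a count as '/' marks in groups of five, e.g. 7 -> '///// //'."""
    groups = ["/" * 5] * (count // 5)
    if count % 5:
        groups.append("/" * (count % 5))
    return " ".join(groups)


categories = ["Process failure", "Process deficiency", "Early slowdown", "Late pick up"]
# Hypothetical observations, used only to show the mechanics.
observed = ["Process deficiency"] * 7 + ["Process failure"] * 3 + ["Late pick up"] * 2

sheet = check_sheet(observed, categories)
for category, count in sheet.items():
    print(f"{category:<20}{tally_marks(count):<12}{count}")
```

Rejecting observations that do not match a pre-printed category is one possible design choice; it mirrors the "clarity of purpose" requirement by forcing the categories to be settled before collection starts.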

## Process Distribution Check Sheet
Power Generation Process (Moving
Target) Characteristic: Y1= Total generation (MW), Y2= System

Month:
September
Sampling
interval: Every 3.5 demand
hours
Target: Min(420,
Data: Target - Y1
Y1) Class
bar
Check

## Process average (Y1 bar): 420

MW Total No. of observations:
206
Frq

Interval
<-54.99
-54.99 to
44.99
-44.99 to
34.99

Wasteful export
due to lack of
control

7
5

Export limit =
-10

-34.99 to
24.99
-24.99 to
14.99

Import limit =
+20

-14.99 to
04.99
-04.99 to
05.01
05.01 to
15.01
15.01 t0

Wasteful import
due to lack of
control

Defect rate = 27
%

8
12
6
16

342

## Causes for Wasteful Import of Power

[Figure: run chart of half-hourly readings of generation at station C15 in September 2001; vertical axis 0.0 to 35.0; horizontal axis reading number 104 to 1340; annotation: pick up]

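## Defect Cause Check Sheet

Month: September, 2001

| Defect | C15 | C16 | D | E | F | Total |
|---|---|---|---|---|---|---|
| Process failure | | | | | | 52 |
| Process deficiency | | | | | | 81 |
| Early slowdown | | | | | | 15 |
| Late pick up | | | | | | 34 |
| Total | 54 | 22 | 65 | 21 | 20 | 182 |

Columns C15 to F are the stations affected.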

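## Identifying Critical Causes for Wasteful Import

PF = Process failure, PD = Process deficiency, ES = Early slowdown, LP = Late pick up

Hours of low generation

| Cause | C15 | C16 | D | E | F |
|---|---|---|---|---|---|
| PF | 30 | | 15 | | |
| PD | 11 | | 36 | 14 | 11 |
| ES | | | | | |
| LP | 11 | 11 | | | |

Average generation loss at each instant

| Cause | C15 | C16 | D | E | F |
|---|---|---|---|---|---|
| PF | 29.0 | 29.5 | 107.0 | 103.5 | 110.0 |
| PD | | | 10 | | |
| ES | 10 | | 30 | | 15 |
| LP | 10 | 4 | | | |

Total generation loss (MWH)

| Cause | C15 | C16 | D | E | F | Total |
|---|---|---|---|---|---|---|
| PF | 870 | | 1605 | 725 | | 3200 |
| PD | 55 | 18 | 360 | 70 | 55 | 558 |
| ES | 20 | 8 | 270 | | 30 | 328 |
| LP | 110 | 44 | 150 | | 105 | 409 |
| Total | 1055 | 70 | 2385 | 795 | 190 | 4495 |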

## Other Types of Check Sheets

Defective item check sheet

- Instead of a table, a diagram is made of the defect space.
- Checks are made at the location where the defect occurs.
- Locational segregation of defects, if any, provides a valuable clue.
- Examples: leakage in a cooling system, cracks in castings, wear out of moving parts.

- Used to make a comprehensive check-up of product/process quality (usually at the final stage).
- Preprinted check items avoid duplication and omission of the tests to be performed.
- It is a variation of the check list, which is used for checking whether all the tasks have been performed or not.

C-E diagram check sheet

- Checks are made against the cause of a problem in the C-E diagram.

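## Data Sheet - General Format

Title

Common relevant information

| Individual | Var. 1 | Var. 2 | ... | Var. p | Remark |
|---|---|---|---|---|---|
| Ind. 1 | | | | | |
| Ind. 2 | | | | | |
| ... | | | | | |
| Ind. n | | | | | |

Notes:

Important summary of data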

Rak
e
N0.

## Up-load detention report for the month of July,

Dat2001
Arrival Qu # of
For
For
Depar Depar Deten Demur Rea
e

time

a
lity

wag
ons

m
date

m
time

t.
date

t.
time

.
hours

.
hours

son

Actual
g time Hr.

01

01

19.45

Envi
ro

58

02

05.3
5

02

15.30

09.55

09.00

20

14

07.50

Du.
hill

58

15

16.4
5

16

00.20

07.35

23

S(19)
+I(4)

14.30

42

31

20.20

14.45

Purpose
? Estimation of demurrage

hours of demurrage
Control

hours
Important reasons cited are receipt in quick succession, successive
detentions and wet coal. These are beyond the control of the coal
handling section.

Quality
Data!with Statistics

40

## Chapter 3: Summarization of Data

## Data Analysis - Getting Started

Half-hourly record of generation (MW) by station E during 19/9/01 (10.00 hrs.) to 21/9/01 (01.30 hrs.) under normal operating condition. The 80 readings are listed below.

102.8 105.2 103.2 104.0 105.0 105.0 104.0 104.0 103.2 104.2
102.0 103.6 105.2 106.0 105.0 103.0 104.2 105.8 105.4 104.8
106.0 104.0 104.2 103.8 103.4 104.4 104.4 104.2 104.8 102.8
103.6 104.8 104.0 104.0 104.0 104.0 103.0 104.8 102.8 104.0
104.0 103.4 106.0 104.4 105.2 105.0 105.2 105.2 105.2 103.8
103.2 104.8 104.4 104.8 104.4 104.4 103.4 104.4 104.8 106.0
105.0 103.0 105.2 104.0 106.2 104.8 104.0 103.6 102.4 105.6
106.4 105.2 103.0 105.2 102.2 106.4 104.0 102.6 104.0 102.8

What are your conclusions?

## Frequency Distribution

- Analyzing a large data set on the same variable

Generation data set (previous slide): the eighty observations are grouped in eight classes of equal length.

| Class interval | Frequency |
|---|---|
| 101.7 to 102.3 | 02 |
| 102.3 to 102.9 | 06 |
| 102.9 to 103.5 | 10 |
| 103.5 to 104.1 | 19 |
| 104.1 to 104.7 | 11 |
| 104.7 to 105.3 | 22 |
| 105.3 to 105.9 | 03 |
| 105.9 to 106.5 | 07 |
| Total | 80 |

Does the frequency distribution provide better insight into the process?

Data are not information: DATA + ANALYSIS = INFORMATION

## Constructing Frequency Distributions - Variable Data

Number of classes (k)

- Too many classes obscure the pattern of the distribution due to sampling fluctuations. Details are lost with too few classes.
- Optimum number of classes is given by k = 1 + 3.3 log10(N).
- The simpler formula k = √N also works well in practice.
- For better visual impact, it is preferable to have 5 ≤ k ≤ 12.
- For the generation data set we have N = 80. Therefore, k = 1 + 3.3*log10(80) = 7.3. This means the number of classes should be either 7 or 8. We have chosen 7 classes.
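The two rules for the number of classes can be checked numerically. The sketch below is a minimal illustration; the function names are hypothetical, and the simpler rule is taken here to be the square-root rule k = √N.

```python
import math


def classes_log_rule(n):
    """Number of classes from k = 1 + 3.3 * log10(N)."""
    return 1 + 3.3 * math.log10(n)


def classes_sqrt_rule(n):
    """Number of classes from the simpler rule k = sqrt(N) (assumed form)."""
    return math.sqrt(n)


n = 80  # size of the generation data set
k_log = classes_log_rule(n)
k_sqrt = classes_sqrt_rule(n)

print(f"log rule:  k = {k_log:.1f}")
print(f"sqrt rule: k = {k_sqrt:.1f}")
print("within 5 <= k <= 12:", 5 <= k_log <= 12 and 5 <= k_sqrt <= 12)
```

For N = 80 both rules give a value inside the preferred band of 5 to 12 classes.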

## Constructing Frequency Distributions (Contd.)

Class width (h)

- h = (R + w) / k, where R = Range of the observations = Maximum - Minimum, and w = Least count of measurement.
- Next, h is rounded to the nearest integer multiple of w. This means, if the least unit of measurement (w) is 0.1, then h = 2.312 should be rounded to 2.3. However, if w = 0.2, then the same h should be rounded to 2.4.
- In our generation data example, R = 106.4 - 102.0 = 4.4 and w = 0.2. Thus, h = (4.4 + 0.2) / 7 = 0.657, which is rounded to 0.6. We shall explain later why taking h = 0.7 will be erroneous.
- Note that if h is rounded down then we shall need (k+1) classes to cover the whole range of the observations. How many classes shall we need if h is rounded up?
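The class width calculation can be sketched as follows. The function names are hypothetical, and the final rounding to 10 decimals is only a guard against floating-point noise, not part of the method.

```python
import math


def round_to_multiple(h, w):
    """Round h to the nearest integer multiple of the least count w."""
    return round(round(h / w) * w, 10)


def class_width(r, w, k):
    """Raw and rounded class width from h = (R + w) / k."""
    h_raw = (r + w) / k
    return h_raw, round_to_multiple(h_raw, w)


def classes_needed(r, w, h):
    """Number of classes of width h needed to cover a span of R + w."""
    return math.ceil(round((r + w) / h, 10))


# Generation data: R = 106.4 - 102.0 = 4.4, w = 0.2, k = 7
h_raw, h = class_width(r=4.4, w=0.2, k=7)
print(f"raw h = {h_raw:.3f}, rounded h = {h}")
print("classes needed with rounded h:", classes_needed(4.4, 0.2, h))

# The two rounding examples from the text
print(round_to_multiple(2.312, 0.1), round_to_multiple(2.312, 0.2))
```

With h rounded down to 0.6, the span of 4.6 needs eight classes, which agrees with the (k+1) remark above for k = 7.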

## Constructing Frequency Distributions (Contd.)

Class limits

- The minimum value of the generation data is 102.0 and the class width has been determined as 0.6. So we can form the classes as 102.0 to 102.6, 102.7 to 103.3, 103.4 to 104.0, ...
- The problem with the above classification is that there is a gap between two successive class intervals. This is not desirable since we are dealing with continuous data.
- Discontinuity can be removed by forming the classes as 102.0 to 102.6, 102.6 to 103.2, 103.2 to 103.8, ...
- However, this classification has another problem. Suppose we have an observation 102.6. In which class shall we place it, first or second?
- In order to avoid such confusion we take Lower limit of the first class = Minimum - w/2 and then successively add the class width to this lower limit to obtain the other class limits.
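The rule for class limits can be sketched in a few lines. The function name is hypothetical; the inputs (minimum 102.0, w = 0.2, h = 0.6, eight classes) are the values used in the generation data example.

```python
def class_limits(minimum, w, h, n_classes):
    """Class limits starting at Minimum - w/2 and stepping by the class width h."""
    lower = minimum - w / 2
    # Rounding only removes floating-point noise from repeated addition.
    return [round(lower + i * h, 10) for i in range(n_classes + 1)]


limits = class_limits(minimum=102.0, w=0.2, h=0.6, n_classes=8)
for lo, hi in zip(limits, limits[1:]):
    print(f"{lo:.1f} to {hi:.1f}")
```

Because the readings are recorded in steps of 0.2 (even tenths) and every limit falls on an odd tenth, no observation can coincide with a class limit.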

## Constructing Frequency Distributions (Contd.)

- 101.9 to 102.5
- 102.5 to 103.1
- 103.1 to 103.7
- 103.7 to 104.3
- 104.3 to 104.9
- 104.9 to 105.5
- 105.5 to 106.1
- 106.1 to 106.7

Note that now we have

- 8 classes (since h has been rounded down from 0.657 to 0.6)
- no confusion in classification (since there are no observations which fall on the class limits) and
- an extended last class (ideally the upper limit of the last class should have been 106.5).

In the example, we have extended the first class instead of the last one since this has brought out the process abnormalities better. Thus the eight classes used are 101.7 to 102.3, 102.3 to 102.9, ..., 105.9 to 106.5.

## Constructing Frequency Distributions (Contd.)

Tally marking (second column)

- Start with the first observation. Find the class to which the observation belongs. Put a tally against the class.
- Classify all the remaining observations as above.
- Tally marks are grouped in fives, with the fifth tally crossed through the previous four tallies. This provides a better visual display and helps in counting the frequency of each class.
- Note that all the observations get classified as we go through them only once. However, if we concentrate on a class and then try to find the number of observations in that class, we have to go through the observations k times. This not only consumes more time but also increases the chance of committing errors.

Other columns

- Columns giving cumulative frequency (f1, f1+f2, ...) and relative frequency (f1/N, f2/N, ...) may also be added, if required.
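The single-pass tallying described above, together with the optional cumulative and relative frequency columns, can be sketched as below. The readings are the 80 generation values as they appear on the earlier data slide, and the class limits are the eight classes starting at 101.7; the variable names are illustrative.

```python
from bisect import bisect_right
from itertools import accumulate

readings = [
    102.8, 105.2, 103.2, 104.0, 105.0, 105.0, 104.0, 104.0, 103.2, 104.2,
    102.0, 103.6, 105.2, 106.0, 105.0, 103.0, 104.2, 105.8, 105.4, 104.8,
    106.0, 104.0, 104.2, 103.8, 103.4, 104.4, 104.4, 104.2, 104.8, 102.8,
    103.6, 104.8, 104.0, 104.0, 104.0, 104.0, 103.0, 104.8, 102.8, 104.0,
    104.0, 103.4, 106.0, 104.4, 105.2, 105.0, 105.2, 105.2, 105.2, 103.8,
    103.2, 104.8, 104.4, 104.8, 104.4, 104.4, 103.4, 104.4, 104.8, 106.0,
    105.0, 103.0, 105.2, 104.0, 106.2, 104.8, 104.0, 103.6, 102.4, 105.6,
    106.4, 105.2, 103.0, 105.2, 102.2, 106.4, 104.0, 102.6, 104.0, 102.8,
]

# Nine limits define eight classes: 101.7, 102.3, ..., 106.5
limits = [round(101.7 + 0.6 * i, 1) for i in range(9)]

# One pass over the data: locate the class of each observation and tally it.
frequency = [0] * (len(limits) - 1)
for x in readings:
    if not limits[0] < x < limits[-1]:
        raise ValueError(f"{x} lies outside the class limits")
    frequency[bisect_right(limits, x) - 1] += 1

cumulative = list(accumulate(frequency))
relative = [f / len(readings) for f in frequency]

for lo, hi, f, cf, rf in zip(limits, limits[1:], frequency, cumulative, relative):
    print(f"{lo:.1f} to {hi:.1f}  {f:>3}  {cf:>3}  {rf:.3f}")
```

Each observation is visited once, which is the point made in the text about avoiding k separate passes over the data.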

## Constructing Frequency Distributions - Getting the class intervals right

Why class width (h) is rounded to the nearest integer multiple of w

- Consider the same generation data example. Here w = 0.2. Assume that h = 0.657 is rounded to 0.7 (which is not an integer multiple of 0.2) instead of 0.6. Thus the classes will be 101.9 to 102.6, 102.6 to 103.3, ...
- Now, in order to overcome the problem of classifying observations like 102.6, we are forced to consider w = 0.1 and have the classes as 101.95 to 102.65, 102.65 to 103.35, 103.35 to 104.05, 104.05 to 104.75, 104.75 to 105.45, 105.45 to 106.15, 106.15 to 106.85.
- Note that the number of observation units covered by each class is not the same. For example, the second class covers three units (102.8, 103.0 and 103.2) but the third class covers four units (103.4, 103.6, 103.8 and 104.0). As a result the frequency distribution is likely to show many peaks.
- Assuming w = 0.1, the seven classes shown above should be appropriate. However, note that the last class is extended by four units beyond the maximum observed value of 106.4. It is desirable to distribute this imbalance to the two end classes by starting the first class from 101.75 and ending at 106.65.
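The uneven coverage can be verified by counting how many recordable values fall in each class. The sketch below works in integer hundredths of a MW to avoid floating-point comparisons; the function name is hypothetical.

```python
def units_per_class(first_lower, h, n_classes, w):
    """Count the recordable values (multiples of w) strictly inside each class.

    All arguments are integers in hundredths of a MW.
    """
    counts = []
    for i in range(n_classes):
        lo = first_lower + i * h
        hi = lo + h
        first = (lo // w + 1) * w  # first multiple of w strictly above lo
        counts.append(len(range(first, hi, w)))
    return counts


# h = 0.7 (not a multiple of w = 0.2), classes starting at 101.95
uneven = units_per_class(first_lower=10195, h=70, n_classes=7, w=20)
# h = 0.6 (a multiple of w = 0.2), classes starting at 101.9
even = units_per_class(first_lower=10190, h=60, n_classes=8, w=20)

print("h = 0.7:", uneven)
print("h = 0.6:", even)
```

With h = 0.7 the classes alternate between four and three recordable values, which by itself can produce alternating peaks; with h = 0.6 every class covers three.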

## Frequency Distribution of the Generation Data - Further Analysis

The frequency distribution shows an abnormal pattern (nearly alternating peaks). Does this mean the process mean is jumping? The following two frequency distributions are constructed out of the same data.

| Fractional part | Frequency |
|---|---|
| .0 | 27 |
| .2 | 18 |
| .4 | 15 |
| .6 | 5 |
| .8 | 15 |
| Total | 80 |

0s occur more frequently at the cost of 6s. Does this indicate measurement bias? Noisy Data!!

| Class interval | Frequency |
|---|---|
| 101.7 to 102.7 | 04 |
| 102.7 to 103.7 | 17 |
| 103.7 to 104.7 | 26 |
| 104.7 to 105.7 | 25 |
| 105.7 to 106.7 | 08 |
| Total | 80 |

Smooth pattern (left skewed). Smoothness has been achieved not only by reducing the number of classes but also by including the adjacent 0s and 6s in the same interval.

## Histogram

Histogram is a graphical representation of a frequency distribution of variable data. The histogram of the generation data having five classes is shown below.

[Figure: histogram of generation in E station (MW); horizontal axis marked 101.7, 103.7, 105.7; vertical axis Frequency, 0 to 30. Pattern of variation: slightly left skewed]

- Bars of equal width (= class width)
- Heights of the bars are proportional to the class frequencies
- Bar width of about 1 cm (7-10 classes)
- Horizontal axis is about 1.6 times longer than the vertical axis
- Specification limits: Should be shown wherever applicable.
- Class mid-point: Marking the class mid-points may be helpful in certain cases.
- Open ended classes: Avoid adding too many classes at the ends having zero or very low frequencies. Such classes are shown as open ended bars with arbitrarily reduced heights.

## Construction of Histogram - An exercise

Half-hourly record of power (MW) generated by station E during 29.9.2001 (10.00 hours) to 30.9.2001 (24.00 hours) gives us the following data.

6.4 6.4 6.8 6.0 5.2 4.8 6.4 4.4 5.2 6.0
7.6 8.0 7.4 6.6 8.0 5.6 7.2 7.2 7.0 4.0
6.4 8.0 8.0 6.0 6.0 6.4 7.8 7.6 7.6 7.4
7.6 7.6 7.4 4.6 4.2 4.8 6.0 5.6 5.4 5.0
6.2 7.8 7.4 7.2 7.4 7.8 6.6 6.4 6.8 6.8
6.8 6.8 6.6 6.8 6.6 6.8 6.8 6.8 7.0 7.0
6.0 5.6 4.4 4.6 4.6 4.8 6.2 7.0 6.6 6.4

Construct a histogram of the above data set. Compare with the histogram for the period 19.9.01 to 21.9.01 (previous slide).

## Commonly Observed Histogram Patterns

- Single peak, symmetric, bell shaped: commonly observed pattern of a stable process (shown between LSL and USL)
- Single peak, thick tail. How?
- Single peak, positively skewed (long tail on the right)
- Single peak, negatively skewed (long tail on the left)
- Two peaks (bi-modal). How?

Many characteristics are skewed: generation data is negatively skewed while breakdown data is positively skewed. However, such shapes may also indicate process instability.

## Frequency Distribution of Discrete Data

Number of plant outages in each year since commissioning

| Station | Period | Type of outage | # of outages in a year |
|---|---|---|---|
| D | 1978-79 to 2000-01 | Forced | 2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2 |
| D | 1978-79 to 2000-01 | Planned | 3, 5, 1, 4, 2, 5, 2, 1, 6, 3, 7, 7, 4, 7, 6, 5, 6, 4, 2, 2, 2, 6, 2 |
| E | 1985-86 to 2000-01 | Forced | 2, 2, 5, 3, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 4 |
| E | 1985-86 to 2000-01 | Planned | 15, 7, 8, 3, 7, 5, 2, 6, 3, 8, 7, 4, 5, 4, 3, 4 |
| F | 1988-89 to 2000-01 | Forced | 4, 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 6 |
| F | 1988-89 to 2000-01 | Planned | 3, 11, 6, 12, 4, 0, 1, 2, 8, 2, 4, 4, 6 |

Ideally we should construct six frequency distributions (for each type of outage in each station). However, due to shortage of data we shall construct only two - one for forced outage and the other for planned outage.

What can you say about the occurrence of two types of outages from the above data set?
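For discrete data each distinct value serves as its own class, so the frequency distribution is a simple count. The sketch below uses only the forced outage counts of station D listed above; the variable names are illustrative.

```python
from collections import Counter

# Forced outages per year at station D, 1978-79 to 2000-01 (23 years)
forced_d = [2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2]

frequency = Counter(forced_d)

# Print every possible value from 0 up to the maximum, including gaps.
for outages in range(max(forced_d) + 1):
    count = frequency[outages]
    print(f"{outages} outages: {count:>2} years  {'#' * count}")
```

Printing every value from 0 up to the maximum keeps zero-frequency values visible, which matters when the graph is later inspected for gaps.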

## Line Graph and Bar Graph

Distribution of number of yearly outages of stations D, E and F since commissioning

[Figure: two panels, Forced outages (Line graph) and Planned outages (Bar graph); vertical axis Frequency, horizontal axis Number of outages]

- Line graph shows the frequencies of individual outcomes.
- Bar graph is similar to the histogram. But there are gaps between the bars since we are dealing with discrete (attribute) data.
- Planned outages occur more frequently than forced outages.
- Number of planned outages is uniformly distributed between 2 and 7 with very few outages outside this band. Such a pattern is somewhat odd. Planned outages need to be defined properly. Do we undertake unnecessary planned outages?

## Measures of Central Tendency

The Typical value

Mean

- Most effective measure for numerical data. Let $\{X_1, X_2, \dots, X_{N-1}, X_N\}$ be the data set. Then Mean $= \bar{X} = (X_1 + X_2 + \dots + X_N)/N = \sum X_i / N$
- May be used for ordinal data but not for nominal data
- Sensitive to extreme values

Median

- Ordinal data: Category containing the (N+1)/2 th case
- Numerical data: (N+1)/2 th ordered observation when N is odd, and average of the N/2 th and (N/2)+1 th ordered observations when N is even.
- Can be computed even for open ended classes at the extremes provided each of the end classes contains less than 50% of the observations.
- Insensitive to outliers.

Mode

- Category or value occurring with greatest frequency
- Only measure of center for nominal data
- May not be unique and highly sensitive to how the classes or categories are formed.
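The three measures can be compared on one small data set. The sketch below uses the forced outage counts of station D given earlier in the deck; `median_by_rank` is a hypothetical helper written directly from the rank definition above, and the value 30 used as an outlier is made up for illustration.

```python
import statistics


def median_by_rank(values):
    """Median from the rank definition: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


forced_d = [2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2]

mean = sum(forced_d) / len(forced_d)
median = median_by_rank(forced_d)
modes = statistics.multimode(forced_d)  # list, since the mode may not be unique
print(f"mean = {mean:.2f}, median = {median}, mode(s) = {modes}")

# Replace one value by an extreme (hypothetical) value of 30.
with_outlier = forced_d[:-1] + [30]
print(f"mean with outlier = {sum(with_outlier) / len(with_outlier):.2f}")
print(f"median with outlier = {median_by_rank(with_outlier)}")
```

The mean moves noticeably when a single extreme value is introduced while the median stays put, which is the sensitivity contrast listed above.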

## Interpretation of Mean

In a rising voltage test, the alternating breakdown voltage (kV) of 24 samples of an insulation arrangement was found to be as follows:

210; 208; 208; 175; 182; 206; 190; 194; 198; 205; 212; 200; 205; 202; 207; 210; 202; 201; 188; 205; 209; 201; 216; 196

MEAN = [210 + 208 + ... + 216 + 196] / 24 = 201.25 kV

[Figure: dot plot of the 24 values on an axis from 170 to 220, with the mean marked]

- Mean is the balance point (or fulcrum) for the distribution of the values
- Mean is analogous to centre of gravity
- Sum of negative deviations from mean exactly equals the sum of positive deviations. Thus the total sum of the deviations from mean is always zero
- In the above example, the mean should be interpreted as a measure of centre and not that of central tendency or typical value.
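The balance-point property can be verified directly on the 24 breakdown voltages. This is a small numerical check rather than a proof; the variable names are illustrative.

```python
voltages = [
    210, 208, 208, 175, 182, 206, 190, 194, 198, 205, 212, 200,
    205, 202, 207, 210, 202, 201, 188, 205, 209, 201, 216, 196,
]

mean = sum(voltages) / len(voltages)
deviations = [v - mean for v in voltages]

positive = sum(d for d in deviations if d > 0)
negative = sum(d for d in deviations if d < 0)

print(f"mean = {mean} kV")
print(f"sum of positive deviations = {positive}")
print(f"sum of negative deviations = {negative}")
print(f"total of all deviations    = {positive + negative}")
```

The positive and negative deviations cancel exactly, which is why the mean acts as the fulcrum of the dot plot.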

## Data Analysis - Getting Started

Reportable accidents (#) in AEC Ltd., Sabarmati during 1995-2000

Columns: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, Total

1995

24

17

27

19

10

25

19

22

23

16

18

15

235

1996

22

10

22

18

16

21

21

20

21

18

18

18

225

1997

19

14

12

15

15

15

24

19

16

14

19

192

1998

14

14

12

20

19

23

10

16

13

15

17

19

192

1999

19

13

15

13

16

18

17

16

20

17

13

16

193

2000

12

14

15

22

12

13

13

134

Total

110

82

103

108

88

100

88

111

103

91

87

100

1171