
Senior Management Programme

Quality With Statistics


Ahmedabad Electricity Company Ltd.
26-27 June 2002
Conducted by:

Indian Statistical Institute
98 Sampatrao Colony
BARODA 390 007

Defining the Ideal Quality Value

1) List the six factors which you believe are the major determinants of quality.
2) For each factor, place a rating on the following statements:
M  Performance of the listed factor should be measured
R  The performance measure should be reported
R  The management should review the performance reports
I  Improvement actions should stem from the reviews

Rating: 5 = Always, 4 = Often, 3 = Occasionally, 2 = Rarely, 1 = Never

Factor | Measure | Report | Review | Improve | Total
1.
2.
3.
4.
5.
6.
Total

Guidelines for scoring
1. Use the same time scale: frequency of improvement actions <= frequency of review <= frequency of reporting <= frequency of measurement.
2. A factor is measured "always" if it is measured as frequently as is practically possible.
3. The ideal value for each factor need not be 5. For example, scrap % may be measured every hour (always), but the ideal frequency may be every shift (often).
4. A factor which is not measurable (e.g. integrity) gets a score of 1 for all four actions (M-R-R-I).
Quality with Statistics

Defining the Real Quality Value

1) List the six factors you believe are the major determinants of quality.
2) For each factor, place a rating on the following statements:
M  Performance of the listed factor is measured
R  The performance measure is reported
R  The management reviews the performance reports
I  Improvement actions stem from the reviews

Rating: 5 = Always, 4 = Often, 3 = Occasionally, 2 = Rarely, 1 = Never

Factor | Measure | Report | Review | Improve | Total
1.
2.
3.
4.
5.
6.
Total

Guidelines for scoring
1. Use the same six factors and the same time scale as were used while defining the ideal quality value.
2. The real score can be equal to, more than or less than the ideal score.

The Quality Value Grid

[Figure: The Quality Value Grid, with Behavior Score (0 to 120) on the vertical axis and Belief Score (0 to 120) on the horizontal axis.]
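The scoring arithmetic behind the grid can be sketched as follows. The sketch makes two assumptions, suggested by the 0-120 axes: a factor's total is the sum of its four M-R-R-I ratings (at most 20), and the grid coordinates are the grand totals over the six factors, with the ideal sheet giving the belief score and the real sheet giving the behavior score. The check that ratings never increase from Measure to Improve is our reading of scoring guideline 1. All factor names and ratings are hypothetical placeholders.

```python
# Minimal sketch of the M-R-R-I scoring sheets (hypothetical factors and ratings).
# Assumption: belief score = grand total of the "ideal" sheet and
#             behavior score = grand total of the "real" sheet (0-120 grid axes).

ACTIONS = ("Measure", "Report", "Review", "Improve")


def factor_total(ratings):
    """Sum the four M-R-R-I ratings (5 = Always ... 1 = Never) of one factor."""
    if len(ratings) != len(ACTIONS):
        raise ValueError("expected one rating per action (M, R, R, I)")
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    # Guideline 1: improvement <= review <= reporting <= measurement frequency,
    # so the ratings should never increase from Measure to Improve.
    if list(ratings) != sorted(ratings, reverse=True):
        raise ValueError("ratings must not increase from Measure to Improve")
    return sum(ratings)


def sheet_total(sheet):
    """Grand total of a scoring sheet {factor: (M, R, R, I)}."""
    return sum(factor_total(r) for r in sheet.values())


# Hypothetical factors and ratings, for illustration only.
ideal = {
    "Scrap %": (5, 4, 4, 4),
    "Delivery delay": (5, 5, 4, 4),
    "Customer complaints": (5, 5, 5, 5),
    "Outage hours": (5, 5, 4, 4),
    "Training hours": (3, 3, 3, 2),
    "Integrity": (1, 1, 1, 1),  # not measurable: 1 for all four actions
}
real = {
    "Scrap %": (5, 2, 2, 1),
    "Delivery delay": (4, 3, 2, 2),
    "Customer complaints": (3, 2, 2, 1),
    "Outage hours": (5, 4, 3, 2),
    "Training hours": (2, 1, 1, 1),
    "Integrity": (1, 1, 1, 1),
}

belief, behavior = sheet_total(ideal), sheet_total(real)
print(f"Belief score: {belief}, Behavior score: {behavior}")  # Belief score: 88, Behavior score: 52
```

A point below the diagonal of the grid (behavior < belief) shows a gap between what is valued and what is practised.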

Cutting to the Core

Behavior is a function of values: B = f(V)
Behavior: The way in which a person or group of people responds
Values: The complex of beliefs, ideals or standards which characterizes a person or group of people

The Cost of Remaining Average

Waste as a proportion of total sales volume:
Typical company: 30%
Your area: ?

The Classical View of Performance

Practical Meaning of 99% Good
- 20,000 lost articles of mail per hour
- Unsafe drinking water almost 15 minutes each day
- 5000 incorrect surgical operations per week
- 2 short or long landings at major airports each day
- 2,00,000 wrong drug prescriptions every year
- No electricity for about 7 hours each month
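The last two items are simple arithmetic: 1% of the minutes in a day and 1% of the hours in a month. The sketch below assumes a 30-day month. The other items depend on volumes (mail, surgeries, prescriptions) that the slide does not state, so they are not recomputed here.

```python
# 1% "bad" time implied by 99% good performance (assumes a 30-day month).
DEFECT_FRACTION = 0.01

minutes_per_day = 24 * 60   # 1440 minutes
hours_per_month = 30 * 24   # 720 hours

unsafe_water_min = DEFECT_FRACTION * minutes_per_day  # minutes of unsafe water per day
no_power_hours = DEFECT_FRACTION * hours_per_month    # hours without electricity per month

print(f"Unsafe drinking water: {unsafe_water_min:.1f} minutes each day")  # 14.4
print(f"No electricity: {no_power_hours:.1f} hours each month")           # 7.2
```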

The Need for Knowledge

Knowledge
- We don't know what we don't know.
- If we can't express what we know in terms of numbers, we really don't know much about it.

The Need
- If we don't know, we cannot act.
- If we cannot act, the risk of loss is high.
- If we know and act, the risk is managed.
- If we do know and do not act, we deserve the loss.

"In God we believe, all else must have data" (Hewlett Packard)

The Role of Questions

Questions lead and answers follow. The same question most often leads to the same answers, which invariably produce the same result. To change the result means to change the question.

New measures lead to new questions. [Management needs to focus on new measures like .. rather than outputs and budgets.]

As questions arise, vision emerges, direction becomes apparent and ambiguity diminishes. In turn, people become organized and mobilized to common action.

When people take common action, the organization's ability to survive and prosper will increase, owing to the discovery of answers to problems heretofore not known.

"Insanity is doing the same thing over and over again but expecting different results." (Rita Mae Brown, author)

The Value of Measurement

Measurement → Question → Search → Knowledge → Improved Measurement

We don't know what we don't know.
We can't act on what we don't know.
We won't know until we search.
We won't search for what we don't question.
We don't question what we don't measure.
Hence, we just don't know.
(Mikel J. Harry)

The Role of Training

Undoubtedly the most important aspect of Quality is people and their knowledge. Without this golden asset, all is for nothing. At the risk of redundancy: you don't know what you don't know, and if you don't know something, nothing will happen. Obviously, the key is knowledge. Successful change cannot occur without it.

Today, the best-in-class companies provide a tremendous amount of training and education to their employees. Many such companies have made significant investments in training and are discovering the rewards. For example, Motorola Inc. has found a 10:1 return on its training budget. In fact, it requires every employee to receive 40 hours or more of training annually, of which 40% must be in the area of Quality.

What is Quality

Quality means different things to different people. There is no universally accepted definition. However, there is broad agreement on the following:
- Very difficult to define
- Determined by the customer
- Multidimensional
- Dynamic
- Needs to be TOTAL

Usually, TOTAL QUALITY refers to the fact that all departments have roles in quality.

ISO 9000:2000 Definition of Quality

Degree to which a set of inherent characteristics fulfills requirements
- Requirements are needs or expectations that are stated or implied.
- Requirements can be generated by different interested parties.
- Inherent characteristics are the distinguishing features that exist in the product/process/system, especially as a permanent characteristic.
- Inherent characteristics are called quality characteristics.
- Assigned characteristics (e.g. product price) are not quality characteristics.

Note: This definition is an improvement over its 1994 version. However, it can still be argued that not all inherent characteristics are quality characteristics.

How to Measure Quality

Product Quality = Marketing Quality + Design Quality + Mfg. Quality + ... + Service Quality

- Appropriateness of requirements
- Customer satisfaction
- Degree of conformance to requirements
- Cost of identifying and meeting the requirements

How to Measure Quality (Contd.)

- Customer satisfaction can be measured, but it is not very useful as a stand-alone measure.
- Establishing the function f is a highly challenging task.
- Presently, all quality measures (e.g. defect rate, process capability, quality cost, cycle time) address only a part of the whole.

Points to remember
- Quality is customer satisfaction, but customer satisfaction is not quality.
- Reducing internal rejection and rework reduces the producer's cost but not the customer's.

Components of Quality

Main component: Quality of Design
- Decides the level of customer attraction.
- Related to market segmentation based on product grade.
- Improving design quality may lead to higher cost, but this need not always be the case.
Subcomponents and examples of features:
- Product Design: power rating of an engine, robustness, operating cost, ease of use
- Process Design: rated efficiency, process capability, cycle time, downtime for regulatory inspection

Main component: Quality of Conformance
- Refers to the deficiencies resulting from lack of control.
- Decides the level of customer dissatisfaction.
- Improving quality of conformance always leads to reduction of costs. It is in this sense that Crosby says "quality is free".
Subcomponents and examples of features:
- Process Conformance: process instability, process failures, late deliveries, loss of efficiency/yield
- Product Conformance: field failures, factory scrap and rework, deviation from target, incorrect invoices

Quality Tasks and Statistical Tools

Quality of Design
- System Design
- Parameter Design
- Tolerance Design

Quality of Conformance
- Process Monitoring and Adjustment
- Problem Solving
- Product Disposal

Scope in AECL and This Programme

Quality by Product Design
- System Design. Scope in AECL: Limited*. Statistical tools: QFD, FMEA, Reliability Engineering. This programme: Nil.
- Parameter Design. Scope in AECL: Limited*. Statistical tools: Statistical Designs, S/N Ratio, ANOVA. This programme: the concept of robustness only.
- Tolerance Design. Scope in AECL: Limited*. Statistical tools: Statistical Designs, Loss Function, Simulation, Regression. This programme: Nil.

Quality by Process Design
Statistical tools: same as those mentioned against product design, PLUS optimization tools for inventory management, transportation, scheduling etc.
- System Design. Scope in AECL: Limited**. This programme: Nil.
- Parameter Design. Scope in AECL: Very High. This programme: the concept of robustness only, with an illustration.
- Tolerance Design. Scope in AECL: High.

* Applicable only for intermediate products and services
** Applicable mostly for management and service delivery processes

Scope in AECL and This Programme (Contd.)

Quality by Process Conformance
- Process Monitoring and Adjustment. Scope in AECL: Very High. Statistical tools: Probability Distributions, Control Charts, GR&R Studies, PCA, Process adjustment methods. This programme: principles and tools of process monitoring only.
- Problem Solving. Scope in AECL: Very High. Statistical tools: simple tools like Histogram and C&E diagram, (Z, t, χ², F) tests, advanced tools PLUS all the tools mentioned above. This programme: concepts, disciplines and simple tools of problem solving.

Quality by Product Conformance
- Product Disposal. Scope in AECL: High. Statistical tools: Bulk Sampling, Acceptance Sampling, Loss Function. This programme: issues in bulk sampling only.
- Field. Scope in AECL: High. Statistical tools: Nil. This programme: Nil.

Chapter 2: Data and Data Collection

Data
- Data are facts or figures related to any characteristic of an individual.
- The characteristic is also called a variable.
- An individual may be a machine, a year, a casting, a dimension or a person.

Power station outages (since commissioning, up to 31/03/01)
Rows are INDIVIDUALS (stations); columns are VARIABLES.

Station | Date of commissioning | Availability (%) | No. of outages | Average duration of non-stop operation (days) | Average loss per outage (hours), Forced | Average loss per outage (hours), Planned | Main cause of outage | Capacity utilization
C:15 | 12/11/98 | 92.59 | 30 | 27 | 64 | 52 | Leakage | High
C:16 | 10/05/97 | 93.04 | 47 | 28 | 52 | 52 | Leakage | Mod.
D | 12/10/78 | 88.32 | 124 | 58 | 261 | 164 | Gen* | V. Low

* Generator stator / rotor problem

Types of Data/Variable

Data/Variable
- Numerical/Quantitative: Continuous, Discrete
- Categorical/Qualitative: Ordinal, Nominal

Continuous: An infinite number of values (positive or negative) are possible, e.g. measurements of weight, length, chemical composition.
Discrete: The variable can take values 0, 1, 2, 3, ..., e.g. counts (# of defects, breakdowns etc.).
Ordinal: Data classified in ordered categories, e.g. quality of service provided classified as poor, moderate or good, or yearly rainfall classified as very low, low, moderate, good and very good.
Nominal: Data classified in categories having no inherent or explicit order, e.g. location classified as east, west, north or south, or names of departments.

Types of Data - Outage Data

Example
Variable Name | Variable Type
1. Date of commissioning |
2. Availability (%) |
3. Number of outages since commissioning |
4. Average duration of non-stop operation (days) |
5. Average loss per outage (hours) |
6. Main cause of outage |
7. Capacity utilization |

Types of Data - Further Considerations

- Continuous data may appear as discrete either due to rounding (see the outage data example) or due to measurement limitations. We should treat such data as continuous unless the number of levels in the data set is very small (say 2-4).
- For example, hourly records of steam pressure at turbine inlet (station F) show that the values are either 126, 127 or 128. Great care must be exercised while analyzing such data.
- Discrete data having seven or more levels may be treated as continuous data.
- Dichotomous data (O.K/Not O.K, Pass/Fail etc.) may be treated as discrete data after coding the two categories as 1 (O.K) and 0 (Not O.K).
- In the field of Quality Control, the various types of data are classified as
  - VARIABLE DATA: Continuous data
  - ATTRIBUTE DATA: Others (Discrete, Dichotomous, Ordinal and Nominal)
  Henceforth we shall use this latter classification.
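The rules of thumb above can be written as a small helper. This is only a sketch. The thresholds (2-4 levels for continuous data, seven or more levels for discrete data) come from the slide. The helper names, the "caution" label for continuous data with very few levels, and the steam-pressure readings (made up to match the 126/127/128 pattern described) are illustrative assumptions.

```python
def suggested_treatment(values, kind):
    """Suggest how to treat a variable, following the rules of thumb above.

    kind is what the variable really is: "continuous" or "discrete".
    Returns "continuous", "discrete" or "caution" (hypothetical labels).
    """
    levels = len(set(values))
    if kind == "continuous":
        # Rounded or coarsely measured continuous data: treat as continuous
        # unless only a very few (say 2-4) distinct levels appear.
        return "caution" if levels <= 4 else "continuous"
    if kind == "discrete":
        # Discrete data having seven or more levels may be treated as continuous.
        return "continuous" if levels >= 7 else "discrete"
    raise ValueError("kind must be 'continuous' or 'discrete'")


def code_dichotomous(labels, ok_label="O.K"):
    """Code dichotomous data as 1 (O.K) and 0 (Not O.K)."""
    return [1 if label == ok_label else 0 for label in labels]


steam_pressure = [126, 127, 128, 127, 127, 126]  # hypothetical readings, station F pattern
planned_outages_d = [3, 5, 1, 4, 2, 5, 2, 1, 6, 3, 7, 7, 4, 7, 6, 5, 6, 4, 2, 2, 2, 6, 2]

print(suggested_treatment(steam_pressure, "continuous"))   # caution
print(suggested_treatment(planned_outages_d, "discrete"))  # continuous (7 levels)
print(code_dichotomous(["O.K", "Not O.K", "O.K"]))         # [1, 0, 1]
```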

Data Gateway

Problem/Hypothesis → DATA COLLECTION → Data → DATA ANALYSIS → Solution/Fact

- Quality problems cannot be solved merely on the basis of experience.
- Any claim not backed by data is only a hypothesis.
- Data gates: The quality of the data gates and their placement at appropriate locations of a process are extremely important for process control.
- Data quality: The data collection step is vital: garbage in, garbage out.

Data Quality Scale

Most data are of poor quality. Whenever you see data, doubt it.

Quality category | Impact | Example
Wrong data | Misleading information | Cooked data
Noisy data | Potentially misleading information | High gauge R&R
Irrelevant data | Useless information | Old data
Inadequate data | Partial information | Small sample
Hard data | Difficult to process | Censored data
Redundant data | Useful but adds to .. | Multiple ..

Information Content in Data for Process Control

Source of Data | Attribute Data | Variable Data
General literature | Very low | Low
Past data: In-house routine Q.C. records | Low | Moderate
Past data: Statistically designed experiments | Moderate | High
Live data: Passive observation of the process | Moderate | High
Live data: Statistically designed experiments | High | Very High

Do not transform variable data to attribute data. That would be like burning diamonds for heat.

Data Collection Process

[Figure: Data collection process. A sample of individuals (Ind. 1, Ind. 2, ..., Ind. n) is drawn from the population. Measuring the variables (Var. 1, Var. 2, ..., Var. p) on each individual gives a table of data, with individuals as rows and variables as columns. The data then go through recording, followed by editing, storage and retrieval.]

Linking Data Quality to Data Collection Process

[Figure: Data collection process elements linked to data quality. Process steps and their elements: Population (individual, variables); Sample (procedure, size); Measurement (gauge, appraiser, others); Recording (format, recorder); Editing, storage, retrieval (issues related to data base mgmt.). The data quality categories wrong, inadequate, hard, irrelevant, noisy and redundant are shown against these steps.]

Poor Data Quality - Cause and Effect Diagram

[Figure: Cause and effect diagram with the effect "Poor Data Quality". Main causes and subcauses: Population (individual, variable); Sample (sampling method, size); Measurement (measurand, gauge, method, appraiser); Recording (format, recorder); Editing, storage, retrieval (operator, software, hardware, data base mgmt. policy).]

Note: Due to limitations of space, only the main subcauses are shown in the CE diagram.

Measurement Related Causes for Poor Data Quality

[Figure: Cause and effect diagram with the effect "Poor Data Quality", covering measurement-related causes. Main causes: Gauges, Appraisers, Measurand and Method. Subcauses shown include: calibration (not done, done long back, results not used); operation status (breakdown, malfunctioning); bias; unstable; not traceable; number of gauges (many, different makes, variable least count); operating range beyond limit; capability (low repeatability, low least count, precision); number of appraisers; reproducibility; inadvertent error; inhomogeneous measurand; unwanted type of data; standard procedure not available or not followed; communication.]

Data Collection Planning - Principle of Inverse Loading

The Planning Questions (planning proceeds from question 1 to question 5; execution proceeds in the reverse order)
1) What do you want to know?
2) How do you want to see what it is that you need to know?
3) What type of tool will generate what it is that you need to see?
4) What type of data is required by the selected tool?
5) Where can you get the required type of data?

Illustration: Has X any effect on Y?
- Scatter diagram: needs paired data (X1, Y1), ..., (Xn, Yn). Such data are nowhere to be collected.
- Histograms of Y for each level of X (X1, X2, X3): need Y11, ..., Y1n for X1, Y21, ..., Y2p for X2 and Y31, ..., Y3q for X3. These can be collected from the final inspection and production log book.

Data Collection Tools

The foregoing discussion indicates that collecting the right data is by no means a trivial task. One can go wrong in various ways at different stages of the data collection process.

The two basic requirements for data collection are
- Clarity of purpose
- Use of a structured approach

Commonly used data collection tools that satisfy the two requirements are
- Check Sheet
- Data Sheet

Check Sheet: Checks (/, ✓, x etc.) are made against a category of a variable or a combination of categories of several variables. Used primarily for collecting attribute data.
Data Sheet: Measurement results are recorded against an individual and its characteristics. Used for collecting both attribute and variable data.

Many consider all check sheets to be data sheets and vice versa. However, we shall distinguish between the two as above.
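A check sheet is essentially a tally of observations against categories. The sketch below models a defect cause check sheet with Python's `collections.Counter`. The station names and causes echo the September 2001 example later in this chapter, but the individual check entries are hypothetical.

```python
from collections import Counter

# One check is recorded as (station, cause) for each hour of low generation.
# Hypothetical entries, for illustration only.
checks = [
    ("C15", "Process failure"), ("C15", "Late pick up"), ("C16", "Process deficiency"),
    ("D", "Process deficiency"), ("D", "Early slowdown"), ("C15", "Process failure"),
    ("E", "Late pick up"), ("F", "Process deficiency"), ("D", "Process deficiency"),
]

sheet = Counter(checks)                                 # checks per (station, cause) cell
by_cause = Counter(cause for _, cause in checks)        # row totals
by_station = Counter(station for station, _ in checks)  # column totals

stations = ["C15", "C16", "D", "E", "F"]
causes = ["Process failure", "Process deficiency", "Early slowdown", "Late pick up"]

print(f"{'Defect':<20}" + "".join(f"{s:>6}" for s in stations) + f"{'Total':>7}")
for cause in causes:
    cells = "".join(f"{sheet[(s, cause)]:>6}" for s in stations)  # missing cells count as 0
    print(f"{cause:<20}{cells}{by_cause[cause]:>7}")
print(f"{'Total':<20}" + "".join(f"{by_station[s]:>6}" for s in stations) + f"{len(checks):>7}")
```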

Process Distribution Check Sheet

Process: Power generation (moving target)
Characteristics: Y1 = Total generation (MW), Y2 = System demand
Month: September; sampling interval: every 3.5 hours
Target: Min(420, Y2)
Data: Target - Y1
Process average (Y1 bar): 420 MW; total no. of observations: 206

Class intervals of (Target - Y1): < -54.99, -54.99 to -44.99, -44.99 to -34.99, -34.99 to -24.99, -24.99 to -14.99, -14.99 to -04.99, -04.99 to 05.01, 05.01 to 15.01, 15.01 to ..

Export limit = -10; Import limit = +20
- Beyond the export limit: wasteful export due to lack of control
- Beyond the import limit: wasteful import due to lack of control

Defect rate = 27%

Causes for Wasteful Import of Power

[Figure: Run chart of half-hourly readings of generation at station C15 in September 2001, with periods of low generation marked A: Process failure, B: Process deficiency, C: Early slow down, D: Late pick up.]

Defect Cause Check Sheet

Month: September, 2001
Data: # of hours of generation affected

Defect | C15 | C16 | D | E | F | Total
Process failure | | | | | | 52
Process deficiency | | | | | | 81
Early slowdown | | | | | | 15
Late pick up | | | | | | 34
Total | 54 | 22 | 65 | 21 | 20 | 182

Note: The criticality of the defects is not the same across all stations.

Identifying Critical Causes for Wasteful Import

[Tables: (1) Hours of low generation, (2) Average generation loss at each instant, (3) Total generation loss (MWh), each broken down by cause and by station (C15, C16, D, E, F). PF = Process failure, PD = Process deficiency, ES = Early slow down, LP = Late pick up.]

Total generation loss by cause (MWh): PF 3200, PD 558, ES 328, LP 409; all causes 4495.

Other Types of Check Sheets

Defective item check sheet
- Checks are made against various causes of rejection/rework of an item.

Defect location check sheet
- Instead of a table, a diagram of the defect space is made.
- Checks are made at the location where the defect occurs.
- Locational segregation of defects, if any, provides a valuable clue.
- Examples: leakage in a cooling system, cracks in castings, wear out of moving parts.

Check-up confirmation check sheet
- Used to make a comprehensive check-up of product/process quality (usually at the final stage).
- Preprinted check items avoid duplicating or missing the tests to be performed.
- It is a variation of the checklist, which is used to check whether or not all the tasks have been performed.

C-E diagram check sheet
- Checks are made against the causes of a problem in the C-E diagram.

Data Sheet - General Format

Title
Common relevant information

Individual | Var. 1 | Var. 2 | ... | Var. p | Remark
Ind. 1 | | | | |
Ind. 2 | | | | |
...
Ind. n | | | | |

Notes: Important summary of data

Data Sheet - Example

Up-load detention report for the month of July, 2001

[Table: one row per rake, with columns Rake No., Date, Arrival time, Quality, # of wagons, Form. date, Form. time, Depart. date, Depart. time, Detent. hours, Demur. hours, Reason, Actual unloading time (Hr.).]

Purpose?
- Estimation of demurrage hours
- Control of demurrage hours

The important reasons cited are receipt in quick succession, successive detentions and wet coal. These are beyond the control of the coal handling section.

Inadequate Data!

Chapter 3: Summarization of Data

Data Analysis - Getting Started

Half-hourly record of generation (MW) by station E from 19/9/01 (10.00 hrs.) to 21/9/01 (01.30 hrs.) under normal operating conditions (80 readings):

102.8 105.2 103.2 104.0 105.0 105.0 104.0 104.0
103.2 104.2 102.0 103.6 105.2 106.0 105.0 103.0
104.2 105.8 105.4 104.8 106.0 104.0 104.2 103.8
103.4 104.4 104.4 104.2 104.8 102.8 103.6 104.8
104.0 104.0 104.0 104.0 103.0 104.8 102.8 104.0
104.0 103.4 106.0 104.4 105.2 105.0 105.2 105.2
105.2 103.8 103.2 104.8 104.4 104.8 104.4 104.4
103.4 104.4 104.8 106.0 105.0 103.0 105.2 104.0
106.2 104.8 104.0 103.6 102.4 105.6 106.4 105.2
103.0 105.2 102.2 106.4 104.0 102.6 104.0 102.8

What are your conclusions?

Frequency Distribution - Analyzing a large data set on the same variable

Generation data set (previous slide): the eighty observations are grouped in eight classes of equal length.

Class Interval | Frequency
101.7-102.3 | 02
102.3-102.9 | 06
102.9-103.5 | 10
103.5-104.1 | 19
104.1-104.7 | 11
104.7-105.3 | 22
105.3-105.9 | 03
105.9-106.5 | 07
Total | 80

Does the frequency distribution provide better insight into the process?

Data are not information:
DATA + ANALYSIS = INFORMATION

Constructing Frequency Distributions - Variable Data

Data set
- Number of observations (N): about 100 on the same variable.

Formation of the classes (first column)
- Number of classes (k)
  - Too many classes obscure the pattern of the distribution because of sampling fluctuations. Details are lost with too few classes.
  - The optimum number of classes is given by $k = 1 + 3.3\log_{10} N$.
  - The simpler formula $k = \sqrt{N}$ also works well in practice.
  - For better visual impact, it is preferable to have $5 \le k \le 12$.
  - For the generation data set we have N = 80. Therefore, $k = 1 + 3.3\log_{10} 80 = 7.3$. This means the number of classes should be either 7 or 8. We have chosen 7 classes.
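Both rules for the number of classes are easy to evaluate. Here is a minimal sketch using only Python's `math` module. The function names are hypothetical.

```python
import math


def sturges_classes(n):
    """Number of classes suggested by k = 1 + 3.3 log10(N)."""
    return 1 + 3.3 * math.log10(n)


def sqrt_classes(n):
    """Number of classes suggested by the simpler rule k = sqrt(N)."""
    return math.sqrt(n)


n = 80  # generation data set
print(f"k (1 + 3.3 log10 N) = {sturges_classes(n):.1f}")  # 7.3, so use 7 or 8 classes
print(f"k (sqrt N)          = {sqrt_classes(n):.1f}")     # 8.9
```

For N = 80 both rules give a value between 7 and 9, which lies inside the preferred range of 5 to 12 classes.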

Constructing Frequency Distributions (..Contd.)

Class width (h)
- $h = (R + w)/k$, where $R$ = range of the observations = Maximum - Minimum, and $w$ = least count of measurement.
- Next, h is rounded to the nearest integer multiple of w. This means that if the least unit of measurement (w) is 0.1, then h = 2.312 should be rounded to 2.3. However, if w = 0.2, the same h should be rounded to 2.4.
- In our generation data example, R = 106.4 - 102.0 = 4.4 and w = 0.2. Thus h = (4.4 + 0.2)/7 = 0.657, which is rounded to 0.6. We shall explain later why taking h = 0.7 would be erroneous.
- Note that if h is rounded down, we shall need (k + 1) classes to cover the whole range of the observations. How many classes shall we need if h is rounded up?
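The class-width rule can be sketched as follows. The helper names are hypothetical. The sketch counts the classes needed to cover the span from Minimum - w/2 to Maximum + w/2, which is how the class limits are formed on the next slide. The inner `round(..., 9)` and `round(..., 10)` calls only suppress floating-point noise.

```python
import math


def round_to_multiple(h, w):
    """Round h to the nearest integer multiple of the least count w."""
    return round(round(h / w) * w, 10)  # outer round() trims floating-point noise


def class_width(minimum, maximum, k, w):
    """Return (raw h, rounded h) with h = (R + w) / k and R = maximum - minimum."""
    raw_h = (maximum - minimum + w) / k
    return raw_h, round_to_multiple(raw_h, w)


def classes_needed(minimum, maximum, h, w):
    """Classes of width h needed to cover (minimum - w/2) to (maximum + w/2)."""
    span = (maximum - minimum) + w
    return math.ceil(round(span / h, 9))


print(round_to_multiple(2.312, 0.1), round_to_multiple(2.312, 0.2))  # 2.3 2.4

raw_h, h = class_width(102.0, 106.4, k=7, w=0.2)   # generation data
print(f"raw h = {raw_h:.3f}, rounded h = {h}")     # raw h = 0.657, rounded h = 0.6
print("classes needed:", classes_needed(102.0, 106.4, h, w=0.2))  # 8, i.e. k + 1
```

The same `classes_needed` helper can be used to explore the question posed above by passing a rounded-up width.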

Constructing Frequency Distributions (..Contd.)

Class limits
- The minimum value of the generation data is 102.0, and the class width has been determined as 0.6. So we can form the classes as 102.0-102.6, 102.7-103.3, 103.4-104.0, ...
- The problem with this classification is that there is a gap between two successive class intervals. This is not desirable since we are dealing with continuous data.
- The discontinuity can be removed by forming the classes as 102.0-102.6, 102.6-103.2, 103.2-103.8, ...
  However, this classification has another problem. Suppose we have an observation of 102.6. In which class shall we place it, the first or the second?
- To avoid such confusion we take
  Lower limit of the first class = Minimum - w/2
  and then successively add the class width to this lower limit to obtain the other class limits.

Constructing Frequency Distributions (..Contd.)

Class limits (..Contd.)
Thus, for the generation data we have the classes
101.9-102.5, 102.5-103.1, 103.1-103.7, 103.7-104.3, 104.3-104.9, 104.9-105.5, 105.5-106.1, 106.1-106.7

Note that we now have
- 8 classes (since h has been rounded down from 0.657 to 0.6),
- no confusion in classification (since no observations fall on the class limits), and
- an extended last class (ideally, the upper limit of the last class should have been 106.5).

In the example, we have extended the first class instead of the last one, since this brought out the process abnormalities better. Thus the eight classes used are 101.7-102.3, 102.3-102.9, ..., 105.9-106.5.

Constructing Frequency Distributions (..Contd.)

Tally marking (second column)
- Start with the first observation. Find the class to which the observation belongs. Put a tally against the class.
- Classify all the remaining observations in the same way.
- Tally marks are grouped in fives, with the fifth tally crossed through the previous four. This provides a better visual display and helps in counting the frequency of each class.
- Note that all the observations get classified in a single pass through the data. If instead we concentrate on one class at a time and count the observations in that class, we have to go through the observations k times. This not only consumes more time but also increases the chance of committing errors.

Counting frequency (third column)
- The frequency (f) of each class is obtained simply by counting the tallies.

Other columns
- Columns giving cumulative frequency (f1, f1+f2, ...) and relative frequency (f1/N, f2/N, ...) may also be added, if required.
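The single-pass tallying described above, together with the cumulative and relative frequency columns, can be sketched as below. The readings are the 80 station E values listed earlier, and the classes are the eight used on the slide (first lower limit 101.7, width 0.6). Working in integer tenths of a MW is an implementation choice, not part of the slide: it keeps comparisons with the class limits exact. An observation falling exactly on a limit would be counted in the upper class, but none do here, because the limits end in odd tenths and the readings in even tenths. The frequencies should reproduce the slide's table (2, 6, 10, 19, 11, 22, 3, 7).

```python
# The 80 half-hourly generation readings (MW) of station E listed earlier.
GENERATION = [
    102.8, 105.2, 103.2, 104.0, 105.0, 105.0, 104.0, 104.0,
    103.2, 104.2, 102.0, 103.6, 105.2, 106.0, 105.0, 103.0,
    104.2, 105.8, 105.4, 104.8, 106.0, 104.0, 104.2, 103.8,
    103.4, 104.4, 104.4, 104.2, 104.8, 102.8, 103.6, 104.8,
    104.0, 104.0, 104.0, 104.0, 103.0, 104.8, 102.8, 104.0,
    104.0, 103.4, 106.0, 104.4, 105.2, 105.0, 105.2, 105.2,
    105.2, 103.8, 103.2, 104.8, 104.4, 104.8, 104.4, 104.4,
    103.4, 104.4, 104.8, 106.0, 105.0, 103.0, 105.2, 104.0,
    106.2, 104.8, 104.0, 103.6, 102.4, 105.6, 106.4, 105.2,
    103.0, 105.2, 102.2, 106.4, 104.0, 102.6, 104.0, 102.8,
]


def frequency_table(data, first_lower, width, n_classes):
    """Classify every observation in a single pass and build the table.

    Returns rows (lower, upper, f, cumulative f, relative f). Limits are
    handled in integer tenths so that comparisons are exact.
    """
    lo, step = round(first_lower * 10), round(width * 10)
    counts = [0] * n_classes
    for x in data:  # one pass: each observation gets its tally once
        i = (round(x * 10) - lo) // step
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[i] += 1
    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        cumulative += f
        lower, upper = (lo + i * step) / 10, (lo + (i + 1) * step) / 10
        rows.append((lower, upper, f, cumulative, f / len(data)))
    return rows


table = frequency_table(GENERATION, first_lower=101.7, width=0.6, n_classes=8)
for lower, upper, f, cum, rel in table:
    print(f"{lower:.1f}-{upper:.1f}  {'|' * f:<22} {f:>3} {cum:>3} {rel:.3f}")
```

The row of bars (`|`) stands in for the tally column and, turned sideways, gives a rough text histogram.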

Constructing Frequency Distributions - Getting the class intervals right

Why the class width (h) is rounded to the nearest integer multiple of w
- Consider the same generation data example. Here w = 0.2. Assume that h = 0.657 is rounded to 0.7 (which is not an integer multiple of 0.2) instead of 0.6. The classes will then be 101.9-102.6, 102.6-103.3, ...
- To overcome the problem of classifying observations like 102.6, we are forced to take w = 0.1 and use the classes 101.95-102.65, 102.65-103.35, 103.35-104.05, 104.05-104.75, 104.75-105.45, 105.45-106.15, 106.15-106.85.
- Note that the classes do not all cover the same number of observation units. For example, the second class covers three units (102.8, 103.0 and 103.2), but the third class covers four units (103.4, 103.6, 103.8 and 104.0). As a result, the frequency distribution is likely to show many peaks.

Balancing end points
- Assuming w = 0.1, the seven classes shown above should be appropriate. However, note that the last class extends four units beyond the maximum observed value of 106.4. It is desirable to distribute this imbalance over the two end classes by starting the first class at 101.75 and ending the last at 106.65.
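The uneven coverage can be checked by counting, for each class of width 0.7 starting at 101.95, how many possible readings (multiples of 0.2 from 102.0 to 106.8) fall inside it. The helper name is hypothetical. It works in hundredths of a MW so that limits such as 101.95 compare exactly.

```python
def units_per_class(first_lower, width, n_classes, smallest, largest, w):
    """Count how many possible readings (multiples of w) fall in each class.

    Values are handled in integer hundredths so that limits such as
    101.95 are compared exactly.
    """
    lo = round(first_lower * 100)
    step = round(width * 100)
    unit = round(w * 100)
    counts = [0] * n_classes
    for v in range(round(smallest * 100), round(largest * 100) + 1, unit):
        i = (v - lo) // step
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts


# Readings are recorded in steps of 0.2 MW; classes of width 0.7 start at 101.95.
print(units_per_class(101.95, 0.7, 7, 102.0, 106.8, 0.2))  # [4, 3, 4, 3, 4, 3, 4]
```

The alternating 4-3-4-3 coverage is exactly what produces the spurious peaks described above.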

Frequency Distribution of the Generation Data - Further analysis

- The frequency distribution shows an abnormal pattern (nearly alternating peaks). Does this mean the process mean is jumping randomly by about 1.2 units?
- The following two frequency distributions, constructed from the same data, provide some additional clues.

Fractional part | Frequency
.0 | 27
.2 | 18
.4 | 15
.6 | 5
.8 | 15
Total | 80

0s occur more frequently at the cost of 6s. Does this indicate measurement bias? Noisy Data!!

Class interval | Frequency
101.7-102.7 | 04
102.7-103.7 | 17
103.7-104.7 | 26
104.7-105.7 | 25
105.7-106.7 | 08
Total | 80

Smooth pattern (left skewed). Smoothness has been achieved not only by reducing the number of classes but also by including the adjacent 0s and 6s in the same interval.

Histogram

A histogram is a graphical representation of a frequency distribution of variable data.

The histogram of the generation data with five classes is shown below.

[Figure: Histogram of generation in E station (MW), five classes with boundaries 101.7, 102.7, ..., 106.7; vertical axis: frequency (0 to 30).]

- Bars of equal width (= class width)
- Heights of the bars are proportional to the frequencies of the classes
- Bar width of about 1 cm (7-10 classes)
- Horizontal axis is about 1.6 times longer than the vertical axis
- Central tendency: about 104.2
- Pattern of variation: slightly left skewed
- Specification limits: should be shown wherever applicable
- Class mid-points: marking them may be helpful in certain cases
- Open ended classes: avoid adding too many classes at the ends with zero or very low frequencies. Show them as open ended bars with arbitrarily reduced heights.

Construction of Histogram - An exercise

The half-hourly record of power (MW) generated by station E from 29.9.2001 (10.00 hours) to 30.9.2001 (24.00 hours) gives the following data.

6.4 6.4 6.8 6.0 5.2 4.8 6.4 4.4 5.2 6.0
7.6 8.0 7.4 6.6 8.0 5.6 7.2 7.2 7.0 4.0
6.4 8.0 8.0 6.0 6.0 6.4 7.8 7.6 7.6 7.4
7.6 7.6 7.4 4.6 4.2 4.8 6.0 5.6 5.4 5.0
6.2 7.8 7.4 7.2 7.4 7.8 6.6 6.4 6.8 6.8
6.8 6.8 6.6 6.8 6.6 6.8 6.8 6.8 7.0 7.0
6.0 5.6 4.4 4.6 4.6 4.8 6.2 7.0 6.6 6.4

Construct a histogram of the above data set. Compare it with the histogram for the period 19.9.01 to 21.9.01 (previous slide) and offer your comments.

Commonly Observed Histogram Patterns

- Single peak, symmetric, bell shaped: the commonly observed pattern of a stable process (shown with LSL and USL).
- Single peak, thick tail: How?
- Single peak, positively skewed (long tail on the right)
- Single peak, negatively skewed (long tail on the left)
  Many characteristics follow such patterns. We have already seen that generation data are negatively skewed while breakdown data are positively skewed. However, such shapes may also indicate process instability.
- Two peaks (bimodal): How?

Frequency Distribution of Discrete Data

Number of plant outages in each year since commissioning

Station | Period | Type of outage | # of outages in a year
D | 1978-79 to 2000-01 | Forced | 2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2
D | | Planned | 3, 5, 1, 4, 2, 5, 2, 1, 6, 3, 7, 7, 4, 7, 6, 5, 6, 4, 2, 2, 2, 6, 2
E | 1985-86 to 2000-01 | Forced | 2, 2, 5, 3, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 4
E | | Planned | 15, 7, 8, 3, 7, 5, 2, 6, 3, 8, 7, 4, 5, 4, 3, 4
F | 1988-89 to 2000-01 | Forced | 1, 1, 0, 0, 1, 1, 2, 0, 1, 0, 1, 6, 4
F | | Planned | 3, 11, 6, 12, 1, 2, 8, 2, 4, 4, 6, 4, 0

Ideally we should construct six frequency distributions (one for each type of outage in each station). However, due to shortage of data we shall construct only two: one for forced outages and the other for planned outages.

What can you say about the occurrence of the two types of outages from the above data set?
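A frequency distribution of discrete data is simply a count of each outcome. The sketch below pools the yearly counts of stations D and E from the table, one distribution per outage type, which mirrors the slide's decision to construct only two distributions. Station F is left out only to keep the example short. The asterisks give a rough text version of the line and bar graphs.

```python
from collections import Counter

# Yearly number of outages since commissioning, from the table above.
OUTAGES = {
    ("D", "Forced"): [2, 3, 1, 0, 3, 2, 1, 0, 2, 2, 0, 2, 3, 0, 2, 1, 2, 1, 1, 0, 1, 0, 2],
    ("D", "Planned"): [3, 5, 1, 4, 2, 5, 2, 1, 6, 3, 7, 7, 4, 7, 6, 5, 6, 4, 2, 2, 2, 6, 2],
    ("E", "Forced"): [2, 2, 5, 3, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 4],
    ("E", "Planned"): [15, 7, 8, 3, 7, 5, 2, 6, 3, 8, 7, 4, 5, 4, 3, 4],
}


def pooled_distribution(outage_type):
    """Frequency of each yearly outage count, pooled over stations."""
    counts = Counter()
    for (_, kind), values in OUTAGES.items():
        if kind == outage_type:
            counts.update(values)
    return counts


for outage_type in ("Forced", "Planned"):
    dist = pooled_distribution(outage_type)
    print(outage_type)
    for outcome in range(min(dist), max(dist) + 1):  # include outcomes never observed
        print(f"  {outcome:>2} outages: {dist[outcome]:>2} {'*' * dist[outcome]}")
```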

Line Graph and Bar Graph

Distribution of the number of yearly outages of stations D, E and F since commissioning

[Figure: Left, forced outages (line graph); right, planned outages (bar graph). Horizontal axes: number of outages; vertical axes: frequency.]

- The line graph shows the frequencies of individual outcomes.
- The bar graph is similar to the histogram, but there are gaps between the bars since we are dealing with discrete (attribute) data.
- Planned outages occur more frequently than forced outages.
- The number of planned outages is uniformly distributed between 2 and 7, with very few years outside this band. Such a pattern is somewhat odd. Planned outages need to be defined properly. Do we undertake unnecessary planned outages?

Measures of Central Tendency - The Typical Value

Mean
- The most effective measure for numerical data. Let {X1, X2, ..., XN-1, XN} be the data set. Then
  $\bar{X} = (X_1 + X_2 + \dots + X_N)/N = \sum_{i=1}^{N} X_i / N$
- Sensitive to extreme values

Median
- May be used for ordinal data but not for nominal data
- Ordinal data: the category containing the (N+1)/2 th case
- Numerical data: the (N+1)/2 th ordered observation when N is odd, and the average of the N/2 th and (N/2)+1 th ordered observations when N is even
- Can be computed even for open ended classes at the extremes, provided each of the end classes contains less than 50% of the observations
- Insensitive to outliers

Mode
- The category or value occurring with the greatest frequency
- The only measure of centre for nominal data
- May not be unique, and is highly sensitive to how the classes or categories are formed
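The median rules above can be sketched in a few lines. The numerical rule follows the slide directly. For ordinal data, the sketch returns the category containing the (N+1)/2 th case of the ordered data. When N is even, that position falls between two cases, and the sketch takes the lower middle case, which is our assumption. The service-quality ratings are hypothetical.

```python
def median_numerical(values):
    """(N+1)/2 th ordered value for odd N; mean of the N/2 th and (N/2)+1 th for even N."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("empty data set")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                  # the (N+1)/2 th value (1-based)
    return (xs[mid - 1] + xs[mid]) / 2


def median_ordinal(values, order):
    """Category containing the (N+1)/2 th case; lower middle case for even N (assumption)."""
    rank = {category: i for i, category in enumerate(order)}
    xs = sorted(values, key=rank.__getitem__)
    return xs[(len(xs) + 1) // 2 - 1]


SERVICE = ["poor", "moderate", "good"]  # ordered categories
ratings = ["good", "poor", "moderate", "good", "good", "moderate", "poor"]  # hypothetical
print(median_ordinal(ratings, SERVICE))      # moderate
print(median_numerical([3, 1, 4, 1, 5, 9]))  # 3.5
```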

Interpretation of Mean

In a rising voltage test, the alternating breakdown voltages (kV) of 24 samples of an insulation arrangement were found to be as follows:
210; 208; 208; 175; 182; 206; 190; 194; 198; 205; 212; 200; 205; 202; 207; 210; 202; 201; 188; 205; 209; 201; 216; 196

Mean = [210 + 208 + ... + 216 + 196] / 24 = 201.25 kV

[Figure: Dot plot of the 24 breakdown voltages on a scale from 170 to 220 kV, with the mean marked.]

- The mean is the balance point (or fulcrum) of the distribution of the values.
- The mean is analogous to the centre of gravity.
- The sum of the negative deviations from the mean exactly equals, in magnitude, the sum of the positive deviations. Thus the total sum of the deviations from the mean is always zero.
- In the above example, the mean should be interpreted as a measure of centre and not as a measure of central tendency or typical value.
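The balance-point property, and the gap between mean and median, can be checked with Python's standard `statistics` module. The sketch uses the 24 voltages listed above. One plausible reading of the last bullet is that a few low values (175, 182) pull the mean below the median, so the mean marks the centre of gravity rather than a typical value.

```python
import statistics

# Breakdown voltages (kV) of the 24 insulation samples listed above.
VOLTAGES = [210, 208, 208, 175, 182, 206, 190, 194, 198, 205, 212, 200,
            205, 202, 207, 210, 202, 201, 188, 205, 209, 201, 216, 196]

mean = statistics.mean(VOLTAGES)        # balance point of the dot plot
median = statistics.median(VOLTAGES)    # average of the 12th and 13th ordered values
modes = statistics.multimode(VOLTAGES)  # most frequent value(s)

deviations = [v - mean for v in VOLTAGES]
negative = sum(d for d in deviations if d < 0)
positive = sum(d for d in deviations if d > 0)

print(f"mean = {mean} kV, median = {median} kV, mode(s) = {modes}")
# mean = 201.25 kV, median = 203.5 kV, mode(s) = [205]
print(f"negative deviations = {negative}, positive = {positive}, total = {negative + positive}")
# negative deviations = -87.5, positive = 87.5, total = 0.0
```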

Data Analysis - Getting Started

Reportable accidents (#) in AEC Ltd., Sabarmati during 1995-2000

Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Total
1995 | 24 | 17 | 27 | 19 | 10 | 25 | 19 | 22 | 23 | 16 | 18 | 15 | 235
1996 | 22 | 10 | 22 | 18 | 16 | 21 | 21 | 20 | 21 | 18 | 18 | 18 | 225
1997 | 19 | 14 | 12 | 15 | 15 | .. | 15 | 24 | 19 | 16 | 14 | 19 | 192
1998 | 14 | 14 | 12 | 20 | 19 | 23 | 10 | 16 | 13 | 15 | 17 | 19 | 192
1999 | 19 | 13 | 15 | 13 | 16 | 18 | 17 | 16 | 20 | 17 | 13 | 16 | 193
2000 | 12 | 14 | 15 | 22 | 12 | .. | .. | 13 | .. | .. | .. | 13 | 134
Total | 110 | 82 | 103 | 108 | 88 | 100 | 88 | 111 | 103 | 91 | 87 | 100 | 1171

What are your conclusions?