Amit K Biswas
For PGPM
July 7, 2021
Contents
1 Introduction
Data
Collection of Data
Compilation of Data
Centre & Spread
Central Tendency
Dispersion
Degree of Freedom
Other dispersion measures
TYPES OF DATA
CONTINUOUS DISCRETE
Countable
e.g. Heart Rate, Number of defects
Data if properly collected
DATA GATHERING
What is Data ?
—K.Ishikawa
—Lord Kelvin
Difference between...
CONTINUOUS DISCRETE
Capacity of an O2 tank
Data Collection
OBJECTIVES OF DATA COLLECTION
A data set
2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88
To answer questions like these about any data set, we need to compile the data.
This suspended pipe is horizontal. The same pipe has now tilted.
WHY?
In the first case the rope is tied at the center of gravity; in the second, the rope is tied away from the center of gravity.
The center of gravity is the point where the entire mass of the body is supposed to be concentrated.
Let us now find the mean of the values of our earlier data.
$$\bar{x} = \frac{2.87 + 2.85 + \cdots + 2.88}{50} = 2.849$$
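The same arithmetic can be checked in a few lines of Python. This is only a sketch: the variable names are illustrative, and the values are the 50 measurements from the data set slide.

```python
# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

n = len(data)            # number of observations
mean = sum(data) / n     # arithmetic mean: sum of values divided by count

print(n, round(mean, 3))  # 50 2.849
```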
Amit (ISI, Chennai) Data July 7, 2021 18 / 59
Let us now find the mean with this new method. We will need the frequency table.
$$\text{Mean, } \bar{x} = \frac{2.81 \times 5 + 2.83 \times 12 + 2.85 \times 26 + 2.87 \times 5 + 2.89 \times 2}{5 + 12 + 26 + 5 + 2} = 2.8448$$
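As a rough check, the frequency-table mean can be reproduced as a weighted average. The midpoints and frequencies are the ones in the formula above; the variable names are illustrative only.

```python
# Class midpoints and class frequencies taken from the formula above.
midpoints = [2.81, 2.83, 2.85, 2.87, 2.89]
frequencies = [5, 12, 26, 5, 2]

total = sum(frequencies)  # total number of observations
# Each midpoint is weighted by how many values fall in its class.
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / total

print(total, round(grouped_mean, 4))  # 50 2.8448
```

The grouped mean (2.8448) differs slightly from the mean of the raw values (2.849) because every value is replaced by the midpoint of its class.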
The median is the value that has an equal number of values above it and
below it when the data are arranged in ascending order.
Mathematically :
Mode
We have thus calculated the Mean, Median and Mode for a given
data set.
The Mean, Median and Mode are equal for a symmetric unimodal
distribution.
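A minimal sketch of all three measures on the 50 values, using Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(round(mean, 3), median, mode)  # 2.849 2.85 2.85
```

Here the median and mode coincide at 2.85 and the mean is very close to them, which suggests a roughly (not perfectly) symmetric data set.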
Measures of Dispersion
The extent of the spread of the values from the mean value is called
Dispersion.
1 Range (R)
2 Standard Deviation (s)
3 Variance ($s^2$)
4 Coefficient of Variation (CV)
Measures of Dispersion
Of Population:
Of Sample:
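For reference, the usual textbook definitions of the standard deviation are sketched below. The symbol $\sigma$ for the population value is an assumption here; $N$ and $\mu$ denote the population size and mean, and $n$ and $\bar{x}$ the sample size and mean.

```latex
% Population standard deviation (divide by N)
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}

% Sample standard deviation (divide by n - 1)
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

The only structural difference is the divisor: $N$ for a population, $n - 1$ for a sample.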
Degrees of Freedom
This is a deep and wide concept, not easy to grasp in a single go.
Here is an analogy (an analogy only, not the whole concept).
This is a piece of chocolate I want to divide into three pieces.
For my friend!
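One way to make the analogy concrete is sketched below. The two chocolate shares are hypothetical numbers chosen for illustration. The second half applies the same idea to the 50 data values: once the mean is fixed, the deviations must sum to zero, so only n − 1 of them can vary freely.

```python
# Chocolate analogy: the three shares must add up to the whole bar (1.0).
# The first two shares are hypothetical; the third is then forced.
first, second = 0.5, 0.3
third = 1.0 - first - second  # no freedom left for the last piece

# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

n = len(data)
mean = sum(data) / n
deviations = [x - mean for x in data]

# Deviations from the mean sum to zero, so the last one is determined
# by the other n - 1: it must equal minus their sum.
implied_last = -sum(deviations[:-1])

print("free deviations:", n - 1)
print("last deviation is forced:", abs(implied_last - deviations[-1]) < 1e-9)
```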
Back to Variation
Alternatively:
$$s = \sqrt{\frac{(x_1^2 + x_2^2 + \ldots + x_n^2) - n \times \bar{X}^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n-1}}$$
$$s = \sqrt{\frac{(2.87^2 + 2.85^2 + \ldots + 2.88^2) - 50 \times 2.849^2}{50 - 1}} = 0.0181$$
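A short sketch comparing the definitional form of the sample standard deviation with the alternative form above, on the same 50 values. The variable names are illustrative.

```python
import math

# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

n = len(data)
mean = sum(data) / n

# Definitional form: squared deviations from the mean, divided by n - 1.
s_def = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Alternative form: sum of squares minus n times the squared mean.
s_alt = math.sqrt((sum(x * x for x in data) - n * mean ** 2) / (n - 1))

print(round(s_def, 4), round(s_alt, 4))  # 0.0181 0.0181
```

The two forms agree algebraically. The alternative form subtracts two nearly equal numbers, so for data with a large mean and a small spread the definitional form is usually the numerically safer choice.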
Other measures
Range: $R = X_{\max} - X_{\min}$
Mean Deviation:
$\frac{\sum_{i=1}^{n} |x_i - A|}{n}$ is the mean deviation from some value $A$.
$\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$ is the mean deviation from the Mean $\bar{x}$.
$\frac{\sum_{i=1}^{n} |x_i - Md|}{n}$ is the mean deviation from the Median $Md$.
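These measures can be sketched for the 50 values as follows. The helper name `mean_deviation` is illustrative, not from the slides.

```python
import statistics

# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]


def mean_deviation(values, a):
    """Average absolute distance of the values from a reference point a."""
    return sum(abs(x - a) for x in values) / len(values)


data_range = max(data) - min(data)                       # X_max - X_min
md_mean = mean_deviation(data, statistics.mean(data))     # about the mean
md_median = mean_deviation(data, statistics.median(data)) # about the median

print(round(data_range, 2))   # 0.09
print(round(md_mean, 5))      # 0.01252
print(round(md_median, 5))    # 0.0122
```

The mean deviation about the median is never larger than the mean deviation about any other value, which is why it comes out smallest here.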
Percentiles
Percentile (or a centile) is a score below which a given percentage of
scores in its frequency distribution falls. For example, the 50th
percentile (the median) is the score below which 50% of the scores
in the distribution may be found.
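A minimal sketch of two related calculations on the 50 values. The function names are illustrative, and the nearest-rank rule used here is only one of several percentile conventions in common use, so other tools may give slightly different answers.

```python
import math

# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]


def percent_below(values, score):
    """Percentage of values strictly below the given score."""
    return 100.0 * sum(v < score for v in values) / len(values)


def percentile_nearest_rank(values, p):
    """p-th percentile (integer p, 1..100) by the nearest-rank convention."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


print(percentile_nearest_rank(data, 50))  # 2.85
print(percentile_nearest_rank(data, 90))  # 2.87
print(percent_below(data, 2.85))          # 34.0
```

Because 2.85 occurs 19 times, only 34% of the values lie strictly below the 50th percentile; with heavily tied data the "50% below" description holds only approximately.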
Variation
This man wants to reach his workplace by 6:55 a.m., but he cannot
do so at exactly 6:55 a.m. every day. Sometimes he arrives earlier
(but almost never before 6:50 a.m.). Sometimes he arrives later
(but almost never after 7:00 a.m.).
WHY ?
The bin below is the population, and the three items selected beside it
are a sample of size 3.
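Drawing a random sample of size 3 can be sketched as below. The population of 20 numbered items is hypothetical (the slide shows only a picture of a bin), and the fixed seed is there only to make the run repeatable.

```python
import random

# Hypothetical population: 20 numbered items in the bin.
population = list(range(1, 21))

rng = random.Random(0)               # fixed seed so the sketch is repeatable
sample = rng.sample(population, 3)   # 3 distinct items, each equally likely

print(sample)
```

`random.sample` draws without replacement, so no item can appear in the sample twice.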
[Figure: Population → (random sampling) → Sample → (measurement/observation) → Data → Action or no action]
— Sampling is
— Why sample?
A data set
2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88
Possibly not much!
The spread of each range (e.g. 2.82 − 2.80 = 0.02) is called the Class
Interval. Now we also know:
that 52% of the values lie in the 2.84 − 2.86 range;
what the distribution is in a given range.
This grouping is helpful when you have a large number of values.
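A sketch of how the frequency table could be built from the 50 values. It assumes each class excludes its lower limit and includes its upper limit (so 2.82 falls in 2.80 − 2.82); that convention is inferred because it reproduces the frequencies 5, 12, 26, 5, 2 used in this deck. Values are converted to whole hundredths to avoid floating-point trouble at class boundaries.

```python
from collections import Counter

# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

LOW, WIDTH = 280, 2  # lowest class limit and class interval, in hundredths

# Class index for each value; the "- 1" puts an upper limit such as 2.82
# into the class that ends there (assumed convention, see lead-in).
counts = Counter((round(x * 100) - LOW - 1) // WIDTH for x in data)
frequencies = [counts[i] for i in range(5)]

for i, freq in enumerate(frequencies):
    lower = (LOW + i * WIDTH) / 100
    upper = (LOW + (i + 1) * WIDTH) / 100
    share = 100 * freq / len(data)
    # The row of '#' characters doubles as a rough text histogram.
    print(f"{lower:.2f} - {upper:.2f}  {freq:2d}  {share:3.0f}%  {'#' * freq}")
```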
Histogram
The concept of X - Y axis and origin need not be followed once you
gain proficiency in drawing histograms.
Histograms can be of various shapes, each meaning different things.
Moments
The 4th central moment is $\mu_4 = \frac{\sum (x - \mu)^4}{N}$
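A sketch of a general central-moment calculation. It treats the 50 values as the whole population (dividing by N, as in the formula above), which is an assumption made for illustration; the function name is also illustrative.

```python
# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]


def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k over all values."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** k for x in values) / len(values)


mu2 = central_moment(data, 2)  # second central moment (population variance)
mu4 = central_moment(data, 4)  # fourth central moment

print(f"mu2 = {mu2:.6e}")
print(f"mu4 = {mu4:.6e}")
```

The first central moment is always zero, and the second is the variance computed with divisor N.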
Skewness
Skewness refers to deviation from symmetry.
As we observe in the picture above, if Mean > Mode, the skewness is positive. If
Mean < Mode, the skewness is negative. If Mean = Mode, the skewness is zero.
$\frac{\text{Mean} - \text{Mode}}{\text{standard deviation}}$ is a measure of skewness by Karl Pearson.
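Pearson's measure can be sketched for the 50 values as below. The sample standard deviation (divisor n − 1) is used, which is an assumption; using the population form would change the result only slightly.

```python
import statistics

# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

mean = statistics.mean(data)
mode = statistics.mode(data)
s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)

# Pearson's mode-based skewness: (mean - mode) / standard deviation.
pearson_skew = (mean - mode) / s

print(round(pearson_skew, 3))  # -0.055
```

The value is close to zero: the mean (2.849) lies only slightly below the mode (2.85).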
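Skewness
Moment based measure of skewness: $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$
$\gamma_1$ is defined as the square root of $\beta_1$, taken with the sign of $\mu_3$ so that the direction of the skew is retained:
$\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}$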
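A sketch of the moment-based measures on the 50 values, again treating them as a population (divisor N) for illustration. The helper name is illustrative.

```python
# The 50 measurements from the "A data set" slide.
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]


def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k over all values."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** k for x in values) / len(values)


mu2 = central_moment(data, 2)
mu3 = central_moment(data, 3)

beta1 = mu3 ** 2 / mu2 ** 3   # always non-negative, so the sign is lost
gamma1 = mu3 / mu2 ** 1.5     # same magnitude as sqrt(beta1), sign of mu3

print(round(beta1, 4), round(gamma1, 4))
```

For this data set $\gamma_1$ comes out positive (about 0.25), driven by the few large values 2.89 and 2.90, even though the mean lies marginally below the mode. On real data the two skewness measures need not agree in sign when the skew is slight.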
Kurtosis
Stratification
• You can also stratify the data you collect by different QC tools
such as graphs, Pareto diagrams, check sheets, histograms, scatter
diagrams, and control charts.
Area of application
• Raw Material
Quantity supplied, Delivery time, Rejection % - supplier wise and
batch wise.
• Production
Rejection percentage with respect to machine, shift, operator, raw
material, tool, jig and so on.
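As a rough illustration of stratifying production data, the sketch below computes the rejection percentage machine-wise and shift-wise. All records, field names and the helper function are hypothetical, invented purely to show the mechanics.

```python
from collections import defaultdict

# Hypothetical production records, invented purely for illustration.
records = [
    {"machine": "M1", "shift": "A", "produced": 200, "rejected": 4},
    {"machine": "M1", "shift": "B", "produced": 180, "rejected": 9},
    {"machine": "M2", "shift": "A", "produced": 220, "rejected": 11},
    {"machine": "M2", "shift": "B", "produced": 200, "rejected": 10},
]


def rejection_pct_by(rows, key):
    """Rejection percentage for each level of the stratification factor."""
    produced = defaultdict(int)
    rejected = defaultdict(int)
    for row in rows:
        produced[row[key]] += row["produced"]
        rejected[row[key]] += row["rejected"]
    return {level: 100 * rejected[level] / produced[level] for level in produced}


by_machine = rejection_pct_by(records, "machine")
by_shift = rejection_pct_by(records, "shift")

for level, pct in by_machine.items():
    print(f"machine {level}: {pct:.2f}% rejected")
for level, pct in by_shift.items():
    print(f"shift {level}: {pct:.2f}% rejected")
```

The same function works for any other factor (operator, raw material, tool) as long as each record carries that field.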
1 Problem monitoring
2 Direction for trouble shooting
• Raw Material
No. of defects, Location of defect, measurement on quality
characteristics etc
• Production
Measurements on process parameters, No. of defects in products,
location of defects etc
Location Check-Sheet
Scales of Measurements
Nominal: A scale that measures data by name only. For example,
religious affiliation (measured as Jewish, Christian, Buddhist, and
so forth), political affiliation (measured as Democratic, Republican,
Libertarian, and so forth), or style of automobile (measured as
sedan, sports car, station wagon, van, and so forth).
Scales of Measurements