
Descriptive Statistics

Amit K Biswas

Indian Statistical Institute

@ GLIM June-July 2021

For PGPM

July 7, 2021

Amit (ISI, Chennai) Data July 7, 2021 1 / 59


Introduction

Contents

1 Introduction
Data
Collection of Data
Compilation of Data
Centre & Spread
Central Tendency
Dispersion
Degrees of Freedom
Other dispersion measures

2 Population & Sample


Histogram
Moments, Skewness & Kurtosis
Further on data collection and compilation
Measurements from data
Scales



Introduction Data


TYPES OF DATA

CONTINUOUS
• Measurable, e.g. SpO2, temperature
• Subjective assessment, e.g. score in a beauty contest

DISCRETE
• Countable, e.g. heart rate, number of defects

Data, if properly collected, are
• least influenced by individual biases,
• open to critical analysis, and
• generally beyond language barriers, and therefore universal in expression.


DATA GATHERING

What is Data?

Data is a numerical expression of an activity.

"Conclusions based on facts and data are necessary for any improvement activity." (K. Ishikawa)

"If you are not able to express a phenomenon in numbers, you do not know about it adequately." (Lord Kelvin)


Difference between...

• A shaft diameter: the diameter of a shaft can take any value, to any number of decimal places, e.g. 19.055, 19.0554 etc. Data on parameters of this type are called continuous data.

• Number of shafts rejected for oversize diameter: the number of shafts rejected must necessarily be a whole number, e.g. 0, 2, 7, 10 etc. Data on parameters of this type are called discrete data.


Continuous & Discrete : Distinction

CONTINUOUS
• They are real numbers
• Normally they are measured values
• They cannot be pinned to a single exact value; there is an interval associated with each measurement
• They vary continuously
• They require a smaller sample size for the same precision

DISCRETE
• They are whole numbers
• Normally they are counted values
• They can take only zero or positive integer values
• They change in steps of 1
• They require a larger sample size for the same precision


Continuous & Discrete : Distinction


Weight of a PPE suit

No. of new COVID patients admitted

Capacity of an O2 tank

Tubes rejected by a Go/No-Go gauge

Diameter of a hypodermic needle

Time taken to complete billing after the doctor has discharged the patient

Number of ventilators under breakdown

Number discharged out of 100 patients

No. of bugs in a program

Taste of a brand of beer



Explosion of data in social media

How about an audio clip posted in a WhatsApp message?

A photograph or a painting on your friend's Facebook timeline?

Of course they are data, but analysing them needs sufficiently advanced tools.

How else do you think a fascist regime books an opposition leader for sedition based on their social media activity?

You need tools such as text analytics, pattern recognition, Artificial Intelligence and Machine Learning to do this in a continuous and automated manner.


Explosion of data in social media

We do leave a toxic or narcissistic girlfriend, don’t we?

Our brain analyses data on behavioural patterns and decides on the toxicity of a person.

It takes a lot of research and learning before we can mechanically mimic a human brain.

That is why Data Mining, AI and ML are developing at such a fast pace today.

These are among the fastest-growing research areas in universities worldwide.

You must be aware of the Target story of pregnancy-product ads reaching an unmarried girl.



Introduction Collection of Data

Data Collection
OBJECTIVES OF DATA COLLECTION


To know and quantify the status

To monitor the process

To decide acceptance or rejection

To analyse and decide the course of action

HOW TO COLLECT DATA ?

Define the purpose

Decide the type of analysis

Define the period of data collection

Is the required data already available?




Introduction Compilation of Data

A data set

2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88


Questions about the data set

What is the maximum or minimum value? Or the spread of the values?

The most frequent value?

What value is the data centred around?

To answer questions like these about any data set, we need to compile the data.
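The first questions (largest value, smallest value, and the spread between them) can be answered directly once the values are typed in. Below is a minimal Python sketch that assumes the 50 values above are entered as a plain list. The variable names are illustrative only.

```python
# The 50 values from the data set above
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

largest = max(data)
smallest = min(data)
spread = largest - smallest  # the range: maximum minus minimum

print(f"n = {len(data)}")
print(f"maximum = {largest:.2f}, minimum = {smallest:.2f}")
print(f"range = {spread:.2f}")
```

For this data set, the maximum is 2.90 and the minimum is 2.81, so the values spread over a range of 0.09.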



Introduction Centre & Spread

Measures of Central Tendency & Dispersion

This suspended pipe is horizontal; the same pipe has now tilted. WHY?
• Horizontal pipe: the rope is tied at the center of gravity.
• Tilted pipe: the rope is tied away from the center of gravity.

The center of gravity is the point where the entire mass of the body can be considered to be concentrated.

Thus the center of gravity is a measure of the CENTRAL TENDENCY of a body.

Measure of Central Tendency


There are three ways in which the central tendency of numbers can be measured.

These three M’s are

Mean : X̄
Median: Md
Mode: Mo

Mean or Average is defined as

$\text{Mean} = \dfrac{\text{Sum of all the values}}{\text{Number of values}}$

Mathematically it can be expressed as: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Let us now find the mean of the values of our earlier data:

$\bar{x} = \dfrac{2.87 + 2.85 + \cdots + 2.88}{50} = 2.849$
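The mean formula translates directly into code. Below is a minimal Python sketch that assumes the 50 values are entered as a list. It also cross-checks the hand-written formula against the standard library's `statistics.mean`.

```python
import statistics

# The 50 values from the earlier data set
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

# Mean = (sum of all the values) / (number of values)
n = len(data)
x_bar = sum(data) / n

# Cross-check against the standard library's mean
assert abs(x_bar - statistics.mean(data)) < 1e-12

print(f"sum = {sum(data):.2f}, n = {n}, mean = {x_bar:.4f}")
```

The 50 values add up to 142.45, and 142.45 / 50 = 2.849, which agrees with the slide.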

A few numbers in the data set occur repeatedly.
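Counting how often each value occurs is the basis of a frequency table. Below is a minimal Python sketch that uses `collections.Counter` to tally the 50 values and print a crude tally-mark table. The layout of the printed table is only an illustration.

```python
from collections import Counter

# The 50 values from the earlier data set
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

# Count how many times each distinct value occurs
freq = Counter(data)

# Print a simple tally table in ascending order of value
for value in sorted(freq):
    print(f"{value:.2f}  {'|' * freq[value]:<20} {freq[value]}")

# The value with the highest count
mode_value, mode_count = freq.most_common(1)[0]
print(f"most frequent value: {mode_value:.2f} ({mode_count} times)")
```

The tally gives 2.81: 2, 2.82: 3, 2.83: 4, 2.84: 8, 2.85: 19, 2.86: 7, 2.87: 3, 2.88: 2, 2.89: 1 and 2.90: 1, for a total of 50.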


Mean for classified data & Median

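$\text{Mean} = \dfrac{\text{Sum of products of the mid-point of each class and its frequency}}{\text{Sum of frequencies of all classes}}$

Mathematically, $\bar{x} = \dfrac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \sum_{i=1}^{k} x_i \cdot \dfrac{f_i}{\sum_{j=1}^{k} f_j}$, where $x_i$ is the mid-point and $f_i$ the frequency of the $i$-th class.

Let us find the mean of the values now with this new method. We will need a frequency table.

Mean, $\bar{x} = \dfrac{2.81 \times 5 + 2.83 \times 12 + 2.85 \times 26 + 2.87 \times 5 + 2.89 \times 2}{5 + 12 + 26 + 5 + 2} = 2.8448$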
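The frequency table itself is not reproduced here, so the class intervals below are an assumption. The classes (2.80, 2.82], (2.82, 2.84], ..., (2.88, 2.90], with mid-points 2.81, 2.83, ..., 2.89, reproduce the frequencies 5, 12, 26, 5, 2 used above. The sketch works in integer hundredths so that values sitting exactly on a class boundary are compared exactly.

```python
# The 50 values from the earlier data set
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

# Assumed classes, in hundredths: (280, 282], (282, 284], ..., (288, 290]
upper_limits = [282, 284, 286, 288, 290]

# Convert to integer hundredths so boundary comparisons are exact
hundredths = [round(x * 100) for x in data]

# Frequency f_i of each class
freq = [0] * len(upper_limits)
for v in hundredths:
    for k, upper in enumerate(upper_limits):
        if upper - 2 < v <= upper:
            freq[k] += 1
            break

# Class mid-points x_i as used on the slide: 2.81, 2.83, ..., 2.89
midpoints = [(upper - 1) / 100 for upper in upper_limits]

# Grouped mean: sum(f_i * x_i) / sum(f_i)
grouped_mean = sum(f * x for f, x in zip(freq, midpoints)) / sum(freq)

print("frequencies:", freq)
print(f"grouped mean = {grouped_mean:.4f}")
```

The grouped mean (2.8448) differs slightly from the mean of the raw values (2.849), because every value in a class is replaced by that class's mid-point.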

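Median is the value which has an equal number of values above it and below it when the values are arranged in ascending order.

Mathematically:

$\text{Median} = \begin{cases} \dfrac{\left(\frac{n}{2}\right)^{\text{th}} \text{ value} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ value}}{2}, & \text{when } n \text{ is even} \\[2ex] \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value}, & \text{when } n \text{ is odd} \end{cases}$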
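The two cases of the median rule map onto a short function. Below is a minimal Python sketch in which `median` is a hypothetical helper written to follow the rule above. Python lists count from 0, so the n-th value sits at index n - 1.

```python
# The 50 values from the earlier data set
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

def median(values):
    """Median: middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        # ((n + 1) / 2)-th value, shifted to a 0-based index
        return ordered[(n + 1) // 2 - 1]
    # average of the (n/2)-th and (n/2 + 1)-th values, 0-based indices
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median(data))       # n = 50 is even -> 2.85
print(median([3, 1, 2]))  # n = 3 is odd   -> 2
```

Here n = 50, so the median is the average of the 25th and 26th ordered values. Both of these are 2.85.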


Mode

This is the value which occurs with the highest frequency.

In our case it is 2.85, which occurs 19 times.

We have thus calculated the Mean, Median and Mode for a given data set.

The Mean, Median and Mode are equal for a symmetric unimodal distribution.

They are generally not equal if the distribution is not symmetric.
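For the data set in these slides, the three M's are close to one another (mean 2.849, median and mode both 2.85), which is consistent with a roughly symmetric distribution. The sketch below contrasts two small data sets. They are hypothetical, chosen only for illustration: one symmetric and one with a long right tail.

```python
import statistics

# Hypothetical data sets, chosen only to illustrate the point
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9]  # long tail to the right

results = {}
for name, values in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    mean = statistics.mean(values)
    med = statistics.median(values)
    mode = statistics.mode(values)
    results[name] = (mean, med, mode)
    print(f"{name:>12}: mean = {mean:.3f}, median = {med}, mode = {mode}")
```

In the symmetric set, all three measures equal 3. In the right-skewed set, the single large value 9 pulls the mean (about 3.444) above the median (3), which in turn is above the mode (2).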


Measures of Dispersion

The extent of the spread of the values about the mean value is called Dispersion.

The measures of Dispersion are:

1 Range (R)
2 Standard Deviation (s)
3 Variance (s²)
4 Coefficient of Variation (CV)

Standard deviation is the most commonly used measure of dispersion.
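All four measures can be computed for the earlier data set with Python's `statistics` module. This is a minimal sketch that assumes common conventions: Range = maximum minus minimum, and CV = s / x̄ expressed as a percentage. `statistics.stdev` and `statistics.variance` use the sample (n - 1) divisor.

```python
import statistics

# The 50 values from the earlier data set
data = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

x_bar = statistics.mean(data)
R = max(data) - min(data)       # Range
s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
s2 = statistics.variance(data)  # sample variance, the square of s
cv = s / x_bar * 100            # coefficient of variation, in percent (assumed convention)

print(f"Range R = {R:.3f}")
print(f"s       = {s:.4f}")
print(f"s^2     = {s2:.6f}")
print(f"CV      = {cv:.2f} %")
```

Because CV divides the standard deviation by the mean, it has no units. This makes it handy for comparing the variability of characteristics measured on different scales.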


Measures of Dispersion

Of Population:

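If $x_1, x_2, \ldots, x_N$ are the population values and $\mu$ is their population mean, the population standard deviation is

$$\sigma = \sqrt{\frac{(x_1-\mu)^2 + (x_2-\mu)^2 + \cdots + (x_N-\mu)^2}{N}} = \sqrt{\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}}$$

Of Sample:

If $x_1, x_2, \ldots, x_n$ are the sample values and $\bar{X}$ is their sample mean, the sample standard deviation is

$$s = \sqrt{\frac{(x_1-\bar{X})^2 + (x_2-\bar{X})^2 + \cdots + (x_n-\bar{X})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{X})^2}{n-1}}$$

The $n-1$ in the denominator above is called the degrees of freedom.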
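
A minimal Python sketch (Python is our choice here, since the slides contain no code) showing that the two formulas differ only in the divisor. It applies both to the 50 observations from the "A data set" slide. The helper names `population_sd` and `sample_sd` are illustrative only, and the results are cross-checked against the standard library's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

# The 50 observations from the "A data set" slide.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]


def population_sd(values):
    """sigma: treat the values as the whole population and divide by N."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """s: treat the values as a sample and divide by n - 1 (degrees of freedom)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


sigma = population_sd(DATA)
s = sample_sd(DATA)
print(f"divide by N:     {sigma:.4f}")  # 0.0179
print(f"divide by n - 1: {s:.4f}")      # 0.0181

# Cross-check against the standard library.
assert math.isclose(sigma, statistics.pstdev(DATA))
assert math.isclose(s, statistics.stdev(DATA))
```

With n = 50 the two results are close. The choice of divisor matters most for small samples.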

Degrees of Freedom
This is a deep and wide concept, not easy to grasp in one go.
Here is an analogy (only an analogy, not the whole concept).
Here is a piece of chocolate that I want to divide into three pieces.

One piece for my friend!

A big piece for myself!

Do I have any choice for the last piece? No: it is simply whatever is left.

So the degrees of freedom in making three pieces out of the chocolate is only 2.
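
The analogy carries over to the sample standard deviation. Once $\bar{X}$ has been computed, the $n$ deviations $x_i - \bar{X}$ must add up to zero, so only $n-1$ of them are free to vary. A small Python sketch makes this concrete, using three hypothetical values chosen only for illustration:

```python
# Three hypothetical sample values (illustration only).
sample = [4.0, 7.0, 13.0]
n = len(sample)
x_bar = sum(sample) / n                   # 8.0
deviations = [x - x_bar for x in sample]  # [-4.0, -1.0, 5.0]

# Deviations from the sample mean always add up to zero ...
print(sum(deviations))                    # 0.0

# ... so the first n - 1 deviations fix the last one: like the last
# piece of chocolate, it has no freedom left.
free = deviations[: n - 1]
last = -sum(free)
print(len(free), last, deviations[-1])    # 2 5.0 5.0
```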

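Back to Variation

Alternatively:

$$s = \sqrt{\frac{(x_1^2 + x_2^2 + \cdots + x_n^2) - n\bar{X}^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n-1}}$$

For the 50 observations of the data set used in these slides ($\bar{X} = 2.849$):

$$s = \sqrt{\frac{(2.87^2 + 2.85^2 + \cdots + 2.88^2) - 50 \times 2.849^2}{50-1}} = 0.0181$$

The standard deviation is also known as the Root Mean Square (R.M.S.) deviation from the mean. Strictly, the R.M.S. deviation uses the divisor $n$ rather than $n-1$.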
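
A Python sketch that applies both the definitional formula and the shortcut formula to the same 50 observations. It is an illustration only and not part of the original slides.

```python
import math

# The 50 observations from the "A data set" slide.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

n = len(DATA)
x_bar = sum(DATA) / n

# Definitional form: sum of squared deviations from the mean.
s_definition = math.sqrt(sum((x - x_bar) ** 2 for x in DATA) / (n - 1))

# Shortcut form from the slide: (sum of squares) - n * mean^2.
s_shortcut = math.sqrt((sum(x * x for x in DATA) - n * x_bar ** 2) / (n - 1))

print(f"mean = {x_bar:.3f}")                   # mean = 2.849
print(f"s (definition) = {s_definition:.4f}")  # 0.0181
print(f"s (shortcut)   = {s_shortcut:.4f}")    # 0.0181
```

The shortcut is convenient by hand. In floating-point arithmetic, however, it subtracts two nearly equal large numbers (here roughly 405.86 and 405.84). For data whose mean is large relative to its spread, the definitional form is therefore the safer choice.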

Other measures

Range, R = Largest Observation − Smallest Observation

$= X_{max} - X_{min}$

Variance ($s^2$) is the square of the standard deviation.

Coefficient of Variation, $CV = \frac{SD}{Mean} = \frac{s}{\bar{X}}$

Mean Deviation:
$\frac{\sum_{i=1}^{n}|x_i - A|}{n}$ is the mean deviation about some value A.
$\frac{\sum_{i=1}^{n}|x_i - \bar{x}|}{n}$ is the mean deviation about the mean $\bar{x}$.
$\frac{\sum_{i=1}^{n}|x_i - Md|}{n}$ is the mean deviation about the median Md.

The mean deviation about the median is the lowest among all mean deviations: no choice of A gives a smaller value.
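
A Python sketch computing these measures for the 50-value data set. It uses the standard library's `statistics.mean`, `statistics.median` and `statistics.stdev`. The helper `mean_deviation` is a hypothetical name introduced here, and CV is reported as a fraction formatted as a percentage. The final loop checks the median property above against a few other reference points.

```python
import statistics

# The 50 observations from the "A data set" slide.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

x_bar = statistics.mean(DATA)
md = statistics.median(DATA)
s = statistics.stdev(DATA)           # sample SD, divisor n - 1

data_range = max(DATA) - min(DATA)   # R = Xmax - Xmin
variance = s ** 2                    # s^2
cv = s / x_bar                       # CV = s / X-bar (often quoted as a %)


def mean_deviation(values, a):
    """Mean absolute deviation of the values about a reference point a."""
    return sum(abs(x - a) for x in values) / len(values)


md_about_mean = mean_deviation(DATA, x_bar)
md_about_median = mean_deviation(DATA, md)

print(f"R = {data_range:.2f}, s^2 = {variance:.6f}, CV = {cv:.2%}")
print(f"mean deviation about mean   = {md_about_mean:.5f}")
print(f"mean deviation about median = {md_about_median:.5f}")

# No reference point A should beat the median (small tolerance for rounding).
for a in (x_bar, 2.80, 2.84, 2.86, 2.90):
    assert md_about_median <= mean_deviation(DATA, a) + 1e-12
```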

Percentiles
A percentile (or centile) is a score below which a given percentage of the scores in its frequency distribution falls. For example, the 50th percentile (the median) is the score below which 50% of the scores in the distribution lie.

The 25th percentile is called the first quartile $Q_1$; the 75th percentile is called the third quartile $Q_3$.

$Q_3 - Q_1$ is called the interquartile (or quartile) range, which is also used as a measure of dispersion.
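
A Python sketch of the quartiles and $Q_3 - Q_1$ for the 50-value data set. It uses `statistics.quantiles` (Python 3.8 or later). Be aware that several percentile conventions exist, so other software may report slightly different quartiles for the same data.

```python
import statistics

# The 50 observations from the "A data set" slide.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

# quantiles(..., n=4) returns the three cut points Q1, median and Q3.
# Its default method ("exclusive") is one of several percentile conventions;
# other tools may interpolate differently.
q1, q2, q3 = statistics.quantiles(DATA, n=4)
iqr = q3 - q1

print(f"Q1 = {q1:.2f}, median = {q2:.2f}, Q3 = {q3:.2f}, Q3 - Q1 = {iqr:.2f}")
# Q1 = 2.84, median = 2.85, Q3 = 2.86, Q3 - Q1 = 0.02
```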

PAUSE for break!

Variation

This man wants to reach his workplace by 6.55 a.m., but he cannot arrive at exactly 6.55 a.m. every day. Sometimes he arrives earlier (but almost never before 6.50 a.m.); sometimes he arrives later (but almost never after 7.00 a.m.).

WHY?

Population & Sample

The entire set of items is called the Population.

The small number of items taken from the population to make a judgment about the population is called a Sample.

The number of items in the sample is called the Sample size.

The bin below is the population, and the three items selected beside it form a sample of size 3.

[Figure: Population (a bin of items) and Sample (three items drawn from it)]

Population, Sample and Data

Population --(random sampling)--> Sample --(measurement/observation)--> Data

[Diagram: the data lead either to "Action" or to "No action"]

What and why sampling

The population mean will hardly ever be known, though we designate it as $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size.

Similarly, we will possibly never know the population standard deviation $\sigma$; however, we designate it by $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}}$.

In the absence of population values, we shall of course use the sample values $\bar{x}$ and $s$.

This is why the sample size, and how well the sample represents the population, are important issues.
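
To see how sample values can stand in for the unknown population values, here is a small Python simulation. Everything in it is an assumption made for illustration: the population consists of 10,000 hypothetical values drawn from a normal distribution with mean 50 and standard deviation 5, and the sample size is 100.

```python
import random
import statistics

rng = random.Random(2021)  # fixed seed so the run is reproducible

# Hypothetical population (illustrative numbers only).
population = [rng.gauss(50, 5) for _ in range(10_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # divisor N

# In practice only a sample is observed: draw 100 items without replacement.
sample = rng.sample(population, 100)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # divisor n - 1

print(f"population: mu    = {mu:.2f}, sigma = {sigma:.2f}")
print(f"sample:     x-bar = {x_bar:.2f}, s     = {s:.2f}")
```

Re-running with a different seed or a smaller sample shows how the sample estimates wander around the population values. This is why the sample size matters.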

What and why sampling

— Sampling is

• Collecting a portion of all the data.

• Using that portion to draw conclusions (make inferences).

— Why sample?

• Looking at all the data may be too expensive.

• It may be too time-consuming, or destructive (e.g., taste tests).

• Sound conclusions can often be drawn from a relatively small amount of data.

Frequency Distribution & Histogram

If I had to reduce my message to management to just a few words, I would say it all had to do with reducing variation.
— W. Edwards Deming

A data set

What do these numbers mean to you?

2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88
Possibly not much!

Pictorial representation of data

Why are pictures important?

Pictures/displays of data stimulate hypothesis generation, which is a key step in process improvement.

Now let's do the following:

Find the minimum value.
Find the maximum value.
List all values from the minimum to the maximum in ascending order.
Read the numbers one by one from the given jumble and, for each number, make a tally mark against it in the table.
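
The four steps above can be sketched in Python as follows. Values are converted to hundredths so that they compare exactly. One difference from the slide: this sketch starts the table at the minimum (2.81), whereas the table that follows also includes an empty 2.80 row.

```python
# The 50 observations from the "A data set" slide.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

# Work in hundredths (integers) so that values like 2.85 compare exactly.
values = [round(x * 100) for x in DATA]

# Steps 1-2: find the minimum and the maximum value.
lo, hi = min(values), max(values)

# Step 3: list every value from the minimum to the maximum in ascending order.
table = {v: 0 for v in range(lo, hi + 1)}

# Step 4: read the numbers in their given (jumbled) order; one tally mark each.
for v in values:
    table[v] += 1

print(f"{'Value':<6} {'Tally':<20} Frequency")
for v, f in table.items():
    print(f"{v / 100:<6.2f} {'/' * f:<20} {f}")
print(f"{'Total':<6} {'':<20} {sum(table.values())}")
```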

Pictorial representation of data


Value Tally Frequency
2.80 - 0
2.81 // 2
2.82 /// 3
2.83 //// 4
2.84 //////// 8
2.85 /////////////////// 19
2.86 /////// 7
2.87 /// 3
2.88 // 2
2.89 / 1
2.90 / 1
Total 50

The above table now tells us:

2.85 occurs most often, with a relative frequency of 38% (19 of 50).

82% of the values (41 of 50) lie between 2.83 and 2.87 inclusive.

THIS TABLE, WHICH GIVES YOU THIS DISTRIBUTION, IS CALLED A:
Frequency Distribution Table
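
A short Python check of the two statements read off the table. It uses `collections.Counter`, again with values in hundredths so that they compare exactly.

```python
from collections import Counter

# The 50 observations from the "A data set" slide.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

counts = Counter(round(x * 100) for x in DATA)   # value (in hundredths) -> frequency
n = sum(counts.values())

top_value, top_freq = counts.most_common(1)[0]
share = sum(f for v, f in counts.items() if 283 <= v <= 287) / n

print(f"most frequent: {top_value / 100:.2f}, relative frequency {top_freq / n:.0%}")  # 2.85, 38%
print(f"share of values from 2.83 to 2.87: {share:.0%}")                            # 82%
```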

Now do the following...


For the same data, divide the interval from 2.80 to 2.90 (a width of 2.90 − 2.80 = 0.10) into an equal number of parts (say 5).

Hence each part will have a width of 0.02.

The table will now have the classes:

2.80-2.82, 2.82-2.84, 2.84-2.86, 2.86-2.88, 2.88-2.90

Class Tally Frequency
2.80-2.82 ///// 5
2.82-2.84 //////////// 12
2.84-2.86 ////////////////////////// 26
2.86-2.88 ///// 5
2.88-2.90 // 2
Total 50

Each class here includes its upper limit but not its lower limit; for example, 2.82 is counted in 2.80-2.82 and 2.84 in 2.82-2.84.

The width of each class (e.g. 2.82 − 2.80 = 0.02) is called the Class Interval. Now we also know:
52% of the values (26 of 50) lie in the 2.84-2.86 class.
How the values are distributed over any given range.
This is helpful when you have a large number of values.
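
A Python sketch of the grouping step. It uses classes of width 0.02 that include the upper limit but not the lower one, which is the convention that reproduces the frequencies 5, 12, 26, 5, 2 above. The helper `class_index` is a hypothetical name introduced for this illustration.

```python
from collections import Counter

# The 50 observations from the "A data set" slide.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

EDGES = list(range(280, 291, 2))   # class boundaries in hundredths: 2.80, 2.82, ..., 2.90


def class_index(v):
    """Index of the class (lower, upper] that contains v (v in hundredths)."""
    for i, (lower, upper) in enumerate(zip(EDGES, EDGES[1:])):
        if lower < v <= upper:
            return i
    raise ValueError(f"{v / 100:.2f} lies outside 2.80-2.90")


freq = Counter(class_index(round(x * 100)) for x in DATA)

for i, (lower, upper) in enumerate(zip(EDGES, EDGES[1:])):
    print(f"{lower / 100:.2f}-{upper / 100:.2f}  {'/' * freq[i]:<26} {freq[i]}")
print(f"Total {sum(freq.values())}")
```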

Variation for a period of time

Definition: A Histogram shows the shape, or distribution, of the data by displaying how often different values occur.

The following is an example:

[Figure: example histogram]

Histogram

A Histogram shows the frequency distribution pictorially.

Once you gain proficiency in drawing histograms, you need not strictly follow the convention of X and Y axes drawn from an origin.

Specification limits can be marked on a histogram to show how the variation behaves relative to the required values.

Histograms are used more commonly than frequency distribution tables.
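
A text-only Python sketch of marking specification limits on a histogram of the 50-value data set. The limits LSL = 2.82 and USL = 2.88 are hypothetical, because the slides give no limits for this data. Within each class bar, `#` marks a value inside the limits and `x` marks a value outside them. The classes follow the (lower, upper] convention of the grouped table.

```python
# The 50 observations from the "A data set" slide.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

# HYPOTHETICAL specification limits, in hundredths (2.82 and 2.88);
# the slides do not give any limits for this data.
LSL, USL = 282, 288

values = [round(x * 100) for x in DATA]   # hundredths, so comparisons are exact
edges = list(range(280, 291, 2))          # class boundaries 2.80, 2.82, ..., 2.90

for lower, upper in zip(edges, edges[1:]):
    in_class = [v for v in values if lower < v <= upper]  # (lower, upper], as in the table
    within = sum(1 for v in in_class if LSL <= v <= USL)
    # '#' marks a value within the limits, 'x' a value outside them.
    bar = "#" * within + "x" * (len(in_class) - within)
    print(f"{lower / 100:.2f}-{upper / 100:.2f} | {bar:<26} {len(in_class)}")

n_outside = sum(1 for v in values if not LSL <= v <= USL)
print(f"outside the limits: {n_outside} of {len(values)}")  # outside the limits: 4 of 50
```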

Histogram
Histograms can take various shapes, each indicating something different.

[Figure: histograms of various shapes]

Interpreting a histogram can lead to the source, or root cause, of a possible problem.

Moments

Beyond the measures of central tendency and dispersion discussed so far, there are measures that further describe the characteristics of a distribution. A few of them follow.

Moments are statistical parameters that describe a distribution. Four moments are commonly used:

1st moment: Mean (describes the central value)

2nd moment: Variance (describes dispersion)

3rd moment: basis of Skewness (describes asymmetry)

4th moment: basis of Kurtosis (describes peakedness)

Moments

Moments are of two types: Raw and Central.

Population mean $\mu_1 = \frac{\sum x}{N}$ is the first raw moment; the first central moment is always 0, because $\sum (x-\mu) = 0$.
Population variance $\mu_2 = \frac{\sum (x-\mu)^2}{N}$ is the second central moment.
The 3rd central moment is $\mu_3 = \frac{\sum (x-\mu)^3}{N}$.
The 4th central moment is $\mu_4 = \frac{\sum (x-\mu)^4}{N}$.

So the index tells which moment it is, and whether or not $\mu$ is subtracted tells whether it is raw or central. Mostly, we use central moments.
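
A Python sketch of raw and central moments. For illustration it treats the 50 observations from the "A data set" slide as a complete population, so every moment uses the divisor N. The function names `raw_moment` and `central_moment` are hypothetical helpers.

```python
# The 50 observations from the "A data set" slide, treated here as a
# complete population (divisor N), which is an assumption for illustration.
DATA = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]


def raw_moment(values, r):
    """r-th raw moment: sum(x**r) / N (nothing subtracted)."""
    return sum(x ** r for x in values) / len(values)


def central_moment(values, r):
    """r-th central moment: sum((x - mu)**r) / N."""
    mu = raw_moment(values, 1)
    return sum((x - mu) ** r for x in values) / len(values)


mu = raw_moment(DATA, 1)                 # first raw moment = mean
first_central = central_moment(DATA, 1)  # 0 apart from rounding error
mu2, mu3, mu4 = (central_moment(DATA, r) for r in (2, 3, 4))

print(f"mean = {mu:.3f}, first central moment = {first_central:.1e}")
print(f"mu2 = {mu2:.3e}, mu3 = {mu3:.3e}, mu4 = {mu4:.3e}")
```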

Skewness
Skewness refers to deviation from symmetry.

As we observe in the picture above, if Mean > Mode, the skewness is positive; if Mean < Mode, the skewness is negative; and if Mean = Mode, the skewness is zero.

Karl Pearson's measure of skewness is $\frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$.
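A minimal sketch of Karl Pearson's coefficient, assuming discrete data (so that a mode exists) and the population standard deviation (divisor N) to match the population moments used here; some texts use the sample standard deviation instead. The function name `pearson_skewness` and the three data sets are hypothetical.

```python
import statistics


def pearson_skewness(xs):
    """Karl Pearson's coefficient of skewness: (mean - mode) / standard deviation.

    Uses the population standard deviation (divisor N). statistics.mode returns
    the most common value; on a tie it returns the first one encountered
    (Python 3.8+).
    """
    mean = statistics.fmean(xs)
    mode = statistics.mode(xs)
    sd = statistics.pstdev(xs)
    return (mean - mode) / sd


right_tail = [1, 2, 2, 2, 3, 3, 4, 5, 8]  # mean 3.33 > mode 2
symmetric = [1, 2, 2, 3]                  # mean 2 = mode 2
left_tail = [1, 5, 6, 7, 7, 7, 8]         # mean 5.86 < mode 7

for name, xs in [("right_tail", right_tail), ("symmetric", symmetric), ("left_tail", left_tail)]:
    print(f"{name:<10} {pearson_skewness(xs):+.3f}")
```

The sign of the result follows the Mean versus Mode rule stated above: positive for the long right tail, zero for the symmetric set, negative for the long left tail.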

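Skewness

Moment-based measure of skewness: $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$

PLEASE NOTE THE FOLLOWING:

Because $\beta_1$ squares $\mu_3$, it is never negative and loses the direction of skewness. $\gamma_1$ is therefore defined as the square root of $\beta_1$ carrying the original sign of $\mu_3$: $\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}$.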
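A minimal sketch of $\beta_1$ and $\gamma_1$, assuming population moments (divisor N); the helper names are hypothetical. Reflecting the data (multiplying by -1) flips the sign of $\mu_3$: $\beta_1$ cannot tell the two cases apart, while $\gamma_1$ can.

```python
import math


def central_moment(xs, r):
    """r-th central moment with divisor N: sum((x - mean)**r) / N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** r for x in xs) / len(xs)


def beta1(xs):
    """Moment-based skewness beta1 = mu3**2 / mu2**3; never negative."""
    return central_moment(xs, 3) ** 2 / central_moment(xs, 2) ** 3


def gamma1(xs):
    """gamma1 = sqrt(beta1) carrying the sign of mu3 (equivalently mu3 / mu2**1.5)."""
    return math.copysign(math.sqrt(beta1(xs)), central_moment(xs, 3))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # long right tail: mu3 > 0
mirrored = [-x for x in data]      # same shape, reflected: mu3 < 0

print(beta1(data), beta1(mirrored))    # 0.4306640625 0.4306640625
print(gamma1(data), gamma1(mirrored))  # 0.65625 -0.65625
```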

Kurtosis

Kurtosis refers to the degree of peakedness of a frequency curve.

Kurtosis is measured by the moment-based measure of kurtosis $\beta_2 = \frac{\mu_4}{\mu_2^2}$.
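A minimal sketch of $\beta_2$, assuming population moments (divisor N) and two hypothetical data sets. For reference (a standard convention, not stated on the slide), a normal curve has $\beta_2 = 3$; curves with $\beta_2 > 3$ are called leptokurtic and those with $\beta_2 < 3$ platykurtic.

```python
def central_moment(xs, r):
    """r-th central moment with divisor N: sum((x - mean)**r) / N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** r for x in xs) / len(xs)


def beta2(xs):
    """Moment-based kurtosis: beta2 = mu4 / mu2**2."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


flat = [1, 2, 3, 4, 5]          # values spread evenly, no distinct peak
peaked = [1, 5, 5, 5, 5, 5, 9]  # values bunched at the centre

print(round(beta2(flat), 4))    # 1.7
print(round(beta2(peaked), 4))  # 3.5
```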

Population & Sample Further on data collection, compilation

Stratification

• The method of grouping data by common points or characteristics to better understand the similarities and differences within the data is called stratification.

• Such classification helps in obtaining vital information by distinguishing and comparing data in different classes, or strata.

• It also identifies the key strata to concentrate on.

• The stratification may be based on machines, operators, shifts or any other source of variation.
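A minimal sketch of stratifying rejection data on one source of variation at a time. The inspection records, machine and shift labels, and the helper `rejection_pct_by` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inspection records: (machine, shift, items_inspected, items_rejected).
records = [
    ("M1", "A", 200, 4),
    ("M1", "B", 180, 3),
    ("M2", "A", 210, 15),
    ("M2", "B", 190, 14),
    ("M3", "A", 205, 5),
    ("M3", "B", 195, 4),
]
FIELDS = {"machine": 0, "shift": 1}


def rejection_pct_by(records, field):
    """Group records on one field (the stratum) and return rejection % per stratum."""
    idx = FIELDS[field]
    inspected, rejected = defaultdict(int), defaultdict(int)
    for rec in records:
        inspected[rec[idx]] += rec[2]
        rejected[rec[idx]] += rec[3]
    return {s: round(100.0 * rejected[s] / inspected[s], 2) for s in inspected}


print(rejection_pct_by(records, "machine"))  # {'M1': 1.84, 'M2': 7.25, 'M3': 2.25}
print(rejection_pct_by(records, "shift"))    # {'A': 3.9, 'B': 3.72}
```

In this made-up data, stratifying by shift shows little difference, while stratifying by machine points to M2 as the key stratum to concentrate on.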

Stratification . . .

• The purpose of stratification is to ascertain the differences between categories and to analyze the reasons behind an abnormal distribution.

• Stratification of data is an effective method for isolating the cause of a problem.

• Stratification can also be applied to the data used in other QC tools, such as graphs, Pareto diagrams, check sheets, histograms, scatter diagrams, and control charts.
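To illustrate how stratification can isolate the cause of a problem, the sketch below stratifies hypothetical records first by one factor and then by two factors together. The data, the helper `rejection_pct`, and the "twice the overall rate" threshold are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical records: (machine, shift, items_inspected, items_rejected).
records = [
    ("M1", "Day", 100, 2),
    ("M1", "Night", 100, 2),
    ("M2", "Day", 100, 2),
    ("M2", "Night", 100, 12),
]


def rejection_pct(records, key):
    """Rejection % per stratum, where key(machine, shift) names the stratum."""
    inspected, rejected = defaultdict(int), defaultdict(int)
    for machine, shift, n_inspected, n_rejected in records:
        stratum = key(machine, shift)
        inspected[stratum] += n_inspected
        rejected[stratum] += n_rejected
    return {s: 100.0 * rejected[s] / inspected[s] for s in inspected}


overall = 100.0 * sum(r[3] for r in records) / sum(r[2] for r in records)
by_machine = rejection_pct(records, lambda m, s: m)
by_shift = rejection_pct(records, lambda m, s: s)
by_cell = rejection_pct(records, lambda m, s: (m, s))

# Flag strata whose rejection % is more than twice the overall figure
# (the factor 2 is an arbitrary threshold chosen for illustration).
suspects = [cell for cell, pct in by_cell.items() if pct > 2 * overall]

print(f"overall   : {overall:.1f}%")   # overall   : 4.5%
print("by machine:", by_machine)       # by machine: {'M1': 2.0, 'M2': 7.0}
print("by shift  :", by_shift)         # by shift  : {'Day': 2.0, 'Night': 7.0}
print("suspects  :", suspects)         # suspects  : [('M2', 'Night')]
```

Either factor alone points to M2 and to the night shift; only the combined strata show that the abnormal rejections are confined to M2 on the night shift.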

Area of application

• Raw Material
Quantity supplied, delivery time, rejection %: supplier-wise and batch-wise.

• Production
Rejection percentage with respect to machine, shift, operator, raw material, tool, jig and so on.

• Engineering and design
Drawing errors, draftsman-wise and by type of drawing.

Check sheet, Data formats

A check sheet is a convenient and compact format for collecting data.

• PURPOSE OF CHECK SHEET

1 Simplify data gathering
2 Provide preliminary summarization
3 Provide a basis for statistical analysis

• USES OF CHECK SHEET

1 Problem monitoring
2 Direction for troubleshooting

Area of Application: Check Sheet

• Raw Material
Number of defects, location of defects, measurements on quality characteristics, etc.

• Production
Measurements on process parameters, number of defects in products, location of defects, etc.

How to make a check sheet

1 Clarifying the objective
2 Determining the type of check sheet to use
3 Deciding which items to check
4 Creating the check sheet
5 Recording the data
6 Tallying the data
7 Examining the check sheet
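The steps for making a check sheet can be sketched for a defect-type check sheet. Everything here is a hypothetical assumption for illustration: the defect categories, the recorded observations, and the helpers `tally_marks` and `build_check_sheet`. Tally marks are written in groups of five, with "/" standing in for the crossing stroke.

```python
from collections import Counter

# "Deciding which items to check": hypothetical defect categories.
DEFECT_TYPES = ["Scratch", "Dent", "Crack", "Misalignment", "Other"]

# "Recording the data": one entry per defect, in the order observed (hypothetical).
observations = ["Scratch", "Dent", "Scratch", "Crack", "Scratch",
                "Misalignment", "Dent", "Scratch", "Other", "Dent"]


def tally_marks(n):
    """Render a count as tally marks in groups of five, e.g. 7 -> '||||/ ||'."""
    full, rest = divmod(n, 5)
    groups = ["||||/"] * full + (["|" * rest] if rest else [])
    return " ".join(groups)


def build_check_sheet(observations, items):
    """'Tallying the data': count observations against each check item.

    Observations not listed in `items` are ignored, so the item list should
    include a catch-all such as "Other".
    """
    counts = Counter(observations)
    return [(item, tally_marks(counts[item]), counts[item]) for item in items]


# "Examining the check sheet": list the items, most frequent first.
sheet = build_check_sheet(observations, DEFECT_TYPES)
for item, marks, count in sorted(sheet, key=lambda row: row[2], reverse=True):
    print(f"{item:<14}{marks:<8}{count}")
```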

Process Distribution Check-Sheet
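A process distribution check sheet tallies each measurement into a class interval as it is taken, so the sheet builds up a histogram of the process. The minimal sketch below assumes hypothetical packet weights in whole grams and class intervals of width 2 g starting at 496 g; the helper `process_distribution_sheet` is also hypothetical.

```python
# Hypothetical packet weights in grams, recorded as each packet is checked.
weights = [498, 502, 500, 503, 497, 501, 499, 500, 504, 496, 500, 501, 502, 499, 500]

LOW, WIDTH, N_CLASSES = 496, 2, 5  # intervals [496, 498), [498, 500), ... (assumed)


def process_distribution_sheet(values, low, width, n_classes):
    """Count values into equal-width intervals [low + k*width, low + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = (v - low) // width
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the sheet's class intervals")
        counts[k] += 1
    return counts


counts = process_distribution_sheet(weights, LOW, WIDTH, N_CLASSES)
for k, c in enumerate(counts):
    lower = LOW + k * WIDTH
    print(f"{lower}-{lower + WIDTH - 1} g | {'X' * c}")
```

Each "X" plays the role of a mark made on the sheet; reading the rows together gives the shape of the distribution without any further calculation.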

Location Check-Sheet

Population & Sample Measurements from data

Scales of Measurements
Nominal: A scale that measures data by name only. For example, religious affiliation (measured as Jewish, Christian, Buddhist, and so forth), political affiliation (measured as Democratic, Republican, Libertarian, and so forth), or style of automobile (measured as sedan, sports car, station wagon, van, and so forth).

Ordinal: Measures by rank order only. Other than rough order, no precise measurement is possible. For example, medical condition (measured as satisfactory, fair, poor, guarded, serious, and critical); socioeconomic status (measured as lower class, lower-middle class, middle class, upper-middle class, upper class); or military officer rank (measured as lieutenant, captain, major, lieutenant colonel, colonel, general). Such rankings are not absolute but relative to each other: major is higher than captain, but we cannot measure the exact difference in numerical terms. Is the difference between major and captain equal to the difference between colonel and general? You cannot say.
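The two scales above permit different summaries. The sketch below assumes small hypothetical samples: the mode suits nominal data, while ordinal data also support order comparisons and the median, because both need only rank. The integer codes for the officer ranks are an illustrative assumption that records order, not distance.

```python
import statistics

# Nominal: categories have names only, so the mode is the natural summary.
cars = ["sedan", "van", "sedan", "sports car", "station wagon"]  # hypothetical
most_common_style = statistics.mode(cars)
print(most_common_style)  # sedan

# Ordinal: rank order as listed above; codes 0..5 record rank only, not distance.
RANKS = ["lieutenant", "captain", "major", "lieutenant colonel", "colonel", "general"]
CODE = {name: i for i, name in enumerate(RANKS)}

officers = ["captain", "major", "lieutenant", "colonel", "major"]  # hypothetical
print(CODE["major"] > CODE["captain"])  # True: comparisons are meaningful

# The median needs only order; median_low always returns an observed code,
# which maps back to a rank name.
median_rank = RANKS[statistics.median_low(CODE[o] for o in officers)]
print(median_rank)  # major

# Averaging the codes would assume "major - captain" equals "general - colonel",
# which an ordinal scale cannot justify.
```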

Interval: Measures by using equal intervals. Here you can compare differences between pairs of values. The Fahrenheit temperature scale, measured in degrees, is an interval scale, as is the centigrade scale. The temperature difference between 50 and 60 degrees centigrade (10 degrees) equals the temperature difference between 80 and 90 degrees centigrade (10 degrees). Note that the 0 in each of these scales is arbitrarily placed, which is what makes an interval scale different from a ratio scale: 80 degrees centigrade is not "twice as hot" as 40 degrees centigrade.

Ratio: Like an interval scale, a ratio scale has equal intervals, but it also includes a 0 measurement that signifies the point at which the characteristic being measured vanishes (absolute 0). For example, income (measured in dollars, with 0 equal to no income at all), years of formal education, items sold, and so forth, are all ratio scales. Because the 0 is absolute, ratios are meaningful: an income of 40,000 dollars is twice an income of 20,000 dollars.
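The difference between interval and ratio scales can be checked numerically. This sketch assumes the standard conversions F = C × 9/5 + 32 and K = C + 273.15, and a pair of hypothetical incomes; differences on an interval scale survive a change of units, but ratios survive only on a ratio scale.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    return c + 273.15  # Kelvin starts at absolute zero, so it is a ratio scale


# Interval scale: equal differences stay equal after a change of units.
print(60 - 50 == 90 - 80)  # True (10 degrees each)
print(celsius_to_fahrenheit(60) - celsius_to_fahrenheit(50),
      celsius_to_fahrenheit(90) - celsius_to_fahrenheit(80))  # 18.0 18.0

# Ratios depend on where the arbitrary 0 sits, so they carry no meaning there.
print(80 / 40)                                                # 2.0 in degrees C
print(celsius_to_fahrenheit(80) / celsius_to_fahrenheit(40))  # about 1.69 in degrees F
print(celsius_to_kelvin(80) / celsius_to_kelvin(40))          # about 1.13 in kelvin

# Ratio scale: 0 means "none", so a ratio survives a change of units.
income_a, income_b = 40_000, 20_000  # dollars (hypothetical)
print(income_a / income_b, (income_a * 100) / (income_b * 100))  # 2.0 2.0 (dollars, cents)
```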
