STML
Content
Purposes of Statistics
Descriptive Statistics
Shape of data
Numerical Measures of Central Tendency
Numerical Measures of Variability
Notion of Normal Distribution
Purposes of Statistics
Data we normally encounter...
Classification of Data
Types of Data
Stevens classification categorizes data according to four
basic properties:
1. Description
At this level of measurement all we can do is name or label things. We
cannot perform any arithmetic with nominal-level data. All we can
do is count the frequencies with which the things occur.
2. Order
This scale enables us to order the items of interest using ordinal
numbers. Ordinal numbers denote an item's position or rank in a
sequence: first, second, third, and so on.
3. Distance
The interval level also has an inherent order, and in addition the
distances between values on the scale are meaningful.
4. Origin
The addition of a non-arbitrary zero allows us to calculate the
numerical relationship between values using ratios.
For example: a person who weighs 150 pounds weighs twice as
much as a person who weighs only 75 pounds and half as much as a
person who weighs 300 pounds. We can calculate ratios like these
because the scale for weight in pounds starts at zero pounds.
These are also referred to as the primary scales of measurement.
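A minimal Python sketch of why the origin matters. The temperature figures and the helper `celsius_to_fahrenheit` are illustrative assumptions, not taken from the slides. Weight has a true zero, so the 150 : 75 ratio survives a change of units; Celsius temperature is an interval scale, so its ratios do not.

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound


def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit (both zeros are arbitrary)."""
    return c * 9 / 5 + 32


# Ratio scale: the ratio is the same in pounds and in kilograms.
weight_ratio_lb = 150 / 75
weight_ratio_kg = (150 * LB_TO_KG) / (75 * LB_TO_KG)

# Interval scale: "20 degrees is twice 10 degrees" depends on the unit chosen.
temp_ratio_c = 20 / 10
temp_ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

print(weight_ratio_lb, weight_ratio_kg, temp_ratio_c, temp_ratio_f)
```

Differences, on the other hand, are meaningful on both scales, which is the "distance" property.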
Different types of Data
Primary scales of measurement: Ratio Data
Primary scales of measurement: Interval Data
Primary scales of measurement: Ordinal Data
This scale has description and order, but no distance and no origin.
Primary scales of measurement: Nominal Data
To begin, not all data are of the same type
Descriptive Statistics
Statistics: A Single variable
Example
So, the claim that we need to verify is:
Data
Summarizing Quantitative Data - Ordering
Once the values are sorted, we can quickly read off Max = 68, Min = 27, and
some concentration of values around 50...
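The ordering step can be sketched in Python as follows. The list `hours` is a small made-up sample chosen only to share the minimum, maximum and most frequent value quoted in the deck; it is not the deck's data set.

```python
from collections import Counter

# Hypothetical sample, NOT the data set used in the slides.
hours = [50, 27, 68, 45, 50, 33, 55, 50, 60]

ordered = sorted(hours)                  # ascending order
smallest, largest = ordered[0], ordered[-1]
frequencies = Counter(ordered)           # ungrouped frequency of each value

print(ordered)
print("Min =", smallest, "Max =", largest)
print("Times 50 occurs:", frequencies[50])
```

Sorting makes the extremes and any clustering visible at a glance, and the `Counter` gives the ungrouped frequency distribution directly.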
Summarizing Quantitative Data: Ungrouped Frequency
Distributions
Summarizing Quantitative Data: Ungrouped Frequency
Distributions - Graphical
Summarizing Quantitative Data: Grouped Frequency
Distributions – Grouping 1
Summarizing Quantitative Data: Grouped Frequency
Distributions – Grouping 2
A grouped frequency distribution is not unique: the same data can be
grouped in many ways.
The right number of groups and the width of the class intervals are
subjective and depend on the data.
Anything over 10 groups becomes difficult to read and comprehend.
It is not advisable to vary the class intervals (e.g. Hrs: 63–68, 55–62,
50–54).
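A minimal sketch of grouping with equal class intervals, in Python. The function name `grouped_frequencies`, the starting point 20, the width 10 and the sample `hours` are all assumptions made for illustration; the deck's own groupings are not reproduced.

```python
from collections import Counter


def grouped_frequencies(values, start, width):
    """Count integer values in equal-width classes beginning at `start`.

    Assumes every value is >= start. Classes are closed integer ranges
    such as (20, 29), (30, 39), ... and empty classes are reported as 0.
    """
    counts = Counter((v - start) // width for v in values)
    last = max(counts)
    return {
        (start + k * width, start + (k + 1) * width - 1): counts[k]
        for k in range(last + 1)
    }


# Hypothetical sample, NOT the data set used in the slides.
hours = [27, 33, 45, 50, 50, 50, 55, 60, 68]
table = grouped_frequencies(hours, start=20, width=10)
for (low, high), freq in table.items():
    print(f"{low}-{high}: {freq}")
```

Changing `start` or `width` gives a different but equally valid table, which is the sense in which a grouped distribution is not unique.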
Summarizing Quantitative Data: Grouped Frequency
Distributions – Open Ended Groups
A grouped frequency distribution is not unique: the same data can be
grouped in many ways.
Open-ended groups bring focus to data points that need greater
attention.
However, they are not amenable to certain mathematical computations (for
example, the average).
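One way to see the limitation, sketched in Python under an assumption the slides do not state: that the average of grouped data is approximated from class midpoints. An open-ended class has no midpoint, so the calculation cannot proceed. The function `grouped_mean` and both tables are hypothetical.

```python
def grouped_mean(classes):
    """Approximate the mean from (lower, upper, frequency) triples.

    Uses class midpoints; None marks an open end. Returns None when any
    class is open-ended, because no midpoint can be formed for it.
    """
    total = 0.0
    n = 0
    for lower, upper, freq in classes:
        if lower is None or upper is None:
            return None
        total += freq * (lower + upper) / 2
        n += freq
    return total / n


# Hypothetical tables, NOT taken from the slides.
closed = [(20, 29, 1), (30, 39, 1), (40, 49, 1), (50, 59, 4), (60, 69, 2)]
open_ended = [(None, 39, 2), (40, 49, 1), (50, 59, 4), (60, None, 2)]

print(grouped_mean(closed))
print(grouped_mean(open_ended))
```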
Summarizing Quantitative Data: Grouped Frequency
Distributions - Graphical
Histograms
▶ Divide the range of values in the sample set into small intervals and
count how many observations fall within each interval.
▶ For each interval, plot a rectangle with width equal to the interval
size and height equal to the number of observations in the interval.
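The two steps can be sketched as a plain-text histogram in Python, where each bar's length stands in for the rectangle's height. The function `text_histogram`, the bin settings and the sample `hours` are illustrative assumptions.

```python
def text_histogram(values, start, width, n_bins):
    """Return per-interval counts and one text bar per interval."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)    # index of the interval holding v
        if 0 <= k < n_bins:
            counts[k] += 1
    lines = []
    for k, c in enumerate(counts):
        low = start + k * width
        lines.append(f"{low:>3}-{low + width - 1:<3} | {'#' * c}")
    return counts, lines


# Hypothetical sample, NOT the data set used in the slides.
hours = [27, 33, 45, 50, 50, 50, 55, 60, 68]
counts, lines = text_histogram(hours, start=20, width=10, n_bins=5)
print("\n".join(lines))
```

A plotting library would draw the same counts as rectangles; the counting step is identical.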
Summarizing Quantitative Data: Relative Frequency
Distributions – Proportions/Percentages
Summarizing Quantitative Data: Cumulative Frequency
Distributions
Summarizing Quantitative Data: Cumulative Frequency
Distributions - Graphical
Summarizing Quantitative Data: Basic Statistical
Measures- Shape of data
Basic Statistical Measures: Shape of data - Examples
Basic Statistical measures: Measures of location
Summarizing Quantitative Data: Central Tendency –
Mode (and proportion)
The mode is the data value that occurs most frequently in the data set.
Mode: 50 Hours (3/30 × 100 = 10% of the observations)
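A short Python sketch of the mode and its proportion. The helper `mode_with_share` and the 30 values in `hours` are made up so that the result matches the 3-out-of-30 figure above; they are not the deck's data.

```python
from collections import Counter


def mode_with_share(values):
    """Return (mode, its count, its share of all observations in percent).

    If several values tie for the highest count, only one is returned.
    """
    value, count = Counter(values).most_common(1)[0]
    return value, count, 100 * count / len(values)


# 30 hypothetical observations: three 50s plus 27 distinct other values.
hours = [50, 50, 50] + list(range(20, 47))
mode_value, mode_count, mode_percent = mode_with_share(hours)
print(mode_value, mode_count, mode_percent)
```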
Mode
Advantages
i. Only sensible measure for qualitative data
ii. More appropriate for quantitative data which are inherently discrete
Disadvantages
i. There may not be a single mode (multimodal data)
ii. It does not use all the data available
iii. Poor sampling stability - large variations across samples
iv. Not very mathematically tractable
Summarizing Quantitative Data: Central Tendency – Mean
The mean is the arithmetic average of all the data values in the data set.
Mean
Advantages
i. Uses all the data values available
ii. Moderate sampling stability
iii. Highly mathematically tractable
Disadvantages
i. More sensitive to extreme values: suppose that instead of 33 and 27 Hrs
we had 24 and 20 Hrs; the mean would then be 49.27 Hrs
ii. Not appropriate for qualitative data: you can't get an average Gender,
for example
iii. Not appropriate for open ended distributions
iv. Not appropriate for skewed distributions
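The size of that shift can be checked with a few lines of Python. This assumes the data set has 30 observations, the figure implied by the 3/30 mode proportion earlier in the deck; the original mean printed here is inferred from that assumption, not quoted from the slides.

```python
n = 30  # assumed number of observations (from the 3/30 mode proportion)

# Replacing 33 and 27 by 24 and 20 lowers the total by 9 + 7 = 16 hours.
drop_in_total = (33 - 24) + (27 - 20)
drop_in_mean = drop_in_total / n

# Working back from the quoted mean after the change.
mean_after = 49.27
mean_before = mean_after + drop_in_mean

print(round(drop_in_mean, 2), round(mean_before, 2))
```

Changing just two of the thirty values moves the mean by roughly half an hour, which illustrates its sensitivity to extreme values.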
Summarizing Quantitative Data: Central Tendency –
Median
The median is the data value in the distribution that divides the data into
two groups having equal frequencies – the center point of the data set.
Median
Advantages
▶ Simple to compute.
▶ Very appropriate for skewed distributions.
▶ Great sampling stability.
▶ Most appropriate for open ended distributions.
▶ Appropriate for ordered qualitative data.
Disadvantages
▶ Does not use all data values.
▶ Less mathematically tractable compared to mean.
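A small Python comparison of how the median and the mean react to one extreme value, using the standard `statistics` module. The sample `hours` and the outlier 168 are hypothetical.

```python
from statistics import mean, median

# Hypothetical sample, NOT the data set used in the slides.
hours = [27, 33, 45, 50, 50, 50, 55, 60, 68]
with_outlier = hours[:-1] + [168]        # replace the largest value by 168

mean_shift = mean(with_outlier) - mean(hours)
median_shift = median(with_outlier) - median(hours)

print("Mean moves by", round(mean_shift, 2))
print("Median moves by", median_shift)
```

The median stays at the middle value because only the order of the observations matters, which is also why it does not use all the data values.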
Summarizing Qualitative Data: Frequency Table, Central
Tendency, Bar Graph and Pie Chart
Summarizing Quantitative Data: Dispersion - Concept
Summarizing Quantitative Data: Dispersion - Range
The range is defined as R = (Maximum Data Value) - (Minimum Data Value)
Range = 68 - 27 = 41
Advantages
▶ Easy to compute
▶ Provides a quick understanding of the total spread of the data values
Disadvantages
▶ Cannot be used for qualitative data
▶ Uses only two data values
▶ Highly influenced by extreme data values
▶ Poor sampling stability
Summarizing Quantitative Data: Dispersion
Inter-Quartile Range
Inter-Quartile Range
Advantages
▶ Good sampling stability, less influenced by extreme values, appropriate
for skewed distributions.
Disadvantages
▶ Not computable for qualitative variables, does not use all the data, not
amenable to further mathematical operations.
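A hedged Python sketch contrasting the range with the inter-quartile range. Quartile conventions differ between textbooks and tools; this uses the default "exclusive" method of `statistics.quantiles`, which may not be the convention used in the slides. The sample `hours` and the outlier 168 are hypothetical.

```python
from statistics import quantiles


def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


def interquartile_range(values):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (n=4)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Hypothetical sample, NOT the data set used in the slides.
hours = [27, 33, 45, 50, 50, 50, 55, 60, 68]
extreme = hours[:-1] + [168]             # one extreme value added at the top

print(value_range(hours), interquartile_range(hours))
print(value_range(extreme), interquartile_range(extreme))
```

The single extreme value stretches the range but leaves the IQR untouched, since the quartiles depend only on the middle of the ordered data.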
Summarizing Quantitative Data: Dispersion – Box
(Whisker) Plots
Summarizing Quantitative Data Dispersion: Standard
Deviation (SD)
Comparison with Standard Normal Plot
Advantages
▶ Like the mean, uses all the data values.
▶ Has good sampling stability.
▶ The measure is mathematically tractable.
Disadvantages
▶ Cannot be used for qualitative data.
▶ Is influenced by extreme values/outliers.
▶ Not very appropriate for skewed data.
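The points above appear to describe the standard deviation. As a minimal Python sketch, assuming the sample form with an n - 1 divisor, the calculation and its sensitivity to an outlier look like this. The function `sample_sd` and the data are hypothetical.

```python
from math import sqrt
from statistics import stdev


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((v - m) ** 2 for v in values) / (n - 1))


# Hypothetical sample, NOT the data set used in the slides.
hours = [27, 33, 45, 50, 50, 50, 55, 60, 68]

sd = sample_sd(hours)
agrees_with_stdlib = abs(sd - stdev(hours)) < 1e-9   # cross-check
sd_with_outlier = sample_sd(hours[:-1] + [168])      # one extreme value

print(round(sd, 2), round(sd_with_outlier, 2), agrees_with_stdlib)
```

Every observation enters the sum, which is why the measure uses all the data and also why a single extreme value inflates it.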
Summarizing Quantitative Data: Dispersion
Coefficient of Variation (CV)
Summarizing Qualitative Data: Dispersion
Index of Diversity (Gini Impurity)
Summarizing Qualitative Data: Dispersion
Index of Diversity (Gini Impurity)(contd.)
Summarizing Qualitative Data: Dispersion
Index of qualitative variation
$$\mathrm{IQV} = \frac{1 - \sum_{i=1}^{k} p_i^2}{(k-1)/k} = \left(1 - (0.4)^2 - (0.3)^2 - (0.3)^2\right) \times \frac{3}{2} = 0.66 \times 1.5 = 0.99$$
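A small Python sketch of both indices for the proportions 0.4, 0.3 and 0.3. It takes the index of diversity to be 1 minus the sum of squared proportions and rescales it by k/(k - 1) for the index of qualitative variation; the function names are illustrative. For these proportions the arithmetic is 0.66 × 3/2 = 0.99.

```python
def gini_impurity(proportions):
    """Index of diversity: 1 - sum of squared category proportions."""
    return 1 - sum(p * p for p in proportions)


def index_of_qualitative_variation(proportions):
    """Index of diversity divided by its maximum possible value (k-1)/k."""
    k = len(proportions)
    return gini_impurity(proportions) * k / (k - 1)


proportions = [0.4, 0.3, 0.3]
diversity = gini_impurity(proportions)
iqv = index_of_qualitative_variation(proportions)
print(round(diversity, 2), round(iqv, 2))
```

The rescaling puts the index on a 0-to-1 scale, reaching 1 when all k categories are equally frequent.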
Summarizing Qualitative Data: Dispersion
Index of qualitative variation(contd.)
Data: Summary of ...
More than One Variable
Data Tables: Cross Tabulations
The higher the Family Monthly Income, the more likely it is that the
household (HH) will have more than one car.
The larger the Family Size, the more likely it is that the HH will have
more than one car.
Data Plots: Three variables at a go
Correlation
Measure of Associations
$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
$$SD^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Measure of Associations - Covariance
Covariance is unbounded, and its value also depends on the units of the
data.
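The unit dependence can be sketched in Python with the sample covariance (n - 1 divisor). The function `sample_covariance` and the paired values are hypothetical; the only point is that re-expressing x in minutes instead of hours multiplies the covariance by 60.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired observations, NOT taken from the slides.
x_hours = [1, 2, 3, 4, 5]
y_score = [2, 4, 5, 4, 5]
x_minutes = [60 * x for x in x_hours]    # same quantity, different unit

cov_hours = sample_covariance(x_hours, y_score)
cov_minutes = sample_covariance(x_minutes, y_score)
print(cov_hours, cov_minutes)
```

Because the number changes with the unit, the size of a covariance on its own says little about how strong the association is.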
Measure of Associations - Correlations
Pearson's correlation coefficient provides the direction and strength of a
linear relationship between two variables.
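A minimal Python sketch of Pearson's coefficient, written from its textbook definition (covariance divided by the product of the two standard deviations). The function `pearson_r` and the data are hypothetical. It shows a perfectly linear relation, the effect of a change of units, and a relation that is monotonic but not linear.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]                          # hypothetical values
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])
r_rescaled = pearson_r([60 * x for x in xs], [2 * x + 1 for x in xs])
r_cubic = pearson_r(xs, [x ** 3 for x in xs])

print(round(r_linear, 4), round(r_rescaled, 4), round(r_cubic, 4))
```

The coefficient is unaffected by the units, and it falls below 1 for the cubic relation even though y always increases with x, because that relation is not a straight line.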
Spearman Rank Correlation
Spearman's rank correlation measures the strength and direction of
association between two ranked variables. It measures how monotonic the
relation between two variables is, i.e. how well the relationship between
them could be represented using a monotonic function.
▶ Degree of association between two variables
▶ Linear or nonlinear association
▶ As x increases, y increases or decreases monotonically
Spearman Rank Correlation
▶ Spearman rank correlation computation for n observations:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
d_i is the difference between the ranks given to the two variables' values
for each item of the data.
▶ Example:
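As a separate illustration with made-up values (not the example from the slide), the rank formula can be sketched in Python. The helper names are hypothetical, and the shortcut formula is only valid when there are no tied values, which this sketch assumes.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(xs, ys):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


xs = [1, 2, 3, 4, 5]                               # hypothetical values
rs_cubic = spearman_rs(xs, [x ** 3 for x in xs])   # increasing, nonlinear
rs_reversed = spearman_rs(xs, [10, 8, 6, 4, 2])    # strictly decreasing

print(rs_cubic, rs_reversed)
```

Only the ranks enter the calculation, so any strictly increasing relation gives +1 and any strictly decreasing relation gives -1, whether or not it is linear.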
Kendall rank correlation coefficient
Summary
▶ Data Types
▶ Classification of Data
▶ Basic Statistical Measures
  ▶ Shape of the data
  ▶ Central Tendency (3M's)
  ▶ Dispersion
▶ Summarizing Qualitative Data
  ▶ Frequency Table
  ▶ Central Tendency
  ▶ Bar Graph
  ▶ Pie Chart
▶ Summarizing Quantitative Data: Dispersion
  ▶ Range
  ▶ Inter-Quartile Range
  ▶ Box Plot
  ▶ Standard Deviation (SD)
  ▶ Comparison with Normal Distribution
  ▶ Coefficient of Variation (CV)
Summary (Contd.)
The End