BITS Pilani
Bangalore Campus
Descriptive Statistics
About me
Dr.Gangaboraiah, PhD (Stats)
Former Professor of Statistics, KIMS, Bangalore
Work Experience
Kempegowda Institute of Medical Sciences, Bangalore (34 years)
Govt. Homeopathy Medical College, Bangalore (4 years)
SJC Institute of Technology, Chickballapur (13 years, Visiting Professor)
Manipal University, Bangalore Centre (Since 2008, Visiting Professor)
MS (Computer Science), MS (Computer Network)
Data Science
BITS (Since 2013, Visiting Professor)
MTech (Data Science)
WIPRO and Aricent (2019)
Prof.Gangaboraiah PhD (Stats) | Slide 3 of 125 Former Professor of Statistics | KIMS, B’lore
Agenda
Here’s what you will learn in the entire Session:
1 Data Visualization: Why? What? How?
Data Visualization
Data is generated everywhere … every day
… and is increasing exponentially
Source: http://3dsbiovia.com/blog/
Data Has Become ‘Big’
How many 3’s ?
Need for Data Visualization
A tool that enables a user to gain insight into data
Broadly three types of goals:
• To explore:
o Nothing is known
o Required to get an insight
• To analyze :
o There are hypotheses
o Used for verification or falsification
• To present:
o We have the required information
o Used for communication of results
Source: Google images
What do experts say?
“Data visualization is the use of visual representations to
explore, make sense of, and communicate data.”
Data visualization expert, Stephen Few
History of Visualization
Visualization of Napoleon's Army
Impact of Visualization
• John Snow’s Cholera Map (1854)
• Snow used a spot map to illustrate how cases of cholera clustered around the
pump
Truth about Crime – BBC
http://www.bbc.co.uk/truthaboutcrime/crimemap/
Good data representation principles
Exercise
1. Use Colors Wisely – 1/5
What is wrong with this color scale?
1. Use Colors Wisely – 2/5
Not a bad choice of color scale, but the dynamic range needs some work.
1. Use Colors Wisely – 3/5
Do Not Attempt to Fight Pre-Established Color Meanings
Red: Stop, Off, Dangerous, Hot, High stress, Oxygen, Shallow, Money loss, Project running late
Green: Go, On, Plants, Carbon, Moving, Money, All OK, SLA Met, Project on schedule
Blue: Cool, Safe, Deep, Nitrogen, Job completed
1. Use Colors Wisely – 4/5
Which one is easier to read?
Use a different color for extrapolation into the future.
2. Reduce clutter
Three graphs from the same data
• Connecting lines should never obscure points and points should not obscure
each other.
• If multiple samples overlap, a representation should be chosen for the
elements that emphasizes the overlap.
• If multiple data sets are represented in the same plot (superposed data), they must be visually
separable.
• If this is not possible due to the data itself, the data can be separated into adjacent plots that share an axis.
4- Use proper scale
6- Align juxtaposed plots
Make sure that scales match and graphs are aligned, so that the plots are easier to compare.
Gestalt principles
• The brain creates a perception that is more than the sum of available visual
inputs.
• Gestalt principles are used to identify which elements in our visualization are signal (the information we want to communicate) and which are noise (clutter).
Law of Similarity
We seek similarities and differences in objects and link similar
objects as belonging to a group.
Law of Closure
Our minds tend to see complete figures even if a picture is
incomplete.
Law of Enclosure
We perceive objects as belonging to a group when they are enclosed in
a way that creates a boundary or border around them.
Law of Continuity
Our tendency is to see shapes as continuous to the greatest
degree possible. The human eye follows lines, curves or a
sequence of shapes to create pathways.
Law of Connection
We perceive objects connected to each other as a single group as
opposed to objects that are not linked in the same manner.
Are you ready to create a dashboard ?
Dashboard
Elements of dashboard design
Chart type
Steps for a great dashboard design
• Content
• Tools
Steps for a great dashboard design
Avoid excessive details
Chart Types
Line charts are great when it comes to displaying patterns of
change across a continuum.
Chart Types
• Choose bar charts if you want to compare items in the same
category.
• The objective is not just to compare but also show how much one is
better or worse than the rest.
Chart Types
Sparklines usually don’t have a scale which means that users
will not be able to notice individual values. They work well
when you have a lot of metrics and you want to show only the
trends.
Charts To Avoid
Avoid scatterplots. They lack precision and clarity as the
relationships between two quantitative measures don’t
change very frequently.
Charts To Avoid
Avoid Pie charts. They rank low in precision because users
find it difficult to accurately compare the sizes of the pie slices.
Charts To Avoid
Avoid bubble charts. They require too much mental effort from their
users even when it comes to reading simple information in a context.
Visualization and layout design
• Place the most important information at the top left of the dashboard.
Visualization and layout design
• Avoid highly saturated colors; instead, choose a few colors and stick to them.
• Use the same color for the same item on all charts.
Some tools that aid in data visualization include:
- Tableau
- PowerPoint
- QlikView
- Python visualization library (Matplotlib)
- Google Charts
and so on
Summarisation of Data
Data Summarization
Measures of Central Tendency
• A measure of central tendency provides a convenient way of describing a set of scores with a single number that summarises the performance of the group.
Ungrouped Distribution
Prepare a report showing the number of hours per week that students spend studying, based on a random sample of 30 students. The number of hours each student studied last week is given below.
15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8,
13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4,
18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0,
17.8, 33.8, 23.2, 12.9, 27.1, 16.6.
Grouped Frequency Distribution
Table No. Title. Head note
Caption
Stub heading    Column heading    Column heading    Total
Row heading     (Body of the table)                 r1
Row heading                                         r2
Total           c1                c2                n
Footnote
Source
Grouped Frequency Distribution
Marks obtained   No. of persons
45               10
46               15
47               30
48               25
49               15
50               5
Total            100

Age (yrs)   No. of persons
<1          15
1-5         15
6-12        30
13-19       35
20-29       45
30-39       65
40-49       44
50-60       32
>60         19
Total       300
Ungrouped Distribution
Prepare a report showing the number of hours per week that students spend studying, based on a random sample of 30 students. The number of hours each student studied last week is given below.
15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8,
13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4,
18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0,
17.8, 33.8, 23.2, 12.9, 27.1, 16.6.
Organize the data into a frequency distribution using the class intervals (a) 7.5 – 12.5, 12.5 – 17.5, etc. and (b) 10 – 15, 15 – 20, etc.
Grouped Frequency Distribution
(a)
Hours of studying   No. of persons (f)
07.5 – 12.5         1
12.5 – 17.5         12
17.5 – 22.5         10
22.5 – 27.5         5
27.5 – 32.5         1
32.5 – 37.5         1
Total               30

(b)
Hours of studying   No. of persons (f)
10 – 15             7
15 – 20             12
20 – 25             7
25 – 30             3
30 – 35             1
Total               30
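The two frequency tables can be reproduced with a short sketch. The helper name `frequency_table` is hypothetical, and the convention that a value on a class boundary (such as 15.0) falls in the upper class is an assumption; it is the convention that reproduces the counts shown in table (b).

```python
# Hours studied last week by the 30 sampled students (from the slide).
hours = [
    15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8,
    13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4,
    18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0,
    17.8, 33.8, 23.2, 12.9, 27.1, 16.6,
]


def frequency_table(values, start, width, n_classes):
    """Count values per class [lower, upper). Hypothetical helper."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class containing v
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


counts_a = frequency_table(hours, start=7.5, width=5, n_classes=6)
counts_b = frequency_table(hours, start=10, width=5, n_classes=5)
print(counts_a)  # frequencies for intervals (a)
print(counts_b)  # frequencies for intervals (b)
```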
Grouped Frequency Distribution
Class Midpoint: to find the midpoint of each interval, use the following formula:
Midpoint = (Upper limit + Lower limit) / 2

Hours of studying   Midpoint (x)            No. of persons (f)
07.5 – 12.5         (12.5+07.5)/2 = 10      1
12.5 – 17.5         (17.5+12.5)/2 = 15      12
17.5 – 22.5         (22.5+17.5)/2 = 20      10
22.5 – 27.5         (27.5+22.5)/2 = 25      5
27.5 – 32.5         (32.5+27.5)/2 = 30      1
32.5 – 37.5         (37.5+32.5)/2 = 35      1
Total                                       30

Hours of studying   Midpoint (x)            No. of persons (f)
10 – 15             (10+15)/2 = 12.5        7
15 – 20             (15+20)/2 = 17.5        12
20 – 25             (20+25)/2 = 22.5        7
25 – 30             (25+30)/2 = 27.5        3
30 – 35             (30+35)/2 = 32.5        1
Total                                       30
Grouped Frequency Distribution
Relative Frequency Distribution: shows the proportion of observations in each class

Hours of studying   No. of persons (f)   Relative frequency
07.5 – 12.5         1                    1/30 = 0.03
12.5 – 17.5         12                   12/30 = 0.40
17.5 – 22.5         10                   10/30 = 0.33
22.5 – 27.5         5                    5/30 = 0.17
27.5 – 32.5         1                    1/30 = 0.03
32.5 – 37.5         1                    1/30 = 0.03
Total               30                   1

Hours of studying   No. of persons (f)   Relative frequency
10 – 15             7                    7/30 = 0.23
15 – 20             12                   12/30 = 0.40
20 – 25             7                    7/30 = 0.23
25 – 30             3                    3/30 = 0.10
30 – 35             1                    1/30 = 0.03
Total               30                   1
(The rounded relative frequencies may not add up to exactly 1.)
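As a rough sketch, the midpoints and relative frequencies for the class intervals (a) can be computed as below. The variable names are illustrative only; the class limits and frequencies are taken from the tables.

```python
# Class limits and frequencies for intervals (a), taken from the table.
classes = [(7.5, 12.5), (12.5, 17.5), (17.5, 22.5),
           (22.5, 27.5), (27.5, 32.5), (32.5, 37.5)]
freqs = [1, 12, 10, 5, 1, 1]

total = sum(freqs)

# Midpoint = (upper limit + lower limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Relative frequency = class frequency / total frequency, rounded to 2 places
relative = [round(f / total, 2) for f in freqs]

for (lower, upper), x, f, r in zip(classes, midpoints, freqs, relative):
    print(f"{lower:5.1f} - {upper:5.1f}  x={x:5.1f}  f={f:2d}  rel={r:.2f}")
```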
Ungrouped Distribution
• Also referred to as the “arithmetic average”
• The most commonly used measure of the center of data
• Computation of the sample mean for ungrouped data: the sum of the observations divided by the number of observations. If X denotes the variable and x1, x2, …, xn are the values of X, then
\[ \bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
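A minimal sketch of this computation, applied to the 30 study-hour observations from the earlier slide (the variable names are illustrative):

```python
# Hours studied last week by the 30 sampled students (from the earlier slide).
hours = [
    15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8,
    13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4,
    18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0,
    17.8, 33.8, 23.2, 12.9, 27.1, 16.6,
]

n = len(hours)
mean_hours = sum(hours) / n  # sum of observations / number of observations
print(round(mean_hours, 2))
```

The ungrouped mean is close to, but not identical with, a mean computed from grouped data, because grouping replaces each observation by its class midpoint.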
Ungrouped Distribution
• If the mean is to be calculated for population-based data, then it is given by
\[ \mu = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Ungrouped Distribution
Computation of mean
\[ \bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \mu = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Mean for ungrouped data
[Figure: data points on a number line from 0 to 10; Mean = 5]
[Figure: the number line extended to 14; Mean = 6]
Mean for ungrouped data
[Figure: the number line extended to 24; Mean = 8]
[Figure: the number line extended to 44; Mean = 12]
Grouped Frequency Distribution
• Computation of the mean for grouped frequency data: the sum of the products of each frequency (f) with the midpoint (x) of its class interval, divided by the total frequency. If x1, x2, …, xn are the midpoints of the class intervals and f1, f2, …, fn are their corresponding frequencies, then the mean is calculated by
\[ \bar{X} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{N} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \]
Mean for grouped data
Using Direct and Step-deviation methods

Hours of studying   fi   Midpoint (xi)   fixi   di=(xi-A)/h   fidi
07.5 – 12.5         1    10              10     -2            -2
12.5 – 17.5         12   15              180    -1            -12
17.5 – 22.5         10   A = 20          200    0             0
22.5 – 27.5         5    25              125    +1            5
27.5 – 32.5         1    30              30     +2            2
32.5 – 37.5         1    35              35     +3            3
Total               30                   ∑fixi = 580          ∑fidi = -4

Direct method:
\[ \bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{N} = \frac{580}{30} = 19.33 \]
Step-deviation method:
\[ \bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times h = 20 + \frac{-4}{30} \times 5 = 19.33 \]
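The two methods in the table can be checked with a short sketch. The midpoints, frequencies, assumed mean A = 20 and class width h = 5 come from the slide; the variable names are illustrative.

```python
# Midpoints and frequencies from the table above.
midpoints = [10, 15, 20, 25, 30, 35]
freqs = [1, 12, 10, 5, 1, 1]
N = sum(freqs)

# Direct method: mean = sum(f_i * x_i) / N
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
direct_mean = sum_fx / N

# Step-deviation method: d_i = (x_i - A) / h, mean = A + (sum(f_i * d_i) / N) * h
A, h = 20, 5
deviations = [(x - A) / h for x in midpoints]
sum_fd = sum(f * d for f, d in zip(freqs, deviations))
step_mean = A + (sum_fd / N) * h

print(sum_fx, sum_fd)
print(round(direct_mean, 2), round(step_mean, 2))
```

Both routes give the same value; the step-deviation method only makes the hand arithmetic smaller.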
Mean for grouped data
Find the mean of the wages of the employees of a company, given as follows:

Wages of employees (Rs)   No. of persons
4001 – 4500               25
4501 – 5000               36
5001 – 5500               45
5501 – 6000               62
6001 – 6500               39
6501 – 7000               55
7001 – 7500               44
7501 – 8000               29
8001 – 8500               15
Total                     350
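One possible way to work this example is sketched below, using the direct method with class midpoints taken as (lower limit + upper limit) / 2. The variable names are illustrative.

```python
# Wage classes (Rs) and number of persons, from the table above.
classes = [(4001, 4500), (4501, 5000), (5001, 5500), (5501, 6000),
           (6001, 6500), (6501, 7000), (7001, 7500), (7501, 8000),
           (8001, 8500)]
freqs = [25, 36, 45, 62, 39, 55, 44, 29, 15]

# Midpoint of each class, e.g. (4001 + 4500) / 2 = 4250.5
midpoints = [(lower + upper) / 2 for lower, upper in classes]

N = sum(freqs)
mean_wage = sum(f * x for f, x in zip(freqs, midpoints)) / N
print(N, mean_wage)
```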
Mean: Grouped Scores
Data of Children watching TV in Bangalore

Hours of TV watching (X)   No. of children (f)   fX    Percentage   Cumulative %
1                          104                   104   31.3         31.3
2                          130                   260   39.2         70.5
3                          98                    294   29.5         100.0
Total                      332                   658   100.0

Mean = ∑fX / ∑f = 658 / 332 ≈ 1.98 hours
Mean - Properties
• It measures stability. The mean is the most stable of the measures of central tendency because every score contributes to its value.
• It is rigidly defined and therefore suitable for further mathematical analysis.
• It is easily affected by extreme scores (outliers).
• The sum of each score’s deviation from the mean is zero: Σ(X − Mean) = 0
• It can be applied to interval and ratio levels of measurement.
• It may not be an actual score in the distribution.
• It is very easy to compute.
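Two of these properties can be illustrated with a small sketch. The eight scores are the G1 values used later in the deck; the extra score of 100 is a hypothetical outlier added only for illustration.

```python
scores = [12, 13, 15, 15, 15, 16, 17, 21]  # G1 values from a later slide

mean = sum(scores) / len(scores)

# Property: the deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in scores)

# Property: the mean is pulled towards extreme scores.
# The value 100 is a hypothetical outlier, not part of the original data.
with_outlier = scores + [100]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(mean, deviation_sum, round(mean_with_outlier, 2))
```

Note also that the mean of 15.5 is not itself one of the scores.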
Mean
When to use the Mean
Sampling stability is desired.
The Standard Deviation
Sl. No.   X1   X2
1         2    3
2         8    10
3         5    5
4         3    3
5         7    7
6         8    3
7         5    5
8         2    6
9         5    3
Total     45   45
The Standard Deviation
Statistical measures Group 1 Group 2
Mean 5 5
Median 5 5
Mode 5 3
Observe the following data
[Figure: dot plots of Group A, Group B and Group C, each on an axis from 11 to 21]
Compute Mean for all three groups
[Figure: dot plots of Group A, Group B and Group C, each on an axis from 11 to 21]
Group A: Mean = 15.5
Group B: Mean = 15.5
Group C: Mean = 15.5
Observe the following data
The means of all three groups are the same. Is anything else needed, besides the mean, to describe the data?
• Measures of dispersion or variation
Standard deviation
Variance
Range
Standard Deviation
G1 (xi)   xi − x̄   (xi − x̄)²
12        -3.5      12.25
13        -2.5      6.25
15        -0.5      0.25
15        -0.5      0.25
15        -0.5      0.25
16        0.5       0.25
17        1.5       2.25
21        5.5       30.25
Total     0         52.0

\[ S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{52.0}{7}} = 2.726 \]
Standard Deviation
G1   G2   G3
12   14   11
13   15   11
15   15   11
15   15   12
15   16   19
16   16   20
17   16   20
21   17   20

Calculate the SD of groups G2 and G3.
Standard Deviation
[Figure: dot plots of Group A, Group B and Group C, each on an axis from 11 to 21]
Group A: Mean = 15.5, SD = 2.726
Group B: Mean = 15.5, SD = 0.926
Group C: Mean = 15.5, SD = 4.567
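These values can be verified with Python's standard `statistics` module, as sketched below. The sketch assumes that Groups A, B and C correspond to the columns G1, G2 and G3 of the earlier table; `statistics.stdev` uses the n − 1 divisor, matching the formula for S.

```python
import statistics

# Assumption: Groups A, B, C are the G1, G2, G3 columns from the earlier slide.
groups = {
    "G1": [12, 13, 15, 15, 15, 16, 17, 21],
    "G2": [14, 15, 15, 15, 16, 16, 16, 17],
    "G3": [11, 11, 11, 12, 19, 20, 20, 20],
}

results = {}
for name, values in groups.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1
    results[name] = (mean, round(sd, 3))
    print(name, mean, round(sd, 3))
```

All three groups share the same mean, so only the standard deviation separates them.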
The Standard Deviation
Statistical measures Group 1 Group 2
Mean 5 5
Median 5 5
Mode 5 3
Range 2- 8 3- 10
Mean Deviation 2 2
Variance 5.50 5.75
Standard Deviation 2.34 2.40
Coefficient of Variation (%) 46.90 48.00
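The dispersion rows of this table can be recomputed from the X1 and X2 columns with a short sketch. The function name `summarize` is hypothetical; the variance and standard deviation use the n − 1 divisor, which is what reproduces the tabulated 5.50 and 5.75.

```python
import statistics

# X1 and X2 columns from the earlier data slide.
x1 = [2, 8, 5, 3, 7, 8, 5, 2, 5]
x2 = [3, 10, 5, 3, 7, 3, 5, 6, 3]


def summarize(values):
    """Return a few descriptive statistics. Hypothetical helper."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": (min(values), max(values)),
        "variance": statistics.variance(values),  # sample variance
        "sd": sd,
        "cv_percent": 100 * sd / mean,  # coefficient of variation
    }


summary1 = summarize(x1)
summary2 = summarize(x2)
print(summary1)
print(summary2)
```

The computed values agree with the table up to rounding in the last digit.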
Standard Deviation
The Standard Deviation is a measure of dispersion or variation: a descriptive statistic that describes how similar a set of scores are to each other.
The more similar the scores are to each other, the lower the standard deviation will be; the less similar the scores are to each other, the higher the standard deviation will be.
In general, the more spread out a distribution is, the larger the measure of dispersion will be.
Measure of Variability/ Dispersion
Which of the distributions of scores has the larger dispersion?
The upper distribution has more dispersion because the scores are more spread out. That is, they are less similar to each other.
[Figure: two frequency distributions, one above the other, with scores 1 to 10 on the horizontal axis and frequencies 0 to 125 on the vertical axis]
Measure of Variability/ Dispersion
Variability can be defined several ways:
The Standard Deviation
New Strategy:
Find the deviation of each score, i.e., (X − Mean)
Square the deviation of each score, i.e., (X − Mean)²
Sum the squared deviations, i.e., Σ(X − Mean)²
Average the squared deviations, i.e., Σ(X − Mean)²/n
• The mean squared deviation is known as the “Variance”
• Variability is now measured in squared units
The Population Variance
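• The population variance equals the mean (average) squared deviation (distance) of the scores from the population mean.
• Taking the square root gives the standard deviation, for a population (σ, divisor n) and for a sample (S, divisor n − 1):
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}} \qquad S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]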
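The steps listed under “New Strategy” can be followed one by one in a short sketch. It uses the G1 scores from the earlier slides and shows both divisors, n for a population and n − 1 for a sample; the variable names are illustrative.

```python
import math

scores = [12, 13, 15, 15, 15, 16, 17, 21]  # G1 values from the earlier slide
n = len(scores)
mean = sum(scores) / n

# Step 1: deviation of each score from the mean
deviations = [x - mean for x in scores]

# Steps 2 and 3: square each deviation and sum the squares
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 4: average the squared deviations
population_variance = sum_of_squares / n    # divisor n
sample_variance = sum_of_squares / (n - 1)  # divisor n - 1

# Square root returns to the original unit of measurement
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(sum_of_squares, population_variance, round(sample_sd, 3))
```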
Formula for grouped data
◊ Standard deviation for grouped data
The Standard Deviation
• Most common and most important measure of variability is
the standard deviation
o A measure of the standard, or average, distance from
the mean
o Describes whether the scores are clustered closely
around the mean or are widely scattered
Exercise: Find the deviations of all the data points from the mean, and then find the ‘mean deviation’.
Interpretation
Important note:
◊ Mean and Standard Deviation
◊ Mean and Variance
Interpretation
Important note:
◊ The mean alone is not enough to describe the data. The standard deviation is also needed to give a complete description, and the two can be reported together because both have the same unit of measurement.
Coefficient of Variation
Important note:
◊ If two variables are measured in different units of measurement, which is the best measure to compare their variability?
Ans: Coefficient of Variation
\[ CV = \frac{S}{\bar{X}} \times 100 \]
Coefficient of Variation
Variables Mean SD
Coefficient of Variation
The Median
• The score that divides the distribution into two equal parts, so that half the cases are above it and half below it.
• It corresponds to the 50th percentile.
The Median
Median of Ungrouped Data
Median
151, 168, 174
Median 168
Median 170.5
Median
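\[ \text{Median} = \begin{cases} \text{value in position } \dfrac{n+1}{2}, & \text{if } n \text{ is odd} \\[2ex] \text{average of the values in positions } \dfrac{n}{2} \text{ and } \dfrac{n}{2}+1, & \text{if } n \text{ is even} \end{cases} \]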
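The rule above translates directly into a short function, sketched here. The function name `median_of` is hypothetical, and the four-value list in the second call is made up for illustration; the three-value list comes from the earlier slide.

```python
def median_of(values):
    """Median by position in the sorted data. Hypothetical helper."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: value in position (n + 1) / 2 (positions are 1-based)
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values in positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_median = median_of([151, 168, 174])  # data from the earlier slide
even_median = median_of([4, 1, 3, 2])    # hypothetical data
print(odd_median, even_median)
```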
Median in Grouped Data
Where:
• L = Lower boundary of the category containing the N/2
Median in Grouped Data
Steps to solve median for grouped data
Median
Example: The scores of 40 students on a 60-item test in a science class are tabulated below. The highest score is 54 and the lowest score is 10.
Median
Solution:
• N/2 = 40/2 = 20
• The category containing N/2 is (35 – 39)
• Lower Limit of MC = 35
• L = 34.5
• Cf (or Cfp) = 17
• f (or fm) = 9
• h=5
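The quantities listed in the solution can be combined as sketched below. The grouped-median formula itself did not survive on these slides, so the sketch assumes the standard interpolation formula, Median = L + ((N/2 − Cf) / f) × h; the function name is hypothetical.

```python
def grouped_median(L, N, cf, f, h):
    """Grouped-data median by linear interpolation within the median class.

    Assumes the standard formula: L + ((N / 2 - cf) / f) * h, where
    L  = lower boundary of the median class,
    N  = total frequency,
    cf = cumulative frequency before the median class,
    f  = frequency of the median class,
    h  = class width.
    """
    return L + ((N / 2 - cf) / f) * h


# Values listed in the solution above.
median_score = grouped_median(L=34.5, N=40, cf=17, f=9, h=5)
print(round(median_score, 2))
```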
Sl. No.   Intelligence Quotient   Hours of TV watching per week   Duration of hospital stay (days)
1         106                     7                               8
2         86                      2                               40
3         200                     27                              10
4         101                     70                              80
5         199                     8                               180
6         103                     9                               5
7         197                     20                              80
8         113                     12                              10
9         112                     6                               5
10        65                      17                              8
Shape of the Distribution
Shape of the Distribution
• Symmetrical : mean is about equal to median
• Normality: mean = median = mode
[Figure: box plots of Pain (VAS) for Female (N = 74) and Male (N = 27), annotated with the 25th centile (Q1), the median (Q2, 50th centile), the 75th centile (Q3), the inter-quartile range and the 2.5th centile]
Box and Whisker plot
IQR = Q3 – Q1
[Figure: box-and-whisker plot marking Min, Q1, Q2, Q3 and Max, with fences at Q1 − 1.5 IQR and Q3 + 1.5 IQR; points beyond the fences are outliers]
Box-and-Whisker plot
IQR = Q3 – Q1
[Figure: box-and-whisker plot marking Q1, Q2, Q3 and Max, with fences at Q1 − 3 IQR and Q3 + 3 IQR; points beyond these fences are major outliers]
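The fences on these two slides can be computed as in the sketch below. It uses the “hours of TV watching per week” values from Exercise 2 and `statistics.quantiles` with its default "exclusive" method; other quartile conventions exist and can give slightly different fences, so the choice of method is an assumption.

```python
import statistics

# Hours of TV watching per week, from the Exercise 2 data.
tv_hours = [7, 2, 27, 70, 8, 9, 20, 12, 6, 17]

# Quartiles (default method="exclusive")
q1, q2, q3 = statistics.quantiles(tv_hours, n=4)
iqr = q3 - q1

# Fences for outliers (1.5 IQR) and major outliers (3 IQR)
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
lower_major, upper_major = q1 - 3 * iqr, q3 + 3 * iqr

outliers = [x for x in tv_hours if x < lower_fence or x > upper_fence]
major_outliers = [x for x in tv_hours if x < lower_major or x > upper_major]

print(q1, q2, q3, iqr)
print(outliers, major_outliers)
```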
Percentiles
A score below which a specific percentage of the distribution
falls.
\[ P_i = \text{value in position } \frac{i(n+1)}{100}, \quad i = 1, 2, 3, \dots, 99 \]
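A minimal sketch of the position rule i(n + 1)/100 follows. The slide gives only the position; interpolating linearly between the two neighbouring sorted values when the position is fractional is an assumption, and the function name `percentile` is hypothetical. The data are the TV-watching hours from Exercise 2.

```python
def percentile(values, i):
    """i-th percentile using position i * (n + 1) / 100. Hypothetical helper.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring values in the sorted data.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 100  # 1-based position in the sorted data
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]         # position falls before the first value
    if whole >= n:
        return ordered[-1]        # position falls at or after the last value
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


tv_hours = [7, 2, 27, 70, 8, 9, 20, 12, 6, 17]  # from the Exercise 2 data
p25 = percentile(tv_hours, 25)
p50 = percentile(tv_hours, 50)
print(p25, p50)
```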
The Mode
• The category or score with the largest frequency (or
percentage) in the distribution.
• The mode can be calculated for variables with levels of
measurement that are: nominal, ordinal, or interval-ratio.
Example:
• Number of Votes for Candidates for Lok Sabha MP. The mode, in this
case, gives you the “central” response of the voters: the most popular
candidate.
Mode
Properties
• It can be used when the data are qualitative as well as
quantitative.
• It may not be unique (Uni -, Bi -, Tri- , Poly- modal)
• It is not affected by extreme values (outliers).
• It may not exist; when it is ill-defined, the empirical relation Mode = 3 Median – 2 Mean can be used.
When to Use the Mode
o When the “typical” value is desired.
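The mode of qualitative and quantitative data, and the case of more than one mode, can be illustrated with the standard `statistics` module. The candidate labels and the bimodal list are hypothetical; the X1 values come from the earlier data slide.

```python
import statistics

# Hypothetical votes for three candidates (qualitative data).
votes = ["A", "B", "A", "C", "A", "B"]
most_popular = statistics.mode(votes)

# X1 values from the earlier slide (quantitative data).
x1 = [2, 8, 5, 3, 7, 8, 5, 2, 5]
x1_modes = statistics.multimode(x1)

# Hypothetical data with two modes (bimodal).
bimodal = statistics.multimode([1, 1, 2, 2, 3])

print(most_popular, x1_modes, bimodal)
```

`statistics.multimode` returns every value that shares the highest frequency, which makes non-unique modes visible.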
Learning Check
a) If all the scores in a data set are the same, the Standard
Deviation is equal to 1.00 True / False ?
Select the correct option
Solution
a) False. If all the scores in a data set are the same, they are all equal to the mean, so every deviation from the mean is 0 and the Standard Deviation is therefore equal to zero.
b) The standard deviation measures …
(1) Sum of squared deviation scores
(2) Standard distance of a score from the mean
(3) Average deviation of a score from the mean
(4) Average squared distance of a score from the
mean
Learning Check
Select the correct option
a) A sample of four scores has SS = 24. What is the variance?
(1) The variance is 6
(2) The variance is 7
(3) The variance is 8
(4) The variance is 12
Exercise 1
The following data are the wages of 350 employees of an organisation.
Compute (a) Mean, (b) Median, (c) Mode and (d) Standard deviation.

Wages (Rs.)    No. of persons
4001 – 4500    25
4501 – 5000    36
5001 – 5500    45
5501 – 6000    62
6001 – 6500    39
6501 – 7000    55
7001 – 7500    44
7501 – 8000    29
8001 – 8500    15

Hint for Mode: L = Lower limit of modal class; f0 = Frequency preceding modal class; f1 = Frequency of modal class; f2 = Frequency succeeding modal class; h = Width of class interval
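The symbols in the hint fit the usual interpolation formula for the mode of grouped data, Mode = L + ((f1 − f0) / (2 f1 − f0 − f2)) × h. The formula itself does not appear in the surviving slide text, so treating it as the intended one is an assumption; the function name is hypothetical. A sketch for part (c):

```python
def grouped_mode(L, f0, f1, f2, h):
    """Grouped-data mode, assuming the standard interpolation formula:
    L + ((f1 - f0) / (2 * f1 - f0 - f2)) * h
    """
    return L + ((f1 - f0) / (2 * f1 - f0 - f2)) * h


# Modal class is 5501 - 6000 (highest frequency, 62);
# the neighbouring frequencies are 45 (before) and 39 (after).
mode_wage = grouped_mode(L=5501, f0=45, f1=62, f2=39, h=500)
print(round(mode_wage, 1))
```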
Exercise 2
For each of the following three sets of data, calculate a suitable measure of central tendency and of dispersion. Represent the data graphically using an appropriate graphical method.

Sl. No.                            1    2    3    4    5    6    7    8    9    10
Intelligence Quotient              106  86   200  101  199  103  197  113  112  65
Hours of TV watching per week      7    2    27   70   8    9    20   12   6    17
Duration of hospital stay (days)   8    40   10   80   180  5    80   10   5    8