salmansaeed@uetpeshawar.edu.pk
Summarizing Data
• As discussed in previous slides:
– Data matrix is often huge
– For presentation purposes we have to summarize data
23-Jul-2020 Lecture # 02 GEOL 703 – Applied Geo-statistics Dr. Salman Saeed, NIUIP, UET Peshawar
Summarizing Data – Central Tendency
• First, we discuss how to measure the central tendency of the data
• Measures of central tendency:
– Mode
– Median
– Mean
Central Tendency - Mode
• The value that occurs most frequently – the most
common outcome
• Commonly used with Nominal and Ordinal
measurements
• On a pie chart, the category with the largest slice is the mode
• On bar graphs and histograms, the highest bar represents the mode
• On frequency distributions, the peak occurs at the mode value
• This is why a distribution with two peaks is called a bimodal distribution
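The points above can be tried out with a short Python sketch. The category list and the height values are hypothetical samples made up for illustration (they are not the counts shown in the lecture charts); `Counter` and `multimode` come from the Python standard library.

```python
from collections import Counter
from statistics import multimode

# Hypothetical categorical sample (illustration only, not the lecture data).
styles = [
    "Right Arm Fast", "Left Arm Fast", "Right Arm Medium",
    "Right Arm Fast", "Other", "Right Arm Fast", "Left Arm Medium",
]

# Count how often each category occurs and pick the most frequent one.
counts = Counter(styles)
mode_value, mode_count = counts.most_common(1)[0]
print("Mode:", mode_value, "| frequency:", mode_count)

# Hypothetical numeric sample in which two values are equally frequent.
heights = [170, 172, 172, 172, 175, 180, 183, 183, 183, 190]
modes = multimode(heights)  # returns every value tied for the highest count
print("Modes:", modes)
```

Note that the mode is the category or value itself; the count is only its frequency. When `multimode` returns two values, the sample has two peaks, i.e. it is bimodal.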
Examples of modes
[Figures: chart of five categories (Left Arm Fast, Left Arm Medium, Right Arm Fast, Right Arm Medium, Other) with counts 50, 66, 25, 119 and 140; histogram and dot plot of values from 155 to 200; the mode is labelled]
What is NOT the mode
[Figures: the same charts as in "Examples of modes", annotated with their frequency counts]
• The mode is the value or category that occurs most frequently, not the number of times it occurs
Central Tendency - Median
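• The central value in an ordered list
• Consider the following list, for example:
8 9 3 6 7 8 1 5 2
• What is the mode?
8, because it occurs most frequently
• To find the median, we have to order the list:
1 2 3 5 6 7 8 8 9
• 50% of the values are BELOW the median and 50% of the values are ABOVE it
• With nine values, the median is the fifth value in the ordered list: Median = 6
• When the number of values is even, the median is the average of the two middle values, e.g. Median = (5 + 6) / 2 = 5.5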
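As a minimal sketch, the median procedure (order the list, then take the middle value or the average of the two middle values) can be written in Python as follows. The function name `median_of` and the even-length list are assumptions made for illustration; the nine-value list is the one from the slide.

```python
def median_of(values):
    """Return the median: the middle value of the ordered list,
    or the average of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


lecture_list = [8, 9, 3, 6, 7, 8, 1, 5, 2]   # list from the slide
odd_result = median_of(lecture_list)         # 6 (fifth of nine ordered values)

even_list = [1, 2, 3, 5, 6, 7, 8, 9]         # hypothetical even-length list
even_result = median_of(even_list)           # (5 + 6) / 2 = 5.5

print(odd_result, even_result)
```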
Central Tendency - Mean
• The mean is the sum of all observations divided by the number of observations
• Example (dot plot on an axis from 155 to 200):
# of obs. = 9
Σ of all obs. = 1620
Mean = 1620 / 9 = 180
• Consider the following list of numbers:
6 7 7 8 8 9 => mean = 45 / 6 = 7.5
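The same arithmetic can be checked with a few lines of Python. This is only a sketch; the function name `mean_of` is an assumption made for illustration.

```python
def mean_of(values):
    """Arithmetic mean: sum of all observations divided by their number."""
    if len(values) == 0:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


numbers = [6, 7, 7, 8, 8, 9]          # list from the slide
list_mean = mean_of(numbers)          # 45 / 6 = 7.5

# When only the totals are known (as in the dot plot example):
total, n_obs = 1620, 9
dot_plot_mean = total / n_obs         # 180.0

print(list_mean, dot_plot_mean)
```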
Which measure of central tendency should be used?
• Quantitative measurements [Interval & Ratio]: Mean, Median, Mode
• All three measures can be easily calculated for quantitative measurements. However, the mean and the median are better measures of central tendency than the mode.
Which measure of central tendency should be used?
• Consider the yearly income of individuals
Person      Yearly Income
Person 1    $46,000
Person 2    $41,000
Person 3    $39,000
Person 4    $38,000
Person 5    $41,000
Person 6    $45,000
Person 7    $46,000
Mean = $42,285.71
Median = $41,000
Which measure of central tendency should be used?
• Let's add another individual to the list
Person      Yearly Income
Person 1    $46,000
Person 2    $41,000
Person 3    $39,000
Person 4    $38,000
Person 5    $41,000
Person 6    $45,000
Person 7    $46,000
Person 8    $70,000,000 (outlier)
Mean = $42,285.71, New Mean = $8,787,000
Median = $41,000, New Median = $43,000
• There was a slight change in the median, but the mean has changed significantly. The mean is sensitive to outliers, while the median remains relatively unchanged.
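The income example can be reproduced with the Python standard library. This is a minimal sketch using the eight incomes from the table; the variable names are chosen here for illustration.

```python
from statistics import mean, median

# Yearly incomes of Person 1 to Person 7 (from the table).
incomes = [46_000, 41_000, 39_000, 38_000, 41_000, 45_000, 46_000]

mean_before = mean(incomes)
median_before = median(incomes)

# Add Person 8, the outlier.
incomes_with_outlier = incomes + [70_000_000]

mean_after = mean(incomes_with_outlier)
median_after = median(incomes_with_outlier)

print(f"Mean:   {mean_before:,.2f} -> {mean_after:,.2f}")
print(f"Median: {median_before:,.2f} -> {median_after:,.2f}")
```

The mean jumps from about $42 thousand to several million dollars, while the median moves by only $2,000.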
Which measure of central tendency should be used?
• Categorical data: Mode
• Quantitative data, no outliers: Mean
• Quantitative data, with outliers: Median
Summarizing Data
• Consider the following two data sets, presented here as dot plots
• Both data sets have the same mean, the same median and the same mode
• The first data set is more “spread out” than the second one
[Figure: two dot plots of Physical Height (cm), axis 155 to 200, each with Mean = 178, Median = 178 and Mode = 177 (same central tendency)]
• Clearly, none of the central tendency measures can fully describe the data
• We need another measure that can describe the spread of the data
Summarizing Data – Spread
• Measures of spread:
– Range
– Interquartile Range
Range
• Range is the difference between the highest and the lowest value in a data set
• Highest Value (Max) = 199, Lowest Value (Min) = 156, Range = 199 - 156 = 43
[Figure: two dot plots of Physical Height (cm), axis 155 to 200]
• The Range is:
– Easy to understand
– Simple to compute
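How simple the computation is can be seen in a short Python sketch. The height values below are hypothetical; only the minimum (156) and the maximum (199) are taken from the slide, and the function name `value_range` is an assumption.

```python
def value_range(values):
    """Range: difference between the highest and the lowest value."""
    return max(values) - min(values)


# Hypothetical heights in cm; only the min (156) and max (199) come from the slide.
heights_cm = [156, 170, 175, 177, 177, 178, 181, 186, 199]

height_range = value_range(heights_cm)
print("Range =", height_range)   # 199 - 156 = 43
```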
Range
• Consider another two data sets
• We can see two different data sets, one in which the data is more dispersed than in the other
• But neither the central tendency measures nor the range can tell them apart
[Figure: two dot plots of Physical Height (cm), axis 155 to 195, both with Mean = 178, Median = 178, Mode = 177 and Range = 43 (same central tendency and spread); the quartiles Q1, Q2 and Q3 are marked]
• The median divides the data such that 50% of the values are above it and 50% are below it
• Now we divide the data such that 25% of the values are below the dividing point. This is the first quartile (Q1)
• Now add another division such that 25% of the values are above it. This is the third quartile (Q3)
• The median is the second quartile (Q2)
Interquartile Range
• Interquartile range is the difference between the first and the third quartile
• The three quartiles divide the data into four parts or quarters
[Figure: data split at the median (Q2) into two halves of 50% each, with Q1 and Q3 marked]
• Interquartile Range (IQR) is simply the third quartile (Q3) minus the first quartile (Q1):
IQR = Q3 - Q1
...
Player 4     133.8
Player 5     58.2
Player 6     98.2
Player 7     199.6
Player 8     117.1
Player 9     183.7
Player 10    101.8
Player 11    86.7
• Outliers do not affect the calculation; hence, the IQR gives a more realistic description of dispersion in the data
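A minimal Python sketch of the quartile and IQR calculation is given below. It uses the eleven ordered values listed on the Trimmed Mean slide (58.2 to 199.6), which appear to be the same player data. Quartile conventions differ between tools; the default "exclusive" method of `statistics.quantiles` is assumed here because it yields 90.8, 110.7 and 133.8 for these values, the same numbers that appear in the lecture. The replacement value 500.0 is hypothetical.

```python
from statistics import quantiles

# The eleven values as listed (in order) on the Trimmed Mean slide.
values = [58.2, 86.7, 90.8, 98.2, 101.8, 110.7,
          115.3, 117.1, 133.8, 183.7, 199.6]

# n=4 splits the data into quarters and returns Q1, Q2 (median) and Q3.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1
value_range = max(values) - min(values)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", round(iqr, 1), "| Range:", round(value_range, 1))

# Replace the largest value with a hypothetical, far more extreme one.
extreme = sorted(values)[:-1] + [500.0]
q1_x, _, q3_x = quantiles(extreme, n=4)
iqr_extreme = q3_x - q1_x
range_extreme = max(extreme) - min(extreme)
print("IQR:", round(iqr_extreme, 1), "| Range:", round(range_extreme, 1))
```

The range grows with the extreme value, while the IQR stays the same, because Q1 and Q3 depend only on the middle of the ordered data.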
Interquartile Range – Detecting outliers
• Values that lie more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile are generally considered to be outliers in the data
IQR = Q3 - Q1 = 133.8 - 90.8 = 43
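The 1.5 x IQR rule can be sketched in Python as follows. The function name `iqr_outliers` and its parameter `k` are assumptions made for illustration; Q1 = 90.8 and Q3 = 133.8 are the quartiles from the slide, and the values are the eleven data points listed on the Trimmed Mean slide.

```python
def iqr_outliers(values, q1, q3, k=1.5):
    """Return the lower limit, the upper limit and the values outside them.

    A value is flagged when it is more than k * IQR below Q1
    or more than k * IQR above Q3.
    """
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    flagged = [v for v in values if v < lower_limit or v > upper_limit]
    return lower_limit, upper_limit, flagged


values = [58.2, 86.7, 90.8, 98.2, 101.8, 110.7,
          115.3, 117.1, 133.8, 183.7, 199.6]

lower_limit, upper_limit, outliers = iqr_outliers(values, q1=90.8, q3=133.8)
print("Limits:", round(lower_limit, 1), "to", round(upper_limit, 1))
print("Outliers:", outliers)
```

With IQR = 43, the step is 1.5 x 43 = 64.5, so the limits are roughly 26.3 and 198.3, and only 199.6 falls outside them.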
Box and Whiskers Plot
[Figure: box and whiskers plot with Q1, Q3 and the outliers labelled]
Making a Box and Whisker Plot
• Consider the data from our previous example
• Now join the Q3 and Q1 lines to make a box
• We adopt the same procedure to make the other whisker
• Next we mark the outliers, that is, the data points outside the whiskers
[Figure: box and whiskers plot built up step by step on an axis from 50 to 200. Marked values: median = 110.7, Q3 = 133.8, Q1 = 90.8, IQR = 43, IQR x 1.5 = 64.5, upper limit = 198.3, outlier = 199.6, lowest value = 58.2]
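The numbers needed to draw such a plot can be collected with a small Python sketch. It assumes one common convention, which may differ from the one used on the slides: each whisker ends at the most extreme data point that still lies within 1.5 x IQR of the box. The function name `box_plot_stats` is hypothetical, and the data are the eleven values listed on the Trimmed Mean slide.

```python
from statistics import quantiles


def box_plot_stats(values, k=1.5):
    """Collect the values needed to draw a box and whiskers plot.

    Assumption: whiskers end at the most extreme data points that lie
    within k * IQR of the box; points beyond them are outliers.
    """
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4)          # box edges and median line
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_limit or v > upper_limit],
    }


values = [58.2, 86.7, 90.8, 98.2, 101.8, 110.7,
          115.3, 117.1, 133.8, 183.7, 199.6]
stats = box_plot_stats(values)
for name, value in stats.items():
    print(name, "=", value)
```

Under this convention the lower whisker ends at 58.2, the upper whisker at 183.7, and 199.6 is plotted separately as an outlier.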
Exercise - 1
1. Summarize the two tables using the following descriptors and compare them:
• Median
• Mean
• Range
• IQR
2. Plot and compare the Box and Whiskers plots for the two tables
3. Find all the parameters in question 1 above after removing the outliers. Comment on how much they changed

Player       Table 1 Weight (Kg)    Table 2 Weight (Kg)
Player 1     103.8                  128.5
Player 2     113.2                  113.9
Player 3     94.9                   87.2
Player 4     105.6                  116
Player 5     111.6                  101.9
Player 6     112.2                  122.4
Player 7     119.6                  78.7
Player 8     85.7                   91.2
Player 9     109                    92.8
Player 10    90.3                   52.2
Player 11    106.2                  147.7
Player 12    114.6                  96.6
Player 13    114.4                  132.6
Player 14    110.7                  50.8
Player 15    97.1                   137.1
Player 16    65.3                   107.5
Player 17    98.1                   117.2
Player 18    117.5                  141.4
Player 19    99.9                   57.4
Player 20    102.6                  127
Player 21    122.5                  65.5
Player 22    110.3                  137.1
Player 23    102                    73.9
Trimmed Mean
• Like the median, the trimmed mean is a measure that is less affected by outliers
• Ordered data from the previous example:
58.2 86.7 90.8 98.2 101.8 110.7 115.3 117.1 133.8 183.7 199.6
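A trimmed mean is usually computed by dropping the same number of values from each end of the ordered data and averaging the rest. The sketch below assumes one value is trimmed from each end (`k = 1`); the amount of trimming used in the lecture is not recoverable from the slides, so both `k` and the function name `trimmed_mean` are assumptions.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and the k largest values."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must leave at least one value to average")
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


values = [58.2, 86.7, 90.8, 98.2, 101.8, 110.7,
          115.3, 117.1, 133.8, 183.7, 199.6]

plain_mean = trimmed_mean(values, 0)      # k = 0 is the ordinary mean
trimmed = trimmed_mean(values, 1)         # assumed: drop 58.2 and 199.6

print("Mean:", round(plain_mean, 2))
print("Trimmed mean (k = 1):", round(trimmed, 2))
```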
Mean and Median
Two samples of 10 seedlings were planted in a greenhouse. One sample was treated with nitrogen and the other was not. All other environmental conditions were held constant. The stem weights of the plants grown from these seedlings were recorded after 140 days and are presented in the following table:
Mean and Median
S.No.     No Nitrogen               Nitrogen
1         0.28                      0.26
2         0.32                      0.43
3         0.36                      0.46
4         0.37                      0.47
5         0.38                      0.49
6         0.42                      0.52
7         0.43                      0.62
8         0.43                      0.75
9         0.47                      0.79
10        0.53                      0.86
Sum       3.99                      5.65
Mean      3.99/10 = 0.399           5.65/10 = 0.565
Median    (0.38+0.42)/2 = 0.4       (0.49+0.52)/2 = 0.505
[Slides: Trimmed Mean; Trimmed Mean - Example]