Summarizing and Describing Numerical Data
Lectures 3+4+5 Topics
• Measures of Central Tendency: Mean, Median, Mode
• Measures of Variation: The Range, Variance and Standard Deviation
• Shape: Symmetric, Skewed, Skewness, Kurtosis
Summary Measures
• Central Tendency: Mean, Median, Mode
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Central Tendency
• Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Median
• Mode
The Mean (Arithmetic Mean, Average)
• It is the arithmetic average of the data values (sample mean):
  $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
• The most common measure of central tendency
• Affected by extreme values (outliers)
[Figure: two dot plots, one on a 0-10 axis with Mean = 5 and one on a 0-14 axis with Mean = 6]

The Arithmetic Mean
• This is the most popular and useful measure of central location.
  Mean = (Sum of the observations) / (Number of observations)
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size
The Arithmetic Mean
• Example 1
The reported times spent on the Internet by 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time spent on the Internet.
$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} = \frac{0 + 7 + \cdots + 22}{10} = 11.0$ hours
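The calculation in Example 1 can be reproduced with a short Python sketch. The variable names (`hours`, `sample_mean`) are illustrative assumptions, not part of the lecture.

```python
from statistics import mean

# Reported hours spent on the Internet by the 10 adults of Example 1
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

total = sum(hours)          # sum of the observations
n = len(hours)              # number of observations
sample_mean = total / n     # mean = sum / count

print(total, n, sample_mean)   # 110 10 11.0
print(mean(hours))             # the library function agrees: 11
```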
• Example 2
Suppose the telephone bills represent the population of measurements (N = 200). The population mean is
$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \cdots + x_{200}}{200} = \frac{42.19 + 38.45 + \cdots + 45.77}{200} = 43.59$
Weighted mean for data grouped by categories or variants
$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$
When many of the measurements have the same value, the measurements can be summarized in a frequency table. Suppose the number of children in a sample of 16 families was recorded as follows:

NUMBER OF CHILDREN    0   1   2   3
NUMBER OF FAMILIES    3   4   7   2    (16 families)

$\bar{x} = \frac{\sum_{i=1}^{4} x_i f_i}{\sum_{i=1}^{4} f_i} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4}{16} = \frac{3(0) + 4(1) + 7(2) + 2(3)}{16} = 1.5$
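As a rough illustration, the weighted mean of the frequency table (16 families, 0 to 3 children) can be computed as follows. The list names `children` and `families` are assumptions made for this sketch.

```python
# Frequency table: value x_i (number of children) and frequency f_i (number of families)
children = [0, 1, 2, 3]
families = [3, 4, 7, 2]

weighted_sum = sum(x * f for x, f in zip(children, families))  # sum of x_i * f_i
n = sum(families)                                              # sum of f_i
weighted_mean = weighted_sum / n

print(weighted_sum, n, weighted_mean)   # 24 16 1.5
```

Expanding the table back into 16 individual observations and averaging them would give the same result; the frequency form only saves work.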
The Median
• Important measure of central tendency
• In an ordered array, the median is the "middle" number.
  - If n is odd, the median is the middle number.
  - If n is even, the median is the average of the 2 middle numbers.
• Not affected by extreme values
[Figure: two dot plots, one on a 0-10 axis and one on a 0-14 axis; Median = 5 in both]
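The different sensitivity of the mean and the median to an extreme value can be seen in a small sketch. Both data sets below are hypothetical (they are not given in the lecture); the second one simply replaces the largest value with an outlier.

```python
from statistics import mean, median

# Hypothetical data sets, chosen only for illustration
base = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # largest value replaced by an extreme one

print(mean(base), median(base))                   # 5 5
print(mean(with_outlier), median(with_outlier))   # 6 5
```

The mean moves from 5 to 6 while the median stays at 5.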
The Median
• The median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude (ranked increasingly).

Example 4.3
Find the median of the time spent on the Internet for the adults of Example 1.
Even number of observations:
0, 0, 5, 7, 8, 9, 12, 14, 22, 33    median = (8 + 9) / 2 = 8.5

Comment
Suppose only 9 adults were sampled (exclude, say, the longest time (33)).
Odd number of observations:
0, 0, 5, 7, 8, 9, 12, 14, 22    median = 8
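A minimal sketch of the median for the Example 1 data, covering both the even case (10 adults) and the odd case (9 adults, longest time excluded). The variable names are illustrative.

```python
from statistics import median

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

ordered = sorted(hours)            # [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
median_even = median(ordered)      # average of the two middle values, 8 and 9

nine_adults = ordered[:-1]         # drop the longest time (33)
median_odd = median(nine_adults)   # the single middle value

print(median_even, median_odd)     # 8.5 8
```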
The Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
• Used for either numerical or categorical data
[Figure: dot plot on a 0-14 axis with Mode = 9; dot plot on a 0-6 axis with No Mode]

• The mode of a set of observations is the variable value that occurs most frequently.
• A set of data may have one mode (or modal class), or two or more modes.
• The modal class: for large data sets the modal class is much more relevant than a single-value mode.
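The cases "one mode", "several modes" and "no mode" can be sketched with a small helper. The function `modes` is a hypothetical name introduced here; it treats a data set in which every value occurs exactly once as having no mode. Only the first data set comes from the lecture (Example 1); the others are made up for illustration.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []                  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]                # Example 1 data
print(modes(hours))                                        # [0]
print(modes([0, 1, 2, 3, 4, 5, 6]))                        # []
print(modes(["red", "blue", "red", "green", "blue"]))      # ['blue', 'red']
```

The last call shows that the mode also works for categorical data, where a mean or median would not be defined.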
Approximating Descriptive Measures for Data Grouped by Classes
• Approximating descriptive measures for grouped data may be needed in two cases:
  - when approximate values are sufficient,
  - when only secondary grouped data are available.

$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$
where $x_i$ is the class midpoint and $f_i$ is the class frequency.
- Example 3
• Approximate the mean of the telephone call durations problem as represented by the frequency distribution.
[Figure: histogram of call durations with class limits 5, 8, 11, 14, 17, 20, More]

Class i   Class limits   Frequency f_i   Midpoint x_i   x_i f_i
1         2-5            3               3.5            10.5
2         5-8            6               6.5            39.0
3         8-11           8               9.5            76.0
…         …              …               …              …
6         17-20          2               18.5           37.0
          n = sample size = 30 = f_1 + … + f_k          312.0

Approximated mean: $\bar{x} = 312.0 / 30 = 10.4$
Real value: $\bar{x} = 10.26$
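Median and Mode
• Median
$Me = x_0 + K \cdot \frac{\frac{1}{2}\left(\sum n_i + 1\right) - \sum_{i=1}^{Me-1} n_i}{n_{Me}}$
where $x_0$ is the lower limit of the median class, $K$ is the class width, $n_i$ are the class frequencies, $\sum_{i=1}^{Me-1} n_i$ is the cumulative frequency of the classes before the median class, and $n_{Me}$ is the frequency of the median class.
• Mode
$Mo = x_0 + K \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2}$
where $x_0$ is the lower limit of the modal class, $\Delta_1$ is the difference between the frequency of the modal class and that of the preceding class, and $\Delta_2$ is the difference between the frequency of the modal class and that of the following class.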
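The grouped-data formulas for the median and the mode can be sketched in Python. This is an illustration under stated assumptions: x_0 is read as the lower limit of the median (or modal) class, K as the common class width, and the Δ terms as the frequency differences between the modal class and its neighbours. The class limits and frequencies below are hypothetical and are not the telephone-call data of Example 3.

```python
def grouped_median(lower_limits, width, freqs):
    """Me = x0 + K * ((sum(n_i) + 1) / 2 - cumulative frequency before) / n_Me."""
    position = (sum(freqs) + 1) / 2
    cumulative = 0
    for x0, f in zip(lower_limits, freqs):
        if cumulative + f >= position:          # first class reaching the middle position
            return x0 + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("empty frequency table")

def grouped_mode(lower_limits, width, freqs):
    """Mo = x0 + K * d1 / (d1 + d2), with d1 and d2 taken around the modal class."""
    m = freqs.index(max(freqs))                  # modal class = highest frequency
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    return lower_limits[m] + width * d1 / (d1 + d2)

# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50
lower = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 9, 6]

me = grouped_median(lower, 10, freqs)   # 20 + 10 * (20.5 - 13) / 12
mo = grouped_mode(lower, 10, freqs)     # 20 + 10 * 4 / (4 + 3)
print(me, round(mo, 2))                 # 26.25 25.71
```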
Relationship among Mean, Median, and Mode
• If a distribution is symmetrical, the mean, median and mode coincide.
• If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution ("skewed to the right"): Mode < Median < Mean]
[Figure: a negatively skewed distribution ("skewed to the left"): Mean < Median < Mode]
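The relationship between mean and median suggests a rough check of skew direction. The helper `skew_direction` below is a hypothetical name and only a heuristic sketch: a mean above the median hints at right skew, a mean below it at left skew. Equal values do not prove symmetry.

```python
from statistics import mean, median

def skew_direction(data):
    """Heuristic only: compare the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right-skewed (positive)"
    if m < med:
        return "left-skewed (negative)"
    return "roughly symmetric"

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]    # Example 1: mean 11, median 8.5
mirrored = [33 - h for h in hours]            # mirror image of the same data

print(skew_direction(hours))            # right-skewed (positive)
print(skew_direction(mirrored))         # left-skewed (negative)
print(skew_direction([1, 2, 3, 4, 5]))  # roughly symmetric
```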
Measures of Variation
• Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
The Range
• Measure of variation
• Difference between the largest and smallest observations:
  Absolute Range = $x_{Largest} - x_{Smallest}$
• Relative Range = $(x_{Largest} - x_{Smallest}) / \text{mean}$
• Ignores how the data are distributed:
[Figure: two differently shaped distributions on a 7-12 axis, both with Range = 12 - 7 = 5]
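As a small sketch, the absolute and relative range can be computed for the Internet-time data of Example 1. The variable names are illustrative assumptions.

```python
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

absolute_range = max(hours) - min(hours)       # largest minus smallest
mean_hours = sum(hours) / len(hours)           # 11.0
relative_range = absolute_range / mean_hours   # range expressed relative to the mean

print(absolute_range, relative_range)          # 33 3.0
```

Only the two extreme observations enter the calculation, which is why the range says nothing about how the remaining values are distributed.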
Deviation
• Individual deviation from the mean = $x_i - \text{mean}$
• Overall deviation = 0, because $\sum (X_i - \bar{X}) = 0$
• Summing squared deviations, $\sum (X_i - \bar{X})^2$, or absolute values of the deviations, $\sum |x_i - \bar{x}|$
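A brief sketch showing why raw deviations cannot measure spread: they always sum to zero, so squared or absolute deviations are summed instead. It uses the Example 1 data; the variable names are illustrative.

```python
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
x_bar = sum(hours) / len(hours)                  # 11.0

deviations = [x - x_bar for x in hours]          # individual deviations x_i - mean

sum_dev = sum(deviations)                        # positive and negative parts cancel
sum_sq_dev = sum(d ** 2 for d in deviations)     # sum of squared deviations
sum_abs_dev = sum(abs(d) for d in deviations)    # sum of absolute deviations

print(sum_dev, sum_sq_dev, sum_abs_dev)          # 0.0 922.0 74.0
```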
Variance
• Important measure of variation
• Shows variation about the mean
• Computed as the arithmetic mean of the squared deviations (equivalently, the square of the quadratic mean of the individual deviations)
• For the population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
• For the sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
For the population, use N in the denominator. For the sample, use n - 1 in the denominator.
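The two denominators can be compared in a short sketch that applies both variance formulas to the Example 1 data and cross-checks them against Python's `statistics` module. Treating the same ten values once as a population and once as a sample is done only for illustration.

```python
from statistics import pvariance, variance

hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
n = len(hours)
x_bar = sum(hours) / n                              # 11.0

sum_sq = sum((x - x_bar) ** 2 for x in hours)       # 922.0

population_variance = sum_sq / n                    # divide by N
sample_variance = sum_sq / (n - 1)                  # divide by n - 1

print(population_variance)                          # 92.2
print(round(sample_variance, 2))                    # 102.44
print(pvariance(hours), round(variance(hours), 2))  # library cross-check
```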
Standard Deviation
• Most important measure of variation
• Shows variation about the mean
• For the population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
• For the sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
For the population, use N in the denominator. For the sample, use n - 1 in the denominator.
Sample Standard Deviation
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
Data $X_i$: 10 12 14 15 17 18 18 24
n = 8, Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$

Comparing Standard Deviations
Data $X_i$: 10 12 14 15 17 18 18 24
N = 8, Mean = 16
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} \approx 4.0311$
The value of the standard deviation is larger when the data are treated as a sample.
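The comparison can be verified with a minimal sketch using Python's `statistics` module on the eight data values. Note that the value 18 occurs twice and both occurrences must enter the sum of squared deviations, which then equals 130.

```python
from statistics import mean, pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = mean(data)                               # 16
sum_sq = sum((x - x_bar) ** 2 for x in data)     # 36+16+4+1+1+4+4+64 = 130

s = stdev(data)        # sample standard deviation, divides by n - 1 = 7
sigma = pstdev(data)   # population standard deviation, divides by N = 8

print(x_bar, sum_sq)                   # 16 130
print(round(s, 4), round(sigma, 4))    # 4.3095 4.0311
```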
Comparing Standard Deviations
[Figure: three dot plots of AGE on an 11-21 axis, all with the same mean but different spread]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
• Measure of relative variation
• Always a % or coefficient
• Shows variation relative to the mean
• Used to compare 2 or more groups
• Formula (for sample): $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Comparing Coefficients of Variation
• Stock A: Average price last year = $50, standard deviation (sd) = $5
• Stock B: Average price last year = $100, standard deviation (sd) = $5
$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Coefficient of Variation:
Stock A: CV = 10%
Stock B: CV = 5%
Both average prices are representative, but relative to its mean Stock A varies twice as much as Stock B.
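The stock comparison can be sketched as follows. The function name `coefficient_of_variation` is an assumption made for this example.

```python
def coefficient_of_variation(sd, mean):
    """CV = (S / mean) * 100, expressed in percent."""
    return 100.0 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B

print(cv_a, cv_b)   # 10.0 5.0
```

Both stocks have the same standard deviation in dollars, yet the CV shows that the variation is twice as large for Stock A when measured against its average price.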
Shape
• Describes how data are distributed between the smallest and largest values
• Measures of shape: symmetric or skewed
[Figure: three distribution curves]
Left-Skewed (Negatively Skewed): Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed (Positively Skewed): Mode < Median < Mean
Box plot: graphical presentation of CTM (central tendency measures)
Central Tendency Measures: Summary
• Discussed measures of central tendency: Mean, Median, Mode
• Addressed measures of variation: the Range, Variance, Standard Deviation, Coefficient of Variation
• Determined shape of distributions: symmetric or skewed; coefficient of skewness
[Figure: Symmetric: Mean = Median = Mode; Left-skewed: Mean < Median < Mode; Right-skewed: Mode < Median < Mean]
