Descriptive Statistics: Numerical Descriptive Statistics: Numerical Methods Methods Methods Methods

Descriptive Statistics: Numerical
Methods
Descriptive Statistics
3.1
3.2
3.3
3.5
Describing Central Tendency

Measures of Variation
Percentiles, Quartiles
Grouped Data
Describing Central Tendency

In addition to describing the shape of a
distribution, want to describe the data
sets central tendency
A measure of central tendency represents the
center or middle of the data
Parameters and Statistics

A population parameter is a number
calculated from all the population
measurements that describes some
aspect of the population
A sample statistic is a number calculated
using the sample measurements that
describes some aspect of the sample
Measures of Central Tendency

Mean,
Median, Md
Mode, Mo
The average or expected

value
The value of the middle
point of the ordered
measurements
The most frequent value
The Mean
Population X1, X2, , XN
Sample x1, x2, , xn
Population Mean
Sample Mean
n
Xi
i=1
x=
i=1
The Sample Mean

For a sample of size n, the sample mean is defined as
n
x=
x
i =1
x1 + x2 + ... + xn
=
n
and is a point estimate of the population mean

It is the value to expect, on average and in the long run
Example 3.1: The Car Mileage Case

Example 3.1: Sample mean for first five car
mileages from Table 3.1:
30.8, 31.7, 30.1, 31.6, 32.1
5
x1 + x2 + x3 + x4 + x5
5
5
30.8 + 31.7 + 30.1 + 31.6 + 32.1 156.3
x=
=
= 31.26
5
5
x=
i =1
The Median
The median Md is a value such that 50% of all
measurements, after having been arranged in
numerical order, lie above (or below) it
1. If the number of measurements is odd, the median
is the middlemost measurement in the ordering
2. If the number of measurements is even, the median
is the average of the two middlemost
measurements in the ordering
Example: Car Mileage Case

Example 3.1: First five observations
from Table 3.1:
30.8, 31.7, 30.1, 31.6, 32.1
In order: 30.1, 30.8, 31.6, 31.7, 32.1
There is an odd so median is one in
middle, or 31.6
The Mode
The mode Mo of a population or sample of
measurements is the measurement that occurs
most frequently
Modes are the values that are observed most
typically
Sometimes higher frequencies at two or more values
If there are two modes, the data is bimodal
If more than two modes, the data is multimodal
When data are in classes, the class with the highest

frequency is the modal class
The tallest box in the histogram
Histogram Describing the 50 Mileages
Relationships Among Mean, Median

and Mode
Knowing the measures of central tendency is not
enough
Both of the distributions below have identical
measures of central tendency
Range
Largest minus the smallest

measurement
Variance
The average of the squared

deviations of all the population
measurements from the population
mean
Standard
Deviation
The square root of the variance
The Range
Largest minus smallest
Measures the interval spanned by all the
data
For Figure 3.13, largest repair time is 5
and smallest is 3
Range is 5 3 = 2 days
Population Variance and Standard

Deviation
The population variance (2) is the average
of the squared deviations of the individual
population measurements from the
population mean ()
The population standard deviation () is the
positive square root of the population
variance
Variance
For a population of size N, the population

variance 2 is:
N
2 =
2
(
)
x
i
i =1
2
2
2
(
x1 ) + ( x2 ) + L + (x N )
=
For a sample of size n, the sample

variance s2 is:
n
s2 =
2
(
)
x
x
i
i =1
n 1
2
2
2
(
x1 x ) + ( x2 x ) + L + ( xn x )
=
n 1
Standard Deviation
Population standard deviation ():
Sample standard deviation (s):
s= s
Example: Chriss Class Sizes This

Semester
Data points are: 60, 41, 15, 30, 34
Mean is 36
Variance is:
2
2
2
2
2
(
60 36 ) + (41 36 ) + (15 36 ) + (30 36 ) + (34 36)
=
2
5
576 + 25 + 441 + 36 + 4 1082
=
=
= 216.4
5
5
Standard deviation is:

= 216.4 = 14.71
Example: Sample Variance and

Standard Deviation
Example 3.7: data for first five car
mileages from Table 3.1 are 30.8, 31.7, 30.1,
31.6, 32.1
The sample mean is 31.26
5
s2 =
(x x )
i =1
5 1
2
2
2
2
2
(
30.8 31.26) + (31.7 31.26) + (30.1 31.26) + (31.6 31.26) + (32.1 31.26)
=
4
2.572
=
= 0.643
4
s = s 2 = 0.643 = 0.8019
The Empirical Rule for Normal

Populations
If a population has mean and standard
deviation and is described by a normal curve,
then
68.26% of the population measurements lie within
one standard deviation of the mean: [-, +]
two standard deviations of the mean: [-2, +2]
three standard deviations of the mean: [-3, +3]
The Empirical Rule and Tolerance Intervals
Example 3.9: The Car Mileage Case

Continued
68.26% of all individual cars will have

mileages in the range
[xs] = [31.60.8] = [30.8, 32.4] mpg
[x2s] = [31.61.6] = [30.0, 33.2] mpg
[x3s] = [31.62.4] = [29.2, 34.0] mpg
Chebyshevs Theorem
Let and be a populations mean and

standard deviation, then for any value k>
1
At least 100(1 - 1/k2 )% of the population
measurements lie in the interval [-k,
+k]
Coefficient of Variation
Measures the size of the standard deviation
relative to the size of the mean
Coefficient of variation =standard
deviation/mean 100%
Used to:
Compare the relative variabilities of values about the
mean
Compare the relative variability of populations or
samples with different means and different standard
deviations
Measure risk
Percentiles, Quartiles, and BoxBox-and

and-Whiskers Displays
For a set of measurements arranged in
increasing order, the pth percentile is a value
such that p percent of the measurements fall at
or below the value and (100-p) percent of the
measurements fall at or above the value
The first quartile Q1 is the 25th percentile
The second quartile (or median) is the 50th
percentile
The third quartile Q3 is the 75th percentile
The interquartile range IQR is Q3 - Q1
Quartiles
Quartiles split the ranked data into 4 segments with an
equal number of values per segment
25%
25%
Q1
25%
Q2
25%
Q3
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are
larger)
Only 25% of the observations are greater than the third
quartile
Quartile Formulas
Find a quartile by determining the value in the
appropriate position in the ranked data, where
First quartile position:
Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)

Third quartile position:
Q3 = 3(n+1)/4
where n is the number of observed values
Quartiles
Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the
(9+1)/4 = 2.5 position of the ranked data
so use the value half way between the 2nd and 3rd values,
so
Q1 = 12.5
Q1 and Q3 are measures of noncentral location

Q2 = median, a measure of central tendency
Quartiles
(continued)
Example:
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so
Q1 = 12.5
Q2 is in the (9+1)/2 = 5th position of the ranked data,

so
Q2 = median = 16
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,

so
Q3 = 19.5
Five Number Summary

1.
2.
3.
4.
5.
The smallest measurement

The first quartile, Q1
The median, Md
The third quartile, Q3
The largest measurement
Displayed visually using a box-andwhiskers plot
Five Number Summary

Example:
X
minimum
Q1
25%
12
Median
(Q2)
25%
30
Q3
25%
45
Interquartile range
= 57 30 = 27
maximum
25%
57
70
Weighted Means
Sometimes, some measurements are more
important than others
Assign numerical weights to the data
Weights measure relative importance of the value
Calculate weighted mean as
w x
w
i i
i
where wi is the weight assigned to the ith

measurement xi
Example: Weighted Mean

June 2001 unemployment rates by census
region
Northeast, 26.9 million in civilian labor force,
4.1% unemployment rate
South, 50.6 million, 4.7% unemployment
Midwest, 34.7 million, 4.4% unemployment
West, 32.5 million, 5.0 unemployment
Want the mean unemployment rate for

the US
Continued
Want the mean unemployment rate for

the U.S.
Calculate it as a weighted mean
So that the bigger the region, the more heavily
it counts in the mean
The data values are the regional

unemployment rates
The weights are the sizes of the regional
labor forces
Continued
(
26.9 4.1) + (50.6 4.7 ) + (34.7 4.4 ) + (32.5 5.0 )
=
26.9 + 50.6 + 34.7 + 25.5 + 32.5
663.29
=
= 4.58%
144.7
Note that the unweigthed mean is 4.55%,

which underestimates the true rate by 0.03%
That is, 0.0003 144.7 million = 43,410
workers
Descriptive Statistics for Grouped Data

Data already categorized into a frequency
distribution or a histogram is called
grouped data
Can calculate the mean and variance even
when the raw data is not available
Calculations are slightly different for data
from a sample and data from a population
Mean for Grouped Data

Example
Find the arithmetic mean for the following continuous
frequency distribution:
Class
Frequency
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2
Solution for the Example

1
2
3
4
5
6
7
8
9
A
Class
0-1
1-2
2-3
3-4
4-5
5-6
Totals
Mean
Applying the formula
B
X
0.5
1.5
2.5
3.5
4.5
5.5
C
f
1
4
8
7
3
2
25
fX
X=
n
D
fX
0.5
6.0
20.0
24.5
13.5
11.0
75.5
3.02
= 75.5/25=3.02
Median for Grouped Data

Formula for Median is given by
(n/2) m
c
Median = L +
f
Where
L =Lower limit of the median class
n = Total number of observations = f
m = Cumulative frequency preceding the median class
f = Frequency of the median class
c = Class interval of the median class
Median for Grouped Data

Example
Find the median for the following continuous
frequency distribution:
Class
Frequency
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2

Class Frequency
Cumulative
Frequency
1
5
13
20
23
25
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2
Total
25
Substituting in the formula the relevant values,
Median =
= 2.9375
L+
(n/2) m
c
f
,we have Median =
(25/ 2) 5
2+
1
8
Mode for Grouped Data

d1
c
Mode = L +
d1 + d 2
Where L =Lower limit of the modal class
d1 = f1 f0
d2 = f1 f2
f1
= Frequency of the modal class
f0
= Frequency preceding the modal class
f2
= Frequency succeeding the modal class

C = Class Interval of the modal class
Mode for Grouped Data

Example
Example: Find the mode for the following
continuous frequency distribution:
Class
Frequency
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2

Class Frequency
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2
Total 25
d1
c
Mode = L +
d1 + d 2
L=2
d1 = f1 f0 = 8-4 = 4
d2 = f1 f 2 = 8-7 = 1
4
2
+
1
C = 1 Hence Mode =
5
= 2.8
Standard Deviation for

Grouped Data
The standard deviation for sample data, based on frequency
distribution is given by
f(X X )
S=
which is used to estimate the Population
n 1
Standard Deviation .
Here
X =
fX
n
n is the Sample Size =
, X =Mid Point of each class
Standard Deviation for

Grouped DataData-Example
Frequency Distribution of Return on Investment of Mutual Funds
Return on
Investment
5-10
10-15
15-20
20-25
25-30
Total
Number of Mutual
Funds
10
12
16
14
8
60

From the spreadsheet of Microsoft Excel in the previous
slide, it is easy to see
Mean = X =
fX
=1040/60=17.333(cell F10),
Standard Deviation = S =
(Cell H12)
f(X X)
n 1
2448.33
59
= 6.44
Practice Problem
Q. 3.47 pp. 154, calculate sample mean, variance and s.d.
Age (Years)
Frequency
28-32
33-37
38-42
43-47
13
48-52
14
53-57
12
58-62
63-67
68-72
73-77
Solution
Midpoint
Freq
M*f
(M mean)**2
F*diff^2
30
30
462.25
462.25
35
105
272.25
816.75
40
45
50
55
60
3
13
14
12
9
120
585
700
660
540
132.25
42.25
2.25
12.25
72.25
396.75
549.25
31.5
147
650.25
65
65
182.25
182.25
70
210
342.25
1026.75
75
1
75
552.25
variance = 81.61017
sample mean = 51.5
552.25
s = 9.033835

Descriptive Statistics: Numerical Descriptive Statistics: Numerical Methods Methods Methods Methods

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Descriptive Statistics: Numerical Descriptive Statistics: Numerical Methods Methods Methods Methods

Uploaded by

Copyright:

Available Formats

Descriptive Statistics: Numerical

Describing Central Tendency

Describing Central Tendency

Parameters and Statistics

Measures of Central Tendency

The average or expected

Sample x1, x2, , xn

The Sample Mean

and is a point estimate of the population mean

Example 3.1: The Car Mileage Case

Example: Car Mileage Case

When data are in classes, the class with the highest

Histogram Describing the 50 Mileages

Relationships Among Mean, Median

Largest minus the smallest

The average of the squared

The square root of the variance

Population Variance and Standard

For a population of size N, the population

For a sample of size n, the sample

Sample standard deviation (s):

Example: Chriss Class Sizes This

Standard deviation is:

Example: Sample Variance and

The Empirical Rule for Normal

The Empirical Rule and Tolerance Intervals

Example 3.9: The Car Mileage Case

68.26% of all individual cars will have

Let and be a populations mean and

Percentiles, Quartiles, and BoxBox-and

Second quartile position: Q2 = (n+1)/2 (the median position)

where n is the number of observed values

(9+1)/4 = 2.5 position of the ranked data

Q1 and Q3 are measures of noncentral location

Q2 is in the (9+1)/2 = 5th position of the ranked data,

Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,

Five Number Summary

The smallest measurement

Five Number Summary

Calculate weighted mean as

where wi is the weight assigned to the ith

Example: Weighted Mean

Want the mean unemployment rate for

Example: Weighted Mean

Want the mean unemployment rate for

The data values are the regional

Example: Weighted Mean

Note that the unweigthed mean is 4.55%,

Descriptive Statistics for Grouped Data

Mean for Grouped Data

Solution for the Example

Applying the formula

Median for Grouped Data

Median for Grouped Data

Solution for the Example

,we have Median =

Mode for Grouped Data

= Frequency of the modal class

= Frequency preceding the modal class

= Frequency succeeding the modal class

Mode for Grouped Data

Solution for the Example

Standard Deviation for