Professional Documents
Culture Documents
Descriptive Statistics: Numerical Descriptive Statistics: Numerical Methods Methods Methods Methods
Descriptive Statistics: Numerical Descriptive Statistics: Numerical Methods Methods Methods Methods
Methods
Descriptive Statistics
3.1
3.2
3.3
3.5
Mode, Mo
The Mean
Population X1, X2, , XN
Population Mean
Sample Mean
n
Xi
i=1
x=
i=1
x=
x
i =1
x1 + x2 + ... + xn
=
n
x1 + x2 + x3 + x4 + x5
5
5
30.8 + 31.7 + 30.1 + 31.6 + 32.1 156.3
x=
=
= 31.26
5
5
x=
i =1
The Median
The median Md is a value such that 50% of all
measurements, after having been arranged in
numerical order, lie above (or below) it
1. If the number of measurements is odd, the median
is the middlemost measurement in the ordering
2. If the number of measurements is even, the median
is the average of the two middlemost
measurements in the ordering
The Mode
The mode Mo of a population or sample of
measurements is the measurement that occurs
most frequently
Modes are the values that are observed most
typically
Sometimes higher frequencies at two or more values
If there are two modes, the data is bimodal
If more than two modes, the data is multimodal
Measures of Variation
Knowing the measures of central tendency is not
enough
Both of the distributions below have identical
measures of central tendency
Measures of Variation
Range
Variance
Standard
Deviation
The Range
Largest minus smallest
Measures the interval spanned by all the
data
For Figure 3.13, largest repair time is 5
and smallest is 3
Range is 5 3 = 2 days
Variance
2 =
2
(
)
x
i
i =1
2
2
2
(
x1 ) + ( x2 ) + L + (x N )
=
s2 =
2
(
)
x
x
i
i =1
n 1
2
2
2
(
x1 x ) + ( x2 x ) + L + ( xn x )
=
n 1
Standard Deviation
Population standard deviation ():
s= s
5
576 + 25 + 441 + 36 + 4 1082
=
=
= 216.4
5
5
s2 =
(x x )
i =1
5 1
2
2
2
2
2
(
30.8 31.26) + (31.7 31.26) + (30.1 31.26) + (31.6 31.26) + (32.1 31.26)
=
4
2.572
=
= 0.643
4
s = s 2 = 0.643 = 0.8019
Chebyshevs Theorem
Coefficient of Variation
Measures the size of the standard deviation
relative to the size of the mean
Coefficient of variation =standard
deviation/mean 100%
Used to:
Compare the relative variabilities of values about the
mean
Compare the relative variability of populations or
samples with different means and different standard
deviations
Measure risk
Quartiles
Quartiles split the ranked data into 4 segments with an
equal number of values per segment
25%
25%
Q1
25%
Q2
25%
Q3
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are
larger)
Only 25% of the observations are greater than the third
quartile
Quartile Formulas
Find a quartile by determining the value in the
appropriate position in the ranked data, where
First quartile position:
Q1 = (n+1)/4
Q3 = 3(n+1)/4
Quartiles
Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the
so use the value half way between the 2nd and 3rd values,
so
Q1 = 12.5
Quartiles
(continued)
Example:
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so
Q1 = 12.5
Q2 = median = 16
Q3 = 19.5
minimum
Q1
25%
12
Median
(Q2)
25%
30
Q3
25%
45
Interquartile range
= 57 30 = 27
maximum
25%
57
70
Weighted Means
Sometimes, some measurements are more
important than others
Assign numerical weights to the data
Weights measure relative importance of the value
w x
w
i i
i
Continued
Continued
(
26.9 4.1) + (50.6 4.7 ) + (34.7 4.4 ) + (32.5 5.0 )
=
26.9 + 50.6 + 34.7 + 25.5 + 32.5
663.29
=
= 4.58%
144.7
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2
A
Class
0-1
1-2
2-3
3-4
4-5
5-6
Totals
Mean
B
X
0.5
1.5
2.5
3.5
4.5
5.5
C
f
1
4
8
7
3
2
25
fX
X=
n
D
fX
0.5
6.0
20.0
24.5
13.5
11.0
75.5
3.02
= 75.5/25=3.02
(n/2) m
c
Median = L +
f
Where
L =Lower limit of the median class
n = Total number of observations = f
m = Cumulative frequency preceding the median class
f = Frequency of the median class
c = Class interval of the median class
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2
Cumulative
Frequency
1
5
13
20
23
25
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2
Total
25
Substituting in the formula the relevant values,
Median =
= 2.9375
L+
(n/2) m
c
f
(25/ 2) 5
2+
1
8
d1 = f1 f0
d2 = f1 f2
f1
f0
f2
0-1
1
1-2
4
2-3
8
3-4
7
4-5
3
5-6
2
d1
c
Mode = L +
d1 + d 2
L=2
d1 = f1 f0 = 8-4 = 4
d2 = f1 f 2 = 8-7 = 1
4
2
+
1
C = 1 Hence Mode =
5
= 2.8
f(X X )
S=
which is used to estimate the Population
n 1
Standard Deviation .
Here
X =
fX
n
Number of Mutual
Funds
10
12
16
14
8
60
fX
=1040/60=17.333(cell F10),
Standard Deviation = S =
(Cell H12)
f(X X)
n 1
2448.33
59
= 6.44
Practice Problem
Q. 3.47 pp. 154, calculate sample mean, variance and s.d.
Age (Years)
Frequency
28-32
33-37
38-42
43-47
13
48-52
14
53-57
12
58-62
63-67
68-72
73-77
Solution
Midpoint
Freq
M*f
(M mean)**2
F*diff^2
30
30
462.25
462.25
35
105
272.25
816.75
40
45
50
55
60
3
13
14
12
9
120
585
700
660
540
132.25
42.25
2.25
12.25
72.25
396.75
549.25
31.5
147
650.25
65
65
182.25
182.25
70
210
342.25
1026.75
75
1
75
552.25
variance = 81.61017
sample mean = 51.5
552.25
s = 9.033835