First :
What is Statistics?
Data Information
Sample
— A sample is a set of data drawn from the
population.
— Potentially very large, but less than the population.
E.g. a sample of 1,000 voters who took part in a poll.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 1.4
Key Statistical Concepts…
Parameter
— A descriptive measure of a population.
Statistic
— A descriptive measure of a sample.
Populations have Parameters,
Samples have Statistics.
[Diagram: the Sample is a Subset of the Population; Inference runs from the sample Statistic to the population Parameter.]
However, such conclusions and estimates are not always going to be correct.
We need to know when a sample is less likely to be a good sample.
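Sample statistics (observation)

  \hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}

  \hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}

*Hat notation ^ is often used to indicate “estimate”.

Population parameters (truth, not observable)

  \mu = \frac{\sum_{i=1}^{N} x_i}{N}

  \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}

Make guesses about the whole population.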
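As a rough illustration of the difference between the n - 1 divisor used for a sample and the N divisor used for a population, the sketch below relies on Python's standard statistics module. The data values are hypothetical and chosen only to keep the arithmetic small; they do not come from the slides.

```python
import statistics

# Hypothetical population of N = 5 values (illustrative only).
population = [2, 4, 4, 6, 9]
# Hypothetical sample of n = 3 values taken from that population.
sample = [2, 4, 9]

mu = statistics.mean(population)            # population mean: sum / N
sigma2 = statistics.pvariance(population)   # population variance: divides by N
x_bar = statistics.mean(sample)             # sample mean: estimate of mu
s2 = statistics.variance(sample)            # sample variance: divides by n - 1

print(mu, sigma2, x_bar, s2)
```

In practice the population values are not observable, so only x_bar and s2 can be computed and they serve as estimates of mu and sigma2.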
[Figure: pie chart with slices including Finance (20.6%) and Marketing (25.3%).]
[Figure: bar chart of Frequency by Area, categories 1, 2, 3, 4, 5 and More; bar labels 73, 64, 52, 36 and 28.]
Cumulative relative frequencies:
first class…
next class: .355 + .185 = .540
and so on for the remaining classes.
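The running sum behind a cumulative relative frequency column can be sketched as follows. Only the two relative frequencies quoted above (.355 and .185) are used; the remaining classes are not given here, so the list is deliberately left short.

```python
from itertools import accumulate

# Relative frequencies of the first two classes, as quoted on the slide.
relative_freq = [0.355, 0.185]

# Each entry is the sum of all relative frequencies up to and including that class.
cumulative = list(accumulate(relative_freq))

for i, value in enumerate(cumulative, start=1):
    print(f"class {i}: cumulative relative frequency = {value:.3f}")
```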
[Figure: histogram of Frequency by Bills, with class limits 15, 30, 45, 60, 75, 90, 105, 120.]

Class upper limit (Bills)   Frequency
45                          13
60                          9
75                          10
90                          18
105                         28
120                         14
Shapes of histograms
similar
Professionals tend to read the Post more than twice as often as the Star or Sun…
                      Interval Data            Nominal Data
Single Set of Data    Histogram, Ogive, or     Frequency and Relative Frequency
                      Stem-and-Leaf Display    Tables, Bar and Pie Charts
Relationship Between  Scatter Diagram          Contingency Table, Bar Charts
Two Variables
Numerical Description
1 Measures of Central Location
Mean, Median, Mode
The measure of central location reflects an “average” location for the data
points.
2 Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
• Example 4.1
The reported times on the Internet of 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time on the Internet.

  \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{0 + 7 + \dots + 22}{10} = 11.0
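The arithmetic of Example 4.1 can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
# Reported Internet hours of the 10 adults in Example 4.1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

# Sample mean: sum of the observations divided by their count.
mean_hours = sum(hours) / len(hours)

print(mean_hours)
```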
MEAN
• The mean is a typical value used to represent the central location of a probability distribution.
• The mean of a probability distribution is also referred to as its expected value.
Solution
All observations except “0” occur once. There are two “0”s. Thus, the mode is zero.
Is this a good measure of central location?
The value “0” does not reside at the center of this set
(compare with the mean = 11.0 and the median = 8.5).
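For comparison, the three measures of central location for the Internet-hours data can be computed with the standard statistics module. This is a minimal sketch, assuming the same 10 observations as in Example 4.1.

```python
import statistics

# Internet hours of the 10 adults from Example 4.1.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

mean_hours = statistics.mean(hours)      # arithmetic average
median_hours = statistics.median(hours)  # middle of the sorted data (average of 8 and 9)
mode_hours = statistics.mode(hours)      # most frequent value

print(mean_hours, median_hours, mode_hours)
```

The mode sits at the edge of the data, while the mean and the median are closer to its center.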
Validity of mean, median and mode
Range
Range = largest observation - smallest observation
The Variance
  s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
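Example 4.7
The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance.
Solution:

  \bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14 \text{ jobs}

  s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{6 - 1} \left[ (17 - 14)^2 + (15 - 14)^2 + \dots + (13 - 14)^2 \right] = 33.2 \text{ jobs}^2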
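The definition-based calculation of Example 4.7 can be reproduced step by step. The following is a sketch with illustrative variable names, not part of the original example.

```python
# Number of jobs the six students applied for (Example 4.7).
jobs = [17, 15, 23, 7, 9, 13]
n = len(jobs)

# Sample mean.
x_bar = sum(jobs) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - x_bar) ** 2 for x in jobs)

# Sample variance: divide by n - 1, not n.
s2 = squared_deviations / (n - 1)

print(x_bar, squared_deviations, s2)
```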
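The Variance – Shortcut method

  s^2 = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right]

      = \frac{1}{6 - 1} \left[ \left( 17^2 + 15^2 + \dots + 13^2 \right) - \frac{(17 + 15 + \dots + 13)^2}{6} \right]

      = 33.2 \text{ jobs}^2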
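The shortcut method needs only the sum of the observations and the sum of their squares. A small sketch on the jobs data from Example 4.7 (variable names are illustrative):

```python
# Number of jobs the six students applied for (Example 4.7).
jobs = [17, 15, 23, 7, 9, 13]
n = len(jobs)

total = sum(jobs)                          # sum of x_i
sum_of_squares = sum(x * x for x in jobs)  # sum of x_i squared

# Shortcut formula: [sum(x^2) - (sum(x))^2 / n] / (n - 1)
s2_shortcut = (sum_of_squares - total ** 2 / n) / (n - 1)

print(total, sum_of_squares, s2_shortcut)
```

Both routes give the same variance; the shortcut avoids computing each deviation from the mean.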
Sample standard deviation: s = \sqrt{s^2}
Population standard deviation: \sigma = \sqrt{\sigma^2}
E.g. \sqrt{1.290} = 1.136
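The standard deviation is the square root of the variance. The sketch below applies this to the variance of 33.2 jobs² found for the jobs sample, and to the value 1.290 quoted above; the variable names are illustrative.

```python
import math

# Sample variance of the jobs data (33.2 jobs squared).
s2 = 33.2
s = math.sqrt(s2)  # sample standard deviation, in jobs

# The value quoted on the slide: a variance of 1.290.
quoted_sd = math.sqrt(1.290)

print(round(s, 3), round(quoted_sd, 3))
```

Unlike the variance, the standard deviation is expressed in the same units as the data.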
The Coefficient of Variation
The coefficient of variation of a set of measurements is the standard deviation divided by the mean value.
Sample coefficient of variation: cv = \frac{s}{\bar{x}}
Population coefficient of variation: CV = \frac{\sigma}{\mu}
This coefficient provides a proportionate measure of
variation.
A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
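The comparison of a standard deviation of 10 against means of 100 and 500 can be made concrete with a small helper. The function name coefficient_of_variation is an illustrative choice, not taken from the slides.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the standard deviation divided by the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean

# Same standard deviation, two different means.
cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)

print(cv_mean_100, cv_mean_500)
```

The same spread amounts to 10% of the mean in the first case and 2% in the second.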
Box Plot
Example 4.14
Bills: 42.19, 38.45, 29.23, 89.35, 118.04, 110.46, ...

Smallest = 0
Q1 = 9.275
Median = 26.905
Q3 = 84.9425
Largest = 119.63
IQR = 75.6675
Outliers = ()

Left-hand boundary = 9.275 - 1.5(IQR) = -104.226
Right-hand boundary = 84.9425 + 1.5(IQR) = 198.4438
No outliers are found.

[Figure: box plot of Bills with axis marks at -104.226, 0, 9.275, 26.905, 84.9425, 119.63 and 198.4438.]
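The outlier boundaries of Example 4.14 follow from the quartiles alone. The sketch below recomputes them from the reported Q1 and Q3 and checks the smallest and largest observations against them; the full Bills data set is not reproduced here, so only the summary values are used.

```python
# Summary values reported for the Bills data in Example 4.14.
smallest = 0.0
q1 = 9.275
q3 = 84.9425
largest = 119.63

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # left-hand boundary
upper_fence = q3 + 1.5 * iqr     # right-hand boundary

# If the extremes lie inside the fences, no observation can be an outlier.
has_outliers = smallest < lower_fence or largest > upper_fence

print(round(iqr, 4), round(lower_fence, 3), round(upper_fence, 4), has_outliers)
```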
Box Plot: GMAT Scores