SAS
http://v8doc.sas.com/sashtml/
Descriptive Statistics
78 76 82 75 85 82 78 74 83 90
70 76 85 92 87 67 65 68 73 74
83 88 86 85 92 90 82 75 69 80
85 77 86 85 90 85 80 70 65 60
Stem-and-Leaf Plots
We can see that the data ranges from about 60 to about 95.
If you look at the page sideways you can see the distribution
of the data. The same rule that says you should use 5-20 classes
of data in a histogram applies to a stem-and-leaf diagram. The
stem-and-leaf diagram could be expanded to include more rows
or condensed to include fewer rows.
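As a rough sketch, the stem-and-leaf rows for the 40 scores listed under Descriptive Statistics can be built by splitting each score into its tens digit (the stem) and its units digit (the leaf). The names `scores` and `stem_and_leaf` are illustrative and do not come from the slides.

```python
from collections import defaultdict

# The 40 scores listed at the start of the section.
scores = [
    78, 76, 82, 75, 85, 82, 78, 74, 83, 90,
    70, 76, 85, 92, 87, 67, 65, 68, 73, 74,
    83, 88, 86, 85, 92, 90, 82, 75, 69, 80,
    85, 77, 86, 85, 90, 85, 80, 70, 65, 60,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    rows = defaultdict(list)
    for value in sorted(values):
        rows[value // 10].append(value % 10)
    return dict(rows)


plot = stem_and_leaf(scores)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

With one row per tens digit this gives only four rows (stems 6 to 9); splitting each stem into two rows (leaves 0-4 and 5-9) would be one way to expand the diagram.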
HISTOGRAM
-median
-upper and lower quartile
-minimum and maximum value
[Figure: horizontal axis ticks at 15, 18, 21, ..., 57 (steps of 3)]
Type                        Frequency   Relative Frequency
Motor Vehicle                  43,500   0.578
Falls                          12,200   0.162
Poison                          6,400   0.085
Drowning                        4,600   0.061
Fire                            4,200   0.056
Ingestion of Food/Object        2,900   0.039
Firearms                        1,400   0.019
n = 75,200
Pie Chart
Next, find the central angle. To find the central angle, multiply the
relative frequency by 360°.
Type                        Frequency   Relative Frequency   Angle
Motor Vehicle                  43,500   0.578                208.2°
Falls                          12,200   0.162                 58.4°
Poison                          6,400   0.085                 30.6°
Drowning                        4,600   0.061                 22.0°
Fire                            4,200   0.056                 20.1°
Ingestion of Food/Object        2,900   0.039                 13.9°
Firearms                        1,400   0.019                  6.7°
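The relative frequencies and central angles in the table can be reproduced with a short sketch like the one below. The variable names are illustrative. The angles in the table appear to come from the unrounded relative frequency: 0.578 × 360° would give 208.1°, while 43,500/75,200 × 360° gives 208.2°.

```python
# Frequencies from the table above.
frequencies = {
    "Motor Vehicle": 43500,
    "Falls": 12200,
    "Poison": 6400,
    "Drowning": 4600,
    "Fire": 4200,
    "Ingestion of Food/Object": 2900,
    "Firearms": 1400,
}

n = sum(frequencies.values())  # total number of entries

# For each type: (relative frequency, central angle in degrees).
table = {}
for kind, freq in frequencies.items():
    relative = freq / n
    angle = relative * 360  # central angle of the pie slice
    table[kind] = (relative, angle)
    print(f"{kind:<26}{freq:>8,}{relative:>8.3f}{angle:>8.1f}°")

print(f"n = {n:,}")
```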
[Pie chart: Motor vehicles 57.8%, Falls 16.2%, Poison 8.5%, Drowning 6.1%, Fire 5.6%, Ingestion 3.9%, Firearms 1.9%]
Time Series Chart
A data set that is composed of quantitative data entries taken at
regular intervals over a period of time is a time series. A time
series chart is used to graph a time series.
Example:
The following table lists the number of minutes Robert used
on his cell phone for the last six months. Construct a time
series chart for the number of minutes used.

Month       Minutes
January     236
February    242
March       188
April       175
May         199
June        135
[Time series chart: "Robert's Cell Phone Usage"; horizontal axis: Month (Jan to June); vertical axis: Minutes (0 to 250)]
Quartiles and Percentiles
+------+-+
o * |---------| + | | -- |
+-----+-+
+---+---+---+---+---+---+---+---+---+---+ number line
0 1 2 3 4 5 6 7 8 9 10
Measures of the center
10 11 12 12 15 17 21 22 23 27
The mean (or arithmetic mean) is the average of these data
points. To calculate the mean, add the data points and divide
by the number of data points. The mean of a sample is denoted
by x̄ (x-bar). In our example above:
Sum of data points: 10+11+12+12+15+17+21+22+23+27 = 170
Number of data points = 10
Mean = 170/10 = 17
The median is the middle value when the scores are arranged
in order of increasing (or decreasing) magnitude. To calculate
the median, follow this rule:
-If the number of scores is odd, the median is the number that
is located in the exact middle of the list.
-If the number of scores is even, the median is found by
computing the mean of the two middle numbers.
NOTE: TO APPLY THE RULES ABOVE, THE LIST MUST BE SORTED!
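The mean and median rules can be sketched in a few lines, applied here to the ten data points from the example. The function names are illustrative. Note that the median function sorts its input first, as the rule requires.

```python
data = [10, 11, 12, 12, 15, 17, 21, 22, 23, 27]


def mean(values):
    """Add the data points and divide by the number of data points."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list, or the mean of the two middle values."""
    ordered = sorted(values)  # the rule only works on a sorted list
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(mean(data))    # 17.0
print(median(data))  # 16.0, the mean of the two middle values 15 and 17
```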
The mode of the data set is the score that occurs most
frequently. When two scores occur with the same greatest
frequency, each one is a mode and the data is bimodal. If
more than two scores occur with the same greatest frequency,
each is a mode and the data is multimodal. When all scores
occur just once there is no mode. The mode is denoted by M.
The value 12 in the above dataset occurs twice, more often
than any other value, and is therefore the mode.
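A minimal sketch of the mode rule, including the bimodal and no-mode cases, could look like this. The function name `modes` and the two small extra lists are made up for illustration.

```python
from collections import Counter


def modes(values):
    """Return every score that occurs with the greatest frequency.

    Returns an empty list when all scores occur just once (no mode).
    """
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([10, 11, 12, 12, 15, 17, 21, 22, 23, 27]))  # [12]
print(modes([1, 1, 2, 2, 3]))                           # [1, 2], bimodal
print(modes([1, 2, 3]))                                 # [], no mode
```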
$\mu = \frac{\sum x}{N}$ is the mean of a population
Measures of Variation
Measures of central tendency give us measures of where the
middle of a set of data occurs, but this is not enough to
characterize a set of data.
Both these data sets have a mean of 70. Yet the first data set
is more widely dispersed than the second data set. So a
measure of variation is clearly needed.
17 20 21 18 20 20 20 18 19 19
20 19 22 20 18 20 18 19 20 19
The range is the difference between the highest value and
the lowest value in a dataset.
And
1 2 5 8 10
Both have a range of 9, yet the first data set is clearly not as
dispersed as the second.
A more accurate measure of variation can be given by
the standard deviation of the data.
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
The variance is the square of the standard deviation:
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
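As an illustration, the sample variance and standard deviation formulas can be applied to the ten data points used earlier for the mean (10, 11, ..., 27). The function names below are illustrative. The squared deviations from the mean of 17 sum to 316, so s² = 316/9.

```python
import math

data = [10, 11, 12, 12, 15, 17, 21, 22, 23, 27]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


print(round(sample_variance(data), 3))  # 35.111
print(round(sample_std(data), 3))       # 5.925
```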
Interpretation of standard deviation
A small standard deviation means the data is close together;
a large standard deviation means the data is widely spread.
The range rule of thumb states that for typical data sets,
the range of the data is about 4 standard deviations wide, so
the standard deviation is about the range divided by 4. This
is a very rough estimate.
The 68-95-99.7 rule states that, for data with an
approximately bell-shaped (normal) distribution, about 68% of
all scores fall within one standard deviation of the mean,
about 95% fall within 2 standard deviations of the mean, and
about 99.7% fall within 3 standard deviations of the mean.
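Both rules of thumb can be checked against the 40 scores listed under Descriptive Statistics. The sketch below uses Python's standard `statistics` module. The helper `share_within` is a made-up name, and the scores are not guaranteed to be bell-shaped, so only rough agreement should be expected.

```python
import statistics

scores = [
    78, 76, 82, 75, 85, 82, 78, 74, 83, 90,
    70, 76, 85, 92, 87, 67, 65, 68, 73, 74,
    83, 88, 86, 85, 92, 90, 82, 75, 69, 80,
    85, 77, 86, 85, 90, 85, 80, 70, 65, 60,
]

x_bar = statistics.mean(scores)
s = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)

# Range rule of thumb: standard deviation is roughly range / 4.
estimate = (max(scores) - min(scores)) / 4


def share_within(values, center, spread, k):
    """Fraction of values within k spreads of the center."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= v <= high for v in values) / len(values)


within = {k: share_within(scores, x_bar, s, k) for k in (1, 2, 3)}

print(f"mean = {x_bar:.3f}, s = {s:.3f}, range/4 = {estimate:.1f}")
for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

For these scores the range-based estimate (8.0) is close to the computed standard deviation (about 8.2), and 65%, 97.5% and 100% of the scores fall within 1, 2 and 3 standard deviations of the mean.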