2. MEASURES OF VARIATION
3. MEASURES OF POSITION
1. MEASURES OF CENTRAL TENDENCY
¡ Where are the majority of scores concentrated?
¡ Statistics that are used for describing the center of a set of data values:
Mean
¡ Mean is the arithmetic average of the scores in a distribution.
¡ Population Mean
$$\mu = \frac{\sum X_i}{N}$$
• $\mu$ is the mean of the population
• $N$ is the size of the population
¡ Sample Mean
SUMMARIZING DATA SETS - STATPROB FASILKOM UI
Sample Mean
¡ Let $x_1, x_2, x_3, \ldots, x_n$ be the $n$ numerical values of our data set. The sample mean, denoted by $\bar{x}$, is defined by
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example
¡ Find the sample mean of the following scores (The winning scores in the U.S. Masters golf
tournament 1999-2008).
{280, 278, 272, 276, 281, 279, 276, 281, 289, 280}
¡ It is easier to first subtract 280 from each value: $y_i = x_i - 280$
{0, −2, −8, −4, 1, −1, −4, 1, 9, 0}
¡ The mean of the $y_i$'s is then easy to determine: $\bar{y} = -0.8$
¡ So, the mean of the original data is
$$\bar{x} = \bar{y} + 280 = 279.2$$
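The shortcut used in this example can be sketched in Python. This is only an illustration and not part of the original slides; the function name `shifted_mean` is a hypothetical helper.

```python
def shifted_mean(values, shift):
    """Mean computed by subtracting a constant first, then adding it back."""
    shifted = [v - shift for v in values]   # y_i = x_i - shift
    y_bar = sum(shifted) / len(shifted)     # mean of the shifted values
    return y_bar + shift                    # x_bar = y_bar + shift


scores = [280, 278, 272, 276, 281, 279, 276, 281, 289, 280]
x_bar = shifted_mean(scores, 280)
print(round(x_bar, 1))  # 279.2
```

Any shift gives the same result; 280 is convenient only because it keeps the intermediate numbers small.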
Exercise
¡ Find the sample mean of the following Statprob Scores.
{90, 87, 85, 92, 90, 86, 98, 95, 91, 81}
¡ Mean = 89.5
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i m_i}{\sum_{i=1}^{n} f_i}$$
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - c)^2, \quad c \in \mathbb{R}$$
$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$    $(x_i - 95)^2$
90       0.5                0.25                   25
87       -2.5               6.25                   64
85       -4.5               20.25                  100
92       2.5                6.25                   9
90       0.5                0.25                   25
86       -3.5               12.25                  81
98       8.5                72.25                  9
95       5.5                30.25                  0
91       1.5                2.25                   16
81       -8.5               72.25                  196
∑        0                  222.5                  525
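The column totals in this table can be checked with a short Python sketch. It is an illustration only: the helper name `sum_sq` is hypothetical, and 95 is the comparison value that reproduces the last column of the table.

```python
scores = [90, 87, 85, 92, 90, 86, 98, 95, 91, 81]
x_bar = sum(scores) / len(scores)  # sample mean, 89.5


def sum_sq(values, c):
    """Sum of squared deviations of the values from a constant c."""
    return sum((v - c) ** 2 for v in values)


dev_sum = sum(v - x_bar for v in scores)  # deviations from the mean cancel out
ss_mean = sum_sq(scores, x_bar)           # squared deviations around the mean
ss_95 = sum_sq(scores, 95)                # squared deviations around 95

print(dev_sum, ss_mean, ss_95)  # 0.0 222.5 525
```

The sum of squares around the mean (222.5) is smaller than the sum around 95 (525), in line with the inequality stated before the table.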
Sample Median
¡ Order the values of a data set of size n from smallest to largest.
¡ If n is odd, the sample median is the value in position $(n + 1)/2$.
¡ If n is even, the sample median is the average of the values in positions $n/2$ and $n/2 + 1$.
¡ Example
¡ {3, 6, 12, 18, 19, 21, 23} → median = 4th datum = 18.
¡ {3, 6, 12, 18, 19, 21, 23, 25} → median = (18 + 19) / 2 = 18.5
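The odd and even cases can be written as one small Python function. This is a sketch of the rule stated above; the name `sample_median` is a hypothetical helper, and positions are converted from the 1-based convention of the slides to Python's 0-based indexing.

```python
def sample_median(values):
    """Median of a data set, following the odd/even position rule."""
    ordered = sorted(values)  # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    # average of 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([3, 6, 12, 18, 19, 21, 23]))      # 18
print(sample_median([3, 6, 12, 18, 19, 21, 23, 25]))  # 18.5
```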
For the left-end-inclusion case [Ross, 2009], the lower limit of an interval is the left interval bound.
$$\mathrm{Mdn} = ll + \left(\frac{n(0.50) - cf}{f_i}\right)(w)$$
$$\mathrm{Mdn} = 44.5 + \left(\frac{90 - 50}{42}\right)(5) = 49.26$$
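The grouped-data median can be sketched in Python as follows. The meaning of the symbols is an assumption, since the slide's frequency table is not reproduced here: `ll` is taken to be the lower limit of the median class, `cf` the cumulative frequency below that class, `f_i` the frequency of the class, `w` the class width, and `n` the number of scores. The value n = 180 is inferred from n(0.50) = 90 in the worked example.

```python
def grouped_median(ll, cf, f_i, w, n):
    """Median of grouped data by interpolation inside the median class.

    Assumed meanings: ll = lower limit of the median class,
    cf = cumulative frequency below that class, f_i = class frequency,
    w = class width, n = total number of scores.
    """
    return ll + ((n * 0.50 - cf) / f_i) * w


mdn = grouped_median(ll=44.5, cf=50, f_i=42, w=5, n=180)
print(round(mdn, 2))  # 49.26
```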
Mean vs Median
¡ Mean is highly sensitive to outliers!
$$\frac{60 + 70 + 80 + 990}{4} = 300$$
¡ The mean of 300 fails to present a realistic picture of the major part of the data; 990 appears to be an outlier.
¡ The median of the same data set is
$$\frac{70 + 80}{2} = 75$$
¡ In this case, 3 of the 4 observations lie between 60 and 80, so the median is a good statistic here.
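The comparison can be reproduced with Python's standard `statistics` module. This is a small illustrative check, not part of the original slides.

```python
from statistics import mean, median

data = [60, 70, 80, 990]

# The single extreme value 990 pulls the mean far above the other scores,
# while the median stays close to the bulk of the data.
print(f"mean   = {mean(data):.1f}")    # mean   = 300.0
print(f"median = {median(data):.1f}")  # median = 75.0
```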
Sample Mode
¡ Mode is the most frequent score in a distribution.
Score   f
783     6
785     4
786     2
788     2
789     2
790     2
791     3
792     2
¡ 783 is the most frequent score (6 times), so the mode of the data is 783.
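Finding the mode from a frequency table like this one takes a single line in Python. The dictionary `freq` below is just the table above typed in by hand; the variable names are hypothetical.

```python
# Frequency table: score -> number of times it occurs
freq = {783: 6, 785: 4, 786: 2, 788: 2, 789: 2, 790: 2, 791: 3, 792: 2}

# The mode is the score with the largest frequency.
mode = max(freq, key=freq.get)
print(mode, freq[mode])  # 783 6
```

If several scores shared the largest frequency, `max` would return only the first one it meets, so a data set with more than one mode would need a slightly different approach.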
Sample Mode
¡ When data are grouped into class intervals [Hinkle, 2003], the mode is reported as the modal interval, and the midpoint of this interval is taken as the mode.
2. MEASURES OF VARIABILITY
¡ How widely are scores spread throughout the distribution?
¡ These statistics measure how much our variables vary from the mean.
¡ The measures of variation to be discussed:
Range
¡ Range is the number of units on the scale of measurement that include the highest and
lowest values.
$$\text{Range} = (\text{highest score} - \text{lowest score}) + 1 \text{ unit}$$
Distribution 1 11 16 18 … 31 37
Distribution 2 18 19 21 … 26 29
¡ Distribution 1: Range = 37 − 11 + 1 = 27
¡ Distribution 2: Range = 29 − 18 + 1 = 12
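The inclusive range used here (with the extra "+ 1 unit") can be sketched in Python. The function name `inclusive_range` is hypothetical, and the lists contain only the values actually shown for each distribution; the elided values are left out, which does not change the result as long as they lie between the smallest and largest listed values.

```python
def inclusive_range(values, unit=1):
    """Range counted in scale units, including both end points."""
    return (max(values) - min(values)) + unit


dist1 = [11, 16, 18, 31, 37]  # listed values only; elided values omitted
dist2 = [18, 19, 21, 26, 29]  # listed values only; elided values omitted

print(inclusive_range(dist1), inclusive_range(dist2))  # 27 12
```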
Mean Deviation
¡ Deviation score is the difference between the given score and the mean.
¡ Mean deviation (MD) is the average of the absolute values of the deviation scores.
$$MD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \frac{\sum_{i=1}^{n} |DS_i|}{n}$$
¡ A larger MD indicates greater variation.
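As a minimal sketch, the mean deviation can be computed in Python as below. The function name `mean_deviation` is hypothetical, and the data are the Statprob scores from the earlier exercise, reused here only for illustration.

```python
def mean_deviation(values):
    """Average of the absolute deviation scores |x_i - x_bar|."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(v - x_bar) for v in values) / n


scores = [90, 87, 85, 92, 90, 86, 98, 95, 91, 81]
print(mean_deviation(scores))  # 3.8
```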
Variance
¡ Uses squared deviations instead of absolute deviations.
¡ Variance is the average of the squared deviations around the mean.
¡ Population variance $\sigma^2$
$$\sigma^2 = \frac{SS}{N} = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
$SS$: sum of squares
¡ Sample variance $s^2$
$$s^2 = \frac{SS}{n - 1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Sample Variance
¡ The sample variance, call it $s^2$, of the data set $x_1, x_2, x_3, \ldots, x_n$ is defined by
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
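The definition translates directly into Python. The sketch below uses a hypothetical helper `sample_variance` and compares it with `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by N) from the standard library; the data are the Statprob scores from the earlier exercise, reused only for illustration.

```python
from statistics import pvariance, variance


def sample_variance(values):
    """Sum of squared deviations around the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    ss = sum((v - x_bar) ** 2 for v in values)  # SS: sum of squares
    return ss / (n - 1)


scores = [90, 87, 85, 92, 90, 86, 98, 95, 91, 81]
s2 = sample_variance(scores)

print(round(s2, 2))                  # 24.72
print(round(variance(scores), 2))    # same value from the standard library
print(round(pvariance(scores), 2))   # population formula, divides by N
```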
[Figure: a distribution with low variance compared with a distribution with high variance]
$$s^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n - 1}$$
Exercise
¡ Compute the sample variance
$$s^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n - 1}$$
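A possible Python sketch of the grouped-data variance is given below. It rests on several assumptions: each class is represented by a value `m_i` (assumed to be the class midpoint) with frequency `f_i`, and n is the total of the frequencies. The small table in the code is hypothetical illustration data, not the table from the exercise, which is not reproduced here.

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f_i * m_i) / sum(f_i)."""
    return sum(f * m for m, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_sample_variance(midpoints, freqs):
    """Grouped sample variance: sum(f_i * (m_i - x_bar)^2) / (n - 1)."""
    n = sum(freqs)  # total number of scores (assumption)
    x_bar = grouped_mean(midpoints, freqs)
    ss = sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, freqs))
    return ss / (n - 1)


# Hypothetical example: three classes with assumed midpoints 2, 7, 12
midpoints = [2, 7, 12]
freqs = [3, 5, 2]

print(grouped_mean(midpoints, freqs))                       # 6.5
print(round(grouped_sample_variance(midpoints, freqs), 2))  # 13.61
```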
Standard Deviation
¡ Standard deviation is the square root of the variance.
¡ Symbols:
¡ Standard deviation of population $\sigma$
$$\sigma = \sqrt{\sigma^2}$$
¡ Standard deviation of sample $s$
$$s = \sqrt{s^2}$$
Example
¡ Compute the standard deviation
$i$    $x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
1      9        3                  9
2      12       6                  36
3      7        1                  1
4      5        -1                 1
5      2        -4                 16
6      3        -3                 9
7      4        -2                 4
∑      42       0                  76
¡ $n = 7$
¡ Total $\sum x_i = 42$
¡ Mean $\bar{x} = \frac{42}{7} = 6$
¡ $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{76}{6} = 12.67$
¡ $s = \sqrt{s^2} = \sqrt{12.67} = 3.56$
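The steps of this example can be followed in Python. This is an illustrative sketch with hypothetical variable names, using only the seven data values from the example.

```python
import math

data = [9, 12, 7, 5, 2, 3, 4]

n = len(data)                              # 7
x_bar = sum(data) / n                      # 42 / 7 = 6
ss = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations = 76
s2 = ss / (n - 1)                          # sample variance
s = math.sqrt(s2)                          # sample standard deviation

print(round(s2, 2), round(s, 2))  # 12.67 3.56
```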
Exercise
¡ Compute mean, mean deviation, and variance of the following grouped frequency table!
3. MEASURES OF POSITION
¡ Where can you find a certain score in reference to the other scores?
¡ Statistics to give context or frame of reference, i.e., relative position of a score among other scores.
¡ Some statistics for this problem: percentile and percentile rank.
Sample Percentile
¡ The sample 100p percentile is that data value such that:
¡ 100p percent of the data are less than or equal to it
¡ 100(1 - p) percent are greater than or equal to it
¡ If two data values satisfy this condition, then the sample 100p percentile is the average
of these two values.
¡ Writing convention
Example
¡ Determine the first, second, and third quartiles, as well as $P_{70}$, of the following data set.
{17.11, 6.6, 6.59, 11.06, 2.78, 6.96, 3.79, 4.3}
¡ Ordered data set:
{2.78, 3.79, 4.3, 6.59, 6.6, 6.96, 11.06, 17.11}
¡ $P_{25}$: $np = 8(0.25) = 2$, so $P_{25} = \frac{3.79 + 4.3}{2} = 4.045$
¡ $P_{50}$: $np = 8(0.50) = 4$, so $P_{50} = \frac{6.59 + 6.6}{2} = 6.595$
¡ $P_{75}$: $np = 8(0.75) = 6$, so $P_{75} = \frac{6.96 + 11.06}{2} = 9.01$
¡ $P_{70}$: $np = 8(0.70) = 5.6$, so $P_{70} = 6.96$
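The calculations above appear to follow this rule: when np is an integer, average the values in positions np and np + 1 of the ordered data; otherwise take the value in the next position above np. That reading is inferred from the worked values and from the definition of the sample percentile, so the Python sketch below should be taken as one possible implementation; the function name `sample_percentile` is hypothetical.

```python
import math


def sample_percentile(values, p):
    """Sample 100p percentile for 0 < p < 1, using the np position rule."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    ordered = sorted(values)
    np_ = len(ordered) * p
    if abs(np_ - round(np_)) < 1e-9:
        # np is an integer: average positions np and np + 1 (1-based)
        k = int(round(np_))
        return (ordered[k - 1] + ordered[k]) / 2
    # np is not an integer: take the next position above np (1-based)
    return ordered[math.ceil(np_) - 1]


data = [17.11, 6.6, 6.59, 11.06, 2.78, 6.96, 3.79, 4.3]
results = [round(sample_percentile(data, p), 3) for p in (0.25, 0.50, 0.75, 0.70)]
print(results)  # [4.045, 6.595, 9.01, 6.96]
```

Library routines such as `statistics.quantiles` use different interpolation conventions and may return slightly different values for the same data.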
For the left-end-inclusion case [Ross, 2009], the lower limit of an interval is the left interval bound.
Example
¡ Find the 34th percentile!
$$X\text{th percentile} = P_X = ll + \left(\frac{n \cdot p - cf}{f_i}\right)(w)$$
$$P_{34} = 44.5 + \left(\frac{180(0.34) - 50}{42}\right)(5) = 45.83$$
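The percentile formula for grouped data can be sketched as a Python function. The symbol meanings are assumptions, because the frequency table behind the example is not reproduced here: `ll` is taken as the lower limit of the class containing the percentile, `cf` as the cumulative frequency below that class, `f_i` as the class frequency, `w` as the class width, and `n` as the number of scores. The function name `grouped_percentile` is hypothetical.

```python
def grouped_percentile(p, ll, cf, f_i, w, n):
    """Percentile of grouped data by interpolation inside one class.

    p is a proportion (0.34 for the 34th percentile). The remaining
    arguments describe the class that contains the percentile (assumed
    meanings are given in the text above).
    """
    return ll + ((n * p - cf) / f_i) * w


p34 = grouped_percentile(0.34, ll=44.5, cf=50, f_i=42, w=5, n=180)
print(round(p34, 2))  # 45.83
```

With p = 0.50 the same function gives the grouped-data median, since the median formula is the special case of this one.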
Percentile Rank
¡ Percentile rank of a score is the percent of scores less than or equal to that score.
¡ Suppose you got 65 on the final exam of this course. You want to know what percent of
students scored lower.
¡ Writing convention
¡ Percentile rank of score 65 = $PR_{65}$
$$PR = \frac{CF' + 0.5F}{N} \times 100$$
CF’ = the count of all scores less than the score of interest
F = frequency of the score
N = number of scores
¡ Percentile rank:
$$PR_{65} = \frac{6 + (0.5)(1)}{10} \times 100 = 65$$
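The worked value can be reproduced with a small Python function. The formula used is the one implied by the calculation above (count of scores below, plus half the frequency of the score, divided by the number of scores); the function and parameter names are hypothetical.

```python
def percentile_rank(cf_below, f, n):
    """Percentile rank from counts.

    cf_below = count of all scores less than the score of interest,
    f = frequency of the score, n = number of scores.
    """
    return 100 * (cf_below + 0.5 * f) / n


print(percentile_rank(cf_below=6, f=1, n=10))  # 65.0
```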
For the left-end-inclusion case [Ross, 2009], the lower limit of an interval is the left interval bound.
Percentile Rank
¡ Find the percentile rank of score 61!
$$PR_X = \left(\frac{cf + \left(\frac{X - ll}{w}\right) f_i}{n}\right)(100)$$
$$PR_{61} = \left(\frac{159 + \left(\frac{61 - 59.5}{5}\right) 15}{180}\right)(100) = 90.83$$
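The grouped-data percentile rank can be sketched in Python as follows. As with the other grouped formulas, the symbol meanings are assumptions because the frequency table is not reproduced here: `ll` is the lower limit of the class containing the score, `cf` the cumulative frequency below that class, `f_i` the class frequency, `w` the class width, and `n` the number of scores. The function name is hypothetical.

```python
def grouped_percentile_rank(x, ll, cf, f_i, w, n):
    """Percentile rank of score x in grouped data.

    Adds to cf the share of the class frequency that lies below x,
    assuming scores are spread evenly across the class.
    """
    below_in_class = (x - ll) / w * f_i
    return (cf + below_in_class) / n * 100


pr61 = grouped_percentile_rank(61, ll=59.5, cf=159, f_i=15, w=5, n=180)
print(round(pr61, 2))  # 90.83
```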
¡ The difference in scores between one pair of percentiles and the difference between another pair may not be the same → percentiles are an ordinal measure
Box Plot
¡ A box plot is a straight line segment stretching from the smallest to the largest data value, with a “box” that marks the first to the third quartile.
¡ $P_{25}$: $np = 8(0.25) = 2$, so $P_{25} = \frac{3.79 + 4.3}{2} = 4.045$
¡ $P_{50}$: $np = 8(0.50) = 4$, so $P_{50} = \frac{6.59 + 6.6}{2} = 6.595$
¡ $P_{75}$: $np = 8(0.75) = 6$, so $P_{75} = \frac{6.96 + 11.06}{2} = 9.01$
¡ $P_{70}$: $np = 8(0.70) = 5.6$, so $P_{70} = 6.96$
[Figure: box plot marked with Min, Q1, Q2, Q3, and Max]
Outliers
¡ An outlier is an unusual score in a distribution that may warrant special consideration.