Measures of location
A measure of location provides information on the percentage of observations in the collection whose values are less than or equal to it.
Measures of location
1. Percentiles divide the set of observations into 100 equal parts.
2. Deciles divide the set of observations into 10 equal parts.
3. Quartiles divide the set of observations into 4 equal parts.
Percentiles
P1 1% of the observations fall below the first percentile
P2 2% of the observations fall below the second percentile
.
.
P99 99% of the observations fall below the 99th percentile
Percentiles
$P_i = \left(\dfrac{i(n+1)}{100}\right)^{\text{th}}$ observation in the array
Example
52 52 52 54 55 56 56
57 57 59 59 59 60 61
62 64 65 65 65 66 69
69 70 72 72 75 75 75
75 76 76 77 79 79 83
85 87 87 87 88 89 91
94 95 95 95 97 98 99
Percentiles
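When $\dfrac{i(n+1)}{100}$ is not an integer, we interpolate:
1. Let j be the integer part of $\dfrac{i(n+1)}{100}$ and g be the decimal part.
2. Compute the ith percentile using the formula: $P_i = (1-g)\,X_{(j)} + g\,X_{(j+1)}$, where $X_{(j)}$ is the jth observation in the array.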
52 52 52 54 55 56 56
57 57 59 59 59 60 61
62 64 65 65 65 66 69
69 70 72 72 75 75 75
75 76 76 77 79 79 83
85 87 87 87 88 89 91
94 95 95 95 97 98 99
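The position rule can be sketched in a few lines of Python. This is a minimal sketch, assuming linear interpolation between the jth and (j + 1)th observations when the position is not an integer. The function name `percentile` and the clamping of positions that fall outside the array are illustrative assumptions, not part of the slides.

```python
def percentile(data, i):
    """Return the i-th percentile using the i(n + 1)/100 position rule."""
    xs = sorted(data)            # the "array": observations in ascending order
    n = len(xs)
    pos = i * (n + 1) / 100      # 1-based position in the array
    j = int(pos)                 # integer part of the position
    g = pos - j                  # decimal part of the position
    if j < 1:                    # assumption: clamp positions before the first value
        return xs[0]
    if j >= n:                   # assumption: clamp positions at or past the last value
        return xs[-1]
    # Linear interpolation between the j-th and (j + 1)-th observations
    return (1 - g) * xs[j - 1] + g * xs[j]


scores = [
    52, 52, 52, 54, 55, 56, 56,
    57, 57, 59, 59, 59, 60, 61,
    62, 64, 65, 65, 65, 66, 69,
    69, 70, 72, 72, 75, 75, 75,
    75, 76, 76, 77, 79, 79, 83,
    85, 87, 87, 87, 88, 89, 91,
    94, 95, 95, 95, 97, 98, 99,
]

p25 = percentile(scores, 25)   # position 12.5: halfway between 59 and 60
p50 = percentile(scores, 50)   # position 25: the 25th observation
p75 = percentile(scores, 75)   # position 37.5: between two values of 87
print(p25, p50, p75)
```

With n = 49 the position for P25 is 12.5, so the result lies halfway between the 12th and 13th observations.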
Deciles
D1 10% of the observations fall below the first decile
D2 20% of the observations fall below the second decile
.
.
D9 90% of the observations fall below the 9th decile
Quartiles
Q1 25% of the observations fall below the first quartile
Q2 50% of the observations fall below the second quartile
Q3 75% of the observations fall below the third quartile
$D_i = P_{10i}$
$Md = D_5 = Q_2 = P_{50}$
Example
$D_2 = P_{20}$, $D_7 = P_{70}$, $Q_1 = P_{25}$, $Q_2 = P_{50}$
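1. Construct the <CF column.
2. Starting at the top, locate the class with <CF greater than or equal to $in/100$ for the first time. This is the $P_i$th class.
3. Approximate $P_i$ using the formula
$P_i = LCB_{P_i} + c\left(\dfrac{\frac{in}{100} - {<}CF_{P_i - 1}}{f_{P_i}}\right)$
where
$LCB_{P_i}$ = lower class boundary of the $P_i$th class
$c$ = class size
$n$ = total number of observations in the FDT
${<}CF_{P_i - 1}$ = less than cumulative frequency of the class preceding the $P_i$th class
$f_{P_i}$ = frequency of the $P_i$th class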
Class      Frequency   <CF   >CF   LCB - UCB
17 - 18        1         1    34   16.5 - 18.5
19 - 20        2         3    33   18.5 - 20.5
21 - 22        3         6    31   20.5 - 22.5
23 - 24        3         9    28   22.5 - 24.5
25 - 26       12        21    25   24.5 - 26.5
27 - 28        7        28    13   26.5 - 28.5
29 - 30        6        34     6   28.5 - 30.5
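The grouped-data approximation can be applied to this frequency distribution table programmatically. The following is a minimal sketch, assuming the classes are supplied as (lower class boundary, frequency) pairs in ascending order. The function name `grouped_percentile` and the data layout are illustrative choices, not part of the slides.

```python
def grouped_percentile(classes, i, c):
    """Approximate the i-th percentile from a frequency distribution table.

    classes: list of (lower_class_boundary, frequency), lowest class first
    c: class size
    """
    n = sum(f for _, f in classes)   # total number of observations
    target = i * n / 100             # in/100
    cf_prev = 0                      # <CF of the preceding class
    for lcb, f in classes:
        # First class whose <CF reaches in/100 is the percentile class
        if cf_prev + f >= target:
            return lcb + c * (target - cf_prev) / f
        cf_prev += f
    raise ValueError("i must be between 0 and 100")


fdt = [
    (16.5, 1), (18.5, 2), (20.5, 3), (22.5, 3),
    (24.5, 12), (26.5, 7), (28.5, 6),
]

p25 = grouped_percentile(fdt, 25, c=2)
print(round(p25, 2))   # in/100 = 8.5 falls in class 23 - 24
```

Accumulating the <CF inside the loop avoids building a separate column, but it relies on the classes being listed from lowest to highest.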
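P25 = ?
$in/100 = 25(34)/100 = 8.5$, so the $P_{25}$th class is 23 - 24, the first class with <CF greater than or equal to 8.5
c = 2
$LCB_{P_{25}} = 22.5$
$P_{25} = LCB_{P_{25}} + c\left(\dfrac{\frac{in}{100} - {<}CF_{P_{25} - 1}}{f_{P_{25}}}\right) = 22.5 + 2\left(\dfrac{8.5 - 6}{3}\right) = 24.17$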
MEASURES OF DISPERSION
What is it?
A measure of dispersion is a summary measure that helps us characterize the data set in terms of how varied the observations are from each other.
Measure of dispersion
If its value is small, the observations are not too different from each other, so there is a concentration of observations in the center.
If its value is high, the observations are very different from each other. They are widely spread out from the center.
Two classifications
Measure of absolute dispersion
Measure of relative dispersion
Measures of absolute dispersion are expressed in the units of the original observations. They cannot be used to compare the variations of two data sets when the averages of these data sets differ a lot in value or when the observations differ in units of measurement.
Range
The range of a set of measurements is the difference between the largest and the smallest values: $R = \text{Max} - \text{Min}$
The range is approximated from a frequency distribution by getting the difference between the upper class limit of the highest class interval and the lower class limit of the lowest class interval.
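Population Variance
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$ (population standard deviation)
Where: N is the total number of units in the population; $\mu$ is the population mean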
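Sample Variance
$s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$ (sample standard deviation)
Where: n is the total number of units in the sample; $\bar{X}$ is the sample mean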
1. Get the difference between the observed value, $X_i$, and the sample mean: $(X_i - \bar{X})$
2. Square each of the differences: $(X_i - \bar{X})^2$
3. Sum the squares of the differences: $\sum_{i=1}^{n} (X_i - \bar{X})^2$
4. Divide the sum by $n - 1$.
Example
The following scores were given by the 6 judges for a gymnast's performance: 7, 5, 7, 9, 8, 6. Find the standard deviation.
$\mu = \dfrac{7 + 5 + 7 + 9 + 8 + 6}{6} = 7$
Example
A sample of 5 households showed the following number of household members: 3, 8, 5, 4, 4. Find the standard deviation.
$\bar{X} = \dfrac{3 + 8 + 5 + 4 + 4}{5} = 4.8$
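Both examples can be finished by following the computation steps directly. The sketch below assumes the six judges are treated as a complete population (divisor N) and the five households as a sample (divisor n - 1). The function name `variance` and its `sample` flag are illustrative, not from the slides.

```python
import math


def variance(xs, sample=True):
    """Variance from squared deviations; divisor is n - 1 (sample) or N (population)."""
    n = len(xs)
    mean = sum(xs) / n
    # Sum of squared differences between each observation and the mean
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1) if sample else ss / n


judges = [7, 5, 7, 9, 8, 6]        # assumed to be the whole population of judges
households = [3, 8, 5, 4, 4]       # stated to be a sample

sigma = math.sqrt(variance(judges, sample=False))
s = math.sqrt(variance(households, sample=True))

print(round(sigma, 2))   # squared deviations sum to 10, divided by N = 6
print(round(s, 2))       # squared deviations sum to 14.8, divided by n - 1 = 4
```

The only difference between the two calls is the divisor, which is why the population/sample distinction has to be decided before computing.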
Interpretation
A small variance/SD indicates that the observations are highly concentrated near the mean.
A large variance/SD indicates that, on the average, the observations are far from, or very different from, the mean.
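Variance
$s^2 = \dfrac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}$
Standard Deviation
$s = \sqrt{\dfrac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}}$
Where: $f_i$ is the frequency of the ith class; $X_i$ is the class mark of the ith class; $\bar{X}$ is the mean of the FDT; n is the total number of observations; k is the number of classes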
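As an illustration, the grouped-data variance can be computed for the frequency distribution table used in the percentile example (classes 17 - 18 through 29 - 30). This is a minimal sketch; the function name `grouped_variance` and the (class mark, frequency) layout are assumptions made for the example.

```python
import math


def grouped_variance(classes):
    """Sample variance from a list of (class_mark, frequency) pairs."""
    n = sum(f for _, f in classes)                        # total observations
    mean = sum(f * x for x, f in classes) / n             # mean of the FDT
    ss = sum(f * (x - mean) ** 2 for x, f in classes)     # weighted squared deviations
    return ss / (n - 1)


# Class marks are the midpoints of the class limits, e.g. (17 + 18) / 2 = 17.5
fdt = [
    (17.5, 1), (19.5, 2), (21.5, 3), (23.5, 3),
    (25.5, 12), (27.5, 7), (29.5, 6),
]

s2 = grouped_variance(fdt)
s = math.sqrt(s2)
print(round(s2, 3), round(s, 2))   # mean of the FDT is 25.5, n = 34
```

Because every observation in a class is represented by its class mark, the result only approximates the variance of the raw data.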
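1. Compute the sum of the observations: $\sum_{i=1}^{n} X_i$
2. Square each observation: $X_i^2$
3. Compute the sum of the squared observations: $\sum_{i=1}^{n} X_i^2$
4. Plug into the formula the computed sums obtained in steps 1 and 3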
It is affected by the value of every observation. It may be distorted by a few extreme values.
If each observation of a set of data is transformed to a new set by the addition (or subtraction) of a constant c, the standard deviation of the new set of data is the same as the standard deviation of the original data set.
If a set of data is transformed to a new set by multiplying (or dividing) each observation by a constant c, the standard deviation of the new data set is equal to the standard deviation of the original data set multiplied (or divided) by |c|.
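The two transformation properties are easy to check numerically. The sketch below uses `statistics.stdev` from the Python standard library (sample standard deviation) on the household data from the earlier example; the constant c = 10 is an arbitrary choice for illustration.

```python
import statistics

data = [3, 8, 5, 4, 4]
c = 10   # arbitrary constant chosen for the illustration

s = statistics.stdev(data)

# Adding a constant shifts every observation but leaves the spread unchanged
s_shifted = statistics.stdev([x + c for x in data])

# Multiplying by a constant stretches the spread by the same factor
s_scaled = statistics.stdev([x * c for x in data])

print(s, s_shifted, s_scaled)
```

Comparing `s_shifted` with `s` and `s_scaled` with `c * s` should show agreement up to floating-point rounding.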
Measures of relative dispersion are used in comparing the variability of two or more data sets. These data sets can even have different means and units of measurement.
The comparison is feasible since measures of relative dispersion are unitless.
Coefficient of variation
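The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean.
$CV = \dfrac{\sigma}{\mu} \times 100\%$ (population)
$CV = \dfrac{s}{\bar{X}} \times 100\%$ (sample)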
Example
In 1992, the Bangko Sentral ng Pilipinas (BSP) put the peso on a floating rate basis. Given are the means and standard deviations of the quarterly P-$ exchange rate for the periods 1989 to 1991 and 1992 to 1994. Which of the two periods is more stable?
Interpretation
A large coefficient of variation indicates that the data set is highly variable.
A small CV indicates less variability in the data set.
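A comparison of two data sets by CV might look like the following minimal sketch. The means and standard deviations here are hypothetical placeholder values chosen for illustration; they are not the BSP exchange rate figures, which are not reproduced in these slides.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return 100 * sd / mean


# Hypothetical summary statistics for two periods (illustrative values only)
cv_a = coefficient_of_variation(sd=2.0, mean=25.0)
cv_b = coefficient_of_variation(sd=1.5, mean=30.0)

# The period with the smaller CV is the less variable (more stable) one
more_stable = "A" if cv_a < cv_b else "B"
print(cv_a, cv_b, more_stable)
```

Because the CV is unitless, the same comparison works when the two data sets are measured in different units.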
The z-score (standard score) measures how many standard deviations an observed value is above or below the mean.
$Z = \dfrac{X - \mu}{\sigma}$ (population)
$Z = \dfrac{X - \bar{X}}{s}$ (sample)
In other words
The z-score helps determine whether the observed value is above or below the mean, and by how much.
Example
If we consider the grades of other students in the two subjects, Robert's score in Stat 101 is just as good as his score in Econ 11. Based on the z-scores, Robert's scores in both subjects are 0.5 standard deviations above the mean.
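The comparison can be reproduced with a short calculation. The scores, class means, and standard deviations below are hypothetical values picked so that both z-scores come out to 0.5; the actual figures for the example are not given in these slides.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical figures for the two subjects (illustrative values only)
z_stat = z_score(x=85, mean=80, sd=10)
z_econ = z_score(x=78, mean=75, sd=6)

print(z_stat, z_econ)   # equal z-scores mean equal relative standing
```

The raw scores differ, but relative to each class the two performances are equivalent.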
Remarks
The standard score is not a measure of relative dispersion per se, but it is related.
It is useful for comparing two values from different series, especially when these two series differ with respect to the mean or standard deviation or both, or are expressed in different units.
MEASURES OF SKEWNESS
Review
Measures of central tendency are single figures that can represent the other numbers in the data set (the central figure).
Measures of location help determine the relative position of any observation in the distribution.
Measures of dispersion describe the spread of the values about the central figure.
What is it?
A measure of skewness shows the shape of the graph (relative frequency distribution) of your data set.
It indicates not only the amount of skewness but also the direction.
Positively skewed (skewed to the right): the distribution tapers more to the right than to the left; longer tail to the right; more concentration of values below than above the mean
Negatively skewed (skewed to the left): the distribution tapers more to the left than to the right; longer tail to the left; more concentration of values above than below the mean
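Pearson's first coefficient of skewness
$Sk = \dfrac{\bar{X} - Mo}{s}$
Pearson's second coefficient of skewness
$Sk = \dfrac{3(\bar{X} - Md)}{s}$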
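Both coefficients are one-line calculations once the mean, median, mode, and standard deviation are known. The sketch below uses hypothetical summary values for illustration, and the function names are assumptions. A positive result indicates a longer tail to the right, a negative result a longer tail to the left.

```python
def skewness_from_mode(mean, mode, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / sd."""
    return (mean - mode) / sd


def skewness_from_median(mean, median, sd):
    """Pearson's second coefficient of skewness: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd


# Hypothetical summary measures (illustrative values only)
mean, median, mode, sd = 50, 48, 45, 10

sk1 = skewness_from_mode(mean, mode, sd)
sk2 = skewness_from_median(mean, median, sd)

direction = "right" if sk2 > 0 else "left" if sk2 < 0 else "symmetric"
print(sk1, sk2, direction)
```

The two coefficients need not agree in magnitude; they agree in sign whenever the mode and median lie on the same side of the mean.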
Interpretation
Example
Given a distribution with the following measures of central tendency and dispersion. What is the shape of the distribution?
Remarks
Since
Kurtosis
THE BOXPLOT
What is it?
The boxplot is a graph that is very useful for displaying the following features of the data: location, spread, symmetry, extremes, and outliers.
1. Construct a rectangle with one end at the first quartile and the other end at the third quartile.
2. Put a vertical line across the interior of the rectangle at the median.
3. Compute the interquartile range (IQR), lower fence (FL), and upper fence (FU), given by:
   IQR = Q3 - Q1
   FL = Q1 - 1.5 IQR
   FU = Q3 + 1.5 IQR
4. Locate the smallest value contained in the interval [FL, Q1]. Draw a line from this value to Q1.
5. Locate the largest value contained in the interval [Q3, FU]. Draw a line from this value to Q3.
6. Values falling outside the fences are considered outliers and are usually denoted by "x".
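The numbers needed to draw a boxplot can be computed before any plotting. The sketch below applies the steps to the 49 observations from the percentile example, assuming quartiles are obtained with the i(n + 1)/100 position rule and linear interpolation. The helper names `percentile` and `boxplot_summary` are illustrative, not from the slides.

```python
def percentile(data, i):
    """i-th percentile by the i(n + 1)/100 position rule (assumed interpolation)."""
    xs = sorted(data)
    n = len(xs)
    pos = i * (n + 1) / 100
    j = int(pos)                       # integer part
    g = pos - j                        # decimal part
    if j < 1:
        return xs[0]
    if j >= n:
        return xs[-1]
    return (1 - g) * xs[j - 1] + g * xs[j]


def boxplot_summary(data):
    """Quartiles, fences, whisker ends, and outliers for a boxplot."""
    q1, md, q3 = (percentile(data, p) for p in (25, 50, 75))
    iqr = q3 - q1
    fl = q1 - 1.5 * iqr                # lower fence
    fu = q3 + 1.5 * iqr                # upper fence
    inside = [x for x in data if fl <= x <= fu]
    return {
        "q1": q1, "median": md, "q3": q3, "iqr": iqr,
        "fl": fl, "fu": fu,
        "whisker_low": min(inside),    # smallest value within the fences
        "whisker_high": max(inside),   # largest value within the fences
        "outliers": sorted(x for x in data if x < fl or x > fu),
    }


scores = [
    52, 52, 52, 54, 55, 56, 56, 57, 57, 59, 59, 59, 60, 61,
    62, 64, 65, 65, 65, 66, 69, 69, 70, 72, 72, 75, 75, 75,
    75, 76, 76, 77, 79, 79, 83, 85, 87, 87, 87, 88, 89, 91,
    94, 95, 95, 95, 97, 98, 99,
]

summary = boxplot_summary(scores)
print(summary)
```

For this data set every observation lies inside the fences, so the whiskers extend to the minimum and maximum and no points are flagged as outliers.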
Examples