
MEASURES OF LOCATION

Measures of location
A measure of location provides information on the percentage of observations in the collection whose values are less than or equal to it.

Measures of location
1. Percentiles divide the set of observations into 100 equal parts.
2. Deciles divide the set of observations into 10 equal parts.
3. Quartiles divide the set of observations into 4 equal parts.

Percentiles
P1: 1% of the observations fall below the first percentile
P2: 2% of the observations fall below the second percentile
.
.
P99: 99% of the observations fall below the 99th percentile

Percentiles
$P_i = \left( \frac{i(n + 1)}{100} \right)^{\text{th}}$ observation in the array

*Sometimes k is used instead of i; the two are interchangeable.

Example
52 52 52 54 55 56 56 57 57 59 59 59 60 61 62 64 65 65 65 66 69 69 70 72 72 75 75 75 75 76 76 77 79 79 83 85 87 87 87 88 89 91 94 95 95 95 97 98 99

$P_{20} = \left( \frac{20(49 + 1)}{100} \right)^{\text{th}}$ observation $= X_{(10)} = 59$

20% of the students in the sample have scores lower than 59.

Percentiles
When $\frac{i(n + 1)}{100}$ is not an integer, we interpolate:

1. Let $j$ be the integer part of $\frac{i(n + 1)}{100}$ and $g$ be the decimal part.
2. Compute the ith percentile using the formula:

$P_i = (1 - g) X_{(j)} + g X_{(j + 1)}$

52 52 52 54 55 56 56
57 57 59 59 59 60 61
62 64 65 65 65 66 69
69 70 72 72 75 75 75
75 76 76 77 79 79 83
85 87 87 87 88 89 91
94 95 95 95 97 98 99

Position of $P_{85}$: $\frac{i(n + 1)}{100} = \frac{85(49 + 1)}{100} = 42.5$

$j = 42$; $g = 0.5$; $X_{(42)} = 91$; $X_{(42 + 1)} = 94$

$P_i = (1 - g) X_{(j)} + g X_{(j + 1)}$
$P_{85} = (1 - 0.5)(91) + 0.5(94) = 92.5$

85% of the students in the sample have scores lower than 92.5.
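
The position rule and the interpolation step can be sketched in Python as follows. This is a minimal sketch, not a library routine: the function name `percentile` is made up for illustration, and clamping to the smallest or largest observation when the position falls outside the array is an assumption the slides do not cover.

```python
def percentile(data, i):
    """Return the i-th percentile using the i(n + 1)/100 position rule."""
    xs = sorted(data)            # the ordered array X(1) <= ... <= X(n)
    n = len(xs)
    pos = i * (n + 1) / 100      # position of P_i in the array
    j = int(pos)                 # integer part of the position
    g = pos - j                  # decimal part of the position
    if j < 1:                    # assumption: clamp below the first observation
        return xs[0]
    if j >= n:                   # assumption: clamp above the last observation
        return xs[-1]
    # X(j) is xs[j - 1] because Python lists are 0-indexed
    return (1 - g) * xs[j - 1] + g * xs[j]


scores = [52, 52, 52, 54, 55, 56, 56, 57, 57, 59, 59, 59, 60, 61,
          62, 64, 65, 65, 65, 66, 69, 69, 70, 72, 72, 75, 75, 75,
          75, 76, 76, 77, 79, 79, 83, 85, 87, 87, 87, 88, 89, 91,
          94, 95, 95, 95, 97, 98, 99]

print(percentile(scores, 20))    # position 10 is an integer, so no interpolation
print(percentile(scores, 85))    # position 42.5, halfway between X(42) and X(43)
```

With the 49 scores from the example, the two calls give 59 and 92.5. Statistical software often uses a different position rule, so results from other tools may differ slightly.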

Deciles
D1: 10% of the observations fall below the first decile
D2: 20% of the observations fall below the second decile
.
.
D9: 90% of the observations fall below the 9th decile

Quartiles
Q1: 25% of the observations fall below the first quartile
Q2: 50% of the observations fall below the second quartile
Q3: 75% of the observations fall below the third quartile

Median, Deciles, Quartiles


The median, deciles and quartiles are special cases of percentiles:

$Q_i = P_{25i}$
$D_i = P_{10i}$
$Md = D_5 = Q_2 = P_{50}$

Example
$D_2 = P_{20}$, $D_7 = P_{70}$, $Q_1 = P_{25}$, $Q_2 = P_{50}$

Getting Percentiles from the FDT


1. Construct the <CF column.
2. Starting at the top, locate the first class whose <CF is greater than or equal to $in/100$. This is the $P_i$th class.
3. Approximate $P_i$ using the formula

$P_i = LCB_{P_i} + c \left( \frac{in/100 - {<}CF_{P_i - 1}}{f_{P_i}} \right)$

where
$LCB_{P_i}$ = lower class boundary of the $P_i$th class
$c$ = class size
$n$ = total number of observations in the FDT
${<}CF_{P_i - 1}$ = <CF of the class preceding the $P_i$th class
$f_{P_i}$ = frequency of the $P_i$th class

Class      Frequency   <CF   >CF   LCB - UCB      CM
17 - 18        1         1    34   16.5 - 18.5   17.5
19 - 20        2         3    33   18.5 - 20.5   19.5
21 - 22        3         6    31   20.5 - 22.5   21.5
23 - 24        3         9    28   22.5 - 24.5   23.5
25 - 26       12        21    25   24.5 - 26.5   25.5
27 - 28        7        28    13   26.5 - 28.5   27.5
29 - 30        6        34     6   28.5 - 30.5   29.5

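P25 = ?

$in/100 = (25)(34)/100 = 8.5$, so the $P_{25}$th class is 23 - 24, the first class whose <CF (9) is at least 8.5.

${<}CF_{P_i - 1} = 6$, $f_{P_i} = 3$, $c = 2$, $LCB_{P_i} = 22.5$

$P_{25} = LCB_{P_i} + c \left( \frac{in/100 - {<}CF_{P_i - 1}}{f_{P_i}} \right) = 22.5 + 2 \left( \frac{8.5 - 6}{3} \right) = 24.17$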

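The three FDT steps can be sketched in Python as shown here. This is only an illustration: the function name `grouped_percentile` and its argument layout are assumptions, and the sketch expects i between 1 and 99 and classes of equal size.

```python
def grouped_percentile(lower_boundaries, freqs, class_size, i):
    """Approximate the i-th percentile from a frequency distribution table."""
    n = sum(freqs)                 # total number of observations
    target = i * n / 100           # the quantity in/100
    cf_before = 0                  # <CF of the class preceding the current one
    for lcb, f in zip(lower_boundaries, freqs):
        # the first class whose <CF reaches in/100 is the P_i-th class
        if cf_before + f >= target:
            return lcb + class_size * (target - cf_before) / f
        cf_before += f
    raise ValueError("i must be between 1 and 99")


# Lower class boundaries and frequencies from the table in the example
lcbs = [16.5, 18.5, 20.5, 22.5, 24.5, 26.5, 28.5]
freqs = [1, 2, 3, 3, 12, 7, 6]

p25 = grouped_percentile(lcbs, freqs, class_size=2, i=25)
print(round(p25, 2))
```

For the table in the example this gives about 24.17, since the class 23 - 24 is the first whose <CF reaches 8.5.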
MEASURES OF DISPERSION

What is it?
A measure of dispersion is a summary measure that helps us characterize the data set in terms of how varied the observations are from each other.

Measure of dispersion
If its value is small, the observations are not too different from each other, so there is a concentration of observations in the center.

If its value is high, the observations are very different from each other. They are widely spread out from the center.

Two classifications
Measure of absolute dispersion

Measure of relative dispersion

Measures of Absolute Dispersion


Measures of absolute dispersion are expressed in the units of the original observations. They cannot be used to compare the variation of two data sets when the averages of these data sets differ greatly in value or when the observations are in different units of measurement.

Range
The range of a set of measurements is the difference between the largest and the smallest values: R = Max - Min

The range is approximated from a frequency distribution by getting the difference between the upper class limit of the highest class interval and the lower class limit of the lowest class interval.

Variance and Standard deviation


The variance is used to describe the variation of the measurements in the collection. It is defined as the average squared difference of the observations from the mean. The standard deviation (SD) is the positive square root of the variance.

Population Var and SD


Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Where: $N$ is the total number of units in the population
$\mu$ is the population mean

Sample Var and SD


Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

Sample Standard Deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Where: $n$ is the total number of units in the sample
$\bar{X}$ is the sample mean

Steps for the sample variance


1. Get the difference between each observed value, $X_i$, and the sample mean: $(X_i - \bar{X})$
2. Square each of the differences: $(X_i - \bar{X})^2$
3. Sum the squares of the differences: $\sum_{i=1}^{n} (X_i - \bar{X})^2$

Steps for the sample variance


4. Divide the sum of the squared differences by $n - 1$: $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
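
The four steps translate almost line for line into Python. The sketch below is illustrative only: the function name `sample_variance` and the five data values are made up, not taken from the slides.

```python
def sample_variance(xs):
    """Sample variance computed by following the four steps in order."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]      # step 1: X_i minus the sample mean
    squared = [d ** 2 for d in deviations]   # step 2: square each difference
    total = sum(squared)                     # step 3: sum of squared differences
    return total / (n - 1)                   # step 4: divide by n - 1


values = [2, 4, 4, 6, 9]    # hypothetical sample with mean 5
print(sample_variance(values))
```

Here the squared differences are 9, 1, 1, 1 and 16, which sum to 28, so the sample variance is 28 / 4 = 7.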

Example
The following scores were given by the 6 judges for a gymnast's performance: 7, 5, 7, 9, 8, 6. Find the standard deviation.

$\mu = \frac{7 + 5 + 7 + 9 + 8 + 6}{6} = 7$

$\sigma = \sqrt{\frac{(7 - 7)^2 + (5 - 7)^2 + (7 - 7)^2 + (9 - 7)^2 + (8 - 7)^2 + (6 - 7)^2}{6}} = \sqrt{\frac{10}{6}} \approx 1.3$

Example
A sample of 5 households showed the following number of household members: 3, 8, 5, 4, 4. Find the standard deviation.

$\bar{X} = \frac{3 + 8 + 5 + 4 + 4}{5} = 4.8$

$s = \sqrt{\frac{(3 - 4.8)^2 + (8 - 4.8)^2 + (5 - 4.8)^2 + (4 - 4.8)^2 + (4 - 4.8)^2}{5 - 1}} = \sqrt{\frac{14.8}{4}} \approx 1.9$
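
Both worked examples can be checked with Python's standard `statistics` module, where `pstdev` divides by N (population) and `stdev` divides by n - 1 (sample). The variable names below are only illustrative.

```python
import statistics

# Judges' scores, treated as a population (divide by N)
judge_scores = [7, 5, 7, 9, 8, 6]
population_sd = statistics.pstdev(judge_scores)

# Household sizes, treated as a sample (divide by n - 1)
household_sizes = [3, 8, 5, 4, 4]
sample_sd = statistics.stdev(household_sizes)

print(round(population_sd, 1), round(sample_sd, 1))   # prints: 1.3 1.9
```

The choice between the two functions mirrors the choice between the population and sample formulas.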

Interpretation
A small variance/SD indicates that the observations are highly concentrated near the mean.

A large variance/SD indicates that, on average, the observations are far from the mean.

Computational formula for Var and SD


Variance: $s^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)}$

Standard Deviation: $s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n(n - 1)}}$
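
As a quick check that the computational formula agrees with the definitional one, the sketch below evaluates both on the household data (3, 8, 5, 4, 4). The function names are made up for illustration.

```python
import math


def variance_definitional(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_computational(xs):
    """[n * sum(X^2) - (sum X)^2] / [n(n - 1)], with no deviations needed."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


household_sizes = [3, 8, 5, 4, 4]
v_def = variance_definitional(household_sizes)
v_comp = variance_computational(household_sizes)
print(v_def, v_comp, math.sqrt(v_comp))
```

Here the sum of the values is 24 and the sum of squares is 130, so the computational formula gives (650 - 576) / 20 = 3.7, the same variance as 14.8 / 4.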

Getting the Var from the FDT


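$s^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}$

$s^2 = \frac{n \sum_{i=1}^{k} f_i X_i^2 - \left( \sum_{i=1}^{k} f_i X_i \right)^2}{n(n - 1)}$

The standard deviation is the positive square root of either expression.

Where: $f_i$ is the frequency of the ith class
$X_i$ is the class mark of the ith class
$\bar{X}$ is the mean of the FDT
$n$ is the total number of observations
$k$ is the number of classes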

Steps for sample variance in FDT


1. Get the sum of the class marks $X_i$, each weighted by its class frequency $f_i$: $\sum_{i=1}^{k} f_i X_i$

2. Square each of the class marks $X_i$: $X_i^2$
Steps for sample variance in FDT


3. Get the sum of the squared class marks, each weighted by its class frequency: $\sum_{i=1}^{k} f_i X_i^2$

4. Plug the sums obtained in steps 1 and 3 into the formula.
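
A minimal Python sketch of these steps is given below. It reuses the class marks and frequencies of the FDT from the percentile example purely for illustration (the slides do not work this case), and the function name `grouped_variance` is an assumption.

```python
def grouped_variance(class_marks, freqs):
    """Sample variance from an FDT using the computational formula."""
    n = sum(freqs)                                                # total observations
    sum_fx = sum(f * x for f, x in zip(freqs, class_marks))       # step 1
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))  # steps 2 and 3
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))            # step 4


class_marks = [17.5, 19.5, 21.5, 23.5, 25.5, 27.5, 29.5]
freqs = [1, 2, 3, 3, 12, 7, 6]

s2 = grouped_variance(class_marks, freqs)
print(round(s2, 2), round(s2 ** 0.5, 2))
```

For this table the weighted sum of class marks is 867 and the weighted sum of squares is 22428.5, so the variance is 10880 / 1122, about 9.70, and the standard deviation is about 3.11.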

Some characteristics of the SD


It is affected by the value of every observation. It may be distorted by a few extreme values.

It cannot be computed from an open-ended distribution.

Some characteristics of the SD


If each observation of a set of data is transformed to a new set by the addition (or subtraction) of a constant c, the standard deviation of the new set of data is the same as the standard deviation of the original data set.

Some characteristics of the SD


If a set of data is transformed to a new set by multiplying (or dividing) each observation by a constant c, the standard deviation of the new data set is equal to the standard deviation of the original data set multiplied (or divided) by |c|.
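
The shift and scale properties are easy to verify numerically. The sketch below uses the judges' scores as illustrative data and an arbitrary constant c = 3; for a negative constant the SD scales by its absolute value, since an SD is never negative.

```python
import math
import statistics

data = [7, 5, 7, 9, 8, 6]    # illustrative values
c = 3                        # arbitrary constant chosen for the demonstration

sd = statistics.stdev(data)
sd_shifted = statistics.stdev([x + c for x in data])    # add a constant
sd_scaled = statistics.stdev([x * c for x in data])     # multiply by c
sd_negated = statistics.stdev([x * -c for x in data])   # multiply by -c

print(math.isclose(sd_shifted, sd))         # adding c leaves the SD unchanged
print(math.isclose(sd_scaled, c * sd))      # multiplying by c scales the SD by c
print(math.isclose(sd_negated, c * sd))     # multiplying by -c scales it by |c|
```

All three comparisons print True.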

Measures of Relative dispersion


Useful in comparing the variability of two or more data sets. These data sets can even have different means and units of measurement.

The comparison is feasible since measures of relative dispersion are unit-less.

Coefficient of variation
The ratio of the SD to the mean of a data set. It is expressed as a percentage.

Population: $CV = \frac{\sigma}{\mu} \times 100\%$

Sample: $CV = \frac{s}{\bar{X}} \times 100\%$

Example
In 1992, the Bangko Sentral ng Pilipinas (BSP) put the peso on a floating rate basis. Given are the means and standard deviations of the quarterly P-$ exchange rate for the periods 1989 to 1991 and 1992 to 1994. Which of the two periods is more stable?

Interpretation
A large coefficient of variation indicates that the data set is highly variable.

A small CV indicates less variability in the data set.
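
To illustrate how the CV allows comparisons across units, the sketch below computes the sample CV for two made-up data sets, one in centimetres and one in kilograms. The data and the function name `coefficient_of_variation` are hypothetical and are not the exchange-rate figures from the example.

```python
import statistics


def coefficient_of_variation(sample):
    """Sample CV in percent: (s / sample mean) * 100."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100


heights_cm = [150, 160, 170, 180, 190]   # hypothetical data
weights_kg = [50, 60, 65, 80, 95]        # hypothetical data

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

# The data set with the larger CV is relatively more variable
more_variable = "weights" if cv_weight > cv_height else "heights"
print(round(cv_height, 1), round(cv_weight, 1), more_variable)
```

Because each CV is a pure percentage, the two values can be compared directly even though the original units differ.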

The Standard Score


The z-score/standard score measures how many standard deviations an observed value is above or below the mean.

Population: $Z = \frac{X - \mu}{\sigma}$

Sample: $Z = \frac{X - \bar{X}}{s}$

In other words
The z-score helps determine whether the observed value is above or below the mean, and by how much.
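
A small sketch of the standard score follows. The scores, means and standard deviations are invented for illustration and do not come from the slides; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical results in two subjects with different means and spreads
z_subject_a = z_score(85, mean=80, sd=10)
z_subject_b = z_score(72, mean=68, sd=8)

print(z_subject_a, z_subject_b)   # prints: 0.5 0.5
```

Although the raw scores differ, both lie half a standard deviation above their respective means, so relative to the other values they are equally good.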

Example

If we consider the grades of other students in the two subjects, Robert's score in Stat 101 is just as good as his score in Econ 11. Based on the Z scores, Robert's scores in both subjects are 0.5 standard deviations above the mean.

Remarks
The standard score is not a measure of relative dispersion per se, but it is related.

It is useful for comparing two values from different series, especially when these two series differ with respect to the mean or standard deviation or both, or are expressed in different units.

MEASURES OF SKEWNESS

Review
Measures of central tendency are single figures that can represent the other numbers in the data set (central figure).

Measures of location help determine the relative position of any observation in the distribution.

Measures of dispersion describe the spread of the values about the central figure.

What is it?
A measure of skewness shows the shape of the graph (relative frequency distribution) of your data set.

It indicates not only the amount of skewness but also the direction.

Two types of skewness


Positively skewed or skewed to the right

distribution tapers more to the right than to the left
longer tail to the right
more concentration of values below than above the mean

Two types of skewness


Negatively skewed or skewed to the left

distribution tapers more to the left than to the right
longer tail to the left
more concentration of values above than below the mean

Common measures of skewness


Pearson's first coefficient of skewness: $Sk = \frac{\bar{X} - Mo}{s}$

Pearson's second coefficient of skewness: $Sk = \frac{3(\bar{X} - Md)}{s}$

Interpretation

Example
Given a distribution with the following measures of central tendency and dispersion. What is the shape of the distribution?

Remarks
Since the mode is frequently only an approximation, formula 2 is preferred.
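
Both Pearson coefficients are simple to compute once the mean, median, mode and SD are known. The sketch below uses made-up summary values, not the ones from the slide example, and the function names are illustrative.

```python
def pearson_first(mean, mode, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_second(mean, median, sd):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


# Hypothetical summary measures of one distribution
mean, median, mode, sd = 50, 47, 45, 10

sk1 = pearson_first(mean, mode, sd)
sk2 = pearson_second(mean, median, sd)
print(sk1, sk2)   # prints: 0.5 0.9
```

Both values are positive because the mean exceeds the median and the mode, which points to a distribution skewed to the right; a negative value would point to skewness to the left.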

Kurtosis

THE BOXPLOT

What is it?
The boxplot is a graph that is very useful for displaying the following features of the data: location, spread, symmetry, extremes and outliers.

Steps for Box Plot construction


1. Construct a rectangle with one end at the first quartile and the other end at the third quartile.
2. Put a vertical line across the interior of the rectangle at the median.
3. Compute the interquartile range (IQR), lower fence (FL) and upper fence (FU), given by:
IQR = Q3 - Q1
FL = Q1 - 1.5 IQR
FU = Q3 + 1.5 IQR

Steps for Box Plot construction


4. Locate the smallest value contained in the interval [FL, Q1]. Draw a line from this value to Q1.
5. Locate the largest value contained in the interval [Q3, FU]. Draw a line from this value to Q3.
6. Values falling outside the fences are considered outliers and are usually denoted by x.
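
The numbers needed to draw a box plot can be computed as sketched below. This is a rough illustration under stated assumptions: quartiles are taken with the i(n + 1)/100 position rule, the data set is invented, and the names `percentile` and `boxplot_summary` are hypothetical. The sketch does not draw anything.

```python
def percentile(xs, i):
    """i-th percentile of sorted data using the i(n + 1)/100 position rule."""
    n = len(xs)
    pos = i * (n + 1) / 100
    j = int(pos)                  # integer part of the position
    g = pos - j                   # decimal part of the position
    if j < 1:                     # assumption: clamp at the ends of the array
        return xs[0]
    if j >= n:
        return xs[-1]
    return (1 - g) * xs[j - 1] + g * xs[j]


def boxplot_summary(data):
    """Quartiles, fences, whisker ends and outliers for a box plot."""
    xs = sorted(data)
    q1, md, q3 = (percentile(xs, p) for p in (25, 50, 75))
    iqr = q3 - q1
    fl = q1 - 1.5 * iqr           # lower fence
    fu = q3 + 1.5 * iqr           # upper fence
    return {
        "Q1": q1, "Md": md, "Q3": q3, "IQR": iqr, "FL": fl, "FU": fu,
        # whiskers end at the most extreme values still inside the fences
        "lower_whisker": min((x for x in xs if fl <= x <= q1), default=q1),
        "upper_whisker": max((x for x in xs if q3 <= x <= fu), default=q3),
        "outliers": [x for x in xs if x < fl or x > fu],
    }


values = [1, 12, 13, 14, 15, 16, 17, 18, 19, 20, 40]   # hypothetical data
summary = boxplot_summary(values)
print(summary)
```

For this data set Q1 = 13, Md = 16 and Q3 = 19, so the fences are 4 and 28; the whiskers end at 12 and 20, and the values 1 and 40 are flagged as outliers.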

Examples

Let's construct the box plot for this data set.
