Descriptive Stat 3

MEASURES OF LOCATION
Measures of location
A measure of location tells us the percentage of observations in the collection whose values are less than or equal to it.
Measures of location
1. Percentiles divide the set of observations into 100 equal parts.
2. Deciles divide the set of observations into 10 equal parts.
3. Quartiles divide the set of observations into 4 equal parts.
Percentiles
P1: 1% of the observations fall below the first percentile.
P2: 2% of the observations fall below the second percentile.
...
P99: 99% of the observations fall below the 99th percentile.

Percentiles
$P_i = \left(\frac{i(n+1)}{100}\right)$th observation in the array
*Sometimes k is used instead of i; the two are interchangeable.
Example
52 52 52 54 55 56 56 57 57 59 59 59 60 61 62 64 65 65 65 66 69 69 70 72 72 75 75 75 75 76 76 77 79 79 83 85 87 87 87 88 89 91 94 95 95 95 97 98 99
$P_{20} = \left(\frac{20(49+1)}{100}\right)$th observation $= X_{(10)} = 59$

20% of the students in the sample have scores lower than 59
Percentiles
When $\frac{i(n+1)}{100}$ is not an integer, we interpolate:
1. Let j be the integer part of $\frac{i(n+1)}{100}$ and g be the decimal part.
2. Compute the ith percentile using the formula:
$P_i = (1-g)X_{(j)} + gX_{(j+1)}$
52 52 52 54 55 56 56
57 57 59 59 59 60 61
62 64 65 65 65 66 69
69 70 72 72 75 75 75
75 76 76 77 79 79 83
85 87 87 87 88 89 91
94 95 95 95 97 98 99
Position of $P_{85}$: $\frac{i(n+1)}{100} = \frac{85(49+1)}{100} = 42.5$

$j = 42$, $g = 0.5$, $X_{(42)} = 91$, $X_{(43)} = 94$
$P_i = (1-g)X_{(j)} + gX_{(j+1)}$
$P_{85} = (1-0.5)(91) + 0.5(94) = 92.5$

85% of the students in the sample have scores lower than 92.5.
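The positional rule $i(n+1)/100$ and the interpolation step can be sketched in Python as below. This is a minimal sketch that assumes the data are already sorted in ascending order; the function name `percentile_n_plus_1` is hypothetical, and clamping positions below 1 or above n to the smallest or largest observation is an assumption, since the rule above does not say what to do there.

```python
import math


def percentile_n_plus_1(sorted_data, i):
    """Return the ith percentile of ascending data using position i(n + 1)/100.

    Positions below 1 or above n are clamped to the smallest or largest
    observation (an assumption; the rule itself does not cover them).
    """
    n = len(sorted_data)
    pos = i * (n + 1) / 100   # 1-based position in the ordered array
    j = math.floor(pos)       # integer part
    g = pos - j               # decimal part
    if j < 1:
        return sorted_data[0]
    if j >= n:
        return sorted_data[-1]
    # X(j) is sorted_data[j - 1] because Python lists are 0-based.
    return (1 - g) * sorted_data[j - 1] + g * sorted_data[j]


scores = [52, 52, 52, 54, 55, 56, 56, 57, 57, 59, 59, 59, 60, 61,
          62, 64, 65, 65, 65, 66, 69, 69, 70, 72, 72, 75, 75, 75,
          75, 76, 76, 77, 79, 79, 83, 85, 87, 87, 87, 88, 89, 91,
          94, 95, 95, 95, 97, 98, 99]

print(percentile_n_plus_1(scores, 20))  # 59.0  (position 10, no interpolation)
print(percentile_n_plus_1(scores, 85))  # 92.5  (position 42.5, interpolated)
```

When the position is an integer, g = 0 and the formula reduces to $X_{(j)}$, so the integer case needs no separate branch.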
Deciles
D1: 10% of the observations fall below the first decile.
D2: 20% of the observations fall below the second decile.
...
D9: 90% of the observations fall below the 9th decile.

Quartiles
Q1: 25% of the observations fall below the first quartile.
Q2: 50% of the observations fall below the second quartile.
Q3: 75% of the observations fall below the third quartile.

Median, Deciles, Quartiles

The median, deciles and quartiles are special cases of percentiles:
$Q_i = P_{25i}$
$D_i = P_{10i}$
$Md = D_5 = Q_2 = P_{50}$
Example
$D_2 = P_{20}$, $D_7 = P_{70}$, $Q_1 = P_{25}$, $Q_2 = P_{50}$
Getting Percentiles from the FDT

1. Construct the <CF column.
2. Starting at the top, locate the first class whose <CF is greater than or equal to in/100. This is the Pi-th class.
3. Approximate Pi using the formula
$P_i = LCB_{P_i} + c\left(\frac{in/100 - \text{<CF}_{P_i-1}}{f_{P_i}}\right)$
where:
$\text{<CF}_{P_i-1}$ = <CF of the class preceding the Pi-th class
$c$ = class size
$n$ = total number of observations in the FDT
$LCB_{P_i}$ = lower class boundary of the Pi-th class
$f_{P_i}$ = frequency of the Pi-th class
Class     Frequency   <CF   >CF   LCB - UCB     CM
17 - 18   1           1     34    16.5 - 18.5   17.5
19 - 20   2           3     33    18.5 - 20.5   19.5
21 - 22   3           6     31    20.5 - 22.5   21.5
23 - 24   3           9     28    22.5 - 24.5   23.5
25 - 26   12          21    25    24.5 - 26.5   25.5
27 - 28   7           28    13    26.5 - 28.5   27.5
29 - 30   6           34    6     28.5 - 30.5   29.5
P25 = ?
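$in/100 = (25)(34)/100 = 8.5$. The first class with <CF ≥ 8.5 is 23 - 24 (<CF = 9), so it is the P25 class.
$\text{<CF}_{P_i-1} = 6$, $f_{P_i} = 3$, $c = 2$, $LCB_{P_i} = 22.5$
$P_{25} = LCB_{P_i} + c\left(\frac{in/100 - \text{<CF}_{P_i-1}}{f_{P_i}}\right) = 22.5 + 2\left(\frac{8.5 - 6}{3}\right) \approx 24.17$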
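The three steps for grouped data can be sketched in Python as follows. It is a minimal sketch that assumes classes are listed from lowest to highest as (lower class boundary, frequency) pairs sharing a common class size c; the function name `grouped_percentile` is hypothetical. The example reproduces P25 from the table above.

```python
def grouped_percentile(classes, c, i):
    """Approximate the ith percentile from an FDT.

    classes: list of (lower class boundary, frequency), lowest class first.
    c: class size (assumed equal for all classes).
    """
    n = sum(f for _, f in classes)
    target = i * n / 100            # in/100
    cf_prev = 0                     # <CF of the class preceding the current one
    for lcb, f in classes:
        if cf_prev + f >= target:   # first class whose <CF >= in/100
            return lcb + c * (target - cf_prev) / f
        cf_prev += f
    raise ValueError("i must be between 0 and 100")


fdt = [(16.5, 1), (18.5, 2), (20.5, 3), (22.5, 3),
       (24.5, 12), (26.5, 7), (28.5, 6)]

print(round(grouped_percentile(fdt, 2, 25), 2))  # 24.17
```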
MEASURES OF DISPERSION
What is it?
A measure of dispersion is a summary measure that characterizes a data set in terms of how much the observations vary from each other.
Measure of dispersion
If its value is small, the observations are not very different from each other, so they are concentrated in the center.
If its value is large, the observations are very different from each other; they are widely spread out from the center.
Two classifications:
1. Measures of absolute dispersion
2. Measures of relative dispersion
Measures of Absolute Dispersion

Measures of absolute dispersion are expressed in the units of the original observations. They cannot be used to compare the variation of two data sets when the averages of these data sets differ a lot in value or when the observations differ in units of measurement.
Range
The range of a set of measurements is the difference between the largest and the smallest values: $R = \text{Max} - \text{Min}$.
The range is approximated from a frequency distribution by taking the difference between the upper class limit of the highest class interval and the lower class limit of the lowest class interval.
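A minimal Python sketch of both versions of the range, reusing the 49 scores from the percentile example and the class limits from the FDT example; the function names are hypothetical.

```python
def data_range(values):
    """R = Max - Min for raw (ungrouped) observations."""
    return max(values) - min(values)


def fdt_range(class_limits):
    """Approximate range of an FDT: upper limit of the highest class minus
    lower limit of the lowest class. class_limits holds
    (lower limit, upper limit) pairs, lowest class first."""
    return class_limits[-1][1] - class_limits[0][0]


scores = [52, 52, 52, 54, 55, 56, 56, 57, 57, 59, 59, 59, 60, 61,
          62, 64, 65, 65, 65, 66, 69, 69, 70, 72, 72, 75, 75, 75,
          75, 76, 76, 77, 79, 79, 83, 85, 87, 87, 87, 88, 89, 91,
          94, 95, 95, 95, 97, 98, 99]
limits = [(17, 18), (19, 20), (21, 22), (23, 24), (25, 26), (27, 28), (29, 30)]

print(data_range(scores))  # 99 - 52 = 47
print(fdt_range(limits))   # 30 - 17 = 13
```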
Variance and Standard deviation

The variance describes the variation of the measurements in the collection. It is defined as the average squared difference of the observations from the mean. The standard deviation (SD) is the positive square root of the variance.
Population Var and SD

Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$
Where: N is the total number of units in the population and $\mu$ is the population mean.
Sample Var and SD

Sample Variance: $s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$
Sample Standard Deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$
Where: n is the total number of units in the sample and $\bar{X}$ is the sample mean.
Steps for the sample variance

1. Get the difference between each observed value $X_i$ and the sample mean: $(X_i - \bar{X})$.
2. Square each of the differences: $(X_i - \bar{X})^2$.
3. Sum the squared differences: $\sum_{i=1}^{n}(X_i - \bar{X})^2$.
4. Divide the sum of the squared differences by n - 1: $s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$.
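The four steps translate directly into a short Python function. This is a minimal sketch; the function name `sample_variance` is hypothetical, and the data are the household sizes from the example below.

```python
def sample_variance(values):
    """Sample variance computed with the four steps listed above."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # step 1: X_i - X-bar
    squared = [d ** 2 for d in deviations]    # step 2: (X_i - X-bar)^2
    total = sum(squared)                      # step 3: sum of squares
    return total / (n - 1)                    # step 4: divide by n - 1


households = [3, 8, 5, 4, 4]
var = sample_variance(households)
sd = var ** 0.5
print(round(var, 2), round(sd, 1))  # 3.7 1.9
```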
Example
The following scores were given by 6 judges for a gymnast's performance: 7, 5, 7, 9, 8, 6. Find the standard deviation.
$\mu = \frac{7+5+7+9+8+6}{6} = 7$
$\sigma = \sqrt{\frac{(7-7)^2 + (5-7)^2 + (7-7)^2 + (9-7)^2 + (8-7)^2 + (6-7)^2}{6}} = \sqrt{\frac{10}{6}} \approx 1.3$
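Because the computation divides by 6 rather than 5, the six judges' scores are treated as a population. The sketch below repeats the same arithmetic by hand and cross-checks it with `statistics.pstdev` from Python's standard library, which divides by N (whereas `statistics.stdev` divides by n - 1).

```python
import math
import statistics

scores = [7, 5, 7, 9, 8, 6]
N = len(scores)
mu = sum(scores) / N                                     # 42 / 6 = 7.0
sum_sq = sum((x - mu) ** 2 for x in scores)              # 0+4+0+4+1+1 = 10.0
sigma = math.sqrt(sum_sq / N)                            # sqrt(10 / 6)

print(round(sigma, 1))                                   # 1.3
print(math.isclose(sigma, statistics.pstdev(scores)))    # True
```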
Example
A sample of 5 households showed the following number of household members: 3, 8, 5, 4, 4. Find the standard deviation.
$\bar{X} = \frac{3+8+5+4+4}{5} = 4.8$
$s = \sqrt{\frac{(3-4.8)^2 + (8-4.8)^2 + (5-4.8)^2 + (4-4.8)^2 + (4-4.8)^2}{5-1}} = \sqrt{\frac{14.8}{4}} \approx 1.9$
Interpretation
A small variance/SD indicates that the observations are highly concentrated near the mean.
A large variance/SD indicates that, on average, the observations are far from, or very different from, the mean.
Computational formula for Var and SD

Variance: $s^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$
Standard Deviation: $s = \sqrt{\frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}}$
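As a check, the computational formula can be applied to the household data from the earlier example; it should give the same variance as the definitional formula. A minimal Python sketch, with a hypothetical function name:

```python
import math


def shortcut_variance(values):
    """Sample variance via [n*sum(X^2) - (sum X)^2] / [n(n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


households = [3, 8, 5, 4, 4]
var = shortcut_variance(households)   # (5*130 - 24**2) / (5*4) = 74 / 20
print(var, round(math.sqrt(var), 1))  # 3.7 1.9
```

The shortcut avoids computing every deviation, which is convenient by hand. In floating-point code with large values it can lose precision through cancellation, so software often prefers the deviation form.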
Getting the Var from the FDT

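$s^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n-1}$
$s^2 = \frac{n\sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n-1)}$
Where: $f_i$ is the frequency of the ith class, $X_i$ is the class mark of the ith class, $\bar{X}$ is the mean of the FDT, k is the number of classes, and n is the total number of observations.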
Steps for sample variance in FDT

1. Multiply each class mark $X_i$ by its frequency $f_i$ and get the sum of the products: $\sum_{i=1}^{k} f_i X_i$
2. Square each class mark and multiply it by its frequency: $f_i X_i^2$
3. Get the sum of these products: $\sum_{i=1}^{k} f_i X_i^2$
4. Plug the sums obtained in steps 1 and 3 into the formula.
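Following these steps, here is a minimal Python sketch applied to the class marks and frequencies of the FDT used for P25 earlier (reused only as an illustration); the variable names are our own. It also cross-checks the result against the definitional form.

```python
import math

# Class marks X_i and frequencies f_i from the FDT in the percentile section
marks = [17.5, 19.5, 21.5, 23.5, 25.5, 27.5, 29.5]
freqs = [1, 2, 3, 3, 12, 7, 6]

n = sum(freqs)                                             # 34
sum_fx = sum(f * x for f, x in zip(freqs, marks))          # step 1
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, marks))    # steps 2 and 3
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))           # step 4
s = math.sqrt(s2)

# Cross-check with the definitional form sum f_i (X_i - X-bar)^2 / (n - 1)
mean = sum_fx / n
s2_def = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
print(round(s2, 2), round(s, 2), math.isclose(s2, s2_def))  # 9.7 3.11 True
```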
Some characteristics of the SD

It is affected by the value of every observation, so it may be distorted by a few extreme values.

It cannot be computed from an open-ended distribution.

If each observation of a data set is transformed by adding (or subtracting) a constant c, the standard deviation of the new data set is the same as the standard deviation of the original data set.

If each observation of a data set is multiplied (or divided) by a constant c, the standard deviation of the new data set equals the standard deviation of the original data set multiplied (or divided) by |c|, the absolute value of c.
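The two transformation rules can be checked numerically. This sketch uses Python's standard `statistics.stdev` on the household data from the earlier example with an arbitrary constant c = 10; the comparisons use `math.isclose` because floating-point results may differ in the last digit. When the multiplier is negative, the SD is scaled by its absolute value, since an SD is never negative.

```python
import math
import statistics

data = [3, 8, 5, 4, 4]   # household sizes from the earlier example
c = 10

sd = statistics.stdev(data)
sd_shifted = statistics.stdev([x + c for x in data])   # add a constant
sd_scaled = statistics.stdev([x * c for x in data])    # multiply by a constant
sd_neg = statistics.stdev([x * -c for x in data])      # negative multiplier

print(math.isclose(sd_shifted, sd))          # True: shifting leaves the SD unchanged
print(math.isclose(sd_scaled, c * sd))       # True: SD is multiplied by c
print(math.isclose(sd_neg, abs(-c) * sd))    # True: SD is multiplied by |-c|
```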
Measures of Relative dispersion

Useful in comparing the variability of two or more data sets, even when these data sets have different means or units of measurement.
The comparison is feasible because measures of relative dispersion are unitless.
Coefficient of variation
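The coefficient of variation (CV) is the ratio of the SD to the mean of a data set, expressed as a percentage:
$CV = \frac{\sigma}{\mu} \times 100\%$ (population)
$CV = \frac{s}{\bar{X}} \times 100\%$ (sample)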
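The BSP exchange-rate figures are not reproduced in these notes, so this minimal Python sketch applies the sample CV to the household data from the variance example instead. The function name is hypothetical; it raises an error for a zero mean, where the ratio is undefined.

```python
import statistics


def coefficient_of_variation(values):
    """Sample coefficient of variation, in percent: s / X-bar * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


households = [3, 8, 5, 4, 4]
print(round(coefficient_of_variation(households), 1))   # 40.1
```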
Example
In 1992, Bangko Sentral ng Pilipinas (BSP) put the peso on a floating-rate basis. Given are the means and standard deviations of the quarterly P-$ exchange rate for the periods 1989 to 1991 and 1992 to 1994. Which of the two periods is more stable?
Interpretation
A large coefficient of variation indicates that the data set is highly variable. A small CV indicates less variability in the data set.
The Standard Score

The z-score (standard score) measures how many standard deviations an observed value is above or below the mean:
$Z = \frac{X - \mu}{\sigma}$ (population)
$Z = \frac{X - \bar{X}}{s}$ (sample)
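Robert's raw scores are not reproduced in these notes, so this minimal Python sketch computes population z-scores for the gymnast scores instead. `z_score` is a hypothetical helper; `statistics.fmean` and `statistics.pstdev` come from Python's standard library (Python 3.8+).

```python
import statistics


def z_score(x, mean, sd):
    """How many SDs x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd


judges = [7, 5, 7, 9, 8, 6]          # population from the gymnast example
mu = statistics.fmean(judges)        # 7.0
sigma = statistics.pstdev(judges)    # population SD, about 1.29

for score in (5, 7, 9):
    print(score, round(z_score(score, mu, sigma), 2))
```

A score of 9 lies about 1.55 standard deviations above the mean, and a score of 5 lies the same distance below it.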
In other words
The z-score helps determine whether the observed value is above or below the mean, and by how far.
Example
Considering the grades of the other students in the two subjects, Robert's score in Stat 101 is just as good as his score in Econ 11: based on the z-scores, Robert's scores in both subjects are 0.5 standard deviations above the mean.
Remarks
The standard score is not a measure of relative dispersion per se, but it is related to one.
It is useful for comparing two values from different series, especially when the two series differ in mean, standard deviation, or both, or are expressed in different units.
MEASURES OF SKEWNESS
Review
Measures of central tendency are single figures (central figures) that can represent the other numbers in the data set.
Measures of location help determine the relative position of any observation in the distribution.
Measures of dispersion describe the spread of the values about the central figure.
What is it?
A measure of skewness describes the shape of the graph (relative frequency distribution) of a data set. It indicates not only the amount of skewness but also its direction.
Two types of skewness

Positively skewed or skewed to the right: the distribution tapers more to the right than to the left, has a longer tail to the right, and has more values concentrated below the mean than above it.

Negatively skewed or skewed to the left: the distribution tapers more to the left than to the right, has a longer tail to the left, and has more values concentrated above the mean than below it.
Common measures of skewness

Pearson's first coefficient of skewness: $Sk = \frac{\bar{X} - Mo}{s}$
Pearson's second coefficient of skewness: $Sk = \frac{3(\bar{X} - Md)}{s}$
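The example's measures are not reproduced in these notes, so this minimal Python sketch computes both Pearson coefficients for the household data from the variance example (mean 4.8, median 4, mode 4), using Python's standard `statistics` module.

```python
import statistics

households = [3, 8, 5, 4, 4]
mean = statistics.mean(households)      # 4.8
median = statistics.median(households)  # 4
mode = statistics.mode(households)      # 4
s = statistics.stdev(households)        # sample SD

sk1 = (mean - mode) / s          # Pearson's first coefficient
sk2 = 3 * (mean - median) / s    # Pearson's second coefficient
print(round(sk1, 2), round(sk2, 2))
```

Both coefficients come out positive, which points to a distribution skewed to the right: the single large value 8 pulls the mean above the median and the mode.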
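Interpretation
Sk > 0: the distribution is positively skewed (skewed to the right).
Sk < 0: the distribution is negatively skewed (skewed to the left).
Sk = 0: the distribution is symmetric.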
Example
Given a distribution with the following measures of central tendency and dispersion, what is the shape of the distribution?
Remarks
Since the mode is frequently only an approximation, the second coefficient (formula 2) is preferred.
Kurtosis
THE BOXPLOT
What is it?
The boxplot is a graph that is very useful for displaying the following features of the data: location, spread, symmetry, extremes and outliers.
Steps for Box Plot construction

1. Construct a rectangle with one end at the first quartile and the other end at the third quartile.
2. Put a vertical line across the interior of the rectangle at the median.
3. Compute the interquartile range (IQR), lower fence (FL) and upper fence (FU), given by:
IQR = Q3 - Q1
FL = Q1 - 1.5 IQR
FU = Q3 + 1.5 IQR
4. Locate the smallest value contained in the interval [FL, Q1]. Draw a line from this value to Q1.
5. Locate the largest value contained in the interval [Q3, FU]. Draw a line from this value to Q3.
6. Values falling outside the fences are considered outliers and are usually denoted by x.
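The construction steps can be sketched in Python to get the numbers a boxplot needs. The dataset for the example below is not reproduced in these notes, so this sketch reuses the 49 scores from the percentile example as a stand-in. Quartiles follow the i(n + 1)/100 rule from the percentile section; other software may use a different quartile convention and give slightly different fences. Both function names are hypothetical, and the end clamping and the fallback when no value lies between a fence and its quartile are assumptions.

```python
import math


def percentile_n_plus_1(sorted_data, i):
    """ith percentile using the position i(n + 1)/100 with interpolation."""
    n = len(sorted_data)
    pos = i * (n + 1) / 100
    j = math.floor(pos)
    g = pos - j
    if j < 1:                      # clamp at the ends (an assumption)
        return sorted_data[0]
    if j >= n:
        return sorted_data[-1]
    return (1 - g) * sorted_data[j - 1] + g * sorted_data[j]


def box_plot_summary(data):
    """Quartiles, fences, whisker ends and outliers for a boxplot."""
    x = sorted(data)
    q1, md, q3 = (percentile_n_plus_1(x, p) for p in (25, 50, 75))
    iqr = q3 - q1
    fl, fu = q1 - 1.5 * iqr, q3 + 1.5 * iqr                   # step 3
    low = min((v for v in x if fl <= v <= q1), default=q1)    # step 4
    high = max((v for v in x if q3 <= v <= fu), default=q3)   # step 5
    outliers = [v for v in x if v < fl or v > fu]             # step 6
    return {"Q1": q1, "Md": md, "Q3": q3, "IQR": iqr, "FL": fl,
            "FU": fu, "whisker_low": low, "whisker_high": high,
            "outliers": outliers}


scores = [52, 52, 52, 54, 55, 56, 56, 57, 57, 59, 59, 59, 60, 61,
          62, 64, 65, 65, 65, 66, 69, 69, 70, 72, 72, 75, 75, 75,
          75, 76, 76, 77, 79, 79, 83, 85, 87, 87, 87, 88, 89, 91,
          94, 95, 95, 95, 97, 98, 99]

print(box_plot_summary(scores))
```

For these scores the sketch gives Q1 = 59.5, Md = 72, Q3 = 87, fences at 18.25 and 128.25, whiskers from 52 to 99, and no outliers.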
Examples
Let's construct the boxplot for this data set.
