SUMMARY MEASURES
(1) Piles of raw data, by themselves, may not be informative, but when data are presented in
summary form, they may be much more interesting and meaningful to us. In most cases,
we need to summarize a given set of data rather than maintain the entire set. Single numbers
called summary (or descriptive) statistics can be calculated for such a purpose. Two kinds
of summary statistics are particularly important to most data users – measures of central
tendency and measures of variability.
(2)
(3) Measures of location summarize a data set by giving a “typical value” within the range of
the data values that describes its location relative to the entire data set.
(4) A measure of variation is a single value that is used to describe the spread of the distribution.
A measure of central tendency alone does not uniquely describe a distribution.
(5) A measure of skewness describes the degree of departure of the distribution of the data from
symmetry, and a measure of kurtosis describes the extent of peakedness or flatness of the
distribution of the data.
(6) Minimum is the smallest value in the data set, denoted by MIN. Maximum is the largest
value in the data set, denoted by MAX.
(7) Measures of Central tendency or location are values that are typical, or representative, of a
set of data that tend to lie centrally within a set of data arranged according to magnitude.
Measures of central tendency are also called averages.
(8) Arithmetic mean or simply the mean – is the most popular measure of central tendency. It is
the sum of a set of measurements divided by the number of measurements in the set.
Population mean – if the set of data $x_1, x_2, x_3, \ldots, x_N$, not necessarily all distinct, represents a
finite population of size N, then the population mean is
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Sample mean – if the set of data $x_1, x_2, x_3, \ldots, x_n$, not necessarily all distinct, represents a finite
sample of size n, the sample mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
(9) Properties of the Arithmetic Mean
1. May not be an actual observation in the data set.
2. Can be applied in at least an interval level of measurement.
3. Easy to compute.
4. Every observation contributes to the value of the mean.
5. Subgroup means can be combined to come up with a group mean.
6. Easily affected by extreme values.
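The definition of the mean and its sensitivity to extreme values (property 6) can be illustrated with a minimal Python sketch. The function name `arithmetic_mean` and the data values are hypothetical, chosen only for illustration.

```python
def arithmetic_mean(values):
    """Sum of the measurements divided by the number of measurements."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical data, not taken from the notes.
scores = [4, 6, 5, 5, 5]
mean_plain = arithmetic_mean(scores)                # 25 / 5
mean_with_outlier = arithmetic_mean(scores + [95])  # 120 / 6

print(mean_plain)         # 5.0
print(mean_with_outlier)  # 20.0
```

A single extreme observation moves the mean from 5 to 20, even though five of the six values lie between 4 and 6.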
Note: Sometimes we associate with the numbers $x_1, x_2, x_3, \ldots, x_n$ certain weighting factors (or
weights) $w_1, w_2, w_3, \ldots, w_n$ depending on the significance or importance attached to the
numbers. In this case, the weighted arithmetic mean is
$$\tilde{x} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$
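As a rough illustration of the weighted mean, the sketch below computes the sum of weight-times-value divided by the sum of the weights. The function name `weighted_mean` and the grades and units are hypothetical examples, not data from the notes.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical example: course grades weighted by number of units.
grades = [80, 90, 70]
units = [3, 5, 2]
weighted = weighted_mean(grades, units)         # (240 + 450 + 140) / 10
unweighted = weighted_mean(grades, [1, 1, 1])   # equal weights give the ordinary mean

print(weighted)    # 83.0
print(unweighted)  # 80.0
```

With equal weights the result reduces to the ordinary arithmetic mean, which is a quick sanity check on the formula.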
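(10) Median. In the formulas below, $X_i$ denotes the $i$th value after the data are arranged in order of magnitude.
a. Population median:
$$\bar{\mu} = \begin{cases} X_{\frac{N+1}{2}} & \text{if } N \text{ is odd} \\ \frac{1}{2}\left(X_{\frac{N}{2}} + X_{\frac{N}{2}+1}\right) & \text{if } N \text{ is even} \end{cases}$$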
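b. Sample median:
$$\bar{x} = \begin{cases} X_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(X_{\frac{n}{2}} + X_{\frac{n}{2}+1}\right) & \text{if } n \text{ is even} \end{cases}$$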
Properties of Median
1. May not be an actual observation in the data set.
2. Can be applied in at least an ordinal level of measurement.
3. A positional measure; may not be affected by extreme values.
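The odd and even cases of the median formula can be sketched in Python as follows. The function name `median_value` and the sample data are hypothetical; the formula's 1-based positions are translated to Python's 0-based indexing.

```python
def median_value(values):
    """Middle ordered value (n odd) or mean of the two middle ordered values (n even)."""
    ordered = sorted(values)   # arrange the data in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    # 1-based positions n / 2 and n / 2 + 1 are 0-based indices mid - 1 and mid
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_value([3, 1, 2]))        # 2
print(median_value([4, 1, 3, 2]))     # 2.5
print(median_value([1, 2, 3, 1000]))  # 2.5
```

The last two calls return the same median even though the largest value changes from 4 to 1000, which illustrates why a positional measure may not be affected by extreme values.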
(11) Mode is the value that appears most often, that is, the value with the greatest
frequency. The mode may not exist, and even if it does exist it may not be unique. A
distribution having only one mode is called unimodal.
Properties of the Mode
1. Can be used for qualitative as well as quantitative data.
2. May not be unique.
3. Not affected by extreme values.
4. Can be computed for ungrouped and grouped data.
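A minimal Python sketch of finding the mode for ungrouped data is given below. The function name `modes` is hypothetical. The sketch assumes one common convention for "the mode may not exist": when every distinct value occurs equally often (and there is more than one distinct value), it reports no mode. Other texts may treat that case differently.

```python
from collections import Counter


def modes(values):
    """Return all values with the greatest frequency, in order of first appearance."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: no mode when all distinct values are equally frequent.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]


print(modes([1, 2, 2, 3]))             # [2]  (unimodal)
print(modes([1, 1, 2, 2, 3]))          # [1, 2]  (mode is not unique)
print(modes([1, 2, 3]))                # []  (no mode under the assumed convention)
print(modes(["red", "blue", "red"]))   # ['red']  (qualitative data)
```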
(12) If a set of data is arranged in order of magnitude, the middle value (or arithmetic mean of
the two middle values) that divides the set into two equal parts is the median. By extending
this idea, we can think of those values which divide the set into four equal parts, 10 equal
parts and 100 equal parts; these are called quartiles, deciles and percentiles, respectively.
(13) Collectively, quartiles, deciles, percentiles and other values obtained by equal subdivisions
of the data are called quantiles.
(14) Percentiles – are values that divide an ordered set of observations into 100 equal parts. These
values, denoted by P1 , P2 , ..., P99 , are such that 1% of the data falls below P1 , 2% falls below P2 and
99% falls below P99 .
(15) Deciles – are values that divide an ordered set of observations into 10 equal parts. These
values, denoted by D1 , D2 , ..., D9 , are such that 10% of the data falls below D1 , 20% falls below
D2 and 90% falls below D9 .
(16) Quartiles – are values that divide an ordered set of observations into 4 equal parts. These
values denoted by Q1 , Q2 , Q3 are such that 25% of the data falls below Q1 , 50% falls below
Q2 and 75% falls below Q3 .
(17) Procedure to compute for these values.
Step 1. Arrange the data in an increasing order of magnitude.
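Step 2. Solve for the value of L, where n is the number of observations and m is the index of the
desired quantile (for example, m = 30 for P30):
$$L = \begin{cases} \dfrac{mn}{100}, & \text{percentiles} \\ \dfrac{mn}{10}, & \text{deciles} \\ \dfrac{mn}{4}, & \text{quartiles} \end{cases}$$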
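The computation of L in Step 2 can be sketched as below. This covers only the value of L, not how L is then used to read off the quantile. The names `quantile_position` and `DIVISORS` are hypothetical, and m is assumed to be the index of the desired quantile (for example, m = 3 for Q3) with n the number of observations.

```python
from fractions import Fraction

# Number of equal parts for each kind of quantile.
DIVISORS = {"percentile": 100, "decile": 10, "quartile": 4}


def quantile_position(m, n, kind):
    """Return L = m * n / parts as an exact fraction."""
    parts = DIVISORS[kind]
    if not 1 <= m < parts:
        raise ValueError(f"m must be between 1 and {parts - 1} for a {kind}")
    return Fraction(m * n, parts)


print(quantile_position(30, 20, "percentile"))  # 6
print(quantile_position(3, 20, "quartile"))     # 15
print(quantile_position(1, 10, "quartile"))     # 5/2
```

Using `Fraction` keeps L exact, so it is easy to tell whether L is a whole number.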
The population variance is:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
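For computational purposes, use the formula
$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$$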
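To check that the definitional and computational forms of the population variance agree, here is a small Python sketch. The function names and the data are hypothetical illustrations.

```python
def population_variance(values):
    """Definitional form: mean squared deviation from the population mean."""
    N = len(values)
    if N == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def population_variance_shortcut(values):
    """Computational form: (sum of squares - (sum)^2 / N) / N."""
    N = len(values)
    if N == 0:
        raise ValueError("the variance of an empty data set is undefined")
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return (total_of_squares - total ** 2 / N) / N


# Hypothetical population: sum = 40, sum of squares = 232, mean = 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))           # 4.0
print(population_variance_shortcut(data))  # 4.0
```

The computational form needs only the running sum and the running sum of squares, which is why it is convenient for hand calculation; with floating-point data that has a large mean and a small spread, the definitional form is generally the more numerically stable of the two.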
Sample Variance ($s^2$). Given the random sample $x_1, x_2, x_3, \ldots, x_n$, the sample variance is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
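For computational purposes, use the formula
$$s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)} \quad \text{or} \quad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$$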
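The sample variance formulas can be compared in a short Python sketch. The function names and data are hypothetical; `statistics.variance` from the Python standard library is used only as an independent cross-check, since it also divides by n − 1.

```python
import statistics


def sample_variance(values):
    """Definitional form: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Computational form: (n * sum of squares - (sum)^2) / (n * (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return (n * total_of_squares - total ** 2) / (n * (n - 1))


# Hypothetical sample: sum = 40, sum of squares = 232, so s^2 = 256 / 56 = 32 / 7.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(sample))
print(sample_variance_shortcut(sample))
print(statistics.variance(sample))
```

All three printed values should agree up to floating-point rounding (about 4.571 for this sample).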
IQR = Q3 − Q1
24. When data are presented in a frequency distribution, measures of central tendency and
measures of variation can be computed.
Arithmetic mean:
Note: The arithmetic mean cannot be computed from an open-ended frequency distribution.
Median:
Where Lm is the lower class boundary of the median class. The median class is the class interval
where the (n/2)th value falls.
Fm−1 is the cumulative frequency of the class interval immediately preceding the median class.
The median of grouped data can be calculated even with open-ended intervals provided the
median class is not open-ended.
Mode:
To locate the modal class, look at the highest number in the frequency column.
$$\text{Mode}_g = L_{mo} + c\left(\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right)$$
Where Lmo is the lower class boundary of the modal class. The modal class is the class interval
with the highest frequency.
f 1 is the frequency of the class interval immediately preceding the modal class.
f 2 is the frequency of the class interval immediately following the modal class.
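The grouped-data mode formula can be evaluated with a small Python sketch. The function name `grouped_mode` and the class figures are hypothetical. The surviving text does not define c, so the sketch assumes c is the class width (class size) and that f_mo is the frequency of the modal class.

```python
def grouped_mode(lower_boundary, class_width, f_mo, f_1, f_2):
    """Mode_g = L_mo + c * (f_mo - f_1) / (2 * f_mo - f_1 - f_2).

    class_width stands in for c (assumed to be the class width),
    f_mo is the modal class frequency, f_1 and f_2 are the frequencies
    of the classes immediately before and after the modal class.
    """
    denominator = 2 * f_mo - f_1 - f_2
    if denominator == 0:
        raise ValueError("formula undefined: modal class is not higher than its neighbours")
    return lower_boundary + class_width * (f_mo - f_1) / denominator


# Hypothetical modal class with lower boundary 19.5, width 10 and frequency 12,
# preceded by a class of frequency 8 and followed by a class of frequency 6.
mode_g = grouped_mode(19.5, 10, 12, 8, 6)   # 19.5 + 10 * 4 / 10
print(mode_g)  # 23.5
```

The estimate is pulled toward the neighbouring class with the larger frequency: here the preceding class (8) outweighs the following class (6), so the mode falls below the class midpoint of 24.5.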
Variance:
Standard deviation:
The computational formula is: $s_g = \sqrt{s_g^2}$
Coefficient of Variation:
The computational formula is: $CV_g = \dfrac{s_g}{|\bar{x}_g|}(100\%)$
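As a brief illustration of the coefficient of variation, the sketch below takes an already computed standard deviation and mean and returns the CV in percent. The function name `coefficient_of_variation` and the numbers are hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / |mean| * 100, expressed in percent."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is zero")
    return std_dev / abs(mean) * 100


# Hypothetical summary values; the absolute value makes the sign of the mean irrelevant.
cv_positive_mean = coefficient_of_variation(2.0, 8.0)
cv_negative_mean = coefficient_of_variation(2.0, -8.0)
print(cv_positive_mean)  # 25.0
print(cv_negative_mean)  # 25.0
```

Because the CV is a ratio, it has no unit of measurement, so it can be used to compare the relative spread of data sets measured on different scales.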
25. A measure of skewness describes the degree of departure of the distribution of the data from
symmetry. The degree of skewness is measured by the coefficient of skewness, denoted as SK and
computed as,
26. A distribution is said to be symmetric about the mean if the distribution to the left of the
mean is the “mirror image” of the distribution to the right of the mean. Likewise, a symmetric
distribution has SK = 0 since its mean is equal to its median and its mode.
27. A measure of kurtosis describes the extent of peakedness or flatness of the distribution of the
data. It is measured by the coefficient of kurtosis (K), computed as
$$K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N\sigma^4} - 3$$
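The coefficient of kurtosis can be computed directly from its formula, as in this minimal Python sketch. The function name `kurtosis_coefficient` and the data sets are hypothetical; the data are treated as a complete population, so the variance divides by N.

```python
def kurtosis_coefficient(values):
    """K = sum((x - mu)^4) / (N * sigma^4) - 3, with sigma^2 the population variance."""
    N = len(values)
    if N == 0:
        raise ValueError("kurtosis of an empty data set is undefined")
    mu = sum(values) / N
    sigma_squared = sum((x - mu) ** 2 for x in values) / N
    if sigma_squared == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth_power_sum = sum((x - mu) ** 4 for x in values)
    # sigma^4 is the square of the variance
    return fourth_power_sum / (N * sigma_squared ** 2) - 3


# Hypothetical population: mean 5, variance 4, fourth-power sum 356.
k_data = kurtosis_coefficient([2, 4, 4, 4, 5, 5, 7, 9])   # 356 / 128 - 3
# Two equally likely points: the flattest possible case.
k_two_points = kurtosis_coefficient([-1, 1])              # 2 / 2 - 3

print(k_data)        # -0.21875
print(k_two_points)  # -2.0
```

Subtracting 3 centers the measure on the normal distribution, whose value of the uncorrected ratio is 3, so negative values of K indicate a flatter distribution than the normal and positive values a more peaked one.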