
Psychological Statistics Stat 25 Teachers

SUMMARY MEASURES

3.1 Things to Know

(1) Piles of raw data, by themselves, may not be informative, but when data are presented in
summary form, they may be much more interesting and meaningful to us. In most cases,
we need to summarize a given set of data rather than maintain the entire set. Single numbers
called summary (or descriptive) statistics can be calculated for such a purpose. Two kinds
of summary statistics are particularly important to most data users: measures of central
tendency and measures of variability.

(2)
(3) Measures of location summarize a data set by giving a “typical value” within the range of
the data values that describes its location relative to the entire data set.
(4) A measure of variation is a single value that is used to describe the spread of the distribution.
A measure of central tendency alone does not uniquely describe a distribution.
(5) A measure of skewness describes the degree of departure of the distribution of the data from
symmetry, and a measure of kurtosis describes the extent of peakedness or flatness of the
distribution of the data.
(6) Minimum is the smallest value in the data set, denoted by MIN. Maximum is the largest
value in the data set, denoted by MAX.
(7) Measures of central tendency (or location) are values that are typical, or representative, of a
set of data. They tend to lie centrally within the data when it is arranged according to magnitude.
Measures of central tendency are also called averages.

First Semester page 1 of 8



(8) Arithmetic mean or simply the mean – is the most popular measure of central tendency. It is
the sum of a set of measurements divided by the number of measurements in the set.
Population mean – if the set of data $x_1, x_2, x_3, \ldots, x_N$, not necessarily all distinct, represents a
finite population of size $N$, then the population mean is

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Sample mean – if the set of data $x_1, x_2, x_3, \ldots, x_n$, not necessarily all distinct, represents a finite
sample of size $n$, the sample mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
(9) Properties of the Arithmetic Mean
1. May not be an actual observation in the data set.
2. Can be applied to data measured on at least an interval scale.
3. Easy to compute.
4. Every observation contributes to the value of the mean.
5. Subgroup means can be combined to come up with a group mean.
6. Easily affected by extreme values.
Note: Sometimes we associate with the numbers $x_1, x_2, x_3, \ldots, x_n$ certain weighting factors (or
weights) $w_1, w_2, w_3, \ldots, w_n$, depending on the significance or importance attached to the
numbers. In this case,

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$

is called the weighted arithmetic mean.
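The mean and the weighted mean described above can be sketched in a few lines of Python. This is only an illustration: the function names, the scores and the weights are hypothetical and do not come from the course material.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical scores on a quiz, a midterm and a final exam,
# with the final counted three times as heavily as the quiz.
scores = [70, 80, 90]
weights = [1, 2, 3]

print(arithmetic_mean(scores))                   # 80.0
print(round(weighted_mean(scores, weights), 2))  # 83.33
```

The weighted mean is pulled toward 90 because the largest weight is attached to that score.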


(10) Median is the middle value of a set of observations arranged in increasing or decreasing
order of magnitude. It is the middle value when the number of observations is odd, or the
arithmetic mean of the two middle values when the number of observations is even, i.e., it
is the value such that half of the observations fall above it and half below it.
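a. Population median:

$$\tilde{\mu} = \begin{cases} X_{\frac{N+1}{2}} & \text{if } N \text{ is odd} \\[4pt] \dfrac{1}{2}\left( X_{\frac{N}{2}} + X_{\frac{N}{2}+1} \right) & \text{if } N \text{ is even} \end{cases}$$

where $X_j$ denotes the jth observation in the ordered data.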

b. Sample median:

$$\tilde{x} = \begin{cases} X_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\[4pt] \dfrac{1}{2}\left( X_{\frac{n}{2}} + X_{\frac{n}{2}+1} \right) & \text{if } n \text{ is even} \end{cases}$$

Properties of Median
1. May not be an actual observation in the data set.
2. Can be applied to data measured on at least an ordinal scale.
3. A positional measure; may not be affected by extreme values.
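A minimal Python sketch of the median rule, with a comparison against the mean to illustrate how the two react to an extreme value. The data sets are hypothetical.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the (n/2)th and (n/2 + 1)th


def mean(values):
    return sum(values) / len(values)


data = [3, 5, 7, 9, 11]            # hypothetical observations
with_outlier = [3, 5, 7, 9, 110]   # same data with one extreme value

print(median(data), mean(data))                  # 7 7.0
print(median(with_outlier), mean(with_outlier))  # 7 26.8
print(median([3, 5, 7, 9]))                      # 6.0
```

Replacing 11 by 110 leaves the median at 7 but moves the mean from 7.0 to 26.8. The last call shows that with an even number of observations the median (6.0) need not be an actual observation.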
(11) Mode is the value that appears most often, that is, the value with the greatest
frequency. The mode may not exist, and even if it does exist it may not be unique. A
distribution having only one mode is called unimodal.
Properties of the Mode
1. Can be used for qualitative as well as quantitative data.
2. May not be unique.
3. Not affected by extreme values.
4. Can be computed for ungrouped and grouped data.
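The following Python sketch finds the mode or modes of ungrouped data. It rests on one assumption that the notes do not spell out: when no value occurs more than once, the data are treated as having no mode. The function name and the sample data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that has the greatest frequency.

    Assumption of this sketch: if no value occurs more than once,
    the data are treated as having no mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 3, 3, 5, 7, 7, 9]))              # [3, 7]
print(modes(["blue", "green", "blue", "red"]))   # ['blue']
print(modes([1, 2, 3, 4]))                       # []
```

The three calls show a bimodal quantitative data set, a qualitative data set with a single mode, and a data set with no mode.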
(12) If a set of data is arranged in order of magnitude, the middle value (or arithmetic mean of
the two middle values) that divides the set into two equal parts is the median. By extending
this idea, we can think of those values which divide the set into four equal parts, 10 equal
parts and 100 equal parts; these are called quartiles, deciles and percentiles, respectively.
(13) Collectively, quartiles, deciles, percentiles and other values obtained by equal subdivisions
of the data are called quantiles.
(14) Percentiles – are values that divide an ordered set of observations into 100 equal parts. These
values, denoted by $P_1, P_2, \ldots, P_{99}$, are such that 1% of the data falls below $P_1$, 2% falls below $P_2$, and
99% falls below $P_{99}$.
(15) Deciles – are values that divide an ordered set of observations into 10 equal parts. These
values, denoted by $D_1, D_2, \ldots, D_9$, are such that 10% of the data falls below $D_1$, 20% falls below
$D_2$, and 90% falls below $D_9$.
(16) Quartiles – are values that divide an ordered set of observations into 4 equal parts. These
values denoted by Q1 , Q2 , Q3 are such that 25% of the data falls below Q1 , 50% falls below
Q2 and 75% falls below Q3 .
(17) Procedure for computing these values.
Step 1. Arrange the data in an increasing order of magnitude.
Step 2. Solve for the value of L, where



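$$L = \begin{cases} \dfrac{mn}{100}, & \text{percentiles} \\[4pt] \dfrac{mn}{10}, & \text{deciles} \\[4pt] \dfrac{mn}{4}, & \text{quartiles} \end{cases}$$

where m is the location of the percentile, decile or quartile (for example, m = 3 for $Q_3$ and m = 90 for $P_{90}$), and
n is the number of observations.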
Step 3. If L is an integer, the desired quantile is the average of the Lth and the (L + 1)th
observations. If L is fractional, round L up to the next higher integer to find the required location. The
quantile corresponds to the value in that location.
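The three steps above can be sketched in Python as follows. The function name `quantile`, its parameter names and the data are hypothetical. Note that statistical software often uses other interpolation rules, so its results may differ slightly from this procedure.

```python
def quantile(values, m, parts):
    """m-th quantile when the ordered data are divided into `parts` equal parts.

    parts = 100 for percentiles, 10 for deciles, 4 for quartiles; 0 < m < parts.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < m < parts:
        raise ValueError("m must lie strictly between 0 and parts")
    ordered = sorted(values)                 # Step 1
    n = len(ordered)
    whole, remainder = divmod(m * n, parts)  # Step 2: L = m * n / parts
    if remainder == 0:
        # Step 3, L is an integer: average the Lth and (L + 1)th observations.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Step 3, L is fractional: take the observation at the next higher integer.
    return ordered[whole]


data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40]  # hypothetical, n = 10

print(quantile(data, 1, 4))     # Q1:  L = 2.5 -> 3rd observation = 17
print(quantile(data, 2, 4))     # Q2:  L = 5   -> (22 + 25) / 2 = 23.5
print(quantile(data, 3, 4))     # Q3:  L = 7.5 -> 8th observation = 30
print(quantile(data, 3, 10))    # D3:  L = 3   -> (17 + 20) / 2 = 18.5
print(quantile(data, 90, 100))  # P90: L = 9   -> (35 + 40) / 2 = 37.5
```

Integer arithmetic (`divmod`) is used to decide whether L is a whole number, which avoids floating-point rounding in that test. Python lists are indexed from 0, so the Lth observation is `ordered[L - 1]`.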
(18) Measures of variation indicate whether the observations in a set tend to be quite similar
(homogeneous) or whether they vary considerably (heterogeneous).
(19) Range (R) – difference between the largest and the smallest values in the set.

$$R = \text{highest value} - \text{lowest value}$$

Properties of the Range.


1. Computation-wise, it is a quick but rough measure of dispersion.
2. The larger the value of the range, the more dispersed are the observations.
3. It considers only the lowest and highest values.
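(20) Variance

Population variance ($\sigma^2$). Given the finite population $x_1, x_2, x_3, \ldots, x_N$, the population
variance is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

For computational purposes, use the formula

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$$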

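Sample variance ($s^2$). Given the random sample $x_1, x_2, x_3, \ldots, x_n$, the sample variance is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

For computational purposes, use the formula

$$s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)} \quad \text{or} \quad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$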
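As a check on the formulas above, the following Python sketch computes the population variance from its definition and the sample variance both from its definition and from the first computational formula. The function names and the data are hypothetical.

```python
def population_variance(values):
    """Mean of the squared deviations from the population mean (divisor N)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Computational form: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return (n * total_of_squares - total ** 2) / (n * (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: sum = 40, sum of squares = 232

print(population_variance(data))                 # 32 / 8 = 4.0
print(round(sample_variance(data), 4))           # 32 / 7 = 4.5714
print(round(sample_variance_shortcut(data), 4))  # (8*232 - 40**2) / 56 = 4.5714
```

Both sample formulas give the same value. The computational form needs only the running totals of x and x², which is convenient for hand calculation.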


Properties of the variance


1. The variance is always non-negative.
2. A large variance corresponds to a highly dispersed set of values.
3. The variance is easy to manipulate for further mathematical computation.
4. The variance makes use of all observations.
5. The variance comes in a unit of measure that is the square of the unit of measure of the
given set of values.
(21) Standard deviation.
The positive square root of the variance.

Formulas: a. population standard deviation: $\sigma = \sqrt{\sigma^2}$

b. sample standard deviation: $s = \sqrt{s^2}$
Note: 1. The standard deviation has the same properties as the variance except the last one:
its unit of measure is the same as that of the original data.
2. If there is a large amount of variation, then on average, the data values will be far from
the mean. Hence, the standard deviation will be large.
3. If there is only a small amount of variation, then on average, the data values will be close
to the mean. Hence, the standard deviation will be small.
(22) Inter-quartile Range (IQR)
- the difference between the third quartile and the first quartile, i.e.,

IQR = Q3 − Q1

Properties of the Inter-quartile Range


1. Reduces the influence of extreme values.
2. Not as easy to calculate as the range.
(23) Coefficient of Variation (CV) is the ratio of the standard deviation to the absolute value of
the mean, expressed as a percentage. It is unitless and thus can be used to compare the
dispersion of two or more populations measured in the same or different units.
$$CV = \frac{100\,s}{|\bar{x}|}\,\%$$
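To illustrate why the coefficient of variation is useful for comparing data in different units, here is a short Python sketch. It assumes the sample standard deviation (divisor n − 1); the function names and the height and weight figures are hypothetical.

```python
import math


def sample_sd(values):
    """Positive square root of the sample variance (divisor n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the absolute value of the mean."""
    xbar = sum(values) / len(values)
    if xbar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sample_sd(values) / abs(xbar)


heights_cm = [150, 160, 170, 180, 190]  # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]       # hypothetical weights

print(round(sample_sd(heights_cm), 2), round(sample_sd(weights_kg), 2))  # 15.81 15.81
print(round(coefficient_of_variation(heights_cm), 2))                    # 9.3
print(round(coefficient_of_variation(weights_kg), 2))                    # 22.59
```

The two data sets have the same standard deviation (in different units), but relative to their means the weights are more than twice as dispersed as the heights.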

(24) When data are presented in a frequency distribution, measures of central tendency and
measures of variation can still be computed from the grouped data.

Measures of Central Tendency (Grouped data).


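Arithmetic mean:

The computational formula is

$$\bar{x}_g = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$

Where $f_i$ is the class frequency of the ith class interval.

$x_i$ is the class mark of the ith class interval.

n is the total number of observations and k is the number of class intervals.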

Note: The arithmetic mean cannot be computed from an open-ended frequency distribution.

Median:

The computational formula is

$$\tilde{x}_g = L_m + c\left(\frac{n/2 - F_{m-1}}{f_m}\right)$$

Where $L_m$ is the lower class boundary of the median class. The median class is the class interval
where the (n/2)th value falls.

$F_{m-1}$ is the cumulative frequency of the class interval immediately preceding the median class.

$f_m$ is the frequency of the median class.

c is the class width or class size.

The median of grouped data can be calculated even with open-ended intervals, provided the
median class is not open-ended.

Mode:

To locate the modal class, look for the highest number in the frequency column.

$$\text{Mode}_g = L_{mo} + c\left(\frac{f_{mo} - f_1}{2 f_{mo} - f_1 - f_2}\right)$$

Where $L_{mo}$ is the lower class boundary of the modal class. The modal class is the class interval
with the highest frequency.

$f_{mo}$ is the frequency of the modal class.

$f_1$ is the frequency of the class interval immediately preceding the modal class.

$f_2$ is the frequency of the class interval immediately following the modal class.


c is the class width.
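The grouped-data formulas for the mean, median and mode can be sketched in Python as follows. The frequency distribution is hypothetical (class limits 10–19, 20–29, ..., 50–59, stored by their class boundaries). The sketch also assumes that a missing neighbouring class has frequency 0 when the modal class is the first or last class, which the notes do not specify.

```python
# Hypothetical frequency distribution: (lower boundary, upper boundary, frequency)
classes = [
    (9.5, 19.5, 3),
    (19.5, 29.5, 7),
    (29.5, 39.5, 12),
    (39.5, 49.5, 6),
    (49.5, 59.5, 2),
]


def grouped_mean(classes):
    """Sum of frequency times class mark, divided by n."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def grouped_median(classes):
    """L_m + c * (n/2 - F_{m-1}) / f_m for the class containing the (n/2)th value."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if cumulative + f >= n / 2:
            return lo + (hi - lo) * (n / 2 - cumulative) / f
        cumulative += f


def grouped_mode(classes):
    """L_mo + c * (f_mo - f_1) / (2 * f_mo - f_1 - f_2) for the modal class."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                     # first class with the highest frequency
    lo, hi, f_mo = classes[i]
    f1 = freqs[i - 1] if i > 0 else 0               # assumption: 0 if there is no preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: 0 if there is no following class
    return lo + (hi - lo) * (f_mo - f1) / (2 * f_mo - f1 - f2)


print(grouped_mean(classes))              # 1005 / 30 = 33.5
print(round(grouped_median(classes), 2))  # 29.5 + 10 * (15 - 10) / 12 = 33.67
print(round(grouped_mode(classes), 2))    # 29.5 + 10 * (12 - 7) / (24 - 7 - 6) = 34.05
```

Here n = 30, so the median class is the one containing the 15th value (29.5 to 39.5), which is also the modal class.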

Measures of Variability (Grouped data)

Variance:

The computational formula is:

$$s_g^2 = \frac{n \sum_{i=1}^{k} f_i x_i^2 - \left(\sum_{i=1}^{k} f_i x_i\right)^2}{n(n - 1)}$$

Where n is the number of observations.

$f_i$ is the frequency of the ith class interval.

$x_i$ is the class mark of the ith class interval.

k is the number of class intervals.

Standard deviation

The computational formula is: $s_g = \sqrt{s_g^2}$

Where $s_g^2$ is the variance.

Coefficient of Variation:
The computational formula is: $CV_g = \dfrac{s_g}{|\bar{x}_g|}\,(100\%)$
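A short Python sketch of the grouped-data variance, standard deviation and coefficient of variation. The frequency distribution, given as class marks with their frequencies, is hypothetical, as are the function names.

```python
import math

# Hypothetical frequency distribution: (class mark, frequency)
classes = [(14.5, 3), (24.5, 7), (34.5, 12), (44.5, 6), (54.5, 2)]


def grouped_mean(classes):
    n = sum(f for _, f in classes)
    return sum(f * x for x, f in classes) / n


def grouped_variance(classes):
    """(n * sum(f x^2) - (sum(f x))^2) / (n * (n - 1))."""
    n = sum(f for _, f in classes)
    sum_fx = sum(f * x for x, f in classes)
    sum_fx2 = sum(f * x * x for x, f in classes)
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


variance = grouped_variance(classes)
sd = math.sqrt(variance)
cv = 100 * sd / abs(grouped_mean(classes))

print(round(variance, 2))  # (30 * 36937.5 - 1005**2) / 870 = 112.76
print(round(sd, 2))        # 10.62
print(round(cv, 2))        # 31.7
```

Each observation in a class is represented by its class mark, so these values approximate what the ungrouped formulas would give on the raw data.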
(25) A measure of skewness describes the degree of departure of the distribution of the data from
symmetry. The degree of skewness is measured by the coefficient of skewness, denoted by SK and
computed as

$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$

If SK < 0, the distribution is negatively skewed; if SK > 0, it is positively skewed.

(26) A distribution is said to be symmetric about the mean if the distribution to the left of the
mean is the “mirror image” of the distribution to the right of the mean. Likewise, a symmetric
distribution has SK = 0, since its mean is equal to its median and its mode.
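The coefficient of skewness can be sketched in Python with the standard library's `statistics` module. The notes do not say which standard deviation to use, so this sketch assumes the sample standard deviation (`statistics.stdev`). The function name and the data sets are hypothetical.

```python
import statistics


def pearson_skewness(values):
    """SK = 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return 3 * (mean - median) / sd


def describe(sk):
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "SK = 0"


long_right_tail = [2, 3, 3, 4, 4, 5, 20]        # hypothetical: one large value
long_left_tail = [1, 16, 17, 17, 18, 18, 19]    # hypothetical: one small value
symmetric = [1, 2, 3, 4, 5]                     # hypothetical: mirror image about 3

for data in (long_right_tail, long_left_tail, symmetric):
    print(describe(pearson_skewness(data)))
```

In the first data set the extreme value 20 pulls the mean above the median, so SK is positive; in the second the value 1 pulls the mean below the median, so SK is negative.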

(27) A measure of kurtosis describes the extent of peakedness or flatness of the distribution of the
data. It is measured by the coefficient of kurtosis (K), computed as


$$K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N \sigma^4} - 3$$

If K > 0, the distribution is leptokurtic; if K < 0, it is platykurtic; and if K = 0, it is mesokurtic.
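A minimal Python sketch of the coefficient of kurtosis, treating the data as a population (divisor N throughout). The function name and the two data sets are hypothetical.

```python
def kurtosis(values):
    """K = sum((x - mu)^4) / (N * sigma^4) - 3, using population formulas."""
    N = len(values)
    mu = sum(values) / N
    sigma2 = sum((x - mu) ** 2 for x in values) / N  # population variance
    fourth = sum((x - mu) ** 4 for x in values)      # sum of fourth-power deviations
    return fourth / (N * sigma2 ** 2) - 3            # sigma^4 = (sigma^2)^2


def describe(k, tolerance=1e-9):
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                    # hypothetical: evenly spread values
peaked = [5, 5, 5, 5, 1, 9, 5, 5, 5, 5]   # hypothetical: most values at the centre

print(round(kurtosis(flat), 2), describe(kurtosis(flat)))      # -1.3 platykurtic
print(round(kurtosis(peaked), 2), describe(kurtosis(peaked)))  # 2.0 leptokurtic
```

The small `tolerance` in `describe` is a design choice of this sketch: floating-point results are rarely exactly 0, so values very close to 0 are reported as mesokurtic.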
