Lecture 2-Descriptive

Yekatit 12 Hospital Medical College
Department of Public Health

Lecture 2: Summarizing Data
For Weekend MPH Student
BY
Dube Jara (BSc in PH, MPHE, PhD Candidate)
Assistant Professor of Epidemiology
Email: jaradube@gmail.com
February, 2024
Addis Ababa, Ethiopia
1 Dube Jara (Assistant Professor &PhD Candidate), For MPH
Content
 Introduction
 Numerical Summary Measures
 Measures of Central Tendency
 Measures of Dispersion

Introduction
 Data collected by different methods and presented in tables and figures still need to be described in a concise way
 The sample may be large
 Looking at all the data at once makes it easy to lose track of the overall picture
 These problems can be overcome by summarizing the data numerically

Introduction…
 Measures of Central Tendency
 Mean
 Median
 Mode
 Measures of Variability
 Range
 Mean deviation
 Variance
 Standard Deviation
 Skewness
 Positive skew
 Normal distribution
 Negative skew

Measures of Central Tendency
 The tendency of statistical data to concentrate around certain values is called the “central tendency”
 The various methods of determining the value around which the data tend to concentrate are called measures of central tendency (MCT) or averages
 Hence, an average is a value which tends to sum up or describe the mass of the data
 The objective of calculating an MCT is to determine a single figure which may be used to represent the whole dataset
 In that sense it is an even more compact description of the statistical data than the frequency distribution
 Since an MCT represents the entire data, it facilitates comparison within one group or between groups of data

Characteristics of a good MCT
 An MCT is good or satisfactory if it possesses the following characteristics.
 It should be based on all the observations
 It should not be affected by the extreme values
 It should be as close to the maximum number of values as possible
 It should have a definite value
 It should not require complicated and tedious calculations
 It should be capable of further algebraic treatment
 It should be stable with regard to sampling

Most common measures…
 The most common measures of central tendency include:

 Arithmetic Mean
 Median
 Mode
 Others

Arithmetic Mean
A. Ungrouped Data (simple mean)

 The arithmetic mean is the "average" of the data set and by far
the most widely used measure of central location
 The arithmetic mean is, in general, a very natural measure of
central location
 Is the sum of all the observations divided by the total number
of observations.

Arithmetic Mean…
Example: Age of ten students
18, 22, 19, 20, 21, 25, 18, 24, 23, 19
Mean = (18 + 22 + 19 + 20 + 21 + 25 + 18 + 24 + 23 + 19) / 10 = 209 / 10 = 20.9
Example
The heart rates for n=10 patients were as follows (beats per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150
What is the arithmetic mean for the heart rate of these patients?
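The question above can be checked with a few lines of Python. This is only a sketch of the definition (sum of observations divided by their number); the variable names are illustrative and not part of the lecture.

```python
from statistics import mean

# Heart rates (beats per minute) for the n = 10 patients in the example
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

# Arithmetic mean: sum of all observations divided by the number of observations
manual_mean = sum(heart_rates) / len(heart_rates)

# The standard library function should agree with the manual calculation
library_mean = mean(heart_rates)

print(manual_mean)  # 129.8
```

Note how the single low value of 40 pulls the mean down, which anticipates the later point that the mean is sensitive to extreme values.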

B. Mean for Grouped data
In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the class interval. It is calculated as follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$
where:
k is the number of class intervals
mi is the mid-point of the ith class interval
fi is the frequency of the ith class interval
Example. Compute the mean age of 169 subjects from the grouped data.
Class interval   Mid-point (mi)   Frequency (fi)   mi × fi
10-19            14.5             4                58.0
20-29            24.5             66               1617.0
30-39            34.5             47               1621.5
40-49            44.5             36               1602.0
50-59            54.5             12               654.0
60-69            64.5             4                258.0
Total            -                169              5810.5
Mean = 5810.5/169 = 34.38 years
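The grouped-data calculation can be sketched in Python as follows. The function name `grouped_mean` is an illustrative choice, and the sketch relies on the same mid-point assumption stated above.

```python
# Class mid-points (m_i) and frequencies (f_i) from the age table
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

def grouped_mean(mids, freqs):
    """Mean of grouped data: sum(m_i * f_i) divided by sum(f_i)."""
    weighted_total = sum(m * f for m, f in zip(mids, freqs))
    return weighted_total / sum(freqs)

mean_age = grouped_mean(midpoints, frequencies)
print(round(mean_age, 2))
```

Computing the total of the mi × fi column inside the function avoids carrying a rounded intermediate value by hand.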

Skewness
 Skewness measures the degree of asymmetry displayed by the data
 If extremely low or extremely high observations are present in a distribution, then the mean tends to shift towards those values
 Skewness is computed by first adding together the cubed deviations from the mean and then dividing by the product of the cubed standard deviation and the number of observations:
$$\text{Skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n\,s^3}$$

Skewness…
 If skewness equals zero, the histogram is symmetric
about the mean.
 Based on the type of skewness, distributions can be:
a. Negatively skewed distribution: occurs when majority of scores are at the
right end of the curve and a few small scores are scattered at the left end.
b. Positively skewed distribution: Occurs when the majority of scores are at the
left end of the curve and a few extreme large scores are scattered at the right end.
c. Symmetrical distribution: It is neither positively nor negatively skewed. A
curve is symmetrical if one half of the curve is the mirror image of the other half.
 In unimodal (one-peak) symmetrical distributions, the mean, median and
mode are identical.

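Example
 Data: 14, 89, 93, 95, 96
Mean = 387/5 = 77.4
 Skewness is reflected in the outlying low value of 14, which pulls the mean below the median
 The median is 93
 Data: 2, 3, 5, 8, 11, 16, 143
Mean = 188/7 = 26.9
 The outlying high value of 143 pulls the mean above the median, which is 8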

Kurtosis
 Kurtosis measures how peaked the histogram is
 It characterises the relative peakedness or flatness of a distribution compared to the normal distribution
 Its definition is similar to that for skewness, with the exception that the fourth power is used instead of the third:
$$\text{Kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n\,s^4}$$
 Data with a high degree of peakedness are said to be leptokurtic, and have values of kurtosis over 3.0
 Flat histograms are platykurtic, and have kurtosis values less than 3.0
 The kurtosis of the commuting times is equal to 6.43, and hence is relatively peaked
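The verbal definitions of skewness and kurtosis can be sketched in Python as below. This is a minimal illustration, assuming the standard deviation in the denominator is the sample SD with the n - 1 divisor (the slides do not say which divisor is meant); the function names are illustrative.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor (an assumption here)."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

def skewness(values):
    """Sum of cubed deviations from the mean, divided by n * s**3."""
    n = len(values)
    m = sum(values) / n
    s = sample_sd(values)
    return sum((x - m) ** 3 for x in values) / (n * s ** 3)

def kurtosis(values):
    """Sum of fourth-power deviations from the mean, divided by n * s**4."""
    n = len(values)
    m = sum(values) / n
    s = sample_sd(values)
    return sum((x - m) ** 4 for x in values) / (n * s ** 4)

low_outlier = [14, 89, 93, 95, 96]         # one very low value
high_outlier = [2, 3, 5, 8, 11, 16, 143]   # one very high value
symmetric = [1, 2, 3, 4, 5]                # mirror image about its mean

print(skewness(low_outlier) < 0)    # True: negatively skewed
print(skewness(high_outlier) > 0)   # True: positively skewed
print(skewness(symmetric))          # 0.0
```

The sign of the result is what matters for classifying a distribution as negatively or positively skewed; the exact magnitude depends on the divisor convention chosen.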
[Figure: frequency curves plotted against value for different kurtosis: k > 3 leptokurtic, k = 3 mesokurtic, k < 3 platykurtic]
A few words about the normal curve
 Skewness = 0
 Kurtosis = 3
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$
Characteristics of mean
 The value of the arithmetic mean is determined by every item in the series
 It is greatly affected by extreme values
 The sum of the deviations about it is zero
 The sum of the squares of deviations from the arithmetic mean is less than that computed from any other point
 For a given set of data there is one and only one arithmetic mean (uniqueness).

Advantages and disadvantages
Advantages
1. It is based on all values given in the distribution.
2. It is most easily understood.
3. It is most amenable to algebraic treatment.
Disadvantages
4. It may be greatly affected by extreme items and its usefulness as a “summary of the whole” may be considerably reduced.
5. When the distribution has open-end classes, its computation would be based on an assumption, and therefore may not be valid
Geometric mean
 It is obtained by taking the nth root of the product of “n” values, i.e., if the values of the observations are denoted by x1, x2, …, xn, then
$$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$$
 The geometric mean is preferable to the arithmetic mean if the series of observations contains one or more unusually large values.
 The above method of calculating the geometric mean is satisfactory only if there are a small number of items.

Geometric mean…
 But if n is a large number, computing the nth root of the product of these values by simple arithmetic is tedious work.
 To facilitate the computation of the geometric mean we make use of logarithms.
 The above formula, when reduced to its logarithmic form, will be:
$$GM = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$
$$\log GM = \log (x_1 x_2 \cdots x_n)^{1/n}$$
$$= \frac{1}{n} \log (x_1 x_2 \cdots x_n)$$
$$= \frac{1}{n} \{\log x_1 + \log x_2 + \cdots + \log x_n\}$$
$$= \frac{\sum_{i=1}^{n} \log x_i}{n}$$

Geometric mean…
 The logarithm of the geometric mean is equal to the arithmetic
mean of the logarithms of individual values.
 The actual process involves obtaining logarithm of each value,
adding them and dividing the sum by the number of
observations.
 The quotient so obtained is then looked up in the tables of anti-
logarithms which will give us the geometric mean

Geometric mean…
Example: The geometric mean may be calculated for the following parasite counts per
100 fields of thick films.
7 8 3 14 2 1 440 15 52 6 2 1 1 25
12 6 9 2 1 6 7 3 4 70 20 200 2 50
21 15 10 120 8 4 70 3 1 103 20 90 1 237
$$GM = \sqrt[42]{7 \times 8 \times 3 \times \cdots \times 1 \times 237}$$
log GM = 1/42 (log 7 + log 8 + log 3 + … + log 237)
= 1/42 (0.8451 + 0.9031 + 0.4771 + … + 2.3747)
= 1/42 (41.9985)
= 0.9999 ≈ 1.0000

Geometric mean…
 The anti-log of 0.9999 is about 9.998 ≈ 10, and this is the required geometric mean.
 By contrast, the arithmetic mean, which is inflated by the high values of 440, 237 and 200, is 39.8 ≈ 40.
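The logarithmic route to the geometric mean can be sketched in Python for the parasite counts. The function name `geometric_mean` is illustrative; base-10 logarithms are used to mirror the log tables in the slides.

```python
from math import log10

# Parasite counts per 100 fields of thick films (42 observations)
counts = [7, 8, 3, 14, 2, 1, 440, 15, 52, 6, 2, 1, 1, 25,
          12, 6, 9, 2, 1, 6, 7, 3, 4, 70, 20, 200, 2, 50,
          21, 15, 10, 120, 8, 4, 70, 3, 1, 103, 20, 90, 1, 237]

def geometric_mean(values):
    """Anti-log of the arithmetic mean of the base-10 logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(log10(v) for v in values) / len(values)
    return 10 ** mean_log

gm = geometric_mean(counts)
am = sum(counts) / len(counts)
print(round(gm, 1), round(am, 1))
```

The guard against zero or negative values reflects the limitation noted in the lecture: the logarithm is undefined for such values, so the geometric mean cannot be computed.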

Characteristics of geometric mean
1. It is a calculated value and depends upon the size of all the items.
2. It gives less importance to extreme items than does the arithmetic mean.
3. For any series of positive items it is never larger than the arithmetic mean (the two are equal only when all items are identical).
4. It exists ordinarily only for positive values.

Advantages and disadvantages
Advantages:
1. Since it is less affected by extremes, it is preferable to the arithmetic mean when such values are present
2. It is capable of algebraic treatment
3. It is based on all values given in the distribution.
Disadvantages:
4. Its computation is relatively difficult.
5. It cannot be determined if there is any negative value in the distribution, or where one of the items has a zero value.

Other means
 Harmonic mean
 Weighted mean (WM)
 Trimmed mean

Median
Ungrouped data
 The median is the value which divides the data set into two equal parts.
 If the number of values is odd, the median will be the middle value when all values are arranged in order of magnitude
 When the number of observations is even, there is no single middle value but two middle observations

Median…
 In this case the median is the mean of these two middle observations, when all observations have been arranged in the order of their magnitude
 The principal strength of the sample median is that it is insensitive to very large or very small values
 The principal weakness of the sample median is that it is determined mainly by the middle points in a sample and is less sensitive to the actual numerical values of the remaining data points
Grouped data
 In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval.
 The first step is to locate the class interval in which the median is located, using the following procedure.
 Find n/2 and identify the first class interval whose cumulative frequency is at least n/2
 Then, use the following formula:
$$\tilde{x} = L_m + \left( \frac{\frac{n}{2} - F_c}{f_m} \right) W$$
where,
Lm = lower true class boundary of the interval containing the median
Fc = cumulative frequency of the interval immediately preceding the median class interval
fm = frequency of the interval containing the median
W = class interval width
n = total number of observations

Example. Compute the median age of 169 subjects from the grouped data.
n/2 = 169/2 = 84.5
Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq
10-19            14.5             4                4
20-29            24.5             66               70
30-39            34.5             47               117
40-49            44.5             36               153
50-59            54.5             12               165
60-69            64.5             4                169
Total                             169

Example…
 n/2 = 84.5, which falls in the 3rd class interval (30-39)
 Lower true boundary = 29.5, upper true boundary = 39.5
 Frequency of the class = 47
 Cumulative frequency of the interval immediately preceding the median class interval, Fc = 70
 (n/2 – Fc) = 84.5 – 70 = 14.5
 Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33 years
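The grouped-median procedure can be sketched in Python as follows. The function name `grouped_median` and its arguments are illustrative; the sketch assumes equal class widths and lower true class boundaries supplied in ascending order.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median of grouped data: Lm + ((n/2 - Fc) / fm) * W."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0  # cumulative frequency before the current class (Fc)
    for lower, f in zip(lower_bounds, freqs):
        # The median class is the first one whose cumulative frequency reaches n/2
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f

lower_true_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [4, 66, 47, 36, 12, 4]

median_age = grouped_median(lower_true_boundaries, frequencies, width=10)
print(round(median_age, 2))
```

Walking through the classes while accumulating frequencies reproduces the "Cum. freq" column of the table, so the median class does not need to be picked out by hand.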

Characteristics of Median
 It is an average of position
 It is affected more by the number of items than by extreme values
 There is only one median for a given set of data (uniqueness)

Advantages & Disadvantages
Advantages
 It is easily calculated and is not much disturbed by extreme values
 It is more typical of the series
 The median may be located even when the data are incomplete, e.g., when the class intervals are irregular and the final classes have open ends
Disadvantages
 It is determined mainly by the middle points and is less sensitive to the remaining data points (weakness)
 It is not as generally familiar as the arithmetic mean
Mode
 The most frequently occurring value among all observations in a set of data
 The modal class is the class interval with the highest frequency in grouped data.
 It is not influenced by extreme values
 It is possible to have more than one mode or no mode
 It is not a good summary of the majority of the data
 The mode is not often used in biological or medical data
 Find the modal values for the following data
 22, 66, 69, 70, 73 (no modal value)
 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg)
Ungrouped data
 It is a value which occurs most frequently in a set of values
 If all the values are different there is no mode, on the other
hand, a set of values may have more than one mode
 Some distributions have more than one mode:
 Unimodal: A distribution with one mode
 Bimodal: A distribution with two modes
 Trimodal: A distribution with three modes

Examples
Example
 Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
 Mode is 4 “Unimodal”
Example
 Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
 There are two modes = 2 & 5
 This distribution is said to be “bi-modal”
Example
 Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
 No mode, since all the values are different
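The three examples can be reproduced with a short Python sketch. The function name `modes` is illustrative, and returning an empty list when every value occurs once is a choice made here to match the lecture's "no mode" convention.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values are different: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6]))            # [4]
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8]))      # [2, 5]
print(modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12]))      # []
```

The length of the returned list tells whether the data are unimodal, bimodal, and so on.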
Grouped data
 To find the mode of grouped data, we usually refer to the modal class, where the modal class is the class interval with the highest frequency
 If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval

Characteristics of Mode
 It is an average of position
 It is not affected by extreme values
 It is the most typical value of the distribution
Advantages
 Since it is the most typical value it is the most descriptive average

 Since the mode is usually an “actual value”, it indicates the precise
value of an important part of the series

Disadvantages
 Unless the number of items is fairly large and the distribution
reveals a distinct central tendency, the mode has no significance
 It is not capable of mathematical treatment
 In a small number of items the mode may not exist

Choice of MCT
 Which measure of central tendency is best with a given set of data?
 Two factors are important in making this decision:
 The scale of measurement (type of data)
 The shape of the distribution of the observations

The scale of measurement (type of data)
 The mean can be used for discrete and continuous data
 The median is appropriate for discrete and continuous data as well, but can also be used for ordinal data
 The mode can be used for all types of data, but may be especially useful for nominal and ordinal measurements
 For discrete or continuous data, the “modal class” can be used
 The geometric mean is used primarily for observations measured on a logarithmic scale
 The harmonic mean is a suitable MCT when the data pertain to rates and time
 The weighted mean is commonly used in the calculation of the mean for different outcomes
Shape of a Distribution
(a) Symmetric and unimodal distribution: mean, median, and mode should all be approximately the same
[Figure: symmetric unimodal curve with mean, median and mode marked at the centre]

Shape…
(b) Bimodal: mean and median should be about the same, but may take a value that is unlikely to occur; two modes might be best
[Figure: bimodal curve with the two modes marked]

Shape…
(c) Skewed to the right (positively skewed): the mean is sensitive to extreme values, so the median might be more appropriate
[Figure: positively skewed curve with mode, median and mean marked]

Shape…
(d) Skewed to the left (negatively skewed): same as (c)
[Figure: negatively skewed curve with mode, median and mean marked]

Measures of Dispersion
Other synonymous terms:

– “Measure of Variation”
– “Measure of Spread”
– “Measures of Scatter”

Measures of Dispersion…
 While the mean, median, etc. give useful information about the center of the data, we also need to know how “spread out” the numbers are about the center
 Measures of dispersion quantify the variation or dispersion of a set of data from its central location
 Dispersion refers to the variety exhibited by the values of the data
 The amount of dispersion is small when the values are close together
 If all the values are the same, there is no dispersion

Consider the following data sets:
Set 1: 60 40 30 50 60 40 70 50 (mean = 50)
Set 2: 50 49 49 51 48 50 53 50 (mean = 50)
 The two data sets given above have a mean of 50, but obviously set 1 is more “spread out” than set 2. How do we express this numerically?
 The object of measuring this scatter or dispersion is to obtain a single summary figure which adequately exhibits whether the distribution is compact or spread out
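As a preview of the measures defined in the following slides, the two sets can be compared with a small Python sketch. It uses the range and the sample standard deviation from the standard library as the summary figures; the variable names are illustrative.

```python
from statistics import mean, stdev

set_1 = [60, 40, 30, 50, 60, 40, 70, 50]
set_2 = [50, 49, 49, 51, 48, 50, 53, 50]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    data_range = max(data) - min(data)   # largest minus smallest value
    spread = stdev(data)                 # sample standard deviation (n - 1 divisor)
    print(name, "mean:", mean(data), "range:", data_range, "SD:", round(spread, 2))
```

Both sets share the same mean, yet the range and the standard deviation of set 1 are far larger, which is the numerical expression of "more spread out".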

 Measures of dispersion include:

 Range
 Inter-quartile range
 Variance
 Standard deviation
 Coefficient of variation
 Standard error

Range (R)
 The difference between the largest and smallest observations in a sample.
 Range = Maximum value – Minimum value
 Example:
 Data values: 5, 9, 12, 16, 23, 34, 37, 42
 Range = 42 – 5 = 37

Properties of range
 It is the simplest crude measure and can be easily understood
 It takes into account only two values, which makes it a poor measure of dispersion
 Very sensitive to extreme observations
 The larger the sample size, the larger the range tends to be

Quantiles and percentiles
Percentiles
 Percentiles divide the ordered data into 100 equal parts.
 Percentiles are less sensitive to outliers and not greatly affected by the sample size (n)
 Percentiles can be expressed as:
 P0: The minimum
 P25 (25th percentile): 25% of the sample values are less than or equal to this value. 1st quartile
 P50: 50% of the sample values are less than or equal to this value. 2nd quartile
 P75: 75% of the sample values are less than or equal to this value. 3rd quartile
 P100: The maximum
Quantiles and percentiles…
 Particularly useful quantiles are the quartiles of the distribution
 The quartiles divide the distribution into four equal parts
 The second quartile is the median.

Interquartile Range (IQR)
 It is the difference between the first and the third quartiles.
 To compute it, we first sort the data, in ascending order, then find
the data values corresponding to the first quarter of the numbers
(first quartile), and then the third quartile.
 IQR is the distance (difference) between these quartiles

Interquartile Range (IQR)…
 The IQR is a preferable measure to the range because it is less prone to distortion by a single large or small value.
 That is, outliers in the data do not affect the interquartile range.
 Also, it can be computed when the distribution has open-end classes
 It indicates the spread of the middle 50% of the observations, and is used with the median
IQR = Q3 - Q1
 Example: Suppose the first and third quartiles for weights of girls 12 months of age are 8.8 kg and 10.2 kg, respectively.
IQR = 10.2 kg – 8.8 kg = 1.4 kg
i.e. 50% of the infant girls weigh between 8.8 and 10.2 kg

E.g. Given the following data set (age of patients): 18, 59, 24, 42, 21, 23, 24, 32, find the interquartile range.
 Sort the data from lowest to highest
 Find the bottom and the top quarters of the data
 Find the difference (interquartile range) between the two quartiles
Sorted data: 18 21 23 24 24 32 42 59
1st quartile = the {(n+1)/4}th observation = (2.25)th observation = 21 + (23 - 21) × 0.25 = 21.5
3rd quartile = the {3(n+1)/4}th observation = (6.75)th observation = 32 + (42 - 32) × 0.75 = 39.5
Hence, IQR = 39.5 - 21.5 = 18
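The (n + 1) positioning rule used in this example can be sketched in Python. The function name `quantile_n_plus_1` is illustrative; other software may use different interpolation rules and give slightly different quartiles.

```python
def quantile_n_plus_1(sorted_values, fraction):
    """Value at position fraction * (n + 1), counted from 1, with linear interpolation."""
    n = len(sorted_values)
    position = fraction * (n + 1)
    lower_index = int(position)          # whole part of the position
    remainder = position - lower_index   # fractional part of the position
    if lower_index < 1:
        return sorted_values[0]
    if lower_index >= n:
        return sorted_values[-1]
    lower = sorted_values[lower_index - 1]
    upper = sorted_values[lower_index]
    return lower + (upper - lower) * remainder

ages = sorted([18, 59, 24, 42, 21, 23, 24, 32])
q1 = quantile_n_plus_1(ages, 0.25)
q3 = quantile_n_plus_1(ages, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)  # 21.5 39.5 18.0
```

The two early returns only guard against positions that fall outside the data, which can happen for very small samples.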
Properties of IQR
 It is a simple and versatile measure
 It encloses the central 50% of the observations
 It is not based on all observations but only on two specific
values
 It is important in selecting cut-off points in the formulation of
clinical standards
 Since it excludes the lowest and highest 25% values, it is not
affected by extreme values
 Less sensitive to the size of the sample

Variance (σ², s²)
 The main objection to the mean deviation, that the negative signs are ignored, is removed by taking the square of the deviations from the mean
 The variance is the average of the squares of the deviations taken from the mean
 The deviations are squared because the sum of the deviations of the individual observations of a sample about the sample mean is always 0

Variance...
 The variance can be thought of as an average of squared deviations

 Variance is used to measure the dispersion of values relative to
the mean
 When values are close to their mean (narrow range) the
dispersion is less than when there is scattering over a wide range.
 Population variance = σ2
 Sample variance = S2
Ungrouped data
 Let X1, X2, ..., XN be the measurements on N population units, then:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
is the population mean.
Example
 Following are the survival times of n=11 patients after heart
transplant surgery.
 The survival time for the “ith” patient is represented as Xi for i= 1,
…, 11.
 Calculate the sample variance and SD.

Grouped data
$$S^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1}$$
where
mi = the mid-point of the ith class interval
fi = the frequency of the ith class interval
x̄ = the sample mean
k = the number of class intervals
Example. Compute the variance and SD of the age of 169 subjects from the grouped data.
Mean = 5810.5/169 = 34.38 years
Class interval   mi     fi    (mi - Mean)   (mi - Mean)²   (mi - Mean)² fi
10-19            14.5   4     -19.88        395.21         1580.86
20-29            24.5   66    -9.88         97.61          6442.55
30-39            34.5   47    0.12          0.0144         0.68
40-49            44.5   36    10.12         102.41         3686.92
50-59            54.5   12    20.12         404.81         4857.77
60-69            64.5   4     30.12         907.21         3628.86
Total                   169                                20197.63
S² = 20197.63/(169 - 1) = 120.22
SD = √S² = √120.22 = 10.96
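The grouped variance can be sketched in Python as below. The mean is computed from the table inside the script rather than typed in as a rounded value, so small differences in the last decimal place relative to a hand calculation are expected; variable names are illustrative.

```python
from math import sqrt

# Class mid-points (m_i) and frequencies (f_i) from the age table
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

n = sum(frequencies)
mean_age = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Sum of squared deviations of the mid-points from the mean, weighted by frequency
sum_sq = sum(f * (m - mean_age) ** 2 for m, f in zip(midpoints, frequencies))

variance = sum_sq / (n - 1)   # sample variance for grouped data
sd = sqrt(variance)
print(round(variance, 2), round(sd, 2))
```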
Ungrouped data (sample)
 A sample variance is calculated for a sample of individual values (X1, X2, …, Xn) and uses the sample mean (x̄) rather than the population mean µ.

Example:
 Areas of sprayable surfaces with DDT from a sample of 15 houses are as follows (m²):
101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145
 Find the variance and standard deviation of the above distribution (n = 15)
 The mean of the sample is 125 m², and the sample variance is s² = Σ(xi – x̄)²/(n – 1)
= {(101 – 125)² + (105 – 125)² + … + (145 – 125)²} / (15 – 1)
= 2502/14
= 178.71 (m²)²
 Hence, the standard deviation
= √Variance
= √178.71
= 13.37 m²
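The same example can be checked with Python's standard library, whose `variance` and `stdev` functions use the n - 1 divisor for samples. The manual calculation is included alongside as a cross-check; variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Areas of sprayable surfaces (square metres) for the 15 sampled houses
areas = [101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145]

sample_mean = mean(areas)

# Manual route: sum of squared deviations divided by n - 1
sum_sq = sum((x - sample_mean) ** 2 for x in areas)
manual_variance = sum_sq / (len(areas) - 1)

print(round(variance(areas), 2), round(stdev(areas), 2))  # 178.71 13.37
```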
Properties of Variance
 The main disadvantage of variance is that its unit is the square
of the units of the original measurement values
 The variance gives more weight to the extreme values as
compared to those which are near to mean value, because the
difference is squared in variance
 The drawbacks of variance are overcome by the standard
deviation

Standard deviation (σ, s)
 The sample and population standard deviations, denoted by S and σ (by convention) respectively, are defined as follows:
 It is the square root of the variance
 This produces a measure having the same scale as that of the individual values.
$$\sigma = \sqrt{\sigma^2} \quad \text{and} \quad S = \sqrt{S^2}$$

Ungrouped data
 Let X1, X2, ..., XN be the measurements on N population units, then:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
is the population mean.
Ungrouped....
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
S = sample standard deviation
√ = square root
Σ = sum (sigma)
x = score for each point in data
x̄ = mean of scores for the variable
n = sample size (number of observations or cases)
SD...
 This measure of variation is universally used to show the scatter
of the individual measurements around the mean of all the
measurements in a given distribution.
 Note that the sum of the deviations of the individual observations
of a sample about the sample mean is always 0.

Properties of SD
 The SD has the advantage of being expressed in the same units of measurement as the mean
 SD is considered to be the best measure of dispersion and is used widely because of the properties of the theoretical normal curve
 However, if the units of measurement of the variables in two data sets are not the same, then their variability cannot be compared by comparing the values of SD

SD Vs Standard Error (SE)
 SD describes the variability among individual values in a given dataset
 SE is used to describe the variability among separate sample means obtained from one sample to another
 We interpret the SE of the mean to mean that another similarly conducted study may give a mean that lies within x̄ ± SE.

Standard Error
 SD is about the variability of individuals

 SE is used to describe the variability in the means of repeated
samples taken from the same population
 For example, imagine 5,000 samples, each of the same size n=11
 This would produce 5,000 sample means. This new collection has its
own pattern of variability
 We describe this new pattern of variability using the SE, not the SD

Example: The heart transplant surgery
n=11, SD=168.89, Mean=161 days
 What happens if we repeat the study? What will our next mean
be? Will it be close? How different will it be? Focus here is on
the Generalizability of the study findings
 The behavior of mean from one replication of the study to the
next replication is referred to as the sampling distribution of
mean
 We can also have sampling distribution of the median or the SD

SE…
SE = SD/√n = 168.89/√11 = 50.9 days
We interpret this to mean that a similarly conducted study might produce an average survival time that is near 161 days, ±50.9 days.
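The ±50.9 days figure follows from dividing the SD by the square root of the sample size. A minimal Python sketch, with an illustrative function name, is:

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / sqrt(n)

# Heart transplant example: n = 11, SD = 168.89 days
se = standard_error(168.89, 11)
print(round(se, 1))  # 50.9
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.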

Coefficient of variation (CV)
 The standard deviation is an absolute measure of deviation
of observations around their mean and is expressed with
the same unit of the data
 Due to this nature of the standard deviation it is not directly
used for comparison purposes with respect to variability
 A special measure called the coefficient of variation, is
often used for this purpose

Coefficient of variation (CV)...
 When two data sets have different units of measurement, or their means differ sufficiently in size, the CV should be used as a measure of dispersion
 It is the best measure to compare the variability of two series or sets of observations
 Data with a smaller coefficient of variation are considered more consistent

CV ...
CV is the ratio of the SD to the mean multiplied by 100.
$$CV = \frac{S}{\bar{x}} \times 100$$
              SD          Mean         CV (%)
SBP           20 mmHg     140 mmHg     14.3
Cholesterol   80 mg/dl    400 mg/dl    20.0
 “Cholesterol is more variable than systolic blood pressure”
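The two CV values in the table can be reproduced with a short Python sketch; the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (SD / mean) * 100."""
    return sd / mean * 100

cv_sbp = coefficient_of_variation(20, 140)           # systolic blood pressure
cv_cholesterol = coefficient_of_variation(80, 400)   # cholesterol

print(round(cv_sbp, 1), round(cv_cholesterol, 1))  # 14.3 20.0
```

Because the units cancel in the ratio, the two percentages can be compared directly even though the variables are measured on different scales.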

Thank you !!
