
DEFENCE UNIVERSITY COLLEGE OF HEALTH SCIENCE

Introduction to
Biostatistics

Chapter one: Introduction to Biostatistics
1.1 Statistics and Biostatistics
Definition of Statistics
 We can define Statistics in two ways.

1. Plural sense (layman's definition).


 It is an aggregate or collection of numerical facts.

2. Singular sense (formal definition)

 Statistics is defined as the science of collecting, organizing, presenting, analyzing and interpreting numerical data for the purpose of assisting in making more effective decisions.

 Biostatistics: The application of statistical methods to the fields of biological and health sciences.

Rationale for studying Biostatistics
 More and more things (facts) are now measured quantitatively in medicine and public health.
 Statistical techniques give a way of organizing information on a wider and more formal basis.
 The planning, conduct and interpretation of much medical and public health research are becoming increasingly reliant on statistical methods.
Examples:
Is this new drug better than the one commonly in use? How much
better?
In testing a new drug, how many patients must be treated, and in
what manner, in order to demonstrate its worth?
Which group of the population is more affected by disease x? etc.

1.2 Definition of some basic terms
 Statistical Population: It is the collection of all possible observations of a specified characteristic of interest (possessing certain common property) that is under study.
 Sample: It is a subset of the population, selected using some sampling technique in such a way that it represents the population.
 Sampling: The process or method of selecting a sample from the population.
 Sample size: The number of elements or observations to be included in the sample.
 Census: Complete enumeration or observation of the elements of the population.
Or it is the collection of data from every element in a population
 Parameter: Characteristic or measure obtained from a population.
 Statistic: Characteristic or measure obtained from a sample.
 Variable: It is an item of interest that can take on many different numerical
values.

1.3 Applications, uses and limitation of statistics
a. Application of Statistics:
 In almost all fields of human endeavor.

 Almost everyone encounters numerical facts in daily life, e.g. about prices.

 Applicable in some processes, e.g. the invention of certain drugs, the extent of environmental pollution.

 In industries especially in quality control area.

b. Uses of statistics:
 The main function of statistics is to enlarge our knowledge of complex phenomena. The following are some uses of statistics:
 It presents facts in a definite and precise form.

 Data reduction.

 Measuring the magnitude of variations in data.

 Furnishes a technique of comparison.

 Estimating unknown population characteristics.

 Formulating and testing hypotheses.

 Studying the relationship between two or more variables.

 Forecasting future events.

c. Limitations of statistics:

 As a science, statistics has some limitations of its own.

 Deals mainly with quantitative information.

 Deals only with aggregates of facts and not with individual data items.

 Statistical results are only approximately, not mathematically, correct.

 Statistics can be easily misused and therefore should be used by experts.

1.4 Uses of Biostatistics
• Provide methods of organizing information
• Assessment of health status in terms of:
Magnitudes of a disease/condition
Assessing risk factors in terms of cause & effect relationship
Vaccination uptake
• Resource allocation
• Magnitude of association between exposure and outcome
• Making diagnosis and choosing an appropriate treatment implicitly.
• Health program evaluation
• Drawing of inferences from sample to population information.

Characteristics of statistical data

 Must be in aggregates
 Affected by multiplicity of causes
 Estimated according to a reasonable standard of accuracy
 Collected in a systematic manner for a predetermined purpose
 Must be placed in relation to each other

Types of Statistics
 Depending on how data are used, statistics is divided into two main areas or branches.

1. Descriptive statistics:
• Ways of organizing and summarizing data
• Methods for identifying the important features of data and extracting useful
information
• Example: tables, graphs, numerical summary measures
• Describing the entire characteristics of the data

2. Inferential statistics
• Methods used for drawing conclusions about a population based on the
information contained in a sample of observations drawn from that
population
• For example, the average income of all workers in Ethiopia (the population) can be estimated from figures obtained from a few hundred workers (the sample).
• It is important because statistical data usually arise from samples.

• Statistical techniques based on probability theory are required.

• Principles of probability, estimation, hypothesis testing, etc.

Variable
• Variable: A characteristic which takes on different values in different persons, places, or things
• It is also a characteristic of interest that can take on many different numerical values.
• Any aspect of an individual or object that is measured or observed and can take any value
• Variables can be broadly classified into:
– Qualitative or categorical, or
– Quantitative or numerical variables

Quantitative variables are divided into two types:
 Discrete variable: It can only have a finite number of values in any
given interval
• It takes count (whole-number) values
• Characterized by gaps or interruptions between the values
• Example: number of patients, number of bedrooms in a hospital, etc.
 Continuous variable: It can have an infinite number of possible
values in any given interval
• It can take decimal (fractional) values
• Does not possess the gaps or interruptions
• Examples: Age, height, weight …
1.5 Scales of measurement
• Measurement scale refers to the property of value assigned to the
data based on the properties of order, distance and fixed zero.

Order: The property of order exists when an object that has more of the
attribute than another object is given a bigger number by the rule system.
Distance: The property of distance is concerned with the relationship of
differences between objects
Fixed zero: A measurement system possesses a rational zero (fixed zero) if
an object that has none of the attribute in question is assigned the number
zero by the system of rules.
In effect, there are four types of measurement scales:
1. Nominal scale:
 The simplest type of data, in which the values fall into unordered categories or classes

 Nominal scales are measurement systems that possess none of the three properties: order, distance and fixed zero.

 Level of measurement which classifies data into mutually exclusive, all-inclusive categories in which no order or ranking can be imposed on the data.
 No arithmetic or relational operations can be applied.
 Uses names, labels, or symbols to classify each measurement

 Examples:
 Sex: (male, female)
 Blood type: (A, B, AB, O)
 Marital status: (single, married, divorced, widowed)
 Race: (Black, White, Latino, other)

2. Ordinal scale:
 Assigns each measurement to one of a limited number of categories that are ranked in terms of order
 Ordinal scales are measurement systems that possess the property of order, but not the properties of distance and fixed zero.
 Level of measurement which classifies data into categories that can be ranked, but the differences between the ranks are not meaningful.
 Arithmetic operations are not applicable but relational operations are applicable.
 Ordering is the sole property of ordinal scale.
Examples:
 Letter grades (A, B, C, D, F).
 Rating scales (Excellent, Very good, Good, Fair, poor).
 Patient cancer pain level or stages: 1. None, 2. Mild, 3. Moderate, 4. Severe

3. Interval scale:
• Measured on a continuum, and differences between any two numbers on the scale are of known size
 Interval scales are measurement systems that possess the properties of order and distance, but not the property of fixed zero
 Level of measurement which classifies data that can be ranked and for which differences are meaningful.
 However, there is no meaningful zero, so ratios are meaningless.

 Addition and subtraction are applicable, but multiplication and division are not.

 Relational operations are also possible.
• Example: temperature in °F on 4 consecutive days
• Days: A B C D
• Temp. (°F): 50 55 60 65
• For these data, day A (50 °F) is not only cooler than day D (65 °F); it is exactly 15 °F cooler.
• The scale has no true zero point: “0” is arbitrarily chosen and does not reflect the absence of temperature.
• IQ scores are another example of an interval scale.
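
The claim that differences are meaningful on an interval scale while ratios are not can be checked numerically. The sketch below is only an illustration: it converts the four Fahrenheit readings from the example to Celsius (the function name and the choice of Celsius as the second scale are assumptions made here) and compares differences and ratios on both scales.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32) * 5 / 9


# Readings from the example above (days A to D).
temps_f = {"A": 50, "B": 55, "C": 60, "D": 65}
temps_c = {day: fahrenheit_to_celsius(t) for day, t in temps_f.items()}

# Differences keep their meaning: every 5 °F step is the same size in °C.
pairs = [("A", "B"), ("B", "C"), ("C", "D")]
steps_c = [temps_c[b] - temps_c[a] for a, b in pairs]

# Ratios do not: "D is 1.3 times A" is true of the Fahrenheit numbers only.
ratio_f = temps_f["D"] / temps_f["A"]
ratio_c = temps_c["D"] / temps_c["A"]

print("Steps in °C:", [round(s, 2) for s in steps_c])
print("Ratio D/A in °F:", round(ratio_f, 2))
print("Ratio D/A in °C:", round(ratio_c, 2))
```

The equal steps survive the change of scale, but the ratio changes with the arbitrary zero point, which is why ratios are not interpreted on an interval scale.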
4. Ratio scale:
• Measurement begins at a true zero point and the scale has equal intervals
• Ratio scales are measurement systems that possess all three properties:
order, distance, and fixed zero.
• The added power of a fixed zero allows ratios of numbers to be
meaningfully interpreted
• Level of measurement which classifies data that can be ranked, differences
are meaningful, and there is a true zero.
• True ratios exist between the different units of measure
• All arithmetic and relational operations are applicable.
• Examples:
• The ratio of Bekele's height to Martha's height is 1.32; such a statement is not meaningful on an interval scale.
• Height, weight, blood pressure
• Someone whose age is 40 is twice as old as someone whose age is 20

Figure: Degree of precision in measuring, increasing from the nominal scale through the ordinal and interval scales to the ratio scale.

CHAPTER TWO: Presentation and Summarization of Data
These are techniques used to organize and summarize a set of data, such as:

 Tables

 Graphs

 Numerical summary measures

 Measures of central tendency

 Measures of variability

 The methods of describing variables differ depending on the type of data (numerical or categorical).

Introduction:
Data:
 What is data? Raw data is recorded information in its original collected
form, whether it be counts or measurements.

 The raw material for statistics

 It can be obtained from:


 Routinely kept records on book
 Surveys
 Census
 Vital registration
 Reports

2.1 Types of Data
1. Primary data: Data measured or collected by the investigator or the user directly from the source for the purpose of a particular study.
Two activities are involved:
 Planning and
 Measuring.
i. Planning: It should involve the following considerations:
 Identify source and elements of the data.
 Decide whether to consider sample or census.
 If sampling is preferred, decide on sample size, selection method,… etc.
 Decide measurement procedure.
 Set up the necessary organizational structure

ii. Measuring: there are different options.
Focus Group
Telephone Interview
Mail Questionnaires
Door-to-Door Survey
Mall Intercept
New Product Registration
Personal Interview and
Experiments are some of the sources for collecting the primary data.

2. Secondary data: Data which have been collected and statistically treated by some other person or agency, and which are then used for another purpose.
 Data gathered or compiled from published and unpublished sources or files.
 When the source is secondary data, check that:

 The type and objectives of the original data collection are known.

 The purpose for which the data were collected is compatible with the present problem.

 The nature and classification of the data are appropriate to our problem.

 There are no biases or misreporting in the published data.


 Note: Data which are primary for one may be secondary for the other.
2.2 Methods of data collection, organization and presentation
i. Data collection methods and Tools
 The following are some of the methods used for collecting the data:
► Observation
► Face-to-face and self-administered interviews
► Postal or mail method and telephone interviews
► Focus group discussions
► Desk review
► Experiments
► Others, such as life histories, case studies, nominal group techniques, etc.

Common problems in data collection
♥ The following are some of the common problems during data collection:

 Language barriers

 Lack of adequate time

 Expense

 Inadequately trained and experienced staff

 Invasion of privacy

 Bias

 Cultural norms

ii. Methods of data organization and presentation
 Organization of data: Summarization of data in some meaningful way,
e.g. table form
 Presentation of the data: The process of re-organization, classification,
compilation and summarization of data to present it in a meaningful form.
Methods of data presentation:
♥ The presentation of data is broadly classified into the following two categories:
 Tabular presentation

 Diagrammatic and Graphic presentation

♥ The process of arranging data into classes or categories according to their similarities is technically called classification.
♥ Classification is a preliminary step that prepares the ground for proper presentation of data.
i. Methods of data presentation by using tables:

Frequency: is the number of values in a specific class of the distribution.

Frequency distribution: is the organization of raw data in table form using classes and frequencies.

Example: Presentation of student age by using table


Age Group Frequency
10-20 5
20-30 69
30-40 20

Why Use Frequency Distributions?
 The reasons for constructing a frequency distribution are as follows:

 To organize the data in a meaningful and intelligible way.

 To enable the reader to determine the nature or shape of the distribution.

 To facilitate computational procedures for measures of average and spread.

 To enable the researcher to draw charts and graphs for the presentation of data.

 To enable the reader to make comparisons between different data sets.

 Types of Frequency Distribution

 Categorical frequency distribution

 Ungrouped frequency distribution

 Grouped frequency distribution

1. Categorical frequency distribution
 Used for data that can be placed in specific categories, such as nominal or ordinal data, e.g. marital status.

Example: A social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced). Construct a categorical frequency distribution.

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution:

Class   Tally      Frequency   Percent
(1)     (2)        (3)         (4)
M       ///// /    6           24
S       ///// //   7           28
D       ///// //   7           28
W       /////      5           20

2. Ungrouped frequency distribution
 It is a table of all the distinct values that occur in the data, along with their corresponding frequencies.
Example: Consider the ages of 20 students who read in the library last night:
30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30, and 41.
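
The slide gives the data but no worked table. As a minimal sketch, assuming Python's standard `collections.Counter` is an acceptable way to do the tallying (the variable names are illustrative), the ungrouped frequency distribution for these 20 ages can be built as follows.

```python
from collections import Counter

# Ages of the 20 students from the example above.
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# Count how often each distinct age occurs, then order the rows by age.
freq = Counter(ages)
table = sorted(freq.items())

print("Age  Frequency")
for age, count in table:
    print(f"{age:>3}  {count:>9}")
print("Total", sum(freq.values()))
```

Each row of the printed table is one distinct age with its frequency, and the frequencies add up to the 20 observations.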

3. Grouped frequency distribution
• When the range of the data set is large, the data must be grouped into classes that are more than one unit in width.

Example: Marks of students in a class

89, 17, 21, 100, 11, 3, 90, 45, 41, 67, 87, 34, 69, 3, 39, 63, 41, 57, 53, 12, 79, 91, 42, 100, 62, 73, 1, 38, 56, 45, 25, 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27 …
We need to summarize the data to make them meaningful.

Definitions:
• Grouped frequency distribution: a frequency distribution in which several values are grouped into one class.
• Class limits: the values that separate one class in a grouped frequency distribution from another.
– The limits can actually appear in the data, and there is a gap between the upper limit of one class and the lower limit of the next.
• Unit of measurement (U): the distance between two possible consecutive measures. It is usually taken as 1, 0.1, 0.01, 0.001, ...
• Class boundaries: the values that separate one class in a grouped frequency distribution from another without gaps.
– The boundaries have one more decimal place than the raw data and therefore do not appear in the data.
• Class width: the difference between the upper and lower class boundaries of any class.
– It is also the difference between the lower limits of any two consecutive classes, or the difference between any two consecutive class marks.
• Class mark (midpoint): the average of the lower and upper class limits, or the average of the lower and upper class boundaries.
• Cumulative frequency: the number of observations less than or equal to (or greater than or equal to) a specific value.

 Cumulative frequency above: the total frequency of all values greater than or equal to the lower class boundary of a given class.

 Cumulative frequency below: the total frequency of all values less than or equal to the upper class boundary of a given class.

 Cumulative frequency distribution (CFD): the tabular arrangement of class intervals together with their corresponding cumulative frequencies. It can be of the "more than" or "less than" type, depending on the type of cumulative frequency used.

 Relative frequency (rf): the frequency divided by the total frequency.

 Relative cumulative frequency (rcf): the cumulative frequency divided by the total frequency.
Guidelines for classes

1. There should be between 5 and 20 classes.

2. The classes must be mutually exclusive. This means that no data value can fall into two different classes.

3. The classes must be all inclusive or exhaustive. This means that all data values must be included.

4. The classes must be continuous. There are no gaps in a frequency distribution.

5. The classes must be equal in width. The exception here is the first or last class. It is possible to have a "below ..." or "... and above" class. This is often used with ages.
Steps for constructing Grouped frequency Distribution
1. Find the largest and smallest values.
2. Compute the range: R = Maximum - Minimum.
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule $k = 1 + 3.32\log_{10} n$, where $k$ is the number of classes desired and $n$ is the total number of observations.
4. Find the class width $W = R/k$, rounded up.
5. Pick a suitable starting point less than or equal to the minimum value. This is the lower limit of the first class; keep adding the class width to it to get the lower limits of the remaining classes.
6. To find the upper limit of the first class, subtract U from the lower limit of the second class. Then continue to add the class width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units to the upper limits.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies.
11. Find the relative frequencies and/or relative cumulative frequencies.
Example 2.3

Construct a frequency distribution for the following data.

11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27

Solutions: 6 11 14 17 18 19 20 21 22 22 23 26 27 27 29 31 33 34 38 39
Step 1: Find the highest and the lowest value H=39, L=6

Step 2: Find the range; R=H-L=39-6=33

Step 3: Select the number of classes desired using Sturges' formula:

$k = 1 + 3.32\log_{10} n = 1 + 3.32\log_{10}(20) = 5.32 \approx 6$ (rounding up)

Step 4: Find the class width: $w = R/k = 33/6 = 5.5 \approx 6$ (rounding up)

Step 5: Select the starting point, let it be the minimum observation.


Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Find the class boundaries:

E.g. for class 1, lower class boundary = 6 - U/2 = 5.5

Upper class boundary = 11 + U/2 = 11.5

 Then continue adding w to both boundaries to obtain the remaining boundaries. Doing so gives the following classes.

Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5

The complete frequency distribution follows:

Class limit   Class boundary   Class mark   Tally      Freq.   Cf (less than type)   Cf (more than type)   rf     rcf (less than type)
6 – 11        5.5 – 11.5       8.5          //         2       2                     20                    0.10   0.10
12 – 17       11.5 – 17.5      14.5         //         2       4                     18                    0.10   0.20
18 – 23       17.5 – 23.5      20.5         ///// //   7       11                    16                    0.35   0.55
24 – 29       23.5 – 29.5      26.5         ////       4       15                    9                     0.20   0.75
30 – 35       29.5 – 35.5      32.5         ///        3       18                    5                     0.15   0.90
36 – 41       35.5 – 41.5      38.5         //         2       20                    2                     0.10   1.00
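
The construction steps in Example 2.3 can also be carried out programmatically. The following is a minimal sketch, assuming a unit of measurement U = 1, the minimum observation as the starting point, and rounding up for both k and w as in the worked solution; the dictionary keys and variable names are illustrative, not part of the original notes.

```python
import math

# Raw data from Example 2.3.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
U = 1  # unit of measurement (assumed: whole numbers)

n = len(data)
R = max(data) - min(data)                   # Step 2: range
k = math.ceil(1 + 3.32 * math.log10(n))     # Step 3: Sturges' rule, rounded up
w = math.ceil(R / k)                        # Step 4: class width, rounded up
start = min(data)                           # Step 5: starting point

rows = []
cf_less = 0
for i in range(k):
    lower = start + i * w                   # lower class limit
    upper = lower + w - U                   # upper class limit
    f = sum(lower <= x <= upper for x in data)
    cf_less += f
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - U / 2, upper + U / 2),
        "mark": (lower + upper) / 2,
        "f": f,
        "cf_less": cf_less,                 # "less than" cumulative frequency
        "cf_more": n - cf_less + f,         # "more than" cumulative frequency
        "rf": f / n,
        "rcf": cf_less / n,
    })

for r in rows:
    lo, hi = r["limits"]
    print(f"{lo:>2} - {hi:<2}  mark={r['mark']:<5} f={r['f']}  "
          f"cf_less={r['cf_less']:>2}  cf_more={r['cf_more']:>2}  "
          f"rf={r['rf']:.2f}  rcf={r['rcf']:.2f}")
```

Under these assumptions the loop reproduces the six classes of the table above, from 6 – 11 to 36 – 41.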

Guidelines for constructing tables
 Tables should be as simple as possible

 Tables should be self-explanatory

 Clear title (it answers: what? when? where? how classified?), placed above the table

 Labelled row and columns

 Totals should be shown on both rows and columns

 State clearly the unit of measurement used,

 Explain codes and abbreviations in the foot-note,

 Show total numbers

 If data is not original, indicate the source in foot-note

ii. Methods of data presentation by using diagrams:

 Diagrams are appropriate for presenting discrete data

 Importance:
 They have greater attraction.
 They facilitate comparison.
 They are easily understandable.
 Used to understand patterns and trends
 Gives quick overall impression of the data
 The most commonly used diagrammatic presentations for discrete as well as qualitative data include:
 Pie charts
 Bar charts

Limitations of Diagrammatic Representation
1. Diagrammatic representation is used only for purposes of comparison.

2. Diagrammatic representation is not an alternative to tabulation. It only strengthens the textual exposition of a subject.

3. It can give only an approximate idea and as such where greater accuracy
is needed diagrams will not be suitable.

4. They fail to bring to light small differences

Specific types of graphs:

• Bar graph and pie chart: nominal, ordinal and discrete data
• Histogram, box plot, scatter plot and line graph: continuous data
• Others
Pie charts: A pie chart is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.
Example: Distribution of deaths for females, England, 1989

Figure: Distribution of deaths for females, England, 1989

Bar charts
 A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.

 They are useful for comparing aggregates over time or space.

 Bars can be drawn either vertically or horizontally.

 There are different types of bar charts.

• Simple bar chart

• Deviation or two way bar chart

• Broken bar chart

• Component or sub divided bar chart.

• Multiple bar charts.

 Essentially, the most common are the following:

 Simple bar chart

 Component or sub divided bar chart.

 Multiple bar charts.


Method of constructing bar chart
a. All the bars must have equal width
b. The bars are not joined together
c. The different bars should be separated by equal distances
d. All the bars should rest on the same line called the base

Simple bar charts
Example: The following is an example of a simple bar chart

Figure: Type of ICU for 25 patients, Hospital x, 2006

Component (sub-divided) bar chart
• If there are different quantities forming the sub-divisions of the totals, simple
bars may be sub-divided in the ratio of the various sub-divisions to exhibit the
relationship of the parts to the whole

• The order in which the components are shown in a “bar” is followed in all bars
used in the diagram

Example: The following is an example of a component bar chart

Figure: Plasmodium species distribution for confirmed malaria cases, Zeway, 2003 (vertical axis: percent, 0 to 100; horizontal axis: August, October and December 2003; components: P. falciparum, P. vivax and mixed)
Multiple bar graph
 Bar charts can be used to represent the relationships among more than two variables

 The following figure shows the relationship between children’s reports of breathlessness
and cigarette smoking by themselves and their parents

Example: The following is an example of a multiple bar chart

Figure: Prevalence of self-reported breathlessness among school children, 1998 (vertical axis: breathlessness, per cent; horizontal axis: parents smoking, neither, one or both; bars: child never smoked, child smoked occasionally, child smoked one/week or more)

 We can see quickly from the graph that the prevalence of the symptoms increases both with the child’s smoking and with that of their parents
Graphical presentation of data:
 These presentations are appropriate for presenting continuous data
Histogram:
 Histograms are frequency distributions with continuous class intervals that have been turned into graphs
 To construct a histogram, we draw the class boundaries on a horizontal line and the frequencies on a vertical line
 Non-overlapping intervals that cover all of the data values must be used
 Bars are then drawn over the intervals in such a way that the area of each bar is proportional to the frequency of its interval

Example:
Table: Distribution of the age of women at the time of marriage,
Ethiopia, 2008.

Figure: Distribution of the age of women at the time of marriage, Ethiopia, 2008
Frequency polygon:
 A frequency distribution can be portrayed graphically in yet another way by means of a frequency polygon

 To draw a frequency polygon, we connect the midpoints of the tops of the bars of the histogram with straight lines

 It is constructed using the class midpoints

 Graphs drawn in this way are called frequency polygons (line graphs).

 Frequency polygons are superior to histograms for comparing two or more


sets of data.

Example: It can be also drawn without erecting rectangles by joining the
top midpoints of the intervals representing the frequency of the classes as
follows:
Figure: Age of women at the time of marriage (frequency polygon; vertical axis: number of women; horizontal axis: age)

Ogive Curve /cumulative frequency polygon:
 An ogive is a graph of a cumulative frequency distribution

 Ogives are much more common than frequency polygons

 It is a graph showing the cumulative frequency (less than or more than type) plotted against the upper or lower class boundaries, respectively.

 Class boundaries are placed along the horizontal axis and the corresponding cumulative frequencies along the vertical axis.

 The points are joined by a free hand curve.

General rule for graphical presentation of data
1. Every graph should be self-explanatory and as simple as possible.

2. Titles are usually placed below the graph and should again answer the questions: what? where? when? how classified?

3. Legends or keys should be used

4. The axes label should be placed to read from the left side and from the
bottom.

5. The units in to which the scale is divided should be clearly indicated.

6. The numerical scale representing frequency must start at zero, or a break in the line should be shown.

Example 2: Heart rate of patients admitted to hospital Y, 1998.

Stem and Leaf Plot
• A quick way to organize data to give visual impression similar to a histogram
while retaining much more detail on the data

• Draw a vertical line and place the first digits of each value called the “stem”
on the left side of the line

• The numbers on the right side of the vertical line represent the second digit of each observation; they are the “leaves”
Example: 43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41
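
The plot itself did not survive on the slide. A minimal sketch of the construction described above, assuming two-digit values so that the tens digit is the stem and the units digit is the leaf (variable names are illustrative):

```python
from collections import defaultdict

# Values from the example above.
values = [43, 28, 34, 61, 77, 82, 22, 47, 49, 51, 29, 36, 66, 72, 41]

# Split each value into stem (tens digit) and leaf (units digit).
plot = defaultdict(list)
for v in sorted(values):
    stem, leaf = divmod(v, 10)
    plot[stem].append(leaf)

# Print one row per stem, with the leaves in increasing order.
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Sorting the values first keeps the leaves in order within each stem, so the row lengths show the shape of the distribution while every original observation can still be read off.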

Questionnaire design and interview techniques.
Types of questions
♥ Open-ended question
 Example: “Can you describe exactly what the traditional birth attendant did when your labor started?”

♥ Closed question
 Example: What is your marital status?
1) Single
2) Married
3) Divorced
4) Widowed

Steps in designing questionnaire
1. Content preparation
2. Formulating questions
3. Sequencing of questions
4. Formatting the questionnaire
5. Translation

Quality of a measurement
 Quality of a measurement has two dimensions
1. Reliability or precision
 is the degree of reproducibility, or repeatability
 That is, the degree to which further measurement shows similar results
 Statistically, measurements that lack precision are subject to random error

2. Validity or accuracy
 Is an approximation to the “truth” as determined by an instrument accepted as the “gold standard”
 For a physiological measure, accuracy has the intuitive sense of the
instrument being able to get to the real value as it exists in nature

♥ How to maximize the quality of measurements?
 Use a standardized instrument, or adapt one
 Train your observers;
 Automate measurement;
 Repeat the measurement
 Blinding
 Calibration

2.3 Measure of central tendency
Introduction:

 The following are some of the common measures of central tendency:
 The mean (arithmetic mean, geometric mean and harmonic mean)
 The mode

 The median

 Quantiles (quartiles, deciles and percentiles)

Class Frequency
39.5-44.5 7
44.5-49.5 10
49.5-54.5 22
54.5-59.5 15
59.5-64.5 12
64.5-69.5 6
69.5-74.5 3
Solution:

E.g. the following data show the ages of 30 sampled patients in JUSH:
6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25, 26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49. Find:
a. All quartiles (Q1, Q2 and Q3)
b. The deciles D2 and D5
c. The percentiles P50 and P75
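
The worked solution is not present in this copy of the notes. The sketch below shows one way to compute these quantiles, assuming the common textbook convention that the i-th of m quantiles sits at position i(n + 1)/m in the sorted data, with linear interpolation between neighbouring values. This convention is an assumption made here; other textbooks and software use different rules and can give slightly different answers. The function name `quantile` is illustrative.

```python
def quantile(sorted_data, i, m):
    """i-th of m quantiles via the (n + 1) position rule (assumed convention)."""
    n = len(sorted_data)
    position = i * (n + 1) / m          # 1-based position in the sorted data
    k = int(position)                   # whole part: item just below the position
    fraction = position - k             # fractional part used for interpolation
    if k < 1:
        return sorted_data[0]
    if k >= n:
        return sorted_data[-1]
    below, above = sorted_data[k - 1], sorted_data[k]
    return below + fraction * (above - below)


ages = [6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25,
        26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49]
ages.sort()

q1, q2, q3 = (quantile(ages, i, 4) for i in (1, 2, 3))
d2, d5 = quantile(ages, 2, 10), quantile(ages, 5, 10)
p50, p75 = quantile(ages, 50, 100), quantile(ages, 75, 100)

print("Q1, Q2, Q3:", q1, q2, q3)
print("D2, D5:", round(d2, 2), d5)
print("P50, P75:", p50, p75)
```

Whatever convention is used, Q2, D5 and P50 all coincide with the median, and Q3 coincides with P75, which gives a quick consistency check on the results.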

2.4 Measures of Variations or Dispersions
Introduction:
The scatter or spread of items of a distribution is known as dispersion or variation.

In other words, the degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data.

Measures of dispersion are statistical measures which provide ways of measuring the extent to which data are dispersed or spread out.

 A measure of dispersion conveys information regarding the amount of
variability present in a set of data
 We need to know something about the variability or spread of the values
whether they tend to be clustered close together, or spread out over a
broad range
Note:
1. If all the values are the same
→ There is no dispersion
2. If all the values are different
→ There is a dispersion
3. If the values are close to each other
→The amount of Dispersion is small
4. If the values are widely scattered
→ The Dispersion is greater

Objectives of measuring Variation:
To judge the reliability of measures of central tendency

To control variability itself.

To compare two or more groups of numbers in terms of their variability.

To make further statistical analysis

Absolute and Relative Measures of Dispersion:
The measures of dispersion which are expressed in terms of the original unit of a series are termed absolute measures.

Such measures are not suitable for comparing the variability of two distributions which are expressed in different units of measurement or have different average sizes.

Relative measures of dispersion are a ratio or percentage of a measure of absolute dispersion to an appropriate measure of central tendency, and are thus pure numbers independent of the units of measurement.

For comparing the variability of two distributions (even if they are measured in
the same unit), we compute the relative measure of dispersion instead of absolute
measures of dispersion.

Types of measures of variability:
The most commonly used measures of dispersion are:
1. Range and relative range

2. Quartile deviation and coefficient of quartile deviation

3. Mean deviation and coefficient of mean deviation

4. Standard deviation, coefficient of variation and standard scores

RANGE AND RELATIVE RANGE
i. The Range (R):
 The range is the largest score minus the smallest score.
 Being determined by only the two extreme observations, the range is of limited use because it tells us nothing about how the data between the extremes are spread.
 Because the range is greatly affected by extreme scores, it may give a distorted picture of the scores.
 The following two distributions have the same range, 13, yet appear to
differ greatly in the amount of variability.

Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45

Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45

94
 For this reason, among others, the range is not the most important measure of variability.

$R = L - S$, where $L$ is the largest observation and $S$ is the smallest observation.

Range for grouped data:
 If the data are given as a continuous frequency distribution, the range is computed as:
$R = UCL_k - LCL_1$, where $UCL_k$ is the upper class limit of the last class and $LCL_1$ is the lower class limit of the first class.
 This is sometimes expressed as:
$R = X_k - X_1$, where $X_k$ is the class mark of the last class and $X_1$ is the class mark of the first class.
95
Merits and Demerits of range
Merits:
 It is rigidly defined.
 It is easy to calculate and simple to understand.
Demerits:
 It is not based on all observations.
 It is highly affected by extreme observations.
 It is affected by fluctuation in sampling.
 It is not amenable to further algebraic treatment.
 It cannot be computed in the case of an open-ended distribution.
 It is very sensitive to the size of the sample.

96
ii. Relative Range (RR):
It is also sometimes called the coefficient of range and is given by:

$RR = \frac{L - S}{L + S} = \frac{R}{L + S}$

Example:
1. Find the relative range of the above two distributions. (Exercise!)
2. If the range and relative range of a series are 4 and 0.25 respectively, what is the value of:
a. the smallest observation
b. the largest observation
Solution (2):

$R = 4 \Rightarrow L - S = 4$ .......... (1)
$RR = 0.25 \Rightarrow \frac{L - S}{L + S} = 0.25 \Rightarrow L + S = 16$ .......... (2)
Solving (1) and (2) simultaneously gives
$L = 10$ and $S = 6$
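The range and relative range formulas can be checked numerically. The following is a minimal sketch in Python; the function names are illustrative and not part of the original notes. It applies the formulas to the two distributions given earlier and recovers L and S for Example 2.

```python
def value_range(values):
    """Range R = L - S (largest minus smallest observation)."""
    return max(values) - min(values)


def relative_range(values):
    """Relative range RR = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


def extremes_from_range(r, rr):
    """Recover (L, S) from R = L - S and RR = R / (L + S)."""
    total = r / rr                  # L + S
    largest = (total + r) / 2
    smallest = (total - r) / 2
    return largest, smallest


dist1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

print(value_range(dist1), value_range(dist2))        # both ranges
print(relative_range(dist1), relative_range(dist2))  # both relative ranges
print(extremes_from_range(4, 0.25))                  # (L, S) for Example 2
```

Because both distributions share the same extremes, the range and relative range cannot distinguish them, which is the limitation the notes describe.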
97
Quartile deviation and coefficient of Quartile deviation
i. The Quartile Deviation (semi-interquartile range), Q.D:
 The interquartile range is the difference between the third and the first quartiles of a set of items, and the semi-interquartile range is half of the interquartile range.

$Q.D = \frac{Q_3 - Q_1}{2}$

 The quartile deviation gives the average amount by which the two quartiles differ from the median.
ii. Coefficient of Quartile Deviation (C.Q.D):

$C.Q.D = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{2 \times Q.D}{Q_3 + Q_1} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$

98
Mean deviation and coefficient of Mean deviation
The Mean Deviation (M.D):

The mean deviation of a set of items is defined as the arithmetic mean of the values of the
absolute deviations from a given average. Depending up on the type of averages used we
have different mean deviations.

a) Mean Deviation about the mean

 Denoted by $M.D(\bar{X})$ and given by

$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$

 For a frequency distribution it is given by:

$M.D(\bar{X}) = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{n}$, where $n = \sum_{i=1}^{k} f_i$

Steps to calculate $M.D(\bar{X})$:
1. Find the arithmetic mean, $\bar{X}$.
2. Find the deviation of each reading from $\bar{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
99
b) Mean Deviation about the median

 Denoted by $M.D(\tilde{X})$ and given by

$M.D(\tilde{X}) = \frac{\sum_{i=1}^{n} |X_i - \tilde{X}|}{n}$

 For a frequency distribution it is given by:

$M.D(\tilde{X}) = \frac{\sum_{i=1}^{k} f_i |X_i - \tilde{X}|}{n}$

Steps to calculate $M.D(\tilde{X})$:
1. Find the median, $\tilde{X}$.
2. Find the deviation of each reading from $\tilde{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
100
c) Mean Deviation about the mode

 Denoted by $M.D(\hat{X})$ and given by

$M.D(\hat{X}) = \frac{\sum_{i=1}^{n} |X_i - \hat{X}|}{n}$

 For a frequency distribution it is given by:

$M.D(\hat{X}) = \frac{\sum_{i=1}^{k} f_i |X_i - \hat{X}|}{n}$

Steps to calculate $M.D(\hat{X})$:
1. Find the mode, $\hat{X}$.
2. Find the deviation of each reading from $\hat{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
101
Examples:
1. The following are the numbers of visits made by ten mothers to the local doctor’s surgery.
8, 6, 5, 5, 7, 4, 5, 9, 7, 4
Find the mean deviation about the mean, the median and the mode.
Solutions:
First calculate the three averages:
$\bar{X} = 6$, $\tilde{X} = 5.5$, $\hat{X} = 5$
Then take the absolute deviation of each observation from these averages.

$X_i$            4    4    5    5    5    6    7    7    8    9    Total
$|X_i - 6|$      2    2    1    1    1    0    1    1    2    3    14
$|X_i - 5.5|$    1.5  1.5  0.5  0.5  0.5  0.5  1.5  1.5  2.5  3.5  14
$|X_i - 5|$      1    1    0    0    0    1    2    2    3    4    14

$M.D(\bar{X}) = \frac{\sum_{i=1}^{10} |X_i - 6|}{10} = \frac{14}{10} = 1.4$

$M.D(\tilde{X}) = \frac{\sum_{i=1}^{10} |X_i - 5.5|}{10} = \frac{14}{10} = 1.4$

$M.D(\hat{X}) = \frac{\sum_{i=1}^{10} |X_i - 5|}{10} = \frac{14}{10} = 1.4$
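The worked example can be reproduced with a short Python sketch. The helper name `mean_deviation` is an illustrative choice; the averages come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

md_mean = mean_deviation(visits, mean(visits))      # about the mean (6)
md_median = mean_deviation(visits, median(visits))  # about the median (5.5)
md_mode = mean_deviation(visits, mode(visits))      # about the mode (5)

print(md_mean, md_median, md_mode)
```

For this data set the three mean deviations coincide; that is a property of these particular values, not a general rule.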
102
Coefficient of Mean Deviation (C.M.D)

$C.M.D = \frac{M.D}{\text{Average about which deviations are taken}}$

$C.M.D(\bar{X}) = \frac{M.D(\bar{X})}{\bar{X}}$

$C.M.D(\tilde{X}) = \frac{M.D(\tilde{X})}{\tilde{X}}$

$C.M.D(\hat{X}) = \frac{M.D(\hat{X})}{\hat{X}}$

103
The variance:
Population Variance
If we divide the variation (the sum of squared deviations from the mean) by the number of values in the population, we get the population variance. This variance is the "average squared deviation from the mean".

Population variance $= \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2$

For a frequency distribution it is expressed as:

Population variance $= \sigma^2 = \frac{1}{N} \sum_{i=1}^{k} f_i (X_i - \mu)^2$

104
The variance:
Sample Variance:
 The sum of the squares of the deviations is divided by one less than the sample size, and the population mean is replaced by the sample mean.

Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

105
106
Properties of Variance:
 The main disadvantage of variance is that its unit is the square of the unit
of the original measurement values

 The variance gives more weight to the extreme values as compared to


those which are near to mean value, because the difference is squared in
variance.

 The drawbacks of variance are overcome by the standard deviation.

107
Population Standard Deviation
 Most commonly used measure of variation

 Shows variation about the mean

 Has the same units as the original data

Population standard deviation:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

108
Sample Standard Deviation
 Most commonly used measure of variation

 Shows variation about the mean

 Has the same units as the original data

Sample standard deviation:

$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
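To make the difference between the population (divide by N) and sample (divide by n − 1) formulas concrete, here is a minimal Python sketch. The function names are illustrative, and the data are the ten surgery visits from the mean deviation example, reused here purely as an assumption for demonstration.

```python
import math


def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

pop_var = population_variance(visits)
samp_var = sample_variance(visits)
pop_sd = math.sqrt(pop_var)    # standard deviation is the square root
samp_sd = math.sqrt(samp_var)

print(pop_var, samp_var, pop_sd, samp_sd)
```

The sample variance is always slightly larger than the population variance computed on the same numbers, because the same sum of squares is divided by a smaller number.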

109
Examples
1. Find the variance and standard deviation of the following sample data

110
Example. Compute the variance and SD of the age of 169 subjects from the
grouped data.
Mean = 5810.5/169 = 34.48 years
$S^2$ = 20199.22/(169 − 1) = 120.23
SD = $\sqrt{S^2}$ = $\sqrt{120.23}$ = 10.96 years

111
Properties of Standard Deviation (S.D)
 The SD has the advantage of being expressed in the same units of
measurement as the mean

 SD is considered to be the best measure of dispersion and is used widely because of the properties of the theoretical normal curve.

 However, if the units of measurement of the variables in two data sets are not the same, then their variability cannot be compared by comparing the values of SD.

112
Properties of Standard Deviation (S.D)
Special properties of the standard deviation

1. $\sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \le \sqrt{\frac{\sum (X_i - A)^2}{n - 1}}$, $A \ne \bar{X}$; that is, the deviations are smallest when taken about the mean.
2. For a normal (symmetric, bell-shaped) distribution the following holds.

 Approximately 68.27% of the data values fall within one standard deviation of the mean, i.e. within $(\bar{X} - S, \bar{X} + S)$
 Approximately 95.45% of the data values fall within two standard deviations of the mean, i.e. within $(\bar{X} - 2S, \bar{X} + 2S)$
 Approximately 99.73% of the data values fall within three standard deviations of the mean, i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$
3. If the standard deviation of $X_1, X_2, \ldots, X_n$ is $S$, then the standard deviation of
a) $X_1 + k, X_2 + k, \ldots, X_n + k$ will also be $S$
b) $kX_1, kX_2, \ldots, kX_n$ will be $|k| S$
c) $a + kX_1, a + kX_2, \ldots, a + kX_n$ will be $|k| S$
Examples:
1. The mean and standard deviation of n Tetracycline capsules $X_1, X_2, \ldots, X_n$ are known to be 12 gm and 3 gm respectively. A new set of capsules of another drug is obtained by the linear transformation $Y_i = 2X_i - 0.5$ (i = 1, 2, …, n). What will be the standard deviation of the new set of capsules?

2. The mean and the standard deviation of a set of numbers are 500 and 10 respectively.
a) If 10 is added to each of the numbers in the set, what will be the variance and standard deviation of the new set?
b) If each of the numbers in the set is multiplied by -5, what will be the variance and standard deviation of the new set?
Solutions:
1. Using the property for $a + kX_i$ above, the new standard deviation $= |k| S = 2 \times 3 = 6$ gm.
2. a. They will remain the same: the standard deviation is 10 and the variance is $10^2 = 100$.
b. New standard deviation $= |k| S = |-5| \times 10 = 50$, so the new variance is $50^2 = 2500$.
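The effect of a linear transformation on the standard deviation can be verified numerically. The sketch below uses a small hypothetical data set (9, 12, 15), chosen only because its sample mean is 12 and its sample standard deviation is 3, matching the first example; the helper `linear_transform` is an illustrative name.

```python
from statistics import mean, stdev


def linear_transform(values, a, k):
    """Return the transformed values a + k * x for every x."""
    return [a + k * x for x in values]


x = [9, 12, 15]                       # hypothetical data: mean 12, SD 3

y = linear_transform(x, -0.5, 2)      # Y = 2X - 0.5, as in Example 1
shifted = linear_transform(x, 10, 1)  # adding a constant only
scaled = linear_transform(x, 0, -5)   # multiplying by -5 only

print(stdev(x), stdev(y), stdev(shifted), stdev(scaled))
```

Adding a constant leaves the spread unchanged, while multiplying by k scales the standard deviation by the absolute value of k, regardless of the sign of k.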
114
Coefficient of Variation
 It is a measure used to compare the dispersion in two sets of data, and it is independent of the unit of measurement.

 When two data sets have different units of measurement, or their means differ sufficiently in size, the CV should be used as a measure of dispersion.

 It is the best measure to compare the variability of two series or sets of observations.

 Data with a smaller coefficient of variation are considered more consistent.

$C.V = \frac{S}{\bar{X}} \times 100$

where $S$ is the sample standard deviation and $\bar{X}$ is the sample mean.
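As a small illustration of the formula, the following Python sketch computes the CV for two variables measured in different units. The numbers (weights in kg, heights in cm) are hypothetical values chosen for the illustration and do not come from the notes.

```python
def coefficient_of_variation(sd, mean):
    """C.V = (S / X-bar) * 100, a unit-free percentage."""
    return sd / mean * 100


# Hypothetical summary statistics, for illustration only.
cv_weight = coefficient_of_variation(sd=6.0, mean=60.0)    # kilograms
cv_height = coefficient_of_variation(sd=8.5, mean=170.0)   # centimetres

more_variable = "weight" if cv_weight > cv_height else "height"
print(cv_weight, cv_height, more_variable)
```

Although the height SD is numerically larger, the weights are relatively more variable once each SD is scaled by its own mean.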

115
Coefficient of Variation
Example: Suppose two samples of human males yield the following data:

116
Q: Which one is more variable? Cholesterol or systolic blood pressure?

117
Standard Scores (Z-scores)

 If X is a measurement from a distribution with mean $\mu$ (sample mean $\bar{X}$) and standard deviation $\sigma$ (sample standard deviation $S$), then its value in standard units is

$Z = \frac{X - \mu}{\sigma}$, for a population.

$Z = \frac{X - \bar{X}}{S}$, for a sample.
 Z gives the deviation from the mean in units of standard deviation.
 Z gives the number of standard deviations by which a particular observation lies above or below the mean.
 It is used to compare two observations coming from different groups.
118
Examples:
1. Two sections were given an introduction to statistics examination. The following information was given.

Value                 Section 1    Section 2
Mean                  78           90
Standard deviation    6            5

Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking, who performed better?

Solutions:
Calculate the standard score of both students.

$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2$

$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$

 Student A performed better relative to his section, because the score of student A is two standard deviations above the mean score of his section, while the score of student B is only one standard deviation above the mean score of his section.
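The comparison above can be written as a short Python sketch. The function name `z_score` is illustrative; the inputs are the section means, standard deviations and student scores from the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(90, mean=78, sd=6)   # student A, section 1
z_b = z_score(95, mean=90, sd=5)   # student B, section 2

better = "A" if z_a > z_b else "B"
print(z_a, z_b, better)
```

Student B has the higher raw score, yet student A has the higher standard score, which is why raw scores from different groups should not be compared directly.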
119
Skewness:
 Skewness is a measure of symmetry, or more precisely, the lack of
symmetry or departure from symmetry.

 If the frequency curve (smoothed frequency polygon) of a distribution has


a longer tail to the right of the central maximum than to the left, the
distribution is said to be skewed to the right or said to have positive
skewness.
 If it has a longer tail to the left of the central maximum than to the right, it
is said to be skewed to the left or said to have negative skewness.

120
Skewness:

121
Skewness:
Measures of Skewness

It is denoted by $\alpha_3$.

There are various measures of skewness.

1. The Pearsonian coefficient of skewness

$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{X} - \hat{X}}{S}$

2. Bowley’s coefficient of skewness (coefficient of skewness based on quartiles)

$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$

3. The moment coefficient of skewness

$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{(\sigma^2)^{3/2}} = \frac{M_3}{\sigma^3}$, where $\sigma$ is the population standard deviation and $M_2$, $M_3$ are the second and third moments about the mean.

The shape of the curve is determined by the value of $\alpha_3$:

 If $\alpha_3 > 0$ then the distribution is positively skewed.

 If $\alpha_3 = 0$ then the distribution is symmetric.

 If $\alpha_3 < 0$ then the distribution is negatively skewed.


Remark:
 In a positively skewed distribution, smaller observations are more frequent than larger
observations. i.e. the majority of the observations have a value below an average.
 In a negatively skewed distribution, smaller observations are less frequent than larger
observations. i.e. the majority of the observations have a value above an average.
123
Examples:
1. Suppose the mean, the mode, and the standard deviation of a certain distribution are 32,
30.5 and 10 respectively. What is the shape of the curve representing the distribution?

Solutions:
Use the Pearsonian coefficient of skewness:

$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{32 - 30.5}{10} = 0.15$

$\alpha_3 > 0 \Rightarrow$ The distribution is positively skewed.

124
2. In a frequency distribution, the coefficient of skewness based on the quartiles is given to be 0.5. If the sum of the upper and lower quartiles is 28 and the median is 11, find the values of the upper and lower quartiles.
Solutions:

Given: $\alpha_3 = 0.5$, $\tilde{X} = Q_2 = 11$. Required: $Q_1, Q_3$

$Q_1 + Q_3 = 28$ .......... (*)

$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = 0.5$

Substituting the given values, $\frac{28 - 2(11)}{Q_3 - Q_1} = 0.5$, so
$Q_3 - Q_1 = 12$ .......... (**)
Solving (*) and (**) simultaneously gives
$Q_1 = 8$ and $Q_3 = 20$
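Both skewness examples can be checked with a short Python sketch. The function names are illustrative; `quartiles_from_bowley` simply rearranges Bowley's formula to recover the quartiles from the given sum, median and coefficient.

```python
def pearson_skewness(mean, mode, sd):
    """Pearsonian coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def bowley_skewness(q1, q2, q3):
    """Quartile-based coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def quartiles_from_bowley(skew, q2, q_sum):
    """Recover (Q1, Q3) given the coefficient, the median and Q1 + Q3."""
    q_diff = (q_sum - 2 * q2) / skew     # Q3 - Q1
    return (q_sum - q_diff) / 2, (q_sum + q_diff) / 2


alpha_pearson = pearson_skewness(mean=32, mode=30.5, sd=10)
q1, q3 = quartiles_from_bowley(skew=0.5, q2=11, q_sum=28)

print(alpha_pearson)
print(q1, q3, bowley_skewness(q1, 11, q3))
```

Feeding the recovered quartiles back into `bowley_skewness` returns the coefficient that was given, which confirms the algebra in the worked solution.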
125
Kurtosis:

 Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution.

 A distribution having a relatively high peak is called leptokurtic. If a curve representing a distribution is flat topped, it is called platykurtic.

 The normal distribution, which is neither very high peaked nor flat topped, is called mesokurtic.
 Three terms are used for indicating the degree of peakedness:
 mesokurtic stands for a normal curve,
 leptokurtic for a peaked curve and
 platykurtic for a curve less peaked than normal.

126
127
Measures of kurtosis
The moment coefficient of kurtosis:
 Denoted by $\alpha_4$ and given by

$\alpha_4 = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$

where $M_4$ is the fourth moment about the mean, $M_2$ is the second moment about the mean, and $\sigma$ is the population standard deviation.
The peakedness depends on the value of $\alpha_4$.
 If $\alpha_4 > 3$ then the curve is leptokurtic.
 If $\alpha_4 = 3$ then the curve is mesokurtic.
 If $\alpha_4 < 3$ then the curve is platykurtic.

128
Examples:

1. If the first four central moments of a distribution are:

$M_1 = 0$, $M_2 = 16$, $M_3 = -60$, $M_4 = 162$
a) Compute a measure of skewness.
b) Compute a measure of kurtosis and give your interpretation.

Solutions:
a) $\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{-60}{16^{3/2}} = \frac{-60}{64} = -0.94 < 0$
$\Rightarrow$ The distribution is negatively skewed.

b) $\alpha_4 = \frac{M_4}{M_2^2} = \frac{162}{16^2} = \frac{162}{256} = 0.63 < 3$
$\Rightarrow$ The curve is platykurtic.
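The moment coefficients in this example can be computed with a minimal Python sketch. The function names are illustrative; the inputs are the central moments given in the example, with a negative third moment as implied by the negatively skewed conclusion.

```python
def moment_skewness(m2, m3):
    """alpha_3 = M3 / M2^(3/2)."""
    return m3 / m2 ** 1.5


def moment_kurtosis(m2, m4):
    """alpha_4 = M4 / M2^2."""
    return m4 / m2 ** 2


def kurtosis_label(alpha4):
    """Classify the curve by comparing alpha_4 with 3 (the normal curve)."""
    if alpha4 > 3:
        return "leptokurtic"
    if alpha4 < 3:
        return "platykurtic"
    return "mesokurtic"


alpha3 = moment_skewness(m2=16, m3=-60)
alpha4 = moment_kurtosis(m2=16, m4=162)

print(alpha3, alpha4, kurtosis_label(alpha4))
```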

129
Chapter Three: Demography and health service statistics
3.1: Introduction:

Definition of Demography
 Demography is the science that studies human populations with respect to size, distribution, composition and social mobility; how these features vary; the causes of such variation; and the effect of all these on health, social, ethical and economic conditions.

130
Uses of demographic data
 Planning

• Health service provision

• What a community ‘looks like’

 Population size

e.g. quantities of health personnel & facilities to provide

 Population structure:

e.g. types of services to provide (geriatric vs pediatric )

 Distribution:

e.g. where to provide the services

131
Main sources of demographic data
 Sources of Demographic Data:

 Census

 Survey

 Registration of Vital Events

1. Census:

 Refers to nation-wide counting of population

 Every 10 years in Ethiopia

 Large and complicated

132
Types of census
 Two different ways:

i. De Jure

 The enumeration is done according to the usual or legal place of residence

ii. De Facto

 The enumeration is done according to the actual place of residence on the


day of the census

133
Advantages and disadvantages
 De Jure:
Advantage:
 Unaffected by seasonal and temporary movements
Disadvantage:
 Some omitted and some counted twice
 Information regarding people away from home is incomplete or
inaccurate
 De Facto:
Advantage:
 Less chance of double counting
 Less chance of omission
Disadvantage:
 Affected by tourists and other travellers
 In areas of high migration vital statistics is subject to distortion
134
2. Sample survey
 Based on sampling methods

 Obtaining information from a representative group of a population

 A sample survey is a lighter operation than a census, needing less time,


less hands and less funds.

 Smaller size than census allows collection of more in-depth information


that can then be generalized

135
3. Registration of Vital Events
 Mainly births and deaths

 Census is merely a snapshot while the counting of births and deaths is a


continuous process

 This system is not well developed in Ethiopia

 Characteristics of Vital Registration

 Comprehensiveness

 Compulsory by law

 Central compilation

 Continuous process

136
Demographic Transition
 Conceptual framework to explain population change over time.

 Developed by American demographer Warren Thompson, 1929.

 Observed changes in birth and death rates in industrialized societies over


the past two hundred years.

 Demographic change has got three stages.

 Developed countries started the second stage in the beginning of


eighteenth century.

 Less developed countries began the transition later.

137
Demographic Transition (stages)

138
Demographic Transition (stages)…
i. Pre-transitional:
– High mortality and high fertility
– Young population
– Triangular broad based pattern
– Primitive societies
– Also called Expansive (Type I)
ii. Transitional:
– High birth rate and reduced death rate
– High growth rate
– Young population
– Improved medical care
– Higher socio-economic development
– Triangular population pyramid
– Also called Expansive (Type II)
iii. Post-transitional:
– Low birth and death rates
– Stable, moderate growth rate
– Narrow based pyramid with steeper sides
– Developed countries
– Old population
– Also called Stationary (Type III)
Population Pyramid
 Population pyramid is a graphical presentation of the age-sex distribution
of a population.

 Normally forms the shape of a pyramid.

 Consists of two back-to-back bar graphs, with the population plotted on


the X-axis and age on the Y-axis,

 Males are shown on the left and females on the right in five-year age
groups.

 Young persons at the bottom and the elderly at the top.

140
Population Pyramid

141
Population Pyramid, Ethiopia 2007

142
Population Pyramid…

143
Important Indicators
1. Sex Ratio: The total number of males per 1000 females in the population. It can be expressed as Y to 1000, Y:1 or Y/X, where Y is the number of males and X is the number of females.

2. Child to Women Ratio: The ratio of the number of children under five to the number of women of reproductive age in a given place and time. It can also be used as a measure of fertility.

3. Dependency Ratio: Describes the ratio between the non-productive (age 0-14 and 65+) and productive (15-64) age groups in a given place and time.

144
Dependency ratio

Dependency ratio = (“Non-productive” population / “Productive” population) × 100

EDHS 2005:
47.7% of the population are under 15
48.6% of the population are 15-64
3.7% of the population are over 65
Dependency ratio = (47.7 + 3.7) / 48.6 × 100 ≈ 105.8%
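The dependency ratio calculation can be reproduced with a short Python sketch. The function name is illustrative, and the inputs are the EDHS 2005 percentages quoted above.

```python
def dependency_ratio(under_15, aged_15_64, aged_65_plus):
    """Non-productive (0-14 and 65+) per 100 productive (15-64) people."""
    return (under_15 + aged_65_plus) / aged_15_64 * 100


ratio = dependency_ratio(under_15=47.7, aged_15_64=48.6, aged_65_plus=3.7)
print(round(ratio, 2))
```

A ratio above 100 means there are more dependants than people of working age.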

145
Population change
 Three variables determine the population of any defined area: births,
deaths and migration.
 The balance among these factors determines whether a population
decreases, remains stationary, or increases in number.
 Let $P_t$ = Population at time “t”
$P_0$ = Population at an earlier time “0”
B = Births between time “0” and time “t”
D = Deaths between time “0” and time “t”
I = In-migration / immigration between time “0” and time “t”
O = Out-migration / emigration between time “0” and time “t”
 The population changes by adding births and in-migrants and subtracting deaths and out-migrants:
$P_t = P_0 + B - D + I - O$

146
Population change…

$\text{Natural growth rate} = \frac{\text{Total number of births in a year} - \text{Total number of deaths in the same year}}{\text{Mid-year population in the same year}}$

= CBR − CDR

$\text{Total growth rate} = \frac{\text{Total number of births} - \text{Total number of deaths} + \text{Immigration} - \text{Emigration}}{\text{Mid-year population in the same year}}$

= CBR − CDR + Net migration rate
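The balancing equation and the two growth rates can be sketched in Python as follows. All numbers are hypothetical and chosen only for illustration, and the function names are not from the notes.

```python
def population_at_t(p0, births, deaths, in_migrants, out_migrants):
    """Balancing equation: Pt = P0 + B - D + I - O."""
    return p0 + births - deaths + in_migrants - out_migrants


def natural_growth_rate(births, deaths, mid_year_pop):
    """(Births - deaths) / mid-year population, as a proportion."""
    return (births - deaths) / mid_year_pop


def total_growth_rate(births, deaths, in_migrants, out_migrants, mid_year_pop):
    """Natural growth plus net migration, as a proportion."""
    return (births - deaths + in_migrants - out_migrants) / mid_year_pop


# Hypothetical one-year figures for a district.
p_end = population_at_t(100_000, births=3_500, deaths=1_200,
                        in_migrants=400, out_migrants=700)
natural = natural_growth_rate(3_500, 1_200, mid_year_pop=101_000)
total = total_growth_rate(3_500, 1_200, 400, 700, mid_year_pop=101_000)

print(p_end, natural, total)
```

In this hypothetical case net migration is negative, so the total growth rate is lower than the natural growth rate.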

147
Population projection
 It provides information on the future size and composition of the
population of a given area.

 Knowledge of this information is fundamental for development plans


whose target is to satisfy the future needs of the population in the area of
health, education, employment, etc.

 No projection is 100 percent accurate, i.e., there is always some degree of


uncertainty

 The United Nations uses low, medium, high, and constant projections

148
Population projection…
 Arithmetic projection model: If the population $P_t$ is projected t years into the future from the present population $P_0$, and the annual rate of growth is r, then the arithmetic projection model is given by:
$P_t = P_0 (1 + rt)$

 Geometric projection model: This model follows the compound interest formula.
$P_t = P_0 (1 + r)^t$
 The doubling period for the geometric projection of the population is given by:
$t = \frac{\log 2}{\log (1 + r)} \approx \frac{0.7}{r}$ (the approximation holds for small r)
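The two projection models and the doubling period can be compared with a minimal Python sketch. The starting population (1000) and growth rate (3% per year) are hypothetical values used only for illustration.

```python
import math


def arithmetic_projection(p0, r, t):
    """Pt = P0 * (1 + r*t): the same absolute increase every year."""
    return p0 * (1 + r * t)


def geometric_projection(p0, r, t):
    """Pt = P0 * (1 + r)^t: compound growth."""
    return p0 * (1 + r) ** t


def doubling_time(r):
    """Exact doubling period under geometric growth."""
    return math.log(2) / math.log(1 + r)


p0, r = 1000, 0.03                 # hypothetical inputs
p_arith = arithmetic_projection(p0, r, 10)
p_geom = geometric_projection(p0, r, 10)
t_double = doubling_time(r)
t_rule = 0.7 / r                   # rule-of-thumb approximation

print(p_arith, p_geom, t_double, t_rule)
```

The geometric projection exceeds the arithmetic one for any horizon beyond one year, and the rule of thumb 0.7/r stays close to the exact doubling period when r is small.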

149
Measures of fertility
 Crude birth rate

 General fertility rate

 Age specific fertility rate

 Total fertility rate

 Gross reproduction rate

 Net reproduction rate

150
Measures of fertility…
• Crude Birth Rate (CBR): The number of live births in a year per 1000 mid-year population in the same year.

$CBR = \frac{\text{Total number of live births in a year}}{\text{Mid-year population in the same year}} \times 1000$

• General Fertility Rate (GFR): The number of live births in a year per 1000 mid-year women of reproductive age.

$GFR = \frac{\text{Total number of live births in a year}}{\text{Mid-year female population aged 15-49 yrs in the same year}} \times 1000$

151
Measures of fertility...
• Age Specific Fertility Rate (ASFR): Refers to the number of live births in a year per 1000 women of reproductive age in a given age or age group.

• Usually ASFR is calculated for the following seven 5-year age groups: 15-19 yr, 20-24 yr, 25-29 yr, 30-34 yr, 35-39 yr, 40-44 yr, 45-49 yr.

$ASFR = \frac{\text{Total number of live births to women of a given age group during a year}}{\text{Mid-year female population of the same age group in the same year}} \times 1000$

152
Measures of fertility...

Age category ASFR

15-19 104

20-24 228

25-29 241

30-34 231

35-39 160

40-44 84

45-49 34
153
Measures of fertility...

• Total Fertility Rate (TFR): The number of children a woman is expected to have by the end of her reproductive age, given that the current ASFRs are maintained.

• Mathematically, it is the sum of all ASFRs from 15-49 yrs.

• TFR for data given in the usual 5-year age categories is provided as:

$TFR = 5 \times \sum_{i=1}^{7} ASFR_i$

(divide by 1000 to express the TFR per woman when the ASFRs are given per 1000 women)

154
Measures of fertility...
 Gross Reproduction Rate (GRR): The total fertility rate restricted to female births only.

$GRR = TFR \times \text{Proportion of female births} \times 1000$

 The GRR measures the production of females.

 It makes no allowance for the fact that some women may die during the childbearing years.

 For a more accurate measure of the replacement of daughters by their mothers, the Net Reproduction Rate (NRR) is appropriate.

155
Measures of fertility…
Net reproduction rate (NRR)
 NRR refers the average number of daughters that would be born to a
woman if she passed through her life-time from birth to the end of her
reproductive years conforming to the age specific fertility and mortality
rates of a given year

 It measures the extent to which females in child bearing age group are
replacing themselves in the next generation

156
Measures of fertility…

 NRR is always lower than GRR, because it takes into account the fact that
some women will die before entering and completing their child-bearing
years

 Correspondingly NRR will be less than half the magnitude of the TFR

 NRR = 1 – Stationary population

 NRR < 1 – Declining population

 NRR > 1 – Growing population

157
Example: Calculate ASFR, TFR and GFR from the following data

Age category    Women of reproductive age    Live births    ASFR
15-19           15,600                       1,596
20-24           14,400                       3,300
25-29           13,300                       3,210
30-34           12,200                       2,830
35-39           11,600                       1,860
40-44           10,100                       850
45-49           9,200                        320
Total           86,400                       13,966
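One way to work through this example is the Python sketch below. It applies the ASFR, TFR and GFR formulas given earlier to the table; the variable names are illustrative, and the TFR is expressed per woman by dividing the per-1000 rates by 1000.

```python
age_groups = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
women = [15_600, 14_400, 13_300, 12_200, 11_600, 10_100, 9_200]
births = [1_596, 3_300, 3_210, 2_830, 1_860, 850, 320]

# ASFR: live births per 1000 women in each age group.
asfr = [b / w * 1000 for b, w in zip(births, women)]

# TFR: 5-year groups, so each ASFR applies for 5 years; per woman.
tfr = 5 * sum(asfr) / 1000

# GFR: all live births per 1000 women aged 15-49.
gfr = sum(births) / sum(women) * 1000

for group, rate in zip(age_groups, asfr):
    print(group, round(rate, 1))
print("TFR per woman:", round(tfr, 2))
print("GFR per 1000 women:", round(gfr, 1))
```

The column totals in the table (86,400 women and 13,966 live births) are recovered by `sum(women)` and `sum(births)`, which is a quick check that the data were entered correctly.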


Measures of Mortality
• Crude Death Rate (CDR): Refers to the total number of deaths in a given area, usually in a year, per 1000 mid-year population.

$CDR = \frac{\text{Total number of deaths per year}}{\text{Mid-year population}} \times 1000$

• Age and sex specific mortality rate (ASSR): Quantifies deaths occurring in a defined age and sex category in a given area per 1000 mid-year population of the same age and sex category.

$ASSR = \frac{\text{Number of deaths in a given age and sex category in a year}}{\text{Mid-year population of that age and sex category in the same year}} \times 1000$

159
Measures of Mortality…
 Cause- specific mortality rate

 Proportionate mortality ratio

 Case Fatality Rate (CFR)

160
Measures of Mortality…
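 Infant Mortality Rate (IMR): The number of deaths before the age of 1 year (infancy period) in a year per 1000 live births in the same year.

$IMR = \frac{\text{Number of deaths under 1 year of age in a year}}{\text{Total number of live births in the same year}} \times 1000$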

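 Neonatal mortality rate:

$NMR = \frac{\text{Number of deaths within the first 28 days of life in a year}}{\text{Total number of live births in the same year}} \times 1000$

 Post neonatal mortality rate:

$PNMR = \frac{\text{Number of deaths from 28 days to under 1 year of age in a year}}{\text{Total number of live births in the same year}} \times 1000$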

161
Measures of mortality…
• Perinatal mortality rate

• Under Five Mortality Rate (U5MR): Quantifies the probability of dying


between birth and age five per 1000 live births in a given year.

162
Measures of Mortality…
 Maternal Mortality Ratio:

$\text{Maternal mortality ratio} = \frac{\text{Number of maternal deaths in a given year}}{\text{Total number of live births in the same year}} \times 100000$

 Maternal Mortality Rate:

$\text{Maternal mortality rate} = \frac{\text{Number of maternal deaths in a given year}}{\text{Total number of women of reproductive age in the same year}} \times 1000$

163
Measures of Migration
• Crude In-Migration Rate: Number of in-migrants (I) per 1,000 population in a given year.

• Crude Out-Migration Rate: Number of out-migrants (O) per 1,000 population in a given year.

• Crude Net Migration Rate: Difference between the number of in-migrants (I) and the number of out-migrants (O) per 1,000 population in a given year.

164
Health Service Statistics

 Data generated from the health system itself.

 Advantages:

– Gives morbidity information.

– Identifies priority health problems in the area.

– Determines met and unmet health needs.

– Determines the success or failure of specific health care programs.

– Assesses utilization of health services.

165
Health service statistics...
1. Relative Frequency of a Disease:

$\text{Relative frequency of a given disease} = \frac{\text{Number of patients diagnosed with a specific disease}}{\text{Total number of health institution visits}} \times 100\%$

2. Cure Rate:
• Quantifies the proportion of patients who have been cured of a disease condition using a treatment modality out of 100 patients who received a similar type of treatment.
• The term “Success Rate” can be used if the measured parameter is a procedure.

$\text{Cure rate} = \frac{\text{Number of patients with a given disease cured using a treatment modality}}{\text{Number of patients who received the treatment}} \times 100\%$

166
Health service statistics cont..
3. Admission Rate:
• Quantifies the proportion of admissions among patients who visited the health institution in a given period of time.

$\text{Admission rate} = \frac{\text{Number of patients admitted to a health institution}}{\text{Total number of patients who visited the institution}} \times 100\%$

4. Hospital Death Rate:
• Quantifies the proportion of deaths among hospitalized patients in a given period of time.

$\text{Hospital death rate} = \frac{\text{Number of deaths among hospitalized patients}}{\text{Total number of admissions}} \times 100\%$

167
Hospital utilization indicators

B = The bed complement

A = The annual number of admissions

D = The annual number of discharges

d = The annual number of deaths

H = The annual number of hospitalized patient days

N = The daily average of beds occupied = H/365

168
Hospital utilization indicators…
1. Average Length of Stay (L):
 It indicates the average period in hospital per patient admitted.

$L = \frac{H}{D + d}$

2. Bed Occupancy Rate (O):
 Quantifies the percentage occupancy of hospital beds in a year.

$O = \frac{N}{B} \times 100\% = \frac{H}{365 \times B} \times 100\%$

169
Hospital utilization indicators…
 Turnover interval (T): It expresses the average period, in days, that a bed remains empty; in other words, the average time elapsing between the discharge of one patient and the admission of the next.

$T = \frac{(B \times 365) - H}{D + d}$ days

170
Hospital utilization indicators…
 Hospital frequentation rate (Fh): It is expressed as the number of hospital admissions per 1000 of the population at risk, P, per year.

$F_h = \frac{A}{P} \times 1000$

• Hospitalization rate per person (Hc): It expresses the volume of hospitalization in terms of the number of hospitalization days per person per year.

$H_c = \frac{H}{P} \times 1000$

171
Hospital utilization indicators…
 Bed-occupancy ratio (Bc): It is the average daily number of persons hospitalized per unit of population.

$B_c = \frac{N}{P} \times 1000$

 Bed/population index (Ipb): It expresses the availability of hospital beds in terms of the number of beds per 1000 of the population.

$I_{pb} = \frac{B}{P} \times 1000$
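The utilization indicators can be computed together, as in the minimal Python sketch below. The hospital figures are hypothetical and the function and key names are illustrative. The turnover interval is returned in days (empty bed-days divided by the number of discharges and deaths).

```python
def hospital_indicators(beds, admissions, discharges, deaths,
                        patient_days, population):
    """Compute utilization indicators from annual hospital figures."""
    separations = discharges + deaths        # D + d
    daily_occupied = patient_days / 365      # N = H / 365
    return {
        "average_length_of_stay": patient_days / separations,
        "bed_occupancy_rate_pct": daily_occupied / beds * 100,
        "turnover_interval_days": (beds * 365 - patient_days) / separations,
        "frequentation_per_1000": admissions / population * 1000,
        "bed_occupancy_ratio_per_1000": daily_occupied / population * 1000,
        "beds_per_1000": beds / population * 1000,
    }


# Hypothetical hospital: 100 beds serving a population of 200,000.
result = hospital_indicators(beds=100, admissions=3_000, discharges=2_900,
                             deaths=100, patient_days=29_200,
                             population=200_000)
for name, value in result.items():
    print(name, round(value, 2))
```

Reading the indicators side by side is useful: a high occupancy rate combined with a short turnover interval suggests the beds are under pressure.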

172
Chapter Four
Sampling methods
Learning Objectives:
 Learn the reasons for sampling

 Develop an understanding about different sampling methods

 Distinguish between probability & non probability sampling

 Discuss the relative advantages & disadvantages of each sampling


methods

173
 Information about a larger population is often estimated by selecting and
measuring a sample from that population

 Since the population is often too large to study in full, information collected from a sample is assumed to be sufficient and is more cost effective.

 Inferences about the population are based on the information from the
sample drawn from that population

 A main concern in sampling: ensure that the sample represents the


population and the findings can be generalized

174
Advantages and Disadvantages of sampling:
 Advantages of sampling:
 Feasibility: Sampling may be the best feasible method of collecting
information.
 Reduced cost: Sampling reduces demands on resource such as finance,
personnel, and materials.
 Greater accuracy: Sampling may lead to a better quality of data

 Time saving: Data can be collected and summarized more quickly

 Disadvantages of sampling:
 There is always a sampling error

 Sampling may create a feeling of discrimination within the population

175
 While selecting a SAMPLE, there are basic questions:
– What is the group of people from which we want to draw a sample?

– How many people do we need in our sample?

– How will these people be selected?

176
Definition of terms
 Reference (target) population: the population of interest to whom the
researchers would like to make generalizations

 Sampling/Source population: the subset of the target population from


which a sample will be drawn

 A sample is a collection of individuals selected from a larger population.


 Example: we may have a single sample composed of 50 individuals,
representing a population of 1000 people

 Sampling – is the process of selecting a portion of the population to


represent the entire population

 Sample size -The number of individuals included in the sample

 Study participants: the actual group in which the study is conducted =


Sample
177
Definition of terms
 Study/sampling unit: the units on which information will be collected:
example, persons
 Sampling frame: the list of units on which information will be collected:
Examples, list of persons

 Population value (parameter): A population parameter is a numerical expression that summarizes the values of some characteristic for all units of an entire population.

 It is also a measure or characteristic obtained from a population.

 Sample value (statistic): A sample statistic is a numerical expression that summarizes the values of some characteristic for the units in a sample.

 It is also a measure or characteristic obtained from a sample.

178
Target population = All TB patients in AA

Sampling population = All TB patients in, e.g., 3 hospitals in the Region, AA

Sample
179
Target population:
The conclusion may or
may not be generalizable
due to refusals, selection
biases, etc.

Sampled population:
If sampling is representative,
then the conclusion applies to
the sampled population

Sample:
The conclusion is drawn
from the sample

180
 The conclusion is initially drawn from the sample

 The question is then:


• How far back does the generalization go?

 The conclusion usually applies to the sampled population

 It may or may not apply to the target population

181
Sampling Methods
Two broad divisions:

A. Probability sampling methods

B. Non-probability sampling methods

A. Probability sampling methods:


 Involves random selection of a sample

 Every sampling unit has a known and non-zero probability of selection


into the sample

 Involves the selection of a sample from a population, based on chance

• Probability sampling is:
– more complex,

– more time-consuming and

– usually more costly than non-probability sampling

• However, because study samples are randomly selected and their
probability of inclusion can be calculated,

– reliable estimates can be produced and

– inferences can be made about the population

Basic features of Probability sampling
– A sampling frame exists or can be compiled.

– All units of the population have an equal or at least a known chance
of being included in the sample.

– Generalization from sample to population is possible.

Most common probability sampling methods
1. Simple random sampling

2. Systematic random sampling

3. Stratified random sampling

4. Cluster sampling

5. Multi-stage sampling

1. Simple random sampling (SRS)
 The required number of individuals are selected at random from the
sampling frame
 Sampling frame: a list or a database of all individuals in the population
 Each member of a population has an equal chance of being included in the
sample.

 SRS method:
 prepare sampling frame

 Each unit should be numbered from 1 to N

 Select the required number (n)

 The randomness of the sample is ensured by:

• Use of “lottery” methods
• Table of random numbers
• Computer programs
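
The SRS method above can be sketched with a computer program. This is a minimal illustration, assuming the sampling frame is simply the unit numbers 1 to N; the function name `simple_random_sample` and the sizes N = 1000 and n = 50 are hypothetical choices for the example, not part of the course material.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same chance of being selected.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical frame: units numbered 1..N
N = 1000
frame = list(range(1, N + 1))

sample = simple_random_sample(frame, n=50, seed=1)
print(len(sample), "units selected; first few:", sorted(sample)[:5])
```

Fixing the seed is only for reproducibility in teaching; in a real study the seed would normally be left unset or recorded in the protocol.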
 SRS has certain limitations:
– Requires a sampling frame

– Difficult if the reference population is dispersed

– Minority subgroups of interest may not be selected

2. Systematic random sampling

 Sometimes called interval sampling

 Individuals are selected from the sampling frame systematically rather
than randomly

 Individuals are taken at regular intervals down the list

 The starting point is chosen at random

• Important if the reference population is arranged in some order:

– Order of registration of patients

– Numerical order of house numbers

– Student’s registration lists

• Taking individuals at fixed intervals (every kth) based on the sampling
fraction

Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the total
population size)
2. Determine the sampling interval (K) by dividing the number of units in
the population by the desired sample size
3. Select a number between 1 and K at random. This number is called the
random start and would be the first number included in your sample
4. Select every Kth unit after that first number

Example
 To select a sample of 100 from a population of 2000, you would need a
sampling interval of 2000 ÷ 100 = 20
 Therefore, K = 20
 You will need to select one unit out of every twenty units to end up with
a total of 100 units in your sample
 Select a number between 1 and 20 using a table of random numbers or the
lottery method; if this random start is r, the sample consists of units
r, r + 20, r + 40, …, r + 1980

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

31 32 33 34 35 36 37 38 39 40 41 42 43 44 45

46 47 48 49 50 51 52 53 54 55 ……..
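
The four steps of systematic random sampling can be sketched as follows, using the example of n = 100 from N = 2000. The function name `systematic_sample` is a hypothetical helper, and the sketch assumes the interval K is taken as the whole-number part of N ÷ n.

```python
import random


def systematic_sample(N, n, seed=None):
    """Return the unit numbers chosen by systematic random sampling.

    N: number of units on the frame (numbered 1..N)
    n: desired sample size
    """
    K = N // n                      # step 2: sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, K)       # step 3: random start between 1 and K
    # step 4: every Kth unit after the random start
    return [start + i * K for i in range(n)]


units = systematic_sample(N=2000, n=100, seed=7)
print("interval K =", units[1] - units[0])
print("first five selected units:", units[:5])
```

Only the starting point is random; once it is drawn, the rest of the sample is fully determined by the interval.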

3. Stratified random sampling
 It is done when the population is known to have heterogeneity with regard
to some factors and those factors are used for stratification

 Using stratified sampling, the population is divided into homogeneous,
mutually exclusive groups called strata

 A population can be stratified by any variable that is available for all units
prior to sampling (e.g., age, sex, province of residence, income, etc.).
 A separate sample is taken independently from each stratum

 Any of the sampling methods mentioned in this section (and others that
exist) can be used to sample within each stratum

 Stratified sampling ensures an adequate sample size for sub-groups in the
population of interest
 When a population is stratified, each stratum becomes an independent
population and you will need to decide the sample size for each stratum
Two types of sample size allocation

Equal allocation:
 Allocate an equal sample size to each stratum

Village    A     B     C     D     Total
HHs        100   150   120   130   500
S. size    ?     ?     ?     ?     60

Proportionate allocation:
 Allocate the sample to each stratum in proportion to the stratum's size

Village    A     B     C     D     Total
HHs        100   150   120   130   500
S. size    ?     ?     ?     ?     60
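
The two allocation rules can be worked through for the four villages above. This is a sketch only: the function names are hypothetical, and because proportionate shares need not be whole numbers, the code assumes a "largest remainder" rounding rule (round down, then give the leftover units to the strata with the largest fractional parts). The slides do not specify a rounding rule.

```python
def equal_allocation(strata_sizes, n):
    """Split n equally across strata (any leftover goes to the first strata)."""
    base, extra = divmod(n, len(strata_sizes))
    return {name: base + (1 if i < extra else 0)
            for i, name in enumerate(strata_sizes)}


def proportionate_allocation(strata_sizes, n):
    """Allocate n in proportion to stratum size: n_h = n * N_h / N."""
    N = sum(strata_sizes.values())
    exact = {name: n * size / N for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down
    leftover = n - sum(alloc.values())
    # Assumed rounding rule: largest fractional remainders get the leftover units
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


households = {"A": 100, "B": 150, "C": 120, "D": 130}
print("Equal:        ", equal_allocation(households, 60))
print("Proportionate:", proportionate_allocation(households, 60))
```

Under equal allocation each village gets 60 ÷ 4 = 15 households. Under proportionate allocation the exact shares are 12, 18, 14.4 and 15.6, which the assumed rounding rule turns into 12, 18, 14 and 16.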

4. Cluster sampling
• Sometimes it is too expensive to carry out SRS

– Population may be large and scattered


– Complete list of the study population unavailable
– Travel costs can become expensive if it involves a larger survey
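• The clusters should be similar to one another (each cluster resembling
the population as a whole), unlike stratified sampling, where the strata
differ from one another but are internally homogeneous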

Steps
• Divide the population into groups or clusters.
• A number of clusters are selected randomly to represent the total
population
• Then units within selected clusters are included in the sample
• No units from non-selected clusters are included in the sample
• This differs from stratified sampling, where some units are selected from
each group
 Example
• In a school based study, we assume students of the same school are
homogeneous

• We can randomly select sections and include all students of the selected
sections only
Example: Cluster sampling
[Figure: five clusters labelled Section 1 to Section 5]
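
The cluster sampling steps can be sketched for the school example. The data here are made up for illustration: five sections of 30 students each, with invented student labels, and the function name `cluster_sample` is hypothetical.

```python
import random


def cluster_sample(clusters, m, seed=None):
    """Select m clusters at random and include every unit in them.

    clusters: dict mapping cluster name -> list of units in that cluster
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # random selection of clusters
    # All units of the selected clusters enter the sample;
    # units of non-selected clusters are never included.
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical school: 5 sections with 30 students each
sections = {
    f"Section {i}": [f"S{i}-{j:02d}" for j in range(1, 31)]
    for i in range(1, 6)
}

chosen, students = cluster_sample(sections, m=2, seed=3)
print("Selected sections:", chosen)
print("Students in sample:", len(students))
```

Note that only a list of sections is needed to draw the sample, not a list of all students, which is the main practical advantage of the method.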
Advantages and Disadvantages of cluster sampling
• Advantages
– A list of all the units in the reference population is not required.
– If a list of all units in the population is not available, a list of all
clusters is either available or easy to create.
– Cost reduction
• Disadvantages
– Assumes that the characteristic to be studied is uniformly distributed.

5. Multi-stage sampling
 Similar to cluster sampling, but sampling is done in stages
 Appropriate when the population is large and widely scattered.
 Requires at least two stages
 The primary sampling unit (PSU) is the sampling unit in the first sampling
stage
 The secondary sampling unit (SSU) is the sampling unit in the second
sampling stage, and so on; the unit in the last stage is called the
ultimate sampling unit (USU)

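Example of sampling stages:
Woreda: primary sampling unit (PSU)
Kebele: secondary sampling unit (SSU)
Sub-Kebele: tertiary sampling unit (TSU)
Household (HH): ultimate sampling unit (USU)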

 First, large groups or clusters are identified and selected.
 Second, population units are picked from within the selected clusters
(using any of the possible probability sampling methods)

 For more than 2 stages, the process of choosing population units within
clusters continues until the final sample is obtained.

 Advantages
– No need to have a list of all units in the population.
– Saves a great amount of time and effort

 Disadvantages
– More information is needed in this type of sampling, which may not be
available
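
A two-stage version of multi-stage sampling can be sketched as below. The setting is invented for illustration: 10 kebeles as primary sampling units, each with 200 households, and simple random sampling used at both stages. The function name `two_stage_sample` and all labels are hypothetical.

```python
import random


def two_stage_sample(psus, n_psu, n_per_psu, seed=None):
    """Stage 1: select n_psu primary sampling units at random.
    Stage 2: select n_per_psu units at random within each selected PSU.

    psus: dict mapping PSU name -> list of units inside that PSU
    """
    rng = random.Random(seed)
    selected_psus = rng.sample(sorted(psus), n_psu)
    sample = {}
    for psu in selected_psus:
        units = psus[psu]
        take = min(n_per_psu, len(units))   # a PSU may be smaller than the target
        sample[psu] = rng.sample(units, take)
    return sample


# Hypothetical frame: 10 kebeles, each listing 200 households
kebeles = {
    f"Kebele {k}": [f"K{k}-HH{h:03d}" for h in range(1, 201)]
    for k in range(1, 11)
}

sample = two_stage_sample(kebeles, n_psu=3, n_per_psu=20, seed=11)
for kebele, households in sample.items():
    print(kebele, "->", len(households), "households")
```

A household list is needed only for the kebeles selected at the first stage, which is why no complete list of all units in the population is required.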

B. Non-probability sampling
 In non-probability sampling, every item has an unknown chance of being
selected.

 There is an assumption that there is an even distribution of a characteristic
of interest within the population.

 For probability sampling, randomness is a feature of the selection process,
rather than an assumption about the structure of the population.
 In non-probability sampling, since elements are chosen arbitrarily, there is
no way to estimate the probability of any one element being included in
the sample.

 Results susceptible to bias

 As a result, many researchers are reluctant to use it.


• Despite these drawbacks, non-probability sampling methods can be
useful
– when descriptive comments about the sample itself are desired
– when a quick, inexpensive and convenient method is needed
– when it is infeasible or impractical to conduct probability sampling
 The most common types of non-probability sampling:
1. Convenience or haphazard sampling

2. Volunteer sampling

3. Judgment sampling

4. Quota sampling

5. Snowball sampling
1. Convenience or haphazard sampling
 Is sometimes referred to as haphazard or accidental sampling.

 Sample units are selected if they can be accessed easily and
conveniently.
 The obvious advantage is that the method is easy to use, but that
advantage is greatly offset by the presence of bias.

 Although useful applications of the technique are limited, it can deliver
accurate results when the population is homogeneous.

 For example, a scientist could use this method to determine whether a lake
is polluted or not.

2. Volunteer sampling
 Occurs when people volunteer to be involved in the study.
 In experiments or pharmaceutical trials (drug testing), for example, it
would be difficult and unethical to use random selection.
3. Judgment sampling
 Is used when a sample is taken based on certain judgments about the
overall population.

 The underlying assumption is that the investigator will select units that are
characteristic of the population.

 The critical issue here is objectivity: how much can judgment be relied
upon to arrive at a typical sample?

 Researchers often use this method in exploratory studies like pre-testing
of questionnaires and focus groups.

 Judgment sampling is subject to the researcher's biases and is perhaps
even more biased than haphazard sampling.

 One advantage of judgment sampling is the reduced cost and time
involved in acquiring the sample
4. Quota sampling:
 Is one of the most common forms of non-probability sampling.
 Sampling is done until a specific number of units (quotas) for various
sub-populations have been selected.
 Study units from different categories (or strata) of the population are
selected.
 The basis of stratification can be: sex, age, educational status,
occupation, religion, etc.
 The proportions of the subjects from each stratum are determined.
 Combines judgment and convenience, and is more structured.
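
Quota sampling can be sketched as filling fixed quotas from people in the order they happen to be encountered. Everything in this example is invented for illustration: the arrival list, the quota of 5 per sex, and the function name `quota_sample`.

```python
def quota_sample(arrivals, quotas, key):
    """Take people in arrival order until every quota is filled.

    arrivals: iterable of candidates in the order they are encountered
    quotas:   dict mapping category -> number of units required
    key:      function returning the category of a candidate
    """
    remaining = dict(quotas)
    selected = []
    for person in arrivals:
        group = key(person)
        if remaining.get(group, 0) > 0:     # quota for this category still open
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):     # all quotas filled: stop sampling
            break
    return selected


# Hypothetical stream of 40 clinic attendees (every third one male)
arrivals = [{"id": i, "sex": "F" if i % 3 else "M"} for i in range(1, 41)]

selected = quota_sample(arrivals, {"F": 5, "M": 5}, key=lambda p: p["sex"])
print([p["id"] for p in selected])
```

No random selection takes place: who ends up in the sample depends entirely on who arrives first, so selection probabilities are unknown and the result remains a non-probability sample.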
5. Snowball sampling
 Research participants recruit other participants for a test or study.
 Also known as chain sampling, chain-referral sampling, and referral
sampling.
 It consists of two steps:
1. Identify potential subjects in the population. Often, only one or two
subjects can be found initially.
2. Ask those subjects to recruit other people (and then ask those people to
recruit).
 Participants should be made aware that they do not have to provide any
other names.
Advantages and disadvantages of snowball sampling
 Advantages:
– It allows studies to take place that might otherwise be impossible to
conduct because of a lack of participants.
– It may help you discover characteristics of a population that you were
not aware existed.
 Disadvantages:
– It is usually impossible to determine the sampling error or make
inferences about populations based on the obtained sample.

Errors in Sampling
 When we take a sample, our results will not exactly equal the results
for the whole population.
 Two types of errors
1. Sampling error (random error)
2. Non-sampling error (bias)
1. Sampling error (random error)
 The value of the characteristic measured in a sample differs from that of
the total population

 This type of error, arising from the sampling process, is called sampling
error and is bi-directional

 Cannot be avoided or totally eliminated as long as only a sample is studied

 Minimized by increasing the sample size; when n = N, sampling error = 0
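
The effect of sample size on sampling error can be illustrated with a small simulation. The population here is artificial (1000 simulated values with mean about 120 and standard deviation about 15), and the function name `mean_abs_sampling_error` is hypothetical; the sketch measures the average absolute difference between the sample mean and the population mean over repeated simple random samples.

```python
import random
import statistics


def mean_abs_sampling_error(population, n, repeats=200, seed=0):
    """Average |sample mean - population mean| over repeated random samples."""
    rng = random.Random(seed)
    mu = statistics.mean(population)            # the parameter
    errors = [
        abs(statistics.mean(rng.sample(population, n)) - mu)  # statistic vs parameter
        for _ in range(repeats)
    ]
    return statistics.mean(errors)


# Artificial population of N = 1000 values
rng = random.Random(42)
population = [rng.gauss(120, 15) for _ in range(1000)]

for n in (10, 100, 500, 1000):
    print(n, round(mean_abs_sampling_error(population, n), 3))
```

The printed error shrinks as n grows and, apart from floating-point rounding, vanishes when n = N, because the "sample" is then the whole population. Bias from a faulty instrument or selection procedure would not shrink in this way.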

2. Non-sampling error (bias)
 Systematic error in design or sampling procedure
 Results in uni-directional distortion of results
 More serious type of error
 Caused by:
 Faulty instrument
 Faulty measuring process and
 Personal bias (Selection, Information, Observational, and data editing
and tabulation error)

 Can be eliminated by careful design of the sampling procedure and not by
increasing the sample size

Reducing systematic error
 Do pretest/pilot test for your instruments,

 Train the data collectors

 Double-check the data thoroughly

 Data editing & cleaning

 Triangulation of measurements

 Calibration
