01/23/2023 2
Learning objectives
Statistics
• A field of study concerned with collection,
organization and summarization of data, and the
drawing of inferences about a body of data when only
part of the data are observed.
• The term statistics is used to mean either statistical
data or statistical methods.
Statistical data: statistics as statistical data refers to
numerical descriptions of things.
These descriptions may take the form of counts or
measurements.
E.g. statistics of malaria cases at one of the malaria detection and treatment posts of Ethiopia include fever cases, number of positives obtained, sex and age distribution of positive cases, etc.
Characteristics of statistical data
ii) They must be affected to a marked extent by a multiplicity of causes.
– This means that statistics are aggregates only of such facts as grow out of a 'variety of circumstances'.
iv) They must have been collected in a systematic manner for a predetermined purpose.
Numerical data can be called statistics only if they have been compiled in a properly planned manner and for a purpose about which the enumerator had a definite idea.
Facts collected in an unsystematic manner and without a complete awareness of the objective will be confusing and cannot be made the basis of valid conclusions.
Statistical methods:
When the term 'statistics' is used to mean 'statistical methods', it refers to a body of methods used for collecting, organizing, analyzing and interpreting numerical data in order to understand a phenomenon or make wise decisions.
In this sense it is a branch of scientific method and helps us to know the object under study better.
Biostatistics
• Definition: When the different statistical methods are applied to biological, medical and public health data, they constitute the discipline of biostatistics.
• The application of statistical methods to the life and health sciences.
Types of Biostatistics
Biostatistics provides a framework for the analysis of
data
Through the application of statistical principles to the biological sciences, biostatisticians are able to methodically distinguish between true differences among observations and random variations caused by chance alone.
• From an application standpoint, knowledge of
biostatistics and epidemiology permits one to make
valid conclusions from data sets.
• Associations between risk factors and disease are determined with this information and, ultimately, are used to reduce illness and injury
– E.g. assessing the magnitude and associated factors of under-five malnutrition in South Wollo Zone
Biostatistics Roles in Public Health Functions
Variable
1. Qualitative variable:
A variable or characteristic which cannot be measured
in quantitative form but can only be identified by name
or categories.
Non-numerical
The notion of magnitude is absent
E.g. place of birth, ethnic group, blood group, stages of breast cancer (I, II, III, or IV)
2. Quantitative variable:
• A quantitative variable is one that can be measured
and expressed numerically
• Variables measured by assigning numbers to the
items
Types of quantitative variables
1. Discrete
– The values of a discrete variable are usually whole
numbers,
– Is characterized by gaps or interruptions in the
values that it can assume
Discrete data are restricted to taking only specified values, often integers or counts, that differ by fixed amounts.
Examples:
– Number of episodes of diarrhea in the first five years of life
– Number of new AIDS cases reported during a one-year period
– Number of students in the class
2. Continuous
• Can assume any value within a specified relevant
interval of values assumed by the variable.
• A continuous variable is a measurement on a continuous
scale.
Measurement scales
• Measurement: a procedure where qualities or quantities are assigned
to characteristics of subjects, objects or events.
i. Nominal
ii. Ordinal
iii. Interval
iv. Ratio
1. Nominal scale
• The simplest type of data, where the measurement of a
variable involves the naming or categorization of possible
values of the variable
• The values fall into unordered categories or classes
• Mutually exclusive and collectively exhaustive categories
• Uses names, labels or symbols to assign each
measurement
– Examples: Blood type, sex, marital status, religion, cause of
illness, cause of death
Example of a nominal scale
Marital status
1. Single
2. Married
3. Widowed
4. Divorced
• The numbers have NO meaning
• They are labels only
• Yes/No (0 or 1) data
– For example: pregnant vs non-pregnant, smoker vs non-smoker, diabetic vs non-diabetic
2. Ordinal scale
• Assigns each measurement to one of a limited
number of categories that are ranked in terms of order
• Although non-numerical, can be considered to have a
natural ordering
– Examples: patient status, cancer stages, social class etc
• The spaces or intervals between the categories (order) are not
necessarily equal.
2. Agree
3. No opinion
4. Disagree
5. Strongly disagree
In the above situation, we only know that the data are ordered
Example of an ordinal scale:
Health status
1. Poor
2. Fair
3. Good
4. Very good
5. Excellent
• The numbers have limited meaning: 5>4>3>2>1 is all we know apart from their utility as labels
3. Interval scale
• Measured on a continuum, and the difference between any two numbers on the scale is of known size.
Day: Mon Tue Wed Thu
Temp (°C): 18 20 22 23
• For these data, not only is Mon. with 18 °C cooler than Thu. with 23 °C, but it is 5 °C cooler.
• It has no true zero point (value). “0” is arbitrarily chosen and doesn’t reflect the absence of temperature.
4. Ratio scale
• The highest scale of measurement
• Measurement begins at a true zero point and the scale has equal intervals
• Example: Height, age, weight etc
• It has meaningful ratio or quotient.
– Someone who weighs 80 kg is two times as heavy
as someone else who weighs 40 kg.
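The contrast between ratio and interval scales can be sketched in a few lines. This is only an illustration: the conversion to kelvin (a temperature scale with a true zero) and all variable names are assumptions added here, not part of the slides.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading (interval scale) to kelvin (ratio scale)."""
    return celsius + 273.15

# Ratio scale: weight has a true zero, so the quotient is meaningful.
weight_ratio = 80 / 40          # 2.0, i.e. "twice as heavy"

# Interval scale: 0 degrees Celsius is arbitrary, so 20/10 = 2 does NOT mean "twice as hot".
naive_ratio = 20 / 10
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(weight_ratio, naive_ratio, round(kelvin_ratio, 3))
```

On the kelvin scale the same two temperatures differ by only a few percent, which is why ratios of Celsius values should not be interpreted.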
• A measurement on a higher scale can be transformed
into one on a lower scale, but not vice versa
Example
Ratio scale: Birth weight of newborns in grams (g)
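Moving from a higher scale to a lower one can be sketched as follows. The cut-off values (1500 g, 2500 g, 4000 g) and the function names are illustrative assumptions, not values given in these slides; the category labels follow the "Very low / Low / Normal / Big" grouping that appears later in the pie chart.

```python
def birth_weight_category(grams):
    """Ratio -> ordinal: map a birth weight in grams to an ordered category.

    The cut-offs are assumed for illustration only.
    """
    if grams < 1500:
        return "Very low"
    if grams < 2500:
        return "Low"
    if grams < 4000:
        return "Normal"
    return "Big"


def is_low_birth_weight(grams):
    """Ratio -> nominal (yes/no): below the assumed 2500 g cut-off or not."""
    return grams < 2500


for weight in (1400, 2400, 3200, 4200):
    print(weight, birth_weight_category(weight), is_low_birth_weight(weight))
```

The reverse is not possible: knowing only that a newborn is in the "Low" category does not recover the weight in grams.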
Chapter 2
Learning Objectives
Data Collection Methods
Data collection techniques allow us to systematically
collect data about our objects of study (people,
objects, and phenomena) and about the setting in
which they occur.
In the collection of data we have to be systematic.
If data are collected haphazardly, it will be difficult
to answer our research questions in a conclusive way.
• The validity and accuracy of the final judgment depend heavily on how well the data were collected in the first place
Disadvantages:
– The investigator's or observer's own biases, desires, etc.
– Needs more resources and skilled human power when high-level machines are used
2. Interviews and self-administered questionnaire
Advantage of self-administered questionnaires:
– Simpler and cheaper
– Can be administered to many persons simultaneously
– No interviewer bias
Disadvantage
– Demand a certain level of education and skill on the
part of the respondents
– Low response rate
– May not return questionnaire
– May not respond to all questions
– Lack of probing
• Respondents must be literate so that the questionnaire can be read and understood
• Respondents may not return it within the specified time
• Introduces self-selection bias
• Not suitable for complex questionnaires
• People of a low socio-economic status are less likely to respond
Face to face interview
Face-to-face interviews (Advantages)
3. Use of documentary sources:
Expense
Suspicion
Data
• Data are numbers obtained by taking measurements, by counting or by observation
• Numerical description of things
• The raw material for statistics
• The statistical data may be classified under two categories,
depending upon the sources.
1. Primary data
2. Secondary data
1. Primary Data
2. Secondary Data:
• Data that have already been collected by others and are reused by the investigator
• Secondary data can be obtained from journals, reports,
government publications, publications of professionals
and research organizations.
• Less expensive to collect both in money and time
• Lack of completeness
Questionnaire
• The quality of research depends to a large extent on the
quality of the data collection tools
• Interviewing and administering questionnaires are
probably the most commonly used research techniques
• Therefore designing good ‘questioning tools’ forms an
important and time-consuming phase in the
development of most research proposals
• Questionnaires are an inexpensive way to gather data
from a potentially large number of respondents
• They are the only feasible way to reach a number of respondents large enough to allow statistical analysis of the results
• A well-designed questionnaire that is used effectively
can gather information
• When a research instrument contains only questions
and statements to be answered by respondents, it is
called a questionnaire.
Open-ended questions
• Such questions are useful to obtain information on:
– Sensitive issues
Closed-ended Questions
For example
“What is your marital status?”
1. Single
2. Married
3. Separated/divorced
4. Widowed
Advantages of closed ended questions
• It saves time
• Comparing responses of different groups, or of the
same group over time, becomes easier.
• Answers are easier to analyze on a computer, and response choices make the question clearer
Questionnaire Design
• Designing a questionnaire always takes several drafts.
Step 1: Content
Step 2: Formulating questions:
Avoid leading questions:
• A question is leading if it suggests a certain answer.
– For example, the question “Do you agree that the district health team should visit each health center monthly?” hardly leaves room for “no” or for other options
– Better would be: “Do you think that district health teams should visit each health center?”
• Sometimes, a question is leading because it presupposes
a certain condition.
• For example: “What action did you take when your
child had diarrhea the last time?” presupposes the child
has had diarrhea
• A better set of questions would be:
– “Has your child had diarrhea?”
• Start with an interesting but non-controversial question
(preferably open) that is directly related to the subject of the
study.
– This type of beginning should help to raise the informants’
interest
Step 5: Translation
What makes a well designed questionnaire?
⇒ High response
⇒ Easier to collect, to summarize and to analyse
In general in questionnaire design remember to:
Methods of data organization and presentation
Array (ordered array)
Frequency Distribution
The arrangement of a data set in a table using values and their corresponding frequencies of occurrence.
Frequency: the number of times the same value occurs within a data set.
Frequency distributions for categorical
variables
Example 1
Example 2
• A study was conducted to assess immunization
coverage among 830 children between 12-23 months
by collecting data on sex
Sex Frequency Relative frequency
Male 422 50.8%
Female 408 49.2%
Total 830 100%
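The relative frequencies in this table can be reproduced with a short sketch. The counts are the ones shown above; the function and variable names are illustrative assumptions.

```python
def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage to one decimal."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


sex_counts = {"Male": 422, "Female": 408}   # frequencies from the table above
sex_percent = relative_frequencies(sex_counts)

for category, percent in sex_percent.items():
    print(f"{category}: {sex_counts[category]} ({percent}%)")
```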
Example 3
n = 110 female patients were diagnosed with regard to depressive illness as “no depression”, “mild depression” or “moderate depression”.
The observed absolute and relative frequencies are shown in a one-way table:
Depression   Frequency   Relative frequency (%)
None         26          23.6
Mild         67          60.9
Moderate     17          15.5
Total        110         100.0
Example 4: Ordinal data
Pain intensity Frequency Relative frequency
No pain 70 35%
Mild pain 44 22%
Moderate pain 46 23%
Severe pain 26 13%
Very severe pain 14 7%
Total 200 100%
Frequency distribution for numerical variables
Grouped frequency distribution
• For large samples, we can’t use the simple frequency table to represent the data.
• We need to divide the data into groups, intervals or classes.
• So, we need to determine:
1. The number of intervals (k): choosing the number of classes
2. The range (R): the difference between the largest and the smallest observation in the data set.
3. The width of the interval (w): class intervals should generally be of the same width. The width determines where each class starts and ends.
Some rules that are generally observed:
Too few intervals are not good because information will be lost, and too many intervals do not help to summarize the data.
We seldom use fewer than 6 or more than 20 classes; a commonly followed rule is 6 < k < 15.
The exact number we use in a given situation depends mainly on the number of measurements or observations we have to group.
A guide to determining the number of classes (K) is Sturges' formula:
K = 1 + 3.322 × log10(n), where n is the number of observations
The length or width of the class interval (W) can be calculated by:
W = (Maximum value – Minimum value)/K = Range/K
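The two formulas can be sketched as small helper functions. The function names, rounding K to the nearest whole number, and rounding the width up to the next whole unit are assumptions made for this sketch; the slides treat the formula as a guide only.

```python
import math


def sturges_classes(n):
    """Suggested number of classes: K = 1 + 3.322 * log10(n), rounded to a whole number."""
    return round(1 + 3.322 * math.log10(n))


def class_width(minimum, maximum, k):
    """Class width W = Range / K, rounded up to the next whole unit (an assumption)."""
    return math.ceil((maximum - minimum) / k)


print(sturges_classes(275))        # sample of 275 observations
print(sturges_classes(50))         # sample of 50 observations
print(class_width(100, 134, 7))    # range 34 split into 7 classes
```

The result is a starting point; the number of classes can still be increased or decreased for a clearer presentation.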
N.B.:
Sturges' rule should not be regarded as final, but should be considered as a guide only.
The number of classes specified by the rule should be increased or decreased for convenient or clear presentation.
• Suppose, for example, that we have a sample of 275 observations that we want to group
• The logarithm to the base 10 of 275 is 2.4393
• Applying Sturges' formula gives K = 1 + 3.322(2.4393) = 9.10 ≈ 9
• In practice, other considerations might cause us to use 8 or fewer or perhaps 10 or more class intervals
Classes should be mutually exclusive.
Make sure that the smallest and largest values fall within the classification, that none of the values can fall into gaps between successive classes, and that the classes do not overlap, namely, that successive classes have no values in common.
I.e. class intervals should be continuous, non-overlapping, mutually exclusive and exhaustive.
Determination of class limits
• Add the width to the lowest score taken as the starting
point to get the lower limit of the next class.
2. Sorting (or tallying) of the data into these classes,
3. Counting the number of items in each class, and
4. Displaying the results in the form of a chart or table
• These data represent the record high temperatures in
degrees Fahrenheit (F) for each of the 50 states.
Construct a grouped frequency distribution
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Step 1: Determine the classes.
• Find the highest value and lowest value: H = 134 and L = 100.
• Find the range: R = highest value − lowest value = 134 − 100 = 34
• Select the number of classes desired, using the formula K = 1 + 3.322 × log(n), where n is the number of observations (n = 50):
K = 1 + 3.322 × log(50) = 6.64 ≈ 7 classes
Find the class width by dividing the range by the number of classes: W = R/K = 34/7 = 4.86 ≈ 5 (rounded up).
Select a starting point for the lowest class limit.
• Add the width to the lowest score taken as the starting point to get the lower limit of the next class, and keep adding until there are 7 classes:
100, 105, 110, 115, 120, 125, 130
• Subtract one unit from the lower limit of the second class to get the upper limit of the first class. Then add the width to each upper limit to get all the upper limits.
• 105 – 1 = 104, then:
104, 109, 114, 119, 124, 129, 134
Step 2: Tally the data.
Step 3: Find the numerical frequencies from the tallies.
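Steps 1 to 3 for the temperature data can be sketched as follows. The data are the 50 record high temperatures listed above; the function name and the choice of start = 100, width = 5 and 7 classes follow the worked example, and the code itself is only an illustrative assumption about how one might automate the tally.

```python
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]


def grouped_frequency(data, start, width, k):
    """Build k classes of equal width and count the observations in each one."""
    # Class limits for whole-number data: (lower, upper) with upper = lower + width - 1.
    classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(k)]
    # "Tallying": count how many values fall within each pair of limits.
    counts = [sum(lower <= x <= upper for x in data) for lower, upper in classes]
    return classes, counts


classes, counts = grouped_frequency(temps, start=100, width=5, k=7)
for (lower, upper), frequency in zip(classes, counts):
    print(f"{lower}-{upper}: {frequency}")
```

Because the classes are exhaustive and non-overlapping, the frequencies add up to the 50 observations.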
Example 2:
Construct a grouped frequency distribution of the
following data on the amount of time (in hours) that
80 college students devoted to leisure activities
during a typical school week:
The data set for 80 college students
23 24 18 14 20 24 24 26 23 21
16 15 19 20 22 14 13 20 19 27
29 22 38 28 34 32 23 19 21 31
16 28 19 18 12 27 15 21 25 16
30 17 22 29 29 18 25 20 16 11
17 12 15 24 25 21 22 17 18 15
21 20 23 18 17 15 16 26 23 22
11 16 18 20 23 19 17 15 20 10
Step 1 Determine the classes.
Select the number of classes desired.
Using the above formula,
K = 1 + 3.322 × log (80) = 7.32 ≈ 7 classes
• Select a starting point (usually the lowest value); add the width to get the lower limits.
(10, 15, 20, 25, 30, 35)
• Find the upper class limits.
(14, 19, 24, 29, 34, 39)
• Step 2: Tally the data.
• Step 3: Find the numerical frequencies from the tallies, and find the cumulative frequencies.
Time spent (in hours)   Tally   Frequency   Cumulative frequency
30-34                   ////    4           79
35-39                   /       1           80
Cumulative frequency:
When the frequencies of two or more classes are added up, the resulting totals are called cumulative frequencies.
These frequencies help us to find the total number of items whose values are less than or greater than some value.
Relative frequencies:
Relative frequencies, on the other hand, express the frequency of each value or class as a percentage of the total frequency.
Cumulative relative frequency:
• The proportion of the total number of observations
that have a value less than or equal to the upper limit
of the interval
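Cumulative, relative and cumulative relative frequencies can be computed together. The sketch below uses the class frequencies of the leisure-time example (80 students, classes 10-14 to 35-39); the variable names are illustrative assumptions.

```python
from itertools import accumulate

frequencies = [8, 28, 27, 12, 4, 1]          # classes 10-14, 15-19, ..., 35-39
total = sum(frequencies)

# "Less than" cumulative frequency: running total from the lowest class upwards.
cumulative = list(accumulate(frequencies))

# Relative frequency: each class as a percentage of the total.
relative = [round(100 * f / total, 2) for f in frequencies]

# Cumulative relative frequency: share of observations at or below each upper limit.
cumulative_relative = [round(100 * c / total, 2) for c in cumulative]

for f, c, r, cr in zip(frequencies, cumulative, relative, cumulative_relative):
    print(f, c, r, cr)
```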
• Note: there are two ways of expressing a cumulative frequency distribution:
• Less-than cumulative frequency distribution: obtained if we start cumulating from the lowest size of the variable to the highest size
• More-than cumulative frequency distribution: obtained if we start cumulating from the highest size of the variable to the lowest size
Determination of Class Boundaries
• True limits (or class boundaries) are those limits, which
are determined mathematically to make an interval of a
continuous variable continuous in both directions, and
no gap exists between classes.
• Used for smoothing of the class intervals
• Subtract 0.5 from the lower limit and add 0.5 to the upper limit (when the limits are whole numbers)
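The boundary rule can be sketched for the leisure-time classes. The function name and the `unit` parameter (the smallest step between recorded values, 1 for whole numbers) are assumptions introduced for this sketch.

```python
def class_boundaries(lower, upper, unit=1):
    """Return the true limits of a class: half a unit below and above its limits."""
    half = unit / 2
    return lower - half, upper + half


limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
boundaries = [class_boundaries(lower, upper) for lower, upper in limits]
midpoints = [(lower + upper) / 2 for lower, upper in limits]

for (lower, upper), (low_b, up_b), mid in zip(limits, boundaries, midpoints):
    print(f"{lower}-{upper}: boundaries {low_b}-{up_b}, midpoint {mid}")
```

The upper boundary of each class equals the lower boundary of the next, so no gap remains between classes.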
Statistical Tables
• A statistical table is an orderly and systematic presentation
of numerical data in rows and columns.
• Rows (stubs) are horizontal and columns (captions) are
vertical arrangements.
• The use of tables for organizing data involves grouping the data into mutually exclusive categories of the variables and counting the number of occurrences (frequency) in each category.
• These mutually exclusive categories, for qualitative variables, are naturally occurring groupings.
– For example, sex (male, female), marital status (single, married, divorced, widowed, etc.), blood group (A, B, AB, O).
• For large sets of quantitative measurements (weight, height, etc.), the groups are formed by amalgamating (combining) continuous values into class intervals.
Based on the purpose for which the table is designed and the complexity of the relationship, a table can be either a:
– Simple frequency table or
– Cross tabulation.
• The simple frequency table is used when the individual observations involve only a single variable,
whereas
• The cross tabulation is used to obtain the frequency distribution of one variable by the subset of another variable.
• For simple frequency distributions, the denominators
for the percentages are the sum of all observed
frequencies, i.e. 210
• On the other hand, in cross tabulated frequency
distributions where there are row and column totals,
the decision for the denominator is based on the
variable of interest to be compared over the subset of
the other variable.
Construction of tables
• Although there are no hard and fast rules to follow, the following
general principles should be addressed in constructing tables.
1. Tables should be as simple as possible.
2. Tables should be self-explanatory.
3. If data are not original, their source should be given in a
footnote.
Self-explanatory tables:
• The title should be clear and to the point (a good title answers: what? when? where? how classified?) and should be placed above the table.
• Each row and column should be labeled
• Numerical entries of zero should be explicitly written rather than indicated by a dash. Dashes are reserved for missing or unobserved data.
Table 1: Overall immunization status of children in Adami Tullu Woreda, Feb. 1995
[Table body not recovered. Column headings: Immunization status: Immunized (No., %), Non-immunized (No., %), Total. Row groupings: Marital status, Residence.]
Diagrammatic Representation of Data
Importance of Diagrammatic Representation
Bar Chart
• Plot the frequency (or relative frequency) of each category and draw a bar for each
• Categories are listed on the horizontal axis (X-axis)
• Frequencies or relative frequencies are represented on
the Y-axis (ordinate)
• The height of each bar is proportional to the frequency
or relative frequency of observations in that category
(Leave space between bars)
[Bar chart. Categories on the X-axis: Blood film, Stool Ix, CBC. Bar labels: 25%, 18%, 12%. Y-axis: 0% to 30%.]
B. Multiple bar chart:
C. Component (or sub-divided) Bar Diagram:
• Bars are sub-divided into two or more component
parts of the figure
• These sorts of diagrams are constructed when each
total is built up from two or more component figures.
• Each part of the bar represents a certain item and is proportional to the magnitude of that particular item.
II. Percentage Component Bar Diagram:
• Where the individual component lengths represent the percentage each component forms of the overall total. Note that a series of such bars will all be the same total height, i.e. 100 percent.
Pie chart
• Usually used for qualitative data
[Pie chart. Categories: Very low, Low, Normal, Big. Slice labels: 43, 793, 268, 8870.]
Histograms
• Histograms are frequency distributions with continuous class
intervals that have been turned into graphs
• To construct a histogram, we draw the interval boundaries on a
horizontal line and the frequencies on a vertical line
• The bars are drawn to touch each other, to show the underlying
continuity of the data
• Bars are then drawn over the intervals in such a way that the
areas of the bars are all proportional in the same way to their
interval frequencies.
Time spent (hrs) Class boundary Frequency Cumulative freq
10 – 14 9.5-14.5 8 8
15 – 19 14.5-19.5 28 36
20-24 19.5-24.5 27 63
25 – 29 24.5-29.5 12 75
30 – 34 29.5-34.5 4 79
35 – 39 34.5-39.5 1 80
Frequency polygon:
Time spent (hrs) Class boundary Midpoint Frequency Cumulative freq
10 – 14 9.5-14.5 12 8 8
15 – 19 14.5-19.5 17 28 36
20 – 24 19.5-24.5 22 27 63
25 – 29 24.5-29.5 27 12 75
30 – 34 29.5-34.5 32 4 79
35 – 39 34.5-39.5 37 1 80
• Ogive or cumulative frequency curve: when the cumulative frequencies of a distribution are graphed, the resulting curve is called an ogive.
• The cumulative frequency is the sum of the frequencies accumulated up to the upper boundary of a class in the distribution.
• The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
To construct an ogive:
i) Compute the cumulative frequency of the distribution.
ii) Prepare a graph with the cumulative frequency on the vertical axis and the true upper class limits (class boundaries) of the intervals scaled along the X-axis (horizontal axis).
The true lower limit of the lowest class interval is included in the X-axis scale; this is also the true upper limit of the next lower interval, which has a cumulative frequency of 0.
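The points to be plotted can be sketched for the leisure-time data (class boundaries 9.5 to 39.5, frequencies 8, 28, 27, 12, 4, 1). The variable names are assumptions, and the sketch only prepares the coordinates; any plotting tool can then join them with straight lines.

```python
from itertools import accumulate

# Lower boundary of the first class, then the upper boundary of every class.
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5]
frequencies = [8, 28, 27, 12, 4, 1]

# The curve starts at 0 at the lowest boundary, then rises by each class frequency.
cumulative = [0] + list(accumulate(frequencies))

# (x, y) pairs: x = class boundary, y = cumulative frequency up to that boundary.
ogive_points = list(zip(boundaries, cumulative))
for x, y in ogive_points:
    print(x, y)
```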
The line diagram:
• The line graph is especially useful for the study of some variables
according to the passage of time.
• The time, in weeks, months or years is marked along the
horizontal axis; and the value of the quantity that is being studied
is marked on the vertical axis.
• The distance of each plotted point above the base-line indicates its
numerical value.
• The line graph is suitable for depicting a consecutive trend of a
series over a long period.
Thank you