
College of Medicine and Health Sciences

Department of Public Health


Epidemiology and Biostatistics unit

Fentaw T. (BSc, MPH)


01/23/2023 1
Chapter one
Introduction to Statistics

Learning objectives

After completing this chapter, the student will be able to:


• Define Statistics and Biostatistics
• Enumerate the importance and limitations of statistics
• Define and identify the different types of data
• Define variable
• Identify the different classifications of variables

Statistics
• A field of study concerned with collection,
organization and summarization of data, and the
drawing of inferences about a body of data when only
part of the data are observed.
• The term statistics is used to mean either statistical
data or statistical methods.

• Statistical data: statistics as statistical data refers to numerical descriptions of things.
• These descriptions may take the form of counts or measurements.
• E.g. statistics of malaria cases in one of the malaria detection and treatment posts of Ethiopia include fever cases, number of positives obtained, sex and age distribution of positive cases, etc.

• NB: Even though statistical data always denote figures (numerical descriptions), it must be remembered that not all 'numerical descriptions' are statistical data.

Characteristics of statistical data

• For numerical descriptions to be called statistics, they must possess the following characteristics:
i) They must be in aggregates
• This means that statistics are 'a number of facts.'
• A single fact, even though numerically stated, cannot be called statistics.

ii) They must be affected to a marked extent by a
multiplicity of causes.
– This means that statistics are aggregates of only such facts as grow out of a 'variety of circumstances'.


• E.g. the explosion of an outbreak is attributable to a number of factors, viz. human factors, parasite factors, mosquito and environmental factors.
 All these factors acting jointly determine the severity
of the outbreak and it is very difficult to assess the
individual contribution of any one of these factors.


iii) They must be enumerated or estimated according to a reasonable standard of accuracy
• This means that if aggregates of numerical facts are to be called 'statistics', they must be reasonably accurate.

• This is necessary because statistical data are to serve as a basis for statistical investigations.
• If the basis happens to be incorrect, the results are bound to be misleading.

iv) They must have been collected in a systematic manner for a predetermined purpose.
 Numerical data can be called statistics only if they have
been compiled in a properly planned manner and for a
purpose about which the enumerator had a definite idea.
 Facts collected in an unsystematic manner and without a
complete awareness of the object, will be confusing and
cannot be made the basis of valid conclusions.


v) They must be placed in relation to each other.
• Numerical facts may be placed in relation to each other in point of time, space or condition.
• The phrase 'placed in relation to each other' suggests that the facts should be comparable.


Statistical methods:
• When the term 'statistics' is used to mean 'statistical methods', it refers to a body of methods used for collecting, organizing, analyzing and interpreting numerical data in order to understand a phenomenon or make wise decisions.
• In this sense it is a branch of scientific method and helps us to know the object under study better.
Biostatistics
• Definition: When the different statistical methods are applied to biological, medical and public health data, they constitute the discipline of Biostatistics.
• It is the application of statistical methods to the life and health sciences.

Types of Biostatistics

1. Descriptive statistical methods
• Consist of the collection, organization, classification, summarization, and presentation of data obtained from the sample.
• Concerned with describing the sampled data
• Also used to check statistical assumptions (for example, those needed for hypothesis testing and other statistical inference)
 It is used only to describe the sample or summarize
information about the sample
Descriptive statistics includes:
 Tables
 Graphs
– Bar graph, Pie chart, Scatter plot
 Numerical summary measures
– Measure of central tendency
– Measure of variation
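
The numerical summary measures listed above can be illustrated with a minimal sketch. The weights below are hypothetical values invented for this example (they are not from any study), and Python's standard `statistics` module is only one of many tools that could be used.

```python
import statistics

# Hypothetical sample: weights (kg) of seven preschool children.
weights_kg = [12.5, 14.0, 13.2, 15.1, 12.5, 16.3, 14.4]

# Measures of central tendency
mean_weight = statistics.mean(weights_kg)
median_weight = statistics.median(weights_kg)

# Measures of variation
sd_weight = statistics.stdev(weights_kg)          # sample standard deviation
value_range = max(weights_kg) - min(weights_kg)   # range = maximum - minimum

print(f"mean={mean_weight:.2f}, median={median_weight:.2f}, "
      f"sd={sd_weight:.2f}, range={value_range:.2f}")
```

These figures describe only the seven sampled children; they say nothing yet about a wider population.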
2. Inferential statistical methods:
• Usually referred to as “statistical analysis”
• Deal with using the sample to make statements about a wider population
• Sample statistics are used to make statements about population values
• Used to infer, estimate or approximate the characteristics of the target population

• Used when we want to draw a conclusion from the data obtained from the sample
• i.e. the basic aim of all statistical inference is to use sample data to draw conclusions about the population from which the sample was obtained.
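
As a rough sketch of inference from a sample to a population, the code below estimates a population mean with an approximate 95% confidence interval. The haemoglobin values are hypothetical, and the normal approximation (z = 1.96) is an assumption made for simplicity; with a sample this small, a t-based interval would usually be preferred.

```python
import math
import statistics

# Hypothetical sample: haemoglobin levels (g/dL) of 12 sampled children.
sample_hb = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3,
             10.4, 11.8, 9.9, 10.7, 11.1, 10.6]

n = len(sample_hb)
sample_mean = statistics.mean(sample_hb)    # sample statistic
sample_sd = statistics.stdev(sample_hb)     # sample standard deviation
standard_error = sample_sd / math.sqrt(n)   # standard error of the mean

# Assumption: normal approximation for an approximate 95% interval.
z = 1.96
ci_lower = sample_mean - z * standard_error
ci_upper = sample_mean + z * standard_error

print(f"Estimated population mean: {sample_mean:.2f} g/dL "
      f"(approx. 95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

The sample mean is a sample statistic; the interval is a statement about the unknown population value.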

• Biostatistics provides a framework for the analysis of data
• Through the application of statistical principles to the biological sciences, biostatisticians are able to methodically distinguish between true differences among observations and random variations caused by chance alone.
• From an application standpoint, knowledge of
biostatistics and epidemiology permits one to make
valid conclusions from data sets.
• Associations between risk factors and disease are
determined with this information and, ultimately, are
used to reduce illness and injury
– E.g. assessing the magnitude and associated factors of under-five malnutrition in South Wollo Zone
Biostatistics Roles in Public Health Functions

• What is Public Health all about?
– “Public Health is the science and art of preventing disease, prolonging life and promoting health through the organized efforts of society.” (World Health Organization)
• Fields of Public Health: Epidemiology, Biostatistics, RH
and Demography, Nutrition, EH, CDC, Research Methods
etc.
The Functions of Public Health
• Assessment: Identify problems related to the public’s health, and measure their extent
• Policy Setting: Prioritize problems, find possible solutions, set strategies to achieve change and predict the effect on the population
• Implementation/Intervention: Provide services as determined by policy, and monitor compliance
• Monitoring and Evaluation: Determine whether the policy/strategy designed and implemented addresses the problem identified

• Role of Biostatistics in assessment:
– Decide which information to gather,
– Find patterns in collected data,
– Make the best summary description of the population and associated problems,
– Design general surveys of the population's needs,
– Plan experiments to supplement these surveys, and
– Assist scientists in estimating the extent of health problems and associated risk factors


• Role of Biostatistics in Policy Setting:
– Measure problems,
– Prioritize problems,
– Quantify associations of risk factors with disease,
– Predict the effect of policy changes, and
– Estimate costs

• Role of Biostatistics in Monitoring & Evaluation:
– Use sampling and estimation methods to study the factors related to compliance and outcome
– Measure indicators for program monitoring and evaluation

Variable

• A characteristic that, when observed or measured, takes on different values in different individuals (elements): persons, places, or things.
• Any aspect of an individual or object that is measured (e.g. BP) or recorded (e.g. sex) and can take different values
• For example: heart rate, the heights of adult males, the weights of preschool children
Types of variables

1. Qualitative variable:
 A variable or characteristic which cannot be measured
in quantitative form but can only be identified by name
or categories.
 Non-numerical
 The notion of magnitude is absent
• E.g. place of birth, ethnic group, blood group, stages of breast cancer (I, II, III, or IV)

2. Quantitative variable:
• A quantitative variable is one that can be measured
and expressed numerically
• Variables measured by assigning numbers to the
items

Types of quantitative variables

1. Discrete
– The values of a discrete variable are usually whole
numbers,
– Is characterized by gaps or interruptions in the
values that it can assume

• Discrete data are restricted to taking only specified values, often integers or counts, that differ by fixed amounts.
• E.g. number of episodes of diarrhea in the first five years of life, number of new AIDS cases reported during a one-year period, number of students in the class

2. Continuous
• Can assume any value within a specified relevant
interval of values assumed by the variable.
• A continuous variable is a measurement on a continuous
scale.


• Does not possess gaps or interruptions
• E.g. weight, height, blood pressure, age, etc.
• No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between

Measurement scales
• Measurement: a procedure where qualities or quantities are assigned to characteristics of subjects, objects or events.
• Not all measurements are the same.
– E.g. measuring weight = 40 kg
– Measuring the status of a patient on a scale = “improved”, “stable”, “not improved”
• Measurement scales are important for statistical analysis of data (the type of measurement and data determines the type of appropriate statistics for analysis).
Types of scales of measurement

There are four types of measurement scales:
i. Nominal
ii. Ordinal
iii. Interval
iv. Ratio

1. Nominal scale
• The simplest type of data, where the measurement of a
variable involves the naming or categorization of possible
values of the variable
• The values fall into unordered categories or classes
• Mutually exclusive and collectively exhaustive categories
• Uses names, labels or symbols to classify each measurement
– Examples: Blood type, sex, marital status, religion, cause of
illness, cause of death
Example of a nominal scale
Marital status
1. Single
2. Married
3. Widowed
4. Divorced
• The numbers have NO meaning
• They are labels only


• Dichotomous or binary: nominal data that can take only two possible values or categories
– E.g. sex is not only nominal; it is dichotomous (male or female)
• Yes/No; 0 or 1 data
– For example: pregnant vs. non-pregnant, smoker vs. non-smoker, diabetic vs. non-diabetic

2. Ordinal scale
• Assigns each measurement to one of a limited
number of categories that are ranked in terms of order
• Although non-numerical, can be considered to have a
natural ordering
– Examples: patient status, cancer stages, social class etc

• The spaces or intervals between the categories (order) are not necessarily equal.
• E.g. Likert scales are ordinal scales:
1. Strongly agree
2. Agree
3. No opinion
4. Disagree
5. Strongly disagree
• In the above situation, we only know that the data are ordered
Example of an ordinal scale:
Health status
1. Poor
2. Fair
3. Good
4. Very good
5. Excellent
• The numbers have limited meaning: 5 > 4 > 3 > 2 > 1 is all we know, apart from their utility as labels

3. Interval scale
• In interval data the intervals between values are the same.
• There are meaningful differences between values
• E.g., in the Fahrenheit temperature scale,
– The difference between 70 degrees and 71 degrees is the same as the difference between 32 and 33 degrees


• But the interval scale is not a ratio scale.
– 40 degrees Fahrenheit is not twice as hot as 20 degrees Fahrenheit.
• Another example is Intelligence Quotient (IQ)

• Measured on a continuum, and the differences between any two numbers on the scale are of known size.
• Example: Temperature in °C on 4 consecutive days

Day        Mon.  Tue.  Wed.  Thu.
Temp (°C)  18    20    22    23

• For these data, not only is Mon. with 18 °C cooler than Thu. with 23 °C, but it is 5 °C cooler.
• It has no true zero point (value). “0” is arbitrarily chosen and doesn’t reflect the absence of temperature.
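
Why ratios are not meaningful on an interval scale can be checked with a short sketch. The helper `fahrenheit_to_celsius` is a name introduced here for illustration; it uses the standard conversion formula. The same two temperatures give a different "ratio" depending on the unit, while equal differences stay equal.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

# Ratio of the same two temperatures in two units
ratio_f = 40 / 20                                                 # 2.0 in Fahrenheit
ratio_c = fahrenheit_to_celsius(40) / fahrenheit_to_celsius(20)   # a different value

# Equal differences remain equal after conversion
diff_high_c = fahrenheit_to_celsius(71) - fahrenheit_to_celsius(70)
diff_low_c = fahrenheit_to_celsius(33) - fahrenheit_to_celsius(32)

print(f"ratio in F: {ratio_f:.2f}, ratio in C: {ratio_c:.2f}")
print(f"1 F step near 70 F: {diff_high_c:.3f} C; near 32 F: {diff_low_c:.3f} C")
```

Because the zero point is arbitrary, the ratio changes with the unit, so "twice as hot" has no fixed meaning; the differences agree in both units.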
4. Ratio scale
• The highest scale of measurement
• Measurement begins at a true zero point and the scale has equal intervals
• Example: height, age, weight, etc.
• Ratios (quotients) are meaningful.
– Someone who weighs 80 kg is two times as heavy as someone else who weighs 40 kg.
• A measurement on a higher scale can be transformed into one on a lower scale, but not vice versa
Example
• Ratio scale: birth weight of newborns in grams (gm)
• Interval scale: birth weight of newborns in gm above an arbitrary value (e.g. above 2000 gm)
• Ordinal scale: the comparative weight of newborns (Lowest, Middle, Highest)
• Nominal scale: categorizing birth weight of newborns as “Normal” or “Under weight”
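
The downward transformation in the example above can be sketched in code. The three birth weights are hypothetical, the 2000 gm reference is the arbitrary value from the slide, and the 2500 gm cut-off for “Under weight” is an assumption for this sketch (it matches the commonly used low-birth-weight threshold).

```python
# Ratio scale: hypothetical birth weights of three newborns in grams.
weights_g = [1900, 3100, 2600]

REFERENCE_G = 2000           # arbitrary reference value from the example
UNDERWEIGHT_CUTOFF_G = 2500  # assumed cut-off for "Under weight"

# Interval scale: grams above (or below) the arbitrary reference value.
interval_values = [w - REFERENCE_G for w in weights_g]

# Ordinal scale: comparative position among the three newborns.
rank_labels = ["Lowest", "Middle", "Highest"]
order = sorted(range(len(weights_g)), key=lambda i: weights_g[i])
ordinal_values = [""] * len(weights_g)
for rank, index in enumerate(order):
    ordinal_values[index] = rank_labels[rank]

# Nominal scale: two unordered labels.
nominal_values = ["Under weight" if w < UNDERWEIGHT_CUTOFF_G else "Normal"
                  for w in weights_g]

print(interval_values, ordinal_values, nominal_values)
```

Each step discards information: the original weights cannot be recovered from the ordinal or nominal values, which is why the transformation only works downward.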
Dependent variable vs. independent variable
• Dependent:
– The variable we measure as the outcome of interest, or response
• Independent:
– The variable that explains the dependent variable.
– It is also called an explanatory or predictor variable
• Examples (independent → dependent):
– Parasitic infections → Anemia
– Change in attitude → Change in practice

Example

• A study will be conducted to identify factors associated with ANC utilization/use. Then:
• What is the dependent variable?
• What is/are the independent variable(s)?

Chapter two
Methods of Data Collection, Organization and Presentation

Learning Objectives

• At the end of this chapter, the students will be able to:
– Identify the different methods of data organization and presentation
– Identify the different methods of data collection
– Define a questionnaire

Data Collection Methods
 Data collection techniques allow us to systematically
collect data about our objects of study (people,
objects, and phenomena) and about the setting in
which they occur.
 In the collection of data we have to be systematic.
 If data are collected haphazardly, it will be difficult
to answer our research questions in a conclusive way.
• The validity and accuracy of the final judgment is most crucial and depends heavily on how well the data were collected in the first place
• The quality of data will greatly affect the conclusions
– Hence utmost importance must be given to this process, and every possible precaution should be taken to ensure accuracy while collecting the data
• Various data collection techniques can be used, such as:
– Observation
– Face-to-face interviews and self-administered questionnaires
– Postal or mail method and telephone interviews
– Using available information (secondary data)
– Focus group discussions (FGDs)
1. Observation

• It is a technique that involves systematically selecting, watching and recording behaviors of people or other phenomena and aspects of the setting in which they occur, for the purpose of gaining specified information
• It includes all methods, from simple visual observations to the use of high-level machines and measurements, sophisticated equipment or facilities

• Outline the guidelines/checklist for the observations prior to actual data collection
• Advantages:
– Gives relatively more accurate data on behavior and activities


• Disadvantages:
– The investigator's or observer’s own biases, desires, etc. may influence the data
– Needs more resources and skilled human power when high-level machines are used

2. Interviews and self-administered questionnaire

• They are probably the most commonly used research data collection techniques
• Designing good “questioning tools” forms an important and time-consuming phase in the development of most research proposals


• Standardized methods of asking questions are usually preferred in community medicine research.
• Less structured interviews may be useful in a preliminary survey, where the purpose is to obtain information to help in the subsequent planning of a study rather than facts for analysis

Advantages of self-administered questionnaires:
– Simpler and cheaper
– Can be administered to many persons simultaneously
– No interviewer bias
Disadvantages:
– Demand a certain level of education and skill on the part of the respondents; respondents must be literate so that the questionnaire can be read and understood
– Low response rate: questionnaires may not be returned, or may not be returned within the specified time
– Respondents may not answer all questions
– Lack of probing
– Introduce self-selection bias
– Not suitable for complex questionnaires
– People of a low socio-economic status are less likely to respond
Face to face interview

• The investigator personally meets the respondents and asks questions to gather the necessary information
• The persons from whom information is collected are known as informants or respondents

Face-to-face interviews (Advantages)

• A good interviewer can stimulate and maintain the respondent’s interest
• Can create an atmosphere conducive to the answering of questions
• Serious approach by the respondent, resulting in accurate information
• Good response rate
• Responses are complete and immediate
• In-depth questions are possible


• The interviewer is in control and can give help if there is a problem, i.e. provide an explanation or alternative wording, or use props
• Can investigate motives and feelings
• Can use recording equipment
• Characteristics of the respondent can be assessed: tone of voice, facial expression, hesitation, etc.
• Answers to questions about which the informant is likely to be sensitive can be gathered
01/23/2023 64
Disadvantages:
• Need to set up interviews
• Time consuming
• Geographic limitations
• Can be expensive
• Normally need a set of questions
• Respondent bias

3. Use of documentary sources:

• Clinical and other personal records, death certificates, published mortality statistics, census publications
Examples:
• Official publications of the Central Statistical Authority
• Publications of the Ministry of Health and other ministries
• Newspapers and journals
• International publications, such as those by WHO, World Bank, UNICEF
• Records of hospitals or any health institutions.
Merits of Secondary Data:

• Secondary data are cheap to obtain
• Large quantities of secondary data can be obtained through the internet
• Much of the available secondary data has been collected over many years and can therefore be used to plot trends
• Secondary data are of value to the government
– Help in making decisions and planning future policy
• It is important to recognize some of the main problems that may be faced when collecting data:
– Language barriers
– Lack of adequate time
– Expense
– Inadequately trained and experienced staff
– Invasion of privacy
– Suspicion
– Bias (project, person, season, diplomatic, professional)
– Cultural norms (e.g. which may preclude men interviewing women)

Data
• Data are numbers obtained by taking measurements, by counting or by observation
• Numerical description of things
• The raw material for statistics
• The statistical data may be classified under two categories,
depending upon the sources.
1. Primary data
2. Secondary data

1. Primary Data

• Are data collected by the investigator for the purpose of a specific inquiry or study
• Such data are original in character and are mostly generated by surveys conducted by individuals or research institutions


• More reliable and accurate, since the investigator can extract the correct information by removing doubts
 High response rates might be obtained
 Permits explanation of questions concerning difficult
subject matter

2. Secondary Data:
• Data that have already been collected by others and are used by the investigator
• Secondary data can be obtained from journals, reports, government publications, and publications of professional and research organizations.
• Less expensive to collect, both in money and time
• May lack completeness

Questionnaire
• The quality of research depends to a large extent on the
quality of the data collection tools
• Interviewing and administering questionnaires are
probably the most commonly used research techniques
• Therefore designing good ‘questioning tools’ forms an
important and time-consuming phase in the
development of most research proposals
• Questionnaires are an inexpensive way to gather data
from a potentially large number of respondents
• They are often the only feasible way to reach a number of respondents large enough to allow statistical analysis of the results
• A well-designed questionnaire that is used effectively
can gather information
• When a research instrument contains only questions
and statements to be answered by respondents, it is
called a questionnaire.

• Questionnaire: an instrument of data collection that includes different questions regarding basic demographic information, other personal attributes and so on
Types of questions

• Depending on how questions are asked and recorded, we can distinguish two major possibilities:
– Open-ended questions, and
– Closed-ended questions

Open-ended questions

• Open-ended questions permit free responses that should be recorded in the respondent’s own words
• The respondent is not given any possible answers to choose from

• Such questions are useful to obtain information on:
– Facts with which the researcher is not very familiar,
– Opinions, attitudes, and suggestions of informants,
– Sensitive issues
• E.g. “Can you describe exactly what the traditional birth attendant did when your labor started?”

Closed-ended Questions

• Closed questions offer a list of possible options or answers from which the respondents must choose
• When designing closed questions one should try to:
– Offer a list of options that are exhaustive and mutually exclusive
– Keep the number of options as few as possible
• Closed questions are useful if the range of possible responses is known

• Most commonly used for background variables such as age, marital status or education
• Closed questions may be used to get the respondents to express their opinions or attitudes by choosing rating points on a scale.


For example:
“What is your marital status?”
1. Single
2. Married
3. Separated/divorced
4. Widowed
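
A closed question like the one above is usually pre-coded so that responses can be entered and analyzed by computer. The sketch below is only illustrative: the dictionary `MARITAL_STATUS_CODES` and the helper `decode_response` are hypothetical names, and the recorded responses are invented.

```python
# Pre-coded options for the closed question "What is your marital status?"
MARITAL_STATUS_CODES = {
    1: "Single",
    2: "Married",
    3: "Separated/divorced",
    4: "Widowed",
}

def decode_response(code: int) -> str:
    """Return the label for a recorded code, rejecting codes not on the list."""
    if code not in MARITAL_STATUS_CODES:
        raise ValueError(f"Invalid response code: {code}")
    return MARITAL_STATUS_CODES[code]

# Hypothetical codes recorded for four respondents.
recorded_codes = [2, 1, 2, 4]
labels = [decode_response(code) for code in recorded_codes]
print(labels)
```

Because each respondent must fall into exactly one listed option, any code outside the list signals a recording error rather than a new category.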

Advantages of closed ended questions

• It saves time
• Comparing responses of different groups, or of the same group over time, becomes easier.
• Answers are easier to analyze by computer, and the response choices make the question clearer


• Most appropriate when the range of possible alternative responses is known, limited and clear-cut
• Generally more efficient to administer and score than open-ended questions
• Useful for addressing sensitive questions or stressful topics about which respondents may be reluctant to respond
Requirements of questions

• Must have face validity
• Must be clear and unambiguous
• Must not be offensive
• Sensitive questions should be asked very carefully and wisely by the interviewer (questioner)
• The questions should be fair

Questionnaire Design
• Designing a questionnaire always takes several drafts.
• In the first draft we should concentrate on the content.
• In the second, we should look critically at the formulation and sequencing of the questions.
• Then we should examine the format of the questionnaire.
• Finally, we should do a test-run to check whether the questionnaire gives us the information we require and whether both the respondents and we feel at ease with it
Steps in designing a questionnaire
Step 1: Content
Step 2: Formulating questions
Step 3: Sequencing the questions
Step 4: Formatting the questionnaire
Step 5: Translation
Step 6: Pre-test

Step 1: Content

• Take your objectives and variables as a starting point
• Decide what questions will be needed to measure or to define your variables and reach your objectives
• When developing the questionnaire, you should reconsider the variables you have chosen and, if necessary, add, drop or change some

Step 2: Formulating questions:

• Formulate one or more questions that will provide the information needed for each variable
– Check whether each question measures one thing at a time
• For example, the question “Do you think the treatment you received from the doctor was costly and ineffective?”
– Would better be divided into two questions, because cost and effectiveness are two different things


• Take care that questions are specific and precise enough that different respondents do not interpret them differently
• Ask sensitive questions in a socially acceptable way

Avoid leading questions:
• A question is leading if it suggests a certain answer.
– For example, the question, ''Do you agree that the
district health team should visit each health center
monthly?'' hardly leaves room for “no” or for other
options
– Better would be: “Do you think that district health teams should visit each health center?”
• Sometimes, a question is leading because it presupposes
a certain condition.
• For example: “What action did you take when your
child had diarrhea the last time?” presupposes the child
has had diarrhea
• A better set of questions would be:
– “Has your child had diarrhea? If yes, when was the last time?”
– “Did you do anything to treat it? If yes, what?”
Step 3: Sequencing the questions:
• Design your interview schedule or questionnaire to be “informant friendly”
• The sequence of questions must be logical for the respondent and allow as much as possible for a “natural” discussion.
• At the beginning of the interview, keep questions concerning “background variables” to a minimum
– E.g. age, religion, education, marital status, or occupation

• Start with an interesting but non-controversial question (preferably open) that is directly related to the subject of the study.
– This type of beginning should help to raise the informants’ interest
• Pose more sensitive questions as late as possible in the interview
– E.g. questions pertaining to income, sexual behaviour, or diseases with stigma attached to them, etc.
• Use simple everyday language
• Objectionable, time-consuming, or especially difficult
questions should be at the end
• Selecting the first question is crucial
– It should be clearly related to the problem, be
interesting to the respondent, and be easy to respond to.

• Make the questionnaire as short as possible


Step 4: Formatting the questionnaire:

• When you finalize your questionnaire, be sure that a separate, introductory page is attached to each questionnaire:
– Explaining the purpose of the study
– Requesting the informant's consent to be interviewed
– Assuring confidentiality of the data obtained.


• Each questionnaire has a heading and space to insert the number, date and location of the interview
• You may add the name of the interviewer, to facilitate quality control.

Step 5: Translation

• If interviews will be conducted in one or more local languages, the questionnaire has to be translated to standardize the way questions will be asked
• After having it translated, you should have it retranslated into the original language
• You can then compare the two versions for differences and make a decision concerning the final phrasing of difficult concepts.
Step 6: Pre-test

• A pretest usually refers to a small-scale trial of a particular research component
• We should do a test-run to check whether the questionnaire gives us the information we require and whether both the respondents and we feel at ease with it

What makes a well designed questionnaire?

• Good appearance (easy on the eye)
• Short and simple
• Relevant and logical
⇒ High response rate
⇒ Easier to collect, to summarize and to analyse
In general in questionnaire design remember to:

• Use familiar and appropriate language
• Avoid abbreviations, double negatives, etc.
• Avoid collecting two elements through one question
• Pre-code the responses to facilitate data processing
• Avoid embarrassing and painful questions
• Watch out for ambiguous wording
• Avoid language that suggests a response
• Start with simpler questions
• Ask the same question to all respondents
• Provide “other (specify)” or “don’t know” options where appropriate
• Provide the unit of measurement for continuous variables
• For open ended questions, provide sufficient space for
the response
• Ensure options are mutually exclusive
• Arrange questions in logical sequence
• Group questions by topic, and place a few sentences of transition between topics
• Provide complete training for interviewers
• Pretest the questionnaire on 20-50 respondents in an actual field situation
• Check all completed questionnaires at field level
• Include “thank you” after the last question
Choosing a Method of Data Collection

• Decision-makers need information that is relevant, timely, accurate and usable
• The cost of obtaining, processing and analyzing these data must also be considered
• The challenge is to find ways that lead to information that is cost-effective, relevant, timely and important for immediate use

• Some methods pay attention to timeliness and reduction in cost
• Others pay attention to accuracy and the strength of the method in using scientific approaches

Methods of data organization and presentation

• Numbers that have not been summarized and organized are called raw data, i.e. the data collected in a survey/study
• In most cases, useful information is not immediately evident from the mass of unsorted data.
• Statistics is used to organize and interpret research observations and findings
 Before interpretation & communication of the findings,
the raw data must be organized and presented in a clear
and understandable way
 Distribution: arrangement of data values of variables
obtained from units (individuals or objects) as
measured in terms of time, place and person.
 Data set: a collection of values for a certain variable.

Array (ordered array)

• It is a serial arrangement of numerical data in ascending or descending order
• This enables us to know the range over which the items are spread and also gives an idea of their general distribution


• An ordered array is an appropriate way of presentation when the data set is small in size (usually fewer than 20 values)
• The array helps us to see at once the maximum and minimum values

01/23/2023 110
Frequency Distribution
 The arrangement of a data set in a table using values and their corresponding frequency of occurrence within the data set.
 Frequency: the number of times a given value occurs within a data set.
 A table which contains the values of a variable and the corresponding frequencies with which each value occurs (or the frequencies with which data fall within each range)
Con…

• Consists of the set of classes or categories along with the numerical count in each, i.e. the frequency
• The actual summarization and organization of data starts from the frequency distribution
• The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data
Con…

A frequency distribution is constructed for three main reasons:
• To facilitate the analysis of data
• To estimate frequencies of the unknown population distribution from the distribution of sample data
• To facilitate the computation of various statistical measures
Frequency distributions for categorical
variables

 The categorical frequency distribution is used for data that can be placed in specific categories, such as nominal- or ordinal-level data
 Summarizing categorical variables (nominal & ordinal) is simple
 Count the number of observations (frequency) in each category and present them as relative frequencies (percentages)
 Often presented in the form of tables, bar charts and pie charts
Con…
• A relative frequency distribution shows the proportion of counts that fall into each class or category
• A relative frequency value for any category is obtained by dividing the number of observations in that category by the total number of observations
– Rf = f/n
• This can be reported as a percentage by multiplying the resulting fraction by 100: % (Rf) = f/n × 100%
Con…

• For nominal and ordinal data, frequency distributions are often used as a summary.
• The percent (%) of times that each value occurs, or the relative frequency, is often listed
• Tables make it easier to see how the data are distributed
Example 1

• Distribution of Blood Types: Twenty-five individuals were given a blood test to determine their blood type.
• The data set is:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

Construct a frequency distribution for the data

Con…

• Since the data are categorical, discrete classes are used
– There are four blood types: A, B, O, and AB
• These types will be used as the classes or categories for the distribution
 Make a table with four columns (Classes or categories, Tally, Frequency, Relative frequency (percent))
 Tally the data, count the tallies and place the results in the frequency column
 Find the percentage of values in each class by using the formula:
% (Rf) = f/n × 100%
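The tallying and percentage steps for this example can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical, and the data are the 25 blood types listed in Example 1.

```python
from collections import Counter

# Blood types of the 25 individuals, copied row by row from Example 1.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

n = len(blood_types)          # total number of observations
freq = Counter(blood_types)   # tally: frequency of each class

# Relative frequency as a percentage: % (Rf) = f/n * 100
percent = {cls: 100 * f / n for cls, f in freq.items()}

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for cls in ["A", "B", "O", "AB"]:
    print(f"{cls:<6}{freq[cls]:>10}{percent[cls]:>9.1f}")
print(f"{'Total':<6}{n:>10}{sum(percent.values()):>9.1f}")
```

The printed rows give the frequency and the percentage for each of the four classes, with the total row acting as a check that the frequencies add up to n.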
Example 2
• A study was conducted to assess immunization coverage among 830 children aged 12-23 months; the distribution of the children by sex was:
Sex        Frequency   Relative frequency
Male       422         50.8%
Female     408         49.2%
Total      830         100%
Example 3
 n = 110 female patients were classified with regard to depressive illness as “no depression”, “mild depression” or “moderate depression”
 The observed absolute and relative frequencies are shown in a one-way table:
Depression   Frequency   Relative frequency (%)
None         26          23.6
Mild         67          60.9
Moderate     17          15.5
Total        110         100.0
Example 4: Ordinal data
Pain intensity Frequency Relative frequency
No pain 70 35%
Mild pain 44 22%
Moderate pain 46 23%
Severe pain 26 13%
Very severe pain 14 7%
Total 200 100%

Frequency distribution for numerical variables

• A frequency distribution can also show the number of observations at different values or within certain ranges
• For a discrete variable, the frequencies may be tabulated either for each value of the variable or for groups of values
Con…

• With continuous variables, groups (class intervals) have to be formed
• For both discrete and continuous data, the values are grouped into distinct non-overlapping intervals, usually of equal width.
Con…

• For a continuous variable (e.g. age), the frequency distribution of the individual ages is not very informative.
• Select a set of continuous, non-overlapping intervals such that each value can be placed in one, and only one, of the intervals.
• The first consideration is how many intervals to include
Con…
• We “see more” in frequencies of age values in “groupings”.
• Here, 10 year groupings make sense.
• Grouped data frequency distribution
Grouped frequency distribution

Class interval   Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
30-39            11          0.0582               11                     0.0582
40-49            46          0.2434               57                     0.3016
50-59            70          0.3704               127                    0.6720
60-69            45          0.2381               172                    0.9101
70-79            16          0.0847               188                    0.9948
80-89            1           0.0053               189                    1.0001
Total            189         1.0001
Note: the relative frequencies are rounded to four decimal places, so they add up to 1.0001 rather than exactly 1.
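The columns of this table can be recomputed from the class frequencies alone. The following is a minimal sketch with hypothetical variable names; it divides the cumulative counts by n instead of adding rounded relative frequencies, so the last cumulative relative frequency is exactly 1 and the final digit of some entries may differ from a table built by adding rounded values.

```python
from itertools import accumulate

class_intervals = ["30-39", "40-49", "50-59", "60-69", "70-79", "80-89"]
frequencies = [11, 46, 70, 45, 16, 1]

n = sum(frequencies)                       # total number of observations
relative = [f / n for f in frequencies]    # relative frequency of each class
cumulative = list(accumulate(frequencies)) # running total of the frequencies
# Cumulative relative frequency from the unrounded cumulative counts.
cumulative_relative = [c / n for c in cumulative]

for row in zip(class_intervals, frequencies, relative,
               cumulative, cumulative_relative):
    print("{:<8}{:>5}{:>9.4f}{:>6}{:>9.4f}".format(*row))
```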

Con…
• For large samples, we can’t use the simple frequency table to represent the data.
• We need to divide the data into groups or intervals or classes.
• So, we need to determine:
1. The number of intervals (k): choosing the number of classes
2. The range (R): the difference between the largest and the smallest observation in the data set.
3. The width of the interval (w): the span of values each class covers, namely, from where to where each class should go. Class intervals generally should be of the same width.
Con…

• Choosing a suitable classification involves choosing the number of classes and the range of values each class should cover
• Both of these choices are arbitrary to some extent, but they depend on the nature of the data and its accuracy, and on the purpose the distribution is to serve.
Some rules that are generally observed:
 Too few intervals are not good because information will be lost, and too many intervals are not helpful for summarizing the data.
 We seldom use fewer than 6 or more than 20 classes; a commonly followed rule is that 6 < k < 15
 The exact number we use in a given situation depends mainly on the number of measurements or observations we have to group
Con…
 A guide on the determination of the number of classes (k) can be Sturges' formula, given by:
K = 1 + 3.322×log(n), where n is the number of observations and log is the logarithm to base 10
 And the length or width of the class interval (w) can be calculated by:
W = (Maximum value – Minimum value)/K = Range/K
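As a rough illustration, the two formulas can be written as small Python functions. The function names are hypothetical, and the sample sizes 50, 80 and 275 are simply the ones used in the worked examples of this chapter.

```python
import math

def sturges_classes(n):
    """Suggested number of classes: K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_width(maximum, minimum, k):
    """Class width: W = (maximum value - minimum value) / K."""
    return (maximum - minimum) / k

for n in (50, 80, 275):
    k = sturges_classes(n)
    print(f"n = {n}: K = {k:.2f}, rounded to {round(k)} classes")

# Width for a data set ranging from 100 to 134 with 7 classes, rounded up.
w = class_width(134, 100, 7)
print(f"W = {w:.2f}, rounded up to {math.ceil(w)}")
```

The result of the formula is only a guide, so in practice the rounded value of K may still be adjusted for a clearer presentation.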
N.B.:
 Sturges' rule should not be regarded as final, but should be considered as a guide only.
 The number of classes specified by the rule should be increased or decreased for convenient or clear presentation.
Con…
• Suppose, for example, that we have a sample of 275 observations that we want to group
• The logarithm to the base 10 of 275 is 2.4393
• Applying Sturges' formula gives k = 1 + 3.322(2.4393) ≈ 9
• In practice, other considerations might cause us to use 8 or fewer or perhaps 10 or more class intervals
Con…
 Classes should be mutually exclusive.
 Make sure that the smallest and largest values fall within the classification, that none of the values can fall into possible gaps between successive classes, and that the classes do not overlap, namely, that successive classes have no values in common
 I.e. class intervals should be continuous, non-overlapping, mutually exclusive and exhaustive
Determination of class limits

• Class limits should be definite and clearly stated.
• Open-end classes should be avoided since they make it difficult, or even impossible, to calculate certain further descriptions that may be of interest.
• The smallest and largest values that can go into any class are referred to as its class limits; they can be either lower or upper class limits
Con…
• The construction of a grouped frequency distribution consists essentially of four steps:
1. Choosing the classes
• Find the number of classes by using Sturges' guide and calculate the range and width
• Select a starting point for the lowest class limit. This can be the smallest data value or any convenient number less than the smallest data value
• Add the width to the lowest score taken as the starting point to get the lower limit of the next class. Keep adding until the required number of classes is reached.
• Subtract one unit from the lower limit of the second class to get the upper limit of the first class. Then add the width to each upper limit to get all the upper limits
2. Sorting (or tallying) of the data into these classes,
3. Counting the number of items in each class, and
4. Displaying the results in the form of a chart or table
• These data represent the record high temperatures in
degrees Fahrenheit (F) for each of the 50 states.
Construct a grouped frequency distribution
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

Step 1: Determine the classes.
• Find the highest value and lowest value: H = 134 and L = 100.
• Find the range: R = highest value - lowest value
R = 134 - 100 = 34
• Select the number of classes desired, using the formula K = 1 + 3.322×log(n), where n is the number of observations (n = 50)
K = 1 + 3.322×log(50) ≈ 7 classes
 Find the class width by dividing the range by the number of classes and rounding up.
 Width = R/Number of classes = 34/7 = 4.9 ≈ 5
Con…
 Select a starting point for the lowest class limit, here 100.
• Add the width to the lowest score taken as the starting point to get the lower limit of the next class, and keep adding until there are 7 lower limits:
(100, 105, 110, 115, 120, 125, 130)
• Subtract one unit from the lower limit of the second class to get the upper limit of the first class. Then add the width to each upper limit to get all the upper limits.
• 105 – 1 = 104, then
(104, 109, 114, 119, 124, 129, 134)
• The classes are therefore 100-104, 105-109, 110-114, 115-119, 120-124, 125-129 and 130-134.
Step 2: Tally the data.
Step 3: Find the numerical frequencies from the tallies.
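Steps 1 to 3 for the temperature data can be sketched in Python as follows. This is an illustrative sketch with hypothetical variable names; it assumes a starting point of 100, a class width of 5 and 7 classes.

```python
# Record high temperatures (degrees F) for the 50 states, as listed above.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

start, width, k = 100, 5, 7

# Step 1: lower limits 100, 105, ..., 130; upper limit = lower limit + width - 1.
classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(k)]

# Steps 2 and 3: count how many values fall within each pair of class limits.
frequencies = [sum(lo <= t <= hi for t in temps) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(frequencies))
```

Checking that the frequencies add up to 50 confirms that every observation fell into exactly one class.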
Example 2:
 Construct a grouped frequency distribution of the
following data on the amount of time (in hours) that
80 college students devoted to leisure activities
during a typical school week:

The data set for 80 college students
23 24 18 14 20 24 24 26 23 21
16 15 19 20 22 14 13 20 19 27
29 22 38 28 34 32 23 19 21 31
16 28 19 18 12 27 15 21 25 16
30 17 22 29 29 18 25 20 16 11
17 12 15 24 25 21 22 17 18 15
21 20 23 18 17 15 16 26 23 22
11 16 18 20 23 19 17 15 20 10
Step 1: Determine the classes.
 Select the number of classes desired.
 Using the above formula, K = 1 + 3.322 × log (80) = 7.32 ≈ 7 classes
 Here we can use 6 classes, as this fits the data set better
 Find the highest and lowest values: Maximum value = 38 and Minimum value = 10
 Find the range: Range = 38 – 10 = 28
Con…

 Find the width by dividing the range by the number of classes and rounding up: W = 28/6 = 4.67 ≈ 5
 Using a width of 5, we can construct a grouped frequency distribution for the above data as follows:
• Select a starting point (usually the lowest value) and add the width to get the lower limits.
(10, 15, 20, 25, 30, 35)
• Find the upper class limits.
(14, 19, 24, 29, 34, 39)
• Step 2: Tally the data.
• Step 3: Find the numerical frequencies from the tallies, and find the cumulative frequencies.
Time spent (in hours)   Tally                          Frequency   Cumulative frequency
10-14                   //// ///                       8           8
15-19                   //// //// //// //// //// ///   28          36
20-24                   //// //// //// //// //// //    27          63
25-29                   //// //// //                   12          75
30-34                   ////                           4           79
35-39                   /                              1           80
Cumulative frequency:
 When the frequencies of two or more classes are added up, such total frequencies are called cumulative frequencies.
 These frequencies help us to find the total number of items whose values are less than or greater than some value.
Relative frequencies:
 Relative frequencies, on the other hand, express the frequency of each value or class as a percentage of the total frequency.
Cumulative relative frequency:
• The proportion of the total number of observations
that have a value less than or equal to the upper limit
of the interval

• Note: there are two ways of expressing a cumulative frequency distribution
• Less than cumulative frequency distribution: if we cumulate from the lowest value of the variable to the highest
• More than cumulative frequency distribution: if we cumulate from the highest value to the lowest
• The most common cumulative frequency is the less than cumulative frequency.
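The two directions of cumulation can be illustrated with the leisure-time frequencies from the table above. This is a minimal sketch; the variable names are hypothetical.

```python
from itertools import accumulate

classes = ["10-14", "15-19", "20-24", "25-29", "30-34", "35-39"]
frequencies = [8, 28, 27, 12, 4, 1]

# "Less than" type: cumulate from the lowest class to the highest.
less_than = list(accumulate(frequencies))

# "More than" type: cumulate from the highest class down to the lowest.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print(f"{'Class':<8}{'f':>4}{'Less than':>11}{'More than':>11}")
for cls, f, lt, mt in zip(classes, frequencies, less_than, more_than):
    print(f"{cls:<8}{f:>4}{lt:>11}{mt:>11}")
```

For the class 20-24, for example, the less than value counts the observations up to and including that class, while the more than value counts the observations in that class and all higher classes.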
Mid-Point of a class interval
• Mid-point or class mark (Xc) of an interval is the value of the interval which lies mid-way between the lower true limit (LTL) and the upper true limit (UTL) of a class. It is calculated as:
• Xc = (Upper class limit + Lower class limit) / 2
Determination of Class Boundaries
• True limits (or class boundaries) are those limits which are determined mathematically to make the intervals of a continuous variable continuous in both directions, so that no gap exists between classes.
• Used for smoothing of the class intervals
• When the class limits are whole numbers, subtract 0.5 from each lower limit and add 0.5 to each upper limit
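A short Python sketch of both calculations, using the class limits of the leisure-time example. The variable names are hypothetical, and the adjustment is taken as half the gap between the upper limit of one class and the lower limit of the next (0.5 for whole-number limits).

```python
class_limits = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

# Half the gap between successive classes: (15 - 14) / 2 = 0.5 here.
half_gap = (class_limits[1][0] - class_limits[0][1]) / 2

# Class boundaries (true limits): lower limit - 0.5, upper limit + 0.5.
boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in class_limits]

# Mid-point (class mark): (upper class limit + lower class limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

for (lo, hi), (lb, ub), xc in zip(class_limits, boundaries, midpoints):
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, mid-point {xc}")
```

The mid-point is the same whether it is computed from the class limits or from the class boundaries, because the same amount is subtracted from one end and added to the other.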
Statistical Tables
• A statistical table is an orderly and systematic presentation
of numerical data in rows and columns.
• Rows (stubs) are horizontal and columns (captions) are
vertical arrangements.
• The use of tables for organizing data involves grouping the data into mutually exclusive categories of the variables and counting the number of occurrences (frequency) in each category.
• These mutually exclusive categories, for qualitative variables, are naturally occurring groupings.
– For example, Sex (Male, Female), Marital status (Single, Married, Divorced, Widowed, etc.), Blood group (A, B, AB, O)
• In the case of quantitative variables with many distinct values, such as weight or height measurements, the groups are formed by amalgamating (combining) continuous values into class intervals.
 Based on the purpose for which the table is designed and the complexity of the relationship, a table could be either a
– Simple frequency table or
– Cross tabulation.
 The simple frequency table is used when the individual observations involve only a single variable, whereas
• The cross tabulation is used to obtain the frequency distribution of one variable by the subset of another variable.
• For simple frequency distributions, the denominator for the percentages is the sum of all observed frequencies (for example, 210 in Table 1)
• On the other hand, in cross tabulated frequency distributions where there are row and column totals, the choice of denominator depends on the variable of interest to be compared over the subsets of the other variable.
Construction of tables
• Although there are no hard and fast rules to follow, the following
general principles should be addressed in constructing tables.
1. Tables should be as simple as possible.
2. Tables should be self-explanatory.
3. If data are not original, their source should be given in a
footnote.

Con…
Self-explanatory tables:
• The title should be clear and to the point (a good title answers: what? when? where? how classified?) and it should be placed above the table.
• Each row and column should be labeled
• Numerical entries of zero should be explicitly written rather than indicated by a dash. Dashes are reserved for missing or unobserved data.
Con…

• Totals should be shown either in the top row and the first column or in the last row and last column.
• State clearly the unit of measurement used
• Explain codes and abbreviations in the foot-note
Con..

A) Simple or one-way table:
 The simple frequency table is used when the individual observations involve only a single variable
Table 1: Overall immunization status of children in Adami Tullu Woreda, Feb. 1995

Immunization status      Number   Percent
Not immunized            75       35.7
Partially immunized      57       27.1
Fully immunized          78       37.2
Total                    210      100.0

Source: Fikru T et al. EPI Coverage in Adami Tulu. Eth J Health Dev 1997;11(2): 109-113
B) Two-way table:
 This table shows two characteristics and is formed
when either the caption or the stub is divided into two
or more parts.
• In cross tabulated frequency distributions where there
are row and column totals, the decision for the
denominator is based on the variable of interest to be
compared over the subset of the other variable.
Table 2: TT immunization by marital status of the women of childbearing age, town-X, South Wollo Zone, 2008

                  Immunization status
Marital status    Immunized         Non-immunized     Total
                  No.     %         No.     %
Single            58      24.7      177     75.3      235
Married           156     34.7      294     65.3      450
Divorced          10      35.7      18      64.3      28
Widowed           7       50.0      7       50.0      14
Total             231     31.8      496     68.2      727
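In Table 2 the percentages are row percentages: each count is divided by the total for its marital status group. A minimal Python sketch of that choice of denominator is given below; the dictionary layout and variable names are hypothetical, and the counts are those of Table 2.

```python
# Counts from Table 2: (immunized, non-immunized) for each marital status.
counts = {
    "Single": (58, 177),
    "Married": (156, 294),
    "Divorced": (10, 18),
    "Widowed": (7, 7),
}

# Column totals give the "Total" row of the table.
total_immunized = sum(imm for imm, _ in counts.values())
total_non_immunized = sum(non for _, non in counts.values())
rows = dict(counts, Total=(total_immunized, total_non_immunized))

# Row percentages: the denominator is the row total, not the grand total.
row_percent = {}
for status, (imm, non) in rows.items():
    row_total = imm + non
    row_percent[status] = (round(100 * imm / row_total, 1),
                           round(100 * non / row_total, 1))
    print(f"{status:<10}{imm:>5}{row_percent[status][0]:>7}"
          f"{non:>5}{row_percent[status][1]:>7}{row_total:>6}")
```

Using the row total as the denominator lets the immunization status be compared across the marital status groups even though the groups differ greatly in size.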
C. Higher Order Table:
 Used when it is desired to represent three or more characteristics in a single table.
• Example: A study was carried out on the degree of job satisfaction among doctors and nurses in rural and urban areas.
– To describe the sample, a cross-tabulation was constructed which included the sex and the residence (rural vs. urban) of the doctors and nurses interviewed.
Table 3: Distribution of health professionals by sex and residence (number, with column percentage in parentheses)

                        Residence
Profession   Sex        Urban       Rural        Total
Doctors      Male       8 (10)      35 (21)      43 (17.7)
             Female     2 (3)       16 (10)      18 (7.4)
Nurses       Male       46 (58)     36 (22)      82 (33.7)
             Female     23 (29)     77 (47)      100 (41.2)
Total                   79 (100)    164 (100)    243 (100)
Diagrammatic Representation of Data

• A diagram is a visual form for presentation of statistical data
• It consists of presenting statistical material in geometric figures, pictures, maps and lines or curves, highlighting their basic facts and relationships.
Importance of Diagrammatic Representation

1. They have greater attraction than mere figures. They give delight to the eye and add a spark of interest.
2. They help in deriving the required information in less time and without any mental strain.
3. They facilitate comparison.
Con…

4. They may reveal unsuspected patterns in a complex set of data and may suggest directions in which changes are occurring.
 This can warn us to take immediate action.
5. They have greater memorizing value than mere figures. This is so because the impression left by the diagram is of a lasting nature.
Limitations of Diagrammatic Representation

• Diagrammatic representation is not an alternative to tabulation.
– It only strengthens the textual exposition of a subject, and cannot serve as a complete substitute for statistical data.
• It can give only an approximate idea, and as such diagrams will not be suitable where greater accuracy is needed.
• Diagrams fail to bring to light small differences
Construction of graphs
• The choice of the particular form among the different possibilities will depend on personal choices and/or the type of the data.
• Bar charts and pie charts are commonly used for qualitative or quantitative discrete data.
• Histograms, frequency polygons and line graphs are used for quantitative continuous data.
Con…
 There are, however, general rules that are commonly accepted about construction of graphs.
1. Every graph should be self-explanatory and as simple as possible.
2. Titles are usually placed below the graph and should again answer the questions: what? where? when? how classified?
3. Legends or keys should be used to differentiate variables if more than one is shown.
Con…

4. The axis labels should be placed to read from the left side and from the bottom.
5. The units into which the scale is divided should be clearly indicated.
6. The numerical scale representing frequency must start at zero, or a break in the line should be shown.
Bar Chart

• Bar diagrams are used to represent and compare the frequency distribution of discrete variables or categorical series.
• When we represent data using a bar diagram, all the bars must have equal width and the distance between bars must be equal.
Con…
• Constructed by plotting the frequency (or relative frequency) of each category and drawing a bar for it
• Categories are listed on the horizontal axis (X-axis)
• Frequencies or relative frequencies are represented on the Y-axis (ordinate)
• The height of each bar is proportional to the frequency or relative frequency of observations in that category
Con…

Method of constructing bar chart
• All the bars must have equal width
• The different bars should be separated by equal distances
• All the bars should rest on the same line called the base
• Label both axes clearly
Con…

A. Simple bar chart:
 It is a one-dimensional diagram in which the bar represents the whole of the magnitude.
 The height or length of each bar indicates the size (frequency) of the figure represented.
 The bars are not joined together (leave space between bars)
[Figure: simple bar chart with a percentage scale from 0% to 30% and bars for Blood film, Stool Ix and CBC; the bar labels read 25%, 18% and 12%]
• Figure: Laboratory investigations done in “X” health center, Dessie Zuria Woreda
B. Multiple bar chart:

 In this type of chart the component figures are shown as separate bars adjoining each other.
 The height of each bar represents the actual value of the component figure.
 It depicts the distributional pattern of more than one variable
C. Component (or sub-divided) Bar Diagram:
• Bars are sub-divided into two or more component parts of the figure
• These sorts of diagrams are constructed when each total is built up from two or more component figures.
• Each part of the bar represents a certain item and is proportional to the magnitude of that particular item.
Con….

They can be of two kinds:
I. Actual Component Bar Diagrams:
• When the overall height of the bars and the individual component lengths represent actual figures.
II. Percentage Component Bar Diagram:
• Where the individual component lengths represent the percentage each component forms of the overall total. Note that a series of such bars will all be the same total height, i.e., 100 percent.
Pie chart
• Commonly used for qualitative data
• A pie chart shows the relative frequency for each category by dividing a circle into sectors, the angles of which are proportional to the relative frequency.
Steps to construct a pie-chart
 Construct a frequency table
 Change the frequency into percentage (P)
 Change the percentages into degrees, where: degree = (Percentage/100) × 360°
 Draw a circle and divide it accordingly
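The conversion from frequencies to sector angles can be sketched as follows. The counts are the immunization status frequencies of Table 1, reused here purely as an illustration, and the variable names are hypothetical.

```python
# Frequencies from Table 1 (immunization status of 210 children).
counts = {
    "Not immunized": 75,
    "Partially immunized": 57,
    "Fully immunized": 78,
}

n = sum(counts.values())

# Step 2: change each frequency into a percentage of the total.
percent = {cat: 100 * f / n for cat, f in counts.items()}

# Step 3: change each percentage into degrees: (percentage / 100) * 360.
degrees = {cat: p / 100 * 360 for cat, p in percent.items()}

for cat in counts:
    print(f"{cat:<20}{counts[cat]:>5}{percent[cat]:>7.1f}%{degrees[cat]:>8.1f}°")
print("Sum of angles:", round(sum(degrees.values()), 1))
```

The angles should always add up to 360°, which is a quick check on the arithmetic before the circle is divided.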
Fig 3(a) Pie chart indicating frequency of categories of birth weight
[Pie chart with sectors for Very low, Low, Normal and Big birth weight; the sector labels read 43, 793, 268 and 8870]
Histograms
• Histograms are frequency distributions with continuous class intervals that have been turned into graphs
• To construct a histogram, we draw the interval boundaries on a horizontal line and the frequencies on a vertical line
• Bars are then drawn over the intervals in such a way that the areas of the bars are all proportional in the same way to their interval frequencies.
• The bars are drawn to touch each other, to show the underlying continuity of the data
Con….
• Given a set of numerical data, we can obtain an impression of the shape of its distribution by constructing a histogram
Con…

• A histogram is constructed by choosing a set of non-overlapping intervals (class intervals) and counting the number of observations that fall in each class.
• The number of observations in each class is called the frequency.
• Hence a histogram is a graphical form of a frequency distribution
Con…

• Example: Consider the data on time (in hours) that 80 college students devoted to leisure activities during a typical school week:
Time spent (hrs)   Class boundary   Frequency   Cumulative freq
10 – 14            9.5-14.5         8           8
15 – 19            14.5-19.5        28          36
20 – 24            19.5-24.5        27          63
25 – 29            24.5-29.5        12          75
30 – 34            29.5-34.5        4           79
35 – 39            34.5-39.5        1           80
Frequency polygon:
• A frequency distribution can be portrayed graphically in yet another way by means of a frequency polygon.
• The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.
– The frequencies are represented by the heights of the points.
• To draw a frequency polygon we connect the mid-points of the tops of the cells of the histogram by straight lines.
Con…
• It can also be drawn without erecting rectangles as follows:
• The scale should be marked in the numerical values of the mid-points of intervals.
• Erect ordinates on the mid-point of each interval, the length or altitude of an ordinate representing the frequency of the class on whose mid-point it is erected.
• Join the tops of the ordinates and extend the connecting line to the scale of sizes.
Con…
• Instead of drawing bars for each class interval, sometimes a single point is drawn at the midpoint of each class interval and consecutive points are joined by straight lines.
• Graphs drawn in this way are called frequency polygons (line graphs).
• Frequency polygons are superior to histograms for comparing two or more sets of data.
e.g. Consider the above data on time spent on leisure activities
Solution
 Step 1: Find the midpoint of each class.
• Recall that midpoints are found by adding the upper and lower boundaries and dividing by 2.
 Step 2: Draw the x and y axes.
 Label the x axis with the midpoint of each class, and then use a suitable scale on the y axis for the frequencies.
 Step 3: Using the midpoints for the x values and the frequencies as the y values, plot the points.
Time spent (hrs)   Class boundary   Mid point   Frequency   Cumulative freq
10 – 14            9.5-14.5         12          8           8
15 – 19            14.5-19.5        17          28          36
20 – 24            19.5-24.5        22          27          63
25 – 29            24.5-29.5        27          12          75
30 – 34            29.5-34.5        32          4           79
35 – 39            34.5-39.5        37          1           80
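The points of the frequency polygon for this table can be listed with a short Python sketch. The variable names are hypothetical, and closing the polygon with a zero-frequency point one class width beyond each end is an assumption, one common way of extending the connecting line down to the horizontal scale.

```python
midpoints = [12, 17, 22, 27, 32, 37]
frequencies = [8, 28, 27, 12, 4, 1]
width = 5  # class width used in this example

# Add the mid-points of the empty classes just below and just above the data,
# each with frequency 0, so that the polygon returns to the horizontal axis.
x = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
y = [0] + frequencies + [0]

polygon_points = list(zip(x, y))
for xc, f in polygon_points:
    print(f"mid-point {xc:>2}: frequency {f}")
```

Joining these points in order with straight lines gives the frequency polygon.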

Con…
• Ogive or cumulative frequency curve: when the cumulative frequencies of a distribution are graphed, the resulting curve is called an ogive.
• The cumulative frequency is the sum of the frequencies accumulated up to the upper boundary of a class in the distribution.
• The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
Con…
To construct an ogive:
i) Compute the cumulative frequency of the distribution.
ii) Prepare a graph with the cumulative frequency on the vertical axis and the true upper class limits (class boundaries) of the intervals scaled along the X-axis (horizontal axis).
 The true lower limit of the lowest class interval is included in the X-axis scale; it is also the true upper limit of the next lower interval, which has a cumulative frequency of 0.
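The points to be plotted for an ogive can be listed as follows, using the leisure-time data as an illustration. This is a sketch with hypothetical variable names; it produces only the coordinates, not the drawing itself.

```python
from itertools import accumulate

# Class boundaries of the leisure-time data: 9.5 is the true lower limit of
# the lowest class, followed by the true upper limit of every class.
boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5]
frequencies = [8, 28, 27, 12, 4, 1]

# The cumulative frequency is 0 at the lowest boundary, then accumulates.
cumulative = [0] + list(accumulate(frequencies))

ogive_points = list(zip(boundaries, cumulative))
for boundary, cum_freq in ogive_points:
    print(f"up to {boundary}: {cum_freq}")
```

Plotting these points with the boundaries on the horizontal axis and joining them gives a curve that rises from 0 to the total number of observations.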

The line diagram:
• The line graph is especially useful for the study of some variables according to the passage of time.
• The time, in weeks, months or years, is marked along the horizontal axis; and the value of the quantity that is being studied is marked on the vertical axis.
• The distance of each plotted point above the base-line indicates its numerical value.
• The line graph is suitable for depicting a consecutive trend of a series over a long period.
Thank you