Abiot Research Methods

RESEARCH METHODS
YOM INSTITUTE OF ECONOMIC DEVELOPMENT
Abiyot Animaw (PhD)
Email: abiotanimaw2014@gmail.com
Course outline
1. The Fundamentals of Research: The scientific
Method
2. The Research Process and Preparing the
Research proposal
3. Survey and Elements of Sampling
4. Data Collection Techniques
5. Data Processing and Analysis
6. Writing the Research Report
7. Quantitative Analysis: Basic univariate and
multivariate analysis
Chapter One
The Fundamentals of
Research
The Scientific Method and Economic Research
• In order to have a clear understanding of the term
research, it is important to know the meaning of the
scientific method.
• The practice of economic research is informed by
innovative thinking and careful attention to the details
of data.
• It is the research methodology adopted in the process
of economic research that makes economic research
scientific.
• Thus, research methods have become central parts of
the social science investigations.
• A science is a coherent body of thought about a topic
over which there is a broad consensus among its
practitioners.
• Science aims to discover universal laws about how the
world works.
• The scientific method (SM) is a conscious, objective,
logical, and systematic method of investigation.
• The SM can be defined as the pursuit of truth as
determined by logical considerations.
• It refers to the ideas, rules, techniques, and approaches
that are commonly used in research.
• The scientific method uses both inductive and
deductive logical arguments.
• The inductive logic
• begins by observing facts
• then proceeds from observations of facts to
universal laws
• The deductive view starts with
• (a) Universal law(s), or
• (b) Initial conditions
• We then show how an event we are trying to
explain follows
• Then we test the generality of this.
• The goal of scientific research is to test
current theories and develop new ones:
• Is the current theory consistent with the
world?
• Can we develop a new theory that is more
consistent with the world?
• The SM involves the following series of steps or
procedures:
– Identification of the problem to be investigated
– Collection of essential facts to prove or disprove
the theory
– Selection (hypothesizing) of tentative solutions to
the problem
– Evaluation of these alternative solutions to
determine which of them is in accordance with all
the facts, and
– The final selection of the most likely solution.
• Each research problem involves gathering data to
confirm or refute an existing theory
• We are not supposed to let our prior beliefs influence
our conclusions.
• Because of this, we formulate our hypothesis before
we gather the data.
• We do not revise the hypothesis after seeing the data
just to make sure that it will be confirmed.
• Any truly scientific theory must be refutable;
otherwise, it is just dogma.
• A sound theory is one that has withstood many
attempts to refute it.
Guiding Principles of the SM
Three useful guiding principles need to be considered:
a) Use of empirical evidence and ethical neutrality
• The scientific method is based on empirical
evidence and utilizes relevant concepts
• The goal of SM is to facilitate independent
verification of scientific observation through the
use of empirical evidence.
• It presupposes ethical neutrality, i.e., it aims at
nothing but making only adequate and correct
statements about population objects.
b) Logical reasoning (Critical thinking)
• The SM practices logical reasoning, which allows
the truth to be determined through steps that are
distinct from emotional and wishful thinking.
• Its methodology is made known to all concerned
for critical scrutiny and for use in testing the
conclusions through replications
• Critical thinkers always use logical reasoning.
• Logic is not an ability that humans are born with
but rather it is a skill that must be learned within a
formal educational environment.
c) Possessing a Skeptical Attitude
• The final key idea is skepticism: the constant
questioning of your beliefs and conclusions.
• A scientific attitude implies a skeptical attitude.
• A skeptic holds beliefs tentatively and is open to new
evidence and rational argument.
The Meaning of Research
• Research begins with a question that the researcher is
trying to answer
• Research inculcates scientific reasoning and promotes
the development of logical habits of thinking.
• Hence, the term research can be broadly defined as the
scientific and systematic search or inquiry for pertinent
information or knowledge.
• It is a movement from the known to the unknown
and is a deliberate response to a need for
information in order to solve a given problem.
• It is an original contribution to the existing stock of
knowledge.
• The research activity comprises the following
activities:
– the defining and redefining of the problem,
– the formulation of hypotheses or suggested
solutions,
– the collection, organization and evaluation of data or
facts,
– the making of deductions and reaching conclusions.
• Research requires a specific plan or procedure
– We need to know if the question is answerable and
how.
– We need to know whether the research project is
feasible in terms of time and money
• Research usually divides a problem into more manageable
sub-problems
– “What is the current state of the Ethiopian economy?” is
vague by itself
– We could divide the economy into sectors
• by occupation: Agriculture, Household, etc.
• by industry
• Research accepts certain critical assumptions
– If you do not make assumptions, you cannot make logical
conclusions
• Research is, by its nature, cyclical
– Every research project brings answers but also new questions
– Those questions, in turn, bring new research projects
The Purpose of Research
• The purpose of research is to discover new ideas or
solutions through the application of scientific
procedures.
• Research has a clearly articulated goal
• Examples:
»Test a current theory
»Add details to a theory
»Replace a theory with a better one
»Write a new theory where none existed,
etc.
• In general, the purpose of research may be any of the
following:
Exploration
Description
Explanation
• The main factors to be considered before embarking on
research include:
– Type and nature of information sought
– Timing
– Availability of resources
– Cost/benefit analysis
– Ethical considerations
Classification of Research Activities
• Different people may use different classification
systems.
– The classification may be in terms of:
• methods employed,
• the time dimension,
• research environment or
• data used.
– Accordingly, several types of research classifications
could be identified some of which may include:
Descriptive versus Analytical Research :
• The purpose of descriptive research is description of
the state of affairs as it exists at present.
• The main characteristic of this method is that the
researcher has no control over the variables.
• He/she can only report what has happened or what is
happening.
• Examples: the frequency of shopping,
people's preferences, the number of
employed workers in a factory, etc.
• In analytical research the researcher has to use facts
or information and analyze these to make a critical
evaluation of the material.
– Analytical studies go beyond simple description in
their attempt to model empirically the social
phenomena under investigation.
• It asks “why” and tries to find the answer to a
problem.
Applied versus fundamental research:
• Research may be undertaken either to understand the
fundamental nature of a social reality (basic research)
or to apply knowledge to address specific practical
issues (applied research).
• Applied research aims at finding a solution for an
immediate pressing problem facing a society or an
industrial or business organization.
• Applied research tries to solve specific policy problems
or help practitioners accomplish a specific task.
– Theory is less central than seeking a solution to
a specific problem in a limited setting.
• Fundamental research is mainly concerned with
generalizations and with the formulation of a theory.
• It is primarily concerned with the understanding
of the fundamental nature of social reality.
• It is the source of most scientific ideas and ways
of thinking about the world.
• It is mostly exploratory in nature.
• So, gathering knowledge for knowledge’s sake is
termed as fundamental or basic research.
• It is mostly deductive: it seeks new conclusions from
current assumptions.
Quantitative versus Qualitative Research:
• Quantitative research is based on the measurement
of quantity or amount.
• It is applicable to phenomena that can be
expressed in terms of quantity.
• Most often we are testing a hypothesis
• We collect data and see whether the hypothesis is
consistent with the data
• Methodology is simpler than qualitative research
• But it often takes longer: identifying, collecting, and
analyzing appropriate data is difficult and expensive.
• This approach can be further subdivided into:
– inferential,
– experimental and
– simulation approaches.
• The purpose of the inferential approach is to form a
database from which to infer characteristics or
relationships of populations.
• This usually means survey research, where a sample of
the population is studied to determine its characteristics
and it is then inferred that the population has the
same characteristics.
• Experimental approach is characterized by much
greater control over the research environment.
• Some or all the variables are manipulated to
observe their effect on other variables.
• The simulation approach involves the construction of an
artificial environment within which relevant information
and data can be generated.
• This permits an observation of the dynamic
behavior of a system under controlled conditions.
• Given values of initial conditions, parameters and
exogenous variables, a simulation is run to
represent the behavior of the process over time.
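The simulation approach described above can be sketched in a few lines. This is only an illustrative toy, not a model from the course: the function name `simulate`, the growth rule, and all numbers are assumptions chosen to show how initial conditions, a parameter, and an exogenous variable together produce a path over time.

```python
def simulate(initial_output, growth_rate, exogenous_investment):
    """Toy simulation: output grows at a fixed rate plus an exogenous injection each period.

    initial_output        -> initial condition
    growth_rate           -> model parameter
    exogenous_investment  -> exogenous variable, one value per period
    """
    path = [initial_output]
    for injection in exogenous_investment:
        # Next period's output depends on the previous state, the parameter,
        # and the exogenous input for that period.
        next_output = path[-1] * (1 + growth_rate) + injection
        path.append(next_output)
    return path


# Hypothetical inputs, chosen only for illustration
path = simulate(initial_output=100.0, growth_rate=0.05,
                exogenous_investment=[0.0, 10.0, 0.0])

for period, output in enumerate(path):
    print(period, round(output, 4))
```

Changing the growth rate or the investment series and re-running the function is what "observing the dynamic behavior of a system under controlled conditions" amounts to in practice.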
• Qualitative Research
– Qualitative research is concerned with subjective
assessment of attitudes, opinions, and behavior.
– Qualitative Research is a function of researchers’
insights and impressions.
– It generates results, which are not subjected to
rigorous quantitative analysis.
– Generally, group interviews, projective techniques,
and in-depth interviews are used.
• Qualitative research is particularly important in the
behavioral sciences.
• However, social research is essentially pluralistic:
– Researchers often combine quantitative and
qualitative research methods within the same study.
– Mixed-method research strategies are particularly
effective in policy-oriented research, and the
contribution that qualitative research can make to
policy evaluation is increasingly being recognized.
Conceptual versus Empirical Research:
– This classification is similar to the applied versus
fundamental classification.
– Conceptual research is related to some abstract idea(s) or
theory.
• Generally used by philosophers and other similar
thinkers to develop new concepts or reinterpret
the existing ones.
• Mostly deductive: Seeks new conclusions from
current assumptions
• Empirical research relies on experience or
observation alone, often without due regard for system and
theory.
• It is data-based research, coming up with
conclusions that are capable of being verified by
observations or experiments.
• Under empirical research the researcher first
formulates a working hypothesis.
• He/she then works to get enough data to prove or
disprove the hypothesis.
Some other types of research:
– Research:
• can be one-time or longitudinal research,
• can be field setting or laboratory based or
simulation research,
• can also be clinical or diagnostic research,
• can be conclusion oriented or decision oriented,
etc.
• Time Dimension in Research
• Quantitative research may be divided into two
groups in terms of the time dimension:
• A single point in time (cross sectional)
• Multiple points research (longitudinal research)
• Cross-sectional research takes a snapshot
approach to the social world.
• This is the simplest and least costly research
approach.
• Limitation – it cannot capture social processes or
changes.
• Longitudinal research examines features of people or
other units at more than one point in time.
– It is usually more complex and costly than
cross-sectional research but is also more powerful,
especially with respect to social changes.
• Types of Longitudinal Research
– Time series research – this is a longitudinal study of a
group of people or other units across multiple
periods (e.g. time series data on exports of coffee).
– The panel study – the researcher observes exactly
the same people, group, or organization across time
periods, each time using the snapshot approach.
• In a panel study the focus is on individuals or
households.
– Example: interviewing the same people in 1991,
1993, 1995, etc., and observing the changes
yields a panel data set.
– Cohort analysis – this is similar to the panel study, but
rather than observing the exact same people, a
category of people who share a similar life experience in
a specified period is studied.
– Hence the focus is on groups of individuals, not on
specific individuals or households.
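The four data shapes discussed above (cross-section, time series, panel, cohort) can be contrasted on one small data set. The sketch below is an assumption-laden illustration: the records, field order, and function names are hypothetical, and in this toy data the cohort members happen to be the same people each year, whereas a real cohort study only requires that they share the defining experience.

```python
from collections import defaultdict

# Hypothetical survey records: (person_id, birth_year, survey_year, income)
records = [
    ("p1", 1970, 1991, 100), ("p1", 1970, 1993, 120), ("p1", 1970, 1995, 150),
    ("p2", 1970, 1991, 80),  ("p2", 1970, 1993, 90),  ("p2", 1970, 1995, 95),
    ("p3", 1980, 1991, 60),  ("p3", 1980, 1993, 70),  ("p3", 1980, 1995, 85),
]


def cross_section(rows, year):
    """Snapshot: every unit observed at a single point in time."""
    return {pid: income for pid, _, yr, income in rows if yr == year}


def time_series(rows):
    """An aggregate (mean income) followed across periods."""
    by_year = defaultdict(list)
    for _, _, yr, income in rows:
        by_year[yr].append(income)
    return {yr: sum(vals) / len(vals) for yr, vals in sorted(by_year.items())}


def panel(rows):
    """The same individuals followed across periods."""
    by_person = defaultdict(dict)
    for pid, _, yr, income in rows:
        by_person[pid][yr] = income
    return dict(by_person)


def cohort(rows, birth_year):
    """A group sharing a life experience (here: birth year), tracked as a group."""
    members = [row for row in rows if row[1] == birth_year]
    return time_series(members)


print(cross_section(records, 1991))
print(time_series(records))
print(panel(records)["p1"])
print(cohort(records, 1970))
```

Note that the panel keeps each person's own trajectory, while the cohort view only reports what happens to the group as a whole.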
Ethical Consideration in the Research Process
• Shared Values
• There is no one best way to undertake research.
– There is no universal method that applies to all
scientific investigations.
• Accepted practices for the conduct of research
can and do vary from discipline to discipline.
• There are, however, some important shared
values for the responsible conduct of research
that bind all researchers together.
Some of the most important shared values
include:
– HONESTY: conveying information truthfully and
honoring commitments,
– ACCURACY: reporting findings precisely and
taking care to avoid errors,
– EFFICIENCY: using resources wisely and
avoiding waste, and
– OBJECTIVITY: letting the facts speak for
themselves and avoiding improper bias.
During data collection
• Some ethical principles governing data collection
include: informed consent, respect for privacy and
safeguarding the confidentiality of data.
– Informed consent implies that persons who are
invited to participate in social research activities
should be free to choose to take part or refuse.
– They are free to decide after having been given the
fullest information concerning the nature and
purpose of the research, including any risks to
which they personally would be exposed, the
arrangements for maintaining the confidentiality of
the data, and so on.
– Thus, collection of data illegally, under false
pretenses, from minors, etc., is unethical.
– Getting access and consent to do research is,
therefore, essential.
• During analysis (Misuse of data)
• Fabrication and falsification of research results are
serious forms of misconduct.
– It is a primary responsibility of a researcher to avoid
either a false statement or an omission that distorts
the truth.
• In order to preserve accurate documentation of
observed facts with which later reports or conclusions
can be compared, every researcher has an obligation to
maintain a clear and complete record of data acquired.
• Records should include sufficient detail to permit
examination for the purpose of
• replicating the research,
• responding to questions that may result from
unintentional error or misinterpretation,
• establishing authenticity of the records, and
• confirming the validity of the conclusions.
• It is considered a breach of research integrity to fail to
report data that contradict or merely fail to support the
conclusions, including the purposeful withholding of
information about confounding factors.
– Negative (unexpected) results must be reported.
When writing the research paper: Plagiarism
• Plagiarism is the unauthorized use of someone else's
thoughts or wording either by
• incorrect documentation, failing to cite your
sources altogether, or
• simply by relying too heavily on external resources.
• Whether intentional or inadvertent, some or all of
another author's ideas become represented as your
own.
– Plagiarizing undermines your academic integrity.
• It betrays your own responsibilities as a student
writer, your audience, and the very research
community you were entering by deciding to write a
research paper in the first place.
• Incidentally, plagiarism also covers informal material,
such as a paper "bought" from another student.
• If you feel cheating is an easy way out, and the moral
and intellectual consequences don't sound alarm bells,
stop and think of the serious punitive repercussions you
could incur.
• Because it is intellectual theft, plagiarism is considered
as an academic crime with punishment anywhere from
an F on that particular paper to dismissal from the
course to expulsion from a college or university.
Chapter Two
The Research Process and
Preparing the Research proposal
Introduction
Most research activities proceed through the following steps:
– Selecting a topic
– Formulating the research problem and research
questions
– Extensive literature survey
– Formulating the working hypothesis
– Preparing the research design and determining the
sample design
– Collecting and analyzing the data
– Generalizations and interpretations of results
– Preparing the report and presentation of the results
(formal write up of conclusions reached)
1. Identification of a Research Topic
• To do research, a topic or a research problem must
be identified.
What is a Research problem?
• A research problem refers to some difficulty, which a
researcher experiences in the context of either a
theoretical or practical situation and wants to obtain a
solution for it.
• A research topic should seek to advance the state of
science
• It usually starts with a felt practical or theoretical
difficulty.
• It should ask a question to which the answer is not
known
– It should ask an interesting question
– It should be as objective as possible
Some Potential Sources of a Research Topic
• A topic must spring from the researcher’s mind like a
plant springs from its own seed.
• The best way to identify a topic is to draw up a shortlist
of possible topics that have emerged from your reading
or from your own experience that look interesting.
• A general area of interest or aspect of a subject
matter (agriculture, industry, social sector, etc.) may
have to be identified at first.
Some important sources that may help in
selecting a research problem:
a) Professional Experience
• One's own professional experience is the most important
source of a research problem.
• The following are all helpful in identifying research problems:
– contacts and discussions with research-oriented
people,
– attending conferences and seminars, and
– listening to learned speakers.
b) Inferences from theory and Professional literature
• Research problems can also emanate from inferences
that can be drawn from theories or from empirical
literature.
• Two types of literature can be reviewed.
The conceptual literature
The empirical literature
• Research reports, bibliographies of books, and articles,
periodicals, research abstracts and research guides
suggest areas that need research.
c) Technological and Social Changes
– New developments bring forth new development
challenges for research.
– New innovations and changes need to be carefully
evaluated through the research process.
• In general, the most fundamental rule in choosing a good
research topic is to investigate questions that sincerely
interest you,
• i.e., research that the researcher honestly enjoys
even when he/she encounters frustrating or
discouraging problems.
The following points are important in selecting a
research problem:
• Subjects that are overdone should be avoided,
since it will be difficult for the average researcher
to throw any new light on them.
• Controversial subjects should not become the
choice of the average researcher.
• Too narrow, too broad, or vague problems
should be avoided.
• Other criteria that must also be considered are:
• the importance of the subject,
• the qualifications and training of the researcher,
• the cost involved and the time factor, etc.
• In general, the choice of a research topic is not made
in a vacuum and is influenced by several factors:
• Interest and Values of the Researcher,
• Current Debate in the Academic world,
• Funding,
• The value and power of the subject, etc.
Common/overused topics
• For example, if the impact of microfinance on
poverty reduction in rural Ethiopia has been
well researched, you may instead consider the
impact of microfinance on poverty reduction
in urban Ethiopia.
General/too broad topics
• General/too broad topics should be avoided.
• For example, “Why is productivity in Ethiopia
lower than in Kenya?” is too broad.
• However, “Why is labor productivity in
agriculture lower in Ethiopia than in Kenya?”
may be appropriate.
Topics related to religion,
politics/controversy
• Controversies have the propensity to arouse
emotions among people, because the
surrounding issues are highly subjective and
sensitive.
• Select such a topic only if it is required by the
programme of study.
Too narrow topics
• Picking a topic that is too narrow should be
avoided, because it will be nearly impossible to find
enough information to conduct the research.
• For example, the research topic “Why did Fasil
break up with Sara?” is too narrow and
focused on a single event.
• If this topic is changed to “Determinants of
breakups in relationships among undergraduate
students”, it becomes more researchable.
Research gaps and topic selection
• Research gap explained
Exercise: identify research gaps in the
text
2. Definition and Statement of the Problem
– After a topic has been selected the next task is to
define it clearly.
• To define a problem means to put a fence around
it.
• It involves the task of laying down the boundaries
within which a researcher shall study the
problem.
• The researcher must be certain that he/she knows
exactly what the problem is before he/she
begins work on it.
• A problem clearly defined is a problem half
solved.
• Defining the problem unambiguously will help to find
answers to questions like:
– What data are to be collected?
– What characteristics of data are relevant and need
to be studied
– What relations are to be explored
– What techniques are to be used for the purpose
• Hence, in the formal definition of the problem the
researcher is required
• to describe the background of the study, its
theoretical basis and underlying assumptions in
concrete, specific and workable questions.
Useful steps in defining the research problem:
a) Statement of the problem in a general way
– Problem should be stated in a broad and general
way keeping in mind either some practical concern
or some scientific or intellectual interest.
b) Understanding the nature of the problem more
clearly
– The next step is to understand its origin and nature
clearly.
– The best way to understand the problem is to
discuss it with other more acquainted or
experienced people.
c) Survey of the available literature
• The researcher must devote sufficient time to reviewing
both the conceptual and empirical literature.
– Research already undertaken on related topics or
problems needs to be systematically reviewed.
• This exercise enables the researcher to
1. find out what data are available
2. find out if there are gaps in theories, and
3. find out whether the existing theory is
applicable to the problem under study.
4. find out what other researchers have to say
about the topic,
5. ensure that no one else has already exhausted
the questions that you aim to examine, etc.
d) Developing the idea through discussion
– Discussion concerning a problem often produces
useful information.
– The discussion sharpens the researcher’s focus of
attention on specific aspects of the study.
e) Rephrasing the research problem:
– The researcher must then rephrase the research
problem into a working proposition.
– Through rephrasing, the researcher puts the research
problem in as specific terms as possible so that it may
become operationally viable and may help in the
development of a working hypothesis.
f) In addition
– Technical terms or phrases, with special meanings
used in the statement of the problem should be
clearly defined.
– Basic assumptions or postulates relating to the
research problem should be clearly stated.
– The suitability of the time period and the sources of
data available must be considered in defining the
problem.
– The scope of the investigation within which the
problem is to be studied must be mentioned
explicitly in defining a research problem.
3. Extensive Literature Survey
– Once the problem is formulated, the researcher
should undertake an extensive literature survey
connected with the problem.
• Academic journals, conference proceedings,
dissertations, government reports, policy
reports, publications of international
organizations, books, etc. must be tapped
depending on the nature of the problem.
– Usually one source leads to the next and the best
place for the survey is the library.
The main goals are:
– To familiarize oneself with the issue and establish
credibility
– To show the path of prior research and how current
project is linked to it
– To integrate and summarize what is known in the
area
– To learn from others and stimulate new ideas.
• From the survey of the literature, you will know
whether your question has already been answered
elsewhere.
• You will also know what other people have said about
similar topics.
– You can learn how other people faced
methodological and theoretical issues similar to
your own
– You can learn about sources of data that you might
not have known before
• You can identify other researchers tackling similar
problems.
• Potential literature sources
• General information: Google, etc.
• Books: Library, amazon.com
• Articles:
–JSTOR: www.jstor.org
–EconLit
• Web Pages
Structuring the review:
– Summarize every article briefly; a sentence or two
will do
– Interpret the article in light of its relevance to your
own study
– Critique it, if necessary
– Show the stock of knowledge building up over the
course of the literature
– Show how your research topic adds naturally to this
stock of knowledge
4. Developing the Working Hypothesis
• A hypothesis is a statement that predicts the
relationship between two or more variables.
– Formulating an appropriate and realistic research
hypothesis is a sine qua non for sound research.
• The role of the hypothesis is to guide the researcher by
delimiting the area of research and keeping him/her on the
right track.
• It is a tentative answer to a research question that can
be confirmed or refuted by data
• Formulating hypothesis is particularly useful for causal
relationships.
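To make "confirmed or refuted by data" concrete, the sketch below tests a hypothetical hypothesis ("farms with credit have higher yields") using an exact permutation test. The notes do not prescribe any particular test; the permutation test, the function name `permutation_p_value`, and the yield figures are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """One-sided exact permutation test for a difference in means.

    Returns the share of all possible regroupings of the pooled data whose
    difference in means is at least as large as the observed one.
    """
    observed = mean(treated) - mean(control)
    pooled = treated + control
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical yields (not real data)
with_credit = [12, 14, 15, 16, 18, 19]
without_credit = [8, 9, 10, 11, 12, 13]

p_value = permutation_p_value(with_credit, without_credit)
print(round(p_value, 4))
```

A small p-value means the data would be unusual if credit made no difference, so the "no effect" view is hard to maintain; a large one means the data do not support the hypothesis. The hypothesis and the test are fixed before looking at the result.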
Main problems in formulating a working hypothesis
– Formulation of a hypothesis is not an easy task.
– The main problems that may arise include:
• The lack of clear theoretical framework
• The lack of ability to utilize that theoretical
framework logically
• The failure to be acquainted with available
research techniques so as to be able to phrase the
hypothesis properly.
Characteristics of usable hypotheses
• The hypothesis must be conceptually clear.
• This involves two things:
–the concepts should be clearly defined,
–the definitions should be commonly accepted
ones. In other words, the hypothesis should be
stated in simple terms.
• The hypothesis should have empirical referents.
– No usable hypothesis embodies moral judgments.
– While a hypothesis may study value judgments, such a
goal must be kept separate from moral preaching or a
plea for acceptance of one’s values.
• The hypothesis must be specific.
– all the operations and predictions indicated by it
should be spelled out.
• The hypothesis should be related to available
techniques.
– A theorist who does not know what techniques are
available to test his/her hypothesis is poorly placed
to formulate usable hypotheses or questions.
• The hypothesis should be related to a body of theory.
– It should possess theoretical relevance.
• The hypothesis should be testable.
– hypothesis should be formulated in such a way that it
is possible to verify it.
5. Scope and Limitations
A research project must be clear about its scope.
(a) Geographical limitations
– The study might focus only on some regions, even
though the question pertains to a whole country
(e.g., Ethiopia).
(b) Limitations by industry or occupation
– The study might only be able to capture some
industries or occupations (e.g., the formal or the informal sector).
(c) Limitations by subject matter
– The researcher also must know that many other
interesting questions may arise that are outside of the
scope of the study.
6. Preparing the Research Design
– The research design is a plan that specifies the
sources and types of information relevant to the
research question.
• It is the arrangement of conditions for the
collection and analysis of data in a manner that
aims to combine relevance to the research
purpose with economy in procedure.
• It is the conceptual structure, plan, and strategy of
investigation within which research is conducted.
• It constitutes the blueprint for the collection,
measurement and analysis of data.
– The design that gives the smallest experimental error
is the best design.
• The following elements are critical when making design
decisions
– What type of data is required (required data)
– Where can the required data be found (source of
data)
– What will be the sampling design
– What techniques of data collection will be used
– How will the data be analyzed (method of data
analysis)
7. Selecting the Sample
– The researcher must decide how to select the
sample.
– Samples can be either probability or non-probability
samples.
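The difference between the two kinds of sample can be illustrated with a short sketch. Everything here is hypothetical: the sampling frame is an artificial list of incomes sorted from poorest to richest, and the function names are made up for the example. Simple random sampling stands in for probability sampling and "take the first units at hand" stands in for a convenience (non-probability) sample.

```python
import random
import statistics


def simple_random_sample(population, n, seed=0):
    """Probability sample: every unit has the same chance of selection."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return rng.sample(population, n)


def convenience_sample(population, n):
    """Non-probability sample: take the first n units that are at hand."""
    return population[:n]


# Hypothetical sampling frame: household incomes listed from poorest to richest
frame = list(range(1, 1001))
true_mean = statistics.mean(frame)

srs = simple_random_sample(frame, 50, seed=42)
srs_mean = statistics.mean(srs)
# Standard error of the sample mean, usable only because selection was random
srs_std_error = statistics.stdev(srs) / len(srs) ** 0.5

conv_mean = statistics.mean(convenience_sample(frame, 50))

print("population mean:", true_mean)
print("random sample mean:", srs_mean, "+/-", round(srs_std_error, 1))
print("convenience sample mean:", conv_mean)
```

With a random sample the researcher can attach a standard error to the estimate; the convenience sample offers no such basis for inference, and here it only reaches the poorest households in the frame.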
8. Execution of the Project
– Execution involves how the survey is conducted, by
means of structured questionnaire or otherwise,
etc.
• Several ways of collecting the data exist. They may
differ in terms of
(i) money costs
(ii) time costs and
(iii) other resources
• Survey data can be collected by any one or more of the
following ways:
• By observations
• Through personal interviews
• Through telephone interviews
• By mailing questionnaires/through schedules
• The researcher should select one of these methods
taking into account:
– the nature of investigations,
– objectives and scope of the study,
– financial resources,
– available time and the desired level of accuracy, etc.
9. Analyzing the Data
– After the data have been collected the researcher
turns to the task of analyzing them.
– The analysis may involve a number of closely related
operations such as:
–Editing of the raw data
–Summarizing and tabulation of the data to
obtain answers to research questions
–Drawing statistical inferences.
• Various statistical software packages are available for data entry
and analysis.
– SPSS, Stata, CSPro, spreadsheet programs such as
Excel, Lotus, etc.
• Second-round editing is done once the data entry is
completed, by examining the frequency distributions,
averages, ranges, modes, etc. to detect outliers.
• Analysis is completed with the preparation of
descriptive tables, running econometric and
mathematical models or programming models.
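The second-round editing step mentioned above (checking frequencies, averages, ranges, and modes for outliers) can be sketched as follows. The age values are invented, with one deliberate keying error, and the 1.5 × IQR rule used to flag outliers is a common convention assumed here, not something the notes specify.

```python
from collections import Counter
import statistics

# Hypothetical entered ages; 250 stands in for a keying error
ages = [23, 25, 25, 27, 30, 31, 25, 29, 250, 28]

summary = {
    "frequencies": Counter(ages),      # frequency distribution
    "mean": statistics.mean(ages),     # average
    "mode": statistics.mode(ages),     # most frequent value
    "range": (min(ages), max(ages)),   # smallest and largest values
}


def flag_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


suspect = flag_outliers(ages)
print(summary["range"], summary["mode"], suspect)
```

A flagged value is only a prompt to go back to the original questionnaire; whether it is an error or a genuine extreme observation is a judgment for the researcher.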
10. Interpretation and Generalizations
– Explaining and discussing the research results in line
with the theoretical framework is part of the
interpretation exercise.
– The real value of research lies in its ability to arrive
at certain generalizations.
11. Preparation of the Report
– The research process is completed only when the
results are shared with the scientific community.
– The report should be written in a concise and objective
style, in simple language, avoiding vague expressions.
Preparing the Research Proposal
• The research proposal helps the researcher to organize
his/her ideas in a form that makes it possible to
look for flaws or inadequacies, and it is a
prerequisite in the research process.
– It serves as a basis for determining the feasibility of
the project and provides a systematic plan of
procedure for the researcher to follow.
– It assures that the parties understand the project’s
purpose and the proposed method of investigation.
– It provides an inventory of what must be done and
which materials have to be collected
The research proposal should usually contain the
following categories of information:
I. Introduction – this part should include the following
information
– a) The title – the title or the topic should be worded
in such a way that it suggests the theme of the
study.
– It should be long enough to be explicit but not too
long so that it is tedious – usually between 15 and
25 words.
– It should contain the key words – the important
words that indicate the subject.
There are three types of titles:
– Indicative title:
• they state the subject of the proposal rather than
expected outcomes.
• Example: The role of agricultural credit in
alleviating poverty in a low-potential area of
Ethiopia.
– Hanging titles have two parts: a general first part
followed by a more specific second part.
• Example: ‘Alleviation of poverty in low-potential
area of Ethiopia: the impact of agricultural credit’.
• Question-type titles are used less commonly than
indicative and hanging titles.
• However, they are acceptable where it is possible to
use few words – say fewer than 15.
– Example: ‘Does agricultural credit alleviate poverty
in low-potential areas of Ethiopia?’
2. Statement of the Problem
• This section makes up between one fourth and one
half of the proposal.
• It is an expansion of the title.
– It introduces the research by situating it (by giving
background), presenting the research problem and
saying how and why this problem will be “solved.”
• Without this important information the reader cannot
easily understand the more detailed information about
the research that comes later.
– It also explains why the research is being done
(rationale) which is crucial for the reader to
understand the significance of the study.
The problem statement should make a convincing
argument that there is not sufficient knowledge
available to explain the problem or there is a need to
test what is known and taken as fact.
– It should provide a brief overview of the literature
and research done in the field related to the
problem and of the gaps that the proposed research
is intended to fill.
• Some ways to demonstrate that you are adding to the
knowledge in your field:
• Gap: A research gap is an area where no or little
research has been carried out.

• Raising a question: The research problem is defined by
asking a question to which the answer is unknown, and
which you will explore in your research.
• Continuing a previously developed line of enquiry:
Building on work already done, but taking it further (by
using a new sample, extending the area studied, taking
more factors into consideration, taking fewer factors into
consideration, etc.).
• Counter-claiming: A conflicting claim, theory or method is
put forward.

3. Objectives of the study:
– In this section the specific activities to be performed are
listed.
– This is the step of rephrasing the problem into
operational or analytical terms, i.e. to put the
problem in as specific terms as possible.
– This section is rather brief usually not more than half
a page at most.
• This is because the rationale for each objective
will already have been established in the previous
section.

• The general objective provides a short statement of the
specific goals pursued by the research.
• The specific objectives are the objectives against which
the success of the whole research will be judged.
– The specific objectives are operational and indicate
the type of knowledge to be produced, audiences to
be reached, etc.
• An objective for a proposal should be Specific,
Measurable, Achievable, Realistic and Time-bound –
that is, SMART.

4. Review of the Literature:
• The theoretical and empirical framework from which
the problem arises must briefly be discussed.
• Both conceptual and empirical literature is to be
reviewed for this purpose.
– The researcher has to make it clear that his problem
has roots in the existing literature but it needs
further research and exploration.
• The analysis of previous research eliminates the risk of
duplication of what has been done.

5. The Hypothesis:
– Questions that the research is designed to answer
are usually framed as hypotheses to be tested on the
basis of evidence.
– It gives direction to the data gathering procedure.
6. Significance of the Study:
– This section justifies the need of the study.
– It describes the type of knowledge expected to be
obtained and the intended purpose of its
application.
– It should indicate clearly how the results of the
research could influence theory or practice.

The rationale for undertaking a research study can be:
1. to show the existence of a time lapse between the
earlier study and the present one, and therefore,
the new knowledge, techniques or considerations
indicate the need to replicate the study.
2. to show that there are gaps in knowledge provided
by previous research studies and to show how the
present study will help to fill in these gaps and add
to the quantum of existing knowledge.

• Hence, the justification should answer the following:
– How does the research relate to the priorities of the
Region and the country?
– What knowledge and information will be obtained?
– What is the ultimate purpose that the knowledge
obtained from the study will serve?
– How will the results be disseminated?
– How will the results be used, and who will be the
beneficiaries?

7. Definition of terms and concepts:
• it is necessary to define all unusual terms and concepts
that could be misinterpreted.
– Technical terms or words and phrases having special
meanings need to be defined operationally.
8. Scope and limitations of the study:
• boundaries of the study should be made clear with
reference to
– (i) the scope of the study by specifying the areas to
which the conclusions will be confined and
– (ii) the procedural treatment including the sampling
procedures, the techniques of data collection and
analysis, etc.

9. Basic assumptions:
• assumptions are statements of ideas that are accepted
as true.
– They serve as the foundation upon which the
research study is based.
II) Methodology
– The methodology will explain how each specific
objective will be achieved.
• It is impossible to define the budgetary needs of the
research project in the absence of a solid methodology
section.

a) Procedures for collecting data – the details about the
sampling procedures and the data collecting tools are
described.
(i) Sampling – in research situations the researcher
usually comes across unmanageable populations in
which large numbers are involved.
(ii) Tools (instruments) – in order to collect evidence
or data for a study the researcher has to make
use of certain tools such as observations,
interviews, questionnaires, etc.
• The proposal should explain the reasons for selecting
a particular tool or tools for collecting the data.

b) Procedures for treating data (method of analysis)
• In this section, the researcher describes how
he/she organizes, analyses and interprets the
data.
• The details of the statistical techniques and the
rationale for using such techniques should be
described in the research proposal.
(i) Statistical inference models
– Regression analysis is a good analytical tool,
providing a method to test various hypotheses
relating to the classical economic theory.
• The analysis is built upon the causal (cause-and-effect)
analytical framework.
• A range of regression models is available, as well as various ways of
estimating regression coefficients, the most common
being the ordinary least squares (OLS) method.
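As a rough illustration of OLS, the sketch below fits a single-regressor line by the textbook formulas (slope = S_xy / S_xx, intercept = mean of y minus slope times mean of x). The function name `ols_fit` and the credit and income figures are hypothetical; real proposals would normally rely on a statistical package.

```python
def ols_fit(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: credit received (x) and household income (y),
# both in thousands of birr. Values invented for illustration.
credit = [1.0, 2.0, 3.0, 4.0, 5.0]
income = [5.1, 6.9, 9.2, 10.8, 13.0]

intercept, slope = ols_fit(credit, income)
print(f"income = {intercept:.2f} + {slope:.2f} * credit")
```

The slope is the estimated change in income associated with one more unit of credit; whether that association is causal depends on the research design, not on the arithmetic.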

(ii) Mathematical programming models
• An example of a mathematical programming model is
the linear programming model.
• It is, however, only one example from a wide range of
mathematical models.
– There are also non-linear and dynamic mathematical
programming models that address a range of
economic and policy analysis questions and
hypotheses.
• The central theme in these models is to optimize an
objective function subject to a set of constraints.
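To make "optimize an objective function subject to constraints" concrete, here is a minimal sketch for a two-variable linear programme. It uses brute-force corner-point enumeration, which is an assumed teaching device rather than a production method: it only works for tiny bounded problems, and dedicated solvers are used in practice. The function name `solve_lp_2d` and the crop figures are hypothetical.

```python
from itertools import combinations

def solve_lp_2d(c, constraints):
    """Maximise c[0]*x + c[1]*y subject to a*x + b*y <= rhs for each
    (a, b, rhs) in constraints, with x >= 0 and y >= 0.
    Brute force: test every intersection of two constraint boundaries."""
    # Add the non-negativity conditions as ordinary constraints.
    rows = list(constraints) + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundaries do not meet in a single point
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        feasible = all(a * x + b * y <= r + 1e-9 for a, b, r in rows)
        if feasible:
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

# Hypothetical farm plan: x = hectares of teff, y = hectares of maize.
# Profit per hectare: 3 and 2. Land: x + y <= 4. Labour: 2x + y <= 6.
value, x, y = solve_lp_2d((3.0, 2.0), [(1.0, 1.0, 4.0), (2.0, 1.0, 6.0)])
print(f"best plan: teff = {x:.1f} ha, maize = {y:.1f} ha, profit = {value:.1f}")
```

The optimum sits at a corner of the feasible region, which is why checking corners is enough for a bounded linear problem.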

(iii) Simulation models
• Simulation is the operation of an abstract prototype of
a real system designed to trace out dynamic
interactions.
• Simulation models have acquired substantial appeal
among policy analysts because of their ability to
explore the consequences of a wide range of
alternative sets of policies, plans and even
management strategies.
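A very small simulation can show what "exploring the consequences of alternative policies" means. The sketch below assumes a stylised Solow-type growth equation and made-up parameter values; none of this comes from the notes, and `simulate_capital` is a hypothetical name.

```python
def simulate_capital(saving_rate, years, k0=1.0, productivity=1.0,
                     alpha=0.5, depreciation=0.1):
    """Trace capital per worker over time in a stylised growth model:
    k[t+1] = (1 - depreciation) * k[t] + saving_rate * productivity * k[t]**alpha
    """
    path = [k0]
    for _ in range(years):
        k = path[-1]
        output = productivity * k ** alpha
        path.append((1 - depreciation) * k + saving_rate * output)
    return path

# Compare two hypothetical policy scenarios over 50 years.
baseline = simulate_capital(saving_rate=0.2, years=50)
reform = simulate_capital(saving_rate=0.3, years=50)
print(f"capital after 50 years: baseline {baseline[-1]:.2f}, "
      f"reform {reform[-1]:.2f}")
```

Changing one argument and re-running the model is the whole appeal: each run traces out the dynamic consequences of a different policy setting.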

III) Budgeting and Scheduling the Research
– Research costs money, depending on its complexity
and number of people and activities employed.
– A proposal should include a budget estimating the
funds required for travel expenses, typing, printing,
purchase of equipment, tools, books, etc.
• It would include all or some of the following items:
• Management time
• Bought out resources time
• Data collection
• Data analysis cost – software and hardware
• Transport cost
• Respondent’s incentives
• Research must also be scheduled appropriately.
– The researcher should also prepare a realistic time schedule for
completing the study within the time available.
– Dividing a study into phases and assigning dates for the
completion of each phase helps the researcher to use his/her time
systematically.
IV. Citations and references
• It is important that you correctly cite all consulted published
and unpublished documents that you refer to in the proposal.
• This allows the reader to know the sources of your information.
• Every reference you cite must appear in the list of references at
the end of the proposal.

VI. Bibliography
• Be sure to include every work that was referred to in
the proposal
• You do not have to refer to any other works if you do
not want to; the bibliography does not have to be long
or complete.
• Formats vary slightly by journal, etc.
• A common format:
– For a book: Smith, Adam (1776). An Inquiry into the
Nature and Causes of the Wealth of Nations. London:
Dent and Sons publishing.
– For an article: Coase, Ronald (1937). “The Nature of
the Firm.” Economica 4, 386-405.

VII. Appendix
• Mathematical formulae or proofs that are referred to in
the proposal
• Supporting documentation and evidence:
– letter from owner of data, etc.
– permission from any necessary authorities
– Evidence of material support
– Evidence of researcher qualifications
– Other supporting documentation
• For secondary data, documentation that the data are
available, and that they contain the measurements you
need
• For primary data, documentation that you will be able
to collect the data

Chapter Three
Survey and Elements of Sampling

Survey and Field Research Methods
• Survey research is one of the most basic methods in
economic research
• Survey research permits a rigorous step by step
development and testing of complex propositions
through survey data.
• The aim of sample surveys is to generalize from the
sample to the population.
• The three most common purposes of surveys are:
– Description
– Explanation, and
– Exploration

Basic Survey Designs
• We could distinguish between two basic types of survey
designs
Cross sectional surveys
– Data are collected at one point in time
– Less expensive and most common type
Longitudinal Survey
– Data are collected at different points in time.
– Useful for capturing changes over time.
Survey Sampling:
– Some studies involve only small number of people and
thus all of them can be included.
– But when the population is large, it is usually not
possible to undertake a census of all items in the
population.

• Sampling is the process of selecting a number of study
units from a defined study population.
– It aims at obtaining consistent and unbiased estimates
of the population parameters.
• There are two principles underlying any sample design:
– The need to avoid bias in the selection procedure
– The need to gain maximum precision.
• Bias can arise:
• if the selection of the sample is done by some non-
random method i.e. selection is consciously or
unconsciously influenced by human choice
• if the sampling frame (i.e. list, index, population record)
does not adequately cover the target population.
• if some sections of the population are impossible to find
or refuse to co-operate.
• Major Reasons for Sampling
1) Resource Limitations: A sample study is usually less
expensive than a census.
2) Superior Quality of Results:
» more accurate measurement
3) Infinite Population: sampling is also the only process
possible if the population is infinite.
4) Destructive nature of some tests: Sampling remains
the only choice when a test involves the destruction of
the items under study.
– Example: testing the quality of a commodity
(beer, cigarette, coffee, etc.)

• Representativeness
• Representativeness is important particularly if you want
to make generalization about the population.
• A representative sample has all the important
characteristics of the population from which it is drawn.
• For Quantitative Studies:
• If researchers want to draw conclusions which are valid
for the whole study population, they should draw a
sample in such a way that it is representative of that
population.
• For Qualitative Studies:
• representativeness of the sample is NOT a primary
concern.
• We select study units which give us the richest possible
information.
• you go for INFORMATION-RICH cases!

Steps in Sampling Design
• The critical steps in sampling are:
a) Identifying the relevant population: when one wants
to undertake a sample survey, the relevant population
from which the sample is going to be drawn needs to be
identified.
• Example: if the study concerns income, then the
definition of the population elements as
individuals or households can make a difference.
b) Determining the method of sampling:
• Whether a probability sampling procedure or a
non-probability sampling procedure has to be
used is also very important.
c) Securing a sampling frame:
• A list of elements from which the sample is actually
drawn is important and necessary.
d) Identifying parameters of interest:
• what specific population characteristics (variables and
attributes) may be of interest.
e) Determining the sample size
• The determination of the sample size depends on several
factors.
i) Degree of homogeneity: The size of the population
variance is the single most important parameter.
• The greater the dispersion in the population, the larger
the sample must be to provide a given estimation
precision.
ii) Degree of confidence required: Since a sample can
never reflect its population for certain, the researcher
must determine how much precision s/he needs.
• Precision is measured in terms of
–(i) An interval range in which we would expect
to find the parameter estimate.
–(ii) The degree of confidence we wish to have
in the estimate.
iii) Number of sub groups to be studied:
• When the researcher is interested in making
estimates concerning various subgroups of the
population then the sample must be large enough
for each of these subgroups to meet the desired
quality level.
iv) Cost: cost considerations have major impact on
decisions about the size and type of sample.
• All studies have some budgetary constraint and
hence cost dictates the size of the sample.
To determine the sample size:

1. Use prior information: If our process has been studied
before, we can use that prior information to determine
our sample size.
• This can be done by using prior mean and variance
estimates and by stratifying the population to reduce
variation within groups.

2. Rules of thumb: these are based on past experience with
samples that have met the requirements of the statistical
methods.
• Researchers use it because they rarely have
information on the variance or standard errors.
3. Practicality: Of course the sample size you select must
make sense.
• We want to take enough observations to obtain
reasonably precise estimates of the parameters
of interest but we also want to do this within a
practical resource budget.
• Therefore the sample size is usually a compromise
between what is DESIRABLE and what is FEASIBLE.
• In general, the smaller the population, the bigger
the sampling ratio has to be for a reasonable
sample.

• Hence:
• For small populations (under 1,000), a researcher
needs a large sampling ratio (about 30%). Hence, a
sample size of about 300 is required for a high
degree of accuracy.
• For moderately large population (10,000), a smaller
sampling ratio (about 10%) is needed – a sample size
around 1,000.
• To sample from a very large population (over 10
million), one can achieve accuracy using tiny
sampling ratios (0.025%) or samples of about 2,500.
• These are approximate sizes, and practical
limitations (e.g. cost) also play a role in a
researcher’s decision about sample size.
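The arithmetic behind these rules of thumb is simply population size times sampling ratio. The short check below uses the three population sizes and ratios quoted in the notes; treating "under 1,000" and "over 10 million" as exactly 1,000 and 10,000,000 is a simplifying assumption.

```python
# (population size, sampling ratio) pairs quoted in the rules of thumb.
rules_of_thumb = [
    (1_000, 0.30),          # small population
    (10_000, 0.10),         # moderately large population
    (10_000_000, 0.00025),  # very large population (0.025%)
]

sample_sizes = [round(population * ratio) for population, ratio in rules_of_thumb]

for (population, ratio), n in zip(rules_of_thumb, sample_sizes):
    print(f"N = {population:>10,}  ratio = {ratio:.3%}  n = {n:,}")
```

Note how the required sample grows far more slowly than the population: a ten-thousand-fold increase in population raises the sample only about eightfold.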

• Sample Size in Qualitative Studies
• There are no fixed rules for sample size in qualitative
research.
• The size of the sample depends on WHAT you try to find
out, and from what different informants or perspectives
you try to find that out.
• the sample size is therefore estimated as precisely as
possible, but not determined.
• Probability and non-probability sampling
• There could be several sampling methods that could be
used to draw a sample.
• Two types:
• probability samples
• non-probability samples.
• Probability sampling is based on the concept of random
selection of survey units.
• It uses random selection procedures to ensure that
each unit of the sample is chosen on the basis of chance.
• A randomization process is used in order to reduce or
eliminate sampling bias so that the sample is
representative of the population from which it is drawn.
• A sample will be representative of the population
from which it is drawn if all members of the
population have an equal chance of being included in
the sample.
• Probability sampling requires a sampling frame (a listing
of all study units).

• Probability samples, although not perfectly
representative are more representative than any other
type of sample.
• So, probability sampling has considerable advantages
over all other forms of sampling.
• First, sampling errors can be calculated.
• Second, probability samples rely on random
process, i.e. the selection process operates in a truly
random method (no pattern).
• Finally, since each element has an equal chance or
probability of being selected it is possible to get
consistent and unbiased estimate of the population
parameter.

• Types of probability sampling methods
• Generally speaking we could distinguish between
the following types of sampling designs.
• Simple Random Sampling Technique
• Systematic sampling Technique
• Stratified Sampling Technique
• Cluster Sampling Technique.
• Hybrid Sampling

1. Simple Random Sampling (SRS)
– The SRS is the simplest and easiest method of
probability sampling.
– It is the sampling procedure in which each element
of the population has an equal chance of being
selected into the sample.
– It assumes that an accurate sampling frame exists.
– Usually one of two methods is adopted to pick a sample:
• The lottery method
• A table of random numbers

• SRS requires a listing of the entire population of
interest. This may not be possible for national surveys.
• It is too expensive to interview a national sample
face to face on the basis of SRS.
• The cost of interviewing randomly selected
individuals drawn from a list of the entire
population is extremely high.
• So, the SRS can only be applied in situations where the
population size is small.
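In software, the lottery method amounts to drawing units from the sampling frame without replacement. A minimal sketch, assuming a hypothetical frame of 500 household IDs and a fixed seed so the draw can be reproduced:

```python
import random

# Hypothetical sampling frame: ID numbers of 500 households.
sampling_frame = [f"HH{i:03d}" for i in range(1, 501)]

rng = random.Random(2024)                   # fixed seed (an arbitrary choice)
sample = rng.sample(sampling_frame, k=50)   # draw 50 without replacement

print(sample[:5])
```

Every household in the frame has the same chance (50 in 500) of ending up in the sample, which is the defining property of SRS.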

2. Systematic Sampling Technique
– In SYSTEMATIC SAMPLING individuals are chosen at
regular intervals (for example every fifth) from the
sampling frame.
– Under systematic sampling procedures, instead of a list
of random numbers the researcher calculates a
sampling interval.
• The sampling interval is the standard distance
between elements selected in the sample.
– The major advantages of SS are its simplicity and
flexibility.
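A common way to apply this is to set the sampling interval k to the frame size divided by the desired sample size and to pick a random starting point within the first interval. The sketch below assumes that convention (the notes do not spell it out), assumes the sample size does not exceed the frame size, and uses a hypothetical helper `systematic_sample`.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select n units at a fixed interval k = len(frame) // n,
    starting from a random position within the first interval."""
    k = len(frame) // n            # sampling interval
    start = rng.randrange(k)       # random start in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))        # hypothetical frame of 100 numbered units
sample = systematic_sample(frame, n=20, rng=random.Random(7))
print(sample)
```

With 100 units and a sample of 20, every fifth unit is taken. If the frame is ordered in a cycle that matches the interval, the sample can be biased, so the ordering of the list deserves a check.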

3. Stratified Sampling
• Most populations can be segregated into a number
of mutually exclusive sub populations or Strata.
• The stratified sampling technique is particularly
useful when we have heterogeneous populations.
• After a population is divided into the appropriate
strata, a sample can be taken from each
stratum using either the SRS or the SS
technique.

• The reasons for stratifying
There are three major reasons why a researcher chooses
a stratified random sampling.
(a) To increase a sample’s statistical efficiency.
(b) To provide adequate data for analyzing the various
subpopulations.
(c)To enable different research methods and
procedures to be used in different strata.

• How to Stratify
– Three major decisions must be made in order to
stratify the given population into some mutually
exclusive groups.
(1) What stratification base to use: stratification
would be based on the principal variable under study
such as income, age, education, sex, location, religion,
etc.
(2) How many strata to use: there is no precise answer
as to how many strata to use.
• The more strata, the closer one comes
to maximizing inter-strata differences and
minimizing intra-strata variance.

(3) What strata sample size to draw: different
approaches could be used:
• One could adopt a proportionate sampling
procedure.
–If the number of units selected from the
different strata are proportional to the total
number of units of the strata then we have
proportionate sampling.
• Or use disproportionate sampling, which
allocates elements on some basis other than stratum size.
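Proportionate allocation can be sketched as follows: each stratum receives a share of the sample equal to its share of the population, and a simple random sample is then drawn within each stratum. The strata, their sizes and the helper name `proportionate_allocation` are hypothetical; with awkward sizes, rounding may make the allocations sum to slightly more or less than the target.

```python
import random

def proportionate_allocation(strata_sizes, total_sample):
    """Allocate the sample across strata in proportion to stratum size."""
    population = sum(strata_sizes.values())
    return {name: round(total_sample * size / population)
            for name, size in strata_sizes.items()}

# Hypothetical strata: households by location.
strata = {
    "urban": [f"U{i}" for i in range(300)],
    "peri-urban": [f"P{i}" for i in range(200)],
    "rural": [f"R{i}" for i in range(500)],
}

allocation = proportionate_allocation(
    {name: len(units) for name, units in strata.items()}, total_sample=100)

# Simple random sample within each stratum.
rng = random.Random(1)
sample = {name: rng.sample(units, allocation[name])
          for name, units in strata.items()}
print(allocation)
```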

4. Cluster Sampling:
– The selection of groups of study units (clusters)
instead of the selection of study units individually is
called CLUSTER SAMPLING.
• If the total area of interest happens to be a big
one and can be divided into a number of smaller
non-overlapping areas (clusters), and if some of
the groups or clusters are selected randomly, we
have cluster sampling.
– Clusters are often geographic units (e.g., districts,
villages) or organizational units (e.g., firms, clinics,
training groups, etc).
• Cluster sampling addresses two problems:
– Researchers lack a good sampling frame for a
dispersed population.

– The cost to reach a sample element is very high and
cluster sampling reduces cost by concentrating
surveys in selected clusters.
• Multistage area sampling (MAS) - is a cluster sampling
with several stages:
– First take a sample of a set of geographic regions or
clusters – randomly select X number of clusters.
– Next, a subset of geographic area is sampled within
each of those regions and so on.
– Finally a sample of elements is drawn from smaller
areas.
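The stages of multistage area sampling can be sketched as nested random draws. The frame below (10 districts, 8 villages each, 30 households each) and the numbers drawn at each stage are invented for illustration.

```python
import random

rng = random.Random(42)

# Hypothetical frame: 10 districts, each with 8 villages of 30 households.
districts = {
    f"district_{d}": {
        f"village_{d}_{v}": [f"hh_{d}_{v}_{h}" for h in range(30)]
        for v in range(8)
    }
    for d in range(10)
}

# Stage 1: randomly select 3 districts (clusters).
chosen_districts = rng.sample(sorted(districts), 3)

# Stage 2: randomly select 2 villages within each chosen district.
# Stage 3: randomly select 5 households within each chosen village.
sample = []
for district in chosen_districts:
    villages = districts[district]
    for village in rng.sample(sorted(villages), 2):
        sample.extend(rng.sample(villages[village], 5))

print(len(sample), "households selected")
```

Only the selected villages need a household listing, which is how the design gets around the lack of a complete sampling frame and keeps fieldwork concentrated.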
5. Hybrid sampling
– Where there is no single way to sample a particular
population some researchers use a combination of
the four different methods discussed above.
• Non-Probability Sampling
• non-probability selection is non random i.e., each
member does not have a known non-zero chance of
being included.
• Generally three conditions need to be met in order to
use non-probability sampling.
– First, if there is no desire to generalize to a
population parameter, then there is much less
concern whether or not the sample fully reflects the
population - when precise representation is not
necessary.

• Secondly, it is used because of cost and time
requirements.
– probability sampling could be prohibitively
expensive since it calls for more planning and
repeated callbacks to assure that each selected
sample unit is contacted.
• Thirdly, though probability sampling may be superior in
theory there are breakdowns in its applications.
– The total population may not be available for the
study in certain cases.

Non-probability sampling methods:
(1) Convenience sampling
• The method selects anyone who is convenient.
• It can produce ineffective, highly
unrepresentative samples and is not recommended.
• Such samples are cheap but biased and full
of systematic errors.
– Example: the person on the street interview
conducted by television programs is an
example of a convenient sample.

(2) Quota Sampling
– Quotas are assigned to different strata groups and
interviewers are given quotas to be filled from
different strata.
– A researcher first identifies categories of people
(e.g., male, female) then decides how many to get
from each category.
• The major limitation of this method is the absence of
an element of randomization. Consequently the extent
of sampling error cannot be estimated.
• It is used by opinion pollsters, in marketing research and
in other similar research areas.

(3) Purposive or Judgment sampling
• Purposive sampling occurs when one draws a non-
probability sample based on certain criteria.
– Focusing on a limited number of informants,
whom we select strategically so that their in-depth
information will give optimal insight into an issue, is
known as purposeful sampling.
• It uses the judgment of the expert in selecting cases.
• However, care should be taken that selection rules are
developed for the different categories of informants, to prevent
the researcher from sampling according to personal
preference.

(4) Snowball (Network) Sampling
– This is a method for identifying and sampling (or
selecting) the cases in a network.
• Snowball sampling is based on an analogy to a
snowball, which begins small but becomes larger
as it is rolled on wet snow and picks up additional
snow.
– Snowball sampling begins with one or a few people
or cases and spreads out on the basis of links to the
initial case.
• You start with one or two information-rich key
informants and ask them if they know persons
who know a lot about your topic of interest.
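The referral process can be pictured as a walk outward through a network, starting from the key informants. The sketch below uses a breadth-first walk, which is one possible reading of "spreading out on the basis of links"; the names, the referral data and the function `snowball_sample` are all hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Follow referral links outward from the seed informants
    (breadth-first) until max_size people have been recruited."""
    recruited = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen and len(recruited) < max_size:
                seen.add(contact)
                recruited.append(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network: who names whom as knowledgeable.
referrals = {
    "Almaz": ["Bekele", "Chaltu"],
    "Bekele": ["Dawit"],
    "Chaltu": ["Dawit", "Eleni"],
    "Dawit": ["Fikru"],
}
recruited = snowball_sample(referrals, seeds=["Almaz"], max_size=5)
print(recruited)
```

Because everyone in the sample is reached through the first informant, the result reflects that person's network, which is why the method is non-probability.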

• Problems in Sampling
– Two types of errors:
–Non sampling errors
–Sampling errors
– Non Sampling errors are biases or errors due to
fieldwork problems, interviewer induced bias, clerical
problems in managing data, etc.
• These would contribute to error in a survey,
irrespective of whether a sample is drawn or a
census is taken.
– On the other hand, error which is attributable to
sampling, and which therefore, is not present in
information gathered in a census is called sampling
error.

a) Non-Sampling Error
• Non-sampling errors include:
– Non-coverage error
– Sampling the wrong population
– Non-response error
– Instrument error
– Interviewer error
• Non-coverage error: this refers to a defect in the sampling
frame.
– Omission of part of the target population (for instance,
soldiers, students living on campus, people in hospitals,
prisoners, households without a telephone in telephone
surveys, etc).
– Non-coverage error also occurs when the lists used for
sampling are incomplete or outdated.

• The wrong population is sampled
– Researchers must always be sure that the group
being sampled is drawn from the population they
want to generalize about or the intended population.
• Non response error
– Some people refuse to be interviewed because they
are ill, are too busy, or simply do not trust the
interviewer.
• One should try to reduce the incidence of non-
response errors.
– Non-response error can occur in any interview
situation, but it is mostly encountered in large-scale
surveys with self-administered questionnaires.

• It is important in any study to mention the non-
response rate and to honestly discuss whether and
how the non-response might have influenced the
results.
• Instrument error
– The word instrument in a sample survey means the
device with which we collect data, usually a
questionnaire.
– When a question is badly asked or worded, the
resulting error is called instrument error.
• Example: leading questions or carelessly worded
questions may be misinterpreted by some
respondents.

• Interviewer error: This occurs when some
characteristic of the interviewer, such as age or sex,
affects the way in which the respondent answers
questions.
– Example: questions about sexual behavior might be
differently answered depending on the gender of the
interviewer.
• To sum up, a researcher must ensure that non-sampling
error is avoided as far as possible, or is evenly
balanced (non-systematic) and thus cancels out in the
calculation of the population estimates.

b) Sampling Errors
– Sampling errors are random variations in the sample
estimates around the true population parameters.
• Error which is attributable to sampling, and which
therefore is not present in a census-gathered
information, is called sampling error.
– Sampling errors can be calculated only for
probability samples.
– Increasing the sample size is one of the major
instruments to reduce the extent of the sampling
error.
– Sampling error is related to confidence intervals.

• A narrower confidence interval means more precise
estimates of the population for a given level of
confidence.
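• The confidence interval for the true population mean
is given by:
$\bar{x} \pm z \dfrac{\sigma}{\sqrt{n}}$
• $\bar{x}$ is the sample mean, z is the value of the
standard variate at a given confidence level (to be read
from the table giving the area under the normal curve),
n is the sample size, and $\sigma$ is the standard deviation of
the population (estimated in practice by the sample standard
deviation), so that $\sigma/\sqrt{n}$ is the standard deviation of
the sample mean.
• The sampling error is given by:
$z \dfrac{\sigma}{\sqrt{n}}$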
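The interval (sample mean plus or minus z times sigma over the square root of n) can be checked numerically. The sketch below takes z from the standard normal distribution instead of a printed table; the survey figures are invented, and the sample-size function is simply the sampling-error formula solved for n, added here as an illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sampling_error(sigma, n, confidence=0.95):
    """Half-width of the confidence interval: z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    return z * sigma / sqrt(n)

def confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (lower, upper) bounds: mean -/+ the sampling error."""
    e = sampling_error(sigma, n, confidence)
    return mean - e, mean + e

def required_sample_size(sigma, max_error, confidence=0.95):
    """Invert e = z * sigma / sqrt(n): n = (z * sigma / e) ** 2, rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / max_error) ** 2)

# Hypothetical survey: mean monthly income 2,500, standard deviation 800, n = 400.
low, high = confidence_interval(mean=2500, sigma=800, n=400)
print(f"95% CI: {low:.1f} to {high:.1f}")
print("n needed for a sampling error of 50:", required_sample_size(800, 50))
```

Because n enters through its square root, quadrupling the sample size only halves the sampling error.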

• Dealing with missing data:
– There are several reasons why the data may be
missing.
• They may be missing because equipment
malfunctioned, the weather was terrible, or
people got sick, or the data were not entered
correctly.
• If data are missing at random, by far the most common
approach is to simply omit those cases with missing
data and to run our analyses on what remains.

• Although deletion often results in a substantial
decrease in the sample size available for the analysis, it
does have important advantages.
– Under the assumption that data are "missing at
random" (strictly, missing completely at random, i.e. unrelated
to any variable in the study), it leads to unbiased parameter estimates.
• If, on the other hand, data are not missing at random,
but are missing as a function of some other variable, a
complete treatment of missing data would have to
include a model that accounts for missing data.
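The deletion approach (often called listwise or complete-case deletion) is easy to sketch. The records below are invented, and `None` is assumed to mark a missing value.

```python
# Hypothetical survey records; None marks a missing value.
records = [
    {"id": 1, "income": 2400, "age": 34},
    {"id": 2, "income": None, "age": 51},
    {"id": 3, "income": 3100, "age": 28},
    {"id": 4, "income": 1800, "age": None},
    {"id": 5, "income": 2700, "age": 45},
]

# Listwise deletion: keep only the cases with no missing values.
complete_cases = [r for r in records
                  if all(v is not None for v in r.values())]

dropped = len(records) - len(complete_cases)
mean_income = sum(r["income"] for r in complete_cases) / len(complete_cases)

print(f"kept {len(complete_cases)} of {len(records)} cases ({dropped} dropped)")
print(f"mean income on complete cases: {mean_income:.1f}")
```

Two of five cases are lost here, which shows how quickly deletion shrinks the sample; reporting the number dropped alongside the results lets readers judge the risk.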

Chapter Four
Data Collection Techniques

Introduction
• Once an appropriate research topic is determined, proper data
collection, retention, and sharing are vital to the research
enterprise.
• Data are the foundation of economic research since every
study is a search for information about the given topic.
Definition: Data refers to any group of facts, measurements, or
observations used to make inferences about the problem of
investigation.
• Can range from material created in a laboratory, to
information obtained in social-science research, such as a
filled-out questionnaire, video and audio recordings, or
photographs, etc.

• No research project has unlimited funds, so selection of the most promising
data usually is affected by the priorities of cost and convenience.
• Collection of the data should be feasible and the data should be
sufficient to test the hypotheses.
• The first step in good data management is designing your experiment so that it
creates meaningful and unbiased data, does not waste resources, and
appropriately protects human and animal subjects.
• If data are not recorded in a fashion that allows others to validate findings,
results can be called into question.

• Data selection should precede actual data collection.
– Clear data selection standards set in advance help prevent
selective data reporting (selectively excluding data that are not
supportive of a research hypothesis) later in the research.
• There are a number of methodological issues researchers should
be aware of when selecting data.
• Data types (e.g., nominal, ordinal or interval measures).
• Samples ("frames") and sample size, instruments.
• Methodologies, etc.
Data collection methods
• Data collection is the process of gathering and measuring
information on variables of interest in an accepted systematic
fashion.
• Rigorous collection methodologies, when based on a foundation
of solid data selection, enable researchers to answer the research
questions, test hypotheses, and evaluate outcomes.
• Data collection methods vary by discipline and data types, but
the emphasis on ensuring accurate collection remains the same.

• Both the selection of appropriate data collection instruments and clearly
delineated instructions for their use reduce the likelihood of error.
• Consequences from improperly collected data include:
• Inability to answer research questions accurately.
• Inability to repeat and validate the study.
• Distorted, inaccurate findings.
• Wasted resources.
• Misleading other researchers to pursue fruitless avenues of investigation.
• Compromised decisions for public policy or private decision-making.
• Harm to human participants and animal subjects.

• While the impact of faulty data collection varies by discipline and the nature of
investigation, poor collection may cause disproportionate harm when the results
of the flawed research are used to support public policy recommendations.
• Effective methods make the detection of errors easier - whether the errors are
intentional and deliberate falsifications or inadvertent systematic or random
errors.
• As with data selection, it is critical that researchers have sufficient skills to
ensure the integrity of their data collection efforts.
• For instance, quality data collection requires a rigorous and detailed
recruitment and training plan for data collectors.

Data management issues
 Storage and Protection
• Research data must be stored securely both during a research
project and after it ends.
• Reliable security policies and procedures are essential to
safeguard data stored electronically or in physical form.
• Everyday risks like fire, water or other environmental damage,
or common technical failures like hard disk crashes, must be
considered.
• It's essential to make backup copies of a data collection
periodically and store the copies in a secure location.
 Confidentiality
• Confidentiality refers to limiting information access and
disclosure to authorized users and preventing access by or
disclosures to unauthorized persons.
• who can handle which portion of data,
• at what point during the project,
• for what purpose, and so on.
• If research data are maintained on a personal computer, it is
essential to keep the PC physically secured, update software
regularly (particularly anti-malware utilities), and maintain
access protections.
 Integrity
• Integrity refers to the trustworthiness of the information.
• It postulates that data have not been modified inappropriately,
whether accidentally or deliberately.
• Integrity includes the notion that the person or entity in
question entered the right information,
• i.e. the information reflected the actual circumstances
(validity) and under the same circumstances would generate
identical data (reliability).
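One common way to detect whether stored data have been modified is to record a cryptographic checksum when the data are archived and compare it later. The sketch below is only an illustration under stated assumptions: the data are a small made-up CSV held in memory, and the helper names are hypothetical. It addresses modification after recording only; it says nothing about whether the information was valid or reliable when first entered.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_digest: str) -> bool:
    """True if the data still match the digest recorded at archiving time."""
    return sha256_of_bytes(data) == expected_digest


# Hypothetical data set as it was archived.
original = b"id,income\n1,2500\n2,3100\n"
recorded_digest = sha256_of_bytes(original)

# Later copies: one untouched, one with a single value changed.
unchanged_copy = b"id,income\n1,2500\n2,3100\n"
altered_copy = b"id,income\n1,2500\n2,3900\n"

print("unchanged copy intact:", is_intact(unchanged_copy, recorded_digest))
print("altered copy intact:  ", is_intact(altered_copy, recorded_digest))
```

For real files the same digest can be computed over the file's bytes and kept with the backup copies, so that accidental or deliberate changes show up when the copies are compared.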

 Data Collection Techniques:
• The critical question here is from where and how to get the data.
• Data can be acquired from secondary sources, primary sources, or both.
 Secondary Sources of data
– Secondary data are those that have been collected by other
individuals or agencies.
– As far as possible, secondary data should always be considered first, if
available.
• Why reinvent the wheel (waste resources) if the data already exist?

• When dealing with secondary data you should ask:
• Is the owner of the data making them available to you?
• Is it free of charge? If not, how will you pay?
• Are the data in a format that you can work with?
• Is a description of the sampling technique (i.e., how the
sample was collected) available?

 Sources of Secondary Data
• Secondary data may be acquired from various sources:
• Reports of various kinds, books, periodicals, reference books
(encyclopedias), university publications (theses, dissertations, etc.),
policy documents, statistical compilations, proceedings, personal
documents (historical studies), etc.
• The Internet
 Advantages of Secondary data
• Can be found more quickly and cheaply.
• Most research on past events or distant places has to rely on
secondary data sources.

 Limitations
• The information often does not meet one’s specific needs.
– Definitions might differ, units of measurements may be
different and different time periods may be involved.
• It is difficult to assess the accuracy of the information when the
research design or the conditions under which the research
took place are unknown.
• Data could also be out of date.

 Primary Sources of Data
• Data collected for the first time by the people directly
involved in the research.
– There are two approaches to primary data collection:
• the qualitative approach and
• the quantitative approach

 Qualitative data collection approaches
– Qualitative data can be acquired from:
– Case studies,
– Rapid appraisal methods,
– Focus group discussions and
– Key informant interviews.

i) Case studies
• Case study research involves a detailed investigation of a
particular case.
• Data are gathered through interviews or through direct observation
(field visits).
ii) Rapid Rural Appraisal (RRA)
• RRA is a systematic but semi-structured activity of expert
observation, often carried out by a multidisciplinary team.
• The RRA method:
• takes only a short time to complete,
• tends to be relatively cheap, and
• makes use of more 'informal' data collection procedures.
• Includes interviews with individuals, households and key
informants as well as group interview techniques.

iii) Focus group discussions
• A FGD is a group discussion guided by a facilitator, during
which group members talk freely and spontaneously about a
certain topic.
• The group members are selected by the researcher and are expected
to have experience of, or an opinion on, the topic.
• Its purpose is to obtain in-depth information on concepts,
perceptions and ideas of a group.
• It is more than a question-answer interaction.
• The idea is that group members discuss the topic and
interact among themselves with guidance from the
facilitator.

Why use focus groups?
• The main purpose of a focus group research is to draw
upon respondents’ attitudes, feelings, beliefs, experiences
and reactions which would not be captured using other
methods.
– Attitudes, feelings and beliefs are more likely to be revealed
through the social gathering and the interaction.
• Compared to individual interviews, which aim to obtain individual
attitudes, beliefs and feelings, focus groups elicit a multiplicity of
views and emotional processes within a group context.

 Strengths and weaknesses of FGDs
– FGDs can be a powerful research tool that provides valuable information in a
short period of time and at relatively low cost if the groups have been well chosen,
in terms of composition and number.
• BUT, FGD should not be used for quantitative purposes, such as the testing of
hypotheses or the generalization of findings for larger areas, which would
require more elaborate surveys.
• It may be risky to use FGDs as a single tool.
– In group discussions, people tend to centre their opinions on the most common
ones.
• Therefore, it is advisable to combine FGDs with other methods (in-depth interviews).
• In case of very sensitive topics group members may hesitate to express their feelings
and experiences freely.

iv) Key Informant Interview
• The key informant interview technique is an interviewing
process for gathering information from opinion leaders
such as elected officials, government officials, and
business leaders, etc.
• This technique is particularly useful for:
– Raising community awareness about socio-economic issues
– Learning minority viewpoints
– Gaining a deeper understanding of opinions and perceptions, etc.
v) Triangulation
• Triangulation refers to the use of more than one approach
to the investigation of a research question in order to
enhance confidence in the findings.
• The purpose of triangulation is to obtain confirmation of
findings through convergence of different perspectives.
 Why use triangulation?
– By combining multiple methods and empirical materials,
researchers can hope to overcome the weaknesses, biases and
problems that are associated with a single method.

Taxonomy/Classification of Triangulation
1. Data triangulation: Involves gathering data at different times and
situations, from different subjects using different sampling
techniques.
– Surveying relevant stakeholders about the impact of a policy
intervention would be an example.
• Economic forecasters who rely on national accounts for their
modeling exercises find that there is a lag between the data and
prevailing economic conditions.
• They often make use of different data sources (and types) to fill
this gap.
• Example: Using survey data alongside time series data.

2. Investigator triangulation: involves using more than one field
researcher to collect and analyze the data relevant to a specific
research object.
• Asking scientific experimenters to attempt to replicate each
other’s work is an example.
3. Theoretical triangulation: involves making explicit references to
more than one theoretical tradition to analyze data.
• This is intrinsically a method that allows for different
disciplinary perspectives upon an issue.
• This could also be called pluralist or multi-disciplinary
triangulation.
4. Methodological triangulation: involves the combination of different
research methods or using different varieties of the same method.
 There are two forms of methodological triangulation:
• Within-method triangulation involves making use of different varieties of
the same method.
• Thus, in economics, making use of alternative econometric estimators
would be an example.
• Between-method triangulation involves making use of different methods.
• Using ‘quantitative’ and ‘qualitative’ methods in combination would be an
example.

 Quantitative Primary Data Collection Methods
• A quantitative research method involves a numeric or statistical
approach to research design.
• It involves the collection of data so that information can be
quantified and subjected to statistical treatment.
• Primary data may be collected through:
• Direct personal observation method, or
• Survey or questioning other persons,
• From a literature search, or
• Combining them.
 The Observation Method
– Observation includes the full range of monitoring
behavioral and non-behavioral activities.
• Advantages
• It is less demanding and has less bias.
• One can collect data at the time it occurs and need not
depend on reports by others.
• With this method one can capture the whole event as it
occurs.

 Weakness of the Method
• The observer normally must be at the scene of the event when it
takes place.
– But it is often difficult or impossible to predict when and
where an event will occur.
• Observation is also a slow and expensive process since it requires
either human observers or some type of costly surveillance
equipment.
• Its most reliable results are restricted to data that can be
determined by an open or deliberate action or surface indicator.
• It is limited as a way to learn about the past, and it is difficult
to gather information on such topics as intentions, attitudes,
opinions and preferences.

 The Survey Method: this is the most commonly used method in economic
research
• To survey is to ask people questions in a questionnaire, either mailed or
handled by interviewers.
 Strength of the Survey Method
• It is a versatile or flexible method - capable of many different uses.
• It does not require that there be a visual or other objective perception of
the sought information by a researcher.
• Questioning might be the best way to learn about the opinions and attitudes
of people.
• Surveys tend to be more efficient and economical than observations;
surveying by telephone or mail is less expensive.
 Weakness of the Method
– The quality of information secured depends heavily on the ability and
willingness of the respondents.
• A respondent may interpret a question or concept differently from what
was intended by the researcher.
• A respondent may deliberately mislead the researcher by giving false
information.
 Surveys could be carried out through:

• Face to face personal interview
• By telephone interview
• By mail or e-mail, or
• By a combination of all these.

a) Personal Face to face Interview
– It is a two-way conversation where the respondent is asked to
provide information.
• It involves one person interviewing another person for
personal or detailed information.
 Advantages
• The depth and detail of the information that can be secured
far exceeds the information secured from telephone or mail
surveys.
• Interviewers can probe additional questions, gather
supplemental information through observation, etc.
• Interviewers can make adjustments to the language of the
interview because they can observe the problems and effects
with which the respondent is faced.

 Limitations of the Method
• The method is an expensive enterprise.
• Hence, personal interviews are generally used only when
subjects are not likely to respond to other survey methods.
• Interviewers may also be reluctant to visit unfamiliar
neighborhoods.
– Biased results grow out of three types of error:
• Sampling error – error introduced when dealing with a
sample instead of a population.
• Non-response error
• Response error
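Sampling error, the first item above, can be illustrated with a short simulation. The sketch assumes a made-up population of 10,000 incomes drawn from a normal distribution (the figures are not real data) and measures how far a sample mean falls from the population mean. Non-response and response errors are not modelled here.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10,000 monthly incomes (invented figures).
population = [random.gauss(5000, 1500) for _ in range(10_000)]
population_mean = mean(population)


def sampling_error(sample_size):
    """Difference between one random sample's mean and the population mean."""
    sample = random.sample(population, sample_size)
    return mean(sample) - population_mean


def average_abs_error(sample_size, repetitions=200):
    """Average size of the sampling error over repeated samples."""
    return mean(abs(sampling_error(sample_size)) for _ in range(repetitions))


small_sample_error = average_abs_error(25)
large_sample_error = average_abs_error(400)

print("average absolute error, n = 25: ", round(small_sample_error, 1))
print("average absolute error, n = 400:", round(large_sample_error, 1))
```

The error shrinks as the sample grows and disappears only when the whole population is surveyed, which is why it is treated as an unavoidable cost of working with a sample.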

b) Telephone Interview
– Telephone can be a helpful medium of communication in
setting up interviews and screening a large population for rare
respondent types.
– Telephone surveys are the fastest method of gathering
information from a relatively large sample.
• Unlike a mail survey, the telephone survey allows the
opportunity for some opinion probing.
• Telephone surveys generally last less than ten minutes.

 Strength of this method
– Moderate travel and administrative costs
– Faster completion of the study
– Responses can be entered directly into a computer
 Limitations of this method
– Respondents must be available by phone.
– The length of the interview period is short, which can result in
less complete responses.
– Those interviewed by phone find the experience to be less
rewarding than a personal interview.
c) Interviewing by mail (Solicited responses)
– Self-administered questionnaires may be used in surveys.
– They are ideal for large sample sizes, or when the sample
comes from a wide geographic area.
 Advantages
– Lower cost than personal interview
– Persons who might otherwise be inaccessible can be contacted
(e.g., major corporate executives)
– Respondents can take more time to collect facts
– There is no possibility of interviewer bias.

Disadvantages
– Non-response error is high
– Large amount of information may not be acquired
d) Online Surveys (E-mail and internet)
– E-mail surveys are relatively new and little is known about the
effect of sampling bias in internet surveys.
 Advantages:
– Very inexpensive; saves data entry costs as well
– Respondents feel a sense of privacy

 Disadvantages:
– Very biased toward the wealthy in Ethiopia
– Biased toward the young everywhere; even in the industrialized
world, the very poor have less online access
• While it is clearly the most cost-effective and fastest method of
distributing a survey, the demographic profile of internet users
does not always represent the general population.
– Therefore, before doing an e-mail or internet survey, carefully
consider the effect that this bias might have on the results.
 Survey Instrument Design (questionnaire design)
• Actual instrument design begins by drafting specific
measurement questions in the form of a questionnaire.
• Questionnaires are easy to analyze.
• Data entry and tabulation for nearly all surveys can be
easily done with many computer software packages.
• Questionnaires are familiar to most people.
• Nearly everyone has had some experience completing
questionnaires and they generally do not make people
apprehensive.
• Questionnaires reduce bias.
• There is uniform question presentation.
• The researcher's own opinions will not influence the answer.
• Mailed questionnaires are less intrusive for respondents.
• When a respondent receives a questionnaire by mail, he/she
is free to complete the questionnaire on his/her own
timetable.

• One major disadvantage particularly with mailed questionnaires
is the possibility of low response rates.
– Low response is the curse of statistical analysis since it can
dramatically lower our confidence in the results.
– Response rates vary widely from one questionnaire to another
but well-designed studies consistently produce high response
rates.
• Another disadvantage of mailed questionnaires is the inability to
probe responses.
– Since questionnaires are structured instruments they allow
little flexibility to the respondent

• The lack of personal contact in mail/online surveys will have
different effects depending on the type of information being
requested.
– A questionnaire requesting factual information will probably not
be affected by the lack of personal contact.
– But, a questionnaire probing sensitive issues or attitudes may be
severely affected.
• When returned questionnaires arrive in the mail/e-mail, it's natural
to assume that the respondent is the same person you sent the
questionnaire to.
– But, for a variety of reasons, the respondent may not be who you
think it is.
– Often business questionnaires get handed to other employees for
completion.
The main Components of a questionnaire
– Identification data: respondent’s name, address, time and
date of interview, code of interviewer, etc.
– Instruction: Include clear and concise instructions on how to
complete the questionnaire. These must be very easy to
understand, so use short sentences and basic vocabulary.
– Information sought: major portion of the questionnaire
– Covering letter: brief purpose of the survey, who is doing it,
time involved, etc. The cover letter provides your best chance
to persuade the respondent to complete the survey.
Designing of a Questionnaire – general considerations
– A questionnaire is developed to directly address the goals of
the study.
– Well-defined goals are the best ways to assure a good
questionnaire design.
• When the goals of a study can be expressed in a few clear
and concise sentences, the design of the questionnaire
becomes considerably easier.
– Hence, ask only questions that directly address the study
goals.
• Avoid the temptation to ask questions because it
would be "interesting to know".

• As a general rule, with only a few exceptions, long
questionnaires get less response than short questionnaires.
• Response rate is the single most important indicator of how much
confidence you can place in the results.
– You must do everything possible to maximize the response
rate.
– Hence, keep your questionnaire short to maximize response
rate.
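As a simple worked illustration of the response rate, the sketch below divides completed questionnaires by the number that actually reached an eligible respondent. Both the formula (excluding undeliverable questionnaires from the base) and the counts are assumptions made for this example; survey organizations use several different definitions.

```python
def response_rate(completed, distributed, undeliverable=0):
    """Share of eligible questionnaires that came back completed.

    Assumed definition: completed / (distributed - undeliverable).
    """
    eligible = distributed - undeliverable
    if eligible <= 0:
        raise ValueError("no eligible questionnaires")
    if not 0 <= completed <= eligible:
        raise ValueError("completed must lie between 0 and the eligible count")
    return completed / eligible


# Hypothetical counts for a short and a long version of the same questionnaire.
short_form_rate = response_rate(completed=312, distributed=500, undeliverable=20)
long_form_rate = response_rate(completed=168, distributed=500, undeliverable=20)

print(f"short questionnaire: {short_form_rate:.0%}")
print(f"long questionnaire:  {long_form_rate:.0%}")
```

Reporting the counts behind the rate, not only the percentage, lets readers judge how much confidence the results deserve.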

• In developing a survey instrument the following issues need to be
considered carefully:
• Question content
• Question wording
• Response form
• Question sequence
• In other words, both the subject and wording of each question
as well as the psychological order of the questions need to be
considered.
• Questions that are more interesting, easier to answer, and
less threatening usually are placed early in the sequence to
encourage response.

1. Question Content
– Both questions and statements could be used in survey
research.
• Using both in a given questionnaire gives the researcher
more flexibility.
– Minimizing the number of questions is highly desirable, but
we should never try to ask two questions in one.
– Question content usually depends on the respondent's:
• ability, and
• willingness to answer the question accurately.

a) Is the question of proper scope? The respondent must be competent
enough to answer the questions.
– The respondent's information level should be assessed when
determining the content and appropriateness of a question.
• Questions that overtax the respondent's recall ability may
not be appropriate.
b) Willingness of respondent to answer adequately
– Even if respondents have the information, they may be
unwilling to give it.
– Some topics are also too sensitive to discuss with strangers.
Examples: the most sensitive topics concern money matters and
family life.
– If respondents consider a topic to be irrelevant and
uninteresting they would be reluctant to give an adequate
answer.
– Some of the main reasons for unwillingness:
• The situation is not appropriate for disclosing the
information
• Disclosure of information would be embarrassing
• Disclosure of information is a potential threat to the
respondent

Some approaches that may help to secure more complete and truthful information:
• Use an indirect statement, e.g., one that refers to “other people”.
• Motivate the respondent to provide appropriate information.
• Use methods other than questioning to secure the data (observation).
• Change the design of the questioning process.
• Apply appropriate questioning sequences that will lead a
respondent from “safe” questions gradually to those that are more
sensitive.
• So, begin with a few non-threatening and interesting questions.

• Provide incentives as a motivation for a properly completed
questionnaire.
– What does the respondent get for completing your
questionnaire?
– Altruism (helping others without personal gain) is rarely an
effective motivator.
– Attaching a dollar bill to the questionnaire works well.
• If you use the mail, always include a self-addressed postage-paid
envelope.

2. Question Wording: Using Shared Vocabulary
• In a survey the two parties must understand each other and
this is possible only if the vocabulary used is common to both
parties.
• So, don't use uncommon words, long sentences or
abbreviations, and make items as brief as possible.
• And, don't use emotionally loaded or vaguely defined
words.
• One way to eliminate misunderstandings is to emphasize
crucial words in each item by using bold, italics or
underlining.
• Words like usually, often, sometimes, occasionally,
seldom, etc., are "commonly" used in questionnaires,
although it is clear that they do not mean the same to all
people.
3. Response structure or format:
• A third major decision area is the degree and form of the
structure imposed on the responses.
• The options range from open (free choice of words) to
closed (specified alternatives).
a) Open Ended Questions:
– An open-ended (free response) question is one to which
respondents can give any answer.
• Open-ended (free response) questions in turn range from:
– Those in which the respondents express themselves
extensively, to
– Those in which the freedom is to choose one word in a
“fill in” question.

Advantages
– Permit an unlimited number of possible answers
– Respondents can answer in detail and can qualify and clarify
responses
– Permit creativity, self expression, etc.
Limitations
• Different respondents give different answers – responses
may not be consistent.
• Some responses may be irrelevant
• Comparison and statistical analysis become very difficult.
• Articulate and highly literate respondents have an
advantage
• Require a greater amount of respondent time, thought and
effort.
b) Closed Questions
– Although open response questions may have many
advantages, closed questions are generally preferable in large
surveys.
– Closed questions are often categorized as dichotomous or
multiple-choice questions.
Advantages
– Easier and quicker for respondents to answer
– Easier to compare the answers of different respondents
– Easier to code and statistically analyze
– Are less costly to administer
– Reduce the variability of responses
– Make fewer demands on interviewer skill, etc.
Limitations
– Can suggest ideas that the respondents would not otherwise
have
– Respondents can be confused because of too many choices
• During the construction of closed ended questions:
• The response categories provided should be exhaustive.
» They should include all the possible responses that
might be expected.
• In multiple choice type questions, the answer categories
must be mutually exclusive.
» The respondent should not be compelled to select
more than one answer.
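For closed questions with numeric categories (age or income brackets, for example), both requirements can be checked mechanically before the questionnaire is printed. The following is a minimal sketch with a hypothetical helper, `check_categories`, which assumes whole-number, inclusive brackets that lie within the stated range; the brackets shown are invented.

```python
def check_categories(brackets, lowest, highest):
    """List gaps (not exhaustive) and overlaps (not mutually exclusive).

    brackets: (low, high) pairs with inclusive whole-number bounds,
    assumed to lie within the range lowest..highest.
    """
    problems = []
    covered_up_to = lowest - 1
    for low, high in sorted(brackets):
        if low > covered_up_to + 1:
            problems.append(f"gap: {covered_up_to + 1}-{low - 1} has no category")
        elif low <= covered_up_to:
            problems.append(
                f"overlap: {low}-{min(high, covered_up_to)} fits more than one category"
            )
        covered_up_to = max(covered_up_to, high)
    if covered_up_to < highest:
        problems.append(f"gap: {covered_up_to + 1}-{highest} has no category")
    return problems


# Hypothetical age brackets for respondents aged 18 to 99.
flawed_brackets = [(18, 30), (30, 45), (50, 64)]
fixed_brackets = [(18, 29), (30, 44), (45, 64), (65, 99)]

for problem in check_categories(flawed_brackets, 18, 99):
    print(problem)
print("fixed brackets problems:", check_categories(fixed_brackets, 18, 99))
```

In the flawed version a 30-year-old fits two categories and a 47-year-old fits none; the fixed version gives every age exactly one category.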

4) Question Sequence – the order of the questions
• The order in which questions are asked can affect the
response as well as the overall data collection activity.
• Transitions between questions should be smooth.
– Grouping questions that are similar will make the
questionnaire easier to complete, and the respondent
will feel more comfortable.
• Items on a questionnaire should be grouped into logically
coherent sections.

• Questions that use the same response formats, or those that cover a
specific topic, should appear together.
– Each question should follow comfortably from the previous
question.
• Questions that jump from one unrelated topic to another are
not likely to produce high response rates.
• It may be necessary to present general questions before specific
ones in order to avoid response contamination.
• At the same time, it is important to group items into coherent
categories.
– All items should flow smoothly from one to the next.

5) Physical Characteristics of a Questionnaire
• The physical appearance of a questionnaire is as important as
the wording of the questions asked.
• An improperly laid out questionnaire can lead respondents
to miss questions and can confuse them.
– So, the questionnaire should be spread out properly.
• Putting more than one question on a line will result in
some respondents skipping the second question.
• Abbreviating questions will result in misinterpretation of
the question.
Formats for Responses
– A variety of methods are available for presenting a series of
response categories.
• Boxes
• Blank spaces
Providing Instructions
– Every questionnaire, whether self-administered by the
respondent or administered by an interviewer, should contain
clear instructions.

• General instructions: It is useful to begin a questionnaire with
basic instructions to be followed in completing it.
• Introduction: If a questionnaire is arranged into subsections it is
useful to introduce each section with a short statement
concerning its content and purpose.
• Specific Instructions: Some questions may require special
instructions to facilitate proper answering.
• Interviewer's instructions: It is important to provide clear
complementary instructions to the interviewer where appropriate.

6) Reproducing the questionnaire
• Having constructed the questionnaire, it is necessary to provide
enough copies for the actual data collection.
• A neatly reproduced instrument will encourage a higher
response rate, thereby providing better data.
• Use professional production methods for the
questionnaire.
• The final test of a questionnaire is to try it on representatives of
the target audience.
– If there are problems with the questionnaire, they almost
always show up here.

In Summary: Qualities of a Good Questionnaire
• There are good and bad questions and the qualities of a good
question are as follows:
1. Evokes the truth: Questions must be non-threatening.
• When a respondent is concerned about the consequences of
answering a question there is a good possibility that the answer
will not be truthful.
• Anonymous questionnaires that contain no identifying
information are more likely to produce honest responses than
those identifying the respondent.
• If your questionnaire does contain sensitive items, be sure to
clearly state your policy on confidentiality.

2. Asks for an answer on only one dimension.
• The purpose of a survey is to find out information.
• A question that asks for a response on more than one
dimension will not provide the information you are seeking.
• Example: a researcher investigating a new food snack asks "Do
you like the texture and flavor of the snack?"
• If a respondent answers "no", then the researcher will not
know if the respondent dislikes the texture or the flavor, or
both.
• Another questionnaire asks, "Were you satisfied with the
quality of our food and service?”

• Again, if the respondent answers "no", there is no way to know
whether the quality of the food, service, or both were
unsatisfactory.
 A good question asks for only one "bit" of information.
3. Can accommodate all possible answers.
• Multiple choice items are the most popular type of survey
questions because they are generally the easiest for a
respondent to answer and the easiest to analyze.

• Asking a question that does not accommodate all possible
responses can confuse and frustrate the respondent.
For example, consider the following question:
• What brand of computer do you own?
A. IBM PC
B. Apple
• Clearly, there are many problems with this question.
• What if the respondent doesn't own a microcomputer?
• What if he owns a different brand of computer?
• What if he owns both an IBM PC and an Apple?

• There are two ways to correct this kind of problem.
• The first way is to make each response a separate dichotomous
item on the questionnaire.
• For example: Do you own an IBM PC? (circle: Yes or No)
• Do you own an Apple computer? (circle: Yes or No)
• Another way to correct the problem is to add the necessary
response categories and allow multiple responses.
• This is the preferable method because it provides more
information than the previous method.
• What brand of computer do you own?
(Check all that apply)
• Do not own a computer
• IBM PC
• Apple
• Other, specify
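When a "check all that apply" item like this is entered for analysis, each option is usually stored as its own yes/no (1/0) variable. The sketch below shows one way this could be done; the option codes and the rule that "do not own a computer" cannot be combined with a brand are assumptions made for the illustration.

```python
# Hypothetical option codes for the computer-ownership question.
OPTIONS = ["none", "ibm_pc", "apple", "other"]


def code_multi_response(checked):
    """Turn the list of ticked options into one 0/1 indicator per option."""
    unknown = set(checked) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    if "none" in checked and len(set(checked)) > 1:
        raise ValueError("'none' cannot be combined with a brand")
    return {option: int(option in checked) for option in OPTIONS}


# Three invented respondents: owns both brands, owns nothing, owns another brand.
responses = [["ibm_pc", "apple"], ["none"], ["other"]]
coded = [code_multi_response(answer) for answer in responses]

for row in coded:
    print(row)
```

Coded this way, the single multiple-response item carries the same information as the separate dichotomous items, while the respondent sees only one question.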
4. Has mutually exclusive options.
• A good question leaves no ambiguity in the mind of the respondent.
• There should be only one correct or appropriate choice for the respondent to
make.

• An obvious example is:
• Where did you grow up?
A. Country side
B. Farm
C. City
• A person who grew up on a farm in the countryside would not
know whether to select choice A or B.
• This question would not provide meaningful information.
• Worse than that, it could frustrate the respondent and the
questionnaire might find its way to the trash.
5. Produces variability of responses.
• When a question produces no variability in responses, we are
left with considerable uncertainty about why we asked the
question and what we learned from the information.
• If a question does not produce variability in responses, it will
not be possible to perform any statistical analyses on the item.
• For example: What do you think about this report?
1. It's the worst report I've read
2. It's somewhere between the worst and best
3. It's the best report I've read

• Since almost all responses would be choice 2, very little
information is learned.
• Design your questions so they are sensitive to differences between
respondents.
As another example:
• Are you against drug abuse? (circle: Yes or No)
• Again, there would be very little variability in responses and
we'd be left wondering why we asked the question in the first
place.
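During a pilot test, items with too little variability can be flagged by looking at the share of respondents who chose the most common answer. The sketch below uses invented pilot data, and the 90% cut-off is an arbitrary assumption for illustration, not an established rule.

```python
from collections import Counter


def dominant_share(answers):
    """Share of respondents who gave the single most common answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


def lacks_variability(answers, threshold=0.9):
    """Flag an item when one answer dominates (threshold is an assumption)."""
    return dominant_share(answers) >= threshold


# Invented pilot data from 50 respondents per item.
report_item = [2] * 47 + [1] * 1 + [3] * 2                         # worst / in between / best
rating_item = [1] * 8 + [2] * 12 + [3] * 14 + [4] * 10 + [5] * 6   # five-point rating

print("report item dominant share:", dominant_share(report_item))
print("report item flagged:", lacks_variability(report_item))
print("rating item flagged:", lacks_variability(rating_item))
```

A flagged item is a candidate for rewording, for example by offering more finely graded response categories.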

6. Follows comfortably from the previous question.
• Grouping questions that are similar will make the
questionnaire easier to complete, and the respondent will
feel more comfortable.
• Questionnaires that jump from one unrelated topic to
another feel disjointed and are not likely to produce high
response rates.

7. Does not presuppose a certain state of affairs.
• Among the most subtle mistakes in questionnaire design are
questions that make an unwarranted assumption.
• An example of this type of mistake is:
• Are you satisfied with your current auto insurance? (Yes or
No)
• This question will present a problem for someone who does not
currently have auto insurance.
• Write your questions so they apply to everyone.
• This often means simply adding an additional response
category.

• Are you satisfied with your current auto insurance?
• Yes
• No
• Don't have auto insurance
• One of the most common mistaken assumptions is that the
respondent knows the correct answer to the question.
• Industry surveys often contain very specific questions that the
respondent may not know the answer to.
For example:
• What percent of your budget do you spend on direct mail
advertising?

• Very few people would know the answer to this question without
looking it up, and very few respondents will take the time and
effort to look it up.
• It is important to look at each question and decide if all
respondents will be able to answer it.
• Be careful not to assume anything.
• For example: the following question assumes the respondent
knows what Proposition 13 is about.
• Are you in favor of Proposition 13 ?
• Yes
• No
• Undecided

• If there is any possibility that the respondent may not know the
answer to your question, include a "don't know" response
category.
8. Does not imply a desired answer.
• The wording of a question is extremely important.
• We are striving for objectivity in our surveys and, therefore,
must be careful not to lead the respondent into giving the
answer we would like to receive.
• Leading questions are usually easily spotted because they use
negative phraseology.

As examples:
• Wouldn't you like to receive our free brochure?
• Don't you think the Congress is spending too much money?
9. Does not use emotionally loaded or vaguely defined words.
• This is one of the areas overlooked by both beginners and
experienced researchers.
• Quantifying adjectives (e.g., most, least, majority) are
frequently used in questions.
• It is important to understand that these adjectives mean
different things to different people.
10. Does not use unfamiliar words or abbreviations.
• Remember who your audience is and write your questionnaire
for them.
• Do not use uncommon words or compound sentences.
• Abbreviations are okay if you are absolutely certain that every
single respondent will understand their meanings.
• The following question might be okay if all the respondents are
accountants, but it would not be a good question for the general
public.
• What was your AGI last year?

11. Does not ask the respondent to order or rank a series of more
than five items.
• Questions asking respondents to rank items by importance
should be avoided.
• This becomes increasingly difficult as the number of items
increases, and the answers become less reliable.
• In order to successfully complete this task, the respondent
must mentally continue to re-adjust his answers until they
total one hundred percent.
• Limiting the number of items to five will make it easier for
the respondent to answer.

Chapter Five
Data Processing and Analysis

• Data analysis ranges from very simple summary statistics to
extremely complex multivariate analyses.
Data Preparation and Presentation
• Data processing starts with the editing, coding, classifying
and tabulation of the collected data.
i) Editing
– Editing of data is the process of examining the collected raw data to detect
errors and omissions.
• Editing involves a careful scrutiny of the completed questionnaires.
– In general one edits to assure that the data are:
• Accurate: correct in all details and capable of reaching the
intended target
• Consistent with other information/facts gathered
• Uniformly entered
• Editing can be done at two levels: in the field and in the office.
a) Field level Editing
• After an interview, field workers should review their
reporting forms, complete what was abbreviated, translate
personal shorthand, rewrite illegible entries, and make
callbacks if necessary.
b) Central editing
• The central editing takes place when all forms have been
completed and returned to the office.
• Data editors correct obvious errors such as entry in wrong
place, recorded in wrong units, etc.
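A minimal sketch of how central editing might be supported by simple automated checks. The field names and plausible ranges below are hypothetical, chosen only for illustration; a real study would take them from its own questionnaire.

```python
# Hypothetical validation rules: field name -> (minimum, maximum) plausible value.
RULES = {"age": (15, 100), "household_size": (1, 30)}

def flag_problems(record, rules=RULES):
    """Return a list of (field, problem) pairs for one completed questionnaire."""
    problems = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append((field, "missing"))        # omission
        elif not isinstance(value, (int, float)):
            problems.append((field, "not numeric"))    # entry in the wrong place
        elif not low <= value <= high:
            problems.append((field, "out of range"))   # wrong units or keying error
    return problems

# Hypothetical records for illustration only.
records = [
    {"id": 1, "age": 34, "household_size": 5},
    {"id": 2, "age": 340, "household_size": 4},
    {"id": 3, "age": 27, "household_size": None},
]

for record in records:
    print(record["id"], flag_problems(record))
```

Checks of this kind only flag suspicious entries; deciding whether to correct a value or make a callback remains the editor's judgment.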

ii) Coding
• Coding refers to the process of assigning numerals to
answers so that responses can be put into a limited number
of categories or classes.
• Data are transcribed from a questionnaire to a coding sheet.
• The coding must be:
– Appropriate, which implies that the classes or categories
must provide the best partitioning of data for testing
hypothesis and showing relationships.
– Exhaustive - there must be a class for every data item.
– Mutual exclusivity – category components should be
mutually exclusive meaning that specific answers can be
placed in one and only one cell in a given category set.
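The coding rules above can be sketched as follows. The codebook is a hypothetical example for an employment-status question; the "other" and "no answer" categories are assumptions added to keep the scheme exhaustive.

```python
# Hypothetical codebook for the question "What is your employment status?"
CODEBOOK = {
    "employed": 1,
    "unemployed": 2,
    "student": 3,
    "other": 8,        # catch-all category keeps the scheme exhaustive
    "no answer": 9,
}

def code_response(answer, codebook=CODEBOOK):
    """Map one raw answer to exactly one numeric code."""
    key = answer.strip().lower() if answer else "no answer"
    return codebook.get(key, codebook["other"])

def is_mutually_exclusive(codebook):
    """No two categories may share a code, so each answer lands in one cell."""
    return len(set(codebook.values())) == len(codebook)

# Hypothetical raw answers as they might appear on questionnaires.
raw_answers = ["Employed", "student", "", "Retired", "unemployed"]
coded = [code_response(a) for a in raw_answers]
print(coded)
```

Because every answer falls into some category and each category has its own code, the coded list can go straight into tabulation.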

iii) Classification and Tabulation
• Most research studies result in a large volume of raw data,
which must be reduced into homogenous groups if we are
to get meaningful relationships.
• Classification is the process of arranging data in groups or classes
on the basis of common characteristics.
• Data having common characteristics are placed in similar
classes and in this way the entire data get divided into a
number of groups or classes.
• Tabulation is the process of summarizing raw data and displaying
it in compact form (i.e. in the form of statistical tables) for further
analysis.
• It is an orderly arrangement of data in columns and rows.

• Tabulation may be done by hand or by mechanical or electronic
devices such as the computer.
• The choice is made largely on the basis of the size and type of
study, alternative costs, time pressures and the availability of
computer facilities.
• In the case of computer tabulation, programs such as
SPSS, Lotus, Excel, Stata, etc. could be used.
• Tabulation may be classified as simple and complex.
– Simple tabulation gives information about one or more groups
of independent questions.
– Complex tabulation shows the division of data into two or
more categories.
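As an illustration, the sketch below reads simple tabulation as a one-way frequency count and complex tabulation as a two-way cross-tabulation; that reading, and the respondent data, are assumptions made for the example.

```python
from collections import Counter

# Hypothetical coded responses: (sex, employment status) for eight respondents.
responses = [
    ("female", "employed"), ("male", "employed"), ("female", "unemployed"),
    ("male", "employed"), ("female", "employed"), ("male", "unemployed"),
    ("female", "employed"), ("male", "student"),
]

# Simple tabulation: frequency of a single variable.
simple = Counter(status for _, status in responses)

# Complex (cross) tabulation: frequency of each combination of two variables.
cross = Counter(responses)

rows = sorted({sex for sex, _ in responses})
cols = sorted({status for _, status in responses})

print("Simple tabulation:", dict(simple))
print("sex".ljust(8) + "".join(c.rjust(12) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(cross[(r, c)]).rjust(12) for c in cols))
```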

• Tabulation provides the following advantages:
– It conserves space and reduces explanatory and descriptive
statements to a minimum.
– It facilitates the process of comparison.
– It facilitates the summation of items and the detection of
errors and omissions.
– It provides a basis for various statistical computations such as
measures of central tendency, dispersion, etc.
II) Data Analysis
– Large volumes of raw statistical information need to be
reduced to more manageable dimensions if one is to see
meaningful relationships in them.

– Data analysis is the computation of certain indices or measures.
• It refers to the computation of certain measures along with
searching for patterns of relationship that exist among data
groups.
• Data can be analyzed qualitatively or quantitatively.
Quantitative data analysis
• Was the data collected using a random or non-random
sample?
– If it was non-random, non-parametric data analysis
techniques are generally the safer choice;
– if random, parametric techniques may be appropriate,
provided their distributional assumptions also hold.

– Were the samples dependent (related) or independent?
• Samples are said to be dependent (related) when the
measurements in one sample are linked to those in the
other, for example when the same units are measured twice.
• Samples are independent if the measurements taken
from one sample do not affect those from another
sample.
– Do the data have characteristics that permit the
application of parametric tests? That is:
• Were the observations drawn from a normally
distributed population?

• Do the sets of data being compared have approximately equal
variances (homogeneity of variances)?
• Is there a relationship between the variable that distinguishes the
rows and the variable that distinguishes the columns?
• Analysis can also be categorized as descriptive analysis or
inferential analysis (statistical analysis).
• With respect to the number of variables involved in the analysis,
it can also be divided into uni-variate analysis and multivariate
analysis.

Uni-variate Analysis
– Uni-variate analysis refers to the analysis with respect to one
variable.
– It is also called a one-dimensional analysis.
– The uni-variate analysis could either be presented in the form
of statistical measures such as measures of central tendencies
and measure of variations or in the form of graphs.
• Graphical illustrations could also be used to demonstrate
the frequency distribution (histograms, ogives, polygons,
bar graphs, line graphs and circular graphs or pie charts).

Table of Summary Statistics
– The initial uni-variate analysis may be the presentation of
descriptive analysis in the form of summary statistics.
– Should show mean, standard deviation, smallest value, largest
value, and number of observations for every variable used
– The purpose is to make sure that the data in the sample look
“reasonable”
– If they don't, you should say something about it

Example:
Variable                     Mean  Std. Dev.       Min.       Max.  No. Obs
GDP/k                      9.2265     0.7659     7.4434    10.2058      309
Per Capita Growth          2.2510     3.8311   -16.2883    11.2552      309
Wage Share                 0.6462     0.0824     0.4020     0.7944      309
Pct. Secondary Educ.      30.5487    19.6497     2.4200    83.5000      309
Pct. Secondary Educ.      47.1290    21.6031     5.5340    91.7428      309
I/Y                       22.2633     7.5326     3.0337    42.6151      309
G/Y                       14.9447     7.4135     3.0603    46.6381      309
Population Growth          1.2508     1.0495    -1.8074     3.5566      309
Pct Urban                 61.9448    19.2422    10.9500    96.3800      309
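A table of this kind can be produced with a few lines of code. The sketch below uses Python's standard `statistics` module; the variable names and values are hypothetical and are not the data behind the table above.

```python
import statistics

def summarize(data):
    """Return mean, sample standard deviation, min, max and N for each variable."""
    table = {}
    for name, values in data.items():
        table[name] = {
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),   # sample (n - 1) standard deviation
            "min": min(values),
            "max": max(values),
            "n": len(values),
        }
    return table

# Hypothetical values for illustration only.
data = {
    "wage_share": [0.61, 0.66, 0.58, 0.72, 0.65],
    "pct_urban": [45.0, 62.5, 71.0, 38.2, 80.3],
}

for name, row in summarize(data).items():
    print(f"{name:<12}{row['mean']:>10.4f}{row['std']:>10.4f}"
          f"{row['min']:>10.4f}{row['max']:>10.4f}{row['n']:>6d}")
```

Scanning the output for impossible values, such as a minimum above the mean or a share outside 0 to 1, is a quick way to check that the sample looks reasonable.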
Multivariate Analysis
– Multivariate analysis involves the considerations of two or
more variables.
• If we have two variables then we have bi-variate analysis,
but if we have more than two variables then we have
multivariate analysis.
– Several multivariate analyses could be undertaken such as the
construction of bi-variate tables or multivariate analysis such
as multiple regressions, ANOVA, discriminant analysis,
probit and logit analyses, canonical analysis, etc.

For instance: The Regression Framework
– The usual framework:
y = xb + e
• x is the independent variable (or variables) and y is the
dependent variable
• It is assumed that x causes y
• b measures the effect of a unit increase in x on y
• e is an error term that reflects unexplainable influences
on y. It has a mean of zero.

Example:
• y is economic output per worker

• x is a set of variables explaining output:
• x1 is average education
• x2 is hours worked per week
• x3 is capital stock per worker
• e is a compound of relevant but unobservable
quantities: Natural resources; Weather; Technology.
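To make the framework concrete, here is a minimal sketch of least-squares estimation with a single explanatory variable. Adding an intercept (y = a + b*x + e) and the data values are assumptions for illustration; a real analysis would use several regressors and an econometrics package.

```python
def ols_simple(x, y):
    """Least-squares intercept a and slope b for y = a + b*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # effect of a unit increase in x on y
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: average years of education (x) and output per worker (y).
education = [4, 6, 8, 10, 12]
output = [11.0, 15.2, 18.9, 23.1, 26.8]

a, b = ols_simple(education, output)
residuals = [yi - (a + b * xi) for xi, yi in zip(education, output)]
print(f"intercept = {a:.3f}, slope = {b:.3f}")
print(f"mean residual = {sum(residuals) / len(residuals):.6f}")
```

The residuals play the role of e: with an intercept in the model they average zero by construction, which mirrors the assumption that the error term has a mean of zero.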

• Several Econometric problems are expected.
– Sample Selectivity
– Misspecification
– Omitted Variables
– Fixed Effects
– Endogenous Variables
• Appropriate remedial measures need to be considered for these
problems.

Pitfalls in Data Analysis
The problem with statistics
• Three broad classes of statistical pitfalls.
– The first involves sources of bias. These are conditions or
circumstances which affect the external validity of statistical
results.
– The second category is errors in methodology, which can
lead to inaccurate or invalid results.
– The third class of problems concerns interpretation of results
- how statistical results are applied (or misapplied) to real
world issues.

1. Sources of Bias
• The core value of statistical methodology is its ability to assist
one in making inferences about a population based on
observations of a smaller subset of that group.
• In order for this to work correctly,
– the sample must be similar to the target population in all
relevant aspects (representative sampling);
• Representative sampling. This is one of the most fundamental
tenets of inferential statistics:
– the observed sample must be representative of the target
population in order for inferences to be valid.

• The ideal scenario would be selecting members of the population
at random, with each member having an equal probability of being
selected for the sample.
• The sample "parallels" the population with respect to certain key
characteristics which are thought to be important to the
investigation at hand.
– the problem comes in applying this principle to real world
situations.
• Statistical assumptions. The validity of a statistical procedure
depends on certain assumptions it makes about various aspects of
the problem.
• For instance, linear methods depend on the assumptions of
normality and independence.

• If the distributions are non-normal, try to figure out why; if it is
due to a measurement artifact try to develop a better measurement
device.
– Another possible method for dealing with unusual distributions
is to apply a transformation, such as taking logarithms.
– However, this has dangers as well; an ill-considered
transformation can do more harm than good in terms of
interpretability of results.
• The assumption regarding independence of observations is more
troublesome, because it is so frequently violated in practice.
– Observations which are linked in some way may show some
dependencies.
• One way to try to get around this is to aggregate cases to the
higher level.
2. Errors in methodology
• The most common hazards include designing experiments with
insufficient power, ignoring measurement error, and performing
multiple comparisons.
• Statistical Power. The power of your test generally depends on the
sample size, the effect size you want to be able to detect, the alpha
you specify, and the variability of the sample.
– Based on these parameters, you can calculate the power level of
your experiment.
• Similarly you can specify the power you desire (e.g. 0.80), the alpha
level, and use the power equation to determine the proper sample
size for your experiment.
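As a rough illustration of such a calculation, the sketch below uses the normal approximation for a two-sided comparison of two group means, n = 2(z_{1-alpha/2} + z_{power})^2 / d^2 per group, where d is the standardized effect size. The choice of this particular test and formula is an assumption; other designs need other power equations.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (difference / standard deviation).
    """
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of the same test for a given n per group."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)

for d in (0.2, 0.5, 0.8):
    n = sample_size_per_group(d)
    print(f"d = {d}: n per group = {n}, power = {approximate_power(d, n):.3f}")
```

Smaller effects need much larger samples: under this approximation, halving the effect size roughly quadruples the required n.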

• If you have too little power, you run the risk of overlooking the
effect you're trying to find.
• If your sample is too large, nearly any difference, no matter how
small or meaningless from a practical standpoint, will be
"statistically significant".
Measurement error. Most statistical models assume error free
measurement.
– However, measurements are seldom if ever perfect.
• Two characteristics of measurement matter here: reliability and validity.
Reliability refers to the ability of a measurement instrument to
measure the same thing each time it is used.

• So, a reliable measure should give you similar results.
– If the characteristic being measured is stable over time,
repeated measurement of the same unit should yield consistent
results.
Validity is the extent to which the indicator measures the thing it was
designed to measure.
– Validity is usually measured in relation to some external
criterion.

3. Problems with interpretation
– There are a number of difficulties which can arise in the
context of interpretation.
• Significance (in the statistical sense) is as much a function of
sample size and experimental design as it is of the strength of
the relationship.
• With low power, you may be overlooking a really useful
relationship; with excessive power, you may be finding
microscopic effects with no real practical value.

Multiple Variables and Confounds
• It would make our life simpler if every effect variable had only
one cause, and it co-varied only with one other variable.
• Unfortunately, this is hardly ever the case.
• If we have a number of interrelated variables, then it becomes
difficult to sort out how variables affect each other.
• It is easy to confuse one cause with another, or to attribute all
changes to a single cause when many causal factors are
operating.
– Having multiple variables related to each other obscures
the nature of covariance relationships.

• The process of determining whether a relationship exists between
two variables requires first that we establish covariance between
two variables.
• In addition to verifying that the two variables change in
predictable, non-random patterns, we must also be able to
discount any other variable or variables as sources of the change.
• To establish a true relationship, we must be able to confidently
state that we observed the relationship under conditions which
eliminated the effects of any other variables.
• Failure to properly control for confounding variables is a common
error.

• We must take steps to control all confounding variables, so that
we can avoid making misestimates of the size of relationships, or
even draw the wrong conclusions.
Controlling for Confounding Variables
• We can first organize the universe of variables and reduce
them by classifying every variable into one of two categories:
Relevant or Irrelevant to the phenomenon being investigated.
• The relevant variables are those which are important to
understand the phenomenon, or those for which a reasonable case
can be made.
– Example: if the literature tells us that Consumption
Expenditure is associated with income, then we will consider
income to be a relevant variable.

• If we have not included the relevant variable in our analysis it can
be because of different reasons.
• One reason we might choose to exclude a variable is because
we consider it to be irrelevant to the phenomenon we are
investigating.
• If we classify a variable as irrelevant, it means that it has no
systematic effect on any of the variables included.
– Irrelevant variables require no form of control, as they are not
systematically related to any of the variables in our model, so
they will not introduce any influence.

• Two basic reasons why relevant variables might be excluded:
– First, the variables might be unknown.
• We might have overlooked some relevant variables, but the
fact that we have missed these variables does not mean that
they have no effect.
• Another reason for excluding relevant variables is because they are
simply not of interest.
– Although the researcher knows that the variable affects the
phenomenon being studied, he does not want to include its
effect in the model.

• Finally, there remain two kinds of variables which are explicitly
included in our hypothesis tests.
– The first are the relevant, interesting variables which are
directly involved in our hypothesis test.
– The second is called a control variable.
• The control variable is included because it affects the relevant
variables and we need to control for its effect.
Methods for Controlling Confounding Variables
– The effects of confounding variables can be controlled with
three basic methods: manipulated control, statistical control,
and randomization.

Manipulated Control
– Manipulated control essentially changes a variable into a
constant.
– We eliminate the effect of a confounding variable by not
allowing it to vary. If it cannot vary, it cannot produce any
change in the other variables.
– If we can hold all confounding variables constant, we can be
confident that any difference observed between two groups is
indeed due to the explanatory variable and not due to the other
variables.
– So, Manipulated control prevents the controlled variables from
having any effect on the dependent variable.

Statistical Control
• With this method of control, we include the confounding variable
into the research design as an additional measured variable, rather
than forcing its value to be a constant.
• So, we will be working with three (or more) variables and not
two: the independent and dependent variables, plus the
confounding (or control) variable or variables.
• The effect of the control variable is mathematically removed from
the effect of the independent variable, but the control variable is
allowed to vary naturally.
• This process yields additional information about the relationship
between the control variable and the other variables.

• In general, statistical control provides us with much more
information about the problem we are researching than does
manipulated control.
• But advantages in one area usually have a cost in another, and this
is no exception.
– An obvious drawback of the method lies in the increased
complexity of the measurement and statistical analysis which
will result from the introduction of larger numbers of
variables.
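One way to see what "mathematically removed" means is to strip the linear effect of the control variable from both the independent and the dependent variable, then relate what is left. The sketch below does this with simple least-squares fits; the data are hypothetical and constructed so that y = 2x + 3z exactly, with x correlated with the control z.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def residuals(x, y):
    """Part of y that is not linearly explained by x."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def controlled_slope(x, y, z):
    """Slope of y on x after removing the linear effect of control z from both."""
    return fit_line(residuals(z, x), residuals(z, y))[1]

# Hypothetical data: z is the control variable, x the independent variable.
z = [1, 2, 3, 4, 5, 6]
x = [2, 1, 4, 3, 6, 5]
y = [2 * xi + 3 * zi for xi, zi in zip(x, z)]

print("naive slope of y on x:   ", round(fit_line(x, y)[1], 3))
print("slope of y on x given z: ", round(controlled_slope(x, y, z), 3))
```

The naive slope mixes the effect of x with that of z, while the controlled slope recovers the effect of x alone. Multiple regression gives the same result in one step.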

Randomization
• The third method of controlling for confounding variables is to
randomly assign the units of analysis (experimental subjects) to
experimental groups or conditions.
• The rationale for this approach is straightforward: any
confounding variable will have its effects spread evenly across
all groups, and so it will not produce any consistent effect that
can be confused with the effect of the independent variable.
• This is not to say that the confounding variables produce no
effects in the dependent variable—they do.

• But the effects are approximately equal for all groups, so the
confounding variables produce no systematic effects on the
dependent variable.
• The major advantage of randomization is that we can assume that
all confounding variables have been controlled.
• Even if we fail to identify all the confounding variables, we will
still control for their effects.
• These confounding variables are also allowed to vary naturally, as
they would in the real world.

• Since we don’t actually measure the confounding variables, we
assume that randomization produces identical effects from all
confounding variables in all groups, and that removes any
systematic confounding effects of these variables.
• But any random process may result in disproportionate outcomes
occasionally.
• Example: If we flip a coin 100 times, we will not always see
exactly 50 heads and 50 tails.
• Sometimes we will get 60 heads and 40 tails, or even 70 tails
and 30 heads.

• Consequently, we have no way of knowing with absolute
certainty that the randomization control procedure has actually
distributed identically the effects of all confounding variables.
– We are only trusting that it did.
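The point can be illustrated with a small simulation. The sketch below repeatedly assigns hypothetical subjects to two groups at random and records the gap in an unmeasured confounder between the groups; the subject data, the group sizes and the seeds are assumptions made for the example.

```python
import random

def randomize(units, seed=None):
    """Randomly split units into a treatment and a control group of equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean(values):
    return sum(values) / len(values)

# Hypothetical subjects with an unmeasured confounder (for example, income).
rng = random.Random(0)
subjects = [{"id": i, "income": rng.gauss(1000, 200)} for i in range(200)]

# Repeat the assignment many times and record the between-group gap in mean income.
gaps = []
for trial in range(1000):
    treated, control = randomize(subjects, seed=trial)
    gaps.append(mean([s["income"] for s in treated])
                - mean([s["income"] for s in control]))

print(f"average gap over 1000 assignments: {mean(gaps):.2f}")
print(f"largest gap in a single assignment: {max(abs(g) for g in gaps):.2f}")
```

Across many assignments the gap averages out close to zero, but any single assignment can leave the groups noticeably unbalanced, which is the risk described above.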
• But, with manipulated control and statistical control, we can be
completely confident that the effects of the confounding variables
have been distributed so that no systematic influence can occur,
because we can measure the effects of the confounding variable
directly.
• There is no chance involved.

• We assume that we’ve eliminated the systematic effects of the
confounding variables by ensuring that these effects are
distributed across all values of the relevant variables.
• But we have not actually measured or removed these effects—
the confounding variables will still produce change in the
relevant variables.

[Figure 4-3: Methods of controlling for confounding variables. Tree diagram: all variables divide into irrelevant (no control needed) and relevant; relevant variables are either included (interesting or controlled) or not included (unknown or uninteresting), with statistical control, manipulated control and randomization as the control methods. Source: Chapter 4, Testing Hypotheses: Confounds and Controls.]

Chapter Six
Writing the Research Report
Introduction
• Researchers spend an immense amount of time designing
projects, developing questionnaires, collecting and analyzing
data, and formulating possible policy implications.
• But hard work and excellence alone do not guarantee that public
policy research will have impact.
• In order to have impact, good research alone is insufficient.
– It must be communicated to the right people

The Writing Process
• The intrinsic value of a study can be easily destroyed by a
poor final report or presentation.
• A well-presented study can impress the reader more than
another study with greater scientific quality but a weaker
presentation.
• Hence, researchers must make special efforts to communicate
clearly and fully their research results.
• Writing is a process: it takes time and effort, and it improves
with practice.

• When writing the research report it would be important to
consider:
• What is the purpose of the report?
• Who will read the report?
• What are the circumstances and limitations under which
the report was written?
• How will the report be used? etc.

Generally the writing process has three major steps:
i) Pre-writing
• Prepare to write by arranging notes on the literature, making
lists of ideas, outlining, completing bibliographic citations,
and organizing comments on data analysis.
ii) Composing
• Get your ideas onto paper as a first draft by free-writing,
drawing up the bibliography and footnotes, preparing data
for presentation, and forming of an introduction and
conclusion.

iii) Rewriting
• Evaluate and polish the report by improving coherence,
proofreading for mechanical errors, checking citations, and
reviewing voice and usage.
• This step actually involves two related procedures: revising and
editing.
• Revising – is the process of inserting new ideas, adding
supporting evidence, deleting or changing existing ideas, and
strengthening transitions and links between ideas.
• Editing – is the process of cleaning up and tightening and
involves the mechanical aspects of writing such as spelling,
grammar usage, verb tense, sentence length and paragraph
organization.

Types of Research Reports
• We may have:
• Short reports and long reports.
• A) Short Reports
• Short reports are more informal and are appropriate for
studies in which the problem is well defined, of limited scope
and for which methodologies are simple and straightforward.
– Example: interim reports.
• At the beginning, there should be a brief statement on the
problem examined and its breadth and depth.
• Next come the conclusions and recommendations, followed by
findings that support the conclusions.

B) Long Reports
• Long reports are more detailed and follow well-defined formats.
– They are of two types, the technical or base report and
the popular report.
• Which of these to use depends chiefly on the audience and
the researcher’s objectives.
i) The technical report
• This report should include full documentation and detail - it
is the major source document.
• It is the report that has the full story of what was done and
how it was done.

• It contains information on the:
• sources of the data,
• sampling design,
• data gathering instruments,
• data analysis methods, as well as
• a full presentation and analysis of the data.
• Conclusions and recommendations should be clearly related to
specific findings.
ii) The popular report
• The popular report is designed for a non-technical
audience that has no research background and may be
interested only in results rather than methodology.
• Decision makers need help in making decisions.

• Popular report should encourage rapid reading, quick
comprehension of major findings and prompt understanding of
the implication and conclusions.
Report format for long reports
• Two arrangements are typically used – the logical format
and the psychological format.
The logical format
• The introductory information covering the purpose of the
study, the methodology and limitations is followed by the
findings.
• The findings are analyzed and then followed by the
conclusions and recommendations.

The psychological format
• This is largely an inversion of the logical order and
is mostly used in popular reports.
• The conclusions and recommendations are presented
immediately after the introduction with the findings coming
later.
• Readers are quickly exposed to the most critical information – the
conclusions and recommendations.
• If they wish to go further they may read on into the findings, which
support the conclusion clearly given.
• Other report formats include the chronological report, which is
based on time sequence or occurrence.

II) Components of a technical report
– A research report contains several components or elements,
although some may be dropped, others may be added, and their
order may vary from one situation to another.
• In general there are three parts: the prefatory pages, the
body of the report and the appended sections.
A) Prefatory pages – this section includes the title page, letters of
authorization (if any), table of contents, lists of charts and
illustrations, and synopsis (summary, abstract).

The Title page – the title page should include four items: the title of
the report, the date, for whom prepared and by whom the report
was prepared.
– A satisfactory title should be brief, but should at least include:
• The variables included in the study, the type of relationship
between the variables, and the population to which the
results may be applied.
The table of contents – any report of several sections that totals
more than six to ten pages should have a table of contents.

Abstract – this is a short summary.
– For conference papers, research papers, theses and
dissertations, you will almost always be asked to write an
abstract.
– It goes first in the report, but should be written last.
– It helps the reader determine whether the full report contains
important information.
• It is essential that your abstract includes all the keywords of your
research.

• An abstract should briefly:
– Re-establish the topic of the research.
– Give the research problem and/or main objective of the
research.
– Indicate the methodology used.
– Present the main findings and conclusions
• The main point to remember is that it must be short, because it
should give a summary of your research.

Common Problems in preparing the Abstract
– Too long and Too much detail. Abstracts that are too long
often have unnecessary details.
• The abstract is not the place for detailed explanations of
methodology or the context of your research problem.
– Too short. Shorter is not necessarily better. You should review
your abstract and see where you could usefully give more
explanation.

B) The body of the report – Contains the introduction, findings,
summary and conclusions and recommendations.
1) Introduction – the introduction comes at the start of the writing.
• The purpose should be to interest your reader in the problem
and motivate your approach
• It will mostly contain the same material as the introduction
to your proposal
– It introduces the research by situating it (by giving the
background), presenting the research problem, indicating
the objectives, as well as the rationale or significance and
the scope and limitations.

• The introduction should flow from beginning to the end
– Each paragraph should flow smoothly from the previous one
• Introducing the rest of the report
• The last paragraph of the introduction should explain the
organization of the rest of the report
• Example: “Section two reviews the relevant literature. In
Section three, we describe the data we have collected. In Section
four, we test our hypothesis using this data. Section five draws
conclusions and makes recommendations for future research.”

Common problems in writing the introduction
• Too much detail, and hence too long:
– Although you will cover important points, detailed descriptions
of method, study site and results should come in later sections.
• Repetition of words, phrases or ideas. A high level of
repetition makes your writing look careless.
• Unclear problem definition. Without a clear definition of your
research problem, your reader is left with no clear idea of what
you were studying.
• Poor organization

2. Literature Review
• The report also frequently includes a literature review and links
the problem with theory.
• Literature means the works you consulted in order to
understand and investigate your research problem.
• Can be more or less cut and pasted from your proposal
• Remember, it should justify the following ideas:
• Other people are interested in the general topic
• Other studies left the problem unsolved which leaves a gap
in the literature
• Your study fills the gap at least a little bit

• Journal articles: These are good especially for up-to-date
information.
• Books: books tend to be less up-to-date as it takes longer for a
book to be published than for a journal article.
– Text books offer a good starting point from which to find
more detailed sources.
• Conference proceedings: These can be useful in providing the
latest research, or research that has not been published.
– They also provide information on which people are currently
involved.

• Government/corporate reports: Many government departments
and corporations commission or carry out research.
• Newspapers: Since newspapers are generally intended for a
general (not specialized) audience, the information they provide
will be of very limited use for your literature review- but can be a
starting point.
• Theses and dissertations: These can be useful sources of
information.
• Internet: The fastest-growing source of information is the
Internet.

• With regard to the internet, remember that:
• Anyone can post information on the Internet so the quality
may not be reliable,
• The information you find may be intended for a general
audience and hence less detailed, and
• More and more refereed electronic journals (e-journals) are
appearing on the Internet - the quality is more reliable
(depending on the reputation of the journal).

• CD-ROMS: More and more bibliographies are being put onto
CD-ROM for use in academic libraries, so they can be a very
valuable tool in searching for the information you need.
• Magazines: Magazines intended for a general audience are
unlikely to be useful in providing the sort of information you need.
– Magazines may be a starting point by providing news or
general information about new discoveries, policies, etc. that
you can further research in more specialized sources.

Common Problems:
• Trying to read everything: If you try to be comprehensive you will
never be able to finish the reading!
– The literature review should not provide a summary of all the
published work that relates to your research, but a survey of the
most relevant and significant work.
• Reading but not writing: Writing takes much more effort than
reading; don't put writing off until you've "finished" reading.
• Not keeping bibliographic information: When preparing your
reference list you may notice that you have forgotten to keep the
information you need.
– To avoid this nightmare, always put the references into your
writing as you go.

3. The methods: Answers at least two main questions:
• How was the data collected or generated?
• How was it analyzed?
• The data collection step should cover at least four items:
(i) the target population that is being studied and any sampling
methods used.
(ii) the research design used and the rationale for using it
including the sample size,
(iii) the materials and instruments used often with a copy of these
materials in the appendix,
(iv) the specific data collection method (survey, observation or
experiment)
• Knowing how the data was collected helps you to evaluate the
validity and reliability of your results, and the conclusions you
draw from them.
• Your methodology should make clear the reasons why you chose a
particular method or procedure.
Common Problems
• Unnecessary explanation of basic procedures
• Problem blindness: Do not ignore significant problems or pretend
they did not occur.
• Often, recording how you overcame obstacles can form an
interesting part of the methodology.

4) Findings and Discussions – It is an organized presentation of
results and is generally the longest section of the report.
• The Results Section includes:
– Statement of results: The results are presented in a format that is
accessible to the reader (e.g. in graphs, tables, diagrams or written
text).
– Explanatory text: All graphs, tables, diagrams and figures should
be accompanied by text that guides the reader's attention to
significant results.
• The text makes the results meaningful by pointing out the most
important results, simplifying the results, highlighting significant
trends or relationships and perhaps commenting on whether
certain results were expected or unexpected.

The Discussion Section:
• In the discussion section we talk about what we see in the data and
give the reader unambiguous interpretation of its meaning.
– The discussion section provides explanation of the results and
includes:
• Explanation of results: the writer comments on whether or not
the results were expected, and presents explanations for the
results, particularly for those that are unexpected or unsatisfactory.
• References to previous research: comparison of the results with
those reported in the literature, or use of the literature to support a
claim or a hypothesis.
• Deduction: a claim for how the results can be applied more
generally.

5) Summary and Conclusion – the summary is a brief restatement
of the essential findings.
• The summary section presents:
• What was learned
• What remains to be learned (directions for future research)
• The shortcomings of what was done (evaluation)
• The benefits, advantages, applications, etc. of the research
(evaluation), and
• Recommendations.
• The conclusions and recommendations should follow logically
from the discussion of the findings.

Common Problems
Too long: The conclusion section should be short.
– The conclusion section can be as little as 2.5% of the entire
piece.
Too much detail: Conclusions that are too long often have
unnecessary detail.
– Although you should give a summary of what was learnt from
your research, this summary should be short, since the
emphasis in the conclusions section is on the implications,
evaluations, etc. that you make.

Failure to comment on larger, more significant issues: Whereas
in the introduction your task was to move from general (your
field) to specific (your research), in the concluding section your
task is to move from specific (your research) back to general
(your field, how your research will affect the world).
Failure to reveal difficulties encountered: Negative aspects of
your research should not be ignored.
– Problems, drawbacks etc. can be included in summary in your
conclusion section as a way of qualifying your conclusions
(i.e. pointing out the negative aspects, even if they are
outweighed by the positive aspects).

6) Recommendations – this involves suggested future actions.
• It makes easy reading for an outsider if the recommendations are
again placed in roughly the same sequence as the conclusions.
• The recommendations could be for further study, to test, deepen
or broaden understanding in the subject area or for managerial
actions.
• The recommendations should take into consideration the local
conditions, constraints, feasibility and usefulness of the proposed
solutions.

7) The appended section – this includes appendix and bibliography.
i) Appendix – complex tables, statistical tests, supporting
documents, copies of forms used, detailed description of the
methodology, instructions to field workers, and any other
evidence that may be important.
– The annexes should contain any additional information needed
to enable professionals to follow your research procedures and
data analysis.

• Examples of information that can be presented in annexes are:
• tables referred to in the text but not included in order to
keep the report short;
• lists of study sites, - districts, villages, etc. that participated
in the study;
• questionnaires or check lists used for data collection.

ii) Bibliography – There should be a bibliographic section if the
study makes heavy use of secondary material.
• This section should contain all the works that the researcher
has consulted.
• It should be arranged alphabetically and may be divided into two
parts.
• The first part may contain names of books and pamphlets and the
second part may contain names of magazines and newspaper
articles.
• There may be several bibliographic entry formats. The following
is one of such entry formats.

• For books and Pamphlets the following order may be adopted.
Ø Name of the principal author, last name first
Ø Title, underlined or in italic styles
Ø Place, publisher and date of publication
Ø Number of volumes.
• Example: Kothari, C. R. Quantitative Techniques, New Delhi,
Vikas publishing house Pvt ltd. 1978.

• For magazines, Journal articles and newspapers the following
order is appropriate
Ø Name of author, last name first
Ø Title of article in quotation marks
Ø Name of periodical, underlined
Ø The volume or volume and number
Ø The date of the issue
Ø The pagination (page numbers)
• Example: Christensen, L. R., D. W. Jorgenson and L. J. Lau,
“Transcendental Logarithmic Production Frontiers,” Review of
Economics and Statistics, 55(1), 1973, 28–45.

• The references in your text can be numbered in the sequence in
which they appear in the report and then listed in this order in the
list of references (Vancouver system).
• Another possibility is the Harvard system of listing in brackets
the author’s name(s) in the text followed by the date of the
publication and page number, for example: (Shan, 2000: 84).
• You can choose either system as long as you use it consistently
throughout the report.

Presentation Considerations
• Reports should be physically inviting, easy to read and match the
comprehension abilities of the designated audiences (reader).
• (1) Style of writing: Remember that your reader:
– Is short of time
– Has many other urgent matters demanding his or her interest
and attention
– Is probably not knowledgeable concerning ‘research jargon’
• It is always good to use words that convey thoughts accurately,
clearly and efficiently.

• Therefore the rules are:
– Simplify- Keep to the essentials.
– Justify- Make no statement that is not based on facts and data.
– Quantify when you have the data to do so - Avoid ‘large’,
‘small’; instead, say ‘50%’, ‘one in three’.
– Be precise and specific in your phrasing of findings.
– Use short sentences.
– Be consistent in the use of tenses (past or present tense).
– Aim to be logical and systematic in your presentation.

(2) Layout of the report
• A good physical layout is important since it will:
– make a good initial impression/feeling,
– encourage the readers, and
– give them an idea of how the material has been organized so the
reader can make a quick determination of what he will read
first.
• Poor reproduction, dirty typewriter type, incorrect spelling, poor
punctuation (typographic errors), overcrowding of text, inadequate
labeling of charts and tables, etc. reduce the credibility of a report.

• Particular attention should be paid to make sure there is:
– An attractive layout for the title page and a clear table of
contents.
– Consistency in margins, spacing, headings and subheadings,
– Numbering of figures and tables, provision of clear titles for
tables, and clear headings for columns and rows, etc.
– Accuracy and consistency in quotations and references.

• Revising and finalizing the text
– Having done the ‘analytical’ and ‘creative’ work, you now
need to put on your critical judgment hat.
– You need to take a step back and review your report from your
audience’s viewpoint.
– Remember, their viewpoint is different.
• They are looking for reasons to believe.
• They need to be comfortable with your report and accept
your findings.

• The following questions should be kept in mind when reading the
draft:
– Have all important findings been included?
– Do the conclusions follow logically from the findings? If
some of the findings contradict each other, has this been
discussed and explained, if possible? Have weaknesses in the
methodology, if any, been revealed?
– Are there any overlaps in the draft that have to be removed?
And is it possible to condense the content?

– Do data in the text agree with data in the tables? Are all tables
consistent (with the same number of informants per variable),
are they numbered in sequence, and do they have clear titles
and headings?
– Is the sequence of paragraphs and subsections logical and
coherent? Is there a smooth connection between successive
paragraphs and sections? Is the phrasing of findings and
conclusions precise and clear?
– Perform a spell and grammar check.

Briefings (presentation)
• Good presentation improves both the research and the reputation
of the researcher.
• A successful briefing typically requires a condensation of a
lengthy and complex body of information.
• Speaking rates should not exceed 100 to 150 words per minute.
• About 20 minutes presentation is usually required.
• A detailed outline of what one is going to say includes
– Opening
– Findings and conclusions
– Recommendations

• The most important thing to keep in mind:
– The time will usually pass a lot more quickly than you think
– Keep focused on the main ideas: The motivation, the problem,
and the main results
• You do not have to mention all of the difficulties and
shortcomings; people can ask during the presentation
• Hypotheses: Mention the ones whose tests you will show. You
do not have to mention all.
• Data: You do not need to mention response rates or sample size
misspecifications unless these are very important; people can ask

• Organizing slides:
– A slide should contain a handful (2–5) of key points; it should
not fill the page
– Slides should not contain your entire presentation, just the key
things to remember
– Graphics can be useful if they tell the story

Quantitative Analysis
Basic Univariate and Multivariate Analysis
Presentation Outline:
1. Statistics and Econometrics?
2. Why Is Econometrics a Separate Discipline?
3. Univariate Statistical Analysis
4. Methodology Of Econometrics
•Statement of theory or hypothesis (Model Specification).
•Specification of the mathematical model of the theory
•Specification of the statistical, or econometric, model
•Collecting the data
•Estimation of the parameters of the econometric model
•Diagnostic Tests (Post-Estimation Tests)
•Hypothesis testing
•Forecasting or prediction
•Using the model for control or policy purposes.
5. Qualitative Explanatory Variables /Dummy Variables
STATISTICS AND ECONOMETRICS
Statistics:
1. It is the science of learning from data, and of
measuring, controlling, and communicating
uncertainty; and it thereby provides the
navigation essential for controlling the course of
scientific and societal advances
2. It is the science of the collection, presentation,
analysis, and reasonable interpretation of data, and of
making inferences about and predicting relations among
variables.
A Taxonomy of Statistics
Econometrics
• Economists are frequently interested in relationships between different
quantities, for example between income and consumption.
• The most important job of econometrics is to quantify these
relationships on the basis of available data and using statistical
techniques, and to interpret, use or exploit the resulting outcomes
appropriately.
• Consequently, econometrics is the interaction of
– economic theory,
– observed data and
– statistical methods.
• Econometrics has focused upon aggregate economic relationships.
Univariate Statistical Analysis:
Descriptive Analysis
Frequency Distribution, Measures of Central Tendency, Measures of
Dispersion, and Shape of Frequency Distribution
Frequency Distribution:
Distribution:- tells us what values the variable takes and how often it takes these values:
unimodal - having a single peak, Bimodal - having two distinct peaks, and Symmetric - left
and right half are mirror images.
Frequency Distribution: is an overview of all distinct values in some variable and the
number of times they occur. That is, a frequency distribution tells
how frequencies are distributed over values.
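The definition above can be illustrated with a minimal sketch. The group labels below are hypothetical values invented for illustration; they are not the data behind the course's figure.

```python
from collections import Counter

# Hypothetical treatment-group labels (illustrative data only).
groups = [1, 2, 2, 3, 1, 2, 3, 3, 2, 1]

# A frequency distribution: each distinct value and how often it occurs.
freq = Counter(groups)

# Show absolute and relative frequencies in ascending order of value.
for value in sorted(freq):
    share = freq[value] / len(groups)
    print(f"group {value}: n = {freq[value]}, share = {share:.0%}")
```

The same counts are what a bar chart or pie chart would display graphically.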
Graphical Presentation
Figure 1: Bar chart of subjects in treatment groups (x-axis: treatment group, 1 to 3; y-axis: number of subjects, 0 to 30)
Pie Chart: Lists the categories and presents the
percent or count of individuals who fall in each
category.
Measures of Descriptive Statistics
Descriptive statistics: are methods for organizing
and summarizing data.
For example, tables or graphs are used to
organize data, and descriptive values such as
the average score are used to summarize data.
A descriptive value for a population is called a
parameter and a descriptive value for a sample
is called a statistic.
•Descriptive statistics are used to describe the basic
features of the data in a study.
•With descriptive statistics you are simply describing
what is or what the data shows.
•Three Measures :
1.Measures of Central Tendency: Mean, median, and
mode measure the central tendency of a variable.
2.Measures of dispersion (Variability): include
variance, standard deviation and range
3. Shape of Distribution: Skewness, Kurtosis
Measures of central tendency
• Measures of central tendency use a single
value to describe the center of a data set.
• A score that indicates where the center of the
distribution tends to be located.
• Tells us about the shape and nature of the
distribution.
• Mode
• Median
• Mean
Mean:
The mean is the sum of the values divided by the
number of values. The mean of a set of numbers
x1, x2, ..., xn is typically denoted by $\bar{x}$, pronounced "x
bar": $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. This is the
arithmetic mean, the "standard" average that describes
the central location of the data; it is
often simply called the "mean".
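As a small sketch of this definition, the function below (a helper name chosen here for illustration) computes the arithmetic mean of the daily page counts used for the first typist later in this chapter.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Pages typed per day by typist 1 (from the dispersion example in this chapter).
pages = [15, 20, 25, 25, 30, 35]
print(arithmetic_mean(pages))  # 25.0
```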
The mode
• The most frequently occurring score.
• Typically useful in describing central tendency
when the scores reflect a nominal scale of
measurement (“nominal” scales could simply
be called “labels”).
• E.g., eye color, gender, and hair color are all
examples of nominal data.
• To find the mode, or modal value, it is best to
put the numbers in order.
• Then count how many times each number occurs. The
number that appears most often is the mode.
• A data set can have one mode (unimodal),
two modes (bimodal) or more than
two modes (multimodal).
• Example 1: What is the mode?
• Example 2: {1, 3, 3, 3, 4, 4, 6, 6, 6, 9} has two modes, 3 and 6
(each occurs three times), so it is bimodal.
• However, mode gives us limited information
about a distribution
– Might be misleading
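The counting procedure for the mode can be sketched as follows. The helper name `modes` is an illustrative choice; it returns every value tied for the highest count, so it also covers bimodal and multimodal data.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [1, 3, 3, 3, 4, 4, 6, 6, 6, 9]
print(modes(data))  # [3, 6]
```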
Median:
It is the middle value of the distribution when
all items are arranged in either ascending or
descending order in terms of value:
$\text{Med} = \left(\frac{n+1}{2}\right)^{\text{th}}$ value
• The score at the 50th percentile (in the middle) and
tells you where the middle of a data set is.
• Used to summarize ordinal (ordered) scores or
highly skewed interval or ratio scores.
• When data are normally distributed, the median is
the same score as the mode.
• When data are not normally distributed, the
following procedure is used to calculate the median:
– Arrange the scores in order (from highest to lowest or lowest to highest).
– If there are an odd number of scores, the
median is the score in the middle position.
– If there are an even number of scores, the
median is the average of the two scores in the
middle
• EXP1: 1 2 3 3 4 7 9 10 11; mdn=4
• EXP2: 1 2 3 3 4 6 7 9 10 11;mdn=5
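• The position of the median is the {(n + 1) ÷ 2}th value, where “n”
is the number of items in the set and “th” indicates the position
in the ordered list.
• EXP: 1 2 3 3 4 7 9 10 11
-Mdn = (9 + 1)/2 = 5th value = 4
• EXP: 1 2 3 3 4 6 7 9 10 11
-Mdn = (10 + 1)/2 = 5.5th value, i.e., the average of the 5th and
6th values = (4 + 6)/2 = 5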
• The median is a better measure of central tendency than
the mode because:
– Only one score can be the median
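The median procedure above (order the scores, then take the middle score or the average of the two middle scores) can be sketched as below. The function is written out by hand for illustration rather than taken from a library.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd n: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle scores

print(median([1, 2, 3, 3, 4, 7, 9, 10, 11]))     # 4
print(median([1, 2, 3, 3, 4, 6, 7, 9, 10, 11]))  # 5.0
```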
Measures Of Dispersion
• The degree to which numerical data tend to
spread about an average value is called
variation or dispersion or spread of the data.
• Dispersion (variability, scatter, or spread)
characterizes how stretched or squeezed the
data are.
• A measure of statistical dispersion is a non-
negative real number that is zero if all the data
are the same and increases as the data become
more diverse.
• It is used to know how the variates are
clustered around or scattered away from the
average.
• E.g., consider the work of two typists who typed the
following number of pages in the 6 working days of a
week:
            Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.  Total pages
Typist 1:    15    20     25     25     30    35      150
Typist 2:    10    20     25     25     30    40      150
• We see that typists 1 and 2 each typed 150
pages in 6 working days, and the average in both
cases is 25. Thus there is no difference in the
average, but the spread of the daily figures differs.
• In the first case the number of pages varies
from 15 to 35, while in the second case the
number of pages varies from 10 to 40.
• This means that the greatest deviation from
the mean in the first case is 10 and in the
second case is 15, i.e., there is a difference
between the two series.
• The variation of this type is termed scatter or
dispersion or spread.
• There are many types of dispersion measures:
• Range
• Mean Absolute Deviation
• Variance/Standard Deviation
THE RANGE
• It is the simplest possible measure of
dispersion.
• The range of a set of numbers (data) is the
difference between the largest and the least
numbers in the set
• If this difference is small then the series of numbers
is supposed regular and if this difference is large
then the series is supposed to be irregular.
• Example : 15 20 25 25 30 35
• Range = Largest – Smallest = 35 – 15 = 20
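As a quick sketch, the range of both typists' series from the earlier example can be computed as follows (the function name `value_range` is an illustrative choice).

```python
def value_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)

typist_1 = [15, 20, 25, 25, 30, 35]
typist_2 = [10, 20, 25, 25, 30, 40]

print(value_range(typist_1))  # 20
print(value_range(typist_2))  # 30
```

Both series have the same mean (25), but the second has the larger range, which is the difference in spread described above.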
Mean Deviation
• How far, on average, all values are from the
middle.
– There are three steps to calculating it:
1. Find the mean of all values
2. Find the distance of each value from that
mean (subtract the mean from each value,
ignore minus signs)
3. Then find the mean of those distances
• Example: the Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16
• Step 1: Find the mean:
• Mean = (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16)/8 = 72/8 = 9
• Step 2: Find the distance of each value from that mean:
Value   Distance from 9
3       6
6       3
6       3
7       2
8       1
11      2
15      6
16      7
• Step 3: Find the mean of those distances:
• Mean deviation = (6 + 3 + 3 + 2 + 1 + 2 + 6 + 7)/8 = 30/8 = 3.75
• Each distance we calculate is called
an Absolute Deviation, because it is
the Absolute Value of the deviation (how far
from the mean).
• From our example, the value 16 has Absolute
Deviation = |x - μ| = |16 - 9| = |7| = 7
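The three steps for the mean deviation can be sketched in a few lines. The function name is an illustrative choice; the data are the eight values from the worked example.

```python
def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    mean = sum(values) / len(values)                 # step 1: the mean
    distances = [abs(x - mean) for x in values]      # step 2: absolute deviations
    return sum(distances) / len(distances)           # step 3: mean of the distances

data = [3, 6, 6, 7, 8, 11, 15, 16]
print(mean_absolute_deviation(data))  # 3.75
```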
• Mean deviation depends on all the values of
the variables and therefore it is a better
measure of dispersion than the range.
• Since signs of the deviations are ignored
(because all deviations are taken positive),
some artificiality is created.
• Example:
• The heights (at the shoulders) are: 600mm,
470mm, 170mm, 430mm and 300mm.
• Step 1: Find the mean:
• μ = (600 + 470 + 170 + 430 + 300)/5 = 1970/5 = 394
• Step 2: Find the Absolute Deviations |x - μ|: 206, 76, 224, 36 and 94
Standard Deviation
• The Standard Deviation is a measure of how spread out
numbers are.
• Its formula is the square root of the Variance.
• It is denoted by s:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
• So now you ask, "What is the Variance?"

• The variance is used as a measure of how far
a set of numbers is spread out.
• The Variance is the average of
the squared differences from the Mean.
– To calculate the variance follow these steps:
1. Work out the Mean of the numbers.
2. Then for each number: subtract the Mean and
square the result (the squared difference).
3. Then work out the average of those squared
differences.
–Example
• You and your friends have just measured the
heights of your dogs (in millimeters):
• The heights (at the shoulders) are: 600mm,
470mm, 170mm, 430mm and 300mm.
• Find out the Mean, the Variance, and the
Standard Deviation.
• The mean (average) height is 394 mm.
• Now we calculate each dog's difference from
the Mean: 206, 76, −224, 36 and −94.
• To calculate the Variance, take each difference, square it, and
then average the result:
$\sigma^2 = \frac{206^2 + 76^2 + (-224)^2 + 36^2 + (-94)^2}{5} = \frac{42436 + 5776 + 50176 + 1296 + 8836}{5} = \frac{108520}{5} = 21704$
Standard Deviation
$\sigma = \sqrt{21704} = 147.32\ldots \approx 147$ (to the nearest mm)
So, using the Standard Deviation we have a "standard" way of knowing what is normal,
and what is extra large or extra small.
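The dog-height example can be reproduced with a short sketch. It follows the slides in dividing by n (the population form); a sample variance dividing by n − 1 would give a different number, and the function name is an illustrative choice.

```python
import math

def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

heights_mm = [600, 470, 170, 430, 300]

variance = population_variance(heights_mm)
std_dev = math.sqrt(variance)

print(variance)           # 21704.0
print(round(std_dev, 2))  # 147.32
```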
Coefficient of variation (CV):
In probability theory and statistics, the coefficient
of variation (CV) is a normalized measure of
dispersion of a probability distribution.
The coefficient of variation (CV) is defined as the
ratio of the standard deviation to the mean:
$CV = \frac{SD}{\text{Mean}}$
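A minimal sketch of the CV, reusing the dog heights from the standard deviation example and the same divide-by-n convention (an assumption carried over from that example):

```python
import math

def coefficient_of_variation(values):
    """Standard deviation (population form) divided by the mean."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance) / mean

heights_mm = [600, 470, 170, 430, 300]
cv = coefficient_of_variation(heights_mm)
print(f"{cv:.1%}")  # 37.4%
```

Because the CV is a ratio, it has no units, which makes it convenient for comparing the spread of variables measured on different scales.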
Covariance :
Covariance between X and Y refers to a measure of
how much two variables change together.
Covariance indicates how two variables are related.
• A positive covariance means the variables are
positively related, while a negative covariance
means the variables are inversely related. The
formula for calculating covariance of sample data is
shown below.
$\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$
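The covariance formula can be sketched as follows, dividing by n as in the formula above. The two series are hypothetical numbers chosen so the arithmetic is easy to check by hand.

```python
def covariance(x, y):
    """Average product of the paired deviations from the means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

# Hypothetical data: y rises with x, so the covariance is positive.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(covariance(x, y))        # 4.0
print(covariance(x, y[::-1]))  # -4.0 (y falls as x rises)
```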
Shape of Frequency Distribution
• Skewness
• Kurtosis
Skewness:
• It refers to symmetry or asymmetry of the distribution.
• It is a measure of the asymmetry of the probability
distribution of a real-valued random variable. The skewness value
can be positive or negative, or even undefined.
• Qualitatively, a negative skew indicates that the tail on the
left side of the probability density function is longer than the
right side and the bulk of the values (possibly including the
median) lie to the right of the mean.
• A positive skew indicates that the tail on the right side is
longer than the left side and the bulk of the values lie to the
left of the mean.
• A zero value indicates that the values are relatively evenly
distributed on both sides of the mean, typically but not
necessarily implying a symmetric distribution.
The coefficient of Skewness is a measure for the degree of symmetry in
the variable distribution.
Kurtosis:
• It refers to peakedness/flatness of the distribution.

• It is a measure of the "peakedness" of the probability distribution of a real-
valued random variable although some sources are insistent that heavy tails,
and not peakedness, is what is really being measured by kurtosis.
• Higher kurtosis means more of the variance is the result of infrequent extreme
deviations, as opposed to frequent modestly sized deviations.
The coefficient of Kurtosis is a measure for the
degree of peakedness/flatness in the variable
distribution.
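The slides do not give formulas for the two coefficients, so the sketch below assumes the common moment-based definitions: skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2, where mk is the k-th central moment. Under this definition a normal distribution has kurtosis of about 3. The data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment coefficient of skewness (assumed definition, see lead-in)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment coefficient of kurtosis (not excess kurtosis; assumed definition)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]        # mirror image around 3
right_tailed = [1, 1, 1, 10]       # one large value stretches the right tail

print(skewness(symmetric))         # 0.0
print(skewness(right_tailed) > 0)  # True
print(kurtosis(symmetric))
```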
Correlation between Variables
• Correlation is a bivariate analysis that measures the
strength of association between two variables and the
direction of the relationship.
• Correlation is a statistical technique that can show whether
and how strongly pairs of variables are related.
• You may suspect there are correlations, but don't know
which are the strongest.
• In terms of the strength of relationship, the value of the
correlation coefficient varies between +1 and -1.
$r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
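A minimal sketch of the correlation coefficient, written as the sum of paired deviations divided by the square root of the product of the two sums of squared deviations. The height and weight figures are hypothetical, invented only to show a strong positive relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]
print(round(pearson_r(heights, weights), 3))

# An exact linear relationship gives r = 1 (or -1 if the slope is negative).
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```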
• A value of ± 1 indicates a perfect degree of
association between the two variables.
• As the correlation coefficient value goes towards
0, the relationship between the two variables will
be weaker.
• Correlation works for quantifiable data in which
numbers are meaningful, usually quantities of
some sort.
• It cannot be used for purely categorical data, such
as gender, brands purchased, or favorite color.
• For example, height and weight are related;
taller people tend to be heavier than shorter
people.
• The relationship isn't perfect. People of the
same height vary in weight, and you can easily
think of two people you know where the shorter
one is heavier than the taller one.
• Correlation can tell you just how much of the
variation in people's weights is related to their
heights.
Example
• A correlation coefficient of 1 means that for every increase in
one variable, there is an increase of a fixed proportion in the
other. For example, shoe sizes go up in (almost) perfect
correlation with foot length.
• A correlation coefficient of -1 means that for every increase in
one variable, there is a decrease of a fixed proportion in the
other. For example, the amount of gas in a tank decreases in
(almost) perfect correlation with speed.
• A correlation coefficient of zero means that an increase in one
variable is not associated with any systematic increase or
decrease in the other. There is no linear relationship between
the two.
Correlation techniques
• Usually, in statistics, four types of correlation are used:
Pearson correlation, Kendall rank correlation,
Spearman correlation, and the Point-Biserial
correlation.
• Pearson r correlation: Pearson r correlation is the
most widely used correlation statistic to measure
the degree of the relationship between linearly
related variables.
• For example, if we want to measure how age and
glucose level are related to each other,
Pearson r correlation is used to measure the
degree of relationship between the two.
• Types of research questions a Pearson
correlation can examine:
• Is there a statistically significant relationship
between age and glucose level?
• Is there a relationship between temperature
and ice cream sales?
• Is there a relationship between job satisfaction
and income?
Kendall rank correlation: is a non-parametric test
that measures the strength of dependence
between two variables.
• Sample Question: Two interviewers ranked 12
candidates (A through L) for a position. The results from
most preferred to least preferred are:
• Interviewer 1: ABCDEFGHIJKL.
• Interviewer 2: ABDCFEHGJILK
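The slides do not state a formula for Kendall's coefficient, so the sketch below assumes the usual tau for rankings without ties: (concordant pairs − discordant pairs) divided by the total number of pairs. It is applied to the two interviewers' rankings from the sample question.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall's tau for two rankings of the same items (no ties assumed)."""
    rank_a = {item: i for i, item in enumerate(order_a)}
    rank_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for p, q in combinations(order_a, 2):
        # A pair is concordant when both rankings order it the same way.
        sign = (rank_a[p] - rank_a[q]) * (rank_b[p] - rank_b[q])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(order_a) * (len(order_a) - 1) // 2
    return (concordant - discordant) / n_pairs

tau = kendall_tau("ABCDEFGHIJKL", "ABDCFEHGJILK")
print(round(tau, 3))  # 0.848
```

Of the 66 candidate pairs, only 5 (C/D, E/F, G/H, I/J and K/L) are ordered differently by the two interviewers, so the rankings agree strongly.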
Spearman rank correlation: Spearman rank
correlation is a non-parametric test that is used to
measure the degree of association between two
variables.
• The Spearman rank correlation test does not carry
any assumptions about the distribution of the data
and is the appropriate correlation analysis when
the variables are measured on a scale that is at
least ordinal.
• $\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where
ρ = Spearman rank correlation
di = the difference between the ranks of corresponding
variables
n = number of observations
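A minimal sketch using the standard no-ties formula ρ = 1 − 6Σd²/(n(n² − 1)), with d the difference between the two ranks of each item and n the number of items. For illustration it reuses the two interviewers' rankings from the Kendall sample question.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    rank_b = {item: i for i, item in enumerate(order_b, start=1)}
    n = len(order_a)
    # d is the difference between an item's rank in A and its rank in B.
    d_squared = sum((i - rank_b[item]) ** 2
                    for i, item in enumerate(order_a, start=1))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rho = spearman_rho("ABCDEFGHIJKL", "ABDCFEHGJILK")
print(round(rho, 3))  # 0.965
```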
• Types of research questions a Spearman Correlation
can examine:
• Is there a statistically significant relationship
between participants’ level of education (high
school, bachelor’s, or graduate degree) and
their starting salary?
ECONOMETRIC ANALYSIS:
Robust Regression Analysis
What is Regression?
• A way of predicting the value of one variable from
another.
• It is a hypothetical model of the relationship
between two variables.
• For example, relationship between rash driving
and number of road accidents by a driver is best
studied through regression.
• Regression analysis is a statistical tool for the
investigation of relationships between variables.
Usually, we seek to ascertain the causal effect of one
variable upon another.
• Regression analysis estimates the conditional
expectation of the dependent variable given the
independent variables, that is, the average value of the
dependent variable when the independent variables
are held fixed.
• In all cases, the estimation target is a function of the
independent variables called the regression function.
• Regression analysis is widely used for prediction and
forecasting:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
Broadly speaking, traditional econometric methodology proceeds along
the following lines:
1. Statement of theory or hypothesis (Model Specification).

2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
4. Collecting the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Diagnostic Tests (Post-Estimation Tests)
8. Forecasting or prediction
9. Using the model for control or policy purposes.
• To illustrate the preceding steps, let us consider the well-known
Keynesian theory of consumption.
1. Statement of Theory or Hypothesis (Model Specification)
•Choosing among Competing Models: When a governmental agency collects
economic data, such as that shown in Table I.1, it does not necessarily have any
economic theory in mind.
•How then does one know that the data really support the Keynesian theory of
consumption? Is it because the Keynesian consumption function (i.e., the regression
line) shown in Figure I.3 is extremely close to the actual data points?
•Is it possible that another consumption model (theory) might equally fit the data as
well? For example, Milton Friedman has developed a model of consumption, called
the permanent income hypothesis.
•Robert Hall has also developed a model of consumption, called the life-cycle
permanent income hypothesis. Could one or both of these models also fit the data
in Table I.1?
•In short, the question facing a researcher in practice is how to choose among
competing hypotheses or models of a given phenomenon, such as the
consumption–income relationship.
•Let us use the Keynesian model for the time being. Keynes states that, on average,
consumers increase their consumption as their income increases, but not by as much as the
increase in their income (MPC < 1).
2. Specification of the Mathematical Model of Consumption (single-
equation model)
Y = β1 + β2X, 0 < β2 < 1 (I.3.1)
Y = consumption expenditure (dependent variable)
X = income (independent, or explanatory, variable)
β1 = the intercept (value of Y when X = 0)
– Point at which the regression line crosses the Y-axis (ordinate)
β2 = the slope coefficient
• – Regression coefficient for the predictor
• – Gradient (slope) of the regression line
• – Direction/strength of relationship
• The slope coefficient β2 measures the MPC.
• Geometrically, (I.3.1) is a straight line with intercept β1 and slope β2.
3. Specification of the Econometric Model of Consumption
• The relationships between economic variables are generally inexact. In addition to
income, other variables affect consumption expenditure.
• For example, size of family, ages of the members in the family, family religion, etc., are
likely to exert some influence on consumption.
• To allow for the inexact relationships between economic variables, (I.3.1) is modified as
follows:
• Y = β1 + β2X + u (I.3.2)
• where u, known as the disturbance, or error, term, is a random (stochastic) variable that
has well-defined probabilistic properties. The disturbance term u may well represent all
those factors that affect consumption but are not taken into account explicitly.
• N.B.: The dependent variable (Y) is also called the response, explained, predictand, endogenous,
or outcome variable. The independent variables (X) are also called explanatory variables, regressors,
exogenous variables, or predictors. Coefficients are called statistics when they are computed from a
sample and parameters when they refer to the population.
(I.3.2) is an example of a linear regression model, i.e., it hypothesizes that Y is
linearly related to X, but that the relationship between the two is not exact; it is
subject to individual variation. The econometric model of (I.3.2) can be
depicted as shown in Figure I.2.
4. Obtaining Data
• To obtain the numerical values of β1 and β2, we need data. Look at Table
I.1, which relates to personal consumption expenditure (PCE) and
gross domestic product (GDP). The data are in "real" terms.
5. Estimation of the Econometric Model
• The objective is to make the errors as small as possible, so we apply the
Ordinary Least Squares (OLS) method to find the optimal values of the
coefficients.
• The least squares method minimizes the sum of squared errors (the
deviations of the individual data points from the regression line). The
resulting a and b are called least squares estimators (estimators of the
parameters α and β).
• The process of obtaining parameter estimates (e.g., a and b) is called
estimation.
“Regress Y on X”
The regression line is a straight line that describes the dependence of the
average value of one variable on the other.
$Y_i = \alpha + \beta X_i + \varepsilon_i$
where Y_i is the dependent (response) variable, α is the Y intercept, β is the
slope coefficient, X_i is the independent (explanatory) variable, and ε_i is
the random error.
Ordinary Least Squares Method
$E(Y) = \hat{Y} = a + bX$
$\varepsilon = Y - \hat{Y} = Y - (a + bX) = Y - a - bX$
$\varepsilon^2 = (Y - \hat{Y})^2 = (Y - a - bX)^2 = Y^2 + a^2 + b^2 X^2 - 2aY - 2bXY + 2abX$
$\sum \varepsilon^2 = \sum (Y - \hat{Y})^2 = \sum (Y - a - bX)^2$
$\min \sum \varepsilon^2 = \min \sum (Y - a - bX)^2$
How do we get the coefficients a and b that minimize the sum of squared errors?
Compute a and b so that the partial derivatives with respect to a and b are equal to zero.
$\frac{\partial \sum \varepsilon^2}{\partial a} = \frac{\partial \sum (Y - a - bX)^2}{\partial a} = 2na - 2\sum Y + 2b\sum X = 0$
$na - \sum Y + b\sum X = 0$
$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \bar{Y} - b\bar{X}$
Take the partial derivative with respect to b and plug in $a = \bar{Y} - b\bar{X}$:
$\frac{\partial \sum \varepsilon^2}{\partial b} = \frac{\partial \sum (Y - a - bX)^2}{\partial b} = 2b\sum X^2 - 2\sum XY + 2a\sum X = 0$
$b\sum X^2 - \sum XY + a\sum X = 0$
$b\sum X^2 - \sum XY + (\bar{Y} - b\bar{X})\sum X = 0$
$b\sum X^2 - \sum XY + \left(\frac{\sum Y}{n} - b\,\frac{\sum X}{n}\right)\sum X = 0$
$b\sum X^2 - \sum XY + \frac{\sum X \sum Y}{n} - b\,\frac{(\sum X)^2}{n} = 0$
$b\left(\frac{n\sum X^2 - (\sum X)^2}{n}\right) = \frac{n\sum XY - \sum X \sum Y}{n}$
The least squares method is an algebraic solution that minimizes the sum of squared errors
(the error component of the variance).
$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{SP_{xy}}{SS_x}$
$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \bar{Y} - b\bar{X}$
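The closed-form solution above can be sketched in a few lines of Python. This is only an illustration: the function name `ols_fit` and the five data points are made up for the example and are not the Table I.1 data.

```python
def ols_fit(x, y):
    """Return (a, b) for the least squares line Y-hat = a + b*X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # b = (n*Sum(XY) - Sum(X)*Sum(Y)) / (n*Sum(X^2) - (Sum(X))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # a = Y-bar - b * X-bar
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical data lying exactly on the line Y = 1 + 2X
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b = ols_fit(x, y)
print(a, b)
```

Because the made-up points lie exactly on a line, the estimates recover its intercept and slope; with real data the fitted line only minimizes the sum of squared errors.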
Properties of OLS estimators: The outcome of the least squares method is
the OLS parameter estimators a and b.
•OLS estimators are linear
•OLS estimators are unbiased (correct on average)
•OLS estimators are efficient (smallest variance)
•Gauss-Markov Theorem: Among linear unbiased estimators, the least
squares (OLS) estimator has minimum variance. It is BLUE (best
linear unbiased estimator).
In order to estimate coefficients with these properties, first we need to build the
Classical linear regression model:
•Linear in parameters
•Linear relationship between Y and Xs
•Constant slopes (coefficients of Xs)
•Xs are fixed; Y is conditional on Xs
•X is exogenous and the error is not related to Xs
•Constant variance of errors (homoscedasticity)
•No autocorrelation among the error terms
Therefore, the estimation of the econometric model in our example is as
follows:
• Regression analysis is the main tool used to obtain the estimates. Using this
technique and the data given in Table I.1, we obtain the following estimates
of β1 and β2, namely, −184.08 and 0.7064. Thus, the estimated
consumption function is:
$\hat{Y} = -184.08 + 0.7064X$ (I.3.3)
Se: 24.372 (intercept), 0.025 (slope)
• The estimated regression line is shown in Figure I.3. The regression line fits
the data quite well. The slope coefficient (i.e., the MPC) was about 0.70: an
increase in real income of 1 dollar led, on average, to an increase of about
70 cents in real consumption.
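$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
[Figure: response plane $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$ over the (X1, X2) plane; an observed Y at (X1i, X2i) lies off the plane by ε_i]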
R2
• R2 (the coefficient of determination) is SSM/SST; it measures how much of
the overall variance of Y the model explains.
• Coefficient of determination: the proportion of the total variation or
dispersion in the dependent variable that is explained by the variation in
the explanatory variables in the regression. A large R2 means the model fits
the data well.
• R2 is used to analyze how differences in one variable can be explained by
differences in a second variable.
• For example, for the height–weight relationship:
– R2 = 0.81 means that the height–weight relationship accounts for 81% of
the total variation.
– This means that most of the variation in the data is explained by the model.
Goodness-of-fit: How Good Is the Model?
• Goodness-of-fit measures evaluate how well a regression model fits the
data. The smaller the residual sum of squares (RSS, written SSR below), the
better the model fits.
• The regression line is a model based on the data.
• This model might not reflect reality.
– We need some way of testing how well the model fits the observed data.
– How?
Sums of Squares
• SST uses the differences between the observed data and the mean value of Y.
• SSR uses the differences between the observed data and the regression line.
• SSM uses the differences between the mean value of Y and the regression line.
[Diagram showing where the regression sums of squares derive from]
Total SS (SST)
• SST
– Total variability
(variability between
scores and the mean).
Residual SS or Error SS (SSR)
• SSR
– Residual/error variability (variability
between the regression model and the
actual data).
• Difference between the
observed data and the model
• This represents the degree of
inaccuracy when fitting the best
fit model to the data.
Model SS or Regression SS (SSM)
• SSM
– Model variability (difference in
variability between the model and the
mean).
• This is the improvement we get from
fitting the model to the data relative to
the null model.
SST = SSR + SSM
• How do we get a large SSM?
• What happens if the SSM is large?
• The regression model is then very different from using the mean as the
predicted outcome; therefore the regression model improves the prediction.
• So, we can calculate the proportion of improvement due to the model.
• SSM/SST is the proportion of variation explained by the model.
• In simple regression (only one X), R2 is the Karl Pearson correlation
coefficient squared: r2 = .8967^2 = .80
• If a regression model includes many regressors, R2 is not equal to r2.
• Adding any regressor never decreases R2, regardless of the relevance of
the regressor.
• Adjusted R2 gives a penalty for adding regressors:
$\bar{R}^2 = 1 - \frac{(n - 1)}{(n - k)}\,(1 - R^2)$
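To make the decomposition SST = SSR + SSM and the two R2 formulas concrete, here is a small Python sketch for a simple regression. The function name `fit_summary` and the five observations are hypothetical; k counts the estimated coefficients (intercept and slope, so k = 2 here).

```python
def fit_summary(x, y, k=2):
    """Sums of squares, R2 and adjusted R2 for a simple regression of y on x."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                  # data vs mean
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))     # data vs line
    ssm = sum((fi - y_bar) ** 2 for fi in y_hat)              # line vs mean
    r2 = ssm / sst
    adj_r2 = 1 - (n - 1) / (n - k) * (1 - r2)
    return {"SST": sst, "SSR": ssr, "SSM": ssm, "R2": r2, "adj_R2": adj_r2}


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
out = fit_summary(x, y)
print(out)
```

For these made-up numbers SST = 6, SSR = 2.4 and SSM = 3.6, so R2 = 0.6, while the adjusted R2 is lower (about 0.47) because of the penalty for the small sample.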
Statistical Test:
Inferential Statistics and Hypothesis
Testing
Inferential statistics
• Inferential statistics use a random sample of data
taken from a population to describe and make
inferences about the population.
• With inferential statistics, researchers are trying to
reach conclusions that extend beyond the
immediate data alone.
• Inferential statistics are valuable when examination
of each member of an entire population is not
convenient or possible.
• Example 1: measuring the diameter of every nail
manufactured in a mill is impractical. Instead, you
can measure the diameters of a representative
random sample of nails and use the
information from the sample to make
generalizations about the diameters of all of the
nails.
• For instance, we use inferential statistics to try to
infer from the sample data what the population
might think.
• Thus, we use inferential statistics to make
inferences from our data to more general
conditions.
• Inferential statistics provide a way of going
from a “sample” to a “population”.
Hypothesis testing or significance testing
• It is a method for testing a claim or hypothesis
about a parameter in a population, using data
measured in a sample.
• In this method, we test some hypothesis by
determining the likelihood that a sample statistic
could have been selected, if the hypothesis
regarding the population parameter were true.
The goal of hypothesis testing is to determine how
likely it is that a claim about a population parameter,
such as the mean, is true. The method can be
summarized in five steps.
1. State the hypotheses: we identify a hypothesis or
claim that we feel should be tested.
2. Calculate the test statistic.
3. Select the tabulated (critical) value: look it up in
the table of the relevant distribution.
4. Compare the calculated and tabulated values.
5. Apply the decision rule.
A) Hypothesis Testing:
• A statistical test provides a mechanism for making quantitative decisions
about a process or processes. The intent is to determine whether there is
enough evidence to "reject" a conjecture or hypothesis about the
process.
The H0 conjecture is called the null hypothesis. Not rejecting may be a good
result if we want to continue to act as if we "believe" the null hypothesis
is true. Or it may be a disappointing result, possibly indicating we may
not yet have enough data to "prove" something by rejecting the null
hypothesis.
H0: Null hypothesis, indicating the current belief is true
H1: Alternative hypothesis, indicating your belief
Null and alternative hypotheses can be two-sided or one-sided, that is,
two-tailed or one-tailed.
Hypothesis testing for individual coefficients:
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
Hypothesis testing for joint coefficients (overall significance, or goodness of fit):
$H_0: \beta_1 = \beta_2 = 0$
$H_1: \text{at least one of } \beta_1, \beta_2 \text{ is not zero}$
B) Compute the Test Statistic:
1) Statistical test for an individual coefficient
In theory, the t-statistic of any one variable may be used to test the
hypothesis that the true value of the coefficient is zero (which is to say, the
variable should not be included in the model).
• In testing the null hypothesis that the population parameter is equal to a
specified value (here zero), one uses the statistic below, with degrees of freedom = (n − k).
$t = \frac{\hat{\beta}}{se(\hat{\beta})}$
Standard Error of the Slope Estimate
$se(\hat{\beta}) = s_{\hat{\beta}} = \sqrt{\frac{\sum (Y_t - \hat{Y}_t)^2}{(n - k)\sum (X_t - \bar{X})^2}} = \sqrt{\frac{\sum e_t^2}{(n - k)\sum (X_t - \bar{X})^2}}$
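A hedged sketch of the slope standard error and t-statistic in Python, for a simple regression with k = 2 estimated coefficients. The function name `slope_t_stat` and the data are illustrative assumptions, not values from the course material.

```python
import math


def slope_t_stat(x, y, k=2):
    """Return (b, se_b, t) for the slope of a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    a = y_bar - b * x_bar
    # Sum of squared residuals, Sum(e_t^2)
    sum_e2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # se(b) = sqrt( Sum(e^2) / ((n - k) * Sum((X - X-bar)^2)) )
    se_b = math.sqrt(sum_e2 / ((n - k) * ss_x))
    return b, se_b, b / se_b


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, se_b, t = slope_t_stat(x, y)
print(b, se_b, t)
```

The resulting t is then compared with the tabulated t value for n − k degrees of freedom (3 in this toy case).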
2) Statistical test for joint significance
• The F-ratio provides a test of the significance of all the
independent variables (other than the constant term)
taken together.
• The F-ratio is the ratio of the explained-variance-per-
degree-of-freedom-used to the unexplained-variance-
per-degree-of-freedom-unused, i.e.:
$F = \frac{ESS / (k - 1)}{RSS / (n - k)}$
where k is the number of coefficients and n is the number of observations.
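The F-ratio is simple arithmetic once the explained and residual sums of squares are known. A minimal Python sketch, assuming hypothetical values ESS = 3.6 and RSS = 2.4 from a simple regression with n = 5 observations and k = 2 coefficients:

```python
def f_ratio(ess, rss, n, k):
    """F = (ESS / (k - 1)) / (RSS / (n - k))."""
    return (ess / (k - 1)) / (rss / (n - k))


# Hypothetical sums of squares, for illustration only
f_cal = f_ratio(ess=3.6, rss=2.4, n=5, k=2)
print(f_cal)
```

With a single regressor the F-ratio equals the square of the slope's t-statistic, which is a handy check on hand calculations.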
• The purpose of hypothesis testing is to find out whether the estimates
obtained in Eq. (I.3.3) are in accord with the expectations of the theory that is
being tested. Keynes expected the MPC to be positive but less
than 1. In our example we found the MPC to be about 0.70.
But before we accept this finding as confirmation of
Keynesian consumption theory, we must enquire whether this
estimate is sufficiently below unity. In other words, is 0.70
statistically less than 1? If it is, it may support Keynes' theory.
• Such confirmation or refutation of economic theories on the
basis of sample evidence is based on a branch of statistical
theory known as statistical inference (hypothesis testing).
• It goes along with statistical inference from sample to
population.
$H_0: \beta_2 = 0, \qquad H_1: \beta_2 \neq 0$
$t = \frac{0.7064 - 0}{0.025} \approx 28.26$
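As a quick check of the arithmetic, the same ratio can be computed for two null hypotheses: that the slope (the MPC, written β2 in (I.3.1)) is zero, and that it equals 1, which is the question raised in the text. The helper name `t_stat` is an assumption for this sketch; the estimate 0.7064 and its standard error 0.025 are the values reported with (I.3.3).

```python
def t_stat(estimate, hypothesized, se):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / se


mpc, se_mpc = 0.7064, 0.025
t_zero = t_stat(mpc, 0.0, se_mpc)  # H0: MPC = 0
t_one = t_stat(mpc, 1.0, se_mpc)   # H0: MPC = 1
print(round(t_zero, 3), round(t_one, 3))
```

The ratio 0.7064/0.025 works out to about 28.26, and the statistic against 1 is about −11.74, so on these numbers the estimated MPC is far from both 0 and 1 relative to its standard error.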
C) Decision Rules:
1. If $|t_{cal}| > t_{tab}$, or equivalently if the p-value < α, reject H0 and accept H1.
Then the coefficient is statistically significant, and
the associated variable can be treated as a policy variable. If not,
it is statistically insignificant and cannot be treated as a
policy variable.
2. If $F_{cal} > F_{tab}$, or equivalently if the p-value < α, reject H0 and accept H1.
Then all explanatory variables are jointly
statistically significant, meaning the model is a good
fit. If not, the model is not a good fit.
The p value is the probability of obtaining a sample outcome at least as
extreme as the one observed, given that the value stated in the null
hypothesis is true. The p value is a probability: it varies between 0 and 1
and can never be negative.
The level of significance is the criterion, or probability of obtaining a
sample mean, at which point we will decide to reject the value stated in the
null hypothesis; it is typically set at 5% in behavioral research.
• The p value for obtaining a sample outcome is compared to
the level of significance. Significance, or statistical
significance, describes a decision made concerning a value
stated in the null hypothesis.
• When the null hypothesis is rejected, we reach significance.
When the null hypothesis is retained, we fail to reach
significance.
7. Forecasting or Prediction
• To illustrate, suppose we want to predict the mean consumption
expenditure for 1997. The GDP value for 1997 was 7269.8 billion dollars.
Putting this GDP figure into (I.3.3), predicted consumption would be:
$\hat{Y}_{1997} = -184.0779 + 0.7064\,(7269.8) = 4951.3$ (I.3.4)
• The actual value of the consumption expenditure reported in 1997 was
4913.5 billion dollars. The estimated model (I.3.3) thus over-predicted
actual consumption expenditure by about 37.8 billion dollars. We could
say the forecast error is about 37.8 billion dollars, which is about 0.77
percent of actual consumption expenditure for 1997.
• Forecasts can be made within sample or out of sample.
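The forecast arithmetic can be reproduced with a short Python sketch. All numbers come from the example above; the variable names are assumptions for illustration.

```python
# Estimated consumption function (I.3.3): Y-hat = a + b * X
a, b = -184.0779, 0.7064

gdp_1997 = 7269.8      # billions of dollars
actual_pce = 4913.5    # reported consumption expenditure, billions of dollars

forecast = a + b * gdp_1997
error = forecast - actual_pce          # positive means over-prediction
error_pct = 100 * error / actual_pce   # error as a share of actual consumption

print(round(forecast, 1), round(error, 1), round(error_pct, 2))
```

With the rounded coefficients shown here the forecast error comes out at roughly 37.8 billion dollars, a little under 0.8 percent of actual consumption expenditure.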
8. Diagnostic Tests (Post-Estimation Tests)
• The results of the model MUST satisfy the
assumptions of the linear regression model and the
properties of the coefficients. Otherwise, we
should not rely on the results!
• Test for Normality
• Test for Multicollinearity
• Test for Autocorrelation
• Test for Homoskedasticity
Test for Normality:
The Jarque–Bera test is a goodness-of-fit test of
whether sample data have
the skewness and kurtosis matching a normal
distribution. The test statistic JB is defined as
$JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right)$
where n is the number of observations (or degrees of freedom
in general), S is the sample skewness, and K is the
sample kurtosis.
a) Hypothesis Testing
H0: error terms are normally distributed
H1: error terms are not normally distributed
b) Decision Rule
If $JB_{cal} > JB_{tab}$, reject H0 and accept H1
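A minimal Python sketch of the Jarque–Bera calculation, using moment-based sample skewness and kurtosis. The function name `jarque_bera` and the five residuals are made up for illustration; with so few observations the chi-square approximation would be poor in practice.

```python
def jarque_bera(residuals):
    """Return (JB, S, K) using moment-based skewness S and kurtosis K."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    s = m3 / m2 ** 1.5          # sample skewness
    k = m4 / m2 ** 2            # sample kurtosis (3 for a normal distribution)
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    return jb, s, k


# Hypothetical residuals, symmetric around zero
jb, s, k = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, s, k)

# JB is compared with a chi-square critical value with 2 degrees of freedom
# (about 5.99 at the 5% level); a larger JB leads to rejecting normality.
```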
Test for Multicollinearity
Multicollinearity is a linear relationship between two or more explanatory
variables. One of the features of multicollinearity is that the
standard errors of the affected coefficients tend to be large. In that
case, the test of the hypothesis that the coefficient is equal to zero
tends to lead to a failure to reject the null hypothesis.
Steps: run an OLS regression of one of the explanatory variables on all the other
explanatory variables, and calculate the VIF from the resulting $R_i^2$.
High VIF, high MC. As a rule of thumb,
if VIF is less than 10, MC is not a serious problem.
$VIF_i = \frac{1}{1 - R_i^2} = \frac{1}{\text{tolerance}}$
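The VIF steps can be sketched in Python for the simplest case of two explanatory variables, where the auxiliary R2 from regressing one on the other is just their squared correlation. The function names and the data are hypothetical.

```python
def aux_r2(x1, x2):
    """R2 from regressing x1 on x2 (one regressor): the squared correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    return s12 ** 2 / (s11 * s22)


def vif(r2):
    """VIF = 1 / (1 - R2) = 1 / tolerance."""
    return 1 / (1 - r2)


# Hypothetical explanatory variables
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
r2 = aux_r2(x1, x2)
print(r2, vif(r2))
```

Here the auxiliary R2 is 0.6, so the tolerance is 0.4 and the VIF is 2.5, well below the rule-of-thumb threshold of 10. With more than two regressors the auxiliary regression would need a multiple-regression routine.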
Test for Heteroskedasticity
In statistics, a sequence of random variables is heteroskedastic if
the random variables have different variances.
• When the errors have the same scatter regardless of the value of
X, the error terms are homoskedastic. When the scatter of the
errors is different, varying depending on the value of one or
more of the independent variables, the error terms
are heteroskedastic.
Test for Heteroskedasticity using Breusch-Pagan
Test for Heteroskedasticity using Goldfeld–Quandt:
H0: Constant variance
H1: The null hypothesis is not true
If $\chi^2_{cal} > \chi^2_{tab}$, or equivalently if the p-value < α, reject H0 and accept H1
Test for Autocorrelation
In statistics, the autocorrelation of a random process describes the correlation
between values of the process at different points in time, as a function of the two
times or of the time difference.
• It is the degree of similarity between a given time series and a lagged version of itself
over successive time intervals.
• It measures the relationship between a variable's current value and its past values.
The existence of autocorrelation can be detected using the Durbin–Watson d statistic.
Given the regression estimate below, with errors that may follow a first-order
autoregressive scheme, Durbin and Watson propose the following statistic to
detect the existence of autocorrelation:
$y_t = \beta_0 + \beta_1 x_t + e_t = \hat{y}_t + e_t, \qquad e_t = \rho e_{t-1} + \varepsilon_t$
$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1 - \hat{\rho})$
Given this statistic, the hypotheses are:
$H_0: \rho = 0$ or d = 2, no autocorrelation
$H_1: \rho \neq 0$ or d ≠ 2, there is autocorrelation
Upper and lower critical values, dU and dL, have been
tabulated for different values of n and k, and the test has three
possible outcomes: reject, do not reject, or indeterminate.
If $d < d_L$, reject $H_0: \rho = 0$
If $d > d_U$, do not reject $H_0: \rho = 0$
If $d_L \le d \le d_U$, the test is inconclusive
Example: regressing Y on X in a simple regression with
sample size 20. After the regression you have the following:
$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} = 1.08$
If we choose the 5% level of significance, the critical values
corresponding to n = 20 and one regressor are dL = 1.20 and
dU = 1.41.
Therefore, d = 1.08 < dL = 1.20, so we reject H0 and conclude that
the errors are positively autocorrelated.
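The d statistic and the three-way decision rule for positive autocorrelation can be sketched as follows. The function names and the five residuals are illustrative assumptions; the critical values 1.20 and 1.41 are the ones quoted in the example above.

```python
def durbin_watson(e):
    """d = Sum_{t=2..n} (e_t - e_{t-1})^2 / Sum_{t=1..n} e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Decision rule for the test against positive autocorrelation."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Hypothetical residuals that drift slowly from positive to negative
residuals = [0.5, 0.4, 0.1, -0.3, -0.7]
d_toy = durbin_watson(residuals)
print(d_toy)

# The worked example: d = 1.08 with dL = 1.20 and dU = 1.41
print(dw_decision(1.08, 1.20, 1.41))
```

Slowly drifting residuals give a d well below 2, which is the pattern the test associates with positive autocorrelation.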
To avoid some of the pitfalls of the Durbin–Watson d test of
autocorrelation, the Breusch–Godfrey test has been developed. It
allows for (1) stochastic regressors, such as lagged values of the
regressand; (2) autoregressive schemes of any order, such as AR(1),
AR(2), etc.; and (3) simple or higher-order moving averages of white
noise error terms.
H0: No serial correlation
H1: The null hypothesis is not true
If $\chi^2_{cal} > \chi^2_{tab}$, reject H0 and accept H1

9. Use of the Model for Control or Policy
Purposes
• Suppose we have the estimated consumption function given in (I.3.3).
• Suppose further the government believes that consumer expenditure of
about 4900 will keep the unemployment rate at its current level of about
4.2%.
• What level of income will guarantee the target amount of consumption
expenditure?
• If the regression results given in (I.3.3) seem reasonable, simple arithmetic
will show that:
4900 = −184.0779 + 0.7064X (I.3.6)

• which gives X = 7197, approximately. That is, an income level of about
7197 (billion) dollars, given an MPC of about 0.70, will produce an
expenditure of about 4900 billion dollars. As these calculations suggest, an
estimated model may be used for control, or policy, purposes. By
appropriate fiscal and monetary policy mix, the government can manipulate
the control variable X to produce the desired level of the target variable Y.
Figure: Summarizes The Anatomy Of Classical Econometric Modeling.
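The "simple arithmetic" in the policy example amounts to inverting the estimated consumption function. A small Python sketch, with the helper name `required_income` assumed for illustration:

```python
def required_income(target_consumption, a, b):
    """Solve target = a + b * X for X."""
    return (target_consumption - a) / b


# Coefficients from (I.3.3) and the target of 4900 (billion dollars)
x_needed = required_income(4900, a=-184.0779, b=0.7064)
print(round(x_needed))
```

This reproduces the income level of roughly 7197 billion dollars. The calculation treats the estimated relationship as exact, so it is a point target rather than a guarantee.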
Introducing
Qualitative/Categorical/Discrete
Explanatory Variables
Regression Model with Dummy
Variables
Dummy variables:
• They are discrete variables taking a value of ‘0’ or
‘1’. They are often called ‘on’/‘off’ variables, being
‘on’ when they are 1.
• Dummy variables can be used as explanatory
variables for qualitative, discrete, or
categorical data.
*Qualitative dummy variables: e.g., sex, race, health.
*Seasonal dummy variables: the number depends on the nature of the data; quarterly data,
for example, require three dummy variables when the model has a constant term.
*Dummy variables that represent a change in policy:
– Intercept dummy variables, which pick up a change in
the intercept of the regression
– Slope dummy variables, which pick up a change in the
slope of the regression
If y is a teacher's salary and
Di = 1 if a non-smoker
Di = 0 if a smoker
We can model this in the following way:
$y_i = \alpha + \beta D_i + u_i$
Keys:
This produces an average salary for a smoker of E(y | Di = 0) = α.
The average salary of a non-smoker will be E(y | Di = 1) = α + β.
If β > 0, this suggests that non-smokers receive a higher salary than smokers.
Equally we could have used the dummy
variable in a model with other explanatory
variables. In addition to the dummy variable
we could also add years of experience (x), to
give:
$y_i = \alpha + \beta D_i + \gamma x_i + u_i$
[Figure: salary y against experience x; two parallel lines, the smoker line with intercept α and the non-smoker line with intercept α + β]
Two ways of Specifying a Model with Dummy Variables:
1) A model with a constant term:
• Drop one of the dummy categories and treat it as the
reference category. This protects the model from
perfect multicollinearity.
• The constant term is the mean value of the reference category.
• The coefficients of the dummy variables measure the marginal difference from the reference category.
• Example: a model with four seasons,
examining the impact of seasonality on wage income:
$Y = \beta_0 + \beta_1 d_2 + \beta_2 d_3 + \beta_3 d_4 + \varepsilon$
Exercise 1: seasonality is represented by dummy
variables and agricultural wage income is captured
by Y.
$Y = 800 - 200 d_2 + 400 d_3 + 100 d_4 + \varepsilon$
The mean wage in season one is 800 Birr.
1. The wage in season two is 200 Birr lower than the wage in the reference season, S1.
2. The wage in season three is 400 Birr higher than the S1 wage.
3. The wage in season four is 100 Birr higher than the S1 wage.
2) A model without a constant term:
• Drop the constant term. This protects the model
from perfect multicollinearity.
• No season dummy variable is dropped to serve as
a reference category.
• The coefficients of the dummy variables measure mean
values, not marginal differences.
• Example: a model with four season dummy
variables, examining the impact of seasonality
on wage income.
Exercise 2: seasonality is represented by dummy
variables and agricultural wage income is captured
by Y. The model can be derived directly from the first model:
$Y = 800 d_1 + 600 d_2 + 1200 d_3 + 900 d_4 + \varepsilon$
1. The mean wage in season one is 800 Birr.
2. The mean wage in season two is 600 Birr.
3. The mean wage in season three is 1200 Birr.
4. The mean wage in season four is 900 Birr.
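The link between the two specifications is mechanical: each category mean equals the constant plus that category's dummy coefficient, and the reference category's mean is the constant itself. A short Python sketch with hypothetical coefficients (chosen only for illustration, not taken from the exercises):

```python
def to_category_means(constant, differences):
    """Constant-term model -> no-constant model (one mean per category)."""
    return [constant] + [constant + d for d in differences]


def to_constant_form(means):
    """No-constant model -> constant-term model with category 1 as reference."""
    reference = means[0]
    return reference, [m - reference for m in means[1:]]


# Hypothetical model: Y = 500 - 50*d2 + 120*d3 + 30*d4
means = to_category_means(500, [-50, 120, 30])
print(means)

constant, differences = to_constant_form(means)
print(constant, differences)
```

Both forms describe the same four seasonal means; only the interpretation of the coefficients changes.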
Interactive Dummy
Dummy variables are simply variables that have been coded either 0 or 1
to indicate that an observation falls into a certain category. They are also
sometimes called indicator variables.
An interaction term captures the possibility that the effect of one
independent variable might vary with the level of another independent
variable. For example, the effect of a drug on your blood pressure may depend
on your age.
OBS  PRESSURE  AGE  DRUG  Age*Drug
1    85        30   0     0
2    95        40   1     40
3    90        40   1     40
4    75        20   0     0
5    100       60   1     60
6    90        40   0     0
7    90        50   0     0
8    90        30   1     30
9    100       60   1     60
10   85        30   1     30
Suppose that when we run a regression, we get the following result:
Y = 70 + 5(Drug) + .44(Age) + .21(Drug*Age)
A) Again we set D = 0 for the control group and D = 1 for
those taking the drug.
B) We obtain two separate equations for the two
groups:
Set D = 0: Y = 70 + .44Age
Set D = 1: Y = 75 + .65Age
[Figure: blood pressure against age; the drug line Y = 75 + .65Age lies above and is steeper than the control line Y = 70 + .44Age]
Note that for those taking the drug not only does the intercept increase
(that is, the baseline level of blood pressure), but so does the slope.
Interpretation of an interactive term: the effect of one independent
variable (DRUG) depends on the level of another independent
variable (AGE).
The results here suggest that for people not taking the drug, each
additional year adds .44 units to blood pressure.
For people taking the drug, each additional year increases blood
pressure by .65 units.
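The two group equations follow directly from the coefficients of the interaction model. A minimal Python sketch, with the helper name `group_line` assumed for illustration and the coefficients taken from the supposed regression result above:

```python
def group_line(b0, b_drug, b_age, b_interaction, drug):
    """Intercept and age slope implied for a group with DRUG = 0 or 1."""
    intercept = b0 + b_drug * drug
    slope = b_age + b_interaction * drug
    return intercept, slope


control = group_line(70, 5, 0.44, 0.21, drug=0)
treated = group_line(70, 5, 0.44, 0.21, drug=1)
print(control, treated)

# Predicted gap between the two groups at a given age: 5 + 0.21 * age
age = 40
gap = (treated[0] + treated[1] * age) - (control[0] + control[1] * age)
print(gap)
```

The gap between the groups grows with age (about 13.4 units at age 40), which is what it means for the effect of the drug to depend on age.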
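Do not fall into the dummy variable trap!
The trap arises when you enter dummies for both values of a
two-category variable in the same regression together with a constant
term. These variables are perfectly linearly dependent (the two dummies
sum to the constant), so one of them will have to drop
out.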
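One way to see the trap is to check that the cross-product matrix X'X becomes singular when a constant and both dummies are included. The sketch below uses the DRUG column from the table above; the helper `det3` and the variable names are assumptions for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cross_products(columns):
    """X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


drug = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]   # DRUG column from the table
no_drug = [1 - d for d in drug]          # the complementary dummy
const = [1] * len(drug)

# Constant plus BOTH dummies: const = drug + no_drug, so X'X is singular
trap = det3(cross_products([const, drug, no_drug]))

# Constant plus ONE dummy: the 2x2 determinant is non-zero
ok = cross_products([const, drug])
ok_det = ok[0][0] * ok[1][1] - ok[0][1] * ok[1][0]
print(trap, ok_det)
```

A zero determinant means the normal equations have no unique solution, which is why statistical software drops one of the variables.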
“No one size fits all”
Thank you !!!
01-Sep-22 abiotanimaw2014@gmail.com
