
Republic of the Philippines

POLYTECHNIC UNIVERSITY OF THE PHILIPPINES


QUEZON CITY BRANCH
Don Fabian St. Brgy. Commonwealth Quezon City

ECONOMIC DEVELOPMENT

CHAPTER 4
Impact Evaluation of Development Policies and Programs

Group 4
Members:
Arboiz, Angelica
Barbado, Faith Dhoreece
Pucan, Kimberly Andrey
Valeriano, Daniel Gesam

Submitted to:
Francisco Calingasan

HOW DO WE KNOW WHAT WORKS FOR DEVELOPMENT?


Every year, governments and donors spend billions of dollars on development initiatives (Easterly,
2006). In 2012, the total OECD overseas development assistance budget to developing countries was US$151 billion (OECD, 2014). Determining what works for development is a complex and multi-faceted
process that involves a combination of research, data analysis, evaluation, and practical experience. Here are
some key steps and methods for understanding what works in development:
• Research and Data Collection:

Data Gathering: Collecting relevant data and information on the specific development issue or goal in
question. This could include economic indicators, social metrics, health statistics, and more.
Literature Review: Conducting a comprehensive review of existing research and studies related to the
development area of interest. This helps in understanding what has already been tried and tested.
• Needs Assessment:

Identifying Needs: Assessing the specific needs and challenges of the target population or region. This
often involves surveys, interviews, and consultation with local communities.
Stakeholder Engagement: Involving local communities, government agencies, NGOs, and other
stakeholders in the development process to ensure their needs and priorities are considered.
• Theory of Change:

Developing a Theory of Change: Creating a clear and detailed framework that outlines the cause-and-
effect relationships between different interventions and the desired development outcomes.
• Pilot Projects and Experiments:

Testing Interventions: Implementing pilot projects or randomized controlled trials (RCTs) to test the
effectiveness of various interventions. RCTs involve randomly assigning participants to different groups to
compare the impact of interventions.
• Monitoring and Evaluation:

Ongoing Assessment: Continuously monitoring and evaluating the progress of development programs to
determine their effectiveness. This includes tracking relevant indicators and making necessary adjustments.
Impact Evaluation: Conducting rigorous impact evaluations to measure the actual impact of interventions
and to distinguish between correlation and causation.

• Comparative Analysis:

Cross-Country and Cross-Project Comparisons: Comparing the outcomes of different development initiatives across countries or regions to identify best practices and lessons learned.
• Knowledge Sharing and Dissemination:

Sharing Results: Disseminating the findings and results of development projects and evaluations with the
wider community, including policymakers, researchers, and practitioners.
• Scaling Up:

Successful Replication: If a particular approach or intervention is found to be effective in one context, efforts can be made to replicate or scale up the intervention in other similar contexts.
• Feedback and Adaptation:

Learning from Mistakes: Recognizing that not all development interventions will work as planned and
being willing to learn from failures and adapt strategies accordingly.
• Policy and Decision-Making:

Influence Policy: Using research and evidence-based findings to influence government policies and
decisions related to development.
➢ It's important to note that what works in development can vary significantly depending on the context,
the specific development goals, and the needs of the target population. What works in one country or
community may not work in another, so a tailored, evidence-based approach is essential for successful
development efforts. Additionally, development is an ongoing process, and continuous monitoring and
adaptation are crucial to achieving sustainable results.

EVALUATION SYSTEM:
An evaluation system is a structured and systematic approach to assess, measure, and analyze the
effectiveness, efficiency, and impact of programs, projects, policies, or organizations. Evaluation systems are
commonly used in various fields, including business, education, healthcare, and social services, to make
informed decisions, improve performance, and demonstrate accountability. Evaluation systems are used in
various contexts, including program evaluation, performance appraisal, impact assessment, and policy
analysis. The specific design and implementation of an evaluation system depend on the goals and objectives
of the evaluation, the nature of the program or initiative being assessed, and the available resources.
Additionally, different evaluation methods, such as formative evaluation, summative evaluation, and
developmental evaluation, may be employed based on the specific needs of the evaluation.
MAKING IMPACT EVALUATION MORE USEFUL FOR POLICY PURPOSES:
Impact evaluations are an essential tool for assessing the effectiveness of policies and programs, but making them more useful for policymaking requires careful planning and a focus on relevance, timeliness, and clarity. Impact evaluations can be tailored to better inform policy decisions and to provide evidence-based insights into the effectiveness of various policies and programs, ultimately leading to more informed and impactful policymaking.

EXPERIMENTAL DESIGN-RCT

Experimental Design

● Experimental design methods allow the experimenter to understand better and evaluate the factors that
influence a particular system by means of statistical approaches. Such approaches combine theoretical
knowledge of experimental designs and a working knowledge of the particular factors to be studied.
Although the choice of an experimental design ultimately depends on the experiment's objectives and
the number of factors to be investigated, initial experimental planning is essential.

Randomized Controlled Trials or RCT

● A randomized controlled trial is an experimental form of impact evaluation in which the population receiving the program or policy intervention is chosen at random from the eligible population, and a control group is also chosen at random from the same eligible population. It tests the extent to which specific, planned impacts are being achieved.
● The terms "randomized," "controlled," and "trial" capture the key concepts of this design.

The basic principle of ethical randomization is that a group of people can be denied the intervention to serve
as control, but we cannot:

● actively hurt them,
● mislead them or give them wrong information, or
● make them worse off than they would otherwise be without the experiment.

The major purpose of random assignment is to ensure that the study groups are comparable on baseline characteristics, to reduce bias, and to facilitate double-blinding.

How is randomization done?

● Simple Randomization
● Block Randomization
● Stratified Randomization

Steps in conducting an RCT

1. Drawing up a protocol
2. Selecting reference and experimental population
3. Randomization
4. Manipulation or intervention
5. Follow up
6. Assessment of outcome
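The three randomization schemes listed above, used at the randomization step of the protocol, can be sketched in Python. This is a minimal illustration, not trial-management software; the arm labels, seeds, and block sizes are assumptions, and block and stratum sizes are assumed to divide evenly between the two arms.

```python
import random

def simple_randomization(ids, seed=42):
    """Assign each unit to 'treatment' or 'control' with equal probability.
    Simple, but group sizes can end up unbalanced in small samples."""
    rng = random.Random(seed)
    return {i: rng.choice(["treatment", "control"]) for i in ids}

def block_randomization(ids, block_size=4, seed=42):
    """Within each consecutive block, assign exactly half of the units to
    each arm, keeping group sizes balanced throughout enrolment.
    Assumes each block contains an even number of units."""
    rng = random.Random(seed)
    assignment = {}
    for start in range(0, len(ids), block_size):
        block = ids[start:start + block_size]
        arms = ["treatment", "control"] * (len(block) // 2)
        rng.shuffle(arms)
        assignment.update(zip(block, arms))
    return assignment

def stratified_randomization(ids, stratum_of, seed=42):
    """Randomize separately within each stratum (e.g. region or sex) so
    that every stratum is balanced across the two arms."""
    strata = {}
    for i in ids:
        strata.setdefault(stratum_of(i), []).append(i)
    assignment = {}
    for offset, key in enumerate(sorted(strata)):
        assignment.update(
            block_randomization(strata[key], block_size=2, seed=seed + offset))
    return assignment
```

With `block_size=4`, every block of four enrolled units contains exactly two treated and two control units, which is why block randomization keeps the arms balanced even if enrolment stops early.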

Advantages and Disadvantages of RCT

● RCTs have the benefit of reducing bias and offering a rigorous technique for examining cause-and-effect links between an intervention and an outcome; yet the procedure can be expensive, time-consuming, and difficult to plan, carry out, and oversee.

Conclusion

● If randomization can be planned ahead of time for the intervention, that is usually the best course of action. It works best with straightforward, well-defined interventions that have benefits at the individual level, because numerous observations can be made and the outcomes of interest have enough time to appear in a survey a few years after the intervention. To confirm that the randomization is valid, the balance of all attributes unaffected by the program should be checked between the treatment and control groups; these data are usually gathered in a baseline survey conducted before the intervention. A baseline survey is therefore not strictly necessary for estimating the RCT's effect, but it is helpful for verifying that the randomization is valid.
MATCHING METHOD TO CONSTRUCT CONTROL GROUPS: PROPENSITY SCORE MATCHING

● Randomization is not always feasible, especially for events that have already happened and whose
impact we would like to assess after the fact. Moreover, governments and donors often request an
impact assessment of a program that has already been put into action. Propensity score matching (PSM) is one available option. In this matching method, non-participants who are similar to participants on a wide range of crucial observable characteristics are chosen to serve as the control group.

● The method's primary premise is that the unobserved characteristics of the treated and control groups are sufficiently similar that there will be no misleading correlations between treatment and outcome of the kind just discussed. This means that, after adjusting for all observable participant characteristics, program participation is uncorrelated with any other predictor of the outcome. The approach works best for programs that have reached only a portion of the potential beneficiary population, where selection used criteria that had no bearing on the program's potential effects, and where participation was mostly determined by personal preference.

Here's how to use propensity score matching to construct control groups:

1. Define Treatment and Control Groups
2. Collect Covariates
3. Estimate Propensity Scores
4. Matching
5. Assess Balance
6. Analyze Outcomes
7. Report Findings
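Steps 4–6 can be sketched as follows. The sketch assumes the propensity scores have already been estimated in step 3 (in practice, with a logit or probit model of participation on the covariates); it performs 1:1 nearest-neighbour matching with replacement and computes the average treatment effect on the treated (ATT). All numbers are illustrative.

```python
def nearest_neighbor_match(treated, control):
    """1:1 nearest-neighbour matching with replacement.
    treated, control: lists of (unit_id, propensity_score, outcome).
    Each treated unit is paired with the control unit whose propensity
    score is closest; returns (treated_id, control_id, outcome_gap)."""
    pairs = []
    for t_id, t_score, t_y in treated:
        c_id, c_score, c_y = min(control, key=lambda c: abs(c[1] - t_score))
        pairs.append((t_id, c_id, t_y - c_y))
    return pairs

def att(pairs):
    """Average treatment effect on the treated: the mean outcome gap
    across matched pairs."""
    return sum(gap for _, _, gap in pairs) / len(pairs)
```

For example, a treated unit with score 0.80 is matched to the control unit whose score is closest (say 0.75), and the impact estimate is the mean of the outcome gaps over all matched pairs; checking balance (step 5) then amounts to comparing covariate distributions across the matched samples.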

In summary, the purpose of propensity score matching is to mitigate selection bias, improve the comparability
of treatment and control groups, and facilitate the estimation of causal effects in observational studies. It is a
widely used method in social and medical sciences when randomized controlled trials are not feasible or
ethical.

REGISTERING A PRE-ANALYSIS PLAN

● A pre-analysis plan (PAP) is made available to the public and outlines the researchers' plans for conducting the study and analyzing the collected data. It is usually filed before the start of the intervention, and at the latest before data analysis starts. The PAP is often submitted with the study's trial registration because it requires a time stamp.

Why Analysis Plan?

● To ensure that you collect all the data you need and that you use all the data you collect. Analysis planning can be an invaluable investment of time: it can help you select the most appropriate research methods and statistical tools, and it enhances transparency.

Guide in Preparing Analysis Plan

● Sorting Data
● Performing quality-control checks
● Data processing
● Data analysis

Purpose

● A Pre-Analysis Plan (PAP) serves several important purposes in research, particularly in the social
sciences and empirical studies:

1. Transparency
2. Reducing Data Mining and HARKing (hypothesizing after results are known)
3. Replicability and Reusability
4. Accountability
5. Planning and Organization
6. Funding and Review
7. Ethical Considerations

In summary, the primary purpose of a Pre-Analysis Plan is to promote research transparency, reduce bias, enhance the reproducibility of results, and uphold the quality and integrity of empirical research. It is a valuable tool for ensuring that the research process is conducted in an open and accountable manner.

Difference-in-Differences Method
To measure the impact of a program, one approach is to observe the change over time in the outcome variable
among beneficiaries. However, this change cannot be interpreted as policy impact due to other factors that
may have changed during the same period. To account for these "natural changes," the change over time
among non-beneficiaries during the same period is computed and subtracted from the change observed among
beneficiaries. This produces the difference-in-differences estimate of the program's impact. The validity of
this method depends on the crucial assumption of "parallel trends" between the control and treatment groups
without intervention. To support this assumption, observations before and after implementation of the program
for both the treatment and comparison groups are required, along with observations of the same groups for at
least two periods before implementation, unless alternative tests support the validity of the counterfactual
trends.
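The computation described above can be written out directly. This is a minimal sketch with illustrative group means, not an estimator with standard errors; in practice the same quantity is obtained from a regression with group, period, and interaction terms.

```python
def diff_in_diffs(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences estimate: the change over time among
    beneficiaries minus the change over the same period among
    non-beneficiaries.  Each argument is a list of outcomes for that
    group and period."""
    mean = lambda xs: sum(xs) / len(xs)
    change_treated = mean(treat_post) - mean(treat_pre)
    change_control = mean(ctrl_post) - mean(ctrl_pre)
    return change_treated - change_control
```

For instance, if beneficiaries' mean outcome rises from 11 to 21 while non-beneficiaries' rises from 9 to 13, the estimate is 10 − 4 = 6: the common 4-point rise is attributed to the "natural changes" that the control group absorbs.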

In conclusion, the diff-in-diffs method is commonly used to analyze policy impact with macro- and
microeconomic data. However, policies are often introduced alongside contextual changes, making it difficult
to isolate the effects of a single policy. This can create a bias in the analysis, as other events and policies may
be mistakenly attributed to the policy reform. To address this, robustness checks and a convincing argument
must be made to dismiss suspicion of spurious correlation. Falsification tests can also be used to verify that
outcomes not affected by the program are moving together both before and after the start of the intervention.
The diff-in-diffs methodology can be combined with other impact evaluation methods like PSM to control for
unobserved characteristics.

Generalization of the Diff-in-Diffs Approach: Roll-Outs with Panel Data


A staggered entry roll-out can be used to define proper counterfactuals for programs or policies. The units that
have not been treated yet may serve as the control group for the treated units. Fixed-effects estimation with
panel data is used to estimate the impact of the rollout. The validity of this method depends on the absence of
other changes in the context or unit's behavior that are correlated with the entry into the program and may
affect the outcome of interest. Two important cases can arise in which this assumption fails. Panel data are also used to analyze country-level policy interventions and relationships between macroeconomic variables.

In conclusion, to analyze the impact of policy or program initiatives implemented in stages, a method similar to diff-in-diffs can be used. However, it is important to understand the roll-out process, verify that pre-program trends were unrelated to the program, and account for other factors that may affect the outcome. Robustness checks should be performed to ensure that the estimated impact remains consistent.
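For a balanced panel, the fixed-effects estimator mentioned above reduces to the two-way within transformation: demean the treatment dummy and the outcome by unit and by period, then regress one on the other. A minimal sketch with synthetic staggered roll-out data (unit names, periods, and the built-in effect of 5 are all illustrative):

```python
from collections import defaultdict

def twfe_estimate(rows):
    """Two-way fixed-effects estimate for a balanced panel.
    rows: list of (unit, period, treated_dummy, outcome).
    Applies the within transformation x_it - x̄_i - x̄_t + x̄ to both the
    treatment dummy and the outcome, then returns the OLS slope."""
    n = len(rows)
    d_bar = sum(r[2] for r in rows) / n
    y_bar = sum(r[3] for r in rows) / n
    by_unit_d, by_unit_y = defaultdict(list), defaultdict(list)
    by_per_d, by_per_y = defaultdict(list), defaultdict(list)
    for u, t, d, y in rows:
        by_unit_d[u].append(d); by_unit_y[u].append(y)
        by_per_d[t].append(d);  by_per_y[t].append(y)
    mean = lambda xs: sum(xs) / len(xs)
    num = den = 0.0
    for u, t, d, y in rows:
        d_tilde = d - mean(by_unit_d[u]) - mean(by_per_d[t]) + d_bar
        y_tilde = y - mean(by_unit_y[u]) - mean(by_per_y[t]) + y_bar
        num += d_tilde * y_tilde
        den += d_tilde * d_tilde
    return num / den
```

With a staggered roll-out in which unit A enters the program in period 2 and unit B in period 3, the not-yet-treated observations serve as the control group, exactly as described above; on noiseless synthetic data the estimator recovers the built-in effect.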

Qualitative Methods
Qualitative methods are useful in gaining insights from program beneficiaries and administrators on what
works and what does not work. They help understand the reasons behind participation or non-participation in
a program and identify the perceived effects of different interventions. Narayan’s Voices of the Poor is an
example of an extensive interview and focus-group study that provides insights into determinants of poverty influenced by personal or political interests. Qualitative methods should be designed to contrast the
experiences of beneficiaries and non-beneficiaries and inform us about presumed causal determinants of
observed outcomes. They can be used before quantitative analysis to formulate hypotheses and identify the
main dimensions of heterogeneity of impact. After quantitative analysis, they can assess the plausibility of
results and refine hypotheses by helping to interpret the presumed channels through which impact occurred.

Regression Discontinuity Designs (RDD)


This study reviews the literature on whether regression-discontinuity studies reproduce the results of randomized experiments conducted on the same topic. After briefly reviewing the regression-discontinuity design and its history, we explicate the general conditions necessary for a strong test of correspondence in results when an experiment is used to validate any non-experimental method. In economics, within-study comparisons of this kind are associated with LaLonde [1986], and we elaborate on how to do such studies better than was done twenty years ago. We identify three cases where regression-discontinuity and experimental results
with overlapping samples were explicitly contrasted. By criteria of both effect sizes and statistical significance
patterns, we then show that each study produced similar results across the experiment and regression-
discontinuity study. This correspondence is what theory predicts. But to achieve it in the complex social
settings in which these within-study comparisons were carried out suggests that regression-discontinuity
results may be generally robust.
In many countries, important features of municipal government (such as the electoral system, mayors' salaries, and the number of councillors) depend on whether the municipality is above or below arbitrary population
thresholds. Several papers have used a regression discontinuity design (RDD) to measure the effects of these
threshold-based policies on political and economic outcomes. Using evidence from France, Germany, and Italy, we highlight two common pitfalls that arise in exploiting population-based policies (compound treatment and sorting), and we provide guidance for detecting and addressing these pitfalls. Even when these problems are
present, population-threshold RDD may be the best available research design for studying the effects of certain
policies and political institutions.
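The basic sharp-RDD estimator behind these studies can be sketched as a local linear fit on each side of the threshold; the jump between the two fitted lines at the cutoff is the estimated effect. The cutoff, bandwidth, and data below are illustrative, and real applications would add robustness checks over bandwidth choices.

```python
def linear_fit(xs, ys):
    """OLS intercept and slope for a single regressor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def rdd_estimate(running, outcome, cutoff, bandwidth):
    """Sharp RDD: fit separate lines just below and just above the
    cutoff (within the bandwidth) and return the jump in the predicted
    outcome at the cutoff."""
    below = [(x, y) for x, y in zip(running, outcome)
             if cutoff - bandwidth <= x < cutoff]
    above = [(x, y) for x, y in zip(running, outcome)
             if cutoff <= x <= cutoff + bandwidth]
    a_lo, b_lo = linear_fit([x for x, _ in below], [y for _, y in below])
    a_hi, b_hi = linear_fit([x for x, _ in above], [y for _, y in above])
    return (a_hi + b_hi * cutoff) - (a_lo + b_lo * cutoff)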

Event Analysis and Event-Severity Analysis

Generally speaking, we are unable to determine the impact of a one-time event that impacts all units of analysis
(households, businesses, and municipalities), such as a national policy reform. This is due to the lack of a
counterfactual for this specific occurrence. We are unable to distinguish the influence of the reform from that
of the other events because a number of things might have happened concurrently with the reform. For
instance, yields may be impacted by a one-time real exchange rate (RER) appreciation that changes fertilizer prices, but the
greater yields that were recorded may have also been a result of favorable weather that year. We are unable to
link the event under investigation to the observed change in the outcome of interest.
An exception occurs when an event is concentrated in time and space and has a significant, well-defined short-term impact; the inference is that any observable shift in the outcome is unlikely to be the result of unrelated factors. In this
instance, the influence can be determined by taking the simple difference in the outcome between immediately
after and immediately before the incident. This has been used in research on how political ties affect how the
market values a company's ability to make money. Businesses associated with a specific politician or political
party may see a decline in share prices following the unexpected death of a powerful or dishonest leader. Efficient stock markets internalize such new information into business valuations almost immediately.
Fisman (2001) discovered that news concerning Indonesian President Suharto's deteriorating health had an
impact on the stock price of companies associated with him. An event-severity analysis is made possible by a
firm's degree of connectivity to Suharto; a higher connection is anticipated to result in more significant price
shocks. According to the findings, investors calculated that if Suharto passed away, the value of companies
connected to him politically would decrease by 20%. The extent of this price shock was correlated with the
degree of Suharto dependency.
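The two calculations described above can be sketched directly: a simple before/after difference around the event date, and an event-severity regression of the price shock on the degree of connection. The figures below are illustrative, not Fisman's data.

```python
def event_effect(series, event_index, window=2):
    """Simple difference: mean outcome in the window just after the
    event minus the mean in the window just before it."""
    before = series[event_index - window:event_index]
    after = series[event_index:event_index + window]
    return sum(after) / len(after) - sum(before) / len(before)

def severity_slope(connectivity, price_change):
    """Event-severity analysis: OLS slope of the price shock on the
    degree of connection.  A negative slope means more-connected firms
    lost more value when the event hit."""
    n = len(connectivity)
    c_bar = sum(connectivity) / n
    p_bar = sum(price_change) / n
    return (sum((c - c_bar) * (p - p_bar)
                for c, p in zip(connectivity, price_change))
            / sum((c - c_bar) ** 2 for c in connectivity))
```

In the Suharto example, `event_effect` corresponds to the share-price drop around a health announcement, and `severity_slope` captures how that drop scales with a firm's degree of connection to the regime.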

Macro-level impact evaluation: What can be done?

The previous chapter introduced language program evaluation at the meso level, with a view to understanding how departmental strategies and teaching initiatives affecting the integration of technology are understood by instructors and program administrators in a collegial environment. In this chapter, we discuss blended
language program evaluation at the macro level. Characteristically, the macro level is made up of senior
administrators and academics who often frame the integration of educational technology through the
production and dissemination of documents that include institutional policies, strategy papers, and teaching
initiatives. After explaining reasons for evaluation at the macro level, we describe an example argument to
illustrate its use in context. We then discuss working with project stakeholders, explain working with
documents and with people at this level, and conclude the chapter with a brief summary.
Achieving these aims would inform the relevance and strength of other factors that may influence
blended language learning at the meso and micro levels. To give an example, imagine if a modern language
department that was keen to adopt blended approaches came to understand how well their efforts aligned with
national goals; with an improved view of the macro level, they could adjust their work such that it better met
larger initiatives and motivated long-term efforts. The lack of an understanding of the macro level could foster a lack of purpose, for example, which may then lead to a sense of isolation or a poor use of funding.
In a sense, macro-level mechanisms help to establish boundaries and expectations of an educational system.
That ‘system’, depending on the goal of the evaluation project, could encompass international agreements or
be focused on actions much more local, such as the rules set by a teacher for expected classroom behaviors.
Without such parameters, an educational system may lack direction and lack clarity about what is valued,
what is expected, or what can be achieved.

REFERENCES:

● Randomised Controlled Trial, RCT, Experimental study. (n.d.). PPT. https://www.slideshare.net/DrLipilekhaPatnaik/randomised-controlled-trial-rct-experimental-study
● Research Guides: Study Design 101: Randomized Controlled Trial. (n.d.).
https://guides.himmelfarb.gwu.edu/studydesign101/randomized-controlled-trial
● Matching methods. (n.d.). PPT. https://www.slideshare.net/FAIZYMANSOURY/matching-methods
● Propensity Score Matching Methods. (n.d.). PPT. https://www.slideshare.net/erf_latest/propensity-score-matching-methods
● analysis plan.ppt. (n.d.). https://www.slideshare.net/SamsonOlusinaBamiwuy/analysis-planpp

● Eggers, A. C., Freier, R., Grembi, V., & Nannicini, T. (2018). Regression discontinuity designs based on population thresholds: Pitfalls and solutions.
● Buddelmeyer, H., & Skoufias, E. (2004). An evaluation of the performance of regression discontinuity design on PROGRESA.
● Wuepper, D., & Finger, R. (2023). Regression discontinuity designs in agricultural and environmental economics.
● Gruba, P., Cárdenas-Claros, M. S., Suvorov, R., & Rick, K. (2016). Macro-level evaluation. In: Blended Language Program Evaluation.
● Bussolo, M., & da Silva, L. A. P. (Eds.). (2008). The impact of macroeconomic policies on poverty and income distribution: macro-micro evaluation techniques and tools.
