Report and Analysis of Evaluating Programs
Jeremy Mitchell
Oakland University
Abstract
The second edition of Martin H. Jason’s 2008 Evaluating Programs to Increase Student
Achievement lays out the major reasons why schools must conduct program evaluations as well
as practical ways to make the process worthwhile. Program evaluation encompasses quantitative
and qualitative data collection and analysis. In an ideal world these evaluations are conducted by
a diverse team with the time and resources to analyze data and make recommendations to leaders
with fidelity. The scarcity of time and money is often times the biggest constraint, and thus this
Introduction
School leaders are responsible for the success of students in an environment of high accountability. Education is not an industry on an island in this regard, but with more ways than ever to skin the proverbial cat, it is important to evaluate programs and initiatives to ensure they are, at the least, effective,
but more importantly to provide recommendations for improvement. The second edition of Jason's (2008) book serves as a practical guidebook to aid school administrators and teacher leaders in the process of program evaluation.
The following is a synopsis of this guide by chapter, along with correlations and implications to
current practice.
The analogy of spinning your wheels in the mud comes to mind when thinking of
program evaluation. Whether in schools or any other organization, how do you know if your
efforts are worthwhile unless you are willing to take a step back and assess progress? This is
especially true for public schools considering they are funded by the public and face a paradigm
of shrinking budgets where efficiency and targeted initiatives must be effective. There are many
benefits and uses to a methodical program evaluation, but in particular, “program evaluation
serves two organizational functions - it confirms and it diagnoses” (Jason, 2008, p. 1). Chapter 1
of the book lays out the large decisions that schools must make regarding the evaluation of
programs, such as whether to assess their own programs or look to outside agencies, taking the
time to identify the goals of the evaluation, and deciding whether the evaluation should be formative or summative.
Grand Blanc Community Schools conducts program evaluation across the spectrums laid
out in this first chapter. One example of a formative evaluation would be the curriculum studies
that are conducted across the district via grade level or department committees. Curriculum is on
a cycle to be reviewed and updated according to changing standards, pedagogy, and assessment.
These teams research and keep the district curriculum council up to date. This council meets
monthly and different groups present their findings periodically. Depending on the cycle of the
curriculum study, groups may be researching what solutions are out there, piloting different
materials, or even presenting proposals to fund the resources for full implementation.
On the other end of the spectrum, Grand Blanc Schools participates in a system-wide
formal summative evaluation conducted by AdvancEd. Not only are school improvement plans
annually submitted for evaluation, but also every five years AdvancEd deploys a diverse team of
experienced external evaluators onsite to critique practices district wide. This process not only
serves as a pulse measure, but also gauges the effectiveness of the organization for local, state, and national comparison.
Evaluating programs in the field of education is absolutely necessary. The most practical
way is to conduct on-site evaluations that formatively assess the impact of teaching and learning
so plans for improvement can be formulated. It can be quite challenging to free up people’s time
to genuinely reflect and analyze effectiveness, but responsible leaders know that this process is worth the investment.
Ultimately this chapter highlights whether leaders and their teams value reflective
practice and whether or not problems are seen as opportunities for improvement. Jason cites different management and leadership authors, in the education sphere and otherwise, who define those genuinely conducting program evaluation as “learning organizations” (Jason, 2008, p. 12).
Organizations in this category administer reflective practices focused on growth and continuous
improvement as opposed to complacency driven by ego. The continuous improvement cycle laid out by Jason is to plan the evaluation design, collect and analyze data, draw conclusions, and act on the results (Jason, 2008).
Early in the chapter Jason cites the “overriding mentality… to the unremitting pursuit of
excellence” as a key indicator of program evaluation success (Jason, 2008, p. 11). This phrase
happens to be part of the Grand Blanc Schools mission statement. The most recent change
driven by the continuous improvement and program evaluation processes in place throughout the
district is the rebranding and modification of Grand Blanc’s City School. City School is a 1st
grade through 5th grade multi-age elementary school that over the course of its history has
unintentionally become a de facto gifted and talented magnet school for wealthy families. While
highly effective in terms of scores on tests, City School is not representative of the greater
district environment.
It has taken much political capital, but in the fall of 2017 City School will become the
Perry Innovation Center, which will be a 2nd grade through 6th grade building that adopts a
project based learning approach to curriculum and instruction. The building will also house the
district’s curriculum and technology departments. Teachers will rotate in and out of the school
on two- to three-year stints, after which they will bring innovative ideas back to their previous
elementary buildings. The new student body will be required to demographically match the
district on the whole, so that the innovations and successes can be transferred and scaled to other buildings without the pushback that those kinds of results can only be achieved with a draft pick of the best of the best students, the way City School did.
There is much work to be done, but the hard questions were asked of the system and now
changes are underway to improve outcomes for everyone because of it. This new school will
now face more scrutiny than City School ever did. Program evaluation will continue, and
improvement from the old model must be made. The Superintendent declared that he wants the school to be successful, but that if it has not achieved a sustainable model for innovations that can be scaled to other buildings, the concept will be revisited.
Approach
In this chapter, Jason lays out two different methodologies for conducting the evaluation: nonexperimental and experimental. The benefits and drawbacks of each, with regard to validity and how to best understand and limit the contribution of outside factors, are discussed (Jason, 2008). Nonexperimental evaluation is past driven, in
that it analyzes existing past data to “judge the program as it now stands” (Jason, 2008, p. 24).
Experimental evaluation is future focused and geared toward trying out something new and
hoping to limit variables and determine whether or not taking this new road would be beneficial. This experimental approach mirrors the piloting of new curriculum mentioned earlier. Grand Blanc, like many other school districts across the country, chooses to pilot new curriculum or programs as a test drive to determine whether or not it will be beneficial
for student outcomes. It is wise not to jump into the mass purchasing of programs or items
without doing so; however, as Jason cites, there are many factors that can over-influence the results of an experimental pilot. One factor mentioned that Grand Blanc has been cautious of is the concept of “demoralization” (Jason,
2008, p. 28). Demoralization occurs when students participating in the “regular” program
have their motivation or “esprit de corps” diminished by the excitement surrounding the
“innovative” experimental pilot program. As Grand Blanc has piloted new devices like
Chromebooks in certain pockets of the district this effect has been considered and even seen.
The shiny new toy cannot be allowed to distract and dull the experience of the current or control group, thus skewing the comparison.
Three years ago the district piloted one-to-one Chromebooks in select high school social
studies classrooms. Of course there was some resentment from non-participating classrooms
because they were missing out on the new technology, but as many educators have learned,
technology is merely a tool and cannot replace quality instruction. Ultimately, the initiative was
viewed favorably and with the passage of a Technology bond the district was able to move
forward with increasing the procurement and implementation of more Chromebooks. Overall, the affordability, durability, and functionality of the Chromebooks have been quite a benefit,
especially in comparison to other device options like the iPad, or even laptops.
In education we are constantly tweaking variables and making adjustments to our curriculum, instruction, and assessment in
order to improve outcomes for students. This entire book is devoted to evaluating those tweaks
and process changes to determine if they have been effective or not, and whether further tweaking is needed. This chapter focuses on some of the weeds of ensuring that the experiments and trials conducted in schools are valid.
One of the biggest pieces of ensuring validity is to build control into the experiment.
One example the author mentions to help build in control is to create similar conditions across two classrooms, one being the control environment and the other being the experimental environment. So if
you were implementing a new writing program for example, you would want the teachers of
each classroom to have similar strengths, experience, and ability. Additionally, Jason cites that
you would hope that each class could be randomly assigned, “that way the process of assignment
would be based on chance and not on the judgment of the person(s) responsible for planning the
evaluation” (Jason, 2008, p. 39). The difficult part for schools is that creating two equivalent
groups that are assigned randomly to increase validity is rarely practical. In this case a “one-group design” can be used, and certain techniques like using multiple pretests and posttests can strengthen confidence in the results, as sketched below.
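To make the one-group design concrete, the following is a minimal sketch, assuming hypothetical pretest and posttest scores for a single classroom; the numbers and the use of scipy are illustrative assumptions, not an example taken from Jason's book.

```python
# A minimal sketch of analyzing a one-group, pretest/posttest design.
# All scores below are hypothetical and for illustration only.
from scipy import stats

# Scores for the same ten students before and after the new program
pretest = [61, 55, 70, 64, 58, 66, 72, 59, 63, 68]
posttest = [68, 60, 74, 71, 62, 70, 78, 64, 69, 75]

# A paired t-test compares each student's own growth against the
# null hypothesis of no change from pretest to posttest.
t_stat, p_value = stats.ttest_rel(posttest, pretest)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```

Because there is no control group, even a significant result like this cannot rule out outside factors such as normal maturation, which is why multiple pretests and posttests are suggested to establish a trend.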
Taking steps to limit variables so that outcomes can be correlated to the program is important, but the most important piece is keeping the big picture of enhancing credibility, and thoroughly describing the “experimental design” can overcome many shortcomings (Jason, 2008). This is an important note for schools to remember. Internally,
running experiments free of bias can be very difficult, but having an open and transparent
process that is described to stakeholders is a major step toward making the process more credible. This
holds true for the curriculum study and piloting example mentioned earlier. Every grade level and department curriculum study in Grand Blanc has to be conducted in stages such as exploration, pilot, evaluation, and adoption. While class rosters are already set, and finding teachers of the same caliber and experience for the control and pilot classes is difficult, the transparency of the process lends credibility to the results.
The stakes can be high for anyone involved in a program evaluation, and even more so if the evaluation of the program could lead to the end of the program. Therefore, having a strong, organized, and diverse team, headed up by a leader that
emphasizes the importance of the work of the team is a necessity. In this chapter Jason identifies
some very practical things for principals to consider when forming an evaluation team, as well as tips for identifying the roles of each member: nothing outside the norm of typical leadership and task organization advice. Examples suggested include being organized, sending out communication early and often, and communicating the recommendations of the team to the staff involved.
The difficulty in reviewing this chapter as it relates to local, state and national education
decisions is how much nuance is involved in leadership. As mentioned, these teams and
decisions can have real consequences not only for the success of children, but also for the livelihood of team members. One recent, local example involves Grand Blanc Community
Schools’ decision to conduct facility studies due to declining enrollment, and ultimately
determine if certain buildings should be closed and others then consolidated to maximize the use
of resources. On the north end of the district there are two mutually supporting elementary
buildings, one K-2 and the other 3-5. Of four proposals given to the school board, three of them involved consolidating these two into a single K-5 building.
Ultimately, the closure of one building will not happen because enrollment has not dropped as fast as expected in the months since the study was initiated, but that possibility still looms.
In the Grand Blanc example, the facility study team, initiated by the school board, was
initially made up of a mix of administrators at the building and central office levels. Eventually,
the study team would grow to include representatives from buildings that would be affected and
even union leadership. Choosing the team, as mentioned in Jason’s book, was a very important
consideration to weigh. If the team was too heavy with administration then closing a building
and potentially having staff reductions could create a lot of animosity. On the flip side, having
too many staff members that were vested in the building staying open could lead to punting on a
tough decision to avoid short term heartache. Luckily, the ultimate level of democratic decision
making would be left to the school board, deciding upon a menu of options. At the time of this writing, the school board has delayed closing any buildings, but is still keeping the option of consolidation on the table.
The same holds true for the field of education in general. Teams of evaluators must be
collaborative and diverse in experience in order to prevent bias. Every state has a wide range of diverse communities with their own unique challenges. A team evaluating programs in an urban, impoverished area wouldn't necessarily be best served by evaluators from that professional demographic; they might be too sympathetic to the challenges of the environment. However, a team of outsiders without that experience would be greeted with skepticism. In essence, the team needs a balance of both perspectives.
Since the stakes of a program evaluation are often so high, it is important for the team to identify the right indicators and collect accurate data. Additionally, the team should take steps to ensure that the data they are collecting is valid
and reliable. In other words, the data must relate to what they want to measure and be measured in a
consistent way (Jason, 2008, p. 74). One of the best ways to determine the who, what, when,
where, and why of measurement is to have an open conversation with the parties involved in the
program: “leaders should be sensitive to the personal or human side of program evaluation”
(Jason, 2008, p. 65). Talking with the professionals involved in the program will never be a
wasted effort, and many insights can be gained. Jason (2008) highlights the importance of professional judgment in interpreting what is measured. In essence, you can measure many indicators, but ultimately the conclusions about the data will be drawn through professional judgment.
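As one illustration of checking that data is "measured in a consistent way," the sketch below computes a simple test-retest correlation. The survey scores are hypothetical, and this is a common reliability check in general, not a procedure prescribed by Jason.

```python
# A minimal sketch of a test-retest reliability check, using
# hypothetical scores from two administrations of the same survey.
from scipy import stats

# Hypothetical ratings from ten students, surveyed twice a week apart
first_administration = [3.2, 4.1, 2.8, 3.9, 4.5, 3.0, 3.7, 4.2, 2.9, 3.5]
second_administration = [3.4, 4.0, 2.9, 3.8, 4.4, 3.2, 3.6, 4.3, 3.0, 3.6]

# A high correlation between the two administrations suggests the
# instrument measures consistently (is reliable).
r, p_value = stats.pearsonr(first_administration, second_administration)
print(f"test-retest correlation r = {r:.2f}")
```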
The last point about professional judgment, especially within local school districts, is
very important to consider. The reality is that having a staff member on a program evaluation team does not make that person a trained researcher. Conclusions can be drawn based upon presented data, but not to the level that a professional research team would provide. At the state and national level, though, professional analysis is much more likely.
Multimillion dollar implications call for the most detailed and informed analysis possible.
Decision makers seeking feedback from an evaluation team, no matter the political level, must understand the level of rigor behind the analysis they receive.
A recent example of this is the character education initiative taking place at Indian Hill Elementary School that is ultimately spreading across the entire district. One of the first quantitative indicators that administrators would look at to evaluate a character education program is suspension data. The issue with judging the effectiveness of this initiative on this metric is that Indian Hill happens to be a pretty high performing school. With so few serious behavior incidents already, comparing the effect of this initiative according to this metric doesn't paint an accurate picture. However, there has been so much anecdotal support for the positive effects of the program that decision makers have decided to expand it to other schools. Early qualitative survey data supports this direction as well.
Qualitative data may be enough to justify a character education program, but for academic initiatives where growth can be measured, quantitative data must be utilized as well.
Chapters 7 and 8 are devoted to the six phases of the actual evaluation process. Chapter 7
describes phases one through three: Phase 1 - describing the program, Phase 2 - providing
direction for the evaluation, and Phase 3 - obtaining information to answer evaluation questions.
For Phase 1, the most important aspect is not only being able to describe the program in written
format, especially for decision makers, but also to clarify with the faculty in a program that the
team understands “what they are trying to accomplish and how they are going about it” (Jason,
2008, p. 78). The goal of Phase 2, providing direction, is to formulate questions to ask in order
to better understand the program, its goals, and whether or not those goals are being met. Lastly
for this chapter, Phase 3 requires gathering evidence and information to be able to answer those
questions. There are many approaches and data points that can be collected to inform the evaluation. Both the quantitative and qualitative information that support the questions asked in Phase 2 are crucial to presenting a complete picture.
Chapter 8 covers the second half of the evaluation phases: Phase 4 - analyzing data to answer the evaluation questions, Phase 5 - drawing conclusions and making recommendations, and Phase 6 - writing the evaluation report. These phases are the main event
of the process; the proverbial show we’ve all been waiting for.
One of the biggest considerations for the team during Phase 4 is to determine if the sample size can establish statistical significance that the program is effective, or
whether practical significance is required, wherein the evaluator's professional judgment alone must be trusted. Throughout the chapter, many examples are provided by Jason for teams to utilize in
considering the presentation of the data in order to draw comparisons and correlations. One such
example that is very effective and common is the “Two Group Comparison,” which seeks to determine whether the isolated variable of the program initiative has made an impact when
compared to a control group. Once data has been thoroughly presented and analyzed, judgments
about effectiveness, efficiency, and efficacy in context are then made by the team through consensus.
The team most likely will assign a rating across a spectrum, like ‘very effective, somewhat
effective’, etc. (Jason, 2008, p. 130). Lastly, in Phase 6, covered in greater detail in Chapter 9, the team writes the evaluation report for all stakeholders in “two stages: draft and final form” in order to incorporate feedback before the final version (Jason, 2008).
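To make the "Two Group Comparison" concrete, here is a minimal sketch comparing a hypothetical pilot classroom to a control classroom; the scores and the 0.05 threshold are illustrative assumptions, not figures from the book.

```python
# A minimal sketch of a two-group comparison between a pilot
# (experimental) classroom and a control classroom. All scores
# below are hypothetical.
from scipy import stats

# Hypothetical end-of-unit assessment scores (percent correct)
control_scores = [72, 68, 75, 80, 71, 69, 77, 74, 70, 73]
pilot_scores = [78, 74, 81, 85, 76, 79, 83, 77, 80, 82]

# An independent-samples t-test checks whether the difference in
# group means is larger than chance variation would explain.
t_stat, p_value = stats.ttest_ind(pilot_scores, control_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Statistically significant difference between the groups.")
else:
    # With class-sized samples, practical significance and the
    # evaluator's professional judgment may matter more than p alone.
    print("Not statistically significant; weigh practical significance.")
```

With small, class-sized samples a nonsignificant result does not prove the program failed; this is precisely where the distinction between statistical and practical significance described in Phase 4 applies.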
This chapter gives some general guidelines for how the report should be organized. It
frames how certain content should be presented, such as separating quantitative and qualitative
data analysis, etc. Essentially, the overall purpose of the evaluation report is to serve as a
detailed, succinct, and usable body of work from which administrators and faculty can make decisions.
Conclusion
Program evaluation is a necessity for schools, whether through small scale formative checks or full spectrum summative evaluations. Since the implications for
some program evaluations are so drastic, it is essential for evaluation teams to work
hand-in-hand with groups to identify and measure key indicators accurately. Thorough analysis
and clear reporting are essential for decision makers to make an informed decision regarding the
program. While this process can be tedious, if done correctly it will absolutely lead to improved results and efficiency. Program evaluation is necessary, and Jason's guide is a practical way to approach it.
References
Jason, M. H. (2008). Evaluating programs to increase student achievement (2nd ed.). Corwin Press.