
Program on Education Policy and Governance Working Papers Series

The Impact of Performance Pay for Public School Teachers:


Theory and Evidence

Marcus A. Winters
University of Arkansas

Gary W. Ritter
University of Arkansas

Ryan H. Marsh
University of Arkansas

Jay P. Greene
University of Arkansas

Marc J. Holley
University of Arkansas

PEPG 08-15

Preliminary draft
Please do not cite without permission

Prepared for the CESifo/PEPG joint conference


“Economic Incentives: Do They Work in Education”
Insights and Findings from Behavioral Research

CESifo Conference Center


Munich, Germany
May 16-17, 2008
The Impact of Performance Pay for Public School Teachers: Theory and Evidence

Marcus A. Winters*
Ph.D. Candidate

Gary Ritter**
Associate Professor

Ryan Marsh**
Research Associate

Jay P. Greene**
Professor

Marc Holley**
Ph.D. Student

* Department of Economics, University of Arkansas


** Department of Education Reform, University of Arkansas

Abstract:

This paper derives a theoretical model for understanding the impact of providing teachers
with bonuses for increasing student academic proficiency. Similar to previous labor
economic research, we model a teacher’s choice of effort as a labor-leisure tradeoff.
However, we recognize that teachers are different from other workers in that they could
have some internal motivation to increase productivity and that this “caring” for students
could have implications for the impact of performance pay. We then test these predictions
using data from a generous performance pay program in Little Rock, Arkansas. Using a
differences-in-differences approach, we find that students whose teachers were eligible
for performance pay made substantially larger test score gains in math, reading, and
language than students taught by untreated teachers. Further, we find a negative
relationship between the average performance of a teacher’s students the year before
treatment began and the additional gains made after treatment.

I) Introduction

In the United States, the majority of public school teachers receive compensation

according to a salary schedule that is nearly entirely determined by their number of years

of service and their highest degree attained. This system, however, has seen increasing

attacks from policymakers and researchers in recent years. Several school systems have

considered adding a component to the wage structure that directly compensates teachers

based upon the academic gains made by the students in a teacher’s care, at least partly

measured by student scores on standardized tests. Several public school systems

including Florida, New York City, Denver, and Nashville have recently adopted such

“performance-pay” policies. Recent survey research suggests that nearly half of all

Americans support performance-pay for teachers whose students are making academic

progress, while about a third of Americans directly oppose such a plan (Howell, West,

and Peterson 2007).

The focus on performance-pay programs recognizes the consensus that teacher

quality is one of the most important parts of the education process. Analyses using panel

data suggest that the quality of the teacher in a classroom is one of the most important

predictors of student achievement (Rivkin, Hanushek and Kain 2005; Harris and Sass

2006; Aaronson, Barrow and Sander 2003; Ballou, Sanders and Wright 2004; Goldhaber

and Brewer 1997; Rockoff 2004). Other research has focused on identifying observable

characteristics that predict teacher productivity, though these papers have had little

success in their search (for a complete review of this literature see Hanushek and Rivkin

2006).

However, an important limitation of this previous empirical research is that it

treats teacher productivity as a function of only teacher ability. The papers assume that

this ability is either exogenously given or it is increased through professional

development. Given that these papers have found little evidence that professional

development increases teacher effectiveness, this research suggests that ability is the sole

determinant of the substantial variation in teacher quality.

What may be missing from these previous models is an understanding of the

impact of teacher effort in the educational production process. There are several reasons

that it is important to consider the impact of effort in a teacher production function. First,

focusing on the decision of teachers to put forth effort will align discussion of educational

production with that of the rest of the labor economics literature. In fact, the lack of

discussion of teacher effort is interesting given that the decision to put forth effort at the

job is the driving force of productivity models in other sectors of the labor market. To the

extent that we understand microeconomic decisions to work, wage offers, and worker

productivity in the general labor market we do so through models driven by an

individual’s rational maximization of utility through the choice to put forth effort.

Understanding teacher effort could provide similar understanding of the educational

production function.

Secondly, it is impossible to discuss the impact of performance-pay programs on

student achievement without incorporating the chosen effort level of teachers. Those in

favor of such policies suggest that they will increase teacher productivity at least in part

by providing them with an incentive to put forth effort in the classroom. Thus, a rigorous

understanding of the impact of performance-pay demands an analysis of teacher effort.



Finally, the motivational situation in education may not be as directly comparable

to that of the firm as would first appear. The basic structure of the motivational problem

evaluated here is nothing new to research on firms and is the subject of a wide theoretical

and empirical literature comparing salaries and piece-rate compensation structures, and it

is to this literature and general idea that those in favor of performance-pay programs for

teachers point. However, the labor structure in education differs from that of firms in

important ways that could alter the effects of a piece-rate (performance-pay) system.

Important assumptions of the previous research that make sense for evaluating

piece-rate pay for some workers do not apply in education. In particular, this previous

research assumes that under a straight salary structure employees must continue to meet a

minimum benchmark in order to avoid being fired, while the landscape in education is

such that it is nearly impossible to fire a teacher for cause. Secondly, many of the results

of this previous research derive from the assumption that measuring worker productivity

is costly, but in education there is little to no cost from such measurement because of

widespread student achievement testing that is in practice for political and policy reasons

completely separate from the performance-pay program. (See Lazear 1986 for a treatment

of the implications of these assumptions for regular workers).

However, perhaps the most important difference between teachers and workers in

most firms is that we may expect that teachers are to some extent internally motivated to

produce a high quality product in a way that most other workers are not. One of the most

consistent criticisms of performance-pay policies is that teachers could already be

working to their fullest potential because their love for their students motivates them to

work their hardest to produce academic gains. A similar relationship would exist in the

labor market outside education among workers who take pride in their job or take a

particular interest in their own output. We expect that this internalization of the quality of

one’s output could be magnified for teachers who are tasked with providing children with

a better life than they would otherwise have in absence of teacher effort.

In the context of education, we develop a formal framework for understanding

teacher productivity where effort is the result of a utility maximization problem and

teachers directly gain utility from student achievement. Our model allows such “caring”

for students to be heterogeneous across teachers and sets up a general framework for

understanding the implications of this for performance-pay systems.

We also go on to empirically evaluate the predictions resulting from the theory by

studying the effects of a currently operating performance-pay program. Thus, a second

important contribution of this paper is to add to the limited empirical research on the

impact of performance-pay policies on student achievement.

Several researchers have evaluated the impact of performance pay programs on

reported teacher satisfaction, classroom practices, and retention (Johns, 1988; Jacobson,

1992; Heneman and Milanowski, 1999; Horan and Lambert, 1994). Some U.S. evidence

suggests that programs providing bonuses to entire schools, rather than changing the pay

of individual teachers, have a positive impact on student test scores (Clotfelter and Ladd,

1996). However, there is currently very little empirical evidence from the United States

suggesting that direct teacher-level performance pay leads to better student outcomes. 1

Figlio and Kenny (2006) independently surveyed the schools that participated in

the often-used National Educational Longitudinal Survey (NELS). They then

1
There is also limited evidence on the impact of performance pay in other countries. Lavy (2002) found
that a school-based program in Israel increased student performance, and Glewwe, Ilias, and Kremer (2003)
found similar results from a program in Kenya.

supplemented the NELS dataset with information on whether schools compensated

teachers for their performance. They found that test scores were higher in schools that

individually rewarded teachers for their classroom performance.

Eberts, Hollenbeck, and Stone (2000) used a differences-in-differences approach

to evaluate the impact of a performance incentive for teachers in an alternative high

school in Michigan. They found that the program had no effect on grade point averages

or attendance rates and actually increased the percentage of students who failed the

program. However, the study was unable to provide a direct evaluation of student

achievement (i.e. test scores). Further, the study’s focus on an alternative dropout

recovery school produces difficult estimation problems and could limit its use in the

discussion of traditional public K-12 education.

Finally, Keys and Dee (2005) evaluated an incentive improving career ladder

program in Tennessee. They took advantage of the fact that this program operated at the

same time as the notable Tennessee STAR program, a random assignment experiment on

the impact of class size on student achievement. Under STAR, students were randomly

assigned to classrooms of different sizes. This assignment additionally meant that

students were randomly assigned into classrooms led by teachers who were or were not

participating in a state sponsored performance pay program. Importantly, however,

teachers were not similarly randomly assigned to participate in the performance pay

program, and thus the study cannot be considered a conventional random assignment

experiment of the performance pay plan. Nonetheless, they found that students randomly

assigned to classrooms with teachers participating in the performance pay program made

exceptional gains in math and reading, though these results could be driven by selectivity

in the teachers that choose to participate in performance pay programs, rather than the

incentives of the program itself.

We add to this limited empirical research and further evaluate the predictions of

our theory by studying the impact of a generous performance-pay program in Little Rock,

Arkansas on student achievement in math, reading, and language. We find that adoption

of performance-pay substantially increased student proficiency in each of these subjects.

We also find evidence of an inverse relationship between previous teacher performance

before treatment and the positive impact of performance-pay on teacher productivity. No

other previous study of which we are aware has evaluated the impact of performance-pay

on the distribution of test score gains across teachers.

The remainder of this paper is organized into five sections. Section II

develops the theoretical model for understanding teacher productivity and evaluates the

impact of performance-pay programs for student achievement. Section III discusses the

performance-pay program in Little Rock, Arkansas evaluated in the paper. Section IV

discusses the data analyzed in the paper and develops the empirical models used to

measure the impact of performance pay on student achievement. Section V reports the

results of this estimation, and Section VI concludes.

II) The Model

Teachers, indexed by i, maximize utility, which depends on leisure (L), wages

(w), and student achievement gains (s), each of which has positive but diminishing

returns to utility.

\[ U_i = U_i(w_i, L_i, s_i) \quad \text{subject to } L_i \in [0, \bar{L}], \quad U'_n > 0,\ U''_n < 0,\ n \in \{w, L, s\} \tag{1} \]



The framework is innovative in two ways. First, we are aware of no previous

research in the education literature modeling teacher effort. Secondly, we recognize that

teachers receive utility from the productivity of their students – we call this teacher

“caring”. Without this addition we have the classical labor-leisure trade-off discussed in

previous labor economic research.

Such a caring for students is commonly attributed to teachers, but has yet to be

modeled in the literature. A clear motivation is simply to notice the differences between

children and other forms of output – humans have a natural inclination to care for

children in a way that they do not for other widgets.

It is a normal assumption in the education literature that teachers are so clearly

internally motivated that teacher effort warrants little discussion. Economists and those

interested in modern market-based education reforms famously pursue the opposite track

when discussing the labor force by assuming that individuals are entirely self-interested.

Here, we marry these two extreme points and recognize that our schools are staffed with

self-interested teachers who also possess at least some altruistic intentions.

The premise of this model may also be used to discuss the labor decisions of

workers in the general labor force who have an internal desire to be highly productive.

However, in the model here for teachers we assume that there is no external incentive for

teachers to put forth positive effort under the current pay system. This assumption rests

on the idea that it is nearly impossible to fire a public school teacher for cause in the

United States, which is supported by anecdotal and some empirical evidence. 2 Extensions

of this model for the general labor force would need to include a minimal effort level that

the individual must meet in order to avoid termination, such as in Lazear (2000).

We assume that there are at least some teachers that are internally motivated

enough to put forth some positive effort in absence of a financial reward and that there

are at least some teachers who are not so internally motivated as to work to their highest

ability in absence of a financial or other external motivating force. Further, we allow

different teachers to care about their students with different strengths. If teacher 1 cares

more than teacher 2, then teacher 1 will be willing to accept lower levels of wages and

leisure in exchange for higher levels of student achievement. Thus in order to declare

teacher 1 to care more strongly than teacher 2, we compare their marginal rates of

substitution:

\[ MRS_1(s,x) > MRS_2(s,x), \quad x \in \{L, w\} \tag{2} \]

or, expressed as marginal utilities, where \(MU_x^i\) is the first derivative of utility for teacher
i with respect to argument x:

\[ \frac{MU_s^1(w,L,s)}{MU_x^1(w,L,s)} > \frac{MU_s^2(w,L,s)}{MU_x^2(w,L,s)}, \quad x \in \{L, w\} \tag{3} \]

2
For example, using Freedom of Information Act requests, journalist Scott Reeder found that in the ten-year
period between 1995 and 2005 there were only 555 instances of a formal remediation of a public
school teacher in the entire 873 schools in the state of Illinois. That is, each year an average of 55 public
school teachers were sanctioned for any reason – including sexual misconduct and other indiscretions other
than poor teaching performance – which amounts to about 0.04% of teachers per year. With such low rates
of sanctioning for any reason coupled with such high rates of low student academic achievement, it is clear
that public school teachers are not often fired (or even reprimanded) for failing to adequately instruct
students.

For ceteris paribus conditions, we assume that the marginal rate of substitution between

leisure and wages at equal levels of each is constant across individuals. This assumption

is trivial, for simplification only, and does not affect the results.

Each student is endowed with an initial ability level λ. In order to most directly

focus on teacher effort, here we consider λ as constant and homogeneous across students.

Though this is clearly not the case, it is justifiable in our treatment of performance-pay

since these programs most often utilize value-added measures to provide pay bonuses,

which is an attempt to hold student ability constant. In the empirical work of the next

section, we attempt to hold λ constant by controlling for prior student achievement. The

theoretical results would differ if student ability were heterogeneous and teachers were

simply paid for having higher performing students.

Student achievement gains (sj) are a function of the student’s initial ability (λ) as

well as the productivity of their classroom teacher (ti). Teacher productivity is a function

of the teacher’s effort (ei). We assume that both functions have positive but diminishing

returns to the inputs.

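\[ s_j = s_j(\lambda_j, t_i), \quad \frac{\partial s}{\partial \lambda} > 0,\ \frac{\partial^2 s}{\partial \lambda^2} < 0 \ \text{ and } \ \frac{\partial s}{\partial t} > 0,\ \frac{\partial^2 s}{\partial t^2} < 0 \tag{4} \]

\[ t_i = t_i(e_i), \quad \frac{\partial t}{\partial e} > 0,\ \frac{\partial^2 t}{\partial e^2} < 0 \tag{5} \]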

Effort and leisure are negatively related, such that effort is defined as maximum

available leisure time less the amount of leisure actually enjoyed by the teacher:

\[ e_i = \bar{L} - L_i \tag{6} \]

Solving (6) for leisure and combining equations (1) and (4)-(6), teachers maximize utility by choosing

their level of effort:

\[ \max_{e_i} \; U_i = U_i\big(w_i,\ \bar{L} - e_i,\ s_j(\lambda_j, t_i(e_i))\big) \tag{7} \]

Teachers have an incentive to put forth effort to the extent that it is utility

increasing. We compare the chosen effort levels under both the conventional pay system

and a performance-pay program. We then go on to discuss the potential for

heterogeneous effects of performance-pay across teachers with different levels of internal

motivation.

RESULT 1: If wages are independent of teacher effort and two teachers differ only on

how much they care about student achievement, the teacher that cares more will put forth

greater effort and thus produce higher student achievement.

PROOF:

Consider two teachers such that teacher 1 cares more for her students’

achievement than teacher 2. For a given s, say \(\bar{s}\), the difference in caring is expressed in

equation (3).

Differentiating (7) with respect to effort, the first order condition for the teacher’s

maximization problem is

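\[ \frac{\partial U_i}{\partial e} = \frac{\partial U_i}{\partial L} \cdot \frac{\partial L}{\partial e} + \frac{\partial U_i}{\partial s} \cdot \frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e} = 0 \tag{8} \]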

If we solve for teacher 2, we find



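\[ \left( \frac{\partial U_2}{\partial L} \cdot \frac{\partial L}{\partial e} \right)_{e^*} + \left( \frac{\partial U_2}{\partial s} \cdot \frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e} \right)_{e^*} = 0, \text{ or} \tag{9} \]

\[ \left( MU_L^2(w,L,s) \cdot \frac{\partial L}{\partial e} \right)_{e^*} + \left( MU_s^2(w,L,s) \cdot \frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e} \right)_{e^*} = 0 \]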

where e* is the equilibrium level of effort for teacher 2. This can be rewritten as

\[ \frac{\partial L}{\partial e} + \left( \frac{MU_s^2(w,L,s)}{MU_L^2(w,L,s)} \cdot \frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e} \right)_{e^*} = 0 \tag{10} \]

The marginal utility of effort for teacher 1 at the equilibrium level for teacher 2, e* is

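\[ \frac{\partial U_1}{\partial e} = \frac{\partial L}{\partial e} + \left( \frac{MU_s^1(w,L,s)}{MU_L^1(w,L,s)} \cdot \frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e} \right)_{e^*} > 0 \tag{11} \]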

We can now evaluate whether utility is increasing or decreasing in effort for

teacher 1 at e* by evaluating this condition. The marginal rate of substitution for teacher

1 between student achievement gains and leisure is greater than that for teacher 2. Since

the two derivatives that multiply this MRS are positive and we know that, evaluated at

e*, marginal utility is 0 for teacher 2, we know that utility is still increasing in effort for teacher 1
at e*. Because of this, the utility-maximizing effort level for teacher 1, \(\tilde{e}\), is

greater than e*. Since student achievement gains are an increasing function of effort, this

also implies that teacher 1 will have greater student achievement gains than teacher 2,

holding innate ability constant.
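Result 1 can be illustrated numerically. The sketch below is not part of the paper's proof: it assumes a specific log utility, an achievement function s = sqrt(e), and a hypothetical "care" weight on achievement, none of which appear in the model above. Under those assumptions the first-order condition gives e* = care * L_BAR / (2 + care), and a simple grid search recovers it.

```python
import math

L_BAR = 1.0  # assumed total time endowment (hypothetical units)

def utility(e, care, wage=1.0):
    """Assumed utility: ln(w) + ln(leisure) + care * ln(s), with s = sqrt(e)."""
    leisure = L_BAR - e          # equation (6): effort is time not spent on leisure
    s = math.sqrt(e)             # assumed achievement function, concave in effort
    return math.log(wage) + math.log(leisure) + care * math.log(s)

def best_effort(care, steps=10_000):
    """Grid-search the effort level in (0, L_BAR) that maximizes utility."""
    grid = (L_BAR * k / steps for k in range(1, steps))
    return max(grid, key=lambda e: utility(e, care))

e_low = best_effort(care=0.5)    # stands in for "teacher 2" (cares less)
e_high = best_effort(care=2.0)   # stands in for "teacher 1" (cares more)
print(e_low, e_high)
```

With wages independent of effort, the teacher with the larger care weight chooses more effort, which is the ordering Result 1 predicts; the particular numbers depend entirely on the assumed functional forms.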



RESULT 2: A performance-pay plan that makes wages dependent upon student

achievement will cause teachers to put forth more effort and therefore should result in

higher student achievement.

PROOF:

The expected positive overall effect from performance-pay is quite intuitive.

Consider a performance-pay plan set up with payouts being linearly related to student

achievement (or achievement gains; the plan would have the same effect). This changes

the wages from a simple w to

\[ w_i(s_j(t_i(e_i))) = w_0 + a \cdot s_j(t_i(e_i)), \quad \text{where } a > 0 \tag{12} \]

This makes wages equal to some initial base pay, w0, plus a linear function of student

achievement gains.

To determine the incentives faced by the teacher in changing from a traditional

pay plan to a performance-pay plan, we evaluate the marginal utility of effort at the old

equilibrium after including the new pay plan. The traditional condition for equilibrium is

found in equation (8).

Marginal utility with respect to effort under performance-pay is:

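\[ \frac{\partial U_i}{\partial e} = \left( MU_w^i(w,L,s) \cdot \frac{\partial w}{\partial s} \cdot \frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e} \right) + \left( MU_L^i(w,L,s) \cdot \frac{\partial L}{\partial e} \right) + \left( MU_s^i(w,L,s) \cdot \frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e} \right). \tag{13} \]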

To determine the effect of implementing a performance pay plan, we evaluate this

derivative at the original equilibrium level of achievement. As seen specifically for

teacher 2 in equation (9), the latter two terms of (13) sum to zero at the original

equilibrium, leaving marginal utility at the old equilibrium equal to the first term of (13).

The sign of this term and therefore of the derivative is positive since all components of

the product are positive. Thus beginning at the old level of effort, the teacher can increase

utility by putting forth more effort when the compensation structure is changed to include

a performance-pay mechanism. The proof of higher achievement follows the same lines

as the argument in proof 1. Teacher ability is constant across regimes and effort increases,

so teacher production increases and therefore student achievement increases.
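A numerical sketch of Result 2 follows. It is only an illustration under assumed functional forms (log utility, s = sqrt(e), and hypothetical values for base pay, the caring weight, and the bonus slope a), not the paper's general argument. It compares the utility-maximizing effort under a flat salary (a = 0) with effort under the linear bonus of equation (12).

```python
import math

L_BAR = 1.0   # assumed time endowment (hypothetical units)
W0 = 1.0      # assumed base pay w0
CARE = 0.5    # assumed weight on student achievement ("caring")

def achievement(e):
    """Assumed achievement function: increasing and concave in effort."""
    return math.sqrt(e)

def utility(e, a):
    """Assumed log utility with wages w = w0 + a * s, as in equation (12)."""
    wage = W0 + a * achievement(e)
    return (math.log(wage)
            + math.log(L_BAR - e)
            + CARE * math.log(achievement(e)))

def best_effort(a, steps=10_000):
    """Grid-search the utility-maximizing effort for a given bonus slope a."""
    grid = (L_BAR * k / steps for k in range(1, steps))
    return max(grid, key=lambda e: utility(e, a))

effort_salary = best_effort(a=0.0)   # traditional pay plan
effort_bonus = best_effort(a=1.0)    # performance-pay plan
print(effort_salary, effort_bonus)
```

In this parameterization the bonus raises chosen effort, consistent with the sign argument in the proof; how large the increase is depends on the assumed forms and parameter values.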

We can further examine the performance pay plan by reorganizing equation 13

into the equation seen below:

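\[ \frac{\partial U_i}{\partial e} = \left( \frac{MU_w^i(w,L,s)}{MU_L^i(w,L,s)} \cdot \frac{\partial w}{\partial s} + \frac{MU_s^i(w,L,s)}{MU_L^i(w,L,s)} \right) \cdot \left( \frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e} \right) + \frac{\partial L}{\partial e} \tag{14} \]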

This reorganization demonstrates a key difference across the two teacher pay

plans. Before, wages were constant with respect to student achievement gains, so the

marginal utility with respect to student achievement gains was the derivative of utility

with respect to its third component. Now, student achievement gains impact wages and

therefore the marginal utility of wages enters into the marginal utility with respect to

student achievement gains.

Granted that, however, we can evaluate effort based on the difference between

teachers' marginal rates of substitution. Following the same lines as proof 1 and utilizing

the simplifying assumption that the marginal rate of substitution between leisure and

wages at equal levels of each is constant across individuals, the demonstration again

reduces to a comparison of marginal rates of substitution between student achievement

gains and leisure. Utilizing our terminology, this once again demonstrates that the teacher

who cares more will put forth more effort.

Potential for a Differential Effect of Performance-Pay Across Teachers with Different

Internal Motivations

The recognition that teachers have heterogeneous internal motivations raises the

question of whether performance-pay might affect teachers of different "caring" levels

differently. Intuitively, we may expect that performance-pay could have its greatest

impact on teachers with lower internal motivation because teachers with high levels of

caring are already putting forth a potentially large amount of positive effort. However,

though the intuition here is strong, we will see that formally evaluating this relationship is

quite complicated and fails to yield a clear analytical solution.

Let e1* be the level of effort chosen by teacher 1 under the performance pay plan

from result 2. Utilizing the fact that the original plan is simply a variant of performance
pay that has a = 0, we can consider the change in effort level by evaluating \(de_1^*/da\) through
the use of comparative statics; the same can be done for teacher 2. To determine the

teacher that reacts more strongly to the introduction of performance pay, we simply

compare these comparative static derivatives.
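The comparative static can be obtained with the standard implicit-function argument, sketched here. The symbol F is notation introduced for this sketch only; it denotes the left-hand side of the first-order condition under performance pay, using \(\partial L / \partial e = -1\) from (6) and writing \(\partial s / \partial e\) as shorthand for \(\frac{\partial s}{\partial t} \cdot \frac{\partial t}{\partial e}\).

```latex
% F is hypothetical notation for the first-order condition, not a symbol from the paper
F(e, a) \equiv
  \frac{\partial U}{\partial w}\frac{\partial w}{\partial s}\frac{\partial s}{\partial e}
  - \frac{\partial U}{\partial L}
  + \frac{\partial U}{\partial s}\frac{\partial s}{\partial e},
\qquad
F\big(e^*(a), a\big) = 0
\;\Longrightarrow\;
\frac{de^*}{da} = -\,\frac{\partial F / \partial a}{\partial F / \partial e}
```

At an interior maximum the second-order condition gives \(\partial F / \partial e < 0\), so the sign of \(de^*/da\) is the sign of \(\partial F / \partial a\); comparing magnitudes across teachers requires the second and cross-partial derivatives of U.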



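\[
\frac{de_i^*}{da} = -\,
\frac{
\dfrac{\partial U}{\partial w} \dfrac{\partial^2 w}{\partial s\,\partial a} \dfrac{\partial s}{\partial e}
+ \dfrac{\partial^2 U}{\partial w^2} \dfrac{\partial w}{\partial a} \dfrac{\partial w}{\partial s} \dfrac{\partial s}{\partial e}
- \dfrac{\partial^2 U}{\partial L\,\partial w} \dfrac{\partial w}{\partial a}
+ \dfrac{\partial^2 U}{\partial s\,\partial w} \dfrac{\partial w}{\partial a} \dfrac{\partial s}{\partial e}
}{
\begin{aligned}
& \dfrac{\partial U}{\partial w} \left[ \dfrac{\partial^2 w}{\partial s^2} \left( \dfrac{\partial s}{\partial e} \right)^2 + \dfrac{\partial w}{\partial s} \dfrac{\partial^2 s}{\partial e^2} \right]
+ \dfrac{\partial^2 U}{\partial w^2} \left( \dfrac{\partial w}{\partial s} \dfrac{\partial s}{\partial e} \right)^2
+ \dfrac{\partial^2 U}{\partial L^2}
+ \dfrac{\partial U}{\partial s} \dfrac{\partial^2 s}{\partial e^2}
+ \dfrac{\partial^2 U}{\partial s^2} \left( \dfrac{\partial s}{\partial e} \right)^2 \\
& - 2 \dfrac{\partial^2 U}{\partial w\,\partial L} \dfrac{\partial w}{\partial s} \dfrac{\partial s}{\partial e}
+ 2 \dfrac{\partial^2 U}{\partial w\,\partial s} \dfrac{\partial w}{\partial s} \left( \dfrac{\partial s}{\partial e} \right)^2
- 2 \dfrac{\partial^2 U}{\partial L\,\partial s} \dfrac{\partial s}{\partial e}
\end{aligned}
}
\tag{15}
\]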

Clearly, the comparative static derivative is a complicated function of second and

cross-partial derivatives, about which we are not in a strong position to make

assumptions. Because of this we can make no theoretical statement about the strength of

the program’s impact on teachers who care at different levels. However, though a formal

expectation for the differential impact of performance-pay on teacher productivity

escapes us, this framework provides an interesting question that we can answer

empirically.

III) The Program

The Achievement Challenge Pilot Project (ACPP) is a teacher and staff pay-for-

performance program that has operated within the Little Rock School District (LRSD) for

three years since the 2004-05 school year. The purpose of the program is to motivate

faculty and staff to bring about greater student achievement gains. The ACPP uses

student improvement on nationally-normed standardized tests as the only basis for

financial rewards.

The funding for this project has come through a partnership between private

foundations and the LRSD. In the first year, private foundations supported ACPP at a

single elementary school and the program expanded to include another school in its

second year. In the third year the program adopted three additional elementary schools.

For reasons discussed below, our analyses will focus entirely on the impact of

performance-pay in the three schools that began treatment in the third year of the

program. The discussion that follows describes how the program operated in these three

schools.

The performance-pay program provided bonuses directly to teachers based on the

average spring-to-spring achievement gain of students in the teacher’s class on the

composite score of the Iowa Test of Basic Skills. The composite score includes student

achievement on the math, reading, and language arts portion of the exam.

Teachers whose students had average achievement growth of 0-4% earned
$50 times the number of students in their class; teachers whose students had average
achievement growth of 5-9% earned $100 times the number of students in their class;
teachers whose students had average achievement growth of 10-14% earned
$200 times the number of students in their class; and teachers whose students had average
achievement growth over 15% earned $400 times the number of students in their class.

Table 1 displays the average bonuses that were actually earned in the schools included in

the analysis. Other staff members could also earn various bonuses based on their level of

responsibility.
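The bonus schedule described above can be written as a small lookup, sketched below. The function names are hypothetical, and two details are assumptions because the text does not settle them: the published bands (0-4%, 5-9%, 10-14%, over 15%) are treated as the half-open intervals [0, 5), [5, 10), [10, 15) and 15 or more, and negative average growth is assumed to earn no bonus.

```python
def per_student_bonus(avg_growth_pct):
    """Dollar bonus per student for a class's average achievement growth (percent).

    Band edges and the zero payout for negative growth are assumptions.
    """
    if avg_growth_pct < 0:
        return 0       # assumption: no bonus when average growth is negative
    if avg_growth_pct < 5:
        return 50      # 0-4% band
    if avg_growth_pct < 10:
        return 100     # 5-9% band
    if avg_growth_pct < 15:
        return 200     # 10-14% band
    return 400         # top band

def teacher_bonus(avg_growth_pct, class_size):
    """Total teacher bonus: the per-student amount times the number of students."""
    return per_student_bonus(avg_growth_pct) * class_size

print(teacher_bonus(12, 25))
```

For example, under these assumptions a teacher with 25 students and 12% average growth would earn 25 times $200.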

[TABLE 1 ABOUT HERE]

Schools were selected to participate in ACPP based on their high percentages of

students who were struggling academically and economically disadvantaged. Table 2

reports baseline descriptive statistics for those variables used in the analyses below.

About 63 percent of the LRSD students that were not in a performance-pay eligible

school in 2007 qualified for the federal free and reduced lunch program, and 67 percent of

these students are African American. The schools that were eligible for the program in

2007 served a more disadvantaged group of students: 88 percent of whom qualify for the

federal free and reduced lunch program and 88 percent of whom are African American.

[TABLE 2 ABOUT HERE]

The table also shows that students in untreated schools had baseline scores in

math, reading, and language that were substantially above those of students who were in

treated schools. Further, students in untreated schools made substantially larger

improvements in these subjects the year before treatment took place.

IV) Data and Method

We acquired individual data for the universe of public school students enrolled in

Little Rock, Arkansas elementary schools in the 2005 through 2007 school years,

providing us with two observations of student test score gains. 3 For each elementary

student in the district, this dataset included demographic information, test scores, an

identifier for the student’s classroom teacher, and a unique student identifier that allows

us to track each student’s performance over time. We evaluate the impact of adoption of

the performance-pay program on student proficiency in math, reading, and language.

Test scores are reported in our dataset in Normal Curve Equivalent (NCE) units.

NCEs rank the student on a normal curve compared to a nationally representative group

of students who have taken the test. NCEs are similar to percentile scores, but differ in

that they are equal-interval scaled, meaning that the difference between two scores on one

3
Here and throughout this paper we use the spring term year to identify the school year. That is, the 2004-
05 school year is referred to as 2005.

part of the curve is equivalent to the difference of a similar interval on another part of

the curve. NCE scores are scaled between 1 and 99 with a mean of 50.
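The relationship between percentile ranks and NCE units can be sketched as follows. This uses the conventional NCE definition, 50 + 21.06 times the standard normal z-score of the percentile rank; the paper does not state the formula, so the constant and the function name below are assumptions brought in for illustration.

```python
from statistics import NormalDist

NCE_SCALE = 21.06  # conventional NCE standard deviation (assumed, not from the paper)

def percentile_to_nce(percentile):
    """Convert a national percentile rank (between 0 and 100, exclusive) to NCE units."""
    z = NormalDist().inv_cdf(percentile / 100.0)  # z-score on the standard normal curve
    return 50.0 + NCE_SCALE * z

for pct in (1, 25, 50, 75, 99):
    print(pct, round(percentile_to_nce(pct), 1))
```

The two scales agree near percentiles 1, 50, and 99 but diverge in between, which is why equal NCE differences correspond to equal distances along the normal curve while equal percentile differences do not.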

We utilize the differences-in-differences procedure to study the impact of

performance pay. Unfortunately, we are forced to exclude students in the schools that

began the performance pay treatment prior to 2007. The reason for the exclusion is that

since these schools were treated in each year for which we have data, in the analysis they

would become part of the comparison group.

We use OLS to estimate a model taking the form:

\[ Y_{i,a,t} = \beta_0 + \beta_1 Y_{i,a,t-1} + \beta_2 \text{Student}_{i,t} + \beta_3 \text{School}_{i,t} + \beta_4 \text{Year}_t + \beta_5 \text{Treat}_{i,t} + \varepsilon_{i,t} \tag{18} \]

where \(Y_{i,a,t}\) is the test score of student i in subject a in the spring of year t; Student is a

vector of observable characteristics about the student; School is a vector indicating the

school that the student attended; Year is an indicator variable for the year; and ε is a

stochastic term clustered by school.

Treat is an indicator variable for whether the observation occurred for a student

attending the treatment school during the treatment year. That is, this variable is an

interaction between Year = 2007 and the indicator variable for each school that was

eventually treated. When Equation (18) is estimated using OLS, the Treat (β5) coefficient

becomes an estimate of the change in the conditional expectations of test score gains

resulting from the performance pay treatment. That is, β5 represents the impact of the

performance pay treatment after accounting for the differences in the test scores that

occur naturally over time and within the individual schools.
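The logic of the Treat coefficient can be sketched with the simplest two-group, two-period version of differences-in-differences. This is only an illustration: the paper's estimate comes from the regression in (18) with the lagged score, student controls, and school and year indicators, whereas the sketch below uses unconditional group means on invented toy data with hypothetical names.

```python
from statistics import mean

# Hypothetical toy records: (eventually_treated_school, year, test_score_gain).
# The numbers are invented for illustration and are not from the paper.
records = [
    (True, 2006, -1.0), (True, 2006, -2.0),
    (True, 2007, 3.0), (True, 2007, 2.0),
    (False, 2006, 2.0), (False, 2006, 2.5),
    (False, 2007, 2.0), (False, 2007, 3.0),
]


def treat_indicator(eventually_treated: bool, year: int) -> int:
    """Treat = 1 only for students in eventually treated schools in 2007."""
    return int(eventually_treated and year == 2007)


def group_mean(eventually_treated: bool, year: int) -> float:
    """Average gain for one school group in one year."""
    return mean(g for s, y, g in records if s == eventually_treated and y == year)


def diff_in_diff() -> float:
    """Change over time in treated schools minus the change in comparison schools."""
    change_treated = group_mean(True, 2007) - group_mean(True, 2006)
    change_control = group_mean(False, 2007) - group_mean(False, 2006)
    return change_treated - change_control


print(f"Toy differences-in-differences estimate: {diff_in_diff():.2f}")
```

Subtracting the comparison schools' change removes whatever moved scores in all schools between 2006 and 2007, and differencing within the treated schools removes their fixed disadvantage.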

We also estimate a model working from (18) but which includes a teacher fixed

effect. This model takes the form:


$$Y_{i,a,t} = \psi_0 + \psi_1 Y_{i,a,t-1} + \psi_2 \mathit{Student}_{i,t} + \psi_3 \mathit{School}_{i,t} + \psi_4 \mathit{Year}_t + \psi_5 \mathit{Treat}_{i,t} + \psi_6 \mathit{Teacher}_{i,t} + \rho_{i,t} \tag{19}$$

Where Teacher is an indicator for the student's teacher, ρ is a stochastic term

clustered by school, and all other variables are as previously defined.

Secondly, as discussed in the above theoretical section, we are interested in

testing whether there is a differential relationship between the impact of performance-pay

and a teacher's prior productivity. We can evaluate whether teachers of varying success

had different responses to performance-pay by altering equation (18) to contain an

interaction between the treatment and a measure of a teacher’s pre-treatment productivity.

An obvious measure of pre-treatment productivity that is available in our dataset is the

average test score gain of students in the teacher’s classroom in the year prior to adoption

of the policy, 2006. Since treatment begins in 2007, and our test scores go back only

to 2005, we utilize the gains in 2006 as the only measure of pre-treatment productivity.

We slightly alter equation (18) to take the form:

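$$Y_{i,a,t} = \phi_0 + \phi_1 Y_{i,a,t-1} + \phi_2 \mathit{Student}_{i,t} + \phi_3 \mathit{School}_{i,t} + \phi_4 \mathit{Year}_t + \phi_5 \mathit{Pre\_Gain}_{i,a} + \phi_6 \mathit{Treat}_{i,t} + \phi_7 (\mathit{Pre\_Gain}_{i,a} \times \mathit{Treat}_{i,t}) + \rho_{i,t} \tag{20}$$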

Where Pre_Gaini,a is the average test score gain in 2006 in subject a for students in the class of

student i’s current teacher, and ρ is again a normally distributed mean zero stochastic

term.

We are now particularly interested in φ7, which can be interpreted as the

heterogeneous effect of the performance-pay treatment by previous teacher performance.

If we find that φ7 < 0, we could interpret it as indicating that lower performing teachers

made the largest gains from the performance-pay policy.
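To make the role of φ7 explicit, compare the expected score under equation (20) with and without treatment, holding the other regressors (denoted X here, a shorthand not used in the paper) fixed:

```latex
\[
E\left[Y_{i,a,t} \mid \mathit{Treat}_{i,t} = 1, X\right]
- E\left[Y_{i,a,t} \mid \mathit{Treat}_{i,t} = 0, X\right]
= \phi_6 + \phi_7 \, \mathit{Pre\_Gain}_{i,a}
\]
```

With φ7 < 0, the implied treatment effect shrinks as the teacher's 2006 gain rises, and it reaches zero at a prior gain of −φ6/φ7.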


We are able to estimate these equations in math, reading, and language in

elementary schools. However, the grades included in the analyses of each subject differ

due to limitations of the testing schedule in Little Rock. Students were administered the

math version of the ITBS in all grades K-5 in each of the three years from 2005 to 2007,

and so each of these grades is included in the analyses. However, Little Rock students

were not administered the ITBS language or reading test in grades 3, 4, or 5 until 2006.

Further, students were not administered the ITBS reading test in Kindergarten until 2007.

These data limitations lead us to only include students in grades 2 and 3 for the reading

analyses and students in grades 1, 2, or 3 in the language analyses -- the only grades for

which we have both a pre- and post-test score for students in both the baseline and

treatment-eligible years.

A potential limitation of our approach is that we may have an endogeneity

problem since schools were not randomly assigned to the performance-pay treatment. In

particular, as discussed above, the treatment was made available to schools non-randomly

and treated schools had larger minority populations and lower-income students on

average.

We are able to partially account for this endogeneity bias by including school fixed effects and,

in one analysis, teacher fixed effects in order to account for heterogeneity in school

quality. However, it is also worth noting that summary statistics indicate that any

endogeneity bias would likely lead us to underestimate the impact of the performance pay

treatment. Note that Table 2 shows that in 2006, the year before the policy was available,

on average students in eventually treated schools made smaller test score improvements

in each of the three subjects used in our analyses. That is, we should expect that in the

absence of treatment these schools would have made smaller test score improvements

than the control schools, which would tend to bias the estimation of the treatment effect

downward. Nonetheless, we recognize that lack of random assignment is a concern with

any results.

V) Results

The results from estimation of equation (18) are reported in Table 3. Recall that

we are forced to use a more restricted group of grades in the reading and language

analyses, which accounts for the variation in the number of observations across subjects.

[TABLE 3 ABOUT HERE]

Our results suggest that students made statistically significant improvements in

math and reading, though the results in language just fail the test for significance at the

10% level (p = 0.126). The analyses suggest that the performance-pay treatment led to an

increase of about 3.52 NCE points in math, 3.29 NCE points in reading, and 4.56 NCE

points in language.

The size of these effects is substantial. We can use the summary statistics for

baseline achievement in these subjects reported in Table 2 to put our results into terms of

standard deviation units. Dividing the effect size by the standard deviation of the baseline

test score in the subject, our results suggest that performance-pay increased student

proficiency by 0.16 standard deviations in math, 0.15 standard deviations in reading, and

0.22 standard deviation units in language.
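The conversion to standard deviation units can be reproduced directly from the tables. The sketch below assumes the divisor is the baseline standard deviation in the "All" column of Table 2, which reproduces the reported figures; the variable names are hypothetical.

```python
# Treat coefficients from Table 3, in NCE points.
treat_coef = {"math": 3.52, "reading": 3.29, "language": 4.56}

# Baseline standard deviations from the "All" column of Table 2 (assumed divisor).
baseline_sd = {"math": 21.54, "reading": 21.53, "language": 21.13}


def effect_size_in_sd(subject: str) -> float:
    """Treatment effect expressed in baseline standard deviation units."""
    return treat_coef[subject] / baseline_sd[subject]


for subject in treat_coef:
    print(f"{subject}: {effect_size_in_sd(subject):.2f} SD")
# math: 0.16 SD
# reading: 0.15 SD
# language: 0.22 SD
```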

Table 4 reports the results of estimation of the overall treatment effect when we

include a fixed effect for each individual teacher. The table shows that the results are

qualitatively similar to those without a teacher fixed effect.


[TABLE 4 ABOUT HERE]

Somewhat surprisingly, the small gain in the R-Squared value between the

analyses reported in Tables 3 and 4 suggests that the teacher fixed-effect is explaining

very little of the variance in student achievement. We tested the explanatory power of the

teacher fixed-effect itself by estimating a regression of math test scores against only the

teacher fixed-effect. That is, we estimated Equation (19) but removed all independent

variables other than the teacher fixed-effect. These analyses found R-Squared values

between 0.20 and 0.25 for the three subjects. 4 This indicates that there is variation in

teacher effectiveness but that here it is correlated with other regressors included in the

model estimated in (18).
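The fixed-effect-only regression described above has a simple interpretation: when the only regressors are a full set of teacher indicators, the OLS fitted value for each student is the mean score in that teacher's classroom, so the R-Squared is the between-teacher share of total variance. A minimal sketch on invented toy data, with hypothetical function and variable names:

```python
from collections import defaultdict
from statistics import mean


def r_squared_group_dummies(scores, groups):
    """R-squared from regressing scores on a full set of group indicators only.

    With only group dummies, the fitted values are the group means, so
    R^2 = 1 - SS_within / SS_total.
    """
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    group_means = {g: mean(members) for g, members in by_group.items()}

    grand_mean = mean(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    ss_within = sum((s - group_means[g]) ** 2 for s, g in zip(scores, groups))
    return 1.0 - ss_within / ss_total


# Invented toy data: six students taught by two hypothetical teachers.
toy_scores = [40, 50, 60, 45, 55, 65]
toy_teachers = ["A", "A", "A", "B", "B", "B"]
print(f"Teacher-only R-squared: {r_squared_group_dummies(toy_scores, toy_teachers):.3f}")
```

A sizable value from this regression alongside a small R-Squared gain between Tables 3 and 4 is what one would expect when classroom means are largely predictable from the lagged score, student characteristics, and school indicators already in the model.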

Table 5 reports the results of estimating equation (20). Here we are interested in

evaluating any differential impact from the performance-pay treatment by the teacher’s

previous productivity. In each subject we find that the coefficient on the overall treatment

effect remains positive, though the treatment effect in reading just misses the threshold

for significance (p = 0.110). However, we find a negative relationship between the

teacher's prior productivity (measured by the average test score gain of students in the

teacher's classroom in the baseline year) and the impact of performance-pay on teacher

productivity. The inverse relationship between prior teacher productivity and the

performance-pay effect is statistically significant in each subject. These results suggest

that the previously lowest performing teachers made the greatest improvements due to the

incentives of the performance-pay program.
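The size of the heterogeneity can be read off the Table 5 coefficients. The sketch below evaluates the implied treatment effect (the Treat coefficient plus the interaction coefficient times the teacher's average 2006 gain) and the prior gain at which that implied effect reaches zero. It uses point estimates only and ignores sampling error; the function names are hypothetical.

```python
# Point estimates from Table 5:
# (Treat, Treat * Average 2006 Gain for Teacher), in NCE points.
table5 = {
    "math": (6.93, -0.48),
    "reading": (3.63, -0.35),
    "language": (4.24, -0.50),
}


def implied_effect(subject: str, prior_gain: float) -> float:
    """Implied treatment effect for a teacher with the given average 2006 gain."""
    treat, interaction = table5[subject]
    return treat + interaction * prior_gain


def break_even_gain(subject: str) -> float:
    """Average 2006 gain at which the implied treatment effect equals zero."""
    treat, interaction = table5[subject]
    return -treat / interaction


for subject in table5:
    print(
        f"{subject}: effect at zero prior gain = {implied_effect(subject, 0.0):.2f}, "
        f"break-even prior gain = {break_even_gain(subject):.2f}"
    )
```

The break-even gains are roughly 14.4 NCE points in math, 10.4 in reading, and 8.5 in language, all well above the mean 2006 gains reported in Table 2 (between 0.00 and 1.94 NCE points for all students).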

[TABLE 5 ABOUT HERE]

4 Analyses available upon request.

VI) Conclusion

This paper makes a variety of contributions to the literature through the lens of

performance-pay for teachers. First, we provide a general theoretical framework for

understanding teacher productivity that is aligned with the labor economics structure of a

decision to exert effort, which has been so far absent from the economics of education

literature.

We suggest that the labor-leisure trade-off for teachers could be different from

that of other workers in important ways that are worth consideration in future

theoretical and empirical work. In particular, teachers very likely have adopted the

quality of their production – student achievement – into their own utility function. We

show that this could hold important consequences for understanding teacher productivity. We

believe that these theoretical contributions could prove fruitful for future research not

only on the impacts of performance-pay, but for our understanding of academic

productivity more generally.

We have also added to the limited empirical research on performance-pay

programs for teachers. The results of our evaluation of the performance-pay program in

Little Rock, Arkansas coincide with the theoretical predictions. We find that adoption of

performance-pay led to substantial improvements in student math, reading, and language

proficiency. Further, the results indicate that performance-pay was beneficial for nearly

all teachers, and had a particularly large effect on the lowest-performing teachers.

References

Aaronson, D., Barrow, L., & Sander, W. (2003). “Teachers and student achievement in
the Chicago public high schools”. Unpublished manuscript.

Ballou, D., Sanders, W., & Wright, P. (2004). “Controlling for student background in
value-added analysis of teachers”. Journal of Educational and Behavioral Statistics,
29(1), 37-65.

Clotfelter, C., and Ladd, H., 1996. “Recognizing and Rewarding Success in Public
Schools” in H. Ladd, ed. Holding Schools Accountable: Performance-Based Reform in
Education. Washington, D.C., Brookings Institution.

Eberts, R., Hollenbeck, K., and Stone, J., 2002. “Teacher Performance Incentives and
Student Outcomes.” Journal of Human Resources, 37, p. 913-27.

Figlio, D., and Kenny, L., 2006. “Individual Teacher Incentives and Student
Performance”. Journal of Public Economics, doi: 10.1016/j.jpubeco 2006.10.001,
forthcoming.

Glewwe, P., N. Ilias, and M. Kremer 2003. “Teacher Incentives”. NBER working paper
9671.

Goldhaber, D.D., & Brewer, D.J. (1997). “Why don’t schools and teachers seem to
matter? Assessing the impact of unobservables on educational productivity”. Journal of
Human Resources, 32(3), 505-523.

Hanushek, E.A., & Rivkin, S.G. (2006). “Teacher Quality”. In Eric Hanushek and Finis
Welch, eds. “Handbook of the Economics of Education, Volume 2”. Elsevier. Pp 1051-
1075.

Harris, D. & Sass, T.R. (2006). “The effects of teacher training on teacher value added”.
Unpublished manuscript.

Heneman, H. G., and Milanowski, A. T., 1999. “Teachers’ attitudes about teacher
bonuses under school-based performance award programs”. Journal of Personnel
Evaluation in Education, 12, p. 327–41.

Horan, C. B., and Lambert, V., 1994. “Evaluation of Utah career ladder programs”. Beryl
Buck Institute for Education. Utah State Office of Education and Utah State Legislature.

Howell, W.G., West M.R., & Peterson, P.E. (2007). “What Americans think about their
schools”. Education Next, 7(4), 12-26

Jacobson, S. L. 1992. “Performance-related pay for teacher: the American experience”. In
Tomlinson, H. (Ed.) “Performance-related pay in education” (pp. 34-54). London:
Routledge.

Johns, H.E. (1988). “Faculty perceptions of a teacher career ladder program”.
Contemporary Education, 59(4), 198-203.

Keys, B., and Dee, T., 2005. “Dollars and Sense”. Education Next, 5, p. 60-67.

Lavy, V. 2002. “Evaluating the Effect of Teachers’ Group Performance Incentives on
Pupil Achievement”. Journal of Political Economy, 110, p. 1286-1317.

Lazear, E.P. (2000). “Performance pay and productivity”. American Economic Review,
90(5), 1346-1361.

Rivkin, S.G., Hanushek, E.A., & Kain, J.F. (2005). “Teachers, schools and academic
achievement,” Econometrica, 73(2), 417-458.

Rockoff, J.E. (2004). “The impact of individual teachers on student achievement: Evidence from
panel data.” American Economic Review, 94(2), 247-252.

Table 1
Summary of ACPP Payouts by Year and School

                                   Highest  Lowest   Average                Average
                          Total    Teacher  Teacher  Teacher    Total       Cost Per
School         Year       Bonus    Bonus    Bonus    Bonus      Enrollment  Pupil
Mabelvale      2006-2007  $39,550  $6,400   $450     $1,187.50  338         $117
Geyer Springs  2006-2007  $64,530  $7,600   $350     $2,846     333         $194
Romine         2006-2007  $12,450  $5,200   $450     $723       365         $34

Table 2
Baseline Descriptive Statistics
                                        All                 Never Treated       Eventually Treated
Variable                                Mean     Std.           Mean     Std.           Mean     Std.
Black                                   0.69     0.46           0.67     0.47           0.88     0.33
Asian                                   0.02     0.12           0.02     0.13           0.00     0.00
Hispanic                                0.04     0.19           0.04     0.19           0.06     0.23
Indian                                  0.00     0.06           0.00     0.06           0.00     0.05
Male                                    0.50     0.50           0.50     0.50           0.52     0.50
Eligible for Free or Reduced Lunch      0.65     0.48           0.63     0.48           0.88     0.33
Baseline Math                          50.41    21.54          51.15    21.57          38.57    17.27
Baseline Reading                       50.16    21.53          51.12    21.55          40.53    18.87
Baseline Language                      49.87    21.13          50.88    21.18          40.21    18.02
Math Gain 2006                          1.94    14.37           2.14    14.25          -1.29    15.83
Reading Gain 2006                       1.83    14.51           1.89    14.53           1.19    14.29
Language Gain 2006                      0.00    16.07           0.18    15.90          -1.75    17.45
Note: Only students included in overall math regression are included in above summary statistics
for demographic variables. Reading and language test descriptive statistics include only students used in
those regressions.

Table 3
Regression Results - Overall Treatment

                       Math                 Reading              Language
                       Coef       t         Coef       t         Coef       t
Math t-1               0.70   69.42 ***
Reading t-1                                 0.68   64.59 ***
Language t-1                                                     0.68   51.33 ***
Black                 -4.60  -11.01 ***    -4.69  -11.27 ***    -2.75   -5.48 ***
Asian                  3.65    4.46 ***     1.04    1.00         5.81    4.53 ***
Hispanic              -1.14   -1.72 *      -1.62   -2.38 **      1.18    1.50
Indian                -1.80   -1.47        -3.78   -1.79 *      -3.19   -1.01
Male                   0.03    0.14        -0.41   -1.45        -2.87  -10.43 ***
Lunch Eligible        -2.47  -10.17 ***    -2.88   -6.06 ***    -3.19   -8.58 ***
Treat                  3.52    2.32 **      3.29    1.91 *       4.56    1.57
Constant              23.11   20.24 ***    19.40   26.84 ***    20.04   27.49 ***

Teacher Fixed Effect     No                   No                   No

N                    13,389                5,948                8,933

R-Squared            0.6479               0.7118               0.6211

Estimated via OLS. Models also control for school, grade, and year fixed effects. Standard errors clustered
by school.
*** Significant at p<= .01
** Significant at p<= .05
* Significant at p<= .10

Table 4
Regression Results - Overall Treatment with Teacher Fixed Effect

                       Math                 Reading              Language
                       Coef       t         Coef       t         Coef       t
Math t-1               0.71   71.94 ***
Reading t-1                                 0.69   65.22 ***
Language t-1                                                     0.68   51.22 ***
Black                 -4.41  -10.82 ***    -4.56  -11.04 ***    -2.70   -4.69 ***
Asian                  3.64    4.01 ***     1.33    1.23         5.92    5.37 ***
Hispanic              -0.86   -1.30        -1.27   -1.85 *       1.68    2.10 **
Indian                -1.34   -0.93        -2.89   -1.58        -3.11   -0.97
Male                   0.06    0.29        -0.43   -1.32        -2.71  -10.30 ***
Lunch Eligible        -2.24   -8.33 ***    -2.82   -5.55 ***    -2.90   -6.96 ***
Treat                  5.23    2.21 **      3.05    2.76 ***     2.04    0.93
Constant              17.36    9.47 ***    22.60    6.87 ***    24.54    9.12 ***

Teacher Fixed Effect    Yes                  Yes                  Yes

N                    13,388                5,948                8,933

R-Squared            0.6780               0.7293               0.6541

Estimated via OLS. Models also control for school, grade, and year fixed effects. Standard errors clustered
by school.
*** Significant at p<= .01
** Significant at p<= .05
* Significant at p<= .10

Table 5
Regression Results - Differential Effect by Prior Teacher Productivity

                                          Math                 Reading              Language
                                          Coef       t         Coef       t         Coef       t
Math t-1                                  0.72   62.03 ***
Reading t-1                                                    0.66   50.93 ***
Language t-1                                                                        0.65   38.16 ***
Black                                    -4.17   -9.10 ***    -4.57   -9.04 ***    -2.48   -4.03 ***
Asian                                     3.90    4.19 ***    -0.10   -0.07         6.48    5.54 ***
Hispanic                                 -0.79   -1.00        -1.80   -2.49 **      0.89    0.94
Indian                                    0.87    0.62        -2.01   -0.93        -1.86   -0.43
Male                                     -0.05   -0.24        -0.56   -1.48        -2.90   -8.27 ***
Lunch Eligible                           -2.62  -10.02 ***    -3.07   -5.24 ***    -3.61   -8.14 ***
Average 2006 Gain for Teacher             0.62   17.42 ***     0.22    3.06 ***     0.37    6.59 ***
Treat                                     6.93   14.32 ***     3.63    1.65         4.24    3.93 ***
Treat * Average 2006 Gain for Teacher    -0.48  -13.72 ***    -0.35   -3.61 ***    -0.50   -9.76 ***
Constant                                 17.13   16.88 ***    17.66   17.77 ***    19.42   14.53 ***

N                                       10,305                4,560                6,695

R-Squared                               0.6756               0.7015               0.6025

Estimated via OLS. Models also control for school, grade, and year fixed effects. Standard errors clustered
by school.
*** Significant at p<= .01
** Significant at p<= .05
* Significant at p<= .10
