

Foundations and Trends® in
Microeconomics
On-the-Job Training 2:5 (2006)
Harley Frazis and Mark Loewenstein
On-the-Job Training surveys the recent literature from both a theoretical and empirical
perspective. The analysis of how individuals obtain and are paid for their skills is fundamental
to labor economics. The basic idea of human capital theory is that workers and firms invest in
workers’ skills in order to increase their productivity, much as persons invest in financial or
physical assets to earn income. Workers develop many skills through formal education not tied
to an employer, but an important part of their skills are learned on the job.
On-the-Job Training focuses on recent literature including empirical research using direct
measures of training and theoretical papers inspired by findings from this empirical work. The
authors present a theoretical model showing that costs and returns to general human capital
may be shared if training increases mobility costs, if there are constraints on lowering wages,
or if there is uncertainty about the value of training at competing employers. This model
analyzes the choice of the amount of training, emphasizing the influence of whether the
employer can commit to training prior to employment. In addition, the model implies that firms
will attempt to match low-turnover workers with training opportunities, which is supported by
the empirical literature.
This book is originally published as

Foundations and Trends® in Microeconomics,
Volume 2 Issue 5 (2006), ISSN: 1547-9846.
On-the-Job-Training
Harley Frazis
Bureau of Labor Statistics

2 Massachusetts Ave. NE,
Suite 4945, Washington D.C. 20212
Frazis.Harley@bls.gov
Mark A. Loewenstein
Bureau of Labor Statistics

2 Massachusetts Ave. NE,
Suite 4130, Washington D.C. 20212
Loewenstein.Mark@bls.gov
Boston – Delft
Foundations and Trends® in Microeconomics
Published, sold and distributed by:

now Publishers Inc.
PO Box 1024
Hanover, MA 02339
USA
Tel. +1-781-985-4510
www.nowpublishers.com
sales@nowpublishers.com
Outside North America:

now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274
Library of Congress Control Number: 2006939974

The preferred citation for this publication is H. Frazis and M. A. Loewenstein,
On-the-Job-Training, Foundations and Trends® in Microeconomics, vol 2, no 5,
pp 363–440, 2006
Printed on acid-free paper
ISBN: 1-60198-002-7
© 2006 H. Frazis and M. A. Loewenstein
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, mechanical, photocopying, recording
or otherwise, without prior written permission of the publishers.
Photocopying. In the USA: This journal is registered at the Copyright Clearance Cen-
ter, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for
internal or personal use, or the internal or personal use of specific clients, is granted by
now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The
‘services’ for users can be found on the internet at: www.copyright.com
For those organizations that have been granted a photocopy license, a separate system
of payment has been arranged. Authorization does not extend to other kinds of copy-
ing, such as that for general distribution, for advertising or promotional purposes, for
creating new collective works, or for resale. In the rest of the world: Permission to pho-
tocopy must be obtained from the copyright owner. Please apply to now Publishers Inc.,
PO Box 1024, Hanover, MA 02339, USA; Tel. +1 781 871 0245; www.nowpublishers.com;
now Publishers Inc. has an exclusive license to publish this material worldwide. Permission
to use this content must be obtained from the copyright license holder. Please apply to now
Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail:
Foundations and Trends® in Microeconomics
Volume 2 Issue 5, 2006
Editorial Board
Editor-in-Chief:
W. Kip Viscusi
Vanderbilt University
Editors
Richard Carson, UC San Diego (environmental economics)
Joseph Harrington, Johns Hopkins University (industrial organization)
Tom Kniesner, Syracuse University (labor economics)
Mark V. Pauly, University of Pennsylvania (health economics)
David Wildasin, University of Kentucky (public economics)
Peter Zweifel, University of Zurich (insurance economics)
Editorial Scope

Foundations and Trends® in Microeconomics will publish survey
and tutorial articles in the following topics:
• Environmental Economics
• Contingent Valuation
• Environmental Health Risks
• Climate Change
• Endangered Species
• Market-based Policy Instruments
• Health Economics
• Moral Hazard
• Medical Care Markets
• Medical Malpractice
• Insurance economics
• Industrial Organization
• Theory of the Firm
• Regulatory Economics
• Market Structure
• Auctions
• Monopolies and Antitrust
• Transaction Cost Economics
• Labor Economics
• Labor Supply
• Labor Demand
• Labor Market Institutions
• Search Theory
• Wage Structure
• Income Distribution
• Race and Gender
• Law and Economics
• Models of Litigation
• Crime
• Torts, Contracts and Property
• Constitutional Law
• Public Economics
• Public Goods
• Environmental Taxation
• Social Insurance
• Public Finance
• International Taxation
Information for Librarians

Foundations and Trends® in Microeconomics, 2006, Volume 2, 5 issues. ISSN
paper version 1547-9846. ISSN online version 1547-9854. Also available as a
combined paper and online subscription.
Foundations and Trends® in Microeconomics
Vol. 2, No 5 (2006) 363–440
© 2006 H. Frazis and M. A. Loewenstein
DOI: 10.1561/0700000008
On-the-Job-Training
Harley Frazis¹ and Mark A. Loewenstein²
¹ Bureau of Labor Statistics, 2 Massachusetts Ave. NE, Suite 4945,
Washington D.C. 20212, Frazis.Harley@bls.gov
² Bureau of Labor Statistics, 2 Massachusetts Ave. NE, Suite 4130,
Washington D.C. 20212, Loewenstein.Mark@bls.gov
Abstract
The analysis of how individuals obtain and are paid for their skills is
fundamental to labor economics. The basic idea of human capital theory
is that workers and firms invest in workers’ skills in order to increase
their productivity, much as persons invest in financial or physical assets
to earn income. Workers develop many skills through formal education
not tied to an employer, but an important part of their skills are learned
on the job. This paper is a survey of the recent literature on on-the-job
training, both theoretical and empirical.
Contents
1 Introduction
2 Measuring Training
3 The Division of the Cost and Return to Training
3.1 Why Employers May Share the Return to General Training
3.2 Empirical Evidence on Sharing of General Human Capital
4 The Choice of Training
4.1 The Effect of Wage Floors
4.2 Is There Underinvestment in Training?
5 Matching of High Ability, Low Turnover Workers to High Training Jobs
5.1 Ways that Employers Who Offer Training Can Reduce Turnover
6 Estimating the Effect of Training on Wages, Productivity, and Turnover
6.1 Estimating the Effect of Training on Wages
6.2 Estimating the Effect of Training on Productivity
6.3 Estimating the Effect of Training on Job Mobility
7 Conclusion
References
1
Introduction
The analysis of how individuals obtain and are paid for their skills is
fundamental to labor economics. The basic idea of human capital theory
is that workers and firms invest in workers’ skills in order to increase
their productivity, much as persons invest in financial or physical assets
to earn income. Workers develop many skills through formal education
not tied to an employer, but an important part of their skills are learned
on the job. This paper is a survey of the recent literature on on-the-job
training, both theoretical and empirical.
While the roots of human capital theory (including the metaphor of
skills as capital) go back at least to Adam Smith (1904), modern human
capital theory was developed in the late 1950s by such economists
as Theodore Schultz (1962), Jacob Mincer (1962), and Gary Becker
(1962). For a period of some two to three decades, the theory of on-
the-job training was dominated by Becker’s (1962) analysis of general
and specific human capital. Empirical work followed the lead of Min-
cer (1962, 1974), who imputed the amount of on-the-job training from
wage-experience profiles.
Because data on the actual amount of on-the-job training were not
available, Mincer’s attempts to measure such training were indirect.
In the last two decades, as datasets with information on training have
become more plentiful, researchers using direct measures of training
have been able to examine its effects and test human capital theory.
Simultaneously, partly in response to empirical findings and partly in
response to advances in the analysis of the relationship between workers
and firms, theorists enriched and in some cases contradicted the Becker
model. We focus on this later literature: empirical work using direct
measures of training and theoretical papers inspired by findings from
such empirical work.
One of the clear predictions of the Becker model is that workers
will bear all the costs and reap all the returns to general training,
rather than sharing costs and returns with employers. We discuss sev-
eral strands of empirical results that cast doubt on this conclusion. We
develop a theoretical model similar to others in the literature show-
ing that costs and returns to general human capital may be shared if
training increases mobility costs, if there are constraints on lowering
wages, or if there is uncertainty about the value of training at compet-
ing employers.
Our model also allows us to analyze the choice of the amount of
training, where we emphasize the influence of whether the employer
can commit to training prior to employment. In addition, the model
implies that firms will attempt to match low-turnover workers with
training opportunities, an implication we find much empirical support
for in the literature.
The development of datasets with direct measures of training has
allowed researchers to examine the effects of training on wages and pro-
ductivity. We examine the many potential biases in estimating training
effects. Longitudinal data allow researchers to overcome many of these
biases. After correcting for most forms of bias, we conclude that the
weight of the evidence is that the average rate of return to formal
training for the trained is quite high; one reasonable estimate is in the
neighborhood of 50% for workers with the median (positive) amount of
training. However, no good estimates exist for the return to training for
workers on the margin of being trained, or for the marginal return to
training for trained workers. Productivity returns to training are found
by virtually all researchers to be higher than wage returns.
2
Measuring Training
Much of this survey will be concerned with empirical work on training.

It is helpful to begin our discussion with a brief description of how the
concept “training” has been operationalized in the literature.
Employees acquire skills on the job in a variety of ways. They may be
trained formally in classes, informally by supervisors or co-workers, or
they may become more productive without direct training as a result of
learning-by-doing. Because it is easiest to measure, most empirical work
has involved formal training, but broader training measures including
informal training and learning-by-doing have also been used.
Some early research on training, including Duncan and
Hoffman (1979), Mincer (1988), and Brown (1989), was done using
the broad training question in the Panel Study of Income Dynamics
(PSID). In 1976, 1978, and 1985, the PSID asked the question “On
a job like yours, how long would it take the average new person to
become fully trained and qualified?” (A slightly different question was
asked in 1993.) Since the question does not refer to specific training
activities, it is probably best interpreted as referring to a period at the
beginning of the job where the employee is trained, whether formally
or informally, and also increases his or her productivity through
learning-by-doing.
Because it may be difficult for survey respondents to recall episodes
of informal training, relatively few surveys attempt to measure it; for a
detailed look at attempts to measure informal training, we refer the
reader to Loewenstein and Spletzer (1999a). The Employer Oppor-
tunity Pilot Project (EOPP) survey of 1982 and the Small Business
Administration (SBA) survey of 1992 attempted to measure infor-
mal training by asking establishments about the number of hours
new employees spend in particular informal training activities such as
receiving individualized training from line supervisors or co-workers, or
watching co-workers perform the job. These questions are asked about
the first three months on the job of the last person hired. (EOPP and
SBA also ask a question similar to the PSID question.) As reported by
Frazis et al. (1998), in order to minimize recall error, the 1995 Survey of
Employer Provided Training (SEPT95) attempted to measure informal
training by asking employees to fill out a training log while on the job.
Not surprisingly, given the difficulty of collecting informal train-
ing data, more datasets contain measures of formal training. Even in
the case of formal training, differences in reference periods, samples,
the definition of training, and the boundary between “education” and
“training” can lead to substantial differences in the estimated incidence
of formal training. Upon scrutiny, these differences can sometimes be
reconciled. Examining the formal training information in EOPP, the
1979 cohort of the National Longitudinal Study of Youth (NLSY79),
a January 1991 supplement to the Current Population Survey, and
the National Longitudinal Survey of the High School Class of 1972,
Loewenstein and Spletzer (1999a) determine that if one takes into
account differences in sample populations, reference periods, and for-
mal training definitions, the formal training responses are consistent
across the data sets. Excluding formal schooling, they find the annual
incidence of formal training to be about 17%, with about 45% of
workers having received training while on their current job.
More recent estimates appear to be higher. Lerman et al. (1999)
report that the annual incidence of employer-provided or -supported
formal training in the 1995 National Household Education Survey
(NHES) is 27%. Similarly, the incidence of formal training in SBA is 32%.
Analyzing the 1994–1995 International Adult Literacy Survey
(IALS), O’Connell (1999) finds a still higher US incidence of educa-
tion or training for career or job-related purposes by the employed of
46%.¹ However, this figure may well include informal training since
the training question is worded quite broadly and includes “on-the-job
training.”
Differences in the definition of training make comparisons across
data sets especially difficult, but the results presented above suggest
that formal training incidence may have risen during the 1990s. Especially
striking is the fact that formal training incidence is about 18 percentage points
higher in SBA than in the very similarly designed EOPP (32% vs. 14%),
although this may be partly due to the fact that EOPP oversampled
low wage jobs. (For further discussion of how EOPP compares with
SBA, see Barron et al. (1997b).)
In contrast to formal training, EOPP and SBA indicate that nearly
all workers receive some informal training. Not only is the incidence
of informal training higher than the incidence of formal training, but
informal training spells appear to last longer than formal training spells.
In the SBA, workers on average receive 144 hours of informal training during
the first three months of employment. However, the average length
of a formal training spell is only 89 hours. Consequently, formal training
constitutes only 13% of total training hours. In SEPT95, this figure is
higher, but still only 30%. The SEPT95 includes all levels of tenure
but only establishments with 50 or more employees; the SBA includes
all sizes of establishments, but only covers the first three months of
employment. In light of Frazis et al.’s finding that the formality of
training increases with size and tenure, one can treat these estimates
as bounds.
As reported by Bassanini et al. (2005), the OECD has attempted
to produce internationally comparable training statistics by combining
data from surveys with similar instruments in different countries: the
household IALS and the establishment Continuing Vocational Training
Survey (CVTS). Comparing Europe and the US, Scandinavian countries
appear to have particularly high amounts of formal training,
Eastern Europe appears to have low amounts, and the US seems to
be in the middle (see Pischke (2006) for a discussion of how comparability
might be affected by varying labor market institutions across
countries).

¹ Frazis et al. (1998) report an anomalously high figure of 70% in SEPT95; this may be due
at least partly to restricting the sample to large establishments and to non-response.

While cross-sectional data sets can be useful in examining the extent
and incidence of training, panel data methods are very useful in esti-
mating the effects of training. Thus it is not surprising that a dis-
proportionate amount of research on on-the-job training is done using
panel datasets, especially the 1979 cohort of the National Longitudinal
Study of Youth (NLSY79), a relatively early longitudinal dataset with
detailed questions on formal training in (almost) every wave. (Frazis
and Spletzer (2005) provide a non-technical review of NLSY79 train-
ing research.) More recently, European countries have developed panel
datasets with detailed training sequences, such as the European Com-
munity Household Panel (ECHP), the British Household Panel Sur-
vey (BHPS), and the German Socio-Economic Panel (GSOEP); see
Arulampalam et al. (2004b) for an example of research using the ECHP,
Arulampalam et al. (2004a) for the BHPS, and Pischke (2001) for the
GSOEP.
How well is training measured? The only evidence on this question
that we are aware of is a validation study by Barron et al. (1997b),
who conducted a survey where they matched employers’ reports of the
training provided to their most recently hired worker during the first
month on the job with the workers’ own reports. They found a correla-
tion between employers’ and workers’ reports of total hours of training
of only 0.47, indicating substantial measurement error. However, they
found no substantial difference in the degree of measurement error (as
measured by the correlation between employers’ and workers’ reports)
between formal and informal training. Surprisingly, there was substan-
tial divergence between reports of the incidence of formal training, with
only a 0.32 correlation in the reported incidence of on-site training
and a 0.38 correlation for off-site training. So there is a good possibil-
ity that the effect of formal training is typically underestimated, even
with informal training as an omitted variable. Later, we will discuss
the effects of measurement error on estimates of the wage return to
training.
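As a rough guide to what a correlation of 0.47 between two reports could imply, the following sketch assumes classical measurement error: each report equals true training hours plus noise that is uncorrelated with the truth, with the other report, and with the wage equation error, and the noise variance is the same in both reports. These assumptions and the symbols $T^*$, $u_e$, $u_w$, and $\lambda$ are introduced here for illustration only and are not taken from Barron et al. (1997b).

```latex
\begin{align*}
T_e &= T^* + u_e, \qquad T_w = T^* + u_w, \qquad
\operatorname{Var}(u_e) = \operatorname{Var}(u_w) = \sigma_u^2, \\
\operatorname{corr}(T_e, T_w)
  &= \frac{\operatorname{Var}(T^*)}{\operatorname{Var}(T^*) + \sigma_u^2}
  \equiv \lambda .
\end{align*}
```

Under these assumptions, $\lambda = 0.47$ would be the reliability ratio of either report, so slightly more than half of the variance in reported hours would be noise, and the slope from a bivariate regression of wages on reported hours would converge to $\lambda$ times the true slope. If the reporting errors are not classical, this calculation need not hold.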
To summarize our description of the available data, we observe
that relative to the early years of human capital research there is
now a wealth of data on on-the-job training. Moreover, the increas-
ing availability of training data in longitudinal datasets has greatly
aided research on the effects of training. However, it is clear that there
are important limitations in existing data. Most datasets only contain
information on formal training, even though the small amount of data
on informal training that exists indicates that most training is infor-
mal. Even the data on formal training are inconsistent across datasets
and plagued by measurement error.
3
The Division of the Cost and Return to Training
There are two distinct decisions that must be made with respect to
training. First, employers and workers must decide how much train-
ing to undertake. Second, employers and workers must determine how
to share the cost and return to training. Labor turnover considerations
play a fundamental role in shaping these decisions. Unlike physical
capital, human capital cannot be sold by the worker. When a worker leaves an
employer, the worker’s human capital goes with him or her. The employer
loses the opportunity to derive any further benefits from it and the
worker loses the opportunity to use it at the employer’s workplace.
Thus, the employer’s and worker’s willingness to invest in training will
depend on the likelihood of a quit or dismissal in the future. Further-
more, as discussed below, decisions about the division of the return and
cost to training between employer and worker will be heavily influenced
by the consequent effects on turnover.
Becker’s (1962) distinction between general and specific human cap-
ital is a key concept in thinking about the relationship between training,
turnover, and the division of returns to training. Specific skills are only
useful at one employer, while general skills make a worker more pro-
ductive at many employers. Training that teaches a worker about an
employer’s idiosyncratic production process would be an example of a
specific human capital investment.
In Becker’s original model, when the labor market is competitive
and skills purely general, workers’ wages always equal their productivity
net of the costs of training. Workers thus realize the entire return to
and pay the entire cost of general training. In contrast, a worker with
specific skills is more productive at one employer than elsewhere. Thus,
specific human capital in effect creates a bilateral monopoly. Workers’
post-training wages will generally fall somewhere between the value of
marginal product at the current employer and the value of marginal
product elsewhere, and employers and workers will generally share the
return to and cost of training. Workers realizing part of the return to
training in the form of a high post-training wage will pay part of the
cost in the form of a low pre-training wage.
Investments in specific human capital make turnover costly. A dis-
missal or layoff imposes a loss on a worker, who loses the return to his
or her past investment in specific training. Similarly, a quit imposes a
loss on an employer, who loses the return on any training investment
he may have made in the worker. Turnover is inefficient if the gain to
the party initiating a separation is less than the loss imposed on the
other party.
Becker notes that the division of the costs and returns to specific
training has important effects on turnover: the greater is workers’ share
of costs and returns, the lower are quits and the higher are dismissals.
Division of the returns to training can be used to minimize inefficient
turnover. However, before turning to a discussion of the division of costs
and returns, we briefly discuss other simple mechanisms that might be
used to eliminate inefficient turnover.
Mortensen (1978) points out two mechanisms that could in theory
eliminate inefficient separations. First, the employee and employer can
post turnover bonds. The employee’s (employer’s) bond compensates
the employer (employee) for the loss incurred when the employee sep-
arates. However, Black and Loewenstein (1997) note that while one
occasionally observes turnover bonds, their general use is limited by
the fact that they are hard to implement when the exact value of the
match to one party is not known by the other party.¹ In addition, as
noted by Carmichael (1983), turnover bonds suffer from the disadvantage
of providing the worker and employer with an incentive to induce
the other party to initiate turnover.
The use of counteroffers, where each party in effect pays the other
not to separate, is a second mechanism that could be used to eliminate
turnover. However, as Mortensen notes, a counteroffer arrangement
where each party is guaranteed the value of his or her best alterna-
tive offer has the undesirable effect of encouraging too much search. In
addition, as with turnover bonds, the use of counteroffers is limited by
the fact that it is often difficult to ascertain the exact value each party
places on the current match and on the best alternative.
When contracts that specify payments contingent on private infor-
mation are not feasible, it is not possible to eliminate all inefficient
separations. Two alternative contracting arrangements have received
the most attention in the literature.² Sometimes it is assumed that
agents will do what is best for them ex-post. Alternatively, it is often
assumed that the employer can make binding future wage commitments
at the beginning of the employment contract. In the latter vein, Becker
discusses the problem of choosing a post-training wage that balances
the losses from quits and dismissals. Hashimoto (1981) analyzes this
problem more formally. For present purposes, it is useful to start our
discussion by considering a slightly modified version of Hashimoto’s
model that is presented in Frazis and Loewenstein (2006).
¹ As examples of turnover bonds, Black and Loewenstein cite the article, “Firms Forcing
Employees to Repay Some Costs if They Quit Too Soon,” in The Wall Street Journal,
Tuesday, 16 July 1985, which indicates that corporations such as Electronic Data Systems,
General Dynamics, McDonnell Douglas, and Northrop required employees to repay
relocation costs if they quit within a specified period of time, usually one year; Lockheed
required employees to reimburse educational expenses if they quit within one year; and
American Airlines required pilots to reimburse on a prorated basis their $10,000 training
expense.
² For a discussion of a fuller set of possible contracts, see Hall and Lazear (1984). Hall and
Lazear focus on demand uncertainty. By way of contrast, Black and Loewenstein (1998)
focus on uncertainty that is entirely match specific. However, the issues are fundamentally
the same. When there is imperfect information, no practical labor market contract will
eliminate all inefficient separations and it becomes necessary to resort to a second best
solution to allocate labor.
Consider a match between an employer and a worker who is in the

labor market for two periods. The worker is hired in period 1. At the
beginning of period 2, the employer decides whether or not to dismiss
the worker and the worker decides whether to remain at the employer
or to quit and work somewhere else. Let H denote the worker’s start-
ing value of marginal product in period 1. As a result of on-the-job
training, the worker’s expected productivity is higher in period 2. Let-
ting h denote the value in period 2 of the human capital accumulated
in the initial period, the worker’s expected value of marginal prod-
uct if he or she remains with the employer is H + h. The perceived
value elsewhere of the worker’s marginal product is given by H + γh,
where 0 ≤ γ ≤ 1. Note that if the training the worker receives does not
raise the worker’s productivity elsewhere or is not observed by other
employers, then γ = 0. At the other extreme, γ = 1 if training is gen-
eral. As a result of a firm-specific demand or cost shock, which the
employer observes at the beginning of period 2, the worker’s actual
value of marginal product may differ from H + h. Letting η be a mean
zero random variable denoting the value of the firm-specific shock, the
worker’s actual value of marginal product in period 2 is H + h + η.
The worker’s utility at the employer in each period is equal to the
wage plus the amenity value ε the worker places on the employer’s job.
The worker observes ε after starting the job. The random value ε has
a mean equal to zero. Let $w_1$ denote the first-period wage and let $w_2$
denote the second-period wage. If the worker quits, he or she incurs the
moving cost c and receives the alternative wage
$$w^A = H + \gamma h. \tag{3.1}$$
As discussed further below, we allow the moving cost c to depend on
the amount of training the worker receives. Let
$$\pi_2 = D + \eta \tag{3.2}$$
denote the employer’s second-period profit if he retains the worker,
where
$$D \equiv H + h - w_2 \tag{3.3}$$
is the rent the employer extracts when η is zero. The employer dismisses
the worker if $\pi_2 < 0$. Letting g(·) denote the density function of the
random variable η, the probability of a dismissal or layoff is simply
$$L = \int_{-\infty}^{-D} g(\eta)\,d\eta. \tag{3.4}$$
Similarly, the worker switches jobs if the utility elsewhere exceeds
the utility from staying, or $U_2 < U_2^A$, where $U_2 = w_2 + \varepsilon$ is the worker’s
second-period utility if he or she remains with the employer and
$U_2^A = w^A - c$ is the worker’s utility if he or she quits. The probability of a
quit is therefore given by
$$Q = \int_{-\infty}^{\varepsilon^c} f(\varepsilon)\,d\varepsilon, \tag{3.5}$$
where f(·) denotes the density function of the random variable ε and
$$\varepsilon^c = D - (c + (1 - \gamma)h) \tag{3.6}$$
is the minimum value of ε such that the worker does not quit. The
expected gain to the worker from his or her match with the employer
is given by
$$U = w_1 + \delta\bigl((1 - L)(1 - Q)\,E(U_2 \mid U_2 \ge U_2^A) + (1 - L)\,Q\,U_2^A + L\,U_2^A\bigr), \tag{3.7}$$
where δ denotes the discount factor. The worker is willing to form a
match with the employer if U is at least as great as the expected utility
$U^A$ available to inexperienced workers elsewhere in the labor market.
Let k(h) denote the cost of training, with $k' > 0$, $k'' > 0$. The employer’s
expected gain from the match with the worker is
$$\pi = H - w_1 + \delta(1 - L)(1 - Q)\,E(\pi_2 \mid \pi_2 \ge 0) - k(h). \tag{3.8}$$
The employer chooses first- and second-period wages to maximize π
subject to the constraint that $U \ge U^A$. Note that the first-period wage
simply serves to divide up the total return to the match between the
employer and worker. In a competitive labor market, the first-period
wage is bid up until the employer’s expected profit over the two periods
is driven to zero, or
$$
\begin{aligned}
w_1 &= H + \delta(1 - L)(1 - Q)\,E(\pi_2 \mid \pi_2 \ge 0) - k(h) \\
    &= H + \delta(1 - Q)\int_{-D}^{\infty} (D + \eta)\,g(\eta)\,d\eta - k(h).
\end{aligned}
\tag{3.9}
$$
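To make the quantities in (3.3) through (3.9) concrete, the following minimal sketch evaluates them for one parameterization. It assumes that η and ε are independent normal variables with mean zero, which the text does not require. The function name, the parameter values, and the treatment of the training cost k(h) as a single number `k` are illustrative assumptions.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_period_contract(H, h, w2, gamma, c, k, delta=0.95,
                        s_eta=1.0, s_eps=1.0):
    """Evaluate (3.3)-(3.6) and the zero-profit wage (3.9).

    Assumption (not from the text): eta ~ N(0, s_eta^2) and
    eps ~ N(0, s_eps^2), independent of each other.
    """
    D = H + h - w2                      # employer's rent when eta = 0, (3.3)
    eps_c = D - (c + (1.0 - gamma) * h)  # quit threshold, (3.6)
    L = Phi(-D / s_eta)                 # dismissal probability, (3.4)
    Q = Phi(eps_c / s_eps)              # quit probability, (3.5)
    # Integral of (D + eta) g(eta) over eta >= -D, in closed form for a normal
    rent = s_eta * phi(D / s_eta) + D * Phi(D / s_eta)
    w1 = H + delta * (1.0 - Q) * rent - k   # zero-profit first-period wage, (3.9)
    return {"D": D, "eps_c": eps_c, "L": L, "Q": Q, "w1": w1}


if __name__ == "__main__":
    # General training (gamma = 1), no moving cost, second-period wage H + h
    result = two_period_contract(H=10.0, h=2.0, w2=12.0, gamma=1.0,
                                 c=0.0, k=1.0, delta=1.0)
    for name, value in result.items():
        print(f"{name} = {value:.4f}")
```

In this parameterization D = 0, so the employer earns a second-period rent only through the option to dismiss the worker after a bad shock; lowering `w2` raises D, which lowers the dismissal probability and raises the quit probability.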
When deciding whether or not to dismiss a worker, the employer

does not take into account the potential loss that the dismissal imposes
on the worker. The dismissal is inefficient if the worker’s loss exceeds
the employer’s gain; that is, it is inefficient if η > −(ε + c + (1 − γ)h).
Similarly, when deciding whether to quit, the worker does not take
into account the potential loss that a quit imposes on the employer.
The lower the second-period wage, the smaller is the expected loss
from inefficient dismissals and the greater is the expected loss from
inefficient quits. The optimal second-period wage, or, equivalently, the
optimal D, minimizes the expected loss from inefficient separations.
To characterize the optimal wage contract, let
$$\Gamma = \pi + \lambda(U - U^A) \tag{3.10}$$
be the Lagrangean for the constrained maximization problem. Setting
the derivative with respect to $w_1$ equal to 0, one finds that λ = 1. The
other first-order condition is obtained by setting the derivative of Γ
with respect to D equal to 0. Rearranging terms, this condition can be
written as:
$$f(\varepsilon^c)\int_{-D}^{\infty} (\eta + D)\,g(\eta)\,d\eta = g(-D)\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\,f(\varepsilon)\,d\varepsilon. \tag{3.11}$$
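The step from (3.10) to (3.11) can be sketched as follows. The sketch assumes that ε and η are independent, which is what the product form (1 − L)(1 − Q) in (3.7) and (3.8) suggests. The shorthand $A(D)$ and $B(D)$ is introduced here for convenience and is not the authors' notation.

```latex
\begin{align*}
A(D) &\equiv \int_{-D}^{\infty} (\eta + D)\, g(\eta)\, d\eta, &
B(D) &\equiv \int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\, f(\varepsilon)\, d\varepsilon, \\
\Gamma\big|_{\lambda = 1}
  &= H - k(h) - U^A + \delta\bigl[(1 - Q)\,A(D) + (1 - L)\,B(D) + U_2^A\bigr], \\
\frac{dA}{dD} &= 1 - L, \quad \frac{dB}{dD} = -(1 - Q), &
\frac{dQ}{dD} &= f(\varepsilon^c), \quad \frac{dL}{dD} = -g(-D), \\
\frac{1}{\delta}\,\frac{\partial \Gamma}{\partial D}
  &= -f(\varepsilon^c)\,A(D) + g(-D)\,B(D) = 0 .
\end{align*}
```

With λ = 1 the first-period wage cancels out of Γ, and $U_2^A = H + \gamma h - c$ does not depend on D. The two terms in $(1 - L)(1 - Q)$ that arise from differentiating $A$ and $B$ offset each other, which leaves the condition in (3.11).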
To interpret (3.11), note that the loss imposed on the worker from
a dismissal is the difference between the utility the worker would
have received had the worker stayed with the employer and the utility
the worker receives when moving to another job, or
$U_2 - U_2^A = (w_2 + \varepsilon) - (H + \gamma h - c) = (H + h - D + \varepsilon) - (H + \gamma h - c) = \varepsilon - \varepsilon^c$. The
amenity value ε is initially unknown, so the worker’s expected loss
from a dismissal is $\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\,f(\varepsilon)\,d\varepsilon$. Since the marginal effect of
an increase in the wage on the probability of a dismissal is given
by $(\partial L/\partial D)(\partial D/\partial w_2) = g(-D)$, the right-hand side of (3.11) is the
marginal effect of an increase in the wage on the expected loss to the
worker from a dismissal. Similarly, the left-hand side of (3.11) is the
marginal reduction in the expected loss to the employer from the lower
quit probability that results from a higher wage. At the optimum, the
increase in the worker’s expected capital loss from a marginal increase
in $w_2$ must just equal the reduction in the employer’s expected capital
loss. The second-order condition requires that
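\[
f(\varepsilon^c)\int_{-D}^{\infty} g(\eta)\,d\eta
+ f'(\varepsilon^c)\int_{-D}^{\infty} (D + \eta)\,g(\eta)\,d\eta
+ g(-D)\int_{\varepsilon^c}^{\infty} f(\varepsilon)\,d\varepsilon
+ g'(-D)\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\,f(\varepsilon)\,d\varepsilon > 0,
\]
and is satisfied if the responsiveness of the quit and dismissal rates does
not decrease too much as the wage rate increases.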
Note that while we have interpreted ε as the value the worker places
on the amenities of the employer’s job, more generally ε can represent
uncertainty about anything that affects the value of the employer’s job
relative to the value of other jobs. For example, there may be uncer-
tainty about the value of training elsewhere. If the worker’s productivity
elsewhere is H + γh − ε, one still obtains Eq. (3.11).
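As a rough numerical illustration of Eq. (3.11), the sketch below solves the first-order condition for D by bisection. It assumes, purely for illustration, that ε and η are independent standard normal variables and that the reservation value is $\varepsilon^c = D - c - (1 - \gamma)h$; the distributional choice and the function names (`foc_gap`, `optimal_D`) are not from the text.

```python
from statistics import NormalDist

# Assumption (not from the text): eps and eta are independent standard normals.
STD = NormalDist()

def employer_loss(D):
    """Integral of (eta + D) g(eta) over eta >= -D (expected employer loss from a quit)."""
    return STD.pdf(D) + D * STD.cdf(D)

def worker_loss(eps_c):
    """Integral of (eps - eps_c) f(eps) over eps >= eps_c (expected worker loss from a dismissal)."""
    return STD.pdf(eps_c) - eps_c * (1.0 - STD.cdf(eps_c))

def foc_gap(D, h, gamma, c):
    """Left-hand side minus right-hand side of Eq. (3.11)."""
    eps_c = D - c - (1.0 - gamma) * h  # reservation amenity value
    return STD.pdf(eps_c) * employer_loss(D) - STD.pdf(-D) * worker_loss(eps_c)

def optimal_D(h, gamma, c, lo=-5.0, hi=5.0, tol=1e-10):
    """Solve Eq. (3.11) for D by bisection on the bracket [lo, hi]."""
    gap_lo = foc_gap(lo, h, gamma, c)
    gap_hi = foc_gap(hi, h, gamma, c)
    if gap_lo * gap_hi > 0:
        raise ValueError("no sign change on the bracket; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gap_mid = foc_gap(mid, h, gamma, c)
        if gap_lo * gap_mid <= 0:
            hi = mid                      # root lies in [lo, mid]
        else:
            lo, gap_lo = mid, gap_mid     # root lies in [mid, hi]
    return 0.5 * (lo + hi)
```

Under these symmetric assumptions, `optimal_D(h=1.0, gamma=1.0, c=0.0)` is approximately zero, so the second-period wage equals productivity, while setting `gamma` below one gives a positive D that rises with h by less than one for one.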
We now examine the case where the employer cannot commit to
an arbitrary second-period wage. In the current context, the employer
cannot offer a wage that depends on the amenity value the worker
places on the job because this is the private information of the worker.
However, the employer can offer a wage that depends on the realized
value of the worker’s marginal product, something that the employer
observes. The employer will want to choose a wage offer that maximizes
(1 − Q(η))(D(η) + η), the expected profit in period 2.
Leuven (2005) and Acemoglu and Pischke (1999b) assume a Nash
bargaining solution where the worker receives a share β of the second-
period rent, so that in terms of our model the second-period wage is
given by $w_2 = U_2^A + \beta(H + h + \varepsilon + \eta - U_2^A)$.3 Our setup differs in
that the parties cannot contract on the values of η and ε, which are
private information. The most natural assumption in this situation is
to give the employer the ability to make a take-it-or-leave-it offer. (Note
that employers’ take-it-or-leave-it offers can be made credible if employers
can establish reputations for making them.) Differentiating with
respect to D, one finds that the second-period wage offer maximizing
the employer’s expected profit in period 2 satisfies
\[
(D(\eta) + \eta) - m(D(\eta)) = 0, \tag{3.12}
\]
where $m(x) \equiv (1 - Q(x))/f(x)$. Note that the second-order condition to
the profit maximization problem requires that $m' < 1$. (Log-concavity
of F(·) is sufficient to guarantee this.)
Equation (3.12) implicitly defines the no-commitment second-period
wage as a function of the productivity shock η. Differentiating with
respect to η, one finds that
\[
\frac{\partial D(\eta)}{\partial \eta} = \frac{1}{-1 + m'} < 0, \tag{3.13}
\]
which implies that $\partial w_2/\partial \eta > 0$: The second-period wage is increasing in
the productivity shock η. To retain well-matched workers, the employer
offers a share, but generally not all, of the return to a favorable match-
specific productivity shock. As in the fixed-wage model, there are inefficient
quits in the no-wage-commitment model. The worker quits when
$\varepsilon < \varepsilon^c = D - c + (\gamma - 1)h = m - \eta - c + (\gamma - 1)h$, but it is only efficient
to quit when $\varepsilon < -\eta - c + (\gamma - 1)h$. However, there are no inefficient
dismissals. Rather than dismiss the worker when η is low, the
employer simply lowers the wage. Following up on this point, Black and
Loewenstein (1997) note that the fixed-wage and no-commitment scenarios
can be considered as special cases of a more general contract that
specifies a wage floor, but allows the employer to offer a higher wage
should he choose to do so. The wage floor leads to inefficient dismissals,
but limits the inefficient quits that will result from rent extraction by
the employer.
3 In contrast, Balmaceda (2005) assumes that the outside options do not work as threat
points in bargaining when they are not binding, which in our present case means that
$w_2 = \beta(H + h + \varepsilon + \eta)$ provided that $\beta(H + h + \varepsilon + \eta) > U_2^A$. If $\beta(H + h + \varepsilon + \eta) < U_2^A$,
the worker’s outside option is an effective threat point and he receives a wage of $U_2^A$; if
$0 > \beta(H + h + \varepsilon + \eta)$, the match does not continue. Note that when the outside options
are not binding, $\partial w_2/\partial h = \beta$, so that the employer and the worker share the return to
training.
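The no-commitment condition in Eq. (3.12) can be explored numerically. The sketch below is only an illustration: it assumes ε is standard normal, reads m as the inverse hazard (1 − F)/f evaluated at the reservation value $\varepsilon^c = D - \kappa$, and uses a hypothetical parameter `kappa` to stand for c + (1 − γ)h. None of these choices, nor the function names, come from the text.

```python
from statistics import NormalDist

STD = NormalDist()  # assumption: the amenity value eps is standard normal

def inverse_hazard(eps_c):
    """m = (1 - Q) / f, with Q = F(eps_c) the quit probability."""
    return (1.0 - STD.cdf(eps_c)) / STD.pdf(eps_c)

def no_commitment_D(eta, kappa=0.0, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve D + eta = m(D) from Eq. (3.12) by bisection.

    kappa is a hypothetical stand-in for c + (1 - gamma) * h,
    so that the reservation value is eps_c = D - kappa.
    """
    def gap(D):
        return D + eta - inverse_hazard(D - kappa)

    if gap(lo) * gap(hi) > 0:
        raise ValueError("no sign change on the bracket; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # gap is increasing in D here because m' < 0 for the normal distribution
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def second_period_wage(H, h, eta, kappa=0.0):
    """w2 = H + h - D(eta) under the take-it-or-leave-it offer."""
    return H + h - no_commitment_D(eta, kappa)
```

Evaluating `no_commitment_D` at increasing values of `eta` gives a decreasing D, so the wage rises with the productivity shock but by less than the shock itself, in line with (3.13).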
3.1 Why Employers May Share the Return to General Training
The existing evidence from surveys indicates that employers and work-
ers both believe that employer-provided training is typically gen-
eral. Specifically, employers in EOPP were asked, “How many of
the skills learned by new employees in this job are useful outside of
the company?” As reported by Loewenstein and Spletzer (1999b),
58% of employers indicate that almost all the skills learned by a
new employee are useful outside the company and only 8% indi-
cate that none of the skills are useful elsewhere. Similarly, in 1993,
the NLSY asked workers, “How many of the skills that you learned
in this training program do you think could be useful in doing
the same kind of work for an employer different than the current
employer?” Loewenstein and Spletzer report that 63% of the work-
ers receiving formal training respond that “all or almost all” of the
skills they learned are useful at another employer. Only 5% of the
workers indicate that “none or almost none” of the skills are useful
elsewhere.
In Becker’s model, workers realize the full return and bear the full
cost of general training. However, Acemoglu and Pischke (1999a,b) and
others point out that various labor market imperfections have the effect
of compressing the wage structure, with the result that skilled work-
ers do not receive their full marginal product and employers have an
incentive to share the cost of training. Becker (1962, p. 25) himself
indicates that imperfect competition can make training specific in an
economic sense, noting that “monopsony power as a whole, including
the more extreme manifestations, would appear to increase the impor-
tance of specific training and the incentive for firms to invest in human
capital . . . a relatively large difference between marginal product and
wages in monopsonies might measure, therefore, the combined effect
of economic power and a relatively large investment in employees.”
The simple model presented above can be used to illustrate the various
arguments. To this end, it does not really matter whether or not the
employer commits to a second-period wage. For convenience, we will
analyze the case where the employer makes a wage commitment.
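Let $D_0$ denote the optimal value of D, that is, the value that satisfies
(3.11). Differentiating (3.11), (3.9) and (3.5) yields
\[
\partial D_0/\partial h = M\,(c' + (1 - \gamma)), \tag{3.14}
\]
\[
\partial Q/\partial h = f(\varepsilon^c)\left(\frac{\partial D_0}{\partial h} - c' - (1 - \gamma)\right)
= f(\varepsilon^c)(M - 1)(c' + (1 - \gamma)), \tag{3.15}
\]
\[
\partial w_1/\partial h = -k'(h) - \delta\,\frac{\partial Q}{\partial h}\int_{-D}^{\infty} (D_0 + \eta)\,g(\eta)\,d\eta
+ \delta(1 - Q)\,\frac{\partial D_0}{\partial h}\int_{-D}^{\infty} g(\eta)\,d\eta, \tag{3.16}
\]
where
\[
\begin{aligned}
M \equiv {} & \left[ f(\varepsilon^c)\int_{-D}^{\infty} g(\eta)\,d\eta + f'(\varepsilon^c)\int_{-D}^{\infty} (D + \eta)\,g(\eta)\,d\eta
+ g(-D)\int_{\varepsilon^c}^{\infty} f(\varepsilon)\,d\varepsilon + g'(-D)\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\,f(\varepsilon)\,d\varepsilon \right]^{-1} \\
& \times \left[ f(\varepsilon^c)\int_{-D}^{\infty} g(\eta)\,d\eta + f'(\varepsilon^c)\int_{-D}^{\infty} (D + \eta)\,g(\eta)\,d\eta \right].
\end{aligned}
\]
If γ = 1 and c′ = 0, then it follows from (3.14) and (3.3) that
$\partial D_0/\partial h = 0$ and $\partial w_2/\partial h = 1$. That is, if training is general and fully
recognized by alternative employers and if the cost of switching jobs is
independent of the worker’s human capital stock, the worker receives
the entire return to training in the form of a higher second-period wage.
Rent extraction by the employer is unaffected by increased training,
as is the probability of a quit and of a dismissal. From (3.14), (3.15),
and (3.16), we see that $\partial w_1/\partial h = -k'(h)$: the worker pays for the entire
cost of training in the form of a lower starting wage.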
Some authors have pointed out that the existence of asymmetric
information can cause employers to share in the returns and costs of
investments in what would otherwise be general human capital. For
example, Katz and Ziderman (1990) and Chang and Wang (1996) pos-
tulate that alternative employers only imperfectly observe past invest-
ments in human capital. In a similar vein, Acemoglu and Pischke (1998)
develop a model in which (a) employers can only fully ascertain a

worker’s ability after observing him on the job and (b) ability and
training are complements in production. These two conditions imply
that alternative employers do not fully value the training a worker has
received in the past. If imperfect information prevents a new employer
from fully valuing and rewarding a worker’s past human capital invest-
ment at a previous firm, then the investment effectively becomes spe-
cific. (In a different twist of the imperfect information argument, Autor
(2001) argues that employers in the temporary-help industry offer gen-
eral training as a way to screen and test workers.)
In terms of our present model, if other employers do not fully
observe the training provided by the worker’s current employer, then
γ < 1. Provided that $f'(\varepsilon^c)$ and $g'(-D)$ are not too negative, it follows
from (3.14) that $0 < \partial D_0/\partial h = M(1 - \gamma) < 1$: The employer realizes
part of the return to training.4 From (3.15) and (3.16), one sees that
$\partial w_1/\partial h > -k'(h)$: The full cost of training is not reflected in the first-period wage.
Since the employer shares the return to training in period 2, he shares
the cost in period 1.
Casas-Arce (2004) points out another reason why γ may be less than
1: if an employer offers general training that is complementary to the spe-
cific training that he provides, then the effect of general training on pro-
ductivity will be higher at the employer than elsewhere.5 Once again, the
employer will be able to realize part of the return to general training.
4 If $f(\varepsilon^c)\int_{-D}^{\infty} g(\eta)\,d\eta < -f'(\varepsilon^c)\int_{-D}^{\infty} (D + \eta)\,g(\eta)\,d\eta$, then M < 0, which implies that
$\partial D_0/\partial h < 0$ (i.e., ∂w/∂h > 1). To see why, note that at the initial value of D, an increase
in h leads to a fall in $\varepsilon^c$. If f′ is negative, a fall in $\varepsilon^c$ causes an increase in the responsiveness
of the quit rate to a change in the wage, which leads to an increase in the worker’s
share of the second-period rent. If this effect is sufficiently strong, the wage increase
will exceed the increase in productivity and the total rent extracted by the employer
will fall. Similarly, if $g(-D)\int_{\varepsilon^c}^{\infty} f(\varepsilon)\,d\varepsilon < -g'(-D)\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\,f(\varepsilon)\,d\varepsilon$, it is possible that
$\partial D_0/\partial h > 1$ (i.e., ∂w/∂h < 0). The empirical evidence does not support either of these
extreme cases, and we shall assume that $f(\varepsilon^c)\int_{-D}^{\infty} g(\eta)\,d\eta > -f'(\varepsilon^c)\int_{-D}^{\infty} (D + \eta)\,g(\eta)\,d\eta$
and $g(-D)\int_{\varepsilon^c}^{\infty} f(\varepsilon)\,d\varepsilon > -g'(-D)\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\,f(\varepsilon)\,d\varepsilon$. (Note that the second-order condition
implies that at least one of these conditions must hold.)
5 Balmaceda (2005) argues that neither complementarity between general and specific human
capital in the production function nor labor market frictions are required for employers to
share the return to training. Rather, he notes that the “presence of specific training creates
quasi-rents that have to be divided ex-post between workers and firms according to
the outside option principle. . . When the surplus is shared, the firm appropriates a share
of the returns on general and specific training ex-post.” Specifically, see the discussion in
footnote 3. Note that one does not obtain Balmaceda’s result when one assumes a Nash
bargaining solution.
Similarly, Bishop (1991) argues that employers require different mixes of
general skills. Each firm provides that combination of skills that it needs,
concentrating on skills that it values highly. Thus, although the individual
skills may be general, the skill mix is employer-specific.
Several authors have suggested that the cost of locating and moving
to a new job generally increases with a worker’s stock of human
capital. Acemoglu and Pischke (1999b) note that a period of unemployment
spent searching for another job will be more costly to higher
paid workers. (Evidence suggests that workers who quit often find new
jobs without a period of unemployment, but one might generalize this
basic argument by pointing to the higher paid worker’s higher value of
time spent searching.)
Frazis and Loewenstein (2006) note that more senior positions at an
employer are typically filled from within by workers who have proved
to be a good match. Consequently, a worker with more experience and
training who is searching for a job typically has a smaller set of openings
available to him and will generally find it more difficult and costly to
find and relocate to an employer who can use his or her skills effectively.
Zoega and Booth’s (2005) model of wage compression assumes that the
set of employers who can use more able, higher skilled workers is smaller
than the set of employers who can use less skilled workers. Stevens
(1994) makes a similar point, noting that “training can be a competition
reducing process.” In a related vein, Neal (1995) shows that a significant
amount of the skills acquired by workers “are neither completely general
nor firm-specific but rather specific to their industry or line of work.”
(This suggests that moving costs will be higher for workers who are in
industries with fewer and/or more geographically dispersed firms. And
moving costs will vary among occupations; secretaries, for example, will
generally have fewer industry-specific skills and lower mobility costs.)
In terms of our current model, a positive relationship between mobility
cost and human capital means that we may write moving cost c as
an increasing function of H + h: c = c(H + h), c′ > 0. When c′ > 0, it
once again follows from (3.14), (3.15), and (3.16) that $0 < \partial D_0/\partial h < 1$
and $\partial w_1/\partial h > -k'(h)$. That is, if it is more costly for higher skilled
workers to change jobs, then trained workers’ higher productivity will
only be partially reflected in the post-training wage and employers will
share the cost of general training.
Acemoglu and Pischke (1999b) point out that costly search gives
prospective employers some monopsony power, preventing workers from
capturing the full value of their marginal product if they move to
another job. Similarly, Stevens (1994) presents a model in which a lim-
ited number of firms bid for a trained worker’s services. In equilibrium,
the worker moves to the employer at which he or she has the highest
value of marginal product, but the worker receives a wage equal to his or
her second highest value of marginal product. Frazis and Loewenstein
(2006) provide another reason why a worker moving to a new employer
will not receive the full value of marginal product: the existence of an
“equity norm” that prevents an employer from paying retained work-
ers less than equally productive experienced workers hired from the
outside. It is easy to imagine that an employer’s senior workers will
be unhappy and put forth less effort, if they receive a lower wage than
other experienced workers who are no more productive, but who simply
began their careers at other firms. Such behavior seems consistent with
both casual observation of the labor market and experimental studies
cited in Akerlof and Yellen (1990). (And equity norm considerations
might help explain why many employers have secrecy rules concerning
workers’ pay.)6
6 Ransom (1993) finds that controlling for experience, wages at large research universities
decline with tenure, which would seem to be inconsistent with the existence of an
equity norm. As Ransom (1993) and Black and Loewenstein (1991) note, large distances
between universities lead to high mobility costs in the academic labor market. In the context
of a multi-period model where employers make take-it-or-leave-it offers, Black and
Loewenstein (1991) show that this can lead to a declining wage profile, as workers who
stay reveal that their moving costs are particularly high. One might hypothesize that
in the absence of equity norm considerations, wages at universities would decline even
more with tenure. Or perhaps equity norm considerations in academe are weaker because
complementarities in production are less pronounced.
Whether due to monopsony or equity norm considerations, a reduction
in the wage a skilled worker can receive at a new employer reduces
the worker’s optimal share of the cost and return to human capital
investment. Formally, let the wage the worker can receive at an
alternative employer be given by
\[
w^A = H + h - D^A, \tag{3.1$'$}
\]
and note that the reservation value of ε is now given by
\[
\varepsilon^c = D - D^A - (c + (1 - \gamma)h). \tag{3.6$'$}
\]
Note too that other than the changed definition of $\varepsilon^c$, condition (3.11)
is unchanged. This condition implicitly defines $D_0$ as a function of
$D^A$: $D_0 = \chi(D^A)$. Differentiating $D_0$ with respect to $D^A$, one finds that
χ′ = M: a reduction in the alternative wage causes a partial reduction
in the second-period wage offered by a worker’s initial employer.
In the equity norm model, DA is also a function of D0 : a reduc-
tion in the wage paid to an employer’s senior workers means a fall
in the wage paid to a skilled worker who changes jobs. The equity
norm thus amplifies an initial tendency toward employer sharing of the
return to general human capital acquisition, as the wage compression
and sharing effects reinforce each other. More specifically, competition
for experienced workers will ensure that they do not receive less than
retained workers. If employers hire a mix of inexperienced and experi-
enced workers, labor market equilibrium thus requires that D0 = DA .
Note too that employers will choose to hire a mix of inexperienced
and experienced workers if inexperienced and experienced workers are
strong complements in production, something which seems consistent
with casual observation. For example, it may be efficient to place less
skilled, inexperienced workers in less demanding tasks and let experi-
enced workers concentrate on certain critical tasks for which they are
better suited.
Let $D^*$ denote the equilibrium value of D. Using (3.6′) and (3.11),
it is straightforward to show that
\[
\partial D^*/\partial h = \frac{c' + (1 - \gamma)}{1 - M}. \tag{3.14$'$}
\]
Comparing (3.14) and (3.14′), one sees that the existence of an equity
norm will amplify an initial tendency by employers to share in the costs
and returns to general human capital investment.
Wage guarantees as in Black and Loewenstein (1997) can also lead

to employers sharing the cost and return to general training. As dis-
cussed above, (explicit or implicit) wage guarantees allow employers to
assure workers that they will not extract excessive rents due to work-
ers’ immobility. Formally, a wage guarantee in period 2 shows up in
the form of a constraint w2 ≥ wmin . Loewenstein and Spletzer (1998)
note that if the wage guarantee is set before training is known with cer-
tainty, then higher productivity will not always translate into higher
wages, the end result being that the employer will share the return. In
accordance with the suggestion of Shapiro and Stiglitz (1984) and the
efficiency wage literature, efforts by employers to deter workers from
shirking may also lead to a wage floor. Obviously, if the wage guaran-
tee is binding, ∂D/∂h = 1: the employer realizes the entire return to a
small increase in the worker’s second-period human capital.
3.2 Empirical Evidence on Sharing of General Human Capital Returns and Costs
What evidence is there for or against models that imply that employ-
ers share the costs and returns to general training? The main piece of
evidence in the literature is that the increase in wages associated with
a training event is larger for future employers than for the employer
providing the training. This implies that the firm doing the training is
getting returns for its investment in the form of post-training productiv-
ity higher than wages, while future employers pay wages closer to post-
training productivity. Such a pattern has been found by Loewenstein
and Spletzer (1998, 1999a) and Lengermann (1999) using NLSY79 data
and by Booth and Bryan (2002) using data from the BHPS. In contrast,
Parent (1999) finds that training other than apprenticeships generates
similar wage returns at the current and the previous employer in the
NLSY79 (in spite of the fact that his method of correcting for worker
heterogeneity is comparable with the fixed-effects methods used by the
other authors).
One obvious concern with this evidence is that comparisons of pre-
vious and current employers may be biased by the endogeneity of job
mobility. Loewenstein and Spletzer (1998) argue that the most likely
effect of endogenous job mobility is to downwardly bias in magnitude

the estimated difference between the effect of training at the employer
providing the training and the effect at future employers, whether that
difference is positive or negative. The argument can be summarized as
follows. Assume that a worker needs a given increase in wages to change
jobs: $w_{new} - w_{current} \ge c$. For a trained worker, $w_{new} - w_{current}$ can be
decomposed into a change in the return to training and a change in
wages net of training:
\[
w_{new} - w_{current} = (\beta_{new} - \beta_{current})\,T + (v_{new} - v_{current}).
\]
The greater is $(\beta_{new} - \beta_{current})$, the less $(v_{new} - v_{current})$ needs to be to
make changing jobs worthwhile. Assume $(v_{new} - v_{current})$ is unobservable
by the researcher. If $\beta_{new} > \beta_{current}$, the average observed value of
$w_{new} - w_{current}$ will be less than $(\beta_{new} - \beta_{current})T$, downwardly biasing
the estimate of $(\beta_{new} - \beta_{current})$, and similarly if $\beta_{new} < \beta_{current}$.
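A small simulation can make the direction of this bias concrete. The sketch below is an illustration under assumptions that are not in the text: the unobserved component is standard normal, half of workers are trained (T = 1), and the researcher estimates the difference in returns by comparing the average wage change of trained movers with that of untrained movers. The function name and all parameter values are hypothetical.

```python
import random
import statistics

def simulate_mobility_bias(beta_diff, move_cost=0.5, sd_v=1.0, n=200_000, seed=0):
    """Estimate beta_new - beta_current from observed job changers only.

    beta_diff is the true value of beta_new - beta_current. A worker moves
    only if the wage change is at least move_cost, so the sample of movers
    is selected on the unobserved component v_new - v_current.
    """
    rng = random.Random(seed)
    trained_changes, untrained_changes = [], []
    for _ in range(n):
        trained = rng.random() < 0.5          # T = 1 for about half of workers
        v_diff = rng.gauss(0.0, sd_v)         # unobserved v_new - v_current
        wage_change = beta_diff * trained + v_diff
        if wage_change >= move_cost:          # only movers are observed
            if trained:
                trained_changes.append(wage_change)
            else:
                untrained_changes.append(wage_change)
    # Naive estimate: trained movers' mean wage change minus untrained movers'
    return statistics.fmean(trained_changes) - statistics.fmean(untrained_changes)
```

In this setup the naive estimate keeps the sign of the true difference but is smaller in absolute value, whether `beta_diff` is positive or negative, which is the attenuation the argument describes.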
As related evidence that employers share the return to general train-
ing, Bishop (1991) finds using EOPP data that the return to training at
an employer is not sensitive to the degree to which employers indicate
it is general. Loewenstein and Spletzer (1999b) confirm this result for
EOPP and obtain a similar result in the NLSY (based, of course, on
the worker’s response to the question about the generality of training).
Therefore, if employers are sharing the return to specific training, they
are presumably also sharing the return to general training. In the same
vein, Loewenstein and Spletzer’s (1998) analysis of the NLSY data
indicates that workers’ wage return to training at the current employer
varies less by type of training, which presumably proxies for the gener-
ality of training, than does the wage return by type of training at the
previous employer; in particular, the return to formal company training
at the previous employer is similar to the return to formal training at
the current employer, while the wage return for school provided training
is lower at the previous employer than the current employer.
Another piece of evidence that employers share the return to general
training is the almost universal finding that training has a substantially
greater effect on productivity than it does on wages; if a large propor-
tion of training is general, this implies that returns to general training
are shared. We present some of this evidence below, when we discuss

the effect of training on wages and productivity.
Finally, the apprenticeship training programs that can be found in
various countries suggest quite strongly that employers share the costs
of general training. The German apprenticeship system has received the
most attention in the literature. By its very nature, this training seems
largely general. Estimates of the net cost of apprentices to employers
can be found in Harhoff and Kane (1997). These estimates are calcu-
lated as the sum of wage payments to apprentices, wage payments to
training personnel (estimated as the product of their wage rate and
the proportion of their time that they spent providing training), and
training material costs minus the value of apprentice output, which in
turn is estimated as the product of the amount of time that apprentices
spend in production, the productivity of apprentices relative to skilled
workers, and the wage of skilled workers. The estimates of net cost are
considerable. For example, in 1991, the average estimated net cost of
a German apprentice was $10,657 in 1990 dollars. Smits (2005) reports
net cost estimates for Australia, the Netherlands, and Britain in addition
to Germany. The net cost estimates are positive for all of the countries.
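The net cost calculation described above can be written out as a short function. This is a sketch of the accounting identity as stated in the text, not Harhoff and Kane's code; the argument names and the example figures are hypothetical.

```python
def apprentice_net_cost(apprentice_wages, trainer_wage, trainer_time_share,
                        material_costs, production_time_share,
                        relative_productivity, skilled_wage):
    """Net cost of an apprentice to the employer, per apprentice and period.

    Gross cost = apprentice wages + trainer wage * share of trainer time spent
    training + training materials. The value of apprentice output is the share
    of apprentice time spent in production * productivity relative to a skilled
    worker * the skilled wage.
    """
    gross_cost = apprentice_wages + trainer_wage * trainer_time_share + material_costs
    output_value = production_time_share * relative_productivity * skilled_wage
    return gross_cost - output_value

# Hypothetical illustration (these figures are not from the source):
example = apprentice_net_cost(
    apprentice_wages=9000.0, trainer_wage=40000.0, trainer_time_share=0.125,
    material_costs=1000.0, production_time_share=0.5,
    relative_productivity=0.5, skilled_wage=30000.0,
)
```

A positive result means that apprentice output does not cover the employer's wage, trainer, and material outlays.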
To conclude this section, we have just seen that there is substantial
evidence that employers share the costs and returns to general train-
ing, contrary to the Becker (1962) model. The model developed in this
section allowed us to analyze the role that various frictions considered
in the recent literature play in modifying Becker’s conclusion about
the absence of cost-sharing. Turnover is a key factor in thinking about
cost-sharing, as employers will set post-training wages relative to pro-
ductivity to minimize losses from turnover. Training that is useful at
many employers will in a frictionless model have no effect on turnover
and the costs of such training will not be shared with employers. Fric-
tions such as asymmetric information that lead to other employers not
fully valuing general training will induce a relationship between general
training and turnover and consequently sharing of costs and returns.
So will increases in mobility costs caused by training, and incomplete
responsiveness of wages to productivity growth caused by post-training
wage guarantees.
4
The Choice of Training
As we now show, the choice of training depends crucially on whether

employers can commit to provide specified training levels. Such com-
mitments are likely to be problematic. As Acemoglu and Pischke (1998,
1999a,b) note, it may be difficult or even impossible for third parties
to observe the training that a worker receives at a firm, so that an
employer’s training commitment may not be verifiable. In such a case,
a commitment to provide a specified training level will not generally be
credible. (Difficulty in verifying training is especially plausible in light
of the predominance of informal training noted in Section 2.)
If training is purely specific and has no value at other employers,
then a worker cares solely about the future wages that the employer
may promise; the level of training per se is irrelevant to him. But if
training has value elsewhere, the worker also cares about the training
the employer may promise. The employer’s ability or inability to com-
mit to providing a specified amount of training, therefore, has impor-
tant implications for the level of training investment that the employer
chooses. To start, consider the case where the employer commits both
to a training level and to a second-period wage. This corresponds to
Acemoglu and Pischke’s (1999b) “full-competition regime.” Setting the
derivative of (3.10) with respect to h equal to 0, one obtains the first-order
condition describing the choice of h. Using the fact that λ = 1
and rearranging terms, one obtains k′(h) = δθ(h), where
\[
\begin{aligned}
\theta(h) = {} & (1 - L)(1 - Q) + ((1 - L)Q + L)(\gamma - c'(h)) \\
& - (\gamma - c'(h))\,f(\varepsilon^c)\int_{-D}^{\infty} (\eta + D)\,g(\eta)\,d\eta \\
& + g(-D)\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\,f(\varepsilon)\,d\varepsilon.
\end{aligned}
\tag{4.1}
\]
In choosing training, the employer equates the marginal cost to the

discounted marginal benefit. The marginal benefit has three compo-
nents. The first component, (1 − L)(1 − Q), is the expected gain from
the worker’s higher productivity at the employer in period 2; this gain
is simply the increase in the worker’s value of marginal product (by def-
inition, equal to the increase in the worker’s human capital) times the
probability that the worker neither quits nor is dismissed. The second
component, ((1 − L)Q + L)(γ − c′(h)), is the worker’s higher productivity
at alternative employers in period 2, net of the increase in the
worker’s cost of moving, times the probability that the worker will
switch jobs. The employer internalizes this gain because it affects the
wage the employer pays in period 1.
The last two terms on the right-hand side of (4.1) represent an effect
that is perhaps more subtle than the first two. As discussed above, it
is not possible to eliminate all inefficient dismissals and quits. Conse-
quently, actions that affect turnover matter to the employer, although
the theoretical analyses of the choice of training that can be found
in the literature generally abstract from this consideration. The third
term reflects the fact that by raising the wage a worker can receive
elsewhere, training increases the probability that the worker quits and
imposes a capital loss on the employer. In interpreting this effect, note
that $(\gamma - c'(h))\,f(\varepsilon^c)$ is the increase in the worker’s quit probability
and $\int_{-D}^{\infty} (\eta + D)\,g(\eta)\,d\eta$ is the expected loss that a quit imposes on
the employer. On the other hand, by raising the worker’s productivity,
training lowers the probability of a dismissal that imposes a loss on the
worker. Again, the employer internalizes this gain because it affects the
wage the employer pays in period 1. The final term in (4.1) captures
this effect.
As discussed above, there are no dismissals when the employer does
not make a wage commitment. Noting that the employer’s second-
period wage offer and hence the worker’s quit probability and reserva-
tion amenity level depend on η, the marginal return to training becomes
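\[
\theta(h) = \int_{-\infty}^{\infty} (1 - Q(\eta))\,g(\eta)\,d\eta
+ (\gamma - c'(h))\int_{-\infty}^{\infty} \bigl(Q(\eta) - f(\varepsilon^c(\eta))(\eta + D)\bigr)\,g(\eta)\,d\eta.
\tag{4.1$'$}
\]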
Note that the employer continues to internalize the worker’s expected
return to training upon separation because the worker is willing to pay
for this in the form of a lower starting wage.
Finally, consider the choice of training when the worker places no
value on the employer’s training commitment in period 1. Formally, this
means that the training the worker expects to receive and the alter-
native wage the worker expects when switching employers are inde-
pendent of the training level that the employer actually chooses. This
corresponds to the noncooperative regime in Acemoglu and Pischke
(1999b). The marginal return to training is now given by
\[
\theta(h) = (1 - L)(1 - Q) - (\gamma - c'(h))\,f(\varepsilon^c)\int_{-D}^{\infty} (\eta + D)\,g(\eta)\,d\eta.
\tag{4.1$''$}
\]
Note that the terms in Eq. (4.1″) also appear in (4.1). However, the
terms ((1 − L)Q + L)(γ − c′(h)) and $g(-D)\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c)\,f(\varepsilon)\,d\varepsilon$ appear
in Eq. (4.1), but not in (4.1″). Recall that ((1 − L)Q + L)(γ − c′(h))
is the net increase in the worker’s value of marginal product elsewhere
due to training weighted by the likelihood that the worker will sepa-
rate. When the employer can commit to training, the worker is willing
to accept a lower wage to get this benefit. But an employer who can-
not commit to training will not internalize this benefit to the worker.
Similarly, the employer will not take into account the benefit to the
worker from a lower dismissal probability. It follows immediately that
an inability to write a contract that specifies the amount of training
causes the employer to provide less training.
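The comparison between (4.1) and (4.1″) can be checked numerically. The sketch below evaluates both marginal returns at a given D, assuming for illustration only that ε and η are independent standard normals, that L = G(−D) and Q = F(ε^c) with $\varepsilon^c = D - c - (1 - \gamma)h$, and that c′ is a constant; the function name and parameter values are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # assumption: eps and eta are independent standard normals

def marginal_returns(D, h, gamma, c=0.0, c_prime=0.0):
    """Return (theta_commit, theta_no_commit) from Eqs. (4.1) and (4.1'')."""
    eps_c = D - c - (1.0 - gamma) * h      # reservation amenity value
    L = STD.cdf(-D)                        # dismissal probability, P(eta < -D)
    Q = STD.cdf(eps_c)                     # quit probability, P(eps < eps_c)
    # Expected employer loss from a quit: integral of (eta + D) g(eta), eta >= -D
    employer_loss = STD.pdf(D) + D * STD.cdf(D)
    # Expected worker loss from a dismissal: integral of (eps - eps_c) f(eps), eps >= eps_c
    worker_loss = STD.pdf(eps_c) - eps_c * (1.0 - STD.cdf(eps_c))

    stay_gain = (1.0 - L) * (1.0 - Q)
    quit_effect = (gamma - c_prime) * STD.pdf(eps_c) * employer_loss
    theta_no_commit = stay_gain - quit_effect                   # Eq. (4.1'')
    separation_gain = ((1.0 - L) * Q + L) * (gamma - c_prime)   # worker's gain elsewhere
    dismissal_effect = STD.pdf(-D) * worker_loss                # lower dismissal risk
    theta_commit = theta_no_commit + separation_gain + dismissal_effect  # Eq. (4.1)
    return theta_commit, theta_no_commit
```

For example, `marginal_returns(D=0.25, h=1.0, gamma=0.5)` returns a larger value for the commitment case than for the no-commitment case, since the two extra terms in (4.1) are positive whenever γ exceeds c′.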
Note that the more general is the training, the greater is the adverse
impact of the employer’s inability to commit to a specified training
level. This has led Barron et al. (1999) to argue that employers have an
incentive to replace general training with specific training. If employers
can offer training that is not valuable elsewhere, then they will be more
willing to provide training that market frictions such as the employer’s
inability to commit to training or a liquidity constraint on the part
of young workers (which we will discuss below) prevent workers from
paying for.
We have focused on training investments by the employer, but some
authors have pointed out that workers also make on-the-job human cap-
ital investments. For example, workers may be able to improve their
productivity by investing time and effort into learning about a firm’s
unique production processes or developing relationships with customers
and co-workers. Non-verifiability by third parties will preclude employ-
ers and workers from contracting on the basis of workers’ human capital
investments. The consequent rent extraction by employers will lessen
workers’ incentive to invest in specific human capital. Arrangements
that limit rent extraction thus help encourage workers to make spe-
cific investments in human capital. Kahn and Huberman (1988) argue
that an “up or out” contract with a wage floor is one way of limiting
employer rent extraction and preserving workers’ incentive to invest in
specific human capital. Prendergast (1993) points out that employers
can limit the rent they extract from a match and thereby encourage
workers to invest in specific training if they can credibly commit to
promoting more productive workers to higher paying jobs. He argues
that this commitment is credible if the workers’ human capital invest-
ment makes them sufficiently more productive in the higher paying
job. Alternatively, an employer can run a tournament and commit to
promote a specified number of its most productive workers, although,
as Lazear (1989) notes, tournaments suffer from the disadvantage that
they discourage workers from cooperating with each other.
4.1 The Effect of Wage Floors

A legal minimum wage is the most prominently cited reason for a wage
floor. Since wage profiles generally slope upward, a minimum wage is
especially likely to be binding early in the employment relationship.
The same holds true for a wage floor caused by the fact that young
workers are liquidity constrained. By limiting the ability of employers
to reduce the starting wage, a wage floor that binds at the start of the
job forces employers to share the cost and return to general training –
in Bishop’s (1991) words, “general training masquerades as specific
training.” The amount of training is affected because the employer’s
inability to reduce the starting wage means that the employer no longer
internalizes the full value to the worker of a higher alternative wage.
To see this within the context of our model, write the Lagrangean
corresponding to the constrained profit maximization problem when
the employer commits to a future wage as
$\Gamma = \pi + \lambda(U - U^A) + \mu_1(w_1 - w_{\min})$, so that the first-order
condition corresponding to choice of $w_1$ is now given by
\[
\partial\Gamma/\partial w_1 = -1 + \lambda + \mu_1 = 0, \tag{4.2}
\]
where the Kuhn–Tucker multiplier $\mu_1$ is positive if the wage floor is
binding. When the employer is able to commit to training and to wages,
the marginal return to training is given by
\[
\begin{aligned}
\theta(h) ={}& (1-L)(1-Q) + (1-\mu_1)((1-L)Q + L)(\gamma - c'(h)) \\
& - (\gamma - c'(h))\, f(\varepsilon^c) \int_{-D}^{\infty} (\eta + D)\, g(\eta)\, d\eta \\
& + (1-\mu_1)\, g(-D) \int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c) f(\varepsilon)\, d\varepsilon.
\end{aligned}
\tag{4.3}
\]
Comparing (4.3) and (4.1), one sees that the first-period wage floor
lowers the value the employer places on the worker’s expected second-
period income at other employers, which will cause the marginal return
to training to fall. In addition to this direct effect of the wage floor, there
is an indirect effect stemming from the induced effect on the second-
period wage and turnover. An employer will generally respond to the
first-period wage floor by lowering the second-period wage. (Recall
that the employer is maximizing expected profit subject to providing
the worker with a specified expected utility over two periods. If the
employer pays a higher first-period wage, then other things the same,
the second-period wage must fall if utility is to remain unchanged.) This
will lead to a higher quit probability, which lowers the return to human
capital investment, and a lower dismissal probability, which raises the
return to human capital.
A first-period wage floor has a stronger adverse effect on train-
ing when the employer cannot commit to a future wage. When the
employer commits to a second-period wage, the first-period wage floor
reduces the value the employer places on the increase in the worker’s
alternative wage that results from higher training, as shown by the
second term in (4.3), but it does not eliminate this value altogether.
Although the employer would prefer to reduce the first-period wage by
the increase in the expected value to the worker of the higher alter-
native wage in the second period, the employer is still able to benefit
by reducing the second-period wage in exchange for an increase in the
worker’s alternative wage resulting from training. This is not possible
when the employer cannot commit to a specified second-period wage.
The marginal return to training is therefore given by
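\[
\theta(h) = \int_{-\infty}^{\infty} (1 - Q(\eta))\, g(\eta)\, d\eta
 - (\gamma - c'(h)) \int_{-\infty}^{\infty} f(\varepsilon^c(\eta))(\eta + D)\, g(\eta)\, d\eta, \tag{4.3$'$}
\]
as the worker’s expected alternative return to training,
$(\gamma - c'(h)) \int_{-\infty}^{\infty} Q(\eta)\, g(\eta)\, d\eta$, drops out of $\theta(h)$ altogether.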
By way of contrast, when the employer commits to the second-
period wage but not to training, the marginal return to training reduces
to (4.1'') whether or not there is a wage floor. Since the worker is
not willing to accept a lower wage in exchange for an employer’s non-
credible promise to offer training, a first-period wage floor does not have
a direct effect on training (there is only an indirect effect stemming from
the induced effect on turnover).
The preceding discussion has abstracted from entry and exit. Start-
ing from a situation where employers are earning zero economic profit,
the imposition of a binding first-period wage floor will result in nega-
tive profit, which will cause employment to fall and workers’ value of
marginal product to increase. At the new equilibrium, employers will
just be willing to employ workers at the higher starting wage $w_{\min}$, but
they will not be able to capture the benefit of training to workers in
the form of a lower starting wage.
Now consider the effect that a second-period wage floor has on
training. If wage profiles are upward sloping, a minimum wage that is
imposed on employers (as opposed to a wage guarantee that employers
themselves choose to offer on their own) and that binds in the second
period will also bind in the first period in the absence of a sub-minimum
wage during the training period. However, as just noted above, there
will generally exist a unique zero profit equilibrium where employers
just break even offering the starting wage $w_{\min}$. Therefore, it generally
only makes sense to discuss a minimum wage that binds in the second
period in the context of a model with limited entry where employers
are able to earn positive profit.
In the preceding section, we noted that a binding second-period
wage floor leads to wage compression. Acemoglu and Pischke (1999b)
argue that this wage compression increases employers’ incentive to offer
training. Their analysis assumes that in the absence of a binding min-
imum wage, the second-period wage is determined exogenously as a
Nash bargaining solution. To see the effect of the second-period wage
floor in our model, consider the case where the employer can com-
mit to neither training nor wages. If the employer is forced to pay a
second-period wage at least as high as $w_{\min}$, he will lay off a worker
whose second-period productivity falls below $w_{\min} - H - h$, so that
the return to training becomes
\[
\theta(h) = \int_{w_{\min}-H-h}^{\infty} (1 - Q(\eta))\, g(\eta)\, d\eta
 - (\gamma - c'(h)) \int_{w_{\min}-H-h}^{\infty} f(\varepsilon^c(\eta))(\eta + D)\, g(\eta)\, d\eta. \tag{4.1$'''$}
\]
Let
\[
\mu_2(\eta) = (1 - Q(\eta)) - f(\varepsilon^c(\eta))(D(\eta) + \eta) \tag{4.4}
\]
denote the Kuhn–Tucker multiplier associated with the second-period
wage floor. (In the absence of the wage floor, $\mu_2(\eta) = 0$ and (4.4)
reduces to the first-order condition (3.12).) Substitute (4.4) into (4.1''')
to obtain
\[
\begin{aligned}
\theta(h) ={}& (1 - \gamma + c'(h)) \int_{w_{\min}-H-h}^{\infty} (1 - Q(\eta))\, g(\eta)\, d\eta \\
& + (\gamma - c'(h)) \int_{w_{\min}-H-h}^{\infty} \mu_2(\eta)(D + \eta)\, g(\eta)\, d\eta.
\end{aligned}
\tag{4.5}
\]
Since µ2 > 0, one sees from the second term in (4.5) that the second-
period wage floor raises the employer’s return to training if one ignores
turnover effects. This is the analog to Acemoglu and Pischke’s wage
compression effect in our model where the employer chooses the second-
period wage to maximize profit. The wage floor forces the employer to
pay a second-period wage that exceeds the profit-maximizing level. An
increase in productivity raises the employer’s desired wage, bringing
it closer to the wage that the employer is constrained to pay. Other
things the same, this raises the employer’s profit. However, as with the
first-period wage floor, there is an ambiguously signed indirect effect
of the wage floor due to turnover, as the wage floor leads to a positive
probability of dismissal but lowers the quit probability.
Similar comments apply to the other contracting assumptions.
For example, when the employer commits to wages and training, the
return to training can be written as
\[
\begin{aligned}
\theta(h) ={}& (1-L)(1-Q) + ((1-L)Q + L)(\gamma - c'(h)) \\
& + (1 - (\gamma - c'(h)))\, g(-D) \int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c) f(\varepsilon)\, d\varepsilon + (\gamma - c'(h))\,\mu_2.
\end{aligned}
\]
A wage floor raises the return to training because training
raises the desired wage closer to the constrained wage. (As above, the
wage floor also indirectly affects training through a higher dismissal
probability and a lower quit probability.)
The empirical findings concerning the effects of the minimum wage
on training are mixed, with different samples and different methods
yielding different results. Neumark and Wascher (2001), using the 1983
and 1991 training supplements to the CPS, find a negative effect of the
minimum wage on training. Acemoglu and Pischke (2003), using the
NLSY79, obtain effects that are not statistically significant, although
Neumark and Wascher’s point estimate falls outside the 95% confidence
interval of their preferred specification. Arulampalam et al. (2004a)
find a marginally statistically significant positive effect of the minimum
wage, using BHPS data from before and after the imposition of a min-
imum wage in the UK after a period where there was none.
4.2 Is There Underinvestment in Training?

The marginal social return to training is given by
\[
\begin{aligned}
\psi(h) ={}& (1-L)(1-Q) + ((1-L)Q + L)(\hat{\gamma} - c'(h)) \\
& - (\gamma - c'(h))\, f(\varepsilon^c) \int_{-D}^{\infty} (\eta + D)\, g(\eta)\, d\eta \\
& + g(-D) \int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c) f(\varepsilon)\, d\varepsilon,
\end{aligned}
\tag{4.6}
\]
where $\hat{\gamma}$ denotes the value of marginal product of a worker who moves
to a new employer in period 2. The socially efficient amount of training,
$h^*$, is the quantity such that the marginal social cost just equals the
discounted marginal social return, or $k'(h^*) = \delta\psi(h^*)$. We now discuss
three reasons why the marginal private return to training may fall short
of the marginal social return, with the result being that there is too
little training.
4.2.1 Workers’ inability to fully capture the return to training when switching employers

Note that except for the second term, Eq. (4.6) is identical to (4.1).
The second term in (4.1) is the probability of a separation in period 2
times the worker’s wage return to training at an alternative employer
minus the increase in search cost as the result of increased training. In
contrast, the second term in (4.6) is the probability of a separation in
period 2 times the increased productivity from training at an alternative
employer minus the increase in search cost as the result of increased
training. Thus training may vary from the socially optimal amount due
to the gap between wages and productivity at alternative employers.
This gap represents a social gain in productivity that is not internalized
by the original employer. The difference between the productivity and
wage returns to training at alternative employers does not benefit the
worker and, therefore, does not provide a gain to the original employer
in the form of a lower starting wage.
Earlier we listed two reasons why training may have a smaller effect
on workers’ wages at alternative employers than on their productivity at
alternative employers. Asymmetric information may prevent employers
from fully valuing a worker’s previous training and monopsony power
may enable employers to extract rents.
4.2.2 Employers’ inability to credibly commit to training

As discussed above, it may be difficult for third parties to observe
the training that a worker receives at a firm, so that an employer’s
training commitment may not be verifiable. If an employer’s training
commitment is not credible, his return to training is given by (4.1'').
Comparing (4.6) and (4.1''), one sees that
\[
\psi(h) - \theta(h) = ((1-L)Q + L)(\gamma - c'(h)) + g(-D) \int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c) f(\varepsilon)\, d\varepsilon > 0.
\]
(To focus attention on
the consequences of the employer’s ability to commit to training, we
are assuming that γ̂ = γ. We do the same when we discuss liquidity
constraints below.) The employer’s return to training is less than the
social return because the employer places no value on the return to
training realized by a worker who separates and no value on the fact
that training makes it less likely that the employer will impose a loss
on a worker through a dismissal – once again, because these factors are
not reflected in the starting wage. The more general is training, the
greater will this inefficiency be.
4.2.3 Liquidity constraints

As noted some time ago by Stern and Ritzen (1991) and Bishop (1987),
liquidity constraints on the part of younger workers reduce their abil-
ity to finance training through a lower starting wage. When an imper-
fect capital market prevents a worker from accepting a wage below
some minimal level, we know from our analysis above that in the
model where the employer commits to a future wage, the employer’s
marginal return to training is given by (4.3). (A less restrictive assump-
tion that produces qualitatively the same results is that the worker
has a higher discount factor than the employer.) Comparing (4.3)
and (4.6), one sees that
\[
\psi(h) - \theta(h) = \mu_1 ((1-L)Q + L)(\gamma - c'(h)) + \mu_1\, g(-D) \int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c) f(\varepsilon)\, d\varepsilon > 0.
\]
Similar to the case when the employer cannot commit to training,
the employer’s return to training
is less than the social return because the employer does not fully
value the return to training to a worker who separates and the capital
loss that a dismissal imposes on the worker.
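The two wedges between the social and private returns can be checked numerically. The following is a minimal sketch, not part of the original analysis: the class and function names are hypothetical, every input (including the quit and dismissal probabilities and the two integrals) is treated as a given number rather than solved for from the model, and the numerical values are invented for illustration. It evaluates (4.1''), (4.3) and (4.6) under the assumption that the value of marginal product elsewhere equals the wage elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReturnInputs:
    """Hypothetical inputs; each value is taken as given, not derived."""
    L: float             # dismissal probability
    Q: float             # quit probability
    net_gamma: float     # gamma - c'(h): net wage gain elsewhere from training
    f_crit: float        # f(eps^c): amenity density at the critical value
    int_eta: float       # integral of (eta + D) g(eta) over eta >= -D
    g_at_minus_D: float  # g(-D)
    int_eps: float       # integral of (eps - eps^c) f(eps) over eps >= eps^c


def separation_prob(x: ReturnInputs) -> float:
    """Probability of a quit or a dismissal: (1 - L) Q + L."""
    return (1 - x.L) * x.Q + x.L


def theta_no_training_commitment(x: ReturnInputs) -> float:
    """Eq. (4.1''): the employer cannot commit to training."""
    return (1 - x.L) * (1 - x.Q) - x.net_gamma * x.f_crit * x.int_eta


def theta_first_period_floor(x: ReturnInputs, mu1: float) -> float:
    """Eq. (4.3); mu1 = 0 corresponds to a non-binding wage floor."""
    return (theta_no_training_commitment(x)
            + (1 - mu1) * separation_prob(x) * x.net_gamma
            + (1 - mu1) * x.g_at_minus_D * x.int_eps)


def psi_social(x: ReturnInputs, net_gamma_hat: float) -> float:
    """Eq. (4.6); net_gamma_hat stands for gamma_hat - c'(h)."""
    return (theta_no_training_commitment(x)
            + separation_prob(x) * net_gamma_hat
            + x.g_at_minus_D * x.int_eps)


# Invented illustrative numbers (assumption, not estimates from the text).
example = ReturnInputs(L=0.1, Q=0.2, net_gamma=0.5, f_crit=0.3,
                       int_eta=0.4, g_at_minus_D=0.2, int_eps=0.5)

# Assume gamma_hat = gamma, as the text does for these two comparisons.
psi = psi_social(example, net_gamma_hat=example.net_gamma)
gap_floor = psi - theta_first_period_floor(example, mu1=0.5)
gap_no_commit = psi - theta_no_training_commitment(example)

print(f"psi - theta under (4.3), mu1 = 0.5: {gap_floor:.3f}")
print(f"psi - theta under (4.1''):          {gap_no_commit:.3f}")
```

Under these assumptions the gap with a binding first-period floor is the multiplier times the gap that arises when the employer cannot commit to training, which is the sense in which the two cases are similar.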
A high observed rate of return on training is a piece of evidence
that has been given in support of training being too low because of a
liquidity constraint or an inability by employers to commit to training.
The argument here is that employers and workers are not able to take
advantage of the high rate of return; in the absence of the market
imperfection, employers would provide more training, thereby driving
the rate of return down. As we discuss below, however, the inference is
problematic when workers and jobs are heterogeneous. In such a world,
workers who receive training will tend to have a higher return than
those who do not receive training, and one therefore cannot infer the
expected rate of return of training to the untrained from the expected
rate of return to the trained.
To summarize our discussion of the choice of training, we have seen
that a key determinant of the amount of training employers offer is
the extent to which employers are able to internalize the productivity
benefits of training to other potential employers by lowering the wage.
Barriers to internalization include employers’ inability to credibly com-
mit to training, workers’ inability to fully capture the return to training
when switching jobs, and wage floors. However, wage floors that bind
after training may encourage employers to train by compressing wages
relative to productivity.
If one grants that market imperfections lead to less than optimal
amounts of training, designing policy remedies presents challenges.
As discussed above, a large proportion of training is informal, pre-
sumably the least-cost method for the training being delivered. How-
ever, it is more difficult to monitor informal training than for-
mal training. Any government training mandates or training sub-
sidies would therefore presumably be directed toward formal train-
ing and, as Barron et al. (1997a) note, may induce substitution
from less expensive to more expensive forms of human capital
accumulation.
Another issue in designing training policies is the tradeoff between
equity and efficiency. Other than liquidity constraints, many of the
arguments for the under-provision of training apply with greater force
to situations where the workers in question are highly skilled. As dis-
cussed above, monopsony considerations are more relevant for highly
skilled workers because the markets for their skills are thinner. It is
also likely more difficult to evaluate the skills of more highly skilled
workers, so asymmetric information considerations are probably more
important for more highly skilled and highly paid workers.
Barron et al. (1997a) argue that in light of the fact that workers
receive a substantial amount of training at the start of the job, the best
way to raise the incomes of the economically disadvantaged may simply
be to pursue policies that provide incentives for them to work and for
employers to hire them. This argument is strengthened by the fact that
quite a bit of training seems to be general, so that the skills workers
learn on one job have value elsewhere. But it is weakened to the extent
that low-skill jobs offer less training. We now turn to discussing how
workers are matched to jobs with varying amounts of training.
5 Matching of High Ability, Low Turnover Workers to High Training Jobs

As noted by Becker (1962) and Rosen (1972), competition among workers
for jobs means that jobs that offer greater training, productivity
growth, and wage growth will earn a lower starting wage: if high train-
ing jobs with higher period 2 wages do not pay lower period 1 wages,
then all workers will want to be in the high training jobs. Of course,
long run competitive equilibrium requires not only that workers be
indifferent about which job they are in, but that employers must also
earn zero profit. As discussed above, the latter condition means that
Eq. (3.9) must hold. Equation (3.9) indicates that other things the
same, workers pay for the cost of training k(h) by receiving a lower
starting wage. But Eq. (3.9) also indicates that an employer’s expected
return to training in period 2 is paid to the worker in the form of a
higher starting wage. If this effect is strong enough, training will actu-
ally be positively correlated with the starting wage. What mechanism
ensures that the zero profit condition Eq. (3.9) holds and that workers
are indifferent between high training and low training jobs?
Suppose that we initially have a condition where workers receive
a higher utility in high training jobs than in low training jobs. Then
workers will move from low training jobs to the high training jobs,
causing the price of the output in high (low) training jobs to fall (rise).
This in turn means that the value of marginal product, H, will fall in
high training jobs relative to low training jobs, which will cause wages
to fall in high training jobs and rise in low training jobs. In equilibrium,
H will therefore be higher in low training jobs than in high training
jobs. The inverse correlation between H and h across jobs will be just
sufficient to ensure that workers are indifferent between high and low
training jobs. Starting wages will be lower in high training jobs and
period 2 wages will be higher.
Barron et al. (1989) and Barron et al. (1999) find at most a weak
negative relationship between training and the starting wage. One con-
tributing factor behind the inability to find a negative relationship
between the starting wage and training is self-selection. As Barron
et al. (1989) note, there is good reason to believe that the cost of
training is lower for higher ability and more educated workers. Consis-
tent with this hypothesis is the finding by a number of authors that
training is positively correlated with education. The positive correlation
between education and on-the-job training was first noted by Mincer
(1962), but as discussed by Lillard and Tan (1992) and Loewenstein
and Spletzer (1999a), exists in all datasets for both formal and infor-
mal training.
Similarly, worker ability is also likely positively correlated with
training. Cognitive skills (as measured by the Armed Forces Quali-
fying Test) are found to be strongly associated with formal training
incidence by Loewenstein and Spletzer (1997) and Veum (1995). Indi-
rect evidence that education and ability are positively correlated with
training is provided by Neal (1998), who shows that turnover rates are
lower for more able, better educated workers and that this results in
part from a sorting of more able, more educated workers into differ-
ent jobs that presumably require greater investment in human capital.
(We discuss the effect of training on mobility below. Neal suggests that
more able workers are choosing jobs with a higher proportion of specific
to general human capital, but all that is really required for his find-
ings is that the proportion of skills that are specific does not decline
too rapidly as total human capital increases.) Failure to fully control
for the positive correlation between starting human capital, H, and
training will obscure the relationship between training and the starting
wage. (In contrast, if one compares workers in the same job, one might
expect a negative correlation between H and h. Presumably, workers
hired for the same job have similar ability. The worker with less pre-
vious training will generally need more training to get up to speed on
the job.)
The cost of training may vary systematically among employers
as well as among workers. For example, Barron et al. (1987, 1989),
Holtman and Idson (1991), Frazis et al. (1995), and Black et al. (1999)
all find that larger firms provide more training than smaller firms, sug-
gesting that their cost of training is lower. This is especially true for
formal training.
Workers in high and low training jobs are likely to differ system-
atically in ways other than just ability. In particular, as Eq. (4.1)
(or (4.1'')) indicates, the return to training is inversely related to a
worker’s quit probability; a given training investment by the employer
will earn a greater return the longer the worker stays with the firm.
Workers with lower quit probabilities will tend to be matched to posi-
tions requiring more training since firms with greater training oppor-
tunities will attempt to hire employees with low propensities to quit
and to have compensation packages and other policies that discourage
turnover.
If there is a sufficiently strong negative correlation between train-
ing and the starting wage, workers may self-select voluntarily into
high and low training jobs since low quit probability workers place
a higher value on the higher wage that an employer will offer in
period 2 than do high quit probability workers. But if the rent shar-
ing term is big enough relative to the k(h) term, then high quit
probability workers will not self select out of the high training jobs.
Instead, as noted by Barron et al. (1993) and Kuhn (1993), employ-
ers will take it upon themselves to screen out high quit probability
workers.
Of course, employers do not directly observe a worker’s quit prob-
ability, but infer it from other observable characteristics. For example,
as noted by Blau and Kahn (2000), male–female differences in the labor
market have been diminishing. However, historically women have had
weaker attachment to the labor market, so that employers would likely
have screened on the basis of gender.
One question that arises is whether a low quit probability worker
who is incorrectly lumped together with high quit probability work-
ers, such as a woman who prefers an uninterrupted full-time career,
can somehow signal this information to an employer. If so, the worker
and the employer could both gain. Salop and Salop (1976) originally
noted that an employer offering a backloaded contract would be able
to attract low quitters and automatically screen out high probabil-
ity quitters. That is, the employer can offer a contract with a high
second-period wage and a low first-period wage, in effect letting the
worker bear the cost and realize the return to training; this contract
has the feature that the employer does not care whether or not high quit
workers are screened out because he no longer bears the cost of their
quitting. As noted by Barron et al. (1993), such a contract would gener-
ally not be self-enforcing. Even if, as in our model above, an employer
can commit to a determinate wage, such a wage profile may induce
too many dismissals. The difficulty stems from the fact that the wage
profile is now being chosen to satisfy two distinct objectives: first, to
induce workers to self-select and, second, to minimize inefficient sepa-
rations, and it simply may not be possible to achieve both objectives
simultaneously.
A number of authors have in fact found that women have tended
to receive less training than men. One early dataset with informa-
tion on training is the Panel Study of Income Dynamics (PSID). The
PSID asks workers the question, “On a job like yours, how long would
it take the average new person to become fully qualified?” Taking
the answer to this question as a measure of training, Corcoran and
Duncan (1979), Duncan and Hoffman (1979), and Gronau (1998) find
that women receive less training than men.¹

¹ Besides the variable described in the text, the PSID contains another measure of training
based on the question “Do you feel you are learning things in your job that could lead to
a better job or promotion.” Gronau labels the former variable RQT and the latter OJT.
He interprets RQT as a measure of training acquired on previous jobs, while other PSID-based
research such as Duncan and Hoffman (1979) interprets it as training acquired (or
in the process of being acquired) on the current job. OJT is almost the same for men and
women; RQT has a greater effect on wages. In our discussion of Gronau’s findings below,
“training” can be viewed as being synonymous with RQT.

Using the National Longitudinal Survey High School Class of 1972,
Altonji and Spletzer (1991) find that training incidence is similar for
men and women, but duration of training is higher for men. Analyzing
the Employment Opportunity Pilot Project (EOPP) dataset, Barron
et al. (1993) also find that the probability of receiving training during
the first three months on the job is similar for men and women, but
conditional on receiving training, duration is again higher for men.
(Analysis of the NLSY data reveals the same pattern.) Similar to the
PSID, EOPP also has information on the number of weeks it takes
a new employee in the most recently filled position to become fully
trained and qualified if he or she has the necessary school-provided
training but no experience in the job. Barron et al. (1993) find that
women are in jobs that require less time to be trained and qualified.
Consistent with women receiving less training, Barron et al. (1993)
also find that women’s relevant experience in previous jobs has a
smaller wage return in their current job. (Further evidence that in the
past women have been sorted into different jobs than men is provided
by Blau and Kahn (1981) and Viscusi (1981), who find that much of the
difference in quit rates between the sexes disappears after one controls
for job characteristics.)
Royalty (1996) looks at training incidence in the National Longitudinal
Survey of Youth. Royalty distinguishes between on-the-job
training and off-the-job training. She finds that on-the-job training
incidence is lower for women than men, while off-the-job training is
about the same for the two groups. Moving to more recent data,
Frazis et al. (2000) find a higher incidence of formal training in
the previous year for men in SEPT95. O’Connell (1999) finds, using
the mid-1990s IALS, that internationally men in general have a higher
duration of training.
Gronau (1998) and Royalty (1996) analyze the degree to which
women’s lower training can be traced to a higher turnover rate. Simple
OLS estimation does not suffice because causation is two-way: the separation
probability affects training and training affects the separation
probability. (We discuss the effect of training on turnover in more detail
below.) Using the PSID from 1976 through 1979, Gronau therefore esti-
mates a multi-equation structural model which accounts for both direc-
tions of causation. One of Gronau’s objectives is to determine whether
the training gap (and hence the wage gap) between men and women
can be explained by differences in turnover. The results are ambigu-
ous. Separations from the labor force have a much greater effect on
training for men than for women. (Gronau does not consider job-to-job
turnover.) Using Oaxaca decompositions, close to 90% of the differ-
ence in training is explained by observable characteristics using men’s
coefficients, but only 5% using women’s coefficients. The effect of train-
ing on separations is restricted to come through an increase in wages.
Gronau finds substantial effects of training on wages and wages on sep-
arations, effects that are not significantly different between men and
women.
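Why the explained share of a gap can depend so strongly on whose coefficients are used can be seen in a small numerical sketch of an Oaxaca decomposition. The numbers below are invented for illustration and are not Gronau's estimates; the function names and the setup with an intercept plus a single characteristic are assumptions made only for this example.

```python
def mean_outcome(xbar, beta):
    """Predicted group mean: sum over characteristics of mean * coefficient."""
    return sum(x * b for x, b in zip(xbar, beta))


def explained_share(xbar_a, xbar_b, beta_a, beta_b, reference="a"):
    """Share of the a-minus-b mean gap attributed to differences in
    characteristics, valuing those differences at the reference group's
    coefficients (the 'explained' part of an Oaxaca decomposition)."""
    gap = mean_outcome(xbar_a, beta_a) - mean_outcome(xbar_b, beta_b)
    beta_ref = beta_a if reference == "a" else beta_b
    explained = sum((xa - xb) * b
                    for xa, xb, b in zip(xbar_a, xbar_b, beta_ref))
    return explained / gap


# Hypothetical group means: [intercept, labor force attachment measure].
xbar_men, xbar_women = [1.0, 0.9], [1.0, 0.5]
# Hypothetical coefficients: training responds strongly to attachment
# for men and only weakly for women.
beta_men, beta_women = [0.5, 2.0], [1.2, 0.2]

share_men_coefs = explained_share(xbar_men, xbar_women,
                                  beta_men, beta_women, reference="a")
share_women_coefs = explained_share(xbar_men, xbar_women,
                                    beta_men, beta_women, reference="b")

print(f"explained share, men's coefficients:   {share_men_coefs:.2f}")
print(f"explained share, women's coefficients: {share_women_coefs:.2f}")
```

In this sketch the same difference in characteristics accounts for most of the gap when valued at the coefficients of the group whose training responds strongly to it, and for very little when valued at the other group's coefficients. This is the pattern one would expect when separations matter much more for men's training than for women's.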
Gronau’s finding that a woman’s separation probability has little
effect on the amount of training she receives suggests that women
with strong labor force attachments are not very successful in signal-
ing this information to employers. Royalty (1996) obtains a similar
result. Royalty (1996) estimates a structural model where job turnover
and training are endogenous. Identification comes from the inclusion of
wages and health status in the turnover but not in the training equa-
tion; marital status and the presence of children are excluded from
the training equation in some cases. Inserting predicted turnover into
a probit for job training does partially explain men’s higher training,
but about 75% of the difference in the annual incidence of company
training between men and women is still unaccounted for.
5.1 Ways that Employers Who Offer Training Can Reduce Turnover

In addition to sharing the returns to training and screening out
workers more likely to quit, there are other ways that employers
may be able to reduce turnover. Pensions are commonly thought to
reduce turnover (for example, Allen et al. (1993)), although some
research (Gustman and Steinmeier 1993) indicates that the apparent
reduction of turnover is due to jobs with pensions having higher
overall compensation. Johnson (1996) and Dorsey and MacPherson
(1997) find that pension coverage is associated with the presence
of on-the-job training. Johnson additionally finds that the pension
replacement rate of income is positively associated with company
training. In the Becker model, one would predict that firm-specific
training would have a stronger association with pension coverage, as
firms would be indifferent to the loss of general capital which they
did not finance. Johnson finds that company training has a stronger
association with pension coverage than does other training, which
he interprets as consistent with the Becker story, while Dorsey and
MacPherson find a mixed pattern of results, with formal company
training more strongly associated with pension coverage than outside
training, but informal training less so.
Looking more broadly at other benefits, Frazis et al. (1995) and
Frazis et al. (2000) find that the number of different fringe benefits
offered by an establishment is associated with several measures of train-
ing. Frazis et al. (1995) find that the probability of an establishment
offering training is particularly associated with the presence of employee
assistance programs and employee wellness programs, which they inter-
pret as consistent with the existence of a long-term, implicit contract
between the worker and the training firm.
Concerns about turnover may also affect the timing of training.
To address timing issues, let us extend the theoretical model presented
above to include multiple periods. Let the subscript i denote the period
in question. Thus, for example, $w_i$ denotes the wage in period $i$, $Q_i$ the
quit probability in period $i$, $\varepsilon_i$ the realized value of amenities
in period $i$, and $\varepsilon_i^c$ the critical value of $\varepsilon_i$ such that the worker
is just indifferent between staying and quitting in period $i$. In addition,
let $h_i$ denote the human capital investment in period $i$ and let
$h_i^T = \sum_{j=1}^{i-1} h_j$ denote total human capital accumulation up until period
$i$. Finally, let $k_i(h_i^T + h_i)$ denote the cost of accumulating $h_i$ additional
units of human capital in period $i$ given that $h_i^T$ units have been accumulated
in previous periods. Then the optimal choice for $h_i$ satisfies
\begin{equation}
k_i'(h_i + h_i^T) \ge \delta\theta_i, \tag{5.1a}
\end{equation}

where
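\begin{equation}
\begin{aligned}
\theta_i ={}& (1 - L_i)(1 - Q_i)(1 + \delta\theta_{i+1}) + ((1 - L_i)Q_i + L_i)(\gamma - c'(h^T_{i+1})) \\
&+ g_i(-D_i)\int_{\varepsilon_i^c}^{\infty} (\varepsilon - \varepsilon_i^c) f_i(\varepsilon)\,d\varepsilon
 + (1 - \gamma + c'(h^T_{i+1})) f_i(\varepsilon_i^c) \\
&\times \int_{-D_i}^{\infty} (\eta + D_i) g_i(\eta)\,d\eta.
\end{aligned} \tag{5.1b}
\end{equation}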
In a stationary world, the functions $f_i$, $g_i$, and $k_i$ are independent of $i$
and the quit and dismissal rates are constant over time. If total training
is fixed at $h^T$, the return to training also does not vary over time and
is given by
\begin{equation}
\begin{aligned}
\theta = \rho\Bigl[& (1 - L)(1 - Q) + ((1 - L)Q + L)(\gamma - c'(h^T)) \\
&+ g(-D)\int_{\varepsilon^c}^{\infty} (\varepsilon - \varepsilon^c) f(\varepsilon)\,d\varepsilon
 + (1 - \gamma + c'(h^T)) f(\varepsilon^c) \\
&\times \int_{-D}^{\infty} (\eta + D) g(\eta)\,d\eta \Bigr],
\end{aligned} \tag{5.1b'}
\end{equation}
where $\rho = 1/(1 - \delta(1 - L)(1 - Q))$. It follows immediately that $h_i = 0$
for all $i > 1$, so that all training occurs in the first period. Delaying
training simply defers the benefit realized from the worker’s higher
productivity with no offsetting gain.
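As a numerical check on the stationary formula, the following sketch collapses every term of (5.1b) other than the survival term into a single per-period constant B and iterates the recursion θ = s(1 + δθ) + B, where s = (1 − L)(1 − Q). The constant B, the function names, and all parameter values are illustrative assumptions, not quantities taken from the text. The limit of the iteration should coincide with ρ(s + B), where ρ = 1/(1 − δs).

```python
# Illustrative check of the stationary return to training.
# All numbers are hypothetical; B stands in for the separation and
# integral terms of (5.1b), which are constant in a stationary world.

def stationary_theta(delta, L, Q, B):
    """Closed form: theta = rho * (s + B), with rho = 1 / (1 - delta * s)."""
    s = (1 - L) * (1 - Q)          # probability the match survives the period
    rho = 1.0 / (1.0 - delta * s)
    return rho * (s + B)


def iterate_theta(delta, L, Q, B, n_iter=2000):
    """Iterate theta <- s * (1 + delta * theta) + B starting from theta = 0."""
    s = (1 - L) * (1 - Q)
    theta = 0.0
    for _ in range(n_iter):
        theta = s * (1 + delta * theta) + B
    return theta


closed = stationary_theta(delta=0.95, L=0.05, Q=0.10, B=0.02)
limit = iterate_theta(delta=0.95, L=0.05, Q=0.10, B=0.02)
print(round(closed, 6), round(limit, 6))
```

Because δs < 1, the iteration is a contraction, which is why the fixed point exists and the factor ρ is well defined.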
However, the stationarity assumptions are not reasonable. There
are two sources of nonstationarity. First, workers’ lives are finite. The
greater is i, the fewer the remaining number of periods that the worker
can remain with the employer. Other things the same, this effect causes
the return to training, $\theta_i$, to decline with $i$.2 But there is an important
effect that works in the opposite direction. A worker who has a high
realized amenity value and high realized match-specific productivity
in period 1 will also tend to have high values of $\varepsilon$ and $\eta$ in future
periods. For example, in the simplest case where $\varepsilon$ and $\eta$ are fixed
over time, it must be the case that $\varepsilon_i \ge \varepsilon_1^c$ and $\eta_i \ge \eta_1$ for all $i > 1$,
which in turn implies that $Q_i < Q_1$ and $L_i < L_1$, so that new workers
are more likely to separate than experienced workers.
2 For example, suppose that the worker works exactly $N$ periods. Then $\theta_N = 0$. Through
backward recursion, one can solve for $\theta_{N-1}, \theta_{N-2}, \ldots, \theta_1$. Other things the same, $\theta$ will
fall over time.
Another possible
source of nonstationarity stems from the fact that workers may be
better able to absorb training after an initial period in which they
become acclimatized to their job and their work environment. In the
context of the current model, this would show up as a downward shift
in the $k_i(\cdot)$ and $k_i'(\cdot)$ schedules over time. If these effects are strong enough,
it may pay to delay training beyond the first period. (Implicit in our
specification of the cost function $k_i(\cdot)$ is the assumption that the cost
of training depends on the amount learned, not the rate at which it
is learned. Another possible source of nonstationarity would be a cost
of training that increases with the rate of learning – that is, a cost
function of the form $k_i(h_i^T + h_i, h_i)$ with $k_{i2} > 0$. However, as we discuss
below, training spells are typically fairly short, so that this alone should
not provide a very strong incentive for employers to spread training
out over a very long time, say over a year or more.) Analyzing the
NLSY, Loewenstein and Spletzer (1997) find that a great deal of formal
training occurs later in the job match. In contrast, Frazis et al. (2000)
find that both hours of informal training and total hours of training
decline sharply with tenure, leading one to speculate that some amount
of informal training at the beginning of the job is usually necessary for
the worker to be productive at all.
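The two opposing forces discussed above, the finite horizon of footnote 2 and separation rates that fall with tenure, can be illustrated with a stylized backward recursion. As a simplifying assumption, the sketch collapses all terms of (5.1b) other than the survival term into a hypothetical constant B, sets θ_N = 0, and lets the survival probability (1 − L_i)(1 − Q_i) either stay flat or rise with tenure. All values are made up for illustration.

```python
def backward_theta(N, delta, survival, B):
    """Return [theta_1, ..., theta_N] with theta_N = 0 and
    theta_i = survival[i] * (1 + delta * theta_{i+1}) + B for i < N.
    survival[i] is a hypothetical (1 - L_i)(1 - Q_i) for period i (1-based)."""
    theta = [0.0] * (N + 2)            # indices 1..N are used; theta[N] stays 0
    for i in range(N - 1, 0, -1):
        theta[i] = survival[i] * (1 + delta * theta[i + 1]) + B
    return theta[1:N + 1]


N = 10
flat = [None] + [0.85] * N                                    # constant survival
rising = [None] + [min(0.95, 0.60 + 0.05 * i) for i in range(1, N + 1)]

theta_flat = backward_theta(N, 0.95, flat, 0.02)
theta_rising = backward_theta(N, 0.95, rising, 0.02)

for i, (a, b) in enumerate(zip(theta_flat, theta_rising), start=1):
    print(i, round(a, 3), round(b, 3))
```

With flat survival the return falls monotonically with tenure, as in footnote 2. With survival rising in tenure, the return in this example is lower in the first period than in several later periods, which is the sense in which delaying training can pay.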
In summary, our model suggests that since the return to training
is inversely related to a worker’s quit probability, employers have an
incentive to hire low quit probability workers for high training posi-
tions. The empirical evidence is consistent with this implication: high-
training firms offer pensions more frequently, and training often takes
place well after the start of a job, both of which suggest a desire to
lower turnover of trained employees. This also offers a partial but not
complete explanation of the higher training intensity of men compared
with women.
While not as clearly implied by theory, data overwhelmingly show
that training is offered more frequently to those with higher observable
skills. To the extent that this finding extends to skills unobservable to
analysts, this complicates estimation of the effects of training, the topic
to which we now turn.
6 Estimating the Effect of Training on Wages, Productivity, and Turnover

6.1 Estimating the Effect of Training on Wages

Estimating the effect of employer-provided training on workers’ wages
is challenging. Among the potential sources of bias that an analyst
confronts are the following:
• Between-person heterogeneity in productivity (correlated with training).
• Heterogeneity in job matches.
• Between-person or between-job heterogeneity in wage
growth.
• Job promotions occurring at the same time as training.
• Incorrect functional form for training.
• Measurement error in training.
• Heterogeneity in the effect of training.
Many of these sources of bias are relevant for other effects that
economists are interested in estimating, such as the effect of school-
ing on wages. As we shall see, these factors are frequently found to
have a large impact on estimates of the training effect. In contrast,
the evidence that factors such as heterogeneity and measurement error
have large effects on estimated returns to schooling is inconsistent,
with some authors (Ashenfelter and Zimmerman (1997), Ashenfelter
and Rouse (1998)) asserting that simple OLS
estimates are approximately correct.
Much if not most of the work estimating the returns to training has
been done with longitudinal data. One great advantage that analysts
seeking to estimate the return to training have over those estimating
the return to schooling is that with longitudinal data, analysts can
observe wages both before and after training, while schooling is typi-
cally completed before workers have extensive work experience. Thus
one can observe wages and wage growth for the same person on the
same job both before and after training.
To illustrate the effect of some of the factors listed above, we use
the data from Frazis and Loewenstein (2005) (hereafter FL) to esti-
mate example regressions. These data are from the 1979 through 2000
waves of the 1979 cohort of the National Longitudinal Survey of Youth
(NLSY79). The basic regression that we run is:
\begin{equation}
w_{it} = \gamma f(T_{it}) + X_{it}\beta + e_{it}, \tag{6.1}
\end{equation}
where $w_{it}$ is the log real hourly wage of person $i$ at time $t$, $T$ is the
accumulated stock of hours of employer-provided training, $f(\cdot)$ is the
functional form for training, $X$ is a vector of covariates including the
constant,1 and $e$ is a residual. Details on the data and construction of
the training variable are in FL.
1 Covariates include years of education, AFQT, hours of non-employer-paid training, age,
experience, experience squared; dummies for black, Hispanic, female, collective bargaining,
ever married, part-time, enrolled in school, calendar year, and two dummies for initial
occupation in the job; and indicators for missing values of part-time, AFQT, and collective
bargaining. Tenure, tenure squared, and tenure cubed are included, along with interactions
of the three tenure terms with all of the above covariates with the exception of hours
of non-employer-paid training, experience squared, and the calendar year dummies; the
interactions of tenure with age and experience use their value at the start of the job. As
an additional control for training, we include a count of spells with missing training duration
(most of these occur before 1988). Small differences in specification of the covariates
account for small differences with the results in FL; the sample is identical.
Column 1 of Table 6.1 shows the results of regressing log wages
on the cube root of training without taking into account any of the
listed forms of unobserved heterogeneity. The cube root specification
was found in FL to be the best fitting and we use it here as our baseline.
The estimated effect of training is very large, with 57 h of training (the
median for those with positive training stocks) yielding an increase in
wages of 7.6% – in the neighborhood of the return to a year of schooling.

Table 6.1 Regression of ln wages on training, various specifications, NLSY79 (robust standard errors in parentheses).
Columns: (1) OLS; (2) fixed person effects; (3) fixed job effects; (4), (5), and (6) job effects + final training interactions.

                                           (1)       (2)       (3)       (4)       (5)       (6)
Cube root of training on                0.0197    0.0132    0.0092    0.0048              0.0043
  the current job                     (0.0013)  (0.0010)  (0.0011)  (0.0012)            (0.0012)
Linear training on the                                                          0.0014
  current job (100 hrs)                                                       (0.0009)
Lagged cube root                                                                          0.0012
  of training                                                                           (0.0011)
Lead cube root                                                                            0.0005
  of training                                                                           (0.0014)
Implied effect of median                 0.076     0.051     0.035     0.018     0.001     0.023
  positive training (57 h)
Implied effect of increasing             0.046     0.031     0.021     0.011     0.002     0.014
  training from 25th percentile
  (25 h) to 75th percentile (144 h)
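The implied effects reported in Table 6.1 follow directly from the cube-root coefficients: the implied change in log wages is the coefficient times the change in the cube root of training hours. The short sketch below reproduces that arithmetic for the cube-root coefficients in columns 1 through 4; the function name and dictionary labels are illustrative.

```python
def implied_effect_cube_root(coef, hours_to, hours_from=0.0):
    """Change in log wages implied by a cube-root training coefficient."""
    return coef * (hours_to ** (1 / 3) - hours_from ** (1 / 3))


# Cube-root coefficients from columns 1-4 of Table 6.1
coefs = {
    "OLS": 0.0197,
    "person fixed effects": 0.0132,
    "job fixed effects": 0.0092,
    "job effects + final-training interactions": 0.0048,
}

for label, b in coefs.items():
    median_effect = implied_effect_cube_root(b, 57)        # 0 -> 57 hours
    iqr_effect = implied_effect_cube_root(b, 144, 25)      # 25 -> 144 hours
    print(f"{label}: {median_effect:.3f} {iqr_effect:.3f}")
```

For the OLS column this gives 0.076 and 0.046, matching the table; the other three columns match as well.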
One obvious concern is that persons who receive formal train-
ing from their employer tend to be more able. As noted above,
employees with formal training are disproportionately more edu-
cated and of higher cognitive ability. While our regression controls
for these factors, it seems likely that there are other characteristics
not available in our data that influence both training and wages.
As mentioned above, one piece of evidence that this might be the
case is the widespread failure to find a negative effect of on-the-
job training at the current employer on the starting wage. This has
led many researchers, including Lynch (1992), Booth (1993), Veum
(1995), Lengermann (1999), and Arulampalam et al. (1997), to use
fixed-effect regressions, which difference out unobserved individual
heterogeneity. Accordingly, in column 2, we show the results from a
regression with person-level fixed effects. The results are dramatic, as
the training coefficient falls 33%.
One might suspect that jobs with higher wages may also have more
training irrespective of the individual. Economists have stressed the
importance of job matches since Jovanovic (1979). This has led some
researchers to add the additional control of job fixed-effects instead of
individual fixed-effects (Parent, 1999, Frazis and Loewenstein, 2005).
The result of using job effects is shown in column 3. The training coeffi-
cient declines another 31% and is less than half the OLS value. However,
the estimated effect is still quite large, with 57 h of training associated
with an increase in wages of 3.5%.
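The mechanism behind the drop from column 1 to columns 2 and 3 can be illustrated with simulated data. The sketch below is not the FL specification: it assumes a hypothetical two-period panel with a single regressor, in which an unobserved ability term raises both wages and the amount of training received. Pooled OLS then overstates the training coefficient, while the within (fixed-effect) estimator recovers it. All parameter values are made up.

```python
import random


def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_people=4000, gamma=0.01, seed=0):
    """Return (pooled OLS slope, within-person slope) on simulated data."""
    rng = random.Random(seed)
    x_all, w_all, x_within, w_within = [], [], [], []
    for _ in range(n_people):
        ability = rng.gauss(0, 1)
        # Assumption: more able workers accumulate more training by period 2.
        hours = max(0.0, 40 + 40 * ability + rng.gauss(0, 20))
        xs = [0.0, hours ** (1 / 3)]               # cube root of training stock
        ws = [gamma * x + 0.2 * ability + rng.gauss(0, 0.05) for x in xs]
        mx, mw = sum(xs) / 2, sum(ws) / 2
        for x, w in zip(xs, ws):
            x_all.append(x)
            w_all.append(w)
            x_within.append(x - mx)                # deviations from person means
            w_within.append(w - mw)
    return slope(x_all, w_all), slope(x_within, w_within)


ols, fe = simulate()
print(round(ols, 4), round(fe, 4))
```

Demeaning by person removes the ability term exactly, so the within estimate is close to the assumed coefficient of 0.01, while the pooled estimate is several times larger.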
Identification in the specification in column 3 is due to variation in
training stocks at a job match. The wages for periods where the worker
has a high stock of training with the current employer – i.e., after
training – are compared with the wages of periods with low training
stocks. As wages will on average increase with tenure and job experi-
ence, the training coefficient reflects the extent to which the increase
in wages after training is higher than would be expected given tenure
and experience.2 However, one might be concerned that just as the
trained have higher levels of productivity than the untrained irrespec-
tive of formal training, they may also have higher productivity and
wage growth irrespective of formal training. In this case use of job
fixed-effects will not consistently estimate the effect of training. One
particular reason for this concern is the fact that most datasets, such
as our NLSY79 data, do not contain measures of informal training.
The datasets that contain such measures show a positive correlation
between formal and informal training (see Loewenstein and Spletzer
(1999a)), so that higher wage growth due to informal training would
be attributed to formal training.
Loewenstein and Spletzer (1997) use data from EOPP and an
informal training variable present in some years of the NLSY79 to
assess the consequences of omitting informal training. They find that
the estimated effect of formal training on wage growth is reduced
12–15% when a measure of informal training is included. The effect of
informal training itself on wages has rarely been studied. Loewenstein
and Spletzer (1997) find that the coefficient on log hours of training in
EOPP is of comparable magnitude for formal and informal training.
Note that this implies smaller average and marginal effects, but a
larger total effect, of informal training due to the typically larger
amount of informal training.
In analyzing the effect of formal training, both FL and Pischke
(2001) attempt to correct for heterogeneity in wage growth. Pischke
(2001) estimates a specification where there is a fixed-effect in log wage
changes. FL estimate a specification where (within a job-fixed-effect
specification) they interact (the cube root of) the final training stock
accumulated on the current job with a cubic in tenure. (The rationale
for FL’s specification rather than using growth fixed-effects is the belief
that the effect of heterogeneity in growth is likely to vary with tenure,
as the size of wage increases declines with tenure.) The training effect in
both of these specifications is identified primarily by changes in wages
right after the period of training.
2 As noted in footnote 1, our specification here includes interactions of other covariates with
a cubic in tenure, so the effect of tenure is adjusted for the covariates.
The training coefficient for FL’s specification with final-training
interactions is shown in column 4 of Table 6.1. This specification results
in a 47% reduction in the coefficient from column 3, and the coefficient
is now only 24% the size of that reported in column 1. However, it is
still of substantial size, with 57 h of training increasing wages by 1.8%.
While identification of the training effect from wage changes around
the period of training may appear to be unimpeachable, one objection
that has been raised is that this wage increase may simply reflect job
promotions – workers are trained after they are promoted to new job
duties with higher pay. FL, using data on promotions in some years of
the NLSY79, find that including a promotions variable reduces the esti-
mated effect of training by around 40%. (In contrast to the NLSY data,
FL find that the EOPP data provide no indication that the estimated
return to training is partly due to the effect of promotions.) However,
FL also note that this almost certainly understates the return to train-
ing, as promotions may be an outcome of training as well as a cause.
There clearly is an identification problem – giving an able worker more
responsibilities may increase productivity in the absence of training,
but a worker’s improved ability to carry out more advanced job duties
should properly be considered to be part of the return to the training
investment.
Another issue is the choice of functional form, which has a large
impact on estimated effects of training. The early research on the effects
of training (for example, Duncan and Hoffman (1979), Lynch (1992),
and Veum (1995)) typically used wage levels as in column 1 in addition
to or instead of using fixed-effect specifications. Yet the returns found
were usually not as high as those found in column 4, let alone column 1.
For example, Lynch (1992) in her study of non-college-graduate youth
finds that the coefficient on weeks of on-the-job training from the cur-
rent employer is 0.002 for white males and is negative for white females;
neither estimate is statistically significant. The explanation lies in the
choice of functional form. Early studies using the NLSY79 and sim-
ilar datasets typically used linear hours of training, rather than the
substantially better-fitting cube root. This has surprisingly severe con-
sequences. Column 5 shows the effect of using linear hours rather than
the cube-root of hours in the specification in column 4 (the cube-root
of hours is still used in the final-training interactions with tenure). The
effect of training is estimated to be an order of magnitude lower than
in column 4, and is no longer statistically significant.
Why does the linear functional form so drastically underestimate
the effect of training in the middle of the positive training distribu-
tion? In the NLSY79 (and also in EOPP), the distribution of training
for those with some training is quite skewed to the right; it is approx-
imately log-normal. In our fixed-effect regressions, observations with
large deviations of training from average training will have a dispro-
portionately large effect on the training coefficient.3 Specifications such
as the linear should tend to predict better in the right tail of the dis-
tribution and worse in the middle of the training distribution than
specifications like the cube-root that compress the training distribu-
tion. The linear function’s tendency to fit the right tail will lead to
an especially poor fit in the middle of the training distribution when
linearity is a misspecification.
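A small simulation makes the point concrete. It is a deliberately simplified sketch: only workers with positive training, no fixed effects or covariates, a hypothetical cube-root coefficient of 0.02, and log-normal hours with median 57 and quartiles near 25 and 144 (which corresponds to a log standard deviation of roughly 1.3). When the true relationship is a cube root, the linear fit is dominated by the right tail and implies a far smaller effect at the median than the true one.

```python
import math
import random


def ols(x, y):
    """Intercept and slope from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b


rng = random.Random(1)
gamma = 0.02                                    # hypothetical cube-root coefficient
hours = [rng.lognormvariate(math.log(57), 1.3) for _ in range(5000)]
wages = [gamma * h ** (1 / 3) + rng.gauss(0, 0.2) for h in hours]

_, b_lin = ols(hours, wages)                              # misspecified: linear hours
_, b_cube = ols([h ** (1 / 3) for h in hours], wages)     # correctly specified

true_effect = gamma * 57 ** (1 / 3)             # true effect of 57 hours
linear_effect = b_lin * 57                      # linear prediction for 0 -> 57 hours
cube_effect = b_cube * 57 ** (1 / 3)
print(round(true_effect, 3), round(linear_effect, 3), round(cube_effect, 3))
```

The cube-root fit recovers an effect close to the true one at the median, while the linear fit implies a small fraction of it.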
Measurement error in training is an additional source of bias. As
noted earlier, evidence from Barron et al. (1997b) indicates that there
is substantial measurement error in survey reports of training. It is
plausible that measurement error in informal training is “classical” in
the sense that it is uncorrelated with the true value of training or
other variables. Consistent with this supposition, Barron et al. (1997b)
regress the difference in employers’ and workers’ reported log hours
of training (a measure which is predominantly informal training) on a
group of covariates and find no significant results. Classical measure-
ment error will unambiguously bias downward the estimated coefficient
in a regression on training. Moreover, if a second measure or suitable
instrument is available, instrumental variables (IV) estimation yields a
consistent estimate.
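Both claims can be checked in a simulated example with two independent, equally noisy reports of the same true training measure (hypothetically, a worker report and an employer report). The variable names, the unit error variances, and the coefficient of 0.05 are assumptions made for illustration. OLS on one report is attenuated toward zero, while using the second report as an instrument recovers the assumed coefficient.

```python
import random


def cov(x, y):
    """Sample covariance (dividing by n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


rng = random.Random(2)
beta, n = 0.05, 20000
true_x = [rng.gauss(0, 1) for _ in range(n)]
report_1 = [x + rng.gauss(0, 1) for x in true_x]    # e.g. worker report
report_2 = [x + rng.gauss(0, 1) for x in true_x]    # e.g. employer report
y = [beta * x + rng.gauss(0, 0.1) for x in true_x]

ols_est = cov(report_1, y) / cov(report_1, report_1)   # attenuated toward zero
iv_est = cov(report_2, y) / cov(report_2, report_1)    # second report as instrument
print(round(ols_est, 4), round(iv_est, 4))
```

With error variance equal to the variance of the true measure, the OLS estimate converges to half the true coefficient; the IV estimate is consistent because the two reporting errors are independent of each other and of the true value.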
3 Specifically, the coefficient on training is given by
$\hat\beta_2 = \frac{\sum (f(T) - \hat f(T))(\ln W - \hat\omega)}{\sum (f(T) - \hat f(T))^2}$, where $\hat f(T)$
and $\hat\omega$ denote the predicted values of $f(T)$ and $\ln W$ from regressions of $f(T)$ and $\ln W$
on $X$ and the fixed effects. Note that $\hat\beta_2$ is a weighted sum of the $(\ln W - \hat\omega)$ observations,
with the absolute value of the weights proportional to the absolute value of $f(T) - \hat f(T)$.
In datasets where it has been measured, informal training has close
to universal incidence (at least at the start of a job). In the case of
formal training, the relatively low incidence (0.20 in the data used in
the table, for example) and consequent large number of observations
where the reported duration of training is zero complicates the analysis
of measurement error. Reporting no training when in reality there was
training implies a negative measurement error, while reporting training
when there was no training implies positive measurement error – gener-
ating a negative correlation between the true value of training and the
measurement error. This non-classical measurement error implies that
IV is not a consistent estimator (Frazis and Loewenstein 2003a, Kane
et al. 1999, Black et al. 2000). Moreover, even in a fairly simple model
of measurement error where the error in reporting duration conditional
on reporting incidence is uncorrelated with the true value of training,
the direction of the bias of IV is ambiguous (Frazis et al. 1996).
Frazis and Loewenstein (2003a) simplify the problem by estimating
the effect of a binary indicator of formal training. In this case, IV is
unambiguously upwardly biased, but Frazis and Loewenstein develop
a consistent generalized-method-of-moments (GMM) estimator. (Their
estimator is similar to ones developed by Black et al. (2000) and Kane
et al. (1999) where there is a second erroneous measure available.)
Applied to NLSY79 data, this estimator yields a value roughly two
times the OLS (with job fixed-effects) estimate.4 To date there have
been no satisfactory attempts to estimate the returns to training tak-
ing into account measurement error in the mixed discrete–continuous
training variable. One possible approach would be to extend the Frazis
and Loewenstein (2003a) GMM estimator along the lines of Kane
et al. (1999), whose estimator allows multiple discrete levels which could
be used to approximate the continuous training variable.
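The upward bias of IV with a misclassified binary indicator can be seen in a simulated example. The setup is an assumption made for illustration and is not the instrument set used by Frazis and Loewenstein (2003a): true training incidence is 0.20, each of two independent reports flips the true indicator with probability 0.10, and the second report serves as the instrument for the first.

```python
import random


def cov(x, y):
    """Sample covariance (dividing by n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


rng = random.Random(3)
beta, n = 0.10, 50000
p_train, p_flip = 0.20, 0.10        # hypothetical incidence and misreport rate


def noisy(flag):
    """Flip a 0/1 indicator with probability p_flip."""
    return 1 - flag if rng.random() < p_flip else flag


trained = [1 if rng.random() < p_train else 0 for _ in range(n)]
report_1 = [noisy(t) for t in trained]
report_2 = [noisy(t) for t in trained]
y = [beta * t + rng.gauss(0, 0.2) for t in trained]

ols_est = cov(report_1, y) / cov(report_1, report_1)   # biased toward zero
iv_est = cov(report_2, y) / cov(report_2, report_1)    # biased away from zero
print(round(ols_est, 3), round(iv_est, 3))
```

Because a misreported binary variable has an error that is negatively correlated with its true value, the IV estimate converges to β/(1 − 2 × 0.10) = 1.25β in this design rather than to β, while OLS remains attenuated.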
4 The dependent variable in Frazis and Loewenstein’s (2003a) analysis is wage growth.
The instruments are job reallocation rates by 2-digit industry (as a proxy for exogenous
turnover) and education controlling for AFQT and 1-digit occupation. Barron
et al. (1997b), using a second measure of training as an instrument, also find that the
effect of log total hours of training on productivity is more than doubled after taking
measurement error into account.
One final complication in considering measurement error in training
data is the possibility of overestimating the returns to short spells.
To simplify the discussion, assume there is no error in reporting the
incidence of training (such error unambiguously reduces estimated
returns to training under standard assumptions) and that measure-
ment error in the duration measure conditional on incidence is inde-
pendent of the true measure. Assume also that this measurement error
is sufficiently small that we can neglect the possibility of negative
reported values. Let $T^*$ denote true training and $f^*$ be the function
relating true training and wages. Under these assumptions, the true
length of spells observed to be sufficiently shorter than average will
be underestimated: $E(T^* \mid T) > T$. While the functional form $f^*$ may
counteract this effect, for sufficiently small values of $T$, it will be true
that $E(f^*(T^*) \mid T) > f^*(T)$. The return to training at $T$ will be estimated
to be $(E(f^*(T^*) \mid T) - f^*(0))/T$, greater than the true value
$(f^*(T) - f^*(0))/T$ (noting that $T^* = 0$ when $T = 0$ from our assumption
of no error in incidence reporting). For further details, see Frazis
and Loewenstein (2003b). Both Frazis and Loewenstein (2003b) and
Pischke (2001) considered this source of bias, but both concluded that
it was likely not very important at least at median values of positive
training.
Thus far we have been considering estimation of (6.1), which implic-
itly assumes a single function f describing returns to training. However,
just as more complex jobs require that the worker undertake more for-
mal education as a pre-requisite for being hired, it seems likely that
training has a higher return in more complex jobs than in simpler jobs.
If heterogeneity in returns were limited to interactions with observ-
able covariates, they could be easily handled. However, unobservable
heterogeneity in the effect of training raises issues of interpretation.
Recent research on variable treatment effects and program evalua-
tion has clarified the issues involved (for example, Angrist et al. 1996,
Heckman and Robb 1985, Heckman 1997, and Heckman et al. 1999).
Consider the following simplified wage model that abstracts from
covariates other than training:
\begin{equation}
\ln W_{it} = \alpha_i + \beta_i \varphi(T_{it}) + e_{it}, \tag{6.2}
\end{equation}
where $E(e_{it}) = E(\alpha_i) = 0$, $E(\beta_i) = \beta$, and $e_{it}$ is independent of $\alpha$ and $\beta$.
Both $\alpha$ and $\beta$ are potentially correlated with $T$. Fixed-effect estimation
eliminates any potential bias stemming from a positive correlation
between unmeasured ability $\alpha$ and training. However, fixed-effect estimates
of the return to training do not purge the effect of a correlation
between $\beta$ and $T$.
To analyze the bias in fixed-effect estimation, consider a situation
where we have two periods of data, with training always equal to 0
when t = 1 and varying across the sample when t = 2. The expected
value of the return to training estimated by fixed-effects (which, in this
case, is equivalent to first differences) is given by:
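\begin{equation}
\begin{aligned}
f(T_0) &= E(\ln W_{i2} \mid T_{i2} = T_0) - E(\ln W_{i1} \mid T_{i2} = T_0) \\
&= E(\alpha_i \mid T_{i2} = T_0) - E(\alpha_i \mid T_{i2} = T_0) + E(\beta_i \varphi(T_0) \mid T_{i2} = T_0) \\
&= E(\beta_i \mid T_{i2} = T_0)\varphi(T_0).
\end{aligned} \tag{6.3}
\end{equation}
Borrowing a concept from the program evaluation literature, one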
can distinguish between the return to training for the average member
of the population and the return to training for the trained (see the
above references). Fixed-effect regressions do not estimate the return
to training for the average member of the population, $\beta\varphi(T_0)$, but, as is
clear from (6.3), consistently estimate the effect of a given amount of
training for those with that amount of training.5 In particular, abstracting
from measurement error, the high estimated returns to short spells
of training in the table are not overestimates of the return to training
for those with such spells. However, this does not mean that one would
expect individuals who do not receive formal training to have realized
such returns had they been trained. Indeed, any reasonable model
would predict that $E(\beta_i \mid T = T_0) > E(\beta_i \mid T = 0)$: individuals with training
should tend to have a higher return than those with no training.
Similar comments apply to estimates of the marginal return to training,
which will be estimated as
\begin{equation}
f'(T_0) = E(\beta_i \mid T = T_0)\varphi'(T_0) + \frac{\partial E(\beta_i \mid T = T_0)}{\partial T}\varphi(T_0), \tag{6.4}
\end{equation}
and which will exceed $E(\beta_i \mid T = T_0)\varphi'(T_0)$ if $\partial E(\beta_i \mid T = T_0)/\partial T > 0$: estimation
of $\varphi'$ is confounded by a composition effect stemming from the fact that
individuals with more training can be expected to have a higher return.
5 The situation is more complicated in the multiperiod NLSY dataset, where the estimated
return $g(T_0)$ will partly reflect average returns and partly reflect marginal returns. FL
found that omitting observations with (within-job) accumulated training greater than
zero but less than final observed training did not appreciably change their results.
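The distinction between the return for the trained and the return for the average worker can be illustrated with simulated data patterned on (6.2) and (6.3). The design is hypothetical: two periods, training of either 0 or 57 hours in period 2, a cube-root form for ϕ, and a probability of being trained that rises with the individual return β_i. The first-difference estimate for the trained then recovers E(β_i | T = T_0), which exceeds the population mean of β_i.

```python
import random


def phi(hours):
    """Assumed functional form for training (cube root)."""
    return hours ** (1 / 3)


rng = random.Random(4)
T0, n = 57.0, 20000
betas, trained_rows = [], []

for _ in range(n):
    alpha = rng.gauss(0, 0.3)                      # person effect, differenced out
    beta_i = max(0.0, rng.gauss(0.01, 0.005))      # individual return to training
    # Selection on gains: higher-return workers are more likely to be trained.
    is_trained = rng.random() < min(1.0, 40 * beta_i)
    w1 = alpha + rng.gauss(0, 0.05)                # period 1: no training
    w2 = alpha + (beta_i * phi(T0) if is_trained else 0.0) + rng.gauss(0, 0.05)
    betas.append(beta_i)
    if is_trained:
        trained_rows.append((w2 - w1, beta_i))

# First-difference estimate of the return per unit of phi(T0), trained only
fd_estimate = sum(d for d, _ in trained_rows) / len(trained_rows) / phi(T0)
avg_beta_trained = sum(b for _, b in trained_rows) / len(trained_rows)
avg_beta_all = sum(betas) / n
print(round(fd_estimate, 4), round(avg_beta_trained, 4), round(avg_beta_all, 4))
```

Differencing removes the person effect but not the selection on β_i, so the estimate tracks the average return among the trained rather than the population average.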
Thus far we have used the term “returns to training” rather loosely,
and have avoided discussing the economic significance of our estimates.
We now consider how and to what extent regressions of wages on train-
ing can be used to estimate the rate of return to the training invest-
ment. It is well known that the rate of return to formal schooling can,
under certain conditions, be determined by the coefficient on years of
schooling in a regression on log wages (Mincer 1970). By comparison,
the estimation of the rate of return to on-the-job training is compli-
cated by a variety of factors: the worker is paid during the period of
training, wages are not adjusted continuously, and the returns may be
split between the worker and the employer.6
To help fix ideas, consider a case where the worker’s wages are
adjusted at the beginning of every year and equal average productivity
during the year. If the worker is trained in the middle of the year, wages
for the year will reflect productivity before, during, and after training.
Productivity after training is higher than pre-training productivity, but
productivity during training is presumably lower or zero. Wages for the
year after training will entirely reflect post-training productivity. FL
show that under these circumstances the return to training (neglecting
direct costs) can be estimated from annually collected data by summing
the coefficients on the stocks of current training, lagged training, and
lead training.
Column 6 of Table 6.1 shows the results of adding the lead and
lagged terms. The sum of the coefficients is 0.0059, which corresponds
to an increase of 0.023 in log wages with 57 h of training. Setting a
work-year equal to 2000 h, we compute the annualized rate of return
as r = (2000)(0.023)/57, approximately 80%. This is a substantially
higher rate of return than that found for schooling, which typically is
in the neighborhood of 10% or less.
6 If wages are adjusted continuously, we would expect wages to decrease during training
spells that have not been completed. This is not borne out empirically – for example, Lynch
(1992) and Loewenstein and Spletzer (1999b) do not find significant effects of uncompleted
training spells. Given the short length of most training spells, it would require very rapid
adjustment of wages to generate any decline in wages during periods of training.
This rate of return calculation neglects the direct costs of training
in the form of salaries for trainers and other expenses. SEPT95 esti-
mated that, in its sampling frame of firms with 50 or more employees,
wages and salaries of trainers, payments to outside trainers, tuition
reimbursements, and contributions to training funds totaled $300 per
employee in 1994. The survey also estimated that wages and salaries
paid to employees while in formal training totaled $224 over the period
May–October 1995 (Frazis et al., 1998). Pro-rating the wage and salary
cost of employees to a full year, the wages paid to workers receiving
training appear to account for only about 60% of the total costs of
training; other direct costs account for the remaining 40%.7 Apply-
ing this to column 6 gives a rate of return of 48%. This calculation
does not take into account the effect of promotions. However, FL con-
cluded that the most plausible estimate of the returns to formal training
was in the range of 40–50% after making a reasonable adjustment for
promotions.
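The rate-of-return arithmetic in the last two paragraphs can be laid out step by step. The sketch below uses only figures quoted in the text; the one assumption is that the May–October wage cost of $224 is pro-rated to a full year by doubling it. Function and variable names are illustrative.

```python
HOURS_PER_YEAR = 2000          # work-year used in the text


def annualized_return(log_wage_gain, training_hours, wage_cost_share=1.0):
    """Annual wage gain per unit of training cost.

    The wage cost of training is training_hours / HOURS_PER_YEAR of a year's
    pay; multiplying by wage_cost_share rescales the return to total cost when
    wages are only that share of the cost of training.
    """
    gross = HOURS_PER_YEAR * log_wage_gain / training_hours
    return gross * wage_cost_share


direct_costs = 300.0                   # per employee, 1994
wage_costs = 224.0 * 2                 # May-October figure doubled (assumption)
wage_share = wage_costs / (wage_costs + direct_costs)

gross = annualized_return(0.023, 57)
net = annualized_return(0.023, 57, wage_share)
print(round(gross, 2), round(wage_share, 2), round(net, 2))   # 0.81 0.6 0.48
```

The gross figure corresponds to the roughly 80% return before direct costs, the wage share to the 60% figure, and the net figure to the 48% return quoted for column 6.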
High estimated returns, especially in combination with the fact that
most employees receive no formal training – for example, only 31% had
received formal training on their current job as of 1994 in our NLSY79
sample – may seem to imply a market failure in training leading to
underinvestment, as claimed by Ahlstrand et al. (2003). We have argued
that our estimates reflect the average return to training of the trained.
This interpretation implies that high estimated returns need not reflect
market failure. Untrained workers may realize much lower returns than
those obtained by workers who actually receive training. Without the
appropriate structural restrictions, it is not possible to estimate the
expected return to training of workers who do not receive training.8
7 Using firm data from Portugal, Almeida and Carneiro (2006) find that foregone production
accounts for less than 25% of training costs.
8 A reviewer argues that the estimated difference between average and marginal returns
is implausibly large if there is no market failure. In response, we note that there may
be extremely high returns to some short spells of training. For example, if an untrained
worker cannot operate a machine essential to production, the productivity of an untrained
worker assigned to the machine is zero and the return to his training is high.
Before discussing other approaches to estimating the returns to
training, we note that rates of return are likely to be estimated quite
imprecisely in typical datasets. The standard error for the effect of 57 h
of training is 0.7 percentage points in our sample with over 75,000 observations,
which is the largest dataset with detailed training information that we
are familiar with. This implies a standard error for the rate of return
gross of direct costs of 25 percentage points. Netting out direct costs (without
taking into account the variance of estimated costs), the estimate of
48% for the rate of return at 57 h mentioned above has a standard
error of 16 percentage points. The large standard error for estimated rates of
return is a result of the short duration of most training spells; small
differences in the effect of such spells imply large differences in rates of
return.
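The scaling from a small wage-effect standard error to a large rate-of-return standard error can be illustrated with a rough sketch. It assumes, as a simplification that is not taken from the text, that the gross rate of return is the proportional wage effect divided by the share of a 2,000-hour work year spent in training. The function name and the 2,000-hour figure are illustrative assumptions.

```python
# Hypothetical sketch: scale a wage-effect standard error into a
# rate-of-return standard error (gross of direct costs).
# Assumption (not from the text): the only cost is time spent in training,
# measured as a share of a 2,000-hour work year.

HOURS_PER_YEAR = 2000.0  # assumed full-time work year


def gross_return_se(effect_se: float, training_hours: float) -> float:
    """Divide the wage-effect SE by the share of a year spent in training."""
    share_of_year = training_hours / HOURS_PER_YEAR
    return effect_se / share_of_year


# 0.7 percentage points on the wage effect, 57 hours of training
se = gross_return_se(effect_se=0.007, training_hours=57)
print(f"SE of gross rate of return: {se:.3f}")
```

Under these assumptions, a standard error of 0.7 percentage points on the wage effect translates into roughly 25 percentage points on the gross rate of return, because 57 hours is less than 3% of the assumed work year.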
6.1.1 Approaches other than fixed-effect estimation

The predominant approach in the literature on the wage returns to
training has been to use some form of fixed effects to correct for unob-
served heterogeneity correlated with training. (We include the use of
deviations from job means as instruments, as in Parent (1999), in
this group; in our NLSY79 data, this yields results that are quite
similar to those obtained in a simple specification with job-match
fixed-effects.) Approaches that rely on instrumental variables or the
comparison of matched treatment and control groups have not been
as popular. Two papers by Leuven and Oosterbeek (2003, 2004) illus-
trate recent attempts to use such techniques to correct for heterogene-
ity. Leuven and Oosterbeek argue that while fixed-effects will correct
bias associated with permanent (or job-level) heterogeneity correlated
with training, selection into training may be associated with temporary
movements in wages or wage growth. (Promotions, discussed above, are
one example of such a temporary shock to wages associated with train-
ing.) They take advantage of unique features of their data to generate
arguably valid instruments, usually a difficult task when it comes to
training as most variables that affect training would also be expected
to affect wages.
Leuven and Oosterbeek (2003) use a dataset that contains information
on the interest of the respondent in taking a training course.
Respondents who did not take a course because of a random event were
considered the control group. The idea is that the unobservable charac-
teristics of these respondents were likely similar to those of individuals
who actually took training. Leuven and Oosterbeek (2003) show that
if the control group is in fact similar to the trained, the difference
between the trained and the control group will estimate the average
effect of training on the trained, as in the fixed-effect regressions dis-
cussed above.
Leuven and Oosterbeek (2004) exploit the existence of a tax deduc-
tion available to Dutch employers for training employees aged 40 and
older. This allows a regression-discontinuity design where an indicator
for older than 40 is used as an instrument for training in a regression
that includes age and age-squared.
Recent research in the interpretation of IV regressions (Angrist
et al. (1996), for example) points out that when the effect of the endoge-
nous variable differs across the population, the IV coefficient on the
variable is a local average treatment effect – the effect of the treatment
on the population that changes treatment status due to the instrument.
In the context of the present example, (abstracting from covariates) the
limit of the IV estimator reduces to:
\[
\frac{E(\ln w \mid \mathrm{Age} > 40) - E(\ln w \mid \mathrm{Age} < 40)}{E(T \mid \mathrm{Age} > 40) - E(T \mid \mathrm{Age} < 40)}
\]
which measures the change in wages from the additional training
above age 40, presumably caused by the tax deduction. IV estima-
tion in this case allows one to estimate something close to a marginal
return. Leuven and Oosterbeek use an indicator as their training
variable, so in this case the coefficient on training measures the effect
on wages of the increased training (of whatever duration) due to the
tax deduction.
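The ratio of differences in means can be computed directly, as in the minimal sketch below. The records are made-up numbers chosen only to make the arithmetic transparent, and the function name and data layout are hypothetical rather than drawn from Leuven and Oosterbeek (2004).

```python
# Hypothetical sketch of the Wald (IV) ratio with a binary instrument.
# The records below are made-up numbers for illustration only.
from statistics import mean


def wald_estimate(records, cutoff=40):
    """records: iterable of (age, trained, log_wage) tuples.

    Returns the difference in mean log wages across the age cutoff
    divided by the difference in mean training incidence. Observations
    exactly at the cutoff are excluded, matching the strict inequalities.
    """
    records = list(records)
    above = [(t, w) for age, t, w in records if age > cutoff]
    below = [(t, w) for age, t, w in records if age < cutoff]
    d_wage = mean(w for _, w in above) - mean(w for _, w in below)
    d_train = mean(t for t, _ in above) - mean(t for t, _ in below)
    return d_wage / d_train


sample = [
    (38, 0, 2.40), (39, 0, 2.50), (39, 1, 2.60), (38, 0, 2.50),
    (41, 1, 2.70), (42, 0, 2.50), (41, 1, 2.70), (42, 0, 2.50),
]
estimate = wald_estimate(sample)
print(f"Wald estimate: {estimate:.2f}")
```

In this toy sample, mean log wages differ by 0.1 across the cutoff and training incidence differs by 0.25, so the ratio is 0.4. The estimate is informative only about workers whose training status changes because of the instrument.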
Both of these approaches are very demanding of the data – the first
because of the stringent definition of the control group (the final size of
this group was 77 respondents), the second because only respondents
close to age 40 contribute to the estimate. As a result, the estimates
in both papers are very imprecise, with both zero and values compa-
rable with Frazis and Loewenstein (2005) contained in the confidence
intervals.
6.2 Estimating the Effect of Training on Productivity

The foregoing has implicitly assumed that the worker bears all the costs
(in terms of foregone production) and obtains all the returns to training.
If the training is to some extent firm-specific, or if there are frictions
in the labor market that cause the firm to share in the cost of general
training as discussed above, then the wage effect will underestimate the
return to training in terms of productivity. The observed wage effect is
thus a lower bound to the rate of return in terms of productivity.
The effect of training on productivity has been studied using data
on firm- or industry-level outcomes as well as subjective productivity
measures from employers. (A third strand of research, which we do not
discuss due to its lack of generalizability, is analysis of individual firms
and specific training programs; see Bartel (2000) for examples.) We
turn first to firm- and industry-level studies.
Relating measures of value-added by firm or industry to training
appears at first glance to be an ideal way of measuring the productivity
effects of training. However, we are aware of no data set that would
allow the satisfactory construction of measures of stocks of training at
these levels of aggregation. Most firm- or industry-level datasets do not
contain measures of training duration, so studies using these datasets
(for example, Black and Lynch 1996, 2001, Dearden et al., 2005, Bartel
1994 and Conti 2005) use the proportion or number of workers trained
as their measure of training, making it difficult to gauge the economic
significance of the coefficient on training. Studies that examine the
effect of changes in proportion trained (Dearden et al. 2005, Conti
2005) or the introduction of new training programs (Bartel 1994) on
changes in productivity do find substantial positive effects.
Datasets that do contain measures of duration for firms do not con-
tain comprehensive training histories for the employees of the firm, so
auxiliary assumptions must be made to generate measures of employ-
ees’ stocks of training (or changes in the stocks of training) from the
data. Barrett and O’Connell (2001) assume that the change in human
capital between two periods is equal to the amount of training, implic-
itly ignoring turnover and depreciation. Almeida and Carneiro (2006),
who have access to turnover data and allow the average level of human
capital in a firm to decline as the result of turnover and depreciation,
explicitly assume that exiting employees have on average the same level
of human capital as employees who stay. The latter paper is the only
one we are aware of that calculates a rate of return to training from
multi-firm or industry level data. After estimating parameters of pro-
duction and training cost functions, the authors find that the average
rate of return is 24% for firms providing training; on the basis of their
observable characteristics, the average rate of return for firms not pro-
viding training is −7%.
Some researchers have used subjective measures of productivity to
estimate the effect of training. The EOPP survey contains the question
“Please rate your employee on a productivity scale of
zero to 100, where 100 equals the maximum productivity
rating any of your employees [in this] position can
attain and zero is absolutely no productivity”
for various points in the tenure of the last employee hired, the typical
worker in that employee’s position, and in some cases a second employee
in that position. This allows examination of the effect of training on
productivity growth for an individual employee as well as comparisons
between the training and productivity of different workers in the same
job. A similar question is asked in a 1992 survey sponsored by the Small
Business Administration (SBA).
To what extent these ratings correspond to true productivity is
an obvious issue. Analyses of these data essentially assume that the
observations on the same job are the sum of a component proportional
to true productivity plus a random error. Bishop (1987, 1991) defends
this assumption that the measure is proportional to true productivity
by noting that the coefficient of variation of output observed in EOPP
is similar to the average found in a review of studies that had physical
measures of output (Schmidt and Hunter, 1983).
Hours of both formal and informal training during the employee’s
first three months on the job are reported in both the EOPP and
SBA surveys and are found to strongly affect productivity. Barron
et al. (1997a, 1999) and Barron et al. (1989) report elasticities of
approximately 0.2 for measures of long-term productivity change with
respect to hours of training. Matching the reference period for training
to that of productivity, Barron et al. (1997a) report a similar elasticity
for productivity change in the initial three months using SBA data.
Bishop (1991) is the only paper we are aware of to attempt to estimate
rates of return based on subjective productivity data; the results vary
widely depending on specification. (Bishop 1991 is also the only paper
to attempt a comparison of formal and informal training’s effect on pro-
ductivity. The relative effect once again depends on specification, with
some specifications showing similar returns and some a higher return
to formal training.)
When data on both wages and productivity are available, both
the firm/industry-based studies and the employee-based studies show
stronger effects of training on productivity than on wages. (The only
exceptions that we are aware of are some IV regressions in Bishop
1991.) For example, Barron et al. (1989) report a coefficient for the
log of hours of training of 0.035 for a regression of wage growth from
the start of the job to two years and 0.176 for a regression of produc-
tivity growth for the same periods. More broadly, also using EOPP
data, productivity growth instrumented by training and job complex-
ity measures has a coefficient of only 0.26 on wage growth in Frazis
and Loewenstein (2006). This contrast is another piece of evidence,
in addition to those mentioned above, that firms share the costs and
returns to general training. While sharing of returns to specific train-
ing could conceivably account for the small effect of productivity
growth on wage growth, the small fraction of productivity growth that
translates into wage growth implies an implausibly large degree of
specific training – and in fact most EOPP respondents report that the
skills learned on the job are useful outside the company (Loewenstein
and Spletzer, 1999b).
Summarizing the discussion in the last two sections, estimation of
the effects of training is subject to numerous biases, and the evidence
indicates that many of these biases have a substantial impact on estimates
of the effect of training on wages. However, we find in our exam-
ple from the NLSY that even after correcting for many of these biases
using longitudinal data, the estimated rate of return for the employee
to the median amount of formal training is substantial – about 50%.
It is important to note that this estimate reflects the average return of
training to the trained, not the marginal return to training for trained
workers or the potential return for untrained workers, for which no
good estimates exist. Estimates of the effect of training on productiv-
ity are plagued by data problems. With that caution, the productivity
return to training is probably higher than the wage return: when data
on both wages and productivity are available, researchers typically find
that training has a stronger effect on productivity than on wages.
6.3 Estimating the Effect of Training on Job Mobility

To determine the effect of training on turnover, let us return to our ear-
lier theoretical analysis of the division of the return to training. Recall
from (3.14) and (3.15) that in the simple Becker model without labor
market frictions ∂D0/∂h = M(1 − γ) and ∂Q/∂h = f(εc)(M − 1)(1 − γ).
When training is general, γ = 1, so that ∂D0/∂h = 0 and ∂Q/∂h = 0.
It is straightforward to verify that ∂L/∂h is also zero: general training
causes the wage offered by the employer and the alternative wage that
the worker can earn elsewhere to both increase by the same amount as
productivity, with the result that mobility is unaffected.9 In contrast,
when training is specific, 0 < ∂D0/∂h < 1, ∂Q/∂h < 0, and ∂L/∂h < 0:
specific training leads to an increase in the employer’s wage offer that is
less than the increase in the worker’s productivity; the higher value of
D0 causes the dismissal probability to fall and the higher wage causes
the quit probability to fall. (Munasinghe and O’Flaherty (2005) demonstrate
that in a multi-period model without wage commitments, specific
training always results in lower turnover, but does not unambiguously
lead to a higher wage in every period: anticipating a higher wage in
period t, workers are willing to work for a lower wage in period τ < t.)
9 Note, however, that if one includes the wage rate in a quit equation, general training
should have a positive effect on the quit probability: holding a worker’s wage constant,
he is more likely to quit the higher the wage he can command elsewhere. Similarly, if one
includes the wage rate in a layoff equation, general training should have a negative effect
on the layoff probability: holding the worker’s wage constant, the worker is less likely to be
laid off if his productivity is higher. In his early paper, Parsons (1972) makes this point.
As we saw above, frictions in the labor market make general training
more like specific training, with the result that general training too
may be associated with lower turnover. One complication in estimat-
ing the effect of training on turnover, both in the simple Becker model
and in models that predict sharing of the costs and returns to general
training, is reverse causality: as we discussed above, workers with lower
quit propensities will tend to be in positions that offer more training.
This will reinforce any negative correlation caused by a negative effect
of training on turnover.
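The sign pattern implied by the expressions quoted from (3.14) and (3.15) at the start of this discussion can be checked numerically with a small sketch. It assumes 0 < M < 1 and f(εc) > 0, which is what the signs reported in the text require, and treats γ as the degree to which training is general (γ = 1 for fully general training). The numeric values and the function name are purely illustrative.

```python
# Hypothetical sketch of the comparative statics quoted from (3.14)-(3.15).
# Assumptions for illustration: 0 < M < 1 and f(eps_c) > 0, which is what
# the signs reported in the text require; the numeric values are made up.


def mobility_derivatives(M: float, gamma: float, f_eps_c: float):
    """Return (dD0/dh, dQ/dh) for a given degree of generality gamma."""
    dD0_dh = M * (1.0 - gamma)                    # effect on D0
    dQ_dh = f_eps_c * (M - 1.0) * (1.0 - gamma)   # effect on quit probability
    return dD0_dh, dQ_dh


# Fully general training: both derivatives vanish.
general = mobility_derivatives(M=0.5, gamma=1.0, f_eps_c=0.3)

# Fully specific training: D0 rises by less than one-for-one, quits fall.
specific = mobility_derivatives(M=0.5, gamma=0.0, f_eps_c=0.3)
```

With γ = 1 both derivatives are zero, so mobility is unaffected. With γ < 1 the first derivative lies strictly between 0 and 1 and the second is negative, matching the description of specific training.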
Proxying turnover by the percentage of workers who are at an estab-
lishment less than one year, Lynch and Black (1998) find that the
greater is this percentage the less likely are establishments to have any
computer training or any training in basic educational skills. However,
the proportion of workers who receive training (of any type) shows no
relation to the percentage of workers who are at an establishment less
than one year. Frazis et al. (1995) similarly find that basic skills train-
ing in large establishments is negatively associated with the percentage
of workers with less than one year of tenure, but for the entire sample
the effects for job skills training and training tend to be positive. Frazis
et al. (2000) find that employer training expenditures per worker are
negatively related to turnover; however, one does not find this negative
correlation when one looks at the hours of training per employee that
are reported in an employee log of training events.
There is a ready explanation for the mixed findings concerning the
relationship between training and the percentage of workers with less
than one year of tenure. Consider the multi-period model we discussed
above. Suppose for simplicity that the firm’s employment level is in
a steady state with the employer hiring just enough new workers to
replace the experienced workers who quit and suppose that workers are
always trained in the first period. If the turnover rate is, say, 10%, then
10% of the employer’s workers will be new workers who are untrained.
If the turnover rate is 40%, then 40% of the employer’s workers will
be untrained new workers. If the productivity of untrained workers is
very low, then the value of modest amounts of training will be very
high and the employer will provide some training to all of its workers
even though there is a high probability that they will quit. Training
incidence in any period will therefore be higher the greater is turnover,
although presumably the total training that a worker receives will be
lower.
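The steady-state example can be traced through with a short simulation. This is a minimal sketch under the assumptions stated in the example (constant employment, every quit replaced by a new hire, training only in a worker's first period); the function name and workforce size are hypothetical.

```python
# Hypothetical sketch of the steady-state example: every quit is replaced
# by a new hire, and each new hire is trained in the first period only.


def training_incidence(n_workers: int, turnover_rate: float, periods: int = 5):
    """Return the share of the workforce trained in the last period."""
    tenure = [1] * n_workers  # start with an all-experienced workforce
    incidence = 0.0
    for _ in range(periods):
        n_quits = round(n_workers * turnover_rate)
        stayers = tenure[n_quits:]      # n_quits experienced workers leave
        new_hires = [0] * n_quits       # replacements arrive untrained
        incidence = len(new_hires) / n_workers  # only new hires are trained
        tenure = [t + 1 for t in stayers + new_hires]
    return incidence


low_turnover = training_incidence(n_workers=100, turnover_rate=0.10)
high_turnover = training_incidence(n_workers=100, turnover_rate=0.40)
print(low_turnover, high_turnover)
```

In this setup the share of workers trained in a period equals the turnover rate, so per-period training incidence is higher at the high-turnover employer even though nothing is said here about the amount of training each worker receives.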
Of course, our model simplifies in assuming that a firm’s workers are
all alike ex ante and are all performing the same job. If one relaxes this
assumption, all workers will not necessarily get trained and training
incidence in any period is not necessarily higher at the high turnover
establishment. But the basic point remains: high turnover establish-
ments will have to provide training to their new workers to replace the
human capital that is lost when experienced workers quit. Consistent
with this argument, Frazis et al. (2000) find that after correcting for
tenure in data from individual employees, establishment level turnover
reduces several measures of formal training incidence and intensity
(however, the coefficient on turnover for informal training is insignifi-
cant and wrong-signed).
Turning to individual level data, Lynch (1991) finds that, for a
sample from the first few years of the NLSY79, on-the-job training
is associated with decreased mobility for women but off-the-job train-
ing with increased mobility. (The same sign pattern holds for men but
is not statistically significant.) As on-the-job training is likely to be
more firm-specific in nature than off-the-job training, this finding conforms
with theory, although it is not clear why off-the-job training
should be associated with increased mobility unless workers are invest-
ing in skills that are more useful at alternative employers. Analyzing
data obtained after the NLSY79 training questions were redesigned,
Loewenstein and Spletzer (1997) find that company training spells and
training in the form of seminars are associated with reduced job mobil-
ity, while the more general “school training” is uncorrelated with
mobility. Levine (1993) finds a negative association between a proxy
for quitting (obtained from a worker’s response about the likelihood of
looking for a new job) and various subjective measures of training in a
matched employee–employer dataset of US and Japanese manufactur-
ing firms. Interestingly, while human capital theory suggests that this
should be due to wage increases from specific human capital acquisition,
Levine finds that the relationship disappears once job satisfaction
measures are included in the analysis. (Levine notes that this finding
may be consistent with human capital theory if “workers who are highly
trained receive better working conditions as a reward for their higher
productivity,” so that the worker’s return to training is non-pecuniary.)
As discussed above, interpretation of the observed correlation
between training and mobility is complicated by the fact that there
are two distinct effects: training affects the probability of separating
and sorting of low turnover workers into high training jobs means
that unobservable determinants of mobility will be negatively corre-
lated with training. Some papers attempt to correct for unobservables
using panel data; the addition of heterogeneity controls appears not
to alter the conclusion that employer-provided training reduces mobil-
ity. Mincer (1988) controls for prior mobility in the PSID and finds
that longer periods of training reduce mobility even after such a con-
trol. (From the wording of the training question in the PSID, it is
not possible to distinguish specific from general training, unlike the
NLSY where there is information on type of training.) Elias (1994),
using recall data on job-histories from a survey of six cities in the UK,
and Parent (1999), using data from the NLSY, both exploit the panel
nature of their data to estimate hazard models of turnover with controls
for individual heterogeneity for individuals with multiple employment
spells. They both find that employer-provided training reduces mobil-
ity. Contrary to earlier findings, Parent (1999) finds that off-the-job
training with the current employer also reduces mobility (Elias 1994
does not distinguish off-the-job training); this is true whether or not
heterogeneity is controlled for.
Overall, the empirical evidence on the effect of training on turnover
is compatible with theory. Company training, which presumably con-
tains at least some element of specific training, reduces turnover. The
evidence is mixed for off-the-job training, for which the theoretical pre-
diction is less clear.
7 Conclusion
Datasets with information concerning on-the-job training have become
more plentiful over the last 25 years. These datasets have provided
researchers with direct measures of training with which to examine the
effects of training and test human capital theory. The datasets provide
evidence that workers receive a substantial amount of on-the-job train-
ing. In particular, informal training appears to be quite important at
the start of a job. There is a strong positive correlation between train-
ing and wage growth and a negative correlation between training and
turnover.
The available evidence indicates that training is typically useful at
more than one employer. Furthermore, some researchers have found
evidence that employers share the costs and returns to general train-
ing, contrary to the basic Becker model. This has in turn stimulated
new theoretical work wherein theorists have modified and enriched the
Becker model. Economists now have a thorough understanding of the
theoretical issues pertaining to on-the-job training; the major gaps in
our knowledge are empirical.
While recent empirical work enables us to better gauge the extent
of training and the effects of training and while we have a better
understanding of who gets training, overall our empirical knowledge
is still quite limited. Inconsistencies in the definition of training across
datasets complicate our efforts to determine the amount of training at
a point in time and hamper our ability to investigate changes in the
extent of both formal and informal training over time. Interpretation
of the relationship between training and wages (and between training
and productivity) remains an issue.
The best recent estimates of the wage returns to training have used
panel data to correct for unobserved determinants of wages correlated
with training. The weight of the empirical evidence indicates that there
are very high wage returns to formal training even after correcting for
heterogeneity in wage levels and in wage growth. Where both produc-
tivity and wage data exist, almost all studies that we are aware of
estimate the effect of training on productivity to be even higher than
for wages.
However, these estimates most plausibly represent the average
return to training for the trained, not marginal rates. Marginal rates,
whether for the last hour of training for the trained or for the worker
on the margin of being trained or not, are of course the key piece of
evidence in evaluating whether there is underinvestment in training.
There are at this date no good estimates of the marginal return to
on-the-job training. This is primarily due to lack of instruments that
are both plausibly exogenous and sufficiently powerful to generate pre-
cise estimates. Such instruments would also help to solve the problem
of obtaining consistent estimates of the effects of training in the pres-
ence of measurement error, which such evidence as we have indicates
is severe in training data.
References
Acemoglu, D. and J.-S. Pischke (1998), ‘Why do firms train? Theory
and evidence’. The Quarterly Journal of Economics 113(1), 79–119.
Acemoglu, D. and J.-S. Pischke (1999a), ‘Beyond Becker: Training in
imperfect labor markets’. Economic Journal 109, F112–F142.
Acemoglu, D. and J.-S. Pischke (1999b), ‘The structure of wages and
investment in general training’. Journal of Political Economy 107(3),
539–572.
Acemoglu, D. and J.-S. Pischke (2003), ‘Minimum wages and on-the-
job training’. Research in Labor Economics 22, 159–202.
Ahlstrand, A., L. Bassi, and D. McMurrer (2003), Workplace Educa-
tion for Low-Wage Workers. Kalamazoo MI: Upjohn Institute for
Employment Research.
Akerlof, G. A. and J. Yellen (1990), ‘The fair-wage effort hypothesis
and unemployment’. The Quarterly Journal of Economics 105(2),
255–283.
Allen, S. G., R. Clark, and A. McDermed (1993), ‘Pensions, bonding,
and lifetime jobs’. Journal of Human Resources 28(3), 463–481.
Almeida, R. and P. Carneiro (2006), ‘The return to the firm investment
in human capital’. IZA Discussion Paper No. 1937, January.
Altonji, J. G. and J. R. Spletzer (1991), ‘Worker characteristics, job
characteristics, and the receipt of on-the-job training’. Industrial and
Labor Relations Review, pp. 58–79.
Angrist, J., G. Imbens, and D. Rubin (1996), ‘Identification of causal
effects using instrumental variables’. Journal of the American Sta-
tistical Association 91, 444–472.
Arulampalam, W., A. L. Booth, and M. Bryan (2004a), ‘Training and
the new minimum wage’. The Economic Journal 114, C87–C94.
Arulampalam, W., A. L. Booth, and M. Bryan (2004b), ‘Training
in Europe’. Journal of the European Economic Association 2(2/3),
346–360.
Arulampalam, W., A. L. Booth, and P. Elias (1997), ‘Work-related
training and earnings growth for young men in Britain’. Research in
Labor Economics 16, 119–147.
Ashenfelter, O. and C. Rouse (1998), ‘Income, schooling, and ability:
Evidence from a new sample of identical twins’. Quarterly Journal
of Economics 113(1), 253–284.
Ashenfelter, O. and D. J. Zimmerman (1997), ‘Estimates of the returns
to schooling from sibling data: Fathers, sons, and brothers’. The
Review of Economics and Statistics 79(1), 1–9.
Autor, D. (2001), ‘Why do temporary help firms provide free general
skills training?’. Quarterly Journal of Economics 116(4), 1409–1448.
Balmaceda, F. (2005), ‘Firm-sponsored general training’. Journal of
Labor Economics 23(1), 115–134.
Barrett, A. and P. J. O’Connell (2001), ‘Does training generally work? The
returns to in-company training’. Industrial and Labor Relations
Review 54, 647–662.
Barron, J. M., M. C. Berger, and D. A. Black (1997a). On-the-Job
Training. Kalamazoo Michigan: W.E. Upjohn Institute of Employ-
ment Research.
Barron, J. M., M. C. Berger, and D. A. Black (1997b), ‘How well do
we measure training?’. Journal of Labor Economics 15(3), 507–528.
(part 1).
Barron, J. M., M. C. Berger, and D. A. Black (1999), ‘Do workers
pay for on-the-job training?’. Journal of Human Resources 34(2),
235–252.
Barron, J. M., D. A. Black, and M. A. Loewenstein (1987), ‘Employer
size: The implications for search, training, capital investment, starting
wages, and wage growth’. Journal of Labor Economics 5, 76–89.
Barron, J. M., D. A. Black, and M. A. Loewenstein (1989), ‘Job match-
ing and on-the-job training’. Journal of Labor Economics 7(1), 1–19.
Barron, J. M., D. A. Black, and M. A. Loewenstein (1993), ‘Gender dif-
ferences in training, capital, and wages’. Journal of Human Resources
28(2), 343–365.
Bartel, A. (1994), ‘Productivity gains from the implementation of
employee training programs’. Industrial Relations 33, 411–425.
Bartel, A. P. (2000), ‘Measuring the employer’s return on investments
in training: Evidence from the literature’. Industrial Relations 39(3),
502–523.
Bassanini, A., A. L. Booth, G. Brunello, M. De Paola, and E. Leuven
(2005). Workplace Training in Europe. IZA Discussion Paper
No.1640, June. Forthcoming as Part II of Education and Training
in Europe, Oxford University Press.
Becker, G. S. (1962), ‘Investment in human capital: A theoretical anal-
ysis’. Journal of Political Economy 70, 9–49. Supplement (October).
Bishop, J. (1987), ‘The recognition and reward of employee perfor-
mance’. Journal of Labor Economics 5(4), S36–S56. Part 2.
Bishop, J. (1991), ‘On-the-job training of new hires’. In: D. Stern and
J. Ritzen (eds.): Market Failure in Training. New York, pp. 61–96,
Springer Verlag.
Black, D. A., M. C. Berger, and F. Scott (2000), ‘Bounding parame-
ter estimates with non-classical measurement error’. Journal of the
American Statistical Association 95(451), 739–748.
Black, D. A. and M. A. Loewenstein (1991), ‘Self-enforcing labor con-
tracts with costly mobility: The subgame perfect solution to the
chairman’s problem’. Research in Labor Economics 12, 63–83.
Black, D. A. and M. A. Loewenstein (1997), ‘Dismissals and match-
specific rents’. Labour Economics: An International Journal 4(4),
325–340.
Black, D. A., B. J. Noel, and Z. Wang (1999), ‘On-the-job training,
establishment size, and firm size: Evidence for economies of scale in
the production of human capital’. Southern Economic Journal 66(1),
82–100.
Black, S. and L. M. Lynch (1996), ‘Human capital investments and
productivity’. American Economic Review 86(2), 263–267.
Black, S. E. and L. M. Lynch (2001), ‘How to compete: The impact
of workplace practices and information technology on productivity’.
Blau, F. and L. Kahn (1981), ‘Race and sex differences in quits by young
workers’. Industrial and Labor Relations Review 34(4), 563–577.
Blau, F. and L. Kahn (2000), ‘Gender differences in pay’. Journal of
Economic Perspectives 14(4), 75–99.
Booth, A. and M. Bryan (2002), ‘Who pays for general training? New
evidence for British men and women’. IZA Discussion Paper No.
486, April.
Booth, A. L. (1993), ‘Private sector training and graduate earnings’.
Brown, J. N. (1989), ‘Why do wages increase with tenure? On-the-job
training and life-cycle wage growth observed within firms’. American
Economic Review 79(5), 971–991.
Carmichael, H. L. (1983), ‘Firm-specific human capital and promotion
ladders’. Bell Journal 14(2), 251–258.
Casas-Arce, P. (2004), ‘Firm provision of general training and spe-
cific human capital accumulation’. Oxford Department of Economics
Working Paper #198.
Chang, C. and Y. Wang (1996), ‘Human capital investment under
asymmetric information: The Pigovian conjecture revisited’. Journal
of Labor Economics 14(3), 505–519.
Conti, G. (2005), ‘Training, productivity and wages in Italy’. Labour
Economics 12(4), 557–576.
Corcoran, M. and G. J. Duncan (1979), ‘Work history, labor force
attachment, and earnings differences between the races and sexes’.
Journal of Human Resources 14(1), 3–20.
Dearden, L., H. Reed, and J. Van Reenen (2005), ‘Who gains when
workers train? Training and corporate productivity in a panel of
British industries’. Oxford Bulletin of Economics and Statistics.
forthcoming.
Dorsey, S. and D. A. MacPherson (1997), ‘Pensions and training’.
Industrial Relations 36(1), 81–96.
Duncan, G. J. and S. Hoffman (1979), ‘On-the-job training and earnings
differences by race and sex’. The Review of Economics and Statistics
61(4), 594–603.
Elias, P. (1994), ‘Job-related training, trade union membership, and
labour mobility: A longitudinal study’. Oxford Economic Papers
46(4), 563–578.
Frazis, H., M. Gittleman, M. Horrigan, and M. Joyce (1998), ‘Results
from the 1995 survey of employer-provided training’. Monthly Labor
Review 121(6), 3–13.
Frazis, H., M. Gittleman, and M. Joyce (2000), ‘Correlates of train-
ing: An analysis using both employer and employee characteristics’.
Industrial and Labor Relations Review 53(3), 443–462.
Frazis, H., D. Herz, and M. Horrigan (1995), ‘Employer-provided train-
ing: Results from a new survey’. Monthly Labor Review 118(5), 3–17.
Frazis, H. and M. A. Loewenstein (2003a), ‘Estimating linear regres-
sions with mismeasured, possibly endogenous, binary explanatory
variables’. Journal of Econometrics 117(1), 151–178.
Frazis, H. and M. A. Loewenstein (2003b), ‘Reexamining the returns
to training: Functional form, magnitude, and interpretation’. BLS
Working Paper 367. Washington: Bureau of Labor Statistics.
Frazis, H. and M. A. Loewenstein (2005), ‘Reexamining the returns to
training: Functional form, magnitude, and interpretation’. Journal
of Human Resources 40(2), 453–476.
Frazis, H. and M. A. Loewenstein (2006), ‘Wage compression and divi-
sion of returns to productivity growth: Evidence from EOPP’. Bureau
of Labor Statistics Working Paper.
Frazis, H., M. A. Loewenstein, and J. R. Spletzer (1996), ‘The effects of
measurement error on estimates of the returns to training’. Mimeo.
Frazis, H. and J. R. Spletzer (2005), ‘Worker training: What we’ve
learned from the NLSY79’. Monthly Labor Review 128(2), 48–58.
Gronau, R. (1988), ‘Sex-related wage differentials and women’s
interrupted labor careers – The chicken or the egg’. Journal of Labor
Economics 6(3), 277–301.
Gustman, A. L. and T. L. Steinmeier (1993), ‘Pension portability and
labour mobility: Evidence from the survey of income and program
participation’. Journal of Public Economics 50(2), 299–323.
Hall, R. and E. P. Lazear (1984), ‘The excess sensitivity of layoffs and
quits to demand’. Journal of Labor Economics 2(2), 233–257.
Harhoff, D. and T. J. Kane (1997), ‘Is the German apprenticeship system
a panacea for the U.S. labor market?’. Journal of Population Economics
10, 171–196.
Hashimoto, M. (1981), ‘Firm-specific human capital as a shared
investment’. American Economic Review 71(3), 475–482.
Heckman, J. (1997), ‘Instrumental variables: A study of implicit behav-
ioral assumptions used in making program evaluations’. Journal of
Human Resources 32(3), 441–462.
Heckman, J., R. LaLonde, and J. Smith (1999), ‘The economics and
econometrics of active labor market programs’. In: O. Ashenfelter
and D. Card (eds.): Handbook of Labor Economics, Vol. 3A.
Amsterdam: North Holland, pp. 1865–2073.
Heckman, J. and R. Robb (1985), ‘Alternative methods for evaluat-
ing the impact of interventions’. In: Longitudinal Analysis of Labor
Market Data. New York: Wiley, pp. 156–245.
Holtmann, A. G. and T. L. Idson (1991), ‘Employer size and on-the-job
training decisions’. Southern Economic Journal 58, 339–355.
Johnson, R. W. (1996), ‘The impact of human capital investments on
pension benefits’. Journal of Labor Economics 14(3), 520–554.
Jovanovic, B. (1979), ‘Job matching and the theory of turnover’. Jour-
nal of Political Economy 87(5, Part 1), 972–990.
Kahn, C. and G. Huberman (1988), ‘Two sided uncertainty and up or
out contracts’. Journal of Labor Economics 6(4), 423–444.
Kane, T. J., C. E. Rouse, and D. Staiger (1999), ‘Estimating returns
to schooling when schooling is misreported’. Working Paper 7235,
National Bureau of Economic Research.
Katz, E. and A. Ziderman (1990), ‘Investment in general training: The
role of information and labour mobility’. The Economic Journal,
pp. 1147–1158.
Kuhn, P. (1993), ‘Demographic groups and personnel policy’. Labour
Economics 1(1), 49–70.
Lazear, E. P. (1989), ‘Pay equality and industrial politics’. Journal of
Political Economy 97, 561–580.
Lengermann, P. A. (1999), ‘How long do the benefits of training last?
Evidence of long term effects across current and previous employers,
education levels, test scores, and occupations’. Research in Labor
Economics 18, 439–461.
Lerman, R. I., S.-M. McKernan, and S. Riegg (1999), Employer-
provided training and public policy. Washington DC: The Urban
Institute.
Leuven, E. (2005), ‘The economics of private sector training: A survey
of the literature’. Journal of Economic Surveys 19, 91–111.
Leuven, E. and H. Oosterbeek (2003), ‘An alternative approach to
estimate the returns to private-sector training’. Unpublished work-
ing paper, Department of Economics, University of Amsterdam.
Leuven, E. and H. Oosterbeek (2004), ‘Evaluating the effects of a
tax deduction on training’. Journal of Labor Economics 22(2),
461–488.
Levine, D. (1993), ‘Worth waiting for? Delayed compensation, train-
ing, and turnover in the United States and Japan’. Journal of Labor
Economics 11(4), 724–752.
Lillard, L. A. and H. W. Tan (1992), ‘Training: Who gets it and what
are its effects?’. Research in Labor Economics 13a.
Loewenstein, M. A. and J. R. Spletzer (1997), ‘Delayed formal on-the-
job training’. Industrial and Labor Relations Review, pp. 82–99.
Loewenstein, M. A. and J. R. Spletzer (1998), ‘Dividing the costs and
returns to general training’. Journal of Labor Economics 16, 142–171.
Loewenstein, M. A. and J. R. Spletzer (1999a), ‘Formal and informal
training: Evidence from the NLSY’. Research in Labor Economics
18, 403–438.
Loewenstein, M. A. and J. R. Spletzer (1999b), ‘General and specific
training: Evidence and implications’. Journal of Human Resources
34(4), 710–733.
Lynch, L. M. (1991), ‘The role of off-the-job vs. on-the-job training for
the mobility of women workers’. American Economic Review Papers
and Proceedings, pp. 151–156.
Lynch, L. M. (1992), ‘Private sector training and the earnings of young
workers’. American Economic Review 82(1), 299–312.
Lynch, L. M. and S. E. Black (1998), ‘Beyond the incidence of training:
Evidence from a national employers survey’. Industrial and Labor
Relations Review 52(1), 64–81.
Mincer, J. (1962), ‘On-the-job training: Costs, returns, and some impli-
cations’. Journal of Political Economy, pp. 50–79.
Mincer, J. (1970), ‘The distribution of labor incomes: A survey with
special reference to the human capital approach’. Journal of Economic
Literature 8(1), 1–26.
Mincer, J. (1974), Schooling, Experience, and Earnings. New York:
Columbia University Press.
Mincer, J. (1988), ‘Job training, wage growth, and labor turnover’.
NBER Working Paper 2690. Cambridge, Mass.: National Bureau of
Economic Research.
Mortensen, D. T. (1978), ‘Specific capital and labor turnover’. Bell
Journal of Economics 9(2), 572–586.
Munasinghe, L. and B. O’Flaherty (2005), ‘Specific training sometimes
cuts wages and always cuts turnover’. Journal of Labor Economics
23(2), 213–233.
Neal, D. (1995), ‘Industry-specific human capital: Evidence from dis-
placed workers’. Journal of Labor Economics 13(4), 653–677.
Neal, D. (1998), ‘The link between ability and specialization: An expla-
nation for observed correlations between wages and mobility rates’.
Neumark, D. and W. Wascher (2001), ‘Minimum wages and training
revisited’. Journal of Labor Economics 19(3), 563–595.
O’Connell, P. J. (1999), ‘Adults in training: An international compari-
son of continuing training and education’. Organization for Economic
Cooperation and Development.
Parent, D. (1999), ‘Wages and mobility: The impact of employer-
provided training’. Journal of Labor Economics 17(2), 298–317.
Parsons, D. O. (1972), ‘Specific human capital: An application to
quit rates and layoff rates’. Journal of Political Economy 80,
1120–1143.
Pischke, J. S. (2001), ‘Continuous training in Germany’. Journal of
Population Economics 14, 523–548.
Pischke, J. S. (2006), ‘Comments on “Workplace training in Europe”
by Bassanini et al.’. In: G. Brunello, P. Garibaldi, and E. Wasmer
(eds.): Education and Training in Europe. Oxford University Press.
Prendergast, C. (1993), ‘The role of promotion in inducing specific
human capital acquisition’. Quarterly Journal of Economics 108(3),
523–534.
Ransom, M. R. (1993), ‘Search and monopsony in the academic labor
market’. American Economic Review 83(1), 221–233.
Rosen, S. (1972), ‘Learning and experience in the labor market’. The
Royalty, A. (1996), ‘The effects of job turnover on the training of men
and women’. Industrial and Labor Relations Review 49(3), 506–521.
Salop, J. and S. Salop (1976), ‘Self-selection and turnover in the labor
market’. The Quarterly Journal of Economics 90(4), 619–627.
Schmidt, F. L. and J. E. Hunter (1983), ‘Individual differences in pro-
ductivity: An empirical test of estimates derived from studies of selec-
tion procedure utility’. Journal of Applied Psychology 68, 407–414.
Schultz, T. W. (1962), ‘Reflections on investment in man’. Journal of
Political Economy 70, 1–8. Supplement (October).
Shapiro, C. and J. E. Stiglitz (1984), ‘Equilibrium unemployment as a
worker discipline device’. American Economic Review 74, 433–444.
Smith, A. (1904), ‘An Inquiry into the Nature and Causes of the Wealth
of Nations’. London, Methuen and Co., Ltd. [Online] available
from http://www.econlib.org/library/Smith/smWN4.html; accessed
21 June 2006; Internet.
Smits, W. (2005), The Quality of Apprenticeship Training: Conflicting
Interests of Firms and Apprentices. Research Centre for Education
and the Labour Market (ROA), Maastricht University.
Stern, D. and J. Ritzen (1991), ‘Introduction and overview’. In: D.
Stern and J. Ritzen (eds.): Market Failure in Training. New York:
Springer Verlag, pp. 1–14.
Stevens, M. (1994), ‘A theoretical model of on-the-job training with
imperfect competition’. Oxford Economic Papers 46(4), 537–562.
Veum, J. R. (1995), ‘Sources of training and their impact on wages’.
Industrial and Labor Relations Review 48(4), 812–826.
Viscusi, W. K. (1981), ‘Sex differences in worker quitting’. Review of
Economics and Statistics 62(3), 388–398.
Zoega, G. and A. L. Booth (2005), ‘Worker heterogeneity, intra-firm
externalities, and wage compression’. Birkbeck Working Papers in
Economics and Finance, Number 0515.

