
PERSONNEL PSYCHOLOGY

1990, 43

DYNAMIC CRITERIA REVISITED: A LONGITUDINAL
STUDY OF PERFORMANCE STABILITY AND
PREDICTIVE VALIDITY

DIANA L. DEADRICK
Owen Graduate School of Management
Vanderbilt University
ROBERT M. MADIGAN
Management Department
Virginia Polytechnic Institute and State University

Support for this paper was provided to the first author by the Dean’s Research Fund, Owen Graduate School of Management. The authors contributed equally to this article. We wish to thank Tom Mahoney, Rich Oliver, and Cliff Ball of the Owen Graduate School of Management, Vanderbilt University, and three anonymous reviewers for their helpful comments. Correspondence and requests for reprints should be addressed to Diana L. Deadrick, Owen Graduate School of Management, Vanderbilt University, 401 21st Avenue South, Nashville, TN 37203.

The concept of dynamic criteria has been the subject of a recent de-
bate regarding both the definition and prevalence of the phenomenon
(Austin, Humphreys, & Hulin, 1989; Barrett & Alexander, 1989; Bar-
rett, Caldwell, & Alexander, 1985). The present paper questions the
adequacy of the conceptual framework underlying the debate and pro-
vides data supporting a refined concept of dynamic criteria. The in-
cidence and possible causes of change in relative performance were
investigated using weekly performance data from 509 sewing machine
operators. Analyses were conducted to determine the degree of perfor-
mance consistency, potential moderators of consistency, and the sta-
bility of predictor-criteria relationships using multiple predictors and
criteria. Results revealed a steady decline in performance stability co-
efficients as the interval between measures increased. This decay was
evident regardless of employees’ prior job experience, cognitive abil-
ity, or psychomotor ability. Analyses of predictive validity coefficients
revealed temporal changes in validity for both objective and subjective
criteria, but not in the expected direction. The validity of cognitive abil-
ity increased, the validity of psychomotor ability was stable, and that of
prior job experience decreased over time. Implications for theory and
research are discussed.

Dynamic criteria are a commonly cited cause of “the criterion problem” in industrial/organizational research. Although it has been widely accepted that a single criterion measure does not generalize over unspecified periods of time (Dunnette, 1963; Ghiselli, 1956; Guion, 1976; Prien, 1966; Ronan & Prien, 1966; Smith, 1976; Wernimont & Campbell, 1968), it is nonetheless customary to assume temporal stability of performance (Henry & Hulin, 1987; Hulin, Henry, & Noon, 1990; Rambo,
Chomiak, & Price, 1983; Rothe, 1978). The research evidence to support
dynamic criteria is limited and has, in fact, stimulated a recent debate
about the prevalence of dynamic criteria. The “dynamic criteria” per-
spective, advanced by Austin, Humphreys and Hulin (1989) and Henry
and Hulin (1987), argues that dynamic criteria are a general phenomenon, evidenced in a broad spectrum of performance studies in both field
and laboratory settings. “Performance changes [over time] and crite-
ria must therefore change if they are to represent performance validly”
(Austin et al., 1989, p. 593). In support of their view, these authors refer
to the generality of the simplex phenomenon, a pattern of systematically
decreasing correlations in time-structured matrices of criterion intercor-
relations and criterion-related validity coefficients. In contrast, Barrett,
Caldwell and Alexander (1985) and Barrett and Alexander (1989) find
little support for the received doctrine of dynamic criteria and conclude
that “dynamic criteria are rare phenomena, with the significant changes
found in key studies explainable by methodological artifacts” (Barrett
et al., 1985, p. 41). More recently, Barrett and Alexander (1989) took
strong issue with Austin et al. (1989), arguing that the burden of proof is
on researchers who advocate dynamic criteria to define the concept and
present evidence to support that concept. Based on research to date,
they find little evidence of the ubiquity of the simplex pattern and argue
that “more evidence is required before the Ghiselli and Haire (1960)
and Austin et al. (1989) concept of dynamic criteria can be accepted”
(p. 597).
The purposes of the present article are: (a) to clarify the debate over
dynamic criteria by refining the concept to distinguish between differ-
ent sources of criterion instability, (b) to briefly summarize the exist-
ing evidence for dynamic criteria in the light of this distinction, and (c)
to present additional evidence pertaining to the incidence and possible causes of performance stability/instability.

Concept Definition

The basic concept of dynamic criteria refers to variability in the relative performance of employees over time. In their review of the lit-
erature, Barrett et al. (1985) identified three operational definitions of
dynamic criteria: (a) changes in group average performance over time,
(b) changes in criterion-related validities over time, and (c) changes in
the rank-order of criterion scores over time. Barrett et al. (1985) and
Austin et al. (1989) both agree that the first definition is conceptually
and operationally weak. Their debate centers on the intrinsic and sys-
tematic nature of change in predictor-criterion correlations over time
(definition #2) and on the prevalence and systematic nature of change in
relative performance over time (definition #3). Barrett et al. (1985) ini-
tially argued that although changes in criterion rank-ordering might oc-
cur, this definition of dynamic criteria “...assumes practical importance
only in relation to potentially consequent changes in validity” (pp. 52-
53). Based on their reviews and reanalyses of prior validation studies
incorporating repeated criterion measures, Barrett et al. (1985, 1989)
found little support for this concept of dynamic criteria and, therefore,
implicitly advocate changes in validity coefficients (definition #2) as the
most meaningful evidence of dynamic criteria. More recently, Barrett
and Alexander (1989) criticized the assertion that the simplex pattern
is a ubiquitous phenomenon with regard to either definition of dynamic
criteria, arguing that the simplex approach relies on a methodological
versus a theoretical basis for predicting changes.
In contrast to Barrett et al. (1985, 1989), Austin et al. (1989) ap-
pear to be concerned primarily with developing a better understanding
of criteria, per se, and focus on fluctuations in the rank-ordering of cri-
terion scores over time (definition #3) as the “core of the dynamic cri-
terion concept” (p. 589). Although they cite supporting evidence in the
form of both validity and performance stability coefficients, they empha-
size that the essence of dynamic criteria is represented by the decreas-
ing stability in performance intercorrelations over distant time periods
(p. 592). We take the position here that neither of these operational
definitions provides a sufficient basis upon which to advance our under-
standing of dynamic criteria because they do not indicate whether the
temporal changes reflect actual changes in job performance or changes
in the performance evaluation context. This confounding of different
types (sources) of change is reflected in the existing literature and cur-
rent debate. In this study we differentiate between criterion changes
attributable to individual differences (performance consistency), changes
attributable to the organizational context (evaluation consistency), and
changes attributable to the measurement procedure (measurement re-
liability). It is our contention that this distinction is necessary for any
meaningful discussion of the concept of dynamic criteria.
Performance consistency, as defined here, refers to the systematic
changes in critical job behaviors or outcomes over time that are at-
tributable to individual differences (Kane, 1982; Wernimont & Camp-
bell, 1968); therefore, the ultimate sources of criterion variability are
within the person. This concept is implied in Ghiselli’s (1956) discussion
of dynamic criteria, in which he suggests that there may be a uniform
pattern of change in performance over an extended period of time as
workers learn and develop on the job. Performance consistency has been
investigated by Rothe (1978) and by Rambo, Chomiak, and Price (1983)
who analyzed repeated measures of output obtained under conditions
of context stability (motivation) and/or task stability. Performance con-
sistency could also be ascertained via validity coefficients in situations
where the ability requirements of the job/task remain stable while indi-
viduals’ abilities change over time. This possibility has been referred to
as the changing-subject model (Alvares & Hulin, 1972; Henry & Hulin,
1987).
Evaluation consistency, as defined here, refers to the temporal con-
sistency of the performance evaluation system. Criterion stability can
be affected by changes in the organization’s objectives, performance re-
quirements, job design, and other factors that might cause a change in
the relative importance of performance dimensions, thus altering the
composition of global (summary) criteria. In essence, performance is
implicitly redefined, conceptually and/or operationally; hence, the im-
petus for change is in the work environment and/or evaluation system.
The employee’s actual performance behavior or outcomes could be con-
sistent over time, but the overall performance evaluation would suggest
otherwise. This notion of dynamic criteria is evident in Prien’s descrip-
tion of dynamic criteria and organization change (Prien, 1966): The tran-
sitional nature of organizational needs/objectives results in shifting per-
formance expectations and requirements, although job duties might re-
main static. Evaluation consistency is also implied in Alvares and Hulin’s
(1972) changing-task model, in which the task structure and ability re-
quirements change over time while individuals’ abilities remain constant,
and in Murphy’s (1989) dynamic model of job performance, in which
structural changes occur in the job and/or work environment. Although
evaluation consistency has not been directly studied in the context of dy-
namic criteria, Barrett and Alexander (1989) refer to this phenomenon
as the underlying cause of fluctuating predictor-criterion validities in their review of dynamic criteria from educational psychology. Citing
opportunity bias, differential performance standards, and information-
processing differences in tasks, Barrett and Alexander (1989) concluded
that predictor-criteria relationships may decline, remain the same, or in-
crease over time, depending on the situation.
The concept of evaluation consistency should be distinguished from
measurement reliability. Even if the performance evaluation system and
work context remain stable, the temporal stability of observed criterion
scores and/or validity coefficients could still be affected by measurement
error. However, while the distinction between evaluation consistency
and measurement reliability is conceptually straightforward, it is often
difficult or impossible to operationally distinguish between them. For
example, if the performance criterion consists of subjective ratings of
global performance, one would be hard-pressed to differentiate between
inconsistency due to job or contextual change and that attributable to
measurement error.
The foregoing distinction between performance consistency, eval-
uation consistency, and measurement reliability provides a conceptual
framework for considering the issue of dynamic criteria that separates
actual change in employees’ performance over time from the consis-
tency of the organization’s evaluation and measurement procedures.
The present research focuses on performance consistency, which we con-
sider to be the central issue in the dynamic criteria debate. The terms
performance consistency and stability are used interchangeably here and
refer to the extent to which the relationship between repeated perfor-
mance measures changes over time. If the correlation between perfor-
mance measures decreases monotonically over time, this would indicate
instability, although in this case the change is systematic as opposed to
random. Note that this definition of performance consistency (stability)
refers to the relative (rank order) performance of individuals over time,
not within-person variability in performance.
The issue of performance consistency is of considerable theoretical and practical importance; evidence for instability would require re-
vised approaches to understanding and evaluating the validity of crite-
rion measures, as well as the validity of predictor-criterion relationships.
For instance, if performance is not temporally stable, initial assessments
of job performance are not necessarily predictive of future performance.
Moreover, performance over time might not be equally predictable from
a given selection procedure, and selection researchers would need to de-
termine the appropriate time to collect criterion data in a predictive val-
idation study (Guion & Gibson, 1988). Findings of instability also raise
questions about the usefulness of test-retest correlations as estimates of
criterion reliability (Rambo, Chomiak, & Rountree, 1987) and the gen-
eralizability of predictive validities over time (Hulin et al., 1990).
In order to provide direct evidence regarding the stability of per-
formance, analyses of performance consistency should meet three min-
imum conditions. First, the criterion measure should be a specific per-
formance dimension that is deemed to be a primary determinant of job
performance (Wernimont & Campbell, 1968). A highly specific criterion
measure facilitates differentiation between performance and evaluation
consistency. As noted above, such differentiation is virtually impossible
with global performance measures. Second, the task/job should be in a
stable and routine task environment in which the ability requirements
for performance are constant (Barrett & Alexander, 1989; Prien, 1966;
Rambo et al., 1983; Wernimont & Campbell, 1968). The focus on re-
curring critical tasks provides a control for work factors that might affect
stability but are characteristics of the situation rather than the individ-
ual. Third, in order to control other personal and contextual factors, the
sample should be an intact group of employees working on the same job
and under common performance standards and operating procedures.
Although temporal stability of predictor-criterion relationships is not
a required condition for evidence of performance consistency, such anal-
yses can and should be employed to examine possible causes of stabil-
ity/change (Ackerman, 1989; Murphy, 1989). Motivational and/or per-
sonal (ability) factors that might explain temporal changes in perfor-
mance should therefore be incorporated in the study design (Ackerman,
1989; Austin et al., 1989; Barrett & Alexander, 1989; Henry & Hulin,
1987; Rambo et al., 1983; Wernimont & Campbell, 1968).

Prior Research

The present body of evidence pertaining to dynamic criteria does not provide an adequate basis for conclusions regarding performance
consistency. The studies by Ghiselli and Haire (1960) and Bass (1962)
are commonly cited as evidence for dynamic criteria, but neither study
meets the foregoing conditions for evidence of performance consistency.
Both studies involved global, summary measures of performance (fares
collected and supervisor ratings, respectively), lacked situational con-
trols, and suffered from methodological flaws (see Barrett et al., 1985;
Barrett & Alexander, 1989). Thus, interpretation of their findings is
problematic. Similar limitations pertain to the majority of studies ref-
erenced by both parties to the current debate. For example, fluctua-
tions in validity coefficients from research in educational and organi-
zational psychology were cited as evidence, pro and con, for dynamic
criteria: However, the global nature of the criterion measures in the
vast majority of these studies (e.g., GPA, salary progress) allows contra-
dictory interpretations. Global performance measures were also used
in most of the studies that examined repeated performance measures
(see the studies cited in Henry & Hulin, 1987, pp. 457-458). For in-
stance, Henry and Hulin (1987) presented evidence supporting the “uni-
versal” decreasing simplex pattern of criterion measures based on a 10-
year study of major league baseball players’ performance. However,
their criterion measures were summary, composite measures of offen-
sive and defensive performance, not specific measures of performance
behaviors/outcomes. Furthermore, they did not control for situational
factors that confound the distinction between performance consistency,
evaluation consistency, and measurement reliability (i.e., team vs. indi-
vidual performance requirements, “seasonal” fluctuations, job/position
changes, changing vs. static cohort groups). It should be noted that the
authors recognized the potential influence of situational factors, yet ar-
gued that this would not confound their results. However, given the fore-
going distinction between the sources of criterion change, it is unclear
whether their findings are indicative of “actual” performance changes
or situational, contextual changes.
Some apparently relevant evidence of performance consistency can
be found in the research studies conducted by Rothe (1978) and Rambo
et al. (1983). Rothe summarized a series of studies that examined the
temporal consistency of one dimension of performance-production-
among various groups of employees (i.e., machine operators, welders)
over various periods of time (10 to 48 weeks). He found that consis-
tency, measured by week-to-week correlations, varied between .52 and
.82 and concluded that there is an apparent relationship between the
consistency of performance and the presence or absence of an incentive
system: Performance consistency was higher when an incentive system
was present. Although Rothe’s examination of consistency did focus on
a specific dimension of performance and the motivational context, he
did not control for situational factors that might confound performance
and evaluation consistency evidence (i.e., the stability of task/job assign-
ments, machine “pacing” vs. individual performance control, group vs.
individual output, stability of cohort groups).
Rambo et al. (1983; 1987) also examined production consistency over
an extended period of time (3.5 years). They found relatively high lev-
els of consistency in adjacent-week performance among two samples of
employees (median r = .94, .98), as well as “considerable consistency” across intervals of one year (median r = .69, .86) and three and one-half years (r = .59, .80). Rambo et al. (1983) concluded that consistency is
enhanced by a stable work environment and a close linkage of incen-
tives to individual performance (thus a high level of worker motivation).
They also observed that consistency decreased as the time between pro-
duction periods increased, with one-half of the decay occurring in the
initial four to five months. This decrement in stability coefficients sug-
gests that reliability estimates based on short intervals might not gener-
alize to distantly separated time intervals. Alternatively stated, past per-
formance might be predictive of near-term, but not long-term, perfor-
mance. The decrement would be expected to be even greater if the per-
formance measure was multi-dimensional and the performance-reward
contingency was loose.
Several limitations in the studies conducted by Rothe (1978) and
Rambo et al. (1983) suggest that further investigation of performance
consistency is warranted. First, Rambo et al.’s examination of long-term
consistency was based on very small samples (n = 27 sewing machine
operators and 19 folders/packers). Additional research involving larger
samples is needed before much credence can be placed in these results.
Second, the Rothe (1978) and Rambo et al. (1983) studies were lim-
ited to a description of the incidence of inconsistency; they only spec-
ulated about the potential effect of learning or ability on consistency.
For example, Rambo et al. reasoned that the performance of experi-
enced employees should be more stable over time because they draw on
a well-established performance repertoire and, thus, are not as vulner-
able to job adjustment and idiosyncratic early learning curves as inex-
perienced employees. However, their sample was composed of experi-
enced employees only; hence, examination of this proposition was not
possible. With regard to a possible moderating impact of aptitude on
performance consistency, Rambo et al. used a curve-fitting procedure to
estimate the asymptotic values, which were assumed to reflect the con-
tribution (marginal boundary) of skill and/or aptitude to performance
consistency. However, they did not directly examine ability factors or
how the relationship between ability and performance might change over
time.
Ackerman (1989) has provided theoretical arguments for both a
moderating impact of aptitude on consistency and for differential va-
lidities over time, depending on the nature of the task. If high-aptitude
employees learn the task/job more quickly than low-aptitude employ-
ees, then aptitudes should influence performance consistency such that
high-aptitude subgroups will exhibit higher levels of consistency than
low-aptitude subgroups. He further argues that if the elements of a job
are routine, resulting in automatized information processing, early valid
predictors such as cognitive ability will show reduced validities as work-
ers learn and master the task/job. The limiting factors of performance
are perceptual or motor skills, which may show an increase in validity
over time.
Murphy (1989) also predicts that the relationship between ability and
performance will change over time, depending on the stage of job tenure.
Employees new to the job (or task demands) are in a “transition” stage
and must therefore learn new skills and tasks; in this stage, performance
depends largely on cognitive ability. When employees are in a “main-
tenance” stage, tasks are well-learned and individual differences in job
performance are not affected by differences in cognitive ability. Because
the causes of performance are different depending on the stage, the rel-
ative importance of ability as a cause of job performance varies across
the stages, resulting in validity changes over time.
A third limitation of the Rothe (1978) and Rambo et al. (1983) stud-
ies is that they considered only an objective criterion; the more com-
monly used subjective performance measures were not included in their
analyses of consistency. Rothe noted that subjective data might have
contributed additional information, but that it “would also have con-
tributed controversy because of the lack of objectivity” (p. 46). How-
ever, an assessment of performance consistency need not preclude sub-
jective performance measures. The only stipulation is that the perfor-
mance ratings pertain to specific dimensions of performance behaviors
or outcomes (Wernimont & Campbell, 1968). Moreover, the use of both
objective and subjective (repeated) measures of specific performance di-
mensions would provide some insight into the related issue of criterion
(rating) validity. Consider a case where a test-retest correlation between
dimensional ratings is high. Such rating consistency does not necessar-
ily indicate performance consistency. Rater error (i.e., first-impression,
halo) can produce rating consistency that is independent of actual (ob-
jective) performance consistency. In this case, the high reliability esti-
mate would actually reflect systematic error. A more informative ap-
proach is to examine the degree to which ratings of a specific perfor-
mance dimension, taken at different points in time, correlate with ob-
jective measures of the same performance dimension covering the same
period of time. Given high convergence among these different methods
of measurement, one would expect any changes in validity over time to
be similar for these matched criteria (see Smith, 1976).

Purpose of Present Study

The present study examines performance consistency in a field setting that meets the evidentiary conditions noted above. The perfor-
mance domain of interest here was outcomes (output rates) of employ-
ees working in a stable, routine job which explicitly linked pay to indi-
vidual output. The performance of both inexperienced and experienced
new hires was tracked over time using objective and subjective measures.
In addition, job aptitude information was obtained, enabling a direct in-
vestigation of the impact of ability and prior experience on performance
consistency. Based on the existing research and current debate, the fol-
lowing propositions were examined:
Proposition 1. Performance consistency will decrease as the time
interval between performance measurement occasions increases. This
proposition is a replication of Rambo et al. (1983) and posits that a
time-structured matrix of performance correlations will show a declining
simplex pattern as purported by Ghiselli (1956) and Austin et al. (1989).
Proposition 2. Performance consistency will be greater for experi-
enced employees than for inexperienced employees. This proposition
tests the assumption made by Rambo et al. (1983) that, in a recurring
task situation, job/task experience produces transferable “maintenance
behaviors” that effectively mitigate job adjustment, resulting in higher
stability over time for experienced versus inexperienced employees.
Proposition 3. Performance consistency will be greater for employ-
ees with high job aptitudes than for those with low job aptitudes. This
proposition tests the argument made by Cronbach and Snow (1977) and
Ackerman (1989) that high-aptitude employees learn the job faster and
therefore stabilize their performance at a much quicker rate than low-
aptitude employees.
Proposition 4. The relationship between ability and performance will
change over time, depending on the type of ability and the nature of the
task. Contrary to the arguments made by Henry and Hulin (1987) and
Austin et al. (1989) for a universally declining simplex pattern for ability-
performance relationships, we expect the validity of cognitive ability to
decline over time and the validity of psychomotor ability to increase,
given the routine nature of the task (Ackerman, 1989).
Proposition 5. Any changes in criterion-related validity over time
will be similar for “matched” objective and subjective measures of di-
mensional performance. This proposition runs counter to Henry and
H u h ’ s (1987, p. 458) assertion that “the decrement in validity appears
to be greater if performance is assessed objectively rather than by means
of supervisor or peer ratings.” Instead, we expect similar increments/
decrements/stability in validity, given the matched specificity of criteria
and predictors (Smith, 1976).

Method

The data reported in this study were gathered as part of a larger study conducted for the Virginia Employment Commission to evaluate the Va-
lidity Generalization (VG) Testing Procedure adopted by local Job Ser-
vice offices. One component of that research project was a validation
study of the General Aptitude Test Battery (GATB) for selecting sewing
machine operators. The present paper is not concerned with test valida-
tion, per se, but rather the stability of performance and the underlying
ability factors that might explain performance stability or instability over
time.

Participants

Job performance data and ability measures were collected for sewing
machine operators employed at five nonunionized garment manufac-
turing plants in the Southeast. All five sewing plants were owned by
the same company, produced the same kind of garments using similar
equipment and operating procedures, and operated under a uniform set
of management policies, procedures, and record-keeping. The sample
sizes varied across plants (80-338); however, the demographic and abil-
ity characteristics were quite similar. Because of the high degree of stan-
dardization of operations and operator similarity across plants, analyses
reported here were conducted on a combined sample.
Sewing machine operators performed a single operation during a
shift and were paid on a piece-rate basis. Although jobs were laid out
in a “production line,” each operator independently stitched bundles of
garment pieces from a large in-process inventory. As a result, the pro-
duction performance of any one operator was not dependent on the per-
formance of other operators. The piece-rate standards were determined
by industrial engineering studies typical of the industry. Thus, within the
limits of error of those studies, jobs were equated along a common pay
scale and differences in piece-rate (production) earnings reflect differ-
ences in production performance.
Operators’ work assignments (specific sewing operations) were made
by supervisors based on the plant production schedule as opposed to
either seniority or ability qualifications. As in all such garment plants,
there undoubtedly were some sewing operations that offered better earn-
ing opportunities, but there was no formal system to accommodate op-
erator preferences. Earnings temporarily decreased in some instances
when new models were introduced. However, such changes typically af-
fected all operators, and earnings levels quickly moved back to previous
levels.
The initial sample consisted of 932 operators hired during a 10-
month period. However, due to turnover and/or missing data, three sub-
samples were used for the analyses. The first subsample (N = 509) was
composed of employees for whom we had objective performance data
for at least the first 6 months on the job. This sample was used to ana-
lyze the first research proposition regarding the temporal decay in per-
formance consistency. Participants were all female; 67% were white and 33% were black. The average age was 26.4 years (range =
17-59) and the average amount of previous sewing experience was 12.9
months. Sixty-five percent of the participants had no previous sewing
machine operator experience at the time of hire. The demographic char-
acteristics of the study participants were very similar to those of the orig-
inal group of hirees (73% white; average age and experience were 26.5
years and 11.5 months, respectively; and 65% had no prior experience).
Note that the shrinkage from the hiree sample to sample 1 reflects early
turnover (within the first 6 months) and missing demographic data.
Because some of the operators in the first sample were not tested
and/or prior experience data were missing, a subsample of sample 1 (N
= 413) was used in the criterion-related validity analyses. The demo-
graphic characteristics of the two samples were very similar (68% white;
average age was 26, range = 17-59; average prior experience was 11.4
months; and 67% had no sewing experience). The third subsample (N =
224), drawn from sample 2, consisted of tested operators whose perfor-
mance was also rated on two occasions. This sample was used to compare
temporal changes in validity across different methods of performance
measurement. In this case, the sample reduction (from 413 to 224) re-
flects incomplete or missing performance appraisal data. The subsample
demographics were as follows: 74% white; average age was 26.2 (range
= 17-54); average prior experience was 9.2 months; and 65% had no
previous sewing experience.
The apparently high turnover rate for this company (43% of new
hirees quit within 6 months) is typical for this industry. However, it is
unlikely that turnover materially affected our findings. The “leavers”
did not differ markedly from the “stayers” in terms of average age (26.6
years), prior experience (9.4 months), or ability scores (49.1% on the
Job Family 5 composite of the GATB, which is described below), and
the same proportion (65%) had no previous sewing experience. Not
surprisingly, the average earnings of leavers were lower by 10-15%, but
the rate of improvement, as reflected in their monthly averages, was
similar to that of the stayers. Moreover, our analysis of turnover from
company documents indicated that approximately 50% left for reasons
unrelated to the job, a finding consistent with our prior experience in this
industry.

Procedure

The study employed a predictive validation design with multiple performance measures. The General Aptitude Test Battery (GATB) was
administered prior to employment; however, test results were not used
for screening or selection purposes. Performance data were collected pe-
riodically throughout the study; both objective and subjective measures
are described in detail below.
No significant changes occurred during the study in the organiza-
tion’s manufacturing procedures, job design, equipment, or industrial
engineering standards. As a result, we state with confidence that the
data reflect performance in a stable work environment on routine jobs.
Moreover, workers operated under a piece-rate system which presum-
ably produced constant (high) levels of employee motivation.
Performance data. The performance dimension of interest in this
analysis was output, an obviously critical dimension of success in a gar-
ment plant. Furthermore, individual operators had virtually complete
control over their production, it was readily measured, and both em-
ployees and supervisors were keenly aware of average hourly production
rates.
The objective measure of output was the average hourly production
earnings per week (total production earnings, divided by the number of
hours actually worked). This figure was obtained for each week from
date of hire. Production earnings reflected only the earnings for actual
hours worked and did not include any guarantees, time-not-worked, or
rework. Therefore, production errors by an operator were directly re-
flected in earnings in that the additional time required to rework a piece
was not available for production earnings.
The objective performance variables used in the analyses included:
(a) weekly production averages for each of the first 24 weeks on the job;
(b) monthly production averages (4 weeks) for each of the first 6 months,
derived from the weekly data; and (c) production averages for 3-month
time intervals, derived from the weekly data (mean output for the first
12 weeks on the job and the subsequent 12 weeks on the job).
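These derived variables are simple averages of the weekly records. As a minimal illustration (not the authors’ code), the construction might look as follows in Python, assuming a hypothetical file operator_weekly_earnings.csv with one row per operator and columns week_1 through week_24:

    import pandas as pd

    # Hypothetical layout: one row per operator; columns week_1 ... week_24 hold
    # average hourly production earnings for each week since the date of hire.
    df = pd.read_csv("operator_weekly_earnings.csv")
    weeks = [f"week_{i}" for i in range(1, 25)]

    # (b) Monthly production averages: Month m = mean of its 4 constituent weeks.
    for m in range(1, 7):
        df[f"month_{m}"] = df[weeks[(m - 1) * 4 : m * 4]].mean(axis=1)

    # (c) 3-month production averages: weeks 1-12 and weeks 13-24.
    df["months_1_3"] = df[weeks[:12]].mean(axis=1)
    df["months_4_6"] = df[weeks[12:]].mean(axis=1)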
The subjective measure of output was a supervisory rating of produc-
tion quantity, which was one of five dimensions of performance on which
operators were rated (quantity, quality, dependability, receptiveness to
training, flexibility). In addition, an overall (global) performance rating
was obtained. The rating dimensions were derived from a job analy-
sis and input from company officials and operators, but were used for
research purposes only. Because the purpose of this study was to ex-
amine performance consistency in terms of a specific, primary dimen-
sion of performance, the analyses did not include the other identified
performance dimensions. As noted above, quantity of production was
clearly a critical performance dimension in this industry, and interviews
with company officials, plant managers, and production supervisors con-
firmed that it was the primary determinant of operator success.
Quantity of work was defined as the average level of production over
a specified period of time. The rating scale described average production
levels on a 5-point scale: at or below minimum standard (1); above mini-
mum standard, but less than established company production goal (2); at
established company production goal (3); above established production
goal (4); among the best in the plant (5). Supervisors were trained to use
the rating instrument and were assured that the ratings would be used
for research purposes only. The ratings were conducted during work
hours in a separate room under the supervision of the researchers. For
the operators included in the third sample, the single-item rating vari-
ables included: (a) an initial rating of average production performance
after approximately 3 months on the job, (b) a subsequent rating of av-
erage production performance collected 3 months later, and (c) initial
and subsequent ratings of overall job performance. Although the rat-
ings of overall job performance were not of primary importance here,
they were included in our analyses in order to compare the relative pre-
dictive stability of global versus dimensional performance criteria. The
rating scale described overall job performance on a 6-point scale from
unacceptable (1) to exceptional (6), with each scale level briefly defined
in terms of performance relative to minimum standards. Because per-
formance consistency was the central issue in this study, estimates of cri-
terion reliability are presented in the Results section.
Predictors. Three predictors were included in the study: (a) previous
job experience as a sewing machine operator, measured by the number
of months; (b) cognitive ability, as measured by the validity generaliza-
tion scoring procedure of the United States Employment Service (a raw
score composite of the general, verbal, and numerical aptitude scales of
the GATB); and (c) psychomotor ability, also measured by the GATB
scoring procedure (a raw score composite of coordination, finger dex-
terity, and manual dexterity scales).

Results

Performance Characteristics

Table 1 presents means and standard deviations of the performance
and aptitude measures for the three samples. The mean output level
(weekly, monthly) increased rapidly in the initial weeks for all three sam-
ples, and as expected, exhibited a decreasing rate of improvement over
time. The average rating of output (Quantity) also increased slightly in
the second set of performance ratings. However, the global ratings re-
mained constant. Company policy specified a 12-week learning period
(on-the-job training) for all new hires to achieve a minimum standard
of $3.35 average hourly production earnings. While the majority of op-
erators met this standard within the desired time period, a substantial
minority were sub-standard. This fact, combined with the continued im-
provement in mean output levels, suggests that a significant amount of
learning occurred throughout the 6-month period. It is notable that although the average output more than doubled during the 6 months, the standard deviation remained stable. Some reduction in the range of output over time might have been expected as the differential between experienced and inexperienced hirees decreased.

TABLE 1
Means and Standard Deviations of Performance and Aptitude Variables
for the Three Samples

                           Sample 1        Sample 2        Sample 3
                           M       SD      M       SD      M       SD
Output variablesa
 Week 1                   $2.07    1.17   $1.98    1.02   $2.06    1.08
 Week 2                    2.42    1.23    2.35    1.13    2.44    1.15
 Week 3                    2.67    1.27    2.62    1.20    2.70    1.23
 Week 4                    2.89    1.22    2.84    1.14    2.90    1.09
 Week 8                    3.49    1.24    3.43    1.16    3.47    1.18
 Week 12                   3.88    1.25    3.84    1.16    3.87    1.15
 Week 16                   4.12    1.21    4.08    1.10    4.11    1.07
 Week 20                   4.32    1.21    4.28    1.11    4.34    1.06
 Week 24                   4.46    1.19    4.47    1.18    4.51    1.12
 Month 1                  $2.51    1.17   $2.45    1.07   $2.53    1.08
 Month 2                   3.29    1.21    3.25    1.12    3.31    1.11
 Month 3                   3.74    1.20    3.70    1.12    3.72    1.11
 Month 4                   4.01    1.15    3.98    1.06    4.01    1.04
 Month 5                   4.24    1.16    4.21    1.08    4.27    1.03
 Month 6                   4.40    1.15    4.41    1.12    4.45    1.06
 Months 1-3               $3.18    1.15   $3.13    1.06   $3.18    1.06
 Months 4-6                4.22    1.11    4.19    1.04    4.24    1.00
 Months 1-6                3.70    1.09    3.66    1.01    3.71    0.98
Rating variablesb
 Quantity rating #1        2.5     1.18    2.5     1.15    2.4     1.16
 Quantity rating #2        2.6     1.08    2.6     1.05    2.7     1.05
 Global rating #1          3.2     1.04    3.2     1.00    3.2     1.01
 Global rating #2          3.2     0.86    3.3     0.86    3.2     0.88
Aptitude variablesc
 Cognitive ability       272.1    34.41  272.5    34.05  276.6    33.20
 Psychomotor ability     324.4    45.44  324.5    45.15  330.3    38.79
 Job Family 5 composited  53.2%   25.73   53.3%   25.51   55.1%   22.83

a N = 509 (Sample 1), 413 (Sample 2), and 224 (Sample 3).
b N = 294 (Sample 1), 239 (Sample 2), and 224 (Sample 3) for the initial ratings; N = 487 (Sample 1), 395 (Sample 2), and 224 (Sample 3) for the follow-up ratings. The Quantity ratings were based on 5-point rating scales; the Global ratings were based on 6-point rating scales.
c N = 419 (Sample 1), 413 (Sample 2), and 224 (Sample 3).
d The Job Family 5 composite score was obtained from the GATB and is a norm-referenced score composed of psychomotor ability (56%) and cognitive ability (44%).
The data pertaining to aptitude scores substantiate the company’s as-
sertion that test scores were not used for selection purposes. The mean
Job Family 5 score ranged from 53-55% for the three samples with no
evidence of restriction in range. This score is a norm-referenced (per-
centile) score created from a weighted composite of the cognitive (44%)
and psychomotor (56%) ability measures (Hunter, 1983). Similar distri-
butions were observed when we examined the experienced and inexpe-
rienced subsamples. Hence, it is reasonable to assume that the samples
approximate a random sample from the applicant pools for the plants.

Consistency of Performance

Performance consistency was analyzed using both descriptive and inferential methods to provide evidence for propositions 1-3. Initially, the
general pattern, or trend, of consistency was described using time-period
intercorrelations (Pearson product-moment correlations) between adja-
cent and distantly-separated performance periods. Following the proce-
dure used by Rambo et al. (1983), a triangular matrix of fixed-interval
correlations was developed representing time lags (denoted by K) rang-
ing from adjacent weeks (K = 1, based on 23 week-by-week correlations)
to a 6-month time separation (K = 23, based on a single correlation be-
tween Week 1 and Week 24). Table 2 summarizes these results at se-
lected time intervals for the total sample and for the ability subgroups.
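For readers who wish to reproduce this kind of analysis, a brief Python sketch of the lag-K procedure is given below; it reuses the hypothetical weekly data frame from the Method section sketch, and the function name is illustrative:

    import pandas as pd

    def median_lagged_correlations(weekly: pd.DataFrame) -> pd.Series:
        """Median inter-week Pearson correlation at each time lag K.

        weekly: one row per operator, 24 columns ordered week 1..24.
        Lag K pairs week i with week i + K, giving 24 - K coefficients per
        lag (23 at K = 1, down to a single correlation at K = 23).
        """
        corr = weekly.corr()  # 24 x 24 product-moment correlation matrix
        n = corr.shape[0]
        return pd.Series(
            {k: pd.Series([corr.iloc[i, i + k] for i in range(n - k)]).median()
             for k in range(1, n)},
            name="median_r",
        )

    # e.g., median_lagged_correlations(df[weeks]) follows the logic of the
    # "Total" rows of Table 2.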
For the total sample, performance consistency was high when mea-
sured by adjacent, week-to-week correlations (median r = .92), yet declined when measured over distantly separated time periods (r = .55 for
6 months). This finding provides some preliminary support for our first
research proposition and is consistent with the results of Rambo et al.
(1983): The stability of performance decreases as the time lag increases.
However, our data show substantially lower levels of long-term consis-
tency than those reported by Rambo et al. (1983). When our analysis
was extended to cover 12 months, the coefficient was .25 for K = 51 (N = 82 operators); Rambo et al. (1983) reported r = .69 for K = 51
(N = 27 operators). The magnitude of these decrements suggests that
rank-ordered performance shifts significantly over time.
Table 2 also addresses the issue of whether ability factors moderate
performance consistency. The experience subgroups were composed of
those operators with some previous sewing experience (N = 138) ver-
sus those with no prior experience (N = 275). The high and low apti-
tude subgroups were comprised of the top and bottom quartiles on the
cognitive and psychomotor ability measures. The data in Table 2 pro-
vide weak preliminary support for our second proposition:
TABLE 2
Median Correlations Between Average Weekly Output at Different
Time-Lagged Intervals (K) for the Total Sample and for the Ability Subgroups

Samplea               K=1    K=3    K=7    K=11   K=15   K=19   K=23
Total
 Median r             .92    .87    .81    .75    .71    .66    .55
 Range                .04    .04    .08    .07    .08    .08     -
Prior experience
 Median r             .91    .86    .80    .74    .69    .62    .55
 Range                .07    .09    .10    .10    .09    .09     -
No experience
 Median r             .90    .85    .76    .68    .65    .61    .50
 Range                .06    .16    .15    .17    .12    .13     -
High cognitive
 Median r             .89    .83    .76    .67    .63    .57    .47
 Range                .09    .12    .10    .07    .06    .12     -
Low cognitive
 Median r             .91    .85    .74    .69    .66    .61    .55
 Range                .12    .10    .14    .22    .22    .16     -
High psychomotor
 Median r             .91    .87    .80    .71    .67    .66    .59
 Range                .09    .10    .14    .17    .08    .12     -
Low psychomotor
 Median r             .92    .88    .80    .75    .72    .67    .52
 Range                .05    .09    .11    .21    .17    .19     -

Note: The values of K refer to the number of intervening weeks. The number of correlation coefficients at each value of K is: 23 at K = 1 week; 21 at K = 3 weeks; 17 at K = 7 weeks; 13 at K = 11 weeks; 9 at K = 15 weeks; 5 at K = 19 weeks; and 1 at K = 23 weeks. The range reported here refers to the difference between the highest and lowest observed correlations for each time interval and sample.
a N = 509. The sample sizes for the Experienced and Inexperienced subgroups are 138 and 275, respectively; for High and Low cognitive ability, 101 and 102, respectively; and for High and Low psychomotor ability, both 102.

Previous job experience produced only slightly higher consistency over time, as evi-
denced by higher time-interval correlations and a smaller range of ob-
served correlations throughout the time separations. Although the di-
rection of correlational differences was consistent with our proposition,
the magnitude of those differences was small. With regard to aptitudes,
the prediction that performance consistency will be greater for high apti-
tude employees was not supported. In fact, the data suggest just the op-
posite: The median time-interval correlations for the low aptitude sub-
groups (both cognitive and psychomotor) were higher than those for the
high aptitude subgroups for most values of K.
The foregoing results suggest that the level of performance consis-
tency differs across time intervals and possibly across ability subgroups
(experience and aptitude). However, these data do not describe the na-
ture of the instability, nor do they indicate whether there are significant
differences in consistency across ability subgroups. Regression analyses
were employed as a means of examining the rate of change in stability
as a function of time separations. For this analysis, average correlations
were computed for each time interval K (via Fisher’s r-to-Z transforma-
tion and retransformation), and these averages were then regressed on
K. Estimates of Root Mean Square Error (RMSE) and R2 were com-
puted in order to determine model fit. The regression analyses were con-
ducted on the total sample and the experience and aptitude subgroups.
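A sketch of this averaging-and-regression procedure, including the Durbin-Watson check discussed below, might look as follows in Python with statsmodels (the authors’ actual software is not reported; corr is assumed to be the 24 x 24 week-by-week correlation matrix from the earlier sketch, as a NumPy array):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    def mean_r_by_lag(corr: np.ndarray) -> np.ndarray:
        """Average the lag-K correlations via Fisher's r-to-Z transformation,
        then back-transform the mean Z to r (one value per lag K = 1..23)."""
        n = corr.shape[0]
        return np.array([
            np.tanh(np.arctanh(np.array([corr[i, i + k]
                                         for i in range(n - k)])).mean())
            for k in range(1, n)
        ])

    # corr: e.g., df[weeks].corr().to_numpy() from the earlier sketches.
    r_bar = mean_r_by_lag(corr)                   # average correlation per lag
    K = np.arange(1, len(r_bar) + 1, dtype=float)
    ols = sm.OLS(r_bar, sm.add_constant(K)).fit()
    print(ols.params, np.sqrt(ols.mse_resid), ols.rsquared)  # slope, RMSE, R2
    print(durbin_watson(ols.resid))  # values well below 2 suggest positive
                                     # first-order autocorrelation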
The plot of average correlations on K revealed that a linear trend
(decline) fit the data well, a finding that sharply contrasts with the Rambo
et al. (1983) study in which a hyperbolic function best described their
data. A possible explanation for this difference is that the 6-month pe-
riod of the current study was too brief to identify the long term trend
revealed in the Rambo et al. study, which covered 42 months. However,
the data in Table 3 clearly suggest that the linear model is the best fit,
whereas Rambo et al. found that half of the total decrease in the mean
correlation occurred by week 18 and that the curvilinear trend was clearly
evident prior to 6 months. The ordinary least squares estimates for the
total sample presented in Table 3 revealed a significant decline in sta-
bility as the time interval increased (slope = -.01, p < .0001) and a
good model fit (RMSE = .01 and R2 = .99). However, a plot of resid-
uals on K revealed cyclical trends in the error terms, which is indicative
of first-order autocorrelation (i.e., the adjacent residuals were not inde-
pendent). The test for autocorrelation was significant (Durbin-Watson’s
d = .60, p < .01), thus necessitating a time-series model that could ac-
count for the autocorrelation and therefore improve the fit of the model
and the reliability of the model estimates. These results, shown in the
lower half of Table 3, were quite similar to the ordinary least squares estimates for the rate of change in stability (slope = -.01), the model error (RMSE = .01), and the proportion of variance in mean stability that was accounted for by time separations (R2 = .99). On the basis of
these results, our proposition 1 was supported: There was a significant
decline in performance consistency as the interval between performance
measurement occasions increased.
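The paper does not name its estimation software, but one analogous way to obtain such AR1-adjusted estimates is feasible generalized least squares with first-order autoregressive errors, for example statsmodels’ GLSAR applied to the same r_bar and K as in the sketch above:

    import statsmodels.api as sm

    # Feasible-GLS regression of mean stability on lag K with AR(1) errors.
    # rho=1 requests a first-order autoregressive error structure;
    # iterative_fit() alternates between estimating rho and the coefficients.
    ar1 = sm.GLSAR(r_bar, sm.add_constant(K), rho=1).iterative_fit(maxiter=10)
    print(ar1.params)  # AR(1)-adjusted intercept and slope (cf. Table 3, lower half)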
Table 3 also presents the regression results for the experience and
aptitude subgroups, thus providing evidence pertaining to propositions
2 and 3. With regard to the impact of experience on performance con-
sistency, the plot of average correlations on K revealed a linear trend
for both groups, and the initial regression analysis indicated a slightly
stronger decline in stability for the inexperienced subgroup. However,
due to significant autocorrelation in both subgroups,
TABLE 3
Regression Model Estimates of Average Performance Consistency
Over Time Intervals (K) for the Total Sample and for the Ability Subgroups

                        No ex-     Prior ex-  Low        High       Low psy-   High psy-
Sample        Total     perience   perience   cognitive  cognitive  chomotor   chomotor
OLS estimatesa
 Intercept     .91      .88        .90        .87        .89        .92        .89
 Slope        -.01     -.02       -.01       -.01       -.02       -.01       -.01
 RMSE          .01      .02        .01        .02        .01        .02        .01
 R2            .99      .98        .99        .95        .99        .96        .98
AR1 estimatesb
 Intercept     .92      .89        .91        .89        .89        .93        .90
 Slope        -.01     -.02       -.02       -.02       -.02       -.02       -.01
 RMSE          .01      .01        .01        .01        .01        .01        .01
 R2            .99      .99        .99        .99        .99        .98        .99

Note: K = 23 time intervals.
a OLS refers to Ordinary Least Squares estimates. The model estimates were significant at p < .0001.
b AR1 refers to first-order autoregressive estimates. The model estimates were significant at p < .0001.

autoregressive time series models were computed (lower half of Table 3), which revealed
equivalent estimates of the decline in stability over K and equivalent
estimates of model fit. In addition, although the mean level of per-
formance consistency across K was slightly higher for the experienced
subgroup, this difference was not statistically significant (t = 1.289, ns).
Thus, these results did not support our research proposition 2: Perfor-
mance consistency was not significantly higher for the experienced em-
ployees, and there was a significant decline in stability for experienced
as well as inexperienced employees.
The regression analyses of the cognitive and psychomotor ability sub-
groups produced similar results. Performance stability declined signifi-
cantly across K for the high and low subgroups, and there was no sig-
nificant difference between high and low groups in the mean level of
stability across K for either cognitive or psychomotor ability (t = -.83
and -.44, respectively). Hence, our research proposition 3 was not sup-
ported: Both high and low aptitude employees were characterized by
similar and significant declines in stability over time.

Consistency of Validity

Validity coefficients for successive performance periods were examined to determine whether a simplex pattern would also be observed in
the coefficients for the cognitive and psychomotor ability scores. The
criterion variables were monthly production averages, where Month 1
equalled the average of weeks 1-4, Month 2 the average of weeks 5-8, and so forth. In addition, production averages were calculated for 3-month time intervals, again based on the weekly data. These data are shown in Table 4.

TABLE 4
Correlations Among Output Criteria and Predictors Over Time

                          1     2     3     4     5     6     7     8
Criterion variablesa
 1. Month 1 average       -    .91   .84   .75   .69   .66   .95   .73
 2. Month 2 average            -     .92   .82   .76   .72   .98   .80
 3. Month 3 average                  -     .91   .83   .78   .96   .87
 4. Month 4 average                        -     .92   .83   .86   .95
 5. Month 5 average                              -     .91   .79   .98
 6. Month 6 average                                    -     .75   .95
 7. Months 1-3 average                                       -     .83
 8. Months 4-6 average                                             -
Predictive validityb
 Prior experience        .27   .23   .21   .13   .15   .16   .25   .15
 Psychomotor ability     .16   .17   .18   .16   .18   .20   .17   .19
 Cognitive ability       .09   .09   .10   .12   .17   .16   .09   .16

Note: N = 413.
a All criteria intercorrelations were significant at p < .0001.
b Validity coefficients were significant as follows: r = .10 to .11, p < .05; r = .12 to .18, p < .01; r > .18, p < .0001.
The intercorrelations among adjacent monthly criterion variables de-
clined as the time interval increased, producing a simplex pattern similar
to that in Table 2. However, the validity of psychomotor ability remained
somewhat stable over the 6-month period, suggesting that it was a con-
sistent (and statistically significant) predictor of both initial (trainee) and
later job performance (r varied between .16 and .20). In contrast, cognitive ability was a better predictor of post-training job performance than of initial job performance (r increased from .09 to .16). We extended
our analysis to determine whether these patterns persisted throughout
the first year on the job. Based on reduced samples, we found that the
validity of cognitive ability increased to .22 at Month 12 (n = 103, p <
.05) and that the validity of psychomotor ability was relatively stable at
.20 through Month 10 and then declined (Month 11: r = .13, n = 163, ns; Month 12: r = .10, n = 103, ns). These results do not support research
proposition 4: The validity of cognitive ability increased rather than de-
creased, and that of psychomotor ability was relatively stable over time.
Although the relationship of prior experience to performance was not
included in any of our propositions, the decrease in the coefficient for
prior experience from Month 1 through Month 4 is notable, particularly
in light of the increase for cognitive ability.
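In terms of the hypothetical data frame sketched in the Method section (with predictor columns added under assumed names), the validity coefficients reported in Table 4 amount to a predictor-by-month correlation matrix:

    # Correlate each predictor with each monthly output average (cf. Table 4,
    # bottom panel); column names are assumed, not taken from the study files.
    predictors = ["prior_experience", "cognitive_ability", "psychomotor_ability"]
    months = [f"month_{m}" for m in range(1, 7)]
    validity = df[predictors + months].corr().loc[predictors, months]
    print(validity.round(2))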

Rating Validity

Table 5 compares predictive validity coefficients across different methods of performance measurement and presents correlations be-
tween the two sets of ratings and related output data. The predictive
validity results in Table 5 showed a slight increase in the correlation
between cognitive ability and ratings of quantity, while the correlation
between psychomotor ability and quantity ratings remained constant.
These changes in validities were similar to those for the objective perfor-
mance criteria (Table 4), and are consistent with proposition 5: Temporal
changes in validity are similar across methods of measurement when the
criteria are conceptually congruent.
Table 5 also provides data relevant to the issue of criterion reliabil-
ity and validity. When the correlations between the first and second
sets of ratings are viewed as test-retest reliability coefficients, the coeffi-
cients (.53 for the global ratings, .50 for ratings of quantity) indicate con-
siderable rating error. However, when the quantity ratings are aligned
with the corresponding production period (i.e., quantity rating #1 with
Months 1-3, quantity rating #2 with Months 4-6), the resulting correla-
tions are .61 and .65, indicating the extent to which the ratings of quan-
tity accurately reflected actual production output. Hence, the evidence
for rater accuracy is inconsistent with the test-retest reliability estimate,
providing a good example of the limitations of stability coefficients as
indicators of the quality of performance measures.
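In the same hypothetical notation, the contrast drawn here is between the test-retest correlations of the ratings and their period-matched convergence with the objective output data:

    # Test-retest stability of the ratings (reported as .53 and .50):
    print(df["global_rating_1"].corr(df["global_rating_2"]))
    print(df["quantity_rating_1"].corr(df["quantity_rating_2"]))

    # Period-matched convergence of the quantity ratings with objective output
    # over the same weeks (reported as .61 and .65):
    print(df["quantity_rating_1"].corr(df["months_1_3"]))
    print(df["quantity_rating_2"].corr(df["months_4_6"]))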

Discussion

Overall, the performance consistency analyses suggest that relative performance is not stable over time, but declines systematically as a func-
tion of the time interval between performance measurement occasions,
regardless of the experience or aptitude characteristics of the employ-
ees. Our first proposition, that performance consistency decreases over time, was strongly supported. The data unambiguously reflected the
simplex pattern claimed by Austin et al. (1989) and Henry and Hulin
(1987). Moreover, the decay in stability coefficients was insensitive to
the operators’ prior experience and aptitudes, contrary to propositions
2 and 3. However, the results were contrary to our expectations with
regard to the consistency of predictive validity. Although we predicted
changes in validity over time, the validity of cognitive ability steadily in-
creased rather than decreased during most of the first year, while that of
psychomotor ability was relatively constant. It is noteworthy that these
changes in validity were observed for both objective and subjective mea-
sures of production quantity, as we had predicted.

TABLE 5
Correlations Between Objective-Subjective Criteria and
Predictors-Criteria Over Time

Variables^a                 1     2     3     4     5     6
1. Global rating #1              .53   .59   NA    .42   NA
2. Global rating #2                    .44   .62   .31   .41
3. Quantity rating #1                        .50   .61   NA
4. Quantity rating #2                              .49   .65
5. Months 1-3 average                                    .80
6. Months 4-6 average

Predictive validity^b
Prior experience           .19   .18   .21   .17   .37   .28
Cognitive ability          .28   .27   .15   .17   .07   .17
Psychomotor ability        .18   .15   .13   .13   .11   .13

Note: N = 224. The cells denoted “NA” reflect illogical relationships due to the temporal
nature of the ratings.
^a All criteria intercorrelations were significant at p < .001.
^b Predictive validity coefficients were significant as follows: r = .13-.16, p < .05; r =
.17-.24, p < .01; r > .24, p < .001.

the magnitude, and possibly the consistency, of predictive validity dif-
fered for the global ratings. However, repeated ratings over a longer
period of time would be required before reaching any conclusions about
differential trends between global and dimensional criteria. At any rate,
the predictive validity findings here clearly call into question the univer-
sality of the simplex phenomenon, which was purported to exist for both
performance stability and validity coefficients.
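For readers who wish to inspect this pattern in their own data, the following sketch builds the time-structured correlation matrix and averages the stability coefficients at each lag, showing the characteristic simplex decline. The data are simulated and the first-order autoregressive form is an illustrative assumption, not the model underlying the study.

```python
# Minimal sketch (simulated data): the simplex pattern as a property of a
# time-structured correlation matrix. The AR(1) form is an illustrative
# assumption, not the model fitted in the study.
import numpy as np

rng = np.random.default_rng(2)
n_workers, n_periods = 509, 24   # e.g., weekly measures over roughly 6 months
rho = 0.9                        # week-to-week carryover in relative standing

perf = np.empty((n_workers, n_periods))
perf[:, 0] = rng.normal(size=n_workers)
for t in range(1, n_periods):
    # Part of each worker's standing carries over; the rest shifts
    perf[:, t] = rho * perf[:, t - 1] + np.sqrt(1 - rho**2) * rng.normal(size=n_workers)

stability = np.corrcoef(perf, rowvar=False)  # period x period correlations

# Mean stability coefficient at each lag: declines as the interval grows
for lag in (1, 4, 8, 12, 16, 20):
    print(f"lag {lag:2d}: mean r = {np.diag(stability, k=lag).mean():.2f}")
```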
The central question in the debate is whether performance is dy-
namic, not whether performance inconsistency is mirrored in validity
coefficients. As Ackerman (1989), Murphy (1989), and Barrett and
Alexander (1989) correctly note, there is no logical reason to expect a
universal pattern of diminishing validity coefficients. On the contrary,
performance instability is likely reflective of the interaction of a wide ar-
ray of personal, situational, and temporal factors. The central finding of
this study is unequivocal: The rank-ordered production performance of
the sewing machine operators in this sample continuously shifted over
a 6-month period of time, with strong evidence that these performance
changes persisted throughout the first year. These results provide ad-
ditional evidence for the instability of skilled performance and high-
light the need for models of job performance that incorporate the phe-
nomenon of dynamic performance.
As noted earlier, performance on a routine skilled task could be at-
tributable to different aptitudes at different stages of skill acquisition
(Ackerman, 1989). However, our results directly conflict with Acker-
man’s hypothesis that cognitive ability decreases in importance as the
task becomes well-learned. Perhaps the task in this instance was too
complex to allow for the automatized cognitive processing proposed by
Ackerman. If so, this would indicate that Ackerman’s theory is extremely
limited in its scope given the routine nature of the sewing machine op-
erator job. Alternatively, perhaps cognitive ability played a continuing
role as the operators discovered learning challenges in performance ac-
tivities associated with production output. Whereas early performance
for these operators might rely more on mastering the basic “mechanics”
of the job, continued performance improvement might depend upon the
ability to balance demands for quantity and quality, as well as respond
effectively to model changes and other more subtle demands of the job.
This “deferred learning” would serve to increase the complexity of the
job during later performance periods. The relatively higher correlation
coefficients between cognitive ability and the global ratings provide ad-
ditional evidence for this interpretation.
The findings reported here are supportive of Murphy’s (1989) dy-
namic model of job performance. In contrast to Ackerman’s focus on
task complexity (i.e., automatized vs. controlled processing demands),
Murphy argues that the relative causal factors of performance over time
depend on task activities (i.e., transition vs. maintenance phases), which
vary as a function of job tenure, job, and person. Although Murphy
defines both performance and ability in general terms, his model can
account for both the performance and the validity instability that were
evidenced here. With regard to performance instability, Murphy rea-
sons that skilled performance is unstable over time because it is “over-
learned.” As a result, dispositional (vs. ability) factors will have a greater
impact on performance, and some dispositional factors tend to be less
stable than abilities (p. 195). Although we were unable to directly con-
firm this prediction, it is quite plausible that the inconsistency observed
in this study is attributable to fluctuations in the motivation level
of individuals. The present study made the simplifying (and probably
unrealistic) assumption that the incentive (piece-rate) system produced
a heightened and constant level of motivation, but other motivational
factors were not controlled. Kanfer and Ackerman (1989) offer a theo-
retical framework and provide evidence of an interaction between cog-
nitive ability and motivation during skill acquisition. In a similar vein,
Helmreich, Sawin, and Carsrud (1986) suggest the existence of a “hon-
eymoon effect” on performance inconsistency in routine jobs: Experi-
ence and ability are important determinants of performance during the
initial employment period, but after the honeymoon is over, motivation
assumes greater importance. Hollenbeck and Whitener (1988) also ref-
erence support for a performance model involving interactions among
personality, motivation, and ability. A variety of extrinsic and intrinsic
factors could have potentially affected motivation and, thus, the tempo-
ral stability of performance of the operators. The present study accounts
for only one of them, albeit an important one. Improved understanding
of performance dynamics will require direct measurement or control of
multiple motivation variables in future studies.
Regarding validity instability, Murphy (1989) predicts that the rela-
tive importance of cognitive ability as a cause of job performance will
be maximized during transition stages when workers are new to a job,
when the major duties or responsibilities of a job change, or when work-
ers “cannot rely on past experience but rather must rely on sound judg-
ment to perform their jobs” (p. 190). Therefore, occasional changes in
the tasks performed or the activities required to accomplish major tasks
will necessitate further learning, which will trigger a transition stage (a
concept consistent with the notion of “deferred learning” offered pre-
viously). The later learning challenges, where performance depends on
“sound judgment” as opposed to experience or practice, would repre-
sent transition stages for these operators. This perspective is also consis-
tent with the findings and rationale of McDaniel, Schmidt, and Hunter
(1988). In a meta-analysis of the relationship between experience and
job performance, they found that experience declines in importance over
time. Our results suggest a similar conclusion. For the first three months,
the typical training period for the operators, the validity of prior expe-
rience declined steadily but remained the best single predictor. During
Months 4-6, prior experience and cognitive ability were roughly equiva-
lent in validity, and in later months cognitive ability had higher validity.
If cognitive ability and experience both influence performance indirectly
through job knowledge (Schmidt, Hunter, & Outerbridge, 1986), then
one might expect cognitive ability to assume greater importance as the
difference in experience shrinks with time on the job. That is, the rela-
tive contribution of cognitive ability to job knowledge will increase as the
knowledge differential attributable to prior experience decreases. More-
over, McDaniel et al. suggest that this process will be particularly strong
in relatively simple task situations where the advantage provided by prior
experience can be offset in a shorter time frame, especially for samples
with low mean levels of job experience. Both of these conditions fit the
present study. Therefore, our findings support the plausibility of this ex-
planation and suggest that further analysis of the temporal relationship
between experience, ability, and performance is warranted.
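A small simulation helps show that this account is internally coherent. The weights below are invented for illustration; the only structural assumptions are that ability and prior experience affect performance through job knowledge (Schmidt et al., 1986) and that the experience-based knowledge differential decays with tenure.

```python
# Illustrative simulation with invented weights: if ability and prior
# experience both reach performance through job knowledge, shrinking the
# experience-based knowledge differential raises the validity of ability
# while the validity of prior experience falls.
import numpy as np

rng = np.random.default_rng(3)
n = 509
ability = rng.normal(size=n)
experience = rng.normal(size=n)  # stand-in for prior job experience

for month, w_exp in [(1, 1.0), (4, 0.6), (8, 0.3), (12, 0.1)]:
    # w_exp: the (assumed) decaying knowledge advantage of prior experience
    knowledge = 0.5 * ability + w_exp * experience + 0.5 * rng.normal(size=n)
    performance = knowledge + rng.normal(size=n)
    r_abil = np.corrcoef(ability, performance)[0, 1]
    r_exp = np.corrcoef(experience, performance)[0, 1]
    print(f"Month {month:2d}: r(ability) = {r_abil:.2f}, r(experience) = {r_exp:.2f}")
```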
We cannot dismiss the possibility that instability in performance might
be partially attributable to changes in the context and/or tasks. Product
(model) changes did occur periodically in each of the plants. To the de-
gree the industrial engineering system did not properly compensate for
these changes, the relative earnings could have been affected. However,
such changes typically affected the entire production line, resulting in
a brief general decrease in production as the workers adjusted to new
sewing operations. Moreover, these changes occurred at irregular inter-
vals. Hence it is unlikely that they were a primary determinant of the
steady decline in stability coefficients. Other, more subtle, situational
factors might have differentially affected the operators, but in our ex-
tensive exposure to the plants and workers we did not identify them. In
short, we believe both the context and the findings here point to individ-
ual level factors as the primary determinants of performance inconsis-
tency.
The research reported here, together with a growing body of evi-
dence for instability in performance and validity (see Murphy, 1989),
suggests several research avenues that need to be explored further. First,
more longitudinal research needs to be conducted that examines re-
peated measures of both performance and ability. Repeated measures of
global and specific performance criteria will provide evidence for crite-
rion validity and equivalence (Binning & Barrett, 1989; Ironson, Smith,
Brannick, Gibson & Paul, 1989; James, 1973; Smith, 1976). Ideally, mul-
tiple methods of performance measurement would be utilized to better
our understanding of the “conceptual criterion.” Repeated measures of
ability will enable us to make more informed assumptions about the rel-
ative stability of ability factors as well as the stability of predictive validity
(Hulin et al., 1990; Murphy, 1989).
Second, the research on performance consistency needs to be ex-
tended to models and analyses of within-person performance consis-
tency. Kane’s (1982) work on performance distributions is an isolated at-
tempt to understand the influence of ability and effort on individual per-
formance measurement and resultant organizational decision-making.
Individual consistency research would also provide information about
the frequency and duration of transition and maintenance stages that
workers experience. Murphy (1989) assumes that most jobs possess mul-
tiple transition stages that “occur at different times and last for different
durations” for different people (p. 195). Because progression through
these stages varies across individuals as well as jobs, it is important on
both theoretical and practical grounds that we are able to identify worker
characteristics that might aid in the prediction of transition versus stable
stages of performance (Murphy, 1989).
Third, we agree with Murphy that consistency research should focus
on the job environment rather than the job itself. At the organizational
or work-group level, this type of research would identify external (struc-
tural) changes that elicit transition stages across an intact work group
and therefore extend the literature on organization and job design. At
the individual level, job environment research would complement the
role theory and citizenship behavior research by identifying the “fit” be-
tween job and worker characteristics that enables employees to move to
maintenance stages more quickly and effectively.
In conclusion, the data presented here, as well as the findings re-
viewed by both parties to the current debate regarding dynamic criteria,
provide a strong body of evidence that the relative performance of work-
ers changes considerably over time. Evidence of this type was dismissed
as inconsequential by Barrett et al. (1985) if such differences were not
also revealed in validity coefficients. However, our findings are consis-
tent with the numerous studies reporting varying temporal patterns of
validity coefficients as summarized by Barrett and Alexander (1989). We
therefore agree with them that the goal of research should be “. . .to de-
termine for each situation why the predictor-performance relationship
drops, remains the same, or increases over time rather than simply rely-
ing on an hypothesized process of ability acquisition such as the simplex”
(p. 610). Studies incorporating repeated measures of both predictors
and criteria will be needed to develop a better understanding of this re-
lationship. In the meantime, we believe that the attempt of Barrett and
Alexander to place the burden of proof on advocates of dynamic criteria
is inappropriate. Given the existing literature on the effects of individual
differences and situational influences on behavior, the burden of proof
should be on those who espouse temporal stability in relative perfor-
mance. An assumption of dynamic criteria, defined here as systematic
changes (instability) in relative performance, appears to be more appro-
priate theoretically and empirically than the static assumption.

REFERENCES

Ackerman PL. (1989). Within-task intercorrelations of skilled performance: Implications
    for predicting individual differences? Journal of Applied Psychology, 74, 360-364.
Alvares KM, Hulin CL. (1972). Two explanations of temporal changes in ability-skill
    relationships: A literature review and a theoretical analysis. Human Factors, 14,
    292-308.
Austin JT, Humphreys LG, Hulin CL. (1989). Another view of dynamic criteria: A critical
    reanalysis of Barrett, Caldwell, and Alexander. PERSONNEL PSYCHOLOGY, 42, 583-596.
Barrett GV, Alexander RA. (1989). Rejoinder to Austin, Humphreys, and Hulin: Critical
    reanalysis of Barrett, Caldwell, and Alexander. PERSONNEL PSYCHOLOGY, 42, 597-612.
Barrett GV, Caldwell MS, Alexander RA. (1985). The concept of dynamic criteria: A
    critical reanalysis. PERSONNEL PSYCHOLOGY, 38, 41-56.
Barrett GV, Caldwell MS, Alexander RA. (1989). The predictive stability of ability re-
    quirements for task performance: A critical reanalysis. Human Performance, 2(3),
    167-181.
Bass BM. (1962). Further evidence on the dynamic character of criteria. PERSONNEL
    PSYCHOLOGY, 15, 93-97.
Binning JF, Barrett GV. (1989). Validity of personnel decisions: A conceptual analysis of
    the inferential and evidential bases. Journal of Applied Psychology, 74, 478-494.
Cronbach LJ, Snow RE. (1977). Aptitudes and instructional methods. New York: Irvington.
Dunnette MD. (1963). A modified model for test validation and selection research. In
    Dreher G, Sackett P (Eds.), Perspectives on employee staffing and selection (pp. 9-15).
    Homewood, IL: Richard D. Irwin.
Ghiselli EE. (1956). Dimensional problems of criteria. Journal of Applied Psychology, 40,
    1-4.
Ghiselli EE, Haire M. (1960). The validation of selection tests in light of the dynamic
    nature of criteria. PERSONNEL PSYCHOLOGY, 13, 225-231.
Guion RM. (1976). Recruiting, selection, and job placement. In Dunnette M (Ed.), Hand-
    book of industrial and organizational psychology (pp. 777-828). Chicago: Rand McNally.
Guion RM, Gibson WM. (1988). Personnel selection and placement. In Rosenzweig MR,
    Porter LW (Eds.), Annual Review of Psychology, 39, 349-374.
Helmreich RL, Sawin LL, Carsrud AL. (1986). The honeymoon effect in job performance:
    Temporal increases in the predictive power of achievement motivation. Journal of
    Applied Psychology, 71, 185-188.
Henry RA, Hulin CL. (1987). Stability of skilled performance across time: Some general-
    izations and limitations on utilities. Journal of Applied Psychology, 72, 457-462.
Hollenbeck JR, Whitener EM. (1988). Reclaiming personality traits for personnel selec-
    tion: Self-esteem as an illustrative case. Journal of Management, 14, 81-91.
Hulin CL, Henry RA, Noon SL. (1990). Adding a dimension: Time as a factor in the
    generalizability of predictive relationships. Psychological Bulletin, 107, 328-340.
Hunter JE. (1983). Test validation for 12,000 jobs: An application of job classification and
    validity generalization analysis to the General Aptitude Test Battery. U.S. Employment
    Service Test Research Report #45. Washington, DC: U.S. Department of Labor.
Ironson GH, Smith PC, Brannick MT, Gibson WM, Paul KB. (1989). Construction of a job
    in general scale: A comparison of global, composite, and specific measures. Journal
    of Applied Psychology, 74, 193-200.
James LR. (1973). Criterion models and construct validity for criteria. Psychological
    Bulletin, 80, 75-83.
Kane JS. (1982, November). Rethinking the problem of measuring performance: Some
    new conclusions and a new appraisal method to fit them. Paper presented at the
    Fourth Johns Hopkins University National Symposium on Educational Research,
    Washington, DC.
Kanfer R, Ackerman PL. (1989). Motivation and cognitive abilities: An integrative/
    aptitude-treatment interaction approach to skill acquisition. Journal of Applied
    Psychology, 74, 657-690.
McDaniel MA, Schmidt FL, Hunter JE. (1988). Job experience correlates of job performance.
    Journal of Applied Psychology, 73, 327-330.
Murphy KR. (1989). Is the relationship between cognitive ability and job performance
    stable over time? Human Performance, 2, 183-200.
Prien EP. (1966). Dynamic character of criteria: Organizational change. Journal of Applied
    Psychology, 50, 501-504.
Rambo WW, Chomiak AM, Price JM. (1983). Consistency of performance under stable
    conditions of work. Journal of Applied Psychology, 68, 78-87.
Rambo WW, Chomiak AM, Rountree RI. (1987). Temporal intervals and the estimation of
    the reliability of work performance data. Perceptual and Motor Skills, 64, 791-798.
Ronan WW, Prien EP. (1966). Toward a criterion theory: A review and analysis of research
    and opinion. Greensboro, NC: The Richardson Foundation.
Rothe HF. (1978). Output rates among industrial employees. Journal of Applied Psychology,
    63, 40-46.
Schmidt FL, Hunter JE, Outerbridge AN. (1986). The impact of job experience and
    ability on job knowledge, work sample performance and supervisory ratings of job
    performance. Journal of Applied Psychology, 71, 432-439.
Smith PC. (1976). Behaviors, results, and organization effectiveness: The problem of crite-
    ria. In Dunnette M (Ed.), Handbook of industrial and organizational psychology (pp.
    745-775). Chicago: Rand McNally.
Wernimont PF, Campbell JP. (1968). Signs, samples, and criteria. Journal of Applied
    Psychology, 52, 372-376.
