
Review

Reviewed Work(s): The Flanagan Industrial Tests by John C. Flanagan


Review by: John L. Horn
Source: Journal of Educational Measurement, Vol. 3, No. 2 (Summer, 1966), pp. 191-196
Published by: National Council on Measurement in Education

Stable URL: https://www.jstor.org/stable/1433899

JOURNAL OF EDUCATIONAL MEASUREMENT
VOLUME 3, NO. 2
SUMMER, 1966

THE FLANAGAN INDUSTRIAL TESTS

College students and adults; tests 1960-1963; Manual, 1965; 18 scales plus recommended composite scores: Arithmetic, Assembly, Components, Coordination, Electronics, Expression, Ingenuity, Inspection, Judgment and Comprehension, Mathematics and Reasoning, Mechanics, Memory, Patterns, Planning, Precision, Scales, Tables, and Vocabulary, plus the composite measures: general ability, verbal ability, and quantitative ability. 3 hours 37 minutes. John C. Flanagan; Science Research Associates.

FIT (Flanagan Industrial Tests) is an offspring of FACT (Flanagan Aptitude Classification Tests, 1953; 1958). Depending upon the connotations of FACT which first spring to mind, this might or might not seem like a happy association. Yet FIT probably is a worthy addition to the world's population of tests (even in these days of population explosions). Certainly there are some interesting and worthwhile features about its development.

As is probably well known, the construction of the FACT batteries was based in large part upon knowledge-and a philosophy-developed in test construction projects which Professor Flanagan directed when he headed the WW II Army Air Force Aviation Psychology Program-a program concerned principally with improving personnel selection and placement in the Air Force. The general approach characterizing this work derives more remotely from Truman Kelley's efforts in the 1920's and 1930's to define and classify abilities according to occupations wherein they are particularly needed. In Professor Flanagan's refinement it is called the job-element approach.

A job element is defined in terms of "critical behaviors" involved in a number of jobs or occupations. According to the FACT Manual (1959), "These critical behaviors are obtained by determining . . . which behaviors . . . make a difference with respect to on-the-job success and failure. The critical behaviors are then classified into job elements in terms of initial hypotheses regarding the . . . nature of the aptitudes involved. The next step is to test the hypotheses that specific variations in job performance are correlated with variation on the related tests (after which) . . . definitions of the job elements are prepared." It is important to note that most (if not all) of the tests presently available in the published FACT and FIT batteries represent job elements common to several occupations. The batteries are said to provide a fairly comprehensive sampling of the job elements thus far identified.

According to the test author, FIT was ". . . designed specifically for use with adults in personnel selection programs for a wide variety of jobs" (Manual). In some respects the battery seems particularly well suited for this purpose. The 18 tests represent a wide sampling of human abilities-abilities which, on the face of it at least, seem to be important in the real jobs of real people. And although Professor Flanagan has not used factor analytic procedures (or the like) in test construction, he has been much more careful than most test constructors to ensure


that his 18 tests represent truly independent attributes. Moreover-and this is refreshing-the Manual provides data bearing pointedly on this issue: the multiple correlation of each test with the 17 other tests in the battery ranges from .30 to .67, and all are low enough (relative to the assumed reliabilities, discussed below) to suggest that each test is reliably measuring something that is not measured by the other tests. Would that more test manuals contained this kind of information.
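The multiple correlations the Manual reports can be recovered directly from a battery's intercorrelation matrix: the squared multiple correlation of test i with the remaining tests is 1 - 1/(R⁻¹)ᵢᵢ, the i-th diagonal element of the inverted correlation matrix. A minimal sketch, using an invented three-test matrix rather than Flanagan's actual data:

```python
import numpy as np

def multiple_correlations(R):
    """Multiple correlation of each variable with all the others,
    from a full correlation matrix R, via SMC_i = 1 - 1/(R^-1)_ii."""
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    return np.sqrt(smc)

# Hypothetical 3-test intercorrelation matrix (illustrative only).
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])

print(multiple_correlations(R))  # one multiple R per test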

But in some important respects the promise of the FIT battery is only promise. The evidence which could indicate its true validity simply has not been gathered. With respect to this lack of evidence, there are several points the wary potential user should consider.

The Manual tells us that the FIT tests "like the FACT series . . . are based on
the identified job elements . . . ," but the Manual is not at all clear in indicating
the source research in which these job elements were identified. It appears that this
research is essentially that upon which the FACT was based and that this was
mainly the research conducted under Flanagan's direction during WW II. This
being the impression, one wonders if the job-elements thus far identified are im-
portant elements of today's occupations.

In a sense, of course, this argument is specious, for the tests refer to fairly
general abilities, not specific elements of jobs that would have changed since WW
II. In this respect the tests assess attributes like the primary mental abilities isolated
by means of factor analyses; indeed, it is evident that the FACT and FIT batteries
contain measures of many of the ability factors so far identified, as described in
comprehensive reviews of replicated findings (French, Ekstrom & Price, 1963;
Guilford & Merrifield, 1960). The FIT Manual makes no reference to this line of
research, however. But the principal virtue of the job-element approach would
seem to be that it promises to identify features of job performance not identified
by other means and, in particular, features of today's jobs as they are performed
today. One might expect, for example, that this approach would lead to the identi-
fication of interesting new abilities related to performance in computer program-
ming, a very important occupation in today's world but one which has come into
being only in the last 15 or 20 years (i.e., since WW II). Yet there is no reference in
the FIT Manual to research showing which-if any-elements of such jobs have
been identified. In fact, the FIT Manual does not clearly direct the reader to the
sections of the FACT Manuals wherein the relationships between FACT scores
and job performances are indicated (Your FACT Scores-and What They Mean,
1953; Interpreting Test Scores, 1956; Technical Report, 1959) although tables
showing the comparability between FIT tests and FACT tests are provided.

A closely related point has to do with the norms and practical validities (i.e.,
relevancies) available for use with FIT. These are of two kinds: (1) those derived by
administering the FIT tests to persons in different academic programs and deter-
mining percentile norms and relationships between test scores and academic per-
formance in these groupings, and (2) those obtained by "equating" scores on a given
FIT test with scores on a comparable FACT test and using this as a basis for treat-
ing the information available on the FACT as applicable to the FIT. In either case,


as concerns the author's intent to provide a battery for use with adults in personnel selection programs, the information presently available leaves much to be desired.

Percentile norms are provided on a sample of 3,359 twelfth-grade students and a sample of 701 entering freshmen at a select men's university. Neither of these samples would appear to be representative of adults encountered ". . . in personnel selection programs for a wide variety of jobs." The sample of freshmen was divided into five groups corresponding to the kinds of programs offered at the university. The programs were: (1) Arts and Sciences and Arts and Engineering, (2) Business, (3) Chemistry and Chemical Engineering, (4) Electrical Engineering, and (5) Other Engineering.

Means for each of the groups and for each test are provided. The differences in patterns among them are interpreted as indicating differences between the vocational groups, but analyses to indicate the significance (or insignificance) of these differences are not given. Correlations and step-wise multiple correlations and regression equations are given for fall and spring grades in four of the five academic programs. The samples in a few of these analyses are small, so some of the reported regression coefficients are very likely unstable and potentially misleading. Nevertheless, the results, overall, indicate that some of the tests have relevance for predictions of academic performance. But, of course, it does not follow from these results that the tests have relevance for predictions of occupational success, even in fields seemingly related to the college programs, much less for a "variety of jobs."

The "equipercentile" method (Flanagan, 1951) was used to help equate corresponding FIT and FACT tests. In this method, assuming that two tests contain the same kinds of items and that the correlation between them is sufficiently high to indicate measurement of the same attribute, scores which cut at about equal percentiles are defined as comparable. Insofar as tests thus equated are in fact equivalent, this procedure allows one to use norm data available on one test (FACT) with another test (FIT) for which these data are lacking. The FACT Manuals provide stanines, percentile norms and regression equations for large samples drawn from a wide range of adult occupations.
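The equating logic just described can be sketched in a few lines: a form-X score is assigned the form-Y score that cuts off the same proportion of its norm sample. This is a bare illustration with made-up score arrays (the function name and data are assumptions for the sketch, not Flanagan's tables, and practical equipercentile work also smooths the score distributions):

```python
import numpy as np

def equipercentile_equate(x_norms, y_norms, x):
    """Map a score x on form X to the form-Y score at the same
    percentile rank, given a norm sample for each form."""
    # Percentile rank of x within the form-X norm sample (0-100).
    pr = 100.0 * np.mean(np.asarray(x_norms) <= x)
    # Form-Y score cutting off about the same proportion of its sample.
    return float(np.percentile(y_norms, pr))

# Hypothetical norm samples: form Y uses a scale twice as coarse.
x_norms = np.arange(1, 101)        # scores 1..100 on form X
y_norms = 2 * np.arange(1, 101)    # scores 2..200 on form Y

print(equipercentile_equate(x_norms, y_norms, 50))  # a mid-range X score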

The FIT tests are, in general, about half as long as the FACT tests. In the FIT battery there is one test-Electronics-not found in FACT, whereas FACT contains two tests not found in FIT. As concerns the intended use with adults, the main difference between FIT and FACT is that the difficulty levels of the former have been increased.

The correlations between corresponding FIT and FACT tests are in some cases
very low. For example, FIT Inspection correlates only .28 with FACT Inspection.
This could result because of a difference in difficulty level for the two tests. But
even so, it makes use of the results from equating FIT and FACT scores a dubious
procedure. Also, some FIT tests correlate rather high with non-corresponding FACT
tests or with other FIT tests-i.e., high relative to the correlation between corres-
ponding FIT and FACT tests. For example, although FIT Planning correlates .38
with FACT Planning, it correlates .43 with FACT Ingenuity. Explanation of this


kind of outcome in terms of differences in difficulty levels is not at all apparent. To use FACT norms as applicable to a FIT test in a case like this would seem to be a very dubious procedure indeed.

For at least three FIT tests-Inspection, Memory and Planning-the above-mentioned problems are very real. In this respect, the Manual states that "even where the correlations are low . . . the equivalence tables can be used to indicate comparable levels of performance on the two tests. However, the fact that the corresponding tests are measuring somewhat different functions, or are at different levels, should be considered in any interpretation of the scores." Thus in these cases particularly, and with respect to other tests to a lesser extent, the user should be cautious in attempting to interpret FIT tests in terms of the norms, validities, etc., found for FACT tests.

If tests are intended for use with ". . . adults in personnel selection programs for a wide variety of jobs," it would seem desirable to have data indicating the relationships between test performances and such variables as age, speediness and education. This is not to say that a test is necessarily invalid if it involves speediness to some extent or if it discriminates against older persons. In some tests "speed" is an essential aspect of the attribute measured (Inspection, for example), and it is to be expected that older persons will perform more poorly in some kinds of tests. But to use tests in personnel selection for a wide variety of jobs, one should want to know about these matters, for it is certain that speediness is not essential to performance in some jobs where an ability like that measured in a speeded test might seem to be involved; likewise, there are situations where one would want to make an adjustment to remove age differences found on the test but not related to predicting job success. Yet the Manual provides no information of the kinds here specified.

The reliabilities of the FIT tests were estimated indirectly using several kinds of
information. No test-retest data were gathered and, since there is (as yet) no parallel
form for FIT, equivalency coefficients could not be obtained. Since the tests are
speeded, it is, as is noted in the Manual, ". . . inappropriate to compute the usual
Spearman-Brown and Kuder-Richardson estimates of the reliability coefficients"
(Manual, p. 12). Lacking these avenues of approach to estimates of reliability, Pro-
fessor Flanagan has utilized the correlations between corresponding FIT and FACT
tests, the inferences one can draw from the similarity in pattern of correlations
which corresponding FIT and FACT tests have with other measures, and the in-
ference one can draw from the multiple correlation which a test has with other tests.

The correlations between corresponding FIT and FACT tests range from .28 to
.79. Since the FIT tests are, in general, at a higher level of difficulty than the FACT
tests, these coefficients are almost certainly underestimates of equivalency relia-
bilities. The multiple correlations for the various tests with other tests vary from .30
to .68. These are somewhat inflated by least-squares capitalization on chance varia-
tion, but their lowness also reflects the fact that the tests were designed to be fairly
independent. Hence, again the suggestion is that reliabilities are almost certainly
above .30 to .80. The patterns of correlations with other tests for corresponding


FIT and FACT tests are in many cases strikingly similar, and the correlations between two FIT tests are often nearly the same size as the correlations between corresponding FACT tests. Taken together, this evidence suggests that the reliabilities are surely non-zero, almost certainly all above .3, some probably appreciably lower than for corresponding FACT tests, and perhaps mostly ranging from about .50 to .85. It would seem, therefore, that the reliabilities are adequate for many uses for which the test is intended (viz., institutional decisions). One should, of course, be cautious about using the test for purposes for which it is not intended-viz., for individual decisions.

There are a few other rather minor points which, however, merit mentioning. For example, this reviewer objects to a description of a test as "self-administering" when, in fact, administration requires that one adhere closely to the time limits set for various subtests, as in the FIT battery. There is in psychology a convention of describing as "self-administering" a test for which such complete "directions are given in the test itself that the tester needs only to . . . start and stop the entire test at the right time" (English & English, 1958). However, after one has started and stopped his stop watch some 17 times, as required in administration of the FIT battery, he is likely to derive the notion that the ease of administration implied by the English and English definition doesn't quite apply and that the "self" referred to in the description of the FIT as "self-administering" is the tester! The test certainly is not self-administering in the sense that one could hand it to a job applicant with instructions to take it into the next room and return with it in X minutes. This, irrespective of dictionary definitions, would seem to be the implicit meaning of the term self-administering.

The Manual provides suggestions for combining tests, or using single tests, to give estimates of "general ability," but no good rationale for these suggestions is given. It so happens that the suggested composite-"Judgment and Comprehension, Mathematics and Reasoning, and Vocabulary"-is similar to the crystallized intelligence dimension defined in recent research (Cattell, 1963; Horn, 1965; Horn & Cattell, 1966a; 1966b). This is no doubt a good predictor of academic performance, as pointed out in the Manual, but it is questionable whether it is the best single measure to use in personnel selection with adults having widely varying educational backgrounds. The other tests said to have ". . . broad usefulness . . . general intellectual ability . . . ," viz., "Ingenuity, Expression," and one other, have not been identified as representative (on their own) of "general ability" in research of which this reviewer is aware. Rather, such tests represent abilities of a lower order than "general ability" in hierarchical organization theories (e.g., Vernon, 1950), or they represent primary (not general) abilities in multiple-factor theories (e.g., French, Ekstrom & Price, 1963). The test author may intend that the three tests in combination with others (not specified) may define a general ability dimension, but no research is cited to support this suggestion. In fact, assuming that the term "general ability" is not completely equivocal, the Manual provides no evidence to support a contention that the three tests, either singly or in combination, measure a distinct "general ability."


While some of the statements made here are critical, they should not be interpreted to mean that the FIT tests are of low quality or that the Manual misleads. In fact, the tests seem to be exceedingly well put together, claims for the FIT are not generally over-stated, and much information rarely found in a test manual is found in this one. Several of the cautions noted above are at least adumbrated in the Manual, for example. Probably the most damning thing that can be said about the test is that not enough data have been accumulated to show that it is highly useful in the kinds of situations for which it is intended. This should give caution to potential users who would jump from face validity or relevancy in other situations to a belief in the test's relevancy for use in the potential user's situation, but it should not discourage use of the test. As is pointed out in the Manual, "For personnel selection purposes each company needs to determine its own local standards." The test seems to be well-suited for research aimed in this direction.

REFERENCES

CATTELL, R. B. Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 1963, 54, 1-22.
ENGLISH, H. B. AND ENGLISH, A. C. A comprehensive dictionary of psychological and psychoanalytical terms. New York: Longmans, Green & Co., 1958.
FLANAGAN, J. C. Technical report. Flanagan aptitude classification tests. Chicago: Science Research Associates, 1959.
FLANAGAN, J. C. Interpreting test scores. Flanagan aptitude classification tests. Chicago: Science Research Associates, 1956.
FLANAGAN, J. C. Your FACT scores-and what they mean. Chicago: Science Research Associates, 1953.
FLANAGAN, J. C. Units, scores and norms. In E. F. Lindquist (Ed.), Educational measurement. Washington: American Council on Education, 1951.
FRENCH, J. W., EKSTROM, R. B. AND PRICE, L. A. Manual for kit of reference tests for cognitive factors. Princeton, New Jersey: Educational Testing Service, 1963.
HORN, J. L. Fluid and crystallized intelligence: A factorial and developmental study of structure among primary mental abilities. Unpublished doctoral dissertation, University of Illinois, 1965.
HORN, J. L. AND CATTELL, R. B. Refinement and test of the theory of fluid and crystallized intelligence. Journal of Educational Psychology, 1966 (in press).
HORN, J. L. AND CATTELL, R. B. Age differences in primary mental abilities. Journal of Gerontology, 1966 (in press).
VERNON, P. E. The structure of human abilities. New York: Wiley, 1950.

JOHN L. HORN
University of Denver
