Abstract
This report discusses problems of data aggregation in personnel selection, using data
from two recent studies in the fields of aviation psychology and traffic psychology. In
the first study, 99 military pilot applicants were tested with a comprehensive test
battery. To determine the predictive validity of the chosen test battery, artificial
neural networks, linear discriminant analysis and logistic regression analysis were
used as methods of statistical judgment formation. A global evaluation of the
applicants’ performance in a standardized flight simulator served as the criterion
measure. The results of this study demonstrate that artificial neural networks
outperformed classical methods of statistical judgment formation with regard to
classification rate and validity coefficient. In the second study, a comprehensive test
battery measuring driving-related abilities was administered to 222 respondents. A
global evaluation of the respondents’ performance in a standardized driving test served
as the criterion measure. Similar to the results obtained in study 1, artificial neural
networks outperformed classical approaches to statistical judgment formation with
regard to classification rate, validity coefficient and separability of safe and less safe
drivers on the individual level. Based on these results, it can be concluded that
artificial neural networks are a useful method in personnel selection that increases the
objectivity and validity of judgments derived from standardized test batteries.
Theoretical Introduction
The input layer consists of a number of units equal to the number of predictor
variables, while the output layer represents the criterion measure. Between these
two layers, one or more so-called "hidden layers" can be positioned. This general
structure is often referred to as a multi-layer perceptron (Anderson & Rosenfeld, 1988;
Bishop, 1995; Kinnebrock, 1992; Rojas, 2000; Warner & Misra, 1996). According to
Kinnebrock (1992), a single hidden layer often suffices in practical applications. The
number of units in the hidden layer can be chosen freely and largely determines the
complexity and generalizability of the artificial neural net. Mielke (2001) pointed out
that a higher number of hidden-layer units enables the artificial neural net to adapt
more closely to the data at hand. However, this comes with an increased risk of
overfitting, i.e. adapting to peculiarities of the training sample, which can hinder the
generalization of the newly constructed artificial neural net to other data sets.
Therefore, determining the number of hidden-layer units is of high importance
(Mielke, 2001). As can be seen in figure 1, an artificial neural net features connections
between the units of the three layers. The individual units within a layer can be
connected with all units of the adjacent layer, which is called a complete feed-forward
connection. Thus each unit transmits its information to all units of the adjacent layer.
The information transmitted to a unit of the adjacent layer is weighted. The main aim
of constructing an artificial neural net lies in the iterative optimization of these weights.
As this shows, the procedure for constructing an artificial neural net differs clearly
from the estimation of a classical linear model such as discriminant analysis or
regression analysis. Artificial neural nets are generally under-specified because of their
comparatively large number of path coefficients. They thus have negative degrees
of freedom, and the individual path coefficients cannot be uniquely estimated,
since more than one solution for the weights of the individual paths is conceivable.
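The non-uniqueness of the weights can be illustrated with a small sketch in plain Python. It assumes logistic hidden units and a single logistic output unit, and all weights are made-up values. Swapping two hidden units, together with their incoming and outgoing weights, gives a different weight vector that produces exactly the same output, so the data alone cannot single out one solution. The helper `n_parameters` (a hypothetical name) counts weights plus bias terms: for eight test scores, five hidden units and, as an assumption, two Softmax output units, it gives 57 free parameters, compared with nine coefficients (eight weights plus a constant) for a linear discriminant function.

```python
import math
from typing import List


def logistic(z: float) -> float:
    """Logistic (sigmoid) activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def n_parameters(n_in: int, n_hidden: int, n_out: int) -> int:
    """Weights plus bias terms of a fully connected n_in-n_hidden-n_out net."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def net_output(x: List[float], w_hidden: List[List[float]],
               b_hidden: List[float], w_out: List[float], b_out: float) -> float:
    """Complete feed-forward pass: each hidden unit receives a weighted sum of
    all inputs, the output unit a weighted sum of all hidden units."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return logistic(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical weights for a 2-2-1 net.
w_hidden, b_hidden = [[0.5, -1.0], [2.0, 0.3]], [0.1, -0.2]
w_out, b_out = [1.5, -0.7], 0.05

# The same net with its two hidden units swapped (rows, biases, outgoing weights).
w_hidden_sw, b_hidden_sw = [[2.0, 0.3], [0.5, -1.0]], [-0.2, 0.1]
w_out_sw = [-0.7, 1.5]

x = [0.8, 1.2]
print(net_output(x, w_hidden, b_hidden, w_out, b_out))
print(net_output(x, w_hidden_sw, b_hidden_sw, w_out_sw, b_out))  # same value
print(n_parameters(8, 5, 2))  # 57 (assumes two Softmax output units)
```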
Instead of an estimation algorithm, artificial neural nets therefore use so-called learning
algorithms, which optimize the weights in an iterative process by strengthening
beneficial paths and weakening others. A range of learning algorithms can be used to
train a newly constructed artificial neural net, back-propagation being one of the best
known. Artificial neural nets are thus to be understood as a diagnostic heuristic.
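The learning principle can be sketched as a minimal online back-propagation loop in plain Python. The sketch assumes one hidden layer of logistic units, a single logistic output and a cross-entropy error, and trains on the XOR problem, a toy data set chosen only because a linear model cannot solve it. Learning rate, number of epochs, initialization and data are illustrative assumptions; the two studies reported below used the scaled conjugate gradient and QuickProp algorithms rather than this plain gradient-descent update.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(data: List[Tuple[List[float], int]], n_hidden: int = 4,
                   lr: float = 0.5, epochs: int = 2000, seed: int = 0):
    """Online back-propagation for one hidden layer of logistic units and one
    logistic output unit (cross-entropy error). Returns the weights and the
    mean error per epoch. All settings are illustrative assumptions."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, t in data:
            # Forward pass.
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(w1, b1)]
            y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
            total -= t * math.log(y + 1e-12) + (1 - t) * math.log(1 - y + 1e-12)
            # Backward pass: error signal at the output unit ...
            d_out = y - t
            # ... propagated back to each hidden unit via its outgoing weight.
            d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Gradient step: each path is strengthened or weakened according
            # to its contribution to the error.
            for j in range(n_hidden):
                w2[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
                b1[j] -= lr * d_hid[j]
            b2 -= lr * d_out
        history.append(total / len(data))
    return (w1, b1, w2, b2), history


# Hypothetical toy data: XOR.
xor = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]
weights, history = train_backprop(xor)
print(f"mean error: first epoch {history[0]:.3f}, last epoch {history[-1]:.3f}")
```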
Method
In a first phase of the selection process, all pilot applicants were administered a
comprehensive standardized test battery. The test battery covered the areas of
inductive thinking (AMT: Hornke, Etzel & Rettig, 2003), spatial perception (A3DW:
Gittler, 2002) and attention (COG: Wagner & Karner, 2003). Moreover, the tests also
measured the candidates’ reactive stress tolerance (DT: Schuhfried, 1998), verbal
memory (VERGED: Etzel & Hornke, 2003a), visual memory (VISGED: Etzel & Hornke,
2003b) and psychomotor coordination (SMK: Bauer, Guttmann, Leodolter & Leodolter,
2002). The statistical analyses always use the main variables of the respective test.
In the AMT, A3DW, VISGED and VERGED, for instance, these are the person parameters
according to the Rasch model. In the Cognitrone (COG), the variable mean time correct
rejection was used. In the Determination Test (DT), the number of correct responses
was used. Psychomotor coordination is covered by the SMK test scores mean angle
deviation and time in ideal range. In a second selection phase, data were collected on
the candidates' general performance in the flight simulator. On the basis of these data,
the candidates were subdivided into two groups: suited and less suited candidates. In
the following analysis, candidates classified as D (less suitable) were assigned to the
unsuitable group, as this group would otherwise have included only four candidates,
which would have made an evaluation of the psychological test battery impossible with
this categorization. Of the pilot applicants, 53.54 percent received a positive global
evaluation and are thus considered successful.
Sample
The sample encompasses 104 members of the German Federal Army who were undergoing
pilot training. Complete data are available for 99 candidates. All candidates are men
between 16 and 25 years of age, with an average age of 20.4 years and a standard
deviation of 1.85 years. One of them (1%) had completed just 9 years of school but no
vocational training, while 19 candidates (19.2%) had completed a vocational school.
Altogether 74 candidates (74.7%) held a high-school leaving certificate with university
entrance qualification, and five candidates (5.1%) had graduated from university or
college.
Results
The discriminant analysis was calculated with the program SPSS 10.0. The results show
that the prerequisite of homogeneity of the variances and covariances was met (Box's M:
F=1.363, p=.072). The analysis yielded a discriminant function that does not separate
the two groups significantly (Wilks' Lambda=.851, df=8, p=.059). In this analysis, a
total of 69.7% of the sample is correctly classified. This includes 81.1% of the suitable
pilot applicants and 56.5% of the unsuitable pilot applicants. The chance rate amounts
to 53.5%. This results in a validity coefficient of r=.390.
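The reported percentages can be traced back to a 2×2 classification table. The sketch below computes classification rate, sensitivity, specificity and chance rate from such a table. The cell counts are not given in the text; they are reconstructed here from the reported percentages (53 suitable and 46 unsuitable applicants, since 53.54% of 99 is 53) and should be read as an assumption. The chance rate is taken to be the hit rate of always predicting the larger group, which reproduces the reported 53.5%.

```python
from typing import Dict


def classification_summary(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """tp/fn: suitable applicants classified as suitable/unsuitable;
    tn/fp: unsuitable applicants classified as unsuitable/suitable."""
    n = tp + fn + tn + fp
    return {
        "classification_rate": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),   # hit rate among the suitable
        "specificity": tn / (tn + fp),   # hit rate among the unsuitable
        # Assumed definition: always predicting the larger of the two groups.
        "chance_rate": max(tp + fn, tn + fp) / n,
    }


# Counts reconstructed (assumption) from the discriminant-analysis percentages.
summary = classification_summary(tp=43, fn=10, tn=26, fp=20)
for name, value in summary.items():
    print(f"{name}: {100 * value:.1f}%")   # 69.7%, 81.1%, 56.5%, 53.5%
```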
To ensure the stability of this result, a jackknife validation is carried out.
Jackknife validations are a commonly used procedure to examine the stability of
results when no second independent data set is at hand (Brown & Wicker,
2000; Hagemeister, Scholz, & Westhoff, 2002). In the jackknife validation, the
classification rate amounts to 54.50 percent, with a chance rate of 53.50 percent. This
equals a validity coefficient of r=.348. Figure 2 shows the distribution of the probability
of receiving a positive evaluation of one’s performance in the flight simulator according
to the jackknife validation of the discriminant analysis.
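A jackknife (leave-one-out) validation can be sketched generically: each case is left out once, the model is refitted on the remaining cases, and the left-out case is classified by the refitted model. The `fit`/`predict` pair is a placeholder; for illustration, a simple nearest-centroid classifier on made-up one-dimensional data stands in for the discriminant analysis actually used, which is not reproduced here.

```python
from typing import Callable, Dict, List, Sequence


def jackknife_rate(X: List[Sequence[float]], y: List[int],
                   fit: Callable, predict: Callable) -> float:
    """Leave-one-out: refit without case i, then classify case i."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += int(predict(model, X[i]) == y[i])
    return hits / len(X)


# Hypothetical stand-in classifier: assign a case to the nearest class mean.
def fit_centroids(X: List[Sequence[float]], y: List[int]) -> Dict[int, List[float]]:
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


X = [[0.0], [0.2], [0.1], [1.0], [0.9], [1.1]]   # made-up scores
y = [0, 0, 0, 1, 1, 1]
print(jackknife_rate(X, y, fit_centroids, predict_centroid))   # 1.0
```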
The neural network was calculated with the program Matlab 6 (Nabney,
2002). The neural network at hand is a multi-layer perceptron with one hidden layer
of five units. The number of hidden-layer units was determined by comparing
various network architectures using the criterion outlined by Häusler
and Sommer (2006). The input layer encompassed eight units representing the
individual test scores. The output layer represents the criterion variable. The neural
network is equipped with a complete feed-forward connection. The transformation
function used is Softmax, an activation function that is especially suited for
categorical data (Bridle, 1990). Essentially, it is a multinomial generalization of the
logistic function whose outputs can be interpreted as a posteriori probabilities. The
training algorithm used here is the back-propagation algorithm "scaled conjugate
gradient". This algorithm is recommended in particular for non-linear optimization tasks
with a large number of weights (Masters, 1995). Altogether 500 iterations were used in
the training phase. Using this artificial neural net, a total of 79.8% of the sample is
classified correctly. The chance rate amounts to 53.5%. Of the suitable pilot applicants,
83.0% are classified correctly, as are 76.1% of the unsuitable pilot applicants. The
validity coefficient amounts to r=.650.
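The Softmax transformation turns the activations of the output units into values that are positive and sum to one, which is why they can be read as a posteriori probabilities. A minimal sketch follows; the activation values are made up. With two output units, Softmax reduces to the logistic function of the difference between the two activations.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    """Softmax over output-unit activations; results are positive, sum to 1."""
    m = max(z)                              # subtract the maximum for stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


p = softmax([2.0, 0.5])                     # two hypothetical output activations
print([round(v, 3) for v in p])             # [0.818, 0.182]
# Two-class case: equal to the logistic function of the activation difference.
print(math.isclose(p[0], 1.0 / (1.0 + math.exp(-(2.0 - 0.5)))))   # True
```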
To examine the stability of this result, a jackknife validation is carried out
(Dorffner, 1991; Michie, Spiegelhalter & Taylor, 1994). The classification rate in the
jackknife validation amounts to 73.7%. This equals a validity coefficient of r=.600.
Figure 3 shows the distribution of the probability of receiving a positive evaluation of
one’s performance in the flight simulator according to the jackknife validation of the
artificial neural net.
If only classifications made with a probability of < .25 or > .75 are taken into
consideration, 61.6% of the candidates can be classified. In this case, the
classification rate is 88.5%. The majority of correct classifications are thus made with
high certainty, while incorrect classifications tend to be made with rather low certainty.
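The restriction to confidently classified cases can be expressed as a simple filter on the predicted probabilities: only cases with a probability below the lower bound or above the upper bound are classified, and the classification rate is computed among them. The function name and the probabilities and outcomes below are hypothetical; the bounds .25/.75 follow study 1, while study 2 uses .30/.70.

```python
from typing import List, Tuple


def certainty_band(probs: List[float], labels: List[int],
                   lower: float = 0.25, upper: float = 0.75) -> Tuple[float, float]:
    """Classify only cases with p < lower (negative) or p > upper (positive).
    Returns (share of cases classified, classification rate among them)."""
    kept = [(p, t) for p, t in zip(probs, labels) if p < lower or p > upper]
    if not kept:
        return 0.0, float("nan")
    hits = sum((p > upper) == (t == 1) for p, t in kept)
    return len(kept) / len(probs), hits / len(kept)


# Hypothetical predicted probabilities and observed outcomes (1 = positive).
probs = [0.1, 0.2, 0.5, 0.6, 0.8, 0.9, 0.95, 0.3]
labels = [0, 1, 1, 0, 1, 1, 0, 0]
coverage, rate = certainty_band(probs, labels)
print(f"{coverage:.1%} classified, {rate:.1%} correct")   # 62.5% classified, 60.0% correct
```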
Method
The variables used as predictors of driving behavior were Gaining an Overview from
the Tachistoscopic Traffic Perception Test (Biehl, 1996), General Intelligence from the
Adaptive Matrices Test Form S2 (Hornke, Etzel & Rettig, 2003), Correct Responses
from the Determination Test Form S1 (Schuhfried, 1998), Motor Time and Reaction
Time from the Reaction Test Form S3 (Schuhfried & Prieler, 1997), the Mean Time for
Correct Rejection in the Cognitrone Form S1 (Wagner & Karner, 2003) and the Field of
View and Tracking Deviation from the Peripheral Perception Test (Schuhfried, Prieler &
Bauer, 2002). In the following sections, this test battery will be referred to as test
battery PLUS.
In addition to completing the above tests, each subject also took a standardized
driving test. The driving test took place over a previously defined route and lasted
approximately 45 minutes. The driving test used in Vienna was the “Vienna Driving
Test” (Risser & Brandstätter, 1985), while in Bad Tölz the “Bad Tölz Driving Test”
(Burgard, 2004) was used. The measure of driving behavior in road traffic was the
mean of the global assessments of two independent observers on a five-point scale
defined a priori. An average global assessment of 3.33 was defined as the cut-off value.
The dichotomized global assessment of driving behavior in the standardized driving
test served as the criterion variable in the subsequent analysis. Using this cut-off
value, the driving behavior of 60.4% of the sample received a positive assessment.
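Forming the criterion variable amounts to averaging the two observers' ratings and applying the cut-off. The sketch assumes that lower ratings are better and that a mean exactly at 3.33 counts as positive; the text specifies neither, so both are flagged as assumptions in the code. The function name and the ratings are hypothetical.

```python
from typing import List, Tuple

CUTOFF = 3.33  # cut-off for the mean global assessment, as reported


def positive_assessment(rating_a: float, rating_b: float,
                        lower_is_better: bool = True) -> bool:
    """Average the two observers' global ratings and dichotomize at the cut-off.
    ASSUMPTION: the scale direction (lower_is_better) and the handling of a
    mean exactly equal to the cut-off are not specified in the source."""
    mean = (rating_a + rating_b) / 2
    return mean <= CUTOFF if lower_is_better else mean >= CUTOFF


# Hypothetical ratings of four drivers by two observers.
ratings: List[Tuple[float, float]] = [(2, 3), (4, 4), (3, 4), (1, 2)]
criterion = [positive_assessment(a, b) for a, b in ratings]
print(criterion)                                           # [True, False, False, True]
print(f"{sum(criterion) / len(criterion):.1%} positive")   # 50.0% positive
```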
Sample
The sample consisted of 164 (74%) men and 58 (26%) women aged 19 – 91 with an
average age of 59 and a standard deviation of 18. The median age was 64. Many of
the subjects were therefore middle-aged or elderly. The age variable did not, however,
add any incremental validity in the subsequent analysis. Some of the subjects
were drivers who had already committed traffic offences. Participation in the study
was, however, voluntary. A total of 39 people (18%) had completed compulsory
schooling or basic secondary school but without completing vocational training (EU
educational level 2), 96 people (43%) had completed vocational training or a course at
a technical college (EU educational level 3), 35 people (16%) had a school-leaving
qualification at university entrance level or a qualification from a technical university
(EU educational level 4) and 52 people (23%) had a university degree (EU educational
level 5).
The logistic regression was calculated with the program SPSS 10.0. Using the method
“Enter”, the analysis resulted in a -2 log likelihood value of 191.46 (Chi²=37.60,
p<.001). Altogether 72.9% of the respondents were classified in accordance with the
global evaluation of their performance in the standardized driving test, which results in
a validity coefficient of r=.350. The chance rate amounts to 60.4%. Correctly classified
were 85.8% of the respondents with a positive global evaluation and 53.4% of the
respondents with a negative global evaluation of their driving performance. Thus the
sensitivity can be regarded as rather high, while the specificity is rather low, resulting
in an imbalance between sensitivity and specificity of the predictions made by this
classical method of statistical judgment formation.
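The reported Chi² is a likelihood-ratio statistic: the difference between the -2 log likelihood of a model containing only the constant and that of the model with all predictors entered at once (the "Enter" method). The sketch below fits a logistic regression by simple batch gradient ascent on made-up data and computes both quantities. SPSS estimates the coefficients with its own iterative maximum-likelihood procedure, so this illustrates the quantities, not the software's algorithm; learning rate and number of epochs are assumptions.

```python
import math
from typing import List


def predict_proba(b: List[float], x: List[float]) -> float:
    """P(positive) under a logistic model; b[0] is the constant."""
    z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """All predictors entered simultaneously; batch gradient ascent on the
    mean log-likelihood, starting from all-zero coefficients."""
    n, k = len(X), len(X[0])
    b = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, t in zip(X, y):
            r = t - predict_proba(b, x)       # residual of case x
            grad[0] += r
            for j in range(k):
                grad[j + 1] += r * x[j]
        b = [bj + lr * g / n for bj, g in zip(b, grad)]
    return b


def minus_2ll(probs: List[float], y: List[int]) -> float:
    """-2 log likelihood of observed outcomes y under predicted probabilities."""
    return -2 * sum(math.log(p) if t else math.log(1 - p) for p, t in zip(probs, y))


# Hypothetical one-predictor data (classes overlap, so the fit stays finite).
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b = fit_logistic(X, y)
model = minus_2ll([predict_proba(b, x) for x in X], y)
base = sum(y) / len(y)                        # constant-only model
null = minus_2ll([base] * len(y), y)
print(f"-2LL = {model:.2f}, Chi² = {null - model:.2f}")
```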
To ensure the stability of these results, a jackknife validation is carried out.
The classification rate in the jackknife validation amounts to 69.8%, with a chance rate
of 60.4%. This equals a validity coefficient of r=.340. According to the jackknife
validation, 85.8% of the respondents with a positive global evaluation and 45.5% of the
respondents with a negative global evaluation of their driving performance in the
standardized driving test are classified correctly. Figure 4 shows the distribution of
the probability of receiving a positive evaluation of one’s driving performance according
to the jackknife validation of the logistic regression.
The artificial neural network was calculated using the program NN Predict (Häusler,
2004). The network used was a multi-layer perceptron with one hidden layer and a
complete feed-forward connection. The activation function Softmax was used as the
transformation function; its outputs can be interpreted as a posteriori probabilities.
QuickProp (Fahlman, 1988) was used as the learning algorithm, with 10,000 iterations.
Using this artificial neural net, a total of 86.5% of the sample is classified correctly.
The chance rate amounts to 60.4%. Of the respondents with a positive global
evaluation, 97.0% are classified correctly, as are 81.8% of the respondents with a
negative global evaluation of their driving performance. The validity coefficient
amounts to r=.780.
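The idea behind QuickProp (Fahlman, 1988) is to treat the error curve of each weight as a parabola and to jump directly to its estimated minimum, using the current and the previous gradient. The sketch below is a simplified single-weight version with a maximum growth factor `mu`, which limits each step to at most `mu` times the previous one; Fahlman's full algorithm contains further refinements. The error function, learning rate and `mu = 1.75` are illustrative assumptions.

```python
from typing import Callable, Optional


def quickprop_1d(grad: Callable[[float], float], w: float, lr: float = 0.1,
                 mu: float = 1.75, steps: int = 10) -> float:
    """Simplified QuickProp for a single weight (after Fahlman, 1988)."""
    prev_grad: Optional[float] = None
    prev_step = 0.0
    for _ in range(steps):
        g = grad(w)
        if prev_grad is None or prev_step == 0.0 or prev_grad == g:
            step = -lr * g                       # fall back to gradient descent
        else:
            # Minimum of the parabola through the two most recent gradients.
            step = g / (prev_grad - g) * prev_step
            limit = mu * abs(prev_step)          # maximum growth factor
            step = max(-limit, min(limit, step))
        w += step
        prev_grad, prev_step = g, step
    return w


def grad_e(w: float) -> float:
    """Gradient of a hypothetical error function E(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


print(round(quickprop_1d(grad_e, w=0.0, steps=3), 6))   # 3.0

w = 0.0
for _ in range(3):                                       # plain gradient descent
    w -= 0.1 * grad_e(w)
print(round(w, 3))                                       # 1.464
```

On a quadratic error surface the parabola fit is exact, so once the growth limit no longer binds, a single QuickProp step lands on the minimum, whereas plain gradient descent with the same learning rate is still far from it after three steps.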
To examine the stability of this result, a jackknife validation is carried out. The
classification rate in the jackknife validation amounts to 83.8%. This equals a validity
coefficient of r=.770. Correctly classified are 81.8% of the respondents with a positive
global evaluation and 77.6% of the respondents with a negative global evaluation of their
driving performance. Figure 5 shows the distribution of the probability of receiving a
positive evaluation of one’s driving performance according to the jackknife validation of
the artificial neural net.
Figure 5: Classification of a trained artificial neural network according to the jackknife
method. The x-axis shows the estimated probability of a positive global assessment of
driving performance in the standardized driving test; the probabilities are divided into
ten groups. The bars indicate the relative frequency (as a percentage) in each group of
subjects who actually received a negative (black bar) or positive global evaluation
(white bar) in the standardized driving test.
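The grouping used in figures of this kind can be sketched as follows: predicted probabilities are split into ten equal-width groups, and for each observed outcome the percentage of its cases falling into each group is computed. Whether the original percentages are relative to each outcome group or to each probability group is not stated; this sketch assumes the former. The function name, probabilities and outcomes are hypothetical.

```python
from typing import Dict, List


def probability_histogram(probs: List[float], labels: List[int],
                          n_bins: int = 10) -> Dict[int, List[float]]:
    """For each outcome (0 = negative, 1 = positive), the percentage of that
    outcome's cases whose predicted probability falls into each of n_bins
    equal-width groups on [0, 1] (ASSUMPTION: percentages per outcome)."""
    counts = {0: [0] * n_bins, 1: [0] * n_bins}
    for p, t in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)   # p == 1.0 joins the last group
        counts[t][b] += 1
    return {t: [100.0 * k / sum(c) if sum(c) else 0.0 for k in c]
            for t, c in counts.items()}


hist = probability_histogram([0.05, 0.15, 0.95, 1.0], [0, 0, 1, 1])
print(hist[0])   # 50% of negative cases in each of the first two groups
print(hist[1])   # all positive cases in the last group
```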
If only classifications made with a probability of < .30 or > .70 are taken into
consideration, the test battery enables 77.9% of respondents to be classified with a
high degree of certainty. In this case, the classification rate amounts to 92.5%.
Discussion
The results obtained in the two studies presented above demonstrate that artificial
neural networks can outperform classical methods of statistical judgment formation
with respect to classification rate, validity coefficient and a clearer separation of
suited and unsuited applicants based on the classification probabilities of the
individual respondents. The results obtained with the artificial neural networks in both
studies also showed satisfactory generalizability, as demonstrated by jackknife
validation. Based on these results and previous studies, we conclude that artificial
neural networks are a valuable and applicable alternative to classical algorithms of
statistical judgment formation and can be used to considerably increase the precision
of diagnostic decisions derived from test batteries.
References
* Doctor, Dr. G. Schuhfried Ges. M.B.H.