Psychometric Testing – Review of Train Driver Selection Process
Recommendations for standardising and improving selection processes
This publication may be reproduced free of charge for research, private study or for internal
circulation within an organisation. This is subject to it being reproduced and referenced accurately and
not being used in a misleading context. The material must be acknowledged as the copyright of Rail
Safety and Standards Board and the title of the publication specified accordingly. For any other use of
the material please apply to RSSB's Head of Research and Development for permission. Any
additional queries can be directed to research@rssb.co.uk. This publication can be accessed via the
RSSB website
www.rssb.co.uk
Contents
1 INTRODUCTION
1.1 SUMMARY OF THE WORK CARRIED OUT IN THE PRECEDING DELIVERABLES
3 RECOMMENDATIONS
Executive Summary
Introduction
In November 2004, RSSB commissioned CAS to undertake a review of the processes
currently used for the selection and recruitment of train drivers with particular
reference to the use of psychometric testing.
This report presents a summary of the findings of the review and recommends
changes to the selection criteria and selection methods currently being used and to
the way psychometric tests are used in driver management.
Key findings
The British rail industry uses a recruitment process for train drivers which is
recognisably similar to those used for train driver recruitment overseas and in other
industry sectors. The key findings are:
• Almost all companies in Great Britain use the same four stage recruitment
process, although the methods and approaches used during the sifting and final
selection stages vary.
• Some of the selection methods used at the assessment centres have proven to be
reliable and valid predictors of train driver performance. For others, evidence of
their validity is either weak or inconsistent.
• The selection criteria used at the assessment centres need to be revised to fully
encompass the skills needed for modern train driving. Discussions with industry
representatives and comparison with overseas practice and other industry
sectors have identified a consistent set of new selection criteria. These are
related to the current set but give wider coverage of key requirements. However,
not all of the selection criteria are associated with common national
requirements. Only those which are required nationally should be assessed in
the assessment centres.
• Psychometric tests are used for four main purposes in driver management:
training and development, post trauma counselling, pre-incident counselling and
post incident investigation. Test use for the first two of these is sensible and
conforms to good practice. Practice in pre-incident counselling and post incident
investigation is more problematic. It is not clear that companies fully understand
the purpose of testing in these circumstances and, as a result, some may not be
using tests effectively.
Recommendations
• The selection criteria should be updated to give better coverage of the abilities
recognised to underpin good performance in modern train driving.
• Selection criteria which address safety and train handling performance should
form the core of driver selection and be assessed by all companies in the same
way either by the assessment centres or by qualified individuals in companies.
Companies should have flexibility in the way in which they assess criteria which
relate to personal effectiveness.
• The effective parts of the current assessment centre process should be retained
for national use.
• The parts of the assessment centre process which have not proved effective
should either be replaced or upgraded.
• Companies need to give greater thought to the use of psychometric tests for
post incident investigation. A process is outlined for how tests can be better
integrated into the investigation process.
1 Introduction
This report is the ninth deliverable of a project to review the processes currently
used for the selection and recruitment of train drivers, with particular reference to
the use of psychometric testing. It is structured as follows:
• Section 2 sets out the key findings of the project. The findings have come from
research and data collected in five work packages. The main contributing
deliverables, described in more detail in section 1.1, are:
• Processes and methods used in the initial sifting and shortlisting of applicants.
• Recruitment difficulties.
• How companies assess driver performance, and their willingness to provide
access to that information for validation purposes (Deliverable 7).
Data was also analysed from the eight national assessment centres which carry out
psychometric testing and interviewing on behalf of the industry. Analyses were
based on 1347 applicants for train driver posts assessed between April 2003 and
November 2004, for whom complete assessment centre data are available, and
4606 applicants between November 2000 and April 2003 for whom assessment
centre results and limited personal information are available.
The research on other industry sectors looked at recruitment and selection practices
and methods for car, tram and bus drivers, pilots, air traffic controllers, fire fighters,
the Royal Navy, the Merchant Navy, the offshore oil and gas industry, the shipping
industry, the Royal Air Force and the nuclear industry (BNFL).
The overseas research looked at recruitment and selection practices and methods in
Australia, Austria, Belgium, Denmark, Estonia, France, Finland, Germany, Hungary,
India, Italy, Lithuania, Luxembourg, Netherlands, Northern Ireland, Norway, Poland,
Spain, Sweden and Switzerland.
The validation exercise involved collecting performance data from all the companies
which had expressed an interest in contributing to the validation study. The data
included train handling performance, safety records, performance in following
procedures, work attitude and training records. The performance data were then
matched to assessment centre records. Eighteen companies are represented in the
analyses with performance data available for 373 drivers who had been recruited
since 1999. Most of the sample were drivers of passenger trains (both conventional
and high speed). There were some freight and on-track machine drivers but
insufficient to undertake separate statistical analyses.
Current and future train driver job demands were explored through desk research
and three workshops with 24 industry representatives from 14 separate companies
(including RSSB). The workshops involved focused discussions, which took into
account both the current competence standards for drivers and the implications of
the introduction of ERTMS, and a Repertory Grid exercise.
2 Key findings
Companies differ widely in the ratio of applicants to job vacancies: several
companies report occasions when they have had more than a thousand applicants
per vacancy, while others typically have only five or six.
Nonetheless, all but two companies use the following four stage recruitment
process:
The two companies which do not follow this process only recruit experienced drivers
and do not require applicants to go to an assessment centre or to sit psychometric
tests. This four stage process is the same as that used in most train driver
recruitment in other countries and in other sectors where safety critical staff are
recruited. However, the selection methods used at each stage can vary
considerably.
All companies use application forms in the second stage for sifting/shortlisting
although the sophistication of their use varies markedly. A number of companies
use a scored checklist approach while others rely on subjective assessment. A
number of selection methods are used in the sifting and final selection stages. For
example, some companies use additional psychometric testing at either or both
stages. It is also quite common for interviews to be carried out at both these
stages. Where interviews are carried out at the sifting stage, they typically involve
fewer staff than the final selection stage.
The final selection stage usually involves a wider range of staff, typically involving a
mix of HR staff, driver managers and production or operations managers.
Sometimes, particularly in smaller companies, senior managers and directors will
also be involved in the final stage.
1. On average, there are 317 applicants for every vacancy. This number has to be
reduced for cost reasons before applicants are sent to an assessment centre.
2. Many companies report problems with large numbers of poor quality applicants.
The following table shows an example of both the selection ratios and the number
of applicants affected at each stage in a typical recruitment scheme.
Stage                          Entering   Success rate   Successful   Unsuccessful
Drop out before A/C                  65            66%           43             22
Assessment Centre                    43            41%           17             26
Drop outs before final stage         17            66%           10              7
Final selection stage                10            48%            4              6
Drop outs before medical              4            33%            2              2
Medical Examination                   2            80%          1–2            0–1
Recruited                             –              –            1              –
In this example, the assessment centres are only responsible for rejecting 26
applicants or just over 8% of applicants who responded to the advertisement. By
comparison, drop outs account for 43% of applicants while the sifting and final
selection stages account together for 48% of applicants.
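By way of illustration, the rejection-share arithmetic in this example can be reproduced with a short calculation. This is a sketch only: the stage figures are those from the example table, and the applicant total of 317 per vacancy is the average quoted in the text.

```python
# Sketch of the selection-funnel arithmetic in the example above.
# Figures are taken from the example table and surrounding text.
applicants = 317  # average applicants per vacancy

entering_ac, passing_ac = 43, 17
rejected_at_ac = entering_ac - passing_ac  # 26 applicants rejected at the A/C

share = rejected_at_ac / applicants
print(f"Assessment centre rejections: {rejected_at_ac} "
      f"({share:.1%} of all applicants)")
# → Assessment centre rejections: 26 (8.2% of all applicants)
```

The calculation confirms the point in the text: the assessment centres account for just over 8% of the original applicant pool.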
Other selection methods used by companies include:
• Interviews.
• References.
Little is known about the reliability and validity of these methods for train driver
recruitment. A variety of staff are involved in the delivery of these methods. Where
psychometric tests are used, staff trained to at least British Psychological Society
Level A standard always seem to be used. However, staff are not always trained in
the use of other methods or in processes for making assessment and selection
decisions. That interviewers are often untrained is a key issue given the extensive
use of interviews. This is clearly different from practice at the assessment centres
where only trained interviewers are used.
1. The pass marks at the assessment centres are pre-set. This means that the
rejection rate is more or less fixed although there is some variation from
company to company depending on the quality of their applicants and the
amount of sifting they do. This in turn means that companies with high ratios of
applicants to vacancies have to use other selection methods to reduce applicants
to manageable numbers.
2. Although all companies are aware of the recommended selection criteria for
driver recruitment, many either introduce new criteria during the recruitment
process or extend the coverage of the recommended criteria. An example of the
former is requiring applicants to have experience of technical or engineering
work. An example of the latter is requiring applicants to undertake a writing
exercise.
3. Most companies recognise the value in having their own staff involved in the
selection process. This gives staff a sense of ownership in the selection
decisions and allows companies to recruit staff who will fit the company culture.
Some companies with qualified staff administer the assessment centre tests
themselves, using the standard pass marks. Applicants in such cases are excused
from taking the tests again at the assessment centres, although the results are
entered in their assessment centre results sheet.
This practice calls into question the need for such companies to use assessment
centres. It seems that most companies treat the assessment centres as a mandated
process rather than a process recommended in RACOP GO/RC3551. In practice,
there is no reason why companies with the required competence to administer
psychometric tests and conduct the CBI should send applicants to an assessment
centre at all. Alternatively, it is also reasonable for some companies to require
applicants only to undertake part of the process if they can demonstrate that they
have covered other parts of the process adequately elsewhere.
The standard assessment centre process consists of four psychometric tests and a
CBI. The tests provide between 22 and 33 scores, depending on which versions are
used, of which 10 or 13 are used to make assessment decisions. These assessment
scores are sometimes individual test scores and sometimes combinations of scores.
They are used to provide assessments for six of the twelve recommended selection
criteria. The other six selection criteria are covered by the CBI. Table 2 shows
which selection methods are used to assess which selection criteria.
Ability to learn new information within appropriate time limits: Trainability for
Rules & Procedures Test (TRP parts 1 and 2).
Ability to understand mechanical principles: Mechanical comprehension test (MT4).
Ability to communicate clearly and effectively verbally and in writing: Criterion
Based Interview.
Motivation to follow rules and procedures: Criterion Based Interview.
It can be argued that many of the selection criteria are assessed more than once in
the complete recruitment process and that the complete recruitment process does
conform to the definition. However, this depends on individual companies and it
cannot be guaranteed currently that selection criteria are systematically assessed by
more than one method. Not all companies have systematically considered the
relationships between the methods used at different stages and how coverage of the
selection criteria is provided. This has important implications for the reliability and
validity of the recruitment process.
• Pass rates are high on all the individual assessment centre selection methods.
• Many companies reject applicants as soon as they have failed one test (at least
25%) and do not require them to complete the process. Only about 25% of
companies require applicants to complete the process whether or not they have
passed all the tests.
• Indeed, the failure rate for all the criterion based interview criteria combined is
only 5%.
• The pass rate is particularly high on one of the new selection criteria, being
proactive and tenacious. In our assessment centre sample, not one applicant
was failed on this criterion.
• The only reason that the average success rate at the assessment centres is as
low as 41% is because there are so many relatively independent selection criteria.
• There are more fail grades than pass grades with poor differentiation amongst
the pass grades.
Note that the high pass rate on the CBI does not mean that it is a poor assessment
method. There are a number of reasons why the pass rates should be so high. The
most obvious is that applicants have already been selected on the selection criteria
addressed by the CBI during the shortlisting and sifting stages. In particular, many
companies interview applicants before sending them to an assessment centre.
Nonetheless, this does raise the question of whether the CBI is providing all the
value that it could or should.
Although the relevance of the selection criteria is recognised, this does not mean
that the recommended criteria are the best possible selection. The current selection
criteria are a mix of safety, performance and trainability criteria pitched at varying
levels of generality. For example, “ability to communicate clearly and effectively” is
a very general criterion which makes no reference to the specific sorts of
communications train driving involves. “Ability to operate hand and foot controls” is
an example of a very specific criterion.
Having both very general and very specific criteria can create problems:
• A very specific criterion is usually just one of a subset of criteria which make up
a more general criterion. The question arises as to why one or two specific
criteria should be chosen and not others. For example, “ability to operate hand
and foot controls” is just one of a range of train handling skills which might be
assessed.
The current selection criteria, and the methods used to assess them at the
assessment centres, do not appear to have got the balance of generality and
specificity right. This is one reason why individual companies find it necessary to
add extra criteria to the selection process or to assess further dimensions of the
existing criteria. For example, it is not clear that it is necessary to assess separately
for both “ability to recall and retain job related information” and “ability to learn new
information within appropriate time limits” when they are such closely related
aspects of trainability.
There is a question over whether the mechanical comprehension criterion should be
retained in the selection criteria. Most passenger and freight companies consider
fault diagnosis to be a more important skill than mechanical comprehension.
Furthermore, most drivers no longer take responsibility for even minor fixes and
repairs. This trend has been recognised before, with a fault finding test included in
trials in a previous validation study1. However, other companies, particularly those
employing track machine drivers, still see a need for mechanical comprehension
ability.
There is a clear case for setting minimum acceptable standards for the whole
industry on selection criteria concerned with safety, such as vigilance. The case for
general performance and trainability criteria is less clear. The reasons for including
such criteria are often commercial. Companies differ in the level of ability they
require on such criteria even if they are recognised to be important criteria by all.
Where companies have different requirements for performance and trainability
criteria, there is a case for them to either set their own pass marks or to use their
own selection methods as long as these decisions are taken by qualified individuals.
• A significant proportion of the dates for date of birth and date of assessment are
incorrect.
• A significant number of test scores entered in the records are outside the
possible range of scores for the tests.
• The old and new versions of the Mechanical Comprehension Test are kept
separate in the records but the old and new versions of the TRP test are being
entered in the same field making it difficult to distinguish them.
• When applicants re-sit, sometimes scores from previous sittings of tests are re-
entered in the relevant fields but sometimes the fields are left blank.
1 Fletcher, S. (2002) Predicting the Effective and Safe Train Driver: Report of Validation
Findings and Recommendations for Action. Occupational Psychology Centre Ltd., Watford.
Some of these problems, for example missing candidate records, resulted from
problems at the time the database was set up and should not be a problem in the
future. Others, such as recording test scores on different versions, will disappear
once stocks of old test materials have run out. For the other problems, further steps
need to be taken:
• Many of the data entry errors could be trapped by having field validation in the
database.
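Field validation of the kind suggested above could take a form such as the following sketch. The field names and score ranges here are hypothetical assumptions for illustration, not the actual database schema or test score ranges.

```python
# Illustrative field-validation checks of the kind that could trap the data
# entry errors described above. Field names and score ranges are
# hypothetical, not the actual schema used by the assessment centres.
from datetime import date

SCORE_RANGES = {"group_bourdon_time": (0, 600), "trp_part1": (0, 50)}

def validate_record(record: dict) -> list:
    """Return a list of validation errors for one candidate record."""
    errors = []
    # Dates of birth and assessment must be plausible and correctly ordered.
    dob = record.get("date_of_birth")
    assessed = record.get("date_of_assessment")
    if dob and assessed and dob >= assessed:
        errors.append("date of birth is not before date of assessment")
    if assessed and assessed > date.today():
        errors.append("date of assessment is in the future")
    # Test scores must lie within the possible range for each test.
    for field, (lo, hi) in SCORE_RANGES.items():
        score = record.get(field)
        if score is not None and not lo <= score <= hi:
            errors.append(f"{field}={score} outside range {lo}-{hi}")
    return errors

record = {"date_of_birth": date(1975, 3, 1),
          "date_of_assessment": date(2003, 6, 12),
          "trp_part1": 72}
print(validate_record(record))  # flags the out-of-range TRP score
```

Checks of this kind would catch both the impossible dates and the out-of-range test scores noted above at the point of data entry.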
The quality of the selection methods can be judged against three questions:
• Reliability – do the selection methods produce consistent results?
• Validity – do the selection methods accurately measure what they are supposed
to measure?
• Utility – do the different selection methods all add value to the selection process?
2.3.1 Reliability
Although reliability information was not available to us for all the selection methods,
where it does exist, the indications are that most individual measures are highly
reliable. However, there are two concerns:
1. The score bands used for making grading decisions at the assessment centres
are not reliable for all test scores. In some cases, the difference between a grade
A and a grade C is not statistically significant. This is particularly a problem with
the ‘error’ and ‘omission’ scores in the DTG and Group Bourdon.
2. A number of new versions of the tests have been introduced. Although care was
taken when introducing these new versions to make the pass marks and score
bands equivalent, different versions of the tests produce different pass rates.
This affects three of the tests, the Group Bourdon, the Mechanical
Comprehension and the Trainability for Rules and Procedures and occurs for
technical reasons to do with the way the scores are used for making decisions.
In particular, the computerised version of the Group Bourdon and the paper and
pencil version are not equivalent tests.
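Whether two versions of a test are producing equivalent pass rates can be checked with a standard two-proportion z-test, sketched below. The pass counts used here are made-up illustrations, not figures from the study.

```python
# Sketch of a two-proportion z-test for comparing the pass rates of two
# test versions. The counts below are illustrative, not study data.
import math

def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """z statistic for the difference between two pass rates."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical pass counts for the paper and computerised versions
z = two_proportion_z(pass_a=420, n_a=700, pass_b=380, n_b=700)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests the versions are not equivalent
```

A significant difference in pass rates between versions, as found for the Group Bourdon, indicates that the score bands are not equivalent across versions.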
2.3.2 Validity
The validity of the psychometric tests used at the assessment centres has been
examined several times since their introduction. A variety of performance criteria
have been used in these studies including both objective and subjective measures.
The current study mainly used performance data collected from driver managers
under 15 performance headings.
In addition, for a subset of 157 drivers more detailed performance data was
collected from personnel records, on attendance and timekeeping, and examination
and assessment records from training.
Statistical analysis showed that these performance criteria could be grouped under
five main headings:
1. Train handling
2. Procedure-based work
3. Safe performance
4. Formal competence and work attitude
5. Classroom examinations
Validation of the psychometric tests and interviews was conducted against these
performance groupings. Details of how the data were collected and the guidance
given to driver managers can be found in Deliverable 7: Validation study of the
current recruitment process and review of the future train driving role.
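At its core, a predictive-validation analysis of this kind correlates a selection score with a later performance measure. The sketch below uses a plain Pearson correlation on made-up data with hypothetical variable names; the study itself used far larger matched samples and more elaborate statistics.

```python
# Minimal sketch of the predictive-validation calculation: correlating a
# selection test score with a later performance measure. Data are made up.
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical TRP scores and manager ratings of safe performance
trp_scores = [34, 41, 29, 45, 38, 31, 42, 36]
safety_ratings = [3, 4, 2, 5, 4, 3, 4, 3]
r = pearson(trp_scores, safety_ratings)
print(f"r = {r:.2f}")
```

The correlations reported in the following subsections (for example, 0.13 to 0.16 for the DTG) are coefficients of exactly this kind, computed between assessment centre scores and the matched performance groupings.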
All the selection methods used at the assessment centres have shown some
evidence of being valid predictors of either driving performance or trainability, but
the results have been inconsistent. This inconsistency is also apparent in the
current study. For example, the DTG has been reported in the past to have a
positive relationship with training performance, job performance and safety
incidents. However, the previous validation study found no predictive relationships.
This study found no positive relationships for the DTG with training outcomes, nor
with overall train handling (e.g. control of acceleration, braking and speed), but did
find some modest significant correlations (in the range 0.13 – 0.16) with aspects of
safety performance and procedure based work such as the operation/isolation of
safety systems and train preparation, disposal and handover. Three of the four DTG
scores are each associated with a significant predictive correlation, but in each case
with only one aspect of driving performance.
The results for the Group Bourdon test are also inconclusive. There are significant
positive correlations for both the computer and paper and pencil versions of the test
but little consistency in which parts of the tests produce these results. So, for
example, the Time measure on the computer version of the test shows some
moderate correlations (in the range 0.21 to 0.24) with procedure based work,
classroom training performance and safe performance, but the Omissions and Error
scores produce no significant correlations. In contrast, the paper and pencil version
of the test produces no significant correlations for the Production score (regarded as
similar to the Time score) but several significant correlations for the Omissions and
Error scores, particularly with classroom training performance (in the range 0.31 to
0.43). Two important conclusions can be drawn from this result:
1. The computer version and paper and pencil version of the Group Bourdon are
not equivalent tests.
2. Although the tests may have some value they do not measure what they are
expected to measure. Although they may be measures of attention and
information processing, they are almost certainly not measures of vigilance.
Both parts of the TRP test consistently correlate positively with relevant criterion
measures. For example, both show significant correlations with classroom training,
practical training, procedure-based work and safe performance (in the range 0.16 to
0.39). There is also evidence that the new version of the TRP is a better predictor
than the old version, almost certainly because there is a wider score range which will
make the tests more reliable.
There is little evidence for the validity of the CBI. Only two of the criteria assessed
using the CBI (“follow set rules and procedures” and “being proactive and tenacious”)
show any significant predictive relationships, the former producing two low but
statistically significant correlations of 0.13 with safe performance and classroom
training, the latter correlating 0.35 with procedure-based work. Otherwise, there is
no consistent evidence for the validity of other parts of the CBI although some
evidence has been reported in previous studies. The very high pass rates and the
lack of differentiation amongst the pass grades almost certainly reduce the chances
of finding statistically significant results.
Nonetheless, there is evidence that the CBI has some construct validity. It
differentiates better between the various selection criteria than is often the case
with interviews, the correlations between the ratings on the six criteria being
moderate (averaging 0.32). Furthermore, the CBI seems to provide sufficiently
different information to the tests, the correlations again typically being in the range
0.3 to 0.35.
Taken together, this study and other published research suggest that the paper and
pencil version of the test is of greater value than the computer version. The
inconsistency in the validity findings over time suggests that other parts of the
process do have some value, but the predictive validities are probably low (0.20 or
less), which is why they are not found in every validation study.
2.3.5 Utility
Four factors need to be considered when assessing the utility of a selection
process:
For train driver recruitment the costs associated with psychometric testing and
interviewing are trivial compared to the benefits of making better selection
decisions. The cost of a train driving incident far outweighs recruitment costs.
Ensuring that an applicant who is at risk of having an accident is not recruited easily
pays for itself.
However, recruitment costs could be reduced without affecting the validity of the
process. With some important exceptions, the different methods used at the
assessment centres, both tests and interviews, are relatively independent of each
other. This means that, as long as the methods are valid, each assessment method
will add value to the selection process. The exceptions are some parts of the paper
and pencil version of the Group Bourdon, the 'good' and 'omissions' scores on the
DTG, and the Time measure of the computerised Group Bourdon where there are
moderate to large inter-correlations. There is little chance of one of these scores
adding value to the others. Also, a number of scores are recorded for the various
tests but not used in the selection decision. The scoring, recording and use of these
correlated scores should be rationalised and simplified.
The key finding is that several of the selection methods appear to have poor, or at
best modest, validity. In such circumstances, the utility of the selection process
must also be limited. This fact has been recognised in the past by lowering the pass
marks on some tests (e.g. the Group Bourdon). Such a step further reduces the
utility of the selection method since, if the pass rate gets very high, there is less
chance of the method realising its full benefit. As noted, the pass rates on all the
methods used at the assessment centres are high so the utility of the total process
is almost certain to be relatively low.
Practice in the use of psychometric tests after recruitment is inconsistent across
the industry, and industry opinions vary as to whether tests give value.
Just over half of companies use tests for one or other of these purposes. About a
third use tests for post incident investigation, 30% for training and development and
15% for post trauma counselling and pre-incident counselling. However, even in
those companies where tests are used for these purposes, there is a degree of
unease about their appropriateness.
2. Tests are sometimes used as part of the promotion process from driver to roles
such as driver instructor and driver manager. Indeed, GO/RC3551 (p. 132)
contains recommendations for the use of various tests (e.g. RAAT, VT5.1,
SAFEPQ), which are offered as 'examples of suitable assessment tools', although
'there may be other exercises or assessment tools that would assess these key
criteria and are available off the shelf or could be designed bespoke by the
assessor'. When used in this way, the tests should be used to assess ability to
cope with demands of the new role. It follows that there is no point in using the
same tests as were used in driver recruitment. In fact, a wide variety of different
tests are used by companies including ability and personality tests and practice
appears to be entirely appropriate.
Typical practice is to re-test drivers on the tests used at the assessment centres.
Except in some specific circumstances, this practice makes little sense. Performance
on tests tends to be stable over time. Unless there are reasons to believe that the
driver has either experienced a traumatic event or suffered a deterioration in his or
her mental or physical health there is no reason to suppose that test performance
should change over time. If there is a psychological reason for the incident and no
evidence of a significant life event, the task for the investigator is to identify aspects
of individuals’ performance which are not assessed by the recruitment tests. As
noted, the assessment centre tests only measure a subset of the skills and abilities
involved in train handling and safety performance. Coverage of all these abilities is
a necessary element of good practice in post incident investigation.
Where there are reasons to suspect that a driver’s personal circumstances have
changed significantly, a case can be made for re-testing on the recruitment tests.
The assumption would be that, as in post trauma assessment, the driver’s
performance may have deteriorated. Note, however, that deterioration may occur in
other areas of performance and it is important to ensure that assessment is
comprehensive. Furthermore, deterioration in performance may result from the
stress of being tested again rather than from other causes. In any case, the fact that
there are some concerns about the validity of some of the tests and their ability to
predict safe performance suggests this should be undertaken with caution.
Since testing needs to be tailored to the circumstances of the incident and the
individual, it follows that testing should not be the first port of call in an
investigation. Interviews, observation and performance on simulators can all be
used to provide performance evidence and to help investigators develop hypotheses
about the causes of incidents.
In addition, many Human Factors issues were identified which are related to the
introduction of ERTMS (the European Rail Traffic Management System) and other
anticipated changes to the driver role. Interestingly, consideration of these issues
did not result in the identification of any additional selection criteria. All the
demands which these changes will make on driver performance already exist even
though the level of demand may vary.
3 Recommendations
The recommended new selection criteria include:
• Attention.
• Conscientiousness.
• Communication skills.
Figure 1 below shows the relationships between the current selection criteria and
the recommended replacements. The key points to note are that:
• All the existing criteria are covered somewhere in the new criteria.
• In some cases several criteria have been combined in one selection criterion (e.g.
trainability) while in others one criterion has been split into two (Attention and
Vigilance).
• Where appropriate, the same or similar terminology has been retained, but where
there is a significant change in emphasis, more substantial changes have been
made to the wording.
• Some of the new selection criteria (e.g. achievement need) might best be
assessed by companies themselves rather than at an assessment centre. Several
factors bear on which criteria are best assessed in this way.
The following definitions explain the range of concepts to be covered by each of the
new criteria:
Attention: Three different types of attention have been identified which are relevant
to train driving. These are:
• Selective attention (the ability to focus on one thing and avoid distractions).
• Attention switching (the ability to change the focus of attention as and when
necessary).
• Sustained attention (or vigilance, the ability to remain attentive for long periods
of time).
• Multi-tasking.
• Prioritisation.
• Fault diagnosis.
• Instrument interpretation.
Trainability: There are two existing selection criteria which relate to trainability,
“Ability to recall and retain job related information” and “Ability to learn new
information within appropriate time limits”. Between them, these criteria address all
but one of the main abilities which underpin trainability. These are:
These abilities form four related stages: you cannot be good at one unless you
are good at the previous stage. Trainability is therefore better treated as a
single selection criterion, but one whose assessment must address all four
aspects.
work.
• Self-control, which includes ability to check and not make assumptions and
• Virtue, which includes attitude to rule compliance (see next section) and the
Emotional Stability: There are three main aspects of emotional stability to consider:
The current selection criterion “ability to remain calm in emergency and/or stressful
situations” addresses the first of these but does not explicitly cover physiological or
intellectual reactivity to stress.
Tolerance of working alone: The current selection criterion “can spend time alone
and do so effectively” addresses this requirement but it is important to be clear
about what it entails. Train drivers do spend significant amounts of time on their
own on a regular basis. It is not sufficient, therefore, to assess whether they “can
spend time alone” since that wording does not capture the regularity of the activity.
Conversely, most drivers also need to interact with passengers, other train drivers
and/or other railway workers on a regular basis, so it is not desirable for drivers to
be unsociable or to lack social skills. The key issue is that drivers need to be able to
stay alert during periods of time when there is little happening to stimulate or
interest them. Extraverts are more likely to have problems with this than introverts.
As importantly, however, you do not want drivers who, in the absence of
stimulation, try to create interest by taking risks or seeking thrills. The key dangers
here, therefore, are extraversion and sensation seeking and not the need for
socialising. So, train drivers must have:
• Sufficient social skills for the social interaction involved in the role.
• The ability to remain alert for significant periods of time without social or other
external stimulation.
• The ability to carry out the same tasks in the same way repeatedly.
Again, companies are likely to vary in the demands made on their drivers. For
example, drivers on commuter lines have more varied day to day work experiences
than those on long haul freight journeys. As with achievement need, it may be
preferable for companies to take responsibility for this selection criterion since there
may not be a common, minimum standard which all companies require.
The following table summarises the recommended new selection criteria and the
sub-criteria which need to be taken into account:
Criteria        Sub-criteria
Attention       • Sustained attention (vigilance)*.
                • Selective attention (including perceptual differentiation).
                • Attentional switching.
The selection criteria which apply to all companies are those concerning safety and
train handling performance. The other criteria deal with personal effectiveness, that
is, the ability to be a good employee. The following table indicates which of the
recommended criteria fall into these categories.
criteria but they will need to be reworded to capture the wider definitions of what
needs to be assessed and the standards expected.
• Applicants should be allowed more than one re-sit. The validation evidence
suggests that applicants who pass at the second attempt perform as well as
those who pass first time and are more committed to the work. However,
applicants should be allowed no more than three attempts in total since inability
to reach the required standard by the third attempt may indicate weaknesses
which would be difficult to overcome.
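The recommended re-sit policy amounts to a simple decision rule. The sketch below, in Python, is purely illustrative: the function name and the pass/attempt record it assumes are not part of the recommended process.

```python
# Sketch of the recommended re-sit rule: no more than three attempts in
# total, and no further attempts once the applicant has passed.
MAX_ATTEMPTS = 3  # recommended cap from the report

def may_attempt_again(attempts_so_far: int, passed: bool) -> bool:
    """True if the applicant may sit the test again."""
    return not passed and attempts_so_far < MAX_ATTEMPTS

print(may_attempt_again(1, passed=False))  # a first re-sit is allowed
```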
• Test scores should remain current for more than one year. Indeed, the
current one-year limit is already contradicted in practice, since the test
scores of existing drivers are included in the data transferred when drivers
change company. Either test scores should remain current for at least five
years, or they should remain current until there is reason to believe that
there has been a significant change in an individual’s life. The latter is
more in keeping with what is known about changes in test performance over
time.
• Best practice requires that more than one assessment, preferably using different
methods, is carried out for each criterion. This is particularly true for the criteria
which address safety and train handling performance. These multiple
assessments need not happen at the assessment centres. Companies should
design their selection processes to ensure that they occur somewhere in the
three selection stages (sifting / shortlisting, assessment centre and final
selection).
Note, this does not mean that a different selection method should be used for
every assessment against every criterion. One test (or other method of
assessment) may be able to give you evidence on two or more of the selection
criteria. Companies should aim to make the total selection process as efficient
as possible. This is also true at the assessment centres: any changes to the
selection methods used there should be based on an evaluation of which
criteria are the most important to assess at that stage.
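The multiple-assessment principle above can be checked mechanically: given a mapping from selection methods to the criteria each one assesses, confirm that every safety-critical criterion is covered by at least two different methods across the three stages. The sketch below is illustrative only; the method names and criterion groupings are assumptions, not the agreed industry lists.

```python
# Sketch: check that each safety-critical criterion is assessed by at
# least two different selection methods. The mappings below are
# illustrative assumptions, not the recommended process.

coverage = {
    "structured interview": {"rule compliance", "communication"},
    "vigilance test":       {"vigilance"},
    "simulator exercise":   {"vigilance", "train handling aptitude"},
    "TRP":                  {"train handling aptitude", "rule compliance"},
}

safety_critical = {"vigilance", "rule compliance", "train handling aptitude"}

def under_assessed(coverage, criteria, minimum=2):
    """Return the criteria assessed by fewer than `minimum` methods."""
    counts = {c: 0 for c in criteria}
    for assessed in coverage.values():
        for c in assessed & criteria:
            counts[c] += 1
    return sorted(c for c, n in counts.items() if n < minimum)

print(under_assessed(coverage, safety_critical))  # [] means full coverage
```

A check like this could be re-run whenever a company revises its sifting, assessment centre or final selection methods.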
• The TRP and the “motivation to follow rules and procedures” section of the CBI
should be retained as they are in the recommended assessment centre process.
Both have been found to predict safety performance as well as aspects of
personal effectiveness.
• The Computerised Group Bourdon should be replaced and the DTG replaced or
upgraded. Although the Group Bourdon is a test of selective attention, there
is reason to doubt that it is an effective test of vigilance. Furthermore,
the correlations between the time measure on the Computerised Group Bourdon
and both the TRP and the “good” scores on the DTG are high (0.55 and 0.76
respectively), suggesting considerable overlap between the tests. The DTG
may need replacing or upgrading to give better coverage of the range of
abilities needed in modern train handling.
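The redundancy argument here rests on correlation: if two tests correlate highly, they largely measure the same ability and one of them adds little. A minimal sketch of the calculation follows, using invented scores; the report’s quoted figures of 0.55 and 0.76 refer to the real tests, not to this data.

```python
# Sketch: a high Pearson correlation between two tests suggests they
# measure overlapping abilities, so one may be redundant.
# The score lists below are invented for illustration.
import statistics

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

test_a = [12, 15, 11, 18, 14, 16, 13, 17]
test_b = [22, 27, 20, 33, 25, 29, 23, 31]  # tracks test_a closely

r = pearson(test_a, test_b)
print(round(r, 2))
```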
• The Mechanical Comprehension Test (MT4.1) measures abilities which are only
required by a minority of companies. This probably explains why the validation
evidence for the test is so inconsistent. It should, therefore, be removed from
the assessment centre process but individual companies may wish to retain it for
use where they have a specific requirement for mechanical aptitude.
• Ineffective parts of the CBI should be improved to give better coverage of the
recommended selection criteria. This is particularly true for “ability to
communicate clearly and effectively” where it seems not enough attention is
being paid to production skills, and “ability to remain calm in emergency and/or
stressful situations” where the assessment criteria seem to be inappropriate.
Consideration should be given to including role play or simulation exercises to
cover these criteria.
Figure 2 below illustrates the envisioned changes to the content of the assessment
centres using the following colour keys:
• The blue boxes indicate recommended methods for the revised assessment
centres.
• The black arrows indicate the selection criteria, current at the top, recommended
at the bottom, for which the various assessment methods seek to provide
evidence.
• The green arrows indicate where changes or upgrades to selection methods are
recommended.
• The pink arrow indicates where the existing method can be carried over.
[Figure 2: diagram mapping the current selection criteria (top) to the
recommended criteria (bottom) via the revised assessment methods. Methods
shown: a possible role play exercise, the CBI plus a possible personality
test, the TRP, a possible perception test, a vigilance test (a DTG
upgrade?), a decision making test and a train handling aptitude assessment.
Current criteria shown include communication, following rules, remaining
calm, conscientiousness, hand and foot control, reacting quickly, recalling
and retaining information, learning new information, vigilance and
concentration, mechanical comprehension and spending time alone.
Recommended criteria shown include communication, rule compliance,
conscientiousness, achievement need, emotional stability, tolerance of
working alone, trainability, multi-tasking, instrumentation interpretation,
planning / anticipation, vigilance, concentration, TTC estimation,
perceptual differentiation, hand and foot control and train handling
aptitude.]
• Pre-incident counselling.
Current use of psychometric tests in the first two of these seems sensible and no
changes are recommended in these areas.
The use of tests in post incident investigation and pre-incident counselling (special
monitoring) is more problematic. It is not clear that re-testing drivers on the tests
used at the assessment centres is always, or even usually, an appropriate strategy.
This conclusion is strengthened by the finding that three of the psychometric tests
used at the assessment centres have, at best, only modest validity and that they
only address a subset of the abilities required of a good driver. Furthermore, post
incident testing should focus on the safety and train handling aspects of the role.
Tests and other assessments of personal effectiveness will often be irrelevant.
Nonetheless, tests are widely used in other sectors for similar purposes. Indeed,
they are one of the main tools used by clinical psychologists.
behaviour.
When you may want to use a test: Following an incident.
Why you may want to test: To help establish explanations of behaviours.
How you might use tests:
• Psychometric tests should not be the first choice for assessment.
• Observation, interviews and assessment on simulators should be used first
  to establish likely causes which explain the driver’s actions.
• Where these possible explanations can be investigated using tests,
  identify suitable tests which may help you to understand better why a
  driver behaved in a particular way.
• The purpose of testing will be to support or challenge the possible
  explanations identified earlier.
• These tests will often not be the ones used for recruitment, since those
  cover only a subset of the abilities required by a good driver; e.g. you
  may want to test for a suspected memory problem using a specific test.
• Such testing often requires specialist knowledge.
• Remember that re-testing post incident is itself a stressful event that
  may result in a driver performing less well than they would in normal
  circumstances.
When you may want to use a test: If you have reasons to believe a driver’s
performance may be introducing risk.
Why you may want to test: To find out if they are unlikely to be able to
continue doing their job properly.
How you might use tests:
• If you suspect that a driver’s performance has deteriorated but you are
  unsure why, you will need to assess them to find out if this is really
  the case.
• Consider using the following sequence of assessment methods to help you
  do this:
  1. Interview the driver.
  2. Observation.
  3. Assessment on a simulator.
  4. Re-test using the selection tests – you can compare these results with
     the original ones to see what, if any, changes there are in
     performance.
• Based on these outcomes you may wish to use other psychometric tests to
  look at particular areas of concern or to explore further areas where
  performance may be deteriorating.
Table 5: Use of psychometric tests for driver management
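The re-test comparison described in Table 5 can be sketched as a simple score check. The test names, scores and the 15% flagging threshold below are illustrative assumptions, not recommended values; remember too that re-testing is itself stressful and may depress scores.

```python
# Sketch: compare a driver's re-test scores with their original selection
# scores and flag tests showing a marked drop. Names, scores and the
# threshold are illustrative assumptions only.

def flag_changes(original, retest, threshold=0.15):
    """Return tests whose score dropped by more than `threshold`
    (expressed as a fraction of the original score)."""
    flagged = {}
    for test, before in original.items():
        after = retest.get(test)
        if after is None:
            continue  # test not repeated at re-assessment
        drop = (before - after) / before
        if drop > threshold:
            flagged[test] = round(drop, 2)
    return flagged

original = {"vigilance": 40, "selective attention": 52, "reaction time": 48}
retest   = {"vigilance": 30, "selective attention": 50, "reaction time": 47}

print(flag_changes(original, retest))  # {'vigilance': 0.25}
```

Flagged results would then prompt the follow-up assessments in the table (interview, observation, simulator) rather than being treated as conclusive on their own.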
4 Next steps
The full implementation of all the recommendations outlined above will take some
time. Furthermore, although stakeholders who attended the industry briefing held
on 20 October 2005 were broadly in agreement with the principles underpinning the
recommended changes, agreement has not been reached with regard to the detailed
implementation of the recommendations. Final decisions about which tests and
methods to upgrade, which to introduce and which to remove, will considerably
affect the time to implement revised processes. For example, developing new tests
or new versions of tests will take considerably longer than buying commercially
available tests. The recommendation of the stakeholders at the industry briefing
was that a steering committee should be set up, facilitated by RSSB, to make these
detailed decisions on content and process.
The questions which the steering committee will need to answer are:
• Does the industry need a core assessment process for all train driver
recruitment?
• Given that it is impossible to test for all the criteria and sub-criteria,
which should be included in the core process at the assessment centre (i.e.
which criteria must be covered)?
management?
In making its decisions, the committee will need to take into account regulations
which may be introduced as part of the implementation of the European directive on
rail system interoperability and changes to the train driver group standard (GO/RT
3251) which may result from RSSB’s ongoing review of standards.
In the short term, however, there are a number of steps which can be taken to
improve the total recruitment and selection process:
1. Continue to operate the current assessment centre process until such time as a
revised process has been developed. There is sufficient evidence for the validity
and utility of parts of the process to suggest that it delivers value.
2. Companies can begin selecting against the new selection criteria as soon as they
are agreed.
3. To make this happen, the extent of the coverage of these selection criteria by the
current assessment centre process will need to be agreed.
4. Following this, companies will need to review their sifting and final selection
processes to ensure completeness of the coverage of the new selection criteria.
This may involve revisions or additions to their selection methods.
5. Companies need to make sure that all participants in the various stages of the
total recruitment process are properly trained.
There are also a number of steps that can be taken in the short term concerning the
use of tests in driver management:
1. Use tests where appropriate but not as the first choice of assessment.
2. Outsource testing for this purpose if the company does not have appropriate
expertise in-house.
Rail Safety & Standards Board Registered Office: Evergreen House 160 Euston Road London NW1 2DX. Registered in England and Wales No. 04655675.