GENERALISABILITY OF THE PSYCHOMETRIC PROPERTIES OF A PILOT SELECTION BATTERY

Agnès Kokorian, People Technologies, UK
agneskokorian@ntlworld.com

& Colin Valsler, Psytech Ltd, UK
colin@psytech.demon.co.uk

The Pilot Aptitude (PILAPT) system

The development of the PILAPT computer-based system grew out of the meta-analysis reported by Hunter and Burke, and the design principles have been described by Burke, Kitching and Valsler (1994). In summary, these design principles were as follows:

• that the test designs be based on clearly understood measures of individual differences that research has shown are relevant to pilot performance, either in training or in operations. As such, PILAPT had to cover both handling skills (as required in ab initio training) and CRM competencies (such as situational awareness and capacity).
• that the test designs should assume no prior knowledge of flying, but should have links to key pilot performance factors that are intuitive to both candidates and users.
• that the test designs should allow for practice to avoid the influence of prior experience of video games and give all candidates a level playing field to demonstrate their potential.
• that the overall battery should be efficient and avoid redundancy and nugatory assessments.

Design work on PILAPT began in 1994 and has continued with new tests and new scoring algorithms over the nine years since. Beginning with ab initio selection for the Royal Air Force (RAF) University Air Squadrons, PILAPT has been evaluated through data provided by air forces in Chile, Denmark, Portugal, Sweden, Norway and Italy as well as civilian airlines and training schools in the UK, Europe and Asia.

PILAPT is a fully automated test delivery system built on the TEKS technology developed by Psytech Ltd. The system caters for all aspects of the testing process from candidate log on including the capture of biographical data, instructions, test administration, test scoring, analysis of candidate performance, reporting, and data transfer to other systems. The system has crash recovery and networking capabilities.

The PILAPT battery of tests developed to date includes:

• Hands (10 minutes) – the ability to process oral (verbal) rules to execute a visual task quickly and accurately – related to absorbing and using oral (e.g. radio information) under pressure

• Patterns (10 minutes) – the ability to ignore distracting information in order to make quick and accurate decisions under time pressure – related to maintaining focus on critical information when confronted with ambiguous situations and pressure
• Concentration (8 minutes) – the ability to maintain focus on a primary task when the conditions for that task are constantly changing – related to maintaining situational awareness
• Deviation Indicator (7 minutes) – the ability to compensate for deviations in flight parameters with a look-and-feel based on the flight path deviation indicator (FPDI) – related to basic handling skills
• Trax (5 minutes) – a pursuit tracking task requiring the candidate to work in a 3 dimensional environment – related to advanced aircraft control

In addition to the tests above, primarily driven in design by ab initio requirements and taking around 40 minutes in total, the PILAPT battery has been extended to include a mini test battery named Capacity designed to assess performance under increasing workload. Capacity takes around 15 minutes to complete and comprises a primary handling task and two secondary tasks involving visual and auditory information. Tasks are administered and measured under a combination of single, dual and triple task load conditions, and the impact of increased workload on the candidate’s performance is then analysed and reported using a display similar to that shown in Figure 1 below. The data shown are average performance for Swedish fighter pilot applicants.

[Figure 1 plot: mean scores (vertical axis 100 to 500) under single, dual and triple task load, annotated "Capacity under single task load", "How much capacity does the candidate retain as workload increases?" and "Capacity under triple task load"]

Figure 1: Overview of what the PILAPT Capacity mini-battery measures

Reliability and construct validity evidence supporting PILAPT

This section of the paper provides a summary of the data collected on the PILAPT tests to date in military context. Given that different tests are at different stages in the development cycle, the evidence provided varies across PILAPT tests reflecting the iterative cycle of development since 1994. The evidence is presented in three parts in line with recommendations from professional bodies such as the American Psychological Association (APA), British Psychological Society (BPS) and the International Test Commission (ITC). First, evidence of test reliability (associated with accuracy and stability of scores) is presented and followed by results from studies involving other marker tests of pilot aptitude (construct validity). Papers two (Calanna and Serusi) and three (Kokorian, Valsler and Cabrera) present criterion validity data.

Reliability

The standard recommendation for the level of reliability required for tests used in selection is a minimum coefficient of 0.7 (this in effect states that 70% of the variation in test scores is true variation as intended in the test’s design). Table 1 summarises the results of reliability (internal consistency) analyses across various country and organisational sites using the Schmidt-Hunter meta-analysis model. Table 1 contains two versions of Hands, a longer 40-item version and a shorter 25-item version. DI and Trax are not included in Table 1 as internal consistency estimates of reliability are not suitable for these tests. Data on their test-retest reliability are given below.

Source        Local Sample (N)
Chile                 370
Denmark             1,212
Italy                 108
Norway                232
Portugal            1,218
Sweden                762
UK                    585

[The Hands, Patterns and Concentration columns of Table 1, together with the Total Sample Size (N), Sample Weighted Mean and 90% Credibility rows, cannot be placed in their cells from the source. Total sample sizes as extracted: 4,487, 3,902 and 538. Coefficients as extracted, in source order: 0.70, 0.89, 0.92, 0.79, 0.73, 0.71, 0.94, 0.93, 0.90, 0.76, 0.71, 0.69, 0.72, 0.89, 0.87, 0.66, 0.92, 0.82, 0.83 (N=430), 0.6, 0.91.]

Table 1: Reliability results for PILAPT tests across various national sites

In addition to these results, a test-retest (stability) study was conducted in 1995 for the RAF UAS (N=109). This study had a four month interval between test administrations and yielded reliabilities of 0.80 for DI, 0.84 for Trax and 0.77 for Hands, and an overall test-retest reliability of 0.91 for the sum of these three PILAPT test scores.

All these data clearly show PILAPT tests exceed the minimum requirement of 0.7 reliability for use in pilot selection. As an overall composite score for use in selection decisions, the PILAPT battery offers a reliability of 0.9 and above.

Construct validity

This section presents the results of a study conducted in Denmark involving four PILAPT tests – DI, Hands, Patterns and Trax – and a 15-test battery used to assess both aircrew and ATC aptitudes. Data were available across all 19 tests for a sample of 632 applicants. The content of the 15-test battery was classified according to test content in line with the classifications used by Hunter and Burke in their meta-analysis. This classification then provides a direct test of the extent to which PILAPT is measuring pilot relevant predictor constructs. The results are shown in Table 2.

[Table 2 columns: Test Group, DI, Hands, Patterns, Trax, Overall. Test groups: Mathematical Reasoning, General Reasoning, Language, Memory, Mechanical, Numerical Speed & Accuracy, Spatial. The cell layout cannot be recovered from the source. Overall column values as extracted: .29, .35, .55, .53, .44, .57, .27. Correlations with the individual PILAPT tests as extracted, in source order (one of them carries a negative sign): .12, .40, .51, .17, .13, .20, .23, .37, .13, .35, .27, .08, .25, .31, .33, .38, .18, .29, .38, .05, .11, .06, .09, .18, .14, .03, .24, .29.]

Notes: Overall column gives the regression of the Test Group onto the 4 PILAPT tests. Correlations in bold and italicised are significant at the 0.01 level.

Table 2: Results for 632 Danish military applicants

Hunter and Burke identified the following predictor constructs as being the most consistent and substantial predictors of pilot training success: perceptual speed, spatial reasoning, mechanical reasoning, psychomotor and simulation based tests. The Danish data set did not contain psychomotor or simulation based tests, but the results clearly show that PILAPT is tapping the other predictor constructs identified by Hunter and Burke as critical to predicting pilot training success. Corrected for average reliabilities in the Danish tests (0.8), the overall regressions (furthest right hand column in Table 2) would range from 0.34 to 0.71.

Conclusion

This paper has summarised the R&D objectives as well as reliability and construct validity evidence supporting the PILAPT battery. The consistency of the reliability estimates across different national sites with different selection processes and different applicant populations suggests that scores are generalisable across settings. Evidence of criterion validity and transportability of validity is presented in the second and third papers of this symposium.

References

Burke, E., Kitching, A., and Valsler, C. (1994). Computer-based assessment and the construction of valid aviator selection tests. In N. Johnston, R. Fuller, and N. McDonald (Eds.), Aviation Psychology: Training and Selection. Cambridge: Avery.
Hunter, D. R., and Burke, E. (1994). Predicting aircraft pilot-training success: A meta-analysis of published research. Journal of Aviation Psychology, 4, 297-313.
Hunter, D. R., and Burke, E. (1995). Handbook of Pilot Selection. Cambridge: Ashgate.
Hunter, J. E., and Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
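
Illustrative sketch of the calculations referred to under Reliability and Construct validity

The following Python sketch is offered only as an illustration of two calculations the paper refers to; it is not the authors' analysis code. The function names are hypothetical. The first function applies the classical correction for attenuation. As an assumption on our part, setting both reliabilities to 0.8 (equivalent to dividing by 0.8) reproduces the reported corrected range of 0.34 to 0.71 from the smallest and largest Overall values in Table 2 (.27 and .57). The second function computes a sample-size-weighted mean and a simplified lower 90% credibility value from the observed weighted standard deviation; the full Schmidt-Hunter model would first subtract sampling-error variance, which this sketch does not do. The coefficients passed to it below are made-up inputs, not values from Table 1.

```python
import math


def correct_for_attenuation(r, rel_x, rel_y=1.0):
    """Classical correction: observed r divided by sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r / math.sqrt(rel_x * rel_y)


def weighted_mean_and_credibility(coefs, ns, z=1.28):
    """Sample-size-weighted mean and a simplified lower 90% credibility value.

    Uses the observed weighted SD (no sampling-error correction), so the
    bound is only an approximation of the Schmidt-Hunter credibility value.
    """
    if len(coefs) != len(ns) or not coefs:
        raise ValueError("coefs and ns must be non-empty and of equal length")
    total_n = sum(ns)
    mean = sum(n * c for c, n in zip(coefs, ns)) / total_n
    var = sum(n * (c - mean) ** 2 for c, n in zip(coefs, ns)) / total_n
    return mean, mean - z * math.sqrt(var)


# Smallest and largest Overall values from Table 2, with both
# reliabilities assumed to be 0.8 (an assumption, see the text above).
low = round(correct_for_attenuation(0.27, 0.8, 0.8), 2)
high = round(correct_for_attenuation(0.57, 0.8, 0.8), 2)
print(low, high)  # 0.34 0.71

# Hypothetical coefficients and sample sizes, for illustration only.
mean, lower = weighted_mean_and_credibility([0.90, 0.80], [100, 300])
print(round(mean, 3), round(lower, 3))
```

Correcting for the reliability of the Danish tests alone, that is dividing by the square root of 0.8, would give roughly 0.30 to 0.64 instead, which is why the sketch treats the reported figures as a division by 0.8.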