
The principles and practice of

psychological assessment
SECOND EDITION

Alwyn Moerdyk

Van Schaik
PUBLISHERS
Published by Van Schaik Publishers
A division of Media 24 Books
1059 Francis Baard Street, Hatfield, Pretoria
All rights reserved
Copyright © 2015 Van Schaik Publishers

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means – electronic, mechanical, photocopying,
recording or otherwise – without the written permission from the publisher, except in
accordance with the provisions of the Copyright Act, 98 of 1978.

Please contact DALRO for information regarding copyright clearance for this publication. Any
unauthorised copying could lead to civil liability and/or criminal sanctions.

Tel: 086 12 DALRO (from within South Africa) or +27 (0)11 712 8000
Fax: +27 (0)11 403 9094
Postal address: PO Box 31627, Braamfontein, 2017, South Africa
http://www.dalro.co.za

First edition 2009


Second edition 2015

ISBN 978 0 627 03270 7

eISBN 978 0 627 03579 1

Commissioning editors Rosemary Lerungoana & Marike von Moltke


Production manager Werner von Gruenewaldt
Editorial coordinator Lee-Ann Lamb
Copy editor Linton Davies
Proofreaders Wendy Priilaid, Beverlie and Linton Davies
Cover design by Gisela van Garderen
Cover image Corbis/GreatStock
Typeset in 9.5 on 11 pt Plantin by Pace-Setting & Graphics
Printed and bound by Interpak Books, Pietermaritzburg

Every effort has been made to obtain copyright permission for material used in this
book. Please contact the publisher with any queries in this regard.

Please note that reference to one gender includes reference to the other.

Website addresses and links were correct at time of publication.

The cover picture is of Nadia Comaneci, the first gymnast ever to be awarded a perfect score of 10 in Olympic competition. This picture is used because it
illustrates the importance of attaching numbers to behaviour, without which no form of
accurate judgement or agreement about performance or behaviour of any kind is
possible.
About the author

Alwyn Moerdyk trained as an Industrial Psychologist and has over thirty years’ experience in the assessment field. He has worked in research
organisations such as the National Institute for Personnel Research
(NIPR) and the Chamber of Mines Research Organisation. In both of
these organisations he was involved in the design, development and re-
standardisation of various psychometric tests. While at the Chamber of
Mines he spent a few weeks in Israel working under Reuven Feuerstein,
coming to understand the thinking behind the concept of dynamic
testing and the Learning Potential Assessment Device or LPAD. During
this period he also served on a sub-committee tasked with the selection
of apprentices for the Mining Industry. When the Chamber of Mines
Research Organisation closed its human resources research division,
Alwyn moved to Transnet, where he helped in the re-focussing of the
organisation’s selection policy immediately prior to the 1994 democratic
elections. He also served on various committees involved in the
selection of technicians and potential bursary holders. He also worked
for a short period in a consulting practice involved in assessment centres
with various blue chip clients. He has worked in a school setting where
he was responsible for ability and career guidance assessment and
conducted career guidance assessments on a regular basis. He has also
been an expert witness in Third Party court cases involving brain and
other physical injuries resulting from motor vehicle accidents (MVAs)
and cerebro-vascular incidents (CVIs – strokes).

Previously he taught cross-cultural psychology at the University of Natal in Durban, and currently lectures in assessment and various aspects of
industrial and organisational psychology at Rhodes University in
Grahamstown. He is also well known in the commercial seminar circuit
where he often presents papers on selection issues, especially as these
relate to shift-workers and the effects of HIV/AIDS on mental
functioning.
Preface to the first edition

There are numerous books that deal with psychological assessment, so why another?

Firstly, most of the existing texts tackle the issue of assessment from a
clinical psychology or mental health perspective, while very few
approach assessment from an organisational perspective.

However, we are all aware that much of the work done in organisations
by industrial psychologists and people with an industrial or
organisational psychology background involves psychological
assessment in some form. The best known is assessment for selection
and placement, although various other forms of assessment take place
routinely. Most people also assume that assessment is aimed at
individuals, but organisational effectiveness depends on sound practice
at three or four distinct levels – the individual, the group or team, the
organisation, and external stakeholders such as clients and suppliers.
Most of the available assessment texts do not cover all these parameters.

Furthermore, although many of the technical and scientific aspects of psychological assessment are common to all areas of applied
psychology, there are numerous facets and applications that are unique
to the organisational context. These include selection (and fairness in
selection), performance appraisal, assessment centres and the evaluation
of training outcomes, to name a few. In addition, organisational
application, more than any other area of applied psychology, is
increasingly being confronted with issues related to computerised
assessment, both on site and via the internet.

In the light of the above, this book is more of an explanatory than a critical text, although where the material is problematic and/or not
properly defined (such as the concepts of intelligence, emotional
intelligence and competence), the text gives a fair amount of background
information and theory. This will enable a person carrying out an
assessment to understand exactly what he is assessing and to make an
informed choice about the technique or type of measurement to use, as
well as to understand what the results really mean.

The book is divided into five sections. The first begins by looking at the
basic theory of measurement and puts forward an unashamedly
empiricist view – we need to measure objects in order to understand
them fully. The text describes the properties of a good measuring device.
It then distinguishes between looking at and looking for, and goes on to
examine how we set about systematically observing some phenomenon,
a discussion which takes us on to how we set about drawing up an
accurate and reliable instrument or technique for assessing human
characteristics and attributes.

The second section of the book covers the basic technical matters of
psychometric theory, namely reliability, validity and interpretation of
assessment results, or what an assessment score means. The text also
considers how best to combine several assessment results to ensure
sound decisions (Chapter 6). It takes an in-depth look at the issue of
fairness, how it is measured, and ways to improve the fairness of the
assessment process. The section closes with a discussion of the
principles underlying the sound management of the assessment process,
including the control of assessment materials and the training of
assessment professionals and practitioners. South Africa’s current
policies and standards regarding control and professional training are
compared with those of other countries.

The third section of the book considers various domains or areas in which assessment takes place. It examines in depth such constructs as
intelligence, personality and competence, illustrating how our
definitions and theories of these constructs shape the assessment
techniques we develop and use, as well as our interpretation of the
results.
In the fourth section, the text examines how and when to apply various
assessment techniques in the workplace (Chapter 14), and the
contribution of psychological assessment to career counselling (Chapter
15). It also considers the psychometric properties and general utility of
interviewing as an assessment tool (Chapter 16). The final chapter in
this section deals with assessment centres, their construction and
scoring, and their strengths and weaknesses.

The last section of the book is Chapter 18, which examines a variety of
“evolving” issues (such as definitions of emotional intelligence and
competence), as well as some emerging trends. One of these is the
increasing computerisation of the assessment process (and the promise
of new techniques and old problems that result from this). The chapter
also looks at some new areas of theory that are likely to impact on
psychological assessment, particularly the theories of artificial
intelligence and chaos theory or complexity science.

Each chapter closes with a summary of the material discussed, as well as a section designed to test the reader’s understanding with short
paragraph and essay-type questions. In addition to a full set of
references, there is also a glossary of terms used in the text and an
appendix with a special cognitive map of psychological tests. The
cognitive map presents the various types of tests and item formats in
such a way that a person encountering psychological tests for the first
time will be able to understand clearly the nature of the available tests
and test batteries. There is also a second appendix which shows how to
calculate correlations (necessary for various forms of reliability and
validity testing, and for item-whole correlations) using Statistica®,
Excel® and SPSS® (Statistical Package for the Social Sciences).
Preface to the second edition

It has been five years since the first edition appeared and, although the
content is as relevant as it ever was, there is a need to revisit and update
the material in the light of changing developments and new theory.

In addition to the updating of the contents of most chapters, the most significant changes are as follows:

Two new chapters have been included. The first of these is Chapter 8 in
which the important issues of assessing in a cross-cultural context are
explored in some depth, looking at both the theoretical issues raised by
assessing people with limited ability in the language of the assessment
(usually English in our context) and differences in understanding
resulting from different social and cultural experiences. The chapter
examines some of the technical issues (based on Item Response Theory
or IRT) required to detect the presence and analyse the extent of any
cross-cultural factors that may affect the validity and fairness of the
decisions based on the assessment.

The second new chapter is Chapter 13, which looks in depth at the
assessment of honesty and integrity. Some of the material was
previously contained in the chapter on Personality (Chapter 11), but it is
such an important topic in this country and throughout the world that it
was decided to devote a whole chapter to it.

In addition, some fairly extensive changes have been made to the contents of what was Chapter 8 (now Chapter 9) on the control of the
assessment process. In particular, the issues of the statutory control of
testing, the classification of psychological tests and whether Internet
testing should be supervised by registered psychologists are discussed
both within a local historical context and in the light of significant
developments elsewhere in the world. These issues are also reflected in
the final chapter which looks at the future of testing and assessment
(Chapter 18, previously Chapter 16). In particular, it addresses the issues of whether and how testing and assessment can be aimed at redressing previous disadvantage and inequity.

Apart from these changes, the points made in the Preface to the first
edition remain as true and as important today as they were five years
ago.
Foreword to the first edition

Assessment has always been an important aspect of management science and will continue to be a cornerstone of human resource practice – a fact that has been clearly recognised by many practitioners and consultants in
the field. It is therefore surprising to note that relatively few texts
looking specifically at the various forms of assessment in the workplace
have been published.

To find a wonderfully produced and handsome book that covers most, if not all, of the areas of interest to industrial/organisational psychologists
and human resources managers under one cover is a real pleasure. When
this is done in a way that is theoretically sound and current, while at the
same time being practical and hands on, it becomes a very worthwhile
addition to the literature on the topic. An added bonus is the fact that the
text draws on and even contributes to international best practice, while
remaining sensitive to the issues facing the dynamics and complexities
of the South African workplace.

Students want textbooks that are accessible, comprehensive and up to date. They also want them to be engaging, including realistic examples
and case studies, and they like an attractively produced text. This book
admirably fulfils all these criteria. I particularly appreciate the clarity
and engaging style in which the book has been written. It will be of
benefit both to students who are new to the area and to seasoned
professionals in the field needing a clear and concise overview of the
theories and practices on which scientifically sound and culturally fair
assessment is based.

This is a book that will be referred to again and again because of its
usefulness in so many courses and in the real-life work situation. It will,
I am sure, become a “must buy” for many. The author is to be
congratulated on producing a real gem.

Adrian Furnham
DPhil (Oxon) DSc (Lond) DLitt (Natal)
Professor of Psychology
University College, London

February 2009
Foreword to the second edition

When I wrote the foreword to the first edition I stated that I thought the
book was well written and that it would be well received by students,
academics and practitioners alike. This has indeed been the case.

I am pleased to see this second edition and its two new chapters, one on
assessing integrity and the other on assessing in cross-cultural contexts.
Both of these are vital content areas in today’s world. I am proud to
associate myself with this updated edition, which I am convinced will have an impact as great as, if not greater than, that of the first edition.

The author is to be commended on maintaining a high standard in this important area.

Adrian Furnham
DPhil (Oxon) DSc (Lond) DLitt (Natal)
Professor of Psychology
University College, London

May 2014
Acknowledgements

Although I have tried to acknowledge all references in the text and to give credit for sound ideas when I have used them, any mistakes or misunderstandings are my own responsibility. I would like to thank those
colleagues who assisted me by commenting on the original draft and
making sound suggestions about content and layout. Thank you, Roelf,
Maureen and Bernadette – if I have ignored some of your suggestions,
forgive me. If the text is weaker because I did not understand or chose to
ignore your comments, only I can accept the responsibility for this.

I also acknowledge the input I received from various colleagues in the field and I value the suggestions they made during the development of
the book. I trust they will recognise their contributions when they see
them.

I must also thank the staff of Van Schaik Publishers for all the work they
have put into the market research for, and production of, the book over
the years. In particular, the efforts of Julia Read and Nangamso Phakathi
are acknowledged. The hard work of Marike von Moltke and Lee-Ann
Lamb in the production of the second edition is also recognised with
gratitude.

Finally, I must express my sincerest gratitude to the peer reviewers for their painstaking perusal of the draft manuscript and the positive
suggestions that they made. Most of all I wish to thank them for the
positive things they have said about the book.

I trust that you, the reader, will find the text useful and I hope that it
inspires you to be a true professional, approaching the assessment of
others with a scientific rigour and great sensitivity. What may seem like
an everyday activity to you may mean the difference between success
and failure for the people you assess – every assessment is a “high-
stakes” process in the lives of those who undergo it.

ALWYN MOERDYK
Grahamstown
February 2009 / May 2014
The aim of this book

This book is designed as a mid-level text which addresses various aspects of assessment and psychometric theory of relevance to graduate
psychologists in South Africa in the first decades of the 21st century. It
consists of five sections.

Section 1 begins with a chapter that looks at what assessment is and the
benefits of quantification within a positivist or neo-positivist framework.
Chapter 2 examines processes of obtaining data, using a general
observation approach. Chapter 3 outlines the process of drawing up
and/or translating psychological measures. (This provides the basis for a
useful practical exercise or project for more senior students.)

Section 2 consists of six chapters (4–9) which deal with central psychometric topics, namely reliability (Chapter 4), validity (Chapter 5), interpreting assessment scores (both normative and other) and ways of combining scores from different measures to arrive at a sound conclusion (Chapter 6). Chapter 7 considers the issue of fairness, a topic of particular relevance to the current local context, while Chapter 8, which is a new chapter, explores important issues of assessing in a cross-cultural context. Chapter 9 considers issues of administration, confidentiality and the like.

Section 3 consists of four chapters and examines various assessment domains, namely intelligence and ability (Chapter 10), personality (Chapter 11) and competence (Chapter 12). Chapter 13 looks in depth at the assessment of honesty and integrity; this is also a new chapter in this edition.

Section 4 examines how assessment is used in various organisational areas. Chapter 14 looks at assessment in the organisational context
(including selection) and Chapter 15 considers issues relating to career
counselling. The next chapter examines the processes and psychometric
properties of interviewing in the organisational arena (Chapter 16) and
assessment centres (Chapter 17).

Section 5 consists of only one chapter (Chapter 18) in which new directions in assessment, including issues relating to testing over the
Internet, are discussed.

Finally, there is a comprehensive glossary of terms and a bibliography.


Contents

Preface to the first edition


Preface to the second edition
Foreword to the first edition
Foreword to the second edition
Acknowledgements
The aim of this book

Section 1 Basic theory of assessment


Chapter 1 Introduction to why and how we assess
1.1 Introduction
1.1.1 What is assessment?
1.1.2 Measurement
1.2 The advantages of quantification
1.2.1 Objectivity
1.2.2 Precision
1.2.3 Analysis and comparison
1.2.4 Generalisability
1.2.5 Communication
1.2.6 Economy
1.3 Why do we assess?
1.4 Formative and summative assessment
1.5 How do we obtain data?
1.5.1 Direct observation
1.5.2 Historical records
1.5.3 Referral information
1.5.4 Interviews
1.5.5 Written answers
1.5.6 Intervention
1.6 Triangulation
1.7 Levels of measurement
1.7.1 Nominal data
1.7.2 Ordinal data
1.7.3 Interval data
1.7.4 Ratio data
1.8 How do we know if our measure is a good one?
1.8.1 Is my measure relatively consistent?
1.8.2 Does my measuring device measure what it claims to
measure?
1.8.3 Is my measure fair – does it treat what is being measured
fairly?
1.8.4 What do my scores actually mean?
1.9 Problems associated with quantification in social sciences
1.10 Summary

Chapter 2 Observation
2.1 Introduction
2.1.1 Casual observation
2.1.2 Systematic observation
2.2 The ABCs of observation
2.2.1 Antecedents – those things that go before
2.2.2 Behaviours
2.2.3 Consequences – things that follow from the behaviour
2.3 Ways of categorising the observation process
2.3.1 Context
2.3.2 Observer involvement
2.3.3 Intervention or manipulation
2.4 Use of tools or aids
2.5 Observation schedules
2.6 Assessment as a form of research
2.7 Ethical issues
2.8 Summary

Chapter 3 Developing a psychological measure


3.1 Introduction
3.2 Techniques used in measurement
3.3 Types of content
3.4 Application formats
3.5 Developing a scale or test
3.5.1 Conceptualising
3.5.2 Operationalising
3.5.3 Quantifying
3.5.4 Pilot testing
3.5.5 Item analysis
3.5.6 Norm development and interpretation
3.5.7 Evaluation of the technique
3.6 Answer formats
3.6.1 Dichotomous items
3.6.2 Likert scales
3.6.3 Guttman scales
3.6.4 Item weighting
3.6.5 Item direction and reverse scoring
3.6.6 Ipsative scoring
3.7 Summary

Section 2 Introduction to psychometric theory


Chapter 4 Reliability
4.1 Introduction
4.1.1 The theory of measurement
4.1.2 Why do random errors occur?
4.1.3 Definition of reliability
4.1.4 Robustness versus sensitivity of assessment
4.1.5 Standard error of measurement
4.2 Sources of error
4.2.1 The assessment technique or measure
4.2.2 The assessment process
4.2.3 The person being assessed
4.2.4 The administrator or scorer
4.3 Forms of reliability determination
4.3.1 Test–retest reliability
4.3.2 Parallel or alternate form reliability
4.3.3 Internal consistency
4.3.4 Inter-scorer or inter-rater reliability
4.4 Factors affecting reliability
4.4.1 Speed versus power tests
4.4.2 Restriction of range
4.4.3 Ability level and ceiling or floor effects (skewness of
distribution)
4.4.4 Length of scale (number of items)
4.4.5 Subjective scoring
4.5 Summary

Chapter 5 Validity
5.1 Introduction
5.2 Forms of validity
5.2.1 Construct (theoretical) validity
5.2.2 Content validity
5.2.3 Criterion-related (empirical) validity
5.2.4 Face validity
5.2.5 Ecological validity
5.2.6 Incremental validity
5.2.7 Synthetic validity
5.3 Interpreting validity coefficients
5.4 The criterion problem
5.5 Validity generalisation
5.6 Factors affecting validity
5.6.1 Characteristics of the assessment technique or instrument
5.6.2 Individual characteristics
5.6.3 Demand characteristics
5.7 Summary

Chapter 6 Combining and interpreting assessment results


6.1 Introduction
6.1.1 Expectancy tables
6.1.2 Norm referencing
6.1.3 Age, grade and developmental stage referencing
6.1.4 Criterion or domain referencing
6.1.5 Self-referencing
6.2 Norms
6.2.1 Interpretations based on central tendency
6.2.2 Interpretation based on the number of people with lower
scores
6.2.3 Age equivalents or scores
6.2.4 Grade equivalents or scores
6.3 Developing and reporting norms (norm tables)
6.4 Norm groups
6.5 Combining information and making decisions
6.5.1 Mechanical (actuarial) versus clinical combination
6.5.2 Methods of combining various scores
6.6 Comparing results from different tests
6.7 Summary

Chapter 7 Fairness in assessment


7.1 Introduction
7.1.1 Definition of fairness
7.1.2 Fairness, bias and discrimination
7.1.3 Discrimination
7.2 The assumption of psychic unity in relation to psychological
assessment
7.2.1 Etic and emic approaches
7.3 Evidence of unfairness
7.3.1 Group differences
7.3.2 Differential item functioning
7.3.3 Regression analysis
7.4 Approaches to fairness
7.4.1 Unqualified individualism
7.4.2 Qualified individualism
7.4.3 Group-based decisions
7.4.4 The sliding band approach (banding)
7.5 Approaches to ensure fairness in assessment
7.5.1 Natural and inevitable differences
7.5.2 Removal of discriminatory items
7.5.3 Separate tests
7.5.4 Single tests, different norms
7.5.5 Single tests, same norms
7.6 Ways of ensuring fairness in practice
7.6.1 Do not assess
7.6.2 Interviews only
7.6.3 Observation
7.6.4 Separate (different) assessment processes
7.6.5 Same measure, different norms
7.6.6 Single method, single norm
7.7 Summary

Chapter 8 Assessing in a multicultural context


8.1 Introduction
8.1.1 Definitions of culture
8.1.2 Emic and etic approaches
8.1.3 The issue of acculturation
8.2 Approaches to cross-cultural assessment
8.2.1 Apply
8.2.2 Translate/adapt
8.2.3 Develop culture-friendly tests
8.2.4 Develop culture-specific tests
8.3 Forms of bias
8.3.1 Construct bias
8.3.2 Item bias
8.3.3 Method bias
8.4 Forms of equivalence
8.4.1 Construct equivalence
8.4.2 Measurement unit equivalence
8.4.3 Scalar equivalence
8.5 Detecting item bias
8.5.1 Judgemental techniques
8.5.2 Non-parametric statistical approaches
8.5.3 Parametric approaches to DIF analysis
8.6 Method bias
8.6.1 Detecting method bias
8.7 Addressing issues of bias and lack of equivalence
8.7.1 At the design stage
8.7.2 At the implementation stage
8.7.3 At the analysis stage
8.8 Summary

Chapter 9 Managing the assessment process


9.1 Introduction
9.2 Important standardisation procedures
9.2.1 Preparation
9.2.2 Administration
9.2.3 On completion of the assessment
9.2.4 Interpretation of the results
9.2.5 Feedback of results
9.2.6 Confidentiality of results
9.2.7 Dealing with special situations or participants
9.3 Setting and keeping ethical standards
9.3.1 The choice of techniques
9.3.2 The administration of the assessment process
9.3.3 The scoring, interpretation and feedback of results
9.3.4 Security of the material
9.4 The rights of people being assessed
9.5 Statutory control of psychological techniques
9.5.1 South Africa
9.5.2 Britain
9.5.3 Europe
9.5.4 The US/Canada
9.5.5 Australia and New Zealand
9.5.6 China
9.6 The classification of psychological tests
9.6.1 Test type
9.6.2 Setting
9.6.3 Purpose and use
9.6.4 Administration versus interpretation
9.6.5 South Africa
9.6.6 Britain
9.6.7 The US
9.7 Psychological testing on the Internet
9.7.1 South Africa
9.7.2 Britain
9.8 Protection of minority groups
9.8.1 South Africa
9.8.2 Britain
9.8.3 The US
9.8.4 Europe
9.8.5 Australia
9.9 South Africa in relation to other parts of the world
9.10 Summary

Section 3 Domains of assessment


Chapter 10 Assessing intelligence and ability
10.1 Introduction
10.1.1 Intelligence defined
10.2 The historical development of the concept of intelligence
10.2.1 Francis Galton
10.2.2 James McKeen Cattell
10.2.3 Alfred Binet
10.2.4 Lewis Terman
10.2.5 Developments in intelligence testing after Binet
10.3 Structural models of intelligence – the building blocks of
intelligence
10.3.1 Structural (factor analytic) approaches
10.3.2 Thurstone’s theory of primary mental abilities (1930s)
10.3.3 Raymond B. Cattell (1960s–1970s)
10.3.4 Philip Vernon (1950s–1970s)
10.3.5 J.B. Carroll (1930s–1970s)
10.3.6 J.P. Guilford (1950s–1980s)
10.4 The cognitive approach
10.4.1 Hunt’s cognitive correlates approach
10.4.2 Sternberg’s componential theory (1970s–1990s)
10.4.3 Sternberg’s triarchic theory of intelligence (1980s–2000s)
10.4.4 Howard Gardner’s theory of multiple intelligences (1980s–
2000s)
10.4.5 Das and Naglieri’s PASS Theory of Intelligence
10.4.6 Emotional intelligence
10.5 Assessing intelligence
10.5.1 Series items
10.5.2 Matrix items
10.5.3 Odd one out
10.5.4 General knowledge items
10.5.5 Assembly tasks
10.5.6 Group and individual assessment
10.5.7 Verbal and performance scales
10.5.8 Dynamic testing
10.6 The changing context of intelligence testing
10.7 Summary

Chapter 11 The assessment of personality


11.1 Introduction
11.1.1 Definition of personality
11.1.2 Idiographic versus nomothetic approaches
11.2 Theories of personality
11.2.1 Biological approaches
11.2.2 Developmental approaches
11.2.3 Psychoanalytic theories
11.2.4 Need theories
11.2.5 Phenomenological approaches
11.2.6 Trait approaches
11.3 Assessing personality
11.3.1 Observation
11.3.2 Computer-based simulations
11.3.3 Projective techniques
11.3.4 Objective approaches
11.3.5 The trait approach
11.3.6 The factor analysis approach – the case of R.B. Cattell’s 16
PF
11.3.7 The five-factor theory
11.3.8 Multiple-construct batteries
11.3.9 Behaviour-oriented approaches
11.4 Assessing personality in the organisational context
11.4.1 The five-factor model
11.4.2 MBTI
11.4.3 Locus of control
11.4.4 Type A and Type B personalities
11.5 Summary

Chapter 12 Assessing competence


12.1 Introduction
12.1.1 Definition
12.2 Drawing up a competency framework
12.2.1 Decide on the overall purpose of the job
12.2.2 Decide on units of competence
12.2.3 Describe elements of each competency (KPA)
12.2.4 Establish performance criteria
12.2.5 Draw up range statements
12.2.6 Specify sources of information
12.2.7 Identify potential barriers
12.3 Assessment of competence
12.3.1 Levels of competence
12.4 Various kinds of competency
12.4.1 Core and cross-functional competencies
12.4.2 Technical and higher-order competencies
12.5 Advantages of using a competency framework
12.6 How are competencies identified?
12.7 Developing a competency portfolio
12.8 Reliability, validity and fairness
12.9 Competence in non-work-related areas
12.10 Summary

Chapter 13 Assessing integrity and honesty in the workplace


13.1 Definition
13.2 Assessing integrity
13.2.1 Direct (overt) approach
13.2.2 The covert or personality profiling approach
13.2.3 The psycho-physiological approach
13.2.4 The mental health approach
13.2.5 Social/lifestyle profiling
13.3 The psychometric properties of integrity measures
13.3.1 Reliability
13.3.2 Validity
13.3.3 Scope
13.3.4 Faking on integrity tests
13.3.5 Fairness and adverse impact
13.4 Monitors and control factors
13.4.1 Consistency
13.4.2 Impossible responses
13.4.3 Social desirability
13.5 Summary

Section 4 Assessment in the organisational context


Chapter 14 Assessment in organisations
14.1 Introduction – why do we assess in industry?
14.2 Assessment at the individual level – selection
14.2.1 Definition
14.2.2 The selection process
14.2.3 Job descriptions (position profiling/competence mapping)
14.2.4 Implementing a selection process
14.2.5 Benefits of proper selection
14.2.6 The cost of not selecting properly
14.2.7 Staff development
14.2.8 Promotion and transfer
14.2.9 Performance management
14.3 Career path appreciation (stratified systems theory)
14.3.1 Stratified systems theory
14.3.2 Matrix of work relations (MWR)
14.3.3 The concept of flow
14.3.4 The uses of CPA
14.3.5 Trajectories
14.3.6 CPA procedure
14.4 When to use costly selection techniques
14.5 Assessment at group level
14.5.1 Team work
14.5.2 Assessment of industrial relations climate
14.5.3 Selection of people to work abroad
14.6 Organisational aspects
14.6.1 Mapping changes
14.6.2 Training
14.6.3 Forensic evaluation
14.7 Assessing external stakeholders
14.8 Criterion measurement
14.8.1 Production measures
14.8.2 Track record
14.8.3 Judgemental data
14.8.4 Economic value added
14.9 Summary

Chapter 15 Assessment for career counselling


15.1 Introduction
15.1.1 The world of work
15.1.2 What is a job?
15.1.3 What is a career?
15.1.4 Definition of a career
15.1.5 Choosing a career
15.2 What jobs/careers are available?
15.3 The characteristics of jobs and/or careers
15.3.1 A common-sense approach
15.3.2 Holland’s model
15.3.3 Schein’s career anchors model
15.4 Assessing individual characteristics
15.4.1 Ability
15.4.2 Values, interests and needs
15.4.3 Personality
15.5 Summary

Chapter 16 Interviewing
16.1 Introduction
16.1.1 Definition
16.1.2 Users of the information
16.2 Employment interviews
16.2.1 Traditional interviews
16.2.2 Structured interviews
16.2.3 Semi-structured interviews
16.2.4 Counselling interviews
16.3 Problems associated with interviews
16.3.1 Reliability
16.3.2 Validity
16.4 Reasons for poor reliability and validity
16.4.1 Theoretical orientation
16.4.2 Experience of the interviewer
16.4.3 Sophistication of the client
16.4.4 The nature of the problem
16.4.5 Confirmatory biases and self-fulfilling hypotheses
16.4.6 So why do they continue to be used?
16.4.7 Improving interviewing as an assessment technique
16.5 Stages of an interview
16.6 Effective interviewing
16.7 Summary

Chapter 17 Assessment centres


17.1 Introduction
17.1.1 Definition of an assessment centre
17.1.2 Assessment centres and development centres compared
17.1.3 Advantages of assessment centres
17.1.4 Disadvantages of assessment centres
17.1.5 What do assessment centres measure?
17.2 Identifying the dimensions (competencies) to be assessed
17.2.1 Competencies assessed
17.2.2 Definition of each competency
17.2.3 Designing or locating appropriate assessment centre
exercises
17.2.4 Drawing up a scoring system or matrix
17.3 Conducting a typical assessment centre
17.4 Psychometric properties of assessment centres
17.4.1 Reliability
17.4.2 Validity
17.4.3 Fairness
17.4.4 Gender differences
17.5 Improving the cultural fairness of assessment centres
17.5.1 Job analysis
17.5.2 Design of the assessment process
17.5.3 Exercise choice
17.5.4 Administration
17.5.5 Assessor training and rating process
17.5.6 Feedback to participants
17.6 Summary

Section 5 The future of assessment in organisations


Chapter 18 New developments in assessment
18.1 Introduction
18.2 Constructs to be assessed
18.2.1 Intelligence
18.2.2 Potential
18.2.3 Personality
18.2.4 Competencies
18.2.5 Emotional intelligence
18.2.6 Controlling response sets
18.2.7 The assessment of behavioural change
18.2.8 Bespoke (tailor-made) tests
18.2.9 Focused assessment batteries
18.2.10 New constructs associated with positive psychology
18.2.11 Fairness and equal opportunity
18.2.12 Translation, adaptation and development of culture
instruments
18.3 Development of new technologies
18.3.1 Computer-based testing
18.3.2 Computer-assisted administration
18.3.3 Generation of norms
18.3.4 The assessment of additional parameters
18.3.5 Computer-based adaptive testing
18.3.6 Computerised report writing
18.3.7 Assessment via the Internet
18.3.8 Dynamic assessment
18.3.9 Stratified systems theory
18.3.10 Other new technologies
18.4 Theoretical advances
18.4.1 Complexity theory
18.4.2 Artificial intelligence
18.4.3 Biological approaches (biologically anchored assessment)
18.4.4 Greater environmental involvement and activism
18.5 Control of assessment and professional training
18.6 The future of psychological assessment and testing in particular
18.7 Conclusion

Appendices
Appendix 1 Some tests and measures of maximum and typical
performance
Appendix 2 Calculating correlations
Glossary of terms
References
Index
SECTION
1

Basic theory of assessment


In this first section of the book, we take a strong empiricist* line, arguing in
Chapter 1 that we need to measure properties and characteristics in order to
understand and communicate them. We then look at the basics of how we set about rigorously observing the characteristic or process we are trying to measure (Chapter 2). The last chapter in this section (Chapter 3) looks at how we set about constructing an accurate and consistent instrument for measuring psychological attributes and characteristics.
Note that in the text, terms in bold with an asterisk (*) are explained in the
glossary of terms at the end of the book, which includes other important terms
that are unique to this field, but are not necessarily mentioned in this book. Also
note that while the text is written in the masculine gender, this should be read as
the neuter gender – it makes for very clumsy reading to have he/she, his/her in
the text.
1 Introduction to why and how
we assess

OBJECTIVES

By the end of this chapter, you should be able to

define what assessment is


give reasons for assessing
discuss the importance of measurement, especially in an industrial, organisational
or human resources environment
describe the properties of a good measuring technique
discuss approaches to assessment.

1.1 Introduction

If we stop and think for a minute, we will see that there are a number of
reasons for assessing any object, person or process. Firstly, we may
want to see whether the person (or object) meets certain requirements –
does the person know enough to pass his examination? Is the person
coping with his situation?

We may also want to compare different things or situations to help us make a decision – how fast does this car go compared to another? How
economical is this vehicle (how much fuel does it use) in relation to
another? Is this candidate good enough to be employed and which
candidate is likely to be the best employee for a specific job or context?
Who should be given the bursary or scholarship?

We may also want to see whether a situation has changed over a given
period: has a person’s behaviour or ability improved, stayed the same or
deteriorated over time? Has our intervention (training or counselling)
resulted in any changes?

1.1.1 What is assessment?


Assessment* is the process of determining the presence of, and/or the
extent to which, an object, person, group or system possesses a
particular property, characteristic or attribute*. According to Goldstein,
Braverman and Goldstein (1991), “[a]ssessment is the systematic
collection of descriptive and judgmental information necessary to make
effective decisions”, while Kaplan (1982) argues that “[p]sychological
assessment involves the classification of behaviours into categories
measured against a normative* standard” [author’s bold].

1.1.2 Measurement
A closely related issue is that of measurement*. Although it may not be
too difficult to assess whether a person or system has a certain property,
it is often quite difficult to specify exactly how much of the property the
person or system possesses. For example, we may be able to judge that a
person is beautiful or intelligent, but it is far more difficult to say how
beautiful or intelligent the person is. Theories of measurement are
concerned with quantification of this kind. According to Nunnally and
Bernstein (1993, p. 29),

[m]easurement consists of rules for assigning symbols to objects to (1) represent quantities of the attributes numerically (scaling) or (2)
define whether the objects or phenomena fall into the same or
different categories with respect to the attributes concerned
(classification).

In other words, measurement involves applying clearly stated rules to determine how much of a certain property, characteristic or attribute is
present in a particular object, person, system or process.

Where assessment implies the use of structured observation techniques in order to pass judgement on some phenomenon, measurement refers to
the process of attaching a numeric value to that phenomenon as an aid to
assessing and interpreting it. Measurement in this context therefore
involves determining the form, size/magnitude, intensity, duration, frequency, antecedents and consequences of behaviours, cognitions, attributes/traits, abilities and interventions, or the relationship between these measurables.

To illustrate: if I put my hand into a bucket of water to test its temperature, I am assessing it. If I use a thermometer for this, then I am
measuring the temperature. If I look at myself in the mirror and see that
I am putting on weight, I am assessing my status. If I stand on a scale, I
am measuring my mass or weight. If I look at how fast the trees seem to
be flashing past me as I drive along, I am assessing the speed at which I
am travelling. If I look at the speedometer in my car, I am measuring my
speed. From this you can see that assessment is the process of passing
judgement about some phenomenon – it does not require measurement,
although the judgements made are far more precise and accurate when
we measure than when measurement is not involved.

1.1.2.1 Evaluation*
Related to measurement is evaluation. This involves interpreting or
attaching a judgemental value to an assessment: the water in my bath
may be too hot or too cold, or I may feel that I am too fat or too thin, or
even just right.

If I see water bubbling and steam coming from it, then I can assume that
the water is near to boiling and I do not have to test it. If I see chunks of
ice floating in the water, I can surmise that it is cold. This is called
observation*. Testing*, on the other hand, is the use of an intervention
of some kind to carry out the assessment. Putting my hand in the water
is a crude kind of testing procedure – I can tell if the water is too hot, too
cold or just right. Of course, using a thermometer is a much better option
because you get an accurate measurement that is easy to interpret.
According to Kaplan and Saccuzzo (2013), “the most important purpose
of testing is to differentiate among those taking the test” (p. 9).

1.1.2.2 Properties of a good measuring technique


A good measuring technique will have the following properties:

It will attach an observable phenomenon (e.g. the height of mercury in a thermometer) to the unobservable phenomenon (e.g. temperature).
There will be correspondence between the two phenomena (e.g. the
hotter the water, the higher the mercury).
The observable phenomenon (e.g. the height of the mercury) will be
scalable (i.e. have numerical value, e.g. degrees Celsius (°C)).
There will be relative consistency between the two phenomena: when
the temperature is at a certain level, the mercury will always rise to
the same scale level.
Because the rules are transparent and consistently applied, different
observers will be able to agree on the value assigned to the
phenomenon. This means that different people will agree that the
length of an object is 21,3 cm, for example.

Measurement is therefore the process by which we attach a value or number to a phenomenon in order to categorise and/or quantify it, using agreed-upon symbols
and criteria to represent quantities of a property or characteristic that are inherent
in the phenomenon. An important aspect of measurement is that there are rules
for attaching numbers to the phenomena, and these rules are public, transparent,
unambiguous and agreed upon by knowledgeable people.
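
As a small illustration of the last of these properties, suppose two observers each record the length of the same six objects. Their agreement can be quantified with a correlation coefficient: the closer it is to 1, the more consistently the measurement rules are being applied. The readings below are invented for illustration, and Python is used simply as one convenient tool for the calculation.

import numpy as np

# Lengths (in cm) of the same six objects as recorded by two independent observers
# (the readings are invented purely for illustration)
observer_a = np.array([21.3, 14.8, 33.0, 9.5, 27.1, 18.2])
observer_b = np.array([21.4, 14.7, 32.8, 9.5, 27.3, 18.0])

# A correlation close to 1 suggests the two observers apply the same rules consistently
agreement = np.corrcoef(observer_a, observer_b)[0, 1]
print(f"inter-observer agreement (r) = {agreement:.3f}")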
1.2 The advantages of quantification

Nunnally and Bernstein (1993) identify six major advantages of quantification.

1.2.1 Objectivity
A key to the growth of understanding is that different observers are able
to agree about what is being observed. Objectivity* is the extent to
which any process and its results are agreed to by neutral or unbiased
observers and is thus independent of the personal or subjective
judgement of those involved. In the case of assessment, objectivity is
enhanced when numerical values are attached to an object or
phenomenon in terms of known rules. Without this agreement between
observers about what is being observed and the results of this process,
there can be no knowledge, only speculation.

The ancient Greek philosopher Aristotle is reputed to have stated that women have fewer teeth than men because their heads are smaller. He could easily
have disproved this theory by simply asking his wife to open her mouth and
counting her teeth! This is what observation is about.

It is on these grounds that Nunnally and Bernstein (1993, p. 6) argue that “major advances in psychology, if not all sciences, are often based
upon breakthroughs in measurement”. They criticise various aspects of
Freudian theory such as libidinal energy as there are no accepted or
agreed-upon methods for observing or quantifying them (p. 6). To date,
over 100 years on, there is no scale for measuring Freud’s personality
types! As Nunnally and Bernstein point out (1993, p. 7), “psychology
can progress no faster than the measurement of its key variables”.

1.2.2 Precision
Measurement allows finer, more precise distinctions to be made, leaving
room for more subtle effects to be noted than is possible when personal
judgements are made. The average person is not able to judge when the
temperature of an object has risen a few degrees, nor to tell the
difference between a person with an IQ of 100 and one with an IQ of
110. However, in certain circumstances such distinctions could be vital.

1.2.3 Analysis and comparison


Quantification also allows for the more sophisticated analysis of patterns
and trends using statistical techniques such as t-tests, analysis of
variance*, regression analysis and factor analysis. Much of the progress
in personality theory and intelligence assessment over the last century
has depended largely on the development of advanced forms of
statistical analysis. Judging or comparing performance in such
technically complex areas of behaviour as competitive diving or
gymnastics would be impossible without attaching numbers or scores to
each individual’s performance.

1.2.4 Generalisability
A key aspect of any scientific enterprise is to find ways of generalising
from the specific to the general. For example, my dog at home is a
specific case of the more general class of animal known as a border
collie, which is a specific case of the more general class of creatures
known as dogs, which is a specific case of the more general class of
creatures known as mammals, which is …, and so on. Measurement
allows us to quantify and classify objects within larger superordinate
classes. In this process, we are able to specify what each case has in
common with other cases and how they differ. Although it may be
argued that everybody is unique and has nothing in common with other
people, this is clearly not so. Look around and you will see other males
and females, people of African, Indian and European origin. In some
ways we are all the same (we breathe oxygen and bleed red when we cut
ourselves), in others we are the same as only some (we are male or
female, for example) and in some respects we are unique. Our level of
focus depends on the questions we ask and the type of evidence we
regard as answers to these questions.

1.2.5 Communication
It is much easier to communicate and interpret information that is in
symbolic or numeric form. For example, we know what is meant when
we read that School X produces more A symbols in Grade 12 than does
School Y. We know what the A symbols refer to and so the information
does not have to be explained. However, suppose it is reported that a
new medication seems to make people more anxious. What does “more
anxious” mean and how would other researchers of the same
phenomenon interpret this? Conversely, if it is reported that people’s
average anxiety levels, as measured by the XYZ Anxiety Scale, rose
from 7,3 to 8,6, everyone who knows the XYZ Anxiety Scale would
easily understand what this means.

1.2.6 Economy
It is much easier to state that, on average, anxiety levels as measured by
the XYZ Anxiety Scale rose from 7,3 to 8,6, than to try to explain or
describe what this means in words.

In the physical and biological sciences, the design of good measuring tools is fairly simple because many of the characteristics or properties
that interest us are directly observable. In many cases, these properties
may not be visible to the naked eye and may need to be magnified. In
other instances, phenomena may be readily observed (such as heat or
weight), but may need special tools or techniques such as thermometers
and scales to quantify them.

In the case of social and psychological phenomena, most properties or characteristics are not directly observable, and techniques or instruments
like thermometers or scales need to be developed. The aim of this book
is to show you how we set about developing instruments* for
measuring psychological and social phenomena.

But, you may ask, why should we want to quantify any phenomenon?
Why do we need to attach numbers to properties?
1.3 Why do we assess?

Assessing, as defined earlier, is gathering information. There are a number of good reasons for assessing people and other phenomena:

1. To obtain accurate information to describe an existing situation
2. To gain an understanding of reasons for the occurrence of an
existing situation
3. To suggest ways of changing an existing situation
4. To illustrate the impact of any intervention or change process
5. To enable us to predict the future behaviour or actions of the people
involved

Of course, we do not only assess the behaviour of individuals, but also the performance of pairs of people (e.g. married couples), small groups
of people (e.g. families or teams), and larger groups of people such as
departments or whole organisations (e.g. safety attitudes, productivity
records).

1.4 Formative and summative assessment

It is important to distinguish between formative and summative assessment.

Formative assessment* is concerned with what happens during a process, and is designed to help the person managing that process to
gain insight into what is taking place and to modify it as necessary. This
form of assessment is more concerned with steering the process. An
example would be to ask students during a course of lectures what they
feel about the course, lecturing style, speed of delivery, and so on, to
help the lecturer adjust his delivery.

Summative assessment* is concerned with the outcome of a process or the current status of a phenomenon, for example the results of the end of
a course, test or examination.

1.5 How do we obtain data?

Given the reasons for wanting to obtain information described in 1.3, the
next question that arises is where and how do we obtain this information
or what are the sources of information available to us? In general, there
are six basic approaches to obtaining data.

1.5.1 Direct observation


An important source of information is observation. Direct observation
means that the person doing the assessment watches or observes the
behaviour of the person in question in a particular setting and draws
conclusions based on this. For example, a psychologist may watch
children at play or in a classroom situation, and base his evaluation on
what takes place in these settings. Similarly, a social psychologist may
observe how people behave in a supermarket, at a sports or cultural
event, or when they visit a relative in a hospital or old-age home. In such
cases, some form of direct observation occurs. Direct observation can be
time consuming and labour intensive, but is particularly useful when
respondents are self-conscious, unable or unwilling to talk, or when the
behaviour is very complex.

Furthermore, it is a mistake to think that psychology is about people only – there are many aspects of human behaviour that can best be
understood by looking at similar or related forms of behaviour in
primates and lower forms of animals. This branch of psychology is
known as comparative or evolutionary psychology*.
Over the years it has been common practice first to assess the effect of
an intervention on simpler organisms. For example, much of the early
work on learning used rats, dogs and pigeons because it is easier and
more ethically defensible to experiment on animals than on humans.
Although we know that there is no direct link between the behaviour of
lower animals and humans, we nevertheless need to examine the effects
of interventions (including medication) in as risk free a way as possible.
Thus, we examine the effects on animals before we extend the trials to
humans. In addition, however, many people argue that the exploration of
the behaviour of primates and other animals is of interest to
psychologists in itself, and not for what we can learn about human
behaviour, cognitions and the like.

1.5.2 Historical records


Because people are generally fairly consistent in what they do, the best
single predictor* of how they are likely to behave in the future is how
they have behaved in the past. It is therefore very useful when assessing
people to see what they have done in the past. This includes consulting
existing documents, such as school, court or medical records, case
histories or the person’s track record. At the same time, where the future
is not continuous with the past, past behaviour will not predict future
behaviour, and so other ways of gathering information or data become
more important.

1.5.3 Referral information


Most properly organised schools, clinics, hospitals and other agencies
have some kind of standardised intake form from which it is possible to
pick up potential problems. In many cases, there are also referral forms
by means of which problem cases are brought to the attention of the
authorities. If these do not exist, they should be designed and put into
place.

1.5.4 Interviews
A fourth source of information is to ask questions of the person and/or
those involved, such as parents, teachers and even victims, in the case of
a crime. These interviews may be structured or unstructured. In many
instances (such as with hospital/clinical intake and job selection
interviews), these interview schedules are standardised and available
commercially (see Chapter 16 for more details).

1.5.5 Written answers


Sometimes, the participant may be required to produce written
information. This may take several different forms, ranging from
relatively unstructured to quite formal and structured questionnaires*.
Examples include the following:

A narrative statement (as in the case of a crime or accident)


A relatively loose collection of questions in which answers to specific
questions are sought
A more structured survey, where responses to a number of questions
are required
A scale*, which is essentially a collection of questions designed to
assess a specific area of behaviour, such as personality, attitudes*,
values, various psychological problem areas and the like. No time
limit is set for responding.
A test*, which is essentially a sample* of information gathered under
strictly controlled, normally timed conditions. A test is nothing more
than observed data of a sample of behaviour gathered under
controlled conditions to exclude as many extraneous distracting or
noise* variables as possible. This applies to both psychological and
educational tests. Because of the importance of psychological testing
in assessing people, there is a whole chapter (Chapter 3) on how we
set about developing a test.

1.5.6 Intervention
The final form of observation involves some form of direct intervention
by the observer in an effort to answer “What if?” questions. For
example, a therapist may take a toy from the child he is observing to see
what happens when the toy is taken away. This is clearly an extreme
version of participant observation. If the therapist repeats this
intervention several times and controls extraneous conditions that may
influence the outcome, then this is called an experiment*.

1.6 Triangulation

An important issue to take note of when we talk about assessment is that of triangulation*. This simply means that we should not rely only on
one form of assessment but use as many different approaches as are
warranted. According to Foxcroft and Roodt (2009, p. 7),

information should be purposefully gathered across:

multiple measures
multiple domains*
multiple sources
multiple settings
multiple occasions.

They argue (p. 6) that “the assessment process is multidimensional in nature. It entails the gathering and synthesizing of information as a
means of describing and understanding functioning. This can inform
appropriate decision-making and interventions”.

Elsewhere they note:

Attention has shifted away from a unitary testing approach to multi-method assessment. There was a tendency in the past to erroneously
equate testing and assessment. In the process, clinicians forgot that
test results were only one source of relevant data that could be
obtained. However, there now appears to be growing awareness of the
fact that test results gain in meaning and relevance when they are
integrated with information obtained from other sources and when
they are reflected against the total past and present context of the
testee (Foxcroft & Roodt, 2005, p. 23).

1.7 Levels of measurement

Before we close this section on measurement, we need to examine the concept of levels of measurement. We have already noted above that
measurement serves two purposes, namely quantification (how much)
and categorisation (what kind). Although this is true, we can in fact use
numbers to indicate four different kinds of relationship. These are
known as levels of measurement*. Each level has certain properties and
different statistical manipulations that can be performed on the data.

1.7.1 Nominal data*


The first of the levels of measurement is termed “nominal” (or
“categorical” because it refers to categories) and is used to indicate
membership of a class. For example, we could label males 1 and females
2 (or vice versa). This does not mean that males are better or worse than
females in any way. In rugby, the last player in the scrum is a number 8
or the eighth man, while the fullback is number 15. This does not mean
that the fullback is worth more or less than the number 8 – it is merely a
name. The only property that is associated with a nominal scale is
equivalence – that is, all objects with the same number belong to the
same category.

The only permissible statistical manipulation that one can perform with
nominal data is to count the number of cases (e.g. in the psychology
class there are 93 1s (males) and 167 2s (females)). If there are several
categories, we can also establish the mode, which is the category with
the most members. We can also ask whether these numbers are to be
expected (given the number of males and females in the university as a
whole). To do this we would use one of several non-parametric
statistical techniques*, the best known of which is the chi-square (χ²)
test.
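
For readers who would like to see these manipulations in practice, here is a
minimal sketch in Python (the class figures are taken from the example above;
the 50/50 expected split is a hypothetical assumption):

    from collections import Counter
    from scipy.stats import chisquare

    # Nominal data: we may count cases, report the mode and test observed frequencies.
    genders = [1] * 93 + [2] * 167          # 1 = male, 2 = female, as in the class example
    counts = Counter(genders)
    print(counts.most_common(1))            # the mode: [(2, 167)]

    # Chi-square goodness-of-fit test against a hypothetical 50/50 split
    # (in practice the expected frequencies would come from the university as a whole).
    print(chisquare(f_obs=[93, 167], f_exp=[130, 130]))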

1.7.2 Ordinal data*


Ordinal scales* place objects in rank order so that they can be graded
harder, bigger or stronger than other objects. However, this does not say
how much of each property any object has, only that it has more or less
than other objects to which it is compared.

The permissible statistics that can be used include percentiles*,
interquartile range*, median* (the 50th percentile) and various order-
based statistical tests.
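
As a minimal sketch in Python (the ranks below are invented for illustration):

    import numpy as np

    # Ordinal data: only order-based statistics are meaningful.
    ranks = np.array([3, 1, 4, 2, 7, 5, 8, 6])      # e.g. eight pupils ranked on a class test
    print(np.median(ranks))                         # the median (50th percentile)
    q1, q3 = np.percentile(ranks, [25, 75])
    print(q3 - q1)                                  # the interquartile range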

1.7.3 Interval data*


With interval data, the size of the difference (or interval) between scores
is regarded as equal. For example, the difference between 20 °C and 40
°C is the same as the difference between 60 °C and 80 °C. However,
with interval scales there is no absolute zero (we can get temperatures of
-20 °C and -40 °C, therefore we cannot say that 20 °C is half as warm
as 40 °C or that 80 °C is twice as hot as 40 °C). This is possible only
where there is an absolute zero as with a physical property like weight
(mass). Nothing has a negative weight or is a negative length, so 40 kg
is twice as heavy as 20 kg and 60 cm is half of 120 cm.

The permissible statistics that can be used with interval data are the
arithmetic mean* (average), and statistics based on variance, such as t-
tests, Pearson correlation and analysis of variance. The normal
distribution and interpretation of an individual’s performance relative to
that of the group require interval data.
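
A minimal sketch of what is, and is not, permissible with interval data
(hypothetical temperatures in °C):

    import numpy as np

    # Interval data: differences and means are meaningful; ratios are not.
    temps_c = np.array([20.0, 40.0, 60.0, 80.0])
    print(temps_c.mean())                   # 50.0 - the arithmetic mean is permissible
    print(temps_c[1] - temps_c[0])          # 20.0 - the same interval as 80 minus 60
    # temps_c[1] / temps_c[0] would be misleading: 40 °C is not "twice as warm" as 20 °C.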

1.7.4 Ratio data*


Ratio data are characterised by equal intervals and an absolute zero*.
This absolute zero makes concepts such as “twice” or “half” meaningful.
(See also section 1.7.3.)
The permissible statistical techniques are the same as with interval data,
but include ratios as well.

Temperature is a good example of how this works in practice. At the
nominal level, we could say that cold = 1, hot = 2, lukewarm = 3, and so
on. At the ordinal level, we could arrange the temperatures as follows:

Description Value
Very cold ............................................ 1
Cold ................................................... 2
Not so cold ......................................... 3
Lukewarm .......................................... 4
Quite warm ........................................ 5
Hot ..................................................... 6
Very hot.............................................. 7
Boiling hot.......................................... 8

Note that the number of categories is quite arbitrary, and although the
temperatures are arranged in ascending order from coldest to hottest, the
difference between the various values is not constant – the difference
between “very cold” and “not so cold” is not the same as between “not
so cold” and “lukewarm”.

If we look at the centigrade or Celsius temperature scale, we know that
water freezes at 0 °C and boils at 100 °C, and that the difference
between 20 °C and 30 °C is the same as between 60 °C and 70 °C. This
is a true interval scale. However, because we can have temperatures
below 0 °C (e.g. -23 °C), this is not a ratio but only an interval scale,
therefore we cannot say that 60 °C is twice as hot as 30 °C.

Finally, there is the Kelvin (or absolute) scale, which has an absolute
minimum of 0 K (-273,15 °C). It is impossible to have a temperature lower
than this. According to the Kelvin scale, water freezes or melts at 273,15
K and boils at 373,15 K (273,15 K + 100). As a result of this absolute
zero, 100 K is exactly half as hot as 200 K. These relationships are
shown in Figure 1.1.
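
The point can be checked with a few lines of arithmetic; this is a minimal
sketch in Python:

    # Ratios are meaningful only on a scale with an absolute zero (Kelvin).
    def celsius_to_kelvin(c):
        return c + 273.15

    print(60 / 30)                                        # 2.0, yet 60 °C is not twice as hot as 30 °C
    print(celsius_to_kelvin(60) / celsius_to_kelvin(30))  # about 1.10, the true ratio of absolute temperatures
    print(200 / 100)                                      # 2.0 - on the Kelvin scale 200 K really is twice 100 K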

Figure 1.1 The four types of data applied to temperature

Several points need to be made about these four levels of measurement:

1. There is very little difference between ratio and interval data,
especially in psychology and other social sciences.
2. In almost all cases, psychological data are, at best, interval data; it is
difficult to imagine what an absolute zero would be when applied to
intelligence, anxiety or happiness.
3. We can easily move from a higher form of data such as interval to a
lower form of data such as ordinal or nominal. Sometimes we do
this when, for example, we group people with, say, a stress level of
between 5 and 10 (on an imaginary stress test) as suffering low
stress; those between 11 and 25 as moderately stressed; those with
stress scores of more than 26 as highly stressed. This is quite
permissible, but remember that moving from a higher to a lower
form of data means losing potentially useful information.
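
Point 3 can be illustrated with a short Python sketch that collapses imaginary
interval-level stress scores into the three ordinal categories described above
(all scores and cut-offs are illustrative):

    import numpy as np

    # Hypothetical scores on the imaginary stress test (interval-level data)
    scores = np.array([6, 9, 14, 19, 24, 27, 31, 8])

    # Collapse to ordinal categories: 0 = low (10 or below), 1 = moderate (11-25), 2 = high (26 and above)
    categories = np.digitize(scores, bins=[11, 26])
    print(categories)       # the grouping survives, but the exact scores are lost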

The levels of measurement and permissible statistics are summarised in
Table 1.1.

Table 1.1 Levels of measurement and permissible statistics

Nominal. Basis: group membership. Permissible statistics: number of cases,
mode, chi-square (χ²). Example from the natural sciences: food type. Example
from the social sciences: ethnic origin, gender.

Ordinal. Basis: more than, less than. Permissible statistics: median, quartiles
and inter-quartile range, percentiles, order statistics. Example from the
natural sciences: hardness of minerals; cold, warm, hot. Example from the
social sciences: class test grouping (1s, 2/1s, 2/2s, 3s, F1s, F2s, etc.).

Interval. Basis: equality of intervals or differences. Permissible statistics:
mean, variance, correlation, regression analysis. Example from the natural
sciences: temperature in °C. Example from the social sciences: most test
scores.

Ratio. Basis: absolute zero, equality of ratios. Permissible statistics: mean,
variance, correlation, regression analysis. Example from the natural sciences:
temperature in K, height, weight. Example from the social sciences: physical
properties only (no. of children, age, etc.).

Source: Based on Nunnally & Bernstein (1993, p. 11)

1.8 How do we know if our measure is a good one?

When we set about measuring any characteristic or process, there are
three basic properties of the measuring process we need to know about.

1.8.1 Is my measure relatively consistent?


The first property of a good measuring instrument is that it measures in a
constant fashion, and that we get similar results under different
conditions. For example, if I weigh myself today and get a reading of 51
kg, I would expect to weigh about 51 kg tomorrow and the next day
(assuming that nothing changed over the period). If I were to get very
different results on two consecutive occasions, I would suspect that the
scale was faulty or that something dramatic had happened to me
between weighings. If my reading on the scale was 51 kg one day and
45 kg the next, I would be correct in thinking that the scale was wrong.
More importantly, I would also not be able to say with any confidence
whether my true weight was 51 kg or 45 kg – or any other weight for
that matter. This property of consistency is known as reliability and is
discussed in depth in Chapter 4.
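
One common way of expressing this consistency numerically is to correlate the
readings obtained on two occasions. Here is a minimal sketch with invented
readings; reliability itself is dealt with properly in Chapter 4:

    import numpy as np

    # Five people weighed on two occasions with the same scale (hypothetical readings)
    day_1 = np.array([51.0, 63.2, 70.5, 48.9, 82.1])
    day_2 = np.array([51.3, 62.8, 70.9, 49.2, 81.7])

    # A consistent (reliable) scale gives readings that agree closely across occasions.
    r = np.corrcoef(day_1, day_2)[0, 1]
    print(round(r, 3))      # close to 1.0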

1.8.2 Does my measuring device measure what it claims to measure?
Even if my measuring instrument (e.g. the scale) were to give me a
consistent reading under different circumstances, this does not
necessarily mean that it is accurately measuring what it claims to be
measuring. Suppose that the scale is badly calibrated so that it over-
measures by 3 kg every time. This would mean that if the scale
measures my weight as 51 kg, my real weight is in fact only 48 kg. Even
though the scale is consistent (i.e. it reports my weight as 51 kg every
time), it is inaccurate (because my weight is really 48 kg).

For a more psychological application, suppose I try to measure a
person’s intelligence level using various tests that are presented in
English. Although this may be a reasonable method if the person being
assessed understands English well, the results would not be very
accurate if the person did not understand English at an acceptable level.
My measurement of intelligence would be affected by a factor that has
nothing to do with intelligence. Therefore the second bit of information
we need to have about any measure we use is whether the measure
actually measures what it claims to measure or whether it in fact is
measuring something else. The technical term for this is validity, which
is discussed in depth in Chapter 5.
1.8.3 Is my measure fair – does it treat what is being
measured fairly?
The third vital component of assessment is whether the technique is
equally valid for different groups of people. Suppose my weighing scale
for some reason always read 5 kg higher when it weighed brunettes and
5 kg lower when it weighed blondes. We would then argue that the scale
treated the two groups differently. If a person’s weight was important
for some reason, we could say that the scale was being unfair to the one
group in relation to the other. This issue of fairness* is of particular
importance in situations where different groups exist (such as in present-
day South Africa). Various issues around notions of fairness and
discrimination are discussed in depth in Chapter 7.

1.8.4 What do my scores actually mean?


Besides the above three vital components, we also need to look at the
various ways in which we can interpret our results – what does a score
mean? Furthermore, because people are often assessed by more than one
test or technique, the question arises as to how we can best combine
different assessment scores to come to a single sound decision. Suppose
I wish to determine whether a person should be selected for a job or a
bursary (it does not matter, the principle is the same in all cases).
Suppose also that I use five different assessment techniques and the
person meets the selection* requirements on only three of the five. The
question then is how do we decide whether or not to select the person?
These issues, and different ways of addressing them, are dealt with in
Chapter 6.

1.9 Problems associated with quantification in the social sciences

Thus far we have discussed the benefits of quantification – and indeed
these are very real. However, there is a powerful school of thought that
argues that these models apply to the physical sciences and not very well
or not at all to the social sciences. Those who favour this line of
reasoning point out that in many cases psychological and other social
phenomena are not “real” entities (in the way that trees and plants and
other tangible objects are “real” entities), but rather that social
phenomena are generally constructed by the people involved. As a
result, observers are part of the situation being observed, and the ideal of
a neutral and uninvolved observer cannot be achieved. Furthermore,
observers do not always agree about what they observe, and even if they
do agree about what they see, they differ about what it means and why it
has occurred.

As a result there are two schools of thought in psychology (and the other
social sciences) which can be labelled the empiricist or quantitative*
approach and the constructivist or qualitative* approach. The
empiricist/quantitative approach believes that the world is real and exists
outside the experiences of the observer, and can therefore be measured
in reasonably accurate ways. On the other hand, the
constructivist/qualitative school argues that everything is in the mind
and is created by the observer in terms of categories and relationships
that have been learned. This group of people prefers to examine the
language that is used and how this shapes the stories or narratives people
use to describe their experiences. In the middle is a group of people who
see themselves as critical realists*, arguing that there is a real world out
there, but that this is shaped and constructed by our life experiences,
value systems and the cognitive schemata and categories that we bring
to bear on the issues.

Although we will not go into any great analyses in this regard, we need
to take note of these philosophical issues, because the approach we use
will be influenced by the theory we adopt. This book adopts a critical
realist outlook – there is some kind of reality out there, although we all
interpret it slightly differently as a result of our socialisation
experiences.
1.10 Summary

In this chapter, we looked at why we assess and began by asking
ourselves what assessment is and how it relates to concepts such as
measurement and evaluation. We then looked at the properties of a good
measuring technique. In describing the advantages of quantification, we
argued that quantifying phenomena improves objectivity and precision;
it allows for better analysis, comparison and generalisation of
information; and it improves the ability to communicate results, which
leads to a more verbally economical process. A distinction was made
between formative and summative assessment.

In answering the question “How do we obtain data?”, we observed that
this can take place through direct observation, the use of historical
records and referral information, interviews and questionnaires. A final
source of information was interventions or experiments. We saw that
using more than one source of information was important. Levels of
measurement and their different qualities of information allow for
different forms of manipulation and analysis.

We then considered briefly how to ensure that the measure is a good one
(consistent), that it measures what it claims to measure and that it does
this in a fair manner, and touched on the idea of how to interpret our
scores. In closing, we discussed some of the problems associated with
quantification in the social sciences.

Additional reading

The section on the advantages of quantification comes from Chapter 1 in Nunnally, J.C.
& Bernstein, I.H. (1993). Psychometric theory. This is a very good, though somewhat
technical account of psychometric theory, taking an analysis of variance approach.
For a refresher on levels of measurement and basic statistical concepts, see Cohen,
R.J. & Swerdlik, M.E. (2002). Psychological testing and assessment: An introduction to
tests and measurement (5th ed.), especially Chapter 3.
Test your understanding

Short paragraphs

1. What are the advantages of quantification?


2. What are the problems associated with quantification in psychology and the social
sciences in general?
3. What is meant by triangulation?
4. What is meant by the four levels of measurement? Use a diagram to explain this if
you like.

Essays

1. “The ancient Greek philosopher, Aristotle, is reputed to have stated that women have
fewer teeth than men, because their heads are smaller. He could easily have
disproved this theory by simply asking his wife to open her mouth and counting her
teeth!” Comment on this statement, showing why observation and quantification are
important in any scientific endeavour.
2 Observation

OBJECTIVES

By the end of this chapter, you should be able to

distinguish between casual and systematic observation


describe the steps that need to be taken in preparing for systematic observation
distinguish between naturalistic and simulated observation
describe the different levels of observer participation in the observation process.

2.1 Introduction

In Chapter 1 we saw why it is important and often necessary to assess
phenomena in the world – it enables us to describe, understand, predict
and control what is happening around us and to us. One of the benefits
of this understanding and control is that it allows us to provide new
experiences in a controlled manner in order to encourage growth and
development. This is the purpose of education in general, and training
and counselling in particular. Assessment is thus the cornerstone of
knowledge, understanding and controlling change.

Clinical, counselling and educational psychologists gather information (via
assessment in its various forms) to help people gain insight into their own thought
and behaviour patterns in the hope that this will foster growth and development
towards a more integrated and satisfying life. In organisational settings, this
information is gathered to describe and hopefully improve the functioning of
individuals and groups within the organisation, and the organisation as a whole.

So people and complex systems can be observed to help us understand
why they operate as they do and to determine what can be done
differently to make them work better, however that is defined. Although
there are numerous ways in which individuals and systems can be
assessed, they all have one thing in common – the process of assessment
begins with the systematic observation of the phenomenon of interest.
Observation is important because it is the first step in assessment and
also in the scientific process of generating knowledge (Bordens &
Abbott, 2008, p. 15).

2.1.1 Casual observation*


Suppose we go to the zoo with friends or family. We wander from one
enclosure to another and in the process notice (i.e. observe) a number of
things: the lions are safely behind high walls, and the birds and monkeys
are in roofed cages or enclosures so that they cannot escape. We also
note the sounds the animals make and their colours – some are brighter
and better looking than others. We may even notice how the
chimpanzees and big apes play with and groom each other – and that the
seals stink!

If we are interested in people, we may also notice things about the
people who are visiting the zoo. We may become aware that there are
two major groups: young married couples with their children of between
six months and ten years of age, and older couples who have probably
retired and are keeping themselves active. A third group may be people
from a specific organisation who are having their staff outing or team-
building session at the zoo.

This sort of casual observation can be termed “looking at” the
phenomena.

2.1.2 Systematic observation*


However, there is another kind of observation that can take place. This
is termed “looking for” things. If we continue with the zoo story, we
may sometimes find a group of people (students or scholars) who appear
to be studying various phenomena; they seem to be focusing on specific
situations and trying to observe these from a number of different vantage
points. They may even intervene in some way, for example by placing
an obstacle in an animal’s way to see what it will do under those
circumstances: will it give up and stop what it is doing, or will it try to
find a way around the obstacle? If it solves the problem, how long did it
take? If the students continue to put obstacles in the animal’s way, does
the time it takes to solve the problem decrease? Is the animal learning
how to overcome the problem? Are younger animals better than older
animals at learning, and are there gender differences? If there are, are
there lessons we can learn about human problem solving from these
observations?

This second form of observation is very systematic in that the observers
try to establish patterns of behaviour and relationships between
phenomena. The fundamental difference between looking at and looking
for can be summed up in two words: “So what?”.

If we go to the zoo and observe people or animals, or watch children at
play, or note that there are differences in the way males and females
carry their bags, so what? Unless the observation leads to a further set of
questions and an attempt to explain these, the observations are merely
descriptive and without any real scientific purpose. If the observations
arise from simple curiosity and are devoid of any theoretical interest,
they will fail the “So what?” test. On the other hand, if the casual
observations give rise to a thought like “Wow, that’s interesting – I
wonder why this occurred”, the observation will pass the “So what?”
test.

It is, however, recognised that, in many cases, it may be important
simply to observe behaviours without having a theoretical framework in
order to let the data speak for itself. This form of hermeneutic* or
grounded theory* research is essentially the first stage of the scientific
process in that the observers look at what is happening in order to
generate hypotheses that need to be looked at more closely in a looking-
for process. All knowledge generation begins with observation and
description. However, science needs to move beyond this observation
phase in an effort to identify patterns and explain what is being
observed. (See, for example, Bordens & Abbott, 2008, pp. 15–18.)

For the rest of this chapter, and the book as a whole, we will assume that
when we observe people (or other objects, systems or organisms) we do
this for a purpose and not merely to pass the time.

Consider the case study described in Sidebar 2.1.

Sidebar 2.1 Observing primate behaviour


When I was at the University of Natal in the 1970s, a group of my colleagues was
very interested in issues of dominance and learning. To help answer some of
their questions they observed a troop of vervet monkeys in their natural habitat
and noted that the dominant male (the so-called alpha male) was allowed to
approach food first and eat his fill before the rest of the troop could approach.
This is the basis of a hierarchy or pecking order in many species.
The researchers then introduced a novel situation – they painted some bananas
white and placed them in the area where the monkeys found their food. The alpha
male immediately stood back and let the young aspiring males, the young bucks,
test the novel situation.
The researchers’ interpretation of this?
From an evolutionary point of view it makes sense that the alpha male, who has
demonstrated his strength and wisdom in becoming the alpha, should be allowed
to have first access to food. In this way he would be safe and could pass his
genes on to the troop. However, when confronted with a novel situation, the alpha
held back, because it was a potentially threatening situation that could result in
his being killed or injured and thus unable to pass on his genes. By allowing the
young bucks to expose themselves to danger, there was little threat to the troop
as a whole. On the other hand, if the young bucks survived, they would have
knowledge that was superior to that of the alpha and this would help one of them
to supplant him. In this way the troop as a whole would benefit from having a
smarter leader than before.
Are there lessons for humans in this approach? Think about it.

Let us consider this case study briefly. If we look at this study, what do
we see?

Firstly, there was the casual observation that the alpha monkey had first
choice of food.

Secondly, there was the “What if?” question: what would happen if the
troop was to be presented with a novel situation? At this point, the
observation passes the “So what?” test.
Thirdly, there was some sort of intervention (the white bananas) which
allowed the research psychologists to explore the “What if?” question.
Clearly, this involved a series of systematic observations in which the
research team made very specific observations as they looked for
information about how the monkeys reacted to the novel situation.

Finally, the researchers put forward a theory about what they had
observed. This in turn led to a number of additional questions that
needed to be answered, and so they could (and did) devise a number of
other mini-experiments to explore aspects of the behaviour that
interested them.

Ah, you might say, but what has this got to do with real psychology? If
you look at Case study 2.1 at the end of this chapter, you will see the
importance of observation in a childcare situation, in which the effects
of crowding on the behaviour of nursery school children were observed.
Read it now on page 21. It may surprise you to learn that the major
author of the paper on which the case study is based, Christine Liddell,
was trained by the same research psychologists that did the white banana
study.

Case study 2.1 highlights a number of issues about observation.

Firstly, it shows that while some observation is casual (looking at), most
of the observation that a psychologist, consultant or manager is involved
in is systematic and involves looking for relationships, as illustrated in
sections 2.1.1 and 2.1.2. These relationships may involve trying to
understand the causes of a particular problem, or why things are not
working as they should. If your car will not start or your cellphone will
not work, you will look for causes or reasons.

Secondly, it shows that the nature of your theoretical framework will
determine what is being observed, and what will count as evidence for
and against your theory.

Thirdly, it shows that a great deal of preparation is required about what
you should be observing, what constitutes a hit (a positive instance) and
what observation schedule (timing and duration) you should follow
before the observation takes place.

Fourthly, it shows that some form of technology for recording your
observations may be very helpful.

Of course, looking for a pattern or a relationship is not the only form
of observation, and it is clearly based on questions that
arise from the looking-at process. However, from the standpoint of
psychologists and other people interested in behaviour (of people and
systems), “looking at” is not enough.

2.2 The ABCs of observation

All assessment begins with careful observation and analysis. The
essential processes of observation in turn comprise the “ABCs of
observation” – the antecedents, the behaviours and the consequences.

2.2.1 Antecedents* – those things that go before


To understand how any system works and why it works as it does
(whether it be well, adequately or badly), we need to find out what led to
the current situation. In a work situation, this involves looking at a
person’s track record, his work history, merit awards, disciplinary
hearings, courses attended, and so forth. (Note that certain clauses in the
country’s labour legislation limit the type of questions that can be asked.
In addition, we should consider the ethics of our being privy to this
information – all assessment is in some way an invasion of privacy!). In
a research study, we need to examine the background situation carefully
and to control this as far as possible to ensure that our conclusions are a
result of our interventions and not of some other aspect of the situation.
At the same time, we must protect the person’s privacy and dignity.

2.2.2 Behaviours
Clearly, when we observe an animal, person or system, we need to focus
on what is actually taking place at the time of our observation. As
already stated, this observation can be casual or systematic. During
observation, we must be quite clear about what we are looking at and
what constitutes a particular action. Although this may seem like
common sense, many behaviours are quite subtle and we need to have
clearly defined criteria for saying what X or Y is doing. For example, is
a person disagreeing with others in the meeting because of his
ideological standpoint or is there a real practical issue involved? It is
important therefore that we draw up a carefully constructed checklist of
behaviours so that seemingly similar behaviours are clearly identified to
ensure that there is no confusion or overlap during the observation
period.

2.2.3 Consequences* – things that follow from the behaviour


We need to track the behaviour for long enough to determine what
happens next. If the employee is disruptive in the meeting, how do the
others react and what happens to both the person and the group? If the
informal leader in the group shows aggression, how does the rest of the
group or the target of the aggression react?

2.3 Ways of categorising the observation process

There are various approaches to observation.

2.3.1 Context
Context refers to the setting in which the observation takes place. We
can distinguish between naturalistic, simulated and artificial conditions
or situations.

Naturalistic situations. Observers have little or no control over the
situation. They watch employees on the job or in a training venue,
during breaks, or as they enter and leave the workplace.
Simulations. Employees are placed in a situation such as a team-
building exercise or an assessment centre* in which carefully
designed assessment tasks are introduced. These should be as close to
real-life situations as possible and include role plays, work-sample
tests, driving simulators, simulated accidents and so on.
Artificial situations. Employees are placed in a room and given
various tasks such as psychological tests* to complete. These may or
may not be timed. No attempt to resemble real life is made.

2.3.2 Observer involvement


A further distinction is based on the degree of observer involvement or
participation in the assessment process.

Present and involved. For example, a therapist plays with a child as
part of the observation process or asks specific questions to see how a
person being interviewed responds. An extreme form is stress
interviewing, during which the interviewee is deliberately provoked.
Present but not involved. For example, an observer sits in a corner
of the room and takes notes, or gives the person a task to complete
and then stands back to see what happens, as was done in Case study
2.1.
Absent. For example, a person is observed through a one-way mirror
or on video. A further example is the rugby concussion study
discussed in the next section.

2.3.3 Intervention or manipulation


A third dimension of observation is whether or not there is any
intervention in, or manipulation of, the situation. This may vary from
situations in which there is no intervention through to a highly
controlled situation such as occurs in a laboratory or test venue.

No intervention. The observer simply watches what is happening and
makes no attempt to influence the situation.
Minimal intervention. The observer may influence the situation to a
small degree, for example by introducing a problem into the
observation setting and seeing how the person solves it. Alternatively,
the observer may drop a purse or letter in a public place and see what
people do with it.
Moderate intervention. The observer or a helper may begin to
introduce more complex tasks and see how these are managed. For
example, the observer may watch how a person reacts to an elaborate
scam like those in Candid Camera and some of the local Leon
Schuster movies.
Maximal intervention. Testing is nothing more than the observation
of a clearly defined sample of behaviour in a tightly controlled
situation such as a laboratory experiment or psychological testing.

Various combinations of these situations are possible.

Naturalistic observation without intervention. Often the behaviour
of people or animals needs to be observed without the observer
influencing the situation in any way. Children playing in a school
classroom without any input from the observers, and the researchers
simply watching the monkey behaviour as described earlier are
examples of observation in a naturalistic situation without
intervention.
Naturalistic observation with intervention. Although naturalistic
observation without intervention is sometimes preferable, if a
psychologist is looking for specific behaviour patterns or reactions to
particular situations, he may find it necessary to intervene in order to
create the particular situation that needs to be observed. If we are
looking at cooperative behaviour in employees, it may be necessary
first to establish a baseline of interaction, then to introduce a situation
into the group, and observe what happens when this has been done.
We may decide to drop a letter or wallet in a shopping mall and
observe what people do with it. The introduction of the white bananas
in the monkey study in Sidebar 2.1 is an example of such an
intervention.
Observation in an artificial or simulated environment. In many
cases, observation cannot take place in a natural situation, and the
person being observed needs to be placed in an artificial situation
such as the psychologist’s office. In such instances we should ensure
that the artificial situation is as natural as possible. This may involve a
fairly long settling-in process so that the observed person begins to
feel at home in the situation (see Case study 2.1). Even when we
conduct interviews, we may need to arrange at least three meetings
and reserve the real questions for the second and third sessions.

Irrespective of whether there is any manipulation or intervention, the
mere fact that people are being observed carries with it numerous ethical
considerations. Remember, any form of observation is an invasion of a
person’s right to privacy, and so the purpose of the observation and the
nature of the interventions or manipulations must be carefully
considered.

2.4 Use of tools or aids

We can also distinguish between situations where observation is
unassisted and those where some tool or aid is used. The former rely on
the observational powers of the observer (such as watching an employee
at work or looking at a video recording of the post-concussion behaviour
of a rugby player). In contrast, observation may require the use of
specialised tools such as a brain scan, psychological tests or a
computerised test of cognitive functioning. In some cases, some kind of
electronic data logger can be used to record the activities taking place
and their duration. An example is the study into the effects of
concussion in rugby players being conducted at Rhodes University. One
of the projects involves a video recording of a rugby match and then
close analysis of the knocks and bumps suffered by various players. The
researchers use an electronic data logger to record who gets tackled, by
whom and with what force, and what the immediate consequences of
this are for the tackled player. Once they have analysed the whole video,
they download the data for further analysis. Another example is to use a
camera to take still pictures for later analysis.

2.5 Observation schedules*

One of the crucial aspects of observation is to draw up an observation
schedule or sampling frame* which specifies the aims of the
observation and the frequency, duration and sequence of the observed
behaviour. For example, we need to decide how often to observe and the
duration of each observation. At one extreme we can watch an
individual (or several individuals) continually for several hours. At the
other extreme we can randomly watch the same individual(s) for short
(unspecified) periods whenever the fancy takes us (e.g. when walking
by). Ideally, however, we need to specify beforehand when observations
will take place and their duration. During this time all activities carried
out by the individual are noted. We could also specify that only certain
activities, those involving A, B and C for example, will be recorded.
This involves drawing up a coding system*.
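
As a rough sketch of what such a schedule and coding system might look like in
practice (the codes are loosely based on the categories used in Case study 2.1;
everything else is illustrative):

    from dataclasses import dataclass

    # Illustrative behaviour codes
    CODES = {
        "S": "solitary or parallel play",
        "O": "onlooker behaviour",
        "A": "associative play",
        "C": "cooperative play",
    }

    @dataclass
    class Observation:
        minute: int        # when in the session the observation was made
        child_id: str      # who was observed
        code: str          # what was seen (one of the keys in CODES)

    # A simple schedule: observe the same child every 5 minutes for 30 minutes
    schedule = list(range(0, 30, 5))
    log = [Observation(minute=m, child_id="child_01", code="S") for m in schedule]
    print(len(log), "observations recorded")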

Observation schedules play an important role in observing behaviour in
organisational as well as in clinical and educational settings. In an
organisational context, such a frame would be useful for observing
leadership behaviour, for example, and for describing what people do in
a specific job or work situation.

Sidebar 2.2 Putting theory into practice


Here is a way to put this idea of a schedule into practice. At home this evening,
watch a family member or friend for 30 minutes. Every five minutes, note down
what he is doing, and with whom he is interacting. Can you identify a pattern?
2.6 Assessment as a form of research

Case study 2.1 clearly illustrates the final point to be made in this
chapter – that assessment is nothing more than research at the individual
level. According to Wise (1989), assessment is a specialised application
of the scientific method. All assessments involve

formulating a question
designing the means to address the question
interpreting the results
making recommendations
reporting the results.

Psychological assessment therefore involves the following:

Formulating the question. Questions that are pertinent and relevant
to the case should be formulated. For example, why is this person
unable to hold a job? Why is this child having difficulty paying
attention? Does crowding affect the behaviour of children who have
been raised in very crowded situations? Is this person likely to
succeed in a particular position in a particular organisation? Is the
woman’s brain damaged or is she suffering from post-traumatic stress
disorder (PTSD), or simply malingering*?
Addressing the question – gathering data or information. This
involves gathering background information and assessing the level of
present functioning. The assessor should investigate previous
behaviour as reported in various documents and reports, as well as
observe current behaviour in as many different but appropriate
situations as possible.
Diagnosing or interpreting the results. From the information
gathered, the assessor should attempt to establish what is causing the
individual or system to behave in this (unacceptable) way.
Making recommendations. The assessor should use the gathered
information and his professional knowledge to recommend what can
be done to bring about a change.
Reporting the results. The assessor should present the assessment
results in a written report or at a case conference. If he presents a
report it must be pitched at the correct level.

We must remember that observation is a key method in research, both
qualitative and quantitative. Unfortunately we cannot discuss
observation as a research method in any depth here, therefore we refer
interested readers to Peter Banister’s chapter in Banister et al. (1994) for
comprehensive coverage of this topic. Observation also plays a vital role
in changing behaviour – the whole science of behaviour modification*
is based on observing behaviour and then introducing different
reinforcers to modify it.

2.7 Ethical issues

As stated above, assessment is an invasion of a person’s privacy, and all
aspects of the assessment process and the way in which the information
will be used need to be carefully considered. Wherever possible, the
informed consent* of the person being assessed should be obtained. In
recent years it has become accepted practice that a person being
assessed, even for selection purposes, should sign a document giving the
assessor permission to use the information for the purposes stated. The
box below is an example of such a document.

Permission to assess
I, …………………………, understand the purpose for which I am being
assessed and how the results of the assessment will be used.

I hereby give my permission for the information to be gathered and used in
this way. I understand that the information will not be used for any other
purpose, except with my written permission.

[Signed]

[Date]

Of course, observation as described above and also in the various case
studies is only one (albeit a very important one) of a number of
assessment processes. There are a number of other forms of assessment,
including more behaviourally oriented processes such as simulations,
role plays and vignettes. There are also more phenomenologically based
processes such as George Kelly’s (1955) repertory grid technique, as
well as various projective techniques such as the Thematic Apperception
Test (TAT), the Rorschach inkblot techniques and incomplete sentence
techniques. Clearly, there is a whole range of psychological tests from
simple attitude and value inventories to the more complicated
intelligence and personality tests. Finally, there are the
neuropsychological tests designed to assess levels of cognitive
functioning and possible brain damage resulting from a variety of
causes.

Much of the rest of the book examines these different approaches, and
shows us how to evaluate the effectiveness of the various assessment
methods – how well do they do what they claim to be doing? We also
look at how to interpret the results of various data-collection techniques,
and to make sense of our observations about people and systems. What
do our results actually mean?

Case study 2.1


The effects of crowding on nursery school children
(LIDDELL, C. & KRUGER, P. (1987))
In order to find out about the effects of crowding on the social behaviour of young
children, two researchers from the University of South Africa (UNISA), Christine
Liddell and Pieter Kruger, investigated the behaviour of young children in a very
crowded nursery school situation. Previous research studies showed that
increased densities did have a moderate effect on children, especially on their
social behaviour. However, a major difference between the present study and the
previous ones was the very high density levels in the current case. Where the
previous studies (carried out in Europe and the US) allowed approximately 15
square feet (1,42 m2) per child, in the current study the density per child varied
from 1,56 m2 (16,1 square feet) to 0,56 m2 (5,7 square feet). (This variation
resulted from different attendance rates during the observation period.) In terms
of US Census Bureau definitions, 90 per cent of the children in the current study
also lived in overcrowded homes.
The purpose of the study was to see whether previous findings would be
confirmed in the South African sample, especially in the light of the much higher
levels of crowding. The findings could also have implications for developing
improved pre-school education programmes by indicating whether, and how
urgently, spatial density factors need to be taken into account.

Method
Participants
The nursery school was located in a township near a large South African city and
catered for 83 children (44 boys and 39 girls) ranging in age between 32 and 64
months. The majority (75%) came from intact families with the father present, and
the balance from female-headed families. The township houses were small with
between three and 18 people per household (Mean 8, SD 3), resulting in an
average of 4 m2 per inhabitant. Parental occupations ranged from domestic
workers to teachers. Rates of growth and nutritional status were comparable to
those found in US children.

Nursery school setting


The nursery school was purpose built as a primary care centre for working
parents. The children attended from early morning until evening. There were
three age-segregated groups and the present study focused on the youngest
group. There were 83 children in the care of two adults. The children were left to
play as they liked with very little adult-led activities. There was no set routine,
although children rested or slept for two hours after lunch. Some activity took
place outdoors, but approximately 65 per cent of the time at school was spent
indoors. There was a range of small toys – dolls, plastic lorries, aeroplanes, a
teaset, a tennis ball and a small go-kart. These were brought out for only part of
each day. There was no large apparatus such as a jungle gym or slide.

Procedure
The study lasted for 12 weeks with 33 sessions in all. The first five were
habituation sessions, designed so that the children could get used to the
presence of the observers. In addition, some of the children were identified as
people to observe, and they were given coloured aprons to wear. The habituation
sessions also allowed them to get used to these. The second five sessions were
used to gather pilot data and to refine the data collection categories.
Conventional focal-child sampling*, in which an individual child is identified and
watched continuously, proved ineffectual, and so a system of group scanning
was used. In this process, the observer progresses through various groups in a
set order and observes the behaviour of the targeted individuals in each group
before moving on to the next. (This technique has been used extensively in
primatological research.) The sequence of the groups is determined by their
spatial organisation. In this study, the indoor area was divided into four equal
quadrants by means of wall markings. A starting point in each quadrant was also
identified. At the beginning of the observation period, the child nearest the
starting point was observed for ten seconds (indicated by a buzzer in the ear of
the observer) and his behaviour noted. Thereafter the child closest to the first
child, as well as any other children with whom the second child was interacting,
was also observed for ten seconds. During this process, the observer moved
slowly through the four quadrants to ensure a good view of what was taking
place, making notes as he moved through the classroom. Codings were spoken
into a portable tape recorder carried by the observer and later transcribed onto
coding sheets.
A total of 184 group scans was collected, eight for each of the 23 data collection
sessions. Repeated samples were sometimes taken when children moved from
one quadrant to the next ahead of the observer. These duplicates were discarded
before analysis. Three focal areas of behaviour were selected for investigation,
namely level of social participation, activity and aggressive behaviour.

A) LEVEL OF SOCIAL PARTICIPATION


Four levels of participation were identified:

Solitary or parallel play (playing alone and independently)


Onlooker behaviour (passively watching others play)
Associative play (playing together without formal rules)
Cooperative play (playing with rules and turn-taking)

B) ACTIVITY
Five levels of activity were identified:

Object-mediated activity (playing with a small toy)


Socially mediated activity (child playing with another, no toys involved)
Motor activity (child involved in gross motor activities such as running)
Cruising (child looking for something)
Unoccupied (child not involved in any of the above)

C) AGGRESSIVE BEHAVIOUR
Five forms of aggressive behaviour were identified:
Physical aggression (fighting, trying to hurt another)
Domination (chasing another child, trying to take a toy from him/keeping the
toy)
Dispute of object (trying to keep an object taken from another)
Failure to take object (trying to keep an object but failing)
Submission (giving in to other’s demand, losing possession of a toy or
apparatus)

Because it was difficult in practice to assess aggressive behaviour in the ten-
second observation period, this was scored on a Yes/No basis.

Reliability of measurement
In order to ensure an acceptable level of agreement between the observers (a
form of interrater reliability), inter-coder agreement coefficients* (ACs) were
calculated by taking the number of agreements between the observers as a
proportion of the total number of observations. After five days of training, there
was perfect agreement between the observers on the Yes/No items (AC = 1,0)
and an average AC of 0,85 for the other measures, ranging between 0,71 for
socially mediated activities and 0,94 for unoccupied behaviour.

Results
In short, it was shown that as density increased, so the amount of socially
mediated behaviour decreased and unoccupied behaviour increased. Despite the
much higher density levels in this case study compared to the US studies, the
same pattern of behaviours emerged with fewer social behaviours in the high-
density situations in relation to the low-density ones. In general, the absolute
levels of socially mediated behaviour seemed to be slightly lower than those
displayed in the US with comparable samples.

We give this brief summary of Liddell and Kruger’s article to show how
direct observation can be used in an assessment process. It illustrates
how the observation process needs to be carefully planned and executed.
We can use the same approach in observing an employee’s workplace
behaviour and social interactions within a teamwork situation.

For the full study, see Liddell and Kruger (1987).

Case study 2.1 shows how the various components of assessment were
put into practice. Firstly, the study was naturalistic, because it took place
in the venue where the children spent most of their time. The observers
were present but did not interfere with the children in any way. (In fact,
they did nothing for the first few days until the children accepted their
presence as normal, thus trying to make the situation as normal as
possible.) This was a case of naturalistic observation without
intervention.

Secondly, the space involved was subdivided and the observers moved
systematically through each area during the observation periods. This
was a direct result of the planned need to observe the phenomena in the
most controlled manner possible.

Thirdly, observations were made every 15 minutes and each lasted ten
seconds, with the clearly defined interactions that occurred in the time
slot and the geographic area being noted before the observers moved to
their next area. This is an example of scheduling, both temporal and
spatial.

Fourthly, the first five observation sessions were used for training and to
pilot the techniques. The actual observations continued for a period of
12 weeks comprising 23 observational sessions.

Fifthly, the observation process involved training, piloting and a check
on the consistency (inter-rater reliability) of the different observers.
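
The agreement coefficient used in the case study (the number of agreements
taken as a proportion of the total number of observations) can be worked out in
a few lines; the codings below are invented for illustration:

    # Two observers' codings of the same eight ten-second observations (hypothetical)
    observer_1 = ["S", "O", "A", "S", "C", "O", "S", "A"]
    observer_2 = ["S", "O", "A", "S", "C", "S", "S", "A"]

    agreements = sum(a == b for a, b in zip(observer_1, observer_2))
    ac = agreements / len(observer_1)
    print(ac)       # 0.875 - seven agreements out of eight observations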

Finally, the observations were clearly made within a theoretical
framework – the observers were looking for relationships and patterns.

2.8 Summary

We began this chapter by distinguishing between casual observation
(looking at) and systematic observation (looking for). In respect of the
latter, we identified the ABCs of observation, namely antecedents,
behaviours and consequences. We then examined various ways of
categorising the observation process in terms of context (naturalistic,
simulated and artificial) and observer involvement (present and
involved, present but not involved, absent). We also saw that
observation can be characterised by the degree of intervention or
manipulation involved, identifying four such levels, namely no
intervention, minimal intervention, moderate intervention and maximal
intervention. We discussed the use of tools or aids and the importance of
observation schedules.

In closing, we considered assessment as a research process that involves
formulating the question, addressing the question, diagnosing or
interpreting the result, making recommendations and reporting the
results. Finally, we raised the important issue of ethics* in observation.

Additional reading

A part of the discipline of behaviour modification makes a great deal of use of
observation, and the literature provides a number of sound examples of observation
schedules. A useful book in this regard is Miller, L.K. (1997). Principles of everyday
behavior analysis, especially Chapter 3. A similar book is Miltenberger, R. (1997).
Behavior modification: Principles and procedures.

Test your understanding

Short paragraphs

1. What is an observation schedule and why is this important?


2. Discuss the distinction between casual and systematic observation.
3. Discuss the notion of observation as a research process.

Essay

Outline the different ways in which the act of observation can be characterised. What
are the various parameters that can be used to describe observations?
3 Developing a psychological
measure

OBJECTIVES

By the end of this chapter, you should be able to

define a survey, a scale and a test


describe different types of tests
outline different forms of test content
outline different ways or formats in which tests can be administered
describe the various answering formats
differentiate between ipsative and normative approaches to answering items
describe the six steps involved in developing a psychological scale or test.

3.1 Introduction

The previous chapters illustrated the importance of gathering data and
the effectiveness of doing this through systematic observation. One of
the easiest ways of gathering data is to ask people questions. We can do
this simply by talking to them, but this is not the best way as it can lead
to a number of difficulties, the most obvious being that different people
may be asked different questions, which makes it difficult to treat
everybody the same. (These problems are discussed in Chapter 14 when
we examine interviewing.) A more rigorous way is to draw up a series
of questions and to give the people involved a written document to
complete. This is called a survey*.

Although surveys are very useful and are widely used by researchers,
especially market researchers, they present a problem in that a wide
range of topics is usually covered. Psychology researchers are generally
interested in very narrowly defined constructs* such as intelligence* or
personality*, and it is very difficult to assess these using a broad-based
questionnaire or survey. Therefore psychology researchers draw up a
scale or test in which the items* (questions) are carefully designed to
assess the construct they are interested in. The terms are discussed in
more detail in section 3.5.

Testing is particularly useful when the characteristic is not clearly
observable, but important for adequate functioning in a particular
setting. A test is simply an intervention to make the characteristic
observable – that is, it makes the invisible visible. For example, an
examination or a class test at school or university is nothing more than a
structured process for making what is basically invisible (a student’s
knowledge) into something that is visible and can be scored and judged
(the test result or examination script). Following the same principle, iron
filings can be used to demonstrate the existence of a magnetic field. A
psychological test, however, is a very carefully constructed and
interpreted form of observation.

In recent times, psychological testing has fallen out of favour in certain
quarters, because it was seen as a means of excluding certain people
from desirable jobs and other opportunities such as bursaries. This is
now beginning to change, because there is no scientifically defensible
alternative. The same thing happened in the US in the 1970s and 1980s,
but testing is once again firmly established as the preferred mode of
assessment in many situations, because the costs of not testing are very
high. As Victor Nell (1994, p. 105) notes:

Ability testing is not going to go away in the new South Africa.
However strongly groups in the anti-test lobby … insist on an end to
the tyranny of test scores, psychological assessment is so deeply
rooted in the global educational and personnel selection systems, and
in the administration of civil and criminal justice, that South African
parents, teachers, employers, work seekers and lawyers will continue
to demand detailed psychological assessments for school readiness,
vocational placements and court disposal.
3.2 Techniques used in measurement

There is a major difference between those techniques or instruments that
measure how we typically behave and those that measure our maximum
performance. Measures of typical performance cover such
characteristics as attitudes, values and personality, and are generally not
timed. Measures of maximum performance are concerned with aspects
such as our problem-solving ability, intelligence and aptitude*. These
measures generally have time limits although, as we see in Chapter 9,
we can distinguish between speed tests* (how many relatively simple
items can the person complete in a given time?) and power tests* (how
many items of increasing difficulty can the person complete in a given
time period?).

This distinction is shown below.

3.3 Types of content

Different psychological measures consist of different types of questions
or problems.

Verbal
– Reasoning: A man walks east for 60 m and then …
– Analogies: Hand is to arm as foot is to …
– Understanding or comprehension: Reading studies
– Knowledge: Who was the first black president of South Africa?
– Language: What does “apprehensive” mean?
– Grammar: Which word is wrong? “The cat sit on the mat.”
– Spelling: Which word is incorrectly spelled? “The kat sat on the
mat.”

Numerical

– Arithmetic: 2 + 2 = …
– Series: What comes next? 2 4 8 16 …

Symbolic

– Mental transport, rotation or assembly: Which two shapes go
together to form a square?
– Series: What shape comes next in the series?
– Matrices

Codes

– If the code for BED is 254, what is the code for DOG?

Apparatus

– Tracking
– Assembly: Use the different pieces to make a face or human figure.
– Series: Form Series Test: Using different shapes that are supplied,
show which is the next in a series by physically placing the shapes
in position.

Narrative

– Tell me what is happening in this situation.

3.4 Application formats

Assessments can be administered in many ways:

Pencil and paper


Card sorting (e.g. sorting cards with adjectives on each into piles such
as “always like me”, “sometimes like me”, etc.)
Manual (e.g. fitting objects together to make a whole such as with
jigsaw puzzles)
Computer based (see Chapter 16)
Adaptive testing (see Chapter 16)

Because psychometric tests* have been designed to measure one or a


small number of dimensions*, the points for the answers given can be
added together to make a single score, just as a student’s end-of-year
mark in industrial or organisational psychology is a combination of class
tests, practical assignments and examination marks.

3.5 Developing a scale or test


As we have stated, assessment is a process of systematic observation.
Often we need to observe people in very controlled situations to ensure
fairness and compatibility of results (see Chapter 7). In these instances,
people answer written questions that can be posed and answered in one
of four ways:

1. A narrative or essay format is used, for example, to answer an


examination essay question. The assessor then reads and interprets
the results against fairly specific criteria.
2. A survey is a collection of unrelated items, the answers to which
cannot be graded and added together to give a single score.
3. A scale is a collection of related items, each of which indicates the
degree to which a particular phenomenon being assessed is present
or absent. If these items are properly grouped, their scores can be
added together to give a total score.
4. A test is a scale collected under strictly controlled and regulated
conditions. It is therefore a highly standardised form of observation
in which the participants are presented with specific and well-
defined questions or other materials, and their responses are
assessed (they have to perform a task and are then judged on how
well they have done it). Tests usually have time limits.

The basic premise of assessment is simply this:


Everything that exists, exists in some quantity and can therefore be measured.
This statement, attributed initially to the French philosopher René Descartes
and popularised in psychology by Edward Thorndike in 1918, forms the basis of
modern psychological assessment (see also Cronbach, 1960).

The challenge lies in finding ways of measuring things.

Suppose we want to draw up a scale to measure job satisfaction. First we


need to define what we mean by job satisfaction and outline how people
who experience high job satisfaction differ from people who do not. We
then draw up a scale – a series of questions to measure these factors –
and apply it to a number of people to ensure that it does what it claims to
do and that it does this consistently and fairly. If the scale is not as good
as we had hoped, we may need to change a few of the items or
questions, and repeat the process. For a test, we need to decide on a time
limit, based on how long it took the majority of people in the sample to
complete all the items. Finally, we must decide on a way of interpreting
the scores: on the basis of the scale, how do we decide whether a person
experiences high, medium or low job satisfaction?

From this brief outline, we can see that developing a scale involves the
following seven steps:

1. Conceptualising: What are we looking for?


2. Operationalising: How would this show itself?
3. Quantifying: How can we attach a value to what we have observed?
4. Pilot testing: How does the measure behave in practice?
5. Item analysis: Does each item contribute properly to the total score?
6. Norm development and interpretation: What does this score mean?
(Develop and maintain norms.)
7. Evaluation of the technique: Is the assessment process consistent
and accurate? (Is it reliable and valid?)

What do these seven steps involve?

3.5.1 Conceptualising
The first step in measurement is to gain a clear understanding of the
phenomenon or domain of interest. In other words, we must clarify
what we are looking for. In our example, the question is: What do we
mean by job satisfaction? To do this, we ask the following:

What does job satisfaction mean in everyday experience and


common-sense language?
What does the literature say about this phenomenon?
How can we define it?
What are its dimensions and components?

We may decide that job satisfaction comprises components such as the


kind of work, the pay and benefits, the working conditions, the quality
of the supervision received, promotions and relations with co-workers.

A useful way of conceptualising is to use a mindmap or some similar


process.

3.5.2 Operationalising
The next question we have to ask is: How would job satisfaction reveal
itself? What are the indications that this phenomenon is present (or that
a process has occurred)? How does a person with a high level of job
satisfaction think and behave differently from a person with low job
satisfaction? Using the various components and dimensions identified in
the conceptualisation stage, we can generate as many indicators or
statements as possible to reflect these.

Suppose that we wished to devise a scale to assess a student’s


achievement motivation* at university. Our first action should be to
define this concept as something like the person’s desire to excel at
university. Then we could begin to formulate statements such as the
following to capture the spirit of what we are looking for:

I would be disappointed if I did not get a first-class pass in each of my


subjects.
I know that I am capable of getting A symbols in all my subjects.
I aim to finish in the top three of my class in this subject.

These statements all indicate a high level of achievement orientation in a


university context. We could think of many other such statements to
reflect the domain and its different components identified during the
conceptualisation stage. It is equally obvious that other contexts would
require different operationalisations of the same definition. In a work
situation, these statements or items would be phrased in terms of rapid
promotion, earning large bonuses, and so on. In an artistic field, the
achievement need would refer to recognition, fame, sales, invitations to
perform or exhibit, and so on.

TIPS FOR WRITING ITEMS

1. Keep them as short and simple as possible – research has shown that the
longer the item, the less accurate it is.
2. Ask simple and direct questions – do not try to be too subtle.
3. Avoid negatively phrased items.
4. Avoid idioms or the use of foreign terms.
5. Avoid asking two questions in one item.
6. Ask specific questions – ask “What newspapers have you read this
week?” rather than “What newspaper do you read?”.

3.5.3 Quantifying
How can we attach a value to what we have observed? How can we
count examples of, or measure the intensity of, the construct we are
trying to measure? There are three requirements.

Firstly, we need a good representative sample of the various items we


generate, ensuring that the items are a fair reflection of the various
components identified during conceptualisation (see Chapter 5 for how
to make sure that the sample of items is representative of the domain we
are interested in). This is known technically as content validity*.
Research has shown that, as a general rule, the final scale should contain
about six items per facet or subscale we are trying to measure (Burish,
1997). Therefore we should try to develop about ten items for each
facet. It is useful to include several items that do not relate to what we
are trying to assess. These are known as distracters* and are used to
disguise the purpose of the test and to discourage people from trying to
guess what the test is about and then responding in a way they think is
appropriate. This is known as a response set*. Obviously, these
distracters are not scored.
A related issue is building various checks and balances into the test. For
example, to ensure that people are paying proper attention to the items,
test or questionnaire constructors often repeat a question in a slightly
different form in the measure. They may, for example, at one point in
the scale, put in a statement such as “I like ice cream” and then
somewhere else state “I do not like ice cream”. Participants cannot agree
with both and, if they do, it is clear that they are not giving the exercise
proper attention. When analysing the test results, the administrator looks
for the responses to these pairs of items, and if too many disagreements
occur, the results as a whole become suspect.

A final issue of importance regarding content validity is faking*. For a


number of reasons people may want to appear better or worse than they
really are. To prevent this from happening, various items to detect
faking can be built into the measure. These and the whole issue of
response sets are discussed in Chapter 5.

The second requirement is to develop a good set of instructions on how


to complete the tasks involved. This includes deciding upon preliminary
time limits should timing be part of the test. These time limits may need
to be changed once pilot testing has been done (see section 3.5.4).

The third aspect of quantification is to assign a numerical value to the


answers given to each of the statements or items. The way we do this
depends largely on the way the statements have been phrased and how
the participants have responded to them. For example, if the item
requires a Yes/No or True/False answer, we can score the correct
answers as 1 and the wrong answers as 0. If we ask an essay-type
question, we often give a more global result, for example 65 or 80 per
cent. Quantification thus depends on the kind of answers we require, in
other words the response format. This is a complex issue that is dealt
with fully in section 3.6.

3.5.4 Pilot testing*


Once we have decided on the items and response format, the next very
important stage is to administer the measure to a sample group of people
similar to those for whom it has been designed. The size of the group
will depend on the nature of the tasks involved, but in general a good
pilot study should include 400–500 participants (Foxcroft & Roodt,
2005, p. 51).

When the measure is administered in a pilot session, there are several


differences from the administration of the final version. Firstly, the
administrator should keep a close check on how many items people
complete by various time limits. For example, if a test developer thinks
a task should take about 30 minutes, the administrator should ask the
participants to indicate on their answer sheets how far they have got
after, say, 20 minutes and then at five-minute intervals thereafter until
40 or 45 minutes have elapsed.

Secondly, on completion of the task, the administrator should ask the


participants to comment on the experience and to outline any difficulties
they experienced with each item and even with the instructions. The test
developers can then use this information to modify the administration,
and rework some of the items if there appears to be confusion. (See
Chapter 11 in McIntire & Miller, 2000, (pp. 208–220) for further details
on this aspect of test construction.)

3.5.5 Item analysis


Once the test has been administered and scored, it is necessary to
conduct an item analysis*. In this process the score for each item is
correlated with the total score. This correlation* is known as the item-
total correlation. Purists would argue that it is not 100 per cent accurate
and that each item should be correlated with the total minus the item
score. This is known as item-remainder correlation* and is a better
indication of the item’s performance because the contribution of the item
is excluded from the total. If we do not exclude the item in this way,
each item is correlated in a small way with itself. In practice, there is
little difference between item-total and item-remainder correlations.
However, the purists prefer the item-remainder approach. Statistical
packages such as Excel®, Statistica® and SPSS (Statistical Package for
the Social Sciences®) and various others all have programs for doing
this analysis (see Appendix 2).
Using the results of these analyses, the items that show a low item-
remainder correlation (anything below about 0,4 is low) should be
excluded or re-examined. If an item is negatively correlated, it is quite
probable that it has been scored in the wrong direction. All problematic
items should be examined and ways of improving on them sought.
These items must then be re-piloted, although a smaller pilot sample is
permissible. This process of piloting, analysing, reworking and re-
piloting must be continued until a reasonably acceptable set of items has
been developed. Even then, the test developer must always be willing to
revisit items and to modify them where necessary. However, if the test
has been published, this will be a major exercise.

3.5.6 Norm development and interpretation


The sixth aspect of drawing up a psychological scale or test is to devise
a basis for interpreting the results obtained. What do the scores we have
obtained mean? Is the result good or bad? Is it better or worse than
something else? Will the person described in the case study be able to
work at the same level as before – or at all? And so on. In order to make
these decisions, the person’s score has to be compared against an
appropriate norm* or benchmark. The development of norms and the
interpreting of assessment results are comprehensive topics and are
discussed in full in Chapter 6.

3.5.7 Evaluation of the technique


We need to ensure that the measure we use is consistent or reliable (i.e.
it measures the same thing each time) and accurate or valid (i.e. it
measures what it claims to measure). These two issues of reliability*
and validity* are dealt with in Chapters 4 and 5 respectively. We also
need to examine various aspects such as response sets: is there a
tendency for people to complete the test in a particular way? For
example, some people tend to agree with everything, others prefer to fill
in the extremes, while still others prefer to use the middle values. All of
these response sets can have an effect on the total score. (The issue of
response sets is discussed more fully in Chapter 5 when we discuss
validity.) We also need to check the test’s performance in different
situations and with different participants. This is known as cross-
validation*.

Clearly, once all these aspects have been successfully addressed, the
assessment technique is ready for use. All that remains is for the
technical manual, containing information about its reliability and
validity, as well as norms or other ways of interpreting the data, to be
compiled and submitted to the Professional Board for Psychology for
classification. Of course, if we want to market it as a commercial
product, we then need to find a publisher, marketer and distributor.
These processes clearly lie beyond the scope of this book. (For another
look at this process of test development, see Foxcroft & Roodt, 2005.)

3.6 Answer formats*

There are various ways in which questions can be answered in


psychological assessment. Some of the most common formats are
described below.

3.6.1 Dichotomous items


The simplest way of assigning a value to any statement is to give
one point for each item indicating a response in the desired direction.
For example, using the first item generated in section 3.5.2 on
achievement motivation, we could simply attach an Agree/Disagree
response to the statement and give one point for “agree” and zero for
“disagree”.

I would be disappointed if I did not get a first-class pass in each of my subjects.    Agree    Disagree

With this approach, we simply count the “ones” to get a total, which
then represents the person’s level of achievement motivation. There is
nothing wrong with this procedure, and it is identical to what happens in
tests that use a multiple-choice question (MCQ) answering format.
However, it is more appropriate when the item is either right or wrong.
In many cases, as in the statement above, the people who agree with it
could nevertheless agree either more or less strongly. This leads to the
next answering format.

3.6.2 Likert scales


Likert scales are named after Rensis Likert (pronounced lick-ert), the
person who first developed and used them widely. At the heart of this
technique is a range of possible responses that are arranged from highest
to lowest (or the reverse – it makes no difference). An example of this is
given below.

1 2 3 4 5
Strongly agree Agree Neither agree nor disagree Disagree Strongly disagree

Clearly, this response format allows a far broader range of responses to


the item than the simple Agree/Disagree or Yes/No format. For this
reason it is far more appropriate when dealing with issues that involve
feelings and opinions, such as the achievement motivation item under
discussion.

There are a few additional issues that need to be considered when we


look at Likert scales.

At first glance it looks as if Likert scales are interval scales, because the
numbers run neatly from 1 to 5. However, this is not the case. Each
Likert item is at best an ordinal scale: there is no evidence to indicate
that the difference between “Strongly agree” and “Agree” is the same as
the difference between “Agree” and “Neither agree nor disagree”.
However, it is treated as such. At the same time, as Nunnally and
Bernstein (1993, p. 67) point out, when a large enough number of Likert
scale items are combined, the resultant scores begin to take on interval
scale properties. This has to do with the notion of measurement error*,
and is closely related to the law of large numbers. This is discussed in
Chapter 4 when we look at the theory of measurement in relation to
reliability.
The second issue is whether there should be a midpoint in the scale. In
the answer format given above, a five-point scale is used, with the
midpoint 3 being “Neither agree nor disagree”. Those people who
oppose having a midpoint (i.e. those in favour of having an even number
of choices) argue that it is very seldom that people do not have an
opinion either in favour of or against a particular issue. They believe that
in reality there is no neutral midpoint, and argue that having a midpoint
only encourages people to sit on the fence and not commit themselves
either to being in favour of or against the view expressed. There is
evidence that some cultures (e.g. the Chinese) prefer not to disagree with
others and therefore to choose the neutral point. In view of this, answer
formats with an even number of choices (i.e. no neutral point) should
always be used in this context.

Those in favour of the midpoint (i.e. having an odd number of choices)


argue that there are many cases where people do, in fact, have no
opinion, and are neither in favour of nor against the view contained in
the item. They therefore believe that there is a good case for having an
odd number of choices.

At the end of the day, it seems to be a matter of personal preference. I


personally favour even-numbered answer formats with no midpoint.
This point is also emphasised by Fisher (1997).

A third issue concerns the number of choices. Irrespective of whether


one opts for an odd or even number of choices, it is possible to have
three or four choices on either side of the midpoint. Below are examples of
a seven- and a nine-choice response format.

1 – Strongly agree; 2 – Moderately agree; 3 – Agree; 4 – Neither agree nor disagree; 5 – Disagree; 6 – Moderately disagree; 7 – Strongly disagree

1 – Strongly agree; 2 – Moderately agree; 3 – Agree; 4 – Slightly agree; 5 – Neither agree nor disagree; 6 – Slightly disagree; 7 – Disagree; 8 – Moderately disagree; 9 – Strongly disagree

In this regard, my own experience is that the lower the educational level
of the target audience, the fewer the options that should be made
available. I would therefore recommend that, in general terms, people
with an education level below Grade 10 or 12 should not be given more
than four or five choices, as it may become confusing.

3.6.3 Guttman scales


Guttman scales arrange items (descriptors) in an ascending order of
preference, so that the endorsement of a particular item probably
endorses all items above and none of the items below. An example of
this is known as the Bogardus Social Distance Scale*. With this
measure, a person’s attitudes to different out-group members is
measured by arranging various social situations along a hierarchy of
intimacy and then asking him to indicate whether or not he would accept
a member of the out-group into that situation. This is illustrated on the
next page.

Would you let an X (where an X could be a member of any social group, e.g. a
Rastafarian, a blonde person, a bee-keeper, a drug addict, etc.)

Yes No
a) attend your place of worship
b) live in your street
c) live next door to you
d) visit you in your house
e) have you as a friend
f) date your child
g) marry your child

In this example, if you said “No” to item d), you would also be likely to
say “No” to items e), f) and g). For another example, see Nunnally and
Bernstein (1993, pp. 73–74).

Although this approach was popular at one stage, it is not used very
often these days for two main reasons. Firstly, it is quite difficult to
construct a meaningful hierarchy in many cases. This is made even more
so by the fact that different people and different groups of people may
attach different values to the various anchors in the hierarchy. For
example, a strongly religious person could conceivably rate the first item
in the hierarchy above (namely “a) attend your place of worship”) as
more important than “g) marry your child”. Secondly, because these
items are in a hierarchy, much of the information is wasted. If a person
answers “No” to item d), then we know that he is likely to answer “No”
to items e), f) and g) and “Yes” to items a), b) and c). Therefore,
collecting information on items other than d) is a waste of time and
effort. For a more technical criticism of Guttman scales, see Nunnally
and Bernstein (1993, pp. 74–75).

3.6.4 Item weighting


Thus far we have assumed that the different items in the scale or test are
equally important and carry equal weighting. By simply adding the
scores for each item, we assume that each is equally important.
However, in some cases, some items may be more important than
others. For example, in a five-item scale we can weight different items
differently, as shown in the table.

The example shows that the five different items have been given
different weights, and these influence the final score. Note also that the
last item has been given a negative score because endorsement of this
sentiment is seen as negative and the person is marked down as a result.
In practice, this form of negative marking is seldom used. The
weightings are theoretically based in the first instance, but are then
modified over time in the light of experience and research.

Item | Score/5 | Weight | Weighted score
I would be disappointed if I did not get a first-class pass in each of my subjects | 3 | 2 | 6
I know that I am capable of getting A symbols in all my subjects | 4 | 3 | 12
I aim to finish in the top three of my class in this subject | 4 | 2.5 | 10
I am prepared to work all weekend and during my vacations to make sure I achieve my ambitions | 3 | 3 | 9
I am prepared to cheat to make sure I come out top of my class | 5 | –2 | –10

3.6.5 Item direction and reverse scoring


When we draw up the different items, we tend to phrase them in such a
way that agreement with the statement indicates the presence of the
phenomenon we are looking for. Let us look again at the three items
developed to measure achievement motivation.

I would be disappointed if I did not get a first-class pass in each of my


subjects.

I know that I am capable of getting A symbols in all my subjects.

I aim to finish in the top three of my class in this subject.

People with a high level of achievement motivation would agree with


each of these statements. This is what is meant by item direction* –
these items are all scored in the positive direction. However, this may be
a problem sometimes, because people get used to endorsing all the
“Agree” items, which can begin to distort the results. The simple
solution is to reverse some of the items so that disagreement with the
statement indicates a high score. Using the achievement motivation
items above, we could reverse the direction of the third item to read as
follows:

I am not sure that I have the ability to finish in the top three of my
class in this subject.
Obviously, this item needs to be reverse scored. If we were using a five-
point Likert scale response format such as

5 4 3 2 1
Strongly agree Agree Neither agree nor disagree Disagree Strongly disagree

we would allocate points in the opposite direction as shown below.

1 2 3 4 5
Strongly agree Agree Neither agree nor disagree Disagree Strongly disagree

Clearly we would not give this information to the participant, but would
keep a record of which items are reverse scored and take this into
account when scoring the responses. In general, it is a good idea to have
between 35 and 50 per cent of the items reverse scored.

3.6.6 Ipsative scoring


The last issue to consider when allocating points is known as ipsative
scoring* and is based on the ranking of items. Suppose we were
interested in job satisfaction and asked a respondent to rank five items in
order of importance to him. It is obvious that no two items can have the
same value, and that if he ranks four of the items, the fifth is
automatically known. Ipsative means “self-referring”, and by ranking
the items a very true picture emerges, as the participant is forced to
choose how to allocate the points. This results in the score on one item
being affected and limited by scores on other related items.

Consider the following example:

Instructions
Each item consists of five statements, labelled A, B, C, D and E. For each item,
you must rank the five statements in order of preference, with 1 being the most
important and 5 the least important. Record the ranking you give to each item in
the spaces provided.
Ranking
I think that a really good workplace is one (which
allows me to)
A have a high standard of living
B make important decisions on my own
C feel that I have done something really worthwhile
D where people are encouraged to compete with
and outperform others
E where my supervisor encourages me to show
initiative.

We can see that if the participant ranked A = 3, B = 5, C = 1 and D = 2,


then E has to equal 4 (it is the only value that has not been used). This
contrasts with normative scores, where each item is judged on its own
merits. In the example above, a normative approach would be to ask the
participant to rate each of the five items on a ten-point scale. This allows
for ties because he could score, say, A and B 7 out of 10, and C and D 5
out of 10. There is no way of judging the score given to E, because each
item is judged independently.

Because the score given to the last item in ipsative scoring can be calculated from
the other scores, correlations using ipsative scores tend to be lower than
correlations obtained with normative scores.

A good way to think of these two approaches is to liken them to the


aeroplane cabin attendant asking whether we want chicken or beef. This
is ipsative, because it does not say how much we like (or dislike)
chicken. Neither does it tell how much we like (or dislike) beef. A
normative approach would be to ask us to rate both chicken and beef on
a ten-point scale. As a point of interest, it seems as though in general
most Americans prefer a normative approach to an ipsative one – they
do not like having to make forced choices!
3.7 Summary

In this chapter, we defined a survey, a scale and a test, and we outlined


various types of content and ways of answering items. We described a
seven-step process for compiling a scale or test, which involved
conceptualising, operationalising, quantifying, pilot testing, item analysis,
norm development and interpretation, and evaluating the technique. Finally, we considered various scoring
formats, and examined the difference between ipsative and normative
approaches to scale construction.

Additional reading

Foxcroft, C. & Roodt, G. (Eds). (2009). An introduction to psychological assessment in


the South African context (3rd ed., especially Chapter 6), give a good account of the
steps to take in drawing up a scale or test. They also make the point that adapting or
translating a scale for use in South Africa is in many ways the same as developing a
scale from scratch.
Chapter 7 in Cohen, R.J. & Swerdlik, M.E. (2002). Psychological testing and
assessment: An introduction to tests and measurement also deals with this issue. The
text examines the need for a proper item analysis and the various issues surrounding
the way each item behaves.

Test your understanding

Short paragraphs

1. Describe briefly the seven stages of drawing up a scale or test.


2. How would you set about conceptualising an attribute you wish to assess?
3. Discuss what is meant by ipsative scoring.
4. Describe the differences between the Guttman and Likert answering formats.
Essay

Describe how you would set about drawing up a scale to assess some characteristic
such as job satisfaction.
SECTION
2

Introduction to psychometric theory


In the previous chapter we defined various forms of assessment, focusing on the
distinction between a survey, a scale and a test. We also outlined various types of
content and ways of answering items, and described the seven steps in compiling
a scale or test. Finally we considered various scoring formats.
The question now arises as to whether our new measuring tool meets the criteria
spelled out in section 1.8 of Chapter 1. Of particular concern to psychologists is
whether the measures we use are consistent: do they measure the same thing
every time they are used and do they accurately measure what they claim they
are measuring? These two aspects are known as reliability and validity. They are
discussed in depth in Chapters 4 and 5 respectively.
We also need to know what the results of a particular measurement mean in
practice. In other words, what does a score of 23 on the XYZ test mean? Is it a
good score or a poor score? Can we select a person (for a position, a bursary,
etc.) based on this score? We also need to ask ourselves what happens when a
person scores well on one or two tests and poorly on another two or three tests:
how do we combine scores to arrive at a selection decision?
An issue that is vital in a country like South Africa, where people have had very
different social and educational experiences, is the issue of fairness in the
assessment process, especially where important, potentially life-altering decisions
such as selection for employment or for bursaries must be made. Accordingly, in
Chapter 7 we examine the notion of what fairness is, what makes an assessment
decision or outcome fair, issues relating to the assessment process and how
these can affect the outcomes of the assessment. We then suggest ways in which
we can enhance the fairness of our measurement tools and our processes.
In closing this section, we examine the control of psychological assessment tools
(to prevent people from “swotting up” the results, which would make the whole
assessment process useless). We also look briefly at what the law and the
profession say about who can use tests and other assessment processes, and
the training that is required of these people, both in South Africa and elsewhere in
the world.
In order to show how all of these factors come together in reality, an example of
some promotional material for a test is presented at the end of this section.
4 Reliability

OBJECTIVES

By the end of this chapter, you should be able to

discuss what is meant by reliability


describe the various forms of reliability
describe the various sources of error in measuring a psychological attribute
discuss what is meant by standard error of measurement (SEM)
show how reliability determines the value of an assessment technique.

4.1 Introduction

In a previous chapter, we stated that true knowledge depends on people


agreeing on what they are seeing or observing, even though their
explanations for it may differ. If people cannot agree on what is being
observed, we enter into the realm of speculation. This chapter aims to
describe what is required to make a psychological assessment technique
an accurate and useful measure.

At this stage we are therefore concerned with how well our assessment
technique works, and not with what it actually measures, in other words
the way it measures, and not what it measures. If our measure is
inaccurate, we cannot believe anything it tells us, and so before we can
talk about what is being measured, we have to show that we can trust
our measuring instrument.

Reliability is a measure of the consistency with which the measuring


instrument measures. We cannot measure anything accurately with a
tape measure that stretches and shrinks, therefore one of the first things
we need to do is to evaluate the reliability of the instrument.

4.1.1 The theory of measurement


A good way of evaluating the reliability of an instrument is to use what
is known as the theory of measurement. According to this theory,
whenever we measure some attribute or characteristic, the result we get
is never 100 per cent accurate – every score has an error component*
in it. In other words, the score we get (the observed score*) consists of
a true component or score* (the part we are interested in) and an error
component (aspects of the score that do not relate to what we are
interested in. (See McIntire & Miller, 2000, p. 118.) This is shown
graphically in Figure 4.1.

Figure 4.1 The relation between the observed score, the


true score and the error component

According to this model, a measure’s reliability is equal to the true score


divided by the observed score (R = T/O). Because the true score is the
observed score minus the error score* (T = O – E), this formula
becomes R = (O – E)/O. From this formula we see that as the error
component gets smaller, so the reliability gets closer and closer to 1, and
as the error component gets larger, so the reliability approaches 0. In the
assessment situation, the error component is the result of various chance
factors.

It is also important to remember that the error component is random –


sometimes the observed score may be too high and sometimes too low.
In other words, a more accurate description is that the observed score is
equal to the true score plus or minus the error component – that is, O = T
± E. Because the error score can be positive or negative, if we repeat an
assessment often enough, the positives and the negatives cancel each
other out, and so we get a more accurate measurement. This is known as
the law of large numbers*, and is represented mathematically as
Σe → 0. Stated differently, we can say that the sum of the error components
tends towards zero.

4.1.2 Why do random errors occur?


The occurrence of random errors* can be accounted for in a number of
ways. For example, McIntire and Miller (2000, pp. 125–126) make a
distinction between the following:

The test itself. People may not understand all the items, or the design
of the item alternatives may be poor.
Test administration. Failure to adhere strictly to time limits, noise or
other distractions, or a poor rapport between administrator and
respondent can affect results.
Test scoring. Strict or lenient markers, and poor scoring and data
capture procedures can cause errors.
Test takers. The respondents may not be fluent enough in the
language. Their mood can also have an influence on their responses.

Obviously, the more uniform and standardised the assessment process is,
the less likely a significant random error component. In other words, the
more standardised the assessment process and the scoring of the
assessment is, the more reliable the technique will be. (See section 4.2
for further discussion.)

4.1.3 Definition of reliability


The formal definition of reliability, as already stated in the introduction
to this chapter, is as follows: reliability is a measure of the consistency
with which the measuring instrument measures. As Kaplan and
Saccuzzo (2013, p. 583) say: “Reliability refers to the degree to which
test scores are free from measurement errors.”
4.1.4 Robustness* versus sensitivity* of assessment
It is important to remember that every assessment will have some error
component – no measurement is ever 100 per cent true. However, some
assessment procedures are less affected than others by variations in
assessment conditions that give rise to error: they are said to be more
robust. The robustness of any measure is therefore simply the degree to
which the measure remains unaffected by variations in the assessment
process.

While robustness is a desirable characteristic, we must not lose sight of


the fact that sometimes we are interested in the changes that occur over
time. For example, suppose we are trying to show that some process
(such as a particular form of treatment or intervention) has led to an
improvement or deterioration in a psychological attribute. Clearly, then,
we do not want a measure that is so robust that any differences in the
attribute cannot be observed.

Ideally, we need an assessment technique that is sensitive to changes in the


attribute of interest, but robust with respect to all other factors. Such a measuring
technique is hard to find, and so the assessment designer must strive to balance
robustness with sensitivity.

4.1.5 Standard error of measurement*


Because it is assumed that the observed score consists of a true score ±
an error component, it follows that the observed score will fluctuate
around the true score. It is also assumed that this fluctuation will take
the shape of a normal (or bell-shaped) distribution curve. If this is the
case, then we can argue that any score we get will be within a certain
range as given by what is termed the standard error of measurement or
SEM, which is based directly on the reliability coefficient of the
measure, using the formula

SEM = st × √(1 – rtt)

where
st = the standard deviation of the population*
rtt = the reliability coefficient

What this means is that if a person gets a score of 40, and the SEM is 3,
then there is a 68 per cent chance that his true score will be between 37
and 43 (40 ± 3), and a 95 per cent chance that his true score will be
between 34 and 46 (40 ± 6).

4.2 Sources of error

In discussing the concept of reliability, the first thing to do is to


recognise and account for the sources of the various errors that can
occur during the assessment process. As stated above, we can
distinguish between those inherent in the assessment technique or
measure, the assessment process, those related to the person being
assessed, and those associated with the administrator or scorer of the
assessment process.

4.2.1 The assessment technique or measure


Chapter 2 illustrates how an assessment technique or measure such as a
scale or test is constructed. Obviously, if this has been badly executed, it
may result in badly phrased items, poor instructions, ambiguous scoring
procedures, and the like. These in turn will contribute to errors in the
assessment process and will lead to lower levels of reliability.

4.2.2 The assessment process


The errors that arise from the assessment process itself include various
forms of distraction that may occur, including noise or a venue that is
too hot or too cold.

4.2.3 The person being assessed


These errors include fluctuations in mood, the ability to concentrate and
similar personal factors. For example, if the person has a cold, is tired or
is in a bad mood because of having had an argument, or is distracted
because of an illness or death in the family, all of these factors will
have the effect of increasing the error component and
thus decreasing the reliability of the assessment. People who are not
used to being assessed may suffer from relatively high levels of anxiety,
also to the detriment of assessment reliability. As the person gets used to
being assessed, this anxiety level reduces. This is known as test
wiseness* or test sophistication*.

4.2.4 The administrator or scorer


These errors arise from the assessment behaviour of the assessor and
could result from his not following the instructions and time limits of the
assessment properly or from his failure to establish rapport* with the
assessees and put them at ease during the assessment process, thereby
increasing their anxiety levels. Some of the most important sources of
error are those that result from differences in the scoring method. To
illustrate this, compare the results obtained from a set of multiple-choice
questions (MCQs) with those from an essay. The results gained from
two people marking a MCQ test would not differ (except where there
are careless mistakes). On the other hand, it is highly unlikely that two
markers would agree with each other in the marking of an essay. The
greater the scope for individual interpretation of the assessment material,
the less likely two independent markers will agree in detail on the
outcome of the assessment. As a result, the more subjective the scoring
system, the more error there is likely to be and the lower the reliability
will be. (See McIntire & Miller, 2000, pp. 125–126.)

4.3 Forms of reliability determination

Reliability is thus the consistency with which a measure achieves the


same result under different conditions. With this as a general
background, we can discuss a number of different forms of reliability:
test–retest reliability*, parallel or alternate form reliability*,
internal consistency* and inter-scorer reliability*. Let us examine
each in turn.

4.3.1 Test–retest reliability


Probably the easiest way of showing that an assessment technique is
consistent in what it does is to apply the same technique to the same
group of people on two or more occasions. If we use a tape measure to
measure the length of a piece of wood today and then measure it again
tomorrow or the next day, we would expect the length to be identical on
both occasions. If we got different results, say 20 cm today and 23 cm
the next time, we would be very suspicious: either the piece of wood is
not the same, or the tape measure is faulty – perhaps it is made of rubber
and stretches. It is consistency in achieving the same result on different
occasions that lies at the heart of the test–retest technique. It represents
the technique’s stability over time. The index of this is known as the
coefficient of stability* and is obtained by correlating the score
obtained from a number of people or objects at Time 1 (T1) with those
obtained from the same group of people or objects at Time 2 (T2). The
closer the correlation coefficient* is to 1, the greater the stability over
time.

Of course, we know that people are not like pieces of wood and that
various factors are likely to interfere with and influence the score
obtained at both T1 and T2. We will consider these later. However, it is
important to note that some assessment techniques are less influenced by
these various factors and will tend to have larger coefficients of stability
than others. These techniques are said to be robust: they are not easily
influenced by extraneous factors. Equally important is that the
coefficient of stability of every assessment technique used needs to be
known as it is a crucial element in judging the value of any assessment
procedure. If we get two different results when we measure the same
phenomenon at T1 and T2, we cannot say whether the result at T1 is
correct or whether the result obtained at T2 is correct. In fact we cannot
say whether either of them is correct – they could both be wrong and
some other value correct. If a technique’s test–retest reliability is low,
we cannot trust any result obtained.

What are some of the factors that make it difficult to obtain high levels
of test–retest reliability?

Firstly, the conditions under which the assessment is carried out may
differ from T1 to T2. However, the greater the standardisation* of the
assessment procedures, the lower will be the impact or effect of these
variations.

The second source of difference between T1 and T2 lies in the person


being assessed. He may be well at T1 and ill or ailing in some way at
T2. Although these are real sources of difference between T1 and T2,
the law of large numbers discussed in section 4.1.1 will tend to cancel
the positive and negative error components, and so the overall stability
of the test will not be too badly affected.

Perhaps a more serious problem is that the person being assessed at T1


learns from the experience and is better able to do the tasks the second
time around. This involves more than simply remembering the answers
to certain items (although this does occur). It is more about learning
general strategies and ways of thinking with regard to the assessment
process. For example, most third-year psychology students would do
quite well if they suddenly were required to write a Psychology 1
examination. This would not be because they know the specific answers
to the examination questions, but because they have learned to think and
reason at a more mature level than when they were in first year. To tease
out the two factors of remembering specific answers and general
cognitive development, the time between the two assessment processes
is important and needs to be specified when looking at test–retest
reliability. When the T1–T2 interval is relatively short, remembering
specific answers is more important; when the T1–T2 interval is longer
than about six months, general cognitive growth becomes the more
dominant influence.

Although these transfer effects* are important, it is simply not correct


to conclude – as do Wolfaardt and Roodt (2009, p. 47) – that “for most
types of measures, this technique [i.e. test–retest] is not appropriate for
computing reliability”. Every catalogue of psychological assessment
materials of any repute reports on the test–retest reliability of the
techniques, generally specifying two or three T1–T2 intervals and the
associated coefficients of stability. (See the promotional material or flyer
for the fictional ABC4 test, based on a real-life test, in Exhibit 7.1, page
95, to understand how this is reported.)

4.3.2 Parallel or alternate form reliability


Suppose we were interested in seeing whether a particular form of
treatment (such as medication) could improve the ability of people, say,
with Alzheimer’s disease to solve a particular kind of problem. In this
case, we would need to test the problem-solving ability of a number of
people with Alzheimer’s disease, give them the medication and then
retest this ability. To demonstrate that the treatment had been effective,
we would have to show that if people had achieved higher scores the
second time, this was not simply because they had remembered the
answer or learned how to solve the problem in the first session. Having
two or more parallel versions of the problem-solving test would help to
overcome many of the problems associated with using the same test in a
before-and-after fashion. Of course, we would have to show that the
different versions of the test were (almost) identical.

Although some theorists argue that the development of a parallel or


alternate form of any test is time consuming and expensive, and
therefore not recommended (Wolfaardt & Roodt, 2009, p. 48), this is not
as bad as it would appear at first sight. We have already seen (in Chapter
3) how we set about drawing up and validating a test. To draw up a test
of (about) 25 items, we first define and conceptualise the content area or
domain to be investigated. The second step is to operationalise the
domain by drawing up a number of items before applying these in a pilot
study. It is at this stage that parallel versions of the test are constructed:
instead of compiling a set of 25 items, two or more sets of 25 items can
be constructed, each of which measures the domain defined in the first
step.
While the criticism that the development of parallel forms is expensive and time
consuming has been put forward, this is only partially true. In reality, the
advantages of having different versions of a test far outweigh these relatively
small additional costs.

Each version of the test needs to be tested for reliability and validity.
However, because the different versions of the test have much in
common, they can be validated relatively cheaply compared to
validating only one version of the test. Roughly, if the first version of a
test costs R100 to construct and validate, the second version will cost
about R30 to produce, and a third version only R10 more.

We still need to show that the different versions of the test are as close
to identical as possible. To do this, we administer the two tests to a
group of people as close as possible in time and then correlate the two
sets of scores. In practice, the two tests are often given one after the
other, in order to limit the effects of extraneous environmental factors.
Clearly, if the tests are demanding, an appropriate period of rest between
the two test sessions may be required. We acknowledge that there may
well be some transfer of knowledge from the first to the second version
of the test. To minimise this effect, a common tactic is for half the
sample to be given version A first, followed by version B, while the
other half of the sample group is given version B first and then version
A. In this way, any transfer effect is limited.

To ensure that the two versions of the test are at the same difficulty
level, we use a common-person research design*. This involves
having about 20 per cent of the items common to both tests and having
the same group of persons take both versions of the test. Because we
assume that the common items have the same difficulty level
irrespective of the version of the test, common-item equating*
becomes appropriate. Common-item equating assumes that any
differences in total test scores can be attributed to the difficulty of the
other items in the two tests. Since the persons are assumed to have the
same ability regardless of which test they take, the scores on the more
difficult test may have a constant added to make them equal to
equivalent scores on the easier test by adjusting the total scores based on
the differences of performance on the common items. (See, for example,
Masters, 1985.)

Once we have obtained the two (or more) sets of results, they are
correlated with each other to obtain a coefficient of equivalence*
between the different versions. In general terms, correlation coefficients
above 0,90 are regarded as acceptable indicators of equivalence. Most
reputable techniques for assessing general cognitive ability, achievement
and educational outcomes have alternate or parallel versions, and the
coefficients of equivalence between the different versions are reported.

At the same time, in Chapter 18 (section 18.3.6.1) several issues related


to online testing are discussed. In this section, Macqueen (2012),
building on developments in item response theory (IRT), describes a
technique known as the “linear-on-the-fly” (LOFT) technique. This
involves the development of a large databank of items which can then be
selected at random to generate any number of unique or “bespoke” tests
for each person being tested. This approach is summarised by Tredoux
(2013) when she states:

Using item response theory and related sophisticated algorithms, it is


no longer necessary for all respondents doing a particular test to
complete the same set of items in the same sequence. Testing systems
can adapt the difficulty level of the items to the candidate and thus
test more accurately and economically (p. 438).

This technique can thus generate any number of parallel or alternate test
versions and in this way makes the whole issue of creating hardcopy
alternate forms somewhat redundant.

4.3.3 Internal consistency


An important form of reliability is known as internal consistency. Very
briefly, the question we have to ask is simply this: Do all parts of the
assessment process measure the same thing? In other words, does the
assessment focus on a single phenomenon or is more than one property
being assessed? A good analogy here could be looking through a lens or
magnifying glass at an object. We get a much clearer picture when the
lens is focused on a single point than when it has two or more focal
points, as happens if we have astigmatism.

To find out whether our assessment technique has a single focus, we


correlate the different parts of the measure with each other. The simplest
method is to split the measure into two submeasures and to correlate the
two halves. If the two halves have a high correlation, then we can say
that the measure is internally consistent, whereas if the correlation is
low, the different parts of the measure are measuring different aspects or
phenomena. This correlation is known as the coefficient of internal
consistency*.

The most obvious way of splitting a measure is simply at its midpoint


into the first and the second half. However, this is not a good idea,
because measures, especially of ability, tend to become more difficult
towards the end. As a result, the second half is likely to be more difficult
than the first half. In addition, as we saw with test–retest reliability,
there can be a learning effect, so that the later items become easier as the
person being assessed becomes familiar with the assessment process and
type of content. As a result of this, the coefficient of internal consistency is
generally based on odd and even item numbers.

To complicate matters further, the size of any correlation coefficient


depends in part on the number of items. Therefore, two measures of 100
items will have a higher correlation coefficient than two identical
measures consisting of 50 items each, because of the way in which the
coefficient is calculated. When a 100-item measure is split into two
equal parts, the resulting coefficient is based on 50 pairs of items, and is
thus an underestimate of the true correlation. Fortunately, there is a
formula to correct this – it is known as the Spearman-Brown formula*
for the correction of attenuation*. (“Attenuation” means shrinkage: the
100-item measure is attenuated to two 50-item measures.) The correct
coefficient is termed rtt, and is calculated by

rtt = 2rhh / (1 + rhh)

where rhh is the correlation between the two half-measures.


This formula also allows us to see what would happen if the number of
items in a measure were increased. Suppose we had a test consisting of
50 items and a correlation of 0,80 with some external criterion*. What
would happen to the correlation coefficient if we were to double the
number of items to 100? (The actual number of items does not matter;
what is important is that it is being doubled from X to 2X.) Using the
Spearman-Brown formula, the correlation coefficient would increase as
follows (2)(0,80)/(1 + 0,80), which equals 1,6/1,8 = 0,88. By doubling
the number of items, the correlation between the measure and some
external criteria would thus increase from 0,80 to 0,88.

One of the problems associated with the split-half approach to internal


consistency is the decision on how to split the measure. We have
considered two possibilities: a first-half–second-half split and an odd–
even split. Clearly, there are numerous ways of dividing the whole into
two parts. To address this, a number of formulae have been developed
that look at the average of all the correlations between items. Two such
formulae are extensively used, namely those associated with Kuder and
Richardson on the one hand, and Cronbach on the other.

4.3.3.1 Kuder-Richardson formula 20


Kuder and Richardson have devised several formulae for the internal
consistency of a measure consisting of dichotomous items (Yes/No,
True/False, Right/Wrong). The best known is the Kuder-Richardson
formula* 20 (KR20), which is written as follows:

rtt = [n / (n – 1)] × [1 – (Σpq / st²)]

where
rtt = reliability coefficient
n = number of items in the measure
st² = variance of the total measure
p = proportion of test takers answering an item correctly
q = proportion of test takers answering an item incorrectly (q = 1 – p)

4.3.3.2 Cronbach’s alpha


Some psychological measures, such as attitude and personality scales,
have no right or wrong answers, but rather multiple-choice answers. To
calculate the internal reliability of this kind of measure, in 1951
Cronbach extended the KR20 formula. This became known as
Cronbach’s alpha statistic (Cronbach’s α), which is written as follows:

α = [n / (n – 1)] × [1 – (Σsi2 / st2)]

where
α = reliability coefficient
n = number of items in the measure
st2 = variance of the total measure
Σsi2 = sum of the individual item variances

For the mathematically inclined, note that Cronbach’s α is a generalisation of the KR20: the Σpq term is replaced by Σsi2, and for dichotomous items (where si2 = pq) the two formulae coincide.

If the average correlation between the various items is low, alpha will be
low. As the average inter-item correlation increases, Cronbach’s alpha
increases as well. In general, alpha scores above 0,7 are acceptable. (See
Kaplan & Saccuzzo, 2013, p. 124.)
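The same logic extends directly to a computational sketch of Cronbach's alpha (again Python/NumPy; the function name and the use of the sample variance are the editor's assumptions). It simply replaces Σpq with the sum of the item variances:

    import numpy as np

    def cronbach_alpha(item_scores):
        # item_scores: rows = respondents, columns = items (e.g. 1-5 Likert ratings)
        X = np.asarray(item_scores, dtype=float)
        n = X.shape[1]                              # number of items
        item_vars = X.var(axis=0, ddof=1)           # variance of each item
        total_var = X.sum(axis=1).var(ddof=1)       # variance of the total scores
        return (n / (n - 1)) * (1 - item_vars.sum() / total_var)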

4.3.4 Inter-scorer or inter-rater reliability*


A final form of reliability is concerned with the extent to which two or
more raters, observers or judges agree about what has been observed. As
discussed earlier, most of us would accept that there is far greater
consistency between different observers, raters, judges or scorers when
the items are in a Yes/No or True/False format than when the questions
are open ended and require interpretation. If several people were to mark
a multiple-choice test, they would all arrive at the same score (except for the odd counting mistake). However, it is unlikely that two different markers would arrive at exactly the same mark if they marked an essay-type question. Therefore we can say that multiple-choice questions have a much greater inter-scorer reliability than essay-type questions.

A good illustration of where training is used to reduce inter-scorer differences is in sporting competitions such as gymnastics, diving, and so on, where performance
tasks are judged. These adjudicators (judges) train for many years before they
are competent to assess performance at the highest levels in these disciplines.

The more an assessment requires personal judgement and interpretation, the greater the chance of random and systematic errors occurring and of differences between scores arising. The more exactly the response categories are fixed, and the better the scorers have been trained to use them, the less interpretation is required and the fewer differences in score will result, giving rise to higher levels of inter-scorer reliability.
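As a rough computational illustration (the marks below are invented for the example, and Python/NumPy is assumed), the degree of inter-scorer agreement can be gauged by comparing two raters' scores directly:

    import numpy as np

    rater_a = np.array([4, 3, 5, 2, 4, 3, 5, 4])            # hypothetical essay marks
    rater_b = np.array([4, 3, 4, 2, 5, 3, 5, 4])

    exact_agreement = np.mean(rater_a == rater_b)            # proportion of identical marks
    inter_scorer_r = np.corrcoef(rater_a, rater_b)[0, 1]     # inter-scorer correlation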

To conclude this section on reliability, the various forms of reliability are summarised in Table 4.1.

Table 4.1 The various forms of reliability

Test–retest
Purpose: Consistency over time
Method: Same measure given to the same group at different times
Coefficient: Coefficient of stability
Weakness: Learning and transfer effects

Alternate form
Purpose: Equivalence of similar measures
Method: Apply a similar measure to the same group
Coefficient: Coefficient of equivalence
Weakness: Learning and transfer effects – control for this by giving half of the group version A then B, and the other half version B then A

Split-half
Purpose: Internal consistency
Method: Compare odd and even items
Coefficient: Coefficient of internal consistency
Weakness: Many different ways of splitting; attenuation – apply the Spearman-Brown formula

Kuder-Richardson 20
Purpose: Internal consistency
Method: Dichotomous items – True/False
Coefficient: KR20

Cronbach’s α
Purpose: Internal consistency
Method: Multiple answers – none correct
Coefficient: Cronbach’s α

Inter-scorer / Inter-rater
Purpose: Agreement between scorers or raters
Method: Give different scorers (raters) copies of the measure and determine the degree of similarity in result
Coefficient: Inter-scorer reliability coefficient

4.4 Factors affecting reliability

In closing this chapter, we need to look at some of the factors that affect the value of the reliability coefficient. Among the most important of these are the following:

4.4.1 Speed* versus power tests*


A speeded measure is one in which the person being assessed is required to complete as many relatively simple items as possible in a short space of time. Power* measures are
those where the items get progressively more difficult. In theory, power
tests are untimed, but in practice most ability tests are in fact timed,
resulting in what Nunnally and Bernstein (1993, p. 355) term “timed-
power” measures. However, if the time limits are relatively generous,
the assumptions of a power measure are met. We should also be aware
that in situations where social conditions, educational background and
test sophistication* levels vary greatly, what may seem to be a simple
task (and thus speeded) may in fact be a power test for some groups. For
example, using the alphabet as part of a coding/decoding task may be
much easier for a well-schooled person than a less well-schooled one.
The assumption that different groups have the same knowledge of the
alphabet may not be correct.

Power measures can be evaluated using all the reliability measures, but
speed measures should not be evaluated using Cronbach’s alpha or
KR20, because there is very little variance associated with each item.
Nunnally and Bernstein (1993, p. 351) also argue that the most
appropriate reliability approach for a speed measure is to split the
measure into two halves (halving the time as well). These two halves are
then treated as two alternate forms*, which are administered a short
time apart. As indicated earlier, the Spearman-Brown correction for
attenuation needs to be applied. Nunnally and Bernstein (1993, p. 351)
also recommend that temporal stability* be checked by administering
the two halves some time apart – about two weeks.

4.4.2 Restriction of range*


As we have seen, reliability is essentially a coefficient that is obtained
by correlating two sets of scores. Therefore anything that affects the
value of the correlation coefficient will impact negatively on reliability.
One of the most important of such factors is known as the restriction of
range. This is illustrated in Figure 4.2.

Figure 4.2 Restriction of range


As we can see, there is a relatively good correlation between the X and
Y scores for the sample as a whole (shown by the oval), whereas if we
restrict the scores to those inside the box there is almost no correlation
between the scores. The implication of this is that correlations should be
calculated using the widest range of participants possible in order to
make use of the total amount of variance in the sample. Restriction of
range will always reduce the reliability coefficient. (See McIntire &
Miller, 2000, p. 149.)
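A small simulation makes the effect concrete (illustrative only; the correlation of about 0,7, the cut-off and the Python/NumPy implementation are the editor's choices, not data from the book):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)
    y = 0.7 * x + rng.normal(scale=0.7, size=5000)           # moderately correlated scores

    full_r = np.corrcoef(x, y)[0, 1]                         # correlation in the full sample
    keep = x > 1.0                                           # restrict to high scorers on X
    restricted_r = np.corrcoef(x[keep], y[keep])[0, 1]       # noticeably smaller correlation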

4.4.3 Ability level and ceiling* or floor effects* (skewness* of distribution)

One of the factors affecting the reliability coefficient is the ability level
of the group of people being assessed. Clearly, if a measure is so easy
that everybody gets every item right, or so difficult that everybody gets
every item wrong, then there will be absolutely no variance and the
correlation of the measure with anything else will be zero. A measure
should reflect as wide an ability range as possible in order to maximise
the amount of variance in the data, which will then be reflected in
correlation coefficients that are as large as possible.

4.4.4 Length of scale (number of items)


A basic premise of measurement theory is that errors are either random
or systematic. Reliability is concerned with the random error
component. Because the error is random, it affects the observed score by
adding to or subtracting from the true score. Because the error can be
either positive or negative, the implication is that with repeated
measures these positives and negatives will cancel each other out – this
is the law of large numbers. It follows from this that the more items
there are in a measure, the less error there will be. In other words,
measures with more items are more reliable than measures with fewer
items. As Nunnally and Bernstein (1993, p. 262) note: “A major way to
make tests more reliable is to make them longer … the maxim holds that
a long test is a good test, other things being equal.”

Fewer items may be quicker and easier to administer, but the results are less
reliable – short measures are more likely to be quick and dirty than longer
measures. Of course, if a scale is too long, other sources of error such as fatigue
and loss of motivation begin to enter the picture.

4.4.5 Subjective scoring


As stated when we discussed inter-scorer reliability, the more subjective
the scoring process, the less reliable it is. Therefore any attempt to
improve the reliability of a measure has to include methods for
standardising the interpretation and scoring of items – there must be
clearly defined rules for interpreting results, and raters must be properly
trained to do this.

4.5 Summary

In this chapter we saw how important it is that the measure we use is consistent – that is, that it yields comparable scores under different
circumstances. Reliability is defined as consistency, and is explained,
using the theory of measurement, as the ratio of true score to observed
score. Four major forms of reliability were identified, namely test–retest
reliability (consistency over time); parallel or alternate form reliability
(consistency between different versions of the same scale); internal
consistency (consistency between different parts of the scale); and inter-
rater reliability (consistency between different raters or people who
score the assessment). Finally, we examined the factors affecting
reliability, and identified five such factors, namely speed versus power
tests; restriction of range; ability level and ceiling or floor effects (i.e.
the skewness of the distribution); the length of the scale (number of
items); and how subjective the scoring is.

Additional reading
Nunnally, J.C. & Bernstein, I.H. (1993) is a major theoretical text on psychometric
theory. Reliability is dealt with extensively in Chapter 7.

Test your understanding

Short paragraphs

1. What is the relationship between the robustness and the sensitivity of a measure?
2. What is meant by the standard error of measurement?

Essays

1. Using the theory of measurement, discuss the concept of reliability and outline the
various forms it can take.
2. Briefly describe the factors that affect the reliability of a scale or measure, and
suggest ways of addressing these.
5 Validity

OBJECTIVES

By the end of this chapter, you should be able to

discuss what is meant by validity


describe the various forms of validity
describe what is meant by bias
show how these determine the value of an assessment technique.

5.1 Introduction

As we saw in Chapter 2, the validity of any technique is the extent to which the technique measures what it claims to measure. For example, if
we are trying to measure general intelligence and find that the scores are
affected by whether the people are right- or left-handed (and we know
this is unrelated to intelligence and therefore its contribution to the total
or observed score is irrelevant), then clearly the measuring technique is
flawed. Validity is therefore concerned with the extent to which the
measure is free of irrelevant or contaminating influences. This is shown
in Figure 5.1.

Figure 5.1 True and error components of an observed score

Validity is thus the ratio of the relevant score to the total or observed
score. Therefore the larger the irrelevant component, the lower the
validity. The irrelevant component is systematic and consistent – that is,
it is stable. It is part of the true component, but it is not part of what we
are trying to measure. Another name for this irrelevant component is
bias*. Perhaps more important is that even if we get rid of the irrelevant
component, the error component will remain. In other words, the validity
of any technique cannot be greater than its reliability.

From this it follows that the more any assessment technique relies on
subjective evaluations, the less reliable it is and therefore the less valid it
will be.

How then can we tell if a technique is valid and that it is truly measuring
what it claims to be measuring?

5.2 Forms of validity

There are three main forms of validity, all of which are important,
although they apply differently in different contexts and therefore
require different kinds of evidence. These are termed construct, content
and criterion-related validity*. (See, for example, McIntire & Miller,
2000, pp. 134–136.)

5.2.1 Construct (theoretical) validity*


The basic question asked by construct or theoretical validity is whether
the assessment technique produces results that are in line with what we
already know. For example, if we know that blonde-haired people do
things differently from brown-haired people, then we would expect our
assessment technique to distinguish between blondes and brunettes. If
our measure does not do this when existing theory says it should, then
the chances are that there is something wrong with our assessment
technique.

Be careful with the term “construct”. It has nothing to do with construction and the
way the measure was made. It refers to a theoretical idea. For example,
personality does not exist as an object – it is a theoretical construct.

There are four main ways of demonstrating the theoretical validity of our measure.

5.2.1.1 Convergent validity


Evidence of convergent validity* is that results from our assessment
measure correlate with those from similar measures and from those
known to be theoretically linked to them. For example, we would expect
to find that racism and political conservatism are related and so we
would look for our newly developed measure of racism to correlate
positively (and quite strongly) with existing measures of conservatism.
Simply stated, our new measure should correlate with measures that we
know assess similar concepts or constructs.

5.2.1.2 Discriminant validity


Discriminant validity* is the opposite of convergent validity. In this
case, the new measure should not correlate with measures it is known to
be independent of. For example, there is no evidence to suggest that
intelligence is related to being left- or right-handed. Therefore if our new
test of intelligence correlates with handedness, then we have strong
evidence that our new measure is not valid: it has failed the discriminant
validity test.

5.2.1.3 Factor analysis*


A further source of evidence that our new assessment technique is
theoretically sound is that the factor structure as measured by our new
technique is similar to the factor structures found using other techniques
of the same construct. According to Nunnally and Bernstein (1993, p.
111),

[f]actor analysis consists of methods for finding clusters of related variables. Each such cluster, or factor, consists of a group of variables
whose members correlate more highly with themselves than they do
with variables outside the cluster. Each factor is thought of as a
unitary attribute.

This is explained in more detail in Sidebar 5.1.

Sidebar 5.1 Factor analysis


Factor analysis is a technique for examining the components or clusters of related
variables that underlie and contribute to a number of related assessment scores.
For example, consider the scores obtained by Grade 12 scholars in a number of
different subject areas, such as English, mathematics, science, biology, history
and accounting. If we factor analyse the marks of a large number of scholars in
these subject areas, we may find that most of the marks can be accounted for in
terms of three factors, namely general intelligence, numerical ability and verbal
ability. This is shown below.

In addition, we see that each of these three factors contributes strongly, weakly or
not at all to performance in each of the subject areas. In much the same way as
in the example, we can determine which factors relate or contribute to various
aspects of a phenomenon being assessed. Although the factor structure may vary
slightly between different samples (because of measurement error), factor
structures are relatively robust – they do not change all that easily.
The point is that if we subject our new assessment technique to a factor analysis
and obtain a factor structure that is similar to that found by other researchers
using other techniques, then we can be relatively certain that our assessment
technique is assessing a similar construct. If our factor structure is very different
from that obtained with other techniques, then our case for the construct
validity* of our assessment technique is weakened considerably.
It is for this reason that Nunnally and Bernstein (1993, p. 111) argue that “[f]actor
analysis is at the heart of the measurement of psychological constructs”.
We can also distinguish between an exploratory factor analysis* and a
confirmatory factor analysis*. With the former, we try to uncover the optimum
factor structure underlying our data. For example, in the Grade 12 results
example above, we suggest that there are three factors and we call our approach
exploratory because we explore the possibilities in our factor analysis. In a
confirmatory factor analysis, we rather ask whether the data is compatible with a
certain factor structure. Suppose previous research indicates that there are five
factors underlying scholastic success and not three as we suggest, we can then
determine whether or not a five-factor solution is possible. In other words, we
seek to confirm the existence of five factors, rather than simply ask how many
factors exist.

If we are able to show that the factor structure we get from our research
is the same as or similar to the factor structure obtained in other
research, then we can be sure that our assessment measure is measuring
the same constructs as those of the other researchers. This is an
indication of the construct validity of our measure.
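As a purely illustrative sketch (the subject loadings, the simulated data and the use of scikit-learn's FactorAnalysis are the editor's assumptions, not the author's analysis), an exploratory factor analysis of subject marks might look as follows in Python:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(1)
    n = 300
    general = rng.normal(size=n)          # simulated underlying abilities
    numerical = rng.normal(size=n)
    verbal = rng.normal(size=n)

    marks = np.column_stack([
        0.7 * general + 0.5 * verbal    + rng.normal(scale=0.5, size=n),   # English
        0.7 * general + 0.6 * numerical + rng.normal(scale=0.5, size=n),   # Mathematics
        0.6 * general + 0.5 * numerical + rng.normal(scale=0.5, size=n),   # Science
        0.6 * general + 0.4 * verbal    + rng.normal(scale=0.5, size=n),   # History
    ])

    fa = FactorAnalysis(n_components=3).fit(marks)
    print(fa.components_)                 # loadings: which factors drive which subjects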

5.2.1.4 Maturational or developmental sequencing


A final source of evidence for the validity of our measure is that of
maturational or developmental sequencing. The point of this is quite
simple. We know that many constructs change over time and that there
is a sequence of stages, such as in Piaget’s theory of cognitive
development and Kohlberg’s theory of moral development. We know
this sequence of events is linked to physical development. If the
construct we are assessing follows this kind of developmental sequence
and our assessment technique reflects these stages, then we have further
evidence that our assessment is on the right track. If our assessment
results fail to reflect these stages, then we can suspect that our
assessment technique may not be valid.

5.2.2 Content validity


Content validity is concerned with whether the content of the scale or
measure accurately reflects the domain it is trying to assess. This form
of validity checking is most appropriate for achievement and knowledge
assessments. If we think of a typical examination or test, we see that its
content validity is concerned with whether it covers all the important
areas in the course. We may ask whether it does this in a way that
accurately reflects the importance of each aspect of the course (i.e. is the
test proportionally representative of the domain?). For example, does the
examination cover all the important areas, or does it concentrate on one
or two issues and leave out the bulk of the course content?

At the same time, content validity also applies to other forms of assessment. If we think back to our discussion on developing an
assessment technique, we recall that the first stage was conceptualisation
(section 3.5.1). We may also remember that during the mindmapping
phase, the various components and dimensions of the construct were
spelled out. Content validity is the process by which we try to ensure
that the various components identified at this stage are more or less
proportionally represented in the final assessment measure.

This form of validity is assessed essentially by inspection: experts in the field and those responsible for the conceptualisation of the assessment
instrument scrutinise (visually inspect) the instrument and decide
whether it is a good reflection of the domain being investigated. (See
McIntire & Miller, 2000, pp. 137–144.)

5.2.3 Criterion-related (empirical) validity


The third major form of validity is known as criterion-related validity,
because it relates the scale outcomes to some external criterion. For
example, does the scale successfully distinguish between groups of
people known to possess and not to possess a particular characteristic?
Does our test of will-power distinguish between those people who have
given up smoking and those who have tried and failed? Does our test of
typing ability separate the people known to be good typists from those
known to be poor ones? (See McIntire & Miller, 2000, pp. 148–152.)

There are two forms of criterion-related validity, namely concurrent validity* and predictive validity*.

5.2.3.1 Concurrent validity


This form of validity is designed to ask whether the measure
successfully distinguishes between known groups. For example, an
assessment aimed at determining typing ability should (if it is any good)
be able to distinguish between people known to type well and those
known to type badly. However, there may sometimes be a problem with
the definition of a known group. In the cases of the smokers and the
typists, the criteria are obvious. In other cases, the criterion may simply
be a high or low score on another scale. For example, we may have to
ask whether the anxiety scale we have developed distinguishes between
people who score high on an existing test of anxiety and those who score
low on this measure. This is an aspect of the criterion problem*, which
is discussed in more detail in section 5.4.

5.2.3.2 Predictive validity


The second form of criterion-related validity does not focus on groups
that are known to differ at present, but instead on whether the
assessment procedure can predict how groups may differ in the future.
For example, if we develop an assessment process that claims to
measure will-power, we should be able to identify (based on the results)
those people who demonstrate will-power and those who do not. If we
were to administer our will-power scale to a group of 50 smokers, we
should successfully be able to predict who would stop smoking within a
given time period. If we were able accurately to predict, on the basis of
our assessment scores, those people who actually gave up smoking in
the prescribed time, then our assessment technique would be valid.
Another example would be the development of an assessment process to
predict who would and who would not succeed at university. If our
assessment process were accurate in doing this, we would have clear
evidence that it was valid: it had done what it claimed it would do, hence
this is known as predictive validity.

There are two further forms of validity worth a mention, namely face
validity* and ecological validity*, discussed below.

5.2.4 Face validity


Part of content validity is the notion of face validity. The basic issue
here is that the assessment technique should appear (especially to the
uninformed) to be doing what it claims to be doing. In other words, does
the test or scale seem to be appropriate? If we are trying to assess people
for appointment as apprentice fitters and turners, we need to phrase the
items in the scale in terms of nuts and bolts, cutting sheet metal and
welding. If we are trying to select apprentice dressmakers, the items
should be phrased in terms of buttons and sequins, cutting pieces of
cloth and sewing. The arithmetic operations and numbers can be the
same, but the terminology needs to be appropriate. Similarly, if we wish
to select police officers based on their reading comprehension, the
material should be phrased in terms of law enforcement situations. In the
same way, the selection of medical personnel would describe situations
involving accidents and medical emergencies. As Nunnally and
Bernstein (1993, p. 110) note, it is not only the people being assessed,
but also the administrators and their clients who need to be convinced of
the technique’s face validity. They state:

Conceivably, a good predictor of a particular criterion [such as job success] might consist of preferences among drawings of differently
shaped and differently coloured butterflies, but it may be difficult to
convince administrators that the test actually selects employees well.

In many ways, face validity is the public relations component of the assessment technique. (See McIntire & Miller, 2000, pp. 144–145.)

5.2.5 Ecological validity


In addition there is ecological validity, which is concerned with whether
the results of the assessment are meaningful and useful outside the
setting in which they are obtained. For example, some researchers
maintain that measures of intelligence are obtained in situations and by
methods that barely resemble what would be considered intelligent
behaviour in the real world. As a result, while these assessment results
are psychometrically sound, they have very little ecological validity.
Ecological validity can thus also be termed contextual validity, and
relates more to constructs such as fairness* (see Chapter 7) and validity
generalisation* (see section 5.5).

5.2.6 Incremental validity


A term that is often used when looking at validity, especially predictive
validity, is incremental validity*, which is the extent to which the
inclusion of a second predictor improves the predictive power of a
particular variable. Suppose we show that a cognitive test predicts job
training results with a correlation of 0,60. This can be shown as the
extent to which two circles (one representing the predictor and the other
representing the criterion) overlap. This is shown in Figure 5.2a. If we
then were to include a measure of, say, conscientiousness in the
calculation, we could find that the correlation with training results
increased to, say, 0,83, because more of the criterion variance is explained. The inclusion of the conscientiousness measure has accounted for additional variance (Figure 5.2b) and has increased the overall prediction by 0,23 – we can thus conclude that conscientiousness has an incremental validity of 0,23. If the second predictor explains very little of the criterion variance, it may have very little incremental value (Figure 5.2c). In some cases, the inclusion of a second predictor may actually reduce the predictive value of the first predictor and thus has a negative incremental validity.

Figure 5.2 Incremental validity
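A minimal sketch of how incremental validity might be estimated from data follows (the simulated predictors, the helper function multiple_r and all numbers are the editor's illustrations, not results from the book):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 500
    cognitive = rng.normal(size=n)
    conscientiousness = rng.normal(size=n)
    training = 0.6 * cognitive + 0.5 * conscientiousness + rng.normal(scale=0.6, size=n)

    def multiple_r(X, y):
        # multiple correlation between a set of predictors X and a criterion y
        X1 = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return np.corrcoef(X1 @ beta, y)[0, 1]

    r_cognitive = multiple_r(cognitive[:, None], training)
    r_both = multiple_r(np.column_stack([cognitive, conscientiousness]), training)
    incremental_validity = r_both - r_cognitive              # gain from the second predictor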


5.2.7 Synthetic validity
A seventh form of validity that is sometimes used is that of synthetic
validity*, a term defined as “the inferring of validity in a specific
situation from a logical analysis of jobs into their elements, and a
combination of those elemental validities into a whole” (Balma, 1959:
395). It is thus a form of predictive validity and is also known as job
component validity. Suppose I were to compile a battery of six different
tests and could produce evidence that all six of these tests are valid
predictors of the criterion in question. According to this theory, I do not
have to prove the validity of the battery as a whole, because each
component is valid – and I can conclude that the battery is synthetically
valid without having to demonstrate this through independent research.
(See, for example, Hogan, Davies & Hogan, 2007.)

Table 5.1 summarises the various forms of validity and the questions
they seek to answer.

No discussion on validity is complete without considering three further issues: interpreting validity coefficients, the criterion problem and
validity generalisation.

Table 5.1 The various forms of validity

Construct (theoretical)
Purpose: Is the measure theoretically sound?
Forms and questions asked:
  Convergent – Does the measure correlate with similar measures?
  Discriminant – The measure must not correlate with other measures to which it is not related.
  Factor analysis – Is the factor structure of the measure similar to the factor structures of other measures of the phenomenon?
  Maturational/developmental sequencing – Does the measure reflect the known developmental and maturational sequence described by theory?

Content
Purpose: Does the measure accurately reflect the content of the domain that is being assessed?
Forms and questions asked:
  Content validity – Are the items representative of the domain under investigation?
  Face validity – Do the test items appear to be appropriate for the test’s purpose?

Criterion related
Purpose: Does the test correlate with external criteria such as job success, pass rates, etc.?
Forms and questions asked:
  Concurrent – Does the test result correctly identify groups that are known to differ on the characteristic being assessed (e.g. good vs poor typists)?
  Predictive – Does the test successfully predict who will show the characteristic being assessed at some time in the future?

Ecological
Purpose: Is the test fair and useful in other situations?
Question asked: Are the items and the test as a whole relevant and meaningful in situations outside the test situation?

5.3 Interpreting validity coefficients


In general, the higher the validity coefficient*, the better it is. In
practice, validity coefficients above 0,5 are acceptable, and in the case of
selection criteria, validity coefficients as low as 0,3 and even 0,2 are
acceptable. In this regard, it is important to know one additional
concept, namely that of the coefficient of determination*. This is the
extent to which the two scores being correlated overlap or share
common variance.

“Variance” is the spread of any observed score. Suppose we want to predict first-year exam results in psychology using Grade 12 English.
The English mark for each Psychology 1 student is correlated with his
Psychology 1 examination mark. If the correlation is high (above 0,9),
the English mark is a good predictor, whereas if the correlation is low
(below 0,3), it is a poor predictor. The English mark is therefore the
predictor and the Psychology 1 mark is the criterion. One way of
understanding what a correlation is, is to see it as the amount of overlap
between the two sets of scores. Graphically, this can be represented by
two overlapping circles, with the common variance being that portion
which is in the overlapping area.

This is shown in Figure 5.3 and discussed further in Appendix 2.

Figure 5.3 Shared variance between the predictor and the criterion

The coefficient of determination is calculated by squaring the validity coefficient. In other words, if the validity coefficient is 0,5, then the
coefficient of determination is 0,5 × 0,5, which equals 0,25. This means
that there is a 25 per cent (0,25 × 100) overlap between the two
measures. If the validity coefficient is 0,3, then the coefficient of
determination is 0,09 – that is, there is only nine per cent overlap (i.e.
common variance) between the two sets of figures. In other words, only
nine per cent of the variance in the scores on the criterion can be
attributed to or explained by the variance in the scores on the predictor.
As the two circles overlap more and more, the value of the coefficient of
determination increases, showing that the predictor predicts the criterion
to an increasing extent. When the circles overlap completely, there is
perfect prediction.
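For completeness, the calculation can be written as a one-line Python function (illustrative only; the function name is the editor's):

    def coefficient_of_determination(validity_coefficient):
        # proportion of criterion variance shared with (explained by) the predictor
        return validity_coefficient ** 2

    coefficient_of_determination(0.3)   # -> 0.09, i.e. nine per cent shared variance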

Although nine per cent does not sound like a great deal, it can be important.
Imagine if an employer could improve the productivity of his workforce by nine per
cent using proper selection methods. Similarly, if ten per cent fewer marriages
ended in divorce, or ten per cent fewer people died in car accidents, it would be a
meaningful contribution.

5.4 The criterion problem

We have argued that the criterion-related validity of any technique lies in the extent to which it reflects or predicts performance related to some
external criterion. This could be the ability to give up smoking, to type
accurately, to pass university examinations, to be a good police officer,
and so forth. However, in many cases, it is relatively difficult to define
the criterion in a precise way. For example, what does it mean to be a
good police officer in reality, and how do we recognise one when we see
one? One way of establishing criteria for, say, a good policeman or
typist would be to ask a supervisor to rate the person’s performance.
However, this approach has many problems, because the raters do not
always know what they are looking for.

Research described by Nunnally and Bernstein (1993, p. 96) illustrates this problem. In a particular study, a number of police officers were
rated as successful or unsuccessful by their supervisors, and these
ratings were compared to the officers’ personality profiles* obtained at
the recruitment stage. It turned out that the officers who were rated
highest by their supervisors had been the most maladjusted at the time of
the selection. What the senior officers had assessed was the more junior
officers’ subservience: those officers who obeyed instructions and were
not insolent were rated as better officers than those who challenged their
superiors.

When we examine interviewing in Chapter 16, we will see why this result is not all that surprising. This research also points to the fact that
the low validity of the assessment technique may be as much of a
problem with how the criterion is defined as it is a problem with the
predictor!

5.5 Validity generalisation

An important issue is whether the assessment is equally valid for all groups. For example, a certain form of assessment may be more accurate
(i.e. more valid) for males than females, for urban than rural people, for
learners from private schools than those from government schools, and
so on. Test fairness is a form of validity generalisation, and the
Employment Equity Act 55 of 1998 bans psychometric testing unless it
can be shown scientifically to be equally valid for all groups. This whole
issue of the fairness of assessment is dealt with in Chapter 7, as is the
notion of fairness and ecological validity.

5.6 Factors affecting validity

As already stated, validity is the ratio of the relevant score to the total or observed score (see section 5.1). We have also seen that the observed score is affected both by random error and by irrelevant or systematic error (bias) (see Figure 5.1 and related text). Clearly, the larger each of these components is, the lower will be the reliability and the validity. Chapter 4 deals with the origin of random error and how to minimise its occurrence. In this section, we examine where the systematic error (the irrelevant part of the observed score) comes from, and what can be done to minimise it.

5.6.1 Characteristics of the assessment technique or instrument

This group of factors is associated with the measuring technique itself
and includes the following:

5.6.1.1 Test items


Often measuring instruments are drawn up with a particular group of
people in mind. Although the items may be valid for this target group,
they are less so for others. This is particularly true when imported
instruments are used without their being adapted to meet local
conditions. For example, tests and scales from the US could have items
about George Washington or Abraham Lincoln, making them biased
against South Africans. Because many South African tests and scales
were drawn up in an age when thinking was very Eurocentric, many of
the items that were developed favoured some groups and were biased
against others.

5.6.1.2 Phrasing and/or language level


The fact that many people who are assessed in South Africa do not have
English (or Afrikaans) as their first language means that quite often
items and assessment measures as a whole are not clearly understood by
them. This refers not only to vocabulary, but also to things like double
negatives and long, complicated sentences. In the late 1990s, two noted
professors of psychology, Fatima Abrahams and Ricki Mauer,
researched the differing performances of various groups on the 1992
South African version (SA92) of a well-known personality scale, the
16PF, at the University of the Western Cape. They felt that the
differences they found reflected the difficulty non-mother-tongue
speakers of English had with the language of the instrument. To check
this possibility, they asked students in the Industrial Psychology
Honours course at the university to give the correct meanings of various
terms used in the SA92, using a list of possible synonyms obtained by
consulting various dictionaries. The results confirmed for Abrahams and
Mauer (1999a, 1999b) that the students did, in fact, have trouble with
the items and were unable to select the correct synonyms for many of
the terms used. On this basis they argued that the language level of the
SA92 made it inappropriate for people whose mother tongue was not
English.

To explore this matter further, the instructions and the first 52 items
were submitted to a simple grammar check on the computer, which
returned a Flesch-Kincaid grade level* of 6,9. Flesch-Kincaid is a
formula based on sentence length and the number of syllables per word,
and is designed to measure the complexity level of the language used.
The grade levels are roughly equivalent to US school grades. When the
language used in the SA92 was analysed using the grammar check of
MS Word, it was found to have an English reading level of about Grade
7. This meant that the Honours students were unable to comprehend
English that the majority of English first-language speakers in Grade 7
could be expected to understand.

The Flesch-Kincaid grade level rates text based on US school grade levels. For
example, a score of 8,0 means that an eighth grader can understand the material.
The formula for the Flesch-Kincaid grade level score is

(0,39 × ASL) + (11,8 × ASW) – 15,59

where
ASL = average sentence length (the number of words
divided by the number of sentences)
ASW = average number of syllables per word (the number
of syllables divided by the number of words).
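Assuming the word, sentence and syllable counts have already been made, the formula translates directly into a small Python function (a sketch; the function and variable names are the editor's):

    def flesch_kincaid_grade(words, sentences, syllables):
        asl = words / sentences          # average sentence length
        asw = syllables / words          # average number of syllables per word
        return 0.39 * asl + 11.8 * asw - 15.59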

One of the problems with these findings is that the synonyms used in
this study were obtained from American dictionaries. Another is that
there is no indication how the Flesch-Kincaid grade levels relate to
South African school grades. However, the differences in the SA92
scores do not appear to be caused primarily by language ability, but
rather by the socialisation practices of the groups involved. Research by
Shuttleworth-Jordan (1996) and others shows that as educational and
background factors become more equal, language-based differences in
test outcomes begin to disappear.

5.6.1.3 Restriction of range


We have already seen (Chapter 4, Figure 4.2) that the correlation
between two variables is significantly reduced when a measure is
applied to a select subsample of the total population. This also applies
when we try to establish the validity of any assessment technique.
Suppose there is a good correlation between, say, intelligence and
creativity. If we try to demonstrate this relationship using a sample of
university students, the fact that the sample is above average in
intelligence is likely to result in a lower validity coefficient than would
be the case in a more representative sample of the population.

5.6.1.4 A good criterion


As already stated, the validity of any technique is established by
correlating the results obtained on our predictor with some external
criterion (section 5.2.3). We have also seen that it is sometimes quite
difficult to find a good criterion that is not subject to all sorts of
contaminating influences. (See section 5.4.)

5.6.2 Individual characteristics


In addition to problems associated with the measuring instrument itself,
there are also a number of problems associated with the person being
assessed that will result in the outcome being biased. These include the
following:

5.6.2.1 Test sophistication


When people are routinely exposed to situations in which they are
assessed, they become accustomed to the process. However, people who
are not often assessed find the whole situation a little strange and even
threatening. Given the complex nature of our society, people are going
to differ quite radically from each other in the way in which they deal
with the assessment process. This is referred to as test sophistication or
test wiseness. People who are relatively test unsophisticated are likely to
systematically underperform on the assessment tasks. This in turn will
affect the validity of the assessment.

5.6.2.2 Understanding and/or language ability


As we saw in section 5.6.1.2, the language ability of the person being
assessed can play a major role in understanding and responding to
language-based assessment processes. Differences in language ability
are a major source of bias in assessment.

5.6.2.3 Anxiety levels


It is well known that performance in complex tasks may decline as
anxiety levels increase, simply because the person is unable to devote
his full energy to the task at hand. Therefore the assessor has to do
everything in his power to put the person being assessed at ease.
However, people who are not very familiar with the whole assessment
process often show high levels of anxiety. (See, for example, Foxcroft &
Roodt, 2005, p. 93.)

5.6.2.4 Time limits and competitiveness


Different people have different attitudes to competitive situations,
irrespective of whether the competition is between people (such as in a
race) or against the clock (a race against time). This is particularly
important when assessment tasks are timed: those people who are non-
competitive by nature will approach the timed task in an unhurried way,
and will generally achieve lower results than more competitive people.

5.6.2.5 Response styles or response sets


People respond to assessment items in different but fairly predictable
ways. These are known as response styles or response sets.

Extremity or centrality. Some people prefer to answer at the extremes of the available range: they tend to use the “Strongly agree”
or “Strongly disagree” choices in a five-point Likert scale. Other
people prefer to answer in the middle or to use the “Agree” or
“Disagree” options. (You will recall that in Chapter 2 we argued for
doing away with the midpoint by having an even number of response
options in order to avoid this fence-sitting response style.)
Acquiescence* or contradiction. In exactly the same way, some
people are inclined to agree with whatever is being asked, whereas
others want to disagree with what is being said – as a matter of
principle, it would seem. These people are often labelled “yeasayers”
or “naysayers”. To counteract this tendency, we argued in Chapter 2
that half the items should be scored in a positive direction and the
other half in a negative direction.

5.6.3 Demand characteristics


Another set of factors that has an effect on validity is demand characteristics. These are aspects of the situation that steer the person’s response in a particular direction.

5.6.3.1 Presentation of self or image management


All of us have a particular image of ourselves that we strive to maintain.
Some of us may want to seem worldly wise, others mentally tough, or
we may want to be seen as fair, or whatever. We then begin to respond
to the various items in the scale in terms of these images and how we
would like to project ourselves. In a similar way, when we are
interviewed, we like to give the best account of ourselves.

5.6.3.2 Second guessing


Most of us are generally reasonable people and so, when we complete a
questionnaire or scale, we try to help the researcher by doing our best. In
this process, we may try to work out what we think the researcher is
looking for, and then tend to steer our responses in that direction. This is
second guessing. It means that the respondent guesses what the study is
about and tries to assist the researcher by responding appropriately.

5.6.3.3 Social desirability


At some time or another all of us do something we think others will
disapprove of. Almost everyone has at some stage been tempted to take
something that does not belong to him – this is only natural. Even
though we may never have succumbed to these temptations, we believe
it is wrong even to think about such things and so, if we are asked
whether we have ever been tempted, we say “No”. In other words, we
give answers that we believe are acceptable to others rather than totally
truthful. Because this sort of behaviour is widespread, there are various
scales of social desirability, the most widely used being the Marlowe-Crowne social desirability scale. Most well-constructed scales include a
few social desirability items. If too many of these items are scored high,
doubt arises as to the accuracy of the scale as a whole, and the results
must be interpreted with caution.

5.6.3.4 Deliberate distortion or faking


The final demand characteristic we need to examine is the deliberate
faking of the results. In many cases, people try to make themselves look
better, for example in a job interview. This is termed “faking good” and
is similar to the impression management discussed in section 5.6.3.1. In
some cases, especially where people are claiming damages in order to be
paid insurance money, they may deliberately try to make themselves
look worse than they really are. This is termed “malingering” or “faking
bad”.

People who have been injured in motor vehicle accidents (MVAs) or industrial
accidents often stand to be paid large amounts of money by insurance companies
for the loss of amenities and future earnings. The temptation to overstate the
nature and extent of the damage is thus high. Psychologists investigating these
cases therefore have to guard against the possibility of malingering.
Many scales, especially those probing cognitive function or brain injury,
have items that may seem plausible to someone trying to fake bad, but,
in fact, describe what rarely occurs in practice. For obvious reasons,
these are not described here.

5.7 Summary

In this chapter we discussed validity, the single most important property of any measuring technique – there has to be evidence that the technique is assessing what it claims to be assessing and not something else. Validity was shown to be the absence of both systematic irrelevant error (or bias) and random error. Various forms of validity evidence exist and
need to be shown to be present if our assessment technique is to have
any meaning. The three major forms include construct or theoretical
validity (is our measure compatible with existing theory?), content
validity (does our measure represent the subject area or domain we are
assessing?), and criterion-related validity (does our measure reflect or
predict observable differences between groups? – the former is termed
concurrent validity while the latter is predictive validity). In addition,
there are two minor forms of validity, namely face validity (does the
measure look right to the target group?) and ecological validity (does the
test make sense in the environment in which it is used?). We discussed
how various factors, some unconscious, some quite deliberate, serve to
reduce the validity of any measure. These relate to the assessment technique or instrument itself, to the people being assessed (such as various demand characteristics, response sets, faking good, malingering, and so forth), and to problems associated with the definition and quantification of the criterion – the so-called criterion problem. If we know about these different barriers to validity, we can
build various checks and balances into our measuring techniques and
take steps to prevent them.
Additional reading

Chapter 6 in Cohen, R.J. & Swerdlik, M.E. (2002). Psychological testing and
assessment: An introduction to tests and measurement gives a good account of the
theories surrounding the concept of validity, especially in relation to culture.
For a clear explanation of factor analysis and the use of confirmatory factor analysis in a
validation study, see pages 177 to 179 in McIntire, S.A. & Miller, L.A. (2000).
Foundations of psychological testing.

Test your understanding

Short paragraphs

1. Using the theory of measurement, briefly describe the relationship between reliability
and validity.
2. Name the various forms of validity, going into detail on at least two of them.
3. Describe what is meant by demand characteristics and give five examples of these.

Essays

1. Discuss what is meant by validity, using the theory of measurement (O = T + E) and outlining the different forms of validity that can be identified.
2. What factors affect the validity of a measuring technique (such as a scale) and what
can be done to overcome these factors in order to improve validity?
6 Combining and interpreting
assessment results

OBJECTIVES

By the end of this chapter, you should be able to

discuss the five approaches to interpreting assessment scores


discuss the concept of a norm
describe the strengths and weaknesses of a norm-based approach to interpreting
assessment results
distinguish between norms that are based on central tendency and those based on
a bottom-up approach
outline the strengths and weaknesses of other forms of interpretation
describe different ways of combining scores from different assessments
discuss the strengths and weaknesses associated with each
outline what is meant by a clinical versus a statistical combination of scores
describe different decision-making strategies and show how these affect the
decisions that are made.

6.1 Introduction

As we have stated, assessment is assigning a value to a phenomenon, whereas evaluation is interpreting the results of an assessment and
passing value judgements. But what do these values mean? Are they
good or bad, above or below average? Does the person need help of
some kind? Should the person be admitted to a hospital or selected for a
job? How do we interpret the results of the assessment?

In essence, there are five quite distinct ways of approaching the interpretation of assessment scores. These are discussed below.

6.1.1 Expectancy tables


The use of expectancy tables* is based on asking the following basic
question: What are the chances that a person with a given score will …?
In other words, what outcomes can be reasonably expected from
someone with these or similar results? For example, what are the
chances of a person who has a score of, say, 25 on a measure of will-
power being able to give up smoking within the next three months? Or
what are the chances of a person who has a score of x on a measure of
impulsivity being able to resist the temptation of stealing a pen left lying
on top of someone’s desk?

A good example of how an expectancy table is used is the Swedish formula (a point system based on Grade 12 school results) that was used
until recently for admittance to different faculties at a university. In
Table 6.1, hypothetical scores obtained in this way are used to indicate
the likely outcomes for various subjects. It shows that if a person has a
Swedish formula score of 38, he will be likely to fail physics, get a 3rd
(50–59%) in other sciences, a 2nd (60–74%) in commerce and a 1st
(more than 75%) in humanities subjects.

Table 6.1 An example of an expectancy table

Result Fail 3rd (50–59%) 2nd (60–74%) 1st (75%+)


Physics <38 39 40 42+
Other sciences <37 38 39 40+
Commerce <35 36 38 38+
Humanities <33 34 35 36+

This does not mean that a person with a score of 33 who is highly
motivated will not do better than predicted. Similarly, there is no
guarantee that a person with a Swedish formula score above 44 will not
fail physics or any other subject if he does not apply himself properly.
However, the table has been built up over many years and gives us a
relatively good idea of what can be expected. (Although this approach is
no longer used by the educational authorities in South Africa, it does
illustrate how expectancy tables are used.)
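In practice an expectancy table is simply a lookup of likely outcomes for a given score. A minimal Python sketch follows (it uses only the Physics and Humanities cut-offs read off Table 6.1, treated here as hypothetical values; the dictionary and function names are the editor's):

    # hypothetical cut-off scores, read off Table 6.1
    expectancy = {
        "Physics":    [(42, "1st"), (40, "2nd"), (39, "3rd")],
        "Humanities": [(36, "1st"), (35, "2nd"), (34, "3rd")],
    }

    def expected_result(subject, score):
        # return the highest band whose cut-off the score reaches, otherwise "Fail"
        for cutoff, band in expectancy[subject]:
            if score >= cutoff:
                return band
        return "Fail"

    expected_result("Physics", 38)      # -> "Fail"
    expected_result("Humanities", 38)   # -> "1st"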

6.1.2 Norm referencing


The norm-referenced* (or normative) approach asks the following
question: How does this person compare with others? For example, if
we lined up a particular group of people from the shortest to the tallest,
we could then quite easily say that a given person was the shortest, or
the tallest, or above average, or three from the bottom, and so on. Of
course, this raises the question about the composition of the group. If the
group consisted of US basketball players, then even the shortest person
would be quite tall by usual standards. Similarly, if the group consisted
of the dwarfs from Snow White, then most surely even the tallest in the
group would not be very tall by our usual definition of tallness. And so,
a crucial question in using a normative approach to interpret any
characteristic is which group are we judging and who are we judging
them against? Because normative interpretation is most often used in
psychology, it is dealt with in more detail in sections 6.3 and 6.4.

6.1.3 Age, grade and developmental stage referencing*


This approach is related to the normative one, but looks at age-
appropriate or developmental norms. For example, we expect puberty to
begin at about age 14 for boys and 12 or 13 for girls. If a child has not
entered puberty by age 18, or if puberty begins before age ten, we can
believe that it is unusual. Similarly, there are various stage theories such
as those of Piaget (cognitive development) and Kohlberg (moral
development) which link psychological development to chronological
age.

In terms of this approach, we can gauge whether a person’s physical and/or psychological development is behind, ahead of or in line with
age-expected developments. For example, according to Piagetian theory,
the shift from concrete operational to formal operational or abstract
thinking usually takes place at about 12 to 15 years of age. If this has not
occurred by age 18, it is behind schedule (what Piaget terms a
décalage). We have talked quite crudely about being behind, ahead of or
in line with age-expected schedules. Much finer distinctions can be
made in terms of percentiles. These are discussed in section 6.2.

In the US many test norms are reported in terms of age equivalents* (or
age norms*) and/or grade equivalents* (or grade norms*). If we use
age norms as an example and say we have a score of 23 on the Ravens
Standard Progressive Matrices, it would (hypothetically) be reported as
an age equivalent of eight years two months and as having a grade
equivalent of Grade 2,4. In other words, the majority of people eight
years and two months of age and the majority of people who are in
Grade 2 and have been there for four months would be expected to score
23 on this test. This means that the score of 23 would be the equivalent
of being eight years and two months old and would also be equivalent of
being in Grade 2 and having been there for four months since promotion
from a lower grade. (Look at the fictitious promotional material in
Exhibit 7.1 page 95 to see how this is reported in reality.)

You can see that these two norms are very similar to Binet’s notion of
mental age*. Many of these issues are discussed in McIntire and Miller
(2000) especially Chapter 5. (See also Anastasi & Urbina, 1997.)

6.1.4 Criterion* or domain referencing*


This approach looks at the degree to which the characteristic meets some
external need* or criterion. For example, it is not concerned with
whether the person is the tallest or shortest in the group, but rather with
the question of whether the person is tall enough to be a cabin attendant
in an aircraft. It does not attempt to see how fast the person can type, but
rather whether the person is able to type a minimum of 60 words per
minute. It does not matter whether the person is the best or the worst in
the group, but whether the person meets the minimum requirements of
the situation.

6.1.5 Self-referencing*
The final approach to interpreting an assessment score is to see how the
score relates to a similar assessment made earlier. Has the person or the
system improved, remained the same or deteriorated? If they have
improved or deteriorated, to what extent? Has the depressed person
become less depressed after the treatment? Is the improvement greater
with medication 2 or dose 2 than with medication 1 or dose 1? Does the
person smoke fewer cigarettes or type faster today than he did
yesterday? These are all self-referential questions.

6.2 Norms

We stated earlier that the most popular method for interpreting psychological scores is a normative one in which the person’s score on
the assessment instrument is compared with others on the same measure.
The distribution of scores obtained by the members of the group is
termed a norm* and the group that is used to obtain the norm is known
as the standardisation group. A norm is therefore the distribution of
scores obtained from a standardisation group. Norms are empirically
obtained by determining how well a representative group of people
achieves on the specific task. They are usually presented in a bell-
shaped or normal distribution* curve. We use norm data in two basic
ways: to establish where an individual’s score lies in relation to the
midpoint or mean, and how many people in the group scored less than
the individual did.

6.2.1 Interpretations based on central tendency*


This approach to evaluating a person’s results asks where his score lies
in relation to the mean (average): is it above or below the midpoint?
There are various ways of doing this. The following are among the most
important (a brief numerical sketch follows the list):

Z-score*. How many standard deviations* (SDs) above or below
the mean is this score?
Deviation score*. This converts the z-score to a mean of 100 and SD
of 15 and is used mainly with IQ scores.
McCall’s T-score*. This is similar to an IQ score, but converts the z-
score to a mean of 50 and an SD of 10. This approach is widely used
by psychologists in North America. These scores are sometimes
termed standard scores* in South Africa.
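To make these conversions concrete, here is a minimal Python sketch (not part of the original text) showing how a raw score could be converted to each of the standard scores listed above, assuming the norm group's mean and standard deviation are known. The function names and example values are purely illustrative.

def z_score(raw, mean, sd):
    # Number of standard deviations above (+) or below (-) the mean
    return (raw - mean) / sd

def deviation_iq(raw, mean, sd):
    # Deviation score: the z-score rescaled to a mean of 100 and SD of 15
    return 100 + 15 * z_score(raw, mean, sd)

def t_score(raw, mean, sd):
    # McCall's T-score: the z-score rescaled to a mean of 50 and SD of 10
    return 50 + 10 * z_score(raw, mean, sd)

# Example: a raw score of 35 in a norm group with mean 30 and SD 5
print(z_score(35, 30, 5))       # 1.0
print(deviation_iq(35, 30, 5))  # 115.0
print(t_score(35, 30, 5))       # 60.0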

6.2.2 Interpretation based on the number of people with lower scores
Another norm-based approach to interpreting assessment results is to see
how far from the top or bottom of the group the person is. That is, how
many people have lower scores than the person concerned? Is the person
in the top 10 per cent of the group, or in the bottom 20 per cent, and so
on? In most cases, psychologists arrange the scores from the lowest to
the highest and then count from the bottom up. There are four basic
ways in which this is done, namely percentiles, quartiles, stanines and
stens.

The various scoring processes are explained as follows (a short worked sketch follows the list):

Percentiles*. Interpretation relates to the percentage of the population
that scores below a particular score. For example, a person who scores
in the top ten per cent of the population will be at the 90th percentile.
A person in the lowest 15 per cent of the population will score at the
15th percentile.
Quartiles*. The scores obtained from all the people who have been
tested are ranked from lowest to highest and divided into four equal
groups (quartiles). The person’s score is then placed in one of these
groups. In other words, the person’s score could be in the first (i.e. the
lowest quarter of the population), second, third or fourth quartile. A
person in the fourth quartile will have a score that is better than three-
quarters of the people who have been tested.
Stanines*. This term is derived from sta(ndard) nine. The
distribution of scores is divided into nine roughly equivalent
categories or bands.
Stens*. This is derived from s(tandard) ten. The distribution of
scores is divided into ten roughly equivalent categories or bands.
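The following short Python sketch (added for illustration, with invented scores) shows how a single person's position could be expressed as a percentile rank, quartile, stanine and sten. The band boundaries used are the conventional normal-curve cumulative percentages, and a real norm group would of course be far larger.

import bisect

# Invented norm group of 20 scores, and one person's score to be interpreted
norm_scores = sorted([12, 15, 17, 18, 20, 21, 22, 23, 25, 26,
                      27, 28, 29, 30, 31, 33, 34, 36, 38, 41])
person = 29

below = sum(s < person for s in norm_scores)
percentile_rank = 100 * below / len(norm_scores)   # percentage scoring lower

# Conventional cumulative-percentage boundaries for each banding system
quartile = bisect.bisect([25, 50, 75], percentile_rank) + 1
stanine  = bisect.bisect([4, 11, 23, 40, 60, 77, 89, 96], percentile_rank) + 1
sten     = bisect.bisect([2.3, 6.7, 15.9, 30.9, 50, 69.1, 84.1, 93.3, 97.7],
                         percentile_rank) + 1

print(percentile_rank, quartile, stanine, sten)  # 60.0 3 6 6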

These different approaches to normative interpretation are given in
Figure 6.1. We must bear in mind at all times that in a normative or
norm-based approach, the person’s score is evaluated in terms of where
he stands in relation to a group of people measured for the same
characteristic.

Figure 6.1 The normal distribution and methods of interpreting scores

Note: The various scales above are not exactly to scale, although the general
picture can easily be seen.

6.2.3 Age equivalents or scores


As we have seen, individual scores are sometimes linked to the age for
which the score is the most appropriate (cf. mental age).
6.2.4 Grade equivalents or scores
As stated above, US test producers often report assessment results in
terms of the school grade that someone with a particular score would
have successfully achieved.

6.3 Developing and reporting norms (norm tables*)

A norm table enables test scores to be grouped so that the assessor can
interpret a particular score. For example, a score of 23 may represent
stanine 3, an age equivalent of 14 years 8 months, or a grade equivalent
of 5,6. Where do these figures come from?

Quite simply, norms are generated by administering the assessment
measure to a large number of people similar to those for whom the test
has been designed. For example, if a test has been designed for the
selection of artisan apprentices, it should be administered to a
representative group of typical apprentices (preferably 200 or more).
Once the completed tests have been scored, the distribution of the results
should be calculated in terms of the normal distribution shown in Figure
6.1.

These norm tables are usually published in the test’s technical manual.
When the same test is used on a variety of samples, several norm tables
may be collected and kept in a norm book. Because the general
education and experience of a workforce changes over time, it is vital
that norms be updated every five years or so. Table 6.2 is an example of
a typical norm table, showing some of the more common indices.

Table 6.2 An example of a typical norm table

Norms for ABC test


Sample: Apprentices Durban March 1997 N=493 (462 male, 31
female)
Raw score  Frequency  McCall's T-score (standard score)  Percentile rank  Stanine  Sten
2 2 23 0,45 1 1
3 0 25 0,91 1 1
4 1 27 1,14 1 1
5 2 28 1,82 1 1
6 4 30 3,18 1 1
7 4 32 5,00 1 2
8 11 35 8,18 2 2
9 9 37 12,50 2 3
10 16 40 18,18 3 3
11 8 42 13,64 3 4
12 19 44 29,79 4 4
13 16 46 37,74 4 5
14 19 48 45,69 5 5
15 24 50 55,44 5 5
16 18 53 6,98 6 6
17 20 55 73,63 6 6
18 10 57 80,45 6 7
19 8 59 84,55 7 7
20 4 61 87,27 7 8
21 15 63 91,59 8 8
22 2 65 95,46 8 8
23 5 68 97,05 9 9
24 0 70 98,18 9 9
25 4 72 99,09 9 10

From this norm table, we can see that a person with a raw score* of 20
has a T-score* of 61, a percentile rank of 87,27, a stanine of 7 and a sten
of 8.
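Although the text does not prescribe a particular computational routine, columns such as the percentile rank and stanine in a table like Table 6.2 can be generated mechanically once the frequency distribution of the standardisation group is known. The Python sketch below is illustrative only: the frequency data are invented, the percentile rank uses the common mid-interval convention, and the stanine boundaries are the conventional cumulative proportions (4%, 11%, 23%, 40%, 60%, 77%, 89%, 96%).

import bisect

# Hypothetical frequency distribution: raw score -> number of people (n = 100)
freqs = {10: 4, 11: 7, 12: 12, 13: 20, 14: 25, 15: 18, 16: 9, 17: 5}
n = sum(freqs.values())

stanine_bounds = [0.04, 0.11, 0.23, 0.40, 0.60, 0.77, 0.89, 0.96]  # cumulative proportions
cumulative = 0
norm_table = {}
for score in sorted(freqs):
    f = freqs[score]
    # all lower scores plus half the people obtaining this score
    pr = (cumulative + f / 2) / n
    stanine = bisect.bisect(stanine_bounds, pr) + 1
    norm_table[score] = (round(pr * 100, 2), stanine)
    cumulative += f

print(norm_table)   # e.g. {10: (2.0, 1), 11: (7.5, 2), ...}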
6.4 Norm groups*

It should be obvious that the nature of the norm group is vital. For
example, if we wish to evaluate a school leaver to see if he has the
ability to work as a clerk, then we must compare his score with that of a
group of school leavers who have succeeded in a clerical field. If we
wish to see whether an elderly patient is suffering from Alzheimer’s
disease, we need to compare his scores on a particular measure against a
norm sample of approximately the same age. If we know there may be
gender differences, then we need to make sure that the norm sample uses
people of the same gender as the subject.

It is therefore necessary to develop a series of norms and then to select
the one that is the most appropriate for the purpose. (A useful
programme for generating norms is called the NormMaker and can be
obtained from Leaderware (sales@leaderware.com).) This raises a
problem, though. Typically, if we want to select a person for a particular
position, we need to ensure that the appropriate norm is used and to
consider all the facets. For example, if we want a good typist, it would
not be useful to select the best typist from a group of six-year-olds.
Although he may be the best typist in his group, we cannot say whether
he will be good enough for our needs.

Furthermore, consider a fictitious case where it has been decided that
stanine 5 is the minimum (or cut-off) score for selection or diagnosis.
Two groups of people have been assessed and separate norms for them
have been drawn up.

Table 6.3 Raw scores and stanines on an assessment

Stanine Group A raw score Group B raw score


1 1–11 1–9
2 12–14 10–11
3 15–16 12–13
4 17–19 14–15
5 20–21 16–17
6 22–23 18–19
7 24–25 20–22
8 26 23–24
9 27–30 25–30

Table 6.3 shows that a person scoring 18 would be below the cut-off
score (and hence not selected) if he were from group A, but would be
above the cut-off point (and thus selected) if he were from group B.
Furthermore, the person from group A would need to score at least 20 to
be selected, even though a score of 20 would put him at stanine 7 (well
above average) if he were from group B.
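A small Python sketch makes the comparison explicit. The norm tables are copied directly from Table 6.3; the code itself is illustrative only.

# Each entry: (lowest raw score, highest raw score, stanine), taken from Table 6.3
GROUP_A = [(1, 11, 1), (12, 14, 2), (15, 16, 3), (17, 19, 4), (20, 21, 5),
           (22, 23, 6), (24, 25, 7), (26, 26, 8), (27, 30, 9)]
GROUP_B = [(1, 9, 1), (10, 11, 2), (12, 13, 3), (14, 15, 4), (16, 17, 5),
           (18, 19, 6), (20, 22, 7), (23, 24, 8), (25, 30, 9)]

def stanine(raw, table):
    for low, high, s in table:
        if low <= raw <= high:
            return s
    raise ValueError("raw score outside the norm table")

cut_off = 5
for raw in (18, 20):
    a, b = stanine(raw, GROUP_A), stanine(raw, GROUP_B)
    print(raw, a, a >= cut_off, b, b >= cut_off)
# 18 -> stanine 4 (rejected) under group A norms, but stanine 6 (selected) under group B
# 20 -> stanine 5 (just selected) under group A norms, stanine 7 under group B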

We will consider this issue further when we discuss fairness in
assessment in Chapter 7. Nevertheless, we can see how a group A
candidate could feel unfairly treated if he scored higher than a person in
group B, but was judged to be below the cut-off point because of the use
of different norm tables. See Case study 6.1, page 72, for a
demonstration on how norms are used.

An important aspect to remember in this regard is that norms date as
people become better educated. For example, Flynn (1984) showed that
average performance on the kinds of item upon which IQ tests are based
rises steadily over time, so that a person scoring an IQ of, say, 100
against 1960 norms would score far lower against norms collected in
2008. This form of intelligence inflation is known as the Flynn
effect*.

6.5 Combining information and making decisions

So far it may appear as though there is only one score which needs to be
interpreted. However, in many instances, a person is assessed on several
measures as part of the triangulation process and to get as many
viewpoints as possible. In a few cases, people who have been assessed
by different instruments meet the criterion (i.e. score above the cut-off)
on all measures. In most cases this does not happen – participants
usually score above the cut-off point on some measures and below on
others. The problem then is to decide whether the person passes or fails
(is accepted or rejected) overall. Even where a person does meet all the
minimum requirements, there are many situations (such as selection)
where we are interested in choosing the best person rather than one who
meets the minimum requirements.

The question then is how to combine these different scores on different
measures to find the best person. The same question arises in a
diagnostic framework where we have to decide whether someone has
suffered brain damage as a result of an accident (as described in Case
study 6.1). So too in a school situation, where a person may fail some
examinations and pass others, and we have to decide whether he should
be promoted to the next grade or kept back.

In each of these cases, we are faced with a person who meets the criteria
on some measures and falls below the cut-off point on others. How do
we combine these results to come to a decision? Below we use a simple
example of a person who has written three different examinations and
has passed two and failed one. We also assume that the examination
results in this chapter are given as stanine scores, because they are the
easiest to understand, although the general principles apply to all forms
of scores. Also, to make it easier to understand, we assume that the cut-
off score (the pass mark) is stanine 5 – scores of stanine 5 and above
meet the criterion (they pass) and scores below stanine 5 do not meet the
criterion (they fail). Although different cutoff scores can be set, we will
work with stanine 5 in this chapter.

These hypothetical results are given in Table 6.4.

Table 6.4 Hypothetical scores

Subject Raw score Stanine


English 43/70 5
Mathematics 29/55 4
Biology 43/65 6

6.5.1 Mechanical (actuarial) versus clinical combination


The first distinction we need to make is that made by Meehl in 1954
between a mechanical (statistical or actuarial) and a clinical approach to
the combination of scores. The mechanical or adding machine approach
refers to the use of some formula in which the various scores are simply
added and the average of these taken. For instance, in the example
above, the stanine scores are simply added and the average taken: 5 + 6
+ 4 = 15, and 15 ÷ 3 = 5. This meets the cut-off of stanine 5, and the person
passes. This is straightforward and can be done by a computer. Even if
we weighted the measures slightly differently (as discussed in section
3.6.4), the arithmetic could still be done by computer. This is what is
meant by a mechanical or actuarial approach – it can be done using a
formula of some kind.

The clinical approach relies far more on the professional judgement of
the assessor. Using the set of scores in Table 6.4 again, the clinical
approach could argue that the result on measure 2 (mathematics) is
vitally important. Even though the average of the three stanine scores is
5, the person scores below the cut-off on this vital measure, and
therefore the overall decision must be a fail. Similarly, should the
average of the stanine scores be below 5, clinical judgement allows the
assessor to downplay the weak scores and admit the person. For
example, with the clinical approach we could argue that mathematics is
not very important, and that the stanine 6 in biology is more important
than the 4 in mathematics, which means that the person should pass,
even if the average is below 5. The reason for allowing this kind of
decision is because of the expertise and experience of the assessor; the
decision is based on a clinical judgement that takes other factors into
account. In other words, the assessor has the power and the expertise to
overrule the formula.

Although this approach has a certain appeal, it has a number of
problems, the most important of which is the possibility of individual
bias creeping into the decision. For example, it could be argued that the
person passed or failed because he is related to the assessor or because
of favouritism or bribery, and so forth. Even if the assessor is entirely
trustworthy, everybody can make mistakes, even clinicians (as we see in
Chapter 16 on interviewing). Finally, this clinical approach flies in the
face of our promotion of very scientific methods; if we allow decisions
to be made on the basis of the opinion or whim of some expert despite
the existence of a proven method, we invite criticism.

Meehl’s (1954) results, in which he examined 20 studies that compared statistical
and clinical methods, showed that in almost every case statistical (mechanical)
methods were as accurate and often more accurate than clinical methods of
combining scores. Subsequent research has continued to support these findings
(McIntire & Miller, 2000, p. 307).

6.5.2 Methods of combining various scores


In essence, there are six different ways of combining results.

6.5.2.1 Simple average


In this method, the various stanine scores are added together and divided
by the number of scores, i.e. (A + B + C + …) ÷ N, as discussed above. This
is shown in Table 6.5.

Table 6.5 Simple (linear) average

Subject Stanine
English 5
Mathematics 4
Biology 6
Total 15
Divide by 3
Final score 5
The final score is 5,0, which is above the cutoff score, so the person
passes.

6.5.2.2 Weighted averages


The second approach is the weighted average in which score A is
weighted by x, B is weighted by y, C is weighted by z, and so on, and
then divided by the sum of the weights, i.e. (Ax + By + Cz + …) ÷
(x + y + z + …). This is shown in Table 6.6.

Table 6.6 Weighted average

Subject Stanine Weight Weighted score


English 5 2 10
Mathematics 4 3 12
Biology 6 2,5 15
Total 37
Divide by 7,5
Final score 4,9

In this case, the final score is 4,9, which is below 5, and so the person
fails.
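The two mechanical combinations above reduce to a few lines of arithmetic. The following Python sketch (illustrative only) reproduces the figures from Tables 6.5 and 6.6.

# Stanine scores and weights taken from Tables 6.5 and 6.6
stanines = {"English": 5, "Mathematics": 4, "Biology": 6}
weights  = {"English": 2, "Mathematics": 3, "Biology": 2.5}

simple_average = sum(stanines.values()) / len(stanines)          # (5 + 4 + 6) / 3 = 5.0
weighted_average = (sum(stanines[s] * weights[s] for s in stanines)
                    / sum(weights.values()))                     # 37 / 7.5 = 4.93...

cut_off = 5
print(simple_average, simple_average >= cut_off)                 # 5.0 True  -> passes
print(round(weighted_average, 1), weighted_average >= cut_off)   # 4.9 False -> fails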

6.5.2.3 The multiple-hurdle approach


In this approach, the various measures are arranged in some order of
importance, and only those people who score above the cut-off level on
an earlier measure are considered for assessment by the later measure.
For example, in a typical selection scenario, only those people who have
a certain education level (say Grade 12) are invited for psychometric
evaluation, and only those people who score above the final cut-off level
on the psychometric measures are invited to an interview. Clearly this is
a useful way of identifying potential staff, because the most expensive
techniques (such as interviewing) are saved for later once all the
unsuitable candidates have been eliminated. In this scenario, the earlier
screening devices are relatively cheap and able to deal with larger
numbers, whereas the more expensive techniques are used later when
there are relatively few candidates left in the pool. This is illustrated in
Figure 6.2.

Figure 6.2 The multiple-hurdle approach

In this process candidates have to clear a number of hurdles, and success
at each hurdle is necessary for progress to the next. TV shows such as
The Apprentice and The Amazing Race use a similar knockout process,
with participants or candidates having to do various tasks, and where
failure results in elimination from the contest.
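A multiple-hurdle screen is easy to express as a sequence of filters. The Python sketch below is a hypothetical illustration only; the candidate data, hurdle order and cut-offs are all invented.

# Invented candidate pool
candidates = [
    {"name": "A", "grade12": True,  "test_stanine": 6, "interview": 7},
    {"name": "B", "grade12": True,  "test_stanine": 4, "interview": 8},
    {"name": "C", "grade12": False, "test_stanine": 9, "interview": 9},
]

# Hurdles in order of application: cheap screens first, expensive ones last
hurdles = [
    ("education",    lambda c: c["grade12"]),
    ("psychometric", lambda c: c["test_stanine"] >= 5),
    ("interview",    lambda c: c["interview"] >= 6),
]

pool = candidates
for label, passes in hurdles:
    pool = [c for c in pool if passes(c)]
    print(label, [c["name"] for c in pool])
# Candidate C is eliminated at the first hurdle despite having the best later
# scores, which is exactly the weakness of the approach discussed below.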

Although this is a widely used technique, it has its own problems.


Suppose, for instance, that the scenario is that of selection and that the
first hurdle is having a BCom degree. Therefore, everybody with a
BCom goes into the second round, and everybody without a BCom is
excluded from further consideration. However, suppose also that one of
the candidates has a BA (Hons), with majors in accounting and
economics. He would have the necessary skills and knowledge to do the
job, but falls short on what is essentially a technicality – the BA (Hons)
in place of a BCom. The person could prove to be superior at each of the
subsequent assessment hurdles, but is prevented from proving this
because he is eliminated at the first hurdle.

Similarly, in a clinical situation, we can imagine a case where it is
debatable whether a person needs to be admitted to a hospital or can be
treated as an outpatient. It may be that there are a series of assessments
that need to be completed before a final decision can be made. If the
results of these assessments are treated in a multiple-hurdle way, the
final decision could be very different from one using a different method
of combining the results.

6.5.2.4 Compensatory methods


The compensatory approach is similar to the averaging approach in that
we accept the presence of a below cut-off score on one measure, on
condition that the other scores are relatively high. For example, if we set
a cut-off point of 5, we can accept a score as low as 3 on one measure,
provided there is at least one score of 7 to compensate for the 3.
However, there needs to be a sub-minimum so that even if there are two
8 scores, any score below 3 is unacceptable and means that the person is
eliminated from further consideration.

This is similar to a round robin or pool system in which, despite a loss in one
round, the person or team may still win. It is quite different from a knockout or
multiple-hurdle situation where loss in one round means the player or team is
eliminated from the contest.
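There is no single formula for a compensatory rule; the Python sketch below is one possible formalisation of the example above (cut-off of 5, sub-minimum of 3). The rule that any surplus above the cut-off must offset the shortfall below it is our own illustrative assumption.

def compensatory_pass(scores, cut_off=5, sub_minimum=3):
    if min(scores) < sub_minimum:          # absolute floor: no compensation possible
        return False
    if all(s >= cut_off for s in scores):  # meets the cut-off everywhere
        return True
    # a weak score needs enough strong scores to offset it
    shortfall = sum(cut_off - s for s in scores if s < cut_off)
    surplus   = sum(s - cut_off for s in scores if s > cut_off)
    return surplus >= shortfall

print(compensatory_pass([3, 7, 5]))   # True: the 7 compensates for the 3
print(compensatory_pass([2, 8, 8]))   # False: 2 is below the sub-minimum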

6.5.2.5 Profile analysis


A fifth way of combining scores from different assessment techniques is
to plot them on a graph to yield a profile. A good example of this is the
16PF Personality Inventory, where the candidate’s scores relating to
each of the personality factors are plotted on a stanine scale. By
connecting the person’s score on each of the factors, his profile is
created, which can then be compared to another profile and the degree of
similarity of the two calculated. An example of how to use this kind of
profile analysis* is shown in Figure 6.3. Two people (candidate A and
candidate B) have been assessed to see if they are likely to succeed as
chemical engineers. The third profile is that of a typical chemical
engineer that has been built up by assessing over 500 chemical
engineers. (The actual values used in this example are fictitious and are
used merely to show how the technique is used. For the sake of
simplicity, we have shown only eight of the 19 scales.)

Figure 6.3 A typical profile analysis

Simply by visual inspection of the three profiles, we can see that
candidate A’s profile is much closer to that of the typical engineer than
that of candidate B. In terms of personality then, we can say that
candidate A’s personality is far closer to that of a typical engineer than
candidate B’s.

Of course, it is not good enough simply to inspect the various graphs;
there must be and there is a scientific method of analysing profiles. It is
done by calculating a d-value. The “d” stands for distance, and the d-
value is simply a measure of the distance between each point on the two
profiles being compared (i.e. the candidate’s score (A or B) and the
typical score.) The calculation looks very similar to the calculation for
variance and the standard deviation. (Remember that V = Σ(X – X̄)²/n and
the SD = √V.) To calculate d, we take the distance between the two
points on factor 1 (3 – 3 = 0 for candidate A and 3 – 7 = –4 for candidate B)
and square this (0 × 0 for A and 4 × 4 for B). Now we do the same for
factor 2 (6 – 7 for A and 3 – 7 for B) and square these numbers, and for
factor 3 (4 – 7 for candidate A and 5 – 7 for candidate B), again squaring
the results. We do the same for each factor: calculate the distance
between the two profiles and square it. Then we add all the squared
numbers for candidate A's profile and for candidate B's profile and
divide each total (A and B) by the number of factors (in this case 8).
We now have d². We then find the square root of this number and we
finally have d, which is very similar to the SD, or standard deviation
(which we know is the square root of the variance). The only difference
between d and the SD is that the SD is based on subtracting each case
value (X) from the mean (X̄) every time (i.e. from the same number),
whereas with d, the number from which the case value is subtracted
changes for each factor. The closer d is to 0, the more similar the two
profiles are to each other. (See Nunnally & Bernstein, 1993, pp. 599–603.
Note: these authors use “D” and not “d”. However, because a correlation
uses “r” and not “R”, we choose the lower case “d”.)
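The d-value calculation can be written out in a few lines. In the Python sketch below, the first three factor scores follow the worked figures above (typical profile = 3, 7, 7; candidate A = 3, 6, 4; candidate B = 7, 3, 5); the remaining five values are invented simply to complete an eight-factor profile.

# Illustrative eight-factor profiles (only the first three factors follow the text)
typical     = [3, 7, 7, 5, 6, 4, 5, 6]
candidate_a = [3, 6, 4, 5, 5, 4, 6, 6]
candidate_b = [7, 3, 5, 2, 8, 7, 2, 3]

def d_value(profile, benchmark):
    # mean of the squared distances between the two profiles, then the square root
    squared = [(p - b) ** 2 for p, b in zip(profile, benchmark)]
    return (sum(squared) / len(squared)) ** 0.5

print(round(d_value(candidate_a, typical), 2))  # smaller d: closer to the typical profile
print(round(d_value(candidate_b, typical), 2))  # larger d: less similar to the typical profile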

Sidebar 6.1 Profile analysis in practice


A practical application of profile analysis is in the selection of various managers
and other senior staff. Using the job description*, ideal score ranges on each of
the factors are drawn up before the selection process (e.g. 5–7 on factor 1, 3–6
on factor 2, etc.). In addition, those factors that are considered essential (or
“mission critical” to use some jargon) are also identified. Candidates are then
assessed on the measure, and their profiles compared to the ideal. Where scores
fall outside the ideal range on any factor, especially the mission critical factors,
they (these outliers*) are explored during the interview stage.

6.5.2.6 Balanced scorecard


This term is quite common in organisational psychology and can easily
be used in any areas in which measurement is important. The idea
behind the balanced scorecard (BSC) stems from the fact that those
aspects that are measured are the ones that receive attention (“what gets
measured gets done”), and if we concentrate on measuring only one or
two attributes, changes can occur in these, but often at the expense of
other parts of the system. The BSC aims to avoid such a one-sided
approach. We may use the example of a motor vehicle’s various facets.
If we concentrate only on driving to our destination as quickly as
possible (i.e. we focus on speed), we may find that we use far more fuel
and may behave irresponsibly. BSC advocates a broader approach –
moderate speed, moderate fuel consumption and responsible driving
behaviour.

The balanced scorecard approach argues that we need to examine performance
on many aspects in order to make sure that an improvement in one area is not at
the expense of poorer performance in other areas. In a school situation, we would
want to show that an improvement in the marks in one area (say mathematics)
does not occur at the expense of other marks, because the student has spent all
his time and effort in improving his mathematics score and so has neglected the
other subject areas.

6.5.2.7 Decision-making matrix


The final issue we need to consider when looking at combining and
interpreting assessment scores is the impact of raising and lowering the
cut-off point. In our various examples in this chapter we set a cut-off
point at stanine 5. Now we must consider what happens if we raise the
cut-off to stanine 6 or 7, or if we drop the cut-off score to stanine 4 or
even 3. (Please note that it does not matter whether the cut-off score is
based on a single assessment score or some combination of scores as
discussed above. For simplicity, we will use a single score, although
how this is determined is immaterial.)

Imagine that we assessed a group of people on one set of measures (the
predictor), and then assessed their subsequent performance on some
other task (the criterion). This occurs in an organisational environment
when, for example, we have given a person a typing assessment as part
of the selection process for a typist. Our concern, however, is one of
validity – how well did my assessment predict later performance? The
same consideration arises in other contexts: How well did my
assessment of the person successfully predict that he would stop
smoking? Or how well did my assessment of the driver’s safety
behaviour predict his later accident rate?

In each case we can plot the predictor scores against the criterion scores.
In this process we will get the typical oval shape of the distribution
score. When we correlate the predictor and criterion scores, we get a
value between –1,0 and +1,0. The closer the correlation is to 1, the
flatter the oval and the better the validity will be. The nearer the
correlation is to 0, the rounder the oval and the lower the validity will
be. This is shown in Figure 6.4.

Figure 6.4 Score distributions associated with various correlations

Now let us take Figure 6.4b and work on it further by drawing a vertical
line at some cutoff point between who passes and who fails on the
predictor (a score of 5 in this case). Similarly, we can draw a horizontal
line separating people who succeed or fail on some performance
criterion. We get four areas, or quadrants, which are labelled A, B, C
and D. This is the decision-making matrix shown in Figure 6.5. All
cases to the right of the vertical line (quadrants B and D) are predicted to
pass and are accepted. These are labelled positives. The cases to the left
of the line are predicted to fail and are rejected. These are labelled
negatives. Those cases above the horizontal line (quadrants B and C)
perform well (succeed at the task), whereas those below the line
(quadrants A and D) do not perform well (fail).

Figure 6.5 The decision-making matrix

If we examine the decision-making matrix in Figure 6.5, and look
specifically at the vertical line extending upwards from 5, we can see
that this is the cut-off point, with all cases to the right of the line
(quadrants B and D) predicted to pass and thus labelled positives. The
cases to the left of this line are predicted to fail and are labelled
negatives. If we look at the horizontal line (extending across the graph
from the middle of the word “Performance”), we can see that those cases
above the line (quadrants C and B) perform well (they succeed at the
task), whereas those below the line (quadrants A and D) do not perform
well and are labelled as fail.

Therefore, we have four groups of people:


The As, who are predicted to fail and do fail (true negatives)
The Bs, who are predicted to succeed and do succeed (true positives)
The Cs, who are predicted to fail but succeed (false negatives*)
The Ds who are predicted to succeed but fail (false positives*)

Several things follow from this analysis.

Firstly, we see that as the oval gets thinner (and begins to look like a
straight line – a perfect positive correlation), the size of quadrants C and
D shrink. In other words, as the measure increases in validity, the risk of
making false predictions decreases. (See Figure 6.4a.)

Secondly, we can also see that if the cut-off point increases (the vertical
line moves to the right), the D quadrant shrinks and will eventually
disappear altogether. In other words, by raising the cut-off score, the
chance of identifying false positives decreases. However, as D is
shrinking, the C quadrant is expanding. Therefore, although raising the
cut-off point decreases the number of false positives, it increases the
number of false negatives.

Similarly, if we reduce the cut-off point (i.e. move the vertical line left),
the size of quadrant C shrinks and can disappear completely. In other
words, lowering the cut-off point ensures that everybody who has a
chance of succeeding is given that chance – there are no false negatives.
However, the cost of this tactic is to increase the size of the D quadrant,
the false positives. In other words, by lowering the cut-off point, more
people succeed, but more people also fail. The exclusion of positive
cases is known as a Type 1 error*, while the inclusion of false cases is
known as a Type 2 error*.

Depending on the nature of the decision that has to be made, either a
Type 1 or a Type 2 error may be the more acceptable one. We all know the saying that it is
better for nine guilty people to be found not guilty in a court than for one
innocent person to be found guilty. This is a Type 1 error. Similarly, if
we are developing a cure for a disease such as cancer or HIV/AIDS, it is
better to falsely reject a possible cure than it is to accept a cure that has
negative side effects. We are still painfully aware of what happened in
the 1960s when thalidomide was given to pregnant women – it caused
their children to be born with deformed flipper-like limbs instead of
arms or legs because blood had been unable to get to the developing
limbs. This was a Type 2 error – a harmful treatment that should have been rejected was accepted.

However, in the case of socially desirable outcomes such as the
development of previously disadvantaged people, it may be far more
important to tolerate a Type 2 error. We acknowledge that as a result of
past injustices, some groups may be more at risk of failing in
educational or work contexts than their previously advantaged
colleagues. In this case, a Type 2 error gives all people with a chance of
succeeding the opportunity to prove themselves. However, as part of this
strategy, the management of educational institutions or workplaces
needs to take steps to minimise the potential for failure. Managers must
find ways to decrease the size of the D quadrant via affirmative action*
and other steps that will enhance the chances of success.

Another possible strategy to ensure that the number of false negatives is
reduced is to lower the horizontal bar and to make it so easy to pass that
everybody passes, irrespective of his contribution. This strategy is
rejected completely as a solution. If standards are reduced so that
everybody, irrespective of ability, can succeed, the system becomes
meaningless. “Pass one, pass all” is a recipe for organisational and
national suicide.

Using the four categories of true negatives, true positives, false positives
and false negatives enables us to derive a number of different indices.
For example, if we take the number of positives (B + D) and divide it by
the total number of cases (A + B + C + D), we get the selection ratio*.
If we have a low selection ratio (i.e. if we have to choose relatively few
people from a large pool), we can raise the cut-off to get the very best
candidates. If the selection ratio is relatively high (i.e. we have to choose
a relatively large number from a relatively small pool), we can lower our
cut-off point to expand our pool to include as many people as possible.

If we look at the number of people who succeed (B + C) as a proportion
of the whole sample, we get the overall success rate. This is termed the
base rate*.

If we look at the true positives (B) as a proportion of all positives (B +
D), we get a measure of our detection rate* or the sensitivity of the
measure. This is sometimes termed the hit rate or the success rate.
Whatever system we use for assessing people, the success rate must be
higher than the base rate – and significantly so – otherwise we may as
well not assess. In addition, the financial gains that result from an
improved selection process must be greater than the costs of making the
improvement – there is no point spending R1000 to save R100.
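The indices just described follow directly from the counts in the four quadrants. The Python sketch below uses invented quadrant counts purely to illustrate the arithmetic, following the definitions given above.

# Invented quadrant counts (A..D as labelled in Figure 6.5)
A = 40   # true negatives: predicted to fail, did fail
B = 35   # true positives: predicted to succeed, did succeed
C = 10   # false negatives: predicted to fail, but succeeded
D = 15   # false positives: predicted to succeed, but failed
total = A + B + C + D

selection_ratio = (B + D) / total   # proportion of the pool that is accepted
base_rate       = (B + C) / total   # proportion who succeed in the whole sample
detection_rate  = B / (B + D)       # proportion of those accepted who succeed

print(selection_ratio, base_rate, detection_rate)  # 0.5 0.45 0.7
# Selection adds value here because the detection rate (0.7) is well above
# the base rate (0.45).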

Kaplan and Saccuzzo (2013, pp. 519–523) have a similar discussion of
this matrix in terms of hits and misses.

When we examine issues of fairness in Chapter 7, we will see how the
different fairness models link to these ratios.

6.6 Comparing results from different tests

One of the problems that often confronts people who have to make
decisions based on assessment scores is the fact that the candidates
sometimes write slightly different versions of the same test, with one
version (X) being more difficult than another (Y). As a result, there must
be ways to equate the results of the two versions of the test. The simplest
method is to use different norm tables and work out where the
candidates score in terms of these separate norms. For example, person
A may be at stanine 6 on test Y and person B at stanine 7 of test X, even
though the raw scores suggest that person B scored lower than person A,
simply because B was assessed on the more difficult version of the test.
This issue is discussed in Chapter 4 (section 4.3.2) where we look at
how to establish the equivalence of parallel forms of a test.
6.7 Summary

This chapter looked at different ways of combining information from
different sources. These include the simple addition of scores (a linear
approach) and the different weighting of scores before they are
combined.

We distinguished between what Meehl terms the mechanical and clinical
combination of data, and saw that while the clinical approach has some
merit, it is actually a step backwards and can invite claims of nepotism
and corruption.

We examined notions of the multiple-hurdle (or knockout) system and
compared this with a compensatory or round robin approach. We looked
at profile analysis and how the d-statistic is used to determine the degree
of similarity between two profiles. This d-statistic is very similar to the
better known SD (standard deviation) statistic.

We then considered the notion of the balanced scorecard, where it was
argued that a good score on one measure may be less than optimal if
scores on other measures suffer. We concluded that it may be better to
have a modest improvement on a range of assessment outcomes rather
than a spectacular improvement on one at the expense of deterioration
on a number of other measures.

Finally, we looked at the decision-making matrix, in which four
categories of outcomes (true negatives, true positives, false positives and
false negatives) were identified. We saw how raising and lowering the
cut-off points may affect the relative size of each of these categories and
the effect on Type 1 and Type 2 errors. In closing we saw that
management needs to manage the false positive group, especially in an
affirmative action scenario, and that lowering standards is definitely not
an option in addressing the issues involved.

Case study 6.1


The bus accident victim
Imagine that you have been called as an expert witness in a court case in which
a 19-year-old male, Sipho, is claiming for damages incurred in a bus accident. As
a result of this accident, which occurred when he was travelling from
Johannesburg to his home in Mpumalanga, he lost his dominant arm just above
the elbow, and was unconscious for three days. Subsequently, his social and
personal life have deteriorated, and he has displayed frequent bouts of
aggression (both verbal and physical). He is also reported to have become
sexually promiscuous. He has attempted several times to complete his high
school education, but has failed Grade 10 three times, once before the accident
and twice subsequently.
Sipho’s legal representative is suing the bus company for several million rand,
claiming loss of amenities (i.e. his dominant arm) and brain damage, and
claiming compensation for both pain and suffering, and for future loss of earnings
that have resulted from his reduced capacity, both mental and physical.
What can you as a psychologist do to help the court arrive at a fair decision?
What would you tell the human resources manager at his place of employment?
What recommendations could you make regarding the management of his
behaviour? What would you tell his family?
The question that has to be answered is whether Sipho’s injuries are
permanent, and what impact they are likely to have on the young man’s
future prospects at work and in life in general.

Sipho’s cognitive functioning


Sipho’s pre-accident school results point to a generally low intellectual ability, as
do those of his siblings, suggesting it is unlikely that he would have been in a
position to complete his secondary education.
A priori, it would appear that the following would be possible areas of
employment for someone in Sipho’s position, provided his employer knows the
nature and extent of his disabilities and reasonable accommodation* is made in
this regard:

1. Machine operator (providing he has sufficient levels of vigilance and
alertness)
2. Entry-level clerical positions (e.g. stores receiving clerk, document
reconciliation, seat/vehicle booking)
3. Sales/cashier positions where simple transactions are recorded
4. Messenger
5. Control activities (e.g. announcement of train/bus arrivals and departures,
access control/security via motorised boom/gate operation)
6. Community development instructor (e.g. HIV/AIDS, nutrition, health, etc.)
7. Agri-nursery (seedling care, weeding, watering, etc.)
8. Arts and crafts (cloth painting, etc.)
This list is not exhaustive and is given merely as an indication. Clearly the
availability of jobs in the area in which he lives is a major limiting factor.

Psychometric assessment
In order to assess his capabilities and general suitability for the types of job
outlined above, Sipho was given a number of psychometric tests to assess his
cognitive and psychological functioning in line with the requirements of such
positions. These are described below.

The Ravens Standard Progressive Matrices


On this measure Sipho scored 12 (out of a maximum of 60), which is a very low
score (stanine 1) considering his claimed education status, indicating that his
general cognitive functioning is of a low level. The norms used were based on a
wide range of railway workers and were obtained in Gauteng in 1997 (n = 613).
This finding is in agreement with those obtained on a previous occasion by
another psychologist using a verbally based general reasoning test (the
Intermediate Level Mental Alertness Test), where he also scored at the very poor
stanine 1 level.
The first 3 stanines of these norms are given below.

Score Stanine
0–7 1
8–12 2
13–19 3
Cancellation task
This is a measure of perceptual speed, which is the ability to recognise simple
patterns quickly (e.g. 7) in a string of numbers (e.g. 5 3 7 8 9 5 8 4 7 3 2 3). This
is a general indicator of cognitive functioning and a crucial ability for clerical tasks
where specific information needs to be identified (see Appendix 1). On this test,
Sipho scored at stanine 5 level. Appropriate norms based on railway clerks and
obtained in 1993 were used. This suggests that he has a fair ability to identify
simple patterns or figures in a complex matrix and would be able to carry out
simple vigilance tasks.

Spot the error


This is a clerical test involving the ability to compare material across two lists and
to identify discrepancies as in a proofreading situation. The ability to compare
visual material in this fashion is a crucial skill for clerical tasks. On this test, Sipho
scored 19 in the allotted ten minutes for the exercise, which is at the very poor
level of stanine 1. This is slightly higher than the 16 achieved when this test was
given on a previous occasion by another psychologist. The slight improvement
probably represents a learning factor. However, he scored 100 per cent correct,
which represents a score well above the average (stanine 8). Although he was
very slow, he was extremely accurate. This makes one believe that he would be
able to cope in a low-level clerical position as indicated above.

Continuous adding
In order to test his endurance and perseverance in the cognitive domain, he was
given a simple adding task in which his performance was monitored over an
extended period (a so-called Pauli test). This test involves a simple adding task in
which a series of single digits are added over an extended time period (e.g. 4 + 3,
3 + 6, 6 + 2, 2 + 9, 9 + 5, etc.). The test taker’s performance is monitored over
the full period, with a line being drawn every minute to indicate the progress
made to that point. Any fall-off or deterioration in concentration and accuracy is
thus easily identified in terms of both the number of additions completed during
the period and the accuracy of the addition during the period.
On this test, which lasted for 20 minutes, Sipho worked at a steady rate and
managed to complete between 13 and 18 additions per minute, with no apparent
deterioration in performance. It must be pointed out that after the sixth minute, he
complained that his hand was getting tired. At this point, the tester took over the
task of writing down his answers to the additions. In this way, the fall-off of
performance that would have been associated with muscular fatigue resulting
from his using his non-dominant hand was avoided. The maintenance of his
concentration and steady information processing over the full 20-minute period
does not suggest any major impairment of cognitive faculties.

Numeric calculations
A test involving simple arithmetic operations was administered, as these are
generally required for many clerical and cashier-type jobs. On the test
administered, Sipho attempted only 17 of the 30 items, of which only four were
correct. This is a very low score (stanine 1), based on a norm group of railway
clerks obtained in 1993. These results are not surprising in the light of his poor
pre-accident scholastic record. Clearly, he is unable to carry out arithmetic
calculations at any advanced levels. However, this finding does not suggest any
significant measurable deterioration of his cognitive functioning as a direct result
of the accident.

Additional assessments
In addition to these cognitive tests, it was decided to examine him further to see
whether his cognitive functioning could have been influenced by a poor self-
image and the presence of post-traumatic stress symptoms, and if so, to what
extent. To this end, two further scales were administered, namely a Cognitive
Distortion Scale, and a Trauma Symptom Inventory.

Cognitive Distortion Scale (CDS)


The CDS is a short multidimensional scale designed to tap dysfunctional
concepts the person has about himself. It measures negative thinking or
cognitive distortions in five distinct areas, namely self-criticism (low self-esteem),
self-blame for uncontrollable events, helplessness, hopelessness and pre-
occupation with danger. On each of these subscales, Sipho scored at the very
highest levels, obtaining T-scores of 87, 95, 100, 97 and 100 respectively. (A T-
score is based on the normal distribution with a mean of 50 and a standard
deviation of 10. A T-score of 100 represents a massive 5 standard deviations
above the norm.) These are American norms, as the measure has not been
validated or normed in South Africa. There are no equivalent measures available
locally.
According to the manual of this scale, any T-score above 70 (i.e. 2 standard
deviations above the mean) should be considered clinically significant, so that the
scores obtained by Sipho are extreme. However, as the manual of the CDS
points out, each of the scales is relatively transparent, so that they are easy to
fake. In fact, the manual goes on to caution that the CDS

will be most helpful when assessing individuals who are not likely to
misrepresent themselves for primary or secondary gain. In forensic settings or
instances where symptom misrepresentation is a significant possibility, the
CDS should be co-administered with at least one test that has validity scales.

“Validity scales” refers to some kind of in-built accuracy test or lie-detector scale.

Trauma Symptom Inventory


To assess the extent of Sipho’s trauma, the Trauma Symptom Inventory (TSI)
was administered. The TSI is a measure of the presence and extent of any post-
traumatic stress symptoms. It consists of ten clinical subscales and has three
independent consistency and validity measures built into it. These are:

1. Atypical responses (very unusual responses such as losing one’s sense of
taste)
2. Unusual response levels (whether the person scores 0 on items that very
seldom give 0, e.g. wishing one had more money)
3. Inconsistency (where an item is presented twice in the scale, and
differences and inconsistencies between the two responses are identified)

On each of the ten clinical subscales of the TSI, Sipho scored in the clinically
significant range above T-score 70. Most of his scores were closer to 90 and 100,
which are extreme scores. This suggests that he is suffering from high levels of
post-traumatic stress and associated levels of depression. However, it is
important to note that he obtained a T-score of 100 on the Atypical response
validity scale and 70 on the Inconsistency scale. These are American norms, as
the measure has not been validated or normed in South Africa. There are no
equivalent measures available locally.
These high scores, especially the extreme Atypical response score, suggest that
his responses on both the TSI and the CDS, as well as the poor scores on the
other scales administered to him, may well be deliberate distortions. This could
reflect an attempt to mislead or to overstate his symptoms. This then brings the
validity of all previous tests into question.
1. Based on these results, do you think Sipho has suffered irreversible
brain damage?
2. To what extent do you think the accident has impaired his
educational prospects?
3. To what extent do you think the accident has impaired his
occupational prospects?
4. From the results on the CDS and TSI, do you think Sipho was
malingering and trying to fake his condition in order to get a good
insurance payout?

Give evidence from the test results to support your answers to these four
questions.

Additional reading

For a useful discussion of norms and the norm-based interpretation of assessment
scores, see Foxcroft, C. & Roodt, G. (Eds). (2009). An introduction to psychological
assessment in the South African context (3rd ed.), especially Chapter 3, pp. 38–43; and
Cohen, R.J. & Swerdlik, M.E. (2002). Psychological testing and assessment: An
introduction to tests and measurement, especially pp. 100–112.
For a brief discussion of the decision-making matrix, see Nunnally, J.C. & Bernstein,
I.H. (1993). Psychometric theory, pp. 370–372.

Test your understanding

Short paragraphs

1. Discuss the five approaches to interpreting an assessment score.


2. What is meant by a norm, and why is it important to specify the norm group one is
using?
3. Briefly discuss the six ways of combining scores to arrive at a decision about a
person.

Essay

Using the decision-making matrix, show the effects of raising or lowering the predictor
cut-off score on the four categories of outcomes (true positives, true negatives, false
positives and false negatives).
7 Fairness in assessment

OBJECTIVES

By the end of this chapter, you should be able to

define what is meant by fairness in assessment


describe the concept of psychic unity and see how this relates to assessment
outline the distinction between emic and etic approaches to assessment
distinguish between fairness, bias and discrimination
show what steps can be taken to minimise unfairness in selection.

7.1 Introduction

A useful place to begin a discussion on fairness is with the argument that
the purpose of any assessment process is to assess how much of a
particular characteristic or attribute a person or a group of people has,
and then by extension to distinguish between those people who have the
characteristic and those who do not. Assessment, in this sense, is used to
discriminate between people. Problems arise, however, when the
information that is obtained is used to discriminate against certain
people. An assessment technique can be said to be fair when it identifies
correctly the presence of and extent to which a person or group
possesses a particular attribute.

In 2005, the Society for Industrial and Organisational Psychology of
South Africa (SIOPSA) produced a document entitled Guidelines for the
validation and use of assessment procedures for the workplace, in which
various issues, including fairness, are discussed. According to this
document, there are four different meanings of fairness.
Firstly, there is the view that fairness means equal outcomes for all
groups (equal pass rates for the various groups being assessed). This
definition is rejected by all professionals involved in assessment.
The second meaning involves the equitable treatment of all groups in
terms of access to materials, conditions during the assessment
process, time limits, and so on. This includes the notion of reasonable
accommodation, which means adjusting conditions to meet the needs
of physically handicapped people.
The third meaning of fairness is that participants have an equal
opportunity to experience and learn from situations that may have an
impact on later assessment. This is most obvious in the educational
setting, where people who have not been taught certain content cannot
be expected to do as well on an assessment as those who have been
taught the material. Schmidt (1984) refers to this as “pre-market
discrimination” (see section 7.1.3).
The fourth meaning of fairness is the absence of predictive bias. This
means that an assessment is fair if it has the same ability to predict
future behaviour irrespective of group membership. The regression
models and other approaches to ensuring fairness discussed below are
all based on this definition of fairness.

Sidebar 7.1 Equitable treatment


We must be careful with the term “equitable treatment”; it is not the same as
“equal treatment”. Equitable means “fair”, not “the same”, and, as noted by the
famous American judge Harry Blackmun, who gave the American Affirmative Action
drive a solid boost when he passed judgement in the famous Bakke case, “to
treat people fairly, you may have to treat them differently”. Look at Case study 6.1
on page 72, especially the section on continuous adding, and you will see that the
rules for fairness (equitable treatment) meant that the rules of equal treatment
had to be bent. Because of Sipho’s injury, it was fairer to treat him differently than
it was to treat him the same. This is a clear example of reasonable
accommodation.

An assessment technique can be said to be unfair when it gives rise to
decisions that are not warranted, and are based on the erroneous
measurement or interpretation of the presence (or strength) of an
attribute. This is the view of the US Educational Testing Service when
they argue that “[f]airness requires that construct-irrelevant personal
characteristics of test takers have no appreciable effect on test results or
their interpretation” (Educational Testing Service, 2000, p. 17).

There are numerous sources of error in measurement that give rise to
incorrect or inaccurate results. However, at this point we should remind
ourselves that there are basically two forms of error that occur in
measurement: random or unsystematic error, and systematic error or bias.

Random or unsystematic error results from fluctuations in both the
internal and external conditions during the assessment process. Because
these are random fluctuations, they tend to cancel each other out when
the measurement is repeated. Systematic error does not balance itself out over
time. Because it is systematic, it will always be reflected in the score. This
error is also termed bias. If we know the extent of the bias, we can use a
biased assessment technique fairly, provided that we recognise and
compensate for the bias. It is like having a clock that is ten minutes
slow. If we know this, we can still give an accurate time check by
simply adding ten minutes onto the time given by the clock. These
issues of error and bias are addressed in depth in Chapters 4 and 5.

This argument points to an important distinction between bias and
fairness. Bias is a technical term and relates to the amount of systematic
error contained in the assessment score. Fairness, on the other hand,
refers to the nature of the decisions made on the basis of the
measurement. If my clock is fast and I penalise someone for being late
(i.e. arriving ten minutes after my clock indicated the meeting time),
then I am being unfair.

7.1.1 Definition of fairness


Fairness can thus be defined as the lack of random error or systematic
bias in the assessment technique and/or the interpretation thereof. A
technique is fair when it treats people with similar attributes equally and
when it distinguishes between those people who have different
attributes. Fairness is a special case of validity generalisation and is
increasingly being applied in relation to classes of people.

7.1.2 Fairness, bias and discrimination


As we have seen, tests and other forms of assessment and evaluation are
designed to measure differences between people, because for some
purposes, differences rather than similarities are important. In a
multicultural country like ours we may even need to take differences
into account in order to be fair. This is exactly what affirmative action
and the Employment Equity Act are all about: we need to treat people
differently to address and rectify the effects of previous disadvantage.
This raises the issues of what is meant by fairness and fairness to whom.
As psychologists, we also need to see what we should do to manage the
differences that exist to the benefit of all stakeholders.

Our first question is: Who are these stakeholders?

If we take a close look at the various parties affected by the outcomes of
various assessments, we can identify at least six distinct groups.

7.1.2.1 Individuals
In the case of educational and clinical assessments, an accurate
assessment of a person’s psychological functioning may be crucial in
diagnosing problem areas and suggesting possible interventions and/or
treatments. In the case of employees or potential employees, a fair
assessment is crucial as it may affect employment or promotion
opportunities.

7.1.2.2 Groups of people


Both previously disadvantaged groups and those who are seen as having
been advantaged in the past are important stakeholders. Both groups
need to be fairly treated.

In this regard, we must recognise that fairness is a relative concept and that it
may be impossible to be fair to both the previously advantaged and the previously
disadvantaged groups at the same time. To be fair to the previously
disadvantaged, we may have to select people with lower assessment scores
ahead of those with higher ones. This is unfair to the latter. However, to ignore
past injustices by choosing the people with the higher scores will be unfair to
those who were previously discriminated against.

7.1.2.3 Families, communities and society


Assessments of various kinds affect not only the people directly
involved in the assessment process, but also their families and
communities, and the wider society in which they live. An incorrect
diagnosis may result in a reasonably competent or sane person being
institutionalised or made to undergo some form of treatment. On the
other hand, misdiagnosis of a person may result in his not receiving
treatment, and thus causing accidents or even committing crimes and
other forms of socially unacceptable behaviour.

7.1.2.4 Organisations
Business organisations exist to make money, and if people are
erroneously appointed or not appointed because of problems with the
assessment process, businesses stand to lose money and other
opportunities. Such errors can also result in increased levels of labour
unrest.

7.1.2.5 Employees or managers


In much the same way as organisations, both managers and employees
stand to lose if assessments that they make (or are made about them) are
incorrect.

7.1.2.6 The state


A final stakeholder is the state. The incorrect placement of people in the
civil service and the broader community as a result of poor assessment is
likely to increase the number of legal claims, the loss of income tax
revenue, and the like.

Clearly, accidents that occur as a result of the incorrect placement of
people can have important consequences for all the stakeholders
identified above.

With so many stakeholders, it follows that there may be conflicting
interests, and that being fair to one set of stakeholders may result in
being unfair to another group. In Chapter 18 (section 18.2.11), the need
for restorative justice* is briefly discussed. In this case, it can be seen
that the emphasis falls on the individual, perhaps at the cost of the
organisation and other stakeholders.

7.1.3 Discrimination
Discrimination means treating some people differently from others.
Although this is generally seen in terms of group membership of some
kind (e.g. language, gender, class, race, religion, etc.), this is not
necessarily the case. Some parents favour one child and discriminate
against another. The word “discriminate” means “to choose between”
and does not always have negative connotations. For example, when we
shop, we discriminate between brands of toothpaste, choosing one rather
than another. However, when our choice is based on factors not related
to the task involved, we discriminate against the object or person. When
a lecturer or human resources practitioner sets a test or examination, he
discriminates (distinguishes) between people (by taking such aspects as
knowledge or job-relevant criteria into account). However, if the lecturer
or human resources practitioner were to take non-job-relevant criteria
(such as age, gender, ethnicity) into account, he would be discriminating
against the persons concerned.

Is it wrong to appoint someone on the basis of his religion or other class membership?

In general terms, the answer is “Yes”. However, in some cases class membership is part of the job; it is what South African law terms an
inherent requirement of the job*. (The International Labour
Organisation (ILO) terms it BFOQ – “a bona fide occupational
qualification”.) For example, it is acceptable to insist that the director of
a Jewish old-age home be Jewish, or that the principal of a Catholic
school be a Catholic – it is a requirement of the job. Discrimination is also acceptable for reasons of social redress, which is aimed at undoing the effects of past discrimination: this is stipulated in the South African Constitution and the Employment Equity Act.

In looking at fairness and the meaning of discrimination, Schmidt (1984) makes a useful distinction between three forms of discrimination.

7.1.3.1 Disparate treatment


Individuals or groups are treated differently. For example, married men
get a housing subsidy, whereas unmarried people and married women do
not.

7.1.3.2 Adverse impact


A test or other technique has adverse impact* if there is a discrepancy
in outcomes between a reference group (such as white males) and a
focal group (e.g. females, Africans). Another example of adverse impact would be a test for sewing machine operators that was based on language or had a strong language component: since a person does not require these language skills to be a sewing machine operator, the test is likely to have a negative impact on those potentially competent people whose language skills are underdeveloped. Likewise, if a
selection technique is based on height, women on average would be
impacted negatively, because, on average, women are shorter than men.
Adverse impact, however, does not necessarily mean that the techniques
are biased; the results may reflect real differences. Real differences do
exist and cannot be denied or ignored. For example, to find no
differences in the average weight of men and women means that the
scale is wrong, not that there are no differences.

In a landmark case in the US, it was argued that fire-fighters had to be a certain
height to allow them to work effectively with the fire-fighting equipment. Women’s
groups complained, arguing that this then excluded women from being fire-
fighters by reason of adverse impact. They won the case and this selection
criterion was dropped.
7.1.3.3 Pre-market discrimination
If some people are prevented from gaining the required skills or
experience before they get into the market, for example if mathematics
is required for many jobs, and girls are not taught this at school, then
they will be victims of pre-market discrimination*. One of the major
impacts of the apartheid system was the systematic denial of quality
educational opportunities to black children (and people) throughout the
country, the impact of which is still being felt today. In this regard,
Theron (2007, p. 183) shows that

valid selection procedures used in a fair and non-discriminatory manner that optimises utility very often results in adverse impact against members of “designated” or protected groups. (In South Africa, the term “designated groups” refers to people who were previously disadvantaged in South Africa, namely Blacks (Africans, Coloureds and Indians), women and people with disabilities (Employment Equity Act 55 of 1998).)

As Milner, Donald and Thatcher (2013, p. 501) note, “[e]ven assessments that are valid and reliable in determining success in the
organisation based on success in the past may only perpetuate unjust
organisational practices, even if assessment is applied in a fair manner”.

In looking at the various definitions of fairness at the beginning of this chapter, we see that one important aspect involves the equitable
treatment of all groups in terms of access to materials, conditions during
the assessment process, time limits, etc. This includes the notion of
reasonable accommodation, which means adjusting the assessment
process to meet the needs of people who may be physically
disadvantaged in various ways. In an interesting article on this topic,
Lanchbury and Kearns (2000) (cited in Vermeulen, 2000) make the
suggestion that accommodations can be ranked on the basis of their
impact on the assessment process. They suggest the following:

Level 1 accommodations would have little or no appreciable impact (e.g. using a larger font for visually impaired people).
Level 2 accommodations would possibly have an impact (e.g. extending the time limits for speeded tests).
Level 3 accommodations would have the most impact (e.g. omitting some tests from a test battery).

In Appendix C of the International Guidelines for Test Use Version 2000 (Bartram, 2000), a number of issues with respect to reasonable accommodation are raised, of which the most important are the following:

Is the disability likely to have an effect on test performance? Many people have disabilities that ought not to affect test performance. In such cases, no accommodations should be made.
If the disability is likely to affect test performance, then the question
is raised as to whether the effect on performance is incidental to the
construct being measured. For example, a person with a damaged or
missing hand may have trouble with a speed test that involves writing.
If the ability to perform manual tasks rapidly is part of the construct
being measured, then the test conditions should not be changed.
However, when the particular disability is not related to the construct
being measured, but is likely to affect the individual’s performance on
the test, then modification of the procedure may be considered. For
example, if the purpose of the test is to assess visual checking speed,
then the person’s ability to write fast is not related to the task and an
alternative way of response would be appropriate. In the case study at
the end of Chapter 6, this issue is raised in the case of an accident
victim who lost the use of his dominant hand.

The Guidelines suggest that when it is felt that reasonable accommodation of some kind is necessary, the test administrator should
always consult the test manual (and the publisher, if necessary) for
guidance on modification and for information regarding alternative
formats and procedures. They also suggest that any modifications made
to the test or test administration procedures should be carefully
documented along with the rationale behind the modification when
submitting the test results and/or an interpretation thereof.

In deciding to change the testing conditions to meet the requirements of reasonable accommodation, the psychologist needs to weigh up the risks
associated with varying the assessment process against the importance
of the decisions being made – the greater the deviation, the more likely
that the validity of the assessment will be compromised. Once again, the
need to balance the interests of the person being assessed and the wider
community is highlighted.

7.2 The assumption of psychic unity in relation to psychological assessment

Our discussion brings us to the notion of the psychic unity* of man, an issue that we must now attend to. We may wonder what this term means.
Much of classical anthropology rests on the notion of the psychic unity
of man. It is a view that argues that all people are inherently similar, and
that because the brain is hardwired, any differences in belief, value or
behaviour are the result of social and cultural differences, or of problems
with the assessment techniques. In this respect, the Employment Equity
Act seems to assume a stance of psychic unity when it states that
psychometric testing is prohibited unless and until the tests have been
shown to be valid and non-discriminatory against members of
previously disadvantaged groups. This suggests that if differences
between the dominant group and other groups are identified, the first
thing we need to do is prove that the tests are equally valid and reliable
for the different groups. Only then can we argue that the results are real
and not a result of the assessment techniques being biased and
discriminatory.

In the field of assessment, especially assessment across gender and sociocultural categories, we are constantly faced with the issue of
whether differences in the assessment score reflect real differences or
whether these differences are associated with the measuring technique.
In other words, are any group differences on various psychological
dimensions real or are they simply artefacts that arise because the
assessment processes measure things differently in the different groups?
To state it simply, if we find group differences in ability or personality
structure, are these differences real (i.e. can we assume that the
assessment techniques we use are correct, and that scores really reflect
differences?) or do we argue that there is genuine psychic unity and the
differences in assessment reflect weaknesses in the assessment
instruments (i.e. they are biased)?

Section 8 of the Employment Equity Act (1998), as amended by Act 47 of 2013, states that psychological testing and other forms of assessment of an employee are prohibited unless the test or assessment that is being used

a) has been scientifically shown to be valid and reliable
b) can be applied fairly to all employees
c) is not biased against any employee or group
d) has been certified by the Health Professions Council of South Africa … or any other body which may be authorised by law to certify those tests or assessments.

The amendment also allows employees to refer any dispute in this regard to the CCMA for arbitration (Section 10, paras (a), (b) and (c)).

While the concept of psychic unity may hold true at the physiological level (if we ignore racial characteristics such as colour, hair type, etc.), it does not seem to hold at the level of psychological characteristics.

The idea of a shared hardware is what is commonly meant by “psychic unity”. Yet such shared neurological endowment alone does
not make the case that all humans, irrespective of culture, can be said
to be of the same psyche or mind (Shore, 1996, p. 16).

Contrary to structuralist dogma, meaning is not given to us ready-made. Meaning could be understood only as an on-going process, an
active construction by people, with the help of cultural resources.
Variation in cultural cognitions can be traced to important local
differences in specific modes and general schema that constrain
ordinary perception and understanding (Shore, 1996, p. 7).

While it may be politically correct to assume a stance of psychic unity, every anthropologist knows that there are systematic differences
between individuals and groups of individuals (Shore, 1996). Although
these differences are not an issue in themselves, they become a problem
when they are used to treat people differently and to discriminate against
them, rather than treating them equally on the basis of their similarities.

7.2.1 Etic and emic approaches


A closely related issue is the distinction we can make between etic and
emic approaches to understanding psychological phenomena. These
terms derive from linguistics where phonetics is the study of universal
sound patterns and phonemics is the study of sound patterns in a
particular social group.

7.2.1.1 Etic approaches


In psychological assessment, an etic approach* takes a personality
instrument and applies it across all cultures to see how different groups
behave on this measure. It assumes that personality has the same
definition and almost the same structure across cultures – it is a
universalistic approach to assessment. For example, if we took a
personality measure such as the 16PF, Myers Briggs Type Indicator
(MBTI) or Minnesota Multiphasic Personality Inventory (MMPI) and
applied it to all groups to see whether and how these groups differ, we
would be adopting a universalistic or etic approach.

7.2.1.2 Emic approaches


An emic approach* to assessment involves developing a personality
instrument that is unique to a particular culture. It assumes that
personality has different definitions and structures across cultures – it is
a particularistic approach to assessment. Although this approach may
produce a more accurate description of the personality of the particular
group, it makes comparison across groups almost impossible. An
example would be to imagine a group of people who had a very different
way of categorising and describing colours. Say they referred to blue as
the sky, red as the earth, green as the grass, and so on. If we were to talk
to them about red, blue and green, they would most likely not
understand us, at least not as far as colour terms are concerned.

Table 7.1 Etic and emic approaches

       Strength                                Weakness
Etic   Makes intergroup comparison possible    May not be equally valid for all groups
Emic   Increases validity for the group        Makes intergroup comparison difficult

When we examine personality in Chapter 11, we see a similar distinction between idiographic* and nomothetic* approaches to defining personality.

7.3 Evidence of unfairness

With this theoretical background, we can now ask ourselves the following question: How can we tell if a measure is fair or unfair? There
are three basic approaches to this.

7.3.1 Group differences


One of the most popular arguments that a measure is unfair is the
existence of group differences in the average score or passing rate
between two different groups. For example, suppose group A (e.g. men)
scores higher than group B (e.g. women) on a measure of mathematical
ability, with the result that more men than women score above the cut-
off point. If our definition of fairness is that there should be no group
differences, then it follows that these group differences point to
unfairness in the measuring process. However, no one queries the fact
that the average height of men differs from that of women, nor does this
indicate that there is something wrong with the measuring tape. The fact
that black men from Kenya and Ethiopia win most of the long-distance
races at the Olympics and elsewhere does not mean that these races are
biased against white males or that separate prizes should be given to the
first black man and the first white man to cross the line. (Interestingly
though, when males and females compete in the same event such as the
Comrades Marathon or the Two Oceans Marathon, separate prizes are
awarded to the first male and the first female to cross the line!) The
assumption that different groups of people are more or less equal in
ability (i.e. the assumption of psychic unity) in the case of long-distance
running is clearly wrong, even though some of us might wish that this
were not the case.
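The argument above turns on two simple descriptive statistics: the average score of each group and the proportion of each group falling above the cut-off point. The short Python sketch below, with invented scores and an invented cut-off of 24, shows how these two quantities might be computed; it is offered purely as an illustration and is not taken from the text.

# Hypothetical scores for two groups and a hypothetical cut-off point
group_a = [28, 25, 22, 30, 26, 24, 27, 23]
group_b = [21, 24, 19, 26, 22, 20, 25, 18]
cut_off = 24

def mean(scores):
    return sum(scores) / len(scores)

def pass_rate(scores, cut_off):
    # Proportion of the group scoring at or above the cut-off point
    return sum(1 for s in scores if s >= cut_off) / len(scores)

print("Mean A:", mean(group_a), "Mean B:", mean(group_b))
print("Pass rate A:", pass_rate(group_a, cut_off),
      "Pass rate B:", pass_rate(group_b, cut_off))

Whether such a difference points to unfairness, or simply to a real difference between the groups, is precisely the question discussed above.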

When it comes to cognitive ability or personality structure, however, we somehow feel compelled to argue that psychic unity holds and that any
differences detected result from problems with the assessment
technique. Because of this, some people feel we should draw up
different norm tables to level the playing field. This is like saying that
shorter people should have a different scoring system to allow them to
compete fairly against tall basketball players. This is clearly nonsense.
Nevertheless, we insist that our assessment measures should not identify
differences, and that average scores on tests and other measures should
not differ across different groups. As a result, we choose items (and
discard some) so that different groups achieve the same average score on
the scale or test. This raises important issues that are at the heart of the
idea of psychic unity. Clearly, this definition of unfairness is difficult to
uphold and is, in fact, rejected by the American Psychological
Association (APA), although it does note that cognisance should be
taken of differences between groups and that possible sources of bias
should be investigated.

7.3.2 Differential item functioning


A second approach to fairness looks at the way different items in a test
or scale behave. Even though the average scores obtained by different
groups are the same, this could be as a result of very different item
behaviour. Suppose two groups on average get six out of ten items on a
test of ability correct. We would then conclude that the measure behaved
in a consistent way for both groups. However, it is possible that the two
average scores are based on very different item responses, as shown in
Table 7.2.

Table 7.2 Item responses for two groups

Item Group A Group B


1 1 0
2 1 0
3 1 0
4 1 1
5 0 1
6 0 1
7 0 1
8 1 0
9 0 1
10 1 1
Total 6 6

As you can see, the two groups have the same total scores, even though
only two items overlap (4 and 10). We would therefore be tempted to
argue that the scale assesses more or less the same set of factors or
constructs in both groups. However, the item analysis shows that this is
not the case. Therefore in analysing the results, we need to examine the
nature of the items that differ between the groups. Items that function
differently with different groups can be modified or eliminated. At the
same time, it should be noted that research tends to show that the
removal of biased items does not make a great deal of difference to
group norms – the stronger group tends to remain strong and the weaker
one to continue to score below the other group(s). This is because the
items that appear to be biased tend to be the easier ones, so that
removing these items from the measure only serves to make it more
difficult for everyone, and especially for the culturally different or
minority group (see Kaplan & Saccuzzo, 2009).
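To make the point of Table 7.2 concrete, the short Python sketch below (not part of the original text) reproduces the two response patterns and flags the items on which the groups behave differently. In practice each entry would be a group’s proportion correct on the item rather than a single 0 or 1, but the logic of the comparison is the same.

# Item responses for the two groups, as in Table 7.2 (1 = correct, 0 = incorrect)
group_a = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
group_b = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]

print("Total A:", sum(group_a), "Total B:", sum(group_b))  # both groups total 6

# Items on which the two groups do not behave in the same way
differing_items = [i + 1 for i, (a, b) in enumerate(zip(group_a, group_b)) if a != b]
print("Items behaving differently across the groups:", differing_items)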

As stated earlier, fairness is a case of validity generalisation, and considers whether a measure is equally valid for various groups.
Therefore one way of showing whether an assessment technique is fair
is to run a separate factor analysis for the different groups, and then to
examine the similarity of the factor structures obtained from the
different subsamples.

Although we can do this simply by scrutinising the results, there is also a technique available known as confirmatory factor analysis (CFA).
With this technique, the results of one of the groups (e.g. group A in the
example above) are subjected to a factor analysis as described in
Chapter 5. (More technically, this is known as an exploratory factor
analysis (EFA) – we are exploring its factor structure.) We then take this
factor structure and, using a technique that is similar in many ways to
the d-score we looked at in developing a profile, we calculate how
similar the two factor structures are. Technically, we refer to the
similarity of the factor structures as goodness of fit*. This second
process is known as a confirmatory factor analysis.
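One widely used index of how similar two factor structures are is Tucker’s congruence coefficient. The text does not name a specific index, so the sketch below should be read as one common possibility rather than as the book’s own procedure; the factor loadings for the two groups are invented for the illustration.

import math

# Hypothetical loadings of the same five items on one factor, estimated separately
# in two groups
loadings_group_a = [0.71, 0.65, 0.58, 0.62, 0.49]
loadings_group_b = [0.68, 0.61, 0.55, 0.66, 0.42]

def congruence(x, y):
    # Tucker's phi: the sum of cross-products divided by the product of the vector norms
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return numerator / denominator

phi = congruence(loadings_group_a, loadings_group_b)
print(f"Congruence coefficient: {phi:.3f}")  # values close to 1 indicate very similar structures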

DIF analysis itself rests on several assumptions, including that items are unidimensional and that the underlying construct or latent trait*
is equally distributed across the different groups (which are typically
gender, age, racial or ethnic groups). DIF also assumes that the groups
being compared are homogeneous in themselves and that the overall test
is unbiased. The aim of the DIF analysis is to identify relative
discrepancies between these groups. Hunter and Schmidt (2004),
however, have criticised DIF methodology, arguing that most evidence
of DIF may be explained by a failure to control for measurement error in
ability estimates and from too much reliance on false findings from
statistical significance testing. In Chapter 8, this notion of differential
item functioning is taken to another level when fairness in cross-cultural
situations is discussed.
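Differential item functioning is usually tested with a formal statistic. A hedged sketch of one standard approach, the Mantel-Haenszel common odds ratio, is given below; it is offered as a general illustration of DIF methodology and is not necessarily the specific procedure the text has in mind. Examinees are matched on total test score and, within each score level, a 2 × 2 table of group (reference or focal) by item outcome (correct or incorrect) is formed; the counts used here are invented.

# Each entry: total-score level -> (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
strata = {
    3: (12, 18, 9, 21),
    4: (20, 15, 14, 19),
    5: (30, 10, 22, 16),
    6: (25, 5, 20, 8),
}

numerator = 0.0
denominator = 0.0
for ref_correct, ref_incorrect, focal_correct, focal_incorrect in strata.values():
    n = ref_correct + ref_incorrect + focal_correct + focal_incorrect
    numerator += ref_correct * focal_incorrect / n
    denominator += ref_incorrect * focal_correct / n

alpha_mh = numerator / denominator
print(f"Mantel-Haenszel common odds ratio: {alpha_mh:.2f}")
# A value near 1 suggests the item shows no DIF; values well above or below 1 suggest
# that, after matching on ability, the item favours the reference or the focal group.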

7.3.3 Regression analysis


Another important approach to determining fairness is based on the
notion of regression analysis. Regression analysis is a correlation
technique in which one variable is predicted from one or more other variables (i.e. one score (y) is predicted from another score (x)).
This usually takes the form of

y = mx + c

In other words, score y can be calculated by multiplying x by some amount (e.g. 3,5 or 0,1 – this is the m in the equation) and adding some constant figure (e.g. 0,95 or 18,6 – this is the c in the equation). We can also
plot this equation on a graph, where m is the slope of the line, and c is
the point at which it intercepts or cuts the y-axis. This is shown in Figure
7.1.

Figure 7.1 Intercept and slope of a regression line

This is a simple linear relationship. A more complex relationship predicts y from two or more variables and takes the form y = mx + nz +
qw + … + c.
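As a quick numeric illustration of the simple form of the equation (with values invented for the example), the following sketch predicts a criterion score from a predictor score of 40, a slope of 0,5 and a constant of 10.

# y = mx + c with invented values: m = 0.5, c = 10, x = 40
m, c = 0.5, 10.0
x = 40.0
y = m * x + c
print(y)  # 30.0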

When we apply this to a scatter plot*, we get the following:

Figure 7.2 A regression line

Think of the scatter plot* or scattergram* as a sausage and the regression line as a stick. (Craig Russell (2000), from whom I borrow
the idea, refers to these as “hotdogs on a stick”.) As with any scatter
plot, the thinner the oval or sausage, the closer the relationship between
the variables. A round sausage shows low correlation or prediction. In
the case of regression analysis, a thinner sausage translates into the
greater ability of the x-variable to predict the y-variable. The point at
which the regression line crosses or intercepts the y-axis is known as the
intercept* – it is equal to c in the regression equation.

When we come to see how this relates to fairness, we need to run separate regressions for the different groups or subsamples. Four
different patterns can occur. These are shown in Figures 7.3a–d.

Figure 7.3 Scatter plots of two different groups


In Figure 7.3a we see that the two scatter plots, the regression line and
the intercept all coincide: there is one sausage on one stick. In this case,
the assessment behaves identically for both groups, and therefore the
measure is fair – it is not biased against either group.

In Figure 7.3b we see that the two scatter plots do not coincide: one
group is lower than the other. However, because the regression line and
the intercept are the same for both groups (i.e. there are two sausages on
one stick), we can conclude that the assessment is fair, even though the
one group scores lower than the other. Having the same regression line
and intercept (i.e. stick) means that the group scoring lower on the x-
variable (the predictor) also scores lower on the y-variable (the
criterion).

In Figure 7.3c the two scatter plots and the intercept are different, but
the slope of the two regression lines is identical. In this case, the
assessment scores of the one group are lower than the other group’s.
This indicates that the assessment results are equally valid, but that the
scores of the second group underestimate the ability level of the people
in the group. In this case, merely adding a constant score to the lower
group’s score will raise both the scatter plot and the regression line to
match the situation in Figure 7.3a. The value of this constant is the
difference between the two intercepts. (The dotted line shows the
regression line and intercept for the combined group.)

Finally, in Figure 7.3d, the two scatter plots, the regression lines and the
intercept are all different and do not coincide, and we can conclude that
the measure will be unfair and will almost certainly be biased against
one group. It will therefore be less valid for use in making decisions
about the second group.

Any assessment technique or test where there are two distinct sausages and two
separate sticks is unfair using the Cleary model. This approach to determining
fairness is named after Anne Cleary who first proposed it in 1968. (See, for
example, Cleary et al., 1975.)
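A hedged sketch of the Cleary approach in code is given below: fit a separate regression of the criterion (y) on the predictor (x) for each group and compare the two slopes and intercepts. The data are invented, and in practice the fitting and the statistical comparison of the lines would be done with a statistics package rather than by hand.

def fit_line(xs, ys):
    # Ordinary least-squares slope (m) and intercept (c) for y = mx + c
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical predictor (x) and criterion (y) scores for the two groups
group_a_x, group_a_y = [10, 12, 14, 16, 18, 20], [22, 25, 29, 33, 36, 40]
group_b_x, group_b_y = [8, 10, 12, 14, 16, 18], [16, 19, 23, 26, 30, 33]

m_a, c_a = fit_line(group_a_x, group_a_y)
m_b, c_b = fit_line(group_b_x, group_b_y)
print(f"Group A: slope {m_a:.2f}, intercept {c_a:.2f}")
print(f"Group B: slope {m_b:.2f}, intercept {c_b:.2f}")
# Broadly similar slopes and intercepts (one stick) point to a fair measure under the
# Cleary model; clearly different regression lines (two sticks) do not.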

We discuss three other formulae for ensuring fairness based on the decision-making matrix in Chapter 6. To make it easier to follow these
various options, we give a simplified version of the decision-making
matrix here.

Firstly, there is the equal probability model put forward by Linn (1973),
which argues that different cut-off points for the different groups should
be chosen in such a way that the success rate (B/(B + D)) remains
constant for the two groups.

The second model was put forward by Cole (1973), and is known as the
conditional probability model. This model advocates selecting different
cut-off scores in such a way that the ratio of true positive to total success
(B/(C + B)) is constant for both groups.

Figure 7.4 The decision-making matrix
The third model is that described by Thorndike (1971), and is known as
the constant ratio model. According to this model, the cut-off points
must be chosen in such a way that the ratios of predicted success to actual success ((B + D)/(B + C)) are the same for both groups.
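The three ratios can be illustrated with a small sketch. It assumes, following the text and Figure 7.4, that B denotes true positives, D false positives and C false negatives; the quadrant counts used for the two groups are invented. Each model then asks that group-specific cut-off points be chosen so that its particular ratio is equal across the groups.

def model_ratios(B, C, D):
    # B = true positives, C = false negatives, D = false positives
    return {
        "Linn (equal probability): B/(B + D)": B / (B + D),
        "Cole (conditional probability): B/(B + C)": B / (B + C),
        "Thorndike (constant ratio): (B + D)/(B + C)": (B + D) / (B + C),
    }

# Hypothetical quadrant counts for two groups at their current cut-off points
for name, (B, C, D) in {"Group A": (40, 10, 20), "Group B": (25, 15, 10)}.items():
    print(name)
    for label, value in model_ratios(B, C, D).items():
        print(f"  {label} = {value:.2f}")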

For a fuller description of this decision-making matrix approach to fairness, see Nunnally and Bernstein (1993, pp. 360–362).

7.4 Approaches to fairness

These various models of fairness are based on one of three distinct positions, namely unqualified individualism*, qualified
individualism* and group-based decisions. There is a fourth approach,
the sliding band method, which tries to integrate the best of these three
positions.

7.4.1 Unqualified individualism


Unqualified individualism is the view that the best person for the job
should get the job. It assumes that the assessment technique is absolutely
valid and correct, and that it assesses only what it claims to assess.
Furthermore, it also assumes that the scores reflect ability and therefore
follows a top-down or merit-based strategy to select the best people.
This approach makes no allowance for any arguments about gender or
group membership, nor does it recognise any need for employment
equity* or affirmative action to address past disadvantages. According
to this view, a test or other assessment technique is fair if it finds the
best person for the job or position. The single-test, single-norm method
supports this argument. This approach results in large numbers of people
from the majority group and relatively fewer from the minority group(s)
being selected, and therefore perpetuates the status quo.

7.4.2 Qualified individualism


This approach argues that we need to take the best people available, but
because no assessment technique is 100 per cent valid for all groups, we
need to take group membership into account when making decisions. In
this case, judgements about the individual’s competence and suitability
are qualified (or interpreted) in the light of his group membership. The
use of separate tests or separate norms for different groups is based on
this assumption. Of course, the top-down or merit-based philosophy
applies, but this takes place within the separate norms that have been
established, rather than using a unified norm.

7.4.3 Group-based decisions


The third approach to fairness is to argue for full psychic unity and the
view that all groups of people are equally capable and equally motivated
to do a particular job. There are two approaches to fair outcomes within
the group-based approach. These are the quota system and the four-fifths
rule.

7.4.3.1 The quota system


The demand that all pass or fail decisions should be representative of the
populations involved follows logically from the point of view mentioned
above. This is nothing more than a quota system. Although it has some
advantages (such as creating role models for others to emulate), it is
actually a hollow argument. Accepting the philosophy of quotas* would
suggest that there should be a 50–50 split of male to female nurses, that
our rugby teams should also be half male and half female, 80 per cent
African, ten per cent white, five per cent coloured, and five per cent
Indian, and that all the languages should be proportionally represented in
the team. This should also apply to our soccer teams, our cricket teams,
and so on. This viewpoint is complicated by the fact that people apply
for jobs or play sport in different demographic proportions. We may ask
whether our quotas should be based on the national population or the
provincial population. (If we accept the latter, we would expect to find
more coloured people being appointed to jobs or given bursaries in the
Western Cape and more Indians in KwaZulu-Natal.) It is interesting to
note that in a case between the Solidarity Trade Union and the
Department of Correctional Services, a Cape Town Labour Court on 18
October 2013 came to the decision that the use of national rather than
provincial demographics for employment equity calculations was an
unfair labour practice. However, this is not the end of the matter. In
March 2014, draft amendments to the Labour Relations Act proposed
that national, rather than regional, demographics be used. In April 2014,
this proposal was withdrawn from the draft legislation! The question
arises as to what happens if some groups do not apply for certain
positions. Should we insist that 50 per cent of our motor mechanic
apprentices be female, when only one or two apply? Should we then set
our quotas as proportional to those who apply – if five per cent of the
applicants are female, should five per cent of the positions be given to
females?

Adopting a quota system may also set people up for failure in that less-
than-competent people are appointed and fail, doing damage to the token
appointee, to the organisation and to all those who have believed in them
and their success. This outcome is predicted from the decision-making
matrix, as a quota system effectively means that the cut-off score is
dropped for some groups, thereby swelling the false positive quadrant
(D) for this group. As argued in section 6.5.2.7 on page 69, if this
strategy is adopted for social or political reasons, steps must be taken to
ensure that in this group fewer people than predicted fail.

A number of researchers in the US (Heilman, 1996; Heilman et al., 1998; Kravitz et al., 1997; Stanush, Arthur & Doverspike, 1998) show
that there are negative consequences when employees or applicants
believe that hiring is based on group membership rather than on merit.
Similar findings have been reported in South Africa as well. As stated in
section 6.5.2.7, strategies such as affirmative action do not mean
lowering the performance standard, but do mean that additional input is
required from both management and successful candidates. This is the
true meaning of affirmative action.

7.4.3.2 The four-fifths rule


In the US, the generally accepted rule is that the selection rate for any group should be at least 80 per cent (or 4/5) of the selection rate of the group with the highest rate. In other words, if half of the white applicants for a set of bursaries are successful, then at least 40 per cent of the Hispanic applicants (four-fifths of 50 per cent) should also be successful. A selection rate below this level is taken as evidence of adverse impact in the selection process.
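A minimal sketch of the four-fifths check (with invented application and selection numbers) compares each group’s selection rate with the highest group’s rate and flags any ratio below 0,8.

# Hypothetical numbers of applicants and of successful applicants per group
groups = {
    "White applicants": {"applied": 400, "selected": 200},
    "Hispanic applicants": {"applied": 100, "selected": 35},
}

rates = {name: g["selected"] / g["applied"] for name, g in groups.items()}
highest = max(rates.values())

for name, rate in rates.items():
    ratio = rate / highest
    flag = "possible adverse impact" if ratio < 0.8 else "within the four-fifths rule"
    print(f"{name}: selection rate {rate:.2f}, ratio to highest {ratio:.2f} -> {flag}")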

7.4.4 The sliding band approach (banding)


The sliding band method is a decision-making approach that tries to find a way of marrying top-down unqualified individualism with a group-based approach. It uses the standard error of measurement (SEM) to
equate scores from different norms.

Let us assume that group A and group B score slightly differently on a measure of ability, such as in Table 6.2 on page 63. The two groups
were treated as separate, and a top-down strategy was applied to both
sets of norms, so generating two separate norm tables. One of the
problems with this approach is that, while it accepts that the ability or characteristic is normally distributed within each group, it also assumes that the underlying distributions of the two groups are comparable.
To illustrate, suppose we were interested in selecting people based on
how tall they were and that we had different groups of people. If we
used a top-down approach we could end up arguing that a Masai warrior
from Kenya at stanine 3 would be roughly as tall as a pygmy from the
DRC at stanine 3, which is clearly not the case. Adopting a multinorm
solution of this kind makes comparison between different populations
impossible, or at least highly inaccurate. Where groups differ markedly
on a characteristic, using different norms may ensure representivity of
the different groups, but it fails to ensure that the quality of those
identified by a top-down process is comparable in an absolute sense.

Banding addresses this problem. It was put forward by Wayne Cascio (Cascio et al., 1991) and is based on the notion of the standard error of measurement (SEM), which is a measure of the spread of an individual’s score on subsequent occasions. It reflects the normal distribution of scores that results from the random error component present in every score, and is given by

SEM = st × √(1 – rtt)

where
st = the standard deviation of the population*
rtt = the reliability coefficient

Suppose that an assessment score had a maximum of 30 and that the SEM was 1,0. If we look at a score of 25, we can argue that 25 is only a
probable score and not an exact one. We can also say that there is a 95
per cent chance that the correct score is somewhere between 2 SEMs
above and 2 SEMs below 25 – that is, that the correct score is
somewhere between 23 and 27. Each score is thus represented by a four-
point range or band, two points above and two points below the actual
score. This means that all scores within this four-point band can be
regarded as equal.
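To tie the formula to this worked example, the short sketch below (with an invented standard deviation and reliability chosen so that the SEM comes out at roughly 1,0) computes the SEM and the 2-SEM band around a score of 25.

import math

sd = 4.0            # hypothetical standard deviation of the test scores (st)
reliability = 0.94  # hypothetical reliability coefficient (rtt)

sem = sd * math.sqrt(1 - reliability)
print(f"SEM = {sem:.2f}")  # roughly 1,0 with these values

score = 25
lower, upper = score - 2 * sem, score + 2 * sem
print(f"2-SEM band around {score}: {lower:.0f} to {upper:.0f}")  # about 23 to 27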

Table 7.3 shows that a group A score of 21 can be anywhere between 19 and 23, and that a group B score of 23 could be anywhere between 21
and 25. This means that a person from group A with a score as low as 19
(21 – 2) can be selected at the same time or even ahead of a person in
group B with a score of 25 (23 + 2).
Table 7.3 The sliding band approach

Group A scores Group B scores


Score 2-SEM band Score 2-SEM band
25 23–27 23 21–25
24 22–26 22 20–24
23 21–25 21 19–23
22 20–24 20 18–22
21 19–23 19 17–21
20 18–22 18 16–20
19 17–21 17 15–19

Because the scores are banded in this way, we can say that a group A
score of 21 can be regarded as being in the same band as a group B
score of 25, and a group A score of 19 can be regarded as being
equivalent to (in the same band as) a group B score of 23, and so on. As
long as we work down from the top of the group A scores, we can
regard all group B scores within 2 SEMs as equivalent. In this way, we
can slide a 2-SEM band down the two sets of scores and achieve our
balanced targets without violating the rights of either group A or group
B in a perfectly legitimate and scientific way.
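The reasoning in Table 7.3 can be expressed as a small band-overlap check. The sketch below assumes, as in the text, an SEM of 1,0 and 2-SEM bands, and treats two scores as belonging to the same band when their bands overlap; it illustrates the logic rather than providing a full implementation of the sliding band procedure.

SEM = 1.0
WIDTH = 2  # the band extends 2 SEMs either side of an observed score

def band(score):
    # The 2-SEM band around an observed score (e.g. 21 -> 19 to 23)
    return (score - WIDTH * SEM, score + WIDTH * SEM)

def same_band(score_1, score_2):
    low_1, high_1 = band(score_1)
    low_2, high_2 = band(score_2)
    return low_1 <= high_2 and low_2 <= high_1

print(band(21))            # (19.0, 23.0)
print(same_band(21, 23))   # True  - as in Table 7.3
print(same_band(19, 23))   # True  - the two bands meet at 21
print(same_band(17, 23))   # False - these bands no longer overlap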

The effects of these approaches to fairness are summarised in Table 7.4.

Table 7.4 Summary of fairness models

Model: No assessment
Method: All comers selected on a first-come, first-served basis
Rationale: Because assessment methods are unfair, applicants are allowed to demonstrate their ability over time
Effect on minorities: Number of minorities maximised – high failure rates
Effect on organisation: Poor investment – as large numbers fail, at great cost to the organisation

Model: Unqualified individualism
Method: Top-down selection using single norm
Rationale: Assessment methods are equally valid for all groups and scores therefore represent merit
Effect on minorities: Relatively few minority members are selected
Effect on organisation: Good performance achieved; not good for EE targets

Model: Qualified individualism
Method: Group membership (race, gender, language, etc.) taken into account
Rationale: Separate norms and/or correction factors used to generate separate top-down selections
Effect on minorities: Minority groups well represented
Effect on organisation: Poorer achievement of objectives; EE targets more easily met

Model: Cleary regression model
Method: Regression lines used to show fairness and to calculate predicted criterion scores; people selected accordingly
Rationale: This is fair because those with the highest predicted criterion score are selected
Effect on minorities: Minority groups well represented
Effect on organisation: Poorer achievement of objectives

Model: Quota system
Method: Appointment of minority groups should be proportional to their availability in the population (or within four-fifths thereof)
Rationale: Because potential is equally distributed across all groups (psychic unity), all groups must be proportionally represented
Effect on minorities: Best representation of minority groups
Effect on organisation: Poorer achievement of objectives

From Table 7.4 we can see that different fairness models yield different
results, and that one cannot completely minimise adverse impact while
maximising job performance (as measured by the criterion-related validity of the selection method). In other words, there has to be a trade-
off between equity and performance. To quote Hough and Oswald
(2000, p. 636): “A selection strategy aimed at minimising adverse
impact may differ somewhat from a selection strategy aimed at
maximising mean predicted performance.” In the US, reverse
discrimination court cases have concluded that race and other “job-
irrelevant class membership” cannot be used in making job-related
decisions. In South Africa, the Constitution and most current
employment legislation not only allow this, but make it compulsory.

It must be emphasised that representation of various groups is a sociopolitical decision and not an economic one. Therefore
organisations need to balance their employment equity needs with their
economic ones. As discussed in section 6.5.2.7, when looking at the
decision-making matrix, any potential negative effects of bringing
underprepared minority group members into an organisation can be
countered by giving additional training and mentoring, and
implementing other affirmative action processes. At the same time,
efforts must be made to improve the general skills and competency*
levels of all social groups so that this kind of analysis becomes
redundant. This will require large investments in social and educational
development for the foreseeable future.

7.5 Approaches to ensure fairness in assessment

Clearly, the first thing we need to do is to decide on the cause of the unfairness. There are two possibilities. The first is that the assessment
techniques are accurate (i.e. valid) and any observed differences are the
result of social and educational differences. This is a social and political
problem, but not a measurement issue. Secondly, we can assume some
version of the psychic unity argument, and argue that if people score
differently on any assessment technique, it is the measuring tool that is
at fault. This is a psychometric issue, and we must find ways of dealing
with it.

7.5.1 Natural and inevitable differences


The first line of approach in dealing with differences in scores is simply
to accept that people, and groups of people, differ in terms of their
makeup and abilities as a result of different backgrounds and
experiences. There is little we can do except to accept these results, in
the same way that we accept that men are generally taller and more
powerful and therefore better tennis players than women (in the sense
that most top male tennis players would beat most top women tennis
players). In the same way, we must simply accept that groups such as
Africans (for whatever reason) are better sprinters and tend to dominate
in international sporting competitions of this kind.

Based on this line of reasoning, a variation of the argument holds that everyone behaves in a way that is appropriate to his situation. It argues
that there is little point in trying to compare the behaviour of, say, a cat
and a dog. Although these two animals behave very differently, the cat
does things that make sense to it in terms of the cat-life it leads and the
dog does things in the most appropriate way for the dog-life it leads. In
exactly the same way, people from different ways of life need to know
different things, to do different things and to do things differently. When
we assess people, we need to take these differences into account and try
to understand and evaluate people in terms of what is most appropriate
within these different parameters. In other words, we should not judge
cats using dog constructs, or vice versa. This argument is explained
more fully by Kaplan and Saccuzzo (2013, p. 173).

To illustrate their contention, Kaplan and Saccuzzo cite the SOMPA (System of Multicultural Pluralistic Assessment) technique. SOMPA
was developed in 1973 by Mercer, who believed the average ability
level of all cultural groups to be equal and that differences between these
groups result from differences in their cultural experiences. In addition,
different groups have different success criteria and therefore the only
fair way of assessing people is by looking at differences within the
group and not across different groups. Stated differently, the only fair
way of assessing people is by using appropriate group-specific
assessment techniques and interpreting the outcomes against group-
specific norms. Where intergroup comparisons need to be made, these
should be done firstly by comparing the outcomes with those groups
having similar cultural and experiential parameters, and secondly, by
correcting or adjusting the scores to take account of socioeconomic
differences between the groups. In short, SOMPA adjusts the assessment
scores to take account of socioeconomic and cultural differences.
Although SOMPA was fairly popular at one stage, it is very difficult to
administer and interpret, and its popularity has decreased significantly in
recent times (Kaplan & Saccuzzo, 2013, p. 177). However, Mercer has
drawn attention to the important issue that assessments are often carried
out and interpreted in terms of the requirements of various values, needs
and philosophies framed by the socially powerful, especially the
education and employment systems. As we will argue when we look at
intelligence in Chapter 10, it is largely the views of the powerful
educators and managers in the workplace that determine what forms of
behaviour are valued and which are not; the interpretation of assessment
outcomes reflects little more than the requirements of the powerful.

7.5.2 Removal of discriminatory items


The second approach to the issue of group differences holds that, on
average, people are similar, and any differences in assessment scores are
a result of weaknesses in the assessment process. As a result, we need to
find which elements of the assessment process account for these
differences and eliminate them. This is rather like saying that at the
Olympics (and other international sports events) black athletes tend to
beat white athletes in the sprints and marathon races, therefore we
should remove these events from the competition so that everyone,
irrespective of background, has an equal chance of winning! Only the
most egalitarian, anti-competitive person would accept this argument,
yet when it comes to cognitive and work-related skills, we are required
to accept that any differences that are found in ability level between
groups originate in the assessment process and not in the individuals
being assessed.

At the same time, researchers have found that removing or modifying the items that discriminate the most between groups does not alleviate
the problem. In the US, consistent differences of about one standard
deviation have been reported on most measures of cognitive ability
when the means of black and white groups are compared. In 1992,
Helms argued that these black/white differences might be reduced by
framing the items in a socially appropriate way. A panel of experts
modified abstract items to reflect everyday social, organisational and life
experiences. In contrast to Helms’ hypothesis, DeShon et al. (1998)
showed that marked black/white differences in US samples remained
even when large samples and parallel versions of the test were used.

7.5.3 Separate tests


In the past, the South African authorities approached the issue of
differences in socioeconomic and cultural backgrounds by developing
separate tests for the different groups identified in terms of the apartheid
ideology. This meant that separate tests were devised and standardised
for Indians, Africans, coloureds and whites. Although the reasoning
behind this was not very different from the SOMPA reasoning, it was
based on the assumption of forced segregation in which the different
ethnic groups would not compete with each other in the workplace. In
addition, the allocation of resources for the development of new
assessment measures, norms, and so on was highly skewed in favour of
the dominant white group. One result was that when the situation in the
country began to normalise, there was no way in which members of the
different groups could be evaluated on a single measure: there were no
non-racial assessment tools and very few non-racial norms for
interpreting assessment results. (Of particular interest is that during a
recent teaching stint undertaken by this author at the University of Fort
Hare, the predominantly black university attended in the past by such
luminaries as Nelson Mandela and Robert Mugabe, almost all the fourth-year Industrial Psychology students, all of whom were Africans from various parts of southern Africa, endorsed this separate test solution to the region’s complex multicultural challenges.)

7.5.4 Single tests, different norms


The approach most commonly used in South Africa even today is to
administer single tests (or batteries of assessments) to different
individuals, but then to use different norms for interpreting the results.
This approach, with its advantages and disadvantages, is discussed in
Chapter 6 where we look at the norm-based approach to interpreting
assessment results (see especially sections 6.2, 6.3 and 6.4). However,
despite its widespread use in South Africa and elsewhere, this approach
has its share of problems, not least of which is the inability to make
absolute judgements across different categories or norm groups.

The use of race-based norms is forbidden in many states in the US.

7.5.5 Single tests, same norms


Only recently have any unified tests and common norms appeared in
South Africa. However, given that socioeconomic differences continue
to be visible in the education and general socioeconomic status of
different sectors of the country’s population, large parts of society
remain at a disadvantage when they are assessed with the current
assessment techniques and tools such as psychological tests. As a result
(and to compensate), a newly developed test of scholastic ability, the
Differential Aptitude Test (DAT) battery has very generous norms that
tend to overestimate ability in the previously advantaged groups. (In
much the same way, critics would argue that the Grade 12 examinations
– the so-called matric – have been watered down in recent years, with
the result that the number of A passes among the previously advantaged
groups is very high.) (See Foxcroft & Stumpf, 2005.)

7.6 Ways of ensuring fairness in practice

We have seen that fairness is about making decisions that reflect a person’s true ability or personality. We know that fairness is not the
same as a lack of bias, because a biased measure can be used fairly
provided the nature and the extent of the bias is known and steps are
taken to correct this (the clock that is fast!). As we noted earlier when
looking at the idea of pre-market discrimination (section 7.1.3.3), even
valid and reliable assessments that have been used with success in the
past may only perpetuate unjust organisational practices, even if this
assessment is applied in a fair manner (Milner, Donald & Thatcher,
2013). Clearly, fairness in this context is a social and not a technical
issue.
The question, then, is: how can we ensure fairness? We consider six possible ways.

7.6.1 Do not assess


The first approach to ensuring fairness is simply not to assess. Although
this may seem a radical solution to some (and a completely reasonable
one to others), it is a non-starter. Let us consider why we assess in the
first place. In Chapter 1 we state that it allows us to distinguish between
people who need treatment and those who do not. It helps managers
decide who should be selected and who should not. It enables
educationalists to see which students know their work and should be
promoted and which need to repeat the material. Without assessment,
these decisions are impossible, and we revert to a pass one, pass all
mentality.

In the early 1990s, a young industrial psychologist was asked to devise a way of
selecting apprentices at his place of work. He prepared a detailed proposal for his
manager, who rejected it because of the inherent inequalities in the system which
would exclude a large number of previously disadvantaged candidates. The
manager’s suggestion was that everybody who was interested enough to apply
for an apprenticeship, and who appeared to be suitable, should be admitted to the
programme. Those who could not make it should be allowed to fail and be
excluded from the training programme after six months or a year.

Let us consider the implications of the manager’s suggestion. Instead of taking a few hours to decide who was capable, he proposed taking six
months or a year (and thousands of rand’s worth of pay and other costs)
to arrive at the same decision. Moreover, apprenticeship is a formal
contract, and his proposal implied that a large number of contracts
would have had to be torn up at the end of the trial period. If we go back
to the decision-making matrix (Figure 6.5), we see that this is an
extreme case of dropping the cut-off point (i.e. moving it to the left).

Not assessing is not a realistic alternative.

7.6.2 Interviews only


A second alternative is to interview all applicants and to select on this
basis. However, this approach poses several problems. Firstly,
interviewing is a tedious, costly and time-consuming process, hence it
usually takes place fairly late in any selection process, once most of the
unsuitable candidates have been weeded out. (Think of the multiple-
hurdle approach to combining information discussed in section 6.5.2.3.
Interviewing is one of the last hurdles to be cleared.) Secondly, as
Chapter 16 points out, the validity of interviewing is suspect. If testing
and other forms of assessment are not ideal, interviewing is even less
trustworthy, despite what people say to the contrary. Finally, we have
seen the need for triangulation in arriving at a sound decision – the use
of only one technique is likely to give poor results.

7.6.3 Observation
A third approach to assessing people fairly is to place them in a situation
(either a sample of the real situation or some kind of simulation) and
then to observe how they perform in it. As we see in Chapter 17, this is
one component of an assessment centre and has the decided advantage
of allowing the person to demonstrate many of his abilities and
competencies. It is, however, a complex and costly method, and should
therefore be used fairly late in the assessment process.

7.6.4 Separate (different) assessment processes


As stated above, one way of trying to be fair to different groups is to
develop separate assessment processes (such as tests) for different
groups. Not only is this expensive, but this approach is rejected because
it was used in the past as a tool of apartheid. More importantly, it makes
comparison between members of the different groups almost impossible.

7.6.5 Same measure, different norms


Another approach that was widely used in the past (and is used widely in
many multicultural societies) is to administer a single set of assessment
measures and then develop separate norms for different cultural groups.
This is discussed briefly in section 7.5.4 and in some depth in Chapter 6
(sections 6.2, 6.3 and 6.4), where the advantages and disadvantages are
examined. The South African government has rejected this approach in
favour of a single-method, single-norm approach, as discussed in section
7.6.6.

7.6.6 Single method, single norm


In this approach, which is favoured by the post-liberation government in
South Africa, a single process with a single norm is developed. This
method seems to assume a model of psychic unity and requires a careful
choice of items to remove those that may be seen to favour one group
over another, which is viewed as discrimination. In most (if not all) of
the tests and other measures that have been updated or released by the
Human Sciences Research Council (HSRC) since 1994, a single set of
items and a single set of norms have been provided. (Separate male and
female norms are available where appropriate – gender differences are
not politically sensitive.) One result of this approach is that the norms
appear to be quite low in some cases, allowing people from privileged
backgrounds to score above the mean on these instruments. However, as
educational opportunities and socioeconomic conditions begin to be
more equitably distributed, so these class differences will begin to
disappear.

7.7 Summary

In this chapter, we defined fairness as the lack of random error or systematic bias in the assessment technique and/or the interpretation
thereof. We highlighted four different meanings of fairness from the
SIOPSA’s document Guidelines for the validation and use of assessment
procedures for the workplace. These are equal outcomes for all groups;
the equitable treatment of all groups during the assessment process
(including the idea of reasonable accommodation); equal opportunity to
experience and learn from situations; and the absence of predictive bias
(the assessment has the same ability to predict future behaviours,
irrespective of group membership).
We examined the various notions related to fairness and discrimination,
and identified various groups affected by considerations of fairness. To
understand what fairness entails, we examined briefly the assumption of
psychic unity in relation to psychological assessment, and considered
what is meant by emic and etic approaches, before turning our attention
to how we recognise that unfairness has occurred. We then looked at
approaches to fairness in terms of unqualified individualism, qualified
individualism, group-based decisions, the four-fifths rule and the sliding
band approach (banding). We closed the chapter by putting forward
various ways of trying to ensure fairness in assessment and suggested a
number of different approaches to make assessment as fair as possible.

In Exhibit 7.1 you will find an example of a flyer or advertising leaflet promoting the use of a particular test to illustrate the various aspects
covered in this section. Although the test itself and the various bits of
information in the exhibit are fictitious, the material is based on
promotional material from the catalogue of a leading US test distributor,
and is an example of what is required of a good measurement technique
and of good promotional material for such a technique.

You, the reader, should at this point be able to define and explain every
term that is used in this promotional leaflet. If you are unable to do this,
go to the glossary of terms at the back of this book and then, if
necessary, revisit the relevant chapters.

Additional reading

For a useful discussion of forms of discrimination, read Schmidt (1988).


Kaplan & Saccuzzo (2013) discuss issues of test bias in some depth in Chapter 6 and
give a very useful account of fairness in testing.
A very useful and up-to-date text on fairness in assessment is Cohen, R.J., Swerdlik,
M.E. & Sturman, E. (2012). Psychological testing and assessment: An introduction to
tests and measurement. (8th ed.) Boston, MA: McGraw-Hill.
For an extension of the argument around the Cleary model of fairness (the sausages on
a stick model), see Russell, 2000.
See McIntire, S.A. & Miller, L.A. (2000). Foundations of psychological testing (especially
Appendix B and Appendix C) for excerpts from American Psychological Association
(APA) material and for guidance on how to ensure fairness in assessment.
For an in-depth analysis of the situation in South Africa, see the article by Callie Theron
(2007) in the South African Journal of Industrial Psychology.

Test your understanding

Short paragraphs

1. Define fairness and list five kinds of evidence to show that an assessment is unfair to
certain groups.
2. Discuss what is meant by psychic unity and discuss this in relation to the emic and
etic approaches to assessment.
3. Briefly discuss the five approaches to fairness.

Essay

What are the advantages and disadvantages of using group-based norms for
interpreting assessment results, and what other approaches are there for ensuring
fair(er) assessment results?

Exhibit 7.1
The Adult Basic Competence Test – Version 4
(ABC4)
The Adult Basic Competence Test – Version 4 (ABC4) is the latest offering in a
test series first published in 1946. The various editions of the ABC have enjoyed
widespread use in a variety of settings as a measure of basic academic skills and
competencies necessary for effective learning, communication and thinking,
reading and spelling words, and performing basic mathematical calculations. The
ABC4 continues to measure these basic content areas, and preserves those
features that made the ABC3 and earlier versions so popular with users – ease of
administration and scoring, and the provision of a significant amount of
information gained through a relatively brief investment of testing time.
The interpretation of the ABC4 has been enhanced by the addition of grade-
based norms, thereby increasing the usefulness of the tests in Grades 0–12. The
age-based norms have been extended from 75 years in the third edition to 94
years in the fourth edition so that the basic literacy skills of older adults can now
be assessed.
The ABC4 is a norm-referenced test that measures the basic academic skills of
word reading, sentence comprehension, spelling words and mathematical
computation. It was standardised on a representative national sample of over
3000 individuals ranging in age from 5–94 years. Alternate forms, designated the
Pink and Green forms, were developed and equated during the standardisation
process using a common-person research design.
Derived scores were developed for both age- and grade-referenced groups.
Standard scores (T-scores), percentile ranks, stanines, normal curve equivalents
and grade equivalents are provided for both groups.
Although there is some evidence that the tests are somewhat sensitive to social
and cultural influences, in all but two cases the scores of the various tests were
no more than 1 stanine different from the overall finding, suggesting that the test
as a whole is relatively robust to sociocultural influences, while remaining
sensitive enough to be useful across the full sociocultural spectrum.

Reliability
Reliability evidence for the ABC4 is shown to be strong, and includes information
based on classical test reliability theory, such as internal consistency, alternate
form, test-retest (one day and three months), inter-scorer reliability and standard
error of measurement.

Test-retest reliability: one-day reliabilities range from 0,78–0,89 for age-based
samples and 0,86–0,90 for grade-based samples.
These figures are 0,73–0,91 and 0,84–0,91 for the two samples when the test-
retest interval was 90 days. Together these two sets of figures show that the
effects of practice are relatively small.
Alternate form reliability tested one week apart (with the order of the Pink and
Green versions randomly assigned) produced results with the same order of
magnitude (0,76–0,88 and 0,77–0,86 for the groups).

Validity
Validity studies include both exploratory and confirmatory factor analyses and
provide consistent support for the structure of the ABC4. These studies show
strong concurrent validity with other measures of academic ability and grade
examinations at all grade levels. It thus appears to be a good predictor of
academic achievement. In the workplace, there are moderate correlations (0,46–
0,63) with training outcomes and supervisors’ estimates of general intellectual
ability.
In addition, good discriminant validity evidence for the ABC4 is reported, in that
the ABC4 differentiates among individuals with mental retardation, learning
disabilities, speech and language impairments, and those individuals identified as
gifted.
8 Assessing in a multicultural
context

OBJECTIVES

By the end of this chapter, you should be able to

describe the importance of assessment in a cross-cultural context


list the factors that affect the outcomes of assessment in cultures that differ from
those in which the assessment was devised
describe the various forms of equivalence and the factors that jeopardise the
equivalence of different tests and items
describe the factors that influence the cross-cultural validity and fairness of
assessments.

8.1 Introduction

Psychological assessment is being increasingly applied to people from


different cultural contexts, either in a single country (involving
immigrants) or in different countries. This is done for many reasons –
for example, academic researchers may be interested in looking at
universals of behaviour across all groups, or they may be interested in
national group differences or, finally, they may be interested in
individual differences. The focus of their attention depends on why
assessing is being done in the first place. In many instances, people are
assessed to understand their current level of functioning as may occur
when they are experiencing some form of difficulty, either as a result of
personal adjustment problems or as a result of circumstances imposed
on them by natural disasters. Here one thinks of earthquake survivors or
people who have been traumatised by volcanic eruptions and cyclones.
People may also have been negatively affected by man-made disasters
such as war and nuclear disasters or large-scale chemical pollution
accidents such as occurred at Bhopal, India. People are also assessed for
educational placement and for the award of academic bursaries and
scholarships. In the organisational arena, people are usually assessed for
selection purposes to determine their suitability for specific jobs or
positions.

In the context of this book, most of this assessment will be for the
selection and placement purposes of people with limited English ability
and/or experience of English culture, such as with recent immigrant
populations, or where the assessments are used in transnational settings.
Various economic, political and social developments, both nationally
and internationally, in the past few decades have resulted in a great
increase in the need for, and interest in, cross-cultural assessment (Van
de Vijver, 2002). These trends include a more global economy and
increased labour migration, the internationalisation of education, and a
massive influx of political refugees into many European and other stable
countries, all of which have given impetus to the understanding of cross-
cultural interactions. According to a March 2000 report by the
International Labour Organization (ILO, 2000, p. 1)

[t]he growing pace of economic globalization has created more


migrant workers than ever before. Unemployment and increasing
poverty have prompted many workers in developing countries to seek
work elsewhere, while developed countries have increased their
demand for labour, especially unskilled labour. As a result, millions
of workers and their families travel to countries other than their own
to find work. At present there are approximately 175 million migrants
around the world, roughly half of them workers (of these, around 15%
are estimated to have an irregular status).

The report also predicts further increases in international economic


migration, particularly if the disparity in wealth between rich and poor
countries continues to grow as it did in the last decade of the 20th
century. In addition, political disturbances and natural disasters in many
parts of the world have resulted in the displacement and migration of
large numbers of people to more stable and economically attractive
countries.

As a result of these and similar factors, there is an increasing need to


assess various educational, mental health and work-related competencies
and the application of tests and assessments of different kinds across
different cultural contexts, either in a single country (involving
migrants) or in different countries. In addition, research into various
personality and other psychological constructs necessitates widespread
assessment across a broad range of cultural and social contexts.
Accordingly, when it comes to assessing psychological functioning in
such areas as cognitive ability, personality, mental health status and
legal competence across a range of contexts, such as in the educational
or mental health fields, neuropsychological assessment or for selection
or promotion in organisational contexts, the effects of culture on
psychological and cognitive process and outcomes need to be taken into
account.

These assessments need to be carried out in ways that are fair and
unbiased, irrespective of why they are carried out. As shown in Chapter
7, fairness is a special case of validity generalisation. (Are our measures
equally valid across different groups?) In much the same way, cross-
cultural fairness asks whether the measures are equally valid across
groups of people with different cultural backgrounds and linguistic
ability in the language of assessment. At the same time, as Coyne (2008)
has noted, “[e]qual opportunity laws in many countries will prohibit the
use of tests in a manner that discriminates unfairly against protected
groups of the population (such as gender, racial, age, disability, religion,
etc.).”

8.1.1 Definitions of culture


The term “culture” is widely used in anthropology, where a typical
definition is as follows:

Culture is a shared meaning system, a shared pattern of beliefs,


attitudes, self definitions, norms, roles and values … Cultural
differences are best conceptualised as different patterns of sampling
information found in the environment (Triandis, 2000, p. 146).

A similar definition is offered by Robbins (1996, p. 48):

The primary values and practices that characterise a particular


country.

Perhaps the most widely cited definition of culture is that put forward by
Geert Hofstede (1991) who sees culture as “software of the mind” and
as “the collective programming of the mind which distinguishes the
members of one group or category of people from others” (p. 5).

There are various ways of categorising culture, the most prominent of


which are theories by Kluckhohn and Strodtbeck (1961) and Hofstede
(e.g. 1980). However, space does not permit a discussion of these
models here. Readers are referred to Hofstede (e.g. 1991) and to
Kluckhohn and Strodtbeck (1961).

8.1.2 Emic and etic approaches


A closely related issue is the widely accepted distinction made between
emic and etic approaches to the understanding of psychological
phenomena in a cross-cultural context. These terms derive from
linguistics, where phonemics is the study of sound patterns in a
particular social group and phonetics is the study of universal sound
patterns. They are discussed in some depth in Chapter 7 (section 7.2.1).

In the field of assessment, especially assessment across sociocultural


categories, one is constantly faced with the issue of whether differences
in assessment score reflect group differences or whether they reflect bias
and other problems in the measuring technique. In other words, are any
measured group differences on various psychological dimensions real or
are they simply artefacts that arise because the assessment processes
measure things differently in the different groups? To put it crudely, if
we find group differences in ability or personality structure, are they
real – that is, can we assume that the assessment techniques we use are
correct, and that scores accurately reflect these differences? Or do we
argue that the differences in the assessment outcomes reflect weaknesses
in the assessment instruments – that is, they are biased? If people score
differently on any assessment technique and this is because the
assessment techniques used are invalid, this is a psychometric issue –
the measuring tool is “at fault”, and ways of dealing with this need to be
found. Alternatively, if the assessment techniques are equally valid for
all people being assessed, any observed differences are the result of real
social, historical and educational differences that impact on the abilities
and behaviours of the people being assessed and hence on the
assessment process and/or outcomes. Addressing these differences is a
social and a political problem, not a measurement issue. Of course, a
middle way between these two extremes is possible – that while some
differences in group performance may reflect real differences in ability
and structure, few measures are culture-fair, and bias and differential
item functioning (DIF) may well suggest differences when none exist.

8.1.3 The issue of acculturation


An important aspect of culture and its impact on psychological
structures is acculturation, or the transition from one culture to another.
It is commonly known that humans are not static organisms but change
in reaction to (and often lead) changes in their environments. An
important source of these changes is moving from one sociocultural
context to another, for whatever reason. Acculturation is one such
process and involves the psychological adaptation of people (such as
migrants and minorities) to a new and different cultural setting as a
result of movement from and adjustment (see, for example, Van de
Vijver & Phalet, 2004). The extent of this adaptation depends on a range
of exogenous variables, such as length of residence, generational status,
education, language mastery, social disadvantage and cultural distance
(Aycan & Berry, 1996; Ward & Searle, 1991). In addition, it depends on
the extent to which the individuals wish to adapt and integrate into the
new culture. In this regard, Van de Vijver and Phalet (2004) argue that
two basic models of acculturation can be identified in the literature,
depending on whether acculturation is seen as a uni-dimensional or a bi-
dimensional process. The best-known uni-dimensional model is that
proposed by Gordon (1964), which assumes that acculturation is a
process of change in the direction of the mainstream culture. Although
migrants may differ in the speed of the process, it results in adaptation to
the culture of destination.

Recently (over the past two or three decades), the uni-dimensional


model has been increasingly replaced with a bi-dimensional model of
acculturation that has been seen as more appropriate (Ryder, Alden &
Paulhus, 2000). Rather than pursuing complete adjustment to the new
culture in an assimilationist way, the trend has been towards developing
a bi-cultural identity or by retaining the original culture without
extensively adjusting to the society of settlement. Van de Vijver and
Phalet (2004) attribute this to two factors: first, the sheer magnitude of
migration has allowed the incoming migrant populations to develop and
sustain their own cultural institutions such as education, health care and
religion, and second, because the Zeitgeist of the assimilationist doctrine
among existing cultures has been replaced by one that is more accepting
of diversity, in which the retention of various cultural institutions and
behaviour patterns by migrants is more readily accepted.

As a result, a popular current model is one proposed by Berry (Berry &


Sam, 1997). According to this model, a migrant is required to deal with
two questions. The first is, does he want to establish good relationships
with the culture of destination or his host culture (adaptation
dimension)? The second question involves cultural maintenance: does
he want to maintain good relations with the culture of origin or his
native culture? These two dimensions interact to yield four distinct
coping strategies, as shown in Table 8.1.

Table 8.1 Migrants’ strategies in a bidimensional model of acculturation

                                               Cultural adaptation:
                                               Do I want to establish good relations
                                               with the culture of destination?
Cultural maintenance:                          Yes               No
Do I want to maintain good relationships
with my culture of origin?
   Yes                                         Integration       Separation/segregation
   No                                          Assimilation      Marginalisation
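To make the logic of Table 8.1 concrete, the short Python sketch below maps a
migrant’s answers to the two questions onto Berry’s four strategies. The
function and its labels are purely illustrative and are not part of any
published acculturation instrument.

def acculturation_strategy(maintain_origin_culture, adapt_to_host_culture):
    """Map the two yes/no questions of Table 8.1 onto Berry's four strategies."""
    if maintain_origin_culture and adapt_to_host_culture:
        return "Integration"
    if maintain_origin_culture and not adapt_to_host_culture:
        return "Separation/segregation"
    if not maintain_origin_culture and adapt_to_host_culture:
        return "Assimilation"
    return "Marginalisation"

# Example: a migrant who wants to keep the culture of origin and also to
# build good relations with the culture of destination
print(acculturation_strategy(True, True))   # Integration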
The first strategy put forward by Van de Vijver and Phalet (2004) is
integration, where characteristics of both cultures are maintained in a
process of biculturalism. They quote a number of research studies in
Belgium and the Netherlands (e.g. Phalet & Hagendoorn, 1996; Phalet,
Van Lotringen & Entzinger, 2000; Van de Vijver, Helms-Lorenz &
Feltzer, 1999), which consistently show a preference for this strategy,
namely that migrants want to combine their original culture with
elements of the mainstream culture.

The second strategy identified by Van de Vijver and Phalet (2004) is


one where migrants retain most elements of their original culture and
generally ignore most aspects of the host culture. Van de Vijver and
Phalet term this separation (in sociology and demography it is also
labelled segregation). In South Africa, where this cultural separation was
enforced by white nationalists, it was termed apartheid or “separate-
ness”.

The third strategy is assimilation, which is the opposite of the separation


strategy, in that it aims at complete absorption of the migrant into the
host culture with the concomitant loss of most elements of the original
culture. This is the notion of the melting pot, which was the dominant
policy for many years in many Anglophone countries (the UK, the US,
Australia and Canada, to name a few). In recent years, this melting-pot
view has given way to multiculturalism of various kinds.

The fourth (and in the view of Van de Vijver and Phalet (2004), the least
often observed) strategy is marginalisation, which involves the loss of
the original culture without establishing ties with the new culture. In
some countries youth, often second or third generation, show
marginalisation of this kind; they do not feel any attachment to the
parental culture nor do they want to establish strong ties with the host
culture (often they are prevented from identifying with the host culture
because of societal discrimination or other forms of exclusion). As
Denoso (2010) argues, in real life marginalisation is seen as a negative
outcome of the acculturation process, rather than as a conscious choice
by the people concerned.

When it comes to assessment, Van de Vijver and Phalet (2004, p. 218)


argue that the culture maintenance dimension is usually less relevant
than the adjustment dimension. This is so because the position of a
person on the latter dimension (which is essentially a continuum rather
than a dichotomy) determines the suitability of the assessment technique
for the person and the applicability of the norms used for interpreting the
outcomes. Simply assuming that all tests are invalid for minority groups,
or that they can simply be used with all minority groups, is clearly false:
the level of acculturation may be an important moderator of test
performance in multicultural groups (Cuéllar, 2000). For this reason,
Van de Vijver and Phalet (2004) argue that the various measures of
acculturation that have been developed need to be applied as a precursor
to assessment in a multicultural context. They argue (p. 218) that

[i]t is regrettable that assessment of acculturation is not an integral


part of assessment in multicultural groups (or ethnic groups in
general) when mainstream instruments are used among migrants.

Using this two-dimensional model, measures of acculturation are


typically based on different combinations of positive or negative
attitudes towards adaptation and maintenance. These attitudes are
assessed using three distinct question formats, namely one, two or four
questions (Van de Vijver, 2001). The Culture Integration-Separation
index (CIS: Ward & Kennedy, 1992) is an example of a one-question
format measure. These measures typically ask forced-choice questions,
with a choice between either valuing the ethnic culture or host culture,
or both, or neither, for example: “Do you prefer (A) your own [e.g.
Turkish] way of doing things; (B) the Dutch way of doing things; (C)
equally like both Turkish and Dutch ways of doing things; and (D)
neither – I dislike both ways of doing things”. An advantage of this one-
question format is that the questions tend to be efficient and short, but
they cannot distinguish complex attitudes of bicultural individuals.

The two-question format asks for separate importance ratings for


maintaining the ethnic culture and for adapting to the host culture, which
assess the individuals’ attitudes to cultural maintenance and adaptation
to the host culture separately. An example of this is Phalet &
Swyngedouw’s (2003) Acculturation in Context Measure (ACM). In this
case, the ACM asks these two questions: “Do you think that [“Culture of
Origin Groups”, e.g. Turks] in the [Country of Destination, e.g. the
Netherlands] should maintain the [Turkish] culture (4) completely; (3)
mostly; (2) only in part; or (1) not at all?” and “Do you think that
[Turks] in the [Netherlands] should adapt to the [Dutch] culture (4)
completely; (3) mostly; (2) only in part; or (1) not at all?”

The four-question format measures such as the Acculturation Attitudes


Scale (AAS) developed by Berry, Kim, Power, Young and Buyaki
(1989) use agreement ratings with four statements that independently
assess each of Berry’s four strategies by indicating whether
participants Agree Strongly (A), Agree (a), Disagree (d) or Disagree
Strongly (D) with each of the following statements:

1. I think that [Turks] in [the Netherlands] should maintain the


[Turkish] culture and not adopt any Dutch ways of doing things
[Separation].
2. I think that Turks in the Netherlands should try to fully adopt Dutch
ways and forget about their Turkish ways of doing things
[Assimilation].
3. I think that Turks in the Netherlands should try to keep their Turkish
customs and culture, while at the same time trying to fit into Dutch
culture as far as possible [Integration].
4. I think it is stupid of people to have any form of culture – I reject
both my Turkish culture and that of the Netherlands
[Marginalisation].

(Note: These are not the actual questions used, but merely illustrate the
approach.)

Denoso (2010, p. 38) argues that the two- and four-question format
measures successfully discriminate between the integration strategy,
which is generally considered to be more adaptive, and the other less
adaptive strategies (Arends-Tóth & Van de Vijver, 2003). On the other
hand, Rudmin and Ahmadzadeh (2001, cited by Denoso) have argued
that the marginalisation strategy was misconceived and incorrectly
operationalised during the test construction process. They argue that the
four-fold paradigm commits the Fundamental Attribution Error* by
presuming that acculturation outcomes are caused by the preferences of
the acculturating individuals rather than by the acculturation situations.
They further argue the general point that the four-question approach to
assessment in general has poor psychometric properties, in that the
questions are ipsative (i.e. they are positively correlated and thus not
independent of one another). (See section 3.6.6).

Another development in the assessment of acculturation is the view that


individuals do not adopt a single approach in this area. Rather, the
approach adopted is contingent on the situation in which it is shown. In
this regard, according to Arends-Tóth and Van de Vijver (2003),
acculturation strategies adopted depend on whether this occurs in the
public and private domain. Similarly, Phalet and Swyngedouw (2003)
found that willingness to engage in maintenance or adaptation was
context dependent. In particular, they showed that most migrants tend to
favour cultural maintenance in the private domain, such as family
relationships, and adaptation to the host culture in the public domain,
such as school, work, etc. (Arends-Tóth & Van de Vijver, 2003; Phalet
& Andriessen, 2003; Phalet & Swyngedouw, 2003). Moreover, in these
studies this acculturation profile was considered as the most adaptive
pattern. The Acculturation in Context Measure (ACM) developed by
Phalet and Swyngedouw (2003) is a two-question format measure that
repeats the same questions in multiple contexts (e.g. home, family,
school and work situations).

In closing this section on acculturation, Arends-Tóth and Van de Vijver


(2006b) provide five guidelines for the assessment of acculturation.
These are as follows:

1. Acculturation conditions, orientations and outcomes usually cannot


be combined in a single measure. Combining them makes it difficult
to determine how acculturation could explain other variables (e.g.
cognitive developmental outcomes) if all aspects of acculturation are
used as predictors.
2. A measure of acculturation can only be comprehensive if it contains
aspects of both the mainstream and heritage cultures.
3. Proxy measures (e.g. generation, number of years living in the
country) can provide valuable complementary information to other
measures of acculturation, but are usually poor stand-alone
measures of acculturation. Simply taking stock of a set of
background conditions and ignoring psychological aspects results in
an indirect, limited appraisal of acculturation.
4. The use of single-index measures should be avoided. The content
validity of these types of measures is typically low and inadequate
to capture the multifaceted complexities of acculturation. Moreover,
there is no support in the literature for any single-index measure of
acculturation.
5. The psychometric properties of instruments (validity and reliability)
should be reported.

8.2 Approaches to cross-cultural assessment

In addressing the issues associated with using psychometric instruments


in societies for which they have not been developed, Van de Vijver and
Hambleton (1996) identify three approaches which they term Apply,
Adapt and Assemble. In this book, the third approach (namely Assemble)
has been split into two to yield Develop Culture-Friendly Tests and
Develop Culture-Specific Tests. The four different approaches discussed
in this text are:

8.2.1 Apply
Firstly, instruments that have been developed in one particular social
context (essentially Western/Eurocentric) can simply be applied to all
groups across different sociocultural settings without checking the
meaningfulness and psychometric properties such as reliability and
validity of the instruments. This approach adopts an assumption of
universality, the view that these instruments retain their original
properties in the new setting. Personality questionnaires developed by
Eysenck are examples of instruments that have been translated and
validated in various cultural groups on the assumption that personality
structures and the items assessing each aspect are the same in all
cultures and contexts as has occurred with various personality scales
(e.g. Barrett, Petrides, Eysenck & Eysenck, 1998; Eysenck, Barrett &
Eysenck, 1985). In Chapter 7 (section 7.2.1), this approach is described
as an “etic” approach, and as Van de Vijver (2002, p. 545) notes, this is
a form of “blind” application of an instrument in a culture for which it
has not been designed, and is simply bad practice, showing no concern
for the applicability of the instrument or for its psychometric
properties in the new context.

He argues that if any instrument is “borrowed” from another cultural


group, it must be shown to have been validly adapted: the test items
must have conceptual and linguistic equivalence, the test and test items
must be free of bias (Fouad, 1993; Geisinger, 1994) and appropriate
norms must be developed. These properties have to be empirically
determined.

8.2.2 Translate/adapt
Secondly, existing tests and measures can be adapted and translated into
the language of the target group. However, this goes beyond a literal and
even idiomatic translation in order to ensure the proper conceptual
translation of the test material. For example, the Minnesota Multiphasic
Personality Inventory (MMPI – a clinically oriented personality scale)
contains various implicit references to the American culture of the test
designers, and extensive adaptations to many items are required before
the scale can be used in other languages and cultures.

8.2.3 Develop culture-friendly tests


The third approach to assessing cross-culturally is to develop
instruments that are designed to measure the targeted construct in ways
that are “user friendly” in specific cross-cultural contexts. This was the
idea behind the so-called “culture-free” (Cattell, 1940), “culture-fair”
(Cattell & Cattell, 1963), and “culture-reduced” tests (Jensen, 1980).
The claim that there are psychological assessment processes that are not
affected by cultural factors was criticised more than 40 years ago (e.g.
Frijda & Jahoda, 1966). Nevertheless, the idea that some assessment
formats are more suited for use in cross-cultural contexts than others
because of particular features such as their format, mode of
administration or item contents still underlies much test design and data
analysis in cross-cultural psychology (Van de Vijver, 2002).

8.2.4 Develop culture-specific tests


The fourth approach to assessing cross-culturally is to develop culture-
specific instruments from scratch to assess constructs that may be very
different in the specific cultural setting (e.g. Cheung, Leung, Fan, Song,
Zhang & Chang, 1996). This is especially important when existing
instruments have been shown not only to be invalid and unreliable, but
more especially that they do not adequately assess the particular
construct in the “other” cultural group. This is termed an “emic”
approach. (The origin and meaning of the terms “emic” and “etic” are
discussed in some depth in Chapter 7, section 7.2.1.)

Irrespective of whether a psychometric test is taken as is and applied to a


new group of people, whether the test has been adapted or whether it has
been developed from scratch, it needs to be calibrated or “normed” for
the population for which it is to be used. Perhaps more importantly, the
behaviour of the test across cultural boundaries needs to be investigated
in order to determine whether the tests are measuring the same
phenomenon in the same way – do the results mean the same thing for
different groups? Put differently, are the tests and their results equivalent
across the different groups? In order to examine this further, we need to
understand the various factors or sources of bias that contaminate and
detract from the cross-cultural validity of our measure.

8.3 Forms of bias


In addressing the issues of cross-cultural equivalence, a useful starting
point is to identify the various sources of bias so that steps can be taken
to prevent them from contaminating the assessment scores. Van de
Vijver and others (Van de Vijver & Leung, 1997a, 1997b; Van de Vijver
& Poortinga, 1997) identify three distinct sources of bias and unfairness,
assuming that blatant forms of discrimination on the basis of sex, race,
caste, etc. are excluded. These are construct bias, item bias and method
bias.

8.3.1 Construct bias


Construct bias* is the most important reason for construct
inequivalence, and occurs when the constructs are associated with
different behaviours or characteristics across cultural groups (“cultural
specifics”). Schumacher (2010), for example, argues that in
individualistic Western cultures, leadership is usually associated with
traits such as dominance and assertiveness, whereas in more
communalistic cultures leadership is more likely to be associated with
self-effacing, community-supporting traits and behaviours. This is, of
course, in line with the findings of Hofstede (e.g. 1991, 1994, 1996) who
distinguishes between masculinity and femininity as one of five cultural
dimensions. As such, test items assessing self-presenting or self-
enhancing traits that would be viewed as important in individualistic
Western cultures would be seen as socially undesirable, rated lower and
seen as incongruent with possible leadership emergence and effectiveness
in communalistic cultures.

Another example comes from research in personality on the five-factor


model. On the basis of widespread research, McCrae and Costa (1997)
found considerable evidence for the universality of the structure in US
English, German, Portuguese, Hebrew, Chinese, Korean and Japanese
samples. On the other hand, however, Cheung et al. (1996) found that
the five-factor model leaves out aspects of psychological functioning
that are considered important by Chinese people. For example,
interpersonal factors such as “harmony” and “losing face” are often
observed when descriptions of personality are given by Chinese
informants, but are not represented in the five-factor model.
A third example can be found in Ho’s (1996) work on filial piety
(psychological characteristics associated with being a good son or
daughter). The Western conceptualisation is more restricted than the
Chinese, according to which children are supposed to assume the role of
caretakers of their parents when the latter grow old. Finally, Dyal (1984,
cited in Van de Vijver & Phalet, 2004) shows that measures of locus of
control often show different factor structures across cultures, strongly
suggesting that either the Western concept of control is inappropriate in
cross-cultural settings or that the behaviours associated with the concept
differ across cultures.

Construct equivalence thus implies that the same construct is being


measured across cultures, and inequivalence occurs when the instrument
measures a construct differently in two cultural groups, when the
concepts of the construct overlap only partially across cultures or when
the measure identifies somewhat different constructs (resulting in
“apples and oranges being compared”). This absence of structural
equivalence indicates bias at the construct level, and unless construct
equivalence is demonstrated, erroneous or misleading conclusions about
the nature and significance of the construct in the particular context are
likely to result. This suggests the need for an “emic” approach involving
the development of an appropriate assessment process that is tailored to
the unique constellation of dimensions in the particular context.

8.3.2 Item bias


Item bias, also known as differential item functioning or DIF, refers to
systematic error in how a test item measures a construct for the members
of a particular group (Camilli & Shepard, 1994). When a test item
unfairly favours one group of examinees over another, the item is
biased. Even if the construct itself does not vary across cultural divides,
many of the items in the assessment may behave quite differently in
different contexts. When anomalies at the item level exist, item bias is
detected (Fontaine, 2005), which points towards differences in the
psychological meaning of the items across cultures or the inapplicability
of item content in a specific culture. An item of, say, an assertiveness
scale is said to be biased if people from different sociocultural contexts
with a given level of assertiveness are not equally likely to endorse the
item. A good example, given by Hambleton (1994, p. 235 and cited by
Van de Vijver, 2002, p. 549), is the test item “Where is a bird with
webbed feet most likely to live?” The English phrase “the bird with
webbed feet” is translated into Swedish as “the bird with swimming
feet”, with the result that the English and Swedish items are no longer
equivalent as the Swedish version provides a much stronger clue to the
answer than the original English item. (In South Africa, many school-
leaving examination papers in technical subjects such as science or
biology were in the past presented simultaneously in English and
Afrikaans on the same question paper. Many English students, when
stumped about the meaning of a technical term, would turn to the
Afrikaans version of the question for a clue. For example, in biology, the
English term stamen is translated into Afrikaans as meeldraad, which
literally translates as pollen wire.)

This type of bias is a major issue in determining the cross-cultural


equivalence of a measure and has been extensively studied by
psychometricians (see e.g. Berk, 1982; Holland & Wainer, 1993). At the
same time, it must be realised that item bias does not reside only in the
translation of items from one language to another. Van de Vijver (2002)
gives the hypothetical example of the item: “Are you afraid when you
walk alone on the street in the middle of the night?”, pointing out that
this item may be responded to very differently by persons depending on
the safety of their neighbourhood, even though they fully comprehend
the question. An item is deemed equivalent across cultural groups when
it behaves in the same way in both cultures – that is, when this form of
item bias is absent. Ways of demonstrating and measuring the extent of
equivalence across cultural groups are discussed in some depth in
section 8.5, but in general these can be seen to take the form of chi-
square expectancies*, item-whole correlations*, factor loadings*
and item characteristic curves* (ICC), which need to be
shown to be (acceptably) similar to each other.

8.3.3 Method bias


The third source of bias in cross-cultural assessment refers to the
presence of nuisance variables due to method-related factors. Three
types of method bias can be envisaged. First, incomparability of samples
on aspects other than the target variable can lead to method bias (sample
bias). For instance, cultural groups often differ in educational
background and, when dealing with mental tests, these differences can
confound real population differences on a target variable.

Secondly, method bias also refers to problems that relate to the


assessment materials used (instrument bias). A well-known example
illustrating this is the study by Deregowski and Serpell (1971), who
asked Scottish and Zambian children in one condition to sort miniature
models of animals and motor vehicles, and in another condition to sort
photographs of these models. Although no cross-cultural differences
were found for the physical models, the Scottish children obtained
higher scores than the Zambian children when photographs were sorted.
In the latter case, the Zambian children were relatively unfamiliar with
photographic material.

The third form of method bias arises from the manner in which the
assessment is administered (administration bias). Communication
problems between testers and testees (or interviewers and interviewees)
can easily occur, especially when they have different first languages and
cultural backgrounds (see Gass & Varonis, 1991). Interviewees’
insufficient knowledge of the testing language and inappropriate modes
of address or cultural norm violations on the part of the interviewer can
seriously endanger the collection of appropriate data, even in structured
interviews. One can see how computerised administration of a test
would affect computer-literate people and those with very little
computer experience quite differently.

The distinction between measurement unit equivalence (e.g. degrees


Celsius and degrees Kelvin) and scalar equivalence (where the meaning
of the values obtained on the measure is identical across groups) is
important because only the latter assumes that the measurement is
completely free of bias (Van de Vijver & Tanzer, 2004). As indicated,
construct bias indicates conceptual inequivalence, and instruments that
do not adequately cover the target construct in one of the cultural groups
cannot be used for cross-cultural score comparisons. Construct bias
precludes the cross-cultural measurement of a construct with the same
measuring instrument or scale (Van de Vijver & Tanzer, 2004). If no
direct score comparisons are to be made across cultures, then neither
method nor item bias will affect cross-cultural equivalence. However,
both method and item bias can have major effects on scalar equivalence
as items that systematically favour a particular cultural group may
conceal real differences in scores on the construct being assessed.

8.4 Forms of equivalence

Equivalence is essentially the absence of bias – that is, of the systematic
but irrelevant component of the observed scores. It is the extent to which a
measure yields the same results across different groups: it correctly
identifies individuals or groups possessing equal amounts of the attribute
concerned, and correctly distinguishes between people and groups with
different amounts of the attribute. As Kanjee and
Foxcroft (2009) show (using a South African example),

[f]or measures to be equivalent, individuals with the same or similar


standing on a construct, such as learners with high mathematical
ability, but belong to different groups, such as Xhosa- and Afrikaans-
speaking, should obtain the same or similar scores on the different
language versions of the items or measure. If not, the items are said to
be biased and the two versions of the measure are non-equivalent (p.
79).

Individual test items and the test as a whole should not vary in the levels
of difficulty or intensity when the groups are known to be similar.
Equivalence is thus achieved when the assessment behaves in a similar
way across cultures as shown by a pattern of high correlations with
related measures (convergent validity) and low correlations with
measures of other constructs (discriminant validity) as would be
expected from an instrument measuring a similar construct. If there are
major differences in the way in which the groups behave, or if there are
marked differences in the way in which the attributes occur, then
specifically designed measures need to be developed and tailored to
meet the demands of the cultural context. This means that at least some
items will be different in the two countries. This approach is consistent
with the “emic” approach.

Three kinds of equivalence have been identified and are linked in a


hierarchy of increasing importance (Van de Vijver & Poortinga, 1997;
Van de Vijver & Leung, 1997a, 1997b). These levels are construct
equivalence, measurement unit equivalence and scalar equivalence.

8.4.1 Construct equivalence*


This form of equivalence, also termed structural equivalence* and
functional equivalence*, indicates that the same construct is measured
across all cultural groups studied, even if the measurement of the
construct is not based on identical instruments across all cultures. In
cross-cultural assessment, the test constructor/user cannot assume that
the construct being assessed has the same meaning and psychological
import across cultural divides – this needs to be empirically
demonstrated, and a measure shows construct bias if there is an
incomplete identity of a construct across groups or incomplete overlap
of behaviours associated with the construct (Van de Vijver & Phalet,
2004). Construct equivalence thus implies that the construct is universal
(i.e. culture independent) and is measured with a given instrument with
equal validity in the different sociocultural groups. It assumes the
similarity of the underlying psychological construct in the various
groups, a view that is associated with an “etic” position.

8.4.2 Measurement unit equivalence


This second level of equivalence is called measurement unit
equivalence* and is obtained when two metric measures have the same
measurement unit but have different origins (i.e. the point at which they
cross the y-axis, point “c”, in the formula Y = aX + c) (Van de
Vijver & Leung, 1997a, 1997b). In other words, the scales can be
equated by adding (or subtracting) a constant equal to the difference
between the c-values obtained on the two measures. An example of this
can be found in the measurement of temperature using the Kelvin and
Celsius scales, as shown in Figure 1.1 where the different levels of data
are discussed. The two scales have the same unit of measurement, but
their origins differ by 273,15 degrees. Converting between degrees
Celsius and degrees Kelvin is achieved by adding a constant 273,15 to the
former (so that 100 °C is equivalent to 373,15 K). In some cases, the
scores are obtained using two measures that are scaled differently and
can therefore not be directly compared. The scores can be compared if
the relationship between the two scales is known, as occurs with Celsius
and Fahrenheit temperature measures. To illustrate this, conversion
between the Celsius scale and the Fahrenheit scale involves multiplying
the °C by 9/5 and adding an offset constant of 32 (so that 60 °C equals
60 × 9/5 + 32 or 108 + 32, which equals 140°F).
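The distinction can be illustrated with a minimal Python sketch, using the
familiar temperature scales rather than any psychological measure: the
Celsius–Kelvin conversion changes only the origin (measurement unit
equivalence), while the Celsius–Fahrenheit conversion changes both the unit
and the origin, following the general form Y = aX + c.

def celsius_to_kelvin(c):
    # Same measurement unit, different origin: Y = X + 273.15 (a = 1)
    return c + 273.15

def celsius_to_fahrenheit(c):
    # Different unit and different origin: Y = (9/5)X + 32
    return c * 9 / 5 + 32

print(celsius_to_kelvin(100))       # 373.15
print(celsius_to_fahrenheit(60))    # 140.0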

In the case of cross-cultural studies, both measurement unit equivalence


and scale relationships need to be known if scores are to be compared.
In the case where the measurement unit is equivalent, direct score
comparisons cannot be made across groups unless the size of the offset
is known, which is seldom the case. At the same time, differences within
each group can still be compared across groups. For example, change
scores in pretest–post-test designs can be compared across cultures for
instruments with measurement unit equivalence. Similarly, gender
differences found in one culture can be compared with gender
differences in another culture for scores showing measurement unit
equivalence, even though across-group comparisons of each gender are
not meaningful.

8.4.3 Scalar equivalence


The third and highest level of equivalence is scalar equivalence*, or
full-scale equivalence. This level of equivalence is obtained when two
measures have the same measurement unit and the same origin. These
values need to be demonstrated empirically. Thus a score of 10 on a
scale of job satisfaction would have the same psychological meaning in
all sociocultural groups assessed only if scalar equivalence has been
demonstrated – that is, only if the regression line obtained for the two
groups was equivalent in both slope and point of intersection with the y-
axis. Full-scale equivalence uses the same scale across cultures, thereby
maintaining the same unit of measure. Naturally, such equivalence can
only be achieved if scales are universally used and accepted to hold the
same universal meaning (e.g. Fahrenheit or Celsius scale).

The highest level of equivalence is thus scalar or full-scale equivalence,


which requires equivalence at both construct and measurement levels
(Van de Vijver & Tanzer, 2004). In other words, full-test or scalar
equivalence is achieved when construct, measurement and scalar
characteristics obtained within cross-cultural testing contexts are all
similar to those achieved in mono-cultural conditions (Van de Vijver &
Tanzer, 2004).
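As a rough sketch of how such a check might look in practice, the Python
fragment below fits a separate regression line for each of two groups and
prints the slopes and intercepts. The data and group labels are entirely
hypothetical; in real research, formal tests of measurement invariance would
be used rather than simple inspection of the coefficients.

import numpy as np

# Hypothetical test scores (x) and criterion scores (y) for two groups
x_a = np.array([10, 12, 15, 18, 20, 23, 25, 28])
y_a = np.array([22, 25, 30, 35, 39, 45, 48, 54])
x_b = np.array([9, 11, 14, 17, 21, 24, 26, 29])
y_b = np.array([20, 24, 29, 35, 42, 48, 51, 57])

# Fit Y = aX + c separately for each group (least squares)
slope_a, intercept_a = np.polyfit(x_a, y_a, 1)
slope_b, intercept_b = np.polyfit(x_b, y_b, 1)

print(round(slope_a, 2), round(intercept_a, 2))
print(round(slope_b, 2), round(intercept_b, 2))

# Roughly equal slopes but different intercepts would suggest measurement
# unit equivalence only; equal slopes AND intercepts are needed for scalar
# (full-scale) equivalence.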

A general model for assessing construct equivalence has been developed


by Douglas and Nijssen (2002). (See Figure 8.1.)

Figure 8.1 A general model for assessing equivalence

Source: Douglas & Nijssen (2002)



8.5 Detecting item bias

Various methods of demonstrating and measuring the extent of


equivalence across cultural groups take the form of differences in item
means and standard deviations, various nonparametric techniques based
on chi-square expectancies*, item-whole correlations*, factor
loadings* and item characteristic curves* (ICC). Differential item
functioning (DIF) is perhaps the most important indicator of non-
equivalence of assessment items/tests and of bias. At the same time, an
item that exhibits DIF may not necessarily be biased for or against any
group (Kanjee, 2007), but may reflect performance differences that the
test is designed to measure (Camilli & Shepard, 1994) or real
differences in the phenomenon being assessed. This is illustrated in
Chapter 7 (section 7.3.1), where it is shown that males and females, and
athletes with a European and an African heritage, perform quite
differently in various sporting events such as long-distance running and
sprinting.

In order to detect the presence and extent of inequivalence, we need to


move away from classical test theory to the analysis of differential
item functioning (DIF). According to
Hambleton, Swaminathan and Rogers (1991), the accepted definition of
DIF is as follows:

An item shows DIF if individuals having the same ability, but from
different groups, do not have the same probability of getting the item
right (p. 110).

DIF analysis is a means of identifying unexpected differences in


performance across matched groups of testees by comparing the
performance of matched reference and focal groups. It also refers to the
differing probabilities of success on an item of people of the same
ability but belonging to different groups – that is, when people with
equivalent overall test performance but from different groups have a
different probability or likelihood of answering an item correctly.

Of course, the equivalence of the ability of the reference and focal
groups needs to be demonstrated independently of the assessment process –
and this creates a great area for heated debate and political
grandstanding.
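The definition can be illustrated with a small Python sketch based on a two-
parameter logistic item model. The difficulty values are invented purely for
illustration: the same item is made harder for the focal group than for the
reference group at every ability level, which is the pattern of uniform DIF
discussed in section 8.5.2.1.

import math

def p_correct(theta, difficulty, discrimination=1.0):
    """Two-parameter logistic model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical item: difficulty 0.0 for the reference group but 0.8 for the
# focal group -- people of the same ability have different probabilities of
# getting the item right, the signature of DIF.
for theta in (-1.0, 0.0, 1.0):
    ref = p_correct(theta, difficulty=0.0)
    foc = p_correct(theta, difficulty=0.8)
    print(f"ability {theta:+.1f}: reference {ref:.2f}  focal {foc:.2f}")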

There are several ways in which item bias can be demonstrated. Some
are based on expert judgements which are based on inspection and back
translation, while others are based on various forms of statistical
analysis. The statistical techniques are divided into two main categories:
non-parametric methods developed for dichotomously scored items
using contingency tables, and parametric methods for test scores with
interval-scale properties based on the analysis of variance (ANOVA).

8.5.1 Judgemental techniques


Judgemental approaches for determining the equivalence of measures
rely on the degree to which two or more experts in the area agree that
the measures are similar. The most common judgemental approaches to
identifying inequivalence involve experts in test construction, very
familiar with both the culture of origin and the target culture, who
inspect the items for cultural and linguistic equivalence. These
techniques include forward translation and back translation of test
items. Forward translation is done when the measure is translated from
the source language (SL) into the target language (TL) by a person (or
group of people) who is/are experts in both languages. In forward
translation, the original test in the source language is translated into the
target language and TL speakers are then asked to complete the measure.
They are questioned by the experts about their responses and their
understanding of the various items.

In back translation, the test is translated into the target language and then
it is re-translated by an independent expert back into the source
language. A panel of bilingual scholars then reviews the translated
version, and the back-translation is compared with the original to monitor
retention of the original meaning. An independent back translation
means that “an original translation would render items from the original
version of the instrument to a second language, and a second translator –
one not familiar with the instrument – would translate the instrument
back into the original language” (Geisinger, 1994, p. 306). Once the
process is complete, the final back-translated version is compared to the
original version (Brislin, 1980; Hambleton, 1994). Finally, the translated
version of the assessment is “tried out” with a sample of participants and
refined in the light of this experience. This process can be repeated
several times. The simplicity of this option has led to its widespread use.

8.5.2 Non-parametric statistical approaches


Whereas the judgemental approaches rely on the judgements of experts,
non-parametric statistical approaches look for differences in
the frequency with which test scores are given, using a contingency
approach and the chi-squared statistic. These patterns are based on
various factors such as age, gender and cultural-group membership –
when differences in predicted scoring patterns occur on the basis of
group membership, bias is identified. Three non-parametric approaches
can be identified, namely the Mantel-Haenszel (MH) approach, the
Simultaneous Item Bias Test (SIBTEST) and Distracter Response
Analysis (DRA).

8.5.2.1 The Mantel-Haenszel (MH) approach


The first non-parametric method for identifying DIF was developed by
Mantel and Haenszel (1959) (see also Holland & Thayer, 1988). The
Mantel-Haenszel (MH) approach uses contingency tables and is based
on the assumption that an item does not show DIF if the odds (or
chances) of getting an item correct are the same at all ability levels for
two matched groups of test-takers who differ only in terms of their
membership (call these two Group A and Group B). The pass/fail results
of Group A and Group B are tabulated in a two-by-two table for each
item and compared. This is repeated for each item in the measure.
Suppose there are 100 people in each group, and that 58 As and 23 Bs
get the item right and 42 As and 77 Bs get the item wrong. This is
shown in Table 8.2.

Table 8.2 Contingency table for Item 1

Group A Group B
Pass 58 23
Fail 42 77

Just by looking at this distribution, it is clear that the item is much easier
for members of Group A than it is for Group B, suggesting that Item 1 is
biased against the members of Group B. However, inspection alone is not
good enough and so the chi-square statistic is used. MH yields a chi-square
test with one degree of freedom to test the null hypothesis that there is
no relation between group membership and test performance on one
item after controlling for ability as given by the total test score. In other
words, an item is biased if there is a significant difference in the
proportions of each membership group achieving a correct or desired
response on each test item. Once the item has been examined in this
way, the next step is to compare the scores for Item 2 in exactly the
same way. This is continued for Item 3 and all other items, until they
have all been compared.
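As an illustration (assuming the SciPy library is available), a chi-square
test of independence can be applied directly to the counts in Table 8.2. This
simple sketch ignores the stratification by ability level that the full
Mantel-Haenszel procedure requires, and is included only to make the
contingency-table logic concrete.

from scipy.stats import chi2_contingency

# Pass/fail counts for Item 1 from Table 8.2 (rows: pass, fail; columns: A, B)
item1 = [[58, 23],
         [42, 77]]

chi2, p_value, dof, expected = chi2_contingency(item1)
print(round(chi2, 2), round(p_value, 4), dof)
# A significant chi-square suggests that passing the item is related to group
# membership -- but ability still has to be controlled for, which is what the
# Mantel-Haenszel procedure does by stratifying on total test score.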

In order to calculate the Mantel-Haenszel statistic, the following steps


need to be taken. Firstly, the test data must be coded and scored. Each
examinee must have (a) a code or label for group membership; (b) the
actual response (right or wrong) for each item; and (c) total score on the
test. Secondly, data for each item must be organised into a three-way
contingency table. Thirdly, the statistical analysis for detecting and
testing for DIF and item bias (chi-square) has to be conducted for each
item and each ability level.
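The stratified logic can be sketched in a few lines of Python using invented
counts for a single item at three ability levels. The sketch computes only
the Mantel-Haenszel common odds-ratio estimate; the associated chi-square
significance test described above is omitted for brevity.

# Hypothetical counts for one item, stratified into three ability levels.
# Each entry: (reference right, reference wrong, focal right, focal wrong)
strata = [
    (10, 20, 6, 24),   # low ability
    (25, 15, 17, 23),  # medium ability
    (35, 5, 28, 12),   # high ability
]

num = 0.0   # sum of A_k * D_k / T_k
den = 0.0   # sum of B_k * C_k / T_k
for a, b, c, d in strata:
    t = a + b + c + d
    num += a * d / t
    den += b * c / t

mh_odds_ratio = num / den
print(round(mh_odds_ratio, 2))
# A common odds ratio of about 1 means the odds of answering correctly are
# the same for both groups once ability is controlled for (no uniform DIF);
# values well above or below 1 point to DIF on this item.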

The method outlined above assumes that the amount of DIF is the same
across all members of Groups A and B, and that there is no interaction
between item difficulty and the members’ level of ability.
This assumption is termed a uniform DIF and exists when the
probability of answering an item correctly is greater for one group
consistently over all ability levels. In other words, uniform DIF occurs
when there is no interaction between ability level and group
membership. As Ekermans (2009) shows, uniform bias results from
differences in item difficulty as shown by differences in the regression
intercept of the observed item scores on the variable across different
sociocultural groups (the offset described in Chapter 3, section 3.6.3).
She argues further that if assumptions of scalar equivalence remain
untested, there is minimal impact on within-cultural group decisions.
This is because all scores will be affected in the same direction. At the
same time, in the absence of evidence of scalar invariance, between-
group differences may be incorrectly interpreted as showing real
differences between the groups (Cheung & Rensvold, 2002; Steenkamp
& Baumgartner, 1998; Van de Vijver & Tanzer, 2004). In the absence of
empirical evidence of metric equivalence of the measurements, the meaning
of any findings about group differences on the attributes being assessed,
and of the subsequent practical implications of the results in important
areas of functioning, is simply not known.

Non-uniform DIF occurs when there are differences in the probabilities
of a correct response for the two groups at different levels of ability (in
other words, when there is an interaction between ability level and group
membership). Non-uniform item bias has implications at the
measurement unit (or metric) equivalence level because the variables of
interest are not measured on the same metric scales across different
groups. As a result, assessment outcome decisions (e.g. personnel
selection, mental health status) that are based on the attribute measured
may not be meaningful where relative differences exist between groups.
The only way around this is to develop and use group-specific norms to
avoid adverse impact (as determined by similar selection ratios for
majority and minority groups).

When non-uniform DIF occurs or is suspected, it is necessary to
calculate DIF scores at the different levels of ability. To do this, the
whole sample must be divided into a number of subgroups (K) on the
basis of their ability scores (call these K1, K2, K3, etc.). The comparison
of responses for each item is then carried out for each of the ability
subgroups, so that the passes and fails for Groups A and B are compared
at ability level K1, then again at ability level K2, and so on for each
ability level. Then the whole process is repeated for Item 2 and Item 3,
and so on. As can be seen, this approach requires a two-by-two
contingency table for each item and each ability level. If there are 50
items and four subgroups (K = 4), the chi-square statistic must be
computed 50 × 4 or 200 times. However, as Gierl, Jodoin and Ackerman
(2000, p. 11) note, non-uniform DIF is quite rare in practice.
Nevertheless, an alternative approach that takes non-uniform DIF into
account is likely to be more useful.
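
The stratified procedure just described can be sketched as follows. Again the code is only an illustration: the scipy routine used is a standard chi-square test for a contingency table, and the pass/fail counts for the four ability levels are invented to show a pattern in which the size (and even the direction) of the group difference changes with ability – exactly the situation in which a single pooled analysis can mask non-uniform DIF.

import numpy as np
from scipy.stats import chi2_contingency

def per_stratum_chi2(tables):
    # Chi-square test carried out separately in each ability stratum, so that
    # DIF that changes size or direction across ability levels is not
    # averaged away, as it can be in a single pooled analysis.
    results = []
    for k, t in enumerate(tables, start=1):
        stat, p, _, _ = chi2_contingency(np.asarray(t), correction=True)
        results.append((k, stat, p))
    return results

# Hypothetical pass/fail tables [[A_right, A_wrong], [B_right, B_wrong]] for
# one item at four ability levels (K = 4)
strata = [[[10, 40], [18, 32]],   # K1: item slightly favours Group B
          [[25, 25], [25, 25]],   # K2: no difference
          [[38, 12], [28, 22]],   # K3: item favours Group A
          [[46, 4],  [30, 20]]]   # K4: the gap widens at high ability
for k, stat, p in per_stratum_chi2(strata):
    print("Ability level K%d: chi-square = %.2f, p = %.3f" % (k, stat, p))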

8.5.2.2 Simultaneous Item Bias Test (SIBTEST)


A second non-parametric statistical method for detecting DIF is the
Simultaneous Item Bias Test (SIBTEST) proposed by Shealy and Stout
(1993). SIBTEST, which is an extension of the MH approach, differs
from MH by using a more sophisticated matching process. They argue
that the observed ability score is not the best means of categorising the
ability groups as these scores contain an error component as shown in
Chapter 3, where it is argued that the Observed Score is made up of a
True Score, plus or minus an Error Score. As Zumbo (1999) correctly
notes, composite scores (i.e. scale total scores) are merely indicators of a
latent (unobservable) variable. The SIBTEST uses a regression estimate
of the true score instead of the Observed Score as the matching or
categorising variable. As a result, examinees are matched on an
estimated latent ability score rather than an observed score. An
advantage of this method is that SIBTEST can be used to evaluate DIF
in two or more items simultaneously in the analysis, whereas in the MH
approach a separate analysis has to be carried out for each item in the
test. SIBTEST does this by grouping the items into “testlets” or item
bundles (Douglas, Roussos & Stout, 1996).

Although MH has been the preferred method in DIF detection (Roussos
& Stout, 1996), researchers have shown that SIBTEST has superior
statistical characteristics compared to MH, especially for detecting
uniform DIF (Narayanan & Swaminathan, 1994; Roussos & Stout,
1996; Shealy & Stout, 1993).

8.5.2.3 Distracter Response Analysis (DRA)


A variant of the MH approach that can be used when multiple
alternatives are provided is known as Distracter Response Analysis
(DRA), which examines the incorrect alternatives or distracters to a test
item for differences in patterns of response among different subgroups
of a population. In the DRA, responses are analysed in terms of the null
hypothesis that there is no significant difference in proportions when
selecting distracters on the test items between the reference and focal
groups. As with MH, contingency tables are used and evaluated using
chi-square. In terms of this framework, no item bias occurs when there
is no significant difference in the proportion of the different groups
selecting particular distracters on the test items.
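
A minimal sketch of such an analysis for a single item is shown below. The counts are invented for illustration and the scipy chi-square routine is simply one convenient way of evaluating the contingency table; the rows are the reference and focal groups and the columns are the three incorrect alternatives chosen by examinees who got the item wrong.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical distracter choices for one multiple-choice item among examinees
# who answered incorrectly. A significant chi-square suggests that the
# distracters are attracting the two groups in systematically different ways.
distracter_counts = np.array([
    #  B   C   D   (the incorrect alternatives)
    [30, 25, 20],   # reference group
    [15, 40, 35],   # focal group
])
stat, p_value, dof, expected = chi2_contingency(distracter_counts)
print("chi-square = %.2f, df = %d, p = %.4f" % (stat, dof, p_value))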

8.5.3 Parametric approaches to DIF analysis


8.5.3.1 Item Response Theory (IRT)
Item Response Theory (IRT) is an extremely powerful theory that can be
used to detect bias, especially with large-scale testing programmes. The
basic argument of IRT is that the higher an individual’s ability level, the
greater the individual’s chance of getting an item correct. This is
understandable as people with higher scores can generally expect to get
more items right than those with lower scores. This relationship can be
shown graphically by plotting the ability level of the test-taker
(represented by the total score) on the x-axis, and the probability of
getting the item correct on the y-axis. Such a plot is known as an Item
Characteristic Curve or ICC. This is shown in Figure 8.2.
Figure 8.2 Item Characteristic Curve

As can be seen in Figure 8.2, the probability of doing well on Item 1
increases as the ability levels of the individuals taking the test increase –
low-ability individuals do relatively badly on the item, whereas high-
ability individuals do relatively well on the item. Item 2, on the other
hand, is far more difficult, as the probability of getting the item correct
remains low, irrespective of the respondents’ ability level. The slope of
the curve indicates the discriminating power of the item. Note that if the
curve is relatively flat then the item does not discriminate among
individuals with high, moderate or low total scores on the measure.

Zumbo (1999, p. 16) identifies a number of parameters that characterise
ICCs. These include the slope of the curve (which in the case of a
cognitive test indicates the ability of the item to discriminate between
individuals), the position along the axis (which represents the item
difficulty level, or conversely the ability level required to get the item
correct) and the minimum, non-zero level, which represents what
someone with zero ability would get – that is the guessing level. In the
case of a personality measure, these three parameters reflect firstly the
ability of the item to distinguish between people with a particular
characteristic and those without; secondly, the amount of the
characteristic that the person must have to endorse the item; and finally,
the likelihood that the person will endorse the item without due
consideration, as a result of social desirability, guessing and the like.
The meanings of these three parameters in both the cognitive and
personality domains are summarised in Table 8.3.

Table 8.3 Interpretation of ICC properties for cognitive and personality measures

ICC property | Cognitive, aptitude, achievement or knowledge test | Personality, social or attitude measures
Slope (commonly called the a-parameter in IRT) | Item discrimination – a flat ICC does not differentiate among test-takers | Item discrimination – a flat ICC does not differentiate among test-takers
Position along the X-axis (commonly called the b-parameter in IRT) | Item difficulty – the amount of a latent variable needed to get an item right | Threshold – the amount of a latent variable needed to endorse the item
Y-intercept (commonly called the c-parameter in IRT) | Guessing | The likelihood of indiscriminate responding or socially desirable responses
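
The three parameters in Table 8.3 correspond to the a, b and c parameters of what is usually called the three-parameter logistic (3PL) IRT model. The text does not give the formula, but in its standard form the probability of a correct response for a person with ability θ is P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))). The small sketch below is purely illustrative (the parameter values are invented) and shows how curves like Item 1 and Item 2 in Figure 8.2 can be generated.

import numpy as np

def icc_3pl(theta, a, b, c):
    # Probability of a correct response under the three-parameter logistic
    # model: a = discrimination (slope), b = difficulty (position along the
    # ability axis), c = pseudo-guessing (the lower, non-zero asymptote).
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

ability = np.linspace(-3, 3, 7)                  # ability levels from very low to very high
print(icc_3pl(ability, a=1.2, b=0.0, c=0.20))    # an item of moderate difficulty
print(icc_3pl(ability, a=1.2, b=2.5, c=0.20))    # a much harder item, like Item 2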

Against this background, it is relatively simple to demonstrate how ICCs
can be used to show DIF. Figure 8.3 shows the ICCs for a single item
for two groups (A and B) at various ability levels.

Figure 8.3 Item Characteristic Curve demonstrating item bias


Here it can be seen that the difficulty level of the item is greater for all
ability levels of Group B (squares) than it is for Group A (circles) – the
probability of getting the answer right is lower for each ability subgroup
in Group B than for the corresponding ability subgroup in Group A. In
addition, the difference in probability of being correct increases as the
ability of the groups increases – the gap between Group A and Group B
(i.e. the relative difficulty of the item) increases as the ability levels of
the groups increase. Remember, the two groups have been matched for
ability level in each of the five ability groups.

8.5.3.2 Logistic Regression (LR)


A different approach to detecting DIF is based on parametric analysis
(unlike MH and SIBTEST which are both non-parametric), and makes
use of Logistic Regression (LR) techniques (Swaminathan & Rogers,
1990). Logistic Regression is a kind of regression analysis often used
when the dependent variable is dichotomous and scored 0 or 1. It can
also be used when the dependent variable has more than two categories.
It is usually used for predicting whether something will happen or not –
anything that can be expressed as Event/Non-event. Independent
variables may be categorical or continuous. The logistic regression
approach compares group membership variables (such as gender,
ethnicity or age) and/or item parameters associated with two groups.
With LR, the presence of DIF is determined by testing the improvement
in model fit that occurs when a term for group membership and a term
for the interaction between test score and group membership are
successively added to the logistic regression model. When there is no
DIF, the Item Characteristic Curves (ICC) for the two groups will be the
same. The null hypothesis is that for two groups at a given ability level,
the population value is zero for either the difference between the
proportions correct or the log odds ratio on the test items between the
reference and the focal group. A chi-square test is used to evaluate the
presence of uniform and non-uniform DIF on the item of interest by
successively testing each term included in the model. LR can thus detect
both uniform and non-uniform DIF, which is an improvement over the
MH approach. (For a more detailed account of LR, see Foxcroft & Roodt, 2001,
pp. 97–101 and Foxcroft & Roodt, 2009, pp. 83–84.)
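
A compact sketch of this model-comparison logic is given below. It uses Python with the pandas and statsmodels libraries purely for illustration; the variable names ("correct", "total", "group") and the simulated data are assumptions, not part of any published procedure. Model 1 contains only the matching score, Model 2 adds group membership (a test for uniform DIF) and Model 3 adds the score-by-group interaction (a test for non-uniform DIF); each improvement in fit is evaluated with a one-degree-of-freedom chi-square.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

def lr_dif_test(data):
    # Nested logistic regressions for one item: score only, score + group,
    # score + group + interaction. The likelihood-ratio chi-square for each
    # added term tests for uniform and non-uniform DIF respectively.
    m1 = smf.logit("correct ~ total", data=data).fit(disp=0)
    m2 = smf.logit("correct ~ total + group", data=data).fit(disp=0)
    m3 = smf.logit("correct ~ total + group + total:group", data=data).fit(disp=0)
    uniform = 2 * (m2.llf - m1.llf)
    nonuniform = 2 * (m3.llf - m2.llf)
    return (uniform, chi2.sf(uniform, 1)), (nonuniform, chi2.sf(nonuniform, 1))

# Simulated example in which the item is uniformly harder for Group 1
rng = np.random.default_rng(1)
n = 400
total = rng.normal(0, 1, n)                      # matching (ability) score
group = rng.integers(0, 2, n)                    # 0 = reference, 1 = focal group
p_correct = 1 / (1 + np.exp(-(0.2 + 1.5 * total - 1.0 * group)))
data = pd.DataFrame({"correct": rng.binomial(1, p_correct),
                     "total": total, "group": group})
print(lr_dif_test(data))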

The decision whether to use non-parametric (MH and/or SIBTEST)
techniques or the parametric LR technique depends on the situation. As
Gierl, Jodoin and Ackerman (2000) have shown, LR is more powerful
than MH at detecting non-uniform DIF since the latter method was only
designed to detect uniform DIF (Rogers & Swaminathan, 1993).
However, although the IRT approach is superior theoretically and
clearly recommended in the literature (Shepard, Camilli & Williams,
1985, p. 84), it requires large sample sizes – the use of three different
parameters requires a minimum of 1000 cases per group. As
Schumacher (2010) notes,

[t]he Mantel-Haenszel method can be used with smaller sample sizes,
while logistic regression, which can be conceptualized as a link
between the contingency table method (Mantel-Haenszel) and IRT
method, offers a more robust solution under both uniform and non-
uniform DIF conditions.

Pedrajita and Talisayon (2009) point out that the common measure of
bias across these various approaches is the significance of the chi-square
value obtained. A significant chi-square value indicates: (1) difference in
proportion attaining a correct response across total score categories for
the X2 procedure; (2) difference in proportions selecting distracters for
the DRA; (3) difference in the odds of getting an item right between the
reference/focal groups compared for the LR; and (4) large DIF effect
for the MH statistic. They argue further that no one method is better than
any other, and that the argument for the presence of DIF is increased
when two, three or all of the four methods yield a statistically significant
chi-square value on an item or groups of items. They summarise the
various methods of DIF and their accompanying statistical analyses in
Table 8.4.

Table 8.4 Statistical criteria for identifying biased items

DIF approaches | Focus of analysis | Measure of bias
Chi-square | Differences in proportions attaining a correct response across score categories | Significance of chi-square
Distracter Response Analysis | Difference in proportions selecting distracters | Significance of chi-square
Logistic Regression | Odds of getting the item right | Significance of chi-square
Mantel-Haenszel | Performing chi-square statistical tests for DIF effect | Significance of chi-square

Source: Pedrajita & Talisayon (2009), Table 1, p. 25

8.5.3.3 Factor analysis


A third parametric approach to the detection of inequivalence is the use
of factor analysis. As shown in Chapter 5, section 5.2.1.3, one way of
showing that a measure has theoretical or construct validity is to show
that the factor structure of the new measure is very close to that of more
established measures assessing the same construct. This technique can
also be used to show equivalence using both Exploratory Factor
Analysis (EFA) and Confirmatory Factor Analysis (CFA).

Exploratory Factor Analysis: EFA can be used to check and compare
factor structures, especially when the underlying dimensions of a
construct are unclear. Groups can then be compared and the similarity
of the underlying structures can be taken as an indicator of the degree
to which the groups attach a similar meaning to the assessment.
Multiple groups can be compared either in a pairwise or a one-to-all
(each cultural group versus the pooled solution) fashion. Target
rotations are employed to compare the structure across countries and
to evaluate factor congruence, often by means of the computation of
Tucker’s phi coefficient (Van de Vijver & Poortinga, 2002). This
statistic examines the extent to which factors are identical across
cultures. Values of Tucker’s phi above 0,90 are usually considered to
be adequate and values above 0,95 to be excellent. Tucker’s phi can
be computed with dedicated software such as an SPSS routine (syntax
available from Van de Vijver & Leung, 1997a, and
http://www.fonsvandevijver.org) (He & Van de Vijver, 2012, p. 11); a
brief computational sketch is given after this list.
Confirmatory Factor Analysis: A more refined and theory-driven way
of examining the similarity of factor structures across different groups
is through the use of Confirmatory Factor Analysis (CFA, or
structural equation modelling). In general, the closeness of one factor
structure to another is demonstrated using CFA, in which the
“goodness of fit” between the two is determined using the chi-square
statistic (or χ2). This technique can be used to compare the
equivalence of factor structures in different cultural settings (Marsh &
Byrne, 1993) by showing the degree of similarity between the factor
structures obtained in the target group and the reference group. In this
respect, the cross-cultural equivalence of the two tests is seen as a
form of validity generalisation – is the test equally valid for both
groups? If the goodness-of-fit statistic shows an acceptable fit (0,90 or
higher), the hypothesis that the two structures are similar cannot be
rejected.
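
Before looking at CFA in more detail, the Tucker’s phi coefficient mentioned in the EFA approach above can be computed with a few lines of code. The sketch below is only an illustration (the loadings are invented and the function is not taken from any particular package): it takes the loadings of corresponding factors obtained in two groups, after target rotation, and returns their congruence.

import numpy as np

def tuckers_phi(loadings_a, loadings_b):
    # Tucker's congruence coefficient between the loadings of one factor in
    # two groups. Values above about 0.90 are usually regarded as adequate
    # and values above 0.95 as excellent.
    a = np.asarray(loadings_a, dtype=float)
    b = np.asarray(loadings_b, dtype=float)
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

# Hypothetical loadings of six items on the same factor in two cultural groups
print(tuckers_phi([0.71, 0.65, 0.58, 0.62, 0.55, 0.49],
                  [0.69, 0.61, 0.55, 0.66, 0.50, 0.47]))   # roughly 0.998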

CFA is more sophisticated than the EFA approach as it is based on
covariance matrix information to test hierarchical models (He & Van de
Vijver, 2011, p. 11). In addition to the use of the chi-square test to
determine goodness of fit, this can also be evaluated using the Tucker
Lewis Index (acceptable fit is indicated by values of above ,90 and
excellent above ,95), the Root Mean Square Error of Approximation
(RMSEA, with acceptable fit indicated by values of below ,06 and
excellent fit by values below ,04), and Comparative Fit Index
(acceptable above ,90 and excellent above ,95) (Kline, 2010). These
analyses can be carried out with software such as AMOS and Mplus
(Byrne, 2001, 2010).

According to Van de Vijver and Hambleton (1996), the advantage of
using CFA is that it allows for incomplete overlap of stimuli. However,
they point out (p. 10) that the amount of overlap in conceptualisation of
the construct or the extent of shared behaviours across cultures can be so
small that an entirely new instrument has to be assembled. This is most
likely to happen when an instrument that has been developed in one
cultural context, usually some Western country, contains various –
implicit or explicit – references to the local context of the test developer.

Of course, this approach does not identify particular items that behave
differently across the groups, although an examination of the items that
load differently on particular factors in the two factor analyses will point
to the differential behaviour of items. These may then be further
explored, as described in the previous section on DIF.

In conclusion, DIF is a strong indication that some items of the measure,
or the measure as a whole, may be biased against one of the
sociocultural groups being assessed. At the same time, DIF is a
necessary, but not sufficient, condition for item bias to exist. In other
words, if an item does not show DIF, then no item bias is present.
However, if DIF is detected, this is not sufficient reason to declare item
bias; it rather indicates the possibility that such bias exists and one
would have to apply follow-up item bias analyses (e.g. content analysis,
empirical evaluation) to determine the presence of item bias. Two
approaches to examining potential measurement bias have been
described, namely judgemental approaches and statistical approaches.
Judgemental methods rely solely on the opinions of one or more expert
judges to identify potentially biased items and are therefore essentially
impressionistic, whereas statistical approaches, which flag items or
measures that show potential bias and then probe them in greater depth,
are scientifically far more defensible.
8.6 Method bias*

As indicated in Chapter 3, section 3.4, assessments can be administered
in many ways, including the following:

Interviewing
Pencil and paper
Card sorting (e.g. sorting a pile of cards with an adjective on each into
piles such as “very much like me”, “somewhat like me” and “not at
all like me”)
Manual (e.g. fitting objects together to make a whole, such as jigsaw
puzzles, drawing lines/objects)
Computerised testing, including adaptive testing

Various problems are experienced when these different formats are used
in a cross-cultural context – these are termed method bias or instrument
bias* as discussed in section 8.2.3. Clearly when people are not used to
being assessed (i.e. are relatively low on test wiseness or test
sophistication*), they may suffer from test anxiety and as a result tend
to underperform. This may be particularly relevant when unfamiliar
formats such as written questionnaires and computer-based applications
are used, and less likely when interviews and assessment techniques based
on culturally familiar methods such as toys and the like are used. Novel
assessment techniques have used sand tray drawings, clay modelling,
models of animals and everyday objects, and so forth. Such techniques
have been used, inter alia, by Deregowski and Serpell (1971), who asked
Scottish and Zambian children in one condition to sort miniature models
of animals and motor vehicles, and in another condition to sort
photographs of these models. Many of these same techniques are used in
various forms of psychotherapy, including art therapy (e.g. Oaklander,
1997).

8.6.1 Detecting method bias


Van de Vijver and Hambleton (1996) also argue that an often-neglected
source of bias in cross-cultural studies is method bias. They identify
several approaches to this, including triangulation, response set
detection and non-standard administration.

8.6.1.1 Triangulation
In order to detect method bias, they argue for a process of triangulation
(e.g. Lipson & Meleis, 1989) using single-trait, multimethod matrices
(e.g. Campbell & Fiske, 1959; Marsh & Byrne, 1993). Unless these
different measures that are known to assess similar constructs yield very
similar outcomes, one or all of the methods used are likely to be suspect.
An alternative method is to use repeated test administrations and to
examine score patterns between two administrations. If individuals from
different groups with equal test scores on the first occasion have very
different scores on the second administration, the validity of the first
administration is open to doubt. They argue that this approach is
particularly useful for mental tests.

8.6.1.2 Measures of response sets


A second method for detecting method bias identified by Van de Vijver
and Hambleton (1996) involves the measurement of social desirability
or other response sets (e.g. Fioravanti, Gough & Frere, 1981; Hui &
Triandis, 1989). Should these scores be very different across the cultures
assessed, one can surmise that the assessment itself is behaving quite
differently in the different contexts.

8.6.1.3 Non-standard administration


Finally, method bias can be examined by administering the instrument in
a non-standard way, soliciting all kinds of responses from a respondent
about the interpretation of instructions, items, response alternatives and
motivations for answers. Such a non-standard administration provides an
approximate check on the suitability of the instrument in the target
group.
8.7 Addressing issues of bias and lack of
equivalence

In general terms, all psychological assessments require the assessors to
demonstrate the reliability, validity and fairness of the techniques used.
By extension, part of this requirement is that equivalence of the
assessment techniques used in a cross-cultural context also needs to be
demonstrated. Minimising bias in cross-cultural assessment usually
amounts to a combination of strategies: integrating design,
implementation and analysis procedures. Van de Vijver and Tanzer
(2004) have identified a number of strategies for describing and dealing
with the different biases outlined above.

According to He and Van de Vijver (2011), actions can or should be
taken to reduce or prevent low levels of equivalence from occurring at
various stages of the assessment process. They identify three such
stages, namely the design, implementation and analysis stages (see pp.
9–14). Although this categorisation makes sense, it is difficult to see
how actions taken at the analytic stage can reduce inequivalence – at
best, analysis will identify the presence, nature and extent of such
inequivalence.

8.7.1 At the design stage


The actions that can be taken at the design stage to ensure construct
equivalence in a cross-cultural comparative study fall into two broad
categories, namely decentring* and convergence* (Van de Vijver &
Leung, 1997a). According to Werner and Campbell (1970), cultural
decentring means that an instrument is developed simultaneously in
several cultures and only the common items are retained for the
comparative study; making items suitable for a cross-cultural context in
this approach often implies the use of more general items and the
removal of specifics, such as references to places and currencies when
these concepts are not part of the construct being measured. This is
essentially an adaptation approach. He and Van de Vijver (2011) point
out that large international educational assessment programmes such as
the Program of International Student Assessment (PISA), generally
adopt this approach, which involves committee members from target
cultures meeting to develop culturally suitable concepts and items.

When the convergence approach is used, instruments measuring similar
constructs are developed independently within cultures, and the various
instruments are then administered across the various cultures (Campbell,
1986). It is essentially a process of assembly and then adoption. An
example of this is given by He and Van de Vijver (2011, pp. 9–10) when
they describe a study by Cheung, Cheung, Leung, Ward and Leung
(2003). Both the NEO-Five Factor Inventory (NEO-FFI) (a Big Five
measure developed and validated mostly in Western countries) and the
Chinese Personality Assessment Inventory (CPAI) (which was
developed in the Chinese context) were administered to both Chinese
and Americans. Joint factor analysis of the two personality measures
revealed that the Interpersonal Relatedness factor of the CPAI was not
covered by the NEO-FFI, whereas the Openness domain of the NEO-
FFI was not covered by the CPAI. Consequently, one can expect that
merging items from the measures developed in distinct cultural settings
may show a more comprehensive picture of personality than when the
measure is developed in one setting and then adapted for use in others.

8.7.2 At the implementation stage


Because the interaction between administrators and respondents can be a
significant source of error variance, the right administrators/interviewers
should be selected so that the respondents feel at ease and do not
experience any cultural barriers (Brislin, 1986). As shown in section
8.3.3, an important source of inequivalence that arises during this
implementation stage is method bias, which refers to problems caused
by the manner in which a study is conducted (method-related issues).
Four types of method bias are identified, namely sample bias,
instrument bias, response sets and administration bias. Steps need to be
taken to address each of these components.

Sample bias arises when sample parameters differ systematically
between the people being assessed and those for whom the assessment
process was initially developed. These differences may be the result of
educational levels, urban versus rural residency and religious affiliation,
or even intensity of religious belief. To address the issue of sampling
bias, Boehnke, Lietz, Schreier and Wilhelm (2011) suggest that the
sampling of cultures should be guided by research goals (e.g. select a
broad cultural spectrum if the goal is to establish cross-cultural
similarities and far more homogeneous cultural groups if cultural
differences are being looked for). When participants are recruited using
convenience sampling, the generalisability of findings to their
population needs special attention. Accordingly, sampling must be
guided by the distribution of the target variable being assessed.
Convenience sampling must be tempered by the nature of the
characteristic being investigated in order to match the two samples as
closely as possible. If this matching strategy does not work, it may well
be possible to control for factors that induce sample bias so that a
statistical correction for the confounding differences can be achieved.
For example, educational quality has a significant impact on the
assessment of intelligence, and therefore the nature, quality and extent
of education must be collected for later use as possible moderating or
adjustment variables. In this respect, He and Van de Vijver (2011) show,
with reference to a study by Blom, De Leeuw and Hox (2011), how, when the
non-response information from the European Social Survey (see
http://www.europeansocialsurvey.org for more details) was combined
with a detailed interviewer questionnaire, systematic country differences
in non-response could in part be attributed to interviewer characteristics
such as contacting strategies.
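
To illustrate what such a statistical correction might look like, the sketch below (Python with pandas and statsmodels; the variable names and the simulated numbers are invented for this example) compares two groups on a test score with and without years of education entered as a covariate. In the simulated data the raw group difference is driven almost entirely by schooling, so the "group" coefficient shrinks sharply once education is controlled for.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 300
group = rng.integers(0, 2, n)                         # two sociocultural groups
education = 8 + 4 * group + rng.normal(0, 2, n)       # the groups differ in schooling
score = 40 + 2.5 * education + rng.normal(0, 5, n)    # the score depends on schooling only
data = pd.DataFrame({"score": score, "group": group, "education": education})

raw = smf.ols("score ~ group", data=data).fit()                   # unadjusted comparison
adjusted = smf.ols("score ~ group + education", data=data).fit()  # education controlled
print(raw.params["group"], adjusted.params["group"])              # the adjusted gap is near zero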

As we have seen, instrument bias arises when the assessment method
used behaves differently across the different groups, as illustrated by
Deregowski and Serpell’s (1971) findings in respect of Scottish and
Zambian children’s ability to sort photographs and models of animals
and motor vehicles.

Response sets refer to systematic differences in the tendency to respond
in particular ways. These need to be identified early and response
formats adjusted accordingly. For example, if particular groups are
known to agree with everything (acquiescence response set), care must
be taken to ensure that equal numbers of positively and negatively
phrased items are presented in the assessment instrument. These paired
items need to be interrogated to ensure consistency of response. Because
second guessing and/or presentation of self may be a problem, a few
distracter items should be included to ensure that this is minimised. A
useful technique in this respect is to label the instrument in some
innocuous way – instead of labelling the scale “Integrity”, “Job
Satisfaction” or “Trust in Minorities”, the instrument could simply be
labelled “Attitudes to Others”, “Attitudes to Work”, and so forth.
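
One very simple way of screening for an acquiescence response set with such a balanced scale is sketched below. The function and the data are invented for illustration: each respondent’s raw answers to matched positively and negatively phrased items are averaged before reverse-scoring, so that someone who is responding to content should land near the scale midpoint, while someone who agrees with everything will drift towards the top of the scale.

import numpy as np

def acquiescence_index(responses, positive_items, negative_items):
    # Mean raw agreement (before reverse-scoring) across a balanced set of
    # positively and negatively phrased items on a 1-5 agreement scale.
    # Content-driven responding should average near the midpoint of 3;
    # markedly higher values suggest agreement regardless of item direction.
    items = list(positive_items) + list(negative_items)
    return np.asarray(responses, dtype=float)[:, items].mean(axis=1)

# Hypothetical data: three respondents, items 0-1 positively phrased and
# items 2-3 negatively phrased versions of the same content (1-5 scale)
resp = [[4, 5, 2, 1],    # content-consistent responding (index = 3.0)
        [5, 5, 4, 5],    # agrees with everything (index = 4.75)
        [2, 2, 4, 4]]    # content-consistent, low standing on the trait (index = 3.0)
print(acquiescence_index(resp, positive_items=[0, 1], negative_items=[2, 3]))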

Administration bias is a form of method bias that comes about as a result
of various administration practices and conditions (e.g. data collection
modes, class size), ambiguous instructions, interaction between
administrator and respondents, and communication problems (e.g.
language difference, taboo topics), to name a few. This is not only a
question of the administrator’s language ability, but also involves
sensitivity to important aspects of culture, the avoidance of
inappropriate modes of address or cultural norm violations by the
interviewer, all of which can seriously endanger the collection of
appropriate data, even in very structured assessment situations. For
example, male medical staff may experience difficulty in collecting
sexual or other sensitive information from female participants,
especially in conservative societies. In this regard, a study by Davis and
Silver (2003) revealed that, in answering questions regarding political
knowledge, African-American respondents got fewer answers right
when interviewed by a European-American interviewer than when the
information was collected by an African-American interviewer.

In order to minimise this form of bias, a standardised administration
protocol should be developed and adhered to by all assessors. The
establishment of rapport between the administrators and those being
assessed is always crucial but is of particular importance when assessing
cross-culturally. Ensuring proper administration can help minimise the
various response biases that can affect the interpretation of cross-cultural
assessment processes. This must also involve the provision of clear
instructions with sufficient examples. A point that is often overlooked is
the value of “warm-up” or practice exercises in building rapport and
ensuring that the assessment procedures are understood; these need to
be given at the outset of any assessment process. Needless to say, the
results from these practice components are not analysed.

These various forms of bias and the strategies for reducing them are
shown in Table 8.5.

Table 8.5 Strategies to reduce bias in cross-cultural assessment

Construct bias:
– Decentring (i.e. simultaneously developing the same instrument in
several cultures)
– Convergence approach (i.e. independent within-culture development
of instruments and subsequent cross-cultural administration of all
instruments)

Construct and/or method bias:
– Use of informants with expertise in local culture and language
– Use of samples of bilingual subjects
– Use of local surveys (e.g. content analyses of free-response questions)
– Non-standard instrument administration (e.g. thinking aloud)
– Cross-cultural comparison of nomological networks (e.g.
convergent/discriminant validity studies, monotrait-multimethod
studies, connotation of key phrases)

Method bias:
– Extensive training of administrators (e.g. increasing cultural sensitivity)
– Detailed manual/protocol for administration, scoring and interpretation
– Establishing rapport through cultural sensitivity and practice items
– Detailed instructions (e.g. with a sufficient number of examples and/or
exercises)
– Use of subject and context variables (e.g. educational background)
– Addressing sample issues
– Use of collateral information (e.g. test-taking behaviour or test attitudes)
– Assessing response styles
– Use of test-retest, training and/or intervention studies

Item bias:
– Judgemental methods of item bias detection (e.g. linguistic and
psychological analysis)
– Psychometric methods of item bias detection (e.g. Differential Item
Functioning analysis)
– Error or distracter analysis
– Documentation of “spare items” in the test manual which are equally
good measures of the construct as actually used test items

Source: Van de Vijver & Tanzer (2004)


Reproduced from Van de Vijver, F. J. R., & Tanzer, N. K. Bias
and equivalence in cross-cultural assessment: An overview.
Copyright © 2004. Elsevier Masson SAS. All rights reserved.

8.7.3 At the analysis stage


As He and Van de Vijver (2012) show, there are a number of different
ways of showing the existence of bias at the analysis stage. Among the
most important ways that they identify are exploratory factor analysis
(EFA) and confirmatory factor analysis (CFA) for different levels of
equivalence and differential item functioning (DIF) analysis for
detecting item bias as outlined above in section 8.5.3. In brief, EFA can
be used to check and compare factor structures. However, they argue
that CFA is a better approach as a good fit between the two factor
structures suggests that there is a high level of equivalence across the
cultural groups. If a discrepancy exists between the two groups, DIF
analysis can be used to identify anomalous items. (Recall that DIF
indicates that respondents from different groups have differing
probabilities of getting an item correct or endorsing the item, after
matching on the underlying ability or latent trait that the item is intended
to measure (Zumbo, 1999).) In this analysis, scales should be uni-
dimensional; for multi-dimensional constructs, DIF analyses can be
performed on each dimension (see section 8.5.3 above).

In conclusion, the various sources of bias and inequivalence need to be
understood, assessed and combated wherever possible. In addition, the
assessment process needs to be carefully documented, and feedback
from respondents about their experience of the assessment process
should be collected for further analysis and, where the effects cannot be
prevented, these data can be used to account for, and adjust for, any
systematic differences that may be identified (He & Van de Vijver,
2012, p. 11).
8.8 Summary

The need to assess people from minority or immigrant communities
arises as a result of various factors such as migration, natural disasters,
warfare, and the like. With particular reference to the job situation, most
parts of the developed world have seen large numbers of immigrants for
economic reasons, and it is often necessary to determine their suitability
for employment and education/training. As a result, occupational
psychologists and other selection specialists are increasingly being
confronted with the need to assess people whose home language and
cultural background are very different from the dominant ethos in which
the assessment techniques were conceived and/or are administered.

In looking at how this assessment takes place, Van de Vijver and
Hambleton (1996) have identified three approaches which they term
Apply, Adapt and Assemble, although in this text the third approach
(namely Assemble) has been divided in two to yield Develop Culture-
Friendly Tests and Develop Culture-Specific Tests. In order to explain
why any assessment technique cannot be blindly applied in contexts for
which it has not been designed, three distinct sources of bias and
unfairness, namely construct bias, item bias and method bias, have been
identified. The presence of these sources of bias affects the equivalence
of the assessment techniques and outcomes when used in different
sociocultural groups. In this respect, three kinds of equivalence have
been identified and linked in a hierarchy of increasing importance (Van
de Vijver & Poortinga, 1997; Van de Vijver & Leung, 1997a, 1997b).
These levels are: construct equivalence, measurement unit equivalence
and scalar equivalence.

Various methods of detecting and measuring the extent of equivalence
across cultural groups take the form of differences in item means and
standard deviations, and various non-parametric techniques based on chi-
square expectancies*, item-whole correlations*, factor-loadings* and
item characteristic curves* (ICC). Perhaps the most widely used
technique in this regard is Differential Item Functioning (DIF), which
refers to the differing probabilities of success on an item of people of the
same ability but belonging to different groups – that is, when people
with equivalent overall test performance but from different groups have
a different probability or likelihood of answering an item correctly.

There are several ways in which item bias can be demonstrated. Some
are based on expert judgement, using inspection as well as forward
and back translation, while others are based on various forms of
statistical analysis. The statistical techniques are divided into two main
categories: non-parametric methods developed for dichotomously scored
items using contingency tables, and parametric methods for test scores
with interval-scale properties based on the analysis of variance
(ANOVA). Non-parametric statistical approaches look for group differences
in the frequencies of correct and incorrect responses, using a contingency
approach and the chi-square statistic. There are three such non-
parametric approaches, namely the Mantel-Haenszel (MH) approach, the
Simultaneous Item Bias Test (SIBTEST) and Distracter Response
Analysis (DRA). The best known of the non-parametric techniques is
the Mantel-Haenszel statistic, which uses chi-square to test the null
hypothesis that there is no relation between group membership and test
performance on one item after controlling for ability as given by the
total test score. In terms of MH, an item is biased if there is a significant
difference in the proportions of each membership group achieving a
correct or desired response on each test item. Once an item has been
examined in this way, the process is continued until all items have been
compared.

Parametric approaches to DIF analysis make use of Item Response
Theory (IRT), which is an extremely powerful theory that can be used to
detect bias, especially in large-scale testing programmes. The basic
argument of IRT is that the higher an individual’s ability level, the
greater the individual’s chance of getting a more difficult item correct,
and the less likely it is that a person with lower ability would get the
more difficult items correct. This relationship can be shown graphically
by plotting the ability level of the test-taker (represented by the total
score) on the x-axis, and the probability of getting the item correct on
the y-axis. Such a plot is known as an item characteristic curve or ICC.
If this pattern of responses to items of equal difficulty differs across
cultural (or other) groups, then it is clear that the items are behaving
differently for the different groups – this is what is meant by Differential
Item Functioning or DIF.

DIF is a strong indication that some items of the measure, or the
measure as a whole, may be biased against one of the sociocultural
groups being assessed. At the same time, DIF is a necessary, but not
sufficient, condition for item bias to exist – if DIF is detected, this is not
a sufficient reason to declare item bias, but indicates the possibility that
such bias exists and various other techniques should be used to
determine if item bias is present. Factor analysis is one such measure
that could be used.

In order to detect whether method bias is present, Van de Vijver and
Hambleton (1996) suggest several approaches, including triangulation,
response set detection and non-standard administration.

Finally, He and Van de Vijver (2012) identify three actions that
can/should be taken to reduce or prevent inequivalence from occurring
at various stages of the assessment process, namely at the design,
implementation and analysis stages.

Additional reading

For good insight into the use of psychometric scales across cultural boundaries, see
Douglas, S.P. & Nijssen, E.J. (2002). On the use of ‘borrowed’ scales in cross-national
research: a cautionary note. International Marketing Review, 20(6), 621–642.

Test your understanding

Essays
1. In the light of the theories discussed in this chapter, revisit Case study 6.1 (p. 72) in
Chapter 6 and suggest how you would demonstrate the cross-cultural equivalence of
the Trauma Symptom Inventory.
2. Suppose that you want to compare two countries on individualism–collectivism and
its effect, if any, on workplace behaviour, bearing in mind that the samples in one
group have on average a higher level of education than the samples in the other
group. Discuss how this difference could challenge your findings and how you could
try to disentangle educational and cultural differences.
3. Suppose that you wanted to investigate the conformity levels of employees in your
organisation, which has sizable groups of people from Eastern Europe, Asia, the US
and South Africa. How can sources of method bias be controlled in a cross-cultural
study such as this? Discuss procedures at both the design and analysis stages.
9 Managing the assessment
process

OBJECTIVES

By the end of this chapter, you should be able to

say why strict administration procedures should be followed
show how errors and biases must be controlled
show what steps need to be taken during the preparation, administration and
scoring of the assessment process
describe how to deal with irregularities and special cases
describe the statutory control of assessment instruments and practitioners
show how South Africa relates to other countries in the control of the assessment
process.

9.1 Introduction

As already stated, assessing people, especially with regard to sensitive
issues, can affect their lives and careers, and therefore we must take care
and adopt a professional approach at all times. Furthermore, we must
also try to minimise the error component that we know exists in the
assessment process. To be fair to everybody being assessed, it is
essential that the whole process, from choice of techniques through to
the administration, scoring and interpretation of the results, be as
uniform as possible. (Note, however, that this strict rule may have to be
relaxed in the interests of fairness and reasonable accommodation.)
9.2 Important standardisation procedures

In order to be as fair as possible to all the people being assessed,
especially when their assessment results are being compared, we need to
ensure that the assessment process is controlled and as similar for
everyone as possible. In general, the assessment process can be divided
into three main stages:

1. Before the testing session begins (preparation)
2. During the testing session (introduction, instructions, time limits and
dealing with questions or problems)
3. After the session (scoring, test security)

9.2.1 Preparation
Preparation for any assessment session involves ensuring that all the
required materials are available and that the venue is appropriate for the
assessment. The following aspects need special attention:

9.2.1.1 Choice of techniques and sequence


To treat everyone being assessed in the same manner, the first thing we
need to ensure is that all the people are assessed using the same
techniques in the same order. When people are assessed, they generally
start off slowly until they get used to the process(es) involved. Therefore
the results of the earlier techniques tend to be less accurate and less valid
than the later ones. However, people also get tired, so that the results of
the last assessment processes in a long session may also be less valid
and accurate. It is for this reason that the sequencing of the assessments
must be carefully considered and the same sequence used for all
participants.

As a general rule, cognitive assessments that require a great deal of
concentration should be given earlier in the sessions, and easier untimed
assessments such as personality inventories and interviews given later. Also
remember that a relatively easy task should be given first as a warm-up exercise,
especially where anxious and/or test-unsophisticated people are being assessed.
It may even be a good policy not to score this first exercise. It is also important
that adequate breaks or rest periods be given. Most assessment batteries specify
the breaks, usually 15–20 minutes every two hours or so. Obviously, more
vulnerable people (children, older people and those not used to being assessed)
will require more frequent and longer breaks than those who are less vulnerable.

9.2.1.2 Materials
The materials should be in good condition and the same for everybody
being assessed. Obviously, different groups of people may need
different versions of the material at different times, but it is unfair to
those being assessed and difficult for the assessor to interpret the results
of an assessment if person A is assessed in one way and person B, who
is being assessed for the same purpose, is assessed in another way.

The person in charge of the assessment process (the administrator) needs
to ensure that all materials required for the assessment session are in
order. This means having sufficient booklets, answer sheets, pencils,
erasers, calculators, and so on, available and in good working order (e.g.
pencils should be sharp and kept so throughout the assessment session;
the batteries in any calculators must be sufficiently charged). If any
exercises are timed, a stopwatch must be available and in good working
order. Where rough paper for calculations is required, this must be
provided. Where apparatus is used, this should be checked to ensure that
all its parts (jigsaw pieces, etc.) are present and in good condition. Part
of the administrator’s materials is a log sheet or control sheet in which
the sequence, duration, and start and stop times of each test or other
technique as well as any unusual occurrences are recorded.

9.2.1.3 Instructions
The administrator should be thoroughly familiar with the instructions
and must adhere to them as closely as circumstances permit. Prior to the
assessment session, he must ensure that, besides the instructions, he is
also thoroughly familiar with the material, the time allocated for the
various tests, and so forth.

In terms of South African law, people being assessed for any purpose
need to give their permission for this to take place, and therefore it is
essential that participants sign a document indicating that they consent
to being assessed and that their results can be used for the specified
purpose. Those not willing to sign such a document may be asked to
leave the assessment venue. Of course, parents of children below the age
of majority (18 years) may sign on their behalf. People who are unable
to sign on their own behalf for reasons of enfeeblement, as well as
people referred for assessment by the state, are regarded as having given
their rights to the state. Remember that, in terms of the Employment
Equity Act 55 of 1998, any job applicant has the same rights as people
already in the organisation’s employment. As we argue later, a
practitioner should conduct each assessment as if he may be required to
justify his actions in a court of law.

9.2.1.4 Venue
The venue in which the assessment takes place should be well lit, well
ventilated and at a comfortable temperature, and must be large enough to
accommodate all the people being assessed in relative comfort. It must
also be relatively free of distractions such as noise and interruptions – no
telephones or other noises should be allowed to interfere with the
assessment. Participants (and the
assessor) should make sure that their cellphones are switched off. Any
exceptional occurrences that take place (power outages, excessive noise,
distractions, etc.) should be noted.

9.2.2 Administration
The second aspect in which as much uniformity and control as possible
should be exercised is the actual administration of the technique(s) being
used. The following aspects all require attention:

9.2.2.1 Establishing rapport


Once the participants have entered the venue and taken their seats, the
first thing the administrator should do is to establish rapport with them.
He does this by welcoming the participants to the session and
introducing himself to them, before briefly explaining the purpose and
methods of the session. Any house rules, such as when the breaks will
occur, what to do about toilet needs, and so forth, must be addressed at
this point. Participants with cellphones must be instructed to switch
them off. If any participants are likely to experience a language problem,
it is essential that a competent translator or interpreter be present. If this
is likely to be a significant problem, it is probably better to separate
those who are fluent in the language of administration from those who
are not, and to conduct separate assessment sessions with the two
groups. As stated earlier, people being assessed need to give their
consent to the assessment and to the uses to which the results will be
put. They need to understand the processes involved and to give their
written permission at this point.

9.2.2.2 Ensuring task instructions are understood


A crucial role of the administrator is to ensure that the instructions are
clearly understood by all participants. This may require him to repeat the
instructions several times. As already stated, administrators should
follow the instructions laid out in the administration manual as strictly as
possible. However, in the interests of fairness, people experiencing
difficulty should be given additional help until such time as they grasp
what is required. Here the administrator may have to go beyond the
instructions in the manual. This is acceptable, provided he does not
directly show the participant how to answer the question – he may have
to use his judgement in thinking of alternative explanations or ways of
demonstrating what is required.

If the participant is still unable to grasp the requirements of the process,
the administrator may simply have to ignore the problem and let the rest
of the group proceed with the task. This decision should be noted on the
control sheet. It is precisely for this reason that many assessments use
the first exercise as a warm-up or practice run. The results of this are
ignored.

This is especially necessary when people who are somewhat anxious or
inexperienced in these situations are being assessed. If it is absolutely essential
to assess the person, it is a good idea to reschedule the assessment for another
date, and to administer the assessment battery* in a one-on-one situation.

In the case of people with special needs, reasonable accommodation
must be made. Case study 6.1 cites the example of a person who lost his
dominant arm in a bus accident and was assisted with the writing down
of answers. This was reasonable, because the task was concerned with
cognitive processes rather than the motor process of writing.

9.2.2.3 Monitoring
During completion of the exercises, the administrator should walk
around and ensure that the answers are being completed in the correct
fashion. Close attention must be paid to ensure that the participants are
answering in the correct place on the answer sheet – participants
sometimes work across the page instead of down the columns. If more
than about 20 people are being assessed in a single venue, there should
be assistant administrators – ideally, one administrator for every 20–25
people being assessed in a group situation. It is important that the
administrator ensures that the participants do not copy from one another,
particularly on cognitive tests.

9.2.3 On completion of the assessment


9.2.3.1 Collecting the material
Once the assessment session has been completed, the administrator has
several important tasks. Firstly, he must collect the material and pack it
away securely. He must ensure that all materials are returned, especially
booklets and answer sheets, and also pencils, erasers, calculators, and so
forth. He should inspect the question booklets to see that no marks have
been made in them. Any marks must be erased.

Tip
Sometimes when marks have been made and then erased, indentations remain
on the page, allowing subsequent participants to read the marked answers. A
way around this is to make similar marks for all possible answers in the booklet
and then to erase them all. In this way, the next participant will not have any help
in choosing the correct answer – all possible answers will have been marked!

9.2.3.2 Security of the material


Once the material has been collected and checked, it must be carefully
stored in a secure place to prevent unauthorised access to it. Failure to
do this may result in the integrity and validity of the material being
compromised. In one case, the test material used in a major industry was
widely available to people who were to be assessed, and there was a
thriving industry in coaching people on the material. As a result the
predictive validity of the assessment process was substantially reduced.

9.2.3.3 Scoring
The scoring of the material is a primary source of error in assessment
and therefore great care must be taken with this aspect. It must be
carefully and accurately done, and checked by a second person to ensure
accuracy. This is especially necessary where items are scored
subjectively.

9.2.4 Interpretation of the results


As we saw in section 6.4, the interpretation of any assessment results
using a normative approach depends to a large degree on the use of the
appropriate norms, and so the choice of the norm group is a vital aspect
of test interpretation. It is for this reason that the interpretation of
assessment results is limited to properly qualified psychologists. Section
9.5 deals with the control of the practice of psychology in South Africa.

9.2.5 Feedback of results


The final stage in the assessment process is to provide feedback on the
outcome of the assessment. In many circumstances, this is given directly
to the individual(s) concerned. However, care must be taken with this
process. Firstly, specific raw scores or even normed scores (such as
stanines) should not be given to the person – at best, he should be told
that he scored above average or slightly below average, or whatever the
case may be. The second aspect is closely related to this: the information
should be given in as positive a light as possible so as to maintain the
self-esteem of the person.

Although the tendency in industry is to regard assessment results as the property
of the organisation (especially where this has been obtained for selection
purposes), South African labour legislation makes it clear that the person being
assessed has a right to this information.

Note that the party that pays for the administration of the assessment
process has a right to the report, while the participant has the right to
feedback. However, the participant cannot prescribe the nature or the
format of the feedback or how he is to access the information.

9.2.6 Confidentiality of results


A key area of concern is that once assessments have been scored, the
results should be treated as confidential and should not be made
available to unauthorised people, including the participant’s own
superior, unless he is a registered psychologist. This may sometimes be
difficult. In these cases, or where a legal entity such as a court of law
demands to see the results, practitioners should express strong
reservations about making any results available. In many ways, these
results should be treated with the same confidentiality that a lawyer
treats the information he receives from a client, or the confidentiality
with which a minister of the church treats information from a
parishioner. Assessment results must be kept in a separate filing cabinet
and should not be part of an employee’s general personnel file.

9.2.7 Dealing with special situations or participants


Sometimes it is necessary to assess people who are physically
challenged or who have specific cognitive problems, such as low levels
of literacy or skill in the language in which the assessment is conducted.
This is particularly important at this stage in South Africa’s
development for several reasons. Firstly, nearly all verbally based
assessment material is in English or Afrikaans, and many people being
assessed are not fluent in these languages. Secondly, the Constitution of
the country and the Employment Equity Act stipulate that organisations
must ensure that physically challenged people are employed. Thirdly,
there is a skills shortage in the country, and so every person with skills
should be given the opportunity to exercise these in the marketplace.

The key to this situation is the principle of reasonable accommodation, which is the process of bending the rules slightly to accommodate the
person’s disability, while at the same time not affecting the outcome of
the assessment in any way. (In Case study 6.1, in which Sipho lost his
dominant hand in an accident, we saw how, while assessing his ability to
concentrate for a long time, the assessor wrote down his answers for
him. This is a good example of reasonable accommodation.) Scores of
standardised tests should be interpreted with caution: the error
component for all disadvantaged people is much higher than for the
general population.

Let us consider briefly some of the more common disabilities one is likely to encounter in the occupational setting.

9.2.7.1 Language
We have already noted that we live in a multilingual country in which
many potential employees lack fluency in English or Afrikaans, the
languages in which almost all assessments are conducted. In section 2.3
of the International Test Commission (2001), which is concerned with
fairness in test application, the following statements are made:

When testing in more than one language (within or across countries), competent test users will make all reasonable efforts to ensure that:

– Each language or dialect version has been developed using a rigorous methodology meeting the requirements of best practice.

– The developers have been sensitive to issues of content, culture and language.

– The test administrators can communicate clearly in the language in which the test is to be administered.

– The test taker’s level of proficiency in the language in which the test will be administered is determined systematically and the appropriate language version is administered or bilingual assessment is performed, if appropriate.

Basically, there are three ways around the problem of language ability.

Firstly, when the assessment involves interviewing and narrative tasks (such as the Thematic Apperception Test (TAT)), a suitably qualified
and experienced native speaker of the language of the person being
assessed must act as an interpreter.
Secondly, where possible, assessment tasks that do not involve
language (such as the Raven’s Progressive Matrices and various
assembly or rotational tasks) should be used. Even here there is some
debate, as performance on these tasks may need verbal rules for their
successful completion. There is some evidence that people from
socially deprived backgrounds often lack the vocabulary required to
conceptualise the tasks in their heads.
Thirdly, when a language-based task has to be completed, some form
of language assessment must be conducted before the assessment
itself is done. If the person’s language ability is inadequate, he should
be excluded from further assessment. Of course, we need to bear in
mind that it is necessary to show that ability in the area that was
assessed (e.g. language) is an integral part of the job – what the
Employment Equity Act terms “an inherent requirement of the job”.

9.2.7.2 Physically disabled


People with physical disabilities need to be assisted as far as possible
within the framework of reasonable accommodation. A writer should be
made available to assist with writing tasks where this is needed. More
importantly, the choice of instrument or assessment technique should be
such that the administration of the assessment deviates as little from the
standardised administration process as possible.
9.2.7.3 Hard of hearing
People who are hard of hearing must be allowed to use devices to
amplify the assessor’s instructions. In addition, the instructions should
be available in written form – this may require some additional
preparation by the assessor prior to the session. However, we should
also be aware that providing written instructions may introduce another
potential source of error, namely the person’s ability to read and
comprehend the instructions. It may be necessary to have a person who
can use sign language. When interpreting results, we need to remember
that the communication of deaf people is often fragmented, which may give the impression that the person is less intelligent than he really is.

9.2.7.4 Visually impaired


People who are blind or partially sighted should be assessed in a
situation that is free of distractions as visually impaired people are
usually easily distracted. Where possible, material should be adapted by
enlarging the print. (Photocopying test booklets and other instruments to
a larger size is permissible in these cases. Of course, it is important to
test the person’s ability to read print of various sizes beforehand so that
a suitable font size can be established.) In some cases it may be
necessary to get a reader and/or writer in to read the questions and
record the answers as required. If available, Braille versions of the
instruments can be used.

To conclude, people with special needs must be treated differently, within the framework of reasonable accommodation. We must also remember that the responses of such people are more prone to error, and
so the standard error of measurement will be much higher than for the
general population. Wherever possible, make use of specially developed
instruments.

9.3 Setting and keeping ethical standards


The ethical use of psychological assessment processes is of great
importance and deserves serious consideration. The issues constitute the
duties of the assessor and can be broken down into four main areas: the
choice of techniques; the administration of the assessment process; the
scoring, interpretation and feedback of the results; and the security of
the material. (See McIntire & Miller, 2000, especially pp 55–56.)

9.3.1 The choice of techniques


Assessors should

know the strengths and limitations of the techniques they choose, and
should use only those that have been properly validated for the
purposes for which they are being used and for the target population
being assessed
base all decisions on as wide a source of information as is feasible
be properly trained and competent to use (i.e. administer, score and
interpret) the various techniques
take into account any form of bias known to exist when interpreting
the results to ensure that these measures are as free of discrimination
as possible
regularly update their knowledge in the areas and techniques they are
using.

9.3.2 The administration of the assessment process


Assessors should do the following:

Ensure that the venue is adequate for the assessment process and
allows optimum performance.
Ensure that all the assessment materials are available in sufficient
quantity and are of good quality.
Establish rapport with the people being assessed.
Take into account the needs of people who are physically challenged.
Ensure that suitably trained interpreters are available if they suspect
that language ability may be a problem.
Ensure that they are thoroughly familiar with the instructions and do not deviate from or modify them, except when the requirements of reasonable accommodation demand this. If they are in doubt, they should consult a more senior or experienced colleague.
Make sure that all participants clearly understand the task
requirements.
Ensure that time limits are strictly adhered to. Noting start and finish
times in a log book in case the stopwatch malfunctions is vitally
important.
Not count down to the end of the assessment; they should avoid
saying: “You have three minutes left” or something similar.

9.3.3 The scoring, interpretation and feedback of results


Assessors should do the following:

Ensure that the assessments are properly scored and double-check all
results every time.
Follow the scoring instructions carefully, especially with procedures
that are more subjective. Every now and then they should go back to
an earlier answer sheet to ensure that their standards and/or
interpretations have not drifted (i.e. they should check their own test–
retest reliability or consistency). Where possible, they should also
check their interpretation of more subjective items with colleagues
(i.e. their inter-scorer reliability).
Know which norms are appropriate and make sure that the correct
ones are used.
Know how the various transformations affect the raw scores and use
the most appropriate ones.
Know what factors affect the validity of any assessment, and keep
these in mind when interpreting any assessment outcome or score.
Make sure that all sources and forms of relevant information are
considered in coming to a decision.
Understand the meaning and importance of the standard error of measurement (SEM) and the effect this may have on the accuracy of any cut-off score. The greater the SEM, the less absolute any cut-off score becomes (see the sketch after this list).
Give feedback in as positive a way as possible and be aware of the
damage that a poor result can do if it is badly communicated.
Use language that is appropriate to the person being assessed and
other interested parties.
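
As a minimal illustration of the SEM point above, the sketch below (Python, with entirely hypothetical values for the standard deviation, reliability and cut-off) computes the SEM from the familiar formula SEM = SD × √(1 − r) and checks whether a rough 95% confidence band around an observed score straddles a cut-off, in which case the decision should be treated with caution.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

def overlaps_cutoff(observed, cutoff, sd, reliability, z=1.96):
    """Return True if an approximately 95% confidence band around the observed
    score straddles the cut-off, i.e. the pass/fail decision is not clear-cut."""
    band = z * sem(sd, reliability)
    return (observed - band) <= cutoff <= (observed + band)

# Hypothetical example: SD = 10, reliability = 0.85, cut-off = 50
print(round(sem(10, 0.85), 2))            # about 3.87
print(overlaps_cutoff(47, 50, 10, 0.85))  # True - treat the cut-off with caution
```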

9.3.4 Security of the material


Assessors should do the following:

Ensure that all assessment results are treated as confidential and that
they cannot be seen by unauthorised people.
Not disclose raw scores or transformed scores to anyone, but rather
give a verbal interpretation such as “above average”, “near the top of
the scale”, and so forth.
Make sure that the material is securely stored and not accessible to
unauthorised people when not in use.
Not duplicate copyrighted material: the producers of this material
have gone to great lengths with their product, and copying it is
nothing less than theft.

9.4 The rights of people being assessed

Assessment is an invasion of a person’s privacy and people need to understand what is taking place and to give their informed consent to
being assessed.

The following constitute the basic rights of people being assessed:

They have the right to know why and how they are to be assessed.
They need to know how the results will be used.
They have the right to confidentiality and need to be assured that the
results will not be passed on to third parties, except with their written
consent.
They have the right to refuse to be assessed. However, they must be
aware of the consequences of this refusal.
They have the right to a full and comprehensive assessment in the
light of the purpose of the assessment.
They have the right to expect that the assessment techniques used are
appropriate and valid for them and for the purpose of the assessment.
They have the right to be protected from stigmatisation as a result of
their assessment results.
They have the right to expect that feedback should be given in a way
that protects their dignity and their self-image, while encouraging
insight and opportunities for growth and development.
In terms of the South African Employment Equity Act 55 of 1998, a
job applicant has the same rights as an employee. In other countries,
very little attention is being paid to these issues at present.

9.5 Statutory control of psychological techniques

Psychologists, their training and the services they provide are controlled quite differently in different parts of the world. In some
countries there has traditionally been very little, if any, control; in others
this control has been exercised by the psychology profession through
membership of relevant associations; and in some countries, this control
is exercised through specific legislation. However, there is a growing
trend towards the establishment of centralised professional boards
located within central government structures and controlled by specific
“Healthcare Professions” legislation.

9.5.1 South Africa


In South Africa the training of psychologists and the use of tests and the
performance of various other “psychological acts” are controlled by law
(i.e. by statutes, hence the term “statutory”), and various penalties for
breaches of the law are enforced by the Professional Board for Psychology of the Health Professions Council of South Africa (HPCSA). This was
established in terms of the Health Professions Act 56 of 1974 to promote
the health of the population, determine standards of professional
education and training, and set and maintain standards of excellence for
ethical and professional practice. This is a direct result of certain
practices that took place in the 1980s, which led the authorities to enact
legislation to protect the public from perceived abuses. This in turn led
to the formation of the South African Medical and Dental Council
(SAMDC) with a Professional Board for Psychology as a key
component. A consequence of this was that various forms of
psychological practitioner were defined, with the term “psychologist”
being restricted to people with certain carefully specified qualifications
and training. In addition, psychological acts were defined and
psychological techniques, especially psychometric tests, were
categorised. The use of these techniques was carefully linked to the
various levels of psychological practitioner that had been defined. The
mandate of the Professional Board for Psychology was (and is)

to control and exercise authority in respect of all matters relating to the training of psychologists, registered counsellors and psychometrists; promote the standards of such education and training in South Africa; and maintain and enhance the dignity of the profession and the integrity of the persons practising the profession.

In terms of the health professions legislation, three levels of psychological practitioner were identified, namely psychologist, psychometrist and psycho-technician.
To be a psychologist, the person needed at least a Master’s degree in
psychology and an appropriate one-year internship, in which the
principles and theories learned were observed in practice (much like the
housemanship year that medical doctors need to undergo). To be
registered as a psychometrist required a four-year Honours degree in
psychology and a six-month internship, while a psycho-technician
needed a Bachelor’s degree in psychology. All people wishing to work
in the area of psychology were required to be registered with the
SAMDC. In terms of this legislation, psychologists have to be registered
in one of five categories as follows:

Clinical psychologist
Counselling psychologist
Industrial psychologist
Educational psychologist
Research psychologist

Each category required its own Master’s-level training and internship.


People who were not registered in one of these categories with the
SAMDC could not perform psychological acts, and those who did so
were liable for criminal prosecution, although few, if any, have been
prosecuted or convicted for this.

As stated, permission to conduct psychological acts was limited to psychologists. With regard to psychological assessment, only the
psychologist was allowed to decide on which instruments and tests to
use, to interpret test scores and to write the reports. The psychometrist
was allowed to administer the tests and to score them, both under the
guidance of a psychologist. Finally, the psycho-technician was only
allowed to administer tests.

One final aspect of this approach was to place all tests and other
techniques into three categories, namely C, B and A levels. C-level tests
were those that involved in-depth individual assessments of personality,
intelligence and other aspects of psychological functioning, especially
those related to personality problems. B-level tests were those that were
related to normal functioning and could be administered in group
situations, such as ability and aptitude testing* at schools and in the
workplace. A-level tests were those of interests and aptitudes often used
in the school or educational environment. Until recently, tests and other
techniques were graded as C, B and A level, initially by the Test
Commission of South Africa and later by a committee of the
Professional Board for Psychology within the HPCSA (which replaced
the SAMDC). The purpose of this classification was to ensure that all
tests in use were properly evaluated with respect to their psychometric
properties, the adequacy of their norms and the level of registration
needed to use them.

Some of the more important recent developments include the decision by the HPCSA that all psychologists should complete one year of
community service after their internship year before they can be
registered as psychologists. At the time of writing, this had not been
enforced in the case of industrial psychologists.

In a letter to practitioners dated 15 December 2005, the Professional Board for Psychology of the HPCSA reported that a recent meeting of
the Board had approved the following:

The establishment of a specialist register. According to this communication, specialists would require a doctoral-level qualification (National Qualifications Framework (NQF) level 8), could initially be registered as neuropsychologists and forensic psychologists, and would not be able to practise as generalists. At one stage it seemed as though this
specialist registration would be done away with, but in its newsletter
dated August 2013, it was announced that both the categories of
neuropsychologist and forensic psychologist had been approved and
that the Board was “currently attending to the outstanding legislative
issues associated with this step”. Whether specialists will be able to practise at the “lower level” of generalists is not clear at this stage.
In their letter to practitioners dated 15 December 2005, the
Professional Board for Psychology also stated that registration as a
psychometrist would require the same academic and practical training
as a psychological counsellor. The difference between a counsellor
and a psychometrist is that the former undertakes “basic short-term
supportive counselling, basic psycho-educational training and the
promotion of primary psychosocial well-being”, whereas the
psychometrist can “administer, score and interpret tests and give
feedback on test results, excluding projective personality measures,
specialist neuropsychological measures and measures that are used for
the diagnosis of psychopathology (e.g. MMPI-2)”.
It was also suggested that the “different practicing fields for registered
counsellors should be consolidated” and the name “registered
counsellor” should be changed to “psychological counsellor”.
However, the Professional Board for Psychology’s “Framework for
Education, Training and Registration as a Registered Counsellor”
(updated in February 2010), retains the category of “registered
counsellor” and gives a detailed scope of practice for this category; thus it would seem that the Board has changed its mind on this aspect.
A four-year course (Honours or BPsych) and a six-month internship
or “practicum” are required for registration.
On 5 March 2007, the Professional Board for Psychology put out a
draft discussion document (Health Professions Council of South
Africa (HPCSA), 2007), in which a number of new categories of
psychological practitioner were proposed so that the full list of
categories would include mental health practitioner, registered
counsellor, psychometrist, research psychologist, industrial
psychologist, educational psychologist, counselling psychologist,
clinical psychologist, neuropsychologist and forensic psychologist.
For each category, a list of the activities that those registered in the
category may perform and the educational and training requirements
associated with it are given. This is known as a “scope of practice”.
The training for psychologists remains at the Master’s level in an
accredited course for the relevant category and a 12-month internship.
(The fact that the two specialist categories are listed alongside the
generalist categories makes the training for the specialists somewhat
unclear at this stage.) The current status of the “mental health
practitioner” is also not clear at this point.
In addition, a letter from the HPCSA (2006) to practitioners addressed
a development of interest to industrial psychologists relating to testing
and assessment over the Internet. In short, the Professional Board for
Psychology took the strong line that all tests have to be classified by the South African authorities before they can be administered via the Internet (p. 4) and that any person
administering the test has to be registered accordingly (HPCSA, 2006,
p. 5). The issue of Internet testing is dealt with in detail in section 9.7.

A final issue that needs to be mentioned is the idea of continuous professional development (CPD), in terms of which registered
psychologists and other practitioners need to keep abreast of new
developments. To do this they are required to earn CPD points by
attending various conferences and workshops which have been accorded
certain points by the Professional Board for Psychology. Records of
these points must be kept by the practitioners and produced on demand
by the Board. Failure to produce evidence of these CPD points could
lead to cancellation of registration. People may not call themselves psychologists (or use any other protected title), nor may they perform acts identified as the work of psychologists unless they are registered.

9.5.2 Britain
In the past, the recognition and control of psychologists were the responsibility of
the British Psychological Society (BPS). However, statutory regulation
for psychologists was introduced on 1 July 2009, and the Health and
Care Professions Council (HCPC) also opened a “Register of
Practitioner Psychologists”. This legislation protects seven titles:
Clinical Psychologist, Health Psychologist, Counselling Psychologist,
Educational Psychologist, Occupational Psychologist, Sport and
Exercise Psychologist and Forensic Psychologist. In addition, the HCPC
stipulated two routes to statutory regulation – for some categories a
professional doctorate (DPsych or DClinPsych) was decreed, whereas
for others a Master’s degree and endorsement by the BPS is required.
The HCPC does not approve other qualifications in psychology, such as
undergraduate degrees or Master’s programmes, because these do not
lead directly to eligibility for registration with the HCPC. The use of the
title “Chartered Psychologist” is also protected by statutory regulation
and simply means that the psychologist is a chartered member of the
British Psychological Society (BPS). However, it does not necessarily
signify that the psychologist is registered with the HCPC. It is an
offence for someone who is not in the appropriate section of the HCPC
to call himself a psychologist even though the BPS continues to accredit
these programmes. According to the BPS fact sheet on registration, the
categories are as follows:

Clinical – Professional Doctorate


Counselling – Professional Doctorate or equivalent
Educational – Professional Doctorate or equivalent
Forensic – Master’s degree (with the award of the Society
qualification in forensic psychology or equivalent)
Health – Master’s degree (with the award of the Society qualification
in health psychology or equivalent)
Occupational – Master’s degree (with the award of the Society
qualification in occupational psychology, or equivalent)
Sport and exercise – Master’s degree (with the award of the Society
qualification in sport and exercise psychology, or equivalent).

Source: http://www.bps.org.uk/what-we-do/bps/regulation-psychology/regulation-
psychology [retrieved 24 July 2013]

In the UK, psychologists are trained in two stages. The first is an Honours degree in psychology or its equivalent. For a degree to count as
the first stage of training it must be accredited by the BPS as conferring
the “graduate basis for registration”. Once a graduate has the
“graduate basis for registration”, he may enter postgraduate training.
This will take between three and five years to complete, and will usually
involve both an academic assessment and periods of supervised practice.
The precise structure of this postgraduate training will depend on the
particular speciality involved. Once this is completed, the psychologist
may register as a chartered psychologist with the BPS. It is the BPS
rather than a statutory body that recognises qualifications.

9.5.3 Europe
In recent years, attempts have been made to establish a Europe-wide
approach to the training of psychologists. According to Sagana and
Potocnic (2009), this training would take the form of a European
Diploma in Psychology (EDP) consisting of a three-year Bachelor’s
degree and a two-year Master’s degree with a 12-month supervised
practice, although the latter would not necessarily be offered by the
university offering the academic training. The authors identify several
problems with this approach, including the following:

The duration of the formal curriculum in some countries is too short.


In some countries, there is no research thesis or dissertation.
No allowance is made for renewal of qualifications – CPD is not
required in some countries.

With regard to the training of occupational/organisational psychologists, Sagana and Potocnic (2009) do not support the EDP approach, arguing
instead for the European Master’s in Work, Organizational, and
Personnel Psychology (WOP-P), which is a graduate university
programme supported by the European Commission through the
Erasmus Mundus Programme. The entry requirement for the Master’s is
an undergraduate degree in psychology. The objective is to contribute to
the qualification of professionals and researchers in the discipline of
WOP-P, emphasising a European approach and perspective. Although
the WOP-P implements the main guidelines of the EDP supported by the
European Federation of Psychologists’ Associations (EFPA), it also follows
the reference model and minimal standards of the European curriculum
in WOP-P established by the European Network of Work and
Organisational Psychology Professors (ENOP). This is a full-time two-
year programme.
9.5.4 The US/Canada
In the US, psychologists need to complete a doctoral programme in the
form of a PsyD or PhD degree. The US approach to test classification as
put forward by the American Psychological Association (APA) is
similar to the current South African system in so far as it has a three-tier
classification of tests into A, B and C levels. The APA, however, is less
strict than South Africa in the prescription of the qualifications required
to work with the various tests. Once again, it is the profession in the
form of the APA that recognises qualifications, rather than a statutory
body as is the case in South Africa. To practise clinically, psychologists
must also hold a clinical licence. The exception to this is the profession
of school psychologist, who can be certified by boards of education to
practise and use the title “psychologist” with an Education Specialist
(EdS) degree. The most commonly recognised psychology professionals
are clinical and counselling psychologists, psychotherapists and/or those
who administer and interpret psychological tests. There are differences
between states in the requirements for academics in psychology and
government employees.

Of interest is that psychologists in the US have campaigned for legislation changes to enable specially trained psychologists to prescribe
psychiatric medications. New legislation in the states of Louisiana and
New Mexico has granted those who take an additional Master’s
programme in psychopharmacology permission to prescribe medications
for mental and emotional disorders in coordination with the patient’s
physician. Under the provisions of a 2004 Louisiana law, medical
psychologists are permitted to prescribe medications for mental and
emotional disorders. This practice of psychology limited to medical
psychologists is regulated by the medical profession represented by the
Louisiana State Board of Medical Examiners (Louisiana Psychological
Association). Louisiana is the second state after New Mexico to
authorise specially trained psychologists to add medication to their
treatment options. Similar legislation in the states of Hawaii and Oregon
passed through the legislative House and Senate but was vetoed by the
governors.

The requirements for the administration of tests in the US are as follows:


A-level tests require no special qualification.
B-level tests require at least a BA in psychology, counselling or a
closely related discipline, with advanced coursework in such fields as
statistics, individual differences, test construction, personnel
psychology, adjustment and counselling.
C-level tests require an advanced degree, with knowledge of the
principles of psychological assessment, and supervised experience
using the test.

It is important to note that it is the profession in the form of the APA that recognises qualifications, rather than a statutory body.

9.5.5 Australia and New Zealand


In Australia the psychology profession and the use of the title “psychologist” are regulated by an Act of Parliament, the Health
Practitioner Regulation (Administrative Arrangements) National Law
Act 2008, following an agreement between the state and territory
governments. Under the National Law, registration of psychologists is
now administered by the Psychology Board of Australia (PsyBA).
Before July 2010, professional registration of psychologists was
governed by various state and territory psychology registration boards.
In order to practise, a psychologist must be registered with the PsyBA.
Psychology education and training programmes offered by Australian
universities are covered by an accreditation system administered by the
Australian Psychology Accreditation Council (APAC). Although
Australia and New Zealand have slightly different approaches, the two
countries are working increasingly closely and have, for example,
established the Australian and New Zealand Standard Classification of
Occupations (ANZSCO), which has developed a set of standards for
various occupations, including psychologists of various kinds. ANZSCO
recognises the following areas of practice:

272311 Clinical Psychologist


272312 Educational Psychologist
272313 Organisational Psychologist
272314 Psychotherapist
272399 Psychologists NEC (Not Elsewhere Classified – including
Community Psychologist, Counselling Psychologist and Sport
Psychologist).

According to ANZSCO (2013), “most occupations in this unit group have a level of skill commensurate with a Bachelor’s degree or higher
qualification. In some instances relevant experience and/or on-the-job
training may be required in addition to the formal qualification
(ANZSCO Skill Level 1)”.

All tertiary psychology courses (except research degrees) are assessed to ensure that they provide suitable preparation for students wishing to gain
professional registration as a psychologist and/or membership of the
APS. All APAC-accredited programmes must also be approved by the
PsyBA. The following areas of psychology are currently recognised in
Australia:

Clinical neuropsychology
Clinical psychology
Community psychology
Counselling psychology
Educational and developmental psychology
Forensic psychology
Health psychology
Organisational psychology
Sport and exercise psychology

Source: Australian Psychology Accreditation Council (2012)

The minimum requirements for general registration in psychology and to use the title “psychologist” are an APAC-approved four-year degree in
psychology followed by either (1) a two-year Master’s programme; or
(2) two years supervised by a registered psychologist. According to the
2011-2012 President’s Initiative on the Future of Psychological Science
in Australia (Psychology 2020), the two years of supervised training
required for registration may be achieved through one of the following
pathways:

Two years of accredited supervised internship training in the workplace (the ‘4+2’ model)
One year of postgraduate university training followed by one year of
accredited supervised internship training in the workplace (the “5+1”
model), which is a relatively new pathway (in South African terms
this is equivalent to an Honours degree plus a one-year internship)
A postgraduate accredited Master’s or doctoral degree of at least two
years full-time duration (Psychology 2020)

Some critics of the 4+2 and 5+1 models are of the view that, because the
basic entry requirement is four years of academic study, Australian
psychologists are underqualified. Wemm (2001) disagrees,
however, arguing that the Australian Honours degree is at least as good
as a Master’s degree in other countries and that an Australian Bachelor’s
degree with a major in psychology is roughly equivalent to an American
Master’s degree in psychology in terms of years of psychology studied,
although she concedes that this training is academic rather than
professional. She further contends that a four-year Australian Honours
degree approximates a three-year American doctorate, while a five-year
Australian Bachelor’s degree at pass level is generally equivalent to a
four-year American professional doctorate. A two-year Australian
Coursework Master’s is somewhere between an American PhD and a
post-doctoral diploma.

In New Zealand, the use of the title “Psychologist” is restricted by law in order to protect the public by providing assurance that the title user is
registered and therefore qualified and competent, and can be held
accountable for their practice. Prior to 2004, only the title “Registered
Psychologist” was restricted (to people qualified and registered as such).
However, with the proclamation of the Health Practitioners Competence
Assurance Act 2003, the use of the title “Psychologist” was limited to
practitioners registered with the New Zealand Psychologists Board. The
titles “Clinical Psychologist”, “Counselling Psychologist”, “Educational
Psychologist”, “Intern Psychologist” and “Trainee Psychologist” are
similarly protected.

Psychologists must have a Master’s degree in psychology and supervised training in an appropriate area. Three categories of
psychologist are recognised, namely:

General psychologist. This requires a minimum of a Master’s degree in psychology and 1500 hours or more of supervised practice in a
setting acceptable to the New Zealand Psychologists Board.
Clinical psychologist. This requires a postgraduate qualification and
training in clinical psychology – typically a psychology doctorate or
postgraduate diploma in psychology or demonstrated practice in this
discipline.
Educational psychologist. This requires a postgraduate qualification
and training in educational psychology.

9.5.6 China
At the turn of this century, in China there were only six psychology
departments and four psychology institutions among all the institutions
of higher education, although all normal universities and teachers’
colleges have psychology curricula and established psychology teaching
and research groups. To a certain extent, China had to depend on the
developed world for the training of its psychologists (Jing & Fu, 2001).
This dependence resulted from the importation of foreign experts as well
as the training abroad of Chinese psychologists at the postgraduate level
and the subsequent brain drain, as many of the latter do not return to
China (Higgins & Zheng, 2002, pp. 10/11 of 14).

The 7,9-magnitude earthquake that shook Sichuan and neighbouring provinces on 12 May 2008 provided a strong impetus for clinical and
counselling psychologists to help deal with the post-traumatic stress
resulting from the devastation. In addition, the social and psychological
impact of China’s one-child policy on only children came to the fore –
psychologists have found, for example, that only children had superior
cognitive abilities, but no significant difference in personality traits,
compared to children with siblings (Jing et al., 2003). Although
occupational/organisational psychology is an important aspect of
psychology in modern-day China, the literature tends to focus on mental
health aspects, and so prominence is given to clinical and counselling
psychology.

According to Wang and Mobley (2011), industrial and organisational (I-O) psychology in China has developed significantly over the past decade
in the context of China’s rapid economic growth and globalisation, as
multinational firms have increased their investment in China and
because Chinese organisations have felt the need to become more
competitive and global in their strategies. As a result, this has led to a
high demand for talent, including I-O psychologists. In recent years, I-O
psychology has witnessed significant developments in three areas: (a)
professional programme development; (b) problem-driven research; and
(c) globally integrated collaboration in relation to key issues in the social
and economic development of China. With the continuous growth of the
economy and rapid development of internationalisation in China, I-O
psychology is becoming one of the most widely applied disciplines.

Since 1980, industrial psychology has become an important area of psychological research, and industrial and organisational psychology has
become increasingly linked to global research development and
academic upgrading, and to the application of I-O principles to the
workplace using a professional/practitioner model. Since 2009,
universities in China have started to emphasise professional education in I-O psychology, focusing on occupational skills in personnel selection,
compensation, organisational change and entrepreneurship. This
signifies that I-O psychology is now formally recognised as a profession
in China.
9.6 The classification of psychological tests

In line with the need to carefully manage the training and education of
psychologists, many countries find it necessary to classify their
psychological tests and assessments. However, the way this has been
done in some countries is problematic. In a 2010 submission to the
PsyBA in Australia, a useful analysis was put forward by Littlefield,
Stokes and Li (2010), who argue that any classification system needs to
optimise benefits by doing three things simultaneously, namely 1)
managing risks of harm to the public (safety); 2) ensuring quality
services; and 3) protecting tests from inappropriate use. In addressing
these needs, they identify four different approaches to classification and
the advantages and disadvantages associated with each. These are test
type, setting, purpose and use, and administration versus interpretation.

9.6.1 Test type


The first approach to classification is based on the types of tests that are
available. Psychological tests may be regarded as formal assessment
tools that meet the stringent requirements of test theory and design, and
are developed within a particular psychological theoretical context to
measure psychological attributes. The problem with this approach is that
by their very nature the people classifying the tests require detailed
knowledge of both test theory and the theoretical context in which they
are embedded for their interpretation. They show that, for example, a psychological test of attitudes would be expected to meet much more stringent construction requirements than a 10-item “attitude test” in a popular magazine, which may have little validity.
Similarly, a psychological test of intelligence or general cognitive ability
will differ greatly from a 10- or 20-item quiz in a popular magazine
under the banner of “Test your own intelligence”, which will not be
concerned with issues of reliability and validity. On the other hand, by definition, all psychological tests need to be valid and reliable, and provide credible assessments. As a result, when a psychological test lacks these qualities, the impact is far greater than when a “pop” test does.
Littlefield et al. (2010) argue further that psychological tests can be
distinguished along a number of dimensions, including area of
assessment (e.g. attitudes, aptitude for work, intelligence), means of
administration (individual, group) and nature of the test (e.g.
questionnaire, rating scale, series of subtests requiring a range of
response types). Simply distinguishing test types in terms of these
dimensions will not adequately predict the risk of harm associated with
their use.

9.6.2 Setting
A second approach considers the setting or circumstances in which
testing occurs. Such a process might, for instance, distinguish those psychological tests that are used in clinical settings from those used for vocational counselling or staff selection. Clinical tests
could then be limited for use in clinical settings by people who have
been trained and registered as clinical psychologists, while vocational or
selection tests would be available only to those with demonstrated
competence, accredited training or registration as organisational
psychologists. The criteria used in this approach may be the type of tests
used, the client population targeted and the tests’ abilities as diagnostic
or predictive tools. As with test type, a specific setting on its own does
not represent an adequate predictor of risks of harm, but should form
one element of an overall considered approach.

9.6.3 Purpose and use


The third approach to classification takes as its starting point an
examination of the risks and vulnerabilities to the person being assessed
arising from the (incorrect) use of the technique/instrument. Take, for
example, the use of a test for the measurement of management styles or
conservative political views in determining suitability for promotion or
selection – this form of assessment, while it may cause some (or even
considerable) harm, is likely to carry less associated risk than the misdiagnosis of autism or dementia.

9.6.4 Administration versus interpretation


Littlefield et al. (2010) argue that the consideration of use as a basis of
classification points to the fundamental distinction between test
administration and test interpretation. Test administration, while
requiring some form of training, is merely one of several means of
obtaining information aimed at arriving at some diagnosis or
interpretation. On the other hand, the process of interpreting these
results draws on other information in the broader context of assessment,
hence the use of the test findings, or their interpretation, is the more
critical aspect of psychological assessment. In fact, assuming that the
test has been administered and scored correctly, the risks of harm are
most evident in the interpretation of the results. In other words, the risk
is largely associated with interpretation and only to a limited extent with
administration, especially if the risks associated with test type, setting,
use and competency in administration are carefully managed.

Competent administration is vitally important to gain reliable and valid scores and to ensure the best performance of clients. Competent administration also builds rapport with clients through confidentiality and trust throughout the experience. It is thus necessary to require standard
training and supervised experience, including the understanding of
psychological concepts underlying test theory, to ensure that test
administration is done appropriately. These foundations, as argued
earlier, are an integral part of the training of all psychologists.

Littlefield et al. (2010) conclude that no single mechanism based on a testing model will be suitable as the basis for minimising the risks to the
public arising from the misuse of psychological tests. However, defining
psychological tests, and then making a distinction between test
administration and test interpretation, would be logical first steps
forward. If the administration of psychological tests were to be
undertaken by individuals with appropriate training in the administration
of such tests under the supervision of a registered psychologist, and the
interpretation of the results were reserved for psychologists with
additional training and experience in the particular tests, protection of
the public could be maximised (pp. 15–16). This will also help to
minimise the likelihood (and therefore risks of harm) of psychologists
practising outside their areas of professional competence. The authors
conclude by stating that when used with legislative support that restricts
the use of tests, this approach (i.e. defining psychological tests and
separating test administration from interpretation) has the potential to
ensure that standard psychological tests are interpreted and reported by a
specialist or senior psychologist.

9.6.5 South Africa


In South Africa, previous legislation divided all tests into three
categories based on content and purpose, namely C-level tests, B-level
tests and A-level tests. C-level tests were those that involved in-depth
individual assessments of personality, intelligence and other aspects of
psychological functioning, especially those related to personality
problems. These could be administered and interpreted by registered
psychologists, with at least a Master’s degree and appropriate internship
of 12 months. B-level tests were those that were related to normal
functioning and could be administered in group situations, such as ability and aptitude
testing at school and in the workplace. These could be administered by
people with at least an Honours degree (four years of psychology) and a
six-month internship. A-level tests were defined as those related to the
assessment of interests and attitudes. These tests and other techniques
were until recently graded as C-, B- and A-level, initially by the Test
Commission of South Africa, and more recently by a committee of the
Professional Board for Psychology within the HPCSA. The purpose of
this classification was to ensure that all tests in use were properly
evaluated with respect to their psychometric properties, the adequacy of
their norms and the level of registration needed to use the tests. Any test
not so classified was deemed to belong in category C. In addition,
because a number of older tests had been allowed onto the register
without proper evaluation, there has been an effort to move away from
the three-tier classification of tests and to regard all tests that measure
psychological phenomena as C-level tests, which only psychologists can
control (i.e. order, administer, score and interpret). In short, this makes a
distinction between “psychological” and “non-psychological” tests, a
distinction that is neither practical nor practicable. In addition, there is in
fact no legislation in place that indicates that only HPCSA-approved
tests can be used, and attempts to write this into the Employment Equity
Act have been put on hold as a result of powerful lobbying by
commercial test producers and distributors. (See also Laher &
Cockcroft, 2013, p. 536.)

Although the Society for Industrial and Organisational Psychology of South Africa (SIOPSA) is in general agreement with the notion that
there should be some distinction between different types of tests, it does
not support the current classification process, and recommends that the
EFPA Review Model (also used by the BPS), which looks at whether
the instrument’s psychometric properties fulfil requirements, should
form the basis as an alternative to the current classification system
(SIOPSA, 2012). In this way they hope to avoid possible confusion that
may arise from assuming that an instrument is suitable for a particular
use because it has been classified, rather than because it is appropriate
for the task at hand (taking into account ethical considerations and
labour regulations) and that the assessors have the necessary training to
use it.

9.6.6 Britain
In Britain the system recently adopted is similar to that used by the
European Federation of Psychologists’ Associations (EFPA). The EFPA
model broadens the classification of the tests based only on content to
include additional considerations of the context, instruments and the use
to which they will be put, in line with the Australian model discussed in
the previous sections. This model involves a qualification system based
on three levels of competence needed to practise safely in various test
roles (Bartram, 2011). These levels are as follows:

Level 1 covers those assessment instruments that can be used under supervision by people who have been trained as test administrators
and who are deemed sufficiently competent to practise under
supervision. This level requires training in test administration or
similar qualifications relating to some part of test use, and could, for
example, cover use by line managers of reports generated by tests.
Level 2 covers those instruments that require qualified test users who are
competent to practise independently in a limited range of situations.
This could include psychology graduates, human resources
professionals or work/organisational psychologists who use tests in a
routine way for selection, development, etc.
Level 3 involves assessments where in-depth expertise and/or a
greater breadth of knowledge are required. These people should be
competent to practise independently in a broad range of situations
with expertise in one or more areas, including psychologists who have
received specialised training and other professional experts in testing
(e.g. psychometricians).

The EFPA model also takes into account various technical issues
focusing on the key psychometric quality of the test, such as an
evaluation of the norms, reliability and validity of the instrument. The
rating process allows reviewers to comment on such aspects as the
appropriateness of norm groups for local use, sample sizes, etc. The
emphasis on training in the EFPA model is in line with the findings of
Muniz et al. (2001) that while the restrictions imposed on testing vary
considerably from country to country, restrictions alone are no guarantee
of good practice, but that some form of specialised training requirement
is also necessary.

9.6.7 The US
The US approach to test classification as put forward by the American
Psychological Association (APA) has a three-tier classification into A-
level, B-level and C-level tests. In terms of this model

A-level tests require no special qualification


B-level tests require at least a BA in psychology, counselling or a
closely related discipline, with advanced coursework in such areas as
statistics, individual differences, test construction, personnel
psychology, adjustment and counselling
C-level tests require an advanced degree, with knowledge of the
principles of psychological assessment, and supervised experience
using the test.
Various test producers/distributors build on these categories. For
example, in the US, SIGMA Assessment Systems, based in Port Huron, Michigan, supplies test materials as follows:

A-level assessments are available for purchase by individuals who have

a) a Bachelor’s degree in psychology or a related discipline (e.g. counselling, education, human resources, social work, etc.) and coursework relevant to psychological testing; OR
b) equivalent training in psychological assessments from a reputable organisation; OR
c) professional membership of an organisation that requires training and experience in the use of psychological assessments and surveys; OR
d) certification from an organisation with similar proficiency requirements; OR
e) practical experience in the use of psychological assessments.

B-level assessments are available for purchase by individuals who have

a) a graduate degree in psychology or a related discipline (e.g. counselling, education, human resources, social work, etc.) and have completed graduate-level coursework in psychological testing or measurement; OR
b) equivalent training focused on psychological testing or measurement from a reputable organisation.

C-level assessments are available for purchase by individuals who have

a) a doctoral degree in psychology or a related discipline (e.g. counselling, education, human resources, social work, etc.); OR
b) the direct supervision of a qualified psychologist or a qualified professional in a related discipline.

(See http://www.sigmaassessmentsystems.com/)

In Canada, Pearson Publishers have extended this three-level system to establish a five-level one (A, B, C, Q1, Q2) as follows:

Qualification Level A: Generally, A-level instruments are those that do not require an individual to have advanced training in assessment and interpretation.
Qualification Level B: B-level instruments require more expertise on
the part of the examiner than A-level tests and may be purchased by
individuals who are certified by a professional organisation
recognised by Pearson. The administration procedures and
interpretation of test results of these instruments are generally more
complex than for A-level products. These professional organisations
include as part of their code of ethics the requirement that
practitioners engage in aspects of their professions that are within the
scope of their competence. B-level tests may be purchased by
individuals with a Master’s degree in psychology, education or a
related field with relevant training in assessment, while practitioners
who do not have a Master’s degree but who have completed
specialised training or have developed expertise in a specific area may
order B-level products to assess skills in their area of expertise.
Members of professional organisations in such areas as speech-
language pathology, counselling, educational training and
occupational therapy are eligible to buy these B-level products.
Qualification Level C: C-level tests require a doctorate in psychology,
education or a related field or licensure.

Pearson has also introduced two new qualification levels, Q1 and Q2.
Q1-level assessments may be purchased by individuals who have a
degree or licence to practise in the healthcare or allied healthcare field.
Q2-level tests may be purchased by individuals who have formal
supervised training in mental health, speech/language, and/or in
educational training settings specific to working with parents and
assessing children; formal supervised training in infant and child
development; or formal training in the ethical use, administration and
interpretation of standardised assessment tools and psychometrics (see
http://www.pearsonassess.ca/haiweb/cultures/en-
ca/ordering/qualification-levels.htm).

9.7 Psychological testing on the Internet

An important development of interest to all psychologists, but particularly to industrial/organisational psychologists, relates to testing
and assessment over the Internet, which has grown rapidly in the last
decade or so. In 2005, the International Test Commission (ITC) drew up
guidelines for the use of computerised and Internet-based assessment
(ITC, 2006).

Four modes of test administration have been described by Bartram (2001).

1. Open mode: Here there is no direct human supervision of the assessment session and hence there is no means of authenticating the
identity of the test-taker. Internet-based tests without any
requirement for registration can be considered an example of this
mode of administration.
2. Controlled mode: Here there is no direct human supervision of the
assessment session involved, but the test is made available only to
known test-takers. Internet tests require test-takers to obtain a login
username and password. These are often designed to operate on a
one-time-only basis.
3. Supervised (or “proctored”) mode: This is where there is a level of
direct human supervision over test-taking conditions. In this mode,
the test-taker’s identity can be authenticated. For online testing, this
would require an administrator to log in a candidate and confirm
that the test had been properly administered and completed.
4. Managed mode: In this mode there is a high level of human
supervision and control over the test-taking environment. In CBT
testing, this is normally achieved by the use of dedicated testing
centres, where there is a high level of control over access, security,
the qualification of test administration staff and the quality and
technical specifications of the test equipment.

More recently, the ITC (2010) has proposed a modified model of test
administration involving the splitting of the supervised mode into (a)
Remote: supervised and (b) Local: supervised. The “remote supervised”
mode is based on the availability and application of online monitoring by
the test user, with real-time biometrics, and permits the following
safeguards for controlling and detecting irregularities that obviate the
need for in-person proctoring:

Remote analysis of keystrokes
Certified online proctoring (e.g. online webcam)
Protective item formats
Strong machine and browser lockdowns
Real-time monitoring of response patterns, response latencies, etc.,
which may suggest prior knowledge or attempts to cheat
Monitoring of unauthorised keystrokes (e.g. issuing of warnings by
the proctor for test-taker attempts to bypass controls)
Following existing security standards, which can include monitoring
of web traffic

The ITC guidelines are divided into four areas of focus, each with
between three and six subareas, yielding a total of 18 areas that need to
be attended to. These are the following:

1. Give due regard to technological issues in computer-based (CBT)
and Internet testing

a. Give consideration to hardware and software requirements.
b. Take account of the robustness of the CBT/Internet test.
c. Consider human factor issues in the presentation of material via
computer or the Internet.
d. Consider reasonable adjustments to the technical features of the test
for candidates with disabilities.
e. Provide help, information and practice items within the
CBT/Internet test.

2. Attend to quality issues in CBT and Internet testing

a. Ensure knowledge, competence and appropriate use of
CBT/Internet testing.
b. Consider the psychometric qualities of the CBT/Internet test.
c. Where the CBT/Internet test has been developed from a paper-and-
pencil version, ensure that there is evidence of equivalence.
d. Score and analyse CBT/Internet test results accurately.
e. Interpret results appropriately and provide appropriate feedback.
f. Consider equality of access for all groups.

3. Provide appropriate levels of control over CBT and Internet testing

a. Detail the level of control over the test conditions.
b. Detail the appropriate control over the supervision of the testing.
c. Give due consideration to controlling prior practice and item
exposure.
d. Give consideration to control over the test-taker’s authenticity and
cheating.
4. Make appropriate provision for security and safeguarding privacy in
CBT and Internet testing

a. Take account of the security of test materials.
b. Consider the security of the test-taker’s data transferred over the
Internet.
c. Maintain the confidentiality of the test-taker’s results.

Each of these aspects is considered in terms of three interest groups,
namely test developers, test publishers and test users, and each of the 18
sub-areas is further elaborated. In organisational settings, for example,
it is now possible with some tests to retest selected (short-listed)
candidates in a supervised setting using a subset of items from the
database used for the unsupervised testing session, and to compare the
results from the two different administrations.

Of particular relevance to the issue of test fairness with culturally
different groups is Guideline 2f, "Consider equality of access for all
groups", the fourth point of which reads as follows:

2 f 4. For tests that are to be used internationally:

avoid the use of language, drawings, content, graphics, etc.
that are country or culture specific.
where culture-specific tests may be more suitable than
culturally neutral ones, ensure that there is construct
equivalence across the different forms.

(See http://www.psychology.org.au/assets/files/online-psychological-
testing.pdf for more details.)

9.7.1 South Africa


Despite the fact that the professional association of industrial
psychologists (SIOPSA – Society for Industrial and Organisational
Psychologists of South Africa) actively supports the English/European
model, the Professional Board for Psychology has rejected all
alternatives other than the managed approach, insisting also that these
sessions should be supervised by a registered psychologist (HPCSA,
2006, p. 5). This stance is also supported by Cheryl Foxcroft and Gert
Roodt in the 2009 edition of their text, who argue strongly that “there
are sound professional and legal reasons why a psychology professional
should be present during the administration of a computer-based or
Internet-delivered test” (Foxcroft & Roodt, 2009, p. 258). They support
this stipulation by citing the work of Hall et al. (2005), and quote the
findings of a worldwide survey conducted by the International Test
Commission (ITC) and the European Federation of Professional
Psychology Associations (EFPPA) (Bartram & Coyne, 1998). One of
the questions asked in this survey was whether “the use of psychological
tests should be restricted to qualified psychologists". The mean response
to this question on a 5-point scale was 4,45 (SD = 0,75), overwhelmingly in
support of the statement. On the basis of this result, and in the light of a
report by Hall et al. (2005) arguing against the use of “test technicians”
in the US, Foxcroft and Roodt (2009) argue that testing is seen as a core
competency of psychologists throughout the world and that therefore
only psychologists should be allowed to conduct testing, including
Internet-based testing.

This stance is echoed to a certain extent by Tredoux (2013) who,
quoting the South African ethical code, argues that

[t]est administration … is not just the mechanical process of reading
instructions, timing and scoring. Rather, it is a professionally
compassionate, interactive process of observation and adaptation …
to ensure that respondents are fairly tested and not merely uniformly.
… The ethical code further specifically states that psychological
assessment must take place in a context of a defined professional
relationship. It is difficult to see how this requirement could be met
with unsupervised test administration (p. 433).

However, she goes on to argue that “[s]upervised computerised test
administration can, however, have considerable advantages – provided it
is properly managed” (p. 433). She is clearly taking a softer stand than
the “all-or-nothing” view put forward by Foxcroft and Roodt (2009),
who appear to discount findings contrary to their view, such as those of
Holloway (2003). Holloway argues that, despite the ITC/EFPPA survey
findings quoted by Foxcroft and Roodt, the APA is against this
prohibition of “testing technicians”, and that numerous states that had
previously been opposed to the use of technicians later reversed their
decisions in the early 2000s. She (Holloway) argues (quite convincingly,
in my view) that many other professions make use of “lesser”
professionals for the gathering of diagnostic information. She continues
(p. 26) as follows:

Imagine you broke your arm and needed an X-ray. Now imagine that
your physician was required to perform the X-ray, as well as every
other diagnostic test that might be required without the help of trained
technicians – for you and every patient that entered her practice.
Certainly it’s possible. But most primary-care physicians leave the X-
raying and lab tests to other professionals so that they have more time
to spend with patients. It’s an accepted practice that’s intended to
provide better quality patient care.

So, shouldn’t the same be true for neuropsychologists who rely on
tests and assessments to diagnose their patients? Shouldn’t they be
allowed to use skilled technicians to administer those tests under their
direction?

The APA Practice Directorate says yes – and has supported several
state psychological associations in defense of this practice.

Holloway (2003) goes on to cite Neil Pliskin, chair of the Practice
Advisory Committee for Division 40 of the APA (the Clinical
Neuropsychology Division). Pliskin said that those US states that
prohibited the use of testing technicians

interpreted the scope of practice laws in their states to indicate that
only licensed psychologists should be able to provide testing services.
That goes against a long tradition that dates back to the 1940s of using
a professional and technical model for data-gathering.
That model, Pliskin adds, is also endorsed by professional organisations
that represent neuropsychology, including the National Academy of
Neuropsychology (NAN) and the American Academy of Clinical
Neuropsychology.

As shown above, the South African Professional Board for Psychology of
the HPCSA tried to enforce the strict "managed approach", insisting
that all Internet testing needs to be fully supervised by a registered
psychologist in full attendance. This is in line with the anti-technician
stance in the US. However, following protests by commercial test
producers represented by the Association of Test Publishers (ATP) and a
court case which the HPCSA lost, these stipulations were withdrawn by
the HPCSA (ATP v HPCSA, Pretoria High Court, Case No. 4218/07)
(see also Tredoux, 2013, p. 431). At this stage, there appear to be no
legal prescriptions, and "lesser" forms of administration appear to be
permissible.

Why should psychology adopt a much stricter and knowledge-intensive
approach than the other professions? In fact, Foxcroft and Roodt (2009,
pp. 257–258) quote the work of Petersen (2004), who argues that one of
the key primary healthcare services that registered counsellors in
particular can provide is assessment – registered counsellors are in many
ways equivalent to the US "testing technicians". It must also be noted
that the survey cited by Foxcroft and Roodt found that "only 14%
of those who practice educational, clinical, occupational and forensic
assessment were found to be psychologists” (Foxcroft & Roodt, 2009, p.
257). Quite clearly, professionals other than psychologists are also using
tests – and they are presumably well trained, professional, proficient and
ethical in this use. This suggests that parts of the psychological
profession and/or its ethical code may have defined "psychological
testing” too broadly and have “over-interpreted” the situation in a way
that is no longer relevant to the changing assessment environment.

9.7.2 Britain
The BPS endorses the ITC Guidelines, including the distinction between
the four modes of test administration put forward by Bartram (2001),
namely Open, Controlled, Supervised and Managed (see above).
9.8 Protection of minority groups

As we have seen in Chapters 7 and 8, the issue of fairness in testing,
especially in cross-cultural or multicultural contexts, is an area of
growing concern. This is especially important in countries with
numerous cultural groups living and working side by side, such as
Australia, Israel, the US and South Africa. However, as we have seen at
the beginning of Chapter 8, migration of people to developed economies
for whatever reason has increased rapidly in the last few decades. As a
result, even in countries that have traditionally viewed themselves as
fairly homogeneous, these migration trends are beginning to make
societies more culturally complex, and so a country such as the UK
cannot ignore the fact that it is becoming multicultural and multilingual.
The view that assessment is based on a single language and a single
cultural experience is rapidly being supplanted by the need for a
multicultural perspective. In line with this argument, psychologists
throughout the world, including those in the UK, need to ensure that
groups whose mother tongue and cultural heritage are not English are
adequately protected.

9.8.1 South Africa


In South Africa, because of the contentious nature of assessment (tied as
it has been to perceptions of racial bias and discrimination), strong
efforts have been made to protect the rights of Africans and other
culturally diverse and historically disadvantaged groups. This
protection is enshrined in the country’s labour legislation, particularly
the Employment Equity Act (EEA) 55 of 1998, which categorically
states in section 8 that

Psychological testing and other similar assessments are prohibited
unless the test or assessment being used (a) has been scientifically
shown to be valid and reliable; (b) can be applied fairly to all
employees; and (c) is not biased against any employee or group.

This legislation also states that a person should be considered for a
position even if he lacks the skills required for successful job
performance, but has the potential to acquire those skills “within a
reasonable period of time”. (What this time is in practice is not defined.)
It also places the onus on the organisation to define, identify and
develop “potential”. It should be noted that the EEA goes beyond
protection to the promotion and development of previously
disadvantaged groups by “promoting equal opportunity and fair
treatment” and by “implementing affirmative action measures to redress
the disadvantages experienced by designated groups”, namely people of
colour, females and the disabled.

9.8.2 Britain
As far as can be determined, no mention is made of the need to protect
these groups in the Code of Good Practice for Psychological Testing
prepared by the Committee on Test Standards and approved by the
Membership Standards Board in 2010, nor is any cross-reference made
to existing laws that are aimed at ensuring fair practice and non-
discrimination against women and minorities in the workplace. The
closest that can be found is item 10 of the Test Taker’s Guide published
by the BPS Psychological Testing Centre which does promise to “give
due consideration to factors such as gender, ethnicity, age, disability and
special needs, educational background and level of ability in using and
interpreting the results of tests”. In its brochure Using online assessment
tools for recruitment, the BPS makes a general statement about adapting
the test administration process to take account of disability and language
problems, suggesting that the introductory information/biographic
section should “allow such candidates to identify that there may be a
problem and assess whether taking the test in the standard way will be
appropriate for them” (p. 5). It goes on to suggest that special
arrangements may need to be made for the disabled, as not to do so
would be in contravention of the Disability Discrimination Act of 1995.

9.8.3 The US
In the US, while there are no statutory requirements or regulations
regarding the use of psychological assessment, the Equal Employment
Opportunity Commission (EEOC), supported by the Equal Employment
Opportunity Act, has jurisdiction over people using assessments
incorrectly or inappropriately in the employment context. The primary
concern of the EEOC is to reduce non-job-related discrimination in
hiring practices. Recall that in Chapter 7 (section 7.1.3), Schmidt’s
(1988) three forms of discrimination were referred to, namely adverse
impact, pre-market discrimination and disparate treatment. When trying
to assess whether any selection or assessment process is unfair, these
three aspects need to be borne in mind. However, the emphasis seems to
fall on adverse impact, and various formulae and processes need to be
put in place to minimise or control for these. In particular, the 4/5ths rule
appears to be the dominant model – in short, the ratio between the
proportion of a target group (e.g. females) being selected and the
proportion of the reference group (e.g. males) should be 80 per cent or
greater. Consider, for example, 100 people being tested for upper-body
strength – 50 females and 50 males. Of the 50 females, 30 (60 per
cent) pass the test and are selected; of the males, 45 (90 per cent) pass
the test and are selected. The ratio of the female to the male selection
rate is 60/90, which equals .67, or 67 per cent. This is lower than the 80 per
cent level, thus indicating an adverse impact of the assessment against
the female group in the selection process. Other ways of demonstrating
adverse impact and other forms of bias and discrimination are discussed
in Chapter 8.
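The 4/5ths calculation is simple enough to express in a few lines of code.
The sketch below is a minimal illustration in Python; the function name and
the figures are simply those of the hypothetical upper-body strength example
above, not part of any published guideline.

def adverse_impact_ratio(target_selected, target_total,
                         reference_selected, reference_total):
    # Selection rate for each group
    target_rate = target_selected / target_total
    reference_rate = reference_selected / reference_total
    # Ratio of the target group's rate to the reference group's rate
    return target_rate / reference_rate

# Figures from the worked example: 30 of 50 females and 45 of 50 males selected
ratio = adverse_impact_ratio(30, 50, 45, 50)
print(round(ratio, 2))   # 0.67
print(ratio >= 0.8)      # False: below the 4/5ths threshold, suggesting adverse impact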

9.8.4 Europe
Of all the countries in Europe, it would seem that the Netherlands has
the most active interest in issues of minority rights protection, and most
of the work in cross-cultural assessment has been done by people such
as Fons van de Vijver and Ype Poortinga (e.g. Van de Vijver, 2002; Van
de Vijver & Hambleton, 1996; Van de Vijver & Poortinga, 1997).

9.8.5 Australia
In Australia, the General Registration Standard was approved by the
Australian Health Workforce Ministerial Council on 31 March 2010 (in
line with the Health Practitioner Regulation National Law – the National
Law). This law took effect from 1 July 2010 and has force in each state
and territory and, among other things, specifies examination criteria for
registration as a psychologist. In terms of these standards, four broad
areas of competency are defined, and four content domains are
specified. The second of these is the Assessment Domain, requiring
candidates to show (inter alia) “understanding of … cross-cultural
issues, and test uses with different age and gender groups" (Psychology
Board of Australia, 2012, pp. 4–5). Employers (at least in Victoria) have
a positive duty under the Equal Opportunity Act of 2010 to take
reasonable and proportionate measures to eliminate discrimination,
sexual harassment and victimisation. Various states have similar pieces
of legislation, although fairness in assessing across cultural groups does
not appear to be specifically covered in these acts.

9.9 South Africa in relation to other parts of the world

In 2005, the staff of Saville Holdsworth Limited (South Africa)
(SHL-SA) conducted a study into the regulation of psychologists and
psychological assessment in 21 countries. Some of their findings are
reprinted (with permission) in Sidebar 9.1.

Sidebar 9.1 Global trends and the regulation of psychological tests

Regulations in South Africa


The statutory body, the Health Professions Council of South Africa (HPCSA), was
established to promote the health of the population, determine standards of
professional education and training, and set and maintain standards of excellence
for ethical and professional practice. The mandate of the Professional Board for
Psychology is to control and exercise authority in respect of all matters relating to
the training of psychologists, registered counsellors and psychometrists; promote
the standards of such education and training in South Africa and maintain and
enhance the dignity of the profession and the integrity of the persons practising
the profession.

Survey methodology
To obtain a view of the regulations for psychological testing in other countries, a
questionnaire was sent to country managers of SHL worldwide. The
questionnaire consisted of ten questions and covered the topics of test regulation
and classification, test administration, Internet testing and feedback on test
results. Responses received were collated and, where applicable, the frequency
of response types calculated.
In total, responses were received from 21 countries:

Australia, Belgium, Estonia, Finland, France, Greece, Hong Kong, India,
Ireland, Israel, Italy, Japan, the Netherlands, New Zealand, Portugal,
Russia, Singapore, Sweden, Switzerland, the UK and the USA.

Statutory body or laws regulating the use of psychological tests
There are no statutory bodies or regulations concerning the use of psychological
tests in ten of the countries that responded to the survey. In nine of the countries,
professional associations provide guidance for practitioners, which is not
necessarily enforceable by law.
Regulations governing the use of psychological tests are provided by legislation
in Israel and Finland. In Israel this covers the use of psychological tests by
psychologists, and in Finland the law concerning privacy and data protection in
the working life (2001) requires users of tests to be competent. However, the
responsibility to ensure that test users are competent rests with the employer.
The Finnish Psychological Association, which is a member of the European
Federation of Psychologists’ Association, has established a certification board for
the profession, but this is not yet well established.

Classification system for psychological instruments


Fifteen of the countries that responded to the survey do not have a system that
classifies psychological tests into different categories. Some indicated that while
there are no formal classification systems, distinctions are made between
personality and ability measures – Finland recommends distinguishing between
competency-based and psychological (attribute-based) instruments.
Six of the countries responding have such a classification system. For example,
Belgium’s classification system differentiates between clinical personality
questionnaires or projective tests, occupational personality questionnaires, and
personality questionnaires with report generation. In most cases where a system
exists it also prescribes who may administer, score, interpret and provide
feedback to candidates on psychological tests. In four of the countries responding
to the survey the test publishers themselves are responsible for setting and
monitoring the qualification requirements for use of their products.
Professional associations’ and psychologists’ ethical codes give guidelines on
who may administer psychological instruments in three of the countries included,
although these guidelines are not necessarily enforceable by law. Five countries
indicated the existence of some regulations concerning test administration.

Regulations for test administrators*


Regulations or policies concerning who may administer psychological tests vary
from country to country. However, nine of the countries that responded do not
have any regulations or restrictions regarding who may administer psychological
tests. In the US, there are no regulations for the administration of work-related
assessments, but in order to purchase clinical instruments such as the MMPI, the
purchaser is required to be a licensed psychologist. This is regulated by the test
publishers.

Qualifications required for a test administrator


The majority of respondents – fourteen countries – indicated that qualification
requirements for test administrators were not specified. In two of the countries,
the test publishers set their own requirements, while another two countries
indicated that professional associations provide guidance. A further two countries
require a degree in psychology, and one country indicated that one had to be a
qualified psychologist in order to administer psychological tests.

Conclusion
From the findings of the survey, it has been concluded that South Africa’s
regulations regarding the use of psychological tests are among the most stringent
in the world. However, it is interesting to note the parallels between labour
legislation in South Africa and the US.
In the US, although there are no statutory requirements or regulations regarding
the use of psychological assessment, the Equal Employment Opportunity
Commission (EEOC), supported by the Equal Employment Opportunity Act, has
jurisdiction over people using assessments incorrectly or inappropriately in the
employment context. The primary concern of the EEOC is to reduce non-job-
related discrimination in hiring practices. This approach to testing provides
greater control over the use of psychological assessment in the workplace as
employers are forced by law to use best practices in their assessment
methodology. This relates to the South African labour legislation context, where
the Employment Equity Act prohibits the use of psychological testing or other
similar assessments of an employee unless the test or assessment used can be
shown to be valid and reliable, is applied fairly to all employees, and is not biased
against any employee or group of employees. This parallel between the situations
in the US and South Africa, which share the common goal of protecting
employees from unfair discrimination, raises the question of whether relying on
labour legislation to regulate psychological assessment results in greater control
over the use of test level categories in the workplace.
Source: Reprinted from SHL (SA) Newsline, September 2005, with permission.

By world standards, it seems that in South Africa the psychology
profession is quite strictly regulated, with the emphasis falling on
statutory regulation (various laws exist, most of which are policed by the
government-instituted HPCSA). This contrasts quite markedly with
most other countries, where control lies largely in the hands of the
profession itself. However, there is a growing trend towards statutory
control, with Britain, Australia and New Zealand having recently moved
in this direction.

9.10 Summary

In this chapter, we looked at several issues, including the steps taken to
ensure fairness during administration, the control of psychological
assessment practice in South Africa, Internet testing and the protection
of minority groups. With respect to the first, we examined the different
areas in which the administration of assessment needs to be
standardised. We identified three major areas, namely preparation for
the assessment (choice of techniques and sequence, the materials, the
instructions and the venue); the administration of the assessment
(establishing rapport, ensuring understanding of task instructions, and
monitoring during the assessment process); and completion of the
assessment (the collecting and securing of the material and scoring).
Other important issues discussed were the correct interpretation,
feedback and confidentiality of the results, and special situations and/or
participants (in terms of language, physical disability, hearing and visual
impairment) were dealt with. We also examined the setting and keeping
of ethical standards in relation to many of the issues discussed, in an
effort to safeguard the rights of people being assessed.

The second aspect discussed was the statutory control of psychological
techniques in South Africa. We considered both the historical situation
and some of the more recent developments. As part of this discussion,
we looked at how South Africa compares with other parts of the world,
including Australia, New Zealand, the UK and the US. Finally, we
looked at some research into global trends and the regulation of
psychological tests, and concluded that South Africa has one of the most
stringent statutory control systems in the world; however, several other
countries, including Britain, Australia and New Zealand, are moving in
this direction.

The third topic we examined comprised the issues surrounding assessment via
the Internet and international differences in this respect. We found that
South Africa was less willing than most other countries to accept the
idea of anyone other than a registered psychologist or
psychometrist/counsellor being able to do assessments.

Finally, the way in which different countries protect the rights of their
minority groups was examined: in this area, South Africa is better
placed than many of the other countries examined.

Additional reading

For a comprehensive discussion of many of the issues related to the sound
administration of tests and other forms of assessment, see McIntire, S.A. & Miller, L.A.
(2000). Foundations of psychological testing, especially Chapter 3.
Bartram (2011, pp. 149–159) gives a good discussion of the British and European
approach to the control of testing, including Internet testing.
The 2012 SIOPSA submission on this to the Professional Board of Psychology can be
found in SIOPSA (2012), Recommendations for regulating development, control and
use of psychological tests, which is available at
http://www.siopsa.org.za/files/pai_regulating_development.pdf
Test your understanding

Short paragraphs

1. Briefly describe why sticking to the specified time limits in a test is important.
2. What is meant by reasonable accommodation, and how and when should this apply
in an assessment situation?
3. Discuss the concepts of confidentiality and test security, and say what must be done
in this regard.

Essays

1. Discuss the three areas of standardisation in the administration of psychological
assessment.
2. Discuss the current and/or planned statutory arrangements for the registration of both
psychological practitioners and psychological materials, indicating how these
arrangements differ from those in practice in 1994. How do the South African
requirements differ from those elsewhere in the world?
SECTION
3

Domains of assessment
In this section of the book, we see how we use the theory we have developed to
assess psychological ability and performance in various areas. The next four
chapters are concerned with how we define and measure intelligence and ability
(Chapter 10), personality (Chapter 11), competence (Chapter 12) and integrity
and honesty (Chapter 13). Each chapter starts by defining the constructs involved
and then shows how these definitions shape the way in which they are measured.
10 Assessing intelligence and
ability

OBJECTIVES

By the end of this chapter, you should be able to

give various definitions of intelligence
describe various approaches to the conceptualisation of intelligence
show how these approaches influence the way intelligence is assessed
discuss the factors that contribute to cognitive ability
show how intelligence and aptitude scores may be inaccurate
show what can be done to improve the accuracy of intelligence assessment.

10.1 Introduction

More than any other area or domain, the measurement of intelligence is
fraught with difficulties. Firstly, the topic is a highly emotive one that
has been politicised, largely because intelligence is somehow seen as
inherited and therefore beyond change. Although this assumption is
incorrect, nevertheless the construct of intelligence is central to much of
what we achieve in life, therefore it has become stigmatised, and various
euphemisms (such as cognitive ability or aptitude) are used instead.
More importantly, the measurement of intelligence depends to a large
extent on the way we define it and how we think it is made up (i.e. its
structure). This thinking, in turn, has developed over time, and so we
also need to look briefly at the history of intelligence testing and see
how the different conceptions of intelligence have led to very different
assessment processes.
10.1.1 Intelligence defined
According to the About intelligence newsletter, intelligence is often seen
in the popular sense as the general mental ability to learn and apply
knowledge to manipulate one’s environment, as well as the ability to
reason and think in the abstract. The word “intelligence” comes from the
Latin verb intelligere, which means “to understand”. The newsletter
goes on to argue that other definitions of intelligence include
adaptability to a new environment or to changes in the current one, the
ability to evaluate and judge, the ability to comprehend complex ideas,
and the capacity for solving problems in an original and productive way.
Intelligence is also seen as the ability to learn quickly and learn from
experience, and even the ability to comprehend relationships. A superior
ability to interact with the environment and overcome its challenges is
often considered as a sign of intelligence. In this case, the environment
does not just refer to the physical landscape (e.g. buildings and streets)
or the surroundings (e.g. school, home, workplace) but also to a person’s
social contacts, such as colleagues, friends and family – or even
complete strangers.

Most psychologists would agree that intelligence involves the ability to
think and act in ways that are smart, allowing the more intelligent person
to arrive at correct answers to various problems in ways that are better or
faster than those of less intelligent people. More formally, intelligence
involves the ability to act purposefully by seeking out relevant
knowledge, finding or creating rules to link these units of knowledge or
information, applying them to novel situations, adapting and extending
the rules where necessary, and generally arriving at a reasoned and
appropriate decision. In this regard, David Wechsler (1939), one of the
fathers of intelligence testing, argues that there are four general criteria
for defining intelligent behaviour.

1. The organism must have awareness or insight (i.e. be aware of and
able to reflect on or think about the behaviours involved).
2. The behaviour must be meaningful and goal directed. The organism
must be purposefully trying to achieve or attain something.
3. The behaviour must be rational (i.e. it must involve reasoning) and
must be a conscious effort to address an issue of importance. Going
to the refrigerator to get a snack does not qualify as intelligent
behaviour as it does not require any advanced cognitive activity.
However, having to open the fridge door when your hands are full
or to sneak out a snack without being caught by someone who has
forbidden you to do this does count as intelligent behaviour because
it involves cognitive activity.
4. The behaviour must have some value or be positively regarded by
society. Criminal acts such as violence or fraud are not considered to
be intelligent, no matter how complex the behaviour is or how well
the problems have been solved. In other words, a terrorist act such
as the 9/11 attack on the Twin Towers is not seen as an intelligent
act, no matter how carefully planned and executed it was, because it
was antisocial. This last aspect is somewhat contentious, as so-
called terrorist actions may be approved by one segment of society
and not by others – one person’s terrorist is another person’s
freedom fighter. Our own history of liberation illustrates this
argument.

This raises the important issue that intelligence often lies in the eye of the
beholder, and involves approval of the outcome by powerful people. Intelligence is
socially defined – it is the ability to solve problems in a particular social context.

With this as background, we see that intelligence has been defined in
different ways, with each definition stressing different aspects of the
process of acting purposefully. (For a good overview of these different
approaches to intelligence, see Sternberg, 1983, 2000.) Among the most
important of these are the following:

10.1.1.1 Learning from experience


One of the most common definitions of intelligence is that it is the
capacity to learn from experience and the ability to adapt to the
environment – the more intelligent the person is, the quicker he learns
and adapts, and the fewer the mistakes he makes in the process. This
implies that people who are able to learn or adapt more quickly than
others are more intelligent. (See Morris & Maisto, 2002, p. 313.)

10.1.1.2 Ability to understand or comprehend


This is a person’s ability to understand or comprehend the meaning of
what he perceives. It is this understanding and comprehension that
allows the person to learn from his experience.

10.1.1.3 Recognising patterns


Clearly, one of the reasons people understand and learn from their
experiences is that they are able to see relationships between aspects of a
situation and what may happen to them – if this occurs, then that will be
the result. The quicker a person is able to see patterns in the information
he receives and make the links between the elements (or units of
information), the more intelligent he is.

10.1.1.4 Discovering rules


Closely related to this view is the argument that intelligence involves not
only the ability to see relationships and patterns, but also to understand
the nature of the relationship – what causes what. This involves the
ability to discover the rules that relate the different aspects of the
situation to each other, and is considered an important aspect of
developing intelligence. To act intelligently, people have to discover the
rule(s) that govern a situation and then use them to show how other
kinds of situations will lead to similar (or different) outcomes. This
points to what may be the single most important factor involved in
intelligence – the ability to identify the rule(s) that govern or link the
elements people are considering. A crucial aspect of this process is the
need to carefully and deliberately state these rules (in language). Many
people find it easier to solve difficult problems by vocalising their
thought processes, for example A leads to B, and this then causes A to
… and so on.

10.1.1.5 Solving problems


Following on from this idea of thinking as forming rules is the view that
thinking is solving problems (see Gardner, 1993, and Sternberg, 2000).
If we are trying to understand what we mean by intelligence, clearly the
ability to identify a problem situation and to find ways of solving it are
important. In many ways, problem solving is an extension of rule
formation, because it suggests that firstly we must find a rule linking
elements of the situation together, then we must identify the element that
is incorrect or is an exception to the rule, and finally we must decide on
the steps that we must take to rectify the situation.

10.1.1.6 Processing information


The newest approach to intelligence and one that has its origins in
modern computer technology is information processing. Here the basic
argument is that all organisms (including people) receive inputs or
information from the outside (and inner) world and then process or
interpret this information (often transforming it) before reacting or
outputting (behaving) in the light of the (transformed) information (see
Sternberg, 2000). For example, I may see an object approaching me
(input), recognise it as a dog (transform the input), and because I am
afraid of dogs as I was once attacked by a large dog (information from
memory), I wet myself, pick up a stick or run away (behavioural output).
Alternatively, I may not be afraid of dogs and go towards it and pat it.

The information-processing approach to thinking looks at what happens
at each stage and sees how the various components of memory and
information processing link to each other. Intelligence is then defined as
the efficiency with which the information is processed, where efficiency
is linked to effort, speed and “correctness” of outcomes.

In summary, all these definitions of intelligence (no matter which of the
approaches is used) have to do with the desirability of the outcomes of
the thinking or reasoning process, and the speed and efficiency with
which it is carried out. They also emphasise that, while different groups
may desire different kinds of output, it is the definition of those people
who hold powerful positions in the particular society that matters.

In line with these arguments, in 1944 David Wechsler, the designer of many of the
important tests in use today, defined intelligence as the "aggregate or global
capacity of the individual to act purposefully, to think rationally and to deal
effectively with his environment" (Wechsler, 1944, p. 4). Similarly, Westen (2002,
p. 280) defines intelligence as “the application of cognitive skills and knowledge to
learn, solve problems and obtain ends that are valued by an individual or a
culture.”

For example, while a criminal may be very clever in committing a
crime, society does not recognise this cleverness as intelligent
behaviour, even though fellow criminals may argue that it does show
intelligence. In the same way, different social groups may value social
cohesion above efficiency or speed of decision making, and would
regard outputs that promote social goals as more intelligent than other
outcomes.

Although there are many definitions, most people would agree that
intelligence is not a thing or a process, but a quality that describes
behaviour, mainly in terms of proficiency or competence – how well the
person is able to perform various cognitive tasks. In other words, if
thinking is the ability to link units of information to make meaning, then
people who do this and arrive at “correct” meanings and the “right”
answers more often and more quickly than others are said to be more
intelligent than those who are slower or get the answers wrong.
However, different groups may have different views about what
constitutes a “correct” answer – intelligence is thus socially defined. In
Eurocentric countries, it is generally the educators and the business
world that make this decision; in a very religious society, it would be the
clerics and religious leaders who would decide on (and reward) the
correct answers to particular problems.

Intelligence is thus socially defined by the people and groups controlling
the society, in terms of the principles and outcomes they, the dominating
elite, see as important. At the end of the day, intelligence is a social or
political term, as much as a technical one.

We cannot escape the fact that, in Western industrial and educational settings,
efficiency and speed are more important than social cohesion and therefore the
definitions of intelligence that dominate will stress the former rather than the latter.
We therefore also cannot avoid the view that in the industrialised world,
intelligence is the ability to learn from experience, to process information in order
to discover rules that can be applied in other settings to explain relationships and
to solve problems in a speedy and efficient way. Our understanding and
assessment of intelligence is based on this view.

It is therefore not surprising to learn that in a major study of over one
million students, Ones, Viswesvaran and Dilchert (2005) showed that
general mental ability (intelligence) was a strong valid predictor of
examination success, learning and outcome at school and university,
regardless of the speciality or subject involved. They also demonstrated
that cognitive ability tests and intelligence levels predict job
performance well because intelligence “is linked to the speed and quality
of learning, adaptability and problem-solving ability”. (See Furnham,
2008, p. 194.)

10.2 The historical development of the concept of intelligence

10.2.1 Francis Galton


Historically, the first real efforts at measuring intelligence were those of
Sir Francis Galton, a cousin of Charles Darwin, the man who shocked
the world by putting forward the theory of evolution in the middle of the
19th century. Galton, who published his findings in the 1880s and
1890s, tried to explain why superior intelligence or genius ran in certain
families (like his own!). He believed that intelligence was inherited and
biologically determined, and that perception, attention, memory,
language, problem solving, reasoning and other cognitive processes
were all dependent on the power of the nervous system. Accordingly, he
tried to link various psycho-physiological factors (such as reaction time,
sensory discrimination, and visual and hearing acuity) to intelligence.

This fitted in well with the philosophy of the day, which commonly
assumed that consciousness was simply a composition of elementary
processes such as simple sensations, images and perceptions, and that
intelligent people were better able than others at arranging these
elements into complex thought and behaviour patterns. Therefore, he
argued, in order to understand these complex processes, one needed
only to see how different people used these basic processes. (This is like
saying that we can appreciate the difference between a well-built luxury
house and a poor-quality dwelling by looking at the materials used in the
building and the skill with which they are used. In itself this is not a
good assumption as it excludes aspects such as the architect’s plan in
relation to the requirements of the owners.) This approach thus proved
to be very wrong, and very few findings linking these basic
physiological processes to more complex forms of thinking and
behaviour were made.

However, later in section 10.4.2 in this chapter and in Chapter 18, we see that
there is a correlation between speed of information processing, even at a physical
level, and intelligence. Perhaps Galton and Cattell (see section 10.2.2) were not
so wrong, but merely lacked the sophisticated equipment needed to demonstrate
these relationships.

10.2.2 James McKeen Cattell


Cattell was the first North American proponent of mental testing. Like
Galton, he believed that sensory, perceptual and motor processes were
the fundamental elements of thought, and he set about measuring the
ability of large numbers of college students to select the heavier of two
weights and the speed with which they responded to a tone. He
continued with this line of research until around 1915, but failed to show
anything except the smallest relationship between these and similar
results and college achievement. Cattell is credited with coining the term
“mental test”.

10.2.3 Alfred Binet


A very different approach was taken by Alfred Binet, a French
psychologist. In 1904 the school authorities in Paris asked him to devise
a means, more objective than teacher judgements, of identifying children
of low intelligence who needed special education. Unlike
Galton and Cattell, Binet argued that intelligence was a high-level
ability to which mental judgement was the key. It could best be assessed
using relatively complex thought patterns usually displayed in everyday
life and in the ability to learn within an academic setting. Together with
his student, Théodore Simon, he developed a range of 30 different age-
appropriate tests that were given initially to 50 “normal” school
children. The questions were administered in order of difficulty by a
psychologist working individually with each child. The testing stopped
when the children were unable to answer a question or gave a wrong
answer. These tests were published in 1905. This technique is still used
in some tests.

Binet and Simon continued with this work, and by 1908 they had
collected enough data to calculate the number of questions children of
each age could typically answer correctly. On the basis of these data,
Binet and Simon identified the age at which the children were able to
correctly answer questions of increasing difficulty level. (For example,
the average seven-year-old was able to explain the difference between
paper and cardboard, whereas the average five-year-old was not.) The
age level of the questions was then regarded as the mental age of the
child. For example, the child (or adult) who could answer questions at
the seven-year-old level was said to have a mental age of seven years.
By comparing the responding child’s performance with that of children
of the same age, they were able to identify those children who should be
sent for remedial education.

10.2.4 Lewis Terman


In 1916, Lewis Terman, a professor at Stanford University in California,
adapted the Binet-Simon tests for use in the US. These became the
Stanford-Binet tests, which remain some of the most widely used
intelligence tests today. These tests underwent major revisions in 1937
and 1960, and more recently in 2003, yielding the fifth edition (SB5)
(Roid, 2003). Terman expressed a child’s level of performance in terms
of an intelligence quotient* or IQ, a term first used by William Stern in
1912 and represented by the following formula:

IQ = (mental age ÷ chronological age) × 100

This means that an eight-year-old child who can answer questions
typically answered by ten-year-olds would have an IQ equal to 125 (10
÷ 8 × 100), whereas a ten-year-old child who answers questions
typically answered by eight-year-olds would have an IQ of 80 (8 ÷ 10 ×
100). Obviously a ten-year-old with a mental age of ten would have an
IQ of 100 (10 ÷ 10 × 100 = 100), which is the average. This approach
and the latest version of the Stanford-Binet scale are still in use today.
However, this way of defining IQ breaks down as the age of the person
increases – for both to have an IQ of 100, a 60-year-old would have to
have twice as much knowledge as a 30-year-old.
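For readers who want the arithmetic spelled out, the following minimal
Python sketch (the function name is purely illustrative) reproduces the
ratio IQ calculations given above.

def ratio_iq(mental_age, chronological_age):
    # Terman's ratio IQ: mental age divided by chronological age, times 100
    return mental_age / chronological_age * 100

print(ratio_iq(10, 8))    # 125.0 – an eight-year-old answering at the ten-year-old level
print(ratio_iq(8, 10))    # 80.0 – a ten-year-old answering at the eight-year-old level
print(ratio_iq(10, 10))   # 100.0 – mental age equal to chronological age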

10.2.5 Developments in intelligence testing after Binet


Apart from Binet’s work, the most important contribution to the
development of psychological testing in the US was that of David
Wechsler. Starting in the late 1930s, Wechsler developed his own tests,
the Wechsler Intelligence Scale for Children (WISC) and the Wechsler
Adult Intelligence Scale (WAIS).

Like Binet’s scale, the WISC and the WAIS are administered
individually. Wechsler (1939) introduced two important innovations.
The first was to distinguish between two types of scholastic intelligence:
verbal and performance, measured by different subtests which make up
two different subscales. The performance subtests require the child or
adult actually to do something, for example rearrange wooden blocks in
a particular way to reproduce a design shown on a card or rearrange
cards showing the elements of a story in their correct narrative sequence.
In practice, it is found that children with special educational needs often
do much better on the performance scales than on the verbal scales.
Used diagnostically, Wechsler’s scales thus have the advantage of being
very useful in identifying children who appear to be underperforming
for some reason (e.g. because of emotional difficulties in relation to their
overall IQ).

Wechsler’s other innovation was to introduce a statistical method of
calculating IQ. This proved necessary because the formula used by
Terman does not work with adults, as their mental age does not increase
as they get older as it does in childhood. By collecting test score data
from large samples that were representative or typical of the US
population, Wechsler was able to obtain a frequency distribution* of
scores (the normal distribution or bell curve*) for each age group.
Using this information, he could then see how far above or below the
average score for the age group an individual was and in this way he
was able to convert each person’s raw test score into an IQ score. For
example, if a person’s raw score was one standard deviation higher than
the mean (or average) score obtained by people of his age (defined as
100), then his IQ was calculated as 115 (i.e. 100 + 15). Likewise, if a
person’s raw score was one standard deviation lower than the mean
score for his age, his IQ was 85 (i.e. 100 – 15). All IQ scores have been
calculated in this way (known as the deviation IQ) since Wechsler’s day.
From this it can clearly be seen that an intelligence test is only as good
as the population norms used to convert raw scores into IQ scores. As a
general rule, IQ tests should be updated and modernised every 12 years
or so.
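The deviation IQ can likewise be written as a short formula: the raw score
is first expressed as a standard (z) score using the age-group norms and is
then rescaled to a distribution with a mean of 100 and a standard deviation
of 15. The sketch below is a minimal illustration in Python; the norm values
used (an age-group mean raw score of 40 with a standard deviation of 8) are
invented for the example.

def deviation_iq(raw_score, norm_mean, norm_sd):
    # Express the raw score as standard deviations above or below the age-group mean,
    # then rescale to the IQ metric (mean 100, standard deviation 15)
    z = (raw_score - norm_mean) / norm_sd
    return 100 + 15 * z

print(deviation_iq(48, 40, 8))   # 115.0 – one standard deviation above the age-group mean
print(deviation_iq(32, 40, 8))   # 85.0 – one standard deviation below the age-group mean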

The most popular and most widely used IQ scales today are those
developed by Wechsler. They include:

1. The Revised Wechsler Preschool and Primary Scale of Intelligence
(WPPSI-R)
2. The Wechsler Intelligence Scale for Children (WISC-III)
3. The Wechsler Adult Intelligence Scale (WAIS-III)

These have been adapted in various ways by most countries, including
South Africa, for both group-based and individual application.

10.3 Structural models of intelligence – the building blocks of intelligence

As it became easier to measure the intelligence of large numbers of
participants, people began to wonder about the underlying features or
building blocks of intelligence. Stated differently, they wondered about
the nature or structure of intelligence.

10.3.1 Structural (factor analytic) approaches


One way of approaching this question is based on the statistical
technique known as factor analysis. Factor analysis argues that
differences in scores on a number of different tasks can be explained in
terms of relatively few underlying abilities or factors (this was discussed
in Sidebar 5.1 in Chapter 5 when we examined the factor analysis
approach to construct validity). The factor analysis approach to defining
the structure of intelligence involves giving a wide range of cognitive
tests to a large group of people and factor analysing their results.
Depending on which tests we administer, which factor analysis
technique is used and which decision-making criteria are applied,
different factor structures emerge. This is the main reason why there are
different models of intelligence – the various theorists use different
methods of factor analysis.
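As a concrete (and purely illustrative) sketch of the procedure, the
following Python fragment factor analyses a battery of test scores using the
scikit-learn library. The data here are randomly generated stand-ins, and the
choice of two factors is an assumption made for the example rather than a
claim about the true structure of intelligence.

import numpy as np
from sklearn.decomposition import FactorAnalysis

# Stand-in data: scores of 500 test-takers on six cognitive tests
# (in practice these would be real scores on, say, vocabulary, comprehension,
# number series, matrices, arithmetic and mental rotation tests)
rng = np.random.default_rng(seed=1)
scores = rng.normal(loc=50, scale=10, size=(500, 6))

# Extract two underlying factors and inspect how strongly each test loads on them
fa = FactorAnalysis(n_components=2)
fa.fit(scores)
print(fa.components_)   # rows = factors, columns = loadings for the six tests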

10.3.1.1 Charles Spearman (1900s)


The earliest attempts at factor analysis are associated with an English
psychologist, Charles Spearman, who in about 1904 started to use this
recently discovered technique to describe the nature or structure of
intelligence. When Spearman did a factor analysis of the intelligence test
results of thousands of people on a large number of different tests, he
found that there was one very strong general factor and a number of
more specific factors, something like the example of the Grade 12
examination results. He named the former “general intelligence” or
simply “g-” and the latter “specific” or “s-” factors (see Spearman,
1927). This model can be represented as a circle with g at the centre and
the various s-factors partly independent and partly overlapping. This is
shown in Figure 10.1. Note that the various s-factors differ in the extent
to which they overlap the central g-factor. Some s-factors have a higher
loading on g than others.

Figure 10.1 Spearman’s model

Research has shown (e.g. Levine et al., 1996) that g predicts outcomes
such as job performance and training very well. At the same time,
people who score higher in g show lower correlations between the s-
factors than people who score lower in g. This suggests that it may be
more important to assess the s-factors associated with job performance
for people who score high in g than it is for people who score relatively
low in g (Lubinski & Benbow, 2000).

10.3.2 Thurstone’s theory of primary mental abilities (1930s)


In the 1930s a different approach was put forward by an American
psychologist named Louis Thurstone, who argued that intelligence
consisted of a number of overlapping but distinct abilities. Using 14-
year-olds and college students as his participants, Thurstone (1938)
identified seven relatively distinct and independent abilities, which he
labelled primary mental abilities* or PMAs.

1. Verbal comprehension. This is measured by vocabulary and
reading comprehension tests.
2. Verbal fluency. This ability relates to writing and producing words,
and is tested by setting tasks such as: “Think of as many words as
you can that start with “C” in two minutes”.
3. Inductive reasoning. This is the reasoning involved in analogies,
completing a number series or in predicting the future, based upon
past experience.
4. Spatial visualisation. This involves mental rotation tasks such as
those required to fit a set of suitcases into the boot of a car.
5. Number. This is measured by simple mathematical problem-solving
tests.
6. Memory. This is the ability to recall pictures and texts, and to
remember people’s names or faces.
7. Perceptual speed. This refers to the ability to recognise and
respond to shapes quickly, as is needed when rapidly proofreading a
text to discover typing errors.

Thurstone’s theory of primary mental abilities suggests that it is entirely
possible for a person to be very good at mathematics, for example, and
mediocre in subjects requiring a good knowledge of English. Rather
than seeing intelligence in terms of a sun and planets, we could best
describe Thurstone’s PMAs as seven equally important balls being
juggled by a circus performer.

10.3.3 Raymond B. Cattell (1960s–1970s)


Spearman’s model was not accepted by everyone, including Cattell, who
argued in the 1970s (e.g. Cattell, Eber & Tatsuoka, 1970) that
intelligence consists of two major factors which he termed fluid and
crystallised intelligence. Fluid intelligence* involves those skills used
in rule discovery and problem solving (as described above), whereas
crystallised intelligence* abilities, including vocabulary, general
information and knowledge about specific fields, are derived from fluid
abilities and can be seen as their products. Crystallised intelligence
therefore reflects the facts and knowledge that people have learned at
school and elsewhere.
The currently accepted relationship between fluid and crystallised
intelligence is called the investment theory of intelligence*. According
to Cattell (1987), we are all born with a certain raw ability to see
relations and identify rules or patterns that exist between objects, and
we can measure this ability (i.e. fluid intelligence) using appropriate
culture-fair tests. As we get older, we “invest” this fluid intelligence in
certain kinds of judgement skills, such as those involved in doing a
mathematical word problem or composing a sentence. When we are
young, the formal education most people receive means that our fluid
intelligence and our crystallised intelligence are so similar at an early
age that it is almost impossible to tell them apart.

As we grow older, however, we all begin to invest our fluid intelligence
in different areas, and our fluid and crystallised intelligence begin to
diverge. People who invest their fluid intelligence in academic activities
continue to show intellectual growth on conventional (crystallised) IQ
tests. Those that put their intelligence to work in other less mentally
stimulating areas will not show the same intellectual growth, and may
even show a decline in IQ on conventional measures of intelligence with
the passage of time. Together with John L. Horn, an American
psychologist, Cattell has shown that as people grow older, their fluid
intelligence (involving tasks such as memory, processing speed and
various types of reasoning) falls off, although their crystallised
intelligence (involving such aspects as vocabulary, general knowledge
and some number skills) is maintained or even increases with healthy
ageing (Horn & Cattell, 1966).

An image that springs to mind is that solving a problem or thinking a thought is
like baking a cake, which requires two different components – a set of ingredients
(flour, eggs, milk, vanilla essence, etc.) and a recipe (or set of rules for combining
the ingredients). In terms of this image, crystallised intelligence and knowledge
are the ingredients, and fluid intelligence is the rules for combining them (the
recipe). As people age, they forget the recipe, even though they still have all the
ingredients.

10.3.4 Philip Vernon (1950s–1970s)


Like Cattell, Vernon (1960) argues that abilities are hierarchical – that
is, they are arranged in a pyramid shape. At the top of the hierarchy is g,
or general ability. Below g in the hierarchy are two major group factors
representing a verbal–educational ability (v:ed) and a spatial– practical–
mechanical ability (k:m). The v:ed factor can be broken down further
into such aspects as verbal and numerical ability. Similarly, the k:m
factor can be broken down into specific factors such as perceptual speed
and spatial ability. This is shown in Figure 10.2.

Figure 10.2 Vernon’s model

Source: See Johnson & Bouchard (2005, p. 395)

10.3.5 J.B. Carroll (1930s–1970s)


A similar approach was adopted by Carroll who, over a 60-year period
from 1927 to 1987, examined some 430 data sets involving 130 000
participants. On the basis of his results, he came up with a hierarchical
model consisting of three levels or strata:

Stratum 1: Narrow specific abilities, such as spelling

Stratum 2: General abilities: speed and accuracy of abstract reasoning
or fluid intelligence abilities (gf), and the ability to accumulate
knowledge or crystallised intelligence (gc)

Stratum 3: Single general intelligence or g

Carroll (1993) summarises his work in his book Human cognitive
abilities. Using data from 3000 US Air Force applicants, Carretta and
Ree (1996) showed that a hierarchical model of ability explains the data
in a large US Air Force sample far better than does a g-only model.

10.3.6 J.P. Guilford (1950s–1980s)


By the latter part of the 20th century, most psychologists agreed that a
broader subdivision of abilities was needed than was provided by
Spearman, but not all agreed that the subdivision should be hierarchical
as suggested by Vernon and Carroll. J.P. Guilford, also an American
psychologist, proposed what he called the “Structure of Intellect” model
(Guilford & Hoepfner, 1971), arguing that intelligence was represented
as a cube with three intersecting dimensions:

1. Operations. There are five kinds of mental processes, namely
evaluation, convergent production, divergent production, memory
and cognition.
2. Contents. There are five areas in which problems can occur, namely
visual, auditory, symbolic, semantic and behavioural.
3. Products. There are six kinds of responses or products, namely
units, classes, relations, systems, transformations and implications.

These various factors were arranged rather like a Rubik’s cube. In the
original model there were only four content areas, giving some 120
different factors (5 × 4 × 6 = 120). In 1984, Guilford increased the
number of abilities proposed by his theory, raising the number of
content areas to five (by separating auditory from visual content) and so
raising the total to 150 (5 × 5 × 6 = 150). This is shown in Figure 10.3.

Figure 10.3 Guilford’s SI model


Source: Guilford & Hoepfner (1971)

10.4 The cognitive approach

Theories of intelligence that rely entirely on the psychometric approach
that we have described so far suffer from two major flaws: first, they are
descriptive rather than explanatory – they seek to describe the structure
of intellectual performance, but do not explain how it is made possible.
Second, they lack a theory that is independent of the data which provide
evidence for it. As a result, a different approach to understanding what
intelligence is has been formulated – the cognitive or information-
processing approach. This approach deals with both the above
limitations of the psychometric approaches. There are a number of
theories regarding the cognitive approach to intelligence.

10.4.1 Hunt’s cognitive correlates approach


Instead of starting with conventional psychometric tests, three American
psychologists, Earl B. Hunt, Nancy Frost and Clifford E. Lunneborg,
began their study of intelligence with tasks that experimental
psychologists were using in their laboratories to study the basic
phenomena of cognition, such as perception, learning and memory.
They showed that individual differences in performing these tasks were
in fact related (although rather weakly) to individual differences in
intelligence test scores. These results, they argued, showed that the basic
cognitive processes might be the building blocks of intelligence.

In their research they used a technique designed by Michael Posner
(1980), in which a person is shown a pair of letters, such as “AA”, “Aa,”
or “Ab”. The person has to indicate as quickly as possible whether the
two letters are the same physically (AA) by pressing key 1 on a keypad,
whether the two letters are the same only in name (Aa) by pressing key
2, or whether the two letters are different (Ab) by pressing key 3.

They recorded the time taken (in milliseconds) to answer each type of
item pair and then subtracted the reaction time to the question about
physical match (AA) from the reaction time to the question about name
match (Aa). In this way they were able to separate the time required for
sheer speed of reading letters and pressing keys on a computer from the
time taken to interpret the different shapes “A” and “a”. Their most
important finding was that people differed in the speed with which they
identified the different letter combinations, and that these score
differences were closely related to scores on various intelligence tests,
especially those tests of verbal ability, such as verbal analogies and
reading comprehension. The researchers concluded that people who
score well on verbal tests are those who have the underlying ability to
absorb and then retrieve from memory large amounts of verbal
information in a short space of time. The short time they took to process
the verbal information was the key to their verbal intelligence.
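The subtraction logic can be made concrete with a short sketch. The Python fragment below is purely illustrative (the two test-takers and their reaction times are invented, not data from Hunt's studies); it simply computes the name-match minus physical-match difference that Hunt and his colleagues treated as an index of how quickly verbal codes are retrieved from memory.

# Purely illustrative sketch of the Posner subtraction logic: the difference
# between name-match ("Aa") and physical-match ("AA") reaction times is taken
# as an index of speed of access to verbal codes. All values are invented (ms).
reaction_times = {
    "person_1": {"physical_match": 520, "name_match": 590},
    "person_2": {"physical_match": 510, "name_match": 640},
}

for person, times in reaction_times.items():
    difference = times["name_match"] - times["physical_match"]
    print(f"{person}: name-match minus physical-match = {difference} ms")

On this logic, the smaller the difference, the faster the person is assumed to be at retrieving and interpreting the verbal code for the letters.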

When Hunt and his colleagues (Hunt, Frost & Lunneborg, 1973) used
Posner’s technique, Hunt realised that the people who performed well
also did well at other verbal tasks (he termed these people “high
verbal”). He argued that what was happening was far more than a simple
storage and retrieval process, and that those people who were good at
the Posner tasks were, in fact, using higher-level thinking processes to
decide on the best strategy for approaching the task. As a result of these
findings, in 1973 he began to consider what it was that people with a
high verbal intelligence did that was different from those who were
lower in verbal intelligence – he asked the question: What does it mean
to be high verbal? (Hunt, Frost & Lunneborg, 1973).

By posing this question, Hunt became the first person to move from
looking at the outcomes or products of thinking (what everyone
discussed so far had done) to asking questions about the processes
involved in thinking.

10.4.2 Sternberg’s componential theory (1970s–1990s)


A few years later, the American psychologist Robert J. Sternberg
suggested an alternative approach to studying the cognitive processes
underlying human intelligence. He argued that Hunt and his colleagues
had found only a weak relation between basic cognitive tasks and
psychometric test scores because the tasks they were using were not
complex enough. Although low-level cognitive processes may be
involved in intelligence, Sternberg (1977) believed that they were
peripheral rather than central. He proposed that psychologists should
rather study the tasks found in the intelligence tests and then determine
the mental processes and strategies that people use to perform those
tasks. After all, if we are trying to understand what intelligence is, why
not use the items that have been developed to measure intelligence?

Sternberg began by looking at the processes involved in solving analogy
tasks such as: “lawyer” is to “client” as “doctor” is to “?”.
Various possible answers were given, for example “patient”,
“medicine”, “illness” and “cure”. (The notation used in these items was
lawyer : client :: doctor : ? a) patient b) medicine c) illness d) cure.) He
found that there are five distinct processes or components that underlie
the processing of these analogies. These are:

1. Evaluating and understanding each term used in the problem (e.g.
lawyer, client, doctor, patient, medicine, illness and cure) by
focusing on each term and retrieving information about the term
from memory
2. Inferring the relationship between (i.e. the rule linking) “lawyer”
and “client” (i.e. a lawyer is a knowledgeable specialist who tries to
solve the legal problems of a client)
3. Mapping or transferring the relationship or rule that applies in the
first half of the problem onto the second half (i.e. a doctor is a
knowledgeable specialist who tries to solve the medical problems of
a person)
4. Applying the rule or relationship to the second part of the problem
(i.e. “What is the name of a person who consults a doctor in order to
solve a medical problem?”). In this case, the answer can only be a
“patient”, which is answer “a”.
5. Indicating the chosen alternative on the answer sheet

In 1977, Sternberg carried out a series of observations in a test
laboratory where these five different aspects of a participant’s
performance were timed very accurately. Using this reaction-time data,
he was able to isolate the various components of information processing.
He determined whether or not each person did indeed use these
processes, how they were combined, how long each process took (in
milliseconds), and how susceptible each process was to error. He was
able to show that everybody used the same steps to solve the analogies,
and that these and similar cognitive processes were involved in a wide
variety of intellectual tasks. He went on to show that these and other
related processes underlie scores on intelligence tests (Sternberg, 1977).

Sternberg was also able to show that people who were better at solving
the mental problems took less time to process the information than
people who were not so good at solving them. He also showed that the
better problem solvers spent longer on the encoding stage (step 1) and
less time on the relationship stages (steps 2–4) than did less-able
problem solvers. As Sternberg put it: “They want to make sure they
understand what they are doing before they go ahead and do it” (2000, p.
252). Sternberg called this approach a componential approach because it
involved breaking mental problem solving down into the component
processes that made problem solving possible.
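The five components can be illustrated with a small sketch. The fragment below is a toy model only: the miniature “knowledge base” and the item are made up for this example and are not part of Sternberg’s materials. It encodes the terms, infers the rule linking the first pair, maps and applies that rule to the second pair, and then reports the chosen option.

# Toy illustration of Sternberg's five components for the item
# lawyer : client :: doctor : ?  (options: patient, medicine, illness, cure).
# The "serves" dictionary is an invented stand-in for knowledge in memory.
serves = {"lawyer": "client", "doctor": "patient", "teacher": "pupil"}

def solve_analogy(a, b, c, options):
    # 1. Encoding: retrieve what is known about each term (here, who each serves).
    # 2. Inference: check the rule linking a to b (a lawyer serves a client).
    rule_holds = serves.get(a) == b
    # 3. Mapping and 4. Application: carry the same rule over to c.
    answer = serves.get(c) if rule_holds else None
    # 5. Response: indicate the chosen alternative.
    return answer if answer in options else None

print(solve_analogy("lawyer", "client", "doctor",
                    ["patient", "medicine", "illness", "cure"]))  # prints: patient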

10.4.3 Sternberg’s triarchic theory of intelligence (1980s–2000s)

Building on and extending his componential analysis of thinking,
Sternberg (1988) went on to develop what is termed his triarchic theory
of intelligence. In this theory he argues that there are three distinct and
largely independent types of intelligence: analytical, creative and
practical. Each is characterised by different cognitive operations and
practical activities. Analytical intelligence involves solving familiar
problems by using strategies that manipulate the elements of a problem
or the relationships between them. It is characterised by the processes of
analysis, comparison and evaluation. Creative thinking involves solving
new kinds of problems that require people to think about a problem and
its elements in a new way. It is characterised by creation, invention and
design. Practical thinking involves solving problems by applying what
people already know to everyday contexts. In other words, people
“apply, use and do”.

A key theme underlying all Sternberg’s work is that intelligent thought
and activity take place in, and are shaped by and oriented towards, the
specific contexts in which people operate. For example, analytic abilities
are called upon when a person tackles “familiar problems that are
largely academic because they are abstracted from the substance of
everyday life” (Sternberg, 2000, p. 253). In contrast, creative abilities
are used when a person comes up with new solutions to old problems
and/or identifies new problems other people had not previously thought
of.

Because of this contextualist view, Sternberg (1988) emphasises that
what is defined as intelligent as opposed to unintelligent is shaped by
culture and therefore varies greatly between cultures and different
historical periods. This point was made earlier when we said that
intelligence is a political rather than a technical term, because it is
defined by the people who hold power in business and educational
circles. See Sternberg (1988, pp. 366–367) and Westen (2002, pp. 279–
280) for further discussion of this.

The primary strength of Sternberg’s work is that he has provided us with
a sound understanding of what actually happens at an information-
processing level when people solve problems. He has also provided a
systematic method for relating laboratory-based studies of mental
problem solving to the psychometric methods of standardised testing.
His work thus links with that of people like Binet and Wechsler. His
work also fits in with those who sought an explanation of intelligence in
basic psycho-physiological processes such as reaction time. Of course,
his work differs from and is vastly superior to that of these earlier theorists,
because new technologies such as computers and millisecond timers
allowed him to look at aspects of behaviour that were beyond their
reach. His work also provides a coherent set of theories and principles
by which to identify the existence of types of intelligence previously
unrecognised by psychologists. Finally, he has developed new ways of
measuring creative intelligence and has stimulated a new field of
research into practical intelligence. (See Sternberg (2000) for an updated
overview of this.)

However, on the negative side, a major drawback to Sternberg’s
approach is that there is no way of systematically assessing the various
aspects of his triarchic theory.

10.4.4 Howard Gardner’s theory of multiple intelligences* (1980s–2000s)

Howard Gardner is another psychologist who has long been very
dissatisfied with traditional psychometric (i.e. intelligence test)
approaches to the study of intelligence and with what he perceives to be
their narrowness, arguing that a person may be good at learning
languages, but may struggle to learn music, or vice versa. Gardner has
tried to “go back to basics” by asking the fundamental question of
whether all the symbol systems that humans have invented and use, such
as everyday language, mathematics and music, involve the same abilities
and “intelligence”, or whether they draw on different kinds of
intelligence. To answer this question, Gardner (1983, 1993) considered a
wide variety of information such as the effects of brain damage in
particular areas on particular types of ability, the distinctive patterns of
lifetime development of these people with brain damage, and the study
of exceptional individuals, and came up with a theory of multiple
intelligences.

According to Gardner, all people possess a number of intellectual
potentials, or “intelligences”, each of which involves a somewhat
different set of skills. He argues that what people are born with provides
the raw capacities for each of these intelligences, and that culture and
social experiences provide the means whereby they are able to develop
and use these inherited capacities. Although these various intelligences
normally interact, they can function independently to some degree, and
people may develop certain intelligences to a greater degree than others,
for example as a result of training and practice.

Initially, Gardner (1983) argued that there were seven different types of
intelligence. In addition to the three recognised by traditional
approaches (verbal or linguistic, mathematical and spatial), he added
musical intelligence, bodily/kinaesthetic intelligence (exhibited, for
example, in dancing, sport and athletics), intrapersonal intelligence
involving the understanding of ourselves and what “makes us tick”, and
interpersonal intelligence shown in being able to relate to and effectively
understand other people. The last two correspond closely to what other
researchers like Goleman (1995) have called emotional intelligence*.
Gardner later added an eighth form of intelligence, which he termed
naturalistic intelligence. This is the ability to understand, relate to,
categorise, classify, comprehend and explain the things encountered in
the world of nature. People such as farmers, ranchers, hunters, gardeners
and animal handlers would exhibit high levels of this kind of
intelligence. He has also added a ninth form of intelligence, which he
calls spiritual intelligence.

The original seven intelligences and examples of the types of people
who exhibit a high level of each are listed in Table 10.1.

Table 10.1 Gardner’s seven intelligences


Linguistic: Verbal/linguistic intelligence relates to the utilisation of language, including reading, vocabulary, formal speech, creative writing, poetry, verbal debate, humour and storytelling. It includes the ability to communicate well, both orally and in writing, often in several languages. Examples: poets, writers, orators, communicators.

Logical-mathematical: Logical/mathematical intelligence is related to problem solving and pattern recognition, including abilities involving abstract symbols and formulae, number sequences, calculations and codes. It covers solving mathematical problems and the ability to reason and handle complex logical arguments, as well as balance a budget. Examples: mathematicians, logicians, scientists.

Spatial: Visual/spatial intelligence is related to designs, patterns, shapes, active imagination, visualisation and imagery. It includes the ability to know where one is relative to fixed locations, and to accomplish tasks requiring three-dimensional visualisation. It also covers hand-eye coordination, as well as reading a map. Examples: architects, navigators, draughtspersons, surgeons, sculptors, painters.

Bodily/kinaesthetic: This is the ability to use one’s physical body well. Bodily/kinaesthetic intelligence is related to physical movement and expression through physical exercise, body language, drama and dance. Examples: dancers, athletes, craftspeople, sportsmen and sportswomen, people with ball sense.

Musical: This refers to the ability to learn, perform and compose music. Musical/rhythmic intelligence is related to the vibrational effect of music on the brain, including rhythmic patterns, environmental sounds, singing and musical performance. Examples: musicians, composers, people who appreciate classical music.

Intrapersonal: Intrapersonal intelligence is related to introspection and knowledge of the internal aspects of the self. It includes the ability to know one’s own body and mind. Examples: people who have good insight into themselves and make effective use of their other intelligences.

Interpersonal: Interpersonal intelligence is related to person-to-person encounters in such things as effective communication, working with others towards a common goal, and noticing distinctions among people. It includes the ability to sense others’ feelings and be in tune with others. Together with intrapersonal intelligence, this forms the basis of emotional intelligence (Mayer & Salovey, 1993). Examples: salespeople, teachers, clinicians, politicians, religious leaders, psychologists.

Source: Based on Gardner (1993)

Gardner’s theory has inspired some educationalists to develop new
approaches to teaching and learning, designed to enhance the
intelligences he claims to have identified, and his work has been taken
up and applied by practitioners working in the field of accelerated
learning. On the other hand, his work has had less influence on research
psychologists, possibly because he and his colleagues have not been
much concerned with developing psychometric tests to measure the
intelligences identified by his theory. Consequently, the theory is
difficult to test in a systematic way. Another problem is that his theory
does not set any limits on the number of extra intelligences that might be
identified in the future. If we accept the existence of bodily intelligence,
why not distinguish specific kinds of such intelligence, like dance
intelligence or an intelligence for running, football or rugby?

It seems that in many ways, Gardner’s theory is a move back to the
model suggested by Thurstone. While it may be tempting to find a large
number of intelligences so that everyone can be “intelligent” in some
fashion, this is nothing more than playing semantic games, and by doing
this we eliminate any real meaning the term “intelligence” may have
had. It is probably theoretically more sound to see these intelligences as
areas or domains that require slightly different combinations of
knowledge and rules for combining information. Being street smart,
having practical or common sense, and being socially intelligent, for
example, are “specialisations” of intelligence, just like academic
intelligence is. Other things, like musical ability, or kinaesthetic or
artistic abilities, are talents in their own right and not new kinds of
intelligence.
As with Sternberg’s theory, there is a major problem because there is no
way of systematically assessing the various kinds of intelligence
identified by Gardner.

10.4.5 Das and Naglieri’s PASS Theory of Intelligence


The PASS Theory of Intelligence was first proposed in 1975 by Das,
Kirby and Jarman (1975) and later elaborated by Das, Naglieri and
Kirby (1994) and Das, Kar and Parrila (1996). It builds on the views of
both Sternberg and Gardner, seeing intelligence as neither a single nor a
purely biologically determined factor, but as a number of domains that
represent the interaction of the individual’s biological predispositions
with the environment and cultural context (Das, Naglieri & Kirby,
1994). It challenges Spearman’s g-theory on the grounds that the brain is
made up of interdependent, but separate, functional systems. Studies
involving neuro-imaging and clinical studies of individuals with brain
lesions indicate quite clearly that the brain is modularised; for example,
damage to one very specific area of the left hemisphere will impair the
production (but not the comprehension) of spoken and written language,
while damage to another area will have the opposite effect, preserving the
individual’s ability to produce, but not to understand, speech and text.
According to the PASS model, cognition consists of four processes,
namely planning, attention (arousal), simultaneous processing and
successive processing (hence PASS) (Das, Naglieri & Kirby, 1994).

Planning is concerned with various executive functions responsible for
controlling and organising behaviour, selecting and constructing
strategies, and monitoring performance. Attention is responsible for
maintaining arousal levels and alertness, and ensuring focus on relevant
stimuli. The next two processes involve simultaneous and successive
processing to encode, transform and retain information. Simultaneous
processing is engaged when the relationship between items and their
integration into whole units of information is required. Examples of this
include recognising figures, such as a triangle within a square vs a
square within a triangle, or the difference between “he went for a run
before breakfast” and “he had breakfast before going for a run”.
Successive processing is required for organising separate items in a
sequence such as remembering a sequence of words or actions exactly in
the order in which they had just been presented.

Das, Naglieri and Kirby (1994) link these four processes to four
functional areas of the brain. Planning is broadly located in the frontal
lobes, while attention and arousal are functions of the frontal lobe and
the lower parts of the cortex, with some additional involvement of the
parietal lobes in attention. Simultaneous and successive processing
occur in the posterior region of the brain. Simultaneous processing is
broadly associated with the occipital and the parietal lobes, while
successive processing is broadly associated with the frontal-temporal
lobes.

Based on this theory and on related studies in cognitive psychology, Das
(2002) and Naglieri and Das (1997) developed a cognitive measurement
instrument called the Das-Naglieri Cognitive Assessment System (CAS).
provide information about cognitive strengths and weaknesses in each of
the four PASS processes. This emphasis on processes (rather than
abilities) makes it useful for differential diagnosis – unlike more
traditional full-scale IQ tests, the CAS is also able to diagnose such
aspects as attention deficit disorder (ADD), learning disabilities, autism,
mental retardation, cognitive changes resulting from aging and Down
syndrome. More recently, the CAS has been used to identify changes
due to brain impairment in stroke patients and has been shown to be
useful for measuring aspects of the planning and decision-making
processes in management (Das, Kar & Parrila, 1996).

10.4.6 Emotional intelligence


In the early 1990s, a very popular type of intelligence, emotional
intelligence (EI), was put forward by Mayer and Salovey (Salovey &
Mayer, 1990; Mayer & Salovey, 1993). They see EI as “a type of social
intelligence that involves the ability to monitor one’s own and others’
emotions, to discriminate among them, and to use the information to
guide one’s thinking and actions” (Mayer & Salovey, 1993, p. 433). The
idea of EI was popularised by Daniel Goleman (1995). According to
Mayer and Salovey, EI includes Gardner’s interpersonal and
intrapersonal intelligences, and involves abilities that may be
categorised into five domains:

1. Self-awareness. Observing oneself and recognising a feeling or
emotion as it happens
2. Managing emotions. Handling emotions so that they are
appropriate; realising what is behind an emotion; finding ways to
manage fears and anxieties, anger and sadness
3. Motivating oneself. Channelling emotions to serve a goal; having
emotional self-control; delaying gratification and stifling impulses
4. Empathy. Being sensitive to others’ feelings and concerns, and
seeing their perspective; appreciating the differences in how people
feel about things
5. Handling relationships. Managing emotions in others; having
social competence and social skills

In recent years, the concept of emotional intelligence has gained in
popularity, and several comprehensive models of emotional intelligence
have provided alternative theoretical frameworks for conceptualising
this construct. As Emmerling and Goleman (2003) note, there have been
three quite distinct approaches to EI, represented by the work of Bar-On
(1997), Goleman (1995), and Mayer and Salovey (1993). As Caruso
(2004), in his review of Emmerling and Goleman’s (2003) paper, points
out, Bar-On’s interests seemed to have grown out of his concern with a
concept called subjective wellbeing and on non-intellective aspects of
performance. Goleman was a student of David McClelland and is
concerned with the area of competencies. Mayer and his colleague
Salovey both worked in the areas of human intelligence as well as
cognition and affect (how emotions and thinking interact to affect
performance, especially with respect to health psychology).

A certain amount of academic research has been done on EI, and the
originators of the theory have provided evidence for its construct
validity. However, others such as Davies, Stankov and Roberts (1998)
claim that its construct validity cannot be demonstrated and that EI does
not fit the true definition of intelligence, but is rather much closer to
notions of personality and emotional control. As with the theories of
Sternberg and Gardner, a major problem is that there is no agreed way of
systematically assessing the construct.

10.4.6.1 Emotional intelligence as “intelligence”


A major issue that needs clarification is whether emotional intelligence
is a form of intelligence as defined (efficiency of information
processing), or whether it is closer to being a personality variable
(preferred or typical ways of dealing with the world). Those in favour of
the intelligence argument maintain that it has a direct relationship to the
concept of “social intelligence” which was first identified by Thorndike
in 1920. He defined social intelligence as “the ability to understand and
manage men and women, boys and girls – to act wisely in human
relations”. These theorists then built on Gardner’s (1983) model of
multiple intelligences (which identified seven types of intelligence),
especially his intrapersonal intelligence (defined as the ability to
symbolise complex and highly differentiated sets of feelings in dealing
with oneself) and his interpersonal intelligence (the ability to notice and
make distinctions among other individuals, and in particular among
their moods, temperaments, motivations and intentions). Even
though Gardner did not use the term emotional intelligence, his concepts
of intrapersonal and interpersonal intelligence provided a foundation for
later models of emotional intelligence.

10.4.6.2 Criticisms of EQ as “intelligence”


According to Emmerling and Goleman (2002), cognitive intelligence
(IQ) is clearly defined and research has demonstrated that IQ is a
reliable and relatively stable measure of cognitive capacity/ability. They
go on to argue that in the area of so-called emotional intelligence (i.e.
EQ), the various definitions of EQ are inconsistent about what it
measures. For example, people such as Bradberry and Greaves (2005)
argue that EQ is not fixed and that it can be learned or increased,
whereas others (such as Mayer) argue that EQ is stable, and cannot be
increased. In addition, Emmerling and Goleman point out that emotional
intelligence has no “benchmark” or external criterion against which to
evaluate itself. They contrast this with traditional IQ tests which have
been designed to correlate as closely as possible with school grades.
Emotional intelligence seems to have no similar objective quantity it can
be based on. Intelligence tests are characterised by items that have one
correct answer, whereas EQ tests are far more similar to personality
scales where the instructions generally stress that there is no correct
answer – respond as you typically react. Finally, traditional intelligence
tests (and they are tests in the true sense of the word) are generally timed
and the items display increasing levels of difficulty. EQ measures do not
have this sense of increasing difficulty about them.

As a result, many psychology researchers do not accept emotional
intelligence to be a part of a “standard” intelligence model (like IQ). For
example, Eysenck (2000) argues that Goleman

exemplifies more clearly than most, the fundamental absurdity of the
tendency to class almost any type of behaviour as an “intelligence” …
If these five “abilities” define “emotional intelligence”, we would
expect some evidence that they are highly correlated; Goleman admits
that they might be quite uncorrelated, … So the whole theory is built
on quicksand; there is no sound scientific basis (pp. 109–110).

There are thus fairly strong arguments that EQ is not a form of
intelligence and that the term is used in a loose, unscientific and populist
fashion. Indeed, this argument could with equal justification be aimed at
Gardner’s theory of multiple intelligences.

10.4.6.3 Emotional intelligence as “personality”


If emotional intelligence (EQ) is not a form of intelligence, what is it?
Various researchers have indicated that EI has many of the properties
associated with personality theories. For example, EI correlates
significantly with two dimensions of the Big Five, namely neuroticism
and extraversion. In common with most personality measures, EI
measures are made up of items that are quite transparent in that the test-
taker knows exactly what is being looked for on the scale. This makes it
very easy for test-takers to respond in a socially desirable way – this is
known as faking good. This is a form of bias or systematic error that has
long been known to contaminate responses on personality inventories. It
is thus argued that the similarities between personality testing and self-
report EI testing and the differences between EI and traditional
intelligence make it reasonable to assert that EI is much closer to being a
measure of personality than it is to being a measure of intelligence. Until
the definition of emotional intelligence is clarified, little progress can be
expected in its assessment.

10.4.6.4 Emotional intelligence as “competency”


One way out of the dilemma is to suggest that EI is a competence rather
than either a form of intelligence or a personality dimension. A
competence is defined as a blend of knowledge, skills, attitudes and
values (or KSAVs) required for success in a particular situation (see
Chapter 12). In support of this, Goleman (1998) has described his five
dimensions of emotional intelligence in terms of 25 different emotional
competencies.

It would thus appear that EI is not strictly a form of intelligence but
rather a set of competencies (which are defined as the knowledge, skills,
attitudes, attributes and values that are required for successful task
performance). Seeing EI as a set of competencies rather than as
intelligence allows us to move beyond the intelligence/personality
debate. It may also open new possibilities for assessment. These three
alternatives are summarised in Table 10.2.

Table 10.2 Three ways of viewing emotional intelligence

EI as intelligence: Intellectual abilities using emotional information (e.g. the ability to identify emotion). Related to models of general, or standard, intelligence. Assessed by timed efficiency measures – tests.

EI as personality: Traits related to adaptation and coping (e.g. assertiveness). Related to models of personality and dispositional traits. Assessed by measures of typical or preferred ways of reacting with others – inventories.

EI as competency: Acquired KSAVs underlying effective performance (e.g. influence in leadership). Related to leadership competency models. Assessed by demonstrated behaviour patterns in a specified situation – role plays, simulations.

10.5 Assessing intelligence

Given this background, how do we assess intelligence? As you can see,
there are many different models of intelligence, each giving rise to its
own approach as to how it should be assessed. However, what is
common to most of these is the idea that intelligence involves certain
knowledge (or bits of information), “rules” that link one bit of
information to others and the ability to use this information and the rules
to identify patterns and solve problems quickly and efficiently. This is
best illustrated using a typical verbal analogy test item such as “hot” is to
“cold” as “wet” is to “?”. In this example, the “rule” that applies to the
first part is that hot is the opposite of cold; if we apply this rule of
opposites to wet, the answer must be dry, because dry is the opposite of
wet.

10.5.1 Series items


Of course, not all items are language based, and some can use abstract
and realistic objects in various ways. A very common approach is to use
a series such as that shown in Figure 10.4.

Figure 10.4 A typical series item

Clearly, the answer to the question in Figure 10.4 is five stars, because
the number of stars increases by one each time, so the rule that has to be
identified is “increase the number of stars by one each time”. Of course
some other rule, such as “increase the number of stars by two each
time”, could just as easily have been used.
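As a rough illustration of how such a rule can be stated and checked, the sketch below (a toy example, not an item from any published test) encodes the series as star counts, tests the rule “increase by one each time” and then generates the next term.

# Toy sketch: represent the series in Figure 10.4 as counts of stars and
# test the rule "increase the number of stars by one each time".
series = [1, 2, 3, 4]                  # number of stars in each cell so far

step = series[1] - series[0]           # candidate rule: constant increase
rule_holds = all(b - a == step for a, b in zip(series, series[1:]))

if rule_holds:
    print("Next item:", "*" * (series[-1] + step))   # prints *****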

10.5.2 Matrix items


A similar kind of problem is a matrix, which is simply two sets of series,
one that goes across and one that goes down. (See, for example, Raven’s
Progressive Matrices in e.g. Raven, Raven & Court, 2003/4.) This is
shown in Figure 10.5.

Figure 10.5 A typical matrix item

As we can see, there are two series, one going across (increase by 1, but
use same symbol), and one going down (increase by 1 and change
symbol). So what should the missing value be? Clearly, from the across
rule there should be five asterisks, and from the down rule there should
also be five asterisks. In all cases, the across and down answers should
be the same. The participant must then identify which of the answer
options contains five stars (*****).
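A minimal sketch of this across/down logic is given below; the 3 × 3 layout is invented to stand in for Figure 10.5, and only the star counts (not the change of symbol) are modelled.

# Toy sketch of a matrix item: rows increase by one going across and columns
# increase by one going down; the missing cell must satisfy both rules.
matrix = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, None],                      # the cell to be completed
]

across_prediction = matrix[2][1] + 1   # continue the bottom row
down_prediction = matrix[1][2] + 1     # continue the right-hand column

assert across_prediction == down_prediction   # the two rules must agree
print("Missing cell:", "*" * across_prediction)   # prints *****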

10.5.3 Odd one out


A format that is often used is the odd-one-out technique, where five
objects are shown, one of which does not belong to the group. The
test-taker is required to indicate which of the objects does not belong
with the others. These objects can be pictures representative of animals
(e.g. a dog, cow, horse, sheep and lion) or vehicles, or they can be
abstract shapes (e.g. four closed figures – a circle, a square, a diamond
and a parallelogram, say – and an open figure such as a square with one
side missing). The items can even consist of words (e.g. four verbs and a
noun) or numbers (e.g. four numbers divisible by 7 and one number not
divisible by 7).
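For the numerical version of this format, the underlying logic can be sketched as follows (the numbers are invented for illustration): the odd one out is simply the item that fails the rule shared by the other four.

# Toy sketch: find the odd one out where the shared rule is "divisible by 7".
items = [14, 21, 35, 49, 18]           # invented item set

odd_ones = [n for n in items if n % 7 != 0]
print("Odd one out:", odd_ones[0])     # prints 18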

10.5.4 General knowledge items


In some cases, the items involve general knowledge, such as: “Who was
the first president of democratic South Africa?” or “What is the name of
the largest game reserve in South Africa?” Sometimes the items involve
knowledge of English (or Afrikaans) grammar, spelling, punctuation,
etc. Often, items are numeric in nature (what is 6 + √9?). These general
knowledge items are the most likely to be unfair to different cultural
groups and people with less education because they reflect the social
context in which the people have been raised. These items are often said
to be culturally saturated.

10.5.5 Assembly tasks*


An important approach to the assessment of intelligence involves the
assembly of objects such as jigsaw puzzles. This is a common technique
in the various Wechsler intelligence tests where the person being tested
has to construct a figure (or mannikin) from existing pieces (e.g. arms,
legs, head, torso). A related technique is to have the person draw either a
house, tree or person. An example of the latter is the Goodenough-Harris
“Draw a Person” test, in which the resulting figure is scored for the
presence of all limbs, correct proportions, profiling, and so on (Harris,
1963). This more hands-on approach is particularly useful when the
people being tested are very young or when they have special needs.
Some tasks even require the continuation of a pattern using shapes and
blocks.

10.5.6 Group and individual assessment


There is a major difference between assessment processes that can take
place in a group and those that are individually administered. In general
terms, young children and people with special needs are assessed one-to-one,
with the measures administered for diagnostic purposes (to determine
specific areas of strength or weakness). When assessment is done more
routinely, such as with high school pupils and normal adults, group
assessment is far more economical, although somewhat more global and
less diagnostic.

10.5.7 Verbal and performance scales


A distinction is often made in assessment between verbal and non-verbal
(or performance) scores. Typically, when an assessment is done using
intelligence tests, three sets of scores (often termed IQ scores) are given,
namely verbal IQ, performance IQ and full-scale IQ. Because verbal
material is influenced to a greater extent than non-verbal material by
cultural factors and deprivation, the non-verbal measures are generally
regarded as more accurate in cases where deprivation and cultural
differences are identified. A discrepancy of more than one standard
deviation (15 IQ points) between the verbal and performance IQ for any
individual should be regarded as a strong indication of possible cultural
deprivation.
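The discrepancy rule itself is easy to state as a simple check. The sketch below uses invented scores; the 15-point threshold is the one-standard-deviation rule referred to above.

# Illustrative check of the verbal-performance discrepancy rule: a gap of
# more than one standard deviation (15 IQ points) is flagged for follow-up.
verbal_iq = 92                         # invented scores for illustration
performance_iq = 110

if abs(verbal_iq - performance_iq) > 15:
    print("Discrepancy exceeds one SD; consider possible cultural deprivation.")
else:
    print("Verbal and performance IQs are within one SD of each other.")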

10.5.8 Dynamic testing*


One final approach to assessing intellectual ability and potential is that
put forward by the Israeli psychologist Reuven Feuerstein (e.g. 1979).
According to Feuerstein, a major problem with traditional measures of
intellectual ability is that they are based on the assumption that people’s
social and educational backgrounds are relatively similar, and that
differences in intelligence scores reflect differences in processing speed
rather than in basic knowledge. He goes on to argue that this is not
necessarily true, especially when assessing the ability of socially
impoverished groups. Working from Vygotsky’s (1978) theory of the
zone of proximal development, Feuerstein takes a dynamic view of
intelligence, arguing that it is the person’s ability to learn rules quickly
that is the key aspect. He takes a three-step approach: firstly, he uses a
relatively simple task such as matrix thinking, and assesses the person
on this task. Then he teaches the person how to do the tasks involved in
the matrix. Finally, he retests the person using a parallel version of the
matrix test. The extent to which the person improves his score the
second time around is a much better indicator of the person’s
intelligence than either the initial or the second test. This test-train-retest
technique is known as the Learning Potential Assessment Device or
LPAD. (In recent years, this has been relabelled Learning Propensity
Assessment Device.) This approach is termed dynamic assessment and
is being increasingly used to assess ability and potential, both in South
Africa and elsewhere. (For an in-depth look at the theory behind
dynamic assessment, readers are referred to Amod and Seabi (2013)
while Murphy and Maree (2006) give a good account of dynamic
assessment in South Africa.)
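In terms of scores, the test-train-retest logic boils down to comparing performance on the parallel forms before and after the mediated learning phase. The sketch below is a bare-bones illustration with invented names and scores; it is not the LPAD scoring procedure itself.

# Bare-bones illustration of the test-train-retest idea: the gain from the
# pre-test to the parallel post-test is taken as an indicator of learning
# potential. Names and scores are invented.
pre_test = {"learner_a": 12, "learner_b": 18}
post_test = {"learner_a": 22, "learner_b": 20}

for person in pre_test:
    gain = post_test[person] - pre_test[person]
    print(f"{person}: gain of {gain} items after mediation")

On these invented numbers, the person with the lower initial score shows much the larger gain, which is precisely the kind of information Feuerstein argues a single static test would miss.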

One problem with this approach is that it is labour intensive, especially
during the training period, which is itself difficult to standardise (see De
Beer, 2006). An alternative to this test-train-retest approach (what
Sternberg and Grigorenko (2002) term the “sandwich approach”)
involves a continual process of gauging the individual’s mastery of the
material by offering prompts and assistance during the early part of the
assessment. This is termed the “cake” approach and forms the basis of
several new testing procedures by people such as Terry Taylor (Taylor,
1994, 2006, 2013) and De Beer (2005, 2013).

10.6 The changing context of intelligence testing

Over the past few decades there has been a substantial increase in our
ability to think rationally and solve increasingly complex
problems. This is a result of stimulation by radio, television, better
education, and so on. The effect of this has been that the average level of
cognitive ability has steadily increased. However, because intelligence is
a comparative notion (i.e. people’s intelligence is defined in relation to
others via the normal distribution – the bell curve), IQ scores of the
general population have remained constant over the years, while
those of older people tested on updated tests appear to have decreased.
(This phenomenon is termed the Flynn effect after J.R. Flynn (1984),
who first described it.) Similarly, people raised in non-stimulating
environments and who are not exposed to preferred ways of problem
solving tend to score lower in relation to those who have benefited from
an enriched environment.
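Because IQ is norm-referenced in this way, a raw test score only becomes an “IQ” once it has been located in the current norm group’s distribution. The sketch below shows the standard deviation-IQ conversion (IQ = 100 + 15 × z); the raw score and the norm-group mean and standard deviation are invented for illustration.

# Deviation-IQ conversion: locate a raw score in the norm group (z-score)
# and rescale to a mean of 100 and a standard deviation of 15.
raw_score = 46                          # invented raw score
norm_mean, norm_sd = 40.0, 8.0          # invented norm-group statistics

z = (raw_score - norm_mean) / norm_sd
iq = 100 + 15 * z
print(f"z = {z:.2f}, deviation IQ = {iq:.0f}")   # z = 0.75, IQ = 111

This is also why the Flynn effect appears in the scores: as the norm group’s average raw performance rises, the same raw score translates into a lower IQ when the test is re-standardised on newer norms.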

At the same time, changing historical agendas have altered those aspects
of intelligence considered to be of primary importance. Conventional IQ
tests measure cognitive abilities that are needed to do well at school and
to succeed in various intellectual tasks. In addition to individual
differences in cognitive ability, performance on such tests has been
shown to be influenced by a range of other factors that reflect individual
differences in experience (such as class and ethnicity).

10.7 Summary

Although psychologists still argue about the nature and structure of
intelligence, the debates have moved forward. In the past 20 years or so,
they have broadened their views to include a wider range of abilities that
are evident in everyday life, and have begun to take seriously the
personal, cultural and historical contexts which define a given behaviour
as being more or less intelligent. The methods advocated for measuring
intelligence have also diversified.

Early attempts to measure intelligence were based on the view that
intelligence was an expression of basic psycho-physiological factors
such as reaction time, and visual and auditory acuity. This changed
when Binet introduced his approach of assessing how well people
answered real-life questions. In trying to explain the nature of
intelligence, Spearman used factor analysis, and suggested that there
was a strong central intelligence factor g and numerous specific factors.
Cattell argued for two forms of intelligence, namely fluid and
crystallised intelligence. Numerous other models followed, with
Guilford arguing that there were as many as 150 facets of intelligence.
Gardner argued that there may be anything from seven to nine distinct
types of intelligence. Based on this model, theories of
emotional intelligence have also been developed. What is probably the
most important step forward in conceptualising and measuring
intelligence since Binet is the cognitive or information-processing model
put forward by Sternberg and his colleagues (Sternberg, 2000).

When it comes to assessing intelligence, it is very apparent that the
factors we assess and the methods we use to do this are directly related
to the theory and model of intelligence we believe in. Different theories
of intelligence provide different criteria for defining and measuring
intelligence. Methods used to assess intellectual ability range from
testing the understanding of the meanings of words, through verbal
analogies (“hot” is to “cold” as “black” is to “?”), to series, matrices and
odd-one-out formats. The physical assembly of objects and patterns is
also used. Conventional IQ tests do not measure emotional intelligence,
nor do they measure other kinds of intelligence that have been
postulated by Gardner.

The answer to the question: “What do intelligence tests measure?” will
thus depend on the theoretical perspective adopted by the person
answering the question and on the particular test being considered. For
example, the Wechsler Adult Intelligence Scale (WAIS) clearly measures
a different kind of intelligence to that measured by one of the new tests
of emotional intelligence. Intelligence can be measured in group or one-
to-one situations. The former situation is used with more mature people,
whereas the latter is used with younger people and those with special
needs, and when a greater level of diagnosis is required. Both
approaches can yield verbal, performance and full-scale scores of
intelligence. Series and matrices (especially when abstract symbols are
used) are seen as the best measure of the g-factor*, whereas language,
knowledge and numeracy tests assess Spearman’s s-factors. It is
important to see that the content and assessment methods used in the
various tests of intelligence reflect the different theoretical points of
view of the people who design the tests, as outlined in sections 10.3 and
10.4. The dynamic approach of the test-train-retest (LPAD)
methodology and developments of this technique hold some hope for
socially disadvantaged groups. The chapter concludes by briefly
discussing the changing context in which the assessment of intelligence
and ability is occurring.

Additional reading

For an in-depth (though somewhat dated) look at the issues surrounding the definition
and assessment of intelligence, see Neisser et al. (1995). Intelligence: Knowns and
unknowns: Report of a task force established by the Board of Scientific Affairs of the
American Psychological Association.
Chapter 8 of Cohen, R.J. & Swerdlik, M.E. (2002). Psychological testing and
assessment: An introduction to tests and measurement gives a good account of some
theories of intelligence and how these shape the way in which intelligence is measured.
A good general introduction to intelligence and intelligence testing is given by Louw,
D.A. & Edwards, D.J.A. (1997). Psychology: An introduction for students in Southern
Africa. (See especially Chapter 7.)
Another good overview is provided by Kowalski, R. & Westen, D. (2004). Psychology:
Brain, behavior, and culture (4th ed.). NY: Wiley.
Perhaps the best book currently available on the role of intelligence in the workplace is
Adrian Furnham’s (2008) Personality and intelligence at work, especially Chapter 6.
For a closer look at dynamic testing, see Sternberg, R.J. & Grigorenko, E.L. (2002).
Dynamic testing: The nature and measurement of learning potential. Cambridge
University Press.

Test your understanding

Short questions

1. Discuss the various ways in which intelligence has been defined.


2. Outline the structural (factor analytic) models of intelligence.
3. Discuss the cognitive (information processing) models of intelligence.

Essays
1. Give a brief overview of the way the concept of intelligence has developed over time.
2. Show how the theoretical model of intelligence that has been adopted has a direct
bearing on the way intelligence is assessed.
3. Define intelligence and show how it contributes to workplace success.
11 The assessment of
personality

OBJECTIVES

By the end of this chapter, you should be able to

give various definitions of personality


describe various approaches to conceptualising personality
show how these ways of looking at personality influence how it is assessed
define what is meant by the trait approach to personality
describe the importance of traits to psychology
see how traits are measured
see how traits relate to developmental approaches to personality.

11.1 Introduction

Personality is an important aspect of people’s behaviour, as it leads them
to behave in certain consistent ways and to make certain choices. This
affects the careers they choose and the way they behave in the
workplace (and elsewhere).

11.1.1 Definition of personality


One of the oldest definitions of personality still in use today is that of
Gordon Allport (1937): “Personality is the dynamic organisation within
the individual of those psychophysical systems that determine his or her
unique adjustments to the environment.”

According to Barnouw (1985, p. 188), “[p]ersonality is a more or less
enduring organisation of forces within the individual associated with a
complex of fairly consistent attitudes, values and modes of perception
which account, in part, for the individual’s consistency of behaviour”.

In the view of Robbins (1996, p. 90) “[p]ersonality is the sum total of
the ways in which an individual reacts and interacts with others”.

Similarly, Meyer, Moore and Viljoen (1997, p. 12) argue that
“[p]ersonality is the constantly changing but nevertheless relatively
stable organisation of all physical, psychological and spiritual
characteristics of the individual which determine his or her behaviour in
interaction with the context in which the individual finds himself or
herself”.

Greenberg and Baron (2000, p. 97) state that “[p]ersonality is the unique
and relatively stable pattern of behaviours, thoughts and emotions
shown by an individual”.

Finally, Kaplan and Saccuzzo (2013, p. 440) argue that “personality is
the relatively stable and distinctive patterns of behaviour that
characterise an individual and his or her reactions to the environment”.

In other words, personality is the sum total of the way in which an
individual reacts to and interacts with others and the world in general.
The more consistent a person is in his reactions across different
situations, and the more frequently the person shows these behaviours,
the stronger is the characteristic or trait* and the more important that
trait is for describing the individual. If we adopt an information-
processing view of people, we can define personality as the relatively
stable way in which people prefer to process information and interact
with the world in which they live.

11.1.2 Idiographic versus nomothetic approaches


There are two different views on personality – these are termed the
idiographic and nomothetic approaches. The idiographic view takes an
in-depth look at the factors that go into the person’s makeup and
emphasises that each person is unique in his psychological structure. It
is almost a case study of the person, and no attempt is made to describe
the person in terms of any particular traits or theoretical constructs. This
sometimes makes it difficult to compare one person with others. This
viewpoint also emphasises that a given characteristic may differ in
importance from person to person – they can be cardinal, central or
secondary traits*. Assessing people in terms of this framework makes
use of case studies, bibliographical information, diaries, and so forth for
information gathering.

The nomothetic view, on the other hand, emphasises that all personality
characteristics are well-defined entities and therefore common to all
people. This makes it relatively easy to describe people. People differ in
their positions along a continuum on the same set of characteristics, and
they are unique only in the balance and amount of each characteristic –
it is this balance which constitutes their uniqueness. Most contemporary
psychologists tend towards a nomothetic approach, but they are aware of
how a characteristic may differ slightly from person to person in the way
that it is expressed. This approach tends to use self-report personality
questionnaires, factor analysis and other trait-based methods for gathering
information about and/or describing the person. These methods are
discussed in detail below.

11.2 Theories of personality

Before we try to assess personality, we need to understand the different
theories that are used to describe it. This chapter takes a brief look at six
different theoretical approaches.

11.2.1 Biological approaches


There is some evidence that personality is related to physical
characteristics and may have its origins in factors such as the rate at
which people mature, their general responsivity to stimuli and various
temperamental factors. Perhaps the best-known theory in this respect is
the one put forward by William Sheldon in the 1940s. He held that
distinct personality characteristics were associated with different body
types or somatotypes. Sheldon identified three basic human shapes,
which he termed ectomorph, mesomorph and endomorph (Sheldon &
Stevens, 1942). These body types and associated personality
characteristics are summarised in Table 11.1.

Table 11.1 Sheldon’s somatotypes and associated personality characteristics

Ectomorph: Thin, poorly muscled, energetic; a stick figure. Personality: studious, withdrawn, a typical nerd.

Mesomorph: Well built, athletic, muscular; a superman with a V-torso. Personality: well balanced, motivated, assertive.

Endomorph: Round, fat and flabby, with somewhat underdeveloped muscles; a snowman figure. Personality: relaxed, lazy and laid back.

Sources: Friedman & Shustack (1999, p. 170) and Carducci (2009, p. 327)

Clearly these are stereotypes and people often behave as others expect –
fat people are expected to be jolly and fun loving, while scrawny people
“are supposed” to be bookish and nerdy. It is thus not surprising that
people with particular physiques behave in certain stereotypical ways.
However, some people still believe in this approach – a search of the
Internet for somatotypes will get some good hits.

11.2.2 Developmental approaches


Developmental theories see personality as reflecting the stage of
emotional and cognitive development of the person involved.
Proponents of these theories argue that a newborn child experiences only
two basic emotions, namely pleasure and discomfort. As the
child matures, so the emotions begin to differentiate and to become
more refined. This approach is important for clinical and educational
psychologists who may wish to find out why people behave as they do –
this is vitally important if they wish to change their behaviour. Two
theories are of relevance here, namely Kohlberg’s theory of moral
development and Erikson’s stages of psychosocial development.
Lawrence Kohlberg was born in October 1927 and died in January 1987.
According to Kohlberg, moral reasoning develops through three levels
and six stages as follows:

1. Level 1 (Pre-conventional) (No rules)
   1. Obedience and punishment orientation (How can I avoid punishment?)
   2. Self-interest orientation (What's in it for me? Paying for a benefit)
2. Level 2 (Conventional) (Rule following)
   3. Interpersonal accord and conformity (Social norms – the good boy/good girl attitude)
   4. Authority and social-order-maintaining orientation (Law-and-order morality)
3. Level 3 (Post-conventional) (Rules can be broken under certain circumstances)
   5. Social contract orientation (What is best for everyone?)
   6. Universal ethical principles (Principled conscience)

Another developmental theory is that of Erik Erikson, which states that


people develop through eight stages. Each stage identifies a task that
must be achieved. The achievement can be complete, partial or
unsuccessful. The greater the task achievement, the healthier the
personality of the person. Failure to achieve a task at one stage
influences the person’s ability to achieve the next task. The
developmental tasks are viewed as crises, and successful resolution is
supportive to the person’s ego. The individual must find a balance
between the positive and negative side of the task – for example the
balance between trust and mistrust. Erikson argues that no level of
development can be bypassed. Table 11.2 shows a chart of Erikson’s
eight stages of development.

Table 11.2 Erikson’s eight stages of development

Stage | Age | Central task | Indicators of positive resolution | Indicators of negative resolution
Infancy | Birth to 18 months | Trust versus mistrust | Learning to trust others | Mistrust, withdrawal, estrangement
Early childhood | 18 months to 3 years | Autonomy versus shame and doubt | Self-control without loss of self-esteem; ability to cooperate and to express oneself | Compulsive self-restraint or compliance; wilfulness and defiance
Late childhood | 3 to 5 years | Initiative versus guilt | Learning the degree to which assertiveness and purpose influence the environment; beginning to evaluate one's own behaviour | Lack of self-confidence; pessimism, fear of wrongdoing; over-control and over-restriction of own activity
School age | 6 to 12 years | Industry versus inferiority | Beginning to create, develop and manipulate; developing a sense of competence and perseverance | Loss of hope, sense of being mediocre; withdrawal from school and peers
Adolescence | 12 to 20 years | Identity versus role confusion | Coherent sense of self; plans to actualise one's abilities | Feelings of confusion, indecisiveness, and antisocial behaviour
Young adulthood | 18 to 25 years | Intimacy versus isolation | Intimate relationship with another person; commitment to work and relationships | Impersonal relationships; avoidance of relationship, career or lifestyle commitments
Adulthood | 25 to 65 years | Generativity versus stagnation | Creativity, productivity, concern for others | Self-indulgence, self-concern, lack of interests and commitments
Maturity | 65 years to death | Integrity versus despair | Acceptance of worth and uniqueness of one's own life; acceptance of death | Sense of loss, contempt for others
Source:
http://www.sinclair.edu/academics/lhs/departments/nsg/pub/maslowanderikson1.pdf

As noted above, this approach is important for clinical and educational
psychologists who may wish to find out why people behave as they do.
However, it is less useful in the organisational arena, as there is little in
these theories to suggest how the different developmental stages affect
behaviour in the workplace or how they can be assessed.

11.2.3 Psychoanalytic theories


Theories such as those put forward by Freud and his followers see
personality as being shaped by events that occur during the person's
early development and by the way the person deploys psychic energy.
People are generally not aware of the reasons for doing what they do –
these are buried deep in their unconscious. Psychoanalytic theories
assess personality by means of dream analysis and free association.
Various projective tests such as the Rorschach and Thematic
Apperception Test (TAT) are used to tap into these unconscious
processes. (See Friedman & Schustack, 1999, pp. 62–146 for an in-
depth look at the various psychoanalytical approaches to personality.)

11.2.4 Need theories


A very popular approach to personality is the need theories associated
with people like Maslow and McClelland. Maslow (e.g. 1954) sees
people’s behaviour as being driven by the lowest unfulfilled need in his
hierarchy of needs (physiological, security, social, esteem and self-
actualisation). There are several different scales to determine the relative
strength of needs at each of these levels.

McClelland and his colleagues (McClelland et al., 1953), on the other


hand, see personality in terms of such needs as the need for
achievement (NAch)*, the need for affiliation (NAff)* and the need
for power (NPow)*. Again, the relative strengths of these needs can be
assessed – McClelland and his associates developed their original theory
using scoring methods based on projective tests such as the Thematic
Apperception Test (TAT) to assess these needs. Subsequently, a number
of scales have been developed for this purpose.

11.2.5 Phenomenological approaches


The theorist most closely associated with this school of thinking is
George Kelly (1955), who argued that much of our behaviour is
governed or controlled by feelings and ideas that we are unable to
articulate. To bring these into awareness, he developed what he termed
the repertory grid. A repertory grid (or repgrid for short) is a two-stage
technique. In the first stage, the analyst compares three items or objects
(e.g. people, jobs, processes) and decides what two of the objects have
in common that the third lacks. The common elements are called
personal constructs. In the second stage, various other objects are rated
on how much of the construct each possesses.

To illustrate, consider the example of a repgrid given in Table 11.3. Five


people are compared: A (a bank teller), B and C (two of her colleagues),
D (the best performer in the department) and E (the worst performer).
The first stage of the repgrid has been completed using ticks to identify
the objects possessing the constructs (step 1) and a cross to indicate the
object lacking the construct (step 2). The third step in the process is to
label the construct. This process is repeated in each row of the repgrid.
The construct in the first row is customer friendliness; in the second
row, it is seeing opportunities to cross-sell, and so on.
Table 11.3 An example of a repertory grid used in the bank teller case
Repertory grid
Organisation: ABC Bank          Date: 26/03/2012
Job title: Bank teller          Respondent: G. Pillay (A)
Analyst: P.D. Sithole

Elements (A, B, C, D, E) | How are the two similar? | How is the third different? | A's score
✓ ✓ ✗ | Customer friendly | Rude to customers | 5
✗ ✓ ✓ | Sees chances to cross-sell | Misses chances to cross-sell | 3
✗ ✓ ✓ | Organises own work | Misplaces files | 2
✗ ✓ ✓ | Helps other team members | Selfish | 4
✗ ✓ ✓ | Accepts feedback | Gets defensive | 2
✓ ✓ ✗ | Copes with pressure | Shows frustration | 3
✓ ✓ ✗ | Enthusiastic | Negative | 4
✓ ✗ ✓ | Shows initiative, self-starter | No proactivity, non-starter | 2
✗ ✓ ✓ | Motivated | Does not want to work | 4
✓ ✓ ✗ | Patient | Impatient | 3

The analyst rates each object or person on a five-point scale on the


degree to which he or it displays the construct, where 1 is low and 5 is
high. In Table 11.3, person A has been rated on each of these constructs.
(We could just as easily compare the headmaster of a school, the class
teacher, a learner’s mother and father, and the school sports coach.) The
repgrid is also used in market research. Different brands of soap powder,
cars, hamburgers, and so forth can be evaluated against competing
products in this way (see Stewart, 2004).

In Table 11.3, the score of the person being assessed is given in the
right-hand column and the name of the analyst on the left. The clerk (G.
Pillay) is analysed by P. Sithole. When she is compared with others,
dimensions or constructs such as customer friendliness, seeing
opportunities for cross-selling, and so on emerge. In the second phase,
G. Pillay is scored on each of these constructs (right-hand column). As
we can see, she scores high on “customer friendliness”, is “helpful to her
colleagues” and “enthusiastic about her work”, and is “well motivated”.
However, she falls short on “showing initiative” and is “not very well
organised”, mislaying files often. (For an excellent introduction to
repgrids, see Jankowicz, 2004.)
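For readers who want to see the mechanics of the second stage spelled out, the hedged sketch below (in Python) represents the ratings from Table 11.3 as a simple data structure and summarises A's strengths and development areas. The construct labels and scores are simply those shown in the table; the code is illustrative only and is not part of any standard repgrid package.

# A minimal, illustrative sketch of the repgrid in Table 11.3.
# The 1-5 scale runs from low (1) to high (5) possession of the construct.
repgrid = {
    "Customer friendly": 5,
    "Sees chances to cross-sell": 3,
    "Organises own work": 2,
    "Helps other team members": 4,
    "Accepts feedback": 2,
    "Copes with pressure": 3,
    "Enthusiastic": 4,
    "Shows initiative, self-starter": 2,
    "Motivated": 4,
    "Patient": 3,
}

# Simple summary: which constructs does person A display strongly or weakly?
strengths = [c for c, score in repgrid.items() if score >= 4]
development_areas = [c for c, score in repgrid.items() if score <= 2]

print("Strengths:", ", ".join(strengths))
print("Development areas:", ", ".join(development_areas))

Running the sketch simply reproduces the verbal summary given above: the strengths are customer friendliness, helpfulness, enthusiasm and motivation, while initiative, organisation and accepting feedback emerge as development areas.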

11.2.6 Trait approaches


The trait approach to personality sees people’s behaviour in terms of
relatively stable characteristics, such as being clever, patient, lazy, and
so on. Although there are many different definitions of a trait, a useful
one is that of Guilford (1959, p. 6) who defines it as follows: “A trait is
any distinguishable, relatively enduring way in which one individual
varies from another.” However, in the realm of personality theory, traits
refer to behavioural tendencies – factors such as gender, age, skin
colour, etc. are not considered as traits, even though they are a
“relatively enduring way in which one individual varies from another”.

Trait theories assume that we can assess these relatively stable


characteristics using questionnaires and inventories. Although
personality traits are relatively enduring, they change over time as
people mature (e.g. people may become less aggressive but more
narrow-minded and conservative as they grow older). The trait approach
to assessing personality generally uses personality scales, of which there
are a great number. These can focus on a single characteristic (such as
ambitiousness) or they can assess a whole range of dimensions. A large
number of these multidimensional scales are designed to assess
personality from a clinical perspective, but an increasing number look at
so-called normal personalities within the work and sport environments.
These are discussed in detail in the next section.
11.3 Assessing personality

In 11.1.1, we defined personality as the relatively stable way in which


people prefer to process information and interact with the world in
which they live. The assessment of personality is thus the process of
trying to identify these relatively stable preferred ways of dealing with
the world. This can be done in various ways, and the method used will
reflect the theories regarding the nature of personality preferred by the
people involved. In practice, some theories lend themselves to
meaningful assessment in the workplace, while others do not.

11.3.1 Observation
This approach to assessment involves observing what people do and
how they react in specific situations. This is a behavioural approach and
is used, for example, in assessment centre situations (see Chapter 17).
While it is a useful approach, it is labour intensive and certainly requires
some form of intervention in the process (as discussed in Chapter 2). In
practice, this approach to assessment usually requires a person either to
be observed in his work situation or to take part in some kind of role
play in which he has to respond to a structured situation of some kind,
such as dealing with a difficult subordinate, store assistant or municipal
official. Another typical exercise is placing people in a leaderless group
situation, giving them a task to complete together and monitoring the
interactions of the group members to see who shows particular forms of
interpersonal behaviour, such as dominance, leadership, the ability to
integrate outcomes, conciliatory behaviour, and so forth.

This method of assessment is quite culturally sensitive. Different groups


can have different norms in this type of group situation; for example,
historically women and many black people have been less dominant
than the typical white male. (See Chapter 17, especially section 17.4.3,
where the cultural fairness of assessment centres is discussed.
Assessment centres rely heavily on this kind of observation.) Clearly, if
this behaviour is sought in a work situation (and it often is), then people
from social or cultural groups who have been raised not to show
dominance will score lower than their white male counterparts. As a
result, such persons may not be selected for a particular position, which
may rather be given to a higher-scoring white male.

Even though proportionally more white males than members from the
other groups would be selected in this scenario, this is not unfair if the
groups really do differ in dominance. It would be unfair if the
characteristic in question (i.e. dominance) were not required for the
position that was applied for.

11.3.2 Computer-based simulations


A closely related situation, and one which is growing in popularity, is
the use of computer-based simulations in which the people being
assessed have to respond to various situations as they evolve in these
simulations. Not only are these programs able to assess a wide range of
behavioural responses, but they also provide an instant assessment of the
situation. This is very convenient administratively, and also for the
person being assessed as it enables him to monitor his own performance
and adjust as necessary. It thus provides a learning opportunity that
allows the person to develop as a result of the assessment process.

11.3.3 Projective techniques*


This approach to the assessment of personality is based on the
assumption that the way people respond to various unstructured or
ambiguous situations reflects their unconscious desires, needs and
motives. The assumption is known as the projective hypothesis*. (See
Friedman & Schustack, 1999, pp. 50–52.) In personality assessment, this
involves asking a person to respond to various ambiguous situations that
are presented to him. Three basic types of stimulus are used in this
approach:

Inkblots. The assessor presents a series of inkblots to the person and


asks him to describe what he sees. These inkblots are similar to those
a child might create by dabbing a few spots of paint or ink on a page
and then folding the page so that the colours run together. The best
known of these are the Rorschach inkblots and the Holtzman inkblots.
Ambiguous pictures. The assessor gives the person a series of fairly
abstract pictures and asks him to describe what is happening in them,
what led to the situation depicted and what the outcomes are. A
typical picture found in the Thematic Apperception Test (TAT) is that
of a young child standing at a window and holding a violin. Another
is of a person interacting with an older figure. A third involves what
could be a mother and daughter or father and son in discussion. The
stories or protocols* resulting from each picture are analysed in terms
of various themes of interest to the assessor, for example anger
management, social dependence, unresolved oedipal issues, the need
for power or achievement, and so on. In each case, the assumption is
that when the person tells what is happening, he projects his own
hopes, fears, attitudes, and so forth onto the stimulus material. Two
examples of such pictures are given in Figure 11.1.
Incomplete sentences. This projective technique consists of a number
of incomplete sentences that the person is required to complete.
Typical sentences are: “I am happiest when …”, “My greatest fear is
…”, and “I get really angry when …”.

Figure 11.1 Two examples of TAT pictures

The strength of any projective technique is that it allows the assessor to


access psychological material that may be buried deep in the person’s
unconscious and which would not be elicited in any other way. The
downside of projective techniques is that they are subjective and thus
suffer from low levels of inter-rater reliability. As we saw in Chapter 5,
a technique cannot be valid if it is not reliable. Attempts to improve
inter-rater and other forms of reliability involve drawing up detailed
analytic guidelines and training the assessors to use them. A multiple-
choice version of the Rorschach test has also been drawn up in which
the person being assessed has to choose from four or five alternative
interpretations of the inkblot. This is known as the Structured Objective
Rorschach Test or the SORT.
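To make the notion of inter-rater reliability more concrete, the hedged sketch below computes Cohen's kappa – a widely used chance-corrected index of agreement – for two hypothetical assessors who have each classified the same ten protocols. The raters, categories and ratings are invented purely for illustration and are not drawn from any actual projective study.

# Illustrative only: Cohen's kappa for two raters scoring the same protocols.
# The category labels assigned by each rater are hypothetical.
from collections import Counter

rater_1 = ["anger", "anger", "dependence", "anger", "neutral",
           "dependence", "anger", "neutral", "neutral", "anger"]
rater_2 = ["anger", "dependence", "dependence", "anger", "neutral",
           "dependence", "anger", "anger", "neutral", "anger"]

n = len(rater_1)
observed_agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Chance agreement: how often the raters would agree by accident,
# given how frequently each rater uses each category.
p1, p2 = Counter(rater_1), Counter(rater_2)
chance_agreement = sum((p1[c] / n) * (p2[c] / n)
                       for c in set(rater_1) | set(rater_2))

kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
print(f"Observed agreement: {observed_agreement:.2f}, kappa: {kappa:.2f}")

In this invented example the raters agree on 8 of the 10 protocols, but once chance agreement is removed the kappa value is noticeably lower – which is exactly why loosely scored projective protocols tend to show weak inter-rater reliability.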

11.3.4 Objective approaches


The use of questionnaires or scales is a very structured approach to
assessing personality and is based on a trait approach to personality.
Scales and questionnaires are by far the most widely used approach
to measuring personality in most organisations, although the other
paradigms are also used. We have seen in Chapter 3 how a scale is
developed by identifying the construct to be measured, devising various
statements to assess various aspects of the construct, applying the scale
and then checking its reliability, validity and fairness. This technique is
also used in the development of personality scales. According to Kaplan
and Saccuzzo (2013), four basic approaches can be identified, namely
logical, theoretical, empirical and factor analytical.

Logical approach. The logical approach to personality scale


development takes common sense as its starting point. For example, if
we want to develop a scale (often termed an inventory) to assess
eating behaviour, we would ask a question like: “Do you frequently
eat between meals?” We would not ask a question like: “Do you
enjoy going for long walks?” or “Do you currently have a pet at
home?” These last two questions do not have a prima facie
relationship to eating patterns and would therefore not be included in
the scale.
Theoretical approach. This approach starts with a theory of, say,
eating disorders and then formulates a number of questions based on
this theory. For example, suppose the theory postulates that eating
habits are related to birth order and oral behaviour such as smoking.
The eating habit inventory would then include items on birth order
and smoking habits, even though these factors do not appear at first
glance (prima facie) to be related to the issue of eating habits.
(Clearly, these less-than-obvious items would need to survive item
analysis and other forms of validity checking, as described in Chapter
5.)
Empirical approaches. Empirical strategies begin by identifying two
known groups and then exploring a wide range of items that are
associated with one group and not the other. This approach is
sometimes termed empirical criterion keying*. There are two
examples that illustrate this. Suppose we wish to draw up an
inventory of gender identity (male versus female). Firstly, think about
the tube of toothpaste that you use regularly. Do you squeeze it from
the top (near the cap) or from the base? Secondly, think about the way
you hang the toilet roll on the holder in the toilet. Do you let the end
hang over the front (“overhang” it) or do you let the end bit hang at
the back of the roll (“underhang” it)? Different groups of people
respond differently to these two questions. Think carefully about what
you typically do in these two situations and note your answers on a
piece of paper. Now turn to Sidebar 11.1 on page 177 to see the
prediction.

Sidebar 11.1
How did you answer the toothpaste and toilet roll questions?

If you squeeze your toothpaste at the base of the tube and/or if you overhang
your loo roll, the chances are that you are male.
If you squeeze your toothpaste at the top of the tube and/or if you underhang
your loo roll, the chances are that you are female.

Although these two behaviours have nothing to do with masculinity or femininity,


they are examples of items that could be included if the empirical keying
approach to the development of a personality scale was used. (Of course, this
assumes that different groups respond to these two items differently, as
suggested here.)
Irrespective of your answers to these two questions or the correctness
of the predictions, if the answers successfully distinguish between the
two groups male and female as predicted, they could be included in an
inventory of gender identity. If any factor such as preference for
colours, food type or motor vehicle varies consistently across two
groups, items based on these factors can be included in the scale. This
is what is meant by the empirical approach (a small illustrative sketch of
this item-selection logic is given after this list).

Factor analysis approach. This approach begins by collecting a


range of empirical data from a large number of people and then
subjecting these to a factor analysis (see Sidebar 5.1 in Chapter 5).
The factors that are identified in this way form the basis of the
personality inventory. This is exactly what Raymond B. Cattell did
when he drew up the 16 Personality Factor Inventory or the 16PF,
discussed in section 11.3.6.
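As a rough illustration of the empirical criterion keying strategy described above, the hedged sketch below retains only those candidate items whose endorsement rates differ noticeably between two known groups. The item wordings, endorsement rates and cut-off value are entirely hypothetical and are used only to show the logic of the method.

# Hedged illustration of empirical criterion keying: item names, endorsement
# rates and the retention threshold are invented for this example.
candidate_items = {
    # item: (proportion of group 1 answering "yes", proportion of group 2)
    "Squeezes toothpaste from the base": (0.70, 0.40),
    "Hangs the toilet roll to overhang": (0.65, 0.45),
    "Prefers tea to coffee": (0.51, 0.49),
    "Owns a pet": (0.55, 0.53),
}

THRESHOLD = 0.15  # minimum group difference for an item to be retained

# Keep only the items that actually discriminate between the two known groups.
keyed_items = {
    item: rates
    for item, rates in candidate_items.items()
    if abs(rates[0] - rates[1]) >= THRESHOLD
}

for item, (g1, g2) in keyed_items.items():
    print(f"Retained: {item} (group 1 = {g1:.2f}, group 2 = {g2:.2f})")

The point of the sketch is simply that items are kept or discarded on the basis of how well they separate the criterion groups, not on the basis of any obvious (prima facie) link to the construct.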

Before we proceed, it is useful to note that measures of personality may


focus on the presence and strength of a single characteristic, or they can
be much broader in focus and measure the relative strength of a number
of personality characteristics in the same assessment process.

Assessing a single dimension. A large number of scales and


inventories focus on measuring a single dimension or factor, such as
social phobia (shyness), internal versus external locus of control*,
anxiety level, need for achievement, assertiveness, dominance and
almost any other dimension of personality one can think of.
Assessing multiple dimensions. Instead of having a large number of
scales that each assess a single dimension, many personality measures
assess people on a wide range of personality dimensions such as is the
case with the 16PF which measures 16 personality factors. Other
inventories of this kind are the Minnesota Multiphasic Personality
Inventory (MMPI) and its revised version, the MMPI-2, and the
California Personality Inventory (CPI) and its revision, the CPI-R. For
a fuller account of these two scales, see Kaplan and Saccuzzo (2013,
pp. 446–458).
A second way of categorising personality tests is to make a distinction
between clinically oriented scales aimed at understanding the
personalities of people experiencing emotional and psychological
problems of some sort, and those aimed at assessing the functioning of
normal people in everyday or work situations.

Clinical scales. The MMPI mentioned above is an example of a


clinically focused scale with subscales assessing such constructs as
hypochondriasis (physical complaints), depression, hysteria and
paranoia (suspicion and hostility) to name a few. There are many
clinical scales of this kind.
Normal scales. These scales look at the personality functioning of
normal or non-clinical people. They are used for job selection and
research purposes. The California Personality Inventory (CPI), for
example, has subscales that measure factors such as poise, self-
assurance, socialisation, maturity and interpersonal effectiveness.
There are also a number of multidimensional scales developed for use
in the workplace. The Occupational Personality Questionnaire (OPQ),
developed by Saville and Holdsworth Limited (SHL), is one such
example that is used widely in South Africa and the UK.

11.3.5 The trait approach


By far the most prominent approach (or paradigm) to defining and
assessing personality is based on the concept of traits. Traits are seen as
“relatively enduring dispositions (tendencies to act, think or feel in a
certain manner in any given circumstance) that distinguish one
individual from another” (Kaplan & Saccuzzo, 2013, p. 17). For
example, we say that some people are hard working or lazy, anxious or
relaxed, and so forth. There are several approaches within this paradigm.

11.3.5.1 The four humours


One of the first attempts to describe personality was put forward almost
2000 years ago by the Greek physician Galen (129–c. 216 AD), who
believed that personality was determined by the predominance of one of
four humours or body fluids, namely black bile, blood, yellow bile and
phlegm. Each of these fluids was related to one of the four primary
elements that were seen to compose all matter, namely earth, air, fire
and water (see Table 11.4).

Table 11.4 The four humours and personality

Element | Fluid | Produced by | Type | Personality
Earth | Black bile | Gall bladder | Melancholic | Depressed, withdrawn, unhappy
Air | Blood | Liver | Sanguine | Optimistic, outgoing, calm, cheerful
Fire | Yellow bile | Spleen | Choleric | Irritable, grumpy, loud
Water | Phlegm | Lungs | Phlegmatic | Quiet, placid, unemotional

Source: Based on Boeree (2002)

Although this approach is very old fashioned, some people still argue in
its favour, and if you go onto the Internet, you will find some modern
references to it. However, it is of interest to us because Wundt, one of
the fathers of modern psychology, later converted these four
characteristics into a two-by-two matrix, as shown in Figure 11.2.

Figure 11.2 Wundt’s typology


11.3.5.2 Wundt’s typology
Wilhelm Wundt established the first psychology laboratory in Leipzig,
Germany, to examine human behaviour in terms of reaction time,
perception and attention span (Kaplan & Saccuzzo, 2013, p. 13). He
tried to isolate all possible sources of error, and in so doing established
the first real experiments in psychology. Wundt used the four humours
theory as his starting point, and argued that the personality types
associated with each humour reflected different positions on a two-
dimensional matrix, with the degree of emotionality on one axis and the
extent to which emotions are expressed on the second axis. This is
shown in Figure 11.2.

In this development, he was the first to move from a type approach to a


trait approach. Types are regarded as categories with distinct and
discontinuous membership – people are either one type or another. In
trait theories, people differ in the amount of each trait they possess. (See
Furnham, 2008, Chapter 4, especially pp. 112–113 for further discussion
on this distinction.)

11.3.5.3 Jung’s typology


An extension of Wundt’s approach is that of Jung (1968), which also
places people along two similar dimensions, namely
introversion/extraversion and stable/neurotic. This is shown in Figure
11.3.

Figure 11.3 Jung’s typology


A great amount of work on personality and personality assessment has
been based on this kind of two-by-two model, with people's relative
strengths on each of these dimensions being assessed using
questionnaires and/or scales. In particular, the work of Hans Eysenck
(1967) in the UK, with his two dimensions of extraversion/introversion
and neuroticism/stability, is well known and widely used. What is
interesting is that Eysenck linked the theory of introversion and
extraversion to the general level of brain or cortical arousal that people
possess (Eysenck, 1967; Eysenck & Eysenck, 1985). He argued that all
people need a certain degree of arousal and take actions to maintain this
– people do not want to be bored. (This is why solitary confinement is
such a powerful form of punishment.) He argued that introverts have a
low threshold of stimulation and are easily aroused (because they are
physiologically more easily stimulated); they therefore tend to avoid
people and situations that would raise their arousal levels still further,
and as a result they tend to be seen as shy. On the other hand, extraverts
have high thresholds and need
greater levels of stimulation. As a result they are usually under-aroused
and therefore tend to seek out situations that increase levels of arousal.
Accordingly, they are attracted to other people and to noisy, exciting and
often dangerous situations in order to meet their arousal needs.

11.3.5.4 Myers-Briggs
In 1962, this two-by-two approach was further developed by Myers and
Briggs (1962), who argued that people react to the world in terms of
four dimensions reflecting their preferences between equally desirable
alternatives. They termed these dimensions or preferences
extraversion/introversion, judgement/perception, sensing/intuiting and
feeling/thinking. According to Myers and Briggs, a person’s personality
is best described as different combinations of these four preferences (see
Figure 11.4).

Figure 11.4 The relationship between the four preferences

Source: Based on Hirsch (1991)

Extraversion (E)/introversion (I)


With extraverts, attention and energy seem to flow out from the person
into the environment. They prefer to act on the environment, to affirm
its importance and to increase its effect. Extraverts are therefore aware
of and reliant on the environment, are action oriented, sometimes
impulsive, frank, easy in their communication and sociable. With
introverts, attention and energy flows from the environment into the
person. The main interests of introverts are the inner world of concepts
and ideas. Introverts are typically concerned with the clarity of ideas and
concepts, relying more on enduring concepts and principles than on
transitory external events. They tend to be thoughtful, contemplative and
detached, and enjoy solitude and privacy. In everyday usage, “extravert”
often means “sociable” and “introvert” often means “shy”.
Judgement (J)/perception (P)
Judgers are concerned with making decisions, seeking closure, planning
operations or organising activities. People who are Js are often seen to
be well organised, purposeful and decisive. Perceivers are attuned to
incoming information. Ps are seen as spontaneous and adaptive people,
open to new events and changes, and aiming to miss nothing.
Sensing (S)/intuition (N)
Sensing refers to perceptions that are observable by way of the
senses (hearing, seeing, tasting, etc.). Ss establish what exists, and base
their decisions on what they can and have observed. They prefer to focus
on immediate experience, on facts, details and practicalities in the here-
and-now. They tend also to have keen powers of observation and a good
eye and memory for details. Intuiters, on the other hand, prefer to focus
on possibilities, meanings and relationships by way of insight. They are
more concerned with what could be than with what is. Intuition allows
perceptions beyond what is visible to the senses and tends to focus on
the abstract, the creative and the future. They tend to be more concerned with broad
principles and patterns than with fine details.
Thinking (T)/feeling (F)
Thinkers prefer to link ideas together based on logic and principle,
looking for cause-and-effect relationships. As a result, thinkers prefer to
be seen as impersonal, analytic and concerned with abstract principles
such as truth, justice and fairness. They are more objective and principle
oriented in their approach to decisions. Feelers come to decisions by
weighing relative values and the merits of the issues in the situation.
There is a capacity for warmth, human concern and the preservation of
traditions and values of the past. Feelers are more subjective and person
oriented in their approach to decisions.

These four dimensions are combined to yield 16 different personality


types, labelled, for example, ENTJ or ISTP. These combinations are
shown in Figure 11.5.

Figure 11.5 The sixteen MBTI personality types
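Because each of the four preferences is a simple dichotomy, the 16 type labels in Figure 11.5 are nothing more than all possible combinations of one letter per dimension. The short sketch below is included purely to illustrate this combinatorial logic.

# Generate the 16 MBTI type codes from the four preference dichotomies.
from itertools import product

dichotomies = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

types = ["".join(combo) for combo in product(*dichotomies)]
print(len(types), "types:", ", ".join(types))  # 16 types: ESTJ, ESTP, ESFJ, ...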


To see how this works, consider a person who is described as an ESTJ
(bottom left corner of Figure 11.5) and another who is the opposite on
every dimension, an INFP.

Based on the trait descriptions provided by Myers and Briggs, a typical


ESTJ personality would be described as “practical, realistic, matter-of-
fact, with a natural head for business or mechanics. Not interested in
abstract theories: wants learning to have direct and immediate
application. Likes to organise and run activities. Often makes a good
administrator: is decisive; quickly moves to implement decisions; takes
care of routine details”.

The very opposite, the INFP, would be described as “a quiet observer,


idealistic, loyal. Important that outer life is congruent with inner values.
Curious, quick to see possibilities, often acts as a catalyst to implement
ideas. Adaptable, flexible, and accepting unless a value is threatened.
Wants to understand people and ways of fulfilling human potential. Has
little concern with possessions or surroundings”.

Each of the 16 cells in the table has a similar description (see, for
example, Furnham, 2008, pp. 114–118). The Myers-Briggs type
indicator (MBTI) is widely used in South Africa and abroad in areas
such as communication style, career guidance, job selection, leadership,
team formation and development, and emotional perception. (See
Quenk, 2000 and McCaulley, 2000 for other examples.)
One of the criticisms levelled against the MBTI is that there is very little
research to demonstrate its validity (see, for example, Furnham, 2008,
p.115 and the Public Service Commission of Canada, 2006). In addition,
the reports are worded in such a way that most people would agree with
them. In Sidebar 11.2, some interesting research into why these
statements are seen as valid is outlined.

Sidebar 11.2 The Forer effect


The Forer effect refers to the tendency of people to rate sets of statements as
“highly accurate” for them personally even though the statements could apply to
many people. It is named after a psychologist, Bertram R. Forer, who in 1948
found that people tend to accept vague and general personality descriptions as
uniquely applicable to themselves without realising that the same description
could be applied to just about anyone. He gave a personality test to his students,
ignored their answers, and gave the following personality description to each of
them.

You have a need for other people to like and admire you, and yet you tend to
be critical of yourself. While you have some personality weaknesses you are
generally able to compensate for them. You have considerable unused
capacity that you have not turned to your advantage. Disciplined and self-
controlled on the outside, you tend to be worrisome and insecure on the inside.
At times you have serious doubts as to whether you have made the right
decision or done the right thing. You prefer a certain amount of change and
variety and become dissatisfied when hemmed in by restrictions and
limitations. You also pride yourself as an independent thinker; and do not
accept others’ statements without satisfactory proof. But you have found it
unwise to be too frank in revealing yourself to others. At times you are
extroverted, affable and sociable, while at other times you are introverted, wary
and reserved. Some of your aspirations tend to be rather unrealistic.

He asked them to evaluate the accuracy of the description on a five-point scale,
with “5” meaning the recipient felt the evaluation was an “excellent” assessment
and “4” meaning the assessment was “good”. The class average evaluation was 4,26.
That was in 1948.

The test has been repeated hundreds of times with psychology students and the
average is still around 4,2 out of 5, or 84% accurate. His accuracy amazed his
subjects, though his personality analysis was taken from an astrology column in a
local newspaper and the same description was presented to all the people in his
sample without regard to their birth sign. This finding is known as the Forer effect.
The most common explanations given to account for the Forer effect are in terms
of hope, wishful thinking and vanity, and the tendency to try to make sense out of
experience, though Forer’s own explanation was in terms of human gullibility.
People tend to accept claims about themselves in proportion to their desire that
the claims be true rather than in proportion to the empirical accuracy of the claims
as measured by some non-subjective standard. We tend to accept questionable
or even false statements about ourselves if we deem them positive or flattering
enough. We will often give very liberal interpretations to vague or inconsistent
claims about ourselves in order to make sense of the claims. Subjects who seek
counselling from psychics, mediums, fortune tellers, mind readers, graphologists,
etc. will often ignore false or questionable claims and, in many cases, by their
own words or actions, will provide most of the information they erroneously
attribute to a pseudoscientific counsellor. Many such subjects feel their
counsellors have provided them with profound and personal information. Such
subjective validation*, however, is of little scientific value. It is also termed the
Barnum effect*.
Source: Based on Forer (1949). See also Friedman & Schustack (1999, p. 21)

11.3.6 The factor analysis approach – the case of R.B.


Cattell’s 16 PF
A major alternative to the two-by-two thinking discussed in section
11.3.5 is the factor analysis approach, in which a large number of
adjectives describing people (e.g. lazy, conservative, happy, sad) are
factor analysed (see Sidebar 5.1 for an explanation of factor analysis).
The person most closely associated with this approach is Raymond B.
Cattell, who argued for and demonstrated over a period of 30 years or
more the existence of 16 basic or primary personality traits (termed
source traits), the last four of which (Q1–Q4) emerged only from
questionnaire data (see Cattell, Eber & Tatsuoka, 1970). All 16 traits are
measured using the 16PF instrument. Table 11.5
gives a list of these factors, with a brief description of each, with lower
scores on the left and higher scores on the right.

Table 11.5 Factors of the 16PF

Factor | Low score (stens 1–3) | High score (stens 8–10)
A Warmth | Reserved, impersonal, distant | Warm, outgoing, attentive to others
B Reasoning | Concrete | Abstract
C Emotional stability | Reactive, emotionally changeable | Emotionally stable, adaptive, mature
E Dominance | Deferential, cooperative, avoids conflict | Dominant, forceful, assertive
F Liveliness | Serious, restrained, careful | Lively, animated, spontaneous
G Rule-consciousness | Expedient, non-conforming | Rule-conscious, dutiful
H Social boldness | Shy, threat sensitive, timid | Socially bold, venturesome, thick-skinned
I Sensitivity | Utilitarian, objective, unsentimental | Sensitive, aesthetic, sentimental
L Vigilance | Trusting, unsuspecting, accepting | Vigilant, suspicious, sceptical, wary
M Abstractedness | Grounded, practical, solution-oriented | Abstracted, imaginative, idea-oriented
N Privateness | Forthright, genuine, artless | Private, discreet, non-disclosing
O Apprehension | Self-assured, unworried, complacent | Apprehensive, self-doubting, worried
Q1 Openness to change | Traditional, attached to the familiar | Open to change, experimenting
Q2 Self-reliance | Group-oriented, affiliative | Self-reliant, solitary, individualistic
Q3 Perfectionism | Tolerates disorder, unexacting, flexible | Perfectionistic, organised, self-disciplined
Q4 Tension | Relaxed, placid, patient | Tense, high-energy, impatient, driven

Source: IPAT (Institute for Personality and Ability Testing), 1993. “Copyright ©1993 by
the Institute of Personality and Ability Testing (IPAT Inc.), PO Box 1188,
Champaign, Illinois, USA. IPAT is a wholly owned subsidiary of OPP Ltd.,
Oxford, England. Reproduced with the permission of the copyright owner. All
rights reserved.”
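The 16PF reports each factor as a standard-ten or "sten" score, a normalised scale with a mean of 5,5 and a standard deviation of 2, which is why stens of 1–3 and 8–10 are treated as low and high in Table 11.5. The hedged sketch below shows the usual raw-score-to-sten conversion; the norm-group mean and standard deviation used in the example are invented for illustration only.

# Hedged sketch: converting a raw scale score to a sten (standard ten) score.
# The norm-group mean and standard deviation below are invented for illustration.
def raw_to_sten(raw: float, norm_mean: float, norm_sd: float) -> int:
    z = (raw - norm_mean) / norm_sd   # standardise against the norm group
    sten = round(2 * z + 5.5)         # stens have a mean of 5.5 and an SD of 2
    return max(1, min(10, sten))      # stens are clipped to the 1-10 range

# Example: a raw score of 18 on a scale with a norm mean of 12 and an SD of 4.
print(raw_to_sten(18, norm_mean=12, norm_sd=4))  # -> 8, i.e. a "high" score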

The 16PF has been widely tested across the world, and remains a
popular instrument as it has been shown to be valid in a wide range of
situations. However, its long-term test-retest reliability is on the low
side, ranging from 0,21 to 0,64 (Kaplan & Saccuzzo, 2001, p. 461). The
16PF is designed for use with a normal population. To use it in a clinical
situation, an additional 12 factors need to be included. Prinsloo and his
colleagues at the Human Sciences Research Council (HSRC) adapted
and standardised the 16PF on a largely white sample of South Africans,
yielding a version known as the 16PF SA92 (Prinsloo, 1998). As shown
in section 5.6.1.2, Abrahams and Mauer (1999a, 1999b) argued that the
language level of the SA92 is such that non-mother-tongue speakers of
English would be at an extreme disadvantage. They even tried to have
the instrument banned in South Africa by the Psychometrics Committee
of the Professional Board for Psychology. Although this attempt has not
been successful, their work does point to some difficulties with the
instrument for use with people who do not have an adequate mastery of
English.

Although the 16PF is an excellent example of the factor analysis


approach to personality measurement, the claim that the 16 factors
identified by Cattell and his associates are the basic building blocks of
personality has been challenged by many critics, and the instrument has
largely been replaced by scales based on the five-factor model discussed
in section 11.4.1, especially when used in the work context.

11.3.7 The five-factor theory


In recent years, the 16PF has been supplanted by several new scales
based on what is termed the five-factor model (FFM, also called the Big
Five model). One such scale, developed by Costa and McCrae (1985)
and updated in 1992, is the NEO-PI (Neuroticism, Extraversion and
Openness Personality Inventory). Costa and McCrae used advances in
factor analysis and personality theory in developing the items and
constructing the scale. Data presented in the manual of the NEO-PI and
a later revised version (the NEO-PI-R) both support the validity of the
scale in numerous cross-cultural settings, suggesting that the idea that
there are five basic factors to personality is correct and universal. There
is some debate as to whether there is a sixth factor, but this view seems
to be much weaker than the case for five factors. The five factors are
best remembered by the acronym OCEAN (or CANOE), where

O = Openness to experience

C = Conscientiousness

E = Extraversion
A = Agreeableness or amiability

N = Neuroticism (the opposite of stability)

Each of these five factors has six subdivisions or facets.
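Most questionnaires based on the five-factor model are scored by reverse-keying negatively worded items and then averaging the items that belong to each domain. The hedged sketch below illustrates this general scoring logic with a handful of invented items and responses; it is not the scoring key of the NEO-PI-R or of any other published instrument.

# Illustrative Big Five scoring: items, keys and responses are invented.
# Each item is answered on a 1-5 scale; reverse-keyed items are flipped (6 - x).
items = {
    # item id: (domain, reverse_keyed)
    "q1": ("Extraversion", False),       # e.g. "I am the life of the party"
    "q2": ("Extraversion", True),        # e.g. "I keep in the background"
    "q3": ("Conscientiousness", False),  # e.g. "I pay attention to details"
    "q4": ("Conscientiousness", True),   # e.g. "I leave my belongings around"
}

responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}

domain_scores = {}
for item_id, (domain, reverse) in items.items():
    score = 6 - responses[item_id] if reverse else responses[item_id]
    domain_scores.setdefault(domain, []).append(score)

for domain, scores in domain_scores.items():
    print(domain, sum(scores) / len(scores))  # mean item score per domain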

In China, a sixth factor, Interpersonal Relatedness, has been identified


(Cheung et al., 2001), and in South Africa a further three factors seem to
be necessary to explain all the variances in the personality structure of
all South Africans (Valchev et al., 2013). The FFM is discussed in more
detail in section 11.4.1.

11.3.8 Multiple-construct batteries


In the same way that there are test batteries that assess various aspects of
ability and aptitude, there are also a number of personality batteries
which tap a wide range of personality dimensions. Some of these are
aimed at more clinically defined traits. They include the Minnesota
Multiphasic Personality Inventory (MMPI), the Personality Assessment
Inventory (PAI) and the Millon Clinical Multiaxial Inventory (MCMI)
(Millon, 1997). The latter has been revised three times and consists of 14
separate personality scales that correspond to the DSM-IV personality
disorders. (The DSM-IV is the fourth edition of the Diagnostic and
statistical manual, which is the primary reference work for clinical
psychologists.) The MCMI contains ten clinical scales
and four validity indices (see Cohen & Swerdlik, 2002, p. 418).

Other personality batteries have been specifically designed to measure


various personality dimensions that are important in occupational
settings, such as the Occupational Personality Questionnaire (OPQ).
These are dealt with in detail in section 11.4.

11.3.9 Behaviour-oriented approaches


Behaviour-oriented psychologists take a learning theory approach and
assess personality by observing how people behave in various group
situations, at school and in the workplace, where techniques such as
work samples* and assessment centres are used. In these situations,
carefully constructed behavioural checklists (as discussed in section
2.2.2) are used to record exactly what the people do in particular
situations. This behavioural approach is an important aspect of assessing
people in assessment centres, a topic that is discussed in depth in
Chapter 17.

11.4 Assessing personality in the organisational


context

Organisational success depends as much on the ability of the employee


to fit into the organisation and to work with the management and other
team members as it does on intellectual ability. As a result, personality
assessment has begun to play an important role in selection, and some of
the more recent job analysis* techniques now incorporate personality
dimensions in the job description data (Hough & Oswald, 2000, p. 2).
Personality dimensions, in the form of MBTI scores, have also long
been part of the Position Analysis Questionnaire (PAQ)*. In addition
to these broad-based personality inventories or batteries, there are many
scales and questionnaires that measure a wide range of specific
personality traits. These include such aspects as social anxiety/phobia,
Type A* personality (coronary prone), locus of control, need for
power, Machiavellianism, achievement motivation, racism, sexism and a
whole range of other clinical, occupational and everyday factors.

In the past, personality assessment was seen to have a relatively low


predictive validity, especially in multicultural situations such as occur in
South Africa. This was largely because the assessment process was
based on a very narrow definition of personality and because of the
Eurocentric conception of personality traits. Furnham (2008, p. 126)
reports on a number of recent studies that show “the powerful and
predictable relationship between personality traits and work
performance/job success”. It is important to note that both the 16PF
(especially the South African version of 1992, the SA92) and the South
African Personality Questionnaire (SAPQ) have been found to
discriminate against African and other minority candidates.

11.4.1 The five-factor model*


The Five-factor model is discussed in section 11.3.7. In the workplace,
the conscientiousness factor of the five-factor model in particular has
been shown to be as important as reasoning ability in determining job
success. This is not surprising, as people higher on conscientiousness
have been shown to develop higher levels of job knowledge.
Conscientiousness also predicts organisational citizenship behaviour
(OCB). As Mount, Barrick and Strauss (1994) have argued,

[t]he preponderance of evidence shows that individuals who are


dependable, reliable, careful, thorough, able to plan, organised,
hardworking, persistent, and achievement oriented tend to have higher
job performance in most if not all occupational groups (p. 272).

Rothstein and Goffin (2006), in their review of the use of personality


measures to predict workplace behaviour, argue strongly that

[p]ersonality measures are increasingly being used by managers and


human resource professionals to evaluate the suitability of job
applicants for positions across many levels in an organisation. The
growth of this personnel selection practice undoubtedly stems from a
series of meta-analytic research studies in the early 1990s in which
personality measures were demonstrated to have a level of validity
and predictability for personnel selection (p. 155).

They conclude that

[d]espite the controversies surrounding meta-analysis and the FFM,


the weight of the meta-analytic evidence clearly leads to the
conclusion that personality measures may be an important contributor
to the prediction of job performance (p. 158).

Similarly, McManus and Kelly (1999) found that FFM measures of


personality provided significant incremental validity over biodata
measures in predicting job performance, while a study by Goffin,
Rothstein and Johnston (1996) showed that personality data provided
incremental validity over evaluations of managerial potential provided
by an assessment centre. In addition, personality data have been shown
to be linked to the likelihood that job applicants may be involved in an
accident, that they are more likely to be satisfied with their job, will be
motivated to perform, and will develop into leaders (see Rothstein &
Goffin, 2006, pp. 160–161 for details).

However, for a different viewpoint and a less favourable interpretation


of the importance of the Big Five to workplace success, especially the
conscientiousness factor, see Hough and Oswald (2000, pp. 635–636).
Furnham (2008), for example, cites various studies showing a significant
relationship between various FFM dimensions and such organisational
behaviours as sales success, productivity, absenteeism, leadership and
job satisfaction* (see pp. 130–134). He goes on to state (see pp. 130–
131) that “at the turn of the millennium, there was a new sense of
optimism among personality theorists … [and] … there has been
consistent evidence that personality tests do indeed predict behaviour at
work”. Using these examples, he goes on to dismiss the arguments by
Hough and Oswald (2000) (Furnham, 2008, pp. 134–140).

Health compliance and safety behaviour have also been related to this
model of personality – especially to conscientiousness. For example,
people who are low on conscientiousness are less likely to comply with
medication and treatment instructions. Risky behaviour and addiction
are also related to personality structure. Impulsiveness is clearly related
to risk-taking, which has major implications for the use of illicit drugs,
the prevention of HIV/AIDS and adherence to safety precautions in the
workplace. Conscientiousness seems to be positively related to a range
of good health and safety habits, as well as to honesty and integrity
(see Chapter 13, section 13.2.2).

Other Big Five dimensions that are important are extraversion, which
predicts success in management and sales environments, and openness
to experience, which predicts training outcomes and receptivity to new
ideas. The remaining factors do not seem to be particularly important for
performance in the workplace.

11.4.2 MBTI
Another very popular measure of personality used in selection batteries
in many organisations is the Myers-Briggs type indicator (MBTI), which
was discussed in section 11.3.5.4. It is also discussed in depth in section
15.4.3.1 when we look at career assessment. (For an in-depth look at the
MBTI in the workplace, see Furnham, 2008, pp. 86–92.)

11.4.3 Locus of control*


A widely used personality dimension is locus of control. Briefly stated,
locus of control refers to the belief that a person’s fortunes in life are
determined by things that he does for himself (internal locus), or that the
things happen as a result of forces outside of him (external locus)
(Rotter, 1966). People who argue that they have failed because they
misunderstood the task or did not work hard enough have an internal
locus of control. However, people who blame their lack of success on
external factors such as the unjust system, discrimination by important
people, bad lecturers and other factors in their environment, are said to
have an external locus of control.

In general, the findings in respect of locus of control are as follows:

Sales people tend to be external types.


High external types tend to have lower levels of job satisfaction.
Internal types have lower absenteeism, because they are inclined to
look after their own health.
External types are more compliant and willing to follow instructions.

11.4.4 Type A and Type B personalities*


An important personality measure that is used to select people is the so-
called coronary-prone personality measure. In 1974, two cardiologists,
Meyer Friedman and Ray Rosenman, argued that in general there were
two basic personality types: Type A and Type B.

Type As

are workaholics
are always moving, walking, eating rapidly
work long hours
are competitive about everything
set themselves tight deadlines
get impatient when things are moving too slowly for them
always try to do several things at the same time (multitask)
are unable to delegate
cannot cope with leisure
measure success in terms of the number of things they own or how
much they have acquired
• are prone to stress and heart attacks.

Type Bs

are relaxed, laid back and not driven by time urgency


are able to delegate
feel no need to brag about their accomplishments
play for fun rather than to prove their superiority.

Type Bs are generally more successful at higher levels of the


organisation. Type As are less successful as they tend to trade quality for
speed – corporate success depends on wisdom and sound decisions –
that is, on quality and not quantity of work done. In general, Type As
tend to burn out and to suffer heart attacks.
11.5 Summary

In this chapter we defined personality as the preferred and relatively


stable ways in which people react to their worlds. We illustrated that
personality is more than biology, and rather reflects the values and needs
in terms of which the person has been socialised. Personality is seen as
the person’s “preferred ways of processing information and dealing with
the world”. Assessment methods are closely related to the theoretical
framework chosen. A distinction was made between those theories that
try to understand the person in terms of their own sociocultural
experience (the idiographic approach), and those that try to apply a
universalistic or nomothetic approach. The major theoretical frameworks
or paradigms described include biological, developmental,
psychoanalytic, needs, phenomenological and trait theories. By far the
most common theoretical framework used in industry is trait theory,
which sees personality as being made up of a number of relatively
permanent characteristics, which can be assessed using appropriate
methods.

When it comes to the measurement of personality, the most important


aspect to note is that the method chosen reflects the assessor’s
theoretical orientation. We identified numerous approaches to the
assessment of personality, including observation, computer-based
simulation, and projective and objective techniques. In line with the
importance of trait theory in organisational settings, the most frequently
used approach to personality assessment is the use of self-completion
questionnaires and scales of various kinds. We looked at the
development of trait theory from the ancient four humours model,
through various two-by-two matrices to the model proposed by Myers
and Briggs (the MBTI). Finally, we discussed factor-analysis-based
models such as Cattell's 16PF and Costa and McCrae's NEO-PI-R. In
the process, we saw that personality measures can focus on
single traits or on a number of different characteristics.

Personality and the assessment thereof is important in all spheres, from


coping with everyday life events to clinical problems. Recent work has
shown that personality is an important aspect of success in the
workplace. As stated above, the most widely accepted approach in
organisational psychology is trait theory, and the most widely used
technique is the use of questionnaires. Other popular approaches are the
behavioural approaches (observing behaviour in a real situation or
simulation) such as occur in in-basket techniques. The use of Kelly’s
repertory grid is fairly popular in the UK.

We concluded by arguing that care must be taken with many assessment


instruments to avoid the Forer or Barnum effect: that most people would
agree with very broad and generalised descriptions of themselves.

Additional reading

For a good review of the MBTI, see Public Service Commission (of Canada) (2006),
Standardized testing and employment equity career counselling: A literature review of
six tests. Available at http://www.psc-cfp.gc.ca/ee/eecco/intro_e.htm
Fay Fransella, who has done much to advance Kelly’s repgrid theory over the years,
has produced a book on personal construct psychology, The essential practitioner’s
handbook of personal construct psychology (2005, London: Wiley), which is highly
recommended for people interested in this technique. An equally easy-to-read text is
Jankowicz, D. (2004), The easy guide to repertory grids. Chichester, UK: Wiley.
In addition, a good account of Kelly’s theory of personal constructs is provided by Dr
Valerie Stewart (2004), Kelly’s theory summarised: A summary of Kelly’s theory of
personal constructs, the basis of the repertory grid interview. Available at
http://www.enquirewithin.co.nz/theoryof.htm
Furnham’s 2008 text, Personality and Intelligence at work, has excellent chapters on
personality and personality testing in the workplace, the identification of personality
disorders at work, and the origin and assessment of integrity and dishonesty at work.

Test your understanding

Short paragraphs
1. Discuss what is meant by the projective hypothesis, and show how this is used to
assess personality.
2. Outline briefly Costa and McCrae’s five-factor model of personality.
3. What is meant by the Forer effect?

Essays

1. Show how the theory we adopt determines the method used to assess personality.
Refer to at least three different theoretical frameworks.
2. Show how personality theory evolved from the four humours model of ancient Greek
science to the type indicator model put forward by Myers and Briggs (the MBTI).
12 Assessing competence

OBJECTIVES

By the end of this chapter, you should be able to

define a competency
describe various kinds of competence
describe different levels of competence
show how competencies drive excellence.

12.1 Introduction

If we think of good sportspeople, we recognise that they have skills and attitudes that make them good at what they do. Barry Richards, for example, was one of South Africa’s greatest cricketers. He captained his school side for three years, the national school side for several years, and his provincial side and the South African team for a number of years before, because of apartheid, playing first in England and then in Australia.
Barry was not very academic, but he knew how to play cricket. At
school he had a thick notebook in which he drew little stick figures in
the bottom right-hand corner of each page. Each figure depicted the next
stage of a batting stroke, so that when Barry flipped through the pages
he ended up with a smooth batting stroke. He would repeat this exercise,
improving each little drawing until he got the stroke absolutely perfect.
Then he would do the same for the next stroke. In this way, he got to
know every stroke in perfect slow motion.

Of course, there is more to being a good sportsperson than mere technical competence. All sportspeople, but especially leaders such as
captains, need a blend of technical ability, interpersonal skill, strategic
vision and sound off-the-field behaviour. To be a good captain, a player
has to have advanced knowledge about the game and the opponents.
Such a person has to have the required technical skills (e.g. batting,
bowling, fielding, etc.). He must be highly motivated and very keen to
win. He must also have strong tactical and strategic skills (e.g. know
when to change bowlers, re-arrange the fielders, etc.). The person must
have the interpersonal and leadership skills to direct and motivate the
team; he must get on well with the team management, fans, politicians
and the press. And finally, he must have a high level of moral integrity
and not be open to bribery and match fixing.

This kind of thinking has been applied to skilled performance in a number of areas, but most especially in the area of industrial and
organisational psychology, where job performance is analysed in terms
of this collection of abilities, which are termed competencies.

A competency is what a person must be able (and willing) to do to achieve above-average performance in a specific role. Competence is seen as that which
underlies and facilitates the demonstration of skilled behaviour on a consistent
basis and thus includes aspects of motivation and the desire and willingness to
perform at the highest levels possible. Competence is what is needed for effective
performance.

12.1.1 Definition
According to David Dubois (2005, p. 8), “[C]ompetencies are the traits
or characteristics, including an individual’s knowledge, skills, thought
patterns, aspects of self-esteem, and social roles, that are used to achieve
successful or exemplary performance of any type”.

Weiss and Hartle (1997, p. 29) define a competency as

a personal characteristic that is proven to drive superior job performance. It describes what top performers do more often with
better results than their average counterparts. Competencies establish
a causal link between certain behaviours and the achievement of
success. They describe what makes people effective in a given role.
Similarly, Arnold (2005, p. 616) gives the following definition: “A
competency is the specific behaviour patterns (including knowledge,
skills and abilities) a job holder is required to demonstrate in order to
perform the relevant job tasks with competence.”

Finally, the US Society for Human Resource Management (SHRM) (2012) argues that

competencies are individual characteristics, including knowledge, skills, abilities, self-image, traits, mindsets, feelings, and ways of
thinking, which, when used with the appropriate roles, achieve a
desired result. Competencies contribute to individual exemplary
performance that creates reasonable impact on business outcomes (p.
1).

Furnham (2008, p. 319) shows how the term competence is used in at least six different and often opposing ways. In short, competencies are
collections of knowledge (what people know), skills (what people can
do), attitudes (what they feel about various issues) and values (what they
believe is the right thing to do) required for above-average performance
in any task. Although the term “competencies” is used most often in
work-related contexts, it can just as easily be used in social situations.
We may ask what makes a good husband or wife, a good citizen or
sportsperson, or a good parent. In each case, success depends on the
presence and relative strength of the person’s knowledge, skills,
attitudes and values – or KSAVs. (“A” can stand for “attributes” not
“attitudes”, but in practice this makes very little difference as an attitude
is an attribute of a person.)

The first part of this chapter focuses on competencies in the typical work
situation, because this is where almost all the research has been done.
The second part examines the extent to which the notions developed in
the first part can be applied to other, non-work situations.

The first thing to notice is that competence is tied to demonstrable behaviour and is an output rather than an input characteristic – it is
outcomes based. In other words, competence needs to be demonstrated
during the performance of a task and does not depend directly on
qualifications and experience. A person with the right amount of
knowledge obtained from a university may lack the necessary skills or
understanding required for the job, and is thus not competent. Similarly,
a person who lacks the academic training may have learned how to carry
out certain tasks and be competent even though he does not have all the
required theoretical knowledge. Of course, it is hoped that university-
educated people will become competent very quickly because of their
background knowledge, especially when new skills and knowledge have
to be developed.

12.2 Drawing up a competency framework

To establish the nature of the competencies required for any job, there is
a six-step process to follow. This is given below. Note that while we talk
about jobs, the same reasoning can apply to other areas such as
admission to a school (school readiness) or release from an institution
such as a hospital or prison.

12.2.1 Decide on the overall purpose of the job


Most organisations have job descriptions of some form in which the
tasks that need to be carried out for each particular job or job type are
described. The overall purpose is more general; for example, in the case
of an administrative or personal assistant, this could read as follows:
Ensure the smooth functioning of the office and that all required
material and information are available when needed. Treat visitors and
colleagues with respect and make them feel welcome in the office. In the
case of a captain of a sports team, this general statement could read:
Ensure that the team wins the majority of its games in an entertaining
way, keeping the fans, sponsors, management, press, politicians and
general public happy and supportive of the team’s effort.

12.2.2 Decide on units of competence


This step involves breaking the job down into various critical areas in
which the person has to perform. These are known as key performance
areas or KPAs. In the case of the cricket captain, his units of competence
(KPAs) would include the following, for example:

Assist in the selection of the team.


Show technical and/or functional skills (batting, bowling, etc.).
Display tactical skills (changing the bowlers and field settings,
declaring the innings closed).
Make strategic decisions (batting first or second).
Show leadership (motivating team members, disciplining them where
necessary, etc.).
Liaise with the media and undertake public relations activities (giving
interviews, visiting underprivileged schools or areas, and coaching
there).

12.2.3 Describe elements of each competency (KPA)


Each competency or KPA needs to be specified in terms of the
observable activities that are involved. For example, in the case of the
media liaison KPA mentioned above, the observable behaviours would
include such elements as

holding media briefings before and after every match


behaving professionally at these media briefings (in terms of attitude,
cooperation, appearance, answering reasonable questions openly,
etc.).

Precise ways of generating these KPAs are dealt with in section 12.6.

12.2.4 Establish performance criteria


If we take the second element of the media liaison competency
described above, namely “behave professionally at these media
briefings”, the performance criteria could include the following:

Show a positive attitude towards the media.


Arrive on time and do not rush interviews.
Be cooperative, answering reasonable questions openly.
Be well dressed and professional in appearance.
Be sober.
Do not disparage the efforts of the opposition.

Such statements are often termed behavioural indicators (BIs). For each
competency, we ideally require about five behavioural indicators and for
each of these BIs we have to specify levels or standards against which to
judge or benchmark the person’s behaviour or performance level. For
example, if we look at “batting” as a competency in cricket, we could
identify five indicators such as: 1 – batting average; 2 – not running your
partner out; 3 – having a wide range of strokes; 4 – running between
wickets; and 5 – strike rate. For 1 – batting average, we would have to
specify what is meant. For example:

1. Very competent – has a batting average of 50+ runs per innings
2. Good – has a batting average of 30–49 runs per innings
3. Acceptable – has a batting average of 15–29 runs per innings
4. Poor – has a batting average of under 15 runs per innings
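To make the idea of benchmarking a behavioural indicator more concrete, here is a minimal sketch (in Python; the function name and structure are our own illustration, not part of any published framework) that maps a batting average onto the four levels just listed:

    def batting_average_level(average_runs: float) -> str:
        # Thresholds taken from the four benchmark statements above
        if average_runs >= 50:
            return "Very competent"
        elif average_runs >= 30:
            return "Good"
        elif average_runs >= 15:
            return "Acceptable"
        return "Poor"

    print(batting_average_level(42))  # prints "Good"

In a full framework, a comparable benchmark (and, where useful, a small rule like this) would be defined for every behavioural indicator of every competency.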

As Dubois (2005, p. 9) argues, behavioural indicators describe actions or behaviours one can observe an individual taking or using that signify an
appropriate application of the competency in a specific performance
setting. Competencies are defined relative to the performance context in
which they are to be used. Behavioural indicators capture key
information about the cultural expectations of the work setting or
organisation (Dubois, 2005, p. 5). For example, batting averages in
international cricket will not be the same as those at school-level cricket.
12.2.5 Draw up range statements
A range statement is a list of the situations in which the competencies
are appropriate, and can include products, services, clients, and so on, to
which the competencies relate. For example, in the case of the cricket
captain, talking to the local and international press could be included in
the list, but talking to Noseweek or posing for nude photos in Stud
magazine may well be excluded or even specifically forbidden. (Recall,
however, that a former provincial and national cricket captain, Clive
Rice, did pose nude once for a calendar, although his “vital statistics”
were well hidden behind a cricket bat!)

12.2.6 Specify sources of information


A very important part in drawing up a list of competencies and
behavioural indicators is to specify where the information about a
person’s performance will be obtained. This may include written reports
by the person involved or from discussions with his colleagues,
superiors and/or clients. In the case of a person who has been released
from an institution, this may involve reports from his probation officer
or social worker, from the family, and so forth. As in any management
process, information systems have to be in place, and if they are not, it is
management’s responsibility to create and use such systems. It is also
imperative that these systems work properly. It is useless to have
training officers so overworked that proper training does not occur, but
management still assumes that it is under control because an inadequate
system does not report failure.

12.2.7 Identify potential barriers


In addition to these six steps, some writers in the area of competencies
argue that it is useful to identify the possible barriers to successful
implementation or display of a desired competency.
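Bringing the preceding steps together, the following is a minimal sketch (in Python, using entirely hypothetical names and values drawn from the cricket-captain example) of how a competency framework might be captured as a simple data structure; it is an illustration of the logic of the steps, not a prescribed format:

    from dataclasses import dataclass, field

    @dataclass
    class BehaviouralIndicator:
        description: str                                  # observable behaviour (steps 12.2.3/12.2.4)
        standards: dict = field(default_factory=dict)     # performance level -> benchmark

    @dataclass
    class KeyPerformanceArea:
        name: str                                         # unit of competence (step 12.2.2)
        indicators: list = field(default_factory=list)    # BehaviouralIndicator objects
        range_statement: list = field(default_factory=list)      # where it applies (step 12.2.5)
        information_sources: list = field(default_factory=list)  # evidence sources (step 12.2.6)

    framework = {
        "overall_purpose": "Ensure the team wins the majority of its games ...",  # step 12.2.1
        "kpas": [
            KeyPerformanceArea(
                name="Media liaison and public relations",
                indicators=[
                    BehaviouralIndicator(
                        description="Behave professionally at media briefings",
                        standards={"Acceptable": "on time and cooperative at most briefings"},
                    ),
                ],
                range_statement=["local press", "international press"],
                information_sources=["media reports", "team manager feedback"],
            ),
        ],
    }

The point of the sketch is simply that each step of the process produces information that can be recorded, audited and compared alongside the others.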

12.3 Assessment of competence


We may wonder how we get to know what competencies a person has,
and what level of competence he has achieved in each of them. In
essence, there are only two ways: firstly by direct observation, and then
by indirectly assessing whether the person knows what has to be done
and is motivated to meet these requirements.

Direct observation involves watching a person in a work and/or social situation as well as examining outputs generated by specific situations.
As we have noted, competencies are performance based, and thus
assessment is behavioural – we need to observe people in a real or
realistic situation in which they are given the opportunity to display their
competence. (This stresses the importance of proper observation, as
described in Chapter 2.)

If one of the competencies of a university lecturer is to produce research publications in his discipline, then simply counting the number of
published articles, papers or books and assessing their quality is a form
of direct measurement. If part of his job is to teach students, then the
pass rate and average class mark are important direct measures of
competence. If the competence being assessed is the production of good
course materials, then these need to be examined and commented upon.
If lecturing technique is important, then the lecturer needs to be
observed in the lecturing situation so that a decision about his
competence can be made.

Indirect measures include discussions with the people affected by the particular task (clients, patients, etc.) as well as with the person’s
superior. If we wish to assess a lecturer’s competence in teaching, the
feedback received from students at the end of a course is an important
indirect measure of competence. Likewise, a report from the department
head or colleague should be considered. Pass rates, the growth (or
shrinkage) in class sizes, and the popularity of the course are also useful
indicators. (However, we need to be careful here – some classes are
inherently more interesting than others. Lectures about mental health
and sexual behaviour are, and always will be, more popular than lectures
on statistics and psychometrics.) In the work situation, the importance of
a properly constructed performance appraisal process cannot be
overemphasised.

12.3.1 Levels of competence


To determine an individual’s level of competence, the particular task is
broken down into various subtasks, and a detailed description or
standard of performance at each level of competence for each separate
subtask is established. The person is then judged against these standards.

Dreyfus et al. (1980), who did much of the original work on competencies, proposed five levels of competence, namely:

1. Novice: rule-based behaviour, strongly limited and inflexible


2. Experienced beginner: incorporates aspects of the situation
3. Practitioner: acts consciously from long-term goals and plans
4. Knowledgeable practitioner: sees the situation as a whole and acts
from personal conviction
5. Expert: has an intuitive understanding of the situation, and zooms in
on the central aspects

Thirty-three years later, Deloitte Consulting (Bersin, 2013) produced a very similar set of proficiency levels, namely:

0 = No understanding

1 = Basic understanding

2 = Working experience

3 = Extensive experience

4 = Expert in the field

5 = World-renowned expert
Although five levels of competence are recommended above, in practice
many organisations use only three levels: not yet competent, competent
and more than competent. This approach was proposed for South Africa
for the competency framework of the South African Qualifications
Authority (SAQA) and the National Qualifications Framework (NQF).
Numerous problems were foreseen with this approach. However, in a
more recent move SAQA (2012) has adopted a far more realistic
approach, identifying ten levels of competence and listing ten categories
in terms of which each level must be described. These ten categories that
are used in the level descriptors to describe applied competencies across
each of the ten levels of the NQF are the following:

1. Scope of knowledge
2. Knowledge literacy
3. Method and procedure
4. Problem solving
5. Ethics and professional practice
6. Accessing, processing and managing information
7. Producing and communicating of information
8. Context and systems
9. Management of learning
10. Accountability

If we take the first of these categories (scope of knowledge), the competency framework identifies the following ten level indicators (pp. 5–12):

Level 1 – the learner is able to demonstrate a general knowledge of one or more areas or fields of study.
Level 2 – the learner is able to demonstrate a basic operational
knowledge of one or more areas.
Level 3 – the learner is able to demonstrate a basic understanding of
the key concepts and knowledge.
Level 4 – the learner is able to demonstrate a fundamental knowledge
base of the most important areas of one or more fields or disciplines.
Level 5 – the learner is able to demonstrate an informed
understanding of the core areas of one or more fields, disciplines or
practices, and an informed understanding of the key terms, concepts,
facts, general principles, rules and theories of that field, discipline or
practice.
Level 6 – the learner is able to demonstrate detailed knowledge of the
main areas of one or more fields, disciplines or practices, including an
understanding of and the ability to apply the key terms, concepts,
facts, principles, rules and theories of that field, discipline or practice
to unfamiliar but relevant contexts; and knowledge of an area or areas
of specialisation and how that knowledge relates to other fields,
disciplines or practices.
Level 7 – the learner is able to demonstrate integrated knowledge of
the central areas of one or more fields, disciplines or practices,
including an understanding of and the ability to apply and evaluate the
key terms, concepts, facts, principles, rules and theories of that field,
discipline or practice; and detailed knowledge of an area or areas of
specialisation and how that knowledge relates to other fields,
disciplines or practices.
Level 8 – the learner is able to demonstrate knowledge of and
engagement in an area at the forefront of a field, discipline or
practice; an understanding of the theories, research methodologies,
methods and techniques relevant to the field, discipline or practice;
and an understanding of how to apply such knowledge in a particular
context.
Level 9 – the learner is able to demonstrate specialist knowledge to
enable engagement with and critique of current research or practices,
as well as advanced scholarship or research in a particular field,
discipline or practice.
Level 10 – the learner is able to demonstrate expertise and critical
knowledge in an area at the forefront of a field, discipline or practice;
and the ability to conceptualise new research initiatives and create
new knowledge or practice.

A key issue that needs to be addressed in any competency-based framework is to answer this question clearly: Competent for what?

The 2012 list of level descriptors put forward by SAQA has gone a long
way towards addressing this issue by clearly indicating a hierarchy of
increasingly difficult criteria that need to be met at various levels in the
education system. In addition, a far more sensible approach has been
adopted for the grading of performance at the school-leaving (and
lower) levels. Previously, it had appeared that the education authorities
were in favour of a simple three-level system – not yet competent,
competent and more than competent. Within this framework, it looked at
one stage as though the school-leaving certificate at the end of the
secondary phase (for example) would simply reflect Mathematics –
Competent; Biology – More than competent; etc. Although this approach
could have some benefits, it would also have created a number of
problems. For example, if one wanted to award a bursary or scholarship
on the basis of merit, this approach would be unsuitable. Similarly, if
one wanted to select the best person for a position (using a top-down
rating system), this approach would also have been inadequate. For
example, a person may be more than competent on leaving school to
study technical drawing but not competent to study engineering or
actuarial science (both of which require an A or B in mathematics in the
old system). If we were to use this three-point system, we would have to
create a report card that looks something like this:

Mathematics (for psychology, technical drawing and nursing) – more than competent
Mathematics (for BSc and accountancy) – competent
Mathematics (for engineering, actuarial sciences, statistics, Honours
in biology, etc.) – not competent
We would need to do this for all subjects taken in Grade 12 (the so-called “matric”) and so try to accommodate all possible tertiary education and job opportunities. Clearly, it would not be a very practical system, but it seems better than the simple three descriptions chosen by the South
African education authorities, since these fail to spell out what academic
or work direction a person’s level of competence will allow him to
follow (competent for what?). As a result it is quite probable that the
universities and other tertiary education institutions, as well as business
organisations, will need to develop their own assessment procedures to
ensure that people entering the different systems are properly placed.

“Matriculation” (abbreviated as “matric”) means “being able to be admitted to a university”. In the past, universities set their own entrance examinations.
According to the previous education system, when school leavers achieved a high
enough pass in their Grade 12 examinations, they were given matriculation
exemption. In other words, they did not have to write the separate examination set
by the universities.

Sidebar 12.1 The National Benchmarking Tests (NBTs)


In February 2009, HESA (Higher Education South Africa), a high-level
educational body consisting of representatives from all 23 public universities and
universities of technology in South Africa, requested the development of
standards for a series of university entrance examinations. These tests, termed
the National Benchmark Tests (NBTs), have been developed in collaboration with
the American Educational Testing Service (ETS). The ETS has been responsible
for developing most of the world’s largest educational tests such as the SAT
(formerly known as the Scholastic Aptitude Test but now simply the SAT
Reasoning Test), the GRE (Graduate Record Examination), the TOEFL (Test of
English as a Foreign Language) and others. The NBTs were developed because
the existing school-leaving qualification (the National Senior Certificate or NSC)
does not provide adequate information about the competency level of school
leavers. In their words, “the NSC is of necessity norm referenced, which means
that its results yield little information about candidates’ actual levels of
achievement”. The NBTs, in contrast, “are designed to provide criterion-
referenced information to supplement the National Senior Certificate” (HESA
letter to academics dated 29 January, 2009).
This request by HESA clearly points to the failure of the competency-based
approach within the school systems – the normative nature of the NSC runs
counter to the idea of assessing competencies, and fails to answer the question
“competent for what?” Organisations seeking to evaluate potential employees will
also need to face this issue, and implement their own assessment procedures,
unless the educational standards can be normalised.

There is, however, an alternative to these two approaches (i.e. clearer and more specific assessments of competence, and organisations
developing their own assessment processes). Recent work has suggested
that competence is not a matter of all or nothing, and that at least five
levels of competence can be identified, namely not yet competent,
threshold competence, experienced worker competence, highly
competent and mastery-level competence.

1. Not yet competent. This means that the person is basically unable
to perform the task or to meet the minimum standards required. This
may be because he lacks the skills and knowledge required for the
task, although with further training, development and experience he
may be able to achieve these levels in a reasonable time.
Alternatively, it may be that the demands of the task are beyond his
abilities, and he should therefore be given other, more suitable, tasks
to do. Deciding whether the person has the potential to acquire the
competence or not points to the importance of proper assessment
and placement procedures.
2. Threshold competence. This means that the person is able to carry
out the tasks related to the job at a level that is acceptable to the
organisation in terms of quality and efficiency. At this stage,
however, his ability to solve problems is not very well developed
and he may need supervision and/or help from more competent
colleagues. In terms of Hersey and Blanchard’s (1968) theory of
situational leadership, these people are at task maturity level 1 and
require a telling mode of leadership. (See any industrial or
organisational psychology textbook for details of Hersey and
Blanchard’s situational leadership (SitLead) model.) Successful
performance at this level suggests that the person has the potential to
progress to higher levels of competence.
3. Experienced worker competence. This level is attained when the
person is able to carry out all the tasks required of the job at an
acceptable level, with above-average levels of efficiency and
quality. The person will make few, if any, mistakes and is able to
solve most problems that confront him in his area of expertise. In
terms of Hersey and Blanchard’s SitLead model, these people are at
task maturity level 2 and 3, and require a selling or participating
mode of leadership. In general, they can be left alone to get on with
the job.
4. Highly competent. This person is able to meet and exceed the
required work standards without having to rectify mistakes
afterwards. High-quality standards are maintained, and the person
has well-developed problem-solving skills. He is able to initiate new
ways of approaching tasks and solving problems. He makes a good
mentor and coach, and is able (and often keen) to share his expertise
and knowledge with others and to help them quickly become
experienced workers.
5. Mastery-level competence. This person shows complete mastery of
his task, and is a true expert in the area; he is able to solve really
difficult problems that have baffled others. He is difficult to replace,
and as a result is often kept in a specialist role and not considered
for promotion as this may take him away from his technical
competence into a managerial role. Novel ways of rewarding such a
person and ensuring he remains committed to the organisation need
to be found.

In many cases, there is not enough evidence for the assessor to judge
whether or not the person is competent, and so a category marked
“Insufficient evidence to form a judgement” is also used.
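For readers who record such judgements in software, the brief sketch below (Python; the names are our own, and the supervision modes suggested for the top two levels are an assumption extrapolated from the Hersey and Blanchard discussion above rather than something stated in the text) shows one way of encoding these categories:

    from enum import Enum

    class CompetenceLevel(Enum):
        INSUFFICIENT_EVIDENCE = 0
        NOT_YET_COMPETENT = 1
        THRESHOLD = 2
        EXPERIENCED_WORKER = 3
        HIGHLY_COMPETENT = 4
        MASTERY = 5

    # Supervision style hinted at in the text (levels 4 and 5 are assumed)
    SUGGESTED_LEADERSHIP_MODE = {
        CompetenceLevel.THRESHOLD: "telling",
        CompetenceLevel.EXPERIENCED_WORKER: "selling / participating",
        CompetenceLevel.HIGHLY_COMPETENT: "delegating",
        CompetenceLevel.MASTERY: "delegating",
    }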

It would appear that the Department of Education (or at least parts of it)
is aware that the simple three-way categorisation of competent, not yet
competent and more than competent is inadequate. For example, the
Department of Education, in its 2005 subject assessment guidelines
(Department of Education, 2005, p. 7) makes the following statement:

Schools are required to give feedback to parents on the Programme of Assessment using a formal reporting tool. This reporting must use the following
seven-point scale:
Table 12.1 The Education Department’s 2005 grading system

Rating code   Rating                    Marks
7             Outstanding achievement   80–100
6             Meritorious achievement   70–79
5             Substantial achievement   60–69
4             Adequate achievement      50–59
3             Moderate achievement      40–49
2             Elementary achievement    30–39
1             Not achieved              0–29

Source: Department of Education, 2005
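For illustration only, the small Python sketch below (the function name is hypothetical) converts a percentage mark into the seven-point rating code defined in Table 12.1:

    def doe_rating_code(mark_percent: float) -> int:
        # Band lower bounds and their rating codes, taken from Table 12.1
        bands = [(80, 7), (70, 6), (60, 5), (50, 4), (40, 3), (30, 2), (0, 1)]
        for lower_bound, code in bands:
            if mark_percent >= lower_bound:
                return code

    print(doe_rating_code(65))  # prints 5 (Substantial achievement)

The same pattern applies to the INSETA scale shown in Table 12.2 below, with the numeric codes replaced by the letters A to E.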

However, it should be noted that in terms of this grading system, someone scoring only 30 per cent is regarded as having “elementary achievement”. A sounder categorisation is put forward by INSETA (the
Insurance and Financial Sector Education and Training Authority)
which, in its “Recognition of Prior Learning (RPL) Concessions
Document” (INSETA, 2012), has proposed the following five levels of
(successful) achievement, tied to the matriculation (school-leaving)
qualification.

Table 12.2 The INSETA 2012 grading system

Achievement level   Achievement description   Marks %
A                   Outstanding achievement   80–100
B                   Meritorious achievement   70–79
C                   Substantial achievement   60–69
D                   Adequate achievement      50–59
E                   Achieved                  40–49

Source: http://www.inseta.org.za/downloads/RPL_Concessions_Guideline_V6_2012.pdf
(p. 2)

As a general statement, the more closely the competence measures are tied to a specific purpose or to a fairly narrowly defined job or situation, the fewer competence levels are required. Where the outcome of the assessment is to be more widely used, a greater number of competence levels is required.

12.4 Various kinds of competency

Various authorities have defined different kinds of competency for a variety of situations. One such is the Department of Education, which
argues for four different types of competency. This is shown in Sidebar
12.2.

The number of levels of competence is quite arbitrary, despite what some experts
would have us believe. In some cases, a simple three-way system is all that is
required. In other situations, a more refined system with as many as seven or
more levels may be more appropriate. The number of levels depends largely on
the purpose of the assessment, how easy it is to differentiate the various levels,
and how well trained the assessors are in the assessment process.

Sidebar 12.2 Educational competencies


According to the Norms and Standards for Educators (Department of Education,
2000, p. 10), applied competence is the overarching term for three interconnected
kinds of competence.
Practical competence is the demonstrated ability, in an authentic context, to
consider the range of possibilities for action to make considered decisions about
which to follow and to perform the chosen action.
It is grounded in foundational competence where the learner demonstrates an
understanding of the knowledge and thinking that underpins the actions taken,
and integrated through reflexive competence in which the learner demonstrates
the ability to integrate or connect performances and decision making with
understanding and with an ability to adapt to change and unforeseen
circumstances and to explain the reasons behind these adaptations.
Source: Department of Education (2000)

12.4.1 Core and cross-functional competencies


One way of looking at competencies is in terms of core and cross-
functional or cross-field competencies. This terminology also comes
from the South African education authorities. A core competency is one
that is part of the particular area under discussion. For example, a core
competency in the field of mathematics could be the ability to calculate
the square root of a number without using a calculator. In biology, a core
competency could be the ability to describe the contents of a typical
plant cell, using an appropriate diagram to illustrate the structure.

Cross-field competencies are those that apply to different subject areas, and which support and are necessary for the expression of the core
competencies. An example of such a cross-field competency is to
organise and manage oneself and one’s activities responsibly and
effectively. Another cross-field competency is to communicate
effectively using visual, mathematical and/or language skills in the
modes of oral and/or written persuasion. We can see that these cross-
field competencies are developed by, but also determine and reflect, the
competency achieved in specific areas of knowledge.

12.4.2 Technical and higher-order competencies


Another way of describing competencies used in organisational settings
is to distinguish between technical or functional and higher-order
competencies. If we think back to our discussion of the cricket captain,
we can recall that there are essentially two kinds of competencies: those
related to the technical aspects of the job (batting, bowling, etc.) and
those that are broader in scope, having to do with managing others,
motivating the team, dealing with the press and sponsors, and so on. For
want of better terms, let us call the first set of competencies technical or
functional competencies, and the second set higher-level competencies.
(In many ways, this distinction is similar to the core and cross-field competencies described in section 12.4.1.)

12.4.2.1 Technical or functional competencies

Technical or functional competencies are those that focus on knowledge
and skills within a particular area or domain. According to Dubois and
Rothwell (2000, pp. 2–30), “[t]echnical competencies are the specialised
primary and highly related knowledge and skill competencies that
employees must possess and use in appropriate ways on the job”. They
give the following examples:

Quadratic equations: solve quadratic equations over the domain of complex numbers.
Surgical wound closure: demonstrates the use of a primary closure
technique to close a surgical wound.
Word processing: processes a standard manuscript text at a rate of 80
words per minute with no errors (Dubois, 2005, p. 4).

In the case of university lecturers, examples of technical or functional competencies could include the following:

Curriculum development. Draw up a properly constructed course in psychological assessment, covering all relevant areas.
Course material. Produce a set of notes, reference lists, overhead
and/or teaching material, associated practical exercises and
examination questions (with marking protocol and/or model answers).
Marking assignments. Mark and return, with comments, all set
assignments within three weeks of submission.

Note the format of these competency statements. The competency is named, and this is followed by a description of the content of the
competency, with a unit standard where appropriate (e.g. “80 words per
minute” or “within three weeks of submission”). Thus we see that the
style of these competencies is very similar to the core competencies
described in section 12.4.1.

Dubois and Rothwell (2000) argue that not all competencies are of this
technical/functional kind, and identify a class of higher-level
competencies focusing on softer issues such as people management
skills. These higher level or

personal functioning competencies … are not oriented or aligned with any particular functional or technical speciality. They include the
characteristics or competencies that employees call upon and
consistently use – along with their other competencies – to be
successful performers with other persons, both internal and external to
their organisations. They can also include knowledge or skills
elements (2000, pp. 2–31).

These are similar in many ways to the cross-functional competencies discussed above. In the academic arena, these would include the ability
to critically read a text, access information in the library and on the
Internet, write an essay, reference material, and the like. According to
Meyer and Semark (1996), competencies that transcend specific jobs but
that are essential for effective functioning in a modern economy are
referred to as generic, individual meta-competencies.

As Dubois (2005, pp. 4–5) points out, within the general management
arena many of these generic interpersonal competencies can be
described within the broad notion of emotional intelligence. He goes on
to give several examples of these higher-order or personal functioning
competencies as follows:

Interpersonal sensitivity: sincerely and consistently values and demonstrates respect for the opinions of others, even when one is not
in agreement with those opinions.
Strategic view: takes a strategic or broad-range view of organisational
issues, problems, events or circumstances relative to one’s thoughts, feelings or potential actions.
Managing emotions: manages one’s thoughts and feelings about
circumstances, issues or situations in ways that lead to productive or
successful performance.
It must also be recognised that different competencies may be required
at different job levels and that the same competency will have different
behavioural indicators at different levels of the job hierarchy. In this respect, the US Society for Human Resource Management (SHRM) (2012) put forward various competencies that differ by the career stage one is in. For this purpose they identified Entry, Mid, Senior and Executive levels. Using the best practices identified by the US Society for Industrial and Organizational Psychology’s (SIOP) taskforce on competency
modelling and job analysis (see Campion et al., 2011; Shippmann et al.,
2000), the SHRM arrived at the following nine primary competencies
they believe are necessary for success in the HR field. These are:

Human resource technical expertise and practice


Relationship management
Consultation
Organisational leadership and navigation
Communication
Diversity and inclusion
Ethical practice
Critical evaluation
Business acumen

It remains true, however, that as one moves up the job hierarchy, the
emphasis increasingly shifts from technical and functional competencies
toward the higher-order, more interpersonal, professional and
managerial competencies. (For a look at other competency models, see
Chapter 17, section 17.2.1 and Tables 17.2 and 17.3.)

12.4.2.2 Higher-level competencies


Besides the technical or functional competencies, there are competencies
that focus on softer issues such as people management. According to
Dubois and Rothwell (2000, pp. 2–31), these higher-level or personal
functioning competencies are not oriented or aligned with any particular
functional or technical speciality. They include the characteristics or
competencies that employees call upon and consistently use – along
with their other competencies – to be successful performers with other
persons, both internal and external to their organisations. They can also
include knowledge or skills elements.

These are similar in many ways to the cross-functional competencies discussed in section 12.4.1. In the academic arena, these would include the
ability to critically read a text, access information in the library and on
the Internet, write an essay, reference material, and the like.

From this discussion we see that the higher-order or personal competencies are very much like the cross-field competencies described
in section 12.4.1.

12.5 Advantages of using a competency framework

In the work situation, it is sometimes difficult to translate information about job requirements and responsibilities into clear-cut criteria that
can be evaluated. However, the most obvious advantage of using a
competency framework is that it is based on observable behaviours.
Therefore these competencies often form the basis for many other
organisational activities, such as selection, training and development,
promotion, performance management and even disciplinary processes. If
we know exactly what is required, we can manage our organisation in
terms of these competencies.

Similarly, in education, grading the work of learners has often been a very vague sort of exercise, in which the marker has not had very explicit criteria of success, thus rating broad impressions rather than
assessing clearly defined outcomes. In many instances, it has been a case
of having to recognise a good piece of work when the marker encounters
one. By insisting on clearly defined and observable outcomes, which
both the student and the marker know about beforehand, a far more
precise and equitable (fair) assessment can be made.

Although competencies have much to be said in their favour, a major problem is that the fields to which they apply are rather narrowly
defined – with the result that it is often difficult to generalise across
different circumstances or job categories. Technically this can be seen as
the domain having a very steep gradient of generalisability, which
means that one set of competencies and their definitions is difficult to apply even to jobs that are relatively closely related. Compare this to a measure of cognitive ability or a personality
variable such as conscientiousness, where the validity of the technique
remains relatively consistent across a broad range of jobs and situations
– the gradient of generalisation for these measures is relatively flat. So
while the concurrent and predictive validity of a competency framework
for specific jobs may be very high, this falls off quite rapidly across
jobs, requiring the development of a large number of different, very
specific competency frameworks or models for different jobs and job
families. The developers of competency frameworks know this and have
developed large numbers of related competencies. These frameworks
have names such as Competency Libraries, Competency Mapping, The
Competency Architect and the like. A key issue that needs to be
addressed in any competency-based framework is to clearly answer the
question: competent for what? As long as the competency model is
applied to a very narrow job definition and description, the general
utility of a competency approach is very low.

12.6 How are competencies identified?

Clearly, the choice of competencies by a particular organisation is closely tied to its objectives. In organisational psychology, there is a
well-known saying: “What gets measured, gets done”. Therefore, if we
want to change the behaviour of individuals, groups, departments or the
organisation as a whole, we need to determine and spell out in
competency terms what it is that we are trying to achieve and then draw
up a list of “success criteria” that tells us when we have done it. This
also applies in clinical, counselling, health and educational contexts.

So how do we do this? The following seven methods have been suggested:

1. Focus groups. These can include high-performing employees and their managers.
2. Job diaries. A technique used to draw up both job descriptions and
competency profiles is to ask job holders to note in a diary or on an
activity sheet all their main tasks and the information and decision
processes needed to come to a good decision. This is quite laborious
and subjective. People often do not actually know why they act as
they do – it is simply a result of their education or training. Asking
people to comment on their actions can become confusing and even
threatening.
3. Critical incident technique. A variation of the diary is to ask
people to note all the important or critical decisions they have to
make. This is easier to manage and can be done as it occurs or on
reflection, individually or in a group.
4. Client interviews. Discussions with the receivers of good (and bad)
service will help to identify those behaviours that distinguish
between above- and below-average performance.
5. Surveys. There are a number of detailed commercially available
competency lists to identify the KSAVs associated with excellence.
However, there is no list of competencies applicable to all
organisations. Competencies need to be selected and/or generated in
terms of the organisation’s strategic objectives and its social and/or
business environment.
6. Existing job descriptions. Most organisations should have detailed
job descriptions for all their employees, and these can be used to
generate behavioural indicators. Unfortunately, however, job
descriptions are sometimes out of date – the more rapidly the
technology changes, the more likely it is that the job description is
inaccurate.
7. Repertory grids. In Chapter 11 we discussed the use of Kelly’s
repertory grid (or repgrid) technique. This is a useful approach to
identifying competencies in various situations. (See also Furnham,
2008, p. 328.)

Various theorists and organisations have lists of competencies that they believe to be the most important. There are any number of consultants
and vendors (salespeople) who will try to sell their lists to management.
One such list is presented in Chapter 17 (Table 17.2). There is nothing
wrong with this approach – if the list of competencies works for a
particular organisation or situation, then it works. Furnham (2008, pp.
324–327) gives numerous other lists of competencies.

12.7 Developing a competency portfolio

In many occupations, such as music, art, architecture, engineering and technical work, the idea of drawing up a portfolio of outputs or
developmental processes is standard practice. Portfolios are now widely
used in education institutions as well. These portfolios include examples
of successful work, project outcomes, as well as any awards received
and letters sent by satisfied clients or customers. In Chapter 15 we see
that the definition of a career has shifted from a list of jobs performed to
a list of skills and competencies acquired. In other words, a person’s
career is a series of situations in which he has been able to develop
competencies, and his curriculum vitae (CV) is nothing more than a
portfolio of these competencies and evidence for his claims.

In drawing up a portfolio of competencies (whether as part of one’s own CV or for evaluating the performance of other people), one needs to
decide what competencies should be included and what evidence there is
for competence in each of them. Such a portfolio is useful in many
situations, such as for selection, promotion or even demotion. In a more
clinically oriented situation, one may be required to certify another
person as fit or unfit for some purpose. In all cases, sufficient evidence
needs to be provided to ensure that any independent assessor is in a
position to make a sound and unequivocal decision about someone’s
level of competence.

One final point to note in this respect is that competent workers tend to
be promoted to just beyond the level of their competence. This is
referred to as the Peter principle, which states that the members of an
organisation where promotion is based on achievement, success and
merit will eventually be promoted beyond their level of ability. Sooner
or later people are promoted to a position at which they are no longer
competent (their “level of incompetence”), and there they remain, being
unable to gain further promotion. In other words, employees tend to rise
to their level of incompetence. This view was first formulated by Peter
and Hull (1969).

12.8 Reliability, validity and fairness

The most important way of checking the reliability and validity of a competency approach is to ensure that different observers agree in their
descriptions of any behaviour or outcome. Validity can also be
determined by ensuring that the different documents or sources of
evidence support one another. It may be useful to follow up some of the
documents to ensure their accuracy and authenticity – unfortunately
fraud is sometimes committed in this area. Following up any portfolio
claims needs to be managed in the same way as one would follow up
references for a job application. If there appear to be any anomalies or
blank areas in the portfolio, these too may need to be probed.
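As a concrete illustration of the first check (agreement between observers), the minimal Python sketch below assumes two assessors have rated the same candidate on the same behavioural indicators using a common scale; the data and indicator names are hypothetical, and simple percentage agreement is used here, although a more formal index such as Cohen’s kappa could be substituted:

    assessor_a = {"media briefings": 4, "punctuality": 3, "dress": 5, "sobriety": 5}
    assessor_b = {"media briefings": 4, "punctuality": 2, "dress": 5, "sobriety": 5}

    # Count the indicators on which the two assessors give identical ratings
    agreements = sum(1 for indicator in assessor_a
                     if assessor_a[indicator] == assessor_b.get(indicator))
    percent_agreement = 100 * agreements / len(assessor_a)

    print(f"Inter-rater agreement: {percent_agreement:.0f}%")  # prints 75%

Low agreement would signal that the behavioural indicators or their benchmark standards need to be defined more tightly, or that the assessors need further training.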

Because competencies are essentially behaviourally determined, there is little reason to think that the behaviours shown can be biased in certain
ways. Dominance remains dominance and decisiveness remains
decisiveness, and even though different groups may vary in their ability
to display these competencies, this is for reasons of socialisation and
culturally determined value systems, rather than because the behaviours
have been misinterpreted. Milsom (2004), in discussing a survey of the
use of assessment centres* for the development of managers in
multicultural settings, states the following:

Our research project identified three prime issues relating to the cross-
cultural use of an organisation’s competencies. First, clear links were
found between cultural background and people’s perceptions of what
good performance looks like in terms of interpersonal and social
competencies, for example, the commonly found competency of
“leadership” and “team working”.

Second, a connection was also identified between cultural background and the demonstration of other less socially driven competencies,
such as “creativity” and “proactivity”.

And third, the research found that, while managers from Germany, the
UK, Italy and the USA generally showed agreement about what
constitutes effective behaviour, there were also some clear areas of
disagreement. … [S]ubtle important differences of perception
concerning individual behaviours frequently mean that direct
comparison between candidates from specific countries are subject to
systematic bias (pp. 19–20).

If these conclusions can be drawn from an examination of various Eurocentric cultures, how much more likely are these findings to be repeated in a country consisting of a mixture of Eurocentric, Afrocentric and Asiocentric world-views? Milsom concludes by warning that we
should not assume that any competency is universally applicable (p. 20).

12.9 Competence in non-work-related areas

We stated at the beginning of this chapter that the kind of competency framework discussed in the text applies mainly to the work situation, but
can be used with success in other areas. This is certainly the case in
education, where a major thrust of educational reform in the last ten or
20 years has been along these lines. In the examples used in this chapter,
we see that this approach can also be used in the sporting arena, for both
players and the captain. It can also be applied to coaching, in which
specific technical and higher-order (core or cross-field) competencies
can be described and behavioural indicators found for judging the level
of competence displayed.

We also suggested that this approach can be used in other areas of interest to psychologists. To do this, we can create the term KBA or key
behavioural area to parallel the use of key performance areas or KPAs
in business. As in any other area, we need to determine what it is the
person being assessed should do (i.e. the goal), and then define a full set
of KBAs and accompanying behavioural indicators. Let us briefly
consider some possible instances.

School readiness. There are numerous school readiness tests available to assess whether or not young children are ready to enter
primary school. Such assessments focus on areas such as cognitive
ability, hand–eye coordination, social maturity, independence, toilet
training, and so on. Each of these can easily be phrased in terms of
competencies, KBAs and behavioural indicators.
Marital or partner relations. Counselling couples about the nature
of their relationship and ways of improving it lends itself to this
competency framework.
Release from mental health institutions. The decision to release
people from various institutions clearly depends on the person being
competent in various key behavioural areas.
Child rearing. This is an area where little education is provided for
the average person. Given the high levels of child abuse and the
increasing numbers of children being raised by older siblings, a
nationwide programme on how to raise socially strong, intellectually
able and generally empowered children could be instituted. A
competency-based approach would be very useful in this regard.
In short, a competency-based framework can be applied across a broad
range of areas of direct concern to psychologists. All that is required is
for the need to be identified, and the correct sequence of steps to be
taken.

12.10 Summary

In this chapter, we defined the notion of competence and discussed the various competencies that need to be identified for any specific job. We
also saw that these are essentially defined in behavioural terms – what
the person can and/or is expected to do. We then discussed the drawing
up of a competency framework and the various steps involved, which
lead to a consideration of levels of competence and the desirable number
of them. This is closely related to the issue of “competence for what?”.
In general terms, the more narrowly defined the relationship between
competence and performance area, the fewer competence levels
required. However, if the competence measure is to be used for a wide
range of purposes, a larger number of competency levels will be
required.

We then looked at various kinds of competencies before pointing out the advantages of using a competency framework. Ways of identifying
competencies were discussed, as well as the compilation of a
competency portfolio and the ethical considerations related to this.
Finally, we suggested that a competency framework can be used in areas
other than the typical work situation – in education, sport, admission to
or release from institutions, childcare, and the like.

Additional reading
The work by Dubois is a sound introduction to competencies in the workplace. See, for
example, Dubois, D.W. (2005). What are competencies and why are they important?
and Dubois, D.W. & Rothwell, W.J. (2000). The competency toolkit.
For a sound critique of the competency concept, see Furnham (2008), Chapter 11. He
has also produced an excellent book (2003) entitled The incompetent manager.

Test your understanding

Short paragraphs

1. Define competence and discuss the basis for choosing a specific number of
competency levels.
2. Competencies are the same in every culture – there are no differences between
various cultural groups. Discuss.

Essay

Describe how you would set about drawing up a set of competencies for the press
officer of a national sports team (e.g. rugby, cricket or football).
13 Assessing integrity and
honesty in the workplace

OBJECTIVES

By the end of this chapter, you should be able to

define integrity and honesty


distinguish between integrity as “wholeness” and integrity as “honesty”
describe different ways of assessing integrity
discuss the reliability, validity, scope, fairness and faking of integrity measures.

13.1 Definition

Integrity is generally seen as having two aspects. The first meaning is honesty/criminality/corruption, and is concerned with the individual’s
ability to deal with situations of moral conflict and temptation (we can
term this Integrity 1). The second meaning is “wholeness” (or being well
integrated and free from pathological conditions such as aggression,
drug dependence, and the like). We can term this Integrity 2. There is a
link between the two aspects of this definition because people who, for
example, have a drug problem may lose focus and have to resort to theft
and other Integrity 1 behaviours to feed their habits. Definitions of
corruption tend to focus on Integrity 1 – for example, Transparency
International (TI) defines integrity as those “behaviours and actions
consistent with a set of moral or ethical principles and standards,
embraced by individuals as well as institutions that create a barrier to
corruption” (Transparency International, 2009). In much the same way,
Fine (2010) defines integrity as a “quality of moral self governance at
the individual and collective level” (Paine, 1997, p. 335).
In line with the broader definition of Integrity 2, several other terms
have come into usage more recently. In particular, the terms personal or
employee reliability and counter-productive behaviour are increasingly
found in the literature. These two terms reflect a broadening of the
meaning of honesty and integrity from the relatively narrow view of
theft, lying and cheating (Integrity 1) that first defined the overt integrity
tests of the early 1980s, through to a range of behaviours, attitudes and
dispositions that are considered to be “not conducive to efficient and
effective work practices” or counter-productive to organisational or
societal “health”. In this sense, Integrity 2 is contrasted with deviance in
all its forms and includes a wide range of counter-productive behaviour
and deviant acts such as:

Dishonesty and fraud (including theft, corruption, fraudulent claims, etc.)
Violence against society’s or the organisation’s property (including
vandalism, arson and sabotage)
Violence or threats of violence against people (intimidation and
physical harm)
Substance abuse
Issues of workplace reliability (such as propensity toward
absenteeism, malingering, excessive sick leave, tardiness, absconding,
etc.)
Refusal to carry out instructions (dereliction of duty, insubordination,
negligence, intransigence, non-compliance)
Various other counter-productive and delinquent behaviours that are
site or industry specific

This counter-productive behaviour manifests itself in various ways; the most important indicators include the following:

Arrest and/or conviction for a crime


Frequent involvement with authorities even as a juvenile
Drunk driving
A history of not meeting financial obligations, evidenced by a pattern
of financial irresponsibility (bankruptcy, debt or credit problems,
defaulting on a student loan)
Traffic violations with fines over R500
Illegal drug use (e.g. cocaine, heroin, LSD and PCP)
The illegal purchase, possession or sale of any such narcotics
Deceptive or illegal financial practices, such as embezzlement,
employee theft, cheque fraud, income tax evasion, expense account
fraud, filing deceptive loan statements and other intentional breaches
of trust
Inability or unwillingness to meet debt obligations
Unexplained affluence
Financial problems that are linked to gambling, drug abuse,
alcoholism or other issues of a security concern
Deliberate omission, concealment or falsification of a material fact in
any written document or oral statement in the job application
Credit history that yields evidence of dishonesty and disregard for
obligations such as late payment history
Parking and other traffic violation tickets: excessive parking and other
tickets may indicate disrespect for the law
Court records – while companies have always checked for criminal
histories, many are now examining other public records for evidence
of moral laxity, including aspects such as divorce cases and civil
litigation
Plagiarism, which is rampant, and can be detected using the Internet
Other forms of impropriety

Wild-cat strikes and other forms of disregard for laws and organisational policies are examples of a failure of Integrity 2. Perhaps the most notorious forms of this breakdown of Integrity 2 are the shootings at Columbine High School in Colorado, US, and the cases of “going postal”, an expression which in American usage has come to mean becoming extremely and uncontrollably angry, often to the point of violence, and usually in a workplace environment. The expression derives from a series of incidents from 1983 onwards in which United States Postal Service (USPS) workers shot and killed managers, fellow workers and members of the police or general public in acts of mass murder. In the most widely cited of these cases, on 20 August 1986 Patrick Sherrill, a postman facing possible dismissal after a troubled work history, shot and killed 14 employees and wounded six at the Edmond, Oklahoma, post office before committing suicide by shooting himself in the forehead. Between 1986 and 1997, more than 40 people were gunned down by spree killers in at least 20 incidents of workplace rage. In times gone by, these people were said to be running amok. Whatever term one uses, the last thing any manager wants is for one of his or her staff to go on a shooting spree, and therefore measures of Integrity 2 are important.

13.2 Assessing integrity

Until the late 1980s, the most widely used method of identifying and
dealing with workplace dishonesty was the polygraph examination
(popularly known as a “lie detector test”), and these tests were often an
important consideration in decisions of whether to hire or fire specific
individuals. However, their use was highly controversial, both because
of doubts about their validity and because of concerns over invasions of
the privacy and dignity of people being examined (Murphy, 2000). The
Employee Polygraph Protection Act of 1988 placed severe restrictions
on the use of the polygraph in the workplace, and this method was
abandoned by most employers. Employers who had relied on polygraph
examinations and other potentially invasive methods sought alternatives
for dealing with workplace dishonesty. Integrity tests have been
embraced by many organisations either as a replacement for the
polygraph or as a selection tool in its own right (Coyne & Bartram,
2002).

There are various ways of assessing integrity, such as tests, measures of physiological arousal and sociological profiling methods. The most popular of these is currently integrity testing, which is usually assigned to one of two categories, namely overt integrity tests and personality-oriented (or covert) tests (Sackett, Burris & Callahan, 1989; Sackett & Wanek, 1996).

13.2.1 Direct (overt) approach


This approach asks people directly whether they have been involved in
any criminal or dishonest behaviour, on the assumption that people
behave in relatively set ways and that past/current behaviour predicts
future behaviour. It is assumed that if you have lied or been dishonest in
the past (or during the assessment), the chances are that you would lie or
be dishonest again in the future. Overt tests thus approach integrity
testing in terms of direct questions regarding a test-taker’s attitudes or
past behaviours. The questions enquire directly about an individual’s
attitudes toward counter-productive work behaviours such as theft. Such
tests typically ask how often an individual has thought about committing
theft or has engaged in theft, drug usage, criminal behaviour or other
wrongdoing. This approach may also include questions about how
severely theft should be punished, as well as questions about the
individual’s honesty in general. Additional questions directly enquire
about beliefs on these same topic areas, such as the extent to which the
person agrees with various rationalisations for counter-productive work
behaviours (CWBs) and remorse for past actions (Berry, Sackett &
Wiemann, 2007). This method is based on the person’s history or track
record, and approaches the situation head-on by asking questions about
the person’s criminal record, the use of recreational drugs, and so on.

This direct approach is not particularly successful, as the items are transparent – it is obvious what is being sought and it is relatively easy to fake the correct or desirable answers. As a result, the person being assessed is not likely to admit to actions that may jeopardise his or her employment possibilities. One of the issues not often dealt with is the magnitude of the crime. For example, admitting to “nicking” a spanner left in your car by the service mechanic is very different from holding up and robbing an all-night food outlet or service station. It also fails to distinguish between white lies and those that are designed to mislead people in a way that harms them or their interests. The direct approach also penalises the person who is honest enough to admit to being tempted or who has committed a minor crime in the past, and rewards the person who lies about these things. In addition, asking these kinds of questions is a violation of the person’s civil rights and is prohibited by the law of this and many other countries.

13.2.2 The covert or personality profiling approach


The second approach to integrity assessment, the covert or personality-
oriented approach, is a more indirect approach, and is followed in
measuring counter-productive tendencies. These assessments tend to
have a broader focus than overt tests in that they focus on a range of
counter-productive work behaviours other than theft and honesty.
Personality-oriented exams assess personality constructs believed to be
involved in integrity (e.g. socialisation, positive outlook and
orderliness/diligence). These traits include such aspects as dominance,
frustration tolerance and similar characteristics assumed to be predictive
of deviant or criminal behaviour. These tests may also ask questions
about the individuals’ thrill-seeking behaviours, social conformity,
attitudes towards authority, aggression, conscientiousness and
dependability. Also included are aspects such as risk taking, the need for
instant gratification, a sense of entitlement, a strong need for power, and
similar attributes that can increase the risk of dishonest behaviour.

The covert approach is the most widely used, and many test
producers/distributors have spent time and effort on isolating deviant
profiles from their broad-spectrum personality scales. In some cases,
dedicated integrity tests have been developed – these are all strongly
reminiscent of existing normal personality scales (such as the 16PF) or
more clinically focused measures such as the MMPI. These models are
based on a relatively narrow definition of deviance, which is essentially
seen as fraud/theft. However, given the acknowledged violence in this
country, there are numerous other aspects which should be taken into
consideration. Robbery with violence, intimidation, outbursts of
violence, uncontrollable anger, addiction, gambling and organised
crime, to name but a few, suggest that the dominant characterisations of
deviance and its opposite, integrity, need to be broadened. In this
process a much broader range of indicators of deviancy and/or risk of
deviancy need to be explored.

There is strong evidence that both overt and personality-based tests are
related to the broad five-factor model traits of Agreeableness,
Conscientiousness and Neuroticism (or adjustment) – the ACN factors
(Hough & Oswald, 2000; Wanek, Sackett & Ones, 2003). These same
traits have been associated with individual-level counter-productive
behaviours (Berry, Ones & Sackett, 2007) and national-level corruption
rates (Connelly & Ones, 2008). Hough and Oswald also show that meta-
analysis (the combination of a large number of related studies) indicates
that “integrity and conscientiousness tests usefully supplement general
cognitive ability tests when predicting overall job performance.
Converging evidence exists for the construct, criterion-related and
incremental validity of integrity tests” (Hough & Oswald, 2000, p. 6).

Three additional approaches to integrity assessment can be identified – the third approach is based on various psycho-physiological responses, the fourth approach looks at mental health profiling and the fifth approach looks at various sociological factors (such as socialisation and vulnerability to temptation) that may increase the risk of dishonest behaviour.

13.2.3 The psycho-physiological approach


The basic premise is that when an individual is lying or evading a “truth-
laden” statement, he or she in essence “fears” detection. This “fear”
emotion is then hypothesised to cause a change in the individual’s
physiological arousal responses which can be detected and then used as
an index of “deceitfulness”. This physiological arousal can be detected
and recorded either on a continuous-feed chart or some other display
device. The biosignals most often recorded are factors such as heart rate,
blood pressure, skin conductance and resistance, respiration rate and
depth, and peripheral skin temperature. As the person becomes
“aroused” by the deception being perpetrated, the heart rate and blood
pressure increase, the skin conductance increases (as a result of
sweating) and the skin temperature decreases (as the peripheral blood
flow decreases). Pupil size tends to increase. The biosignals are recorded
and form the basis of the polygraph or lie detector readings.

There are four subset techniques based on psycho-physiology. The first two are used primarily in conjunction with polygraphs. The first method,
known as the CQT (control question technique) uses the difference
between responses to “neutral” questions and to “target” questions as an
index of “deception”. The second approach, the GKT (guilty knowledge
test) uses the extent of variation between responses as an index of
deception. In the GKT, overt questions about an event or occurrence are
not asked of the individual, but instead questions in which only a
“guilty” individual might see the relevance, in that they possess
knowledge about an event or occurrence that they are trying to hide.
Although polygraphs have been widely used for pre-employment
screening in the past, the US Federal Office of Technology Assessment
(OTA) published a report in 1983 which concluded that “no overall
measure or single, simple judgement of polygraph testing validity can be
examined based on available scientific evidence … – the available
research evidence does not establish the scientific validity of the
polygraph test for personnel security screening” (1983, p. 4).

Within five years, the use of the polygraph for the purpose of pre- and
post-employment screening was prohibited, albeit with some
exemptions. The use of polygraphs is also prohibited for general use in
South Africa. This near prohibition of polygraph testing has led to the
widespread use of the paper and pencil integrity test.

A third physiological approach uses a particular segment of the waveform from an evoked brain potential to form an index of deception,
a process that has been termed “brain fingerprinting”. In this approach
(called the “oddball paradigm”), two stimuli are presented, one of which
rarely occurs while the other commonly occurs. An individual is
required to respond in some way only to the rare “odd-ball” stimulus.
The amplitude of a specific brainwave component (the so-called P300) increases as the probability of occurrence of the rare stimulus decreases.
It is argued that this P300 component is an indicator of information
processing of “meaningful” information. Typically, a number of words
and pictures are flashed across a computer screen in front of an
individual. Some of the stimuli are associated directly with the suspected
offence. The increase in P300 amplitude and time-shift of the
component are measured as these indicate an increased amount of
information processing by the individual, over and above the stimulus
processing of the “irrelevant” stimuli. The proponents of this approach
claimed an accuracy of around 87 per cent, although published figures
are hard to find.

The fourth “physiological” assessment is associated with general physiological arousal. Raine and Venables (1984a, 1984b), in a
longitudinal follow-up study of 101 male adolescents, demonstrated that
low levels of psycho-physiological arousal found in some of the
adolescents when assessed at age 15 predisposed toward later delinquent
behaviours at age 24. This chronic under-arousal is observed when
monitoring the change in heart rate or skin conductance immediately
following the presentation of sudden “orienting” stimuli. Raine and
Venables demonstrated that the responses given by many adolescents,
who later went on to commit acts of anti-social delinquency, were quite
flat compared to those adolescents who were not associated with acts of
delinquency. Work by Hare (1993, 1995, 1996, 1997, 1998) shows that psychopaths (described as people who possess a sense of entitlement, lack remorse, are apathetic towards others, unconscionable and blameful of others, manipulative and conning, affectively cold, have a disparate understanding of socially acceptable behaviour, disregard social obligations, do not conform to social norms, and are irresponsible) are “expert cheaters”, and show this same pattern of
under-arousal. Psychopaths tend to be of above-average intelligence and
are expert users of charm, manipulation and control (think of the serial
killer Hannibal Lecter in the movie The Silence of the Lambs). With people
like this, questionnaire and interview procedures are generally
ineffective – however, the P300 and brain functioning research suggests
that this kind of sophisticated assessment may soon become standard in
the security and government services.

13.2.4 The mental health approach


This approach looks for various mental health indicators in the belief
that various aspects of mental health can put employees at risk for
dishonest and counter-productive behaviour. Indicators include factors
such as frustration tolerance (risk for violence), drug usage (risk for
escalation and theft to feed the habit), recent or ongoing exposure to
traumatic events which increases the risk of post-traumatic breakdown
into violence and the post office syndrome – going postal. Also included
are such aspects as depression, anxiety, obsessive/compulsive
behaviour, phobias, paranoia and even psychoses. A problem with this
approach in some cross-cultural situations is that various forms of
behaviour that may appear to be pathological or deviant in one cultural
group may be regarded as normal and even desirable by members of
another group.

13.2.5 Social/lifestyle profiling


The fifth approach to integrity assessment is to identify various social and lifestyle risk factors that have been shown to predict anti-social and criminal activity. Although each risk factor in itself may not be definitive of a lack of integrity, positive responses to a wide range of these risk factors do begin to suggest the likelihood of this. In other words, it is not the person’s responses to individual items that are important, but rather the cumulative effect – the profile.

Important factors include aspects of socialisation such as home life, quality of parenting received, role models, etc. People with the
following characteristics all tend to be at greater risk of being dishonest
and lacking integrity than people with the opposite characteristics:

Relatively young
Single
Few social bonds (shown by socialisation/family background
conditions, frequent changes of address)
Use of drugs (both recreational and as a result of dependency)
Various indicators of mental instability such as having been a patient
in an institution for the treatment of mental, emotional or
psychological disorders
Poor quality and duration of education
Low work ethic
Poorly developed moral values

Of course, it is not the presence of one or two of these factors alone, but
a consistent pattern or profile that is most often associated with a risk of
lack of integrity.

13.3 The psychometric properties of integrity measures

The usefulness of any psychological assessment process depends on three fundamental criteria being met: reliability, validity and fairness. As
shown in Chapter 3, reliability is the consistency of the assessment
outcomes – do different administrators get similar results, are the results
obtained at different times very similar and do all parts of the
assessment measure the same construct? Validity is the extent to which
the measure measures what it is designed or purports to measure. The measure must be in line with what existing theory would predict, and its scores should be in line with the scores on instruments measuring similar constructs. Fairness is the extent to which the measure
correctly identifies equal levels of the attribute being assessed in
different groups. It is therefore a special case of validity generalisation –
is the measure equally valid for different groups? In addition, the scope
of the measure must be evaluated – scope is the range of the content
covered and is thus equivalent to content validity as discussed in
Chapter 5, section 5.2.2.

13.3.1 Reliability
The reliability of a testing method examines the extent to which a test-
taker’s score can be relied upon. Reliance can be in terms of internal
consistency (responses to items are related) or stability (the tendency to
get the same score over a number of trials or with different assessors).
Empirical studies have clearly illustrated the reliability of integrity tests
in terms of internal consistency and test-retest (values consistently above
0,7). Based on a meta-analytic review, Ones, Viswesvaran and Schmidt
(1993) report a mean alpha of 0,81 and mean test-retest of 0,85 for
integrity test reliability.
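
To illustrate these two forms of reliability, the short Python sketch below computes Cronbach’s alpha (internal consistency) and a test-retest correlation from a small set of made-up item responses. The data and the four-item “scale” are entirely hypothetical and serve only to show the calculations behind figures such as the 0,81 and 0,85 reported above.

import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items matrix of scores."""
    item_scores = np.asarray(item_scores, dtype=float)
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: six test-takers answering four integrity items (1-5 scale)
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
print(f"Internal consistency (alpha): {cronbach_alpha(scores):.2f}")

# Test-retest reliability: correlate total scores from two administrations
time1 = scores.sum(axis=1)
time2 = time1 + np.array([1, -1, 0, 1, 0, -1])   # hypothetical retest totals
print(f"Test-retest reliability (r): {np.corrcoef(time1, time2)[0, 1]:.2f}")

In practice, of course, such coefficients would be computed on samples of hundreds of respondents rather than six.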

13.3.2 Validity
Internationally, the validity of integrity assessment tools is problematic
as honesty and integrity are extremely difficult constructs to define with
enough precision to allow empirical measurement of those constructs.
Ones et al. (1993) conducted a massive meta-analytic study in which
they analysed the results from 665 validity studies. This is the largest
and most comprehensive study of its kind to date. For the prediction of counterproductive work behaviours other than theft, Ones et al. (1993) reported a validity of 0,39 for overt tests and 0,29 for personality-oriented measures. For theft in particular, both overt and personality-oriented tests had validity coefficients of 0,33, so there appears to be no basis on which to choose one type of test over the other for this criterion. Integrity tests therefore had good validity with
regard to the prediction of CWBs. In addition to the CWBs, integrity
tests have also been found to predict job performance (Ones et al.,
1993). In fact, integrity tests are the “personnel selection method with
the greatest incremental validity in predicting job performance over
cognitive ability” (Berry, Ones & Sackett, 2007, p. 2). For both overt
and personality-oriented measures, Ones et al. (1993) reported
coefficients of 0,41 for the prediction of job performance.
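
The idea of incremental validity mentioned above can be illustrated with a small simulation: fit a regression of job performance on cognitive ability alone, then add the integrity score and compare the variance explained. The Python sketch below is purely illustrative – the data are randomly generated and do not come from any of the studies cited.

import numpy as np

def r_squared(predictors: np.ndarray, outcome: np.ndarray) -> float:
    """R-squared from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ coef
    return 1 - residuals.var() / outcome.var()

rng = np.random.default_rng(0)
n = 200
cognitive = rng.normal(size=n)        # hypothetical cognitive ability scores
integrity = rng.normal(size=n)        # hypothetical integrity test scores
performance = 0.5 * cognitive + 0.4 * integrity + rng.normal(scale=0.8, size=n)

r2_ability = r_squared(cognitive.reshape(-1, 1), performance)
r2_both = r_squared(np.column_stack([cognitive, integrity]), performance)
print(f"R^2, cognitive ability only:       {r2_ability:.2f}")
print(f"R^2, ability plus integrity test:  {r2_both:.2f}")
print(f"Incremental validity (delta R^2):  {r2_both - r2_ability:.2f}")
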
13.3.3 Scope
Scope refers to the range of attributes covered by the assessment method
and how focused or general the method is. The method may cover a
detailed aspect of a specific attribute or a general overall picture.
Integrity tests were initially designed to measure the specific construct
of honesty and employee theft, and as such are generally narrow and
specific in terms of scope. However, this is especially true of overt
integrity tests, which tend to be much more specific in their approach
and typically comprise subscales that measure specific attributes such as
predisposition to theft, past theft and drug abuse. The covert
(personality-oriented) and sociological profiling approaches are more
concerned with identifying overall levels of counterproductive
behaviour rather than just theft or drug abuse, and are therefore far
broader in design and outcome. Integrity International, based in Johannesburg, has adopted this broad approach to assessing integrity.

13.3.4 Faking on integrity tests


An important issue in respect of integrity tests is the extent to which
candidates are able to fake their responses to questions, either in socially
desirable ways – that is, by not responding in terms of their personal
beliefs – or by not admitting to actual counterproductive behaviours they
may have engaged in. Research suggests that such faking is possible
(Ellingson et al., 1999), although the majority belief is that while people
can fake when instructed to do so, individuals do not fake in real-world
situations (Hough et al., 1990; Morgeson et al., 2007; Ones &
Viswesvaran, 1998; Ones, Viswesvaran & Reiss, 1996). This argument
is supported by the fact that faking and giving socially desirable
responses can easily be detected by building various social desirability
and consistency measures into the assessment process (Morgeson et al.,
2007; Ones & Viswesvaran, 1998). In addition, measures of faking and
social desirability have virtually no impact on measures of integrity
(Hough et al., 1990; Morgeson et al., 2007; Ones & Viswesvaran, 1998;
Ones, Viswesvaran & Reiss, 1996).

The faking of answers is considered to be a particular problem with integrity tests, more so in overt integrity tests than personality-based
tests because the individual is required to admit to previous behaviour or
express attitudes to deviant behaviour. Studies where test-takers were
told to “fake good” on integrity tests have shown that scores were on
average more positive than test-takers who were instructed to answer
truthfully (Lobello & Sims, 1993; Ryan & Sackett, 1987). In a meta-
analytic review of integrity test faking, Alliger and Dwight (2000)
reported that overt integrity tests were more susceptible to faking than
personality-based tests.

However, the question is whether, under a given set of conditions, the degree of “faking good” is correlated with the trait of “honesty”, not
whether scores can be changed by changing the conditions (i.e. directing
the individuals to respond in a specific way). There is a paradox here –
is the person who is honest enough to admit prior infractions (and thus
get a lower integrity score) not more honest than the person who does
not admit having committed these infractions and thus obtains a higher
integrity score?

13.3.5 Fairness and adverse impact


An important issue in selection measures in many countries is fairness or
adverse impact, and integrity is no exception. According to Coyne
(2008):

Equal opportunity laws in many countries will prohibit the use of tests
in a manner that discriminates unfairly against protected groups of the
population (such as gender, race, age, disability, religion, etc.).
Adverse impact in itself is not unfair but it provides initial evidence
for indirect discrimination. The question arises then as to whether
integrity tests show adverse impact. Qualitative reviews have
suggested that no adverse impact is seen for integrity test scores
(Goldberg et al., 1991, Sackett, Burris & Callahan, 1989). However,
as Ones and Viswesvaran (1998) point out, studies looking at this
issue have tended to confuse adverse impact with intergroup
differences. Adverse impact relates to the use of the integrity test in
occupational settings, whereas group differences focus on whether a
bias occurs within a scale. Yet by looking at group differences within
an integrity test, information regarding the likelihood that the test
would cause adverse impact (so long as selection decisions were
based only on that specific test) can be obtained. Ones and
Viswesvaran (1998) examined group differences by age, gender and
race on overt integrity tests in a sample of 724 806 job applicants.
Effect sizes (differences between groups in terms of standard
deviation units) showed females scored 0.16 SDs higher (more
positive) on integrity tests than males and that those 40 and over
scored 0.08 SDs higher than those under 40. Further, Blacks and
Asians scored 0.04 SDs lower than Whites, American Indians 0.08
SDs lower and Hispanics 0.14 SDs lower than Whites. From this they
argue that differences between age, gender and racial groups on
integrity test scores are minor especially as values of 0.2 or lower are
considered to be small (Cohen, 1977). Previous research appears to
illustrate the lack of bias and by implication adverse impact of
integrity tests. Indeed, Arnold (1991) argues that the statistical record
of honesty/integrity tests, which illustrate their freedom from adverse
impact, cannot be matched by any other selection technique (pp. 7–8).

Unlike many other selection tools, integrity measures are very promising in terms of their adverse impact on protected classes. Often the more valid
tools (i.e. tools with the highest predictive relationship with on-the-job
performance), such as cognitive ability measures, produce the highest
adverse impact values. Integrity exams seem to be an exception to this
trend. As the previous sections have indicated, integrity exams have
very strong predictive relationships with the criteria of interest,
especially with performance. Still, research shows minimal to no
difference in performance on integrity exams across protected groups,
meaning that integrity exams do not adversely affect these protected
groups (Ones, Viswesvaran & Schmidt, 1996). Virtually the only “sub-
group” differences appear between men and women, with women
scoring 0,11 to 0,27 standard deviations higher than men, although this
difference will likely not violate the 4/5th rule of thumb (Ones et al.,
1996).
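
The two statistics referred to in this discussion – the standardised group difference (effect size) and the 4/5ths selection-rate rule of thumb – are simple to compute. The Python sketch below uses invented applicant numbers and scores purely to show the arithmetic; it does not reproduce any of the published figures.

import numpy as np

def selection_rate_ratio(selected_a: int, applicants_a: int,
                         selected_b: int, applicants_b: int) -> float:
    """Ratio of group A's selection rate to group B's (the 4/5ths rule compares this to 0.8)."""
    return (selected_a / applicants_a) / (selected_b / applicants_b)

def effect_size_d(scores_a: np.ndarray, scores_b: np.ndarray) -> float:
    """Standardised mean difference between two groups, in pooled SD units."""
    pooled_sd = np.sqrt((scores_a.var(ddof=1) + scores_b.var(ddof=1)) / 2)
    return (scores_a.mean() - scores_b.mean()) / pooled_sd

# Hypothetical outcome: 55 of 100 men and 60 of 100 women pass the cut-off
ratio = selection_rate_ratio(55, 100, 60, 100)
print(f"Selection-rate ratio (men vs women): {ratio:.2f} "
      f"({'meets' if ratio >= 0.8 else 'fails'} the 4/5ths rule of thumb)")

# Hypothetical integrity scores for the two groups
rng = np.random.default_rng(1)
women = rng.normal(loc=0.15, scale=1.0, size=300)
men = rng.normal(loc=0.0, scale=1.0, size=300)
print(f"Effect size (d): {effect_size_d(women, men):.2f} "
      f"(values of 0.2 or lower are conventionally regarded as small)")
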

One problem is that integrity assessments tend to have high failure rates or to require very stringent scoring systems, which may result in the rejection of honest employees (false positives). This may result in employees who should be selected being excluded for the wrong reasons – a truly honest person may truthfully answer items in ways that impact negatively on his or her scores, or an individual who is truly honest and answers the assessment accurately can be seen as being “too good to be true”.

Another possible dilemma is that the nature of integrity assessments is such that informing test-takers of the purpose of the assessment may impact on the way in which they answer the questions. This could be challenged, because informed consent is an essential part of the assessment process in South Africa. However, the same problem applies
it is not possible to “fake good”. The way around this is to build in
control items that monitor faking, unusual responses and social
desirability.

13.4 Monitors and control factors

A cornerstone of sound psychological assessment is to build a number of checks and balances into the scale to ensure that the scores are a
relatively true and accurate measure of the construct being assessed.
These monitors generally involve three different processes, namely
consistency, extreme or unlikely responses, and deliberate distortion or a
lie factor.

13.4.1 Consistency
Consistency measures are designed to ensure that the person being
assessed remains alert and consistent in his response. Simply put, one
item may consist of a simple statement such as “I like ice cream”, and
then later in the survey the same statement or its opposite is put forward
(“I do not like ice cream”). In scoring the test, pairs of items of this kind
are examined and the degree to which the person is consistent is noted.
If the consistency score is too low, this would suggest that the
participant was not giving the task at hand due consideration.
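
A simple consistency index of this kind can be scored automatically. The Python sketch below shows one possible way of doing so; the item names, the 1–5 agreement scale and the tolerance of one scale point are hypothetical choices for illustration, not the scoring rules of any particular test.

def consistency_index(responses: dict, item_pairs: list) -> float:
    """Proportion of repeated or reverse-phrased item pairs answered consistently.

    Items are assumed to be scored on a 1-5 agreement scale, so a reverse-phrased
    partner is consistent when the two responses sum to roughly 6.
    """
    consistent = 0
    for item, partner, partner_is_reversed in item_pairs:
        if partner_is_reversed:
            consistent += abs(responses[item] + responses[partner] - 6) <= 1
        else:
            consistent += abs(responses[item] - responses[partner]) <= 1
    return consistent / len(item_pairs)

# Hypothetical pairs: (item, partner item, whether the partner is reverse-phrased)
pairs = [
    ("likes_ice_cream", "likes_ice_cream_repeat", False),
    ("likes_ice_cream", "dislikes_ice_cream", True),
]
responses = {"likes_ice_cream": 5, "likes_ice_cream_repeat": 4, "dislikes_ice_cream": 2}
print(f"Consistency index: {consistency_index(responses, pairs):.2f}")

A very low index would flag the record for closer scrutiny rather than automatically invalidating it.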

13.4.2 Impossible responses


The impossible response monitor is designed to identify unlikely
responses that are very seldom encountered in real-life situations. For
example, the statement “Since my accident I have lost all taste in my
mouth” is extremely unlikely to be true and is a good item for detecting
malingering in the case of motor accident claims.

13.4.3 Social desirability


The third monitor assesses the extent to which people disagree with
statements that are seen as socially unacceptable, although in reality
most people would have committed such acts. These items are assumed
to detect distortions that induce unrealistic positive self-presentations.
Examples of such social desirability (SD) items are “I have never spoken ill of my friend behind their back” and “If I am given more
change than I should have received at a shop, I always point it out to the
shop assistant”. As argued by Espinosa, Procidano and He (2012), social
desirability occurs through two distinct routes: self-deception and
impression management. Self-deception is defined as an unintentional
tendency to give a favourable impression, whereas impression
management is the deliberate attempt to distort the self-image and to
present oneself in a favourable light to others. With self-deception, a
person overestimates his qualities, while with impression management,
the person fears social disapproval and seeks to present himself in a
socially acceptable way. In the latter case, the attributes that are
perceived as socially desirable are defined by each culture in relation to
their values and standards.

In general it seems that social desirability scales (SDSs) have been widely used with different theoretical measures in order to test the
validity of a measurement. However, it has been suggested that SD is
linked with other more general features of personality such as ego
strength and agreeableness. Insufficient attention has been paid to the
nature and extent of SD in cross-cultural settings. For example, Latin
Americans and Asians tend to achieve higher SD scores. This may be
because SD has been defined from the Anglo-Saxon viewpoint (US and
Canada), and it has been shown that Latin Americans differ from
Europeans and American in other domains (personality, values, etc.)
according to Espinoza et al. They argue that when there is no apparent
social demand situation, SD is a reflection of a personality trait related
to a need for social approval rather than an honesty/integrity dimension.

In their study, Espinosa et al. (2012) compared groups from the US,
Mexico and China on a measure of SD. The results indicating the
proportion of their sample giving specific answers are shown in Table
13.1. Note that items marked with an asterisk (*) are negatively phrased
and are expected to be disagreed with, whereas those without an asterisk
are expected to be agreed with.

Table 13.1 Social desirability scores across three cultural groups

Item                                                                                      Mexico   US    China
1. I easily forgive those who trespass against me                                          0,75   0,75   0,71
2. If I receive more change than is due to me at a store, I would not say anything*        0,54   0,51   0,44
3. I would let myself be bribed if the benefit I received was great*                       0,67   0,60   0,75
4. I would steal something if I was sure that no one would catch me doing this*            0,47   0,75   0,76
5. I would “bend” the truth if it was too painful*                                         0,60   0,41   0,16
6. I am friendly with all people, regardless of their nature                               0,36   0,53   0,68
7. I easily forget when people offend or annoy me*                                         0,74   0,85   0,79
8. I have spoken ill of my friends without them knowing*                                   0,38   0,26   0,36
9. I always try to reconcile with my enemies                                               0,41   0,56   0,57
10. I tell lies if I know that I will not be discovered*                                   0,68   0,66   0,60
11. In any situation, I am willing to help people                                          0,52   0,46   0,61
12. My most comfortable way out of difficult situations is bribing when this is needed*    0,47   0,64   0,55
13. I tend to forgive others, even though they may have hurt me badly                      0,79   0,69   0,62
14. I have avoided returning something that is not mine by pretending to have forgotten*   0,59   0,46   0,61

13.5 Summary

Integrity can be seen in a narrow sense as the person’s adherence to various moral and ethical standards (termed Integrity 1 and reflected in the person’s honesty and its opposites, dishonesty and corruption). It can also be seen more broadly as the psychological “wholeness”, “intactness” and “integration” of the person. This is termed Integrity 2, and is reflected in being psychologically secure and free from pathological components such as drug dependency. The opposite of this is seen in actions such as being easily provoked to anger, high levels of aggression, a disregard for the laws of the land and the feelings and rights of others, and so on. Sometimes a breakdown of Integrity 2 can lead to breakdowns of Integrity 1, where people steal and murder to feed their drug addictions, for example. It could be argued that beliefs in Satanism could reflect a lack of Integrity 2 – when they lead to crimes such as murder and rape, this is a violation of Integrity 1 principles.

Integrity in both its forms and its opposites reflects the degree to which
people follow the precepts laid down by their family, community and/or
religion/moral guides. These may differ across different groups and
cultures, and what may be acceptable in one setting may be taboo in
other settings. The reasons for the breakdown of integrity, whether this
is seen as dishonesty, counter-productive work behaviours (CWBs) or
corruption can be accounted for at both individual and sociological
levels. Individual accounts look at aspects such as personality, moral
development, socialisation practices and even a sense of entitlement.
Sociological theories look at cultural explanations in terms of Hofstede’s
dimensions of culture, the differences between guilt and shame cultures
and “caring and sharing” cultures, and principle vs relationship cultures.
Although the concept of ubuntu has numerous advantages, the downside
of it is that it may predispose people to focus on relationships rather than
principle – and at the end of the day, integrity is the adherence to
principles even when this is inconvenient and against the short-term
interest of the person and his community. At the same time, it appears
that there has been a general “dumbing down” of integrity in many
places in the world – pop idols and sports heroes are looked up to as
they violate society’s norms in line with the anti-establishment bravado
of the “brave new bling world” that is emerging.

Different approaches to the assessment of integrity and various technical aspects associated with this were discussed.

Additional reading

Alliger & Dwight’s (2000) investigation into whether integrity tests can be faked or
modified through coaching is an interesting look at the assessment of integrity using
tests, while Berry, Sackett & Wiemann (2007) explore various developments in integrity
testing that have occurred in the new millennium.
Fine (2010) takes a close look at the origins of corruption, linking it to various
personality and sociocultural factors.
Whiteley’s (2012) paper on whether Britons are becoming more dishonest looks at the
effects on British people of the general approval of low-integrity behaviour in which
programmes such as Footballers’ Wives and the “bling world” they live in are seen as
something to aspire to.

Test your understanding

Short paragraphs

1. Define integrity, and contrast integrity-as-honesty (Integrity 1) and integrity-as-wholeness (Integrity 2).
2. Briefly outline the five approaches to assessing integrity and honesty in the workplace.

Essays

1. Discuss integrity and the methods used in its assessment. Outline some of the major
issues associated with this in a multicultural context and comment on the fairness of
the different ways of assessing it.
2. Describe the psychometric properties of the various methods of assessing integrity.
SECTION 4

Assessment in the organisational context
In this, the fourth part of the book, we examine how the various aspects of
assessment are applied in the workplace. In Chapter 14, we consider the different
reasons for doing assessments in organisations, and distinguish between
assessing at the individual level, the group level, the organisational level and even
at the level of external stakeholders.
We discuss career counselling and guidance in Chapter 15, and look at the role of
assessment in this process.
Chapter 16 is concerned with interviewing, which we examine in the light of strict
psychometric criteria and find wanting: interviewing is not a very effective process,
despite claims to the contrary by many practitioners. The text gives ways of
improving the reliability, validity and fairness of the interviewing process.
This section of the book closes by examining the Rolls-Royce of assessment,
namely assessment centres. Chapter 17 defines what an assessment centre is
and shows how one sets about constructing and running an effective assessment
centre that is reliable and valid, and fair to all participants.
14 Assessment in organisations

OBJECTIVES

By the end of this chapter, you should be able to

discuss the role of assessment in industrial and/or organisational settings


identify four target groups assessed in the organisational setting
explain the importance of a proper job description
show the importance of psychological factors underlying job performance
define stratified systems theory
explain what is meant by the matrix of work
outline the uses of stratified systems theory
describe the cost of poor selection as opposed to selecting properly
describe how you would organise a selection process
decide when to use psychological assessments and when not to.

14.1 Introduction – why do we assess in industry?

As we argued in Chapter 1, assessment is the attaching of a value to some attribute, characteristic or process in order to evaluate (pass
judgement on) it. In this chapter we consider why this is important in the
organisational context, highlighting the different uses of assessment.

Organisational psychology is part of the process of managing the performance of individuals, groups, processes and systems in order to
achieve organisational objectives efficiently and effectively for the
benefit of all stakeholders. These include the organisation itself, the
individual employees, the various teams and groups within the
organisation, and external stakeholders such as clients and the
community in which the organisation is located. All of these are
assessed for different purposes in the overall management process.

The most important reasons for obtaining this information include the
following:

1. To describe the current situation


2. To map changes and determine whether a situation has improved,
remained the same or deteriorated
3. To determine the impact of various processes and interventions
4. To establish areas in which improvement is required
5. To optimise the match between the person, object or process and the
overall system, and to improve this by a process of selection and/or
development
6. To determine potential in order to manage future performance

If we look closely at these six aspects, we see that they boil down to
three basic functions: describe (assess the current situation), decide (use
the information to make certain choices and implement actions), and
develop (use the information to try to change the situation for the better).

If we combine the four different stakeholders (individual, group, organisation and external) with the six reasons for assessment listed
above, we arrive at Table 14.1.

Table 14.1 Reasons for assessment of the various stakeholder groups

Describe 1 (management and control)
  Individual: current situation/performance levels; job satisfaction; problem areas
  Group: performance management; industrial relations (IR) climate; team performance; team roles
  Organisation: finances; safety; productivity; quality; resources; organisational culture/climate
  External1: image of the organisation; programme management; PESTEL2; culture; climate

Describe 2 (projections)
  Individual: future needs
  Group: leadership needs; team composition
  Organisation: new organisational structures; growth opportunities
  External: environmental analysis; SWOT3

Decide (predictions)
  Individual: future performance; leadership potential; integrity
  Group: selection/promotion; acceptability of new systems, processes and conditions
  Organisation: selection/promotion; leadership; integrity
  External: market research; acceptability of new products/services

Develop (impact)
  Individual: map changes; improved motivation; coaching
  Group: training evaluation; team building; impact on the group of changes in boss, systems or conditions
  Organisation: introduction of new systems or technologies; organisational development
  External: impact analysis of repositioning

1 Suppliers, customers, the general public as well as wider PESTEL changes
2 PESTEL = political, economic, social, technological, environmental and legislative
3 SWOT = strengths, weaknesses, opportunities and threats

Although this matrix is not exhaustive and other types of assessment are
available, Table 14.1 gives some indication of the range of situations in
and for which assessment is useful. We will examine each of these in
turn, although there will be some overlap between categories. However,
our main concern is with the primary beneficiaries of the assessment.
For example, selection is the process of matching an individual to a job.
Although this does benefit the individual (he is offered the job), the
primary beneficiary of the process is the organisation, because it has
someone to carry out its objectives. On the other hand, assessing a
person so that he understands himself better and can make more
informed decisions about his career or job-related strengths and
weaknesses is of prime interest and concern to the person involved.

14.2 Assessment at the individual level – selection

Perhaps the most important area of assessment within the human resources arena is selection: this is the process of finding the most
resources arena is selection: this is the process of finding the most
suitable people to meet the current and future manpower needs of the
organisation. Although they are not strictly identical, for the purposes of
this discussion the term “selection” will cover promotion, transfer and
placement, because these all imply placing people in specific jobs within
the organisation.

14.2.1 Definition
Selection is the process of matching people to job requirements in order
to meet organisational objectives, both current and in the longer term.
As part of this process, a wide range of attributes is assessed. These
include knowledge, ability or aptitude, personality, potential, leadership
potential, learning style, management style and communication style, to
name but a few. Chapter 12 illustrates how all these KSAVs
(knowledge, skills, attitudes and values) form the competencies required
for a job. They are all important because they determine how well the
person will do in his job, whether he will benefit from training and
development, and how well he will fit into the organisation. (See, for
example, Furnham, 2003.)

14.2.2 The selection process


When assessing people for selection, the following five-stage process
should be implemented:

1. Describe the tasks that need to be completed by the person by compiling a job description or post profile.
2. Describe the characteristics or competencies required by the person
in the job for satisfactory performance in the job. This is termed the
person specification or the person profile.
3. Translate these attributes and competencies into selection criteria.
For example, if the person profile requires above-average
intelligence, this must be translated into a specific requirement, such
as the appointee needing a stanine 7 or above on the Raven’s
Standard Progressive Matrices. Each aspect of the person profile
needs to be translated into an observable score or value so that the
person’s performance, potential or competence in each area can be
exactly specified. The resulting set of selection criteria forms the
basis of the selection battery. As stated previously, these
assessments need not be based on testing – every form of
assessment discussed in this book can be used.
4. Assess the individual on the various techniques devised. At lower
levels of the organisation this may take the form of a one-day
assessment session; at more senior levels, various forms of
assessment centre may be more appropriate.
5. Score the assessments and decide on the best candidate(s) for the
job. (This assumes that various decisions have been made regarding
cut-off scores*, combining scores from various assessments,
ensuring fairness and proper treatment of minorities and the
disabled, and so forth.) These people are shortlisted and interviewed
before being offered the job.

A sixth stage that is often ignored in the selection process is follow-up.


This is both short and long term. Short-term follow-up includes the
settling-in phase and the induction process, while longer-term follow-up
includes aspects such as the development, retention and promotion of
the person. This is part of the validation of the selection process.
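
Steps 3 and 5 of the process above – turning the person profile into measurable cut-offs and then combining the candidates’ scores – can be expressed as a small piece of logic. The Python sketch below is a toy illustration only: the attributes, cut-offs and weights are invented, and in practice the different scales would first be converted to a common metric (for example z-scores or stanines) and the combination rules agreed on beforehand.

# Hypothetical selection criteria derived from a person profile (step 3):
# each attribute is mapped to a measurable score, a minimum cut-off and a weight.
CRITERIA = {
    # attribute:            (cut-off, weight)
    "reasoning_stanine":     (7, 0.4),
    "integrity_percentile":  (60, 0.3),
    "simulation_rating":     (3, 0.3),   # 1-5 assessment-centre rating
}

def screen(candidate: dict) -> tuple:
    """Return (meets all cut-offs, weighted composite score) for one candidate (step 5)."""
    meets_all = all(candidate[attr] >= cutoff for attr, (cutoff, _) in CRITERIA.items())
    composite = sum(candidate[attr] * weight for attr, (_, weight) in CRITERIA.items())
    return meets_all, composite

candidates = {
    "A": {"reasoning_stanine": 8, "integrity_percentile": 75, "simulation_rating": 4},
    "B": {"reasoning_stanine": 6, "integrity_percentile": 90, "simulation_rating": 5},
}

for name, scores in candidates.items():
    passed, composite = screen(scores)
    print(f"Candidate {name}: meets cut-offs = {passed}, composite = {composite:.1f}")
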

14.2.3 Job descriptions (position profiling/competence mapping)
As we know, the first step in selection is compiling a job description and
a person profile in which the competencies and abilities for the job are
clearly specified. This is a fairly complex process, and a very useful
approach to use is the Position Analysis Questionnaire (PAQ)
(McCormick, Jeanneret & Meacham, 1972). This questionnaire has an
extensive checklist of 195 job elements to indicate which tasks and/or
behaviours are required by a particular job. The items fall into five
categories:

1. Information input (where and how the workers get information)


2. Mental processes (reasoning and other processes that workers use)
3. Work output (physical activities and tools used on the job)
4. Relationships with other persons
5. Job context (the physical and social contexts of work)

When this is scored, it yields, among other things, information about the
psychological characteristics associated with the job (in terms of a
modified list of Thurstone’s primary mental abilities or PMAs – see
section 10.3.2). It also provides a list of tests that can be used to assess
these abilities and the score ranges on these tests that have been shown
to have high predictive validity for a specific job. Although these tests
are American, it is relatively easy to find local equivalents and derive
appropriate cut-off scores for them.

Saville & Holdsworth Ltd (SHL) have a similar system known as the
Work Profiling System (WPS). It is designed to help employers
accomplish many of the human resources functions outlined above. The
job analysis component yields reports aimed at various human resources
functions such as individual development planning, employee selection
and job description. There are three versions of the WPS linked to
different types of occupation: managerial, service and technical
occupations. The WPS is administered by computer at the company’s
premises. It contains a structured questionnaire which measures ability
and personality attributes in areas such as hearing, sight, taste, smell,
touch, body coordination, verbal skills, number skills, complex
management skills, personality and team role. Unlike the PAQ, which is
scored at Purdue University in the US, SHL does not require WPS
users to submit their data – the WPS is scored on site via the Internet.

14.2.3.1 Sources of information


Although we have mentioned testing as a source of information, it is not
the only one. Other ways of obtaining job-relevant information include
the following:

Biographics (age, education, experience, etc.)


Track record (past achievements, promotion rates, salary levels, etc.)
Interviews
Work samples (observing the person at work for several hours or
more)
Simulations (in-baskets, role plays, etc. See Chapter 17 on assessment
centres)
360-degree assessment*

14.2.4 Implementing a selection process


If we consider the process of selection, we realise that there are three
different aspects: job content (what the job entails), selection criteria
(what is being looked for) and assessment outcomes (what is found).
Each of these can be a source of error. Their interdependent
relationships are shown in Figure 14.1.

Figure 14.1 The relation between job content, selection criteria and assessment outcomes

From Figure 14.1 we see that the overall validity of the assessment
technique (G) depends on the degree of overlap between the three
circles, A, B and C. The more accurate the job description and selection
criteria are (i.e. the overlap between circles A and B), the greater the
predictive validity (G). Similarly, the greater the construct validity of the
battery (i.e. the overlap between circles B and C) is, the greater will be
the size of G. Finally, the more accurately the outcomes of the selection
technique reflect the job requirements (i.e. the degree of overlap
between circles A and C), the greater will be the predictive validity of
the selection process.

From this we see that in order to increase the overall validity of the selection
process (G), we must find ways of increasing the amount of overlap between the
three circles A, B and C.

All organisations have to distinguish between people for good reasons. Choosing between people on valid grounds is not only acceptable, but
Choosing between people on valid grounds is not only acceptable, but
also often imperative. For example, all of us would certainly want the
pilot of the aircraft in which we are flying to have good eyesight and
hand-eye coordination, and not to panic under stress or to suffer from
epilepsy.

In organisations, assessments are most often used for selection purposes and are designed to measure differences among people, because for most
and are designed to measure differences among people, because for most
purposes differences rather than similarities are important. In a
multicultural country like ours, we need to take differences into account
in order to be fair. This raises the issue of what is meant by fairness and
fairness to whom. As industrial psychologists, we also need to manage
the differences that exist to the benefit of all stakeholders. If we base our
choices on valid job-related criteria, we discriminate between people.
However, when we take irrelevant criteria such as race, gender and age
into account in making our choices, we discriminate against the
excluded people. The various laws in South Africa allow us to
discriminate between, but not against people. (The whole issue of
fairness and discrimination is dealt with in depth in Chapter 9.)

14.2.5 Benefits of proper selection


Selection, the identification of potential, and the ability to meet future
growth needs as well as replace staff who leave, are crucial aspects of
human resources management. As noted above, selection is probably the
most important reason for assessing staff. It is vitally important for
organisations to have the right number of competent people in the
organisation at the time they are needed. This is important for both
current functioning and long-term planning. Factors such as motivation,
interest, values, the ability to get on with others and conscientiousness
all affect job performance. By assessing these during selection, people
can be placed in positions where they are able to make the most of their
abilities. A proper selection policy and procedure ensures that

competent people join the organisation


future needs are addressed
the fit between the person and the organisation is increased (when
people are incorrectly placed, they become stressed, frustrated or
bored, and are therefore set up for failure)
stress and conflict are reduced
job satisfaction is enhanced
labour turnover is reduced.

14.2.6 The cost of not selecting properly


In general terms, the cost of not selecting the right candidate for any job
is high. According to a global survey of 700 managers in seven countries
reported in HR Magazine (August, 2004), companies in the US waste
$105 billion a year (equal to 1,05 per cent of the gross domestic product)
because of poor hiring and management practices. In the survey, the
average cost of managing poor performers working in each of the seven
countries amounted to $153,5 billion. The hidden costs include the cost
of failure owing to poor selection of new employees and the cost of
managing these poor performers. The researchers found that nearly one
in four employees (23%) in the US believe their colleagues are
incompetent. In addition to this, they found that almost 70 per cent of
mistakes made by US employees are never reported.
Research (Furnham, 1992) has shown that the difference in productivity
between good and poor workers is about two to one – that is, good
workers produce roughly twice as much as poor workers. As the work
becomes more complex, the productivity ratio becomes even higher, so
that a good physicist produces much more than twice the output of a
poor one. Similarly, in the selling profession, good salespeople are
estimated to be more than twice as productive as poor ones (Schwartz,
1983).

In addition to these direct costs, poor selection can result in quality and
safety being compromised, increased risks of injury, underspending of
allocated budgets, and under-delivery of vital services, to name but a
few. George and Reiber (2005) argue that the cost of replacing a single
employee can vary between one-and-a-half and three times his annual
salary. Harvard Business School puts this at three to five times the
annual salary. These costs are made up of factors such as the following:

Underperformance during the final stages of the person’s employment


Exiting costs (administration, pension payout, leave payout, etc.)
Recruitment and selection costs (advertising, assessing, interviewing,
head-hunting)
Underperformance during the warm-up and settling-in phase
Lost opportunities
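
Taking the multipliers quoted above, the scale of these costs is easy to illustrate with a little arithmetic. The following Python sketch uses an invented salary figure purely for illustration; only the multipliers come from the sources cited above.

```python
# Illustrative only: the salary figure is invented for the example;
# the multipliers are the ranges quoted by George and Reiber (1.5-3x)
# and Harvard Business School (3-5x).
annual_salary = 600_000                   # hypothetical annual salary (rand)

low_estimate = 1.5 * annual_salary        # lower bound of George and Reiber's range
high_estimate = 5.0 * annual_salary       # upper bound of the Harvard estimate

print(f"Replacing this employee could cost between R{low_estimate:,.0f} "
      f"and R{high_estimate:,.0f}.")      # between R900,000 and R3,000,000
```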

Although it is not essentially an assessment issue, we must not lose sight
of a crucial employment step, namely recruitment. This matters for
assessment because a weak candidate pool adversely affects the selection
ratio, which in turn lowers the quality of the applicants who are
eventually appointed – and this has a major impact on organisational
outcomes. Morgan (2007) discusses this issue, and the advantages and
disadvantages associated with various recruitment strategies are
summarised in Table 14.2.

Table 14.2 Advantages and risks of different recruitment methods


Internet advertising – cost: low; speed: fast; coverage of talent pool: narrow.
Disadvantages: attracts the largest number of applicants, but they may have the wrong skill sets; does not attract top talent, i.e. those already in jobs; involves a lot of administration – screening, selecting and interviewing.

Print advertising – cost: medium; speed: slow; coverage of talent pool: narrow.
Disadvantages: attracts a large number of applicants with the wrong skill sets; does not attract top talent, i.e. those already in jobs and not actively looking; involves a lot of administration – screening, selecting and interviewing.

Executive search – cost: high; speed: slow; coverage of talent pool: high.
Disadvantages: if the search agency has limited resources to identify and influence targets, it is costly to the client in money and time.

Internal referral – cost: low; speed: fast; coverage of talent pool: narrow.
Disadvantages: you cannot be sure of getting the best person, because the total pool of candidates is not tested.

Database/talent pool search – cost: low; speed: fast; coverage of talent pool: varies.
Disadvantages: the database must be used in a dynamic way and kept up to date.

Holistic approach (all of the above) – cost: high; speed: mixed; coverage of talent pool: largest coverage (>80%).
Disadvantages: if the search agency has limited resources to identify and influence targets, it is costly to the client in money and time.

Source: Morgan (2007, p. 4)

In South Africa, we have seen the costs at provincial and municipal


levels of the placement of unqualified and underqualified people in
positions of power. These costs are evidenced in unacceptable levels of
corruption, underspending of allocated funds, poor service delivery, and
the like. In addition, there are unacceptably high levels of labour
turnover, especially at senior management and professional levels.
Furnham (2003) dedicates a whole book of more than 260 pages to the
causes, consequences and cures of bad selection and management
processes that lead to the appointment of incompetent managers.

14.2.7 Staff development


Another area where assessment can be important in organisations is the
identification of the training and development needs of individual staff
members. Although it is important for the individual to know what his
strengths and weaknesses are, it is also important for the organisation to
know this as it can then take steps to address these weaknesses in its
own best interests for manpower planning, future promotions, and so
forth.

14.2.8 Promotion and transfer


An important aspect of assessment at the individual level relates to
promotion and/or transfer to another department. (Promotion can be
seen as transfer to a higher-level job, whereas transfer refers to
movement to another department or function at the same level.
Demotion is merely a transfer to a lower level of responsibility.) As a
general rule, formal once-off assessment (using tests and other
techniques such as assessment centres) should be used only when there
is insufficient information about the person to make a sound decision.
The longer a person has been in an organisation, the less important this
once-off assessment becomes, and the more important the person’s track
record, the results of his performance appraisals, and so on, become.
Formal once-off assessment should be considered only where there is a
clear break in career route (say from administration to sales). This
process is shown in Figure 14.2.

Figure 14.2 Testing versus track record in promotions and transfers
14.2.9 Performance management
Another area where assessment is important is performance
management. This is the process by which a manager and his
subordinates reach agreement about the key performance areas (KPAs)
that need to be achieved. In essence, these are specific targets that need
to be reached during the review process. A good KPA is characterised
by the following properties:

It should be framed in terms of observable outcomes (sell x number of


widgets per month, decrease wastage by y per cent per month, etc.).
It should be time bound (increase sales by the end of the third
quarter).
It should be realistic and achievable.

Clearly, management needs to assess the extent to which these targets


have been achieved. Although this form of assessment is not strictly
psychological in nature, many of the problems of reliability, validity and
fairness apply and hence should be taken seriously. In many cases, the
results of this type of assessment can have repercussions for the
employee in terms of bonuses, promotions and even disciplinary action
if the targets are not met. The assessment therefore needs to meet
minimum psychometric criteria.
14.3 Career path appreciation* (stratified systems
theory)

One final piece of theory that we need to look at is career path


appreciation (CPA). CPA (and its online version MCPA or modified
CPA) is an assessment of potential developed by Gillian Stamp and the
Brunel Institute of Organisation and Social Studies (BIOSS), based on
Elliott Jaques’ theory of stratified systems. Jaques developed his
stratified systems theory (SST) over almost 50 years between 1950 and
1997, arguing that all work involves using discretion and making
decisions, and that individuals differ in their capacity to manage
complexity in the workplace. He refers to this ability as “work capacity”,
which is basically the level of abstraction at which any individual is able
to function. (See, for example, Jaques, 1976, 1982 and 1997 as well as
Jaques, Gibson & Isaac, 1978.) An important aspect of this capacity is
the time frames in which the implications of these decisions are realised.
In today’s terms, Jaques’ theories would be framed in terms such as
“discretion”, “making judgement calls” and “time horizons”.

In the mid-1970s, Gillian Stamp, a colleague of Jaques, developed a


means for measuring this work capacity. She and her colleagues
identified seven “levels of work”, ranging from direct hands-on work,
through organisational and managerial levels to the longer-term strategic
levels. She termed this the “matrix of work relations” (MWR, or
sometimes MoW). (See, for example, Stamp & Stamp, 1993.)

An important aspect of CPA theory is the notion long accepted by major


organisations such as Shell Oil that in the normal course of events a
person’s career development follows certain clearly defined trajectories
or growth curves. In other words, if a person has not achieved a certain
level in the organisation by a certain age, then the chances of his
achieving very high levels are limited.

CPA is thus based on four components, namely the person’s ability to


deal with complexity, the time frames in which the impact of decisions
is appreciated, the levels of complexity required by particular jobs and
the idea of trajectories or growth curves. These are explored in more
detail below.

14.3.1 Stratified systems theory


In terms of SST, work involves the exercise of discretion and judgement
in decision making when carrying out tasks. It is driven by values and
skilled knowledge. According to SST, all organisations have some form
of a managerial hierarchy, and it is essential that the relationship
between this hierarchy and complexity of work be understood to ensure
effective use of talent and energy. The SST identifies seven levels of
work (see Table 14.3) differentiated on the basis of complexity and
timespan of decision making. The longer the timespan, the higher the
level of work and the greater the responsibility associated with that role.
In addition, the CPA identifies three categories of human working
capability as follows:

1. Current potential capability. This gives an indication of the


maximum level of work that an individual can do at any given point
in time, if he values what he is doing.
2. Current applied capability. This refers to the level of capability
that the individual is currently applying in his work.
3. Future potential capability. This refers to the predicted level of
potential capability that an individual will be able to handle at a
specific point in the future.

14.3.2 Matrix of work relations (MWR)


The MWR explores the relationship between an individual at work, the
organisation and the environment within which the organisation needs to
function. It defines the levels of work as referred to in CPA, together
with the capability required to cope with work at the various levels. As
an individual’s responsibilities increase, so does the complexity of the
decision making and the need for discretion in arriving at job-related
decisions.
The MWR identifies various “themes of work”, each of which requires a
higher level of complexity than the one before, and so employees need a
higher level of individual capability to be able to manage the increasing
levels of uncertainty and ambiguity associated with the longer lead
times. The lower levels contribute to the more concrete outputs
concerned with the operational functioning of the organisation, while the
higher levels contribute to the strategic future positioning of the
organisation and ensure its future viability. Each level makes a unique
contribution, and missing levels of work have a negative impact on the
organisation. It is very important for there to be a match or alignment
between the decision-making requirements of the job and the job
incumbent’s level of complexity. This is termed “flow” and is discussed
in section 14.3.3.

Table 14.3 shows the various levels of work, together with the level of
capability required to effectively manage tasks at a particular level.

There is also a shorter version of CPA known as the Initial Recruitment


Interview Schedule or IRIS which has been designed for unskilled to
junior levels of management (although it can also be utilised at higher
levels). The IRIS assesses people to the level of “practice” and covers a
timespan of up to 15 years.

Table 14.3 Levels of work

Level VII – Time span: 20 years. Work complexity: construct complex systems; construct versus predict future. Cognitive mechanism: linear extrapolation; develop new theories. Positions: board chairman, corporate CEO.

Level VI – Time span: 10 years. Work complexity: oversee complex systems; group of business units; plan long-term strategy. Cognitive mechanism: reflective articulation between systems; higher conceptual approaches. Positions: COO, executive VP (group), executive VP.

Level V – Time span: 5 years. Work complexity: command one complex system; connections to environment. Cognitive mechanism: shape and reshape whole systems and boundaries; utilise theory. Positions: president, VP, top specialist.

Level IV – Time span: 2 years. Work complexity: oversee operating subsystems; design new methods and policies. Cognitive mechanism: develop alternative systems; abstract from data; parallel processing. Positions: general manager, division manager, chief specialist.

Level III – Time span: 1 year. Work complexity: direct one operating subsystem; predict needs 12–18 months out. Cognitive mechanism: linear extrapolation; alternate pathways. Positions: unit manager, department manager, director.

Level II – Time span: 3 months. Work complexity: direct an aggregate of tasks; diagnose problems. Cognitive mechanism: reflective articulation; formulate new ideas; handle ambiguity. Positions: first-line manager, supervisor.

Level I – Time span: 1 day. Work complexity: carry out one task at a time; daily, weekly and monthly quotas. Cognitive mechanism: concrete shaping; concrete thinking; linear pathways. Positions: operators and clerks, day workers.

Source: Jaques (1998)

14.3.3 The concept of flow


When the scale of challenges presented by the person’s work matches
his ability to deal with complexity (i.e. his capability), the person is said
to be “in flow”. When the person’s capabilities are not aligned with the
job requirements, he is likely either to feel overwhelmed and anxious
(when the demands exceed his capability) or to become frustrated and
bored (when they fall below it). In situations of this kind, the person is
likely to withdraw from the job unless his job responsibilities are
renegotiated or redrawn. This is shown in Figure 14.3.

Figure 14.3 The concept of being “in flow”


Source: Stamp & Stamp (2004)

14.3.4 The uses of CPA


CPA may be used by an organisation to do an initial assessment of
capability for individual and organisational development, or by an
individual at a point of uncertainty about his working life. In this role, it
provides a valuable adjunct to individual and/or managerial judgement
about performance and potential for carrying greater responsibility. In
this way the CPA contributes progressively to the articulation of issues
surrounding the optimal structuring of work, such as the delegation of
tasks, patterns of accountability, training, succession planning and
statements of corporate philosophy.

The following is a broad overview of the various uses of CPA:

Executive recruitment, mentoring and development


Early selection of future executives
Minimising the risk of turnover of high-calibre people by placing
them at a level equal to their capability to ensure current as well as
future challenges
Educating employees and managers in a framework that empowers
them with accountability, resulting in improved productivity and
organisational effectiveness through the maximisation of capabilities
and creativity
Developing platforms for mentoring and broader-based succession
planning in affirmative action strategies

14.3.5 Trajectories
A key aspect of the theory is that the career paths of people are generally
locked on to particular trajectories (or “modes” – sometimes called
“growth curves”) that are determined largely by the complexity of their
cognitive processes. These trajectories are empirically determined on the
basis of experience with a large number of organisations. Typically,
organisations such as the oil company Shell evaluate their managers
every six months or so in terms of where they are likely to end
up in the organisation. For example, if a manager is not at the level of
general manager (level 4) by age 35, there is very little chance that he
will ever become managing director of the organisation. This is shown
in Figure 14.4.

Figure 14.4 Career modes or trajectories (growth curves)

Source: Based on Stamp & Stamp (1993)


A criticism of this theory is that it is based, in part, on historical data. In
a society such as South Africa’s, where business and other opportunities
were denied to large parts of the population, these assumptions do not
necessarily hold true.

However, CPA does make allowance for this and provides an indication
of an individual’s capability to generate, understand and act in contexts
where prior knowledge and experience may no longer be applicable.
CPA provides an understanding of the nature of freedom the person
requires to act appropriately, as well as the value and type of work
contribution he is likely to make at various levels in the organisation.
The process is therefore able to identify potential for advancement and
suggest the best fit between capability and the demands of the
organisation. It creates mutual benefit for the individual and the
organisation as it predicts the person’s capability to generate
contextually appropriate solutions and decisions even in the absence of
previously acquired knowledge, skills and experience.

14.3.6 CPA procedure


The CPA process takes the form of a one-on-one interview that allows a
trained practitioner to obtain a picture of a person’s current and likely
future capability to make effective decisions. The procedure consists of
three parts:

1. Nine sets of phrase cards which enable the practitioner to gain


information about the person’s current level of capability in relation
to his current expected level of work, as well as the likely rate of
growth of his capability
2. A symbol card task, the purpose of which is to observe the process
of defining the task, generating alternative courses of action,
handling of uncertainty and reaching a sound solution. This allows
the practitioner to gain greater insight into the candidate’s current
level of capability and preferred approach to work
3. A career history interview in which candidates are required to reflect
upon their entire career, emphasising the times when they felt their
capabilities were well matched to the challenges that faced them,
and the times when they felt they were being given challenges that
they were not ready to handle

Using the above information, the practitioner is able to predict the


individual’s current and potential capabilities and to take steps to
address any perceived weaknesses.

Work by people such as Stamp and Retief (1996), and Mauer (2003a,
2003b, n.d.) has demonstrated the test-retest and inter-scorer reliability
as well as the construct validity and cultural fairness of the CPA. It is
also one of the few assessment methodologies to have been the subject
of a full independent validation at national research institute level.

14.4 When to use costly selection techniques

The decision whether to use psychological assessment techniques


(including tests) during the selection process is determined by a cost–
benefit or utility analysis. In other words, the benefits that the
organisation obtains from this process must justify the expense of the
assessment. The situations warranting the use of expensive selection
techniques and those that do not are listed in Table 14.4. Although it
may make financial sense to invest, say, R50 000 in the selection of a
top manager or executive, this amount is not justified in the selection of
a secretary or entry-level technician.

Table 14.4 When to use and not to use advanced selection techniques

Use: large number of applicants for few positions (low selection ratio). Do not use: small number of applicants (high selection ratio).
Use: when good performance is essential. Do not use: when good performance is not critical.
Use: when high skills levels are required. Do not use: when average skills levels are adequate.
Use: when poor performance affects the organisation’s costs, opportunities or image. Do not use: when mistakes can be tolerated.
Use: where training costs are high. Do not use: where training costs are low.
Use: for senior jobs with a high impact on costs and the performance of others. Do not use: for lower-level jobs with less impact on costs or the performance of others.
Use: when required competencies are rare. Do not use: when required competencies are readily available.
Use: when new technologies are introduced into the workplace. Do not use: when existing technology will continue.
Use: when stress on incumbents and colleagues must be minimised. Do not use: when stress on others is not an issue.
Use: when current selection techniques are not working. Do not use: when current selection techniques are successful.
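
The cost–benefit logic above can be made concrete with a rough break-even calculation. The Python sketch below is purely illustrative: the only inputs taken from the text are the idea that an in-depth executive assessment might cost around R50 000 and that good performers can be roughly twice as productive as poor ones; the salary and probability figures are invented assumptions.

```python
# Hypothetical break-even sketch for deciding whether a costly assessment is justified.
assessment_cost = 50_000       # in-depth assessment of one senior candidate (from the text)
annual_salary = 1_200_000      # invented salary for a senior post
productivity_gap = 0.5         # assume a good hire delivers about 50% more value than a poor one
value_of_good_hire = annual_salary * productivity_gap   # crude proxy for one year's extra value

# Invented assumption: the assessment turns a poor hire into a good one in 20% of cases.
p_better_decision = 0.2

expected_benefit = p_better_decision * value_of_good_hire
print(f"Expected benefit R{expected_benefit:,.0f} versus cost R{assessment_cost:,.0f}")
# Expected benefit R120,000 versus cost R50,000 - worthwhile for this senior post.
# With a junior salary and low impact, the same arithmetic would not justify the expense.
```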

14.5 Assessment at group level

Much of an organisation’s success depends on the way in which groups,


especially teams, work together. Accordingly, it is important to assess
such aspects as team functioning, as well as the industrial relations
climate and general mood of the organisation. In this section, we
examine how assessment at the group level can contribute to
organisational success.

14.5.1 Team work


Teams form a significant component of most modern organisations, and
it is important during the process of team formation to understand
individual strengths and weaknesses. Teams need to know what their
membership consists of so that different members can build on the
strengths and cover for the weaknesses of fellow team members. As a
result, organisations pay much attention to team composition and team
performance. In assessing team functioning, two clusters of factors have
been identified: those relating to team dynamics and those relating to
processes. Team dynamics refer to the way in which members of the
team interact with one another. Team processes, on the other hand, are
concerned with aspects such as leadership, communication and
management systems, and similar aspects of team performance. Ways of
assessing these team performance factors include the use of specially
designed questionnaires, one-on-one discussions with key group
members, focus groups and surveys.

Barrick et al. (1998) have shown that conscientiousness (one of the Big
Five dimensions) predicts task performance in a team setting, especially
where team members contribute independently to the outcome. They
found that where the team performs in such a way that interpersonal
conflict is possible, agreeableness is a better predictor of team success –
one disagreeable person is often all that it takes to disrupt team
performance. As Hough and Oswald (2000, p. 645) show, this
consideration illustrates the importance of selecting people with
acceptable levels of interpersonal skill. However, the level of
interpersonal skill required also depends to a large degree on the role the
person plays in the team, as shown in the next paragraph.

Belbin’s Team Role Inventory (Belbin, 1981) is a useful device both for
selecting team members and for team-building exercises. The scale
consists of eight different groups of seven items (56 items in total)
which determine which of nine different roles individuals prefer to play
in a team situation. These roles are as follows:

1. Implementer. The converter of concepts, strategies and ideas into


relevant plans for action
2. Coordinator. The charismatic steerer from non-productive strife
towards focusing resources
3. Shaper. A forceful person who has the task in mind and makes sure
everyone else carries it out
4. Innovator (or plant). The ideas person who finds new angles and
approaches to problems
5. Resource investigator. The Mr Fixit who has contacts and runs the
relevant and irrelevant networks
6. Monitor/evaluator. The standard setter who knows how it was and
how it should be done
7. Team worker. The person wanting to get on with the job without
the problems of control issues
8. Completer/finisher. The actual completer of jobs and the one
concerned with fine detail
9. Specialist. The person who has expert knowledge and/or skills in
key areas and solves many problems there – may not be interested in
any other areas

These roles fall into three types, shown in Table 14.5.

Table 14.5 Types of team role

Action-oriented roles: implementer, shaper and completer/finisher
People-oriented roles: coordinator, resource investigator and team worker
Cerebral roles: innovator/plant, monitor/evaluator and specialist

The scale is commercially available (see http://www.belbin.com/) and in


some textbooks. It is useful for assigning members to teams and for
understanding the different roles people play in them. It can also be used
to understand and manage conflicts that occur in team contexts.
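
Scoring a self-report inventory of this kind usually amounts to summing item responses per role and identifying the highest-scoring roles. The Python sketch below is generic and entirely hypothetical: the item-to-role key is invented for illustration and is not the actual Belbin key, which is proprietary.

```python
# Hypothetical scoring of a team-role inventory.
responses = {1: 4, 2: 2, 3: 5, 4: 1, 5: 3, 6: 4}   # item number -> rating given

role_key = {                      # invented key: which items load on which role
    "implementer": [1, 4],
    "coordinator": [2, 5],
    "innovator/plant": [3, 6],
}

scores = {role: sum(responses[item] for item in items) for role, items in role_key.items()}
preferred_role = max(scores, key=scores.get)
print(scores, "-> preferred role:", preferred_role)
# {'implementer': 5, 'coordinator': 5, 'innovator/plant': 9} -> preferred role: innovator/plant
```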

In addition to the Belbin scale, there are numerous management tools


used in the areas of leadership development and group dynamics, most
being based on the idea of assessment, self-understanding and
development. They include such instruments (somewhat dated, though
still useful) as Hersey and Blanchard’s Situational Leadership. Currently
fashionable instruments include measures like emotional intelligence
and the Myers-Briggs type indicator (MBTI). Other assessment tools
look at various aspects of team functioning, internal communication
systems, leader effectiveness, interpersonal dynamics and conflict
management.
14.5.2 Assessment of industrial relations climate
The industrial relations (IR) climate in any organisation is determined
by a range of factors including management practices, the presence and
use of various policies and structures, and the history of the way in
which management and workers have interacted in the past. To assess
the IR climate in an organisation requires an understanding of these
factors and then finding (or drawing up) various scales for measuring
these aspects. The measures may include aspects of job satisfaction,
quality of work life, organisational commitment, trust in management
and colleagues, and a whole range of similar factors. Until a proper
analysis is made of the possible factors contributing to a particular
organisation’s IR climate, little can be done to assess it. Measuring the
IR climate is no different from measuring any other kind of attitude, and
so the general techniques of drawing up a psychological measure
outlined in Chapter 3 (Conceptualising, Operationalising, Quantifying,
Pilot Testing, Interpretation and Evaluation) apply equally in this
setting.

14.5.3 Selection of people to work abroad


Many organisations need to appoint people to work abroad. According
to Aryee (1997), most companies select their expatriate workers on
technical competence alone, and thus run the risk of a high failure rate.
Shackleton and Newell (1997) show that in the US this failure rate is
between 15 and 40 per cent. It would seem that a proper assessment of
people to be appointed to work in countries with very different cultures
from their own is thus required.

14.6 Organisational aspects

Although assessment is generally seen as an individual or group process,


it is clear that the organisation as a whole needs to assess its
performance continually across a wide range of conditions. This is a
large topic and cannot be dealt with here, except to mention that the
aspects needing assessment include various financial indicators such as
return on investment, inventory levels and profits. In addition, management
requires information on more general indicators of organisational
effectiveness such as productivity, quality and safety. Interested readers
are referred to literature on organisational effectiveness for more details
on these aspects. We do, however, look briefly at certain aspects that fall
within the domain and job description of a human resources manager or
industrial psychologist.

14.6.1 Mapping changes


Organisational psychologists are often involved in change management
processes, from a relatively straightforward assessment of training
outcomes to larger-scale efforts to change organisational effectiveness,
workplace attitudes and the like. This latter process is known as
organisational development* (OD). A key aspect of any change
process is to monitor the outcomes of the process, both positive and
negative. The assessment of specific components of this process of
change is essential and these will depend on which aspects of
organisational effectiveness are being addressed by the OD process.

14.6.2 Training
Most organisations pay large sums of money for the training and
development of their employees. In South Africa, the Skills Development
Levies Act 9 of 1999 requires organisations to pay a levy of one per cent of
their wage bills for this purpose. As a result, many organisations make a
conscious effort to
evaluate the effectiveness of their training efforts. Although there are
numerous methods for doing this, the best known is Kirkpatrick’s four
levels of training evaluation (Kirkpatrick, 1996). These levels are:

1. Reaction. Did the trainees enjoy the course, was the material well
presented, etc.?
2. Learning. Did the trainees learn (and retain) significant new
competencies?
3. Behaviour change. Do the trainees do things differently as a result
of this learning experience?
4. Impact. Has the training had any measurable effect on quality,
productivity, safety or any other (specified) indices of organisation
effectiveness? In other words, has the training affected the bottom
line to any meaningful extent?

14.6.3 Forensic evaluation


One final area of assessment that may be relevant to organisational
psychologists is forensic psychology, which in its simplest form is
concerned with trying to establish the cause of poor performance,
especially the sudden deterioration of work attitudes, poor
concentration, increasing accident rates, negligence, and the like.
Although this is more properly the concern of a clinical psychologist,
the human resources practitioner should be alert to the possibility of
drug-related symptoms (including alcohol abuse), behaviour reflecting
high stress levels, and HIV/AIDS-related fall-off of cognitive and/or
physical ability. If any employee appears to display any such behaviour,
he should be referred to a competent employee assistance programme
(EAP) officer or to an external expert in the field. In the absence of such
external expertise, the human resources practitioner may have to make
decisions about transferring the employee to a different work
environment.

Properly trained and registered organisational or industrial psychologists


may also be called on to comment on the job prospects and optimal
placement of people following motor vehicle and other accidents,
especially where brain damage is suspected or where the use of limbs
or other functions has been impaired.

14.7 Assessing external stakeholders

The final targets of assessment in organisations are the external


stakeholders. These people include suppliers, service providers, clients
and/or customers, and the general public. This target group is important
because these people often hold the key to organisational success. If a
product or service does not meet the customers’ requirements of quality
and value, they will take their business elsewhere. Therefore some kind
of market research – which is a form of assessment – needs to be carried
out regularly. Similarly, if a new product is to be launched, pre-
development research needs to be conducted. If the communities in
which the business is (to be) located are opposed to the organisation or
the product (or even to the way it is produced), the organisation will be
faced with difficulties. Ways of assessing the attitudes and beliefs of
external stakeholders include one-on-one discussions with key group
members, focus groups and surveys. In many cases, specialised market
research organisations are contracted for this process.

14.8 Criterion measurement

Much of this chapter, and indeed much of the book, has assumed that we
are able to quantify both the predictors (test and other assessment
scores) and the criteria (job satisfaction, performance, productivity, and
so on). However, we do note in Chapter 5, section 5.4, that one of the
difficulties associated with validating assessment techniques is the so-
called criterion problem*. This issue is of great relevance to
assessment in organisations and needs to be dealt with at this point.

If we consider how we assess a person’s performance in the workplace,


we realise that the information is quite readily available and is used for
performance appraisal* purposes. Three basic kinds of information are
available. Firstly, there are production measures* (the number of
widgets the person makes or words the typist types per minute, and so
on). Secondly, there is track record information* that can be obtained
from the human resources department and includes such things as
promotion rate, absentee rate, tardiness (frequency of being late),
courses attended and pass rate, and so on. Thirdly, there are
judgemental measures* such as performance ratings by supervisors
and/or peers. Of these three, judgemental measures are the most widely
used. According to Murphy and Davidshofer (2005), this is so for
several reasons.

14.8.1 Production measures


Although production measures or data may seem objective, many jobs,
especially at managerial level, cannot be assessed in this way. It makes
no sense to count the number of memos written or telephone calls made
by a manager or a professional person – this is not what their jobs are
about. Even where countable products are produced, the numbers are
often outside the control of the employee. For example, the speed at
which people can assemble units like car parts or pack fruit is controlled
by the speed of the conveyor belt, and so quality, rather than number of
units, becomes a better indicator. Although quality may be assessed in
terms of numbers of rejects, it contains a strong judgemental element.

14.8.2 Track record


Track record information also has its share of problems. For example, if
we take the common index of absenteeism, we may ask what it really
means and how we should calculate it. Do we count the number of days
absent without distinguishing between a person who is away from work
for ten days with appendicitis and a person who is absent every
Monday? Do we make a distinction between voluntary absence (such as
taking a day off to go to a wedding) and involuntary absence (as a result
of illness or accidents)? In a society where cultural norms demand
attendance at funerals, is absence to attend a funeral voluntary or
involuntary? According to Murphy and Davidshofer (2005), there are
over 40 different indices of absenteeism alone. They also point out that
the distribution of absenteeism is skewed: most people take very few
days off; a few take many days. This lack of spread (variance) makes
any correlation between absenteeism and other measures very low. It
also makes it very difficult to differentiate between people on the basis
of absenteeism, when the vast majority of people are very seldom
absent.
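
The point about restricted variance can be illustrated with a small simulation. The Python sketch below uses entirely made-up numbers and a simplified model; it merely shows that a highly skewed absenteeism index tends to correlate more weakly with a predictor than a better-spread criterion built from the same underlying tendency.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictor, e.g. a standardised conscientiousness score.
conscientiousness = rng.normal(0, 1, n)

# Underlying tendency to be absent, weakly (negatively) related to the predictor.
latent = -0.3 * conscientiousness + rng.normal(0, 1, n)

# Days absent: most people take very few days off, a small group takes many.
days_absent = np.where(latent > 1.2, rng.poisson(8, n), rng.poisson(0.5, n))

# A better-spread criterion (e.g. a supervisor rating) built from the same tendency.
rating = -latent + rng.normal(0, 0.5, n)

r_absence = np.corrcoef(conscientiousness, days_absent)[0, 1]
r_rating = np.corrcoef(conscientiousness, rating)[0, 1]
print(f"Correlation with skewed absenteeism index: {r_absence:.2f}")
print(f"Correlation with better-spread rating:     {r_rating:.2f}")
# The absenteeism correlation is typically much smaller in absolute size.
```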
Another common index is accidents. However, this is also a poor index
of performance because the distribution of accidents is skewed too, with
most people not being involved in accidents. It is therefore very difficult
to use accident rate as a measure of performance or to relate it to other
factors such as personality or intelligence.

14.8.3 Judgemental data


The most common method of arriving at measures of job performance
involves judgements made by others, usually the person’s supervisor,
sometimes by peers and sometimes by external stakeholders such as
clients and/or customers and suppliers. Sometimes all of these people
are involved in a process known as 360-degree assessment.

There are two approaches to making these judgements: one based on


ranking* and the other on rating*. Ranking is a normative approach in
which people are compared with one another; it is also ipsative in the
sense that the rankings are interdependent: if five of six rankings are
known, the sixth one is fixed. On the other hand,
rating scales involve some kind of external criteria or standard and are
therefore an example of a criterion-referenced process that allows for
ties. (See Chapter 3, section 3.6.6.)

14.8.3.1 Ranking techniques


Ranking involves placing people in order. This can be done in one of
three ways. Firstly, there is full ranking*, in which all subordinates are
ranked by their supervisor from best to worst in terms of their overall
effectiveness. A problem with this is that it can be very subjective with
various forms of halo effect* operating. If, for example, a supervisor
does not agree with the political views of a subordinate, this may be
reflected in the way he judges the subordinate’s performance. As we
will see in Chapter 16 on interviewing, research has shown that
attractive people tend to get better rankings than less attractive ones.

The second approach to ranking is forced distribution* in which the


participants are placed into a number of categories (e.g. a high, a middle
and a low category). One problem with this is the tendency of most
supervisors to rank everyone in the top category – even the worst
performer tends to be rated as above average. To avoid this, supervisors
can be forced, for example, to place 25 per cent of subordinates in the
high category, 50 per cent in the middle category and 25 per cent in the
low category. However, supervisors often resist doing this because they
may feel that their subordinates are all good performers and better than
those in another section or department.
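
The mechanics of a forced distribution are simple, as the hypothetical Python sketch below shows; the 25/50/25 split is the one used in the example above, and the names are invented.

```python
# Hypothetical forced-distribution assignment: 25% high, 50% middle, 25% low.
# 'ranked' is assumed to be ordered from best to worst performer.
ranked = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]

n = len(ranked)
high_cut = round(0.25 * n)        # first 25% go into the high category
low_cut = n - round(0.25 * n)     # last 25% go into the low category

categories = {
    "high": ranked[:high_cut],
    "middle": ranked[high_cut:low_cut],
    "low": ranked[low_cut:],
}
print(categories)
# {'high': ['P1', 'P2'], 'middle': ['P3', 'P4', 'P5', 'P6'], 'low': ['P7', 'P8']}
```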

The third approach to ranking is paired comparisons*. This is a very


simple process in which a single question is asked: Is person A more or
less effective than person B? This comparison is made for each pair and,
in this way, a ranking order is established. Although this method seems
quite crude, it allows very fine ordering of people in terms of their
perceived effectiveness. While this approach works well when relatively
few people are compared, the number of comparisons becomes large
very rapidly. The total number of comparisons is given by the formula
n(n – 1)/2. If four people are compared, the number of paired
comparisons is six (4×3/2). If there are six people, this number grows to
15 (6×5/2). If there are ten participants, 45 comparisons need to be made
(10×9/2).
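
The growth in the number of judgements is easy to verify. The short Python sketch below simply applies the formula and enumerates the pairs for a small, hypothetical group; the names are illustrative only.

```python
from itertools import combinations

def number_of_comparisons(n: int) -> int:
    """Number of paired comparisons needed for n people: n(n - 1)/2."""
    return n * (n - 1) // 2

people = ["A", "B", "C", "D"]            # four hypothetical subordinates
pairs = list(combinations(people, 2))     # every pair the supervisor must judge
print(pairs)   # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]

for n in (4, 6, 10):
    print(n, number_of_comparisons(n))    # 6, 15 and 45 comparisons respectively
```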

Although all three methods can be used, they involve different amounts
of work: forced distribution methods are the easiest; paired comparisons
are the most difficult.

14.8.3.2 Rating techniques


The second judgemental approach involves rating the person against
some external standard or criterion. There are a number of ways of
doing this, including continuous scales, graphic scales, numeric scales
and verbally anchored rating scales (VARS). These are illustrated in Table 14.6.

Table 14.6 Examples of various rating scales


In order to attach a value to the continuous scale, the rater needs to take
a ruler and physically measure the distance of the mark from the left-
hand end of the scale and use this distance as the value. Using a ten-
point scale, worker A is rated as 8,5 and worker B is rated as 2. This
chore of physically measuring the distance in each case is overcome in
the graphic scale where the distances are marked. In the numeric scale a
line is presumed but not given, and the rater decides first in which of the
four categories to rate the person, and then gives a finer rating. In Table
14.6, the person is rated 13, which is near the top end of “average”.
Finally, in the verbally anchored rating scale, the task of rating the
person is made much easier, because performance levels are verbally
described, which helps the rater to make a sound judgement.
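
The arithmetic behind the continuous scale is straightforward, as the small Python sketch below shows. It assumes, purely for illustration, a 100 mm line; the worker A and worker B values are the ones used in the example above.

```python
def continuous_rating(distance_mm: float, line_length_mm: float, scale_max: float = 10) -> float:
    """Convert the position of a mark on a line into a rating on a 0 to scale_max scale."""
    return scale_max * distance_mm / line_length_mm

# Assuming a 100 mm line: worker A's mark measured at 85 mm, worker B's at 20 mm.
print(continuous_rating(85, 100))   # 8.5
print(continuous_rating(20, 100))   # 2.0
```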

There are, however, several problems associated with rating scales. The
first is the global nature of many of the dimensions being evaluated,
which results in high levels of ambiguity – it is never clear exactly what
behaviours are included in the dimensions being rated. What exactly do
“Communication”, “Relations with others” or “Quality of work” mean?
Secondly, there is the halo effect in which performance on one
dimension is influenced by other irrelevant factors.

14.8.3.3 360-degree assessment


One approach to assessment that is increasingly used and which can
provide criteria for validity studies is 360-degree assessment. This refers
to “full circle” feedback from superiors, peers and subordinates to the
person being rated. It is sometimes referred to as “multi-source
appraisal”. (See, for example, Bates, 2002.) Typically, a 360-degree
assessment would include key categories (such as communication,
teamwork, etc.) with five or six specific behaviours listed within each
category. In this process, an employee is rated by a range of colleagues
across a number of dimensions and performance areas. As the business
culture has moved towards looking not only at what people do, but also
at how they do it, 360-degree assessment provides a way of accurately
measuring relevant behaviours. Using a number of raters helps to
overcome many of the limitations of the more traditional top-down
appraisals, which usually represent the view of one person (usually the
superior). The 360-degree assessment method has the potential to deal
with many of the problems associated with single-source data,
producing rating information that is relatively free of bias and error.
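
A minimal Python sketch of how multi-source ratings might be combined is given below; the categories, sources and figures are invented for illustration and do not come from any particular 360-degree instrument.

```python
# Hypothetical 360-degree ratings for one employee on a five-point scale.
ratings = {
    "communication": {"superior": [4], "peers": [3, 4, 5], "subordinates": [4, 3]},
    "teamwork":      {"superior": [3], "peers": [4, 4, 3], "subordinates": [5, 3]},
}

for category, by_source in ratings.items():
    all_ratings = [r for source in by_source.values() for r in source]
    mean = sum(all_ratings) / len(all_ratings)
    print(f"{category}: mean {mean:.1f} across {len(all_ratings)} raters")
# communication: mean 3.8 across 6 raters
# teamwork: mean 3.7 across 6 raters
```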

14.8.3.4 Behaviourally anchored rating scales


One way around many of the issues mentioned is through the
development and use of behaviourally anchored rating scales (BARS).
This technique is similar in many ways to verbally anchored scales (see
Table 14.6). Unlike verbally anchored scales, behaviourally anchored
scales clearly specify the behaviours associated with performance at
various levels of performance. An example of a BARS is shown in
Table 14.7.

Table 14.7 An example of a behaviourally anchored rating scale


14.8.4 Economic value added
One measure that has not traditionally been used to assess individual
performance, but which has a great deal of potential and is gaining
increasing acceptance in the workplace, is economic value added* or
EVA. EVA is the value of an activity that remains after the cost of
executing the activity and the cost of the lost opportunity to invest the
time, money and effort in an alternative activity have been subtracted.
Crudely stated, EVA is the “profit” generated by the organisation after
all expenses have been paid. More formally, EVA can be calculated as
follows:

EVA is calculated as net operating profits after taxes (NOPAT) minus
the costs of production, i.e. EVA = NOPAT – costs

What is interesting about this formula is that in principle EVA can be


calculated not only for the organisation as a whole, but for departments,
teams and even individuals. In other words, adopting an EVA framework
allows us to work out in fairly precise terms what each person has
earned, what this has cost the organisation and what “profit” (EVA)
each person has contributed to the organisation. This becomes a very
useful method for judging individual performance. In Sidebar 14.1 is an
extract from an article by a leading US consultancy firm, BCG (Boston
Consulting Group), about one of their products which they have termed
Workonomics.

Sidebar 14.1 BCG’s approach to calculating EVA


… Workonomics calculates value added per person (VAP) as a measure of
average productivity. Subtracting the average cost per person (ACP) from VAP
and multiplying the difference by the number of employed people (P) produces
the residual income, which is called economic value added (EVA).
“Workonomics links traditional measures of employees’ productivity (for example,
sales per employee, employee hours per store, and employee turnover) with the
financial performance of each store and region, and with various corporate
functions. To complement ROI in people-intensive industries, for instance,” this
approach provides quantitative, personnel-oriented metrics that mirror classic
control systems and bridge the gap between measures of capital and assessment
of human assets. “The purpose of Workonomics is not to replace old, capital-
based control systems; rather it is to make them more realistic by incorporating a
measure of human capital and linking that measure directly to shareholder value.”
Source: From (with modifications)
http://www.bcg.com/publications/publication_view.jsp?pubID
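
Both versions of the calculation reduce to simple arithmetic. The Python sketch below restates them with invented figures; it is an illustration of the formulas described above, not of BCG’s actual tooling.

```python
# Conventional, organisation-level EVA (all figures invented for illustration).
nopat = 12_000_000      # net operating profits after taxes
costs = 9_500_000       # costs of production, including the opportunity cost of the activity
eva_organisation = nopat - costs

# Workonomics-style, people-based version: EVA = (VAP - ACP) x P
vap = 450_000           # value added per person (average productivity)
acp = 380_000           # average cost per person
p = 120                 # number of people employed
eva_people_based = (vap - acp) * p

print(f"Organisation-level EVA: {eva_organisation:,}")   # 2,500,000
print(f"People-based EVA:       {eva_people_based:,}")   # 8,400,000
```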

14.9 Summary

In this chapter, we saw that assessment covers a wide range of situations


at the individual, group, organisational and external stakeholder levels.
In addition, assessment is used to describe, decide about and develop
various aspects of the organisation in order to manage and improve
organisational performance. A matrix of the reasons for and targets of
assessment was presented.

Perhaps the single most important reason for assessment is selection. A


model of the selection process (Figure 14.1) was presented, and we
discussed five steps to increase its validity. Related uses of assessment
were also briefly mentioned (staff development, promotion and transfer,
and performance management). We considered career path appreciation
theory and costs associated with the selection process.
The latter part of the chapter dealt with group assessment and some
related issues, and we looked at organisational aspects and the
assessment of external stakeholders. Finally, we addressed various
approaches to criterion measurement.

Additional reading

For an excellent review of the use of psychological assessments in predicting job


success, see Hough, L.M. & Oswald, F.L. (2000). Personnel selection: Looking toward
the future – remembering the past. The book by Adrian Furnham (2003) entitled The
incompetent manager should be read to see just what effect bad selection can have on
an organisation’s performance.
Kaplan & Saccuzzo (2013) have a good chapter on testing in industrial and business
settings (Chapter 18).
People interested in calculating the cost of poor selection should read Williams, R.W.
(2006). The not-so-hidden costs of poor selection, available at
http://hr.monster.com/articles/wendell/wendell3/
For an in-depth look at non-psychological aspects of evaluation in the organisational
context, see Russ-Eft, D. & Preskill, H. (2001). Evaluation in organizations: A systematic
approach to enhancing learning, performance and change.
For further information on EVA, visit the BCG website at
http://www.bcg.com/publications/publication_view.jsp?pubID
For an excellent overview of assessment for selection and related personnel decisions,
see Guion, R.M. (1998). Assessment, measurement and prediction for personnel
decisions. Hough & Oswald (2000) in their review of personnel selection were so
impressed that they felt that “this book on personnel selection will fast become a
classic” (p. 347).
People interested in the predictive validity of various selection techniques are referred to
Ekuma, K.J. (2012). The importance of predictive and face validity in employee selection
and ways of maximizing them: An assessment of three selection methods. International
Journal of Business and Management, 7(22), 115–122.

Test your understanding


Short paragraphs

1. Describe the five steps involved in the selection process.


2. What is meant by utility in the selection process? Describe the situations where
relatively advanced selection techniques should be used and where they should be
avoided. You may present this information in table form if you like.
3. Describe what is meant by economic value added (EVA) and show how this can be
used to evaluate an employee’s job performance.

Essays

1. Using a suitable diagram, explain the relationship between job requirements, what an
assessment technique claims to measure and what it actually measures. What are
the implications of this for a fair and valid selection process?
2. Describe what is meant by career path appreciation.
3. Besides selection, what are the other uses of assessment in organisations? Who
should carry out these assessments and what techniques are available?
15 Assessment for career
counselling

OBJECTIVES

By the end of this chapter, you should be able to

define what is meant by a career


show how this concept has changed over time
describe the six steps needed to make a decision about a career
describe the techniques that can be used to assess people in their career choice
process
describe Holland’s RIASEC model
suggest ways in which this model can be extended
describe Schein’s model of career anchors.

15.1 Introduction

Successful career choice and job satisfaction, as well as high levels of


achievement, depend to a large degree on the match between the
requirements of the job and the competencies (knowledge, skills,
attitudes and values) brought by the person to the job. In order to ensure
that there is as good a match as possible, we need to know what
psychological dimensions are required by the job, and then determine
the extent to which these characteristics are present in the person
concerned. In this respect, career choice is similar in many ways to the
process of selection.

Before we look at this process of matching people to jobs and/or careers,


we need to examine a few related issues.
15.1.1 The world of work
The world of work is changing rapidly, with new jobs and new ways of
working emerging all the time. Even five or ten years ago, there was
little indication that cellphones, computers and television would start to
converge, and thus no one could have predicted that jobs in this area
would become available. The career of a webmaster or web designer did
not exist ten years ago. Office blocks and factories are being replaced by
rather loose collections of people working for relatively short times on
various projects. When these come to an end, these people are
reassigned to a new project. In many cases, people work from home and
stay in contact with other members of the project team by email or
cellphone – this is called telecommuting. The pace at which new jobs
emerge and existing jobs disappear is likely to keep accelerating.

In a study prepared for DGV of the European Commission, Bruce
Ballantine (1999) of Business Decisions Limited identified seven main
areas in which changes are being introduced by leading companies:

New organisational structures:

– process-oriented business units; and


– semi-autonomous teams

More flexible and less hierarchical working methods:

– more flexible working hours; and


– multiskilling

New corporate cultures:

– greater focus on people, customers, service and quality


New business practices:

– quality management programmes

Increased investment in education and training:

– involvement of more employees in programmes; and


– the use of programmes to enhance “personal” skills

New performance measurement techniques:

– “non-financial” measures and objectives for teams and individuals

New reward systems:

– profit-sharing, bonuses and stock-option schemes

The benefits for companies from adopting these new forms of work
organisation include

improvements in operating efficiency


increases in marketplace performance and customer satisfaction
enhancements in product and process innovation
more effective exploitation of investments in advanced technologies
faster response to changes in the business environment.

The benefits for employees from new forms of work organisation


include

the protection and creation of jobs, including new types of jobs that
match the aspirations of employees
improvements in the quality and the content of work itself.
Clearly the nature of work is changing, and will continue to do so as
new technologies change what we do and how we do it. New careers are
constantly emerging and these require new skills and attitudes, and
career counselling has to keep abreast of these developments if it is to
remain effective and relevant in post-modern society. We also have to
recognise that people are likely to change their jobs a number of times in
their lives – according to Naicker (1994), people in the major developed
economies change careers an average of five times during their career
lifetime. Savickas (2006) asserts that individuals in the US born between
1957 and 1964 had an average of ten jobs between the ages of 18 and
38. This puts a whole new spin on career counselling, stressing the
need for critical thinking and multiskilling to allow people to move
relatively easily and seamlessly between jobs and economic sectors.
Post-modern approaches of “life design counselling” and other
techniques such as “lifeline”, “collage”, “role identification” and
“fantasy” may be useful in helping people design their lives, but in the
absence of meaningful opportunities to exercise one’s talents and
preferences, no amount of “interactive group discussions that focus on
the personal application of these techniques to enhance the life design
process” (Zunker, 1998) will be of any use. (For an explanation of these
terms and how they help in “the construction of meaning through
communication”, the reader is referred to Maree, 2010, p. 363).

Two quite distinct approaches to career counselling can be identified.


The first is the traditional (some would even call it old-fashioned)
approach which is focused on an impartial and qualified outsider
identifying the strengths and weaknesses of the person and then trying to
match them to a particular niche or career path in an organisation. This
approach is very similar to what occurs when a person is selected into an
organisation, and could thus be termed a “selection” approach. The
second post-modern approach is focused on the needs of the individual,
and explores the needs and preferences of the person and helps him to
decide where best his talents could be used. This approach may also
suggest what additional skills the person should acquire to help him
meet his personal objectives. This approach could best be described as
an “interpretative” or a “facilitation” model in which the third party’s
role is one of a caring colleague assisting in the exploration process,
rather than one of an independent expert giving advice.

Although there is room for both approaches, this book is rooted in the
modern and positivistic (as against a post-modern) paradigm, and so the
traditional selection approach rather than the interpretative facilitation
model is adopted. Nevertheless, psychometric assessment can be used to
assess individual abilities, flexibility, personal styles and preferences as
a basis for further exploration and social construction. As is shown
below (section 15.3.2), people are drawn to, and are most comfortable
in, work environments that match their temperaments and personal
styles. While a “people” person may move between teaching, human
resources management and running a B&B, he will be far less likely to
move to a career in bookkeeping or technical drawing. Even in a post-
modern world, people still need to find areas in which their skills and
preferences can be fully exercised. It is an axiom of modern biology that
an organism must be able to adapt to its environment or move
elsewhere, or else it will die! Although humans can adapt to hostile
environments, like most creatures they prefer to be within their comfort
zones for most of the time – the energy costs of continually fighting
against an unfriendly environment are too high for this to be sustainable.

These issues then raise the following questions:

What is a job?
What is a career?
How does one go about advising people about possible careers and
selecting them for these?

15.1.2 What is a job?


A job is a collection of tasks that need to be done. For example, the job
of a gardener involves mowing the lawn, weeding the flower beds,
cutting the hedge, taking the clippings to the dump, digging beds, and so
on. The job of a teacher is to prepare teaching material, give the lessons,
draw up test papers, hand them out, collect and mark them, enter the
marks, write comments, deal with learners who are causing problems,
and so forth. In many schools, teachers are also expected to do
extracurricular activities, such as coach a sports team, run the chess or
debating society, and put on a play or musical every year or two. Of
course, jobs may change over time – as new tasks are added, existing
tasks change or even fall away. In most cases, people are paid for doing
these tasks, although some people, especially those who do charity
work, do their jobs without remuneration.

15.1.3 What is a career?


A career can be described as a series of jobs that are related to each
other and which grow in complexity. An example would be a teacher
who becomes a head of department (HoD) before becoming a vice-
principal and then a principal. After this, the teacher can become an
inspector of schools, then a curriculum planner and then move into a
provincial department of education, before finally moving to the
national Department of Education as a director of education and
possibly even the Minister of Education. In a personnel job, a person
may begin as a human resources officer, move to being a senior human
resources officer, then human resources manager, group human
resources manager and then director of human resources. Both of these
are examples of a career – there is a straight line between the entry-level
job and the highest-level job.

15.1.4 Definition of a career


In terms of this argument, a career is a set of interrelated jobs that a
person follows over his lifetime, with some kind of upward trajectory. A
person who has remained a personnel officer, teacher or police constable
cannot really talk about a career. According to Greenhaus and Callanan
(1994, p. 22), a career is “the pattern of work-related experiences that
span the course of a person’s life”.

This definition is less than adequate because it fails to take into account
that a career consists of a series of related jobs – being a butcher, a baker
and a candlestick maker are three careers, not one. Suppose a teacher
became HoD and then decided to go into industry as a human resources
manager, or to sell houses or to run a guesthouse. Clearly, this is a
change of career. Each new career may have its own promotion route.
However, in most cases the skills and experience gained in one career
(e.g. teaching) may not be very useful in the new career (such as selling
houses). In other words, a career can be seen as a family of related jobs
with some kind of hierarchy so that people are seen to “progress” as
they move to increasingly senior levels within this family of jobs.

One result of moving between career routes too often is that a person
does not gain enough experience in one particular sphere to move up the
career ladder. In the past, changing jobs too often was frowned upon – it
was seen as job hopping. If a person applied for a job and the
interviewer saw that the applicant had had four or five jobs in the last
ten years, he would become very suspicious. If job applicants were seen
to be job hoppers, it would count against them. However, things have
changed somewhat.

Because of the rapid changes in technology and work systems, many existing jobs will fall away and new jobs will be created. Quite often, people are forced to change jobs and careers because the company they work for, or even the entire industry, lays people off or closes completely.
Gone forever are the days when a person joined the bank after his
studies and stayed there for the next 20 or 30 years, rising from being a
teller or clerk to become managing director. In addition, the last decade
or so has been characterised by widespread restructurings, mergers and
downsizing. This has resulted in large numbers of people being
retrenched, hence the idea of job security no longer exists. Instead, it is
likely that most people entering the world of work in the next ten to 20
years will have at least three or four quite distinct careers during their
lives. This means that it is almost impossible for a person to choose a
career that he will stick to for the rest of his life.

Perhaps even more importantly, people in business and industry are becoming suspicious of those who have been in one job for too long.
Because the world is changing so fast, people with a wide range of
appropriate experience are the ones who now have the advantage. There
is a vast difference between a person who has ten years of experience
and one who has had one year of experience repeated ten times.
Nowadays, therefore, it is more important to think of career security and
to define a career as a portfolio of skills and knowledge that can be
applied in a variety of similar situations. This means that the focus of a
career in the 21st century has to be on building a set of desirable and
sought-after skills. A career must be seen as a process of building one’s
abilities which grow each year and which can be applied in different
organisations. In the past, loyalty was towards an employer and a career
was seen as progression up the ladder in a particular organisation.
Today, loyalty has to be to oneself and a particular profession. It is for
this reason that people need to keep abreast of technological changes, to
become lifelong learners, to receive ongoing training, and to acquire the
skill to adapt to rapidly changing career contexts and to deal with
repeated employment transitions. Accordingly, they have to acquire the
necessary skills associated with the latest technology to remain relevant
in a highly competitive job market. This will require a new approach to
assessment and guidance. As Maree (2010, p. 362) notes, “[t]he realities
of the 21st century labour market should therefore dictate assessment
strategies and guide feedback to clients”.

15.1.5 Choosing a career


Holland (1985) explains the job choice process by arguing that as a
person grows up he learns different ways of dealing with and adjusting
to the world. When he then has to make a career decision he looks for
job situations which match his knowledge, skills, attitudes and values. A
person also looks for jobs that match his personality and abilities. In
other words, a career choice is essentially the selection of the best
available option in terms of what the person knows and what he
considers important in life. This stresses the need to be aware of what
jobs are available: not knowing about a job or career route prevents a
person from considering it as a career possibility.

This process has vast implications for assessment. Firstly, we need to accept that career choice is a matching process in which the skills,
abilities and interests of a person are coordinated with the requirements
of an organisation and a job in particular. We therefore need to know
both the individual’s strengths and the specific requirements of the job.
In addition, because the requirements of the job may change at some
point in the future, we may also want to know in broad terms the kinds
of job that he may be able to perform in the future – that is, his potential.

Assessment for selection on the basis of current ability or competence and assessment of potential (i.e. future competence) for career decisions
can be quite different. It also raises a problem in terms of South African
labour law. On the one hand the law argues that selection must be based
on the specific requirements of the job, whereas the Employment Equity
Act also states that previously disadvantaged people should be selected
on the basis of their potential to function adequately in the job within a
reasonable time; therefore, when people are being assessed for career
choice, it is important not to lose sight of the longer-term perspective.

At the same time, however, career choice is not a once-off process because both jobs and people evolve. What seems of interest at one
stage of a person’s life or level of knowledge may become far less
interesting when the person gets to know more about the detailed
contents of the job and the prospects associated with it. Similarly, first-
time entrants into the world of work have little insight into their own
preferences and abilities, so that what seems like a sound choice at one
stage may be seen as a poor choice five years later. For this reason,
career choice should not be seen as an all-or-nothing decision, but rather as part
of an evolving process. This suggests that a narrowly focused approach
should be replaced with a much broader one, in which broad fields are
determined and, based on research, narrowed down to a more specific
career direction.

We must not forget that, for most people, career development is a lifelong process of engaging the world of work through choosing among
employment opportunities made available to them. If the “right job” is
not immediately available, people are often forced to accept a less-than-
ideal position, with one of three possible outcomes resulting – they may
“grow into” the job, they may leave it as soon as possible, or they may
suffer for a period in the job, bored, dissatisfied and underachieving.
Clearly, this is not a scenario that any employer would want.

Each of us is influenced by many factors in undertaking this process, including the context in which we live, our personal aptitudes, and our
educational attainment (Bandura et al., 2001). Sound career choice (and
assessment) depends on four sets of knowledge: what kinds of jobs are
available, what each job requires, what we are interested in, and our own strengths and weaknesses in each of these areas. Success
in a career also depends on a fifth factor, namely motivation and
application – hard work!

15.2 What jobs/careers are available?

There are many different jobs and career routes, and so a person needs
to find out more about what is available. If a person is unaware of
certain possibilities, he will not be able to consider them at all through
sheer ignorance. It is well established that the best single predictor of a
person’s career is the career followed by his parents – teachers tend to
produce teachers, doctors tend to have children who become doctors,
accountants produce accountants, motor mechanics have children who
become motor mechanics, and so on. This happens because the parents
provide information about the kinds of things they do (and not about
other things) and act as role models for their children. Parents shape the
interests, attitudes and values of their children, and these then shape the
career patterns of the next generation.

The presence of appropriate role models and encouragement in a particular career direction and to a particular level are thus critical
determinants of job choice, and especially of the appropriateness of that
choice. This is a major problem in many developing countries, including
South Africa, where the range of jobs far exceeds the knowledge and
experience of many parents. As a result, there are very few positive role
models and even less knowledge about what new or emerging jobs
entail. Even many well-established jobs change rapidly. For both these
reasons, parents may sometimes not be in a good position to guide their
children into suitable careers. Therefore, the first thing we need to do is
to explore what jobs are available in the marketplace.
Let us consider the story about a person who spent hours under a
lamppost looking for a key he had lost. A stranger joined him and
helped look for the key. After about an hour of looking for the key, the
stranger asked him where exactly he had dropped the key. His answer
was that he had dropped it about 25 metres down the road. “Then why
are we looking for it here?”, the stranger asked. The man’s reply was
“Because it is dark there and light here – I would never be able to find
the key in the dark!”

If we look for our careers in the light of the knowledge we have, we may
well be looking in the wrong place.

15.3 The characteristics of jobs and/or careers

The second important issue is to determine what kind of knowledge, skills and other abilities are required for success in specific jobs. There
are a number of ways in which to approach this.

15.3.1 A common-sense approach


This approach is not based on any particular theory, but poses a number
of simple questions about the job.

15.3.1.1 Does the job involve working with numbers, people, ideas or objects?
One way of looking at jobs is to decide whether they involve working
with numbers (e.g. bookkeeping, accounting), or with people (e.g.
teaching, personnel management, selling, counselling, etc.), or with
ideas (art, research, philosophy), or with objects (engineering, technical
work, farming, sport). Most jobs require different blends of each of
these, but focus mainly on one of these four areas.

15.3.1.2 Where is the job located?


Another way of looking at jobs is to see where most of a person’s time is
spent when doing the job. There are three main types of location:
indoors in an office, indoors in a factory or workshop, and outdoors. If
the job requires a combination, one should establish the ratio. Typical
office workers, teachers, artists, medical practitioners, and so forth
spend most of their time indoors, whereas motor mechanics, engineers
and production managers spend most of their time in a factory or
workshop. Professional sports people, game rangers, geologists, farmers
and the like spend most of their time in the open air. Although the idea
of being out in the open all the time may sound appealing, careful
thought needs to be given as to whether a person really wants to spend
his days in the hot sun, rain, wind or cold.

15.3.1.3 Is the job a generalist or a specialist one?


Generalist jobs require a broad range of abilities developed over time
with experience. A typical generalist job is that of a production or
factory manager, where the incumbent depends on his experience and
general management abilities. A specialist job is one which requires a
much narrower range of knowledge, but this must be at great depth. The
information technology (IT) manager, the industrial relations lawyer and the medical doctor are examples of more specialist jobs. Specialisation is
reflected in the type of post-school qualifications a person obtains: the
more specialised the career, the more important the professional
qualification becomes. In general terms, people often begin their careers
as specialists, and then gradually become generalists as they progress up
the job hierarchy. In contrast, many people enter generalist positions at
job-entry level.

15.3.1.4 What career stage is involved?


A fourth way of looking at a job is to ask whether it is an entry-level job,
whether it is a job for relatively experienced people, or a high-powered
job that requires high levels of expertise and subject mastery. Clearly,
this information is required for proper assessment.

15.3.1.5 What education is required?


A final consideration when looking at a job or career field concerns the
amount of education or training required. In practice, four levels can be
identified:

1. Lower than Grade 12
2. Grade 12 (plus a year or two of training)
3. Three or four years of tertiary education (such as a B degree from a
university or a 3–4 year technikon or training college qualification)
4. A postgraduate degree (e.g. Master’s degree, doctorate or a six-year
professional degree)

The choice of career therefore depends to a certain extent on a person's cognitive ability. This means that an important aspect of any assessment
process is to determine the person’s ability or potential to achieve a
specific level of education.

15.3.2 Holland’s model


John Holland’s approach (1985) to describing jobs is based on a firm
theoretical basis. According to Holland, jobs can be described in terms
of six different factors that form the acronym RIASEC. These six factors
and typical jobs associated with each are described in Table 15.1.

Table 15.1 Holland’s RIASEC factors

Factor | Description | Typical jobs
R Realistic/practical | Action based | Farmer, motor mechanic, engineer, sportsman
I Investigative | Exploring | Scientific jobs such as researcher, astronomer, etc.
A Artistic | Creative | Musician, actor, ballet dancer, novelist
S Social | Helping | Teacher, medical practitioner, priest, social worker
E Enterprising | Selling | Business manager, estate agent, B&B owner
C Conventional | Rule following | Bookkeeper, accountant, conveyancer

An important aspect of the model is that these six factors are arranged in
a hexagon or circle. This means that the factors next to each other are
fairly similar and more likely to occur in a single job. For example,
teaching involves “social” and some “artistic” and “enterprising”
(persuading) ability. On the other hand, factors that are opposite each
other in the model are opposite each other in real life and seldom occur
in the same job. For example, bookkeepers and accountants have to
follow rules and conventions (such as the double entry system) and in
general are not very artistic or social in their work. The same can be said
for engineers, although they are more practical and investigative than conventional (even though the rule-following aspect of this
dimension should not be overlooked). This model is shown in Figure
15.1.

Figure 15.1 Holland’s RIASEC model

Of course, most jobs require more than one characteristic, and so jobs in
general are described using a three-letter label to reflect their three most
important factors: an English teacher is SAE (social, artistic and
enterprising); an airline pilot is RIE (realistic, investigative and
enterprising) and a surgeon is ISR (investigative, social and realistic).
These three characteristics need not be in adjacent “blocks”, as can be
seen in the case of the pilot and the surgeon. There are various
directories, for example Holland’s The occupations finder, in which a
large number of jobs are rated on their three most important dimensions.
However, these directories tend to become outdated fairly quickly and
many new jobs such as professional sportsman, call-centre operator and
Internet service provider (ISP) operator or administrator are not listed.
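
To make the matching logic concrete, the sketch below shows how a simple congruence score between a person's three-letter code and a job's three-letter code might be computed from the hexagonal ordering R-I-A-S-E-C, with letters adjacent on the hexagon counting as closer matches than letters opposite each other. This is an illustrative assumption rather than one of Holland's published congruence indices; the weights and the Python function names are hypothetical.

# Illustrative sketch (not from the text): similarity between a person's
# three-letter RIASEC code and a job's code, using the hexagon R-I-A-S-E-C.
# The weighting scheme is hypothetical, not Holland's published index.

RIASEC = "RIASEC"

def hex_distance(a: str, b: str) -> int:
    """Number of steps between two letters around the RIASEC hexagon (0-3)."""
    d = abs(RIASEC.index(a) - RIASEC.index(b))
    return min(d, 6 - d)

def congruence(person_code: str, job_code: str) -> int:
    """Higher scores mean a closer person-job match. Letters are compared
    position by position, with earlier letters weighted more heavily."""
    score = 0
    for weight, (p, j) in zip((3, 2, 1), zip(person_code, job_code)):
        score += weight * (3 - hex_distance(p, j))  # 3 = identical, 0 = opposite
    return score

if __name__ == "__main__":
    print(congruence("SAE", "SAE"))  # English teacher profile vs English teacher job: 18 (maximum)
    print(congruence("SAE", "RIE"))  # English teacher profile vs airline pilot job: 7 (weaker match)

On this hypothetical scale, the person whose own code mirrors the job's code scores the maximum, while codes dominated by opposite corners of the hexagon score close to zero.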

We must also not forget that there are similar jobs that exist along a
single dimension, but vary according to the level of education required.
A good example of such a hierarchy of jobs is in engineering, with jobs
ranging from unskilled labourers, to semi-skilled artisan aides, to skilled
artisans, to technicians, to graduate engineers and finally to high-
powered consulting engineers with Master’s degrees or doctorates. This
means that, although a person may be interested in, say, the engineering
field, the exact level to which he aspires will depend on his evaluation of
himself and his ability to succeed at a particular level. A person who
feels that tertiary education is beyond him or who cannot afford it may
opt for the artisan or technician route. On the other hand, the person who
believes he has the potential to acquire a primary or higher degree and
can afford the university fees would go the university graduate engineer
route. The implications of different educational levels for assessment are
that we need to assess the person’s ability and likelihood of succeeding
at a given level within a career family. (This issue is dealt with in
15.4.1.)

An extended version of the RIASEC model that the author has developed and which incorporates education levels and types of job
content is given in Figure 15.2. The outside of the wheel (the tyre, as it
were) is divided into four segments, based on the basic content of the
jobs. These segments are labelled ideas, people, numbers and things, and
they coincide roughly with the six RIASEC dimensions. For example,
jobs that are explorative and creative in nature are concerned largely
with ideas, while the creative, helping and selling jobs are concerned
largely with people. Likewise, selling and rule-following jobs are most
closely aligned to numbers (e.g. banking and clerical jobs) while the
action-oriented jobs (such as policing and engineering) involve things
such as machines, plants and animals. (Note that the tertiary education
level is divided into two: graduate and postgraduate.)

Figure 15.2 The expanded RIASEC model

15.3.3 Schein’s career anchors model


Edgar Schein (1990) has put forward a different kind of model, arguing
that people’s career choices are shaped by a dominant “anchor” or
motivator that acts as an internal compass in making career decisions.
He outlines eight main career anchors:

1. Technical/functional competence
2. General management competence
3. Autonomy/independence
4. Security/stability
5. Entrepreneurial creativity
6. Service/dedication to a cause
7. Pure challenge
8. Lifestyle

The meanings and implications for people with these career anchors are
given below (see also Schein, 1995).

Technical/functional. These people enjoy using and excelling in their core or specialist skills. These skills are not necessarily technical or
mechanical in nature. For example, human resources workers or
teachers may achieve high levels of job satisfaction from using the
skills they have and which are needed for their positions. Such people
are motivated by learning new skills and expanding their current
knowledge base. People like this need to be placed in jobs in which
they can exercise their skills and talents. They gain satisfaction from
knowing the relevant concepts and exercising them daily. If the work
is not a challenge, technical/functional types feel bored and/or
underutilised. They see teaching and mentoring as opportunities for
demonstrating their expertise. Professions such as engineering and
software design attract a large proportion of people with this
particular anchor, although they can also be found just about
anywhere, from the financial analyst excited by the chance to solve
complicated investment problems to the teacher happy to continually
fine-tune classroom performance.
General management. These people view specialisation as limiting,
and much prefer to manage or supervise people. They enjoy
motivating, training and directing the work of others and being in a
position of authority and responsibility. They do best when they are
able to exercise their competence in three areas, namely analytical
ability, interpersonal or intergroup skills, and emotional intelligence.
They are demotivated when they lose their ability to influence others
in these managerial situations. As a result, these people look for jobs
with high levels of responsibility, and which require leadership and
the ability to solve a wide range of problems and find integrative
solutions.
Autonomy/independence. These people need and want control over
their own work and want to be recognised for their achievements.
They do not like working for others and having to follow other
people’s rules or procedures. They need to do things their own way
and prefer to fill roles as independent consulting and contract
workers. Once they have been told what is required and the due date
stipulated, they want to be left alone to get on with the work in their
own way. These people tend to seek autonomous work, such as consulting, teaching, contract or project work, or even temporary work, whether part time or full time.
Security/stability. These people do not take risks, nor do they like to
take chances. They prefer to work in a safe, secure and predictable
environment, and strive for predictability, safety and structure, and
the knowledge that they have completed their tasks properly. They are
motivated by calmness and consistency of work, and as a result seek
out stable companies, even though this may mean somewhat lower
salaries. Any unused talents may be channelled outside work. Because
these people are motivated by stability and predictability, they are
more concerned with their pay, benefits and work environment than
with the actual nature of their work. Employees with this career
anchor like tasks and policies to be clearly codified and defined. They
identify strongly with their organisation, whatever their level of
responsibility.
Entrepreneurial creativity. These people are motivated by the
challenge of starting new projects or businesses. They have many
interests and large amounts of energy, and often have many projects
going at once. This anchor differs from autonomy in that the emphasis
is on creating new business – people with this competence often
display this ability at an early age. In the work situation, they have a
strong need to create something new: they get bored easily and are
restless and constantly seeking new creative outlets. They like
situations in which they can develop new practical solutions and
inventions of various kinds.
Service/dedication to a cause. These people are motivated by their
deep-seated core values rather than the work itself and have a strong
desire to make the world a better place. They are found mainly in
service-oriented professions and helping organisations such as NGOs,
CBOs, charitable organisations, and so on. They are motivated by the
pursuit of values and causes rather than by money or prestige.
Pure challenge. These people are motivated by a strong desire to
solve difficult problems and overcome obstacles. They enjoy
conquering difficult situations and beating others. As a result, they are
constantly testing their own limits and pushing boundaries, and thus
come across as single-minded individuals. They like situations where
competition is of absolute importance.
Lifestyle. These people have a great need to balance work and other
aspects of life. Although they enjoy work, they also realise that it is
just one aspect of life. They believe in “working to live”, rather than
“living to work”. As a result, they look for jobs and careers that can
be easily integrated with the rest of their lives. They therefore look for
work with organisations that accept and promote balance, and may
even be unwilling to relocate if this is likely to affect this balance.

The manifestation of these career anchors in a person can be measured by Schein's (1995) Career Orientations Inventory. It is also important to
realise, however, that most people do not have one very strong anchor
and that at different times different anchors may become more
important. For example, people who are driven by the need for
technical/functional excellence when they are young may become more
interested in general management as they grow older. Similarly, lifestyle
tends to be more important to young people and becomes less so as their
careers mature. Nevertheless, it is unlikely that a keen surfer will be
happy to work far away from the sea, or that someone who is very keen
on game viewing will be content to spend all his time in a large city. In
the 1970s and 1980s, fairly consistent results were obtained from tests
Schein conducted, with roughly 25 per cent of the candidates being
anchored in “General management”, another 25 per cent in
“Technical/functional” competence, and ten per cent each in
“Autonomy” and “Security”. The rest were spread across the remaining
anchors (Schein, 1990). In recent times there appears to have been a
move toward autonomy and a balanced lifestyle, and away from
security, reflecting changes in the way organisations are currently
managed. There has also been an increase in the number of people with
a service orientation, especially around environmental issues (Schein,
1996).
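
As a rough illustration of how an anchor profile might be derived from an inventory of this kind, the sketch below averages item ratings per anchor and ranks the anchors from strongest to weakest. The items, the rating values and the mapping of items to anchors are assumptions made purely for illustration; they do not reproduce the actual content or scoring of Schein's Career Orientations Inventory.

# Illustrative sketch (not from the text): ranking career anchors from a set
# of self-ratings, in the spirit of an anchor inventory. The item-to-anchor
# mapping and the ratings below are hypothetical.

from collections import defaultdict
from statistics import mean

# Hypothetical responses: each item is tagged with the anchor it is meant to measure.
responses = [
    ("technical_functional", 5), ("technical_functional", 6),
    ("general_management", 3), ("general_management", 2),
    ("autonomy", 4), ("autonomy", 5),
    ("security", 2), ("security", 1),
    ("entrepreneurial", 3), ("service", 4),
    ("pure_challenge", 5), ("lifestyle", 6),
]

def anchor_profile(items):
    """Average the ratings per anchor and return anchors sorted from
    strongest to weakest."""
    by_anchor = defaultdict(list)
    for anchor, rating in items:
        by_anchor[anchor].append(rating)
    averages = {anchor: mean(ratings) for anchor, ratings in by_anchor.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for anchor, score in anchor_profile(responses):
        print(f"{anchor:22s} {score:.1f}")

The point of the exercise is simply that the dominant anchor emerges from the relative ordering of the averaged scores, not from any single item.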

15.4 Assessing individual characteristics

Now that we have a reasonable idea of how jobs and careers are
structured in terms of the six RIASEC dimensions and four educational
levels, the next step is to assess the presence and relative strength of
each of these factors in a person. In general, five different sets of
competencies exist:

1. Abilities
2. Interests
3. Personality
4. Values
5. Motivation and/or ability to study further

On closer examination, these five competencies collapse into two sets of factors: a person's abilities (including the ability and willingness to
study further), and a combination of values, interests and personality
which determines the type of work a person is willing to do or is
interested in doing.

15.4.1 Ability
Abilities can be divided into two broad categories, namely general
ability (often called intelligence or “g”) and specific abilities or
aptitudes.
15.4.1.1 General ability
Because general ability (“intelligence”) plays an important role in
determining the level of education that can realistically be achieved, any
form of assessment for career purposes should include some general
measure of this. Numerous scales (both verbal and non-verbal) are
available. A relatively simple and culture-fair measure is Raven’s
Standard Progressive Matrices, used as a timed test with a 30-minute limit. It can be used over a relatively broad educational range (to
university level), provided suitable norms are available.

15.4.1.2 Specific aptitudes


In addition to a general measure of ability, there are a number of
aptitude batteries for assessing abilities in areas such as language usage,
verbal reasoning, numerical reasoning, clerical aptitude and mechanical
ability. At the school-leaving level, the South African Differential
Aptitude Test (DAT) battery developed by the Human Sciences
Research Council (HSRC) (Coetzee & Vosloo, 2000), is a widely used,
culture-fair battery, although the norms may be a little low. Local test
vendors such as SHL, Psytech and Jopie van Rooyen & Associates all
provide various stand-alone tests especially for numerical reasoning,
verbal reasoning and clerical aptitude. These can be used together to
form a meaningful battery for career purposes. In addition, there are
specialised tests for predicting competence in areas such as information
technology and sales. See Appendix 1 for a discussion of various tests
and the specific meanings of these terms.

15.4.2 Values, interests and needs


The second group of factors or competencies associated with career
success are those related to a person’s likes and dislikes. There is little
purpose in recommending a career route that is not aligned with a
person’s basic value system or personality structure. (However, bear in
mind that people change and that they may learn to appreciate a career
that initially seemed unsuitable.)

15.4.2.1 Values
According to Katz (1983, p. 17), “… values represent feelings (and
judgment) about outcomes or results, such as the importance, purpose or
worth of an activity”. Values therefore lay the foundation for our
attitudes, motivation and job satisfaction (Robbins, 1996, p. 174). Nevill
and Super (1986) argue that one of the main reasons for job
dissatisfaction is a poor match between the satisfaction of work-related
needs and post requirements (cited in Langley, Du Toit & Herbst, 1995,
p. 3). It follows from this that the process of values clarification is very
important in career planning, and therefore any technique that helps
people clarify what they value in terms of work style will assist them in
making more fulfilling and rewarding career and employment decisions.

To assess values, we return to Holland's RIASEC model. There are several scales for determining where a person stands on these six
factors. Just as jobs are rated in terms of their three dominant RIASEC
factors, so too can people be rated on the same factors. Measures such as
the Self-Directed Search (SDS) and the Vocational Preference Inventory
(VPI) developed by Holland (1985) can be used for this purpose.
Numerous versions of these scales have been developed by various
consultants and agencies, both in South Africa and internationally. A
search for “Holland”, “RIASEC” or “career advice” on the Internet will
bring up many of these scales.

15.4.2.2 Interests
Many of the scales that measure career interests are based on Holland’s
RIASEC model of six vocational personality types. However, there are a
number of other occupational interest scales such as the Kuder
Occupational Interest Survey (Kuder), the 19-Field Interest Inventory
(19FII), the Jackson Vocational Interest Survey (JVIS) and the Strong
Interest Inventory (SII). These scales measure people’s interests in a
broad range of occupations, work activities, leisure activities and school
subjects, and compare these with the interests of people successfully
employed in the particular occupations. Organisations such as SHL and
Psytech have their own scales such as the Occupational Interest
Inventory and the Occupational Interest Profile. They produce scores on
various dimensions relevant to choosing a career.
Research (e.g. Haverkamp, Collins & Hansen, 1994; Ryan, Tracey &
Rounds, 1996; Fouad, Harmon & Borgen, 1997; Day & Rounds, 1998)
suggests that there is a certain universality to vocational interest
structure, such as that expressed by Holland’s vocational personality
types, which cuts across ethnicity, gender and socioeconomic status.
This conclusion is not unanimous, however (see, for example, Rounds &
Tracey, 1996).

15.4.2.3 Aspiration level


In addition to the person’s general intellectual ability, interests and
values, an important consideration is his motivation to study further and
the opportunities for him to do so. In a country where higher education
is expensive and access limited, cost can be a major limiting factor.
Advising a person to study further when he does not have the financial
means can be counterproductive. Nevertheless, notwithstanding this
possible major obstacle, a person with vision, drive and grit who really
aspires to higher education can find financial assistance (bursaries,
loans, etc.) to make his dream a reality.

15.4.3 Personality
Personality plays an important role in career choice, because it
determines the kinds of situation in which a person is most comfortable.
As Chapter 11 states, a good definition of personality is that it is the
preferred way of perceiving and interacting with the environment. In
other words, it is our personality that directs our ongoing attention to
various aspects of our world, and thus has a major role in determining
both what kind of work we choose and how happy we will be in
particular situations. For example, an outgoing energetic personality will
not be happy in a situation that requires close attention to detail over a
long period of time. Similarly, a shy, retiring person will not be happy or
successful in situations that require high levels of drive and enthusiasm.
As a result, an important aspect of advising a person about his career
choice is to assess his personality. This topic is dealt with extensively in
Chapter 11. Various assessment measures can be used, including those
based on the 16PF and Jung’s theories. SHL has developed a
comprehensive instrument called the Occupational Personality
Questionnaire (OPQ), which is very useful in this context, although it is
computer administered and scored at a cost.

15.4.3.1 The Myers-Briggs type indicator


One approach to personality assessment that is relatively cost effective
is the Myers-Briggs Type Indicator (MBTI), which is based on Jung's theories.
Very similar scales exist in the form of the Jung Personality
Questionnaire (JPQ) and the Keirsey Temperament Sorter (Keirsey,
1998). As Chapter 11 points out, the MBTI approach to assessing
personality identifies four different factors: extraversion/introversion (E-
I), intuition/sensing (N-S), thinking/feeling (T-F) and perceiving/judging
(P-J). These dimensions are described as follows:
EXTRAVERSION (E) VERSUS INTROVERSION (I)
The first dimension identified by the MBTI is extraversion versus
introversion, and has to do with whether the person focuses his attention
outwards or inwards in his dealings with others and with the world in
general. Extraverts focus their energy and attention outside of
themselves. They naturally seek out others, whether one-on-one or in
large groups. Because extraverts need to experience the world in order
to understand it, they thrive on lots of activity. When looking at any
situation, extraverts ask themselves: “How do I affect this?”

Introverts, on the other hand, enjoy spending time alone in order to “recharge their batteries”. Because they try to understand the world
before they experience it, much of their activity goes on mentally, in
their inner world. Where an extravert might find too much time alone
draining and counter-productive, an introvert may become turned off
and tired by the clamour of a cocktail party. Introverts step back to
examine a situation, asking themselves: “How does that affect me?”

EXTRAVERTS (E) | INTROVERTS (I)
Act, then think | Think, then act
Tend to think out loud | Think things through in their heads
Talk more than listen | Listen more than talk
Communicate with enthusiasm | Keep their enthusiasm to themselves
Respond quickly; enjoy fast pace | Respond only after thinking things through
Prefer breadth to depth | Prefer depth to breadth

Career implications of E/I


In the workplace, extraverts gravitate to jobs that allow a good deal of
verbal interaction with others. They like jobs with variety and activity,
and often become impatient with long, slow jobs. Extraverts are
interested in the activities related to their work and in how other people
address them. They like jobs in which they are able to act quickly,
although they are sometimes guilty of acting impulsively and not
thinking through a matter. They find phone calls a welcome diversion
from the routine of work, and like to have people around so that they can
develop ideas through discussion.

There is some confusion about whether the spelling of these terms is “extravert”
or “extrovert” and “intravert” or “introvert”. Both are acceptable, and in this text
“extravert” and “introvert” are used.

Introverts, on the other hand, develop ideas by reflection and therefore do well in situations that require focus and managing one task at a time.
They like to think extensively before they act, and are best at jobs that
allow quiet times for concentration. Sometimes they think without
acting. They tend not to mind working uninterruptedly on one project
for a long time. When concentrating on a task, they do not like
interruptions and find phone calls intrusive. They like working alone.
SENSING (S) VERSUS INTUITION (N)
The second dimension of the MBTI concerns the kind of information
people prefer to acquire. Some people focus on “what is”, while others
see “what could be”. Although both approaches are valid, they are
fundamentally different. Sensors literally gather data using their five
senses. They establish what exists, and base their decisions on what they
can observe and have observed, concentrating on what they see, hear,
touch, smell and taste. Sensors trust what is real and concrete, and seek
documentation and measurement to back it up. They tend to focus on
immediate experience, on facts, details and practicalities in the present
context. They tend also to have keen powers of observation and a good
eye and memory for detail.

Intuitives like to read between the lines and look for meaning among the
hard facts. They use their imagination, and trust their own intuition and
hunches. They see the big picture and are oriented towards the future
and what might be, rather than what is. They focus on possibilities,
meanings and relationships, and tend to be more concerned with broad
principles and patterns than with fine detail. Intuition allows perceptions
beyond what is visible to the senses, and tends to focus on the abstract
and creative future. Obviously, we all use our five senses to relate to the
world, but each of us also has a preference for the kind of information
we take in.

SENSORS (S) | INTUITIVES (N)
Like new ideas only if they have practical applications | Like new ideas and concepts for their own sake
Value realism and common sense | Value imagination and innovation
Like to use and polish established skills | Like to learn new skills; easily become bored after mastering a skill
Tend to be specific and literal, give detailed instructions | Tend to be general and figurative, and to use metaphors and analogies
Present information in a step-by-step manner | Present information through leaps, in a roundabout way

Implications of S/N for careers


Sensors prefer to work with real things and to apply their past
experience to solve problems in a concrete way. They therefore like and
do well in situations where they can use experience and standard ways
to solve problems. They like to do things with a practical bent, and
seldom make errors of fact. They enjoy presenting the details of their
work first, and prefer fine-tuning an existing project rather than trying
new things. They usually proceed in a step-by-step fashion and generally
distrust and ignore their inspirations. Sensors are therefore happiest in
careers that require careful attention to detail and to following prescribed
methods and processes, such as in the “conventional” professions and
(to a lesser degree) “realistic” professions associated with Holland’s
RIASEC model in Table 15.1.

Intuitives apply their appreciation for complexity to solving new and complicated problems that are more theoretical and require them to use
their imagination. They prefer change, sometimes radical change, to the
continuation of the existing, and like to do things that are innovative and
creative. They enjoy learning a new skill more than using it, and usually
proceed in bursts of energy. Intuitives follow their inspirations, good or
bad, even when they may make errors of fact. As a result they often find
themselves in creative professions. They tend to have little patience with
those who follow rules and are conservative and conventional. They like
to present an overview of their work first.
THINKING (T) VERSUS FEELING (F)
The third dimension of personality addressed by the MBTI refers to the
way in which people make decisions and reach conclusions. “Thinking”
suggests impersonal decision making, whereas “feeling” refers to
decisions based on personal values.

THINKERS (T) | FEELERS (F)
Step back, apply impersonal analysis to problems | Step forward, consider effect of actions on others
Value justice and fairness, one standard for all | Value empathy and harmony; see the exception to the rule
Naturally see flaws and tend to be critical | Naturally like to please others; show appreciation easily
May be seen as heartless, insensitive and uncaring | May be seen as over-emotional, illogical and weak
Consider it more important to be truthful than tactful | Consider it just as important to be tactful as truthful
Believe feelings are valid only if they are logical | Believe any feeling is valid
Are motivated by a desire to achieve and accomplish | Are motivated by a desire to be appreciated and to help others

Thinkers pride themselves on their ability to be objective and analytical. They prefer decisions made on the basis of logic and principle, and they
look for cause-and-effect relationships. As a result, they tend to be seen
as impersonal and analytic, and concerned with abstract principles such
as truth, justice and fairness. They are objective and principle oriented in
their approach to decisions.

Feelers consider how much they care about an issue and what they feel
is right; they pride themselves on their empathy and compassion. They
come to decisions by weighing relative values and the merits of the
issues in a situation. They have a capacity for warmth, human concern
and preservation of traditions and values of the past. They are more
subjective and person oriented in their approach to decisions.
Implications of T/F for careers
Thinkers prefer work that allows them to analyse things logically and to
apply objective criteria to decision making. They need to work in
situations where they can use logical analysis to reach conclusions and
are able to base their decisions on the principles involved in the
situation. Because of this approach, they tend to decide impersonally,
sometimes paying insufficient attention to other people’s wishes and
sometimes inadvertently hurting people’s feelings. They tend to have
firm views, and can give criticism when appropriate – this allows them
to work in situations where there is tension and little harmony. They feel
rewarded when a job is done well, whether or not the people involved
are happy with the process. They are best suited to jobs where principles
and precedent count more than relationships and harmony.

Feelers often experience their greatest satisfaction in helping people, either directly or indirectly. They do best in situations where people use
values to reach conclusions and work best when they are in harmony
with others. They enjoy pleasing people, even in unimportant things,
tend to be sympathetic, and dislike, even avoid, telling people
unpleasant things. They prefer to look at the underlying values in a
situation, and feel rewarded when people’s needs are met, as is the case
in the social and helping professions. However, they often allow
decisions to be influenced by their own and other people’s likes and
dislikes. Feelers are best suited to careers where interpersonal relations
and empathy are important, such as those associated with the “social”
and “enterprising” categories in Table 15.1.
JUDGING (J) VERSUS PERCEIVING (P)
The final dimension of personality concerns people’s preference for
structure or spontaneity in their lives. People with a preference for
structure are happiest when they have made a decision, have settled
matters, and have good structure in their lives. They are seen as judging.
Judgers seek to organise and control their lives, but are not necessarily
judgemental (opinionated). They are concerned with making decisions,
seeking closure, planning operations or organising activities. People
who prefer judging are often seen to be well organised, purposeful and
decisive.

Perceivers are attuned to incoming information. They are seen as spontaneous and adaptive people, open to new events and changes. They
aim to miss nothing. They prefer flexibility, and like to stay open to all
kinds of possibilities. They seek to understand and experience life,
rather than to control it. A fundamental difference between judgers and
perceivers is their reaction to closure. Judgers feel tension until closure
is reached and are naturally attracted to making decisions. Perceivers
avoid closure at all costs since they regard decision making as a stressful
removal or termination of their options.
Career implications of J/P
Judgers prefer jobs where they get to make decisions that exercise
control. They work best when they can plan their work and follow their
plans. They like to get things settled and finished, and reach closure by
deciding quickly. They seek structure, and draw up schedules for most
things, using lists – for themselves and for others – to prompt action on
specific tasks. They tend to be satisfied once they reach a decision,
therefore they make good managers. However, they may not notice the
need for innovation or change. They are much better at day-to-day
management than at long-range, more strategic and innovative planning.

Perceivers prefer work situations that accommodate their need for flexibility and their talents for adapting. They tend to be curious and to
welcome new light on a matter, which means that they adapt well to
changing situations.

JUDGERS (J) | PERCEIVERS (P)
Have a work ethic: “Work first, play later” | Have a play ethic: “Enjoy now; finish the job later (if there’s time)”
Set goals and work towards achieving them on time | Change goals as new data become available
Prefer knowing what they are getting into | Like to adapt to new situations
Are product oriented (emphasis on completing the task) | Are process oriented (emphasis on how task is performed)
Derive satisfaction from finishing projects | Derive satisfaction from starting projects
Take deadlines seriously | See deadlines as elastic

However, they tend to leave issues open for last-minute changes and to
postpone decisions while searching for options. They feel restricted
without change and are inclined to postpone doing unpleasant tasks.
This creates problems if they are in a situation where tight deadlines and
routine decisions have to be made. They also use lists to remind them of
all the things they have to do. If they have the ability, perceivers do well
in the “investigative” and “artistic” professions related to the categories
in Table 15.1.

These four factors can be combined to give 16 different combinations such as ENTJ or ISFP, which are used to describe an individual's
personality. Each of the 16 combinations has particular strengths and
weaknesses associated with it, and each type of person is likely to be
comfortable in one kind of job and uncomfortable in another. These 16
types are sometimes labelled as shown in Table 15.2.
Table 15.2 Role labels of the MBTI types

Type Role
ISTJ The Duty Fulfillers
ESTJ The Guardians
ISFJ The Nurturers
ESFJ The Caregivers
ISTP The Mechanics
ESTP The Doers
ESFP The Performers
ISFP The Artists
ENTJ The Executives
INTJ The Scientists
ENTP The Visionaries
INTP The Thinkers
ENFJ The Givers
INFJ The Protectors
ENFP The Inspirers
INFP The Idealists

How would you describe yourself?
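
Purely to illustrate the mechanics, the sketch below shows how four dichotomy preferences combine into one of the 16 type codes and how a role label from Table 15.2 might then be looked up. The numeric preference scores and the simple sign-based rule are hypothetical assumptions; the actual MBTI is scored from standardised items using the publisher's scoring keys.

# Illustrative sketch (not from the text): combining four dichotomy
# preferences into a 16-type code and looking up the role label from
# Table 15.2. The scores and the scoring rule are hypothetical.

ROLE_LABELS = {
    "ISTJ": "Duty Fulfiller", "ESTJ": "Guardian", "ISFJ": "Nurturer",
    "ESFJ": "Caregiver", "ISTP": "Mechanic", "ESTP": "Doer",
    "ESFP": "Performer", "ISFP": "Artist", "ENTJ": "Executive",
    "INTJ": "Scientist", "ENTP": "Visionary", "INTP": "Thinker",
    "ENFJ": "Giver", "INFJ": "Protector", "ENFP": "Inspirer",
    "INFP": "Idealist",
}

def type_code(e_minus_i: int, s_minus_n: int, t_minus_f: int, j_minus_p: int) -> str:
    """Each argument is a hypothetical preference score: positive favours the
    first pole of the pair, negative the second."""
    return (
        ("E" if e_minus_i >= 0 else "I")
        + ("S" if s_minus_n >= 0 else "N")
        + ("T" if t_minus_f >= 0 else "F")
        + ("J" if j_minus_p >= 0 else "P")
    )

if __name__ == "__main__":
    code = type_code(e_minus_i=-4, s_minus_n=-2, t_minus_f=5, j_minus_p=3)
    print(code, "-", ROLE_LABELS[code])  # INTJ - Scientist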

We should note that the theory of psychological types reflects a person's preferences and not his intelligence, abilities or likelihood of success.
However, if people find themselves in work or social situations that
match their preferences, they are likely to be happier and more
motivated to succeed than if there is a mismatch between the demands
of the situation and the person-preferred behaviour. It must also be
remembered that all people have a certain amount of each personality
characteristic, and that describing someone in terms of a type does not
mean that he cannot display any other characteristics. All such a
description does is suggest how each person normally prefers to behave.
(See also the criticisms of the MBTI given in section 11.3.6, especially
with regard to the Forer Effect described in Sidebar 11.2.)
We should not forget that career choice is only one step in the much
longer process of career management, and that this requires ongoing
fine-tuning and occasional radical reassessment – neither jobs nor
people remain constant, and new opportunities may arise. This may
require a mid-life reappraisal of interests and career direction. Because
the notion of a career has changed, a person’s career management is no
longer the responsibility of the organisation that employs him. Rather,
the responsibility of managing a career now rests solely on the person’s
own shoulders.

15.5 Summary

In this chapter, we discussed the changing definition of a career, illustrating that “career” no longer refers to a lifetime of employment in
a single organisation, but rather to a series of positions and assignments
that use a particular set of skills and abilities. Career progress is no
longer the upward mobility of people within an organisation, but rather
the development of increasingly sophisticated competencies that can be
applied in different situations. Sound career choice, however, depends
on knowing what jobs are available, what each job requires, what a
person’s interests are, and his strengths and weaknesses in each key area
of any prospective job or career. It can also be expected that most people
will change career paths several times, although most of these changes
will be to similar and compatible jobs.

In view of this, we covered job and/or career characteristics, including the theories and models of Holland (RIASEC) and Schein (career
anchors). We then discussed ways of assessing individual
characteristics, including ability, values, interests and aspirations, as
well as the role of personality. Career choice and career success are then a
matching process: people seek out and succeed in those jobs and/or
careers that are aligned with their ability and personality and their
values, interests and needs.
Additional reading

For an overview of the management of the career process, see Chapter 12 in Crafford,
A., Moerdyk, A.P., Nel, P., O’Neill, C. & Schlechter, A. (2006). Industrial psychology:
Fresh perspectives.
People interested in using testing for employment equity purposes are referred to the
Public Service Commission (of Canada) (2006), Standardized testing and employment
equity career counselling: A literature review of six tests. The purpose of this document,
developed by the Personnel Psychology Centre (PPC) of the Canadian Public Service
Commission, is to summarise research evidence in respect of the use of standardised
tests and other assessments in the career counselling of employment equity (EE)-
designated group members (i.e. aboriginal peoples, persons with disabilities, visible
minorities and women). Although not directly applicable to the South African situation, it
raises numerous issues of importance to the local context. This is available at
http://www.psc-cfp.gc.ca/ee/eecco/intro_e.htm
Pearson Publications offers a career assessment tool known as IDEAS™ (Interest
Determination, Exploration and Assessment System®), which is designed to be used in
conjunction with career exploration and guidance units. According to the brochure, the
IDEAS inventory offers 16 basic scales based upon the RIASEC themes, and helps
both students and adults to develop an awareness of possible career choices. The
IDEAS test can be accessed at http://www.pearsonassessments.com/tests/ideas.htm

Test your understanding

Short paragraphs

1. Briefly describe Holland’s hexagonal model of career types.
2. Outline Schein’s theory of career anchors.

Essay

According to Holland, career choice and success are based on a match between an
individual’s personality and the characteristics of the job and/or career. Discuss this
argument, showing how it should be used in assessing a person seeking career advice.
16 Interviewing

OBJECTIVES

By the end of this chapter, you should be able to

describe the elements of an interview
show why interviews should be seen as a form of psychological assessment
outline what the law says about interviewing
describe the various stages in an interview
show which behaviours should be avoided and which should be developed in the
interview situation
list the major problems associated with interviewing
show what can be done to improve the interviewing situation.

16.1 Introduction

Interviewing is an important aspect of assessment in many areas of organisational management – from selection to performance
management and from counselling to discipline. In this chapter, we
focus on interviewing as a source of information, especially in the
selection context. As a result, we will focus on the interview as an
assessment process and evaluate it in terms of the psychometric
principles of reliability, validity and fairness.

16.1.1 Definition
According to Cohen and Swerdlik (2002, p. 410), “[a]n interview is a
technique for gathering information by means of discussion”. In general,
interviews are relatively unstructured procedures designed to obtain
information about an individual that is not readily available via more
formal psychological assessment techniques. There are various forms of
interviews, ranging from the completely unstructured approach to highly
structured approaches. Interviews are used for both diagnostic and
assessment purposes in the work situation. Although they have elements
in common, in some respects these two applications are very different.

16.1.2 Users of the information


A useful way of categorising interviews is according to who makes use
of the information that is obtained by the interviewer. In general,
information obtained from interviews is used by three different parties:
the organisation, the interviewer and the interviewee:

Organisation. In the case of selection interviews, the information is used mainly by the organisation, insofar as it is a method for
determining the individual’s strengths and potential weaknesses in
relation to specific jobs and work situations. The information is thus
used to assist in the selection process.
Interviewer. In managing performance and trying to understand why
specific problems have occurred, the interview information is of
primary importance to the interviewer. Understanding the problems
that have been experienced and/or the hopes and aspirations of the
interviewee places the interviewer in a better position to manage the
situation.
Interviewee. In the case of performance management and/or
counselling situations, the information is of particular interest to the
interviewee, who comes to understand why he behaves in a certain
way and what can be done to change or make better use of these
behavioural and thought patterns. In some work situations, both
interviewer and interviewee use performance management interviews
to set performance targets, to discuss levels of performance achieved,
and/or to decide on steps to improve performance.
16.2 Employment interviews

Employment interviews are used mainly for selection purposes, although they are also used extensively for performance appraisal and
management. In looking at interviewing, McIntire and Miller (2000, p.
329) distinguish between traditional and structured interviews. In
addition, we can also identify semi-structured and counselling
interviews.

16.2.1 Traditional interviews


In traditional interviews*, the interviewer (or panel of interviewers)
asks different questions of each candidate in order to come to a clear
understanding of the person’s strengths and weaknesses, interests and
aspirations. According to McIntire and Miller (2000, p. 328), this type of
interview serves the “getting to know you” function. In this respect, this
approach to job interviewing has a number of weaknesses related to
reliability and validity. (These are discussed in greater depth in section
16.3.) The Employment Equity Act prohibits the asking of certain
questions which could violate a person’s right to privacy.

16.2.2 Structured interviews


Structured interviews* involve the use of set questions that are
presented to all applicants in a specific order and are based directly on
the content of the job being applied for – all candidates are asked the
same questions. This implies that there is a comprehensive job
description for the particular job. In addition, interviewers need to be
trained in proper interviewing techniques and the scoring system – job
interviews are usually scored in terms of a clearly defined rating system.
As a result, it is much easier to calculate psychometric properties such as
inter-rater reliability, internal consistency and concurrent validity. This
information is essential to disprove charges of discrimination and
adverse impact, should such charges arise.

16.2.3 Semi-structured interviews


A compromise between the relative flexibility (but potential bias) of
unstructured interviews and the rigidity of structured interviews is found
in the semi-structured interview*. In this approach, a list of
predetermined questions is presented, allowing the interviewer to change
the order of item presentation and to ask follow-up questions when it is
necessary to probe for more information. However, it is essential that all
key questions are asked during the interview.

16.2.4 Counselling interviews


Traditional, structured and semi-structured selection interviews are
conducted to find out fairly specific details about the person’s
background, experience and motivation patterns to see if they will fit
into the organisation and meet its job-related needs. Counselling
interviews, on the other hand, are designed to manage poor performance
and change an employee’s behaviour and thought patterns.

Counselling interviews are less about gathering information and more
about addressing problems being experienced by the employee in
meeting the employer’s needs. They can also involve disciplinary steps
if required. As a result, they are unique to each situation and are more
therapeutic than psychometric in nature. (However, see section 16.4.7
for a brief discussion on structured clinical interviews.) Self-insight is
the key success factor.

In some cases, workplace interviews are used in an effort to change the
employee’s behaviour and take the form of disciplinary interviews and
counselling interviews in which self-insight is a key success factor. In
this form, they become client-focused interviews. Because
organisational interviewing is usually tied directly to job prospects
(selection, promotion, performance management, i.e. bonuses), it is
important that opportunities for bias and wrong decisions are minimised.
This takes interviewing into the realm of psychometric testing, and the
issues of reliability, validity, content sampling, etc. need to be attended
to. (The importance of having a proper job description prior to any
selection, promotion and disciplinary processes cannot be
overemphasised!)
In employment interviews, the goals are generally known beforehand –
to fill a particular post, or to address performance problems in line with
the person’s job description. This allows the interview to be carefully
structured. In addition, work-related interviews are designed to obtain a
few relatively concrete facts about work history, skills, values, attitudes
and various other work-specific competencies. This is quite different
from a clinical interview, which also tries to obtain information,
although the specific goal of a particular clinical interview depends
almost entirely on the needs of the interviewee. As a result, the exact
goals of a clinical interview may be difficult to determine in advance,
and the clinician almost has to play the role of a detective. Where
counselling interviews form part of the process of managing the
employee and are used in an effort to change the interviewee’s thought
and behaviour patterns, the selection interview is far more concrete in its
objectives.

16.3 Problems associated with interviews

16.3.1 Reliability
There is a great deal of evidence to show that interviewers in general are
relatively consistent in their assessment and evaluation of people if the
interview is repeated (or a video recording of it is reviewed several times).
In other words, there is high test-retest reliability. However, especially in
unstructured interview situations, there is little agreement between
interviewers, resulting in low inter-rater reliability. Unless the interview
is carefully structured beforehand, the interviewers are trained to follow
the structure and to record information in a consistent way, and they then
follow that structure closely, there is a strong possibility that the
interview will be unreliable: no two interviewers will cover the same
ground, resulting in poor internal consistency.
Remember – any assessment that is not reliable cannot be valid!
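
As a rough illustration of how inter-rater agreement can be checked once interviews are scored on a common scale, the Python sketch below correlates two interviewers' ratings of the same candidates. The ratings are invented, and a simple Pearson correlation is used here as a stand-in for more formal indices such as the intraclass correlation or Cohen's kappa.

# Illustrative only: ratings two interviewers gave the same ten candidates
# on the same structured schedule (invented numbers).
rater_a = [4, 3, 5, 2, 4, 3, 5, 1, 4, 2]
rater_b = [4, 2, 5, 3, 4, 3, 4, 2, 5, 2]

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

# A value close to 1 indicates good inter-rater agreement; unstructured
# interviews typically yield far lower values.
print(round(pearson_r(rater_a, rater_b), 2))

Without a common schedule and scoring system there is nothing comparable to correlate, which is one reason why unstructured interviews fare so poorly on this criterion.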

16.3.2 Validity
Although clinicians regard the interview as an extremely useful source
of information, there is little evidence that the clinical or counselling
interview is either reliable or valid. Yet it continues to be used –
possibly because clinicians feel that it ought to work and that, given their
training, they should be better than anyone else at observing and judging
another person's behaviour. The evidence runs counter to this. As early as
1954, Meehl showed that decisions based on unstructured, non-directive
interviews were no more accurate, and often less so, than those using more
structured techniques. These findings have been supported by later research
(e.g. Dawes & Corrigan, 1974; Goldberg, 1970; Wiggins, 1973). In general,
the validity of unstructured interviews tends to be around 0,10 to 0,15,
whereas properly structured interviews linked to specific conditions
(clinical) or job descriptions (work related) achieve validities in the
region of 0,5 and above.

Hough and Oswald (2000, p. 646) show that the vast majority of cases in
the US federal courts contesting selection and other workplace decisions
involve unstructured interviews. A review of 158 US federal court cases
involving hiring discrimination from 1978 to 1997 by Terpstra,
Mohamed and Kethley (1999) showed that unstructured interviews were
challenged in court more often than any other type of selection device.
Even more important is the fact that unstructured interviews were found
to be discriminatory in 59 per cent of these cases, whereas structured
interviews were found not to be discriminatory in 100 per cent of cases.
In addition, there is a great deal of evidence to suggest that even when
interviews are valid predictors of later job performance, the inclusion of
interviewing does not add to the overall validity of the selection process.
This lack of incremental validity* has been reported by Cortina et al.
(2000) and by Walters, Miller and Ree (1993), among others.
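
To see what a lack of incremental validity means in practice, the sketch below (using NumPy and invented numbers) compares the proportion of variance in job performance explained by a test battery alone with that explained when interview ratings are added; the gain between the two figures is the incremental validity of the interview.

import numpy as np

# Invented data for eight employees: a cognitive test score, an interview
# rating and a later job-performance rating (illustrative only).
test      = np.array([55, 62, 48, 70, 66, 50, 75, 58], dtype=float)
interview = np.array([ 3,  4,  2,  4,  5,  3,  4,  3], dtype=float)
perf      = np.array([60, 68, 50, 74, 69, 55, 78, 62], dtype=float)

def r_squared(predictors, outcome):
    """Proportion of variance in the outcome explained by the predictors."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ beta
    return 1 - (residuals ** 2).sum() / ((outcome - outcome.mean()) ** 2).sum()

r2_tests_only     = r_squared([test], perf)
r2_with_interview = r_squared([test, interview], perf)

# If this difference is close to zero, the interview adds little or nothing
# to the prediction already made by the rest of the battery.
print(round(r2_with_interview - r2_tests_only, 3))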
16.4 Reasons for poor reliability and validity

16.4.1 Theoretical orientation


The theoretical orientation of the person drawing up the interview
determines what questions are asked and what is seen as a good
response. It is well known that a key ingredient for behavioural or
attitudinal change is the interviewee’s acceptance of the interviewer’s
approach. Non-acceptance will lead to a lack of cooperation and an
unwillingness to contribute any information, thus making the interview a
failure.

16.4.2 Experience of the interviewer


Research (for example by Graves, 1993) has shown that inexperienced
interviewers often commit basic errors and come across as cold and
defensive. This leads to a lack of rapport and unsatisfactory interview
outcomes. When interviewers are seen as warm, open and friendly, the
interview is rated far more positively, resulting in better data being
obtained and the content being better communicated and thus more
readily accepted. (See McIntire & Miller, 2000, p. 329.)

Even experienced interviewers may be prepared to admit that the


interview is generally not a good technique, but individually most feel
that they are able to carry out an interview properly, even if it is evident
that very few of their colleagues can do this. Some interviewers rely
heavily on interviews because they feel that the alternative methods do
not provide them with the information they need. They argue that even
though the validity of the information may be open to doubt, it is useful
when it is used in a process of triangulation in which other information
is available.

In counselling and clinical situations, because therapy is an ongoing,


interactive process, less than valid information can be amended as the
process unfolds.
16.4.3 Sophistication of the client
Many people, especially those who are sophisticated or antisocial, have
learned to play the game and deliberately try to lead the interviewer on.
They purposely distort what they say, either as part of malingering
(“faking bad”) or in an attempt at one-upmanship. (Think of Hannibal
Lecter in the movie The Silence of the Lambs.) Such interviews cannot be
successful.

16.4.4 The nature of the problem


Some issues are more complex than others and hence difficult to deal
with. As a result, both test-retest and inter-rater reliability as well as the
validity of the information obtained are lower than in more
straightforward cases.

16.4.5 Confirmatory biases and self-fulfilling hypotheses


It has long been known in social psychology (see, for example,
Rosenthal & Rubin, 1978) that people tend to look for information that
confirms their view or diagnosis, and ignore or even reject information
that contradicts it. If information is being sought, an approach that looks
only for confirmatory evidence leads to early foreclosure in which initial
impressions often have a decisive impact on the scope and conduct of
the interview. If the first impression is accurate, a great deal of valuable
information can be obtained. If, however, the initial hypothesis is wrong,
much time is wasted in pursuing incorrect lines of reasoning – barking
up the wrong tree. Even worse, the wrong initial hypothesis could result
in a wrong diagnosis, with the interviewee feeling he is being tortured
until he confesses!

The following example of findings from organisational psychology


relating to stereotyping and other demand characteristics illustrates this
tendency to confirm initial impressions. In a performance management
situation, Fletcher (1995) shows that the scores on 13 different
performance appraisal (PA) dimensions given to employees correlate
with the same dimensions over a three-year period. In other words, the
people who were assessed during the PA tended to score high or low on
the 13 PA dimensions each year, irrespective of how hard they worked
or the results they achieved. This suggests that it is not performance but
some other factor(s) that is (are) leading to good or poor appraisals.

To support this view, in Fletcher’s study there was a strong correlation


between almost all the PA dimensions and “optimistic” as measured by
the Occupational Personality Questionnaire (OPQ). In other words,
people who got higher optimistic ratings on the OPQ received higher
ratings across the board. The same findings were reported in relation to
“outgoing” and “affiliative” (friendly). There is little reason to expect
that attributes such as “sound judgement”, “learning new skills” and
“showing initiative” (other dimensions of the PA) should relate to these
three, but they do. This is clearly an example of the halo effect.
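
A halo check of the kind Fletcher's finding implies can be done by correlating a personality scale (such as an “optimistic” score) with each appraisal dimension; uniformly strong correlations across dimensions that should be unrelated point to a halo effect. The sketch below uses invented numbers and the statistics.correlation function available from Python 3.10.

from statistics import correlation  # Pearson r (Python 3.10+)

# Invented data for six appraisees: an "optimistic"-type personality score
# and ratings on three appraisal dimensions that should be unrelated to it.
optimism = [5, 7, 4, 8, 6, 3]
appraisal = {
    "sound judgement":     [3, 4, 3, 5, 4, 2],
    "learning new skills": [3, 5, 2, 5, 4, 2],
    "showing initiative":  [2, 4, 3, 5, 4, 3],
}

# Consistently high correlations suggest that a single general impression,
# not actual performance, is driving the ratings.
for dimension, ratings in appraisal.items():
    print(dimension, round(correlation(optimism, ratings), 2))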

What is even more interesting is the fact that in job and performance
management interviews, physically attractive people are consistently
rated more positively than other people. A problem can occur with the
selection of sales staff – who may be good at selling themselves, but
then fail to sell the organisation’s products or services.

In employment and/or selection testing as well as performance appraisal


situations, interviews are widely used. Surveys show that over 95 per
cent of organisations use them as part of their selection process, and the
majority of these see the interview as the single most important
component of this. As we have seen in sections 16.4.1–16.4.5,
interviews are prone to a number of systematic errors. Sex, race, age and
physical attractiveness all influence interview outcomes. These effects
are not consistent and seem to relate to stereotyping. For example,
physical attractiveness may influence the outcome of an interview
positively for some jobs in some situations, but may also have a negative
effect for other jobs in different situations. For example, accountants are
not “supposed” to be good at sports, so someone not fitting this
stereotype may be rejected. There are numerous stereotypes about
blonde women and redheads, and these too interfere with interview
outcomes, yet organisations continue to use interviews.

16.4.6 So why do they continue to be used?


There are five reasons why interviews continue to be used.

Firstly, people believe that they know what is happening, even when others
do not, and they consistently overvalue the accuracy of their own decisions.
This overconfidence is termed the “illusion of validity”.

Secondly, the interview is often regarded as more practical than


technically superior alternatives. However, unless the interviews are
carefully structured and the interviewers are well trained, they are
neither reliable nor valid, and therefore of little practical value.

Thirdly, interviews serve purposes other than prediction of job success.


For example, in addition to the “getting to know you” function,
interviews help to create a set of expectations about the job and the
organisation. As McIntire and Miller (2000, p. 329) show, when
candidates have realistic expectations about their new position, they are
far less likely to feel let down and are likely to remain with the
organisation longer.

Fourthly, in the workplace, selection interviews are often used to sell the
organisation in an effort to make good candidates accept a job offer. The
question then arises as to who stretches the truth furthest in the interview
situation: the candidate who says what a marvellous person he is, or the
interviewer, who says what a marvellous organisation it is to work for.

The fifth reason for interviewing is to address performance or behaviour


problems. In clinical or counselling settings, the interview thus forms
part of the therapeutic process and is of great value. However, we must
not confuse this type of therapeutic interview with a data-gathering
interview. As stated in section 16.2.4, clinical interviews are less
concerned with issues of accuracy and validity because they are by nature
exploratory and discursive.

16.4.7 Improving interviewing as an assessment technique


In general, the quality of interviewing improves when the schedule is
carefully planned and structured. In addition, interviewers need to be
aware of any issues likely to impact negatively on interview validity.
The use of hypothetical situations in which a person is asked how he
would behave in a specific situation is recommended. Selection
interviewing should be based on a comprehensive job description in
which the competencies and behavioural indicators are clearly spelled
out. In this respect, job profiling techniques such as the Position
Analysis Questionnaire (PAQ) or Targeted Selection should be used.
Interviewers need to be made aware of their own biases and “pet”
theories, and trained to avoid these.

In addition, interviewers should also note the interviewee’s appearance,


dress, behaviour and body language as well as his thought processes and
content, his insight into and reasoning regarding the situation, and any
displays of judgement. (For a more in-depth look at these dimensions,
see Cohen & Swerdlik, 2002, p. 414.)

Clinical/counselling interviews can be improved by ensuring that the


interviewers are properly trained to identify causes of underperformance
and, if necessary, to refer the employee for more professional help
through the Employment Assistance Programme (EAP) or external
experts. Kaplan and Saccuzzo (2013, pp. 236–241) discuss various
structured clinical interviews for making diagnoses based on the
Diagnostic and statistical manual of mental disorders (DSM) (American
Psychiatric Association, 2000).

16.5 Stages of an interview

Selection and performance interviews can be divided into four distinct


phases, although the boundaries between these are not always obvious,
especially with a skilled interviewer. In the initial phase, the interviewer
greets the client or applicant and establishes rapport, confirms existing
information and generally sets the scene. In the middle phase, the
interviewer elaborates on the information and probes for any more that
will help him to arrive at a decision regarding the person’s behaviour, or
whether to hire him or not, and so on. In the last stage, the interviewer
asks if there are any further issues the interviewee would like to clarify.
The interview ends with the interviewer explaining what will happen
next and whether any further assessment is necessary. Then the
interview is concluded.

For the interviewer the final stage involves formulating a decision and
recording this in the person’s file. This includes the decision to hire,
shortlist or reject the person, and taking appropriate actions to
implement the decision – a job offer, an invitation to the next phase or a
“We regret” (rejection) letter. It is important to realise that job applicants
are covered by the Employment Equity Act and need to be treated as
employees.

16.6 Effective interviewing

Interviewing is the process of eliciting information from an interviewee


and getting his acceptance and support for a particular point of view.
This viewpoint can either relate to his own problems or the need to
behave in a certain socially accepted or organisationally appropriate
way.

Irrespective of whether we are trying to elicit information for selection


and performance appraisal or to find out the causes of poor performance
with a view to changing behaviour, there are certain dos and don’ts to
observe. These are listed below. We must also be aware that interview
participants affect each other’s moods – if the interviewer or interviewee
is angry or aggressive, the other party tends to react in the same way.
This is known as social facilitation* (see Kaplan & Saccuzzo, 2013, p.
226).
Some don’ts

1. Don’t use evaluative or judgemental statements. Words such as
“good”, “bad”, “excellent”, “terrible”, “stupid”, “disgusting”, and
the like signal to interviewees that they are being judged. This
inhibits them and makes them reluctant to express their opinions or
to reveal facts about themselves that may be crucial for making
appropriate decisions about them. Evaluative statements should be
avoided in almost every interview situation.
2. Don’t use aggressively probing statements that demand that the
interviewee provides more information than he is willing to give.
Although this may be the intent of the interviewer, crudely probing
questions generally give rise to discomfort and an unwillingness to
share information. Sound interviewing skills replace “Why?”
questions with more subtle responses such as “Tell me more”,
“What led to this situation?”, “I’m not sure what you mean – could
you explain it in another way for me?”. One of the tricks is for the
interviewer to deflect the issue from the interviewee to himself:
“Sorry I don’t understand”, rather than “You are not being very
clear about that”.
3. Don’t use hostile statements or any statements designed to make the
person angry or defensive. Avoid statements like “I think you are
not being completely honest with me” or any other language that
reflects one’s own hostility.
4. Don’t use reassuring language. Avoid statements like “Don’t worry,
everything will be OK”, or “Cheer up, things could be worse”.
These false reassurances are recognised by the interviewee as empty
and worthless, and will do little to improve the situation – the
chances are they will rather make it worse.
5. Don’t use closed questions – that is, questions requiring a single
“Yes/No” answer. Instead of asking “Do you like watching soccer
or rugby?”, ask a more open-ended question such as “What sport do
you enjoy watching?” or “What is it about this sport that excites
you?” Instead of asking “Would you like a salary or wage
increase?”, ask “What do you think it takes to get a pay increase in
an organisation like this?”.

Of course, there are times when you may want to break these rules – it
may be useful to become angry to make the interviewee confront an
issue he has been avoiding. However, these cases are rare and should be
considered carefully before implementation.
Some dos

1. Do use open-ended questions. Open-ended questions are those that
require more than a single word or two. They are designed to make
the interviewee bring additional and new information to the
situation.
2. Do use silence when appropriate. Many interviewers find it difficult
to remain silent for any period of time, but a pause in the
conversation often allows the interviewee to collect his thoughts and
to volunteer additional information. Of course, the silence must not
be too long as it generates discomfort. If wisely used, though, even
this discomfort can be productive.
3. Do keep the interaction flowing. There are a number of techniques
that can be used to encourage the flow of the interview. These
include the following (see also Kaplan & Saccuzzo, 2013, pp. 226–
235):

Response                   Example
Transitional phrase        “Yes, go on.” “I see.” “Uh-huh.”
Verbatim repeat            Repeats the exact words: “Your previous boss had it in for you.”
Paraphrasing/restatement   Repeats the response using other words: “So you think your boss picked on you unnecessarily.”
Summarising                Pulls together a number of different responses: “In other words, you seem to think that your boss was treating you badly.”
Clarification              Clarifies the response: “Why do you think she behaved like this? Was it just you or did she treat others in the same way?”
Empathy/understanding      Communicates understanding: “I know what you mean. It must have made you feel angry.”

To get the interview started and to help establish rapport, the interviewer
should begin with an open-ended question, followed by a statement of
understanding that indicates that he understands the issue and the way the
person feels about the situation.

16.7 Summary

In this chapter, we defined an interview as “a technique for gathering


information by means of discussion” (Cohen & Swerdlik, 2002). We
saw that there are four basic forms of interview that are used in
organisational settings, namely traditional (or unstructured), structured,
semi-structured and counselling interviews.

Interviews were shown to be somewhat reliable in terms of test-retest


reliability, but not so when the approaches of different interviewers were
compared (inter-rater reliability). There is little evidence to show that
interviewing as an assessment technique is valid, or that the inclusion of
interview data with other data adds to overall assessment validity (i.e. it
has low incremental validity). Various reasons for this were put forward,
as well as reasons for the continued use of interviewing as part of the
selection process. Some suggestions regarding possible improvements to
the interviewing process were made.

We were reminded again that a technique cannot be more valid than it is


reliable.

In conclusion, several dos and don’ts of interviewing were put forward


as ways of improving the overall reliability and validity of interviews.

Additional reading

Kaplan, R.M. & Saccuzzo, D.P. (2013). Psychological assessment and theory (Chapter
8) provides a good overview of the issues involved in interviewing as well as a number
of useful tips.
An extensive guide to the development and conducting of structured interviews is given
by the Assessment Oversight and the Personnel Psychology Centre of the Canadian
Public Service Commission (2009).

Test your understanding

Short paragraphs

1. Describe the four types of interview found in organisational settings.


2. Discuss the reliability of interviews.
3. What is known about the validity of interviews?
4. What are the major sources of weakness in interviews?

Essays

1. Describe the strengths and weaknesses of interviewing and discuss what can be
done to improve it as an assessment technique.
2. Compare and contrast selection and performance appraisal interviews with clinical
and counselling interviews.
17 Assessment centres

OBJECTIVES

By the end of this chapter, you should be able to

define what is meant by an assessment centre


describe how you would set about determining and defining the dimensions and/or
competencies to be assessed
show how you would design and/or locate the exercises to use in the assessment
process
describe how you would draw up a scoring system
show how you would set about evaluating people on the various dimensions
and/or competencies
show how you would combine the various assessment scores to arrive at a
decision
show how the basic assessment centre techniques can be used in a
developmental process
describe the steps required to set up and administer an assessment centre.

17.1 Introduction

Thus far, this book has dealt with various approaches to organisational
assessment for identifying managerial and cognitive skills. These
include techniques such as interviews, psychometric testing, work
samples, reference checks, bio-data and track records. We have also
seen the importance of triangulation and the need to use a range of
different techniques to obtain the information required for accurate
description and prediction.

An area in which almost all the theories related to assessment converge


is that of assessment centres*. This approach to assessment applies
particularly to identifying competencies at managerial and executive
levels, but has also been used at supervisory levels. It has also been
suggested for the selection of postgraduate students, especially for
aspiring clinical psychologists, for whom interpersonal skills and “ego
strength” (i.e. personal strength of mind, resilience, ability to manage
stress, etc.) are crucial.

17.1.1 Definition of an assessment centre


An assessment centre is a process for assessing a number of attributes
and competencies using a broad range of assessment techniques in a
multi-method, multi-trait and multi-observer process. According to
Arnold (2005, p. 614), an assessment centre “is an assessment process
that involves multiple exercises and multiple assessors to rate an
assessee’s performance on a series of job-related competencies”.

McIntire and Miller (2000, p. 365) see assessment centres as

large-scale replications of a job that require test takers to solve typical
job problems by role playing or to demonstrate proficiency at job
functions such as making presentations or fulfilling administrative
duties. They are used for assessing job-related dimensions such as
leadership, decision making, planning and organising.

“Assessment centre” refers to the technique and not to a place where


assessments take place. Typically an assessment centre will involve five
to eight candidates and three to four observers (two candidates per
observer), and last from one to three days, although the current tendency
is for centres to be shorter rather than longer. However, one day is
probably too short for senior positions. Assessment centres are labour
intensive and time consuming, and therefore tend to be limited to
candidates for relatively senior positions. Results obtained from
assessment centres are used to identify the strengths and weaknesses of
the candidates or staff members with a view to selection and/or
promotion. Participants are usually given a full description of their
competency profile, and guidelines on how to improve their
performance in the relevant areas.
17.1.2 Assessment centres and development centres compared
In some cases, assessment centres are used as a developmental process.
In this instance, participants are given specific and direct feedback on
their performance of various exercises, and are allowed a second and
even a third chance to perform a particular task. A very useful technique
is video recording, as it enables the participant to review his
performance to see what should have been done differently. In these
cases, the emphasis is on development rather than on assessment per se,
and the approach is called a development or experiential centre.

The main differences between assessment and development centres are


given in Table 17.1.

Table 17.1 Assessment centres and development centres

Assessment centres | Development centres*
Aimed at filling an immediate vacancy | Aimed at developing skills for the longer term
Have fewer assessors and more participants (1 to 2) | Have more assessors and fewer participants (1 to 1)
Information used for selection, therefore have competent/not yet competent criteria | Used for development, therefore the conclusions are in terms of development and not selection
Focus on what the participant can do | Focus on what the participant can learn to do
Organisation has control of the information to make selection decisions | Participant has control of the information to make developmental decisions
Line managers as assessors | Line managers as coaches and facilitators
Less emphasis on self-assessment | More emphasis on self-assessment
Feedback delayed, given at a later date | Feedback given during process, to enhance learning
No repetition of exercises allowed | Participants allowed to redo exercises in order to improve performance
Tend to be used with external job applicants | Tend to be used with internal employees for development and promotion
Note that although assessment centres and development centres both
make use of assessment, their purposes are very different.

17.1.3 Advantages of assessment centres


The advantages of using assessment centres are as follows:

They are competency based and therefore job related.


They are organisationally relevant because exercises are tailored to
meet organisation and industry needs.
They are based on real-life situations typical of those likely to be
faced in the work situation (face validity).
They use properly trained observers, some of whom are fairly senior
line managers, under the direction of a trained psychologist. (The use
of line managers helps to ensure that the decisions made are
acceptable to other line managers.)
They score different exercises for different competencies using
different assessors and combine the scores to give a single score. (The
use of multiple methods and multiple assessors is a process of
triangulation that strengthens predictive validity.)
They are relatively culturally fair (see sections 17.4.3 and 17.5).
They provide further training for both participants and observers.
They give the most accurate assessment possible.
They are defensible in court as they are based on specific job and
competency needs.
They generally leave candidates who attend them with a positive view
of the organisation, even when they are not selected.

17.1.4 Disadvantages of assessment centres


Some of the disadvantages associated with assessment centres are as
follows:

They are labour intensive.


They are time consuming.
They require specially trained assessors.
They are expensive to develop and validate.
They are costly to implement.
They remove participants and observers from the workplace for
several days, requiring substitutes to be found for this period.

Because of these disadvantages, organisations are trying to find ways of


shortening the duration by using fewer exercises and computer-based
systems where these are available. Despite these drawbacks, assessment
centres are extremely popular, especially among large organisations that
have the capacity and money to develop them. Many smaller
organisations also use them, but employ external consultants to run
them. Their popularity lies in the advantages described above, and that
they can be used for both predictive (selection, promotion) and
developmental purposes. It is estimated that the cost of selecting the
wrong person for a position can be between 1,5 and three times the
person’s annual salary (George & Reiber, 2005; see section 14.2.6). If
using an assessment centre improves the chances of making the right
decision, its cost is relatively small in comparison. Assessment centres
are especially valuable when the costs of failure or a wrong selection
decision are high, when there are many good candidates available for
selection and where the emphasis is on identifying and developing talent
for use at a later stage. Because they are so closely related to the job
content and involve several assessors, including both trained
psychologists and line managers, they can be strongly defended in court
against claims of unfair labour practice. The predictive validity of a
properly designed and conducted assessment centre can be as high as
0,80.

17.1.5 What do assessment centres measure?


Assessment centres typically assess candidates for selection or
promotion on a range of factors designed to measure most, if not all, the
following characteristics:

Knowledge (job and industry specific, general)


Skills – cognitive and professional (analytic, decision making,
leadership, job specific)
Aptitudes (ability and/or potential to learn)
Attitudes (commitment, loyalty)
Motivation (need for achievement, perseverance)
Personality (openness, conscientiousness, attention to detail,
creativity, stress management)
Social skills, intelligence and styles (management style, team role
and/or style)
Values

17.2 Identifying the dimensions (competencies) to be assessed

In the assessment centre literature, competencies are often referred to as
dimensions. For most purposes the two terms can be used
interchangeably.

There are many different sets of competencies that are used to assess
management candidates. Various consultants have their own way of
organising such competencies. Quite often, they have different sets for
different levels in the organisation and for different functional areas.
Saville & Holdsworth South Africa (SHL) (2005), one of the largest
sellers of assessment tools in South Africa, has four such lists, which
they describe as follows:
1. Universal Competency Framework™ (UCF)
Based on a model of 20 generic competencies found to contribute to
superior performance in all roles and positions in an organisation

2. Inventory of Management Competencies (IMC)


Based on a model of 16 generic competencies found to contribute to
superior performance in management and professional roles

3. Customer Contact Competency Inventory (CCCI)


Based on a model of 16 generic competencies found applicable to
superior performance for non-managerial sales and customer service
staff, covering behavioural requirements for effective customer
handling and management of the sales process

4. Work Styles Competency Inventory (WSCI)


Based on a model of 16 generic competencies associated with
successful job performance in the manufacturing and production area

17.2.1 Competencies assessed


A typical assessment centre designed for use with middle and senior
management in South Africa may categorise the competencies as given
in Table 17.2. Although this list is not exhaustive and different
organisations use different combinations, it does indicate the range of
issues used to select people into the management ranks of typical
organisations. Where organisations are undergoing major changes or
find themselves in a rapidly changing environment, the emphasis should
fall on competencies such as flexibility and the ability to learn. Where
the organisation is in a stable environment, the centre should focus more
on management and delivery skills.

Table 17.2 A typical categorisation of management competencies

Management area Competency


Cognitive competencies Analytic ability
Reasoning ability
Innovativeness/creativity
Conceptual thinking
Strategic thinking (senior levels only)
Management competencies Resource utilisation
Direction giving
Empowering others
Administration
Team leadership
Change management
Interpersonal competencies Assertiveness
Sensitivity/emotional intelligence
Influencing skills
Communication skills
Achieving competencies Output focus
Quality focus
Efficiency focus
Team orientation
Customer orientation
Personal competencies Self-development
Self-control/stress management
Tenacity/perseverance
Thoroughness/attention to detail
Decisiveness/independence
Flexibility/adaptability
Initiative
Self-confidence
Impact

Source: Scully Mogg Consulting (1999, p. 12), with permission

A similar set of what they term their “Great Eight” Competencies is put
forward by SHL (Bartram, 2005, 2012) in Table 17.3.

Table 17.3 SHL’s “Great Eight” Competencies

Leading and deciding


Takes control and exercises leadership
Initiates action, gives direction and takes responsibility
Supporting and cooperating
Supports others and shows respect and positive regard for them in social situations
Puts people first, working effectively with individuals and teams, clients and staff
Behaves consistently with clear personal values that complement those of the
organisation
Interacting and presenting
Communicates and networks effectively
Successfully persuades and influences others
Relates to others in a confident and relaxed manner
Analysing and interpreting
Shows evidence of clear analytical thinking
Gets to the heart of complex problems and issues
Applies own expertise effectively
Quickly learns new technology
Communicates well in writing
Creating and conceptualising
Open to new ideas and experiences
Seeks out learning opportunities
Handles situations and problems with innovation and creativity
Thinks broadly and strategically
Supports and drives organisational change
Organising and executing
Plans ahead and works in a systematic and organised way
Follows directions and procedures
Focuses on customer satisfaction and delivers a quality service or product to the
agreed standards
Adapting and coping
Adapts and responds well to change
Manages pressure effectively and copes with setbacks
Enterprising and performing
Focuses on results and achieving personal work objectives
Works best when work is closely related to results and the impact of personal efforts
is obvious
Shows an understanding of business, commerce and finance

Seeks opportunities for self-development and career advancement

Source: Bartram (2012)

A final set of competencies is taken from the HR department of


Syracuse University, the private research university located in Syracuse,
New York. It identifies two major groups of competencies, namely
General and Leadership competencies. These are listed below (see
Syracuse University HRD Competency Library (n.d.) on
http://humanresources.syr.edu/staff/nbu_staff/comp_library.html).
General competencies

Adaptability
Attention to detail
Caring
Collaboration
Communication: open
Communication: oral and written
Continuous learning
Crisis management
Discernment/judgement
Diversity
Drive for results
Initiative
Innovation
Negotiation
Organisational understanding
Planning and organising/time management
Problem solving
Professionalism
Quality
Reliability
Service
Technical expertise

Leadership competencies

Change leadership
Coaching
Collaborative leadership
Conflict management
Influence
Team leadership

17.2.2 Definition of each competency


Once the list of competencies has been finalised (as in Table 17.2), the
content of each competency is then defined exactly. Using the material in
Table 17.2, the first two competencies, namely “Analytic ability” and
“Reasoning ability”, are defined thus:

Analytic ability. This is defined as the person’s ability to analyse
situations and diagnose problems, by probing for relevant information
in a logical and rational manner; obtaining the relevant information;
relating and comparing data from different sources; and identifying
cause/effect relationships.
The importance of this competency lies in the fact that meaningful
business/management decisions should not be based on assumptions
and guesswork, but rather on as much relevant information as
possible. People must therefore seek out the information that is
needed to come to sound decisions. This is a competency that is
required in all jobs and at all levels in the company, but especially in
those situations when the causes of problems or potential problems
need to be identified (Scully Mogg Consulting, 1999, p. 14).

Reasoning ability. This is the person’s ability to combine and
integrate information and to come to sound conclusions, based on the
information that is available to him at the time. It also contains within
it a component of the person’s risk-taking propensity, in that decisions
often have to be taken in the absence of complete information.
The importance of this competency lies in the fact that information
that is gathered needs to be integrated and interpreted in order to
arrive at sound decisions in the light of the organisation’s objectives
and the prevailing conditions. This is a competency that is required in
all jobs and at all levels in the company where sound decisions need
to be made (Scully Mogg Consulting, 1999, p. 15).

Similar definitions and rationales for all identified competencies need to


be drawn up. In the case of the Scully Mogg material, the 29 different
competencies listed in Table 17.2 are identified and defined in this way.
The eight SHL competencies are defined in Table 17.3, and in the case
of Syracuse University, two of the competencies are defined as follows:

Attention to detail. This is the thoroughness the person shows in


accomplishing a task through concern for all the areas involved, no
matter how small.

Behavioural indicators

Monitors and checks work or information, and plans and organises his
time and resources efficiently.
Double-checks the accuracy of information and work product to
provide accurate and consistent work.
Provides information on a timely basis and in a usable form to others
who need to act on it.
Carefully monitors the details and quality of own and others’ work.
Expresses concern that things be done right, thoroughly and precisely.
Completes all work according to procedures and standards.
Drive for results. This demonstrates the person’s concern for
achieving or surpassing results against an internal or external standard
of excellence.

Behavioural indicators

Shows a passion for improving the delivery of services with a


commitment to continuous improvement.
Recognises and capitalises on opportunities.
Sets and maintains high performance standards for self and others that
support the university’s strategic plan and holds self and other team
members accountable for achieving results.
Tries new things to reach challenging goals and persists until personal
and team goals are achieved and commitments met.
Works to meet individual and university goals with positive regard,
acknowledgment of, and cooperation with the achievement of others’
goals.
Motivates others to translate ideas into actions and results.

(See HRD Competency Library (n.d.) at


http://humanresources.syr.edu/staff/nbu_staff/comp_library.html)

17.2.3 Designing or locating appropriate assessment centre exercises
The next step is to develop or locate suitable exercises for measuring
each of the different competencies. In many cases, a single exercise will
provide good indicators of several of the competencies to be used. Most
consultants guard their materials jealously. Because a great deal of effort
goes into developing such exercises which form part of their business’s
competitive advantage, they do not give this material away very easily.
And so, although this material is theoretically available, in reality most
practitioners find it necessary to invite consultants in as strategic
partners, or to draw up and validate their own material.

In the introduction to this chapter we briefly mentioned the kinds of


exercise that can be used for identifying management and cognitive
skills. Now we need to give substance to this framework. We refer to
specific tests and measures, although other similar measures may be just
as good. (If the details of these instruments are unfamiliar, it is no cause
for concern as it is the prerogative of registered psychologists to work
with them.) Note that the acronyms in brackets – MBTI, etc. – are used
in Table 17.6, the managerial competencies assessment grid.

1. Personality assessment. Although any number of personality scales
can be used, the Myers-Briggs type indicator (MBTI) is a popular
instrument, and so, for the purposes of this exercise, we will use the
MBTI.
2. General aptitude. A widely used measure is the Raven’s Standard
Progressive Matrices (RPM), and this will be used here as it is a
relatively culture-fair measure of general intellectual ability. Other
equally valid measures are available from different vendors.
3. An in-basket exercise (IB). This is a simulation in which each
candidate is required to work through a typical manager’s in-tray
and to make various decisions about his department or
organisation. Typically, an in-basket exercise will begin by
suggesting that the candidate has taken over the department or
organisation as the previous incumbent has been hospitalised. The
candidate has to make all the decisions that need to be made. The
exercise can include as many as 20 or 30 items, some of them linked
and others contradictory. This exercise allows the assessor(s) to
evaluate various cognitive, decision-making and managerial
competencies.
4. Leaderless group technique (LGT). In this kind of exercise,
groups of four to six people are given a task involving competing
interests and limited resources. The candidate’s ability to persuade
others and to get resources allocated to his needs is assessed in order
to identify leadership, dominance and the ability to win support for
his point of view.
5. Role play (RoleP). This is used to identify the person’s
interpersonal skills. A typical situation is one in which a subordinate
has to be counselled or disciplined, or a problem client or customer
dealt with.
6. Various written exercises (Writ). These can be presented to
determine the person’s ability to think strategically and lay out a
document in a logical manner, and his language skills and the like.
7. Oral presentations (Oral). Here the person is required to make an
oral presentation. Aspects such as the logic of his presentation,
language use, self-confidence and impact can be assessed.
8. Interviews (IV). In almost all cases, the person being assessed is
given a structured interview in which various aspects of cognitive,
interpersonal and other competencies are assessed.
9. Social interaction (SI). In many assessment centres the people
being assessed are observed as they interact with each other and/or
the observers during a social occasion such as a meal, a cocktail
party or a braai.
10. Analysis problem (AP). A useful exercise involves the in-depth
analysis of a particular problem situation, identifying problem areas
and drawing up a significant proposal for addressing the issues
identified. This could involve the development of a new product, the
relocation of a plant, integrating product lines, and so on. This
technique allows many of the competencies to be assessed.
11. Team roles (Belb). Belbin’s Team Role Inventory is used as it gives
nine different roles people can play in a team situation, ranging from
a strong leader to a backroom researcher.

The exercises should, as far as possible, reflect the realities of the job. It
is therefore necessary to align each exercise with the particular
organisation or industry in which it is being conducted. For example, if
the exercise is a role play dealing with a difficult worker, it is useful to
examine a few recent disciplinary cases to find out what issues typically
arise and how they have been handled in the organisation. These
findings must then be incorporated into the detail of the role play.
Depending on the purpose of the assessment, the exercises may
encourage competition or cooperation.

17.2.4 Drawing up a scoring system or matrix


Drawing up a scoring system for the various exercises and the
assessment centre as a whole involves six distinct steps:

17.2.4.1 Decide on the number of competency levels


In section 11.3.1, we saw that the number of levels used is quite
arbitrary, and can range from as few as three (not yet competent,
competent and more than competent), to as many as ten. In the case of
the Scully Mogg Consulting (1999) material, the eight-point scale
shown in Table 17.4 is used to evaluate the level of competence in each
area.

Table 17.4 Scoring system used

Score Interpretation
0 Does not exist; could not be detected
1 Poor, underdeveloped, inadequate for current position
2 Minimum required for adequate performance
3 Adequate – further development required for good performance
4 Above average – meets requirements for good performance
5 Very good – good performance almost guaranteed
6 Excellent – will succeed at the next level of promotion
6+ Excessive – too much of a good thing, unbalanced, will interfere with good
performance

Source: Scully Mogg Consulting (1999, p. 27)

17.2.4.2 Decide on which exercises will be used


To assess competence in each of the competencies decided upon,
specific exercises or techniques must be identified. Syracuse University,
as part of its “Performance Partnership”, argues that different
competencies and/or behavioural indicators may be appropriate for use
at different levels in the job hierarchy. These are shown in Table 17.5.

Table 17.5 Recommended competencies by job category

Staff II:  Attention to detail; Communication: oral and written; Planning and organising; Professionalism; Service
Staff III: Attention to detail; Collaboration; Planning and organising; Problem solving; Technical expertise
Staff IV:  Collaboration; Discernment/Judgement; Planning and organising; Problem solving; Technical expertise
Staff V:   Coaching; Discernment/Judgement; Organisational understanding; Problem solving; Team leadership
Staff VI:  Coaching; Discernment/Judgement; Influence; Organisational understanding; Team leadership
Staff VII: Change leadership; Coaching; Collaborative leadership; Influence; Innovation

Most exercises measure more than one competency and each
competency is measured by more than one exercise. Using the Scully
Mogg system and the various exercises described in section 17.2.3, the
management competency grid shown in Table 17.6 can be drawn up.
(Note: a dot (•) indicates that the exercise is used to measure the
competency concerned.)
Table 17.6 The managerial competencies assessment grid

COMPETENCIES RPM IB MBTI LGT Belb RoleP AP IV SI


1. COGNITIVE
1.1 Analytic ability • • •
1.2 Reasoning ability • • • • •
1.3 Innovativeness • • •
1.4 Conceptual thinking • •
1.5 Strategic thinking • •
2. MANAGEMENT
2.1 Resource utilisation • •
2.2 Direction giving • •
2.3 Empowering others • •
2.4 Administration • • •
2.5 Team leadership
2.6 Change management •
3. INTERPERSONAL
3.1 Assertiveness skills • • • • •
3.2 Sensitivity skills • • • • •
3.3 Influencing skills • • • • •
3.4 Written comm. skills • •
3.5 Oral comm. skills •
4. ACHIEVING
4.1 Output focus •
4.2 Quality focus • •
4.3 Efficiency focus • • •
4.4 Team orientation • • • •
4.5 Customer orientation • • •
5. PERSONAL
5.1 Self-development •
5.2 Self-control/stress mgt • • • •
5.3 Tenacity/perseverance •
5.4 Thoroughness/attention • • •
to detail
5.6 Flexibility/adaptability •
5.7 Initiative • •
5.8 Self-confidence • • •
5.9 Impact

Source: Scully Mogg Consulting (1999, p. 29)

The table shows that most exercises measure more than one
competency, and each is measured by more than one exercise. If we
remember that there are several observers who each observe different
participants in the various exercises, and that each observer assesses
different participants at different times, then we get a perfect multiple
competency, multiple technique and multiple observer triangulation
process.
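
One way of holding a grid like Table 17.6 in code, so that the multi-method coverage of each competency can be checked automatically, is sketched below in Python. The mappings shown are a small illustrative fragment, not a reproduction of the full grid.

# Illustrative fragment of a competency-by-exercise grid (not the full
# Table 17.6): each competency is mapped to the exercises that measure it.
GRID = {
    "Analytic ability":   ["In-basket", "Analysis problem", "Interview"],
    "Reasoning ability":  ["RPM", "In-basket", "Analysis problem", "Interview"],
    "Team leadership":    ["Leaderless group", "Role play"],
    "Oral communication": ["Oral presentation"],
}

# A simple design check: for triangulation, every competency should be
# covered by at least two different exercises.
for competency, exercises in GRID.items():
    if len(exercises) < 2:
        print(f"Warning: '{competency}' is measured by only one exercise.")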

17.2.4.3 Identify behavioural indicators


The third step in the process is to draw up a list of both positive and
negative behavioural indicators for each of the competencies on the
basis of the competency’s definition and the content of the various
exercises. An example of the first two cognitive competencies used by
Scully Mogg Consulting, with both positive and negative indicators, is
given in Table 17.7. In the case of analytical ability, the positive
indicators include breaking the problem down into logical elements and
probing for information. Negative indicators include dealing with large
and/or inappropriate elements, accepting information at face value, and
so forth.

Table 17.7 Examples of behavioural indicators

Analytic ability: the ability to diagnose problems and obtain information by probing
Positive                                   | Negative
Breaks problem down into logical elements  | Deals with large/inappropriate elements
Probes for information                     | Accepts information at face value
Asks diagnostic questions                  | Asks rhetorical/closed questions only
Identifies cause and effect relationships  | Focuses on symptoms
Prioritises                                | Lists

Reasoning ability: the ability to integrate information and come to conclusions
Positive                                   | Negative
Arrives at sound and logical decisions     | Bases decisions on insufficient evidence
Thinks dispassionately                     | Thinking influenced by emotions
Motivates/justifies conclusions            | Jumps to conclusions
Requests clarification when confused       | States conclusions blandly
Comes to sound conclusions                 | Avoids decisions when confused

Source: Scully Mogg Consulting (1999, p. 32)

Note that a space is left below each list in which specific evidence
gathered during each exercise can be noted. Note also that a space is left
in the middle of the top line of each table. This is where the score or
competence level is recorded.

There is a table like this for each of the competencies listed in Table
17.2. Organisations change with time, and there is a serious danger that
using a particular competency framework for too long may result in the
organisation being filled with a particular type of individual. This may
create problems when the environment in which the organisation
operates changes. To illustrate this: a large South African organisation
had put in place a particular competency framework that was used for all
selection, promotion and development activities. The framework chosen
was focused on a strong, dominating management style, producing
managers that were high in assertiveness. When the government of the
country changed following the 1994 democratic elections, the
organisation acquired a new set of executives. In this transition, very
few of the previous senior management retained their positions – the
competencies they possessed were no longer relevant in the new
dispensation. A new set of competencies, requiring very different
selection, development, promotion and performance management
processes, had to be designed to fit in with the new ethos that was being
created.

17.2.4.4 Evaluate people on the various dimensions or competencies
The fourth step in drawing up a scoring system is to decide on the
evidence that will be used to determine the competency levels shown by
the candidate. Using the behavioural indicators drawn up in Table 17.7,
evidence for each one is looked for in the candidate’s response to the
exercise and noted in the bottom half of the table. In Table 17.8, a
person’s responses in a role-play exercise have been written down. In
this example, the person asked the subordinate such questions as: “When
did the problem first occur?” (a positive indicator) and then asked him:
“What did you do next?” (also a positive indicator). However, the
candidate also gave a negative indicator by simply listing various
activities. These responses are identified by the check marks (ticks) in
the appropriate places, and the evidence or behavioural indicator is
noted verbatim in the bottom half of the table.

Table 17.8 Examples of behavioural indicators

Analytical ability: the ability to diagnose problems and obtain information by probing     Score: 4
Positive                                       | Negative
[✓] Breaks problem down into logical elements  | [ ] Deals with large/inappropriate elements
[✓] Probes for information                     | [ ] Accepts information at face value
[✓] Asks diagnostic questions                  | [ ] Asks rhetorical/closed questions only
[✓] Identifies cause and effect relationships  | [ ] Focuses on symptoms
[ ] Prioritises                                | [✓] Lists
Evidence (positive): “Asked sub when problem first occurred”; “What did you do next?”; “Why do you think this happened?”; “First you said they were useless, and then what did you do?”
Evidence (negative): “On Monday he told me this and on Tuesday he did that and on Thursday he did …”

From the candidate’s responses written in the bottom boxes, it is clear
that he got four positive indicators on the left side and only one negative
comment in the right-hand box. In setting up the exercises, the designer
of the assessment centre decided that these responses were worth a
level-4 score. This kind of decision is based on extensive training and
years of experience. (The number 4 has been entered in the middle box
of the top line of the table, reflecting the score given to the candidate by
the evaluator on this exercise.)

For each of the competencies used, a set of guidelines for interpreting
the evidence provided in the exercise needs to be worked out. In
practice, each observer would complete an assessment like this and then
discussions would be entered into until there was agreement that this
was indeed a level 4 and not a 3 or a 5.
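
The mapping from observed indicators to a level is a trained judgement rather than a formula, but a rough tally such as the Python sketch below (purely illustrative, and not part of the Scully Mogg system) can give assessors a provisional starting point for that discussion.

# Purely illustrative: a crude starting point, not a substitute for the
# assessors' trained judgement described in the text.
def provisional_level(positives, negatives, scale_max=6):
    """Suggest a level on the 0-6 scale from tallies of behavioural indicators."""
    if positives == 0:
        return 0
    balance = positives - negatives
    return max(1, min(scale_max, balance + 1))

# The role-play example in Table 17.8: four positive indicators and one
# negative indicator were observed.
print(provisional_level(positives=4, negatives=1))  # -> 4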

17.2.4.5 Combine the various assessment scores to arrive at a decision
Once all the evaluation results of the different assessors on the different
exercises have been obtained, all that remains is to find a sound way of
combining them to yield a composite assessment of the person. In much
the same way that the various assessors reached agreement on the score
attained for “Analytic ability” in Table 17.8, they now discuss the scores
for each competency in every exercise until they agree on a common
score. This is then entered onto a master sheet that is similar to Table
17.6 but with scores for each exercise, rather than dots. A section of this
master sheet is given in Table 17.9.

Table 17.9 A section of the scoring master sheet

COMPETENCIES (exercise columns: RPM, IB, MBTI, LGT, Belb, RoleP, AP, INT, SI, FIN)
1. COGNITIVE
1.1 Analytic ability: 5, 4, 4 and 5 on the relevant exercises; FIN = 5
1.2 Reasoning ability
1.3 Innovativeness
1.4 Conceptual thinking
1.5 Strategic thinking

The table shows that the participant scored either 4 or 5 on the various
exercises used to measure analytic ability. After some discussion, the
various assessors agreed that the participant’s final score (FIN) should
be 5, the reason being that the “in-basket” best demonstrates this
particular competence. Had the participant scored 4 here and 5
elsewhere, the final score would have been 4.

The same process should occur with every competency: first, the
individual assessors rate a participant on each exercise, then they reach
agreement about what he scores on each particular competency in each
particular exercise. Then the assessors combine all the scores for a
competency to come to a final decision about the level of the
competency. This process is repeated for each competency being
assessed.
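
A small sketch of this wash-up logic is given below, assuming (purely for illustration) that each competency has one exercise designated as best demonstrating it and that its agreed score therefore takes precedence, as in the analytic-ability example above. The function and exercise names are hypothetical.

```python
# Hypothetical sketch of combining agreed exercise scores into a final
# competency score, giving precedence to the exercise designated as best
# demonstrating the competency (the "in-basket" rule described above).

def final_competency_score(exercise_scores, primary_exercise):
    """exercise_scores: dict mapping exercise name -> agreed score (1-5)."""
    primary = exercise_scores[primary_exercise]
    others = [s for name, s in exercise_scores.items() if name != primary_exercise]
    # If the other exercises all sit on the same side of the primary score,
    # the primary exercise decides; a mixed pattern would go back to discussion.
    if all(s >= primary for s in others) or all(s <= primary for s in others):
        return primary
    return round((2 * primary + sum(others)) / (2 + len(others)))

# Invented spread of 4s and 5s, with the in-basket (IB) as the primary exercise.
analytic_ability = {"RPM": 5, "RoleP": 4, "AP": 4, "IB": 5}
print(final_competency_score(analytic_ability, primary_exercise="IB"))   # -> 5
```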

17.2.4.6 Compile a report


Once each person’s scores have been finalised, a report on his strengths
and potential development areas should be drawn up, giving the scores
that were obtained for each competency area and the reasons for these
scores. The box below each scoring sheet is used to record evidence for
the scoring and the feedback process. For example, an assessor could
report as follows:

In the role-play exercise, Thandi scored 5, which is a very good score, suggesting that in her case good performance is almost guaranteed.
She was able to identify many of the issues raised. For example, she
asked such questions as: “What did you do next?” and “Why do you
think this happened?” However, on the negative side, Thandi was
inclined to list things rather than look for cause-and-effect
relationships. This is illustrated by her comment: “On Monday he told
me this and on Tuesday he did that and on Thursday he did …”

Note that the description of Thandi’s performance is taken from the scoring levels
given in Table 17.9, with evidence being collected from the bottom section of the
box in Table 17.8.

Assessors must provide a similar description for each of the competencies.

17.2.4.7 Provide feedback


As soon as the report has been written, and within two or three weeks of
the assessment centre, the participant should be given feedback on his
performance. This must be done by a trained person (in South Africa
this must be a registered psychologist). In addition, the person’s
immediate superior must also be present.

During the feedback session, the results of the assessment centre should
be discussed as well as any suggestions or proposals about any
developmental steps that need to be taken. This involves aspects such as
training courses to attend, job assignments, reading, and so forth.
Feedback report
Dear ABC
Thank you for having made yourself available for the assessment centre held on
[date]. You have now received verbal feedback on your performance. You also
have a copy of your feedback report. This document briefly summarises our
findings, and proceeds to suggest ways in which you, with the help of your
managers and other important people in your life, can begin to address those
areas in which you feel improvement may be necessary for your future success at
work and elsewhere.
It is important to realise that this document, and the whole assessment process,
is not designed to criticise or judge you in any way, but rather to help you achieve
what you are capable of. The first stage in any such process is to evaluate one’s
current situation in relation to where one would like to be. Once this gap has been
identified, the process of closing it can start.

1 Cognitive competencies
1.1 Analytic ability
We define this as a person’s ability to analyse situations and diagnose problems
by probing for relevant information in a logical and rational manner; obtaining the
relevant information; relating and comparing data from different sources; and
identifying cause-and-effect relationships.
The importance of this competency lies in the fact that meaningful business
and/or management decisions should not be based on assumptions and
guesswork, but rather on as much relevant information as possible. People must
therefore seek out the information that is needed to come to sound decisions.
This is a competency that is required in all jobs and at all levels in the company,
but especially in those situations when the causes of problems or potential
problems need to be identified.

Major aspects that may need your attention


Analytic ability involves your ability to

recognise that a problem or potential problem exists
analyse situations in a logical and rational manner
make sense of information by organising and structuring it
relate and compare data from different sources
see links, patterns and tendencies
identify cause-and-effect relationships
probe for information and ask diagnostic questions rather than accept
information at face value
develop clear criteria for guiding decisions.
What you can do for yourself

Refrain from jumping to conclusions.
Think about the way you usually tackle problems and whether there are ways
you can improve this. Ask your colleagues to help you.
Examine your own prejudices and biases, and try to understand how these
make you think in certain ways. Then try to think of ways of overcoming these
thinking patterns.
Ask yourself if the current situation reminds you of something you have
experienced before.

Things your manager can do for you

Your manager must regularly give you tasks that need analytic thinking, and
then help you in this process. On completion, he should review your
performance and point out your successes and where further refinement is
required.
As a matter of course your manager must learn to expect from you a
breakdown of all the advantages and disadvantages of any action or decision
that you need to take.

Developmental activities

Seminars/training/development courses
Problem solving – Smith and Jones
Turning plans into action – SMG Consulting

Work-based experience
[In this space, note down what you think you can do at work to further develop and strengthen this competence.]

1.2 Reasoning ability


etc.

17.3 Conducting a typical assessment centre


To close this section, Table 17.10 gives the various steps involved in
setting up and running an assessment centre with suggestions of the
effects on the various stakeholders.

Table 17.10 Steps required for an assessment centre

Action: Invitation
Administrator (with assessors): Invites line managers to nominate participants
Immediate superior/supervisor: Identifies participants
Participant: –

Action: Preparation
Administrator (with assessors): Ensures all materials, observers and venue are in order and available when required
Immediate superior/supervisor: Prepares the person, stressing the need and importance of the assessment centre; ensures that participant's duties are covered – no interruptions allowed!
Participant: Ensures workplace activities are covered by deputy; reads any material provided (especially for development centres)

Action: Arrival
Administrator (with assessors): Introduces participants; explains process
Immediate superior/supervisor: Attends opening to give credibility to assessment centre
Participant: Settles in and has positive attitude

Action: Day 1
Administrator (with assessors): Various exercises; begins scoring; observes social event
Immediate superior/supervisor: May act as observer if trained for this
Participant: Various exercises; attends social event

Action: Day 2
Administrator (with assessors): Exercises continue; scores exercises; observes social interactions
Immediate superior/supervisor: May act as observer if trained
Participant: Exercises continue; extended project/analysis task

Action: Day 3
Administrator (with assessors): Exercises continue; scores exercises; integrates
Immediate superior/supervisor: May act as observer if trained
Participant: Exercises continue; leaves after lunch

Action: Feedback
Administrator (with assessors): Compiles report; discusses with superior; discusses with participant
Immediate superior/supervisor: Receives report; discusses report with participant; helps plan follow-up
Participant: Receives report; discusses report with superior; commits to follow-up/development

Action: Ongoing
Administrator (with assessors): Liaises with superior and participants as required
Immediate superior/supervisor: Builds development targets into participant's performance management process
Participant: Attends developmental processes

Source: Murray (2005)

17.4 Psychometric properties of assessment centres

As with any assessment process, assessment centres need to be evaluated in terms of the key criteria of reliability, validity and fairness.

17.4.1 Reliability
Because assessment centres are based on observable behaviours, the
most important form of reliability is inter-rater or inter-scorer reliability.
Research has consistently shown that the inter-rater or inter-scorer
reliability of assessment centres is high (in the 0,60 to 0,95 range),
provided the assessors have been properly trained (Murphy &
Davidshofer, 2006). The fact that all scores are based on consensus
between different assessors and assessment techniques at the end of each
exercise and at the end of the assessment centre means that this form of
reliability is very high.
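
As an aside, inter-rater reliability of this kind is commonly expressed as a correlation between the ratings that two assessors give the same candidates. The short sketch below computes a Pearson correlation over invented ratings; in practice more refined indices (such as intraclass correlations) may be preferred, so treat this purely as an illustration of the idea.

```python
# Minimal illustration of inter-rater reliability as the correlation between two
# assessors' ratings of the same candidates. The ratings are invented.
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long rating lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

assessor_1 = [4, 5, 3, 4, 2, 5, 3, 4]   # hypothetical competency ratings
assessor_2 = [4, 5, 3, 3, 2, 5, 4, 4]
print(round(pearson_r(assessor_1, assessor_2), 2))   # about 0,90: within the range reported above
```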

Test-retest reliability is also very high, as the competencies being assessed are complex, and the behaviour patterns are well established
and unlikely to change over time unless a conscious effort is made to do
so.

17.4.2 Validity
Research has consistently demonstrated that assessment centres have
high concurrent and predictive validity, with correlation coefficients in
the region of 0,60 and higher, against criteria such as job performance,
management potential, training performance and career progression
(Murphy & Davidshofer, 2006).

Content validity and face validity are both high, because the material is
based on the jobs, organisations and industries for which the centres are
being used.

17.4.3 Fairness
Because assessment centres are competency based, they are relatively
culture fair, with some studies finding no cultural effect (Kriek, Hurst &
Charoux, 1994). At the same time, for all the reasons given in Chapter 7,
especially sections 7.2 and 7.5, cross-cultural differences do exist in
most measures, including those obtained in assessment centres (Blair,
2003) (see also section 12.8).

Milsom (2004, p. 20) argues that "systematic differences should be expected between the behaviour of people from different countries as
they strive to meet different models of what good performance looks
like”. Quoting Ployhart and Tsacoumis (2001), Blair (2003) shows
typical black/white score differences in his US sample. These are shown
in Table 17.11.

Table 17.11 Black/white differences (US sample)

Measure Score differences


Cognitive 1,00
Personality – 0,04 to 0,21
Structured interview 0,23
Biodata 0,33
Video situational judgement 0,43
Paper situational judgement 0,61
Assessment centre 0,20 to 0,60 (0,40)

Source: Blair (2003)


Note that the differences shown in Table 17.11 are expressed as a proportion of a
standard deviation (SD). If we look at the first row, differences in the scores of the
“Cognitive” measure between blacks and whites in the US studies are about one
full SD. In other words, if the SD is 15, then the difference between the two
groups is also 15.
In the case of the "Structured interview", the black/white difference is just less
than one quarter of an SD, namely 0,23.
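
For readers who want to see the arithmetic, the fragment below computes a difference of this kind as a standardised mean difference (the raw difference between the two group means divided by the SD). The group means used are invented; only the SD of 15 and the resulting proportions echo the explanation above.

```python
# The arithmetic behind score differences expressed as a proportion of an SD:
# the raw difference between two group means divided by the standard deviation.
# The group means below are invented for illustration.

def standardised_difference(mean_a, mean_b, sd):
    """Difference between two group means expressed in SD units."""
    return (mean_a - mean_b) / sd

# With an SD of 15, a raw difference of 15 points is exactly 1,00 SD,
# as in the "Cognitive" row of Table 17.11.
print(standardised_difference(100, 85, 15))              # -> 1.0

# A raw difference of about 3,5 points on the same scale is roughly 0,23 SD,
# as in the "Structured interview" row.
print(round(standardised_difference(100, 96.5, 15), 2))  # -> 0.23
```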

Similarly, Goldstein et al. (1998) show the following black/white differences in their UK sample.

Table 17.12 Black/white differences (UK sample)

Measure Score differences


In-basket 0,355
Role play (subordinate meeting) 0,03
Group discussion 0,25
Project presentation 0,27
Project discussion 0,39
Team preparation 0,40
Overall score 0,40

Source: Goldstein et al. (1998)

The results in Tables 17.11 and 17.12 once again raise the issue of whether we
should have a single norm or group-based norms. This matter is dealt with in
some detail in section 7.5.

17.4.4 Gender differences


In general, it has been found that females do as well as males on most
competencies, although they tend to score a little lower on tasks
requiring dominance and higher than males on those tasks where
sensitivity and interpersonal support are assessed.
17.5 Improving the cultural fairness of assessment
centres

Given the findings on fairness, we may ask what can be done to eliminate or minimise any sub-group differences that may occur.
According to Blair (2003), there are six areas that require attention,
namely job analysis, design of the assessment process, exercise choice
(i.e. which role plays, in-basket, and so forth should be used),
administration, assessor training and rating process, and feedback.

These are expanded on briefly below.

17.5.1 Job analysis


Because job analysis is central to establishing job entry and performance
requirements (competencies), the way in which it is carried out affects
the fairness of any assessment process, including assessment centres. In
general, the typical job analysis tends to overemphasise the cognitive
aspects of the job. In order to counteract this, the assessment centre
designer should try to broaden the range of competencies assessed to
ensure that all competencies and KPAs are properly assessed. In this
respect, the designer should focus on non-cognitive elements, in contrast
to the typical approach which focuses so heavily on the cognitive
components that non-cognitive competencies get lost.

17.5.2 Design of the assessment process


The whole assessment process should also be looked at in an effort to
ensure that it is kept simple and directly related to the job. Designers
should avoid the temptation to overelaborate. It is also important that
designers review the contribution and weight of each exercise to the
final score (some exercises may prove more or less important than
first thought). Differences in scores can be reduced by removing or
reducing the weight attached to the most skewed exercises. Designers
may also need to review the behavioural checklists or indicators that link
behaviours to competencies (see Table 17.7). Where possible, designers
should use group-appropriate norms.

17.5.3 Exercise choice


Designers should determine which exercises contribute most to the
group differences – exercises that are cognitively complex and rich in
information generally have a greater negative impact on minority groups
than other materials. There should therefore be a re-examination of the
content of those exercises where differences are greatest and an attempt
made to reduce these differences through their redesign. Equally valid
alternatives with less negative impact can also be sought. Finally,
alternative exercise formats can be used such as a spoken rather than
written in-basket exercise. In real life, most in-baskets are dealt with
orally. (Some assessment centres use this approach with success.) There
are also a number of computer-based in-basket exercises that require
little more than answering multiple-choice questions on screen.

17.5.4 Administration
There are a number of ways in which the administration of an
assessment centre can be changed to reduce minority group differences.
Firstly, designers can reduce the information-processing and reading comprehension demands of the exercises by making them shorter and by considering language difficulty levels. Assessors and designers
should give participants ample time to process the information –
minorities perform worse on timed tasks. Participants from
disadvantaged or culturally different groups should be briefed between
exercises, rather than at the beginning; this will reduce information load
during the assessments. Wherever possible, minority groups should be
represented on the assessment panel as this not only gives participants a
sense of security, but may also help to explain the cultural context of
some of the behaviours. Finally, assessors could use fewer dimensions
and coarser scoring systems – three- or four-point rather than seven- or
eight-point systems.

However, assessors, designers and other interested parties need to bear


in mind that if one interferes with the assessment design, the criterion-
related validity (both concurrent and predictive) will have to be checked.
Changing the exercises or their contribution to the whole will mean that
the process has to be validated from scratch if it is to meet the moral and
legal requirements of validity and fairness.

17.5.5 Assessor training and rating process


A key area in which group differences can be minimised is in the
training of assessors and observers. During their training, they should be
sensitised to different cultural behaviour patterns and how these affect
behavioural indicators. It is useful for observers to practise on culturally
different participants during their training, and for them to give in-depth
and culturally sensitive feedback as part of this process. It is important
for observers to be aware of prejudices and other sources of bias in
coming to their conclusions. Alternatively, they should try to arrive at an
assessment during an exercise rather than after its completion – first
impressions are often more valid than those arrived at rationally after the
exercise. Finally, it is important for the assessment centre administrator
or assessor trainer to debrief assessors after the training sessions and to
specifically address these issues of potential bias.

17.5.6 Feedback to participants


The final area for addressing group differences is during the feedback
process. The assessor should focus on identifying strengths as well as
weaknesses, showing how performance in each area can be further
improved. It is also useful to show participants how they compare to the
group as a whole on the various competencies.

17.6 Summary

In this chapter we defined what constitutes an assessment centre and noted the differences between assessment and development centres,
before examining the advantages and disadvantages of using assessment
centres. We then considered what an assessment centre measures. We
began by identifying the dimensions or competencies to be assessed and
the categorisation and definition of these. Next we discussed the
designing or locating of appropriate assessment centre exercises and the
drawing up of a scoring system or matrix on which the various exercises
and competencies could be reflected. A decision regarding the number
of competency levels to be used was needed before the evaluation of
people on the various dimensions could begin. Once the exercises had
been completed and the scores for the various competencies obtained,
these needed to be combined for the final assessment. The last step was
to compile a report and give feedback to the candidate and his superior.

We also looked at what conducting a typical assessment centre involves and the psychometric properties of assessment centres (reliability,
validity and fairness). In closing we examined ways of improving the
cultural fairness of assessment centres, including aspects of job analysis,
the design of the assessment process, the choice of the exercises, the
administration of the process, the training of assessors in the rating
process and the giving of feedback in a culturally fair manner.

Additional reading

Blair, M.D. (2003). Best practices in assessment centres: Reducing “group differences”
to a phrase for the past gives a good description of the issues around fairness and what
can be done to minimise adverse impact with minority groups in the US.
Grayson, P. (2005). An introduction to assessment centres. This provides a good
overview of the assessment centre process and a historical overview of the
development of these centres in the UK and the US.
Lievens, F. & Klimoski, R.J. (2001). Understanding the assessment centre process:
Where are we now? In C.L. Cooper & I.T. Robertson (Eds), International review of
industrial and organisational psychology, 16, 245–286. This chapter provides a good
theoretical background to assessment centres.
Murray, M. (2005). How to design a successful assessment centre. People
Management (UK), 11(4), 24–45 provides a good account of setting up and running an
assessment centre.
Test your understanding

Short paragraphs

1. Define what is meant by an assessment centre, and compare an assessment centre with a development centre in terms of objectives and methods.
2. What evidence is there that assessment centres meet the required psychometric
properties of reliability, validity and fairness?

Essay

Assessment centres have been described as the Rolls-Royce of assessment technology. Discuss this statement, stating whether you accept this view or not. Explain
why you have arrived at this conclusion.
SECTION 5

The future of assessment in organisations
In this last part of the book, we look into the future and try to see how the field of
assessment is likely to evolve. This involves both refining existing theories and
models as well as suggesting possible new models. We even look at how
previous theories that were rejected may, as a result of new technology, be
coming to the fore. We recall the criticism of Sir Francis Galton’s arguments when
he tried to link various psycho-physiological factors (such as reaction time,
sensory discrimination, and visual and hearing acuity) to intelligence. Binet
reacted against this argument when he developed his school placement tests.
Now it seems that Galton’s theory was not so wrong after all – with the use of new
technology, there is strong evidence to suggest that reaction time is an indicator
of intelligence in that it is one of the signs of speed of information processing, a
cornerstone in the various cognitive models of intelligence. New theorising and
improved technology are likely to develop this line of reasoning further.
Issues around concepts such as emotional intelligence, the control of various
forms of response sets (particularly in the context of cross-cultural assessment),
theoretical developments in the fields of complexity science and chaos theory, as
well as computer-administered, scored and interpreted tests are investigated, and
their impact on the principles, theory and practice of assessment are examined.
18 New developments in
assessment

OBJECTIVES

By the end of this chapter, you should be able to

describe new developments in assessment
discuss the benefits and disadvantages of computerised assessment
describe the techniques for assessing the individual’s current and potential
capabilities
outline some of the new theoretical areas that may influence assessment in the
future.

18.1 Introduction

If we consider the various issues facing assessment in general, and in the South African context in particular, the following issues and emerging
trends are among the most important:

Theoretical developments with respect to assessment
The domains in which assessment takes place
The administration of assessments, particularly the use of computers
and testing over the Internet
The control of tests and the profession
Professional training of assessment practitioners
The future of psychological assessment and psychometric testing in
particular
A number of trends are likely to gain strength in the next two or three
decades. Firstly, these are likely to cluster around the nature of the
constructs that need to be assessed, including refinements of existing
psychological theories in areas such as intelligence, potential and
personality, behavioural change as well as relatively new dimensions
such as organisational citizenship behaviour and positive organisational
scholarship, and a number of other ideas likely to emerge from the
whole positive psychology movement. In addition, the detection and
management of response sets (faking, impression management and
malingering) will come to the fore. Secondly, new and innovative
assessment technologies, such as adaptive testing, dynamic testing and
Career Path Appreciation will develop. Thirdly, there are likely to be
significant advances with respect to new technologies (especially
computer-based assessment and testing over the Internet). New
developments in the philosophy of science (such as chaos and
complexity theory) may give rise to new psychological constructs or
ways of assessing existing constructs.

18.2 Constructs to be assessed

Among the most pressing theoretical issues that need to be clarified before the field of psychological assessment can move forward are those
associated with how we define various key concepts that dominate the
field. As far as the measurement of intelligence and aptitudes is
concerned, Fernández-Ballesteros (2006) proposes that three sources of
development can be expected. Firstly, she suggests that advances in
cognitive psychology will yield new techniques for evaluating first-
order mental processes associated with simple as well as increasingly
complex levels of human cognitive functioning by using laboratory
devices. In particular, one can think of simple and complex reaction
times as measures of intellectual capacity or ability. Secondly, she
believes that the use of the dynamic assessment of intelligence will
continue to develop, and that this will become increasingly important
when we measure the cognitive abilities of culturally deprived people
and when we need to plan and programme cognitive interventions.
Finally, she believes that developments in respect of the item response
theory (IRT) will allow for the further development of both computer-
assisted and adaptive tests.

18.2.1 Intelligence
The first of the constructs that needs to be refined is intelligence. We
may ask what constitutes intelligence, how it is structured, and how it
should be assessed. Chapter 10 gives various definitions (see section
10.1.1). In fact, when two dozen prominent theorists were asked to
define intelligence, they gave two dozen somewhat different definitions
(Sternberg & Detterman, 1986). Moreover, it is very difficult to compare
concepts of intelligence across cultures. English has many words for
different aspects of intellectual power and cognitive skill (wise, sensible,
smart, bright, clever, cunning, etc.). If other languages have just as
many, which of them shall we say corresponds to its speakers’ “concept
of intelligence”? Even within a given society, different cognitive
characteristics are emphasised from one situation to another and from
one subculture to another. These differences extend not just to
conceptions of intelligence, but to what is considered adaptive or
appropriate in a broader sense. (See Neisser et al., 1995.)

However, if we distil these definitions down to their very basics, two aspects seem to emerge. The first is that intelligence involves gathering
and transforming information to produce some outcome with a higher
value than existed before. Whether this is seen in terms of solving
problems or learning from experience is immaterial. What is important
is that intelligent behaviour somehow adds value. This relates to the
possibility of using some measure of cognitive “value added” as a
criterion of intellectual success, in the same way that we suggested in
section 14.8.4 that economic value added can usefully be seen as a
measure or criterion of job performance success.

The second aspect of the definitions of intelligence presented in Chapter 10 is that in a society where time is an important commodity, solutions
that are found sooner rather than later are preferred. It is for this reason
that among the critical areas that are valued are the speed and efficiency
with which problems are solved: speed is a key aspect of our definition
of intelligence. In this regard, the definitive study by Neisser et al.
(1995, p. 14) notes that

many recent studies show that the speeds with which people perform
very simple perceptual and cognitive tasks are correlated with
psychometric intelligence … In general, people with higher
intelligence test scores apprehend, scan, retrieve, and respond to
stimuli more quickly than those who score lower.

This view is supported by Fink and Neubauer (2001), and also by Gottfredson (1998), who notes that

… research on the physiology and genetics of g has uncovered biological correlates of this psychological phenomenon. In the past
decade, studies by teams of researchers in North America and Europe
have linked several attributes of the brain to general intelligence.
After taking into account gender and physical stature, brain size as
determined by magnetic resonance imaging is moderately correlated
with IQ (about 0.4 on a scale of 0 to 1). So is the speed of nerve
conduction. The brains of bright people also use less energy during
problem solving than do those of their less able peers. And various
qualities of brain waves correlate strongly (about 0.5 to 0.7) with IQ:
the brain waves of individuals with higher IQs, for example, respond
more promptly and consistently to simple sensory stimuli such as
audible clicks. These observations have led some investigators to
posit that differences in g result from differences in the speed and
efficiency of neural processing.

Perhaps the clearest opinion on this issue is provided by Sheppard and Vernon (2008, p. 542), who conducted a 50-year review of the
relationship between intelligence and speed of information processing,
concluding as follows:

Diverse measures of mental speed are significantly correlated with measured intelligence. There is a trend – among some mental speed
tasks – for more complex measures to be more highly correlated with
intelligence, but this effect is not evident for all tasks. The results also
reveal that mental speed often (though not always) correlates more
strongly with gf [fluid intelligence*] than with gc [crystallised
intelligence*]…. The overall correlation between mental speed and
intelligence is moderate but very consistent: across all the different
studies and measures that were reviewed – which yielded a total of
1146 correlations – the mean correlation is –0.24.

An additional aspect of our definition of intelligence is the vague, seldom articulated view that a good or intelligent solution is one that is
elegant. Elegance in this context means that the solution to the problem
should account for all variables identified, while at the same time being
as simple as possible. This is termed Occam’s razor*, and is best
summarised by the view attributed to Einstein: “Everything should be as
simple as possible – but no simpler”. Of course, this notion of elegance
needs to be traded against a related term, that of satisficing. A satisficing
solution is one that is good enough for now. Because time is a precious
commodity, a less than perfect solution may be required as a stopgap
measure. The challenge of looking at intelligence as efficiency and
elegance lies in both the assessment of elegance and in the trade-off with
solutions that are expedient and satisficing.

The notion of elegance is important because there is a basic belief that simple
solutions are better than complex ones. One is reminded here of a recent book on
complexity theory by John Gribbin, entitled Deep simplicity: Chaos, complexity
and the emergence of life (2005). He argues quite strongly that science proceeds
by finding ways of simplifying the complex. All the great scientific breakthroughs
have involved finding simple and elegant ways of explaining complex
relationships. Clearly, in terms of this view, intelligence involves finding simple
solutions to complex problems.

A final aspect relating to the definition and assessment of intelligence, and one that we should not lose sight of, is the fact that intelligence is
socially defined by powerful social agencies, especially our educational
and industrial decision makers: it is they who, to a large extent,
determine what is valued, and hence what is value adding.

18.2.2 Potential
An area or construct of particular concern to societies where there has
been an unequal distribution of social and educational resources is the
defining and identifying of potential. As far as a definition is concerned,
there are two approaches. Firstly, potential can be defined in terms of a
behavioural readiness to perform a particular task which simply awaits
the opportunity to perform it. In this sense, potential can be likened to a
seed which is waiting to germinate. All that is required is for the right
conditions to occur for this behaviour to manifest itself. (The seed needs
to be planted in fertile soil and properly nurtured, and it will blossom.)
In assessment terms, this translates into the question of whether the
person is able to demonstrate the competencies that are required. This
question is best answered in a simulation or assessment centre process.

The second approach to defining potential can be framed in terms of the person's cognitive readiness to perform the particular task. This view of
assessing potential is best conveyed by asking whether the person has
the necessary cognitive or information-processing strategies and other
abilities required for him to acquire the competencies in question. To
use the plant analogy, this approach would be more concerned with the
genetic makeup of the seed rather than the soil in which it is planted.
(Note that this does not imply in any way that this cognitive readiness is
at all genetically determined.) The question posed is best answered using
both static and dynamic psychometric assessment techniques. (Dynamic
assessment, a test-train-retest approach, is discussed in depth in Chapter
10, section 10.5.8.)

An important element of the definition and assessment of potential in this way is to obtain some indication of the rate at which the required
knowledge and skill sets can be obtained – how long will it take for the
potential to be realised? One possible solution to this is that by the end
of the normal probation period the person should have been brought up
to speed. In a country like South Africa, where the industrial relations
legislation makes it quite difficult to release underperforming staff, such
an approach is not without complications.

18.2.3 Personality
With respect to the definition of personality, the first issue that we need
to address (if not resolve) is whether the etic or nomothetic view holds
(i.e. that there is a relatively well-defined set of factors, such as the Big
Five), or whether we should take an emic or idiographic view, which argues that
people need to be understood in themselves, as individuals, rather than
as an assembly of various traits that differ only in the “amount” that
different people have. In this regard, the ideas of John Berry and the
other people concerned with cross-cultural assessment should be
revisited. In short, they argue that the two theoretical streams – emic and
etic – can be brought together using the ideas of a derived etic – that is, a
sort of compromise between the two that allows cross-cultural
comparisons to be made without the risk of imposing an etic approach
on the target cultural group (see Berry et al., 1992).

18.2.3.1 Assessment of personality


As far as the assessment of personality is concerned, we can expect three
lines of development:

1. The improvement of paper-and-pencil tests in the measurement of personality traits. As we shall see in section 18.2.6, it is likely that
these improvements will be based in part on solving issues
associated with various biases in self-reports.
2. The development of tests linked to new personality constructs in the
field of health and interpersonal adaptation (e.g. prone types of
personality, rationality and defensiveness).
3. The construction of new adaptive tests for the measurement of
attitudes and personality characteristics.

According to Jones and Higgins (2001, p. 9), a fourth trend that is emerging, and one that is likely to continue, is the development toward
more job-specific personality tests such as those of integrity and
customer service orientation, and away from the general personality
assessments. In their view

[a] major breakthrough occurred when industrial/organisational psychologists discovered that job-related personality constructs such
as integrity, service orientation, and conscientiousness helped to
statistically differentiate highly productive and dependable workers
from counterproductive and irresponsible workers. Assessment
constructs that have recently surfaced include emotional intelligence,
technology readiness, and job loyalty, to name a few. I/O
psychologists and psychometricians alike realise that innovative test
measures are always well received by the marketplace as long as they
are job related, valid, and fair and they lead to a clear strategic
advantage.

In addition, job applicants may be less willing to take general personality tests if they do not see a clear relationship between the
assessment and the job they have applied for (Sullivan & Arnold, 2000).
Applicants may well find some of the questions in general personality
measures to be excessively invasive or irrelevant. According to Guion
(1969), there has also been a shift from general personality tests toward
more specific job-related constructs such as integrity and other job-
relevant personality dimensions. (The issue of integrity testing is dealt
with extensively in Chapter 13.) Already various tests of this kind are
appearing in the market. One such inventory is the Occupational
Personality Questionnaire (OPQ) developed by Saville & Holdsworth
Ltd (SHL). It measures 32 different personality traits that are seen as
relevant to occupational settings. These are grouped into categories such
as: Relationships, Sociability (e.g. outgoing, socially confident),
Influence (e.g. persuasive, outspoken, independent minded), Empathy
(e.g. democratic, caring), and Thinking style (e.g. evaluative, rational). It
also includes a social desirability measure to detect “faking” responses.
However, the OPQ32 targets graduates, managers and experienced hires
and, together with its multidimensional forced choice format, it may be
quite difficult for people whose first language is not that of the test and
who are relatively unsophisticated with respect to test-taking. (See
http://www.shl.com/assets/resource/opq-uk.pdf)

Another recent entry into this market is a work-oriented version of the Jung Typology Test, the Jung Typology Profiler for Workplace
(HumanMetrics, 2013) which assesses 14 workplace-related indices
such as Power (leadership index), Assurance, Empathy, Visionary,
Resourcefulness, Communication, Sociability, Rationality,
Conscientiousness and Self-control. In South Africa, much progress has
been made with the development of a personality measure aimed at
employees with a relatively poor command of English. This is the
Personality at Work measure (PAW – Fick, 2011), consisting of five
factors, namely: (1) Doing the work orientation; (2) Self- and perception
orientation; (3) Thinking and styles approach; (4) People and
relationship orientation; and (5) Emotions and feelings orientation. Each
of these factors is divided into four facets, and each consists of a further
three components, yielding 60 separate aspects that are evaluated. Each
of these components consists of four items, two phrased positively and
two negatively.
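
As a quick check on the structure just described, the fragment below simply multiplies out the hierarchy reported for the PAW (five factors, four facets per factor, three components per facet, four items per component); the 240-item total is an inference from those figures rather than a published specification.

```python
# Multiplying out the PAW structure described above. Only the counts are taken
# from the text; the 240-item total is inferred from them.
factors = 5
facets_per_factor = 4
components_per_facet = 3
items_per_component = 4

components = factors * facets_per_factor * components_per_facet
print(components)                          # -> 60 separate aspects evaluated
print(components * items_per_component)    # -> 240 items in total (inferred)
```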

18.2.4 Competencies
Another theoretical area where we need clarity is that of competencies.
Although the construct has a great deal of appeal, especially when tied
to specific outcomes such as passing examinations and being selected
into specific positions, far more attention needs to be paid to the longer-
term aspects of the definition of competence, including the crucial
question: “Competent for what?” which has to be answered. This and
other issues relating to competencies were dealt with in detail in Chapter
12. The view that competency is an all-or-nothing condition is
inadequate, given the different needs of different stakeholders and
different purposes that a single assessment has to serve.

18.2.5 Emotional intelligence


In recent years, the concept of emotional intelligence has gained
popularity, and several comprehensive models of emotional intelligence
provide alternative theoretical frameworks for conceptualising this
construct. As Emmerling and Goleman (2003) note, there have been
three quite distinct approaches to emotional intelligence represented by
the work of Bar-On (1997), Goleman (1995), and Mayer and Salovey
(1993). As Caruso (2004) points out in his review of Emmerling and
Goleman’s paper, Bar-On’s interest seems to have grown out of his
concern with a concept called subjective well-being and non-intellective
aspects of performance. Goleman was a student of David McClelland
and is concerned with the area of competencies. Mayer and his
colleague Salovey both worked in the areas of human intelligence as
well as cognition and affect (how emotions and thinking interact to
affect performance, especially with respect to health psychology).

18.2.5.1 Emotional intelligence as “intelligence”


A major issue that we need to clarify is whether emotional intelligence is
a form of intelligence as defined in this text (efficiency of information
processing) or whether it is closer to being a personality variable
(preferred or typical ways of dealing with the world). Those in favour of
the intelligence argument maintain that it has a direct relationship to the
concept of social intelligence which was first identified by Thorndike in
1920. He defined social intelligence as “the ability to understand and
manage men and women, boys and girls – to act wisely in human
relations”. The concept of emotional intelligence developed further by
incorporating aspects of Gardner’s (1983) theory of multiple
intelligences (see section 10.4.4). As shown in Table 10.1, interpersonal
intelligence is the ability to sense others’ feelings and be in tune with
others. It includes the ability to communicate effectively with other
people and to be able to develop relationships. Interpersonal intelligence
is related to person-to-person encounters in such things as effective
communication, working together with others toward a common goal
and noticing distinctions among people. Intrapersonal intelligence, on
the other hand, is related to introspection and knowledge of the internal
aspects of the self. It is the ability to know one’s own body and mind,
and to understand and reflect on one’s own emotions, motivations and
inner states of being. Even though Gardner did not use the term
“emotional intelligence”, his concepts of intra-personal and
interpersonal intelligence provided a foundation for later models of
emotional intelligence.

Although there is no doubt that emotional intelligence is an important attribute in many interpersonal situations, both at
work and in the wider world, it clearly does not meet the minimum
criteria of “intelligence” as it is generally defined. The conceptual
difficulties need to be clarified, and an accepted and uniform way of
assessing this needs to be found if it is to remain a useful concept in
organisational psychology.

18.2.5.2 Criticisms of emotional intelligence as “intelligence”


According to Emmerling and Goleman (2003), cognitive intelligence
(IQ) (see Sidebar 18.1) is clearly defined, and research has demonstrated
that it is a reliable and relatively stable measure of cognitive capacity or
ability. They go on to argue that in the area of so-called emotional
intelligence (i.e. the emotional quotient – EQ), its various definitions are
inconsistent in terms of what it measures. For example, people such as
Bradberry and Greaves (2005) argue that EQ is not fixed and that it can
be learned or increased, whereas others (such as Mayer & Salovey,
1993) argue that EQ is stable and cannot be increased. In addition,
Emmerling and Goleman point out that emotional intelligence has no
“benchmark” or external criterion against which to evaluate itself. They
contrast this with traditional IQ tests which have been designed to
correlate as closely as possible with school grades. Emotional
intelligence seems to have no similar objective quantity on which it can
be based. Intelligence tests are characterised by items that have one
correct answer, whereas EQ tests are far more like personality scales
where the instructions generally stress that there is no correct answer,
and candidates should respond as they typically react. Finally, traditional
intelligence tests (and they are tests in the true sense of the word) are
generally timed, and the items display increasing levels of difficulty. EQ
measures do not have this sense of increasing difficulty about them.

Sidebar 18.1 Cognitive intelligence


Cognitive intelligence is often expressed as the intelligence quotient (IQ), and the
two are used interchangeably by some people. In the same way, emotional
intelligence (EI) is measured by EQ, so EI and EQ are often seen as the same
thing.

As a result, many psychological researchers do not accept emotional intelligence as a part of a "standard" intelligence model (like IQ). For
example, Eysenck (2000, pp. 109–110) argues that Goleman exemplifies
more clearly than most the fundamental absurdity of the tendency to
class almost any type of behaviour as an “intelligence”: “If these five
‘abilities’ define ‘emotional intelligence’, we would expect some
evidence that they are highly correlated; Goleman admits that they might
be quite uncorrelated … [s]o the whole theory is built on quicksand;
there is no sound scientific basis.”

There are thus fairly strong arguments that EQ is not a form of intelligence and that the term is used in a loose, unscientific and populist fashion. Indeed, this argument could, in my view, with equal justification be aimed at Gardner's theory of multiple intelligences.

18.2.5.3 Emotional intelligence as “personality”


If emotional intelligence is not a form of intelligence, then what is it?
Various researchers have indicated that EQ has many of the properties
associated with personality theories. For example, it correlates
significantly with two dimensions of the Big Five, namely neuroticism
and extraversion. In common with most personality measures, EQ
measures are made up of items that are quite transparent in that the test-
taker knows exactly what is being looked for in the scale. This makes it
very easy for test-takers to respond in a socially desirable way – a
practice known as “faking good”. This is a form of bias or systematic
error that has long been known to contaminate responses on personality
inventories. It is thus argued that the similarities between personality
testing and self-report EQ testing and the differences between EQ and
traditional intelligence (IQ) make it reasonable to assert that EQ is much
closer to being a measure of personality than it is to being a measure of
intelligence. Until the definition of emotional intelligence is clarified,
little progress can be expected in its assessment.

18.2.5.4 Emotional intelligence as “competency”


One way out of the dilemma is to suggest that EQ is a competency
rather than either a form of intelligence or a personality dimension. We
recall that a competency is defined as a blend of knowledge, skills,
attitudes and values (or KSAVs) required for success in a particular
situation (see section 11.1). In support of this, Goleman (1995) has
described his five dimensions of emotional intelligence in terms of 25
different emotional competencies.

It would thus appear that EQ is not strictly a form of intelligence but rather a set of competencies (which are defined as the knowledge, skills,
attitudes or attributes and values that are required for successful task
performance). Seeing EQ as a set of competencies rather than as
intelligence allows us to move beyond the intelligence versus
personality debate. It may also open new possibilities for assessment.
These three alternatives are summarised in Table 18.1.

Table 18.1 Three ways of viewing emotional intelligence

EI as intelligence
Theory: Intellectual abilities using emotional information (e.g. ability to identify emotion)
Related to: Models of general or standard intelligence
Assessment: Timed efficiency measures – tests

EI as personality
Theory: Traits related to adaptation and coping (e.g. assertiveness)
Related to: Models of personality and dispositional traits
Assessment: Measures of typical or preferred ways of interacting with others – inventories

EI as competency
Theory: Acquired KSAVs underlying effective performance (e.g. influence in leadership)
Related to: Leadership competency models
Assessment: Demonstrated behaviour patterns in specified situations – role plays, simulations

18.2.6 Controlling response sets


An important issue raised by Fernández-Ballesteros (2006) is that of
response sets and the need to find ways of identifying and controlling
for them. Since the 1920s, psychologists have been aware of various
response patterns that systematically distort responses on self-report
psychological assessments, increasing bias and decreasing validity.
These response sets include various forms of social desirability,
deception and self-deception, “faking good” and “faking bad”,
simulation, being dishonest, malingering, defensiveness, excessive self-
disclosure, inflated self-descriptions, and coping styles such as denial
and repression. There are also response styles such as acquiescence
(agreeing with all statements), and central and extremity response sets
(preferring the middle or outer ranges of the response options). All these
response sets affect the validity of the assessment as they represent
systematic distortions of the true score component. Many organisations
in both the public and private domains are interested in the assessment of these response sets as an expression of integrity or honesty.

Most new scales attempt to control these sources of error by including consistency, unusual responses, social desirability and extremity
response checks in the scales. Acquiescence is controlled by alternating
positively and negatively phrased versions of the same item. Despite all
these checks and balances, people being assessed are still able to lie and
often to get away with this in a way that cannot be totally controlled.
One way of addressing this is through adaptive assessment – if a
response set is suspected, new item formulations can be administered to
examine this. This is addressed in section 18.3.4 when we look at
adaptive assessment techniques.
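
The sketch below illustrates, with invented data, two of the simpler checks mentioned here: reverse-scoring negatively phrased items so that both keyings run in the same direction, and a crude acquiescence index based on how many items are endorsed regardless of keying. The function names and thresholds are hypothetical, and real inventories use considerably more sophisticated detection scales.

```python
# Illustrative response-set checks with invented data: reverse-scoring
# negatively phrased Likert items and a simple acquiescence index.

def reverse_score(response, scale_min=1, scale_max=5):
    """Reverse a Likert response so negatively phrased items score in the same direction."""
    return scale_max + scale_min - response

def acquiescence_index(pos_items, neg_items, agree_threshold=4):
    """Proportion of items endorsed (rated 'agree' or higher) regardless of keying.
    Values near 1.0 suggest the respondent may be agreeing with everything."""
    responses = pos_items + neg_items
    endorsed = sum(1 for r in responses if r >= agree_threshold)
    return endorsed / len(responses)

positively_keyed = [5, 4, 5, 4]   # hypothetical responses to positively phrased items
negatively_keyed = [5, 4, 5, 5]   # agreeing here contradicts the positive items

print([reverse_score(r) for r in negatively_keyed])            # -> [1, 2, 1, 1]
print(acquiescence_index(positively_keyed, negatively_keyed))  # -> 1.0 (suspicious)
```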

18.2.7 The assessment of behavioural change


In recent years, attention has been focused on a new applied field of
psychological assessment and measurement that concerns programme
and/or training evaluation. When social or community programmes and
organisational interventions have behavioural goals, psychological
assessment processes are needed to assess behavioural change. In most
of these instances, changes in awareness and knowledge are insufficient
for the desired behavioural changes to occur. For example, in any
programme aimed at improving behaviours associated with such factors
as safety, organisational citizenship, quality and productivity, new
measures will have to be developed.

18.2.8 Bespoke (tailor-made) tests


The development of new technology will allow a number of equivalent,
but not identical, tests (bespoke tests) to be tailor made for different
people, even within a single test session based on what is known as the
linear-on-the-fly technique (LOFT*). This is discussed in greater depth
in section 18.3.7.1.

18.2.9 Focused assessment batteries


One important consequence of the new developments in assessment
theory and technology is likely to be the development of a number of
tests and other assessment techniques designed for dedicated areas of
investigation – boutique tests, as it were. In place of wide-spectrum,
one-size-fits-all assessments, there is likely to be a large number of
narrowly focused instruments, assembled together into adaptive
batteries, where the results of one assessment will specify quite closely
the next form of assessment. As indicated earlier, various tests
measuring aspects of personality in the workplace (HumanMetrics,
2013) have already appeared in the market or are being developed (Fick,
2011). However, as yet these are not adaptively linked into batteries.
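
Purely to illustrate what "adaptively linked" might mean in practice, the sketch below chains hypothetical focused instruments so that the outcome of one determines which instrument is administered next. Every instrument name, cut-off and branching rule here is invented; it is not a description of any existing battery.

```python
# Hypothetical sketch of an adaptive battery: the result of one focused
# assessment selects the next. All names, cut-offs and rules are invented.

def next_instrument(completed, score):
    """Choose the next instrument on the basis of the one just completed."""
    if completed == "screening_reasoning":
        return "advanced_reasoning" if score >= 60 else "learning_potential"
    if completed == "advanced_reasoning":
        return "strategic_thinking_simulation"
    if completed == "learning_potential":
        return "dynamic_test_train_retest"
    return None   # end of the battery

instrument, score = "screening_reasoning", 72   # invented starting point and score
while instrument is not None:
    print("Administering:", instrument)
    instrument = next_instrument(instrument, score)   # score reused only to keep the sketch short
```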

18.2.10 New constructs associated with positive psychology


The positive psychology* movement also opens up a range of new
possibilities. One simple example is the assessment at individual level of
positive energy in a manner that is akin to measuring the negative
energy of stress. At the organisational level, we could develop methods
for assessing the extent to which social or organisational contexts
generate tranquillity, in a way that is analogous to “stressogenic” or
stress-causing situations. (Can we talk of assessing individual levels of
tranquillity and “tranquilogenic” situations or organisations?)

In the same way, we could begin to assess aspects of organisational


citizenship behaviour (OCB). This is a relatively new work-related
construct, which needs to be clearly defined and ways of measuring it
established. According to Coleman and Borman (2000), OCB consists of
three broad categories of behaviour, namely interpersonal citizenship
behaviour (benefiting peers and employees), organisational citizen
behaviour (benefiting the organisation) and job or task
conscientiousness (benefiting the work itself). With this as a starting
point, can we identify and measure people’s levels of OCB, and the
factors that allow OCB to emerge in organisations (“OCB-genic”
organisations)?

18.2.11 Fairness and equal opportunity


As a general rule, but especially in South Africa with its history of social
discrimination and exclusion, assessment practices have to be fair to all,
in the sense that people with equal ability or potential need to be
identified in exactly the same way. Assessment techniques and
instruments such as tests therefore need to be properly constructed,
administered, scored and interpreted to achieve this objective. Items in
various tests and measures must have the same meaning and
characteristics for different groups. In this regard, Laher and Cockroft
(2013) point out the need to develop locally relevant or emic personality
tests to augment and possibly replace tests such as the NEO-PI-R
measure of the five-factor model. In China, Leung and her colleagues
(Cheung et al., 2001) have shown that a sixth factor, Interpersonal
Relations, is needed to explain personality, while work by a group of
researchers including Deon Meiring, Fons van de Vijver, Gideon de
Bruin and others (e.g. Valchev et al., 2013) has suggested that as many
as nine factors are required to fully describe personality among various
ethnic and cultural groups in South Africa.

Test designers should give a great deal of attention to the construction of these instruments and to the analysis of the items. The item
characteristic curves* of the instruments must be carefully examined,
and corrections made where necessary. This is particularly necessary in
high-stake assessments, which are used for making life-changing
decisions: job selection, promotions, the awarding of bursaries and
scholarships, and (on the negative side) admission to psychiatric
institutions and places of safety, dismissal from organisations, demotion,
and so on. A major challenge thus facing South African test developers
and users is to adapt existing measures and to develop a new set of
culturally appropriate techniques (see also Foxcroft & Roodt, 2005, pp.
254–255).

In addition, psychological assessment processes can, and should, be seen as a means for redressing past imbalances. In South Africa, as occurred
in the 1970s and 1980s in the US, assessment has been seen in many
quarters as a hegemonic tool to maintain the social and racial status quo
(e.g. Nzimande, 1984, 1995; Foxcroft, 1997). As Milner, Donald and
Thatcher (2013) note: “Critical reviews of assessment tools and
practices in South Africa have thus contributed to a body of knowledge
that positions assessment as a contested terrain” (p. 489). Against an
urgent need for sociopolitical transformation in various parts of the
world, and given the shortages of skills and high-level manpower in
many economies, there is a great need for assessment to play a social
transformative role in the workplace and wider society (Milner, Donald
& Thatcher, 2013).

This argument has a long history. In a paper examining fairness in assessment (albeit in an educational context), Schellenberg (2004)
contrasts the psychometric framework with the sociocultural framework.
He argues that during the late 1960s and early 1970s a changing
sociopolitical awareness and sensitivity in the US resulted in the
growing expectation that test results should be approximately equal for
different groups. Within this framework, the psychometric approach
concentrates on examining the testing instrument and respondents’
responses to it. Sociocultural approaches look at performance on the test
as part of the overall context in which a student lives and learns.
Educators intent on improving the lot of historically underserved classes
adopt the latter view, arguing that testing forms part of the cultural
phenomenon of public education (and by extension, the world of work),
which itself is reflective of larger societal and cultural issues. These
educators are left in the position of arguing that although the test data
showed that certain groups of respondents were not achieving, this was
because the test was itself unfair and therefore not to be trusted.
Accordingly, psychometric analysis fails to situate assessment in a
cultural context, with the result that we cannot truly address cultural
bias. Although psychometric analyses may detect the artefacts of bias,
they do little to explain or alleviate it.

The arguments put forward by Milner et al. (2013) need to be seen
within this context. The fundamental issue remains some 50 years on –
do differences in test scores reflect fundamental flaws in the assessment
technique, or are they indicative of sociocultural differences that may (or
may not) impact on workplace behaviour and performance? The
assumption of unqualified individualism discussed in Chapter 7 (section
7.4.1 and Table 7.4) presupposes the cross-cultural equivalence of the
measure(s) used and adopts a top-down selection approach; although
this maximises utility for the organisation in the short to medium term, it
cannot be seen as appropriate (within a sociocultural framework) for
a nation striving to transform its social and occupational relations.


Theron (2007) addresses this issue by proposing that more attention be
paid to competency potential and its development within the work
situation, rather than selecting people based on their existing knowledge
and skills. This stresses once more the need to understand potential and
to find adequate ways of assessing it (see section 18.2.2).

18.2.12 Translation, adaptation and development of culture instruments
As discussed in Chapter 8, there is an increasing need to translate and
adapt existing tests and instruments into various languages, while
ensuring the construct and measurement equivalence of the various
versions. Fernández-Ballesteros (2006) argues that

[a]t the end of the 20th century we are living in a planetary or earth
world; more and more, psychologists will develop their work in
different languages and cultures. This fact demands standards for test
adaptation and for test construction for cross-cultural research and
practice, and more efforts will need to be made in this direction (no
page no.)

Clearly, issues such as the nature of intelligence and the structure of
personality in various settings and with different cultural groups need to
be clarified to ensure fairness. As Laher and Foxcroft (2013) point out,
even the nature of culture needs to be explored, and they suggest that
levels of acculturation to the dominant (Eurocentric) ethos are possibly
becoming more important than ingrained cultural categories.
The issue of acculturation is discussed in some depth in Chapter 8,
section 8.1.3, where the two-dimensional model of acculturation put
forward by Ryder, Alden and Paulhus (2000) is discussed. According to
this model, rather than pursuing complete adjustment to the new culture
in an assimilationist way, the trend has been for many migrants to
develop a bi-cultural identity or retain their original culture without
extensively adjusting to the society of settlement. Laher and Cockcroft
(2013) also point to the growing acculturation of the dominant groups
towards the cultural values of the non-dominant group. They argue that

[t]he general perception tends to be that in South Africa, African
individuals are acculturated into the white, Western, usually
individualistic, culture. However, since 1994, it has become
increasingly evident from daily interactions that acculturation is
occurring in both directions (p. 541).

They point to the growing popularity of soccer (a game historically
associated with black spectators) among white spectators and an
increasing viewership of rugby (traditionally a white spectators’ game)
by black spectators.

Clearly, as suggested by Van de Vijver and Phalet (2004), the various
measures of acculturation that have been developed need to be applied
as a precursor to assessment in a multicultural context. They argue (p.
218) that

[i]t is regrettable that assessment of acculturation is not an integral
part of assessment in multicultural groups (or ethnic groups in
general) when mainstream instruments are used among migrants.

18.3 Development of new technologies

New trends in assessment can also be expected to come from both a
continuation of past practices and advances in
technology (including what Fernández-Ballesteros (2006) has termed
“the interchanges between new technologies and the cognitive sciences”,
computer-assisted assessment and assessment through virtual reality). A
number of important technological developments have occurred in the
last 20 years or so, and these are increasingly finding their way into
psychological assessment. By far the most important of these is the
computer and the associated Internet.

18.3.1 Computer-based testing


Computers now play a major role in almost all facets of human life,
from agriculture to sport. There has been a continuous increase in
computer usage since the 1980s (Lyman, 1998), and computer
administration and scoring of tests have become general practice (Silzer
& Jeanneret, 1998). In the US, the Association of Test Publishers (ATP)
has developed standards for computer-based testing (CBT) (Harris,
2000). Currently, the Professional Board for Psychology of the Health
Professions Council of South Africa is giving attention to the matter as
part of its re-examination of testing, as outlined in Chapter 7, section
7.5.2.

Technological advances for administering and interpreting computerised
versions of already existing tests and other psychological instruments
have resulted in computer-based testing becoming common practice.
While there was initially some concern with the psychometric properties
of computer-based administration of assessment material, there is
generally positive evidence regarding equivalence (Clarke, 2000;
Donovan, Drasgow & Probst, 2000; Neuman & Baydoun, 1998). In
general, computer-based tests are as reliable and valid as traditional
paper-and-pencil measures and have, in many settings, become the
preferred method. More than a decade ago,
Richman-Hirsch, Olson-Buchanan and Drasgow (2000) showed that
managers completing paper-and-pencil, computerised and multimedia
versions of a test rated the multimedia version of assessment as having
greater face validity and had more positive attitudes toward the test than
did managers completing the other two versions of the test. The
researchers speculate that because technology is so ingrained in the
work lives of most people, there is an expectation that the use of
technology increases the value and accuracy of the process.

Since then, the use of computerised testing, especially via the Internet,
has become the preferred mode and in some cases the only way of
testing for large organisations. Macqueen (2012), for example, shows
that SHL, a leading global test provider, currently conducts 95 per cent
of its testing online rather than through the traditional paper-and-pencil
methods. He argues further that there is widespread support for the view
that within five to ten years all psychological testing, apart from certain
clinical and neuro-psychological testing, will be conducted online.

The use of computers has affected assessment in at least three important
ways, namely computer-assisted administration (including scoring and
report writing), the assessment of additional parameters that cannot be
achieved in paper-and-pencil versions, and computer-based adaptive
testing. Let us examine each of these in turn.

18.3.2 Computer-assisted administration


In many ways, this is merely the application of computer technology to
the administration of existing techniques. In this context, the computer is
little more than an electronic administrator, scorer and interpreter of
assessment results. However, this has decided advantages. One is a more
uniform presentation of items and ease of scoring. Instant feedback can
be given to the test administrator and, if required, to the test-taker.
Reports can be compiled (written) by the computer. This is nothing
more than a mechanical interpretation of the data (see also section
18.3.6) and the computer is little more than a very efficient administrator
and “page turner”.

At a slightly more advanced level, computerised testing can improve the
quality of the items, with both the material and the instructions making
use of three-dimensional graphics and being able to move item
components. For example, instead of having to verbally present the idea
of an item rotating in space, the computer can do this visually. Similarly,
instead of asking people to imagine which two items fit together to form,
say, a square, the two components can be moved together on the screen
to show exactly what this entails.

At an even higher level of complexity, computer-based assessment
provides the opportunity for multimedia test items such as those based
on film (video) and sound (audio) to be included (Hanson et al., 1999;
Stanton, 1999). One of the most fascinating and promising fields for
assessment is virtual reality. This can be defined as a computer-
generated simulation of the three-dimensional environment in which the
user is able both to deal with and manipulate the contents of that
environment using his five senses (Stampe, Roehl & Eagan, 1993).
Foxcroft and Roodt (2005, p. 256) refer to this as “virtual assisted
testing (VAT)”. Such progress in assessment implies not only great
advances in presenting visual stimuli, but a revolution in the handling of
material involving other senses such as sound, touch, balance or smell.
Progress in the first arena has made great strides in the use of computer-
generated graphics in recent movies. All that remains is for people to
harness this technology in the assessment field.

An important advantage of computer-based assessment (CBA) is that it
allows testing and other forms of assessment to take place around the
clock and at remote sites through Internet or web-based assessment.
Section 18.3.7 takes a closer look at some of the issues associated with
web-based assessment.

18.3.2.1 Advantages for the administrator


Although CBA has a number of advantages, these generally favour the
test administrators rather than the people being assessed. Perhaps the
greatest advantage of CBA is that it frees the administrator from many
of the chores of administering, scoring and interpreting the assessment.
In fact, much of the administration of CBA can be done by an assistant.
Because the tests are fully automated, they are more standardised in
terms of test instructions and time keeping. Unless there is a power
failure, CBA makes the administration and scoring error free, and
provides instant feedback. Precise scores can be calculated very rapidly,
obviating unfortunate common human mistakes and saving much time,
especially in tests and inventories that are complicated to score. In
addition, the assessment can easily be costed as most programs have
a counting system that records the number of times the assessment is
administered, and the user is charged accordingly. Disposable materials
are also saved. This has implications for both short- and long-term costs,
for convenience of test administration, and for environmental protection.
Furthermore, computer-based systems enable the updating of norms as
sufficiently large numbers of people are assessed. These scores can be
automatically and easily added to a test’s database to adjust norms and
can be used for research. In fact, the very availability of this data may
encourage research. Finally, computers can produce interpretative texts,
with suitable graphics, pie charts, norm tables, and suchlike.

18.3.2.2 Advantages for the person being assessed


There are also several advantages for the people being assessed. These
include the convenience of the assessment being carried out at any time
and place that suits them, which obviates the inconvenience of having to
travel somewhere for the assessment. Tests can be individually
administered in comfortable surroundings. In addition, the items can be
presented in a far more interesting and understandable way – models
unfold and come to life as a result of good graphics. If it is thought
advisable by the test administrator, test-takers can also have the benefit
of receiving immediate, objective, expert-based narrative feedback of
their test findings. Moreover, if a complicated inventory or test battery is
administered, a comprehensive automated evaluation may be provided
instantly as well.

18.3.2.3 Disadvantages
A major possible disadvantage for people being assessed is computer
phobia or fear of technology. This could introduce a unique, irrelevant
error variance into observed scores, thus impairing test-result validity.
Although this factor has not manifested itself among relatively well-
educated people, it is likely to be an issue with semi-literate and
unschooled employees with little exposure to electronic media. There
are also indications that computer skills – at least typing speed – may be
related to test achievement (Russell, 1999). It should be noted, however,
that this factor was found to be related to performance in open-ended
tests, not multiple-choice tests (Russell & Haney, 1997).

A second issue of concern is the possibility of the system crashing as a
result of computer, program or power failures. There have been times
when the country’s power supply has been erratic.

18.3.2.4 Reliability and validity


In general, computerised tests maintain the psychometric properties of
their paper-and-pencil equivalents. There is debate about the accuracy of
the computer interpretation and reporting of test scores, a problem that
may become paramount when comprehensive test batteries or series of
tests are administered. This is discussed in detail in section 18.3.5.

18.3.3 Generation of norms


In the past, the need to implement wide-scale norming and validity
studies was met centrally by government agencies such as the Human
Sciences Research Council (HSRC) and the National Institute of
Personnel Research (NIPR) via coordinated, wide-scale and centrally
funded research: this is unlikely to happen in the short term as
commercial, rather than centralised and bureaucratic, needs prevail.

However, as Nanette Tredoux (2013) has noted, “[w]ithin approximately
a decade, testing had changed from a largely state-funded and controlled
activity to a highly competitive, highly commercialised industry” (p.
431). The widespread use of Internet-based tests, many of which have
feedback mechanisms to the test creators, allows the collection of
normative data which is then given (sold?) back to the end users for
improved interpretation of the test results.
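As a rough, hypothetical illustration of how such accumulating score databases might feed back into norms, the sketch below appends newly collected raw scores to a norm group and converts a raw score to z- and T-scores against the updated norms; the scores and helper names are invented for the example and do not describe any vendor's actual system.

import statistics

# Hypothetical raw scores accumulated from earlier online administrations.
norm_group = [34, 41, 45, 38, 52, 47, 40, 36, 49, 44]

def add_scores(norm_group, new_scores):
    """Fold newly collected scores into the norm group so that the norms
    reflect the most recent data."""
    norm_group.extend(new_scores)
    return norm_group

def standard_scores(raw, norm_group):
    """Convert a raw score to a z-score and a T-score (mean 50, SD 10)
    against the current norm group."""
    mean = statistics.mean(norm_group)
    sd = statistics.stdev(norm_group)
    z = (raw - mean) / sd
    return z, 50 + 10 * z

add_scores(norm_group, [39, 48, 55])        # today's test-takers
z, t = standard_scores(46, norm_group)
print(f"z = {z:.2f}, T = {t:.1f}")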

18.3.4 The assessment of additional parameters


A second way in which computerisation has affected assessment is
through its ability to assess various parameters and processes that cannot
be measured using ordinary paper-and-pencil technology. These
parameters include factors such as latencies (the time taken to process
different items, which can be an indication of ability, decision-making
style and impulsivity), recursions (the number of times the person
returns to previous items, which is an indication of visual search
patterns, confidence in and commitment to the solutions arrived at) and
error analysis (the patterns of items where the person struggles or fails).
This third aspect can be important if the assessment results are to form
the basis of some sort of training or remediation.
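A minimal sketch of how these additional parameters might be captured during computerised administration is given below; the record structure, item identifier and session shown are hypothetical and serve only to illustrate latencies and recursions as data points.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ItemRecord:
    """Per-item data of a kind that paper-and-pencil administration cannot capture."""
    item_id: str
    latencies: List[float] = field(default_factory=list)   # seconds spent on each visit
    visits: int = 0                                        # recursions = visits - 1
    correct: bool = False

    def record_visit(self, seconds: float, correct: bool) -> None:
        self.visits += 1
        self.latencies.append(seconds)
        self.correct = correct

    @property
    def recursions(self) -> int:
        return max(self.visits - 1, 0)

# Illustrative session: the test-taker answers item 7, then returns to it later.
item = ItemRecord("item_07")
item.record_visit(12.4, correct=False)    # first attempt
item.record_visit(5.1, correct=True)      # returns and changes the answer
print(item.recursions, sum(item.latencies), item.correct)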

When we looked at the definition and assessment of intelligence in
section 10.1.1, one of the findings was that measured intelligence
correlates quite highly with the speed of information processing. In fact,
we defined intelligence as “the efficiency of information processing”,
with efficiency being equated to speed of processing and correctness of
outcome. In this regard, the eminent psychologist Matarazzo (1992, p.
1012) argues that physiological measures of intelligence are likely to
become one of the most important developments of psychological
assessment and using these indices will allow us to predict “success in
school, as well as occupational attainment and other aspects of everyday
living”.

Intelligence-as-efficiency also focuses attention on the speed with which
desirable outcomes are achieved: good solutions need to be found
relatively quickly, although it is conceded that the more difficult the
problem, the longer it will take to find a valued solution.
This idea of speed is, of course, present in the distinction between power
and speed tests. Although some work has been done on this aspect of
computerisation in South Africa (e.g. Verster, 1989), and taken forward
by people such as Tredoux (2013), more attention needs to (and
undoubtedly will) be paid to it in future.

According to Fernández-Ballesteros (2006), neuropsychology will also
become an increasingly important field of assessment, based on
developments in basic research techniques into brain-behaviour
relationships. This, she argues, has been helped by technological
developments in such areas as computerised axial tomography (CAT)
and regional cerebral blood flow (rCBF), among others. She believes that
improvement is still needed in the development of norms, criteria and
qualitative evaluation that will allow the assessor to establish more
accurate diagnoses, prognoses, and rehabilitation programme design.
However, she cautions against expecting to find any biological and bio-
physiological indices of personality as may be the case with cognitive
ability tests (see section 18.2.1). She argues that

[i]n spite of the fact that bio-physiological measurement and
assessment is, nowadays, extremely useful in basic psychological
research and in the diagnosis and rehabilitation of individual cases,
the predictive power of any of these biological indices in school
achievement or occupational success or other progress in everyday
life is not yet supported and it seems difficult that this will be reached
… [w]ithout taking into account environmental as well as
motivational factors, we will never be able to predict those
multidimensional and molar behaviours (no page no).

18.3.5 Computer-based adaptive testing


The third area in which computers can be expected to impact on
assessment is via the use of computer-based adaptive testing (CAT).
CAT (sometimes referred to as stratified or strat-adaptive assessment) is
an “interactive, computer-administered assessment process in which the
items presented are based, in part, on the performance on previous items
by the person being assessed”. What this means is that the way a person
responds to one item determines to a degree which item is presented
next (see Cohen & Swerdlik, 2002, p. 546). This can apply within a
particular assessment task (especially with ability tests), and also across
different tasks within a battery (involving such assessments as
personality, attitudes, values, motivation, etc.).

The process is as follows:

1. The first step is to develop an item bank or library of measures that
are ranked in order of difficulty using item response theory (IRT)*.
2. An item is administered at approximately the 30 per cent difficulty
level. (For the sake of simplicity, let us imagine a test of 100 items
ranked from easiest to most difficult.)
3. If the person gets this 30th item correct, then the program goes to the
60 per cent difficulty level (item 60). If he gets this correct, the
program goes to the 75 per cent level. If he gets, say, the 60 per cent
level item wrong, the program goes to the 50 per cent level. If the
person gets this correct, the program goes to the 55 per cent level, and
so on. This is shown in Figure 18.1.

Figure 18.1 Flow diagram for adaptive testing

Once a person has answered three questions correctly at any given level,
this is taken as his achievement level on the test. Of course, some
allowance must also be made for guessing. Using a CAT approach
makes the assessment much shorter because the person being assessed
does not waste time answering items that are very easy (in which case
they would all be answered correctly) or very difficult (in which case
they would all be answered incorrectly). The above explanation is based
on an ability test, but one can see how personality or job satisfaction
scores could be obtained in a similar fashion: if the candidate were to
say he was not interested in, for example, the outdoors, then later
questions relating to outdoor activities could be excluded. Meijer and
Nering (1999) suggest that one area for the future use of CAT is for
personality tests, where faking and inconsistencies in item responses can
be detected and additional items then administered to adjust for or
identify those inconsistencies. Ben-Porath, Slutske and Butcher (1989)
have shown how adaptive testing can be used very successfully with the
Minnesota Multiphasic Personality Inventory (MMPI).
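The following is a minimal sketch of this kind of branching logic, assuming a bank of 100 items ranked from easiest to hardest; the halving step size and the stopping rule are deliberate simplifications for illustration and do not reproduce the exact procedure described above or any commercial CAT engine.

def adaptive_test(answer_item, n_items=100, start=30):
    """Simplified adaptive branching over an item bank ranked from easiest (1)
    to hardest (n_items). answer_item(i) returns True if the test-taker answers
    item i correctly. The step size halves after each response and the routine
    stops once the step becomes negligible."""
    level, step = start, n_items // 3
    while step >= 1:
        if answer_item(level):
            level = min(level + step, n_items)   # move to a harder item
        else:
            level = max(level - step, 1)         # move to an easier item
        step //= 2
    return level                                 # estimated achievement level

# Illustrative test-taker whose true ability lies at about item 62.
estimate = adaptive_test(lambda item: item <= 62)
print(estimate)                                  # converges close to 62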
18.3.5.1 Advantages of adaptive testing
A CAT system is very efficient and has been shown to reduce the
number of items by as much as 50 per cent. It therefore takes far less
time to assess the person’s ability level, while also reducing
measurement variance (the random error component) by at least 50 per
cent (see Cohen & Swerdlik, 2002, p. 546 for details).

18.3.5.2 Disadvantages of adaptive testing


Despite the decided advantages associated with CAT, there are several
important disadvantages. One of the most important is that this method
provides very limited opportunities for the test-taker to learn from
previous items. In answering the items in many, if not most,
conventional tests and similar assessment tools, the person being
assessed learns as he progresses through the exercise.

This learning, termed the learning or transfer effect, is one reason why
test-retest reliability is lower than one would hope (this is discussed in
Chapter 4).

A second problem associated with CAT, and with item response theory
on which it is based, is that it assumes that the ordering and relative
difficulty of items are the same for all groups. In other words, it assumes that the item at,
say, the 60th percentile of difficulty for members of group A is at the
same level of difficulty for members of group B. Given the social and
educational discrepancies between different groups in South Africa, this
assumption is not warranted.

18.3.6 Computerised report writing


As stated above, a real advantage of computer-assisted assessment is the
facility that most programs have of generating written reports. This is
essentially a variation of the mechanical approach to decision making
(see section 6.5.1), in which different interpretations are linked to
different scores in a branching fashion to produce statements that sound
knowledgeable. The basic format of this interpretation is simply
something like the following: “If the score on dimension A is X and the
score on dimension B is Y, and if the score on dimension C is Z, then
…”. (See Sidebar 18.2.)

Sidebar 18.2 Decision rules for interpreting scores


Just how sophisticated these decision rules need to be is illustrated by an
example in Cohen and Swerdlik (2002, p. 558). Describing a software program for
screening employees on the MMPI (the Minnesota Multiphasic Personality
Inventory – a measure of personality within a clinical setting), they show that
interpreting the statement: “The client may be inclined to keep problems to
himself too much” depends on the following conditions being met:

1. Lie and Correction scales are greater than the Infrequency scale.
2. The Infrequency scale is less than a T-score of 55, and
3. the Depression, Paranoia, Psychasthenia and Schizophrenia scales are less
than a T-score of 65, and
4. the Conversion Hysteria scale is greater than T=69, or
5. the Need for Affection subscale is greater than T=63, or
6. the Conversion Hysteria scale is greater than T=64, and the Denial of Social
Anxiety subscale or Inhibition of Aggression subscale is greater than T=59,
or
7. the Repression scale is greater than T=59, or
8. the Brooding subscale is greater than T=59.

All of these conditions need to be met in order to give a score and interpret the
statement quoted above.
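To show what such branching decision rules can look like in code, the sketch below echoes the flavour of Sidebar 18.2: the scale names follow the sidebar, but the combination logic is deliberately simplified and the threshold values and scores are illustrative only; this is not the actual MMPI interpretation software.

def keeps_problems_to_self(t):
    """Simplified branching rule in the spirit of Sidebar 18.2. t maps scale
    names to T-scores; the thresholds below are illustrative only."""
    validity_ok = (t["Lie"] > t["Infrequency"]
                   and t["Correction"] > t["Infrequency"]
                   and t["Infrequency"] < 55)
    clinical_ok = all(t[s] < 65 for s in
                      ("Depression", "Paranoia", "Psychasthenia", "Schizophrenia"))
    repression_signs = (t["Conversion Hysteria"] > 69
                        or t["Need for Affection"] > 63
                        or t["Repression"] > 59
                        or t["Brooding"] > 59)
    return validity_ok and clinical_ok and repression_signs

scores = {"Lie": 60, "Correction": 58, "Infrequency": 48, "Depression": 55,
          "Paranoia": 50, "Psychasthenia": 52, "Schizophrenia": 54,
          "Conversion Hysteria": 71, "Need for Affection": 50,
          "Repression": 45, "Brooding": 40}

if keeps_problems_to_self(scores):
    print("The client may be inclined to keep problems to himself too much.")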

It takes very little extra effort for a program to recognise the gender of
the person being assessed, and to generate the correct pronouns (he/she,
his/her) in the reports that are produced.

18.3.6.1 Validity of the interpretation


As Sidebar 18.2 shows, the generation of an interpretive report such as
that outlined above is essentially a mechanical or actuarial combination
of scores (albeit a highly sophisticated one). The question now arises as
to whether these reports are more valid, just as valid as or less valid than
those produced by an experienced assessor.
Obviously, the first thing to note is that the answer to this depends on
the sophistication of the program. In the early stages of the development
of this technology, serious questions were raised about the accuracy of
some interpretations. However, in recent years this has not been an issue
with reputable products, because they have been developed by panels of
experts and programmers.

Grove et al. (2000) use a meta-analysis of 136 studies to compare the
results of the interpretations of scores by experienced assessors
(clinicians) with the reports generated by computer programs. In some
studies the two interpretations are almost the same, but Grove et al.
conclude that “on average, the computer-generated interpretations were
approximately 10% more accurate than those generated by the
clinicians” (See Cohen & Swerdlik, 2002, p. 561). They argue that this
is the case because the computers, unlike the clinicians, are 100 per cent
reliable. Grove et al. also make the point that the computer-based
interpretations are significantly cheaper than if clinical psychologists are
used. At the same time, it must be conceded that when non-quantitative
data, such as interviews, are included, or where the evaluation takes
place in a cross-cultural setting, the superiority of the computer-based
systems is likely to be reduced. In many ways, the issue of clinical
versus mechanical combination of data, discussed in section 6.5.1, is
being raised again.

Although the local psychological authorities have yet to pronounce
definitively on the whole issue of computer-assisted assessment, it is
important that the person interpreting the reports reads them very
carefully. The profession demands that these reports are read and
reinterpreted if necessary – a clinical combination of information still
appears to be regarded as superior to the mechanical combination of
information.

18.3.7 Assessment via the Internet


Perhaps the most important development in recent times has been testing
via the Internet. As Macqueen (2012) has argued, both globalisation and
the increasing need for speed and efficiency in test administration and
the accompanying decision making has led to a marked increase in the
use of online psychological testing in recent years, especially within
organisational settings. He cites the fact that SHL, a leading global test
provider, currently conducts 95 per cent of its testing online rather than
through the traditional paper-and-pencil methods. Citing Hambleton
(2010), he argues further that there is widespread support for the view
that within five to ten years all psychological testing, apart from certain
clinical and neuropsychological testing, will be conducted online. He
also cites the 2012 Global Assessment Trends Report (Fallaw,
Kantrowitz & Dawson, 2012) as stating that at least 64 per cent of the
respondents to the human resources survey indicated that their
companies allow “remote” online testing without direct supervision (i.e.
“unproctored”).

Internet-based testing includes all the advantages and disadvantages of
personal (micro) computer-based testing, but introduces additional
factors, negative and positive. While some of the tests reflect an online
version of pre-existing paper-and-pencil tests, others are original. A
major advantage of Internet-based testing is that it does not require test
software to be installed locally, as this resides on a remote server and uses
only a few fairly standard capabilities of the test-taker’s own computer. Access to a
particular online test may be open to all or limited to selected users by
means of a password or other mechanism. Test time can be
predetermined (through correct programming), and these tests assess
various content areas such as tests of intelligence and specific abilities,
perceptual tests, clerical tests, measures of a wide range of attitudes,
personality and vocational interests, and more.

18.3.7.1 Advantages
Web-based testing has the advantages of offering 24-hour access to
testing in all corners of the world, immediate scoring, and a limited need
for test administrators, which means convenient, cost-effective and
efficient testing (Jones, 1998; Jones & Higgins, n.d.). Online testing also
provides advantages in terms of cost, volume, efficiency, global reach
and standardisation (see, for example, Tippins, 2009). The Internet also
allows both recruitment organisations/managers and researchers to find
a large number of participants who are already online and are interested
in being assessed.

There are a few other advantages that relate to the design of the tests and
scales:

1. Firstly, data can be collected quickly and cheaply in a relatively
secure way that protects privacy and ensures confidentiality.
2. Online testing has flexibility with respect to factors such as item
format, and accessibility to a given test in different language
versions and with choice of norms. Macqueen (2012) points out that
with the advancement of item response theory (IRT), online testing
is able to generate a number of equivalent but not identical tests
(bespoke tests). This process is now increasingly being driven by
two methods, namely the linear-on-the-fly technique (LOFT*),
which involves a large databank of items selected at random for a
test of fixed item length; and secondly computerised adaptive testing
(CAT), which can result in a relatively short test, with item selection
from a large databank of items being dependent upon the test-taker’s
pattern of responses. The test is concluded once a prescribed
threshold of the standard error of measurement is reached. CAT is
described in section 18.3.5, and a minimal sketch of the LOFT approach
is given after this list.
3. Publishers can ensure that test administrators have access to the
most up-to-date test versions, norms and manuals.
4. Participants can take an online test anonymously, in private, and at
their own pace, which encourages honesty in their responses.
5. Internet pages can be constructed with mandatory fields to prevent
oversights or omissions during testing.
6. Participants may feel more comfortable revealing sensitive data
about themselves to a computer than to a human interviewer.
7. If follow-up is desired, an entry can be created for the participant to
leave an email address while the researcher can leave a contact
email address on the web test for future questions.
8. Participants who would like to follow the outcome of a particular
study can be given a “results” page for further information, which
may assist researchers in providing debriefing and making research
results available to participants.
9. One final advantage of online assessment relates to the issue of
illegal copying of tests. Depending on how an Internet-based test is
laid out, it can be made more difficult to print or reproduce than a
paper test. Even if the items are copied, they are of limited use without
the means to score them.
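As noted in point 2 above, the following is a minimal sketch of the LOFT idea: a fixed-length form drawn at random from a large item bank so that candidates receive equivalent but not identical tests. The item bank, form length and function name are invented for the example; real LOFT engines would in addition balance content and difficulty rather than sampling purely at random.

import random

def loft_form(item_bank, length, seed=None):
    """Linear-on-the-fly testing (LOFT), simplified: draw a fixed-length form
    at random from a large calibrated item bank so that no two test-takers
    see exactly the same test."""
    rng = random.Random(seed)
    return rng.sample(item_bank, length)

# Hypothetical bank of 500 calibrated items identified by number.
item_bank = [f"item_{i:03d}" for i in range(500)]

form_for_candidate_a = loft_form(item_bank, length=40)
form_for_candidate_b = loft_form(item_bank, length=40)
print(form_for_candidate_a[:5])
print(form_for_candidate_b[:5])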

In essence, online testing can lead to better, faster and cheaper
assessment outcomes, particularly where large numbers of test-takers are
involved.

18.3.7.2 Disadvantages
Despite the many advantages associated with online assessment, there
are a number of disadvantages as well. These include concerns about the
measurement and construct equivalence of items (and the test as a
whole), test security, standardisation of administration at remote testing
sites and what Macqueen (2012) refers to as “touch”.

Measurement and construct equivalence. A fundamental question
needs to be addressed: “What are we really measuring?” In converting
a paper-based test to an online format, appropriate piloting and even
simulation need to be conducted, with consideration of differential
item functioning. However, as indicated in section 18.3.2.4, all
indications are that computer-based tests maintain the reliability and
validity of the original paper-and-pencil formats. Nevertheless, more
work needs to be done in this area, especially with respect to the
cross-cultural equivalence of online measures given the sometimes
large discrepancies in language and exposure to computer-based
technologies.
Cheating and security. By far the greatest risk is that job-seekers
who are being assessed are in a position to cheat or to get assistance
during the process. This is especially so when the assessment process
is carried out remotely without any form of external monitoring. As
Macqueen (2012) points out, “speeded” high-stakes cognitive tests are
less vulnerable than unspeeded “power” tests, as the former appear to
be partially buffered because of the timed nature of the assessment.
Surrogates or “stand-ins” may take the test on someone else’s behalf,
although test-taker authentication can also be an issue for traditional
testing. One way around this is to ensure that some form of
administrator or supervisor or proctor is present when the assessment
takes place.

A useful way of managing cheating is to inform people that a parallel version of
the assessment will take place under controlled conditions at a later stage should
the person be short-listed. A discrepancy of more than (say) 0,5 of a Standard
Deviation or one Standard Error of Measurement (SEM) on the second
assessment will result in the person being automatically excluded from further
consideration. This warning decreases the temptation to cheat.

More subtle methods of detecting item responses that indicate prior
exposure to the material (various transfer and learning effects) may be
found. The emerging field of data forensics is able to detect and thus
prevent inappropriate test-taking behaviour. The use of technology
provides a major step forward in controlling for levels of honesty and
faking and other response styles/demand characteristics. In this regard,
Macqueen (2012) shows that US-based organisations such as Kryteryon
offer real-time analysis of online responses so that unusual patterns can
be detected (e.g. fast latencies on difficult items) and keystroke analytics
can be used to authenticate test-taker identity. Nevertheless, cheating
can still occur.

Macqueen (2012) also poses the question as to what percentage of test-
takers actually cheat on online tests. He cites a study by Jing, Drasgow
and Gibby (2012), who claim that the estimated base rate of cheating is
low, and that the extent of cheating is probably influenced by the
perceived selection ratio and the average level of item difficulty.
Similarly, Weiner and Rice (2012) contend that (only) five to ten per
cent of scores obtained in unsupervised conditions were unconfirmed in
subsequent verification testing. Macqueen (2012) raises the issue of
what level of confidence is required for a practitioner to conclude that an
individual has cheated when a verification score is statistically different
from the original unsupervised score. The solution posed in the box
above is half a standard deviation, although this is merely a wild guess
that needs further investigation.
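A small worked example may help. The standard error of measurement is conventionally estimated as SEM = SD × √(1 − rxx); the sketch below applies the half-a-standard-deviation or one-SEM rule suggested in the box above to a verification score, with all figures (a T-score metric and a reliability of 0,90) chosen purely for illustration.

import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

def verification_flag(unsupervised, supervised, sd, reliability):
    """Flag a candidate whose supervised verification score falls more than
    one SEM, or half a standard deviation (whichever is larger), below the
    earlier unsupervised score."""
    discrepancy = unsupervised - supervised
    return discrepancy > max(sem(sd, reliability), 0.5 * sd)

# Illustrative figures: T-score metric (SD = 10), test reliability 0.90.
print(sem(10, 0.90))                          # about 3.16 T-score points
print(verification_flag(62, 55, 10, 0.90))    # discrepancy of 7 -> flagged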

Standardisation. As we have seen, one of the advantages of online
testing is the standardisation of test materials, and the administration
and scoring processes associated with the testing. At the same time,
there can be variability within an unsupervised testing process,
including not only the testing environment, but also the quality and
suitability of the device display and the technology in general. Ongoing
technological improvements, however, continue to reduce the impact of
the technology itself. This having been said, we should not
lose sight of the fact that candidates are often prepared to undertake
testing under less than satisfactory conditions. Macqueen (2012) cites
a study by Morelli (2012) in which data from over 900 000 applicants
for customer support roles were presented – in this study, a small
percentage had been tested via a game console, and some applications
are now available for personality tests to be completed on mobile
devices. In the light of these findings, Macqueen (2012) argues that
the test delivery system and the testing environment continue to be
potential sources of error in test scores.
Lack of “touch”. The final issue raised by Macqueen (2012) is what
he terms a lack of “touch”. By this he means a lack of contact with the
test-taker and a failure to appreciate the context of the testing and any
special factors associated with the testing activity or the test-taker. He
states that “touch” provides qualitative information which can
increase the variability in the assessment (“noise”), but it can also
provide rich information regarding the individual and how they
function (“signal”) (pp. 2–3 of 5). He goes on to argue that the
potential danger with online testing is that the profile generated by the
computer and its report writer becomes the “person” in the eyes of the
person interpreting the test outcome, particularly if the person is
unaware of aspects such as measurement error, confidence intervals
and the nature of the norms being used.

18.3.8 Dynamic assessment


As we saw in Chapter 10 (section 10.5.8), dynamic testing or assessment
is gaining popularity in South Africa and elsewhere in the world.
However, it is a labour-intensive process, although recent innovations
such as improved administrator training and a standardised teaching
process have reduced this burden. It is likely
that computer-based instruction, used in an adaptive fashion, will
increase the practicability and use of this technique even further. It is a
powerful tool that overcomes a number of the problems associated with
the fairness of many existing measures when used in a multicultural
setting. It is also likely to prove to be the method of choice when
assessing potential in previously disadvantaged and culturally different
groups. For an in-depth examination of dynamic assessment in South
Africa, especially to counter the various arguments against its use, see
Murphy and Maree (2006).

18.3.9 Stratified systems theory


Chapter 14 (section 14.3.1) shows that stratified systems theory (SST) links
the employee’s information processing complexity level to the level of
work at which he is able to function optimally. In 1989, SST was
relabelled Requisite Organisation by Elliott Jaques (see Jaques, 1998). It
is likely that this assessment technique will increase in popularity,
especially if the somewhat labour-intensive approach to assessment can
be computerised. Once again, some form of adaptive assessment is
needed to give impetus to this technique. In her Master’s thesis,
Kitching (2004) explored the cross-cultural equivalence of the Career
Path Appreciation (CPA) technique. Using a database of approximately
27 000 cases provided by BIOSS, she identified 4 606 entries with CPA
scores. On the basis of her analysis of these results she concluded that
there is no evidence of bias across Asian, black, coloured and white
groups. Although these results are promising, more work in the area of
the cross-cultural equivalence of the CPA is perhaps needed. (For a
good analysis of the contribution (and lack thereof) by Jaques, readers
are referred to Groenewald’s (2012) Master’s thesis where he takes an
interpretivist view of Jaques’ work and SST in particular.)

18.3.10 Other new technologies


According to Fernández-Ballesteros (2006), one of the most promising
fields for assessment is virtual reality, which can be defined as a
computer-generated simulation of the three-dimensional environment in
which the user is able to both deal with and manipulate the content of
that environment involving his five senses (Stampe, Roehl & Eagan,
1993, p. 9). This will require not only a revolution in the presentation of
visual stimuli, but also the engagement of other senses such as hearing, touch,
balance and even smell. Fernández-Ballesteros asks whether it would be
possible to test in virtual reality and answers with an absolute “yes” to
this, arguing that spatial orientation, learning, interpersonal interaction,
as well as other targets for assessment could be tested by means of
virtual situational tests. The use of computer-based in-baskets and other
simulations used widely in assessment centres and other behavioural
assessment go some of the way toward realising this potential. However,
the use of “scent injectors” analogous to ink-jet printing technology has
hardly begun to be realised, and yet, conceptually at least, this does not
appear to be an insurmountable problem.

18.4 Theoretical advances

The third area which may give rise to new constructs or ways of
assessing is the development of new theories in science in general, and
in the cognitive, behavioural and social sciences in particular. In this
section we examine a few of the more exciting developments that are
likely to affect psychological assessment. Perhaps the most important of
these is chaos theory or complexity science. Another is artificial
intelligence. A third is the issue of biological and physiological
measures of intellectual capacity. Although this last concept has been
around for almost as long as psychological assessment itself, recent
developments in the biological sciences now make it likely that
physiological measures may become real, accurate and valid measures
of psychological functions. Let us examine each of these briefly.

18.4.1 Complexity theory

One area of theorising from which new concepts and definitions of
personality may emerge is that of the complexity sciences*. Complexity
theory is an emerging field or paradigm* of science that is closely
related to chaos theory*. In many respects, it is the logical successor to
the general systems theory that emerged in the second half of the 20th
century. Complexity science throws up new ideas such as fractals*
(which are self-repeating patterns at different levels of magnitude) and
strange attractors* (which can crudely be seen as “magnets” which
pull phenomena to particular points and push them away from others).

Sidebar 18.3 Complexity theory and the new science


In very basic terms, complexity theory is concerned with the appearance of order
out of chaos. When a system is closed and it receives no information from outside
itself, it spirals down into disorder in a process that is called entropy. Under
certain circumstances, however, new order emerges from this chaos – life is an
example of such an emergence. A key construct in complexity science is fractals.
A fractal is a pattern that reflects itself at different levels of magnitude. A classic
example of a fractal pattern can be seen in a cauliflower, where the same basic
shape repeats itself at the smallest pinhead level, at the pea-size level, at the
egg-size level and at the total cauliflower head level. The same can be seen in
the shape of clouds and trees.
A second term to examine is strange attractor. Very simply stated, a strange
attractor is some force that attracts a system to a certain point. Imagine a marble
rotating around in a bowl. As the marble slows down it moves to the bottom of the
bowl until it comes to rest, as though it were pulled or attracted to that point. It is
the activity of these strange attractors that gives rise to fractal patterns – some
“magnet” is pulling the elements of the system to very similar outcomes. No two
snowflakes are identical, but each has a fractal similarity to other snowflakes. (In
this case the strange attractor relates to the dipolar nature of the water molecule.)
A third related concept we must look at comes from evolutionary biology. This is
termed a fitness landscape. Imagine that a surface or plane is covered with a
number of organisms (plants or animals). As these organisms evolve, some are
better suited to the environment and thrive, others are not suited and perish. If we
think of those organisms that are better survivors as moving to a higher fitness
level, we can see that what was a level surface now becomes one that has
various hills and valleys, with the organisms that are the best survivors (the
winners) on top of each hill, and the less suited (the losers) in the valleys.
If we now combine these constructs, we see that the survival forces that move the
organisms up the hills, that is to a higher fitness level, are strange attractors (the
organisms move up the hills in exactly the same way that the marble was pulled
to the bottom of the bowl). The strange attractors also give rise to fractal patterns.
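The marble-in-a-bowl image can be expressed as a toy simulation: a state that is repeatedly pulled a fraction of the way towards a fixed point. The sketch below is purely illustrative of the point-attractor idea and claims nothing about how attractors would actually be estimated from psychological data.

def marble_in_bowl(x, steps=50, pull=0.2):
    """Toy point attractor: at every step the state is pulled a fraction of
    the way towards the bottom of the bowl (x = 0), mirroring the marble
    example above."""
    trajectory = [x]
    for _ in range(steps):
        x = x - pull * x          # the "magnet" draws the marble towards 0
        trajectory.append(x)
    return trajectory

path = marble_in_bowl(8.0)
print(round(path[0], 3), round(path[10], 3), round(path[-1], 3))   # converges towards 0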

These constructs lend themselves to new ideas in psychology and
assessment. Firstly, the idea that phenomena such as leadership and
personality can be seen as strange attractors is beginning to emerge in
the literature. Personality, for example, can be seen as the “magnet” or
attractor that pulls our thoughts and actions in particular directions
rather than in others, while leadership is the “magnet” that attracts
enthusiasm and directs followers along certain courses of action. At the
neurological level, such an attractor may be the relative strength of a
neural pathway. For example, Tryon (1999) has suggested that post-
traumatic stress disorder (PTSD) involves a tendency to associate any
incoming stimulus with memories of a traumatic event because of
activation of pervasive self-reinforcing connections (attractors)
representing aspects of the event in memory. Severity, in this
framework, is proportional to the strength of the attractor.

Siegle and Hasselmo (2002, pp. 4–5) argue that “individuals who are
efficient at pattern completion will be more vulnerable to PTSD because
they are better able to engage in memory retrieval based on a previous
network state, a key feature of attractor formation”.

Secondly, these patterns of behaviour are fractal in nature – they occur
at all levels of behaviour. If we were to look into the behaviour and
thought patterns of the serial killer and cannibal Jeffrey Dahmer, for
example, we may well find that as a child he pulled the wings off flies
and burned ants with a magnifying glass. We may ask whether these
actions were fractals of the pathology to come. A study of Adolf Hitler
as an adolescent (Pausewang, 1997) shows him sitting in the town
square of Linz in Austria, deciding how he would restructure the town
centre if he were in power. Was this a fractal of his emerging
megalomania? (For a discussion of personality as a fractal, see Marks-
Tarlow, 2002.)

Thirdly, if we take the notion of a fitness landscape and see different
personalities as peaks in this landscape, we realise that a particular
personality can become out of tune with its environment if this changes.
In the same way, highly specialised species such as the dinosaurs
became out of tune with their landscape when this changed. Using this
framework, psychotherapy becomes the process of moving the
personality from its (now dysfunctional) fitness peak and trying to create
a new fitness peak for it. Perhaps personality can be assessed in terms of
its suitability to the fitness landscape, given that the person is in
particular social and environmental conditions. (We can also analyse
organisational survival in terms of a fitness landscape. For example, the
South African arms industry thrived under the apartheid regime, but has
struggled to survive in the post-apartheid world – the environment in
which it had developed a high level of fitness suddenly shifted and what
was a strength became a liability almost overnight. As with personality,
a major avenue of organisational assessment could be the strategic
evaluation of the organisation’s potential to survive given particular
sociopolitical and business scenarios.)

A recent book edited by Anolli et al. (2005) presents a mathematical
process for interpreting an important aspect of complexity, which the
editors term a hierarchical time pattern or T-pattern. In a preface to the
book, Professor Marcello Fontenesi, Rector Magnificus of the
University of Milan-Bicocca, Italy, states that T-patterns are
probabilistic, repeated, recursive (self-similar and hierarchical),
synchronic and/or sequential structures that may involve any number of
individuals and types of behavioural, physiological and/or kinds of
environmental events. The editors claim that their book “offers a series
of recommendations, based on extensive evidence from research, about
how to investigate human and animal behaviour” in order to uncover
“the hidden patterns behind human and animal behaviours”. They offer a
software tool for detecting and analysing these patterns which they
suggest are “new tools for the detection of hidden behaviour patterns”
(Fontenesi, 2005, electronic version). Could this program be used to
identify new and perhaps very different personality factors?

18.4.2 Artificial intelligence


Theories of artificial intelligence (AI) and various forms of knowledge-
based expert systems are also likely to gain influence. As far as AI is
concerned, there is a distinction between knowledge-based systems
(KBS) and case-based reasoning (CBR) approaches. Knowledge-based
systems (KBS) depend on the existence of a large database of
knowledge to perform difficult tasks. This knowledge is obtained from
experts who program relevant information such as facts, rules, heuristics
and procedures into a database. These data are then used in solving the
problem at hand. CBR approaches the problem from another angle: it
uses the behaviour of human experts in specific situations. A case
library is set up and each expert decision is stored. The input question
(representing a case) is fed into the system using appropriate features
recognisable to the system. The system then matches the case question
to the most similar problem stored in the library, and that solution is
used. In other words, when a new case is put in, CBR characterises the
situation (or case) and searches for, and applies, the most successful
solutions based on previous similar cases. This approach accepts that
there are no right answers, just those that were successfully used in
similar previous cases.
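A minimal sketch of the case-matching step just described follows; the case library, the features and the stored "solutions" are invented, and the similarity measure is a simple squared Euclidean distance rather than anything used in a production CBR system.

def most_similar_case(case_library, new_case):
    """Case-based reasoning, minimally: return the stored solution of the
    library case whose features lie closest to the new case (smallest squared
    Euclidean distance over the shared numeric features)."""
    def distance(stored):
        return sum((stored["features"][k] - new_case[k]) ** 2 for k in new_case)
    best = min(case_library, key=distance)
    return best["solution"]

# Hypothetical case library built from past expert selection decisions.
case_library = [
    {"features": {"ability": 0.8, "experience": 0.2}, "solution": "appoint with mentoring"},
    {"features": {"ability": 0.4, "experience": 0.9}, "solution": "appoint to routine role"},
    {"features": {"ability": 0.3, "experience": 0.2}, "solution": "do not appoint"},
]

print(most_similar_case(case_library, {"ability": 0.75, "experience": 0.3}))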

Theories of artificial intelligence have already been used to develop a
system for diagnosing autism (Adarrage & Zacagnini, 1992).
Knowledge-based systems (KBSs) of artificial intelligence are already
being used by the Tennessee Department of Corrections in the US for
determining which inmates are eligible for parole (Peterson, 1993).
These approaches could form the basis for selecting job applicants in the
next decade or so. CBR, with its ability to use past precedents, is likely
to have applications in criminal settings, where it could be used to
determine a sentence and prison term in criminal cases. Its use in
selection and placement decisions could also prove fruitful.

18.4.3 Biological approaches (biologically anchored assessment)
The possibility of using basic biological characteristics to measure
intellectual ability has always been a strong element in assessment
theory. In fact, the earliest attempts to assess intelligence were based on
physiological and psycho-physiological properties such as reaction time
(see Chapter 10, especially section 10.2). One of the primary reasons for
the lack of progress in this area has undoubtedly been the absence of
sophisticated tools and processes for measuring these attributes.
However, with the recent improvement in technology (e.g. CAT scans,
PET and MRI scans, improved EEG technology and other new forms of
brain imaging), this avenue of theorising may be more likely to yield
positive results. Many aspects of brain anatomy and physiology have
been suggested as potentially relevant measures of intelligence: the
arborisation (branching) of cortical neurons (Ceci, 1990); rates of
cerebral glucose metabolism (Haier, 1993); evoked potentials (Caryl,
1994); nerve conduction velocity (Reed & Jensen, 1993); and sex
hormones (see also Vernon, 1993).
have been unsuccessful predictors of intellectual ability in the past, this
is likely to have been a result of the crudity of the assessment processes,
rather than because they were logically impossible. According to Price
et al. (2000), it may be possible in the not too distant future to relate
specific brain functions to aspects of test performance.

18.4.4 Greater environmental involvement and activism


In her article, Fernández-Ballesteros (2006) shows that during the 1960s
and 1970s, a strong environmental and ecological position emerged in
psychology, as it did in many other disciplines affected by the “green
revolution” and a general social activism calling for political, social and
sexual liberation. In terms of this agenda, a number of psycho-
environmental models were designed by social agents in close
relationship with other social and natural disciplines in line with social
policies aimed at solving pressing environmental problems (Moos, 1973;
Cone & Hayes, 1980). In this way, science, technology and social needs
interact in order to solve the identified problems. Within this context,
psychological assessment, along with other disciplines, is involved in
the important task of describing, predicting and evaluating human
environments. Since environmental problems require behavioural
solutions, and in a world where resources are dwindling and becoming
less evenly distributed, and where jobs and other forms of work are
decreasing, the importance of environmental assessment and activism
identified by Fernández-Ballesteros is likely to increase over the next
decades. There is also an applied concern with adjusting the environment
to the people in it, as well as with preserving natural environments through
more eco-friendly behaviours. The role of psychological assessment in
determining attitudes and behaviours toward the environment and
mapping out shifts over time or after specific interventions is thus likely
to grow in importance in the coming years.

18.5 Control of assessment and professional training

New developments in the training of psychologists, the control of tests
and Internet testing, and the fairness of assessment (particularly in
multicultural settings) are likely to occur. In particular, the assessment of
neurological damage resulting from head injuries is increasingly being
dominated by computer-based testing procedures. The role of
psychological assistants (UK) and testing technicians (US), with duties
that are limited to the administration of computer-based assessment, has
been accepted in various parts of the world, although it seems that this is
being resisted in South Africa, as discussed in Chapter 9 (section 9.7.1).

In the recent past, the Professional Board for Psychology announced its
intention to allow specialist psychologists to be trained and registered –
it seems that these specialists will need training at doctoral level.
However, the exact nature of this training and the nature of the doctorate
are unclear. What is certain is that the idea of a doctorate in psychology
(DPsych) has been rejected by various authorities as being incompatible
with the academic nature of a doctorate. As a result, some other
qualification will have to be found to designate this group of specialist
psychologists. It also seems that the specialist registration will be
granted initially to people with advanced qualifications and experience
in neuropsychology and forensic psychology (the study of criminal and
pathological behaviour). The current situation is that these two
categories have been recognised as registration categories, and relevant
“scopes of practice” have been developed. It would thus appear that the
idea of specialist registration has been put on hold for the foreseeable
future.

18.6 The future of psychological assessment and testing in particular

It should be clear from all that has been said in this chapter and in the
book as a whole that psychological assessment is far more than just
testing. In addition, there is a growing acceptance of the need for
psychological assessments of various kinds and for various reasons.
These include the identification of abilities in people being assessed.
They also include the identification of problem areas that may have
occurred as a result of accident or disease. This last is of particular
interest to those concerned with ensuring justice for all stakeholders –
the individual, the family, the workplace and society in general.
Assessment is crucial in ensuring health and safety, job satisfaction and
personal growth, in the optimal placement of people into different work
categories, and in ensuring adequate insurance reimbursement for
damage that may have occurred as a result of accidents and illness.

In the organisational/work context, there is a growing need to identify


candidates with appropriate skills levels, especially at the middle and
higher job levels, and people with the potential to learn and deliver
desired organisational outcomes at the lower job levels. Throughout the
world there is an increasing shortage of skilled employees – a large UK
study of 905 organisations conducted in 2006 (cited by Morgan, 2007)
revealed that 84 per cent experienced recruitment difficulties as a result
of (1) lack of special skills (65%); (2) high pay expectations (46%); (3)
insufficient experience (37%). Of course, the recent economic downturn
has altered this picture somewhat. In addition, advances in shop-floor
technology mean that existing employees often need to be “retooled” in
a relatively short time. Changes in technology have also resulted in the
need to adapt recruitment practices. For example, Morgan (2007) reports
that 80 per cent of “Fortune 500” companies only accept job
applications made online.

In closing, we must take note of leading South African psychologist


Victor Nell when he states that

[a]bility testing is not going to go away in the new South Africa.


However strongly groups in the anti-test lobby … insist on an end to
the tyranny of test scores, psychological assessment is so deeply rooted
in the global education and personnel selection systems, and in the
administration of civil and criminal justice, that South African
parents, teachers, employers, work seekers and lawyers will continue
to demand detailed psychological assessments (Nell, 1994, p. 105).

Very little has changed since then, and his arguments remain as
powerful as ever.

18.7 Conclusion

In an overview of the areas covered in this book, the following points


can be made.

1. There is a resurgence of interest in, and in the practice of, psychological
assessment in general in South Africa, but particularly in the
workplace, a trend that parallels the experience in the US. This is tied in
part to the findings that personality is important for job success.
2. In particular, there is a new awareness of the strengths of, and value
added by, properly conducted and psychometrically sound forms of
assessment. In this respect, psychological assessment has been
shown to be as accurate and as meaningful as medical tests.
3. The rekindled interest is also related to the failure of non-test-based
forms of assessment to identify much-needed job skills and
competencies.
4. Personality tests are being used increasingly in workplace selection,
largely as the result of research flowing from the Big Five theory of
Costa and McCrae (1992), as well as from findings that the culture
fairness and construct equivalence of items used in personality
assessment are generally higher than those of the purely intellectual
ones.
5. Although ideas associated with adaptive testing have been around
since the 1970s, the use of adaptive testing has increased substantially
in recent years, largely as a result of developments in computerisation.
6. New theories based on evolving theories in science, such as
complexity theory, fitness landscapes and artificial intelligence, will
create new approaches to, and methods of, assessment.
7. A new generation of tests based on item response theory, together
with artificial intelligence, computer modelling and other computer-
assisted strategies, will produce improvements in theory and
technology, as well as in assessment practice.
8. New social concerns in the work context as well as in the wider
social environment will create new psychological constructs that
will have to be assessed. Many of these will come from areas
associated with the positive psychology movement.
9. The control of assessment practice in South Africa will continue to
evolve, and methods will be found for ensuring the proper training
of highly professional and ethical practitioners.
10. Perhaps this is best summed up by the comments in the box below.

Recent research has prompted a new recognition of the value of assessment


among consumers and health professionals, and the advocacy efforts of
psychology have helped insurers recognise the value of assessment as well.
Meanwhile, psychologists are moving into new areas of assessment, and
developing new techniques.
“In large measure, the public understands the value of psychological testing,”
says Bruce L. Smith, PhD, assessment advocacy coordinator for the Society for
Personality Assessment (SPA) and a private practitioner in Berkeley, California.
“People have the sense that this is science, not someone’s opinion.” (Cited by
Clay, 2006.)
Additional reading

For an examination of the advantages and disadvantages of computerised testing, see


McIntire, S.A. & Miller, L.A. (2000). Foundations of psychological testing, especially
pages 63–68.
A similar discussion is given in Chapter 17 of Cohen, R.J. & Swerdlik, M.E. (2002).
Psychological testing and assessment: An introduction to tests and measurement.

Test your understanding

Short paragraphs

1. Briefly outline some of the advantages and disadvantages of computer-assisted


assessment.
2. Briefly discuss one of the emerging theories of science that may impact on
psychological assessment.
3. Outline three ways in which computer-based assessment is likely to affect
psychological assessment.

Essays

1. “Emotional intelligence is not a form of intelligence in the strict sense of the word”.
Discuss.
2. “Psychometric assessment is like democracy: it is the worst possible system –
except for all the others.” Discuss this modification of Winston Churchill’s statement
about democracy in the light of the anti-testing movement.
3. Critically discuss the issue of whether online testing should be controlled by a
registered/chartered psychologist or whether proctoring/supervision by some “lesser-
qualified” professional should be allowed.
APPENDICES
Appendix 1

SOME TESTS AND MEASURES OF MAXIMUM AND TYPICAL PERFORMANCE

In this appendix, various tests of ability (maximum performance) and


personality, values (typical performance), etc. are listed, along with their
producers/distributors.

A1.1 Tests of maximum performance (ability)

Tests of maximum performance can be described in terms of various


dimensions, among which the three most important are

attribute being measured


presentation format
target audience (i.e. the job/educational level at which they are most
applicable).

What do these terms mean?

It is important to note that in August 2014 the HPCSA published a
list of tests that have been classified and certified (Board Notice 93
of 2014). This list must be read in conjunction with the Employment
Equity Amendment Act 47 of 2013 (see Chapter 7, Section 7.2 of this
text).

A1.1.1 Attribute being measured


Tests of maximum performance generally measure cognitive ability in
five different task areas. These are:

1. Memory. This is the ability to recall new information over a relatively


short time (usually 20–30 minutes).

2. Knowledge. This is the ability to correctly recall and use information


from a number of areas such as general knowledge (who was the
first president of South Africa after liberation?), language (spelling,
grammar and idioms) and technical knowledge (if wheel A turns in a
clockwise direction, what does wheel B do?)

3. Reasoning. This is the ability to apply logic to information in order to solve a
problem, for example: If John has 20 apples and he eats three of them, how many
apples does he have left?

4. Visual transport or visualisation. This is the ability to manipulate


objects in your head, such as seeing what an object would look like if
it were rotated, turned upside down or seen from the back. A
comparison task in which one object or list is compared with another
object or list in a proofreading task is also an example.

5. Psychomotor tasks. These involve assembling various objects from


parts such as doing a jigsaw puzzle or making a human figure
(manikin) from various parts.

A1.1.2 Presentation format


The content of these tests can also vary across four different areas,
namely:

1. Language (two plus two is equal to?).


2. Numbers (2 + 2 = ?).
3. Figures (shapes, not numbers). Note that the figural material can be
representational (pictures of cats and dogs, lions and tigers) or
abstract (squiggly lines, triangles, squares, etc.).
4. Physical objects (assemble parts of a jigsaw puzzle).

Each of the five main categories of tests can use these four content areas,
so that we can have memory for words, memory for numbers, memory
for pictures, memory for abstract designs and memory for physical
objects. (We have all played the memory game of looking at a table with
twenty objects on it for a short time and then trying to recall the objects
on the table.) This results in the categorisation shown in Table A1.2.

Let us examine each of these presentation formats.

A1.1.2.1 Figural visualisation


Some of the more important tests of figural visualisation are

Figural memory. This is the memory of shapes and pictures.


Perceptual speed. This is the ability to find examples of a certain
shape in a series of shapes, e.g. find all the * in the following
sequence and draw a line through them:

This sequence can go on for pages, and the person has to draw a line
through all the *s that he or she can find in a two-minute period.

Comparison. This is a typical proofreading task, in which a certain


sequence of numbers needs to be matched with a similar set some
distance away. For example, find the set in the right-hand column that
is identical to the set in the left-hand column.
(In this case, the answer is d). Of course, the further apart the two sets
of symbols that need to be compared, the more difficult the task
becomes.

2-D rotation. Which of the following two-dimensional shapes can be


obtained by rotating the example on a flat plane, i.e. without turning it
over or flipping it?

(c is the correct answer)

3-D rotation. This is the same as the previous but using three-
dimensional shapes.
2-D assembly. Which two shapes can be joined together to form a
square?

(d is the correct answer)

3-D assembly. Which two shapes can be joined together to form a


cube?
2-D disassembly. Which of the following shapes is contained within
the bigger, more complex pattern?

(c is the correct answer)

3-D disassembly. This is the same as the previous one, but using
three-dimensional shapes. A typical example of this would be the so-
called exploded diagram of an object such as a carburettor, brake
assembly, sewing machine, etc.

A1.1.2.2 Figural reasoning


The next class of tests are those called figural reasoning. This involves
working out which object or shape completes a sequence or pattern.
There are basically four ways of doing this, namely:

Series. Which is the next object in the series?


+ ** +++ **** +++++ ?

Clearly, the next item is ******.

Matrices. This is a series that runs both across and down. Raven’s
Progressive Matrices is the best known of this, although various other
test producers have used this format.
Analogies. This is similar to the verbal analogies hot:cold, wet:dry
(hot is to cold as wet is to dry) and takes the form of various shapes
such as:
Classification of representative objects. Which of the following is the
odd one out? (Pictures of a dog, cat, horse, cow and giraffe?) The
giraffe, because the others are all domestic animals.
Classification of abstract objects. This is the same as the previous, but
using abstract objects such as shapes, squiggly lines, etc.

A1.1.2.3 Verbal tests


Verbal tests are those based on language and come in many forms, from
a simple memory for words, through visual transport (proofreading
sentences and paragraphs), to grammar, spelling and other language
rules to reading comprehension and verbal reasoning. Let us take a brief
look at each of these in turn.

Verbal memory. This is the ability to remember words after a period


of time. For example, a short story is read (say about Simon who is
sent to the shop to buy certain things). After 20 minutes or so, the test
taker has to remember who the person was (Simon) and what things
he had to buy at the shop – was it 500 grams of butter and 1 kg of
sausages, or was it 1 kg of butter and 500 grams of sausage?
Verbal receptive ability. This is the ability of the person to understand
the spoken word; it is basically a dictation test.
Verbal usage. This is the correct use of grammar, spelling and other
rules of language.
Verbal comparison. This is a proofreading task, in which words on
one page have to be compared to words on another page, to see which
are incorrectly copied. It is the same as the figural comparison
discussed above.
Reading comprehension. This is the ability to understand and
comprehend the meaning of different texts; it is basically a reading
study and is based on general knowledge topics.
Verbal reasoning. This is the ability to understand and reason/solve
problems using verbal material. An example of this would be: If Jane
walks 60 metres due north, turns and walks due east for 80 metres,
how far is she from her starting point? (This is a classic example of a
3-4-5 triangle – she will be 100 metres from her starting point!)

A1.1.2.4 Numeric tests


Numeric tests are those that work with numbers. Once again, there are
various tests that are based on numbers. The simplest is numerical
comparison, followed by computation and numerical reasoning.

Numerical comparison. This is a proofreading exercise using numbers


as shown in the example below.

63587     a) 65387   b) 63578   c) 63587   d) 36587   e) 63857

In this case, the answer is c).

Computation. This is the ability to apply the rules of arithmetic from


simple addition and subtraction to fractions, ratios, powers, square
roots, and using a different base (e.g. counting to the base 8, etc.).
Numerical reasoning. This is the ability to perform typical story sums,
such as: John has three apples and Thabo gives him four more. How
many apples does John now have?
Numerical interpretation. This involves the ability to interpret data
from graphs, scales, tables, gauges and similar displays of numerical
information.

A1.1.2.5 Technical, scientific, mechanical knowledge


There are also a number of tests that contain material about various
mechanical and technical areas. These might include material about
cogs, levers and pulleys, or about the way in which boats are steered
(e.g. in order to turn a boat to the left, does one push the rudder to the
left or the right?).

Insight and comprehension. These tests are essentially reading


comprehension tests in which the material is relatively technical in
nature and would obviously be used for selecting people for jobs that
involve this type of knowledge.
Assembly. This is the ability to assemble various objects such as a
tripod in terms of a particular design, using all the bits and pieces
provided. (This overlaps to a small extent with the psychomotor tasks
described in the next section. However, the emphasis in these tests is
on the ability to see how the various components fit together in a kind
of three-dimensional jigsaw puzzle, rather than on the physical
activity of assembling or building the object.)

A1.1.2.6 Psychomotor tests


A final set of tests that are sometimes used measure various
psychomotor abilities, such as hand-eye coordination, balance, the
ability to orient yourself in space, and physical strength. The following
are among the most important areas that are
assessed in this domain.

Motor speed. This is the ability to carry out various motor tasks as
quickly as possible. Tasks used here include things like putting pegs
in a pegboard using a pair of tweezers, matching nuts and bolts of
different sizes and screwing the nuts onto the bolts, sorting washers of
different sizes into appropriate containers, and so forth. These are
generally measures of the ability to see differences in the objects and
of fine motor coordination/manual dexterity. (Such assessments may
be important for jobs involving assembly of small units such as
watches, cellphones, computers and the like.)
Reaction time. This is the ability to react quickly to a stimulus. In the
simplest version, a rod (similar to a broomstick) with ruler markings
on it is released, and the time taken for the person to react to the
dropping stick by catching it is measured by the number of notches
from the start to where the person catches the rod. Other versions of
this involve recording the time taken from a light appearing and the
person hitting a switch. (In Chapters 9 and 16, it is argued that
reaction time may be closely related to overall intelligence, especially
when fairly complex decisions need to be taken.)
Hand-eye coordination. This is the ability to track an object in space.
Some of you might have played a game at a fairground in which a
ring must be moved along the length of a squiggly piece of wire
without touching the wire. A similar task is one in which the person
has to draw a pencil line between the walls of an object such as a star
or circle as quickly as possible without touching either of the walls.

Two-hand-eye coordination. This is the ability to coordinate both


hands while tracking an object in space. Many of you will have seen
old World War II movies where a gunner has to move a gun to the left
or right using one hand and up and down using the other hand.
Modern PlayStation games also require this kind of two-hand-eye
coordination. (Sometimes the test-taker also has to coordinate his or
her feet as well. This is what happens when you drive a car and
change gears manually. Using a cellphone while driving involves
coordinating your mouth as well!)
Balancing. In some cases, people are assessed for their ability to
maintain their balance, sometimes even when they are blindfolded.
Strength. Similarly, people are sometimes required to show their
strength. This could be whole body strength (such as when lifting a
heavy object) or it could be arm or hand strength.
Drawing. One last area of psychomotor ability that is sometimes
assessed is technical drawing ability, such as copying the design of a
tool or blueprint of some kind.

A1.1.3 Levels of applicability (target audience)


Of course, not all tests are useful at all educational and job levels, and so
a final step is to specify the different levels where the tests are useful
(i.e. most valid and reliable). Six such levels can be identified, although
some experts may want a different system with fewer or more levels.
The levels shown in Table A1.1 are proposed as a useful framework.

Table A1.1 Levels at which testing in the workplace occurs

Level                         1            2            3              4                     5                      6
Approximate job level         Illiterate   Unskilled    Semi-skilled   Skilled & technical   Managerial             Senior management
ABET level                    1            2            3              4                     5                      6+
Approximate education level   Grade 0–3    Grade 4–7    Grade 8–10     Grade 9–12            Grade 12 + tertiary    Postgraduate

A1.2 Ability tests matrix

If we combine these different levels with the different test domains we


have identified, we end up with a matrix as shown in Table A1.2. The
final step is then to identify various tests and subtests of batteries that
can be placed within this matrix. Note that the various tests in each cell
of the matrix are roughly equivalent and can be used more or less
interchangeably, bearing in mind that the nature of the job may vary, so
that one person may prefer more clerical items, whereas another may
prefer more mechanical or technical items. The locations of some of the
currently available tests produced by local test suppliers are shown,
although the list is not complete. These are indicated in Table A1.2, with
the full name of each test given in Table A1.4.

In Tables A1.3 and A1.5, a summary of the various measures of typical
behaviour (personality, styles, attitudes, integrity and the like) is given.
Finally, in Table A1.6, the names and contact details of various
suppliers of the test material are listed. Local test producers/distributors
were invited to add to or comment on these tables, but there was no
response to this invitation.
These various ability and typical behaviour tests and subtests can be
combined into specific batteries by choosing appropriate tests at relevant
levels. Typical batteries include clerical, technical, commercial, IT,
sales, security and general management.

For a full list of tests classified by the Professional Board for


Psychology of the HPCSA, see
http://www.psyc.co.za/HPCSA/FILES/FORMS/207.pdf. Tests and
measures not listed by the HPCSA should only be used by people who
know what they are doing. In all cases, the relevance of the material and
availability of local norms should be borne in mind – the use of this
material may need to be defended in a civil or labour court.

Table A1.2 A Matrix of ability tests*

Level 1 2 3 4 5 6
Senior
Approximate job level Illiterate Unskilled Semi- Skilled Managerial managers
skilled & technical and
professional
Approximate education level Grade 0–3 Grade 4– Grade Grade 9– Grade 12 + Postgraduate
7 8–10 12 tertiary
Figural Figural memory SAT/10
visualisation Perceptual Cancl Cancl Cancl
speed
Comparison CWP DAT- DAT-
RS* KL*/6,
TRAT/8
2-D rotation TRAT/12, VSA VSA
VSA
3-D rotation DAT- VSA VSA
KL/7,
VSA
2-D assembly SAT/8, VSA VSA
SpAbil,
TRAT/6
3-D assembly DAT- VSA VSA
KL/7,
SAT/9
VSA,
SRT/2,
TRAT/16
2-D Gott Gott
disassembly
3-D
disassembly
Figural Series SAT/6,
reasoning TRAT/14
Matrices SPM, CPM, COPAS1 SPM SPM SAT/5, APM, AR1
COPAS 1 & 11, SPM AR2
Analogies DAT-KL/3
Classification – TRAT/5
representative
Classification – FCT-HL
abstract
Verbal Verbal memory DAT-KL/9
SAT/7
Verbal Dictation1
receptive
Verbal usage SP2 DAT- VR1
KL/1,
TRAT/13,
TRAT/15,
SP2, VR2
Verbal CC2 CC2,
comparison Fi2 DAT-
LK/6, Fi2
SAT/1
Comprehension VWP1 DAT-KL/5
Reasoning Following DAT- CRT/1, VDR, VIR 1
instructions1 KL/2, VC1/3 VR1 Various
VR2 suppliers, no
details
available
Numerical Numerical CC2 CC2,
comparison DATKL/6
SAT/4,
TRAT/8
Computation NA2 NA2
DATKL/4,
SAT/2,
TRAT/7
Reasoning SAT/3, CRT/2, VNR, VDR,
TRAT/11 NR1, VIR
NMG/1-6
Numerical TRAT/9
interpretation
Technical, Knowledge, VMC DAT-
scientific, insight and KL/8,
mechanical comprehension MRT2,
TTB2,
TRAT/10
Assembly Manikin/jigsaw
puzzle
assembly
Psychomotor Motor speed SAT/11
Reaction time
Typing and
computer skills
Hand-eye VAC2, TRAT/1
coordination WSSM
2 Hand-eye TRAT/2
coordination
Balancing
Strength
Drawing
Dynamic LPCAT, APIL,
testing TRAM LPCAT
II

* Although every effort has been made to ensure that the


tests and subtests are correctly located in the matrix and
producers/distributors were approached to confirm these
locations, the matrix may not be a hundred per cent accurate.
** Note that the DAT forms K and L are parallel versions, as
are forms R and S. These are given in the matrix as DAT-KL
and DAT-RS.

Table A1.3 A matrix of measures of typical behaviour

Level 1 2 3 4* 5* 6*
Senior
Approximate Illiterate Unskilled Semi- Skilled Managerial managers
job level skilled & technical and
professionals
Approximate Grades Grades Grades Grades 9– Grade 12 + Postgraduate
education level 0–3 4–7 8–10 12 tertiary
Personality Scales – 15FQ+,
batteries 16PF,
FFM, HPI
Single- LOC.MBTI,
factor JTI, Type
scales A/B
Projective TAT,
Rorschach
Workplace OIP OIP,
specific OPPro,
OPQ, WPI
Motivation MQ MQ
Values, interests and career CAnchs,
choice OPQ,SDS,
VMI
Affective states
Emotional intelligence BarOn,
MSCEIT,
PDA
Interpersonal skills PDA
Leadership/management OPQ
styles
Integrity/dependability DSI,
Giotto,
IP2000
Communication/decision- PDA
making style
Team functioning LGT Belbin,
OPQ
* Note: Most of the personality and other tests of typical behaviour
can be used at levels 4, 5 and 6.
It is important to note that not all tests listed in this matrix have
been classified and certified by the HPCSA, and that not all the
tests certified by the HPCSA are listed here. The full HPCSA list
of tests can be downloaded from www.gpwonline.co.za.

Table A1.4 Names of tests contained in the matrix of ability tests

Label Full title of test/subtest Supplier


APIL-B Ability, Processing of Information and Learning Battery APL
APM Advanced Progressive Matrices JvR, MMM
AR1 Graduate Abstract Reasoning Test PSA
AR2 General Abstract Reasoning Test PSA
Cancl Cancellation tasks Various
CC2 Clerical Checking Test PSA
COPAS Cognitive and Potential Assessment Integ
CRT/1 Critical Reasoning Test Subtest 1 PSA
CRTB2 Critical Reasoning Test Battery PSA
DAT Differential Aptitude Test (various forms) MMM
FCT-HL Figure Classification Test — high level MMM
FI2 Filing Test (part of clerical battery) PSA
Gott Gottschalt Embedded Figures Test MMM
GRT1 Graduate Reasoning Tests PSA
GVR2 General Verbal Reasoning Test PSA
LPCAT The Learning Potential Computerised Adaptive Test M&M
MRT2 Mechanical Reasoning Test PSA
NA2 Numerical Ability PSA
NR1 Graduate Numerical Reasoning Test PSA
NR2 General Numerical Reasoning Test PSA
SAT Senior Aptitude Test MMM
SP2 Spelling Test PSA
SPM Raven’s Standard Progressive Matrices MMM
SRT2 Spatial Reasoning Test PSA
TRAM Transfer, Automisation and Memory APL
TRAT Trade Aptitude Test Battery MMM
TTB2 Technical Test Battery PSA
VAC2 Visual Acuity Test PSA
VDR Deductive Reasoning (Verify) CEB/SHL
VIR Inductive Reasoning (Verify) CEB/SHL
VMC Mechanical Comprehension (Verify) CEB/SHL
VNR Numerical Reasoning (Verify) CEB/SHL
VR1 Graduate Verbal Reasoning Test PSA
VR2 General Verbal Reasoning Test PSA
VSA Spatial Ability (Verify) CEB/SHL
VVR Verbal Reasoning – Following Instructions (Verify) CEB/SHL

Table A1.5 Full names of tests of typical performance

15FQ+ 15FQ Plus Questionnaire PSA


16PF 16 Personality Factor Inventory JvR
BarOn BarOn Emotional Quotient Inventory (BarON EQ-I™) JvR
Belbin Belbin Team Roles Belbin
CAnchs, Career Anchors (Schein) Various
DSI Dependability and Safety Instrument CEB/SHL
FFM Five Factor Model (OCEAN) or Revised Neuroticism, Various
Extraversion, Openness Personality Inventory (NEO-PI-R),
Giotto Giotto Integrity Test SCon
HPI Hogan Personality Inventory JvR
IP2000 Integrity 2000 Integ
JTI Jung Type Indicator PSA
LGT Leaderless Group Test Various
LOC Locus of Control Various
MBTI Myers-Briggs Type Indicator JvR
MQ Motivation Questionnaire CEB/SHL
MSCEIT Mayer-Salovey-Caruso Emotional Intelligence Test JvR
OIP Occupational Interest Profile PSA
OPPro Occupational Personality Profile PSA
OPQ Occupational Personality Questionnaire CEB/SHL
PDA Personal Development Analysis Bioss
SDS Self-Directed Search (Holland’s RIASEC model) Various
VMI Values and Motives Inventory PSA
WPI Workplace Personality Index JVR

Table A1.6 Key to distributor acronyms used and contact details

The various test distributors listed above and their contact details
(current as at June 2014) are given below.
Name Location Contact
APL Aprolab Johannesburg aprolab@icon.co.za
Belbin Belbin Cape Town, www.capacityinc.co.za; carolk@7i.co.za
Associates Johannesburg
Bioss BIOSS Johannesburg info@bioss.com
Southern
Africa
CEB/SHL Corporate Centurion zacustomersuccess@shl.com
Executive
Board/SHL
US
Integ Integrity Johannesburg integrity@integtests.com
International
JvR JVR Africa Johannesburg info@jvrafrica.co.za
MMM MindMuzik Pretoria sales@mindmuzik.com
Media
M&M M&M Pretoria info@mminitiatives.com
Initiatives
PSA Psytech South Johannesburg www.psytech.co.za
Africa
SCon Saville Johannesburg info@savilleconsulting.co.za
Consulting
Appendix 2

CALCULATING CORRELATIONS

A2.1 General introduction

There are basically three stages to carrying out a correlation (or any
other statistical analysis, for that matter). These are as follows:

1. Data capture – getting the various bits of data ready for analysis
2. Conducting the analysis
3. Interpreting the results

Various programs are available for this. The most widely used are
Statistica®, Excel® and Statistical Package for the Social Sciences®
(SPSS). In this appendix, we take you through each of the three stages
in all three of these programs.

A2.1.1 Correlations
Before we do this, we need to clarify one small issue. As you know,
correlations compare the variance of one set of numbers with the
variance of another in order to determine the amount of overlap between
the variances of the two sets of numbers. This is best visualised in terms
of two overlapping circles, as is shown in Figure A2.1.
Figure A2.1 Visualisation of a correlation between two
variables

Sidebar A2.1
In statistical terms, a correlation is nothing more than the overlap between the two
circles shown in Figure A2.1 (termed the covariance, because they vary in
relation to each other) divided by the average or mean of the variance of the two
samples. To make things simpler, let us call the one circle A and the other circle
B, and let us call the covariance AB. In terms of this notation, the correlation
coefficient (r) is obtained by AB/mean of A and B (AB divided by the mean of A
and B).
However, there is a small extension to this calculation, and it is this: there are
three distinct ways of calculating the mean, of which two are important for this
discussion.
The way that we all know is simply to add up all the scores and divide by the
number of scores. (In this case, we simply add A and B and divide by 2
[(A+B)/2].) This is known as the arithmetic mean. The second way of calculating
the mean is to multiply the two scores and then take the square root of the product.
This is known as the geometric mean and is shown as √(A × B).
This figure is usually a little lower than the arithmetic mean and is preferred by the
statisticians, for reasons unknown to me. When calculating the correlation
coefficient, the covariance AB is divided by the geometric mean √(A × B) rather
than the arithmetic mean (A+B)/2, so that r = AB/√(A × B).

The third type of mean is known as the harmonic mean, but is of no interest to us
here. It is defined as the reciprocal of the arithmetic mean of the reciprocals. A
reciprocal of any number is 1 divided by that number. To calculate the harmonic
mean, divide 1 by A and then divide 1 by B. Next, add these two numbers and
divide by 2. Finally, divide 1 by this answer.
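
As a concrete check on the calculation described in this sidebar, the short sketch below (written in Python with the numpy library; the sample values are invented purely for illustration) computes the covariance of two sets of scores, divides it by the geometric mean of their variances, and confirms that the result equals the Pearson correlation that numpy itself reports.

import numpy as np

# Two small, invented samples of scores
a = np.array([3, 5, 4, 2, 3, 4], dtype=float)
b = np.array([2, 4, 5, 1, 3, 5], dtype=float)

# Covariance AB: the "overlap" between the two circles in Figure A2.1
cov_ab = np.mean((a - a.mean()) * (b - b.mean()))

# Variance of each sample, and the geometric mean of the two variances
var_a = np.mean((a - a.mean()) ** 2)
var_b = np.mean((b - b.mean()) ** 2)
geo_mean = np.sqrt(var_a * var_b)

# r = AB divided by the geometric mean of the variances
r = cov_ab / geo_mean

# numpy's built-in Pearson correlation gives the same answer
print(round(r, 4), round(np.corrcoef(a, b)[0, 1], 4))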

We know that the value of r must fall within the range of +1 to –1.
Where r = +1, a perfect positive correlation exists, that is, the values of
the variables fall and rise together. Where r = –1, there is a perfect
negative correlation between the variables, meaning that an increase in
the value of one variable is accompanied by a decrease in the value of
the other variable. When r = 0, this denotes a zero correlation – the
variables are unrelated to each other.

In most psychological studies, perfect correlations are never found; a


value of above 0,6 is good and above 0,4 or even 0,3 is acceptable. In
practice, items that correlate at 0,35 or higher can be retained. If a
negative correlation is obtained, it is an indication that the item is
probably scored in the wrong direction.

A2.1.2 Item analysis


When we carry out an item analysis, we correlate the score of each item
with the item total or whole, that is, the scores of all the items added
together. This is called item-whole correlation. However, each item’s
score forms part of the total score and so the correlation between the
item and the total is slightly too high. According to the purists and
experts, it is better to calculate the correlation between each item and the
total excluding that item. This is termed an item-remainder correlation.
Although this is theoretically an important refinement, in practice there
is little difference between an item-whole correlation and its item-
remainder correlation equivalent. In this text, item-whole correlations
are used.
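
The difference between the two approaches is easy to see in a small worked sketch. The example below is written in Python with the pandas library (the item scores are fictitious and purely illustrative); it correlates each item with the total of all items (item-whole) and with the total excluding that item (item-remainder).

import pandas as pd

# Fictitious scores of six test takers on four items
items = pd.DataFrame({
    "item1": [3, 5, 4, 2, 3, 4],
    "item2": [2, 4, 5, 1, 3, 5],
    "item3": [4, 4, 3, 2, 2, 5],
    "item4": [1, 5, 4, 2, 3, 3],
})

total = items.sum(axis=1)  # each person's total score (the "whole")

for col in items.columns:
    item_whole = items[col].corr(total)                    # item correlated with the full total
    item_remainder = items[col].corr(total - items[col])   # total excluding the item itself
    print(col, round(item_whole, 3), round(item_remainder, 3))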

There are various forms of correlation with associated correlation


coefficients. The most important of these are

(a) Pearson’s product moment coefficient of correlation (r)

(b) the rank coefficient of correlation also known as the Spearman’s


coefficient of correlation ρ (rho).

We will use the Pearson product moment correlation (r) in this analysis
tutorial. We will show you how to do the calculations using Excel,
Statistica and SPSS.

A2.1.3 The correlation matrix


Suppose we run a correlation between four variables. The output from
the various programs will be in the form of a correlation matrix as
appears in Table A2.1.

Table A2.1 A correlation table between four variables

        Var 1   Var 2   Var 3   Var 4

Var 1   1       ,35     ,62     ,49
Var 2   ,35     1       ,58     –,27
Var 3   ,62     ,58     1       ,43
Var 4   ,49     –,27    ,43     1

If we examine this matrix, several aspects are noteworthy.

a) The first row and column of Table A2.1 have the same variables. In
this case, they are labelled Var 1, Var 2, etc. If these variables had
been labelled as Age, Height, Width, etc. in setting up the data
matrix, these names would appear in the correlation matrix.

b) The values running down the diagonal are all 1 (shaded blocks). This
is not surprising, because in these cells each variable is correlated
with itself.

c) The correlations in the matrix are duplicated above and below the
diagonal. These are called off-diagonal cells. This is because the
correlation between Var 1 and Var 2 (,35) is the same as the
correlation between Var 2 and Var 1 (,35). Because of this, the one
half of the matrix is redundant – the same information appears above
and below the diagonal.

d) Sometimes, when correlations are reported in a manual or text, the


results of two different groups are given in the same matrix. For
example, the results for males could be given above the diagonal and
females below the diagonal. Similarly, “Before” scores could be
given above the diagonal and “After” scores below. Of course, the
computer does not produce these results in this form – you have to
type them in.

e) If you look at the correlation between Var 2 and Var 4, you see it is
negative. This may indicate that one of the variables is scored in the
wrong direction, although negative correlations do occur – the more
often you brush your teeth, the fewer caries you are likely to have.
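
A matrix of this kind can be produced directly from a captured data table. The sketch below (Python with the pandas library; the values are invented) generates a correlation matrix and shows the two features noted in points b) and c) above: the diagonal consists of 1s, and the values above and below the diagonal mirror each other.

import pandas as pd

# A small, invented data set with four variables and six cases
data = pd.DataFrame({
    "var1": [3, 5, 4, 2, 3, 4],
    "var2": [2, 4, 5, 1, 3, 5],
    "var3": [10, 8, 9, 12, 11, 7],
    "var4": [1, 5, 4, 2, 3, 3],
})

matrix = data.corr()    # Pearson correlations by default
print(matrix.round(2))  # diagonal = 1; upper and lower triangles are mirror images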

A2.1.4 A note on data capture


This is the process of preparing the various bits of data for analysis. In
order to perform any kind of analysis, the data must be captured in a
matrix or table form, with each person’s scores for the various items,
test scores, etc. (called variables) running across the matrix and the
different people (called cases) running down the matrix. This is shown
in Table A2.2. For the sake of simplicity, five people with six items are
shown. In addition, the total for each person (P-totals) and for each
variable (V-totals) is given.

Before we show you how to capture the data in the various programs,
look at Table A2.2. Firstly, note that the grand total in the bottom right
is the same (99 in this case) when you add down (21 + 11 + 20, etc.) and
when you add across (13 +15 + 18, etc.).

Table A2.2 Data sheet for five cases and six variables

Case      Var 1   Var 2   Var 3   Var 4   Var 5   Var 6   P-total

A         3       5       4       2       3       4       21
B         2       1       2       2       3       1       11
C         3       4       3       2       3       5       20
D         4       3       5       4       3       4       23
E         1       2       4       12      3       2       24
V-total   13      15      18      22      15      16      99

Note: Always work with people/cases down and scores/variables across.

Secondly, if you look at item 4 for case E, the score is given as 12


(shaded block). However, this is impossible, because each item is scored
out of 5, so this means that there was an error in capturing the data. This
needs to be corrected – let us assume it was 2. This would make the C-
total for case E 14, not 24, and the V-total for item 4 should be 12, not
22. Similarly, the grand total is 89 and not 99. It is important to look for
outliers of this kind and to make sure that all the data is correct before it
is analysed. It is important that somebody checks the data, preferably an
independent outsider. (In industry and research institutions, the data is
generally captured twice to ensure that it has been correctly captured. It
is recommended that a sample of about 10 per cent of your data is
recaptured.) Below, you will see some ways of spotting these errors. Of
course, if incorrect data is given to the data capturer, no amount of
checking will correct this. (Remember GIGO – garbage in, garbage out.)

For a good text on statistics and research methods, see Tredoux and
Durrheim (2005).
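
A simple range check of the kind described above can also be automated. The sketch below (Python with the pandas library) recaptures the raw scores of Table A2.2 and flags any value outside the permissible 1 to 5 range, which immediately points to the impossible 12 for case E on item 4.

import pandas as pd

# The raw scores from Table A2.2 (case E, item 4 was mis-captured as 12)
data = pd.DataFrame(
    {
        "var1": [3, 2, 3, 4, 1],
        "var2": [5, 1, 4, 3, 2],
        "var3": [4, 2, 3, 5, 4],
        "var4": [2, 2, 2, 4, 12],
        "var5": [3, 3, 3, 3, 3],
        "var6": [4, 1, 5, 4, 2],
    },
    index=["A", "B", "C", "D", "E"],
)

# Each item is scored out of 5, so anything outside 1-5 must be a capturing error
out_of_range = (data < 1) | (data > 5)
print(data[out_of_range].stack())  # reports: case E, var4, value 12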

A2.2 Statistica®

A2.2.1 To get started


a) Double click on the Statistica icon on the desktop.

b) Alternatively click on Start, go to Programs and click on Statistica.


A2.2.2 Capturing data in Statistica
In order to capture data in Statistica, do the following:

a) Select New from the File menu to display the Create New Document
dialogue.

b) Select the number of variables and cases.

Enter the number of variables and cases you have in your sample. The
exact numbers are not vital as you can add and subtract variables later
(see c) below). Fill in the MD (missing data) code, as indicated. Fill
in the Variable Names as indicated.

c) To add or subtract cases or variables, or change values, click on the


Insert button on the toolbar and choose whether you wish to add or
remove data. Then add, delete or modify the required data.

d) You can enter the names of the variables by double clicking on the
existing variable name (e.g. Var1) and entering the name of the
variable you want (e.g. Age). Click OK.

e) Save the data file using the name you want (e.g. Pilot 1).

A2.2.3 Displaying and checking data in Statistica


In order to ensure that your data is clean and that there are no typing
errors, it is useful to display the data. Do this as follows:

a) Click on Statistics on the toolbar.

b) Go to Basic Statistics/Tables.

c) Go to Descriptive Statistics.

d) Enter OK.

e) Go to the Variables button and select the variable(s) you wish to


display (e.g. Age, Total Scores). Select All if you wish.
f) Click on Summary at the right of the screen.

g) The results are then displayed for the variable(s) you specified. In this
way, outliers and impossible values can be identified.

h) Go to Descriptive Statistics. (You can also go to Frequency Tables.


In this case, a frequency table or histogram will result. Inspect this
for outliers.)

A2.2.4 Cleaning or correcting incorrect data in Statistica


If you identify any problems, go back and correct them according to
A2.2.2 c) above and Save.

A2.2.5 Conducting the analysis in Statistica


a) In order to run the correlation, go back to Descriptive Statistics and
click on Correlation Matrices.

b) Click on One Variable List and list the variables you wish to
correlate.

c) On the bottom right of the drop-down box, there is a choice of how to


deal with missing data. This is either casewise or pairwise deletion.
Casewise deletion results in the whole case being excluded from the
calculations, whereas pairwise includes the cases with missing data,
and simply excludes the pairs of variables with missing data involved
in the calculation. If there are 20 cases and each one has one bit of
information missing (e.g. Age, Test1, Total, etc.), casewise deletion
would result in all 20 cases being excluded. However, if pairwise
deletion is used, this would result in the correlations being based on
the remaining 19 cases. This is the better option in our view (see the sketch after this list). Of
course, if some cases have a large number of variable scores missing,
it casts some suspicion on the accuracy of the other scores for the
particular case.

d) Go to Quick and click on Summary Correlations.
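
For readers who want to see the casewise/pairwise distinction outside Statistica, the sketch below (Python with the pandas library; the data are invented and contain deliberate gaps) computes the correlations both ways: data.corr() excludes missing values pair by pair, while data.dropna().corr() first discards every case that has any missing value, which is the casewise approach.

import numpy as np
import pandas as pd

# Invented data in which three of the six cases each have one missing value
data = pd.DataFrame({
    "age":   [19, 21, np.nan, 23, 25, 20],
    "test1": [12, np.nan, 15, 14, 16, 13],
    "total": [30, 33, 35, np.nan, 38, 31],
})

pairwise = data.corr()           # pairwise deletion: each r uses all cases complete for that pair
casewise = data.dropna().corr()  # casewise deletion: only cases with no missing values at all

print(pairwise.round(2))
print(casewise.round(2))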


A2.2.6 Interpreting the results
The results are presented as a matrix, with the mean and standard
deviation of each variable forming the first two columns, followed by
the correlation matrix as outlined in section A2.1.3 of this appendix. The
results that are significant at the 0,05 (5%) level appear in bold.

If you want to change this significance level, go back to the Statistics


button on the toolbar, then to Resume. Then go to Options and raise or
lower the significance level (second line from bottom of the drop-down
box). Go back to Quick and Summary Correlations. Decide whether
the correlation is significant or not.

A2.3 Excel®

Excel is based on spreadsheet technology and has various formulae built


into it. Once the data has been captured and certified correct, these
formulae are applied.

A2.3.1 To get started


a) Double click on the Excel icon on the desktop.

b) Alternatively click on Start, go to Programs and click on Excel.

A2.3.2 Capturing data in Excel (version 5.0 or later)


In order to capture data in Excel, the following method is recommended:

a) To open Excel, double click on the Excel icon if it is available on the


desktop or go to Programs and click on Excel.

b) Double click on the tab Sheet1 at the bottom of the Excel sheet (the
default).
c) List the variables (i.e. the scores) for each case (person) across the
table and the various cases (people) below one another. This is the
data that will be used to calculate correlations, create graphs and
perform other statistical analyses as required.

d) Save the file to a file name that suits your needs (e.g. Sample1 2008).

e) In the screen shot below, we have two sets of data, width and height,
for nine cases. (Note that the spreadsheet in the example has not been
named, it is simply called Sheet 1.)

A2.3.3 Displaying or checking data in Excel


Before the data can be analysed, it must be displayed so that incorrect
entries and/or anomalies can be identified. One way of doing this is to
display the data in each column or variable. In this way, data that is
obviously incorrect can be identified. For example, suppose that in capturing the
ages of the people in our sample we entered 91 instead of 19. If we
then generated a table showing the highest and lowest scores on the
variable “Age” the 91 would appear. We could spot such an error if, say,
only young adults comprised our cases. Similarly, if we ranked all
scores from highest to lowest, we could pick up various other incorrect
entries. Of course, if the person’s age was 19 and we captured it as 18,
the error would not be obvious. It is therefore important that all data in
the spreadsheet be checked, preferably by an independent person.

A2.3.4 Correcting incorrect data in Excel


Erroneous capturing of data is a common phenomenon. To correct this,
do the following:

a) Place the cursor on the cell with the incorrect data.

b) Press the F2 button.

c) Enter the correct data and then press Enter.

d) Save the updated file.

A2.3.5 Conducting the analysis in Excel


To carry out a correlation:

a) Go to the function wizard fx at the top left of the screen.

b) Go to the drop-down menu Or select a category and select


Statistical.

c) Toggle down until you find Correl and enter OK.

d) Click OK or Finish at the bottom of the dialogue box.

e) Another dialogue box (function wizard or function argument) will


appear with Array1 and Array2 displayed.
f) Click on the little box at the right of Array1 and the first variable in
the table you created will appear.

g) Drag the cursor over the first variable you want to correlate. Repeat
this process for Array2 and click OK or Finish. This will give you
the correlation for the two variables you initially selected (width and
height).
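
As an alternative to the function wizard, the CORREL function can be typed directly into any empty cell. For example, =CORREL(A2:A10,B2:B10) returns the correlation between the values in cells A2 to A10 and those in B2 to B10; the cell ranges shown here are only an illustration and must be adjusted to wherever your own two variables are located.
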
A2.3.6 Interpreting the results
Decide whether the correlation is significant or not.
A2.4 SPSS®

SPSS is an initialism for Statistical Package for the Social Sciences. This
section is based on the SPSS for Windows 11.0, although other versions
are very similar. This section draws on Antonius (2003), especially Lab1
(pp. 216– 212).

A2.4.1 To get started


a) Double click on the SPSS icon on the desktop.

b) Alternatively click on Start, go to Programs and click on SPSS.

A2.4.2 Capturing data in SPSS


In order to capture data in SPSS, the following method is recommended:

a) Open the SPSS program.

b) Set up a data file by clicking on File, then New, then select the
number of variables (by name or label) and finally click OK. You
will be asked to specify each variable – the type of data and the
format. For example, you will be asked to give the number of
numerals involved and the number of decimals, for instance 3,1 is
three numerals of which one is a decimal. This means you will not be
able to enter a number larger than 99,9. This specification can be
changed at a later date (see A2.4.3 c) below).

The data file should appear in columns corresponding to the number of


variables selected. In this case there are two variables and nine cases.

Width Height
1
2
3
4
5
6
7
8
9

c) Capture the data

Width Height
1 22 31
2 23 32
3 24 33
4 25 34
5 26 35
6 27 36
7 28 37
8 29 38
9 30 39

d) Save the data using an appropriate name such as Sample 1.

A2.4.3 Displaying and checking data in SPSS


Before the data can be analysed, it must be displayed so that incorrect
entries and/or anomalies can be identified. One way of doing this is to
display the data in each column or variable. In this way, data that is
obviously incorrect can be identified. For example, suppose that in capturing the
ages of the people in our sample we entered 91 instead of 19. If we
then generated a table showing the highest and lowest scores on the
variable “Age” the 91 would appear. We could spot such an error if, say,
only young adults comprised our cases. Similarly, if we ranked all
scores from highest to lowest, we could pick up various other incorrect
entries. Of course, if the person’s age was 19 and we captured it as 18,
the error would not be obvious. It is therefore important that all data in
the spreadsheet be checked, preferably by an independent person.

a) To display the data, click on Analyze in the SPSS Data Editor


Toolbar, go to Descriptive Statistics and then choose the
Descriptives option.

b) Click on the arrow between the two panes and highlight the
variable(s) you wish to examine. Next, go to the Options button on
the bottom right of the screen and click on it.

c) Identify the descriptors you wish to use: for data cleaning, use
Maximum and Minimum. In the bottom half of this drop-down,
click on Variable List. Finally click on the Continue button on the
top right of the drop-down.

d) Examine the output to determine if there are any Maximums or


Minimums that are too high or too low.
A2.4.4 Correcting incorrect data in SPSS
Erroneous capturing of data is a common phenomenon. To correct this,
do the following:

a) If you are working from an existing file that has been previously
saved, click on Open an Existing File. Identify the file you want and
click OK.

b) Every time you open an existing file, you are presented with an SPSS
Data Editor, which gives you the choice of viewing the variable
information or the data that has been captured. You make your
choice by clicking on one of the two little tabs at the bottom left of
the screen, labelled Data View or Variable View. Unless you want
to change the Variable Specifications (as outlined in A2.4.2 b) above),
you need only to work with the Data View.

c) Make the changes you want.

d) Save the updated file.

A2.4.5 Conducting the analysis


a) Go to the main menu in the horizontal tool bar and click on Analyze.
A dialogue box will open. Select Correlate.

b) A subdialogue box will open. You will see that it contains Bivariate,
Partial and Distance. Select Bivariate to run a correlation between
the variables.

c) A further dialogue box with the title Bivariate Correlations in the


upper horizontal tab will open. The variables initially created in the
data file will appear in the variable box on the right. Highlight the
variables to be used in the correlation.

d) Still in the same dialogue box, the following are visible:


Correlation Coefficients. Choose the Pearson Correlation
Coefficient. (For your own exercise, you may wish to try out your
analysis with the other correlation coefficients such as Kendall’s
Tau-b or Spearman.)
Test of Significance. Select the Two-Tailed Test.

e) Finally click OK. An output of the correlation analysis in matrix form


will appear.

A2.4.6 Interpreting the results


Recall that the closer r gets to 1, the greater the relationship. When r =
–1, there is a perfect negative correlation between the variables, meaning
that an increase in the value of one variable is associated with a decrease
in the value of the other variable. Where r = +1, a perfect positive
correlation exists, while an r = 0 denotes a zero correlation.

A2.5 Evaluating item-whole correlations


Irrespective of which program you use, you will need to highlight each
variable that you are interested in and run the correlation. When you
conduct an item analysis, highlight each item and the total, and generate
a correlation matrix of the items and the total. You are not interested in
the various correlations between the different items, only between each
item and the total. A simplified matrix of five items and the total is
shown below using fictitious data.

Table A2.3

          Item 1   Item 2   Item 3   Item 4   Item 5   Total

Item 1    1        ,325     ,475     ,423     ,854     ,343
Item 2    ,325     1        ,338     ,249     ,204     ,672
Item 3    ,475     ,338     1        ,468     ,264     ,107
Item 4    ,423     ,249     ,468     1        ,317     ,604
Item 5    ,854     ,204     ,264     ,317     1        ,734
Total     ,343     ,672     ,107     ,604     ,734     1

If we look at the correlation matrix in Table A2.3, we see that the values
above and below the diagonal are identical, for the reason given in
section A2.1.3 c) of this appendix. We will work on the data below the
diagonal, as highlighted. From this you see that the item-total correlation
for item 1 is 0,343, for item 2 it is 0,672, for item 3 it is 0,107, for item 4
it is 0,604 and for item 5 it is 0,734.

In terms of our rule given in A2.1.1, every value above 0,350 is


acceptable. Therefore all items other than item 1 and item 3 are
acceptable, with item 1 almost there and item 3 well below the cut-off
score.

In interpreting these results, we can conclude that items 2, 4 and 5 have


acceptable itemwhole correlations and that if we really had to, we could
include item 1, because it is just below our cut-off point of 0,350.
However, item 3 is unacceptable and should be excluded.

Of course, we could re-examine the content and wording of these two


items, and run a second pilot study to see if the reworked items behave
in a better way. This would especially be the case where we had
obtained a negative correlation, which would suggest that the item had
been phrased in a negative direction.
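
Once the item-total correlations are known, the decision rule can be applied quite mechanically. The sketch below (plain Python, using the fictitious values from Table A2.3 and our cut-off of 0,35) keeps acceptable items, marks borderline ones, and flags any negative correlation as a candidate for reverse scoring or rewording.

# Item-total correlations taken from Table A2.3 (fictitious data)
item_totals = {"item1": 0.343, "item2": 0.672, "item3": 0.107,
               "item4": 0.604, "item5": 0.734}

CUT_OFF = 0.35

for item, r in item_totals.items():
    if r < 0:
        verdict = "negative: check scoring direction and wording"
    elif r >= CUT_OFF:
        verdict = "acceptable"
    else:
        verdict = "below cut-off: revise or discard"
    print(item, r, verdict)
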
GLOSSARY

Items in italics are cross-referenced to other entries in the glossary.

360-degree assessment – An approach to assessment usually used in


performance appraisals which involves several people from more
senior levels, peers and subordinates rating an employee. This
approach can provide useful data to act as external criteria in a
validation study.

A
Absolute zero – A true zero point on a measurement scale, indicating the complete absence of the attribute being measured; the best-known example is the lowest temperature that is theoretically possible.

Actuarial judgements – Decisions or predictions of future behaviour


based on numerical formulae derived from analyses of prior
outcomes.

Achievement test – A test of maximum performance designed to assess


knowledge, learning or proficiency. Generally used to measure the
outcomes of training or instruction in a specific area, usually one that
is academic or work related in nature.

Acquiescence – A response set or style characterised by the person


tending to agree with whatever is presented.

Adaptive tests – Tests that are made up of questions drawn from a large
item bank to match the ability level of the test taker. Also known as
tailored testing or response-contingent testing.

Adverse impact – A situation where some people being assessed find


the process more difficult than others for reasons not related to the
ability of the assessment to predict later performance.

Affirmative action – The process of taking positive steps to undo the


harmful effects of previous disadvantage and discrimination.

Age-equivalent score – A measure of a person’s ability, skill or


knowledge, expressed in terms of the age at which the average
person could be expected to achieve that level of performance. Also
called an age score. See also grade-equivalent score.

Alternate forms – Two forms of a test or measure that are alike in every
way except the actual items. These are used to overcome practice
effects when the measure has to be given on several occasions. Also
referred to as parallel forms.

Alternate form reliability – A measure of reliability in which alternate


forms of the same measurement instruments are administered to the
same subjects on separate occasions. Also known as parallel form
reliability.

Answer formats – Different ways of answering questions, such as


multiple choice, Likert scales and narrative essays.

Antecedents – The things that come before, precede what is being


observed.

Aptitude – The ability to perform or to learn material of a particular


kind, e.g. music, clerical, mechanical. In many ways, aptitudes can be
seen as specialised intelligences.

Aptitude testing – Refers to tests and procedures used to assess the


presence and relative strength of various aptitudes.

Arithmetic mean – Simply the average, which is obtained by adding


the various scores together and dividing by the number of scores
(SX/n – i.e. sigma X divided by n). It is also referred to simply as the
mean.

Artificial intelligence (AI) – The capability of a device such as a


computer to perform functions that are normally associated with
human intelligence, such as reasoning and learning through
experience.

Assembly tasks – Assessment tasks that require two or more


components to be assembled to produce a specific object or pattern.

Assessment – The process of measuring one or several variables of


interest in order to make decisions about individuals or inferences
about a population. It is the process of determining the presence of
and/or the extent to which an object, person, group or system
possesses a particular property, characteristic or attribute.

Assessment centre – An assessment process that involves a variety of


assessment exercises and where multiple assessors assess a group of
participants on a series of job-related competencies. It requires
participants to solve typical work problems or to demonstrate
proficiency at job functions such as making presentations or fulfilling
administrative duties. Assessment centres are used for assessing job-
related dimensions such as leadership, decision making, planning and
organising.

Attenuation – Shrinkage or diminution in size.

Attitude – A learned disposition to react to some attribute or situation


in a characteristic fashion.

Attribute – A psychological or physical characteristic or property.

B
Barnum effect – The tendency for people to believe vague positively
phrased reports about themselves because of the way the reports are
presented. Also termed the Forer effect.
Battery – A set of tests and other assessment techniques given to an
individual or group that have value individually and as well as
collectively.

Behavioural assessment – An approach to evaluation based on the


analysis of samples of behaviour, including the antecedents and
consequences of the behaviour, hence ABCs: antecedents,
behaviours, consequences.

Behaviourally anchored rating scales (BARS) – An approach to rating


the performance of people by using a carefully graded series of
statements that reflect increasing levels of the behaviour that is to be
observed. People are then rated against these anchors, thereby
reducing the possibility of vagueness and ambiguity about what
exactly is being rated.

Bell curve – A popular name for the normal distribution.

Bias – Systemic error in measurement or research that affects one group


(e.g. race, age, gender) more than another. Unlike random error, bias
can be controlled for. See also fairness.

Bogardus Social Distance Scale – A measurement technique in which a


person’s attitudes to different out-group members are measured by
arranging various social situations along a hierarchy of intimacy and
then indicating whether or not one would accept a member of the
out-group into that situation.

C
Cardinal, central or secondary traits – According to idiographic
theories of personality, these are the most vital, important and
peripheral traits respectively.

Career anchor – In terms of Schein’s theory of career choice, a


motivator that acts as an internal compass the person uses to make
career decisions.
Career path appreciation – An approach to assessment that determines
the complexity of a person’s thought processes and then suggests
how far up the corporate ladder the person is likely to progress, all
other things being equal.

Casual observation – A random, non-systematic way of observing or


looking at. Contrasted with systematic observation or looking for.

Ceiling effect – This is the tendency of scores on a measure to cluster at


the top end of the score distribution because the items are too easy
and a large number of them have high scores. This leads to a
distortion of correlations because of a reduction in variance in the
data. See also floor effect.

Central tendency – The response set in which people tend to choose


the middle value(s) of a rating scale when completing items and
systematically ignore values at the highest and lowest ends of the
scale. This is improved slightly by not having a mid-point i.e. using
an even number of response options.

Change score – The difference between scores before and after a task or
intervention of some kind. It indicates whether the variables
measured are improving or deteriorating.

Chaos theory – The view that when systems are left long enough they
disintegrate and become chaotic.

Classical test theory – The theory of measurement that attempts to


estimate the strength of the relationship between the observed score
and the true score. The mathematical expression is X = T ± E, where
X is the observed score, T is the true score and E is the error.
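
A minimal Python sketch (not from the text) illustrating the model: observed scores are simulated as a hypothetical true score plus random error, and the errors tend to cancel out over many measurements. All values are invented.

import random

random.seed(1)
true_score = 50                                                      # the hypothetical true score T
observed = [true_score + random.gauss(0, 3) for _ in range(1000)]    # X = T + E

# Because the error component is random, the mean of many observed
# scores converges on the true score.
print(round(sum(observed) / len(observed), 2))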

Coding system – The system by which specific behaviours and their


intensity are recorded during observation.

Coefficient – A number or value, such as a correlation coefficient that


lies between −1,0 and +1,0, indicating the degree of similarity or
overlap between two sets of information.

Coefficient alpha – See Cronbach’s coefficient alpha.

Coefficient of determination – A measure of the extent to which one


set of data can be seen to cause or determine another set of data, and
is equal to r2. It is the amount of variance shared by the two variables
being correlated. It allows one to determine how certain one can be in
making predictions from a certain model/graph. The value of r2 lies
between 0 and 1, and denotes the strength of the linear association
between the two variables.
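
As a brief worked illustration (the correlation value is invented), the shared variance is simply the square of the correlation coefficient:

r = 0.70                     # illustrative correlation between a test and a criterion
r_squared = r ** 2           # coefficient of determination
print(round(r_squared, 2))   # 0.49, i.e. the two measures share 49 per cent of their variance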

Coefficient of equivalence – The estimate of the degree to which two


parallel or alternate forms of a measuring instrument are similar. It is
obtained by correlating the scores from form A and form B of the
measure.

Coefficient of internal consistency – The estimate of the extent to


which different test items contribute to a single overall score.
Depending on the nature of the data, Cronbach’s α or the Kuder-
Richardson Formula 20 (KR20) are the most often used.

Coefficient of stability – Another term for the test–retest correlation


coefficient, indicating how stable a measure is over time.

Common-item equating – Assumes that any differences in total test
scores can be attributed to the difficulty of the non-common items in
the two tests. Performance on the items that are common to both tests
is used to estimate this difference in difficulty, and a constant based
on it is then added to the total scores on the more difficult test so
that they are equivalent to scores on the easier test.

Common-person research design – Involves the same group of


persons taking two different tests. Since the persons are assumed to
have the same ability regardless of which test they take, the scores on
the more difficult test may have a constant added to make them equal
to equivalent scores on the easier test.

Comparative psychology – An approach to understanding


psychological processes by examining behaviours and underlying
physiological systems in organisms lower in the evolutionary
hierarchy than man. Also termed evolutionary psychology.

Competence profiling/mapping – The process of listing all the


competencies required for successful performance in a particular
position. It is similar to a job description and position profile.

Competency – A competence is a set of knowledge, skills, attitudes,


values and other personal attributes that are required for successful
performance in a job.

Complexity science – An emerging paradigm in science in which


complex, relatively stable but unpredictable patterns emerge from
chaotic systems.

Computer adaptive testing (CAT) – A computer-administered


interactive test-taking procedure where the items from an item pool
are presented in response to the test taker’s previous responses. See
adaptive tests.

Computer phobia – Fear of computers.

Concurrent validity – A form of criterion-related validity in which a


test score is correlated with some external criterion that is obtained at
the same time (such as job satisfaction), hence concurrent. See also
predictive validity.

Confidence interval – The likely range of values with a known


probability of including the true value. This range is usually defined
as two z-scores above and two z-scores below the observed score.
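
A minimal sketch, reading the entry’s “two z-scores” as two standard errors of measurement either side of the observed score; the score and SEM values are invented.

observed_score = 52      # hypothetical observed score
sem = 3                  # hypothetical standard error of measurement

lower = observed_score - 2 * sem     # lower bound of the confidence interval
upper = observed_score + 2 * sem     # upper bound
print(lower, upper)                  # 46 58: the true value is likely to lie in this range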

Confirmatory factor analysis – A form of theory testing that confirms


the existence of factors predicted by theory and offers a viable
method of determining construct validity. The process tests whether
the predetermined factors are in fact present in the data using a
goodness-of-fit statistic (usually chi-square).

Conscientiousness – One of the factors of the Big Five or five-factor


model of personality. It is associated with job success in almost all
jobs.

Consequences – Actions and processes that follow from and are caused
by other actions or processes.

Construct – An attribute that exists in theory and is based on firm


scientific reasoning, but which is not directly observable or
measurable, such as personality, intelligence, anxiety and job
satisfaction.

Constructivist/qualitative school of psychology – The view that


objects and actions are given meaning by the observer; the view that
meaning is constructed by the observer.

Construct validity – The degree to which an instrument accurately


relates a given construct to other measures of the same or similar
constructs as predicted by the theory. Convergent and discriminant
validity are forms of construct validity.

Content validity – The extent to which the questions in a test or other


measure represent the universe or domain that is being assessed. It is
based on expert judgement that the content of the measure is
representative of the behaviour it was designed to measure. (For
example, does the psychology examination cover all the important
sections in the course?)

Convergent validity – A type of validity that is determined when two or


more measures that presumably assess the same construct overlap
(when there is a significant correlation between them).
Correlation – A measure of the degree and direction of the relationship
between two variables. In simple terms, a correlation is the amount of
variance in each set of scores in relation to the amount of variance
that is shared between the two variables. See Figure 5.2 in Chapter 5.

Correlation coefficient – A value, symbolised by “r”, indicating the


strength of a linear relationship between two variables in a sample.
Its value lies between 21,0 and 1,0.

Correlation matrix – A table in which several measures are listed on


both the horizontal and vertical axes, and all correlations between the
variables are shown.

Criterion – An external measure or standard of performance. This


criterion may take the form of some observed behaviour (such as
leaving an organisation) or a score on another measure (such as job
satisfaction).

Criterion problem – The name given to the fact that when calculating
the correlation between one measure and another (e.g. between a test
score and, say, job satisfaction), very often the criterion (job
satisfaction) is not very well defined. This results in the correlation
being low.

Criterion-referenced interpretation (also termed domain-referenced


or content-referenced) – A method of evaluating an individual’s
assessment score against some external criterion, such as ability to
perform a specific task. This is in contrast to norm-based
interpretation, in which the individual’s score is judged relative to
others assessed on the same instrument, for example asking whether
a person is tall enough to work as a member of the cabin crew in
comparison with whether this person is taller or shorter than the
mean.

Criterion-related validity – The degree to which a measure or test


score successfully predicts performance on some external criterion of
interest. This may be some aspect of job performance or performance
on another test or measure of the same construct assessed either
concurrently or in the future. This performance data may be available
at the time of the assessment (concurrent validity) or may only
become available at some time in the future (predictive validity).

Critical realists – Those who hold the view that reality exists “out
there”, but that the interpretation of this reality depends on the
thoughts and other psychological processes of the observer.

Cronbach’s alpha coefficient – An index of reliability for a set of


items, indicating the extent to which the various scale items measure
the same characteristic (internal consistency). Technically, it is the
mean of all possible split-half reliabilities. See also Kuder-
Richardson formulae.
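
A minimal Python sketch of the usual computational formula, α = k/(k − 1) × (1 − Σs²item / s²total), using invented item scores; it is not code from the text.

from statistics import pvariance

def cronbach_alpha(item_scores):
    # item_scores: one inner list per item, containing each person's score on that item
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]       # each person's total score
    item_variances = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variances / pvariance(totals))

# Three items answered by four people (invented data)
items = [[4, 3, 5, 2], [4, 2, 5, 3], [3, 3, 4, 2]]
print(round(cronbach_alpha(items), 2))   # 0.9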

Cross-validation – A process of revalidating a measure on a sample or


population other than that used initially to validate the measure.

Crystallised intelligence – In Cattell’s two-factor theory of intelligence,


crystallised intelligence represents the acquired knowledge and skills
needed to behave intelligently. It reflects the formal and informal
education received. It is contrasted with fluid intelligence.

Culture-fair assessment – An assessment process designed to minimise


the influence of culture and social background on various aspects of
assessment – administration, timing, item selection, responses
required and interpretation of the results.

Cut-off score – A point on a scale used for selection below which


applicants are rejected. Also termed a cut score.

D
Demand characteristics – Aspects of the situation, including the
perceptions of the person involved, which encourage people to
answer or behave in a certain way, such as the tendency to present
oneself in a favourable light or to anticipate what is being looked for
and reacting in a way that “helps” the researcher.

Detection rate – The rate of “correct” answers given, normally


expressed as a percentage of attempts. Also termed the hit rate.

Development centre – Similar to an assessment centre, with the same


kinds of technology being used. However, the accent lies on
development, so participants are encouraged to repeat exercises after
they have been given feedback, in an effort to improve their
performance on key competencies.

Deviation IQ (deviation score) – An IQ score that is based on a


normative distribution with a mean of 100 and a standard deviation
of 15.
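
An illustrative conversion (not from the text), assuming the person’s performance has already been expressed as a z-score relative to the norm group:

def deviation_iq(z_score):
    # Places a standardised (z) score on a scale with mean 100 and standard deviation 15
    return 100 + 15 * z_score

print(deviation_iq(0))    # 100: exactly average
print(deviation_iq(1))    # 115: one standard deviation above the mean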

Dichotomous response – Choice between items containing two main


category types (e.g. male and female, pass or fail, etc.) This is also
known as binary choice. See also polychotomous response.

Dimension – A psychological construct. For example, the dimension


“job satisfaction” may include a number of indicators such as
satisfaction with pay, supervision, job content, etc. Leadership
behaviour may consist of numerous dimensions or competencies.

Disassembly tasks – An assessment task involving the ability to


identify subcomponents or subpatterns within larger units.

Discriminant validity – Evidence that the measure does not correlate


with attributes or measures to which it is theoretically unrelated. It is
the opposite of convergent validity and is also known as divergent
validity.

Disparate treatment – A form of discrimination that occurs when


individuals or groups are treated differently. For example, married
men may receive a housing subsidy, whereas unmarried people and
married women do not.
Distracters – Items in multiple-choice questions that are similar to the
correct answer, but are in fact incorrect.

Domain – The area under consideration. For example, the domain of


this textbook is psychological assessment. Safety, education, and
clinical psychology are all psychological domains.

Domain-referenced interpretation – The interpretation of an


assessment score in terms of the person’s ability to meet certain task
(domain) requirements. For example, whether the person is tall
enough for … or is able to type x words per minute. See also
criterion-referenced interpretation.

Dynamic testing – A form of testing where the person is tested on a


measure (such as the Ravens Progressive Matrices), taught the
principles underlying the assessment, and then retested on an
alternative or parallel form of the measure. The increase in score
from T1 to T2 is taken as a measure of ability. (Contrasted with most
other forms of assessment which are static.)

E
Ecological validity – A type of validity that indicates the extent to
which results will generalise to other settings.

Economic value added (EVA) – A measure of the contribution made to


the organisation that takes into account the total value of the
organisation’s outputs, less its input costs. This is similar to the
notion of profit. It is possible to establish the EVA generated by each
employee. It is suggested that this statistic can be a useful measure of
an individual’s job performance.

Eigenvalue – A statistic that quantifies the amount of variation in a
group of variables that is accounted for by a particular factor.

Emic approach – Involves the use of personality (and other)


instruments that tap the values and abilities that are unique to a
particular culture. Although this may be a more accurate description
of a particular group, it makes comparison across groups almost
impossible.

Emotional intelligence (EQ) – The ability to recognise, understand and


manage one’s own and others’ feelings and emotional states. A
popularisation of the interpersonal and intrapersonal aspects of
Gardner’s theory of multiple intelligences.

Empirical criterion keying – The process of using criterion groups


(e.g. males, engineers) to develop scale items. This is done by
identifying and “marking” those items that empirically distinguish
the criterion groups from others, and then combining these into a
scale or questionnaire to identify the group.

Empiricist/quantitative approach to psychology – The view that reality


exists “out there” and can be directly experienced and assessed.
(Contrasted with constructivist views.)

Employment equity – A policy aimed at ensuring equal access by


previously excluded people to various opportunities. It includes
affirmative action as one measure and is designed to make the
organisational demographics as representative as possible of the
society in which it is located.

Error analysis – An examination of the patterns of items where a


person struggles or fails.

Error score/error variance – According to the theory of measurement


model or classical test theory, any assessment score consists of true
and error components. More strictly, any group of scores will have
variance that can be attributed to a true component and an error
component.

Ethics – A set of principles that spell out what is right, good or proper
conduct. This contrasts with law, where these principles are laid
down and enforced.
Etic approach – Taking a personality instrument and applying it across
all cultures to see how different groups behave on this measure. It
assumes that personality has the same definition and almost the same
structure across cultures – it is a universalistic approach to
assessment. For example, if we took a personality measure such as
the 16PF, Myers-Briggs Type Indicator (MBTI) or Minnesota
Multiphasic Personality Inventory (MMPI) and applied it to all
groups to see whether and how these groups differ, this would be an
example of a universalistic approach.

Evaluation – A process designed to answer a question related to the


value or worth of something. It involves interpreting or attaching a
judgemental value to an assessment outcome.

Evolutionary psychology – An approach to understanding behaviour by


looking at the behaviour (and the physiological structures associated
with them) in animals lower in the evolutionary chain than man.

Expectancy table – A statistical table that summarises the probability or


likelihood that given a particular assessment score, some outcome
will occur.

Experiment – A controlled test of a hypothesis in which the researcher


manipulates one variable to discover its effect on another.

Exploded diagram – A diagram in which the parts are shown


distributed in space while maintaining their relationship with each
other – often used in technical manuals.

Exploratory factor analysis – A theory-generating procedure that


identifies factors and/or factor patterns associated with variables or
measures. It determines the number or nature of factors that best
account for the pattern of relationships that is observed in the data.
This analysis tries to discover the number and nature of these factors
when prior research analysis is not present. (This is contrasted with
confirmatory factor analysis.) See also factor analysis.
F
Face validity – The extent to which an instrument appears to be valid to
those who are completing it. In other words, it seems to be measuring
what it is supposed to measure.

Factor analysis – A statistical technique used to examine the


interrelations among a set of variables or items in order to identify an
underlying structure to those items. Factor analysis can show whether
the relations between items on a test are consistent with the
underlying theoretical construct or constructs. This process can be
exploratory, which is typically used to identify common underlying
constructs among a group of variables in research in the absence of
any theory or model. It can also be confirmatory, which means that
an underlying causal structure is hypothesised or suspected and the
degree to which the data supports the hypothesised structure is
evaluated.

Fairness – The extent to which assessment outcomes are used in a way


that does not discriminate against particular individuals or groups.
See also bias. An assessment technique or process can be biased but
fair, if the extent of the bias is known and steps are taken to correct
this in arriving at a decision.

Faking – When people being assessed modify their answers in order to


create a false impression, rather than answering honestly. See also
impression management and malingering.

False negative – When a person or object that is predicted not to


perform satisfactorily does, or when a person or object possessing an
attribute or condition tests negative for it (e.g. someone who tests
negative for drug taking but in fact does take drugs). This is also
termed a Type II error.

False positive – When a person or object predicted to perform in a


satisfactory way does not do so, or when a person or object that does
not possess an attribute or condition tests positive for it (e.g. a person
who tests positive for drug taking but in fact does not take drugs).
This is also termed a Type I error.

Five-factor model of personality (the Big Five) – A widely accepted


view that there are five basic or central personality factors.

Flesch scores – Scores that reflect the difficulty level of text. There are
two important scores, namely the Flesch-Kincaid grade level and the
Flesch Reading Ease. The Flesch scores are based on the average
number of syllables per word and words per sentence.

Flesch-Kincaid Grade Level score – Rates text based on the US high


school grade level system (i.e. a score of 7,0 would mean a seventh
grader should be able to comprehend the text). The Flesch Reading
Ease score is based on a 100-point scale: the higher the score, the
easier it is to comprehend. The Flesch-Kincaid grade level is
calculated as follows:

(0,39 × ASL) + (11,8 × ASW) − 15,59

where ASL is the average sentence length (number of words divided


by the number of sentences) and ASW is the average number of
syllables per word (number of syllables divided by number of
words).

Flesch Reading Ease score – The Flesch Reading Ease statistic is


calculated as follows:

206,835 − (1,015 × ASL) − (84,6 × ASW)
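
The two formulas translate directly into code. In the sketch below (not from the text) the word, sentence and syllable counts are invented, and decimal points replace the decimal commas used above.

def flesch_kincaid_grade(words, sentences, syllables):
    asl = words / sentences          # average sentence length
    asw = syllables / words          # average syllables per word
    return (0.39 * asl) + (11.8 * asw) - 15.59

def flesch_reading_ease(words, sentences, syllables):
    asl = words / sentences
    asw = syllables / words
    return 206.835 - (1.015 * asl) - (84.6 * asw)

# An invented passage of 120 words, 8 sentences and 180 syllables
print(round(flesch_kincaid_grade(120, 8, 180), 1))
print(round(flesch_reading_ease(120, 8, 180), 1))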

Floor effect – The tendency of scores on a measure to cluster at the


lower end of the score distribution because the items are too difficult
– a large number of people have low scores. This leads to the
distortion of correlations because of a reduction in variance in the
data. See also ceiling effect.

Fluid intelligence – In Cattell’s two-factor theory of intelligence, fluid


intelligence represents non-verbal abilities and is therefore much less
dependent on formal and informal education. (It is contrasted with
crystallised intelligence.)

Flynn effect – The so-called intelligence inflation. Measured


intelligence increases each year after a test is developed, representing
a general increase in the knowledge of the general population.
Named after the person who first described it, J.R. Flynn (1984).

Focus group – A qualitative research method using groups of people


with specific characteristics brought together for the purpose of the
research. It is used widely in market research.

Forced distribution – Where people are rated in such a way that a


given proportion are placed into particular categories, for example 25
per cent in the top category, 50 per cent in the middle category and
25 per cent in the lowest category, irrespective of absolute levels of
performance.

Forensic psychology – The theory and practice of psychology in legal


situations, such as the fitness of someone to work after an accident or
with a particular illness.

Forer effect – The tendency for people to agree with generalised


statements that show them in a relatively positive light (also termed
the Barnum effect). This typically occurs in fortune telling and
horoscopes.

Formative assessment – An ongoing assessment that shows whether


the intervention is having an effect during the process. In this way it
acts as a steering mechanism. It contrasts with summative evaluation,
which occurs at the end of the process.

Fractals – Self-repeating patterns that occur at different levels in a


system.

Frequency distribution – A table or graph showing the number of


people obtaining a particular score on an item or in an assessment
process.

Full ranking – A ranking technique in which there are as many


categories as there are people to be ranked, that is, no two people can
be ranked in the same category (no ties).

G
Gardner’s theory of multiple intelligences – According to Gardner,
there are at least seven or eight distinct kinds of intelligence.

g-factor (or g) – General intelligence within Spearman’s two-factor


theory of intelligence. It is contrasted with s-factors or specific
factors.

Generalisable – The extent to which a test or other measure is able to


produce similar results when given under different conditions or to
different groups. See also robustness.

Goodness of fit – A statistical test of significance that provides evidence


that a particular result or set of factors obtained during an assessment
matches the predicted results or factor structure. Simply stated, it is
the amount of overlap between the obtained result and that predicted
by the theory. This statistic is usually chi-square.

Grade-equivalent score or grade-referenced score – A measure of a


person’s ability, skill or knowledge, expressed in terms of the grade
level in school at which the average person could be expected to
achieve that level of performance. Also called a grade score.

Grade norms – Norms that have been derived from a sample of people
in the same school grade as the person being assessed, which allows
the person to be evaluated against the performance of others in the
same grade. See also age-equivalent score.

Grounded theory – A research method that seeks to construct theory


about issues of importance in people’s lives. It does this through a
process of data collection that is inductive in nature, in that the
researcher has no preconceived ideas to prove or disprove. Rather,
issues of importance to participants emerge from people’s accounts
and stories about an area of interest or event that is of interest to the
researcher. The researcher analyses data by constant comparison,
initially of data with data, progressing to comparisons between their
interpretations translated into codes and categories and more data.
This constant movement between the field data and the analysis
grounds or locates the researcher’s final theorising in the
participants’ experiences. See also emic and etic.

H
Halo effect – A judgement error that occurs when the judgement of a
person’s abilities, motives, and so on is affected by a positive score
on another attribute. For example, attractive people generally score
higher on measures of work motivation, effort, and so forth than do
people who are less attractive.

Hermeneutics – A method of enquiry that involves the development


and study of theories that underpin our interpretation and
understanding of texts and life events. It argues that our knowledge
of events, people and causes are embedded within a social network of
norms, purposes, power, and so on. The hermeneutic method
involves cultivating the ability to understand things from somebody
else’s point of view, and to appreciate the cultural and social forces
that may have influenced their outlook. It is the process of applying
this understanding to interpreting the meaning of written texts and
behaviour.

Homogeneity of the population – A population in which all members


are very similar to each other.

I
Idiographic – The idiographic approach to personality focuses on a
person’s unique psychological structure and no attempt is made to
describe the person in terms of any particular traits or theoretical
constructs. This sometimes makes it difficult to compare one person
with others. See also nomothetic.

Impression management – The attempt to manipulate others’


perceptions in order to appear in a more favourable light. See also
faking and malingering.

Incremental validity – The increase in predictive validity that results


when additional predictors are included in a validity calculation. For
example, suppose a single ability measure (such as a score on an
intelligence test) correlates 0,48 with job success, and the inclusion
of a measure of conscientiousness raises this correlation to 0,63, then
the incremental validity obtained by including the conscientiousness
measure would be 0,63 − 0,48 = 0,15. In other words, including the
second predictor increased the predictive validity by 0,15.

Informed consent – A person’s understanding of and agreement to the
procedures (assessments, etc.) that he will undergo, as well as to how
the results are to be used.

Inherent requirement of the job – An attribute or characteristic that is


required for the successful performance of a job. For example, good
eyesight is required for becoming an airline pilot. Selection based on
these criteria is not discriminatory. (The International Labour
Organization (ILO) terms these bona fide occupational
qualifications, or BFOQs.)

Instrument – A device or procedure created to assess a trait, ability or


characteristic of individuals. Instruments include questionnaires,
surveys, tests and other forms of assessment, and the word
“instrument(s)” may be used interchangeably with these terms.

Integrity testing – A measure that claims to measure the honesty levels


of the person being assessed.
Intelligence – The ability of people to solve problems and learn from
experience. It is defined in this book as the efficiency with which
information is processed to achieve socially desirable outcomes.

Intelligence quotient (IQ) – A term coined by William Stern to


indicate the level of a person’s intellectual ability. It was originally
seen as mental age divided by chronological age multiplied by 100.
Later, the notion of a deviation IQ replaced this formulation. By
definition, the average IQ is 100, with a standard deviation of 15.

Intercept – The place or value where a regression line crosses the
Y-axis.

Inter-correlation matrix – A table showing the strength of the
correlations between each pair of a number of observed or measured
variables.

Internal consistency – An index of the extent to which all members of a


set of items in a test measure the same trait or characteristic. See also
Cronbach’s alpha coefficient.

Interquartile range – An ordinal index of the spread or variability of a


set of numbers which is the difference between the first and third
quartile points of the score distribution. See quartile.

Inter-rater reliability – The extent to which different evaluators of a


task or performance give identical ratings. It is also termed scorer or
inter-scorer reliability.

Interval data – Data where the objects/processes can be rank ordered


and where the ranks are of equal size. There is no absolute zero, thus
excluding many sophisticated forms of mathematical and statistical
analysis. Most psychological and social phenomena are best
described as being interval data.

Investment theory of intelligence – The view that people are all born
with a certain raw ability to see relations and identify rules or
patterns that exist between objects, and that we can measure this
ability (g), using appropriate culture-fair tests. As we get older, we
“invest” this fluid g in certain kinds of judgement skills, such as
those involved in doing a mathematical or word problem, or
composing a sentence. Most people growing up in a stable
environment receive a similar formal education, so that they all
invest their fluid g in much the same kinds of judgement skills. This
means that their fluid intelligence and their crystallised intelligence
are so similar at an early age that it is almost impossible to tell them
apart. People raised in very different cultures may well have very
different fluid and crystallised intelligences.

Ipsative rating or scoring – A process of rating the relative strength of


items by a single person by ranking them or distributing a fixed
number of points among the items.

Item – An individual scenario, question or task designed to elicit a


response from a test-taker.

Item analysis – The examination of how items behave in terms of


aspects such as difficulty, correlation with the whole, and so forth.

Item bank – A collection of test and/or questionnaire items, organised


by subject matter, item difficulty and question type (multiple-choice,
true/false, etc.). For example, a lecturer may have 200 multiple-
choice questions of which he uses only 25 in any one test or
examination. As each item is used, the good items are retained and
the weaker ones modified or excluded. This bank helps to create
questionnaires. In computer adaptive testing, items are selected from
this bank.

Item bias – The tendency for different groups to respond to items in


different ways as a result of class, culture, gender and similar
differences in experience and the meanings attached to the items.

Item characteristic curve (ICC) – A graphic representation of item


difficulty and its ability to discriminate between high and low
scorers. A good item is one that is not too easy or too difficult, and
where most people scoring high will get it right and most low scorers
will get it wrong. See item response theory.

Item direction – Items can be phrased in a positive direction (e.g. I like


ice cream) or in a negative direction (e.g. I do not like ice cream).

Item-remainder correlation – In order to determine the relationship of


an item to the scale total, the item is correlated with the total score of
all items, excluding the item itself. This is known as the item-
remainder correlation. (If the item score is included in the total score,
this is termed the item-whole correlation.)
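
A minimal sketch (invented data, not from the text): each person’s score on one item is correlated with the total of the remaining items.

from statistics import correlation   # requires Python 3.10 or later

# Four items answered by six people (each row is one person)
scores = [
    [4, 3, 5, 4],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 3],
    [4, 5, 4, 4],
    [1, 2, 2, 1],
]

item = 0                                               # examine the first item
item_scores = [row[item] for row in scores]
remainder = [sum(row) - row[item] for row in scores]   # total score excluding the item
print(round(correlation(item_scores, remainder), 2))   # the item-remainder correlation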

Item response theory (IRT) – A measurement model that assumes that


the characteristic being measured is a latent variable and that it
causes the responses observed on a test or measure. For example, a
person who has an underlying fear of dogs responds positively to the
item: “Dogs cannot be trusted”. The fear is the latent variable which
“causes” him to respond as he does. IRT relates the performance of
each item to a statistical estimate of the strength of the construct
being measured. Also known as latent-trait theory or the latent-trait
model.

J
Job analysis – The systematic assessment of the knowledge, skills,
values and other attributes required to perform a job successfully.

Job description – A list of all the tasks associated with a particular job,
as well as the tools required for this. It is also termed a post profile.

Job satisfaction – A positive emotional state resulting from the


perception that one’s job is meeting (most of) one’s needs.

Judgemental measures – Techniques used for rating the performance


of one employee relative to others. This information can be used as
an external criterion for evaluating or validating an assessment
technique.
K
Key performance areas (KPAs) – The most important areas and/or
tasks within a person’s job that contribute most to his effectiveness.

Kuder-Richardson formulae – A series of equations developed by


G.F. Kuder and M.W. Richardson as measures of the internal
consistency of tests. Formula KR-21 gives a quick estimate of the
lowest possible KR-20 for a given data set. Cronbach’s alpha
coefficient is a generalisation of these formulae.
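
A minimal sketch of the KR-20 formula, KR-20 = k/(k − 1) × (1 − Σpq / s²total), where p is the proportion answering an item correctly and q = 1 − p; the dichotomous responses below are invented.

from statistics import pvariance

def kr20(item_responses):
    # item_responses: one inner list per item; 1 = correct, 0 = incorrect
    k = len(item_responses)
    totals = [sum(person) for person in zip(*item_responses)]
    pq = sum((sum(item) / len(item)) * (1 - sum(item) / len(item))
             for item in item_responses)
    return (k / (k - 1)) * (1 - pq / pvariance(totals))

# Three dichotomous items answered by five people (invented data)
items = [[1, 1, 0, 1, 0], [1, 0, 0, 1, 0], [1, 1, 0, 1, 1]]
print(round(kr20(items), 2))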

L
Latent-trait model – A set of assumptions about measurement,
including the assumption that a trait being assessed is uni-
dimensional and that each item measures the strength of that trait.

Law of large numbers – As the number of items (or people) sampled


increases, so the sum of the error component tends towards zero: the
errors cancel each other out.

Leniency error – A systemic rating error in which raters consistently


score assessments at a level that is higher than is warranted. It is also
known as the generosity error. See also severity error.

Levels of measurement – The four types of data, namely nominal,


ordinal, interval and ratio. The level determines the type of statistical
analyses that can be performed on the data.

Likert-type response – An ordinal rating scale developed by Rensis


Likert in efforts to measure attitude (e.g. Strongly
disagree/Disagree/Neither agree nor disagree/Agree/Strongly agree).

Linear regression – A technique used to determine the relationship


between one or several predictors and a criterion. It does this by
fitting a straight line onto the data set in a way that minimises the
variance. It determines the effect of an independent variable on a
dependent one.

Locus of control – A personality construct indicating whether the


person believes that the causes of what happens to him are internal
(i.e. located within the person and hence under personal control) or
external to him (and thus lie beyond the person’s control).

M
Malingering – The deliberate faking of a bad condition (or making it
seem worse than it really is). This is a ploy frequently used in
insurance claims and similar situations. Many psychological
assessment instruments used for forensic purposes have separate
scales to detect malingering built into them. See also faking and
impression management.

Market research – Research carried out to determine the public’s


perceptions of and attitudes towards existing or planned products or
services or perceptions of an organisation as a whole.

Mean – The arithmetic average that is calculated by adding all the


scores together and dividing by the number of scores. (In
mathematics, this is given by ΣX/n and stated “sigma X over n”.)
See also arithmetic mean.

Measurement – A logical process of assigning numbers to observations


to represent the quantity of a trait or characteristic possessed. It
involves applying clearly stated rules that are public, transparent,
unambiguous and agreed upon by knowledgeable people to
determine how much of some property or attribute is present in a
particular object, system or process.

Measurement error – Variations or inconsistencies in the measurement


of some property or attribute. This error may be systemic (bias) or
random (error).

Median – The middle score of a distribution – the 50th percentile. The


score at which half the scores are below and half the scores are
above.

Mental age – A term created by Alfred Binet to represent a person’s


intellectual ability relative to how an average person of a given age
would perform on a given task (i.e. if the person knew as much and
thought in the same way as an average ten-year-old, the person
would have a mental age of 10). This term is seldom used today, but
is an important construct in the development of intelligence testing
and the notion of an IQ.

Motivation – The nature and strength of the factors that cause a person
to behave in a certain way (to initiate, continue with or stop some
action).

Multiple regression – The analysis of the relationship between several


predictors and a criterion to understand how each of these
contributes to (i.e. predicts) the final score on the criterion.

N
Need – According to one personality theory, our behaviour is
determined by various needs, which continue to have motivating
force until they are satisfied. The best known need theory is
Maslow’s hierarchy of needs.

Need for achievement (NAch) – One of the motivating forces


described by McClelland et al. (1953), in which people strive to
reach high levels of personal achievement. TATs are scored for
achievement imagery.

Need for affiliation (NAff) – One of the motivating forces described by


McClelland et al. (1953), in which people strive to be liked and
admired by others. TATs are scored for affiliation imagery.

Need for power (NPow) – One of the motivating forces described by


McClelland et al. (1953) in which people strive to reach high levels
of personal power. TATs are scored for power imagery.

Nominal scale/data – The most basic level of measurement, in which a


value is given on the basis of group or category membership (e.g.
Male =1, Female = 2). These categories are mutually exclusive (an
object cannot belong to more than one category) and exhaustive (no
other categories exist or are possible).

Nomothetic – Emphasises that all personality characteristics are well-


defined entities and therefore common to all people. People differ
only in their positions along a continuum of these characteristics and
they are unique only in the balance and amount of each
characteristic. This makes it relatively easy to describe people. See
also idiographic.

Non-parametric statistical techniques – Many of the statistical


methods used in psychology are based on the assumption of a normal
distribution, with a mean and standard deviation (or variance). These
are termed “parameters” and the use of these techniques generally
requires the data to be interval or ratio. Non-parametric statistical
techniques do not need parameters of this kind and can be used with
nominal (or categorical) and ordinal data.

Norm – A standard against which an individual’s performance is


judged. Norms are established by assessing a large sample and then
using that data to gauge the performance of individuals tested
subsequently.

Norm-based or norm-referenced (normative) interpretation – The


process of comparing an individual’s score to the range of scores
obtained by a group of similar people assessed on the same
instrument.

Norm group – A group of people used to norm or standardise


(calibrate) the assessment instrument. See also grade norms.

Normal distribution – The symmetrical bell-shaped distribution curve


in which the number of cases with a particular score is plotted against
each score. The normal curve illustrates that the largest number of
cases is close to the mean, with fewer and fewer cases occurring the
further away from the mean one moves. Most psychological
characteristics, including test scores, are assumed to be normally
distributed.

Normative – See norm-based interpretation.

Norming – The process of calibrating a test or drawing up a set of


norms. Also termed standardisation.

O
Objective tests – Tests where only one answer for each item is correct
and therefore no subjectivity is required. Multiple-choice
examinations are of this kind.

Objectivity – The extent to which an assessment process or scoring


procedure is independent of the judgement of the assessor.

Observation – The deliberate act of seeing how an individual, animal or


system behaves. Observation can be casual (looking at) or systematic
(looking for).

Observation schedules – The systematic framework or the pattern in


terms of which the frequency, duration and targets of observation are
determined. Also termed a sampling frame.

Observed score – In classical test theory, the score that results from the
assessment process. This is contrasted with the true score. The
observed score is seen as the true score ± an error score.

Occam’s razor – The view that the simplest explanation of any


phenomenon is likely to be the correct one.

Ordinal scale data – A scale at the second level of measurement in


which the items are placed in rank order from greatest to least (or
vice versa) such that the differences between the ranks need not be
equal. (Compare with Likert-type response or dichotomous
response.)

Organisational citizenship behaviour – Prosocial forms of behaviour


within organisations supporting the objectives of management and
co-workers.

Organisational development (OD) – A process of training and


motivating employees, as well as redesigning organisational
processes in order to improve the overall performance of the
organisation.

Organisational effectiveness – An indication of how well the


organisation is succeeding in achieving its business and other
objectives.

Outlier – Any score or observation that is numerically distant from the


rest of the data or that lies outside the expected, possible or desired
range.

P
Paired comparisons – The process in which employees are compared
with one another in all possible combinations – A with B, A with C,
B with C, and so on. The number of possible comparisons is given by
the formula n(n−1)/2.
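
A short check of the formula (the names are invented): five employees yield 5 × 4 / 2 = 10 possible pairings.

from itertools import combinations

employees = ["A", "B", "C", "D", "E"]
pairs = list(combinations(employees, 2))   # every possible pair of employees

n = len(employees)
print(len(pairs))          # 10 pairings generated
print(n * (n - 1) // 2)    # the formula gives the same answer: 10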

Paradigm – A set of assumptions about what is being investigated.

Parallel form – See alternate form.

Pearson’s product moment correlation – The most commonly used


index or coefficient of the relation between two sets of data,
symbolised by “r”. Most reliability and validity coefficients use this
technique.
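
A minimal sketch of how r is computed (invented scores, not code from the text): the sum of the cross-products of deviations is divided by the square root of the product of the two sums of squared deviations.

from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented test scores and job-performance ratings for six people
test = [10, 12, 9, 15, 14, 11]
performance = [3, 4, 3, 5, 4, 3]
print(round(pearson_r(test, performance), 2))   # a strong positive correlation
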
Peers – Equal-ranking colleagues.

Percentile – A number, between 0 and 100, expressing the percentage


of the norm group whose score falls below the particular raw score on
a measure. If 35 per cent of people in the sample assessed are shorter
than you are, then your height is at the 35th percentile. If 75 per cent
of lecturers at other universities earn more than Professor X does,
then his salary is at the 25th percentile of university lecturers.
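
A minimal sketch using the height example (all values invented): the percentile rank is the percentage of the norm group falling below the score in question.

norm_group = [150, 155, 158, 160, 162, 165, 168, 170, 175, 180]   # heights in cm
my_height = 165

below = sum(1 for h in norm_group if h < my_height)
percentile = 100 * below / len(norm_group)
print(percentile)   # 50.0: half the norm group is shorter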

Performance appraisal – The process in which an employee’s job


performance is formally evaluated, that is, how well an employee has
achieved his task within agreed key performance areas (KPAs).

Person specification/person profile – A description of the type of


person (e.g. knowledge, skills, experience, etc.) required to fill a
position in an organisation and forms the basis for drawing up a
selection battery. (Compare with job description and post profile.)

Personality – A theoretical construct that describes an individual’s


unique blend of traits, values, interests, attitudes, personal styles and
related attributes. It is defined in section 11.1.1 as “the preferred
ways that people process information and interact with the world in
which they live”.

Personality profile – The pattern of the relative strengths of various


traits presented in a table or graph form.

Pilot study or test, piloting – A stage in the development of a survey or


scale in which the prototype is tested on a small sample of the
targeted group to see how it behaves. Items are analysed, instructions
and time limits investigated, and so forth. Once these aspects have
been satisfactorily addressed, the instrument can be administered to a
much larger group to establish norms (where this is relevant).

Polychotomous response format – Items containing more than two


response categories (e.g. always/sometimes/never or very
strong/strong/neutral/weak/very weak). (Compare with dichotomous.)
Population (statistical) – All members of the class of objects or people
that are to be considered, for example all male students at a specific
university, all drivers of red cars in South Africa. (Contrasted with a
sample.)

Position Analysis Questionnaire (PAQ) – A job description technique


that uses an extensive checklist to indicate which tasks and/or
behaviours are required by the particular job. It provides a list of
psychological characteristics associated with the job (in terms of a
modified list of Thurstone’s primary mental abilities or PMAs).

Positive psychology – An emerging branch of (an approach to)


psychology, which strives to understand the origins of normal,
healthy functioning in people and groups.

Post profile – Another term for a job description.

Power – A statistical term indicating the probability that a statistical test


will lead to a correct decision.

Power test – A test for which respondents have ample time to answer all
the items. (Compare with speed test.)

Practice effects – The benefits that result when a person is assessed a


second or third time on the same instrument, reflecting learning that
takes place during the initial assessment. See also transfer effect.

Predictive validity – A type of criterion-related validity in which a test


is correlated with a criterion occurring at a later point in time. This is
contrasted with concurrent validity.

Predictor – The measure (e.g. assessment score) used to predict


performance on some external variable or criterion. For example, if
one uses job satisfaction to predict turnover behaviour, job
satisfaction is the predictor and leaving (or intending to leave) the
organisation is the criterion.
Pre-market discrimination – A form of discrimination against specific
groups that occurs when members of the group are prevented from
gaining required skills or experience before they get into the market.
For example, if mathematics is required for many jobs, and girls are
not taught mathematics at school, then they will be victims of pre-
market discrimination.

Primary mental abilities (PMAs) – The building blocks of cognition in


Thurstone’s theory.

Principal component – An uncorrelated (or stand-alone) variable that


has been derived from a mathematical procedure (principal
component analysis) which is used to reduce the dimensions of the
data. This technique is similar to factor analysis, as it attempts to
extract factors. This helps to identify new and meaningful underlying
(or latent) variables.

Proctor – A person who acts as an administrator or monitor to ensure


that self-testing procedures are properly followed.

Protocols – A term for the answer sheets obtained during the assessment
process.

Production measures – Output measures such as “widgets made”,


“words typed”, “bricks laid”, and so on that can be used as external
criteria for evaluating or validating an assessment technique.

Profile analysis – The interpretation of patterns of scores on a test or


test battery. A useful measure of the similarity of two profiles is the
d-statistic, which is analogous to the calculation of variance, except
that the mean (X̄) is replaced with P, the value of the profile point
that is the standard against which the profile is being evaluated. In
other words, D² = Σ(X − P)²/n. The d-statistic is the square root of
D².
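
A minimal sketch of the d-statistic calculation (the profile values are invented):

from math import sqrt

# A person's profile (X) and the standard profile (P) on five scales
profile = [6, 4, 7, 5, 3]
standard = [5, 5, 6, 5, 4]

d_squared = sum((x - p) ** 2 for x, p in zip(profile, standard)) / len(profile)
d = sqrt(d_squared)
print(round(d_squared, 2), round(d, 2))   # smaller values indicate more similar profiles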

Programme evaluation – The assessment of the effectiveness of


ongoing activities such as health care delivery, education,
rehabilitation, or other social programmes or interventions.

Projective hypothesis – The assumption that when people try to make


sense of ambiguous material their interpretations reflect their own
personal needs or unconscious states.

Projective technique – Presents ambiguous or incomplete material


(such as inkblots, pictures and incomplete sentences) and requires the
respondent to describe what the stimulus means. It is based on the
view that when confronted with ambiguous material, people will
project their needs, fears, wishes, and so on onto the material – the
so-called projective hypothesis.

Psychic unity – The view that all people are essentially the same, all
“created equal in the eyes of God”.

Psychological test – A sample of behaviour gathered under standardised


conditions with clearly defined rules for scoring the sample, with a
view to describing current behaviour or to predicting future
behaviour.

Psychometrics – The study of how the measurement of psychological


information is operationalised; the quantitative and technical aspects
of the measurement of psychological processes and attributes.

Psychometric tests – See Psychological test.

Psychometrist – A person with a four-year degree in psychology and


appropriate practical training who is qualified to administer and score
certain non-clinical tests.

Q
Qualified individualism – A form of affirmative action in which
organisational interests are maintained while preference is given to
the selection and development of underrepresented groups. See also
unqualified individualism.
Quantitative data – Information presented in numerical form.

Quartile – If the distribution of scores is divided into four groups, each


containing an equal number of cases, each group is termed a quartile.
In addition, the score that separates the first and second quarter of the
results is termed the first quartile (Q1), while the score separating the
third and fourth quarter of the results is termed the third quartile
(Q3). (The score separating the second and third quarter of the
distribution is the second quartile (Q2), although this is more
commonly known as the median. This is the score where half the
population has lower scores and half the population has higher
scores.)

Questionnaire – An assessment device containing written questions and


which is used for the purpose of gathering data from an individual.
See also survey.

Quintiles – When the distribution of an assessment score is divided into


five roughly equivalent categories or bands, each category is termed
a quintile. See also quartile.

Quotas – A set number of individuals who need to be selected in order


to meet racial, gender or other socially desirable targets irrespective
of quality or availability.

R
Race norming – The practice (controversial and even banned in some
parts of the US) of developing separate norms for different race
and/or ethnic groups. In the past (and in some quarters even today),
this was or is the preferred way of dealing with group-based
differences in performance on various measures of performance.

Random error – Measurement error that is apparently due to random


causes; the opposite of systemic error.

Ranking techniques – Techniques used for judging the performance of


one employee relative to others. This information can be used as an
external criterion for evaluating or validating an assessment
technique.

Rapport – The degree of comfort, cooperation and communication


between the assessor and the person being assessed.

Rating techniques – One of the judgemental approaches for quantifying


a person’s performance level. It involves rating the person against
some external standard or criterion. There are a number of ways of
doing this, including continuous scales, numeric scales, and
behaviourally anchored rating scales (BARS).

Ratio data – The level of measurement where there is a zero point and
where score differences are assumed to reflect in an absolute sense
differences in the phenomenon being assessed.

Raw score – The unstandardised score for a test or measure. It is


typically not interpretable without additional information for
reference, for example 25 out of 45.

Reactivity – Changes in a person’s behaviour, thinking or performance


that arise in response to being observed, assessed or evaluated.

Reasonable accommodation – The acceptable process of bending the


rules to make allowances for people with special needs.

Reliability – The degree of consistency of a measure and/or the degree


to which it is free of random error. A test or measure that produces
consistent results has high reliability.

Response set – The tendency to respond to test items or interview


questions in some characteristic way, including acquiescence
(agreeing with everything), hostility, extremity or centrality, and
social desirability (giving socially or politically correct answers to
questions). Also termed “response style”.
Responsiveness – The extent to which a measure can detect small
changes. See also sensitivity.

Restriction of range – A reduction in the range of scores on a variable


that reduces the amount of variability in the data and causes the
correlation of the variable with others to be reduced. See also floor
and ceiling effects.

Robustness – The degree to which a measurement technique or


instrument is relatively unaffected by environmental and other
sources of error.

S
Sample – Some part of a larger body of people or objects chosen to be
representative of the whole. See also population.

Sampling – The process of drawing a sample from a larger group or


population in such a way as to maximise its representivity.

Sampling frame – The framework used for observing or sampling


behaviours during observation. It is also termed an observation
schedule.

Scale – A set of ordered numbers or descriptors, arranged in a fixed


order, indicating increasing or decreasing amounts of some property
or attribute, and used for indicating the degree to which the attribute
is present. Also refers to the questionnaire incorporating the items
and descriptors that indicates the degree of a characteristic or trait in
an ordered way.

Scattergram or scatter plot – The plot of individual scores on one


variable against another variable. When both variables are normally
distributed, this plot takes the shape of an oval.

Select-response – An assessment method in which test takers are given


predefined choices from which to choose in order to answer a given
item, as is used in a multiple-choice test.

Selection – A process in which a person is evaluated for a position and


then accepted or rejected.

Selection ratio – The ratio or number of people selected in relation to


the total number of applicants.

Self-administered – A method by which the person supplies


information about himself. Measures of personality, interests and
values are generally self-report or self-administered measures.

Semi-structured interview – An interview with pre-determined


questions, although these need not be followed in any specific order,
and which allows for additional probing questions to be asked if
necessary. See also unstructured and structured interviews.

Sensitivity – The ability of a test to detect a condition or attribute (or a


change in these) that is present in small amounts. See also
robustness.

Severity error – A systemic rating error in which raters consistently


score assessments at a level that is lower than is warranted. See also
leniency error.

s-factor – In terms of Spearman’s two-factor theory of intelligence, the


s-factors (specific factors) are contrasted with the general or g-factor.

Skewness of a distribution – If many of the items in an assessment are
very easy or very difficult, the resulting distribution of the scores will
not be normal or bell shaped. Very easy items cause scores to crowd at
the top end of the distribution, leaving a long tail towards the lower
scores (negative skewness), while very difficult items cause scores to
crowd at the lower end, with a long tail towards the higher scores
(positive skewness).

Spearman-Brown formula – A mathematical equation for estimating the
reliability of a full-length test from the correlation between two halves
of a test or other measure, boosting the half-test correlation to allow
for the reduced number of items in each half. It is not appropriate for
speed tests.
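As a worked illustration (hypothetical figures, using the standard form of
the formula for a test doubled in length): if the two halves of a test
correlate 0,60 with each other, the estimated reliability of the full-length
test is (2 × 0,60) / (1 + 0,60) = 0,75.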

Speed test – A test of achievement or ability with a clear time limit, and
containing items that are of a uniform difficulty level, that is, within
the reach of most of the people in the target group. The fact that the
items have to be completed in a relatively short time emphasises the
efficiency with which candidates can answer the items. (Compare
with power test.)

Split-half reliability – A measure of internal consistency, obtained by
correlating two sets of scores obtained from equivalent halves of a
single test administered once. Because correlations depend on the
number of items in the data set, shorter tests have lower correlations
than longer ones. As a result, the correlation between the two halves of
a test needs to be adjusted upwards using the Spearman-Brown formula.

Standard deviation – A statistical term that is a measure of the spread


or variability of a set of scores. Technically, it is the square root of
the variance (V).

Standard error of measurement (SEM) – In classical test theory, a
measure of the extent to which an observed score varies around the
true score. It is expressed in the same units as the test score. Values
within 2 SEMs above and 2 SEMs below the true score can be treated
as equivalent.
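As a worked illustration (hypothetical figures, using the common formula
SEM = SD × √(1 – reliability)): for a test with a standard deviation of 15
and a reliability of 0,91, the SEM is 15 × √0,09 = 4,5, so observed scores
roughly between 91 and 109 (± 2 SEMs) could be treated as equivalent to a
true score of 100.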

Standardisation – Another term for norming. It is also used to refer to


the use of standard (identical) procedures for administering and
scoring tests and other assessment procedures.

Standard score – Another name for (McCall’s) T-score.

Stanine – A norm-based measure in which the score distribution is


divided into nine roughly equal sets of data. It derives from standard
nine and has a mean of 5 and a standard deviation of approximately
2. It provides the largest number of categories that fit onto a single
column for statistical analysis, but has the disadvantage of having no
midpoint. See also sten.

Stem – Also known as question stem and item prompt. It is the item or
question used to elicit a multiple-choice response.

Sten – A norm-based measure in which the score distribution is divided


into ten roughly equal sets of data. It derives from standard ten. It
provides a suitably large number of categories with the added
advantage that it has a clear midpoint and a mean of 5,5. However, it
is somewhat inconvenient for analysis purposes because it requires
two computer columns, the second of which is almost never used.
See also stanine.

Strange attractors – Apparent forces (like magnetism or gravity) that
pull or attract elements of a system towards different points.

Stratified (strat-adaptive) assessment – An approach to assessment


where later items depend on responses to earlier items.

Structured interviews – An interview in which questions are asked


from a predetermined guide in a specific order, with very little
leeway for additional questions to be asked. See also unstructured
and semi-structured interviews.

Structure of intellect model – The name given to Guilford’s (1959)


highly structured model of how people think.

Subjectivity – A situation where opinion or judgement is called for.


(This is contrasted with objectivity.)

Summative assessment – A form of assessment which occurs at the end


of some process or intervention and is evaluative, showing whether
the intervention has had an effect. It contrasts with formative
assessment, which occurs in an ongoing fashion during the process or
intervention, and acts as a steering mechanism.

Survey – A technique used to determine specific information about a


sample of individuals. This information can include questions about
people’s attitudes (how they feel), knowledge (what they know) and
behaviour (what they do). Surveys do not usually assess the strength
of particular constructs, so that the responses cannot as a rule be
aggregated (totalled) to yield a single score.

Systematic observation – The process of scientific observation in


which specific behaviours are looked for. Contrasts with casual
observation, which is haphazard and is essentially a process of
looking at.

Systemic error – Error introduced into a measure that is consistent in


form; also known as bias.

T
Tailored testing – A testing method, used in computer adaptive testing,
in which the test complexity is matched to the test taker’s ability.
The tests or test items are adapted for use by a particular individual.
(Compare this with clothes that are made for you by a tailor as
against those bought off the shelf from a store.)

Test – See psychological test.

Test administrator – The person who actually administers and


supervises the test process (hands out and collects the material, gives
instructions, keeps time, and so forth).

Test battery – A composite or selection of tests and assessment


procedures designed to measure a range of relevant constructs for
selection or diagnostic purposes.

Test of maximum performance – A test that requires the test taker to


perform a given task to the best of his ability, usually in a timed
context. This is typical of intelligence and knowledge tests. This
contrasts with tests of typical performance such as personality or
interest tests.
Test of typical performance – A test that requires the test taker to
perform a given task as he typically does in an everyday (non-
competitive) situation. It is usually untimed. This is typical of
personality or interest tests. It contrasts with tests of maximum
performance such as intelligence and knowledge tests, which are
usually seen as being competitive, in that one is doing one’s best
within a limited time period.

Test–retest reliability – Under the assumptions of equal true scores and


uncorrelated errors, the correlation between two administrations of a
test given to the same individuals at two different times is an estimate
of the test’s reliability. It yields a coefficient of stability. See also
robustness.

Test sophistication – The extent to which a person is comfortable with


the process of being assessed, arising from previous experiences of
assessment. Also termed “test wiseness”.

Track record information – Information that is usually obtained from


an employee’s personnel file indicating rate of promotion, courses
attended and passed, achievements, and so forth that can be used as
an external criterion for evaluating or validating an assessment
technique.

Traditional interview – An interviewing technique in which the
interviewer asks different interviewees different questions, making
the comparison of results very difficult. See also unstructured
interview.

Trait – A relatively enduring characteristic or way in which one person


differs from another. Trait theory is a major approach to personality
definition and assessment in organisational psychology.

Transfer effects – The learning that takes place when an assessment
such as a test is given to someone, which then affects (improves) his
performance on subsequent assessments with the same test. It is a
major factor influencing test–retest reliability. The way to overcome
this problem is to develop and use an alternative or parallel form of
the assessment. See also practice effects.

Triangulation – The process by which information is purposefully


gathered across multiple measures, multiple domains, multiple
sources, multiple settings, and on multiple occasions.

True score – In classical test theory, the expected score of an individual


across all conditions. It is a hypothetical construct, because the size
of the error component cannot be specified with any accuracy.
However, the observed score is seen to reflect the true score ± some
error component.

T-scores (also termed standard scores or McCall’s T-scores) – Scores,
based on a normative sample, that have been converted to yield a
distribution with a mean of 50 and a standard deviation of 10.
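As an illustration of the standard conversion (T = 50 + 10z): a z-score of
+1,5 corresponds to a T-score of 50 + (10 × 1,5) = 65, and a z-score of –2
to a T-score of 30.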

Type A personality – In Friedman and Rosenman’s typology (1974), a


Type A personality is characterised by high levels of
competitiveness, impatience, achievement orientation with strong
needs for domination. Type As have been shown to be prone to heart
attacks. See also Type B personality.

Type B personality – In Friedman and Rosenman’s typology (1974), a


Type B personality is characterised by traits that are the opposite of
Type As. They are far more patient and casual. See also Type A
personality.

Type 1/Type 2 errors – A Type 1 error occurs when a true null hypothesis
is rejected (a false positive), whereas a Type 2 error occurs when a false
null hypothesis is accepted (a false negative). In the courtroom analogy,
it is better that 100 guilty people be found not guilty (Type 2 errors)
than that one innocent person be found guilty (a Type 1 error).

U
Unstructured interview – An interview that does not follow a
predetermined structure or set of questions. Each interview is unique
and different people are asked different questions, giving rise to low
reliability and validity, and being open to bias. See also structured
interview and traditional interview.

Unqualified individualism – A selection strategy in which the best


person (most qualified) is chosen, irrespective of factors such as race,
gender, previous disadvantage. See also qualified individualism.

V
Validation – The process of establishing the validity of an assessment
measure.

Validity – The degree to which an instrument measures what it is


intended or claims to measure. Note: reliability is a necessary but
insufficient condition for validity.

Validity coefficient – The correlation coefficient between an


assessment score and an external criterion of performance that is
used as an indicator of the validity of the assessment.

Validity generalisation – The extent to which an assessment’s validity


can be extended to populations that are somewhat different from that
used to validate the technique originally.

Variance – A measure of the variability or spread of a set of scores. It is
formally defined as the mean of the squared deviations from the mean,
or Σ(X – X̄)²/n.
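As a worked illustration (hypothetical scores): for the scores 2, 4 and 6 the
mean is 4, so the variance is [(2 – 4)² + (4 – 4)² + (6 – 4)²]/3 = 8/3 ≈ 2,67
and the standard deviation is √2,67 ≈ 1,63.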

Virtual reality – Interaction with computer-generated material that
mimics reality in almost every way.

W
Work sample – A small-scale sample of a typical aspect of a person’s
job; a sample of work behaviour that is used to evaluate a person’s
ability to perform the job.
X
X̄ (pronounced “X-bar”) – The mathematical shorthand for the mean or
average of all the X scores.

Z
z-score – A statistic representing the number of standard deviations a
score is above or below the mean. For example, if the mean is 100
and the SD is 10, then a score of 115 is equal to a z-score of +1,5
(115 – 100 = 15 which is 1,5 SDs above the mean).
References

About intelligence newsletter: What is intelligence and how is it measured? Retrieved 4


December 2008 from http://www.aboutintelligence.co.uk/what-intelligence

Abrahams, F., & Mauer, F.K. (1999a). Qualitative and statistical impacts of home language on
responses to items of the Sixteen Personality Factor Questionnaire (16PF) in South Africa.
South African Journal of Psychology, 29(2), 76–86.

Abrahams, F., & Mauer, F.K. (1999b). The comparability of the constructs of the 16PF in the
South African context. Journal of Industrial Psychology, 25(1), 53–59.

Adarrage, P., & Zacagnini, J.L. (1992). DAIA knowledge-based system for diagnosing autism. A
case study on the application of artificial intelligence to psychology. European Journal of
Psychological Assessment, 8, 17–25.

Alatas, S.H. (1968). The sociology of corruption: The nature, function, causes and prevention of
corruption. Singapore: D. Moore Press.

Alliger, G.M., & Dwight, S.A. (2000). A meta-analytic investigation of the susceptibility of
integrity tests to faking and coaching. Educational and Psychological Measurement, 60, 59–
72.

Allport, G.W. (1937). Personality: A psychological interpretation. New York: Holt, Rinehart, &
Winston.

AMA (American Management Association). (2002). Corporate values survey. Retrieved from
http://www.amanet.org/research

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental
disorders, fourth edition, text revision (DSM-IV-TR). Washington, DC: American Psychiatric
Association.

Amod, Z., & Seabi, J. (2013). Dynamic assessment in South Africa. In S. Laher, & K. Cockcroft
(Eds.), Psychological assessment in South Africa: Research and applications. (Chapter 9, pp.
120–136). Johannesburg: Wits University Press.

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed). New York: McMillan.

Anolli, L., Duncan S., Magnusson, M.S., & Riva, G. (Eds.). (2005). The hidden structure of
interaction from neurons to culture patterns. Amsterdam: IOS Press.

Antonius, R. (2003). Interpreting quantitative data with SPSS. London: SAGE.

ANZSCO. (2013) Australian and New Zealand Standard Classification of Occupations (1st ed.),
(Revision 1 dated 4 July 2013). Retrieved 24 July 2013 from
http://www.abs.gov.au/ausstats/abs@.nsf/Product+Lookup/173D4D67348CB91CCA2575DF002DA7A7?
opendocument

Arends-Tóth, J., & van de Vijver, F.J.R. (2003). Multiculturalism and acculturation: Views of
Dutch and Turkish-Dutch. European Journal of Social Psychology, 33, 249–266.

Arnold, D. (1991). To test or not to test: Legal issues in integrity testing. Forensic Reports, 4,
213-214.

Arnold, J. (2005). Work psychology: Understanding human behaviour in the workplace (4th
ed.). London: Prentice Hall.

Arvis, J.-F., & Berenbeim, R.E. (2003). Fighting corruption in East Asia: Solutions from the
private sector. Washington, DC: World Bank. (Cited in Verhezen, 2008.)

Aryee, S. (1997). Selection and training of expatriate employees. In N. Anderson, & P. Herriot
(Eds.), International handbook of selection and assessment (Vol. 13, pp. 147–160).
Chichester, UK: Wiley.

Assessment Oversight and the Personnel Psychology Centre. (2009). Structured interviewing:
How to design and conduct structured interviews for an appointment process. Prepared for the
Canadian Public Service Commission. Retrieved from http://www.psc-cfp.gc.ca/plcy-
pltq/guides/structured-structuree/rpt-eng.pdf. Also available at www.psc-cfp.gc.ca

Aycan, Z., & Berry, J. (1996). Impact of employment-related experiences on immigrants’


psychological well-being and adaptation to Canada. Canadian Journal of Behavioural
Science/Revue Canadienne des Sciences du Comportement, 28(3), 240–251.

Ballantine, B. (1999). New forms of work organisation and productivity. A study prepared by
Business Decisions Limited for DGV of the European Commission. Retrieved 15 October
2013 from http://www.ukwon.net/files/kdb/36bfab692c2666745b2ea83846bf917a.pdf

Balma, M.J. (1959). The concept of synthetic validity. Personnel Psychology, 12, 395–396.

Bandura, A., Barbaranelli, C., Caprara, G.V., & Pastorelli, C. (2001). Self-efficacy beliefs as
shapers of children’s aspirations and career trajectories. Child Development, 72, 187–206.

Banister, P. (1994). Observation. In P. Banister, E. Buurman, I. Parker, M. Taylor, & C. Tindall


(Eds.), Qualitative methods in psychology: A research guide (pp. 17–33). Buckingham, UK:
Open University Press.

Barnouw, V. (1985). Culture and personality (4th ed.). Homewood, IL: Dorsey Press.

Baron, H., & Bartram, D. (2006) Using online assessment tools for recruitment. Leicester, UK:
British Psychological Society.

Bar-On, R. (1997). Development of the Bar-On EQ-i: A measure of emotional intelligence and
social intelligence. Toronto, Canada: Multi-Health Systems.

Barrett, P.T., Petrides, K.V., Eysenck, S.B.G., & Eysenck, H.J. (1998). The Eysenck Personality
Questionnaire: An examination of the factorial similarity of P, E, N, and L across 34
countries. Personality and Individual Differences, 25, 805–819.

Barrick, M.R., Stewart, G.I., Neubert, M., & Mount, M.K. (1998). Relating member ability and
personality to work team processes and team effectiveness. Journal of Applied Psychology,
83, 377–391.

Bartram, D. (2001). The impact of Internet on testing: Issues that need to be addressed by a Code of
Good Practice. Internal Report for SHL Group plc. (Cited in ITC, 2005).

Bartram, D. (2000). International guidelines for test use – Version 2000. Punta Gorda, FL:
International Test Commission.

Bartram, D. (2005). The Great Eight Competencies: A criterion-centric approach to validation.


Journal of Applied Psychology, 90, 1185–1203.

Bartram, D. (2011). Contributions of the EFPA Standing Committee on Tests and Testing to
standards and good practice. European Psychologist, 16(2), 149–159.

Bartram, D. (2012). White Paper: The SHL Universal Competency Framework. Retrieved 21
August 2013 from http://www.shl.com/assets/resources/White-Paper-SHL-Universal-
Competency-Framework.pdf

Bartram, D., & Coyne, I. (1998). Variations in national patterns of testing and test use: The
ITC/EFPPA international survey. European Journal of Psychological Assessment, 14, 249–
260.

Bates, R. (2002). Liking and similarity as predictors of multi-source ratings. Personnel Review,
31(5), 540–552.

Bayley, H. (1998). British MP House of Commons debate. Hansard 25/2/98, p. 374

Belbin, M. (1981). Management teams. London: Heinemann.

Ben-Porath, Y.S., Slutske, W.S., & Butcher, J.N. (1989). A real-data simulation of computerized
adaptive administration of the MMPI. Psychological Assessment: A Journal of Consulting
and Clinical Psychology, 1(1), 18–22.

Berk, R.A. (Ed.). (1982). Handbook of methods for detecting test bias. Baltimore, MD: The
Johns Hopkins University Press.

Berry, C.M., Sackett, P.R., & Wiemann, S. (2007). A review of recent developments in integrity
test research. Personnel Psychology, 60, 271–301.
Berry, C.M., Ones, D.S., & Sackett, P.R. (2007). Interpersonal deviance, organizational
deviance, and their common correlates: A review and meta-analysis. Journal of Applied
Psychology, 92, 409–423.

Berry, J.W., & Sam, D.L. (1997). Acculturation and adaptation. In J.W. Berry, M.H. Segall, &
C. Kagitcibasi (Eds.), Handbook of cross-cultural psychology: Social behavior and
applications (2nd ed), (Vol. 3, pp. 291–326). Boston, MA: Allyn & Bacon.

Berry, J.W., Kim, U., Power, S., Young, M., & Bujaki, M. (1989). Acculturation attitudes in
plural societies. Applied Psychology, 38, 185–206.

Berry, J.W., Poortinga, Y.H., Segall, M.H., & Dasen, P.R. (1992). Cross-cultural psychology:
Research and applications. New York: Cambridge University Press.

Bersin, J. (2013). Proficiency. Deloitte Development. Retrieved 1 August 2013 from


http://www.bersin.com/lexicon/Details.aspx?id=14921

Blair, M.D. (2003). Best practices in assessment centres: Reducing “group differences” to a
phrase for the past. Retrieved 11 November 2005 from
http://www.ipmaac.org/conf03/blair.pdf

Blom, A.G., de Leeuw, E.D., & Hox, J.J. (2011). Interviewer effects on non-response in the
European Social Survey. Journal of Official Statistics, 27, 359–377.

Boehnke, K., Lietz, P., Schreier, M., & Wilhelm, A. (2011). Sampling: The selection of cases for
culturally comparative psychological research. In D. Matsumoto, & F.J. R. van de Vijver
(Eds.), Cross-cultural research methods in psychology (pp. 101–129). New York: Cambridge
University Press.

Boeree, C.G. (2002). Early medicine and physiology. Retrieved 2 October 2013 from
http://webspace.ship.edu/cgboer/neurophysio.html

Borden, K.S., & Abbott, B.B. (2008). Research design and method (7th ed.). Boston, MA.:
McGraw-Hill.

Bradberry, T., & Greaves, J. (2005). The emotional intelligence quickbook. New York: Simon &
Schuster.

Brislin, R.W. (1986). The wording and translation of research instruments. In W.J. Lonner, &
J.W. Berry (Eds.), Field methods in cross-cultural research (Vol. 8, pp. 137–164). Thousand
Oaks, CA: SAGE.

Brislin, R.W. (1980). Translation and content analysis of oral and written material. In H.C.
Triandis, & J.W. Berry (Eds.), Handbook of cross-cultural psychology (Vol. 1, pp. 389–444).
Boston: Allyn & Bacon.

British Psychological Society. (2010). Code of good practice for psychological testing Retrieved
24 July 2013 from http://www.psychtesting.org.uk/download$.cfm?file_uuid=663F4988-
A7A7-91D6-68E7-677BD-82D14E9&siteName=ptc
Burish, M. (1997). Test length and validity revisited. European Journal of Personality, 11, 303–
315.

Byrne, B.M. (2001). Structural equation modeling with AMOS – Basic concepts, applications,
and programming. Mahwah, NJ: Lawrence Erlbaum.

Byrne, B.M. (2010). Structural equation modeling with AMOS: Basic concepts, applications,
and programming (2nd ed.). New York: Taylor and Francis.

Camilli, G., & Shepard, L.A. (1994). Methods for identifying biased test items. Thousand Oaks,
CA: SAGE.

Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait
multi-method matrix. Psychological Bulletin, 56, 81–105.

Campbell, D.T. (1986). Science’s social system of validity-enhancing collective belief change
and the problems of the social sciences. In D.W. Fiske, & R.A. Shweder (Eds.), Metatheory
in social science: Pluralities and subjectivities (pp. 108–135). Chicago, IL: University of
Chicago Press.

Campion, M.A., Fink, A.A., Ruggeberg, B.J., Carr, L., Phillips, G.M., & Odman, R.B. (2011).
Doing competencies well: Best practices in competency modelling. Personnel Psychology,
64, 225–262.

Carducci, B.J. (2009). The psychology of personality: Viewpoints, research, and applications.
Maldon, MA: Wiley-Blackwell.

Carretta, T.R., & Ree, M.J. (1996). Factor structure of the Air Force qualifying test: Analysis
and comparison. Military Psychology, 8, 29–42.

Carretta, T.R., Retzlaff, P.D., Callister, J.D., & King, R.E. (1998). A comparison of two U.S. Air
Force pilot aptitude tests. Aviation, Space and Environment Medicine, 69, 931–935.

Carroll, J.B. (1993). Human cognitive abilities. Cambridge: Cambridge University Press.

Caruso, D. (2004). Defining the Inkblot called emotional intelligence: Comment on R.J.
Emmerling and D. Goleman, Emotional intelligence: Issues and common misunderstandings.
Retrieved 17 October 2013, from http://www.eiconsortium.org/pdf/defining_the_ink-
blot_called_emotional_intelligence.pdf

Caryl, P.G. (1994). Early event-related potentials correlate with inspection time and intelligence.
Intelligence, 18, 15–46.

Cascio, W.F., Outtz, J., Zedeck, S., & Goldstein, I.L. (1991). Statistical implications of six
methods of test score use in personnel selection. Personnel Psychology, 4, 233–264.

Cattell, R.B. (1940). A culture-free intelligence test, I. Journal of Educational Psychology, 31,
176–199.

Cattell, R.B. (1987). Intelligence: Its structure, growth, and action. New York: Elsevier Science.
Cattell, R.B., & Cattell, A.K.S. (1963). Culture fair intelligence test. Champaign, IL: Institute for
Personality and Ability Testing.

Cattell, R.B., Eber, H.W., & Tatsuoka, M.M. (1970). Handbook for the Sixteen Personality
Factors Questionnaire (16PF). Champaign, IL: Institute for Personality and Ability Testing.

Ceci, S.J. (1990). On intelligence, more or less: A bioecological treatise on intellectual


development. Englewood Cliffs, NJ: Prentice Hall.

Cheung, F.M., Cheung, S.F., Leung, K., Ward, C., & Leong, F. (2003). The English version of
the Chinese Personality Assessment Inventory: Derived etics in a mirror position. Journal of
Cross-Cultural Psychology, 34, 433–452.

Cheung, F.M., Leung, K., Fan, R.M., Song, W.Z., Zhang, J.-X., & Chang, J.P. (1996).
Development of the Chinese Personality Assessment Inventory. Journal of Cross-Cultural
Psychology, 27, 181–199.

Cheung, F.M., Leung, K., Zhang, J.-X., Sun, H.-F., Gan, Y.-Q., Song, W.-Z., & Xie, D. (2001).
Indigenous Chinese personality constructs: Is the five-factor model complete? Journal of
Cross-Cultural Psychology, 32(4), 407–433.

Cheung, G.W., & Rensvold, R.B. (2002). Evaluating goodness-of-fit indexes for testing
measurement invariance. Structural Equation Modeling, 9, 233–255.

Chinkanda, E.N. (1990). Shared values and Ubuntu. Paper presented at the Conference Kontak:
On Nation Building. Pretoria: Human Sciences Research Council. (Cited in Prinsloo, 2001.)

Claasen, N.C.W., van Heerden, J.S., Vosloo, H.N., & Wheeler, J.J. (2000). Manual for the
differential aptitude tests (Form R). Pretoria: Human Sciences Research Council.

Clarke, D. (2000). Evaluation of a networked self-testing program. Psychological Reports, 86,


127–128.

Clay, R.A. (2006). Assessing assessment. Monitor. Retrieved 14 October 2008 from
http://www.apapractice.org/apa/insider/practice/trends/assessment.html

Cleary, T.A., Humphreys, L.G., Kendrick, S.A., & Wesman, A. (1975). Educational uses of tests
with disadvantaged populations. American Psychologist, 30, 15–41.

Coetzee, N., & Vosloo, H.N. (2000). Manual for the differential aptitude test. Pretoria: Human
Sciences Research Council.

Cohen, J. (1977). Statistical power analysis for the behaviour sciences (revised ed.). New York:
Academic.

Cohen, R.J., & Swerdlik, M.E. (2002). Psychological testing and assessment: An introduction to
tests and measurement (5th ed.). London: McGraw-Hill.

Cohen, R.J., Swerdlik, M.E., & Sturman, E. (2012). Psychological testing and assessment: An
introduction to tests and measurement. (8th ed.). Boston, MA: McGraw-Hill.
Cole, N.S. (1973). Bias in selection. Journal of Educational Measurement, 10(4), 237–255.

Coleman, V., & Borman, W. (2000). Investigating the underlying structure of the citizen
performance domain. Human Resources Management Review, 10(1), 25–44.

Colman, A.M. (2001). A dictionary of psychology. Oxford: Oxford University Press.

Cone, J.D., & Hayes, S.C. (1980). Environmental problems, behavioural solutions. California: Brooks/Cole.

Connelly, B.S., & Ones, D.S. (2008). The personality of corruption: A national-level analysis.
Cross-Cultural Research, 42, 353–385.

Cooper, C.L., & Robertson, I.T. (Eds.). (2001). International review of industrial and
organisational psychology, 16. Chichester, UK: Wiley.

Cortina, J.M., Goldstein, N., Payne, S., Davison, K., & Gilliland, S.W. (2000). The incremental
validity of interview scores over and above cognitive ability and conscientiousness. Personal
Psychology, 53, 325–351.

Costa, P.T. Jr., & McCrae, R.R. (1985). The NEO personality inventory manual. Odessa, FL:
Psychological Assessment Resources.

Costa, P.T. Jr., & McCrae, R.R. (1992). Revised NEO personality inventory (NEO-PI-R) and
NEO five-factor inventory (NEO-FFI) manual. Odessa, FL: Psychological Assessment
Resources.

Coyne, I., & Bartram, D. (2002). Assessing the effectiveness of integrity tests: A review.
International Journal of Testing, 2(1), 15–34.

Coyne, I. (2008). Integrity testing in organisational contexts. In M. Born, C.D. Foxcroft, & R.
Butter (Eds.), Online readings in testing and assessment. International Test Commission
Available at http://www.intestcom.org/Publications/ORTA.php

Crafford, A., Moerdyk, A.P., Nel, P., O’Neill, C., & Schlechter, A. (2006). Industrial
psychology: Fresh perspectives. Cape Town: Pearson.

Cronshaw, S.F., Alexander, R.A., Wiesner, W.H., & Barrick M.R. (1987). Incorporating risk
into selection utility: Two models for sensitivity analysis and risk simulation. Organizational
Behaviour and Human Decision Processes, 40, 270–286.

Cuéllar, I. (2000). Acculturation as a moderator of personality and psychological assessment. In


R.H. Dana (Ed.), Handbook of cross-cultural and multicultural personality assessment (pp.
113–129). Mahwah, NJ: Erlbaum.

Das, J.P., & Naglieri, J.A. (1997). Das-Naglieri cognitive assessment system (CAS). Itasca, IL:
Riverside Publishing.

Das, J.P., Kirby, J.R., & Jarman R.F. (1975). Simultaneous and successive syntheses: An
alternative model for cognitive abilities. Psychological Bulletin, 82, 87–103.
Das, J.P., Naglieri, J.A., & Kirby, J.R. (1994). Assessment of cognitive processes. Needham
Heights, MA: Allyn & Bacon.

Das, J.P. (2002). A better look at intelligence. Current Directions in Psychology, 11(1), 28–32.

Das, J.P., Kar, B., & Parrila, R. (1996). Cognitive planning: The psychological basis of
intelligent behaviour. New Delhi: SAGE International.

Davies, M., Stankov, L., & Roberts, R.D. (1998). Emotional intelligence: In search of an elusive
construct. Journal of Personality and Social Psychology, 75, 989–1015.

Davis, D.W., & Silver, B.D. (2003). Stereotype threat and race of interviewer effects in a survey
on political knowledge. American Journal of Political Science, 47, 33–45.

Dawes, R., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin,
81, 95–106.

Day, S.X., & Rounds, J.B. (1998). Universality of vocational interest structure among racial and
ethnic minorities. American Psychologist, 53(7), 728–736.

De Beer, M. (2005) Development of the learning potential computerised test (LPCAT). South
African Journal of Psychology, 35(4), 717–747.

De Beer, M. (2006). Dynamic testing: Practical solutions to some concerns. SA Journal of


Industrial Psychology, 32(4), 8–14.

De Beer, M. (2013). The learning potential computerised test (LPCAT) in South Africa. In S.
Laher, & K. Cockcroft (Eds.), Psychological assessment in South Africa: Research and
applications. (Chapter 10, pp. 137–157). Johannesburg: Wits University Press.

Deregowski, J.B., & Serpell, R. (1971). Performance on a sorting task: a cross-cultural


experiment. International Journal of Psychology 6, 273–281.

DeShon, R.P., Smith, M.R., Chan, D., & Schmitt, N. (1998). Can racial differences in cognitive
test performance be reduced by presenting problems in a social context? Journal of Applied
Psychology, 83, 438–451.

Dodds, E.R. (1951/1983). The Greeks and the irrational. Berkeley and Los Angeles, CA:
University of California Press.

Donoso, O.A. (2010). Psychological assessment in vocational rehabilitation: A qualitative


exploration of acculturation assessment and clinician testing practices. College of Liberal
Arts & Social Sciences Theses and Dissertations, Paper 12. Retrieved 12 January 2014 from
http://via.library.depaul.edu/cgi/viewcontent.cgi?article=1051&context=etd

Donovan, M.A., Drasgow, F., & Probst, T.M. (2000). Does computerizing paper-and-pencil job
attitude scales make a difference? New IRT analyses offer insight. Journal of Applied
Psychology, 85, 305–313.

Douglas, J., Roussos, L., & Stout, W. (1996). Item-bundle DIF hypothesis testing: Identifying
suspect bundles and assessing their differential functioning. Journal of Educational
Measurement, 33, 465–484.

Douglas, S.P., & Nijssen, E.J. (2002). On the use of ‘borrowed’ scales in cross-national research:
A cautionary note. International Marketing Review, 20(6), 621–642.

Dreyfus, S.E., & Dreyfus, H.L. (1980). A five-stage model of the mental activities involved in
directed skill acquisition. Washington, DC: Storming Media. Retrieved 1 August 2013 from
http://www.dtic.mil/cgi-bin/GetTRDoc?
AD=ADA084551&Location=U2&doc=GetTRDoc.pdf

Dubois, D.W. (2005). What are competencies and why are they important? People Dynamics,
July, 10–11.

Dubois, D.W., & Rothwell, W.J. (2000). The competency toolkit. Amherst, MA: Human
Resources Development Press.

Dyal, J.A. (1984). Cross-cultural research with the locus of control construct. In H.M. Lefcourt
(Ed.), Research with the locus of control construct (pp. 209–306). San Diego, CA: Academic
Press. (Cited in Van de Vijver & Phalet, 2004.)

EDAC (Executive Development Centres). (2006). Introduction to MCPA™ Mar 06. Retrieved
27 January 2009 from http://www.edacen.com/assessments/MCPA

Educational Testing Service. (2000). ETS standards for quality and fairness. Princeton, NJ:
Educational Testing Service.

Ekermans, G. (2009). Exploring the emotional intelligence construct: A cross-cultural


investigation. PhD Thesis, Swinburne University of Technology, Melbourne, Australia.

Ekuma, K.J. (2012). The importance of predictive and face validity in employee selection and
ways of maximizing them: An assessment of three selection methods. International Journal
of Business and Management, 7(22), 115–122.

Ellingson, J.E., Sackett, P.R., & Hough, L.H. (1999). Social desirability corrections in
personality measurement: Issues of applicant comparison and construct validity. Journal of
Applied Psychology, 84, 155–166.

Emmerling, R.J., & Goleman, D. (2003). Emotional intelligence: Issues and common
misunderstandings. Issues in Emotional Intelligence, 1(1). Retrieved 25 October 2008 from
http://www.eiconsortium.org

Employment Equity Amendment Act. (2013). Employment Equity Amendment Act, 2013 Act No
47 of 2013. Pretoria: Government Gazette No 37238 dated 16 January 2014.

Erikson, E. (1963). Childhood and society (2nd ed.). New York: W.W. Norton.

Espinosa, A.J., Procidano, M.E., & He, J (2012). Social desirability across 3 cultural contexts:
Mexico, USA and China. Paper presented at the 21st International Cross-cultural Psychology
Conference, Stellenbosch, South Africa, 17–21 July.
Evans, B.R. (1999). The cost of corruption. A discussion paper on corruption, development and
the poor. Teddington, UK: Tearfund.

Eysenck, H. (2000). Intelligence: A new look. New Brunswick, NJ: Transaction Publishers.

Eysenck, H.J., & Eysenck, M.W. (1985). Personality and individual difference. New York:
Plenum.

Eysenck, H.J., Barrett, P.T., & Eysenck, S.B.G. (1985). Indices of factor comparison for
homologous and non-homologous personality scales in 24 different countries. Personality
and Individual Differences, 6, 503–504.

Fallaw, S.S., Kantrowitz, T.M., & Dawson, C.R. (2012). 2012 Global assessment trends report.
SHL Talent Measurement.

Feldman-Bischoff, J., & Barshi, I. (2007). The effects of blood glucose levels on cognitive
performance: A review of the literature. Moffett Field, CA: NASA Ames Research Centre.
Retrieved 7 August 2013 from
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20070031714_2007030981.pdf

Fernández-Ballesteros, R. (2006). Crucial issues in the field of psychological assessment and


evaluation. Available at http://www.iaapsy.org/rocio.htm

Feuerstein, R. (1979). The dynamic assessment of retarded performers: The learning potential
assessment device: Theory, instruments, and techniques. Baltimore, MD: University Park
Press.

Fine, S. (2010) Cross-cultural integrity testing as a marker of regional corruption rates.


International Journal of Selection and Assessment, 18(3), 251–259.

Fink, A., & Neubauer, A.C. (2001). Speed of information processing, psychometric intelligence
and time estimation as an index of cognitive load. Personality and Individual Differences,
30(6), 1009–1021.

Fioravanti, M., Gough, H.G., & Frere, L.J. (1981). English, French, and Italian adjective check
lists: A social desirability analysis. Journal of Cross-Cultural Psychology, 12, 461–472.

Fontaine, J.R.J, (2005). Equivalence. In K. Kempf-Leonard (Ed.), Encyclopedia of social


measurement (pp. 801–813). San Diego, CA: Academic Press.

Fisher, W.P. Jr. (1997). What scale-free measurement means to health outcomes research.
Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 357–373.

Fletcher, C. (1995). New directions for performance appraisal. Some findings and observations.
International Journal of Selection and Assessment, 3(3), 191–196.

Flynn, J.R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological
Bulletin, 95, 29–51.

Fontenesi, M. (2005). Preface. In L. Anolli, S. Duncan, M.S. Magnusson, & G. Riva (Eds.), The
hidden structure of interaction from neurons to culture patterns. Amsterdam: IOS Press.
Retrieved 23 September 2007 from http://www.vespsy.com/communication/volume7.html

Forer, B.R. (1949). The fallacy of personal validation: A classroom demonstration of gullibility.
Journal of Abnormal Psychology, 44, 118–121.

Fouad, N.A. (1993). Cross-cultural vocational assessment. The Career Development Quarterly,
42, 4–13.

Fouad, N.A., Harmon, L.W., & Borgen, F.H. (1997). Structure of interests in employed male and
female members of US racial-ethnic minority and nonminority groups. Journal of Counseling
Psychology, 44, 339–345.

Foxcroft, C., & Roodt, G. (Eds). (2009). An introduction to psychological assessment in the
South African context (3rd ed.). Cape Town: Oxford University Press.

Foxcroft, C., & Roodt, G. (Eds.). (2001). An introduction to psychological assessment in the
South African context (1st ed.). Cape Town: Oxford University Press.

Foxcroft, C., & Roodt, G. (Eds.). (2005). An introduction to psychological assessment in the
South African context (2nd ed.). Cape Town: Oxford University Press.

Foxcroft, C., & Stumpf, R. (2005, 23 June). What is matric for? Paper presented at the Umalusi
Seminar on “Matric – What is to be done?” 23 June, Pretoria.

Foxcroft, C.D. (1997). Psychological testing in South Africa: Perspectives regarding ethical and
fair practices. European Journal of Psychological Assessment, 13(3), 229–235.

Fransella, F. (Ed.). (2005). The essential practitioner’s handbook of personal construct


psychology. London: Wiley.

Friedman, H.S., & Schustack, M.W. (1999). Personality: Classic theories and modern research.
Boston, MA: Allyn & Bacon.

Frijda, N., & Jahoda, G. (1966). On the scope and methods of cross-cultural research.
International Journal of Psychology, 1(2): 109–127.

Fulmer, R.M., & Conger, J.A. (2004). Growing your company’s leaders: How organizations use
succession management to sustain competitive advantage. New York: AMA-COM.

Furnham, A. (1992). Personality at work: The role of individual differences in the workplace.
London: Routledge.

Furnham, A. (2003). The incompetent manager: The causes, consequences and cures of
management failure. London: Whurr Publishers.

Furnham, A. (2008). Personality and intelligence at work. London: Routledge.

Gardner, H. (1983). Frames of mind: The theory of multiple intelligence. New York: Basic
Books.
Gardner, H. (1993). Multiple intelligences: The theory in practice. New York: Basic Books.

Gass, S.M., & Varonis, E.M. (1991). Miscommunication in Nonnative Speaker Discourse. In N.
Coupland, H. Giles, & J.M. Wiemann (Eds.). Miscommunication and problematic talk (pp.
121–145). London: SAGE.

Geisinger, K.F. (1994). Cross-cultural normative assessment: Translation and adaptation issues
influencing the normative interpretation of assessment instruments. Psychological
Assessment, 6(4), 304–312.

Geisinger, K.F., Spies, R.A., Carlson, J.F., & Plake, B.S. (2007). (Eds.). Mental measurements
yearbook (17th ed.). Lincoln, NE: University of Nebraska Press, Buros Institute of Mental
Measurements.

George, J.A., & Reiber, A. (2005). The ROI of assessment. Retrieved 8 May 2007 from
http://www.workindex.com/editorial/staff/sta0506-tt-01.asp

Ghorpade, J., Hattrup, K., & Lackritz, J.R. (1999). The use of personality measures in cross-
cultural research: A test of three personality scales across two countries. Journal of Applied
Psychology, 84, 670–679.

Gierl, M., Jodoin, M., & Ackerman, T. (2000). Performance of Mantel-Haenszel, simultaneous
bias test and logistic regression when the proportion of DIF items is large. Paper presented at
the annual meeting of the American Education Research Association (AERA), New Orleans,
LA, 24–27 April. Retrieved 17 July 2013 from
http://www2.education.ualberta.ca/educ/psych/crame

Goldberg, L.R. (1970). Man versus model of man: A rationale plus evidence for a method of
improving clinical inference. Psychological Bulletin, 73, 422–432.

Goldberg, L.R., Grenier, J.R., Guion, R.M., Sechrest, L.B., & Wing, H. (1991). Questionnaires
used in the prediction of trustworthiness in pre-employment selection decisions: APA Task
Force report. Washington, DC: American Psychological Association.

Goldstein, I.L., Braverman, E.P., & Goldstein H.W. (1991). The use of needs assessment in
training systems design. In K. Wexley (Ed.), Handbook of human resources management:
Developing human resources (pp. 35–75). Washington DC: BNA Books.

Goldstein, H.W., Yusko, K.P., Braverman, E.P., Smith, D.B., & Chung, B. (1998). The role of
cognitive ability in the subgroup differences and incremental validity of assessment centre
exercises. Personnel Psychology, 51, 357–374.

Goleman, D. (1995). Emotional intelligence – Why it can matter more than IQ. New York:
Bantam Books.

Gordon, J.R. (2001). Organisational behaviour: A diagnostic approach. Upper Saddle River,
NJ: Prentice Hall.

Gordon, M. M. (1964). Assimilation in American life. New York: Oxford University Press.
Gottfredson, L.S. (1998). The general intelligence factor. Scientific American, 9(4), 24–29.

Graves, L. (1993). Sources of individual differences in interview effectiveness: A model and


implications for future research. Journal of Organizational Behavior, 14(4), 349–370.

Grayson, P. (2005). An introduction to assessment centres. Retrieved 28 November 2005 from


http://www.psychometrics.co.uk./adc.htm

Greenberg, J., & Baron, R.A. (2000). Behaviour in organisations: Understanding and managing
the human side of work (7th ed.) Upper Saddle River, NJ: Prentice Hall.

Greenhaus, J.H., & Callanan, G.A. (1994). Career management (2nd ed.). Fort Worth, TX:
Dryden Press.

Gribbin, J. (2005). Deep simplicity: Chaos, complexity and the emergence of life. London:
Penguin Books.

Groenewald, H.J. (2012). Elliott Jaques and sensemaking: Ultimate sensemaker or 20th century
relic? MPhil Dissertation, University of Stellenbosch, Stellenbosch. Retrieved 15 August
2013 from http://www.google.com/url?sa=t&rct=j&q=-
scholar.sun.ac.za%2Fbitstream%2Fhandle%2F…1%2F…
%2Fgroenewald_elliott_2012.pdf&source=web&cd=1&ved=0CCkQFjAA&url=http%3A%2F%2Fscholar.sun.ac
IIMUqHnKvSv7AaV9YEo&usg=AFQjCNH_L3LF6K3ZR241abNEWq9OHmkKDg

Grove, W.M., Zald, D.H., Lebow, B.S., Snitz, B.E., & Nelson, C. (2000). Clinical versus
mechanical prediction: A meta analysis. Psychological Assessment, 12, 19–30.

Guilford, J.P. (1959). Traits of creativity. In H.H. Anderson (Ed.), Creativity and its cultivation
(pp. 142–161). New York: Harper and Row.

Guilford, J.P., & Hoepfner, R. (1971). The analysis of intelligence. New York: McGraw-Hill.

Guion, R.M. (1996). Assessment, measurement and prediction for personnel decisions. Mahwah,
NJ: Erlbaum.

Guion, R.M. (1998). Assessment, measurement and prediction for personnel decisions. Mahwah,
NJ: Erlbaum.

Haier, R.J. (1993). Cerebral glucose metabolism and intelligence. In P.A. Vernon (Ed.),
Biological approaches to the study of human intelligence (pp. 317–332). Norwood. NJ:
Ablex.

Hall, J.D., Howerton D.L., & Bolin A.U. (2005). The use of testing technicians: Critical issues
for professional psychology. International Journal of Testing, 5(4), 357–375.

Hambleton, R. (2010). Item response theory: Concepts, models and applications. Workshop
presented at the 27th International Congress of Applied Psychology, Melbourne, Australia.
(Cited in Macqueen, 2012.)

Hambleton, R.K. (1994). Guidelines for adapting educational and psychological tests: A
progress report. European Journal of Psychological Assessment (Bulletin of the International
Test Commission), 10, 229–244. (Cited in Van de Vijver, 2002.)

Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response
theory. Newbury Park, CA: SAGE.

Hanson, M.A., Borman, W.C., Mogilka, H.J., Manning, C., & Hedge, J.W. (1999).
Computerized assessment of skill for a highly technical job. In F. Drasgow, & J. B. Olson-
Buchanan (Eds.), Innovations in computerized assessment (pp. 197–220). Mahwah, NJ:
Erlbaum.

Hare, R.D. (1993) Without conscience: The disturbing world of the psychopaths among us. New
York: Simon & Schuster.

Hare, R.D. (1995). Psychopaths: New trends in research. Harvard Mental Health Letter, 12, 4–5.

Hare, R.D. (1996). Psychopathy: A clinical construct whose time has come. Criminal Justice and
Behavior. 23, 25–54.

Hare, R.D. (1997). The NATO Advanced Study Institute on psychopathy, Alvor 1995. Journal
of Personality Disorders, 11, 301–303.

Hare, R.D. (1998). Psychopathy, affect and behaviour. In D.J. Cooke, A.E. Forth, & R.D. Hare
(Eds.), Psychopathy: Theory, research and implications for society (pp. 105–137). Dordrecht,
The Netherlands: Kluwer.

Harris, D.B. (1963). Children’s drawings as measures of intellectual maturity. New York:
Harcourt, Brace & World.

Harris, W.G. (2000). Best practices in testing technology: Proposed computer-based testing
guidelines. Journal of e-Commerce and Psychology, 1(2), 23–35.

Haverkamp, B.E., Collins, R.C., & Hansen J.I. (1994). Structure of interests of Asian American
college students. Journal of Counseling Psychology, 41, 256–264.

He, J., & van de Vijver, F. (2012). Bias and equivalence in cross-cultural research. Online
Readings in Psychology and Culture, 2(2). Retrieved 30 May 2014.

Heilman, M.E. (1996). Affirmative action’s contradictory consequences. Journal of Social


Issues, 52(4), 105–109.

Heilman, M.E., Battle, W.S., Keller, C.E., & Lee, R.A. (1998). Type of affirmative action
policy: A determinant of reactions to sex-based preferential selection? Journal of Applied
Psychology, 83, 190–205.

Helms, J.E. (1992). Why is there no study of cultural equivalence in standardised cognitive
ability testing? American Psychologist, 47, 1083–1101.

Hersey, P., & Blanchard, K.H. (2008). Management of organisational behaviour (9th ed.).
Englewood Cliffs, NJ: Prentice Hall.
Hiebert, P.G. (1985). Anthropological insights for missionaries Grand Rapids, MI: Baker Book
House.

Higgins, L.T., & Zheng, M. (2002). An introduction to Chinese psychology – Its historical roots
until the present day. The Journal of Psychology, 136(2), 225–239.

Hirsch, S. K. (1991). Using the Myers-Briggs type indicator in organizations (2nd ed.). Palo
Alto, CA: Consulting Psychologist Press.

Ho, D.Y.F. (1996). Filial piety and its psychological consequences. In M.H. Bond (Ed.),
Handbook of Chinese psychology (pp. 155–165). Hong Kong: Oxford University Press.

Hofstede, G. (1980). Culture’s consequences: International differences in work-related values.


London: SAGE.

Hofstede, G. (1991) Culture and organisations: Software of the mind. New York: McGraw–Hill.

Hofstede, G. (1994). Uncommon sense about organizations – Cases, studies, and field
observations. London: SAGE.

Hofstede, G. (1996) Cultural constraints in management theories. In R.M. Steers, L.W. Porter, &
G.A. Bigley (Eds.) Motivation and leadership at work (6th ed.) New York: McGraw-Hill.

Hofstede, G. (2001). Culture’s consequences: International differences in work related values


(2nd ed.). Thousand Oaks, CA: SAGE.

Hogan, J., Davies, S., & Hogan, R. (2007). Generalizing personality-based validity evidence. In
S.M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging
existing validity evidence (pp. 181–229). San Francisco, CA: Jossey-Bass.

Holland, J.L. (1985). Making vocational choices (2nd ed.). Upper Saddle River, N.J: Prentice-
Hall.

Holland, P.W., & Wainer, H. (Eds.) (1993). Differential item functioning. Hillsdale, NJ:
Erlbaum.

Holland, P.W., & Thayer, D.T. (1988). Differential item performance and the Mantel-Haenszel
procedure. In H. Wainer, & H.I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ:
Erlbaum.

Holloway, J.D. (2003). Nondoctoral assistants in question. APA Monitor, 34(10), 26.

Horn, J.L., & Cattell, R.B. (1966). Refinement and test of the theory of fluid and crystallized
general intelligence. Journal of Educational Psychology, 57(5), 253–270.

Hough, L.M., Eaton, N.K., Dunnette, M.D., Kamp, J.D., & McCloy, R.A. (1990). Criterion-
related validities of personality constructs and the effect of response distortion on those
validities. Journal of Applied Psychology, 75, 581–595.

Hough, L.M., & Oswald, F.L. (2000). Personnel selection: Looking toward the future –
remembering the past. Annual Review of Psychology, 51, 631–664.

HPCSA (Health Professions Council of South Africa). (2006). South African guidelines on
computerised testing – Form 257. Pretoria: HPCSA Professional Board for Psychology.

HPCSA (Health Professions Council of South Africa). (2007). Discussion document: Scope of
practice. Retrieved 27 March 2008 from
http://www.hpcsa.co.za/hpcsa/UserFiles/File/PSYCHOLOGY/Discussion%20document%20%20%20Scope%20o
03-071%20-%20Special%20Educ%20Meeting%20(March2007).pdf

HPCSA (Health Professions Council of South Africa) Professional Board for Psychology.
(2006). Policy on the classification of psychometric measuring devices, instruments, methods
and techniques – Form 08. Retrieved 16 August 2008 from
http://www.hpcsa.co.za/hpcsa/Userfiles/File/Psychpolicyclassificationf208.Doc

HPCSA (Health Professions Council of South Africa). (2014). List of Classified and Certified
Psychological Tests. Pretoria: Health Professions Council of South Africa Board Notice 93 of
2014. Government Gazette No 37903, 15 August 2014.

HR Magazine. (2004, August). Retrieved 19 January 2006 from


http://www.findarticles.com/p/articles/mi_m3495/is_8_49/ai_n6181407

HRD Competency Library (n.d.) at


http://humanresources.syr.edu/staff/nbu_staff/comp_library.html

Hui, C. H., & Triandis, H. C. (1989). Effects of culture and response format on extreme response
style. Journal of Cross-Cultural Psychology, 20, 296-309.

HumanMetrics (2013). The Jung typology profiler for workplace. Retrieved 12 August 2013
from http://www.humanmetrics.com/hr/business/preemploymenttesting.aspx#about_jtpw.

Hunt, E., Frost, N., & Lunneborg, C. (1973). Individual differences in cognition: A new
approach to intelligence. In G. Bower (Ed.), Advances in learning and motivation (Vol. VII)
(pp. 87–122). New York: Academic Press.

Hunter, J. E., & Schmidt, F.L. (2004) Methods of meta-analysis: Correcting error and bias in
research findings (2nd Ed.). Newbury Park, CA: Sage.

Hunter, J.E., & Hunter, R.F. (1984). Validity and utility of alternative predictors of job
performance. Psychological Bulletin, 96, 72–98.

ILO (International Labour Organization). (2000) Workers without frontiers: The impact of
globalization on international migration. Geneva: ILO. Retrieved 17 July 2013 from
http://www.ilo.org/global/standards/subjects-covered-by-international-labour-
standards/migrant-workers/lang—en/index.htm

ITC (International Test Commission). (2001). International guidelines for test use. Retrieved 24
July 2013 from http://www.intestcom.org/test_use_full.htm

ITC (International Test Commission). (2006). International guidelines on computer-based and


Internet-delivered testing (Version 2005). International Journal of Testing, 6(2), 143–172.
Retrieved 29 July 2013 from
http//:www.intestcom.org/Downloads/ITC%20Guidelines%20on%20Computer%20version%202005%20approve

ITC (International Test Commission). (2010). A test-taker’s guide to technology-based testing.


Available at http://www.intestcom.org

Jankowicz, D. (2004). The easy guide to repertory grids. Chichester, UK: Wiley.

Jaques, E. (1976). A general theory of bureaucracy. London: Heinemann Educational Books.

Jaques, E. (1982). Free enterprise, fair employment. New York and London: Crane Russak & Co
and Heinemann Educational Books.

Jaques, E. (1997). Requisite organization: A total system for effective managerial organization
and managerial leadership for the 21st century (1st ed.). Arlington: Cason Hall.

Jaques, E. (1998). Requisite organisation: A total system for effective managerial organization
and managerial leadership for the 21st century (2nd ed.). Fall Church, VA: Cason Hall.

Jaques, E., Gibson, R.O., & Isaac, D.J. (1978). Levels of abstraction in logic and human action:
A theory of discontinuity in the structure of mathematical logic, psychological behaviour, and
social organization. London: Heinemann Educational Books.

Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.

Jing, G., Drasgow, F., & Gibby, R.E. (2012). Estimating the base rate of cheating for
unproctored Internet tests? Paper presented at the 27th Annual Conference of the Society for
Industrial and Organizational Psychology, San Diego, CA. (Cited by Macqueen, 2012.)

Jing, Q., & Fu, X. (2001). Modern Chinese psychology: Its indigenous roots and international
influences. International Journal of Psychology, 36, 408–418.

Jing, Q.C., Wan, C.W., & Lin, G.B. (2003). Psychological studies on Chinese only children in
the last two decades. Journal of Psychology in Chinese Societies, 3, 163–181.

Johnson, W., & Bouchard, T.J. Jr. (2005). Constructive replication of the visual-perceptual-
image rotation (VPR) model in Thurstone’s (1941) battery of 60 tests of mental ability.
Intelligence, 33, 417–430.

Jones, J.W. (1998). Virtual HR: Human resources management in the 21st century. Menlo Park,
CA: Crisp Publications.

Jones, J.W., & Higgins, K.D. (2001). Megatrends in personnel testing: A practitioner’s
perspective. Retrieved 18 March 2008 from
http://www.testpublishers.org/Documents/journal03.pdf

Jung, C.G. (1968). AION: Researches into the phenomenology of the self (2nd ed.). London:
Routledge.
Kanjee, A., & Foxcroft, C. (2009). Cross-cultural test adaptation, translation and tests in multiple
languages. In C. Foxcroft, & G. Roodt (Eds.), Introduction to psychological assessment in the
South African context (3rd ed.). (pp. 77–89). Cape Town: Oxford University Press.

Kanjee, A. (2007). Using logistic regression to detect bias when multiple groups are tested.
South African Journal of Psychology, 37, 47-61.

Kaplan, R.M., & Saccuzzo, O.P. (2013). Psychological testing: Principles, applications, and
issues (9th ed.). Belmont, CA: Wadsworth.

Kaplan, R.M. (1982). Nader’s raid on the testing industry: Is it in the best interests of the
consumer? American Psychologist, 37, 15–23.

Kaplan, R.M., & Saccuzzo, D.P. (1989). Psychological testing: Principles, applications, and
issues (2nd ed.). Pacific Grove, CA: Brooks/Cole.

Kaplan, R.M., & Saccuzzo, D.P. (2001). Psychological testing: Principle, applications and
issues (5th ed.). Belmont, CA: Wadsworth.

Kaplan, R.M., & Saccuzzo, D.P. (2005). Psychological testing: Principles, applications and
issues (6th ed.). Belmont, CA: Wadsworth.

Katz, M.R. (1983). SIGI: An interactive aid to career decision making. Journal of College
Student Personnel, 21, 34–40. (Cited in R. Langley, R. du Toit, & D.L. Herbst, 1995).
Keirsey, D. (1998). Please understand me II (1st ed.). Del Mar, CA: Prometheus Nemesis
Books.

Kelly, G.A. (1955). The psychology of personal constructs. Vol. 1: A theory of personality. New
York: Norton.

Khoza, R. (1994). African humanism. Diepkloof Extension, South Africa: Ekhaya Promotions.

Kirkpatrick, D.L. (1996). Evaluating training programs: The four levels. San Francisco, CA:
Berrett-Koehler.

Kitching, J. (2004). The measurement outcome equivalence of the career path appreciation
(CPA) for employees from diverse cultural backgrounds. MCom Dissertation, Pretoria,
University of Pretoria. Retrieved 15 August 2013 from
http://upetd.up.ac.za/thesis/available/etd-03162005-151333/unrestricted/00dissertation.pdf

Kline, R.B. (2010). Principles and practice of structural equation modeling (3rd ed.). New
York: Guilford Press.

Kluckhohn, F., & Strodtbeck, F.L. (1961). Variations in value orientation. Evanston, IL: Row.

Knott, K., Taylor, N., Oosthuizen, Y., & Bhabha, F. (2013). The Myers-Briggs type indicator in
South Africa. In S. Laher, & K. Cockcroft (Eds.), Psychological assessment in South Africa:
Research and applications (Chapter 17, pp. 244–256). Johannesburg: Wits University Press.

Kouzes, J.M., & Posner, B.Z. (2009). To lead, create a shared vision. Harvard Business Review,
87, 20–21.

Kowalski, R., & Westen, D. (2004). Psychology: Brain, behavior, and culture (4th ed.). New
York: Wiley.

Kravitz, D.A., Harrison, D.A., Turner, M.E., Levine, E. L., Chaves, W., et al (1997). Affirmative
action: A review of psychological and behavioural research. Bowling Green, OH: Society for
Industrial and Organisational Psychology.

Kriek, H.J., Hurst, D.N., & Charoux, J.A.E. (1994). The assessment centre: Testing the fairness
hypothesis. Perspectives in Industrial Psychology, 20(2), 21–25.

Laher, S., & Cockcroft, K. (Eds.). (2013). Psychological assessment in South Africa: Research
and applications. Johannesburg: Wits University Press.

Lambsdorff, J.G. (2007). The methodology of the corruption perceptions index. Transparency
International (TI) and University of Passau. Retrieved 18 March 2009 from
http://www.transparency.org

Lanchbury, P., & Kearns, A. (2000). How do you know when and if a candidate with a disability
needs a test accommodation? Journal of the Application of Occupational Psychology to
Employment and Disability, 2(2), 37–40. (Cited in Vermeulen, 2000.) Review of J. Sandoval,
C.L. Frisby, K.F. Geisinger, J. D. Scheuneman & Julia Ramos Grenier (Eds.) (1998), Test
interpretation and diversity: Achieving equity in assessment. Washington DC: American
Psychiatric Association. Retrieved 12 September 2013 from
http://www.dwp.gov.uk/docs/no1-oct-00-book-review-2.pdf

Langley, R., Du Toit, R., & Herbst, D.L. (1995). Manual for the values scale. Pretoria: Human
Sciences Research Council.

Larmour, P. (2008). Corruption and the concept of “culture”: Evidence from the Pacific Islands.
Crime, Law and Social Change, 49(3), 225–237.

Larmour, P. (2012). Corruption and the concept of “culture”: Evidence from the Pacific Islands.
Chapter 9 in M. Barchess, P.B. Hindess, & P. Larmour (Eds.), Corruption: Expanding the
focus. Canberra: Australian National University.

Levine, E.L., Spector, P.E., Menon, S., Narayanan, L., & Cannon-Bowers, J. (1996). Validity
generalisation for cognitive, psychomotor and perceptual tests for craft jobs in the utility
industry. Human Performance, 9, 1–22.

Liddell, C., & Kruger, P. (1987). Activity and social behaviour in a South African township
nursery: Some effects of crowding. Merrill Palmer Quarterly, 33(2), 195–211.

Lievens, F., & Klimoski, R.J. (2001). Understanding the assessment centre process: Where are
we now? In C.L. Cooper, & I.T. Robertson (Eds.), International Review of Industrial and
Organisational Psychology, Vol. 16 (pp. 245–286). Chichester, UK: Wiley.

Linn, R.L. (1973). Fair test use in selection. Review of Educational Research, 43, 343–357.

Lipson, J.G., & Meleis, A.I. (1989). Methodological issues in research with immigrants. Special
Issue: Cross-cultural nursing: Anthropological approaches to nursing research. Medical
Anthropology, 12, 103–115.

Littlefield, L., Stokes, D., & Li, B. (2010). Options for the protection of the public posed by the
inappropriate use of psychological testing. Submission to the Psychology Board of Australia,
Consultation Paper, Australian Psychological Society, September.

Lloyd-Jones, H. (1983). The justice of Zeus. (2nd ed.). Berkeley, CA: University of California
Press.

Lobello, S., & Sims, B. (1993). Fakability of a commercially produced pre-employment integrity
test. Journal of Business and Psychology, 8, 265–273.

Louisiana Psychological Association. (n. d.). Medical psychologists may prescribe medication.
Retrieved 24 July 2013 from http://www.louisianapsychologist.org/displaycommon.cfm?
an=1&subarticlenbr=6

Louw, D.A., & Edwards, D.J.A. (1997). Psychology: An introduction for students in Southern
Africa (2nd ed.). Johannesburg: Heinemann.

Lubinski, D., & Benbow, C. (2000). States of excellence: A psychological interpretation of their
emergence. American Psychologist, 55, 137–150.

Luria, A.R. (1973). The working brain: An introduction to neuropsychology (B. Haigh, trans.).
New York: Basic Books.

Luria, A.R. (1966). Human brain and psychological processes. New York: Harper and Row.

Lyman, H.B. (1998). Test scores and what they mean (6th ed.). Boston, MA: Allyn & Bacon.

Macqueen, P. (2012). The rapid rise of online psychological testing in selection. InPsych
(Australian Psych Soc) October. Available from
http://www.psychology.org.au/Content.aspx?ID=4925

Makhudu, N. (1993). Cultivating a climate of cooperation through Ubuntu. Enterprise, 68, 40–
41 (Cited in Prinsloo, 2001).

Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective
studies of disease. Journal of the National Cancer Institute, 22, 719–748.

Maree, J.G. (2010). Brief overview of the advancement of postmodern approaches to career
counselling. Journal of Psychology in Africa, 20(3), 361–368.

Marks-Tarlow, T. (2002). Fractal dynamics of the psyche. Retrieved 6 July 2006 from
http://www.goertzel.org/dynapsyc/2002/FractalPsyche.htm

Marsh, H.W., & Byrne, B.M. (1993). Confirmatory factor analysis of multigroup–multimethod
self-concept data: Between-group and within-group invariance constraints. Multivariate
Behavioral Research, 28, 313–349.

Maslow, A. (1954). Motivation and personality. New York: Harper.

Masters, G.N. (1985). Common-Person equating with the Rasch model. Applied Psychological
Measurement, 9(1), 73–82.

Matarazzo, J.D. (1992). Testing and assessment in the 21st century. American Psychologist, 47,
1007–1018.

Mathews, R., Stokes, D., & Grenyer, B. (2010). A snapshot of the Australian psychology
workforce. InPsych, 32(5), 28–30. (Cited in Psychology 2020.)

Mauer, K.F. (2003a). News flash 22 April, 2003: New validity evidence for the Career Path
Appreciation (CPA). Johannesburg: BIOSS.

Mauer, K.F. (2003b). News flash 23 November, 2003: New findings on the test–retest validity of
the CPA. Johannesburg: BIOSS.

Mauer, K.F. (n.d.). Summary of research evidence in respect of the CPA and IRIS.
Johannesburg: BIOSS.

Mayer, J.D., & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17,
433–442.

McCaulley, M.H. (2000). Myers-Briggs type indicator: A bridge between counselling and
consulting. Consulting Psychology Journal: Practice and Research, 57, 117–132.

McClelland, D.C., Atkinson, J.W., Clark, R.A., & Lowell, E.L. (1953). The achievement motive.
New York: Appleton-Century-Crofts.

McCormick, E.J., Jeanneret, P.R., & Mecham, R.C. (1989). Position analysis questionnaire.
Bellingham, WA: PAQ Services.

McCormick, E.J., Jeanneret, P.R., & Mecham, R.C. (1972). A study of job characteristics and
job dimensions as based on the position analysis questionnaire. Journal of Applied
Psychology, 56, 347–368.

McCrae, R.R., & Costa, P.T. Jr. (1997). Toward a new generation of personality theories:
Theoretical contexts for the five-factor model. In J.S. Wiggins (Ed.), The five-factor model of
personality: Theoretical perspectives (pp. 51–87). New York: Guilford.

McIntire, S.A., & Miller, L.A. (2000). Foundations of psychological testing. New York:
McGraw-Hill.

McManus, M.A., & Kelly, M.L. (1999). Personality measures and biodata: evidence regarding
their incremental predictive validity in the life insurance industry. Personnel Psychology,
52(1/2), 137–148.

Meehl, P.E. (1954). Clinical versus statistical prediction – A theoretical analysis and a review of
the evidence. Minneapolis, MN: University of Minnesota Press.

Meijer, R.R., & Nering, M.L. (1999). Computerized adaptive testing: Overview and
introduction. Applied Psychological Measurement, 23, 187–194.

Mellam, A., & Aloi, D. (2003). Papua New Guinea national integrity systems: Country study
report. Blackburn South: Transparency International Australia.

Mercer, J.R. (1973). The pluralistic assessment project: Sociocultural effects in clinical
assessment. School Psychology Digest, 2, 10–18.

Meyer, W.F., Moore, C., & Viljoen, H.G. (1997). Personology: From individual to ecosystem.
Johannesburg: Heinemann.

Miller, L.K. (1997). Principles of everyday behavior analysis. Pacific Grove, CA: Brookes/Cole.

Millon, T. (1997). Millon clinical multiaxial inventory-III manual (2nd ed.). Minneapolis, MN:
National Computer Systems.

Milner, K., Donald, F., & Thatcher, A. (2013). Psychological assessment and workplace
transformation in South Africa: A review of the research literature. In S. Laher, & K.
Cockcroft (Eds.), Psychological assessment in South Africa: Research and applications.
Johannesburg: Wits University Press.

Milsom, J. (2004). The growing importance of cross-cultural assessment. Competency &
Emotional Intelligence Quarterly, 11(4), 19–22. Retrieved 23 August 2008 from
http://www.wickland-westcott.co.uk/pdfs/cross-cultural_assessment.pdf

Miltenberger, R. (1997). Behavior modification: Principles and procedures. Pacific Grove, CA:
Brookes/Cole.

Moos, R. (1973). Conceptualization of human environments. American Psychologist, 28, 652–665.

Morelli, N. (2012). Are Internet-based, unproctored assessments on mobile and nonmobile
devices equivalent? Paper presented at the 27th Annual Conference of the Society for
Industrial and Organizational Psychology, San Diego, CA. (Cited by Macqueen, 2012.)

Morgan, L. (2007). Global trends in effective candidate identification, assessment and
deployment methods. Retrieved 5 August 2013 from
http://www.lloydmorgan.com/PDF/Global%20Trends%20in%20Effective%20Candidate%20Identification%20%

Morgeson, F.P., Campion, M.A., Dipboye, R.L., Hollenbeck, J.R., Murphy, K., & Schmitt, N.
(2007). Reconsidering the use of personality tests in personnel selection contexts. Personnel
Psychology, 60, 683–729.

Morris, C.G., & Maisto, A.A. (2002). Psychology: An introduction (12th ed.). Upper Saddle
River, NJ: Prentice Hall.

Mount, M.K., Barrick, M.R., & Strauss, J.P. (1994). Validity of observer ratings of the Big Five
personality factors. Journal of Applied Psychology, 79, 272–280.

Muchinsky, P.M., Kriek, H.J., & Schreuder, A.M.G. (2002). Personnel psychology (2nd ed.).
Johannesburg: Thomson.

Muniz, J., Bartram, D., Evers, A., Boben, D., Matesic, K., Glabeke, K., Fernandez-Hermida,
J.R., & Zaal, J.N. (2001). Testing practices in European countries. European Journal of
Psychological Assessment, 17(3), 201–211.

Murphy, K.R. (2000). What constructs underlie measures of honesty or integrity? In R. Goffin,
& E. Helmes (Eds.), Problems and solutions in human assessment: A Festschrift to Douglas
N. Jackson at seventy (pp. 265–284). London: Kluwer.

Murphy, K.R., & Davidshofer, C.O. (2006). Psychological testing: Principles and applications
(6th ed.). Upper Saddle River, NJ: Prentice Hall.

Murphy, L., Plake, B.S., & Spies, R.A. (2008). (Eds.). Tests in Print VII. Lincoln, NE:
University of Nebraska Press, Buros Institute of Mental Measurements.

Murphy, R., & Maree, J.F. (2006). A review of South African research in the field of dynamic
testing. South African Journal of Psychology, 36(1), 168–191.

Murray, M. (2005). How to design a successful assessment centre. People Management (UK),
11(4), 24–45.

Myers, I.B. (1980). Introduction to type. Palo Alto, CA: Consulting Psychologists Press.

Myers, I.B., & Briggs, K.C. (1962). The Myers-Briggs type indicator. Palo Alto, CA: Consulting
Psychologists Press.

Myers, I.B., & McCaulley, M.H. (1985). Manual: A guide to the development and use of the
Myers-Briggs type indicator. Palo Alto, CA: Consulting Psychologists Press.

Naglieri, J.A., & Das, J.P. (1997). Cognitive assessment system. Administration and scoring
manual. Interpretive handbook. Itasca, IL: Riverside.

Naicker, A. (1994). The psycho-social context of career counselling in S.A. schools. South
African Journal of Psychology 24(1), 7–34.

Narayanan, P., & Swaminathan, H. (1994). Performance of Mantel-Haenszel and simultaneous
item bias procedure for detecting differential item functioning. Applied Psychological
Measurement, 18, 315–328.

Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., et al. (1995).
Intelligence: Knowns and unknowns: Report of a task force established by the Board of
Scientific Affairs of the American Psychological Association. Washington DC: APA Science
Directorate. Retrieved 18 October 2007 from
http://www.lrainc.com/swtaboo/taboos/apa_01.html

Nell, V. (1994). Interpretation and misinterpretation of the South African Wechsler-Bellevue
Adult Intelligence Scale: A history and prospectus. South African Journal of Psychology,
24(2), 100–109.

Neuman, G., & Baydoun, R. (1998). Computerization of paper-and-pencil tests: When are they
equivalent? Applied Psychological Measurement, 22, 71–83.

Nevill, D.D., & Super, D.E. (1986). The values scale: Theory, application and research. Palo
Alto, CA: Consulting Psychologists Press.

Ngo, D. (2010). Position analysis questionnaire. Retrieved 31 July 2013 from
http://www.humanresources.hrvinet.com/position-analysis-questionnaire-paq-model

Nunnally, J.C., & Bernstein, I.H. (1993). Psychometric theory. New York: McGraw-Hill.

Nzimande, B. (1984). Industrial psychology and the study of black workers in South Africa: A
review and critique. Psychology in Society, 2, 54–91.

Nzimande, B. (1995). To test or not to test? Paper presented at the Congress on Psychometrics,
Council for Scientific and Industrial Research, Pretoria, South Africa.

Oaklander, V. (1997). The therapeutic process with children and adolescents. Gestalt Review,
1(4), 292–317.

Office of Technology Assessment of the US Federal Government (OTA). (1983).

Ones, D.S., & Viswesvaran, C. (1998). The effects of social desirability and faking on
personality and integrity assessment for personnel selection. Human Performance, 11, 245–
269.

Ones, D.S., Viswesvaran, C., & Reiss, A.D. (1996). Role of social desirability in personality
testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679.

Ones, D.S., Viswesvaran, C., & Schmidt, F.L. (1993). Comprehensive meta-analysis of integrity
test validities: Findings and implications for personnel selection and theories of job
performance. Journal of Applied Psychology Monograph, 78, 679–703.

Ones, D., Viswesvaran, C., & Dilchert, S. (2005). Personality at work: Raising awareness and
correcting misconceptions. Human Performance, 18(4), 389–404.

Paine, L.S. (1997). Integrity. In P.H. Werhane, & R.E. Freeman, (Eds.), Encyclopedic dictionary
of business ethics (pp. 335–337). Oxford: Blackwell.

PAQ Services. (2013). PAQ’s job analysis. Retrieved 29 July 2013 from
http://www.paq.com/index.cfm?Fuse-Action=bulletins.job-analysis

Pausewang, G. (1997). Adi: Jugend eines Diktators {Adi: The adolescence of a dictator}.
Ravensburg, Germany: Ravensburger Verlag.

Pedrajita, J.Q., & Talisayon, V.M. (2009). Identifying biased test items by differential item
functioning analysis by using contingency table approaches: a comparative analysis.
Education Quarterly, 67(1), 21–43. (University of the Philippines College of Education).
Retrieved 17 July 2013 from
http://journals.upd.edu.ph/index.php/edq/article/viewFile/2017/1912

Peter, L.J., & Hull, R. (1969). The Peter Principle: Why things always go wrong. New York:
William Morrow.

Peters, T. (1994). The Tom Peters seminar: Crazy times call for crazy organizations. London:
Macmillan.

Petersen, I. (2004). Primary level psychological services in South Africa: Can a new
psychological professional fill the gap? Health Policy and Planning, 19, 33–40.

Peterson, I. (1993). From prisons to autos to space. Science News, 17 July, 37.

Phalet, K., & Hagendoorn, L. (1996). Personal adjustment to acculturative transitions: The
Turkish experience. International Journal of Psychology, 31, 131–144.

Phalet, K., & Swyngedouw, M. (2003). A cross-cultural analysis of immigrant and host values
and acculturation orientations. In H. Vinken, & P. Esther (Eds.), Comparing cultures (pp.
185–212). Leiden: Brill.

Phalet, K., van Lotringen, C., & Entzinger, H. (2000). Islam in de multiculturele samenleving
{Islam in the multicultural society}. Utrecht, The Netherlands: University of Utrecht,
European Research Centre on Migration and Ethnic Relations.

Pilbeam, S., & Corbridge, M. (2006). People resourcing: Contemporary HRM in practice (3rd
ed.). Essex, UK: Prentice Hall.

Ployhart, R.E., & Tsacoumis, S. (2001). Strategies for reducing adverse impact. Paper presented
at the February 2001 workshop of the Personnel Testing Council of Metropolitan
Washington, DC. (Cited in Blair, 2003.)

Posner, M.I. (1980). Orienting of attention. The Quarterly Journal of Experimental Psychology,
32(1), 3–25.

Price, R.K., Spitznagel, E.L., Downey, T.J., Meyer, D.J., Risk, N.K., & el-Ghazzawy, O.G.
(2000). Applying artificial neural network models to clinical decision making. Psychological
Assessment, 12(1), 40–51.

Prinsloo, C.H. (1998). Manual for the use of the Sixteen Personality Factor Questionnaire:
South African 1992 version (16PF SA92). Pretoria: Human Sciences Research Council.

Prinsloo, E.D. (2001). A comparison between medicine from an African (Ubuntu) and Western
philosophy. Curationis, 24(1), 58–65.

Psychology 2020 – The 2011–2012 President’s Initiative on the Future of Psychological Science
in Australia. The Australian Psychological Society. Available at
http://www.psychology.org.au/Assets/Files/2012_APS_PIFOPS_WEB.pdf

Psychology Board of Australia. (2012). National psychology examination curriculum. Retrieved
24 July 2013 from http://www.psychologyboard.gov.au/documents/default.aspx?record

Public Service Commission (of Canada). (2006). Standardized testing and employment equity
career counselling: A literature review of six tests. Retrieved 15 June 2007 from
http://www.psc-cfp.gc.ca/ee/eecco/intro_e.htm

Pulakos, E.D. (2005). Selection assessment methods. Alexandria, VA: SHRM Foundation.
Available at http://www.shrm.org/about/foundation/research/documents/assessment_methods.pdf

Quenk, N.L. (2000). Essentials of Myers-Briggs type indicator assessment. New York: Wiley.

Raine, A., & Venables, P.H. (1984a) Tonic heart rate, social class and antisocial behaviour in
adolescents. Biological Psychology, 18, 123–132.

Raine, A., & Venables, P.H. (1984b). Electrodermal non-responding, antisocial behaviour and
schizoid tendencies in adolescents. Psychophysiology, 21(4), 424–433.

Raven, J., Raven, J.C., & Court, J.H. (2003, updated 2004) Manual for Raven’s progressive
matrices and vocabulary scales. San Antonio, TX: Harcourt Assessment.

Reed, T.E., & Jensen., A.R. (1993). Choice reaction time and visual pathway conduction
velocity both correlate with intelligence but appear not to correlate with each other:
Implications for information processing. Intelligence, 17, 191–203.

Reid, B. (1990). Weighing up the factors: Moral reasoning and culture change in a Samoan
community. Ethos, 18(1), 48–71.

Republic of South Africa. (1995). Labour Relations Act, No. 66 of 1995. Government Gazette,
No. 16861, Cape Town.

Republic of South Africa. (1998). Employment Equity Act, No. 55 of 1998. Government
Gazette, No 19370, Pretoria.

Richman-Hirsch, W.L., Olson-Buchanan, J.B., & Drasgow, F. (2000). Examining the impact of
administration medium on examinee perceptions and attitudes. Journal of Applied
Psychology, 85, 880–887.

Robbins, S.P. (1996). Organisational behaviour: Concepts, controversies, applications (7th ed.).
San Diego, CA: Prentice Hall.

Roid, G.H. (2003). Stanford-Binet intelligence scales interpretive manual: Expanded guide to
the interpretation of SB5 test results. Itasca, IL: Riverside.

Rosenthal, R., & Rubin, D.B. (1978). Interpersonal expectancy effects: The first 345 studies.
Behavioural and Brain Sciences, 3, 377–386.

Rothstein, M.G., & Goffin, R.D. (2006). The use of personality measures in personnel selection:
What does current research support? Human Resource Management Review, 16(2), 155–180.

Rotter, J.B. (1966). Generalized expectancies for internal versus external control of
reinforcement. Psychological Monographs: General & Applied, 80(1), 1–28.

Roughan, P. (2004). Solomon Islands national integrity systems: Country study report.
Blackburn South: Transparency International Australia.

Rounds, J.B., & Tracey, T.J. (1996). Cross-cultural structural equivalence of RIASEC models
and measures. Journal of Counseling Psychology, 43(3), 310–329.

Roussos, L.A., & Stout, W.F. (1996). Simulation studies of the effects of small sample size and
studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal
of Educational Measurement, 33, 215–230.

Rudmin, F., & Ahmadzadeh, V. (2001). Psychometric critique of acculturation psychology: The
case of Iranian migrants in Norway. Scandinavian Journal of Psychology, 42(1), 41–56.
(Cited by Donoso, 2010, q.v.).

Russ-Eft, D., & Preskill, H. (2001). Evaluation in organizations: A systematic approach to
enhancing learning, performance and change. New York: Basic Books.

Russell, C.J. (2000). The Cleary Model: “Test bias” as defined by the EEOC uniform guidelines
on selection procedures. Retrieved 26 March 2008 from
http://www.ou.edu/russell/whitepapers/Cleary_model.pdf

Russell, M. (1999). Testing on computers: A follow-up study comparing performance on
computer and on paper. Education Policy Analysis Archives, 7(20). Retrieved 25 September
2007 from http://epaa.asu.edu/epaa/v7n20

Russell, M., & Haney, W. (1997). Testing writing on computers: An experiment comparing
student performance on tests conducted via computer and via paper-and-pencil. Education
Policy Analysis Archives, 5(3). Retrieved 25 September 2009 from
http://olam.ed.asu.edu/epaa/v5n3.html

Ryan, A.M., & Sackett, P.R. (1987). Pre-employment honesty testing: Fakability, reactions of
test-takers and company image. Journal of Business and Psychology, 1, 248–256.

Ryan, J., Tracey, T., & Rounds, J. (1996). Generalizability of Holland’s structure of vocational
interests across ethnicity, gender, and socioeconomic status. Journal of Counseling
Psychology, 43, 330–337.

Ryder, A., Alden, L., & Paulhus, D. (2000). Is acculturation unidimensional or bidimensional?
Journal of Personality and Social Psychology, 79, 49–65.

Saccuzzo, D.P., & Jackson, N.E. (1995). Identifying traditionally under-represented children for
gifted programs. The National Research Center on the Gifted and Talented Newsletter,
Winter, 4–5.

Sackett, P., & Wanek, J.E. (1996). New developments in the use of measures of honesty,
integrity, conscientiousness, dependability, trustworthiness, and reliability for personnel
selection. Personnel Psychology, 47, 787–829.

Sackett, P.R., Burris, L.R., & Callahan, C. (1989). Integrity testing for personnel selection: An
update. Personnel Psychology, 42, 491–529.

Sagana, A., & Potocnik, K. (2009). Psychological teaching and training in Europe. Paper
presented at the 32nd Inter-American Congress on Psychology 28 June – 2 July, Guatemala
City. Retrieved 19 July 2013 from
http://www.iaapsy.org/division15/uploads/congress/Sagana&Potocnik.pdf

Salovey, P., & Mayer, J.D. (1990). Emotional intelligence. Imagination, Cognition, and
Personality, 9, 185–211.

SAQA (The South African Qualifications Authority). (2012). Level descriptors for the South
African National Qualifications Framework. Pretoria: SAQA (pp. 5–12).

Savickas, M.L. (2006). A vocational psychology for the global economy. Keynote presentation at
the Psychological Association, New Orleans, LA. (Cited in Maree, 2010.)

Saville & Holdsworth Ltd (SHL) (South Africa). (2005). Study into the regulation of
psychologists and psychological assessment in 21 countries. Retrieved 21 November 2005
from http://www.shl.com/SHL/za/Products/Access_Competencies/online-competency-profiler.aspx

Schein, E.H. (1990). Career anchors: Discovering your real values. San Diego, CA: Pfeiffer.

Schein, E.H. (1992). Organizational culture and leadership. San Francisco, CA: Jossey-Bass.

Schein, E.H. (1995). Career orientations inventory. Englewood Cliffs, NJ: Prentice Hall.

Schein, E.H. (1996). Career anchors revisited: Implications for career development in the 21st
century. Retrieved 18 September 2007 from http://www.solonline.org/res/wp/10009.html

Schellenberg, S.J. (2004). Test bias or cultural bias: Have we really learned anything? Paper
presented at the symposium The Achievement Gap: Test Bias or School Structures?
sponsored by the National Association of Test Directors as part of the Annual Meeting of the
National Council for Measurement in Education, San Diego, CA, 14 April 2004. Retrieved 16
Sept 2013 from http://datacenter.spps.org/uploads/Test_Bias_Paper.pdf

Schmidt, F.L. (1988). The problem of group differences in ability scores in employment
selection. Journal of Vocational Behaviour, 33, 272–292.

Schmidt, F.L., Hunter, J.E, McKenzie, R.C., & Muldrow, T.W. (1979). Impact of valid selection
procedure on work-force productivity. Journal of Applied Psychology, 64(6), 609–626.

Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel
psychology: Practical and theoretical implications of 85 years of research findings.
Psychological Bulletin, 124(2), 262–274.

Schmidt, G. (1984). Equal opportunity policy: A comparative perspective. International Journal
of Manpower, 5(3), 15–25.

Schumacher, L.B. (2010). Statement made by the Director of the PhD in Global Leadership at an
Indiana Tech Immersion Weekend (March 2010) (Personal communication 24 July, 2013).

Schwartz, A.L. (1983). Recruiting and selection of sales-people. In E.E. Borrow, & L.
Wizenberg (Eds.), Sales managers’ handbook (pp. 341–348), Homewood, IL: Dow Jones
Irwin.

Scully Mogg Consulting. (1999). Unpublished assessment centre manual. Johannesburg: Scully
Mogg Consulting.

Shackleton, V., & Newell, S. (1997). International assessment and selection. In N. Anderson, &
P. Herriot (Eds.), International handbook of selection and assessment (Vol. 13, pp. 81–95).
Chichester, UK: Wiley.

Shealy, R., & Stout, W. F. (1993). A model-based standardization approach that separates true
bias/DIF from group differences and detects test bias/DIF as well as item bias/DIF.
Psychometrika, 58, 159–194.

Sheldon, W., & Stevens, S.S. (1942). Varieties of human temperament: A psychology of
constitutional differences. New York: Harper.

Shepard, L.A., Camilli, G., & Williams, D.M. (1985). Validity of approximation techniques for
detecting item bias. Journal of Educational Measurement, 22, 77–105.

Sheppard, L.D., & Vernon, P.A. (2008). Intelligence and speed of information processing: A
review of 50 years of research. Personality and Individual Differences, 44, 535–551.

Shippmann, J.S., Ash, R.A., Battista, M., Carr, L., Eyde, L.D., Hesketh, B., Kehoe, J.,
Pearlman, K., Prien, E.P., & Sanchez, J.I. (2000). The practice of competency modeling.
Personnel Psychology, 53, 703–740.

Shore, B. (1996). Culture in mind: Cognition, culture and the problem of meaning. Oxford:
Oxford University Press.

Shuttleworth-Jordan, A.B. (1996). On not re-inventing the wheel: A clinical perspective on
culturally relevant test usage in South Africa. South African Journal of Psychology, 26(2),
96–102.

Siegle, G.J., & Hasselmo, M.E. (2002). Using connectionist models to guide assessment of
psychological disorders. Psychological Assessment, 14(3), 263–278.

Silzer, R., & Jeanneret, R. (1998). Anticipating the future: Assessment strategies for tomorrow.
In R. Jeanneret, & R. Silzer (Eds.), Individual psychological assessment: Predicting
behaviours in organizational settings (pp. 445–477). San Francisco, CA: Jossey-Bass.

Singh, H.P., & Dakunivosa, M. (2001). Fiji national integrity systems: Country study report.
Blackburn South: Transparency International Australia.

So’o, L.L., Asofou, S., Ruta-Fiti, V., Unasa L.F., & Lāmeta, S. (2004). Sāmoa national integrity
systems: Country study report. Blackburn South: Transparency International Australia.

Society for Human Resource Management. (2012). SHRM Elements for HR Success Competency
Model®. Retrieved 12 December 2013 from
http://www.shrm.org/hrcompetencies/documents/competency%20model%208%200.pdf

Society for Industrial and Organisational Psychology of South Africa (SIOPSA). (2005).
Guidelines for the validation and use of assessment procedures for the workplace. Pretoria:
SIOPSA.

Society for Industrial and Organisational Psychology of South Africa (SIOPSA). (2012).
Recommendations for regulating development, control and use of psychological tests. Drawn
up by People Assessment in Industry (PAI – an Interest Group of the Society for Industrial
and Organisational Psychology of South Africa). Pretoria, SIOPSA.

Solomon, R.C. (1999). A better way to think about business. How personal integrity leads to
corporate success. New York: Oxford University Press.

South African Department of Education. (2000). Norms and Standards for Educators,
Government Gazette, 415 (20844), 4 February 2000: Pretoria. Retrieved 25 March 2008 from
http://www.info.gov.za/gazette/notices/2000/20844.pdf

South African Department of Education. (2005). Subject assessment guidelines – Life
Orientation Draft 2005 (p. 7). Retrieved 26 March 2008 from
http://www.apek.org.za/standards%5CLO%20180705.doc

South African Government. (1998). Employment Equity Act (55 of 1998). Government Gazette,
No. 19370, Vol. 400, Cape Town: Government Printer.

South African Government. (1999). Skills Levies Act (9 of 1999). Government Gazette, No.
19984, Vol. 406, Cape Town: Government Printer.

Spearman, C. (1927). The abilities of man. New York: Macmillan.

Stamp, G., & Retief, A. (1996). Towards a culture-free identification of working capacity: The
Career Path Appreciation. Uxbridge, UK: BIOSS.

Stamp, G., & Stamp, C. (1993). Well-being at work: Aligning purposes, people, strategies and
structure. International Journal of Career Management, 5(3).

Stamp, G., & Stamp, C. (2004). The individual, the organisation and the path to mutual
appreciation. Retrieved 27 January 2009 from http://www.gillianstamp.com

Stampe, D., Roehl, B., & Eagan, J. (1993). Virtual reality creations. Corte Madera, CA: White
Group Press.

Stanton, J.M. (1999). Validity and related issues in Web-based hiring. The Industrial-
Organizational Psychologist, 36(3), 69–77.

Stanush, P., Arthur, W., & Doverspike, D. (1998). Hispanic and African American reactions to a
simulated race-based affirmative action scenario. Hispanic Journal of Behavioural Science,
20(1), 3–16.

Steenkamp, J.B., & Baumgartner, H. (1998). Assessing measurement invariance in cross-
national research. Journal of Consumer Research, 25, 78–90.

Sternberg, R.J., & Grigorenko, E.L. (2002). Dynamic testing: The nature and measurement of
learning potential. Cambridge: Cambridge University Press.

Sternberg, R.J. (1977). Intelligence, information processing, and analogical reasoning: The
componential analysis of human abilities. Hillsdale, NJ: Erlbaum.

Sternberg, R.J. (1988). The triarchic mind: A new theory of human intelligence. New York:
Viking.

Sternberg, R.J. (Ed.). (2000). Handbook of intelligence. New York: Cambridge University Press.

Sternberg, R.J., & Detterman, D.K. (Eds.). (1986). What is intelligence? Contemporary
viewpoints on its nature and definition. Norwood, NJ: Ablex.

Sternberg, R.J., Forsythe, G.B., Hedlund, J., Horvath, J., Snook, S., Williams, W.M., Wagner,
R.K., & Grigorenko, E.L. (2000). Practical intelligence in everyday life. New York:
Cambridge University Press.

Stewart, V. (2004). Kelly’s theory summarised: A summary of Kelly’s theory of personal
constructs, the basis of the repertory grid interview. Retrieved on 11 July from
http://www.enquirewithin.co.nz/theoryof.htm

Strategic Human Resources Managers (SHRM) (2012). What is a competency? Retrieved 1
August 2013 from
http://www.shrm.org/hrcompetencies/documents/competency%20model%207%203.pdf

Sullivan, L., & Arnold, D.W. (2000). Invasive questions lead to legal challenge, settlement and
use of different tests. The Industrial Psychologist, 38(2), 142–143.

Swaminathan, H., & Rogers, H.J. (1990). Detecting differential item functioning using logistic
regression procedures. Journal of Educational Measurement, 27, 361–370.

Syracuse University HRD Competency Library (n.d.). Retrieved 15 August 2013 from
http://humanresources.syr.edu/staff/nbu_staff/comp_library.html

Taylor, T.R. (1994). A review of three approaches to cognitive assessment and a proposed
integrative approach based on a unifying theoretical framework. South African Journal of
Psychology, 24(4), 184–193.

Taylor, T.R. (2006). Administrator’s manual for TRAM-1 battery. Edition 3. Johannesburg:
Aprolab.

Taylor, T.R. (2013). APIL and TRAM learning potential assessment instruments. In S. Laher, &
K. Cockcroft (Eds.), Psychological Assessment in South Africa: Research and Applications
(Chapter 11, pp. 158–168). Johannesburg: Wits University Press.

Terpstra, D.E., Mohamed, A.A., & Kethley, R.B. (1999). An analysis of federal court cases
involving nine selection devices. International Journal of Selection and Assessment, 7(1),
26–33.

Theron, C. (2007). Confessions, scapegoats and flying pigs: Psychometric testing and the law.
South African Journal of Industrial Psychology, 33(1), 102–117.

Theron, C. (2009). The diversity-validity dilemma: In search of minimum adverse impact and
maximum utility. South African Journal of Industrial Psychology, 35(1), 1–13.

Thorndike, R.L. (1971). Educational measurement (2nd ed.). Washington DC: American
Council on Education.

Thurstone, L.L. (1938). Primary mental abilities. Chicago: University of Chicago Press.

Tippins, N.T. (2009). Internet alternatives to traditional proctored testing: Where are we now?
Industrial and Organizational Psychology: Perspectives on Science and Practice, 2(1), 2–10.

Transparency International (TI). (2009). The anti-corruption plain language guide. Switzerland:
TI.

Tredoux, C., & Durrheim, K. (Eds.). (2005). Numbers, hypotheses and conclusions (2nd ed.).
Cape Town: UCT Press.

Tredoux, N. (2013). Using computerised and Internet-based testing in South Africa. In S. Laher,
& K. Cockcroft (Eds.) Psychological assessment in South Africa: Research and applications.
Johannesburg: Wits University Press.

Triandis, H.C. (2000). Culture and conflict. International Journal of Psychology, 35(2), 145–
152.

Tryon, W.W. (1999). A bidirectional associative memory explanation of posttraumatic stress
disorder. Clinical Psychology Review, 19, 789–818.

US Congress, Office of Technology Assessment. (1990). The use of integrity tests for pre-
employment screening, OTA-SET-442. Washington, DC: US Government Printing Office.

Valchev, V.H., Nel, J.A., van de Vijver, F.J.R., Meiring, D., de Bruin, G.P., & Rothmann, S.
(2013). Similarities and differences in implicit personality concepts across ethnocultural
groups in South Africa. Journal of Cross-Cultural Psychology, 44, 365–388.

Van de Vijver, F.J.R. (2002). Cross-cultural assessment: Value for money? Applied Psychology:
An International Review, 51(4), 545–566.

Van de Vijver, F.J.R., & Hambleton, R.K. (1996). Translating tests: Some practical guidelines.
European Psychologist, 1(2), 89–99.

Van de Vijver, F.J.R., Helms-Lorenz, M., & Feltzer, M.F. (1999). Acculturation and cognitive
performance of migrant children in the Netherlands. International Journal of Psychology, 34,
149–162.

Van de Vijver, F.J.R., & Leung, K. (1997a). Methods and data analysis of comparative research.
In J.W. Berry, Y.H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology
(2nd ed.) (Vol. 1, pp. 257–300). Boston: Allyn & Bacon.

Van de Vijver, F.J.R., & Leung, K. (1997b). Methods and data analysis for cross-cultural
research. Newbury Park, CA: SAGE.

Van de Vijver, F.J.R., & Phalet, K. (2004). Assessment in multicultural groups: The role of
acculturation. Applied Psychology: An International Review, 53, 215–236.

Van de Vijver, F.J.R., & Poortinga, Y.H. (1997). Towards an integrated analysis of bias in cross-
cultural assessment. European Journal of Psychological Assessment, 13, 29–37.

Van de Vijver, F.J.R., & Poortinga, Y.H. (2002). Structural equivalence in multilevel research.
Journal of Cross-Cultural Psychology, 33(2), 141–156.

Van de Vijver, F.J.R., & Tanzer, N.K. (2004). Bias and equivalence in cross-cultural assessment:
An overview. European Review of Applied Psychology, 54(2), 119–135.

Verhezen, P. (n.d.). Respect, integrity and trust: A cross-cultural interpretation of ‘corruption’
beyond the (conflictual) shame & guilt concepts. Retrieved 28 November 2012 from
http://www.verhezen.net/thoughts_articles/Respect,%20Integrity%20and%20Trust.pdf

Vernon, P.A. (1993). Biological approaches to the study of human intelligence. Norwood, NJ:
Ablex.

Vernon, P.E. (1960). The structure of human abilities (revised ed.). London: Methuen.

Verster, J.M. (1989). A cross-cultural study of cognitive processes using computerized tests.
Unpublished PhD Thesis, University of Pretoria, Pretoria.

Vygotsky, L.S. (1978). Mind in society: The development of higher psychological processes.
Cambridge, MA: Harvard University Press.

Walters, L.C., Miller, M.R., & Ree, M.J. (1993). Structured interviews for pilot selection: No
incremental validity. International Journal of Aviation Psychology, 3(1), 25–38.

Wanek, J.E., Sackett, P.R., & Ones, D.S. (2003). Towards an understanding of integrity test
similarities and differences: An item-level analysis of seven tests. Personnel Psychology, 56,
873–894.

Wang, Z., & Mobley, W.H. (2011). Spotlight on global I-O industrial-organizational psychology
developments in China. The Industrial-Organizational Psychologist, 49(2), 101–104.
Retrieved 24 July 2013 from http://www.siop.org/tip/oct11/16thompson.aspx

Wang, Z.M. (1993). Psychology in China: A review dedicated to Li Chen. Annual Review of
Psychology, 44, 87–116.

Wang, Z.M. (1995). Culture, economic reform and the role of industrial and organizational
psychology in China. In M.D. Dunnette, & L.M. Hough (Eds.), Handbook of industrial and
organizational psychology, (2nd ed.). (pp. 689–726). Palo Alto, CA: Consulting
Psychologists Press.

Ward, C., & Kennedy, A. (1992). The effects of acculturation strategies on psychological and
sociocultural dimensions of cross-cultural adjustment. Paper presented at the 3rd Asian
Regional IACCP Conference, Bangi, Malaysia. (Cited in Donoso, 2010, q.v.)

Ward, C., & Searle, W. (1991). The impact of value discrepancies and cultural identity on
psychological and sociocultural adjustment of sojourners. International Journal of
Intercultural Relations, 15(2), 209–224.

Wechsler, D. (1939). Measurement of adult intelligence. Baltimore, MD: Williams & Wilkins.

Weiner, J.A., & Rice, C. (2012). Utility of alternative UIT verification models. Paper presented
at the 27th Annual Conference of the Society for Industrial and Organizational Psychology,
San Diego, CA. (Cited by Macqueen, 2012.)

Weiss, T.B., & Hartle, F. (1997). Reengineering performance management: Breakthroughs in
achieving strategy through people. Boca Raton, FL: St. Lucie Press.

Wemm, R.L. (2001). International psychologists are trained by varying degrees. Observer
(American Psychological Society), 14(3). Retrieved 22 July 2013 from
http://www.psychologicalscience.org/observer/0301/notebook2.html See also
http://www.neurognostics.com.au/AcademicEquivs/OzziePsychoCringe.htm.

Werner, O., & Campbell, D.T. (1970). Translating, working through interpreters, and the
problem of decentering. In R. Naroll, & R. Cohen (Eds.), A handbook of cultural
anthropology (pp. 398–419). New York: American Museum of Natural History.

Westen, D. (2002). Psychology: Brain, behaviour and culture. New York: Wiley.

White, M.J., Brockett, D.R., & Overstreet, B.G. (1993). Confirmatory bias in evaluating
personality test information: Am I really that kind of person? Journal of Counselling
Psychology, 40(1), 120–126.

Whiteley, P. (2012). Are Britons becoming more dishonest? Retrieved 4 June 2013 from
http://www.essex.ac.uk/government/news_and_seminars/newsEvent.aspx?e_id=3880

Wiggins, J.S. (1973). Personality and prediction: Principles of personality assessment. Reading,
MA: Addison-Wesley.

Wikipedia. (n.d.). The Forer effect. Retrieved 25 April 2008 from
http://en.wikipedia.org/wiki/Forer_effect

Williams, R.W. (2006). The not-so-hidden costs of poor selection. Retrieved 15 April 2008 from
http://hr.monster.com/articles/wendell/wendell3

Wise, P.S. (1989). The use of assessment techniques by applied psychologists. Belmont, CA:
Wadsworth.

Wolfaardt, J.B., & Roodt, G. (2005). Basic concepts. In C. Foxcroft, & G. Roodt (Eds.), An
introduction to psychological assessment in the South African context (2nd ed.). (Chap. 3).
Cape Town: Oxford University Press.

Woodruffe, C. (1990). Identifying and developing competence. London: Institute of Personnel
Management.

Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning
(DIF): Logistic regression modeling as a unitary framework for binary and Likert-type
(ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and
Evaluation, Department of National Defense.

Zunker, V.G. (1998). Career counseling: Applied concepts of life planning (5th ed.). Pacific
Grove, CA: Brooks/Cole.

Index

A
Abbott, B.B. 15
Abrahams, F. 53, 137
absolute zero 9
acquiescence 55
Adarrage, P. 234
Adult Basic Competence Test 91
adverse impact 78
affirmative action 69
age equivalents 58
age referencing 58
Allport, G. 127
alternate form reliability 40
ambiguous pictures 131
American Psychological Association 102, 125
Anastasi, A. 58
Anolli, L. 234
antecedents 17
Antonius, R. 256
anxiety levels 54
APA see American Psychological Association
Aristotle 13
Arnold, D.W. 222
Arnold, J. 146, 201
Arthur, W. 83
artificial intelligence 234
artificial situations 18
Aryee, S. 170
aspiration level 186
assembly tasks 123
assess 5
why assess? 5
assessment 3, 20, 94, 219, 234
assessment as research 20
biologically anchored 234
dynamic 219
assessment centres 201, 211–213
definition 201
fairness 213
psychometric properties 211
reliability 212
validity 212
assessment instructions 94
assessment materials 94
attenuation 42
Australia 101
autonomy/independence 183

B
banding 84
Barnouw, V. 127
Barnum effect 137
Bar-On, R. 222
Baron, R.A. 127
Barrick, M.R. 140, 169
base rate 69
battery 88, 185
Baydoun, R. 226
behavioural change 225
behavioural indicators 147, 208–209
behaviourally anchored rating scales 174
Belbin, M. 169
Belbin 206
Benbow, C. 115
Ben-Porath, Y.S. 229
Bernstein, I.H. 3, 5, 12, 32, 44–45, 48, 50, 66, 73, 83
Berry, J.W. 222
bias 47
Big Five dimensions 169
Big Five theory 236
Binet, A. 113
biologically anchored assessment 234
Blair, M.D. 213, 215
Blanchard, K.H. 150, 170
Bogardus Social Distance Scale 31
Borden, K.S. 15
Borgen, F.H. 185
Borman, W. 225
Boston Consulting Group 175
Bouchard, T.J. 116
Bradberry, T. 223
Briggs, K.C. 135
Buros Institute of Mental Measurements 241
bus accident victim 70
Butcher, J.N. 229

C
Callanan, G.A. 178
cancellation task 71
career anchor 183
career path appreciation 164, 219
career 178
security 178
definition 178
Caretta, T.R. 116
Carroll, J.B. 116
Caruso, D. 222
Caryl, P.G. 235
Cascio, W.F. 85
case-based reasoning 234
Cattell, J.M. 112
Cattell, R.B. 115–116, 133, 137
CDS see Cognitive Distortion Scale
Ceci, S.J. 235
ceiling effects 44
central tendency 59
centrality 55
cerebral glucose metabolism 235
chaos theory 219, 232
Charoux, J.A.E. 213
Clarke, D. 226
Cleary model 92
client interview 153
clinical combination of scores 63
coefficient of
determination 51
equivalence 41
internal consistency 42
stability 40
Coetzee, N. 185
cognitive correlates 117
Cognitive Distortion Scale 72
Cohen, R.J. 12, 34, 56, 73, 92, 125, 139, 193, 197, 199, 228, 230, 237
Cole, N.S. 83
Coleman, V. 225
Collins, R.C. 185
combining scores 63–67
balanced scorecard 66
compensatory methods 65
decision-making matrix 67
multiple-hurdle approach 64
profile analysis 65
simple average 64
weighted averages 64
common-item equating 41
common-person research design 41
competency 145–148, 148, 151–155
assessment of 148
competence in non-work-related areas 155
core and cross-functional 151
definition 146
competency framework, drawing up a 146
fairness 154
identification of 153
information, sources of 148
levels of competence 148
performance criteria 147
portfolio of 154
potential barriers 148
range statements 147
technical and higher-order 152
units of competence 146
competitiveness 54
complexity theory 232
componential theory of intelligence 118
computer-assisted 226
computer-based adaptive testing 228
computer-based assessment 219
computer-based testing 226
computerised report writing 230
conceptualising 27
conscientiousness 139
concurrent validity 50
conditional probability model 83
confidentiality 96
confirmatory biases 196
confirmatory factor analysis 49
constant ratio model 83
construct validity 47
constructivist 11
constructs 25
content validity 49
continuous adding 71
convergent validity 48
Cooper, C.L. 215
correlation matrix 250
correlations 249
Corrigan, B. 195
cortical neurons 235
Cortina, J.M. 195
Costa, P.T. 139, 236
Crafford, A. 190
criterion problem 52, 171
criterion referencing 58
criterion-related validity 50
critical incident technique 153
critical realists 12
Cronbach 42
Cronbach’s alpha 42
cross-validation 30
crystallised intelligence 115, 221
cultural fairness of assessment centres 213–214
assessor 214
design 214
culturally saturated 123
culture fairness 236

D
Dahmer, G. 233
Darwin, C. 112
data capture 251
Davidshofer, C.O. 171–172, 212
Davies, M. 122
Dawes, A. 195
Day, S.X. 185
De Beer, M. 124
décalage 58
360-degree assessment 173
demand characteristics 55, 232
detection rate 69
Detterman, D.K. 220
developmental centres 202
advantages 202
disadvantages 203
developmental sequencing 49
developmental stage referencing 58
deviation IQ 114
deviation score 59
dichotomous items 30
Differential Aptitude Test 88, 185
differential item functioning 80
Dilchert, S. 112
discriminant validity 48
discrimination 77
disparate treatment 78
distortion 55
distracters 28
domain referencing 58
Donovan, M.A. 226
Doverspike, D. 83
DPsych degree 102, 235
Drasgow, F. 226
DSM-IV 138
Du Toit, R. 185
Dubois, D. 146, 147, 156
d-value 65
dynamic assessment 219, 221, 232
dynamic testing 123

E
Eagan, J. 227
EAP see Employment Assistance Programme
Eber, H.W. 137
ecological validity 51
economic value added 174
ectomorph 128
Educational Testing Service 149
Edwards, D.J.A. 125
emic approach 79
Emmerling, R.J. 222
emotional intelligence 120–121, 222
empirical validity 50
empirical criterion keying 132
empiricist 11
Employment Assistance Programme 197
Employment Equity Act 52, 78–79, 99, 179
employment equity 83
endomorph 128
entrepreneurial creativity 183
equal probability model 83
Erikson, E. 128
error score 37
ethical standards 98
ethics 20
etic approach 79
ETS see Educational Testing Service
evaluation 4
evoked potentials 235
evolutionary biology 233
Excel 252
expectancy tables 57
exploratory factor analysis 49
external stakeholders 171
extraverts 135, 187
extremity 55
Eysenck, H. 134, 223

F
face validity 50
factor analysis 48
fairness 11, 75–76, 80, 85, 87–88
absence of predictive bias 75
bias 76
definition 76
discrimination 76
ensuring fairness 85
equal opportunity 75
equal outcomes 75
equitable treatment 75
evidence of unfairness 80
reasonable accommodation 76
removal of discriminatory items 87
single tests/different norms 88
single tests/same norms 88
use of separate tests 87
faking 55, 219
false negatives 68
false positives 68
feedback 96
feelers 135, 188
Fernández-Ballesteros, R. 219, 224, 226
Feuerstein, R. 123
19-Field Interest Inventory 185
Fink, A. 220
Fisher, W.P. 3, 141
fitness landscape 233
five-factor model 139
Flesch-Kincaid grade level 53
Fletcher, C. 196
floor effects 44
fluid intelligence 115, 221
Flynn, J.R. 62, 124
Flynn effect 62, 124
focus groups 153
Fontenesi, M. 234
forced distribution 172
forensic evaluation 171
Forer effect 136
formative assessment 5
Fouad, N.A. 185
four fifths rule 84
four humours 134
Foxcroft, C. 8, 34, 73, 88, 226, 227
fractals 232–233
Fransella, F. 142
Friedman, H.S. 129, 131
Frost, N. 117
Furnham, A. 112, 125, 134, 136, 140–141, 146, 154, 156, 163–164

G
Galton, F. 112
Gardner, H. 111, 120, 223
theory of multiple intelligences 120
Geisinger, K.F. 241
general ability 184
general knowledge 123
general management competence 183
generalisability 5
genetics 220
George, J.A. 163, 203
Gibson, R.O. 165
Goldberg, L.R. 195
Goldstein, Braverman and Goldstein 3
Goleman, D. 120, 122, 222–224
goodness of fit 81
Gottfredson, L.S. 220
grade equivalents 58
grade referencing 58
Graduate Record Examination 149
Graves, L. 195
Grayson, P. 215
Greaves, J. 223
Greenberg, J. 127
Greenhaus, J.H. 178
Gribbin, J. 221
Grigorenko, E.L. 125
grounded theory 16
Grove, W.M. 230
growth curves 167
Guilford, J.P. 117
Guion, R.M. 176
Guttman scales 31

H
Haier, R.J. 235
halo effect 172
Haney, W. 228
Hansen, J.I. 185
Hanson, M.A. 227
hard of hearing candidates 97
Harmon, L.W. 185
Harris, W.G. 226
Hartle, F. 146
Harvard Business School 163
Hasselmo, M.E. 233
Haverkamp, B.E. 185
Health Professions Council of South Africa 100, 235
Heilman, M.E. 83
Herbst, D.L. 185
hermeneutic 16
Hersey, P. 150, 170
HESA see Higher Education South Africa
Higgins, K.D. 222, 231
high verbal 118
Higher Education South Africa 149
Hitler, A. 233
HIV/Aids 140
Hoepfner, R. 117
Holland, J.L. 179, 181
Horn 116
Hough, L. M. 85, 140–141, 169, 175–176, 195
Human Sciences Research Council 137
Hunt, E.B. 117
Hurst, D.N. 213

I
IDEAS see Interest Determination, Exploration and Assessment System
identifying 203
competencies 203
idiographic 127
illusion of validity 196
image management 55
impression 219
incomplete sentences 132
incremental validity 195
inductive reasoning 115
industrial relations climate 170
information processing 111
informed consent 21
Initial Recruitment Interview Schedule 166
inkblots 131
integrity testing 140
intelligence quotient 113
intelligence 109–112, 114, 219–220
definition 109
discovering rules 110
dynamic assessment of 219
genetics 220
information processing 111
learning from experience 110
mental speed 220
models of 114
problem solving 111
recognising patterns 110
socially defined 112
structural approaches 114
understand to comprehend 110
Wechsler, D. 110
Interest Determination, Exploration and Assessment System 191
interests 185
internal consistency 41
internet based assessment 231
internet 101, 219
interpretation of results 96
interpretation 30
of test results 30
interquartile range 8
inter-rater reliability 43
inter-scorer 43
interval data 9
interviewing 89, 193–197
continued use 196
counselling interviews 194
definition 193
effective 197
employment interviews 194
reliability of interviews 194–195
semi-structured interviews 194
stages 197
structured interviews 194
traditional interviews 194
validity of interviews 195
introverts 135, 187
intuiters 135
intuitives 187
investment theory of intelligence 115
ipsative scoring 33
IQ see intelligence quotient
I.R.I.S see Initial Recruitment Interview Schedule
Isaac, D.J. 165
item analysis 29, 250
item direction 32
item response theory 220, 228
item weighting 32
item-remainder correlation 29
item-total correlation 29

J
Jackson Vocational Interest Survey 185
Jaques, E. 165
Jeanneret, P.R. 161
job description 66, 154, 161
job diaries 153
Johnson, W. 116
Jones, J.W. 222, 231
Jopie van Rooyen & Associates 185
judgemental measures 171
judgers 135, 189
Jung Personality Questionnaire 186

K
Kaplan, R.M. 38, 43, 69, 81, 87, 92, 127, 132, 134, 197–199
Katz, M.R. 185
Keirsey, D. 186
Keirsey Temperament Sorter 186
Kelly, G.A. 21, 129, 142
key performance areas 146–147
Kirkpatrick, D.L. 170
Klimoski, R.J. 215
knowledge-based systems 234
Kravitz, D.A. 83
Kriek, H.J. 213
Kruger, P. 21
Kuder Occupational Interest Survey 185
Kuder 42
Kuder-Richardson formula 42

L
Langley, R. 185
language ability 54
leaderless group technique 205
Learning Propensity Assessment Device 124
levels of measurement 8–10
permissible statistics 10
Liddell, C. 21
Lievens, F. 215
lifestyle 183
Likert scales 30
Linn, R.L. 83
locating assessment centre exercises 205
locus of control 140
Louw, D.A. 125
Lubinski, D. 115
Lunneborg, C.E. 117
Lyman, H.B. 226

M
magnetic resonance imaging 220
malingering 219
management 219
Maree, J.F. 124, 232
Marks-Tarlow, T. 233
Maslow 129
Matarazzo, J.D. 228
material 99
matriculation 149
matrix of work relations 165
matrix 122
maturational sequencing 49
Mauer, K.F. 53, 137, 168
Mayer, J.D. 122, 222–223
MBTI see Myers-Briggs Type Indicator
McCall’s T-score 59
McCaulley, M.H. 136
McClelland, D.C. 129
McCormick, E.J. 161
McCrae, R.R. 139, 236
McIntire, S.A. 38–39, 44, 49–50, 56, 58, 63, 92, 104, 195, 197, 201, 237
MCMI see Millon Clinical Multiaxial Inventory
measurement 3
measuring 4
technique 4
Mecham, R.C. 161
mechanical/actuarial combination of scores 63
median 8
Meehl, P.E. 63
Meijer, R.R. 229
mental age 58, 113
Mental Measurements Yearbook 241
mental speed 220
Mercer 87
mesomorph 128
Meyer, W.F. 127
Miller, L.A. 38–39, 44, 49–50, 56, 58, 63, 92, 104, 195, 197, 201, 237
Miller, L.K. 24
Miller, M.R. 195
Millon Clinical Multiaxial Inventory 138
Millon, T. 138
Milsom, J. 154, 213
Miltenberger, R. 24
Minnesota Multiphasic Personality Inventory 138, 229–230
MMPI see Minnesota Multiphasic Personality Inventory
MMPI-2 101
Moerdyk, A.P. 190
monitoring 95
Moore, C. 127
Mount, M.K. 140
Murphy, K.R. 171–172, 212
Murphy, L. 241
Murphy, R. 124, 232
Murray, M. 215
Myers, I.B. 135
Myers-Briggs Type Indicator 140, 170, 186, 205

N
National Benchmark Tests 149
National Qualifications Framework 100, 148
naturalistic situations 18
NBT see National Benchmark Tests
need for
achievement 129
affiliation 129
power 129
needs 185
Neisser, U. 125, 220
Nel, P. 190
Nell, V. 26, 235
NEO-PI 139
Nering, M.L. 229
nerve conduction velocity 235
Neubauer, A.C. 220
Neuman, G. 226
Nevill, D.D. 185
New Zealand 102
Newell, S. 170
nominal data 8
nomothetic 127
norm development 30
norm groups 61
norm referencing 58
norm 30, 59
normal distribution 59, 60
NormMaker 62
NQF see National Qualifications Framework
numeric calculations 71
Nunnally, J.C. 3, 5, 12, 32, 44–45, 48, 50, 66, 73, 83

O
objectivity 5
observation 4, 15–16, 18, 19, 89
artificial situations 18
casual 15
looking at 16
looking for 16
naturalistic situations 18
observer intervention 18
observer involvement 18
primate behaviour 16
schedules 19
simulations 18
systematic 16
tools or aids 19
observed score 37
Occam’s razor 221
Occupational Personality Questionnaire 133, 139, 186, 196
odd-one-out 123
off-diagonal cells 250
Olson-Buchanan, J.B. 226
O’Neill, C. 190
Ones, D. 112
operationalising 28
OPQ see Occupational Personality Questionnaire
ordinal scales 8
organisational citizenship behaviour 139, 225
organisational development 170
Oswald, F.L. 85, 140–141, 169, 175–176, 195
outliers 66

P
PAI see Psychological Assessment Inventory
PAQ see Position Analysis Questionnaire
parallel form reliability 40
Pausewang, G. 233
perceivers 135, 189
percentiles 8, 59
perceptual speed 115
performance appraisal 171
performance management 164
person profile 161
person specification 161
personal constructs 142
personality assessment 134
four humours 134
Jung’s typology 134
Myers-Briggs 134
Wundt’s typology 134
16 Personality Factor Inventory 133
personality profiles 52
personality 127–129, 130–131, 221, 233
as a fractal 233
assessment of 130
biological approach 128
computer-based simulations 131
definition 127
developmental approaches 128
need theories 129
observation 131
phenomenological approaches 129
projective techniques 131
psychoanalytic theories 129
theory 128
trait approaches 130
Peterson, I. 234
physically disabled candidates 97
Piaget, J. 58
pilot testing 29
placement 160
Plake, B.S. 241
Ployhart, R.E. 213
Position Analysis Questionnaire 139, 161, 197
positive psychology 225
Posner, M. 118
post profile 161
posttraumatic stress disorder 233
potential 221
power tests 44
precision 5
predictive validity 50
pre-market discrimination 78
presentation of self 55
Preskill, H. 175
Price, R.K. 235
primary mental abilities 115
Prinsloo, C.H. 137
problem solving 111
Probst, T.M. 226
production measures 171
Professional Board for Psychology 226, 231, 235
professional training 235
promotion 160, 164
psychic unity 78
Psychological Assessment Inventory 138
psychological assistants 235
psychometric tests 26
Psychometrics Committee 138
psychometrist 100
psycho-technician 100
Psytech 185
Public Service Commission of Canada 136, 142, 190
pure challenge 183

Q
qualified individualism 83
quantification 11
social sciences 11
quantifying 28
quartiles 59
Quenk, N.L. 136
quota system 83

R
random error 38
ranking 172
rapport 39, 95
rating 172
rating techniques 173
ratio data 9
Raven's Progressive Matrices 122
Raven's Standard Progressive Matrices 71, 205
reasonable accommodation 97
Ree, M.J. 116, 195
regression analysis 81
Reiber, A. 163, 203
reliability 37, 39, 41, 43
internal consistency 41
inter-rater 43
inter-scorer 43
test-retest 39
repertory grid 21, 129, 154
repgrid see repertory grid
response set 28
response sets 54, 224
restriction of range 44, 54
Retief, A. 168
reverse scoring 32
RIASEC 181–182, 188
expanded RIASEC model 182
Richardson 42
Richman-Hirsch, W.L. 226
rights 99
Robbins, S.P. 127, 185
Roberts, R.D. 122
Robertson, I.T. 215
robustness 38
Roehl, B. 227
Roodt, G. 8, 34, 40–41, 73, 226–227
Rorschach test 139
Rosenthal, R. 196
Rothwell, W.J. 156
Rounds, J.B. 185–186
Rubin, D.B. 196
Russ-Eft, D. 175
Russell, C. 82
Russell, C.J. 92
Russell, M. 227–228
Ryan, J. 185

S
Saccuzzo, D.P. 38, 43, 69, 81, 87, 92, 127, 132, 134, 197–199
Salovey, P. 122, 222–223
SAQA see South African Qualifications Authority
SAT Reasoning test 149
satisficing 221
sausages on a stick 82
Saville & Holdsworth Ltd 162
scale length 45
Schein, E. 183
Schlechter, A. 190
Schmidt, F.L. 92
Schustack, M.W. 129, 131
Schwartz, A.L. 163
scoring 96
SDS see Self-Directed Search (SDS)
second guessing 55
security 99
security/stability 183
selection criteria 161
selection ratio 69
selection 160
Self-Directed Search (SDS) 185
self-fulfilling hypotheses 196
self-referencing 59
sensing 135
sensitivity 38
sensors 187
service/dedication to a cause 183
sex hormones 235
Shackleton, V. 170
Sheldon, W. 128
Sheppard, L.D. 220
SHL 185, 203
SHL-SA 102
Shore, B. 79
Shuttleworth-Jordan, A.B. 53
Siegle, G.J. 233
Silence of the Lambs 195
Simon 113
simulations 18
situational leadership 150
sliding band approach 84
Slutske, W.S. 229
social desirability 55
social facilitation 197
somatotypes 128
SOMPA see System of Multicultural Pluralistic Assessment
SORT see Structured Objective Rorschach Test
South African Medical and Dental Council 100
South African Qualifications Authority 148
spatial visualisation 115
spatial-practical-mechanical ability 116
Spearman, C. 114
Spearman-Brown formula 42
special situations 97
specialist register 100
specific aptitudes 185
speed tests 44
Spies, R.A. 241
spot the error 71
SPSS see Statistical Package for the Social Sciences
SST see stratified systems theory
staff development 164
Stamp, G. 165, 168
Stampe, D. 227
standard deviation 59
standard error of measurement 38–39, 85
sources of error 39
standard score 59
standardisation 40
stanines 59
Stankov, L. 122
Stanton, J.M. 227
Stanush, P. 83
Statistica 251
Statistical Package for the Social Sciences 256
statutory control 99
stens 60
Sternberg, R.J. 111, 118–119, 125, 220
componential theory of intelligence 118
triarchic theory of intelligence 119
Stevens 128
Stewart, V. 129, 142
strange attractor 232–233
stratified systems theory 164, 165, 232
Strauss, J.P. 140
Strong Interest Inventory 185
Structure of Intellect 117
Structured Objective Rorschach Test 132–133
empirical approach 132
factor analysis approach 133
logical approach 132
objective measures 132
theoretical approach 132
the trait approach 133
Stumpf, R. 88
subjective scoring 45
Sullivan, L. 222
summative assessment 5
Super, D.E. 185
survey 25, 153
Swerdlik, M.E. 12, 34, 56, 73, 92, 125, 139, 193, 197, 199, 228, 230, 237
System of Multicultural Pluralistic Assessment 87

T
Targeted Selection 197
TAT see Thematic Apperception Test
Tatsuoka, M.M. 137
Taylor, T. 124
Team Role Inventory 169
team work 169
technical/functional competence 183
temporal stability 44
Terman, L. 113
Test Commission of South Africa 100
Test of English as a Foreign Language 149
test sophistication 44, 54
testing 4–5
test-retest reliability 39
Tests in Print 241
Thematic Apperception Test 129, 131
theoretical validity 47
theory of measurement 37
theory of multiple intelligences 120
Theron, C. 92
thinkers 135, 188
Thorndike, R.L. 83, 223
Thurstone, L. 115, 121
time limits 54
T-patterns 234
Tracey, T. 185
track record 172
track record information 171
training 170
training for psychologists 101
trait 127–128
cardinal 128
central 128
secondary 128
transfer 160, 164
transfer effects 40
Trauma Symptom Inventory 72
triangulation 8
triarchic theory of intelligence 119
true negatives 68
true positives 68
true score 37
Tryon, W.W. 233
Tsacoumis, S. 213
type 1 error 68
type 2 error 68
type A and type B personalities 140

U
United Kingdom 102
United States 102
unqualified individualism 83
Urbina 58

V
validity generalisation 52
validity 47–53
concurrent 50
construct 47
content 49
convergent 48
criterion-related 50
developmental sequencing 49
discriminant 48
ecological 51
empirical 50
face 50
factors affecting 53
generalisation 52
maturational sequencing 49
predictive 50
theoretical 47
values 185
venue for assessment 94
verbal comprehension 115
verbal fluency 115
verbal-educational ability 116
Vernon, P.A. 220, 235
Vernon, P.E. 116
Verster, J.M. 228
Viljoen, H.G. 127
visually impaired candidates 98
Viswesvaran, C. 112
Vocational Preference Inventory 185
Vosloo, H.N. 185
VPI see Vocational Preference Inventory
Vygotsky, L.S. 124

W
WAIS see Wechsler Adult Intelligence Scale
Walters, L.C. 195
Wechsler, D. 110, 111
Wechsler Adult Intelligence Scale 113
Wechsler Intelligence Scale for Children 113
Weiss, T.B. 146
Westen, D. 111, 119, 125
Wiggins, J.S. 195
Williams, R.W. 175
WISC see Wechsler Intelligence Scale for Children
Wolfaardt, J.B. 40–41
Work Profiling System 162
work samples 139
Wundt, W. 134
Wundt’s typology 134

Z
Zaccagnini, J.L. 234
Z-score 59
