psychological assessment
SECOND EDITION
Alwyn Moerdyk
Van Schaik
PUBLISHERS
Published by Van Schaik Publishers
A division of Media 24 Books
1059 Francis Baard Street, Hatfield, Pretoria
All rights reserved
Copyright © 2015 Van Schaik Publishers
Please contact DALRO for information regarding copyright clearance for this publication. Any
unauthorised copying could lead to civil liability and/or criminal sanctions.
Tel: 086 12 DALRO (from within South Africa) or +27 (0)11 712 8000
Fax: +27 (0)11 403 9094
Postal address: PO Box 31627, Braamfontein, 2017, South Africa
http://www.dalro.co.za
Every effort has been made to obtain copyright permission for material used in this
book. Please contact the publisher with any queries in this regard.
Please note that reference to one gender includes reference to the other.
The cover picture is of Nadia Comaneci, the first person ever to get a perfect score of
10 out of 10 for all her Olympic gymnastics events. This picture is used because it
illustrates the importance of attaching numbers to behaviour, without which no form of
accurate judgement or agreement about performance or behaviour of any kind is
possible.
About the author
Firstly, most of the existing texts tackle the issue of assessment from a
clinical psychology or mental health perspective, while very few
approach assessment from an organisational perspective.
However, we are all aware that much of the work done in organisations
by industrial psychologists and people with an industrial or
organisational psychology background involves psychological
assessment in some form. The best known is assessment for selection
and placement, although various other forms of assessment take place
routinely. Most people also assume that assessment is aimed at
individuals, but organisational effectiveness depends on sound practice
at three or four distinct levels – the individual, the group or team, the
organisation, and external stakeholders such as clients and suppliers.
Most of the available assessment texts do not cover all these parameters.
The book is divided into five sections. The first begins by looking at the
basic theory of measurement and puts forward an unashamedly
empiricist view – we need to measure objects in order to understand
them fully. The text describes the properties of a good measuring device.
It then distinguishes between looking at and looking for, and goes on to
examine how we set about systematically observing some phenomenon,
a discussion which takes us on to how we set about drawing up an
accurate and reliable instrument or technique for assessing human
characteristics and attributes.
The second section of the book covers the basic technical matters of
psychometric theory, namely reliability, validity and interpretation of
assessment results, or what an assessment score means. The text also
considers how best to combine several assessment results to ensure
sound decisions (Chapter 6). It takes an in-depth look at the issue of
fairness, how it is measured, and ways to improve the fairness of the
assessment process. The section closes with a discussion of the
principles underlying the sound management of the assessment process,
including the control of assessment materials and the training of
assessment professionals and practitioners. South Africa’s current
policies and standards regarding control and professional training are
compared with those of other countries.
The last section of the book is Chapter 18, which examines a variety of
“evolving” issues (such as definitions of emotional intelligence and
competence), as well as some emerging trends. One of these is the
increasing computerisation of the assessment process (and the promise
of new techniques and old problems that result from this). The chapter
also looks at some new areas of theory that are likely to impact on
psychological assessment, particularly the theories of artificial
intelligence and chaos theory or complexity science.
It has been five years since the first edition appeared and, although the
content is as relevant as it ever was, there is a need to revisit and update
the material in the light of changing developments and new theory.
Two new chapters have been included. The first of these is Chapter 8 in
which the important issues of assessing in a cross-cultural context are
explored in some depth, looking at both the theoretical issues raised by
assessing people with limited ability in the language of the assessment
(usually English in our context) and differences in understanding
resulting from different social and cultural experiences. The chapter
examines some of the technical issues (based on Item Response Theory
or IRT) required to detect the presence and analyse the extent of any
cross-cultural factors that may affect the validity and fairness of the
decisions based on the assessment.
The second new chapter is Chapter 13, which looks in depth at the
assessment of honesty and integrity. Some of the material was
previously contained in the chapter on Personality (Chapter 11), but it is
such an important topic in this country and throughout the world that it
was decided to devote a whole chapter to it.
Apart from these changes, the points made in the Preface to the first
edition remain as true and as important today as they were five years
ago.
Foreword to the first edition
This is a book that will be referred to again and again because of its
usefulness in so many courses and in the real-life work situation. It will,
I am sure, become a “must buy” for many. The author is to be
congratulated on producing a real gem.
Adrian Furnham
DPhil (Oxon) DSc (Lond) DLitt (Natal)
Professor of Psychology
University College, London
February 2009
Foreword to the second edition
When I wrote the foreword to the first edition I stated that I thought the
book was well written and that it would be well received by students,
academics and practitioners alike. This has indeed been the case.
I am pleased to see this second edition and its two new chapters, one on
assessing integrity and the other on assessing in cross-cultural contexts.
Both of these are vital content areas in today’s world. I am proud to
associate myself with this updated edition, which I am convinced will
have an impact as great as, if not greater than, the first edition.
Adrian Furnham
DPhil (Oxon) DSc (Lond) DLitt (Natal)
Professor of Psychology
University College, London
May 2014
Acknowledgements
I must also thank the staff of Van Schaik Publishers for all the work they
have put into the market research for, and production of, the book over
the years. In particular, the efforts of Julia Read and Nangamso Phakathi
are acknowledged. The hard work of Marike von Moltke and Lee-Ann
Lamb in the production of the second edition is also recognised with
gratitude.
I trust that you, the reader, will find the text useful and I hope that it
inspires you to be a true professional, approaching the assessment of
others with a scientific rigour and great sensitivity. What may seem like
an everyday activity to you may mean the difference between success
and failure for the people you assess – every assessment is a “high-
stakes” process in the lives of those who undergo it.
ALWYN MOERDYK
Grahamstown
February 2009 / May 2014
The aim of this book
Section 1 begins with a chapter that looks at what assessment is and the
benefits of quantification within a positivist or neo-positivist framework.
Chapter 2 examines processes of obtaining data, using a general
observation approach. Chapter 3 outlines the process of drawing up
and/or translating psychological measures. (This provides the basis for a
useful practical exercise or project for more senior students.)
Chapter 2 Observation
2.1 Introduction
2.1.1 Casual observation
2.1.2 Systematic observation
2.2 The ABCs of observation
2.2.1 Antecedents – those things that go before
2.2.2 Behaviours
2.2.3 Consequences – things that follow from the behaviour
2.3 Ways of categorising the observation process
2.3.1 Context
2.3.2 Observer involvement
2.3.3 Intervention or manipulation
2.4 Use of tools or aids
2.5 Observation schedules
2.6 Assessment as a form of research
2.7 Ethical issues
2.8 Summary
Chapter 5 Validity
5.1 Introduction
5.2 Forms of validity
5.2.1 Construct (theoretical) validity
5.2.2 Content validity
5.2.3 Criterion-related (empirical) validity
5.2.4 Face validity
5.2.5 Ecological validity
5.2.6 Incremental validity
5.2.7 Synthetic validity
5.3 Interpreting validity coefficients
5.4 The criterion problem
5.5 Validity generalisation
5.6 Factors affecting validity
5.6.1 Characteristics of the assessment technique or instrument
5.6.2 Individual characteristics
5.6.3 Demand characteristics
5.7 Summary
Chapter 16 Interviewing
16.1 Introduction
16.1.1 Definition
16.1.2 Users of the information
16.2 Employment interviews
16.2.1 Traditional interviews
16.2.2 Structured interviews
16.2.3 Semi-structured interviews
16.2.4 Counselling interviews
16.3 Problems associated with interviews
16.3.1 Reliability
16.3.2 Validity
16.4 Reasons for poor reliability and validity
16.4.1 Theoretical orientation
16.4.2 Experience of the interviewer
16.4.3 Sophistication of the client
16.4.4 The nature of the problem
16.4.5 Confirmatory biases and self-fulfilling hypotheses
16.4.6 So why do they continue to be used?
16.4.7 Improving interviewing as an assessment technique
16.5 Stages of an interview
16.6 Effective interviewing
16.7 Summary
Appendices
Appendix 1 Some tests and measures of maximum and typical
performance
Appendix 2 Calculating correlations
Glossary of terms
References
Index
SECTION
1
OBJECTIVES
1.1 Introduction
If we stop and think for a minute, we will see that there are a number of
reasons for assessing any object, person or process. Firstly, we may
want to see whether the person (or object) meets certain requirements –
does the person know enough to pass his examination? Is the person
coping with his situation?
We may also want to see whether a situation has changed over a given
period: has a person’s behaviour or ability improved, stayed the same or
deteriorated over time? Has our intervention (training or counselling)
resulted in any changes?
1.1.2 Measurement
A closely related issue is that of measurement*. Although it may not be
too difficult to assess whether a person or system has a certain property,
it is often quite difficult to specify exactly how much of the property the
person or system possesses. For example, we may be able to judge that a
person is beautiful or intelligent, but it is far more difficult to say how
beautiful or intelligent the person is. Theories of measurement are
concerned with quantification of this kind. According to Nunnally and
Bernstein (1993, p. 29),
Properties that can be measured:
form
size/magnitude
intensity
duration
frequency
antecedents
consequences

Objects of measurement:
behaviours
cognitions
attributes/traits
abilities
interventions
1.1.2.1 Evaluation*
Related to measurement is evaluation. This involves interpreting or
attaching a judgemental value to an assessment: the water in my bath
may be too hot or too cold, or I may feel that I am too fat or too thin, or
even just right.
If I see water bubbling and steam coming from it, then I can assume that
the water is near to boiling and I do not have to test it. If I see chunks of
ice floating in the water, I can surmise that it is cold. This is called
observation*. Testing*, on the other hand, is the use of an intervention
of some kind to carry out the assessment. Putting my hand in the water
is a crude kind of testing procedure – I can tell if the water is too hot, too
cold or just right. Of course, using a thermometer is a much better option
because you get an accurate measurement that is easy to interpret.
According to Kaplan and Saccuzzo (2013), “the most important purpose
of testing is to differentiate among those taking the test” (p. 9).
1.2.1 Objectivity
A key to the growth of understanding is that different observers are able
to agree about what is being observed. Objectivity* is the extent to
which any process and its results are agreed to by neutral or unbiased
observers and is thus independent of the personal or subjective
judgement of those involved. In the case of assessment, objectivity is
enhanced when numerical values are attached to an object or
phenomenon in terms of known rules. Without this agreement between
observers about what is being observed and the results of this process,
there can be no knowledge, only speculation.
The ancient Greek philosopher, Aristotle, is reputed to have stated that women
have fewer teeth than men because their heads are smaller. He could easily
have disproved this theory by simply asking his wife to open her mouth and
counting her teeth! This is what observation is about.
1.2.2 Precision
Measurement allows finer, more precise distinctions to be made, leaving
room for more subtle effects to be noted than is possible when personal
judgements are made. The average person is not able to judge when the
temperature of an object has risen a few degrees, nor to tell the
difference between a person with an IQ of 100 and one with an IQ of
110. However, in certain circumstances such distinctions could be vital.
1.2.4 Generalisability
A key aspect of any scientific enterprise is to find ways of generalising
from the specific to the general. For example, my dog at home is a
specific case of the more general class of animal known as a border
collie, which is a specific case of the more general class of creatures
known as dogs, which is a specific case of the more general class of
creatures known as mammals, which is …, and so on. Measurement
allows us to quantify and classify objects within larger superordinate
classes. In this process, we are able to specify what each case has in
common with other cases and how they differ. Although it may be
argued that everybody is unique and has nothing in common with other
people, this is clearly not so. Look around and you will see other males
and females, people of African, Indian and European origin. In some
ways we are all the same (we breathe oxygen and bleed red when we cut
ourselves), in others we are the same as only some (we are male or
female, for example) and in some respects we are unique. Our level of
focus depends on the questions we ask and the type of evidence we
regard as answers to these questions.
1.2.5 Communication
It is much easier to communicate and interpret information that is in
symbolic or numeric form. For example, we know what is meant when
we read that School X produces more A symbols in Grade 12 than does
School Y. We know what the A symbols refer to and so the information
does not have to be explained. However, suppose it is reported that a
new medication seems to make people more anxious. What does “more
anxious” mean and how would other researchers of the same
phenomenon interpret this? Conversely, if it is reported that people’s
average anxiety levels, as measured by the XYZ Anxiety Scale, rose
from 7,3 to 8,6, everyone who knows the XYZ Anxiety Scale would
easily understand what this means.
1.2.6 Economy
It is much easier to state that, on average, anxiety levels as measured by
the XYZ Anxiety Scale rose from 7,3 to 8,6, than to try to explain or
describe what this means in words.
But, you may ask, why should we want to quantify any phenomenon?
Why do we need to attach numbers to properties?
1.3 Why do we assess?
Given the reasons for wanting to obtain information described in 1.3, the
next question that arises is where and how do we obtain this information
or what are the sources of information available to us? In general, there
are six basic approaches to obtaining data.
1.5.4 Interviews
A fourth source of information is to ask questions of the person and/or
those involved, such as parents, teachers and even victims, in the case of
a crime. These interviews may be structured or unstructured. In many
instances (such as with hospital/clinical intake and job selection
interviews), these interview schedules are standardised and available
commercially (see Chapter 16 for more details).
1.5.6 Intervention
The final form of observation involves some form of direct intervention
by the observer in an effort to answer “What if?” questions. For
example, a therapist may take a toy from the child he is observing to see
what happens when the toy is taken away. This is clearly an extreme
version of participant observation. If the therapist repeats this
intervention several times and controls extraneous conditions that may
influence the outcome, then this is called an experiment*.
1.6 Triangulation
multiple measures
multiple domains*
multiple sources
multiple settings
multiple occasions.
The only permissible statistical manipulation that one can perform with
nominal data is to count the number of cases (e.g. in the psychology
class there are 93 1s (males) and 167 2s (females)). If there are several
categories, we can also establish the mode, which is the category with
the most members. We can also ask whether these numbers are to be
expected (given the number of males and females in the university as a
whole). To do this we would use one of several non-parametric
statistical techniques*, the best known of which is the chi-square (χ²)
test.
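As a minimal sketch, the chi-square test for a nominal question like this can be computed by hand. The class counts (93 men, 167 women) come from the example above; the university-wide proportions of 45% men and 55% women are invented here purely for illustration:

```python
# Chi-square goodness-of-fit test for nominal data, computed by hand.

def chi_square(observed, expected):
    """Return the chi-square statistic for paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [93, 167]               # men and women in the psychology class
n = sum(observed)                  # 260 students in total
expected = [0.45 * n, 0.55 * n]    # counts expected from the (invented)
                                   # university-wide gender proportions

stat = chi_square(observed, expected)
print(round(stat, 3))              # about 8.951
```

With one degree of freedom the 5% critical value is 3,841, so under these invented proportions the statistic (about 8,95) would lead us to conclude that the class’s gender composition differs from that of the university as a whole.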
The permissible statistics that can be used with interval data are the
arithmetic mean* (average), and statistics based on variance, such as t-
tests, Pearson correlation and analysis of variance. The normal
distribution and interpretation of an individual’s performance relative to
that of the group require interval data.
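To make this concrete, here is a self-contained sketch of the Pearson correlation for interval data; the two sets of scores are invented for illustration only:

```python
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for five people on two interval-level measures.
test_a = [10, 12, 15, 18, 20]
test_b = [21, 25, 28, 35, 41]
print(round(pearson_r(test_a, test_b), 3))   # about 0.985
```

A coefficient this close to 1 would indicate that people who score high on one measure score correspondingly high on the other, which is only meaningful because the data are at least interval in nature.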
Description Value
Very cold ............................................ 1
Cold ................................................... 2
Not so cold ......................................... 3
Lukewarm .......................................... 4
Quite warm ........................................ 5
Hot ..................................................... 6
Very hot.............................................. 7
Boiling hot.......................................... 8
Note that the number of categories is quite arbitrary, and although the
temperatures are arranged in ascending order from coldest to hottest, the
difference between the various values is not constant – the difference
between “very cold” and “not so cold” is not the same as between “not
so cold” and “lukewarm”.
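Because only the order of these categories is meaningful, the appropriate “average” for such ordinal data is the median rather than the arithmetic mean. A small sketch, using the eight categories above and a set of hypothetical ratings:

```python
import statistics

# Map the eight temperature descriptions to ordinal codes. Only the
# order of the codes matters; the gaps between them are not constant.
scale = ["Very cold", "Cold", "Not so cold", "Lukewarm",
         "Quite warm", "Hot", "Very hot", "Boiling hot"]
code = {label: rank for rank, label in enumerate(scale, start=1)}

# Hypothetical ratings of the same bath by seven observers.
ratings = ["Lukewarm", "Quite warm", "Lukewarm", "Hot",
           "Quite warm", "Quite warm", "Lukewarm"]
coded = sorted(code[r] for r in ratings)   # [4, 4, 4, 5, 5, 5, 6]
print(statistics.median(coded))            # the middle rank: 5
```

The median rank of 5 ("Quite warm") summarises the ratings without pretending, as a mean would, that the distance between adjacent categories is constant.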
Finally, there is the Kelvin (or absolute) scale, which has an absolute
minimum of 0 K (equivalent to −273,15 °C). It is impossible to have a
temperature lower than this. According to the Kelvin scale, water freezes
or melts at 273,15 K and boils at 373,15 K (273,15 + 100). As a result of
this absolute zero, 100 K is exactly half as hot as 200 K. These
relationships are shown in Figure 1.1.
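The relationship between the Celsius and Kelvin scales, and the reason ratios are meaningful only on the latter, can be shown in a few lines (a sketch for illustration only):

```python
ABSOLUTE_ZERO_C = -273.15   # 0 K expressed in degrees Celsius

def celsius_to_kelvin(c):
    """Convert a Celsius temperature to kelvin."""
    return c - ABSOLUTE_ZERO_C

freezing = celsius_to_kelvin(0)     # 273.15 K
boiling = celsius_to_kelvin(100)    # 373.15 K

# Because the Kelvin scale has a true zero, ratios are meaningful:
# 200 K really is twice as hot as 100 K. The same claim cannot be
# made on the Celsius scale: 20 °C is not "twice as hot" as 10 °C.
print(freezing, boiling)
```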
Type of data: Nominal
Basis: Group membership
Permissible statistics: Number of cases; mode; chi-square (χ²)
Example from natural sciences: Food type
Example from social sciences: Ethnic origin; gender
As a result there are two schools of thought in psychology (and the other
social sciences) which can be labelled the empiricist or quantitative*
approach and the constructivist or qualitative* approach. The
empiricist/quantitative approach believes that the world is real and exists
outside the experiences of the observer, and can therefore be measured
in reasonably accurate ways. On the other hand, the
constructivist/qualitative school argues that everything is in the mind
and is created by the observer in terms of categories and relationships
that have been learned. This group of people prefers to examine the
language that is used and how this shapes the stories or narratives people
use to describe their experiences. In the middle is a group of people who
see themselves as critical realists*, arguing that there is a real world out
there, but that this is shaped and constructed by our life experiences,
value systems and the cognitive schemata and categories that we bring
to bear on the issues.
Although we will not go into any great analyses in this regard, we need
to take note of these philosophical issues, because the approach we use
will be influenced by the theory we adopt. This book adopts a critical
realist outlook – there is some kind of reality out there, although we all
interpret it slightly differently as a result of our socialisation
experiences.
1.10 Summary
We then considered briefly how to ensure that the measure is a good one
(consistent), that it measures what it claims to measure and that it does
this in a fair manner, and touched on the idea of how to interpret our
scores. In closing, we discussed some of the problems associated with
quantification in the social sciences.
Additional reading
The section on the advantages of quantification comes from Chapter 1 in Nunnally, J.C.
& Bernstein, I.H. (1993). Psychometric theory. This is a very good, though somewhat
technical account of psychometric theory, taking an analysis of variance approach.
For a refresher on levels of measurement and basic statistical concepts, see Cohen,
R.J. & Swerdlik, M.E. (2002). Psychological testing and assessment: An introduction to
tests and measurement (5th ed.), especially Chapter 3.
Test your understanding
Short paragraphs
Essays
1. “The ancient Greek philosopher, Aristotle, is reputed to have stated that women have
fewer teeth than men, because their heads are smaller. He could easily have
disproved this theory by simply asking his wife to open her mouth and counting her
teeth!” Comment on this statement, showing why observation and quantification are
important in any scientific endeavour.
2 Observation
OBJECTIVES
2.1 Introduction
For the rest of this chapter, and the book as a whole, we will assume that
when we observe people (or other objects, systems or organisms) we do
this for a purpose and not merely to pass the time.
Let us consider this case study briefly. If we look at this study, what do
we see?
Firstly, there was the casual observation that the alpha monkey had first
choice of food.
Secondly, there was the “What if?” question: what would happen if the
troop was to be presented with a novel situation? At this point, the
observation passes the “So what?” test.
Thirdly, there was some sort of intervention (the white bananas) which
allowed the research psychologists to explore the “What if?” question.
Clearly, this involved a series of systematic observations in which the
research team made very specific observations as they looked for
information about how the monkeys reacted to the novel situation.
Finally, the researchers put forward a theory about what they had
observed. This in turn led to a number of additional questions that
needed to be answered, and so they could (and did) devise a number of
other mini-experiments to explore aspects of the behaviour that
interested them.
Ah, you might say, but what has this got to do with real psychology? If
you look at Case study 2.1 at the end of this chapter, you will see the
importance of observation in a childcare situation, in which the effects
of crowding on the behaviour of nursery school children were observed.
Read it now on page 21. It may surprise you to learn that the major
author of the paper on which the case study is based, Christine Liddell,
was trained by the same research psychologists that did the white banana
study.
Firstly, it shows that while some observation is casual (looking at), most
of the observation that a psychologist, consultant or manager is involved
in is systematic and involves looking for relationships, as illustrated in
sections 2.1.1 and 2.1.2. These relationships may involve trying to
understand the causes of a particular problem, or why things are not
working as they should. If your car will not start or your cellphone will
not work, you will look for causes or reasons.
2.2.2 Behaviours
Clearly, when we observe an animal, person or system, we need to focus
on what is actually taking place at the time of our observation. As
already stated, this observation can be casual or systematic. During
observation, we must be quite clear about what we are looking at and
what constitutes a particular action. Although this may seem like
common sense, many behaviours are quite subtle and we need to have
clearly defined criteria for saying what X or Y is doing. For example, is
a person disagreeing with others in the meeting because of his
ideological standpoint or is there a real practical issue involved? It is
important therefore that we draw up a carefully constructed checklist of
behaviours so that seemingly similar behaviours are clearly identified to
ensure that there is no confusion or overlap during the observation
period.
2.3.1 Context
Context refers to the setting in which the observation takes place. We
can distinguish between naturalistic, simulated and artificial conditions
or situations.
Case study 2.1 clearly illustrates the final point to be made in this
chapter – that assessment is nothing more than research at the individual
level. According to Wise (1989), assessment is a specialised application
of the scientific method. All assessments involve
formulating a question
designing the means to address the question
interpreting the results
making recommendations
reporting the results.
Permission to assess
I, …………………………, understand the purpose for which I am being
assessed and how the results of the assessment will be used.
[Signed]
[Date]
Much of the rest of the book examines these different approaches, and
shows us how to evaluate the effectiveness of the various assessment
methods – how well do they do what they claim to be doing? We also
look at how to interpret the results of various data-collection techniques,
and to make sense of our observations about people and systems. What
do our results actually mean?
Method
Participants
The nursery school was located in a township near a large South African city and
catered for 83 children (44 boys and 39 girls) ranging in age between 32 and 64
months. The majority (75%) came from intact families with the father present, and
the balance from female-headed families. The township houses were small with
between three and 18 people per household (Mean 8, SD 3), resulting in an
average of 4 m² per inhabitant. Parental occupations ranged from domestic
workers to teachers. Rates of growth and nutritional status were comparable to
those found in US children.
Procedure
The study lasted for 12 weeks with 33 sessions in all. The first five were
habituation sessions, designed so that the children could get used to the
presence of the observers. In addition, some of the children were identified as
people to observe, and they were given coloured aprons to wear. The habituation
sessions also allowed them to get used to these. The second five sessions were
used to gather pilot data and to refine the data collection categories.
Conventional focal-child sampling*, in which an individual child is identified and
watched continuously, proved ineffectual, and so a system of group scanning
was used. In this process, the observer progresses through various groups in a
set order and observes the behaviour of the targeted individuals in each group
before moving on to the next. (This technique has been used extensively in
primatological research.) The sequence of the groups is determined by their
spatial organisation. In this study, the indoor area was divided into four equal
quadrants by means of wall markings. A starting point in each quadrant was also
identified. At the beginning of the observation period, the child nearest the
starting point was observed for ten seconds (indicated by a buzzer in the ear of
the observer) and his behaviour noted. Thereafter the child closest to the first
child, as well as any other children with whom the second child was interacting,
was also observed for ten seconds. During this process, the observer moved
slowly through the four quadrants to ensure a good view of what was taking
place, making notes as he moved through the classroom. Codings were spoken
into a portable tape recorder carried by the observer and later transcribed onto
coding sheets.
A total of 184 group scans was collected, eight for each of the 23 data collection
sessions. Repeated samples were sometimes taken when children moved from
one quadrant to the next ahead of the observer. These duplicates were discarded
before analysis. Three focal areas of behaviour were selected for investigation,
namely level of social participation, activity and aggressive behaviour.
B) ACTIVITY
Five levels of activity were identified:
C) AGGRESSIVE BEHAVIOUR
Five forms of aggressive behaviour were identified:
Physical aggression (fighting, trying to hurt another)
Domination (chasing another child, trying to take a toy from him/keeping the
toy)
Dispute of object (trying to keep an object taken from another)
Failure to take object (trying to keep an object but failing)
Submission (giving in to other’s demand, losing possession of a toy or
apparatus)
Reliability of measurement
In order to ensure an acceptable level of agreement between the observers (a
form of interrater reliability), inter-coder agreement coefficients* (ACs) were
calculated by taking the number of agreements between the observers as a
proportion of the total number of observations. After five days of training, there
was perfect agreement between the observers on the Yes/No items (AC = 1,0)
and an average AC of 0,85 for the other measures, ranging between 0,71 for
socially mediated activities and 0,94 for unoccupied behaviour.
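The agreement coefficient described here is simply the proportion of paired codings on which two observers agree. A minimal sketch follows, using invented category codes; note that this simple proportion, unlike a statistic such as Cohen’s kappa, makes no correction for chance agreement:

```python
# Inter-coder agreement coefficient (AC): the number of agreements
# between two observers as a proportion of all paired observations.

def agreement_coefficient(coder_a, coder_b):
    if len(coder_a) != len(coder_b):
        raise ValueError("coders must rate the same set of observations")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)

# Invented codings of ten observations by two observers.
coder_a = ["solitary", "parallel", "group", "group", "unoccupied",
           "parallel", "group", "solitary", "group", "parallel"]
coder_b = ["solitary", "parallel", "group", "parallel", "unoccupied",
           "parallel", "group", "solitary", "group", "group"]
print(agreement_coefficient(coder_a, coder_b))   # 8 of 10 codes match: 0.8
```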
Results
In short, it was shown that as density increased, so the amount of socially
mediated behaviour decreased and unoccupied behaviour increased. Despite the
much higher density levels in this case study compared to the US studies, the
same pattern of behaviours emerged with fewer social behaviours in the high-
density situations in relation to the low-density ones. In general, the absolute
levels of socially mediated behaviour seemed to be slightly lower than those
displayed in the US with comparable samples.
We give this brief summary of Liddell and Kruger’s article to show how
direct observation can be used in an assessment process. It illustrates
how the observation process needs to be carefully planned and executed.
We can use the same approach in observing an employee’s workplace
behaviour and social interactions within a teamwork situation.
Case study 2.1 shows how the various components of assessment were
put into practice. Firstly, the study was naturalistic, because it took place
in the venue where the children spent most of their time. The observers
were present but did not interfere with the children in any way. (In fact,
they did nothing for the first few days until the children accepted their
presence as normal, thus trying to make the situation as normal as
possible.) This was a case of naturalistic observation without
intervention.
Secondly, the space involved was subdivided and the observers moved
systematically through each area during the observation periods. This
was a direct result of the planned need to observe the phenomena in the
most controlled manner possible.
Thirdly, observations were made every 15 minutes and each lasted ten
seconds, with the clearly defined interactions that occurred in the time
slot and the geographic area being noted before the observers moved to
their next area. This is an example of scheduling, both temporal and
spatial.
Fourthly, the first five observation sessions were used for training and to
pilot the techniques. The actual observations continued for a period of
12 weeks comprising 23 observational sessions.
2.8 Summary
Additional reading
Short paragraphs
Essay
Outline the different ways in which the act of observation can be characterised. What
are the various parameters that can be used to describe observations?
3 Developing a psychological
measure
OBJECTIVES
3.1 Introduction
Although surveys are very useful and are widely used by researchers,
especially market researchers, they present a problem in that a wide
range of topics is usually covered. Psychology researchers are generally
interested in very narrowly defined constructs* such as intelligence* or
personality*, and it is very difficult to assess these using a broad-based
questionnaire or survey. Therefore psychology researchers draw up a
scale or test in which the items* (questions) are carefully designed to
assess the construct they are interested in. The terms are discussed in
more detail in section 3.5.
Verbal
– Reasoning: A man walks east for 60 m and then …
– Analogies: Hand is to arm as foot is to …
– Understanding or comprehension: Reading studies
– Knowledge: Who was the first black president of South Africa?
– Language: What does “apprehensive” mean?
– Grammar: Which word is wrong? “The cat sit on the mat.”
– Spelling: Which word is incorrectly spelled? “The kat sat on the
mat.”
Numerical
– Arithmetic: 2 + 2 = …
– Series: What comes next? 2 4 8 16 …
Symbolic
Codes
– If the code for BED is 254, what is the code for DOG?
Apparatus
– Tracking
– Assembly: Use the different pieces to make a face or human figure.
– Series: Form Series Test: Using different shapes that are supplied,
show which is the next in a series by physically placing the shapes
in position.
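As an aside, the symbolic coding item above can be unpacked with a short sketch. This assumes, as the example BED = 254 suggests, that the code is simply each letter's position in the alphabet; the function name `encode` is our own.

```python
# Hypothetical solution to the coding item, assuming the code is each
# letter's position in the alphabet (B=2, E=5, D=4 -> "254").
def encode(word: str) -> str:
    """Encode a word as the concatenated alphabet positions of its letters."""
    return "".join(str(ord(c) - ord("A") + 1) for c in word.upper())

print(encode("BED"))  # 254
print(encode("DOG"))  # 4157 (D=4, O=15, G=7)
```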
Narrative
From this brief outline, we can see that developing a scale involves the
following seven steps:
3.5.1 Conceptualising
The first step in measurement is to gain a clear understanding of the
phenomenon or domain of interest. In other words, we must clarify
what we are looking for. In our example, the question is: What do we
mean by job satisfaction? To do this, we ask the following:
3.5.2 Operationalising
The next question we have to ask is: How would job satisfaction reveal
itself? What are the indications that this phenomenon is present (or that
a process has occurred)? How does a person with a high level of job
satisfaction think and behave differently from a person with low job
satisfaction? Using the various components and dimensions identified in
the conceptualisation stage, we can generate as many indicators or
statements as possible to reflect these.
1. Keep them as short and simple as possible – research has shown that the
longer the item, the less accurate it is.
2. Ask simple and direct questions – do not try to be too subtle.
3. Avoid negatively phrased items.
4. Avoid idioms or the use of foreign terms.
5. Avoid asking two questions in one item.
6. Ask specific questions – rather ask “What newspapers have you read this
week?” than “What newspaper do you read?”
3.5.3 Quantifying
How can we attach a value to what we have observed? How can we
count examples of, or measure the intensity of, the construct we are
trying to measure? There are three requirements.
Clearly, once all these aspects have been successfully addressed, the
assessment technique is ready for use. All that remains is for the
technical manual, containing information about its reliability and
validity, as well as norms or other ways of interpreting the data, to be
compiled and submitted to the Professional Board for Psychology for
classification. Of course, if we want to market it as a commercial
product, we then need to find a publisher, marketer and distributor.
These processes clearly lie beyond the scope of this book. (For another
look at this process of test development, see Foxcroft & Roodt, 2005.)
I would be disappointed if I did not get a first-class pass in each of my subjects.    Agree | Disagree
With this approach, we simply count the “ones” to get a total, which
then represents the person’s level of achievement motivation. There is
nothing wrong with this procedure, and it is identical to what happens in
tests that use a multiple-choice question (MCQ) answering format.
However, it is more appropriate when the item is either right or wrong.
In many cases, as in the statement above, the people who agree with it
could nevertheless agree either more or less strongly. This leads to the
next answering format.
1 = Strongly agree   2 = Agree   3 = Neither agree nor disagree   4 = Disagree   5 = Strongly disagree
At first glance it looks as if Likert scales are interval scales, because the
numbers run neatly from 1 to 5. However, this is not the case. Each
Likert item is at best an ordinal scale: there is no evidence to indicate
that the difference between “Strongly agree” and “Agree” is the same as
the difference between “Agree” and “Neither agree nor disagree”.
However, it is treated as such. At the same time, as Nunnally and
Bernstein (1993, p. 67) point out, when a large enough number of Likert
scale items are combined, the resultant scores begin to take on interval
scale properties. This has to do with the notion of measurement error*,
and is closely related to the law of large numbers. This is discussed in
Chapter 4 when we look at the theory of measurement in relation to
reliability.
The second issue is whether there should be a midpoint in the scale. In
the answer format given above, a five-point scale is used, with the
midpoint 3 being “Neither agree nor disagree”. Those people who
oppose having a midpoint (i.e. those in favour of having an even number
of choices) argue that it is very seldom that people do not have an
opinion either in favour of or against a particular issue. They believe that
in reality there is no neutral midpoint, and argue that having a midpoint
only encourages people to sit on the fence and not commit themselves
either in favour of or against the view expressed. There is
evidence that some cultures (e.g. the Chinese) prefer not to disagree with
others and therefore tend to choose the neutral point. In view of this,
answer formats with an even number of options (i.e. no neutral point)
should always be used in this context.
1 = Strongly agree   2 = Moderately agree   3 = Agree   4 = Neither agree nor disagree   5 = Disagree   6 = Moderately disagree   7 = Strongly disagree

1 = Strongly agree   2 = Moderately agree   3 = Agree   4 = Slightly agree   5 = Neither agree nor disagree   6 = Slightly disagree   7 = Disagree   8 = Moderately disagree
In this regard, my own experience is that the lower the educational level
of the target audience, the fewer the options that should be made
available. I would therefore recommend that, in general terms, people
with an education level below Grade 10 or 12 should not be given more
than four or five choices, as it may become confusing.
Would you let an X (where an X could be a member of any social group, e.g. a
Rastafarian, a blonde person, a bee-keeper, a drug addict, etc.)
Yes No
a) attend your place of worship
b) live in your street
c) live next door to you
d) visit you in your house
e) have you as a friend
f) date your child
g) marry your child
In this example, if you said “No” to item d), you would also be likely to
say “No” to items e), f) and g). For another example, see Nunnally and
Bernstein (1993, pp. 73–74).
Although this approach was popular at one stage, it is not used very
often these days for two main reasons. Firstly, it is quite difficult to
construct a meaningful hierarchy in many cases. This is made even more
difficult by the fact that different people and different groups of people may
attach different values to the various anchors in the hierarchy. For
example, a strongly religious person could conceivably rate the first item
in the hierarchy above (namely “a) attend your place of worship)” as
more important than “g) marry your child”. Secondly, because these
items are in a hierarchy, much of the information is wasted. If a person
answers “No” to item d), then we know that he is likely to answer “No”
to items e), f) and g) and “Yes” to items a), b) and c). Therefore,
collecting information on items other than d) is a waste of time and
effort. For a more technical criticism of Guttman scales, see Nunnally
and Bernstein (1993, pp. 74–75).
The example shows that the five different items have been given
different weights, and these influence the final score. Note also that the
last item has been given a negative score because endorsement of this
sentiment is seen as negative and the person is marked down as a result.
In practice, this form of negative marking is seldom used. The
weightings are theoretically based in the first instance, but are then
modified over time in the light of experience and research.
Item                                                              Score/5   Weight   Weighted score
I would be disappointed if I did not get a first-class
pass in each of my subjects                                       3         2        6
I know that I am capable of getting A symbols in all
my subjects                                                       4         3        12
I aim to finish in the top three of my class in this subject      4         2.5      10
I am prepared to work all weekend and during my
vacations to make sure I achieve my ambitions                     3         3        9
I am prepared to cheat to make sure I come out top
of my class                                                       5         –2       –10
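The weighted scoring in the table above can be sketched in a few lines of Python; the item scores and weights are taken from the table, and the variable names are our own.

```python
# (score out of 5, weight) pairs taken from the weighted scoring table
items = [
    (3, 2),    # first-class pass in each subject
    (4, 3),    # capable of A symbols in all subjects
    (4, 2.5),  # aim to finish in the top three
    (3, 3),    # prepared to work weekends and vacations
    (5, -2),   # prepared to cheat (negatively weighted: marked down)
]

# Each item score is multiplied by its weight, and the results are summed
total = sum(score * weight for score, weight in items)
print(total)  # 27.0 (= 6 + 12 + 10 + 9 - 10)
```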
I am not sure that I have the ability to finish in the top three of my
class in this subject.
Obviously, this item needs to be reverse scored. If we were using a five-
point Likert scale response format such as
5 = Strongly agree   4 = Agree   3 = Neither agree nor disagree   2 = Disagree   1 = Strongly disagree

1 = Strongly agree   2 = Agree   3 = Neither agree nor disagree   4 = Disagree   5 = Strongly disagree
Clearly we would not give this information to the participant, but would
keep a record of which items are reverse scored and take this into
account when scoring the responses. In general, it is a good idea to have
between 35 and 50 per cent of the items reverse scored.
Instructions
Each item consists of five statements, labelled A, B, C, D and E. For each item,
you must rank the five statements in order of preference, with 1 being the most
important and 5 the least important. Record the ranking you give to each item in
the spaces provided.
Ranking
I think that a really good workplace is one (which
allows me to)
A have a high standard of living
B make important decisions on my own
C feel that I have done something really worthwhile
D where people are encouraged to compete with
and outperform others
E where my supervisor encourages me to show
initiative.
Because the score given to the last item in ipsative scoring can be calculated from
the other scores, correlations using ipsative scores tend to be lower than
those using normative scores.
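The redundancy in ipsative scores can be seen in a one-line sketch: with five statements ranked 1 to 5, the ranks always sum to 15, so the last rank is fixed by the other four. The example values are hypothetical.

```python
# Ranks 1-5 over five statements always sum to 1+2+3+4+5 = 15
four_ranks = [2, 5, 1, 4]          # hypothetical ranks for statements A-D
fifth_rank = 15 - sum(four_ranks)  # the rank of statement E is forced
print(fifth_rank)  # 3
```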
Additional reading
Short paragraphs
Describe how you would set about drawing up a scale to assess some characteristic
such as job satisfaction.
SECTION
2
4 Reliability
OBJECTIVES
4.1 Introduction
At this stage we are therefore concerned with how well our assessment
technique works, and not with what it actually measures, in other words
the way it measures, and not what it measures. If our measure is
inaccurate, we cannot believe anything it tells us, and so before we can
talk about what is being measured, we have to show that we can trust
our measuring instrument.
The test itself. People may not understand all the items, or the design
of the item alternatives may be poor.
Test administration. Failure to adhere strictly to time limits, noise or
other distractions, or a poor rapport between administrator and
respondent can affect results.
Test scoring. Strict or lenient markers, and poor scoring and data
capture procedures can cause errors.
Test takers. The respondents may not be fluent enough in the
language. Their mood can also have an influence on their responses.
Obviously, the more uniform and standardised the assessment process is,
the less likely it is that a significant random error component will creep in. In other words, the
more standardised the assessment process and the scoring of the
assessment is, the more reliable the technique will be. (See section 4.2
for further discussion.)
SEM = st √(1 − rtt)

where
st = the standard deviation of the population*
rtt = the reliability coefficient
What this means is that if a person gets a score of 40, and the SEM is 3,
then there is a 68 per cent chance that his true score will be between 37
and 43 (40 ± 3), and a 95 per cent chance that his true score will be
between 34 and 46 (40 ± 6).
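The worked example above can be reproduced with a small sketch of the SEM formula, SEM = st√(1 − rtt). The standard deviation (10) and reliability (0,91) are illustrative values chosen so that the SEM comes out at 3, as in the text.

```python
import math

st = 10.0    # standard deviation of the scores (illustrative)
rtt = 0.91   # reliability coefficient (illustrative)

sem = st * math.sqrt(1 - rtt)  # ≈ 3
score = 40

# 68% band: score ± 1 SEM; 95% band: score ± 2 SEM
print(round(score - sem), round(score + sem))          # 37 43
print(round(score - 2 * sem), round(score + 2 * sem))  # 34 46
```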
Of course, we know that people are not like pieces of wood and that
various factors are likely to interfere with and influence the score
obtained at both T1 and T2. We will consider these later. However, it is
important to note that some assessment techniques are less influenced by
these various factors and will tend to have larger coefficients of stability
than others. These techniques are said to be robust: they are not easily
influenced by extraneous factors. Equally important is that the
coefficient of stability of every assessment technique used needs to be
known as it is a crucial element in judging the value of any assessment
procedure. If we get two different results when we measure the same
phenomenon at T1 and T2, we cannot say whether the result at T1 is
correct or whether the result obtained at T2 is correct. In fact we cannot
say whether either of them is correct – they could both be wrong and
some other value correct. If a technique’s test–retest reliability is low,
we cannot trust any result obtained.
What are some of the factors that make it difficult to obtain high levels
of test–retest reliability?
Firstly, the conditions under which the assessment is carried out may
differ from T1 to T2. However, the greater the standardisation* of the
assessment procedures, the lower will be the impact or effect of these
variations.
Each version of the test needs to be tested for reliability and validity.
However, because the different versions of the test have much in
common, they can be validated relatively cheaply compared to
validating only one version of the test. Roughly, if the first version of a
test costs R100 to construct and validate, the second version will cost
about R30 to produce, and a third version only R10 more.
We still need to show that the different versions of the test are as close
to identical as possible. To do this, we administer the two tests to a
group of people as close as possible in time and then correlate the two
sets of scores. In practice, the two tests are often given one after the
other, in order to limit the effects of extraneous environmental factors.
Clearly, if the tests are demanding, an appropriate period of rest between
the two test sessions may be required. We acknowledge that there may
well be some transfer of knowledge from the first to the second version
of the test. To minimise this effect, a common tactic is for half the
sample to be given version A first, followed by version B, while the
other half of the sample group is given version B first and then version
A. In this way, any transfer effect is limited.
To ensure that the two versions of the test are at the same difficulty
level, we use a common-person research design*. This involves
having about 20 per cent of the items common to both tests and having
the same group of persons take both versions of the test. Because we
assume that the common items have the same difficulty level
irrespective of the version of the test, common-item equating*
becomes appropriate. Common-item equating assumes that any
differences in total test scores can be attributed to the difficulty of the
other items in the two tests. Since the persons are assumed to have the
same ability regardless of which test they take, the scores on the more
difficult test may have a constant added to make them equal to
equivalent scores on the easier test by adjusting the total scores based on
the differences of performance on the common items. (See, for example,
Masters, 1985.)
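A much-simplified sketch of mean equating on the common items is given below: the group's mean performance on the anchor items in each version estimates the difficulty difference, and that constant is added to the scores on the harder version. All the numbers are invented for illustration, and real equating designs (see Masters, 1985) are considerably more sophisticated.

```python
from statistics import mean

# Mean performance of the group on the common (anchor) items as they
# appeared in each version (invented data)
anchor_mean_A = mean([8, 7, 9, 6, 8])   # 7.6
anchor_mean_B = mean([7, 6, 8, 5, 7])   # 6.6

shift = anchor_mean_A - anchor_mean_B   # version B looks about 1 point harder

# Total scores on version B are adjusted upward by the shift so that they
# can be read on version A's scale
totals_B = [34, 40, 29]
equated = [round(t + shift, 1) for t in totals_B]
print(equated)  # [35.0, 41.0, 30.0]
```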
Once we have obtained the two (or more) sets of results, they are
correlated with each other to obtain a coefficient of equivalence*
between the different versions. In general terms, correlation coefficients
above 0,90 are regarded as acceptable indicators of equivalence. Most
reputable techniques for assessing general cognitive ability, achievement
and educational outcomes have alternate or parallel versions, and the
coefficients of equivalence between the different versions are reported.
This technique can thus generate any number of parallel or alternate test
versions and in this way makes the whole issue of creating hardcopy
alternate forms somewhat redundant.
rtt = [n / (n − 1)] × [1 − (Σpq / st²)]

where
rtt = reliability coefficient
n = number of items in the measure
st² = variance of total measure
p = proportion of respondents answering an item correctly
q = proportion answering it incorrectly (q = 1 − p)
α = [n / (n − 1)] × [1 − (Σsi² / st²)]

where
α = reliability coefficient
n = number of items in the measure
st² = variance of total measure
Σsi² = sum of the individual item variances

For the mathematically inclined, note that the KR20 is a special case of
Cronbach’s α in which Σsi² reduces to Σpq, since for a dichotomously scored
item the variance is pq.
If the average correlation between the various items is low, alpha will be
low. As the average inter-item correlation increases, Cronbach’s alpha
increases as well. In general, alpha scores above 0,7 are acceptable. (See
Kaplan & Saccuzzo, 2013, p. 124.)
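A minimal sketch of Cronbach's alpha, computed directly from the definition α = [n/(n − 1)][1 − Σsi²/st²], is given below; the response matrix is invented for illustration, and population variances are used throughout for consistency.

```python
from statistics import pvariance

def cronbach_alpha(data):
    """data: one list of item scores per respondent."""
    n = len(data[0])                               # number of items
    items = list(zip(*data))                       # responses grouped per item
    sum_item_vars = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in data])
    return (n / (n - 1)) * (1 - sum_item_vars / total_var)

scores = [  # invented responses: 4 people x 3 items
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
]
print(round(cronbach_alpha(scores), 2))  # 0.99
```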
In closing this chapter, we need to look at some of the factors that affect
the value of the reliability coefficient. Among the most important of these
are the following:
Power measures can be evaluated using all the reliability measures, but
speed measures should not be evaluated using Cronbach’s alpha or
KR20, because there is very little variance associated with each item.
Nunnally and Bernstein (1993, p. 351) also argue that the most
appropriate reliability approach for a speed measure is to split the
measure into two halves (halving the time as well). These two halves are
then treated as two alternate forms*, which are administered a short
time apart. As indicated earlier, the Spearman-Brown
correction needs to be applied. Nunnally and Bernstein (1993, p. 351)
also recommend that temporal stability* be checked by administering
the two halves some time apart – about two weeks.
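The Spearman-Brown step mentioned above can be sketched as follows: if r is the correlation between the two halves, the estimated full-length reliability is 2r/(1 + r).

```python
def spearman_brown(r_half: float) -> float:
    """Estimated full-length reliability from the half-test correlation."""
    return (2 * r_half) / (1 + r_half)

# e.g. a split-half correlation of 0.8 projects to a full-test reliability
print(round(spearman_brown(0.8), 2))  # 0.89
```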
Fewer items may be quicker and easier to administer, but the results are less
reliable – short measures are more likely to be quick and dirty than longer
measures. Of course, if a scale is too long, other sources of error such as fatigue
and loss of motivation begin to enter the picture.
4.5 Summary
Additional reading
Nunnally, J.C. & Bernstein, I.H. (1993) is a major theoretical text on psychometric
theory. Reliability is dealt with extensively in Chapter 7.
Short paragraphs
1. What is the relationship between the robustness and the sensitivity of a measure?
2. What is meant by the standard error of measurement?
Essays
1. Using the theory of measurement, discuss the concept of reliability and outline the
various forms it can take.
2. Briefly describe the factors that affect the reliability of a scale or measure, and
suggest ways of addressing these.
5 Validity
OBJECTIVES
5.1 Introduction
From this it follows that the more any assessment technique relies on
subjective evaluations, the less reliable it is and therefore the less valid it
will be.
How then can we tell if a technique is valid and that it is truly measuring
what it claims to be measuring?
There are three main forms of validity, all of which are important,
although they apply differently in different contexts and therefore
require different kinds of evidence. These are termed construct, content
and criterion-related validity*. (See, for example, McIntire & Miller,
2000, pp. 134–136.)
Be careful with the term “construct”. It has nothing to do with construction and the
way the measure was made. It refers to a theoretical idea. For example,
personality does not exist as an object – it is a theoretical construct.
In addition, we see that each of these three factors contributes strongly, weakly or
not at all to performance in each of the subject areas. In much the same way as
in the example, we can determine which factors relate or contribute to various
aspects of a phenomenon being assessed. Although the factor structure may vary
slightly between different samples (because of measurement error), factor
structures are relatively robust – they do not change all that easily.
The point is that if we subject our new assessment technique to a factor analysis
and obtain a factor structure that is similar to that found by other researchers
using other techniques, then we can be relatively certain that our assessment
technique is assessing a similar construct. If our factor structure is very different
from that obtained with other techniques, then our case for the construct
validity* of our assessment technique is weakened considerably.
It is for this reason that Nunnally and Bernstein (1993, p. 111) argue that “[f]actor
analysis is at the heart of the measurement of psychological constructs”.
We can also distinguish between an exploratory factor analysis* and a
confirmatory factor analysis*. With the former, we try to uncover the optimum
factor structure underlying our data. For example, in the Grade 12 results
example above, we suggest that there are three factors and we call our approach
exploratory because we explore the possibilities in our factor analysis. In a
confirmatory factor analysis, we rather ask whether the data is compatible with a
certain factor structure. Suppose previous research indicates that there are five
factors underlying scholastic success and not three as we suggest, we can then
determine whether or not a five-factor solution is possible. In other words, we
seek to confirm the existence of five factors, rather than simply ask how many
factors exist.
If we are able to show that the factor structure we get from our research
is the same as or similar to the factor structure obtained in other
research, then we can be sure that our assessment measure is measuring
the same constructs as those of the other researchers. This is an
indication of the construct validity of our measure.
There are two further forms of validity worth a mention, namely face
validity* and ecological validity*, discussed below.
Table 5.1 summarises the various forms of validity and the questions
they seek to answer.
Type: Construct (theoretical)
Purpose: Is the measure theoretically sound?
Forms and questions asked:
– Convergent: Does the measure correlate with similar measures?
– Discriminant: This measure must not correlate with other measures to which it is not related.
– Factor analysis: Is the factor structure of the measure similar to the factor structures of other measures of the phenomenon?
– Maturational/developmental sequencing: Does the measure reflect the known developmental and maturational sequence described by theory?

Type: Content
Purpose: Does the measure accurately reflect the content of the domain that is being assessed?
Forms and questions asked:
– Content validity: Are the items representative of the domain under investigation?
– Face validity: Do the test items appear to be appropriate for the test’s purpose?

Type: Criterion-related
Purpose: Does the test correlate with external criteria such as job success, pass rates, etc.?
Forms and questions asked:
– Concurrent: Does the test result correctly identify groups that are known to differ on the characteristic being assessed (e.g. good vs poor typists)?
– Predictive: Does the test successfully predict who will show the characteristic being assessed at some time in the future?

Type: Ecological
Purpose: Is the test fair and useful in other situations?
Question asked: Are the items and the test as a whole relevant and meaningful in situations outside the test situation?
Although nine per cent does not sound like a great deal, it can be important.
Imagine if an employer could improve the productivity of his workforce by nine per
cent using proper selection methods. Similarly, if ten per cent fewer marriages
ended in divorce, or ten per cent fewer people died in car accidents, it would be a
meaningful contribution.
To explore this matter further, the instructions and the first 52 items
were submitted to a simple grammar check on the computer, which
returned a Flesch-Kincaid grade level* of 6,9. Flesch-Kincaid is a
formula based on sentence length and the number of syllables per word,
and is designed to measure the complexity level of the language used.
The grade levels are roughly equivalent to US school grades. When the
language used in the SA92 was analysed using the grammar check of
MS Word, it was found to have an English reading level of about Grade
7. This meant that the Honours students were unable to comprehend
English that the majority of English first-language speakers in Grade 7
could be expected to understand.
The Flesch-Kincaid grade level rates text based on US school grade levels. For
example, a score of 8,0 means that an eighth grader can understand the material.
The formula for the Flesch-Kincaid grade level score is

grade level = (0,39 × ASL) + (11,8 × ASW) − 15,59

where
ASL = average sentence length (the number of words divided by the number of sentences)
ASW = average number of syllables per word (the number of syllables divided by the number of words).
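The formula can be sketched directly; the counts in the example are invented, but the coefficients (0,39, 11,8 and 15,59) are the standard Flesch-Kincaid constants.

```python
def fk_grade(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Flesch-Kincaid grade level: 0.39*ASL + 11.8*ASW - 15.59."""
    asl = n_words / n_sentences   # average sentence length
    asw = n_syllables / n_words   # average syllables per word
    return 0.39 * asl + 11.8 * asw - 15.59

# e.g. a passage of 130 words in 10 sentences with 190 syllables (invented)
print(round(fk_grade(130, 10, 190), 1))  # 6.7
```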
One of the problems with these findings is that the synonyms used in
this study were obtained from American dictionaries. Another is that
there is no indication how the Flesch-Kincaid grade levels relate to
South African school grades. However, the differences in the SA92
scores do not appear to be caused primarily by language ability, but
rather by the socialisation practices of the groups involved. Research by
Shuttleworth-Jordan (1996) and others shows that as educational and
background factors become more equal, language-based differences in
test outcomes begin to disappear.
People who have been injured in motor vehicle accidents (MVAs) or industrial
accidents often stand to be paid large amounts of money by insurance companies
for the loss of amenities and future earnings. The temptation to overstate the
nature and extent of the damage is thus high. Psychologists investigating these
cases therefore have to guard against the possibility of malingering.
Many scales, especially those probing cognitive function or brain injury,
have items that may seem plausible to someone trying to fake bad, but,
in fact, describe what rarely occurs in practice. For obvious reasons,
these are not described here.
5.7 Summary
Additional reading
Chapter 6 in Cohen, R.J. & Swerdlik, M.E. (2002). Psychological testing and
assessment: An introduction to tests and measurement gives a good account of the
theories surrounding the concept of validity, especially in relation to culture.
For a clear explanation of factor analysis and the use of confirmatory factor analysis in a
validation study, see pages 177 to 179 in McIntire, S.A. & Miller, L.A. (2000).
Foundations of psychological testing.
Short paragraphs
1. Using the theory of measurement, briefly describe the relationship between reliability
and validity.
2. Name the various forms of validity, going into detail on at least two of them.
3. Describe what is meant by demand characteristics and give five examples of these.
Essays
OBJECTIVES
6.1 Introduction
This does not mean that a person with a score of 33 who is highly
motivated will not do better than predicted. Similarly, there is no
guarantee that a person with a Swedish formula score above 44 will not
fail physics or any other subject if he does not apply himself properly.
However, the table has been built up over many years and gives us a
relatively good idea of what can be expected. (Although this approach is
no longer used by the educational authorities in South Africa, it does
illustrate how expectancy tables are used.)
In the US many test norms are reported in terms of age equivalents* (or
age norms*) and/or grade equivalents* (or grade norms*). If we use
age norms as an example and say we have a score of 23 on the Ravens
Standard Progressive Matrices, it would (hypothetically) be reported as
an age equivalent of eight years two months and a grade equivalent of
Grade 2,4. In other words, the majority of people aged eight years and
two months, and the majority of people who are in Grade 2 and have been
there for four months since promotion from a lower grade, would be
expected to score 23 on this test. (Look at the fictitious promotional
material in Exhibit 7.1 on page 95 to see how this is reported in reality.)
You can see that these two norms are very similar to Binet’s notion of
mental age*. Many of these issues are discussed in McIntire and Miller
(2000) especially Chapter 5. (See also Anastasi & Urbina, 1997.)
6.1.5 Self-referencing*
The final approach to interpreting an assessment score is to see how the
score relates to a similar assessment made earlier. Has the person or the
system improved, remained the same or deteriorated? If they have
improved or deteriorated, to what extent? Has the depressed person
become less depressed after the treatment? Is the improvement greater
with medication 2 or dose 2 than with medication 1 or dose 1? Does the
person smoke fewer cigarettes or type faster today than he did
yesterday? These are all self-referential questions.
6.2 Norms
Note: The various scales above are not exactly to scale, although the general
picture can easily be seen.
A norm table enables test scores to be grouped so that the assessor can
interpret a particular score. For example, a score of 23 may represent
stanine 3, an age equivalent of 14 years 8 months, or a grade equivalent
of 5,6. Where do these figures come from?
These norm tables are usually published in the test’s technical manual.
When the same test is used on a variety of samples, several norm tables
may be collected and kept in a norm book. Because the general
education and experience of a workforce changes over time, it is vital
that norms be updated every five years or so. Table 6.2 is an example of
a typical norm table, showing some of the more common indices.
From this norm table, we can see that a person with a raw score* of 20
has a T-score* of 61, a percentile rank of 87,27, a stanine of 7 and a sten
of 8.
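In practice, reading off a norm table amounts to a simple lookup. The sketch below uses the row quoted in the text (raw score 20 → T-score 61, percentile 87,27, stanine 7, sten 8); the neighbouring rows are invented for illustration.

```python
norm_table = {
    # raw score: (T-score, percentile rank, stanine, sten)
    19: (58, 82.50, 6, 7),  # invented values
    20: (61, 87.27, 7, 8),  # row quoted in the text
    21: (63, 90.10, 7, 8),  # invented values
}

t_score, percentile, stanine, sten = norm_table[20]
print(t_score, percentile, stanine, sten)  # 61 87.27 7 8
```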
6.4 Norm groups*
It should be obvious that the nature of the norm group is vital. For
example, if we wish to evaluate a school leaver to see if he has the
ability to work as a clerk, then we must compare his score with that of a
group of school leavers who have succeeded in a clerical field. If we
wish to see whether an elderly patient is suffering from Alzheimer’s
disease, we need to compare his scores on a particular measure against a
norm sample of approximately the same age. If we know there may be
gender differences, then we need to make sure that the norm sample uses
people of the same gender as the subject.
Table 6.3 shows that a person scoring 18 would be below the cut-off
score (and hence not selected) if he were from group A, but would be
above the cut-off point (and thus selected) if he were from group B.
Furthermore, the person from group A would need to score at least 20 to
be selected, even though this would put him at stanine 7 (well above
average) if he were from group B.
So far it may appear as though there is only one score which needs to be
interpreted. However, in many instances, a person is assessed on several
measures as part of the triangulation process and to get as many
viewpoints as possible. In a few cases, people who have been assessed
by different instruments meet the criterion (i.e. score above the cut-off)
on all measures. In most cases this does not happen – participants
usually score above the cut-off point on some measures and below on
others. The problem then is to decide whether the person passes or fails
(is accepted or rejected) overall. Even where a person does meet all the
minimum requirements, there are many situations (such as selection)
where we are interested in choosing the best person rather than one who
meets the minimum requirements.
In each of these cases, we are faced with a person who meets the criteria
on some measures and falls below the cut-off point on others. How do
we combine these results to come to a decision? Below we use a simple
example of a person who has written three different examinations and
has passed two and failed one. We also assume that the examination
results in this chapter are given as stanine scores, because they are the
easiest to understand, although the general principles apply to all forms
of scores. Also, to make it easier to understand, we assume that the cut-
off score (the pass mark) is stanine 5 – scores of stanine 5 and above
meet the criterion (they pass) and scores below stanine 5 do not meet the
criterion (they fail). Although different cut-off scores can be set, we will
work with stanine 5 in this chapter.
Subject        Stanine
English        5
Mathematics    4
Biology        6
Total          15
Divide by 3
Final score    5,0
The final score is 5,0, which meets the cut-off score of stanine 5, so the
person passes.
In this case, the final score is 4,9, which is below 5, and so the person
fails.
This is similar to a round robin or pool system in which, despite a loss in one
round, the person or team may still win. It is quite different from a knockout or
multiple-hurdle situation where loss in one round means the player or team is
eliminated from the contest.
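The two ways of combining scores described above – averaging (compensatory) versus requiring every hurdle to be cleared – can be sketched as follows. The function names are illustrative; the subject names and stanine scores are the worked example from this section.

```python
def compensatory(stanines, cut_off=5):
    """Average the stanine scores; pass if the mean meets the cut-off.
    A strong score can compensate for a weak one."""
    mean = sum(stanines.values()) / len(stanines)
    return mean >= cut_off

def multiple_hurdle(stanines, cut_off=5):
    """Every single score must meet the cut-off; one failure eliminates."""
    return all(score >= cut_off for score in stanines.values())

# Worked example from the text: two passes and one failure
results = {"English": 5, "Mathematics": 4, "Biology": 6}

print(compensatory(results))     # mean = 15 / 3 = 5,0 meets the cut-off: True
print(multiple_hurdle(results))  # Mathematics (4) falls below: False
```

Note that the same set of marks passes under the compensatory rule but fails under the multiple-hurdle rule, which is the point of the round-robin versus knockout analogy above.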
In each case we can plot the predictor scores against the criterion scores.
In this process we will get the typical oval shape of the distribution
score. When we correlate the predictor and criterion scores, we get a
value between −1,0 and +1,0. The closer the correlation is to 1, the
flatter the oval and the better the validity will be. The nearer the
correlation is to 0, the rounder the oval and the lower the validity will
be. This is shown in Figure 6.4.
Firstly, we see that as the oval gets thinner (and begins to look like a
straight line – a perfect positive correlation), quadrants C and D
shrink. In other words, as the measure increases in validity, the risk of
making false predictions decreases. (See Figure 6.4a.)
Secondly, we can also see that if the cut-off point increases (the vertical
line moves to the right), the D quadrant shrinks and will eventually
disappear altogether. In other words, by raising the cut-off score, the
chance of selecting false positives decreases. However, as D is
shrinking, the C quadrant is expanding. Therefore, although raising the
cut-off point decreases the number of false positives, it increases the
number of false negatives.
Similarly, if we reduce the cut-off point (i.e. move the vertical line left),
the size of quadrant C shrinks and can disappear completely. In other
words, lowering the cut-off point ensures that everybody who has a
chance of succeeding is given that chance – there are no false negatives.
However, the cost of this tactic is to increase the size of the D quadrant,
the false positives. In other words, by lowering the cut-off point, more
people succeed, but more people also fail. The exclusion of potentially
successful cases (false negatives) is known as a Type 1 error*, while the
inclusion of cases that go on to fail (false positives) is known as a Type 2
error*.
Using the four categories of true negatives, true positives, false positives
and false negatives enables us to derive a number of different indices.
For example, if we take the number of positives (B + D) and divide it by
the total number of cases (A + B + C + D), we get the selection ratio*.
If we have a low selection ratio (i.e. if we have to choose relatively few
people from a large pool), we can raise the cut-off to get the very best
candidates. If the selection ratio is relatively high (i.e. we have to choose
a relatively large number from a relatively small pool), we can lower our
cut-off point to expand our pool to include as many people as possible.
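The effect of moving the cut-off on the four quadrants, and the selection ratio derived from them, can be sketched numerically. The quadrant labels follow the text (B = true positives, D = false positives, C = false negatives, A = true negatives); the data points are hypothetical.

```python
def quadrants(scores, predictor_cut, criterion_cut):
    """Classify each (predictor, criterion) pair into the four quadrants."""
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for x, y in scores:
        selected = x >= predictor_cut
        succeeds = y >= criterion_cut
        if selected and succeeds:
            counts["B"] += 1      # true positive
        elif selected and not succeeds:
            counts["D"] += 1      # false positive
        elif not selected and succeeds:
            counts["C"] += 1      # false negative
        else:
            counts["A"] += 1      # true negative
    return counts

def selection_ratio(counts):
    """Positives (B + D) divided by all cases (A + B + C + D)."""
    return (counts["B"] + counts["D"]) / sum(counts.values())

# Hypothetical (predictor, criterion) pairs
data = [(3, 4), (4, 6), (5, 3), (5, 6), (6, 5), (7, 4), (7, 7), (8, 8)]

low = quadrants(data, predictor_cut=5, criterion_cut=5)
high = quadrants(data, predictor_cut=7, criterion_cut=5)
print(low, selection_ratio(low))    # {'A': 1, 'B': 4, 'C': 1, 'D': 2} 0.75
print(high, selection_ratio(high))  # {'A': 2, 'B': 2, 'C': 3, 'D': 1} 0.375
```

Raising the cut-off from 5 to 7 shrinks D (2 to 1) but expands C (1 to 3), exactly the trade-off between false positives and false negatives described above.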
One of the problems that often confronts people who have to make
decisions based on assessment scores is the fact that the candidates
sometimes write slightly different versions of the same test, with one
version (X) being more difficult than another (Y). As a result, there must
be ways to equate the results of the two versions of the test. The simplest
method is to use different norm tables and work out where the
candidates score in terms of these separate norms. For example, person
A may be at stanine 6 on test Y and person B at stanine 7 of test X, even
though the raw scores suggest that person B scored lower than person A,
simply because B was assessed on the more difficult version of the test.
This issue is discussed in Chapter 4 (section 4.3.2) where we look at
how to establish the equivalence of parallel forms of a test.
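The norm-table approach to equating can be sketched as a simple lookup. The norm tables below are entirely hypothetical; each entry gives the lowest raw score that earns a stanine band, and version X is the more difficult form, so lower raw scores earn higher stanines on it.

```python
# Hypothetical norm tables: (lowest raw score in band, stanine)
NORMS_X = [(0, 1), (5, 2), (9, 3), (13, 4), (17, 5), (21, 6), (25, 7), (29, 8), (33, 9)]
NORMS_Y = [(0, 1), (8, 2), (13, 3), (18, 4), (23, 5), (28, 6), (33, 7), (38, 8), (43, 9)]

def raw_to_stanine(raw, norms):
    """Look up the stanine band that a raw score falls into."""
    stanine = 1
    for lowest, band in norms:
        if raw >= lowest:
            stanine = band
    return stanine

# Person A wrote the easier version Y; person B wrote the harder version X.
print(raw_to_stanine(30, NORMS_Y))  # person A: stanine 6
print(raw_to_stanine(26, NORMS_X))  # person B: stanine 7, despite the lower raw score
```

As in the example in the text, person B's raw score (26) is lower than person A's (30), yet B's standing is higher, simply because B was assessed on the more difficult version.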
6.7 Summary
Psychometric assessment
In order to assess his capabilities and general suitability for the types of job
outlined above, Sipho was given a number of psychometric tests to assess his
cognitive and psychological functioning in line with the requirements of such
positions. These are described below.
Score    Stanine
0–7      1
8–12     2
13–19    3
Cancellation task
This is a measure of perceptual speed, which is the ability to recognise simple
patterns quickly (e.g. 7) in a string of numbers (e.g. 5 3 7 8 9 5 8 4 7 3 2 3). This
is a general indicator of cognitive functioning and a crucial ability for clerical tasks
where specific information needs to be identified (see Appendix 1). On this test,
Sipho scored at stanine 5 level. Appropriate norms, based on railway clerks and
obtained in 1993, were used. This suggests that he has a fair ability to identify
simple patterns or figures in a complex matrix and would be able to carry out
simple vigilance tasks.
Continuous adding
In order to test his endurance and perseverance in the cognitive domain, he was
given a simple adding task in which his performance was monitored over an
extended period (a so-called Pauli test). This test involves a simple adding task in
which a series of single digits are added over an extended time period (e.g. 4 + 3,
3 + 6, 6 + 2, 2 + 9, 9 + 5, etc.). The test taker’s performance is monitored over
the full period, with a line being drawn every minute to indicate the progress
made to that point. Any fall-off or deterioration in concentration and accuracy is
thus easily identified in terms of both the number of additions completed during
the period and the accuracy of the addition during the period.
On this test, which lasted for 20 minutes, Sipho worked at a steady rate and
managed to complete between 13 and 18 additions per minute, with no apparent
deterioration in performance. It must be pointed out that after the sixth minute, he
complained that his hand was getting tired. At this point, the tester took over the
task of writing down his answers to the additions. In this way, the fall-off of
performance that would have been associated with muscular fatigue resulting
from his using his non-dominant hand was avoided. The maintenance of his
concentration and steady information processing over the full 20-minute period
does not suggest any major impairment of cognitive faculties.
Numeric calculations
A test involving simple arithmetic operations was administered, as these are
generally required for many clerical and cashier-type jobs. On the test
administered, Sipho attempted only 17 of the 30 items, of which only four were
correct. This is a very low score (stanine 1), based on a norm group of railway
clerks obtained in 1993. These results are not surprising in the light of his poor
pre-accident scholastic record. Clearly, he is unable to carry out arithmetic
calculations at any advanced levels. However, this finding does not suggest any
significant measurable deterioration of his cognitive functioning as a direct result
of the accident.
Additional assessments
In addition to these cognitive tests, it was decided to examine him further to see
whether his cognitive functioning could have been influenced by a poor self-
image and the presence of post-traumatic stress symptoms, and if so, to what
extent. To this end, two further scales were administered, namely a Cognitive
Distortion Scale, and a Trauma Symptom Inventory.
will be most helpful when assessing individuals who are not likely to
misrepresent themselves for primary or secondary gain. In forensic settings or
instances where symptom misrepresentation is a significant possibility, the
CDS should be co-administered with at least one test that has validity scales.
“Validity scales” refers to some kind of in-built accuracy test or lie-detector scale.
On each of the ten clinical subscales of the TSI, Sipho scored in the clinically
significant range above T-score 70. Most of his scores were closer to 90 and 100,
which are extreme scores. This suggests that he is suffering from high levels of
post-traumatic stress and associated levels of depression. However, it is
important to note that he obtained a T-score of 100 on the Atypical response
validity scale and 70 on the Inconsistency scale. These are American norms, as
the measure has not been validated or normed in South Africa. There are no
equivalent measures available locally.
These high scores, especially the extreme Atypical response score, suggest that
his responses on both the TSI and the CDS, as well as the poor scores on the
other scales administered to him, may well be deliberate distortions. This could
reflect an attempt to mislead or to overstate his symptoms. This then brings the
validity of all previous tests into question.
1. Based on these results, do you think Sipho has suffered irreversible
brain damage?
2. To what extent do you think the accident has impaired his
educational prospects?
3. To what extent do you think the accident has impaired his
occupational prospects?
4. From the results on the CDS and TSI, do you think Sipho was
malingering and trying to fake his condition in order to get a good
insurance payout?
Give evidence from the test results to support your answers to these four
questions.
Additional reading
Short paragraphs
Essay
Using the decision-making matrix, show the effects of raising or lowering the predictor
cut-off score on the four categories of outcomes (true positives, true negatives, false
positives and false negatives).
7 Fairness in assessment
OBJECTIVES
7.1 Introduction
7.1.2.1 Individuals
In the case of educational and clinical assessments, an accurate
assessment of a person’s psychological functioning may be crucial in
diagnosing problem areas and suggesting possible interventions and/or
treatments. In the case of employees or potential employees, a fair
assessment is crucial as it may affect employment or promotion
opportunities.
In this regard, we must recognise that fairness is a relative concept and that it
may be impossible to be fair to both the previously advantaged and the previously
disadvantaged groups at the same time. To be fair to the previously
disadvantaged, we may have to select people with lower assessment scores
ahead of those with higher ones. This is unfair to the latter. However, to ignore
past injustices by choosing the people with the higher scores will be unfair to
those who were previously discriminated against.
7.1.2.4 Organisations
Business organisations exist to make money, and if people are
erroneously appointed or not appointed because of problems with the
assessment process, businesses stand to lose money and other
opportunities. Such errors can also result in increased levels of labour
unrest.
7.1.3 Discrimination
Discrimination means treating some people differently from others.
Although this is generally seen in terms of group membership of some
kind (e.g. language, gender, class, race, religion, etc.), this is not
necessarily the case. Some parents favour one child and discriminate
against another. The word “discriminate” means “to choose between”
and does not always have negative connotations. For example, when we
shop, we discriminate between brands of toothpaste, choosing one rather
than another. However, when our choice is based on factors not related
to the task involved, we discriminate against the object or person. When
a lecturer or human resources practitioner sets a test or examination, he
discriminates (distinguishes) between people (by taking such aspects as
knowledge or job-relevant criteria into account). However, if the lecturer
or human resources practitioner were to take non-job-relevant criteria
(such as age, gender, ethnicity) into account, he would be discriminating
against the persons concerned.
In a landmark case in the US, it was argued that fire-fighters had to be a certain
height to allow them to work effectively with the fire-fighting equipment. Women’s
groups complained, arguing that this then excluded women from being fire-
fighters by reason of adverse impact. They won the case and this selection
criterion was dropped.
7.1.3.3 Pre-market discrimination
If some people are prevented from gaining the required skills or
experience before they get into the market, for example if mathematics
is required for many jobs, and girls are not taught this at school, then
they will be victims of pre-market discrimination*. One of the major
impacts of the apartheid system was the systematic denial of quality
educational opportunities to black children (and people) throughout the
country, the impact of which is still being felt today. In this regard,
Theron (2007, p. 183) shows that
The amendment also allows employees to refer any dispute in this regard to the
CCMA for arbitration (Section 10, paras (a), (b) and (c)).
While the concept of psychic unity may hold true at the physiological
level (if we ignore racial characteristics such as colour, hair type, etc.), it
does not appear to hold at the level of psychological characteristics.
        Strength                               Weakness
Etic    Makes intergroup comparison possible   May not be equally valid for all groups
Emic    Increases validity for the group       Makes intergroup comparison difficult
As you can see, the two groups have the same total scores, even though
only two items overlap (4 and 10). We would therefore be tempted to
argue that the scale assesses more or less the same set of factors or
constructs in both groups. However, the item analysis shows that this is
not the case. Therefore in analysing the results, we need to examine the
nature of the items that differ between the groups. Items that function
differently with different groups can be modified or eliminated. At the
same time, it should be noted that research tends to show that the
removal of biased items does not make a great deal of difference to
group norms – the stronger group tends to remain strong and the weaker
one to continue to score below the other group(s). This is because the
items that appear to be biased tend to be the easier ones, so that
removing these items from the measure only serves to make it more
difficult for everyone, and especially for the culturally different or
minority group (see Kaplan & Saccuzzo, 2009).
y = mx + c
In Figure 7.3b we see that the two scatter plots do not coincide: one
group is lower than the other. However, because the regression line and
the intercept are the same for both groups (i.e. there are two sausages on
one stick), we can conclude that the assessment is fair, even though the
one group scores lower than the other. Having the same regression line
and intercept (i.e. stick) means that the group scoring lower on the x-
variable (the predictor) also scores lower on the y-variable (the
criterion).
In Figure 7.3c the two scatter plots and the intercept are different, but
the slope of the two regression lines is identical. In this case, the
assessment scores of the one group are lower than the other group’s.
This indicates that the assessment results are equally valid, but that the
scores of the second group underestimate the ability level of the people
in the group. In this case, merely adding a constant score to the lower
group’s score will raise both the scatter plot and the regression line to
match the situation in Figure 7.3a. The value of this constant is the
difference between the two intercepts. (The dotted line shows the
regression line and intercept for the combined group.)
Finally, in Figure 7.3d, the two scatter plots, the regression lines and the
intercept are all different and do not coincide, and we can conclude that
the measure will be unfair and will almost certainly be biased against
one group. It will therefore be less valid for use in making decisions
about the second group.
Any assessment technique or test where there are two distinct sausages and two
separate sticks is unfair using the Cleary model. This approach to determining
fairness is named after Anne Cleary who first proposed it in 1968. (See, for
example, Cleary et al., 1975.)
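The Cleary comparison can be sketched numerically: fit a separate regression line y = mx + c for each group and check whether the slopes and intercepts coincide. The data, function names and tolerance below are hypothetical illustrations, not part of the model itself.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = mx + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    return m, mean_y - m * mean_x

def cleary_fair(group_1, group_2, tolerance=0.1):
    """Under the Cleary model the assessment is fair when both groups share
    (approximately) the same regression line: same slope and same intercept."""
    m1, c1 = fit_line(*group_1)
    m2, c2 = fit_line(*group_2)
    return abs(m1 - m2) < tolerance and abs(c1 - c2) < tolerance

# Hypothetical (predictor scores, criterion scores) for three groups
group_a = ([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # lies on y = 2x
group_b = ([2, 3, 4, 5, 6], [4, 6, 8, 10, 12])  # same line, lower x-range (Fig. 7.3b)
group_c = ([1, 2, 3, 4, 5], [5, 7, 9, 11, 13])  # y = 2x + 3: different intercept (7.3c)

print(cleary_fair(group_a, group_b))  # True  – one "stick", fair
print(cleary_fair(group_a, group_c))  # False – same slope, different intercept
```

Groups a and b correspond to two sausages on one stick (fair despite different score ranges), while group c reproduces the equal-slope, different-intercept case where adding a constant to the lower group's scores would restore fairness.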
Firstly, there is the equal probability model put forward by Linn (1973),
which argues that different cut-off points for the different groups should
be chosen in such a way that the success rate (B/(B + D)) remains
constant for the two groups.
The second model was put forward by Cole (1973), and is known as the
conditional probability model. This model advocates selecting different
cut-off scores in such a way that the ratio of true positive to total success
(B/(C + B)) is constant for both groups.
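The two indices behind these models can be computed directly from the quadrant counts. The counts below are hypothetical, chosen to show that the two models can reach different verdicts about the same cut-offs.

```python
def success_rate(b, d):
    """Linn's equal probability model: B / (B + D),
    the proportion of those selected who actually succeed."""
    return b / (b + d)

def conditional_rate(b, c):
    """Cole's conditional probability model: B / (B + C),
    the proportion of potential successes who are actually selected."""
    return b / (b + c)

# Hypothetical quadrant counts for two groups at their current cut-offs
group_1 = {"B": 40, "C": 10, "D": 10}
group_2 = {"B": 20, "C": 20, "D": 5}

print(success_rate(group_1["B"], group_1["D"]))      # 0.8
print(success_rate(group_2["B"], group_2["D"]))      # 0.8 – equal under Linn's model
print(conditional_rate(group_1["B"], group_1["C"]))  # 0.8
print(conditional_rate(group_2["B"], group_2["C"]))  # 0.5 – unequal under Cole's model
```

Here the same pair of cut-offs is fair on Linn's criterion (both groups' selectees succeed at the same rate) but unfair on Cole's (a smaller share of group 2's potential successes is selected), so the choice of model matters.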
Adopting a quota system may also set people up for failure in that less-
than-competent people are appointed and fail, doing damage to the token
appointee, to the organisation and to all those who have believed in them
and their success. This outcome is predicted from the decision-making
matrix, as a quota system effectively means that the cut-off score is
dropped for some groups, thereby swelling the false positive quadrant
(D) for this group. As argued in section 6.5.2.7 on page 69, if this
strategy is adopted for social or political reasons, steps must be taken to
ensure that in this group fewer people than predicted fail.
SEM = st × √(1 − rtt)

where
st = the standard deviation of the population*
rtt = the reliability coefficient
Because the scores are banded in this way, we can say that a group A
score of 21 can be regarded as being in the same band as a group B
score of 25, and a group A score of 19 can be regarded as being
equivalent to (in the same band as) a group B score of 23, and so on. As
long as we work down from the top of the group A scores, we can
regard all group B scores within 2 SEMs as equivalent. In this way, we
can slide a 2-SEM band down the two sets of scores and achieve our
balanced targets without violating the rights of either group A or group
B in a perfectly legitimate and scientific way.
Model                      Method                       Rationale                        Effect on minorities        Effect on organisation
No assessment              All comers selected on a     Because assessment methods       Number of minorities        Poor investment – as large
                           first-come, first-served     are unfair, applicants are       maximised; high failure     numbers fail, at great cost
                           basis                        allowed to demonstrate their     rates                       to the organisation
                                                        ability over time
Unqualified individualism  Top-down selection using a   Assessment methods are           Relatively few minority     Good performance achieved;
                           single norm                  equally valid for all groups     members are selected        not good for EE targets
                                                        and scores therefore
                                                        represent merit
Qualified individualism    Group membership (race,      Separate norms and/or            Minority groups well        Poorer achievement of
                           gender, language, etc.)      correction factors used to       represented                 objectives; EE targets more
                           taken into account           generate separate top-down                                   easily met
                                                        selections
Cleary regression model    Regression lines used to     This is fair because those       Minority groups well        Poorer achievement of
                           show fairness and to         with the highest predicted       represented                 objectives
                           calculate predicted          criterion score are selected
                           criterion scores; people
                           selected accordingly
Quota system               Appointment of minority      Because potential is equally     Best representation of      Poorer achievement of
                           groups proportional to       distributed across all groups    minority groups             objectives
                           their availability in the    (psychic unity), all groups
                           population (or within        must be proportionally
                           four-fifths thereof)         represented
From Table 7.4 we can see that different fairness models yield different
results, and that one cannot completely minimise adverse impact while
maximising job performance (as measured by the criterion-referenced
validity of the selection method). In other words, there has to be a trade-
off between equity and performance. To quote Hough and Oswald
(2000, p. 636): “A selection strategy aimed at minimising adverse
impact may differ somewhat from a selection strategy aimed at
maximising mean predicted performance.” In the US, reverse
discrimination court cases have concluded that race and other “job-
irrelevant class membership” cannot be used in making job-related
decisions. In South Africa, the Constitution and most current
employment legislation not only allow this, but make it compulsory.
In the early 1990s, a young industrial psychologist was asked to devise a way of
selecting apprentices at his place of work. He prepared a detailed proposal for his
manager, who rejected it because of the inherent inequalities in the system which
would exclude a large number of previously disadvantaged candidates. The
manager’s suggestion was that everybody who was interested enough to apply
for an apprenticeship, and who appeared to be suitable, should be admitted to the
programme. Those who could not make it should be allowed to fail and be
excluded from the training programme after six months or a year.
7.6.3 Observation
A third approach to assessing people fairly is to place them in a situation
(either a sample of the real situation or some kind of simulation) and
then to observe how they perform in it. As we see in Chapter 17, this is
one component of an assessment centre and has the decided advantage
of allowing the person to demonstrate many of his abilities and
competencies. It is, however, a complex and costly method, and should
therefore be used fairly late in the assessment process.
7.7 Summary
You, the reader, should at this point be able to define and explain every
term that is used in this promotional leaflet. If you are unable to do this,
go to the glossary of terms at the back of this book and then, if
necessary, revisit the relevant chapters.
Additional reading
Short paragraphs
1. Define fairness and list five kinds of evidence to show that an assessment is unfair to
certain groups.
2. Discuss what is meant by psychic unity and discuss this in relation to the emic and
etic approaches to assessment.
3. Briefly discuss the five approaches to fairness.
Essay
What are the advantages and disadvantages of using group-based norms for
interpreting assessment results, and what other approaches are there for ensuring
fair(er) assessment results?
Exhibit 7.1
The Adult Basic Competence Test – Version 4
(ABC4)
The Adult Basic Competence Test – Version 4 (ABC4) is the latest offering in a
test series first published in 1946. The various editions of the ABC have enjoyed
widespread use in a variety of settings as a measure of basic academic skills and
competencies necessary for effective learning, communication and thinking,
reading and spelling words, and performing basic mathematical calculations. The
ABC4 continues to measure these basic content areas, and preserves those
features that made the ABC3 and earlier versions so popular with users – ease of
administration and scoring, and the provision of a significant amount of
information gained through a relatively brief investment of testing time.
The interpretation of the ABC4 has been enhanced by the addition of grade-
based norms, thereby increasing the usefulness of the tests in Grades 0–12. The
age-based norms have been extended from 75 years in the third edition to 94
years in the fourth edition so that the basic literacy skills of older adults can now
be assessed.
The ABC4 is a norm-referenced test that measures the basic academic skills of
word reading, sentence comprehension, spelling words and mathematical
computation. It was standardised on a representative national sample of over
3000 individuals ranging in age from 5–94 years. Alternate forms, designated the
Pink and Green forms, were developed and equated during the standardisation
process using a common-person research design.
Derived scores were developed for both age- and grade-referenced groups.
Standard scores (T-scores), percentile ranks, stanines, normal curve equivalents
and grade equivalents are provided for both groups.
Although there is some evidence that the tests are somewhat sensitive to social
and cultural influences, in all but two cases the scores of the various tests were
no more than 1 stanine different from the overall finding, suggesting that the test
as a whole is relatively robust to sociocultural influences, while remaining
sensitive enough to be useful across the full sociocultural spectrum.
Reliability
Reliability evidence for the ABC4 is shown to be strong, and includes information
based on classical test reliability theory, such as internal consistency, alternate
form, test-retest (one day and three months), inter-scorer reliability and standard
error of measurement.
Validity
Validity studies include both exploratory and confirmatory factor analyses and
provide consistent support for the structure of the ABC4. These studies show
strong concurrent validity with other measures of academic ability and grade
examinations at all grade levels. It thus appears to be a good predictor of
academic achievement. In the workplace, there are moderate correlations (0,46–
0,63) with training outcomes and supervisors’ estimates of general intellectual
ability.
In addition, good discriminant validity evidence for the ABC4 is reported, in that
the ABC4 differentiates among individuals with mental retardation, learning
disabilities, speech and language impairments, and those individuals identified as
gifted.
8 Assessing in a multicultural
context
OBJECTIVES
8.1 Introduction
In the context of this book, most of this assessment will be for the
selection and placement purposes of people with limited English ability
and/or experience of English culture, such as with recent immigrant
populations, or where the assessments are used in transnational settings.
Various economic, political and social developments, both nationally
and internationally, in the past few decades have resulted in a great
increase in the need for, and interest in, cross-cultural assessment (Van
de Vijver, 2002). These trends include a more global economy and
increased labour migration, the internationalisation of education, and a
massive influx of political refugees into many European and other stable
countries, all of which have given impetus to the understanding of cross-
cultural interactions. According to a March 2000 report by the
International Labour Organization (ILO, 2000, p. 1)
These assessments need to be carried out in ways that are fair and
unbiased, irrespective of why they are carried out. As shown in Chapter
7, fairness is a special case of validity generalisation. (Are our measures
equally valid across different groups?) In much the same way, cross-
cultural fairness asks whether the measures are equally valid across
groups of people with different cultural backgrounds and linguistic
ability in the language of assessment. At the same time, as Coyne (2008)
has noted, “[e]qual opportunity laws in many countries will prohibit the
use of tests in a manner that discriminates unfairly against protected
groups of the population (such as gender, racial, age, disability, religion,
etc.).”
Perhaps the most widely cited definition of culture is that put forward by
Geert Hofstede (1991) who sees culture as “software of the mind” and
as “the collective programming of the mind which distinguishes the
members of one group or category of people from others” (p. 5).
                                                Cultural adaptation
                                                Do I want to establish good relations
                                                with the culture of destination?
Cultural maintenance                                  Yes             No
Do I want to maintain good relationships        Yes   Integration     Separation/segregation
with my culture of origin?                      No    Assimilation    Marginalisation
The first strategy put forward by Van de Vijver and Phalet (2004) is
integration, where characteristics of both cultures are maintained in a
process of biculturalism. They quote a number of research studies in
Belgium and the Netherlands (e.g. Phalet & Hagendoorn, 1996; Phalet,
Van Lotringen & Entzinger, 2000; Van de Vijver, Helms-Lorenz &
Feltzer, 1999), which consistently show a preference for this strategy,
namely that migrants want to combine their original culture with
elements of the mainstream culture.
The fourth (and in the view of Van de Vijver and Phalet (2004), the least
often observed) strategy is marginalisation, which involves the loss of
the original culture without establishing ties with the new culture. In
some countries youth, often second or third generation, show
marginalisation of this kind; they do not feel any attachment to the
parental culture nor do they want to establish strong ties with the host
culture (often they are prevented from identifying with the host culture
because of societal discrimination or other forms of exclusion). As
Denoso (2010) argues, in real life marginalisation is seen as a negative
outcome of the acculturation process, rather than as a conscious choice
by the people concerned.
(Note: These are not the actual questions used, but merely illustrate the
approach.)
Denoso (2010, p. 38) argues that the two- and four-question formats
successfully discriminate between the integration strategy, which is
generally considered to be more adaptive, and the other, less adaptive,
strategies (Arends-Tóth & Van de Vijver, 2003). On the other
hand, Rudmin and Ahmadzadeh (2001, cited by Denoso) have argued
that the marginalisation strategy was misconceived and incorrectly
operationalised during the test construction process. They argue that the
four-fold paradigm commits the Fundamental Attribution Error* by
presuming that acculturation outcomes are caused by the preferences of
the acculturating individuals rather than by the acculturation situations.
They further argue the general point that the four-question approach to
assessment in general has poor psychometric properties, in that the
questions are ipsative (i.e. they are positively correlated and thus not
independent of one another). (See section 3.6.6).
8.2.1 Apply
Firstly, instruments that have been developed in one particular social
context (essentially Western/Eurocentric) can simply be applied to all
groups across different sociocultural settings without checking the
meaningfulness and psychometric properties such as reliability and
validity of the instruments. This approach adopts an assumption of
universality, the view that these instruments retain their original
properties in the new setting. Personality questionnaires developed by
Eysenck are examples of instruments that have been translated and
validated in various cultural groups on the assumption that personality
structures and the items assessing each aspect are the same in all
cultures and contexts as has occurred with various personality scales
(e.g. Barrett, Petrides, Eysenck & Eysenck, 1998; Eysenck, Barrett &
Eysenck, 1985). In Chapter 7 (section 7.2.1), this approach is described
as an “etic” approach, and as Van de Vijver (2002, p. 545) notes, this is
a form of “blind” application of an instrument in a culture for which it
has not been designed, and is simply bad practice where there is no
concern for the applicability of the instrument nor its psychometric
properties in the new context.
8.2.2 Translate/adapt
Secondly, existing tests and measures can be adapted and translated into
the language of the target group. However, this goes beyond a literal and
even idiomatic translation in order to ensure the proper conceptual
translation of the test material. For example, the Minnesota Multiphasic
Personality Inventory (MMPI – a clinically oriented personality scale)
contains various implicit references to the American culture of the test
designers, and extensive adaptations to many items are required before
the scale can be used in other languages and cultures.
The third form of method bias arises from the manner in which the
assessment is administered (administration bias). Communication
problems between testers and testees (or interviewers and interviewees)
can easily occur, especially when they have different first languages and
cultural backgrounds (see Gass & Varonis, 1991). Interviewees’
insufficient knowledge of the testing language and inappropriate modes
of address or cultural norm violations on the part of the interviewer can
seriously endanger the collection of appropriate data, even in structured
interviews. One can see how computerised administration of a test
would affect computer-literate people and those with very little
computer experience quite differently.
Individual test items and the test as a whole should not vary in the levels
of difficulty or intensity when the groups are known to be similar.
Equivalence is thus achieved when the assessment behaves in a similar
way across cultures as shown by a pattern of high correlations with
related measures (convergent validity) and low correlations with
measures of other constructs (discriminant validity) as would be
expected from an instrument measuring a similar construct. If there are
major differences in the way in which the groups behave, or if there are
marked differences in the way in which the attributes occur, then
specifically designed measures need to be developed and tailored to
meet the demands of the cultural context. This means that at least some
items will be different in the two countries. This approach is consistent
with the “emic” approach.
An item shows DIF if individuals having the same ability, but from
different groups, do not have the same probability of getting the item
right (p. 110).
There are several ways in which item bias can be demonstrated. Some
are based on expert judgements which are based on inspection and back
translation, while others are based on various forms of statistical
analysis. The statistical techniques are divided into two main categories:
non-parametric methods developed for dichotomously scored items
using contingency tables, and parametric methods for test scores with
interval-scale properties based on the analysis of variance (ANOVA).
In back translation, the test is translated into the target language and then
it is re-translated by an independent expert back into the source
language. A panel of bilingual scholars then reviews the translated
version, which is translated back into the first language to monitor
retention of the original meaning. An independent back translation
means that “an original translation would render items from the original
version of the instrument to a second language, and a second translator –
one not familiar with the instrument – would translate the instrument
back into the original language” (Geisinger, 1994, p. 306). Once the
process is complete, the final back-translated version is compared to the
original version (Brislin, 1980; Hambleton, 1994). Finally, the translated
version of the assessment is "tried out" with a sample of participants and
refined in the light of this experience. This process can be repeated
several times. The simplicity of this option has led to its widespread use.
           Group A    Group B
Pass       58         23
Fail       42         77
Just by looking at this distribution, it is clear that the item is much easier
for members of Group A than it is for Group B. Clearly Item 1 is biased
against the members of Group B. However, inspection alone is not
sufficient, and so the Mantel-Haenszel (MH) chi-square statistic is used.
MH yields a chi-square test with one degree of freedom to test the null
hypothesis that there is no relation between group membership and test
performance on one item after controlling for ability as given by the
total test score. In other
words, an item is biased if there is a significant difference in the
proportions of each membership group achieving a correct or desired
response on each test item. Once the item has been examined in this
way, the next step is to compare the scores for Item 2 in exactly the
same way. This is continued for Item 3 and all other items, until they
have all been compared.
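The chi-square computation for the 2 × 2 table above can be sketched in Python. This is a minimal illustration that treats the percentages as counts from two groups of 100 test-takers; a full MH analysis would repeat the comparison within each total-score stratum, as described above.

```python
# Chi-square test of independence for the Item 1 pass/fail table,
# treating the percentages as counts from two groups of 100 test-takers.

def chi_square_2x2(table):
    """table = [[pass_A, pass_B], [fail_A, fail_B]]"""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

table = [[58, 23],   # Pass: Group A, Group B
         [42, 77]]   # Fail: Group A, Group B
chi2 = chi_square_2x2(table)
print(round(chi2, 2))  # 25.42 - well above the 3.84 critical value at df = 1
```

With one degree of freedom, any value above 3.84 is significant at the 5% level, so the apparent bias against Group B visible in the table is confirmed statistically.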
The method outlined above assumes that the amount of DIF is the same
across all members of Groups A and B, and that there is no interaction
between item difficulties for members with different levels of ability.
This assumption is termed uniform DIF, which exists when the
probability of answering an item correctly is greater for one group
consistently over all ability levels. In other words, uniform DIF occurs
when there is no interaction between ability level and group
membership. As Ekermans (2009) shows, uniform bias results from
differences in item difficulty as shown by differences in the regression
intercept of the observed item scores on the variable across different
sociocultural groups (the offset described in Chapter 3, section 3.6.3).
She argues further that if assumptions of scalar equivalence remain
untested, there is minimal impact on within-cultural group decisions.
This is because all scores will be affected in the same direction. At the
same time, in the absence of evidence of scalar invariance, between-
group differences may be incorrectly interpreted as showing real
differences between the groups (Cheung & Rensvold, 2002; Steenkamp
& Baumgartner, 1998; Van de Vijver & Tanzer, 2004). In the absence of
empirical evidence of metric equivalence of the measurements, the
validity of any findings about group differences on the attributes being
assessed, and of the subsequent practical implications of the results in
important areas of functioning, simply cannot be known.
Table 8.3 Interpretation of ICC properties for cognitive and personality measures

ICC property: Slope (commonly called the a-parameter in IRT)
– Cognitive, aptitude, achievement or knowledge test: Item discrimination – a flat ICC does not differentiate among test-takers
– Personality, social or attitude measures: Item discrimination – a flat ICC does not differentiate among test-takers

ICC property: Position along the X-axis (commonly called the b-parameter in IRT)
– Cognitive, aptitude, achievement or knowledge test: Item difficulty – the amount of a latent variable needed to get an item right
– Personality, social or attitude measures: Threshold – the amount of a latent variable needed to endorse the item

ICC property: Y-intercept (commonly called the c-parameter in IRT)
– Cognitive, aptitude, achievement or knowledge test: Guessing
– Personality, social or attitude measures: The likelihood of indiscriminate responding or socially desirable responses
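The three ICC properties in Table 8.3 correspond to the three parameters of the three-parameter logistic (3PL) model in IRT, in which the probability of a correct response at ability θ is P(θ) = c + (1 − c)/(1 + e^(−a(θ − b))). A minimal sketch, with illustrative parameter values chosen here for the example:

```python
from math import exp

def icc(theta, a, b, c):
    """Three-parameter logistic item characteristic curve:
    a = discrimination (slope), b = difficulty (position along the X-axis),
    c = guessing (the lower asymptote reached at very low ability)."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))

# Illustrative values: a moderately discriminating item (a = 1.2) of average
# difficulty (b = 0) on which low-ability test-takers can still guess (c = 0.2).
a, b, c = 1.2, 0.0, 0.2
print(round(icc(-3, a, b, c), 2))  # near the guessing floor of 0.2
print(round(icc(0, a, b, c), 2))   # 0.6 - halfway between c and 1 at theta = b
print(round(icc(3, a, b, c), 2))   # approaching 1 for high-ability test-takers
```

For a personality or attitude item, the same curve is read as the probability of endorsing the item, with b as the threshold and c as the likelihood of indiscriminate or socially desirable responding.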
Pedrajita and Talisayon (2009) point out that the common measure of
bias across these various approaches is the significance of the chi-square
value obtained. A significant chi-square value indicates: (1) a difference
in the proportion attaining a correct response across total score
categories for the chi-square procedure; (2) a difference in the
proportions selecting distracters for the DRA; (3) a difference in the
odds of getting an item right between the reference and focal groups
compared for the LR; and (4) a large DIF effect
for the MH statistic. They argue further that no one method is better than
any other, and that the argument for the presence of DIF is increased
when two, three or all of the four methods yield a statistically significant
chi-square value on an item or groups of items. They summarise the
various methods of DIF and their accompanying statistical analyses in
Table 8.4.
Table 8.4 DIF approaches and their accompanying statistical analyses

DIF approach: Chi-square
– Focus of analysis: Differences in proportions attaining a correct response across score categories
– Measure of bias: Significance of chi-square

DIF approach: Distracter Response Analysis
– Focus of analysis: Difference in proportions selecting distracters
– Measure of bias: Significance of chi-square

DIF approach: Logistic Regression
– Focus of analysis: Odds of getting the item right
– Measure of bias: Significance of chi-square

DIF approach: Mantel-Haenszel
– Focus of analysis: Performing chi-square statistical tests for DIF effect
– Measure of bias: Significance of chi-square
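The MH procedure summarised in Table 8.4 pools the 2 × 2 pass/fail tables across total-score strata into a common odds ratio; the widely used ETS convention expresses this on the delta scale as −2.35 ln(α_MH). A sketch with made-up stratum counts (the counts and groups here are purely illustrative):

```python
from math import log

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio pooled across ability strata.
    Each stratum is (right_ref, wrong_ref, right_focal, wrong_focal)."""
    num = sum(r_ref * w_foc / (r_ref + w_ref + r_foc + w_foc)
              for r_ref, w_ref, r_foc, w_foc in strata)
    den = sum(w_ref * r_foc / (r_ref + w_ref + r_foc + w_foc)
              for r_ref, w_ref, r_foc, w_foc in strata)
    return num / den

# Made-up counts for two total-score strata (reference group = Group A,
# focal group = Group B): (A right, A wrong, B right, B wrong).
strata = [(30, 10, 20, 20),
          (40, 10, 30, 20)]
alpha = mh_odds_ratio(strata)
delta = -2.35 * log(alpha)  # ETS delta scale; negative values favour the reference group
print(round(alpha, 2))      # 2.82 - Group A's odds of success are higher at matched ability
print(round(delta, 2))
```

An α_MH of 1 (delta of 0) would indicate no DIF: at every ability level, the two groups would have equal odds of getting the item right.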
Of course, this approach does not identify particular items that behave
differently across the groups, although an examination of the items that
load differently on particular factors in the two factor analyses will point
to the differential behaviour of items. These may then be further
explored, as described in the previous section on DIF.
Interviewing
Pencil and paper
Card sorting (e.g. sorting a pile of cards with an adjective on each into
piles such as "very much like me", "somewhat like me" and "not at
all like me")
Manual (e.g. fitting objects together to make a whole, such as jigsaw
puzzles, drawing lines/objects)
Computerised testing, including adaptive testing
Various problems are experienced when these different formats are used
in a cross-cultural context – these are termed method bias or instrument
bias* as discussed in section 8.2.3. Clearly when people are not used to
being assessed (i.e. are relatively low on test wiseness or test
sophistication*), they may suffer from test anxiety and as a result tend
to underperform. This may be particularly relevant when high-tech
methods such as questionnaires and computer-based applications are
used, and less likely when interviews and assessment techniques based
on culturally familiar methods such as toys and the like are used. Novel
assessment techniques have used sand tray drawings, clay modelling,
models of animals and everyday objects, and so forth. Such techniques
have been used, inter alia, by Deregowski and Serpell (1971), who asked
Scottish and Zambian children in one condition to sort miniature models
of animals and motor vehicles, and in another condition to sort
photographs of these models. Many of these same techniques are used in
various forms of psychotherapy, including art therapy (e.g. Oaklander,
1997).
8.6.1.1 Triangulation
In order to detect method bias, they argue for a process of triangulation
(e.g. Lipson & Meleis, 1989) using single-trait, multimethod matrices
(e.g. Campbell & Fiske, 1959; Marsh & Byrne, 1993). Unless these
different measures that are known to assess similar constructs yield very
similar outcomes, one or all of the methods used are likely to be suspect.
An alternative method is to use repeated test administrations and to
examine score patterns between two administrations. If individuals from
different groups with equal test scores on the first occasion have very
different scores on the second administration, the validity of the first
administration is open to doubt. They argue that this approach is
particularly useful for mental tests.
These various forms of bias and the strategies for reducing them are
shown in Table 8.5.
There are several ways in which item bias can be demonstrated. Some
are based on expert judgements based on inspection as well as forward
and back translation, while others are based on various forms of
statistical analysis. The statistical techniques are divided into two main
categories: non-parametric methods developed for dichotomously scored
items using contingency tables and parametric methods for test scores
with interval-scale properties based on the analysis of variance
(ANOVA). Non-parametric statistical approaches look for differences in
the frequency with which tests scores are given, using a contingency
approach and the chi-square statistic. There are three such non-
parametric approaches, namely the Mantel-Haenszel (MH) approach, the
Simultaneous Item Bias Test (SIBTEST) and Distracter Response
Analysis (DRA). The best known of the non-parametric techniques is
the Mantel-Haenszel statistic, which uses chi-square to test the null
hypothesis that there is no relation between group membership and test
performance on one item after controlling for ability as given by the
total test score. In terms of MH, an item is biased if there is a significant
difference in the proportions of each membership group achieving a
correct or desired response on each test item. Once an item has been
examined in this way, the process is continued until all items have been
compared.
Additional reading
For good insight into the use of psychometric scales across cultural boundaries, see
Douglas, S.P. & Nijssen, E.J. (2002). On the use of ‘borrowed’ scales in cross-national
research: a cautionary note. International Marketing Review, 20(6), 621–642.
Essays
1. In the light of the theories discussed in this chapter, revisit Case study 6.1 (p. 72) in
Chapter 6 and suggest how you would demonstrate the cross-cultural equivalence of
the Trauma Symptom Inventory.
2. Suppose that you want to compare two countries on individualism–collectivism and
its effect, if any, on workplace behaviour, bearing in mind that the samples in one
group have on average a higher level of education than the samples in the other
group. Discuss how this difference could challenge your findings and how you could
try to disentangle educational and cultural differences.
3. Suppose that you wanted to investigate the conformity levels of employees in your
organisation which has sizable groups of people from Eastern Europe, Asia, the US
and South Africa. How could sources of method bias be controlled in this
cross-cultural study? Discuss procedures at both the design and analysis stages.
9 Managing the assessment
process
OBJECTIVES
9.1 Introduction
9.2.1 Preparation
Preparation for any assessment session involves ensuring that all the
required materials are available and that the venue is appropriate for the
assessment. The following aspects need special attention:
9.2.1.2 Materials
The materials should be in good condition and the same for everybody
being assessed. Obviously, different groups of people may need
different versions of the material at different times, but it is unfair to
those being assessed and difficult for the assessor to interpret the results
of an assessment if person A is assessed in one way and person B, who
is being assessed for the same purpose, is assessed in another way.
9.2.1.3 Instructions
The administrator should be thoroughly familiar with the instructions
and must adhere to them as closely as circumstances permit. Prior to the
assessment session, he must ensure that, besides the instructions, he is
also thoroughly familiar with the material, the time allocated for the
various tests, and so forth.
In terms of South African law, people being assessed for any purpose
need to give their permission for this to take place, and therefore it is
essential that participants sign a document indicating that they consent
to being assessed and that their results can be used for the specified
purpose. Those not willing to sign such a document may be asked to
leave the assessment venue. Of course, parents of children below the age
of majority (18 years) may sign on their behalf. People who are unable
to sign on their own behalf for reasons of enfeeblement, as well as
people referred for assessment by the state, are regarded as having
ceded this right to the state. Remember that in terms of the Employment
Equity Act 55 of 1998, any job applicant has the same rights as people
already in the organisation’s employment. As we argue later, a
practitioner should conduct each assessment as if he may be required to
justify his actions in a court of law.
9.2.1.4 Venue
The venue in which the assessment takes place should be well
illuminated, adequately ventilated and at a comfortable temperature, and
it must be large enough to accommodate all the people being assessed in
relative comfort. The venue must also be relatively free of distractions;
no telephones, noise or other interruptions should be allowed to
interfere with the assessment, and participants (and the assessor) should
make sure that their cellphones are switched off. Any exceptional
occurrences that take place (power outages, excessive noise,
distractions, etc.) should be noted.
9.2.2 Administration
The second aspect in which as much uniformity and control as possible
should be exercised is the actual administration of the technique(s) being
used. The following aspects all require attention:
9.2.2.3 Monitoring
During completion of the exercises, the administrator should walk
around and ensure that the answers are being completed in the correct
fashion. Close attention must be paid to ensure that the participants are
answering in the correct place on the answer sheet – participants
sometimes work across the page instead of down the columns. If more
than about 20 people are being assessed in a single venue, there should
be assistant administrators – ideally, one administrator for every 20–25
people being assessed in a group situation. It is important that the
administrator ensures that the participants do not copy from one another,
particularly on cognitive tests.
Tip
Sometimes when marks have been made and then erased, indentations remain
on the page, allowing subsequent participants to read the marked answers. A
way around this is to make similar marks for all possible answers in the booklet
and then to erase them all. In this way, the next participant will not have any help
in choosing the correct answer – all possible answers will have been marked!
9.2.3.3 Scoring
The scoring of the material is a primary source of error in assessment
and therefore great care must be taken with this aspect. It must be
carefully and accurately done, and checked by a second person to ensure
accuracy. This is especially necessary where items are scored
subjectively.
Note that the party that pays for the administration of the assessment
process has a right to the report, while the participant has the right to
feedback. However, the participant cannot prescribe the nature or the
format of the feedback or how he is to access the information.
9.2.7.1 Language
We have already noted that we live in a multilingual country in which
many potential employees lack fluency in English or Afrikaans, the
languages in which almost all assessments are conducted. In section 2.3
of the International Test Commission (2001), which is concerned with
fairness in test application, the following statements are made:
– The test taker’s level of proficiency in the language in which the test
will be administered is determined systematically and the
appropriate language version is administered or bilingual
assessment is performed, if appropriate. Basically, there are three
ways around the problem of language ability.
know the strengths and limitations of the techniques they choose, and
should use only those that have been properly validated for the
purposes for which they are being used and for the target population
being assessed
base all decisions on as wide a source of information as is feasible
be properly trained and competent to use (i.e. administer, score and
interpret) the various techniques
take into account any form of bias known to exist when interpreting
the results to ensure that these measures are as free of discrimination
as possible
regularly update their knowledge in the areas and techniques they are
using.
Ensure that the venue is adequate for the assessment process and
allows optimum performance.
Ensure that all the assessment materials are available in sufficient
quantity and are of good quality.
Establish rapport with the people being assessed.
Take into account the needs of people who are physically challenged.
Ensure that suitably trained interpreters are available if they suspect
that language ability may be a problem.
Ensure that they are thoroughly familiar with the instructions and do
not deviate from, or modify, them except when the requirements of
reasonable accommodation demand this. If they are in doubt, they
should consult a more senior or experienced colleague.
Make sure that all participants clearly understand the task
requirements.
Ensure that time limits are strictly adhered to. Noting start and finish
times in a log book in case the stopwatch malfunctions is vitally
important.
Not count down to the end of the assessment; they should avoid
saying: “You have three minutes left” or something similar.
Ensure that the assessments are properly scored and double-check all
results every time.
Follow the scoring instructions carefully, especially with procedures
that are more subjective. Every now and then they should go back to
an earlier answer sheet to ensure that their standards and/or
interpretations have not drifted (i.e. they should check their own test–
retest reliability or consistency). Where possible, they should also
check their interpretation of more subjective items with colleagues
(i.e. their inter-scorer reliability).
Know which norms are appropriate and make sure that the correct
ones are used.
Know how the various transformations affect the raw scores and use
the most appropriate ones.
Know what factors affect the validity of any assessment, and keep
these in mind when interpreting any assessment outcome or score.
Make sure that all sources and forms of relevant information are
considered in coming to a decision.
Understand the meaning and importance of the standard error of
measurement (SEM) and the effect this may have on the accuracy of
any cut-off score. The greater the SEM, the less absolute is any cut-
off score.
Give feedback in as positive a way as possible and be aware of the
damage that a poor result can do if it is badly communicated.
Use language that is appropriate to the person being assessed and
other interested parties.
Ensure that all assessment results are treated as confidential and that
they cannot be seen by unauthorised people.
Not disclose raw scores or transformed scores to anyone, but rather
give a verbal interpretation such as “above average”, “near the top of
the scale”, and so forth.
Make sure that the material is securely stored and not accessible to
unauthorised people when not in use.
Not duplicate copyrighted material: the producers of this material
have gone to great lengths with their product, and copying it is
nothing less than theft.
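The point made above about the standard error of measurement can be put in numbers (the values here are illustrative): SEM = SD × √(1 − r), and an observed score is best read as a band of roughly ±2 SEM (1.96 SEM for a 95% band) rather than as a point.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

# Illustrative values: an IQ-style scale (SD = 15) with reliability 0.91.
sem = standard_error_of_measurement(15, 0.91)
low, high = 100 - 1.96 * sem, 100 + 1.96 * sem
print(round(sem, 2))                  # 4.5
print(round(low, 1), round(high, 1))  # 91.2 108.8 - the 95% band around a score of 100
```

A candidate scoring 100 could thus plausibly have a true score anywhere from about 91 to 109, so a nearby cut-off (say 105) cannot be treated as absolute; the greater the SEM, the wider this band becomes.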
They have the right to know why and how they are to be assessed.
They need to know how the results will be used.
They have the right to confidentiality and need to be assured that the
results will not be passed on to third parties, except with their written
consent.
They have the right to refuse to be assessed. However, they must be
aware of the consequences of this refusal.
They have the right to a full and comprehensive assessment in the
light of the purpose of the assessment.
They have the right to expect that the assessment techniques used are
appropriate and valid for them and for the purpose of the assessment.
They have the right to be protected from stigmatisation as a result of
their assessment results.
They have the right to expect that feedback should be given in a way
that protects their dignity and their self-image, while encouraging
insight and opportunities for growth and development.
In terms of the South African Employment Equity Act 55 of 1998, a
job applicant has the same rights as an employee. In other countries,
very little attention is being paid to these issues at present.
The regulation of psychologists, their training and the services they
provide differs quite markedly in different parts of the world. In some
countries there has traditionally been very little, if any, control; in others
this control has been exercised by the psychology profession through
membership of relevant associations; and in some countries, this control
is exercised through specific legislation. However, there is a growing
trend towards the establishment of centralised professional boards
located within central government structures and controlled by specific
“Healthcare Professions” legislation.
Clinical psychologist
Counselling psychologist
Industrial psychologist
Educational psychologist
Research psychologist
One final aspect of this approach was to place all tests and other
techniques into three categories, namely C, B and A levels. C-level tests
were those that involved in-depth individual assessments of personality,
intelligence and other aspects of psychological functioning, especially
those related to personality problems. B-level tests were those that were
related to normal functioning and could be administered in group
situations, such as ability and aptitude testing* at schools and in the
workplace. A-level tests were those of interests and aptitudes often used
in the school or educational environment. Until recently, tests and other
techniques were graded as C, B and A level, initially by the Test
Commission of South Africa and later by a committee of the
Professional Board for Psychology within the HPCSA (which replaced
the SAMDC). The purpose of this classification was to ensure that all
tests in use were properly evaluated with respect to their psychometric
properties, the adequacy of their norms and the level of registration
needed to use them.
9.5.2 Britain
In the past, the recognition and control of psychologists were the preserve
of the British Psychological Society (BPS). However, statutory regulation
for psychologists was introduced on 1 July 2009, and the Health and
Care Professions Council (HCPC) also opened a “Register of
Practitioner Psychologists”. This legislation protects seven titles:
Clinical Psychologist, Health Psychologist, Counselling Psychologist,
Educational Psychologist, Occupational Psychologist, Sport and
Exercise Psychologist and Forensic Psychologist. In addition, the HCPC
stipulated two routes to statutory regulation – for some categories a
professional doctorate (DPsych or DClinPsych) was decreed, whereas
for others a Master’s degree and endorsement by the BPS is required.
The HCPC does not approve other qualifications in psychology, such as
undergraduate degrees or Master’s programmes, because these do not
lead directly to eligibility for registration with the HCPC. The use of the
title “Chartered Psychologist” is also protected by statutory regulation
and simply means that the psychologist is a chartered member of the
British Psychological Society (BPS). However, it does not necessarily
signify that the psychologist is registered with the HCPC. It is an
offence for someone who is not in the appropriate section of the HCPC
to call himself a psychologist even though the BPS continues to accredit
these programmes. According to the BPS fact sheet on registration, the
categories are as follows:
Source: http://www.bps.org.uk/what-we-do/bps/regulation-psychology/regulation-
psychology [retrieved 24 July 2013]
9.5.3 Europe
In recent years, attempts have been made to establish a Europe-wide
approach to the training of psychologists. According to Sagana and
Potocnic (2009), this training would take the form of a European
Diploma in Psychology (EDP) consisting of a three-year Bachelor’s
degree and a two-year Master’s degree with a 12-month supervised
practice, although the latter would not necessarily be offered by the
university offering the academic training. The authors identify several
problems with this approach, including the following:
Clinical neuropsychology
Clinical psychology
Community psychology
Counselling psychology
Educational and developmental psychology
Forensic psychology
Health psychology
Organisational psychology
Sport and exercise psychology
Some critics of the 4+2 and 5+1 models are of the view that, because the
basic entry requirement is four years of academic study, Australian
psychologists are underqualified. Wemm (2001) disagrees,
however, arguing that the Australian Honours degree is at least as good
as a Master’s degree in other countries and that an Australian Bachelor’s
degree with a major in psychology is roughly equivalent to an American
Master’s degree in psychology in terms of years of psychology studied,
although she concedes that this training is academic rather than
professional. She further contends that a four-year Australian Honours
degree approximates a three-year American doctorate, while a five-year
Australian Bachelor’s degree at pass level is generally equivalent to a
four-year American professional doctorate. A two-year Australian
Coursework Master’s is somewhere between an American PhD and a
post-doctoral diploma.
9.5.6 China
At the turn of this century, there were only six psychology departments
and four psychology institutions among all China's institutions of higher
education, although all normal universities and teachers' colleges had
psychology curricula and established psychology teaching and research
groups. To a certain extent, China had to depend on the
developed world for the training of its psychologists (Jing & Fu, 2001).
This dependence resulted from the importation of foreign experts as well
as the training abroad of Chinese psychologists at the postgraduate level
and the subsequent brain drain, as many of the latter do not return to
China (Higgins & Zheng, 2002, pp. 10/11 of 14).
In line with the need to carefully manage the training and education of
psychologists, many countries find it necessary to classify their
psychological tests and assessments. However, the way this has been
done in some countries is problematic. In a 2010 submission to the
PsyBA in Australia, a useful analysis was put forward by Littlefield,
Stokes and Li (2010), who argue that any classification system needs to
optimise benefits by doing three things simultaneously, namely 1)
managing risks of harm to the public (safety); 2) ensuring quality
services; and 3) protecting tests from inappropriate use. In addressing
these needs, they identify four different approaches to classification and
the advantages and disadvantages associated with each. These are test
type, setting, purpose and use, and administration versus interpretation.
9.6.2 Setting
A second approach considers the setting or circumstances in which
testing occurs. Such a process might, for instance, make a distinction
between those psychological tests that are used in clinical settings from
those used for vocational counselling or staff selection. Clinical tests
could then be limited for use in clinical settings by people who have
been trained and registered as clinical psychologists, while vocational or
selection tests would be available only to those with demonstrated
competence, accredited training or registered as organisational
psychologists. The criteria used in this approach may be the type of tests
used, the client population targeted and the tests’ abilities as diagnostic
or predictive tools. As with test type, a specific setting on its own does
not represent an adequate predictor of risks of harm, but should form
one element of an overall considered approach.
9.6.6 Britain
In Britain the system recently adopted is similar to that used by the
European Federation of Psychologists’ Associations (EFPA). The EFPA
model broadens the classification of the tests based only on content to
include additional considerations of the context, instruments and the use
to which they will be put, in line with the Australian model discussed in
the previous sections. This model involves a qualification system based
on three levels of competence needed to practise safely in various test
roles (Bartram, 2011). These levels are as follows:
The EFPA model also takes into account various technical issues
focusing on the key psychometric quality of the test, such as an
evaluation of the norms, reliability and validity of the instrument. The
rating process allows reviewers to comment on such aspects as the
appropriateness of norm groups for local use, sample sizes, etc. The
emphasis on training in the EFPA model is in line with the findings of
Muniz et al. (2001) that while the restrictions imposed on testing vary
considerably from country to country, restrictions alone are no guarantee
of good practice, but that some form of specialised training requirement
is also necessary.
9.6.7 The US
The US approach to test classification as put forward by the American
Psychological Association (APA) has a three-tier classification into A-
level, B-level and C-level tests. In terms of this model
(See http://www.sigmaassessmentsystems.com/)
Pearson has also introduced two new qualification levels, Q1 and Q2.
Q1-level assessments may be purchased by individuals who have a
degree or licence to practise in the healthcare or allied healthcare field.
Q2-level tests may be purchased by individuals who have formal
supervised training in mental health, speech/language, and/or in
educational training settings specific to working with parents and
assessing children; formal supervised training in infant and child
development; or formal training in the ethical use, administration and
interpretation of standardised assessment tools and psychometrics (see
http://www.pearsonassess.ca/haiweb/cultures/en-
ca/ordering/qualification-levels.htm).
More recently, the ITC (2010) has proposed a modified model of test administration in which the supervised mode is split into (a) remote supervised and (b) local supervised. The remote supervised mode relies on the availability and application of online monitoring by the test user with real-time biometrics, which permits the following safeguards and ways of controlling for or detecting cheating that obviate the need for proctoring:
The ITC guidelines are divided into four areas of focus, each with
between three and six subareas, yielding a total of 18 areas that need to
be attended to. These are the following:
(See http://www.psychology.org.au/assets/files/online-psychological-
testing.pdf for more details.)
Imagine you broke your arm and needed an X-ray. Now imagine that
your physician was required to perform the X-ray, as well as every
other diagnostic test that might be required without the help of trained
technicians – for you and every patient that entered her practice.
Certainly it’s possible. But most primary-care physicians leave the X-
raying and lab tests to other professionals so that they have more time
to spend with patients. It’s an accepted practice that’s intended to
provide better quality patient care.
The APA Practice Directorate says yes – and has supported several
state psychological associations in defense of this practice.
9.7.2 Britain
The BPS endorses the ITC Guidelines. The distinction between the four modes of test administration – Open, Controlled, Supervised and Managed (see above) – was put forward by Bartram (2001).
9.8 Protection of minority groups
9.8.2 Britain
As far as can be determined, no mention is made of the need to protect
these groups in the Code of Good Practice for Psychological Testing
prepared by the Committee on Test Standards and approved by the
Membership Standards Board in 2010, nor is any cross-reference made
to existing laws that are aimed at ensuring fair practice and non-
discrimination against women and minorities in the workplace. The
closest that can be found is item 10 of the Test Taker’s Guide published
by the BPS Psychological Testing Centre which does promise to “give
due consideration to factors such as gender, ethnicity, age, disability and
special needs, educational background and level of ability in using and
interpreting the results of tests”. In its brochure Using online assessment
tools for recruitment, the BPS makes a general statement about adapting
the test administration process to take account of disability and language
problems, suggesting that the introductory information/biographic
section should “allow such candidates to identify that there may be a
problem and assess whether taking the test in the standard way will be
appropriate for them” (p. 5). It goes on to suggest that special arrangements may need to be made for disabled candidates, as failure to do so would contravene the Disability Discrimination Act of 1995.
9.8.3 The US
In the US, while there are no statutory requirements or regulations
regarding the use of psychological assessment, the Equal Employment
Opportunity Commission (EEOC), supported by the Equal Employment
Opportunity Act, has jurisdiction over people using assessments
incorrectly or inappropriately in the employment context. The primary
concern of the EEOC is to reduce non-job-related discrimination in
hiring practices. Recall that in Chapter 7 (section 7.1.3), Schmidt’s
(1988) three forms of discrimination were referred to, namely adverse
impact, pre-market discrimination and disparate treatment. When trying
to assess whether any selection or assessment process is unfair, these
three aspects need to be borne in mind. However, the emphasis seems to fall on adverse impact, and various formulae and processes need to be put in place to minimise or control for it. In particular, the 4/5ths rule appears to be the dominant model – in short, the ratio between the proportion of a target group (e.g. females) selected and the proportion of the reference group (e.g. males) selected should be 80 per cent or greater. Consider the example of 100 people being tested for upper-body strength – 50 females and 50 males. Of the 50 females, 30 (60 per cent) pass the test and are selected; of the 50 males, 45 (90 per cent) pass the test and are selected. The ratio of the female to the male selection rate is 60/90, which equals .67 or 67 per cent. This is lower than the 80 per cent level, thus indicating an adverse impact of the assessment against the female group in the selection process. Other ways of demonstrating
adverse impact and other forms of bias and discrimination are discussed
in Chapter 8.
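The 4/5ths calculation just described can be sketched in a few lines of code. This is a minimal illustration using the figures from the worked example; the function name is our own, not part of any standard package.

```python
def adverse_impact_ratio(selected_target, total_target,
                         selected_reference, total_reference):
    """Selection-rate ratio used in the 4/5ths (80 per cent) rule."""
    target_rate = selected_target / total_target            # e.g. 30/50 = 0.60
    reference_rate = selected_reference / total_reference   # e.g. 45/50 = 0.90
    return target_rate / reference_rate

# Figures from the worked example: 30 of 50 females and 45 of 50 males selected.
ratio = adverse_impact_ratio(30, 50, 45, 50)
print(round(ratio, 2))   # 0.67
print(ratio >= 0.80)     # False: evidence of adverse impact
```

A ratio at or above 0.80 would not, on this rule alone, signal adverse impact.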
9.8.4 Europe
Of all the countries in Europe, it would seem that the Netherlands has
the most active interest in issues of minority rights protection, and most
of the work in cross-cultural assessment has been done by people such
as Fons van de Vijver and Ype Poortinga (e.g. Van de Vijver, 2002; Van
de Vijver & Hambleton, 1996; Van de Vijver & Poortinga, 1997).
9.8.5 Australia
In Australia, the General Registration Standard was approved by the
Australian Health Workforce Ministerial Council on 31 March 2010 (in
line with the Health Practitioner Regulation National Law – the National
Law). This law took effect from 1 July 2010 and has force in each state
and territory and, among other things, specifies examination criteria for
registration as a psychologist. In terms of these standards, four broad
areas of competency are defined, and four content domains are
specified. The second of these is the Assessment Domain, requiring
candidates to show (inter alia) “understanding of … cross-cultural issues, and test uses with different age and gender groups” (Psychology Board of Australia, 2012, pp. 4–5). Employers (at least in Victoria) have
a positive duty under the Equal Opportunity Act of 2010 to take
reasonable and proportionate measures to eliminate discrimination,
sexual harassment and victimisation. Various states have similar pieces
of legislation, although fairness in assessing across cultural groups does
not appear to be specifically covered in these acts.
Survey methodology
To obtain a view of the regulations for psychological testing in other countries, a
questionnaire was sent to country managers of SHL worldwide. The
questionnaire consisted of ten questions and covered the topics of test regulation
and classification, test administration, Internet testing and feedback on test
results. Responses received were collated and, where applicable, the frequency
of response types calculated.
In total, responses were received from 21 countries:
Conclusion
From the findings of the survey, it has been concluded that South Africa’s
regulations regarding the use of psychological tests are among the most stringent
in the world. However, it is interesting to note the parallels between labour
legislation in South Africa and the US.
In the US, although there are no statutory requirements or regulations regarding
the use of psychological assessment, the Equal Employment Opportunity
Commission (EEOC), supported by the Equal Employment Opportunity Act, has
jurisdiction over people using assessments incorrectly or inappropriately in the
employment context. The primary concern of the EEOC is to reduce non-job-
related discrimination in hiring practices. This approach to testing provides
greater control over the use of psychological assessment in the workplace as
employers are forced by law to use best practices in their assessment
methodology. This relates to the South African labour legislation context, where
the Employment Equity Act prohibits the use of psychological testing or other
similar assessments of an employee unless the test or assessment used can be
shown to be valid and reliable, is applied fairly to all employees, and is not biased
against any employee or group of employees. This parallel between the situations
in the US and South Africa, which share the common goal of protecting
employees from unfair discrimination, raises the question of whether relying on
labour legislation to regulate psychological assessment results in greater control
over the use of test level categories in the workplace.
Source: Reprinted from SHL (SA) Newsline, September 2005, with permission.
9.10 Summary
The third topic we examined was the issues surrounding assessment via
the Internet and international differences in this respect. We found that
South Africa was less willing than most other countries to accept the
idea of anyone other than a registered psychologist or
psychometrist/counsellor being able to do assessments.
Finally, the way in which different countries protect the rights of their minority groups was examined: in this area, South Africa is better placed than many of the other countries examined.
Additional reading
Short paragraphs
1. Briefly describe why sticking to the specified time limits in a test is important.
2. What is meant by reasonable accommodation, and how and when should this apply
in an assessment situation?
3. Discuss the concepts of confidentiality and test security, and say what must be done
in this regard.
Essays
Domains of assessment
In this section of the book, we see how we use the theory we have developed to
assess psychological ability and performance in various areas. The next four
chapters are concerned with how we define and measure intelligence and ability
(Chapter 10), personality (Chapter 11), competence (Chapter 12) and integrity
and honesty (Chapter 13). Each chapter starts by defining the constructs involved and then shows how these definitions shape the way in which they are measured.
10 Assessing intelligence and
ability
OBJECTIVES
10.1 Introduction
This raises the important issue that intelligence often lies in the eyes of the
beholder, and involves approval of the outcome by powerful people. Intelligence is
socially defined – it is the ability to solve problems in a particular social context.
In line with these arguments, David Wechsler, the designer of many of the important tests in use today, defined intelligence in 1944 as the “aggregate or global capacity of the individual to act purposefully, to think rationally and to deal effectively with his environment” (Wechsler, 1944, p. 4). Similarly, Westen (2002,
p. 280) defines intelligence as “the application of cognitive skills and knowledge to
learn, solve problems and obtain ends that are valued by an individual or a
culture.”
Although there are many definitions, most people would agree that
intelligence is not a thing or a process, but a quality that describes
behaviour, mainly in terms of proficiency or competence – how well the
person is able to perform various cognitive tasks. In other words, if
thinking is the ability to link units of information to make meaning, then
people who do this and arrive at “correct” meanings and the “right”
answers more often and more quickly than others are said to be more
intelligent than those who are slower or get the answers wrong.
However, different groups may have different views about what
constitutes a “correct” answer – intelligence is thus socially defined. In
Eurocentric countries, it is generally the educators and the business
world that make this decision; in a very religious society, it would be the
clerics and religious leaders who would decide on (and reward) the
correct answers to particular problems.
We cannot escape the fact that, in Western industrial and educational settings,
efficiency and speed are more important than social cohesion and therefore the
definitions of intelligence that dominate will stress the former rather than the latter.
We therefore also cannot avoid the view that in the industrialised world,
intelligence is the ability to learn from experience, to process information in order
to discover rules that can be applied in other settings to explain relationships and
to solve problems in a speedy and efficient way. Our understanding and
assessment of intelligence is based on this view.
This fitted in well with the philosophy of the day, which commonly
assumed that consciousness was simply a composition of elementary
processes such as simple sensations, images and perceptions, and that
intelligent people were better than others at arranging these elements into complex thought and behaviour patterns. In order to understand these complex processes, he argued, one needed only to see how different people used these basic processes. (This is like
saying that we can appreciate the difference between a well-built luxury
house and a poor-quality dwelling by looking at the materials used in the
building and the skill with which they are used. In itself this is not a
good assumption as it excludes aspects such as the architect’s plan in
relation to the requirements of the owners.) This approach thus proved
to be very wrong, and very few findings linking these basic
physiological processes to more complex forms of thinking and
behaviour were made.
However, later in section 10.4.2 in this chapter and in Chapter 18, we see that
there is a correlation between speed of information processing, even at a physical
level, and intelligence. Perhaps Galton and Cattell (see section 10.2.2) were not
so wrong, but merely lacked the sophisticated equipment needed to demonstrate
these relationships.
Binet and Simon continued with this work, and by 1908 they had
collected enough data to calculate the number of questions children of
each age could typically answer correctly. On the basis of these data,
Binet and Simon identified the age at which the children were able to
correctly answer questions of increasing difficulty level. (For example,
the average seven-year-old was able to explain the difference between
paper and cardboard, whereas the average five-year-old was not.) The
age level of the questions was then regarded as the mental age of the
child. For example, the child (or adult) who could answer questions at
the seven-year-old level was said to have a mental age of seven years.
By comparing the responding child’s performance with that of children
of the same age, they were able to identify those children who should be
sent for remedial education.
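The mental-age logic just described can be sketched as follows. This is a minimal illustration; the function and the data are hypothetical stand-ins, not Binet and Simon’s actual item sets.

```python
def mental_age(levels_passed):
    """levels_passed maps an age level to whether the child answered that
    level's questions correctly; mental age is the highest level passed."""
    passed = [age for age, ok in levels_passed.items() if ok]
    return max(passed) if passed else None

# A child who answers the seven-year-old questions correctly, but not the
# eight-year-old ones, has a mental age of seven.
print(mental_age({5: True, 6: True, 7: True, 8: False}))  # 7
```

Comparing this mental age with the child’s chronological age then indicates whether remedial education may be needed.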
Like Binet’s scale, the WISC and the WAIS are administered
individually. Wechsler (1939) introduced two important innovations.
The first was to distinguish between two types of scholastic intelligence:
verbal and performance, measured by different subtests which make up
two different subscales. The performance subtests require the child or
adult actually to do something, for example rearrange wooden blocks in
a particular way to reproduce a design shown on a card or rearrange
cards showing the elements of a story in their correct narrative sequence.
In practice, it is found that children with special educational needs often
do much better on the performance scales than on the verbal scales.
Used diagnostically, Wechsler’s scales thus have the advantage of being very useful in identifying children who appear to be underperforming in relation to their overall IQ for some reason (e.g. because of emotional difficulties).
The most popular and most widely used IQ scales today are those
developed by Wechsler. They include:
Research has shown (e.g. Levine et al., 1996) that g predicts outcomes
such as job performance and training very well. At the same time,
people who score higher in g show lower correlations between the s-
factors than people who score lower in g. This suggests that it may be
more important to assess the s-factors associated with job performance
for people who score high in g than it is for people who score relatively
low in g (Lubinski & Benbow, 2000).
These various factors were arranged rather like a Rubik’s cube, resulting
in some 120 different factors (4 × 5 × 6 = 120). In 1984, Guilford
increased the number of abilities proposed by his theory, raising the
number of operations to five (by adding evaluation) and the total to 150.
This is shown in Figure 10.3.
They recorded the time taken (in milliseconds) to answer each type of
item pair and then subtracted the reaction time to the question about
physical match (AA) from the reaction time to the question about name
match (Aa). In this way they were able to separate the time required for
sheer speed of reading letters and pressing keys on a computer from the
time taken to interpret the different shapes “A” and “a”. Their most
important finding was that people differed in the speed with which they
identified the different letter combinations, and that these score
differences were closely related to scores on various intelligence tests,
especially those tests of verbal ability, such as verbal analogies and
reading comprehension. The researchers concluded that people who
score well on verbal tests are those who have the underlying ability to
absorb and then retrieve from memory large amounts of verbal
information in a short space of time. The short time they took to process
the verbal information was the key to their verbal intelligence.
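The subtraction logic at the heart of this method can be sketched as follows. The reaction times below are invented for illustration; Posner’s actual stimuli and timings are not reproduced here.

```python
# Invented reaction times (ms) for one participant on the two item types.
physical_match_rts = [520, 540, 510, 530]  # "AA": letters look the same
name_match_rts = [600, 630, 590, 620]      # "Aa": same name, different shape

def mean(xs):
    return sum(xs) / len(xs)

# Subtracting the physical-match mean strips out sheer reading and key-press
# speed, leaving the extra time needed to retrieve the letter name from memory.
name_access_time = mean(name_match_rts) - mean(physical_match_rts)
print(name_access_time)  # 85.0
```

It is this residual time, not the raw reaction time, that was found to correlate with verbal test scores.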
When Hunt and his colleagues (Hunt, Frost & Lunneborg, 1973) used
Posner’s technique, Hunt realised that the people who performed well
also did well at other verbal tasks (he termed these people “high
verbal”). He argued that what was happening was far more than a simple
storage and retrieval process, and that those people who were good at
the Posner tasks were, in fact, using higher-level thinking processes to
decide on the best strategy for approaching the task. As a result of these
findings, in 1973 he began to consider what it was that people with a
high verbal intelligence did that was different from those who were
lower in verbal intelligence – he asked the question: What does it mean
to be high verbal? (Hunt, Frost & Lunneborg, 1973).
By posing this question, Hunt became the first person to move from looking at the outcomes or products of thinking (which is what all the researchers discussed so far had done) to asking questions about the processes involved in thinking.
Sternberg was also able to show that people who were better at solving
the mental problems took less time to process the information than
people who were not so good at solving them. He also showed that the
better problem solvers spent longer on the encoding stage (step 1) and
less time on the relationship stages (steps 2–4) than did less-able
problem solvers. As Sternberg put it: “They want to make sure they
understand what they are doing before they go ahead and do it” (2000, p.
252). Sternberg called this approach a componential approach because it
involved breaking mental problem solving down into the component
processes that made problem solving possible.
Initially, Gardner (1983) argued that there were seven different types of
intelligence. In addition to the three recognised by traditional
approaches (verbal or linguistic, mathematical and spatial), he added
musical intelligence, bodily/kinaesthetic intelligence (exhibited, for
example, in dancing, sport and athletics), intrapersonal intelligence
involving the understanding of ourselves and what “makes us tick”, and
interpersonal intelligence shown in being able to relate to and effectively
understand other people. The last two correspond closely to what other
researchers like Goleman (1995) have called emotional intelligence*.
Gardner later added an eighth form of intelligence, which he termed
naturalistic intelligence. This is the ability to understand, relate to,
categorise, classify, comprehend and explain the things encountered in
the world of nature. People such as farmers, ranchers, hunters, gardeners
and animal handlers would exhibit high levels of this kind of
intelligence. He has also added a ninth form of intelligence, which he
calls spiritual intelligence.
Das, Naglieri and Kirby (1994) link these four processes to four
functional areas of the brain. Planning is broadly located in the frontal
lobes, while attention and arousal are functions of the frontal lobe and
the lower parts of the cortex, with some additional involvement of the
parietal lobes in attention. Simultaneous and successive processing
occur in the posterior region of the brain. Simultaneous processing is
broadly associated with the occipital and the parietal lobes, while
successive processing is broadly associated with the frontal-temporal
lobes.
A certain amount of academic research has been done on EI, and the originators of the theory have provided evidence for its construct validity. However, others such as Davies, Stankov and Roberts (1998) claim that its construct validity cannot be demonstrated and that EI does not fit the true definition of intelligence, being much closer to notions of personality and emotional control. As with Sternberg’s theory, a major problem is that there is no way of systematically assessing the various kinds of intelligence identified by Gardner.
As we can see, there are two series, one going across (increase by 1, but
use same symbol), and one going down (increase by 1 and change
symbol). So what should the missing value be? Clearly, from the across
rule there should be five asterisks, and from the down rule there should
also be five asterisks. In all cases, the across and down answers should
be the same. The participant must then identify which of the answer options contains five asterisks (*****).
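The two rules can be sketched in code. The grid below is an invented stand-in for the figure, constructed so that, as in the text, both rules yield five asterisks.

```python
# An invented 3 x 3 grid: across a row the count rises by 1 with the same
# symbol; down a column the count rises by 1 and the symbol changes.
# The bottom-right cell is the missing value.
grid = [["#",   "##",   "###"],
        ["++",  "+++",  "++++"],
        ["***", "****", None]]

# Across rule: one more of the bottom row's symbol than its middle cell.
across = grid[2][1][0] * (len(grid[2][1]) + 1)
# Down rule: one more symbol than the cell above, with the symbol changed.
down = "*" * (len(grid[1][2]) + 1)
print(across == down == "*****")  # True
```

The agreement of the two rules is what makes the item well formed: a single answer option satisfies both.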
Over the past few decades there has been a substantial increase in our ability to think rationally and to solve increasingly complex problems, as a result of stimulation by radio, television, better education, and so on. The effect of this has been that the average level of cognitive ability has steadily increased. However, because intelligence is a comparative notion (i.e. people’s intelligence is defined in relation to others via the normal distribution – the bell curve), IQ scores of the general population have remained constant over the years, while those of older people tested on updated tests appear to have decreased.
(This phenomenon is termed the Flynn effect after J.R. Flynn (1984),
who first described it.) Similarly, people raised in non-stimulating
environments and who are not exposed to preferred ways of problem
solving tend to score lower in relation to those who have benefited from
an enriched environment.
At the same time, changing historical agendas have altered those aspects
of intelligence considered to be of primary importance. Conventional IQ
tests measure cognitive abilities that are needed to do well at school and
to succeed in various intellectual tasks. In addition to individual
differences in cognitive ability, performance on such tests has been
shown to be influenced by a range of other factors that reflect individual
differences in experience (such as class and ethnicity).
10.7 Summary
Additional reading
For an in-depth (though somewhat dated) look at the issues surrounding the definition
and assessment of intelligence, see Neisser et al. (1995). Intelligence: Knowns and
unknowns: Report of a task force established by the Board of Scientific Affairs of the
American Psychological Association.
Chapter 8 of Cohen, R.J. & Swerdlik, M.E. (2002). Psychological testing and
assessment: An introduction to tests and measurement gives a good account of some
theories of intelligence and how these shape the way in which intelligence is measured.
A good general introduction to intelligence and intelligence testing is given by Louw,
D.A. & Edwards, D.J.A. (1997). Psychology: An introduction for students in Southern
Africa. (See especially Chapter 7.)
Another good overview is provided by Kowalski, R. & Westen, D. (2004). Psychology: Brain, behaviour and culture (4th ed.). NY: Wiley.
Perhaps the best book currently available on the role of intelligence in the workplace is
Adrian Furnham’s (2008) Personality and intelligence at work, especially Chapter 6.
For a closer look at dynamic testing, see Sternberg, R.J. & Grigorenko, E.L. (2002).
Dynamic testing: The nature and measurement of learning potential. Cambridge
University Press.
Short questions
Essays
1. Give a brief overview of the way the concept of intelligence has developed over time.
2. Show how the theoretical model of intelligence that has been adopted has a direct
bearing on the way intelligence is assessed.
3. Define intelligence and show how it contributes to workplace success.
11 The assessment of
personality
OBJECTIVES
11.1 Introduction
Greenberg and Baron (2000, p. 97) state that “[p]ersonality is the unique
and relatively stable pattern of behaviours, thoughts and emotions
shown by an individual”.
The nomothetic view, on the other hand, emphasises that all personality
characteristics are well-defined entities and therefore common to all
people. This makes it relatively easy to describe people. People differ in
their positions along a continuum on the same set of characteristics, and
they are unique only in the balance and amount of each characteristic –
it is this balance which constitutes their uniqueness. Most contemporary
psychologists tend towards a nomothetic approach, but they are aware of
how a characteristic may differ slightly from person to person in the way
that it is expressed. This approach tends to use self-report personality questionnaires, factor analysis and other trait-based methods for gathering information about and/or describing the person. These methods are
discussed in detail below.
Endomorph: round, fat and flabby, with somewhat underdeveloped muscles (a snowman figure); associated temperament: relaxed, lazy and laid back.
Sources: Friedman & Shustack (1999, p. 170) and Carducci (2009, p. 327)
Clearly these are stereotypes and people often behave as others expect –
fat people are expected to be jolly and fun loving, while scrawny people
“are supposed” to be bookish and nerdy. It is thus not surprising that
people with particular physiques behave in certain stereotypical ways.
However, some people still believe in this approach – a search of the
Internet for somatotypes will get some good hits.
Stage: Infancy (birth to 18 months). Central task: trust versus mistrust. Indicators of positive resolution: learning to trust others. Indicators of negative resolution: mistrust, withdrawal, estrangement.
Stage: Early childhood (18 months to 3 years). Central task: autonomy versus shame and doubt. Positive resolution: self-control without loss of self-esteem; ability to cooperate and to express oneself. Negative resolution: compulsive self-restraint or compliance; wilfulness and defiance.
Stage: Late childhood (3 to 5 years). Central task: initiative versus guilt. Positive resolution: learning the degree to which assertiveness and purpose influence the environment; beginning to evaluate one’s own behaviour. Negative resolution: lack of self-confidence; pessimism, fear of wrongdoing; over-control and over-restriction of own activity.
Stage: School age (6 to 12 years). Central task: industry versus inferiority. Positive resolution: beginning to create, develop and manipulate; developing a sense of competence and perseverance. Negative resolution: loss of hope, sense of being mediocre; withdrawal from school and peers.
Stage: Adolescence (12 to 20 years). Central task: identity versus role confusion. Positive resolution: coherent sense of self; plans to actualise one’s abilities. Negative resolution: feelings of confusion, indecisiveness and antisocial behaviour.
Stage: Young adulthood (18 to 25 years). Central task: intimacy versus isolation. Positive resolution: intimate relationship with another person; commitment to work and relationships. Negative resolution: impersonal relationships; avoidance of relationship, career or lifestyle commitments.
Stage: Adulthood (25 to 65 years). Central task: generativity versus stagnation. Positive resolution: creativity, productivity, concern for others. Negative resolution: self-indulgence, self-concern, lack of interests and commitments.
Stage: Maturity (65 years to death). Central task: integrity versus despair. Positive resolution: acceptance of the worth and uniqueness of one’s own life; acceptance of death. Negative resolution: sense of loss, contempt for others.
Source:
http://www.sinclair.edu/academics/lhs/departments/nsg/pub/maslowanderikson1.pdf
Elements (✓ marks the two similar elements; ✗ marks the one that differs) | How are the two similar? | How is the third different? | A’s score
✓ ✓ ✗ | Customer friendly | Rude to customers | 5
✗ ✓ ✓ | Sees chances to cross-sell | Misses chances to cross-sell | 3
✗ ✓ ✓ | Organises own work | Misplaces files | 2
✗ ✓ ✓ | Helps other team members | Selfish | 4
✗ ✓ ✓ | Accepts feedback | Gets defensive | 2
✓ ✓ ✗ | Copes with pressure | Shows frustration | 3
✓ ✓ ✗ | Enthusiastic | Negative | 4
✓ ✗ ✓ | Shows initiative, self-starter | No proactivity, non-starter | 2
✗ ✓ ✓ | Motivated | Does not want to work | 4
✓ ✓ ✗ | Patient | Impatient | 3
In Table 11.3, the score of the person being assessed is given in the
right-hand column and the name of the analyst on the left. The clerk (G.
Pillay) is analysed by P. Sithole. When she is compared with others,
dimensions or constructs such as customer friendliness, seeing
opportunities for cross-selling, and so on emerge. In the second phase,
G. Pillay is scored on each of these constructs (right-hand column). As
we can see, she scores high on “customer friendliness”, is “helpful to her
colleagues” and “enthusiastic about her work”, and is “well motivated”.
However, she falls short on “showing initiative” and is “not very well
organised”, mislaying files often. (For an excellent introduction to
repgrids, see Jankowicz, 2004.)
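Reading a profile off a repertory grid like Table 11.3 can be sketched as follows. The scores are taken from the table (higher = better); the cut-offs of 4 and above for strengths and 2 and below for development areas are our own illustrative choice, not part of the repgrid method itself.

```python
# G. Pillay's construct scores from Table 11.3 (constructs abbreviated).
ratings = {
    "Customer friendly": 5, "Sees chances to cross-sell": 3,
    "Organises own work": 2, "Helps other team members": 4,
    "Accepts feedback": 2, "Copes with pressure": 3,
    "Enthusiastic": 4, "Shows initiative": 2,
    "Motivated": 4, "Patient": 3,
}

# Split the profile into apparent strengths and development areas.
strengths = sorted(c for c, s in ratings.items() if s >= 4)
development = sorted(c for c, s in ratings.items() if s <= 2)
print(strengths)
print(development)
```

This reproduces the narrative reading: customer friendliness, helpfulness, enthusiasm and motivation emerge as strengths, while initiative, organisation and accepting feedback emerge as areas needing development.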
11.3.1 Observation
This approach to assessment involves observing what people do and
how they react in specific situations. This is a behavioural approach and
is used, for example, in assessment centre situations (see Chapter 17).
While it is a useful approach, it is labour intensive and certainly requires
some form of intervention in the process (as discussed in Chapter 2). In
practice, this approach to assessment usually requires a person either to
be observed in his work situation, or it can involve some kind of role
play in which he has to respond to a structured situation of some kind,
such as dealing with a difficult subordinate, store assistant or municipal
official. Another typical exercise is placing people in a leaderless group
situation, giving them a task to complete together and monitoring the
interactions of the group members to see who shows particular forms of
interpersonal behaviour, such as dominance, leadership, the ability to
integrate outcomes, conciliatory behaviour, and so forth.
Even though proportionally more white males than members from the
other groups would be selected in this scenario, this is not unfair if the
groups really do differ in dominance. It would be unfair if the
characteristic in question (i.e. dominance) were not required for the
position that was applied for.
Sidebar 11.1
How did you answer the toothpaste and toilet roll questions?
If you squeeze your toothpaste at the base of the tube and/or if you overhang
your loo roll, the chances are that you are male.
If you squeeze your toothpaste at the top of the tube and/or if you underhang
your loo roll, the chances are that you are female.
Element | Fluid | Produced by | Type | Personality
Earth | Black bile | Gall bladder | Melancholic | Depressed, withdrawn, unhappy
Air | Blood | Liver | Sanguine | Optimistic, outgoing, calm, cheerful
Fire | Yellow bile | Spleen | Choleric | Irritable, grumpy, loud
Water | Phlegm | Lungs | Phlegmatic | Quiet, placid, unemotional
Although this approach is very old fashioned, some people still argue in
its favour, and if you go onto the Internet, you will find some modern
references to it. However, it is of interest to us because in the 1920s,
Wundt, one of the fathers of modern psychology, converted these four
characteristics into a two-by-two matrix, as shown in Figure 11.2.
11.3.5.4 Myers-Briggs
This two-by-two approach was further developed by Myers and Briggs (1962), who argued that people react to the world in terms of
four dimensions reflecting their preferences between equally desirable
alternatives. They termed these dimensions or preferences
extraversion/introversion, judgement/perception, sensing/intuiting and
feeling/thinking. According to Myers and Briggs, a person’s personality
is best described as different combinations of these four preferences (see
Figure 11.4).
Each of the 16 cells in the table has a similar description (see, for
example, Furnham, 2008, pp. 114–118). The Myers-Briggs type
indicator (MBTI) is widely used in South Africa and abroad in areas
such as communication style, career guidance, job selection, leadership,
team formation and development, and emotional perception. (See
Quenk, 2000 and McCaulley, 2000 for other examples.)
One of the criticisms levelled against the MBTI is that there is very little
research to demonstrate its validity (see, for example, Furnham, 2008,
p. 115 and the Public Service Commission of Canada, 2006). In addition,
the reports are worded in such a way that most people would agree with
them. Sidebar 11.2 outlines some interesting research into why such
statements are seen as valid.
You have a need for other people to like and admire you, and yet you tend to
be critical of yourself. While you have some personality weaknesses you are
generally able to compensate for them. You have considerable unused
capacity that you have not turned to your advantage. Disciplined and self-
controlled on the outside, you tend to be worrisome and insecure on the inside.
At times you have serious doubts as to whether you have made the right
decision or done the right thing. You prefer a certain amount of change and
variety and become dissatisfied when hemmed in by restrictions and
limitations. You also pride yourself as an independent thinker; and do not
accept others’ statements without satisfactory proof. But you have found it
unwise to be too frank in revealing yourself to others. At times you are
extroverted, affable and sociable, while at other times you are introverted, wary
and reserved. Some of your aspirations tend to be rather unrealistic.
Forer asked his students to evaluate the accuracy of this description on a
five-point scale, with “5” meaning the recipient felt the evaluation was an
“excellent” assessment and “4” meaning it was “good”. The class average
was 4,26. That was in 1948.
The test has been repeated hundreds of times with psychology students and the
average is still around 4,2 out of 5, or 84% accurate. His accuracy amazed his
subjects, though his personality analysis was taken from an astrology column in a
local newspaper and the same description was presented to all the people in his
sample without regard to their birth sign. This finding is known as the Forer effect.
The most common explanations given to account for the Forer effect are in terms
of hope, wishful thinking and vanity, and the tendency to try to make sense out of
experience, though Forer’s own explanation was in terms of human gullibility.
People tend to accept claims about themselves in proportion to their desire that
the claims be true rather than in proportion to the empirical accuracy of the claims
as measured by some non-subjective standard. We tend to accept questionable
or even false statements about ourselves if we deem them positive or flattering
enough. We will often give very liberal interpretations to vague or inconsistent
claims about ourselves in order to make sense of the claims. Subjects who seek
counselling from psychics, mediums, fortune tellers, mind readers, graphologists,
etc. will often ignore false or questionable claims and, in many cases, by their
own words or actions, will provide most of the information they erroneously
attribute to a pseudoscientific counsellor. Many such subjects feel their
counsellors have provided them with profound and personal information. Such
subjective validation*, however, is of little scientific value. It is also termed the
Barnum effect*.
Source: Based on Forer (1949). See also Friedman & Shustack (1999, p. 21)
Source: IPAT (Institute for Personality and Ability Testing), 1993. “Copyright ©1993 by
the Institute of Personality and Ability Testing (IPAT Inc.), PO Box 1188,
Champaign, Illinois, USA. IPAT is a wholly owned subsidiary of OPP Ltd.,
Oxford, England. Reproduced with the permission of the copyright owner. All
rights reserved.”
The 16PF has been widely tested across the world, and remains a
popular instrument as it has been shown to be valid in a wide range of
situations. However, its long-term test-retest reliability is on the low
side, ranging from 0,21 to 0,64 (Kaplan & Saccuzzo, 2001, p. 461). The
16PF is designed for use with a normal population. To use it in a clinical
situation, an additional 12 factors need to be included. Prinsloo and his
colleagues at the Human Sciences Research Council (HSRC) adapted
and standardised the 16PF on a largely white sample of South Africans,
yielding a version known as the 16PF SA92 (Prinsloo, 1998). As shown
in section 5.6.1.2, Abrahams and Mauer (1999a, 1999b) argued that the
language level of the SA92 is such that non-mother-tongue speakers of
English would be at an extreme disadvantage. They even tried to have
the instrument banned in South Africa by the Psychometrics Committee
of the Professional Board for Psychology. Although this attempt was not
successful, their work does point to some difficulties in using the
instrument with people who do not have an adequate mastery of English.
O = Openness to experience
C = Conscientiousness
E = Extraversion
A = Agreeableness or amiability
N = Neuroticism (low emotional stability)
Health compliance and safety behaviour have also been related to this
model of personality – especially to conscientiousness. For example,
people who are low on conscientiousness are less likely to comply with
medication and treatment instructions. Risky behaviour and addiction
are also related to personality structure. Impulsiveness is clearly related
to risk-taking, which has major implications for the use of illicit drugs,
the prevention of HIV/AIDS and adherence to safety precautions in the
workplace. Conscientiousness seems to be positively related to a range
of good health and safety habits, as well as to honesty and integrity
(see Chapter 13, section 13.2.2).
Other Big Five dimensions that are important are extraversion, which
predicts success in management and sales environments, and openness
to experience, which predicts training outcomes and receptivity to new
ideas. The remaining factors do not seem to be particularly important for
performance in the workplace.
11.4.2 MBTI
Another very popular measure of personality used in selection batteries
in many organisations is the Myers-Briggs type indicator (MBTI), which
was discussed in section 11.3.5.4. It is also discussed in depth in section
15.4.3.1 when we look at career assessment. (For an in-depth look at the
MBTI in the workplace, see Furnham, 2008, pp. 86–92.)
Type As
• are workaholics
• are always moving, walking, eating rapidly
• work long hours
• are competitive about everything
• set themselves tight deadlines
• get impatient when things are moving too slowly for them
• always try to do several things at the same time (multitask)
• are unable to delegate
• cannot cope with leisure
• measure success in terms of the number of things they own or how
much they have acquired
• are prone to stress and heart attacks.
Type Bs
Additional reading
For a good review of the MBTI, see Public Service Commission (of Canada) (2006),
Standardized testing and employment equity career counselling: A literature review of
six tests. Available at http://www.psc-cfp.gc.ca/ee/eecco/intro_e.htm
Fay Fransella, who has done much to advance Kelly’s repgrid theory over the years,
has produced a book on personal construct psychology, The essential practitioner’s
handbook of personal construct psychology (2005, London: Wiley), which is highly
recommended for people interested in this technique. An equally easy-to-read text is
Jankowicz, D. (2004), The easy guide to repertory grids. Chichester, UK: Wiley.
In addition, a good account of Kelly’s theory of personal constructs is provided by Dr
Valerie Stewart (2004), Kelly’s theory summarised: A summary of Kelly’s theory of
personal constructs, the basis of the repertory grid interview. Available at
http://www.enquirewithin.co.nz/theoryof.htm
Furnham’s 2008 text, Personality and Intelligence at work, has excellent chapters on
personality and personality testing in the workplace, the identification of personality
disorders at work, and the origin and assessment of integrity and dishonesty at work.
Short paragraphs
1. Discuss what is meant by the projective hypothesis, and show how this is used to
assess personality.
2. Outline briefly Costa and McCrae’s five-factor model of personality.
3. What is meant by the Forer effect?
Essays
1. Show how the theory we adopt determines the method used to assess personality.
Refer to at least three different theoretical frameworks.
2. Show how personality theory evolved from the four humours model of ancient Greek
science to the type indicator model put forward by Myers and Briggs (the MBTI).
12 Assessing competence
OBJECTIVES
define a competency
describe various kinds of competence
describe different levels of competence
show how competencies drive excellence.
12.1 Introduction
12.1.1 Definition
According to David Dubois (2005, p. 8), “[C]ompetencies are the traits
or characteristics, including an individual’s knowledge, skills, thought
patterns, aspects of self-esteem, and social roles, that are used to achieve
successful or exemplary performance of any type”.
The first part of this chapter focuses on competencies in the typical work
situation, because this is where almost all the research has been done.
The second part examines the extent to which the notions developed in
the first part can be applied to other, non-work situations.
To establish the nature of the competencies required for any job, there is
a six-step process to follow. This is given below. Note that while we talk
about jobs, the same reasoning can apply to other areas such as
admission to a school (school readiness) or release from an institution
such as a hospital or prison.
Precise ways of generating these KPAs are dealt with in section 12.6.
Such statements are often termed behavioural indicators (BIs). For each
competency, we ideally require about five behavioural indicators and for
each of these BIs we have to specify levels or standards against which to
judge or benchmark the person’s behaviour or performance level. For
example, if we look at “batting” as a competency in cricket, we could
identify five indicators such as: 1 – batting average; 2 – not running your
partner out; 3 – having a wide range of strokes; 4 – running between
wickets; and 5 – strike rate. For 1 – batting average, we would have to
specify what is meant. For example:
0 = No understanding
1 = Basic understanding
2 = Working experience
3 = Extensive experience
5 = World-renowned expert
Although five levels of competence are recommended above, in practice
many organisations use only three levels: not yet competent, competent
and more than competent. This approach was proposed for South Africa
for the competency framework of the South African Qualifications
Authority (SAQA) and the National Qualifications Framework (NQF).
Numerous problems were foreseen with this approach. However, in a
more recent move SAQA (2012) has adopted a far more realistic
approach, identifying ten levels of competence and listing ten categories
in terms of which each level must be described. These ten categories that
are used in the level descriptors to describe applied competencies across
each of the ten levels of the NQF are the following:
1. Scope of knowledge
2. Knowledge literacy
3. Method and procedure
4. Problem solving
5. Ethics and professional practice
6. Accessing, processing and managing information
7. Producing and communicating of information
8. Context and systems
9. Management of learning
10. Accountability
The 2012 list of level descriptors put forward by SAQA has gone a long
way towards addressing this issue by clearly indicating a hierarchy of
increasingly difficult criteria that need to be met at various levels in the
education system. In addition, a far more sensible approach has been
adopted for the grading of performance at the school-leaving (and
lower) levels. Previously, it had appeared that the education authorities
were in favour of a simple three-level system – not yet competent,
competent and more than competent. Within this framework, it looked at
one stage as though the school-leaving certificate at the end of the
secondary phase (for example) would simply reflect Mathematics –
Competent; Biology – More than competent; etc. Although this approach
could have some benefits, it would also have created a number of
problems. For example, if one wanted to award a bursary or scholarship
on the basis of merit, this approach would be unsuitable. Similarly, if
one wanted to select the best person for a position (using a top-down
rating system), this approach would also have been inadequate. For
example, a person may be more than competent on leaving school to
study technical drawing but not competent to study engineering or
actuarial science (both of which require an A or B in mathematics in the
old system). If we were to use this three-point system, we would have to
create a report card that looks something like this:
1. Not yet competent. This means that the person is basically unable
to perform the task or to meet the minimum standards required. This
may be because he lacks the skills and knowledge required for the
task, although with further training, development and experience he
may be able to achieve these levels in a reasonable time.
Alternatively, it may be that the demands of the task are beyond his
abilities, and he should therefore be given other, more suitable, tasks
to do. Deciding whether the person has the potential to acquire the
competence or not points to the importance of proper assessment
and placement procedures.
2. Threshold competence. This means that the person is able to carry
out the tasks related to the job at a level that is acceptable to the
organisation in terms of quality and efficiency. At this stage,
however, his ability to solve problems is not very well developed
and he may need supervision and/or help from more competent
colleagues. In terms of Hersey and Blanchard’s (1968) theory of
situational leadership, these people are at task maturity level 1 and
require a telling mode of leadership. (See any industrial or
organisational psychology textbook for details of Hersey and
Blanchard’s situational leadership (SitLead) model.) Successful
performance at this level suggests that the person has the potential to
progress to higher levels of competence.
3. Experienced worker competence. This level is attained when the
person is able to carry out all the tasks required of the job at an
acceptable level, with above-average levels of efficiency and
quality. The person will make few, if any, mistakes and is able to
solve most problems that confront him in his area of expertise. In
terms of Hersey and Blanchard’s SitLead model, these people are at
task maturity level 2 and 3, and require a selling or participating
mode of leadership. In general, they can be left alone to get on with
the job.
4. Highly competent. This person is able to meet and exceed the
required work standards without having to rectify mistakes
afterwards. High-quality standards are maintained, and the person
has well-developed problem-solving skills. He is able to initiate new
ways of approaching tasks and solving problems. He makes a good
mentor and coach, and is able (and often keen) to share his expertise
and knowledge with others and to help them quickly become
experienced workers.
5. Mastery-level competence. This person shows complete mastery of
his task, and is a true expert in the area; he is able to solve really
difficult problems that have baffled others. He is difficult to replace,
and as a result is often kept in a specialist role and not considered
for promotion as this may take him away from his technical
competence into a managerial role. Novel ways of rewarding such a
person and ensuring he remains committed to the organisation need
to be found.
In many cases, there is not enough evidence for the assessor to judge
whether or not the person is competent, and so a category marked
“Insufficient evidence to form a judgement” is also used.
It would appear that the Department of Education (or at least parts of it)
is aware that the simple three-way categorisation of competent, not yet
competent and more than competent is inadequate. For example, the
Department of Education, in its 2005 subject assessment guidelines
(Department of Education, 2005, p. 7) makes the following statement:
Source: http://www.inseta.org.za/downloads/RPL_Concessions_Guideline_V6_2012.pdf
(p. 2)
As a general statement, it can be noted that where the competence
measures are closely related to a specific purpose or to a fairly
narrowly defined job or situation, fewer competence levels are required.
Where the outcome of the assessment is to be more widely used, a
greater number of competence levels is required.
The number of levels of competence is quite arbitrary, despite what some experts
would have us believe. In some cases, a simple three-way system is all that is
required. In other situations, a more refined system with as many as seven or
more levels may be more appropriate. The number of levels depends largely on
the purpose of the assessment, how easy it is to differentiate the various levels,
and how well trained the assessors are in the assessment process.
Dubois and Rothwell (2000) argue that not all competencies are of this
technical/functional kind, and identify a class of higher-level
competencies focusing on softer issues such as people management
skills. These higher level or
As Dubois (2005, pp. 4–5) points out, within the general management
arena many of these generic interpersonal competencies can be
described within the broad notion of emotional intelligence. He goes on
to give several examples of these higher-order or personal functioning
competencies as follows:
It remains true, however, that as one moves up the job hierarchy, the
emphasis increasingly shifts from technical and functional competencies
toward the higher-order, more interpersonal, professional and
managerial competencies. (For a look at other competency models, see
Chapter 17, section 17.2.1 and Tables 17.2 and 17.3.)
One final point to note in this respect is that competent workers tend to
be promoted to just beyond the level of their competence. This is
referred to as the Peter principle, which states that the members of an
organisation where promotion is based on achievement, success and
merit will eventually be promoted beyond their level of ability. Sooner
or later people are promoted to a position at which they are no longer
competent (their “level of incompetence”), and there they remain, being
unable to gain further promotion. In other words, employees tend to rise
to their level of incompetence. This view was first formulated by Peter
and Hull (1969).
Our research project identified three prime issues relating to the cross-
cultural use of an organisation’s competencies. First, clear links were
found between cultural background and people’s perceptions of what
good performance looks like in terms of interpersonal and social
competencies – for example, the commonly found competencies of
“leadership” and “team working”.
And third, the research found that, while managers from Germany, the
UK, Italy and the USA generally showed agreement about what
constitutes effective behaviour, there were also some clear areas of
disagreement. … [S]ubtle important differences of perception
concerning individual behaviours frequently mean that direct
comparison between candidates from specific countries are subject to
systematic bias (pp. 19–20).
12.10 Summary
Additional reading
The work by Dubois is a sound introduction to competencies in the workplace. See, for
example, Dubois, D.W. (2005). What are competencies and why are they important?
and Dubois, D.W. & Rothwell, W.J. (2000). The competency toolkit.
For a sound critique of the competency concept, see Furnham (2008), Chapter 11. He
has also produced an excellent book (2003) entitled The incompetent manager.
Short paragraphs
1. Define competence and discuss the basis for choosing a specific number of
competency levels.
2. Competencies are the same in every culture – there are no differences between
various cultural groups. Discuss.
Essay
Describe how you would set about drawing up a set of competencies for the press
officer of a national sports team (e.g. rugby, cricket or football).
13 Assessing integrity and
honesty in the workplace
OBJECTIVES
13.1 Definition
Until the late 1980s, the most widely used method of identifying and
dealing with workplace dishonesty was the polygraph examination
(popularly known as a “lie detector test”), and these tests were often an
important consideration in decisions of whether to hire or fire specific
individuals. However, their use was highly controversial, both because
of doubts about their validity and because of concerns over invasions of
the privacy and dignity of people being examined (Murphy, 2000). The
Employee Polygraph Protection Act of 1988 placed severe restrictions
on the use of the polygraph in the workplace, and this method was
abandoned by most employers. Employers who had relied on polygraph
examinations and other potentially invasive methods sought alternatives
for dealing with workplace dishonesty. Integrity tests have been
embraced by many organisations either as a replacement for the
polygraph or as a selection tool in its own right (Coyne & Bartram,
2002).
The covert approach is the most widely used, and many test
producers/distributors have spent time and effort on isolating deviant
profiles from their broad-spectrum personality scales. In some cases,
dedicated integrity tests have been developed – these are all strongly
reminiscent of existing normal personality scales (such as the 16PF) or
more clinically focused measures such as the MMPI. These models are
based on a relatively narrow definition of deviance, which is essentially
seen as fraud/theft. However, given the acknowledged violence in this
country, there are numerous other aspects which should be taken into
consideration. Robbery with violence, intimidation, outbursts of
violence, uncontrollable anger, addiction, gambling and organised
crime, to name but a few, suggest that the dominant characterisations of
deviance and its opposite, integrity, need to be broadened. In this
process a much broader range of indicators of deviancy and/or risk of
deviancy need to be explored.
There is strong evidence that both overt and personality-based tests are
related to the broad five-factor model traits of Agreeableness,
Conscientiousness and Neuroticism (or adjustment) – the ACN factors
(Hough & Oswald, 2000; Wanek, Sackett & Ones, 2003). These same
traits have been associated with individual-level counter-productive
behaviours (Berry, Ones & Sackett, 2007) and national-level corruption
rates (Connelly & Ones, 2008). Hough and Oswald also show that meta-
analysis (the combination of a large number of related studies) indicates
that “integrity and conscientiousness tests usefully supplement general
cognitive ability tests when predicting overall job performance.
Converging evidence exists for the construct, criterion-related and
incremental validity of integrity tests” (Hough & Oswald, 2000, p. 6).
Within five years, the use of the polygraph for the purpose of pre- and
post-employment screening was prohibited, albeit with some
exemptions. The use of polygraphs is also prohibited for general use in
South Africa. This near prohibition of polygraph testing has led to the
widespread use of the paper-and-pencil integrity test.
Relatively young
Single
Few social bonds (shown by socialisation/family background
conditions, frequent changes of address)
Use of drugs (both recreational and as a result of dependency)
Various indicators of mental instability such as having been a patient
in an institution for the treatment of mental, emotional or
psychological disorders
Poor quality and duration of education
Low work ethic
Poorly developed moral values
Of course, it is not the presence of one or two of these factors alone, but
a consistent pattern or profile that is most often associated with a risk of
lack of integrity.
13.3.1 Reliability
The reliability of a testing method refers to the extent to which a
test-taker’s score can be relied upon. Reliability can take the form of
internal consistency (the degree to which responses to items are
interrelated) or stability (the tendency to obtain the same score over a
number of trials or with different assessors).
Empirical studies have clearly illustrated the reliability of integrity tests
in terms of internal consistency and test-retest (values consistently above
0,7). Based on a meta-analytic review, Ones, Viswesvaran and Schmidt
(1993) report a mean alpha of 0,81 and mean test-retest of 0,85 for
integrity test reliability.
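The two forms of reliability described above can be estimated directly from response data. The following is a minimal sketch using hypothetical scores (not data from the studies cited): Cronbach’s alpha for internal consistency, and a Pearson correlation between two testing occasions for test-retest stability.

```python
# Illustrative sketch with hypothetical data: estimating the two forms
# of reliability discussed above for a short scale.

def cronbach_alpha(items):
    """Internal consistency: items is a list of per-item score lists,
    aligned across the same test-takers."""
    k = len(items)                          # number of items
    n = len(items[0])                       # number of test-takers
    def var(xs):                            # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = sum(var(it) for it in items)
    totals = [sum(it[p] for it in items) for p in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

def pearson_r(x, y):
    """Stability: correlate scores from two testing occasions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical responses: 3 items answered by 5 test-takers
items = [[4, 5, 3, 4, 2], [4, 4, 3, 5, 2], [5, 5, 2, 4, 3]]
print(round(cronbach_alpha(items), 2))  # → 0.89

# Hypothetical test and retest totals for the same 5 people
t1 = [13, 14, 8, 13, 7]
t2 = [12, 14, 9, 13, 8]
print(round(pearson_r(t1, t2), 2))      # → 0.98
```

Both values comfortably exceed the 0,7 benchmark mentioned above, which is the pattern Ones et al. (1993) report for integrity tests.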
13.3.2 Validity
Internationally, the validity of integrity assessment tools is problematic
as honesty and integrity are extremely difficult constructs to define with
enough precision to allow empirical measurement of those constructs.
Ones et al. (1993) conducted a massive meta-analytic study in which
they analysed the results from 665 validity studies. This is the largest
and most comprehensive study of its kind to date. For the prediction of
counterproductive work behaviours (CWBs) other than theft, Ones et al.
(1993) reported a validity of 0,39 for overt tests and 0,29 for
personality-oriented measures. For theft in particular, both overt and
personality-oriented tests had validity coefficients of 0,33, so there
appears to be little basis on which to choose one type of test over the
other. Integrity tests therefore had good validity with
regard to the prediction of CWBs. In addition to the CWBs, integrity
tests have also been found to predict job performance (Ones et al.,
1993). In fact, integrity tests are the “personnel selection method with
the greatest incremental validity in predicting job performance over
cognitive ability” (Berry, Ones & Sackett, 2007, p. 2). For both overt
and personality-oriented measures, Ones et al. (1993) reported
coefficients of 0,41 for the prediction of job performance.
13.3.3 Scope
Scope refers to the range of attributes covered by the assessment method
and how focused or general the method is. The method may cover a
detailed aspect of a specific attribute or a general overall picture.
Integrity tests were initially designed to measure the specific construct
of honesty and employee theft, and as such are generally narrow and
specific in terms of scope. However, this is especially true of overt
integrity tests, which tend to be much more specific in their approach
and typically comprise subscales that measure specific attributes such as
predisposition to theft, past theft and drug abuse. The covert
(personality-oriented) and sociological profiling approaches are more
concerned with identifying overall levels of counterproductive
behaviour rather than just theft or drug abuse, and are therefore far
broader in design and outcome. Integrity International, based in
Johannesburg, has adopted this broad approach to assessing integrity.
Equal opportunity laws in many countries prohibit the use of tests in a
manner that discriminates unfairly against protected groups of the
population (defined, for example, by gender, race, age, disability or
religion).
Adverse impact in itself is not unfair but it provides initial evidence
for indirect discrimination. The question arises then as to whether
integrity tests show adverse impact. Qualitative reviews have
suggested that no adverse impact is seen for integrity test scores
(Goldberg et al., 1991, Sackett, Burris & Callahan, 1989). However,
as Ones and Viswesvaran (1998) point out, studies looking at this
issue have tended to confuse adverse impact with intergroup
differences. Adverse impact relates to the use of the integrity test in
occupational settings, whereas group differences focus on whether a
bias occurs within a scale. Yet by looking at group differences within
an integrity test, information regarding the likelihood that the test
would cause adverse impact (so long as selection decisions were
based only on that specific test) can be obtained. Ones and
Viswesvaran (1998) examined group differences by age, gender and
race on overt integrity tests in a sample of 724 806 job applicants.
Effect sizes (differences between groups in terms of standard
deviation units) showed females scored 0.16 SDs higher (more
positive) on integrity tests than males and that those 40 and over
scored 0.08 SDs higher than those under 40. Further, Blacks and
Asians scored 0.04 SDs lower than Whites, American Indians 0.08
SDs lower and Hispanics 0.14 SDs lower than Whites. From this they
argue that differences between age, gender and racial groups on
integrity test scores are minor especially as values of 0.2 or lower are
considered to be small (Cohen, 1977). Previous research appears to
illustrate the lack of bias and by implication adverse impact of
integrity tests. Indeed, Arnold (1991) argues that the statistical record
of honesty/integrity tests, which illustrate their freedom from adverse
impact, cannot be matched by any other selection technique (pp. 7–8).
One problem is that integrity assessments tend to have high failure
rates: their scoring systems are often so stringent that honest
applicants are rejected (false positives). This may result in employees
who should be selected being excluded for the wrong reasons – a truly
honest person may truthfully answer items in ways that impact negatively
on his score, or may answer the assessment so accurately that he is seen
as being “too good to be true”.
13.4.1 Consistency
Consistency measures are designed to ensure that the person being
assessed remains alert and consistent in his responses. Simply put, one
item may consist of a simple statement such as “I like ice cream”, and
then later in the survey the same statement or its opposite is put forward
(“I do not like ice cream”). In scoring the test, pairs of items of this kind
are examined and the degree to which the person is consistent is noted.
If the consistency score is too low, this would suggest that the
participant was not giving the task at hand due consideration.
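The paired-item check just described can be sketched as follows. The items, the reverse-keying rule and the tolerance of one scale point are hypothetical illustrations of the general idea, not a prescribed scoring scheme from any particular test.

```python
# Illustrative sketch (hypothetical scoring rules): the paired-item
# consistency check described above. Answers are on a 1-5 agreement
# scale; each pair holds the answers to a statement and to its later
# repeat or reversal.

REPEATED = "repeated"   # the second item restates the first
REVERSED = "reversed"   # the second item states the opposite

def consistency_score(pairs):
    """Return the proportion of item pairs answered consistently.
    Each pair is (first_answer, second_answer, kind)."""
    consistent = 0
    for first, second, kind in pairs:
        if kind == REPEATED:
            ok = abs(first - second) <= 1          # near-identical answers
        else:
            ok = abs(first - (6 - second)) <= 1    # reverse-key the second item
        consistent += ok
    return consistent / len(pairs)

# "I like ice cream" answered 5, "I do not like ice cream" answered 1:
# perfectly consistent once the second item is reverse-keyed.
pairs = [
    (5, 1, REVERSED),
    (4, 4, REPEATED),
    (2, 5, REPEATED),   # inconsistent: the answer shifted 3 points
]
score = consistency_score(pairs)
print(round(score, 2))  # → 0.67
```

A score well below 1,0, as here, would flag the record for inattentive responding in the way the paragraph above describes.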
In their study, Espinoza et al. (2012) compared groups from the US,
Mexico and China on a measure of social desirability (SD). The results
indicating the
proportion of their sample giving specific answers are shown in Table
13.1. Note that items marked with an asterisk (*) are negatively phrased
and are expected to be disagreed with, whereas those without an asterisk
are expected to be agreed with.
13.5 Summary
Integrity in both its forms and its opposites reflects the degree to which
people follow the precepts laid down by their family, community and/or
religion/moral guides. These may differ across different groups and
cultures, and what may be acceptable in one setting may be taboo in
other settings. The reasons for the breakdown of integrity, whether this
is seen as dishonesty, counter-productive work behaviours (CWBs) or
corruption can be accounted for at both individual and sociological
levels. Individual accounts look at aspects such as personality, moral
development, socialisation practices and even a sense of entitlement.
Sociological theories look at cultural explanations in terms of Hofstede’s
dimensions of culture, the differences between guilt and shame cultures
and “caring and sharing” cultures, and principle vs relationship cultures.
Although the concept of ubuntu has numerous advantages, its downside
is that it may predispose people to focus on relationships rather than
principles – and at the end of the day, integrity is adherence to
principles even when this is inconvenient and against the short-term
interests of the person and his community. At the same time, it appears
that there has been a general “dumbing down” of integrity in many
places in the world – pop idols and sports heroes are looked up to as
they violate society’s norms in line with the anti-establishment bravado
of the “brave new bling world” that is emerging.
Additional reading
Alliger & Dwight’s (2000) investigation into whether integrity tests can be faked or
modified through coaching is an interesting look at the assessment of integrity using
tests, while Berry, Sackett & Wiemann (2007) explore various developments in integrity
testing that have occurred in the new millennium.
Fine (2010) takes a close look at the origins of corruption, linking it to various
personality and sociocultural factors.
Whiteley’s (2012) paper on whether Britons are becoming more dishonest looks at the
effects on British people of the general approval of low-integrity behaviour in which
programmes such as Footballers’ Wives and the “bling world” they live in are seen as
something to aspire to.
Short paragraphs
Essays
1. Discuss integrity and the methods used in its assessment. Outline some of the major
issues associated with this in a multicultural context and comment on the fairness of
the different ways of assessing it.
2. Describe the psychometric properties of the various methods of assessing integrity.
SECTION
4
OBJECTIVES
The most important reasons for obtaining this information include the
following:
If we look closely at these six aspects, we see that they boil down to
three basic functions: describe (assess the current situation), decide (use
the information to make certain choices and implement actions), and
develop (use the information to try to change the situation for the better).
Although this matrix is not exhaustive and other types of assessment are
available, Table 14.1 gives some indication of the range of situations in
and for which assessment is useful. We will examine each of these in
turn, although there will be some overlap between categories. However,
our main concern is with the primary beneficiaries of the assessment.
For example, selection is the process of matching an individual to a job.
Although this does benefit the individual (he is offered the job), the
primary beneficiary of the process is the organisation, because it has
someone to carry out its objectives. On the other hand, assessing a
person so that he understands himself better and can make more
informed decisions about his career or job-related strengths and
weaknesses is of prime interest and concern to the person involved.
14.2.1 Definition
Selection is the process of matching people to job requirements in order
to meet organisational objectives, both current and in the longer term.
As part of this process, a wide range of attributes is assessed. These
include knowledge, ability or aptitude, personality, potential, leadership
potential, learning style, management style and communication style, to
name but a few. Chapter 12 illustrates how all these KSAVs
(knowledge, skills, attitudes and values) form the competencies required
for a job. They are all important because they determine how well the
person will do in his job, whether he will benefit from training and
development, and how well he will fit into the organisation. (See, for
example, Furnham, 2003.)
When this is scored, it yields, among other things, information about the
psychological characteristics associated with the job (in terms of a
modified list of Thurstone’s primary mental abilities or PMAs – see
section 10.3.2). It also provides a list of tests that can be used to assess
these abilities and the score ranges on these tests that have been shown
to have high predictive validity for a specific job. Although these tests
are American, it is relatively easy to find local equivalents and derive
appropriate cut-off scores for them.
Saville & Holdsworth Ltd (SHL) have a similar system known as the
Work Profiling System (WPS). It is designed to help employers
accomplish many of the human resources functions outlined above. The
job analysis component yields reports aimed at various human resources
functions such as individual development planning, employee selection
and job description. There are three versions of the WPS linked to
different types of occupation: managerial, service and technical
occupations. The WPS is administered by computer at the company’s
premises. It contains a structured questionnaire which measures ability
and personality attributes in areas such as hearing, sight, taste, smell,
touch, body coordination, verbal skills, number skills, complex
management skills, personality and team role. Unlike the PAQ, which is
scored at Purdue University in the US, SHL does not require WPS
users to submit their data – the WPS is scored on site via the Internet.
From Figure 14.1 we see that the overall validity of the assessment
technique (G) depends on the degree of overlap between the three
circles, A, B and C. The more accurate the job description and selection
criteria are (i.e. the overlap between circles A and B), the greater the
predictive validity (G). Similarly, the greater the construct validity of the
battery (i.e. the overlap between circles B and C) is, the greater will be
the size of G. Finally, the more accurately the outcomes of the selection
technique reflect the job requirements (i.e. the degree of overlap
between circles A and C), the greater will be the predictive validity of
the selection process.
From this we see that in order to increase the overall validity of the selection
process (G), we must find ways of increasing the amount of overlap between the
three circles A, B and C.
In addition to these direct costs, poor selection can result in quality and
safety being compromised, increased risks of injury, underspending of
allocated budgets, and under-delivery of vital services, to name but a
few. George and Reiber (2005) argue that the cost of replacing a single
employee can vary between one-and-a-half and three times his annual
salary. Harvard Business School puts this at three to five times the
annual salary. These costs are made up of factors such as the following:
Table 14.3 shows the various levels of work, together with the level of
capability required to effectively manage tasks at a particular level.
14.3.5 Trajectories
A key aspect of the theory is that the career paths of people are generally
locked on to particular trajectories (or “modes” – sometimes called
“growth curves”) that are determined largely by the complexity of their
cognitive processes. These trajectories are empirically determined on the
basis of experience with a large number of organisations. Typically,
organisations such as the Dutch oil company Shell evaluate their
managers every six months or so in terms of where they will likely end
up in the organisation. For example, if a manager is not at the level of
general manager (level 4) by age 35, there is very little chance that he
will ever become managing director of the organisation. This is shown
in Figure 14.4.
However, CPA does make allowance for this and provides an indication
of an individual’s capability to generate, understand and act in contexts
where prior knowledge and experience may no longer be applicable.
CPA provides an understanding of the nature of freedom the person
requires to act appropriately, as well as the value and type of work
contribution he is likely to make at various levels in the organisation.
The process is therefore able to identify potential for advancement and
suggest the best fit between capability and the demands of the
organisation. It creates mutual benefit for the individual and the
organisation as it predicts the person’s capability to generate
contextually appropriate solutions and decisions even in the absence of
previously acquired knowledge, skills and experience.
Work by people such as Stamp and Retief (1996), and Mauer (2003a,
2003b, n.d.) has demonstrated the test-retest and inter-scorer reliability
as well as the construct validity and cultural fairness of the CPA. It is
also one of the few assessment methodologies to have been the subject
of a full independent validation at national research institute level.
Table 14.4 When to use and not to use advanced selection techniques
Barrick et al. (1998) have shown that conscientiousness (one of the Big
Five dimensions) predicts task performance in a team setting, especially
where team members contribute independently to the outcome. They
found that where the team performs in such a way that interpersonal
conflict is possible, agreeableness is a better predictor of team success –
one disagreeable person is often all that it takes to disrupt team
performance. As Hough and Oswald (2000, p. 645) show, this
consideration illustrates the importance of selecting people with
acceptable levels of interpersonal skill. However, the level of
interpersonal skill required also depends to a large degree on the role the
person plays in the team, as shown in the next paragraph.
Belbin’s Team Role Inventory (Belbin, 1981) is a useful device both for
selecting team members and for team-building exercises. The scale
consists of eight different groups of seven items (56 items in total)
which determine which of nine different roles individuals prefer to play
in a team situation. These roles are as follows:
14.6.2 Training
Most organisations pay large sums of money for the training and
development of their employees. In South Africa, the Skills Development
Levies Act 9 of 1999 requires organisations to pay one per cent of their
wage bills for this purpose. As a result, many organisations make a conscious effort to
evaluate the effectiveness of their training efforts. Although there are
numerous methods for doing this, the best known is Kirkpatrick’s four
levels of training evaluation (Kirkpatrick, 1996). These levels are:
1. Reaction. Did the trainees enjoy the course, was the material well
presented, etc.?
2. Learning. Did the trainees learn (and retain) significant new
competencies?
3. Behaviour change. Do the trainees do things differently as a result
of this learning experience?
4. Impact. Has the training had any measurable effect on quality,
productivity, safety or any other (specified) indices of organisation
effectiveness? In other words, has the training affected the bottom
line to any meaningful extent?
Much of this chapter, and indeed much of the book, has assumed that we
are able to quantify both the predictors (test and other assessment
scores) and the criteria (job satisfaction, performance productivity, and
so on). However, we do note in Chapter 5, section 5.4, that one of the
difficulties associated with validating assessment techniques is the so-
called criterion problem*. This issue is of great relevance to
assessment in organisations and needs to be dealt with at this point.
Although all three methods can be used, they involve different amounts
of work: forced distribution methods are the easiest; paired comparisons
are the most difficult.
There are, however, several problems associated with rating scales. The
first is the global nature of many of the dimensions being evaluated,
which results in high levels of ambiguity – it is never clear exactly what
behaviours are included in the dimensions being rated. What exactly do
“Communication”, “Relations with others” or “Quality of work” mean?
Secondly, there is the halo effect, in which the rating given on one
dimension is influenced by the rater’s impression of other, irrelevant
dimensions.
14.9 Summary
Additional reading
Essays
1. Using a suitable diagram, explain the relationship between job requirements, what an
assessment technique claims to measure and what it actually measures. What are
the implications of this for a fair and valid selection process?
2. Describe what is meant by career path appreciation.
3. Besides selection, what are the other uses of assessment in organisations? Who
should carry out these assessments and what techniques are available?
15 Assessment for career
counselling
OBJECTIVES
15.1 Introduction
The benefits for companies from adopting these new forms of work
organisation include
the protection and creation of jobs, including new types of jobs that
match the aspirations of employees
improvements in the quality and the content of work itself.
Clearly the nature of work is changing, and will continue to do so as
new technologies change what we do and how we do it. New careers are
constantly emerging and these require new skills and attitudes, and
career counselling has to keep abreast of these developments if it is to
remain effective and relevant in post-modern society. We also have to
recognise that people are likely to change their jobs a number of times in
their lives – according to Naicker (1994), people in the major developed
economies change careers an average of five times during their working
lives. Savickas (2006) asserts that individuals in the US born between
1957 and 1964 had an average of ten jobs between the ages of 18 and
38. This creates a whole new spin on career counselling, stressing the
need for critical thinking and multiskilling to allow people to move
relatively easily and seamlessly between jobs and economic sectors.
Post-modern approaches of “life design counselling” and other
techniques such as “lifeline”, “collage”, “role identification” and
“fantasy” may be useful in helping people design their lives, but in the
absence of meaningful opportunities to exercise one’s talents and
preferences, no amount of “interactive group discussions that focus on
the personal application of these techniques to enhance the life design
process” (Zunker, 1998) will be of any use. (For an explanation of these
terms and how they help in “the construction of meaning through
communication”, the reader is referred to Maree, 2010, p. 363).
Although there is room for both approaches, this book is rooted in the
modern and positivistic (as against a post-modern) paradigm, and so the
traditional selection approach rather than the interpretative facilitation
model is adopted. Nevertheless, psychometric assessment can be used to
assess individual abilities, flexibility, personal styles and preferences as
a basis for further exploration and social construction. As is shown
below (section 15.3.2), people are drawn to, and are most comfortable
in, work environments that match their temperaments and personal
styles. While a “people” person may move between teaching, human
resources management and running a B&B, he will be far less likely to
move to a career in bookkeeping or technical drawing. Even in a post-
modern world, people still need to find areas in which their skills and
preferences can be fully exercised. It is an axiom of modern biology that
an organism must be able to adapt to its environment or move
elsewhere, or else it will die! Although humans can adapt to hostile
environments, like most creatures they prefer to be within their comfort
zones for most of the time – the energy costs of continually fighting
against an unfriendly environment are too high for this to be sustainable.
What is a job?
What is a career?
How does one go about advising people about possible careers and
selecting them for these?
This definition is less than adequate because it fails to take into account
that a career consists of a series of related jobs – being a butcher, a baker
and a candlestick maker are three careers, not one. Suppose a teacher
became HoD and then decided to go into industry as a human resources
manager, or to sell houses or to run a guesthouse. Clearly, this is a
change of career. Each new career may have its own promotion route.
However, in most cases the skills and experience gained in one career
(e.g. teaching) may not be very useful in the new career (such as selling
houses). In other words, a career can be seen as a family of related jobs
with some kind of hierarchy so that people are seen to “progress” as
they move to increasingly senior levels within this family of jobs.
One result of moving between career routes too often is that a person
does not gain enough experience in one particular sphere to move up the
career ladder. In the past, changing jobs too often was frowned upon – it
was seen as job hopping. If a person applied for a job and the
interviewer saw that the applicant had had four or five jobs in the last
ten years, he would become very suspicious. If job applicants were seen
to be job hoppers, it would count against them. However, things have
changed somewhat.
There are many different jobs and career routes, and so a person needs
to find out more about what is available. If a person is unaware of
certain possibilities, he will, through sheer ignorance, be unable to
consider them at all. It is well established that the best single predictor of a
person’s career is the career followed by his parents – teachers tend to
produce teachers, doctors tend to have children who become doctors,
accountants produce accountants, motor mechanics have children who
become motor mechanics, and so on. This happens because the parents
provide information about the kinds of things they do (and not about
other things) and act as role models for their children. Parents shape the
interests, attitudes and values of their children, and these then shape the
career patterns of the next generation.
If we look for our careers in the light of the knowledge we have, we may
well be looking in the wrong place.
An important aspect of the model is that these six factors are arranged in
a hexagon or circle. This means that the factors next to each other are
fairly similar and more likely to occur in a single job. For example,
teaching involves “social” and some “artistic” and “enterprising”
(persuading) ability. On the other hand, factors that are opposite each
other in the model are dissimilar in real life and seldom occur
in the same job. For example, bookkeepers and accountants have to
follow rules and conventions (such as the double entry system) and in
general are not very artistic or social in their work. The same can be said
for engineers, although they are more practical and investigative rather
than conventional (even though the rule-following aspect of this
dimension should not be overlooked). This model is shown in Figure
15.1.
Of course, most jobs require more than one characteristic, and so jobs in
general are described using a three-letter label to reflect their three most
important factors: an English teacher is SAE (social, artistic and
enterprising); an airline pilot is RIE (realistic, investigative and
enterprising) and a surgeon is ISR (investigative, social and realistic).
These three characteristics need not be in adjacent “blocks”, as can be
seen in the case of the pilot and the surgeon. There are various
directories, for example Holland’s The occupations finder, in which a
large number of jobs are rated on their three most important dimensions.
However, these directories tend to become outdated fairly quickly and
many new jobs such as professional sportsman, call-centre operator and
Internet service provider (ISP) operator or administrator are not listed.
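The hexagonal logic described above – adjacent types are similar, opposite types dissimilar – can be made concrete as a circular distance between the six letters. This is a minimal sketch assuming the standard R–I–A–S–E–C ordering around the hexagon; any scoring built on these distances would be purely illustrative:

```python
# Sketch of Holland's RIASEC hexagon: the circular distance between two
# type letters (0 = same, 1 = adjacent, 2 = alternate, 3 = opposite).

TYPES = "RIASEC"  # the six factors in their hexagonal order

def hexagon_distance(a: str, b: str) -> int:
    """Number of steps around the hexagon between two RIASEC letters."""
    i, j = TYPES.index(a), TYPES.index(b)
    d = abs(i - j)
    return min(d, 6 - d)  # go the shorter way around the circle

print(hexagon_distance("S", "A"))  # social/artistic are adjacent -> 1
print(hexagon_distance("R", "S"))  # realistic/social are opposite -> 3
```

This mirrors the text’s examples: the “social” and “artistic” components of teaching sit next to each other, while realistic and social occupations lie on opposite sides of the hexagon and rarely combine in one job.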
We must also not forget that there are similar jobs that exist along a
single dimension, but vary according to the level of education required.
A good example of such a hierarchy of jobs is in engineering, with jobs
ranging from unskilled labourers, to semi-skilled artisan aides, to skilled
artisans, to technicians, to graduate engineers and finally to high-
powered consulting engineers with Master’s degrees or doctorates. This
means that, although a person may be interested in, say, the engineering
field, the exact level to which he aspires will depend on his evaluation of
himself and his ability to succeed at a particular level. A person who
feels that tertiary education is beyond him or who cannot afford it may
opt for the artisan or technician route. On the other hand, the person who
believes he has the potential to acquire a primary or higher degree and
can afford the university fees would go the university graduate engineer
route. The implications of different educational levels for assessment are
that we need to assess the person’s ability and likelihood of succeeding
at a given level within a career family. (This issue is dealt with in
15.4.1.)
1. Technical/functional competence
2. General management competence
3. Autonomy/independence
4. Security/stability
5. Entrepreneurial creativity
6. Service/dedication to a cause
7. Pure challenge
8. Lifestyle
The meanings and implications for people with these career anchors are
given below (see also Schein, 1995).
Now that we have a reasonable idea of how jobs and careers are
structured in terms of the six RIASEC dimensions and four educational
levels, the next step is to assess the presence and relative strength of
each of these factors in a person. In general, five different sets of
competencies exist:
1. Abilities
2. Interests
3. Personality
4. Values
5. Motivation and/or ability to study further
15.4.1 Ability
Abilities can be divided into two broad categories, namely general
ability (often called intelligence or “g”) and specific abilities or
aptitudes.
15.4.1.1 General ability
Because general ability (“intelligence”) plays an important role in
determining the level of education that can realistically be achieved, any
form of assessment for career purposes should include some general
measure of this. Numerous scales (both verbal and non-verbal) are
available. A relatively simple and culture-fair measure is Raven’s
Standard Progressive Matrices, administered as a timed test with a
30-minute limit. It can be used over a relatively broad educational range (to
university level), provided suitable norms are available.
15.4.2.1 Values
According to Katz (1983, p. 17), “… values represent feelings (and
judgment) about outcomes or results, such as the importance, purpose or
worth of an activity”. Values therefore lay the foundation for our
attitudes, motivation and job satisfaction (Robbins, 1996, p. 174). Nevill
and Super (1986) argue that one of the main reasons for job
dissatisfaction is a poor match between the satisfaction of work-related
needs and post requirements (cited in Langley, Du Toit & Herbst, 1995,
p. 3). It follows from this that the process of values clarification is very
important in career planning, and therefore any technique that helps
people clarify what they value in terms of work style will assist them in
making more fulfilling and rewarding career and employment decisions.
15.4.2.2 Interests
Many of the scales that measure career interests are based on Holland’s
RIASEC model of six vocational personality types. However, there are a
number of other occupational interest scales such as the Kuder
Occupational Interest Survey (Kuder), the 19-Field Interest Inventory
(19FII), the Jackson Vocational Interest Survey (JVIS) and the Strong
Interest Inventory (SII). These scales measure people’s interests in a
broad range of occupations, work activities, leisure activities and school
subjects, and compare these with the interests of people successfully
employed in the particular occupations. Organisations such as SHL and
Psytech have their own scales such as the Occupational Interest
Inventory and the Occupational Interest Profile. They produce scores on
various dimensions relevant to choosing a career.
Research (e.g. Haverkamp, Collins & Hansen, 1994; Ryan, Tracey &
Rounds, 1996; Fouad, Harmon & Borgen, 1997; Day & Rounds, 1998)
suggests that there is a certain universality to vocational interest
structure, such as that expressed by Holland’s vocational personality
types, which cuts across ethnicity, gender and socioeconomic status.
This conclusion is not unanimous, however (see, for example, Rounds &
Tracey, 1996).
15.4.3 Personality
Personality plays an important role in career choice, because it
determines the kinds of situation in which a person is most comfortable.
As Chapter 11 states, a good definition of personality is that it is the
preferred way of perceiving and interacting with the environment. In
other words, it is our personality that directs our ongoing attention to
various aspects of our world, and thus has a major role in determining
both what kind of work we choose and how happy we will be in
particular situations. For example, an outgoing energetic personality will
not be happy in a situation that requires close attention to detail over a
long period of time. Similarly, a shy, retiring person will not be happy or
successful in situations that require high levels of drive and enthusiasm.
As a result, an important aspect of advising a person about his career
choice is to assess his personality. This topic is dealt with extensively in
Chapter 11. Various assessment measures can be used, including those
based on the 16PF and Jung’s theories. SHL has developed a
comprehensive instrument called the Occupational Personality
Questionnaire (OPQ), which is very useful in this context, although it is
computer administered and scored at a cost.
There is some confusion about whether the spelling of these terms is “extravert”
or “extrovert” and “intravert” or “introvert”. Both are acceptable, and in this text
“extravert” and “introvert” are used.
Intuitives like to read between the lines and look for meaning among the
hard facts. They use their imagination, and trust their own intuition and
hunches. They see the big picture and are oriented towards the future
and what might be, rather than what is. They focus on possibilities,
meanings and relationships, and tend to be more concerned with broad
principles and patterns than with fine detail. Intuition allows perceptions
beyond what is visible to the senses, and tends to focus on the abstract
and creative future. Obviously, we all use our five senses to relate to the
world, but each of us also has a preference for the kind of information
we take in.
Feelers consider how much they care about an issue and what they feel
is right; they pride themselves on their empathy and compassion. They
come to decisions by weighing relative values and the merits of the
issues in a situation. They have a capacity for warmth, human concern
and preservation of traditions and values of the past. They are more
subjective and person oriented in their approach to decisions.
Implications of T/F for careers
Thinkers prefer work that allows them to analyse things logically and to
apply objective criteria to decision making. They need to work in
situations where they can use logical analysis to reach conclusions and
are able to base their decisions on the principles involved in the
situation. Because of this approach, they tend to decide impersonally,
sometimes paying insufficient attention to other people’s wishes and
sometimes inadvertently hurting people’s feelings. They tend to have
firm views, and can give criticism when appropriate – this allows them
to work in situations where there is tension and little harmony. They feel
rewarded when a job is done well, whether or not the people involved
are happy with the process. They are best suited to jobs where principles
and precedent count more than relationships and harmony.
However, they tend to leave issues open for last-minute changes and to
postpone decisions while searching for options. They feel restricted
without change and are inclined to postpone doing unpleasant tasks.
This creates problems if they are in a situation where tight deadlines and
routine decisions have to be made. They also use lists to remind them of
all the things they have to do. If they have the ability, perceivers do well
in the “investigative” and “artistic” professions related to the categories
in Table 15.1.
Type Role
ISTJ The Duty Fulfillers
ESTJ The Guardians
ISFJ The Nurturers
ESFJ The Caregivers
ISTP The Mechanics
ESTP The Doers
ESFP The Performers
ISFP The Artists
ENTJ The Executives
INTJ The Scientists
ENTP The Visionaries
INTP The Thinkers
ENFJ The Givers
INFJ The Protectors
ENFP The Inspirers
INFP The Idealists
15.5 Summary
For an overview of the management of the career process, see Chapter 12 in Crafford,
A., Moerdyk, A.P., Nel, P., O’Neill, C. & Schlechter, A. (2006). Industrial psychology:
Fresh perspectives.
People interested in using testing for employment equity purposes are referred to the
Public Service Commission (of Canada) (2006), Standardized testing and employment
equity career counselling: A literature review of six tests. The purpose of this document,
developed by the Personnel Psychology Centre (PPC) of the Canadian Public Service
Commission, is to summarise research evidence in respect of the use of standardised
tests and other assessments in the career counselling of employment equity (EE)-
designated group members (i.e. aboriginal peoples, persons with disabilities, visible
minorities and women). Although not directly applicable to the South African situation, it
raises numerous issues of importance to the local context. This is available at
http://www.psc-cfp.gc.ca/ee/eecco/intro_e.htm
Pearson Publications offers a career assessment tool known as IDEAS™ (Interest
Determination, Exploration and Assessment System®), which is designed to be used in
conjunction with career exploration and guidance units. According to the brochure, the
IDEAS inventory offers 16 basic scales based upon the RIASEC themes, and helps
both students and adults to develop an awareness of possible career choices. The
IDEAS test can be accessed at http://www.pearsonassessments.com/tests/ideas.htm
Short paragraphs
Essay
According to Holland, career choice and success are based on a match between an
individual’s personality and the characteristics of the job and/or career. Discuss this
argument, showing how it should be used in assessing a person seeking career advice.
16 Interviewing
OBJECTIVES
16.1 Introduction
16.1.1 Definition
According to Cohen and Swerdlik (2002, p. 410), “[a]n interview is a
technique for gathering information by means of discussion”. In general,
interviews are relatively unstructured procedures designed to obtain
information about an individual that is not readily available via more
formal psychological assessment techniques. There are various forms of
interviews, ranging from the completely unstructured approach to highly
structured approaches. Interviews are used for both diagnostic and
assessment purposes in the work situation. Although they have elements
in common, in some respects these two applications are very different.
16.3.1 Reliability
There is a great deal of evidence to show that interviewers in general are
relatively consistent in their assessment and evaluation of people if the
interview is repeated (or a video of it is looked at several times). In other
words, there is high test-retest reliability. However, especially in
unstructured interview situations, there is little agreement between
interviewers, resulting in low inter-rater reliability. Unless, firstly, the
interview is carefully structured beforehand; secondly, interviewers are
trained to follow the structure and to record information in a consistent
way; and thirdly, the interviewers follow the structure closely, there is a
strong possibility that the interview will be unreliable. If these
conditions are not met, no two interviewers will cover the same ground,
resulting in poor internal consistency.
Remember – any assessment that is not reliable cannot be valid!
16.3.2 Validity
Although clinicians regard the interview as an extremely useful source
of information, there is little evidence that the clinical or counselling
interview is either reliable or valid. Yet it continues to be used –
possibly because clinicians feel that it ought to work, and because, given
their training, clinical psychologists should be better than anyone else at
observing and judging another person’s behaviour. The evidence runs
counter to this. As early as 1954, Meehl showed that decisions based on
unstructured, non-directive interviews were no more accurate, and often
less so, than those using more structured techniques. These findings
have been supported by later research (e.g. Dawes & Corrigan, 1974;
Goldberg, 1970; Wiggins, 1973). In general, the validity of unstructured
interviews tends to be around 0,10 to 0,15, whereas properly structured
interviews linked to specific conditions (clinical) or job descriptions
(work related) are in the region of 0,5 and above.
Hough and Oswald (2000, p. 646) show that the vast majority of cases in
the US federal courts contesting selection and other workplace decisions
involve unstructured interviews. A review of 158 US federal court cases
involving hiring discrimination from 1978 to 1997 by Terpstra,
Mohamed and Kethley (1999) showed that unstructured interviews were
challenged in court more often than any other type of selection device.
Even more important is the fact that unstructured interviews were found
to be discriminatory in 59 per cent of these cases, whereas structured
interviews were found not to be discriminatory in 100 per cent of cases.
In addition, there is a great deal of evidence to suggest that even when
interviews are valid predictors of later job performance, the inclusion of
interviewing does not add to the overall validity of the selection process.
This lack of incremental validity* has been reported by Cortina et al.
(2000) and by Walters, Miller and Ree (1993), among others.
16.4 Reasons for poor reliability and validity
What is even more interesting is the fact that in job and performance
management interviews, physically attractive people are consistently
rated more positively than other people. A problem can occur with the
selection of sales staff – who may be good at selling themselves, but
then fail to sell the organisation’s products or services.
Firstly, it is well established that people believe that they know what is
happening, even when others do not. This is termed the “illusion of
validity”. Furthermore, they consistently overvalue their own decisions.
Fourthly, in the workplace, selection interviews are often used to sell the
organisation in an effort to make good candidates accept a job offer. The
question then arises as to who stretches the truth furthest in the interview
situation: the candidate who says what a marvellous person he is, or the
interviewer, who says what a marvellous organisation it is to work for.
For the interviewer the final stage involves formulating a decision and
recording this in the person’s file. This includes the decision to hire,
shortlist or reject the person, and taking appropriate actions to
implement the decision – a job offer, an invitation to the next phase or a
“We regret” (rejection) letter. It is important to realise that job applicants
are covered by the Employment Equity Act and need to be treated as
employees.
Of course, there are times when you may want to break these rules – it
may be useful to become angry to make the interviewee confront an
issue he has been avoiding. However, these cases are rare and should be
considered carefully before implementation.
Some dos
Response – Example
Transitional phrase – “Yes, go on.” “I see.” “Uh-huh.”
Verbatim repeat – Repeats exact words. “Your previous boss had it in for you.”
Paraphrasing and restatement – Repeats response using other words. “So you think your boss picked on you unnecessarily.”
Summarising – Pulls together a number of different responses. “In other words, you seem to think that your boss was treating you badly.”
Clarification – Clarifies the response. “Why do you think she behaved like this? Was it just you or did she treat others in the same way?”
Empathy/understanding – Communicates understanding. “I know what you mean. It must have made you feel angry.”
16.7 Summary
Additional reading
Kaplan, R.M. & Saccuzzo, D.P. (2013). Psychological assessment and theory (Chapter
8) provides a good overview of the issues involved in interviewing as well as a number
of useful tips.
An extensive guide to the development and conducting of structured interviews is given
by the Assessment Oversight and the Personnel Psychology Centre of the Canadian
Public Service Commission (2009).
Short paragraphs
Essays
1. Describe the strengths and weaknesses of interviewing and discuss what can be
done to improve it as an assessment technique.
2. Compare and contrast selection and performance appraisal interviews with clinical
and counselling interviews.
17 Assessment centres
OBJECTIVES
17.1 Introduction
Thus far, this book has dealt with various approaches to organisational
assessment for identifying managerial and cognitive skills. These
include techniques such as interviews, psychometric testing, work
samples, reference checks, bio-data and track records. We have also
seen the importance of triangulation and the need to use a range of
different techniques to obtain the information required for accurate
description and prediction.
There are many different sets of competencies that are used to assess
management candidates. Various consultants have their own way of
organising such competencies. Quite often, they have different sets for
different levels in the organisation and for different functional areas.
Saville & Holdsworth South Africa (SHL) (2005), one of the largest
sellers of assessment tools in South Africa, has four such lists, which
they describe as follows:
1. Universal Competency Framework™ (UCF)
Based on a model of 20 generic competencies found to contribute to
superior performance in all roles and positions in an organisation
A similar set of what they term their “Great Eight” Competencies is put
forward by SHL (Bartram, 2005, 2012) in Table 17.3.
Adaptability
Attention to detail
Caring
Collaboration
Communication: open
Communication: oral and written
Continuous learning
Crisis management
Discernment/judgement
Diversity
Drive for results
Initiative
Innovation
Negotiation
Organisational understanding
Planning and organising/time management
Problem solving
Professionalism
Quality
Reliability
Service
Technical expertise
Leadership competencies
Change leadership
Coaching
Collaborative leadership
Conflict management
Influence
Team leadership
Behavioural indicators
Monitors and checks work or information, and plans and organises his
time and resources efficiently.
Double-checks the accuracy of information and work product to
provide accurate and consistent work.
Provides information on a timely basis and in a usable form to others
who need to act on it.
Carefully monitors the details and quality of own and others’ work.
Expresses concern that things be done right, thoroughly and precisely.
Completes all work according to procedures and standards.
Drive for results. This demonstrates the person’s concern for
achieving or surpassing results against an internal or external standard
of excellence.
Behavioural indicators
The exercises should, as far as possible, reflect the realities of the job. It
is therefore necessary to align each exercise with the particular
organisation or industry in which it is being conducted. For example, if
the exercise is a role play dealing with a difficult worker, it is useful to
examine a few recent disciplinary cases to find out what issues typically
arise and how they have been handled in the organisation. These
findings must then be incorporated into the detail of the role play.
Depending on the purpose of the assessment, the exercises may
encourage competition or cooperation.
Score Interpretation
0 Does not exist; could not be detected
1 Poor, underdeveloped, inadequate for current position
2 Minimum required for adequate performance
3 Adequate – further development required for good performance
4 Above average – meets requirements for good performance
5 Very good – good performance almost guaranteed
6 Excellent – will succeed at the next level of promotion
6+ Excessive – too much of a good thing, unbalanced, will interfere with good
performance
The table shows that most exercises measure more than one
competency, and each is measured by more than one exercise. If we
remember that there are several observers who each observe different
participants in the various exercises, and that each observer assesses
different participants at different times, then we get a perfect multiple
competency, multiple technique and multiple observer triangulation
process.
Analytic ability: the ability to diagnose problems and obtain information by probing
Positive:
Breaks problem down into logical elements
Probes for information
Asks diagnostic questions
Identifies cause and effect relationships
Prioritises
Negative:
Deals with large/inappropriate elements
Accepts information at face value
Asks rhetorical/closed questions only
Focuses on symptoms
Lists
Note that a space is left below each list in which specific evidence
gathered during each exercise can be noted. Note also that a space is left
in the middle of the top line of each table. This is where the score or
competence level is recorded.
There is a table like this for each of the competencies listed in Table
17.2. Organisations change with time, and there is a serious danger that
using a particular competency framework for too long may result in the
organisation being filled with a particular type of individual. This may
create problems when the environment in which the organisation
operates changes. To illustrate this: a large South African organisation
had put in place a particular competency framework that was used for all
selection, promotion and development activities. The framework chosen
was focused on a strong, dominating management style, producing
managers that were high in assertiveness. When the government of the
country changed following the 1994 democratic elections, the
organisation acquired a new set of executives. In this transition, very
few of the previous senior management retained their positions – the
competencies they possessed were no longer relevant in the new
dispensation. A new set of competencies, requiring very different
selection, development, promotion and performance management
processes, had to be designed to fit in with the new ethos that was being
created.
Analytical ability: the ability to diagnose problems and obtain information by probing [Score: 4]
Positive:
Breaks problem down into logical elements
Probes for information
Asks diagnostic questions
Identifies cause and effect relationships
Prioritises
Negative:
Deals with large/inappropriate elements
Accepts information at face value
Asks rhetorical/closed questions only
Focuses on symptoms
Lists
Evidence noted during the exercise:
Asked subordinate when the problem first occurred – “On Monday he told me this and on Tuesday he did that and on Thursday he did …”
“What did you do next?”
“Why do you think this happened?”
“First you said they were useless, and then what did you do?”
The table shows that the participant scored either 4 or 5 on the various
exercises used to measure analytic ability. After some discussion, the
various assessors agreed that the participant’s final score (FIN) should
be 5, the reason being that the “in-basket” best demonstrates this
particular competence. Had the participant scored 4 here and 5
elsewhere, the final score would have been 4.
The same process should occur with every competency: first, the
individual assessors rate a participant on each exercise, then they reach
agreement about what he scores on each particular competency in each
particular exercise. Then the assessors combine all the scores for a
competency to come to a final decision about the level of the
competency. This process is repeated for each competency being
assessed.
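The combination step described above can be sketched in a few lines. This is a minimal illustration of the rule implied by the worked example – the exercise that best demonstrates the competency decides the final level – and the exercise names, scores and the encoding of the rule are assumptions for illustration, not a prescribed algorithm.

```python
# Hedged sketch of the score-combination step: assessors' agreed
# per-exercise scores for one competency are combined, with the exercise
# that best demonstrates the competency (here the in-basket) deciding
# the final level, as in the worked example in the text.

def final_score(exercise_scores, key_exercise):
    """Return the agreed final level for a competency."""
    return exercise_scores[key_exercise]

# Illustrative scores for the analytic ability competency:
analytic_ability = {
    "in-basket": 5,
    "group discussion": 4,
    "role play": 4,
    "case analysis": 5,
}

print(final_score(analytic_ability, "in-basket"))  # -> 5
```

Had the in-basket score been 4 and the others 5, the same call would return 4, matching the example given above.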
Note that the description of Thandi’s performance is taken from the scoring levels
given in Table 17.9, with evidence being collected from the bottom section of the
box in Table 17.8.
During the feedback session, the results of the assessment centre should
be discussed as well as any suggestions or proposals about any
developmental steps that need to be taken. This involves aspects such as
training courses to attend, job assignments, reading, and so forth.
Feedback report
Dear ABC
Thank you for having made yourself available for the assessment centre held on
[date]. You have now received verbal feedback on your performance. You also
have a copy of your feedback report. This document briefly summarises our
findings, and proceeds to suggest ways in which you, with the help of your
managers and other important people in your life, can begin to address those
areas in which you feel improvement may be necessary for your future success at
work and elsewhere.
It is important to realise that this document, and the whole assessment process,
is not designed to criticise or judge you in any way, but rather to help you achieve
what you are capable of. The first stage in any such process is to evaluate one’s
current situation in relation to where one would like to be. Once this gap has been
identified, the process of closing it can start.
1 Cognitive competencies
1.1 Analytic ability
We define this as a person’s ability to analyse situations and diagnose problems
by probing for relevant information in a logical and rational manner; obtaining the
relevant information; relating and comparing data from different sources; and
identifying cause-and-effect relationships.
The importance of this competency lies in the fact that meaningful business
and/or management decisions should not be based on assumptions and
guesswork, but rather on as much relevant information as possible. People must
therefore seek out the information that is needed to come to sound decisions.
This is a competency that is required in all jobs and at all levels in the company,
but especially in those situations when the causes of problems or potential
problems need to be identified.
Your manager must regularly give you tasks that need analytic thinking, and
then help you in this process. On completion, he should review your
performance and point out your successes and where further refinement is
required.
As a matter of course your manager must learn to expect from you a
breakdown of all the advantages and disadvantages of any action or decision
that you need to take.
Developmental activities
Seminars/training/development courses
Problem solving Smith and
Jones
Turning plans into SMG
action Consulting
Work-based experience
[In this space, note down what you think you can do at work to further develop
and strengthen this competence.]
17.4.1 Reliability
Because assessment centres are based on observable behaviours, the
most important form of reliability is inter-rater or inter-scorer reliability.
Research has consistently shown that the inter-rater or inter-scorer
reliability of assessment centres is high (in the 0,60 to 0,95 range),
provided the assessors have been properly trained (Murphy &
Davidshofer, 2006). The fact that all scores are based on consensus
between different assessors and assessment techniques at the end of each
exercise and at the end of the assessment centre means that this form of
reliability is very high.
17.4.2 Validity
Research has consistently demonstrated that assessment centres have
high concurrent and predictive validity, with correlation coefficients in
the region of 0,60 and higher, against criteria such as job performance,
management potential, training performance and career progression
(Murphy & Davidshofer, 2006).
Content validity and face validity are both high, because the material is
based on the jobs, organisations and industries for which the centres are
being used.
17.4.3 Fairness
Because assessment centres are competency based, they are relatively
culture fair, with some studies finding no cultural effect (Kriek, Hurst &
Charoux, 1994). At the same time, for all the reasons given in Chapter 7,
especially sections 7.2 and 7.5, cross-cultural differences do exist in
most measures, including those obtained in assessment centres (Blair,
2003) (see also section 12.8).
The results in Tables 17.11 and 17.12 once again raise the issue of whether we
should have a single norm or group-based norms. This matter is dealt with in
some detail in section 7.5.
17.5.4 Administration
There are a number of ways in which the administration of an
assessment centre can be changed to reduce minority group differences.
Firstly, designers can reduce the amount of information processing and
reading comprehension skills in the exercises by making them shorter
and considering language difficulty levels. Assessors and designers
should give participants ample time to process the information, since
minority-group members tend to perform worse on timed tasks. Participants from
disadvantaged or culturally different groups should be briefed between
exercises, rather than at the beginning; this will reduce information load
during the assessments. Wherever possible, minority groups should be
represented on the assessment panel as this not only gives participants a
sense of security, but may also help to explain the cultural context of
some of the behaviours. Finally, assessors could use fewer dimensions
and coarser scoring systems – three- or four-point rather than seven- or
eight-point systems.
17.6 Summary
Additional reading
Blair, M.D. (2003). Best practices in assessment centres: Reducing “group differences”
to a phrase for the past gives a good description of the issues around fairness and what
can be done to minimise adverse impact with minority groups in the US.
Grayson, P. (2005). An introduction to assessment centres. This provides a good
overview of the assessment centre process and a historical overview of the
development of these centres in the UK and the US.
Lievens, F. & Klimoski, R.J. (2001). Understanding the assessment centre process:
Where are we now? In C.L. Cooper & I.T. Robertson (Eds), International review of
industrial and organisational psychology, 16, 245–286. This chapter provides a good
theoretical background to assessment centres.
Murray, M. (2005). How to design a successful assessment centre. People
Management (UK), 11(4), 24–45 provides a good account of setting up and running an
assessment centre.
Test your understanding
Short paragraphs
Essay
OBJECTIVES
18.1 Introduction
18.2.1 Intelligence
The first of the constructs that needs to be refined is intelligence. We
may ask what constitutes intelligence, how it is structured, and how it
should be assessed. Chapter 10 gives various definitions (see section
10.1.1). In fact, when two dozen prominent theorists were asked to
define intelligence, they gave two dozen somewhat different definitions
(Sternberg & Detterman, 1986). Moreover, it is very difficult to compare
concepts of intelligence across cultures. English has many words for
different aspects of intellectual power and cognitive skill (wise, sensible,
smart, bright, clever, cunning, etc.). If other languages have just as
many, which of them shall we say corresponds to its speakers’ “concept
of intelligence”? Even within a given society, different cognitive
characteristics are emphasised from one situation to another and from
one subculture to another. These differences extend not just to
conceptions of intelligence, but to what is considered adaptive or
appropriate in a broader sense. (See Neisser et al., 1995.)
many recent studies show that the speeds with which people perform
very simple perceptual and cognitive tasks are correlated with
psychometric intelligence … In general, people with higher
intelligence test scores apprehend, scan, retrieve, and respond to
stimuli more quickly than those who score lower.
The notion of elegance is important because there is a basic belief that simple
solutions are better than complex ones. One is reminded here of a recent book on
complexity theory by John Gribbin, entitled Deep simplicity: Chaos, complexity
and the emergence of life (2005). He argues quite strongly that science proceeds
by finding ways of simplifying the complex. All the great scientific breakthroughs
have involved finding simple and elegant ways of explaining complex
relationships. Clearly, in terms of this view, intelligence involves finding simple
solutions to complex problems.
18.2.2 Potential
An area or construct of particular concern to societies where there has
been an unequal distribution of social and educational resources is the
defining and identifying of potential. As far as a definition is concerned,
there are two approaches. Firstly, potential can be defined in terms of a
behavioural readiness to perform a particular task which simply awaits
the opportunity to perform it. In this sense, potential can be likened to a
seed which is waiting to germinate. All that is required is for the right
conditions to occur for this behaviour to manifest itself. (The seed needs
to be planted in fertile soil and properly nurtured, and it will blossom.)
In assessment terms, this translates into the question of whether the
person is able to demonstrate the competencies that are required. This
question is best answered in a simulation or assessment centre process.
18.2.3 Personality
With respect to the definition of personality, the first issue that we need
to address (if not resolve) is whether the etic or nomothetic view holds
(i.e. that there is a relatively well-defined set of factors, such as the Big
Five), or should we take an emic or idiographic view, which argues that
people need to be understood in themselves, as individuals, rather than
as an assembly of various traits that differ only in the “amount” that
different people have. In this regard, the ideas of John Berry and the
other people concerned with cross-cultural assessment should be
revisited. In short, they argue that the two theoretical streams – emic and
etic – can be brought together using the ideas of a derived etic – that is, a
sort of compromise between the two that allows cross-cultural
comparisons to be made without the risk of imposing an etic approach
on the target cultural group (see Berry et al., 1992).
18.2.4 Competencies
Another theoretical area where we need clarity is that of competencies.
Although the construct has a great deal of appeal, especially when tied
to specific outcomes such as passing examinations and being selected
into specific positions, far more attention needs to be paid to the longer-
term aspects of the definition of competence, including the crucial
question: “Competent for what?” This and
other issues relating to competencies were dealt with in detail in Chapter
12. The view that competency is an all-or-nothing condition is
inadequate, given the different needs of different stakeholders and
different purposes that a single assessment has to serve.
[a]t the end of the 20th century we are living in a planetary or earth
world; more and more, psychologists will develop their work in
different languages and cultures. This fact demands standards for test
adaptation and for test construction for cross-cultural research and
practice, and more efforts will need to be made in this direction (no
page no.)
Since then, the use of computerised testing, especially via the Internet,
has become the preferred mode and in some cases the only way of
testing for large organisations. Macqueen (2012), for example, shows
that SHL, a leading global test provider, currently conducts 95 per cent
of its testing online rather than through the traditional paper-and-pencil
methods. He argues further that there is widespread support for the view
that within five to ten years all psychological testing, apart from certain
clinical and neuro-psychological testing, will be conducted online.
18.3.2.3 Disadvantages
A major possible disadvantage for people being assessed is computer
phobia or fear of technology. This could introduce a unique, irrelevant
error variance into observed scores, thus impairing test-result validity.
Although this factor has not manifested itself among relatively well-
educated people, it is likely to be an issue with semi-literate and
unschooled employees with little exposure to electronic media. There
are also indications that computer skills – at least typing speed – may be
related to test achievement (Russell, 1999). It should be noted, however,
that this factor was found to be related to performance in open-ended
tests, not multiple-choice tests (Russell & Haney, 1997).
A second issue of concern is the possibility of the system crashing as a
result of computer, program or power failures. There have been times
when the country’s power supply has been erratic.
Once a person has answered three questions correctly at any given level,
this is taken as his achievement level on the test. Of course, some
allowance must also be made for guessing. Using a CAT approach
makes the assessment much shorter because the person being assessed
does not waste time answering items that are very easy (in which case
they would all be answered correctly) or very difficult (in which case
they would all be answered incorrectly). The above explanation is based
on an ability test, but one can see how personality or job satisfaction
scores could be obtained in a similar fashion: if the candidate were to
say he was not interested in, for example, the outdoors, then later
questions relating to outdoor activities could be excluded. Meijer and
Nering (1999) suggest that one area for the future use of CAT is for
personality tests, where faking and inconsistencies in item responses can
be detected and additional items then administered to adjust for or
identify those inconsistencies. Ben-Porath, Slutske and Butcher (1989)
have shown how adaptive testing can be used very successfully with the
Minnesota Multiphasic Personality Inventory (MMPI).
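The adaptive rule described above – move up a level after a correct answer, down after an incorrect one, and stop once three items at a single level have been answered correctly – can be sketched as follows. The examinee is simulated by a simple function; the level range and movement rule are illustrative assumptions, not a specification of any real CAT system.

```python
# Minimal sketch of the adaptive stopping rule described in the text:
# items are drawn at the current difficulty level, the level rises after
# a correct answer and falls after an incorrect one, and testing stops
# once the person has answered three items correctly at any one level.
# The answer function is a stand-in for a real examinee's responses.

def adaptive_test(answer_correctly, levels=10, start=5, target=3):
    correct_at = {level: 0 for level in range(1, levels + 1)}
    level = start
    while True:
        if answer_correctly(level):
            correct_at[level] += 1
            if correct_at[level] == target:
                return level                 # achievement level reached
            level = min(levels, level + 1)   # try a harder item
        else:
            level = max(1, level - 1)        # drop back to an easier item

# A hypothetical examinee who answers correctly up to level 6:
print(adaptive_test(lambda level: level <= 6))  # -> 6
```

Note how the loop never presents items far above or below the examinee's level, which is why CAT shortens testing time so dramatically.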
18.3.5.1 Advantages of adaptive testing
A CAT system is very efficient and has been shown to reduce the
number of items by as much as 50 per cent. It therefore takes far less
time to assess the person’s ability level, while also reducing
measurement variance (the random error component) by at least 50 per
cent (see Cohen & Swerdlik, 2002, p. 546 for details).
This is one reason why test-retest reliability is lower than one would
hope. This is termed the learning or transfer effect. (This is discussed in
Chapter 4.)
A second problem associated with CAT, and with item response theory
on which it is based, is that it assumes that the linearity of difficulty
levels is equal for all groups. In other words, it assumes that the item at,
say, the 60th percentile of difficulty for members of group A is at the
same level of difficulty for members of group B. Given the social and
educational discrepancies between different groups in South Africa, this
assumption is not warranted.
1. Lie and Correction scales are greater than the Infrequency scale.
2. The Infrequency scale is less than a T-score of 55, and
3. the Depression, Paranoia, Psychasthenia and Schizophrenia scales are less
than a T-score of 65, and
4. the Conversion Hysteria scale is greater than T=69, or
5. the Need for Affection subscale is greater than T=63, or
6. the Conversion Hysteria scale is greater than T=64, and the Denial of Social
Anxiety subscale or Inhibition of Aggression subscale is greater than T=59,
or
7. the Repression scale is greater than T=59, or
8. the Brooding subscale is greater than T=59.
This combination of conditions – note the mix of ands and ors – needs to be
satisfied before the program can score and interpret the statement quoted above.
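A rule set like this is exactly what a computerised interpretation program encodes as a boolean function over T-scores. The sketch below shows one way of doing so. The grouping adopted here (conditions 1 to 3 must all hold, plus at least one of conditions 4 to 8) is one plausible reading of the list's ands and ors, and the profile values are invented; this is not the published MMPI scoring logic.

```python
# Hedged sketch: the rule set above encoded as a boolean function.
# Scale keys use common MMPI abbreviations; the grouping of the ands
# and ors is an assumed reading of the list, for illustration only.

def statement_applies(s):
    base = (
        s["L"] > s["F"] and s["K"] > s["F"]                  # 1. Lie and Correction > Infrequency
        and s["F"] < 55                                      # 2. Infrequency below T=55
        and all(s[k] < 65 for k in ("D", "Pa", "Pt", "Sc"))  # 3. clinical scales below T=65
    )
    extra = (
        s["Hy"] > 69                                         # 4. Conversion Hysteria > T=69
        or s["NeedAffection"] > 63                           # 5.
        or (s["Hy"] > 64
            and (s["DenialSocAnx"] > 59 or s["InhibAggr"] > 59))  # 6.
        or s["R"] > 59                                       # 7. Repression
        or s["Brooding"] > 59                                # 8.
    )
    return base and extra

# An invented profile that satisfies conditions 1-3 and condition 4:
profile = {"L": 60, "K": 58, "F": 50, "D": 55, "Pa": 60, "Pt": 58, "Sc": 62,
           "Hy": 70, "NeedAffection": 50, "DenialSocAnx": 50, "InhibAggr": 50,
           "R": 50, "Brooding": 50}
print(statement_applies(profile))  # -> True
```

The point of the sketch is the shape of the logic, not the specific thresholds: each canned interpretive statement in such a program sits behind a rule of this kind.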
It takes very little extra effort for a program to recognise the gender of
the person being assessed, and to generate the correct pronouns (he/she,
his/her) in the reports that are produced.
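As a small illustration of how little effort this takes, a report template need only carry pronoun placeholders that the program fills in from the recorded gender. The template text and pronoun table below are illustrative assumptions.

```python
# Trivial sketch of the pronoun substitution described above: the report
# template carries placeholders, filled in from the recorded gender of
# the person being assessed. Template wording is invented.

PRONOUNS = {"male": ("he", "his"), "female": ("she", "her")}

def render(template, gender):
    subject, possessive = PRONOUNS[gender]
    return template.format(subject=subject, possessive=possessive)

template = "{subject} completed all of {possessive} exercises on time."
print(render(template, "female"))  # -> she completed all of her exercises on time.
```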
18.3.7.1 Advantages
Web-based testing has the advantages of offering 24-hour access to
testing in all corners of the world, immediate scoring, and a limited need
for test administrators, which means convenient, cost-effective and
efficient testing (Jones, 1998; Jones & Higgins, n.d.). Online testing also
provides advantages in terms of cost, volume, efficiency, global reach
and standardisation (see, for example, Tippins, 2009). The Internet also
allows both recruitment organisations/managers and researchers to find
a large number of participants who are already online and are interested
in being assessed.
There are a few other advantages that relate to the design of the tests and
scales:
18.3.7.2 Disadvantages
Despite the many advantages associated with online assessment, there
are a number of disadvantages as well. These include concerns about the
measurement and construct equivalence of items (and the test as a
whole), test security, standardisation of administration at remote testing
sites and what Macqueen (2012) refers to as “touch”.
The third area which may give rise to new constructs or ways of
assessing is the development of new theories in science in general, and
in the cognitive, behavioural and social sciences in particular. In this
section we examine a few of the more exciting developments that are
likely to affect psychological assessment. Perhaps the most important of
these is chaos theory or complexity science. Another is artificial
intelligence. A third is the issue of biological and physiological
measures of intellectual capacity. Although this last concept has been
around for almost as long as psychological assessment itself, recent
developments in the biological sciences now make it likely that
physiological measures may become real, accurate and valid measures
of psychological functions. Let us examine each of these briefly.
18.4.1 Complexity theory
One area of theorising from which new concepts and definitions of
personality may emerge is that of the complexity sciences*. Complexity
theory is an emerging field or paradigm* of science that is closely
related to chaos theory*. In many respects, it is the logical successor to
the general systems theory that emerged in the second half of the 20th
century. Complexity science throws up new ideas such as fractals*
(which are self-repeating patterns at different levels of magnitude) and
strange attractors* (which can crudely be seen as “magnets” which
pull phenomena to particular points and push them away from others).
Siegle and Hasselmo (2002, pp. 4–5) argue that “individuals who are
efficient at pattern completion will be more vulnerable to PTSD because
they are better able to engage in memory retrieval based on a previous
network state, a key feature of attractor formation”.
It should be clear from all that has been said in this chapter and in the
book as a whole that psychological assessment is far more than just
testing. In addition, there is a growing acceptance of the need for
psychological assessments of various kinds and for various reasons.
These include the identification of abilities in people being assessed.
They also include the identification of problem areas that may have
occurred as a result of accident or disease. This last is of particular
interest to those concerned with ensuring justice for all stakeholders –
the individual, the family, the workplace and society in general.
Assessment is crucial in ensuring health and safety, job satisfaction and
personal growth, the optimal placement of people into different work
categories, and for ensuring adequate insurance reimbursement for
damage that may have occurred as a result of accidents and illness.
Very little has changed since then, and his arguments remain as
powerful as ever.
18.7 Conclusion
Short paragraphs
Essays
1. “Emotional intelligence is not a form of intelligence in the strict sense of the word”.
Discuss.
2. “Psychometric assessment is like democracy: it is the worst possible system –
except for all the others.” Discuss this modification of Winston Churchill’s statement
about democracy in the light of the anti-testing movement.
3. Critically discuss the issue of whether online testing should be controlled by a
registered/chartered psychologist or whether proctoring/supervision by some
“lesser-qualified” professional should be allowed.
APPENDICES
Appendix 1
3. Reasoning. If John has 20 apples and he eats three of them, how many
apples does he have left?
Each of the five main categories of tests can use these four content areas,
so that we can have memory for words, memory for numbers, memory
for pictures, memory for abstract designs and memory for physical
objects. (We have all played the memory game of looking at a table with
twenty objects on it for a short time and then trying to recall the objects
on the table.) This results in the categorisation shown in Table A1.2.
This sequence can go on for pages, and the person has to draw a line
through all the *s that he or she can find in a two-minute period.
3-D rotation. This is the same as the previous but using three-
dimensional shapes.
2-D assembly. Which two shapes can be joined together to form a
square?
3-D disassembly. This is the same as the previous one, but using
three-dimensional shapes. A typical example of this would be the so-
called exploded diagram of an object such as a carburettor, brake
assembly, sewing machine, etc.
Matrices. This is a series that runs both across and down. Raven’s
Progressive Matrices is the best known of these, although various other
test producers have used this format.
Analogies. This is similar to the verbal analogies hot:cold, wet:dry
(hot is to cold as wet is to dry) and takes the form of various shapes
such as:
Classification of representative objects. Which of the following is the
odd one out? (Pictures of a dog, cat, horse, cow and giraffe.) The
giraffe, because the others are all domestic animals.
Classification of abstract objects. This is the same as the previous, but
using abstract objects such as shapes, squiggly lines, etc.
63587 –  a) 65387  b) 63578  c) 63587  d) 36587  e) 63857
Motor speed. This is the ability to carry out various motor tasks as
quickly as possible. Tasks used here include things like putting pegs
in a pegboard using a pair of tweezers, matching nuts and bolts of
different sizes and screwing the nuts onto the bolts, sorting washers of
different sizes into appropriate containers, and so forth. These are
generally measures of the ability to see differences in the objects and
of fine motor coordination/manual dexterity. (Such assessments may
be important for jobs involving assembly of small units such as
watches, cellphones, computers and the like.)
Reaction time. This is the ability to react quickly to a stimulus. In the
simplest version, a rod (similar to a broomstick) with ruler markings
on it is released, and the time taken for the person to react to the
dropping stick by catching it is measured by the number of notches
from the start to where the person catches the rod. Other versions of
this involve recording the time taken from a light appearing and the
person hitting a switch. (In Chapters 9 and 16, it is argued that
reaction time may be closely related to overall intelligence, especially
when fairly complex decisions need to be taken.)
Hand-eye coordination. This is the ability to track an object in space.
Some of you might have played a game at a fairground in which a
ring must be moved along the length of a squiggly piece of wire
without touching the wire. A similar task is one in which the person
has to draw a pencil line between the walls of an object such as a star
or circle as quickly as possible without touching either of the walls.
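For the falling-rod version of the reaction-time task above, the catch distance can be converted into a time in seconds rather than counted in notches. A minimal Python sketch, not from the text: it uses the free-fall relation d = ½gt², so t = √(2d/g); the value of g (9,81 m/s²) and the 20 cm example distance are assumptions for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)

def reaction_time(drop_distance_m: float) -> float:
    """Convert the distance a freely falling rod drops before being
    caught into the catcher's reaction time, using d = 0.5 * g * t**2."""
    return math.sqrt(2 * drop_distance_m / G)

# A catch after a 20 cm drop implies a reaction time of about 0.20 s.
print(round(reaction_time(0.20), 3))  # 0.202
```

A longer drop always means a slower reaction, so the ruler markings on the rod can be calibrated directly in seconds.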
Level / Approximate job level / ABET level / Approximate education level:
1 – Illiterate – ABET 1 – Grade 0–3
2 – Unskilled – ABET 2 – Grade 4–7
3 – Semi-skilled – ABET 3 – Grade 8–10
4 – Skilled & technical – ABET 4 – Grade 9–12
5 – Managerial – ABET 5 – Grade 12 + tertiary
6 – Senior management – ABET 6+ – Postgraduate
Level / Approximate job level / Approximate education level:
1 – Illiterate – Grade 0–3
2 – Unskilled – Grade 4–7
3 – Semi-skilled – Grade 8–10
4 – Skilled & technical – Grade 9–12
5 – Managerial – Grade 12 + tertiary
6 – Senior managers and professionals – Postgraduate
Figural Figural memory SAT/10
visualisation Perceptual Cancl Cancl Cancl
speed
Comparison CWP DAT- DAT-
RS* KL*/6,
TRAT/8
2-D rotation TRAT/12, VSA VSA
VSA
3-D rotation DAT- VSA VSA
KL/7,
VSA
2-D assembly SAT/8, VSA VSA
SpAbil,
TRAT/6
3-D assembly DAT- VSA VSA
KL/7,
SAT/9
VSA,
SRT/2,
TRAT/16
2-D Gott Gott
disassembly
3-D
disassembly
Figural Series SAT/6,
reasoning TRAT/14
Matrices SPM, CPM, COPAS1 SPM SPM SAT/5, APM, AR1
COPAS 1 & 11, SPM AR2
Analogies DAT-KL/3
Classification – TRAT/5
representative
Classification – FCT-HL
abstract
Verbal Verbal memory DAT-KL/9
SAT/7
Verbal Dictation1
receptive
Verbal usage SP2 DAT- VR1
KL/1,
TRAT/13,
TRAT/15,
SP2, VR2
Verbal CC2 CC2,
comparison Fi2 DAT-
LK/6, Fi2
SAT/1
Comprehension VWP1 DAT-KL/5
Reasoning Following DAT- CRT/1, VDR, VIR 1
instructions1 KL/2, VC1/3 VR1 Various
VR2 suppliers, no
details
available
Numerical Numerical CC2 CC2,
comparison DATKL/6
SAT/4,
TRAT/8
Computation NA2 NA2
DATKL/4,
SAT/2,
TRAT/7
Reasoning SAT/3, CRT/2, VNR, VDR,
TRAT/11 NR1, VIR
NMG/1-6
Numerical TRAT/9
interpretation
Technical, Knowledge, VMC DAT-
scientific, insight and KL/8,
mechanical comprehension MRT2,
TTB2,
TRAT/10
Assembly Manikin/jigsaw
puzzle
assembly
Psychomotor Motor speed SAT/11
Reaction time
Typing and
computer skills
Hand-eye VAC2, TRAT/1
coordination WSSM
2 Hand-eye TRAT/2
coordination
Balancing
Strength
Drawing
Dynamic LPCAT, APIL,
testing TRAM LPCAT
II
Level / Approximate job level / Approximate education level:
1 – Illiterate – Grades 0–3
2 – Unskilled – Grades 4–7
3 – Semi-skilled – Grades 8–10
4* – Skilled & technical – Grades 9–12
5* – Managerial – Grade 12 + tertiary
6* – Senior managers and professionals – Postgraduate
Personality Scales – 15FQ+,
batteries 16PF,
FFM, HPI
Single- LOC.MBTI,
factor JTI, Type
scales A/B
Projective TAT,
Rorschach
Workplace OIP OIP,
specific OPPro,
OPQ, WPI
Motivation MQ MQ
Values, interests and career CAnchs,
choice OPQ,SDS,
VMI
Affective states
Emotional intelligence BarOn,
MSCEIT,
PDA
Interpersonal skills PDA
Leadership/management OPQ
styles
Integrity/dependability DSI,
Giotto,
IP2000
Communication/decision- PDA
making style
Team functioning LGT Belbin,
OPQ
* Note: Most of the personality and other tests of typical behaviour
can be used at levels 4, 5 and 6.
It is important to note that not all tests listed in this matrix have
been classified and certified by the HPCSA, and that not all the
tests certified by the HPCSA are listed here. The full HPCSA list
of tests can be downloaded from www.gpwonline.co.za.
The various test distributors listed above and their contact details
(current as at June 2014) are given below.
Name Location Contact
APL Aprolab Johannesburg aprolab@icon.co.za
Belbin Belbin Cape Town, www.capacityinc.co.za; carolk@7i.co.za
Associates Johannesburg
Bioss BIOSS Johannesburg info@bioss.com
Southern
Africa
CEB/SHL Corporate Centurion zacustomersuccess@shl.com
Executive
Board/SHL
US
Integ Integrity Johannesburg integrity@integtests.com
International
JvR JVR Africa Johannesburg info@jvrafrica.co.za
MMM MindMuzik Pretoria sales@mindmuzik.com
Media
M&M M&M Pretoria info@mminitiatives.com
Initiatives
PSA Psytech South Johannesburg www.psytech.co.za
Africa
SCon Saville Johannesburg info@savilleconsulting.co.za
Consulting
Appendix 2
CALCULATING CORRELATIONS
There are basically three stages to carrying out a correlation (or any
other statistical analysis, for that matter). These are as follows:
1. Data capture – getting the various bits of data ready for analysis
2. Conducting the analysis
3. Interpreting the results
Various programs are available for this. The most widely used are
Statistica®, Excel® and the Statistical Package for the Social Sciences®
(SPSS). In this appendix, we take you through each of the three stages
with each of these programs.
A2.1.1 Correlations
Before we do this, we need to clarify one small issue. As you know,
correlations compare the variance of one set of numbers with the
variance of another in order to determine the amount of overlap between
the variances of the two sets of numbers. This is best visualised in terms
of two overlapping circles, as is shown in Figure A2.1.
Figure A2.1 Visualisation of a correlation between two
variables
Sidebar A2.1
In statistical terms, a correlation is nothing more than the overlap between the two
circles shown in Figure A2.1 (termed the covariance, because they vary in
relation to each other) divided by the average or mean of the variance of the two
samples. To make things simpler, let us call the one circle A and the other circle
B, and let us call the covariance AB. In terms of this notation, the correlation
coefficient (r) is obtained by AB/mean of A and B (A B divided by the mean of A
and B).
However, there is a small extension to this calculation, and it is this: there are
three distinct ways of calculating the mean, of which two are important for this
discussion.
The way that we all know is simply to add up all the scores and divide by the
number of scores. (In this case, we simply add A and B and divide by 2
[(A+B)/2].) This is known as the arithmetic mean. The second way of calculating
the mean is to multiply the two scores and then take the square root of the product.
This is known as the geometric mean and is shown as √(A × B) (the square root of
A times B).
This figure is usually a little lower than the arithmetic mean, and it is the one
statisticians use here, because the correlation coefficient is defined in terms of the
product of the two standard deviations, which is exactly √(A × B). When calculating
the correlation coefficient, the covariance AB is therefore divided by the geometric
mean √(A × B) rather than the arithmetic mean (A+B)/2, so that r = AB/√(A × B).
The third type of mean is known as the harmonic mean, but is of no interest to us
here. It is defined as the reciprocal of the arithmetic mean of the reciprocals. A
reciprocal of any number is 1 divided by that number. To calculate the harmonic
mean, divide 1 by A and then divide 1 by B. Next, add these two numbers and
divide by 2. Finally, divide 1 by this answer.
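The three means described in this sidebar are easy to compute and compare. A minimal Python sketch, using two illustrative values A = 4 and B = 9 (these numbers are assumptions, chosen only so that the square root comes out neatly):

```python
import math

A, B = 4.0, 9.0  # two illustrative values

arithmetic = (A + B) / 2                  # add the scores and divide by 2
geometric = math.sqrt(A * B)              # multiply the scores, take the square root
harmonic = 1 / (((1 / A) + (1 / B)) / 2)  # reciprocal of the mean of the reciprocals

# As the sidebar notes, the geometric mean is a little lower than the
# arithmetic mean (and the harmonic mean is lower still).
print(arithmetic, geometric, harmonic)  # 6.5 6.0 5.538...
```

The ordering harmonic ≤ geometric ≤ arithmetic holds for any pair of positive values, with equality only when A = B.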
We know that the value of r must fall within the range of +1 to −1.
Where r = +1, a perfect positive correlation exists, that is, the values of
the variables rise and fall together. Where r = −1, there is a perfect
negative correlation between the variables, meaning that an increase in
the value of one variable is accompanied by a decrease in the value of
the other variable. When r = 0, this denotes a zero correlation – the
variables are unrelated to each other.
We will use the Pearson product moment correlation (r) in this analysis
tutorial. We will show you how to do the calculations using Excel,
Statistica and SPSS.
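Before turning to the packages, the definition given in Sidebar A2.1 (the covariance divided by the geometric mean of the two variance terms) can be checked by hand. A minimal Python sketch with made-up scores; the two short lists are assumptions for illustration, not data from the book:

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance term: summed cross-products of deviations from the means.
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# The two variance terms (sums of squared deviations), whose geometric
# mean forms the denominator of r.
ss_x = sum((a - mean_x) ** 2 for a in x)
ss_y = sum((b - mean_y) ** 2 for b in y)

r = cov / math.sqrt(ss_x * ss_y)
print(r)  # 0.8 for this illustrative data
```

Any of the packages discussed below should return exactly the same value for the same two columns of scores.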
a) The first row and column of Table A2.1 have the same variables. In
this case, they are labelled Var 1, Var 2, etc. If these variables had
been labelled as Age, Height, Width, etc. in setting up the data
matrix, these names would appear in the correlation matrix.
b) The values running down the diagonal are all 1 (shaded blocks). This
is not surprising, because in these cells each variable is correlated
with itself.
c) The correlations in the matrix are duplicated above and below the
diagonal. These are called off-diagonal cells. This is because the
correlation between Var 1 and Var 2 (,35) is the same as the
correlation between Var 2 and Var 1 (,35). Because of this, the one
half of the matrix is redundant – the same information appears above
and below the diagonal.
e) If you look at the correlation between Var 2 and Var 4, you see it is
negative. This may indicate that one of the variables is scored in the
wrong direction, although negative correlations do occur – the more
often you brush your teeth, the fewer caries you are likely to have.
Before we show you how to capture the data in the various programs,
look at Table A2.2. Firstly, note that the grand total in the bottom right
is the same (99 in this case) when you add down (21 + 11 + 20, etc.) and
when you add across (13 +15 + 18, etc.).
Table A2.2 Data sheet for five cases and six variables
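The cross-check described above (the grand total must agree whether you add down or across) can be sketched in a few lines of Python. The 3 × 3 data sheet here is hypothetical, not the values of Table A2.2:

```python
# Hypothetical data sheet: rows are cases (people), columns are variables.
data = [
    [3, 5, 2],
    [4, 1, 6],
    [2, 7, 3],
]

row_totals = [sum(row) for row in data]        # adding across
col_totals = [sum(col) for col in zip(*data)]  # adding down

# The grand total must be identical either way; a mismatch
# signals a data-capture error.
assert sum(row_totals) == sum(col_totals)
print(sum(row_totals))  # 33
```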
For a good text on statistics and research methods, see Tredoux and
Durrheim (2005).
A2.2 Statistica®
a) Select New from the File menu to display the Create New Document
dialogue.
b) Enter the number of variables and cases you have in your sample. The
exact numbers are not vital, as you can add and subtract variables later
(see e) below).
c) Fill in the MD (missing data) code, as indicated. Fill in the Variable
Names as indicated.
d) You can enter the names of the variables by double clicking on the
existing variable name (e.g. Var1) and entering the name of the
variable you want (e.g. Age). Click OK.
e) Save the data file using the name you want (e.g. Pilot 1).
b) Go to Basic Statistics/Tables.
c) Go to Descriptive Statistics.
d) Enter OK.
g) The results are then displayed for the variable(s) you specified. In this
way, outliers and impossible values can be identified.
b) Click on One Variable List and list the variables you wish to
correlate.
A2.3 Excel®
b) Double click on the tab Sheet1 at the bottom of the Excel sheet (the
default).
c) List the variables (i.e. the scores) for each case (person) across the
table and the various cases (people) below one another. This is the
data that will be used to calculate correlations, create graphs and
perform other statistical analyses as required.
d) Save the file to a file name that suits your needs (e.g. Sample1 2008).
e) In the screen shot below, we have two sets of data, width and height,
for nine cases. (Note that the spreadsheet in the example has not been
named; it is simply called Sheet1.)
g) Drag the cursor over the first variable you want to correlate. Repeat
this process for Array2 and click OK or Finish. This will give you
the correlation for the two variables you initially selected (width and
height).
A2.3.6 Interpreting the results
Decide whether the correlation is significant or not.
A2.4 SPSS®
SPSS is an initialism for Statistical Package for the Social Sciences. This
section is based on SPSS for Windows 11.0, although other versions
are very similar. It draws on Antonius (2003), especially Lab 1
(pp. 216–212).
b) Set up a data file by clicking on File, then New, then select the
number of variables (by name or label) and finally click OK. You
will be asked to specify each variable – the type of data and the
format. For example, you will be asked to give the number of
digits involved and the number of decimals; for instance, 3,1 means
three digits in total of which one falls after the decimal comma. This
means you will not be able to enter a number larger than 99,9. This
specification can be changed at a later date (see A2.4.3 c) below).
Width Height
1
2
3
4
5
6
7
8
9
Width Height
1 22 31
2 23 32
3 24 33
4 25 34
5 26 35
6 27 36
7 28 37
8 29 38
9 30 39
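As a cross-check on whichever package you use, the width/height figures captured above can also be correlated directly. Because both columns rise in equal steps, they are perfectly linearly related, so r must equal +1. A Python sketch of this check (the code itself is not from the text):

```python
import math

width = list(range(22, 31))   # 22, 23, ..., 30
height = list(range(31, 40))  # 31, 32, ..., 39

n = len(width)
mean_w = sum(width) / n
mean_h = sum(height) / n

# Covariance term and the two variance terms (sums of squared deviations).
cov = sum((w - mean_w) * (h - mean_h) for w, h in zip(width, height))
ss_w = sum((w - mean_w) ** 2 for w in width)
ss_h = sum((h - mean_h) ** 2 for h in height)

r = cov / math.sqrt(ss_w * ss_h)
print(r)  # 1.0: a perfect positive correlation
```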
b) Click on the arrow between the two panes and highlight the
variable(s) you wish to examine. Next, go to the Options button on
the bottom right of the screen and click on it.
c) Identify the descriptors you wish to use: for data cleaning, use
Maximum and Minimum. In the bottom half of this drop-down,
click on Variable List. Finally click on the Continue button on the
top right of the drop-down.
a) If you are working from an existing file that has been previously
saved, click on Open an Existing File. Identify the file you want and
click OK.
b) Every time you open an existing file, you are presented with an SPSS
Data Editor, which gives you the choice of viewing the variable
information or the data that has been captured. You make your
choice by clicking on one of the two little tabs at the bottom left of
the screen, labelled Data View or Variable View. Unless you want
to change the Variable Specifications (as outlined in 4.2 b) above),
you need only to work with the Data View.
b) A subdialogue box will open. You will see that it contains Bivariate,
Partial and Distance. Select Bivariate to run a correlation between
the variables.
Table A2.3
If we look at the correlation matrix in Table A2.3, we see that the values
above and below the diagonal are identical, for the reason given in
section A2.1.3 c) of this appendix. We will work on the data below the
diagonal, as highlighted. From this you see that the item-total correlation
for item 1 is 0,343, for item 2 it is 0,672, for item 3 it is 0,107, for item 4
it is 0,604 and for item 5 it is 0,734.
A
Absolute zero – A true zero point on a measurement scale, indicating the complete absence of the attribute being measured. See also ratio data.
Adaptive tests – Tests that are made up of questions drawn from a large
item bank to match the ability level of the test taker. Also known as
tailored testing or response-contingent testing.
Alternate forms – Two forms of a test or measure that are alike in every
way except the actual items. These are used to overcome practice
effects when the measure has to be given on several occasions. Also
referred to as parallel forms.
B
Barnum effect – The tendency for people to believe vague positively
phrased reports about themselves because of the way the reports are
presented. Also termed the Forer effect.
Battery – A set of tests and other assessment techniques given to an
individual or group that have value individually and as well as
collectively.
C
Cardinal, central or secondary traits – According to idiographic
theories of personality, these are respectively a person’s most
dominant traits, broadly characteristic traits and more peripheral traits.
Change score – The difference between scores before and after a task or
intervention of some kind. It indicates whether the variables
measured are improving or deteriorating.
Chaos theory – The view that when systems are left long enough they
disintegrate and become chaotic.
Consequences – Actions and processes that follow from and are caused
by other actions or processes.
Criterion problem – The name given to the fact that when calculating
the correlation between one measure and another (e.g. between a test
score and, say, job satisfaction), very often the criterion (job
satisfaction) is not very well defined. This results in the correlation
being low.
Critical realists – Those who hold the view that reality exists “out
there”, but that the interpretation of this reality depends on the
thoughts and other psychological processes of the observer.
D
Demand characteristics – Aspects of the situation, including the
perceptions of the person involved, which encourage people to
answer or behave in a certain way, such as the tendency to present
oneself in a favourable light or to anticipate what is being looked for
and reacting in a way that “helps” the researcher.
E
Ecological validity – A type of validity that indicates the extent to
which results will generalise to other settings.
Ethics – A set of principles that spell out what is right, good or proper
conduct. This contrasts with law, where these principles are laid
down and enforced.
Etic approach – Taking a personality instrument and applying it across all
cultures to see how different groups behave on this measure. It
assumes that personality has the same definition and almost the same
structure across cultures – it is a universalistic approach to
assessment. For example, if we took a personality measure such as
the 16PF, Myers-Briggs type indicator (MBTI) or Minnesota
multiphasic personality inventory (MMPI) and applied it to all
groups to see whether and how these groups differ, this would be an
example of a universalistic approach.
F
Flesch scores – Scores that reflect the difficulty level of text. There are
two important scores, namely the Flesch-Kincaid grade level and the
Flesch Reading Ease. The Flesch scores are based on the average
number of syllables per word and words per sentence.
G
Gardner’s theory of multiple intelligences – According to Gardner,
there are at least seven or eight distinct kinds of intelligence.
Grade norms – Norms that have been derived from a sample of people
in the same school grade as the person being assessed, which allows
the person to be evaluated against the performance of others in the
same grade. See also age-equivalent score.
H
Halo effect – A judgement error that occurs when the judgement of a
person’s abilities, motives, and so on are affected by a positive score
on another attribute. For example, attractive people generally score
higher on measures of work motivation, effort, and so forth than do
people who are less attractive.
I
Idiographic – The idiographic approach to personality focuses on a
person’s unique psychological structure and no attempt is made to
describe the person in terms of any particular traits or theoretical
constructs. This sometimes makes it difficult to compare one person
with others. See also nomothetic.
Investment theory of intelligence – The view that people are all born
with a certain raw ability to see relations and identify rules or
patterns that exist between objects, and that we can measure this
ability (g), using appropriate culture-fair tests. As we get older, we
“invest” this fluid g in certain kinds of judgement skills, such as
those involved in doing a mathematical or word problem, or
composing a sentence. Most people growing up in a stable
environment receive a similar formal education, so that they all
invest their fluid g in much the same kinds of judgement skills. This
means that their fluid intelligence and their crystallised intelligence
are so similar at an early age that it is almost impossible to tell them
apart. People raised in very different cultures may well have very
different fluid and crystallised intelligences.
J
Job analysis – The systematic assessment of the knowledge, skills,
values and other attributes required to perform a job successfully.
Job description – A list of all the tasks associated with a particular job,
as well as the tools required for this. It is also termed a post profile.
L
Latent-trait model – A set of assumptions about measurement,
including the assumption that a trait being assessed is uni-
dimensional and that each item measures the strength of that trait.
M
Malingering – The deliberate faking of a condition (or making it
seem worse than it really is). This is a ploy frequently used in
insurance claims and similar situations. Many psychological
assessment instruments used for forensic purposes have separate
scales to detect malingering built into them. See also faking and
impression management.
Motivation – The nature and strength of the factors that cause a person
to behave in a certain way (to initiate, continue with or stop some
action).
N
Need – According to one personality theory, our behaviour is
determined by various needs, which continue to have motivating
force until they are satisfied. The best known need theory is
Maslow’s hierarchy of needs.
O
Objective tests – Tests where only one answer for each item is correct
and therefore no subjectivity is required. Multiple-choice
examinations are of this kind.
Observed score – In classical test theory, the score that results from the
assessment process. This is contrasted with the true score. The
observed score is seen as the true score ± an error score.
P
Paired comparisons – The process in which employees are compared
with one another in all possible combinations – A with B, A with C,
B with C, and so on. The number of possible comparisons is given by
the formula n(n−1)/2.
Power test – A test for which respondents have ample time to answer all
the items. (Compare with speed test.)
Protocols – A term for the answer sheets obtained during the assessment
process.
Psychic unity – The view that all people are essentially the same, all
“created equal in the eyes of God”.
Q
Qualified individualism – A form of affirmative action in which
organisational interests are maintained while preference is given to
the selection and development of underrepresented groups. See also
unqualified individualism.
Quantitative data – Information presented in numerical form.
R
Race norming – The practice (controversial and even banned in some
parts of the US) of developing separate norms for different race
and/or ethnic groups. In the past (and in some quarters even today),
this was or is the preferred way of dealing with group-based
differences in performance on various measures of performance.
Ratio data – The level of measurement where there is a zero point and
where score differences are assumed to reflect in an absolute sense
differences in the phenomenon being assessed.
S
Sample – Some part of a larger body of people or objects chosen to be
representative of the whole. See also population.
Speed test – A test of achievement or ability with a clear time limit, and
containing items that are of a uniform difficulty level, that is, within
the reach of most of the people in the target group. The fact that the
items have to be completed in a relatively short time emphasises the
efficiency with which candidates can answer the items. (Compare
with power test.)
Stem – Also known as question stem and item prompt. It is the item or
question used to elicit a multiple-choice response.
T
Tailored testing – A testing method, used in computer adaptive testing,
in which the test complexity is structured to the test taker’s ability.
The tests or test items are adapted for use by a particular individual.
(Compare this with clothes that are made for you by a tailor as
against those bought off the shelf from a store.)
U
Unstructured interview – An interview that does not follow a
predetermined structure or set of questions. Each interview is unique
and different people are asked different questions, giving rise to low
reliability and validity, and being open to bias. See also structured
interview and traditional interview.
V
Validation – The process of proving the validity of an assessment
measure.
W
Work sample – A small-scale sample of a typical aspect of a person’s
job; a sample of work behaviour that is used to evaluate a person’s
ability to perform the job.
X
X̄ (pronounced “X-bar”) – The mathematical shorthand for the mean or
average of all the X scores.
Z
z-score – A statistic representing the number of standard deviations a
score is above or below the mean. For example, if the mean is 100
and the SD is 10, then a score of 115 is equal to a z-score of +1,5
(115 – 100 = 15 which is 1,5 SDs above the mean).
References
Abrahams, F., & Mauer, F.K. (1999a). Qualitative and statistical impacts of home language on
responses to items of the Sixteen Personality Factor Questionnaire (16PF) in South Africa.
South African Journal of Psychology, 29(2), 76–86.
Abrahams, F., & Mauer, F.K. (1999b). The comparability of the constructs of the 16PF in the
South African context. Journal of Industrial Psychology, 25(1), 53–59.
Adarrage, P., & Zacagnini, J.L. (1992). DAIA knowledge-based system for diagnosing autism. A
case study on the application of artificial intelligence to psychology. European Journal of
Psychological Assessment, 8, 17–25.
Alatas, S.H. (1968). The sociology of corruption: The nature, function, causes and prevention of
corruption. Singapore: D. Moore Press.
Alliger, G.M., & Dwight, S.A. (2000). A meta-analytic investigation of the susceptibility of
integrity tests to faking and coaching. Educational and Psychological Measurement, 60, 59–
72.
Allport, G.W. (1937). Personality: A psychological interpretation. New York: Holt, Rinehart, &
Winston.
AMA (American Management Association). (2002). Corporate values survey. Retrieved from
http://www.amanet.org/research
Amod, Z., & Seabi, J. (2013). Dynamic assessment in South Africa. In S. Laher, & K. Cockcroft
(Eds.), Psychological assessment in South Africa: Research and applications. (Chapter 9, pp.
120–136). Johannesburg: Wits University Press.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York: Macmillan.
Anolli, L., Duncan S., Magnusson, M.S., & Riva, G. (Eds.). (2005). The hidden structure of
interaction from neurons to culture patterns. Amsterdam: IOS Press.
ANZSCO. (2013) Australian and New Zealand Standard Classification of Occupations (1st ed.),
(Revision 1 dated 4 July 2013). Retrieved 24 July 2013 from
http://www.abs.gov.au/ausstats/abs@.nsf/Product+Lookup/173D4D67348CB91CCA2575DF002DA7A7?
opendocument
Arends-Tóth, J., & van de Vijver, F.J.R. (2003). Multiculturalism and acculturation: Views of
Dutch and Turkish-Dutch. European Journal of Social Psychology, 33, 249–266.
Arnold, D. (1991). To test or not to test: Legal issues in integrity testing. Forensic Reports, 4,
213–214.
Arnold, J. (2005). Work psychology: Understanding human behaviour in the workplace (4th
ed.). London: Prentice Hall.
Arvis, J.-F., & Berenbeim, R.E. (2003). Fighting corruption in East Asia: Solutions from the
private sector. Washington, DC: World Bank. (Cited in Verhezen, 2008.)
Aryee, S. (1997). Selection and training of expatriate employees. In N. Anderson, & P. Herriot
(Eds.), International handbook of selection and assessment (Vol. 13, pp. 147–160).
Chichester, UK: Wiley.
Assessment Oversight and the Personnel Psychology Centre. (2009). Structured interviewing:
How to design and conduct structured interviews for an appointment process. Prepared for the
Canadian Public Service Commission. Retrieved from http://www.psc-cfp.gc.ca/plcy-
pltq/guides/structured-structuree/rpt-eng.pdf Also available at www.psc-cfp.gc.ca
Ballantine, B. (1999). New forms of work organisation and productivity. A study prepared by
Business Decisions Limited for DGV of the European Commission. Retrieved 15 October
2013 from http://www.ukwon.net/files/kdb/36bfab692c2666745b2ea83846bf917a.pdf
Balma, M.J. (1959). The concept of synthetic validity. Personnel Psychology, 12, 395–396.
Bandura, A., Barbaranelli, C., Caprara, G.V., & Pastorelli, C. (2001). Self-efficacy beliefs as
shapers of children’s aspirations and career trajectories. Child Development, 72, 187–206.
Barnouw, V. (1985). Culture and personality (4th ed.). Homewood, IL: Dorsey Press.
Baron, H., & Bartram, D. (2006). Using online assessment tools for recruitment. Leicester, UK:
British Psychological Society.
Bar-On, R. (1997). Development of the Bar-On EQ-i: A measure of emotional intelligence and
social intelligence. Toronto, Canada: Multi-Health Systems.
Barrett, P.T., Petrides, K.V., Eysenck, S.B.G., & Eysenck, H.J. (1998). The Eysenck Personality
Questionnaire: An examination of the factorial similarity of P, E, N, and L across 34
countries. Personality and Individual Differences, 25, 805–819.
Barrick, M.R., Stewart, G.I., Neubert, M., & Mount, M.K. (1998). Relating member ability and
personality to work team processes and team effectiveness. Journal of Applied Psychology,
83, 377–391.
Bartram, D. (2001). The impact of Internet on testing: Issues that need to be addressed by a Code of
Good Practice. Internal Report for SHL Group plc. (Cited in ITC, 2005).
Bartram, D. (2000). International guidelines for test use – Version 2000. Punta Gorda, FL:
International Test Commission.
Bartram, D. (2011). Contributions of the EFPA Standing Committee on Tests and Testing to
standards and good practice. European Psychologist, 16(2), 149–159.
Bartram, D. (2012). White Paper: The SHL Universal Competency Framework. Retrieved 21
August 2013 from http://www.shl.com/assets/resources/White-Paper-SHL-Universal-
Competency-Framework.pdf
Bartram, D., & Coyne, I. (1998). Variations in national patterns of testing and test use: The
ITC/EFPPA international survey. European Journal of Psychological Assessment, 14, 249–
260.
Bates, R. (2002). Liking and similarity as predictors of multi-source ratings. Personnel Review,
31(5), 540–552.
Ben-Porath, Y.S., Slutske, W.S., & Butcher, J.N. (1989). A real-data simulation of computerized
adaptive administration of the MMPI. Psychological Assessment: A Journal of Consulting
and Clinical Psychology, 1(1), 18–22.
Berk, R.A. (Ed.). (1982). Handbook of methods for detecting test bias. Baltimore, MD: The
Johns Hopkins University Press.
Berry, C.M., Ones, D.S., & Sackett, P.R. (2007). Interpersonal deviance, organizational
deviance, and their common correlates: A review and meta-analysis. Journal of Applied
Psychology, 92, 409–423.
Berry, C.M., Sackett, P.R., & Wiemann, S. (2007). A review of recent developments in integrity
test research. Personnel Psychology, 60, 271–301.
Berry, J.W., Kim, U., Power, S., Young, M., & Bujaki, M. (1989). Acculturation attitudes in
plural societies. Applied Psychology, 38, 185–206.
Berry, J.W., Poortinga, Y.H., Segall, M.H., & Dasen, P.R. (1992). Cross-cultural psychology:
Research and applications. New York: Cambridge University Press.
Berry, J.W., & Sam, D.L. (1997). Acculturation and adaptation. In J.W. Berry, M.H. Segall, &
C. Kagitcibasi (Eds.), Handbook of cross-cultural psychology: Social behavior and
applications (2nd ed., Vol. 3, pp. 291–326). Boston, MA: Allyn & Bacon.
Blair, M.D. (2003). Best practices in assessment centres: Reducing “group differences” to a
phrase for the past. Retrieved 11 November 2005 from
http://www.ipmaac.org/conf03/blair.pdf
Blom, A.G., de Leeuw, E.D., & Hox, J.J. (2011). Interviewer effects on non-response in the
European Social Survey. Journal of Official Statistics, 27, 359–377.
Boehnke, K., Lietz, P., Schreier, M., & Wilhelm, A. (2011). Sampling: The selection of cases for
culturally comparative psychological research. In D. Matsumoto, & F.J. R. van de Vijver
(Eds.), Cross-cultural research methods in psychology (pp. 101–129). New York: Cambridge
University Press.
Boeree, C.G. (2002). Early medicine and physiology. Retrieved 2 October 2013 from
http://webspace.ship.edu/cgboer/neurophysio.html
Bordens, K.S., & Abbott, B.B. (2008). Research design and methods (7th ed.). Boston, MA:
McGraw-Hill.
Bradberry, T., & Greaves, J. (2005). The emotional intelligence quickbook. New York: Simon &
Schuster.
Brislin, R.W. (1980). Translation and content analysis of oral and written material. In H.C.
Triandis, & J.W. Berry (Eds.), Handbook of cross-cultural psychology (Vol. 1, pp. 389–444).
Boston, MA: Allyn & Bacon.
Brislin, R.W. (1986). The wording and translation of research instruments. In W.J. Lonner, &
J.W. Berry (Eds.), Field methods in cross-cultural research (Vol. 8, pp. 137–164). Thousand
Oaks, CA: SAGE.
British Psychological Society. (2010). Code of good practice for psychological testing. Retrieved
24 July 2013 from http://www.psychtesting.org.uk/download$.cfm?file_uuid=663F4988-
A7A7-91D6-68E7-677BD-82D14E9&siteName=ptc
Burisch, M. (1997). Test length and validity revisited. European Journal of Personality, 11, 303–
315.
Byrne, B.M. (2001). Structural equation modeling with AMOS – Basic concepts, applications,
and programming. Mahwah, NJ: Lawrence Erlbaum.
Byrne, B.M. (2010). Structural equation modeling with AMOS: Basic concepts, applications,
and programming (2nd ed.). New York: Taylor and Francis.
Camilli, G., & Shepard, L.A. (1994). Methods for identifying biased test items. Thousand Oaks,
CA: SAGE.
Campbell, D.T. (1986). Science’s social system of validity-enhancing collective belief change
and the problems of the social sciences. In D.W. Fiske, & R.A. Shweder (Eds.), Metatheory
in social science: Pluralities and subjectivities (pp. 108–135). Chicago, IL: University of
Chicago Press.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Campion, M.A., Fink, A.A., Ruggeberg, B.J., Carr, L., Phillips, G.M., & Odman, R.B. (2011).
Doing competencies well: Best practices in competency modelling. Personnel Psychology,
64, 225–262.
Carducci, B.J. (2009). The psychology of personality: Viewpoints, research, and applications.
Malden, MA: Wiley-Blackwell.
Carretta, T.R., & Ree, M.J. (1996). Factor structure of the Air Force Officer Qualifying Test:
Analysis and comparison. Military Psychology, 8, 29–42.
Carretta, T.R., Retzlaff, P.D., Callister, J.D., & King, R.E. (1998). A comparison of two U.S. Air
Force pilot aptitude tests. Aviation, Space, and Environmental Medicine, 69, 931–935.
Carroll, J.B. (1993). Human cognitive abilities. Cambridge: Cambridge University Press.
Caruso, D. (2004). Defining the inkblot called emotional intelligence: Comment on R.J.
Emmerling and D. Goleman, Emotional intelligence: Issues and common misunderstandings.
Retrieved 17 October 2013 from http://www.eiconsortium.org/pdf/defining_the_ink-
blot_called_emotional_intelligence.pdf
Caryl, P.G. (1994). Early event-related potentials correlate with inspection time and intelligence.
Intelligence, 18, 15–46.
Cascio, W.F., Outtz, J., Zedeck, S., & Goldstein, I.L. (1991). Statistical implications of six
methods of test score use in personnel selection. Personnel Psychology, 44, 233–264.
Cattell, R.B. (1940). A culture-free intelligence test, I. Journal of Educational Psychology, 31,
176–199.
Cattell, R.B. (1987). Intelligence: Its structure, growth, and action. New York: Elsevier Science.
Cattell, R.B., & Cattell, A.K.S. (1963). Culture fair intelligence test. Champaign, IL: Institute for
Personality and Ability Testing.
Cattell, R.B., Eber, H.W., & Tatsuoka, M.M. (1970). Handbook for the Sixteen Personality
Factors Questionnaire (16PF). Champaign, IL: Institute for Personality and Ability Testing.
Cheung, F.M., Cheung, S.F., Leung, K., Ward, C., & Leong, F. (2003). The English version of
the Chinese Personality Assessment Inventory: Derived etics in a mirror position. Journal of
Cross-Cultural Psychology, 34, 433–452.
Cheung, F.M., Leung, K., Fan, R.M., Song, W.Z., Zhang, J.-X., & Chang, J.P. (1996).
Development of the Chinese Personality Assessment Inventory. Journal of Cross-Cultural
Psychology, 27, 181–199.
Cheung, F.M., Leung, K., Zhang, J.-X., Sun, H.-F., Gan, Y.-Q., Song, W.-Z., & Xie, D. (2001).
Indigenous Chinese personality constructs: Is the five-factor model complete? Journal of
Cross-Cultural Psychology, 32(4), 407–433.
Cheung, G.W., & Rensvold, R.B. (2002). Evaluating goodness-of-fit indexes for testing
measurement invariance. Structural Equation Modeling, 9, 233–255.
Chinkanda, E.N. (1990). Shared values and Ubuntu. Paper presented at the Conference Kontak:
On Nation Building. Pretoria: Human Sciences Research Council. (Cited in Prinsloo, 2001.)
Claasen, N.C.W., van Heerden, J.S., Vosloo, H.N., & Wheeler, J.J. (2000). Manual for the
differential aptitude tests (Form R). Pretoria: Human Sciences Research Council.
Clay, R.A. (2006). Assessing assessment. Monitor. Retrieved 14 October 2008 from
http://www.apapractice.org/apa/insider/practice/trends/assessment.html
Cleary, T.A., Humphreys, L.G., Kendrick, S.A., & Wesman, A. (1975). Educational uses of tests
with disadvantaged populations. American Psychologist, 30, 15–41.
Coetzee, N., & Vosloo, H.N. (2000). Manual for the differential aptitude test. Pretoria: Human
Sciences Research Council.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (revised ed.). New York:
Academic Press.
Cohen, R.J., & Swerdlik, M.E. (2002). Psychological testing and assessment: An introduction to
tests and measurement (5th ed.). London: McGraw-Hill.
Cohen, R.J., Swerdlik, M.E., & Sturman, E. (2012). Psychological testing and assessment: An
introduction to tests and measurement. (8th ed.). Boston, MA: McGraw-Hill.
Cole, N.S. (1973). Bias in selection. Journal of Educational Measurement, 10(4), 237–255.
Coleman, V., & Borman, W. (2000). Investigating the underlying structure of the citizenship
performance domain. Human Resource Management Review, 10(1), 25–44.
Cone, J.D., & Hayes, S.C. (1980). Environmental problems, behavioural solutions. California: Brooks/Cole.
Connelly, B.S., & Ones, D.S. (2008). The personality of corruption: A national-level analysis.
Cross-Cultural Research, 42, 353–385.
Cooper, C.L., & Robertson, I.T. (Eds.). (2001). International review of industrial and
organisational psychology, 16. Chichester, UK: Wiley.
Cortina, J.M., Goldstein, N., Payne, S., Davison, K., & Gilliland, S.W. (2000). The incremental
validity of interview scores over and above cognitive ability and conscientiousness. Personnel
Psychology, 53, 325–351.
Costa, P.T. Jr., & McCrae, R.R. (1985). The NEO personality inventory manual. Odessa, FL:
Psychological Assessment Resources.
Costa, P.T. Jr., & McCrae, R.R. (1992). Revised NEO personality inventory (NEO-PI-R) and
NEO five-factor inventory (NEO-FFI) manual. Odessa, FL: Psychological Assessment
Resources.
Coyne, I. (2008). Integrity testing in organisational contexts. In M. Born, C.D. Foxcroft, & R.
Butter (Eds.), Online readings in testing and assessment. International Test Commission.
Available at http://www.intestcom.org/Publications/ORTA.php
Coyne, I., & Bartram, D. (2002). Assessing the effectiveness of integrity tests: A review.
International Journal of Testing, 2(1), 15–34.
Crafford, A., Moerdyk, A.P., Nel, P., O’Neill, C., & Schlechter, A. (2006). Industrial
psychology: Fresh perspectives. Cape Town: Pearson.
Cronshaw, S.F., Alexander, R.A., Wiesner, W.H., & Barrick, M.R. (1987). Incorporating risk
into selection utility: Two models for sensitivity analysis and risk simulation. Organizational
Behaviour and Human Decision Processes, 40, 270–286.
Das, J.P. (2002). A better look at intelligence. Current Directions in Psychological Science,
11(1), 28–32.
Das, J.P., & Naglieri, J.A. (1997). Das-Naglieri cognitive assessment system (CAS). Itasca, IL:
Riverside Publishing.
Das, J.P., Kar, B., & Parrila, R. (1996). Cognitive planning: The psychological basis of
intelligent behaviour. New Delhi: SAGE International.
Das, J.P., Kirby, J.R., & Jarman, R.F. (1975). Simultaneous and successive syntheses: An
alternative model for cognitive abilities. Psychological Bulletin, 82, 87–103.
Das, J.P., Naglieri, J.A., & Kirby, J.R. (1994). Assessment of cognitive processes. Needham
Heights, MA: Allyn & Bacon.
Davies, M., Stankov, L., & Roberts, R.D. (1998). Emotional intelligence: In search of an elusive
construct. Journal of Personality and Social Psychology, 75, 989–1015.
Davis, D.W., & Silver, B.D. (2003). Stereotype threat and race of interviewer effects in a survey
on political knowledge. American Journal of Political Science, 47, 33–45.
Dawes, R., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin,
81, 95–106.
Day, S.X., & Rounds, J.B. (1998). Universality of vocational interest structure among racial and
ethnic minorities. American Psychologist, 53(7), 728–736.
De Beer, M. (2005). Development of the learning potential computerised adaptive test (LPCAT).
South African Journal of Psychology, 35(4), 717–747.
De Beer, M. (2013). The learning potential computerised adaptive test (LPCAT) in South Africa.
In S. Laher, & K. Cockcroft (Eds.), Psychological assessment in South Africa: Research and
applications (pp. 137–157). Johannesburg: Wits University Press.
DeShon, R.P., Smith, M.R., Chan, D., & Schmitt, N. (1998). Can racial differences in cognitive
test performance be reduced by presenting problems in a social context? Journal of Applied
Psychology, 83, 438–451.
Dodds, E.R. (1951/1983). The Greeks and the irrational. Berkeley and Los Angeles, CA:
University of California Press.
Donovan, M.A., Drasgow, F., & Probst, T.M. (2000). Does computerizing paper-and-pencil job
attitude scales make a difference? New IRT analyses offer insight. Journal of Applied
Psychology, 85, 305–313.
Douglas, J., Roussos, L., & Stout, W. (1996). Item-bundle DIF hypothesis testing: Identifying
suspect bundles and assessing their differential functioning. Journal of Educational
Measurement, 33, 465–484.
Douglas, S.P., & Nijssen, E.J. (2002). On the use of ‘borrowed’ scales in cross-national research:
A cautionary note. International Marketing Review, 20(6), 621–642.
Dreyfus, S.E., & Dreyfus, H.L. (1980). A five-stage model of the mental activities involved in
directed skill acquisition. Washington, DC: Storming Media. Retrieved 1 August 2013 from
http://www.dtic.mil/cgi-bin/GetTRDoc?
AD=ADA084551&Location=U2&doc=GetTRDoc.pdf
Dubois, D.W. (2005). What are competencies and why are they important? People Dynamics,
July, 10–11.
Dubois, D.W., & Rothwell, W.J. (2000). The competency toolkit. Amherst, MA: Human
Resources Development Press.
Dyal, J.A. (1984). Cross-cultural research with the locus of control construct. In H.M. Lefcourt
(Ed.), Research with the locus of control construct (pp. 209–306). San Diego, CA: Academic
Press. (Cited in Van de Vijver & Phalet, 2004.)
EDAC (Executive Development Centres). (2006). Introduction to MCPA™ Mar 06. Retrieved
27 January 2009 from http://www.edacen.com/assessments/MCPA
Educational Testing Service. (2000). ETS standards for quality and fairness. Princeton, NJ:
Educational Testing Service.
Ekuma, K.J. (2012). The importance of predictive and face validity in employee selection and
ways of maximizing them: An assessment of three selection methods. International Journal
of Business and Management, 7(22), 115–122.
Ellingson, J.E., Sackett, P.R., & Hough, L.H. (1999). Social desirability corrections in
personality measurement: Issues of applicant comparison and construct validity. Journal of
Applied Psychology, 84, 155–166.
Emmerling, R.J., & Goleman, D. (2003). Emotional intelligence: Issues and common
misunderstandings. Issues in Emotional Intelligence, 1(1). Retrieved 25 October 2008 from
http://www.eiconsortium.org
Employment Equity Amendment Act. (2013). Employment Equity Amendment Act, Act No 47
of 2013. Pretoria: Government Gazette No 37238, 16 January 2014.
Erikson, E. (1963). Childhood and society (2nd ed.). New York: W.W. Norton.
Espinosa, A.J., Procidano, M.E., & He, J. (2012). Social desirability across 3 cultural contexts:
Mexico, USA and China. Paper presented at the 21st International Cross-cultural Psychology
Conference, Stellenbosch, South Africa, 17–21 July.
Evans, B.R. (1999). The cost of corruption. A discussion paper on corruption, development and
the poor. Teddington, UK: Tearfund.
Eysenck, H. (2000). Intelligence: A new look. New Brunswick, NJ: Transaction Publishers.
Eysenck, H.J., Barrett, P.T., & Eysenck, S.B.G. (1985). Indices of factor comparison for
homologous and non-homologous personality scales in 24 different countries. Personality
and Individual Differences, 6, 503–504.
Eysenck, H.J., & Eysenck, M.W. (1985). Personality and individual differences. New York:
Plenum.
Fallaw, S.S., Kantrowitz, T.M., & Dawson, C.R. (2012). 2012 Global assessment trends report.
SHL Talent Measurement.
Feldman-Bischoff, J., & Barshi, I. (2007). The effects of blood glucose levels on cognitive
performance: A review of the literature. Moffett Field, CA: NASA Ames Research Centre.
Retrieved 7 August 2013 from
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20070031714_2007030981.pdf
Feuerstein, R. (1979). The dynamic assessment of retarded performers: The learning potential
assessment device: Theory, instruments, and techniques. Baltimore, MD: University Park
Press.
Fink, A., & Neubauer, A.C. (2001). Speed of information processing, psychometric intelligence
and time estimation as an index of cognitive load. Personality and Individual Differences,
30(6), 1009–1021.
Fioravanti, M., Gough, H.G., & Frere, L.J. (1981). English, French, and Italian adjective check
lists: A social desirability analysis. Journal of Cross-Cultural Psychology, 12, 461–472.
Fisher, W.P. Jr. (1997). What scale-free measurement means to health outcomes research.
Physical Medicine & Rehabilitation State of the Art Reviews, 11(2), 357–373.
Fletcher, C. (1995). New directions for performance appraisal. Some findings and observations.
International Journal of Selection and Assessment, 3(3), 191–196.
Flynn, J.R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological
Bulletin, 95, 29–51.
Fontenesi, M. (2005). Preface. In L. Anolli, S. Duncan, M.S. Magnusson, & G. Riva (Eds.), The
hidden structure of interaction from neurons to culture patterns. Amsterdam: IOS Press.
Retrieved 23 September 2007 from http://www.vespsy.com/communication/volume7.html
Forer, B.R. (1949). The fallacy of personal validation: A classroom demonstration of gullibility.
Journal of Abnormal Psychology, 44, 118–121.
Fouad, N.A. (1993). Cross-cultural vocational assessment. The Career Development Quarterly,
42, 4–13.
Fouad, N.A., Harmon, L.W., & Borgen, F.H. (1997). Structure of interests in employed male and
female members of US racial-ethnic minority and nonminority groups. Journal of Counseling
Psychology, 44, 339–345.
Foxcroft, C., & Roodt, G. (Eds.). (2001). An introduction to psychological assessment in the
South African context (1st ed.). Cape Town: Oxford University Press.
Foxcroft, C., & Roodt, G. (Eds.). (2005). An introduction to psychological assessment in the
South African context (2nd ed.). Cape Town: Oxford University Press.
Foxcroft, C., & Roodt, G. (Eds.). (2009). An introduction to psychological assessment in the
South African context (3rd ed.). Cape Town: Oxford University Press.
Foxcroft, C., & Stumpf, R. (2005, 23 June). What is matric for? Paper presented at the Umalusi
Seminar on “Matric – What is to be done?”, Pretoria.
Foxcroft, C.D. (1997). Psychological testing in South Africa: Perspectives regarding ethical and
fair practices. European Journal of Psychological Assessment, 13(3), 229–235.
Friedman, H.S., & Schustack, M.W. (1999). Personality: Classic theories and modern research.
Boston, MA: Allyn & Bacon.
Frijda, N., & Jahoda, G. (1966). On the scope and methods of cross-cultural research.
International Journal of Psychology, 1(2), 109–127.
Fulmer, R.M., & Conger, J.A. (2004). Growing your company’s leaders: How organizations use
succession management to sustain competitive advantage. New York: AMA-COM.
Furnham, A. (1992). Personality at work: The role of individual differences in the workplace.
London: Routledge.
Furnham, A. (2003). The incompetent manager: The causes, consequences and cures of
management failure. London: Whurr Publishers.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic
Books.
Gardner, H. (1993). Multiple intelligences: The theory in practice. New York: Basic Books.
Gass, S.M., & Varonis, E.M. (1991). Miscommunication in nonnative speaker discourse. In N.
Coupland, H. Giles, & J.M. Wiemann (Eds.). Miscommunication and problematic talk (pp.
121–145). London: SAGE.
Geisinger, K.F. (1994). Cross-cultural normative assessment: Translation and adaptation issues
influencing the normative interpretation of assessment instruments. Psychological
Assessment, 6(4), 304–312.
Geisinger, K.F., Spies, R.A., Carlson, J.F., & Plake, B.S. (Eds.). (2007). Mental measurements
yearbook (17th ed.). Lincoln, NE: University of Nebraska Press, Buros Institute of Mental
Measurements.
George, J.A., & Reiber, A. (2005). The ROI of assessment. Retrieved 8 May 2007 from
http://www.workindex.com/editorial/staff/sta0506-tt-01.asp
Ghorpade, J., Hattrup, K., & Lackritz, J.R. (1999). The use of personality measures in cross-
cultural research: A test of three personality scales across two countries. Journal of Applied
Psychology, 84, 670–679.
Gierl, M., Jodoin, M., & Ackerman, T. (2000). Performance of Mantel-Haenszel, simultaneous
bias test and logistic regression when the proportion of DIF items is large. Paper presented at
the annual meeting of the American Education Research Association (AERA), New Orleans,
LA, 24–27 April. Retrieved 17 July 2013 from
http://www2.education.ualberta.ca/educ/psych/crame
Goldberg, L.R. (1970). Man versus model of man: A rationale plus evidence for a method of
improving clinical inference. Psychological Bulletin, 73, 422–432.
Goldberg, L.R., Grenier, J.R., Guion, R.M., Sechrest, L.B., & Wing, H. (1991). Questionnaires
used in the prediction of trustworthiness in pre-employment selection decisions: An APA
Task Force report. Washington, DC: American Psychological Association.
Goldstein, I.L., Braverman, E.P., & Goldstein, H.W. (1991). The use of needs assessment in
training systems design. In K. Wexley (Ed.), Handbook of human resources management:
Developing human resources (pp. 35–75). Washington, DC: BNA Books.
Goldstein, H.W., Yusko, K.P., Braverman, E.P., Smith, D.B., & Chung, B. (1998). The role of
cognitive ability in the subgroup differences and incremental validity of assessment centre
exercises. Personnel Psychology, 51, 357–374.
Goleman, D. (1995). Emotional intelligence – Why it can matter more than IQ. New York:
Bantam Books.
Gordon, J.R. (2001). Organisational behaviour: A diagnostic approach. Upper Saddle River,
NJ: Prentice Hall.
Gordon, M.M. (1964). Assimilation in American life. New York: Oxford University Press.
Gottfredson, L.S. (1998). The general intelligence factor. Scientific American, 9(4), 24–29.
Greenberg, J., & Baron, R.A. (2000). Behaviour in organisations: Understanding and managing
the human side of work (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Greenhaus, J.H., & Callanan, G.A. (1994). Career management (2nd ed.). Fort Worth, TX:
Dryden Press.
Gribbin, J. (2005). Deep simplicity: Chaos, complexity and the emergence of life. London:
Penguin Books.
Groenewald, H.J. (2012). Elliott Jaques and sensemaking: Ultimate sensemaker or 20th century
relic? MPhil Dissertation, University of Stellenbosch, Stellenbosch. Retrieved 15 August
2013 from http://www.google.com/url?sa=t&rct=j&q=-
scholar.sun.ac.za%2Fbitstream%2Fhandle%2F…1%2F…
%2Fgroenewald_elliott_2012.pdf&source=web&cd=1&ved=0CCkQFjAA&url=http%3A%2F%2Fscholar.sun.ac
IIMUqHnKvSv7AaV9YEo&usg=AFQjCNH_L3LF6K3ZR241abNEWq9OHmkKDg
Grove, W.M., Zald, D.H., Lebow, B.S., Snitz, B.E., & Nelson, C. (2000). Clinical versus
mechanical prediction: A meta analysis. Psychological Assessment, 12, 19–30.
Guilford, J.P. (1959). Traits of creativity. In H.H. Anderson (Ed.), Creativity and its cultivation
(pp. 142–161). New York: Harper and Row.
Guilford, J.P., & Hoepfner, R. (1971). The analysis of intelligence. New York: McGraw-Hill.
Guion, R.M. (1996). Assessment, measurement and prediction for personnel decisions. Mahwah,
NJ: Erlbaum.
Guion, R.M. (1998). Assessment, measurement and prediction for personnel decisions. Mahwah,
NJ: Erlbaum.
Haier, R.J. (1993). Cerebral glucose metabolism and intelligence. In P.A. Vernon (Ed.),
Biological approaches to the study of human intelligence (pp. 317–332). Norwood, NJ:
Ablex.
Hall, J.D., Howerton, D.L., & Bolin, A.U. (2005). The use of testing technicians: Critical issues
for professional psychology. International Journal of Testing, 5(4), 357–375.
Hambleton, R. (2010). Item response theory: Concepts, models and applications. Workshop
presented at the 27th International Congress of Applied Psychology, Melbourne, Australia.
(Cited in Macqueen, 2012.)
Hambleton, R.K. (1994). Guidelines for adapting educational and psychological tests: A
progress report. European Journal of Psychological Assessment (Bulletin of the International
Test Commission), 10, 229–244. (Cited in Van de Vijver, 2002.)
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response
theory. Newbury Park, CA: SAGE.
Hanson, M.A., Borman, W.C., Mogilka, H.J., Manning, C., & Hedge, J.W. (1999).
Computerized assessment of skill for a highly technical job. In F. Drasgow, & J. B. Olson-
Buchanan (Eds.), Innovations in computerized assessment (pp. 197–220). Mahwah, NJ:
Erlbaum.
Hare, R.D. (1993). Without conscience: The disturbing world of the psychopaths among us. New
York: Simon & Schuster.
Hare, R.D. (1995). Psychopaths: New trends in research. Harvard Mental Health Letter, 12, 4–5.
Hare, R.D. (1996). Psychopathy: A clinical construct whose time has come. Criminal Justice and
Behavior, 23, 25–54.
Hare, R.D. (1997). The NATO Advanced Study Institute on psychopathy, Alvor 1995. Journal
of Personality Disorders, 11, 301–303.
Hare, R.D. (1998). Psychopathy, affect and behaviour. In D.J. Cooke, A.E. Forth, & R.D. Hare
(Eds.), Psychopathy: Theory, research and implications for society (pp. 105–137). Dordrecht,
The Netherlands: Kluwer.
Harris, D.B. (1963). Children’s drawings as measures of intellectual maturity. New York:
Harcourt, Brace & World.
Harris, W.G. (2000). Best practices in testing technology: Proposed computer-based testing
guidelines. Journal of e-Commerce and Psychology, 1(2), 23–35.
Haverkamp, B.E., Collins, R.C., & Hansen, J.I. (1994). Structure of interests of Asian American
college students. Journal of Counseling Psychology, 41, 256–264.
He, J., & van de Vijver, F. (2012). Bias and equivalence in cross-cultural research. Online
Readings in Psychology and Culture, 2(2). Retrieved 30 May 2014.
Heilman, M.E., Battle, W.S., Keller, C.E., & Lee, R.A. (1998). Type of affirmative action
policy: A determinant of reactions to sex-based preferential selection? Journal of Applied
Psychology, 83, 190–205.
Helms, J.E. (1992). Why is there no study of cultural equivalence in standardised cognitive
ability testing? American Psychologist, 47, 1083–1101.
Hersey, P., & Blanchard, K.H. (2008). Management of organisational behaviour (9th ed.).
Englewood Cliffs, NJ: Prentice Hall.
Hiebert, P.G. (1985). Anthropological insights for missionaries. Grand Rapids, MI: Baker Book
House.
Higgins, L.T., & Zheng, M. (2002). An introduction to Chinese psychology – Its historical roots
until the present day. The Journal of Psychology, 136(2), 225–239.
Hirsch, S.K. (1991). Using the Myers-Briggs type indicator in organizations (2nd ed.). Palo
Alto, CA: Consulting Psychologist Press.
Ho, D.Y.F. (1996). Filial piety and its psychological consequences. In M.H. Bond (Ed.),
Handbook of Chinese psychology (pp. 155–165). Hong Kong: Oxford University Press.
Hofstede, G. (1991). Cultures and organizations: Software of the mind. New York: McGraw-Hill.
Hofstede, G. (1994). Uncommon sense about organizations – Cases, studies, and field
observations. London: SAGE.
Hofstede, G. (1996). Cultural constraints in management theories. In R.M. Steers, L.W. Porter, &
G.A. Bigley (Eds.), Motivation and leadership at work (6th ed.). New York: McGraw-Hill.
Hogan, J., Davies, S., & Hogan, R. (2007). Generalizing personality-based validity evidence. In
S.M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging
existing validity evidence (pp. 181–229). San Francisco, CA: Jossey-Bass.
Holland, J.L. (1985). Making vocational choices (2nd ed.). Upper Saddle River, NJ: Prentice
Hall.
Holland, P.W., & Thayer, D.T. (1988). Differential item performance and the Mantel-Haenszel
procedure. In H. Wainer, & H.I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ:
Erlbaum.
Holland, P.W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ:
Erlbaum.
Holloway, J.D. (2003). Nondoctoral assistants in question. APA Monitor, 34(10), 26.
Horn, J.L., & Cattell, R.B. (1966). Refinement and test of the theory of fluid and crystallized
general intelligence. Journal of Educational Psychology, 57(5), 253–270.
Hough, L.M., Eaton, N.K., Dunnette, M.D., Kamp, J.D., & McCloy, R.A. (1990). Criterion-
related validities of personality constructs and the effect of response distortion on those
validities. Journal of Applied Psychology, 75, 581–595.
Hough, L.M., & Oswald, F.L. (2000). Personnel selection: Looking toward the future –
remembering the past. Annual Review of Psychology, 51, 631–664.
HPCSA (Health Professions Council of South Africa). (2006). South African guidelines on
computerised testing – Form 257. Pretoria: HPCSA Professional Board for Psychology.
HPCSA (Health Professions Council of South Africa). (2007). Discussion document: Scope of
practice. Retrieved 27 March 2008 from
http://www.hpcsa.co.za/hpcsa/UserFiles/File/PSYCHOLOGY/Discussion%20document%20%20%20Scope%20o
03-071%20-%20Special%20Educ%20Meeting%20(March2007).pdf
HPCSA (Health Professions Council of South Africa) Professional Board for Psychology.
(2006). Policy on the classification of psychometric measuring devices, instruments, methods
and techniques – Form 08. Retrieved 16 August 2008 from
http://www.hpcsa.co.za/hpcsa/Userfiles/File/Psychpolicyclassificationf208.Doc
HPCSA (Health Professions Council of South Africa). (2014). List of Classified and Certified
Psychological Tests. Pretoria: Health Professions Council of South Africa Board Notice 93 of
2014. Government Gazette No 37903, 15 August 2014.
Hui, C.H., & Triandis, H.C. (1989). Effects of culture and response format on extreme response
style. Journal of Cross-Cultural Psychology, 20, 296–309.
HumanMetrics (2013). The Jung typology profiler for workplace. Retrieved 12 August 2013
from http://www.humanmetrics.com/hr/business/preemploymenttesting.aspx#about_jtpw.
Hunt, E., Frost, N., & Lunneborg, C. (1973). Individual differences in cognition: A new
approach to intelligence. In G. Bower (Ed.), Advances in learning and motivation (Vol. VII)
(pp. 87–122). New York: Academic Press.
Hunter, J.E., & Hunter, R.F. (1984). Validity and utility of alternative predictors of job
performance. Psychological Bulletin, 96, 72–98.
Hunter, J.E., & Schmidt, F.L. (2004). Methods of meta-analysis: Correcting error and bias in
research findings (2nd ed.). Newbury Park, CA: SAGE.
ILO (International Labour Organization). (2000). Workers without frontiers: The impact of
globalization on international migration. Geneva: ILO. Retrieved 17 July 2013 from
http://www.ilo.org/global/standards/subjects-covered-by-international-labour-
standards/migrant-workers/lang—en/index.htm
ITC (International Test Commission). (2001). International guidelines for test use. Retrieved 24
July 2013 from http://www.intestcom.org/test_use_full.htm
Jankowicz, D. (2004). The easy guide to repertory grids. Chichester, UK: Wiley.
Jaques, E. (1982). Free enterprise, fair employment. New York and London: Crane Russak & Co
and Heinemann Educational Books.
Jaques, E. (1997). Requisite organization: A total system for effective managerial organization
and managerial leadership for the 21st century (1st ed.). Arlington, VA: Cason Hall.
Jaques, E. (1998). Requisite organisation: A total system for effective managerial organization
and managerial leadership for the 21st century (2nd ed.). Falls Church, VA: Cason Hall.
Jaques, E., Gibson, R.O., & Isaac, D.J. (1978). Levels of abstraction in logic and human action:
A theory of discontinuity in the structure of mathematical logic, psychological behaviour, and
social organization. London: Heinemann Educational Books.
Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.
Jing, G., Drasgow, F., & Gibby, R.E. (2012). Estimating the base rate of cheating for
unproctored Internet tests. Paper presented at the 27th Annual Conference of the Society for
Industrial and Organizational Psychology, San Diego, CA. (Cited in Macqueen, 2012.)
Jing, Q., & Fu, X. (2001). Modern Chinese psychology: Its indigenous roots and international
influences. International Journal of Psychology, 36, 408–418.
Jing, Q.C., Wan, C.W., & Lin, G.B. (2003). Psychological studies on Chinese only children in
the last two decades. Journal of Psychology in Chinese Societies, 3, 163–181.
Johnson, W., & Bouchard, T.J. Jr. (2005). Constructive replication of the visual-perceptual-
image rotation (VPR) model in Thurstone’s (1941) battery of 60 tests of mental ability.
Intelligence, 33, 417–430.
Jones, J.W. (1998). Virtual HR: Human resources management in the 21st century. Menlo Park,
CA: Crisp Publications.
Jones, J.W., & Higgins, K.D. (2001). Megatrends in personnel testing: A practitioner’s
perspective. Retrieved 18 March 2008 from
http://www.testpublishers.org/Documents/journal03.pdf
Jung, C.G. (1968). AION: Researches into the phenomenology of the self (2nd ed.). London:
Routledge.
Kanjee, A. (2007). Using logistic regression to detect bias when multiple groups are tested.
South African Journal of Psychology, 37, 47–61.
Kanjee, A., & Foxcroft, C. (2009). Cross-cultural test adaptation, translation and tests in multiple
languages. In C. Foxcroft, & G. Roodt (Eds.), Introduction to psychological assessment in the
South African context (3rd ed., pp. 77–89). Cape Town: Oxford University Press.
Kaplan, R.M., & Saccuzzo, O.P. (2013). Psychological testing: Principles, applications, and
issues (9th ed.). Belmont, CA: Wadsworth.
Kaplan, R.M. (1982). Nader’s raid on the testing industry: Is it in the best interests of the
consumer? American Psychologist, 37, 15–23.
Kaplan, R.M., & Saccuzzo, D.P. (1989). Psychological testing: Principles, applications, and
issues (2nd ed.). Pacific Grove, CA: Brooks/Cole.
Kaplan, R.M., & Saccuzzo, D.P. (2001). Psychological testing: Principles, applications and
issues (5th ed.). Belmont, CA: Wadsworth.
Kaplan, R.M., & Saccuzzo, D.P. (2005). Psychological testing: Principles, applications and
issues (6th ed.). Belmont, CA: Wadsworth.
Katz, M.R. (1983). SIGI: An interactive aid to career decision making. Journal of College
Student Personnel, 21, 34–40. (Cited in Langley, Du Toit, & Herbst, 1995.)
Keirsey, D. (1998). Please understand me II (1st ed.). Del Mar, CA: Prometheus Nemesis
Books.
Kelly, G.A. (1955). The psychology of personal constructs. Vol. 1: A theory of personality. New
York: Norton.
Khoza, R. (1994). African humanism. Diepkloof Extension, South Africa: Ekhaya Promotions.
Kirkpatrick, D.L. (1996). Evaluating training programs: The four levels. San Francisco, CA:
Berrett-Koehler.
Kitching, J. (2004). The measurement outcome equivalence of the career path appreciation
(CPA) for employees from diverse cultural backgrounds. MCom Dissertation, Pretoria,
University of Pretoria. Retrieved 15 August 2013 from
http://upetd.up.ac.za/thesis/available/etd-03162005-151333/unrestricted/00dissertation.pdf
Kline, R.B. (2010). Principles and practice of structural equation modeling (3rd ed.). New
York: Guilford Press.
Kluckhohn, F., & Strodtbeck, F.L. (1961). Variations in value orientation. Evanston, IL: Row.
Knott, K., Taylor, N., Oosthuizen, Y., & Bhabha, F. (2013). The Myers-Briggs type indicator in
South Africa. (Chapter 17, pp. 244–256). In S. Laher, & K. Cockcroft (Eds.), Psychological
assessment in South Africa: Research and applications. Johannesburg: Wits University Press.
Kouzes, J.M., & Posner, B.Z. (2009). To lead, create a shared vision. Harvard Business Review,
87, 20–21.
Kowalski, R., & Westen, D. (2004). Psychology: Brain, behavior, and culture (4th ed.). New
York: Wiley.
Kravitz, D.A., Harrison, D.A., Turner, M.E., Levine, E.L., Chaves, W., et al. (1997). Affirmative
action: A review of psychological and behavioural research. Bowling Green, OH: Society for
Industrial and Organisational Psychology.
Kriek, H.J., Hurst, D.N., & Charoux, J.A.E. (1994). The assessment centre: Testing the fairness
hypothesis. Perspectives in Industrial Psychology, 20(2), 21–25.
Laher, S., & Cockcroft, K. (Eds.). (2013). Psychological assessment in South Africa: Research
and applications. Johannesburg: Wits University Press.
Lambsdorff, J.G. (2007). The methodology of the corruption perceptions index. Transparency
International (TI) and University of Passau. Retrieved 18 March 2009 from
http://www.transparency.org
Lanchbury, P., & Kearns, A. (2000). How do you know when and if a candidate with a disability
needs a test accommodation? Journal of the Application of Occupational Psychology to
Employment and Disability, 2(2), 37–40. (Cited in Vermeulen, 2000.) Review of J. Sandoval,
C.L. Frisby, K.F. Geisinger, J.D. Scheuneman, & J. Ramos Grenier (Eds.) (1998), Test
interpretation and diversity: Achieving equity in assessment. Washington DC: American
Psychological Association. Retrieved 12 September 2013 from
http://www.dwp.gov.uk/docs/no1-oct-00-book-review-2.pdf
Langley, R., Du Toit, R., & Herbst, D.L. (1995). Manual for the values scale. Pretoria: Human
Sciences Research Council.
Larmour, P. (2008). Corruption and the concept of “culture”: Evidence from the Pacific Islands.
Crime, Law and Social Change, 49(3), 225–237.
Larmour, P. (2012). Corruption and the concept of “culture”: Evidence from the Pacific Islands.
Chapter 9 in M. Barcham, B. Hindess, & P. Larmour (Eds.), Corruption: Expanding the
focus. Canberra: Australian National University.
Levine, E.L., Spector, P.E., Menon, S., Narayanan, L., & Cannon-Bowers, J. (1996). Validity
generalisation for cognitive, psychomotor and perceptual tests for craft jobs in the utility
industry. Human Performance, 9, 1–22.
Liddell, C., & Kruger, P. (1987). Activity and social behaviour in a South African township
nursery: Some effects of crowding. Merrill Palmer Quarterly, 33(2), 195–211.
Lievens, F., & Klimoski, R.J. (2001). Understanding the assessment centre process: Where are
we now? In C.L. Cooper, & I.T. Robertson (Eds.), International Review of Industrial and
Organisational Psychology, Vol. 16 (pp. 245–286). Chichester, UK: Wiley.
Linn, R.L. (1973). Fair test use in selection. Review of Educational Research, 43, 343–357.
Lipson, J.G., & Meleis, A.I. (1989). Methodological issues in research with immigrants. Special
Issue: Cross-cultural nursing: Anthropological approaches to nursing research. Medical
Anthropology, 12, 103–115.
Littlefield, L., Stokes, D., & Li, B. (2010). Options for the protection of the public posed by the
inappropriate use of psychological testing. Submission to the Psychology Board of Australia,
Consultation Paper, Australian Psychological Society, September.
Lloyd-Jones, H. (1983). The justice of Zeus. (2nd ed.). Berkeley, CA: University of California
Press.
Lobello, S., & Sims, B. (1993). Fakability of a commercially produced pre-employment integrity
test. Journal of Business and Psychology, 8, 265–273.
Louisiana Psychological Association. (n.d.). Medical psychologists may prescribe medication.
Retrieved 24 July 2013 from http://www.louisianapsychologist.org/displaycommon.cfm?
an=1&subarticlenbr=6
Louw, D.A., & Edwards, D.J.A. (1997). Psychology: An introduction for students in Southern
Africa (2nd ed.). Johannesburg: Heinemann.
Lubinski, D., & Benbow, C. (2000). States of excellence: A psychological interpretation of their
emergence. American Psychologist, 55, 137–150.
Luria, A.R. (1973). The working brain: An introduction to neuropsychology (B. Haigh, trans.).
New York: Basic Books.
Luria, A.R. (1966). Human brain and psychological processes. New York: Harper and Row.
Lyman, H.B. (1998). Test scores and what they mean (6th ed.). Boston, MA: Allyn & Bacon.
Macqueen, P. (2012). The rapid rise of online psychological testing in selection. InPsych
(Australian Psych Soc) October. Available from
http://www.psychology.org.au/Content.aspx?ID=4925
Makhudu, N. (1993). Cultivating a climate of cooperation through Ubuntu. Enterprise, 68, 40–
41. (Cited in Prinsloo, 2001.)
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective
studies of disease. Journal of the National Cancer Institute, 22, 719–748.
Maree, J.G. (2010). Brief overview of the advancement of postmodern approaches to career
counselling. Journal of Psychology in Africa, 20(3), 361–368.
Marks-Tarlow, T. (2002). Fractal dynamics of the psyche. Retrieved 6 July 2006 from
http://www.goertzel.org/dynapsyc/2002/FractalPsyche.htm
Marsh, H.W., & Byrne, B.M. (1993). Confirmatory factor analysis of multigroup–multimethod
self-concept data: Between-group and within-group invariance constraints. Multivariate
Behavioral Research, 28, 313–349.
Maslow, A. (1954). Motivation and personality. New York: Harper.
Masters, G.N. (1985). Common-Person equating with the Rasch model. Applied Psychological
Measurement, 9(1), 73–82.
Matarazzo, J.D. (1992). Testing and assessment in the 21st century. American Psychologist, 47,
1007–1018.
Mathews, R., Stokes, D., & Grenyer, B. (2010). A snapshot of the Australian psychology
workforce. InPsych, 32(5), 28–30. (Cited in Psychology 2020.)
Mauer, K.F. (2003a). News flash 22 April, 2003: New validity evidence for the Career Path
Appreciation (CPA). Johannesburg: BIOSS.
Mauer, K.F. (2003b). News flash 23 November, 2003: New findings on the test–retest validity of
the CPA. Johannesburg: BIOSS.
Mauer, K.F. (n.d.). Summary of research evidence in respect of the CPA and IRIS.
Johannesburg: BIOSS.
Mayer, J.D., & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17,
433–442.
McCaulley, M.H. (2000). Myers-Briggs type indicator: A bridge between counselling and
consulting. Consulting Psychology Journal: Practice and Research, 57, 117–132.
McClelland, D.C., Atkinson, J.W., Clark, R.A., & Lowell, E.L. (1953). The achievement motive.
New York: Appleton-Century-Crofts.
McCormick, E.J., Jeanneret, P.R., & Meacham, R.C. (1989). Position analysis questionnaire.
Bellingham, WA: PAQ Services.
McCormick, E.J., Jeanneret, P.R., & Meacham, R.C. (1972). A study of job characteristics and
job dimensions as based on the position analysis questionnaire. Journal of Applied
Psychology, 56, 347–368.
McCrae, R.R., & Costa, P.T. Jr. (1997). Toward a new generation of personality theories:
Theoretical contexts for the five-factor model. In J.S. Wiggins (Ed.), The five-factor model of
personality: Theoretical perspectives (pp. 51–87). New York: Guilford.
McIntire, S.A., & Miller, L.A. (2000). Foundations of psychological testing. New York:
McGraw-Hill.
McManus, M.A., & Kelly, M.L. (1999). Personality measures and biodata: Evidence regarding
their incremental predictive validity in the life insurance industry. Personnel Psychology,
52(1/2), 137–148.
Meehl, P.E. (1954). Clinical versus statistical prediction – A theoretical analysis and a review of
the evidence. Minneapolis, MN: University of Minnesota Press.
Meijer, R.R., & Nering, M.L. (1999). Computerized adaptive testing: Overview and
introduction. Applied Psychological Measurement, 23, 187–194.
Mellam, A., & Aloi, D. (2003). Papua New Guinea national integrity systems: Country study
report. Blackburn South: Transparency International Australia.
Mercer, J.R. (1973). The pluralistic assessment project: Sociocultural effects in clinical
assessment. School Psychology Digest, 2, 10–18.
Meyer, W.F., Moore, C., & Viljoen, H.G. (1997). Personology: From individual to ecosystem.
Johannesburg: Heinemann.
Miller, L.K. (1997). Principles of everyday behavior analysis. Pacific Grove, CA: Brooks/Cole.
Millon, T. (1997). Millon clinical multiaxial inventory-III manual (2nd ed.). Minneapolis, MN:
National Computer Systems.
Milner, K., Donald, F., & Thatcher, A. (2013). Psychological assessment and workplace
transformation in South Africa: A review of the research literature. In S. Laher, & K.
Cockcroft (Eds.), Psychological assessment in South Africa: Research and applications.
Johannesburg: Wits University Press.
Miltenberger, R. (1997). Behavior modification: Principles and procedures. Pacific Grove, CA:
Brooks/Cole.
Morgeson, F.P., Campion, M.A., Dipboye, R.L., Hollenbeck, J.R., Murphy, K., & Schmitt, N.
(2007). Reconsidering the use of personality tests in personnel selection contexts. Personnel
Psychology, 60, 683–729.
Morris, C.G., & Maisto, A.A. (2002). Psychology: An introduction (12th ed.). Upper Saddle
River, NJ: Prentice Hall.
Mount, M.K., Barrick, M.R., & Strauss, J.P. (1994). Validity of observer ratings of the Big Five
personality factors. Journal of Applied Psychology, 79, 272–280.
Muchinsky, P.M., Kriek, H.J., & Schreuder, A.M.G. (2002). Personnel Psychology (2nd ed.).
Johannesburg: Thomson.
Muniz, J., Bartram, D., Evers, A., Boben, D., Matesic, K., Glabeke, K., Fernandez-Hermida,
J.R., & Zaal, J.N. (2001). Testing practices in European countries. European Journal of
Psychological Assessment, 17(3), 201–211.
Murphy, K.R. (2000). What constructs underlie measures of honesty or integrity? In R. Goffin,
& E. Helmes (Eds.), Problems and solutions in human assessment: A Festschrift to Douglas
N. Jackson at seventy (pp. 265–284). London: Kluwer.
Murphy, K.R., & Davidshofer, C.O. (2006). Psychological testing: Principles and applications
(6th ed.). Upper Saddle River, NJ: Prentice Hall.
Murphy, L., Plake, B.S., & Spies, R.A. (Eds.). (2008). Tests in Print VII. Lincoln, NE:
University of Nebraska Press, Buros Institute of Mental Measurements.
Murphy, R., & Maree, J.F. (2006). A review of South African research in the field of dynamic
testing. South African Journal of Psychology, 36(1), 168–191.
Murray, M. (2005). How to design a successful assessment centre. People Management (UK),
11(4), 24–45.
Myers, I.B. (1980). Introduction to type. Palo Alto, CA: Consulting Psychologists Press.
Myers, I.B., & Briggs, K.C. (1962). The Myers-Briggs type indicator. Palo Alto, CA: Consulting
Psychologists Press.
Myers, I.B., & McCaulley, M.H. (1985). Manual: A guide to the development and use of the
Myers-Briggs type indicator. Palo Alto, CA: Consulting Psychologists Press.
Naglieri, J.A., & Das, J.P. (1997). Cognitive assessment system. Administration and scoring
manual. Interpretive handbook. Itasca, IL: Riverside.
Naicker, A. (1994). The psycho-social context of career counselling in S.A. schools. South
African Journal of Psychology, 24(1), 7–34.
Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., et al. (1995).
Intelligence: Knowns and unknowns: Report of a task force established by the Board of
Scientific Affairs of the American Psychological Association. Washington DC: APA Science
Directorate. Retrieved 18 October 2007 from
http://www.lrainc.com/swtaboo/taboos/apa_01.html
Nevill, D.D., & Super, D.E. (1986). The values scale: Theory, application and research. Palo
Alto, CA: Consulting Psychologists Press.
Nunnally, J.C., & Bernstein, I.H. (1993). Psychometric theory. New York: McGraw-Hill.
Nzimande, B. (1984). Industrial psychology and the study of black workers in South Africa: A
review and critique. Psychology in Society, 2, 54–91.
Nzimande, B. (1995). To test or not to test? Paper presented at the Congress on Psychometrics,
Council for Scientific and Industrial Research, Pretoria, South Africa.
Oaklander, V. (1997). The therapeutic process with children and adolescents. Gestalt Review,
1(4), 292–317.
Ones, D.S., & Viswesvaran, C. (1998). The effects of social desirability and faking on
personality and integrity assessment for personnel selection. Human Performance, 11, 245–
269.
Ones, D.S., Viswesvaran, C., & Reiss, A.D. (1996). Role of social desirability in personality
testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679.
Ones, D.S., Viswesvaran, C., & Schmidt, F.L. (1993). Comprehensive meta-analysis of integrity
test validities: Findings and implications for personnel selection and theories of job
performance. Journal of Applied Psychology Monograph, 78, 679–703.
Ones, D., Viswesvaran, C., & Dilchert, S. (2005). Personality at work: Raising awareness and
correcting misconceptions. Human Performance, 18(4), 389–404.
Paine, L.S. (1997). Integrity. In P.H. Werhane, & R.E. Freeman, (Eds.), Encyclopedic dictionary
of business ethics (pp. 335–337). Oxford: Blackwell.
PAQ Services. (2013). PAQ’s job analysis. Retrieved 29 July 2013 from
http://www.paq.com/index.cfm?Fuse-Action=bulletins.job-analysis
Pausewang, G. (1997). Adi: Jugend eines Diktators {Adi: The adolescence of a dictator}.
Ravensburg, Germany: Ravensburger Verlag.
Pedrajita, J.Q., & Talisayon, V.M. (2009). Identifying biased test items by differential item
functioning analysis by using contingency table approaches: A comparative analysis.
Education Quarterly, 67(1), 21–43. (University of the Philippines College of Education).
Retrieved 17 July 2013 from
http://journals.upd.edu.ph/index.php/edq/article/viewFile/2017/1912
Peter, L.J., & Hull, R. (1969). The Peter Principle: Why things always go wrong. New York:
William Morrow.
Peters, T. (1994). The Tom Peters seminar: Crazy times call for crazy organizations. London:
Macmillan.
Petersen, I. (2004). Primary level psychological services in South Africa: Can a new
psychological professional fill the gap? Health Policy and Planning, 19, 33–40.
Peterson, I. (1993). From prisons to autos to space. Science News, 17 July, 37.
Phalet, K., & Hagendoorn, L. (1996). Personal adjustment to acculturative transitions: The
Turkish experience. International Journal of Psychology, 31, 131–144.
Phalet, K., & Swyngedouw, M. (2003). A cross-cultural analysis of immigrant and host values
and acculturation orientations. In H. Vinken, & P. Esther (Eds.), Comparing cultures (pp.
185–212). Leiden: Brill.
Phalet, K., van Lotringen, C., & Entzinger, H. (2000). Islam in de multiculturele samenleving
{Islam in the multicultural society}. Utrecht, The Netherlands: University of Utrecht,
European Research Centre on Migration and Ethnic Relations.
Pilbeam, S., & Corbridge, M. (2006). People resourcing: Contemporary HRM in practice (3rd
ed.). Essex, UK: Prentice Hall.
Ployhart, R.E., & Tsacoumis, S. (2001). Strategies for reducing adverse impact. Paper presented
at the February 2001 workshop of the Personnel Testing Council of Metropolitan
Washington, DC. (Cited in Blair, 2003.)
Posner, M.I. (1980). Orienting of attention. The Quarterly Journal of Experimental Psychology,
32(1), 3–25.
Price, R.K., Spitznagel, E.L., Downey, T.J., Meyer, D.J., Risk, N.K., & el-Ghazzawy, O.G.
(2000). Applying artificial neural network models to clinical decision making. Psychological
Assessment, 12(1), 40–51.
Prinsloo, C.H. (1998). Manual for the use of the Sixteen Personality Factor Questionnaire:
South African 1992 version (16PF SA92). Pretoria: Human Sciences Research Council.
Prinsloo, E.D. (2001). A comparison between medicine from an African (Ubuntu) and Western
philosophy. Curationis, 24(1), 58–65.
Psychology 2020 – The 2011–2012 President’s Initiative on the Future of Psychological Science
in Australia. The Australian Psychological Society. Available at
http://www.psychology.org.au/Assets/Files/2012_APS_PIFOPS_WEB.pdf
Public Service Commission (of Canada). (2006). Standardized testing and employment equity
career counselling: A literature review of six tests. Retrieved 15 June 2007 from
http://www.psc-cfp.gc.ca/ee/eecco/intro_e.htm
Quenk, N.L. (2000). Essentials of Myers-Briggs type indicator assessment. New York: Wiley.
Raine, A., & Venables, P.H. (1984a) Tonic heart rate, social class and antisocial behaviour in
adolescents. Biological Psychology, 18, 123–132.
Raine, A., & Venables, P.H. (1984b). Electrodermal non-responding, antisocial behaviour and
schizoid tendencies in adolescents. Psychophysiology, 21(4), 424–433.
Raven, J., Raven, J.C., & Court, J.H. (2003, updated 2004). Manual for Raven’s progressive
matrices and vocabulary scales. San Antonio, TX: Harcourt Assessment.
Reed, T.E., & Jensen, A.R. (1993). Choice reaction time and visual pathway conduction
velocity both correlate with intelligence but appear not to correlate with each other:
Implications for information processing. Intelligence, 17, 191–203.
Reid, B. (1990). Weighing up the factors: Moral reasoning and culture change in a Samoan
community. Ethos, 18(1), 48–71.
Republic of South Africa. (1995). Labour Relations Act, No. 66 of 1995. Government Gazette,
No. 16861, Cape Town.
Republic of South Africa. (1998). Employment Equity Act, No. 55 of 1998. Government
Gazette, No 19370, Pretoria.
Richman-Hirsch, W.L., Olson-Buchanan, J.B., & Drasgow, F. (2000). Examining the impact of
administration medium on examinee perceptions and attitudes. Journal of Applied
Psychology, 85, 880–887.
Robbins, S.P. (1996). Organisational behaviour: Concepts, controversies, applications (7th ed.).
San Diego, CA: Prentice Hall.
Roid, G.H. (2003). Stanford-Binet intelligence scales interpretive manual: Expanded guide to
the interpretation of SB5 test results. Itasca, IL: Riverside.
Rosenthal, R., & Rubin, D.B. (1978). Interpersonal expectancy effects: The first 345 studies.
Behavioral and Brain Sciences, 3, 377–386.
Rothstein, M.G., & Goffin, R.D. (2006). The use of personality measures in personnel selection:
What does current research support? Human Resource Management Review, 16(2), 155–180.
Rounds, J.B., & Tracey, T.J. (1996). Cross-cultural structural equivalence of RIASEC models
and measures. Journal of Counseling Psychology, 43(3), 310–329.
Roussos, L.A., & Stout, W.F. (1996). Simulation studies of the effects of small sample size and
studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal
of Educational Measurement, 33, 215–230.
Rudmin, F., & Ahmadzadeh, V. (2001). Psychometric critique of acculturation psychology: The
case of Iranian migrants in Norway. Scandinavian Journal of Psychology, 42(1), 41–56.
(Cited by Donoso, 2010, q.v.).
Russell, C.J. (2000). The Cleary Model: “Test bias” as defined by the EEOC uniform guidelines
on selection procedures. Retrieved 26 March 2008 from
http://www.ou.edu/russell/whitepapers/Cleary_model.pdf
Russell, M., & Haney, W. (1997). Testing writing on computers: An experiment comparing
student performance on tests conducted via computer and via paper-and-pencil. Education
Policy Analysis Archives, 5(3). Retrieved 25 September 2009 from
http://olam.ed.asu.edu/epaa/v5n3.html
Ryan, A.M., & Sackett, P.R. (1987). Pre-employment honesty testing: Fakability, reactions of
test-takers and company image. Journal of Business and Psychology, 1, 248–256.
Ryan, J., Tracey, T., & Rounds, J. (1996). Generalizability of Holland’s structure of vocational
interests across ethnicity, gender, and socioeconomic status. Journal of Counseling
Psychology, 43, 330–337.
Ryder, A., Alden, L., & Paulhus, D. (2000). Is acculturation unidimensional or bidimensional?
Journal of Personality and Social Psychology, 79, 49–65.
Saccuzzo, D.P., & Jackson, N.E. (1995). Identifying traditionally under-represented children for
gifted programs. The National Research Center on the Gifted and Talented Newsletter,
Winter, 4–5.
Sackett, P., & Wanek, J.E. (1996). New developments in the use of measures of honesty,
integrity, conscientiousness, dependability, trustworthiness, and reliability for personnel
selection. Personnel Psychology, 47, 787–829.
Sackett, P.R., Burris, L.R., & Callahan, C. (1989). Integrity testing for personnel selection: An
update. Personnel Psychology, 42, 491–529.
Sagana, A., & Potocnik, K. (2009). Psychological teaching and training in Europe. Paper
presented at the 32nd Inter-American Congress on Psychology 28 June – 2 July, Guatemala
City. Retrieved 19 July 2013 from
http://www.iaapsy.org/division15/uploads/congress/Sagana&Potocnik.pdf
Salovey, P., & Mayer, J.D. (1990). Emotional intelligence. Imagination, Cognition, and
Personality, 9, 185–211.
SAQA (The South African Qualifications Authority). (2012). Level descriptors for the South
African National Qualifications Framework. Pretoria: SAQA (pp. 5–12).
Savickas, M.L. (2006). A vocational psychology for the global economy. Keynote presentation at
the Psychological Association, New Orleans, LA. (Cited in Maree, 2010.)
Saville & Holdsworth Ltd (SHL) (South Africa). (2005). Study into the regulation of
psychologists and psychological assessment in 21 countries. Retrieved 21 November 2005
from http://www.shl.com/SHL/za/Products/Access_Competencies/online-competency-
profiler.aspx
Schein, E.H. (1990). Career anchors: Discovering your real values. San Diego, CA: Pfeiffer.
Schein, E.H. (1992). Organizational culture and leadership. San Francisco, CA: Jossey-Bass.
Schein, E.H. (1995). Career orientations inventory. Englewood Cliffs, NJ: Prentice Hall.
Schein, E.H. (1996). Career anchors revisited: Implications for career development in the 21st
century. Retrieved 18 September 2007 from http://www.solonline.org/res/wp/10009.html
Schellenberg, S.J. (2004). Test bias or cultural bias: Have we really learned anything? Paper
presented at the symposium The Achievement Gap: Test Bias or School Structures?
sponsored by the National Association of Test Directors as part of the Annual Meeting of the
National Council for Measurement in Education, San Diego, CA, 14 April 2004. Retrieved 16
September 2013 from http://datacenter.spps.org/uploads/Test_Bias_Paper.pdf
Schmidt, F.L. (1988). The problem of group differences in ability scores in employment
selection. Journal of Vocational Behaviour, 33, 272–292.
Schmidt, F.L., Hunter, J.E., McKenzie, R.C., & Muldrow, T.W. (1979). Impact of valid selection
procedures on work-force productivity. Journal of Applied Psychology, 64(6), 609–626.
Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel
psychology: Practical and theoretical implications of 85 years of research findings.
Psychological Bulletin, 124(2), 262–274.
Schumacher, L.B. (2010). Statement made by the Director of the PhD in Global Leadership at an
Indiana Tech Immersion Weekend (March 2010) (Personal communication 24 July, 2013).
Schwartz, A.L. (1983). Recruiting and selection of sales-people. In E.E. Borrow, & L.
Wizenberg (Eds.), Sales managers’ handbook (pp. 341–348), Homewood, IL: Dow Jones
Irwin.
Scully Mogg Consulting. (1999). Unpublished assessment centre manual. Johannesburg: Scully
Mogg Consulting.
Shackleton, V., & Newell, S. (1997). International assessment and selection. In N. Anderson, &
P. Herriot (Eds.), International handbook of selection and assessment (Vol. 13, pp. 81–95).
Chichester, UK: Wiley.
Shealy, R., & Stout, W.F. (1993). A model-based standardization approach that separates true
bias/DIF from group differences and detects test bias/DIF as well as item bias/DIF.
Psychometrika, 58, 159–194.
Sheldon, W., & Stevens, S.S. (1942). Varieties of human temperament: A psychology of
constitutional differences. New York: Harper.
Shepard, L.A., Camilli, G., & Williams, D.M. (1985). Validity of approximation techniques for
detecting item bias. Journal of Educational Measurement, 22, 77–105.
Sheppard, L.D., & Vernon, P.A. (2008). Intelligence and speed of information processing: A
review of 50 years of research. Personality and Individual Differences, 44, 535–551.
Shippmann, J.S., Ash, R.A., Battista, M., Carr, L., Eyde, L.D., Hesketh, B., Kehoe, J.,
Pearlman, K., Prien, E.P., & Sanchez, J.I. (2000). The practice of competency modeling.
Personnel Psychology, 53, 703–740.
Shore, B. (1996). Culture in mind: Cognition, culture and the problem of meaning. Oxford:
Oxford University Press.
Siegle, G.J., & Hasselmo, M.E. (2002). Using connectionist models to guide assessment of
psychological disorders. Psychological Assessment, 14(3), 263–278.
Silzer, R., & Jeanneret, R. (1998). Anticipating the future: Assessment strategies for tomorrow.
In R. Jeanneret, & R. Silzer (Eds.), Individual psychological assessment: Predicting
behaviours in organizational settings (pp. 445–477). San Francisco, CA: Jossey-Bass.
Singh, H.P., & Dakunivosa, M. (2001). Fiji national integrity systems: Country study report.
Blackburn South: Transparency International Australia.
So’o, L.L., Asofou, S., Ruta-Fiti, V., Unasa L.F., & Lāmeta, S. (2004). Sāmoa national integrity
systems: Country study report. Blackburn South: Transparency International Australia.
Society for Human Resource Management. (2012). SHRM Elements for HR Success Competency
Model®. Retrieved 12 December 2013 from
http://www.shrm.org/hrcompetencies/documents/competency%20model%208%200.pdf
Society for Industrial and Organisational Psychology of South Africa (SIOPSA). (2005).
Guidelines for the validation and use of assessment procedures for the workplace. Pretoria:
SIOPSA.
Society for Industrial and Organisational Psychology of South Africa (SIOPSA). (2012).
Recommendations for regulating development, control and use of psychological tests. Drawn
up by People Assessment in Industry (PAI – an Interest Group of the Society for Industrial
and Organisational Psychology of South Africa). Pretoria: SIOPSA.
Solomon, R.C. (1999). A better way to think about business: How personal integrity leads to
corporate success. New York: Oxford University Press.
South African Department of Education. (2000). Norms and Standards for Educators,
Government Gazette, 415 (20844), 4 February 2000: Pretoria. Retrieved 25 March 2008 from
http://www.info.gov.za/gazette/notices/2000/20844.pdf
South African Government. (1998). Employment Equity Act (55 of 1998). Government Gazette,
No. 19370, Vol. 400, Cape Town: Government Printer.
South African Government. (1999). Skills Levies Act (9 of 1999). Government Gazette, No.
19984, Vol. 406, Cape Town: Government Printer.
Stamp, G., & Retief, A. (1996). Towards a culture-free identification of working capacity: The
Career Path Appreciation. Uxbridge, UK: BIOSS.
Stamp, G., & Stamp, C. (1993). Well-being at work: Aligning purposes, people, strategies and
structure. International Journal of Career Management, 5(3).
Stamp, G., & Stamp, C. (2004). The individual, the organisation and the path to mutual
appreciation. Retrieved 27 January 2009 from http://www.gillianstamp.com
Stampe, D., Roehl, B., & Eagan, J. (1993). Virtual reality creations. Corte Madera, CA: White
Group Press.
Stanton, J.M. (1999). Validity and related issues in Web-based hiring. The Industrial-
Organizational Psychologist, 36(3), 69–77.
Stanush, P., Arthur, W., & Doverspike, D. (1998). Hispanic and African American reactions to a
simulated race-based affirmative action scenario. Hispanic Journal of Behavioural Science,
20(1), 3–16.
Sternberg, R.J., & Grigorenko, E.L. (2002). Dynamic testing: The nature and measurement of
learning potential. Cambridge: Cambridge University Press.
Sternberg, R.J. (1977). Intelligence, information processing, and analogical reasoning: The
componential analysis of human abilities. Hillsdale, NJ: Erlbaum.
Sternberg, R.J. (1988). The triarchic mind: A new theory of human intelligence. New York:
Viking.
Sternberg, R.J. (Ed.). (2000). Handbook of intelligence. New York: Cambridge University Press.
Sternberg, R.J., & Detterman, D.K. (Eds.). (1986). What is intelligence? Contemporary
viewpoints on its nature and definition. Norwood, NJ: Ablex.
Sternberg, R.J., Forsythe, G.B., Hedlund, J., Horvath, J., Snook, S., Williams, W.M., Wagner,
R.K., & Grigorenko, E.L. (2000). Practical intelligence in everyday life. New York:
Cambridge University Press.
Sullivan, L., & Arnold, D.W. (2000). Invasive questions lead to legal challenge, settlement and
use of different tests. The Industrial Psychologist, 38(2), 142–143.
Swaminathan, H., & Rogers, H.J. (1990). Detecting differential item functioning using logistic
regression procedures. Journal of Educational Measurement, 27, 361–370.
Syracuse University HRD Competency Library (n.d.). Retrieved 15 August 2013 from
http://humanresources.syr.edu/staff/nbu_staff/comp_library.html
Taylor, T.R. (1994). A review of three approaches to cognitive assessment and a proposed
integrative approach based on a unifying theoretical framework. South African Journal of
Psychology, 24(4), 184–193.
Taylor, T.R. (2006). Administrator’s manual for TRAM-1 battery. Edition 3. Johannesburg:
Aprolab.
Taylor, T.R. (2013). APIL and TRAM learning potential assessment instruments. In S. Laher, &
K. Cockcroft (Eds.), Psychological Assessment in South Africa: Research and Applications
(Chapter 11, pp. 158–168). Johannesburg: Wits University Press.
Terpstra, D.E., Mohamed, A.A., & Kethley, R.B. (1999). An analysis of federal court cases
involving nine selection devices. International Journal of Selection and Assessment, 7(1),
26–33.
Theron, C. (2007). Confessions, scapegoats and flying pigs: Psychometric testing and the law.
South African Journal of Industrial Psychology, 33(1), 102–117.
Theron, C. (2009). The diversity-validity dilemma: In search of minimum adverse impact and
maximum utility. South African Journal of Industrial Psychology, 35(1), 1–13.
Thorndike, R.L. (1971). Educational measurement (2nd ed.). Washington DC: American
Council on Education.
Thurstone, L.L. (1938). Primary mental abilities. Chicago: University of Chicago Press.
Tippins, N.T. (2009). Internet alternatives to traditional proctored testing: Where are we now?
Industrial and Organizational Psychology: Perspectives on Science and Practice, 2(1), 2–10.
Transparency International (TI). (2009). The anti-corruption plain language guide. Switzerland:
TI.
Tredoux, C., & Durrheim, K. (Eds.). (2005). Numbers, hypotheses and conclusions (2nd ed.).
Cape Town: UCT Press.
Tredoux, N. (2013). Using computerised and Internet-based testing in South Africa. In S. Laher,
& K. Cockcroft (Eds.), Psychological assessment in South Africa: Research and applications.
Johannesburg: Wits University Press.
Triandis, H.C. (2000). Culture and conflict. International Journal of Psychology, 35(2), 145–
152.
US Congress, Office of Technology Assessment. (1990). The use of integrity tests for pre-
employment screening (OTA-SET-442). Washington, DC: US Government Printing Office.
Valchev, V.H., Nel, J.A., van de Vijver, F.J.R., Meiring, D., de Bruin, G.P., & Rothmann, S.
(2013). Similarities and differences in implicit personality concepts across ethnocultural
groups in South Africa. Journal of Cross-Cultural Psychology, 44, 365–388.
Van de Vijver, F.J.R. (2002). Cross-cultural assessment: Value for money? Applied Psychology:
An International Review, 51(4), 545–566.
Van de Vijver, F.J.R., & Hambleton, R.K. (1996). Translating tests: Some practical guidelines.
European Psychologist, 1(2), 89–99.
Van de Vijver, F.J.R., Helms-Lorenz, M., & Feltzer, M.F. (1999). Acculturation and cognitive
performance of migrant children in the Netherlands. International Journal of Psychology, 34,
149–162.
Van de Vijver, F.J.R., & Leung, K. (1997a). Methods and data analysis of comparative research.
In J.W. Berry, Y.H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology
(2nd ed.) (Vol. 1, pp. 257–300). Boston: Allyn & Bacon.
Van de Vijver, F.J.R., & Leung, K. (1997b). Methods and data analysis for cross-cultural
research. Newbury Park, CA: SAGE.
Van de Vijver, F.J.R., & Phalet, K. (2004). Assessment in multicultural groups: The role of
acculturation. Applied Psychology: An International Review, 53, 215–236.
Van de Vijver, F.J.R., & Poortinga, Y.H. (1997). Towards an integrated analysis of bias in cross-
cultural assessment. European Journal of Psychological Assessment, 13, 29–37.
Van de Vijver, F.J.R., & Poortinga, Y.H. (2002). Structural equivalence in multilevel research.
Journal of Cross-Cultural Psychology, 33(2), 141–156.
Van de Vijver, F.J.R., & Tanzer, N.K. (2004). Bias and equivalence in cross-cultural assessment:
An overview. European Review of Applied Psychology, 54(2), 119–135.
Vernon, P.A. (1993). Biological approaches to the study of human intelligence. Norwood, NJ:
Ablex.
Vernon, P.E. (1960). The structure of human abilities (revised ed.). London: Methuen.
Verster, J.M. (1989). A cross-cultural study of cognitive processes using computerized tests.
Unpublished PhD Thesis, University of Pretoria, Pretoria.
Vygotsky, L.S. (1978). Mind and society: The development of higher mental processes.
Cambridge, MA: Harvard University Press.
Walters, L.C., Miller, M.R., & Ree, M.J. (1993). Structured interviews for pilot selection: No
incremental validity. International Journal of Aviation Psychology, 3(1), 25–38.
Wanek, J.E., Sackett, P.R., & Ones, D.S. (2003). Towards an understanding of integrity test
similarities and differences: An item-level analysis of seven tests. Personnel Psychology, 56,
873–894.
Wang, Z., & Mobley, W.H. (2011). Spotlight on global I-O: Industrial-organizational psychology
developments in China. The Industrial-Organizational Psychologist, 49(2), 101–104.
Retrieved 24 July 2013 from http://www.siop.org/tip/oct11/16thompson.aspx
Wang, Z.M. (1993). Psychology in China: A review dedicated to Li Chen. Annual Review of
Psychology, 44, 87–116.
Wang, Z.M. (1995). Culture, economic reform and the role of industrial and organizational
psychology in China. In M.D. Dunnette, & L.M. Hough (Eds.), Handbook of industrial and
organizational psychology, (2nd ed.). (pp. 689–726). Palo Alto, CA: Consulting
Psychologists Press.
Ward, C., & Kennedy, A. (1992). The effects of acculturation strategies on psychological and
sociocultural dimensions of cross-cultural adjustment. Paper presented at the 3rd Asian
Regional IACCP Conference, Bangi, Malaysia. (Cited in Donoso, 2010, q.v.)
Ward, C., & Searle, W. (1991). The impact of value discrepancies and cultural identity on
psychological and sociocultural adjustment of sojourners. International Journal of
Intercultural Relations, 15(2), 209–224.
Wechsler, D. (1939). Measurement of adult intelligence. Baltimore, MD: Williams & Wilkins.
Weiner, J.A., & Rice, C. (2012). Utility of alternative UIT verification models. Paper presented
at the 27th Annual Conference of the Society for Industrial and Organizational Psychology,
San Diego, CA. (Cited by Macqueen, 2012.)
Wemm, R.L. (2001). International psychologists are trained by varying degrees. Observer
(American Psychological Society), 14(3). Retrieved 22 July 2013 from
http://www.psychologicalscience.org/observer/0301/notebook2.html. See also
http://www.neurognostics.com.au/AcademicEquivs/OzziePsychoCringe.htm.
Werner, O., & Campbell, D.T. (1970). Translating, working through interpreters, and the
problem of decentering. In R. Naroll, & R. Cohen (Eds.), A handbook of cultural
anthropology (pp. 398–419). New York: American Museum of Natural History.
Westen, D. (2002). Psychology: Brain, behaviour and culture. New York: Wiley.
White, M.J., Brockett, D.R., & Overstreet, B.G. (1993). Confirmatory bias in evaluating
personality test information: Am I really that kind of person? Journal of Counseling
Psychology, 40(1), 120–126.
Whiteley, P. (2012). Are Britons becoming more dishonest? Retrieved 4 June 2013 from
http://www.essex.ac.uk/government/news_and_seminars/newsEvent.aspx?e_id=3880
Wiggins, J.S. (1973). Personality and prediction: Principles of personality assessment. Reading,
MA: Addison-Wesley.
Williams, R.W. (2006). The not-so-hidden costs of poor selection. Retrieved 15 April 2008 from
http://hr.monster.com/articles/wendell/wendell3
Wise, P.S. (1989). The use of assessment techniques by applied psychologists. Belmont, CA:
Wadsworth.
Wolfaardt, J.B., & Roodt, G. (2005). Basic concepts. In C. Foxcroft, & G. Roodt (Eds.), An
introduction to psychological assessment in the South African context (2nd ed.). (Chap. 3).
Cape Town: Oxford University Press.
Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning
(DIF): Logistic regression modeling as a unitary framework for binary and Likert-type
(ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and
Evaluation, Department of National Defense.
Zunker, V.G. (1998). Career counseling: Applied concepts of life planning (5th ed.). Pacific
Grove, CA: Brooks/Cole.
Index
A
Abbott, B.B. 15
Abrahams, F. 53, 137
absolute zero 9
acquiescence 55
Adarraga, P. 234
Adult Basic Competence Test 91
adverse impact 78
affirmative action 69
age equivalents 58
age referencing 58
Allport, G. 127
alternate form reliability 40
ambiguous pictures 131
American Psychological Association 102, 125
Anastasi, A. 58
Anolli, L. 234
antecedents 17
Antonius, R. 256
anxiety levels 54
APA see American Psychological Association
Aristotle 13
Arnold, D.W. 222
Arnold, J. 146, 201
Arthur, W. 83
artificial intelligence 234
artificial situations 18
Aryee, S. 170
aspiration level 186
assembly tasks 123
assess 5
why assess? 5
assessment 3, 20, 94, 219, 234
assessment as research 20
biologically anchored 234
dynamic 219
assessment centres 201, 211–213
definition 201
fairness 213
psychometric properties 211
reliability 212
validity 212
assessment instructions 94
assessment materials 94
attenuation 42
Australia 101
autonomy/independence 183
B
banding 84
Barnouw, V. 127
Barnum effect 137
Bar-On, R. 222
Baron, R.A. 127
Barrick, M.R. 140, 169
base rate 69
battery 88, 185
Baydoun, R. 226
behavioural change 225
behavioural indicators 147, 208–209
behaviourally anchored rating scales 174
Belbin, M. 169, 206
Benbow, C. 115
Ben-Porath, Y.S. 229
Bernstein, I.H. 3, 5, 12, 32, 44–45, 48, 50, 66, 73, 83
Berry, J.W. 222
bias 47
Big Five dimensions 169
Big Five theory 236
Binet, A. 113
biologically anchored assessment 234
Blair, M.D. 213, 215
Blanchard, K.H. 150, 170
Bogardus Social Distance Scale 31
Borden, K.S. 15
Borgen, F.H. 185
Borman, W. 225
Boston Consulting Group 175
Bouchard, T.J. 116
Bradberry, T. 223
Briggs, K.C. 135
Buros Institute of Mental Measurements 241
bus accident victim 70
Butcher, J.N. 229
C
Callanan, G.A. 178
cancellation task 71
career anchor 183
career path appreciation 164, 219
career 178
security 178
definition 178
Caretta, T.R. 116
Carroll, J.B. 116
Caruso, D. 222
Caryl, P.G. 235
Cascio, W.F. 85
case-based reasoning 234
Cattell, J.M. 112
Cattell, R.B. 115–116, 133, 137
CDS see Cognitive Distortion Scale
Ceci, S.J. 235
ceiling effects 44
central tendency 59
centrality 55
cerebral glucose metabolism 235
chaos theory 219, 232
Charoux, J.A.E. 213
Clarke, D. 226
Cleary model 92
client interview 153
clinical combination of scores 63
coefficient of
determination 51
equivalence 41
internal consistency 42
stability 40
Coetzee, N. 185
cognitive correlates 117
Cognitive Distortion Scale 72
Cohen, R.J. 12, 34, 56, 73, 92, 125, 139, 193, 197, 199, 228, 230, 237
Cole, N.S. 83
Coleman, V. 225
Collins, R.C. 185
combining scores 63–67
balanced scorecard 66
compensatory methods 65
decision-making matrix 67
multiple-hurdle approach 64
profile analysis 65
simple average 64
weighted averages 64
common-item equating 41
common-person research design 41
competency 145–148, 151–155
assessment of 148
competence in non-work-related areas 155
core and cross-functional 151
definition 146
competency framework, drawing up a 146
fairness 154
identification of 153
information, sources of 148
levels of competence 148
performance criteria 147
portfolio of 154
potential barriers 148
range statements 147
technical and higher-order 152
units of competence 146
competitiveness 54
complexity theory 232
componential theory of intelligence 118
computer-assisted 226
computer-based adaptive testing 228
computer-based assessment 219
computer-based testing 226
computerised report writing 230
conceptualising 27
conscientiousness 139
concurrent validity 50
conditional probability model 83
confidentiality 96
confirmatory biases 196
confirmatory factor analysis 49
constant ratio model 83
construct validity 47
constructivist 11
constructs 25
content validity 49
continuous adding 71
convergent validity 48
Cooper, C.L. 215
correlation matrix 250
correlations 249
Corrigan, B. 195
cortical neurons 235
Cortina, J.M 195
Costa, P.T. 139, 236
Crafford, A. 190
criterion problem 52, 171
criterion referencing 58
criterion-related validity 50
critical incident technique 153
critical realists 12
Cronbach 42
Cronbach’s alpha 42
cross-validation 30
crystallised intelligence 115, 221
cultural fairness of assessment centres 213–214
assessor 214
design 214
culturally saturated 123
culture fairness 236
D
Dahmer, G. 233
Darwin, C. 112
data capture 251
Davidshofer, C.O. 171–172, 212
Davies, M. 122
Dawes, A. 195
Day, S.X. 185
De Beer, M. 124
décalage 58
360-degree assessment 173
demand characteristics 55, 232
detection rate 69
Detterman, D.K. 220
developmental centres 202
advantages 202
disadvantages 203
developmental sequencing 49
developmental stage referencing 58
deviation IQ 114
deviation score 59
dichotomous items 30
Differential Aptitude Test 88, 185
differential item functioning 80
Dilchert, S. 112
discriminant validity 48
discrimination 77
disparate treatment 78
distortion 55
distracters 28
domain referencing 58
Donovan, M.A. 226
Doverspike, D. 83
DPsych degree 102, 235
Drasgow, F. 226
DSM-IV 138
Du Toit, R. 185
Dubois, D. 146, 147, 156
d-value 65
dynamic assessment 219, 221, 232
dynamic testing 123
E
Eagan, J. 227
EAP see Employment Assistance Programme
Eber, H.W. 137
ecological validity 51
economic value added 174
ectomorph 128
Educational Testing Service 149
Edwards, D.J.A. 125
emic approach 79
Emmerling, R.J. 222
emotional intelligence 120–121, 222
empirical validity 50
empirical criterion keying 132
empiricist 11
Employment Assistance Programme 197
Employment Equity Act 52, 78–79, 99, 179
employment equity 83
endomorph 128
entrepreneurial creativity 183
equal probability model 83
Erikson, E. 128
error score 37
ethical standards 98
ethics 20
etic approach 79
ETS see Educational Testing Service
evaluation 4
evoked potentials 235
evolutionary biology 233
Excel 252
expectancy tables 57
exploratory factor analysis 49
external stakeholders 171
extraverts 135, 187
extremity 55
Eysenck, H. 134, 223
F
face validity 50
factor analysis 48
fairness 11, 75–76, 80, 85, 87–88
absence of predictive bias 75
bias 76
definition 76
discrimination 76
ensuring fairness 85
equal opportunity 75
equal outcomes 75
equitable treatment 75
evidence of unfairness 80
reasonable accommodation 76
removal of discriminatory items 87
single tests/different norms 88
single tests/same norms 88
use of separate tests 87
faking 55, 219
false negatives 68
false positives 68
feedback 96
feelers 135, 188
Fernández-Ballesteros, R. 219, 224, 226
Feuerstein, R. 123
19-Field Interest Inventory 185
Fink, A. 220
Fisher, W.P. 3, 141
fitness landscape 233
five-factor model 139
Flesch-Kincaid grade level 53
Fletcher, C. 196
floor effects 44
fluid intelligence 115, 221
Flynn, J.R. 62, 124
Flynn effect 62, 124
focus groups 153
Fontenesi, M. 234
forced distribution 172
forensic evaluation 171
Forer effect 136
formative assessment 5
Fouad, N.A. 185
four fifths rule 84
four humours 134
Foxcroft, C. 8, 34, 73, 88, 226, 227
fractals 232–233
Fransella, F. 142
Friedman, H.S. 129, 131
Frost, N. 117
Furnham, A. 112, 125, 134, 136, 140–141, 146, 154, 156, 163–164
G
Galton, F. 112
Gardner, H. 111, 120, 223
theory of multiple intelligences 120
Geisinger, K.F. 241
general ability 184
general knowledge 123
general management competence 183
generalisability 5
genetics 220
George, J.A. 163, 203
Gibson, R.O. 165
Goldberg, L.R. 195
Goldstein, Braverman and Goldstein 3
Goleman, D. 120, 122, 222–224
goodness of fit 81
Gottfredson, L.S. 220
grade equivalents 58
grade referencing 58
Graduate Record Examination 149
Graves, L. 195
Grayson, P. 215
Greaves, J. 223
Greenberg, J. 127
Greenhaus, J.H. 178
Gribbin, J. 221
Grigorenko, E.L. 125
grounded theory 16
Grove, W.M. 230
growth curves 167
Guilford, J.P. 117
Guion, R.M. 176
Guttman scales 31
H
Haier, R.J. 235
halo effect 172
Haney, W. 228
Hansen, J.I. 185
Hanson, M.A. 227
hard of hearing candidates 97
Harmon, L.W. 185
Harris, W.G. 226
Hartle, F. 146
Harvard Business School 163
Hasselmo, M.E. 233
Haverkamp, B.E. 185
Health Professions Council of South Africa 100, 235
Heilman, M.E. 83
Herbst, D.L. 185
hermeneutic 16
Hersey, P. 150, 170
HESA see Higher Education South Africa
Higgins, K.D. 222, 231
high verbal 118
Higher Education South Africa 149
Hitler, A. 233
HIV/Aids 140
Hoepfner, R. 117
Holland, J.L. 179, 181
Horn, J.L. 116
Hough, L. M. 85, 140–141, 169, 175–176, 195
Human Sciences Research Council 137
Hunt, E.B. 117
Hurst, D.N. 213
I
IDEAS see Interest Determination, Exploration and Assessment System
identifying 203
competencies 203
idiographic 127
illusion of validity 196
image management 55
impression 219
incomplete sentences 132
incremental validity 195
inductive reasoning 115
industrial relations climate 170
information processing 111
informed consent 21
Initial Recruitment Interview Schedule 166
inkblots 131
integrity testing 140
intelligence quotient 113
intelligence 109–112, 114, 219–220
definition 109
discovering rules 110
dynamic assessment of 219
genetics 220
information processing 111
learning from experience 110
mental speed 220
models of 114
problem solving 111
recognising patterns 110
socially defined 112
structural approaches 114
understand to comprehend 110
Wechsler, D. 110
Interest Determination, Exploration and Assessment System 191
interests 185
internal consistency 41
internet-based assessment 231
internet 101, 219
interpretation of results 96
interpretation 30
of test results 30
interquartile range 8
inter-rater reliability 43
inter-scorer 43
interval data 9
interviewing 89, 193–197
continued use 196
counselling interviews 194
definition 193
effective 197
employment interviews 194
reliability of interviews 194–195
semi-structured interviews 194
stages 197
structured interviews 194
traditional interviews 194
validity of interviews 195
introverts 135, 187
intuiters 135
intuitives 187
investment theory of intelligence 115
ipsative scoring 33
IQ see intelligence quotient
I.R.I.S. see Initial Recruitment Interview Schedule
Isaac, D.J. 165
item analysis 29, 250
item direction 32
item response theory 220, 228
item weighting 32
item-remainder correlation 29
item-total correlation 29
J
Jackson Vocational Interest Survey 185
Jaques, E. 165
Jeanneret, P.R. 161
job description 66, 154, 161
job diaries 153
Johnson, W. 116
Jones, J.W. 222, 231
Jopie van Rooyen & Associates 185
judgemental measures 171
judgers 135, 189
Jung Personality Questionnaire 186
K
Kaplan, R.M. 38, 43, 69, 81, 87, 92, 127, 132, 134, 197–199
Katz, M.R. 185
Keirsey, D. 186
Keirsey Temperament Sorter 186
Kelly, G.A. 21, 129, 142
key performance areas 146–147
Kirkpatrick, D.L. 170
Klimoski, R.J. 215
knowledge-based systems 234
Kravitz, D.A. 83
Kriek, H.J. 213
Kruger, P. 21
Kuder Occupational Interest Survey 185
Kuder 42
Kuder-Richardson formula 42
L
Langley, R. 185
language ability 54
leaderless group technique 205
Learning Propensity Assessment Device 124
levels of measurement 8–10
permissible statistics 10
Liddel, C. 21
Lievens, F. 215
lifestyle 183
Likert scales 30
Linn, R.L. 83
locating assessment centre exercises 205
locus of control 140
Louw, D.A. 125
Lubinski, D. 115
Lunneborg, C.E. 117
Lyman, H.B. 226
M
magnetic resonance imaging 220
malingering 219
management 219
Maree, J.F. 124, 232
Marks-Tarlow, T. 233
Maslow 129
Matarazzo, J.D. 228
material 99
matriculation 149
matrix of work relations 165
matrix 122
maturational sequencing 49
Mauer, K.F. 53, 137, 168
Mayer, J.D. 122, 222–223
MBTI see Myers-Briggs Type Indicator
McCall’s T-score 59
McCaulley, M.H. 136
McClelland, D.C. 129
McCormick, E.J. 161
McCrae, R.R. 139, 236
McIntyre, S.A. 38–39, 44, 49–50, 56, 58, 63, 92, 104, 195, 197, 201, 237
MCMI see Millon Clinical Multiaxial Inventory
measurement 3
measuring 4
technique 4
Mecham, R.C. 161
mechanical/actuarial combination of scores 63
median 8
Meehl, P.E. 63
Meijer, R.R. 229
mental age 58, 113
Mental Measurements Yearbook 241
mental speed 220
Mercer 87
mesomorph 128
Meyer, W.F. 127
Miller, L.A. 38–39, 44, 49–50, 56, 58, 63, 92, 104, 195, 197, 201, 237
Miller, L.K. 24
Miller, M.R. 195
Millon Clinical Multiaxial Inventory 138
Millon, T. 138
Milsom, J. 154, 213
Miltenberger, R. 24
Minnesota Multiphasic Personality Inventory 138, 229–230
MMPI see Minnesota Multiphasic Personality Inventory
MMPI-2 101
Moerdyk, A.P. 190
monitoring 95
Moore, C. 127
Mount, M.K. 140
Murphy, K.R. 171–172, 212
Murphy, L. 241
Murphy, R. 124, 232
Murray, M. 215
Myers, I.B. 135
Myers-Briggs Type Indicator 140, 170, 186, 205
N
National Benchmark Tests 149
National Qualifications Framework 100, 148
naturalistic situations 18
NBT see National Benchmark Tests
need for
achievement 129
affiliation 129
power 129
needs 185
Neisser, U. 125, 220
Nel, P. 190
Nell, V. 26, 235
NEO-PI 139
Nering, M.L. 229
nerve conduction velocity 235
Neubauer, A.C. 220
Neuman, G. 226
Nevill, D.D. 185
New Zealand 102
Newell, S. 170
nominal data 8
nomothetic 127
norm development 30
norm groups 61
norm referencing 58
norm 30, 59
normal distribution 59, 60
NormMaker 62
NQF see National Qualifications Framework
numeric calculations 71
Nunnally, J.C. 3, 5, 12, 32, 44–45, 48, 50, 66, 73, 83
O
objectivity 5
observation 4, 15–16, 18, 19, 89
artificial situations 18
casual 15
looking at 16
looking for 16
naturalistic situations 18
observer intervention 18
observer involvement 18
primate behaviour 16
schedules 19
simulations 18
systematic 16
tools or aids 19
observed score 37
Occam’s razor 221
Occupational Personality Questionnaire 133, 139, 186, 196
odd-one-out 123
off-diagonal cells 250
Olson-Buchanan, J.B. 226
O’Neill, C. 190
Ones, D. 112
operationalising 28
OPQ see Occupational Personality Questionnaire
ordinal scales 8
organisational citizenship behaviour 139, 225
organisational development 170
Oswald, F.L. 85, 140–141, 169, 175–176, 195
outliers 66
P
PAI see Psychological Assessment Inventory
PAQ see Position Analysis Questionnaire
parallel form reliability 40
Pausewang, G. 233
perceivers 135, 189
percentiles 8, 59
perceptual speed 115
performance appraisal 171
performance management 164
person profile 161
person specification 161
personal constructs 142
personality assessment 134
four humours 134
Jung’s typology 134
Myers-Briggs 134
Wundt’s typology 134
16 Personality Factor Inventory 133
personality profiles 52
personality 127–129, 130–131, 221, 233
as a fractal 233
assessment of 130
biological approach 128
computer-based simulations 131
definition 127
developmental approaches 128
need theories 129
observation 131
phenomenological approaches 129
projective techniques 131
psychoanalytic theories 129
theory 128
trait approaches 130
Peterson, I. 234
physically disabled candidates 97
Piaget, J. 58
pilot testing 29
placement 160
Plake, B.S. 241
Ployhart, R.E. 213
Position Analysis Questionnaire 139, 161, 197
positive psychology 225
Posner, M. 118
post profile 161
posttraumatic stress disorder 233
potential 221
power tests 44
precision 5
predictive validity 50
pre-market discrimination 78
presentation of self 55
Preskill, H. 175
Price, R.K. 235
primary mental abilities 115
Prinsloo, C.H. 137
problem solving 111
Probst, T.M. 226
production measures 171
Professional Board for Psychology 226, 231, 235
professional training 235
promotion 160, 164
psychic unity 78
Psychological Assessment Inventory 138
psychological assistants 235
psychometric tests 26
Psychometrics Committee 138
psychometrist 100
psycho-technician 100
Psytech 185
Public Service Commission of Canada 136, 142, 190
pure challenge 183
Q
qualified individualism 83
quantification 11
social sciences 11
quantifying 28
quartiles 59
Quenk, N.L. 136
quota system 83
R
random error 38
ranking 172
rapport 39, 95
rating 172
rating techniques 173
ratio data 9
Raven's Progressive Matrices 122
Raven's Standard Progressive Matrices 71, 205
reasonable accommodation 97
Ree, M.J. 116, 195
regression analysis 81
Reiber, A. 163, 203
reliability 37, 39, 41, 43
internal consistency 41
inter-rater 43
inter-scorer 43
test-retest 39
repertory grid 21, 129, 154
repgrid see repertory grid
response set 28
response sets 54, 224
restriction of range 44, 54
Retief, A. 168
reverse scoring 32
RIASEC 181–182, 188
expanded RIASEC model 182
Richardson 42
Richman-Hirsch, W.L. 226
rights 99
Robbins, S.P. 127, 185
Roberts, R.D. 122
Robertson, I.T. 215
robustness 38
Roehl, B. 227
Roodt, G. 8, 34, 40–41, 73, 226–227
Rorschach test 139
Rosenthal, R. 196
Rothwell, W.J. 156
Rounds, J.B. 185–186
Rubin, D.B. 196
Russ-Eft, D. 175
Russell, C. 82
Russell, C.J. 92
Russell, M. 227–228
Ryan, J. 185
S
Saccuzzo, D.P. 38, 43, 69, 81, 87, 92, 127, 132, 134, 197–199
Salovey, P. 122, 222–223
SAQA see South African Qualifications Authority
SAT Reasoning test 149
satisficing 221
sausages on a stick 82
Saville & Holdsworth Ltd 162
scale length 45
Schein, E. 183
Schlechter, A. 190
Schmidt, F.L. 92
Schustack, M.W. 129, 131
Schwartz, A.L. 163
scoring 96
SDS see Self-Directed Search (SDS)
second guessing 55
security 99
security/stability 183
selection criteria 161
selection ratio 69
selection 160
Self-Directed Search (SDS) 185
self-fulfilling hypotheses 196
self-referencing 59
sensing 135
sensitivity 38
sensors 187
service/dedication to a cause 183
sex hormones 235
Shackleton, V. 170
Sheldon, W. 128
Sheppard, L.D. 220
SHL 185, 203
SHL-SA 102
Shore, B. 79
Shuttleworth-Jordan, A.B. 53
Siegle, G.J. 233
Silence of the Lambs 195
Simon 113
simulations 18
situational leadership 150
sliding band approach 84
Slutske, W.S. 229
social desirability 55
social facilitation 197
somatotypes 128
SOMPA see System of Multicultural Pluralistic Assessment
SORT see Structured Objective Rorschach Test
South African Medical and Dental Council 100
South African Qualifications Authority 148
spatial visualisation 115
spatial-practical-mechanical ability 116
Spearman, C. 114
Spearman-Brown formula 42
special situations 97
specialist register 100
specific aptitudes 185
speed tests 44
Spies, R.A. 241
spot the error 71
SPSS see Statistical Package for the Social Sciences
SST see stratified systems theory
staff development 164
Stamp, G. 165, 168
Stampe, D. 227
standard deviation 59
standard error of measurement 38–39, 85
sources of error 39
standard score 59
standardisation 40
stanines 59
Stankov, L. 122
Stanton, J.M. 227
Stanush, P. 83
Statistica 251
Statistical Package for the Social Sciences 256
statutory control 99
stens 60
Sternberg, R.J. 111, 118–119, 125, 220
componential theory of intelligence 118
triarchic theory of intelligence 119
Stevens 128
Stewart, V. 129, 142
strange attractor 232–233
stratified systems theory 164, 165, 232
Strauss, J.P. 140
Strong Interest Inventory 185
Structure of Intellect 117
Structured Objective Rorschach Test 132–133
empirical approach 132
factor analysis approach 133
logical approach 132
objective measures 132
theoretical approach 132
the trait approach 133
Stumpf, R. 88
subjective scoring 45
Sullivan, L. 222
summative assessment 5
Super, D.E. 185
survey 25, 153
Swerdlik, M.E. 12, 34, 56, 73, 92, 125, 139, 193, 197, 199, 228, 230, 237
System of Multicultural Pluralistic Assessment 87
T
Targeted Selection 197
TAT see Thematic Apperception Test
Tatsuaoka, M.M. 137
Taylor, T.R. 124
Team Role Inventory 169
team work 169
technical/functional competence 183
temporal stability 44
Terman, L. 113
Test Commission of South Africa 100
Test of English as a Foreign Language 149
test sophistication 44, 54
testing 4–5
test-retest reliability 39
Tests in Print 241
Thematic Apperception Test 129, 131
theoretical validity 47
theory of measurement 37
theory of multiple intelligences 120
Theron, C. 92
thinkers 135, 188
Thorndike, R.L. 83, 223
Thurstone, L. 115, 121
time limits 54
T-patterns 234
Tracey, T. 185
track record 172
track record information 171
training 170
training for psychologists 101
trait 127–128
cardinal 128
central 128
secondary 128
transfer 160, 164
transfer effects 40
Trauma Symptom Inventory 72
triangulation 8
triarchic theory of intelligence 119
true negatives 68
true positives 68
true score 37
Tryon, W.W. 233
Tsacoumis, S. 213
type 1 error 68
type 2 error 68
type A and type B personalities 140
U
United Kingdom 102
United States 102
unqualified individualism 83
Urbina, S. 58
V
validity generalisation 52
validity 47–53
concurrent 50
construct 47
content 49
convergent 48
criterion-related 50
developmental sequencing 49
discriminant 48
ecological 51
empirical 50
face 50
factors affecting 53
generalisation 52
maturational sequencing 49
predictive 50
theoretical 47
values 185
venue for assessment 94
verbal comprehension 115
verbal fluency 115
verbal-educational ability 116
Vernon, P.A. 220, 235
Vernon, P.E. 116
Verster, J.M. 228
Viljoen, H.G. 127
visually impaired candidates 98
Viswesvaran, C. 112
Vocational Preference Inventory 185
Vosloo, H.N. 185
VPI see Vocational Preference Inventory
Vygotsky, L.S. 124
W
WAIS see Wechsler Adult Intelligence Scale
Walters, L.C. 195
Wechsler, D. 110, 111
Wechsler Adult Intelligence Scale 113
Wechsler Intelligence Scale for Children 113
Weiss, T.B. 146
Westen, D. 111, 119, 125
Wiggins, J.S. 195
Williams, R.W. 175
WISC see Wechsler Intelligence Scale for Children
Wolfaardt, J.B. 40–41
Work Profiling System 162
work samples 139
Wundt, W. 134
Wundt’s typology 134
Z
Zaccagnini, J.L. 234
Z-score 59