Copyright © 2015. Van Schaik Publishers. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.

8 ASSESSING IN A MULTICULTURAL CONTEXT

OBJECTIVES

By the end of this chapter, you should be able to
war and nuclear disasters or large-scale chemical pollution accidents such as occurred at Bhopal, India. People are also assessed for educational placement and for the award of academic bursaries and scholarships. In the organisational arena, people are usually assessed for selection purposes to determine their suitability for specific jobs or positions.

In the context of this book, most of this assessment will be for the selection and placement …

…veloping countries to seek work elsewhere, while developed countries have increased their demand for labour, especially unskilled labour. As a result, millions of workers and their families travel to countries other than their own to find work. At present there are approximately 175 million migrants around the world, roughly half of them workers (of these, around 15% are estimated to have an irregular status).
EBSCO Publishing : eBook Collection (EBSCOhost) - printed on 8/19/2019 6:56 AM via THE SOUTH AFRICAN COLLEGE OF APPLIED PSYCHOLOGY
AN: 1243028 ; Moerdyk, A. P.; The Principles and Practice of Psychological Assessment
Account: ns190599
SECTION 2 INTRODUCTION TO PSYCHOMETRIC THEORY
…The report also predicts further increases in …

…norms, roles and values ... Cultural differ…

…outcomes reflect weaknesses in the assessment instruments – that is, they are biased? If people score differently on any assessment technique and this is because the assessment techniques used are invalid, this is a psychometric issue – the measuring tool is “at fault”, and ways of dealing with this need to be found. Alternatively, if the assessment techniques are equally valid for all people being assessed, any observed differences are the result of real social, historical and educational differences that impact on the abilities and behaviours of the people being assessed and hence on the assessment process and/or outcomes. Addressing these differences is a social and a political problem, not a measurement issue. Of course, a middle way between these two extremes is possible – that while some differences in group performance may reflect real differences in ability and structure, few measures are culture-fair, and bias and differential item functioning (DIF) may well suggest differences when none exist.

8.1.3 The issue of acculturation

An important aspect of culture and its impact on psychological structures is acculturation, or the transition from one culture to another. It is commonly known that humans are not static organisms but change in reaction to (and often lead) changes in their environments. An important source of these changes is moving from one sociocultural context to another, for whatever reason. Acculturation is one such process and involves the psychological adaptation of people (such as migrants and minorities) to a new and different cultural setting as a result of movement and adjustment (see, for example, Van de Vijver & Phalet, 2004). The extent of this adaptation depends on a range of exogenous variables, such as length of residence, generational status, education, language mastery, social disadvantage and cultural distance (Aycan & Berry, 1996; Ward & Searle, 1991). In addition, it depends on the extent to which the individuals wish to adapt and integrate into the new culture. In this regard, Van de Vijver and Phalet (2004) argue that two basic models of acculturation can be identified in the literature, depending on whether acculturation is seen as a uni-dimensional or a bi-dimensional process. The best-known uni-dimensional model is that proposed by Gordon (1964), which assumes that acculturation is a process of change in the direction of the mainstream culture. Although migrants may differ in the speed of the process, it results in adaptation to the culture of destination.

Recently (over the past two or three decades), the uni-dimensional model has been increasingly replaced with a bi-dimensional model of acculturation that has been seen as more appropriate (Ryder, Alden & Paulhus, 2000). Rather than pursuing complete adjustment to the new culture in an assimilationist way, the trend has been towards developing a bi-cultural identity or retaining the original culture without extensively adjusting to the society of settlement. Van de Vijver and Phalet (2004) attribute this to two factors: first, the sheer magnitude of migration has allowed the incoming migrant populations to develop and sustain their own cultural institutions such as education, health care and religion, and second, the Zeitgeist of the assimilationist doctrine among existing cultures has been replaced by one that is more accepting of diversity, in which the retention of various cultural institutions and behaviour patterns by migrants is more readily accepted.

As a result, a popular current model is one proposed by Berry (Berry & Sam, 1997). According to this model, a migrant is required to deal with two questions. The first is, does he want to establish good relationships with the culture of destination or his host culture (adaptation dimension)? The second question involves cultural maintenance: does he want to maintain good relations with the culture of origin or his native culture? These two dimensions interact to yield four distinct coping strategies, as shown in Table 8.1.

Table 8.1 Migrants’ strategies in a bi-dimensional model of acculturation

  Columns – Cultural adaptation: Do I want to establish good relations with the culture of destination?
  Rows – Cultural maintenance: Do I want to maintain good relationships with my culture of origin?

                            Adaptation: Yes     Adaptation: No
  Maintenance: Yes          Integration         Separation/segregation
  Maintenance: No           Assimilation        Marginalisation
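The two yes/no dimensions of Table 8.1 can be expressed as a small decision function. The sketch below is our own minimal illustration (the function name and boolean encoding are not Berry's):

```python
# Berry's bi-dimensional model (Table 8.1): two yes/no questions
# jointly determine one of four acculturation strategies.

def berry_strategy(adapt: bool, maintain: bool) -> str:
    """adapt:    establish good relations with the culture of destination?
    maintain: maintain good relationships with the culture of origin?"""
    if maintain and adapt:
        return "Integration"
    if maintain:                      # maintain yes, adapt no
        return "Separation/segregation"
    if adapt:                         # maintain no, adapt yes
        return "Assimilation"
    return "Marginalisation"          # both no

# Walk through the four cells of Table 8.1:
for maintain in (True, False):
    for adapt in (True, False):
        print(f"maintain={maintain}, adapt={adapt}: "
              f"{berry_strategy(adapt, maintain)}")
```

Treating each dimension as a strict yes/no is a simplification the chapter itself flags: both dimensions are better thought of as continua.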
The first strategy put forward by Van de Vijver and Phalet (2004) is integration, where characteristics of both cultures are maintained in a process of biculturalism. They quote a number of research studies in Belgium and the Netherlands (e.g. Phalet & Hagendoorn, 1996; Phalet, Van Lotringen & Entzinger, 2000; Van de Vijver, Helms-Lorenz & Feltzer, 1999), which consistently show a preference for this strategy, namely that migrants want to combine their original culture with elements of the mainstream culture.

The second strategy identified by Van de Vijver and Phalet (2004) is one where migrants retain most elements of their original culture and generally ignore most aspects of the host culture. Van de Vijver and Phalet term this separation (in sociology and demography it is also labelled segregation). In South Africa, where this cultural separation was enforced by white nationalists, it was termed apartheid or “separate-ness”.

The third strategy is assimilation, which is the opposite of the separation strategy, in that it aims at complete absorption of the migrant into the host culture with the concomitant loss of most elements of the original culture. This is the notion of the melting pot, which was the dominant policy for many years in many Anglophone countries (the UK, the US, Australia and Canada, to name a few). In recent years, this melting-pot view has given way to multiculturalism of various kinds.

The fourth (and in the view of Van de Vijver and Phalet (2004), the least often observed) strategy is marginalisation, which involves the loss of the original culture without establishing ties with the new culture. In some countries youth, often second or third generation, show marginalisation of this kind; they do not feel any attachment to the parental culture nor do they want to establish strong ties with the host culture (often they are prevented from identifying with the host culture because of societal discrimination).

…mension (which is essentially a continuum rather than a dichotomy) determines the suitability of the assessment technique for the person and the applicability of the norms used for interpreting the outcomes. Simply assuming that all tests are invalid for minority groups, or that they can simply be used with all minority groups, is clearly false: the level of acculturation may be an important moderator of test performance in multicultural groups (Cuéllar, 2000). For this reason, Van de Vijver and Phalet (2004) argue that the various measures of acculturation that have been developed need to be applied as a precursor to assessment in a multicultural context. They argue (p. 218) that

    [i]t is regrettable that assessment of acculturation is not an integral part of assessment in multicultural groups (or ethnic groups in general) when mainstream instruments are used among migrants.

Using this two-dimensional model, measures of acculturation are typically based on different combinations of positive or negative attitudes towards adaptation and maintenance. These attitudes are assessed using three distinct question formats, namely one, two or four questions (Van de Vijver, 2001). The Culture Integration-Separation index (CIS: Ward & Kennedy, 1992) is an example of a one-question format measure. These measures typically ask forced-choice questions, with a choice between either valuing the ethnic culture or host culture, or both, or neither, for example: “Do you prefer (A) your own [e.g. Turkish] way of doing things; (B) the Dutch way of doing things; (C) equally like both Turkish and Dutch ways of doing things; and (D) neither – I dislike both ways of doing things”. An advantage of this one-question format is that the questions tend to be efficient and short, but they cannot distinguish complex attitudes of bicultural individuals.
ASSESSING IN A MULTICULTURAL CONTEXT 8
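The one-question forced-choice format described above lends itself to a very simple coding scheme. The sketch below is hypothetical: the option letters mirror the example item, and the mapping is not the CIS's actual scoring key:

```python
# Hypothetical coding of a one-question forced-choice acculturation item:
# (A) own-culture ways, (B) host-culture ways, (C) both, (D) neither.
# Each option is read as one of the four bi-dimensional strategies.

CHOICE_TO_STRATEGY = {
    "A": "Separation",       # values the ethnic culture only
    "B": "Assimilation",     # values the host culture only
    "C": "Integration",      # values both cultures
    "D": "Marginalisation",  # rejects both cultures
}

def code_response(choice: str) -> str:
    """Map a respondent's single forced choice to a strategy label."""
    return CHOICE_TO_STRATEGY[choice.strip().upper()]

print(code_response("C"))  # prints: Integration
```

The format's weakness noted above is visible here: one forced choice yields one label, so graded or context-dependent bicultural attitudes cannot be represented.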
…[“Culture of Origin Groups”, e.g. Turks] in the [Country of Destination, e.g. the Netherlands] should maintain the [Turkish] culture (4) completely; (3) mostly; (2) only in part; or (1) not at all?” and “Do you think that [Turks] in the [Netherlands] should adapt to the [Dutch] culture (4) completely; (3) mostly; (2) only in part; or (1) not at all?”

The four-question format measures such as the Acculturation Attitudes Scale (AAS) developed by Berry, Kim, Power, Young and Buyaki (1989) use agreement ratings with four statements that independently assess each of Berry’s four strategies by indicating whether participants Agree Strongly (A), Agree (a), Disagree (d) or Disagree Strongly (D) with each of the following statements:

1. I think that [Turks] in [the Netherlands] should maintain the [Turkish] culture and not adopt any Dutch ways of doing things [Separation].
2. I think that Turks in the Netherlands should try to fully adopt Dutch ways and forget about their Turkish ways of doing things [Assimilation].
3. I think that Turks in the Netherlands should try to keep their Turkish customs and culture, while at the same time trying to fit into Dutch culture as far as possible [Integration].
4. I think it is stupid of people to have any form of culture – I reject both my Turkish culture and that of the Netherlands [Marginalisation].

(Note: These are not the actual questions used, but merely illustrate the approach.)

Denoso (2010, p. 38) argues that the two- and four-question format measures successfully discriminate between the integration strategy, which is generally considered to be more adaptive, and the other less adaptive strategies (Arends-Tóth & Van de Vijver, 2003). On the other hand, Rudmin and Ahmadzadeh (2001, … to assessment in general has poor psychometric properties, in that the questions are ipsative (i.e. they are positively correlated and thus not independent of one another) (see section 3.6.6).

Another development in the assessment of acculturation is the view that individuals do not adopt a single approach in this area. Rather, the approach adopted is contingent on the situation in which it is shown. In this regard, according to Arends-Tóth and Van de Vijver (2003), the acculturation strategies adopted depend on whether acculturation occurs in the public or the private domain. Similarly, Phalet and Swyngedouw (2003) found that willingness to engage in maintenance or adaptation was context dependent. In particular, they showed that most migrants tend to favour cultural maintenance in the private domain, such as family relationships, and adaptation to the host culture in the public domain, such as school, work, etc. (Arends-Tóth & Van de Vijver, 2003; Phalet & Andriessen, 2003; Phalet & Swyngedouw, 2003). Moreover, in these studies this acculturation profile was considered the most adaptive pattern. The Acculturation in Context Measure (ACM) developed by Phalet and Swyngedouw (2003) is a two-question format measure that repeats the same questions in multiple contexts (e.g. home, family, school and work situations).

In closing this section on acculturation, Arends-Tóth and Van de Vijver (2006b) provide five guidelines for the assessment of acculturation. These are as follows:

1. Acculturation conditions, orientations and outcomes usually cannot be combined in a single measure. Combining them makes it difficult to determine how acculturation could explain other variables (e.g. cognitive developmental outcomes) if all aspects of acculturation are used as predictors.

2. A measure of acculturation can only be comprehensive if it contains aspects of both the mainstream and heritage cultures.
4. The use of single-index measures should be avoided. The content validity of these types of measures is typically low and inadequate to capture the multifaceted complexities of acculturation. Moreover, there is no support in the literature for any single-index measure of acculturation.

5. The psychometric properties of instruments (validity and reliability) should be reported.

8.2 Approaches to cross-cultural assessment

In addressing the issues associated with using psychometric instruments in societies for which they have not been developed, Van de Vijver and Hambleton (1996) identify three approaches which they term Apply, Adapt and Assemble. In this book, the third approach (namely Assemble) has been split into two to yield Develop Culture-Friendly Tests and Develop Culture-Specific Tests. The four different approaches discussed in this text are:

…sonality scales (e.g. Barrett, Petrides, Eysenck & Eysenck, 1998; Eysenck, Barrett & Eysenck, 1985). In Chapter 7 (section 7.2.1), this approach is described as an “etic” approach, and as Van de Vijver (2002, p. 545) notes, this is a form of “blind” application of an instrument in a culture for which it has not been designed, and is simply bad practice where there is no concern for the applicability of the instrument nor its psychometric properties in the new context. He argues that if any instrument is “borrowed” from another cultural group, it must be shown to have been validly adapted: the test items must have conceptual and linguistic equivalence, the test and test items must be free of bias (Fouad, 1993; Geisinger, 1994) and appropriate norms must be developed. These properties have to be empirically determined.

8.2.2 Translate/adapt

Secondly, existing tests and measures can be adapted and translated into the language of the target group. However, this goes beyond a literal and even idiomatic translation in order to ensure the proper conceptual translation of the test material. For example, the Minnesota Multiphasic Personality Inventory (MMPI – a clinically oriented personality scale) contains various implicit references to the American culture of the test designers, and extensive adaptations to many items are required before the scale can be used in other languages and cultures.

…ology (Van de Vijver, 2002).

8.2.4 Develop culture-specific tests

The fourth approach to assessing cross-culturally is to develop culture-specific instruments from scratch to assess constructs that may be very different in the specific cultural setting (e.g. Cheung, Leung, Fan, Song, Zhang & Chang, 1996). This is especially important
when existing instruments have been shown not only to be invalid and unreliable, but more especially that they do not adequately assess the particular construct in the “other” cultural group. This is termed an “emic” approach. (The origin and meaning of the terms “emic” and “etic” are discussed in some depth in Chapter 7, section 7.2.1.)

Irrespective of whether a psychometric test is taken as is and applied to a new group of people, whether the test has been adapted or whether it has been developed from scratch, it needs to be calibrated or “normed” for the population for which it is to be used. Perhaps more importantly, the behaviour of the test across cultural boundaries needs to be investigated in order to determine whether the tests are measuring the same phenomenon in the same way – do the results mean the same thing for different groups? Put differently, are the tests and their results equivalent across the different groups? In order to examine this further, we need to understand the various factors or sources of bias that contaminate and detract from the cross-cultural validity of our measure.

8.3 Forms of bias

In addressing the issues of cross-cultural equivalence, a useful starting point is to identify the various sources of bias so that steps can be taken to prevent them from contaminating the assessment scores. Van de Vijver and others (Van de Vijver & Leung, 1997a, 1997b; Van de Vijver & Poortinga, 1997) identify three distinct sources of bias and unfairness, assuming that blatant forms of discrimination on the basis of sex, race, caste, etc. are excluded. These are construct bias, item bias and method bias.

8.3.1 Construct bias

Construct bias* is the most important reason for construct inequivalence, and occurs when the constructs are associated with different behaviours or characteristics across cultural groups (“cultural specifics”). Schumacher (2010), for example, argues that in individualistic Western cultures, leadership is usually associated with traits such as dominance and assertiveness, whereas in more communalistic cultures leadership is more likely to be associated with self-effacing, community-supporting traits and behaviours. This is, of course, in line with the findings of Hofstede (e.g. 1991, 1994, 1996) who distinguishes between masculinity and femininity as one of five cultural dimensions. As such, test items assessing self-presenting or self-enhancing traits that would be viewed as important in communalistic cultures would be seen as socially undesirable, rated lower and seen as incongruent with possible leadership emergence and effectiveness in individualistic Western cultures.

Another example comes from research in personality on the five-factor model. On the basis of widespread research, McCrae and Costa (1997) found considerable evidence for the universality of the structure in US English, German, Portuguese, Hebrew, Chinese, Korean and Japanese samples. On the other hand, however, Cheung et al. (1996) found that the five-factor model leaves out aspects of psychological functioning that are considered important by Chinese people. For example, interpersonal factors such as “harmony” and “losing face” are often observed when descriptions of personality are given by Chinese informants, but are not represented in the five-factor model.

A third example can be found in Ho’s (1996) work on filial piety (psychological characteristics associated with being a good son or daughter). The Western conceptualisation is more restricted than the Chinese, according to which children are supposed to assume the role of caretakers of their parents when the latter grow old.

Finally, Dyal (1984, cited in Van de Vijver & Phalet, 2004) shows that measures of locus of control often show different factor structures across cultures, strongly suggesting that either the Western concept of control is inappropriate in cross-cultural settings or that the behaviours associated with the concept differ across cultures.

Construct equivalence thus implies that the same construct is being measured across cultures, and inequivalence occurs when the instrument measures a construct differently in two cultural groups, when the concepts of the construct overlap only partially across cultures or when the measure identifies somewhat different constructs (resulting in “apples and oranges being compared”). This absence of structural equivalence indicates bias at the construct level, and unless construct equivalence is demonstrated, erroneous or misleading conclusions about the
nature and significance of the construct in the particular context are likely to result. This suggests the need for an “emic” approach involving the development of an appropriate assessment process that is tailored to the unique constellation of dimensions in the particular context.

8.3.2 Item bias

Item bias, also known as differential item functioning or DIF, refers to systematic error in how a test item measures a construct for the members of a particular group (Camilli & Shepard, 1994). When a test item unfairly favours one group of examinees over another, the item is biased. Even if the construct itself does not vary across cultural divides, many of the items in the assessment may behave quite differently in different contexts. When anomalies at the item level exist, item bias is detected (Fontaine, 2005), which points towards differences in the psychological meaning of the items across cultures or the inapplicability of item content in a specific culture. An item of, say, an assertiveness scale is said to be biased if people from different sociocultural contexts with a given level of assertiveness are not equally likely to endorse the item. A good example, given by Hambleton (1994, p. 235 and cited by Van de Vijver, 2002, p. 549), is the test item “Where is a bird with webbed feet most likely to live?” The English phrase “the bird with webbed feet” is translated into Swedish as “the bird with swimming feet”, with the result that the English and Swedish items are no longer equivalent as the Swedish version provides a much stronger clue to the answer than the original English item. (In South Africa, many school-leaving examination papers in technical subjects such as science or biology were in the past presented simultaneously in English and Afrikaans on the same question paper. Many English students, when stumped about the meaning of a technical term, would turn to the Afrikaans version of the question for a clue. For example, in biology, the English term stamen is translated into Afrikaans as meeldraad, which literally translates as pollen wire.)

This type of bias is a major issue in determining the cross-cultural equivalence of a measure and has been extensively studied by psychometricians (see e.g. Berk, 1982; Holland & Wainer, 1993). At the same time, it must be realised that item bias does not reside only in the translation of items from one language to another. Van de Vijver (2002) gives the hypothetical example of the item: “Are you afraid when you walk alone on the street in the middle of the night?”, pointing out that this item may be responded to very differently by persons depending on the safety of their neighbourhood, even though they fully comprehend the question. An item is deemed equivalent across cultural groups when it behaves in the same way in both cultures – that is, when this form of item bias is absent. Ways of demonstrating and measuring the extent of equivalence across cultural groups are discussed in some depth in section 8.5, but in general these can be seen to take the form of chi-square expectancies*, item-whole correlations*, factor loadings* and item characteristic curves* (ICC) of these items, which need to be shown to be (acceptably) similar to each other.

8.3.3 Method bias

The third source of bias in cross-cultural assessment refers to the presence of nuisance variables due to method-related factors. Three types of method bias can be envisaged. First, incomparability of samples on aspects other than the target variable can lead to method bias (sample bias). For instance, cultural groups often differ in educational background and, when dealing with mental tests, these differences can confound real population differences on a target variable.

Secondly, method bias also refers to problems that relate to the assessment materials used (instrument bias). A well-known example illustrating this is the study by Deregowski and Serpell (1971), who asked Scottish and Zambian children in one condition to sort miniature models of animals and motor vehicles, and in another condition to sort photographs of these models. Although no cross-cultural differences were found for the physical models, the Scottish children obtained higher scores than the Zambian children when photographs were sorted. In the latter case, the Zambian children were relatively unfamiliar with photographic material.

The third form of method bias arises from the manner in which the assessment is administered (administration bias). Communication problems between testers and testees (or interviewers and interviewees) can easily occur, especially when they have different first languages and cultural backgrounds (see Gass & Varonis, 1991). Interviewees’ insufficient knowledge of the testing language and inappropriate modes of address or cultural norm violations on the part of the interviewer can seriously endanger the collection of appropriate data, even in structured interviews. One can see how computerised administration of a test would affect computer-literate people and those with very little computer experience quite differently.

The distinction between measurement unit equivalence (e.g. degrees Celsius and degrees Kelvin) and scalar equivalence (where the meaning of the values obtained on the measure are identical across groups) is important because only the latter assumes that the measurement is completely free of bias (Van de Vijver & Tanzer, 2004). As indicated, construct bias indicates conceptual inequivalence, and instruments that do not adequately cover the target construct in one of the cultural groups cannot be used for cross-cultural score comparisons. Construct bias precludes the cross-cultural measurement of a construct with the same measuring instrument or scale (Van de Vijver & Tanzer, 2004). If no direct score comparisons are to be made across cultures, then neither method nor item bias will affect cross-cultural equivalence. However, both method and item bias can have major effects on scalar equivalence as items that systematically favour a particular cultural group may conceal real differences in scores on the construct being assessed.

8.4 Forms of equivalence

Equivalence is essentially the absence of bias – that is, the systematic but irrelevant component of the observed scores – and is the extent to which any measure yields the same results across different groups and is able to correctly identify individuals or groups possessing equal amounts of the attribute concerned (assuming that they have the same amount of the attribute being …

…tain the same or similar scores on the different language versions of the items or measure. If not, the items are said to be biased and the two versions of the measure are non-equivalent (p. 79).

Individual test items and the test as a whole should not vary in the levels of difficulty or intensity when the groups are known to be similar. Equivalence is thus achieved when the assessment behaves in a similar way across cultures as shown by a pattern of high correlations with related measures (convergent validity) and low correlations with measures of other constructs (discriminant validity) as would be expected from an instrument measuring a similar construct. If there are major differences in the way in which the groups behave, or if there are marked differences in the way in which the attributes occur, then specifically designed measures need to be developed and tailored to meet the demands of the cultural context. This means that at least some items will be different in the two countries. This approach is consistent with the “emic” approach.

Three kinds of equivalence have been identified and are linked in a hierarchy of increasing importance (Van de Vijver & Poortinga, 1997; Van de Vijver & Leung, 1997a, 1997b). These levels are construct equivalence, measurement unit equivalence and scalar equivalence.

8.4.1 Construct equivalence*

This form of equivalence, also termed structural equivalence* and functional equivalence*, indicates that the same construct is measured across all cultural groups studied, even if the measurement of the construct is not based on identical instruments across all cultures. In cross-cultural assessment, the test constructor/user cannot assume that the construct being assessed has the same meaning and psychological
At the same time, differences within each group can still be compared across groups. For example, change scores in pretest–post-test designs can be compared across cultures for instruments with measurement unit equivalence. Similarly, gender differences found in one culture can be compared with gender differences in another culture for scores showing measurement unit equivalence, even though across-group comparisons of each gender are not meaningful.

… characteristics* (ICC). Differential item functioning (DIF) is perhaps the most important indicator of non-equivalence of assessment items/tests and of bias. At the same time, an item that exhibits DIF may not necessarily be biased for or against any group (Kanjee, 2007), but may reflect performance differences that the test is designed to measure (Camilli & Shepard, 1994) or real differences in the phenomenon being assessed. This is illustrated in Chapter 7 (section 7.3.1) …
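The offset logic behind measurement unit equivalence can be sketched numerically. The scores below are invented purely for illustration, assuming two scales that share a measurement unit but differ by a constant offset:

```python
# Illustrative sketch (invented numbers): with measurement unit
# equivalence, two scales share a unit but may differ by an offset,
# so change scores are comparable across groups even when raw
# scores are not.

# Pretest and post-test scores for one person in each culture.
culture_a = {"pre": 40, "post": 55}   # scale with offset 0
culture_b = {"pre": 50, "post": 65}   # same unit, offset +10

gain_a = culture_a["post"] - culture_a["pre"]  # change score, culture A
gain_b = culture_b["post"] - culture_b["pre"]  # change score, culture B

# The change scores match and can be compared across cultures...
comparable_change = gain_a == gain_b
# ...but a direct cross-group comparison of raw scores is misleading,
# because the offset (not the attribute) separates the groups.
raw_difference = culture_b["pre"] - culture_a["pre"]
```

Here the raw pretest scores differ by 10 points purely because of the scale offset, while the gain of 15 points is identical – which is why change scores, but not raw cross-group comparisons, are meaningful at this level of equivalence.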
ASSESSING IN A MULTICULTURAL CONTEXT 8
… where males and females, and athletes with a European and an African heritage, perform quite differently in various sporting events such as long-distance running and sprinting.

In order to detect the presence and extent of inequivalence, we need to move away from classical test theory to what is known as differential item functioning (DIF), which is perhaps the most important indicator of non-equivalence of assessment items/tests and of bias. According to Hambleton, Swaminathan and Rogers (1991), … of matched reference and focal groups. DIF thus refers to the differing probabilities of success on an item of people of the same ability but belonging to different groups – that is, when people with equivalent overall test performance but from different groups have a different probability or likelihood of answering an item correctly.
There are several ways in which item bias can be demonstrated. Some are based on expert judgements involving inspection and back translation, while others are based on various forms of statistical analysis. The statistical techniques are divided into two main categories: non-parametric methods developed for dichotomously scored items using contingency tables, and parametric methods for test scores with interval-scale properties based on the analysis of variance (ANOVA).

8.5.1 Judgemental techniques
Judgemental approaches for determining the equivalence of measures rely on the degree to which two or more experts in the area agree that the measures are similar. The most common judgemental approaches to identifying inequivalence involve experts in test construction, very familiar with both the culture of origin and the target culture, who inspect the items for cultural and linguistic equivalence. These techniques include forward translation and back translation of test items. In forward translation, the measure is translated from the source language (SL) into the target language (TL) by a person (or group of people) expert in both languages. TL speakers are then asked to complete the translated measure, and they are questioned by the experts about their responses and their understanding of the various items.

In back translation, the test is translated into the target language and then re-translated by an independent expert back into the source language. A panel of bilingual scholars then reviews the translated version, which is translated back into the first language to monitor retention of the original meaning. An independent back translation means that “an original translation … refined in the light of this experience”. This process can be repeated several times. The simplicity of this option has led to its widespread use.

8.5.2 Non-parametric statistical approaches
Whereas the judgemental approaches involve judgements by experts, non-parametric statistical approaches look for differences in the frequency with which test scores are given, using a contingency approach and the chi-square statistic. These patterns are based on various factors such as age, gender and cultural-group membership – when differences in the predicted scoring patterns occur on the basis of group membership, bias is identified. Three non-parametric approaches can be identified, namely the Mantel-Haenszel (MH) approach, the Simultaneous Item Bias Test (SIBTEST) and Distracter Response Analysis (DRA).

8.5.2.1 The Mantel-Haenszel (MH) approach
The first non-parametric method for identifying DIF was developed by Mantel and Haenszel (1959) (see also Holland & Thayer, 1988). The Mantel-Haenszel (MH) approach uses contingency tables and is based on the assumption that an item does not show DIF if the odds (or chances) of getting an item correct are the same at all ability levels for two matched groups of test-takers who differ only in terms of their membership (call these two Group A and Group B). The pass/fail results of Group A and Group B are tabulated in a two-by-two table for each item and compared, and this is repeated for each item in the measure. Suppose there are 100 people in both groups, and that 58 As and 23 Bs get the item right while 42 As and 77 Bs get the item wrong. This is shown in Table 8.2.

Table 8.2 Contingency table for Item 1

            Correct   Incorrect   Total
  Group A      58         42        100
  Group B      23         77        100
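The figures in Table 8.2 can be checked with a short calculation. This is only a sketch: it applies the ordinary Pearson chi-square shortcut to the single 2×2 table, whereas the full Mantel-Haenszel procedure pools such tables across matched ability levels:

```python
# Pearson chi-square for the 2x2 table in Table 8.2 (58/42 vs 23/77).
# Sketch only: the full Mantel-Haenszel procedure pools such tables
# over matched ability levels rather than testing one table in isolation.
a, b = 58, 42   # Group A: correct, incorrect
c, d = 23, 77   # Group B: correct, incorrect
n = a + b + c + d

# Shortcut formula for a 2x2 table:
# chi2 = n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

significant = chi2 > 3.84  # critical value for df = 1 at alpha = 0.05
```

The resulting value (about 25,4) far exceeds the 0,05 critical value of 3,84 at one degree of freedom, so Item 1 would be flagged for closer inspection.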
However, inspection is not good enough, and so the chi-square statistic is used. MH yields a chi-square test with one degree of freedom to test the null hypothesis that there is no relation between group membership and test performance on one item after controlling for ability as given by the total test score. In other words, an item is biased if there is a significant difference in the proportions of each membership group achieving a correct or desired response on each test item. Once the item has been examined in this way, the next step is to compare the scores for Item 2 in exactly the same way. This is continued for Item 3 and all other items, until they have all been compared.

In order to calculate the Mantel-Haenszel statistic, the following steps need to be taken. Firstly, the test data must be coded and scored: each examinee must have (a) a code or label for group membership; (b) the actual response (right or wrong) for each item; and (c) a total score on the test. Secondly, the data for each item must be organised into a three-way contingency table. Thirdly, the statistical analysis for detecting and testing for DIF and item bias (chi-square) has to be conducted for each item and each ability level.

The method outlined above assumes that the amount of DIF is the same across all members of Groups A and B, and that there is no interaction between item difficulties for members with different levels of ability. This assumption is termed uniform DIF and holds when the probability of answering an item correctly is consistently greater for one group over all ability levels. In other words, uniform DIF occurs when there is no interaction between ability level and group membership. As Ekermans (2009) shows, uniform bias results from differences in item difficulty as shown by differences in the regression intercept of the observed item scores on the variable across different sociocultural groups (the offset described in Chapter 3, section 3.6.3). She argues further that without evidence of metric equivalence of the measurements, any findings about group differences on the attributes being assessed, and the subsequent practical implications of the results in important areas of functioning, are simply not known.

Non-uniform DIF occurs when there are differences in the probabilities of a correct response for the two groups at different levels of ability (in other words, when there is an interaction between ability level and group membership). Non-uniform item bias has implications at the measurement unit (or metric) equivalence level because the variables of interest are not measured on the same metric scales across different groups. As a result, assessment outcome decisions (e.g. personnel selection, mental health status) that are based on the attribute measured may not be meaningful where relative differences exist between groups. The only way around this is to develop and use group-specific norms to avoid adverse impact (as determined by similar selection ratios for majority and minority groups).

When non-uniform DIF occurs or is suspected, it is necessary to calculate DIF scores at the different levels of ability. To do this, the whole sample must be divided into a number of subgroups (K) on the basis of their ability scores (call these K1, K2, K3, etc.). The comparison of responses for each item is then carried out for each of the ability subgroups, so that the passes and fails for Groups A and B are compared at ability level K1, then again at ability level K2, and so on for each ability level. The whole process is then repeated for Item 2 and Item 3, and so on. As can be seen, this approach requires a two-by-two contingency table for each item and each ability level. If there are 50 items and four subgroups (K = 4), the chi-square statistic must be computed 50 × 4 = 200 times. However, as Gierl, Jodoin and Ackerman (2000, p. 11) note, non-uniform DIF is quite rare in practice. Nevertheless, an alternative approach that takes non-uniform DIF into account is likely to …
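The stratified comparison described above can be sketched in a few lines. The counts for the three ability subgroups below are invented for illustration; the pooled estimate is the standard Mantel-Haenszel common odds ratio, which stays near 1 when the two groups have the same odds of success at every ability level:

```python
# Sketch of the stratified MH idea (invented counts): one 2x2 table
# per ability subgroup K1..K3, each given as
# (correct_A, wrong_A, correct_B, wrong_B).
strata = [
    (10, 30, 10, 30),   # K1: low ability
    (25, 25, 25, 25),   # K2: medium ability
    (35, 10, 35, 10),   # K3: high ability
]

# Mantel-Haenszel common odds-ratio estimate: a value near 1 means
# that, ability level by ability level, the two groups have the same
# odds of answering correctly (no uniform DIF on this item).
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
mh_odds_ratio = num / den
```

In these invented data both groups behave identically within every stratum, so the pooled odds ratio is exactly 1; a value well away from 1, combined with a significant pooled chi-square, would point to uniform DIF.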
8.5.2.2 The Simultaneous Item Bias Test (SIBTEST)
The SIBTEST starts from the argument that the observed ability score is not the best means of categorising the ability groups, as these scores contain an error component, as shown in Chapter 3, where it is argued that the Observed Score is made up of a True Score plus or minus an Error Score. As Zumbo (1999) correctly notes, composite scores (i.e. scale total scores) are merely indicators of a latent (unobservable) variable. The SIBTEST therefore uses a regression estimate of the true score instead of the Observed Score as the matching or categorising variable. As a result, examinees are matched on an estimated latent ability score rather than an observed score. An advantage of this method is that SIBTEST can be used to evaluate DIF in …

8.5.2.3 Distracter Response Analysis (DRA)
A variant of the MH approach that can be used when multiple alternatives are provided is known as Distracter Response Analysis (DRA), which examines the incorrect alternatives or distracters to a test item for differences in patterns of response among different subgroups of a population. In the DRA, responses are analysed in terms of the null hypothesis that there is no significant difference in proportions when selecting distracters on the test items between the reference and focal groups. As with MH, contingency tables are used and evaluated using chi-square. In terms of this framework, no item bias occurs when there …

8.5.3 Parametric statistical approaches
Parametric approaches make use of Item Response Theory (IRT), an extremely powerful theory that can be used to detect bias, especially with large-scale testing programmes. The basic argument of IRT is that the higher an individual's ability level, the greater the individual's chance of getting an item correct. This is understandable, as people with higher scores can generally expect to get more items right than those with lower scores. This relationship can be shown graphically by plotting the ability level of the test-taker (represented by the total score) on the x-axis, and the probability of getting the item correct on the y-axis. Such a plot is known as an Item Characteristic Curve or ICC. This is shown in Figure 8.2.

As can be seen in Figure 8.2, the probability of doing well on Item 1 increases as the ability levels of the individuals taking the test increase – low-ability individuals do relatively badly on the item, whereas high-ability individuals do relatively well on it. Item 2, on the other hand, is far more difficult, as the probability of getting the item correct remains low, irrespective of the respondents' ability level. The slope of the curve indicates the discriminating power of the item. Note that if the curve is relatively flat, the item does not discriminate among individuals with high, moderate or low total scores on the measure.
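Curves like those described for Figure 8.2 can be imitated with a simple logistic function. This sketch assumes a two-parameter logistic (2PL) model with invented parameter values, not the specific curves plotted in the figure:

```python
import math

# Sketch of item characteristic curves like those in Figure 8.2,
# using a two-parameter logistic (2PL) model (parameter values
# invented for illustration):
#   P(correct) = 1 / (1 + exp(-a * (theta - b)))
def icc(theta, a, b):
    """Probability of a correct answer at ability level theta,
    with discrimination a (slope) and difficulty b (location)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

abilities = [-2.0, 0.0, 2.0]          # low, medium, high

# Item 1: moderate difficulty, steep slope -> discriminates well.
item1 = [icc(t, a=1.5, b=0.0) for t in abilities]

# Item 2: very difficult -> the probability stays low at every
# ability level in this range, so the item discriminates poorly here.
item2 = [icc(t, a=1.5, b=4.0) for t in abilities]
```

For Item 1 the probability climbs from near 0 for low-ability test-takers to above 0,9 for high-ability ones, while for Item 2 it stays below 0,05 across the whole range – the flat, low curve described in the text.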
… figure, these three parameters reflect firstly the ability of the item to distinguish between people with a particular characteristic and those without; secondly, the amount of the characteristic that the person must have to endorse the item; and finally, the likelihood that the person will endorse the item without due consideration, as a result of social desirability, guessing and the like. The meanings of these three parameters in both the cognitive and personality domains are summarised in Table 8.3.

[Figure: item characteristic curves for Group A and Group B – probability of a correct response (0,0 to 0,9 on the y-axis) plotted against ability level (Low to High on the x-axis).]
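As a sketch of how these three parameters combine, the three-parameter logistic (3PL) model puts them in a single expression; the parameter values below are invented for illustration:

```python
import math

# Sketch of the three-parameter logistic (3PL) model: a is the
# discrimination (slope), b the difficulty (location) and c the
# lower asymptote (e.g. guessing). Values invented for illustration.
def p_correct(theta, a, b, c):
    """Probability of endorsing/answering correctly at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A four-option multiple-choice item: guessing floor around 0.25.
low  = p_correct(theta=-3.0, a=1.2, b=0.0, c=0.25)   # low-ability person
high = p_correct(theta=3.0,  a=1.2, b=0.0, c=0.25)   # high-ability person
```

Even a very low-ability test-taker stays just above the 0,25 guessing floor, while a high-ability test-taker approaches certainty – which is how the third parameter captures responding "without due consideration".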
… the item of interest by successively testing each … all of the four methods yield a statistically significant chi-square value on an item or group of items. They summarise the various methods of DIF and their accompanying statistical analyses in Table 8.4.

Table 8.4 Statistical criteria for identifying …

8.5.3.3 Factor analysis
A third parametric approach to the detection of inequivalence is the use of factor analysis, which can be used to compare factor structures across countries and to evaluate factor congruence, often by means of the computation of Tucker's phi coefficient (Van de Vijver & Poortinga, 2002). This statistic examines the extent to which factors are identical across cultures. Values of Tucker's phi above 0,90 are usually considered to be adequate and values above 0,95 to be excellent. Tucker's phi can be computed with dedicated software such as an SPSS routine (syntax available from Van de Vijver & Leung, 1997a, and http://www.fonsvandevij…).
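Tucker's phi itself is simple enough to compute directly. This sketch uses invented factor loadings for two cultural groups and implements the standard congruence formula rather than the SPSS routine mentioned above:

```python
import math

# Tucker's phi (congruence coefficient) between two sets of factor
# loadings; the loading values below are invented for illustration.
def tuckers_phi(x, y):
    """Congruence of two loading vectors: sum(x*y) / sqrt(sum(x^2)*sum(y^2))."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
    return num / den

loadings_culture1 = [0.70, 0.65, 0.60, 0.55, 0.50]
loadings_culture2 = [0.68, 0.66, 0.58, 0.57, 0.48]

phi = tuckers_phi(loadings_culture1, loadings_culture2)
# Values above 0.90 are usually taken as adequate, above 0.95 as excellent.
```

With these near-identical loadings phi comes out well above 0,95, which would be read as excellent factor congruence across the two groups.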
… these different formats are used in a cross-cultural context – these are termed method bias or instrument bias*, as discussed in section 8.2.3. Clearly, when people are not used to being assessed (i.e. are relatively low on test wiseness or test sophistication*), they may suffer from test anxiety and as a result tend to underperform. This may be particularly relevant when high-tech methods such as questionnaires and computer-based applications are used, and less likely when interviews and assessment techniques based on culturally familiar materials such as toys and the like are used. Novel assessment techniques have used sand tray drawings, clay modelling, models of animals and everyday objects, and so forth. Such techniques have been used, inter alia, by Deregowski and Serpell (1971), who asked Scottish and Zambian children in one condition to sort miniature models of animals and motor vehicles, and in another condition to sort photographs of these models. Many of these same techniques are used in various forms of psychotherapy, including art therapy (e.g. Oaklander, 1997).

8.6.1 Detecting method bias
Van de Vijver and Hambleton (1996) also argue that an often-neglected source of bias in cross-cultural studies is method bias. They identify several approaches to detecting it, including triangulation, response set detection and non-standard administration.

8.6.1.1 Triangulation
In order to detect method bias, they argue for a process of triangulation (e.g. Lipson & Meleis, 1989) using single-trait, multimethod matrices (e.g. Campbell & Fiske, 1959; Marsh & Byrne, 1993). Unless these different measures that are known to assess similar constructs yield very … references to the local context of the test de…

8.6.1.2 Response set detection
A second method for detecting method bias identified by Van de Vijver and Hambleton (1996) involves the measurement of social desirability or other response sets (e.g. Fioravanti, Gough & Frere, 1981; Hui & Triandis, 1989). Should these scores be very different across the cultures assessed, one can surmise that the assessment itself is behaving quite differently in the different contexts.

8.6.1.3 Non-standard administration
Finally, method bias can be examined by administering the instrument in a non-standard way, soliciting all kinds of responses from a respondent about the interpretation of instructions, items, response alternatives and motivations for answers. Such a non-standard administration provides an approximate check on the suitability of the instrument in the target group.

8.7 Addressing issues of bias and lack of equivalence
In general terms, all psychological assessments require the assessors to demonstrate the reliability, validity and fairness of the techniques used. By extension, part of this requirement is that the equivalence of assessment techniques used in a cross-cultural context also needs to be demonstrated. Minimising bias in cross-cultural assessment usually amounts to a combination of strategies integrating design, implementation and analysis procedures. Van de Vijver and Tanzer (2004) have identified a number of strategies for describing and dealing with the different biases outlined above.

According to He and Van de Vijver (2011), actions can or should be taken to reduce or prevent low levels of equivalence from occurring at various stages of the assessment process. They identify three such stages, namely the design, implementation and analysis stages (see …
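The response-set check can be sketched as a comparison of social desirability scores across two cultural samples. The scores below are invented, and the effect-size threshold is a conventional choice rather than a prescription from the text:

```python
import statistics

# Sketch of the response-set check (invented scores): compare social
# desirability scale scores across two cultural samples. A large
# standardised difference suggests the instrument is behaving
# differently across contexts.
sd_scores_culture1 = [12, 14, 11, 13, 15, 12, 14]
sd_scores_culture2 = [21, 19, 22, 20, 23, 21, 20]

m1 = statistics.mean(sd_scores_culture1)
m2 = statistics.mean(sd_scores_culture2)
s1 = statistics.stdev(sd_scores_culture1)
s2 = statistics.stdev(sd_scores_culture2)

pooled_sd = ((s1 ** 2 + s2 ** 2) / 2) ** 0.5
cohens_d = (m2 - m1) / pooled_sd          # size of the response-set gap
flag_method_bias = abs(cohens_d) > 0.8    # conventional 'large' threshold
```

In these invented data the gap is very large, so the assessor would suspect that score differences on the substantive scales partly reflect differential social desirability rather than the attribute itself.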
8.7.1 At the design stage
Approaches at the design stage of a cross-cultural comparative study fall into two broad categories, namely decentring* and convergence* (Van de Vijver & Leung, 1997a). According to Werner and Campbell (1970), cultural decentring means that an instrument is developed simultaneously in several cultures and only the common items are retained for the comparative study. Making items suitable for a cross-cultural context in this approach often implies the use of more general items and the removal of specifics, such as references to places and currencies, when these concepts are not part of the construct being measured. This is essentially an adaptation approach. He and Van de Vijver (2011) point out that large international educational assessment programmes, such as the Program of International Student Assessment (PISA), generally adopt this approach, which involves committee members from target cultures meeting to develop culturally suitable concepts and items.

When the convergence approach is used, instruments measuring similar constructs are developed independently within cultures, and the various instruments are then administered across the various cultures (Campbell, 1986). It is essentially a process of assembly and then adoption. An example is given by He and Van de Vijver (2011, pp. 9–10) when they describe a study by Cheung, Cheung, Leung, Ward and Leung (2003). Both the NEO Five-Factor Inventory (NEO-FFI) (a Big Five measure developed and validated mostly in Western countries) and the Chinese Personality Assessment Inventory (CPAI) (which was developed in the Chinese context) were administered to both Chinese and American respondents. Joint factor analysis of the two personality measures revealed that the Interpersonal Relatedness factor of the CPAI was not covered by the NEO-FFI, whereas the Openness domain of the NEO-FFI was not covered by the CPAI. Consequently, one can expect that merging items from measures developed in distinct cultural settings may show a more comprehensive picture of personality than when a measure is developed in one setting and then adapted for use in others.

8.7.2 At the implementation stage
Because the interaction between administrators and respondents can be a significant source of error variance, the right administrators/interviewers should be selected so that the respondents feel at ease and do not experience any cultural barriers (Brislin, 1986). As shown in section 8.3.3, an important source of inequivalence that arises during this implementation stage is method bias, which refers to problems caused by the manner in which a study is conducted (method-related issues). Four types of method bias are identified, namely sample bias, instrument bias, response sets and administration bias. Steps need to be taken to address each of these components.

Sample bias arises when sample parameters differ systematically between the people being assessed and those for whom the assessment process was initially developed. These differences may be the result of educational levels, urban versus rural residency and religious affiliation, or even intensity of religious belief. To address the issue of sampling bias, Boehnke, Lietz, Schreier and Wilhelm (2011) suggest that the sampling of cultures should be guided by the research goals (e.g. select a broad cultural spectrum if the goal is to establish cross-cultural similarities, and far more homogeneous cultural groups if cultural differences are being looked for). When participants are recruited using convenience sampling, the generalisability of findings to their population needs special attention. Accordingly, sampling must be guided by the distribution of the target variable being assessed, and convenience sampling must be tempered by the nature of the characteristic being investigated in order to match the two samples as closely as possible. If this matching strategy does not work, it may well be possible to control for factors that induce sample bias so that a statistical correction for the confounding differences can be achieved. For example, educational quality has a significant impact on the assessment of intelligence, and therefore the nature, quality and extent of education must be collected for later use as possible moderating or adjustment variables. In this respect, He and Van de Vijver (2011) show, citing a study by Blom, De Leeuw and Hox (2011), how, when the non-response information from the European Social Survey (see http://www.europeansocialsurvey.org for more details) was combined with a detailed interviewer questionnaire, systematic country differences in non-response could in part be attributed to interviewer characteristics such as contacting strategies.

As we have seen, instrument bias arises when the assessment method used behaves differently across the different groups, as illustrated by Deregowski and Serpell's (1971) findings in respect of Scottish and Zambian children's ability
to sort photographs and models of animals and … than when the information was collected by an African-American interviewer.

In order to minimise this form of bias, a standardised administration protocol should be developed and adhered to by all assessors. The establishment of rapport between the administrators and those being assessed is always crucial, but it is of particular importance when assessing cross-culturally. Ensuring proper administration can help minimise the various response biases that can affect the interpretation of cross-cultural …

– (… sufficient number of examples and/or exercises)
– Use of subject and context variables (e.g. educational background)
– Addressing sample issues
– Use of collateral information (e.g. test-taking behaviour or test attitudes)
– Assessing response styles
– Use of test-retest, training and/or intervention studies
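The statistical correction for sample bias mentioned earlier (collecting education as an adjustment variable) can be sketched as a stratified comparison. The group means and stratum sizes below are invented so that the raw difference between groups is produced entirely by their different educational mixes:

```python
# Sketch (invented numbers) of a simple statistical correction for
# sample bias: compare groups within education strata instead of overall.
# scores[group][education_level] -> (mean_score, n)
scores = {
    "A": {"primary": (80, 10), "tertiary": (100, 40)},
    "B": {"primary": (80, 40), "tertiary": (100, 10)},
}

def overall_mean(group):
    """Sample-size-weighted mean across the education strata."""
    total = sum(mean * n for mean, n in scores[group].values())
    count = sum(n for _, n in scores[group].values())
    return total / count

raw_gap = overall_mean("A") - overall_mean("B")  # confounded by education mix

# Adjusted gap: average the within-stratum differences, weighting the
# strata equally (direct standardisation to a common education mix).
adjusted_gap = sum(
    scores["A"][level][0] - scores["B"][level][0]
    for level in ("primary", "tertiary")
) / 2
```

Overall, Group A appears 12 points ahead of Group B, yet within each education level the two groups score identically; once education is held constant the adjusted gap is zero, illustrating why such moderating variables must be collected.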
Summary
DIF refers to the differing probabilities of success on an item of people of the same ability but belonging to different groups – that is, when people with equivalent overall test performance but from different groups have a different probability or likelihood of answering an item correctly.

There are several ways in which item bias can be demonstrated. Some are based on expert judgements involving inspection as well as forward and back translation, while others are based on various forms of statistical analysis. The statistical techniques are divided into two main categories: non-parametric methods developed for dichotomously scored items using contingency tables, and parametric methods for test scores with interval-scale properties based on the analysis of variance (ANOVA). Non-parametric statistical approaches look for differences in the frequency with which test scores are given, using a contingency approach and the chi-square statistic. There are three such non-parametric approaches, namely the Mantel-Haenszel (MH) approach, the Simultaneous Item Bias Test (SIBTEST) and Distracter Response Analysis (DRA). The best known of the non-parametric techniques is the Mantel-Haenszel statistic, which uses chi-square to test the null hypothesis that there is no relation between group membership and test performance on one item after controlling for ability as given by the total test score. In terms of MH, an item is biased if there is a significant difference in the proportions of each membership group achieving a correct or desired response on each test item. Once an item has been examined in this way, the process is continued until all items have been compared.

Parametric approaches to DIF analysis make use of Item Response Theory (IRT), which is an extremely powerful theory that can be used to detect bias, especially in large-scale testing programmes. The basic argument of IRT is that the higher an individual's ability level, the greater the individual's chance of getting a more difficult item correct, and the less likely it is that a person with lower ability would get the more difficult items correct. This relationship can be shown graphically by plotting the ability level of the test-taker (represented by the total score) on the x-axis, and the probability of getting the item correct on the y-axis. Such a plot is known as an item characteristic curve or ICC. If this pattern of responses to items of equal difficulty differs across cultural (or other) groups, then it is clear that the items are behaving differently for the different groups – this is what is meant by Differential Item Functioning or DIF.

DIF is a strong indication that some items of the measure, or the measure as a whole, may be biased against one of the sociocultural groups being assessed. At the same time, DIF is a necessary, but not sufficient, condition for item bias to exist – if DIF is detected, this is not a sufficient reason to declare item bias, but it indicates the possibility that such bias exists, and various other techniques should be used to determine whether item bias is present. Factor analysis is one such measure that could be used.

In order to detect whether method bias is present, Van de Vijver and Hambleton (1996) suggest several approaches, including triangulation, response set detection and non-standard administration.

Finally, He and Van de Vijver (2012) identify three actions that can/should be taken to reduce or prevent inequivalence from occurring at various stages of the assessment process, namely at the design, implementation and analysis stages.
Additional reading
For good insight into the use of psychometric scales across cultural boundaries, see Douglas, S.P.
& Nijssen, E.J. (2002). On the use of ‘borrowed’ scales in cross-national research: a cautionary note.
International Marketing Review, 20(6), 621–642.
Essays
1. In the light of the theories discussed in this chapter, revisit Case study 6.1 (p. 72) in Chapter 6 and sug-
gest how you would demonstrate the cross-cultural equivalence of the Trauma Symptom Inventory.
2. Suppose that you want to compare two countries on individualism–collectivism and its effect, if any, on workplace behaviour, bearing in mind that the sample in one country has on average a higher level of education than the sample in the other. Discuss how this difference could challenge your findings and how you could try to disentangle educational and cultural differences.
3. Suppose that you wanted to investigate the conformity levels of employees in your organisation, which has sizable groups of people from Eastern Europe, Asia, the US and South Africa. How can sources of method bias be controlled in such a cross-cultural study? Discuss procedures at both the design and analysis stages.