
Methodological Issues in Cross-Cultural Counseling Research:

Equivalence, Bias, and Translations

Stefanía Ægisdóttir, Lawrence H. Gerstein, and Deniz Canel Çinarbaş
Ball State University
Concerns about the cross-cultural validity of constructs are discussed, including equivalence, bias, and translation procedures. Methods to enhance equivalence are described, as are strategies to evaluate and minimize types of bias. Recommendations for translating instruments are also presented. To illustrate some challenges of cross-cultural counseling research, translation procedures employed in studies published in five counseling journals are evaluated. In 15 of 615 empirical articles, a translation of instruments was performed. In 9 studies, there was some effort to enhance and evaluate equivalence between language versions of the measures employed. In contrast, 2 studies did not report using thorough translation and verification procedures, and 4 studies employed moderately rigorous procedures. Suggestions for strengthening translation methodologies and enhancing the rigor of cross-cultural counseling research are provided. To conduct cross-culturally valid research and deliver culturally appropriate services, counseling psychologists must generate and rely on methodologically sound cross-cultural studies. This article provides a schema for performing such studies.

There is growing interest in international issues in the counseling profession. There are more publications about cross-cultural issues in counseling and the role of counseling outside of the United States (Gerstein, 2005; Gerstein & Ægisdóttir, 2005a, 2005b, 2005c; Leong & Blustein, 2000; Leong & Ponterotto, 2003; Leung, 2003; Ægisdóttir & Gerstein, 2005). Greater attention also has been paid to counseling international individuals living in the United States (Fouad, 1991; Pedersen, 1991). Confirming this trend is the focus of Division 17's past president (2003 to 2004), Louise Douce, on the globalization of counseling psychology. Douce encouraged developing a strategic plan to enhance the profession's global effort and facilitate a movement that "transcends nationalism" (Douce, 2004, p. 145). She also stressed questioning the validity and applicability of our Eurocentric paradigms and the hegemony of such paradigms. Instead, she claimed, our paradigms must integrate and evolve from indigenous models. Puncky Paul Heppner continued Douce's effort as part of his Division 17 presidential initiative. Heppner (2006) claimed, "Cross-national relationships have tremendous potential to enhance the basic core of the science and practice of counseling psychology, both domestically and internationally" (p. 147). He also predicted, "In the future, counseling psychology will no longer be defined as counseling psychology within the United States, but rather, the parameters of counseling psychology will cross many countries and many cultures" (Heppner, 2006, p. 170). Although an international focus in counseling is important, there are many challenges (cf. Douce, 2004; Heppner, 2006; Pedersen, 2003). This article discusses methodological challenges, especially as related to the translation and adaptation of instruments for use in international and cross-cultural studies and their link to equivalence and bias. While there has been discussion in the counseling psychology literature about the benefits and challenges of cross-cultural counseling and the risks of simply applying Western theories and strategies cross-culturally, we were unable to locate publications in our literature detailing how to perform cross-culturally valid research. There is literature, however, in other areas of psychology (e.g., cross-cultural, social, international) that addresses these topics. This article draws from this literature to introduce counseling psychologists to some concepts, methods, and issues when conducting cross-cultural research. We also extend this literature by discussing the potential use of cross-cultural methodologies in counseling research. As a way to illustrate some challenges of cross-cultural research, we also examine, analyze, and evaluate translation practices employed in five prominent counseling journals to determine the translation procedures counseling researchers have used and the methods employed to minimize bias and evaluate equivalence. Finally, we offer recommendations about translation methodology and ways to increase validity in cross-cultural counseling research.

THE COUNSELING PSYCHOLOGIST, Vol. XX No. X, Month XXXX xx-xx DOI: 10.1177/0011000007305384 © 2007 by the Division of Counseling Psychology.

METHODOLOGICAL CONCEPTS AND ISSUES IN CROSS-CULTURAL RESEARCH

Approaches to Studying Culture

There are numerous definitions of culture in anthropology and counseling psychology. Ponterotto, Casas, Suzuki, and Alexander (1995) concluded that for most scholars, culture is a learned system of meaning and behavior passed from one generation to the next. When studying cultural influences on behavior, counseling psychologists may approach cultural variables and the design of research from three different angles: the indigenous, the cultural, and the cross-cultural approach (Triandis, 2000).


According to Triandis, when using the indigenous approach, researchers are mainly interested in the meaning of concepts in a culture and how such meaning may change across demographics within a cultural context (e.g., "What does counseling mean in this culture?"). With this approach, psychologists often study their own culture with the goal of benefiting people in that culture. The focus of such studies is the development of a psychology tailored to a specific culture, without a focus on generalization outside of that cultural context (cf. Adamopolous & Lonner, 2001). The main challenge with the indigenous approach is the difficulty of avoiding existing psychological concepts, theories, and methodologies and, therefore, of determining what is indigenous (Adamopolous & Lonner, 2001). Triandis (2000) contended that with the cultural approach, in contrast, psychologists often study cultures other than their own by using ethnographic methods. True experimental methods can also be used within this approach (van de Vijver, 2001). Again, the meanings of constructs in a culture are the main focus, without direct comparison of constructs across cultures. The aim is to advance the understanding of persons in a sociocultural context and to emphasize the importance of culture in understanding behavior (Adamopolous & Lonner, 2001). The challenge with this approach is the lack of a widely accepted research methodology (Adamopolous & Lonner, 2001). Last, Triandis (2000) stated that when using cross-cultural approaches, psychologists obtain data in two or more cultures, assuming the constructs under investigation exist in all of the cultures studied. Here, researchers are interested in how a construct affects behavior differently or similarly across cultures. Thus, one implication of this approach is an increased understanding of the cross-cultural validity and generalizability of theories and/or constructs.
The main challenge with this approach is demonstrating equivalence of constructs and measures used in the target cultures and also minimizing biases that may threaten valid cross-cultural comparisons. In sum, indigenous and cultural approaches focus on the emics, or things unique to a culture. These approaches are relativistic in that the aim is studying the local context and meaning of constructs without imposing a priori definitions of the constructs (Tanaka-Matsumi, 2001). Scholars representing these approaches usually reject claims that psychological theories are universal (Kim, 2001). In the cross-cultural approach, in contrast, the focus is on the etics, or factors common across cultures (Brislin, Lonner, & Thorndike, 1973). Here the goal is to understand similarities and differences across cultures, and the comparability of cross-cultural categories or dimensions is emphasized (Tanaka-Matsumi, 2001).


Methodological Challenges in Cross-Cultural Research

Scholars from diverse psychology disciplines have pursued cross-cultural research for decades, and as a result, a literature on cross-cultural research methodologies and challenges has emerged (e.g., Berry, 1969; Brislin, 1976; Brislin et al., 1973; Lonner & Berry, 1986; Triandis, 1976; van de Vijver, 2001; van de Vijver & Hambleton, 1996; van de Vijver & Leung, 1997). Based on this work, our article identifies some methodological challenges faced by cross-cultural researchers. Before proceeding, note that the challenges summarized below refer to any cross-cultural comparison of psychological constructs, both within countries (e.g., between ethnic groups) and between countries. These challenges are greater, though, in cross-cultural comparisons requiring translation of instruments.

Equivalence

Equivalence is a key concept in cross-cultural psychology. It addresses the question of the comparability of observations (test scores) across cultures (van de Vijver, 2001). Several definitions or forms of equivalence have been reported. Lonner (1985), for instance, discussed four types: functional, conceptual, metric, and linguistic. Functional equivalence refers to the function the behavior under study (e.g., counselor empathy) has in different cultures. If similar behaviors or activities (e.g., smiling) have different functions in various cultures, their parameters cannot be used for cross-cultural comparison (Jahoda, 1966; Lonner, 1985). In comparison, conceptual equivalence refers to the similarity in meaning attached to a behavior or concept (Lonner, 1985; Malpass & Poortinga, 1986). Certain behaviors and concepts (e.g., help seeking) may vary in meaning across cultures. Metric equivalence refers to the psychometric properties of the tool (e.g., Self-Directed Search) used to measure the same construct across cultures; it is assumed to hold if psychometric data from two or more cultural groups have the same structure (Malpass & Poortinga, 1986).
Finally, linguistic equivalence has to do with the wording of items (form, meaning, and structure) in different language versions of an instrument, the reading difficulty of the items, and the naturalness of the items in the translated form (Lonner, 1985; van de Vijver & Leung, 1997). Van de Vijver and his colleagues (van de Vijver, 2001; van de Vijver & Leung, 1997) also discussed four types of equivalence, representing a hierarchical order from an absence of equivalence to higher degrees of it. The first type, construct nonequivalence, refers to constructs (e.g., cultural syndromes) being so dissimilar across cultures that they cannot be compared. Under these circumstances, no link exists between the constructs. The next three types demonstrate some equivalence, with each higher level in the
hierarchy presupposing a lower level. These are construct (or structural), measurement-unit, and scalar equivalence. At the lowest level is construct equivalence. A scale has construct equivalence if it measures the same underlying construct across cultural groups. Construct equivalence has been demonstrated for many constructs in psychology (e.g., the NEO Personality Inventory-Revised five-factor model of personality; McCrae & Costa, 1997). With construct equivalence, the constructs (e.g., extraversion) are considered to have the same meaning and nomological network across cultures (relationships between constructs, hypotheses, and measures; e.g., Betz, 2005) but need not be operationally defined the same way for each cultural group (e.g., van de Vijver, 2001). For instance, two emic measures of attitudes toward counseling may tap different indicators of attitudes in each culture; the measures may therefore include different items yet still be structurally equivalent, as both measure the same dimensions of counseling attitudes and predict help seeking. Yet because their measurement differs, a direct comparison of average test scores across cultures using a t test or ANOVA, for example, cannot be performed. The measures lack scalar equivalence (see below). Construct equivalence is often demonstrated using exploratory and confirmatory factor analyses and structural equation modeling (SEM) to discern the similarities and differences of constructs' structures and their nomological networks across cultures. The next level of equivalence is measurement-unit equivalence (van de Vijver, 2001; van de Vijver & Leung, 1997). With this type of equivalence, the measurement scales of the tools are equivalent (e.g., interval level), but their origins differ across groups.
Mean scores from scales with this level of equivalence can be compared to examine individual differences within groups (e.g., using a t test), but because of the differing origins, comparing mean scores between groups will not provide a valid comparison. For example, the Kelvin and Celsius scales have equivalent measurement units (interval scales) but measure temperature differently: they have a different origin, and thus a direct comparison of temperatures using these two scales cannot be made. But because of a constant difference between the two scales, comparability is possible (i.e., K = °C + 273.15). The known constant or value offsetting the scales makes them comparable (van de Vijver & Leung, 1997). Such known constants are difficult to discern in studies of human behavior, often rendering scores at this level incomparable. A clear analogy in counseling psychology is using different cut scores for various groups (e.g., gender) on instruments as an indicator of some criterion or underlying trait. Different cut scores (or standard scores) are used because the instruments do not show equivalence beyond the measurement unit. That is, some bias affects the origin of the
scale for one group relative to the other, limiting raw score comparability between the groups. For example, a raw score of 28 on the Minnesota Multiphasic Personality Inventory-2 MacAndrew Alcohol Scale-Revised (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 2001) does not mean the same thing for women as it does for men. For women, this score indicates more impulsiveness and greater risk for substance abuse than it does for men (Greene, 2000). A less clear example, but one extremely important to cross-cultural research, involves two language versions of the same psychological instrument. Here the origins of the two language versions of the scale may appear the same (both versions include the same interval rating scale for the items). This assumption, however, may be threatened if the two cultural groups responding to the measure vary in their familiarity with Likert-type answer formats (method bias; see below). Because of this differential familiarity with the type of stimuli, the origin of the measurement unit is not the same for both groups. Similarly, if the two cultural groups vary in response style (e.g., acquiescence), a score of 2 on a 5-point scale may not mean the same thing for both groups. In these examples, the source or origin of the scale differs between the two language versions, compromising valid cross-cultural comparison. Finally, at the highest level of equivalence is scalar equivalence, or full score comparability. Instruments equivalent at the scalar level measure a concept with the same interval or ratio scale across cultures, and the origins of the scales are the same. Therefore, at this level, bias has been ruled out, and direct cross-cultural comparisons of average scores on an instrument can be made (e.g., van de Vijver & Leung, 1997). According to van de Vijver (2001), it can be difficult to discern whether measures are equivalent at the measurement-unit or the scalar level.
This challenge arises when comparing scale scores between cultural groups responding to the same language version of an instrument as well as between different language versions of a measure. As an example of this difficulty, when the same language version of an instrument is used, racial differences in intelligence test scores can be interpreted either as representing true differences in intelligence (scalar equivalence has been reached) or as an artifact of the measures (only measurement-unit equivalence has been reached). In the latter case, the measurement units are the same, but they have different origins because of various biases, hindering valid comparisons across racial groups; valid comparisons at the ratio level (comparing mean scores) cannot be made. Higher levels of equivalence are more difficult to establish. It is, for instance, easier to show that an instrument measures the same construct across cultures (construct equivalence) by demonstrating a similar factor structure and nomological network than it is to demonstrate the instrument's numerical comparability (scalar equivalence). The
higher the level of equivalence, though, the more detailed the analyses of cross-cultural similarities and differences that can be performed (van de Vijver, 2001; van de Vijver & Leung, 1997). Levels of equivalence for measures used in cross-cultural counseling research should be established and reported in counseling psychology publications. It is not until the equivalence of the concepts under study has been determined that a meaningful cross-cultural comparison can be made. Without demonstrated equivalence, numerous rival hypotheses (e.g., poor translation) may account for observed cross-cultural differences.

Bias

Another important concept in cross-cultural research is bias. Bias negatively influences equivalence and refers to nuisance factors that limit the comparability, or scalar equivalence, of observations (test scores) across cultural groups (van de Vijver, 2001; van de Vijver & Leung, 1997; van de Vijver & Poortinga, 1997). Typical sources of bias are construct, method, and item bias. Construct bias occurs when the construct measured as a whole (e.g., intelligence) is not identical across cultural groups. Potential sources of this type of bias include different coverage of the construct across cultures (i.e., not all relevant behavioral domains are sampled), incomplete overlap in how the construct is defined across cultures, and differences in the appropriateness of item content between two language versions of an instrument (cf. van de Vijver & Leung, 1997; van de Vijver & Poortinga, 1997). A serious construct bias equates to construct nonequivalence. Even when a construct is well represented in multilingual versions of a scale (construct equivalence, e.g., a similar factor structure, and no construct bias, e.g., complete coverage of the construct), bias may still exist in the scores, resulting in measurement-unit or scalar nonequivalence (van de Vijver & Leung, 1997). This may be a result of method bias.
Method bias can stem from characteristics of the instrument or from its administration (van de Vijver, 2001; van de Vijver & Leung, 1997; van de Vijver & Poortinga, 1997). Possible sources of this bias are differential response styles (e.g., social desirability) across cultures (e.g., Johnson, Kulesa, Cho, & Shavitt, 2005), variations in familiarity with the type of stimuli or scale across cultures, communication problems between investigators and participants, and differences in physical conditions under which the instrument is administered across cultures. Method bias can also limit cross-cultural comparisons when samples drawn from different cultures are not comparable (e.g., prior experiences). Item bias may also exist, posing a threat to cross-cultural comparison (scalar equivalence). This type of bias refers to measurement at the item level.
This bias has several potential sources. It can result from poor translation or poor item formulation (e.g., complex wording), or because item content may not be equally relevant or appropriate for the cultural groups being compared (e.g., Malpass & Poortinga, 1986; van de Vijver & Poortinga, 1997). An item on an instrument is considered biased if persons from different cultures who have the same standing on the underlying characteristic (trait or state) being measured yield different average scores on that item. Finally, bias can be uniform or nonuniform. A uniform bias is any type of bias affecting all score levels on an instrument equally (van de Vijver & Leung, 1997). For instance, when measuring intelligence, the scale may be accurate for one group but consistently score 10 points too high for another group. The 10-point difference would appear at all intelligence levels (a true score of 90 would be recorded as 100, and a true score of 120 as 130). A nonuniform bias is any type of bias differentially affecting different score levels. In measuring intelligence, the scale may again be accurate for one group, but for the other group, every 10 points are recorded as 12 points. Persons whose true score is 90 would receive a recorded score of 108 (an 18-point difference), whereas persons whose true score is 110 would receive a recorded score of 132 (a 22-point difference). The distortion is greater at higher levels on the scale. Nonuniform bias is considered a greater threat in cross-cultural comparisons than uniform bias, as it influences both the origin and the measurement unit of a scale, whereas uniform bias affects only the origin (cf. van de Vijver, 1998, 2001).

Relationship Between Bias and Equivalence

Bias and equivalence are closely related. When two or more language versions of an instrument are unbiased (construct, method, item), they are considered equivalent at the scalar level.
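The item-bias definition above, that respondents from different cultures with the same standing on the underlying trait should yield the same expected item score, can be illustrated with a simple conditional-means screen. This sketch is illustrative only: the function name, the 0.5-point threshold, and the use of the total score as a trait proxy are our assumptions, not the full IRT or ANOVA machinery.

```python
import numpy as np

def item_bias_screen(item, total, group, threshold=0.5):
    """Flag total-score levels at which two groups' mean item scores
    differ by more than `threshold` (a rough indicator of item bias)."""
    item, total, group = map(np.asarray, (item, total, group))
    g1, g2 = np.unique(group)           # assumes exactly two groups
    flagged = []
    for level in np.unique(total):
        at_level = total == level
        in_g1 = at_level & (group == g1)
        in_g2 = at_level & (group == g2)
        if not (in_g1.any() and in_g2.any()):
            continue                    # need both groups at this level
        if abs(item[in_g1].mean() - item[in_g2].mean()) > threshold:
            flagged.append(level)
    return flagged
```

A level flagged by this check is only a hint; formal detection would match respondents on a latent trait estimate rather than an observed total score.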
Bias will lower a measure's level of equivalence (construct, measurement unit, scalar). Construct bias also has more serious consequences and is more difficult to remedy than method and item bias. For instance, in selecting a preexisting instrument for translation and use with a different language group, the researcher runs the risk of incomplete coverage of the construct in the target culture (i.e., construct bias limiting construct equivalence). Method bias can be minimized, for example, by standardized administration (administering under similar conditions using the same instructions) and by using covariates, whereas thorough translation procedures may limit item bias. Furthermore, higher levels of equivalence are less robust against bias. Scalar equivalence (a necessary condition for comparing average scores between groups) is, for instance, affected by all types of bias and is more susceptible to bias than
measurement-unit equivalence or construct equivalence, where comparative statements are not a focus (cf. van de Vijver, 1998). Thus, if one wants to infer whether Culture A shows a greater or lesser magnitude of a characteristic (e.g., willingness to seek counseling services) than Culture B, one has to empirically demonstrate the measure's lack of bias and its scalar equivalence. Not all instruments are equally vulnerable to bias. More structured tests administered under standardized conditions are less susceptible to bias than open-ended questions. Similarly, the smaller the cultural distance (Triandis, 1994, 2000) between the groups being compared, the less room there is for bias. Cultural distance can, for instance, be discerned from the Human Development Index (HDI; United Nations, 2005), published yearly by the United Nations Development Programme to assess well-being and child welfare (human development). Using the HDI as a measure of cultural distance, the United States (ranked 10) and Ireland (ranked 8) are more similar in terms of human development than the United States and Niger (ranked 177). Therefore, greater bias can be expected to affect cross-cultural comparisons between the United States and Niger than between the United States and Ireland.
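The uniform and nonuniform bias arithmetic from the preceding discussion can be sketched in a few lines of code. The offset and scaling factor below are the hypothetical values from the intelligence-scale example, not estimates from any real instrument.

```python
def uniform_bias(true_score, offset=10.0):
    """Uniform bias: the same distortion at every score level.

    Only the origin of the scale is shifted, so a constant offset
    separates observed from true scores (90 -> 100, 120 -> 130).
    """
    return true_score + offset


def nonuniform_bias(true_score, factor=1.2):
    """Nonuniform bias: distortion grows with the score level.

    Both origin and measurement unit are affected: every 10 true
    points are recorded as 12, so the gap widens up the scale.
    """
    return true_score * factor
```

With these hypothetical values, a true score of 90 is observed as about 108 under nonuniform bias (an 18-point gap) but a true score of 110 as about 132 (a 22-point gap), mirroring why nonuniform bias is the greater threat to score comparability.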

MEASUREMENT APPROACHES

Selection of Measurement Devices

A prerequisite to conducting a cross-cultural study is to make sure that what is being studied exists and is functionally equivalent across cultures (Berry, 1969; Lonner, 1985). Once this has been determined, the next step is deciding how the construct should be assessed. This decision should be based on the type of bias expected. If there is concern with construct bias, the construct is not functionally equivalent, and serious method bias is expected, the researcher may need to rely on emic approaches (indigenous or cultural), develop measures meaningful to the culture, and use culture-sensitive methodologies. Van de Vijver and Leung (1997) called this strategy the assembly approach. Emic techniques (i.e., assembly) are often needed if the cultures of interest are very different (Triandis, 1994, 2000). In this approach, though, direct comparisons between cultures can be challenging, as the two or more measures of the construct may not be equivalent at the measurement level. If, in contrast, the cultures are relatively similar and the concept is functionally equivalent across cultures, the researcher may opt to translate and/or adapt preexisting instruments and methodologies to discern cultural similarities and differences across cultural groups. Van de Vijver and Leung (1997)
listed two common strategies employed when using preexisting measures with multilingual groups. First is the applied approach, where an instrument undergoes a literal translation of its items. Item content is not changed for the new cultural context, and the linguistic and psychological appropriateness of the items is assumed. It is also assumed there is no need to change the instrument to avoid bias. According to van de Vijver (2001), this is the most common technique in cross-cultural research on multilingual groups. The second strategy is adaptation, where some items may be literally translated, while others require modification of wording and content to enhance their appropriateness to a new cultural context (van de Vijver & Leung, 1997). This technique is chosen if there is concern about construct bias. Of the three approaches just mentioned (assembly, application, and adaptation), the application strategy is the easiest and least cumbersome in terms of money, time, and effort. This technique may also offer high levels of equivalence (measurement-unit and scalar equivalence), and it makes comparison with the results of other studies using the same instrument possible. This approach may not be useful, however, when the characteristic behaviors or attitudes (e.g., obedience and being a good daughter or son) associated with the construct (e.g., filial piety) differ across cultures (lack of construct equivalence and high construct bias; e.g., Ho, 1996). In such instances, the assembly or adaptation strategy may be needed. With the assembly approach (emic), researchers may focus on the construct validity of the instrument (e.g., factor analysis, divergent and convergent validity), not on direct cross-cultural comparisons.
When adaptation of an instrument is needed, in which some items are literally translated whereas others are changed or added, cross-cultural comparisons may be challenging, as direct comparisons of total scores may not be feasible because not all items are identical. Only scores on identical items can be compared using mean score comparisons (Hambleton, 2001). The application (etic) technique most easily allows for a direct comparison of test scores using t tests or ANOVA because of potential scalar equivalence. For such comparisons to be valid, however, an absence of bias must be demonstrated. The applied approach, and to some degree the adaptation strategy, focuses on capturing the etics, or the qualities of concepts common across cultures. Yet cultural researchers have criticized this focus. Berry (1989), for instance, labeled this practice "imposed etics," claiming that by using the etic approach, researchers fail to capture the culturally specific aspects of a construct and may erroneously assume the construct exists and functions similarly across cultures (cf. Adamopolous & Lonner, 2001). The advantage of the etic over the emic strategy, however, is that the etic technique permits cross-cultural comparisons, whereas in the emic approach, cross-cultural comparison is more difficult and less direct.
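The decision logic running through this subsection can be summarized in a small helper. This is a hypothetical illustration of van de Vijver and Leung's (1997) three options, not a formal algorithm; the function and argument names are ours.

```python
def choose_strategy(functionally_equivalent: bool,
                    serious_construct_bias: bool,
                    items_need_cultural_modification: bool) -> str:
    """Return one of van de Vijver and Leung's (1997) three options."""
    if not functionally_equivalent or serious_construct_bias:
        # Build an emic, culture-specific measure from scratch.
        return "assembly"
    if items_need_cultural_modification:
        # Translate some items literally; reword or replace others.
        return "adaptation"
    # Literal translation of all items; appropriateness is assumed.
    return "application"
```

The ordering reflects the text's hierarchy of concerns: construct-level problems rule out translation altogether, item-level problems call for adaptation, and only in their absence is straightforward application defensible.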



Nevertheless, the etic strategy may be limited when trying to understand a specific culture. There is, for instance, no guarantee that a translated measure developed to assess a concept in one culture will assess the same construct equally well in another culture. It is highly likely that some aspects of the concept will be lost or not captured by the scale. There might be construct bias and a lack of construct equivalence. To counteract this shortcoming, several methods have been proposed. Brislin and colleagues (Brislin, 1976, 1983; Brislin et al., 1973) suggested a combined etic-emic strategy. In this approach, researchers begin with an existing tool developed in one culture that is translated for use in a target culture (potentially etic items). Next, additional items unique to the target culture (emic) are included in the translated scale. The additional items may be developed by persons knowledgeable about the culture and/or drawn from relevant literature. These culture-specific items must be highly correlated with the original items in the target instrument but unrelated to culture-specific items generated from another culture (Brislin, 1976, 1983; Brislin et al., 1973). Adding emic items provides the researcher with a more in-depth understanding of a construct in a given culture. Assessing equivalence between the language versions of the instrument would be based only on the shared (etic) items (Hambleton, 2001). Similarly, Triandis (1972, 1975, 1976) suggested that researchers start with an etic concept (thought to exist in all cultures under study) and then develop emic items for the etic concept within each culture. Thus, all instrument development is carried out within each culture included in the study (i.e., assembly). Triandis argued that cross-cultural comparisons could still be made using these versions of the measure (one in each culture) because the emic items would be written to measure an etic concept.
SEM could, for instance, be used for this purpose (see Betz, 2005; Weston & Gore, 2006). Finally, a convergence approach can be applied (e.g., van de Vijver, 1998). Relying on this technique, researchers may assemble a scale measuring an etic concept in each culture or use preexisting culture-specific tools translated into each language. Then all measures are given to each cultural group. Comparisons can be made between shared items (given that enough items are shared), whereas nonshared items provide culture-specific understanding of the construct. When this method is used, the appropriateness of items in all scales needs to be determined before administration.

Determining Equivalence of Translated Instruments

Several statistical methods are available to determine equivalence between translated and original versions of scales. Reporting Cronbach's
alpha reliability, item-total scale correlations, and item means and variances provides initial information about an instrument's psychometric properties. A statistical comparison between two independent reliability coefficients can be performed (cf. van de Vijver & Leung, 1997). If the coefficients are significantly different from each other, the source of the difference should be examined; it may indicate item or construct bias. Additionally, item-total scale correlations may indicate construct bias and nonequivalence, as well as method bias (e.g., administration differences, differential social desirability, differential familiarity with instrumentation). Finally, item score distributions may point to biased items and, therefore, provide information about equivalence. For instance, an indicator (e.g., item or scale) showing variation in one cultural group but not the other may represent an emic concept (Johnson, 1998). Therefore, comparing these statistics across different language versions of an instrument offers preliminary data about the instrument's equivalence (construct, measurement unit, and scalar, van de Vijver & Leung, 1997; conceptual and measurement, Lonner, 1985). Construct (van de Vijver & Leung, 1997), conceptual, and measurement equivalence (Lonner, 1985) can also be assessed at the scale level. Here, exploratory and confirmatory factor analysis, multidimensional scaling techniques, and cluster analysis can be used (e.g., van de Vijver & Leung, 1997). These techniques provide information about whether the construct is structurally similar across cultures and whether the same meaning is attached to it. For instance, in confirmatory factor analysis, hypotheses about the factor structure of a measure, such as the number of factors, the loadings of variables on factors, and the correlations among factors, can be tested. Numerous fit indices can be used to evaluate the fit of the model to the data.
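The first screening step above, comparing internal-consistency estimates across language versions, can be sketched as follows. The function is a standard Cronbach's alpha computation; a formal test for comparing two independent alphas (e.g., the F-ratio procedure attributed to Feldt) would then be applied to the two coefficients.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)   # per-item sample variance
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```

Computing alpha separately for each language version, and then examining which items drive any discrepancy between the coefficients, yields the kind of preliminary evidence about item or construct bias described above.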
Scalar or full score equivalence is more difficult to establish than construct and measurement-unit equivalence, and various biases may threaten this level of equivalence. Item bias, for instance, influences scalar equivalence. Item bias can be ascertained by studying the distribution of item scores for all cultural groups (cf. van de Vijver & Leung, 1997). Item response theory (IRT), in which differential item functioning (DIF) is examined, may be used for this purpose. In IRT, it is assumed that item responses are related to an underlying or latent trait through a logistic curve known as the item characteristic curve (ICC). The ICCs for each selected parameter (e.g., item difficulty or popularity) are compared for every item in each cultural group using chi-square statistics. Items differing between cultural groups are eliminated before cross-cultural comparisons are made (e.g., Hambleton & Swaminathan, 1985; van de Vijver & Leung, 1997). Item bias can also be examined by using ANOVA. The item score is treated as the dependent variable, and the cultural group (e.g., two levels) and score level (levels dependent on the number of scale items and the number of participants scoring at each level) are the independent variables. Main effects for culture and the interaction between culture and score level are then examined. Significant effects indicate biased items (cf. van de Vijver & Leung, 1997). Logistic regression can also be used for this purpose with the same type of independent and dependent variables. Additionally, multiple-group SEM invariance analyses (MCFA) and multiple-group mean and covariance structures analysis (MACS) provide information about biased items or indicators (e.g., Byrne, 2004; Cheung & Rensvold, 2000; Little, 1997, 2000), with the MACS method also providing information about mean differences between groups on latent constructs (e.g., Ployhart & Oswald, 2004). Finally, factors contributing to method bias can be assessed and statistically held constant when measuring constructs across cultures, given that valid measures are available. A measure of social desirability may, for instance, be used to partially control for method bias. Also, gross national product per capita may be used to control for method bias, as it has been found to correlate with social desirability (e.g., Van Hemert, van de Vijver, Poortinga, & Georgas, 2002) and acquiescence (Johnson et al., 2005). Furthermore, personal experience variables potentially influencing the construct under study differentially across cultures may serve as covariates.

Translation Methodology

Employing a proper translation methodology is extremely important to increase equivalence between multilingual versions of an instrument and the measure's cross-cultural validity. About a decade ago, van de Vijver and Hambleton (1996) published practical guidelines for translating psychological tests that were based on standards set forth in 1993 by the International Test Commission (ITC). The guidelines covered best practices in regard to context, development, administration, and the interpretation of psychological instruments (cf. Hambleton & de Jong, 2003; van de Vijver, 2001; van de Vijver & Hambleton, 1996; van de Vijver & Leung, 1997). The context guidelines emphasized the importance of minimizing construct, method, and item bias and the need to assess, instead of assume, construct similarity across cultural groups before embarking on instrument translation. The development guidelines referred to the translation process itself, while the administration guidelines suggested ways to minimize method bias. Finally, the interpretation guidelines recommended caution when explaining score differences unless alternative hypotheses had been ruled out and equivalence between original and translated measures had been ensured (van de Vijver & Hambleton, 1996). Counseling psychologists should review these guidelines when designing cross-cultural research projects and prior to translating and adapting psychological instruments for such research.
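The culture-by-score-level ANOVA item-bias check described earlier can be sketched as follows. This is a minimal illustration assuming a balanced design (equal numbers of respondents per cell); the function name and simulated data are hypothetical:

```python
import numpy as np
from scipy import stats

def item_bias_anova(cell_scores):
    """Two-way ANOVA item-bias check for a single item.

    cell_scores: array of shape (n_cultures, n_score_levels, n_per_cell),
    i.e., item scores grouped by cultural group and total-score level.
    Returns (F, p) for the culture main effect and for the
    culture x score-level interaction; significant effects suggest
    a biased item.
    """
    y = np.asarray(cell_scores, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))     # culture means
    mean_b = y.mean(axis=(0, 2))     # score-level means
    mean_ab = y.mean(axis=2)         # cell means
    ss_a = n * b * ((mean_a - grand) ** 2).sum()
    ss_ab = n * ((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()
    ss_e = ((y - mean_ab[:, :, None]) ** 2).sum()
    df_a, df_ab, df_e = a - 1, (a - 1) * (b - 1), a * b * (n - 1)
    f_a = (ss_a / df_a) / (ss_e / df_e)
    f_ab = (ss_ab / df_ab) / (ss_e / df_e)
    return (f_a, stats.f.sf(f_a, df_a, df_e)), (f_ab, stats.f.sf(f_ab, df_ab, df_e))
```

In practice, unbalanced designs and the logistic-regression variant mentioned in the text would call for a general linear model routine rather than this hand-computed balanced ANOVA.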


Prior to the development of the ITC standards, Brislin et al. (1973) and Brislin (1986) had written extensively about translation procedures. The following paragraphs outline the common translation methods that Brislin et al. summarized, with connections to the ITC guidelines (e.g., Hambleton & de Jong, 2003; van de Vijver & Hambleton, 1996). Additional methods to enhance equivalence of translated scales are also mentioned.

Translation. When translating an instrument, bilingual persons who speak both the original and the target language should be employed. Either a single person or a committee of translators can be used (Brislin et al., 1973). In contrast to employing only a single person, the committee approach requires two or more persons to perform the translation independently. The translations are then compared, sometimes with another person, until agreement is reached on an optimal translation. The advantage of the committee approach, recommended in the ITC guidelines (van de Vijver & Hambleton, 1996), over a single translator is the possible reduction in the bias and misconceptions of any one person. In addition to being knowledgeable about the target language of the translation, test translators need to be familiar with the target culture, the construct being assessed, and the principles of assessment (Hambleton & de Jong, 2003; van de Vijver & Hambleton, 1996). Being knowledgeable about such topics minimizes item biases (e.g., in an achievement test, an item in one culture may give away more information than the same item in another culture) that may result from literal translations.

Back translation. In this procedure, the translated or target version of the measure is independently translated back to the original language by different person(s) than the one(s) performing the translation to the target language. If more than one person is involved in the back translation, together they decide on the best back-translated version of the scale, which is then compared to the original same-language version for linguistic equivalence. Back translation not only provides the researcher with some control over the end result of the translated instrument in cases where he or she does not know the target language (e.g., Brislin et al., 1973; Werner & Campbell, 1970), it also allows for further refinement of the translated version to ensure equivalence of the measures. If the two same-language versions of the scale (i.e., the original and the back-translated versions) are not identical, the researcher, in cooperation with the translation committee, works on the translations until equivalence is reached. Here, the items requiring a changed translation may be subject to back translation again. Oftentimes in this procedure, only the translated version is changed to be equivalent to the original-language version, which remains unchanged.



At other times, the original-language version of the scale is also changed to ensure equivalence, a process known as decentering (Brislin et al., 1973). Adequate back translation does not guarantee a good translation of a scale, as this procedure often leads to literal translation at the cost of the readability and naturalness of the translated version. To minimize this, a team of back translators with combined expertise in psychology and linguistics may be used (van de Vijver & Hambleton, 1996). It is also important to note that, in addition to the test items, test instructions need to go through a thorough translation/back-translation process.

Decentering. This method was first introduced by Werner and Campbell (1970) and refers to a translation/back-translation process in which both the source (the original instrument's language) and the target language versions are considered equally important and both are open to modification. Decentering may be needed when words in the original language have no equivalent in the target language. If the aim is collecting data in both the original and the target culture, items in the original instrument are changed to ensure maximum equivalence (cf. Brislin, 1970, on the translation of the Marlowe-Crowne [Crowne & Marlowe, 1960] Social Desirability Scale). Thus, the back-translated version of the original instrument is used for data collection instead of the original version, as it is considered most likely to be equivalent to the translated version (Brislin, 1986). When this outcome is selected and when researchers worry that changes in the original language may lead to a lack of comparability with previous studies using the original instrument, Brislin (1986) suggested collecting data using both the decentered and the original version of the instrument on a sample speaking the original language. The participants may see half of the original items and half of the revised items in a counterbalanced order. Statistical analysis can indicate whether different conclusions would be reached based on responses to the original versus the revised items (see Brislin, 1970).

Pretests. Following the translation and back translation of an instrument, which provide judgmental evidence about the equivalence of the original and translated versions, several pretest measures can be used to evaluate the equivalence of the instruments in regard to the meaning conveyed by the items. One approach is to administer the original and the translated versions of the instrument to bilingual persons (Brislin et al., 1973; van de Vijver & Hambleton, 1996). Following the administration of the instruments, item responses can be compared using statistical methods (e.g., t test). If item differences are discovered between versions of the instrument, the translations are reviewed and changed accordingly.
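The bilingual pretest comparison just described can be sketched as below. This is an illustration, not the procedure any particular study used; the function name is hypothetical:

```python
import numpy as np
from scipy import stats

def flag_nonequivalent_items(original, translated, alpha=0.05):
    """Paired t test per item on bilingual respondents' answers to the
    original- and target-language versions of a scale.

    original, translated: (n_bilinguals, n_items) arrays with items
    in the same order in both versions. Returns indices of items whose
    mean responses differ significantly between language versions.
    """
    orig = np.asarray(original, dtype=float)
    trans = np.asarray(translated, dtype=float)
    flagged = []
    for j in range(orig.shape[1]):
        _, p = stats.ttest_rel(orig[:, j], trans[:, j])
        if p < alpha:
            flagged.append(j)
    return flagged
```

Items flagged this way would be retranslated and retested; with many items, a multiple-comparison correction (e.g., a Bonferroni-adjusted alpha) would be advisable.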


Sometimes bilingual individuals are used in lieu of performing back translations (Brislin et al., 1973). In this case, the translated and original versions of the instrument are administered to bilingual persons. The bilingual persons may be randomly assigned to two groups that receive half of the questions in the original language and the other half in the target language. The translated items eliciting responses different from those elicited by the same original items are then refined until the responses to the original and the translated items are comparable. Items not yielding comparable responses despite revisions are discarded. If items yield comparable results, the two versions of the instrument are considered equivalent. Additionally, a small group of bilingual individuals can be employed to rate each item from the original and translated versions of the instrument on a predetermined scale in regard to the similarity of meaning conveyed by the item. Problematic items are then refined until deemed satisfactory (e.g., Hambleton, 2001). A small sample of participants (e.g., N = 10) can also be employed to pretest a translated measure that has gone through the translation/back-translation iteration. Here, participants are instructed to provide verbal or written feedback about each item of the scale. For example, Brislin et al. (1973) noted two methods: the random probe and the rating of items. In the random probe method, the researcher randomly selects items from a scale and asks probing questions about an item, such as "What do you mean?" Persons' responses to the probes are then examined. Responses considered bizarre or unfitting for an item are scrutinized, and the translation of the item is changed. This method provides insight into how well the meaning of the original items has fared in the translation. In the rating method, respondents are asked to rate their perceptions of item clarity and appropriateness on a predetermined scale. Items that are unclear or ill fitting based on these ratings are reworded. Finally, a focus group approach can be used (e.g., Ægisdóttir, Gerstein, & Gridley, 2000), in which a small group of participants responds to the translated version and then discusses with the researcher(s) the meaning the participants associated with the items. Participants also share their perceptions of the clarity and cultural appropriateness of the items. Item wording is then changed based on responses from the focus group members.

Statistical Assessment of the Translated Measure

In addition to pretesting a translated scale and gathering judgmental evidence about a scale's equivalence, researchers need to provide further evidence of the measure's equivalence to the original instrument. As stated earlier, item analyses and Cronbach's alpha suggest equivalence and lack of bias. Furthermore, exploratory and confirmatory factor analyses of the measure's factor structure can contribute information about construct equivalence. Multidimensional scaling and cluster analysis can be used to explore construct equivalence as well. These techniques indicate equivalence at the instrument level, more specifically, the similarities and differences of the hypothesized construct underlying the instrument across the different language versions. Similar to Brislin et al.'s (1973) suggestions mentioned earlier, Mallinckrodt and Wang (2004) proposed a method they termed the dual-language split-half (DLSH) to evaluate equivalence. In this procedure, alternate forms of a translated measure, each composed of one half of the items in the original language and one half of the items in the target language, are administered to bilingual persons in a counterbalanced order of languages. Equivalence between the two language versions of the instruments is determined by the lack of significant differences between mean scores on the original and translated versions of the measures, by split-half correlations between clusters of items in the original and the target language, and by the internal consistency reliability and test-retest reliability of the dual-language form of the measures. These coefficients are compared to results from the original-language version of the instrument. Also inherent in this approach is the collection of evidence for convergent validity for each language version. Finally, and as mentioned earlier, to provide further evidence of the measure's equivalence to the original measure, analyses at the item level (item bias analysis; van de Vijver & Hambleton, 1996), such as ANOVA and IRT to examine DIF, can be applied to determine scalar equivalence (cf. van de Vijver & Leung, 1997). MCFA and MACS invariance analyses can be employed for this purpose as well.
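A rough sketch of the kinds of statistics the DLSH procedure relies on (mean differences between language halves and cross-language split-half correlations) follows. It is a simplified illustration, assuming each bilingual respondent completed one dual-language form, and is not Mallinckrodt and Wang's actual computation:

```python
import numpy as np
from scipy import stats

def dlsh_summary(original_half, target_half):
    """DLSH-style equivalence checks for bilingual respondents.

    original_half, target_half: (n_persons, n_items_per_half) arrays of
    responses to the two language halves of one dual-language form.
    Returns the paired t test on half scores (scalar-level mean
    difference) and the Spearman-Brown-corrected correlation between
    the two language half scores (cross-language split-half r).
    """
    o_sum = np.asarray(original_half, dtype=float).sum(axis=1)
    t_sum = np.asarray(target_half, dtype=float).sum(axis=1)
    t_stat, p = stats.ttest_rel(o_sum, t_sum)  # mean-difference test
    r = np.corrcoef(o_sum, t_sum)[0, 1]        # split-half correlation
    r_sb = 2 * r / (1 + r)                     # Spearman-Brown correction
    return {"t": t_stat, "p": p, "split_half_r": r, "corrected_r": r_sb}
```

In a full DLSH analysis, these coefficients would then be compared against internal consistency, test-retest, and convergent validity results from the original-language version, as described above.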

CONTENT ANALYSIS OF TRANSLATION METHODS IN SELECT COUNSELING JOURNALS

Another purpose of this article is to examine, analyze, and evaluate translation practices employed in five prominent counseling journals thought to publish a greater number of articles on international topics than other counseling periodicals. This purpose was pursued to determine whether counseling researchers have, in fact, followed the translation procedures suggested by Brislin (1986) and Brislin et al. (1973) and in the ITC guidelines (e.g., van de Vijver & Hambleton, 1996). We also examined the methods used to control for bias and increase equivalence. While this was not the primary purpose of this article, the results of our investigation might help illustrate counseling researchers' use of preferred translation principles mentioned in the cross-cultural literature. It was also assumed that results obtained from this type of investigation could help identify further recommendations to assist counseling researchers when conducting cross-cultural studies and when reporting results of such projects in the scholarly literature.


METHOD

Sample

The sample consisted of published studies employing translated instruments in their data collection. To be included in this project, an integral part of the study's methodology had to be a translation of one or more entire instruments or some subset of items from an instrument. Furthermore, the target instrument could not have been translated or evaluated in the same way in earlier studies. Additionally, the included studies had to either compare responses from persons from more than one culture (nationality) or investigate a psychological concept using a non-U.S. or non-English-speaking sample of participants. Studies for this investigation were sampled from five counseling journals (Journal of Counseling Psychology [JCP], Journal of Counseling and Development [JCD], Journal of Multicultural Counseling and Development [JMCD], Measurement and Evaluation in Counseling and Development [MECD], and The Counseling Psychologist [TCP]) thought to publish articles relevant to non-English-speaking cultures, ethnic groups, and/or countries. To assess more recent trends in the literature, only articles published between the years 2000 and 2005 were included in our sample. We assumed recent studies (i.e., studies published since 2000) would provide a good representation of current translation and verification practices employed by counseling researchers. From 2000 to 2005, a total of 615 empirical articles were published in the targeted journals. Of these articles, 15 included translation as a part of their methodology. Therefore, 2.4% of the empirical articles published in these five counseling journals incorporated a translation process.
Procedure

The 15 identified studies were coded by (a) publication source (e.g., TCP), (b) year of publication (e.g., 2001), (c) construct investigated and name of the scale translated, (d) translation methodology used (single person, committee, bilinguals), (e) whether the translated version of the scale was pilot tested (yes or no) before the main data collection, (f) number of participants used for pilot testing, (g) psychometric properties reported and statistics used to evaluate the translated measure's equivalence to the original scale, and (h) number of participants from which the psychometric data were gathered. Two of the current authors coded information from the articles independently. If disagreements arose in the coding (e.g., relevant psychometrics for equivalence evaluation), these were resolved through consensus agreement between the coders.
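The two-coder procedure above resolved disagreements by consensus. As an illustration only (the article reports no agreement statistic), intercoder agreement before consensus could be quantified with Cohen's kappa:

```python
import numpy as np

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders' categorical codes of the same units."""
    c1, c2 = list(coder1), list(coder2)
    cats = sorted(set(c1) | set(c2))
    idx = {c: i for i, c in enumerate(cats)}
    table = np.zeros((len(cats), len(cats)))
    for a, b in zip(c1, c2):
        table[idx[a], idx[b]] += 1          # cross-tabulate the two coders
    n = table.sum()
    p_obs = np.trace(table) / n             # observed agreement
    p_exp = (table.sum(axis=1) * table.sum(axis=0)).sum() / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)
```

Kappa corrects raw percentage agreement for chance, so it is a more conservative index of coding reliability than simple agreement counts.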

TABLE 1: Studies Involving Translation of Instruments

Target-language studies

1. Shin, Berkson, & Crittenden (2000); JMCD
Constructs: Psychological help-seeking attitudes; traditional values
Sample: Immigrants from Korea
Instruments: Six items from the Attitudes Toward Seeking Professional Psychological Help scale (ATSPPH); Acculturation Attitude Scale (AAS; prior translation); vignettes developed in English
Translation: English to Korean; committee; back translation: yes; pretest: no
Psychometrics reported: ATSPPH: factor analysis; AAS: Cronbach's alpha (N = 110 Korean immigrants in the U.S.)

2. Engels, Finkenauer, Meeus, & Dekovic (2001); JCP
Constructs: Parental attachment; relational competence; self-esteem; depression
Sample: Dutch adolescents
Instruments: Inventory of Parent and Peer Attachment (IPPA); Perceived Competence Scale for Children; Self-Esteem Scale; Depressive Mood List
Translation: English to Dutch; committee (researchers); unclear which instruments were translated in the study; back translation: yes (researchers); pretest: pilot interviews
Psychometrics reported: Cronbach's alpha (N = 412 Dutch adolescents)

3. Chung & Bemak (2002); JCD
Constructs: Anxiety; depression; psychosocial dysfunction symptoms
Sample: Southeastern Asian refugees
Instrument: Health Opinion Survey (interview)
Translation: English to Vietnamese, Khmer, and Laotian; committee; back translation: yes; pretest: no
Psychometrics reported: Exploratory factor analysis for Vietnamese (N = 867), Cambodian (N = 590), and Laotian (N = 723) persons

4. Kasturirangan & Nutt-Williams (2003); JMCD
Constructs: Culture; domestic violence
Sample: Latina women
Instrument: A semistructured interview protocol developed by the researchers (two interviews conducted in English, seven in Spanish)
Translation: English to Spanish; no discussion of translation method; back translation: not reported; pretest: no
Psychometrics reported: English version of the protocol administered to Latina women (n = 3); a Latina professor of foreign language served as an auditor to ensure proper translation of transcripts from Spanish to English (n = 7 Latina women)

5. Asner-Self & Schreiber (2004); MECD
Construct: Attributional style
Sample: Immigrants from Central America
Instrument: The Attributional Style Questionnaire (ASQ)
Translation: English to Spanish; committee; back translation: yes; pretest: pilot interview; no comparison between English and Spanish versions of the protocol prior to data collection
Psychometrics reported: Cronbach's alpha, principal components analysis (N = 89 Central American immigrants in the U.S.)

6. Torres & Rollock (2004); MECD
Construct: Acculturation-related challenges
Sample: Immigrants from Central & South America
Instrument: Cultural Adjustment Difficulties Checklist (CADC)
Translation: English to Spanish; committee; back translation: yes; pretest: no
Psychometrics reported: Cronbach's alpha (N = 86 Hispanic immigrants); 90% of the sample responded to the translated version of the instruments; not reported for the 10% of the sample that responded to the English version; no comparison reported between the two language versions

7. Oh & Neville (2004); TCP
Construct: Korean rape myth acceptance
Sample: Korean college students
Instrument: Illinois Rape Myth Acceptance Scale (IRMAS); 26 items from the IRMAS were translated and included in the preliminary version of the Korean Rape Myth Acceptance Scale (KRMAS)
Translation: English to Korean; single person; back translation: yes; pretest: yes; a focus group (n = 4 South Korean nationals) evaluated each item from the IRMAS and 26 items generated from the Korean literature (all items were in Korean)
Psychometrics reported: Study 1: principal components analysis followed by exploratory factor analysis (N = 348 South Korean college students); Study 2: confirmatory factor analysis, factorial invariance procedure, Cronbach's alpha, and MANOVA to establish criterion validity (N = 547 South Korean nationals); Study 3: test-retest reliability (N = 40 South Korean teachers or school administrators)

8. Asner-Self & Marotta (2005); JCD
Constructs: Depression, anxiety, phobic anxiety; Erikson's eight psychosocial stages
Sample: Immigrants from Central America
Instruments: Brief Symptom Inventory (BSI); Measures of Psychosocial Development (MPD)
Translation: English to Spanish; approach not reported; back translation: yes; pretest: not reported
Psychometrics reported: Not reported; no information about the number of participants responding to the English or Spanish versions of the instruments; volunteers probed about the research experience

9. Wei & Heppner (2005); TCP
Constructs: Clients' perceptions of counselor credibility; working alliance
Sample: Counselor-client dyads in Taiwan
Instruments: Counselor Rating Form-Short Version (CRF-S); Working Alliance Inventory-Short Version (WAI-S)
Translation: English to Mandarin; single person; back translation: yes; pretest: no
Psychometrics reported: Cronbach's alpha, intercorrelations among CRF subscales (construct validity) (N = 31 counselor-client dyads in Taiwan)

Cross-cultural studies

10. Marino, Stuart, & Minas (2000); MECD
Sample: Anglo-Celtic Australians & Vietnamese immigrants to Australia
Instrument: A questionnaire developed in English measuring behavioral and psychological acculturation, and socioeconomic and demographic influences on acculturation
Translation: English to Vietnamese; committee; back translation: yes; pretest: yes (n = 10, Vietnamese version)
Psychometrics reported: Original: Cronbach's alpha (N = 196 Anglo-Celtic Australians); Target: Cronbach's alpha (N = 187 Vietnamese Australians); Vietnamese participants responded to either an English or a Vietnamese version of the instrument; statistical evidence of equivalence between these two language versions was not reported

11. Ægisdóttir & Gerstein (2000); JCD
Constructs: Counseling expectations; Holland's typology
Sample: Icelandic & U.S. college students
Instruments: Expectations About Counseling Questionnaire (EAC-B); Self-Directed Search (SDS)
Translation: English to Icelandic; back translation: yes; pretest: focus group (n = 8, Icelandic version)
Psychometrics reported: Original: Cronbach's alpha (N = 225 U.S. college students); Target: Cronbach's alpha (N = 261 Icelandic college students); covariate analysis (prior counseling experience) used to control for method bias

12. Poasa, Mallinckrodt, & Suzuki (2000); TCP
Construct: Causal attributions
Sample: U.S., American Samoan, & Western Samoan college students
Instrument: Questionnaire of Attribution and Culture (QAC; vignettes with open-ended response probes developed in English)
Translation: English to Samoan; single person; back translation: yes; pretest: English version of the QAC pilot tested, with respondents providing feedback to evaluate equivalence (n = 16)
Psychometrics reported: Original: a team of English-speaking persons (n = 4) independently coded the English-language responses from the QAC and interviews (N = 23); Target: a team of Samoan-speaking persons (n = 3) independently coded the Samoan-language responses from the QAC and interviews (N = 50); no information about whether themes/codes were translated from Samoan to English

13. Tang (2002); JMCD
Construct: Career choice
Sample: Chinese, Chinese American, & Caucasian American college students
Instrument: A questionnaire developed in English for the study to measure influences on career choice
Translation: English to Chinese; single person (researcher); back translation: yes; pretest: no
Psychometrics reported: Original: none reported for Caucasian American (N = 124) and Asian American (N = 131) college students; Target: none reported for Chinese (N = 120) college students

Equivalence studies

14. Chang & Myers (2003); MECD
Construct: Wellness
Sample: Immigrants from Korea
Instrument: The Wellness Evaluation of Lifestyle (WEL)
Translation: English to Korean; single translator, whose translations were edited by the first author; discrepancies resolved between translator and editor upon mutual agreement; back translation: no; pretest: yes (n = 3); bilingual examinees took both the English and the Korean versions; effect size (Cohen's d) of the difference in mean scores between the English and Korean versions
Psychometrics reported: None reported for a larger sample (N not reported)

15. Mallinckrodt & Wang (2004); JCP
Construct: Adult attachment
Sample: International students from Taiwan
Instrument: The Experiences in Close Relationships Scale (ECRS)
Translation: English to Chinese; back translation: yes; used bilinguals (n = 30 Taiwanese international college students) to evaluate equivalence with the DLSH method: within-subjects t test between the two language versions, split-half reliability, Cronbach's alpha, test-retest reliability, and construct validity correlations with a related construct
Psychometrics reported: Original: split-half reliability, Cronbach's alpha (N = 399 U.S. college students)
RESULTS

Table 1 lists results found for each of the 15 studies. Three of the included studies used a structured or semistructured interview protocol. In 3 studies, of which one included a semistructured interview protocol, an English-language instrument was developed and then translated to another language. Furthermore, in 9 studies, one or more preexisting measures (the entire instrument or a subset of items) were translated into a language other than English. In the 15 studies, a range of constructs was examined, including persons' counseling orientations (e.g., help-seeking attitudes, counseling expectations), adjustment (e.g., acculturation), and maladjustment (e.g., psychological stress). A diversity of cultural groups was represented in the 15 studies as well (see Table 1).

Evaluation of Included Studies

Two main criteria were used to evaluate these 15 studies: (a) the translation methodology employed (single person, committee, back translation, pretest), which provides judgmental evidence about the equivalence of the translated measure to the original measure; and (b) whether statistical methods were used to verify equivalence of the translated measure to its original-language version. Because the studies ranged in terms of their purpose and the approaches taken when investigating multicultural groups, and also because these strategies were linked with different opportunities for assessing equivalence and bias, we divided the 15 studies into three categories: target-language, cross-cultural, and equivalence studies. The target-language studies included projects in which only translated versions of measures were investigated. These studies employed either cross-cultural (etic) methodologies or a combination of cultural and cross-cultural methodologies (emic-etic). For these studies, there was no direct comparison made between an original and a translated version of the protocol. The second category of studies used a cross-cultural approach, as they compared two or more groups on a certain construct. Each of these groups received either the original or the translated version of a measure. Finally, the third category of studies was specifically designed to examine equivalence between two language versions of an instrument. These studies we termed equivalence studies.

We identified studies that employed sound versus weak translation methodologies. This task turned out to be difficult, however, because of the scarcity of information reported about the translation processes used. Sometimes the translation procedure was described in only a couple of sentences. In other instances, the translation methodology was discussed in more detail (e.g., number and qualifications of translators and back translators), while in fewer instances, examples were provided of acceptable and unacceptable item translations. Despite these difficulties, and based on the available information, we contrasted relatively sound and weak translation procedures. Translation methods we considered weak did not incorporate any mechanism to evaluate the translation, either judgmental (e.g., back translation, use of bilinguals, pretest) or quantitative (statistical evidence of equivalence). Instead, the protocol was translated into one or more languages without any apparent evaluation of its equivalence to the original-language version. Methodologically sound studies incorporated both judgmental and quantitative methods to assess the validity of the translation. Given these criteria to evaluate the methodological rigor of the translation process employed, we now present the analyses of the 15 identified studies.

Target-language studies. Nine of the 15 studies administered and examined responses from a translated measure without direct comparison to a group responding to an original-language version of the measure (see Table 1). In most of these studies, persons from one cultural group participated. Both quantitative and qualitative methods were employed. These studies relied on preexisting instruments, select items from preexisting instruments, or interview protocols translated into a new language. We also included in this category studies in which a protocol was developed in English and translated into another language. In two studies (4 and 8), few procedures were reported to evaluate the translation and verify the different language forms of the measures used (see Table 1). In these studies, two language versions of a scale were collapsed into one set of responses without evaluating their equivalence.
A stronger design for these studies would ensure judgmental equivalence between the two language versions of the scales. This could have been accomplished by using a committee of translators and independent back translators. A stronger design would have also resulted from incorporating a decentering process when developing the adapted measures and, if appropriate, by statistically assessing equivalence. Thus, we considered these studies weak in terms of their methodological rigor. Sound translation methods incorporate several mechanisms to evaluate a translated version of a protocol. They involve, for instance, a committee approach to translation/back translation, a pretest of the scale, and an evaluation of the instruments psychometric properties relative to the original version. Four studies reported information somewhat consistent with our criteria for sound methodological procedures (3, 5, 7, and 9). The authors, with varying degree of detail, reported using either a single person or a


committee approach to translation, they relied on back translation, and they employed one or more independent experts to evaluate the equivalence of the language forms. They also reported making subsequent changes to the translated version of the instruments they were using. Additionally, in some of these studies, a pretest of the translated protocol was performed, and in all of these projects, the investigators discussed the statistical tests of the measures psychometric properties (see Table 1). The remaining three studies in this category (1, 2, and 6) contained translation methods of moderate quality, in that their quality ranged in between those we considered using relatively weak and strong translation procedures. In fact, the translation process was not fully described. Furthermore, in one instance, the same person performed the translation and the back translation (2), and in another (6), no assessment of equivalence was reported on the two language versions of the scale used before responses were collapsed into one data set. Also, in one study (1), translated items from an existing scale were selected a priori without any quantitative or qualitative (e.g., pretest) assurance these items fit the cultural group to which they were administered. In none of these three studies were the measures pretested before collecting data for the main study. Finally, insufficient information was reported about the translated instruments psychometric properties to evaluate the validity of the measures for the targeted cultural groups. The internal validity of these studies could have been greatly improved had the researchers included some of these procedures in the translation and verification process. Cross-cultural studies. Four of the 15 studies directly compared two or more cultural groups. 
In 3 of these studies, an instrument was developed in English and then translated into another language, whereas in 1 study, a preexisting instrument was translated into another language (see Table 1). In all 4 studies, comparisons were made between language groups relying on two language versions of the same instrument. None of these four studies employed a particularly weak translation methodology, and three of the four (11, 12, and 13) used relatively rigorous methods. In these three studies, the scales were pretested following the translation/back-translation process, providing judgmental evidence of equivalence. Additionally, in the two quantitative studies (10 and 11), the researchers compared Cronbach's alphas between language versions. Finally, in one study (11), equivalence was further determined by employing covariate analysis to control for method bias (different experiences of participants across cultures) in support of scalar equivalence. None of these approaches to examining and ensuring equivalence was reported in the Tang (2002) study. As a result, we concluded that this study used the least valid



approach. It is noteworthy that all four studies in this category failed to assess the factor structure of the different language versions of the measures, and as such, they did not provide additional support for construct equivalence. Similarly, none of these studies assessed item bias or performed any detailed analyses to verify scalar equivalence. Employing these additional analyses would have greatly enhanced the validity of the reported cross-cultural comparisons in these four studies.

Equivalence studies. Two of the 15 studies were treated as separate cases, as they were specifically designed to demonstrate and evaluate equivalence between two language versions of a scale (see Table 1). Therefore, we did not evaluate them the same way as the other 13 studies. Instead, they serve as examples of how to enhance the cross-cultural validity of translated and adapted scales. We concluded that Mallinckrodt and Wang's (2004) approach to determining construct equivalence between language versions of a measure was significantly more rigorous than the one presented by Chang and Myers (2003). As can be seen from Table 1, Chang and Myers (2003) employed three bilingual persons in lieu of back translation. In their approach, bilingual persons' average scale scores on the two versions of a scale were compared. Mallinckrodt and Wang (2004), in contrast, used both back translation and bilingual individuals to demonstrate and ensure equivalence. Their method subsumed the method employed by Chang and Myers. Following a back translation of an instrument, Mallinckrodt and Wang used a quantitative methodology, the DLSH, to assess equivalence between two language versions of a scale (see discussion earlier). In brief, with this approach, responses from bilingual individuals receiving half of the items in each language were compared to a criterion sample of persons responding to the original version of the scale.
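A minimal numerical sketch of the statistics such a comparison contrasts is shown below. The data, variable names, and simulation are our own hypothetical illustration, not Mallinckrodt and Wang's materials; similar mean scores and reliabilities across the criterion and bilingual groups would be read as (partial) evidence of equivalence.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items response matrix."""
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1).sum()
    total_variance = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

def simulate_scale(n, k, rng):
    """Simulate n respondents on a k-item, 5-point scale driven by one trait."""
    latent = rng.normal(size=(n, 1))
    noise = rng.normal(scale=0.8, size=(n, k))
    return np.clip(np.round(3 + latent + noise), 1, 5)

rng = np.random.default_rng(2024)
criterion = simulate_scale(120, 10, rng)  # original-language criterion sample
bilingual = simulate_scale(40, 10, rng)   # bilinguals, half the items per language

# Compare the statistics the split-half design contrasts across the two groups
for label, data in (("criterion", criterion), ("bilingual", bilingual)):
    print(label, "mean score =", round(data.sum(axis=1).mean(), 2),
          "alpha =", round(cronbach_alpha(data), 2))
```

In practice, construct validity correlations with external criteria would be compared in the same fashion.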
By comparing average scale scores, reliability coefficients, and construct validity correlations, the researchers were able to examine the equivalence (construct and, to some degree, scalar equivalence) between the two language versions of the instrument.

Interpretation of Results

The current results are consistent with Mallinckrodt and Wang (2004), who discovered in their review of articles published in two counseling journals (JCP and TCP) that few studies in counseling psychology have investigated multilingual or international groups or employed translation methods. Additionally, consistent with these investigators, we found that, in many instances, counseling researchers used inadequate procedures to verify equivalence between language versions of an instrument. For example, our analyses


indicated that just over half of the 15 studies employed a committee of translators. A committee is highly recommended in the ITC guidelines (van de Vijver & Hambleton, 1996). We also discovered that in fewer than half of the 15 studies the measurement devices were pretested, and that in slightly more than half of the studies, the researchers used quantitative methods to further demonstrate equivalence. Furthermore, only 1 study systematically controlled for method bias, while none of the 15 studies assessed for item bias. All of these procedures are recommended in the ITC guidelines. On a positive note, however, all but 2 studies used a back-translation procedure to enhance equivalence. Taken together, these results are disquieting and lead us to call for more rigorous research designs when studying culture, when using and evaluating translated instruments, and when performing cross-cultural comparisons. Additionally, we found that, in many cases, limited attention was paid to discussing translation methods. Hambleton (2001) also observed this trend. Not knowing the reason for this lack of effort, we can only speculate about why methods of translation were not described in more detail. One reason could be the lack of importance placed on this methodological feature of a research design. Another may relate to an author's desire to comply with journal page limitations. A third reason could be a researcher's failure to recognize the importance of reporting details about methods of translation. Finally, it is conceivable that researchers assume others are aware of common methods of translation and thus do not discuss the methods they use in much detail. Whatever the reasons, consistent with the ITC guidelines, we strongly suggest investigators provide detailed information about the methods they employ when translating and validating instruments used in research.
This is especially important, as an inappropriate translation of a measure poses a serious threat to a study's internal validity, may contribute to bias, and, in international comparisons, may limit the level of equivalence between multilingual versions of a measure. As a threat to internal validity, a poorly translated instrument may act as a strong rival hypothesis for obtained results.
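The item bias analyses absent from the reviewed studies are commonly conducted with differential item functioning (DIF) statistics such as the Mantel-Haenszel procedure. The sketch below is our own hedged illustration with hypothetical data (it is not drawn from any of the reviewed studies): it estimates the common odds ratio for one dichotomously scored item across score strata.

```python
import numpy as np

def mantel_haenszel_odds_ratio(item, group, matching):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item     : 0/1 responses to the studied item
    group    : 0 = reference group, 1 = focal group
    matching : stratifying criterion (e.g., rest score on the scale)

    A ratio near 1.0 suggests no uniform DIF; large departures flag
    the item for content review.
    """
    item, group, matching = (np.asarray(a) for a in (item, group, matching))
    num = den = 0.0
    for s in np.unique(matching):
        m = matching == s
        a = np.sum((group[m] == 0) & (item[m] == 1))  # reference, endorsed
        b = np.sum((group[m] == 0) & (item[m] == 0))  # reference, not endorsed
        c = np.sum((group[m] == 1) & (item[m] == 1))  # focal, endorsed
        d = np.sum((group[m] == 1) & (item[m] == 0))  # focal, not endorsed
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return num / den if den > 0 else float("nan")

# Hypothetical data: endorsement rates are identical across groups
# within each stratum, so the odds ratio is 1.0 (no uniform DIF)
item = np.array([1, 0, 1, 0, 1, 1, 0, 1, 1, 0])
group = np.array([0, 0, 1, 1, 0, 0, 0, 1, 1, 1])
rest = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
print(mantel_haenszel_odds_ratio(item, group, rest))
```

Real applications would use far larger samples per stratum and pair the odds ratio with a chi-square significance test.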

RECOMMENDATIONS

Translation Practices

Several steps are essential for a valid translation. Based on our review of common translation methods, on that of Brislin and colleagues (Brislin, 1986; Brislin et al., 1973), and on the ITC guidelines (e.g., Hambleton, 2001; van de Vijver & Hambleton, 1996), the best translation procedure involves several steps as


TABLE 2: Summary of Recommended Translation Practices


1. Independent translation by two or more persons familiar with the target language and culture and with the intent of the scale
2. Documentation of comparisons of translations and agreement on the best translation
3. Rewriting of translated items to fit the grammatical structure of the target language
4. Independent back translation of the translated measure into the original language (one or more persons)
5. Comparison of the original and back-translated versions, focusing on appropriateness, clarity, and meaning (e.g., use rating scales)
6. Changes to the translated measure based on the prior comparison; changed items go through the translation/back-translation iteration until satisfactory
7. If concepts or ideas do not translate well, deciding which version of the original scale should be used for cross-cultural comparison (original, back translated, or decentered)
8. Pretest of the translated instrument on an independent sample (bilinguals or target-language group); check for clarity, appropriateness, and meaning
9. Assessment of the scale's reliability and validity, absence of bias, and equivalence to the original-language version of the scale

outlined in Table 2. All but the last step in this table help to minimize item and construct bias and therefore may increase scalar equivalence between language versions of a measure (ITC development guidelines). The last step refers to verifying the cross-cultural validity of measures (i.e., absence of bias and equivalence; ITC interpretation guidelines).

Combining Emic and Etic Approaches

As stated previously, the cross-cultural approach to studying cultural influences on behavior has limitations. One risk involves assuming universal laws of behavior and neglecting an in-depth understanding of cultures and their influences on behavior (e.g., imposed etics). To address this problem, and in line with the suggestions reviewed earlier, we offer several recommendations for counseling psychologists involved in international research. First, collaboration among scholars worldwide and across disciplines is suggested to enhance the quality of cross-cultural studies and the validity of methods and findings. Such collaboration increases the possibility that unique cultural variables will be incorporated into the research and that potential threats to internal and external validity will be reduced. Second, to avoid potential method bias, an integration of quantitative and qualitative methods should be considered, especially when one type of method may be more appropriate and relevant to a particular culture. A convergence of results from both methods


enhances the validity of the findings. Third, when method bias is not expected but there is potential for construct bias, and the use of a preexisting measure is considered feasible, researchers should consider collecting emic items to include in the instrument when studying an etic construct (e.g., Brislin, 1976; Oh & Neville, 2004). This approach will enhance construct equivalence by limiting construct bias and will provide culture-specific information to aid theory development. Fourth, when emic scales are available in the cultures of interest to assess an etic construct and cross-cultural comparisons are sought, the convergence approach should be considered. With this approach, all instruments are translated and administered to each cultural group. Then, items and scales shared across cultures are used for cross-cultural comparisons, whereas nonshared items provide information about the unique aspects of the construct in each culture (e.g., van de Vijver, 1998). This approach will enhance construct equivalence, may deepen the current understanding of cultural and cross-cultural dimensions of a construct, and may aid theory development. Finally, Triandis's (1972, 1976) suggestion can be considered. With this procedure, instruments are simultaneously assembled in each culture to measure the etic construct (e.g., subjective well-being). With this approach, most or all types of bias can be minimized and equivalence enhanced, as no predetermined stimuli are used. Factor analyses can then be performed to identify etic constructs for cross-cultural comparisons.
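After factoring a scale separately in each culture, the similarity of loading patterns is often indexed with Tucker's congruence coefficient (a standard tool in this literature; see van de Vijver & Leung, 1997). A minimal sketch with hypothetical loadings (the numbers below are illustrative only):

```python
import numpy as np

def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two factor-loading vectors.

    Values of roughly .95 or higher are conventionally read as evidence
    of factorial (construct) equivalence across groups.
    """
    a = np.asarray(loadings_a, dtype=float)
    b = np.asarray(loadings_b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Hypothetical loadings for the same five items factored in two cultures
culture_1 = [0.71, 0.65, 0.80, 0.58, 0.69]
culture_2 = [0.68, 0.70, 0.77, 0.55, 0.72]
print(round(tucker_phi(culture_1, culture_2), 3))
```

Multigroup confirmatory factor analysis (e.g., Byrne, 2004) provides a stronger, model-based test of the same question.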

CONCLUSION

Given our profession's increased interest in international topics, there is a critical need to address the methodological challenges unique to this area. We discussed important challenges such as translation, equivalence, and bias. Proper translation methods may strengthen the equivalence of constructs across cultures, as a focus on instrumentation can minimize item bias and some method bias. Consequently, construct equivalence may be enhanced. Merely targeting an instrument's translation, however, is not sufficient. Other factors to consider when making cross-cultural comparisons are evidence of construct and scalar equivalence and the absence of construct, item, and method bias. The implications of well-designed cross-cultural research are many. Obviously, establishing the generalizability of theories and counseling approaches across cultures is critical. Without strong cross-cultural methodology, erroneous conclusions can be drawn about similarities and differences between cultural groups, both in research and when using counseling and assessment strategies. One should not, for instance, employ psychodynamic approaches when working with persons from a cultural group expecting



solution-focused interventions in line with their cultural norms. Similarly, one should not assume an instrument developed in one culture is appropriate to use and will yield valid findings in another cultural group. Counseling psychologists should not only demonstrate cultural awareness, knowledge, and skills to deliver competent mental health services (American Psychological Association, 2003; Arredondo et al., 1996); they should also display this talent in cross-cultural research. Understanding methods of sound translation and procedures for reducing bias and enhancing the validity of cross-cultural findings is essential for the informed scientist-professional. To deliver culturally appropriate and effective services, counseling psychologists must generate and rely on valid cross-cultural studies. Additionally, we should collaborate with professionals worldwide. The science and practice of cross-cultural counseling psychology would be strengthened through this effort. More important, there would be a greater likelihood that the various paradigms of cross-cultural counseling psychology would be appropriate to the culture, context, and population being studied and/or served. Ultimately, such paradigms can contribute to the preservation of different cultures worldwide and enhance individuals' quality of life.

REFERENCES

Adamopoulos, J., & Lonner, W. J. (2001). Culture and psychology at a crossroad: Historical perspective and theoretical analysis. In D. Matsumoto (Ed.), The handbook of culture and psychology (pp. 11-34). New York: Oxford University Press.
American Psychological Association. (2003). Guidelines on multicultural education, training, research, practice, and organizational change for psychologists. American Psychologist, 58, 377-402.
Arredondo, P., Toporek, R., Brown, S. P., Jones, J., Locke, D. C., Sanchez, J., et al. (1996). Operationalization of the multicultural counseling competencies. Journal of Multicultural Counseling and Development, 24, 42-78.
Asner-Self, K. K., & Marotta, S. A. (2005). Developmental indices among Central American immigrants exposed to war-related trauma: Clinical implications for counselors. Journal of Counseling and Development, 83, 162-171.
Asner-Self, K. K., & Schreiber, J. B. (2004). A factor analytic study of the attributional style questionnaire with Central American immigrants. Measurement and Evaluation in Counseling and Development, 37, 144-153.
Berry, J. W. (1969). On cross-cultural comparability. International Journal of Psychology, 4, 119-128.
Berry, J. W. (1989). Imposed etics-emics-derived etics: The operationalization of a compelling idea. International Journal of Psychology, 26, 721-735.
Betz, N. E. (2005). Enhancing research productivity in counseling psychology: Reactions to three perspectives. The Counseling Psychologist, 33, 358-366.
Brislin, R. W. (1970). Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1, 185-216.
Brislin, R. W. (1976). Comparative research methodology: Cross-cultural studies. International Journal of Psychology, 11, 213-229.


Brislin, R. W. (1983). Cross-cultural research in psychology. Annual Review of Psychology, 34, 363-400.
Brislin, R. W. (1986). The wording and translation of research instruments. In W. J. Lonner & J. W. Berry (Eds.), Field methods in cross-cultural research (pp. 137-164). Beverly Hills, CA: Sage.
Brislin, R. W., Lonner, W. J., & Thorndike, R. M. (1973). Cross-cultural research methods. New York: John Wiley.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (2001). MMPI-2 (Minnesota Multiphasic Personality Inventory-2): Manual for administration and scoring (Rev. ed.). Minneapolis: University of Minnesota Press.
Byrne, B. M. (2004). Testing for multigroup invariance using AMOS graphics: A road less traveled. Structural Equation Modeling: A Multidisciplinary Journal, 11, 272-300.
Chang, C. Y., & Myers, J. E. (2003). Cultural adaptation of the Wellness Evaluation of Lifestyle: An assessment challenge. Measurement and Evaluation in Counseling and Development, 35, 239-250.
Cheung, G. W., & Rensvold, R. B. (2000). Assessing extreme and acquiescence response sets in cross-cultural research using structural equation modeling. Journal of Cross-Cultural Psychology, 31, 188-213.
Chung, R. C., & Bemak, F. (2002). Revisiting the California Southeast Asian mental health needs assessment data: An examination of refugee ethnic and gender differences. Journal of Counseling and Development, 80, 111-119.
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349-354.
Douce, L. A. (2004). Globalization of counseling psychology. The Counseling Psychologist, 32, 142-152.
Engels, R. C. M. E., Finkenauer, C., Meeus, W., & Dekovic, M. (2001). Parental attachment and adolescents' emotional adjustment: The associations with social skills and relational competence. Journal of Counseling Psychology, 48, 428-439.
Fouad, N. A. (1991). Training counselors to counsel international students. The Counseling Psychologist, 19, 66-71.
Gerstein, L. H. (2005). Counseling psychologists as international social architects. In R. L. Toporek, L. H. Gerstein, N. A. Fouad, G. Roysircar-Sodowsky, & T. Israel (Eds.), Handbook for social justice in counseling psychology: Leadership, vision, and action (pp. 377-387). Thousand Oaks, CA: Sage.
Gerstein, L. H., & Ægisdóttir, S. (Eds.). (2005a). Counseling around the world [Special issue]. Journal of Mental Health Counseling, 27, 95-184.
Gerstein, L. H., & Ægisdóttir, S. (Eds.). (2005b). Counseling outside of the United States: Looking in and reaching out [Special section]. Journal of Mental Health Counseling, 27, 221-281.
Gerstein, L. H., & Ægisdóttir, S. (2005c). A trip around the world: A counseling travelogue! Journal of Mental Health Counseling, 27, 95-103.
Greene, R. L. (2000). The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon.
Hambleton, R. K. (2001). The next generation of the ITC test translation and adaptation guidelines. European Journal of Psychological Assessment, 17, 164-172.
Hambleton, R. K., & de Jong, J. H. A. L. (2003). Advances in translating and adapting educational and psychological tests. Language Testing, 20, 127-134.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Dordrecht, Netherlands: Kluwer.
Heppner, P. P. (2006). The benefits and challenges of becoming cross-culturally competent counseling psychologists: Presidential address. The Counseling Psychologist, 34, 147-172.
Ho, D. Y. F. (1996). Filial piety and its psychological consequences. In M. H. Bond (Ed.), Handbook of Chinese psychology (pp. 155-165). Hong Kong: Oxford University Press.



Jahoda, G. (1966). Geometric illusions and environment: A study in Ghana. British Journal of Psychology, 57, 193-199.
Johnson, T., Kulesa, P., Cho, Y. I., & Shavitt, S. (2005). The relation between culture and response styles: Evidence from 19 countries. Journal of Cross-Cultural Psychology, 36, 264-277.
Johnson, T. P. (1998). Approaches to equivalence in cross-cultural and cross-national survey research. In ZUMA (Zentrum für Umfragen, Methoden und Analysen)-Nachrichten Spezial Band 3: Cross-cultural survey equivalence (pp. 1-40). Retrieved from http://www.gesis.org/Publikationen/Zeitschriften/ZUMA_Nachrichten_spezial/zn-sp-3-inhalt.htm
Kasturirangan, A., & Nutt-Williams, E. (2003). Counseling Latina battered women: A qualitative study of the Latina perspective. Journal of Multicultural Counseling and Development, 31, 162-178.
Kim, U. (2001). Culture, science, and indigenous psychologies. In D. Matsumoto (Ed.), The handbook of culture and psychology (pp. 51-76). New York: Oxford University Press.
Leong, F. T. L., & Blustein, D. L. (2000). Toward a global vision of counseling psychology. The Counseling Psychologist, 28, 5-9.
Leong, F. T. L., & Ponterotto, J. G. (2003). A proposal for internationalizing counseling psychology in the United States: Rationale, recommendations, and challenges. The Counseling Psychologist, 31, 381-395.
Leung, S. A. (2003). A journey worth traveling: Globalization of counseling psychology. The Counseling Psychologist, 31, 412-419.
Little, T. D. (1997). Mean and covariance structures (MACS) analyses of cross-cultural data: Practical and theoretical issues. Multivariate Behavioral Research, 32, 53-76.
Little, T. D. (2000). On the comparability of constructs in cross-cultural research: A critique of Cheung and Rensvold. Journal of Cross-Cultural Psychology, 31, 213-219.
Lonner, W. J. (1985). Issues in testing and assessment in cross-cultural counseling. The Counseling Psychologist, 13, 599-614.
Lonner, W. J., & Berry, J. W. (Eds.). (1986). Field methods in cross-cultural research. Beverly Hills, CA: Sage.
Mallinckrodt, B., & Wang, C.-C. (2004). Quantitative methods for verifying semantic equivalence of translated research instruments: A Chinese version of the Experiences in Close Relationships Scale. Journal of Counseling Psychology, 51, 368-379.
Malpass, R. S., & Poortinga, Y. H. (1986). Strategies for design and analysis. In W. J. Lonner & J. W. Berry (Eds.), Cross-cultural research and methodology series: Vol. 8. Field methods in cross-cultural research (pp. 47-83). Beverly Hills, CA: Sage.
McCrae, R. R., & Costa, P. T. (1997). Personality trait structure as a human universal. American Psychologist, 52, 509-516.
Oh, E., & Neville, H. (2004). Development and validation of the Korean Rape Myth Acceptance Scale. The Counseling Psychologist, 32, 301-331.
Pedersen, P. B. (1991). Counseling international students. The Counseling Psychologist, 19, 10-58.
Pedersen, P. B. (2003). Culturally biased assumptions in counseling psychology. The Counseling Psychologist, 31, 396-403.
Ployhart, R. E., & Oswald, F. L. (2004). Applications of mean and covariance structure analysis: Integrating correlational and experimental approaches. Organizational Research Methods, 7, 27-65.
Poasa, K. H., Mallinckrodt, B., & Suzuki, L. A. (2000). Causal attributions for problematic family interactions: A qualitative, cultural comparison of Western Samoa, American Samoa, and the United States. The Counseling Psychologist, 28, 32-60.
Ponterotto, J. G., Casas, J. M., Suzuki, L. A., & Alexander, C. M. (Eds.). (1995). Handbook of multicultural counseling (2nd ed.). Thousand Oaks, CA: Sage.
Shin, J. Y., Berkson, G., & Crittenden, K. (2000). Informal and professional support for solving psychological problems among Korean-speaking immigrants. Journal of Multicultural Counseling and Development, 28, 144-159.


Tanaka-Matsumi, J. (2001). Abnormal psychology and culture. In D. Matsumoto (Ed.), The handbook of culture and psychology (pp. 265-286). New York: Oxford University Press.
Tang, M. (2002). A comparison of Asian American, Caucasian American, and Chinese college students: An initial report. Journal of Multicultural Counseling and Development, 30, 124-134.
Torres, L., & Rollock, D. (2004). Acculturative distress among Hispanics: The role of acculturation, coping, and intercultural competence. Journal of Multicultural Counseling and Development, 32, 155-167.
Triandis, H. C. (1972). The analysis of subjective culture. New York: John Wiley.
Triandis, H. C. (1975). Social psychology and cultural analysis. Journal for the Theory of Social Behaviour, 5, 81-106.
Triandis, H. C. (1976). Approaches toward minimizing translation. In R. Brislin (Ed.), Translation: Applications and research (pp. 229-243). New York: Wiley/Halstead.
Triandis, H. C. (1994). Culture and social behavior. New York: McGraw-Hill.
Triandis, H. C. (2000). Dialectics between cultural and cross-cultural psychology. Asian Journal of Social Psychology, 3, 185-195.
United Nations. (2005). Human development indicators. Retrieved from reports/global/2005/pdf/HDR05_HDI.pdf
van de Vijver, F. J. R. (1998). Towards a theory of bias and equivalence. In ZUMA (Zentrum für Umfragen, Methoden und Analysen)-Nachrichten Spezial Band 3: Cross-cultural survey equivalence (pp. 41-65). Retrieved from Zeitschriften/ZUMA_Nachrichten_spezial/zn-sp-3-inhalt.htm
van de Vijver, F. J. R. (2001). The evolution of cross-cultural research methods. In D. Matsumoto (Ed.), Handbook of culture and psychology (pp. 77-97). New York: Oxford University Press.
van de Vijver, F. J. R., & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1, 89-99.
van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage.
van de Vijver, F. J. R., & Poortinga, Y. H. (1997). Towards an integrated analysis of bias in cross-cultural assessment. European Journal of Psychological Assessment, 13, 29-37.
Van Hemert, D. A., van de Vijver, F. J. R., Poortinga, Y. H., & Georgas, J. (2002). Structural and functional equivalence of the Eysenck Personality Questionnaire within and between countries. Personality and Individual Differences, 33, 1229-1249.
Wei, M., & Heppner, P. P. (2005). Counselor and client predictors of the initial working alliance: A replication and extension to Taiwanese client-counselor dyads. The Counseling Psychologist, 33, 51-71.
Werner, O., & Campbell, D. T. (1970). Translating, working through interpreters, and the problem of decentering. In R. Naroll & R. Cohen (Eds.), A handbook of methods in cultural anthropology (pp. 398-420). New York: American Museum of Natural History.
Weston, R., & Gore, P. A., Jr. (2006). A brief guide to structural equation modeling. The Counseling Psychologist, 34, 719-751.
Ægisdóttir, S., & Gerstein, L. H. (2000). Icelandic and American students' expectations about counseling. Journal of Counseling and Development, 78, 44-53.
Ægisdóttir, S., & Gerstein, L. H. (2005). Reaching out: Mental health delivery outside the box. Journal of Mental Health Counseling, 27, 221-224.
Ægisdóttir, S., Gerstein, L. H., & Gridley, B. E. (2000). The factorial structure of the Expectations About Counseling Questionnaire-Brief Form: Some serious questions. Measurement and Evaluation in Counseling and Development, 33, 3-20.