
INTERNATIONAL JOURNAL OF SELECTION AND ASSESSMENT VOLUME 13 NUMBER 1 MARCH 2005

Self–Other Agreement:
Comparing its Relationship with Performance
in the U.S. and Europe
Leanne Atwater* and David Waldman
Arizona State University

Cheri Ostroff
Columbia University

Chet Robie
Wilfrid Laurier University

Karin M. Johnson
Minneapolis MN

The use of multi-source feedback has proliferated in the United States in recent years;
however, its usefulness in other countries is unknown. Using a large sample of American
managers (n = 3793), this study first replicated earlier studies demonstrating that
simultaneous consideration of self and other ratings of leadership skills is important for
managerial performance ratings. In addition, the impact of self–other agreement on
performance was investigated among 2732 managers in five European countries (U.K.,
Germany, France, Denmark, Italy). Results indicated that the effect of self and other
ratings in the prediction of performance differs between the U.S. and the European
countries in that the simultaneous inclusion of both self and other ratings is generally
less useful in those countries than in the U.S. Further, the effect of self–other agreement
varies among the European countries. Implications for multi-source feedback
interventions as well as multi-national personnel management are discussed.

The authors would like to thank Paul Bly and Chris Carraher from Personnel Decisions International for their assistance with data analyses.

*Address for correspondence: Leanne Atwater, Arizona State University, West Campus, 4701 W. Thunderbird Rd, School of Global Mgt and Leadership, P.O. Box 37100, Phoenix AZ 85069-7100, USA. E-mail: Leanne.atwater@asu.edu

With the advent of multi-source feedback processes, there has been a growing concern about the extent of agreement between an individual’s self-evaluation and evaluations of that individual made by others (e.g., subordinates or peers). Research has demonstrated that self–other agreement is related to individual outcomes such as performance and promotability (McCaulley & Lombardo, 1990; McCall & Lombardo, 1983; Bass & Yammarino, 1991; Atwater, Ostroff, Yammarino, & Fleenor, 1998). However, to our knowledge, self–other agreement and its relationship with outcomes have only been systematically studied in the U.S. Given the vast differences in cultural values and their potential impact on work behavior (Warr, 1987), it would be useful to assess the relevance of self–other agreement in countries outside the U.S. In this study, we examine the relevance of self–other agreement in the U.S. and five European countries.

Self-Awareness, Self–Other Agreement, and Performance

The theoretical rationale underlying self–other agreement and its relationship with performance and other outcomes (e.g., promotions) stems from control theory. Control theory (Carver & Scheier, 1981) proposes that individuals are continuously matching their behavior to goals or standards. In the organizational domain, that goal or standard could be the perceptions others have of the individual. For example, a leader may want to be seen by her supervisor as technically competent. If she recognizes, given the attitudes and behavior of the supervisor toward her (e.g., tasks he assigns her), that the supervisor does not

© Blackwell Publishing Ltd 2005, 9600 Garsington Road, Oxford OX4 2DQ, UK and
350 Main Street, Malden, MA 02148, USA.
26 LEANNE ATWATER, DAVID WALDMAN, CHERI OSTROFF, CHET ROBIE AND KARIN M. JOHNSON

perceive her as technically competent, she may make behavioral changes in an attempt to modify her supervisor’s perceptions. According to control theory, individuals must do three things to bring their behavior in line with goals. First, they must have a goal they are trying to achieve (e.g., positive perception by one’s supervisor that they are technically competent). Second, they must recognize that their behavior, or the individual’s perception, is not in line with that goal. Third, they need to enact behavior in an attempt to meet the goal. In this case, if the leader is self-aware and realizes that her goal to be viewed as technically competent by her boss has not been achieved, she may decide to pursue training opportunities, or to take time to demonstrate her technical competence to her boss. If the leader has a goal to be seen as technically competent, recognizes this has not been achieved and enacts behavior to achieve this goal, the three steps to bring behavior in line with goals have been achieved. The relevance of self–other agreement in this process is that the individual who does not recognize the discrepancy between his/her and others’ perceptions will not see the need to alter behavior to change others’ perceptions. In this case, the leader will continue to get assignments that are not at the level she is capable of achieving. To take another example, perhaps the leader wants his subordinates to perceive him as considerate, recognizing that leaders who are perceived positively have more effective workgroups. If the leader does not recognize that the subordinates do not perceive the leader as considerate, he will make no effort to become more considerate or to change the subordinates’ perceptions. The leader will be less effective because he will not be treating subordinates in a way to maximize their motivation and performance.

Ashford (1989) pointed out that employees need to develop proficiency in observing and evaluating their behavior in a manner that is consistent with how others perceive and evaluate it. Accurate (or in-agreement) self-assessments should be associated with more positive outcomes because they help employees correct mistakes and help them tailor their behavior to the demands of the organization. Smircich and Chesser (1981) suggested that self and other ratings or perceptions that are in agreement are preferable because they indicate a degree of mutual understanding. Self-awareness has also been noted as an important element of emotional intelligence, which, in turn, has been associated with effective leadership (Megerian & Sosik, 1996).

While self-awareness and self–other agreement are desirable, self-enhancement bias is common when individuals are asked to rate themselves (Paulhaus, 1986; Sackheim, 1983; Taylor & Brown, 1988). Thus, self-ratings are often inflated when compared with ratings provided by others (Mabe & West, 1982; Harris & Schaubroeck, 1988). By themselves, high self-ratings may be positive since they may coincide with fewer negative thoughts and strong self-confidence. However, self-diagnosis is influenced by the way in which one views his or herself. Self-perception is a key element in the self-regulation process (Ashford, 1989; Taylor & Brown, 1988). As noted by Bass and Yammarino (1991), high self-ratings when coupled with low ratings from others (i.e., overestimation) can lead to ignoring criticism and discounting failure, which, in turn, would be expected to be associated with poorer performance.

In 1993, Yammarino and Atwater presented a model of self-perception accuracy whereby individuals whose self-ratings were in agreement with others were purported to have superior individual and organizational outcomes, those who overestimated their competencies would have lower outcomes, and under-raters would have a mix of positive and negative outcomes. This model was later expanded to include four categories of agreement: in-agreement good (or high ratings from both self and other); in-agreement poor (or low ratings from both self and other); under-rater; and over-rater (Yammarino & Atwater, 1997). Different outcomes were theorized to be associated with each type of agreement. Outcomes for those with in-agreement good ratings are expected to be the highest because both self and other believe the individual is doing well; over-raters are expected to have the lowest outcomes because problems seen by others are not recognized by the self-rater and therefore not addressed; those with in-agreement poor ratings are also expected to have very low outcomes because the individual, while recognizing weaknesses, has not addressed them; and under-raters are expected to have a mixture of positive and negative outcomes because they are likely doing well and trying hard to improve, but may not accurately recognize their own strengths and weaknesses. To go back to the control theory example, over-raters see no discrepancy between their behavior and goal. In-agreement poor raters see the discrepancy but do not enact behavior to change the discrepancy. In both cases, performance suffers. Interestingly, while not the focus of this paper, 360 (multi-rater) feedback is designed to make over-raters aware of these discrepancies, thereby motivating change in their behavior. Consistent with control theory notions, earlier work has shown that it is the over-raters who demonstrate the most improvement following 360 feedback (Atwater, Roush, & Fischthal, 1995).

The increasing use of 360 feedback processes has led to increased interest in self–other agreement and its outcomes. Feedback interventions such as 360-degree feedback have experienced rapid proliferation in the U.S. (Bracken, Timmreck, & Church, 2001) with estimates that over one-third of U.S. companies are using some type of multi-source feedback process for managers. The goal of these feedback processes is to increase managers’ awareness of performance deficiencies and to give managers information about how others view them, thereby increasing their self-awareness. The presumption is that increased self-awareness, or self–other agreement as it has been operationalized (e.g., Church, 1997), will ultimately benefit performance and organizational outcomes. However, the extent to which self–other



SELF–OTHER AGREEMENT 27

agreement is related to performance is still being debated. Additionally, to our knowledge, while 360-degree feedback is beginning to spread to countries outside the U.S., empirical research on the relevance of self–other agreement outside the U.S. has not been conducted.

There are a variety of individual and contextual factors that impact self and other ratings and ultimately self–other agreement (Atwater & Yammarino, 1992). One such factor may be the cultural context. Given the differences in cultural values and job contexts across cultures (Warr, 1987), the ways in which self and others provide ratings of a manager, the degree of agreement among sources of ratings, and the relationship between self–other agreement and performance could vary across cultures. The primary purpose of the present research was to examine the effects of self–other agreement in a cross-cultural sample. We selected a sample of five European countries where 360-degree feedback was currently being used. This was done for practical reasons (i.e., relevant data could be obtained) and because we believe that organizations using 360-degree feedback have at least some belief that self-awareness (self–other agreement) is advantageous or they would not have adopted a 360 process that included self-ratings. However, we also believe that it is important not to assume that the same phenomena (e.g., self-awareness) have the same importance or relevance across cultures. As has been noted in recent leadership theory (e.g., Dorfman, 1996; House, Hanges, Ruiz-Quintanilla, Dorfman, Javidan, Dickson, & Gupta, 1999), factors that lead to one result in one culture may not necessarily have the same result in another culture. Further, the theory of self–other agreement and its effects has been developed in the U.S., by U.S. researchers using U.S. samples (e.g., Atwater & Yammarino, 1997). Thus, ethnocentric bias is likely to be inherent in the theory and its relevance to other cultures can be questioned.

Given that the self–other agreement literature has developed largely in the U.S., we use the U.S. model and theory as a “benchmark” and explore the extent to which self–other agreement and its relationship with performance holds in other cultures, specifically in the U.K., Germany, Italy, Denmark, and France. We first provide an overview of the U.S. model, detailing the effects of over- and underestimation with regard to self–subordinate and self–peer comparisons. We then explore self–other agreement across cultures using the literature on cultural values to help predict whether self–other agreement is culture specific or culture universal. In so doing, we address the potential impact of cultural differences across countries on self–other agreement and its outcomes.

Self–Other Agreement in the United States

Based on analytical procedures described by Edwards (1994), recent research in the self–other agreement area has shown that the relationship between self-ratings, other ratings (e.g., subordinates and peers), and outcomes should be conceptualized and modeled in three dimensions using polynomial regression procedures (e.g., Atwater et al., 1998; Johnson & Ferstl, 1999). As such, it is important to delineate the functional form of self–other agreement effects, explicating the direction of agreement (in agreement, self greater than other, self less than other) as well as the degree of agreement and level of associated attributes as rated by self and other.

Atwater and her colleagues (Atwater et al., 1998) supported a largely additive model whereby both self and other ratings were related to outcomes (although there were some nonlinear trends in the relationships found as well). Along the line of perfect agreement (i.e., when self-ratings equal other ratings), agreement at higher levels of rated leadership attributes was associated with higher performance than agreement at lower levels of rated attributes. That is, it is not merely agreement that is related to outcomes, but rather whether self and others agree that the leader’s behavior is good or poor. Further, Atwater et al. (1998) reported additional effects for over- vs. underestimation. Lack of agreement when self-ratings were lower than other ratings (underestimator) was related to higher performance than lack of agreement when self-ratings were higher than other ratings (overestimator). That is, to underestimate one’s leadership behaviors is less problematic than to over-rate one’s leadership. This supports the findings of McCall and Lombardo (1983) that arrogance is a factor that derails careers. These results are consistent with prior research (e.g., Atwater & Yammarino, 1997; Atwater et al., 1998; Van Velsor, Taylor, & Leslie, 1993; Yammarino & Atwater, 1993).

Although the functional form described above and the effects of in-agreement, overestimation, and underestimation have been documented in a number of U.S. samples, the extent to which aspects of this model are generalizable across cultures has not been investigated. In the following section, we explore rationales for the generalizability and differences in this model across different cultures.

The Effects of Self–Other Agreement across Countries

As noted above, self–other agreement is generally preferable to disagreement (i.e., under- or overestimation) because it indicates some level of understanding between self and other (Smircich & Chesser, 1981). This will be the case particularly when self and other ratings are in agreement and favorable. At a general level, we propose that overall, self–other agreement will be relevant to performance across cultures because both self-awareness and high ratings should contribute to better performance regardless of the organizational context or cultural norms.

H1a: Across cultures, self–subordinate agreement in ratings will be associated with managerial performance.

H1b: Across cultures, self–peer agreement in ratings will be associated with managerial performance.
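
As a rough illustration of what modeling self-ratings, other ratings, and performance "in three dimensions" can look like, the sketch below fits a quadratic surface of the kind commonly associated with Edwards-style polynomial regression: performance = b0 + b1*S + b2*O + b3*S^2 + b4*S*O + b5*O^2, with S the self-rating and O the other rating. The quadratic form, the centering of ratings, the function names, and all numbers are assumptions made for illustration; they are not taken from this study, and the data are synthetic.

```python
import numpy as np


def design_matrix(self_rating, other_rating):
    """Columns 1, S, O, S^2, S*O, O^2 of an assumed quadratic surface."""
    s = np.asarray(self_rating, dtype=float)
    o = np.asarray(other_rating, dtype=float)
    return np.column_stack([np.ones_like(s), s, o, s**2, s * o, o**2])


def fit_surface(self_rating, other_rating, performance):
    """Ordinary least-squares estimates b0..b5 for the quadratic surface."""
    X = design_matrix(self_rating, other_rating)
    z = np.asarray(performance, dtype=float)
    coefs, *_ = np.linalg.lstsq(X, z, rcond=None)
    return coefs


def surface_features(coefs):
    """Slope and curvature along the agreement line (S = O) and the
    disagreement line (S = -O), for ratings centered at zero."""
    _, b1, b2, b3, b4, b5 = coefs
    return {
        "agreement_slope": b1 + b2,
        "agreement_curvature": b3 + b4 + b5,
        "disagreement_slope": b1 - b2,
        "disagreement_curvature": b3 - b4 + b5,
    }


# Hypothetical, noise-free data on centered rating scales (NOT the study's data).
rng = np.random.default_rng(0)
self_rating = rng.normal(size=200)
other_rating = rng.normal(size=200)
true_coefs = np.array([3.0, 0.1, 0.5, 0.0, 0.1, -0.05])  # invented values
performance = design_matrix(self_rating, other_rating) @ true_coefs

coefs = fit_surface(self_rating, other_rating, performance)
features = surface_features(coefs)
```

Under these invented coefficients, the positive slope along the agreement line mirrors the pattern described above, in which agreement at higher rating levels goes with higher performance, and the negative slope along the disagreement line mirrors overestimators faring worse than underestimators. Real analyses would add noise, significance tests, and any control variables.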


However, despite the general importance of agreement across cultures, the nature of the agreement effect may vary across cultures. Specifically, we are interested in examining whether self–other comparison adds to ratings provided by others differentially across cultures in the prediction of performance. That is, the degree to which over- or underestimation is a problem may vary according to certain cultural dimensions. In some cultures, the negative effects of overestimation may be accentuated. We describe the cultural dimensions that may influence these phenomena below.

Cultural Dimensions, Ratings from Others, and Self–Other Agreement

Cultural typologies can be complex and multi-faceted. Hofstede (1980, 1993) put forth one of the more popular classification schemes. As noted by Schwartz, “Hofstede sought dimensions of cross-cultural variation in the responses of more than 117,000 employees of multinational business corporations in 40 nations. Based on an approach that takes the national sample rather than the individual person as the unit of analysis, Hofstede delivered what he calls . . . culture-level dimensions” (Schwartz, 1994, p. 86). Hofstede’s four dimensions (masculinity/femininity (MF), power distance, individualism/collectivism, and uncertainty avoidance) are widely accepted and have been used by numerous researchers to compare different cultural groups (e.g., Bond & Forgas, 1984; Gabrenya, Wang, & Lantane, 1985; Leung, 1988b).

In using culture as an explanatory variable, it is preferable to view it as a multi-dimensional structure wherein differences in the strength of cultures on multiple dimensions are used in combination to explain differences in work behavior (Dorfman, 1996; Clark, 1987). We propose that two of Hofstede’s dimensions, masculinity/femininity (MF) and individualism, may be especially relevant to understanding self–other agreement across cultures.

The MF dimension has been defined as:

    tough values like assertiveness, performance, success and competition, which in nearly all societies are associated with the role of men, prevail over tender values like the quality of life, maintaining warm personal relationships, service, care for the weak, and solidarity, which in nearly all societies are more associated with women’s roles (Hofstede, 1993, p. 90)

and similarly as

    a preference for achievement, heroism, assertiveness and material success as opposed to a preference for relationships, modesty, caring for the weak and quality of life (Schwartz, 1994).

In addition, in describing the masculinity societal norm, Hofstede (1980) indicates that in countries low on masculinity, leveling is the norm. Individuals do not try to be better than others. In high masculinity countries the norm is to try to be the best.

As evidence of the robustness of Hofstede’s MF measure, the U.S., Italy, and Germany also score high on a similar dimension labeled “mastery” in Schwartz’s (1994) new cultural dimensions. The mastery dimension emphasizes active mastery of the social environment and self-assertion. In cultures high on mastery, emphasis is placed on competition and getting ahead of other people. This measure correlates .56 with Hofstede’s masculinity measure. Additionally, Hofstede (1998) cites other evidence of the construct validity of the MF measure. For example, MF correlated with LV Gordon’s survey of interpersonal values measured across 17 countries. Students in masculine countries expressed a greater need for recognition and less benevolence. Masculinity was also highly negatively correlated with the percentage of GNP the country donated to poor countries as well as with a preference for higher salaries over shorter working hours.

The U.S., U.K., Germany, and Italy are all high on masculinity, according to the work of Hofstede (1980, 2001), with scores for these countries above the median. The scores of both Denmark and France on masculinity are, according to Hofstede, below the median.

Hofstede (1980) defined individualism as a focus on concern for oneself over others, and an emphasis on personal autonomy, accomplishment, and self-fulfillment. Kim, Triandis, Kagitcibasi, Choi, and Yoon (1994) suggested that individualism also includes a component of self-focus and egocentrism.

While the cultures studied in the present research are all above the median on individualism as measured by Hofstede (1980), a closer inspection reveals potential differences. On Hofstede’s dimension of individualism, the U.S. scores higher than any of the other 39 countries studied. In the meta-analysis of studies carried out in individualism and collectivism conducted by Oyserman, Coon, and Kemmelmeier (2002), the U.S. again was found to be more individualistic than any of the other countries studied. France and Denmark were lower on individualism than Germany and Italy.

Taken together, France and Denmark can be considered more feminine/less individualistic, relative to other countries in our sample, while the U.K., Germany, and Italy can be considered more masculine/more individualistic. The U.S. can be considered masculine and highly individualistic. These differences in cultural values are expected to affect how self and other ratings are viewed as a managerial tool. For example, as a result of the extreme individualism in the U.S. coupled with the fact that 360-degree feedback and theorizing about self–other agreement were developed in the U.S., the relationship between self–other agreement and performance is expected to be different in the U.S. than in other countries. Specifically, given the large use of 360 feedback processes in the U.S., managers in the U.S. may be


more accustomed to receiving feedback from multiple sources and recognize the need to adjust their self-perceptions accordingly in order to succeed. Further, the notion that self-regard and accomplishment are paramount in the U.S., with its highly individualistic culture, coupled with the value placed on assertiveness and individual success inherent in high masculinity cultures, suggests that managers’ self-evaluations may be particularly relevant in the U.S. In contrast, when individualism or masculinity is lower, self-regard and assertiveness are of lesser importance and the relationship between self and other ratings may be less relevant. As such, we propose that both individualism and masculinity and their degree of extremity are relevant to self–other agreement.

We propose the following:

H2: The pattern of relationships between self and other (subordinate and peer) ratings and managerial performance in the U.S. (highest on individualism; high on masculinity) will differ from that of European countries.

H3: The pattern of relationships between self and other (subordinate and peer) ratings and managerial performance in European countries high on masculinity (U.K., Germany, Italy) will differ from the U.S. and from European countries relatively lower on individualism and low on masculinity (France, Denmark).

The above hypotheses indicate that the overall pattern of relationships between self–other ratings and performance should differ for different countries. Below, we explore this notion more specifically by delineating how over- and under-rating effects should differ based on different cultural dimensions.

Culture and Over- and Underestimation

The theoretical evidence pertaining to the potential effects of overestimation is somewhat mixed. On the one hand, in cultures high on masculinity (or mastery), overestimation may not be as likely to be associated with poor performance when compared with more feminine cultures. The emphasis on assertiveness and competition may foster at least a degree of appreciation for, or tolerance of, the overestimator. Stated another way, the arrogance that could potentially be attributed to an overestimator might be “forgiven” more in a masculine country because of values that stress self-confidence and assertiveness. The poor relationships with others that are likely to accompany overestimation may not be as problematic in masculine countries in terms of relationships with effectiveness as a manager. Maintaining a sense of humility is important in feminine cultures (Adler, 1991; Sabath, 1999). These cultures discourage bragging or boasting (i.e., self-enhancement) because such behavior is considered immature, a sign of weakness, and, in reality, a sign of low self-esteem on the part of an individual (Morrison, 1994). These qualities, together with an emphasis on workplace relationships, suggest that self-enhancement bias (i.e., overestimation) will be especially counter to effective leadership. That is, a leader who over-rates him/herself in relation to others is likely to be viewed as out-of-touch and arrogant, qualities that are highly unacceptable in such a culture. Accordingly, it follows that overestimators may be especially ineffective in feminine (or less masculine) countries, as compared with overestimators in more masculine cultures.

While we believe MF may be relevant to the effects of overestimation of one’s leadership, the degree of individualism must also be considered. Markus and Kitayama (1991) distinguish the U.S. from other countries on a dimension similar to individualism that they label independence/interdependence. They propose that a sense of “self” is more important to those in cultures characterized by independence. To have a sense of self is to have an appreciation of how one differs from others and how others perceive one’s self. Markus and Kitayama also suggest that in independent cultures, the self must be responsive to the social environment. However, that responsiveness is not so important for the sake of responsiveness per se, as may be the case in interdependent cultural contexts. Instead, they propose that social responsiveness is necessary in order to best assert one’s sense of self in the social context of independent cultures. Thus, perceptions of others are important, but primarily as “standards of reflected appraisal” (Markus & Kitayama, 1991, p. 226). In other words, in order to perform effectively in a social context in independent cultures, one must have an accurate sense of self in relation to perceptions maintained by others. It follows that especially in independent cultures, overestimators may be at a particular disadvantage. These results support previous theory and empirical results that overestimation tends to be associated with poorer performance than in-agreement good ratings and underestimation (e.g., Atwater et al., 1998).

In sum, the theoretical evidence appears to be somewhat mixed with regard to our cultural combinations. For example, although arguments relevant to masculinity would suggest that overestimation is less of a problem in the U.S., the arguments of Markus and Kitayama (1991) would suggest the opposite for an independent/individualistic culture, as the U.S. can also be characterized. While our analyses must be considered somewhat exploratory in nature, on balance we feel that the following predictions are warranted:

H4a: Self–subordinate overestimation will be associated with poorer managerial performance in European low masculinity/lower individualism countries as compared with the U.S. and as compared with European high masculinity/higher individualism countries.


H4b: Self–peer overestimation will be associated with poorer managerial performance in European low masculinity/lower individualism countries as compared with the U.S. and as compared with European high masculinity/higher individualism countries.

The same cultural values also may be used to explain the impact of underestimation. When the norm is one of assertiveness, competition, and trying to be the best, as in countries with high masculinity and individualism, rating oneself lower than others may be seen as a sign of weakness or low self-confidence. Hence, underestimators are likely to have lower performance (compared with those whose self-ratings are characterized by agreement with others) in countries with high masculinity and individualism. In more feminine/less individualistic countries, where the norm is one of leveling or not trying to be better than others, underestimation may either be related to higher performance or may be unrelated to performance. Humility and modesty are valued; hence, underestimating one’s performance may be viewed as consistent with leadership effectiveness in more feminine/less individualistic countries such as France and Denmark.

In sum, we hypothesize the following:

H5a: Self–subordinate underestimation will be associated with poorer managerial performance in European high masculinity/higher individualism countries and the U.S. as compared with European low masculinity/lower individualism countries.

H5b: Self–peer underestimation will be associated with poorer managerial performance in European high masculinity/higher individualism countries and the U.S. as compared with European low masculinity/lower individualism countries.

Method

Sample and Procedure

Data were collected from managers in the U.S. and five European countries (Germany, U.K., Italy, France, and Denmark) who participated in a leadership development program. As part of each program, a multi-rater feedback instrument was completed by the manager, his/her boss, peers, and subordinates. Bosses also rated the manager’s competence or overall performance. Of the managers who had performance rated by their boss, 3793 U.S. managers had data from subordinates; 3896 had data from peers. For the European managers, of those who had performance ratings made by their boss, 2732 had data from subordinates; 2855 had data from peers. In the U.S. sample, 87% had performance rated by one boss, 11% by two bosses, and less than 2% by more than two bosses. In the European sample, 83% had performance rated by one boss, 15% by two bosses, and less than 2% by more than two bosses.

Respondents were given the leadership surveys by their focal manager in a sealed packet, and the surveys were mailed back directly to the consulting company that scanned and scored them. Respondents were told that the surveys were for gathering feedback for the manager’s development and that they were confidential. That is, they were told that only the manager being rated would see the results. In addition, peer ratings and subordinate ratings were aggregated for each source so no individual responses were included in the feedback. Respondents (except bosses) were guaranteed anonymity. All questionnaires were translated and back-translated into the native language of the respondents using a standard procedure by a professional translation organization in each host country. Feedback reports were prepared for each manager and discussed with them in small group meetings.

In both samples, the average number of subordinate surveys received per manager was four. The average number of peer surveys received also was four. The range of number of surveys was 1–20 per manager. The percentage of managers receiving only one survey from a peer or subordinate ranged from 3% to 7%.

In the U.S. sample, 69% of the managers were male, the average age was 41 years, 92% were white, 65% were at the level of middle management or higher, and 71% had been a manager for 3 or more years. In the European sample, 78% were male, the average age was 40 years, 68% were at the level of middle management or higher, and 70% had been a manager for 3 or more years. Because there were some managers (less than one percent) working outside their native or home country, only those for whom their country of origin matched their current work location were included in the sample. Most managers (roughly 2/3) worked for large companies (over 10,000 employees) in both the U.S. and the European countries.

Measures

PROFILOR®, a multi-rater feedback instrument that collects ratings of managerial skills (Hezlett, Ronnkvist, Holt, & Hazucha, 1997), was used to assess measures of leader behavior and performance. We used data from peers, subordinates, and self who provided ratings on 196 items across nine dimensions (e.g., analyze issues, drive for results, influence others). For the purposes of this study, we used items from the seven dimensions that assess leadership (i.e., directing, influencing, motivating, leading courageously, fostering teamwork, coaching and developing, and championing change). Items are assessed on a 5-point scale indicating the frequency with which the manager engages in each behavior. The response scale ranges from 1 = not at all to 5 = to a very great extent. Sample items include “Provides clear direction and defines priorities for the team,” “Rewards people for good performance,” and

“Negotiates persuasively.” In the sample as a whole, correlations among the dimensions were quite high (all were above .7) and a factor analysis produced a one-factor solution; hence, dimensions were combined into a single scale of overall leadership behaviors (i.e., the average of the 44 items). Scale reliabilities were computed for each rater group (self, peer, and subordinate) and ranged from .96 to .99 in all countries.
The performance measure used was a one-item measure rated by the manager’s boss, and was collected on a separate survey. Where a manager had more than one boss rating, the ratings of performance were averaged (although in most cases there was only one boss rating of performance). The performance item was “How would you rate this manager’s competence in his/her current position?” It was evaluated on a 7-point scale ranging from 1 = Outstanding; one of the best to 7 = Very weak; one of the worst. (For the purposes of analyses, this item was reverse scored.) While we would have preferred a multi-item performance measure, only this one-item competency measure was collected in all countries. However, in the U.S., a five-item performance measure was also collected. It included items such as “gets the job done,” “is an effective manager overall,” and “produces high quality work.” These items were rated on a 5-point scale. When averaged into one performance score and correlated with the competence measure used in this study, the correlation was .58 (n = 3981). We therefore have confidence that our one-item competence measure adequately taps managerial performance.
To justify aggregation across raters for a manager, intraclass correlations (ICCs) were computed for the 44 items across four raters per manager. ICCs for subordinates on the leadership dimension ranged from .57 to .67 across countries. ICCs for peers ranged from .54 to .64 across countries. These values are adequate to justify aggregation (Van Velsor & Leslie, 1991), and similar to those found in other studies using 360-degree feedback instruments (see Fleenor, McCauley, & Brutus, 1996; Atwater et al., 1998). Thus, each manager received an aggregated (mean) leadership score across subordinates and across peers as well as a self-score and a performance score.

Data Analysis

Analyses of variance were conducted for the demographic and background variables (sex, age, organizational level, and time as a manager) to determine whether differences existed by country. For all variables, results of the analyses of variance were significant. All subsequent analyses were conducted with and without demographic and background variables as controls. Including these variables as controls did not change any of the results, and because demographic and background characteristics were missing for approximately 20% of the sample, results reported are based on the full sample, without inclusion of demographic and background characteristics.
The use of difference scores or categories to represent agreement, as has been done in much past multi-source research, does not allow for testing the functional form underlying agreement, and further, these indices are substantially flawed from a statistical perspective (e.g., Edwards, 1994). Therefore, to test the relationship between self and other ratings and performance, polynomial regression procedures with moderators (Edwards, 1994; Edwards & Parry, 1993) were used. In a hierarchical manner, bosses’ ratings of performance were regressed on self-ratings and other ratings in the first step (main effects), and the product of self times other ratings, the square of self-ratings, and the square of other ratings in the second step (higher order effects). Self, subordinate, and peer ratings were also centered at the same value, based on the midpoint of their shared scale (Edwards, 1994).
Examination of the surfaces corresponding to the polynomial regression equations aids interpretation. Several salient features of the surfaces were also examined. The slope of the line of perfect agreement, or the S = O line (self equals other), is given by a1 = b1 + b2, where b1 is the b for self-ratings and b2 is the b for other (peer or subordinate) ratings. A curve along the S = O line is indicated by a2 = b3 + b4 + b5 (where b3 is the b for self squared, b4 is the b for the cross-product of self and other, and b5 is the b for other squared). If a1 differs significantly from zero and a2 does not, there is a linear slope along the line of perfect agreement. A negative value for a2 indicates a concave surface along the line of perfect agreement, while a positive value indicates a convex surface. The impact of over- and underestimation can be examined through the reverse of the S = O line, or the line perpendicular to the line of perfect agreement (e.g., when self = 5, other = 1, or centered self = 2 and other = −2). Here, if the quantity a3 = (b1 − b2) differs significantly from zero and the quantity a4 = (b3 − b4 + b5) does not, there is a linear slope along the S = −O line. A curve along the S = −O line is indicated by the quantity (b3 − b4 + b5) such that a negative value indicates a concave surface along the line of complete disagreement.

Results

Means, standard deviations, and correlation coefficients among self, subordinate, and peer ratings of leadership and the manager’s performance rating are presented across all countries and by country in Table 1. Generally, self-ratings were higher than other ratings. Peer ratings tended to correlate most strongly with performance ratings as compared with self and subordinate ratings. It is also interesting to note the correlations among ratings and performance across countries. The relationships between self-ratings and performance, and between other ratings and performance are very similar across countries and


Table 1. Means, standard deviations (SD), and correlations for self, subordinate, and peer ratings of leadership skills and boss ratings of performance (perf) by country¹

                 n   Mean     SD    Self     Sub      Peer
All countries
  Self        6751   3.70    .41
  Sub         6525   3.62    .49    .27**
  Peer        6751   3.56    .40    .26**    .43**
  Perf        6751   5.39   1.02    .08**    .23**    .31**
U.S.
  Self        3896   3.72    .41
  Sub         3793   3.65    .52    .23**
  Peer        3896   3.59    .41    .22**    .41**
  Perf        3896   5.34   1.13    .09**    .22**    .31**
U.K.
  Self         788   3.60    .41
  Sub          745   3.56    .45    .23**
  Peer         788   3.46    .38    .19**    .43**
  Perf         788   5.37    .94    .04      .25**    .35**
Germany
  Self         437   3.86    .42
  Sub          403   3.85    .48    .35**
  Peer         437   3.73    .40    .24**    .44**
  Perf         437   5.61    .83   −.02      .14**    .29**
France
  Self         533   3.60    .38
  Sub          509   3.48    .43    .27**
  Peer         533   3.48    .39    .23**    .43**
  Perf         533   5.49    .82    .09      .31**    .34**
Denmark
  Self         897   3.65    .36
  Sub          896   3.49    .41    .34**
  Peer         897   3.44    .34    .21**    .45**
  Perf         897   5.35    .89    .08      .26**    .28**
Italy
  Self         201   3.85    .23
  Sub          178   3.72    .19    .17*
  Peer         201   3.61    .22    .18*     .26**
  Perf         178   5.52    .90    .06      .17*     .43**

**p < .01, *p < .05.
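
The similarity of correlations across countries that the text describes can be checked informally from Table 1. The sketch below applies Fisher's r-to-z test for two independent correlations to the peer–performance correlations for the U.S. (r = .31, n = 3896) and Germany (r = .29, n = 437). This test is an assumption made here for illustration; the source does not state which test the authors used, and the function names are hypothetical.

```python
import math


def fisher_z_diff(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    return (z1 - z2) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Peer-performance correlations taken from Table 1 (U.S. vs. Germany).
z_stat = fisher_z_diff(0.31, 3896, 0.29, 437)
p_val = two_sided_p(z_stat)
print(f"z = {z_stat:.2f}, p = {p_val:.2f}")
```

Under this test the difference between the two correlations is well short of conventional significance, which is in line with the description of these relationships as similar across countries.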

country groupings. In particular, the degree of relationship between rating sources does not vary significantly by country grouping (e.g., the correlations between subordinate and peer, self and peer, and self and subordinate ratings are very similar). Further, relationships between behavior ratings and performance are also similar across countries for subordinates and peers. However, the relationship between self-ratings and performance is somewhat different in that self-ratings are not significantly related to performance in countries other than the U.S. These patterns imply that there are many similarities in relationships across countries between ratings of a manager’s behaviors and performance ratings. That is, ratings from peers and subordinates are significantly related to performance ratings, and are of essentially the same magnitude across countries. This suggests that how behaviors are viewed and their relationship with supervisory ratings of performance show some similar patterns across countries.
However, there were differences in the patterns of relationship between self and other ratings, which are discussed below.

Omnibus Relationships

Hypotheses 1a and 1b predicted that self and other ratings were both important for understanding performance across all countries, while Hypotheses 2–3 refined this prediction by focusing on differences in these relationships by country groupings. As a first omnibus test, we conducted a polynomial regression for self–subordinate and self–peer ratings across all countries combined. For self–subordinate agreement, the overall R² from the polynomial regression was .23 (F = 74.75, p < .001) and was .32 (F = 145.13, p < .001) for self–peer ratings.² Examination of the response surfaces indicated a significant curve along the line of perfect agreement for self–other (peer and subordinate) ratings such that a sharper decrease in performance is observed at lower levels of rated behaviors. That is, when self and other ratings of the leader’s behavior went from moderate to high, performance ratings did not increase sharply. However, when self and other ratings were low and decreasing, performance as rated by the supervisor dropped sharply. Further, along the S = −O line performance was inverted U-shaped, with the turning point near self equaling other. That is, for a severe overestimator performance was lowest; then it began to increase as the overestimation became less severe. It should also be noted that the slope of the line of perfect agreement as well as the slope of the S = −O line were significant for each country separately, indicating that agreement at high levels of rated behaviors was related to higher outcomes than agreement at low levels of rated behaviors.³
The findings from the omnibus test support Hypotheses 1a and 1b that self and other ratings are related to performance outcomes across cultures. Detailed results are not fully presented because Hypotheses 2 and 3 focused on more explicit relationships by country groupings and it is more meaningful to examine these moderated relationships.

Differential Patterns by Country Grouping

Hypotheses 2 and 3 predicted that the pattern of relationships between self and other (subordinate and peer) ratings and managerial performance in the U.S. would differ from that of European countries. First, moderated polynomial regressions and follow-up tests supported the notion that the U.S. differed significantly from European countries as a whole (combined) as well as from each of the other European countries.³ Therefore, the U.S. was retained separately and three country groupings were created based on the cultural values for masculinity–femininity and individualism scores. The three groups were: (1) U.S. – high masculinity/highest individualism; (2) high masculinity/high individualism European – U.K., Germany, Italy; and (3) low masculinity/relatively lower individualism European – France and Denmark. (Please note that our designation of low individualism is in relation to the other countries in the study. France and Denmark did not receive individualism scores that were below average in Hofstede’s analyses.)
Moderated polynomial regression procedures were used to test Hypothesis 3 that the pattern of relationships between self and other (subordinate and peer) ratings and managerial performance in European countries high on masculinity/individualism would differ from the U.S. and from European countries low on masculinity/individualism. In a hierarchical manner, supervisory ratings of performance were regressed on self-ratings and other ratings (peer or subordinate) in the first step; the product of self times other ratings, the square of self-ratings, and the square of other ratings in the second step; followed by a set of dummy variables representing country grouping (U.S., European high masculinity, European low masculinity) and 10 cross-product terms between the country dummy variables and the terms in the equation. A significant increase in R² when the set of cross-product terms for country is entered in the equation indicates a significant difference in the response surfaces between the country groupings.
For self–subordinate agreement, the overall R² from the moderated polynomial regression was .25 (F = 24.13, p < .001). This analysis also revealed that a significant 1% additional variance (p < .01) was accounted for by the set of cross-product terms for country and self–subordinate ratings, beyond that accounted for by the prior terms in the equation. For self–peer agreement, the overall R² was .33 (F = 45.14, p < .001) and the set of cross-product terms of country and self–peer ratings accounted for a significant 1% (p < .05) additional variance in performance, beyond that accounted for by self and other ratings and country dummy variables. Thus, at a general level, Hypothesis 3 was supported.
To facilitate interpretation, regression coefficients for each country grouping are presented separately. These are the resultant coefficients from the moderator analyses when country grouping is taken into account and are the same as when each country group is treated separately in regressions (standard errors, however, differ slightly between the omnibus test and separate regressions by country). Results are presented in Table 2. Significant differences between b’s for each term by country grouping are contained in the last column of Table 2 based on the results of the moderated regression analyses. For both self–subordinate and self–peer agreement, significant differences between country groupings were found, and the significant interaction terms between country grouping based on masculinity/individualism and self–other rating variables provide specific support for Hypothesis 3.
As can be seen in Table 2, results for the relationship between self–subordinate agreement and performance varied across country groupings. Figures 1a (U.S.), 1b


Table 2. Regression of managerial performance on self–other ratings based on country groupings

                           U.S.           EHM            ELM
Self–other agreement       b (SE)         b (SE)         b (SE)         Significant comparisons
Self–subordinate
  Self                     .16 (.04)**    .00 (.08)     −.05 (.08)      US ≠ EHM, ELM
  Subordinate              .43 (.03)**    .45 (.06)**    .46 (.08)**
  Self squared            −.16 (.07)*    −.07 (.12)      .25 (.16)      US, EHM ≠ ELM
  Self × subordinate       .02 (.08)     −.12 (.16)      .01 (.19)
  Subordinate squared     −.06 (.04)     −.09 (.10)     −.40 (.11)**    US, EHM ≠ ELM
Self–peer
  Self                     .11 (.04)**   −.08 (.07)      .00 (.08)      US ≠ EHM
  Peer                     .79 (.04)**    .81 (.07)**    .63 (.08)**    US, EHM ≠ ELM
  Self squared            −.19 (.07)**   −.01 (.11)      .16 (.14)      US ≠ ELM
  Self × peer              .08 (.10)     −.06 (.18)      .17 (.21)
  Peer squared            −.22 (.06)**   −.37 (.12)**   −.33 (.13)*

U.S.: high masculinity, highest individuality. EHM: U.K., Germany, Italy. ELM: France, Denmark.
For self–subordinate: US (n = 3793); EHM (n = 1326); ELM (n = 1406).
For self–peer: US (n = 3896); EHM (n = 1426); ELM (n = 1429).
EHM, European high masculinity/individuality; ELM, European low masculinity/individuality. *p < .05, **p < .01.
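
The surface quantities defined in the Data Analysis section (a1 through a4, plus the shift quantity (b2 − b1)/[2(b3 − b4 + b5)]) can be recomputed from the U.S. coefficients in Table 2. The sketch below is only an arithmetic check using the rounded coefficients as printed; the class and function names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class Coefs:
    """Unstandardized coefficients from one polynomial regression."""
    b1: float  # self
    b2: float  # other (subordinate or peer)
    b3: float  # self squared
    b4: float  # self x other cross-product
    b5: float  # other squared


def surface_features(c: Coefs) -> dict:
    a1 = c.b1 + c.b2                   # slope along the agreement line (S = O)
    a2 = c.b3 + c.b4 + c.b5            # curvature along S = O
    a3 = c.b1 - c.b2                   # slope along the disagreement line (S = -O)
    a4 = c.b3 - c.b4 + c.b5            # curvature along S = -O
    shift = (c.b2 - c.b1) / (2 * a4)   # shift quantity; requires a4 != 0
    return {"a1": a1, "a2": a2, "a3": a3, "a4": a4, "shift": shift}


# U.S. columns of Table 2 (rounded values as printed).
us_sub = surface_features(Coefs(0.16, 0.43, -0.16, 0.02, -0.06))
us_peer = surface_features(Coefs(0.11, 0.79, -0.19, 0.08, -0.22))

for name, feats in (("self-subordinate", us_sub), ("self-peer", us_peer)):
    print(name, {k: round(v, 2) for k, v in feats.items()})
```

For the U.S. self–subordinate coefficients this gives a1 = .59, a2 = −.20, a3 = −.27, a4 = −.24, and a shift of −.56, matching the values reported in the text. For self–peer, a2, a3, a4, and the shift (−.33, −.68, −.49, −.69) also match, while a1 computed from the rounded coefficients is .90 rather than the reported .88.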

(European low masculinity) and 1c (European high masculinity) display the results graphically for self–subordinate agreement, while similar graphs are presented in Figures 2a, 2b, and 2c for self–peer agreement. The results between country groups show differences in several areas. First, the results in Table 2 indicate that self–other agreement is relevant in the U.S. but not in other countries (significant differences in self and self-squared b’s were found between the U.S. and the other country groups). Second, in both high and low masculinity/individualism countries in Europe, only other ratings are significant (see Table 2).
Pattern of Relationships in U.S. For self–subordinate ratings, self–other agreement was significantly related to performance in the U.S. (see Table 2 and Figure 1a). Examination of surface features revealed that, for the U.S., a1 was .59 (p < .01) and a2 was −.20 (p < .05), indicating a slope along the line of perfect agreement for self–subordinate that is curved such that a sharper decrease in performance is observed at lower levels of rated behaviors. Along the S = −O line, performance is inverted U-shaped, with the turning point near self equaling other (a3 = −.27 and a4 = −.24, p < .05) such that performance is lowest for a severe overestimator, begins to increase as the overestimation becomes less severe, and then decreases slightly as underestimation increases. The quantity (b2 − b1)/[2(b3 − b4 + b5)] was −.56, indicating a small shift toward the self-less-than-other region of about one-half unit along the S = −O line, further evidence that overestimation is worse in relation to performance than underestimation.
A similar pattern emerged when examining self–peer agreement and its relationship with performance in the U.S. Salient features of the surfaces were again examined to aid interpretation. For the U.S., as can be seen in Table 2 and Figure 2a, results indicated a significant slope along the line of perfect agreement (a1 = .88, p < .01) with a concave curve such that a sharper decrease in performance occurs as the rated behaviors become lower (a2 = −.33, p < .01). Further, a3 was −.68 (p < .01) and a4 was −.49 (p < .01), indicating a concave trend such that performance is lower for underestimators than for those in agreement, and performance also decreases as overestimation becomes more severe. In addition, the quantity (b2 − b1)/[2(b3 − b4 + b5)] = −.69, indicating that a shift toward the self-less-than-other region, relative to peer ratings, is important for understanding the relationship between over- and underestimation and performance. Thus, performance is highest when self and peer ratings are similar and high, while it is lowest when self and peer ratings are similar and low. Performance decreases somewhat as underestimation increases and performance decreases more substantially as over-rating becomes more severe. This pattern is similar to that found in previous multi-source feedback studies (cf. Atwater et al., 1998; Johnson & Ferstl, 1999). For self–peer ratings, overestimation had a more detrimental effect than for self–subordinate ratings.
Comparison of European Countries. Hypotheses 4a and 4b predicted that overestimation would be more detrimental (related to lower performance) in countries lower in masculinity/individualism and Hypotheses 5a and 5b suggested that underestimation would be more detrimental in countries higher in masculinity/individualism. However, because only other ratings were important in European countries (see Table 2), over- and underestimation issues are not particularly relevant. Thus, Hypotheses 4a, 4b, 5a, and 5b were not supported (although as


[Figure 1. Relationship between self and subordinate ratings and performance. Panels: (a) U.S.; (b) Low Masculinity/Indiv European (France, Denmark); (c) High Masculinity/Indiv European (Germany, UK, Italy). Axes: self and subordinate ratings (−2 to 2); performance (1.00 to 7.00).]

[Figure 2. Relationship between self and peer ratings and performance. Panels: (a) U.S.; (b) Low Masculinity/Indiv European (France, Denmark); (c) High Masculinity/Indiv European (Germany, UK, Italy). Axes: self and peer ratings (−2 to 2); performance (1.00 to 7.00).]
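
The panels in Figures 1 and 2 plot predicted performance over centered self and other ratings running from −2 to 2. As a rough sketch of how such a surface is evaluated, the code below uses the U.S. self–subordinate coefficients from Table 2. The intercept is not reported in this section, so the function returns the predicted change relative to the centered midpoint (0, 0) rather than a performance score; the function and constant names are hypothetical.

```python
# U.S. self-subordinate coefficients as printed in Table 2.
B_SELF, B_OTHER = 0.16, 0.43
B_SELF_SQ, B_CROSS, B_OTHER_SQ = -0.16, 0.02, -0.06


def predicted_change(self_c, other_c):
    """Predicted performance minus the (unreported) intercept,
    for centered self and other ratings."""
    return (B_SELF * self_c
            + B_OTHER * other_c
            + B_SELF_SQ * self_c ** 2
            + B_CROSS * self_c * other_c
            + B_OTHER_SQ * other_c ** 2)


grid = [-2, -1, 0, 1, 2]
# Line of perfect agreement (S = O) and line of disagreement (S = -O).
agreement = {s: round(predicted_change(s, s), 2) for s in grid}
disagreement = {s: round(predicted_change(s, -s), 2) for s in grid}

print("S = O :", agreement)
print("S = -O:", disagreement)
```

With these coefficients the value at (2, 2) is 0.38 and at (−2, −2) is −1.98, so the drop at low agreed ratings is much steeper than the gain at high agreed ratings. The overestimation corner (self = 2, other = −2) gives −1.50 against −0.42 for the underestimation corner (self = −2, other = 2), consistent with the pattern described for the U.S.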

indicated above, overestimation is related to lower performance than underestimation in the U.S.).
Nevertheless, the results based on country grouping indicate significant differences in the pattern of ratings between the U.S. (Figures 1a and 2a), European high masculinity/individualism countries (Figures 1c and 2c), and European low masculinity/individualism countries (Figures 1b and 2b). First, as indicated above, the U.S. differs significantly from European countries, particularly in that both self and other ratings are simultaneously important in their relationship with performance. Second, for the European countries, only other ratings are important (i.e., there are no significant effects for self or self-squared); however, the pattern of results differs by country grouping and differs somewhat depending on whether the other ratings are from subordinates or peers.


For self–subordinate agreement, the difference between high and low masculinity/individualism countries in Europe is primarily a result of the nonlinearity of subordinate ratings. In high masculinity European countries, subordinate ratings are significantly and linearly related to performance. However, in low masculinity European countries, subordinate ratings are related to performance in a nonlinear manner (and the effect is significantly different from other country groupings). In the case of low masculinity/individualism countries, the relationship between other ratings and performance is such that at mid- to high levels of other ratings, performance differences are small, but performance ratings decrease more rapidly as other ratings become lower.
For self–peer analyses, results indicated that the primary difference between high and low masculinity/individualism countries in Europe was in the strength of the relationship between peer ratings and performance, with peer ratings more strongly related to performance in high masculinity/individualism countries. Further, while the relationship between peer ratings and performance was nonlinear in both high and low masculinity/individualism countries, the nature of the relationship differs slightly. In low masculinity/individualism countries, the “curve” begins closer to the midpoint of the scale such that those with mid- to high ratings have similar performance, whereas in high masculinity/individualism countries, the “curve” begins at a much higher rating point, indicating that only at very high levels of leadership behaviors are distinctions in performance minimal.

Summary

Taken together, the results indicate that there are differences between countries based on cultural values and that the U.S. differs significantly from other countries, supporting Hypotheses 1a, 1b, 2, and 3. In particular, the differences between the U.S. and the European countries are based on the relative role of self and other ratings, with self–other agreement being important for the U.S. while other ratings are most important in European countries. Further, differences between countries based on masculinity–femininity/individualism were observed in the strength and nonlinearity of the relationship between other ratings and performance.
In the U.S., simultaneous consideration of both self and subordinate ratings is important for predicting performance ratings. Agreement between self and other at high levels was better than agreement at low levels and agreement was generally better than over- or underestimation. Over- and underestimation were not related to performance in either the high or the low masculinity/individualism European countries, although they were in the U.S. This provides some support for Hypotheses 4a, 4b, 5a, and 5b for the U.S. but not for the European countries.
In low masculinity/individualism European countries (France and Denmark), other ratings were related to performance and the nonlinear results indicated that the positive relationship between others’ ratings of leader behavior and performance ratings becomes flatter at higher levels of other ratings and drops sharply as others give lower ratings. For example, if subordinates rate a leader’s behavior as moderate to high (e.g., above 3.5 on a 5-point scale), performance is high. However, if subordinates rate the leader low, as those ratings become progressively lower, performance also becomes progressively lower. That is, in low masculinity/individualism countries there is little distinction in terms of performance between those who are moderate to good, but lower ratings are strongly related to poorer performance.
In high masculinity/individualism European countries, when examining self–subordinate relationships, subordinate ratings were significantly related to performance in a linear manner. That is, as ratings become lower, performance ratings become lower in a linear fashion. However, for peer ratings, a nonlinear relationship with performance was observed such that performance ratings drop off steeply as peer ratings become progressively lower. The strength of the relationship of peer ratings of leader behavior with performance was significantly stronger in high masculinity/individualism countries than in low masculinity/individualism countries.
These results indicate that the U.S. and European countries differed significantly. In the U.S., self-ratings relative to other ratings are important in relation to performance, whereas in Europe, other ratings are the predominant influence in relation to performance, and self-ratings relative to other ratings play an insignificant role.

Discussion

Overall, there was much similarity among countries in the relationships between self-ratings and performance and between other ratings and performance. In all country groupings, peer and subordinate ratings were more highly correlated with each other than self-ratings were with either group. Additionally, in all countries, self-ratings were higher than subordinate or peer ratings. These results suggest that how behaviors are viewed and their relationship with supervisory ratings of performance show some similar patterns across countries. However, the role of self-ratings is not the same across countries and this is particularly evident in examining relationships between self–other agreement and performance ratings.
The polynomial regression approach used here is helpful in understanding the complexities in the patterns of relationship between self and other ratings across countries. The results from this study support earlier findings that self–other differences are related to performance in the U.S. and extend this work by showing that alternative
patterns of self–other agreement occur in other countries. Using a different sample and different leadership and performance measures, findings for the current U.S. sample replicate those of Atwater et al. (1998). Specifically, in the U.S., the patterns of agreement and performance are similar for subordinates and peers. Both self and other (peer and subordinate) ratings are relevant in combination. Agreement at high levels is related to higher performance than agreement at low levels. Agreement is generally better than disagreement; that is, managers receive higher performance ratings when they agree with their subordinates that their performance is good, and overestimation is related to lower performance than underestimation.
In the U.S., much theory has been offered (e.g., Atwater & Yammarino, 1997; Yammarino & Atwater, 1993; Yammarino & Atwater, 1997) to explain why a comparison of self-ratings with those of others is important for predicting performance ratings, effectiveness, and performance outcomes. Further, in recent years, a great deal of work from practitioners has focused on these same issues (e.g., Bracken, Timmreck, & Church, 2001; Hazucha, Hezlett, & Schneider, 1993; Tornow & London, 1998). The basic underlying assumption across this work is that how a manager views his or her performance relative to how others (e.g., peers or subordinates) view his or her performance will have implications for the manager’s leadership behaviors, relationships with subordinates and peers, and ultimately his or her performance. While some early research in this area showed inconsistent and often conflicting results, in recent years, a few studies have used more sophisticated and appropriate methodologies for studying self–other agreement (e.g., Atwater et al., 1998; Johnson & Ferstl, 1999). Using larger U.S. samples and polynomial regression procedures, this more recent research has revealed patterns of self–other agreement relationships very similar to the ones found in this study for the U.S., namely that simultaneous consideration of both self and other ratings is important for examining relationships with performance measures. Over-rating, under-rating, and being in agreement are all relevant concepts. Clearly, in the U.S., self-awareness, as operationalized by self–other agreement, is relevant to performance. Over-rating, which may represent a degree of arrogance, is related to lower performance, as is underestimation.
In contrast to the U.S., findings for the European countries indicated that self–other agreement was not very important. Rather, others’ ratings of leadership behaviors (subordinate and peer) alone were most strongly related to the performance measure, and self–other comparisons did not add significant explanatory power. These findings were unexpected, and it is not clear why we failed to show stronger self–other agreement effects in the European countries. It is possible that masculinity is not as important to the self–other agreement effect as the extreme individu-

U.S. constantly striving to be better than their peers; thus, knowing how they are perceived becomes important to their success.
There seems to be an implicit understanding among employees in many U.S. organizations that their opinions should be valued, and that managers should be aware of how they are perceived. U.S. organizations often bombard employees with surveys assessing their opinions on a variety of topics, including their job satisfaction and preferences for human resource practices and initiatives (e.g., flextime). Because employees are encouraged to provide their opinions, they may come to expect that managers should understand how they perceive the manager’s behavior, and that the manager should align his or her own perceptions accordingly. As such, self–other agreement may be more relevant to performance in the U.S. than elsewhere. It is also possible that the high degree of attention paid to self–other comparisons in 360-degree feedback processes (which are prevalent in the U.S.) may have sensitized U.S. managers and employees to the importance of self–other agreement.
Along similar lines, although we recognize that power distance is largely similar across the countries in our study (Adler, 2002; Hofstede, 1980), perhaps the social distance between managers and subordinates is lower in the U.S. Thus, greater awareness of how one is viewed by subordinates becomes more of an expectation.
Nevertheless, several interesting differences between countries were revealed. For example, the relationship between peer ratings and performance was nonlinear in France and Denmark. In France and Denmark, performance ratings were relatively flat at higher levels of subordinate ratings but dropped sharply at lower levels of subordinate ratings. This pattern of nonlinear effects was not observed for most of the other countries. One possible explanation is that in relatively lower individualistic countries that are low on masculinity, such as France and Denmark, leveling is the norm, and hence, the flatter relationship between ratings and performance at higher levels may reflect, in part, this leveling or avoidance of competition among the managers who are viewed as more successful. Alternatively, in countries high on masculinity such as the U.K. and Germany, where assertiveness and competition are valued, one would not expect to see such a leveling effect. That is, being rated even a small amount better than another is relevant. Our findings support this interpretation, particularly for self–subordinate comparisons, in that for masculine countries, the relationship between other ratings and performance was linear.
Generally, the findings for self–peer ratings were similar to those for self and subordinates’ ratings, although in some cases the results were more pronounced. For example, performance ratings decreased more dramatically as peer ratings of leadership became lower when compared with
alism and independence found in the U.S. The competi- the decrease in performance ratings when subordinate
tiveness and self-focus may contribute to individuals in the ratings became lower. Overall, peer ratings were more


highly related to performance than subordinate ratings, particularly in countries high on masculinity. Perhaps supervisors and peers have more in common in their judgments and perspective of a manager than do supervisors and subordinates.


Limitations

A number of limitations of this research should be noted. First, the translation–retranslation process may have contributed to the differences in slopes found in the various countries. However, this would not explain the differences found between the U.S. and U.K.

Second, the one-item rating of performance is a limitation. The study would have benefited from a more objective or multi-item measure. Third, it is possible that perceptions of competence and what constitutes competence may vary across cultures, or there may be response biases that influence ratings in the different cultures (such as leniency or harshness). However, if rating styles vary by culture because of societal values, it is likely that all rater groups (peers, subordinates) would have the same biases. Thus, we might expect mean differences, but not necessarily relational differences across sources. The mean performance ratings did not differ significantly between any of the country groupings. Similarly, the correlations between self, other, and performance ratings are very similar across the different countries. Thus, while bosses’ perceptions of what constitutes competence may have varied across cultures, the rating patterns do not suggest wide variability in response patterns. A fourth limitation is the cross-sectional nature of the data. As is the case with most studies of this nature, longitudinal data would be preferable. Finally, while we placed a great deal of emphasis on Hofstede’s dimensions, and there is support for using them for cultural comparisons, they have come under criticism for being overly general and based on what could be a biased sample (Schwartz, 1994). It would be interesting in future research to explore alternative cultural value perspectives.


Future Research and Implications

In addition to exploring the impact of differing cultural dimensions independently as an explanation for the different results across countries and across rating sources, a more fruitful approach might be to investigate the profile or configuration of cultural values across countries. For example, there may be other cultural values or differences that work in combination and that could explain differences by country. It should be noted that such a profile comparison was not possible here because the countries studied were similar in terms of many of Hofstede’s dimensions.

Differences that we found in the correlations among rating sources and performance across cultures suggest that an additional avenue for future research would be to focus on differential perceptions about peer relations and superior–subordinate relations. For example, different perceptions about the trustworthiness of peer ratings, the usefulness of alignment between self and subordinate perceptions of behavior, or appropriateness of feedback from subordinates or peers might explain some of the differences we found across cultures.

This study just begins to scratch the surface in terms of extending the work on feedback and self–other agreement cross-culturally. How might self–other agreement and feedback processes work in countries such as Japan, China, or Brazil? For example, as noted above, each of the countries in our sample scored above the median on individualism. How might self–other agreement relate to performance in a collectivist country? Also, we should consider the possibility that self–other agreement results are influenced by the rating biases of self and other raters. In collectivist countries, raters may hesitate to provide discriminating ratings and may rate themselves and all others very similarly. This certainly could affect the relationship between self–other agreement and performance.

The results of our study have implications for the use of feedback processes such as 360-degree feedback when applied across cultures. Studies have demonstrated that feedback does alter self-ratings; over-raters lower their self-ratings and under-raters raise their self-ratings following feedback (cf. Smither, London, Vasilopoulos, Reilly, Millsap, & Salvemini, 1995). Other research (e.g., Johnson & Ferstl, 1999) has demonstrated that over-raters tended to improve their performance after upward feedback. We can speculate that this would be the case in other countries, and that over time, stronger self–other agreement relationships with performance may be demonstrated.

This study also has implications for multi-national management. Our results suggest that we cannot apply Western management practices outside the U.S. in a cavalier manner and expect those processes to have comparable results. The differences in effects between the U.S. and the various European countries that we studied suggest that we must exercise caution when applying U.S.-based findings to a global context. Factors that are important for performance outcomes in the U.S. may have no relationship with performance, or even a negative influence on performance, outside the U.S.

We believe that the results of our study have broadened our understanding of self–other agreement. Clearly, future research in countries outside the U.S. would continue to enhance our understanding of self–other agreement and cultural values. It is our belief that this study will also prompt researchers and practitioners to carefully consider the cross-cultural implications of their research and personnel practices, particularly the implementation of multi-source feedback.
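
The polynomial regression procedures referred to in this discussion can be illustrated with a minimal sketch. It assumes the quadratic form commonly used in congruence research of this kind (cf. Edwards & Parry, 1993), in which performance is regressed on self ratings S, other ratings O, and the terms S², S×O, and O². The function names, the coefficient values, and the synthetic data below are hypothetical and are not taken from the study; the sketch only shows the shape of the model, not the analyses actually run.

```python
import random


def design_row(s, o):
    """Predictor terms of the assumed quadratic model: 1, S, O, S^2, S*O, O^2."""
    return [1.0, s, o, s * s, s * o, o * o]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial_regression(self_ratings, other_ratings, performance):
    """Ordinary least squares via the normal equations (X'X) b = X'z."""
    rows = [design_row(s, o) for s, o in zip(self_ratings, other_ratings)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtz = [sum(r[i] * z for r, z in zip(rows, performance)) for i in range(k)]
    return solve_linear_system(xtx, xtz)


# Synthetic, noise-free illustration (hypothetical coefficients, not study results).
rng = random.Random(0)
TRUE_B = [3.0, -0.2, 0.6, -0.1, 0.15, -0.05]
self_r = [rng.gauss(0, 1) for _ in range(400)]
other_r = [rng.gauss(0, 1) for _ in range(400)]
perf = [
    sum(b * t for b, t in zip(TRUE_B, design_row(s, o)))
    for s, o in zip(self_r, other_r)
]
coefs = fit_polynomial_regression(self_r, other_r, perf)
```

In a model of this form, a case in which other ratings alone predict performance would appear as a sizeable coefficient on O with the remaining terms adding little, whereas agreement effects would appear through the S, S², S×O, and O² terms.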


Notes

1. Correlations in Table 1 contain a mixture of correlations across levels: unaggregated (self–manager), single-aggregated (self–subordinate and self–peer) and double-aggregated (subordinate–peer) correlations. Given our theoretical interest in examining cross-level relationships in subsequent analyses, this aggregation is appropriate. However, single- and double-aggregation have differential effects on correlations and readers are referred to Scullen (1997) for explication in interpreting these conceptually different types of relationships.
2. Detailed results of this analysis are available from the author.
3. Full results of the analyses comparing the U.S. with European countries are available from the author.


References

Adler, N.J. (1991) International dimensions of organizational behavior (2nd Edn.). Cincinnati, OH: South-Western College Publishing.
Adler, N.J. (2002) International dimensions of organizational behavior (4th Edn.). Cincinnati, OH: South-Western College Publishing.
Ashford, S. (1989) Self-assessments in organizations: A literature review and integrative model. In L.L. Cummings and B.M. Staw (Eds.), Research in organizational behavior (vol. 11, pp. 133–174). Greenwich, CT: JAI Press.
Atwater, L., Ostroff, C., Yammarino, F. and Fleenor, J. (1998) Self–other agreement: Does it really matter? Personnel Psychology, 51, 577–598.
Atwater, L., Roush, P. and Fischthal, A. (1995) The influence of upward feedback on self and follower ratings of leadership. Personnel Psychology, 48, 35–60.
Atwater, L. and Yammarino, F. (1992) Does self–other agreement on leadership perceptions moderate the validity of leadership and performance predictions? Personnel Psychology, 45, 141–164.
Atwater, L. and Yammarino, F. (1997) Self–other rating agreement: A review and model. Research in Personnel and Human Resource Management, 15, 121–174.
Bass, B. and Yammarino, F. (1991) Congruence of self and others’ leadership ratings of naval officers for understanding successful performance. Applied Psychology: An International Review, 40, 437–454.
Bond, M.H. and Forgas, J.P. (1984) Linking person perception to behavior intention across cultures: The role of cultural collectivism. Journal of Cross-Cultural Psychology, 15, 337–352.
Bracken, D., Timmreck, C. and Church, A. (2001) The handbook of multi-source feedback. San Francisco: Jossey-Bass.
Carver, C. and Scheier, M. (1981) Attention and self-regulation: A control theory approach to human behavior. New York: Springer.
Church, A. (1997) Managerial self-awareness in high-performing individuals in organizations. Journal of Applied Psychology, 82(2), 281–292.
Clark, L.A. (1987) Mutual relevance of mainstream and cross-cultural psychology. Journal of Consulting and Clinical Psychology, 55, 461–470.
Dorfman, P.W. (1996) International and cross-cultural leadership research. In J. Punnett and B.J. Shenkar (Eds.), Handbook for international management research (pp. 267–349). Oxford, UK: Blackwell.
Edwards, J. (1994) The study of congruence in organizational behavior research: Critique and proposed alternative. Organizational Behavior and Human Decision Processes, 58, 683–689.
Edwards, J. and Parry, M. (1993) On the use of polynomial regression equations as an alternative to difference scores in organizational research. Academy of Management Journal, 36, 1577–1613.
Fleenor, J., McCauley, C. and Brutus, S. (1996) Self–other rating agreement and leader effectiveness. Leadership Quarterly, 7, 487–506.
Gabrenya, W.K., Wang, Y.E. and Latane, B. (1985) Social loafing on an optimizing task: Cross cultural differences among Chinese and Americans. Journal of Cross-Cultural Psychology, 16, 223–242.
Harris, M. and Schaubroeck, J. (1988) A meta-analysis of self–supervisor, self–peer, and peer–supervisor ratings. Personnel Psychology, 41, 43–61.
Hazucha, J.F., Hezlett, S.A. and Schneider, R.J. (1993) The impact of 360-degree feedback on management skills development. Human Resource Management, 32, 325–351.
Hezlett, S.A., Ronnkvist, A.M., Holt, K.E. and Hazucha, J.F. (1997) The PROFILOR® technical summary. Minneapolis, MN: Personnel Decisions International.
Hofstede, G. (1980) Culture’s consequences: International differences in work-related values. London: Sage.
Hofstede, G. (1993) Cultural constraints in management theories. Academy of Management Executive, 7(1), 81–94.
Hofstede, G. (1998) Masculinity and femininity. London: Sage.
Hofstede, G. (2001) Culture’s consequences: Comparing values, behaviors, institutions and organizations across nations. Thousand Oaks, CA: Sage.
House, R.J., Hanges, P.J., Ruiz-Quintanilla, S.A., Dorfman, P.W., Javidan, M., Dickson, M. and Gupta, V. (1999) Cultural influences on leadership and organizations: Project Globe. Advances in Global Leadership, 1, 171–233.
Johnson, J.W. and Ferstl, K.L. (1999) The effects of inter-rater and self–other agreement on performance improvement following upward feedback. Personnel Psychology, 52, 271–303.
Kim, U., Triandis, H., Kagitcibasi, C., Choi, S. and Yoon, G. (1994) Individualism and collectivism: Theory, method and applications. Thousand Oaks: Sage.
Leung, K. (1988) Theoretical advances in justice behavior: Some cross-cultural inputs. In M.H. Bond (Ed.), The cross-cultural challenge to social psychology (pp. 218–229). Newbury Park, CA: Sage.
Mabe, P. and West, S. (1982) Validity of self-evaluation of ability: A review and meta-analysis. Journal of Applied Psychology, 67, 280–296.
Markus, H. and Kitayama, S. (1991) Culture and the self: Implications for cognition, emotion and motivation. Psychological Review, 98(2), 224–253.
McCall, M. and Lombardo, M. (1983) Off the track: Why and how successful executives get derailed (Technical Report No. 21). Greensboro, NC: Center for Creative Leadership.
McCaulley, C. and Lombardo, M. (1990) Benchmarks: An instrument for diagnosing managerial strengths and weaknesses. In K.D. Clark and M.S. Clark (Eds.), Measures of leadership (pp. 535–545). West Orange, NJ: Leadership Library of America.
Megerian, L.E. and Sosik, J.J. (1996) An affair of the heart: Emotional intelligence and transformational leadership. The Journal of Leadership Studies, 3(3), 31–48.
Morrison, T. (1994) Kiss, bow, or shake hands: How to do business in 60 countries. Holbrook, MA: Bob Adams, Inc.


Oyserman, D., Coon, H. and Kemmelmeier, M. (2002) Rethinking individualism and collectivism: Evaluation of theoretical assumptions and meta analyses. Psychological Bulletin, 128(1), 3–72.
Paulhaus, D. (1986) Self-deception and impression management in test responses. In A. Angleitner and J. Wiggins (Eds.), Perspectives in interactional psychology. NY: Plenum.
Sabath, A.M. (1999) International business etiquette. Franklin Lakes, NJ: Career Press.
Sackheim, H. (1983) Self-deception, self-esteem and depression: The adaptive value of lying to oneself. In J. Masling (Ed.), Empirical studies of psychoanalytic theories (pp. 101–157). Hillsdale, NJ: Erlbaum.
Schwartz, H. (1994) Beyond individualism/collectivism. In U. Kim, H. Triandis, C. Kagitcibasi, S. Choi and G. Yoon (Eds.), Individualism and collectivism: Theory, method and applications. Thousand Oaks: Sage.
Scullen, S.E. (1997) When ratings from one source have been averaged, but ratings from another source have not: Problems and solutions. Journal of Applied Psychology, 82, 880–888.
Smircich, L. and Chesser, R. (1981) Superiors’ and subordinates’ perceptions of performance: Beyond disagreement. Academy of Management Journal, 24, 198–205.
Smither, J., London, M., Vasilopoulos, N., Reilly, R., Millsap, R. and Salvemini, N. (1995) An examination of the effects of an upward feedback program over time. Personnel Psychology, 48, 1–33.
Taylor, S. and Brown, J. (1988) Illusion and well-being: A social psychological perspective on mental health. Psychological Bulletin, 103, 193–210.
Tornow, W. and London, M. (1998) Maximizing the value of 360 degree feedback. San Francisco: Jossey-Bass.
Van Velsor, E. and Leslie, J.B. (1991) Feedback to managers: Vol. 1 A guide to evaluating multi-rater feedback instruments. Greensboro, NC: Center for Creative Leadership.
Van Velsor, E., Taylor, S. and Leslie, J. (1993) An examination of the relationships among self-perception accuracy, self-awareness, gender and leader effectiveness. Human Resource Management Journal, 32, 249–264.
Warr, P. (1987) The Meaning of Working. London: Harcourt, Brace Jovanovich.
Yammarino, F. and Atwater, L. (1993) Understanding self-perception accuracy: Implications for human resources management. Human Resource Management Journal, 32, 231–247.
Yammarino, F. and Atwater, L. (1997) Do managers see themselves as others see them? Organizational Dynamics, 25, 35–44.
