
The Leadership Quarterly 17 (2006) 506 – 521

Measurement metrics at aggregate levels of analysis: Implications for organization culture research and the GLOBE project
Mark F. Peterson ⁎, Stephanie L. Castro
Department of Management, International Business and Entrepreneurship, College of Business, Florida Atlantic University,
Boca Raton, FL 33431, USA

Abstract

We propose that scholars who are interested in group, organizational, or societal constructs should consider three approaches to
designing aggregate measures. The typical approach to aggregate measure design in organization studies is to create measures based
on individual-level metric structures, then evaluate whether the individual level measures can be aggregated. We propose that the field
continue to use this approach for fundamentally individual-level constructs, but also to make greater use of two alternative approaches
that are now only occasionally used. One approach used in cross-cultural research is to aggregate items to the target level, then evaluate
measurement structure based on the relationships among items at the target level. Another approach is to aggregate individual-level
scales to the target level, then evaluate measure characteristics based on the relationships among scales at the target level. We also
note that constructing measures based on relationships among items or among scales at aggregate levels offers an approach to
studying organizational culture that is distinct from organizational climate. We apply the distinctions between different approaches to
aggregate measure design to a recent Leadership Quarterly article and to the GLOBE project on which that article is based.
© 2006 Published by Elsevier Inc.

Keywords: Organizational culture; National culture; Aggregation; Level of analysis; Psychometrics

1. Introduction

The issues of level of analysis that typically arise in the organizational literature differ from those that typically arise
in the cross-cultural literature. The organizational culture and climate literatures and the literature about aggregating
survey data to the group level have long reflected scholars' awareness that relationships between predictors and
criteria that are found at the individual level may or may not be found at an aggregate level (Castro, 2002; Denison,
1996; Glick, 1985). For example, scholars have come to recognize that if a measure of leadership and a measure of
performance are correlated at the individual level, they may or may not be correlated when the measures are aggregated
to the group or organizational levels. The cross-cultural literature, in contrast to the organizational literature, has
attended more to the issue that the structure of the measures themselves typically differs depending on whether the
measures are constructed based on the correlations among individual-level items or based on the correlations among
the items after they have been aggregated to the societal-level (Leung & Bond, 1989). In both literatures, the concern is
that relationships found at one level of analysis do not necessarily apply at another. The difference is that the

⁎ Corresponding author.
E-mail address: mpeterso@fau.edu (M.F. Peterson).

1048-9843/$ - see front matter © 2006 Published by Elsevier Inc.


doi:10.1016/j.leaqua.2006.07.001

organizational literature tends to apply this insight to relationships among measures that were originally designed at the
individual level, whereas the cross-cultural literature tends to apply it to relationships among the items that are used to
construct the measures. Although both applications are appropriate, the problem of how level of analysis affects
measure design is logically prior to the problem of how it affects relationships among measures, and neglect of this
problem has been a major limitation in the organizational literature about level of analysis. It is the measure design
problem that we will address here. The article by Dickson and colleagues in this issue of Leadership Quarterly that is
based on a major recent multilevel project, the GLOBE Project, as well as a recent book that reports on GLOBE
(House, Hanges, Javidan, Dorfman, & Gupta, 2004) begin to address this problem, but both include ambiguities and
inconsistencies. The present paper is intended to encourage and help scholars to deal with the issue that just as
relationships between predictors and criteria may differ by level of analysis, so may the relationships among items that
are used to construct predictors and criteria. We do so by integrating insights from the organizational and cross-cultural
literatures to make recommendations about how to effectively handle level of analysis issues in measure design and use
the GLOBE project as an example.
First, we briefly summarize the perspective GLOBE has taken to creating scales beginning from data collected at the
individual level for use at the organization and nation levels. We then review the way aggregation issues have been
handled in the organizational culture and societal culture literatures. Next, we draw from a recent article in Leadership
Quarterly and the GLOBE book, particularly chapter 8 (Hanges & Dickson, 2004) to summarize how the GLOBE
group has drawn from these literatures to design aggregate measures. We note ambiguities and apparent inconsistencies
in the description of GLOBE's measure development process. To clarify further how the level of analysis
issues addressed in the organizational and cross-cultural literatures can best be integrated, we specify three approaches
to constructing aggregate measures. We conclude by applying these three approaches to organizational research, cross-
cultural research, and the GLOBE project.

2. Measure development in GLOBE

GLOBE deals with three major topics — leadership, organizational culture, and national culture (House et al.,
2004). In designing measures, it draws from two internally consistent ways to develop and justify aggregate measures
that are derived from individual level responses to surveys (Hanges & Dickson, 2004, pp. 133–136). One approach is
found in organizational culture and climate research. This approach is to create scales based on psychometrics that are
meaningful at the individual level, then aggregate the scales to a target level. That is, scholars first develop concepts and
apply factor analyses, reliability estimates, and other scale evaluation methods to the individual level items. As detailed
below, they then apply rwg, ICC(1), ICC(2), WABA, or ANOVA to the individual level scales to justify aggregating
them to the group, organization, or nation level. The result of this approach is that it identifies individual level
constructs that have enough consistency within aggregate units (e.g., group, organization, or nation) and enough
variability across aggregate units to make the analysis of aggregate means worthwhile. The second approach is
frequently found in cross-cultural studies that are designed to create societal-level culture dimensions. This approach
that Hofstede (1980, 2001) suggested, Leung & Bond (1989) developed further, and Triandis et al. (1993) and
Schwartz (1994) follow is to create scales based on the data structure that results after items have been aggregated to a
target level. That is, researchers first aggregate items to the target level and then do factor analyses and other scale
evaluation at the target level. The differences between creating measures based on the data structures at individual and
societal levels are typically substantial. For example, Leung & Bond (2004) argue that five factors are appropriate for
their data about causal beliefs at the individual level, but only two are appropriate at the nation level.
While we see evidence that the GLOBE project considered both of these approaches, we have questions about the
way the GLOBE project has used and presented these approaches. In raising these questions, we also point to the unusual potential of
GLOBE's research design not only to use both of these approaches for different purposes, but also to combine them in a
third unique way to design organizational and societal culture measures. This third approach, as detailed below, is to
create measures at the individual level as in the first approach, then to aggregate the scales to the target level, and finally
to create new scales based on relationships among these aggregated scales that were originally created based on the
individual-level data structure. We give an example of how this might be done based on societal culture scales provided
by the GLOBE project. This potential for a third alternative to designing aggregate measures provides GLOBE and
other large scale survey projects with an opportunity to contribute to research about both organizational culture and
national culture.
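As a hedged illustration of this third approach (the societies, sample sizes, loadings, and variable names below are invented for the sketch, not taken from GLOBE data), one can score scales at the individual level, aggregate each scale to the society mean, and then examine the metric structure among the aggregated scales, for example via their society-level correlations and principal components:

```python
import numpy as np

rng = np.random.default_rng(5)

# Step 1: score two scales (each the mean of 3 items) for every individual.
n_societies, n_resp = 30, 150
society = np.repeat(np.arange(n_societies), n_resp)
culture = rng.normal(0, 1, n_societies)          # latent societal dimension

def make_scale(loading):
    """Individual-level scale: mean of 3 items sharing a societal component."""
    items = [loading * culture[society] + rng.normal(0, 1, society.size)
             for _ in range(3)]
    return np.mean(items, axis=0)

scale_1, scale_2 = make_scale(0.6), make_scale(0.6)

# Step 2: aggregate each individual-level scale to its society mean.
agg = np.array([[s[society == j].mean() for j in range(n_societies)]
                for s in (scale_1, scale_2)]).T  # shape: (societies, scales)

# Step 3: evaluate the metric structure AMONG the aggregated scales.
R = np.corrcoef(agg.T)                           # society-level correlations
eigvals = np.linalg.eigvalsh(R)                  # ascending order
print(f"society-level scale correlation = {R[0, 1]:.2f}; "
      f"first principal component explains {eigvals[-1] / eigvals.sum():.0%}")
```

Here the two aggregated scales collapse onto a single society-level dimension; with more scales, a factor analysis of the society-level scale correlations would play the role that item-level ecological factor analysis plays in Hofstede's approach.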

3. Level of analysis in organizational culture and climate

From the time that the organizational climate literature developed, management scholars have been aware that the
aggregation of measures designed at the individual level needs to be justified rather than assumed. The focus in the
literature about designing organizational culture and climate dimensions has been different from the focus in designing
national or societal culture dimensions. Specifically, organizational culture and climate research has been concerned
with evaluating within and between group (e.g., department, organization, etc.) differences in individual level scales to
determine whether aggregating these scales to a target level is justified. As detailed below, the focus in designing
national or societal culture dimensions has been on forming scales based on relationships among items aggregated to the
target level.
Merely looking at an average of what people report (e.g., values) may provide useful information, but it may be
misleading. Without empirically evaluating the level (e.g., individual, group, industry, nation, culture, etc.) at which
variables are operating, effects may be missed, misidentified, or misinterpreted. Looking at the equation for a decomposed
raw score correlation (Dansereau, Alutto, & Yammarino, 1984) illustrates the potential downfalls of merely using raw scores:
ηBx ηBy rBxy + ηWx ηWy rWxy = rTxy,

where ηBx and ηBy are the between-unit etas for variables x and y (respectively), ηWx and ηWy are the within unit etas for x and
y, rBxy and rWxy are the between and within unit correlations, and rTxy is the total raw score correlation. The raw score
correlation is comprised of between unit and within unit effects, which could be very different, even contradictory (e.g.,
positive and negative, in effect cancelling each other out and leading to a zero or practically zero correlation). For example,
rTxy could represent the raw score correlation between culture and leadership within a particular nation. Yet, if the data were
drawn from multiple industries, an industry-level effect could be missed if level of analysis were not evaluated. There could
potentially be a very large between industry component and a negligible within industry component comprising the total raw
score correlation (or vice versa). The point is, without evaluating the level of analysis in some empirical manner, conclusions
may be inaccurate.
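The decomposition above can be verified numerically. The following Python sketch (illustrative only; the units, effect sizes, and variable names are simulated assumptions, not data from any study cited here) partitions two variables into between-unit and within-unit deviation scores and confirms that the eta-weighted sum of the between and within correlations reproduces the raw score correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 20 units (e.g., industries) with 30 respondents each.
n_units, n_per = 20, 30
unit = np.repeat(np.arange(n_units), n_per)
unit_fx = rng.normal(0, 1, n_units)              # unit-level component
x = unit_fx[unit] + rng.normal(0, 1, unit.size)
y = 0.8 * unit_fx[unit] + rng.normal(0, 1, unit.size)

def decompose(v, unit):
    """Split a variable into between-unit and within-unit deviation scores."""
    means = np.array([v[unit == j].mean() for j in np.unique(unit)])
    return means[unit] - v.mean(), v - means[unit]

def eta(dev, total):
    """Eta: share of total deviation accounted for by one component."""
    return np.sqrt(np.sum(dev ** 2) / np.sum((total - total.mean()) ** 2))

bx, wx = decompose(x, unit)
by, wy = decompose(y, unit)

r = lambda a, b: np.corrcoef(a, b)[0, 1]
r_total = r(x, y)
reconstructed = (eta(bx, x) * eta(by, y) * r(bx, by)
                 + eta(wx, x) * eta(wy, y) * r(wx, wy))

# Because between and within deviation scores are orthogonal, the
# eta-weighted sum reproduces the raw score correlation exactly.
print(np.isclose(r_total, reconstructed))        # True
```

The identity holds exactly (up to floating-point error) rather than approximately, which is what makes the between and within components a complete decomposition of the raw score correlation.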
A number of procedures, such as intraclass correlation coefficients (ICC; Glick, 1985; Shrout & Fleiss, 1979), are
used in the organizational literature to evaluate whether a measure designed at the individual level shows enough
consistency within groups, or enough within-group consistency relative to between-group differences, to justify aggregation.
While some approach to check on the appropriateness of aggregating scales to a target level is now commonplace,
practice varies and debates continue about the most appropriate statistic to use for particular applications.

3.1. Intraclass correlation coefficients

There are two types of ICCs: ICC(1) and ICC(2). ICC(1) has been defined as an index of the proportion of the total
variance explained by group membership (Raudenbush & Bryk, 2002) and as interrater reliability (James, 1982). ICC
(2) provides an estimate of the reliability of group means (James, 1982). While ICC(1)s are not affected by group size,
ICC(2)s are affected such that the smaller the size, the smaller the ICC(2) estimate (Bliese, 1998). ICCs are now
frequently used to evaluate aggregation. For example, Gibson & Birkinshaw (2004) calculated ICC(1) and ICC(2)s for
all of their study variables, as did Dietz, Pugh, & Wiley (2004). Dietz, Robinson, Folger, Baron, & Schulz (2003) report
ICCs for procedural justice climate. ICCs occasionally have been used in cross-cultural research to evaluate whether or
not the differences between nations in individual level constructs like role stress and beliefs about causal relationships
are large enough to justify research that predicts these differences from other nation level measures (Leung & Bond,
2004; Peterson et al., 1995).
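For concreteness, the following is a minimal sketch of how ICC(1) and ICC(2) can be computed from one-way ANOVA mean squares, assuming equal group sizes; the simulated scores and parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated scale scores: 50 groups of size k = 10 with a true group effect.
n_groups, k = 50, 10
group = np.repeat(np.arange(n_groups), k)
scores = rng.normal(0, 0.5, n_groups)[group] + rng.normal(0, 1.0, group.size)

def icc(scores, group):
    """ICC(1) and ICC(2) from one-way ANOVA mean squares (equal group sizes)."""
    labels = np.unique(group)
    size = group.size // labels.size
    means = np.array([scores[group == g].mean() for g in labels])
    ms_between = size * np.sum((means - scores.mean()) ** 2) / (labels.size - 1)
    ms_within = sum(np.sum((scores[group == g] - m) ** 2)
                    for g, m in zip(labels, means)) / (group.size - labels.size)
    icc1 = (ms_between - ms_within) / (ms_between + (size - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between  # reliability of group means
    return icc1, icc2

icc1, icc2 = icc(scores, group)
# ICC(2) is the Spearman-Brown step-up of ICC(1) over the k group members,
# which is why smaller groups yield smaller ICC(2) estimates.
print(f"ICC(1) = {icc1:.2f}, ICC(2) = {icc2:.2f}")
```

The Spearman-Brown relation ICC(2) = k·ICC(1) / (1 + (k − 1)·ICC(1)) follows algebraically from the two mean-square formulas, making explicit the group-size dependence of ICC(2) noted above.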

3.2. Eta-squared

Organization scholars also use eta-squared (η2) as a measure of reliability or consistency of rater responses. This
statistic is problematic when group size varies, as the magnitude is affected by group sizes. In particular, “when group
sizes are small, eta-squared values show significant inflation relative to the ICC(1)” (Bliese, 2000, p. 360). Few studies
solely use eta-squared to justify aggregation. Choi, Price, & Vinokur (2003) used eta-squared to justify aggregation of
their four group-level variables (including group climate) prior to using hierarchical linear modeling (HLM). Others

have used eta-squared in conjunction with an indicator of within-group interrater agreement (rwg(j); e.g., Klein, Conn,
Smith, & Sorra, 2001; Ostroff, Kinicki, & Clark, 2002).
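A small simulation illustrates the inflation problem: with many small groups and no true group effect at all, eta-squared still takes a sizable chance value of roughly (G − 1)/(N − 1), while ICC(1) stays near zero. The data below are pure simulated noise, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)

# Pure noise: 100 small groups of size 3 and NO true group effect.
n_groups, size = 100, 3
group = np.repeat(np.arange(n_groups), size)
scores = rng.normal(size=group.size)

means = np.array([scores[group == g].mean() for g in range(n_groups)])
ss_between = size * np.sum((means - scores.mean()) ** 2)
ss_total = np.sum((scores - scores.mean()) ** 2)
eta_sq = ss_between / ss_total

ms_between = ss_between / (n_groups - 1)
ms_within = (ss_total - ss_between) / (group.size - n_groups)
icc1 = (ms_between - ms_within) / (ms_between + (size - 1) * ms_within)

# Eta-squared hovers near its chance value (G - 1)/(N - 1) even with no
# group effect, while ICC(1) stays near zero.
chance = (n_groups - 1) / (group.size - 1)
print(f"eta^2 = {eta_sq:.2f} (chance ~ {chance:.2f}), ICC(1) = {icc1:.2f}")
```

This is the pattern Bliese (2000) warns about: a researcher reading the raw eta-squared as agreement would see substantial "consistency" where none exists.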

3.3. rwg(j) Coefficient

Another statistic that is popular among organizational scholars for evaluating agreement and justifying aggregation
is the rwg(j) coefficient (James, Demaree, & Wolf, 1984). Using rwg(j), within-group interrater agreement can be
assessed, and aggregation justified by comparing the observed group variability to an expected variance. The typical
problem is to determine what variance is to be expected under a null hypothesis that there is no within-group
agreement. It is difficult to determine the statistical significance of rwg(j) values, as the sampling distribution of rwg(j)
for various null distributions is not known (James et al., 1984). Despite cautions (James et al., 1984) and available
alternatives (e.g., specifying a null distribution or utilizing a Monte Carlo simulation to evaluate significance [Charnes
& Schriesheim, 1995]), a rectangular distribution is most frequently used. The rectangular null distribution assumes
completely random responding, with an equal number of responses expected in each category. For example, on a five-point
Likert scale, it assumes that 20% of the respondents will select each of the five response alternatives. Since such a
pattern is rarely observed in organizational samples, the rwg(j) coefficient is usually overstated (James et al., 1984).
Another problem with the index is that the number of items in a scale affects the size of the coefficient. The greater the
number of items, the larger is the index (Lindell & Brandt, 2000; Schriesheim et al., 1995). Finally, the result of rwg(j)
calculations is an index for each group. That makes it helpful for deciding which of multiple groups show enough
agreement to justify aggregation, but makes questionable whether the average rwg(j) is appropriate for making an
overall aggregation decision for a measure across a set of groups.
Despite its drawbacks, rwg(j) is frequently used to justify aggregation in the organizational climate literature. Baer &
Frese (2003) calculated rwg(j) for four variables (e.g., climate for initiative and climate for psychological safety) for
each of 47 companies. Cogliser & Schriesheim (2000) calculated rwg(j) coefficients for twelve variables (including five
measures of organizational climate) for each of 65 different work groups. (Notably, they used a slightly skewed
distribution rather than the rectangular distribution, and found that less than half of the rwg(j) coefficients met the
agreement criteria.) Dietz et al. (2003) calculated rwg(j) for procedural justice climate, while Dietz et al. (2004)
calculated rwg(j) for both service climate and customer satisfaction. Gibson & Birkinshaw (2004) used rwg(j) to evaluate
agreement on five variables, including ambidexterity and organization context. Griffin & Mathieu (1997) used rwg(j)
to evaluate agreement in eight scales (including three climate measures). Finally, Lindell & Brandt (2000) calculated
rwg(j)* (an alternative rwg(j) index using average variance of the items) for four measures of climate.
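The multi-item rwg(j) formula can be sketched as follows. The sketch assumes the rectangular null variance (A² − 1)/12 and uses the population (n-denominator) variance for the observed item variances; implementations differ on that choice, and the two example groups are fabricated for illustration:

```python
import numpy as np

def rwg_j(ratings, a_points=5):
    """Multi-item r_wg(j) (James, Demaree, & Wolf, 1984) for one group.

    `ratings` is an (n_raters, n_items) array; the null is the rectangular
    (uniform) distribution with expected variance (A^2 - 1) / 12 on an
    A-point scale. Population (n-denominator) item variances are used.
    """
    j = ratings.shape[1]
    mean_var = ratings.var(axis=0).mean()    # mean observed item variance
    ratio = mean_var / ((a_points ** 2 - 1) / 12.0)
    return (j * (1 - ratio)) / (j * (1 - ratio) + ratio)

# High agreement: 10 raters answer 6 items with "4"; one dissenter says "3".
agree = np.full((10, 6), 4.0)
agree[9, :] = 3.0

# No agreement: responses spread evenly over the five scale points, which
# reproduces the rectangular null variance of 2.0 exactly.
uniform = np.tile(np.arange(1.0, 6.0), 2).reshape(10, 1).repeat(6, axis=1)

print(f"high agreement: {rwg_j(agree):.2f}, uniform: {rwg_j(uniform):.2f}")
```

The high-agreement group comes out near 1 while exactly uniform responding yields 0; values can even turn negative when observed variance exceeds the null variance, which is one reason the choice of null distribution matters so much in practice.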

3.4. WABA

Within and between entities analysis (WABA; Dansereau et al., 1984) is a procedure that is used in the
organizational literature to both justify aggregation and investigate multilevel relationships between measures. There
are three steps in WABA. First, original raw scores are partitioned into within and between unit scores and used in
evaluations of variance (WABA I) and covariance (WABA II). In the final step, raw score correlations are decomposed
into within and between unit components. These decomposed raw score correlations are used in conjunction with the
findings from the first two steps to make a final judgment as to where the effects are taking place (within units, between
units, both within and between units, or neither within nor between units) and ultimately about the appropriateness of
aggregation.
There are concerns associated with this methodology. One issue is that etas are used to evaluate variance in the first
step of WABA (WABA I). A problem highlighted by George & James (1993) is that when group scores suffer from
range restriction, WABA I conclusions may be erroneous. Additionally, while some of Cogliser & Schriesheim’s
(2000) WABA findings supported their rwg(j) conclusions, some findings were contradictory. Likewise, Markham &
Halverson’s (2002) WABA I results contradicted the conclusions drawn from ICC(1) and ICC(2) values.

3.5. Justifying aggregation in hierarchical linear modeling

Hierarchical linear modeling (HLM; Raudenbush & Bryk, 2002) is a relatively recent method that is being
increasingly used to investigate relationships among measures at multiple levels. Since it is typically applied in a way

that assumes that measures have equivalent meaning at individual and aggregate levels, scholars using it have used the
preceding approaches to evaluate the appropriateness of using the aggregate measures. Although the program does not
specifically report any indices to help the user justify aggregation, it provides information for each variable that lets the
user easily calculate ICC(1) (Hofmann, 1997; Raudenbush & Bryk, 2002, pp. 23–24, 36, 71). For example, in their
comprehensive analysis, Naumann & Bennett (2000) evaluated a procedural justice climate variable using rwg(j), eta-
squared, and ICCs before aggregating and testing their hypotheses using HLM. Also employing multiple techniques,
Glisson & James (2002) evaluated within-group agreement using rwg(j) tests, evaluated between-group differences
using eta-squared and ICCs, and finally tested their cross-level hypotheses (concerning teams and individuals) using
HLM. Seibert, Silver, & Randolph (2004) used ANOVA to evaluate between-group variance and ICC(1) to examine
within-group agreement for a measure of empowerment climate (to justify aggregation), and then used HLM to
investigate their multi-level hypotheses.

3.6. The current status of aggregation issues in organizational research

Our review indicates that organizational scholars use a number of methods to evaluate the between and within group
differences of measures designed at the individual level prior to aggregating. All of these approaches, however, focus
on aggregating scales and do not consider possible differences in the metric structure associated with inter-item
relationships at different levels of analysis. This lack of investigation is possibly due to the history of the field of
organizational behavior. Historically, research in this field has been concerned with developing and validating
individual level variables. Aggregating them to a higher level has generally been of secondary concern. Consequently,
how the aggregate-level variable differs from the lower-level variable has typically not been investigated. Bliese (2000)
refers to this situation as the “fuzzy composition process” (p. 369). He defines this as a situation where the aggregate
variable represents a similar yet different construct than its lower-level counterpart. That is, “the aggregate maintains
close links to its lower-level counterpart but nevertheless differs in subtle and important ways” (Bliese, 2000, p. 369).
While he recommends use of ICC(2) to estimate group-mean reliability, he does not make any further analytical
recommendations.
These debates about how to evaluate the appropriateness of aggregation in the organizational literature all have one
thing in common. They take for granted that measures should first be constructed based on theory and metric structures
at the individual level. As a second step, they evaluate whether it is appropriate to aggregate the individual level
measures. Occasionally, a similar approach has been taken in the international literature. For example, ICCs have been
used to evaluate whether or not the differences between nations in individual level constructs are large enough to justify
research that predicts these differences from other nation level measures (Peterson, et al., 1995; Van de Vliert, Huang,
& Levine, 2004). We now look at cross-cultural research and see a quite different picture — the strength is a careful
attention to metric structure at different levels, while the limitation is a general assumption that aggregating items is
readily justified.

4. Level of analysis in national culture research

The level of analysis issue was brought to the attention of cross-cultural scholars first in application to one particular
study by Hofstede (1980), then framed for broader application by Leung & Bond (1989), and subsequently represented
by separate sets of measures for individuals and for nations by Schwartz (1992, 1994, 1999), Schwartz & Bilsky (1987)
and Leung & Bond (2004). However, unlike in the organizational literature, the level of analysis issue given most
attention in this literature is the one of the metric characteristics of items in surveys administered to individuals, but that
are then used to represent national or societal cultures. Since few studies have data about societal culture dimensions
from more than a handful of societies, few studies have had the data needed to evaluate the metric structure of items
aggregated to the societal level.

4.1. Ecological and reverse-ecological fallacies in measure design

The issue of level of analysis in the design of national culture measures was raised by Hofstede (1980, 2001). He
drew attention to the “ecological fallacy” (Robinson, 1950) of taking a conclusion about relationships found at a
collective level and applying them at the individual level. What he called the “reverse ecological fallacy” was applying

relationships among variables at the individual level to make conclusions at a collective level. It is the reverse
ecological fallacy that has been a particularly common problem in subsequent research. Whereas most prior research
had applied the issue of such fallacies to relationships among measures, Hofstede applied them to measure design. In
measure design, the reverse ecological fallacy frequently appears when scholars construct individual level measures
and use Hofstede's nation-level measures and theory to interpret them.

4.2. Hofstede's approach to aggregate measure design

Hofstede applied these insights about level of analysis in measure design to overcome a problem that he had
encountered when trying to create individual-level measures and aggregate the measures to the nation level. In effect,
he had tried to follow what has become the traditional organizational culture approach to measure design in order to
create national culture measures. His new procedure was based on aggregating each item to the nation level before
evaluating their metric structure. He developed two of his nation-level measures (individualism–collectivism and
masculinity–femininity) based in part on a factor analysis of a set of work goal items aggregated to the nation level and
corrected for response bias. He developed the other two of his nation-level measures (power distance and uncertainty
avoidance) by selecting a key item representing each construct and checking to see which of a set of conceptually
related items were correlated with them. Hofstede & Bond (1988) followed a similar procedure to develop a set of
measures based on Confucian culture. Considerable controversies remain about the specifics of his conclusions, but the
practice of first aggregating items to the nation level and then checking the metric structure has continued to have
influence.
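The contrast that motivates this procedure can be illustrated with simulated data: two items that correlate only weakly across pooled individuals can correlate very strongly once each is aggregated to the nation mean, because aggregation averages away within-nation noise. Everything below (nations, sample sizes, loadings) is a hypothetical construction, not Hofstede's data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two items sharing a nation-level influence but with independent
# individual-level noise: 40 nations, 200 respondents each.
n_nations, n_resp = 40, 200
nation = np.repeat(np.arange(n_nations), n_resp)
trait = rng.normal(0, 1, n_nations)              # shared nation-level trait

item_a = 0.5 * trait[nation] + rng.normal(0, 1, nation.size)
item_b = 0.5 * trait[nation] + rng.normal(0, 1, nation.size)

# Pooled individual-level ("pancultural") correlation.
r_individual = np.corrcoef(item_a, item_b)[0, 1]

# Ecological correlation: aggregate each item to its nation mean first.
mean_a = np.array([item_a[nation == j].mean() for j in range(n_nations)])
mean_b = np.array([item_b[nation == j].mean() for j in range(n_nations)])
r_ecological = np.corrcoef(mean_a, mean_b)[0, 1]

print(f"individual-level r = {r_individual:.2f}, "
      f"nation-level r = {r_ecological:.2f}")
```

Reading the strong nation-level correlation back into individuals would commit the ecological fallacy; building nation-level dimensions from the weak individual-level correlation would commit the reverse ecological fallacy.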

4.3. Pancultural, within-culture, and cross-cultural analyses: implications for measure design

A second key step in establishing the distinction between individual-level and nation-level metrics in measure
design was taken by Leung & Bond (1989). They distinguished between pancultural relationships, within-culture
relationships, and cross-cultural relationships. Pancultural relationships are consistent with the starting point for
individual-level measure design that is described above as being typical in the organizational literature. That is,
measures are ideally designed at the individual level for people from a broad range of organizations, departments and
groups. In cross-cultural research, it would mean that measures are designed by evaluating psychometrics in a way that
combines respondents across all nations. Within-culture relationships are those that are found in each nation taken
separately. The closest analogy in organizational behavior research might be constructing unique personality or
performance evaluation measures for each of several organizations out of a pool of personality or performance
evaluation items. Cross-cultural relationships, though, are the sort of thing that Hofstede used as a basis of his culture
dimensions. These are relationships found after data for separate items have been aggregated to a target level. The
Leung & Bond (1989) article does not differentiate between relationships among items and relationships among
constructs. However, its application to relationships among items has been particularly important in several subsequent
international projects.
The first such application that explicitly referenced the Leung & Bond (1989) procedure was Triandis's project
distinguishing loosely related measures at two levels of analysis. Triandis et al. (1993) used Leung & Bond's (1989)
procedure to develop a measure of individualism–collectivism at the societal level and a measure of idiocentricism–
allocentrism at the individual level. Other projects followed suit. Schwartz (1994) shows the distinction between
measures constructed at the individual and societal levels quite clearly by providing 10 measures of individual level
values as compared to 7 societal level measures of values. Smith, Dugan, & Trompenaars (1996) identified three
nation-level measures in a consulting data set collected by Trompenaars. Goodwin, Nizharadze, Luu, Kosa, &
Emelyanova (1999) report following the Leung & Bond (1989) procedure. Fu et al. (2004) took this approach in
constructing measures of influence strategies before using HLM to link them to the cultural values data provided by
GLOBE (House et al., 2004).

4.4. The current status of aggregation in cross-cultural research

The preceding history of aggregation issues in the cross-cultural literature shows a very different set of concerns
than does the literature about creating organizational culture measures. It shows considerable concern to develop

measures based on the metrics of items, not scales, after the items have been aggregated to the societal level. These
cross-cultural studies show little interest in using ICCs, rwg's or other statistics to evaluate the appropriateness of
aggregating items to the societal level. At most, the appropriateness of aggregating items is implicit in the relationships
among the items once they are aggregated. That is, if there is no between-group variability in an item at the individual
level, then the item will have minimal variance and be minimally correlated with other items at the aggregate level. The
reason for the difference in concern is probably related to the level of analysis of the focal concepts — cultural
characteristics of societies. It is probably also made possible by the large sample sizes both overall and for individual
societies in the large cross-cultural projects. It is this typically large sample size that allows relationships to be found
between society-level variables in different projects (e.g., Hofstede, 2001, pp. 503–520; Smith, Peterson, & Schwartz,
2002) despite typically greater variability within nations than one would expect within groups or organizations (Au,
1999; Au & Cheung, 2004).

5. Level of analysis of measure development in GLOBE

The GLOBE authors show an interest in realizing the project's potential to create organization-level and society-
level measures rather than individual-level measures (Hanges & Dickson, 2004, pp. 124, 127, 146). The analyses
used also show an awareness of both the organizational culture and societal culture literatures about aggregation,
albeit a somewhat confused awareness. In particular, they endorse the approach that Schwartz (1992, 1994)
conducted to design his measures, although they focus particularly on the design of his individual-level measures
(Hanges & Dickson, 2004, pp. 123–124). They use the term “convergent–emergent” constructs to describe their
measures (e.g., Hanges & Dickson, 2004, p. 124), but the logic of the measure development shows a mixture of
ambiguity about what the project did and confusion about what the project could do to create measures. The details
provided suggest that the gist of the approach was to design measures at the individual level, justify aggregating the
individual-level measures to a target aggregate level (either organizational or societal), then check to see whether
items composing each individual-level measure were interrelated when the items were aggregated to the
organizational or societal levels. This approach is consistent with the one taken in most organizational research, as noted
above. The rhetoric as detailed below, including the appeal to the project of Schwartz (1992, 1994), suggests that the
intent was to incorporate the sort of measure design approach that is described above as being typical in the cross-
cultural literature.

5.1. Basis of GLOBE measures in individual-level metric structures

The basis of the psychometric evaluation of the measures in individual-level metric structures is reflected
throughout the GLOBE book's main measure development chapter (Hanges & Dickson, 2004) and the GLOBE book
chapter about organizational culture (Brodbeck, Hanges, Dickson, Gupta, & Dorfman, 2004). For example, “A first-
order exploratory factor analysis on the leader attributes items yielded 16 unidimensional factors” (Hanges & Dickson,
2004, p. 128) indicates that the leadership measures were designed based on what Leung and Bond (1989) would refer to as a pancultural analysis of individuals, since 16 factors would be unlikely to emerge from a society-level analysis of 28 societies.
For the organizational and societal culture measures, the aggregation process that GLOBE reports also suggests that
they aggregated scales that were first designed based on individual-level metric structures (Hanges & Dickson, 2004,
pp. 132–136). Consistent with the organizational culture literature, Hanges and Dickson (2004, p. 127) report ICCs for composite measures rather than for items. They also indicate that GLOBE used rwg, ICCs, and one-way analysis of variance to evaluate whether scales constructed at the individual level could be meaningfully aggregated to the societal or organizational levels. In so doing, the GLOBE project attempts to follow the procedure described above as typical of organizational research and of the few international comparative studies that evaluate whether there is enough between-nation variability, relative to within-nation variability, in an individual-level measure to talk about differences in national averages (e.g., Peterson et al., 1995).
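To make the first of these aggregation statistics concrete, the sketch below (ours, not GLOBE's actual code) computes rwg(j) for one group on a multi-item scale using the conventional uniform ("rectangular") null distribution for A response options:

```python
import numpy as np

def rwg_j(item_scores, n_options):
    """Within-group agreement r_wg(j) (James, Demaree, & Wolf, 1984)
    for one group on a J-item scale. `item_scores` is raters x items;
    `n_options` is the number of response options A, with the uniform
    null variance sigma_E^2 = (A^2 - 1) / 12."""
    x = np.asarray(item_scores, dtype=float)
    j = x.shape[1]                                # number of items J
    mean_var = x.var(axis=0, ddof=1).mean()       # mean observed item variance
    sigma_e2 = (n_options ** 2 - 1) / 12.0        # expected null ("error") variance
    ratio = mean_var / sigma_e2
    return (j * (1.0 - ratio)) / (j * (1.0 - ratio) + ratio)
```

Perfect within-group agreement (zero item variance) yields rwg(j) = 1, while item variances approaching the uniform-null variance drive the index toward 0.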

5.2. Level of analysis inconsistencies in GLOBE

Despite the evidence of the procedure actually followed, Hanges and Dickson (p. 127) indicate that “It should be
noted that because we wanted these scales [leadership, organizational culture, societal culture] to measure
organizational or societal level and not individual level variation, we performed these analyses [ICCs, rwg, ANOVA] on
the means of the country item responses for each scale.” This sentence goes to the core of a misunderstanding of the
difference between the aggregation issues addressed in the organizational culture and the societal culture literatures. In
fact, aggregate “internal consistency” indicators are provided that are based on society-level means of items that are
part of what would be needed to create such scales. However, these tend to be low for the organizational measures and
were apparently not the basis for making decisions about measure design. Further, the lack of a systematic evaluation of
scale correlations at the society level makes it unclear whether the constructs are sufficiently distinct to be considered
independent society-level dimensions. As detailed below, Dickson et al. (2006-this issue) give substantial evidence that the leadership scales are too highly correlated to be considered distinct at the organizational level. Similarly, data from the GLOBE book give at least some evidence that the societal culture scales may be too highly correlated to be considered distinct at the nation level.
The confusion about level of analysis is not confined to Chapter 8 but also appears in Dickson et al. (2006-this issue).
Using individual-level confirmatory factor analysis to evaluate measure adequacy is inconsistent with the apparent
intent of constructing measures based on metric structures at the organizational level. It sometimes seems to
indicate that the individual level data structures were used to make decisions about how organization and society
level scales would be constructed. On page 17 (Dickson et al., 2006-this issue) the authors suggest that “These
standard scores were then combined to compute the mechanistic-organic scales for each person based on the
previous factor analysis. These standardized scale scores were then aggregated to the organizational level of
analysis….” It is unclear what is meant by “each person” if the measures were designed based on the measurement
structure of society-level item means. On p. 23 (Dickson et al., 2006-this issue), the authors also seem to suggest
that the scales were created based on individual level data structures, that ICCs were then used to check the
appropriateness of aggregating the individual level scales (not items), then that the individual level scales were
aggregated to the society level. Taken along with the very high correlation between transformational and
considerate leadership at the organization level, it appears that the scales were formed based on individual level
psychometrics then aggregated.
In summary, although the descriptions of the judgments made about measure composition during the design
process are unclear at points, it appears that the GLOBE project first evaluated the measures of its three main
categories of variables, leadership, organization culture and national culture based on individual-level metrics, then
aggregated them to the society level. Although information about internal consistency among aggregated items is
provided, it is not clear that even this limited information was used in decisions about how to construct societal-level
measures. There are other questions about whether GLOBE used the best available approaches, even within the organizational literature, for justifying the aggregation of individual-level scales. For example, GLOBE seems to have placed heavy reliance on average rwg(j) to evaluate the appropriateness of aggregation, despite the controversies noted above regarding the use of a rectangular (uniform) null distribution to represent error, which can inflate the apparent degree of agreement, and despite the potential that a modest average rwg(j) may hide organizations or societies in which the aggregate construct may not be meaningful. However, the main issue in the present analysis is not about how
GLOBE followed the organizational literature to justify the aggregation of individual level scales. Instead, the issue
is that GLOBE's focus is on aggregating individual level scales rather than constructing scales based on aggregated
items. The main point of ambiguity is that when the GLOBE group emphasizes its purpose of developing
organizational and societal measures, their insistence that the measures are not appropriate for individual level
research suggests that they wished to evaluate their measures based on organizational and societal level metric
structures (Hanges & Dickson, 2004, pp. 124, 146).

5.3. Tensions in integrating the organizational and cross-cultural literatures about aggregation

The unclarity that appears in the GLOBE discussion of the two very different logics for constructing aggregate
measures reflects differences noted above in the history of measure development in the organizational culture and
national culture literatures. Scholars working in these two literatures have not integrated the different approaches to
aggregation issues and it has taken a project of the scope of GLOBE to bring the problem to the fore. Although
aggregating individual level scales can be appropriate for particular research questions, the logic underlying the
GLOBE book and the Leadership Quarterly article is to construct measures based on organization or society level data
structures. The GLOBE group appears to have been misled in its application of ICCs by missing the distinction
between the two purposes that they note for developing measures for country-level comparison. Their attempt to combine “convergent” (fundamentally individual-level, pancultural) and “emergent” (fundamentally aggregate-level, cross-cultural) constructs muddles the discussion and may have muddled the measure design process. It also produces a misfit between measures and theory by placing the emphasis on the “convergent” in decisions about measures but on the “emergent” in describing the measures' purposes and implications. The next section seeks to sort out
these purposes more clearly and in a way that draws from both the organizational and cross-cultural literatures about
aggregation.

6. Three alternative analysis procedures for aggregate measure development

In order to suggest directions that might be taken not only in the GLOBE project, but also in organizational
and cross-cultural research, we suggest that scholars distinguish between and make increased use of three
approaches to aggregate measure design. Table 1 summarizes these three approaches, outlines the steps for designing measures and evaluating aggregation in each, and indicates the type of constructs for which each analysis approach is appropriate. The create individual-level scales and aggregate (ILSA) approach is appropriate when aggregate-level differences in theoretically individual phenomena are being studied. The aggregate items, create aggregate scales (CAS) approach is appropriate when theoretically aggregate phenomena are being studied and there are few items for each individual-level indicator. The create individual-level scales, aggregate, create aggregate scales (ILSA/CAS) approach is appropriate when theoretically aggregate phenomena are being studied and the data set has a large number of items, with multiple items for each individual-level indicator.

6.1. ILSA

Many constructs in the climate literature, including measures of satisfaction, personal attachment to a group, and forms of leadership that deal with the way superiors interact with subordinates, are fundamentally individual-level constructs. The decision that a construct is an individual-level construct does not mean that it cannot be
aggregated. Aggregation can be appropriate when conditions at group, organizational or societal levels produce
systematic differences in these individual level constructs. For example, economic stress, natural disasters,
political stress and the like can well result in periods when the average satisfaction level of people in some
societies is higher than that in others. Similarly, personality traits are fundamentally individual level constructs,
but there may be phenomena at the nation level that will produce sufficient differences between nations in
average levels of personality traits (notably those affected more by socialization than genetics) so that
aggregation to the nation level may be meaningful. For example, extroversion may be more socially accepted in
some nations than in others, but such differences in acceptance do not change the individual level nature of
extroversion. For such constructs, we recommend that scales be constructed at the individual level and that the procedures typical in the organizational literature be followed to evaluate whether there is enough within-group homogeneity, or enough within-group homogeneity compared to between-group difference, to justify aggregating them.

Table 1
Approaches to aggregate measure construction

Create Individual-Level Scales, Aggregate (ILSA)
  Procedure: create and evaluate individual-level scales; evaluate degree of within-group agreement (compared to between) for scales; aggregate scales.
  Examples: climate and culture studies; Peterson et al. (1995).
  Appropriate for: personality; attitudes; personal values; personal relationships.

Aggregate Items, Create Aggregate-Level Scales (CAS)
  Procedure: conceptually evaluate individual-level items; aggregate items; create and evaluate scales based on relationships among aggregated items.
  Examples: Hofstede (1980, 2001); Schwartz (1994); Leung and Bond (1989); Triandis (1995); Smith et al. (2002); Hofstede et al. (1990).
  Appropriate for: normative values; perceived societal values; institutionalized practices.

Create Individual-Level Scales, Aggregate, Create Aggregate-Level Scales (ILSA/CAS)
  Procedure: create and evaluate individual-level scales; evaluate degree of within-group agreement (compared to between) for individual-level scales; create and evaluate configural aggregate scales based on relationships among aggregated scales.
  Examples: see Table 2.
  Appropriate for: normative values; perceived societal values; institutionalized practices.
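The within-group versus between-group check recommended for such ILSA-type constructs can be sketched as ICC(1) computed from a one-way random-effects ANOVA. This is a hedged illustration of the standard formula (Bliese, 2000), not any particular project's code:

```python
import numpy as np

def icc1(groups):
    """ICC(1) from a one-way random-effects ANOVA:
    (MSB - MSW) / (MSB + (k - 1) * MSW), with k the (average) group size.
    `groups` is a list of 1-D arrays of individual-level scale scores,
    one array per organization, nation, or other aggregate unit."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = np.mean([len(g) for g in groups])          # average group size
    grand = np.concatenate(groups).mean()          # grand mean
    msb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (len(groups) - 1)
    msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (
        sum(len(g) for g in groups) - len(groups))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Values near 1 indicate that most variance in the individual-level scale lies between units, supporting aggregation; values near or below 0 indicate that aggregation would not be justified.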

6.2. CAS

Our second category is reflected in fewer projects, notably Hofstede (2001) and Schwartz (1994) at the society or
nation level and Hofstede, Neuijen, Ohayv, & Sanders (1990) at the organization level. Cross-cultural scholars use
many constructs derived from individual value items. The logic of that use is that the individual values are so strongly
affected by characteristics of societal institutions that it is useful to think of them as surrogates for institutional
measures. From an individualistic perspective, it is natural to reduce social processes to the actions of individuals.
However, individuals also act in a way that reflects what is typical in a society. An obvious example is language. An
individual has no choice at all over what language he or she learns first, although a reflective person might develop
attitudes about how much they like their language. In this view, what one finds normal based on childhood experience is largely a societal matter determined by context, whereas what one prefers is a more complex individual-level matter best represented by individual differences in choices and preferences.
Treating individual level data in this way is especially appropriate when respondents are explicitly asked to
describe norms of their society rather than their personal preferences. Nevertheless, values scholars have made
convincing arguments that values are sufficiently shaped by societal norms (Hofstede, 2001; Schwartz, 1994) that
using the CAS approach to study aggregate societal values would be useful. When respondents are asked to
describe societal norms, it is useful to check whether there is enough consistency within a society to conclude that
they are really describing norms rather than personal views or answering at random (indicating that the items are
not meaningful to them). When respondents are asked to describe values, consistency is appropriate for studies
comparing means, but other distribution characteristics (e.g., variances) have the potential to provide additional
societal insights (Au, 1999). Consequently, we suggest that scholars use rwg, ICCs, and ANOVAs on items (not
scales) to justify aggregating the items and only use those that appear empirically appropriate for creating aggregate
level scales. Hofstede et al. (1990) provide an example in constructing organizational culture measures. This
approach has a limitation. Since it is items that are aggregated, it is necessary to assume that each item has
reasonably similar meaning in each nation (or other aggregate unit) that one studies. This limitation has rarely been
explicitly addressed in the cross-cultural literature. The next alternative approach to measure design overcomes this
limitation.
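The core CAS sequence, aggregating screened items to unit means and then examining the measurement structure among those means, can be sketched with simulated data (all sample sizes and variable names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n_societies, n_respondents, n_items = 40, 60, 6

# simulated item responses: a society-level component plus individual noise
society_component = rng.normal(0.0, 1.0, (n_societies, 1, n_items))
responses = society_component + rng.normal(0.0, 1.0, (n_societies, n_respondents, n_items))

# CAS step: aggregate each item to its society mean ...
item_means = responses.mean(axis=1)                 # societies x items

# ... then evaluate structure among the AGGREGATED items; here, the
# society-level correlation matrix that would feed scale construction
society_corr = np.corrcoef(item_means, rowvar=False)
print(society_corr.shape)                           # prints (6, 6)
```

The key point of the sketch is that the correlations entering scale construction are computed across societies, not across individuals.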

6.3. ILSA/CAS

Our third category is a variant on the second that requires an unusually large number of items and aggregate units.
In other words, it is particularly appropriate for GLOBE. We provide an example in Table 2 that takes the next step
toward an ILSA/CAS approach to measure design starting from the ILSA approach that the GLOBE project
emphasized to create its societal culture scales. Table 2 shows varimax rotated factor analysis results for measures of
societal culture provided in the GLOBE book. Factor analysis is done separately for GLOBE measures asking what
respondents perceive their society as being like (“as is”), and what they would like their society to be like (“should
be”). The steps to follow in this approach combine the two preceding approaches in sequence. Scales are first
constructed at the individual level. The scales are then evaluated using ICCs, rwg's, or ANOVAs as appropriate to
evaluate the value of aggregating to the target level. The aggregate means of these scales that were designed at the
individual level are then analyzed using the results of procedures such as the factor analyses shown in Table 2. For
example, five of the “as is” measures load sufficiently highly on the first varimax rotated factor that they might be
combined into a single index. The ILSA/CAS approach requires a large enough number of items reflecting each
indicator at the individual level to create scales at that level. This approach has an advantage over the CAS approach.
In the CAS approach, the items have unknown reliability and validity at the individual level. Creating individual
level scales overcomes this limitation.
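The final ILSA/CAS step, factoring society-level means of scales that were first built at the individual level, can be sketched as follows. The varimax routine is the standard SVD-based textbook algorithm (Kaiser normalization omitted for brevity), and the data are simulated stand-ins for aggregated scale means, not the procedure actually behind Table 2:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loadings matrix
    (standard iterative SVD algorithm; no Kaiser normalization)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < d * (1.0 + tol):               # converged
            break
        d = s.sum()
    return L @ R

# simulated society-level scale means: 60 societies x 9 aggregated scales
rng = np.random.default_rng(7)
scale_means = rng.normal(size=(60, 9))

# principal components of the society-level correlation matrix ...
corr = np.corrcoef(scale_means, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:3]               # retain three components
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

# ... rotated toward simple structure, analogous to Table 2
rotated = varimax(loadings)
```

Because the rotation is orthogonal, each scale's communality (its summed squared loadings) is unchanged; only the pattern of loadings across components is simplified.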
In order to choose among these approaches, scholars first need to evaluate whether the phenomena they are studying
are fundamentally individual or aggregate phenomena. To guide this choice, the third column in Table 1 suggests that
some constructs, like many aspects of personality, attitudes, personal values, relationships between an individual and
various others (leaders, group members), are meaningfully constructed at the individual level. For such measures, much individual-level variability is to be expected due to variability in personal experiences within a society. For such measures, the possibility that there may be differences at an aggregate unit may be an interesting hypothesis, but is not inherent in the concepts themselves. For example, Schneider's (1987) attraction–selection–attrition (ASA) model of climate suggests that individuals having certain personalities or values may be attracted to an employer based on the employer's image, selected based on the values reflected in an employer's HR practices, and may leave based on incompatibilities between the person and their organization. This sequence is quite different from one based on the view that values are created by organizations. If that were the case, then values constructs might better be studied at the organization level. Table 1 also suggests that other constructs that can be derived from individuals are likely to be so strongly affected by an individual's association with a larger unit that measures are best constructed based on the metric structure at an aggregate level.

Table 2
Example of factor structure based on configural aggregation of scales: uncorrected GLOBE culture scales

Cultural “as is” measures
                               Component
                               1      2      3
Assertiveness                 .08   −.79    .32
Performance orientation       .71    .16    .50
Gender egalitarianism        −.02    .03   −.81
Institutional collectivism    .50    .68    .01
In-group collectivism        −.65    .17    .54
Humane orientation            .00    .83    .32
Future orientation            .89    .03    .10
Power distance               −.68   −.26    .41
Uncertainty avoidance         .89    .02   −.03

Cultural “should be” measures
                               Component
                               1      2      3      4
Assertiveness                 .19    .09   −.06    .88
Performance orientation       .07    .84    .16   −.08
Gender egalitarianism        −.78    .36    .20   −.23
Institutional collectivism    .47    .41   −.01   −.59
In-group collectivism         .13    .86   −.14    .03
Humane orientation           −.08   −.10    .95    .06
Future orientation            .75    .48   −.01   −.05
Power distance                .21   −.44   −.62    .39
Uncertainty avoidance         .88    .17   −.10    .03

Extraction method: principal component analysis. Rotation method: varimax with Kaiser normalization.

7. Implications for national or societal culture research in organization studies

The preceding history of level-of-analysis assessment in international organizational research indicates that the first two approaches shown in Table 1 for creating societal-level measures have frequently been followed in large multiple-nation projects. Cross-cultural scholars still need to take care to recognize the difference. The ILSA approach is appropriate when personality, role stress, or supervisor–subordinate leadership measures are aggregated to see whether they are predicted by nation-level culture measures designed using a CAS approach. The CAS approach is appropriate for items that are sufficiently linked to societal norms, and to individuals' socialization into the norms of a particular organization or society, that it is helpful to think of them as indicators of organization or society characteristics rather than only as random individual differences.
We recommend that scholars consider the ILSA/CAS approach when dealing with broad scope, large sample
comparative data bases. The original Hofstede (2001) database may have too few items relevant to the four
culture dimensions to first create individual-level scales. The Inglehart, Basanez, & Moreno (1998) database has a
sufficiently large number of items and data from a large enough set of nations to consider this approach. In fact,
Au (1999) suggests something closely analogous when he suggests that variances rather than means be the
aggregation metric used and scales then be constructed at the nation level based on variances. The Schwartz
(1994) project may have the potential to consider the ILSA/CAS approach. The GLOBE project, as detailed
below, clearly has that potential.

8. Implications for organizational climate and culture research

The CAS and ILSA/CAS approaches have the potential to re-energize the field of organizational culture research.
Organizational culture research continues to struggle to find a clear niche in organization studies (Ashkanasy,
Wilderom, & Peterson, 2000). The analogy that some “cultural” characteristics of societies are also found in
organizations has some merit, but also should be taken with caution. For example, the taken-for-granted quality that can
be found in organizations is somewhat similar to what is found in societies. Both are linked to unconscious automatic or
System 1 cognitions (Kahneman, 2003). As Hofstede (2001) implies in the sharp distinction he makes between national
and organizational culture, the scope and resilience of such cognitions that develop through organizational socialization
may be more focused and more amenable to change than those that develop through childhood socialization. Whether
or not this is the case, socialization into organizational norms is likely to be strong enough that the sort of CAS measure
design process that has been frequently used in cross-cultural analysis may be more useful than it is currently
recognized to be in organization culture analysis.
Increasing the use of the CAS and ILSA/CAS approaches has the potential to identify uniquely organizational
culture constructs. Hofstede et al. (1990) provide an example, albeit one that is based on only 20 organizations. They
argue that beliefs about and preferences for relatively concrete organizational policies and programs (termed practices)
are consequential for organizations, whereas societal norms about what is desired and desirable (termed values) are
more consequential for larger societies. Whether or not their position on this point is supported by future research, few
other research projects have constructed measures based on items aggregated to the organizational level. Instead, the
norm in organizational culture research is to take the ILSA approach developed in the organizational climate and group
literatures.

9. Implications for Project GLOBE

Project GLOBE deals with the issue of level of analysis using many of the more sophisticated techniques typical in
the organizational culture literature, but does not deal as clearly with the metric issues of level of analysis in the cross
cultural literature. The issue of the level of analysis at which measures are designed appears to be an element in the
distinction that the GLOBE scholars make between “first order” or “basic” factors versus “second order” or “global”
factors (e.g., GLOBE, p. 136). However, this distinction seems also to reflect an individual-level issue of whether or not
the 21 somewhat distinguishable individual-level leadership factors can be combined at the societal level. The
convergent–emergent logic that the GLOBE group has followed (Hanges & Dickson, 2004) seems to be applied in
such a way that the level of analysis actually followed in measure design is ambiguous. Apart from recommending
what steps should be followed, we also recommend that scholars who work with the GLOBE data base should be
careful to be clear and consistent when describing the level of analysis at which the metric structure of the measures
they report were designed. This sort of clarification may require redesigning the GLOBE measures rather than using
those for which information has been published to date.

9.1. Potential of the GLOBE database

GLOBE is distinctive in having developed a broader set of items focusing on management-related issues of
leadership, organizational culture and national culture in a larger number of nations than has been done before. The
issues we are raising here are not about the project's basic design, but about the data analysis approach taken to evaluate
and describe its measures. GLOBE has already taken an important step by constructing useful individual-level
measures and demonstrating that they can be meaningfully aggregated following the ILSA approach. The project's
scope provides the potential to either follow a CAS approach to design aggregate measures from items or to add a CAS
element onto its present ILSA approach. This addition would be to take the third step of using the aggregated
individual-level scales to create measures based on the organization-level and society-level data structures applied to
these scales as shown in Table 2. The few earlier projects that create measures based on society-level data structures,
notably Hofstede (2001) and Schwartz (1994) do so by aggregating items then creating scales based on the nation-level
data structures. It could also be helpful if GLOBE were to create scales based on this CAS method and compare them
with scales created using the ILSA/CAS method.

9.2. Completing the process of individual-level measure development

In order to complete the analyses that are presented in the GLOBE book, the authors need to provide additional
information. Since these are fundamentally individual-level scales, the individual-level information typically used to
evaluate scales needs to be provided. Apart from aggregation issues, Hanges and Dickson (2004, p. 147) explicitly and inappropriately reject the need to evaluate translation equivalence at the individual level. Documenting the individual-level metrics of the measures is just as important as, and logically prior to, evaluating whether they can be aggregated. If the GLOBE group had developed their measures based on items aggregated to the society level, their claim that confirmatory factor analysis cannot be used to evaluate equivalence would be appropriate. However, as it stands, a prior step in aggregating scales developed at the individual level is to evaluate equivalence between translations using confirmatory factor analysis. Individual-level reliability information should also be provided for
each nation. Given the large number of items and nations, some items are likely to be found to have been translated
inadequately, and some nations may show a whole set of problems. Given that social constructs are socially
constructed, it should not be surprising that ideas constructed in some societies are sometimes not meaningful in
others (Morris, Leung, Ames, & Lickel, 1999; Peterson & Pike, 2002). Finding some examples of non-equivalence
is just as likely to indicate that items have been designed from different societal bases and translated carefully as it
is to indicate a mistake by a translator. Once the evaluation of the measures at the individual level is completed, the
analyses of ICCs that assume meaningful individual level metrics (Hanges & Dickson, 2004, pp. 134–135) will
become easier to interpret.

9.3. Evaluating item aggregation and doing ILSA/CAS analyses

The GLOBE group has the potential to help the field of organization studies in two ways. One is to establish the
norm of evaluating which items should be aggregated before aggregating them to the organization or nation level and
using only items that pass this screen to evaluate measure structure at aggregate levels. Hofstede et al. (1990) have done
this before using ANOVA results as a criterion for aggregating to the organization level, but we find little evidence that
either organization or national culture scholars have noticed this innovation. The second is to extend their insight of
taking aggregated scales developed at the individual level and submitting them to factor analysis and other scaling
techniques at an aggregate level.
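One hypothetical way to implement such an item screen, in the spirit of the Hofstede et al. (1990) ANOVA criterion though not their exact procedure, is to compute for each item the share of variance lying between units and retain only items that clear a chosen cutoff (the cutoff value below is purely illustrative):

```python
import numpy as np

def between_unit_eta_squared(groups):
    """Eta-squared for one item: the proportion of total item variance
    lying between units (organizations or nations). `groups` is a list
    of 1-D arrays of individual responses, one per unit."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand = pooled.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_total = ((pooled - grand) ** 2).sum()
    return ss_between / ss_total

def screen_items(item_groups, cutoff=0.10):
    """Return indices of items whose between-unit variance share
    clears the cutoff; only these would be aggregated."""
    return [i for i, groups in enumerate(item_groups)
            if between_unit_eta_squared(groups) >= cutoff]
```

Items that fail the screen would be excluded before the aggregate-level measure structure is evaluated.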

9.4. Beginning to apply ILSA/CAS to GLOBE cultural values dimensions

Table 2 begins to take a next step in the ILSA/CAS approach to aggregation by providing exploratory factor
analyses of the GLOBE societal culture “as is” and “should be” measures based on the uncorrected society scores
provided in each chapter. (Scores corrected for response bias are also provided at the end of the GLOBE book.)
The “should be” items ask managers about the state of affairs they would prefer to see in their society. These
items seem to reflect fundamentally individual concepts, so that the ILSA approach to measure design would be
appropriate. The “as is” items in the GLOBE project ask managers to describe the state of affairs in their society.
They seem to reflect fundamentally societal concepts, so that the CAS or ILSA/CAS approaches to measure
design would be appropriate. Rather than presuming further on the GLOBE group by naming and offering
interpretations of the factors shown in Table 2, we wish only to indicate here that the dimensions they provide do
not appear to be independent at the societal level. We also notice that the factor loadings for different societal
culture measures are quite different for the “as is” and “should be” measures. Dickson et al. (2006-this issue) also
had the opportunity to create analogous measures that would be distinctively organizational in level. These could
be created following the CAS approach by aggregating items to the organizational level and analyzing the
organizational level item means. They could also be created by taking the ILSA/CAS approach and aggregating
individual-level scales to the organizational level, then analyzing the metrics of the organizational level scale
means. The GLOBE book (chap. 8, notably p. 127) asserts that organization- and society-level scales were designed after items were aggregated to the organization and society levels, respectively. As noted above, that appears not to be the case, although society-level reliability is provided. In order to complete the society-level metric
analysis following the CAS approach, GLOBE would need to provide information (like factor analysis) that also
reflects item correlations between scales. The GLOBE group would then need to actually make scale building
decisions based on this information.

9.5. Organization and nation level measures

One aspect of GLOBE's project design affects the ability to construct separate organization and nation level
measures. GLOBE's only partial success in fully executing its research plan of having several organizations
representing each of three industries in each society may make it difficult to clearly separate nation from organization.
The description of the sample in Dickson et al. (2006-this issue) implies that many countries were represented by a single organization in some industries. The organizational culture analyses reported in the GLOBE book (p. 659) are based on countries with more than one organization per industry. Even with that constraint, industry, nation,
and organization are somewhat confounded. This sort of confound is likely to make it difficult to separately evaluate
organization-level and nation-level metric structures. Still, the number of organizations per nation makes it appropriate
to press on with the effort.

10. Discussion

The present paper is intended to draw the attention of organizational behavior scholars and cross-cultural scholars
to aggregation issues in each area that have relevance for the other. GLOBE is a particularly good example of the
need to deal with both sets of issues. Organizational scholars working on projects with data from many organizations
need to recognize that the level of analysis issues they have encountered when studying relationships between
measures also apply to measure design. Organization culture scholars should consider the possibility that uniquely
organization-level measures of organization culture might be developed that are different from averages of measures
designed for individuals based on individual-level psychometrics. Scholars working on projects with data from many
nations should attend more carefully to issues that organizational behavior scholars have addressed when aggregating
individual-level data before analyzing it at the aggregate level. Doing so should help identify items that lack the
properties needed to contribute to measures created at the aggregate level. Given the modest
number of aggregate data points (nations) in even the largest studies, eliminating items that should not be aggregated
due to excessive individual-level variability may make it easier to construct aggregate measures successfully
following the CAS approach. To accomplish these purposes, we suggest three approaches to creating
aggregate measures and make recommendations about how different sorts of organizational research that aggregate
data collected from individuals should proceed.
A caveat that applies to all research about aggregation is that some units may be culturally heterogeneous.
Whether within-society variability makes it unreasonable to talk about national (or organizational) culture needs to
be carefully considered if measures are designed based on society-level metric structures. It may well be that one
organization, nation or society is too heterogeneous to include in a particular multiple nation analysis. Does that
mean that organization-, nation- or society-level measures should not be constructed? No. That would be equivalent
to the case where a particular individual shows too much complexity or instability for a measure of personality to
apply. Neither the occasional heterogeneous society nor the complex or unstable individual invalidates a measure
that is based on overall metrics at a particular level of analysis. Further, if the usual checks (ICCs, rwg, etc.) are
conducted, they directly respond to the issue of possibly excessive within-unit heterogeneity that is frequently
raised, but rarely directly addressed, in international comparative research. When evaluating whether aggregation is
appropriate in cross-cultural research, both the opportunity to collect large samples in each nation and the limitation
that there will be considerable variability within a nation must be recognized. It is the large samples and the
considerable between-nation variance often found that make it possible to overcome the limitation of
within-nation variability.
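The usual checks mentioned above can be illustrated with a minimal pure-Python sketch of two of them: ICC(1) computed from a balanced one-way ANOVA, and the single-item rwg of James, Demaree, and Wolf (1984), which compares observed within-unit variance to the variance expected under uniform (purely random) responding. The units, raters, and ratings below are invented illustration data.

```python
# Hypothetical sketch of two common aggregation checks. Not a full treatment:
# ICC(1) here assumes balanced groups, and rwg is shown for a single item.

def icc1(groups):
    """ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW) for n balanced groups of size k."""
    k = len(groups[0])
    n = len(groups)
    grand = sum(sum(g) for g in groups) / (n * k)
    msb = k * sum((sum(g) / k - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((x - sum(g) / k) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def rwg(ratings, options):
    """Single-item rwg: 1 - (observed variance / uniform expected variance)."""
    m = sum(ratings) / len(ratings)
    s2 = sum((x - m) ** 2 for x in ratings) / (len(ratings) - 1)
    sigma_eu = (options ** 2 - 1) / 12  # variance of a discrete uniform on 1..options
    return 1 - s2 / sigma_eu

units = [[4, 5, 4], [2, 2, 3], [5, 4, 5]]  # three units, three raters each
print(round(icc1(units), 2))               # → 0.82
print(round(rwg([4, 5, 4], options=5), 2)) # → 0.83
```

High values on both indices would support aggregating these ratings; a unit with low rwg would be the kind of heterogeneous case discussed above, which need not invalidate the aggregate measure for the remaining units.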

The design of the GLOBE project on which the article and book that stimulated our analysis are based has
unusual potential strengths that these publications have not yet exploited. In order to construct measures based on
item interrelationships at the organization or nation level, a project needs to have data from many nations or
organizations. Since organizational behavior scholars rarely have this sort of database, they are usually only able to
consider a very limited aspect of data structures at these levels. Until other similar databases are developed in
which aggregation to both the organization and nation levels is possible, the field will await further leadership from
the GLOBE scholars.

Acknowledgement

The authors would like to thank Kwok Leung, Lilach Sagiv, Chet Schriesheim and Peter Smith for their comments.

References

Ashkanasy, N. M., Wilderom, C. P. M., & Peterson, M. F. (2000). Introduction. In N. M. Ashkanasy, C. P. M. Wilderom, & M. F. Peterson (Eds.),
Handbook of organizational culture and climate (pp. 1−18). Thousand Oaks, CA: Sage.
Au, K. Y. (1999). Intra-cultural variation: Evidence and implications for international business. Journal of International Business Studies, 30(4),
799−812.
Au, K. Y., & Cheung, M. W. L. (2004). Intra-cultural variation and job autonomy in 42 countries. Organization Studies, 25(8), 1339−1362.
Baer, M., & Frese, M. (2003). Innovation is not enough: Climates for initiative and psychological safety, process innovations, and firm performance.
Journal of Organizational Behavior, 24, 45−68.
Bliese, P. D. (1998). Group size, ICC values and group-level correlations: A simulation. Organizational Research Methods, 1, 355−373.
Bliese, P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In K. J. Klein, &
S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 349−381). San Francisco: Jossey-Bass.
Brodbeck, F. C., Hanges, P. J., Dickson, M. W., Gupta, V., & Dorfman, P. W. (2004). Societal culture and industrial sector influences on
organizational culture. In R. J. House, P. J. Hanges, M. Javidan, P. W. Dorfman, & V. Gupta (Eds.), Culture, leadership, and organizations: The
GLOBE study of 62 societies (pp. 654−668). Thousand Oaks, CA: Sage.
Castro, S. L. (2002). Data analytic methods for the analysis of multilevel questions — a comparison of intraclass correlation coefficients, r(wg(j)),
hierarchical linear modeling, within- and between-analysis, and random group resampling. Leadership Quarterly, 13, 69−93.
Charnes, J. M., & Schriesheim, C. A. (1995). Estimation of quantiles for the sampling distribution of the rWG within group agreement index.
Educational and Psychological Measurement, 55, 435−437.
Choi, J. N., Price, R. H., & Vinokur, A. D. (2003). Self-efficacy changes in groups: Effects of diversity, leadership, and group climate. Journal of
Organizational Behavior, 24, 357.
Cogliser, C. C., & Schriesheim, C. A. (2000). Exploring work unit context and leader–member exchange: A multi-level perspective. Journal of
Organizational Behavior, 21, 487−511.
Dansereau, F., Alutto, J., & Yammarino, F. (1984). Theory testing in organizational behavior: The varient approach. Englewood Cliffs, NJ:
Prentice-Hall.
Denison, D. R. (1996). What is the difference between organizational culture and organizational climate: A native's point of view on a decade of
paradigm wars. Academy of Management Review, 21, 619−654.
Dickson, M. W., Resick, C. J., & Hanges, P. J. (2006-this issue). Systematic variation in organizationally-shared cognitive prototypes of effective
leadership based on organizational form. Leadership Quarterly, 17, 487−505. doi:10.1016/j.leaqua.2006.07.005
Dietz, J., Pugh, S. D., & Wiley, J. W. (2004). Service climate effects on customer attitudes: An examination of boundary conditions. Academy of
Management Journal, 47, 81−92.
Dietz, J., Robinson, S. L., Folger, R., Baron, R. A., & Schulz, M. (2003). The impact of community violence and an organization's procedural justice
climate on workplace aggression. Academy of Management Journal, 46, 317−326.
Fu, P. P., Kennedy, J., Tata, J., Yukl, G., Bond, M. H., et al. (2004). The impact of societal cultural values and individual social beliefs on the perceived
effectiveness of managerial influence strategies: A meso approach. Journal of International Business Studies, 35, 284−305.
George, J. M., & James, L. R. (1993). Personality, affect, and behavior in groups revisited: Comment on aggregation, levels of analysis, and a recent
application of within and between analysis. Journal of Applied Psychology, 78, 798−804.
Gibson, C. B., & Birkinshaw, J. (2004). The antecedents, consequences, and mediating role of organizational ambidexterity. Academy of
Management Journal, 47, 209−226.
Glick, W. H. (1985). Conceptualizing and measuring organizational and psychological climate: Pitfalls in multilevel research. Academy of
Management Review, 10, 601−616.
Glisson, C., & James, L. R. (2002). The cross-level effects of culture and climate in human service teams. Journal of Organizational Behavior,
23, 767.
Goodwin, R., Nizharadze, G., Luu, L. A. N., Kosa, E., & Emelyanova, T. (1999). Glasnost and the art of conversation: A multilevel analysis of
intimate disclosure across three former communist cultures. Journal of Cross-Cultural Psychology, 30, 72−90.
Griffin, M. A., & Mathieu, J. E. (1997). Modeling organizational processes across hierarchical levels: Climate, leadership, and group processes in
work groups. Journal of Organizational Behavior, 18, 731−744.

Hanges, P. J., & Dickson, M. W. (2004). The development and validation of the GLOBE culture and leadership scales. In R. J. House, P. J. Hanges, M.
Javidan, P. W. Dorfman, & V. Gupta (Eds.), Culture, leadership, and organizations: The GLOBE study of 62 societies (pp. 122−151). Thousand
Oaks, CA: Sage.
Hofmann, D. A. (1997). An overview of the logic and rationale of hierarchical linear models. Journal of Management, 23(6), 723−744.
Hofstede, G. (1980). Culture’s consequences. Beverly Hills, CA: Sage.
Hofstede, G. (2001). Culture's consequences (2nd ed.). Thousand Oaks, CA: Sage.
Hofstede, G., & Bond, M. H. (1988). The Confucius connection: From cultural roots to economic growth. Organizational Dynamics, 16(4), 4−21.
Hofstede, G., Neuijen, B., Ohayv, D. D., & Sanders, G. (1990). Measuring organizational cultures: A qualitative and quantitative study across 20
cases. Administrative Science Quarterly, 35, 286−316.
House, R. J., Hanges, P. J., Javidan, M., Dorfman, P. W., & Gupta, V. (Eds.). (2004). Culture, leadership, and organizations: The GLOBE study of
62 societies. Thousand Oaks, CA: Sage.
Inglehart, R., Basanez, M., & Moreno, A. (1998). Human values and beliefs. Ann Arbor: University of Michigan Press.
James, L. R. (1982). Aggregation bias in estimates of perceptual agreement. Journal of Applied Psychology, 67, 219−229.
James, L. R., Demaree, R. G., & Wolf, G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied
Psychology, 69(1), 85−98.
Kahneman, D. (2003). A perspective on judgment and choice. American Psychologist, 58, 697−720.
Klein, K. J., Conn, A. B., Smith, D. B., & Sorra, J. S. (2001). Is everyone in agreement? An exploration of within-group agreement in employee
perceptions of the work environment. Journal of Applied Psychology, 86, 3−16.
Leung, K., & Bond, M. H. (1989). On the empirical identification of dimensions for cross-cultural comparisons. Journal of Cross-Cultural
Psychology, 20, 133−151.
Leung, K., & Bond, M. H. (2004). Social axioms: A model for social beliefs in multicultural perspective. Advances in Experimental Social
Psychology, 36, 119−197.
Lindell, M. K., & Brandt, C. J. (2000). Climate quality and climate consensus as mediators of the relationship between organizational antecedents and
outcomes. Journal of Applied Psychology, 85, 331−348.
Markham, S. E., & Halverson, R. R. (2002). Within- and between-entity analyses in multilevel research: A leadership example using single level
analyses and boundary conditions (MRA). Leadership Quarterly, 13, 35−52.
Morris, M. W., Leung, K., Ames, D., & Lickel, B. (1999). Views from inside and outside: Integrating emic and etic insights about culture and justice
judgment. Academy of Management Review, 24, 781−796.
Naumann, S. E., & Bennett, N. (2000). A case for procedural justice climate: Development and test of a multilevel model. Academy of Management
Journal, 43, 881−889.
Ostroff, C., Kinicki, A. J., & Clark, M. A. (2002). Substantive and operational issues of response bias across levels of analysis: An example of
climate–satisfaction relationships. Journal of Applied Psychology, 87, 355−368.
Peterson, M. F., & Pike, K. L. (2002). Emics and etics for organizational studies: A lesson in contrast from linguistics. International Journal of Cross
Cultural Management, 2, 5−19.
Peterson, M. F., Smith, P. B., et al. (1995). Role conflict, ambiguity, and overload: A 21-nation study. Academy of Management Journal, 38,
429−452.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351−357.
Schneider, B. (1987). The people make the place. Personnel Psychology, 40, 437−454.
Schriesheim, C. A., Cogliser, C. C., & Neider, L. L. (1995). Is it “trustworthy”? A multiple-levels-of-analysis reexamination of an Ohio State
leadership study, with implications for future research. Leadership Quarterly, 6, 111−145.
Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. In M. Zanna (Ed.),
Advances in experimental social psychology, Vol. 25 (pp. 1−65). New York: Academic Press.
Schwartz, S. H. (1994). Beyond individualism–collectivism: New cultural dimensions of values. In U. Kim, H. C. Triandis, C. Kagitcibasi, S. C.
Choi, & G. Yoon (Eds.), Individualism and collectivism: Theory, method and applications (pp. 85−119). London: Sage.
Schwartz, S. H. (1999). A theory of cultural values and some implications for work. Applied Psychology: An International Review, 48, 23−47.
Schwartz, S. H., & Bilsky, W. (1987). Toward a universal psychological structure of human values. Journal of Personality and Social Psychology,
53, 550−562.
Seibert, S. E., Silver, S. R., & Randolph, W. A. (2004). Taking empowerment to the next level: A multiple-level model of empowerment,
performance, and satisfaction. Academy of Management Journal, 47, 332−349.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420−428.
Smith, P. B., Dugan, S., & Trompenaars, F. (1996). National culture and the values of organizational employees: A dimensional analysis across 43
nations. Journal of Cross-Cultural Psychology, 27, 231−264.
Smith, P. B., Peterson, M. F., & Schwartz, S. H. (2002). Cultural values, sources of guidance and their relevance to managerial behavior: A 47 nation
study. Journal of Cross Cultural Psychology, 33, 188−208.
Triandis, H. C. (1995). Individualism and collectivism. Boulder, CO: Westview.
Triandis, H. C., McCusker, C., Betancourt, H., Iwao, S., Leung, K., Salazar, J. M., et al. (1993). An etic–emic analysis of individualism and
collectivism. Journal of Cross-Cultural Psychology, 24, 366−383.
Van de Vliert, E., Huang, X., & Levine, R. V. (2004). National wealth and thermal climate as predictors of motives for volunteer work. Journal of
Cross-Cultural Psychology, 35, 62−73.
