RP2081R 1

CATEGORY SPANNING, DISTANCE, AND APPEAL
BALÁZS KOVÁCS AND MICHAEL T. HANNAN
Stanford GSB Research Paper 2081, revised November 2011
Abstract. A general finding in economic and organizational sociology states that producers and products that span categories lose appeal to audiences. This paper argues that the negative consequences of crossing category boundaries are more severe when the categories spanned are distant and have high contrast. Available empirical strategies do not incorporate information on distances among categories. Here we introduce novel measures of categorical distance and derive measures of grade of membership, category contrast, and categorical niche width. Applying the proposed measurement approach to data on online restaurant reviews in Los Angeles and San Francisco, we test our theory and find that the results confirm our predictions.
1. Introduction
Recent research has reopened the classic sociological problem of how systems of categories shape social structures (Durkheim and Mauss 1969 [1903]). Much of this research works in the spirit of what Garfinkel (1968) called breaching experiments. It seeks to understand the role of categorical boundaries by examining the consequences to actors who ignore them. In this context, ignoring category boundaries means claiming membership in multiple categories.
Category spanners adopt patterns of feature values that fit more than one category and, sometimes, claim membership in more than one category at one time or over time in a sequence of categorical affiliations. Sociological studies show that categorical boundaries are indeed consequential in diverse domains: combining categories generally generates some form of direct or indirect devaluation (Hannan 2010; Negro, Koçak, and Hsu 2010b).
The line of research on boundary crossing has not reached its full potential
because it has not considered the structure of the space of categories. In the interest
of tractability, researchers treat all pairs of categories alike, meaning that all kinds
of spanning are expected to have the same consequences. Yet some combinations of
categories are much more difficult to interpret than others. Therefore the impact
of combining categories depends on the socio-cultural distance separating them.
This paper brings distance into sociological models of categories. It builds models that take account of categorical distances and proposes an empirical strategy for estimating these distances from commonly available data. Before turning to these matters, we briefly sketch the range of sociological applications for which our proposed models and methods have relevance.
Date: November 4, 2011.
We appreciate the helpful comments of Glenn Carroll, Greta Hsu, and Gaël Le Mens. We thank
the University of Lugano, Stanford Graduate School of Business, and the Stanford GSB Faculty
Trust for generous financial support for this project.
Some research addresses the consequences of combining categories for individuals in diverse contexts. These studies find that bridging categories lowers: the probability that film actors will gain additional roles (Zuckerman, Kim, Ukanwa, and Rittmann 2003), the odds of winning a bid for contract work (Leung 2011), the productivity and earnings of researchers in linguistics and sociology (Leahey 2007), the odds of receiving funding in an online market for lending (Leung and Sharkey 2009), and the odds of completing a sale on eBay (Hsu, Hannan, and Koçak 2009).
Research on organizations finds that mixing categories and genres reduces: the
probability that listed firms receive coverage from stock market analysts, which
in turn diminishes valuations (Zuckerman 1999), ratings of feature films by critics
and general audiences and also box-office revenues (Hsu et al. 2009), ratings of
restaurants by critics (Rao, Monin, and Durand 2005) and by the general audience
(Kovács and Hannan 2010), critical evaluations and prices of elite Italian wines
(Negro, Hannan, and Rao 2011), and sales of software products (Pontikes 2011).1
In addition, work underway explores the implications of category spanning for the
success of social movements and of terrorist organizations as well as for the technical
impact of discoveries (patents) and patent classes (Carnabuci, Kovács, and Wezel
2011).
Finally, some research examines the effects of categorical combination on the categories themselves. Pervasive spanning weakens category boundaries, which lowers the penalties for spanning (Rao et al. 2005). But it also reduces the value of membership in the sense that appeal to the audience falls even for those who do not stray over the category boundary (Negro, Hannan, and Rao 2010a, 2011).
Although this style of research has produced useful new knowledge, it could yield sharper and deeper insight by changing how we assess multiple-category memberships. Understanding the consequences of moving across category boundaries requires analysts to incorporate information about the distances among the categories and the strength of the boundaries, which has not yet been done. Audience members regard domains such as cuisine, films, software, and scholarly disciplines as populated by categories arrayed over a space in which similar categories lie close to one another and dissimilar ones stand at large distances. Social categories also differ considerably in the clarity of their boundaries, the degree to which they stand out in the categorical space. Knowledge of the topography of the socio-cultural space containing the categories therefore provides essential information about the audience’s categorical structure.
The first part of the paper argues that bridging distant categories causes greater difficulties of interpretation than combining neighboring ones.2 For example, in the empirical domain of this paper, restaurants in Los Angeles and San Francisco, combining related cuisines such as “Spanish” and “Basque” causes less confusion, and thus yields higher evaluations in the eyes of audience members, than combining distant cuisines such as “Mexican” and “Japanese.” It also argues that the fuzziness of boundaries affects the consequences of spanning. Although combining multiple categories generally makes an actor more difficult to understand, sharp boundaries make the problem especially acute.
1 Two studies find that investors are much less sensitive to category combination than are members of the general audiences for these domains (Pontikes 2011; Smith 2011).
2 We restrict attention to categories within the same domain, e.g., cuisine or banking. In general, if sets of categories are seen as unrelated, then establishing membership in more than one does not seem to cause problems.
In designing a new conceptualization and measurement strategy, we sought to represent distances among categories in a way that plausibly maps to what the audience sees. We propose a simple co-occurrence approach for building a representation of category structure. Categories that tend to co-occur, whose members tend to share labels, lie close to each other in the socio-cultural space, while those that rarely co-occur are distant (Church and Hanks 1990). Therefore we use information on the pattern of co-occurrences to calculate distances.
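To make the co-occurrence idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the similarity of label j to label i is the share of producers bearing i that also bear j, and that distance is one minus that share. This hypothetical rule is a stand-in, not the specific distance measure proposed in this paper, and the toy restaurant profiles are made up.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_distances(profiles):
    """Illustrative asymmetric distances between labels from co-occurrence.

    profiles: list of sets, one set of assigned labels per producer.
    Hypothetical rule (for illustration only): s_ij = share of producers
    labeled i that also carry label j, and d_ij = 1 - s_ij, so d_ii = 0.
    """
    label_counts = Counter()
    pair_counts = Counter()
    for labels in profiles:
        label_counts.update(labels)
        # Count ordered pairs so that (i, j) and (j, i) are tallied separately.
        pair_counts.update(permutations(labels, 2))

    distances = {}
    for i in label_counts:
        for j in label_counts:
            if i == j:
                distances[(i, j)] = 0.0
            else:
                share = pair_counts[(i, j)] / label_counts[i]
                distances[(i, j)] = 1.0 - share
    return distances


# Made-up label profiles for six hypothetical restaurants.
toy_profiles = [
    {"Spanish", "Basque"},
    {"Spanish"},
    {"Basque", "Spanish"},
    {"Mexican"},
    {"Japanese"},
    {"Mexican", "Japanese"},
]
d = cooccurrence_distances(toy_profiles)
print(d[("Basque", "Spanish")], d[("Spanish", "Basque")], d[("Spanish", "Mexican")])
```

Because the share is conditioned on the first label, the resulting distances are asymmetric: in the toy data every “Basque” restaurant is also “Spanish,” so the distance from “Basque” to “Spanish” is 0, while the distance from “Spanish” to “Basque” is 1/3. Labels that never co-occur end up at distance 1.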
The second step uses these distances in calculating grades of membership (GoMs) of producers3 in the categories. The current approach to assigning GoMs in labels/categories assumes that producers that get classified into multiple categories are unlikely to be typical members of each category (Hsu et al. 2009). We claim that this is especially so when the categories lie far apart. For example, a scholar whose work gets labeled as “Sociology” and “Genetics” likely publishes research that is atypical for both disciplines. But this might not be the case when the categories are close. To continue the example, a scholar whose work is tagged as “Sociology” and “Gender Studies” can plausibly produce research that fits well in each. We want to allow for such differences in assessing the effect of categorical overlaps.
Based on this distance-based approach to calculating GoMs in categories, we
propose measures of categorical niche width and categorical contrast that also make
adjustments for the category structure. The section of the paper that outlines these
developments has a methodological flavor.
After explaining the new approach, we report tests of our substantive argument in the third part of the paper. A prior study applied part of this argument (ignoring distance) to restaurants in San Francisco (Kovács and Hannan 2010). Here, we re-examine these issues in light of our argument regarding the distance of the categories spanned. We use new data on reviews of restaurants in Los Angeles and updated and expanded data on restaurant reviews in San Francisco. These data contain assignments of the producers to one or more genres as well as consumers’ assessments of restaurant quality.
This empirical part of our study has a pair of goals. The substantive goal is
learning whether the consequences of spanning depend on the distances involved
as well as on the fuzziness of the categories. The methodological goal is to see
whether bringing category structure into the picture improves over the approach
used previously.
2. Category Spanning and Typicality

The core of our argument holds that audiences generally prefer the offerings of
easy-to-interpret producers. Typical producers and offerings are easier to interpret
than atypical ones. Therefore we organize our argument in terms of typicality.
Three lines of explanation have been proposed for the deleterious effect of combining categories. One concerns skills and learning, the likelihood that generalists become Jacks-of-all-trades, masters of none (Hsu 2006). Spanning makes it hard to develop expertise in any category. Therefore category spanners generally accumulate less skill in performing in any category than do category specialists; and appeal in a category is proportional to category-specific skill.
3 We build on the recent body of research that considers the interface between two roles: producer and audience. Incumbents of the producer role create offerings; incumbents of the audience role inspect, evaluate, and consume the offerings.
Another explanation emphasizes expectations of audiences. According to typecasting arguments, audiences use category generalism as an indicator of low skill, as evidence of the master-of-none problem (Phillips and Zuckerman 2001; Hsu, Hannan, and Pólos 2011). Even actors who do develop high-level skills in multiple categories (by dint of superior overall ability or some kind of resource advantage) have difficulty convincing audiences of this. Such attributions make generalists less appealing.
A third kind of explanation considers a different kind of interpretive problem.
As mentioned above, producers that get assigned to several (widely separated)
categories have feature values that make them dissimilar to the producers that
specialize in each of the categories. According to recent theory on these issues, the
intrinsic appeal of a producer in a category increases with its typicality (Hannan,
Pólos, and Carroll 2007). It follows from this theory that spanning lowers appeal
(Hsu et al. 2009).
We focus here on typicality rather than on the skill-diminishing effect of generalism or on typecasting expectations.4 Only issues of typicality apply to all facets of our argument. Restricting attention to these processes allows us to offer a unified approach to the implications of categorical distance.
Contexts Without Categorical Focus. Making precise arguments about category combination and appeal requires attention to the setting in which audience members encounter producers and offerings. Sometimes audience members see producers portraying different “roles,” as instances of a sequence of different labels. For instance, some studies analyze the success of sellers in online markets, where sellers make offers in one category but buyers can see histories of participation over categories (Hsu et al. 2009; Leung 2011). Such contexts provide a clear focal category for evaluation.
Many other studies collect overall assessments of producers without knowing the evaluators’ category focus. For instance, Zuckerman (1999) observes decisions by analysts to cover or not cover firms as entities, not as a set of business segments, each to be considered separately. Hsu (2006) observes what ratings filmgoers give to a film, but not what they think of a film as a drama, comedy, horror, musical, and so forth. In these cases researchers observe overall appeal, not appeal “as an l” for each of the relevant labels l.
The difference we highlight concerns the mode in which the audience members typically encounter the producer. If these encounters are specialized to a categorical context, the context presumably shapes the audiences’ focus and ties their assessments of appeal to that category. Otherwise, audience members likely make assessments that consider all of the assigned labels. Indeed it might be very hard for audience members to parse out their separate reactions with respect to each label. So the analysis must be modified to deal with the difference between these two generic situations.
The view that people use categories to structure their expectations about producers and offerings suggests that spanning causes problems because it is hard to make sense of the combinations. According to our reading of the wave of research on categories and markets beginning with Zuckerman (1999), audience members prefer to interact with producers that they find easy to grasp, avoid dealing with hard-to-interpret producers, and devalue such producers when they do transact with them.
4 It is worth noting that the new methodological strategy applies as well to studies designed to understand skill, learning, and typecasting.
Here we specialize the arguments to contexts that do not supply a categorical focus. That is, we assume that audience members perceive a set of label assignments, and they try to make sense of the producers in light of the assigned labels. If an offering bears only one label (e.g., a film is labeled only as a “Comedy”), then the single label provides a categorical focus. But suppose that two labels have been applied (e.g., a film is tagged as “Comedy” and “Science Fiction”). How do audiences make sense of the combination? We propose that answering this question requires attention to the distances among categories, specifically the distances among an audience member’s schemas.
Schemas are cognitive representations of what it means to be a full-fledged member of a category. Specifically, schemas are combinations of feature values. In the fuzzy-membership approach, a producer/offering that fully fits the constraints expressed by an audience member’s schema for a label has a grade of membership (GoM) of one in that agent’s meaning of the label. Partial fits produce partial memberships. In other words, GoMs in schemas indicate degrees of typicality.
The socio-cultural distance between a pair of categories depends on how much their associated schemas overlap. If the overlap is considerable, then the categories lie close to each other in the socio-cultural space. If the schemas do not overlap, then the categories are distant. In the latter case, a producer cannot fit both schemas well. Any offering that partly fits a pair of distant schemas must be a very atypical instance of the labels associated with these schemas.
Based on this reasoning, we propose that an offering’s typicality in any label falls
with (1) the number of labels used to describe it and (2) the distances among the
schemas associated with these labels. When many (distant) labels are applied to a
producer, atypicality characterizes the producer as a whole: not only is it atypical
for some label, it is atypical for all of the applied labels. We use the term overall
typicality to refer to such a global assessment of the degree to which a producer fits
any label.
This reasoning implies that producers that do not fit any applied label, those with low overall typicality, have low intrinsic appeal. We now spell out some of the implications of this idea using a nonmonotonic logic. The theory on which we build expresses postulates as rules with exceptions, as formulas quantified by a nonmonotonic quantifier N (Pólos and Hannan 2002, 2004). This quantification expresses what “normally” holds, with the proviso that more specific information can overrule the normal case.
We use the predicate nf(y), which reads as “each producer bears a set of category labels, the audience member y associates a non-empty schema with each label, and the context does not focus the audience’s attention on one of them.” We incorporate this predicate in the antecedents of the formulas in the following postulates and propositions to make clear that the argument applies to this special case.
As noted above, the argument builds on a premise about the relationship between
(overall) typicality and intrinsic appeal.
Postulate 1. A producer’s intrinsic appeal increases with its overall typicality.
$$\mathfrak{N}\, x, x', y\, [\mathrm{nf}(y) \land (t(x, y) > t(x', y)) \rightarrow \alpha(x, y) > \alpha(x', y)],$$
where t(x, y) is a real-valued function that tells the overall typicality of the producer
x to the audience member y, and α(x, y) is a real-valued function that records the
intrinsic appeal of the producer x to the audience member y.
Next we tie this postulate to more specific ones that relate categorical niche
width to typicality and category contrast to typicality.
Categorical Niche Width and Intrinsic Appeal. Category generalists are
confusing—they are “neither fish nor fowl.” Such generalism can be expressed
well in terms of the width of the categorical niche. Producers associated with one
category—category specialists—have a niche width of zero. Niche width increases
as a producer gains a broader and more diverse set of categorical affiliations.
Postulate 2. A producer’s typicality declines with its categorical niche width.
$$\mathfrak{N}\, x, x', y\, [\mathrm{nf}(y) \land (w(x, y) > w(x', y)) \rightarrow t(x, y) < t(x', y)],$$
where w(x, y) denotes a non-negative, real-valued function that gives the categorical
niche width of the producer x from the perspective of the audience member y. (Below
we discuss ways to measure niche width.)
A testable implication of the argument follows immediately from Postulates 1
and 2.
Proposition 1. A producer’s intrinsic appeal decreases with its categorical niche
width.5
$$\mathfrak{P}\, x, x', y\, [\mathrm{nf}(y) \land (w(x, y) > w(x', y)) \rightarrow \alpha(x, y) < \alpha(x', y)].$$
Proof. This follows from a cut-rule applied to Postulates 1 and 2.
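Stripped of the nonmonotonic quantifiers, the cut-rule step can be read as chaining the two postulates for a fixed audience member y with nf(y). This is only the classical skeleton of the argument; the quantifier P marks that the conclusion holds presumptively and can be overruled by more specific information.

```latex
w(x,y) > w(x',y)
  \;\overset{\text{Post. 2}}{\Longrightarrow}\; t(x',y) > t(x,y)
  \;\overset{\text{Post. 1}}{\Longrightarrow}\; \alpha(x',y) > \alpha(x,y)
```

The second step applies Postulate 1 with the roles of x and x′ exchanged, which yields exactly the consequent of Proposition 1.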
Contrast and Intrinsic Appeal. The second part of the argument concerns a
category-level variable: contrast. According to Hannan et al. (2007), categories
with very fuzzy boundaries exert less social power than crisper ones. A relatively
crisp category stands out from the social background and likely serves as a basis of
enduring expectations about those who bear the category label. The main intuition
holds that membership in a high-contrast category conveys greater advantage than
membership in a fuzzy category. This is because audience members generally find
the crisper label to be a more stable basis on which to form expectations. This
intuition has been found to hold in diverse contexts, as noted in the Introduction.
Negro, Hannan, and Rao (2011) argue that pervasive spanning by bearers of a label, especially long-distance spanning, lowers its contrast and thereby reduces the appeal of all of its members. Lowered contrast likely reduces the appeal of all offerings in a category in two ways. One involves the relationships among categories. Fuzziness implies a loss of distinctiveness of a category relative to the others, raising questions about what comparisons are appropriate for the members of a category. With increasing fuzziness, clusters of producers/offerings become less salient and elicit lower attention. Previous research shows that comparisons become more difficult; audience members have trouble using distinct descriptors, and develop attitudes of reserve, strangeness, even aversion or repulsion (Griswold 1987). Negative evaluations are more common, and audience members claim previous judgments were too generous or neglected important differences.
5 In the nonmonotonic logic used in this line of theory building, the implication of a set of rules with possible exceptions is a formula quantified by another nonmonotonic quantifier P. This quantification expresses what “presumably” follows from the premises of the current stage of a theory. Future theory stages might incorporate more specific considerations that destroy a provisional implication from an earlier theory stage.
The second way in which a loss of contrast likely diminishes an audience’s enthusiasm for a category involves the loss of agreement about the meaning of the category (Hannan et al. 2007). When contrast falls, the producers to which some audience members apply a label share fewer schema-relevant feature values. Such a situation sparks disagreement about the meaning of the label and about which producers belong. According to Simmel (1978 [1907]), the loss of distinctiveness “hollows out the core of things.” Hence, low-contrast categories lack intrinsic appeal relative to higher-contrast ones.
A challenge is to translate this intuition about the advantages of affiliation with a
high-contrast category to the context of multiple-category memberships. If contrast
matters, which category’s contrast? After considering a number of possible answers
to this question, Kovács and Hannan (2010) concluded that two dimensions need
to be considered: (1) whether the producer is associated with any high-contrast
category and (2) how much the highest-contrast categorical membership dominates.
The first consideration is that association with at least one high-contrast category makes a producer more intrinsically appealing. In the simplest case, consider two producers, each associated with only one category, where one of the two categories has sharper boundaries (higher contrast). Then existing theory holds that the producer affiliated with the sharper category will have higher intrinsic appeal (Negro et al. 2010a).
More generally if belonging to a high-contrast category makes agents easier to
interpret, then having a membership in at least one such category will make an agent
more interpretable than having memberships in a set of low-contrast categories. In
other words, the maximum of the contrasts of the applicable categories provides
some systematic information about overall typicality.
Postulate 3 (Maximum contrast and typicality). The overall typicality of a producer increases with the maximum contrast of the categories assigned (as long as the next-highest contrast is not higher).
$$\mathfrak{N}\, x, x', y\, [\mathrm{nf}(y) \land (\mathit{mc}(x, y) > \mathit{mc}(x', y)) \land (\mathit{sc}(x, y) \le \mathit{sc}(x', y)) \rightarrow t(x, y) > t(x', y)],$$
where mc(x, y) is a real-valued, non-negative function that tells the contrast of
the maximum-contrast category applied to the producer x based on the grade-
of-membership assignments by audience member y, and sc(x, y) is a real-valued,
non-negative function that gives the secondary contrast for the producer x based
on the grade-of-membership assignments by y.
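As one simple reading of mc and sc, the following sketch computes both quantities for a single producer from per-category contrast values. It assumes the contrasts are already available (how contrast is measured is not specified here) and, as a hypothetical convention, sets the secondary contrast to zero when only one label is assigned. The function name, labels, and values are invented.

```python
def max_and_secondary_contrast(assigned, contrast):
    """Return (mc, sc) for one producer.

    assigned: labels applied to the producer.
    contrast: dict mapping each label to a non-negative contrast value
              (assumed to be given; not computed here).
    Hypothetical convention: sc = 0.0 if only one label is assigned.
    """
    values = sorted((contrast[label] for label in assigned), reverse=True)
    if not values:
        raise ValueError("producer has no assigned labels")
    mc = values[0]
    sc = values[1] if len(values) > 1 else 0.0
    return mc, sc


# Invented contrast values for three hypothetical labels.
contrast = {"A": 0.9, "B": 0.4, "C": 0.7}
print(max_and_secondary_contrast({"A", "B", "C"}, contrast))
print(max_and_secondary_contrast({"B"}, contrast))
```

Sorting makes the result independent of the order in which labels are listed; a GoM-weighted variant would need an explicit rule for combining GoMs with contrasts, which this sketch does not attempt.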
The second consideration pertains to the distribution of contrasts over the labels applied to a producer. A key intuition holds that a market participant’s membership in two or more high-contrast categories confuses the audience (Kovács and Hannan 2010). So it is clearly not enough to analyze only maximum contrast. We need to know the consequences of having second, third, etc. high-contrast category memberships. We reason that, net of the effect of the maximum-contrast categorical membership, having another high-contrast membership will confuse the audience and thereby reduce intrinsic appeal.
Given assignment of multiple labels, overall typicality is low when one of the labels other than that with maximal contrast also has high contrast. We use the term secondary contrast to refer to the next-to-maximal contrast. In these terms, overall typicality, and hence appeal, will fall with increasing secondary contrast.
Postulate 4. The overall typicality of a producer decreases with its secondary contrast (as long as the maximum contrast of the categories assigned is not lower).
$$\mathfrak{N}\, x, x', y\, [\mathrm{nf}(y) \land (\mathit{mc}(x, y) \ge \mathit{mc}(x', y)) \land (\mathit{sc}(x, y) < \mathit{sc}(x', y)) \rightarrow t(x, y) > t(x', y)].$$
Again we have a testable implication of the argument, which in this case follows
immediately from Postulates 1, 3, and 4.
Proposition 2. When categorical focus does not hold, the intrinsic appeal of a producer generally increases with the maximum contrast of the categories assigned and decreases with secondary contrast.
$$\mathfrak{P}\, x, x', y\, \big[\mathrm{nf}(y) \land \big(((\mathit{mc}(x, y) > \mathit{mc}(x', y)) \land (\mathit{sc}(x, y) \le \mathit{sc}(x', y))) \lor ((\mathit{mc}(x, y) \ge \mathit{mc}(x', y)) \land (\mathit{sc}(x, y) < \mathit{sc}(x', y)))\big) \rightarrow \alpha(x, y) > \alpha(x', y)\big].$$
Proof. This follows from a cut-rule applied to Postulates 1, 3, and 4.
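The antecedent of Proposition 2 is a disjunction of two cases, and it helps to see which pairs of producers it actually ranks. The hypothetical helper below takes (mc, sc) for x and x′ and returns True when the proposition presumptively predicts higher appeal for x. A result of False means only that the proposition makes no such prediction, not that x′ is predicted to be more appealing.

```python
def prop2_ranks_x_higher(mc_x, sc_x, mc_xp, sc_xp):
    """Check the antecedent of Proposition 2 for producers x and x'.

    Returns True if the proposition presumptively predicts that x is more
    appealing than x'. False means the proposition is silent on the pair.
    """
    # Case 1: higher maximum contrast, secondary contrast not higher.
    higher_max = mc_x > mc_xp and sc_x <= sc_xp
    # Case 2: maximum contrast not lower, strictly lower secondary contrast.
    lower_secondary = mc_x >= mc_xp and sc_x < sc_xp
    return higher_max or lower_secondary


print(prop2_ranks_x_higher(0.9, 0.2, 0.8, 0.2))
print(prop2_ranks_x_higher(0.9, 0.5, 0.8, 0.2))
```

The second call illustrates the case the proposition leaves open: x has the higher maximum contrast but also the higher secondary contrast, so the two effects pull in opposite directions.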
3. Label Assignments and Grades of Membership in Categories

Empirical progress on issues of multiple-category memberships has been rapid (for a review, see Hannan 2010). One reason is that researchers have found archives and websites that assign one or more categorical memberships to producers and products and also provide measures of the audience’s assessment of each. For instance, Pontikes (2008) analyzes the relationship between the positions of software producers in “knowledge space” (induced from patterns of patent citations) and claims to membership in product categories coded from press releases; Hsu (2006) and Hsu et al. (2009) analyze data drawn from websites that provide reviews by professional critics and members of the general audience of films assigned to one or more genres; Hsu et al. (2009) analyze producers that affiliate (by listing products for sale) with one or more of eBay’s product categories; and Carroll, Feng, Mens, and McKendrick (2010) code producers of tape drives as producers in one or more of the industry’s technological formats. Researchers have devised ways to use these data to characterize each producer’s strength of association with the various categories and to relate patterns of categorical affiliation to appeal to the audience.
We show that the empirical strategy for relating label assignments to GoMs in genres/categories does not take full account of the available information. In particular, it does not consider the distances between the labels spanned. We propose some generalizations of the measures used in prior research that use information on the structure of categories.
Multiple-category membership relates directly to the nature of category boundaries. Until fairly recently sociologists, following the so-called classical perspective on concepts, ignored the possibility that concepts might have an internal structure, that the entities assigned a concept label might differ in the degree to which they typify the category, the degree to which they “belong” to the category. However, a long tradition of research in cognitive psychology shows that familiar concepts such as “fruit” and “furniture” have an internal structure: apples and oranges are viewed as typical fruits and olives and pineapples as atypical fruits, and so forth (Rosch 1975; Rosch and Mervis 1975; Hampton 2007). One useful way to represent such internal structure is to view the label as referring to a fuzzy set, a set whose membership function admits partiality (Hannan et al. 2007). Our research implements this view.
Recent research on categories attempts to construct meaningful GoMs in labels (such as film genres and product categories) from sparse data that do not allow measurement of schemas.6 The now-common study design obtains assignments to a predetermined list of category (or genre) labels. In most previous research settings, as well as ours, some market intermediary (such as the managers of publications or websites that post reviews) assigns the labels. The analyst lacks information about how individual audience members would apply the labels. This means that using such data to test arguments stated at the level of the audience member, as above, requires an assumption of homogeneity. If members of the audience use the domain language in idiosyncratic ways, there is little hope of finding systematic relationships between combinations of the (externally given) labels and the responses of the audience members. What some agents will see as spanning will not be so for others, and so forth. So we must assume that the audience uses the language in a homogeneous manner, that they associate similar schemas with the labels of the domain.
The basic data on label assignment can be represented as a vector that assigns
to a producer a value (say one) for each label assigned and a value (zero) for those
that are not. The analyst wants to calculate a GoM in each label for each market
participant as a first step in characterizing producers and categories.
Suppose the language of the relevant domain contains L labels for producers. Throughout we make reference to the label function (of an unspecified party7) given by
$$l_i(x) = \begin{cases} 1 & \text{if label } i \text{ is assigned to } x; \\ 0 & \text{otherwise.} \end{cases}$$
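In code, a label profile is just a 0/1 vector over a fixed, ordered label vocabulary. The sketch below assumes the vocabulary is supplied as a list; the function name and the cuisine labels are hypothetical and chosen only for illustration.

```python
def label_profile(assigned, vocabulary):
    """Return the vector (l_1(x), ..., l_L(x)) for one producer x.

    assigned: collection of labels applied to x.
    vocabulary: ordered list of the L labels of the domain.
    """
    unknown = set(assigned) - set(vocabulary)
    if unknown:
        raise ValueError(f"labels not in vocabulary: {sorted(unknown)}")
    return [1 if label in assigned else 0 for label in vocabulary]


# Hypothetical, illustrative vocabulary of cuisine labels.
vocabulary = ["Basque", "Japanese", "Mexican", "Spanish", "Thai"]
print(label_profile({"Mexican", "Thai"}, vocabulary))
```

Rejecting labels outside the vocabulary reflects the setting described above, in which an intermediary assigns labels from a predetermined list.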
The basic analytical question asks how to use a label profile to make inferences
about memberships in the categories/genres that correspond to the labels. We next
consider some answers to this question.
A Qualitative Approach. Much recent work follows what we call the qualitative approach (for instance, Pontikes 2008; Hsu et al. 2009; Pontikes 2009; Carroll et al. 2010; Negro et al. 2010a; Kovács and Hannan 2010; Negro et al. 2011). It works as follows. In the first (largely implicit) step, the analyst assumes that, because the schemas for the various labels differ (they impose different constraints), producers and products with only one label generally fit the schema for that label better than those assigned two labels. For instance, a film classified as “Comedy” and “Horror” likely lacks the typical features of either genre. Similarly a restaurant labeled as “Mexican” and “Thai” can hardly typify either label.8 The reasoning then makes a similar assertion about two-label versus three-label entities, and so forth.
6 Obviously we would prefer to have access to data that tell what schemas the audience members associate with the relevant labels. Then categories could be represented as sets in a space of the values of categorically relevant features and relations. Questions about combining categories could then be addressed in terms of positions in the feature space.
7 Because the labeling is not done by the members of the audience, we do not assign an “audience-member slot” to this function.
Following this reasoning, a market participant’s GoM in any label generally declines with its number of labels. In particular, this reasoning suggests that the GoM function in any assigned label decreases monotonically with the number of labels assigned: $\mu_i(x) = g(l_x)$, with $g'(l_x) < 0$, subject to the condition that g lies in the unit interval: $0 \le g(l_x) \le 1$.
Hsu et al. (2009) proposed the following functional form for relating label assignments and GoM that satisfies these desiderata:9
$$\mu^{Q}_i(x) = \frac{l_i(x)}{l_x}, \tag{1}$$
where $l_x$ denotes the number of labels applied to x. For example, if three labels are applied to a producer or product, then its GoM in each of these labels is set to 1/3, and its GoM in each of the other labels is set to zero.
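Equation (1) translates directly into code. This minimal sketch (the function name is hypothetical) takes a 0/1 label vector and returns the qualitative GoM in every label; it reproduces the three-label example in the text.

```python
def gom_qualitative(profile):
    """Eqn (1): mu_i^Q(x) = l_i(x) / l_x for every label i.

    profile: list of 0/1 values l_i(x); l_x is their sum.
    """
    n_labels = sum(profile)
    if n_labels == 0:
        raise ValueError("at least one label must be assigned")
    return [label_flag / n_labels for label_flag in profile]


print(gom_qualitative([1, 0, 1, 1]))  # three labels, so each assigned label gets 1/3
```

Unassigned labels receive zero because their numerator is zero, matching the text.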
We refer to this approach as qualitative because it does not use any information about distances in the space of categories. Specifically, it does not adjust for the fact that some pairs of categories are closer than others. Our proposal seeks to rectify this limitation by bringing distance into the picture.
A Metric Approach: Incorporating Distance. We want to measure GoMs
in labels in a way that reflects a market participant’s typicality for each label. As
noted above, a producer whose feature values cause it to be assigned to two distant
labels is generally an atypical member of each. But one that gets assigned two close
labels might be quite typical of each. We propose that GoMs in labels be defined
in a way that incorporates information about the distances among labels:
$$\mu^{D}_{i}(x) = \frac{l_i(x)}{1 + \sum_{j \in L} l_j(x)\, d_{ij}}, \qquad (2)$$
where $d_{ij}$ denotes the distance from label $i$ to label $j$ (with $d_{ii} \equiv 0$).
This definition, like the qualitative one (eqn. 1), sets $\mu_i(x) = 1$ if $i$ is the only
label assigned to $x$, and it sets $\mu_j(x) = 0$ for $j \ne i$ in such cases. When two labels,
$i$ and $j$, are assigned, it sets $\mu_i(x) = (1 + d_{ij})^{-1}$ and $\mu_j(x) = (1 + d_{ji})^{-1}$, and
so forth. The addition of each label lowers GoMs, but it does so much more when
the added label lies far from the others.
The qualitative procedure for calculating GoMs from label assignments (eqn. 1)
can be viewed as a special case of our proposed measure (eqn. 2) that constrains all
of the categories to stand at a distance of unity from one another. That is, eqn. (1)
results from imposing the constraint $d_{ij} = 1$ for all $i \ne j$ in eqn. (2).
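As a minimal sketch (Python; the helper names are hypothetical, not from the paper), eqns. (1) and (2) can be computed for one producer from its assigned labels and a table of label-to-label distances. Keying the distances by ordered pairs is an assumption made so that asymmetric distances can be represented; labels not assigned to the producer implicitly have GoM zero. The first check below illustrates the special-case argument: with every off-diagonal distance set to one, the distance-based GoMs coincide with the qualitative ones.

```python
from typing import Dict, Iterable, Tuple

Distances = Dict[Tuple[str, str], float]  # (i, j) -> d_ij, possibly asymmetric


def gom_qualitative(labels: Iterable[str]) -> Dict[str, float]:
    """Eqn. (1): each assigned label gets GoM 1 / l_x."""
    labels = list(labels)
    return {lab: 1.0 / len(labels) for lab in labels}


def gom_distance(labels: Iterable[str], dist: Distances) -> Dict[str, float]:
    """Eqn. (2): mu_i = 1 / (1 + sum of d_ij over the other assigned labels j).

    Because l_i(x) = 1 for assigned labels and d_ii = 0, only the other
    assigned labels contribute to the denominator.
    """
    labels = list(labels)
    return {
        i: 1.0 / (1.0 + sum(dist[(i, j)] for j in labels if j != i))
        for i in labels
    }


if __name__ == "__main__":
    labels = ["A", "B", "C"]
    unit = {(i, j): 1.0 for i in labels for j in labels if i != j}
    print(gom_distance(labels, unit))  # same values as the qualitative GoMs
    print(gom_qualitative(labels))
    close = {("A", "B"): 0.2, ("B", "A"): 0.2}
    print(gom_distance(["A", "B"], close))  # both about 0.83
```

A producer with a single label gets GoM 1 under both functions, matching the text.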
4. Measures Based on Distance in the Space of Categories

Completing the definition proposed above requires specification of perceived
distance in the socio-cultural space of categories. A basic intuition, backed by research
in cognitive psychology (Shepard 1987; Tversky 1977), holds that similarity and
distance are inversely related. We allow for the possibility that similarity judgments
and perceived distances are asymmetric: that, say, label $i$ is more similar to label
$j$ than vice versa. So in general we refer to the distance from one label to another.
We denote the distance from $i$ to $j$ by $d_{ij}$ and the distance from $j$ to $i$ by $d_{ji}$. Of
course, if distance is symmetric, then $d_{ij} = d_{ji}$.
8The dataset that we analyze actually contains a restaurant with this pair of labels.
9Of course many different functional forms might be used. This choice has the advantage of
simplicity and, as we will show below, of easy generalization to inclusion of distances among
categories.
Following the foundational work of Shepard (1987), we posit a negative
exponential relationship between perceived socio-cultural distance and similarity:
$$s(i,j) = \exp(-\gamma\, d_{ij}), \qquad \gamma > 0. \qquad (3)$$
Although we vary the measure of similarity, we leave unchanged the relation
between similarity and distance in eqn. 3.
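Because the empirical sections move from observed similarities to distances, it may help to write out the inverse of eqn. (3) explicitly. This is only algebra on the equation above, not an additional assumption:

$$s(i,j) = e^{-\gamma d_{ij}} \quad\Longleftrightarrow\quad d_{ij} = -\frac{\ln s(i,j)}{\gamma}, \qquad 0 < s(i,j) \le 1,\ \gamma > 0.$$

Identical extensions ($s = 1$) give $d_{ij} = 0$, and $d_{ij}$ grows without bound as $s$ approaches 0, so pairs of labels that never co-occur need some convention before they can be assigned a finite distance.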
This paper proposes that the relatedness of categories is reflected by their ten-
dency to co-occur in systems of classification. For example, if “Western” films also
tend to be classified as “Drama,” we infer that these labels have similar meanings.
Such a frequentist approach enables researchers to map out the relationships among
categories as we show below.
Symmetric Similarity. The standard assumption that distance is a metric en-
tails the assumption of symmetry. So we begin with this case and later relax the
assumption of symmetry.
We use a simple and widely used symmetric measure of category similarity due
to Jaccard (1901).10 The Jaccard similarity of a pair of labels amounts to a simple
calculation on their extensions.11 Let $i$ denote the extension of $l_i$, that is,
$i = \{x \mid l_i \in l(x)\}$. Then the Jaccard similarity of labels $l_i$ and $l_j$ can be defined as the
ratio of the number of producers/products that are categorized as both $l_i$ and $l_j$
to the number that are categorized as $l_i$ and/or $l_j$. Formally, if $|i \cap j|$ denotes the
cardinality of the set of producers that are categorized as both $l_i$ and $l_j$, and $|i \cup j|$
denotes the cardinality of the set of producers that are categorized as $l_i$ and/or $l_j$,
then
$$\mathrm{Sim}_J(i,j) = \frac{|i \cap j|}{|i \cup j|}. \qquad (4)$$
This index takes values in the [0, 1] range, with 0 denoting perfect dissimilarity and
1 denoting perfect similarity. For example, the dataset on restaurants analyzed
below contains nine restaurants labeled “Malaysian” and eleven labeled “Singaporean.”
Four of these restaurants are assigned both labels. Thus the Jaccard similarity of
“Malaysian” and “Singaporean” in these data is 4/(9 + 11 − 4) = 0.25.
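A minimal Python sketch of eqn. (4), working directly on label extensions represented as sets of restaurant identifiers. The identifiers are hypothetical; they are chosen only to reproduce the counts in the Malaysian/Singaporean example (nine, eleven, and four in common).

```python
def jaccard(ext_i: set, ext_j: set) -> float:
    """Eqn. (4): |i ∩ j| / |i ∪ j| for two label extensions."""
    union = ext_i | ext_j
    if not union:
        raise ValueError("Jaccard similarity is undefined for two empty extensions")
    return len(ext_i & ext_j) / len(union)


# Hypothetical restaurant IDs matching the counts in the text.
malaysian = set(range(0, 9))     # 9 restaurants: IDs 0-8
singaporean = set(range(5, 16))  # 11 restaurants: IDs 5-15 (IDs 5-8 carry both labels)

print(jaccard(malaysian, singaporean))  # 4 / 16 = 0.25
```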
Asymmetric Similarity. An influential line of research in cognitive psychology,
stimulated largely by Tversky (1977), questions the symmetry of similarity
judgments. Research shows that more prominent objects are judged less similar to less
prominent ones than vice versa (e.g., Tversky found that subjects judged North
Korea to be more similar to Red China than the reverse). More generally, objects
for which the subject knows more feature values are judged as less similar to those
about which the subject knows fewer feature values; e.g., a portrait is more similar
to its subject than the reverse.
10For a detailed discussion of alternative similarity and dissimilarity measures, see Batagelj and
Bren (1995). Some preliminary results show that the main findings of this paper apply to other
measures as well, but we leave this direction of investigation for further research.
11In the usual language of logic and linguistics, the extension of a label refers to the set of objects
that bear the label.
Tversky (1977) proposed a set-theoretic measure of similarity. In the original
formulation, the measure pertains to a pair of objects that possess sets of (perceived)
features. In our setting the objects are labels and the sets of feature values are the
lists of the entities to which these labels are assigned (their extensions). Using the
notation introduced above, the Tversky ratio measure of similarity can be written
as:
$$\mathrm{Sim}_R(i,j) = \frac{f(i \cap j)}{f(i \cap j) + \alpha f(i - j) + \beta f(j - i)}, \qquad \alpha, \beta > 0, \qquad (5)$$
where $f$ is an interval scale that weights the elements of a set by their salience for
similarity judgments.
We assume that the weighting function f treats the members of the various
sets (label extensions) as having equal salience to the audience and therefore as
deserving equal weight. Then, without loss of generality, the ratio measure of the
similarity of a pair of labels can be expressed in terms of the cardinalities of their
extensions:
$$\mathrm{Sim}_T(i,j) = \frac{|i \cap j|}{|i \cap j| + \alpha\,|i - j| + \beta\,|j - i|}, \qquad \alpha, \beta > 0. \qquad (6)$$
If $\alpha = 1 = \beta$, then the counting measure of similarity (eqn. 6) reduces to the
symmetric Jaccard index (eqn. 4). But if these two parameters differ, then the
similarities are asymmetric.
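The counting form of Tversky's ratio measure, eqn. (6), is just as short to compute. In this Python sketch the extensions are hypothetical: a large label with 100 members and a small label with 20 members, 10 of them shared. The values α = 0.3 and β = 0.1 are the ones the paper reports using in its empirical analysis; with α > β the smaller label comes out as more similar to the larger one than the reverse, and with α = β = 1 the measure collapses to the Jaccard index.

```python
def tversky_ratio(ext_i: set, ext_j: set, alpha: float, beta: float) -> float:
    """Eqn. (6): |i∩j| / (|i∩j| + alpha*|i−j| + beta*|j−i|), alpha, beta > 0.

    Label i is the focal label (the subject of the comparison), j the referent.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    common = len(ext_i & ext_j)
    denom = common + alpha * len(ext_i - ext_j) + beta * len(ext_j - ext_i)
    if denom == 0:
        raise ValueError("similarity is undefined for two empty extensions")
    return common / denom


big = set(range(0, 100))     # hypothetical large label: 100 members
small = set(range(90, 110))  # hypothetical small label: 20 members, 10 shared

print(tversky_ratio(big, small, 0.3, 0.1))  # 10 / (10 + 27 + 1)
print(tversky_ratio(small, big, 0.3, 0.1))  # 10 / (10 + 3 + 9)
print(tversky_ratio(big, small, 1, 1))      # equals Jaccard: 10 / 110
```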
Categorical Niche Width. The concept of categorical niche width provides a
useful way to analyze typicality in the context of multiple-category memberships.
Hsu et al. (2009) defined niche width in category space as a way to summarize
the differences among market participants in the degree to which they specialize
in terms of categorical memberships. A category specialist belongs to only one
category. A category generalist allocates its engagement and the features of its
offering so as to appeal as a member of several categories.
The category-membership niche has the form $\mu_x = \{\mu_1(x), \ldots, \mu_L(x)\}$. Hannan
et al. (2007) define niche width for fuzzy niches using the Simpson (1949) index of
dissimilarity of the GoMs defining the niche. When applied to categorical niches,
the Simpson index yields:
$$w(\mu_x) = 1 - \sum_{i \in L} \mu_i^2. \qquad (7)$$
Hsu et al. (2009) adapted this conceptualization to category memberships. Eqn. (1)
implies that a producer given $k$ labels has $\mu_i = 1/k$ in each assigned label (and zero
GoM in the remainder). In this case, the width of a producer’s categorical niche
depends only on the number of labels assigned to it, $l_x$. This qualitative measure
of categorical niche width can be written as:
$$W_Q(\mu_x) = \frac{l_x - 1}{l_x}. \qquad (8)$$
If GoMs in categories reflect distances among the categories spanned, then we
run into a problem with the Simpson index. Unlike the qualitative approach, our
proposal does not constrain the sum of squared GoMs to lie in the unit interval.
As a result, niche width calculated as a Simpson index can be negative, which does
not make sense. As we see it, new thinking about categorical niche width is needed
once distance enters the picture.
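To make the problem concrete, here is a small Python check with hypothetical numbers. With qualitative GoMs the Simpson index reproduces eqn. (8); with distance-based GoMs for two very close labels (a symmetric distance of 0.1 is assumed here), eqn. (2) gives each label a GoM of 1/1.1, the squared GoMs sum to more than one, and the index turns negative.

```python
from typing import Sequence


def simpson_width(goms: Sequence[float]) -> float:
    """Eqn. (7): 1 minus the sum of squared GoMs over the assigned labels."""
    return 1.0 - sum(m * m for m in goms)


# Qualitative GoMs for three labels: 1/3 each, so width = (3 - 1) / 3.
print(simpson_width([1 / 3] * 3))

# Distance-based GoMs (eqn. 2) for two labels at an assumed distance of 0.1.
mu = 1.0 / (1.0 + 0.1)
print(simpson_width([mu, mu]))  # negative: 1 - 2 / 1.21
```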
While previous formalizations assumed that niche width increases simply with
the number of labels assigned, we now need a measure that also incorporates the
structure of the categories. Specifically, we want a measure to have the following
properties:
(1) be non-negative,
(2) have a minimal value of zero (if an agent gets assigned to a single category),
(3) increase with the number of categories assigned and with the distances
among them.
We consider two measures that meet these desiderata. Both use the total pairwise
distance among the labels assigned:
$$D_x = \sum_{i \in L} \sum_{j \in L} l_i(x)\, l_j(x)\, d_{ij}, \qquad (9)$$
and the average of the pairwise distances among the labels assigned:
$$\bar{d}_x = \frac{D_x}{l_x (l_x - 1)}. \qquad (10)$$
The first measure sets categorical niche width to the product of the number of
labels assigned and the average of the pairwise distances between them.12 For
reasons explained below, we call this a constant-increment measure of width and
denote it by $W_C$:
$$W_C(\mu_x) = \begin{cases} 0 & \text{if } l_x = 1; \\ l_x \bar{d}_x & \text{otherwise.} \end{cases} \qquad (11)$$
The qualitative behavior of this measure can be seen more clearly when we replace
$\bar{d}_x$ according to eqn. (10), with the restriction that $l_x > 1$:
$$W_C(\mu_x) = D_x (l_x - 1)^{-1}, \qquad l_x > 1.$$
With this measure, niche width increases with total distance at a constant positive
rate (when the number of labels assigned is held constant):
$$\frac{\partial W_C}{\partial D_x} = (l_x - 1)^{-1} > 0, \qquad l_x > 1.$$
In other words, if the number of applied labels remains constant but the total
pairwise distance among them increases (for instance, if one label is replaced with
another that lies further away from the rest), then the niche grows wider. But,
holding total distance constant, niche width decreases if the number of labels as-
signed increases (for instance if one label far from the others is replaced with two
closer labels):
$$\frac{\partial W_C}{\partial l_x} = -D_x (l_x - 1)^{-2} < 0, \qquad l_x > 1.$$
We think this is a desirable property for a measure of niche width, because more
labels with the same total distance means that the applicable labels cluster more
tightly in the socio-cultural space. A market participant that spans a closely packed
set of labels has a narrower niche than one that spans fewer, more distant labels.
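A Python sketch of eqns. (9) to (11), using hypothetical distances and hypothetical helper names. It also illustrates the comparative-statics argument above: a two-label producer and a three-label producer are given the same total pairwise distance, and the three-label producer, whose labels are more tightly packed, gets the narrower niche.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Distances = Dict[Tuple[str, str], float]


def total_distance(labels: List[str], dist: Distances) -> float:
    """Eqn. (9): D_x, the sum of d_ij over ordered pairs of distinct assigned labels."""
    return sum(dist[(i, j)] for i, j in permutations(labels, 2))


def width_constant(labels: List[str], dist: Distances) -> float:
    """Eqns. (10)-(11): W_C = l_x * mean pairwise distance = D_x / (l_x - 1)."""
    n = len(labels)
    if n == 1:
        return 0.0
    mean_distance = total_distance(labels, dist) / (n * (n - 1))  # eqn. (10)
    return n * mean_distance


def symmetric(pairs: Dict[Tuple[str, str], float]) -> Distances:
    """Hypothetical helper: fill in d_ji = d_ij for a symmetric distance."""
    out = dict(pairs)
    out.update({(j, i): d for (i, j), d in pairs.items()})
    return out


two = symmetric({("A", "B"): 6.0})
three = symmetric({("C", "D"): 2.0, ("C", "E"): 2.0, ("D", "E"): 2.0})
print(total_distance(["A", "B"], two), width_constant(["A", "B"], two))
print(total_distance(["C", "D", "E"], three), width_constant(["C", "D", "E"], three))
```

Both producers have a total distance of 12, but W_C is 12 for the two-label producer and 6 for the three-label one.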
One property of $W_C$ has less substantive appeal. The effect of increasing total
pairwise distance does not depend on the level of niche width. A given increase
in $D_x$ has the same implications when $D_x \approx 0$ and when $D_x \gg 0$. From the
perceptual perspective, these two situations ought to be quite different. In the
former, a producer that had a sharply defined categorical niche as a specialist in one
label starts getting hard to comprehend. In the latter, an already extremely confusing
instance is getting more so. We think that something like a perceptual ceiling
effect operates here: there is a limit on how confusing a producer’s categorical
position can be.
12We also explored setting niche width equal to the total distance among the labels assigned, the
numerator in equation 10. However, this measure performed poorly as compared with the one
based on the product of the number of labels and average distance.
So we also propose an alternative measure with the property that niche width
increases more with a given increment of distance at low levels of total distance
spanned than at higher levels. In other words, if a producer with a narrow niche
adds a label at a fixed distance, its niche width increases more than if it initially
had a broad niche. We refer to this measure as the nonproportional measure, which
we denote by $W_N$. We represent this idea as follows:
$$W_N(\mu_x) = \begin{cases} 0 & \text{if } l_x = 1; \\ 1 - (1 + l_x \bar{d}_x)^{-1} & \text{otherwise.} \end{cases} \qquad (12)$$
Again we rewrite this (using eqn. 10) as
$$W_N(\mu_x) = 1 - \left(1 + \frac{D_x}{l_x - 1}\right)^{-1}, \qquad l_x > 1. \qquad (13)$$
In this case, niche width grows with increasing $D_x$, holding constant the number
of labels applied (i.e., $\partial W_N / \partial D_x > 0$), but this effect is weaker at higher levels of
total distance ($\partial^2 W_N / \partial D_x^2 < 0$). As with $W_C$, niche width shrinks as the number
of labels applied increases, when total distance is held constant: $\partial W_N / \partial l_x < 0$.
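The nonproportional measure can be sketched the same way, working from the total distance D_x and the number of labels as in eqn. (13); the function name and input values are hypothetical. Two properties discussed above show up directly: the same increment in D_x raises W_N far more from a low starting point than from a high one, and W_N stays below one. By eqns. (11) and (12), W_N = 1 − 1/(1 + W_C) = W_C/(1 + W_C) for multi-label producers, so W_N is an increasing function of W_C.

```python
def width_nonproportional(total_dist: float, n_labels: int) -> float:
    """Eqns. (12)-(13): W_N = 1 - (1 + D_x / (l_x - 1))**-1, and 0 for one label."""
    if n_labels < 1:
        raise ValueError("a producer needs at least one label")
    if n_labels == 1:
        return 0.0
    return 1.0 - 1.0 / (1.0 + total_dist / (n_labels - 1))


# Same increment of 2 units of total distance, two labels, low vs. high baseline.
low_gain = width_nonproportional(2.0, 2) - width_nonproportional(0.0, 2)
high_gain = width_nonproportional(22.0, 2) - width_nonproportional(20.0, 2)
print(low_gain, high_gain)  # the first gain is far larger than the second

# W_N expressed through W_C = D_x / (l_x - 1).
w_c = 24.0 / (4 - 1)
print(width_nonproportional(24.0, 4), w_c / (1.0 + w_c))  # equal up to rounding
```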
$W_N$ has another advantage: it is bounded above by unity, whereas $W_C$ lacks a finite upper
bound. This means that $W_N$ can more easily be compared across contexts. Whether
this is an advantage from the perspective of empirical prediction is an open question.
The lack of an upper bound on $W_C$ might, on the other hand, allow this
measure to differentiate usefully between cases that are really extreme and the
rest.
Our intuition resonates with the motivation behind the construction of WN .
But it is an empirical question whether allowing nonproportionality makes a useful
substantive difference.
Category Contrast. Contrast refers to the degree to which a set stands out from
the background, the clarity of its boundary. Hannan, Pólos, and Carroll (2007)
defined contrast as the average GoM in the category among those with positive
GoM. Using our notation for label functions and extensions, we have
$$C(l_i) = \frac{\sum_{x \in i} \mu_i(x)}{|i|}. \qquad (14)$$
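Eqn. (14) is an average over a label's extension. The Python sketch below takes the GoMs that the members of one label have in that label; the label and its members are hypothetical. It compares qualitative and distance-based GoMs for a label with three specialists and one member that also carries a second label at an assumed distance of 3, which shows in miniature why distance-adjusted contrast falls when a label overlaps distant labels.

```python
from typing import Sequence


def contrast(goms_in_label: Sequence[float]) -> float:
    """Eqn. (14): mean GoM in label i over its extension (members with GoM > 0)."""
    members = [m for m in goms_in_label if m > 0]
    if not members:
        raise ValueError("contrast is undefined for a label with no members")
    return sum(members) / len(members)


# Three single-label members (GoM 1) and one two-label member.
qualitative = [1.0, 1.0, 1.0, 0.5]             # eqn. (1): 1/2 for the two-label member
distance_based = [1.0, 1.0, 1.0, 1 / (1 + 3)]  # eqn. (2) with an assumed d_ij = 3
print(contrast(qualitative), contrast(distance_based))  # 0.875 vs 0.8125
```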
Obviously the qualitative approach for measuring GoM from label assignments
understates the contrasts of categories that lie close to others in the space of cat-
egories and overstates contrasts for categories that overlap distant categories. In
our empirical analysis below, we show that adjusting for distances in calculating
contrasts does indeed make a difference.
5. Empirical Application: Restaurant Genres

We turn now to an empirical application to restaurants. We first describe the set-
ting and show some differences among the alternative measures in this domain.
Restaurants are an appealing application because their many genres are broadly
understood and schematized (Carroll and Wheaton 2009). Moreover, this domain
provides ample opportunity to examine genre spanning.
We take advantage of the upsurge in interest and involvement in websites
that publish critical reviews by general audience members. On the site we use, users
can post reviews of a diverse array of businesses and nonprofit service providers.
The site categorizes producers in 397 categories, grouped into 22 super-categories,
such as “Hotels and spas,” “Restaurants,” or “Financial services.” The category
labels appear prominently on the site. A call for restaurant reviews for a location
yields a screen with the list of 78 restaurant labels. Clicking through to a label
produces a list of establishments shown by name, address, neighborhood, and a set
of label assignments.
Our data include all the organizations in Los Angeles (hereafter LA) and San
Francisco (SF) that are listed in at least one label in the restaurant domain by this
site. Restaurants are reviewed very frequently, and they are distributed over a
broad diversity of categories. Some labels concern food genres such as various eth-
nic/national cuisines, e.g., “American (traditional),” “Basque,” “Mexican,” “Japan-
ese,” “Soul food,” and “Thai.” Others refer to the mode of service, e.g., “Buffet,”
“Diner,” “Fast food,” and “Food stand.” Still others pertain to the key ingredient(s) or
dishes, e.g., “Burgers,” “Chicken wings,” “Fondue,” “Live/raw food,” “Sandwiches,”
and “Seafood,” and some refer to food codes, e.g., “Halal,” “Kosher,” “Vegan,” and
“Vegetarian.” Some restaurants are also classified in food-related/non-restaurant
categories such as “Champagne bar” and “Sport bar” and non-food categories such
as “Art gallery” and “Bowling alley.”
We analyze reviews posted between October 2004 and September 2011. The LA
sample contains 8,131 producers and 617,141 reviews, written by 57,211 reviewers.
The SF sample contains 3,976 producers and 767,268 reviews, written by 59,473
reviewers. SF restaurants are better represented because the website started first
in San Francisco and gained popularity there earlier (see Figure 1).
The distribution of the number of label assignments is highly skewed. Most
restaurants in the two cities (73%) receive only one category assignment. Almost
a quarter (24%) are assigned two, and roughly 3% get three or more labels. The
most common labels are “Mexican” (1,907 instances), “Chinese” (1,205), “Japanese”
(1,024), “Pizza” (997), and “Sandwiches” (980).
Symmetric Similarity. We begin with the more familiar symmetric similarity.
Table 1 shows for ten randomly selected labels (listed at the left) what other labels
are significantly similar according to the Jaccard similarity index. For example,
“Barbeque” has significant associations with “Korean,” “Hawaiian,” and “Southern”
in SF and with “Korean” and “Hawaiian” in LA.13
An overall view of the similarity structure of the labels can be seen in Figure 2,
which shows the result of an agglomerative hierarchical clustering of the food labels
(not just the restaurant labels in the combined data).14 This classification uses the
average clustering method applied to all labels that contain five or more members,
in which each step joins two clusters if the average distance between the members
of the clusters is smaller than for any other possible combination of clusters at that
point of the classification.15 The numbers at the top of the figure indicate the
average distance between the members in that joint cluster. Consider the branch
at the bottom of this figure. At the first level it combines “Tea rooms” with the
set consisting of “Moroccan,” “Turkish,” “Middle Eastern,” “Greek,” and
“Mediterranean.” At the next branching point, it breaks out “Moroccan,” and so forth. The
main branch ends with the pair “Greek” and “Mediterranean”; these are very close
according to this analysis.
13For a test of significance, we rely on the permutation test of Kovács (Forthcoming) and the 0.05
level.

Figure 1. The evolution of the count of reviews per month in Los Angeles and San Francisco

Table 1. Examples of strong associations among restaurant genres

Focal genre          Strongly similar genres
San Francisco
American (New)       Breakfast & Brunch; Seafood; American (Traditional); Sandwiches
Barbeque             Korean; Hawaiian; Southern
Italian              Pizza; Seafood
Mediterranean        Greek; Middle Eastern; Turkish
Modern European      -
Pakistani            Indian; Vegetarian
Tex-Mex              Mexican; Chicken wings; Fast food
Los Angeles
American (New)       Breakfast & Brunch; American (Traditional); French
Barbeque             Korean; Hawaiian
Italian              Pizza
Mediterranean        Greek; Middle Eastern
Modern European      Brasseries; Creperies
Pakistani            Indian; Halal; Buffets
Tex-Mex              Mexican; Fast food
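For readers who want to see the mechanics of average-linkage ("average distance") clustering, here is a deliberately naive pure-Python sketch. It is not the procedure used for Figure 2 (the authors report using R's hclust; SciPy's scipy.cluster.hierarchy.linkage with method="average" is a Python library equivalent), and the four labels' symmetric distances are hypothetical values, not estimates from the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]


def average_linkage(labels: List[str], dist: Dict[FrozenSet[str], float]
                    ) -> List[Tuple[Cluster, Cluster, float]]:
    """Naive agglomerative clustering with average linkage.

    At each step, merge the two clusters whose members have the smallest
    mean pairwise distance. Returns (cluster_a, cluster_b, merge height).
    """
    clusters: List[Cluster] = [frozenset([lab]) for lab in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            avg = sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))
            if best is None or avg < best[2]:
                best = (a, b, avg)
        a, b, height = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
        merges.append((a, b, height))
    return merges


# Hypothetical symmetric distances (not values from the paper).
d = {
    frozenset(("Greek", "Mediterranean")): 0.5,
    frozenset(("Greek", "Turkish")): 2.0,
    frozenset(("Mediterranean", "Turkish")): 1.5,
    frozenset(("Greek", "Sushi Bars")): 7.0,
    frozenset(("Mediterranean", "Sushi Bars")): 7.0,
    frozenset(("Turkish", "Sushi Bars")): 6.5,
}
for a, b, h in average_linkage(["Greek", "Mediterranean", "Turkish", "Sushi Bars"], d):
    print(sorted(a), "+", sorted(b), f"at height {h:.3f}")
```

Greek and Mediterranean merge first, Turkish joins them next, and the distant label joins last, mirroring the kind of branch structure described for Figure 2.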
Note that several pairs of labels have highly overlapping extensions and are thereby
quite close to each other in the nonmetric space in Figure 2. These include
“Spanish–Basque,” “Greek–Mediterranean,” “Pakistani–Indian,” and “Japanese–Sushi
bar.” We note these pairs because taking account of the distance spanned makes
the biggest difference for them.
For example, the qualitative measure assigns a GoM of 0.5 in each label to a
restaurant categorized as both “Spanish” and “Basque.” What happens under the
alternative that pays attention to distance spanned? Answering this question
requires that we supply a value for the free parameter γ that relates distance and
similarity in equation 3. In the analyses reported below we experimented with
different values of γ and found that model fits were highest with γ = 1. So we use
this value in this illustration (and below in our regression analyses). Because the
Jaccard similarity of “Spanish” and “Basque” in the combined data equals 0.48, the
distance between these categories equals −ln(0.48)/1 = 0.73. So our measurement
strategy assigns a GoM of 1/(1 + 0.73) = 0.58 in both labels. What about dissimilar
labels? Consider “Chinese” and “French,” whose Jaccard similarity is 0.0007. Using
the Shepard transformation, this gives a distance of −ln(0.0007) = 7.26, which means
that a restaurant in the intersection has GoM = 1/(1 + 7.26) = 0.12 in each genre
according to our definition of GoM. Recall that the qualitative measure assigns a
GoM of 0.5 to such cases.
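The arithmetic in this paragraph can be reproduced with a few lines of Python, taking the two Jaccard similarities quoted above as inputs and γ = 1. For a producer with exactly two labels and a symmetric distance, eqn. (2) reduces to 1/(1 + d); the function names are hypothetical.

```python
import math


def shepard_distance(similarity: float, gamma: float = 1.0) -> float:
    """Invert eqn. (3): d = -ln(s) / gamma, defined for 0 < s <= 1."""
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must lie in (0, 1]")
    return -math.log(similarity) / gamma


def two_label_gom(distance: float) -> float:
    """Eqn. (2) for a producer with exactly two labels: 1 / (1 + d)."""
    return 1.0 / (1.0 + distance)


for pair, jaccard_sim in [("Spanish/Basque", 0.48), ("Chinese/French", 0.0007)]:
    d = shepard_distance(jaccard_sim, gamma=1.0)
    print(f"{pair}: distance {d:.2f}, GoM {two_label_gom(d):.2f}")
```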
Asymmetric Similarity. How does Tversky’s asymmetric ratio measure change
the empirical patterns? Answering this question requires that we choose values for
the parameters α and β. In previous analysis of the San Francisco data for an early
period, we experimented with different values and obtained the best statistical fits
by setting α = 0.3 and β = 0.1. We use these values in the present analysis. Setting
α > β implies that judgments of the similarity of one label with another place
more weight on the number of members of the focal label that do not belong to the
comparison label than on the number in the comparison label that do not belong to the
focal label.
14We also explored various multidimensional scaling representations of the similarity structure
of labels (Kruskal and Wish 1978). We found that multidimensional scaling does not represent
the structure of the data well. We obtained high stress measures (37.9 for the two-dimensional
solution and 28.4 for the three-dimensional solution), which indicate that the underlying similarity
structure is not metric. Indeed, the hierarchical clustering figure indicates this as well.
15We used the “hclust” procedure in R with the average distance option. We measured distance
using eqn. (3), with the distance between pairs with similarity of zero (nonintersecting extensions)
set to 0.001.
Figure 2. Hierarchical clustering of the food categories (“average distance” method; see text).
[Dendrogram of the food labels produced by hclust with average linkage; vertical axis: Height, 0 to 20.]
Contrast. We expect that incorporating distance in the calculation of GoMs would
increase or keep constant the contrasts of such highly overlapping pairs as “Indian”–
“Pakistani” and “Japanese”–“Sushi bar,” to the extent that the members of these
categories restrict their memberships to the pairs (which is not the case). As a
result, contrast generally falls even for these categories when we adjust for distances,
as can be seen in Table 2. Only for “Pakistani” in San Francisco does contrast rise.
However, it declines only slightly for “Chinese,” “Indian,” “Italian,” “Japanese,”
and “Thai.” At the other extreme, contrast falls considerably with the distance
correction for “Asian Fusion,” “Barbeque,” and “Vegetarian.” What matters here is
not the exact magnitudes of changes in contrasts (as these depend on the specific
distance measure used) but that incorporating distance alters the order of categories
in terms of contrast.

Table 2. Contrasts of selected restaurant genres with alternative measures of grade of membership

                                    Contrast
                     San Francisco           Los Angeles
Genre                using µQ  using µD      using µQ  using µD
Asian Fusion         0.709     0.538         0.659     0.476
Barbeque             0.731     0.570         0.696     0.550
Chinese              0.914     0.862         0.882     0.815
Italian              0.803     0.715         0.766     0.675
Indian               0.698     0.681         0.676     0.619
Japanese             0.687     0.664         0.724     0.666
Modern European      0.583     0.416         0.524     0.265
Pakistani            0.509     0.542         0.467     0.457
Steakhouses          0.747     0.603         0.718     0.590
Thai                 0.939     0.905         0.912     0.863
Vegetarian           0.522     0.321         0.505     0.284
Categorical Niche Width. Adjustments for distance also affect measures of
categorical niche width. Table 3 compares the three measures ($W_Q$, $W_C$, and $W_N$)
for some restaurants with three or four label assignments (the measures do not
differ for category specialists). Of course, $W_Q$ does not discriminate among cases with
the same number of labels assigned. However, the distance-based measures do
discriminate. The combination in the first row of Table 3, “Chinese,” “Japanese,”
“American (New),” and “Hawaiian,” spans a very considerable distance, and it
receives high values for $W_C$ and $W_N$. However, the restaurant in the fourth row,
which also bears four labels, combines more similar categories: “Persian/Iranian,”
“American (New),” “Middle Eastern,” and “Mediterranean.” Consequently, the
values of the distance-based measures of niche width are lower. The table illustrates a
similar pattern among restaurants assigned three labels.
The comparisons of the extreme cases in Table 3 also show that the range of
variation of $W_C$ is very large relative to that of the constrained $W_N$. Over the
ranges in this table, the ratio of the maximum to the minimum of $W_C$ is 1.65,
while the same calculation for $W_N$ gives 1.03. So these two measures differ greatly
for extreme values, even though they mostly agree on the ordering. The Pearson
correlation of the two distance-based measures is 0.85 for the multi-label cases.
The correlations of the qualitative measure with the two distance-weighted
measures are modest for the multi-label cases. The correlation of $W_Q$ and $W_C$ is 0.43,
and the correlation of $W_Q$ and $W_N$ is 0.28. Note that in some cases the ordering
switches; for example, the three-label restaurant “Mexican,” “Korean,” “Burgers”
has a wider niche according to $W_C$ than the four-label “Persian/Iranian,” “American
(New),” “Middle Eastern,” “Mediterranean” restaurant.

Table 3. Examples of alternative calculations of categorical niche width for restaurants with three or four label assignments

W_Q     W_C     W_N     Labels
0.750   24.0    0.959   Chinese, Japanese, American (New), Hawaiian
0.750   19.2    0.950   Chinese, Japanese, Korean, Thai
0.750   17.5    0.936   Middle Eastern, Greek, Barbeque, Mediterranean
0.750   14.7    0.941   Persian/Iranian, American (New), Middle Eastern, Mediterranean
0.667   18.3    0.948   Mexican, Korean, Burgers
0.667   15.1    0.938   Japanese, Italian, Asian Fusion
0.667   7.54    0.883   Russian, Modern European, German
0.667   4.84    0.829   Singaporean, Malaysian, Indonesian
6. Genre Combination and Restaurant Appeal

Finally we turn to empirical testing of the main propositions in our argument. We
have two goals here. The first is to test whether categorical niche width, maximum
contrast, and secondary contrast affect appeal as predicted. The second is to learn
whether the proposed reconceptualization and measurement make a substantive
difference for the analysis of category membership and appeal.
To answer these questions, we compare models of audience members’ ratings of
restaurants that use the alternative measures. The outcome variable is the rating given
in an audience member’s review. The explanatory variables include the theoretically
relevant ones and a variety of controls for other properties of the categories, of the
reviewer, and of the restaurant reviewed.
Measurement. All registered users of the site can submit one or more reviews of
any producer. Especially important is the summary rating, the outcome variable in
our analyses, which ranges from one to five stars (the highest rating). Most reviews
give three or more stars: the mode of the distribution is four stars in both cities,
and the mean is 3.7 in both cities.
Similarity. We consider two ways of measuring similarity: the symmetric Jaccard
index (SimJ ) and the asymmetric Tversky ratio measure (SimT ). We convert mea-
sures of similarity to measures of distance using the exponential relation with γ = 1,
as explained above. We calculate GoMs using µD (eqn. 2).
In the end we produce three sets of GoMs to compare with the results of the
qualitative approach. These GoMs do not appear directly in the stochastic spec-
ifications we estimate. Rather they form the basis for calculations of niche width
and contrast.
Niche width over labels. We conduct analyses using the three alternative measures
of niche width: the purely qualitative measure used in previous research WQ (eqn. 8)
and our proposed alternatives, WC (eqn. 11) and WN (eqn. 12).
Table 4. Distributions of theoretically relevant variables with alternative measurements (for all restaurants)

Variable        Measures      Mean   S.D.   Min.   Max.     Mean   S.D.   Min.   Max.
Niche width     W_Q   SimQ    0.19   0.25   0      0.75     0.18   0.25   0      0.75
Niche width     W_C   SimJ    2.42   3.86   0      17.5     2.55   4.03   0      24.0
Niche width     W_C   SimT    1.41   2.44   0      11.9     1.51   2.61   0      17.4
Niche width     W_N   SimJ    0.29   0.40   0      0.95     0.30   0.41   0      0.96
Niche width     W_N   SimT    0.25   0.36   0      0.92     0.26   0.36   0      0.95
Max. contrast   µQ    SimQ    0.78   0.09   0.42   1        0.78   0.09   0.45   1
Max. contrast   µD    SimJ    0.69   0.11   0.24   1        0.69   0.12   0.23   1
Max. contrast   µD    SimT    0.75   0.09   0.35   1        0.75   0.10   0.33   1
We also pay attention to another aspect of the niche. Any restaurant that
engages outside the very broad food category has a very wide niche. We constructed
a dummy variable to capture this dimension of niche width. Labeled “any non-food
category,” it equals one for restaurants with an assignment to a non-food category
such as “Gas station” and zero for those whose category assignments all
come from the food domain.
Maximum and secondary contrast. We measure maximum contrast for a restaurant
as the maximum over the labels assigned to it of the label contrasts (average GoM in
a label for those restaurants with positive GoM in the label), and we set secondary
contrast to the value for the next-highest contrast label assigned. Of course, the
values of these two variables depend on which method is used for calculating GoMs
in labels.
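A small Python sketch of how the two contrast variables can be read off once label contrasts are available. The restaurant and the function name are hypothetical; its label contrasts are the San Francisco distance-based values for Chinese, Thai, and Japanese from Table 2. How secondary contrast is coded for single-label restaurants is not stated in this section, so the sketch returns None in that case as a placeholder assumption.

```python
from typing import Dict, Iterable, Optional, Tuple


def max_and_secondary_contrast(assigned: Iterable[str], label_contrast: Dict[str, float]
                               ) -> Tuple[float, Optional[float]]:
    """Highest and next-highest contrast among the labels assigned to a producer."""
    ranked = sorted((label_contrast[lab] for lab in assigned), reverse=True)
    if not ranked:
        raise ValueError("producer has no labels")
    secondary = ranked[1] if len(ranked) > 1 else None  # assumption for single-label producers
    return ranked[0], secondary


sf_contrast_d = {"Chinese": 0.862, "Thai": 0.905, "Japanese": 0.664}  # Table 2, SF, mu^D
print(max_and_secondary_contrast(["Chinese", "Thai", "Japanese"], sf_contrast_d))
print(max_and_secondary_contrast(["Japanese"], sf_contrast_d))
```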
The distributions of these theoretically relevant variables can be found in Table 4.
Note that we discriminate among the various combinations of measurements of
interest. This table shows that categorical niche width is much more sensitive
to choice of alternative measures than contrast. We will see below that this means
that choice of measure has more impact on estimated effects of niche width on
appeal.
Controls. We control for variation among reviewers in engagement in the category
and the website using (the natural log of) the number of reviews posted for
restaurants bearing the category label, following Koçak, Hannan, and Hsu (2009). We
refer to this variable as the reviewer’s activism. We also control for the producer’s
prominence (on the website), measured as the natural log of the number of reviews
it receives. We include the date of review to control for secular trends in appeal.
Finally, we control for price levels, which can take four ordered values: coded one
(for “cheap”) to four (“splurge”).
Hypotheses. In forming hypotheses to test with data on appeal, we must make an
assumption about producers’ engagement with the audience. Our theory pertains to
intrinsic appeal and the ratings reflect actual appeal, which, according to Hannan
et al. (2007), depends on intrinsic appeal and engagement. It seems clear that
the restaurants have engaged the audience, but we do not know much about the
intensity of engagement. We do control for engagement in categories outside the
food domain, which likely serves as the major source of variation in engagement as
a “restaurant.” Otherwise, we assume that actual appeal is proportional to intrinsic
appeal.
Our argument implies that appeal is lower for restaurants with broad categorical
niches. Previous research with a subset of the reviews of SF restaurants revealed
that some of the effects of interest are sensitive to price levels (Kovács and Hannan
2010). We expect the effect of niche width on appeal to become more negative at
higher price levels (even though the prior research did not find this). At the lowest
price level (“cheap”), many establishments lack seating and provide offerings that do
not demand much skill in preparation (e.g., sandwiches, burgers, hot dogs, tacos,
and so forth). Genre conventions presumably apply even to such establishments,
but they appear to be weaker than for restaurants that employ chefs and provide
table service. From perusing reviews and from our own experience, we think that
genre conventions are strongest and most constraining at the high-price (“splurge”)
level.
Hypothesis 1. The actual appeal of restaurants decreases with (distance-weighted)
categorical niche width at high price levels.
Two other hypotheses concern the expected difference in effect of maximum and
secondary contrast on appeal:
Hypothesis 2. The actual appeal of restaurants increases with the maximum
(distance-weighted) contrast of the categories assigned.
Hypothesis 3. The actual appeal of restaurants decreases with secondary (distance-
weighted) contrast of the categories assigned.
Estimation. We use an ordered-logit specification to assess the effects of the above
variables on appeal, because the outcome variable (number of stars) is discrete
and ordered. The stochastic specification is

$$A^{*} = \mathbf{x}'\boldsymbol{\beta} + \varepsilon,$$

where $A^{*}$ denotes latent appeal, $\mathbf{x}$ denotes a vector of covariates,
$\boldsymbol{\beta}$ denotes a vector of parameters, and $\varepsilon$ has a logistic
distribution. The observed rating $A$ is

$$A = i \quad \text{if} \quad \delta_{i-1} \le A^{*} < \delta_i,$$

where $\delta_i$ ($i = 1, \ldots, I$) are cut points, $\delta_0 = -\infty$, and
$\delta_I = \infty$. To account for possible persistent differences between reviewers
(a source of non-independence), we cluster the error term in the regressions by
reviewer.
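Under the specification above, the probability of each star rating is a difference of logistic CDFs evaluated at adjacent cut points: $P(A = i) = F(\delta_i - \mathbf{x}'\boldsymbol{\beta}) - F(\delta_{i-1} - \mathbf{x}'\boldsymbol{\beta})$. The sketch below shows that mapping; the linear predictor and cut-point values are made up for illustration and are not estimates from this paper, and the function names are hypothetical.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def ordered_logit_probs(xb, cuts):
    """Return P(A = i), i = 1..I, when A* = xb + eps and eps is standard logistic.

    `cuts` holds the finite cut points delta_1 < ... < delta_{I-1};
    delta_0 = -inf and delta_I = +inf are added here.
    """
    bounds = [-math.inf, *cuts, math.inf]
    cdf = [logistic_cdf(b - xb) for b in bounds]
    return [cdf[i] - cdf[i - 1] for i in range(1, len(bounds))]

# Hypothetical values: a five-star scale needs four finite cut points.
probs = ordered_logit_probs(xb=0.3, cuts=[-2.0, -1.0, 0.5, 1.8])
print([round(p, 3) for p in probs])
```

Raising the linear predictor shifts probability mass toward the higher star categories, which is how a positive coefficient translates into higher expected ratings.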
Results. At a broad level, our analyses reveal three clear tendencies. First, the
results mainly support the main study hypotheses robustly: the hypothesized
patterns hold across the various ways of measuring GoMs, similarity, and niche
width (except for the results that bear on Hypothesis 2 using data for LA).
Second, the specifications built on distance-related measures fit substantially
better than those built on the distance-independent qualitative measures. Third,
the distance-based specifications generally yield more consistent evidence in
support of the hypotheses. Introducing some adjustment for category structure
matters more for substantive conclusions than the choice among the alternative
ways of making these adjustments.

Table 5. Model fits (ln L) using three measures of similarity and three
measures of niche width

                 WQ          WC                        WN
                 SimQ        SimJ         SimT         SimJ         SimT
                 (1)         (2)          (3)          (4)          (5)
San Francisco    −378,503    −377,883     −377,888     −378,092     −378,133
Los Angeles      −306,829    −306,794     −308,802     −306,776     −306,779
Recall that nearly three-quarters of the organizations are category specialists—
they get assigned only one label. We have done the analysis with these specialists
included and with them excluded. The general pattern of results is quite similar.
However, analysis restricted to the restaurants with two or more labels is more
informative for our purposes, because the specialists do not vary in niche width
or secondary contrast. The results for the category specialists, in the Appendix,
reveal a strong and significant effect of category contrast on appeal, in support of
Hypothesis 2.
Here we concentrate on the multiple-category restaurants. Because the compar-
isons of interest involve many different measures, we begin by considering model fits
(without providing all of the parameter estimates) in Table 5. We are especially in-
terested in comparing the fits of the specifications built on the qualitative measures
with those based on distance-weighted measures. It is clear that the qualitative
approach yields specifications (column 1 in Table 5, for both cities) that fit much
less well than those that make adjustments for the distances among categories.
The best fit partly differs by city. For SF, the best fits come from the specification
with proportional niche width; for LA, they come from the specification with
nonproportional niche width. Once the better-fitting niche-width measure is chosen
for each city, the choice between the Jaccard and Tversky measures of similarity
makes only a small difference in fit. For both cities, the fits are marginally better
with the Jaccard measure than with the asymmetric Tversky measure.
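For readers comparing the two similarity measures, here is a minimal sketch computed from label co-occurrence, treating each category as the set of restaurants that bear its label. The Jaccard index is |A∩B| / |A∪B| and is symmetric. For Tversky's (1977) ratio model, S(a, b) = |A∩B| / (|A∩B| + α|A−B| + β|B−A|), the weights α = 1 and β = 0 are chosen here purely for illustration; the weights behind the paper's SimT measure are defined in an earlier section and may differ. The restaurant sets are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Symmetric similarity: shared members divided by all members."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tversky(a: set, b: set, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Tversky ratio model; asymmetric whenever alpha != beta."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0

# Hypothetical label assignments: category -> restaurants bearing that label
mexican = {"r1", "r2", "r3", "r4"}
vietnamese = {"r2", "r5"}

print(jaccard(mexican, vietnamese))   # same value in either order
print(tversky(mexican, vietnamese), tversky(vietnamese, mexican))
```

With these weights, the rarer category looks more similar to the common one than the reverse, which is the kind of asymmetry the Tversky-based specifications allow.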
The key findings that bear on our hypotheses appear in Table 6, which reports
the parameter estimates for the qualitative approach along with those based on
proportional niche width for SF and nonproportional niche width for LA.
Consider the effects of niche width, beginning with SF. Here we see an important
substantive consequence of taking account of distances among categories. The
qualitative approach gives results that run counter to Hypothesis 1: the estimated
effects of niche width in column 1 of Table 6 are positive for all price levels (and
significant for all but the third price category). According to these results, the
audience prefers restaurants with broad categorical niches. However, the parameter
estimates for the specifications built on the distance-weighted measures, in
columns 2 and 3, tell the opposite story. They show that a broad niche increases
appeal at the lowest price level, but beyond that price point increasing niche width
is associated with lower appeal, and the effect becomes stronger and more significant
at higher price levels. In other words, once account is taken of the distances
spanned, it becomes clear that having a broad categorical niche makes a restaurant
less appealing to this audience.

Table 6. Estimated effects of niche width, maximum contrast, and secondary
contrast on appeal of restaurants in San Francisco and Los Angeles: ML estimates
of ordered-logit models

                              San Francisco                      Los Angeles
                              WQ         WC         WC           WQ         WN         WN
                              SimQ       SimJ       SimT         SimQ       SimJ       SimT
                              (1)        (2)        (3)          (4)        (5)        (6)
Price level 2                 0.013      2.12**     0.765**      −0.294**   −0.318**   −0.291**
                              (0.077)    (0.072)    (0.033)      (0.080)    (0.023)    (0.019)
Price level 3                 0.580**    2.72**     1.15**       −0.492**   −0.024     −0.034
                              (0.101)    (0.092)    (0.042)      (0.137)    (0.033)    (0.027)
Price level 4                 0.161      4.00**     1.86**       1.96**     0.842**    0.794**
                              (0.258)    (0.343)    (0.164)      (0.197)    (0.052)    (0.045)
Non-food cat.                 0.047**    0.017      0.019        −0.157**   −0.176**   −0.180**
                              (0.011)    (0.011)    (0.011)      (0.011)    (0.011)    (0.011)
Review date                   0.015**    0.013**    0.013**      −0.002     −0.003     −0.003
                              (0.002)    (0.002)    (0.002)      (0.003)    (0.003)    (0.003)
Rest. prominence              0.218**    0.248**    0.247**      0.164**    0.157**    0.156**
                              (0.005)    (0.005)    (0.005)      (0.004)    (0.004)    (0.004)
ln(rev. activism)             −0.022     −0.104**   −0.068**     0.019      −0.018     −0.029**
                              (0.023)    (0.025)    (0.011)      (0.029)    (0.013)    (0.011)
NW × price 1                  1.23**     2.19**     1.17**       0.408      −0.003     −0.007
                              (0.207)    (0.131)    (0.069)      (0.229)    (0.004)    (0.005)
NW × price 2                  0.682**    −0.695**   −0.308**     0.575**    0.013**    0.014**
                              (0.175)    (0.120)    (0.064)      (0.209)    (0.004)    (0.005)
NW × price 3                  0.035      −1.14**    −0.526**     1.12**     −0.017**   −0.028**
                              (0.212)    (0.138)    (0.073)      (0.309)    (0.005)    (0.006)
NW × price 4                  1.09*      −2.38**    −1.17**      −2.26**    −0.049**   −0.074**
                              (0.488)    (0.397)    (0.205)      (0.382)    (0.007)    (0.009)
NW × activism                 −0.030     0.066*     0.024        −0.072     −0.000     0.000
                              (0.040)    (0.028)    (0.015)      (0.051)    (0.001)    (0.001)
ln(max. contrast)             0.320**    0.442**    0.631**      0.113      −0.006     −0.015
                              (0.123)    (0.085)    (0.112)      (0.134)    (0.088)    (0.116)
ln(sec. contrast)             −0.031     −0.195**   −0.343**     −0.748**   −0.385**   −0.436**
                              (0.088)    (0.053)    (0.078)      (0.108)    (0.055)    (0.072)
ln(max. contrast) × activism  −0.043     −0.065**   −0.113**     −0.012     0.005      0.009
                              (0.029)    (0.020)    (0.027)      (0.033)    (0.022)    (0.028)
ln(sec. contrast) × activism  0.036      0.031**    0.057**      0.112**    0.062**    0.069**
                              (0.021)    (0.012)    (0.019)      (0.027)    (0.014)    (0.018)
N                             268,996    268,996    268,996      217,687    217,687    217,687
ln L                          −378,503   −377,883   −377,888     −306,829   −306,776   −306,779
* p < 0.05; ** p < 0.01; standard errors (clustered by reviewer) in parentheses.
For LA, the qualitative and metric (distance-weighted) approaches agree in
yielding a pattern in which niche width increases appeal in the lower price
categories. According to the results built on the qualitative approach, the effect of
niche width on appeal remains positive (and significant) in the third price range
and becomes negative only in the highest price range. The distance-based estimates
indicate that the effect of niche width on appeal is negative and significant in both
of the higher price ranges, with the largest absolute effect at the highest price.
Overall we see that the parameter estimates with distance-weighted measures
provide stronger and more systematic support for Hypothesis 1 than do those built
on the qualitative measure of similarity/distance.
Next consider maximum contrast. The first-order effect on appeal is positive for
SF, but negative and insignificant for LA in the better fitting specifications. So
only the results for SF support Hypothesis 2. Figure 3 plots the implied effects
of maximum contrast for SF for the qualitative, Jaccard, and Tversky measures
(over the varying ranges of maximum contrast produced by the different measures).
Note that the effect of maximum contrast is considerably weaker (flatter) with the
qualitative measure and strongest with the Tversky measure.
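Because maximum contrast enters the model in logs, each curve in Figure 3 has the form β·ln(c): it equals zero at c = 1 and becomes more negative as contrast falls, with steepness proportional to β. The sketch below uses the first-order ln(max. contrast) coefficients from columns 1 to 3 of Table 6. It assumes, as our own reading rather than something the text states, that the plotted curves hold ln(activism) at zero so that the interaction term drops out; the function name is hypothetical.

```python
import math

# First-order ln(max. contrast) coefficients for SF (Table 6, columns 1-3).
BETA_MAX = {"Qualitative": 0.320, "Jaccard": 0.442, "Tversky": 0.631}

def max_contrast_effect(beta, contrast):
    """beta * ln(contrast): zero at contrast = 1, negative below it.

    Assumes ln(activism) = 0, so the activism interaction term drops out.
    """
    return beta * math.log(contrast)

for c in (0.2, 0.5, 1.0):
    print(c, {name: round(max_contrast_effect(b, c), 3) for name, b in BETA_MAX.items()})
```

At any contrast below one, the Tversky curve lies lowest and the qualitative curve highest, matching the ordering described in the text.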
Finally, Hypothesis 3 receives strong support. For both cities, appeal declines
with increasing secondary contrast; and this effect is significant in all cases for the
distance-based approach (but only for LA with the qualitative approach). The
better-fitting models show that, net of the effect of maximum contrast, higher
secondary contrast lowers appeal, as hypothesized. Figure 4 illustrates the joint
effect of maximum and secondary contrast on appeal using the estimates in the
second column in Table 6. The peak of the function (the effect on appeal) in
the upper right equals approximately 0.35. Moving to the left from that point
(increasing secondary contrast) drives the function toward zero, the value when
secondary contrast equals one. The decline in appeal with falling maximum contrast
is steeper. At the left-hand face of the graph of the function, the effect on appeal
falls from 0.35 to roughly −0.35 as maximum contrast falls over its observed range.
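The surface in Figure 4 can be approximated with the same log-linear form, f(m, s) = 0.442·ln m − 0.195·ln s, using the first-order coefficients from the second column of Table 6. This sketch assumes the activism interactions are set to zero, which the text does not confirm, and the function name is hypothetical. Under this form the function is zero when both contrasts equal one, rises as secondary contrast falls, and drops more steeply as maximum contrast falls. The exact peak depends on the smallest observed secondary contrast, which is not reported here. Because secondary contrast is the next-to-maximal contrast, only points with s ≤ m are attainable.

```python
import math

# First-order coefficients, Table 6 column 2 (SF, proportional niche width, Jaccard).
B_MAX = 0.442   # ln(max. contrast)
B_SEC = -0.195  # ln(sec. contrast)

def joint_contrast_effect(max_c, sec_c):
    """B_MAX*ln(max_c) + B_SEC*ln(sec_c); activism interactions assumed to be zero."""
    return B_MAX * math.log(max_c) + B_SEC * math.log(sec_c)

# Coarse grid; zero is excluded because the log is undefined there, and only
# sec_c <= max_c is evaluated because secondary contrast cannot exceed the maximum.
grid = [0.2, 0.4, 0.6, 0.8, 1.0]
for m in grid:
    print(m, [round(joint_contrast_effect(m, s), 3) for s in grid if s <= m])
```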
We explored whether the effects of the theoretically relevant variables differ for
more and less active reviewers. Again the answer differs by city. For SF, it is
clear that activists are less sensitive to issues of interpretability. For all three of
the theoretically relevant effects, there is a statistically significant interaction with
activism whose sign opposes the sign of the main effect. For instance, the coefficient
of the interaction of niche width and activism is 0.066 in column 2 of Table 6. At
the mean of (ln) activism for SF, 3.86, the interaction effect is 0.25; at the 75th
percentile, 4.29, it is 0.28; at the maximum, 7.29, it is 0.48. These differences are
small relative to the main effects of niche width. The total effect of niche width
(including the main effect and the interaction) is negative at the three highest price
levels even at the observed maximum of activism. So, although activists react less
to boundary crossing, they still find broad spanners less appealing than restaurants
that do not span categories or that span nearby categories. The situation for the
interactions of activism with maximum and secondary contrast in SF is similar: the
total effect of maximum contrast is positive up to the 99th percentile of activism,
and the total effect of secondary contrast is negative up to the 99th percentile of
activism.

[Figure 3 plot: effect on appeal (vertical axis) against maximum contrast
(horizontal axis, 0.2 to 1), with one curve each for the Qualitative, Jaccard, and
Tversky measures.]
Figure 3. Effect of maximum contrast on appeal for San Francisco restaurants
based on three different measures of grades of membership: the solid line is based
on specification (1) in Table 6, the dotted line on specification (2), and the dashed
line on specification (3).
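The SF niche-width interaction arithmetic described above can be checked directly: the total slope at a given price level is the price-specific coefficient plus 0.066 times ln(activism), using Table 6, column 2. A minimal sketch (the dictionary and function names are ours):

```python
# SF, Table 6 column 2: price-specific niche-width (NW) slopes and the NW x activism term.
NW_BY_PRICE = {1: 2.19, 2: -0.695, 3: -1.14, 4: -2.38}
NW_X_ACTIVISM = 0.066

def total_niche_width_effect(price, ln_activism):
    """Main effect at a price level plus the interaction evaluated at ln(activism)."""
    return NW_BY_PRICE[price] + NW_X_ACTIVISM * ln_activism

# ln(activism) values quoted in the text: mean, 75th percentile, maximum.
for ln_act in (3.86, 4.29, 7.29):
    interaction = NW_X_ACTIVISM * ln_act
    totals = {p: round(total_niche_width_effect(p, ln_act), 3) for p in NW_BY_PRICE}
    print(ln_act, round(interaction, 2), totals)
```

The interaction values reproduce the 0.25, 0.28, and 0.48 quoted above, and at the maximum of activism the totals for price levels 2 to 4 remain negative.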
For LA, only one of the three interactions is significant: the interaction of
activism and secondary contrast is positive and significant. Again, the total effect
of secondary contrast on appeal is negative over virtually the full range of reviewer
activism, up to the 99th percentile of the distribution.
7. Discussion
Our point of departure is a general pattern emerging in contemporary research:
organizations affiliated with multiple categories suffer diminished appeal in the
eyes of audience members (Zuckerman 1999; Dobrev, Kim, and Hannan 2001; Hsu
et al. 2009). We argue that a better understanding of the consequences of category
spanning requires consideration of the structure of the underlying category space.
We made two arguments. First, the socio-cultural distance between the categories
being combined affects evaluations: the less similar the spanned categories, the
greater the confusion of identity (the overall atypicality) resulting from combining
them. Such confusion lowers appeal to audience members. Second, the contrasts of
the categories also influence the consequences of combination. Because high-contrast
categories come with stronger expectations and norms, we expect that combining
high-contrast categories leads to more confusion and therefore to lower appeal to
audience members.

[Figure 4 plot: surface of the effect on appeal (vertical axis) over maximum
contrast and secondary contrast (horizontal axes).]
Figure 4. Effect of maximum contrast and secondary contrast on appeal for
San Francisco. Based on the specification in the second column in Table 6.
Studying these propositions empirically required rethinking how multiple-category
membership influences producers’ appeal to audience members. We took as a
starting point the approach of Hsu et al. (2009), which has by now become a
widespread way to study multiple memberships (Pontikes 2008; Hsu et al. 2009;
Pontikes 2009; Carroll et al. 2010; Negro et al. 2010a; Kovács and Hannan 2010;
Negro et al. 2011). This approach assumes that, because the schemas for the various
labels differ, organizations that are assigned only one label generally fit that
category’s schema better than do those that are assigned two or more labels.
Therefore, Hsu et al. (2009) argue that grades of membership in each category fall
as more labels are assigned. They propose that the GoMs of organizations in
categories can be measured simply in terms of the count of labels assigned to an
organization.
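One natural reading of the count-based rule, assumed here, gives an organization with k labels a GoM of 1/k in each of its categories. The sketch below implements that rule and, as a further assumption for illustration, computes a category's contrast as the mean GoM of the organizations bearing its label. The label assignments and function names are hypothetical, and this is the qualitative baseline, not the distance-weighted GoM proposed in this paper.

```python
from collections import defaultdict

# Hypothetical label assignments: restaurant -> set of category labels
labels = {
    "r1": {"Mexican"},
    "r2": {"Mexican", "Vietnamese"},
    "r3": {"Vietnamese", "French", "Mexican"},
    "r4": {"French"},
}

def count_based_gom(labels):
    """GoM of each organization in each of its categories: 1 / (number of labels)."""
    return {org: {cat: 1.0 / len(cats) for cat in cats} for org, cats in labels.items()}

def contrast(gom):
    """Assumed here: a category's contrast is the mean GoM of the organizations bearing its label."""
    members = defaultdict(list)
    for org_goms in gom.values():
        for cat, g in org_goms.items():
            members[cat].append(g)
    return {cat: sum(gs) / len(gs) for cat, gs in members.items()}

print(contrast(count_based_gom(labels)))
```

Categories whose members rarely carry other labels score close to one on this measure, while categories populated mostly by spanners score lower.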
Our goal of taking category structure into account led us to rework this approach.
We proposed that inter-category distances be built directly into the GoM function
in a way that yields lower GoMs for combinations of more distant categories. We then
used the new measure of GoM to calculate category contrast. Thus we arrive at a
measure of contrast that builds on the category structure of the domain. We also
introduced two alternative measures of categorical niche width that incorporate the
similarity structure of the categories.
In the empirical part of the paper, we analyzed customers’ evaluations of restaurants
in SF and LA (many of which cross categorical boundaries, such as “Mexican”–
“Vietnamese”). We found generally strong support for our theoretical propositions.
We also found that the proposed measures yield better model fits than the approach
used previously, suggesting that they describe the data more precisely. In other
words, incorporating information about category structure does lead to a better
understanding of the consequences of category spanning.
Using labels to infer category membership, however, could be more tenuous
than we have assumed. For instance, multiple labels do not differentiate between
“foodcourt” and “fusion” situations (Baron 2004). That is, a “Mexican”-“French”
restaurant might offer Mexican dishes on one side of its menu and French dishes
on the other, or it might serve only dishes that fuse elements of the two cuisines.
These are qualitatively distinct cases of category spanning, and these restaurants
would attract different audiences. We cannot, however, tell these cases apart in
our data. Future work is needed to address this distinction both theoretically and
empirically (for example, by analyzing restaurant menus).
Our proposed approach could be useful in diverse empirical settings. Organizational
examples include law firms (some practice areas are closer to one another than
others), wineries (where blends of varietals could be used to calculate distances
among varietals), movies (some genres are closer to one another than others), and
financial organizations such as hedge funds (some stocks and financial instruments
are closer to one another than others, so hedge funds differ in their focus). Other
possible applications include category combination in work (Leahey 2007),
innovation (Carnabuci et al. 2011), and culture (Goldberg 2011).
The proposed measures of the distances among categories could aid researchers
in studying the evolution of categories over time. Categories that are distant in one
time period might become much less so in subsequent ones. Adjusting for the
dissimilarity of categories seems particularly useful when the categorical structure
is in flux. Indeed, the finding that distance among categories and the contrast of
categories influence producers’ evaluations might have interesting dynamic
consequences. In the case of categorical contrast, if producers in low-contrast
categories are more likely to cross boundaries, then the contrasts of these categories
will decrease further; similarly, we expect that the contrast of high-contrast
categories would increase further or at least remain stable. Pontikes and Hannan
(2011) find evidence of such a pattern in the software industry. These processes
would imply a tendency toward macro-level polarization of categories’ contrasts.
Another possibly interesting dynamic links category combination and the distances
among categories. On one hand, distance affects the prevalence of spanning. On the
other hand, category combination influences how audiences perceive the distances
among categories: research in cognitive psychology and linguistics shows that
categories that tend to occur together are perceived as similar (e.g., Church and
Hanks 1990). This feedback loop between spanning and category distances would
imply the polarization of pairwise distances among categories: categories that are
initially similar will get combined more often, thereby increasing their similarity,
and dissimilar pairs will rarely be combined, keeping their similarity low.
How do the processes that we study intersect with the phenomenon of the cultural
omnivore (Peterson 1992; Peterson and Kern 1996; Goldberg 2011)? Although
we controlled for individual differences in reviewing histories, we did not study
how individual differences affect reactions to category combination. The literature
on structural differences in omnivorousness would indicate, however, that such
differences are present and would matter. A main finding in this literature is
that individuals in high-status occupations are more likely than others to be
involved in a wide range of cultural activities; for example, they attend both the
opera (high-brow) and rock concerts (low-brow) (see, e.g., Peterson and Kern 1996).
This pattern suggests that high-status individuals are more open to trying a variety
of cuisines and their combinations. The negative effects of category spanning,
then, would depend on the audience structure, specifically the composition of each
organization’s audience in terms of omnivorousness.16
Status likely plays a role in these processes as well. Categories clearly differ in
status (Sharkey 2010), and, specific to our empirical domain, so do cuisines (for
example, French cuisine and sushi are traditionally considered high-brow).
Organizations could be punished for combining status-incoherent cuisines. Although
status differences would be picked up by our relational approach to category
distances, future research could scrutinize the impact of status differences on
category spanning. Some research has addressed this topic. For instance, research on
middle-status conformity investigates the antecedents of category spanning:
organizations with either low or high status are more likely than middle-status ones
to deviate from (non-fundamental) category codes (Phillips and Zuckerman 2001).
Rao et al. (2005) demonstrate that high-status chefs are more likely to cross
boundaries. There is less research, however, on the relationship between status and
the consequences of crossing category boundaries.
Another potential extension would be to use the proposed approach to study
the diversity of organizations entering a given category (McKendrick and Carroll
2001). The process of legitimation would likely differ depending on whether de alio
entrants come from a set of industries that are close to or distant from one another.
Future research might look at how, when novel forms first emerge, their distance
relative to other forms affects legitimation (Ruef 2000).
Future research could also explore alternative measurement approaches. For
example, one could put more emphasis on the analysis of audience structures and
on audience members’ tastes regarding category spanning by using the novel
approach of relational class analysis (Goldberg 2011). Future research that uses
review data should also consider the selection problems that arise in such data:
while this paper analyzed how category spanning influences whether restaurants
get high or low ratings, we did not model the chance that these restaurants are
visited or reviewed. We suspect that category spanning does influence selection,
but our data did not allow us to investigate this question further.

16 An audience member’s omnivorousness does not necessarily indicate her
willingness to accept category spanning. For example, even if she likes Mexican
and French cuisines, she might care about the authenticity of the experience and
appreciate only authentic instances of either genre, not their combination.
Appendix
Here we present the estimated effect of maximum contrast (the contrast of the
only applicable category) for restaurants in the two cities that bear only one cate-
gory label.
Table 7. Estimated effects of maximum contrast on appeal of single-food-category
restaurants in San Francisco and Los Angeles: ML estimates of ordered-logit models

                                      San Francisco                  Los Angeles
                                      SimQ      SimJ      SimT       SimQ      SimJ      SimT
Price level 2                         −0.300**  −0.303**  −0.305**   −0.106**  −0.099**  −0.090**
                                      (0.006)   (0.006)   (0.006)    (0.007)   (0.007)   (0.007)
Price level 3                         −0.210**  −0.215**  −0.221**   −0.011    0.000     0.016
                                      (0.009)   (0.009)   (0.009)    (0.013)   (0.013)   (0.013)
Price level 4                         0.410**   0.409**   0.405**    0.567**   0.580**   0.596**
                                      (0.018)   (0.018)   (0.018)    (0.022)   (0.023)   (0.022)
Non-food cat.                         0.030**   0.029**   0.025**    −0.131**  −0.130**  −0.125**
                                      (0.005)   (0.005)   (0.005)    (0.006)   (0.006)   (0.006)
Review date (years)                   0.027**   0.027**   0.027**    0.002     0.002     0.003
                                      (0.002)   (0.002)   (0.002)    (0.003)   (0.003)   (0.003)
Rest. prominence                      0.255**   0.255**   0.255**    0.101**   0.103**   0.105**
                                      (0.003)   (0.003)   (0.003)    (0.003)   (0.003)   (0.003)
ln(rev. activism)                     −0.046**  −0.049**  −0.053**   −0.066**  −0.070**  −0.077**
                                      (0.005)   (0.005)   (0.005)    (0.006)   (0.006)   (0.006)
ln(max. contrast)                     0.312**   0.184**   0.201**    −0.033    0.102*    0.354**
                                      (0.067)   (0.045)   (0.058)    (0.074)   (0.048)   (0.065)
ln(max. contrast) × ln(rev. activism) −0.036*   −0.034**  −0.055**   −0.043*   −0.039**  −0.071**
                                      (0.016)   (0.011)   (0.014)    (0.018)   (0.012)   (0.016)
N                                     497,233   497,233   497,233    397,210   397,210   397,210
ln L                                  −703,050  −703,072  −703,076   −569,248  −569,272  −569,262
* p < 0.05; ** p < 0.01; standard errors (clustered by reviewer) in parentheses.

References
Baron, James N. 2004. “Employing Identities in Organizational Ecology.” Industrial
and Corporate Change 13:3–32.
Batagelj, Vladimir and Matevz Bren. 1995. “Comparing Resemblance Measures.”
Journal of Classification 12:73–90.
Carnabuci, Gianluca, Balázs Kovács, and Filippo Wezel. 2011. “A Breach of
Discipline? Category Contrast and the Impact of Knowledge Recombination.”
Presented at the Meetings of the Nagymaros Group on Organizational Ecology,
Lugano, Switzerland.
Carroll, Glenn R., Mi Feng, Gaël Le Mens, and David G. McKendrick. 2010. “Orga-
nizational Evolution with Fuzzy Technological Formats: Tape Drive Producers
in the World Market, 1951–1998.” Research in the Sociology of Organizations
31:203–234.
Carroll, Glenn R. and Dennis R. Wheaton. 2009. “The Organizational Construction
of Authenticity: An Examination of Contemporary Food and Dining in the U.S.”
Research in Organizational Behavior 29:255–282.
Church, Kenneth W. and Patrick Hanks. 1990. “Word Association Norms, Mutual
Information, and Lexicography.” Computational Linguistics 16:22–29.
Dobrev, Stanislav D., Tai-Young Kim, and Michael T. Hannan. 2001. “Dynam-
ics of Niche Width and Resource Partitioning.” American Journal of Sociology
106:1299–1337.
Durkheim, Emile and Marcel Mauss. 1969 [1903]. Primitive Classification. London:
Routledge.
Garfinkel, Harold. 1968. Studies in Ethnomethodology. Polity Press.
Goldberg, Amir. 2011. “Mapping Shared Understandings Using Relational Class
Analysis: The Case of the Cultural Omnivore Reconsidered.” American Journal
of Sociology 116:1397–1436.
Griswold, Wendy. 1987. “The Fabrication of Meaning: Literary Interpretation in
the United States, Great Britain, and the West Indies.” American Journal of
Sociology 92:1077–1117.
Hampton, James A. 2007. “Typicality, Graded Membership, and Vagueness.” Cog-
nitive Science 31:355–383.
Hannan, Michael T. 2010. “Partiality of Memberships in Categories and Audiences.”
Annual Review of Sociology 36:159–181.
Hannan, Michael T., László Pólos, and Glenn R. Carroll. 2007. Logics of Orga-
nization Theory: Audiences, Codes, and Ecologies. Princeton, N.J.: Princeton
University Press.
Hsu, Greta. 2006. “Jacks of All Trades and Masters of None: Audiences’ Reac-
tions to Spanning Genres in Feature Film Production.” Administrative Science
Quarterly 51:420–450.
Hsu, Greta, Michael T. Hannan, and Özgeçan Koçak. 2009. “Multiple Category
Memberships in Markets: An Integrative Theory and Two Empirical Tests.”
American Sociological Review 74:150–169.
Hsu, Greta, Michael T. Hannan, and László Pólos. 2011. “Typecasting, Legitima-
tion, and Form Emergence: A Formal Theory.” Sociological Theory 29:97–123.
Jaccard, Paul. 1901. “Étude Comparative de la Distribution Florale dans une
Portion des Alpes et des Jura.” Bulletin de la Société Vaudoise des Sciences
Naturelles 37:547–579.
Koçak, Özgeçan, Michael T. Hannan, and Greta Hsu. 2009. “Enthusiasts and the
Structure of Markets.” Presented at the Meeting of the Society for the Study of
Socioeconomics, Paris.
Kovács, Balázs. Forthcoming. “A Monte Carlo Permutation Test for Co-Occurrence
Data.” Quality and Quantity.
Kovács, Balázs and Michael T. Hannan. 2010. “The Consequences of Category Span-
ning Depend on Contrast.” Research in the Sociology of Organizations 31:175–
201.
Kruskal, Joseph B. and Myron Wish. 1978. Multidimensional Scaling. Sage.
Leahey, Erin. 2007. “Not by Productivity Alone: How Visibility and Specialization
Contribute to Academic Earnings.” American Sociological Review 72:533–561.
Leung, Ming D. 2011. “Apples to Oranges: How Category Overlap Facilitates
Commensuration in an Online Market for Services.” Presented at the Meetings of
the Academy of Management.
Leung, Ming D. and Amanda Sharkey. 2009. “Out of Sight Out of Mind: The Mere
Labeling Effect of Multiple-Category Memberships in Markets.” Unpublished
Manuscript, Stanford University.
McKendrick, David G. and Glenn R. Carroll. 2001. “On the Genesis of Organiza-
tional Forms: Evidence from the Market for Disk Drive Arrays.” Organization
Science 12:661–683.
Negro, Giacomo, Michael T. Hannan, and Hayagreeva Rao. 2010a. “Categorical
Contrast and Audience Appeal: Niche Width and Critical Success in Winemak-
ing.” Industrial and Corporate Change 19:1397–1425.
Negro, Giacomo, Michael T. Hannan, and Hayagreeva Rao. 2011. “Categorical
Challenge and Response: Modernism and Tradition in Italian Wine Production.”
Organization Science in press.
Negro, Giacomo, Özgeçan Koçak, and Greta Hsu. 2010b. “Research on Categories
in the Sociology of Organizations.” Research in the Sociology of Organizations
31:1–35.
Peterson, Richard A. 1992. “Understanding Audience Segmentation: From Elite
and Mass to Omnivore and Univore.” Poetics 21:243–258.
Peterson, Richard A. and Roger M. Kern. 1996. “Changing Highbrow Taste: From
Snob to Omnivore.” American Sociological Review 61:900–907.
Phillips, Damon J. and Ezra W. Zuckerman. 2001. “Middle-status Conformity: The-
oretical Refinement and Empirical Demonstration in Two Markets.” American
Journal of Sociology 107:379–429.
Pólos, László and Michael T. Hannan. 2002. “Reasoning with Partial Knowledge.”
Sociological Methodology 32:133–181.
Pólos, László and Michael T. Hannan. 2004. “A Logic for Theories in Flux: A
Model-theoretic Approach.” Logique et Analyse 47:85–121.
Pontikes, Elizabeth G. 2008. Fitting In or Starting New? An Analysis of Invention,
Constraint, and the Emergence of New Categories in the Software Industry. Ph.D.
thesis, Stanford University.
Pontikes, Elizabeth G. 2009. “Leniency and Audience Evaluations.” Presented at
the meetings of the Nagymaros Group on Organizational Ecology.
Pontikes, Elizabeth G. 2011. “Two Sides of the Same Coin: How Category Leniency
Affects Multiple Audience Evaluations.” Technical report, University of Chicago.
Pontikes, Elizabeth G. and Michael T. Hannan. 2011. “Leniency, Recombination,
and Claims to Category Labels.” Presented at the meetings of the Nagymaros
Group on Organizational Ecology.
Rao, Hayagreeva, Philippe Monin, and Rodolphe Durand. 2005. “Border Crossing:
Bricolage and the Erosion of Categorical Boundaries in French Gastronomy.”
American Sociological Review 70:968–991.
Rosch, Eleanor and Carolyn B. Mervis. 1975. “Family Resemblances: Studies in
the Internal Structure of Categories.” Cognitive Psychology 7:573–605.
Rosch, Eleanor H. 1975. “Cognitive Representations of Semantic Categories.” Jour-
nal of Experimental Psychology: General 104:192–233.
Ruef, Martin. 2000. “The Emergence of Organizational Forms: A Community
Ecology Approach.” American Journal of Sociology 106:658–714.
Sharkey, Amanda J. 2010. Sieves and Lenses: Essays on the Role of Categoriza-
tion in Social Valuation. Ph.D. thesis, Graduate School of Business, Stanford
University.
Shepard, Roger N. 1987. “Toward a Universal Law of Generalization for Psycho-
logical Science.” Science 237:1317–1323.
Simmel, Georg. 1978 [1907]. The Philosophy of Money. Boston: Routledge.
Simpson, E. H. 1949. “Measurement of Diversity.” Nature 163:688.
Smith, Edward Bishop. 2011. “Identities as Lenses: How Organizational Identity
Affects Audiences’ Evaluation of Organizational Performance.” Administrative
Science Quarterly 56:61–94.
Tversky, Amos. 1977. “Features of Similarity.” Psychological Review 84:327–352.
Zuckerman, Ezra W. 1999. “The Categorical Imperative: Securities Analysts and
the Legitimacy Discount.” American Journal of Sociology 104:1398–1438.
Zuckerman, Ezra W., Tai-Young Kim, Kalinda Ukanwa, and James von Rittmann.
2003. “Robust Identities or Non-entities? Typecasting in the Feature Film Labor
Market.” American Journal of Sociology 108:1018–1075.
University of Lugano
Stanford University

RP2081R 1

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

RP2081R 1

Uploaded by

Copyright:

Available Formats

CATEGORY SPANNING, DISTANCE, AND APPEAL

BALÁZS KOVÁCS AND MICHAEL T. HANNAN

Stanford GSB Research Paper 2081, revised November 2011

Abstract. A general finding in economic and organizational sociology states
