
What makes Individual I’s a Collective We;

Coordination mechanisms & costs


Jisung Yoon1,2, Chris Kempes3, Vicky Chuqiao Yang4, Geoffrey West3, and Hyejin Youn1,2,3,*

1 Kellogg School of Management, Northwestern University, Evanston, IL
2 Northwestern Institute on Complex Systems, Evanston, IL
3 Santa Fe Institute, Santa Fe, NM
4 MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA
* The order of authors has not yet been determined.

arXiv:2306.02113v1 [physics.soc-ph] 3 Jun 2023

June 6, 2023

Abstract
For a collective to become greater than the sum of its parts, individuals’ efforts
and activities must be coordinated or regulated. Not readily observable and measur-
able, this particular aspect often goes unnoticed and understudied in complex systems.
Diving into the Wikipedia ecosystem, where people are free to join and voluntarily
edit individual pages with no firm rules, we identified and quantified three fundamental
coordination mechanisms and found they scale with an influx of contributors in
a remarkably systematic way over three orders of magnitude. Firstly, we have found a
super-linear growth in mutual adjustments (scaling exponent: 1.3), manifested through
extensive discussions and activity reversals. Secondly, the increase in direct supervision
(scaling exponent: 0.9), as represented by the administrators’ activities, is dispropor-
tionately limited. Finally, the rate of rule enforcement exhibits the slowest escalation
(scaling exponent: 0.7), reflected by automated bots. The observed scaling exponents
are notably robust across topical categories with minor variations attributed to the
topic complexity. Our findings suggest that as more people contribute to a project, a
self-regulating ecosystem incurs faster mutual adjustments than direct supervision and
rule enforcement. These findings have practical implications for online collaborative
communities aiming to enhance their coordination efficiency. These results also have
implications for how we understand human organizations in general.

Introduction

Coordination mechanisms lie at the heart of efficient and effective operations within complex
systems, including human organizations and biological entities. These mechanisms provide

the key to understanding how numerous individual components coalesce to form a cohesive,

operational whole across a wide range of scales [1, 2, 3]. To attain and maintain a function-
ing collective necessitates the inevitable and indispensable coordination cost of connecting
isolated individual I’s to a meaningful We [4, 5, 6]. Nevertheless, coordination mechanisms

and associated costs often remain concealed in theoretical and mathematical studies, re-
vealing themselves only upon closer examination for implementation. Take, for instance,
a bustling bee colony, an impressive superorganism that, at first glance, seems to operate

without any associated coordination costs. Yet, despite their genetic predeterminations,
these societies necessitate constant coordination of activities to foster a thriving colony [7],
including frequent role adjustments, communication through pheromones [8], dances [9], and

vibrations [10] in addition to resolving occasional conflicts [11]. In our daily life, these under-
lying principles manifest themselves when we find our emails and meetings far exceeding our
original plan, leading to moments of unexpected frustration. According to a recent survey,

faculty in academia spent almost 45% of their work week in meetings, emails, scheduling,
planning, and administrative tasks that are not traditionally thought of as part of the life of
an academic [12].

Socioeconomic systems, in their wide-ranging forms, are typically coordinated and reg-
ulated through rules and feedback mechanisms [13, 14]. Rules within social systems can

either be naturally ingrained as in physical systems, akin to the genetically predetermined


roles of bees, or be crafted as impersonal embodiments of formal plans, blueprints, emergent
norms, or cultures, all aimed at meeting organizational goals. In addition to rules, feedback

mechanisms are a vital part of the system, much like the additional adaptations bees make,
offering flexibility and adaptability in the face of changing circumstances. While rules gov-
ern the many-body system by their mere existence or through procedural tools to enforce

them, feedback mechanisms often occur at the level of interpersonal interactions. Author-
ities enforce, interpret, and carry out rules or agendas through directional feedback. This

top-down approach, however, often runs the risk of bureaucratic slowdowns due to limited

administrative capacity [15]. Alongside this direct oversight, continuous communication for
mutual feedback among participants is always required to reduce ambiguity and accommo-
date flexibility. Therefore, selecting the appropriate coordination mechanism is paramount

to enhancing organizational effectiveness and efficiency.

What are the determining factors behind the coordination costs within a collective? Several

elements come into play, including the size of the organization [16], its structure [17, 18],
and the intricacy of the problems it grapples with [19]. These variables are intertwined
and often hard to isolate from each other. For a collective to successfully navigate a com-

plex objective, the size, structure, and diversity of the organization need to correspond to
the required complexity of the problem at hand. For instance, sociopolitical evolution is
predominantly marked by the expansion of the polity scale [20]. This often leads to a co-

evolution between different coordination mechanisms, influenced by a multitude of factors.


Typically, large organizational units lean towards impersonal modes of coordination mecha-
nisms such as formalization, standardization, and hierarchical control to ensure efficiency [21]

while complex tasks involving considerable interdependence are often better handled through
personal modes via horizontal communication channels within large groups. Economies of
scale are another consideration when it comes to coordination costs. As an organization

grows, the relative proportion of administrators may decrease, thereby potentially reducing
costs [21, 22]. However, the economies of scale in coordination costs have been contested in
both theoretical accounts and empirical evidence [23, 24]. Therefore, it remains unclear

whether there exist endogenous mechanisms generating the various types of coordination as
more individual I’s join the collective We.
To unpack the underlying mechanism and empirically assess the associated costs, we have

chosen Wikipedia as our focal point. This is an ideal system for studying these dynamics
because of its vast collection of individual activity records and myriad small communities

associated with individual pages. Wikipedia embodies the essence of collective intelligence,

continually evolving through contributions from a diverse array of intellectual minds [25, 26].
Notably, what sets Wikipedia apart is that it is entirely authored and maintained by a de-
centralized community of volunteers. Individual pages are defined by groups of varying sizes,

giving us a perfect window into various scales. Yet, each page shares approximately the same
task of constructing a coherent knowledge structure of a subject. Within this community,
individuals contribute their knowledge not in isolation but in a concerted effort to construct

a cohesive knowledge structure for each project [27]. To achieve the construction of a com-
prehensive knowledge structure on a given topic, individuals must allocate extra time, effort,
and resources away from their substantive contributions to coordination among themselves.

This coordination effort entails communication in talk pages [28, 29, 30], decisions about
others’ edits [25, 31, 32, 33], the actions of authorities [34, 35, 36], and even the execution of
norms by automated bots [37, 38, 39].

In this paper, we investigate how coordination mechanisms evolve when the number of
contributors expands within each project by analyzing how various metrics scale with the
number of contributors. In almost all cases, we find good evidence of simple power-law

scaling (see Eq. 1 below) whose exponents quantitatively reveal that the nature of coordina-
tion mechanisms systematically shifts as the number of contributors on a project increases.
Furthermore, based on these scaling exponents, we are able to quantify comparisons be-

tween our results and other systems, including biological [40, 41], ecological [42], and urban
systems [43, 44]. Our study reveals that mutual adjustment exhibits a superlinear scaling
relationship with the number of contributors (i.e., mutual adjustment per contributor in-

creases as the size of the project increases), while direct supervision increases sublinearly as
contributor numbers rise (i.e., the amount of direct supervision per contributor decreases as
the size of the project increases, reflecting an economy of scale). Moreover, we found that co-

ordination by rule, measured by the automated bots, increases sublinearly, but more slowly
than coordination by feedback. This range in the pattern of coordination cost escalation

indicates nontrivial tradeoffs between mutual adjustment, supervision, and rule enforcement

as collective organizations increase, and suggests a variety of interesting mechanisms to ex-


plore in the future. The details of these results and their implications are presented and
discussed in the following section.

Results
Coordination Mechanisms in Wikipedia

The goal of Wikipedia is to capture the entirety of human understanding, as manifested by


6,412,821 pages covering an expansive range of topics. This grand ambition, accomplished

through the collective efforts of numerous contributors, necessitates an intricate process


wherein each article’s fabric is woven through meticulous edits to ensure accuracy, coherency,
and structural consistency. To achieve this objective, for each project (or article/page),

contributors edit segments such that the entire page is written in a coherent and consistent
structure. This often incurs disagreement with other contributors and thus ends up with
revising or reverting prior contributors’ work. Indeed, contributors do not edit in isolation—

they need to coordinate themselves with others on the platform to resolve disagreements and
conflicts. We categorized coordination activities on Wikipedia into four types: discussions,
disagreements, interventions, and rule enforcement as illustrated in Fig 1. By measuring the

volume of these coordination activities, we can better understand how Wikipedia functions
as a collaborative platform and quantify the cost of making I’s into We in a decentralized
system.

Given the vast intellectual diversity of individuals, differing views on any given topic are
inevitable. The true brilliance of Wikipedia lies in its ability to weave these disparate threads
into a single, cohesive narrative. However, pursuing structural cohesion from various people

comes with a price. Discord can arise when contributors’ opinions diverge significantly. Such
disagreements become visible in the frequency of reversals — a feature that empowers all
contributors to completely annul the efforts of others, thus reverting the article to a previous

state. This process is often considered to reveal a “norm violation” [45], and employing

reverts to map conflicts on Wikipedia is widely used [25, 31, 32, 33].
When simple reversals do not resolve the discords, or reversals continue back and forth
indefinitely, a lengthy discussion with other contributors is needed to make decisions about

an article’s content and potential improvements. These discussions are documented in the
“talk page” for each project [28, 29]. All contributors can utilize this page to deliberate on the
validity of sources, the page’s structure and organization, what content should be included,

and the conflicting opinions. In some cases, talk pages are also used to build consensus
or norms for the project, including documenting how previous conflicts were resolved. To
measure the level of discussion for each article, we use the length of the corresponding talk

page. In our analysis, both discussions (the talk page) and disagreements (reversals) are
forms of mutual adjustments toward consensus through tools available to all contributors.
Every participant in an article can leverage these tools in their interactions with each other

up to behaviors requiring higher-level intervention, which we discuss next.

Figure 1: Schematic diagram of coordination mechanisms in Wikipedia. In Wikipedia,
discussions involve communicating with other contributors about the content of a project
(page/article). We measure the level of discussion by looking at the length of the corresponding
talk page, an accompanying page for discussing potential improvements to the article. Conflicting
opinions or ideas among contributors can lead to disagreements and disputes, proxied by frequent
reverts. To manage conflicts among contributors, administrators take actions known as interven-
tions, which we measure by administrators’ activities. Bots are automated to take action, enforcing
a set of rules.

Beyond the actions that all contributors can take to affect each other, some actions are

only available to a small subset of Wikipedia administrators, also referred to as Sysops,


an abbreviation of system operator. Sysops comprise only 0.001% of Wikipedia users and
are officially approved through the adminship process, by a higher administrator group

called Bureaucrats. They have powerful authority to block and unblock users, delete and
protect pages, and rename pages, often guided by the need to deal with vandalism, enforce
community policies and guidelines, and mediate disputes. Sysops often delegate their duties

and administrative privileges to trusted Wikipedia users. We classify the actions taken by
those who require adminship approval (administrators’ interventions) as direct supervision [46,
47], as opposed to mutual interactions, since they entail human intervention and because there

is a clear direction of authority and power in the interactions.


A final category of actions focuses on bots, which in Wikipedia are automated tools that
carry out repetitive and mundane tasks required for page maintenance. One example is

anti-vandalism bots like ClueBot NG, which are specifically designed to detect and undo
instances of vandalism swiftly. According to an empirical study, when ClueBot NG experi-
enced a breakdown, the number of revert edits nearly doubled [48], highlighting the bot’s

effectiveness in suppressing conflicts and upholding the quality of Wikipedia’s pages. Bots
can also generate standardized articles, such as those related to geography, by using statistical
data [49]. While bots are capable of making edits at a fast pace, if they are designed or

operated incorrectly, they have the potential to disrupt the smooth functioning of Wikipedia.
To address this concern, Wikipedia established a bot policy, which is primarily maintained
by Wikipedia administrators. In our analysis, we consider edits by bots as a proxy of rule

enforcement in Wikipedia [37, 38].


To study how coordination costs increase with the number of contributors, we regarded
each article as a functioning group for a project. To analyze the requirements for coordi-

nation, we compare organizations of different sizes using a scaling perspective. Using the
number of unique contributors on each page, 𝑁, as the measure of organization size, we

consider power-law scaling of the form

$$Y = Y_0 N^{\beta}, \tag{1}$$

where $Y$ represents the coordination cost and $Y_0$ is a normalization constant. The exponent
$\beta$ quantifies the rate of increase in $Y$ with relative increases in $N$ as being more ($\beta > 1$) or

less ($\beta < 1$) than expected if it were linear.
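
As an illustration of how an exponent in Eq. 1 could be estimated, the sketch below fits a straight line to log-transformed data by ordinary least squares. The paper does not spell out its fitting procedure at this point, so the estimator, the normal-approximation 95% interval, the function name `fit_scaling_exponent`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_scaling_exponent(n, y):
    """Fit log10(y) = log10(y0) + beta * log10(n) by ordinary least squares.

    Returns (beta, y0, (ci_low, ci_high)), where the interval is a
    normal-approximation 95% interval on beta (an assumption of this sketch).
    """
    log_n = np.log10(np.asarray(n, dtype=float))
    log_y = np.log10(np.asarray(y, dtype=float))
    x_mean = log_n.mean()
    sxx = np.sum((log_n - x_mean) ** 2)
    beta = np.sum((log_n - x_mean) * (log_y - log_y.mean())) / sxx
    intercept = log_y.mean() - beta * x_mean
    resid = log_y - (intercept + beta * log_n)
    dof = len(log_n) - 2                      # two fitted parameters
    se_beta = np.sqrt(np.sum(resid ** 2) / dof / sxx)
    ci = (beta - 1.96 * se_beta, beta + 1.96 * se_beta)
    return beta, 10 ** intercept, ci

# Synthetic pages (not Wikipedia data): true exponent 1.3 with log-normal noise.
rng = np.random.default_rng(0)
n = rng.integers(10, 10_000, size=2000)
y = 5.0 * n ** 1.3 * np.exp(rng.normal(0.0, 0.5, size=n.size))

beta, y0, ci = fit_scaling_exponent(n, y)
print(f"beta = {beta:.3f}, CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Fitting in log space turns the power law into a linear relation, so the slope of the fitted line is the exponent $\beta$ and the intercept gives $\log Y_0$.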


We focus our empirical analysis on Vital Articles, a subset of articles selected by Wikipedia
as covering important topics, aiming for higher quality. Among them, we only consider ar-

ticles with at least one edit in their talk pages, one revert in their content, and one action
from an administrator. This resulting subset comprises 26,014 pages tuned to our ques-
tions. Fig. 2 shows the scaling curves of coordination costs. We observe that both talk page

size and the number of reverts exhibit a super-linear scaling relationship (𝛽 ≃ 1.3; Fig. 2
a-b). This implies that discussions and disagreements in Wikipedia grow faster than the
number of contributors, reflecting the societal characteristics of coordination costs driven

by interactions, which may follow similar mechanisms to those driving urban features with
superlinear scaling [43, 44]. On the other hand, administrator activities and bot activities
increase sub-linearly, 𝛽 ≃ 0.9 and 𝛽 ≃ 0.7, respectively (Fig. 2 c-d). These results display

economies of scale, consistent with economies of scale of infrastructure in urban systems [43]
and administrative staff in companies [21]. Finally, our analysis reveals that the length of
Frequently Asked Questions (FAQ), which acts as a repository for past consensus in Talk

Pages, scales sub-linearly (𝛽 = 0.673, Fig. 3), allowing us to understand how coordination


leads to establishing norms, culture, and rules. This indicates that the growth rate of FAQ
length is even slower than the growth rate of administrator activity.
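
To make the reported exponents concrete, the short sketch below evaluates what Eq. 1 implies when the number of contributors doubles: the total cost is multiplied by 2^𝛽 and the cost per contributor by 2^(𝛽−1). The function name `growth_factor` is illustrative, and the rounded exponents are the ones quoted in the text.

```python
def growth_factor(beta, size_ratio=2.0):
    """Multiplicative change in Y when N is multiplied by size_ratio, given Y = Y0 * N**beta."""
    return size_ratio ** beta

# Rounded exponents quoted in the text.
exponents = {
    "talk page size / reverts": 1.3,
    "administrator activity": 0.9,
    "bot activity": 0.7,
}

for name, beta in exponents.items():
    total = growth_factor(beta)              # change in total coordination cost
    per_contributor = growth_factor(beta - 1.0)  # change in cost per contributor
    print(f"{name}: total x{total:.2f}, per contributor x{per_contributor:.2f}")
```

Under these exponents, doubling the contributors multiplies mutual adjustment by roughly 2.5, administrator activity by roughly 1.9, and bot activity by roughly 1.6, so only mutual adjustment grows on a per-contributor basis.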

As discussed above, the coordination mechanisms are determined not only by the size 𝑁
but also by the complexity of functions 𝐶, which are often interrelated. For complex topics,
it is often the case that a larger group of individuals is required to bring a full

perspective. Similarly, the more complex the subject, the higher the potential for disagree-
ments and disputes, drawing a greater crowd into the discourse. This naturally suggests

[Figure 2, four panels, each plotted against # of contributors: a) Talk page size (Byte), slope = 1.334 [1.255, 1.413]; b) # of revert, slope = 1.312 [1.282, 1.342]; c) # of admin activity, slope = 0.906 [0.867, 0.945]; d) # of bot activity, slope = 0.692 [0.671, 0.713].]

Figure 2: Relationship between four coordination mechanisms in Wikipedia and the
number of article contributors. The four coordination costs measured are: a) Talk page size,
b) Number of reverts, c) Number of edits by administrators, and d) Number of edits by bots. The
orange line represents the regression results, while the gray line indicates linear scaling to help guide
the eye. Blue dots represent the average value of each bin, with error bars denoting the standard
error. Mutual adjustment (talk page size and reverts) increases faster than contributors (superlinear
scaling). Direct supervision (admin activity) and rule enforcement (bot activity) increase
slower than contributors (sublinear scaling).

that we should consider how an article’s content might influence our conclusions. Table 1
summarizes the scaling exponents deconstructed by article category. Overall, the findings

are remarkably consistent with the previous integrated analysis: talk page size and the num-
ber of reverts exhibit a super-linear scaling relationship, while administrator activity shows
a sub-linear relationship. That said, one mustn’t overlook the minor variations across article

categories. For instance, in the categories of Everyday life and Mathematics, topics
generally devoid of controversy, the talk page size increases only mildly superlinearly with

the rise in contributors (𝛽 ≈ 1.1), which stands in stark contrast to more contentious topics

such as History or Philosophy and religion, where the talk page size rises at a far brisker
pace (𝛽 ≈ 1.5). This does not mean, however, Mathematics is not complex enough to require
lengthy discussion. In fact, the average discussion volume for Mathematics is as substantial

as those of other topics (see SI). The results indicate that topic complexity has to
be further differentiated into epistemological complexity and proneness to controversy. The latter
incurs frequent interpersonal conflicts among different ideologies, reflected in the fastest

growth in discussions and disagreements with the contributor influx in contentious topics, as
opposed to topics that require a significant amount of discussion regardless of the contributor
influx [4].

Category           Pages    𝛽_talk [CI]        𝛽_revert [CI]      𝛽_adm act [CI]     𝛽_bot act [CI]

Arts               1,808    1.23 [1.08,1.38]   1.29 [1.24,1.33]   0.91 [0.86,0.96]   0.73 [0.70,0.76]
Biology/Health     2,503    1.46 [1.29,1.62]   1.35 [1.30,1.40]   0.88 [0.83,0.93]   0.75 [0.71,0.80]
Everyday life      1,553    1.12 [0.99,1.25]   1.24 [1.21,1.28]   0.92 [0.85,0.99]   0.72 [0.67,0.77]
Geography          2,701    1.40 [1.27,1.53]   1.35 [1.31,1.39]   0.94 [0.90,0.99]   0.78 [0.75,0.82]
History            1,913    1.53 [1.42,1.63]   1.28 [1.21,1.34]   0.90 [0.83,0.97]   0.76 [0.70,0.83]
Mathematics          455    1.10 [0.96,1.24]   1.42 [1.34,1.50]   0.92 [0.85,0.99]   0.80 [0.73,0.87]
People             7,884    1.39 [1.31,1.48]   1.36 [1.32,1.41]   0.89 [0.83,0.94]   0.64 [0.62,0.66]
Philos./Religion     884    1.49 [1.38,1.60]   1.42 [1.37,1.47]   0.95 [0.88,1.02]   0.68 [0.63,0.74]
Physical sci.      1,936    1.44 [1.28,1.61]   1.35 [1.32,1.39]   0.96 [0.91,1.02]   0.76 [0.72,0.79]
Society/Social     2,698    1.38 [1.27,1.48]   1.27 [1.22,1.31]   0.99 [0.95,1.02]   0.76 [0.72,0.79]
Technology         1,679    1.21 [1.13,1.30]   1.30 [1.27,1.33]   0.97 [0.93,1.02]   0.81 [0.78,0.85]
Vital Articles    26,014    1.33 [1.26,1.41]   1.31 [1.28,1.34]   0.91 [0.87,0.95]   0.69 [0.67,0.71]

Table 1: Scaling exponents of coordination costs by article category. The scaling expo-
nent 𝛽 and corresponding confidence intervals for all categories of Vital Articles, in all four measures
of coordination costs. Talk page length and reverts correspond to mutual adjustment, administra-
tor activity corresponds to direct supervision, and Bot activity corresponds to rule enforcement.
Although there are minor differences in the scaling exponents across categories, the general trend
of super- or sub-linearity remains consistent across all categories.

As the scaling relationship (Eq. 1) only provides an average description of the behavior of
coordination costs, differences between observed and predicted values may indicate that other
factors are influencing the required amount of coordination. A residual analysis provides

insights into the factors that affect coordination costs beyond the mean behavior predicted
by the power law scaling model. To investigate this, we calculate the residual, 𝜉𝑖 , [50] for

[Figure 3: plot not recoverable; panel annotation: R^2 = 0.02.]

Figure 3: Coordination mechanisms in Wikipedia: FAQs. Within Wikipedia’s talk pages,
there exists a special section called FAQ (Frequently Asked Questions), which serves as a repository
for past consensus, aimed at minimizing repetitive coordination efforts. Among the 48,822 vital
pages in Wikipedia, a total of 175 pages have a FAQ section. Our analysis reveals that the length
of these FAQ pages exhibits a sub-linear scaling relationship, indicating that it grows at a slower
rate compared to the number of administrators (0.753).

each coordination cost as follows:

$$\xi_i \equiv \log \frac{Y_i}{Y(N_i)} = \log \frac{Y_i}{Y_0 N_i^{\beta}}, \tag{2}$$
where $Y_i$ is the observed value of coordination costs and $Y(N_i)$ is the expected value of
coordination costs given the number of contributors $N_i$. As shown in Table 2, the residuals
of talk page size, the number of reverts, and the bot activity, denoted as $\xi_{\text{talk}}$, $\xi_{\text{revert}}$,
and $\xi_{\text{bot act}}$ respectively, exhibit almost no correlation with one another
($\rho_{\xi_{\text{talk}},\xi_{\text{revert}}} = 0.092$, $\rho_{\xi_{\text{talk}},\xi_{\text{bot act}}} = 0.080$, $\rho_{\xi_{\text{revert}},\xi_{\text{bot act}}} = 0.088$).
In contrast, the residuals of admin activity, $\xi_{\text{adm act}}$, show relatively strong correlations
with the other coordination costs ($\rho_{\xi_{\text{adm act}},\xi_{\text{talk}}} = 0.386$, $\rho_{\xi_{\text{adm act}},\xi_{\text{revert}}} = 0.229$,
$\rho_{\xi_{\text{adm act}},\xi_{\text{bot act}}} = 0.427$), implying that governance plays a central role in the
coordination process of collective intelligence [51, 52].
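
The residual analysis can be sketched as follows: fit Eq. 1 in log space for each coordination cost, take the log-ratio of observed to fitted values as in Eq. 2, and compute the Pearson correlation between the residuals of two costs. The example uses synthetic data with a deliberately shared noise term; the natural logarithm, the function name `scaling_residuals`, and all numeric settings are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def scaling_residuals(n, y):
    """Residuals xi_i = log(Y_i) - log(Y0 * N_i**beta) from a log-log least-squares fit."""
    log_n = np.log(np.asarray(n, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    beta, log_y0 = np.polyfit(log_n, log_y, 1)   # slope first, then intercept
    return log_y - (log_y0 + beta * log_n)

# Synthetic pages (not Wikipedia data). A hypothetical shared factor makes the
# two costs deviate from their scaling laws in the same direction.
rng = np.random.default_rng(1)
n = rng.integers(10, 5_000, size=1000)
shared = rng.normal(0.0, 0.4, size=n.size)
talk = 3.0 * n ** 1.3 * np.exp(shared + rng.normal(0.0, 0.4, size=n.size))
admin = 2.0 * n ** 0.9 * np.exp(shared + rng.normal(0.0, 0.4, size=n.size))

xi_talk = scaling_residuals(n, talk)
xi_admin = scaling_residuals(n, admin)

# Pearson correlation between the two residual series.
rho = np.corrcoef(xi_talk, xi_admin)[0, 1]
print(f"rho(xi_talk, xi_admin) = {rho:.3f}")
```

Because the size dependence is removed before correlating, a positive value here reflects pages that need more (or less) of both kinds of coordination than their size alone would predict.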
The residual analysis shows that there is no clear trade-off among different coordination

mechanisms. Nevertheless, the values of exponents by category suggest that the level of con-
troversy or prior knowledge surrounding the topic does require more discussion for effective
contributions. Furthermore, the results suggest that understanding the coordination needs of

             𝜉_talk   𝜉_revert   𝜉_adm act   𝜉_bot act
𝜉_talk
𝜉_revert     0.092
𝜉_adm act    0.386    0.229
𝜉_bot act    0.080    0.088      0.427

Table 2: Correlation between residuals. The Pearson correlation among the scaling residuals
of the four coordination functions. The p-values for all correlations in this table are significant
(<< 0.001).

specific categories of articles may be useful for developing targeted interventions to improve
coordination and reduce costs. For instance, for categories that require more intervention
by administrators, allocating more resources to administrative tasks may be necessary to

improve overall coordination. Overall, these findings highlight the importance of considering
the specific nature of the content when studying coordination costs and developing strategies
to improve coordination in complex systems such as Wikipedia.

Discussion

We investigated how coordination costs scale with the size of the Wikipedia communities and

how they vary across different categories of articles. By analyzing the mutual adjustment
and direct supervision costs of coordination, we found that mutual adjustment costs scale
superlinearly with the number of contributors. In contrast, direct supervision cost and rule

enforcement increase sublinearly with the number of contributors. Our residual analysis
also highlighted that article content and the interaction between contributors shape the
coordination costs of Wikipedia pages.

Our findings contribute to the literature on coordination costs and scaling in complex
systems by providing insights into the coordination challenges online communities face. The
results also have practical implications for online communities, as they demonstrate the

importance of understanding the coordination costs associated with different categories of


content and the need to develop appropriate coordination mechanisms as the community

size increases.

In our increasingly complex society, the efforts and expertise of highly specialized in-
dividuals are distributed across the globe. This necessitates their productive coordination
spanning vast multi-level networks to create a diverse array of complex goods and services

[53]. Also, the impact of successful collaboration is becoming increasingly important in


knowledge production, showing the significance of collaborative efforts in producing suc-
cessful outcomes and advancing scientific progress [54, 55, 56, 57]. This reality underscores

the urgent need for enhanced comprehension of coordination mechanisms and their latent
structures at many different scales.

Data
Wikipedia data

We used the XML dump of English Wikipedia from January 9, 2022, comprising the edit
history of 6,412,821 distinct pages in Wikipedia. Each edit includes a unique identifier of the
editor (username for registered contributors and IP address for unregistered contributors),

timestamp, comments, and edit size. Additionally, we extracted contributors’ roles from
the user group table of the SQL dump. Our primary focus lies on the crucial articles in
the curated collection, Vital articles. These articles have been carefully selected to cover

a broad spectrum of subjects and are considered essential for obtaining a comprehensive
understanding of human knowledge. Encompassing a wide range of disciplines, including
history, science, arts, geography, and more, these articles represent significant topics, events,

individuals, and concepts. They undergo regular evaluation and updating to ensure their ac-
curacy, relevance, and comprehensiveness, thus serving as indispensable pillars of knowledge
within Wikipedia’s vast and extensive encyclopedia.
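
As a minimal sketch of how the organization size 𝑁 could be derived from such edit records, the example below counts distinct editor identifiers (username or IP address) per page. The `(page_title, editor_id)` record layout and the function name `contributors_per_page` are hypothetical; parsing of the actual dump is not shown, and the sample records are invented.

```python
from collections import defaultdict

def contributors_per_page(edits):
    """Count unique editor identifiers per page.

    `edits` is an iterable of (page_title, editor_id) pairs, where editor_id is
    a username for registered contributors or an IP address otherwise.
    """
    editors = defaultdict(set)
    for page_title, editor_id in edits:
        editors[page_title].add(editor_id)
    return {page: len(ids) for page, ids in editors.items()}

# Invented sample records: repeated edits by the same editor count once.
edits = [
    ("Honey bee", "Alice"),
    ("Honey bee", "192.0.2.7"),
    ("Honey bee", "Alice"),
    ("Scaling", "Bob"),
]
counts = contributors_per_page(edits)
print(counts)
```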

The exponents are robust when the analysis is extended beyond vital articles to all English
Wikipedia articles, with 𝛽_talk = 1.379, 𝛽_revert = 1.309, 𝛽_adm act = 0.910, and 𝛽_bot act = 0.679.
Also, among the 48,864 vital pages on Wikipedia, 22,680 pages have not undergone any actions
by administrators. To delve deeper into this matter, we performed a robustness check by repeating
the scaling analysis on talk page size, number of reverts, and bots’ activity, including pages that
have not received any intervention from administrators. We again confirm that our results
are consistent (Fig. S4), affirming our earlier observations.

Acknowledgements

The authors would like to acknowledge the support of the National Science Foundation Grant
Award Number 2133863. J.Y. and H.Y. thank Seoul Lee for the literature search and helpful

discussions and Eric Rupley for generous support.

Author Contributions

All authors contributed to the work presented in this paper.

Additional Information

Supporting Information is available for this paper. Correspondence and requests for materials

should be addressed to Dr. Youn.

Data availability

Wikipedia edit history data are available at wiki-dumps, https://dumps.wikimedia.org/.


We collect the list of vital articles from https://en.wikipedia.org/wiki/Wikipedia%3AVital_articles.

Code Availability

The code used in this analysis can be found at

References

[1] Malone, T. W. Superminds: The surprising power of people and computers thinking

together (Little, Brown Spark, 2018).

[2] Tuomela, R. The philosophy of sociality: The shared point of view (Oxford University
Press, 2007).

[3] Weinberg, S. Dreams of a Final Theory: The Scientist’s Search for the Ultimate Laws
of Nature (Vintage, 1994).

[4] Van de Ven, A. H., Delbecq, A. L. & Koenig Jr, R. Determinants of coordination modes

within organizations. American Sociological Review 322–338 (1976).

[5] Coase, R. H. The nature of the firm (Springer, 1995).

[6] Bak-Coleman, J. B. et al. Stewardship of global collective behavior. Proceedings of the


National Academy of Sciences 118, e2025764118 (2021).

[7] Huang, Z.-Y. & Robinson, G. E. Honeybee colony integration: worker-worker interac-
tions mediate hormonally regulated plasticity in division of labor. Proceedings of the

National Academy of Sciences 89, 11726–11729 (1992).

[8] Slessor, K. N., Winston, M. L. & Le Conte, Y. Pheromone communication in the


honeybee (apis mellifera l.). Journal of chemical ecology 31, 2731–2745 (2005).

[9] Dyer, F. C. The biology of the dance language. Annual review of entomology 47,
917–949 (2002).

[10] Schneider, S. S. & Lewis, L. A. The vibration signal, modulatory communication and

the organization of labor in honey bees, apis mellifera. Apidologie 35, 117–131 (2004).

[11] Galbraith, D. A. et al. Testing the kinship theory of intragenomic conflict in honey

bees (apis mellifera). Proceedings of the National Academy of Sciences 113, 1020–1025
(2016).

[12] Ziker, J. The long, lonely job of homo academicus. Blue Review Post (2014).

[13] March, J. G. & Simon, H. A. Organizations (Wiley, New York, 1958).

[14] DiMaggio, P. J. & Powell, W. W. The iron cage revisited: Institutional isomorphism
and collective rationality in organizational fields. American sociological review 147–160
(1983).

[15] Mintzberg, H. Structure in 5’s: A synthesis of the research on organization design.

Management Science 26, 322–341 (1980).

[16] Camacho, A. Adaptation costs, coordination costs and optimal firm size. Journal of
Economic Behavior & Organization 15, 137–149 (1991).

[17] Marsden, P. V., Cook, C. R. & Kalleberg, A. L. Organizational structures: Coordination


and control. American Behavioral Scientist 37, 911–929 (1994).

[18] Powell, W. et al. Neither market nor hierarchy. The sociology of organizations: classic,

contemporary, and critical readings 315, 104–117 (2003).

[19] Thompson, J. D. Organizations in Action: Social Science Bases of Administrative Theory
(Transaction Publishers, 2003).

[20] Shin, J. et al. Scale and information-processing thresholds in Holocene social evolution.
Nature Communications 11, 2394 (2020).

[21] Klatzky, S. R. Relationship of organizational size to complexity and coordination.
Administrative Science Quarterly 428–438 (1970).

[22] Blau, P. M. A formal theory of differentiation in organizations. American Sociological
Review 201–218 (1970).

[23] Coates, R. & Updegraff, D. E. The relationship between organizational size and the
administrative component of banks. The Journal of Business 46, 576–588 (1973).

[24] Brooks, F. P. The mythical man-month. Datamation 20, 44–52 (1974).

[25] Yasseri, T., Sumi, R., Rung, A., Kornai, A. & Kertész, J. Dynamics of conflicts in
Wikipedia. PLoS ONE 7, e38869 (2012).

[26] Yun, J., Lee, S. H. & Jeong, H. Early onset of structural inequality in the formation of
collaborative knowledge in all Wikimedia projects. Nature Human Behaviour 3, 155–163
(2019).

[27] Zhu, H., Zhang, A., He, J., Kraut, R. E. & Kittur, A. Effects of peer feedback on
contribution: a field experiment in Wikipedia. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems, 2253–2262 (2013).

[28] Kittur, A. & Kraut, R. E. Harnessing the wisdom of crowds in Wikipedia: quality
through coordination. In Proceedings of the 2008 ACM Conference on Computer Supported
Cooperative Work, 37–46 (2008).

[29] Kittur, A. & Kraut, R. E. Beyond Wikipedia: coordination and conflict in online
production groups. In Proceedings of the 2010 ACM Conference on Computer Supported
Cooperative Work, 215–224 (2010).

[30] Aaltonen, A. & Lanzara, G. F. Building governance capability in online social
production: Insights from Wikipedia. Organization Studies 36, 1649–1673 (2015).

[31] Tsvetkova, M., García-Gavilanes, R. & Yasseri, T. Dynamics of disagreement: Large-scale
temporal network analysis reveals negative interactions in online collaboration.
Scientific Reports 6, 36333 (2016).

[32] Zhang, A. F. et al. Participation of new editors after times of shock on Wikipedia. In
Proceedings of the International AAAI Conference on Web and Social Media, vol. 13,
560–571 (2019).

[33] Halfaker, A., Kittur, A. & Riedl, J. Don’t bite the newbies: how reverts affect the
quantity and quality of Wikipedia work. In Proceedings of the 7th International Symposium
on Wikis and Open Collaboration, 163–172 (2011).

[34] Arazy, O., Lifshitz-Assaf, H. & Balila, A. Neither a bazaar nor a cathedral: The interplay
between structure and agency in Wikipedia’s role system. Journal of the Association for
Information Science and Technology 70, 3–15 (2019).

[35] Danescu-Niculescu-Mizil, C., Lee, L., Pang, B. & Kleinberg, J. Echoes of power:
Language effects and power differences in social interaction. In Proceedings of the 21st
International Conference on World Wide Web, 699–708 (2012).

[36] Greenstein, S., Gu, G. & Zhu, F. Ideology and composition among an online crowd:
Evidence from Wikipedians. Management Science 67, 3067–3086 (2021).

[37] Geiger, R. S. The social roles of bots and assisted editing programs. In Proceedings of
the 5th International Symposium on Wikis and Open Collaboration, 1–2 (2009).

[38] Steiner, T. Bots vs. Wikipedians, anons vs. logged-ins (redux): a global study of edit
activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on
Open Collaboration, 1–7 (2014).

[39] Tsvetkova, M., García-Gavilanes, R., Floridi, L. & Yasseri, T. Even good bots fight:
The case of Wikipedia. PLoS ONE 12, e0171774 (2017).

[40] Koonin, E. V., Wolf, Y. I. & Karev, G. P. Power Laws, Scale-free Networks and Genome
Biology (Springer, 2006).

[41] Kempes, C. P. et al. Drivers of bacterial maintenance and minimal energy requirements.
Frontiers in Microbiology 31 (2017).

[42] Cody, M. L., MacArthur, R. H., Diamond, J. M. et al. Ecology and Evolution of
Communities (Harvard University Press, 1975).

[43] Bettencourt, L. M., Lobo, J., Helbing, D., Kühnert, C. & West, G. B. Growth,
innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of
Sciences 104, 7301–7306 (2007).

[44] Youn, H. et al. Scaling and universality in urban economic diversification. Journal of
The Royal Society Interface 13, 20150937 (2016).

[45] Piskorski, M. J. & Gorbatâi, A. Testing Coleman’s social-norm enforcement mechanism:
Evidence from Wikipedia. American Journal of Sociology 122, 1183–1222 (2017).

[46] Shachaf, P. & Hara, N. Beyond vandalism: Wikipedia trolls. Journal of Information
Science 36, 357–370 (2010).

[47] Klapper, H. & Reitzig, M. On the effects of authority on peer motivation: Learning
from Wikipedia. Strategic Management Journal 39, 2178–2203 (2018).

[48] Geiger, R. S. & Halfaker, A. When the levee breaks: without bots, what happens to
Wikipedia’s quality control processes? In Proceedings of the 9th International Symposium
on Open Collaboration, 1–6 (2013).

[49] Zheng, L., Albano, C. M., Vora, N. M., Mai, F. & Nickerson, J. V. The roles bots play
in Wikipedia. Proceedings of the ACM on Human-Computer Interaction 3, 1–20 (2019).

[50] Bettencourt, L. M., Lobo, J., Strumsky, D. & West, G. B. Urban scaling and its
deviations: Revealing the structure of wealth, innovation and crime across cities. PLoS
ONE 5, e13541 (2010).

[51] O’Mahony, S. & Ferraro, F. The emergence of governance in an open source community.
Academy of Management Journal 50, 1079–1106 (2007).

[52] Demil, B. & Lecocq, X. Neither market nor hierarchy nor network: The emergence of
bazaar governance. Organization Studies 27, 1447–1466 (2006).

[53] Hosseinioun, M., Neffke, F., Youn, H. et al. Deconstructing human capital to construct
hierarchical nestedness. arXiv preprint arXiv:2303.15629 (2023).

[54] van der Wouden, F. & Youn, H. The impact of geographical distance on learning through
collaboration. Research Policy 52, 104698 (2023).

[55] Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of teams in production
of knowledge. Science 316, 1036–1039 (2007).

[56] Fortunato, S. et al. Science of science. Science 359, eaao0185 (2018).

[57] Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science
and technology. Nature 566, 378–382 (2019).

Supplementary Information: What makes Individual I’s a Collective We;
Coordination mechanisms & costs

S1 Text. Who is the administrator in Wikipedia?: Bureaucracy in Wikipedia
The level of authority granted to users on Wikipedia is determined by their user access
level, which defines the actions they are allowed to perform. The hierarchical structure of
adminship flags is depicted in Fig. S1. The most prominent type of administrator is the
“Sysop” (system operator); sysops make up only 0.001% of Wikipedia users. To become a
sysop, an individual must undergo the adminship process and receive official approval,
which grants the authority to block and unblock users, protect and delete pages, and
rename pages. Sysops can also acquire higher or more specialized access levels (Fig. S1,
top). Because administrators are few, they often delegate specific responsibilities to
trustworthy Wikipedia contributors, referred to as delegated administrators (Fig. S1,
bottom). This category also includes bots, which are approved accounts that assist
human contributors by automating repetitive tasks. For analytical purposes, both requested
and delegated administrators are considered “administrators”, who make up 0.003% of
Wikipedia users. For simplicity, we define an administrator as someone who has held an
adminship flag at least once in Wikipedia’s history.

Category          Pages   β_talk            Y0_talk  β_revert          Y0_revert  β_adm_act         Y0_adm_act  β_bot_act         Y0_bot_act
Arts              1,808   1.23 [1.08,1.38]  14.33    1.29 [1.24,1.33]  0.05       0.91 [0.86,0.96]  1.46        0.73 [0.70,0.76]  0.46
Biology/Health    2,503   1.46 [1.29,1.62]           1.35 [1.30,1.40]  0.03       0.88 [0.83,0.93]  1.98        0.75 [0.71,0.80]  0.48
Everyday life     1,553   1.12 [0.99,1.25]  31.70    1.24 [1.21,1.28]  0.07       0.92 [0.85,0.99]  1.24        0.72 [0.67,0.77]  0.47
Geography         2,701   1.40 [1.27,1.53]  4.89     1.35 [1.31,1.39]  0.03       0.94 [0.90,0.99]  1.26        0.78 [0.75,0.82]  0.40
History           1,913   1.53 [1.42,1.63]  4.68     1.28 [1.21,1.34]  0.06       0.90 [0.83,0.97]  1.96        0.76 [0.70,0.83]  0.46
Mathematics       455     1.10 [0.96,1.24]  79.99    1.42 [1.34,1.50]  0.02       0.92 [0.85,0.99]  1.33        0.80 [0.73,0.87]  0.25
People            7,884   1.39 [1.31,1.48]  5.87     1.36 [1.32,1.41]  0.03       0.89 [0.83,0.94]  1.98        0.64 [0.62,0.66]  0.96
Philos./Religion  884     1.49 [1.38,1.60]  6.01     1.42 [1.37,1.47]  0.02       0.95 [0.88,1.02]  1.16        0.68 [0.63,0.74]  0.68
Physical sci.     1,936   1.44 [1.28,1.61]  6.06     1.35 [1.32,1.39]  0.4        0.96 [0.91,1.02]  1.09        0.76 [0.72,0.79]  0.47
Society/Social    2,698   1.38 [1.27,1.48]  6.98     1.27 [1.22,1.31]  0.05       0.99 [0.95,1.02]  0.75        0.76 [0.72,0.79]  0.41
Technology        1,679   1.21 [1.13,1.30]  17.54    1.30 [1.27,1.33]  0.05       0.97 [0.93,1.02]  0.84        0.81 [0.78,0.85]  0.26
Vital Articles    26,014  1.33 [1.26,1.41]  11.5     1.31 [1.28,1.34]  0.05       0.91 [0.87,0.95]  1.71        0.69 [0.67,0.71]  0.64

Table 1: Scaling exponent by article’s category, including Y0.
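
As an illustration of how an exponent β and a prefactor Y0 of the kind reported in Table 1 can be obtained, the following is a minimal sketch. It assumes the scaling form Y = Y0 · N^β, with N the number of contributors, and an ordinary least-squares fit on log-transformed values. The function name and the data are hypothetical and synthetic, and the sketch is not claimed to reproduce the exact fitting procedure behind the table.

```python
import math

def fit_power_law(n_contributors, costs):
    """Fit cost = y0 * n**beta by ordinary least squares on log10-transformed data."""
    xs = [math.log10(n) for n in n_contributors]
    ys = [math.log10(c) for c in costs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope in log-log space
    y0 = 10 ** (y_mean - beta * x_mean)    # intercept mapped back to linear scale
    return beta, y0

# Synthetic (hypothetical) pages generated exactly from cost = 0.05 * n**1.3
contributors = [10, 100, 1000, 10000]
reverts = [0.05 * n ** 1.3 for n in contributors]
beta, y0 = fit_power_law(contributors, reverts)
print(f"beta = {beta:.2f}, Y0 = {y0:.2f}")
```

Under this form, β > 1 means the cost grows faster than the number of contributors (super-linear), and β < 1 means it grows more slowly (sub-linear).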


[Figure S1: heatmap. Both axes list the adminship flags: Founder (1), Import (2),
Interface-admin (11), Bureaucrat (19), Oversight (44), Checkuser (51), Abusefilter (149),
Sysop (1066), Accountcreator (17), Abusefilter-helper (23), Massmessage-sender (58),
Eventcoordinator (121), Templateeditor (191), Extendedmover (357), Filemover (403),
Ipblock-exempt (666), Patroller (711), Autoreviewer (4504), Rollbacker (6561),
Reviewer (7681), Bot (325). Group labels: Administer, Delegated.]

Figure S1: A hierarchical structure of adminship flags in Wikipedia. The axes of the
figure represent lists of adminship flags. Since a user can possess multiple adminship flags, the
colored squares with annotated numbers indicate the conditional probability that a user holding
an adminship flag on the y-axis also holds an adminship flag on the x-axis. This representation
reveals two distinct community structures within the adminship system: requested administrators
and delegated administrators.
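
The conditional probabilities described in the caption can be computed from per-user flag sets. The sketch below assumes a hypothetical mapping from user name to the set of flags that user has held; the user names and flag assignments are toy data, not taken from Wikipedia.

```python
from collections import defaultdict

def conditional_flag_matrix(user_flags):
    """Return P(user holds flag x | user holds flag y) for every co-occurring pair (y, x).

    user_flags maps a user name to the set of adminship flags the user has held.
    Pairs that never co-occur are absent from the result (probability 0).
    """
    holders = defaultdict(int)  # number of users holding each flag
    joint = defaultdict(int)    # number of users holding both y and x
    for flags in user_flags.values():
        for y in flags:
            holders[y] += 1
            for x in flags:
                joint[(y, x)] += 1
    return {(y, x): count / holders[y] for (y, x), count in joint.items()}

# Hypothetical toy data
user_flags = {
    "alice": {"sysop", "bureaucrat"},
    "bob": {"sysop"},
    "carol": {"rollbacker", "reviewer"},
    "dave": {"reviewer"},
}
probs = conditional_flag_matrix(user_flags)
print(probs[("bureaucrat", "sysop")])       # share of bureaucrats who are also sysops
print(probs.get(("sysop", "reviewer"), 0))  # no user holds both flags in the toy data
```

The matrix is asymmetric by construction: in the toy data every bureaucrat is a sysop, but only half of the sysops are bureaucrats.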

a) # of administrator vs. # of contributor: slope = 0.753 [0.747, 0.759]
b) # of bot vs. # of contributor: slope = 0.395 [0.367, 0.423]
Figure S2: Coordination costs in Wikipedia. The figures depict the scaling relationship of
the required a) number of administrators and b) number of bots. The orange line represents the
regression results, while the gray line indicates the baseline results with a scaling exponent of β = 1.
Blue dots represent the average value of each bin, with error bars denoting the standard error.
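
The binned averages and standard errors mentioned in the caption can be sketched as follows. The sketch assumes logarithmically spaced bins on the number of contributors, which the caption does not state; the bin width and the helper name are hypothetical, and the data are toy values.

```python
import math
from collections import defaultdict

def log_binned_means(xs, ys, bins_per_decade=2):
    """Group points into log10(x) bins; return (bin center, mean y, standard error) per bin."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[math.floor(math.log10(x) * bins_per_decade)].append(y)
    result = []
    for b in sorted(groups):
        vals = groups[b]
        n = len(vals)
        mean = sum(vals) / n
        if n > 1:
            var = sum((v - mean) ** 2 for v in vals) / (n - 1)  # sample variance
            sem = math.sqrt(var / n)                            # standard error of the mean
        else:
            sem = 0.0  # convention chosen here: a single point gets no error bar
        center = 10 ** ((b + 0.5) / bins_per_decade)            # geometric bin center
        result.append((center, mean, sem))
    return result

# Hypothetical toy data: (number of contributors, cost) for four pages
binned = log_binned_means([1, 2, 20, 30], [1.0, 3.0, 10.0, 14.0], bins_per_decade=1)
for center, mean, sem in binned:
    print(f"center = {center:.2f}, mean = {mean:.1f}, s.e. = {sem:.1f}")
```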

a) Talk page size (Byte) vs. # of contributor: slope = 1.379 [1.324, 1.435]
b) # of revert vs. # of contributor: slope = 1.309 [1.283, 1.335]
c) # of admin activity vs. # of contributor: slope = 0.910 [0.876, 0.944]
d) # of bot activity vs. # of contributor: slope = 0.679 [0.663, 0.695]

Figure S3: Coordination costs in Wikipedia – All pages. The figures depict the scaling
relationship between the measured coordination costs and the number of contributors for four
coordination cost metrics: a) talk page size, b) number of reverts, c) number of
administrator activities, and d) number of bot activities. The orange line represents the
regression results, while the gray line indicates the baseline results with a scaling exponent
of β = 1. Blue dots represent the average value of each bin, with error bars denoting the
standard error. As a robustness check, we extend the analysis beyond the pages analyzed in the
main text to all Wikipedia pages.

a) Talk page size (Byte) vs. # of contributor: slope = 1.324 [1.250, 1.398]
b) # of revert vs. # of contributor: slope = 1.311 [1.282, 1.341]
c) # of bot activity vs. # of contributor: slope = 0.710 [0.690, 0.730]

Figure S4: Robustness check of the scaling exponent. Out of the 48,822 vital pages in
Wikipedia, 22,071 pages have not received any intervention from administrators. As a
robustness check, we examine the scaling relationship between the number of contributors and
a) talk page size, b) number of reverts, and c) number of bot activities, including pages
that have no edits from administrators. We consistently find super-linear scaling for talk
page size and the number of reverts, and sub-linear scaling for the number of bot activities.
These results are in line with our previous observations.
