

UNIVERSITÉ PARIS-SORBONNE

UFR de Philosophie

Master II Thesis

Specialty: LOPHISC

Presented and defended by:

Valentin LAGEARD

September 4, 2017

Why is Wikipedia reliable?

Towards an etiology of Wikipedia's reliability.

Under the supervision of:
M. Cédric PATERNOTTE
To Aaron Swartz (1986-2013), a fighter for free knowledge.

Why is Wikipedia reliable?
Towards an etiology of Wikipedia’s reliability.
Valentin Lageard

September 4, 2017

“Wikipedia flourished partly because it was a shrine to altruism.”


"Wikipedia is just an incredible thing. It is fact-encirclingly huge, and it is idiosyncratic, careful, messy, funny, shocking and full of simmering controversies - and it is free, and it is fast."

Nicholson Baker, novelist, in the article How I fell in love with Wikipedia[1], published by The Guardian.

It is common to refer to Wikipedia as an encyclopedia because it presents itself as one. However, is it really one? Gorman argues that since Wikipedia is not written by a closed committee of experts, it is not an encyclopedia. He states that Wikipedia is "a compendium of opinion provided by unknown individuals" (Gorman [2007]). In contrast to traditional encyclopedias, Wikipedia is massively participative: anyone with an internet connection can edit any article. It seems, however, that the intuitive notion of an encyclopedia fundamentally refers to a compilation of thematically organized articles whose purpose is to present the knowledge we have about their topics. In this regard, it is quite clear that Wikipedia is indeed an encyclopedia. Who writes it seems to be a contingent or non-essential property of encyclopedias: being written by a closed committee of experts rather than by any individual does not seem to be a necessary criterion. However, the validity of criteria used for definitions often relies on the epistemically unsound base of unjustified intuitions we have about the defined notions. Whether the origin of authorship is a necessary criterion for defining encyclopedias is not a problem for my inquiry, and I will leave this question to be solved by someone else.
Nevertheless, regardless of our definition of an encyclopedia, we must at least consider Wikipedia as a thematically organized compilation of articles whose purpose is to present our common knowledge about the covered topics. Therefore, it is clear that Wikipedia has an epistemic aim. It is then relevant to raise some problems that have traditionally been raised about Wikipedia. Is Wikipedia successful with regard to its epistemic aim? Are informations[2] in Wikipedia articles true? In Don Fallis' words: "is Wikipedia a reliable source of information" (Fallis [2008])?

[1] Reference: https://www.theguardian.com/technology/2008/apr/10/wikipedia.internet

1 Is Wikipedia reliable?
“Everything’s wrong on Wikipedia.”
Gore Vidal, writer, in the interview Gore Vidal: What I've learned[3], published in the lifestyle and fashion "men's magazine" Esquire.

"Wikipedia is the first place I go when I'm looking for knowledge... or when I want to create some."

Stephen Colbert in an episode of The Colbert Report[4].

1.1 Defining reliability


Before I enumerate the various epistemic pitfalls of Wikipedia, it is useful to define precisely what reliability is. I will use reliability in its statistical meaning. The first kind of reliability is the reliability of an individual and is defined as follows:

Reliability of an individual =def the probability for this individual to state a true information (when honestly believing this information).

The second kind of reliability is the reliability of a singular information and it is defined as follows:

Reliability of an information =def the probability for this information to be true (when stated honestly).

Of course, an information is necessarily true or false and its reliability must be inferred. I
name the individual-to-information reliability transfer the inference of attributing reliability
to an information from the reliability of the individual who stated it.
[2] I will not respect the convention for the term "information" to be a mass noun. In this work, "an information" is equivalent to "a proposition" and informations are countable.
[3] Reference: http://www.esquire.com/entertainment/interviews/a4581/gore-vidal-0608/
[4] Reference: http://www.cc.com/video-clips/z1aahs/the-colbert-report-the-word—wikiality

Individual-to-information reliability transfer =def the inference of attributing the reliability of an individual to an information stated by this individual.

If the individual $n$ with the reliability $r_n$ states the information $i$, then the reliability $r_i$ of the information $i$ is equal to the reliability of the individual stating it ($r_i = r_n$). We now have a definition of reliability for an individual and for an information, and we have a simple statistical rule to transfer the first to the second, but what about the reliability of a source (a compilation of informations)?

Reliability of a source =def the probability for a random[5] information from this source to be true (and when this information was honestly believed by the user who added it to the source).

Therefore, we can calculate $r_S$, the actual reliability of a source $S$, with $T$ as the total number of informations in $S$ and $T_T$ as the total number of true informations in $S$:

\[ r_S = \frac{T_T}{T} \]

In the same way that we transfer the reliability of the individual stating an information to the information itself, we can transfer the reliability of a source to an information in this source, such that $r_i = r_S$. This is a source-to-information reliability transfer:

Source-to-information reliability transfer =def the inference of attributing the reliability of a source to an information found in it.

Such a transfer is implicitly used to consider the reliability of a random information from a source. When one rejects an information because of its source, such as when an antiwikipedian refuses an information because it comes from Wikipedia, one implicitly uses a source-to-information reliability transfer. But there is another kind of possible reliability transfer which is assumed by critics of Wikipedia: the individuals-to-source reliability transfer.

Individuals-to-source reliability transfer =def the inference of attributing the reliability of a source based on the reliability of its contributors.

This reliability transfer is a bit more complicated to calculate. Using the individual-to-information transfer, an information $i$ will have the reliability $r_i = r_n$ when stated by the individual $n$ having the reliability $r_n$. If this individual contributes to a source, the chance for a random information of this source to be from $n$ is (with $i_n$ as the number of informations added by $n$ and $T$ as the total number of informations in the source $S$):

\[ \frac{i_n}{T} \]

Since $r_i = r_n$, the probability for a random information of this source (i) to be from the user $n$ and (ii) to be true is:

\[ r_n \frac{i_n}{T} \]

And therefore the sum, over all contributors $n$ in the set $N$ of contributors to $S$, of the probabilities for a random information in $S$ (i) to be from the user $n$ and (ii) to be true is a statistical approximation of the reliability $r_S$ of the source $S$ (with $n_{max}$ as the total number of contributors of the source $S$):

\[ r_S = \sum_{n=1}^{n_{max}} r_n \frac{i_n}{T} \]

[5] Using a uniform distribution.
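To make these transfers concrete, here is a minimal Python sketch (my own illustration, not code from this work; the names and figures are invented) computing the approximated reliability $r_S$ of a source from the reliabilities and contribution counts of its contributors:

\begin{verbatim}
# Sketch of the individuals-to-source reliability transfer.
# `contributors` maps each contributor n to (r_n, i_n): her reliability and
# the number of informations she added to the source.

def source_reliability(contributors):
    """Approximate r_S as the sum over n of r_n * (i_n / T)."""
    total = sum(i_n for _, i_n in contributors.values())  # T: total informations in S
    return sum(r_n * i_n / total for r_n, i_n in contributors.values())

contributors = {
    "alice": (0.9, 40),  # reliable and very active
    "bob":   (0.6, 10),
    "carol": (0.3, 5),
}
print(round(source_reliability(contributors), 3))  # 0.791: weighted by each user's share of T
\end{verbatim}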

1.2 Epistemic pitfalls


Wikipedia has been and is still attacked on a regular basis about the alleged success of its epistemic aims, especially its reliability aim. The assumed unreliability of Wikipedia has consequences for the academic world. It is common for teachers to forbid quotes from Wikipedia; as an example, Harvard University provides a guideline stating that Wikipedia articles are not reliable sources for academic research[6]. In order to teach a lesson to his students, a French teacher even added an incorrect information to a Wikipedia article so that his students would copy it[7].

[6] Reference: http://isites.harvard.edu/icb/icb.do?keyword=k70847&pageid=icb.page346376
[7] Reference: http://www.laviemoderne.net/lames-de-fond/009-comment-j-ai-pourri-le-web.html

Criticisms of Wikipedia are generally based on the differences between Wikipedia and traditional encyclopedias. For instance, the reliability of traditional encyclopedias is thought to be ensured by the reliability of their editorial committee composed of experts. This line of reasoning is implicitly based on the individuals-to-source reliability transfer presented above. For instance, let's assume that an expert is someone with a random uniform reliability between 0.5 and 1. To simplify the task, let's also assume that each expert contributor adds a single information which is true or false according to their reliability. According to the individuals-to-source reliability transfer, the approximated reliability of the source will tend to 0.75, with a decreasing margin of error as the number of users increases. But how is this a problem for Wikipedia's aimed reliability? Since Wikipedia is massively participative, anyone can edit it regardless of their reliability. Therefore, a random Wikipedia editor can have any level of reliability (between 0 and 1). For instance, if the distribution of reliability in the population of Wikipedia editors is uniform and, to simplify, if a single information is added by each editor according to their reliability, then the approximate reliability of Wikipedia will be 0.5. Even if critics of Wikipedia assume this to be a fundamental and uncorrectable flaw in Wikipedia's reliability, such as Denning et al., who call this flaw "uncertain expertise" (Denning et al. [2005]), let us consider it instead as a problem for a theory aiming to explain Wikipedia's reliability. How can a source massively edited by amateurs be reliable? Without experts, what allows such a source to accumulate more true informations than false ones? I call this problem the amateurs problem.
There is another problem for Wikipedia's reliability: the volatility problem. The volatility problem has been outlined as a flaw by Denning et al. This is another problem based on the comparison between Wikipedia and traditional encyclopedias. Traditional encyclopedias' versioning system is such that they update slowly and each modification is peer-reviewed by a competent committee of experts, so that true informations from previous versions are preserved. But on Wikipedia, an editor's edit does not have to be peer-reviewed by experts to be added. When someone edits Wikipedia, the edit is immediately active and visible. This implies that there is a chance that true information will be deleted: information on Wikipedia is volatile. To define volatility precisely, we can say that the volatility function is the function giving the probability for an information to have been deleted after being present for a variable duration. Of course, false information is also volatile, so a positive answer to the volatility problem must rather explain why true informations are less volatile than false ones. This is the problem of comparative volatility. But since informations can also be massively deleted, which is a troll practice named "page blanking"[8], there is also a problem of absolute volatility. How is it that articles survive such mass deletions? However, while the comparative volatility problem concerns reliability, the absolute volatility problem mostly concerns another value of sources: power, defined as the quantity of information in the source. But since an article with a single true information would not be deemed "reliable" while it technically would be, the problem of absolute volatility will also be treated in my investigation.

[8] Reference: https://en.wikipedia.org/wiki/Wikipedia:Page_blanking
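Although I will not measure it here, such a volatility function could be estimated empirically along the following lines (a sketch of mine, assuming that for each tracked information we record how long it stayed in the article and whether it was eventually deleted):

\begin{verbatim}
def empirical_volatility(records, duration):
    """Fraction of tracked informations deleted within `duration` of being added.
    `records` is a list of (lifetime, was_deleted) pairs."""
    deleted_within = sum(1 for lifetime, was_deleted in records
                         if was_deleted and lifetime <= duration)
    return deleted_within / len(records)

# The comparative volatility problem amounts to comparing this curve computed
# separately for true informations and for false informations.
\end{verbatim}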
Another problem, also outlined by Denning et al., is what they call the "motive" flaw. According to them, since we cannot know the motive of Wikipedia's editors, we cannot know if its information is reliable. This line of reasoning seems to be based on the enthymeme stating that to transfer an individual's reliability to an information or a source, we must know the intent of this individual. In our definitions above, I included such an enthymeme between parentheses by stating that reliability depends on the honesty of the individual believing the information she states. This accounts for the fact that people can state as much false information as they want when they do not believe what they state, but cannot state as much true information as they want when they do believe what they state. So, in order for the individuals-to-source reliability transfer to be a reliable tool to approximate a source's reliability, we must be assured of knowing the motives of those individuals. In traditional encyclopedias, since editors are experts and do not want to lose their reputation as experts, they are under pressure to be epistemically honest. But on Wikipedia, since editors can be anonymous or may simply not care about their epistemic reputation, social pressure does not ensure the motives of its users. The motive problem is therefore a serious problem. It is actually quite real on Wikipedia and manifests itself in the existence of trolls. A troll is an epistemically dishonest contributor to Wikipedia, where epistemic honesty is the property for an individual to state only information he believes to be true.
The motive is also the dimension along which Fallis distinguishes three kinds of false information (Fallis [2008]). The first kind of false information is misinformation, defined as false information added by an epistemically honest contributor. The second kind of false information is disinformation, defined as false information intentionally added because it is false. Disinformation, even if added by an epistemically dishonest contributor, is still added in pursuit of an epistemic aim. Disinformation is also associated with a kind of troll: the disinformer. A disinformer is an epistemically dishonest contributor adding only false information with the intent to add false information, or removing true information with the intent to remove true information. The third kind of false information is bullshit. Bullshit is information added by the second kind of troll: the bullshitter. A bullshitter is someone adding or removing information from Wikipedia without pursuing an epistemic aim. Bullshit is then false information added to a source without an epistemic aim. Disinformation and bullshit are supposed to be frequent on Wikipedia since, because Wikipedia has a large audience, the stakes are high[9].

[9] In December 2016, the English Wikipedia was consulted nearly 6 million times per hour. Reference: https://stats.wikimedia.org/FR/Sitemap.htm
There is another minor problem for Wikipedia's reliability which will not be treated in depth. Since Wikipedia has discussion pages allowing contributors to deliberate about the content of articles, Wikipedia may be affected by deliberation biases such as reputational cascades or the amplification of cognitive errors (Sunstein [2007]). But since data about communication-based causal factors on Wikipedia is very hard to obtain or to model, I will not concentrate my efforts on such flaws.

However, notwithstanding all those flaws and criticisms, Wikipedia is indeed reliable.

1.3 Wikipedia is reliable


I will not provide a detailed account of Wikipedia's reliability. Fallis has already studied this question in depth and I am rooting my work in his results (Fallis [2008]). However, I will still provide a minimal account of its reliability.

The massive use of Wikipedia has led researchers to empirically test its reliability. The best-known research was published in Nature and was conducted by Giles. It is a blind comparative review by experts of 42 scientific articles, comparing Wikipedia articles with articles from the Encyclopædia Britannica (Giles [2005]). With regard to errors and omissions, a Wikipedia article had an average of 4 compared to an average of 3 for Britannica. With regard to serious errors, an average of 4 was measured for both Wikipedia and Britannica. After this article, various reliability reviews were published in various areas. Even if reliability varies according to topics, reliability reviews were mostly positive[10], at least when compared to traditional encyclopedias such as Britannica. Reliability has also been measured in an absolute manner instead of a comparative one. For instance, a study revealed that a random set of 100 Wikipedia entries about drugs had a reliability of 99.7% (Kräenbring et al. [2014]).
Also, Wikipedia is known for its quick resilience against incorrect informations. Read relates the example of Mr. Halavais, an assistant professor who falsified Wikipedia and found his disruptive edit reverted in less than three hours (Read [2006]). Viégas et al. studied mass deletions and found that half of them were corrected in less than 3 minutes, highlighting Wikipedia's "surprisingly effective self-healing capabilities" (Viégas et al. [2004]). Such results are easily observed if one looks at the history of an article, especially if the article has a large audience and the stakes are high. The example of Mr. Halavais shows that Wikipedia has some resilience against the persistence of false informations, while the study of the "survival time" of mass deletions shows that Wikipedia has resilience against the deletion of true informations. It empirically appears that this resilience compensates for disinformation and bullshit when the stakes are high.
Also, in addition to reliability, Fallis outlines that Wikipedia has other epistemic virtues such as power (the quantity of accessible information), speed (the speed with which information can be acquired)[11] and fecundity (the size of the audience) (Fallis [2008]).

[10] For a detailed review of those, I invite you to consult the Wikipedia article Reliability of Wikipedia: https://en.wikipedia.org/wiki/Reliability_of_Wikipedia
[11] The term "Wikipedia" is coined by adding the Hawaiian word "wiki" (quick) and the "pedia" of "encyclopedia".
2 Why is Wikipedia reliable?
"The true miracle of Wikipedia is that this open system of amateur user contributions and edits doesn't simply collapse into anarchy."

Chris Anderson in the book The Long Tail: Why the Future of Business Is Selling Less of More.

Once the reliability of Wikipedia is recognized, we are faced with a huge problem: why is Wikipedia reliable? How does it happen that Wikipedia "doesn't simply collapse into anarchy", as Anderson puts it, and can achieve an epistemic aim such as creating a reliable source of information? The answer to this question is not an obvious one, since Wikipedia is a complex social system with several layers in play. The explanatory problem of the reliability of a source is not much of a problem for traditional encyclopedias. As I have outlined above, the reliability of traditional encyclopedias is thought to be ensured by the expertise of their editors through an individuals-to-source reliability transfer. However, as we have seen, since nothing can be attested about the reliability of Wikipedia's editors, such an explanation cannot be appealed to. This implies that an explanation of Wikipedia must answer the amateurs problem. How can an epistemic social system be successful without any control over the reliability of its participants? The second main problem that must be answered is the volatility problem. How does it happen that true information is not volatile? What is the origin of the observed resilience against disruptive deletions? And conversely, what makes false information volatile? What is the origin of the empirical resilience of Wikipedia against false informations? These are questions that can be answered by system-oriented social epistemology, and this is the area in which our inquiry falls (Goldman [2010]).
In order to address the explanatory problem of Wikipedia, I will proceed following a four-step methodology. Firstly, I will give details about the object of the study: Wikipedia as a system of knowledge compilation. I will delineate the different explanatory levels so that we can locate which hypothesis belongs to which level. This will give us a framework to determine whether the reliability of Wikipedia is the result of a bottom-up process or a top-down process. Is the reliability of Wikipedia based on parameters lying in the individuals, such as their reliability or their activity, or does it depend on the very structure of Wikipedia, such as its administration or its policies and guidelines? Secondly, I will review the different explanatory hypotheses of the reliability of Wikipedia and will proceed to an a priori selection in order to highlight which I think are the most relevant to solve the explanatory problem of Wikipedia. Thirdly, I will select a few hypotheses that could be tested in a simulation and I will test them using an agent-based simulation of Wikipedia. For this approach to be relevant, one first needs to argue what a simulation can offer in this inquiry. One also must describe how the simulation works and what assumptions and idealizations underpin the simulation. An epistemological evaluation of such assumptions and idealizations is needed in association to ensure the reliability of such a simulation, and this will be done using arguments and robustness tests. Finally, the last step will be to gather data and interpret it with care. Using these interpretations, I will draw possible conclusions as to the origin of the reliability of Wikipedia and then attempt to solve the central problem of this work.
The first step in this inquiry is to precisely define what we are talking about. What is Wikipedia? When we use the term Wikipedia, we can think of two referents. First, we can refer to something like the encyclopedia definition given above: a compilation of articles. This way of looking at Wikipedia defines it as a product. But we can also mean by "Wikipedia" the system that produces this compilation of articles, the complex set of processes at its causal origin. If the question of the reliability of Wikipedia is about Wikipedia as a product, the question of the explanation of the reliability of Wikipedia is about Wikipedia as a process or a system. In the search for an explanation of the reliability of Wikipedia, we are therefore not concerned with the details of Wikipedia as a product, but we must pay attention to the components of Wikipedia as a system. What are these components and which hypotheses can we attribute to them?

2.1 Individuals
First, the most basic components of Wikipedia as a system are its participants. Participants of Wikipedia can add or delete information, check other participants' edits, etc. Their actions are at the very base of Wikipedia as a system because without participants, Wikipedia would simply not function, since nothing would ever be added or deleted.
Among the possible explanations of Wikipedia's reliability of this kind, the first hypothesis we can propose is that a few especially active and reliable users contribute most of Wikipedia's content. The reliability of Wikipedia would then rely on such super-contributors. If we take the equation associated with the individuals-to-source reliability transfer, an especially active user will have contributed an increased number of informations and therefore her reliability will have a larger weight compared to less active users. If this user is reliable, then the source's reliability will increase compared to the case where every user adds the same number of informations. For instance, let's suppose that there are 10 contributors. They all have a reliability of 0.5 and add a single information to the source. In such a case, the approximated reliability of the source through the individuals-to-source reliability transfer would be 0.5. However, if only 9 of them add a single information to the source and have a reliability of 0.5, while the last user has a reliability of 0.9 and adds 10 informations instead of 1, then the reliability of the source would approximate 0.71 and would therefore be superior to the source's reliability in the first case.
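As a quick arithmetic check of this example (my own illustration of the transfer equation):

\begin{verbatim}
# Nine users with reliability 0.5 adding one information each, plus one user
# with reliability 0.9 adding ten informations (T = 19 informations in total).
r_S = (9 * 0.5 * 1 + 0.9 * 10) / 19
print(round(r_S, 2))  # 0.71
\end{verbatim}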
Such a hypothesis has been proposed both by the free knowledge activist Aaron Swartz and by the founder of Wikipedia, Jimbo Wales. However, while they both proposed that a few of the contributors are actually at the origin of most of the content, they disagree about who these super contributors are. While neither of them frames the problem in terms of reliability, but rather in terms of quantity, we can infer that their opposing theories also concern reliability, since they take the reliability of Wikipedia for granted.

Jimbo Wales expressed the idea that most of the content comes from a few "elite" wikipedians. He based his theory on an analysis of Wikipedia counting how many edits users performed. He discovered that 50% of the edits were done by 0.7% of the users (524 users at the time).
For Swartz, a minority of outsiders add most of Wikipedia's content[12]. He criticized the results that Jimbo Wales came to by pointing to the fact that he counted the number of edits instead of the quantity of characters. And when one looks at the history of a page, it is easy to see that most of the edits are little corrections rather than added content. Maybe the Gang of 500, the elite wikipedians according to Wales, are mainly correctors rather than content contributors. So Swartz ran several analyses of article histories and said that the results showed that in some articles most of the content is added by outsiders in a few large-scale edits, while the majority of edits are small corrections or reorganizations adding little content. However, he gave only anecdotal results and never released the statistical confirmation.

[12] Reference: http://www.aaronsw.com/weblog/whowriteswikipedia
On one hand we have a misleading measurement, on the other we have only anecdotal results. But in both cases, we have the similar idea that a few contribute the most. Kittur et al. [2007], who intended to study this point of disagreement between Wales and Swartz, come to the conclusion that while most of the edits were done by a few extremely active users in the early days of Wikipedia, there has been a transition towards more and more non-elite wikipedians. They also observed a similar result in the del.icio.us bookmarking network. However, while this result is interesting since it hints at a transition, they still used the quantity of edits instead of the quantity of content as an indicator, which makes their work unsatisfying for studying the super contributor hypothesis. So, is this hypothesis empirically true?
In order to study the hypothesis I developed two tools of my own. The first one is inspired by the program developed by Viégas et al. [2004] which allows one to study the history flow of Wikipedia articles. The program basically compares each version of the article in order to attribute parts of its content to their rightful owner. This allows one to study how the content of a participant perseveres throughout the evolution of the article. Such a program generally gives results like those in figure 1.

Figure 1: History flow of the Willard Van Orman Quine article.

The previous graph shows the results given by the history flow extractor when applied to the Quine article of the English Wikipedia. The x axis is the number of versions while the y axis is the quantity of characters. Each color represents a single user. A wide strip of color indicates that the user associated with that color was attributed a large share of the content of the article, while a narrow strip represents a small proportion of the content of the article. A first interpretation of the result is that the super-contributors hypothesis is true, since we can observe persevering wide strips throughout the article's evolution. However, the data produced by the algorithm is not entirely clean. For instance, when the article is reorganized by a user without content addition, the algorithm will attribute some of the content to the user having done the reorganization. This can be observed, for instance, when one looks at the user associated with the khaki strip who, around the 150th version, contributed some content but also re-attributed to himself some of the content associated with the pink user. Another more visible instance of this is observed before the 500th version, when the beige user is attributed a great deal of content from other users. The clue to detecting this artifact is to compare the absolute increase in character quantity with the amount of the article's content attributed to this user.

However, even with the elimination of these artifacts, the graph seems to show that most of the attributed content creations are narrow strips while most of the content is distributed among a small number of colors. This hints that the super contributor hypothesis is plausibly, to some extent, true.
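A minimal sketch of such a diff-based attribution (my own illustration, not the extractor used in this work; Python's difflib stands in for whatever diff algorithm the real tool uses):

\begin{verbatim}
import difflib

def history_flow(versions, authors):
    """Track, for every character of the latest version, the editor who introduced
    it, by diffing consecutive versions of the article."""
    owners = []      # owners[k] = author of the k-th character of the current version
    previous = ""
    for text, author in zip(versions, authors):
        matcher = difflib.SequenceMatcher(None, previous, text)
        new_owners = []
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                new_owners.extend(owners[i1:i2])          # surviving content keeps its owner
            elif op in ("insert", "replace"):
                new_owners.extend([author] * (j2 - j1))   # new text is credited to this editor
            # deleted content simply disappears from the ownership list
        owners, previous = new_owners, text
    return owners
\end{verbatim}

The width of a user's strip at a given version is then the number of characters attributed to that user; note that on a "replace" the rewritten text is credited to the rewriter, which is precisely the re-attribution artifact discussed above.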
The second tool is a little program designed to extract how the proportion of the sum of all positive contributions, in terms of the quantity of characters added, varies with the proportion of all positively contributing users, across several articles. The plot in figure 2 describes this variation for 9 randomly selected articles.

Figure 2: Proportion of the total contributed volume over the proportion of contributors.

We can observe that the 10% most positively contributing users, in terms of quantity of characters added, are responsible for 80% of the content of the articles. This therefore seems to be definite proof of the super contributor hypothesis[13].
While this allows us to verify the super contributor hypothesis, it does not permit us to answer the Wales/Swartz disagreement about who these most contributing users are. However, it is possible to translate the measurement used by Wales into a different measurement.

[13] The data is cleaned to delete most of the reverts of mass deletions, since these falsify the data by giving the impression that the users having performed the revert actually added lots of characters. This means that cleaner data would produce a flattened curve showing that 80% of the content is contributed by more than 10% of the most contributing users.
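Figure 2 amounts to a cumulative contribution curve; one point of such a curve can be computed as in the following sketch (my own illustration, with invented per-user volumes):

\begin{verbatim}
def contribution_share(chars_added_by_user, top_fraction):
    """Share of all positively contributed characters accounted for by the
    most active `top_fraction` of contributors (e.g. 0.10 for the top 10%)."""
    volumes = sorted(chars_added_by_user.values(), reverse=True)
    top_n = max(1, round(top_fraction * len(volumes)))
    return sum(volumes[:top_n]) / sum(volumes)

# Hypothetical character counts added by each user over an article's history:
volumes = {"u1": 42000, "u2": 15000, "u3": 900, "u4": 700, "u5": 550,
           "u6": 300, "u7": 250, "u8": 120, "u9": 80, "u10": 60}
print(round(contribution_share(volumes, 0.10), 2))  # 0.7: share of the top 10%
\end{verbatim}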

If the number of edits is not representative of the quantity of content added, it can be a measurement of the degree of "wikipedianism" of the user, since wikipedian users will tend to contribute more often in order to apply the various policies and guidelines of Wikipedia, while outsiders will tend to contribute less often. With this hypothesis we can now graph the data points corresponding to the quantity of positive significant contribution, in terms of the amount of characters, over the number of edits made by the user. A significant contribution is considered to be a contribution containing between 500 and 5000 characters, because below 500 characters little information is added to the article, while most of the contributions over the 5000-character mark are disruptive contributions. Figure 3 features a graph describing the results of this operation on the same 9 articles as in figure 2.

Figure 3: Scatter plot of single contributors over the total number of contributions
and the summed volume of their contributions.

This graph seems to show that both Wales and Swartz are right, while giving an advantage to the Swartz hypothesis. Most of the users having made significant contributions have not contributed often and are therefore probably outsiders, which supports the Swartz hypothesis. But users contributing often, who are probably elite insiders, contribute a larger quantity spread over more contributions, as shown by the average contribution, and this seems to validate the Wales hypothesis to a lesser extent. So both insiders and outsiders make significant contributions, but there are more outsiders making single, voluminous contributions while fewer insiders make less voluminous but more frequent contributions.
The super contributors hypothesis being verified and Wikipedia being reliable, this means that the reliability seems to be more of a bottom-up process than a top-down process. Its reliability depends on the smaller constituents rather than on an imposed order. However, taken alone, such a hypothesis would only partially answer the amateurs problem. And such a hypothesis does not answer the volatility problem or the resilience problem. Through which process is true information protected against deletion, and through which process is false information ensured to be volatile? The super contributors hypothesis then needs to be combined with hypotheses allowing us to answer the volatility problem in order to form a satisfying explanatory theory of Wikipedia's reliability.
Other individuals-based explanatory hypotheses of the reliability of Wikipedia can be proposed. For instance, the distribution of reliability in the user population could be a structural factor of Wikipedia's success. Since a source's reliability depends on its authors' individual reliability through the individuals-to-source reliability transfer, variation between a uniform distribution and various normal distributions could impact Wikipedia's reliability. It can easily be predicted that a normal distribution with a low mean (close to 0) will ensure the doom of Wikipedia's reliability and that a normal distribution with a high mean (close to 1) will ensure Wikipedia's success. It could also happen that even if the population of editors has a low average level of reliability, Wikipedia's reliability would still be ensured through resilience processes allowing it to preserve true information particularly well and to remove any false information particularly quickly.
Since a Wikipedia user can engage in various actions (adding information, removing information, checking information, talking on a discussion page, reverting a diff, etc.), another potential parameter could be the distribution of actions engaged in by users. For instance, if checking and removing are done more frequently than adding information, it could be predicted that Wikipedia's content will be thinner but will also be more reliable.
The size of the population can also be an explanatorily relevant factor. Maybe larger groups produce more reliable content than smaller groups because of their increased diversity, maybe not. For instance, in Condorcet's jury theorem, if the average reliability of all voters is superior to 0.5, then the probability for the voted proposition to be true tends to 1 as the number of voters increases. But if the average is inferior to 0.5, the probability for the voted proposition to be true tends to 0 as the number of voters increases. However, it is not evident how the size of the population has a causal impact and, if it does, it must be closely tied to other parameters.
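The two regimes of the jury theorem can be illustrated with a small simulation (a sketch of mine, assuming for simplicity that every voter has the same competence, equal to the average reliability):

\begin{verbatim}
import random

def majority_correct(n_voters, competence, trials=10000):
    """Probability that a majority vote picks the true proposition, each voter
    being right independently with probability `competence`."""
    wins = 0
    for _ in range(trials):
        correct_votes = sum(random.random() < competence for _ in range(n_voters))
        wins += correct_votes > n_voters / 2
    return wins / trials

random.seed(0)
for n in (11, 101, 1001):
    print(n, majority_correct(n, 0.6))   # tends towards 1 as the group grows
print(majority_correct(1001, 0.4))       # tends towards 0 when competence < 0.5
\end{verbatim}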
Associated with the size of the population, the proportion of trolls (disinformers and bullshitters) relative to the epistemically honest population could have an effect. It can be predicted that there is a threshold such that, if it is exceeded, trolls win against honest contributors and Wikipedia's reliability is no longer ensured. The proportion of trolls relative to honest contributors can also be tied to other parameters in order to study Wikipedia's resilience against them and to try to answer the volatility problem. For instance, administration may increase the threshold of the proportion of trolls above which Wikipedia's reliability is doomed.

As we have seen in the super contributors hypothesis, the number of informations added by a user acts as a weight factor for her reliability. If this weight is superior to the average number of informations per user and the user's reliability is also superior to the average reliability, then this user will have a positive impact on Wikipedia's reliability according to the equation associated with the individuals-to-source reliability transfer. A potential explanation for Wikipedia's reliability is then possibly that reliable users are more active than unreliable users or trolls. If a positive correlation exists between activity and reliability, it can certainly be proposed as an explanation for Wikipedia's reliability, but such a correlation is difficult to observe since the reliability of an individual is hard to determine.

2.2 Engine
Wikipedia is based on the wiki content management system MediaWiki. A wiki is a content management system designed to permit edits on a massive scale, as is the case for Wikipedia. Since "code is law", as Lessig coined the phrase to refer to the fact that the code of a computer-based system acts as a set of laws (Lessig [1999]), the design of MediaWiki is a crucial way to explain Wikipedia's reliability. The MediaWiki interface and engine delineate the possible actions available to users. By providing such a modal frame for users' actions, MediaWiki can therefore be appealed to in order to explain, positively or negatively, the reliability of Wikipedia.
For instance, MediaWiki makes edits transparent. Any user can access the history of all the versions of a given article and can know which user[14] is associated with each version. This allows an honest user to investigate and easily isolate disruptive edits and their associated contributors. Such a user can then report the incriminated contributor to the administrators of Wikipedia through the dedicated page[15]. If the disruptive nature of the edit is obvious, the incriminated contributor is banned by the administrators. If the disruptive nature of the edit is debatable, then a debate will occur and some warnings will be stated. If such a contributor continues making disruptive edits or edits that contravene the policies and guidelines of Wikipedia, or starts being aggressive, then the incriminated contributor will be banned. Such a mechanism is thought to be healthy by regular contributors of Wikipedia. Since such a system allows disruptive editors to be excluded from Wikipedia, it certainly increases the overall reliability of Wikipedia. This could be considered the quality control that Wikipedia is accused of lacking by its detractors. But it can also happen that honest contributors get banned because of their low reliability. In all these cases, we can conclude that if administrators are well motivated, then administration will have a positive impact on Wikipedia's resilience against various kinds of incorrect informations, be it misinformation if the user is honest, or disinformation or bullshit if the user is a troll.

[14] Or which IP address if the user has no account.
[15] Reference: https://en.wikipedia.org/wiki/Wikipedia:Administrator_intervention_against_vandalism
A contributor can also revert the article to a previous version if she thinks that the article was better. Combined with the transparency allowed by the history function, such a quick revert system allows easy and quick rectifications. Transparency and the quick revert system, together with MediaWiki's functionality of notifying a user when selected articles are modified, are probably among the reasons for the empirically observed quick resilience against relatively obvious disruptive edits, especially if the article is important and the stakes are high. Such features of the engine could then provide a top-down answer to the volatility problem: true information is preserved and false information is deleted because of the existence of engine features allowing users to quickly review and revert edits.
There are also hypotheses based on the engine worth mentioning, but they presumably have little impact on Wikipedia's reliability. Among these are disruptive-edit detection bots, incentive mechanisms such as attributing a quality score to an article, providing a reward for the article's authors, various warning tags for inaccuracy, bias, absence of reference, etc., that can be applied to articles or individual informations, and the acknowledgement functionality which allows users to thank other users, which could in turn increase the activity of reliable users, as a kind of incentive/reward system.

2.3 Policies and guidelines


But code is not the whole law. Another important component of Wikipedia as a process is its set of normative statements: the policies and guidelines[16]. Policies and guidelines of Wikipedia are normative statements defining what kinds of edits are authorized, forbidden or recommended. For normative statements to play an explanatory role, one needs to ensure that they are applied. Are the policies and guidelines of Wikipedia applied? If yes, through which process or processes? Firstly, the application of the policies and guidelines of Wikipedia can be found in the commitment of individuals to follow these norms. But the application of normative rules is often subject to various interpretations and therefore leads to controversies. How does Wikipedia as a system cope with these controversies? Such controversies reach a consensus in the discussion pages associated with each article of Wikipedia. Indeed, if one looks at the discussion pages of an article, one finds that most of the debates are motivated and ended by appeal to policies and guidelines. So the explanation for the application of policies and guidelines relies on the commitment of individuals willing to apply them and on the deliberations taking place in the discussion pages. Given that the policies and guidelines of Wikipedia are indeed applied, the second problem is to evaluate the effect of those normative statements on Wikipedia's reliability.

[16] Reference: https://en.wikipedia.org/wiki/Wikipedia:Policies_and_guidelines
The Wikipedia policy most notably thought to guarantee its reliability is the one stating that verifiability must be ensured. If one is familiar with Wikipedia, one will note that the "Verifiability" policy[17] and its derivatives are probably the most invoked rules in discussion pages or in warning tags[18], and they are therefore likely to have an effect on Wikipedia as a system. But what can we predict the effect of these policies and guidelines on Wikipedia's reliability to be?

[17] Reference: https://en.wikipedia.org/wiki/Wikipedia:Verifiability
[18] Reference: https://en.wikipedia.org/wiki/Category:Inline_cleanup_templates
The "Verifiability" policy of Wikipedia states that:

“In Wikipedia, verifiability means that other people using the encyclopedia can
check that the information comes from a reliable source. Wikipedia does not
publish original research. Its content is determined by previously published
information rather than the beliefs or experiences of its editors. Even if you’re
sure something is true, it must be verifiable before you can add it. When reliable
sources disagree, maintain a neutral point of view and present what the various
sources say, giving each side its due weight.”
(My emphasis outlines the four guidelines associated with the "Verifiability" policy.)

Therefore, for an information to be legitimately added to an article, it must be sourced.


To understand the potential explanatory power of such a normative rule, the comparative example of open source coding is useful. An open source program is a program written by any coder willing to contribute to it. Ultimately, every change to the code is reviewed and accepted or refused by the administrators of the project, but in doing so they apply a normative rule to the proposed contributions: if the contribution is successful in its purpose, then it must be added, else it has to be refused. Fallis says that an open source program "bumps up" against its effectiveness. This "bumping up" relation is actually a set of normative rules applied to some product in order to evaluate it. Such an evaluation allows the reduction of the domain of possible realisations of the product to the domain of acceptable (according to this set of norms) realisations of the product. For an open source program, the normative rules are based on usability, effectiveness, fewer bugs, more functionalities, etc. But what is the set of things against which Wikipedia must bump up? Fallis notes that the "Verifiability" policy states that Wikipedia must bump up against "reliable sources" (Fallis [2008]). It is then thought that the reliability of a "reliable source" is to be transferred to Wikipedia. The implicit line of reasoning here uses the source-to-information reliability transfer as defined above. A sourced information on Wikipedia will have the reliability of its source. This allows the restriction of the domain of all possible realisations of Wikipedia to the sourced realisations of Wikipedia. If the source-to-information reliability transfer is accepted and the "Verifiability" policy is applied by individual contributors, then it is a plausible explanation of the reliability of Wikipedia.
It is worth noting that the very statement of the "Verifiability" policy refers to four guidelines of Wikipedia[19].

First, it refers to the "Reliable sources" guideline[20], which states when a given kind of source may or may not be relied upon. This guideline provides normative uses of sources depending on their alleged reliability and on the kind of information allowed to be sourced by this or that kind of source. This ensures the reliability of the source and, through the source-to-information reliability transfer, the reliability of the sourced information. Therefore, for the "bumping up" relation between Wikipedia and its sources to explain its reliability, the "Reliable sources" guideline must be applied too.

Secondly, the "No original research" guideline[21] forbids contributions consisting of original work. Upon reflection, it is a prolongation of the "Verifiability" policy, since original research is likely to be unsourced. It is a regularly invoked rule against outsider contributors prone to contributing their own opinion. This guideline, if applied, is a way of enforcing the "Verifiability" policy and is likely to have the same benefits.

Thirdly, the "Neutral point of view" guideline[22] states that point-of-view-based informations have no place in Wikipedia's articles. Interpretations and judgements should therefore be removed or presented as facts by attributing them to a sourced commentator. By removing opinions or by attributing them to a sourced commentator, this guideline allows the removal of biased (and therefore unreliable) informations or their translation into facts. If applied, this guideline is likely to have a positive impact on Wikipedia's reliability.

Fourthly, the "Due weight" guideline[23] provides a framework for treating the absence of consensus on a specific topic. When a topic meets consensus, the information on its page can be stated as if it were objective knowledge. However, if there is no consensus on the topic, then the associated article must present the various positions, attributing them a due weight depending on the degree to which they are shared in the researcher population. If the "Due weight" guideline is applied, it is then likely to be beneficial to Wikipedia's reliability, since non-consensual information is likely to be less reliable than consensual information. Also, presenting non-consensual positions as such is reliable, whilst presenting non-consensual information as if it were consensual is unreliable.

[19] Those I emphasised in the quote above. Additionally, they are also hyperlinks to the guidelines referred to in Wikipedia's "Verifiability" page.
[20] Reference: https://en.wikipedia.org/wiki/Wikipedia:Identifying_reliable_sources
[21] Reference: https://en.wikipedia.org/wiki/Wikipedia:No_original_research
[22] Reference: https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view
[23] Reference: https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view#Undue_weight
There are also a certain number of subject-specific guidelines, such as the guideline governing articles about living people[24], and it is plausible that they have some real effect on Wikipedia's reliability. But for reasons of economy, and since they are subject-specific, I will not list and study them in detail, limiting myself to the most general and important policies and guidelines presented above.

[24] Reference: https://en.wikipedia.org/wiki/Wikipedia:Biographies_of_living_persons

2.4 Communication
Some communication-based explanations can be proposed since the existence of discussion
pages allows a number of social interactions on Wikipedia. The most important social feature
in Wikipedia, as I outlined above, is the fact that discussion pages host deliberations about
the content of Wikipedia. If one judges deliberations to be epistemically flawed, then one
would not consider these deliberations as an explanation of Wikipedia’s reliability. However,
since deliberations on Wikipedia focus on the application of the policies and guidelines, we
can consider that deliberations’ negative impact would be canceled by the assumed beneficial
impact of the policies and guidelines applied.
Wikipedia is also a community. There are core wikipedians who know one another in real life. Such a community generally tries to give incentives to new and honest users by rewarding them. Such rewards could impact the activity level of reliable users and therefore explain the potential correlation between activity and reliability outlined above.

2.5 Object
Finally, there is one more possible kind of explanation of Wikipedia as a system. Since Wikipedia's fundamental aim is epistemic, some explanatory hypotheses could rely on the properties of knowledge itself. There is one notably interesting hypothesis of this kind about the reliability of Wikipedia: the hypothesis that, since articles are about a specific theme, there is a finite number of pertinent informations about this specific theme. This hypothesis assumes that there is a "perfect" state of a given article containing only the true and relevant informations. The idea that an article, in its complete virtual realisation, is finite implies that users can operate a selection along the dimension of topic-relevance. This can have a structural effect on Wikipedia's reliability. First, it means that an information can be checked not only for its truth or source, but also for its topic-relevance. This is likely to increase the resilience of Wikipedia against disruptive edits where the added content is irrelevant (as is often the case with bullshit). Secondly, it implies that honest users following the evolution of an article will be more likely to protect that article when its "perfect" state is reached or almost reached. They will potentially shift from contributing to reviewing and therefore increase Wikipedia's resilience.

3 Simulating Wikipedia
In order to study some of the hypotheses presented above, I will now present the agent-based simulation of Wikipedia I developed.

3.1 Why the simulation approach?


Simulations are a relatively new tool in philosophy. However, in some areas of philosophy, such as the domain of this work, system-oriented social epistemology, the use of simulations is increasingly widespread. For instance, Zollman developed a network-based model to study the impact of various kinds of communication networks on the stabilization around a consensual theory. In his simulation, nodes represent individual researchers and links represent access to other researchers' results. At each step, the beliefs of individual researchers are updated with regard to the information they access. The results of his simulation suggest that even if dense communication networks are quicker to reach a consensus, they are also more likely to reach consensus over a false hypothesis than less dense communication networks (Zollman [2007]).
In the domain of system-oriented social epistemology, simulations are assumed to yield some kind of explanatory power over the epistemic systems at hand. The first problem for this kind of approach is to find an epistemic ground for this assumption. How can a simulation, which is a language-based phenomenon, teach us something about a real world phenomenon such as Wikipedia and help us in explaining it? There are three problems behind this question. First, we must define explanatory power. Second, we must ensure that the relation between a simulation and its object allows the former to have some explanatory power. And third, we must answer the epistemological problem of idealizations in simulations.

The first problem will not be studied in detail, since there is already a considerable literature about the property of a statement to be used in a satisfying manner within explanations. However, I will assume a causal definition of explanation and therefore define explanatory power as the property for a theory to be able "to provide some information about [the target phenomenon's] causal history"[25], as Lewis famously puts it (Lewis [1986]). For a theory to yield explanatory power, it must then refer to real world causal factors and processes. Therefore, the relation between a simulation and its target must allow this reference. However, this reference is a special kind of reference. When a researcher constructs an explanatory model of a real world phenomenon, the theoretical entities of his model do not need to refer to the real world entities. The reference needed for a model to have explanatory power is an isomorphism between the structure of the real world causal processes and the processes in the model. Such an isomorphism can be recognized if the structure of the model can be described by laws that are instances of, or are identical to, the more general laws governing the real world processes behind the target phenomenon (Weirich [2011]).

[25] I leave the problem of the definition of causality unsolved in this work, whether it be a Lewis-like possible-worlds conception of causality or a Salmon-like mechanistic conception.
But before assessing the isomorphism between a simulation and its target, we must distinguish several layers between them. We can distinguish four layers: the target of the simulation, the model of this target, the language-specific simulation and the running simulation. The target of the simulation is the real world phenomenon simulated by the simulation. The model of the target is the conceptual representation of the target. The language-specific simulation is a particular realization of this model in a specific computer language. The running simulation is a particular run of the language-specific simulation with some fixed parameters. Since data is acquired from the running simulation, for those data to yield explanatory power over the target phenomenon, we must ensure the preservation of explanatory power between all those levels, which means we must ensure isomorphism between those four levels.
I already stated the first isomorphic relation when I defined explanatory power. For a
theory or a model to yield explanatory power is for it to be isomorphic with its target causal
factors. However, we can distinguish here between complete and partial explanations. A
complete explanation would be reached by a model if and only if the model provides a
complete causal picture of the real world processes at the origin of the target phenomenon.
This would mean that the processes of the model are completely isomorphic to the processes
of the target phenomenon. The laws governing the model would then be exactly the same
as the laws governing the real world. Such a complete model is rarely achieved in science.
Instead, a model can aim for a partial explanation if the laws governing it are instances of
the laws governing the target phenomenon processes. For instance, a model of mechanical
movement can consider frictions as negligeable and therefore remove them or assign them a
fixed value. Even in this case, such a model would still be thought of as yielding some partial
explanation of the phenomenon of mechanical movement because the laws of the model are
instances of the laws of real world mechanical movement where friction can have some effect.
Such idealizations control for untested causal factors and they are not a problem for a partial
explanation (Grüne-Yanoff, Weirich [2010], Weirich [2011]). Applied to our problem at hand,
the model on which our simulation is based must then be as isomorphic as possible to the
real Wikipedia process, which is equivalent to having as few model-based
idealizations as possible.
This isomorphism must be preserved when the model is embedded in a specific computer
language (the computer language used in my simulation is Python). Such a preservation is
ensured if the model is faithfully transcribed in the algorithms of the simulation. But, there
can be differences between a model and a simulation
based on this model. A simulation has constraints a model does not have, for instance computabil-
ity. These constraints can imply new idealizations and therefore less isomorphism. Also,
such simulation-based idealizations would not control for causal factors. However, a simu-
lation can still yield a partial explanation even if it relies on such idealizations. Contrary
to the idealizations controlling for untested causal factors, idealizations of this kind do not
control for causal factors and therefore must be tested through robustness tests.

“A result is robust if it is achieved across a variety of different starting conditions
and/or parameters.” (Weirich [2011])

If the result of a simulation using idealizations that do not control for untested causal
factors is stable across a variety of values for such idealizations, then the result is robust.
Robustness ensures that such results are not artifacts based on idealizations that do not
control for untested causal factors and therefore ensures that such idealizations have no
impact on the isomorphism of the simulation and the target phenomenon. The level of a
running simulation is also likely to add idealizations of this sort, since a running simulation
requires fixed values for various parameters. Some of these fixed values will correspond
to parameters based on computability idealizations for instance. But, there is no novelty
here compared to the language-specific simulation idealizations, and robustness can offer an
epistemic guarantee of a running simulation result the same way it does for the language-
specific simulation (Grüne-Yanoff, Weirich [2010], Weirich [2011]).
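To make this procedure concrete, here is a minimal sketch, in the Python language of the simulation, of what a robustness test over one untested parameter could look like. The function run_simulation is a hypothetical stand-in (it is not the actual code of my simulation) and simply returns a final reliability value ; only the looping-and-averaging structure illustrates the test.

import random

def run_simulation(esotericity_noise, seed):
    # Hypothetical stand-in for the real simulation : it should return the
    # final reliability of the article for a given value of the untested
    # parameter. Here it only returns a noisy constant for illustration.
    random.seed(seed)
    return 0.95 + random.uniform(-0.02, 0.02)

def robustness_test(parameter_values, runs_per_value=20):
    # Average the result over several runs for each value of the untested
    # parameter ; the result is robust if the averages stay close to each
    # other across the whole range of values.
    averages = {}
    for value in parameter_values:
        results = [run_simulation(value, seed) for seed in range(runs_per_value)]
        averages[value] = sum(results) / len(results)
    return averages

print(robustness_test([0, 0.5, 1, 2, 100]))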

3.2 How to simulate Wikipedia ?


I will now detail how my simulation of Wikipedia works. This simulation is an agent-based
simulation aiming to simulate the evolution of an article based on its contributors’ actions.

3.2.1 Basic simulation

An article is a finite list of informations. In the simple version of the simulation, an
information is an object having two properties : (i) to be true or false and (ii) to be more or
less esoteric. Esotericity is here defined as the property for the truth value of an information
to be more or less hard to check. The higher the level of esotericity of an information, the
harder it is for its truth value to be knowable. An article has several versions organised along
the temporal dimension. A version of an article is created when an edit is done by a user.
A user is an object able to modify the article. Basically, a user has a fixed reliability as
defined above and a level of activity which is her probability to engage in an action at each
step of the simulation. In the simple version of the simulation, a user can perform one of two
actions : to add information or to check and remove information.
If a user adds information, each information added is true or false depending on her
reliability (if a user has a reliability of 0.5, then half of the informations she adds will be
true, the other half false). This allows honest users to contribute misinformation. The information created
also has an esotericity equal to the sum of the user’s reliability and a random number. The
reason the esotericity of the information is correlated to the user’s reliability is based on the
assumption that the more reliable a user is, the more esoteric information she knows and
therefore, the more esoteric the information she contributes will be. The random number
added to the user's reliability in order to determine the esotericity of the contributed
information is there to add variation, since even an expert (a very reliable user) will
be able to contribute information with a relatively low degree of esotericity. I call this random
variation the esotericity noise. In the basic setting of the simulation, the esotericity noise
scale will be between 0 and 1.
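As an illustration, here is a minimal sketch of this contribution protocol in Python ; the names Information and contribute are mine and do not come from the actual code of the simulation.

import random
from dataclasses import dataclass

@dataclass
class Information:
    true: bool          # property (i) : the information is true or false
    esotericity: float  # property (ii) : how hard its truth value is to check

def contribute(reliability, n_informations, noise_scale=1.0):
    # Each added information is true with a probability equal to the user's
    # reliability ; its esotericity is the user's reliability plus a uniform
    # esotericity noise drawn on the chosen scale.
    added = []
    for _ in range(n_informations):
        is_true = random.random() < reliability
        esotericity = reliability + random.uniform(0, noise_scale)
        added.append(Information(is_true, esotericity))
    return added

# Example : a fairly reliable user adding 5 informations.
print(contribute(reliability=0.8, n_informations=5))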
When an honest user removes information, she checks a random number of informations
in the article. For the user to know the truth value of an information, a random number in
the esotericity scale is added to her own reliability. If the result is greater than or equal to the
esotericity level of the information, then the user knows its truth value ; otherwise she does not.
This idealization is assumed to represent the fact that informations are more or less easy
to know and that a more reliable user will be more likely to know the truth value of more
esoteric information than an unreliable user. Another idealization is made here : if the user
is honest (not a troll) and does not know the truth value of information, she will not remove
this information. However, if the user is honest and she knows that the information is false,
then she will remove the information.
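The corresponding check-and-delete action can be sketched as follows, reusing the Information objects of the previous sketch ; again, this is an illustrative reconstruction and not the simulation's actual code.

import random

def check_and_delete(article, reliability, n_checked, esotericity_scale=1.0):
    # The honest user checks a sample of informations ; she knows the truth
    # value of an information only if her reliability plus a random draw on
    # the esotericity scale beats the information's esotericity, and she
    # removes only the informations she knows to be false.
    checked = random.sample(article, min(n_checked, len(article)))
    for info in checked:
        knows_truth_value = (reliability + random.uniform(0, esotericity_scale)
                             >= info.esotericity)
        if knows_truth_value and not info.true:
            article.remove(info)
    return article

# Example usage with the previous sketch :
# article = contribute(reliability=0.4, n_informations=10)
# check_and_delete(article, reliability=0.9, n_checked=5)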
This last assumption that honest users cannot delete information whose truth value they
do not know has a strong implication. It implies that an honest user, while able to
produce positive misinformation by adding false information, is not able to perform negative
misinformation, which is deleting true information. This assumption can be coined as strong
epistemic honesty and defined as follows :

Strong epistemic honesty (applied to Wikipedia) =def A constraint stating
that a user should/can only add information she believes to be true (even if it
is false) and should/can only delete information she knows to be false (and
therefore cannot delete an information whose truth value is unknown to her).

Strong epistemic honesty is a strong claim and a potential flaw of the simulation since
it misrepresents Wikipedia's causal factors, therefore limiting the explanatory power of the
simulation. This is the case because there is good reason to think that most honest users
follow a weak epistemic honesty, defined as follows :

Weak epistemic honesty (applied to Wikipedia) =def A constraint stating that
a user should/can only add information she believes to be true (even if it is false)
and should/can only delete information she believes to be false (and therefore
can delete information without knowing its truth value or by believing a true
information to be false).

The difference between these two statements is that in the latter, honest users can perform
negative misinformation and therefore delete true information. Such an action can happen
for different reasons, whether because the user deletes an information without knowing
its truth value but believing it to be implausible, or because the user thinks a
true information is false. Such cases can't happen in the simulation. Does this disqualify
the partial explanatory power of the simulation ? I do not believe so.
First, since we can assume that a part of the honest users will act according to the strong
epistemic honesty, then the simulation will still yield some partial explanatory power. This
explanatory power will be epistemologically flawed since we cannot measure the degree of
epistemic honesty in the real life Wikipedia. By degree of epistemic honesty I mean the fact
that there is a spectrum of possible weak epistemic honesties. Maybe all honest users
(but would they still earn the title of being honest in this case ?) will delete information when
they do not know its truth value. Or maybe, when they do not beat the esotericity level of the
information, they have a probability, correlated in some way to their reliability, of deleting the
information. The degree of weakness of epistemic honesty depends on how we conceive the
possibility of negative misinformation.
Second, there is a fact about Wikipedia as a system that can be invoked in order to
consider it plausible that, if weak epistemic honesty is the case on Wikipedia, it
still tends towards strong epistemic honesty. This fact is the Verifiability policy of Wikipedia.
In practice, when one looks at the contributions to Wikipedia, one can observe that
an information can be added without being sourced, even if such a practice is frowned upon.
However, since a certain number of informations are sourced, and under the hypothesis that
their sources are reliable (as is enforced by the Reliable source guideline), these sourced
informations will not be likely to be deleted by a weakly honest user. This fact therefore
reduces the number of true informations susceptible to be deleted by honest users. This
consideration allows us to consider the weakest epistemic honesties as implausible while
considering the less weak ones, closer to strong epistemic honesty, more plausible.
A final consideration about this problematic assumption, which I think is the strongest
reason to regard strong epistemic honesty as not so damaging to
the explanatory power of the simulation, appeals to the behavior of trolls. I will get back
to this point when trolls are introduced. But for now, we need to survey the
various parameters that have to be set when running the simulation. There are two kinds of
parameters : the tested parameters and the untested parameters.
First we need to determine how users’ reliability is distributed. Variation of this parameter
allows us to study the various hypotheses assuming that the reliability distribution in the
user population has some impact on the reliability of Wikipedia. A uniform distribution
will be compared with various normal distributions. If a normal distribution is chosen, then
we can test various means and various standard deviations. The mean will be considered as a
tested parameter while the standard deviation will be considered as an untested parameter with
a base value of 0.12, because such a standard deviation allows the possibility of
significant variation while preserving a clear average reliability. Of course, robustness tests will
be performed with other standard deviations.
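A minimal sketch of how such reliability distributions could be drawn is given below ; clipping the normal draws to [0, 1] is my own assumption, since the text does not say how out-of-range values are handled.

import random

def draw_reliabilities(n_users, distribution="uniform", mean=0.5, std=0.12):
    # Either a uniform distribution on [0, 1] or a normal distribution with a
    # variable mean and a base standard deviation of 0.12, clipped to [0, 1].
    if distribution == "uniform":
        return [random.uniform(0, 1) for _ in range(n_users)]
    if distribution == "normal":
        return [min(1.0, max(0.0, random.gauss(mean, std))) for _ in range(n_users)]
    raise ValueError("unknown distribution")

# Example : a poorly reliable population.
print(draw_reliabilities(5, "normal", mean=0.25))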
We can also vary the number of users but, as we will see, when trolls are absent, this value
will be an untested parameter, the reason being that, without trolls, the explanatory value
of the quantity of users is not very high. However, when trolls are added to the
simulation, the proportion of trolls over honest users will be a tested parameter.
It is also possible to vary the proportion of actions engaged in by users and study the
impact of this parameter. It will allow us to study whether it is better for
honest users to check the article more often than to contribute.
Another tested parameter concerns the distribution of the activity among users. In the
basic setting, the simulation will use a uniform distribution of activity, but this will be
compared to a Pareto distribution since I found that such a distribution approximates well
the real distribution of activity in Wikipedia. However, it is not possible to convert the
activity of the real Wikipedia into the simulation. The reason for this impossibility is that
the measure of activity distribution in the real Wikipedia is the variation of the quantity of
edits per number of users while in the simulation it’s the probability for a user to act at each
step of the simulation. For this reason, the shape parameter will be fixed at 5 and will be an
untested parameter which will be submitted to robustness tests.
Concerning the volume of checked or contributed informations, there are two parameters
: the maximum quantity of this volume and the distribution of the volume of contributed or
checked informations. The first will be untested and fixed at 25. Concerning the latter, we will
compare a uniform distribution against a Pareto distribution since I observed a distribution
approximated by a Pareto distribution in the volume of characters per edit. However,
since it is difficult to quantify the average number of characters per information, the value
of the shape parameter of the Pareto distribution will be arbitrarily fixed at 0.5, since this value
ensures that most checked or contributed volumes will be quite limited,
whilst, rarely, more voluminous edits will happen. Both these fixed values will be submitted
to robustness tests since they are untested parameters.
Finally, the last untested parameter that will be submitted to robustness tests is the scale
of the esotericity noise which will have a basic value of 1 (the random number added to the
user’s reliability will be uniformly selected between 0 and this value).
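To summarize, the untested parameters and their base values can be gathered in a configuration sketch such as the following ; the variable names are mine, the values are the ones given above.

# Base values of the untested parameters, all submitted to robustness tests.
UNTESTED_PARAMETERS = {
    "reliability_std": 0.12,       # standard deviation of the normal reliability distribution
    "activity_pareto_shape": 5,    # shape of the Pareto distribution of activity
    "max_edit_volume": 25,         # maximum number of informations contributed or checked
    "volume_pareto_shape": 0.5,    # shape of the Pareto distribution of edit volume
    "esotericity_noise_scale": 1,  # scale of the uniform esotericity noise
}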
The model presented here is the basic version of our completed simulation. Several func-
tionalities are then added as other hypotheses are studied using the simulation.

3.2.2 Trolls

Among those functionalities, trolls, as epistemically dishonest users, are added, and the
protocol defining their actions differs from that of honest users. Concerning the contributing ac-
tion, a troll adds a certain number of information but contrary to honest users, all contributed
informations are false. However, the esotericity score of their informations is determined in
the same way as for honest users. Concerning the check and delete action, disinformers
and bullshitters are distinguished. Disinformers delete only the information they know to
be true by checking the information the same way as honest users do, while bullshitters
simply delete random informations without checking for their truth value. There is also an
important difference between disinformers and bullshitters. Since disinformers in Wikipedia
intentionally want to disinform and therefore want their disinformation to persist in the
article, the volume of their contributions and checks is identical to that of honest users.
However, since bullshitters have no epistemic motives, they are the trolls at the origin of
the disruptive mass contributions and mass deletions. In order to simulate this property,
the way the volume of information contributed by bullshitters is determined differs from that
of honest users and disinformers. Concerning their contribution, a multiplier is applied to
the maximum random number of contributed informations. This multiplier is an untested
parameter fixed at 5 (and since the maximum number of informations is fixed
at 25, bullshitters can contribute between 1 and 125 informations), but this multiplier will be
submitted to robustness tests. Concerning their random deletion of information, they can
actually select between 1 and the total quantity of informations in the article, potentially
allowing them to delete the entire article. Also, while we can select a Pareto distribution for
the random number of contributed and checked informations, the
bullshitters are not affected by it and the distribution of their random volume of information
is always uniform. These particularities of the bullshitters are designed in order to allow
them to perform massively disruptive contributions and deletions as it is observed in the real
Wikipedia.
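These protocols can be sketched as follows, reusing the Information objects of the basic-simulation sketch ; the function names and exact signatures are mine and do not come from the simulation's code.

import random

def troll_contribute(reliability, max_volume=25, bullshitter=False, multiplier=5, noise_scale=1.0):
    # Every information contributed by a troll is false ; bullshitters can
    # contribute up to multiplier * max_volume informations at once.
    upper = max_volume * multiplier if bullshitter else max_volume
    n = random.randint(1, upper)
    return [Information(False, reliability + random.uniform(0, noise_scale))
            for _ in range(n)]

def disinformer_delete(article, reliability, n_checked, esotericity_scale=1.0):
    # A disinformer checks informations like an honest user but deletes only
    # those she knows to be true.
    for info in random.sample(article, min(n_checked, len(article))):
        if (reliability + random.uniform(0, esotericity_scale) >= info.esotericity
                and info.true):
            article.remove(info)
    return article

def bullshitter_delete(article):
    # A bullshitter deletes a uniformly random number of informations without
    # checking their truth value, potentially the whole article.
    n = random.randint(1, len(article)) if article else 0
    for info in random.sample(article, n):
        article.remove(info)
    return article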
The addition of trolls to the simulation implies that we must set additional parameters.
First we can vary the proportion of trolls over honest users and this will be used to determine
the resilience of various configurations of parameters against trolls. Secondly, we can vary
the proportion of disinformers over bullshitters and I will in fact always distinguish the two
in order to study selectively and comparatively their effect on the simulation.
Returning to the assumption of strong epistemic honesty, I think that the addition of
trolls to the simulation allows for a final argument decreasing the severity of the explanatory
limitations of this assumption. While without trolls, there is no possible deletion of true
informations, this becomes possible when they are added. The idea is that when there are
trolls, on average, the simulation will approximate results with weak epistemic honesty. This
is the case because we can consider that some of the troll actions are actually actions of
honests users temporarily loosing their strong honesty and, by deleting true information,
they endorse weak epistemic honesty. Such a consideration would be plausible on singular
runs but there is a limitation. Since users are single instances individualized by their specific
values of reliability and activity, it would not be possible to pair some of them to their
troll counterparts. But these individualizations cancel out when one looks at the average of
multiple simulations : the higher the number of runs used to compute average results,
the more representative of the selected distributions the individual pairings of reliability and
activity will be.
I think this argument is plausible. And if it is accepted as plausible, then the averages of
multiple runs with trolls would correctly approximate weak epistemic honesty, allowing us
to consider these results to have a higher degree of explanatory power. This implies that we
must keep in mind, when interpreting results, that the explanatory worth of results without
trolls would be lower compared to the explanatory worth of results with trolls.

3.2.3 History check and revert

In order to test the hypothesis that the revert functionality of MediaWiki allows for an
increased resilience against the volatility of informations, another added functionality is the
possibility for users to engage in a third complex action. They will be able to access and check
a certain number of the latest versions of the article. According to the results of their check,
they may revert the article to a previous version except for bullshitters who just randomly
revert the article to a previous version without checking the history.
When an honest user performs this action, she will check n previous versions with n
being between 1 and the maximum number of contributed or checked informations defined
previously. In order to do so, the simulation stores the diffs of each modification of the article,
which are objects that record the added or deleted informations and the user associated with
each modification. For each checked diff, all the added and deleted informations are checked
using the same protocol as before, by beating the esotericity score of each information.
By doing so, the honest user computes the beneficial score of the potential revert to each of
the checked diffs. With the beneficial scores of the $n_{max}$ latest diffs, starting with the latest
diff at $n = 1$, we can compute the beneficial score of a potential revert to the version before the
$n_{max}$-th diff as follows :

$$\sum_{n=1}^{n_{max}} \left( -T_n^+ - U_n^+ + F_n^+ + T_n^- + U_n^- - F_n^- \right)$$

where $T_n^+$, $U_n^+$ and $F_n^+$ are the numbers of informations added at the $n$-th diff that the user
knows to be true, does not know the truth value of, and knows to be false, respectively, and $T_n^-$,
$U_n^-$ and $F_n^-$ are the corresponding numbers for the informations deleted at the $n$-th diff.

When all the diffs have been attributed a beneficial score, the user decides to revert iff
there is a diff with a score greater than 0. If several diffs have positive beneficial scores, then
the user will select the one with the highest score. The formula basically expresses the idea
that a revert must be done if it will mostly restore true or unknown informations
or mostly erase false informations, taking into account that a revert to a specific version
will also erase the modifications between this version and the latest version.
Disinformers perform the same protocol as above, with the exception that the beneficial score
of the $n_{max}$-th diff is expressed as :

$$\sum_{n=1}^{n_{max}} \left( +T_n^+ - U_n^+ - F_n^+ - T_n^- + U_n^- + F_n^- \right)$$

For a disinformer, a revert is thus beneficial if it mostly deletes true informations or adds
false and unknown informations.
Finally, concerning bullshitters, they can revert to any previous version without checking.
Each previous version has the same probability of being selected as the revert target.
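A sketch of this revert evaluation for an honest user could look like the following ; the diff representation (a dictionary with the added and deleted informations) and the function knows, which returns "true", "false" or "unknown" according to the user's esotericity check, are my own illustrative choices.

def beneficial_scores(diffs, knows, n_max):
    # diffs is the list of the article's modifications in chronological order ;
    # the score of reverting to the version before the n-th latest diff is the
    # cumulative sum of the per-diff terms of the formula given above.
    scores = {}
    running = 0
    for n, diff in enumerate(reversed(diffs[-n_max:]), start=1):
        t_add = sum(1 for i in diff["added"] if knows(i) == "true")
        u_add = sum(1 for i in diff["added"] if knows(i) == "unknown")
        f_add = sum(1 for i in diff["added"] if knows(i) == "false")
        t_del = sum(1 for i in diff["deleted"] if knows(i) == "true")
        u_del = sum(1 for i in diff["deleted"] if knows(i) == "unknown")
        f_del = sum(1 for i in diff["deleted"] if knows(i) == "false")
        running += -t_add - u_add + f_add + t_del + u_del - f_del
        scores[n] = running
    return scores

def choose_revert(scores):
    # Revert only if some score is strictly positive, to the diff with the
    # highest beneficial score.
    best = max(scores, key=scores.get) if scores else None
    return best if best is not None and scores[best] > 0 else None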

3.2.4 Administration and report

Administration is also added and allows us to study the impact of the ban function on
Wikipedia’s reliability and therefore its assumed beneficial impact in regards to resilience
against trolls and unreliable users. The administration mechanism adds a new functionality
to the action seen just above. While checking an article’s history, honest users can detect
and report disruptive edits to administrators. When an edit has been reported as disruptive,
an administrator will check it and, if she is certain of its disruptive nature, the user at the
origin of this edit is permanently banned.
A disruptive edit can be any one of the following : a contribution with only false informations, a
deletion of at least one true information, a contribution or a deletion with a larger amount of
informations than the maximum number of informations contributed or deleted by honest
users and disinformers, and finally a revert to a version older than the maximum number of
versions that can be reverted. The last two disruptive edits are edits that only bullshitters can
perform since the volume of their edits can be much larger than the maximum volume of
honest users' and disinformers' edits. This is done in order to model the fact that when an
edit adds or deletes too many characters on Wikipedia, it is almost immediately considered
as disruptive by wikipedians or by disruption-detecting bots, and immediately reverted.
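A sketch of this detection, using the same hypothetical diff representation as before, could be the following ; knows stands for the reporting user's (or administrator's) truth-value check, and the reverted_versions field is my own addition for recording how far back a revert goes.

def is_disruptive(diff, knows, max_volume=25, max_reverted_versions=25):
    # The four kinds of disruptive edits : a contribution made only of false
    # informations, a deletion containing at least one true information, an
    # edit whose volume exceeds the honest/disinformer maximum, or a revert
    # going back further than the maximum number of checkable versions
    # (both maxima share the base value of 25 given in the text).
    added, deleted = diff.get("added", []), diff.get("deleted", [])
    if added and all(knows(i) == "false" for i in added):
        return True
    if any(knows(i) == "true" for i in deleted):
        return True
    if len(added) > max_volume or len(deleted) > max_volume:
        return True
    if diff.get("reverted_versions", 0) > max_reverted_versions:
        return True
    return False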
Three points deserve to be outlined concerning the simulation of the administration.
Firstly, the simulation assumes that administrators are epistemically honest ; since this
seems to be true in the real Wikipedia (without judging administrators' reliability),
I take it to be a plausible idealization. Secondly, I make the assumption that trolls have
no interest in reporting disruptive edits and therefore don’t have the ability to do it in the
simulation. Thirdly, since, as we have seen above, honest users can produce disruptive edits
if they only add false information, then the administration process not only bans trolls but
can also ban honest users.

4 Results
4.1 Basic simulation
In order to keep the interpretations as understandable as possible, we have to start with
a basic simulation which will be used as a cornerstone. In this basic simulation, we have
100 honest users and no trolls. These users have a 50% chance to contribute and a 50% chance
to check random informations and delete those they know to be false. There is no possibility to revert the article
to a previous state, nor is there any administration. Finally, the distribution of reliability
and activity is uniform, as is the distribution of the volume of informations contributed or
checked. Figure 4 shows the average of 20 runs with such parameters (see Table 1 in the
Appendix).

Figure 4: Average results with 100 honest users.

In such a configuration of parameters, reliability is achieved. However, this should be
of no surprise since the model assumes strong epistemic honesty, as we asserted earlier.
Since honest users cannot delete true informations but only false ones, it can be
easily predicted that true informations will accumulate while false informations will not. This
observation can offer a first and simple answer to the amateur problem. Amateurs can
produce a reliable article because while they are able to add false informations, they are
not able to delete true informations. As we have seen, strong epistemic honesty
somewhat limits the explanatory value of the simulation. However, in reality the class of
true informations with a reasonable probability of being deleted by honest users is narrow. It
includes common truths taken to be false (a smaller class than the commonly believed
falsities) and very implausible true propositions. Even if such informations exist, they
are a minority of all informations and the simulation still seems to hold some partial
explanatory value in regards to the amateur problem.

4.1.1 Normal distribution of reliability

Reliability can be obtained even with extremely poor reliability distributions. Table
1 in the Appendix shows the same simulation with the exception that reliability follows a
normal distribution with a variable mean.
What we can see is that with some poor reliability distributions, such as normal distributions
with means of 0.25 or even 0.1, the article is still mostly reliable. This is, of course, tightly
linked with strong epistemic honesty, as we have seen before. Since no true informations can
be deleted, even if most of the informations added are false they will be deleted at some
point ; there will come a moment when true informations accumulate more than false
ones and the article will therefore be reliable. The explanatory value of such simulations is
however poor, since weak epistemic honesty seems to be linked with reliability in such a way
that the less reliable a user is, the weaker her honesty will be.

4.1.2 Varying the actions proportion

Among the tested hypotheses, we outlined the proportion between contributing informations
and checking and deleting informations. We can study this hypothesis by comparing the
results of the basic configuration of parameters to simulations where users are more likely to
contribute or to check the article (see Table 2 in the Appendix).
The results of figures 5 and 6 show that it is better for users to check the article more
often than to contribute more often. However, while an increase in contribution strongly
affects the reliability of the article in a negative manner, an increase in checking only slightly
increases reliability. In the case where users contribute 75% of the time and check the article
only 25% of the time, average reliability goes from 0.96±0.02 to 0.71±0.08 compared to the
basic simulation where the two kinds of actions have each the same amount of chances to
be selected by users. The results concerning volatility explicitly show the reason for such
a reliability drop : the half life of false informations goes from 4 to 114 compared to the
basic simulation. While the volatility of true information is unchanged, the volatility of
false information decreases a lot and since false information is less volatile, all things being
equal, reliability drops.

Figure 5: Average results with 100 honest users contributing 75% of the time.

The last observation is that there is an average of 4596.92±415.24
true informations and 1775.96±435.71 false informations for the last 100 versions of the run
instead of 3453.56±354.73 and 28.93±18.57 respectively when users are equally likely to
perform contributions or checks. While there are more true informations at the end, there
are also many more false informations.
Concerning the runs where users are more likely to check the article instead of contributing,
we observe a slight increase of the average reliability from 0.96±0.02 to 0.98±0.02. There
is also a slight decrease in the half life of both true and false informations, which can be
explained by the fact that half life is sensitive to the quantity of information added in the
first versions of the article. Since users are less likely to contribute, there will statistically
be less information at the beginning of the article, and therefore the half life will slightly
decrease. However, convergence toward reliability happens quicker than with equiprobable
proportions of actions. But despite this quicker convergence toward reliability, there is only
an average of 2397.68±228.74 true informations at the end of the run.
These results can be easily explained and interpreted. If honest users contribute more
than they check the article, the false informations will be more likely to pile up compared
to a situation where users check the article more often than they contribute.

Figure 6: Average results with 100 honest users checking for false informations 75% of the time.

This result is linked with the assumption that when a user contributes, the average quantity of
information contributed will statistically be equal to the average quantity of information checked. More
contribution therefore implies that the users checking the information will be overwhelmed : they
will not be able to check and delete all the false information. This result is to be interpreted
with caution since it is based on the assumption that the average volume of informations
in a contribution is equal to the average volume of informations checked. However, since
there is no empirical evidence for such a statement, we must state that the correct
interpretation of this data isn't that honest users should check the article as often as they
contribute, but instead that they should check as many informations as are con-
tributed. The shift from frequency to quantity is needed to apply the result of the simulation
to the real world Wikipedia.
Another conclusion can be drawn and concerns the interaction between the tested param-
eter at hand and another parameter : the distribution of reliability. In the runs at hand, we
assume that reliability is distributed uniformly among the population and will therefore have
an average of 0.5. But, if the reliability distribution is such that the average reliability is
higher or lower, then the need for checking will vary. The proportion between the quantity of
information checked compared to the quantity of information contributed is sensitive to the
average reliability in the sense that if the average reliability is lower, all things being equal,
more false informations will be added to the article. In order to achieve global reliability,
users will therefore need to check more information than they contribute. Conversely, the
higher the average reliability, the less the quantity of checked information will be needed in
order to achieve global reliability.
A parallel can be drawn between those results and the practice of peer review in science.
Among the advocated benefits of the practice of peer review, there is the assumed increase
of reliability. The idea is that when a scientific article is reviewed by other experts, then
those experts will be able to refine the article by correcting errors and filling in omissions.
Concerning the correction of errors, this is essentially the same as the action of checking
the article in order to delete false informations. If we simplify the practice of peer review
to the deletion of false information, then it approximates my model of Wikipedia and the
descriptive and normative conclusions I have drawn can be applied to scientific peer review.
When a scientist writes an article, the reliability of the article will be approximately equal
to her reliability. When her peers start to review her article, they will outline, according
to their own reliability, the incorrect informations and those will be deleted or replaced by
true informations. If the parallel is to be drawn and the conclusions to be applied to peer
review, then a surprising potential conclusion can be drawn. Under the hypothesis that scientific
experts are defined in an absolute manner, such that they have at least a reliability superior
to 0.5, the need for correction is actually less pressing than
in a case with amateurs. We can even push this reasoning further by stating that if the
experts are actually extremely reliable, then a lazy peer review will be possible without
strongly damaging the article. Of course, since we are not able to assess the absolute
reliability of experts, it is always preferable to practice conscientious peer review.

4.1.3 Traditional encyclopedia

By modifying the parameters of the simulation, it is possible to simulate a classic encyclo-
pedia in order to compare it to Wikipedia. In this case I simulate 11 users with a reliability
of 0.75 (assuming an absolute notion of expert, which is an expert with a degree of reliability
superior to 0.5) and where the first user contributes 500 informations while the other 10 users
simply check the article to remove false information.
This is supposed to simulate a traditional encyclopedia since in such encyclopedias, a
single expert writes most of the content and this content is supposedly peer-reviewed. The
result of such a simulation is shown in figure 7.
Figure 7: Result of the simulation of a traditional encyclopedia.

This result isn't surprising. First, the contributing expert contributes 500 informations of
which on average 75% will be true. There will be on average 375 true informations and 125
false. Then, the 10 reviewers will all check the article in its entirety. Every one of them has
a reliability of 0.75 and on average they will be able to determine the truth value of half the
informations. So the first will remove 62.5 false informations on average, the second half of that
and so on. At the end of the simulation, we therefore have an average of 375 informations
and no or few false informations. This implies that the article converges towards reliability
through the process of peer-reviewing.
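This expected outcome can be reproduced with a few lines of arithmetic (a simple sketch of the reasoning above, not an actual run of the simulation) :

# Expected counts for the simulated traditional encyclopedia described above.
n_informations = 500
expert_reliability = 0.75

true_infos = n_informations * expert_reliability         # 375.0
false_infos = n_informations * (1 - expert_reliability)  # 125.0

# Each of the 10 reviewers determines the truth value of about half of the
# informations, so the expected number of surviving false informations is
# halved at every review.
for _ in range(10):
    false_infos /= 2

print(true_infos, round(false_infos, 2))  # 375.0 0.12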
The comparison of these results to the previous results does not yield interesting conclu-
sions, since the results of these runs mostly depend on the arbitrary parameters I have
chosen to represent the experts. However, I think that the fact that the underlying model of
my simulation is able to simulate a traditional encyclopedia is actually one of its strengths,
since generality is often considered a virtue of explanatory models.

4.2 Trolls
4.2.1 Disinformers and bullshitters

If Wikipedia were to be contributed to only by honest users with strong epistemic honesty,
Wikipedia would be reliable. Unfortunately, since Wikipedia is massively participative, not
only the people with good epistemic intentions can contribute, but also the disinformers
and the bullshitters. The problem of the resilience of Wikipedia against trolls then arises.
In order to have a basis to compare the various resilience mechanisms, but also in order to
comparatively study the impact of bullshitters and disinformers, I provide the results of runs
with 50 honest users and 50 disinformers in figure 8 or 50 bullshitters in figure 9 (also see
Table 3 in the Appendix).

Figure 8: Average results with 50 honest users and 50 disinformers.

We can observe that disinformers and bullshitters have a distinct impact over Wikipedia.
The first difference is that with 50 disinformers, the reliability of the article converges to a low
average reliability of 0.08±0.06 for the last 100 versions while with 50 bullshitters, instead
of a convergence, we can observe a stable average reliability of 0.15±0.17. This comparative
phenomenon of convergence versus stability can be explained by looking at the volatility for
each set of runs. When there are disinformers, false information is much less volatile (with
a half life of 99) than true information (with a half life of 14) and will then tend to pile
up. This is explained by the fact that disinformers are selective in the informations they delete : they
delete only true informations. Instead, when bullshitters delete informations, they delete vast
amounts of informations without selecting them, whether true or false. This explains the
high volatility of both true and false informations. This also explains the average produced
stable reliability since large amounts of information will be added and deleted randomly in
such a way that on average, neither true nor false information will accumulate.

Figure 9: Average results with 50 honest users and 50 bullshitters.
Which of these two kinds of trolls is the most disruptive in terms of reliability ? At first
sight, the intuitive response would be the disinformers since their combined actions cause the
reliability to converge to a lower reliability than the stable reliability produced by bullshitters.
However, we can see that with disinformers, true informations pile up at a really slow rate
compared to false informations, while with bullshitters, both true and false informations are
prevented from piling up. If true informations pile up, it is because disinformers are selective
in the informations they delete and therefore must beat the esotericity levels of the checked
informations. Therefore, true and highly esoteric informations will rarely be deleted and
will accumulate. In comparison, bullshitters simply make the article too volatile. With fewer
trolls, we can then predict that disinformers will be much less disruptive. In order to test
this hypothesis, I simulated runs with 75 honest users and 25 trolls (see Table 3 in the
Appendix).
The hypothesis is correct, bullshitters are indeed more disruptive than disinformers. Con-
cerning disinformers, we observe that with only 25 of them, the volatility functions of false
and true informations are almost perfectly inverted compared to the runs with 50 disinform-
ers. Volatility of true information in the first case will be almost equal to the volatility of
false information and conversely. The article will then converge to an average reliability of
0.82±0.13 for the last 100 versions of the article. In comparison, while the average reliability
of the article will increase from 0.15±0.17 with 50 bullshitters to 0.36±0.23 with only 25 of
them, the article will still be globally unreliable. And again, we observe no convergence in the
average reliability when trolls are bullshitters. With only 25 bullshitters, volatility decreases
for both true and false informations, but the volatility of true informations is slightly more
decreased than the volatility of false informations. If we continue to diminish the number of
bullshitters and to increase the number of honest users, this tendency continues to manifest
itself. With 90 honest users and 10 bullshitters, the average reliability will still not converge
and will be 0.6±0.21. The volatility of false informations starts to stagnate with a half life of
7, but the volatility of true informations significantly decreases, with a half life of 25 instead
of 14 with 25 bullshitters. With only 5 bullshitters and 95 honest users, a convergence
phenomenon manifests. The reliability converges towards an average of 0.78±0.15
for the last 100 versions. Volatility of false information still stagnates while volatility of true
informations decreases with a half life of 51.
Another observation can be made if we look at singular runs. Even when there are few
bullshitters compared to honest users (5 of the former, 95 of the latter), we can observe highly
disruptive mass deletions and mass contributions, as shown in figure 10.
Such mass disruptions do not allow the article to be stable and, while it is on average
mostly reliable, the mass deletions turn the article into a perpetual fight for honest users,
who must restore its integrity after each of them.
These results show that bullshitters are much more disruptive than disinformers,
since even a small proportion of them will result in an unreliable article. While disinformers,
in sufficient numbers, will be able to make the reliability of the article converge to unreli-
ability by contributing false information and selectively deleting true information,
bullshitters are highly disruptive since they aren't selective and can therefore add and delete
more informations. These data are the result of two correlated features of the simulation :
the unselectiveness of bullshitters when deleting informations and the large volume of the infor-
mations contributed or deleted by bullshitters. As we have noted before when introducing the
bullshitters, their particularities in comparison to honest users and disinform-
ers are justified by the reality of Wikipedia. The quantity difference stems from the fact that
bullshitters can just select a big portion of the article (or even the entire article) and press
delete on their computer keyboard. Concerning the quantity of information added, this is
justified by the fact that it takes less effort to produce random information (for instance by
copy-pasting a New World Order conspiracy theorist blog post into the Truth article :
https://en.wikipedia.org/w/index.php?title=Truth&diff=20002271&oldid=19959811) than to produce
serious information, true or false, honest or dishonest. The unselectiveness feature is simply
the computational consequence of the concept of the bullshitter : a user without epistemic
intentions who is therefore unselective with regard to the truth value of her deletions.

Figure 10: Results of one of the runs with 95 honest users and 5 bullshitters.

4.2.2 Actions proportion against trolls

Now that trolls are added to the simulation and we have observed their harmful impact
on the reliability of Wikipedia, we can ask which strategy honest users can implement
to increase the resilience against trolls. A possible strategy is to vary the actions performed.
We saw earlier that, in order to have a reliable article, a certain ratio of informations checked
over the informations contributed must be met. Maybe such a variation in the proportion of
actions performed is efficient enough to ensure Wikipedia’s reliability against the nefarious
actions of trolls (see Table 4 in the Appendix).
We can observe that against bullshitters, none of these strategies gives significant results.
This likely stems from the fact that the quantity of information added or
deleted by bullshitters is far higher than for honest users or disinformers. No matter which kind of action
the honest users favor, the bullshitters will still have a more powerful effect on the article.
The situation is however different for disinformers since they act as much as honest users.
We can see that if honest users are more likely to check the article than to contribute, the
global average reliability, even if it is still inferior to 0.5, increases from 0.11±0.08 with honest
users contributing 50% of the time and checking 50% of the time to 0.22±0.17 with honest
users checking 75% of the time and contributing only 25% of the time. Another important
observation is that this variation in the actions engaged in by honest users actually cancels
the convergence toward unreliability. Finally, we can explain this result by looking at the
volatility variations caused by the variation of the parameters at hand. While the half life
of true informations decreases by 4, the half life of false informations drops from 99 to 10,
meaning that the volatility of false informations increases much more than the volatility of
true informations. This result stems from the fact that, since honest users check the
informations in the article more often, they will be more likely to delete false informations and
therefore the volatility of false informations will increase.
On a more philosophical level, we could conclude from these results that, in a massively
participative epistemic system with a contribution system similar to Wikipedia, the answer
to trolls in terms of actions engaged in (contribution or check and deletion) depends on
the kind of trolls. If the troll is epistemically malevolent such as a disinformer, then an
increase in vigilance (check and delete actions) is needed to fight them. However, if the troll's
intention is simply not epistemic and the action is to randomly and massively edit the article,
then neither more vigilance nor more contribution will successfully diminish the impact of
this disruptive behavior. This implies that in order to fight the bullshitters, we need other
resilience mechanisms than simply changing the behavior of honest users.

4.3 Administration
Administration is thought to be a powerful resilience mechanism against trolls since it allows
them to be banned.
If we add administration, then when a user checks the history of the article (as opposed
to checking random informations in the article), the user is allowed to report disruptive edits
(as I defined earlier). In this case, an administrator will check the diff at hand. If she concludes
that the diff is disruptive, then the user associated with this diff will be permanently banned.
The purpose of such a mechanism is to ensure a higher proportion of honest users against
trolls. But, as we have seen before, it is possible for an honest user to be banned if she
adds only false information (because of bad luck or poor reliability). Is this mechanism
able to fulfill its purpose ? Figure 11 shows the average results with 50 honest users and 50
disinformers and figure 12 with 50 bullshitters instead of disinformers (see also Table 5 in
the Appendix).

Figure 11: Average results with 50 honest users, 50 disinformers and administration.

As expected, administration decreases the troll population among the user population, al-
lowing a ratio where honest users are sufficiently represented to ensure Wikipedia's
reliability. Of course, some honest users are banned too, but more slowly than trolls, since trolls
always make disruptive edits. We can observe that administration is especially effective against
bullshitters, who are banned faster than disinformers. This result is to be expected since
honest users can detect a disruptive bullshitter edit not only by detecting that the contribution
only adds false information but also because the volume of bullshitters' contributions and deletions
is on average quite a bit larger than that of honest users or disinformers. In the real
Wikipedia, a mass deletion or a mass contribution is immediately suspected and therefore
administration can proceed quickly. There are even internet bots trained to immediately
revert such massively disruptive editions.
On a more normative level, I think this data allows us to draw the following conclusion.
In a massively participative system such as Wikipedia where the conditions for trolling are
present, there is a strong need for administration since it is the most powerful resilience
mechanism against both trolls and poorly reliable users. Without administration, a massively
participative system would not be resilient enough to fulfill its purpose. Wikipedia wouldn’t
work as well as it does if its governance was completely distributed.

Figure 12: Average results with 50 honest users, 50 bullshitters and administration.

However, despite its excellent impact on reliability, when one looks at individual runs
instead of their average, we can observe that administration is not a silver bullet against
the strongly disruptive edits that mass deletions are when there are bullshitters. Some of
the runs, such as the one shown in figure 13, manifest mass deletions.
So while the number of bullshitters diminishes and mass deletions are therefore less
likely, there is still the possibility of an individual bullshitter causing severe damage. In
order to be resilient against these mass deletions, we need another feature of the versioning
system : the revert functionality, which allows one to revert the article to a previous version.
Some limitations of these results must be made explicit as they probably lower their impact.
For instance, in the simulation, the administrator bans the incriminated user at the first
reported offense, while in Wikipedia a permanent ban is preceded by warnings and tempo-
rary bans. Also, in controversial reports, a consensus must be reached in order to decide on a ban.
Those facts are not implemented in the simulation and the results must be qualified in this
regard.

Figure 13: Results of a single run with 50 honest users, 50 bullshitters and administration showing examples of mass deletions.

4.3.1 Administration and other parameters

The conclusion about administration remains observable when the other tested
parameters are varied. When Pareto distributions of activity and volume of informations are used, no
significant differences are to be observed except that convergence toward reliability takes more
time compared to runs with a uniform distribution of activity and volume of information. As
with all runs with a Pareto distribution, we also observe that the number of informations is
lower than with a uniform distribution, but since this affects both true and false informations,
the reliability is unchanged.
Concerning the distribution of reliability, we can similarly observe that with a lower mean
in the normal distribution of reliability, convergence toward reliability takes more time, while
with a higher mean, it takes less time. Also, with means lower than 0.5, reliability will not
converge towards 1 but towards lower values such as 0.9 with a mean of 0.25.
Concerning the distribution of actions, we observe that the best action for honest users
to undertake is to check the history of the article in order to report disruptive edits. In
such cases, bans of trolls will happen quicker, as will convergence toward reliability. However,
more honest users will be banned by the end of the runs and the article will have fewer true
informations at the end of the run compared to a simulation with equiprobable proportions of
undertaken actions. The second best action to undertake is to check for false information, which
will give the same result as runs with an equiprobable probability for each kind of action,
with the exception that there will be fewer true informations in the end. Finally, an increase of
contribution, as expected, will slow the convergence toward reliability, decrease the volatility
of false information (since fewer checks will be undertaken) and increase the quantity of true
informations at the end of the runs.
A first conclusion would be that administration still works with poorly reliable honest
users, since the most unreliable honest users will be likely to be banned while the rarer
reliable honest users will not. A second conclusion is that, normatively speaking, it is better
for honest users to more often undertake the action of checking the article's history in order to report
disruptive users. Such a behavior will quickly ban trolls and poorly reliable users, allowing a
quicker convergence toward reliability.

4.4 Revert
As we have seen, the revert functionality allows any user to revert the article to a previous
version. This is hypothetically supposed to ensure resilience against massively disruptive
edits. In order to study this hypothesis, let us simulate runs with 50 honest users and 50
trolls (whether disinformers or bullshitters) where all users can either contribute, check the
article in order to delete information or check the last edits in order to find disruptive edits
and revert to a better version (see Table 6 in the Appendix).
Before the interpretation of the data, we need to explain how the measure of volatility is
done when revert is possible. When there is no revert, then when an information is deleted,
there is no possibility for this information to be reinstated in the article. In order to compute
the volatility, we therefore simply have the duration of each information. But when there
is revert, an information can appear and disappear often. This is why I distinguished two
volatility functions : the summed volatility and the non-summed volatility. The non-summed
volatility takes every single duration of each appearance of an information to compute the volatility,
while the summed volatility sums all the appearances of each information to compute
the volatility. This is a necessary move in order to study volatility under the regime of
revert because it allows us to compare the frequency at which an information is deleted and
reinstated to the total survival time of informations. The summed volatility will always be
lower than the non-summed volatility, but the higher the difference, the better the revert
function allows information to be reinstated after deletion.
non-summed half life takes all the instances of each information while the summed half life
sums all these instances.
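As an illustration, the two measures can be sketched as follows ; representing each information by the list of the durations of its successive appearances, and taking the median duration as the half life, are my own assumptions.

import statistics

def half_lives(appearances):
    # appearances maps each information to the list of durations (in versions)
    # of its successive appearances in the article. The non-summed half life is
    # the median duration of single appearances ; the summed half life is the
    # median of the total survival time of each information.
    single_durations = [d for durations in appearances.values() for d in durations]
    summed_durations = [sum(durations) for durations in appearances.values()]
    return {
        "non_summed_half_life": statistics.median(single_durations),
        "summed_half_life": statistics.median(summed_durations),
    }

# Example : one information deleted and reinstated twice, one never deleted.
print(half_lives({"info_a": [3, 5, 2], "info_b": [40]}))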
Returning to the produced data, several observations can be made. First, concerning
the case where trolls are disinformers, we observe that the average reliability and its average
standard deviation are roughly doubled (from 0.11±0.08 without revert to 0.2±0.14 with
revert). So while it makes the article more reliable on average, this result is however more
unstable. The convergence toward unreliability, while it seems to converge towards a higher
value, is still to be observed with the revert functionality. Concerning volatility, we observe
that the non-summed volatility increases for both true and false informations. While true
informations have a half life of 14 without revert, their non-summed half life drops to 6 with
revert. Concerning false informations, their half life goes from 99 without revert to 30 with
revert. So, when looking at single instances of the informations durations, volatility increases.
But when one looks at the summed volatility, one can observe that the summed volatility of
true information with revert is actually lower compared to the volatility of true informations
without revert while the summed volatility of false informations is still higher than compared
to runs without revert.
Concerning the case where trolls are all bullshitters, we also observe an increase of both
average reliability and its standard deviation. With revert the article is highly unstable since
bullshitters can revert to a randomly selected previous version. Concerning volatility, all
informations are highly volatile for both summed and non-summed volatility.
These results seem to dismiss the hypothesized virtue of the revert functionality. While
there is an increased reliability, the information is still highly volatile and the article unstable.
But this result may be due to the highly implausible assumption that there are
as many trolls as there are honest users. What happens if we now diminish the proportion
of trolls over honest users ?
An interesting result emerges (see Table 6 in the Appendix). Concerning disinformers, the
fewer there are in the simulations, the less significant the effect of the revert functionality
on reliability and volatility. So while the revert functionality has a somewhat significant
effect with a very large number of disinformers, its beneficial effect disappears as
their numbers diminish.

Figure 14: Results of a single run with 95 honest users, 5 bullshitters and revert showing examples of reverted mass deletions.

Figure 15: Results of a single run with 75 honest users, 25 disinformers and revert showing examples of revert wars.

Concerning bullshitters, the observed variation is the opposite. The fewer bullshitters the user popu-
lation contains, the higher the impact of the revert functionality on
reliability and volatility. While the convergence towards reliability starts to appear with 95
honest users for 5 bullshitters without revert, the convergence toward reliability can be ob-
served when there are 75 honest users against 25 bullshitters with both having the possibility
to revert. Also, when there is revert, the standard deviation of reliability decreases quicker
as the number of bullshitters decreases, compared to runs without the revert functionality.
Since this standard deviation of reliability is caused by mass deletions, this means that mass
deletions occur less often when the revert functionality is made available. This implies that
with a plausible amount of bullshitters, the revert functionality achieves its aim. This can
be better observed by looking at singular runs. For instance, the run shown in figure 14 with
5 bullshitters against 95 honest users shows reverted mass deletions as predicted.
Another interesting fact about the simulation is that it reproduces the so-called “revert
wars” where users successively revert the article producing a saw-like shape as it is shown in
figure 15. This happen with disinformers since disinformers are selective in their reverts.
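The mechanism behind this pattern can be illustrated with a toy example. The sketch below is not the simulation code ; it simply alternates a revert by an honest user and a revert by a disinformer, each restoring their preferred version, which yields the oscillating, saw-like count of true informations visible in figure 15. All values are hypothetical.

    # Toy illustration of a revert war (hypothetical values throughout).
    honest_version = {"true_infos": 40, "false_infos": 2}
    disinformer_version = {"true_infos": 25, "false_infos": 15}

    true_info_counts = []
    for step in range(10):
        # Even steps: the honest user reverts to the reliable version;
        # odd steps: the disinformer reverts to the disinformation-laden one.
        current = honest_version if step % 2 == 0 else disinformer_version
        true_info_counts.append(current["true_infos"])

    print(true_info_counts)  # [40, 25, 40, 25, ...]: the saw-like shape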

4.4.1 Revert and administration

Figure 16: Average results with 50 honest users, 50 disinformers, administration and revert.

Figure 17: Average results with 50 honest users, 50 bullshitters, administration
and revert.

Based on the previous results, we can predict that when trolls are as numerous as honest users,
the revert functionality must work in sync with administration, since administration diminishes
the proportion of trolls relative to honest users. Figure 16 shows the combined
effect of administration and revert with 50 honest users and 50 disinformers, while the
disinformers are replaced by bullshitters in figure 17.
While administration reduces the number of bullshitters drastically, we have seen that a
few survivors can have a devastating effect through mass deletions. But when revert and
administration work in sync, we can see in individual runs that these mass deletions are indeed
reverted. In figure 18, a reverted mass deletion is clearly visible.
To illustrate the very powerful resilience allowed by administration and revert working
together, we can see in figure 19 that even with 40 bullshitters, 40 disinformers and only 20
honest users, the convergence towards reliability can still be observed.

Figure 18: Result of a single run with 50 honest users, 50 bullshitters, administration and revert.

Figure 19: Average results with 20 honest users, 40 bullshitters, 40 disinformers, administration and revert.

4.5 Robustness of untested parameters
All these results are robust. When changing the esotericity noise to 0, the checks are
purely based on the reliability of the user, meaning that the most reliable users will never see
their contributions deleted, except by bullshitters. Such a modification actually lowers the
impact of check actions, but without significant differences compared to an esotericity noise
of 1. I also tried an esotericity noise of 100, meaning that reliability and esotericity are then almost
completely uncorrelated. As can be predicted, the efficiency of checks is increased. While
it does not change the overall results, a lower esotericity noise slows down the convergence
phenomena and higher values accelerate them.
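The check rule itself is defined in section 3 ; the sketch below is only a hypothetical reconstruction, with names and formulas of my own choosing, of the relation this robustness test varies : the esotericity of an information is centred on its contributor's reliability and blurred by a noise term, and a check only assesses an information correctly when the checker's reliability exceeds its esotericity.

    import random

    def draw_esotericity(contributor_reliability, esotericity_noise):
        # Hypothetical reconstruction: the esotericity of a contributed information
        # is centred on its contributor's reliability and blurred by Gaussian noise
        # scaled by the esotericity_noise parameter. With a noise of 0 the two are
        # identical; with a very large noise they are almost uncorrelated.
        return contributor_reliability + random.gauss(0.0, esotericity_noise)

    def check_succeeds(checker_reliability, esotericity):
        # Hypothetical check rule: the checker assesses the information correctly
        # with a probability given by how far their reliability exceeds the
        # information's esotericity, and never when it does not exceed it.
        return random.random() < max(checker_reliability - esotericity, 0.0)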
Concerning the maximum number of contributed or checked informations, as well as the
maximum number of versions checked, the higher the value, the more intense all the effects are.
For instance, when convergence towards reliability occurred, it occurred more quickly. The number
of informations is also higher, except for highly unstable articles, such as when there
are 50 bullshitters and no administration. Low values of these parameters tend to slow down the
observed convergences, and extremely low and unrealistic values, such as 1, would
completely annihilate the positive effect of administration and revert. This can be explained
easily. When a mass deletion occurs, most of the time there will be other edits before an
honest user checks the history and discovers the mass deletion. But when the parameter
at hand is 1, this user can only check the previous diff and will, in most
cases, not be able to discover the mass deletion in time, thus negating the positive effect
of revert. The same applies to administration because, in order for a disruptive diff to be
deleted, a user must check the previous diffs. But while this is predictable, it is not problematic,
since it is empirically false that no user checks more than the last diff.
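To make this explanation concrete, here is a minimal sketch of a history check limited by a maximum number of visible versions ; the data layout, names and threshold are mine, not the simulation's. With the parameter set to 1 the checker only sees the very last diff, so a mass deletion buried under later edits goes unnoticed and cannot be reverted.

    def find_mass_deletion(history, max_versions_checked, threshold=20):
        # history: list of diffs, oldest first; each diff records how many
        # informations it deleted. Only the most recent `max_versions_checked`
        # diffs are visible to the checking user.
        visible = history[-max_versions_checked:]
        for offset, diff in enumerate(reversed(visible)):
            if diff["deleted"] >= threshold:
                return len(history) - 1 - offset  # index of the offending diff
        return None

    # Hypothetical history: a mass deletion followed by two ordinary edits.
    history = [{"deleted": 0}, {"deleted": 45}, {"deleted": 1}, {"deleted": 2}]
    print(find_mass_deletion(history, max_versions_checked=1))  # None: missed
    print(find_mass_deletion(history, max_versions_checked=4))  # 1: found, can be reverted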
Concerning the Pareto distribution of activity among the user population, such a distribution
slows the observed convergences compared to runs with a uniform distribution. This
is explained by the fact that while under a uniform distribution users are more active on
average, under a Pareto distribution users with a high activity level are rarer. Concerning
the Pareto distribution of the number of actions, it also slows down the various observed
convergences, diminishes the number of informations at the end of the runs and
diminishes the efficiency of administration and revert, for the same reason as with the
maximum number of checked versions. For both these distributions, high values of the shape
parameter increase the described effects, while low values make the results identical
to runs with uniform distributions.
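As a rough illustration of why the Pareto case converges more slowly, the sketch below (shape value chosen arbitrarily, not taken from the thesis) draws per-user activity levels from a uniform and from a Pareto distribution : under the Pareto draw most users end up weakly active and highly active users are rare.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users = 100

    # Uniform activity: every level between 0 and 1 is equally likely.
    uniform_activity = rng.uniform(0.0, 1.0, n_users)

    # Pareto activity: heavy-tailed, rescaled to [0, 1]. A larger shape parameter
    # concentrates the mass near zero, so very active users become rarer.
    shape = 3.0
    raw = rng.pareto(shape, n_users)
    pareto_activity = raw / raw.max()

    print(uniform_activity.mean(), pareto_activity.mean())
    # The Pareto mean is typically much lower: fewer highly active users overall.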
Finally, the multiplier of the number of contributed actions for bullshitters, as expected,
only affects the disruptiveness of bullshitter contributions. A multiplier of 1 makes
bullshitters disruptive only in their deletions and reverts, while higher values increase the
disruptiveness of their contributions. When bullshitters outnumber the honest users, the article
is still unstable but its variation is higher, whereas when honest users outnumber
bullshitters and there is revert, the beneficial impact of revert against massive disruptive
contributions is more easily observed, in the form of high spikes of false informations
in individual runs.

5 Conclusions
5.1 Amateurs
Several factors are involved in explaining how amateurs are capable of producing
a reliable article. Since we have measured that most of the content is produced by a minority
of contributors, and since Wikipedia is empirically reliable, we can infer that most of
the reliability of Wikipedia comes from these super-contributors. We can then, through a
contributor-to-source reliability transfer, explain the reliability of Wikipedia.
Another important factor is the peer reviewing process of Wikipedia. While most of the
content is contributed by a few users, most of the edits are corrections, implying that most of
the actions engaged in are checks of the article. As we have seen with the simulation results,
checking is extremely important for the overall reliability of the article, especially if users
are amateurs. The checking process refines the big chunks of content added by the
super-contributors.
Other important factors are strong epistemic honesty and administration. Strong epistemic
honesty prevents an honest user from deleting true informations. However, weakly
epistemically honest users on the real Wikipedia will sometimes perform this action. If they
repeat it, they will be perceived as dishonest users, and will either strengthen
their epistemic honesty or be banned. While this is not a
simulated result, it is highly plausible that administration has this effect on the most
weakly epistemically honest users.
A final but unsimulated factor is the policies and guidelines of Wikipedia. The constraint
that an information be referenced to a reliable source and the constraint not to add original
content play a strong role against weakly honest and unreliable users. Forcing them to
source their informations refines their reliability.
More generally, the ability of amateurs to produce reliable content should not be a
surprise. While it is not a surprise for small groups of experts to produce reliable content,
it should not be a surprise for large groups of amateurs either, because as long as there is an incentive
for strong epistemic honesty (such as administration or the various policies) and unless those
amateurs are especially unreliable, in the long run true informations will accumulate
while false ones will be corrected. But these results bring us to another question. Could
large groups of amateurs replace small groups of experts ? Do we still
need scientists, or can we replace them with a huge contribution platform ? Is the success of
Wikipedia a clue as to the end of the scientific expert ?
In an article, Larry Sanger [2009], the cofounder of Wikipedia and Nupedia, studies such
a claim and provides several arguments. While he attacks a straw man by criticizing the
idea that truth would be entirely determined by Wikipedia’s content, he also provides the
more acceptable argument that Wikipedia’s central policy forces its content to be sourced
and not to be original content. Wikipedia cannot replace science since this is not its aim.
But what if it was ? After all, in my simulation there is no model of sources and users cannot
engage in the action of sourcing. Could those results therefore be applied to a massively participative
community of amateur researchers ? I do not believe so, since amateurs get their reliability
from somewhere. The model proposed here assumes that amateurs can produce a virtually
infinite amount of informations, but this is obviously an assumption. Even very reliable users
will, at some point, have exhausted their knowledge, and the model does not take this into account.
In order for a simulation to study the question of whether a massively participative
system of amateurs would actually outperform small groups of experts, such a simulation
would need to incorporate a model of how people acquire new knowledge. The underlying
model of my simulation does not have such a feature and therefore cannot be considered
a satisfying model to answer this question.

5.2 Resilience
Concerning resilience, three main factors are important. The most important factor is
administration, since administration allows the quick ban of both disinformers and bullshitters.
Administration therefore acts upon the proportion of honest users relative to trolls.
When trolls are a minority, the article converges towards reliability, since amateur honest
users produce reliable articles.
Concerning the resilience against disinformers, more checks from honest users are effective,
since disinformers try to hide disinformation in the article. While bullshitters can easily
be detected through the size of their edits, disinformer behavior appears similar to that of
honest users. The best way to get rid of their disinformation is to verify their contributions,
and here the Verifiability policy has a potential role to play, since it forces users to source
informations.
Concerning the resilience against bullshitters, only the combination of administration and revert is
effective. Revert alone does not provide satisfying resilience against large numbers of bullshitters,
since they can themselves revert in a disruptive manner. However, when faced with smaller numbers
of bullshitters, reverts from honest users provide satisfying resilience against mass contributions
of bullshit and mass deletions of the article. Concerning administration alone, while it
drastically diminishes the number of bullshitters, even a small number of them is highly disruptive,
and administration must therefore be accompanied by the revert system to prevent disruptive
mass edits.

5.3 Volatility
Concerning volatility, there are several issues and solutions. The first issue is the absolute
volatility of all informations due to potential mass deletions. Such mass deletions are
performed by bullshitters and, as we have seen, the best way to fight them is a combination
of administration and revert. While revert is thought to be an excellent mechanism against
global volatility, the simulation showed that, since both honest users and trolls are able to
perform it, it is effective only when honest users outnumber trolls, and especially bullshitters.
Concerning the comparative volatility of true informations against false ones, several factors
are at play. First, the proportion of honest users relative to trolls is an important factor. The
more trolls there are, the higher the volatility of true informations compared to that
of false ones, and conversely. A second important parameter is the proportion of checks among
the actions undertaken by honest users. The more frequent the checks, the higher the
volatility of false informations compared to that of true ones. The
same applies to the proportion of history checks and reverts among honest users’ actions. And of course
administration again plays a crucial role by diminishing the proportion of trolls.

5.4 Governance
The overall importance of administration for the three explanatory problems is significant.
The produced results hint at its necessity in massively participative epistemic systems.
Administration bans the most unreliable or weakly honest users, provides resilience against
trolls and has an overall impact on absolute and comparative volatility. Except in cases
where all the participants are both mostly reliable and honest, administration is a necessity,
and since massively participative epistemic systems cannot ensure these conditions (and on
the contrary even provide a playground for trolls and unreliable users), it can be inferred
that all massively participative epistemic systems must implement a form of administration.
It is this normative conclusion that I think is valuable as advice for creators of massively
participative epistemic systems. It also contradicts the opposite, “anarchist” thesis that such
systems are able to work without any form of hierarchy. Such systems could work if trolls did
not participate, but massively participative systems have a way of attracting them.
However, too much hierarchy isn’t a good idea either, as exemplified by Larry Sanger’s
Citizendium project. Although Sanger is a cofounder of Wikipedia and Nupedia, he decided
to quit Wikipedia on the grounds that its users would not recognize any kind of authority
from experts, and the criticisms he made of Wikipedia’s relationship to experts can be
seen as his motivation for this project. He decided to fork Wikipedia but with a major
modification : all the editors would need to work under their real name and to provide
proof of their qualifications, which could be used as an appeal to authority in their
respective domains of expertise. Even a move that small condemned the project, which never
had the success of Wikipedia. While its articles are indeed reliable, the power (the quantity
and exhaustiveness of a source) of Citizendium cannot be compared to Wikipedia’s. If
governance is a necessity in massively participative systems, too much of it produces reluctance
among amateurs, who consequently are not incentivized to participate, fearing that their
contributions may be deleted by an epistemic authority without the possibility of
appeal that a horizontal distribution of epistemic authority permits.

5.5 Broadening the range of the results


It is possible, with the right epistemic constraints, to broaden the range of my simulation’s
target. While it was designed to study Wikipedia’s reliability, I think that some of the results
can be applied to other systems, even non-epistemic systems. The general idea is that the
simulation is based on a model and that this model can be found in other systems. So in
order to see the range of the model, we need to look at the basic components of the system.
The model needs a value, which is here the truth value of informations. This value is
associated with a degree of difficulty for users to assess it, which is esotericity. The model
implies a versioning system with contributions and checks (and potentially reverts). It also
implies users who are honest with regard to the relevant value, but also dishonest ones. All the users
have a degree of performance with regard to the value (the reliability of users) and can contribute
to or check the production. The versioning system is filled with pieces of stuff that manifest
the value and the difficulty of assessing it ; here, these are informations. Finally, there are
administrators who can exclude users from participation.
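To make this abstract description concrete, the sketch below lays out the model’s components as data structures ; all the names are mine, and it is a scaffold for the comparison that follows rather than the simulation code used above.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Item:
        # A "piece of stuff": an information in Wikipedia, a piece of code in a
        # free software project.
        value: bool          # does it manifest the relevant value (true, bug-free)?
        esotericity: float   # how hard it is for a participant to assess that value

    @dataclass
    class User:
        reliability: float   # performance with regard to the value
        honest: bool         # honest or dishonest with regard to the value

    @dataclass
    class Administrator(User):
        # An administrator can additionally exclude users from participation.
        banned: List[User] = field(default_factory=list)

    @dataclass
    class VersionedSystem:
        # The shared production: successive versions built from contributions,
        # checks and (optionally) reverts.
        versions: List[List[Item]] = field(default_factory=list)

The same scaffold can be instantiated for Wikipedia, where items are informations, or for a free software project, where items are pieces of code whose value is to be bug-free.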
The most obvious example of this model, apart from Wikipedia, is free and open source software
projects. In such projects users can contribute or delete pieces of code (the pieces of stuff),
which must implement a functionality without bugs (the value) and can be more or less hard
to understand (the equivalent of esotericity). The participants have a degree of coding skill
(performance with regard to the value) and can add pieces of code or review them (contributions
or checks) in the git versioning system. The strongest difference is that for a piece
of code to be added to the master branch of the versioning system, it must be authorized by
an administrator. This difference could imply that the produced program would be highly
dependent on the coding skills of this administrator, but since administrators are generally
the coders at the origin of the project, we can safely assume that they have a somewhat high
degree of coding skill and will therefore allow most of the contributions.
Fallis [2008] already compared Wikipedia to free software projects, but my contribution is
to add a formal model underlying this comparison. The fact that the underlying model
of my simulation can be applied to a broader range of massively participative systems is
one of its strengths.

5.6 Possible developments


Adding other functionalities to the simulation would make it possible to test even more hypotheses
and to refine the etiology of Wikipedia’s reliability. It is possible to add relevance to it,
since informations in a Wikipedia article are not only checked for their truth value but also
for their relevance to the topic. It is also possible to simulate the effect of the Verifiability
policy by simulating more or less reliable sources of informations and by letting users undertake
the action of sourcing their informations. This would make it possible to study how such a behavior
would make the Wikipedia article dependent on the sources’ reliability. Another possible
implementation would be to add extremely active bots, both honest and disruptive, which
would protect the article against mass deletions or, on the contrary, perform really disruptive
edits. Finally, a last possibility would be to add a democratic system of administration, which
would make it possible to simulate the election of administrators in Wikipedia. It would allow one to study
under which conditions such a democratic governance system would or would not be contaminated by
trolls, and how that would affect the article’s reliability.
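As an illustration of the sourcing development, the sketch below (entirely hypothetical and not part of the present simulation ; all names and probabilities are mine) adds sources carrying their own reliability and a sourcing action : the probability that a contributed information is true then depends on the source it is referenced to rather than only on the contributor.

    import random

    class Source:
        # A source has its own reliability: the probability that an information
        # drawn from it is true.
        def __init__(self, reliability):
            self.reliability = reliability

    def contribute_information(user_reliability, source=None):
        # Hypothetical sourcing action for a Verifiability-style extension.
        # An unsourced information is true with the contributor's own reliability;
        # a sourced one is true with the source's reliability instead, which is
        # how the article becomes dependent on the sources' reliability.
        p_true = source.reliability if source is not None else user_reliability
        return random.random() < p_true

    # Example: a weakly reliable user citing a highly reliable source.
    textbook = Source(reliability=0.95)
    print(contribute_information(user_reliability=0.4, source=textbook))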

Appendix

Table 1: Results with 100 honest users.

Table 2: Results of varying actions proportion with 100 honest users.

Table 3: Results of varying proportions of trolls over honest users.


Table 4: Results with varying actions proportions of honest users against trolls.

Table 5: Results with administration.


Table 6: Results with revert and variable numbers of trolls.
References
• Denning, P., Horning, J., Parnas, D., & Weinstein, L. [2005]. Wikipedia risks. Communications of the ACM, 48, 152.
• Fallis, D. [2008]. Toward an Epistemology of Wikipedia. Journal of the American Society for Information Science and Technology, 59(10), 1662–1674.
• Giles, J. [2005]. Internet encyclopaedias go head to head. Nature, 438, 900–901.
• Goldman, A.I. [2010]. A Guide to Social Epistemology. In Social Epistemology: Essential Readings. Oxford University Press.
• Gorman, G.E. [2007]. A tale of information ethics and encyclopedias; or, is Wikipedia just another internet scam? Online Information Review, 31, 273–276.
• Grüne-Yanoff, T., & Weirich, P. [2010]. The Philosophy and Epistemology of Simulation: A Review. Simulation & Gaming, 41(1), 20–50.
• Kittur, A., Chi, E., Pendleton, B.A., Suh, B., & Mytkowicz, T. [2007]. Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. World Wide Web.
• Kräenbring, J., Penza, T.M., Gutmann, J., Muehlich, S., Zolk, O., Wojnowski, L., Maas, R., Engelhardt, S., & Sarikas, A. [2014]. Accuracy and Completeness of Drug Information in Wikipedia: A Comparison with Standard Textbooks of Pharmacology. PLoS ONE, 9(9).
• Lessig, L. [1999]. Code and Other Laws of Cyberspace. Basic Books.
• Lewis, D. [1986]. Causal explanation. In Philosophical Papers II. Oxford University Press, 214–240.
• Magnus, P.D. [2006]. Epistemology and the Wikipedia. Presented at the North American Computing and Philosophy Conference in Troy, New York. http://hdl.handle.net/1951/42589
• Read, B. [2006]. Can Wikipedia ever make the grade? Chronicle of Higher Education, 53(10), A31.
• Sanger, L.M. [2009]. The Fate of Expertise after Wikipedia. Episteme, 6(1), 52–73.
• Sunstein, C.R. [2007]. Deliberating Groups versus Prediction Markets (or Hayek’s Challenge to Habermas). Law and Economics Working Paper, 321.
• Viégas, F.B., Wattenberg, M., & Dave, K. [2004]. Studying cooperation and conflict between authors with history flow visualizations. Retrieved in 2016 from http://alumni.media.mit.edu/~fviegas/papers/history_flow.pdf
• Weirich, P. [2011]. The Explanatory power of models and simulations. Simulation & Gaming, 42(2), 155–176.
• Zollman, K.J.S. [2007]. The communication structure of epistemic communities. Philosophy of Science, 74(5), 574–587.

Acknowledgements
I would like to thank Cédric Paternotte for the insights, helpful criticisms and directions
he gave me during this last year. I also want to acknowledge the kind investment of his time
in my research.
I also want to thank the whole professorial board of Paris IV for their high expectations
and in-depth lessons : Anouk Barberousse, Jean-Baptiste Rauzy, Isabelle Drouet, Pascal
Ludwig and Elise Marrou.
I am also thankful to my beloved Sonia Higgins for constantly being available for discussion,
which really helps when one tries to sort out one’s mind, for the numerous ideas she submitted
and for the proofreading she gladly did.
A special thanks to my fellow students : Adrien, Ariane, Hugo, Eugène, Tim, Grégoire,
Josselin, Joffrey, Victorien and the others. These years at LOPHISC were some of my
favourite, and you were beyond all my best expectations on both the intellectual and the human
level.
I’d also like to thank Adrien Luxey for helping me optimize my Wikipedia history flow
tool.
Thanks to Sci-Hub and its creator Alexandra Elbakyan. Without her, this research would
not have been possible since science is so expensive nowadays.
Finally, I’d like to not thank my two budgies, Mescaline and Cléopatre, for being a constant
cute distraction on top of my computer screen.

Contents
1 Is Wikipedia reliable ? 2
1.1 Defining reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Epistemic pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Wikipedia is reliable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Why is Wikipedia reliable ? 8


2.1 Individuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Policies and guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3 Simulating Wikipedia 21
3.1 Why the simulation approach ? . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 How to simulate Wikipedia ? . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.1 Basic simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.2 Trolls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.3 History check and revert . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.4 Administration and report . . . . . . . . . . . . . . . . . . . . . . . . 30

4 Results 31
4.1 Basic simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 Normal distribution of reliability . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Varying the actions proportion . . . . . . . . . . . . . . . . . . . . . 32
4.1.3 Traditional encyclopedia . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Trolls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2.1 Disinformers and bullshitters . . . . . . . . . . . . . . . . . . . . . . . 37
4.2.2 Actions proportion against trolls . . . . . . . . . . . . . . . . . . . . 40
4.3 Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.1 Administration and other parameters . . . . . . . . . . . . . . . . . . 44
4.4 Revert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.4.1 Revert and administration . . . . . . . . . . . . . . . . . . . . . . . . 48
4.5 Robustness of untested parameters . . . . . . . . . . . . . . . . . . . . . . . 52

5 Conclusions 54
5.1 Amateurs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.2 Resilience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3 Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4 Governance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.5 Broadening the range of the results . . . . . . . . . . . . . . . . . . . . . . . 57
5.6 Possible developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Appendix 59

References 63

Acknowledgements 64
