Master’s Thesis
Content
Figures
Tables
Abbreviations
1 Introduction
2 Literature Review
2.1 The Theory of Social Representations
2.1.1 The Cradle of SRT: A Historical Perspective
2.1.2 Concepts and Processes
2.1.3 Previous Applications and Ongoing Research
2.2 The Wikipedia Project
2.2.1 Wikipedia Outline
2.2.2 Subject of Scientific Studies
2.3 Social Representations on Wikipedia
3 Methodology of Studying Social Representations on Wikipedia
3.1 Wikipedia Structure
3.2 Suitability of Wikipedia for Studying Social Representations
3.3 Wikipedia Article as a Central Element of Study
3.3.1 Social Representation Concepts in the Article
3.3.2 Social Representation Processes in the Article Evolution
3.4 Case Studies
3.4.1 Case Study Anchoring Analysis
3.4.2 Case Study Objectification Analysis
3.5 Towards a Quantitative Analysis of Wikipedia Articles
4 WikiGen: A Statistical Tool for Quantitative Analysis of Wikipedia Articles
4.1 Architecture
4.2 Statistical Features
4.2.1 Editing Statistics
4.2.2 Links Statistics
4.2.2.1 Anchor Maps
4.2.2.2 Anchor Snapshots
4.2.2.3 Anchor Dynamics
4.2.3 Reference Statistics
4.2.4 Integrated Tools
5 Case Studies of Social Representations on Wikipedia
5.1 Cloud Computing: From Utility Computing to a Jargon Term
5.1.1 General Description
5.1.2 Anchoring Analysis
5.1.2.1 Employed Anchor Statistics
5.1.2.2 Anchor Coding
5.1.2.3 Narrative Interpretation
5.1.3 Objectification Analysis
5.2 The iPad: From a Big Smartphone to a New Market
5.2.1 General Description
5.2.2 Anchoring Analysis
5.2.2.1 Employed Anchor Statistics
Abbreviations
CC Cloud Computing
IS Information Systems
MUSE Mutually Exclusive and Collectively Exhaustive
SR Social Representation
SRT Social Representations Theory
URL Uniform Resource Locator
WF Wikimedia Foundation
WP Wikipedia
1 Introduction
The quest for the origins and the nature of human knowledge has a history as long as
that of the self-reflective human mind. In our lives, we are constantly reminded of the
imperfection of our knowledge: our senses deceive us, our logical conclusions are often
contradictory and facts we never question turn out to be fabrications. It was arguably
Copernicus who most clearly exposed the potentially illusory nature of what we call
reality by proposing the heliocentric model of the universe. Abandoning the centricity
of the Earth was not only a reorientation in the physical sense but, more importantly, a
radical change in the very way humankind perceived its role and significance in the
universe.
From the philosophy of the Greek classics to modern philosophers such as
Descartes, however, the phenomenon of change was not a main concern. Although the
possibility of change was accepted, the essence of each phenomenon was considered
timeless (Marková 1996, p.178). As a consequence, “throughout the history of the
natural and social science, there has been a long-lasting difficulty in conceptualising
phenomena that are inherently dynamic” (Marková 1996, p. 178). Knowledge, being not
a simple description of what is ‘out there’ but a product of human interactions and
communication involving different interests (Duveen 2000, p.2), is one of those
inherently dynamic phenomena. A ‘lay knowledge’ (Marková 1996; Wagner et al.
1999) is correspondingly not a simple deviation from the true understanding of the
phenomenon but a social construct (Berger and Luckmann 1966). Sharing this socially
constructed knowledge is equal to constituting the common reality (Moscovici 1990,
p.164) - a reality for an individual which is “to a high degree [...] determined by what is
socially accepted as reality” (Lewin 1948).
Given the dynamic nature of knowledge, the focus of a researcher who is interested in
social phenomena, and therefore the focus of this thesis, shifts towards understanding
the processes through which knowledge is generated and projected into the social world
(Duveen 2000, p.2). By these processes, knowledge acquires a historical dimension.
Past ideas and experiences continue to exist by changing and infiltrating the present
(Moscovici 2000/1984, p.24). Subjects of study in this process become changing
representations of phenomena collectively held by individuals. SERGE MOSCOVICI,
whose work is fundamental for the given thesis, states that “in order to understand and
to explain a representation, it is necessary to start with that, or those, from which it was
born” (Moscovici 2000/1984, p.27). Following the offspring metaphor, the aim of this
work is to explore the genealogy of knowledge, that is, to explore the origins and thus
the kinship of socially procreated knowledge.
That said, the importance of making single representations explicit resides in the
potential it opens to understand and even influence group processes in particular and
human conduct in general. To become aware of conventional aspects in our life is to
evade some of the constraints they impose on our perceptions and thoughts (Moscovici
2000/1984, p.23). If the social sciences took account of this aspect, they would
provide richer contextual explanations for social phenomena (Bauer and Gaskell 2008).
For the natural sciences, understanding the representations that are related to the research
promises higher efficiency when advisory services are required (Farr 1993). To give a
vivid example of the benefits resulting from understanding the origins of our knowledge,
problems in the information systems (IS) field, such as overly static analysis techniques for
studying organisational phenomena (Boland 1999), can be addressed, as was shown by
GAL AND BERENTE (2008).
The heterogeneity within modern Western societies increases the relevance of exploring
the origins of knowledge. The absence of powerful centralised institutions as well as the
decreasing role of traditions and religion creates a diversity of representations (Duveen
2000, p.7). This modern world is characterised by very heterogeneous practices in
politics, philosophy, religion and arts (Moscovici 2008/1976, p.5). The resulting effects
are amplified by a shift in communication towards cross-space and cross-cultural
dialogue and, more recently, by the increasing use of electronic media (Gervais 1997,
p.192). In this context, social media, with its influence on representations which freely
circulate in the digital world, serves as an illustration of the aforementioned tendency.
The study of the genealogy of knowledge is, however, difficult. High complexity and
interdependencies between social and individual factors in the process of forming a
representation make any attempt to trace its lineage a challenging task in itself (László
1997). According to MOSCOVICI, various forms of knowledge, as “outgrowths of long
mutation chains”, can only be understood when “reimmersed” in the social setting of
communication (Moscovici 1988, p. 214). Tackling such collective processes requires a
suitable theoretical framework (Wagner et al. 1999). For the purpose of this thesis, the
theory of social representations (SRT) was chosen – a theory which has become canonical
in social psychology over the last 50 years (Jodelet 2008, p. 427).
To study the genealogy of knowledge in the sense of the social representations theory,
two criteria must be satisfied. First, the data used must represent socially constructed
knowledge. Second, the amount of data must be extensive in order to capture the
historical dimension of knowledge. The difficulty in achieving the latter is emphasised
in studies in which the social representations theory is adopted as a framework for
studying group phenomena (Vaast 2007). Considering the increasing role of social
media in communication and collaboration, the Wikipedia platform, as a popular
collaboratively edited online encyclopaedia, provides a unique opportunity to study
social representations in the process of their creation. Historical data on every article
change along with the transparency of its collaboration processes make Wikipedia an
ideal choice for exploring the genealogy of knowledge. The selection of the Wikipedia
platform as the data source leads to the research question of the given thesis:
The aim is therefore to develop a method capable of revealing the genealogy of social
representations on Wikipedia and to exemplarily apply it in several case studies. It is
argued that both research on social representations and research on Wikipedia will
benefit from the introduction of this method.
The remainder of the thesis is organised as follows: First, an outline of the research on
both SRT and Wikipedia is provided in the literature review section. On the basis of the
identified gaps in the research, a method for studying social representations on
Wikipedia which combines quantitative and qualitative techniques is developed in
section 3. Section 4 introduces a tool for a quantitative analysis of Wikipedia articles
that was developed to support the qualitative analysis. The latter is demonstrated in
section 5 as case studies of cloud computing and iPad social representations on
Wikipedia. Results of the method application in general and of the case studies in
particular are discussed in section 6. The thesis is concluded with a summary and
suggestions for further research on genealogy of knowledge.
2 Literature Review
The literature review is divided into two major parts reflecting the two pillars the thesis
is based upon: the theory of social representations and the Wikipedia platform. A
thorough analysis of previous research on both phenomena is necessary in order to
identify research gaps and to situate the method of studying social representations on
Wikipedia in the context of ongoing research.
2.1 The Theory of Social Representations
The central idea of the theory is that people’s knowledge of the world is mediated.
Objects in this world, whether physical or not, acquire meaning only through
representations. The role of representations in the relationship between people and their
world is apparent as there is no single true representation of an object. Instead, social
groups create and change different representations of the same object in a continuous
process. Therefore, social representations are semiotic mediating devices (Valsiner
2003, p.7.2) that members of a social group invariably use to render their world
meaningful (Wagner et al. 1999). They are formed and transformed in this process and
are partially distributed among the individuals comprising the social group (Moscovici
1994, p.168). But social representations are more than this; they stand for a theory with
over 50 years of history since SERGE MOSCOVICI published his famous book “La
Psychanalyse, son Image et son Public”1. Its presence in the canon of social psychology
is justified by thousands of published papers associated with the theory, a specially
devoted journal2, PhD programs and multitudinous followers (Howarth 2006). The long
history of the theory requires a historical perspective on the phenomenon itself and its
roots to gain a deeper understanding of its characteristics (Marková 1996; Rosa 2013).
2.1.1 The Cradle of SRT: A Historical Perspective
For someone who lives in the 21st century and is not familiar with the history of social
psychology, it can be difficult to believe that only 60 years ago there was no theory
capable of accounting for the dynamic nature of social phenomena. Those were,
however, precisely the circumstances in which MOSCOVICI found himself in the mid-20th
century when he started to work on SRT (Moscovici 1988, p.214; Rosa 2013, p.3). To
clarify these circumstances, the section will explain the historical background of the
theory as well as situate the theory in the epistemological context.
1 (from French) Psychoanalysis: Its Image and Its Public (Moscovici 2008)
2 Cf. http://www.psych.lse.ac.uk/psr/
MOSCOVICI’s understanding of the relation between social and individual aspects in the
development of knowledge must be seen in the context of Hegelian views on the
interdependence between the universal and the particular (Marková 1996, p.179). For
HEGEL, the universal was co-developed in interaction with the particular (Hegel
1998/1807). Similarly, the collective and the individual in the social representations
theory are interdependent. In fact, the interdependence between individual and collective
aspects of knowledge distinguishes SRT from other theories of knowledge in social
psychology (Rose et al. 1995, p.3).
European theories in social psychology preceding MOSCOVICI’s work are based on
Kantian philosophy, which does not account for interdependence between the collective
and the individual. The collective is seen as a given social fact which is independent
of individuals (Marková 1996, p.179). Other approaches such as cognitive
psychology focus on information processing and operate similarly to the natural sciences by
disregarding any social influences on the individual’s mind (Duveen 2000, p.12). In a
similar vein, the theory of attitudes, which is sometimes considered a North
American counterpart of SRT, lost most of its social elements as a result of the
individualisation of social psychology (Farr 1993; Wagner et al. 1999). A theory which
takes account of collective elements is the theory of social cognition. The concept of
social schemata used within the theory is related to SRT but is too static (cf. BARTLETT
1995/1932, p.201) and lacks theoretical explanations of the schemata’s origins (Wagner
et al. 1999). That said, MOSCOVICI’s (1984b) attempt to integrate both individual and
social factors into a theory without losing its claim to be an explanatory device for
social phenomena is characteristic. Höijer (2011, p.4) described this characteristic as
follows: “giving the individual some room the theory of social representations avoids
social determinism and opens for processes of transformation. [...] the individual is
mainly embedded in and formed by social structures”.
Contrary to widely held beliefs that MOSCOVICI’s theory is derived from DURKHEIM’s
social psychology (Gillespie 2008; Glăveanu 2009; Ju and Gluck 2011; Vaast 2007),
DURKHEIM’s influence on the theory is limited. First, a similar approach aiming to
include social dimensions in psychology was already introduced by WUNDT’s
“Völkerpsychologie”3 (Wagner et al. 1999). Second, DURKHEIM’s concept of collective
representations was not suited to what MOSCOVICI considered a new era and a
new society (Duveen 2000, p.7-9). It was poorly equipped for the diversity of
representations as well as tensions and conflicts in the modern world (Rose et al. 1995,
p.3). For MOSCOVICI (1988, p.219), the focus, when analysing modern societies, had to
be on “innovation rather than tradition” and on “social life in the making rather than a
preestablished one”.
2.1.2 Concepts and Processes
To provide a detailed summary of the theory, its concepts and processes will be
elaborated upon in this section, starting with the concept of unfamiliarity.
Unfamiliarity is, in a sense, life-giving for the remaining theory elements as it triggers
the subsequent processes, which lead to the formation of social representations.
3 German for ‘folk psychology’
Unfamiliarity
For the process of forming or changing a social representation to come into being,
something disruptive must threaten the reality of a social group (Moscovici 2000/1984,
p.38). Disruptive in this context is a strange unfamiliar phenomenon, or a strange
unfamiliar characteristic of a familiar phenomenon. This human reaction to the
unknown is fundamental if one recalls that Aristotle had identified the very reason to do
philosophy in a similar vein: “human beings began to do philosophy [...] because they
wondered about the strange things right in front of them” (Met. 982b12)4. The
unfamiliar leaves a human being with a sense of “incompleteness and randomness” and
emphasises the “actuality of something absent” (Moscovici 2000/1984, p.38). The
necessity of dealing with the unfamiliar is even accompanied by fear (Moscovici 1988,
p.234). GERARD DUVEEN has given the most striking metaphor for the naturalness of
this process by comparing a mind which “abhors” an absence of meaning with the
nature which abhors a vacuum (Duveen 2000, p.8). To overcome unfamiliarity, to make
something unfamiliar familiar, is the purpose of any social representation (Moscovici
2000/1984, p.37).
Social Representation
The first aspect to be clarified about social representations is their relation to both
individuals and social groups. Social representations exist not only in people’s minds
but also in the culture being collectively realised (Rose et al. 1995). It means that once
created, a social representation continues to exist on its own, on a trans-individual level
(Farr 1993, p.194; Moscovici 1988, p.231). However, it does not imply that social
4 Citation is taken from the Stanford Encyclopedia of Philosophy (http://plato.stanford.edu/entries/aristotle/, accessed 2013.05.29)
representations are completely shared by everyone. Instead, they are partially distributed
as pieces of knowledge shared by some people yet possibly unknown to others
(Moscovici 1994, p. 168). Different social representations of the same social object
exist simultaneously across social groups and, what is more, within the same group
(Howarth 2006, p.68). Their lack of uniformity is amplified by the fact that distinct
social representations can be inconsistent by incorporating conflicting concepts
(Moscovici 1988, p.233). Similarities and differences in representations across and
within groups are indeed essential, as they enable communication. In a felicitous
manner, Gillespie (2008, p.379) noted that “the possibility of communication is born out
of similarity, while the necessity of communication is born out of difference”. The
permanent dialog between individuals is furthermore the only driving force behind the
continuous change in social representations (McKinlay and Potter 1987, p.473).
The power of social representations is in their mediating role between individuals and
the outside world already on the level of stimuli. Each stimulus is thus interpreted
according to the social representations an individual holds (cf. Fig. 1). This aspect
was already highlighted in the previous section when the outside world was identified as
something that enters social life only through social representations. Especially when a
social representation is not recognised as such, it is regarded by the individual as the
reality on which he acts (Moscovici 1985, p.91).
Given the above perspective, it is apparent why social representations are considered to
be mediating devices which regulate human conduct (Valsiner 2003, p.7.6). MOSCOVICI
(2000/1984, p.23) uses a strong image to highlight the prescriptive role of social
Another important aspect of the theory is the distinction between the reified world of
science and the consensual world of common sense (Bauer and Gaskell 1999, p.167). For
MOSCOVICI, social representations arise at the transition from science into common
sense (Farr 1993, p.195). In this world of common sense, people appropriate only a
fraction of information about the objects they encounter and they do so by forming social
representations of those objects through communication (Moscovici 1988, p.215). And
yet people are still able to successfully orient themselves based on this incomplete body
of representations - this is the paradox which is at the roots of the social representations
theory. Analogously to anthropology and child psychology which “trace the genealogy
of mythic thought to scientific thought”, the aim of social psychology and thus of SRT
is to explore the transition “from science to representations” (Moscovici 1988, p.217).
To portray this important aspect of the theory in a more vivid way, an extensive citation
from the work of BAUER AND GASKELL (1999) is valuable:
“Consider the following analogy: throwing a stone (genetic research) into a pond
(public) creates ripples. We are more interested in the ripples (representations of
genetics) and what they tell us about the invisible depths of the pond (local
concerns and sensitivities), than the stone itself (theories of genetics). Equally,
we assume that the stone throwers (geneticists and bio-technologists), while
starting the ripples, cannot control them. The very unpredictability of common
sense is the problematic of social representations theory”
(Bauer and Gaskell 1999, p.166-167)
In fact, most knowledge and ideas communicated by media and verbally among the
individuals have a scientific origin according to MOSCOVICI (1988, p.215). He
differentiates science from common sense on the basis of the notion of the consensual and
reified universes of knowledge. The reified universe is a domain of “rationality,
intellectual precision and independent judgement” which is neutral to individual values
(Marková 1996, p.182) while the consensual world is characterised by men being “the
measure of all things” (Moscovici 2000/1984, p.33). MOSCOVICI sees human cognition
as a product of an interrelation between two cognitive systems: While the first system
corresponds to the consensual universe and is based on associations and
discriminations, the function of the second lies in verification and control based on logical
rules (Moscovici 2008/1976 p.256).
The last characteristic of social representations to be mentioned is that they can be either
implicit or explicit. A social representation is explicit only if it becomes the subject of a
discussion itself or when communication is interpreted in terms of the underlying
representations (Gillespie 2008, p.377). Apart from these cases, social representations
are “buried under the layers of words and images” (Moscovici 1994, p.168).
As outlined previously, the very focus of SRT is not on describing what social
representations are, but on how they are formed. They are formed by two processes:
anchoring and objectification (Moscovici 2000/1984, p.41).
Anchoring
It is important to note that anchoring is more than simple naming and classifying of
the unfamiliar. According to MOSCOVICI (2000/1984), the main aim of the process can be
seen in allowing interpretation of characteristics associated with the unfamiliar. When
the unknown is compared to a prototype, it acquires characteristics of the prototype’s category
and is even adapted to fit within this category. Simultaneously, a positive or negative
relation with the unfamiliar is established since anchoring is never neutral. This happens
on a subconscious level resulting in the “priority of verdict over the trial” (Moscovici
2000/1984, p.44). Accordingly, a person wearing black glasses and using a white cane
is often instantaneously classified as a blind person without much effort being made to
determine the degree of the person’s visual impairment.
However, even after the unfamiliar phenomenon has initially been anchored, the process
does not stop. In fact, anchoring never stops (Höijer 2011, p.7). Anchors are thus an
integral part of thinking in general. As MOSCOVICI expresses it: there is no thought or
perception without anchor (Moscovici 2000/1984, p.48).
Objectification
However, some concepts cannot be directly ‘converted’ into images. For a concept such
as greed, the resulting image after objectification is another object which is created
using the objectified concept; a greedy man for example. MOSCOVICI found in his
famous La Psychanalyse study that psychoanalytical concepts such as the complex
were objectified as men with complexes instead of producing an image for the complex
itself (Moscovici 2000/1984, p.52). In the case of a taboo object or an object for which
no image can be found, different images are integrated into a ‘figurative nucleus’
(Moscovici 2000/1984, p.50).
2.1.3 Previous Applications and Ongoing Research
Alongside intensive research on the theory itself, the social representations
community has generated a considerable amount of empirical research over the last 50
years (László 1997). This section will provide a summary of theory alterations, an
overview of its heterogeneous applications and an outline of the main criticism
of the theory.
Theory alterations
Gillespie (2008) suggested an extension for the theory based on the developed concept
of alternative representations. He argued that the concept is necessary to analyse how
social groups account for other groups’ representations. To guide the research on social
representations, BAUER AND GASKELL (1999, 2008) have repeatedly tried to formulate a
progressive research program providing different models of social representations such
as the Toblerone Model (1999) and the Wind Rose Model (2008). HOWARTH (2006)
encouraged the development of a more critical social representations theory that is
aimed at tackling a broader spectrum of relevant problems within society. Addressing
the challenge of predictability, Valsiner (2003) derived a theory of enablement from the
theory of social representations. The new theory aims at a better explanation of the
transition from present to future. Having discovered a growing body of sub-theories
within SRT, JODELET (2008) advocated a dialogue between them to increase the
explanatory power of the combined theory.
Theory applications
The original work of MOSCOVICI on SRT included the social representation analysis of
psychoanalysis in French society (Moscovici 2008/1961,1976). Following MOSCOVICI’s
initial work, a number of studies examined the public understanding of science and
technology (Bauer and Gaskell 1999; Farr 1996). Another body of research was
dedicated to representations of physical and mental health issues including works by
JODELET (1991) on madness, JOFFE (2009) on AIDS, MOLONEY AND WALKER (2002) on
transplants and WAGNER ET AL. (1995) on conception. Typical research topics for
social representations also include studies of gender (Duveen 1996; Psaltis 2012),
disasters (Gervais 1997) and human rights (Doise et al. 1998).
Today, applications of the theory, apart from the traditional areas, range from analysing
social representations of food (Backstrom et al. 2003) to the perception of wolves in
Scandinavia (Figari and Skogen 2011) and to social representations of burnout in IT
professions (Pawlowski et al. 2007). Especially in the latter area of information systems
(IS) and IT there is a growing body of research based on the theory. Alongside the
study on burnout in IT, VAAST (2007) introduced one of the first studies in the IS field
by investigating the social representation of IS-security within the healthcare domain. It
was followed by an introduction of the SRT perspective for studying socio-cognitive
processes during IS-implementation (Gal and Berente 2008). More recent studies within
information systems have adopted a social representations research perspective for
studying online privacy (Oetzel 2011), information relevance (Ju and Gluck 2011) and
social representation of social media in organisations (Kaganer 2010).
The social representations theory is known for the multiplicity of research methods one
can apply to study social representations (Moscovici 2000/1984). In an overview of
several studies which have adopted the theory, WAGNER ET AL. (1999) have identified
the use of methods including ethnography, interviews, focus-groups, content analysis of
media, statistical analysis of word associations, questionnaires and experiments. While
the majority of the techniques are qualitative, there is an increasing use of quantitative
methods for supporting analysis of social representations (Doise et al. 1993; Ju and
Gluck 2011; Breakwell and Canter 1993). Furthermore, the use of triangulation
techniques, where quantitative methods support the qualitative analysis, is considered
valuable (Gervais 1997; László 1997; Wagner et al. 1999).
Theory criticism
2000, p.419). According to the critics, the theory is lacking clear definitions (Jahoda
1988), is conceptually incoherent (McKinlay and Potter 1987), flawed in its major
concepts (Bangerter 1995) and confused (Billig 1986).
Some of the concerns are repeatedly observed across the critical elaborations of the
theory. The explanatory power of SRT, which is based on the interdependence between
individual and collective processes, is difficult to make use of in the context of the
empirical research (László 1997, p.156). An example of the high complexity of this
interdependence is the constituting role of social representations and individual
capabilities to evade the imposed constraints5.
Another recurring point of criticism is directed towards the sharp distinction between
the reified world of science and consensual world of common sense (Bangerter 1995;
Bauer and Gaskell 1999, 2008; Duveen 1990; Howarth 2006; Potter and Edwards
1999). It is argued that there is a vague border between those worlds, if any. Social
representation can penetrate science as much as science enters the world of common
sense through social representations. Some voices within academia even call to abandon
the very distinction between science and non-science: “Go to a laboratory, any lab will
do, and hang around [...] do you see anything beyond ordinary discourse and situated
action?” (Lynch 1997/1993).
Despite all the criticism, the theory is successfully spreading across researchers and
different sciences – a fact which is acknowledged even by its most vehement critics
such as JAHODA (1988) and POTTER AND EDWARDS (1999).
The second pillar of this thesis is the Wikipedia platform. An overview of the
project and the associated academic research is necessary to establish a connection
between the social representations theory and the Wikipedia project.
Wikipedia is one of the best known websites ever created. Its popularity is reflected in
the internet traffic in which the online encyclopaedia is ranked as the 6th most visited
web page in the world6. The definition of Wikipedia can be reduced to the definition of
its properties: free, collaboratively edited, consensus-based and multilingual. Wikipedia
5
Cf. section 2.1.2
6
Wikipedia is 6th most visited site according to Alexa.com http://alexa.com/siteinfo/wikipedia.org,
accessed 26.05.2013
is free due to the use of the GNU Free Documentation License7 (“Wikipedia License
Information”). Collaborative editing is achieved by allowing any individual with
internet access to contribute to any non-restricted8 article. Regarding the editorial
policy, Wikipedia has a unique position among encyclopaedias by preferring consensus
over credentials in the process of creating articles9 (Yasseri et al. 2012, p.2). The mix of
the aforementioned characteristics is available in 284 languages currently supported by
Wikipedia.
The history of Wikipedia goes back to January 2001 when the encyclopaedia was
created by Jimmy Wales and Larry Sanger with the aim that any internet user can edit
any article of the encyclopaedia at any time (Olleros 2008). Since then it has grown
rapidly in both the number of articles and the number of participating users. According to the official
Wikipedia statistics, the encyclopaedia has a total of over 4 million articles and over 18
million users, of whom around 130 thousand are considered to be active11.
Consequently, even within academia there are voices claiming Wikipedia to be
unquestionably the number one reference in practice (Yasseri et al. 2012). Others
claim Wikipedia to be one of the most complex and relevant data sets humanity has ever
produced (Martin 2011).
Apart from being a starting point for intellectual curiosities, Wikipedia is also known as
a platform providing data for various research projects. Its popularity within academia is
rooted in the fact that every article revision and every discussion post ever made are
saved and available on the platform. This makes Wikipedia a suitable subject of a broad
research spectrum including epistemological studies and studies regarding collaborative
processes (Martin 2011). In the following, research related to the Wikipedia project is
introduced. The section is structured according to different aspects of the platform,
which are researched by academia.
7
http://www.gnu.org/copyleft/fdl.html
8
Some Wikipedia articles are limited to contributions only by users with certain rights. Cf.
http://en.wikipedia.org/wiki/Wikipedia:User_rights for details
9
http://en.wikipedia.org/wiki/Wikipedia:Consensus
10
http://wikimediafoundation.org/wiki/Our_projects
11
http://en.wikipedia.org/wiki/Wikipedia:Statistics as for 12.04.2013
Distributional aspects
One of the major research areas within the Wikipedia research community is dedicated
to questions regarding weaknesses of the platform. These questions reflect the waves of
criticism Wikipedia has experienced over the years. The main criticism is directed
towards the quality and reliability of information on Wikipedia (Anderka
and Stein 2012; Magnus 2009; Martin 2011; Stross 2006). Among the most discussed
issues in this area are concerns regarding vandalism (Potthast et al. 2008) and the lack
of credibility in the process of creating encyclopaedia articles on Wikipedia
(Kubiszewski et al. 2011).
While the aforementioned works draw attention to the question of Wikipedia’s general
reliability, other studies call for these concerns to be tempered. BARTON'S (2005)
analysis has identified the long-term potential of the Wikipedia platform, provided its democratic
and decentralised nature is preserved. In this context, Wikipedia is seen as a
demonstration of the potential of information systems to create more emancipatory forms
of communication (Hansen et al. 2009). This perspective appears even more optimistic
given that the current quality of Wikipedia articles has been found to be comparable with that of
the proprietary Britannica encyclopaedia (Giles 2005). OLLEROS (2008) goes further in
defending the ‘Wikipedia principle’ by questioning the applicability of the traditional
quality criteria to the platform. According to OLLEROS, the quality dimensions for
encyclopaedias undergo changes as a result of disruptive influence that collaborative
projects such as Wikipedia are exercising. An example of such a reorientation in
assessing the quality of knowledge in general is an essay by ROSENZWEIG (2006), in which
the way of producing historical knowledge on Wikipedia is contrasted with the traditional,
rather individualistic, approaches of historians. Furthermore, both WF and academia
continue proposing and introducing different instruments to improve the overall quality
of Wikipedia such as the reputation system suggested by KORSGAARD AND JENSEN
(2009).
Semantic aspects
Researchers concerned with the development of the Semantic Web build upon the limits of
Wikipedia’s search capabilities and the inconsistencies caused by duplication of
information on different pages (Morsey et al. 2012). Their efforts are directed towards a
better utilisation of information contained in Wikipedia. The DBpedia project is a
successful example of these efforts. BIZER ET AL. (2009) introduce DBpedia as an
extraction mechanism to convert structural information contained in Wikipedia
infoboxes into Linked Data14. In related research, SIORPAES AND BACHLECHNER (2006)
address the challenge of creating and maintaining ontologies for knowledge systems.
They advocate using the millions of Uniform Resource Identifiers (URI)15 available on
Wikipedia to improve knowledge management in organisations by using socially
maintained consensual Wikipedia vocabularies.
Content analysis
12
Users can put pages into their watchlist in order to be updated on any changes those pages undergo
13
Cf. http://en.wikipedia.org/wiki/Wikipedia:Edit_warring for details
14
Cf. www.w3.org/DesignIssues/LinkedData.html for Tim Berners-Lee's essay on Linked Data
15
Cf. section 3.1 for Wikipedia structure
WILKINSON AND HUBERMAN (2007) discovered a high correlation between the number
of edits to an article and its quality. A correlation between the popularity of an article, indicated by
a high number of views, and its editing intensity was identified by RATKIEWICZ ET AL.
(2010). Consequently, one could assume a correlation between the quality of a
collaborative work and the popularity of its content. Interestingly, another correlation
was found between the quality of an article and its size16. This speaks in favour of a
rather smooth process of article evolution in which additional edits lead to
a constructive extension and correction of the existing text (Wilkinson and Huberman
2007). According to YASSERI ET AL. (2012), the majority of articles in English
Wikipedia follow this smooth evolution process.
While the aforementioned studies are concerned with either questions related to the WP
project as a phenomenon or with explanations of its properties such as the quality of the
articles, another branch of research comes from the epistemological direction. In light of
the thesis’ focus, this perspective on Wikipedia as a knowledge system and on the
evolution of this knowledge is more relevant. Following SUCHECKI ET AL. (2012) it is
possible to consider Wikipedia as a “proxy for knowledge in general”. The emphasis on
the temporality of knowledge is characteristic of this type of research. Accordingly,
KALTENBRUNNER AND LANIADO (2012a) analysed changes in talk pages over time in
order to understand patterns in the collaborative process associated with content
creation. In this study, a measure of discussion growth is used as an instrument for
detecting controversies and assessing discussion maturity. RATKIEWICZ ET AL. (2010)
analysed measurable effects of external events on the related Wikipedia articles. They
identified exogenous factors, such as an Oscar nomination for an actor, to have
influence on the content of corresponding articles and on the process behind the creation
of this content.
16
Blumenstock, J. E. (2008), “Size matters: Word count as a measure of quality on wikipedia“, in
Proceedings of the 17th international conference on World Wide Web (WWW) 2008, pp. 1095–
1096.
The conducted literature review reveals that although Wikipedia is seen as a proxy for
knowledge in general (Suchecki et al. 2012), there have been no attempts to study the
genesis of this knowledge. In terms of the social representation theory, no studies
explore the genealogy of social representations circulating on Wikipedia. The latter is
striking since core aspects, namely temporality and a strong emphasis on communication,
are characteristic of both the Wikipedia platform and the social representations theory. The
transparency of the collaboration process on Wikipedia makes an application of the SRT
framework to the encyclopaedia platform especially appealing.
The potential benefits of the method for studying social representations on Wikipedia are
evident when recapitulating the research problems outlined in both sections of the
literature review. On the one hand, a branch of Wikipedia research is concerned with
patterns in collaboration and consensus making (Kaltenbrunner and Laniado 2012a) -
processes which are difficult to study without a proper theoretical framework.
Considering that the application of qualitative techniques on Wikipedia is rare, the
explanatory power of current analysis conducted on Wikipedia remains limited. On the
other hand, applications of social representation theory are often limited by the lack of
data required to analyse the genesis of social representations (Vaast 2007). The theory
would furthermore directly benefit from a large scale study of anchoring and
objectification processes on Wikipedia since those processes are still barely understood
and rarely investigated (Duveen and De Rosa 1992, p.106). Critics of the theory
challenge the very existence of anchoring and objectification by arguing that tests for
existence of those hypothetical mechanisms are virtually impossible (McKinlay and
Potter 1987). A successful application of the method proposed in this thesis can provide
evidence for the existence of anchoring and objectification processes within the social
representations theory.
In summary, researchers within the Wikipedia community work with unique historical
data regarding knowledge genesis but lack a theory to explain it, while the research on
social representations theory provides an excellent explanatory device for studying
genealogy of knowledge but lacks the required data to identify any patterns. The given
thesis is a step towards overcoming this gap by introducing a method for studying the
genesis and evolution of social representations on the Wikipedia platform. In a world
in which social media plays an increasing role in communication and collaboration
within and across social groups, both Wikipedia and social representation research
communities will benefit from this ‘marriage’.
In the following, the research methodology employed by the thesis will be introduced.
The aim of this section is twofold. First, the concepts and processes of social
representation theory need to be mapped to Wikipedia. Second, the quantitative analysis
on the basis of the established link between SRT and Wikipedia must be integrated into
a case study. Case studies are required to demonstrate and to verify the applicability of
the method. Furthermore, they should illustrate the necessity of a qualitative analysis
which supplements the quantitative data that can be derived from the historical data on
the Wikipedia platform.
The section begins with an analysis of the Wikipedia structure followed by a discussion
regarding the suitability of Wikipedia for studying social representations. The link
between Wikipedia and SRT elements is established in section 3.3, while section 3.4
presents the structure of the case studies. Implications of the required analysis
techniques for the case studies are discussed in section 3.5.
Page
When considering technical details of Wikipedia from the user’s perspective, its central
element is the page. Every Wikipedia page is an instance with a Uniform Resource
Locator (URL) leading to it. An example of a page is the article “Thesis” with the
corresponding URL: http://en.wikipedia.org/wiki/Thesis.
Not every Wikipedia page is an encyclopaedia article. There are different page types
represented by so-called namespaces – sets of pages containing a special prefix that is
recognised by the MediaWiki software17. It is important to note that every instance of
MediaWiki software such as the English Wikipedia can have its own namespaces. Table
1 indicates the namespaces of the English Wikipedia.
17
http://en.wikipedia.org/wiki/Wikipedia:Namespace
As illustrated by Table 1, namespaces are divided into basic and talk namespaces. One
of the central collaboration aspects of Wikipedia is this distinction according to which
every page in any of the basic namespaces has a corresponding talk page in which users
can discuss the content of the actual page. While the ‘Alice’ encyclopaedia article can
be discussed on the ‘Talk:Alice’ page, the page for discussing the user ‘Alice’ is
‘User_talk:Alice’. The same logic applies to all other basic / talk namespace pairs.
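The pairing of basic and talk namespaces can be illustrated with a minimal sketch. The function name talk_page_title and the shortened namespace list are assumptions made for illustration; the actual set of namespaces is defined by the respective MediaWiki installation.

```python
# Assumption: only a subset of the basic namespaces of the English Wikipedia is listed.
BASIC_NAMESPACES = {"User", "Wikipedia", "File", "Template", "Help", "Category"}


def talk_page_title(title: str) -> str:
    """Return the title of the talk page that belongs to a basic-namespace page."""
    prefix, separator, rest = title.partition(":")
    if separator and prefix in BASIC_NAMESPACES:
        # A recognised prefix such as 'User' is paired with 'User_talk'.
        return f"{prefix}_talk:{rest}"
    # No recognised prefix: the page is an article in the main namespace.
    return f"Talk:{title}"
```

A title that contains a colon without a recognised prefix is treated as an article, so the whole title is kept after the 'Talk:' prefix.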
The Wikipedia platform allows the content of entire pages to be used within other
pages. This is achieved by using the so-called templates. Templates reduce formatting
Every Wikipedia article is assigned to at least one category. The Wikipedia category
structure is hierarchical, with 26 main categories18. Each category includes a list of the
pages it contains, a list of its subcategories and a list of the supercategories that this
category is a part of. Page categories are displayed at the bottom of every page in the
main namespace. This allows for navigation between thematically related encyclopaedia
articles. Navigation between Wikipedia pages is also enabled by internal links called
wikilinks, which appear within the content of Wikipedia pages and point to other pages.
It is important to note that for encyclopaedia content, only links to articles relevant for
the understanding of the phenomenon are welcomed by the Wikipedia community19.
Any change to a Wikipedia page results in a new version of this page being permanently
stored on the Wikipedia servers as a revision which is made accessible for any user at
any time. Additionally, Wikipedia stores meta-data for the new revision. This includes
the timestamp, a user description of the change, an indication of the change in the page
size, a user indication of whether the change is minor or major and information about
the user who made the change. The latter includes the name of the user, or IP address in
the case of an anonymous user, and an indication of whether or not the user is a bot.
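The revision meta-data listed above can be sketched as a small record type. The class and field names are hypothetical and do not reproduce the field names used by the MediaWiki software; the check for anonymous users simply assumes that a user name which parses as an IP address belongs to an anonymous user.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    """Meta-data stored for one revision (hypothetical field names)."""
    timestamp: datetime   # time at which the change was saved
    comment: str          # user description of the change
    size_delta: int       # change in page size
    minor: bool           # user indication of a minor change
    user: str             # user name, or IP address for anonymous users
    is_bot: bool          # whether the user is a bot

    @property
    def is_anonymous(self) -> bool:
        """Assumption: a user name that parses as an IP address is anonymous."""
        try:
            ipaddress.ip_address(self.user)
            return True
        except ValueError:
            return False
```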
User
The user concept on Wikipedia requires elaboration. Wikipedia users are generally
divided into anonymous and registered users. Anonymous users are known by their IP
address, and have minimal rights such as permission to read and to edit unrestricted
articles. To obtain the rights to perform additional actions, a registered user must have
the required user access level20. Advanced user access levels allow blocking other users,
moving or renaming articles and applying restrictions to prevent pages from being
edited.
Registered users are divided into standard users, higher-privileged users such as
administrators, bureaucrats and stewards, and bot users. The privileged users possess
18
http://en.wikipedia.org/wiki/Category:Main_topic_classifications
19
http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking
20
http://en.wikipedia.org/wiki/Wikipedia:User_access_levels
additional access rights as mentioned before, while bot users are a group of non-human
automata in the form of programs, scripts and macros. Bot users perform automated
tasks for the purpose of both maintaining Wikipedia quality and gathering statistics.
Additional remarks regarding the suitability of the Wikipedia platform for studying
social representations are required before establishing the link between the theory and
Wikipedia elements in the next section.
The claim by MARKOVÁ (1996, p.183) that “society forms its opinions by consensus
amongst members of the public who are all equal” can be applied to both the process of
forming representations and the process of creating Wikipedia articles. Although she
refers to the former, the similarity in the temporal consensus-making within SRT and on
Wikipedia is striking. This becomes clearer when the consensus-making on Wikipedia
is compared to the process of forming social representations as described by
MOSCOVICI. To illustrate the comparison, an extensive citation is employed:
When the aforementioned process is projected onto the Wikipedia platform, its users
form the collective decision committee. They vote and express opinions by changing the
Wikipedia content and participating in discussions on the corresponding talk pages.
They know how others have “voted” by accessing the revision history and by observing
the previous discussions regarding the topic of interest. In doing so, Wikipedia users
build upon the efforts of others. Consequently, the natural way of reaching temporal
consensus, as observed on the Wikipedia platform, is exactly the type of consensus
MOSCOVICI refers to in his words about forming social representations. In fact,
MOSCOVICI’s citation might well be used as an official description of the collaboration
process on Wikipedia.
As outlined in the literature review, the main weaknesses of Wikipedia are identified
around the quality of the encyclopaedia content. Those weaknesses are however less
relevant for answering the research question. This is due to the thesis focusing on the
process of forming social representations, rather than on the ‘correctness’ of this
content. The very idea of ‘correctness’ itself is abandoned in social representation
theory as was explained in section 2.1.2. The dynamic nature of Wikipedia and the
SRT, together with the growing number of social representations circulating in the digital
world, make Wikipedia an appropriate means for investigating the concept of social
representations.
However, several aspects of the theory do not immediately match with the collaborative
processes on Wikipedia. First, the social representations theory assumes plurality of
representations within a single social group (Moscovici 1988, p.219). The Wikipedia
platform does not provide direct means for tracing different social representations of the
same phenomenon that exist simultaneously within a single social group. Rather, there
is only ever one representation at any given point in time that can be analysed. This is
one of the Wikipedia specifics. Consequently, possible effects this aspect might have on
the evolution of social representations must be discussed. Another potentially
problematic aspect is the fact that all representations on Wikipedia seem to be of a
rather formalised nature: There are continuous efforts towards improving the quality of
Wikipedia by making it more neutral, objective and trustworthy. An example for such
efforts is the provision of extensive information regarding the sources from which the
content has been acquired. In MOSCOVICI’s terminology one would say that they are
of a ‘scientific’ nature. The above implies that the social representations found on
Wikipedia are not situated in the consensual world as assumed by the original
formulation of SRT. Instead, they tend to be “indifferent to individuality and [to] lack
identity” – a characteristic for the opposite reified world (Moscovici 2000/1984, p.32).
This incoherence seems to support the criticism that the theory has experienced
regarding the distinction between the reified world of science and the consensual world
of common sense. Consequently, the work supports the view that the interrelation
between these two worlds is more complex than originally formulated by MOSCOVICI.
The theory of social representations was presented in section 2.1.2, while the structure
of the Wikipedia platform was discussed in section 3.1. Section 3.2 was concerned with
the suitability of the Wikipedia platform to study social representations. It is now
possible to address the research question of how social representations on Wikipedia
can be studied.
To answer the research question it is necessary to break it down into several sub
questions as illustrated in Fig. 2. The first part of the question (Q1.1) will be addressed
in the following, while the second part (Q1.2) comprises the content of the subsequent
section.
In order to study the evolution of social representations on Wikipedia, the first step must
be the clarification of what is understood by ‘social representation’ on the platform
(Q1.1.1), as well as what the equivalent of an anchor is on Wikipedia (Q1.1.2). The
answers to both questions establish a theoretical link between concepts within the social
representations theory and Wikipedia elements.
representing the specific Wikipedia resources. They are used to organise Wikipedia’s
content, and are therefore not considered to contain social representations.
Unlike with the social representation concept, the correspondence between the anchor
concept and Wikipedia elements is less apparent. There are different candidates among
Wikipedia elements that can be interpreted as anchors. In order to identify elements that
are meaningful and coherent with the theory, a number of different articles were
analysed in the initial phase of the research project. The most natural ‘candidate’ for an
anchor role within an article is an internal link to another article. The theory requires
that anchors are social representations themselves (Moscovici 2000/1984, p.42). An
internal link to an article in the main namespace satisfies this requirement by
definition. However, not every internal link contained in an article plays the role of an
anchor: There are internal links that are not related to the social representation under
study. It was found that different sections of the article contain different numbers of
relevant anchors. After the examination of multiple Wikipedia articles, the anchors from
the first section of the article, known as the ‘definition part’21, were identified as the
most relevant. The reason is that the process of defining internal links
directly corresponds to the social representations theory. This process aims at
identifying concepts relevant to understanding the phenomenon of interest. Such
phenomena are linked by the Wikipedia users to the corresponding articles. Figure 3
shows, by way of example, potential anchors in the definition section of the “Thesis” article.
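A minimal sketch of how candidate anchors could be collected from the definition part of an article is given below. It assumes that the article is available as raw wikitext, that the definition part ends at the first section heading, and that link targets containing a colon belong to other namespaces; the regular expressions are simplifications and the function name is hypothetical.

```python
import re
from typing import List

# Simplified patterns (assumptions): a heading line such as '== History ==' and
# a wikilink such as '[[Target]]', '[[Target|label]]' or '[[Target#Section|label]]'.
HEADING = re.compile(r"^==.*==\s*$", re.MULTILINE)
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def definition_anchors(wikitext: str) -> List[str]:
    """Return the distinct link targets found before the first section heading."""
    heading = HEADING.search(wikitext)
    definition = wikitext[:heading.start()] if heading else wikitext
    anchors: List[str] = []
    for target in WIKILINK.findall(definition):
        target = target.strip()
        if ":" in target:
            # Assumption: a colon marks a non-article namespace such as 'File:'.
            # Articles whose titles contain a colon are lost by this shortcut.
            continue
        if target and target not in anchors:
            anchors.append(target)
    return anchors
```

Links inside templates or nested constructs are not handled, so the result is a list of candidates that still has to be checked qualitatively.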
21
The definition is the first part of the article before the table of contents or any other article sections.
Further candidates for the anchor role are the categories the article is assigned to. As
outlined in section 3.1, every Wikipedia article in the main namespace is assigned to
at least one category. However, not every category has a corresponding encyclopaedia
article in the main namespace. The sole function of most categories is to
group articles. For example, the article “IPad” is assigned to the
category “Products introduced in 2010”. The latter has no corresponding article in the
main namespace meaning that there is no encyclopaedia article for “Products introduced
in 2010”. This contradicts the requirement of an anchor to be a social representation
itself. Furthermore, analysing the categories of an article has proven itself to be a
tedious task. The ‘category’ elements on Wikipedia are hence disregarded as possible
anchors due to both theoretical and technical challenges.
In addition to internal links in the definition section and categories, any word in the
plain text of the article can potentially be an anchor. It is, however, difficult to identify
which of them are possible anchors and which are not. Additionally, there are reasons
for excluding unlinked components of the plain text from the list of potential anchors.
The process of linking concepts as such is an important factor when it comes to
interpretation of an element as an anchor. Internal linking on Wikipedia is a purposeful
act by the social group that indicates an intended reference to a related concept. Such
intention emphasises the importance of the referenced social representation when it
comes to giving meaning to the social representation under study.
Similarly to the link between the concepts of social representation theory and Wikipedia
elements, a connection between SRT processes and those observable on Wikipedia must
be established in this section. A match between theoretical processes within the theory
and those observed on Wikipedia is required to interpret changes in the social
representation on Wikipedia in accordance with MOSCOVICI’s theory. This is the second
part of the research question and can be broken down into sub questions as shown in
Fig. 3. Answers to the sub questions will provide a foundation for recognising patterns
in the evolution of social representations and in the collaboration processes
accompanying this evolution.
Regarding social representation evolution, there are two different groups of aspects to
be considered when searching for means of studying these processes on Wikipedia. The
first group consists of aspects regarding the change in the representation while the
second group focuses on collaboration dynamics within the social group. Figure 3
illustrates the distinction.
Similarity of anchor states (2) is another important aspect, as anchors are not considered
to possess equal strength or significance for the corresponding social representation.
Even after the introduction of many new anchors, the similarity between anchors from
two different periods can stay on the same level due to the relative weakness of the new
anchors. To measure the similarity of anchoring in two different time periods, the
cumulative time during which the anchors were present in those periods of time can be
compared. A higher similarity value should indicate that the time in which the strongest
anchors were present in one period is comparable with the time the same anchors were
present in the other period of time. It is important to note that to assess similarity, a pure
quantitative analysis as described above is not sufficient. Only a qualitative analysis of
anchors and their context can reveal whether they are indeed similar or not. Quantitative
data can only indicate changes that will then trigger the subsequent qualitative analysis.
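One possible way to compare the cumulative presence times of anchors in two periods is a cosine similarity over the presence-time values. The text above does not fix a formula, so the choice of cosine similarity and the function name are assumptions made purely for illustration.

```python
import math
from typing import Dict


def anchoring_similarity(period_a: Dict[str, float], period_b: Dict[str, float]) -> float:
    """Cosine similarity of two mappings from anchor to cumulative presence time."""
    dot = sum(time * period_b.get(anchor, 0.0) for anchor, time in period_a.items())
    norm_a = math.sqrt(sum(time * time for time in period_a.values()))
    norm_b = math.sqrt(sum(time * time for time in period_b.values()))
    if norm_a == 0 or norm_b == 0:
        # A period without any anchors cannot be compared meaningfully.
        return 0.0
    return dot / (norm_a * norm_b)
```

With this measure, a new anchor that was present for only a short time changes the value very little, which matches the observation that weak new anchors leave the similarity on roughly the same level.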
Identifying new/obsolete anchors (1) and measuring the similarity of anchoring (2) does
not however completely describe the anchoring evolution process: Data about the
stability of anchoring (3) is also of importance. It is possible that anchoring in two
different time periods is very similar according to the above measures, and yet none of
the anchors stays in the article for a longer time. Therefore, it is required to assess the
average time that anchors are present in the article. A value that is considerably less
than the timeframe would indicate that the anchoring in this period is unstable because
anchors are on average present only for a short amount of time.
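The stability indication can be sketched as the average presence time of the anchors divided by the length of the timeframe. Expressing the result as a ratio between 0 and 1 is an assumption made for illustration, as is the function name.

```python
from typing import Dict


def anchoring_stability(presence_days: Dict[str, float], timeframe_days: float) -> float:
    """Average anchor presence time as a fraction of the timeframe."""
    if not presence_days or timeframe_days <= 0:
        return 0.0
    average = sum(presence_days.values()) / len(presence_days)
    return average / timeframe_days
```

A value close to 1 would mean that the anchors stayed in the article for almost the whole timeframe, while a value close to 0 would point to unstable anchoring.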
The last aspect in the representation evolution is the indication of the objectification
process (4). According to the SRT, objectification is a process through which the
unfamiliar becomes familiar. Consequently, indications for objectification can be found
by looking at other social representations using the corresponding social representation
as an anchor. This is because only something already familiarised to a certain degree
can be used as an anchor. Thus, social representations of more familiar phenomena are
used more often to anchor other phenomena than those that have not yet reached a
higher degree of objectification.
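The objectification indication can be sketched as a count of other articles that use the article under study as an anchor. The input structure, a mapping from article title to the set of anchors found in its definition part, and the function name are hypothetical.

```python
from typing import Dict, Set


def inbound_anchor_count(anchors_by_article: Dict[str, Set[str]], target: str) -> int:
    """Count the articles, other than the target itself, that anchor to the target."""
    return sum(
        1
        for article, anchors in anchors_by_article.items()
        if article != target and target in anchors
    )
```

Comparing this count across time periods would give a first quantitative hint of increasing familiarity, which still requires qualitative interpretation.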
The intensity of the collaboration process that is associated with a social representation
under study (5) is derived by comparing the number of article edits and editors in
different time periods. Given that each encyclopaedia article on Wikipedia is interpreted
as a social representation, this is the most natural and the only available means of
assessing changes in the collaboration process. Correspondingly, an increase in the
number of article edits in a period of time indicates higher collaboration intensity.
Similarly, the number of participating editors in different time periods is another
possible indication of change in the intensity of the collaboration process. Disagreement
among participants (6) can be measured by looking at the relative number of anchor
Case studies in this work aim to demonstrate and verify the applicability of the method
developed to study social representations on Wikipedia. Furthermore, they should
illustrate the use of quantitative data that is available on Wikipedia to support the
analysis of the social representation under study.
It must be noted that not every Wikipedia article is suitable for a case study. The article
should satisfy the following criteria. First, it should have a high number of edits and
distinct editors in order to represent the social aspect of representations. Second, its
historical data should cover a longer period of time to provide enough data for pattern
recognition. The phenomenon described by the Wikipedia article, however, should not
be older than the encyclopaedia itself. Otherwise, the initial phase in the process of
forming a social representation cannot be observed. Finally, the phenomenon should
have a certain level of ambiguity or novelty which is required to demonstrate the
complexity of the corresponding collaboration processes.
Following the concepts of SRT, all case studies should feature the following three parts.
The first part will introduce the social representation under study to provide a context
for the subsequent analysis. The second part will focus on the evolution of the
corresponding social representation in terms of anchoring. Integral parts of the
anchoring analysis are described in section 3.4.1. The objectification process of a social
representation under study will be analysed in the third part of the case study.
Correspondingly, section 3.4.2 provides details about how the analysis of the
objectification process is conducted.
Changing anchors in a Wikipedia article over time provide information that is required
to interpret the direction of the change in the social representation under study. In order
to identify patterns in this process, it is useful to group anchors into categories that
reflect distinct aspects of the representation. The categorisation requires a qualitative
analysis of the anchors. Wikipedia’s historical data, which includes all relevant internal
links (cf. Section 3.3.2), will provide the foundation for this qualitative analysis.
By considering the overall time-frame used for the case study, the “stability” of the
anchoring is examined as the first step of the anchoring analysis. The analysis includes
interpretation of data about the stability and similarity of anchoring as described in
section 3.3.2. Interpretation of this data provides first indications for the intensity of the
change that a particular social representation is undergoing without going into details.
After the interpretation of indications provided by the quantitative data about internal
links, a detailed qualitative anchor analysis should be conducted on the level of single
anchors. For this purpose, monthly and yearly anchor data are used. Out of all anchors
in the given time period only the strongest anchors are considered. Anchors that fall
under the threshold, which is set individually for each case study, are not accounted for
in the analysis. In order to reveal the meaning of each anchor, it is necessary to observe
anchors in the context of the historical revisions they are a part of. The exploration of
different historical revisions is a foundation for coding anchors and categorising them
into different groups. The coding is, in turn, a prerequisite for the construction of the
narrative that describes the evolution of the anchoring for the social representation
analysed in a case study. Identified anchor categories help to trace significant changes in
the social representation, in contrast to changes given by anchor changes within
categories. In order to derive meaningful categories from the list of generated anchors,
the steps illustrated in Fig. 5 are required.
Although anchors identified in the definition section of the article are suitable
candidates for representing first order codes as defined by DACIN ET AL. (2010, p.16),
redundant or unrelated anchors may exist among them. In the first step (1), the context
of every anchor is uncovered. During the analysis of the historical revisions, all
text passages in which the anchor appears are examined and documented, and the type of
relationship between the anchor and the phenomenon is formulated. For example, the relationship of the
anchor ‘iPhone’ to the social representation of the Apple iPad can be “a device from
which iPad inherits”. Depending on the context, it can also be “a device iPad is opposed
to”. In case there are different contexts among historical revisions for the same anchor,
all different relationship types are formulated.
Given the context for every anchor, in the form of relevant citations and relationship
types the anchors form with the phenomenon under study, it is possible to merge
redundant and to delete unrelated anchors in step (2). The reasons for an anchor to be
considered unrelated are manifold. For example, the definition section of an article can
contain a reference to a pdf document in which ‘pdf’ is linked to the article about the pdf
file format. This article is not an anchor for the phenomenon. Redundancy, on the other
hand, is always due to either the renaming of an article during the timeframe of the
analysis or to the existence of multiple articles that have different names, yet describe
the same phenomenon22. In accordance with MILLS ET AL. (2006), the resulting list of
relevant anchors corresponds to open codes and is subject to theoretical coding, which
is done by applying steps (3) and (4) in several iterations. The aim is to derive a
minimal set of mutually exclusive and collectively exhaustive categories. As long as the
current set of categories does not satisfy the condition, both categorisation steps are
applied to each anchor that is not yet assigned to a category.
22
This is a rare case resulting from the quality issue on Wikipedia where two articles which describe the
same phenomenon coexist for a certain amount of time before they are merged.
In step (3), an anchor is assigned to a category. The anchor will either be assigned to an
existing category, or to a new category in the case that a category corresponding to the
relationship type is yet to be defined.
Step (4) is required to account for the addition of a new category. The function of the step is
twofold: first, to verify whether or not there are anchors that can be assigned to
the new category, and second, to reconsider the existing categories in case they
overlap with the newly added category in the sense that some of the anchors can be
assigned to either of them. In case of a conflict, the categories are redefined accordingly
and all their anchors are assigned to one of the unambiguous categories.
It is important to note that the anchoring analysis includes more than the categorisation
of anchors. The identified categories are subsequently used to divide the overall analysis
time frame for the evolution of the corresponding social representation on Wikipedia
into distinct phases. Each phase should represent a significant change in the social
representation under study, which is given by increasing or decreasing importance
among the identified anchor categories in the corresponding phase.
The last part of every case study will shed light on the evolution of the social
representation under study regarding the objectification process. As identified in section
3.3.2, the level of objectification can be assessed by identification of the Wikipedia
articles that contain a reference to the article under study. Out of all references, only the
references to articles in the main namespaces are considered. Furthermore, indirect
references resulting from articles using templates containing the analysed article are
disregarded. The remaining articles are thus the encyclopaedia articles with a reference
to the article under study in the text or within categories.
Using the results from anchor analysis, objectification evolution will be compared with
the anchoring dynamics. In this context, the aim is to discover how changes in
anchoring influence the objectification process. This comparison can potentially provide
an additional perspective on the evolution process of the social representation
under study.
look into the causes of those changes. Second, the qualitative analysis of anchors
requires lists of anchors for different time periods which must be isolated from the
revisions of the article. Third, analyses such as the identification of articles pointing at
the social representation of interest are impossible to conduct manually, since this task
requires analysing all encyclopaedia articles on Wikipedia. Lastly, quantitative data
are required for identifying common patterns in the evolution of social representations
on Wikipedia.
WikiGen is an online tool created to support case studies in this thesis. The tool is a web
application which can be accessed via a standard web browser. By connecting to
Wikipedia databases over Wikipedia’s Application Programming Interface (API),
WikiGen generates multiple statistics and navigation maps based on the historical
revisions of the chosen article. The web application supports 196 Wikipedia language
editions from which the user can select articles.
In the next section, design decisions made during the initial phase of the tool
development are explained and the resulting tool architecture is presented. This is
followed by a detailed elaboration on statistical features implemented in the tool. The
corresponding section 4.2 provides both formulas for the measures and
exemplary graphical and textual outputs of the corresponding statistics.
4.1 Architecture
23
Cf. http://en.wikipedia.org/wiki/Wikipedia:Database_download
24
https://toolserver.org/
25
Full dumps amount to several terabytes when expanded. Cf. http://dumps.wikimedia.org/enwiki/latest/
impossibility of working with the live data per definition. Updating dumps on a regular
basis complicates the project maintenance and adds complexity to the infrastructure.
The Toolserver addresses the latter problems by taking over the maintenance of the
replicated Wikipedia databases. It allows third-party applications to access the data over
a specially created interface. In exchange, the Toolserver adds a further dependency
to the project: a complex component outside the programmer’s control. Additionally,
the Toolserver has a long and tedious registration process for anyone who intends to use
the service. The delay caused by registration is amplified by the learning curve associated
with the complexity of the Toolserver’s interface. Given the limited time available
for the thesis, the use of the Toolserver services is therefore not feasible.
Consequently, the API approach was chosen for accessing Wikipedia’s historical data.
This approach allows for the most compact and transparent infrastructure. Moreover, it
is the only way to access live data and, apart from using the Toolserver, the only way to
access different Wikipedia language databases efficiently. The latter utilise the same
API, which allows for an inherent multilingual support in the WikiGen tool. The only
restriction of the API solution in the context of this work is the limited performance – a
drawback which is only relevant in case of performance critical tasks. Our analysis has
shown that for the quantitative analyses required by case studies, API performance is
sufficient. In particular, shifting the calculation procedures from the server to the clients
provides the scalability required to overcome possible performance issues.
The overall infrastructure for the WikiGen web application is illustrated in Fig. 6.
The web application is stored on a web server26 and can be accessed by any computing
device with browser functionality such as desktop computers, mobile phones or tablet
computers (1). Using the statistics functions provided by the tool results in HTTP
requests that are sent to the Wikipedia API. Each request contains specifications of the
data required for the corresponding statistical analysis (2). Note that every language
version of Wikipedia has its own API. Figure 6, however, depicts the API element
outside the Wikipedia elements. This is done in order to highlight the fact that all those
different API instances provide the same functionalities. The actual HTTP requests
from WikiGen are sent to different URLs that correspond to the Wikipedia language
platform that the user has chosen for the analysis. Requested data are then sent to the
requesting device in the JSON format (3) and are processed directly on the device.
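The request step (2) can be illustrated with a minimal sketch. It is written in Python rather than in the JavaScript used by WikiGen, and the function name and the selected revision properties are assumptions made for illustration only; they are not taken from the WikiGen source. The sketch merely shows that switching the language edition changes the host name while the query parameters stay the same.

```python
from urllib.parse import urlencode


def build_revision_query_url(lang: str, title: str, limit: int = 50) -> str:
    """Return a MediaWiki API URL that lists revisions of one article.

    `lang` selects the language edition (e.g. "en", "de"); every edition
    exposes the same API under its own host name.
    """
    params = {
        "action": "query",           # generic query module of the API
        "prop": "revisions",         # ask for the revision history
        "titles": title,
        "rvprop": "ids|timestamp|user|flags",  # assumed selection of attributes
        "rvlimit": limit,
        "format": "json",            # response format processed on the client
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


url = build_revision_query_url("de", "Cloud computing")
```

The same call with "en" instead of "de" yields a request against the English edition, which reflects the multilingual support described above.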
The WikiGen web application is a pure HTML, CSS and JavaScript solution. The visual
elements are controlled by HTML/CSS while JavaScript controls the application
navigation flow as well as triggers the data requests and performs the subsequent
calculations. Figure 7 illustrates both front- and backend layers of the application.
The front layer consists of multiple HTML documents containing styled web
application elements as well as place holders for the statistical elements such as graphs
and data tables. As for the JavaScript backend, it utilises jQuery27 libraries to control
web application elements through the application scripts and to control statistics
elements through the calculation and rendering script modules.
26
Current address of the server is http://wikigen.info.net.ua/
27
Cf. http://jquery.com/
The application scripts consist of scripts for navigation between different HTML pages
of the application (1), scripts for animation of the HTML elements (2) and scripts used
for application configuration (3). The latter includes control over appearance of HTML
elements as well as data source related configurations. Statistical elements in WikiGen
require dedicated modules for their calculation and rendering. The calculation module
includes scripts for data requests to the Wikipedia API (4), the implementation of data
processing algorithms (5) as well as multiple utility functions for data converting, text
parsing and other supporting algorithms (6). Rendering algorithms (7) as well as open
source libraries they utilise to visualise diagrams (8)28 and data tables (9)29 are part of the
rendering module. Technical characteristics of WikiGen include full parallelisation of
its functionalities, scalability due to efficient client calculations, broad language support
and extensive in-tool help.
Statistical features in the WikiGen web application are divided into editing statistics
(section 4.2.1), link statistics (section 4.2.2), reference statistics (section 4.2.3) and
statistics provided by already existing tools which were integrated into the WikiGen
application (section 4.2.4).
To capture the collaboration processes in greater detail, different types of editors are
distinguished according to user attributes provided by the Wikipedia platform:
anonymous, registered and bot editors31. Overall editors include all three different types.
In a similar vein, overall edits are divided into distinct and major distinct edits. The
distinction is based on the data provided by Wikipedia regarding major or minor edits as
well as observations made during the revision analysis of different articles: A typical
pattern was discovered when a number of subsequent revisions is created by the same
user within a short period of time. This pattern arises when an editor saves the article
several times during the editing procedure. In this case, all resulting revisions can be
considered as a single distinct edit.
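The merging of subsequent saves into distinct edits can be sketched as follows. This is an illustrative Python sketch, not the WikiGen implementation: the one-hour window, the field names and the rule of dropping minor revisions before merging are assumptions, since the text does not specify what counts as "a short period of time".

```python
from datetime import datetime, timedelta


def count_distinct_edits(revisions, window=timedelta(hours=1), major_only=False):
    """Count distinct edits in a chronologically sorted revision list.

    Each revision is a dict with the keys "user", "timestamp" (datetime) and
    "minor" (bool). A revision by the same user that follows the previous
    save within `window` is merged into the same distinct edit.
    With major_only=True, minor revisions are dropped before merging.
    """
    if major_only:
        revisions = [r for r in revisions if not r["minor"]]
    count = 0
    previous = None
    for rev in revisions:
        same_burst = (
            previous is not None
            and rev["user"] == previous["user"]
            and rev["timestamp"] - previous["timestamp"] <= window
        )
        if not same_burst:
            count += 1  # a new distinct edit starts here
        previous = rev
    return count


day = datetime(2010, 3, 15)
sample = [
    {"user": "Alice", "timestamp": day.replace(hour=10, minute=0), "minor": False},
    {"user": "Alice", "timestamp": day.replace(hour=10, minute=5), "minor": True},
    {"user": "Alice", "timestamp": day.replace(hour=10, minute=20), "minor": False},
    {"user": "Bob", "timestamp": day.replace(hour=10, minute=30), "minor": False},
    {"user": "Alice", "timestamp": day.replace(hour=12, minute=0), "minor": True},
]
distinct = count_distinct_edits(sample)
major_distinct = count_distinct_edits(sample, major_only=True)
```

In the sample, the three saves by Alice between 10:00 and 10:20 collapse into one distinct edit, which mirrors the pattern described above.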
28
Cf. http://www.flotcharts.org/
29
Cf. http://www.datatables.net/
30
Cf. section 3.3.2 and question Q1.2
31
Cf. section 3.1
Different types of edits and editors can be combined into multiple versions of edits per
editor statistic. However, from all possible combinations, only the major distinct edits
per corresponding non-bot users are considered. It is the only meaningful combination
in the sense that minor or intermediate edits as well as edits made by bots are irrelevant
from the perspective of the social representations theory.
Figure 8 illustrates an exemplary output of the WikiGen tool for the editing statistics.
By default, the data are visualised in the form of monthly and yearly bar charts. The
chart appearance can, however, be changed in the WikiGen settings.
Furthermore, data for the overall period of time are provided in text form (Fig. 9).
The output for editor statistics is analogous to the edit statistics. It displays yearly and
monthly data for the number of editors that were active in the corresponding time
frames. The numbers of edits and editors are furthermore combined into a joint measure,
edits per editor. An exemplary chart for edits per editor data is shown in Fig. 10.
Editing statistics can be interpreted in accordance with the research question Q1.2
presented in section 3.3.2. An increase in the intensity of the editing activity might
indicate external events such as, for example, the murder charge brought in February 2013
against Oscar Pistorius, a South African sprint runner with double below-knee amputations. A
clear rise of editing activity in the corresponding article is visible in Fig. 11.
Using the interpretation scheme from Appendix E, it is possible to use the edits per editor
statistics to identify increases or decreases in interest and asymmetries, as well as to find
indications of a higher level of unfamiliarity with the underlying phenomenon.
One of the most useful WikiGen features for the qualitative analysis of social
representation on Wikipedia is the interactive revision map illustrated in Fig. 13. Every
horizontal line in the chart corresponds to a historical revision of the article. Besides
giving insight into the distribution of revisions over time, which helps to identify gaps and
possible periodicity in the editing activity, it allows for fast navigation between different
revisions. When a revision is chosen, the content of the corresponding historical version
of the article is displayed.
The major part of the WikiGen tool focuses on the anchoring analysis. Since
Wikipedia’s internal links are interpreted as anchors, all measures in this section are
based on internal links to Wikipedia articles, which can be found among revisions of the
article under study.
The map visualises the periods of time in which a chosen anchor was present in or absent from
the article and links the historical revisions in which the anchor disappeared or (re)appeared. In
that way it is possible to navigate between the relevant historical revisions of the article in
which the analysed anchor is contextualised. Figure 14 depicts an
extract from the anchor map for the user interface anchor in the iPad article. There are
two time periods in which the user interface anchor was present in the article.
Every point in the anchor map corresponds to a revision in which an anchor was
introduced or removed. Clicking on a point opens the corresponding revision and
shows the content of the historical article, similar to the revision map functionality
explained in the previous section.
Figure 15 shows the article which is opened after clicking on a point corresponding to a
revision dated 15.03.2010 18:39.
Any comparison between anchoring in two different periods of time requires data about
all anchors present in those periods. Statistics regarding anchor evolution are based on
this data in form of snapshots. A snapshot in this context is a list of all anchors in a
given period of time including all relevant attributes. Figure 16 shows a snapshot for the
iPad social representation in the year 2010. The table has 5 columns:
Anchor: name of the anchor which is an internal Wikipedia link within the definition
part of the article.
Days survived: cumulative number of days an anchor was present in the definition part
of the article. Note that this number is not resulting from the difference between first
and last seen. Instead it adds all time periods in which the anchor was present in the
article and thus corresponds to the graph in the Anchor map section.
Revisions survived: number of major distinct edits an anchor survived. The definition
of major distinct edit is consistent with the one given in section 4.2.1. Besides Days
survived, this measure offers another perspective on how strong an anchor is.
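Anchor Strength: a linear combination of Days survived and Revisions survived (cf.
formula 1).
$S_a = w_d \cdot \frac{d_a}{T} + w_r \cdot \frac{r_a}{R}$ (1)
, where $d_a$ and $r_a$ are the Days survived and Revisions survived of anchor $a$, $T$ and $R$ are the total numbers of days and major distinct edits in the period, and $w_d + w_r = 1$.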
This rating between 0 and 1 indicates the strength of an anchor: an anchor has the
maximum strength (1) if it both survived all revisions and stayed in the article for the
whole period of time. All anchors in the snapshot table are sorted by this rating
so that the strongest anchors for the corresponding time frame appear first. Note that only
anchors which survived at least one day enter the table, in order to filter out
unimportant or junk data.
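The difference between cumulative Days survived and the span between first and last appearance, as well as the strength rating, can be sketched in Python. The sketch rests on assumptions: the weights of the linear combination are not recoverable from the text, so equal weights are used here purely for illustration, and all function and parameter names are hypothetical.

```python
from datetime import date


def days_survived(presence_intervals):
    """Sum the lengths (in days) of all intervals in which an anchor was present.

    `presence_intervals` is a list of (first_day, end_day_exclusive) date pairs.
    """
    return sum((end - start).days for start, end in presence_intervals)


def anchor_strength(days, period_days, revisions, period_revisions, weight_days=0.5):
    """Weighted mix of the two normalised survival shares (weights are assumed)."""
    share_days = days / period_days
    share_revisions = revisions / period_revisions
    return weight_days * share_days + (1 - weight_days) * share_revisions


# An anchor that was present twice in 2010, with a gap in between.
intervals = [
    (date(2010, 1, 28), date(2010, 3, 1)),
    (date(2010, 6, 1), date(2010, 6, 11)),
]
cumulative_days = days_survived(intervals)
full_strength = anchor_strength(365, 365, 100, 100)
```

With these inputs the cumulative value covers only the two presence intervals, not the gap between March and June, and an anchor that survives the whole period and all revisions receives the rating 1.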
This section corresponds directly to aspects 1-3 of the research question Q1.2 in
section 3.3.2. For the purpose of measuring dynamics in the anchoring process, the
WikiGen tool provides a number of different statistics, which are explained in the
following.
Figure 17 shows a bar chart that WikiGen generates in order to indicate the number of newly
introduced and removed anchors in a particular period of time. The data directly
correspond to the first aspect of the research question Q1.2 (cf. section 3.3.2).
The anchor dissimilarity measures how dissimilar the anchors in time period $t$ are to the
anchors in the previous period $t-1$. The basis for the measure is the anchor attribute
days survived, which is the cumulative time an anchor stayed in the definition part of the
article in the given period of time $t$. Note that in order to remove the influence of anchors
that enter the article due to vandalism, only anchors that were present in the
corresponding period for at least one day are accounted for.
The dissimilarity values range from 0 (anchors are absolutely similar) to 1 (anchors
are absolutely dissimilar). The logic behind the calculation is to take the sum of least
common days survived for all anchors present in periods $t$ and $t-1$, to set it in
relation to the total sum of maximum days survived for every anchor in periods
$t$ and $t-1$, and to subtract the result from 1. The calculation is done according to formula 2.
$\mathit{Dis}(t) = 1 - \frac{\sum_{a \in A} \min(d_a(t), d_a(t-1))}{\sum_{a \in A} \max(d_a(t), d_a(t-1))}$ (2)
, where $d_a(t)$ is the days survived of anchor $a$ in period $t$ and $A$ is the set of anchors present for at least one day in $t$ or $t-1$.
The WikiGen tool provides monthly and yearly dissimilarity data in form of charts.
Random examples of dissimilarity measure outputs are shown in Fig. 18.
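The calculation logic can be sketched in Python. The sketch assumes that the dissimilarity is one minus the ratio of summed minimum to summed maximum days survived, which matches the way the values are read in the case studies (higher values mean more change); the function name and the dictionary layout are hypothetical.

```python
def anchor_dissimilarity(days_current, days_previous, min_days=1):
    """Dissimilarity between two periods given {anchor: days survived} mappings.

    Anchors that were present for less than `min_days` in a period are
    ignored for that period, to reduce the influence of vandalism.
    """
    current = {a: d for a, d in days_current.items() if d >= min_days}
    previous = {a: d for a, d in days_previous.items() if d >= min_days}
    anchors = set(current) | set(previous)
    if not anchors:
        return 0.0  # nothing to compare, treated as unchanged
    shared = sum(min(current.get(a, 0), previous.get(a, 0)) for a in anchors)
    total = sum(max(current.get(a, 0), previous.get(a, 0)) for a in anchors)
    return 1 - shared / total


previous_month = {"iPhone": 30, "Tablet": 30}
current_month = {"iPhone": 30, "Apple": 15, "spam": 0.2}
value = anchor_dissimilarity(current_month, previous_month)
```

In this example only 'iPhone' is shared, so 30 of 75 maximum days overlap and the dissimilarity is 0.6; the short-lived 'spam' link does not enter the calculation.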
Average anchor durability measures the average time every anchor was present in the
definition part of the article in a particular time frame $t$. However, only those anchors
which stayed in the article for at least one day are considered. This is done in order to
remove the influence of weak anchors on the average value. The measure can shed light on
how stable the anchoring is during a particular time frame $t$.
$\mathit{Dur}(t) = \frac{1}{|A_t|} \sum_{a \in A_t} d_a(t)$ (3)
, where $d_a(t)$ is the days survived of anchor $a$ in period $t$ and $A_t$ is the set of anchors present for at least one day in $t$.
The WikiGen tool provides monthly and yearly average anchor durability data in form
of charts. Examples of the charts are shown in Fig. 19.
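A minimal Python sketch of this measure is given below, assuming that days survived per anchor are already available as a mapping; the function name is hypothetical.

```python
def average_anchor_durability(days_survived, min_days=1):
    """Mean days survived over all anchors present for at least `min_days`."""
    durable = [d for d in days_survived.values() if d >= min_days]
    if not durable:
        return 0.0  # no anchor passed the one-day filter
    return sum(durable) / len(durable)


durability = average_anchor_durability({"iPhone": 30, "Tablet": 10, "spam": 0.5})
```

The anchor that stayed for half a day is excluded, so the average is taken over two anchors only.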
The edit-war level statistic measures the level of disagreement in the collaboration process
by relating the number of introductions and disappearances of anchors to the total
number of unique anchors in a period of time. The higher the value of the statistic, the
more disagreement is expected to be observed in the corresponding period of time, since
the same anchors would be introduced and removed several times.
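$\mathit{EW}(t) = \frac{I_t + O_t}{U_t}$, where $I_t$ and $O_t$ are the numbers of anchor introductions and disappearances and $U_t$ is the number of unique anchors in period $t$ (4)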
WikiGen provides monthly and yearly data for the edit-war level in the form of charts,
as shown in Fig. 20.
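The ratio described above can be sketched in Python. The exact normalisation used by WikiGen is not recoverable from the text, so the plain ratio of events to unique anchors is an assumption, as are the function and parameter names.

```python
def edit_war_level(introductions, removals, unique_anchors):
    """Relate anchor introductions and removals to the number of unique anchors."""
    if unique_anchors == 0:
        return 0.0  # no anchors in the period, hence no disagreement to measure
    return (introductions + removals) / unique_anchors


contested = edit_war_level(introductions=6, removals=5, unique_anchors=4)
calm = edit_war_level(introductions=3, removals=0, unique_anchors=3)
```

A value well above 1, as in the first call, indicates that the same anchors were introduced and removed repeatedly within the period.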
The reference statistics part of the WikiGen tool reveals how many articles contain
references to the analysed article. For example, if a link to ‘Einstein’ was introduced to
the ‘Physics’ article in the year 2009, WikiGen would add one reference for the year
2009. If an article contains several links pointing at the article of interest, it is
counted as one reference. Note that an article containing a reference to the article
under study is called a ‘backlink’.
In terms of social representations theory, the statistics mirror the process of
objectification during which a social representation is increasingly used as an anchor for
other social representations.
The result of the statistical analysis is the distribution of different reference types over
time. Reference types correspond to the location of the reference in the identified
article. Correspondingly, there are four different article types according to the reference
types:
Backlinks in text: articles where a reference is found in the rest of the article following
the definition section.
Indirect backlinks: articles which point at the article indirectly by using Wikipedia
templates32.
Figure 21 illustrates the graphical output of the reference statistics in WikiGen. The blue
line indicates the cumulative number of referencing articles over time. The higher the
cumulative number, the higher the objectification level of the corresponding social
representation.
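The counting rule for references and the cumulative line can be sketched in Python. The input format, a list of (referencing article, year) pairs, and the function name are assumptions made for illustration; how WikiGen determines the year of a reference is not shown here.

```python
from collections import Counter
from itertools import accumulate


def references_per_year(link_events):
    """Count referencing articles per year and cumulatively.

    `link_events` is an iterable of (source_article, year) pairs, one per
    link found. An article with several links counts once, in the earliest
    year in which it linked to the article under study.
    """
    first_year = {}
    for source, year in link_events:
        if source not in first_year or year < first_year[source]:
            first_year[source] = year
    per_year = Counter(first_year.values())
    years = sorted(per_year)
    cumulative = dict(zip(years, accumulate(per_year[y] for y in years)))
    return dict(sorted(per_year.items())), cumulative


events = [("Physics", 2009), ("Physics", 2011), ("Relativity", 2009), ("Photon", 2010)]
per_year, cumulative = references_per_year(events)
```

The second link from 'Physics' in 2011 does not add a reference, in line with the rule that several links from one article are counted once.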
32
Cf. template definition in section 3.1
In addition to statistics which are specially developed for the purpose of the given
thesis, WikiGen integrates two already existing tools for analysing Wikipedia articles.
The first tool provides Wikipedia article traffic statistics33. The tool visualises the
viewership of Wikipedia articles over time. Several articles can be compared with each
other in an integrative graph as shown in Fig. 22.
The second tool integrated in the WikiGen web application is a tool called
contributors34. It allows for a detailed analysis of the contributor structure displaying the
33
Cf. http://toolserver.org/~emw/wikistats/
34
Cf. http://toolserver.org/~daniel/WikiSense/Contributors.php
This section contains two case studies. As outlined in section 3, the case studies aim to
demonstrate and verify the applicability of the method developed to study social
representations on Wikipedia. Furthermore, they illustrate the use of the
WikiGen statistical tool, as they supplement the quantitative data provided by the tool with
the necessary qualitative analysis.
The case study of ‘cloud computing’ is introduced in section 5.1. Section 5.2 contains
the case study of ‘iPad’. Results of both case studies including the evaluation of the
method are discussed in section 6.
Cloud computing (CC) is arguably one of the most popular yet ambiguous phenomena
in the information technology field today. As the title of the case study suggests, its
representation on Wikipedia is subject to change. The demonstration of the evolution
of the cloud computing social representation will begin with the context for the case study.
This will include a brief description of the phenomenon in its present form as well as
several characteristics of the CC social representation on Wikipedia. The latter include
collaboration aspects and are therefore necessary for the analysis of anchoring and
objectification processes in respective sections 5.1.2 and 5.1.3.
To begin the case study of cloud computing social representation on Wikipedia, the
current35 definition of the phenomenon on the Wikipedia platform is required.
35
https://en.wikipedia.org/wiki/Cloud_computing, accessed 14.06.2013 11:35
The first version of the cloud computing article on Wikipedia is dated 3 March 2007.
This date is therefore where the case study analysis will commence. The standardised
distribution of the subsequent 7264 revisions over time is sketched in Fig. 24 with the
value 100 indicating the peak revision number.
Note that the evolution time frame for the social representation of cloud computing
corresponds to the ‘lifetime’ of the phenomenon outside of Wikipedia. The latter can be
approximated by the Google search index graph illustrated in Fig. 25. Consequently,
the evolution of the CC social representation on Wikipedia is concurrent with the
corresponding evolution of the cloud computing phenomenon outside of the platform.
This would not be the case if the phenomenon had existed longer than the online
encyclopaedia itself.
36
http://en.wikipedia.org/wiki/Jargon, accessed, 14.06.2013
A detailed view of the monthly and yearly revision distribution in Fig. 26 shows that the
major distinct edits38 measure is more stable over time than the overall number of
edits. For example, overall edits in February 2010 increase by 62% compared to the
21% increase of the major distinct edits. This effect can be observed across all periods.
Consequently, peaks in the number of overall edits are mainly due to either an increase
in the number of minor edits or in the number of subsequent edits by the same user.
Neither of these edit types is accounted for in the major distinct edits measure. This is an
important observation for interpreting changes in the collaboration process.
To underline the claimed popularity of the cloud computing article on Wikipedia, the
number of views for the cloud computing article is compared to the number of views for
the most popular Wikipedia article in the year 2012 – “Facebook”39. Fig. 27 indicates
37
Source: http://www.google.com/trends/explore?hl=en#q=Cloud%20computing&cmpt=q
38
Cf. definition of major distinct edits in section 4.2.1
39
Source: http://toolserver.org/~johang/2012.html
that the average number of views per day for the cloud computing article is 11140. This is 14.9%
of the views for the ‘Facebook’ article40. In this thesis, a power law distribution with a
long tail is assumed for the popularity of Wikipedia articles. Accordingly, the majority
of articles on Wikipedia are assumed to have less than 1% of the views of the
most popular article. Therefore, the cloud computing article can be considered
rather popular.
The last aspect required for reconstructing the context of the cloud computing social
representation on Wikipedia, before discussing the anchoring and objectification
processes, is the collaboration dynamics. Figure 28 illustrates the application of the
interpretation scheme from Appendix E. The orange line indicates the number of edits
per month, the blue line indicates the number of contributors per month and the green
line indicates the combined monthly measure edits per editor. All data are standardised
to the interval from 0 to 1 to capture the tendencies rather than the absolute values.
Fig. 28 Standardised Monthly Edits, Editors and Edits per Editors Statistics
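The combination and standardisation of the three monthly series can be sketched in Python. The text does not state how the data are standardised to the interval from 0 to 1, so min-max scaling is an assumption here, and the function names and sample values are hypothetical.

```python
def edits_per_editor(edits, editors):
    """Element-wise ratio of monthly edits to monthly editors."""
    return [e / n if n else 0.0 for e, n in zip(edits, editors)]


def min_max_scale(values):
    """Rescale a series linearly so that its minimum is 0 and its maximum is 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # a constant series carries no tendency
    return [(v - low) / (high - low) for v in values]


monthly_edits = [10, 40, 20]
monthly_editors = [5, 10, 10]
ratio = edits_per_editor(monthly_edits, monthly_editors)
scaled_edits = min_max_scale(monthly_edits)
scaled_ratio = min_max_scale(ratio)
```

After scaling, the three lines share one axis, so that a rise in edits without a matching rise in editors becomes visible as a rise in the scaled edits per editor series.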
40
Data is visualised by Wikistats: http://toolserver.org/~emw/wikistats/
The graph illustrates 69 months of a volatile collaboration process in which a total of 3182
users participated. According to the interpretation scheme, 11 of the 13
possible collaboration trends are observed for cloud computing. No trend is
observed for more than two subsequent months. As a result, the collaboration
process consists of several alternating centralisation and decentralisation phases. Green
areas on the graph indicate centralisation phases that come together with an increase in
the number of revisions. These phases potentially reflect a high degree of unfamiliarity
associated with the cloud computing phenomenon among the Wikipedia users. In the
following anchoring analysis, it will be verified whether or not the centralisation phases
are caused by the increasing editing efforts of users to resolve conflicting
representations. Consequently, the statistics from Fig. 28 help to relate changes in the
social representation of CC to the corresponding changes in the collaboration process.
The analysis of the cloud computing anchoring ranges from the first article revision
dated 3 March 2007 until 25 May 2013. First, anchor statistics from section
4.2.2.3 are introduced for this time frame. Second, all relevant anchors according to the
coding process introduced in section 3.4.2 are categorised. Finally, different phases in
the evolution of the social representation anchoring are identified and described using
both introduced statistics and identified categories.
Anchor statistics for cloud computing generated by WikiGen include new and
obsolete anchors in each period, the dissimilarity between anchoring states, the average
anchor durability in different time periods and the anchor edit war level for each
period41. The statistics indicate a dynamic anchoring process. The following introduces
the statistics and explains them in terms of indications they provide. The actual
narrative interpretation of the data will follow in the subsequent section.
The search for changes in the social representation is difficult without the data for the
number of new and obsolete anchors for each time period in the analysis. New aspects
of the representation are necessarily accompanied by the introduction of anchors that
were not used in the previous period or by the removal of anchors from the last period.
Figure 29 illustrates these data for the cloud computing social representation and
highlights the time periods that contain the most significant changes. For
41
Cf. section 4.2.2.3 for the definition of the measures and Appendix F for statistic details
example, in August 2008, 49 new anchors for cloud computing were introduced.
This was followed by the removal of 44 anchors two months later (cf. leftmost area
marked grey). Similarly, further marked areas show periods with extensive introduction
of new anchors or removal of obsolete anchors. In subsequent sections, such indications
are used as a starting point for a qualitative analysis with a twofold aim: the data help
to reveal the nature of the changes in the representation and facilitate the division of
the evolution process into distinct phases.
Fig. 29 Amount of New and Obsolete Cloud Computing Anchors per Month
Anchor dissimilarity
The effect of introducing new anchors and removing obsolete ones can be
observed in the monthly anchor dissimilarity data in Fig. 30. The introduction of the 49
aforementioned anchors results in a dissimilarity of the anchoring in August 2008,
peaking at 0.97. In this case, 0.97 indicates that 97% of the anchoring is different to that
of the previous month42. However, the dissimilarity data can also indicate that some
introductions and removals of anchors are only temporary. For example, the introduction
of 21 new and removal of 16 obsolete anchors in February 2010 corresponds to a low
dissimilarity measure of 0.33, indicating that those changes have not endured.
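To make the reading of the dissimilarity values concrete, the following sketch computes a set-based dissimilarity between two consecutive anchoring states. The measure actually used is defined in section 4.2.2.3 and is not reproduced here, so the Jaccard distance below is only an assumed stand-in. The original definition may, for example, weight anchors by their strength, which would explain why many short-lived anchors can coincide with a low value. The function name anchor_dissimilarity is hypothetical.

```python
def anchor_dissimilarity(previous, current):
    """Jaccard distance between two anchor sets.

    0.0 means the two anchorings are identical, 1.0 means they share no anchor.
    Assumption: this is an illustrative stand-in for the measure of section 4.2.2.3.
    """
    union = previous | current
    if not union:
        return 0.0  # two empty anchorings are treated as identical
    return 1.0 - len(previous & current) / len(union)


# Hypothetical anchor sets: two of four distinct anchors are shared.
july = {"utility computing", "internet", "grid computing"}
august = {"internet", "grid computing", "software as a service"}
print(anchor_dissimilarity(july, august))
```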
42
Cf. definition of the dissimilarity measure in section 4.2.2.3
It must be reiterated at this point that the quantitative data only indicate possible
changes. A high dissimilarity measure might, for example, be misleading. Old anchors
can be replaced by new anchors with a similar meaning. Therefore, it is necessary to
code anchors and to explain the dissimilarity in terms of changes among different
categories rather than within them. The anchor coding is provided in the next section.
The average anchor durability measure provides additional support for the overall high
dynamics of the cloud computing anchoring process. The year 2011 especially
demonstrates the anchor ‘instability’, with anchors being present in the article for an
average of only 52.96 days.
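A minimal sketch of how an average durability of this kind could be computed is given below. It assumes that each anchor is described by the date on which it first appears in the article and the date on which it is removed, and that durability is the number of days in between. The definition used by the WikiGen tool (section 4.2.2.3) may differ, for instance in how anchors that reappear are handled. All names and dates are hypothetical.

```python
from datetime import date


def average_durability(presence):
    """Mean number of days for which anchors are present in the article.

    `presence` maps an anchor to a (first_seen, removed) pair of dates.
    Assumption: each anchor has a single uninterrupted presence interval.
    """
    durations = [(removed - first_seen).days for first_seen, removed in presence.values()]
    if not durations:
        return 0.0
    return sum(durations) / len(durations)


# Hypothetical intervals, not taken from Appendix F.
presence_2011 = {
    "paradigm shift": (date(2011, 1, 1), date(2011, 1, 11)),  # 10 days
    "mobile app": (date(2011, 3, 1), date(2011, 3, 31)),      # 30 days
}
print(average_durability(presence_2011))
```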
For the majority of periods, decreases in average anchor durability follow increases in
anchor dissimilarity. However, exceptions to this observation indicate cases in which
the decrease in the measure is due to anchors being present in the article only for a short
period of time. These indications are used to correctly identify phases in the anchoring
evolution of cloud computing.
The last anchor statistic employed in the analysis is the anchor edit-war level. The
graph depicted in Fig. 32 is especially valuable when compared to the collaboration
dynamics in Fig. 31. A combination of intense collaboration and a high edit-war level
is a clear indication of a high level of disagreement between contributors. Therefore,
this becomes an additional instrument for interpreting changes in the social
representation of cloud computing.
The quantitative WikiGen analysis identified a total of 325 anchors for cloud computing.
Following the analysis requirements explained in section 3.4, anchors that are identified
by the WikiGen tool need to be analysed qualitatively and grouped into categories in
order to reflect different aspects of the cloud computing social representation. In
accordance with section 3.4.2, an anchor strength threshold of 0.15 was chosen. The
context analysis of the remaining 122 strongest anchors identified 15 anchors to be
excluded. Three anchors (pdf, nist, gartner) were removed as they were unrelated to the
social representation of cloud computing. A further 12 anchors were found to be
duplicates arising from different spellings, for example “yahoo” and “yahoo!”.
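The selection steps above (threshold, exclusion of unrelated anchors, merging of spelling duplicates) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the threshold is taken to be inclusive, spelling variants are normalised by lower-casing and dropping a trailing exclamation mark, and a merged anchor keeps the highest strength of its variants. The function name select_anchors and the sample strengths are hypothetical. In the study itself, the steps reduce 122 anchors by 3 + 12 = 15 to the 107 anchors that are coded.

```python
def select_anchors(strengths, threshold=0.15, excluded=("pdf", "nist", "gartner")):
    """Filter and de-duplicate anchors before qualitative coding.

    `strengths` maps anchor text to anchor strength.
    Assumptions: inclusive threshold, simple spelling normalisation,
    and the maximum strength is kept when variants are merged.
    """
    merged = {}
    for name, strength in strengths.items():
        key = name.strip().lower().rstrip("!").strip()  # "Yahoo!" -> "yahoo"
        if key in excluded:
            continue  # unrelated to the social representation
        merged[key] = max(strength, merged.get(key, 0.0))
    return {name: s for name, s in merged.items() if s >= threshold}


# Hypothetical strengths, chosen only to exercise each rule.
raw = {"yahoo": 0.2, "Yahoo!": 0.3, "pdf": 0.9, "grid computing": 0.56, "weak": 0.05}
print(select_anchors(raw))
```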
The coding of the remaining 107 anchors revealed 9 different groups of anchors
presented in Tab. 2. The complete list of the strongest anchors for the cloud computing
social representation including categorisation is provided in Appendix B.
Concepts from which cloud computing has departed (1) | Figurative aspects of cloud computing (2) | Technical aspects of cloud computing (3)
Sub concepts of cloud computing (4) | Origins of cloud computing (5) | Cloud computing solutions and providers (6)
Benefits of cloud computing (7) | Broader concepts related to cloud computing (8) | Means to interact with cloud computing (9)
1. The category “Concepts from which cloud computing has departed” (7 anchors)
describes concepts that cloud computing is distinguished from. Those anchors also
emphasise how cloud computing influenced typical practices in IT. For example, the
anchor client-server43 describes the paradigm shift from using rich clients towards using
thin clients or web browsers in order to access IT resources. In the same context,
anchors such as product44 and software45 are used to emphasise that something that was
previously delivered as a software product became a service within cloud computing.
2. Six anchors in the category “Figurative aspects of cloud computing” use metaphors,
abstraction or analogies to describe cloud computing. Accordingly, electrical grid46 and
electricity47 anchors are analogies for resource delivery within cloud computing. The
analogies draw a similarity between cloud computing and how electricity is delivered
through the electricity network. Further anchors in this category such as computer
network diagram48 and cloud49 serve as metaphors for the internet as such. This
category also includes the anchors abstraction and metaphor. Abstraction50
points at the hidden complexity of the internet, as implied by cloud computing, while
metaphor51 points at how cloud symbolises the internet.
3. The “Technical aspects of cloud computing” category (17 anchors) represents anchors
that are either integral technical components of cloud computing, or technologies used
within cloud computing. This group includes anchors such as server, multitenancy,
virtualization, remote server, data, parallel computing and computer cluster.
43
Cf. cloud computing revision from 22:38, 27 April 2010
44
Cf. cloud computing revision from 08:12, 19 August 2011
45
Cf. cloud computing revision from 09:35, 9 October 2007
46
Cf. cloud computing revision from 13:35, 18 August 2011
47
Cf. cloud computing revision from 22:52, 1 August 2008
48
Cf. cloud computing revision from 12:49, 14 October 2008
49
Cf. cloud computing revision from 16:52, 30 July 2008
50
Cf. cloud computing revision from 16:37, 23 April 2009
51
Cf. cloud computing revision from 07:02, 11 April 2009
4. The twenty anchors that comprise the “Sub concepts of cloud computing” category
either represent part of cloud computing, or can be considered as synonyms for the entire
phenomenon of cloud computing. Each anchor in this category can be used in one of the
following two wordings: a) Cloud computing is <anchor> or b) <anchor> is a part of
cloud computing. Consider the first anchor used for cloud computing on Wikipedia:
utility computing. In this case, cloud computing is substituted with utility computing. It
is possible to formulate: “Cloud computing is utility computing”. An example of an
anchor representing a part of cloud computing is infrastructure as a service52. In
addition to software as a service, data as a service and the other types of services cloud
computing can deliver, infrastructure is one of the integral components of cloud
computing. It is thus possible to construct the wording: “Infrastructure as a service is a
part of cloud computing”. The same logic applies to all anchors in the category.
5. The small “Origins of cloud computing” category comprises three anchors. Each of
these is linked to a reference to possibly the first scientist to use the term ‘cloud
computing’ in a scientific paper53. The name of this Brazilian professor is Ramnath
Chellappa. His affiliation with the Goizueta Business School of Emory University is
labelled in the two corresponding anchors.
6. The category “Cloud computing solutions and providers” (33 anchors) is formed by
examples of cloud computing instantiations, and examples of corporations that either
provide or use cloud computing. Among the corporations providing cloud computing
solutions are: Google, NetSuite and Salesforce54. Corporations such as General
Electric, L'Oréal and Procter & Gamble are known for adopting cloud computing
solutions55. Implementations of cloud computing, on the other hand, are represented by
anchors such as Google apps56, Amazon web services57, and azure service platform58.
52
Cf. cloud computing revision from 04:50, 25 February 2009
53
Cf. cloud computing revision from 22:20, 10 June 2009
54
Cf. cloud computing revision from 08:24, 31 July 2008
55
Cf. cloud computing revision from 22:13, 5 August 2008
56
Cf. cloud computing revision from 06:08, 8 August 2008
57
Cf. cloud computing revision from 22:13, 5 August 2008
58
Cf. cloud computing revision from 18:19, 3 December 2010
59
Cf. cloud computing revision from 10:36, 23 April 2012
60
Cf. cloud computing revision from 22:52, 1 August 2008
subscriptions61. The remaining anchors, quality of service and service level agreement, stress
the quality of commercial cloud computing solutions, which is guaranteed by legal
agreements62.
8. The category “Broader concepts related to cloud computing” (13 anchors) consists of
anchors pointing out the broad scope of cloud computing. While this category might
appear to be heterogeneous, the context of the anchors enables them to be combined
into one category. Two main examples of anchors in this category are internet and
computing. Both anchors generalise the cloud computing concept to “any computations
in the internet”. Other anchors such as shared services or converged infrastructure
reflect general tendencies in IT. They thus broaden the concept of cloud computing
as they present it as a consequence of global trends in the IT field63. Anchors such as
services64 or utility65 extend the concept of cloud computing even further by generalising
it to any possible services or resource delivery type over the network. A similar rationale
can be applied to all anchors in this category.
9. The “Means to interact with cloud computing” category includes five anchors
corresponding to the manifestations of cloud computing. These are the different visual
interfaces for cloud computing. It can be a business application66 or application
software67 in general that uses resources from the cloud. The means of accessing
resources from the cloud are nevertheless not restricted to software in the traditional
sense. Further examples are web applications in a web browser68 or mobile apps69 – both
of which provide an interface for triggering computation in the cloud and displaying the
results.
The high number of resulting categories has the potential to reduce the clarity of the
analysis. While some of the categories play a central role in understanding the anchor
evolution, others have limited significance due to the small number of weak anchors
they contain. To avoid fragmentation of the analysis, the categories were grouped into
five distinct perspectives.
The generalising perspective (Anchors that extend the scope of CC) comprises three
categories: Concepts from which CC has departed (1), Figurative aspects of CC (2) and
61
Cf. cloud computing revision from 17:23, 5 August 2008
62
Cf. cloud computing revision from 22:52, 1 August 2008
63
Cf. cloud computing revision from 08:20, 9 January 2012
64
Cf. cloud computing revision from 13:35, 18 August 2011
65
Cf. cloud computing revision from 22:52, 1 August 2008
66
Cf. cloud computing revision from 12:58, 9 June 2009
67
Cf. cloud computing revision from 09:54, 9 January 2012
68
Cf. cloud computing revision from 06:05, 8 August 2008
69
Cf. cloud computing revision from 12:47, 10 July 2011
Broader concepts related to CC (8). Each of these categories broadens the scope of the
cloud computing phenomenon. Anchors in the category “Broader concepts related to
cloud computing” do so per definition. “Figurative aspects of cloud computing” anchors
have a similar effect by ‘blurring the borders’ of cloud computing. As for “Concepts
from which cloud computing has departed” anchors, they do not describe what CC is.
Rather, they point at what cloud computing is not, and thus potentially include new
emerging phenomena into its scope.
The usage perspective (Use cases for cloud computing) combines the “Benefits of
cloud computing” (7) and “Means to interact with cloud computing” (9) categories.
Both categories make the user their focus through introducing different interfaces that
the user can operate in order to benefit from utilising cloud computing services.
The remaining three perspectives comprise only one category each. The “Technical
aspects of cloud computing” category constitutes the technical perspective. Similarly,
the “Cloud computing solutions and providers” category makes up the example
perspective. The sub concept perspective incorporates anchors from the category of
“Sub concepts of cloud computing”. Anchors from the category “Origins of cloud
computing” are candidates for a sixth perspective. However, origin aspects do not
provide any evolutional insights. The category has three anchors, which appear in a
single period, and is therefore of a limited significance. For the clarity of the case study,
the origins perspective is disregarded in the analysis, which leaves five perspectives to
focus on in the next section.
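The grouping of the nine categories into five perspectives can be summarised as a simple lookup structure. The category numbers and perspective names follow the text. The variable and function names are hypothetical and serve only as an illustration of how coded anchors could be assigned to a perspective.

```python
# Category numbers as introduced in Tab. 2.
CATEGORIES = {
    1: "Concepts from which cloud computing has departed",
    2: "Figurative aspects of cloud computing",
    3: "Technical aspects of cloud computing",
    4: "Sub concepts of cloud computing",
    5: "Origins of cloud computing",
    6: "Cloud computing solutions and providers",
    7: "Benefits of cloud computing",
    8: "Broader concepts related to cloud computing",
    9: "Means to interact with cloud computing",
}

# Perspectives and the categories they combine, as described in the text.
PERSPECTIVES = {
    "generalising": (1, 2, 8),
    "usage": (7, 9),
    "technical": (3,),
    "example": (6,),
    "sub concept": (4,),
}

# Reverse lookup: category number -> perspective name.
CATEGORY_TO_PERSPECTIVE = {
    category: perspective
    for perspective, categories in PERSPECTIVES.items()
    for category in categories
}


def perspective_of(category):
    """Return the perspective of a category, or None for the disregarded origins category."""
    return CATEGORY_TO_PERSPECTIVE.get(category)


print(perspective_of(8))  # generalising
print(perspective_of(5))  # None, since the origins category is disregarded
```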
The first attempt to familiarise the cloud computing phenomenon on the Wikipedia
platform is dated to the 3rd of March 2007. In the first revision of the CC article, the
phenomenon was anchored in terms of utility computing (1.00)70 by redirecting to the
article of the latter. The social representation of utility computing on Wikipedia has
existed since the 3rd of June 2005. By the time that the CC article was created, utility
70
Numbers in the brackets indicate anchor strength in the corresponding period as defined in 4.2.2.2
computing had already been defined on Wikipedia as: “[a] business model whereby
computer resources are provided on-demand and on pay-per-use basis”71.
Anchors of utility computing as of the 3rd of March 2007 are shown in Fig. 33. The majority of
anchors such as computer, hardware and grid computing have a technical character. An
interesting feature of this context is the anchor “natural gas”. This employs the analogy
of a gas provider to illustrate the utility computing provider who delivers computing
resources on demand. Anchoring utility computing in terms of grid computing, which is
a “computing model that distributes processing across a parallel infrastructure” 72,
emphasises the distributed nature of the technology.
Consequently, the historical anchors of utility computing depicted in Fig. 33 can be seen
as anchors for the, at that time, rather unfamiliar phenomenon of cloud computing.
Following the infancy stage, the next phase is characterised by the initial introduction of
different perspectives on cloud computing so as to distinguish it from the utility computing
concept and make it more tangible. The phase begins with an introduction of new
anchors in September 2007. In comparison to the single utility computing anchor of the
infancy stage, there are 74 new anchors to be considered in this time frame.
The usage perspective is introduced during the rest of the year 2007 and is represented
by the anchors for web application (0.27), web browser (0.21) and rich internet
application (0.31). The corresponding concepts allow cloud services to be accessed, and
are therefore representative of how the user understands cloud computing.
In the year 2008, the sub concept perspective gains importance. Concepts that can be
seen as ‘close’ to cloud computing start to appear as anchors. Each of these anchors
comes from the same category as utility computing, namely “Sub concepts of cloud
71
http://en.wikipedia.org/w/index.php?title=Utility_computing&oldid=113214987
72
http://en.wikipedia.org/w/index.php?title=Grid_computing&oldid=112784741
computing”. While grid computing (0.56) is one of the already introduced anchors of
utility computing, two other concepts introduce additional characteristics to the concept
of cloud computing. On the one hand, autonomic computing (0.21) puts an emphasis on
the self-management aspect of cloud computing networks. On the other hand,
distributed computing (0.27) underlines the distribution of the computational nodes that
are involved in a cloud network. One particular CC sub concept anchor - software as a
service (0.7) (SaaS) - appears to integrate the most important aspects of the cloud
computing social representation of the time. The representation of SaaS summarises the
paradigm shift that includes the service orientation and the software migration from
clients’ devices into the web.
The next perspective on cloud computing during this phase is an attempt to make the
cloud computing phenomenon familiar through specifying its technological scope.
Among the 20 strongest anchors in this period are anchors such as virtualization (0.47),
data (0.45), computer cluster (0.29), multi-core (0.29), and parallel computing (0.29).
Similarly, there is another attempt to familiarise cloud computing in 2008 through the
example perspective. Anchor instances in this perspective are corporations that, at
this point in time, have already introduced some form of cloud computing: Google
(0.23) and their Google apps (0.45), Salesforce (0.23), IBM (0.23), Microsoft (0.23),
General Electric (0.22) and many others (cf. Appendix B).
The data in Fig. 34 reflect the introduction of different perspectives on cloud computing.
The phase is distinguishable from both the infancy phase, in which no anchoring
activities are observed, and from the subsequent phase. The end of the phase is marked
by the disappearance of a high number of anchors in October 2008, and by the change
of the dissimilarity pattern from peaking (due to the introduction of different
perspectives) to more ‘stable’ in the next phase.
The collaboration process during the phase is marked by the steadily increasing interest
that occurs before the first peak in August 2008 (cf. Fig. 35). The centralisation trend of
the collaboration between April 2008 and August 2008 together with the rising editing
activity mirrors the competition between the technological, sub concept and example
perspectives that is observed in this time frame. The disagreement between the
contributors is also confirmed by the rising edit-war level in this period.
In a nutshell, the time period between September 2007 and October 2008 consists of the
initial search for different ways by which the phenomenon of cloud computing can be
made more tangible. A social representation, which was previously solely dependent
upon utility computing, starts to acquire new characteristics through the new anchors in
different categories. It is a period characterised by attempts to find a distinct
representation of cloud computing that is more independent than the one borrowed from
utility computing in the infancy phase. The anchors from the technological perspective have
their strongest position in this phase.
There is a group of cloud computing anchors from the previous phase that has not yet
been elaborated upon. The group comprises generalising aspects of cloud computing. The
phase between November 2008 and February 2010 is marked by the ‘generalising core’
that starts to emerge against the background of the disappearing perspectives of the last
period.
The establishment of the strong generalising perspective had already begun in 2008.
During this year, cloud computing was anchored in terms of internet (0.72 – the
strongest anchor in 2008), where cloud is used as a synonym for the internet73, in terms
of computing (0.53) as a particular style of computing in this case74 and in terms of web
2.0 (0.47). Anchors from this perspective remain strong over the entire anchoring period
that is analysed in the case study. In the year 2009, the representation of cloud
computing became even more generalised through the introduction of three strong
anchors belonging to the category of “Figurative aspects of cloud computing”. The first
anchor is metaphor (0.69), where cloud is understood as a metaphor for the complexity
of the internet75. The second anchor in this category is abstraction (0.66), where cloud is
an abstraction from the internet complexity. In a similar vein, the computer network
diagram (0.99) anchor is nothing more than another metaphor for the internet.
The generalising anchors appear to be responsible for the decline of other more concrete
perspectives that were construed in the previous phase. Given the strong position of the
anchors internet and computing, the core meaning of cloud computing can be
generalised to “calculating something in the internet”. For such a generalised
phenomenon it is difficult to define a more specific scope. A concrete example for the
effect of anchoring cloud computing in terms of generalising social representations can
be observed in the year 2008. Besides the specific software as a service anchor, a more
general one – everything as a service (0.52) – is introduced in order to account for other
types of computing in the internet. This thus facilitates a representation that is coherent
with the generalisation level already set by anchors such as internet and computation.
The anchoring statistics in this period reveal an interesting observation. While the
number of edits reaches its peak in this period, the dissimilarity of anchoring (Fig. 36) is
around 30%, which is relatively low when compared to the previous periods. Similar
effects, indicating a decrease in the intensity of the anchoring process, can be observed in
the anchor durability graph. It appears that after the intensive anchoring phase that
73
http://en.wikipedia.org/w/index.php?title=Cloud_computing&oldid=190252349
74
http://en.wikipedia.org/w/index.php?title=Cloud_computing&oldid=227985578
75
http://en.wikipedia.org/w/index.php?title=Cloud_computing&oldid=283087883
occurred in response to the novelty of the cloud computing phenomenon, this phase
introduces relative, temporary anchor stability.
The most convincing explanation for this temporary relative stability is a widening of the
scope of the phenomenon. Attempts to anchor cloud computing in terms of a closed set of
anchors within the technological, sub concept, example or usage perspectives are no
longer observed in this timeframe. Instead, during the period, generalising anchors such
as computer network diagram (0.99), data (0.99), and software (0.99)76 gained strength.
The phase between November 2008 and February 2010 can thus be seen as the period that
established the generalising nature of the cloud computing phenomenon for the upcoming
years. This will have significant implications for the subsequent anchoring
process. The effect will be demonstrated in the remainder of the case study.
76
Anchors data and software change their context to “points of departure” in this period.
The increase in the number of anchors in this period to 80, and decrease in the average
anchor durability, indicate that a more intense anchoring process is once again
occurring. However, since the generalising perspective remains strong, all anchor
changes occur against the background of the anchors that have been preserved from the
generalising perspective. This results in the dissimilarity measure showing the lowest
values in this period (Fig. 37).
Examination of individual anchors provides qualitative support for the observed data.
The example perspective is once again added to the cloud computing representation.
Every example is a step towards familiarising the generalising social representation of
cloud computing. In the year 2010, the list of corporations implementing cloud
computing and the corresponding solutions includes: Google (0.6), salesforce (0.52),
Microsoft (0.46), IBM (0.45), vmware (0.2), amazon web services (0.39) and many
others (cf. Appendix B). The usage perspective additionally regains strength. Two
anchors, service level agreement (0.84) and quality of service (0.84) from the benefits
In summary, this phase is characterised by the search for suitable ways through which
the social representation of cloud computing can be made more tangible, while
simultaneously preserving its generalising nature. Attempts to find more specific
representation for cloud computing include the use of a high number of examples and an
emphasis on the user benefits.
Collapse and reestablishment of the generalising core (Mar 2011 – Jan 2012)
The attempt to combine the anchoring of the phenomenon in both generalising and
specific terms in the previous phase fails. Consequently, the ‘generalising core’ begins
to collapse. Anchors such as internet (0.35), computing (0.54), metaphor (0.22),
computer network diagram (0.22), abstraction (0.23) and paradigm shift (0.00) lose
their strength in this period. The anchoring changes completely several times between
March 2011 and August 2011. Figure 38 illustrates the effect very clearly.
It is however not only generalising anchors that are affected during the collapse. In this
period, every perspective present in the representation is threatened. Thus, the anchoring
in September 2011 starts ‘from scratch’ with a new attempt to redefine cloud
computing. Consequently, the generalising core is once again restored by the end of the
phase. The resulting anchors by the end of the period are illustrated in Fig. 39. All of
them, except utility computing, belong to the generalising perspective.
The domination of a single generalising perspective continues until January 2012. The
new phase is characterised by repeated attempts to combine the generalising
perspective with more specific ones. This phase is therefore analogous to the first
concretisation attempt. It appears that cloud computing, when anchored in terms of only
In the period from June 2012 until March 2013, the sub concept perspective dominates
the social representation of cloud computing. The observed changes in this period
appear to be a “desperate” attempt to list all the different forms of cloud computing:
Even business process – a term that means much more than the sum of its
non-obligatory IT components – is included among the types of service that cloud computing
can provide79.
In a nutshell, the second concretisation attempt is marked by three aspects. First, it is the
strengthening of the generalising perspective of social representation. Second, it is the
attempt to anchor cloud computing in terms of all possible forms of service delivery,
and thus a strong sub concept perspective. Finally, it is the introduction of a mobile
aspect by including the mobile app anchor in the usage perspective, as well as an emphasis
on the additional means of accessing cloud computing and its resultant benefits.
77
http://readwrite.com/2010/03/16/mobile_app_marketplace_175_billion_by_2012
78
http://en.wikipedia.org/w/index.php?title=Cloud_computingt&oldid=488792493
79
This anchor has a strength below the threshold.
A final observation of the strongest anchors in the first months of 2013 is insightful. In
addition to the utility computing (1.00) anchor, which has influenced the social
representation of cloud computing from the beginning of its evolution, Fig. 41 illustrates
two dominating perspectives. The generalising perspective is the strongest perspective
in both the present period and previous periods. The usage perspective, on the other
hand, is a temporary attempt to make the representation of cloud computing more
tangible. The appearance of the anchor business model is symbolic. It demonstrates the
movement of cloud computing from its underlying technological background, towards
means of doing business in general. This is coherent with the generalising nature of the
basic characteristics such as internet and computing that are assigned to cloud
computing by anchoring it in terms of corresponding concepts.
As outlined several times, the strongest anchors for cloud computing reflect the
generalising nature of the phenomenon. Nevertheless, throughout the anchoring process
it is apparent that the high level of generalisation in the social representation of cloud
computing, which can be simplified to “computing something in the internet”, does not
satisfy the social group of Wikipedia users. There have been numerous attempts
throughout the history of cloud computing social representation on Wikipedia to
establish more specific anchors. All of them appear to have failed in the long term due
to incompleteness and a resulting lack of coherence between the introduced anchors and
the more generalising ones.
Consequently, it is logical to suggest that the anchoring process for cloud computing
will remain intense, and that it will be marked by successive alternations between
more and less generalising anchors in terms of which the phenomenon of
cloud computing is anchored.
The objectification process began six months after the beginning of the anchoring
process. The first relevant article that is identified is a revision of the cloud applications
article dated 05.09.2007, in which cloud computing was referenced as a platform on
which to run cloud applications80. The exact date of the last reference to the cloud
computing article is difficult to identify, although it is known that there were 367 new
references to cloud computing in 2013.
80
http://en.wikipedia.org/w/index.php?title=Cloud_Applications&oldid=155846389
81
As of 05.05.2013
The subject of the second case study is the social representation of iPad on Wikipedia.
The iPad is a tablet computer that was originally introduced in January 2010 by the
American IT Corporation ‘Apple Inc.’. In terms of the social representations theory, the
iPad has shifted from being an unfamiliar phenomenon to, arguably, the symbol of a
new market. Companies such as Samsung, HTC, Motorola, RIM, Sony, HP, Microsoft,
Archos and many others have entered this market with their tablet computers82. The aim
of the case study is to analyse the change in the representation of the iPad since its
introduction.
The structure of this section is similar to that of the first case study. The general
description of the iPad device is provided in the next section. Sections 5.2.2 and 5.2.3
contain the anchoring and the objectification analysis, respectively.
82
http://smartmediatech.in/?page_id=61, accessed 11:15 2013.06.22
83
iPad definition as of 18 June http://en.wikipedia.org/w/index.php?title=IPad&oldid=560439228
An iPad can shoot video, take photos, play music, and perform Internet functions
such as web-browsing and emailing. Other functions—games, reference, GPS
navigation, social networking, etc.—can be enabled by downloading and
installing apps; as of 2013, the App Store offered more than 800,000 apps by
Apple and third parties.[14] […]
English Wikipedia, formatting in original
According to this definition, the iPad is more than a single product. Rather, it is a line
of tablet computers, which is defined by entertainment, social networking and
navigational functions. The fact that different iPad versions are used as anchors for the
phenomenon is particularly interesting when considering it in terms of the theoretical
perspective. It shows a high objectification level of the phenomenon and therefore
supports one of the central stances of the social representations theory. The latter states
that, through the process of forming social representations, objects become part of the
social reality (Moscovici 1988, p.214). In the case study of cloud computing, the focus
was more on dynamic changes in the anchoring process, rather than on the process,
through which a representation becomes a part of reality. Consequently, the iPad case
study is valuable in understanding the transition of objects from being unfamiliar to
becoming ‘iconic’ (Moscovici 2000/1984, p.49).
As in the cloud computing case study, the evolution of the iPad social
representation is concurrent with the evolution of the phenomenon outside of Wikipedia.
Figures 43 and 44 sketch the same time frame, during which the Wikipedia article
underwent editing and Google search queries were initiated. The value of 100 in both
graphs represents the highest number of edits and search queries, respectively.
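The scaling used for both graphs, in which the highest observed value is mapped to 100, can be sketched as follows. The function name standardise and the monthly counts are hypothetical, and the sketch assumes that every other value is expressed as a proportion of the peak.

```python
def standardise(series):
    """Rescale a monthly series so that its highest value becomes 100.

    `series` maps a period label to a non-negative count
    (for example edits or search queries per month).
    """
    peak = max(series.values(), default=0)
    if peak == 0:
        return {label: 0.0 for label in series}  # nothing to scale
    return {label: 100.0 * value / peak for label, value in series.items()}


# Hypothetical monthly edit counts, not taken from the iPad article history.
edits = {"2010-01": 200, "2010-02": 50, "2010-03": 100}
print(standardise(edits))
```

Scaling both series in this way makes their shapes comparable even though edit counts and search volumes differ by orders of magnitude.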
Fig. 44 Interest for the iPad Search Term According to Google Trends 84
The fact that the iPad is a physical device allows such events as its release and
announcement dates to be related to the distribution of article edits. Figure 45 illustrates
the monthly number of edits for the iPad article together with six important events that
occurred within the period.
The editing pattern differs from the one observed in the cloud computing
article. The latter consisted of a long increasing trend towards the peak, and a
decreasing trend following that. The peak in editing activity for the iPad article, by
contrast, is observed at the beginning. This appears to reflect the increase in interest
toward the phenomenon after Apple’s official announcement of the release date
for the iPad in January 2010 (1). The subsequent increases in editing activity
correspond to further release and announcement dates for related Apple products (2-6).
84
Source: http://www.google.com.au/trends/explore?q=iPad#q=iPad&cmpt=q,
The effects from these events are also observed in the number of views for the iPad
article (Fig. 46). For example, the initial interest for the iPad phenomenon leads to a
higher number of views for the article, when compared to the ‘Facebook’ article.
The average number of views for the iPad article is about 11.4% of the respective value
of the most popular article from the year 2012. According to the assumption made in
5.1.1, the popularity of the iPad article on Wikipedia is considered to be high.
As in the cloud computing case study, Fig. 47 illustrates the collaboration process
for the iPad article. This includes data about the number of edits per month (orange),
the number of contributors per month (blue) and the combined monthly measure edits per
editor (green). The data is the foundation for the interpretation of the anchoring in terms
of the changes made in the collaboration process.
Fig. 47 Standardised Monthly Edits, Editors and Edits per Editors Statistics
for the iPad Article
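The three series shown in Fig. 47 can be derived along the following lines. This is a minimal sketch under the assumption that "standardised" refers to a z-score transformation (subtracting the mean and dividing by the standard deviation); the thesis does not restate the formula in this section, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def edits_per_editor(edits, editors):
    """Monthly ratio of edits to distinct editors (0.0 for months without editors)."""
    return [e / n if n else 0.0 for e, n in zip(edits, editors)]


def standardise(values):
    """Z-score transformation (an assumed reading of 'standardised')."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation to standardise.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical monthly figures (not actual iPad article data).
edits = [10, 30, 20]
editors = [5, 10, 10]
print(standardise(edits_per_editor(edits, editors)))
```

Standardising all three series places them on a common scale, so that months in which edits per editor rise above the other two curves stand out visually.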
The collaboration process for the iPad article appears to be less dynamic than the
collaboration process for the cloud computing article. The centralisation phases (marked
green in the graph), which coincide with an increasing interest, appear to correspond with the
aforementioned release and announcement events for the Apple products. With the
exception of these peaks, the collaboration intensity decreases and remains on a
relatively similar level in the time frames between the events. In the following
anchoring analysis, the correspondence between centralisation phases and changes in
the anchoring will be verified.
The anchoring time frame ranges from the first revision of the iPad article dated 26th of
December 2009 until 20th of June 2013, the date on which the anchor analysis was
conducted. The structure of the section is similar to the cloud computing study. First,
statistics for the iPad anchors are introduced. Second, the anchors identified by the tool
are coded in section 5.2.2.2. Finally, the statistics and the anchor categories are
employed in section 5.2.2.3 to reconstruct the evolution of the iPad’s social
representation on Wikipedia.
Similarly to the cloud computing study, the employed anchor statistics include new and
obsolete anchors in each period, the dissimilarity between anchoring states, the average
anchor durability in different time periods and the anchor edit war level for each
period. Since the anchor evolution interpretation in section 5.2.2.3 is focused on the first
two measures, they are introduced below. The remaining anchor statistics
can be found in Appendix G.
The analysis of new and obsolete anchors for the iPad social representation on
Wikipedia reveals several time periods containing significant anchor movements (Fig.
48). The first period is represented by the months immediately after the article was
created. A high anchor movement corresponds to the high editing activity during the
period. Furthermore, the months October and December in the year 2012 contain a
higher number of anchor introductions. The qualitative nature of these changes will be
explored in section 5.2.2.3. The anchor movements are also used to identify different
phases in the evolution of the social representation of the iPad.
Fig. 48 Amount of New and Obsolete Anchors for iPad per Month
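The notion of new and obsolete anchors can be sketched as a comparison of two consecutive anchoring states. The sketch assumes that an anchoring state is simply the set of anchors present in a given month; the function name is hypothetical and the example anchors are chosen for illustration only.

```python
def anchor_movements(previous, current):
    """Return the anchors introduced and the anchors dropped between two periods."""
    previous, current = set(previous), set(current)
    return {
        "new": current - previous,       # present now, absent before
        "obsolete": previous - current,  # present before, absent now
    }


# Illustrative anchoring states for two consecutive months.
month_a = {"tablet computer", "laptop", "smartphone"}
month_b = {"tablet computer", "ipad mini"}
print(anchor_movements(month_a, month_b))
```

Counting the elements of both sets per month yields the kind of series plotted in Fig. 48.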
Anchor dissimilarity
The anchor dissimilarity across subsequent time periods in Fig. 49 indicates that the
anchoring of the iPad social representation is more stable than that observed in the
cloud computing case study. Apart from the anchoring occurring in the first four
months, and the peak in November 2012, the dissimilarity measure predominantly
remains below 20 percent. This means that the differences in the anchoring in these
periods are rather small. Even without a qualitative analysis on the level of single
anchors, it is apparent that the social representation of the iPad cannot have experienced
significant change within those periods. For example, the transition from July 2011 to
August 2011 corresponds to an anchor dissimilarity measure of 0.01. Consequently, only
one percent of the anchoring was changed, while 99 percent of the anchors remained for the
same amount of time in the article. However, the periods with a high anchor
dissimilarity require a qualitative analysis in order to reveal the exact nature of the
corresponding change.
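To make the reading of the dissimilarity measure concrete, the following minimal sketch computes a set-based dissimilarity between two anchoring states. It assumes a Jaccard distance over anchor sets; the measure actually implemented in WikiGen may be weighted differently (for example by the time an anchor remains in the article), and the function name and example anchors are illustrative only.

```python
def anchor_dissimilarity(state_a, state_b):
    """Jaccard distance between two anchor sets: 0.0 = identical, 1.0 = disjoint."""
    a, b = set(state_a), set(state_b)
    union = a | b
    if not union:
        # Two empty anchoring states are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


july = {"tablet computer", "multi-touch", "iphone"}
august = {"tablet computer", "multi-touch", "ipod touch"}
print(anchor_dissimilarity(july, august))
```

Under this reading, a value of 0.01 means that the two anchoring states are almost identical, whereas values close to 1 indicate that nearly all anchors were exchanged.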
In comparison to the cloud computing case study, the anchoring of the iPad social
representation appears to be less dynamic. The reasons behind this will be
explored in the subsequent qualitative analysis. However, it is reasonable to assume
that, because the younger iPad phenomenon is a physical object, it is easier to compare
with other objects in terms of both specifications and functionalities. The iPad
would therefore appear more familiar to the social groups from the
beginning. This is especially plausible given that at least some of the iPad’s
characteristics are attributable to its concrete technical specifications, something
that cloud computing, as a jargon term and a business model, does not possess. The
qualitative anchor analysis will help to verify this assumption.
A total of 143 anchors for the iPad social representation on Wikipedia were identified
by the WikiGen tool. Fifty anchors were disregarded due to their anchor strength falling
below the chosen threshold of 0.10. The context of the remaining 93 anchors was
analysed according to the defined procedure in section 3.4.2. As a result, 10 anchors
were removed and 20 anchors were merged. The majority of the removed anchors are
links to organisations such as “PC World” magazine or “New York Times”, which
published documents sourced in the article. They therefore serve as anchors for the
corresponding documents, rather than for the iPad device. The merged anchors
comprise anchors that differ only in their spelling, such as “multitouch” and
“multi-touch”. Appendix D provides the full list of deleted and merged anchors.
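The two automatable parts of this selection, the strength threshold and the merging of spelling variants, can be sketched as follows. The sketch is illustrative only: the normalisation rule (ignoring case, hyphens and spaces) and the choice to keep the larger strength when merging are assumptions, the function names are hypothetical, and the context-based removal described above remains a manual step.

```python
THRESHOLD = 0.10  # anchor strength threshold stated in the text


def normalise(name):
    """Collapse spelling variants by ignoring case, hyphens and spaces (assumed rule)."""
    return name.lower().replace("-", "").replace(" ", "")


def filter_and_merge(strengths, threshold=THRESHOLD):
    """Drop anchors below the threshold, then merge anchors with the same normalised name."""
    merged = {}
    for name, strength in strengths.items():
        if strength < threshold:
            continue  # anchor disregarded
        key = normalise(name)
        # Keep the first spelling seen; keep the larger strength (assumption).
        if key not in merged or strength > merged[key][1]:
            label = merged[key][0] if key in merged else name
            merged[key] = (label, strength)
    return {label: strength for label, strength in merged.values()}


# Hypothetical strengths (not actual WikiGen output).
example = {"multitouch": 0.30, "multi-touch": 0.20, "pixel": 0.05, "tablet computer": 0.90}
print(filter_and_merge(example))
```

Applied to the figures reported above, the threshold step corresponds to the reduction from 143 to 93 anchors.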
Products and companies that compete with iPad and Apple respectively (1)
Origins of the iPad (2)
Technical aspects of the iPad (3)
Products and technologies that are similar to the iPad (4)
Use cases for the iPad (5)
1. The category “Products and companies that compete with iPad and Apple
respectively” (Competitors) contains five anchors, which point to Apple’s competitors
and their iPad-similar devices. The first two anchors are amazon and its eBook reader
‘kindle’85. The third anchor ‘barnes & noble nook’86 is another e-reader device
developed by the American book retailer Barnes & Noble. Furthermore, the tablet computer
and stylus (computing) anchors87 point to a tablet computer definition by Microsoft,
which covers the majority of potentially competing tablet computers of that time.
2. Five anchors in the category “Origins of the iPad” (Origins) refer to companies,
people and events associated with the development and marketing of the device. Three
anchors in this category, entitled Steve Jobs88, apple inc89 and macintosh90, establish a
link to the Apple Inc. corporation. Additionally, the iPad is anchored in terms of its
manufacturer foxconn and in terms of the conference center “yerba buena center for the
arts”91, which hosted the iPad introduction event.
3. Similarly to the cloud computing case study, the category “Technical aspects of the
iPad” (Technical aspects), which comprises 18 anchors, represents the device’s
specifications. One group of the anchors, including wi-fi, 3g and cellular network92,
focuses on the iPad’s ability to access computer networks. Anchors such as dock
connector93, pixel, and gigahertz94, on the other hand, shed light on the remaining
technical aspects of the iPad.
4. The majority of the fifteen anchors that comprise the category “Products and
technologies that are similar to the iPad” (Similarities) represent similar electronic
devices. These include laptops95, tablet computers96 and smartphones97. Also included in
this category are similar Apple devices such as iPod98, iPhone99, and different versions
of the iPad itself.
5. The largest category, entitled “Use cases for the iPad” (Usage), contains 23 anchors.
These anchors represent aspects regarding the usage of the iPad device. Examples of
anchors in this category are virtual keyboard100, video camera, video game, social
85 Cf. iPad revision from 20:05, 13 February 2010
86 Cf. iPad revision from 01:42, 23 March 2010
87 Cf. iPad revision from 18:53, 17 April 2010
88 Cf. iPad revision from 20:59, 30 October 2011
89 Cf. iPad revision from 18:53, 17 April 2010
90 Cf. iPad revision from 23:52, 24 January 2012
91 Cf. iPad revision from 02:04, 27 December 2009
92 Cf. iPad revision from 18:46, 17 April 2010
93 Cf. iPad revision from 20:05, 13 February 2010
94 Cf. iPad revision from 20:05, 13 February 2010
95 01:28, 16 April 2010
96 00:07, 28 January 2010
97 13:21, 8 March 2010
98 01:35, 3 February 2010
99 22:55, 1 March 2010
100 15:49, 28 January 2010
network service and many others101. The context of the anchors is manifold and,
additionally, is subject to change during the analysis time frame. The latter is part of
the following narrative analysis that describes changes in the anchoring of the iPad
social representation on Wikipedia.
The overall anchoring time frame consists of distinct phases; these can be separated
from each other according to the different patterns observed in those phases. However,
the social representation of the iPad differs from that of cloud computing in that it
contains anchors that ‘survived’ almost the entire anchoring period. Therefore, these
anchors will be introduced before the reconstruction of changes, which the iPad
representation experienced in the different phases of its anchoring.
Three iPad anchors that have remained unquestioned by the Wikipedia users are tablet
computer, Apple Inc and multi-touch. The reasons for the strong position of these
anchors are apparent. While the corporation name is a historical and legal fact, tablet
computer and multi-touch anchors possess the ‘iconic quality’ for the representation
(Moscovici 2000/1984, p.49). Both refer to the physical form of the device, and the
physical means of interacting with it. The image of a human holding or touching a flat
electronic device is likely the most immediate and vivid image associated with the
iPad device. This is not least due to Apple’s marketing campaigns, which often
incorporate this kind of imagery. Figure 50 illustrates the title image of the iPad TV
Ads section on the Apple website102.
Fig. 50 Title Picture of the iPad Ads Section on Apple’s Website
101
22:16, 23 October 2012
102
http://www.apple.com/au/ipad/videos/#tv-ads-together, accessed on 23rd of June 2013
In contrast to the above anchors, the remaining anchors are subject to change and are
therefore analysed in the context of their corresponding time frame in the anchoring
process. A summary of WikiGen statistics for each evolution phase of the iPad social
representation on Wikipedia is provided in Appendix G.
The infancy phase for the iPad social representation is remarkable as it is based purely
on rumours and speculations about a new Apple device. The first article revision103 is
dated 26th of December 2009, while Apple’s earliest announcement about the actual device
was made on 27th of January 2010. In this early stage, the iPad article had a different name –
iSlate. This name was based on the rumours within the Wikipedia community and was
changed to iPad after the official announcement. It is important to note that neither iPad
devices nor official statements about the device had been made available to the
community at this time. Yet the Wikipedia users were able to form a social
representation of a non-existent device. This emphasises the nature and the power of
social representations to form a social reality, as opposed to objectively reflecting the
world ‘out there’ (Moscovici 1990, p.164).
The strongest anchors within the infancy phase are introduced in what follows. Based
on rumours, the Wikipedia community assumed that apple inc. was producing a new
tablet pc with the help of the manufacturer foxconn. The ‘iSlate’ device was supposed to be
presented in San Francisco at the yerba buena center for the arts. Its function was
supposed to be similar to an already existing ebook reader entitled barnes & noble nook,
with the prime difference being that the content would be purchased via Apple’s app
store.
The end of the infancy phase is clearly marked by the official Apple announcement,
which provided the community with technical specifications and typical use case
scenarios for the upcoming device.
Similarly to the cloud computing study, the phase following the infancy of the
representation is characterised by the introduction of different anchor groups. The aim
of this phase was to overcome the unfamiliarity felt towards the new iPad phenomenon
by approaching it from different angles. Within the next seven months, anchors from all
five groups appeared for a limited amount of time. This is reflected in the
103
http://en.wikipedia.org/w/index.php?title=IPad&oldid=334876628
high number of introduced and removed anchors, as well as in the high dissimilarity
measure. Figure 51 highlights the anchor dynamics in this phase.
The initially high unfamiliarity level of the iPad phenomenon is also observed from the
corresponding collaboration dynamics. In the cloud computing case study, it was
emphasised that a high level of unfamiliarity appears together with an increase in the
intensity of the collaboration. Furthermore, in such cases, the rise in the editing activity
is not proportional, in the sense that a high increase in the edits is followed by a smaller
increase in the number of editors. In other words, the unfamiliarity of the phenomenon
on Wikipedia is accompanied by the centralisation of the collaboration processes with a
simultaneous rise of interest. Similarly, the initial unfamiliarity associated with the new
iPad device is accompanied by the aforementioned changes in the collaboration process.
Figure 52 illustrates a relatively high level of the centralisation in this phase using the
standardised data for edits, editors and edits per editor.
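The disproportionate rise described above can be operationalised in a simple way. The following sketch flags a month as centralised when the number of edits increases and the edits-per-editor ratio increases at the same time; this rule is an assumption made for illustration and is not necessarily the criterion used to mark the phases in the figures. The function name and the sample data are hypothetical.

```python
def centralisation_months(edits, editors):
    """Return indices of months in which edits rise faster than the number of editors."""
    flagged = []
    for i in range(1, len(edits)):
        if editors[i - 1] == 0 or editors[i] == 0:
            continue  # ratio undefined without editors
        previous_ratio = edits[i - 1] / editors[i - 1]
        ratio = edits[i] / editors[i]
        # More edits overall AND more edits per editor than in the month before.
        if edits[i] > edits[i - 1] and ratio > previous_ratio:
            flagged.append(i)
    return flagged


# Hypothetical monthly figures (not actual iPad article data).
edits = [100, 400, 420, 200]
editors = [50, 80, 140, 100]
print(centralisation_months(edits, editors))
```

In the hypothetical data only the second month is flagged: edits quadruple while the number of editors grows by less than a factor of two, so the work is concentrated among fewer contributors per edit.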
In the first months after the official announcement (February - March), the
representation of iPad was dominated by anchors from the technical aspect category.
Technical aspects in this period include anchors such as pixel, gigabytes flash memory,
Bluetooth 2.1, apple a4 and a number of anchors for the communication technologies
that have been integrated into the device.
Related to anchors from this group are those from the usage category. The first anchors
in this category are focused around the ebook use case scenario, which had already
acquired familiarity at that time. The anchors in this category between February and
July include ebook, ibooks and ibookstore. This pattern of anchoring an unknown
phenomenon in terms of already known products and usage scenarios is in full
accordance with the social representations theory.
In a similar vein, anchors from the similarities category in this phase include devices
such as iphone and ipod touch, which had already existed for a longer period of time.
Two other anchors in the category, entitled smartphone and laptop, emphasise the
position of the iPad somewhere in between these two types of devices. This highlights the
dependency of the iPad social representation on already established representations.
The end of the phase is marked by a transition to a more stable anchoring following the
initial attempts at familiarisation. Figure 53 illustrates the dissimilarity measure of the
last month of the period (August 2010) to preceding and subsequent months. While the
dissimilarity to the previous months is rising, the measure for the subsequent months
indicates that about 70% of the anchoring remains stable in the following months.
104
Cf. iPad definition from 23:16, 17 April 2010
http://en.wikipedia.org/w/index.php?title=IPad&direction=next&oldid=356674575
105
Cf. Tablet PC definition from 23:01, 17 April 2010
http://en.wikipedia.org/w/index.php?title=Tablet_computer&direction=next&oldid=355863073
In a nutshell, the time period from February 2010 until August 2010 is characterised by
initial attempts to familiarise the unknown concept of the iPad. Anchoring attempts
range from stating the technical specifications of the device, to emphasising the
competitive and similar devices, and lastly to usage scenarios. Anchors from the last
three categories focus on already established representations, such as an old definition
of a tablet computer by Microsoft, competing devices such as Amazon’s kindle, and
usage scenarios such as ebook reading.
The previous phase resulted in a set of anchors that appear to be accepted by Wikipedia
contributors for the period between September 2010 and September 2012. In this
representation, the majority of technical anchors had disappeared. Rather, the
representation is dominated by anchors from the usage and similarities categories.
Figure 54 illustrates the strongest anchors in this phase.
The effect of the almost permanent presence of these anchors in the article is shown in
Fig. 55. During the period, the dissimilarity measure falls below 25% meaning that
more than 75 percent of the anchoring remains stable from period to period.
Consequently, only minor anchor fluctuations are observed during this time frame.
The name of the phase (‘objectification’), however, indicates that an intense objectification process
takes place throughout the period. This was derived from two observations. First, as
demonstrated in the subsequent section, this period contains a growing amount of
articles that reference the iPad social representation. Second, the end of the phase
indicates a major change towards a more independent representation of the iPad device.
The last period in the anchoring process of the iPad social representation on Wikipedia
can be described as a process of individualisation. During this process, the
independence of the iPad phenomenon from its roots becomes apparent. The
corresponding changes are explained in the following.
The phase starts with a higher number of new anchors being introduced to the article
(Fig. 51). The majority of the new anchors represent the different generations of iPad
devices. In this way, the iPad representation is now anchored in terms of different iPad
versions such as iPad1, iPad2, iPad3 and iPad mini. Anchors from the similarities
category take over the role that concepts such as iphone, ipod, laptop and smartphone
had in the previous phases106. More importantly, the social representation of the tablet
computer itself is changed. In its current definition on Wikipedia, the tablet computer is
no longer associated with an old “stylus-based Microsoft definition”. Instead, the
representation is anchored in terms of the iPad itself.
106
Anchors laptop and smartphone disappeared in October 2012
Among tablets available in 2012, the top-selling line of devices was Apple's iPad
with 100 million sold by mid October 2012 since it had been released on April 3,
2010 […]
English Wikipedia107, formatting in original
Further indications of a growing independence of the iPad representation are hidden in
a number of small but important details in the anchoring evolution. First, within the
usage category, a number of anchors such as jailbreak, itunes, file synchronisation and
usb previously represented limitations in iPad use. Jailbreak was viewed as a
necessary means to access additional, unofficial iPad features. With the growing
number of software applications for the iPad, the jailbreak anchor lost its importance in
this period. The growing range of software products for the iPad is also reflected in the
appearance of anchors that symbolise different usage scenarios. These include
camera video, portable media player, social network service, reference work and gps
navigation software. Moreover, the usage range appears to be extended by the
disappearance of anchors such as iTunes, usb and file synchronisation. These anchors
stand for requirements to operate the device that were eliminated in one of the iPad
software updates. Changes in the usage and the previously described similarities
category are illustrated in Figure 57.
107
Tablet computer definition on Wikipedia
http://en.wikipedia.org/w/index.php?title=Tablet_computer&oldid=561068568 , accessed 23.06.2013
Further support for the individualisation process in the anchor evolution of this time
period is provided by the fact that anchors from the competitors group are no longer
observed. A comparison with other devices is rendered unnecessary for an objectified
representation of the iPad.
In summary, the social representation of the iPad does not fall in between those of a
laptop and a smartphone, nor is it a device that is considered to be similar to the iphone
or ipod. Instead, it appears to be a distinct device with its own use cases and its own
tablet computer market that is defined in terms of the iPad device itself. It appears that
the job of the marketing department of a corporation such as Apple Inc. is to
objectify its products as quickly and effectively as possible so that they become a part of
the social reality. Given the iPad’s rather short yet successful history, Apple’s
marketing department appears to have achieved this goal. A further
explanation for the rather fast objectification process can be
found in the physical nature of the phenomenon. The iPad is an electronic device
designed to be touched, to be seen and to be heard. It induces sensations that
distinguish it from abstract phenomena such as cloud computing.
The objectification analysis conducted with the help of the WikiGen tool revealed a total
of 2817 pages on Wikipedia that reference the iPad social representation.
Of these, 894 are not encyclopaedia articles and are therefore not accounted for in the
distribution of the remaining 1923 articles in the main namespace of Wikipedia.
Figure 58 illustrates the number of references over time since the creation of the iPad
article.
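The counting behind such a curve can be sketched as follows. The example assumes that each referencing page is available as a tuple of title, namespace id and the year in which the reference first appeared; this data layout and the function name are hypothetical and do not describe WikiGen’s actual output. Namespace 0 is the MediaWiki id of the main (article) namespace.

```python
from collections import Counter
from itertools import accumulate

MAIN_NAMESPACE = 0  # MediaWiki id of the main (article) namespace


def cumulative_main_namespace_references(pages, years):
    """Cumulative number of main-namespace pages referencing the article, per year."""
    per_year = Counter(year for _, namespace, year in pages if namespace == MAIN_NAMESPACE)
    return list(accumulate(per_year.get(year, 0) for year in years))


# Hypothetical referencing pages: (title, namespace id, year of first reference).
pages = [
    ("A", 0, 2010),
    ("B", 0, 2011),
    ("Talk:A", 1, 2011),  # not an encyclopaedia article, ignored
    ("C", 0, 2011),
    ("D", 0, 2012),
]
print(cumulative_main_namespace_references(pages, [2010, 2011, 2012]))
```

Restricting the count to the main namespace mirrors the reduction from 2817 referencing pages to the 1923 articles considered in the distribution.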
In accordance with the results of the anchoring analysis, there is a significant increase in
the number of references in the years 2011 and 2012. This period was identified as an
‘objectification’ phase. The tendency is still increasing, indicating that there is an
ongoing objectification process through which the iPad becomes manifested in the
social reality of the Wikipedia users.
6 Discussion
The cloud computing and iPad case studies in the previous section demonstrated the
developed method for studying the evolution of social representations on Wikipedia.
The aims of the discussion section are to answer the research question, and to elaborate
upon the identified similarities and differences in the evolution of both analysed social
representations.
This work answers the research question of how the evolution
of social representations can be studied on Wikipedia by developing a corresponding
method. The method is based on three essential steps. First, a mapping between
concepts of the social representation theory and Wikipedia elements, such as the one
presented in section 3.3, is required. Second, the employment of a statistical tool, such
as the WikiGen tool presented in section 4, is essential to process thousands of article
revisions that are available on the platform. Finally, a qualitative analysis on the level of
identified anchors is required to capture the semantics of the changes in the social
representation under study. The case studies of cloud computing and the iPad in
section 5 integrated all three of these steps and demonstrated the applicability of the
method for studying the evolution of social representations on Wikipedia.
Figure 59 summarises the evolution of the cloud computing social representation. In the
course of the case study, six subsequent evolution phases were distinguished. Each
phase is characterised by changes in the importance of five identified anchor categories.
The evolution of the cloud computing social representation is marked by the highly
dynamic nature of its anchoring process. In this process, the generalising category
exercises the most important influence on the representation by extending its scope.
According to anchors in this category, cloud computing refers to almost anything that is
calculated on the internet. However, this view is repeatedly challenged by more specific
anchors from the categories of technical aspects of CC, sub concept of CC, CC use
cases and CC solutions and providers anchors. This tension represents two
incompatible views on cloud computing: a general, and a more specific one. Without
more specific anchors, the representation appears to evade the grasp of Wikipedia users.
An inclusion of more specific anchors, on the other hand, appears to be incoherent with
a broader scope of the phenomenon, which is set by generalising anchors. The tension
leads to consecutive alternations between either more or less specific representations of
the CC phenomenon across the different phases of its evolution.
Fig. 59 Evolution Phases and Anchor Perspectives for the CC Case Study
The second case study in this thesis analysed the social representation of the popular
iPad tablet device. The corresponding evolution of the iPad representation is
summarised in Fig. 60. As in the cloud computing case study, distinct phases within
the overall evolution time frame were distinguished, four in this case. Changes in the social
representation of the iPad are described in terms of the varying importance of five
identified anchor categories.
Starting with rumours about the origins of a new tablet device, the social representation
of the iPad steadily moves towards a distinct device with its own use cases and its own
tablet computer market that is defined in terms of the iPad device itself. In the course of
this shift, anchors that represent iPad competitors, as well as the technical specifications of
the device, lose their importance to anchors from the similarities and usage categories. The
latter comprise the variety of different iPad versions and the multiplicity of use cases
that the iPad is increasingly associated with.
Both case studies have indicated similarities and differences in the evolution process of
the cloud computing and the iPad social representation on Wikipedia.
Fig. 60 Evolution Phases and Anchor Categories for the iPad Case Study
In accordance with both Wikipedia research (Kaltenbrunner and Laniado 2012a) and
elaboration upon the social representations theory (Höijer 2011), the analysed articles
are experiencing continuous change. This fact emphasises the dynamic nature of
knowledge and encourages viewing the meaning of an object through the prism of the
continuous social process that forms this meaning. In the case of cloud computing and
the iPad, both representations are initially derived from phenomena that were similar to
cloud computing and the iPad respectively. In the course of the evolution, however, both
representations shift towards becoming more independent. Simultaneously,
both social representations are increasingly anchored in terms of their usage rather than
in terms of their technical characteristics. A similar pattern can be assumed to exist for
other phenomena in the IT field.
The case studies indicate that the unfamiliarity with a phenomenon, as defined in the
social representations theory, leads to a more intense and centralised collaboration
process among the Wikipedia users. Attempts to resolve this unfamiliarity, in turn, are
reflected in the changing anchoring of the corresponding social representation. For
example, the initial high unfamiliarity with the cloud computing and the iPad
phenomena is reflected in a similar phase in the evolution of both representations,
entitled initial familiarisation attempts. During these phases, an intense and centralised
collaboration process among the Wikipedia users is accompanied by significant changes
in the anchoring of the representations. The latter is characterised by the introduction of
all different anchor categories that are associated with the corresponding social
representation, without any group of aspects dominating the others. At the end of this
phase, one or several anchor groups acquire a dominating role. In the cloud computing
case study, the standout category is that of generalising anchors, while in the iPad case
study this role was taken by anchors in usage and similarities categories. This initial
constellation, however, is subject to change, as the cloud computing case study in this
thesis has demonstrated.
The objectification analysis in both case studies showed a similar increasing trend in the
cumulative amount of references among Wikipedia articles to the social representation
under study. This is a particularly valuable observation given the fact that the anchoring
process in both case studies is substantially different. While the anchoring of the iPad is
rather ‘smooth’ and ‘stable’, the corresponding process for cloud computing is
volatile. Yet the social representations of both phenomena increasingly become part of
the social reality. This is indicated by the rising number of other phenomena being
explained in terms of either the cloud computing or the iPad. Figure 61 illustrates the
objectification of cloud computing and iPad social representations on Wikipedia.
It must be noted that some differences between the case studies might be grounded in the fact
that the two representations might belong to different social representation types.
Cloud computing and the iPad are both recent technological phenomena, with the analysed
time frames on Wikipedia being concurrent with the history of both phenomena. However,
the iPad is a physical device, which contrasts with the abstract model of cloud computing.
Furthermore, it can be assumed that Apple Inc. and its most devoted customers have a
stronger personal interest in crafting the representation of the iPad. The relationship
between cloud computing and different social groups on the other hand appears to be
more neutral. Regardless of the correctness of these assumptions, differences in the level
of group interests towards phenomena under study can potentially be a source of
differences in the evolution of the corresponding social representations. A
categorisation of representations in further studies can therefore enable a selection of
more homogeneous social representations in order to study more specific evolution
patterns. In this manner, representations that are more likely to be deliberately
influenced by social groups can represent one possible sample.
The anchoring analysis in the thesis revealed distinct differences in the evolution of the
cloud computing and iPad representations. The cloud computing social representation is
marked by a more fluctuating anchoring process and therefore the representation of
cloud computing appears to be more ambiguous than the representation of the iPad. The
anchoring of cloud computing oscillates between a widening in scope of what cloud
computing means and a narrowing in scope. On the one hand, anchors in the
‘generalising’ category extend the scope of cloud computing by generalising it to
additional possible services or resource delivery modes. On the other hand, this view is
repeatedly challenged by more specific anchor categories. This effect can be observed
in the left part of Fig. 62, where peaks in the anchor dissimilarity of cloud
computing correspond to fluctuations between general and specific perspectives on
cloud computing.
Another difference between the two case studies was observed in the evolution of the
iPad social representation. In the individualisation evolution phase, the iPad
representation is altered due to changes within categories rather than between
categories. In this phase, anchors from the similarities group are replaced by other
anchors from the same group that have changed the focus of the representation. It
implies that changes in the anchoring can happen even if the strength of the anchor
groups remains the same and only anchors within the category are replaced.
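The observation above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the strength of an anchor group is approximated by the number of anchors it contains and that the change between two snapshots is measured with the Jaccard distance. Neither is claimed to be the statistic computed by the tool, and the anchor names (`s1`, `t1`, ...) and categories are hypothetical placeholders rather than data from the case study.

```python
from collections import Counter


def jaccard_dissimilarity(before, after):
    """Return 1 - |A ∩ B| / |A ∪ B| for two anchor snapshots."""
    a, b = set(before), set(after)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


def category_counts(anchors, category_of):
    """Count anchors per category (a stand-in for group strength)."""
    return Counter(category_of[anchor] for anchor in anchors)


# Hypothetical mapping of anchors to categories.
CATEGORY_OF = {
    "s1": "similar", "s2": "similar", "s3": "similar", "s4": "similar",
    "t1": "technical",
}

# Two hypothetical snapshots: both 'similar' anchors are replaced
# by other anchors from the same category.
BEFORE = ["s1", "s2", "t1"]
AFTER = ["s3", "s4", "t1"]

SAME_STRENGTH = (
    category_counts(BEFORE, CATEGORY_OF) == category_counts(AFTER, CATEGORY_OF)
)
DISSIMILARITY = jaccard_dissimilarity(BEFORE, AFTER)
```

In this constructed example the per-category counts are identical in both snapshots, while four of the five distinct anchors differ, so a measure based on category strength alone would not register the change.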
Last but not least, in the iPad case study it was possible to establish a connection
between the anchoring and objectification processes. During the objectification phase,
the anchors of the iPad stayed stable for a longer time period indicating a temporal
consensus. The assumption that an increasing objectification will be observed in this
period was supported by the results of the objectification analysis. For the volatile
anchoring process of the cloud computing SR, no indications for interdependence
between anchoring and objectification were observed.
Method evaluation
Due to the meaningful interpretations provided by the case studies, the method is
considered to be a suitable means for studying the evolution of social representations on
Wikipedia. Some of the method’s advantages are emphasised in the following.
First, the Wikipedia structure with its page concept and transparent collaboration
processes welcomes the application of the social representations theory. The
encyclopaedia articles with their unique names correspond to the definition of social
representations, and the internal links on Wikipedia allow exploration of the
relationship between social representations. Second, the tool enabled analyses in the
case studies. Without the automated anchor identification and statistics, such as the
anchor strength, tracing changes in the social representations on Wikipedia would be
intractable. Finally, the choice of the case studies has proven itself to be useful. The
cloud computing phenomenon demonstrated the dynamic nature of knowledge with the
changing character of its social representation. On the other hand, the iPad social
representation appeared to exist even before the actual device was produced, being
entirely based on rumours and speculations. This emphasises Moscovici’s central
argument that social representations form reality rather than reflect it (1990, p.164).
However, the method also has its pitfalls. First, not all WikiGen statistics have proven to
be equally useful. In the case studies, the anchor durability and anchor edit war level
statistics played only a peripheral role. Second, the mapping of anchors to internal links
in the definition part of the article appeared to be problematic for a number of reasons.
For example, some potentially important concepts, although present in the relevant
section of the article, remain unlinked to the corresponding encyclopaedia entries. Other
potential anchors might be present in the article outside of the definition section. This is
particularly true for controversial phenomena, for which the inclusion of controversial
aspects into the definition section contradicts the encyclopaedia’s requirements of
neutrality and objectivity. Correspondingly, such aspects often migrate into the
specialised sections of the article (Hansen et al. 2009). Furthermore, some aspects of the
representation can be found in the talk pages of the corresponding article (Schneider et
al. 2010). As a consequence, potential anchors in those scenarios are not accounted for
in the case study. To increase the range of identified anchors and therefore to facilitate
analysis of different representations of the same phenomenon, the tool must be
extended.
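A minimal sketch of such an extension is given below. It assumes that the article revision is available as raw wikitext, that internal links use the `[[target|label]]` syntax, and that the definition part ends at the first `==` heading. The function names and the sample text are hypothetical and invented for illustration; the sketch does not reproduce the tool's implementation.

```python
import re

# [[target]], [[target|label]] or [[target#section|label]]; only the target is kept.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# A heading line such as "== History ==" marks the end of the lead section.
HEADING = re.compile(r"^==.*==\s*$", re.MULTILINE)


def split_lead(wikitext):
    """Split wikitext into the lead (definition) part and the remainder."""
    match = HEADING.search(wikitext)
    if match is None:
        return wikitext, ""
    return wikitext[:match.start()], wikitext[match.start():]


def link_targets(wikitext):
    """Return lower-cased link targets in order of first appearance."""
    targets = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip().lower()
        if target and target not in targets:
            targets.append(target)
    return targets


def anchors_by_section(wikitext):
    """Return (anchors in the lead, additional anchors outside the lead)."""
    lead, body = split_lead(wikitext)
    lead_anchors = link_targets(lead)
    body_only = [t for t in link_targets(body) if t not in lead_anchors]
    return lead_anchors, body_only


# Invented sample text, not taken from any Wikipedia revision.
SAMPLE = """'''Cloud computing''' is the delivery of [[computing]] as a
[[Service (economics)|service]] over the [[Internet]].

== Criticism ==
Some authors raise [[Information privacy|privacy]] concerns about the [[Internet]].
"""

LEAD_ANCHORS, BODY_ONLY_ANCHORS = anchors_by_section(SAMPLE)
```

The second list contains candidate anchors that a lead-only mapping would miss. The sketch drops section fragments and would also pick up file or category links, so a real extension would need additional filtering.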
The research is facing the challenge of accounting for the dynamic nature of knowledge.
This challenge historically belongs to the problem space of philosophy (Marková 1996),
and is typically explored within social sciences (Berger and Luckmann 1966). However,
other fields such as information systems increasingly recognise the need to study
knowledge in its making in order to understand group phenomena (Boland 1999). This
epistemological problem of the origins of knowledge “becomes a social problem in the
world of today, with its permanent scientific and technological revolutions” (Moscovici
1988, p.216). Divergent representations of the same objects are observed across both
time and social space. Within this multiplicity, an increasing number of representations
are circulating in the digital world, especially in the realm of social media.
Instead of focusing on the evolution of single representations, this thesis adopted the
theoretical notion of social representations (Moscovici 2008 / 1966) to provide a means
for studying the process of knowledge evolution itself. Having identified the potential of
using Wikipedia’s historical data to study the genealogy of knowledge, the thesis
addressed the question of how the evolution of social representations can be studied on
Wikipedia. The answer to the question is presented in the form of the method for
studying social representations, which integrates a quantitative analysis of Wikipedia
data by means of a specially developed statistical tool into a case study.
The cloud computing and iPad case studies in this work demonstrated the applicability
of the method. The findings of the case studies are manifold. First, the Wikipedia data
provided evidence for the existence of the continuous anchoring and objectification
processes that are defined within the social representations theory. Second, the
quantitative data provided by the tool was found to be essential in order to indicate
changes in the social representations and, additionally, to derive distinct phases from the
overall evolution period that is analysed in a case study. Phases with an increasing
intensity and centralisation of the collaboration process were found to correlate
with significant changes in the corresponding representation. Third, the qualitative
analysis was found to provide meaningful interpretation of the changes that are
indicated by the quantitative data. Finally, similarities and differences in the patterns of
change were identified for the social representations of cloud computing and the
iPad. Both representations departed from their historical roots, which were represented
by anchoring both phenomena in terms of similar concepts, towards more independent
representations. Furthermore, the evolution of both phenomena is accompanied by the
strengthening of their usage aspects and the weakening role of technical specifications.
However, the actual patterns of change were found to be different for both phenomena.
The case studies also identified limitations of the method. The chosen mapping between
theoretical concepts and Wikipedia elements focuses on the dominating representation
of the phenomena, which is presented in the definition part of the article. Especially in
the case of controversial phenomena, this part might contain only ‘neutralised’ aspects
of the representation while more distinct representations can be hidden in other sections
of the article. Additionally, user discussions on the talk pages potentially contain further
aspects of the social representation or even distinct representations that are not
expressed in the actual article. These missing representation aspects limit the
applicability of the method for studying the evolution of different social representations
of the same phenomenon. Furthermore, the generalisability of results based on two
conducted case studies is limited. Although the case studies demonstrate the
applicability of the method and illustrate an effective employment of the developed
statistical tool, a larger and more specialised sample of social representations is required
to derive more specific patterns.
The aforementioned limitations constitute directions for further research. First, the
WikiGen tool can be extended to identify potentially missing anchors within the plain
text or among the internal links outside of the articles’ definition sections. Together
with an analysis of talk page discussions with a corresponding sentiment analysis
(Schneider et al. 2010), the method would be better suited for tracing different
representations of the same object. Another direction for further research suggests an
attempt to identify patterns in the evolution of social representations on Wikipedia
within a specific group of objects such as a set of communication and collaboration
software products. The results of such a study would be valuable for understanding how
people perceive new IT in enterprises and how this perception changes over time. The
use of the method is therefore promising for IS implementation research – a
direction which is already indicated within IS research (Gal and Berente 2008).
Having demonstrated the method for studying social representations on Wikipedia and
indicated directions for further research, this thesis is a step towards
understanding the dynamic nature of knowledge. It is the hope of the author that both
the method and the tool will encourage further studies on the genealogy of knowledge.
The importance of the latter is unquestionable in that each identified pattern in the social
process of forming representations that constitute our social reality is a step towards a
better understanding of human conduct.
References
Adler, B. T., De Alfaro, L., Mola-Velasco, S. M., Rosso, P., and West, A. G. 2011.
“Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and
Reputation Features.“, Lecture Notes in Computer Science (6609:PART 2), A.
Gelbukh (ed.), pp. 277–288.
Bangerter, A. 1995. “Rethinking the Relation Between Science and Common Sense: A
Comment on the Current State of SR Theory“, Papers on Social Representations
(4), pp. 61–78.
Bauer, M., and Gaskell, G. 1999. “Towards a Paradigm for Research on Social
Representations“, Journal for the Theory of Social Behaviour (29:2).
Bellomi, F., and Bonato, R. 2005. “Network analysis for Wikipedia“, in Proceedings of
Wikimania 2005, The First International Wikimedia Conference.
Berger, P. L., and Luckmann, T. 1966. The social construction of reality: a treatise in
the sociology of knowledge, Doubleday.
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., et al. 2009.
“DBpedia - A crystallization point for the Web of Data“, Web Semantics Science
Services and Agents on the World Wide Web (7:3), pp. 154–165.
Chin, S.-C., Street, W. N., Srinivasan, P., and Eichmann, D. 2010. “Detecting
Wikipedia vandalism with active learning and statistical language models“, North,
p. 3.
Dacin, M. T., Munir, K., and Tracey, P. 2010. “Formal Dining at Cambridge Colleges:
Linking Ritual Performance and Institutional Maintenance“, Academy of
Management Journal (53:6), pp. 1393–1418.
Doise, W., Clémence, A., and Lorenzi-Cioldi, F. 1993. The quantitative analysis of
social representations, Harvester Wheatsheaf.
Doise, W., Staerkle, C., Clémence, A., and Savory, F. 1998. “Human rights and
Genevan youth: A developmental study of social representations“, Swiss Journal of
Psychology (57:2), pp. 86–100.
Duveen, G., and De Rosa, A. 1992. “Social Representations and the Genesis of
Social Knowledge“, Papers on Social Representations (1), pp. 94–108.
Figari, H., and Skogen, K. 2011. “Social representations of the wolf“, Acta Sociologica
(54:4), pp. 317–332.
Gervais, M. 1997. “Social representations of nature: the case of the ‘Braer’ oil spill in
Shetland“, (Doctoral thesis).
Ghazal, M., Vazquez, C., and Amer, A. 2007. Real-time automatic detection of
vandalism behavior in video sequences.
Glott, R., Schmidt, P., and Ghosh, R. 2010. “Analysis of Wikipedia Survey Data.”
Hansen, S., Berente, N., and Lyytinen, K. 2009. “Wikipedia, Critical Social Theory, and
the Possibility of Rational Discourse“, The Information Society (25:1), pp. 38–59.
Höijer, B. 2011. “Social Representations Theory“, Nordicom Review (32:2), pp. 3–16.
Holloway, T., Bozicevic, M., and Börner, K. 2007. “Analyzing and visualizing the
semantic coverage of Wikipedia and its authors: Research Articles“, Complex.
(12:3), pp. 30–40.
Howarth, C. 2006. “A social representation is not a quiet thing: Exploring the critical
potential of social representations theory“, British Journal of Social Psychology
(45:1), pp. 65–86.
Jodelet, D. 1991. Madness and Social Representations: Living With the Mad in One
French Community, University of California Press.
Jodelet, D. 2008. “Social Representations: The Beautiful Invention“, Journal for the
Theory of Social Behaviour (38:4), pp. 411–430.
Ju, B., and Gluck, M. 2011. “Calibrating information users’ views on relevance: A
social representations approach“, Journal of Information Science (37:4), pp. 429–
438.
Lewin, K. 1948. Resolving Social Conflicts: Selected Papers on Group Dynamics, G.W.
Allport and G.W. Lewin (eds.), Harper & Row.
Marková, I. 2000. “Amédée or How to get rid of it: Social Representations from a
Dialogical Perspective“, (6:4), pp. 419–460.
Mills, J., Bonner, A., and Francis, K. 2006. “The Development of Constructivist
Grounded Theory“, International Journal (5:March), pp. 1–10.
Moloney, G., and Walker, I. 2002. “Talking about transplants: Social representations
and the dialectical, dilemmatic nature of organ donation and transplantation“,
British Journal of Social Psychology (41:2), pp. 299–320.
Morsey, M., Lehmann, J., Auer, S., Stadler, C., and Hellmann, S. 2012. “DBpedia and
the live extraction of structured data from Wikipedia“, Program Electronic Library
And Information Systems (46:2), pp. 157–181.
Moscovici, S. 2008. Psychoanalysis: Its Image and Its Public, G. Duveen (ed.), Wiley.
Olleros, F. X. 2008. “Learning to Trust the Crowd: Some Lessons from Wikipedia“,
2008 International MCETECH Conference on eTechnologies mcetech 2008, pp.
212–216.
Pawlowski, S. D., Kaganer, E. A., and Cater, J. J. 2007. “Focusing the research agenda
on burnout in IT: social representations of burnout in the profession“, European
Journal of Information Systems (16:5), pp. 612–627.
Potter, J., and Edwards, D. 1999. “Social representations and discursive psychology:
From cognition to action“, Culture & Psychology (5), pp. 447–458.
Potthast, M., Stein, B., and Gerling, R. 2008. “Automatic Vandalism Detection in
Wikipedia“, Advances in Information Retrieval (4956), C. Macdonald, I. Ounis, V.
Plachouras, I. Ruthven and R.W. White (eds.), pp. 663–668.
Ratkiewicz, J., Fortunato, S., Flammini, A., Menczer, F., and Vespignani, A. 2010.
“Characterizing and modeling the dynamics of online popularity.“, Physical
Review Letters (105:15), p. 158701.
Rosa, A. S. de 2013. “Taking stock: a theory with more than half a century of history“,
in Social Representations in the “Social Arena”, A.S. de Rosa (ed.), Routledge.
Rose, D., Efraim, D., Gervais, M.-C., Joffe, H., Jovchelovitch, S., and Morant, N. 1995.
“Questioning consensus in social representations theory“, Papers on social
representations (4:2), pp. 150–176.
Rosenzweig, R. 2006. “Can History Be Open Source? Wikipedia and the Future of the
Past“, Journal of American History (93:1), pp. 117–146.
Schneider, J., Passant, A., and Breslin, J. 2010. “A Qualitative and Quantitative
Analysis of How Wikipedia Talk Pages Are Used.”, Web Science Conference
Smets, K., Goethals, B., and Verdonk, B. 2008. “Automatic Vandalism Detection in
Wikipedia : Towards a Machine Learning Approach“, in AAAI Workshop on
Wikipedia and Artificial Intelligence An Evolving Synergy, AAAI Press, pp. 43–48.
Stross, R. 2006. “Anonymous Source Is Not the Same as Open Source“, in New York
Times.
Suchecki, K., Salah, A. A. A., Gao, C., and Scharnhorst, A. 2012. “Evolution of
Wikipedia’s Category Structure“, Advances in Complex Systems (15:supp01), p.
19.
Wagner, W., Duveen, G., Farr, R., Jovchelovitch, S., Cioldi, F. L., Marková, I., et al.
1999. “Theory and Method of Social Representations“, Asian Journal Of Social
Psychology (2:1), pp. 95–125.
Wagner, W., Elejabarrieta, F., and Lahnsteiner, I. 1995. “How the sperm dominates the
ovum - objectification by metaphor in the social representation of conception“,
European Journal of Social Psychology (25:6), pp. 671–688.
Wagner, W., Valencia, J., and Elejabarrieta, F. 1996. “Relevance, discourse and the
‘hot’ stable core social representations -A structural analysis of word associations“,
British Journal of Social Psychology (35:3), pp. 331–351.
Weber, M., Roth, G., and Wittich, C. 1978. Economy and Society: An Outline of
Interpretative Sociology, University California Press.
Yasseri, T., Sumi, R., Rung, A., Kornai, A., and Kertész, J. 2012. “Dynamics of
conflicts in Wikipedia“, PLoS ONE (7:6), A. Szolnoki (ed.), p. e38869.
Appendix
(1) Concepts from which cloud computing has departed
(2) Figurative aspects of cloud computing
(3) Technical aspects of cloud computing
(4) Sub concepts of cloud computing
(5) Origins of cloud computing
(6) Cloud computing solutions and providers
(7) Benefits of cloud computing
(8) Broader concepts related to cloud computing
(9) Means to interact with cloud computing
(1) Products and companies that compete with iPad and Apple respectively
(2) Origins of the iPad
(3) Technical aspects of the iPad
(4) Products and technologies that are similar to the iPad
(5) Use cases for the iPad
Anchor  Rank in 2010  Rank in 2011  Rank in 2012
Merged: 3gpp long term evolution with 4g
Deleted: fiscal quarter (removes as a simple point to a definition what is a quarter)
There are 13 possible trends in the collaboration process, as combinations from edits,
editors and edits per editor statistics.
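One way to arrive at the figure of 13 is sketched below. The sketch assumes that each of the three statistics can rise, stay constant or fall between two periods and that edits per editor is simply the number of edits divided by the number of editors. It then enumerates which combinations can actually occur. This is a reading that is consistent with the stated count, not necessarily the derivation behind the interpretation scheme, and all names are hypothetical.

```python
from itertools import product


def sign(value):
    """Return +1 for an increase, 0 for no change and -1 for a decrease."""
    return (value > 0) - (value < 0)


def feasible_trends(limit=4):
    """Enumerate (edits, editors, edits-per-editor) trend combinations.

    All pairs of periods with 1..limit edits and 1..limit editors are
    tried, and the observed trend triples are collected.
    """
    trends = set()
    for e1, n1, e2, n2 in product(range(1, limit + 1), repeat=4):
        edits_trend = sign(e2 - e1)
        editors_trend = sign(n2 - n1)
        # Compare e2/n2 with e1/n1 by cross-multiplication (no floats).
        per_editor_trend = sign(e2 * n1 - e1 * n2)
        trends.add((edits_trend, editors_trend, per_editor_trend))
    return trends


TRENDS = feasible_trends()
NUMBER_OF_TRENDS = len(TRENDS)  # 13 under the assumptions above
```

Of the 27 conceivable sign combinations only 13 remain, because the trend of the ratio is fixed unless edits and editors both rise or both fall. For example, more edits by the same number of editors can only mean more edits per editor.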
The interpretation scheme is based on the assumption that the intensity of the editing
activity depends on the interest of individuals. The higher the interest in the social
group, the more editing activity is observed. However, in accordance with the theory,
there are two possible explanations for the interest increase. It can be caused by either a
high level of unfamiliarity or by a higher ‘involvement’. The involvement describes the
role a phenomenon plays in individuals’ lives while ‘unfamiliarity’ describes the extent
to which the phenomenon remains unfamiliar to the social group. If the unfamiliarity of
a phenomenon is high, the social group naturally responds with an attempt to familiarise
it (Moscovici 2000/1984, p.37). In a similar vein, phenomena which are of greater
importance for the social group are subject to more intensive editing activity. The
interpretation scheme helps in giving meaning to the changes in the evolution of
collaboration on Wikipedia. Such meaning is achieved by analysing data regarding the
number of edits and editors in two different periods. The scheme is capable of revealing
centralisation or decentralisation of the collaboration, increasing or decreasing interest,
and even indicating the probable cause of those interest changes. For example, more
centralised collaboration indicates an attempt to resolve conflicting representations.
Consequently, some collaboration patterns are more likely to correspond to a higher
degree of unfamiliarity associated with the underlying phenomenon than others.
All values in the table are average values for the corresponding monthly statistic in the corresponding evolution phase.
Editing Statistics
Editors Statistics
Anchor  Rank in 2008  Rank in 2009  Rank in 2010  Rank in 2011  Rank in 2012  Total Rank  Time Adjusted Rank
internet 0.72 0.81 0.99 0.35 1.00 3.87 1.952
web browser 0.45 0.99 0.99 0.39 0.97 3.79 1.968
computing 0.53 0.29 0.98 0.54 0.99 3.33 1.860
software 0.45 0.99 0.99 0.22 0.33 2.98 1.322
data 0.45 0.99 0.99 0.22 0 2.65 1.047
software as a service 0.70 0.90 0.11 0.09 0.74 2.54 1.148
computer network diagram 0.25 0.99 0.99 0.22 0 2.45 1.013
server (computing) 0 0.54 0.99 0.27 0.58 2.38 1.338
utility computing 0.47 0.04 0.26 0.54 1.00 2.31 1.415
scalability 0 0.94 0.99 0.22 0.05 2.20 0.997
infoworld 0 0.89 0.99 0.22 0.05 2.15 0.980
virtualization 0.47 0.56 0.93 0 0.05 2.01 0.772
metaphor 0 0.69 0.99 0.22 0 1.90 0.872
abstraction 0 0.66 0.99 0.23 0 1.88 0.868
everything as a service 0.52 0.99 0.29 0 0 1.80 0.562
business application 0 0.54 0.99 0.22 0 1.75 0.822
computer network 0 0 0 0.58 1.00 1.58 1.220
platform as a service 0 0.73 0.11 0 0.53 1.37 0.740
quality of service 0.24 0 0.84 0.22 0 1.30 0.607
infrastructure as a service 0 0.60 0 0 0.52 1.12 0.633
electrical grid 0 0 0 0.28 0.81 1.09 0.862
application software 0 0 0 0.04 0.97 1.01 0.835
data center 0.66 0 0.34 0 0.01 1.01 0.288
paradigm shift 0 0.17 0.82 0 0 0.99 0.467
converged infrastructure 0 0 0 0 0.97 0.97 0.808
shared services 0 0 0 0 0.97 0.97 0.808
pdf 0.24 0.21 0.48 0 0 0.93 0.350
mobile app 0 0 0 0 0.92 0.92 0.767
google 0.23 0 0.60 0.08 0 0.91 0.392
salesforce 0.23 0 0.52 0.07 0 0.82 0.345
electricity grid 0 0 0.52 0.22 0.07 0.81 0.465
web 2.0 0.47 0.34 0 0 0 0.81 0.192
service level agreement 0.24 0 0.54 0 0 0.78 0.310
google apps 0.45 0.33 0 0 0 0.78 0.185
service (economics) 0 0 0 0.31 0.44 0.75 0.573
ibm 0.23 0 0.45 0.07 0 0.75 0.310
microsoft 0.23 0 0.46 0 0 0.69 0.268
economies of scale 0 0 0 0 0.67 0.67 0.558
amazon web services 0.19 0 0.39 0.07 0 0.65 0.273
product (business) 0 0 0 0.31 0.33 0.64 0.482
nist 0 0 0.42 0.22 0 0.64 0.357
business software 0 0 0 0 0.63 0.63 0.525
mainframe computer 0 0 0.60 0 0 0.60 0.300
hewlett packard 0.23 0 0.26 0.07 0 0.56 0.215
computer 0 0 0.41 0.13 0 0.54 0.292
service level agreements 0 0 0.30 0.22 0 0.52 0.297
web application 0.48 0 0 0.04 0 0.52 0.107
client–server 0 0 0.51 0 0 0.51 0.255
information technology 0 0.01 0.14 0.22 0.05 0.42 0.262
cloud 0 0 0 0 0.41 0.41 0.342
gartner 0 0.39 0 0 0 0.39 0.130
data as a service 0.00 0 0 0 0.38 0.38 0.317
storage as a service 0 0 0 0 0.38 0.38 0.317
security as a service 0 0 0 0 0.38 0.38 0.317
test environment as a service 0 0 0 0 0.38 0.38 0.317
api as a service 0 0 0 0 0.38 0.38 0.317
service-oriented architecture 0 0 0.15 0.22 0 0.37 0.222
grid computing 0.36 0 0 0 0 0.36 0.060
business model 0 0 0 0 0.33 0.33 0.275
vmware 0 0 0.24 0.07 0 0.31 0.167
autonomic computing 0.21 0 0 0.10 0 0.31 0.102
computer cluster 0.29 0 0 0 0 0.29 0.048
multi-core 0.29 0 0 0 0 0.29 0.048
parallel computing 0.29 0 0 0 0 0.29 0.048
computer data storage 0 0.00 0.04 0 0.24 0.28 0.220
hardware virtualization 0 0 0.06 0.22 0 0.28 0.177
the cloud 0.27 0.01 0 0 0 0.28 0.048
fujitsu 0 0 0.20 0.07 0 0.27 0.147
distributed computing 0.27 0 0 0 0 0.27 0.045
emory university 0 0.26 0 0 0 0.26 0.087
goizueta business school 0 0.26 0 0 0 0.26 0.087
desktop as a service 0 0 0 0 0.25 0.25 0.208
dell 0 0 0.17 0.07 0 0.24 0.132
capital expenditure 0.24 0 0 0 0 0.24 0.040
utility 0.24 0 0 0 0 0.24 0.040
electricity 0.24 0 0 0 0 0.24 0.040
subscription 0.24 0 0 0 0 0.24 0.040
multitenancy 0.24 0 0 0 0 0.24 0.040
ramnath chellappa 0 0.23 0 0 0 0.23 0.077
multi-tenant 0.23 0 0 0 0 0.23 0.038
vector processor 0.23 0 0 0 0 0.23 0.038
yahoo! 0.23 0 0 0 0 0.23 0.038
skytap 0 0 0.15 0.07 0 0.22 0.122
general electric 0.22 0 0 0 0 0.22 0.037
l'oréal 0.22 0 0 0 0 0.22 0.037
valeo 0.22 0 0 0 0 0.22 0.037
virtualisation 0 0.21 0 0 0 0.21 0.070
open standards 0.21 0 0 0 0 0.21 0.035
open source software 0.21 0 0 0 0 0.21 0.035
computational resource 0.21 0 0 0 0 0.21 0.035
public utility 0.21 0 0 0 0 0.21 0.035
autonomic computing#autonomic_systems 0.21 0 0 0 0 0.21 0.035
netapp 0 0 0.13 0.07 0 0.20 0.112
cluster (computing) 0.20 0 0 0 0 0.20 0.033
loose coupling 0.20 0 0 0 0 0.20 0.033
grid computing#grids versus conventional supercomputers 0.20 0 0 0 0 0.20 0.033
peer to peer 0.20 0 0 0 0 0.20 0.033
bittorrent (protocol) 0.20 0 0 0 0 0.20 0.033
skype protocol#protocol 0.20 0 0 0 0 0.20 0.033
volunteer computing 0.20 0 0 0 0 0.20 0.033
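The values in the table above are consistent with two simple relations, which the following sketch reproduces. The total rank equals the sum of the yearly ranks, and the time adjusted rank equals a weighted sum in which the years 2008 to 2012 receive the weights 1 to 5, divided by 6. The weights and the divisor are inferred from the table values and should be read as an assumption. The function names are hypothetical and the sketch is not the tool's implementation.

```python
def total_rank(yearly_ranks):
    """Sum of the yearly ranks, rounded to two decimals as in the table."""
    return round(sum(yearly_ranks), 2)


def time_adjusted_rank(yearly_ranks, divisor=6):
    """Weighted sum of yearly ranks with weights 1, 2, 3, ... by year.

    Later years count more. The linear weights and the divisor of 6
    are inferred from the table values (assumption).
    """
    weighted = sum(
        weight * rank for weight, rank in enumerate(yearly_ranks, start=1)
    )
    return round(weighted / divisor, 3)


# Yearly ranks for 2008..2012, copied from three rows of the table.
INTERNET = (0.72, 0.81, 0.99, 0.35, 1.00)
COMPUTER_NETWORK = (0, 0, 0, 0.58, 1.00)
GRID_COMPUTING = (0.36, 0, 0, 0, 0)

ROWS = {
    "internet": (total_rank(INTERNET), time_adjusted_rank(INTERNET)),
    "computer network": (
        total_rank(COMPUTER_NETWORK),
        time_adjusted_rank(COMPUTER_NETWORK),
    ),
    "grid computing": (
        total_rank(GRID_COMPUTING),
        time_adjusted_rank(GRID_COMPUTING),
    ),
}
```

Under this reading, an anchor that appears only in 2012 (weight 5) retains most of its rank, whereas an anchor that appears only in 2008 (weight 1) is reduced to one sixth of it. This matches the contrast between the rows for ‘computer network’ and ‘grid computing’.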
Anchor Snapshots
Anchor Dissimilarity
Editing Statistics
Editors Statistics
Anchor Snapshots
Anchor Dissimilarity
Declaration of Authorship
I hereby declare that, to the best of my knowledge and belief, this Master Thesis titled
“The Genealogy of Knowledge in Wikipedia – Method Development and Application” is
my own work. I confirm that each significant contribution to and quotation in this thesis
that originates from the work or works of others is indicated by proper use of citation
and references.
Münster, 29 July 2013