
Crowdsourcing Research Tool for a Digital Age

Protein group B
Ross Hunter, Adama Golinski and Vaida Vanagaite
February 23, 2014
Abstract. The advent of the internet was a pivotal moment in human history. Not only has it had a
profound influence on the everyday lives of millions, but it has revolutionised scientific research. We will
discuss the concept of crowdsourcing and its scientific applications in a digital era. Specifically, this report
presents crowdsourcing's recent history and progressive evolution, ultimately resulting in the development
of Games with a Purpose, alongside possible future developments and their implications for humanity.

A Brief Overview

Crowdsourcing has existed throughout history, under various guises, wherever an application requires
a broad knowledge base, or a wide range of participants. Perhaps the most notable historical example
of crowdsourcing is that of the census. Originally
used in the Roman Empire to record eligibility for
military service, entire populations were questioned
to produce comprehensive sets of data. In its modern form, a census is a nation-wide survey designed
to collect information about the nation. Despite
these non-scientific origins, it is plain to see that
it is possible to enlist the assistance of vast numbers
of individuals should the need arise.

The term crowdsourcing was first coined in 2006, by Jeff Howe, in Wired magazine. It was used in a socio-economic context, describing the then-recent business trend of taking a function previously performed by an individual or small group, and outsourcing it to a large network of people using the internet.
Approximately two years later, crowdsourcing came to be formally defined, in a scientific sense, as a distributed problem solving model [3]. It is then, in essence, a digital embodiment of the well-known idiom "many hands make light work". While it is true that many areas of research require vast quantities of data to be analysed, and that it is certainly feasible to segment and distribute this data to many individuals, this does not solely define crowdsourcing: it is more than just distributed computing [15].

It is hardly surprising, then, that this concept was developed into a scientific tool. With computer usage and high-speed internet connections commonplace in modern digital society, one might, with just a few clicks, enlist thousands of individuals to provide, collect or analyse data. In 2007, with their project Galaxy Zoo [16], the Zooniverse research team did exactly that.

The fundamental premise we wish to concern ourselves with is that the combined cognitive abilities of a group of individuals are far more significant than the computing power they collectively possess. It is this premise that underpins crowdsourcing's scientific validity: the internet affords the ability to communicate with, and utilise the intelligence of, the masses.

Galaxy Zoo was a crowdsourcing endeavour that has since come to be described as a Citizen Science project. It involved over one hundred and fifty thousand ordinary members of the public visually classifying galaxies imaged by the Sloan Digital Sky Survey. This project served as a vital stepping-stone for the concept of crowdsourcing: not only did it demonstrate the colossal scale of involvement that crowdsourcing could generate, it also showed progression away from distributed computing. The images were not classified by software that ran as a background application on personal computers; they were categorised by the participants themselves, consciously and repeatedly allocating a category to each image, according to some predetermined characteristics. The result was over fifty million self-verifying classified galactic images.

Data gathered by the game was used to label the images for search purposes; this would have been incredibly difficult to do any other way than by harnessing human cognitive skills. It did not, however, provide any insight into the process of visual cognition itself, an issue addressed in Peekaboom, a game co-created in 2006 by von Ahn and his student, Ruoran Liu [11].
In contrast with ESP, Peekaboom incorporated asymmetric player roles. One player was given a random image with a word they had to convey to the other player by revealing as small a section of the image as possible (figure 1). In this case, detailed game records were kept for further analysis. It represented an innovative breakthrough for crowdsourcing, as it not only allowed the gathering of human-generated data, but also held the potential for this user input to assist in the development of superior computerised algorithms. This was, at the time, a paradigm-shifting view on how crowdsourcing could be used.

This task, albeit not particularly challenging, serves to illustrate the evolution of crowdsourcing: not only can it be utilised as a data collection mechanism, but now, with the assistance of the internet, it is a viable platform to multiply and collate human problem solving efforts. Furthermore, it is a platform that is particularly applicable to scenarios requiring some degree of spatial reasoning or image recognition, for it is in these areas that the human mind can easily outclass even the most advanced computer.

Extra-Sensory Perception and Peekaboom

Both ESP and Peekaboom utilise the same method of player motivation: they were created to entertain players. Although this might seem obvious, because in general people will not play games they do not enjoy, it must be compared with a different motivational technique: forced participation. Projects such as reCAPTCHA force user participation by providing a compulsory obstacle that must be overcome in order to reach the desired ultimate goal.


That being said, the foundations of contemporary internet-based crowdsourcing are often attributed to Luis von Ahn, associate professor at Carnegie Mellon University. Von Ahn is also widely known as the creator of CAPTCHA, the Completely Automated Public Turing test to tell Computers and Humans Apart.

In his doctoral thesis von Ahn coined the term Games With a Purpose (GWAP) to describe games in which the primary aim is not solely entertainment. Presently, the term is used interchangeably with crowdsourcing games.

At the time of his doctoral thesis, in 2005, he created a two-player online game known only as ESP. The game connected two players anonymously over the internet, with no means of communicating with each other. Both players were given the same randomly selected image from the internet. To score points and progress to the next level, both players had to give the same word to describe the image. Since the players had no form of communication, this was presumably done by Extra-Sensory Perception, hence the name [13].
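The agreement rule behind ESP can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the game's actual implementation: the helper name `first_agreed_label`, the case-insensitive comparison and the sample guesses are all invented for the example.

```python
def first_agreed_label(guesses_a, guesses_b):
    """Return the first word typed by player A that player B also typed, or None.

    Hypothetical helper: the report does not describe the real matching
    rules, so plain case-insensitive equality is assumed here.
    """
    seen_b = {word.strip().lower() for word in guesses_b}
    for word in guesses_a:
        normalised = word.strip().lower()
        if normalised in seen_b:
            return normalised
    return None


# Two players describe the same image without being able to communicate.
player_a = ["Dog", "grass", "ball"]
player_b = ["puppy", "ball", "park"]
print(first_agreed_label(player_a, player_b))  # prints "ball"
```

A word that two independent strangers both choose is likely to be a good description of the image, which is why the agreed word can be stored as a label.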

reCAPTCHA

As mentioned above, in 2000 von Ahn pioneered CAPTCHA. Shortly afterwards, the most widespread implementation of CAPTCHA was used as a security measure. Human users were required to prove they were human by inputting the text from a distorted image. This was, at the time, very difficult to do using Optical Character Recognition (OCR) software.

Figure 1: Screenshot of Peekaboom gameplay [1].

The fact that reCAPTCHA was a compulsory feature of many online applications, rather than a standalone game requiring separate marketing, had a profound effect on the number of users and, hence, the amount of effort exercised during participation. While collectively von Ahn's games were played by around two hundred thousand people [12], over 100 million compulsory reCAPTCHAs are displayed every day [7].

In 2007 von Ahn initiated the reCAPTCHA project, which was used for digitising books, specifically any text fragments that OCR software could not process. Those fragments were used in tandem with computer-generated CAPTCHAs, to be interpreted by humans during verification processes. If the user provided the correct answer to the computer-generated piece of the CAPTCHA, it was highly likely that they would also provide a valuable and correct interpretation of the unknown text fragment [14].
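The pairing of a known word with an unknown one can be illustrated with a short Python sketch. It is a simplified model rather than the real reCAPTCHA system: the function names, the case-insensitive comparison and the three-vote threshold (`min_votes=3`) are assumptions chosen for the example.

```python
from collections import Counter


def record_answer(control_truth, control_answer, unknown_answer, votes):
    """Count the reading of the unknown word only if the control word is right."""
    if control_answer.strip().lower() != control_truth.lower():
        return False  # failed the known half, so the other reading is discarded
    votes[unknown_answer.strip().lower()] += 1
    return True


def consensus(votes, min_votes=3):
    """Return the leading transcription once it has at least `min_votes` votes.

    The threshold is a hypothetical choice, not a documented value.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


votes = Counter()
submissions = [
    ("morning", "upon"),
    ("morning", "upon"),
    ("mourning", "open"),  # wrong control word: this reading is ignored
    ("Morning", "upon"),
]
for control_answer, unknown_answer in submissions:
    record_answer("morning", control_answer, unknown_answer, votes)

print(consensus(votes))  # prints "upon"
```

Only users who pass the computer-generated half contribute to the transcription, so the security check and the digitisation task are completed in a single step.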

Foldit

In 2008, the online puzzle game Foldit was developed by the University of Washington's Centre for Game Science. Foldit is centred on the concept of protein folding. Proteins are large biological molecules that consist of long chains of amino acids, each with their own unique structure, which are responsible for a vast array of functions including immune responses, metabolic regulation and DNA replication.

The whole system was acquired by Google in 2009 and was subsequently used to digitise 30 years of archives of The New York Times. Despite not being designed to aid the development of OCR algorithms or gather data about human visual cognition, CAPTCHA's security applications inadvertently provoked further development of OCR software [5] and subsequently Speech Recognition (SR) software [2], when an audio alternative to a text-based CAPTCHA was implemented for people with visual impairments.

Through gameplay, Foldit's purpose is for users to find the lowest energy state of a protein chain by bending, stretching, wiggling or otherwise manipulating certain aspects of it. The lower the energy of the state, the more stable the molecule will be, and subsequently the more likely it is to exist in that particular form. The game determines what effect a player's adjustment would have on the overall energy of the molecule and translates this to a numerical score: naturally, the lowest energies achieve the highest scores.

interface has had many technical terms removed in favour of non-scientific instructions suited to a general audience. This, coupled with the lack of any prerequisite knowledge of physics or biology, makes the game accessible to and playable by scientific professionals and non-professionals alike.

The three-dimensional structure of a protein determines its biological function and, as such, the ability to predict possible stable protein structures would be vastly advantageous to modern science: it would allow scientists to design new proteins with a single specific function that could subsequently be used to treat degenerative neurological disorders such as Alzheimer's, Huntington's and Creutzfeldt-Jakob disease, along with retroviral infections like HIV.

Soon after release, the game's popularity prompted developers to introduce the possibility for players to digitise their strategies into recipes that could be shared with other users. This feature allowed users to generate over five thousand documented approach tactics in the first year alone. The most successful of these approaches rapidly gained popularity amongst the players, with two recipes (A Deep Breath and Blue Fuse) gaining particular dominance.

Currently, the most commonly used protein structure prediction technique utilises a Monte Carlo stochastic modelling algorithm [6]. This arbitrarily (but systematically) varies elements of a protein chain in search of the most stable, lowest energy structure. Although comprehensive, the approach is highly inefficient when compared to human manipulation. Humans, as a species, have evolved and developed astounding spatial reasoning and pattern recognition abilities and subsequently, when presented with a poorly-folded protein chain and a list of criteria that must be met, can instantly eliminate disadvantageous structural adjustments and complete optimisation tasks with increased efficiency.
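The idea of randomly varying a structure in search of its lowest energy can be sketched with a generic Metropolis Monte Carlo loop. This is not the Rosetta algorithm cited above: the energy function `toy_energy` is invented (its minimum is simply every angle equal to 1.0 radian), and the step size, temperature and step count are arbitrary assumptions for illustration.

```python
import math
import random


def toy_energy(angles):
    """Made-up energy: zero (the minimum) when every angle equals 1.0 radian."""
    return sum((a - 1.0) ** 2 for a in angles)


def monte_carlo_minimise(angles, steps=5000, step_size=0.3, temperature=0.1, seed=0):
    """Randomly perturb one angle at a time, keeping track of the best state seen."""
    rng = random.Random(seed)
    current = list(angles)
    current_e = toy_energy(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        trial_e = toy_energy(trial)
        delta = trial_e - current_e
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with a probability that shrinks as the penalty grows.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e


start = [0.0, 2.5, -1.0, 3.0]
best, best_e = monte_carlo_minimise(start)
print(f"start energy {toy_energy(start):.2f}, best energy found {best_e:.4f}")
```

The loop needs thousands of blind trial moves even for this four-variable toy problem, which hints at why a player who can see at a glance that a move is unhelpful may reach a good structure with far fewer attempts.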

Figure 2 shows a graphical interpretation of how six recipes can be used to reduce energy, as a function of time. It can be seen that the performance of A Deep Breath and Blue Fuse was similar to the performance of Fast Relax, a scientifically generated, unpublished recipe. Moreover, it can also be seen that three user-generated recipes are more effective at energy minimisation than Classic Relax, a commonly used Rosetta protocol.

Figure 3 shows a graphical representation of energy minimisation achieved by the Rosetta computer algorithm (yellow dots) and Foldit players (green dots). While it can be seen that, broadly, the computer algorithm produced lower energy outcomes than the majority of players, it is evident that the lowest energies were achieved by human players. The blue line in the graph shows the energy pathway followed by a single Foldit player and illustrates precisely how players can react adaptively: they can explore the high-energy states to subsequently reach the desirable low-energy ones from a different starting point. Technically speaking, this situation occurs during significant backbone restructuring, where the primary chain's structure is completely changed. This "tear it up and start again with something more useful" approach cannot be undertaken by computers: even the most sophisticated algorithms would, quite literally, tie themselves in

Thus, based on a hypothesis that human spatial reasoning could improve both the sampling of conformational space and the determination of when to
pursue suboptimal structures, Foldit was created.
In the game, elements of the structure optimisation process have been replaced with human decision
making while retaining some deterministic Rosetta
algorithms as user tools [8].
Improperly folded protein structures are periodically posted online as puzzles for a fixed amount of time, during which players interactively manipulate the protein chains in any manner which they believe will lead to the highest score. The in-game

Integrating both organic and digital functions was a novel approach to protein structure prediction that subsequently led to some impressive results that would otherwise not have been possible. One particularly notable Foldit success, which inevitably utilised some of the recipes and strategies outlined above, occurred in 2009, when the full structure of the Mason-Pfizer monkey virus (M-PMV) retroviral protease was determined [9]. This accomplishment, after several years of distributed computing failures, held significant scientific importance. Very little was previously known about the structure of retroviral proteases; however, knowledge of their structure was of critical importance to the development of antiretroviral drugs that, in the future, could be used to combat diseases like AIDS.
At the time, this was a huge breakthrough for crowdsourcing: it showed that scientific games had the potential not only to solve hard technical problems by utilising human visual skills, but also to discover and formalise effective new problem solving strategies and algorithms, by invoking human problem solving skills.

Figure 3: Achievements of computer algorithms and Foldit players in protein modelling [8].

Further Developments

The pertinent question to ask now is: what does the future hold for crowdsourcing, and what does the future hold for Foldit? The answer, unquestionably, is whatever we make of it. Scientific crowdsourcing has reached a critical point in its development, with some substantial triumphs to its name, but it is by no means a miracle solution to every problem. That being said, the exploitation of human spatial cognition does have current applications. A joint venture between Heriot-Watt University and the University of Edinburgh has led to the production of another scientific game with a purpose, for research into quantum information [4]. Since many physics problems can be represented and analysed in three-dimensional space, it was hypothesised that human decision making could arguably increase the efficiency of brute-force attempts at finding solutions. From

Figure 2: Performance of the different algorithms in protein modelling [8].

knots at the seemingly infinite array of possibilities.

this and from Foldit, the beginnings of a pattern are emerging: an effective way to harness human input is through the medium of a video game. Collectively, on average, humanity spends around three billion hours every week playing video games [10]. If even one tenth of one percent of this time could be utilised for scientific purposes, the implications would be staggering!
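To make the scale concrete, the arithmetic behind "one tenth of one percent" can be checked directly. The three-billion-hour figure is the one quoted from [10]; the ten hours of play per person per week used to convert hours into players is purely an illustrative assumption.

```python
HOURS_PLAYED_PER_WEEK = 3_000_000_000  # figure quoted from [10]

# One tenth of one percent is 0.1 / 100 = 1 / 1000.
science_hours = HOURS_PLAYED_PER_WEEK // 1000

# Hypothetical assumption: each contributor plays ten hours a week.
ASSUMED_HOURS_PER_PLAYER = 10
equivalent_players = science_hours // ASSUMED_HOURS_PER_PLAYER

print(f"{science_hours:,} hours per week")     # 3,000,000 hours per week
print(f"{equivalent_players:,} players")       # 300,000 players
```

One tenth of one percent of three billion hours is therefore three million hours every week.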

be possible to use these digitally determined structures to create accurate computer models and simulations to test the effects of treatments for highly
dangerous diseases (Anthrax, for example) safely,
before physical trials are implemented. Purely on
this basis, if nothing else, rejecting crowdsourcing
as a valid research tool would prove detrimental to
disease and treatment research and human life as a
whole.

To achieve anywhere near this figure, these games must be popular, or at least used by many people. One possible way to achieve this could be through their inclusion as mini-games inside popular mass-market games. In many mass-market role-playing games, players must solve short logical or agility puzzles to open locked doors, hack into a computer, or progress to the next level. These puzzles could easily be substituted with short science problems. This simultaneously tackles two of the major challenges facing scientific crowdsourcing games: providing a source of motivation and accessing a wide audience.

References
[1] Accessed 22 January 2014. URL: http://ninjamonkeys.co.za/media/instr1.gif.
[2] Anon. Project stiltwalker. Accessed 22 January 2014. URL: http://www.dc949.org/projects/stiltwalker/.
[3] D Brabham. Crowdsourcing as a model for problem solving: an introduction and cases. 2008. doi:10.1177/1354856507084420.

The quantum information research game, developed in Edinburgh and mentioned above, could be very easily incorporated into futuristic games similar to Mass Effect, as such games feature topics like quantum cryptography, Faster-Than-Light travel and information transport. Inclusion of such scientific mini-games into major titles would not involve unachievable amounts of work. Furthermore, the press attention arising from the stereotype-breaking real-world application of these games could result in substantial profits for the gaming industry.

[4] O Brown, J Truesdale, S Louchart, and S McEndoo. Serious game for quantum research. 2013. doi:10.1007/978-3-642-40790-1_17.
[5] Claudia Cruz-Perez, Oleg Starostenko, Fernando Uceda-Ponga, Vicente Alarcon-Aquino, and Leobardo Reyes-Cabrera. Breaking reCAPTCHAs with unpredictable collapse: heuristic character segmentation and recognition. pages 155-165, 2012. doi:10.1007/978-3-642-31149-9_16.
[6] Rhiju Das and David Baker. Macromolecular modeling with Rosetta. Annual review of biochemistry, 77:363-382, 2008. doi:10.1146/annurev.biochem.77.062906.171838.

Finally, consider Foldit once more. If, as proposed above, three million hours a week (one tenth of one percent of three billion) could be devoted to solving protein structure prediction problems, it is not beyond the realms of possibility that this could translate to three hundred thousand additional players. From there it does not take much to start romanticising about how quickly brand new protein structures could be theorised and digitally determined, then used to assist in the treatment of the diseases mentioned previously, and indeed, many more. Further to this, it would also

[7] Google. Recaptcha, frequently asked questions. Accessed 22 January 2014. URL: https://www.google.com/recaptcha/faq.
[8] Eric Hand. Citizen science: People power. Nature, 466(7307):685-687, 2010. doi:10.1038/466685a.
[9] Firas Khatib, Frank DiMaio, Foldit Contenders Group, Foldit Void Crushers Group, Seth Cooper, Maciej Kazmierczyk, Miroslaw Gilski, Szymon Krzywda, Helena Zabranska, Iva Pichova, James Thompson, Zoran Popović, Mariusz Jaskolski, and David Baker. Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature structural & molecular biology, 18(10):1175-1177, 2011. doi:10.1038/nsmb.2119.

[13] Luis von Ahn and Laura Dabbish. Labeling images with a computer game. pages 319-326, 2004.
[14] Luis von Ahn, Benjamin Maurer, Colin McMillen, David Abraham, and Manuel Blum. reCAPTCHA: human-based character recognition via web security measures. Science (New York, N.Y.), 321(5895):1465-1468, 2008. URL: http://dx.doi.org/10.1126/science.1160379, doi:10.1126/science.1160379.

[10] J McGonigal. Ted conversations 3 billion hours, 2011. Accessed 21 January 2014. URL: http://www.ted.com/conversations/44/we_spend_3_billion_hours_a_wee.html.

[15] Wikipedia. Distributed computing, 2013. Accessed 7 January 2014. URL: http://en.wikipedia.org/wiki/Distributed_computing.

[11] L von Ahn, R Liu, and M Blum. Peekaboom: A game for locating objects in images. 2006.
[12] Luis von Ahn. Games with a purpose. Accessed 22 January 2014. URL: http://www.gwap.com/.

[16] Zooniverse. Galaxy zoo home, 2007-2014. Accessed 11 January 2014. URL: http://www.galaxyzoo.org/.
