Vol 461|15 October 2009

OPINION

Massively collaborative mathematics


The ‘Polymath Project’ proved that many minds can work together to solve difficult mathematical problems.
Timothy Gowers and Michael Nielsen reflect on the lessons learned for open-source science.
On 27 January 2009, one of us (Gowers) used his blog to announce an unusual experiment. The Polymath Project had a conventional scientific goal: to attack an unsolved problem in mathematics. But it also had the more ambitious goal of doing mathematical research in a new way. Inspired by open-source enterprises such as Linux and Wikipedia, it used blogs and a wiki to mediate a fully open collaboration. Anyone in the world could follow along and, if they wished, make a contribution. The blogs and wiki functioned as a collective short-term working memory, a conversational commons for the rapid-fire exchange and improvement of ideas.

The collaboration achieved far more than Gowers expected, and showcases what we think will be a powerful force in scientific discovery: the collaboration of many minds through the Internet.

The specific aim of the Polymath Project was to find an elementary proof of a special case of the density Hales–Jewett theorem (DHJ), which is a central result of combinatorics, the branch of mathematics that studies discrete structures (see ‘Multidimensional noughts and crosses’). This theorem was already known to be true, but for mathematicians, proofs are more than guarantees of truth: they are valued for their explanatory power, and a new proof of a theorem can provide crucial insights.

There were two reasons to want a new proof of the DHJ theorem. First, it is one of a cluster of important related results, and although almost all the others have multiple proofs, DHJ had just one, a long and complicated proof that relied on heavy mathematical machinery. An elementary proof (one that starts from first principles instead of relying on advanced techniques) would require many new ideas. Second, DHJ implies another famous theorem, called Szemerédi’s theorem, novel proofs of which have led to several breakthroughs over the past decade, so there is reason to expect that the same would happen with a new proof of the DHJ theorem.

The project began with Gowers posting a description of the problem, pointers to background materials and a preliminary list of rules for collaboration (see go.nature.com/DrCmnC). These rules helped to create a polite, respectful atmosphere, and encouraged people to share a single idea in each comment, even if the idea was not fully developed. This lowered the barrier to contribution and kept the conversation informal.

Building momentum
When the collaborative discussion kicked off on 1 February, it started slowly: more than seven hours passed before Jozsef Solymosi, a mathematician at the University of British Columbia in Vancouver, made the first comment. Fifteen minutes later a comment came in from Arizona-based high-school teacher Jason Dyer. Three minutes after that Terence Tao (winner of a Fields Medal, the highest honour in mathematics) at the University of California, Los Angeles, made a comment. Over the next 37 days, 27 people contributed approximately 800 substantive comments, containing 170,000 words. No one was specifically invited to participate: anybody, from graduate student to professional mathematician, could provide input on any aspect. Nielsen set up the wiki to distil notable insights from the blog discussions. The project received commentary on at least 16 blogs, reached the front page of the Slashdot technology-news aggregator, and spawned a closely related project on Tao’s blog. Things went smoothly: neither Internet ‘trolls’ (persistent posters of malicious or purposefully distracting comments) nor well-intentioned but unhelpful comments were significant problems, although spam was an occasional issue on the wiki. Gowers acted as a moderator, but this involved little more than correcting a few typos.

Progress came far faster than anyone expected. On 10 March, Gowers announced that he was confident that the Polymath participants had found an elementary proof of the special case of DHJ, but also that, very surprisingly (in the light of experience with similar problems), the argument could be straightforwardly generalized to prove the full theorem. A paper describing this proof is being written up, along with a second paper describing related results. Also during the project, Tim Austin, a graduate student at the University of California, Los Angeles, announced another new (but non-elementary) proof of DHJ that made crucial use of ideas from the Polymath Project.

ILLUSTRATIONS BY M. HODSON
© 2009 Macmillan Publishers Limited. All rights reserved
OPINION NATURE|Vol 461|15 October 2009

The working record of the Polymath Project is a remarkable resource for students of mathematics and for historians and philosophers of science. For the first time one can see on full display a complete account of how a serious mathematical result was discovered. It shows vividly how ideas grow, change, improve and are discarded, and how advances in understanding may come not in a single giant leap, but through the aggregation and refinement of many smaller insights. It shows the persistence required to solve a difficult problem, often in the face of considerable uncertainty, and how even the best mathematicians can make basic mistakes and pursue many failed ideas. There are ups, downs and real tension as the participants close in on a solution. Who would have guessed that the working record of a mathematical project would read like a thriller?

Broader implications
The Polymath Project differed from traditional large-team collaborations in other parts of science and industry. In such collaborations, work is usually divided up in a static, hierarchical way. In the Polymath Project, everything was out in the open, so anybody could potentially contribute to any aspect. This allowed ideas to be explored from many different perspectives and allowed unanticipated connections to be made.

The process raises questions about authorship: it is difficult to set a hard-and-fast bar for authorship without causing contention or discouraging participation. What credit should be given to contributors with just a single insightful contribution, or to a contributor who is prolific but not insightful? As a provisional solution, the project is signing papers with a group pseudonym, ‘DHJ Polymath’, and a link to the full working record. One advantage of Polymath-style collaborations is that because all contributions are out in the open, it is transparent what any given person contributed. If it is necessary to assess the achievements of a Polymath contributor, then this may be done primarily through letters of recommendation, as is done already in particle physics, where papers can have hundreds of authors.

The project also raises questions about preservation. The main working record of the Polymath Project is spread across two blogs and a wiki, leaving it vulnerable should any of those sites disappear. In 2007, the US Library of Congress implemented a programme to preserve blogs by people in the legal profession; a similar but broader programme is needed to preserve research blogs and wikis.

New projects now under way will help to explore how collaborative mathematics works best (see go.nature.com/4ZfIdc). One question of particular interest is whether the process can be scaled up to involve more contributors. Although DHJ Polymath was large compared with most mathematical collaborations, it fell short of being the mass collaboration initially envisaged. Those involved agreed that scaling up much further would require changes to the process. A significant barrier to entry was the linear narrative style of the blog. This made it difficult for late entrants to identify problems to which their talents could be applied. There was also a natural fear that they might have missed an earlier discussion and that any contribution they made would be redundant. In open-source software development, this difficulty is addressed in part by using issue-tracking software to organize development around ‘issues’ (typically, bug reports or feature requests), giving late entrants a natural starting point, limiting the background material that must be mastered, and breaking the discussion down into modules. Similar ideas may be useful in future Polymath Projects.

Towards open science
The Polymath process could potentially be applied to even the biggest open problems, such as the million-dollar prize problems of the Clay Mathematics Institute in Cambridge, Massachusetts. Although the collaborative model might deter some people who hope to keep all the credit for themselves, others could see it as their best chance of being involved in the solution of a famous problem.

Outside mathematics, open-source approaches have only slowly been adopted by scientists. One area in which they are being used is synthetic biology. DNA for the design of living organisms is specified digitally and uploaded to an online repository such as the Massachusetts Institute of Technology Registry of Standard Biological Parts. Other groups may use those designs in their laboratories and, if they wish, contribute improved designs back to the registry. The registry contains more than 3,200 parts, deposited by more than 100 groups. Discoveries have led to many scientific papers, including a 2008 study showing that most parts are not primitive but rather build on simpler parts (J. Peccoud et al. PLoS ONE 3, e2671; 2008). Open-source biology and open-source mathematics thus both show how science can be done using a gradual aggregation of insights from people with diverse expertise.

Similar open-source techniques could be applied in fields such as theoretical physics and computer science, where the raw materials are informational and can be freely shared online. The application of open-source techniques to experimental work is more constrained, because control of experimental equipment is often difficult to share. But open sharing of experimental data does at least allow open data analysis. The widespread adoption of such open-source techniques will require significant cultural changes in science, as well as the development of new online tools. We believe that this will lead to the widespread use of mass collaboration in many fields of science, and that mass collaboration will extend the limits of human problem-solving ability. ■

Timothy Gowers is in the Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK, and a Royal Society 2010 Anniversary Research Professor. Michael Nielsen is a Toronto-based writer and physicist working on a book about the future of science.
e-mails: W.T.Gowers@dpmms.cam.ac.uk; mn@michaelnielsen.org

Multidimensional noughts and crosses
To understand the density Hales–Jewett theorem (DHJ), imagine a multidimensional noughts-and-crosses (or tic-tac-toe) board, with k squares on a side (instead of the usual three), and in n dimensions rather than two. Any square in this board has n coordinates between 1 and k, so for instance if k=3 and n=5, then a typical point might be (1,3,2,1,2). A line on such a board has coordinates that either stay the same from one point to the next, or go upwards or downwards. For instance, the three points (1,2,3,1,3), (2,2,3,2,2) and (3,2,3,3,1) form a line. DHJ states that, for a very large number of dimensions, filling in even a tiny fraction of the board always forces a line to be filled in somewhere: there is no possible way of avoiding such a line. More than this, there is no way to avoid a ‘combinatorial line’, in which the coordinates that vary have to vary in the same direction (rather than some going up and some going down), as in the line (1,2,3,1,1), (2,2,3,2,2) and (3,2,3,3,3). The initial aim of the Polymath Project was to tackle the first truly difficult case of DHJ, which is when k=3.
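The two kinds of line described in the ‘Multidimensional noughts and crosses’ box can be checked mechanically. What follows is a minimal Python sketch, not anything taken from the Polymath working record. It assumes the k points are given in order, each as a tuple of n coordinates between 1 and k, and the function names are purely illustrative.

```python
def coordinate_patterns(points):
    """Classify each coordinate across the ordered points.

    Returns one label per coordinate: 'same' (constant), 'up' (1, 2, ..., k),
    'down' (k, ..., 2, 1) or 'other'. Assumes len(points) == k.
    """
    k = len(points)
    ascending = tuple(range(1, k + 1))
    descending = tuple(range(k, 0, -1))
    labels = []
    for column in zip(*points):  # one tuple per coordinate position
        if all(value == column[0] for value in column):
            labels.append("same")
        elif column == ascending:
            labels.append("up")
        elif column == descending:
            labels.append("down")
        else:
            labels.append("other")
    return labels


def is_line(points):
    """Every coordinate stays the same, goes up or goes down, and at least one varies."""
    labels = coordinate_patterns(points)
    return "other" not in labels and any(label != "same" for label in labels)


def is_combinatorial_line(points):
    """As is_line, but every varying coordinate must go up (same direction)."""
    labels = coordinate_patterns(points)
    return all(label in ("same", "up") for label in labels) and "up" in labels


# The two examples from the box (k = 3, n = 5).
line = [(1, 2, 3, 1, 3), (2, 2, 3, 2, 2), (3, 2, 3, 3, 1)]
combinatorial = [(1, 2, 3, 1, 1), (2, 2, 3, 2, 2), (3, 2, 3, 3, 3)]

print(is_line(line), is_combinatorial_line(line))                    # True False
print(is_line(combinatorial), is_combinatorial_line(combinatorial))  # True True
```

The first example fails the stricter test only because its last coordinate runs downwards (3, 2, 1) while two others run upwards.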

Stitching science together


Google Wave is the kind of open-source online collaboration tool that should drive scientists
to wire their research and publications into an interactive data web, says Cameron Neylon.

Science communication today remains firmly wedded to its print origins. We cling to the notion that ‘the real version’ exists on the page. Beyond ease of delivery, we take very little advantage of the potential of the World Wide Web to transform the way we store and transfer knowledge. We rarely take the opportunity to update material with new data, or to provide a record of how a document or data set has changed. Gene names and protein structures should be routinely linked to database entries through hyperlinks. The outputs of computational processes should be connected to their inputs, so analyses can be redone. If we can make these records accessible to humans and readable by machines, then whole new types of analysis will become possible, indeed standard.

Many of these things are possible today. But they are hard to achieve. Much effort has gone into solving parts of the problem, by big players such as Microsoft and Amazon as well as by smaller organizations. Electronic lab notebooks can help to capture the details of science, and databases can make it available to the user. Reference-management tools such as Delicious, semantic data stores and Wikipedia can help to wire up and monitor knowledge. But the tools are often difficult to use and don’t ‘talk’ to each other. There is no single framework that makes it easy to link all the steps of science. Scientists do their analysis and writing using different software, and prepare graphs and record data using different tools.

Very few companies worldwide have both the expertise and resources to take on the task of stitching this together. So it is with great interest that I have watched Google develop its product, Google Wave. The company describes Google Wave as “what e-mail would look like if it were invented today”. It blends elements of e-mail with instant messaging and online collaborative authoring. The big change is that the ‘document’ or ‘wave’ is shared between all the participants and updates flow in real time. You no longer need to worry about which version of a document you have e-mailed around. This is helpful for scientists, but not revolutionary. Where Wave offers a big step for science is in two other functionalities.

Two steps forward
First, Wave introduces the idea of robots: automated agents that can be invited into a document. Robots could look through your paper checking for Protein Data Bank codes or gene names, for example, and putting in links to the databases. A robot might represent a lab instrument, adding data automatically to your laboratory record when they become available. You can easily add maps, video or three-dimensional graphics to your work using ‘gadgets’ or ‘applications’, familiar from services such as iGoogle and Facebook. Robots can interact with this information, making it possible to have a dashboard in your inbox to monitor and control instruments in the lab.

The second step forward is using versions. Each wave maintains a record of every change. It could be possible to check each step from data collection to drawing a graph and its publication. This would allow a reader to step through an analysis to see where conclusions have come from, and would make detecting fraud (or honest mistakes) much easier.

Google has done a good thing in making the protocol and programming tools open source, enabling people to test and build. Perhaps 50 people, myself included, from experimental scientists to journal publishers, have been testing the prototype system for science applications since June, building robots that link chemical information, visualize data and format references. Since 30 September, a much bigger group has been testing. But real benefits will come only if the system is widely adopted. Perhaps a new generation of scientists will be required to exploit the power that working with these dynamic documents and tools offers.

Solving the current problems in science communication requires the intervention of strong companies such as Google. But it will take more than technical advances to provoke scientists into taking full advantage of the web. We need pressure, and perhaps compulsion, from journals and funders to raise publishing standards to the new level made possible by such tools. Google Wave may not be, indeed is probably not, the whole answer. But it points the way to tools that build records and reproducibility into every step. And that has to be good for science. ■

Cameron Neylon is senior scientist in biomolecular sciences at the Science and Technology Facilities Council Rutherford Appleton Laboratory, Didcot OX11 0QX, UK.
e-mail: cameron.neylon@stfc.ac.uk

ILLUSTRATION BY M. HODSON
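The link-adding robot described under ‘Two steps forward’ (an agent that scans a paper for Protein Data Bank codes and puts in links to the database) can be illustrated with a few lines of plain Python. This is only a sketch and does not use the Google Wave robots API. The function name `link_pdb_codes`, the requirement that a code be introduced by the word "PDB", and the `base_url` pattern are all assumptions made for illustration.

```python
import re

# Assumption: a code is written after the word "PDB" (optionally "PDB ID" and a colon).
# A PDB code is taken here to be four characters: a digit 1-9 then three alphanumerics.
PDB_PATTERN = re.compile(r"\bPDB(?: ID)?:?\s*([1-9][A-Za-z0-9]{3})\b")


def link_pdb_codes(text, base_url="https://www.rcsb.org/structure/"):
    """Append a database link after each PDB code mention found in the text.

    base_url is an illustrative assumption, not something specified in the article.
    """
    def add_link(match):
        code = match.group(1).upper()
        # Keep the original mention and follow it with the link in angle brackets.
        return f"{match.group(0)} <{base_url}{code}>"

    return PDB_PATTERN.sub(add_link, text)


print(link_pdb_codes("The fold resembles PDB 1tim in outline."))
```

Requiring the "PDB" prefix is a deliberate choice in this sketch: a bare four-character pattern would also match ordinary numbers such as years.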