/  6
 
DOI: 10.1126/science.1165620, 85 (2009);
324
Science 
 
et al.
Ross D. King,
The Automation of Science
 
www.sciencemag.org (this information is current as of April 2, 2009 ): The following resources related to this article are available online at 
 http://www.sciencemag.org/cgi/content/full/324/5923/85version of this article at:including high-resolution figures, can be found in the online
Updated information and services,
found at:can be
related to this article
subject collections
this articlepermission to reproduce
of this article or about obtaining
reprints
Information about obtainingregistered trademark of AAAS.is a
Science 
2009 by the American Association for the Advancement of Science; all rights reserved. The titleCopyrightAmerican Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005.(print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the
Science 
   o  n   A  p  r   i   l   2 ,   2   0   0   9  w  w  w .  s  c   i  e  n  c  e  m  a  g .  o  r  g   D  o  w  n   l  o  a   d  e   d   f  r  o  m 
 
We have demonstrated the discovery of  physical laws, from scratch, directly from ex- perimentally captured data with the use of computational search. We used the presentedapproach to detect nonlinear energy conservationlaws, Newtonian force laws, geometric invari-ants, and system manifolds in various syntheticand physically implemented systems withou prior knowledge about physics, kinematics, or geometry. The concise analytical expressions that we found are amenable to human interpretationand help to reveal the physics underlying theobserved phenomenon. Many applications exist for this approach, in fields ranging from systems biology to cosmology, where theoretical gapsexist despite abundance in data.Might this process diminish the role of futurescientists? Quite the contrary: Scientists may use processessuchasthistohelpfocusoninteresting phenomena more rapidly and to interpret their meaning.
References and Notes
1. P. W. Anderson,
Science
177
, 393 (1972).2. E. Noether,
Nachr. d. König Gesellsch. d. Wiss. zuGöttingen, Math-Phys. Klasse
235 (1918).3. J. Hanc, S. Tuleja, M. Hancova,
Am. J. Phys.
72
, 428(2004).4. D. Clery, D. Voss,
Science
308
, 809 (2005).5. A. Szalay, J. Gray,
Nature
440
, 413 (2006).6. R. E. Valdés-Pérez,
Commun. Assoc. Comput. Mach.
42
,37 (1999).7. R. D. King
et al
.,
Nature
427
, 247 (2004).8. P. Langley,
Cogn. Sci.
5
, 31 (1981).9. R. M. Jones, P. Langley,
Comput. Intell.
21
, 480(2005).10. J. R. Koza,
Genetic Programming: On the Programming of Computers by Means of Natural Selection
. (MIT Press,Cambridge, MA, 1992).11. S. Forrest,
Science
261
, 872 (1993).12. J. Duffy, J. Engle-Warnick,
Evolutionary Computation inEconomics and Finance
100
, 61 (2002).13. F. Cyril, B. Alberto, in
2007 IEEE Congress onEvolutionary Computation
, S. Dipti, W. Lipo, Eds.(IEEE Press, Singapore, 2007), pp. 23
30.14. B. Elena, B. Andrei, L. Henri, in
Seventh International Symposium on Symbolic and Numeric Algorithms for  Scientific Computing (SYNASC '05)
(IEEE Press, 2005),pp. 321
324.15. J. Bongard, H. Lipson,
Proc. Natl. Acad. Sci. U.S.A.
104
,9943 (2007).16. S. Nee, N. Colegrave, S. A. West, A. Grafen,
Science
309
,1236 (2005).17. P. Jäckel, T. Mullin,
Proc. R. Soc. London Ser. A
454
,3257 (1998).18. T. Shinbrot, C. Grebogi, J. Wisdom, J. A. Yorke,
 Am. J. Phys.
60
, 491 (1992).19. Y. Liang, B. Feeny,
Nonlinear Dyn.
52
, 181 (2008).20. M. Mor, A. Wolf, O. Gottlieb, in
Proceedings of the 21st  ASME Biennial Conference on Mechanical Vibration and Noise
(ASME Press, Las Vegas, NV, 2007), pp. 1
8.21. P. Gregory, R. Denis, F. Cyril, in
Evolution Artificielle,6th International Conference
, vol. 2936, L. Pierre,C. Pierre, F. Cyril, L. Evelyne, S. Marc, Eds. (Springer,Marseilles, France, 2003), pp. 267
277.22. E. D. De Jong, J. B. Pollack, in
Genetic Programming and Evolvable Machines
, vol. 4 (Springer, Berlin, 2003),pp. 211
233.23. S. H. Strogatz,
Nature
410
, 268 (2001).24. P. A. Marquet,
Nature
418
, 723 (2002).25. This research was supported in part by Integrative GraduateEducation and Research Traineeship program in nonlinearsystems, a U.S. NSF graduate research fellowship, and NSFCreative-IT grant 0757478 and CAREER grant 0547376.We thank M. Kurman for editorial consultation andsubstantive editing of the manuscript.
Supporting Online Material
www.sciencemag.org/cgi/content/full/324/5923/81/DC1Materials and MethodsSOM TextFigs. S1 to S7Tables S1 to S3ReferencesMovie S1Data Sets S1 to S1515 September 2008; accepted 19 February 200910.1126/science.1165893
The Automation of Science
Ross D. King,
1
*
Jem Rowland,
1
Stephen G. Oliver,
2
Michael Young,
3
Wayne Aubrey,
1
Emma Byrne,
1
Maria Liakata,
1
Magdalena Markham,
1
P
ı
nar Pir,
2
Larisa N. Soldatova,
1
Andrew Sparkes,
1
Kenneth E. Whelan,
1
Amanda Clare
1
The basis of science is the hypothetico-deductive method and the recording of experiments insufficient detail to enable reproducibility. We report the development of Robot Scientist
Adam,
which advances the automation of both. Adam has autonomously generated functional genomicshypotheses about the yeast
Saccharomyces cerevisiae
and experimentally tested these hypothesesby using laboratory automation. We have confirmed Adam
s conclusions through manualexperiments. To describe Adam
s research, we have developed an ontology and logical language.The resulting formalization involves over 10,000 different research units in a nested treelikestructure, 10 levels deep, that relates the 6.6 million biomass measurements to their logicaldescription. This formalization describes how a machine contributed to scientific knowledge.
C
omputers are playing an ever-greater rolein the scientific process (
1
). Their use tocontroltheexecutionofexperimentscon-tributes to a vast expansion in the production of scientific data (
2
). This growth in scientific data,in turn, requires the increased use of computersfor analysis and modeling. The use of computersisalsochangingthewaythatscience is describedand reported. Scientific knowledge is best ex- pressed in formal logical languages (
3
). Onlyformal languages provide sufficient semanticclarity to ensure reproducibility and the freeexchange of scientific knowledge. Despite theadvantages of logic, most scientific knowledge isexpressed only in natural languages. This is nowchanging through developments such as theSemantic Web (
4
) and ontologies (
5
).Anaturalextensionofthetrendtoever-greater computer involvement inscience is the conceptof a robot scientist (
). This is a physically imple-mentedlaboratoryautomationsystemthatexploitstechniques from the field of artificial intelligence(
 – 
9
) to execute cycles of scientific experimenta-tion. A robot scientist automatically originateshypotheses to explain observations, devises exper-imentstotestthesehypotheses,physicallyrunstheexperiments by using laboratory robotics, inter- prets the results, and then repeats the cycle.High-throughput laboratory automation is trans-forming biology and revealing vast amounts of new scientific knowledge (
10
). Nevertheless, ex-isting high-throughput methods are currently in-adequate for areas such as systems biology. Thisis because, even though very large numbers of experiments can be executed,each individual ex- periment cannot be designed to test a hypothesisaboutamodel.Robotscientistshavethepotentialto overcome this fundamental limitation.The complexity of biological systems neces-sitates the recording of experimental metadata inasmuchdetailaspossible.Acquiringthesemeta-data has often proved problematic. With robot scientists, comprehensive metadata are producedas a natural by-product of the way they work.Because the experiments are conceived and ex-ecuted automatically by computer, it is possibleto completely capture and digitally curate all as- pects of the scientific process (
11
,
12
).To demonstrate that the robot scientist meth-odology can be both automated and be madeeffectiveenoughtocontributetoscientificknowl-edge, we have developed Robot Scientist 
Adam
(
13
)(Fig.1).Adam
shardwareisfullyautomatedsuch that it only requires a technician to period-ically add laboratory consumables and to removewaste. It is designed to automate the high-throughput execution of individually designedmicrobial batch growth experiments in micro-titer plates (
14
). Adam measures growth curves(phenotypes) of selected microbial strains (geno-types) growing in defined media (environments).Growthof cell cultures can be easily measured inhigh-throughput, and growth curves are sensitiveto changes in genotype and environment.We applied Adam to the identification of genes encoding orphan enzymes in
Saccharomy-ces cerevisiae
: enzymes catalyzing biochemicalreactions thought to occur in yeast, but for whichthe encoding gene(s) are not known (
15
). To set up Adam for this application required (i) a comprehensive logical model encoding knowl-edge of 
S. cerevisiae
metabolism [~1200 open
1
Department of Computer Science, Aberystwyth University,SY23 3DB, UK.
2
Cambridge Systems Biology Centre, Depart-ment of Biochemistry, University of Cambridge, SangerBuilding, 80 Tennis Court Road, Cambridge CB2 1GA, UK.
3
Institute of Biological, Environmental and Rural Sciences,Aberystwyth University, SY23 3DD, UK.*To whom correspondence should be addressed. E-mail:rdk@aber.ac.uk
www.sciencemag.org
SCIENCE
VOL 324 3 APRIL 2009
85
REPORTS
   o  n   A  p  r   i   l   2 ,   2   0   0   9  w  w  w .  s  c   i  e  n  c  e  m  a  g .  o  r  g   D  o  w  n   l  o  a   d  e   d   f  r  o  m 
 
reading frames (ORFs), ~800 metabolites] (
15
),expressed in the logic programming languageProlog; (ii) a general bioinformatic database of genes and proteins involved in metabolism; (iii)software to abduce hypotheses about the genesencoding the orphan enzymes, done by using a combination of standard bioinformatic softwareand databases; (iv) software to deduce experi-mentsthat test the observational consequences of hypotheses (based on the model); (v) software to plananddesigntheexperiments,whicharebasedontheuseofdeletionmutantsandtheadditionof selected metabolites to a defined growth medium;(vi) laboratory automation software to physicallyexecute the experimental plan and to record thedata and metadata in a relational database; (vii)software to analyze the data and metadata (gen-erate growth curves and extract parameters); and(viii) software to relate the analyzed data to thehypotheses; for example, statistical methods arerequired to decide on significance. Once this in-frastructureisinplace,nohuman intellectualinter-vention is necessary to execute cycles of simplehypothesis-led experimentation. [For more detailsof the software, and its application to a relatedfunctional genomics problem, see (
16 
) and figs.S1 and S2].Adam formulated and tested 20 hypothesesconcerninggenesencoding13orphanenzymes(
16 
) (Table 1). The weight of the experimentalevidence for the hypotheses varied (based on ob-servationsofdifferentialgrowth),but12hypothe-ses with no previous evidence were confirmedwith
< 0.05 for the null hypothesis.Because Adam
s experimental evidence for itsconclusions is indirect, we tested Adam
s conclu-sionswithmoredirectexperimentalmethods.Theenzyme 2-aminoadipate:2-oxoglutarate amino-transferase (2A2OA) catalyzes a reaction in thelysine biosynthetic pathways of fungi. Adam hy- pothesized that three genes (YER152C, YJL060W,and YGL202W) encode this enzyme and ob-servedresultsconsistentwithallthreehypotheses(Table 1). To test Adam
s conclusions, we pu-rified the protein products of these genes andused them in in vitro enzyme assays, whichconfirmed Adam
s conclusions [supporting on-line material (SOM)] (Fig. 2).To further test Adam's conclusions, we ex-amined the scientific literature on the 20 genesinvestigated (Table 1) (
16 
). This revealed the ex-istence of strong empirical evidence for the cor-rectness of six of the hypotheses; that is, theenzymes were not actually orphans (Table 1).The reason that Adam considered them to beorphans was due to the use of an incomplete bio-informatic database. These six genes thereforeconstitute a positive control for Adam's meth-odology. A possible error was also revealed(Table 1) (SOM).To better understand the reasons why theidentity ofthe genesencodingtheseenzymes hasremained obscure for so long, we investigatedtheir comparative genomics in detail (
16 
). Thelikely explanation is a combination of three com- plicatingfactors:geneduplicationswithretentionof overlapping function, enzymes that catalyzemorethanonerelatedreaction,andexistingfunc-tionalannotations.Adam
ssystematicbioinformaticand quantitative phenotypic analyzes were requiredto unravel this web of functionality.Useofarobotscientistenablesallaspectsofscientific investigation to be formalized in logic.For the core organization of this formalization,we used the ontology of scientific experiments:EXPO(
11
,
12
).Thisontologyformalizesgenericknowledge about experiments. For Adam, wedeveloped LABORS, a customized version of EXPO, expressed in the description logic lan-guage OWL-DL (
17 
). Application of LABORS produces experimental descriptions in the logic-
Fig. 1.
The Robot Scien-tist Adam. The advancesthat distinguish Adam fromother complex laboratorysystems are the individualdesign of the experimentsto test hypotheses and theutilization of complex in-ternal cycles. Adam
s basicoperations are selection ofspecified yeast strains froma library held in a freezer,inoculation of these strainsinto microtiter plate wellscontaining rich medium,measurement of growthcurves on rich medium,harvesting of a definedquantity of cells from eachwell, inoculation of thesecells into wells containingdefined media (minimal syn-thetic dextrose medium plusup to four added metab-olites from a choice of six),and measurement of growthcurves on the specified me-dia. To achieve this func-tionality, Adam has thefollowing components: a,anautomated
20°Cfreezer;b, three liquid handlers (oneof which can separately control 96 fluid channels simultaneously); c, threeautomated +30°C incubators; d, two automated plate readers; e, three robotarms; f, two automated plate slides; g, an automated plate centrifuge; h, anautomated plate washer; i, two high-efficiency particulate air filters; and j, arigid transparent plastic enclosure. There are also two bar code readers, sevencameras, 20 environment sensors, andfour personal computers, as well as thesoftware. Adam is capable of designing and initiating over a thousand newstrain and defined-growth-medium experiments each day (from a selection ofthousands of yeast strains), with each experiment lasting up to 5 days. Thedesign enables measurement of OD
595nm
for each experiment at least onceevery 30 min (more often if running at less than full capacity), allowing ac-curate growth curves to be recorded (typically we take over a hundred mea-surements a day per well), plus associated metadata. See the supportingonline material for pictures and a video of Adam in action.
3 APRIL 2009 VOL 324
SCIENCE
www.sciencemag.org
86
REPORTS
   o  n   A  p  r   i   l   2 ,   2   0   0   9  w  w  w .  s  c   i  e  n  c  e  m  a  g .  o  r  g   D  o  w  n   l  o  a   d  e   d   f  r  o  m 

Share & Embed

More from this user

Add a Comment

Characters: ...