 
Jean-Claude Bradley (Drexel University, PI)
is an Associate Professor of Chemistry and E-Learning Coordinator for the College of Arts and Sciences at Drexel University. He leads the UsefulChem project, an initiative started in the summer of 2005 to make the scientific process as transparent as possible by publishing all research work in real time to a collection of public blogs, wikis and other web pages. Jean-Claude coined the term Open Notebook Science (ONS) to distinguish this approach from other more restricted forms of Open Science. He established the ONS Challenge in the fall of 2008, open to students in the US and the UK for the measurement of solubilities in non-aqueous solvents, with prizes through sponsorship from Submeta, Nature and Sigma-Aldrich. The main chemistry objective of the UsefulChem project is currently the synthesis and testing of novel anti-malarial agents and the measurement of non-aqueous solubility for readily available organic compounds. The cheminformatics component aims to interface as much of the research work as possible with autonomous agents to automate the scientific process in novel ways. Jean-Claude teaches undergraduate organic chemistry and chemical information retrieval courses with most content freely available on public blogs, wikis, games and audio and video podcasts. Openness in research meshes well with openness in teaching. Real data from the laboratory can be used in assignments to practice concepts learned in class. Jean-Claude has a Ph.D. in organic chemistry and has published articles and obtained patents in the areas of synthetic and mechanistic chemistry, gene therapy, nanotechnology and scientific knowledge management.
 
Rajarshi Guha (Indiana University, co-PI)
is a visiting Assistant Professor in the School of Informatics at Indiana University. His research focuses on the development of cheminformatics methodologies to enhance various aspects of the drug discovery process. Much of his research focuses on the development of novel methods for Quantitative Structure-Activity Relationship (QSAR) models that enhance their reliability and interpretability. This research has been applied to a number of biological systems including artemisinin analogs, PDGFR inhibitors and cell proliferation agents. More recently he has developed methods to characterize structure-activity relationship (SAR) data using a novel network paradigm. In addition to QSAR modeling, he is also addressing efficient ways to characterize and explore chemical spaces of arbitrary dimensionality. These methods make use of nearest neighbor algorithms and have been employed in the analysis of combinatorial libraries and determination of cluster cardinality in an a priori fashion. Along with methodology development he has contributed extensively to software implementations in cheminformatics. Examples include significant contributions to the Chemistry Development Kit (CDK), a Java toolkit for cheminformatics, development of a cheminformatics web service infrastructure and the development of a statistical web service infrastructure (based on R) that supports access to specific machine learning routines as well as deployment of prebuilt statistical models. He has a Ph.D. in chemistry and has published extensively in peer-reviewed journals.

Antony Williams (ChemSpider, co-PI)
is a Senior Fellow at the National Institute of Statistical Sciences, is the President of ChemZoo Inc., and is the host of ChemSpider, a Structure Centric Community for Chemists. Antony has worked in academia, in industry (Eastman-Kodak) and in the commercial cheminformatics sector in the role of Chief Science Officer. He has authored and co-authored over 100 scientific publications and multiple invited book chapters. He has two issued patents and has one pending. His formal training is as an NMR Spectroscopist with a focus on small molecule structure elucidation and analytical data processing algorithms. The last decade of his career has been focused on the development of integrated sample, structure and analytical data management systems at the desktop and at the enterprise level utilizing web-based technologies. In the past year he has established a free access website, ChemSpider, to provide access to chemistry-related information for almost 20 million chemical entities and their associated data. He is interested in all aspects of cheminformatics, with special interest in chemical structure handling, nomenclature and computer-assisted structure elucidation. He conducts research in the area of data processing techniques for spectroscopy and development of the semantic web.
 
Justification of Intellectual Partnership
This project aims to synergize the following tools and processes: Open Notebook Science, crowdsourcing, laboratory automation, a chemical structure repository, cheminformatics and QSPR modeling.
Crowdsourcing is a term introduced by Jeff Howe to describe the solution of problems through a distributed network of people.[1] Although there are several examples of crowdsourcing in science, most are not open. The best known example is Innocentive, where scientific problems are made public with prizes for the solvers.[2] However, the proposed solutions are not made public and accepted solutions are intended to be proprietary to the company funding the solution. Innocentive is also collaborating with the Rockefeller Institute to help combat third world and orphan diseases.[3] Other non-commercial examples in science include Stardust@home[4] and GalaxyZoo,[5] initiatives designed to identify astronomical objects. Some recent examples of crowdsourcing in chemistry are Chemmunity,[6] the Synaptic Leap,[7] OrgList,[8] Chemists Without Borders,[9] ChemUnPub[10] and NineSigma.[11]
The term "Open Notebook Science" was coined to represent a form of Open Science where the laboratory notebook is made public in as close to real time as possible.[12,13] The Bradley lab has demonstrated the feasibility of carrying out Open Notebook Science since the summer of 2005 with the UsefulChem project.[14] Benefitting from the use of monthly prizes and open to students from the US and UK, the Open Notebook Science Challenge, presently underway, is another example.[15] Experiments are stored on wiki pages, much like in a paper notebook, but with hyperlinks to all the raw data collected. There is also a blog to report on the overall progress of the scientific work.[16] The UsefulChem blog and wiki receive about 200 hits per day. Over time, collaborations with other scientists have evolved. Rajarshi Guha at Indiana University and Tsu-Soo Tan from Nanyang Polytechnic in Singapore have invested significant amounts of time in running docking calculations for UsefulChem virtual libraries and reporting their results openly, in near real time. Dan Zaharevitz, from the National Cancer Institute, has contributed by testing compounds for potential anti-tumor activity. Phil Rosenthal, from UCSF, is in the process of testing some compounds for anti-malarial activity. Other individuals from around the world have contributed by engaging in open discussions in the blogosphere. The UsefulChem project has also been cited in the peer-reviewed literature,[17-18] including an article in a peer-reviewed journal citing UsefulChem lab notebook pages as primary references.[19]
These examples of spontaneous collaboration clearly indicate that there is a huge potential for doing science under open conditions. However, in order to take full advantage of this potential, additional elements must come into play. Currently, experimental results on the wiki pages are written in a format amenable to human interpretation. The system could be made much more powerful by representing the information in a format understandable by machines as well, so as to contribute to the growing semantic web.
As a step in this direction, each experiment page has a “tag” section at the end where more machine-friendly representations of molecules used in the experiment are listed. As an example, InChI codes are alphanumeric representations of molecules using standards supported by IUPAC. Since these InChIs are indexed by search engines, such as Google, they provide an unambiguous means of retrieving relevant information about each chemical. Other means of representing chemicals (SMILES, chemical names) suffer from having non-unique representations.
This tagging system works for simple retrieval of an exact chemical structure but fails to allow more sophisticated queries, such as identifying chemical substructure, similarity of structure, whether a chemical is a reagent or a product, the reaction conditions, and details regarding characterization of the materials. ChemSpider[20] is a free access online database working to build a structure community for chemists and offers some of the necessary capabilities to facilitate this crowdsourcing project. The system has been built specifically with the intention of hosting chemical structures and related information such as analytical data. The system presently hosts over 20 million unique chemical structures and is the first online system to facilitate public depositions of single structures or collections, property data, unstructured annotations and analytical data. ChemSpider (CS) has been designed with the needs of the chemist in mind and over 5000 chemists per day use the system for the purpose of searching for data and information associated with chemical entities. The system has already started to deliver on its promise to facilitate connectivities via the semantic web by providing open web services to allow other platforms to integrate and derive value from the information available online. CS has already committed to two directions of related interest to this project: development of a platform for collaboration between chemists[21] and the development of a structure-based encyclopedia for chemists, WiChempedia.[22] These developments will provide core functionality to support our Open Notebook Science efforts. Additional capabilities will be introduced on an as-needed basis to facilitate our efforts in terms of the support of experimental data capture, control and reporting as outlined below.
Another component of this partnership is the automation of the execution of some of the experiments. This will allow us to take scientific crowdsourcing to the next level, where the online community can not only provide feedback but also directly carry out simple experiments. We propose to incorporate a Mettler-Toledo Miniblock system with a MiniMapper automated liquid handler. This system has already proven effective in optimizing the Ugi reaction.[19] The MiniMapper software provides an XML log file and we have demonstrated a simple way to transfer reaction conditions from an open Google Spreadsheet to program the robot.[19] This set-up is ideal for crowdsourcing since anyone can request experiments to be performed without having to grasp the controls for the MiniMapper.
The final component is the use of computational methods to prioritize molecules for experimental study, avoiding the time and effort involved in testing all available compounds. One approach is the use of Quantitative Structure Property Relationship (QSPR) models. A QSPR model employs statistical methods to correlate molecular structural features of tested molecules to a property (such as solubility) and then predict that property for untested molecules. More importantly, the development of QSPR models is an iterative process, whereby one experimentally tests a set of molecules, develops a predictive model, obtains predictions for new molecules and then experimentally tests them for confirmation. The results of the confirmatory experiments can be fed back into the modeling process to enhance the predictive performance. In addition to QSPR models, one can employ diversity analysis[23] to identify classes of molecular structures to focus on. This is useful in identifying molecules that have not been well-studied, allowing resources to be focused on them. Common techniques include principal components analysis (PCA) and k-nearest neighbor methods. In combination these methods allow crowdsourced collaborations to be much more time- and cost-efficient. Given the distributed nature of crowdsourcing, it is vital that the computational methods also be available in a distributed fashion. Web services have been employed in a variety of fields, and the availability of SOAP-based cheminformatics[24] and statistical[25] services allows the integration of raw experimental data (from wikis and Google Docs) and structure information (from repositories such as ChemSpider) with computational models and tools.
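The iterative QSPR cycle described above (test, model, predict, confirm, refit) can be illustrated with a minimal Python sketch. It is not the modeling approach of this proposal: it assumes a single numeric descriptor per molecule and substitutes a one-variable least-squares fit for a real QSPR method. The function run_experiment, the compound names and all numeric values are hypothetical stand-ins for laboratory measurements.

```python
"""Toy sketch of an iterative QSPR cycle (illustrative assumptions only)."""
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one descriptor.

    Requires at least two distinct descriptor values.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, descriptor):
    """Predict the property for one descriptor value."""
    slope, intercept = model
    return slope * descriptor + intercept


def run_experiment(descriptor):
    """Hypothetical stand-in for a laboratory measurement (noise-free)."""
    return 2.0 - 0.5 * descriptor


def qspr_cycle(measured, untested, rounds, batch_size):
    """Run the fit / predict / confirm loop.

    measured: dict of name -> (descriptor, measured property)
    untested: dict of name -> descriptor
    Returns the final model plus the updated measured and untested sets.
    """
    measured = dict(measured)
    untested = dict(untested)
    for _ in range(rounds):
        if not untested:
            break
        # 1. Develop a predictive model from everything measured so far.
        model = fit_line([d for d, _ in measured.values()],
                         [p for _, p in measured.values()])
        # 2. Rank untested compounds by predicted property, highest first.
        ranked = sorted(untested,
                        key=lambda name: predict(model, untested[name]),
                        reverse=True)
        # 3. Test the top candidates and feed the results back in.
        for name in ranked[:batch_size]:
            descriptor = untested.pop(name)
            measured[name] = (descriptor, run_experiment(descriptor))
    # Final refit so the returned model includes the confirmatory data.
    final_model = fit_line([d for d, _ in measured.values()],
                           [p for _, p in measured.values()])
    return final_model, measured, untested


if __name__ == "__main__":
    start = {"A": (1.0, 1.5), "B": (3.0, 0.5)}
    pool = {"C": 0.0, "D": 2.0, "E": 4.0}
    model, done, left = qspr_cycle(start, pool, rounds=2, batch_size=1)
    print(model, sorted(done), sorted(left))
```

In a crowdsourced setting the call to run_experiment would be replaced by a request for a real measurement, and the fit by whatever statistical model the collaboration adopts; only the shape of the loop is the point here.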
Research program
The following scientific objectives will be initiated as projects:
1) Determining optimal conditions for the precipitation of Ugi products.
 
The Ugi reaction provides rapid access to large combinatorial spaces by bringing together an amine, an aldehyde, a carboxylic acid and an isonitrile.[26-28] In addition, reaction conditions are generally very convenient: methanol at room temperature is a common protocol. The Bradley group has used the Ugi reaction to synthesize potential anti-malarial compounds.[29] We observed that sometimes the products simply crystallized from the reaction mixture. Although NMR monitoring indicated that the reaction proceeded smoothly in most cases, the products did not always precipitate. The ability to purify products by simple filtration offers a significant advantage over the alternatives, which usually consist of chromatography, a costly and inefficient process that severely limits scaling up.
