Int. J. Information Technology and Management, Vol. 1, No. 1, 2002

Chemoinformatics: a tool for modern drug discovery

M. Karthikeyan and S. Krishnan


SMIS Division, National Chemical Laboratory, Pune – 411 008, India
Fax: +91-20-5893973
E-mail: karthi@ems.ncl.res.in krish@ems.ncl.res.in

Abstract: An exponential rise in costs, despite many years of hard work, has
hampered the proficiency, productivity and efficiency of drug discovery research.
Nevertheless, owing to in silico and computational advances, databanks have
accelerated the progression from decoded gene sequences to 3D structures of complex
biomolecules within a very short time span. An important advantage of a computational
technique is its ability to eliminate unpromising lines of inquiry early in the
discovery process. With the help of combinatorial libraries and database mining, it is
possible to carry out a specific chemical reaction on a range of analogous reactants
in all possible combinations. Chemoinformatics will help to identify promising
molecules at earlier stages and thereby eliminate failures at later stages.

Keywords: Chemoinformatics; computational chemistry; virtual library of
molecules; knowledge based computing; drug discovery; in silico synthesis.

Reference to this paper should be made as follows: Karthikeyan, M. and
Krishnan, S. (2002) ‘Chemoinformatics: a tool for modern drug discovery’,
Int. J. Information Technology and Management, Vol. 1, No. 1, pp.69–82.

Biographical notes: M. Karthikeyan is a scientist in the Scientific
Management Information System Division, National Chemical Laboratory,
Pashan Rd., Pune 411 008, India. After graduating in chemistry from
Pondicherry University, he carried out his research in the area of synthetic
organic chemistry at the National Chemical Laboratory and obtained his PhD
from Pune University. His major interests include designing knowledge
management tools to handle digital databases with chemical structure
information. He received two student awards in this area for originality and
novelty from the Chemical Structure Association.

S. Krishnan is the head of the Management and Information Services Division,
National Chemical Laboratory, Pashan Rd., Pune 411 008, India. He obtained
his PhD in biophysics from the Indian Institute of Science, Bangalore. After a
postdoctoral fellowship at the University of California, Santa Cruz, he worked
in the Scientific Information group at Ciba-Geigy, Basel, Switzerland. His
major interests include chemical information and digital libraries.

Copyright © 2002 Inderscience Enterprises Ltd.

1 Introduction

Competition and cost have changed the drug design paradigm from a hit-and-trial
approach to a drug design approach that allows the tailor-made design of active molecules.
This has resulted in both targeted drug discovery and a reduced drug development cycle
time. The need to introduce newer, superior molecules using an automated approach
will make drug discovery a highly knowledge-specific and efficient process.
Some of the techniques that have evolved over time are presented schematically in
Figure 1, which indicates that, progressively, every step in the drug discovery chain has
become automated.

Figure 1 Progress in drug discovery with time

The rapid change in global competition, growth in IT and the emergence of low cost storage
technology have facilitated the paradigm change in drug discovery. Every new drug on
the market has its own story to tell as to how it succeeded in surmounting various hurdles
on the way from conceptualisation to reality. Knowledge management is playing a
major role in almost all chemical and pharmaceutical companies. New chemoinformatics
units are being created to assist ongoing drug discovery programs. Many studies have
appeared on chemoinformatics. This paper briefly outlines managerial issues and the
support required for the effective implementation of chemoinformatics in small as well
as large organisations.

2 Chemoinformatics

Chem(o)informatics is a generic term that encompasses the design, creation, organisation,
storage, management, retrieval, analysis, dissemination, visualisation and use of chemical
information, not only in its own right, but as a surrogate or index for other data,
information and knowledge [1]. Chemoinformatics is defined as the:

“mixing of information resources to transform data into information, and
information into knowledge, intending for better rapid decisions in the arena of
drug lead identification and optimisation.” [2]

Chemoinformatics provides a vital link between theoretical design and drug design through
the extraction of information from data and its conversion into knowledge (Figure 2).

Figure 2 Pyramid of chemoinformatics

Chemoinformatics has two primary questions:
1 What to test next? and
2 What to make next?
Derivation of information and knowledge is only one aspect of chemoinformatics.
Chemoinformatics methods can be used proactively to design and filter the most
appropriate compounds to work with in the real world. It is worth mentioning the roles of
chemical information systems in this context.

Figure 3 Chemical structure information



The use of derived knowledge in a design and selection support role is an important part
of the drug design cycle. The main processes within drug discovery are lead
identification, where a lead is something that has activity in the low micromolar range,
and lead optimisation, which is the process of transforming a lead into a drug candidate.
Once the molecular target has been identified, the various stages involved in drug
development are as depicted in Figure 4.

Figure 4 The molecular paradigm

2.1 Approach towards chemoinformatics


For the effective implementation of chemoinformatics, different firms and organisations
follow various approaches, which include compound registration (database creation),
library enumeration, navigating virtual libraries, access to primary and secondary
scientific literature, QSAR (quantitative structure/activity relationships),
physico-chemical property calculations and integrated chemical structure based property
databases [3]. These approaches require tools not only for the analysis of experimental
data, but also for the generation of calculated properties of molecules.

2.2 Physico-chemical property predictions


For a long time, efforts were directed towards predicting the properties of chemical
species (drugs, drug-like candidates, drug intermediates etc.). However, recent advances
in chemoinformatics include new molecular descriptors and pharmacophore techniques,
statistical tools and their applications. The ability to predict so-called ADME (absorption,
distribution, metabolism and excretion) properties from a molecular structure would have
a tremendous impact on the drug discovery process, both in terms of cost and the amount
of time required to bring a new compound to market. Over the past several years there has
been a tremendous shift towards an emphasis on the optimisation of ADME properties early
in the life of drug discovery programs.
Two strategies are likely to emerge in the area of physico-chemical property
prediction: first to develop a general approach to screen large numbers of compounds, and
then to attempt a high level of accuracy for more diverse compounds. Future research
will probably focus on developing models with datasets that are larger and built around
more diverse collections of compounds with a wide range of chemical functionalities.
The focus is on predicting human pharmacokinetic parameters, human intestinal
absorption, LogP, ClogP, local absorption rate, solubility and permeability, reduced ion
mobility, drug absorption and transport phenomena. The application of chemoinformatics
tools to predict all these physico-chemical properties has been reviewed [4].

3 In silico generation of virtual molecules


In the present scenario, computational tools play a major role in designing libraries,
prior to synthesis, that meet defined criteria of similarity or diversity. The term in
silico is extensively used to describe the virtual world of data, analysis, models and
designs within a computer [5,6]. Rapid identification of a lead compound or series
remains the primary objective of all high-throughput screening.
The ‘real’ world of compounds synthesised in a chemical laboratory and tested in a
biological laboratory is a small part of a larger ‘virtual’ world where hypotheses may be
computer-generated and practically tested.
To explore this virtual world, an appropriate structure coding has to be chosen, one that
is related to the biological activity under investigation. Encoding structural features
efficiently plays an important role, as the encoded features work as a fingerprint for
similarity analysis [7,8]. Moreover, the structure-coding scheme must produce the same
number of descriptors, irrespective of the size and the number of atoms in a molecule. An
interesting example of a new descriptor is the ‘feature tree’, a novel way of representing
the characteristics of a molecule [9]. Work on three-dimensional pharmacophore and shape
representations continues, because these are the methods that should mimic a receptor’s
viewpoint, rather than a chemist’s perception of the internal make-up of a molecule [10].
3D sub-structural descriptors based upon potential pharmacophoric patterns have also
been widely used for diversity analysis as have physico-chemical properties that describe
a molecule’s topological, electronic, steric, lipophilic or geometric features [11-15].
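As a rough illustration of how fixed-length fingerprints of the kind described above
support similarity analysis, the following sketch compares two bit-vector fingerprints
using the Tanimoto coefficient. The choice of coefficient, the function name and the
eight-bit example fingerprints are assumptions made for illustration; the text does not
prescribe a particular similarity measure.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two equal-length 0/1 fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same number of descriptors")
    shared = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Convention assumed here: two all-zero fingerprints count as identical.
    return shared / either if either else 1.0


# Hypothetical fingerprints: each position flags one structural feature.
mol_1 = [1, 1, 0, 1, 0, 0, 1, 0]
mol_2 = [1, 0, 0, 1, 0, 1, 1, 0]
similarity = tanimoto(mol_1, mol_2)
print(round(similarity, 2))  # 0.6
```

Because every molecule yields the same number of descriptors, any two fingerprints can
be compared position by position regardless of molecular size.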
In library enumeration, the core sub-structures are identified as templates in which a
few atoms are left open for substituents (R-groups). By varying the R-groups at the
points of substitution, different product structures can be generated. Rebek et al. [16]
published the synthesis of two combinatorial libraries of semi-rigid compounds that were
prepared by condensing a rigid central molecule functionalised by four acid chloride
groups with a set of 19 different L-amino acids.
The more symmetric skeleton gives fewer compounds, as shown in Figure 5.

Figure 5 Virtual library generation

However, if the core sub-structure has a symmetric geometry, enumeration may create
duplicate structures.
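The effect of core symmetry on enumeration can be sketched in a few lines. The example
below assigns one of 19 building blocks to each of four substitution sites and treats
two assignments as the same product when a symmetry operation of the core maps one onto
the other. The three symmetry groups used (none, a single two-fold axis, and the twelve
tetrahedral rotations) are assumptions chosen for illustration, not a description of the
cores used in [16]. Chirality and real connection tables are ignored.

```python
from itertools import permutations, product


def is_even(perm):
    """True if the site permutation has an even number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return inversions % 2 == 0


def enumerate_products(r_groups, symmetry):
    """Return the set of distinct R-group assignments for a four-site core.

    symmetry lists the site permutations that map the core onto itself
    (identity included). Assignments related by one of them are duplicates,
    so each is reduced to the smallest member of its symmetry-equivalent set.
    """
    distinct = set()
    for assignment in product(r_groups, repeat=4):
        distinct.add(min(tuple(assignment[i] for i in perm) for perm in symmetry))
    return distinct


IDENTITY = (0, 1, 2, 3)
# Assumed symmetry groups, for illustration only.
NO_SYMMETRY = [IDENTITY]
TWO_FOLD_AXIS = [IDENTITY, (1, 0, 3, 2)]  # one axis swapping the sites pairwise
TETRAHEDRAL = [p for p in permutations(range(4)) if is_even(p)]  # 12 rotations

building_blocks = range(19)  # stand-ins for 19 different substituents
counts = {
    "no symmetry": len(enumerate_products(building_blocks, NO_SYMMETRY)),
    "two-fold axis": len(enumerate_products(building_blocks, TWO_FOLD_AXIS)),
    "tetrahedral": len(enumerate_products(building_blocks, TETRAHEDRAL)),
}
print(counts)
# {'no symmetry': 130321, 'two-fold axis': 65341, 'tetrahedral': 11191}
```

Naive enumeration always generates 19^4 = 130,321 assignments; the more symmetric the
core, the more of these collapse onto the same product.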
The duplicate-structure problem could be overcome by implementing a similarity check
algorithm based on the connection table and chirality of the atoms involved. Estimates of
the number of drug-like compounds that could theoretically be made are greater than
10^40 [17]. Deciding which of these molecules to synthesise and test requires a good
decision support system. Four principal types of selection procedure are cited in the
literature, based on clustering, partitioning, dissimilarity and optimisation. For the
datasets studied, Bayada et al. concluded that Ward’s clustering of two-dimensional
fingerprints gave the biggest improvement over random selection, while in a different
study the use of a partitioned chemical descriptor space showed how such a space could be
used for diverse subset selection [18]. The latter method obviates the problem of some
clustering techniques, where the clusters change as new molecules are added to a study.
The above methods allow the calculated property profile of a virtual library to be
optimised so that it most effectively matches a desired target, such as the properties of
a collection of drug-like molecules. They can also cope with the huge combinatorial space
that must be examined when selecting monomers for a library that is to be smaller than
that theoretically possible. A few useful papers have appeared on library design
methods [19–21]. Experience has shown that library design should preferably be based on
calculated properties in product space rather than in monomer space [22].
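The partition-based selection mentioned above can be illustrated with a minimal sketch.
Each descriptor axis is cut into fixed bins, every molecule falls into exactly one cell,
and one representative is kept per occupied cell. The descriptor pair (molecular weight,
LogP), the bin edges, the molecule names and their values are all made up for
illustration.

```python
from bisect import bisect_right


def cell_of(descriptors, bin_edges):
    """Return the cell (tuple of bin indices) that a descriptor vector falls in."""
    return tuple(bisect_right(edges, value)
                 for value, edges in zip(descriptors, bin_edges))


def select_by_partition(molecules, bin_edges):
    """Keep the first molecule encountered in each occupied cell."""
    representatives = {}
    for name, descriptors in molecules:
        representatives.setdefault(cell_of(descriptors, bin_edges), name)
    return representatives


# Hypothetical bin edges for (molecular weight, LogP); values are illustrative.
bin_edges = [(250.0, 400.0), (2.0, 4.0)]
molecules = [
    ("mol_a", (180.0, 1.2)),
    ("mol_b", (210.0, 1.9)),  # same cell as mol_a, so not selected
    ("mol_c", (420.0, 4.5)),
    ("mol_d", (455.0, 0.3)),
]
subset = select_by_partition(molecules, bin_edges)
print(sorted(subset.values()))  # ['mol_a', 'mol_c', 'mol_d']
```

Since the cells are defined by the bin edges alone, adding new molecules never changes
the cell of an existing one, which is the advantage over clustering noted in the text.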
Synthetic chemists favour software systems based on chemical transformations that
mimic the actual chemistry carried out as these are more familiar [23,24]. Alternative
methods that require the identification of the common core and appended fragments of a
library [25,26] are faster once the separate parts of the product have been defined, but this
often requires considerable human intervention. Hybrid systems have also been
developed [26,27].
Strategies for more efficient biological screening continue to evolve. Rather than
relying on very large screening campaigns, iterative screening strategies are being
explored. These involve screening smaller, selected sets of molecules and using the
derived results to define descriptors for the rational selection of a further set of
molecules. While this obviously mimics the traditional medicinal chemistry approach of
responding to new data, it has taken some time for it to be effectively translated into the
library paradigm. Statistical tools such as recursive partitioning [28] can assist in this
process to identify which descriptors about a lead should be pursued.
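To make the idea of recursive partitioning concrete, the sketch below performs a single
partitioning step: it searches every descriptor and threshold for the split that best
separates active from inactive molecules. Applying the same step again to each side
would grow a tree. The Gini impurity criterion, the function names and the four-molecule
dataset are assumptions for illustration and are not taken from [28].

```python
def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    fraction_active = sum(labels) / len(labels)
    return 2.0 * fraction_active * (1.0 - fraction_active)


def best_split(rows, labels):
    """Return (weighted impurity, descriptor index, threshold) of the best split.

    rows holds one descriptor vector per molecule; labels holds 1 for an
    active molecule and 0 for an inactive one.
    """
    best = None
    for d in range(len(rows[0])):
        values = sorted({row[d] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[d] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[d] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, d, threshold)
    return best


# Made-up screening results: two descriptors per molecule, 1 = active.
rows = [(1.0, 7.0), (2.0, 3.0), (3.0, 8.0), (4.0, 2.0)]
labels = [1, 0, 1, 0]
print(best_split(rows, labels))  # (0.0, 1, 5.0)
```

In this toy dataset the second descriptor separates actives from inactives perfectly,
so it is the one a further round of selection would pursue.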
In alternative approaches, the actual reaction is simulated through a synthetic
knowledge base. This more closely replicates the stages involved in the actual synthesis,
in which reagents should react together according to the rules of synthetic chemistry. A
strong background in Computer Aided Organic Synthesis (CAOS) programs will help
to generate reasonable structures of synthetic importance. So far, more attention has
been paid to the generation of descriptors for diversity analysis than to studies on
fragment substructures or physico-chemical properties [29].

4 Role of natural product chemistry in chemoinformatics

Natural products are an important sector in the area of drug discovery and development.
Most encouraging is the continuing emergence of new natural product chemotypes with
interesting structures and biological activities and with potential for sub-library
generation for targeted screening. Increasingly available as pure compounds, natural
products are highly amenable to the much broader screening opportunities presented by the
new targets. Regardless of chemical library input, natural products are uniquely well
placed to provide structural information from which virtual compounds can be created by
computational chemistry and allied technologies. The structural versatility of natural
products is expected to play a major role in modern drug discovery programs [30].

5 Review of drug company status (growth sector)

The major companies are using chemoinformatics in an integrated manner for areas that
have high growth potential. The challenge is to learn rapidly how to leverage
chemoinformatics to bring forward newer molecules that have highly predictable activity
characteristics, so as to minimise time and clinical trial costs. Genomics, proteomics and
chemoinformatics will increase the diffusion of IT into the pharmaceutical industry. This
will require a higher level organisational knowledge integration process that was hitherto
non-existent in the pharmaceutical industry. Drug discovery is moving into the realm of
IT. Structural knowledge and drug knowledge are becoming highly integrated. In recent
years organisations have been focusing on knowledge based drug design, as shown in
Figure 6. Major pharmaceutical organisations with a large database backbone and vast
experience in this field will concentrate on structure based design, whereas newly created
organisations with no prior knowledge base will try random screening in the early stages.

Figure 6 Knowledge based drug design (figure label: diversity needed to find a hit)

6 Organisational structures for implementation

Chemoinformatics has evolved through the individual initiatives of many firms. IT and drug
discovery are distinct competences in the existing organisational environment and hence
will require coordination. The success of leveraging chemoinformatics will depend on the
ability of firms to use chemoinformatics to reduce the drug discovery cycle and the
ability to integrate chemoinformatics into the organisational knowledge creation process.
The main organisational issue is managing the IT and the drug discovery process.
Organisations have two options for sourcing chemoinformatics competence:
1 in-house facilities and
2 outsourcers.
The way chemoinformatics is developing may enable firms to outsource it, as it is a
specialised competence. The difficulty in getting specialised experts on chemical
information is likely to be compounded in the future.

6.1 Identifying partners


The major issue in leveraging chemoinformatics is identifying competent partners
without losing competitive edge and at the same time creating new molecules of
medicinal importance. Selecting ‘representative’ molecules from clusters created in a
multidimensional chemical descriptor space often requires the selection of subsets of
molecules for screening. Computational library design techniques using genetic
algorithms [31,32] have become vital because of the need to design more efficient
libraries.

6.2 Tools and techniques


With the advent of client/server concepts of computing and the deep penetration of web
technologies into most computing environments, the situation is rapidly changing.
Software and hardware application systems have emerged and are now becoming
integrated. Since growth has not been organic, there has to be a major drive towards
standardisation if the applications are to catch up. Use of the web is widely accepted
for text and image handling, but utilising scientific tools over it is technically more
difficult. Many of the tools applied in chemoinformatics have been developed by various
software vendors [33–39]. While tools for making chemoinformatics methods more accessible
to bench scientists are important, the receptiveness of medicinal chemists to these
techniques requires that their training in statistics, data analysis, visualisation and
biomolecular concepts be improved. The interest shown in Lipinski’s ‘rule of five’ [40],
which succinctly encapsulates some simple parameters concerning drug absorption, shows how
eager medicinal chemists are for rules to help design appropriate molecules in the
libraries era. This illustrates a real need for both better end-user tools and training of
chemists, biologists and interdisciplinary experts to apply the more advanced methods
effectively. A list of databases available from different sources is presented in Table 1.
From the table it is clear that if every organisation collects and generates its own
library, there will be repetition of information. A mechanism or tool must be developed to
link all the related information. This will help to develop a unique database of global
interest with one-point access to chemical information. Some of the key players in this
area are listed in Table 2.

Table 1 Chemical structure databases

Database            Contents (entries)   Supplier
ACD                       238,000        MDL Information Systems Inc.
Aquire                      5,300        EPA
Asinex                    115,000        AsInEx Ltd.
ChemReact97               470,000        InfoChem GmbH
ChemSynth97               170,000        InfoChem GmbH
IBioScreenSC               16,000        InterBioscreen Ltd.
Maybridge                  62,000        Maybridge
MedChem                    36,000        Pomona / BioByte
NCI96                     120,000        NCI
SPRESI ’95              3,200,000        InfoChem GmbH
SPRESI ’95 Preps        2,000,000        InfoChem GmbH
SpresiReact             1,800,000        InfoChem GmbH
TSCA93                    100,000        EPA
WDI                        60,000        Derwent

Table 2 Companies sponsoring chemoinformatics products

Abbott Laboratories                      Henkel KGaA
Affymax Research Institute               Hoffmann-La Roche (AG, Inc)
Aventis Crop Science (France, UK)        Instituto Quimico de Sarriá
Aventis Pharma (France, Germany, USA)    Janssen Pharmaceutica
AstraZeneca UK                           Novartis Pharma
Avon Products Inc                        NV Organon
Bayer (Germany, USA)                     Pfizer Inc
Beiersdorf AG                            Procter & Gamble Company
Birmingham University                    RW Johnson PRI
Boehringer Ingelheim                     Schering AG
Cardiff University                       Searle Pharmaceuticals
CMBI Nijmegen                            SmithKline Beecham Pharmaceuticals
Celltech R&D Limited                     Sanofi-Synthelabo Group
Firmenich SA                             Takeda Chemical Industries
GlaxoWellcome Inc                        Unilever Research
GlaxoWellcome R & D                      University of Leeds
GlaxoWellcome SpA                        Wyeth-Ayerst Research

6.3 Technical issues


Chemoinformatics software from software houses is expensive. Building and maintaining
your own solutions is also expensive. Thus, for good tools and knowledge, one must be
prepared to commit significant resources in this area, in terms of hardware, software and
people-ware (i.e. effective creators and users of software). Avoiding supplier monopolies
and looking for cheaper modules to be substituted for outdated or overpriced parts helps
keep costs down. This, however, requires software to be assembled in a modular fashion
in the first place and to be mutually compatible. Structure representation on a computer
in an encoded form is now an almost mature field; however, many organisations follow
their own file formats for storing structures alongside their in-house research data.
Without internationally agreed standards, much time will continue to be wasted on
incompatible file types. There is a need to develop a compact unique code for
individual molecules along with a structural descriptor, which should be implemented in
all the databases available globally as a linking medium, irrespective of database type in
the e-world. All compounds, including the virtual library of molecules, should be
referred to using this unique code, comparable with the registry number assigned to
individual compounds by the Chemical Abstracts Service or Beilstein. Such a unique code
would reduce duplication of information.
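How a shared structure code could act as a linking medium between databases can be
sketched as follows. The example groups the local identifiers of several databases under
a common code, so that entries describing the same compound can be recognised. The
database names, codes and identifiers are invented for illustration; no existing
registry scheme or file format is implied.

```python
from collections import defaultdict


def link_by_structure_code(databases):
    """Group each database's local identifiers under a shared structure code."""
    linked = defaultdict(dict)
    for db_name, records in databases.items():
        for structure_code, local_id in records:
            linked[structure_code][db_name] = local_id
    return dict(linked)


# Hypothetical databases, codes and identifiers, invented for illustration.
databases = {
    "vendor_catalogue": [("CODE-A", "V-001"), ("CODE-B", "V-002")],
    "in_house_registry": [("CODE-B", "REG-17"), ("CODE-C", "REG-18")],
}
linked = link_by_structure_code(databases)
duplicated = sorted(code for code, ids in linked.items() if len(ids) > 1)
print(duplicated)  # ['CODE-B']
```

With such a key in place, a compound held in more than one collection shows up as a
single linked record rather than as separate, duplicated entries.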

7 Current status

Recent advances in virtual screening track computational capability: as the processing
power of computers improves, so do screening speed and complexity. Parameters such as
structure, function or chemical space allow for a nearly limitless array of screening
options. The use of screening data for development decision making is predicated on
the management and interpretation of the data. Extraction of information from the data is
the vital link between theoretical design and the drug candidate. Finally, it is the
integration of iterative results from computation to activity that drives the cycle forward.
Library chemistry and high-throughput screening require the greater use of
chemoinformatics to increase their effectiveness. However, identifying the types of
procedure that yield the best results, and addressing factors such as cost, availability
and synthetic feasibility, rest with the user. In parallel, another area that is gaining
greater importance is the development of filtering procedures that identify molecules
exhibiting some sort of undesirable characteristic (toxicity, high reactivity etc.).

Figure 7 Drug discovery funnel



Without a proper knowledge base, lead optimisation is a search in the vast darkness of
chemistry space and may take the drug discovery program in the wrong direction.
Establishing a proper database with complete test results may lead to organisational
success in drug discovery (Figure 8).

Figure 8 Need for effective chemoinformatics filter

Combinatorial chemistry has opened up new strategies for a more comprehensive parallel
approach to sweeping and searching during lead optimisation, which has necessitated the
development of suitable and new library design principles.

8 Conclusions

The explosion of raw data coming from library synthesis and HTS operations has driven
the need for improved chemoinformatics systems. In the first instance, knowledge
gained by analysis of data is only as good as the quality of the data. However, the
increase in the amount of data available is often at the expense of context and quality.
The next phase of the challenge could be to have quality chemoinformatics tools to apply
to quality data. At least then we may achieve something other than a new name for a
continuing problem. This integration of chemical information and drug discovery will
completely change the drug discovery process, allowing small and innovative firms to be
active in drug discovery.

References
1 Brown, F.K. (1998) ‘Chemoinformatics: what is it and how does it impact drug discovery’,
Annu Rep Med Chem, Vol. 33, pp.375–384.
2 Hann, M. and Green, R. (1999) ‘Chemoinformatics – a new name for an old problem’,
Current Opinion in Chemical Biology, Vol. 3, pp.379–383.
3 Parks, C.A., Crippen, G.M. and Topliss, J.G. (1998) ‘The measurement of molecular diversity
by receptor site interaction simulation’, J Comput Aided Mol Des., Vol. 12, pp.441–449.
4 Blake, J.F. (2000) ‘Chemoinformatics – predicting the physicochemical properties of drug-like
molecules’, Current Opinion in Biotechnology, Vol. 11, pp.104–107.
5 Willett, P. (2000) ‘Chemoinformatics – similarity and diversity in chemical libraries’, Current
Opinion in Biotechnology, Vol. 11, pp.85–88.
6 Leach, A.R. and Hann, M.M. (2000) ‘The in silico world of virtual libraries’, Drug Discov
Today, Vol. 5, No. 8, pp.326–336.
7 Willett, P., Barnard, J.M. and Downs, G.M. (1998) ‘Chemical similarity searching’, J Chem
Inform Comput Sci, Vol. 38, pp.983–996.
8 Kubinyi, H. (1998) ‘Similarity and dissimilarity: a medicinal chemist’s view’, Perspect Drug
Discov Des, Vol. 9–11, pp.225–252.
9 Rarey, M. and Dixon, J.S. (1998) ‘Feature trees, a new molecular similarity measure based on
tree matching’, J Comput Aided Mol Des., Vol. 12, pp.471–490.
10 Good, A.C. and Richards, W.G. (1998) ‘Explicit calculation of 3D molecular similarity’,
Perspect Drug Discov Des, Vol. 9–11, pp.321–338.
11 Pickett, S.D., Mason, J.S. and McLay, I.M. (1996) ‘Diversity profiling and design using 3D
pharmacophores, pharmacophore-derived queries (PDQ)’, J Chem Inform Comput Sci.,
Vol. 36, pp.1214–1223.
12 Kubinyi, H., Folkers, G. and Martin, Y.C. (1998) ‘3D QSAR in drug design. Theory, methods
and applications’, Perspect Drug Discov Des, Vol. 12–14, pp.v–vii.
13 Cummins, D.J., Andrews, C.W., Bentley, J.A. and Cory, M. (1996) ‘Molecular diversity in
chemical databases, comparison of medicinal chemistry knowledge bases and databases of
commercially available compounds’, J Chem Inform Comput Sci., Vol. 36, pp.750–763.
14 Martin, E.J., Blaney, J.M., Siani, M.A., Spellmeyer, D.C., Wong, A.K. and Moos, W.H.
(1995) ‘Measuring diversity, experimental design of combinatorial libraries for drug
discovery’, J Med Chem, Vol. 38, pp.1431–1436.
15 Martin, Y.C., Brown, R.D. and Bures, M.G. (1998) ‘Quantifying diversity’, in, M. Gordon and
J.F. Kerwin (Eds.) Combinatorial Chemistry and Molecular Diversity in Drug Discovery,
New York, Wiley–Liss, pp.369–385.
16 Carell, T., Wintner, A., Bashir, H.A. and Rebek, J.A. (1994) ‘Solution-phase screening
procedure for the isolation of active compounds from a library of molecules’, Angew Chem.
Int. Ed. Engl., Vol. 33, pp.2061–2064.
17 Martin, Y.C. (1997) ‘Challenges and prospects for computational aids to molecular diversity’,
Perspect Drug Discov Des, Vol. 7/8, pp.159–172.
18 Bayada, D.M., Hamersma, H. and van Geerestein, V.J. (1999) ‘Molecular diversity and
representativity in chemical databases’, J Chem Inform Comput Sci., Vol. 39, pp.1–10.
19 Cramer, R.D., Patterson, D.E., Clark, R.D.D., Soltanshahi, F. and Lawless, M.S. (1998)
‘Virtual compound libraries, a new approach to decision making in molecular discovery
research’, J Chem Inform Comput Sci., Vol. 38, pp.1010–1023.
20 Drewry, D. and Young, S. (1999) ‘Approaches to the design of combinatorial libraries’,
Chemomet Intell Lab Sys., Vol. 48, pp.1–20.
21 Lewell, X.Q., Judd, D., Watson, S. and Hann, M. (1998) ‘RECAP-retrosynthetic
combinatorial analysis procedure, a powerful new technique for identifying privileged
molecular fragments with useful applications in combinatorial chemistry’, J Chem Inf Comput
Sci., Vol. 38, pp.511–522.
22 Gillet, V., Willett, P. and Bradshaw, J. (1997) ‘The effectiveness of reactant pools for
generating structurally-diverse combinatorial libraries’, J Chem Inform Comput Sci., Vol. 37,
pp.731–740.
23 Daylight Chemical Information Systems Inc. on the World Wide Web,
URL http://www.daylight.com/
24 Afferent Systems Inc. on the World Wide Web, URL http://www.afferent.com/
25 Molecular Design Limited, Information Systems Inc. on the World Wide Web,
URL http://www.MDLi.com/
26 Tripos, Inc. on the World Wide Web, URL http://www.tripos.com/
27 Synopsys Scientific Systems on the World Wide Web, URL http://www.synopsys.co.uk/

28 Chen, X., Rusinko, A. and Young, S.S. (1998) ‘Recursive partitioning analysis of a large
structure-activity data set using three-dimensional descriptors’, J Chem Inform Comput Sci.,
Vol. 38, pp.1054–1062.
29 Brown, R.D. (1997) ‘Descriptors for diversity analysis’, Perspect Drug Discov Des, Vol. 7/8,
pp.31–49.
30 Nisbet, L.J. and Moore, M. (1997) ‘Will natural products remain an important source of drug
research for the future?’, Current Opinion in Biotechnology, Vol. 8, pp.708–712.
31 Gillet, V.J., Willett, P., Bradshaw, J. and Green, D.V.S. (1999) ‘Selecting combinatorial
libraries to optimize diversity and physical properties’, J Chem Inform Comput Sci., Vol. 39,
pp.169–177.
32 Brown, R.D. and Martin, Y.C. (1997) ‘Designing combinatorial library mixtures using a
genetic algorithm’, J Med Chem., Vol. 40, pp.2304–2313.
33 Molecular Simulations Inc. on the World Wide Web, URL http://www.msi.com
34 Cherwell Scientific Publishing Ltd. on the World Wide Web, URL http://www.cherwell.com/
35 Chemical Computing Group Inc. on the World Wide Web, URL http://www.chemcomp.com/
36 Spotfire Inc. on the World Wide Web, URL http://www.spotfire.com/
37 Partek Inc. on the World Wide Web, URL http://www.partek.com/
38 Oxford Molecular Group on the World Wide Web, URL http://www.oxmol.co.uk/
39 CambridgeSoft Corporation on the World Wide Web, URL http://www.camsoft.com/
40 Lipinski, C.A., Lombardo, F., Dominy, B.W. and Feeney, P.J. (1997) ‘Experimental and
computational approaches to estimate solubility and permeability in drug discovery and
development settings’, Adv Drug Deliv Rev., Vol. 23, pp.3–25.

Glossary for chemoinformatics


CML Chemical Markup Language: http://www.xml-cml.org/
CIS chemical information system: must include registration, computed and measured
properties, chemical descriptors and inventory.
Chemoinformatics: increasingly incorporates compound registration into databases,
including library enumeration; access to primary and secondary scientific literature;
QSARs (quantitative structure/activity relationships) and similar tools for relating activity
to structure; physical and chemical property calculations; chemical structure and property
databases, chemical library design and analysis; structure-based design and statistical
methods.
Chemometrics: the chemical discipline that uses mathematical, statistical and other
methods employing formal logic
• to design or select optimal measurement procedures and experiments and
• to provide maximum relevant chemical information by analysing chemical data.
Computational chemistry: a discipline using mathematical methods for the calculation of
molecular properties or for the simulation of molecular behaviour [IUPAC Med Chem] .
Data mining: non-trivial extraction of implicit, previously unknown and potentially
useful information from data, or the search for relationships and global patterns that exist
in databases.
Data mining tools: tools for data mining from NCBI, US.
http://www.ncbi.nlm.nih.gov/Tools/index.html provides access to BLAST, Clusters of
Orthologous Groups (COGs), ORF finder, Electronic PCR, UniGene, GeneMap99,
VecScreen, Cancer Genome Anatomy Project (CGAP), Cancer Chromosome Aberration
Project (cCAP), Human-Mouse Homology Maps, LocusLink, VAST search data mining,
genomic:
GUI Graphical User Interface: the two most useful GUIs are the Query interface to the
database and the Report/Analysis interfaces.
in silico: in or by means of a computer simulation. Virtual world of data, analysis, models
and designs that reside within a computer. All possible compounds and ideas are
contained within this virtual world: more molecules than we can make or afford to test.
Estimates of the number of drug-like compounds that could theoretically be made are
greater than 10^40.
Lipinski’s rule of five: The Rule of Five is so called because the cut-offs for each of
the four parameters are all close to five or a multiple of five. The ‘rule of 5’ states
that poor absorption or permeation is more likely when: there are more than 5 H-bond
donors (expressed as the sum of OHs and NHs); the MWT is over 500; the LogP is over 5
(or MLogP is over 4.15); or there are more than 10 H-bond acceptors (expressed as the sum
of Ns and Os).
http://www.acdlabs.com/products/phys_chem_lab/logp/ruleof5.html
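The four cut-offs above translate directly into a simple filter. The sketch below takes
already-computed property values and lists which criteria a compound exceeds. The
function name and the example property values are hypothetical, and the calculation of
the properties themselves (for example LogP) is outside its scope.

```python
def rule_of_five_violations(h_bond_donors, h_bond_acceptors, mol_weight, log_p):
    """List the rule-of-five criteria that a compound exceeds."""
    violations = []
    if h_bond_donors > 5:
        violations.append("more than 5 H-bond donors")
    if mol_weight > 500:
        violations.append("molecular weight over 500")
    if log_p > 5:
        violations.append("LogP over 5")
    if h_bond_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


# Hypothetical property values for two virtual compounds.
print(rule_of_five_violations(2, 5, 350.4, 2.1))   # []
print(len(rule_of_five_violations(7, 12, 650.0, 5.6)))  # 4
```

A compound with no violations is not guaranteed to be well absorbed; the rule only flags
cases where poor absorption or permeation is more likely.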
‘plug and play’ systems: required for effective chemoinformatics systems. Must be
designed backwards from the answer to the data to be captured and systems should be in
components where each component has one simple task.
‘silo systems’: legacy method for many information systems, a system built to collect,
store and report one laboratory’s data. Each ‘silo system’ holds the data differently and
may be in a different technology and the results of the systems cannot easily be
interchanged.
SAR Structure Activity Relationship: The relationship between chemical structure and
pharmacological activity for a series of compounds.
