: 06 Computational Biology
Principal Investigator: Dr. Vibha Dhawan, Distinguished Fellow and Sr. Director
The Energy and Resources Institute (TERI), New Delhi
Computational Biology
Biotechnology
Chemical functional descriptors – I
Description of Module
Subject Name Biotechnology
Module Id 08
Pre-requisites
Objectives
Keywords
Module 8
Graph Theory Based Approach to chemicals and functional descriptors
How to represent different chemical structures in 1-dimensional, 2-dimensional and 3-dimensional ways:
A simple line notation is defined as 1D and a chemical sketch of a molecule is defined as 2D, but
possible; these are called ensembles. We shall now discuss how to get from a text-based 1D
structure to 2D, and then how to represent 3D. Software which is freely available as open source is
favorite amino acid or ligand and convert it into SMILES line notation. You will be able to write
a simple program to copy the SMILES string and keep it in a database for future use. It will
not be so simple, however, to convert from the sketch (2D) to 1D. So, to evaluate the similarity of two
structures, it is convenient to take their SMILES notations and apply text-string
comparison logic, which you can also code easily. These are the reasons why large
databases are most often stored in 1D SMILES notation and converted as and when the need arises.
SMILES, especially unique SMILES, is a line notation that is easy to use and allows fast searching.
One molecule of phenylalanine has been used as a demo to represent all chemical
chirality etc. in chemistry can also be represented in SMILES format:
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
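The idea of keeping SMILES strings in a database and comparing them as text can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the dictionary smiles_db, the helper names and the two SMILES strings for phenylalanine (written here without stereochemistry) are chosen for this example, and difflib measures textual similarity only, not chemical similarity.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory "database": compound name -> SMILES string.
smiles_db = {}

def store_smiles(name, smiles):
    """Keep a SMILES string under a compound name for later use."""
    smiles_db[name] = smiles

def text_similarity(smiles_a, smiles_b):
    """Plain text similarity between two strings, in the range 0..1."""
    return SequenceMatcher(None, smiles_a, smiles_b).ratio()

# Two valid SMILES strings for the same molecule (phenylalanine, no stereo),
# written starting from different atoms.
PHE_FROM_N = "NC(Cc1ccccc1)C(=O)O"
PHE_FROM_O = "OC(=O)C(N)Cc1ccccc1"

store_smiles("phenylalanine", PHE_FROM_N)

# Exact text comparison fails although the molecule is the same.
same_text = smiles_db["phenylalanine"] == PHE_FROM_O
# Textual similarity score between the two spellings.
score = text_similarity(PHE_FROM_N, PHE_FROM_O)
```

Because the two strings differ, exact text matching is reliable only when every entry has been written as a unique SMILES by the same tool, which is why the unique form is preferred for database searching.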
Searching for chemicals in SMILES format can be done using graph theory, which deals
with nodes and edges (connectivity). The process of isomorphism can compare two graphs or part
mathematically generated depiction in terms of nodes and edges. Hence it can be coded as a
computer program, so that properties like isomorphism can be evaluated. A graph depiction of
the London Underground is shown; a star constellation can also be drawn as a
graph, and the two can be compared irrespective of the distance between the nodes or the
length of the edges.
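Comparing two graphs regardless of how they are drawn can be sketched as a brute-force search over node relabellings. The function name and the small example graphs below are assumptions made for illustration; the approach tries every one-to-one mapping, so it is practical only for very small graphs without repeated edges.

```python
from itertools import permutations

def are_isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force test: try every one-to-one mapping of nodes_a onto nodes_b."""
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    target = {frozenset(edge) for edge in edges_b}
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges_a}
        if mapped == target:
            return True
    return False

# A four-station loop and a four-star figure: different labels and layout,
# same connectivity (both are a single ring of four nodes).
loop_nodes = ["A", "B", "C", "D"]
loop_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
star_nodes = [1, 2, 3, 4]
star_edges = [(1, 3), (3, 2), (2, 4), (4, 1)]

# A triangle with one pendant node: same node and edge counts, different shape.
kite_nodes = [1, 2, 3, 4]
kite_edges = [(1, 2), (2, 3), (3, 1), (3, 4)]

same_shape = are_isomorphic(loop_nodes, loop_edges, star_nodes, star_edges)
different_shape = are_isomorphic(loop_nodes, loop_edges, kite_nodes, kite_edges)
```

Only the connectivity enters the comparison; node positions and edge lengths are never stored.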
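Let G = (V, E) and G' = (V', E') be two graphs. We call G and G' isomorphic, and write G ≃ G', if there exists a bijection φ: V → V' such that xy ∈ E if and only if φ(x)φ(y) ∈ E' for all x, y ∈ V.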
Graph representation helps to convert structures into a set of nodes (from atom points)
and edges (from connectivity). There are definitions associated with graphs, as described in the
slides: degree of a node (number of edges meeting at it), leaf node (a node of degree 1),
path (connected sequence of edges between two nodes), cycle (path which returns to its
starting node), tree (connected graph with no cycles), subgraph (graph containing a subset of the nodes
and edges of another graph), etc. Substructure fragment finding is a method built around
important features such as –OH, –NH2, rings, etc. Each of these fragments can be indexed
so that searching is done easily. One can design a vocabulary of features important in biology.
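The graph terms listed above can be made concrete with a small adjacency-set representation. This is a sketch under assumptions: the helper names are hypothetical, and the two example graphs contain only the heavy (non-hydrogen) atoms of ethanol and cyclopropane.

```python
def degree(graph, node):
    """Number of edges meeting at a node."""
    return len(graph[node])

def leaf_nodes(graph):
    """Nodes of degree 1."""
    return sorted(node for node in graph if len(graph[node]) == 1)

def edge_count(graph):
    """Each edge is stored at both of its end nodes, so halve the total."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2

def is_connected(graph):
    """Depth-first walk from an arbitrary node; connected if every node is reached."""
    start = next(iter(graph))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(graph)

def is_tree(graph):
    """A connected graph with n nodes is a tree exactly when it has n - 1 edges."""
    return is_connected(graph) and edge_count(graph) == len(graph) - 1

# Heavy-atom graphs: ethanol (C-C-O chain) and cyclopropane (three-membered ring).
ethanol = {"C1": {"C2"}, "C2": {"C1", "O"}, "O": {"C2"}}
cyclopropane = {"C1": {"C2", "C3"}, "C2": {"C1", "C3"}, "C3": {"C1", "C2"}}
```

For example, degree(ethanol, "C2") counts the two bonds at the central carbon, leaf_nodes(ethanol) returns the two chain ends, and is_tree separates the acyclic chain from the ring.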
identified in a structure graph corresponding to functional groups, rings etc. will be easy to
find similar fragments in a large chemical database using program code, and this will
help to find the presence of bioactive fragments in chemical databases. Graphs are isomorphic
only by their nodes and edges: as shown, three graphs that look spatially different are
isomorphic. Searching for isomorphs of a graph leads to finding similar active chemicals. In
2D depictions, bonds are the edges and atom types can be used as colours in the graph. All the well-known
mathematical algorithms can be utilized to analyse the graph. Isomorphism of graphs
can be used to extract a subset from a large dataset. In a simple way, this is used for annotating
large databases, and the extent of similarity between two chemicals can be calculated using
graphs is that chemical structures and graphs are not a perfect match: tautomerism, for example, is a
chemical property that cannot be traced by a graph. Graph algorithms are also very slow and may
take more CPU time than string-based searches such as SMILES-based ones.
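Finding a fragment such as –OH in a structure graph, with atom types acting as node colours, can be sketched as a brute-force labelled subgraph search. The function name, the data layout and the ethanol example (only the hydroxyl hydrogen is written out) are assumptions for illustration. The search tries every assignment of fragment atoms to molecule atoms, which also shows why graph matching costs far more CPU time than a string comparison.

```python
from itertools import permutations

def has_fragment(mol_atoms, mol_bonds, frag_atoms, frag_bonds):
    """True if the fragment's atoms and bonds can be mapped into the molecule.

    mol_atoms and frag_atoms map an atom id to its element symbol (the "colour");
    mol_bonds and frag_bonds are lists of (id, id) pairs.
    """
    mol_edges = {frozenset(bond) for bond in mol_bonds}
    frag_ids = list(frag_atoms)
    # Try every ordered choice of distinct molecule atoms for the fragment atoms.
    for candidate in permutations(mol_atoms, len(frag_ids)):
        mapping = dict(zip(frag_ids, candidate))
        if any(frag_atoms[f] != mol_atoms[mapping[f]] for f in frag_ids):
            continue  # element symbols must match
        if all(frozenset((mapping[u], mapping[v])) in mol_edges
               for u, v in frag_bonds):
            return True
    return False

# Ethanol, C-C-O-H (hydrogens on carbon omitted for brevity).
ethanol_atoms = {0: "C", 1: "C", 2: "O", 3: "H"}
ethanol_bonds = [(0, 1), (1, 2), (2, 3)]

hydroxyl = ({"o": "O", "h": "H"}, [("o", "h")])
amine = ({"n": "N", "h1": "H", "h2": "H"}, [("n", "h1"), ("n", "h2")])

found_hydroxyl = has_fragment(ethanol_atoms, ethanol_bonds, *hydroxyl)
found_amine = has_fragment(ethanol_atoms, ethanol_bonds, *amine)
```

Running the same check for each entry of a fragment vocabulary gives a simple way to annotate every structure in a database with the fragments it contains.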
Matrix representations preserve many chemical properties, including some that are derived
from the 3D structure, but one requires much additional information, such as bond lengths, bond angles and
torsional angles, to depict the 3D structure. 3D structure generation also requires that all
the structures be stable, i.e., that they have no collisions (“bumps”) between atoms within the molecule. Hence
using a matrix representation of chemicals may not require a real 3D structure, i.e., coordinates in a
Cartesian system. Let us start with a simple matrix type called the adjacency matrix, which
records which atoms of a chemical are near neighbours of one another; this helps to search for
substructures and build the graph. Being Boolean, however, it does not hold the bond order
between two atoms in the molecule. Many other matrix representations of the molecule that are non-Boolean,
such as the atom connectivity matrix, bond matrix, etc., can be found in the
handbook cited at the end. The merits and demerits of matrix representations are discussed. Most
often large databases are searched using these operations, as they can be done very fast and
easily by computers.
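The difference between a Boolean adjacency matrix and a non-Boolean matrix that keeps bond orders can be sketched as follows. The function names and the atom numbering are assumptions for this example, which uses the heavy atoms of acetic acid; the second matrix is a simple illustration and not necessarily the exact definition used in the handbook.

```python
def adjacency_matrix(n_atoms, bonds):
    """Boolean (0/1) matrix: 1 where two atoms are bonded, whatever the bond order."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, _order in bonds:
        matrix[i][j] = matrix[j][i] = 1
    return matrix

def bond_order_matrix(n_atoms, bonds):
    """Non-Boolean variant: each entry holds the bond order between two atoms."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = matrix[j][i] = order
    return matrix

# Acetic acid heavy atoms: 0 = CH3 carbon, 1 = carboxyl carbon,
# 2 = carbonyl oxygen (double bond), 3 = hydroxyl oxygen.
# Each bond is (atom index, atom index, bond order).
ACETIC_ACID_BONDS = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

adjacency = adjacency_matrix(4, ACETIC_ACID_BONDS)
bond_orders = bond_order_matrix(4, ACETIC_ACID_BONDS)

# Row sums of the adjacency matrix give the number of near neighbours per atom.
neighbour_counts = [sum(row) for row in adjacency]
```

In the adjacency matrix the C=O and C–O bonds of the carboxyl group look identical; only the bond-order matrix tells them apart.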
Computational applications of the methods discussed here include chemical structural analysis, structure searching in databases,
predicting from a set of structures which chemicals will have bioactivity similar to others, and
annotating chemicals with biologically important fragments. The next module will discuss the features or descriptors that are
generated from the 1D, 2D and 3D representations and associated with the biological activity of
chemicals.