Chemical Functional Descriptors 1

Paper No.: 06 Computational Biology
Module: 08 Chemical functional descriptors – I
Principal Investigator: Dr. Vibha Dhawan, Distinguished Fellow and Sr. Director,
The Energy and Resources Institute (TERI), New Delhi
Co-Principal Investigator: Prof S K Jain, Professor,
Jamia Hamdard University, New Delhi
Paper Coordinator: Dr. Indira Ghosh, Professor,
Jawaharlal Nehru University, New Delhi
Content Writer: Dr. Indira Ghosh, Professor,
Jawaharlal Nehru University, New Delhi
Paper Reviewer: Dr. Debasisa Mohanty,
National Institute of Immunology, New Delhi
Computational Biology
Biotechnology
Chemical functional descriptors – I
Description of Module
Subject Name: Biotechnology
Paper Name: Computational Biology
Module Name/Title: Chemical functional descriptors – I
Module Id: 08
Pre-requisites
Objectives
Keywords
Module 8
Graph Theory Based Approach to chemicals and functional descriptors
How to represent different chemical structures in 1-dimensional, 2-dimensional and
3-dimensional form:
Simple line notation is defined as 1D and a chemical sketch of a molecule as 2D. To generate
a 3D representation, however, one needs many conformations that are thermodynamically
possible; these are called ensembles. We shall now discuss how to go from a text-based 1D
structure to 2D and then to a 3D representation. Structure-drawing software, some of which
can be downloaded free of charge, includes ISIS/Draw (MDL)
(http://www.mdl.com/downloads/free.html), ChemDraw (CambridgeSoft)
(http://www.cambridgesoft.com/products/), GRINS/JavaGRINS (Daylight)
(http://www.daylight.com/products/javatools.html) and MarvinSketch
(http://www.chemaxon.com/marvin/). Please go to these websites, sketch your favourite
amino acid or ligand and convert it into SMILES line notation. You will be able to write a
simple program that copies the SMILES string and keeps it in a database for future use. It
is not so simple, however, to convert from the sketch (2D) to 1D, so to evaluate the
similarity of two structures it is good to use their SMILES notations and text-string
comparison logic, which you can easily code as well. These are the reasons why large
databases are most often stored in 1D SMILES notation and converted as and when the need
arises. SMILES, especially unique SMILES notation, is a line notation that is easy to use
and allows fast searching. One molecule, phenylalanine, has been used as a demonstration of
how all the chemical information on a molecule can be represented, i.e., how molecular
information is stored as chemical information.
Many important chemical characteristics such as valence, atom type, branching, and
stereochemistry or chirality can also be represented in SMILES format
(http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html).
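The idea of keeping SMILES strings in a database and comparing them as text can be sketched
as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the
table name `molecules`, the helper function names and the example entries are assumptions
made here and are not part of the module. Exact text matching is meaningful only when every
string was written by the same unique (canonical) SMILES procedure, which the sketch assumes.

```python
import sqlite3

# Hypothetical schema for illustration: one row per molecule, SMILES stored as text.
def make_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE molecules (name TEXT PRIMARY KEY, smiles TEXT NOT NULL)"
    )
    return conn

def add_molecule(conn, name, smiles):
    """Store one SMILES string under a molecule name."""
    conn.execute("INSERT INTO molecules VALUES (?, ?)", (name, smiles))

def find_by_smiles(conn, smiles):
    """Return the names of all molecules whose stored SMILES equals the query text."""
    rows = conn.execute(
        "SELECT name FROM molecules WHERE smiles = ? ORDER BY name", (smiles,)
    )
    return [row[0] for row in rows]

conn = make_store()
# [C@@H] carries the chirality of the alpha carbon; c1ccccc1 is the aromatic ring.
add_molecule(conn, "L-phenylalanine", "N[C@@H](Cc1ccccc1)C(=O)O")
add_molecule(conn, "glycine", "NCC(=O)O")

same_text = find_by_smiles(conn, "NCC(=O)O")    # identical string: found
other_text = find_by_smiles(conn, "OC(=O)CN")   # glycine written differently: not found
```

The second lookup describes the same molecule but returns nothing, because plain string
comparison sees only the text. This is why unique SMILES are preferred for database storage.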
Chemicals in SMILES format can be searched using graph theory, which deals with nodes and
edges (connectivity). The process of isomorphism can compare two graphs or parts of graphs;
hence the maximum common subgraph can be derived. A graph is a general mathematical
depiction in terms of nodes and edges and can therefore be coded in a computer program, so
that properties like isomorphism can be evaluated. A graph depiction of the London
underground is shown; a star constellation can also be shown as a graph, and the two can be
compared irrespective of the distances between the nodes or the lengths of the edges.
The isomorphism of graphs can be defined mathematically as follows.
Let G = (V, E) and G' = (V', E') be two graphs. We call G and G' isomorphic, and write
G ≃ G', if there exists a bijection φ: V → V' with xy ∈ E ⇔ φ(x)φ(y) ∈ E' for all x, y ∈ V.
Such a map φ is called an isomorphism; if G = G', it is called an automorphism.
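The definition can be turned directly into a small program. The sketch below is a
brute-force check, assuming simple undirected graphs without self-loops: it tries every
bijection φ between the two node sets and tests whether every edge is mapped onto an edge.
The function name and the example graphs are chosen here for illustration. Because the
number of bijections grows factorially, this is suitable only for very small graphs;
practical tools use much faster algorithms.

```python
from itertools import permutations

def find_isomorphism(nodes_g, edges_g, nodes_h, edges_h):
    """Return a dict phi mapping nodes of G to nodes of H, or None if none exists."""
    edges_g = {frozenset(e) for e in edges_g}   # undirected: {x, y} == {y, x}
    edges_h = {frozenset(e) for e in edges_h}
    if len(nodes_g) != len(nodes_h) or len(edges_g) != len(edges_h):
        return None
    nodes_g = list(nodes_g)
    for image in permutations(nodes_h):         # every candidate bijection
        phi = dict(zip(nodes_g, image))
        # With equal edge counts and phi a bijection, mapping every edge of G
        # onto an edge of H is equivalent to xy in E <=> phi(x)phi(y) in E'.
        if all(frozenset(phi[x] for x in edge) in edges_h for edge in edges_g):
            return phi
    return None

# A four-node path, the same path with different labels, and a four-node star.
path = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
relabelled_path = ([1, 2, 3, 4], [(1, 3), (3, 2), (2, 4)])
star = ([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4)])

phi_path = find_isomorphism(*path, *relabelled_path)   # a mapping is found
phi_star = find_isomorphism(*path, *star)              # None: not isomorphic
```

The path and the star have the same numbers of nodes and edges, yet no bijection preserves
the connectivity, so counting alone is not enough to decide isomorphism.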
Graph representation converts a structure into a set of nodes (from the atoms) and edges
(from the connectivity). Several definitions are associated with graphs, as described in
the slides: the degree of a node (the number of edges meeting at it), a leaf node (a node of
degree 1), a path (a connected sequence of edges between two nodes), a cycle (a path that
returns to its starting node), a tree (a connected graph with no cycles), a subgraph (a
graph containing a subset of the nodes and edges of another graph), etc. Substructure
fragment finding is a method built around important features such as –OH, –NH2, rings, etc.
Each of these fragments can be indexed so that searching is done easily. One can design a
vocabulary of features that are important in biology. The usefulness of graphs is
demonstrated by examples: a subgraph corresponding to a functional group, ring, etc. can be
identified in a structure graph, so similar fragments are easy to find in a large chemical
database using program code, and this helps to detect the presence of bioactive fragments in
chemical databases. Graphs are isomorphic purely by virtue of their nodes and edges: as
shown, three graphs that look spatially different are isomorphic. Searching for isomorphic
graphs helps to find similarly active chemicals. In 2D depictions, bonds are the edges, and
atom types can be used as colours of the nodes in the graph. All the well-known mathematical
graph algorithms can be used to analyse the graph. Graph isomorphism can be used to extract
a subset from a large dataset. In a simple way this is used for annotating large databases:
the extent of similarity between two chemicals can be calculated from their graph
representations, and the annotation can be included in the database. A disadvantage of
using graphs is that chemical structures and graphs are not a perfect match; tautomerism,
for example, is a chemical property that cannot be traced by a graph. Graph algorithms can
also be slow and may take more CPU time than string-based searches such as those on SMILES.
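The graph terms defined above (degree, leaf node, cycle, tree) can be illustrated on the
hydrogen-suppressed graph of phenylalanine. The sketch below is an assumption-laden
illustration, not the method of the slides: the atom labels follow the common PDB-style
naming (N, CA, CB, ...), bond orders are ignored, and the function names are invented here.
Repeatedly removing leaf nodes leaves only the atoms that lie on cycles, which is one simple
way to locate a ring fragment.

```python
from collections import defaultdict

# Heavy-atom bonds of phenylalanine (labels are illustrative PDB-style names).
PHE_BONDS = [
    ("N", "CA"), ("CA", "C"), ("C", "O"), ("C", "OXT"), ("CA", "CB"),
    ("CB", "CG"), ("CG", "CD1"), ("CG", "CD2"), ("CD1", "CE1"),
    ("CD2", "CE2"), ("CE1", "CZ"), ("CE2", "CZ"),   # last bond closes the ring
]

def build_graph(bonds):
    """Adjacency sets: node -> set of directly bonded nodes."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

def degree(adj, node):
    return len(adj[node])

def leaf_nodes(adj):
    return sorted(n for n in adj if len(adj[n]) == 1)

def is_connected(adj):
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(adj)

def is_tree(adj):
    """A connected graph is a tree exactly when it has |V| - 1 edges."""
    n_edges = sum(len(nbs) for nbs in adj.values()) // 2
    return is_connected(adj) and n_edges == len(adj) - 1

def cyclic_core(adj):
    """Strip leaf nodes repeatedly; what remains lies on (or between) cycles."""
    remaining = {n: set(nbs) for n, nbs in adj.items()}
    leaves = [n for n, nbs in remaining.items() if len(nbs) <= 1]
    while leaves:
        n = leaves.pop()
        if n not in remaining:
            continue
        for nb in remaining.pop(n):
            remaining[nb].discard(n)
            if len(remaining[nb]) <= 1:
                leaves.append(nb)
    return sorted(remaining)

phe = build_graph(PHE_BONDS)
ring_atoms = cyclic_core(phe)              # the six aromatic carbons
open_chain = build_graph(PHE_BONDS[:-1])   # ring-closing bond removed
```

With the ring-closing bond removed the graph becomes a tree, while the full molecule is not
one; the leaf nodes of the full graph are the terminal N, O and OXT atoms.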
How to use matrix notation for property presentation?
Matrix representations preserve many chemical properties, including some that are derived
from the 3D structure, but depicting the 3D structure requires much additional information
such as bond lengths, bond angles and torsional angles. 3D structure generation also
requires that all structures be stable, i.e., that they have no collisions (“bumps”)
between atoms within the molecule. Hence further pre-processing must be done before
chemicals are represented by 3D structures. A matrix representation of a chemical, however,
may not require a real 3D structure, i.e., coordinates in a Cartesian system. Let us start
with a simple matrix type, the adjacency matrix, which records which atoms are nearest
neighbours (directly bonded); this helps in substructure searching and in building the
graph. Being Boolean, it does not carry bond-order information. The distance matrix contains
the shortest distance (geometrical or topological) between each pair of atoms in the
molecule. Many other matrix representations of the molecule that are non-Boolean, such as
the atom connectivity matrix, the bond matrix, etc., can be found in the handbook cited at
the end. The merits and demerits of matrix representations are discussed. Large databases
are most often searched using these matrix operations, as computers can perform them quickly
and easily.
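As a small worked example of the two matrices described above, the sketch below builds the
adjacency matrix and the topological distance matrix (number of bonds on the shortest path)
for the heavy atoms of glycine. The atom ordering, the labels and the function names are
assumptions made for this illustration; the distances are obtained by a breadth-first
search, which is one of several possible ways to compute them.

```python
from collections import deque

ATOMS = ["N", "CA", "C", "O", "OXT"]            # heavy atoms of glycine
BONDS = [(0, 1), (1, 2), (2, 3), (2, 4)]        # index pairs into ATOMS

def adjacency_matrix(n_atoms, bonds):
    """Boolean (0/1) matrix: 1 where two atoms are directly bonded."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1                   # C=O and C-O both become 1
    return a

def topological_distance_matrix(adj):
    """Shortest path length in bonds between every pair of atoms (BFS per atom)."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]       # None marks an unreachable atom
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if adj[i][j] and dist[source][j] is None:
                    dist[source][j] = dist[source][i] + 1
                    queue.append(j)
    return dist

A = adjacency_matrix(len(ATOMS), BONDS)
D = topological_distance_matrix(A)
```

Both matrices are symmetric. The carbonyl C=O and the single C-O bond give the same entry in
A, which shows concretely that a Boolean adjacency matrix holds no bond-order information.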
Computational applications of the methods discussed here include chemical structure
analysis, structure searching in databases, prediction of which structures in a set will
have bioactivity similar to that of other chemicals, and annotation of chemicals with
biologically important fragments. The next module will discuss the features, or descriptors,
that are generated from the 1D, 2D and 3D representations and associated with the biological
activity of chemicals.