
Paper No. : 06 Computational Biology

Module : 08 Chemical functional descriptors – I

Principal Investigator: Dr. Vibha Dhawan, Distinguished Fellow and Sr. Director
The Energy and Resources Institute (TERI), New Delhi

Co-Principal Investigator: Prof S K Jain, Professor
Jamia Hamdard University, New Delhi

Paper Coordinator: Dr. Indira Ghosh, Professor
Jawaharlal Nehru University, New Delhi

Content Writer: Dr. Indira Ghosh, Professor
Jawaharlal Nehru University, New Delhi

Paper Reviewer: Dr. Debasisa Mohanty
National Institute of Immunology, New Delhi

Computational Biology
Biotechnology
Chemical functional descriptors – I
Description of Module
Subject Name: Biotechnology
Paper Name: Computational Biology
Module Name/Title: Chemical functional descriptors – I
Module Id: 08
Pre-requisites:
Objectives:
Keywords:

Module 8
Graph Theory Based Approach to chemicals and functional descriptors
How to represent chemical structures in one, two and three dimensions (1D, 2D and 3D):

A simple line notation is a 1D representation of a molecule and a chemical sketch of it is a 2D representation, but to generate a 3D representation one needs the many conformations that are thermodynamically possible; such a set of conformations is called an ensemble. We now discuss how to go from a text-based 1D structure to 2D, and then to a 3D representation. Structure-drawing programs, several of which can be downloaded free of charge, include ISIS/Draw (MDL) (http://www.mdl.com/downloads/free.html), ChemDraw (CambridgeSoft) (http://www.cambridgesoft.com/products/), GRINS/JavaGRINS (Daylight) (http://www.daylight.com/products/javatools.html) and MarvinSketch (http://www.chemaxon.com/marvin/). Go to one of these websites, sketch your favorite amino acid or ligand and convert it into SMILES line notation. You can then write a simple program that copies the SMILES string and stores it in a database for future use. Converting a 2D sketch into 1D notation in your own code is not so simple, however, so to evaluate the similarity of two structures it is convenient to use their SMILES notation and apply text-string comparison logic, which is also easy to code. A plain string comparison detects identical structures reliably only when both are written as unique SMILES, because one molecule can usually be written as several different valid SMILES strings. For these reasons, large databases are most often stored in 1D SMILES notation and converted to 2D or 3D as and when the need arises.
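The store-and-compare idea can be sketched in a few lines of Python. This is a minimal illustration, not a real chemical database: the dictionary `smiles_db`, the helper names `store`, `identical` and `string_similarity`, and the hand-typed SMILES entries are all hypothetical. It shows why plain text comparison needs unique SMILES: `CCO` and `OCC` describe the same molecule (ethanol) but are different strings. The `difflib` ratio used here is a crude character-level score, not a chemical similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory "database": compound name -> SMILES string.
# The entries are typed by hand for illustration only.
smiles_db = {}


def store(name: str, smiles: str) -> None:
    """Keep a SMILES string (e.g. copied from a sketching tool) for later use."""
    smiles_db[name] = smiles.strip()


def identical(a: str, b: str) -> bool:
    """Exact text comparison; meaningful for identity only with unique SMILES."""
    return a == b


def string_similarity(a: str, b: str) -> float:
    """Crude 0..1 character-level similarity, not a chemical similarity measure."""
    return SequenceMatcher(None, a, b).ratio()


store("ethanol_a", "CCO")
store("ethanol_b", "OCC")  # same molecule, written starting from the oxygen
store("propanol", "CCCO")

print(identical(smiles_db["ethanol_a"], smiles_db["ethanol_b"]))  # False
print(round(string_similarity(smiles_db["ethanol_a"], smiles_db["propanol"]), 2))  # 0.86
```

The first comparison fails even though both strings encode ethanol, which is exactly why databases prefer a unique (canonical) SMILES generated by a cheminformatics toolkit before any text comparison.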

SMILES, and especially unique SMILES, is a line notation that is easy to use and allows fast searching. Phenylalanine is used as a demonstration molecule to show how all the chemical information of a molecule can be represented and stored, i.e. the step from molecule to stored chemical information. Many important chemical characteristics, such as valence, atom type, branching, stereochemistry and chirality, can also be represented in SMILES format (see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html).
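To see how much structure a SMILES string carries, the sketch below parses a small subset of the syntax (organic-subset atoms, bracket atoms such as `[C@@H]`, branches, ring-closure digits and explicit bond symbols) into a list of atoms and a list of bonds. It is a teaching sketch under stated assumptions, not a complete SMILES parser: it ignores implicit hydrogens, charges, isotopes, disconnected components (`.`) and valence checking, and, purely as a convention of this example, it gives order 1.5 to unwritten bonds between two aromatic atoms. The function name `parse_smiles` is hypothetical. The phenylalanine string used is `N[C@@H](Cc1ccccc1)C(=O)O`.

```python
import re
from collections import Counter

# One regex alternative per token type handled by this sketch:
# bracket atom | two-letter organic atom | one-letter organic atom |
# aromatic atom | bond symbol | branch | two-digit ring label | ring digit
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#:/\\]|[()]|%\d\d|\d")
# Inside brackets: optional isotope, element, chirality (@ or @@), H count, charge.
BRACKET = re.compile(
    r"\[(?P<iso>\d*)(?P<el>[A-Z][a-z]?|[a-z]{1,2})(?P<chiral>@{0,2})"
    r"(?P<h>H\d*)?(?P<charge>[+-]+\d*)?\]"
)
BOND_ORDER = {"-": 1, "=": 2, "#": 3, ":": 1.5, "/": 1, "\\": 1}


def parse_smiles(smiles):
    """Parse a simple SMILES string into (atoms, bonds).

    atoms: list of dicts with element, aromatic flag and chirality tag.
    bonds: list of (atom_i, atom_j, bond_order) tuples.
    """
    atoms, bonds = [], []
    branch_stack, open_rings = [], {}
    prev, pending, pos = None, None, 0

    def default_order(i, j):
        # Unwritten bonds are single, or aromatic (1.5) between aromatic atoms.
        return 1.5 if atoms[i]["aromatic"] and atoms[j]["aromatic"] else 1

    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported SMILES syntax at position {pos}")
        tok, pos = m.group(), m.end()
        if tok[0] == "[" or tok[0].isalpha():  # an atom
            el, chiral = tok, None
            if tok[0] == "[":
                b = BRACKET.fullmatch(tok)
                if b is None:
                    raise ValueError(f"unsupported bracket atom {tok}")
                el, chiral = b["el"], b["chiral"] or None
            atoms.append({"element": el.capitalize(), "aromatic": el.islower(),
                          "chirality": chiral})
            new = len(atoms) - 1
            if prev is not None:  # bond to the previous atom on the chain
                bonds.append((prev, new, pending or default_order(prev, new)))
            prev, pending = new, None
        elif tok == "(":  # open a branch: remember where to come back to
            branch_stack.append(prev)
        elif tok == ")":  # close a branch: continue from the branch point
            if not branch_stack:
                raise ValueError("unmatched ')'")
            prev = branch_stack.pop()
        elif tok in BOND_ORDER:  # explicit bond symbol for the next bond
            pending = BOND_ORDER[tok]
        else:  # ring-closure label such as 1 or %12
            if tok in open_rings:
                start, start_order = open_rings.pop(tok)
                order = pending or start_order or default_order(start, prev)
                bonds.append((start, prev, order))
            else:
                open_rings[tok] = (prev, pending)
            pending = None
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds


phe = "N[C@@H](Cc1ccccc1)C(=O)O"  # phenylalanine, chiral centre written [C@@H]
atoms, bonds = parse_smiles(phe)
print(sorted(Counter(a["element"] for a in atoms).items()))  # [('C', 9), ('N', 1), ('O', 2)]
print(len(atoms), len(bonds))  # 12 12
print([a["chirality"] for a in atoms if a["chirality"]])  # ['@@']
```

The heavy-atom composition (C9, N1, O2), the branching, the ring closure, the double bond and the chirality mark all come out of the string itself; 12 bonds on 12 heavy atoms also tells us the molecule contains exactly one ring.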

Chemicals stored in SMILES format can be searched using graph theory, which deals with nodes and edges (connectivity). Graph isomorphism can be used to compare two graphs, or parts of graphs, and hence a maximum common subgraph can be derived. A graph is a general mathematical description in terms of nodes and edges, so it can be coded in a computer program and properties such as isomorphism can be evaluated. As an illustration, the London Underground network is shown as a graph; a star constellation can also be drawn as a graph, and the two can be compared irrespective of the distances between nodes or the lengths of edges.

Mathematically, isomorphism of graphs is defined as follows.

Let G = (V, E) and G' = (V', E') be two graphs. We call G and G' isomorphic, and write G ≃ G', if there exists a bijection φ: V → V' such that xy ∈ E ↔ φ(x)φ(y) ∈ E' for all x, y ∈ V. Such a map φ is called an isomorphism; if G = G', it is called an automorphism.
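The definition can be turned directly into a (very slow) program: try every bijection φ and check that each edge of G is mapped onto an edge of G'. Because φ is a bijection, when G and G' have the same number of edges this forward check is enough to guarantee the two-way condition. The function name `find_isomorphism` and the example graphs are hypothetical; trying all n! bijections is only practical for very small graphs, and practical tools use smarter search strategies (for example the VF2 algorithm).

```python
from itertools import permutations


def find_isomorphism(nodes1, edges1, nodes2, edges2):
    """Return an isomorphism phi: V -> V' as a dict, or None if none exists.

    Graphs are simple and undirected; edges are given as 2-tuples of nodes.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    if len(nodes1) != len(nodes2) or len(e1) != len(e2):
        return None  # different sizes: no bijection can preserve edges
    for image in permutations(nodes2):  # every bijection V -> V'
        phi = dict(zip(nodes1, image))
        # xy in E  =>  phi(x)phi(y) in E'. With |E| = |E'| and phi bijective,
        # this forward check also gives the reverse direction.
        if all(frozenset(phi[x] for x in e) in e2 for e in e1):
            return phi
    return None


# A 4-cycle a-b-c-d-a, and the same shape drawn "crossed" with other labels.
square = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
crossed = ([1, 2, 3, 4], [(1, 3), (3, 2), (2, 4), (4, 1)])
# Same node and edge counts, but a star (one node of degree 3) is not a path.
star = ([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])
path = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])

print(find_isomorphism(*square, *crossed) is not None)  # True
print(find_isomorphism(*star, *path))  # None
```

As with the metro map and the constellation, only connectivity matters here: node names and how the graph is drawn play no role.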

A graph representation converts a structure into a set of nodes (the atoms) and edges (the bonds connecting them). Several standard definitions are associated with graphs, as described in the slides: the degree of a node (the number of edges meeting at it), a leaf node (a node of degree 1), a path (a connected sequence of edges between two nodes), a cycle (a path that returns to its starting node), a tree (a connected graph with no cycles) and a subgraph (a graph containing a subset of the nodes and edges of another graph). Substructure (fragment) searching is built around important features such as -OH, -NH2 and rings. Each such fragment can be indexed so that searching becomes easy, and one can design a vocabulary of features that are important in biology.
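The graph terms defined above can be computed from a simple adjacency list. The sketch below is illustrative only: the helper names are hypothetical, and the edge lists are hand-written heavy-atom graphs (hydrogens omitted) for phenylalanine and alanine, using an atom numbering assumed just for this example. It counts independent cycles (rings) with the cyclomatic number E - V + C, where C is the number of connected components, so a tree is a connected graph whose cyclomatic number is 0.

```python
from collections import deque


def adjacency_list(edges):
    """Build {node: set of neighbours} from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    """Number of edges meeting at a node."""
    return len(adj.get(node, ()))


def leaf_nodes(adj):
    """Nodes of degree 1."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)


def n_components(adj):
    """Number of connected components, found by breadth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            for nbr in adj[queue.popleft()]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return count


def cyclomatic_number(adj):
    """Independent cycles (rings): E - V + C for a simple graph."""
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return n_edges - len(adj) + n_components(adj)


def is_tree(adj):
    """A tree is a connected graph with no cycles."""
    return n_components(adj) == 1 and cyclomatic_number(adj) == 0


# Heavy-atom graphs (hydrogens omitted), numbering assumed for this example.
# Phenylalanine: 0 N, 1 alpha C, 2 beta C, 3-8 ring C, 9 carboxyl C, 10-11 O.
phe = adjacency_list([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7),
                      (7, 8), (8, 3), (1, 9), (9, 10), (9, 11)])
# Alanine: 0 N, 1 alpha C, 2 methyl C, 3 carboxyl C, 4-5 O.
ala = adjacency_list([(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)])

print(degree(phe, 1), leaf_nodes(phe))  # 3 [0, 10, 11]
print(cyclomatic_number(phe), is_tree(phe))  # 1 False
print(cyclomatic_number(ala), is_tree(ala))  # 0 True
```

Phenylalanine's benzene ring makes its graph cyclic, whereas the alanine heavy-atom graph is a tree; the leaves are the terminal N and O atoms (and, for alanine, the methyl carbon).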

The usefulness of graphs is demonstrated by examples in which a subgraph corresponding to a functional group, ring, etc. is identified within a structure graph. With program code it then becomes easy to find similar fragments in a large chemical database, which helps to detect the presence of bioactive fragments in chemical databases. Whether graphs are isomorphic depends only on their nodes and edges, as in the example shown, where three graphs that look different spatially are isomorphic. Searching for isomorphic graphs (or subgraphs) therefore helps to find similar active chemicals. In 2D depictions, bonds are the edges and atom types can be used as colours (labels) on the nodes of the graph. All the well-known graph algorithms can then be used to analyse the molecule. Graph isomorphism can be used to extract a subset from a large dataset. A simple application is the annotation of large databases: the extent of similarity between two chemicals can be calculated from their graph representations, and the resulting annotation can be stored in the database.

A disadvantage of using graphs is that chemical structures and graphs are not a perfect match; for example, tautomerism is a chemical property that a single graph cannot capture. Graph algorithms can also be slow (subgraph isomorphism is NP-complete in general) and may take more CPU time than string-based searches such as those on SMILES.
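A substructure (fragment) search on coloured graphs can be sketched with a plain backtracking search: query atoms are mapped one at a time onto unused target atoms of the same element, and each new pair must reproduce the bonds, with the same bond order, to the query atoms already mapped. This is a simple illustration, not Ullmann's algorithm or VF2, and it does not require the match to be an induced subgraph. The molecules are hand-encoded heavy-atom graphs (hydrogens omitted, aromatic bonds given order 1.5), and the fragment vocabulary (`carboxyl`, `benzene_ring`), the toy database and all function names are hypothetical. The result is a small fragment index of the kind described earlier in this module.

```python
def _adjacency(n_atoms, bonds):
    """{atom: {neighbour: bond_order}} for a bond list of (i, j, order)."""
    adj = {i: {} for i in range(n_atoms)}
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return adj


def find_substructure(query, target):
    """Map every query atom onto a distinct target atom with the same element,
    keeping every query bond (and its order). Returns the mapping or None."""
    (q_labels, q_bonds), (t_labels, t_bonds) = query, target
    qa, ta = _adjacency(len(q_labels), q_bonds), _adjacency(len(t_labels), t_bonds)
    mapping = {}  # query atom -> target atom

    def extend(q):
        if q == len(q_labels):
            return True  # every query atom has been placed
        used = set(mapping.values())
        for t in range(len(t_labels)):
            if t in used or t_labels[t] != q_labels[q] or len(ta[t]) < len(qa[q]):
                continue
            # Bonds to already-mapped query atoms must exist in the target too.
            if all(mapping[qn] in ta[t] and ta[t][mapping[qn]] == order
                   for qn, order in qa[q].items() if qn in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical toy database of heavy-atom graphs: (element labels, bonds).
RING = [(i, (i + 1) % 6, 1.5) for i in range(6)]
database = {
    "ethanol": (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),
    "acetic acid": (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)]),
    "benzene": (["C"] * 6, RING),
    "phenylalanine": (["N"] + ["C"] * 9 + ["O", "O"],
                      [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1.5), (4, 5, 1.5),
                       (5, 6, 1.5), (6, 7, 1.5), (7, 8, 1.5), (8, 3, 1.5),
                       (1, 9, 1), (9, 10, 2), (9, 11, 1)]),
}
# Hypothetical fragment vocabulary (query graphs).
vocabulary = {
    "carboxyl": (["C", "O", "O"], [(0, 1, 2), (0, 2, 1)]),  # C(=O)O
    "benzene_ring": (["C"] * 6, RING),
}
# Fragment index: fragment name -> molecules that contain it.
fragment_index = {
    name: sorted(mol for mol, graph in database.items()
                 if find_substructure(query, graph) is not None)
    for name, query in vocabulary.items()
}
print(fragment_index)
# {'carboxyl': ['acetic acid', 'phenylalanine'], 'benzene_ring': ['benzene', 'phenylalanine']}
```

Once such an index is built, a later query like "which compounds carry a carboxyl group?" becomes a dictionary lookup instead of a fresh graph search over the whole database.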

How to use matrix notation for property representation?

Matrix representations preserve many chemical properties, including some that are derived from the 3D structure, but depicting the 3D structure itself requires much more information, such as bond lengths, bond angles and torsion angles. Generating 3D structures also requires that every structure be stable, i.e. free of collisions between atoms within the molecule ("bumps"). Hence additional pre-processing must be done before chemicals can be represented by 3D structures. A matrix representation of a chemical, however, may not require a real 3D structure, i.e. coordinates in a Cartesian system. Let us start with the simplest type, the adjacency matrix, whose entry (i, j) is 1 if atoms i and j are bonded and 0 otherwise; it therefore records the nearest neighbours of each atom, which helps to search for substructures and to build the graph. Being Boolean, however, it does not carry bond-order information. The distance matrix contains the distance between every pair of atoms in the molecule: either the geometrical distance (from 3D coordinates) or the topological distance (the number of bonds on the shortest path between them). Many other, non-Boolean matrix representations of a molecule, such as the atom connectivity matrix and the bond matrix, can be found in the handbook cited at the end. The merits and demerits of matrix representations are also discussed. Large databases are often searched using such matrix operations, because computers can perform them quickly and easily.
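The adjacency and topological distance matrices can be built from a bond list. The sketch below uses acetic acid, CC(=O)O, as a hand-encoded heavy-atom graph (numbering assumed for this example: 0 methyl C, 1 carboxyl C, 2 carbonyl O, 3 hydroxyl O). Besides the Boolean adjacency matrix, it builds a bond-order-weighted matrix as one simple non-Boolean variant (not necessarily the exact definition used in the handbook), and it fills the distance matrix with the Floyd-Warshall shortest-path algorithm, counting every bond as length 1. The function name `molecule_matrices` is hypothetical.

```python
INF = float("inf")


def molecule_matrices(n_atoms, bonds):
    """Return (adjacency, bond-order matrix, topological distance matrix).

    bonds: list of (i, j, order). Distances count bonds on the shortest path.
    """
    n = n_atoms
    adjacency = [[0] * n for _ in range(n)]
    bond_order = [[0] * n for _ in range(n)]
    for i, j, order in bonds:
        adjacency[i][j] = adjacency[j][i] = 1        # Boolean: bonded or not
        bond_order[i][j] = bond_order[j][i] = order  # non-Boolean variant
    # Floyd-Warshall all-pairs shortest paths; unreachable pairs stay infinite.
    dist = [[0 if i == j else (1 if adjacency[i][j] else INF) for j in range(n)]
            for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return adjacency, bond_order, dist


# Acetic acid CC(=O)O, heavy atoms only: 0 methyl C, 1 carboxyl C,
# 2 carbonyl O, 3 hydroxyl O (numbering assumed for this example).
adj, bo, dist = molecule_matrices(4, [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
for name, m in (("adjacency", adj), ("bond order", bo), ("distance", dist)):
    print(name, m)
print("neighbours per atom:", [sum(row) for row in adj])  # [1, 3, 1, 1]
```

The row sums of the adjacency matrix give the number of nearest neighbours of each atom, while only the bond-order matrix distinguishes the C=O bond from the C-O bond.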

A few applications of the methods discussed here are chemical structure analysis, structure searching in databases, predicting which structures in a set will have bioactivity similar to that of other chemicals, and annotating chemicals with biologically important fragments. The next module discusses the features, or descriptors, that are generated from the 1D, 2D and 3D representations and associated with the biological activity of chemicals.
