
Chemoinformatics and Metabolism

Paula de Matos

EBI is an Outstation of the European Molecular Biology Laboratory.


Chemoinformatics and Metabolism Group
Research

• Natural Products and Metabolomics
• Indexing, searching and dissemination of chemical information
• Cheminformatics Algorithms and Toolkits


Chemical Entities of Biological Interest
• A database containing a freely available,
manually annotated dictionary of molecular
entities focused on ‘small’ chemical
compounds.

• Provides a method to navigate the chemical space via an ontology.

• ChEBI aims to provide a central, definitive reference of chemical nomenclature.

3 08.12.21
http://www.ebi.ac.uk/chebi

Dictionary
Ontology

Resource for
Nomenclature

What does ChEBI cover?

• Mostly small entities

• Larger entities too, such as:
  • alumina
  • amylose
  • metaborate

• Excludes proteins and nucleic acids

Status

ChEBI further info
• http://www.ebi.ac.uk/chebi

• Mailing lists:
• chebi-help@ebi.ac.uk
• chebi-announce@lists.sourceforge.net
• chebi-ontology@lists.sourceforge.net

• Submitting data
• http://www.ebi.ac.uk/chebi/submissions

The Chemistry Development Kit (CDK):
An Open Source Java-Library for Structural Chemo- and Bioinformatics

• >90,000 lines of code, >900 classes, >9,000 methods

• Library Generation
• Virtual Screening
• Molecular Property Prediction
• Visualization

http://cdk.sourceforge.net

(1) Steinbeck, C.; Hoppe, C.; Kuhn, S.; Guha, R.; Willighagen, E. L. Current Pharmaceutical Design 2006, 12, 2111-2120.
(2) Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. Journal of Chemical Information and Computer Sciences 2003, 43, 493-500.
The Chemistry Development Kit (CDK)

Input/Output
• I/O (CML, MDL Molfile, SDF, PDB)
• SMILES
• InChI

Visualization
• Structure-Diagram-Layout (SDG)
• 2D Rendering
• 3D Rendering

Modelling
• 3D Model-Builder
• Atom-Typing
• Force-Field
• Representation of Biomolecular Structures

Library Enumeration
• Deterministic Isomer generator
• Stochastic Structure Generators via
  • Simulated Annealing
  • Genetic Algorithms

Chemical Graphs
• Isomorphism detection
• Maximum-Common-Substructure Searches
• SMARTS- and Substructure searches
• Ring searches
• Aromaticity detection

Properties
• Fingerprinting
• > 70 QSAR-Descriptors
• QSAR model building

Example: Structure Diagram Generation

Example: Fingerprinting

// Build the two example molecules from CDK's template factory.
IMolecule superstructure = MoleculeFactory.makeIndole();
IMolecule substructure = MoleculeFactory.makePyrrole();

// Compute a fingerprint (bit set) for each molecule.
Fingerprinter fingerprinter = new Fingerprinter();
BitSet superBS = fingerprinter.getFingerprint(superstructure);
BitSet subBS = fingerprinter.getFingerprint(substructure);

// True if every bit set in subBS is also set in superBS.
boolean isSubset = FingerprinterTool.isSubset(superBS, subBS);

Bitscreen coding for structural features

0 0 1 1 0 1 0 0 1 0

Feature labels shown: -COOH, Alkyl, Hetero-aryl N, O-Alkyl, -NH2
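
The subset test behind the fingerprint screen can be illustrated with plain `java.util.BitSet`, without CDK on the classpath. This is a minimal sketch, not CDK's implementation: the class name `SubsetScreen`, the helper `bits`, and the query bit positions are hypothetical and chosen only for illustration. The target reuses the bitscreen 0 0 1 1 0 1 0 0 1 0 from the slide, read left to right from position 0.

```java
import java.util.BitSet;

/** Sketch of a fingerprint subset screen; names and bit positions are hypothetical. */
final class SubsetScreen {

    /** Returns true if every bit set in query is also set in target. */
    static boolean isSubset(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);   // keep only the query bits that target lacks
        return missing.isEmpty();
    }

    /** Builds a BitSet with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet bs = new BitSet();
        for (int p : positions) {
            bs.set(p);
        }
        return bs;
    }

    public static void main(String[] args) {
        BitSet target = bits(2, 3, 5, 8);  // the bitscreen 0 0 1 1 0 1 0 0 1 0
        BitSet query = bits(3, 8);         // hypothetical query fingerprint

        System.out.println(isSubset(target, query));  // true: bits 3 and 8 are both in target
        System.out.println(isSubset(query, target));  // false: bits 2 and 5 are missing from query
    }
}
```

A passing screen is only a necessary condition: two unrelated molecules can share bits, so candidates that pass are normally confirmed with a full substructure match.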
CDK in numbers

• 67 registered developers on SF
• 86 people subscribed to cdk-devel list
• 111 people subscribed to cdk-user list
• 80,966 downloads since 2001
• CDK article (2003) cited 68 times

CDK info
• Project home page:
• http://cdk.sourceforge.net/

• Mailing list:
• cdk-user@lists.sourceforge.net
• cdk-devel@lists.sourceforge.net

• Documentation
• http://pele.farmbio.uu.se/nightly/

OrChem

• Oracle chemistry plug-in using the Chemistry Development Kit (CDK).

• OrChem is suitable for Oracle 11g and onwards.

• Uses Oracle’s JIT compiler.

• Not an Oracle data cartridge - it doesn't need Oracle's extensibility architecture because its Java components run as Java stored procedures inside the Oracle standard JVM (Aurora).

OrChem database structure

Example OrChem Queries
• Similarity search
  select * from table(
    orchem_simsearch.search(
      'OC4=C(C(=C3OC(C)(COC=1C=CC(=CC=1)CC2C(=O)NC(=O)S2)CCC3=C4C)C)C',
      'SMILES', 0.8, null, 'N')
  );

• Substructure search
  select orchem_subsearch.search(molfile, 'MOL', 50, 'Y')
    from compounds where molregno = 12345;
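
The slide does not say how the 0.8 in the similarity query is interpreted. A common choice for fingerprint similarity is the Tanimoto coefficient, and the sketch below assumes that reading. It is an illustration in plain Java, not OrChem's code: `TanimotoSketch`, `tanimoto`, `bits`, and all bit positions are hypothetical.

```java
import java.util.BitSet;

/** Sketch of a Tanimoto similarity cutoff over fingerprints; names and bits are hypothetical. */
final class TanimotoSketch {

    /** Tanimoto coefficient: bits set in both fingerprints divided by bits set in either. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet shared = (BitSet) a.clone();
        shared.and(b);
        int both = shared.cardinality();
        int either = a.cardinality() + b.cardinality() - both;
        return either == 0 ? 0.0 : (double) both / either;
    }

    /** Builds a BitSet with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet bs = new BitSet();
        for (int p : positions) {
            bs.set(p);
        }
        return bs;
    }

    public static void main(String[] args) {
        double cutoff = 0.8;                 // assumed to play the role of the 0.8 in the query
        BitSet query = bits(1, 2, 3, 4, 5);
        BitSet close = bits(1, 2, 3, 4);     // 4 shared bits out of 5 in the union
        BitSet far = bits(1, 2, 9, 10);      // 2 shared bits out of 7 in the union

        System.out.println(tanimoto(query, close) >= cutoff);  // true
        System.out.println(tanimoto(query, far) >= cutoff);    // false
    }
}
```

Under this assumption, a database-side search would return only the rows whose stored fingerprint scores at or above the cutoff against the query fingerprint.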

Fingerprint distribution

Parallel vs. non-parallel
Performance of substructure search on 3.5 million compounds

Substructure benchmarking
Performance of substructure search on 3.5 million compounds

Similarity Benchmarking

OrChem info
• http://orchem.sourceforge.net/

• Mailing list:
• orchem-devel@lists.sourceforge.net

