By Bhushan Bapat
08/29/12
CONTENTS
SMARTS
Specifications
Recursive SMARTS
Line Notations
International Chemical Identifier (InChI)
ROSDAL
Wiswesser Line Notation (WLN)
Simplified Molecular Input Line Entry Specification (SMILES)
SMILES Arbitrary Target Specification (SMARTS)
SYBYL Line Notation (SLN)
SMILES
SMILES stands for Simplified Molecular-Input Line-Entry System
Describes the structure of molecules using short ASCII strings
SMILES strings can be converted into two-dimensional drawings or three-dimensional models of the molecules
Developed by Arthur Weininger and David Weininger in the late 1980s
Modified and extended by Daylight Chemical Information Systems Inc.
In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community
Five rules specify atoms, bonds, branching, ring closures and disconnections.
Rule for specifying atoms
Atoms are denoted by their atomic symbols in square brackets []: [Se], [Au]
B, C, N, O, P, S, F, Cl, Br and I do not need [] when they have their normal valence: C methane, O water
Within brackets, attached H and charges must be specified: [H+] proton, [OH-] hydroxide ion, [Fe+2] iron(II) cation
Aliphatic atoms are written with a capital symbol, aromatic atoms with a lower-case symbol
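The atom rule above can be sketched as a small tokenizer. This is a minimal illustration, not a full SMILES parser: the function name `list_atoms` and the labels "bracket", "aliphatic" and "aromatic" are assumptions made for this example, and only the bracket-free atoms listed above plus the lower-case aromatic forms c, n, o, p, s are recognized.

```python
import re

# Atoms that may be written without brackets. Two-letter symbols come
# first so that "Cl" is not read as "C" followed by a stray "l".
ORGANIC = r"Cl|Br|B|C|N|O|P|S|F|I"
AROMATIC = r"c|n|o|p|s"
ATOM_RE = re.compile(rf"(\[[^\]]+\])|({ORGANIC})|({AROMATIC})")


def list_atoms(smiles):
    """Return (text, kind) for every atom token found in a SMILES string.

    Bond symbols, ring-closure digits and parentheses are simply skipped.
    """
    atoms = []
    for match in ATOM_RE.finditer(smiles):
        bracket, aliphatic, aromatic = match.groups()
        if bracket:
            atoms.append((bracket, "bracket"))
        elif aliphatic:
            atoms.append((aliphatic, "aliphatic"))
        else:
            atoms.append((aromatic, "aromatic"))
    return atoms


if __name__ == "__main__":
    for text in ("C", "[Fe+2]", "CCl", "[nH]1cccc1"):
        print(text, list_atoms(text))
```

Anything inside square brackets is kept as one token, so the hydrogens and charge of [OH-] or [Fe+2] stay attached to their atom.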
Rule for specifying bonds
Single bond: - (can be omitted). CC ethane, CH3CH3
Double bond: =. C=C ethene, CH2=CH2
Triple bond: #. C#N hydrogen cyanide
Aromatic bond: : (can be omitted). c1ccccc1 benzene
For linear structures, SMILES is a simple diagrammatic notation with hydrogens and single bonds omitted: C=CCC=CC is 1,4-hexadiene
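To make the "hydrogens omitted" idea concrete, the sketch below recovers the implied hydrogen count for each atom of a linear SMILES. It is a simplified illustration under stated assumptions: only unbranched chains of C, N and O with no rings, brackets or charges are handled, and the valence table and the function name `implicit_hydrogens` are choices made for this example.

```python
# Bond symbols from the rule above and the normal valences assumed here.
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(smiles):
    """Return the implied hydrogen count of each atom in a linear SMILES."""
    atoms = []      # element symbols in the order written
    used = []       # total bond order already used by each atom
    pending = 1     # order of the bond to the next atom (single if omitted)
    for ch in smiles:
        if ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch in VALENCE:
            atoms.append(ch)
            used.append(0)
            if len(atoms) > 1:
                # The bond joins this atom to the previous one in the chain.
                used[-1] += pending
                used[-2] += pending
            pending = 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return [VALENCE[a] - u for a, u in zip(atoms, used)]


if __name__ == "__main__":
    for text in ("CC", "C=C", "C#N", "C=CCC=CC"):
        print(text, implicit_hydrogens(text))
```

For C=CCC=CC the counts add up to ten hydrogens, which agrees with the formula C6H10 of 1,4-hexadiene.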
Rule for ring closures: break one bond in each ring and mark the two ring-opening (closing) atoms with the same digit following them, e.g. C1CCCCC1 for cyclohexane
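Disconnected parts are separated by a dot (.)
But C1.C1 means CC, i.e. ethane: the matching ring-closure digits bond the two atoms even though a dot separates them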
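The pairing of ring-closure digits can be sketched in a few lines. This is an illustrative helper, not a complete parser: the name `ring_closure_bonds` is an assumption, atoms are numbered from 0 in the order written, and only single-digit closures are handled.

```python
import re

# One atom token: a bracket atom, a two-letter halogen, or a single letter.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFIcnops]")


def ring_closure_bonds(smiles):
    """Return (i, j) atom-index pairs bonded by matching ring-closure digits."""
    open_closures = {}   # digit -> index of the atom that opened it
    bonds = []
    atom_index = -1
    pos = 0
    while pos < len(smiles):
        match = ATOM.match(smiles, pos)
        if match:
            # Digits inside brackets (isotopes, charges) are consumed here.
            atom_index += 1
            pos = match.end()
            continue
        ch = smiles[pos]
        if ch.isdigit():
            if ch in open_closures:
                bonds.append((open_closures.pop(ch), atom_index))
            else:
                open_closures[ch] = atom_index
        pos += 1
    return bonds


if __name__ == "__main__":
    for text in ("C1CCCCC1", "c1ccccc1", "C1.C1", "CC"):
        print(text, ring_closure_bonds(text))
```

The dot is ignored by the digit pairing, which is why C1.C1 still yields a bond between its two carbons.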
Isomeric SMILES
Used to specify isotopes, configuration around double bonds and chirality
Isotopic specification
The desired atomic mass precedes the atomic symbol inside the brackets: [12C] carbon-12, [13C] carbon-13, [13CH4] carbon-13 methane
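A bracket atom such as [13CH4] packs isotope, symbol, hydrogen count and charge into one token. The following sketch pulls those fields apart with a regular expression. It is a simplified illustration: the function name `parse_bracket_atom` and the returned dictionary keys are assumptions for this example, chirality marks are accepted but not returned, and atom classes are not handled.

```python
import re

BRACKET_ATOM = re.compile(
    r"\["
    r"(?P<isotope>\d+)?"                    # optional mass number, e.g. 13
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"   # element symbol, either case
    r"(?:@{1,2})?"                          # optional chirality mark (ignored)
    r"(?P<hcount>H\d*)?"                    # attached hydrogens: H, H2, H4
    r"(?P<charge>\+\d+|-\d+|\++|-+)?"       # +, -, +2, --, and so on
    r"\]"
)


def parse_bracket_atom(text):
    """Split a bracket atom into isotope, symbol, hydrogen count and charge."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")

    isotope = int(match["isotope"]) if match["isotope"] else None

    h = match["hcount"]
    hcount = 0 if h is None else (int(h[1:]) if len(h) > 1 else 1)

    c = match["charge"]
    if not c:
        charge = 0
    else:
        sign = 1 if c[0] == "+" else -1
        digits = c.lstrip("+-")
        # "+2" carries its magnitude as a number, "++" as repeated signs.
        charge = sign * (int(digits) if digits else len(c))

    return {"isotope": isotope, "symbol": match["symbol"],
            "hcount": hcount, "charge": charge}


if __name__ == "__main__":
    for text in ("[12C]", "[13CH4]", "[OH-]", "[Fe+2]", "[H+]"):
        print(text, parse_bracket_atom(text))
```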
Configuration around double bonds
F/C=C/F or F\C=C\F: trans (E)-1,2-difluoroethene
F/C=C\F or F\C=C/F: cis (Z)-1,2-difluoroethene
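For the simple written form X/C=C/Y, the geometry follows from whether the two directional marks agree. The sketch below encodes only that case and is an assumption-laden illustration: the function name `double_bond_geometry` is hypothetical, exactly one double bond is expected, and branched forms such as C(/F)=C/F are rejected because a mark written after its atom reads the other way round.

```python
def double_bond_geometry(smiles):
    """Classify a linear X/C=C/Y style SMILES as 'trans', 'cis' or 'unspecified'."""
    if "(" in smiles:
        raise ValueError("branched forms are outside this sketch")
    left, sep, right = smiles.partition("=")
    if not sep:
        raise ValueError("no double bond found")
    # Directional marks on each side of the double bond.
    left_marks = [ch for ch in left if ch in "/\\"]
    right_marks = [ch for ch in right if ch in "/\\"]
    if len(left_marks) != 1 or len(right_marks) != 1:
        return "unspecified"
    # Same mark on both sides: substituents on opposite sides (trans).
    return "trans" if left_marks[0] == right_marks[0] else "cis"


if __name__ == "__main__":
    for text in ("F/C=C/F", "F\\C=C\\F", "F/C=C\\F", "F\\C=C/F", "FC=CF"):
        print(text, double_bond_geometry(text))
```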
Chiral specification
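A tetrahedral center is the commonest chiral structure: a C atom with four different groups attached
Indicated by @ and @@
@ : looking from the first neighbor, the remaining neighbors are listed anticlockwise
@@ : looking from the first neighbor, the remaining neighbors are listed clockwise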
N[C@](C)(F)C(=O)O
N[C@@](F)(C)C(=O)O
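The two SMILES above describe the same enantiomer: swapping two neighbors in the written order and switching @ to @@ cancel each other out. A minimal sketch of that bookkeeping follows. The function names and the neighbor labels ("N", "CH3", "F", "COOH") are hypothetical and exist only for this illustration.

```python
def permutation_is_odd(order, reference):
    """True if `order` is an odd permutation of `reference`."""
    positions = [reference.index(item) for item in order]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return inversions % 2 == 1


def same_configuration(neighbors_a, tag_a, neighbors_b, tag_b):
    """Compare two tetrahedral centers given neighbor order and @ / @@ tag."""
    odd = permutation_is_odd(neighbors_b, neighbors_a)
    same_tag = tag_a == tag_b
    # Even reordering keeps the tag; odd reordering must flip it.
    return same_tag != odd


if __name__ == "__main__":
    first = (["N", "CH3", "F", "COOH"], "@")     # N[C@](C)(F)C(=O)O
    second = (["N", "F", "CH3", "COOH"], "@@")   # N[C@@](F)(C)C(=O)O
    print(same_configuration(*first, *second))
```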
Hydrogens
Hydrogens are written explicitly for:
1) Charged hydrogen, i.e. a proton: [H+]
2) Hydrogen molecule: [H][H]
3) Bridging hydrogen: an H connected to two atoms
4) Isotopic hydrogen: e.g. heavy water, [2H]O[2H]
Aromaticity
Uses the Hückel rule to identify aromaticity:
1) All ring atoms are sp2 hybridized
2) The number of pi electrons satisfies the 4n+2 rule
C1=COC=C1 is equivalent to c1cocc1 (furan)
C1=CN=C[NH]C(=O)1 is equivalent to c1cnc[nH]c(=O)1
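The 4n+2 test itself is simple arithmetic once the pi electrons of a ring have been counted. The sketch below shows only that counting step. The per-atom contributions in `PI_ELECTRONS` are simplifying assumptions chosen for this example (one electron for a ring carbon in a double bond or a pyridine-type nitrogen, two for a ring oxygen or a pyrrole-type [nH]); a real perception algorithm considers much more.

```python
# Assumed pi-electron contribution of each ring-atom type (illustrative only).
PI_ELECTRONS = {
    "c": 1,      # sp2 carbon in a ring double bond
    "n": 1,      # pyridine-type nitrogen
    "[nH]": 2,   # pyrrole-type nitrogen donates its lone pair
    "o": 2,      # furan-type oxygen donates one lone pair
}


def satisfies_huckel(pi_electrons):
    """True when the count equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


def ring_pi_electrons(ring_atoms):
    """Sum the assumed contributions of the atoms in one ring."""
    return sum(PI_ELECTRONS[atom] for atom in ring_atoms)


if __name__ == "__main__":
    rings = {
        "benzene": ["c"] * 6,
        "furan": ["c", "c", "o", "c", "c"],
        "1H-pyrrole": ["[nH]", "c", "c", "c", "c"],
        "cyclobutadiene": ["c"] * 4,
    }
    for name, atoms in rings.items():
        count = ring_pi_electrons(atoms)
        print(name, count, satisfies_huckel(count))
```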
Cn1cccc1 1-methylpyrrole
[nH]1cccc1 1H-pyrrole
Reaction SMILES
Written as reactants>agents>products; the agent part may be empty
Examples
C=CCBr>>C=CCI
C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-]
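Because the three parts of a reaction SMILES are separated by ">" and the molecules within each part by ".", the examples above can be taken apart with plain string splitting. The helper below is a minimal sketch; the name `parse_reaction` and the dictionary keys are assumptions for this illustration, and atom maps or other extensions are not considered.

```python
def parse_reaction(reaction_smiles):
    """Split a reaction SMILES into reactant, agent and product lists."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected the form reactants>agents>products")
    # An empty part (as in "A>>B") gives an empty list.
    reactants, agents, products = (
        [mol for mol in part.split(".") if mol] for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}


if __name__ == "__main__":
    print(parse_reaction("C=CCBr>>C=CCI"))
    print(parse_reaction("C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-]"))
```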
SMARTS
SMILES Arbitrary Target Specification
Language for searching for a substructure within a structure
Extension of the SMILES rules
Includes logical operators for atoms (nodes) and bonds (edges)
Specifications
Atomic Primitives
Symbol     Symbol name         Atomic property requirements            Default
*          wildcard            any atom                                (no default)
a          aromatic            aromatic                                (no default)
A          aliphatic           aliphatic                               (no default)
D<n>       degree              <n> explicit connections                exactly one
H<n>       total-H-count       <n> attached hydrogens                  exactly one
h<n>       implicit-H-count    <n> implicit hydrogens                  at least one
R<n>       ring membership     in <n> SSSR rings                       any ring atom
r<n>       ring size           in smallest SSSR ring of size <n>       any ring atom
v<n>       valence             total bond order <n>                    exactly one
X<n>       connectivity        <n> total connections                   exactly one
x<n>       ring connectivity   <n> total ring connections              at least one
-<n>       negative charge     -<n> charge                             -1 charge (-- is -2, etc)
+<n>       positive charge     +<n> formal charge                      +1 charge (++ is +2, etc)
#n         atomic number       atomic number <n>                       (no default)
@          chirality           anticlockwise                           anticlockwise, default class
@@         chirality           clockwise                               clockwise, default class
@<c><n>    chirality           chiral class <c> chirality <n>          (no default)
@<c><n>?   chiral or unspec    chirality <c><n> or unspecified         (no default)
<n>        atomic mass         explicit atomic mass                    unspecified mass
Bond Primitives
Symbol   Bond property requirements
-        single bond (aliphatic)
/        directional bond "up"
\        directional bond "down"
/?       directional bond "up or unspecified"
\?       directional bond "down or unspecified"
=        double bond
#        triple bond
:        aromatic bond
~        any bond (wildcard)
@        any ring bond
Logical operators
Atom and bond specifications are combined to form expressions
Symbol        Expression   Meaning
exclamation   !e1          not e1
ampersand     e1&e2        e1 and e2 (high precedence)
comma         e1,e2        e1 or e2
semicolon     e1;e2        e1 and e2 (low precedence)
Examples
[CH2]: aliphatic carbon with two hydrogens (methylene carbon)
[!C;R]: (NOT aliphatic carbon) AND in ring
[X3&H0]: atom with 3 total connections and no H's
[35*]: any atom of mass 35
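The precedence of the four operators (! binds tightest, then &, then the comma, then the semicolon) can be sketched with a tiny evaluator for a single atom expression. This is a toy under stated assumptions, not a SMARTS engine: atoms are plain dictionaries with hypothetical keys, only the primitives *, R, X<n>, H<n>, a leading mass number and one-letter element symbols are supported, and H is always read as a hydrogen count.

```python
import re

# One primitive, optionally negated: mass, wildcard, ring, X<n>, H<n>, element.
TOKEN = re.compile(r"(!?)(\d+|\*|R|X\d+|H\d*|[BCNOPSFIcnops])")


def match_primitive(token, atom):
    """Test a single primitive against an atom dictionary."""
    if token == "*":
        return True
    if token.isdigit():
        return atom.get("mass") == int(token)
    if token == "R":
        return atom["in_ring"]
    if token[0] == "X":
        return atom["connections"] == int(token[1:])
    if token[0] == "H":
        return atom["hcount"] == (int(token[1:]) if len(token) > 1 else 1)
    # Element symbol: upper case means aliphatic, lower case aromatic.
    return atom["symbol"] == token.upper() and atom["aromatic"] == token.islower()


def match_and_group(text, atom):
    """High-precedence AND: primitives joined by '&' or simply juxtaposed."""
    for chunk in text.split("&"):
        pos = 0
        while pos < len(chunk):
            m = TOKEN.match(chunk, pos)
            if m is None:
                raise ValueError(f"unsupported primitive in {chunk!r}")
            negate, token = m.groups()
            result = match_primitive(token, atom)
            if negate:
                result = not result
            if not result:
                return False
            pos = m.end()
    return True


def match_atom_expr(expr, atom):
    """Evaluate a bracketed expression: ';' is low AND, ',' is OR."""
    inner = expr.strip("[]")
    return all(
        any(match_and_group(alt, atom) for alt in part.split(","))
        for part in inner.split(";")
    )


if __name__ == "__main__":
    methylene = {"symbol": "C", "aromatic": False, "in_ring": False,
                 "connections": 4, "hcount": 2, "mass": None}
    print(match_atom_expr("[CH2]", methylene))
    print(match_atom_expr("[!C;R]", methylene))
```

Splitting on the semicolon first and the comma second is what makes [C,n;R] read as "(C or n) and R" rather than "C or (n and R)".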
Recursive SMARTS
Conditional SMARTS
CaaO                 C ortho to O
CaaaN                C meta to N
Caa(O)aN             C ortho to O and meta to N (but 2O,3N only)
Ca(aO)aaN            C ortho to O and meta to N (but 2O,5N only)
C[$(aaO);$(aaaN)]    C ortho to O and meta to N (all cases)
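The point of the recursive form is that each $(...) environment is tested on its own and the results are then combined, so both the 2O,3N and 2O,5N arrangements are found. The sketch below mimics that on a toy model and is not a substructure matcher: a single six-membered aromatic ring is represented as a list of substituent labels by position, and all names are hypothetical.

```python
def positions_near(substituents, element, offset):
    """Ring positions lying `offset` steps from any position bearing `element`."""
    size = len(substituents)
    hits = set()
    for pos, label in enumerate(substituents):
        if label == element:
            hits.add((pos + offset) % size)
            hits.add((pos - offset) % size)
    return hits


def carbons_ortho_O_meta_N(substituents):
    """Positions bearing C that are ortho to an O and meta to an N."""
    ortho_to_o = positions_near(substituents, "O", 1)   # plays the role of $(aaO)
    meta_to_n = positions_near(substituents, "N", 2)    # plays the role of $(aaaN)
    # The semicolon in the SMARTS corresponds to this set intersection.
    return sorted(p for p in ortho_to_o & meta_to_n if substituents[p] == "C")


if __name__ == "__main__":
    print(carbons_ortho_O_meta_N(["C", "O", "N", "H", "H", "H"]))  # 2O,3N
    print(carbons_ortho_O_meta_N(["C", "O", "H", "H", "N", "H"]))  # 2O,5N
    print(carbons_ortho_O_meta_N(["C", "H", "O", "H", "H", "N"]))  # O is meta
```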
SMILES vs SMARTS
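SMILES describes molecules; SMARTS describes patterns
SMILES allows implicit hydrogens to be written as explicit atoms, e.g. 1H-pyrrole can be written [nH]1cccc1 or [H]n1cccc1
In SMARTS this is confusing: n1cccc1 places no requirement on hydrogens, while [nH]1cccc1 asks for a ring nitrogen with one attached hydrogen
Most SMARTS are not valid SMILES, e.g. cOc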
ACKNOWLEDGEMENT
Thank You!