
An Introduction to Line Notations: SMILES and SMARTS

By Bhushan Bapat

08/29/12

CONTENTS

Line notations to represent molecular structures

SMILES
  Specification Rules
  Isomeric SMILES
  General conventions
  Reaction SMILES

SMARTS
  Specifications
  Recursive SMARTS

SMILES vs SMARTS

Acknowledgements



Line Notations

International Chemical Identifier (InChI)
ROSDAL
Wiswesser Line Notation (WLN)
Simplified Molecular Input Line Entry Specification (SMILES)
SMILES Arbitrary Target Specification (SMARTS)
SYBYL Line Notation (SLN)

SMILES

Simplified Molecular-Input Line-Entry System
Describes the structure of molecules using short ASCII strings
The strings can be converted into two-dimensional drawings or three-dimensional models of the molecules
Developed by Arthur Weininger and David Weininger in the late 1980s
Modified and extended by Daylight Chemical Information Systems, Inc.
In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community


Rules for Encoding

Five rules cover atoms, bonds, branches, ring closures and disconnections.

Rules for specifying Atoms

Atoms are denoted by their atomic symbols in square brackets: [Se], [Au]
B, C, N, O, P, S, F, Cl, Br and I (the organic subset) do not need brackets: C methane, O water
Within brackets, attached hydrogens and charges must be specified: [H+] proton, [OH-] hydroxide ion, [Fe+2] iron(II) cation
Aliphatic atoms are written with a capital symbol, aromatic atoms with a lower-case symbol
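The atom rule can be illustrated with a small tokenizer. This is only a sketch: the names ATOM_TOKEN, atom_tokens and is_aromatic are hypothetical, and the pattern covers bracket atoms and the organic subset, not the full SMILES grammar.

```python
import re

# Bracket atoms first, then the two-letter organic-subset symbols,
# then one-letter aliphatic (upper case) and aromatic (lower case) symbols.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")

def atom_tokens(smiles):
    """Return the atom tokens of a SMILES string in written order."""
    return ATOM_TOKEN.findall(smiles)

def is_aromatic(token):
    """Lower-case element symbol (after any isotope digits) means aromatic."""
    symbol = token.strip("[]").lstrip("0123456789")
    return symbol[:1].islower()

print(atom_tokens("[Fe+2]"))
print(atom_tokens("CCN(CC)CC"))
print([is_aromatic(t) for t in atom_tokens("[O-]c1ccccc1")])
```

Bond symbols, ring-closure digits and parentheses are simply skipped here, so the function only answers "which atoms are written, and in what order".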


Rules for specifying Bonds

Single bond by - (or omitted): CC ethane (CH3CH3)
Double bond by =: C=C ethene (CH2=CH2)
Triple bond by #: C#N hydrogen cyanide
Aromatic bond by : (or omitted): c1ccccc1 benzene
For linear structures, SMILES is a simple diagrammatic notation with hydrogens and single bonds omitted: C=CCC=CC 1,4-hexadiene
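To see what "hydrogens and single bonds omitted" means in practice, the sketch below recovers the implicit hydrogen count of each atom in an unbranched, acyclic chain. The function name and the small VALENCE table (C, N, O only) are assumptions made for illustration.

```python
import re

BOND_ORDER = {"-": 1, "=": 2, "#": 3}
VALENCE = {"C": 4, "N": 3, "O": 2}  # illustrative subset of normal valences

def implicit_hydrogens(smiles):
    """Implicit H count per atom for a linear SMILES made of C, N, O."""
    tokens = re.findall(r"[CNO]|[-=#]", smiles)
    atoms, used = [], []
    pending = 1  # an omitted bond symbol means a single bond
    for tok in tokens:
        if tok in BOND_ORDER:
            pending = BOND_ORDER[tok]
            continue
        if atoms:
            used[-1] += pending   # bond order counts against the previous atom
            used.append(pending)  # and against the new atom
        else:
            used.append(0)
        atoms.append(tok)
        pending = 1
    return [VALENCE[a] - u for a, u in zip(atoms, used)]

print(implicit_hydrogens("CC"))        # ethane
print(implicit_hydrogens("C=CCC=CC"))  # 1,4-hexadiene
```

For C=CCC=CC the counts 2, 1, 2, 1, 1, 3 spell out CH2=CH-CH2-CH=CH-CH3.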

Rules for specifying Branches

Branches are written in parentheses, to the right of the atom they attach to

CCN(CC)CC triethylamine
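One common way to read branches is with a stack of branch points. The following is a minimal sketch under that assumption; bonds_with_branches is a hypothetical name, and only organic-subset atoms and parentheses are handled (no rings, bond symbols or bracket atoms).

```python
import re

def bonds_with_branches(smiles):
    """Return bonds as (atom_index, atom_index) pairs for a branched chain."""
    tokens = re.findall(r"Cl|Br|[BCNOPSFI]|[()]", smiles)
    bonds, stack = [], []
    prev, count = None, 0
    for tok in tokens:
        if tok == "(":
            stack.append(prev)   # remember the branch point
        elif tok == ")":
            prev = stack.pop()   # return to the branch point
        else:
            if prev is not None:
                bonds.append((prev, count))
            prev = count
            count += 1
    return bonds

print(bonds_with_branches("CCN(CC)CC"))
```

In the result for CCN(CC)CC, atom 2 (the nitrogen) appears in three bonds, one per ethyl group.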


Rules for specifying Cyclic structures

Break one bond in each ring; the two ring-opening (closing) atoms are marked by the same number following them, e.g. C1CCCCC1 cyclohexane

An atom can carry more than one ring-closure number, as in cubane

SMILES for cubane is C12C3C4C1C5C4C3C25
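The pairing of ring-closure numbers can be sketched as follows. The function name is hypothetical, and the sketch assumes single-digit closures on unbracketed atoms (no %nn closures, no bracket atoms).

```python
import re

def ring_closure_pairs(smiles):
    """Return (opening_atom, closing_atom) index pairs for ring closures."""
    tokens = re.findall(r"Cl|Br|[BCNOPSFI]|[bcnops]|\d", smiles)
    open_rings, pairs = {}, []
    atom = -1
    for tok in tokens:
        if tok.isdigit():
            if tok in open_rings:
                # Second occurrence: close the ring against the opener.
                pairs.append((open_rings.pop(tok), atom))
            else:
                open_rings[tok] = atom
        else:
            atom += 1
    if open_rings:
        raise ValueError("unclosed ring-closure number(s): %s" % sorted(open_rings))
    return pairs

print(ring_closure_pairs("C1CCCCC1"))
print(ring_closure_pairs("C12C3C4C1C5C4C3C25"))
```

For cubane the function reports five ring-closure bonds; together with the seven bonds along the written chain of eight atoms, that gives the twelve edges of the cube.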


Rules for specifying Disconnected structures


Written as individual structures separated by . (period)
Example: sodium phenoxide, [Na+].[O-]c1ccccc1

Atoms separated by . are not connected (bonded) to each other

But ring-closure numbers still pair up across a period:
C1.C1 means CC, i.e. ethane
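A sketch of how the period and the ring-closure exception interact: parts separated by "." count as separate fragments unless a ring-closure number opened in one part is closed in another. count_fragments is a hypothetical helper and assumes single-digit ring closures.

```python
def count_fragments(smiles):
    """Count disconnected fragments, honoring ring closures across '.'."""
    parts = smiles.split(".")
    parent = list(range(len(parts)))  # union-find over the dot-separated parts

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    open_rings = {}
    for idx, part in enumerate(parts):
        in_bracket = False
        for ch in part:
            if ch == "[":
                in_bracket = True
            elif ch == "]":
                in_bracket = False
            elif ch.isdigit() and not in_bracket:  # digits in brackets are not closures
                if ch in open_rings:
                    parent[find(idx)] = find(open_rings.pop(ch))
                else:
                    open_rings[ch] = idx
    return len({find(i) for i in range(len(parts))})

print(count_fragments("[Na+].[O-]c1ccccc1"))
print(count_fragments("C1.C1"))
```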


Isomeric SMILES

Used to specify isotopes, configuration around double bonds, and chirality

Isotopic specification

The desired atomic mass precedes the atomic symbol, inside brackets
[12C] carbon-12, [13C] carbon-13, [13CH4] carbon-13 methane
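The layout of a bracket atom (isotope, symbol, attached hydrogens, charge) can be sketched with one regular expression. The pattern and the parse_bracket_atom name are assumptions for illustration; repeated-sign charges such as ++ and atom classes are deliberately left out.

```python
import re

BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z])"
    r"(?P<chiral>@{1,2})?"
    r"(?:H(?P<hcount>\d?))?"
    r"(?P<charge>[+-]\d*)?\]"
)

def parse_bracket_atom(token):
    """Split a bracket atom into isotope, symbol, H count and charge."""
    m = BRACKET_ATOM.fullmatch(token)
    if m is None:
        raise ValueError("unsupported bracket atom: %r" % token)
    h = m.group("hcount")
    raw = m.group("charge")
    if raw is None:
        charge = 0
    elif raw in ("+", "-"):
        charge = 1 if raw == "+" else -1
    else:
        charge = int(raw)
    return {
        "isotope": int(m.group("isotope")) if m.group("isotope") else None,
        "symbol": m.group("symbol"),
        "hcount": 0 if h is None else int(h or 1),  # bare H means one hydrogen
        "charge": charge,
    }

print(parse_bracket_atom("[13CH4]"))
print(parse_bracket_atom("[OH-]"))
```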

Configuration around double bond

Denoted by / and \, called directional bonds

F/C=C/F or F\C=C\F: trans (E)-1,2-difluoroethene
F/C=C\F or F\C=C/F: cis (Z)-1,2-difluoroethene
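For the simple form X/C=C/Y, with one directional bond before the first carbon and one after the second, the rule reduces to comparing the two symbols: the same symbol on both sides means trans, different symbols mean cis. The sketch below handles only that form; the function name is hypothetical.

```python
import re

# One substituent, a directional bond, C=C, a directional bond, one substituent.
SIMPLE_ALKENE = re.compile(r"[A-Za-z]+([/\\])C=C([/\\])[A-Za-z]+")

def double_bond_geometry(smiles):
    """Return 'trans' or 'cis' for a SMILES of the form X/C=C/Y."""
    m = SIMPLE_ALKENE.fullmatch(smiles)
    if m is None:
        raise ValueError("only the simple X/C=C/Y form is supported")
    first, second = m.groups()
    return "trans" if first == second else "cis"

for s in ["F/C=C/F", "F\\C=C\\F", "F/C=C\\F", "F\\C=C/F"]:
    print(s, double_bond_geometry(s))
```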

Chiral specification

Configuration around Tetrahedral Centers

The tetrahedral center, a C atom with four different substituents attached, is the most common chiral structure
Indicated by @ and @@, looking from the first neighbor toward the chiral atom
@ when the remaining neighbors are listed anticlockwise

N[C@](C)(F)C(=O)O

@@ when the remaining neighbors are listed clockwise

N[C@@](F)(C)C(=O)O
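The two strings above describe the same configuration: swapping two neighbors inverts the sense of rotation, and changing @ to @@ inverts it back. A minimal sketch of that parity argument follows; the helper names and the neighbor labels are illustrative only.

```python
def permutation_is_even(order, reference):
    """True if `order` is an even permutation of `reference`."""
    idx = [reference.index(x) for x in order]
    inversions = sum(
        1
        for i in range(len(idx))
        for j in range(i + 1, len(idx))
        if idx[i] > idx[j]
    )
    return inversions % 2 == 0

def same_configuration(sym_a, nbrs_a, sym_b, nbrs_b):
    """Compare two tetrahedral centers given chirality symbol and neighbor order."""
    even = permutation_is_even(nbrs_b, nbrs_a)
    # An even reordering keeps the symbol; an odd one must flip it.
    return (sym_a == sym_b) == even

first = ("@", ["N", "CH3", "F", "COOH"])    # N[C@](C)(F)C(=O)O
second = ("@@", ["N", "F", "CH3", "COOH"])  # N[C@@](F)(C)C(=O)O
print(same_configuration(*first, *second))
```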


General conventions in SMILES

Hydrogens

Written explicitly in four cases:
1) Charged hydrogen: proton [H+]
2) Hydrogen molecule: [H][H]
3) Bridging hydrogen: H connected to two atoms
4) Isotopic hydrogen: heavy water [2H]O[2H]

Aromaticity

Hückel's rule is used to identify aromaticity
1) All ring atoms sp2 hybridized
2) Number of pi electrons satisfies the 4n+2 rule

C1=COC=C1 is read as c1cocc1
C1=CN=C[NH]C(=O)1 is read as c1cnc[nH]c(=O)1
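The 4n+2 test itself is simple arithmetic. In the sketch below the per-atom pi-electron contributions are supplied by hand (an assumption made for illustration); a real toolkit derives them from the structure.

```python
def satisfies_huckel(pi_electrons):
    """True when the count equals 4n+2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0

# Pi electrons contributed by each ring atom, in written order.
benzene = [1, 1, 1, 1, 1, 1]      # c1ccccc1: one per sp2 carbon
furan = [1, 1, 2, 1, 1]           # c1cocc1: the O lone pair contributes two
cyclobutadiene = [1, 1, 1, 1]     # C1=CC=C1: four electrons, not 4n+2

for name, ring in [("benzene", benzene), ("furan", furan),
                   ("cyclobutadiene", cyclobutadiene)]:
    print(name, sum(ring), satisfies_huckel(sum(ring)))
```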


Aromatic Nitrogen compounds

All can be written with the lower-case atomic symbol n
1) Pyridine: n1ccccc1
2) Pyridine-N-oxide: O=n1ccccc1 or [O-][n+]1ccccc1
3) Methyl- and 1H-pyrrole: Cn1cccc1 and [nH]1cccc1

Reaction SMILES

Reactions are written as reactant > agent > product (the agent field may be empty)

Examples

C=CCBr>>C=CCI
C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-]
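Splitting a reaction SMILES into its three fields can be sketched in a few lines. split_reaction is a hypothetical helper; it splits molecules on every ".", so a single species written with a "." inside it would be reported as separate entries.

```python
def split_reaction(rxn):
    """Split 'reactants>agents>products' into lists of SMILES strings."""
    fields = rxn.split(">")
    if len(fields) != 3:
        raise ValueError("a reaction SMILES has exactly two '>' separators")
    reactants, agents, products = (
        [mol for mol in field.split(".") if mol] for field in fields
    )
    return {"reactants": reactants, "agents": agents, "products": products}

print(split_reaction("C=CCBr>>C=CCI"))
print(split_reaction("C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-]"))
```

In the second example the agent field holds CC(=O)C (acetone), which appears on neither side of the transformation.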


SMARTS

SMILES Arbitrary Target Specification
A language for searching for a substructure within a structure
An extension of the SMILES rules
Atoms (nodes) and bonds (edges) can be combined with logical operators

Specifications

Atomic and Bond Primitives
Logical operators and Recursive SMARTS


Atomic Primitives
Symbol     Symbol name         Atomic property requirements          Default
*          wildcard            any atom                              (no default)
a          aromatic            aromatic                              (no default)
A          aliphatic           aliphatic                             (no default)
D<n>       degree              <n> explicit connections              exactly one
H<n>       total-H-count       <n> attached hydrogens                exactly one
h<n>       implicit-H-count    <n> implicit hydrogens                at least one
R<n>       ring membership     in <n> SSSR rings                     any ring atom
r<n>       ring size           in smallest SSSR ring of size <n>     any ring atom
v<n>       valence             total bond order <n>                  exactly one
X<n>       connectivity        <n> total connections                 exactly one
x<n>       ring connectivity   <n> total ring connections            at least one
-<n>       negative charge     -<n> charge                           -1 charge (-- is -2, etc)
+<n>       positive charge     +<n> formal charge                    +1 charge (++ is +2, etc)
#n         atomic number       atomic number <n>                     (no default)
@          chirality           anticlockwise                         anticlockwise, default class
@@         chirality           clockwise                             clockwise, default class
@<c><n>    chirality           chiral class <c> chirality <n>        (no default)
@<c><n>?   chiral or unspec    chirality <c><n> or unspecified       (no default)
<n>        atomic mass         explicit atomic mass                  unspecified mass


Bond Primitives

Symbol   Bond property requirements
-        single bond (aliphatic)
/        directional bond "up"
\        directional bond "down"
/?       directional bond "up or unspecified"
\?       directional bond "down or unspecified"
=        double bond
#        triple bond
:        aromatic bond
~        any bond (wildcard)
@        any ring bond


Logical operators

Atom and bond specifications are combined to form expressions

Symbol        Expression   Meaning
exclamation   !e1          not e1
ampersand     e1&e2        e1 and e2 (high precedence)
comma         e1,e2        e1 or e2
semicolon     e1;e2        e1 and e2 (low precedence)

Examples
[CH2]     aliphatic carbon with two hydrogens (methylene carbon)
[!C;R]    (NOT aliphatic carbon) AND in ring
[X3&H0]   atom with 3 total connections and no H's
[35*]     any atom of mass 35
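The precedence of the operators can be sketched with a tiny evaluator: split on ; first (low-precedence AND), then on , (OR), then treat & and plain juxtaposition as high-precedence AND. Everything here is an illustrative assumption: atoms are plain dictionaries with hypothetical keys, and only the primitives *, R, X<n>, H<n> and the symbols C, N, O, c, n, o are supported (so [35*] is rejected).

```python
import re

PRIMITIVE = re.compile(r"(!?)(X\d+|H\d*|R|\*|[CNO]|[cno])")

def match_primitive(prim, atom):
    if prim == "*":
        return True
    if prim == "R":
        return atom["in_ring"]
    if prim[0] == "X":
        return atom["connections"] == int(prim[1:])
    if prim[0] == "H":
        return atom["hcount"] == int(prim[1:] or 1)  # bare H means exactly one
    # Element symbol: upper case is aliphatic, lower case is aromatic.
    return atom["symbol"] == prim.upper() and atom["aromatic"] == prim.islower()

def match_and_term(term, atom):
    text = term.replace("&", "")  # '&' and juxtaposition are both AND
    pieces = PRIMITIVE.findall(text)
    if "".join(neg + prim for neg, prim in pieces) != text:
        raise ValueError("unsupported primitive in %r" % term)
    return all(match_primitive(prim, atom) != bool(neg) for neg, prim in pieces)

def match_atom_expr(expr, atom):
    body = expr.strip("[]")
    return all(
        any(match_and_term(term, atom) for term in group.split(","))
        for group in body.split(";")
    )

ring_nitrogen = {"symbol": "N", "aromatic": False, "in_ring": True,
                 "connections": 3, "hcount": 1}
print(match_atom_expr("[!C;R]", ring_nitrogen))
print(match_atom_expr("[X3&H0]", ring_nitrogen))
```

With this ordering, [c,n&H1] reads as "aromatic carbon, OR aromatic nitrogen with one hydrogen", whereas [c,n;H1] reads as "(aromatic carbon OR aromatic nitrogen) AND one hydrogen".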

Recursive SMARTS

Conditional SMARTS

SMARTS              Meaning
CaaO                C ortho to O
CaaaN               C meta to N
Caa(O)aN            C ortho to O and meta to N (but 2O,3N only)
Ca(aO)aaN           C ortho to O and meta to N (but 2O,5N only)
C[$(aaO);$(aaaN)]   C ortho to O and meta to N (all cases)


SMILES vs SMARTS

SMILES describes molecules; SMARTS describes patterns
SMILES allows implicit hydrogens to be written as explicit atoms, e.g. 1H-pyrrole
  SMILES: [nH]1cccc1, Hn1cccc1 (confusing)
  SMARTS: n1cccc1 but !Hn1cccc1
Most SMARTS are not valid SMILES, e.g. cOc


ACKNOWLEDGEMENT

Daylight Chemical Information Systems, Inc.
Wikipedia

Thank You!
