SMILES

Simplified Molecular Input Line Entry System (SMILES)
Widely used AND computationally
efficient
Uses atomic symbols and a set of
intuitive rules
Uses hydrogen-suppressed molecular
graphs (HSMG)

SMILES Bonds
SINGLE*     -
DOUBLE      =
TRIPLE      #
AROMATIC*   :
* can be omitted

Butanols
2-Butanol
iso-Butanol
tert-Butanol

SMILES Branches
Represented by enclosure in
parentheses
Can be nested or stacked
Examples:
CC(O)CC is 2-Butanol
OCC(C)C is iso-Butanol
OC(C)(C)C is tert-Butanol
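The branch rule above can be checked mechanically. The following is a minimal sketch, not a full SMILES parser: the helper name branch_stats is hypothetical, and it only counts parentheses to report how many branches a notation has and how deeply they are nested.

```python
def branch_stats(smiles: str) -> tuple[int, int]:
    """Return (number_of_branches, maximum_nesting_depth) for a SMILES string."""
    depth = 0
    max_depth = 0
    branches = 0
    for ch in smiles:
        if ch == "(":
            # an opening parenthesis starts a new branch
            depth += 1
            branches += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without an opening one")
    if depth != 0:
        raise ValueError("unclosed branch")
    return branches, max_depth


for name, smi in [("2-Butanol", "CC(O)CC"),
                  ("iso-Butanol", "OCC(C)C"),
                  ("tert-Butanol", "OC(C)(C)C")]:
    print(name, smi, branch_stats(smi))
```

tert-Butanol shows stacked branches (two branches at depth 1); a nested case such as CC(C(C)C)C would report depth 2.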

SMILES Bonds
Ethene: C=C
Chloroethene: ClC=C
1,1-Dichloroethene: ClC(Cl)=C
cis-1,2-Dichloroethene: ClC=CCl (cis/trans configuration is not encoded without / and \)
Trichloroethene: ClC(Cl)=CCl
Perchloroethene: ClC(Cl)=C(Cl)Cl
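Because SMILES uses hydrogen-suppressed graphs, the hydrogens in these examples have to be inferred from valence. The sketch below illustrates the idea under stated assumptions: it handles only acyclic, bracket-free notations, and the valence table (C 4, N 3, O 2, halogens 1) is a simplification chosen for this example. The function name formula is hypothetical.

```python
import re
from collections import Counter

# Assumed default valences (lowest normal valence only, illustration only).
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Two-letter symbols come first so "Cl" is not read as "C" plus "l".
TOKEN = re.compile(r"Cl|Br|[CNOFI]|[-=#()]")


def formula(smiles: str) -> str:
    """Molecular formula of an acyclic, bracket-free SMILES string."""
    atoms = []    # element symbol of each heavy atom
    used = []     # total bond order already used by each heavy atom
    stack = []    # atoms to return to when a branch closes
    prev = None   # index of the atom the next atom will bond to
    order = 1     # pending bond order (single when the symbol is omitted)
    pos = 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unsupported character at position {pos}")
        tok = m.group()
        pos = m.end()
        if tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        else:
            atoms.append(tok)
            used.append(0)
            if prev is not None:
                used[prev] += order
                used[-1] += order
            prev = len(atoms) - 1
            order = 1
    counts = Counter(atoms)
    # implicit hydrogens fill whatever valence is left on each atom
    counts["H"] = sum(VALENCE[a] - u for a, u in zip(atoms, used))
    # C and H first, then the remaining elements alphabetically
    keys = [k for k in ("C", "H") if counts[k]]
    keys += sorted(k for k in counts if k not in ("C", "H") and counts[k])
    return "".join(k + (str(counts[k]) if counts[k] > 1 else "") for k in keys)


print(formula("C=C"))              # C2H4
print(formula("ClC(Cl)=C(Cl)Cl"))  # C2Cl4
```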

SMILES Atoms
Use normal chemical symbols
Add punctuation symbols if necessary
No super- or subscripts

SMILES Symbols
String of alphanumeric characters and
certain punctuation symbols
Terminates at the first space
encountered when read left to right
The ORGANIC SUBSET:
B, C, N, O, P, S, F, Cl, Br, I
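As a rough illustration of these two rules (the notation ends at the first space, and organic-subset atoms are written without brackets), the sketch below extracts the atom symbols from an input line. The name organic_atoms is hypothetical, and the code assumes bracket-free, non-aromatic input.

```python
import re

# Two-letter symbols are listed first so "Cl" and "Br" win over "C" and "B".
ORGANIC_SUBSET = ["Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I"]
ATOM = re.compile("|".join(ORGANIC_SUBSET))


def organic_atoms(line: str) -> list[str]:
    """Return the organic-subset atom symbols of the SMILES at the start of a line."""
    smiles = line.split(" ", 1)[0]   # the notation terminates at the first space
    return ATOM.findall(smiles)


print(organic_atoms("ClC(Cl)=CCl trichloroethene"))
```

Anything after the first space (here the compound name) is ignored, which is why a name can follow the notation on the same line.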

Other SMILES Atoms


Aliphatic or nonaromatic carbon: C
Atom in aromatic ring: lowercase letter
Designate ring closure with pairs of
matching digits, e.g.
c1ccccc1 (or C1=CC=CC=C1) is Benzene,
whereas
C1CCCCC1 is Cyclohexane

SMILES Charges
Specify attached hydrogens and
charges in square brackets
Number of attached hydrogens is the
symbol H followed by optional digit

SMILES Charges
[H+]: proton
[OH-]: hydroxyl anion
[OH3+]: hydronium cation
[Fe++]: iron(II) cation
[NH4+]: ammonium cation
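The bracket notation can be read with a small regular expression. This is a sketch only: the function name parse_bracket_atom and the pattern are assumptions made for illustration, and isotopes, chirality marks and atom classes are not handled.

```python
import re

# symbol, optional H with optional digit, optional charge ("+", "++", "-", "+2", ...)
BRACKET = re.compile(r"\[([A-Z][a-z]?)(?:H(\d?))?(\++|-+|[+-]\d+)?\]")


def parse_bracket_atom(text: str) -> tuple[str, int, int]:
    """Return (symbol, attached_hydrogens, charge) for one bracket atom."""
    m = BRACKET.fullmatch(text)
    if m is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")
    symbol, hdigits, charge_text = m.group(1), m.group(2), m.group(3)
    if hdigits is None:
        hcount = 0                 # no H written
    elif hdigits == "":
        hcount = 1                 # H without a digit means one hydrogen
    else:
        hcount = int(hdigits)
    if not charge_text:
        charge = 0
    elif charge_text[-1].isdigit():
        charge = int(charge_text)  # forms such as "+2" or "-1"
    else:
        sign = 1 if charge_text[0] == "+" else -1
        charge = sign * len(charge_text)   # forms such as "++"
    return symbol, hcount, charge


for atom in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(atom, parse_bracket_atom(atom))
```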

SMILES Cyclic Structures


Break one single or one aromatic bond
in each ring
Number in any order
Designate ring-breaking atoms by the
same digit following the atomic symbol

Cyclic Structures
Numbers indicate start and stop of ring
Same number indicates start and end of the
ring, entered immediately following the
start/end atoms
Only numbers 1-9 are used
A number should appear only twice
Atom can be associated with 2 consecutive
numbers, e.g., Naphthalene: c12ccccc1cccc2

Naphthalene

c12ccccc1cccc2
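The ring-closure conventions above (digits 1-9, each digit used exactly twice) can be checked with a short routine. This is a sketch under the conventions stated on these slides; the name ring_closures is hypothetical, and digits inside square brackets are skipped because there they are hydrogen counts.

```python
from collections import Counter


def ring_closures(smiles: str) -> int:
    """Return the number of ring closures, enforcing the 'each digit twice' rule."""
    counts = Counter()
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif ch.isdigit() and not in_bracket:
            if ch == "0":
                raise ValueError("only digits 1-9 are used")
            counts[ch] += 1
    unpaired = sorted(d for d, n in counts.items() if n != 2)
    if unpaired:
        raise ValueError(f"ring digits not used exactly twice: {unpaired}")
    return len(counts)


print(ring_closures("c1ccccc1"))        # 1
print(ring_closures("C1CCCCC1"))        # 1
print(ring_closures("c12ccccc1cccc2"))  # 2
```

In the naphthalene notation the first atom carries both digits 1 and 2, so it opens two rings at once.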

SMILES Conventions
Avoid two consecutive left parentheses
if possible
Strive for the fewest possible
branches
Tautomeric bonds are not designated;
enter the appropriate form

Further Restrictions
A branch cannot begin a SMILES
notation
A branch cannot immediately follow a
double- or triple-bond symbol
Example: C=(CC)C is invalid, but
C(=CC)C or C(CC)=C are valid SMILES
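Both restrictions are purely textual, so they can be tested without parsing the structure. A minimal sketch, with the hypothetical helper name check_branch_rules:

```python
def check_branch_rules(smiles: str) -> list[str]:
    """Return a list of violations of the two branch restrictions (empty if none)."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch cannot begin a SMILES notation")
    for i in range(len(smiles) - 1):
        # a "(" directly after "=" or "#" puts the branch after the bond symbol
        if smiles[i] in "=#" and smiles[i + 1] == "(":
            problems.append(
                f"branch immediately follows '{smiles[i]}' at position {i}"
            )
    return problems


for smi in ["C=(CC)C", "C(=CC)C", "C(CC)=C"]:
    print(smi, check_branch_rules(smi) or "ok")
```

The check covers only these two rules; a notation that passes can still be invalid for other reasons.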

SMILES Fragments

Nitro: N(=O)(=O)
Nitrate: ON(=O)(=O)
Nitrite: ON(=O)
Sulfonic acid: S(=O)(=O)O
Cyanide/Nitrile: C#N
Azide: N=N#N
Azido: N=[N+]=[N-]

SMILES Metals
[Al] [As] [Au] [Be]
[Bi] [Cd] [Ca] [Fe]
[Hg] [K] [Li] [Mg]
[Na] [Ni] [Pt] [Sb]
[Sn] [Zn] [Zr]

Disconnected Structures
Indicated by a dot
Tetramethylammonium bromide:
C[N+](C)(C)C.[Br-]
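Since the dot separates components, a salt can be split into its ions with a plain string split. The sketch below also sums the charges, assuming charges are written with repeated signs as on the Charges slides (for example [Fe++]). The example [NH4+].[OH-] is built for illustration from ions listed there, and the helper names are hypothetical.

```python
def components(smiles: str) -> list[str]:
    """Split a SMILES notation into its disconnected parts."""
    return smiles.split(".")


def net_charge(smiles: str) -> int:
    """Sum the charges written inside square brackets (repeated-sign form only)."""
    charge = 0
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif in_bracket and ch == "+":
            charge += 1
        elif in_bracket and ch == "-":
            charge -= 1
    return charge


salt = "[NH4+].[OH-]"
print(components(salt))   # ['[NH4+]', '[OH-]']
print(net_charge(salt))   # 0
```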

Isomeric and Chiral SMILES


Isomeric configuration indicated by
forward and backward slashes: / \
Examples:
trans-1,2-dibromoethene: Br/C=C/Br
Direction of the slash continues

cis-1,2-dibromoethene: Br/C=C\Br
Direction of the slash reverses

Chirality indicated by the @ symbol
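The slash rule for the two dibromoethene examples can be sketched as a comparison of the slash before and the slash after the double bond. This is deliberately narrow: it assumes the simple pattern X/C=C/Y with one slash on each side, and the name double_bond_geometry is hypothetical. General cis/trans perception needs a real parser.

```python
import re

# one slash directly before "C=C" and one directly after it
PATTERN = re.compile(r"([/\\])C=C([/\\])")


def double_bond_geometry(smiles: str):
    """Return 'trans', 'cis', or None when no slashes flank a C=C bond."""
    m = PATTERN.search(smiles)
    if m is None:
        return None
    # same direction on both sides: trans; direction reverses: cis
    return "trans" if m.group(1) == m.group(2) else "cis"


print(double_bond_geometry("Br/C=C/Br"))    # trans
print(double_bond_geometry(r"Br/C=C\Br"))   # cis
```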

Some Applications
JMDraw/SMILESViewer (Christoph
Steinbeck)
JME Molecular Editor (Peter Ertl)
STN Express (SMILES as output)
Tripos (dbtranslate: SMILES to MOL)
Marvin (Ferenc Csizmadia)
http://chemaxon.com/marvin/
CACTVS http://www2.ccc.uni-erlangen.de/cactvs/

Another Application
SMILESCAS Database
http://www.syrres.com/esc/smilecas.htm
Over 103,000 SMILES notations

Input CAS Registry Number


Leads to SMILES and thence to a
structure search