Chemical Information Sources/Cheminformatics
InChI--The IUPAC International Chemical Identifier
An InChI [1] is a character string generated by a computer algorithm to represent a chemical structure. It is used in software applications and databases where chemical structures need to be represented as machine-readable strings of information. InChIs are unique to the compound they describe and can encode absolute stereochemistry. InChI has been called the bar code for chemistry and chemical structures. The InChI format and algorithm are non-proprietary, and the software is open source, with ongoing development done by the community. In a September 15, 2010 posting on the CHMINF-L mailing list, Steve Heller wrote that virtually all major publishers now support InChI and are adding the InChI and InChIKey (a fixed-length, hashed form of the InChI) to the chemicals reported in journal articles. InChIs and InChIKeys are searchable in Google, Yahoo, Bing, and other search engines. The two major NIH databases (PubChem and NCI) hold over 60 million InChIs, while ChemSpider has well over 20 million. All the major commercial and open source structure drawing programs have embedded InChI generation in their products.
InChIs are freely usable and non-proprietary. They allow a more advanced representation of chemical information than other codes such as SMILES. InChIs are unambiguous (i.e., converting a given chemical structure with the standardized algorithm yields exactly one InChI), and they are precisely indexed by major search engines such as Google.
Standards for Coding Chemical Data
For cheminformatics to succeed, certain standards had to be developed. Often a format created by a dominant company became a de facto standard once it was made public, as in the case of MDL's SDF format, one of the company's published CTfile formats [2]. In crystallography, the CIF (Crystallographic Information File) format is widely used for small molecules, and mmCIF for macromolecules.
Even for something as simple as the colors of atoms in a 3D depiction, it is important to follow standards. For example, the CPK (Corey-Pauling-Koltun) color convention assigns:
Carbon: grey or black (although some use green)
Hydrogen: white
Oxygen: red
Nitrogen: blue
Sulfur: yellow
Phosphorus: orange
Chlorine: green
Sodium: blue
Iron: purple
Bromine: brown
Zinc: brown
Calcium: dark grey
Other metals: dark grey
Unknown: deep pink
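As an illustration of how a viewer might apply the CPK color list, here is a minimal Python sketch. The dictionary, the function name `cpk_color`, and the small `OTHER_METALS` set are hypothetical. Colors are kept as plain names rather than RGB values, carbon is mapped to grey even though black or green are also used, and a real program would classify metals using a full periodic table rather than this short illustrative set.

```python
# Hypothetical lookup table built from the CPK color list; colors are
# plain names, not RGB values.
CPK_COLORS = {
    "C": "grey",      # grey or black; some programs use green
    "H": "white",
    "O": "red",
    "N": "blue",
    "S": "yellow",
    "P": "orange",
    "Cl": "green",
    "Na": "blue",
    "Fe": "purple",
    "Br": "brown",
    "Zn": "brown",
    "Ca": "dark grey",
}

# Assumption: a small, illustrative set of metal symbols; a real viewer
# would consult a complete periodic table instead.
OTHER_METALS = {
    "Li", "K", "Mg", "Al", "Cu", "Ni", "Co", "Mn",
    "Cr", "Ag", "Au", "Pt", "Hg", "Pb", "Sn",
}


def cpk_color(symbol: str) -> str:
    """Return the CPK color name for an element symbol."""
    if symbol in CPK_COLORS:
        return CPK_COLORS[symbol]
    if symbol in OTHER_METALS:
        return "dark grey"   # "other metals" rule
    return "deep pink"       # fallback for unknown elements
```

For example, `[cpk_color(s) for s in ("C", "O", "Cu", "Xx")]` gives `['grey', 'red', 'dark grey', 'deep pink']`.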
In CPK space-filling models, atomic radii are chosen to reflect the space that molecules occupy when they pack in solids or associate in liquids.
Current Issues in Cheminformatics
What is a small molecule?
What is an adequate representation of a sample?
Property calculations vs. measurements
Scoring functions for drug-like molecules
Docking for ligand binding prediction
Calculating diversity and similarity
Where do cheminformatics and bioinformatics merge?
Toxicology, ADME (Absorption, Distribution, Metabolism, Excretion), and other pieces of the puzzle for drugs
Depictions of structure and visualization of data
Electronic notebooks
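Calculating similarity and diversity, one of the issues listed here, commonly starts from molecular fingerprints. The sketch below assumes each molecule has already been reduced to a binary fingerprint, stored as a Python set of the indices of its "on" bits. How those bits are generated (substructure keys, hashed paths, and so on) is outside its scope, and the bit sets used in the example are made up. It computes the Tanimoto coefficient, a widely used similarity measure, and a simple diversity score defined here (as an assumption) as the average pairwise Tanimoto distance, 1 - T. The function names are hypothetical.

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two binary fingerprints,
    each given as the set of indices of its 'on' bits."""
    if not fp_a and not fp_b:
        return 1.0  # assumed convention: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    # |A and B| / |A or B|, with the union size computed from the counts
    return shared / (len(fp_a) + len(fp_b) - shared)


def mean_pairwise_distance(fps: list[set[int]]) -> float:
    """Simple diversity score: average Tanimoto distance (1 - T) over all pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0  # a set of zero or one molecules has no pairwise diversity
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)
```

For instance, the fingerprints `{1, 2, 3, 4}` and `{3, 4, 5}` share 2 of 5 distinct bits, giving T = 0.4. A higher mean distance indicates a more diverse set of molecules.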
Summary
Cheminformatics (or, as it is more commonly known in Europe, chemoinformatics) has almost as long a history as the computer itself. It is the application of computer technology and methods to chemistry. Related fields are molecular modeling and computational chemistry. Cheminformatic techniques have found particular application in the drug industry but are now beginning to penetrate other areas of chemistry.
CIIM Link for further study
SIRCh Link for Cheminformatics
Cheminformatics Introductory Resources
References
[1] http://www.inchi-trust.org/
[2] http://www.mdl.com/downloads/public/ctfile/ctfile.jsp
License
Creative Commons Attribution-Share Alike 3.0 Unported //creativecommons.org/licenses/by-sa/3.0/