
Chemical Information Sources/Chemical Name and Formula Searches

Introduction
Chemical Abstracts Service's Registry File is the largest single collection of data that can be used to identify a
chemical substance. Each unique chemical substance is assigned a Registry Number, which CAS uses in preference
to a chemical name to index documents in the CA or CAPlus Files. Much of the descriptive information about a
compound (its molecular formula, variant names for the substance, as well as much detailed information about its
makeup, including the structure) is found in the Registry File. Furthermore, in recent years, actual data (experimental
or calculated data) have been added to the file, making it much more like a huge handbook. The Registry Number
serves as the unique identifier of the record. The Registry File includes a number of search techniques that are built
on the chemical name and other fields included in the Registry File records.
In the printed CA, there is no Registry Number Index. Instead, the "Chemical Substance Index" ("CSI") links the
preferred CA Index Name for the substance to the documents that have information on it. However, names for
classes of compounds are indexed in the "General Subject Index". Also, in the printed Chemical Abstracts,
supplemental access to the printed product is found in the "Formula Indexes". The "CSI" has dictated much of the
indexing policy for supplemental terms used to describe the role of the chemical substance in the document. The
broad indexing terms found in the CAS Roles in the CA File and the Standard Subject Divisions in the printed CSI
can be of considerable use in retrieving the precise information of interest about a compound on which much has
been written.
Molecular formula searching in CA is based on the Hill Formula system (described below). The concept of the
dot-disconnected formula for salts, addition compounds, and mixtures is important in both the database and the
printed "Molecular Formula Index" to Chemical Abstracts.
A search for information on a single chemical substance may start with the name of the substance, its molecular
formula, or various other words or codes that can be associated with it. (See: Locating All CA File References Citing
a Chemical Substance [1] and CAS Registry: Finding CAS Registry Numbers [2].) In this chapter, we will encounter
various coding systems that have been applied to the retrieval of chemical substances from both printed and
computer-based sources. The main database to search for such information is the CAS Registry File, which now has
in excess of 70,000,000 records for chemical substances (including biosequences). Many of the entries in the
Registry File are for sequences of biological macromolecules. The bulk of the remaining small molecule entries are
for organic compounds, either simple organics (esters, steroids, heterocycles, stereoisomers, etc.) or such things as
mixtures, polymers, and organic salts. Just over 10% of the file is comprised of inorganic compounds.
Chemical Nomenclature
Mastery of formal chemical nomenclature is a skill possessed by few chemists nowadays. The International Union of
Pure and Applied Chemistry (IUPAC) determines the recommended practices for assigning official names to
chemical substances. With a knowledge of the IUPAC nomenclature rules, a chemist can visualize and depict the
correct structure of even complex chemical compounds. However, creating such a name from scratch is another
matter. An excellent Web guide to chemical nomenclature is Charles H. Davis's Chemical Nomenclature Lite [3]. Fox
and Powell's classic work, Nomenclature of Organic Compounds: Principles and Practice, appeared in a 2nd edition
in 2001. For other types of substances (and nomenclature in specific areas of chemistry), see the so-called "color
books" of the IUPAC [4]. The Enzyme Commission assigns EC numbers [5] for enzymes that are very useful in
computer searching.
Until late 2006, Chemical Abstracts Service (CAS) made major changes to their chemical nomenclature policies
only at the boundaries of the five-year collective index periods. They have now abandoned that policy, preferring to
make changes to CA Index Names as needed to ensure that the CAS Registry System has the most current, usable
information. The names will now conform more closely with the names that chemists typically use. Among the
nomenclature improvements to be implemented are more uniformly cited locants, reduction in the number of
stereoparent names, and the elimination of nearly 3,000 obscure stereoparents. Unexpressed amides also will be
disregarded.
Substance Searching Using Chemical Abstracts Service Registry Numbers
One very effective method of retrieving chemical substance information from a reference source is to utilize the
Chemical Abstracts Service REGISTRY NUMBER for the substance. The Registry Number is a unique number
assigned to each substance indexed by CAS. The CAS RN is a number of the format Y-XX-X, where Y can be from
two to seven digits, and X is one digit, for example, 494-12-2. (The RN can now run to 10 digits in total.) The
Registry Number is found in many databases [6] and increasingly as an index to printed reference works. The
Registry File started in 1965 with new substances that were encountered from that date forward. Older substances
from documents dating from 1907-65 have since been entered into the system. Now that CAS has finished this task, all
compounds discovered post-1907 should be in the database. For compounds discovered prior to 1907, it is wise to
search the Beilstein and Gmelin databases on Reaxys [7], which have coverage back to the 18th century.
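The final digit of a CAS RN is a check digit, which makes it possible to catch many typing errors before a search is run. The following is a minimal Python sketch of that check; the function name is chosen here for illustration and is not part of any CAS product. Each digit before the check digit is weighted by its position counted from the right, and the weighted sum modulo 10 must equal the check digit.

```python
import re

def is_valid_cas_rn(rn: str) -> bool:
    """Return True if rn has the Y-XX-X shape and a consistent check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)   # all digits except the check digit
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check

print(is_valid_cas_rn("494-12-2"))   # Flavan, the example used in this chapter
```

For 494-12-2 the weighted sum is 2x1 + 1x2 + 4x3 + 9x4 + 4x5 = 72, and 72 modulo 10 gives the check digit 2.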
The Registry Number appears in the indexing of CA and CAPlus File records in preference to the formal name of the
compound. In volume 106 of Chemical Abstracts is found abstract number 195826 for the following article:
Grieco, Paul A.; Bahsas, Ali. Reactions of allylstannanes with in situ generated immonium salts in protic solvent: a
facile aminomethano destannylation process. J. Org. Chem. (1987), 52(7), 1378-80. CODEN: JOCEAH
ISSN:0022-3263. CAN 106:195826 AN 1987:195826 CAPLUS
The indexing below includes part of the Registry Numbers for compounds discussed in the article.
SciFinder Example of Registry Number Indexing
(Reproduced with permission of CAS, a division of the American Chemical Society.)
CAS Registry Numbers are assigned to organic and inorganic substances, metals, alloys, minerals, polymers,
coordination compounds, elements, isotopes, peptides, enzymes, biomolecular sequences, and nuclear particles.
However, the mere mention of a compound in a document is not enough to ensure that the indexers at Chemical
Abstracts Service will tie a CAS RN to the record for that document. To get an entry in the CA indexes, there must
be something new reported about the substance. It may be a new method of preparation, a new source for the
substance, a new reaction, a new kinetic or mechanistic study, new chemical or physical properties, a new method of
analysis, a new use or application, or a new biological effect. Chemical reactants and the resulting products are
routinely indexed, but reagents are not indexed unless there is a new preparation of the reagent itself or a novel use
of a standard reagent.
In 2008, CAS entered into a cooperative venture with Wikipedia to provide CAS Registry Numbers for chemical
substances of widespread general interest. The result is Common Chemistry [8], a Web resource where
approximately 7,900 substances can be searched without cost by chemical name or CAS Registry Number. Entering
the CAS RN for Isatin, 91-56-5, brings up a record with the CAS Preferred Name, 1H-Indole-2,3-dione, 18 other
names for Isatin, the molecular formula, a 2D structural drawing, and the link to the Wikipedia article on Isatin.
The "Index Guide" and Chemical Name Searching in the Printed Chemical Substance Indexes
Just as the "Index Guide" controls the vocabulary that must be used in the Chemical Abstracts "General Subject
Index," it also provides the correct name to use in searching the CA "Chemical Substance Index". For example, a
check of the "Index Guide" for "Flavan" finds the following:
Flavan See 2H-1-Benzopyran, 3,4-dihydro-2-phenyl- [494-12-2]
In alphabetizing chemical substance names in the index, locant numbers, stereo designators, etc. are ignored. Thus,
we must look in the "B" section of the printed CA "Chemical Substance Index" for "Benzopyran" in order to find
index entries on the compound. Note that the CAS Index Name for Flavan is inverted, with the name of the so-called
HEADING PARENT listed first. This keeps structurally related compounds in the same area of the index. The basic
Heading Parent compound is listed first, followed by derivatives and other structurally related compounds. The
entries in the "Chemical Substance Index" include the TEXT MODIFICATIONS (other subject words) that give
more information about the documents that are indexed.
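The alphabetizing rule described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function name and the regular expression are illustrative, only leading locants such as "2H-1-" are stripped, and stereo designators and the other ignored elements that CAS handles are not covered.

```python
import re

# Leading locants such as "2H-1-" or "1,4-". Stereo designators are not handled.
_LEADING_LOCANTS = re.compile(r"^(?:\d+[A-Za-z]?(?:,\d+[A-Za-z]?)*-)+")

def index_sort_key(index_name: str) -> str:
    """Hypothetical sort key: drop leading locants, then compare case-insensitively."""
    return _LEADING_LOCANTS.sub("", index_name).lower()

names = [
    "1H-Indole-2,3-dione",
    "2H-1-Benzopyran, 3,4-dihydro-2-phenyl-",
    "Benzene, 1,4-dibromo-",
]
for name in sorted(names, key=index_sort_key):
    print(name)
```

With this key, the Flavan entry sorts under "Benzopyran" in the "B" section, after "Benzene" and before "Indole".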
From 2007, CAS no longer categorizes information by collective index periods, so the new CA index names no
longer carry a "CI" label such as 6CI, 7CI, 8CI, or 9CI.
Qualified Substances in CAS Files and Indexes
If not much has been written about the substance during the indexing period, all of the indexed information is found
in a single alphabetical sequence under the Index Name in the printed "Chemical Substance Index". However, when
the index entries become voluminous, CAS divides them into Standard Subject Divisions. The compounds so treated
are referred to as QUALIFIED SUBSTANCES. Originally seven qualifiers were used, but two additional terms
(formation and processes) were added in 1994, and one phrase (uses and miscellaneous) was subsequently split
apart. The qualifiers are:
• ANALYTICAL STUDY (ANST) - for methodology of detection or determination of the substance, or its
analysis; also for separation if the intent is analytical.
• BIOLOGICAL STUDY (BIOL) - for biochemical uses and for processes, properties, occurrence, and formation
in biological systems (including nonfossil by-products of living matter, food, etc.). Studies on the herbicidal,
pesticidal, and pharmaceutical use of the material are also placed in this subdivision.
• FORMATION, NONPREPARATIVE (FORM) - for the incidental formation of the substance in a nonpreparative
study (from v. 121 onward).
• MISCELLANEOUS (MSC) - studies not otherwise classifiable.
• OCCURRENCE (OCCU) - for natural occurrence (in other than biological systems).
• PREPARATION (PREP) - for synthesis, manufacture, incidental formation (other than biochemical), recovery,
separation, and purification.
• PROCESS (PROC) - for nonreactive treatment of the substance, nonpreparative removal of the substance, and
complex treatments of the substance (from v. 121 onward).
• PROPERTIES (PRP) - for physical and chemical properties and related non-reaction processes.
• REACTIONS (RACT) - for chemical changes that lead to products differing chemically from the starting
material, including nuclear interactions (other than simple scattering), corrosion, neutralization, enolization,
isomerization, and tautomerism.
• USES (USES) - for applications (other than biochemical), removal (in purification procedures), industrial
processing.
CAS Roles [9] in the CA and other Files
ROLES are CAS indexing terms assigned to every indexed substance and to controlled index terms for classes of
compounds. Roles were first applied to new online CA File records with v. 121 (July 1994). They
were then applied retrospectively to all CA File records by means of a computer algorithm. Since there are over 60
specific roles and 9 broad super roles, they substantially expand the indexing terms that were used prior to their
introduction. The role terms give a more precise link to the substance. For example, it is now possible to specify not
only that you want the preparation of the substance, but also that the preparation be a synthetic preparation, as
opposed to industrial manufacture. In the past, there was no distinction made in the use of the term "Preparation" in
such cases.
Searching the Registry File with a Chemical Name
The Registry File is the largest single source of chemical names in existence. It can be searched on the STN
command-language system by a trade or common name for a substance (CN), by its CAS Index Name (CN) in
inverted order, or by fragments of the CAS Index Name (CNS field). (See: Tips for Chemical Name Searching [10].)
Just as we had a Basic Index that is formed from subject words in a bibliographic database, there is also a basic index
for the Registry File when searched on STN. The BASIC INDEX of the Registry File includes both chemical name
fragments and molecular formula fragments. It may be necessary to follow certain protocols for special characters in
order to search for a chemical name. Greek characters, for example, are spelled out in their entirety with a period
before and after the Greek part of the name. An example of such a chemical name search in SciFinder Scholar is
below. Note that in the SciFinder Scholar system, the search will work with or without the periods around the
"alpha," but in STN command-language searching, the dots are mandatory.
SciFinder Explore by Substance Name Search for alpha-Methylbenzoin
(Reproduced with permission of CAS, a division of the American Chemical Society.)
SciFinder Record for alpha-Methylbenzoin
(Reproduced with permission of CAS, a division of the American Chemical Society.)
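The dotted form of a Greek letter that STN command-language searching requires can be produced mechanically. The Python sketch below is illustrative only: the function name and the partial list of letter names are assumptions, and it covers nothing beyond the period convention described above.

```python
import re

# Partial, illustrative list of spelled-out Greek letters.
GREEK_NAMES = ("alpha", "beta", "gamma", "delta", "epsilon", "omega")

# Match a letter name only when it stands alone: not inside a longer word
# and not already wrapped in periods.
_GREEK = re.compile(r"(?<![A-Za-z.])(" + "|".join(GREEK_NAMES) + r")(?![A-Za-z.])")

def dot_greek(name: str) -> str:
    """Wrap spelled-out Greek letters in periods, e.g. alpha -> .alpha."""
    return _GREEK.sub(r".\1.", name)

print(dot_greek("alpha-Methylbenzoin"))
```

Names that are already dotted pass through unchanged, and words that merely contain a letter name (such as "betaine") are left alone.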
Note that in SciFinder Scholar, you should not invert the name when searching a CA Index Name. For example,
entering Benzene, 1,4-dibromo will not work, but searching 1,4-dibromobenzene will.
Searching the Registry File and Printed CA Indexes with a Molecular Formula: The Hill System
The system most commonly used today for arranging molecular formulas in indexes is the HILL SYSTEM. The
Hill System covers both organic and inorganic compounds according to the following rules:
1. Sum individually all like atoms within the molecule.
2. If carbon is present, place it and the total number of C's first in the formula.
3. If both carbon and hydrogen are present, place hydrogen and the total number of H's second. Note that if carbon is
not present, rule 4 applies to the substance, and the H is placed in its regular position in the alphabet.
4. All other atoms in the molecule are arranged alphabetically. That means that for inorganic substances without
carbon, the arrangement is alphabetical.
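The four rules above translate directly into code. The following Python sketch assumes the atoms have already been summed (rule 1) into a dictionary of element symbol to count; the function name is chosen for illustration.

```python
def hill_formula(counts: dict[str, int]) -> str:
    """Build a Hill System formula from summed atom counts (rule 1 done by the caller)."""
    symbols = sorted(counts)                       # rule 4: alphabetical order
    if "C" in counts:
        # Rules 2 and 3: carbon first, then hydrogen, then the rest alphabetically.
        rest = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if "H" in counts else []) + rest
    # A count of 1 is left implicit, as in the index examples.
    return " ".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in symbols)

print(hill_formula({"H": 2, "S": 1, "O": 4}))          # sulfuric acid
print(hill_formula({"C": 8, "H": 5, "N": 1, "O": 2}))  # isatin
```

Because sulfuric acid contains no carbon, its hydrogen falls into plain alphabetical order, while isatin keeps carbon and hydrogen at the front.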
Within the index itself, the numbers of elements come into play. Here is an example of compounds arranged for a
Hill System Index:
Al6 Ca5 O14       C5 H8 O2
B2 O3             C8 H5 N O2
B2 Zr3            C15 H24 N2
Br H              C22 H24 F N3 O2
C Cl4             Ca O3 Ti
C H Cl3           Cl H
C H N O           H2 O4 S
C2 Ca             H4 Sn
C2 H4             O3 Pb Rb2
C2 H4 Br Cl       O5 P14 Zn7
C2 H5 Al Br2      Sn Zr4
Note that in the Registry File (including the SciFinder approach), the formulas may be searched with or without
spaces between the element symbols. They are put here for clarity. The Hill System gives rise to some formulas that
are quite different from those a chemist is used to seeing, e.g., H2O4S for sulfuric acid or BrH for hydrobromic acid.
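The ordering within a Hill System index can be imitated by comparing formulas element by element: first the symbol alphabetically, then the count numerically. The Python sketch below is an approximation inferred from the example arrangement above, not a statement of CAS's own sorting rules; the function name is illustrative.

```python
import re

def hill_sort_key(formula: str) -> list[tuple[str, int]]:
    """Split a Hill formula into (symbol, count) pairs for element-by-element comparison."""
    return [(symbol, int(count) if count else 1)
            for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)]

formulas = ["H2 O4 S", "C2 H4", "Br H", "C Cl4", "C15 H24 N2",
            "C H Cl3", "C8 H5 N O2", "Ca O3 Ti", "B2 O3"]
for formula in sorted(formulas, key=hill_sort_key):
    print(formula)
```

Comparing counts as numbers is what places C8 before C15, and comparing whole symbols is what places every C entry before Ca. The same key works whether or not the formula is written with spaces.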
The printed CA "Formula Indexes" do not have entries for the 600 or so qualified substances that have lots of
information written about them. Thus, we find in the Chemical Abstracts "Formula Index" from the 10th Collective
Index period (1977-81):
C8H5NO2
1H-Indole-2,3-dione [91-56-5].
See Chemical Substance Index
sodium salt [3486-31-5], 90: 6180p; 91: 157670v; 94: 209034z
This tells us that the printed CA "Chemical Substance Index" must be used for detailed information on isatin itself,
but it gives direct information that three documents dealt with the sodium salt of isatin during the period. When a
substance would have more than 20 entries in a 6-month volume index or more than 50 entries in the 5-year collective
"Formula Indexes," a "See" reference is made to the name of the substance in the "Chemical Substance Index". We
find in the "Formula Index" the abstract numbers for the sodium salt of isatin since there were relatively few
documents written about that compound during the 10th Collective Index period.
A chemical formula in the Hill System may have more than one substance with that formula. For a given formula,
isomers are arranged alphabetically by the CAS Index Name.
In the online molecular formula index of the Registry File (/MF), salts, addition compounds, and mixtures have the
molecular formulas for the components arranged separately, with ratios for salts and addition compounds specified
when known. If the ratios are unknown, a lower case "x" before the second formula or subsequent formulas is used,
e.g.,
C15 H24 N2 . 2 Cl H
C22 H24 F N3 O2 .x H2 O4 S
These are examples of the so-called DOT-DISCONNECTED FORMULAS. (See: Tips for Molecular Formula
Searching [11].)
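A dot-disconnected formula can be split into its components and ratios with a short parser. The Python sketch below is illustrative: the function name and the returned structure are assumptions, a missing multiplier is read as 1, and an unknown ratio ("x") is returned as None. Fractional multipliers are also accepted.

```python
import re
from fractions import Fraction

# Optional multiplier (x, a fraction, or an integer) followed by the component formula.
_COMPONENT = re.compile(r"^(?:(x|\d+/\d+|\d+)\s+)?(.+)$")

def parse_dot_disconnected(formula: str):
    """Return a list of (ratio, component formula); ratio is None when unknown."""
    components = []
    for part in formula.split("."):
        part = part.strip()
        if not part:
            continue                      # tolerate stray periods
        ratio_text, body = _COMPONENT.match(part).groups()
        if ratio_text is None:
            ratio = Fraction(1)           # no multiplier written
        elif ratio_text == "x":
            ratio = None                  # ratio unknown
        else:
            ratio = Fraction(ratio_text)
        components.append((ratio, body))
    return components

print(parse_dot_disconnected("C15 H24 N2 . 2 Cl H"))
print(parse_dot_disconnected("C22 H24 F N3 O2 .x H2 O4 S"))
```

Applied to the two examples above, the first yields the base with ratio 1 and Cl H with ratio 2, and the second yields H2 O4 S with an unknown ratio.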
Molecular Formulas of Types of Compounds in CA/STN
A. Salts.
Simple salts such as sodium chloride are treated as any other Hill Formula: ClNa.
1. Metal Salts of Complex Organic or Organometallic Acids
In general these substances have the molecular formula of the cation followed by the dot disconnect symbol (the
period) and a multiplier times the molecular formula of the anion.
For metal salts of organic acids, the metal replaces one or more hydrogens attached to N, O, P, As, Se, or Te in an
organic substance. The CAS structuring conventions treat these substances in the following manner:
• The organic portion is treated as a neutral molecule, including the acidic hydrogen atoms.
• The metal is viewed as a separate, unattached fragment.
• The ratio between the organic acid and the metal atom is expressed. (If unknown, the ratio is expressed as "x".)
The multiplier for the organic acid is always 1. For the metal, it reflects the oxidation state and may be a
fraction, e.g., C7 H6 O2 . 1/2 Cu
Example: C6 H8 O7 . 3 Na
1,2,3-Propanetricarboxylic acid, 2-hydroxy-, trisodium salt
CAS RN: 68-04-2
A search of the SciFinder Scholar product for the molecular formula yielded ten answers at the time of the search,
among them:
SciFinder Molecular Formula Answer: Trisodium Citrate
(Reproduced with permission of CAS, a division of the American Chemical Society.)
Other examples:
• Unknown ratio: C6 H8 O7 . x Na
• Mixed metal salt: C6 H8 O7 . Ca . Na
• Metal salt of an alcohol: C6 H6 O2 . 1/2 Ba
• Metal salt of a radical ion: C10 H8 . Na
Exceptions:
• Metal salts of two or more different acids have the hydrogens removed, and bonds are formed from the
heteroatoms of the acids to the metal.
• Metal salts of dithiocarbamates (and Se or Te analogs) are represented as N-C(=Q)-Q, where Q = S or Se.
• Likewise, metal salts of dithiophosphates are represented as R2P(=Q)-Q, where R = halide, halogenoid, or
carbon-containing substituent.
• Salts of coordination compounds, e.g., C7 H4 Cu O3 and C18 H18 O8 Zn.
Organometallic compounds in the Registry File are substances which have a carbon atom directly bonded to a metal
atom, e.g., phenyllithium: C6 H5 Li. Note, however, that carbonium ions and carbanions are generally found as
dot-disconnects in the Registry File.
Coordination compounds in the Registry File are substances in which an atom or group of atoms is bound to a
central metal atom by a pair of electrons supplied by the coordinate group and not by the central metal atom, e.g.,
metallocenes. These substances have the Class Identifier code CCS in the Registry File records.
B. Polymers.
Polymers are indicated with the molecular formula of the repeating unit(s) in parentheses to which is appended an
"x". The "x" indicates a repeating unit. For example, the molecular formula for the homopolymer of 1,3-butadiene is (C4H6)x. A search
for a polymer by molecular formula may retrieve variant forms of the substance, because the syndiotactic, isotactic,
graft or co-polymer will all have separate Registry Numbers.
Molecular Formulas in The Basic Index of the Registry File
The Registry File's Basic Index contains chemical name fragments and molecular formula fragments (including
molecular formulas for individual components of multi-component substances and single component substances).
Formula fragments searched in the Basic Index must be entered without spaces.
Element Information
In command-driven searching, it is possible to search for various information about the elements comprising a
chemical substance, such as:
• Element Symbol, indicating the presence of an element (/ELS), e.g., => S B/ELS and H/ELS
• Element Count, to specify the number of unique elements in a component or substance (/ELC or /ELC.SUB)
• Element Formula, the molecular formula of components without the numbers that depict the ratios (/ELF), e.g.,
=> S AL CO LA O/ELF
• Periodic Group, the column and row designations for elements, e.g., => S B6/PC or => S LNTH/PG
• Material Composition, when looking for alloys
There are many more options for such searching on the STN command-language system.
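The element count and element formula of a substance follow directly from its molecular formula. The Python sketch below shows how those two values could be derived; the function names are illustrative, and writing the symbols in upper case simply mirrors the /ELF example above rather than any documented STN requirement.

```python
import re

def element_symbols(formula: str) -> list[str]:
    """Unique element symbols in the order they appear in the formula."""
    symbols = [symbol for symbol, _ in re.findall(r"([A-Z][a-z]?)(\d*)", formula)]
    return list(dict.fromkeys(symbols))   # order-preserving de-duplication

def element_count(formula: str) -> int:
    """Number of unique elements, as used for an element-count style search."""
    return len(element_symbols(formula))

def element_formula(formula: str) -> str:
    """Molecular formula with the ratio numbers removed."""
    return " ".join(element_symbols(formula)).upper()

print(element_count("Al6 Ca5 O14"), element_formula("Al6 Ca5 O14"))
```

A value such as the one printed here could then be used to build a search term of the kind shown in the list above.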
Ring System Data and Ring Indexes
The Ring Identifier information (RID) lets you search a database for everything from the number of rings in a
substance to the Ring Formula (minus hydrogens). The Registry File now has much information about rings that can
be searched online, such as the Elemental Sequence for the Smallest Ring (/ESS), the number of rings in the ring
system (/NRRS), etc. These search techniques can be valuable in refining a substance search in the Registry File. See
the Registry File Database Summary Sheet [12] for more options.
The Ring Systems Handbook provides an easy way to find the Heading Parent name for ring compounds. This name
can then be used in the printed CA "Chemical Substance Index" or, for an online search, either the name or the
Registry Number can be used to retrieve the Registry File record. It is important to know that the compound found in
the Ring Systems Handbook may not actually exist. That is, there may be no information in the CA File on the
substance. When a new ring system is identified, the substituents are stripped off, and a new ring system entry placed
in the RSH.
The access to the entries in the Ring Systems Handbook is by name or ring analysis (and then by molecular formula
of the rings making up the compound, ignoring hydrogens). The main part of the set is arranged by the number of
rings comprising the compounds and the individual sizes of the smallest set of smallest rings. Thus, the number of
component rings, the sizes of those rings, and the elements comprising them are enough information to find a ring
compound. A section in the main body of the work might be labeled:
2 RINGS: 5,6 C4N-C6
We would find in the section an entry for 1H-Indole [120-72-9] with the molecular formula C8H7N and a
2-dimensional structural drawing of the molecule.
[Structure diagram: 1H-Indole]
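A section label of this kind can be assembled from the component rings of a compound. The Python sketch below is illustrative only: it assumes the smallest set of smallest rings has already been identified, lists the rings in order of increasing size, and writes each ring formula with carbon first and hydrogens ignored. The function names and the ordering convention are assumptions, not taken from the Ring Systems Handbook.

```python
from collections import Counter

def ring_formula(atoms: list[str]) -> str:
    """Formula of one ring, carbon first, other elements alphabetically, no hydrogens."""
    counts = Counter(atoms)
    symbols = sorted(counts, key=lambda s: (s != "C", s))
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in symbols)

def ring_analysis(rings: list[list[str]]) -> str:
    """Build a label such as '2 RINGS: 5,6 C4N-C6' from the ring atoms."""
    ordered = sorted(rings, key=len)      # assumed: smaller rings listed first
    sizes = ",".join(str(len(ring)) for ring in ordered)
    formulas = "-".join(ring_formula(ring) for ring in ordered)
    return f"{len(ordered)} RINGS: {sizes} {formulas}"

# Indole: a five-membered ring with one nitrogen fused to a benzene ring.
indole_rings = [["C"] * 6, ["C", "C", "C", "C", "N"]]
print(ring_analysis(indole_rings))
```

For indole this reproduces the section label quoted above.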
It would not be too difficult then to assign the proper Chemical Abstracts Index name for isatin: 1H-Indole-2,3-dione.
Chemical Abstracts includes an "Index of Ring Systems" with each Formula Index, beginning with the 7th Collective
Index period (1962-66).
Compound Class Identifiers
There are a number of other indexes that can be used in an online search of the Registry File, e.g., Compound Class
Identifiers (/CI).
Class Name                         Code
Alloy                              AYS
Coordination Compound              CCS
Registered Concept                 CTS
Generic Registration               GRS
Incompletely Defined Substance     IDS
Manually Registered Substance      MAN
Mineral                            MNS
Mixture                            MXS
Polymer                            PMS
Radical Ion                        RIS
Ring Parent                        RPS
An example of the use of the CI field in command-level searching is:
=> SEARCH PMS/CI (retrieves polymers)
Such searches are of use in combination with other Registry File searches in order to narrow an answer set. See the
Registry File Summary Sheet [12] for additional possibilities.
NLM's Online Chemical Dictionary Files, PubChem and ChemSpider
Databases such as the Registry File are referred to as ONLINE CHEMICAL DICTIONARY FILES. They exist to
help you identify substances, to gather like substances into a set, and to discover which files on the database vendor's
system have information on the substance(s).
In the past there was an online chemical dictionary file from the National Library of Medicine. Although not nearly
as large as the Registry File, NLM's CHEMLINE file contained over 1,360,000 records as of mid-1995. Work ceased
on the CHEMLINE file in 1998. NLM publishes Supplementary Concept Records (formerly, Supplementary
Chemical Records). For many years this was an annual printed compilation that contained all of the compound names
used in indexing records in the Medline system. See the record on this page [13] for a summary of the data fields
included in the Supplementary Concept Records. Various Medical Subject Heading (MeSH) files are available for
download [14].
A smaller NLM file is ChemIDplus [15], with nearly 380,000 compounds, over 263,000 of which have structure data.
There is also a ChemIDplus Lite [16] version for those who just need to do name or Registry Number searching and
do not want to use a plugin or applet. An important feature of the ChemIDplus file is the link to SuperList [17].
SuperList designates a collection of lists of chemical substances maintained by key federal and state government
regulatory agencies, as well as by scientific organizations concerned with health and environmental hazards of
chemical substances. ChemIDplus provides directory assistance to those lists. Searching the NLM files is
considerably cheaper than searching the CAS Registry file.
Unlike CAS, the National Library of Medicine has attempted to group compounds with related substances in their
index in a hierarchical fashion. From 1963 through 1995, a chemical was generally "treed" in two places: in one Tree
showing its chemical structure and in a second Tree under its function, or pharmacological action. The arrangement
of chemical headings in MeSH (Medical Subject Headings) has not changed, but NLM no longer puts all drugs
under the functional trees.
The NIH's PubChem [18] is a free database covering over 27 million unique substances. PubChem has numerous
search options, including the capability to search by InChI [19], the IUPAC International Chemical Identifier.
PubChem includes substance information, compound structures, and BioActivity data in three primary databases,
Pcsubstance, Pccompound, and PCBioAssay, respectively.
The RSC's ChemSpider [20] is also a free database containing around 25 million compounds from 400 data sources.
The easiest way to search in ChemSpider is to use a common name or tradename. For example, benzyl azide is a
versatile reaction intermediate. What information can I find about this compound in ChemSpider?
STEP 1 Go to www.chemspider.com. On the home page there is a search box; simply type the name of the
compound of interest and click Search. Alternatively, select the Search tab from the top toolbar and choose Simple
Search from the drop-down menu.
STEP 2 Look at the results. The default record view will give you the structure, SMILES, InChIKey, alternative
names, and synonyms.
Scroll down the record view to see more information. The record view comprises a number of info boxes which may
include a number of different tabs indicating the different pieces of information that are available.
In the Associated data sources box, for example, those data sources that are commercial vendors from whom you can
purchase the chemicals are indicated with a shopping cart. Other sources may include links to biological data,
toxicology data, physical properties, spectral data and safety data.
Scroll down the record to view all of the different sections of the page (if they aren't visible, click on the 'expand'
icon in the section heading to expand them).
There will be info boxes for links to patent information from SureChem and literature links providing access to RSC
journals, books, and databases. The Search Google Scholar link will enable you to expand a search into the wider
scientific literature based on the approved names and synonyms in ChemSpider.
Records may also have a link to reactions in ChemSpider SyntheticPages. You can view the full article in CS|SP at
http://cssp.chemspider.com
There is also a link to spectral data. This can be HNMR, CNMR, IR, or mass spectra. The spectra can be viewed in a
Java applet and can also be downloaded.
Beilstein and Gmelin
The factual databases Beilstein and Gmelin are organized a little differently. Structure searching is the most
appropriate way to find information in these sources on Reaxys [21]. Although both can be queried using chemical
name and formula searching, for the inorganic compounds in Gmelin, formula searching is actually the most
appropriate approach.
Searching by Name
In both Beilstein and Gmelin, there is a field "Chemical Name (CN)" containing the chemical names of the
substances in the databases. Select the field from the data structure or use it in advanced mode like cn=*searchterm*.
Truncation can be used left and right. It is advised to use the list (expand) function to look for different spellings of
the same name that might be found from different authors in different publications.
The field "Chemical Name Segment" contains the fragmented pieces of the field CN. Querying for "Indole" using
this field retrieves a list of compounds containing the term "Indole" in their chemical name.
While these two fields contain the names and name fragments of registered substances, the field "All Chemical
Names" also includes the names of solvents, derivatives, and other fields with chemical names, and thus allows a
broader approach to searches using chemical names.
Searching by Formula
Searching Beilstein by molecular formula can be very powerful, and there are several ways to do it.
The field Molecular Formula (MF) contains the exact molecular formula for single- and multi-fragment compounds. It is
calculated from the chemical structure in Hill order, with no charge or isotope information. For multi-fragment
compounds like salts, the molecular formulas corresponding to the individual fragments are separated from each
other by an asterisk and have normalized stoichiometric multipliers.
For the sodium salt of Isatin, the Molecular Formula accordingly is C8H4NO2*Na.
Positional isomers can be searched very effectively when the molecular formula is combined with a Lawson
Number (LN).
The field Linear Structure Formula (LSF) adds the option to explicitly include charges or isotope labels with the
exception of Deuterium and Tritium.
For the above-mentioned Isatin salt, this would be C8H4NO2(1-)*Na(1+).
The field "Search MF Range" allows searching for derivatives of a certain carbon skeleton or for ranges in the
molecular formula. Thus queries like "C(2-4) H(4-8)" or "C8 H7 *" are allowed using this field. Note that it is not
possible to use greater-than or less-than signs. If you want to require more than 3 oxygens to be present in the
resulting structures, use "O(4-99)".
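To make the meaning of a range query concrete, the following Python sketch tests whether a molecular formula satisfies one. It is not the Reaxys implementation: the function names are illustrative, elements not named in the query are assumed to be unconstrained, and the "*" wildcard is simply skipped.

```python
import re

def parse_counts(formula: str) -> dict[str, int]:
    """Element counts from a molecular formula such as 'C6 H8 O7'."""
    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def parse_range_query(query: str) -> dict[str, tuple[int, int]]:
    """Turn 'C(2-4) H(4-8)' or 'C8 H7 *' into {symbol: (low, high)}."""
    ranges = {}
    pattern = r"([A-Z][a-z]?)(?:\((\d+)-(\d+)\)|(\d*))"
    for symbol, low, high, exact in re.findall(pattern, query):
        if low:
            ranges[symbol] = (int(low), int(high))
        else:
            n = int(exact) if exact else 1
            ranges[symbol] = (n, n)       # an exact count is a one-value range
    return ranges

def in_range(formula: str, query: str) -> bool:
    """True if every element named in the query falls inside its range."""
    counts = parse_counts(formula)
    return all(low <= counts.get(symbol, 0) <= high
               for symbol, (low, high) in parse_range_query(query).items())

print(in_range("C6 H8 O7", "O(4-99)"))   # citric acid has seven oxygens
```

The open-ended upper bound is emulated here exactly as in the text, by choosing a generously large maximum such as 99.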
For Gmelin, searching by molecular formula is the method of choice, especially when it comes to inorganic
compounds.
Summary
Chemical nomenclature is an area of expertise claimed by few chemists today, but there are powerful search
capabilities in databases and printed reference works that make use of chemical names, both trivial and formal
names. On the other hand, all chemists use molecular formulas, and a system such as the Hill System for arranging
molecular formulas in an index provides a useful retrieval mechanism. Chemical Abstracts Service uses the Registry
Number to index documents in the CA database. Many tags have been developed to use with the Registry Numbers
for more effective searching in the CA databases. An increasingly popular search site is the PubChem database, and
the Beilstein and Gmelin databases are useful complements to the others.
CIIM Chemical Nomenclature
CIIM Link for further study
SIRCh Link for Chemical Name and Formula Searches
Problem Set on this topic [22]
References
[1] http://www.indiana.edu/~cheminfo/C471/chemall.html
[2] http://www.cas.org/ASSETS/2F6CF61DE9D843F9B5AF0A6B199E57BE/casregnumbersweb.pdf
[3] http://php.indiana.edu/~davisc/Abstract.htm
[4] http://www.iupac.org/publications/books/seriestitles/nomenclature.html
[5] http://www.genome.jp/dbget-bin/get_htext?ECtable+-f+T+w+D
[6] http://www.indiana.edu/~cheminfo/C471/stnfiles.html
[7] http://www.reaxys.com/
[8] http://www.commonchemistry.org/
[9] http://www.cas.org/ASSETS/EB85B919049C4E448DCF8D391788F0DD/casroles.pdf
[10] http://www.indiana.edu/~cheminfo/C471/cnametip.html
[11] http://www.indiana.edu/~cheminfo/C471/molftip.html
[12] http://info.cas.org/ONLINE/DBSS/registryss.html
[13] http://www.nlm.nih.gov/mesh/ctype.html
[14] http://www.nlm.nih.gov/mesh/filelist.html
[15] http://chem.sis.nlm.nih.gov/chemidplus/
[16] http://chem.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
[17] http://sis.nlm.nih.gov/chem/superlist.html
[18] http://pubchem.ncbi.nlm.nih.gov/
[19] http://en.wikipedia.org/wiki/International_Chemical_Identifier
[20] http://chemspider.com/
[21] http://www.reaxys.com
[22] http://www.indiana.edu/~cheminfo/C471/471ps3.html
Article Sources and Contributors
Chemical Information Sources/Chemical Name and Formula Searches  Source: http://en.wikibooks.org/w/index.php?oldid=2063862  Contributors: Adrignola, Avicennasis, Daviesje7, Gary
Dorman Wiggins
Image Sources, Licenses and Contributors
File:Isatine.svg  Source: http://en.wikibooks.org/w/index.php?title=File:Isatine.svg  License: Public Domain  Contributors: Dschanz
License
Creative Commons Attribution-Share Alike 3.0 Unported
//creativecommons.org/licenses/by-sa/3.0/