
Chemical Information Sources/Subject Searches

Introduction
Almost all abstracting and indexing services, not to mention many other secondary and primary works, have subject
indexes. In this chapter we will look closely at the subject indexes for some of the major works already covered, as
well as note the existence of specialized abstracting and indexing services devoted to a particular document type and
full-text databases of primary and other literature types. Discussion of the type of subject search that uses the name
of a specific chemical compound is deferred to a later topic, although words that stand for classes of compounds are
discussed here.
The searches dealt with here are word searches. We must often find the right word(s) or group of words (phrases) to
pull needed information from a given reference tool. Such searches cover techniques, processes, types of reactions,
equipment, etc. The searcher has to be aware of variant spellings, the use of initialisms and acronyms, synonyms,
and other complicating factors in a subject or topic search. In addition, the interpretation that the search system gives
to the form in which the search statement is input is critical. For example, does the search system interpret two
adjacent words as a phrase that must always have the words in that order? Or does it assume that either of those
words could be present in a record in order to be a valid hit?
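The difference between these two interpretations can be made concrete with a toy matcher. The sketch below is illustrative only. It assumes that records are plain strings split into lowercase words, and the function names are hypothetical. Real search systems tokenize and index text in their own ways.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_phrase(record: str, query: str) -> bool:
    """Phrase interpretation: the query words must be adjacent and in order."""
    words, terms = tokenize(record), tokenize(query)
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))

def matches_any_word(record: str, query: str) -> bool:
    """OR interpretation: any one of the query words is enough for a hit."""
    return bool(set(tokenize(record)) & set(tokenize(query)))

records = [
    "Photoelectron spectroscopy of oxide surfaces",
    "Spectroscopy of photoelectron emission from metals",
    "Mass spectroscopy of peptides",
]
query = "photoelectron spectroscopy"
print([matches_phrase(r, query) for r in records])    # [True, False, False]
print([matches_any_word(r, query) for r in records])  # [True, True, True]
```

Under the OR interpretation, the mass spectrometry record becomes a false drop. Under the phrase interpretation, the second record, which may be relevant, is missed.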
A fundamental question in conducting a subject search is whether all possible words, including synonyms,
acronyms, abbreviations, etc. should be used in a subject search or whether the search can be conducted using a set
of preferred terms selected by the indexers of the documents. As computers have become more and more powerful,
the techniques of FULL-TEXT SEARCHING have become popular, with every word in a document being a
potential subject term. Unfortunately, the number of false drops yielded in this type of UNCONTROLLED
VOCABULARY search can be quite voluminous. Therefore, searching with terms selected from the
CONTROLLED VOCABULARY of a THESAURUS or other subject term authority list is often preferable. One
example of such a tool is the Library of Congress Subject Headings [1], used by many academic libraries. Another is
the Medical Subject Headings (MeSH) [2] list that is used with the National Library of Medicine's Medline database.
Another NLM effort that is even broader in its scope is the development of a Unified Medical Language System [3].
Included in that project is the UMLS Metathesaurus [4]. Chemical Abstracts Service uses the Index Guide to control
search terms in the printed product, and the CA Lexicon on the STN system (see STNotes #25 [5]) shows the
underlying structure of the CAS vocabulary control system.
Most keyword searches, such as those in Science Citation Index, impose on the searcher the burden of selecting
alternate names, acronyms, etc. for the concept of interest when performing the subject search. For example,
Electron Spectroscopy for Chemical Analysis (ESCA) and X-Ray Photoelectron Spectroscopy (XPS) are both names
for the same technique. Therefore, a search for all references to the technique in a keyword subject index would
force the searcher to use both ESCA and XPS in the search strategy.
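Because a keyword index leaves the choice of synonyms to the searcher, it helps to keep each concept's alternate names in a list and to generate the OR statement from that list. The sketch below is a minimal illustration. The function name and the quoted-phrase syntax are assumptions, since each search system has its own Boolean and phrase conventions.

```python
def build_or_query(synonyms: list[str]) -> str:
    """Join alternate names for one concept into a single OR expression.

    Multi-word names are quoted so that a system which treats quotes as
    phrase markers keeps the words together (an assumed syntax).
    """
    parts = [f'"{s}"' if " " in s else s for s in synonyms]
    return "(" + " OR ".join(parts) + ")"

# Alternate names for the technique discussed above.
esca_names = [
    "ESCA",
    "XPS",
    "electron spectroscopy for chemical analysis",
    "x-ray photoelectron spectroscopy",
]
print(build_or_query(esca_names))
# (ESCA OR XPS OR "electron spectroscopy for chemical analysis" OR "x-ray photoelectron spectroscopy")
```

If any one name is left out of the list, recall drops without any warning, which is the burden that a controlled vocabulary is meant to remove.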
The distinction between uncontrolled (keyword) searching and searching using controlled vocabulary is important
and is the main point of this lesson, but that distinction is blurred in a tool such as SciFinder. The searcher simply
types into the Research Topic search window the natural language expression that defines the search, without trying
to insert Boolean search terms. Nor does the searcher enter truncation symbols. Instead, the SciFinder search algorithm has
some built-in intelligence to look for relevant word forms in the search. For instance, the search system automatically
searches for both singular and plural subject words.
Let's see an example of a search on SciFinder for the analytical technique "Electron Spectroscopy for Chemical
Analysis (ESCA)," including results from both the CAPlus and Medline databases. At the time it was run, the search
as entered found 4395 references where the two concepts "electron spectroscopy" and "chemical analysis" were
closely associated with each other and only 582 where the phrase as entered was found. In this case, let's repeat the
search using the acronym for the analytical technique (ESCA) and also use a synonymous acronym, XPS. (The
technique is also known as X-Ray Photoelectron Spectroscopy.) We have the option of entering synonymous words
in parentheses, following a term or phrase. Thus, entering the research topic search on SciFinder Scholar as:
XPS (ESCA)
would imply to the system that you are looking for synonymous terms (an OR search). This search found
considerably more documents: 114,511 at the time of the search on October 3, 2004. Unfortunately, many of the
35,609 records pulled by the ESCA part of the search are false drops that match the word "escape". Entering ESCA
by itself pulls 7516 records with the term "as entered," and it appears that all but the oldest (a 1918 record) are
relevant. Thus, the technique of entering synonyms in parentheses must be used with caution on SciFinder.
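The source does not explain how SciFinder matched ESCA to "escape". One plausible mechanism is prefix (truncation-style) matching. The toy comparison below assumes that mechanism and shows how it produces false drops that exact whole-word matching avoids. It does not describe SciFinder's actual algorithm, and the sample titles are invented.

```python
import re

def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_hits(records: list[str], term: str) -> list[str]:
    """Records that contain the term as a whole word."""
    t = term.lower()
    return [r for r in records if t in words(r)]

def prefix_hits(records: list[str], term: str) -> list[str]:
    """Records that contain any word merely starting with the term."""
    t = term.lower()
    return [r for r in records if any(w.startswith(t) for w in words(r))]

records = [
    "ESCA study of passivated steel",
    "Escape of volatile solvents during drying",
    "Hydrogen escape from planetary atmospheres",
]
print(len(exact_hits(records, "ESCA")))   # 1
print(len(prefix_hits(records, "ESCA")))  # 3
```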
Keyword Searches
Let us restrict the phrase KEYWORD SEARCH to the type of uncontrolled vocabulary searching that is done when
the terms are not selected from an authoritative subject list. Google and most other Web search engine searches are
keyword searches (with some sophisticated backend analysis of the web sites coming into play to produce the final
result list). Keyword indexes are often computer-produced indexes that result in every significant word in the
document (or in certain fields of the document) becoming a KEYWORD. Such indexes exist in the weekly printed
issues of Chemical Abstracts and in the Science Citation Index in its "Permuterm Subject Index". The same is true of
the Web of Science subject searches and searches on ingenta. However, Science Citation Index has for a number of
years included the capability to enhance keyword searches using its KeyWords Plus feature.
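The Permuterm Subject Index pairs each significant title word with every other significant word in the same title, so that a searcher can look up two terms together. A toy version follows. The stop-word list is an assumption chosen for the example and is not the list that ISI actually used.

```python
import re
from itertools import permutations

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "with"}

def significant_words(title: str) -> list[str]:
    """Lowercase title words minus stop words, keeping first occurrences only."""
    seen: list[str] = []
    for w in re.findall(r"[a-z0-9-]+", title.lower()):
        if w not in STOP_WORDS and w not in seen:
            seen.append(w)
    return seen

def permuterm_entries(title: str) -> list[tuple[str, str]]:
    """Every ordered (primary term, co-term) pair drawn from one title."""
    return sorted(permutations(significant_words(title), 2))

for primary, co_term in permuterm_entries("XPS of oxide films on copper"):
    print(f"{primary:8} {co_term}")
```

With four significant words, the title yields twelve ordered pairs. This is why permuted indexes grow quickly and still contain only words that the authors happened to use.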
SCI generates KeyWords Plus terms for many articles. KeyWords Plus are words or phrases that frequently appear
in the titles of an article's references, but do not necessarily appear in the title or abstract of the article being indexed.
SCI also uses the keywords that authors sometimes provide in their articles, chosen as the terms they feel best represent the content of
the paper. Thus, KeyWords Plus may be present for articles that have no author keywords and may include important
terms not listed among the title, abstract, or author keywords. All of these keywords are contained in the SCI record
and are searchable.
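The idea behind KeyWords Plus, which harvests terms from the titles of an article's references, can be approximated with a frequency count. The sketch below is a hypothetical approximation. It uses single words only, an assumed minimum count of two, and an assumed stop-word list. The producer's actual selection rules are not described here, and the titles are invented.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "with"}  # assumed

def title_words(title: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9-]+", title.lower()) if w not in STOP_WORDS}

def keywords_plus_like(article_title: str, reference_titles: list[str],
                       min_count: int = 2) -> list[str]:
    """Words that recur in reference titles but are absent from the article title."""
    counts = Counter()
    for ref in reference_titles:
        counts.update(title_words(ref))  # each word counted once per reference
    own = title_words(article_title)
    return sorted(w for w, c in counts.items() if c >= min_count and w not in own)

refs = [
    "Surface analysis by x-ray photoelectron spectroscopy",
    "Photoelectron spectroscopy of copper oxides",
    "Oxidation of copper surfaces",
]
print(keywords_plus_like("ESCA of corroded metals", refs))
# ['copper', 'photoelectron', 'spectroscopy']
```

In this example, the article title contains neither "photoelectron" nor "spectroscopy", yet a search on those words would retrieve the article through the derived terms.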
Controlled Vocabulary Indexes: Library of Congress Subject Headings and Classification
Library of Congress subject headings are commonly used in college and research libraries, and LC breaks the broad
area of chemistry into sub-areas. Of course, one option to find a relevant book, journal, or database owned or leased
by a given library is simply to browse an appropriate section of the library's stacks, using the following table as a
roadmap in a library that uses the Library of Congress classification system.
MAJOR DIVISIONS OF THE LIBRARY OF CONGRESS CLASSIFICATION SCHEDULE FOR CHEMISTRY
Subject                               LC Range
Chemistry (General)                   QD 1-65
Analytical Chemistry                  QD 71-142
Inorganic Chemistry                   QD 146-197
Organic Chemistry                     QD 241-441
Physical and Theoretical Chemistry    QD 450-801
Crystallography                       QD 901-999
For the LC classification numbers of many chemical topics, consult this list of chemistry terms linked to LC class
numbers [6].
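The LC table can be used as a simple lookup for the shelf area to browse. The sketch below transcribes the ranges from that table and compares only the whole-number part of a call number. The call numbers in the example are hypothetical, and numbers that fall in gaps the table does not list (for example QD 66-70) return None.

```python
import re
from typing import Optional

# Ranges transcribed from the LC chemistry table: (first, last, subject).
QD_DIVISIONS = [
    (1, 65, "Chemistry (General)"),
    (71, 142, "Analytical Chemistry"),
    (146, 197, "Inorganic Chemistry"),
    (241, 441, "Organic Chemistry"),
    (450, 801, "Physical and Theoretical Chemistry"),
    (901, 999, "Crystallography"),
]

def qd_division(call_number: str) -> Optional[str]:
    """Map a QD call number to its chemistry division, or None if unlisted."""
    match = re.match(r"\s*QD\s*(\d+)", call_number, re.IGNORECASE)
    if match is None:
        return None  # not a QD (chemistry) call number
    number = int(match.group(1))
    for first, last, subject in QD_DIVISIONS:
        if first <= number <= last:
            return subject
    return None

print(qd_division("QD 262 .A1"))  # Organic Chemistry
print(qd_division("QD 75.2"))     # Analytical Chemistry
```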
Many library OPACs allow two methods of searching for a subject:
• keyword (any subject word you can think of)
• prescribed subject word or concept.
The latter approach utilizes a controlled vocabulary, in this case, the Library of Congress subject headings. Those
may be searched in a library's OPAC, such as the Indiana University Libraries OPAC, IUCAT [7]. The broad LC
subject headings can often be further defined by topic or format of the material being indexed, so to find appropriate
works one could search phrases such as:
chemistry inorganic encyclopedias
OR
chemistry analytic dictionaries.
Controlled Vocabulary Indexes: Chemical Abstracts "Index Guide"
One of the virtues of a keyword subject index is that the index terms reflect the current, ever-changing vocabulary of
science. As soon as a new name for a concept, technique, etc., is used in a document, it could become an indexing
term. Controlled vocabulary lists, on the other hand, are slower to adapt to changes in scientific terminology, but
their greatest benefit is that they guide you to the preferred term for the concept. Hence, the searcher need only
identify the preferred indexing term to find documents of interest.
The printed tool that controls the vocabulary in the Chemical Abstracts six-month volume and five-year collective
General Subject and Chemical Substance Indexes is the INDEX GUIDE. For example, looking in the "E" section of
the "Index Guide" for ESCA directs you to the "P" section of the actual "General Subject Indexes":
ESCA (electron spectroscopy for chemical analysis)
    See Photoelectric emission
        x-ray
    See Photoelectron spectroscopy
        x-ray
Likewise, looking in the "X" section of the Index Guide for XPS leads to the same preferred phrases:
XPS (x-ray photoelectron spectroscopy)
    See Photoelectric emission
        x-ray
    See Photoelectron spectroscopy
        x-ray
Thus, by using the Index Guide, the searcher would discover that documents on this topic can be found in the "P"
section of the "General Subject Index" to Chemical Abstracts. It is important to use the CA "Index Guide" before
using the "General Subject Index" because there are no "see" references in the "General Subject Index" itself.
Furthermore, each five-year collective index period has its own "Index Guide". There is a guide to Hierarchies of
General Subject Headings [8] to assist in selecting terms.
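The "see" references in the Index Guide act as a mapping from entry terms to preferred headings. The sketch below models a tiny, hypothetical slice of that mapping based on the ESCA and XPS entries, with each heading and its subheading joined by a comma. As the text advises, a term with no entry is assumed to be a preferred term already.

```python
# Hypothetical slice of an Index Guide, modeled on the ESCA and XPS entries;
# each heading and its subheading are joined with a comma.
INDEX_GUIDE = {
    "esca": ["Photoelectric emission, x-ray", "Photoelectron spectroscopy, x-ray"],
    "xps": ["Photoelectric emission, x-ray", "Photoelectron spectroscopy, x-ray"],
}

def preferred_terms(term: str) -> list[str]:
    """Follow 'See' references to preferred General Subject Index headings.

    A term with no Index Guide entry is returned unchanged, on the
    assumption that it is already a preferred term.
    """
    return INDEX_GUIDE.get(term.strip().lower(), [term])

print(preferred_terms("ESCA") == preferred_terms("XPS"))  # True
print(preferred_terms("Catalysts"))                       # ['Catalysts']
```

Because each collective index period has its own Index Guide, a fuller version would keep a separate table for each period.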
Chemical Abstracts Printed Subject Indexes and CA File Subject Searches vs. SciFinder Subject Searches
Prior to 1972, there were five- and ten-year Subject Indexes to Chemical Abstracts. Beginning with the 9th
Collective Index period for 1972-76, the chemical name index entries for single chemical substances were put into a
new work, the CHEMICAL SUBSTANCE INDEX. Everything else, including names for classes of substances [9]
(e.g., ethers), went into the GENERAL SUBJECT INDEX. Thus, terms referring to classes of
compounds, reactions, processes, equipment, or plant and animal species should be searched in the "General Subject
Index" after the proper term or phrase has been found in the "Index Guide". Another way of finding the proper
General Subject Index terms for recent CA entries is to utilize the CA Lexicon on STN. The 15th Collective Index
period refers to the years 2002-2006. You must keep in mind that the terminology rules may change from one
collective index period to another. For example, the 14th CI Period moved significantly toward the current
terminology in various fields, preferring "DNA" to the previous "Deoxyribonucleic acids" and "Drugs" to
"Pharmaceuticals". From 2007, CAS no longer categorizes information by collective index periods, so the new CA
index names no longer have a "CI" label. It is important to check the "Index Guide" that corresponds to the period
you are searching in order to be sure of finding the correct term for use in the "General Subject Index".
Not every preferred term or phrase is found in the "Index Guide," and if you do not find a listing there, assume that
you have chosen the correct preferred term and look in the appropriate section of the "General Subject Index".
Always be aware that preferred terms may change when the boundaries of the Collective Index periods are crossed.
Look at a sample record from the CA Student Edition on OCLC [10], paying particular attention to the index terms
and the use of abbreviations.
For most online commercial bibliographic databases, the database vendors will define a default subject index
(BASIC INDEX) in which subject words are searched. In the CA File on the STN system, the Basic Index contains
subject words from the titles, keywords, abstracts, and controlled vocabulary of the documents (and so-called TEXT
MODIFICATIONS of the controlled vocabulary entries), plus CAS Registry Numbers used to index the documents.
The vendors will list in the database summary sheets [11] exactly what types of terms are included in a Basic Index
search.
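A Basic Index is in effect the union of several fields. The sketch below assumes a made-up record stored as a dictionary with hypothetical field names (these are not STN field codes). It shows how a single word can be found through the keyword field even when it is absent from the title, and how a CAS Registry Number (630-08-0, carbon monoxide) becomes a searchable term.

```python
import re

# Assumed field names standing in for the Basic Index components listed above.
BASIC_INDEX_FIELDS = ("title", "keywords", "abstract", "index_terms", "registry_numbers")

def basic_index_terms(record: dict[str, str]) -> set[str]:
    """Collect searchable words from every Basic Index field of one record."""
    terms: set[str] = set()
    for field in BASIC_INDEX_FIELDS:
        terms.update(re.findall(r"[a-z0-9-]+", record.get(field, "").lower()))
    return terms

record = {  # made-up record for illustration
    "title": "Adsorption on copper-silica catalysts",
    "keywords": "carbon monoxide adsorption copper catalyst",
    "abstract": "Nitric oxide and water were coadsorbed with carbon monoxide.",
    "index_terms": "Adsorbed substances",
    "registry_numbers": "630-08-0",
}
terms = basic_index_terms(record)
print("monoxide" in terms)                    # True
print("630-08-0" in terms)                    # True
print("monoxide" in record["title"].lower())  # False: a title-only search would miss it
```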
As seen in the sample record from the CA Student edition, the text modifications to the controlled vocabulary terms
were sometimes difficult to interpret, e.g., "(intramol., of silyloxytetradecatrienoate and silyloxytetradecatrienal,
stereochem. of)". So beginning in October 1994, CAS introduced a format that is easier to read.
Old style:
    Adsorbed substances
        (carbon monoxide and water and nitric oxide, on copper-silica catalysts, reactions of)
New style:
    Adsorbed substances
        (adsorption and reactions of carbon monoxide and water and nitric oxide on copper-silica catalysts)
As noted above, the SciFinder topic search will do some behind-the-scenes work to find appropriate terms to include
in a search, so people who use that search tool do not have to worry as much about controlled or uncontrolled
vocabulary when they perform a research topic search. However, with some caution, as noted above, you may use
synonyms in parentheses next to a related concept, for example, ESCA (XPS).
Section Codes for Online Searches
Since the information in Chemical Abstracts is classified into 80 major subject sections [12], the section numbers and
codes can be used on STN and Dialog to limit a subject search. For example, works dealing primarily with enzymes
are found in sections 3 and 7 of the weekly Chemical Abstracts. Documents are assigned to the 80 subject sections,
which are grouped into the following broad categories:
Category                                        Search Code    Section Numbers
Biochemistry                                    BIO/CC         1-20
Organic Chemistry                               ORG/CC         21-34
Macromolecular Chemistry                        MAC/CC         35-46
Applied Chemistry & Chemical Engineering        APP/CC         47-64
Physical, Inorganic, & Analytical Chemistry     PIA/CC         65-80
Thus, including either of the following in an online search on STN:
=> S L4 AND (3 OR 7)/CC
or
=> S L4 AND BIO/CC
would have the effect of limiting the retrieved documents in answer set L4 to those dealing with enzymes (found in
sections 3 or 7 of the printed CA) in the first case, and those of a biochemical nature found anywhere in sections 1-20
of the printed product in the second case.
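The effect of these two limits can be modeled as filters on an answer set. The sketch below is not how STN implements /CC. The record layout, the "an" key, and the answer set are made up, and a record may carry more than one section number because references are often assigned to several sections.

```python
# Broad categories from the section table: code -> section numbers.
SECTION_CODES = {
    "BIO": range(1, 21),
    "ORG": range(21, 35),
    "MAC": range(35, 47),
    "APP": range(47, 65),
    "PIA": range(65, 81),
}

def limit_by_sections(answer_set: list[dict], sections: set[int]) -> list[dict]:
    """Keep records assigned to any of the given sections, like (3 OR 7)/CC."""
    return [r for r in answer_set if sections & set(r["sections"])]

def limit_by_code(answer_set: list[dict], code: str) -> list[dict]:
    """Keep records with at least one section in a broad category, like BIO/CC."""
    return limit_by_sections(answer_set, set(SECTION_CODES[code.upper()]))

# Hypothetical answer set L4.
l4 = [
    {"an": 1, "sections": [7]},
    {"an": 2, "sections": [9, 63]},
    {"an": 3, "sections": [67]},
]
print([r["an"] for r in limit_by_sections(l4, {3, 7})])  # [1]
print([r["an"] for r in limit_by_code(l4, "BIO")])       # [1, 2]
```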
Refining Searches on SciFinder
SciFinder searches can be refined by many other options.
[Figure: SciFinder refinement options. Reproduced with permission of CAS, a division of the American Chemical Society.]
Similar refinements are possible with Web of Science and other database searches.
In 2010, chemistry librarians Chuck Huber and Ben Wagner gave the following useful guidelines on CHMINF-L
(slightly edited in the "mashup" below).
Some of the options in SciFinder can be effective in eliminating Medline references automatically.
1) CA Section Title [12] - has its origins in the original print Chemical Abstracts, which appeared in 80 sections to help
people browse. These are very broad categories. Note that the definitions and exact title of the sections changed a
number of times over the years, which explains the variations you will see when you do an Analysis. Note that doing
this analysis automatically discards Medline records (with no warning message), as they of course don't have CA
Section Titles assigned.
2) Index Term - analyzes the controlled vocabulary of both CAPLUS and MEDLINE, i.e. subject headings, but not
the chemical substance indexing. It does not search the Supplementary Terms.
3) CA Concept Headings - this analyzes CA "main heading" controlled vocabulary/index terms used in the old
General Subject Index in the print world, i.e., this excludes chemical substance indexing. These headings appear in
the CONCEPT column (the header box, not the detailed text modifier info) in the SciFinder record. This analysis
excludes MEDLINE records, again without a warning message. If you are searching a set that has only CA
references, this analysis appears to be identical to the Index Term analysis.
4) Supplementary Terms - Originally Supplementary Terms contained single terms from the CA keyword phrases,
which are (or were) indexing terms used to prepare quick indexes to each issue of printed CA. Keywords reflect the
content of the title and the abstract, using vocabulary found in the original document. According to page V-19 of the
May 1996 CA files database description manual, the singular form of words is preferred and generally used. For
CAPLUS, it analyzes the ST field. For MEDLINE, it analyzes the title field. Hence, MEDLINE records are not
excluded from this analysis.
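The silent loss of MEDLINE records described in items 1 and 3 follows from how a field-based tally works: a record with no value in the analyzed field has nothing to be counted under. The sketch below assumes a hypothetical record layout with illustrative section labels. It models the behavior that Huber and Wagner describe and is not SciFinder's code.

```python
from collections import Counter

def analyze(records: list[dict], field: str) -> tuple[Counter, int]:
    """Tally the values of one field across an answer set.

    Returns the tally and the number of records with no value in the field;
    those records cannot be selected from the analysis at all.
    """
    tally: Counter = Counter()
    missing = 0
    for rec in records:
        values = rec.get(field)
        if not values:
            missing += 1
            continue
        tally.update(set(values))  # count each value once per record
    return tally, missing

# Hypothetical mixed answer set: the MEDLINE record carries no CA section.
records = [
    {"db": "CAplus", "section": ["Catalysis"],
     "index_terms": ["Catalysts", "Photoelectron spectroscopy"]},
    {"db": "CAplus", "section": ["Surface Chemistry and Colloids"],
     "index_terms": ["Photoelectron spectroscopy"]},
    {"db": "MEDLINE", "index_terms": ["Copper"]},
]
sections, lost = analyze(records, "section")
print(lost)                        # 1 (the MEDLINE record)
terms, lost = analyze(records, "index_terms")
print(terms.most_common(1), lost)  # [('Photoelectron spectroscopy', 2)] 0
```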
Here are some hints for how these can be used in SciFinder subject searches.
1) CA Section Title - assuming you do not care about MEDLINE records in the answer set, the CA Section Title
limitation will help focus on a very broad category such as Enzymes or Biochemical Genetics or Mammalian
Hormones. This is useful when one wants categories too broad to be defined by keywords or to eliminate noise from
a disparate category. It can also be useful for large sets where an index term analysis is overwhelming. Be sure to
move far enough down in the Analyze listing to pick up older variant section titles for the same section. References
often are assigned to more than one section, so care is needed, since it is unreasonable to expect every single reference
on enzymes, regardless of context, to be in the Enzymes section. When you select CA Section Titles, you are
making the assumption that you are selecting references where the major emphasis of the article (not unlike the
asterisked headings in Medline) is related to the section. Thus, CA Section Title is useful when you want to
distinguish between two concepts with the same name, but radically different areas. For example, you search
"plasma" and want to separate the stuff in your blood from the stuff in stars. Perhaps you want to home in on major
concepts, e.g., when you are looking for uses of a particular type of catalyst. If you narrow to the papers placed in the
Catalysis section, those papers would presumably be treating the catalytic role as a major rather than minor topic.
2) Index Term - This has the advantage of keeping MEDLINE records. It is useful when you get to a point in the
search where you have put in all the concepts you can think of and all the limits that you feel are safe, but still have
too many references to comfortably browse. The Analysis by index terms is the perfect solution showing us what is
in the set when we don't know exactly what we want. It generates ideas as to what facets of the set we want to look
at. Index Term is a dependable way to identify key terminology and/or more tightly focus a search on your topic.
One problem with Analyze on Index Terms is that sometimes an index term which you would like to home in on is
buried in the lower levels of both the sort by rank and the alphabetical sort. By using Categorize, you can work your
way down hierarchically to the set of terms you want, and within the smaller subset of terms in the final Categorize
column, you can find the terms you're looking for. However, Categorize doesn't run for really large answer sets.
The example below shows the results of analyzing the 11,126 records from the XPS (ESCA) search found by first
limiting the search to the CAplus database, then limiting to the period 2003- (performed on October 3, 2004). Once
the analysis has been done, it is possible to select terms of interest simply by checking the boxes and getting the
results.
[Figure: Analysis by Index Term of the XPS (ESCA) answer set. Reproduced with permission of CAS, a division of the American Chemical Society.]
3) CA Concept Headings - I have little use for this option since it basically performs an index term analysis. The
only use I can think of is where I have a set of both CAPlus and MEDLINE records and want to simultaneously
eliminate the MEDLINE records while looking at the CA indexing.
4) Supplementary Terms - Especially if going after a very new, specific, or unusual topic, it would be an extra
precaution to check Supplementary Terms to make sure that an index term analysis had not missed some important
records. This is a way to do a title term search in MEDLINE, something that otherwise could only be done with the
Explore References: Journal search screen. It is a good idea to first do an Analyze by Index Term and use
Supplementary Terms as a double check. You would likely seldom do an ST analysis alone. Another use of
Supplementary Terms is when SciFinder "over-truncates". For example, "alcoholysis" gets truncated to "alcohol"
which yields a huge number of false drops. But if you analyze by supplementary terms, you can pick out the papers
in which the desired term appears untruncated in that field.
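The over-truncation problem and its Supplementary Terms remedy can be sketched in a few lines. This sketch assumes that truncation behaves like prefix matching on title words and that each record carries a hypothetical "st" list. SciFinder's actual word-form handling is not documented here, and the records are invented.

```python
import re

def truncated_hits(records: list[dict], stem: str) -> list[dict]:
    """Over-truncated search: any title word beginning with the stem is a hit."""
    s = stem.lower()
    return [r for r in records
            if any(w.startswith(s) for w in re.findall(r"[a-z]+", r["title"].lower()))]

def exact_supplementary(records: list[dict], term: str) -> list[dict]:
    """Keep only records whose supplementary terms include the term exactly."""
    t = term.lower()
    return [r for r in records if t in (s.lower() for s in r.get("st", []))]

records = [
    {"title": "Alcoholysis of vegetable oils", "st": ["alcoholysis", "vegetable oil"]},
    {"title": "Alcohol dehydrogenase kinetics", "st": ["alcohol dehydrogenase", "kinetics"]},
    {"title": "Alcoholic fermentation by yeast", "st": ["fermentation", "yeast"]},
]
hits = truncated_hits(records, "alcohol")
print(len(hits))  # 3
print([r["title"] for r in exact_supplementary(hits, "alcoholysis")])
# ['Alcoholysis of vegetable oils']
```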
Specialized Abstracting and Indexing Services for Subjects or Document Types
There are many specialized abstracting or indexing services that cover either a subset of chemistry, e.g., Analytical
Abstracts [13], or a particular format, e.g., ProQuest's Dissertation Abstracts International and its online dissertation
services [14]. Many of the techniques for subject searching discussed in this chapter are applicable to those works,
but acquainting yourself with the guides, database summary sheets, and other user aids for any tools you choose to
search is a very good idea.
Full Text Databases
Special techniques, particularly the use of proximity operators, are critical to success in searching text databases.
Electronic primary journal databases are now widely available on the Web. American Chemical Society journals [15]
can be searched by subject on the Web only by words in the article titles or in the full text of the articles. More
sophisticated searching is reserved for the Chemical Abstracts database and a link through CAS's ChemPort [16]
service to the articles themselves. The ACS Electronic Supporting Information (formerly called Supplementary
Material), containing more detailed data and other supplements not found in the printed journals, is also available to
subscribers of ACS journals on the ACS Publications Web site. Links to the Supporting Information can be found in
the table of contents for those issues that include such data or linked to the HTML version of the articles themselves.
Elsevier Science makes available on the Web a search engine named Scirus [17] that covers both Elsevier journals
and Web resources.
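Proximity operators, mentioned at the start of this section, let a searcher require that two words occur near each other without demanding an exact phrase. The sketch below assumes two hypothetical operators: "within n words, in either order" and "within n words, in order". Each search system defines its own operator names and counting rules.

```python
import re

def within(text: str, a: str, b: str, n: int, ordered: bool = False) -> bool:
    """True if words a and b occur with at most n intervening words.

    ordered=True additionally requires a to come before b. These semantics
    are an assumption for illustration only.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    for i in pos_a:
        for j in pos_b:
            gap = j - i if ordered else abs(j - i)
            if 1 <= gap <= n + 1:
                return True
    return False

text = "Spectra of the oxide were recorded by photoelectron spectroscopy at 300 K"
print(within(text, "photoelectron", "spectroscopy", 0, ordered=True))  # True (adjacent)
print(within(text, "oxide", "spectroscopy", 3))                        # False
print(within(text, "oxide", "spectroscopy", 5))                        # True
```

In full text, "oxide" and "spectroscopy" may be far apart in an irrelevant article, so a tight proximity limit trades some recall for precision.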
Summary
Depending on the database or printed reference tool chosen for a subject search, the user may simply enter a topic in
natural language or may be required to consult an authoritative list of subject terms used to index the documents
before performing a subject search. The less effort that has been put into the construction of the database by the
indexers or database producers, the more creative the searcher will have to be in thinking of synonyms, acronyms,
and other aspects of the search to find the most relevant information. There is always a trade-off between precision
(the proportion of retrieved articles that are relevant) and recall (the proportion of the relevant items in the database
that were actually pulled by the search strategy). A very narrowly defined search strategy may achieve nearly 100%
precision, but find a relatively small percentage of the important relevant references in the database. The voluminous
answer sets that are returned in most Google searches represent low precision. Commercial databases have developed
many techniques that allow a searcher to refine a search strategy and home in on the needed information. Learning
about the options available before attempting a database search will pay dividends in the long run.
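Precision and recall can be computed directly from their definitions. In the sketch below, the document identifiers and counts are invented to illustrate the trade-off between a narrow strategy and a broad one.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / all relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical database containing 40 relevant documents (r0..r39).
relevant = {f"r{i}" for i in range(40)}
narrow = {f"r{i}" for i in range(10)}                                 # 10 retrieved, all relevant
broad = {f"r{i}" for i in range(36)} | {f"x{i}" for i in range(364)}  # 400 retrieved, 36 relevant
print(precision_recall(narrow, relevant))  # (1.0, 0.25)
print(precision_recall(broad, relevant))   # (0.09, 0.9)
```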
CIIM Link for further study
SIRCh Link for Subject Searches
Problem Set on this topic [18]
References
[1] http://en.wikipedia.org/wiki/Library_of_Congress_Subject_Headings
[2] http://www.nlm.nih.gov/mesh/
[3] http://www.nlm.nih.gov/pubs/factsheets/umls.html
[4] http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html
[5] http://www.cas.org/support/stngen/stnotes/index.html
[6] http://www.indiana.edu/~cheminfo/01-03.html
[7] http://www.iucat.iu.edu/uhtbin/cgisirsi/HWwJXgIuwB/B-WELLS/174352002/60/1180/X
[8] http://www-sul.stanford.edu/depts/swain/collections/databases/cas/casapp/index.html
[9] http://www.indiana.edu/~cheminfo/C471/cmpdclas.html
[10] http://www.indiana.edu/~cheminfo/C471/roush_1st.html
[11] http://cheminfo.informatics.indiana.edu/cicc/cis/index.php/Link_to_Internet_resources_for_Guides_to_Chemical_Information_Sources_and_Databases#Database_Summary_Sheets_and_Guides
[12] http://support.dialog.com/searchaids/dialog/pdf/f399_title.pdf
[13] http://www.rsc.org/Publishing/CurrentAwareness/AA/
[14] http://www.proquest.com/products_umi/dissertations/
[15] http://pubs.acs.org/
[16] http://www.chemport.org/
[17] http://www.scirus.com/srsapp/
[18] http://www.indiana.edu/~cheminfo/C471/471ps2.html
Article Sources and Contributors
Chemical Information Sources/Subject Searches  Source: http://en.wikibooks.org/w/index.php?oldid=2180893  Contributors: Adrignola, Avicennasis, Gary Dorman Wiggins, Jquigley
License
Creative Commons Attribution-Share Alike 3.0 Unported
//creativecommons.org/licenses/by-sa/3.0/