There are now several important KWIC indexes of titles of research papers
available in scientific disciplines, for example Chemical Titles, and the subject
indexes of Biological Abstracts and Geo Abstracts. Science Citation Index and Social
Science Citation Index have 'Permuterm' indexes based on keywords in titles. In
addition, there are several large machine-readable databases which can be searched
by title words. The question of whether such retrieval methods would work
acceptably in other, non-scientific subject areas does not seem to have been
studied hitherto, and the work reported here represents one approach towards
an answer.
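A KWIC (keyword-in-context) index, the kind of index discussed throughout this paper, files each title under every substantive word it contains. That word is aligned in a central column, with the surrounding words as context. The following minimal Python sketch assumes a small hypothetical stop list and a fixed 30-character context on each side. The indexes named above each applied their own stop lists and layout rules.

```python
import re

# Hypothetical stop list; each real index service used its own.
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "from", "in", "into",
              "of", "on", "or", "the", "to", "with"}

def kwic_entries(titles, width=30):
    """One line per substantive word: left context, then the keyword and right context."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = re.sub(r"[^\w-]", "", word).lower()  # strip punctuation for filing
            if not key or key in STOP_WORDS:
                continue
            left = " ".join(words[:i])[-width:]    # keep at most `width` chars of left context
            right = " ".join(words[i:])[:width]    # keyword plus following words
            entries.append((key, f"{left:>{width}}  {right}"))
    entries.sort(key=lambda entry: entry[0])       # file alphabetically by keyword
    return [line for _, line in entries]

sample_titles = [
    "Silicon heterocyclic compounds: ring closure by hydrosilation",
    "Never smile at a crocodile",
]
for line in kwic_entries(sample_titles):
    print(line)
```

Each title appears once per substantive word, so the chemical title yields six entries and the second title three.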
Bottle and Preibish¹ have made a comparison of index terms assigned in Psychological Abstracts with the corresponding titles in order to assess the suitability of a
KWIC index for psychology. Such a technique is applicable only in subject areas
where an indexing periodical exists, and cannot be used to compare different
subjects because of differences in depth of indexing and specificity of index terms.
To carry out interdisciplinary comparisons, a technique is necessary which
depends on the content of the titles alone.
METHOD AND RESULTS
One method fulfilling this condition (and capable of mechanization) is that used
by Tocatlian² on chemical titles and by Bird and Knight³ on titles in Nature and
Journal of Clinical Endocrinology and Metabolism. Both studies investigated the
change in information content over two decades by counting
the number of 'substantive' or 'key' words per title in their samples. Non-substantive words were defined by Tocatlian as 'words that convey little or no
Journal of Documentation, Vol. 33, No. 1, March 1977, pp. 46-52
[Table (numerical values not recoverable from the extracted text): means and standard deviations (S.D.) of substantive words per title and of all words per title, plus a third mean/S.D. pair whose heading is lost. Journals covered: Trans. Faraday Soc. (JCS Faraday Trans. in 1973), Analyt. Chem., J. Organic Chem., Ann. Bot., Lancet, Philosophy, J. Soc. Psychol., Brit. J. Sociol. (started 1950), Economica and J. Opt. Soc. Amer. These were sampled at about 1947, 1962 and 1973, with some samples pooled over several years (e.g. 1946-50, 1960-3, 1971-4). Further entries: Z. Phys. Chem. (Leipzig) 1973, J. Chim. Phys. 1974, Hist. Z. 1970-4 and Annales 1972-3, each also given in English translation.]
of Chemical Titles in 1960). However, these periodicals also showed an increase
in the earlier period 1947-62, though the increase was significant only for
Analytical Chemistry. The two life science periodicals, Annals of Botany and the Lancet,
also showed significant increases in substantive words during 1962-73. (The
BASIC index of Biological Abstracts was introduced at the end of 1959.) The Lancet
showed a significant increase for the period 1947-62, while Annals of Botany
actually showed a decrease in this period. There were also significant increases
during 1962-73 for Economica, English Historical Review, and Journal of the Optical
Society of America, which cannot be attributed to the introduction of KWIC
indexes in their subject areas. English Historical Review also showed a significant increase over the period 1947-62.
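These comparisons rest on one measure: the number of substantive words per title, summarised by a mean and standard deviation for each sample. That measure can be sketched in a few lines of Python. The stop list below is a small hypothetical one, not Tocatlian's definition of non-substantive words. The sample titles are ones quoted elsewhere in this paper. The sketch does not attempt the significance tests reported above.

```python
import re
import statistics

# Hypothetical stop list of non-substantive words (an assumption, not the study's list).
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "from", "in", "into",
              "of", "on", "or", "the", "to", "with"}

def substantive_words(title):
    """Words of a title not on the stop list; hyphens and punctuation split words."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [w for w in words if w not in STOP_WORDS]

def title_statistics(titles):
    """Mean and sample standard deviation of substantive words per title."""
    counts = [len(substantive_words(t)) for t in titles]
    return statistics.mean(counts), statistics.stdev(counts)

sample = [
    "Silicon heterocyclic compounds: ring closure by hydrosilation",
    "Right, left and centre: the Second Spanish Republic",
    "Never smile at a crocodile",
    "The cow on the roof",
]
mean, sd = title_statistics(sample)
print(f"mean={mean:.2f}, s.d.={sd:.2f}")  # mean=4.25, s.d.=2.06
```

With this stop list the chemical title scores six, matching the count given for it later in the paper. A different stop list would shift every count, so such comparisons are only meaningful when one list is applied to all subjects.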
The increase in information content of chemical and biological titles since
1960 is thus to be seen in the context of a trend towards more informative titles which
has occurred over a wide range of subjects (philosophy being the only exception
found), and which was already apparent before KWIC indexes and mechanized
searching of title words became common. The introduction of these tools may
be partly responsible for an awareness of the need for informative titles,
but they cannot explain the generality of the trend observed.
Bird and Knight suggest another possible cause of the increasing informativeness of titles, viz. the need to pick out papers of possible interest from ever-increasing numbers of papers in the field. If there are only a few periodicals of
interest, each containing, say, twenty papers a year, it is an easy matter to scan all
the papers as they appear. This is possibly still the position in philosophy: the
number of papers in Philosophy increased from eleven in 1947 to thirty-three in
1973. As the numbers of periodicals and papers per year grow, increasing reliance
must be placed on scanning lists of titles, either in the journals themselves or in a
secondary journal. As early as 1962 the editor⁴ of English Historical Review requested contributors to word their titles so that 'the reader scanning contents or
index knows where he is in time and space'. The Journal of Organic Chemistry grew
from 122 papers in 1947 to about 1,100 in 1975, and the number of periodicals of
possible interest to an organic chemist has also escalated. Clearly, his current
awareness problem is of a different order from the philosopher's. We would
suggest this as an explanation both of the greater number of substantive words in
chemical titles than in philosophy, and of the increase of title length with time
in chemistry.
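The difference in scale can be put in rough numbers from the figures quoted above. The sketch below is illustrative arithmetic on those figures, not a calculation made in the study. It converts each journal's growth in papers per year into an implied compound annual growth rate.

```python
def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by growth from `start` to `end` over `years`."""
    return (end / start) ** (1 / years) - 1

# Papers per year as quoted in the text.
joc = annual_growth_rate(122, 1100, 1975 - 1947)   # Journal of Organic Chemistry
phil = annual_growth_rate(11, 33, 1973 - 1947)     # Philosophy
print(f"J. Org. Chem.: {1100 / 122:.1f}-fold, about {joc:.1%} a year")  # 9.0-fold, about 8.2% a year
print(f"Philosophy: {33 / 11:.1f}-fold, about {phil:.1%} a year")       # 3.0-fold, about 4.3% a year
```

In relative terms the chemistry journal grew nearly twice as fast. In absolute terms the gap is much larger: about 1,100 papers in 1975 against 33 in 1973, which is the contrast the argument rests on.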
An attempt was made to discover what kinds of words accounted for the
increases in the number of substantive words. Of the increase of 1.16 words per title in
Journal of Organic Chemistry over 1947-73, the biggest contribution was found
to be +0.70 from words relating to structures and mechanisms; +0.30 was due
to chemical names and +0.26 to names of reactions. The increase of 2.39 substantive words per title in Analytical Chemistry during the same period included +1.62
from words relating to instrumentation and techniques. The increases in chemical
titles thus reflect the introduction into titles of words relating to the new techniques used and the new aspects studied. In the social sciences and arts subjects, new techniques and new aspects of study are comparatively much rarer, so their
influence on the information content of titles is much less. In the Journal of Social
Psychology, for example, words relating to tests and techniques contributed only
1.0 substantive word per title in 1947 and 0.4 in 1973. (Bottle and Preibish¹ found
the value to be 8% of keywords, or about 0.5 per title, for 300 titles from 1968
Psychological Abstracts.)
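A breakdown of this kind works in three steps: assign title words to categories, average the count per title in each sample, and take the difference between samples. That procedure can be sketched in Python. The category word lists and the sample titles are invented for illustration, and the study's own coding of words into categories is not given in the text.

```python
from collections import Counter

# Hypothetical category lexicons (invented word lists, not the study's scheme).
CATEGORIES = {
    "structure/mechanism": {"structure", "mechanism", "conformation", "kinetics"},
    "technique": {"spectroscopy", "chromatography", "polarography", "nmr"},
}

def per_title_contributions(titles):
    """Mean number of words per title that fall into each category."""
    totals = Counter()
    for title in titles:
        for raw in title.lower().split():
            word = raw.strip(".,:;()")  # drop punctuation attached to a word
            for category, lexicon in CATEGORIES.items():
                if word in lexicon:
                    totals[category] += 1
    return {c: totals[c] / len(titles) for c in CATEGORIES}

def change_by_category(early_titles, late_titles):
    """Change in mean words per title, by category, between two samples."""
    before = per_title_contributions(early_titles)
    after = per_title_contributions(late_titles)
    return {c: after[c] - before[c] for c in CATEGORIES}

# Invented sample titles, for illustration only.
early = ["The preparation of some esters", "A new synthesis of quinoline"]
late = ["Mechanism of ester hydrolysis: kinetics and structure",
        "NMR spectroscopy of quinoline derivatives"]
print(change_by_category(early, late))  # {'structure/mechanism': 1.5, 'technique': 1.0}
```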
We have counted all substantive words as equal, but clearly within a given title
this is in no respect true, and there may be gross differences between subjects in
the usefulness of title words. For example, 'Silicon heterocyclic compounds: ring
closure by hydrosilation' (Journal of Organic Chemistry, 1973) and 'Misleading
questions and irrelevant answers in Berkeley's theory of vision' (Philosophy, 1968)
each have six substantive words, but it is not clear that the philosophical title gives
as much information about the paper it describes as does the chemical title.
The traditional 'precision' and 'recall' values of retrieval experiments depend,
of course, not on the data elements alone but on the relation between them and
terms in the search profile. Suppose that the user's interest was 'English agricultural history', and the search profile was written as '(BRIT- or ENGL-) and
(AGRICULTUR- or FARM-)'. Then the title 'Wheat-growing in fourteenth
century East Anglia' would be missed, but had this been a subtitle to a main title
'Medieval English Farming. Part V', the paper would have been retrieved.
Strictly, the main title is redundant, since it is implied by the subtitle, and so in
one sense it contributes nothing to the information content. However, using the
number of substantive words as a measure, the information content is increased
from six to nine by the inclusion of the 'series' heading. Similarly, we may consider that other substantive words in the title will contribute to precision rather
than recall, for example, 'East' in the title above.
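The search profile in this example can be sketched as an AND of OR-groups of right-truncated stems. The word-splitting rules here (lower-casing, treating hyphens and punctuation as separators, prefix matching) are assumptions about how such a profile would be applied. They do not describe any particular search system.

```python
import re

# The profile '(BRIT- or ENGL-) and (AGRICULTUR- or FARM-)' as an AND of OR-groups.
PROFILE = (("brit", "engl"), ("agricultur", "farm"))

def title_words(title):
    """Lower-case words; hyphens and punctuation act as separators."""
    return re.findall(r"[a-z0-9]+", title.lower())

def profile_matches(title, profile=PROFILE):
    """True if every OR-group has a stem that begins some word of the title."""
    words = title_words(title)
    return all(any(w.startswith(stem) for w in words for stem in group)
               for group in profile)

subtitle = "Wheat-growing in fourteenth century East Anglia"
with_series = "Medieval English Farming. Part V: " + subtitle
print(profile_matches(subtitle))     # False
print(profile_matches(with_series))  # True
```

The subtitle alone fails because no word in it begins with BRIT- or ENGL-, although 'East Anglia' implies England. This is the topographical-hierarchy problem taken up in (ii) below.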
Thus, for a particular title, the number of substantive words is not simply related
to its value in retrieval either as regards recall or precision. However, it seems
reasonable to suppose that in general a longer title will contain more words related
to the subject matter of the paper, and will be of more use as a basis for retrieval.
Olive et al.⁵ have studied the value of titles in operating an SDI service based on
Nuclear Science Abstracts. They found that, for titles of fewer than 100 characters,
index terms gave better recall than titles, while for longer titles the titles gave
better recall. The shorter titles gave 51% precision and the longer ones 40%.
(One factor responsible for this difference is probably that papers with shorter
titles tend to be on more general subjects: they are more likely to be of some
interest to the user, whereas a highly specific paper on the wrong aspect of a subject
will be irrelevant.)
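Figures such as those of Olive et al. follow the standard definitions of these two measures. Precision is the proportion of retrieved items that are relevant, and recall is the proportion of relevant items that are retrieved. A minimal Python sketch with invented document numbers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set judged against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                          # relevant items actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0   # share of the output that is relevant
    recall = hits / len(relevant) if relevant else 0.0        # share of relevant items found
    return precision, recall

# Invented document numbers: five retrieved, four relevant, three in common.
p, r = precision_recall({1, 2, 3, 4, 5}, {2, 3, 4, 6})
print(f"precision={p:.0%}, recall={r:.0%}")  # precision=60%, recall=75%
```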
There are certain features in the vocabulary of scientific subjects which may make
title-searching easier than in non-scientific subjects.
(i) In chemistry, and to a lesser extent in biology and medicine, word-fragments
are often meaningful enough to be useful in retrieval. For example, the fragment
'-ase' will retrieve a large proportion of enzymes such as 'oxidase', 'urease',
'ligase', 'hydrogenase', etc. (as well as 'base', 'release', and a few other false drops).
Complex chemical names such as 'trans-2,3-dimethyl-1-phthalimidoaziridines'
give rise to several entries (five in this case) in Chemical Titles by being split
before each meaningful fragment. 'Hemocytoblastosis' is indexed at three points
in the KWIC index of Biological Abstracts. Such fragmentation is not possible to
anything like the same extent in history or psychology, so the number of entry
points will be limited almost to the number of substantive words, and the elements available for matching against a profile will be correspondingly reduced.
Fragmentation would, however, be important in a German KWIC index,
because of the frequency of agglutination.
(ii) The nomenclature of chemistry permits searches on two or more facets of
a compound: for instance, a paper on 'ammonium trifluoroacetate' could be
retrieved by someone interested in ammonium compounds, acetates, or fluoro-compounds. However, no-one is likely to be interested in the class of people with
the Christian name 'Samuel', so that the word 'Samuel' in 'Samuel Johnson'
serves only to improve precision when searching under 'Johnson'. Similarly,
topographical nomenclature does not indicate broader terms in the hierarchy. In
expanding fully the concept 'United States' in a history search, one would have
to include the name of each state, town, and region.
(iii) An important facet in history-related subjects will often be date. The ways
in which titles may indicate relevance to a particular period are many and unpredictable. For example, someone may be interested in English agriculture during
the period 1750-1850. The title words 'Georgian', 'eighteenth-century' and '1800-1914' would all indicate the inclusion of potentially useful information. Thus
natural language searching seems to present formidable difficulties for searching
by date. (If, on the other hand, a title is located in a KWIC index by a word relating
to another facet of the problem, a date given in the context should indicate
whether the paper is relevant.)
(iv) There is a tendency in philosophy, and to some extent in other arts subjects,
for whimsical or metaphorical titles to be used, e.g., 'Never smile at a crocodile'
(Journal for the Theory of Social Behaviour, 1973), and 'The cow on the roof' (Journal
of Philosophy, 1973). Indicative as these may be to the initiated, they seem to have
little value for retrieval. Sometimes a subtitle gives a more literal statement, as
'Right, left and centre: the Second Spanish Republic' (Historical Journal, 1972).
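The fragment searching described in (i) amounts to left truncation: matching any word that ends with a given string. The following minimal Python sketch uses invented titles built around the enzyme names and false drops mentioned above.

```python
import re

def fragment_search(titles, fragment):
    """Left-truncated search: titles containing a word that ends with `fragment`."""
    hits = []
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())  # hyphens and punctuation split words
        matched = [w for w in words if w.endswith(fragment)]
        if matched:
            hits.append((title, matched))
    return hits

# Invented titles, for illustration only.
titles = [
    "Purification of urease from jack bean",
    "Hydrogenase activity in algae",
    "Acid-base equilibria in mixed solvents",
    "Release of ions from clays",
    "Glucose uptake by yeast",
]
for title, matched in fragment_search(titles, "ase"):
    print(matched, "<-", title)
```

The last two matches ('base' and 'release') are false drops of the kind noted in (i). A working system would need an exclusion list for them.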
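The date problem in (iii) can be made concrete with a sketch that recognizes only three kinds of date expression: explicit year ranges, 'eighteenth-century' style phrases, and a couple of period names. The vocabularies and their year spans are hypothetical and approximate. The point is that each further way of expressing a date needs its own rule.

```python
import re

# Hypothetical, deliberately tiny vocabularies; the year spans are approximate.
PERIOD_NAMES = {"georgian": (1714, 1830), "victorian": (1837, 1901)}
CENTURY_WORDS = {"seventeenth": 17, "eighteenth": 18, "nineteenth": 19}

def title_year_spans(title):
    """Year spans suggested by the date expressions this sketch recognizes."""
    text = title.lower()
    spans = [(int(a), int(b))
             for a, b in re.findall(r"\b(1\d{3})-(1\d{3})\b", text)]
    for word, century in CENTURY_WORDS.items():
        if re.search(rf"\b{word}[- ]century\b", text):
            spans.append((100 * (century - 1), 100 * century - 1))
    for name, span in PERIOD_NAMES.items():
        if re.search(rf"\b{name}\b", text):
            spans.append(span)
    return spans

def overlaps_period(title, start, end):
    """True if any recognized span overlaps the period start..end."""
    return any(a <= end and b >= start for a, b in title_year_spans(title))

# Invented titles; the period of interest is 1750-1850.
for t in ["Enclosure in Georgian Norfolk",
          "Eighteenth-century farm leases",
          "Rural labour 1800-1914",
          "Farming in the Napoleonic wars"]:
    print(overlaps_period(t, 1750, 1850), t)
```

The last title is relevant to 1750-1850 but is missed because 'Napoleonic' is not in the vocabulary.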
CONCLUSION
On the basis of the number of informative words they contain, the titles of research papers in physics, history, psychology, and to a somewhat lesser extent
other social sciences, do not seem to fall far short of chemistry and the life sciences
in their suitability for retrieval. However, they do not enjoy the semi-systematic
nomenclature of the sciences, which means that although sufficient information
may be present in the titles it may not be in a form suitable for retrieval. Further
work is needed on the vocabularies of titles in different subjects from the point of
view of specificity and predictability.
ACKNOWLEDGEMENT
One of us (A.B.B.) is grateful to the Department of Education and Science for
support from an Information Science Studentship.
REFERENCES
1. BOTTLE, R. T. and PREIBISH, C. I. The proposed KWIC index for psychology: an experimental
test of its effectiveness. Journal of the American Society for Information Science, 21, 1970, 427.
2. TOCATLIAN, J. J. Are titles of chemical papers becoming more informative? Journal of the
American Society for Information Science, 21, 1970, 345-50.
3. BIRD, P. R. and KNIGHT, M. A. Word count statistics of the titles of scientific papers. Information
Scientist, 9, 1975, 67-9.
4. HAY, D. in: English Historical Review, 77, 1962, 3.
5. OLIVE, C., TERRY, J. E. and DATTA, S. Studies to compare retrieval using titles with that using
index terms: SDI from 'Nuclear Science Abstracts'. Journal of Documentation, 29, 1973,
169-91.
(Received 3 August 1976)