Professional Documents
Culture Documents
Structure
10.0 Objectives
10.1 Introduction
10.2 Meaning and Scope
10.3 Natural Language vs. Indexing Language
10.4 Structure of Indexing Language
10.4.1 Controlled Vocabulary
10.4.2 Syntax
10.4.3 Semantics
10.1 INTRODUCTION
Indexing language (IL) is an artificial language made up of expressions connecting several
kernel terms and adopted to the requirements of indexing. The function of an IL is to do
whatever a natural language (NL) does and in addition organise the semantic content
through a different expression providing a point of access to the seekers of information.
An IL is a system for naming subjects and has controlled vocabulary. The vocabulary of
an IL may be verbal or coded. A classification scheme uses coded vocabulary in the
form of notation and authority lists uses verbal vocabulary. It is a prerequisite to understand
the features of the language used for the representation of the subject content of the
documents in terms of their linguistics structures and functions for the purpose of studying
the structure of indexing language. Thus, there are areas of linguistics which are of
common interest to information scientists.
35
Indexing
10.4 STRUCTURE OF INDEXING LANGUAGE
Like natural language, an indexing language consists of three elements: (a) Vocabulary
(not free vocabulary, but controlled vocabulary), (b) Syntax, and (c) Semantics. All the
structured indexing languages are based upon careful subject analysis. The following
figure presents the structure of an indexing language:
Indexing
Language
Verbal Coded
Classification
SH Lists Schemes
Thesauri Thesaurofacet
Classaurus
10.4.2 Syntax
The etymological meaning of syntax is ‘putting things together in an orderly manner’. In
the context of an indexing language, syntax refers to a set of rules or grammar which
governs the sequence of words in a subject heading, or notations in a classification
number.
Most of the subjects treated even in modern macro documents are of compound nature.
This means that the name of a subject can no longer be represented by a single word or
term. When a number of terms have to be used in representing the subject coextensively,
syntax is necessary to put the terms in a most helpful and known searchable order. In
other words, we can say that syntax of an indexing language provides pattern of
relationship which we recognise between the terms used in the system, i.e. between the
terms in the index vocabulary or controlled vocabulary. This recognition is based on a
36
careful subject analysis which is basic to the indexing language.
The order of terms according to the rules of syntax of an indexing language assumes Indexing Languages
greater importance of presenting correct meaning of a subject heading. Apart from the
order of terms prescribed by its rules of syntax, it becomes necessary, at times, to use
relational symbols or indicator digits to bring out the correct relations between terms. In
this connection, it is to be pointed out here that natural language provides auxiliaries like
prepositions, conjunctions, etc. to bring out the correct meaning of the sentence. But in
an indexing language such auxiliaries are not available and hence, the correct meaning
of a subject heading has to be expressed largely through the order of terms along with
the relational symbols like role operators or indicator digits. Syntactical relationship is
document dependent relationship.
10.4.3 Semantics
As stated earlier, semantics refers to the systematic study of how meaning is structured,
expressed and understood in the use of an indexing language. Various types of semantic
relationships are evident in an indexing language. These relationships include equivalence
relationships, hierarchical relationships, and associative relationships. Meaning of the
term can be derived from its hierarchy. Semantic relationship is document independent
relationship. The syntactical rules of an indexing language is also used to resolve the
meaning of the term in a subject heading (consisting of string of terms) through the
determination of context.
Problem of homographs.
10.6.2 Objectives
The basic objective of controlled vocabularies is to provide a means for organising
information. Through the process of assigning terms selected from controlled vocabularies
to describe documents and other types of content objects, the materials are organised
according to the various elements that have been chosen to describe them.
Controlled vocabularies serve five purposes:
1) Translation: Provide a means for converting the natural language of authors,
indexers, and users into a vocabulary that can be used for indexing and retrieval.
2) Consistency: Promote uniformity in term format and in the assignment of terms.
3) Indication of relationships: Indicate semantic relationships among terms.
4) Label and browse: Provide consistent and clear hierarchies in a navigation system
to help users locate desired content objects.
5) Retrieval: Serve as a searching aid in locating content objects.
From the operational point of view, there are basically two fold objectives for having a
controlled vocabulary:
1) To promote the consistent representation of the subject matter of documents by
indexers and searchers, thereby avoiding the dispersion of related documents,
through control of synonymous and nearly synonymous expressions and by
distinguishing among homographs, and
2) To facilitate the conduct of a comprehensive search by bringing together in some
way the terms that are most closely related semantically.
10.6.3 Methods
The methods of vocabulary control refer to the various means by which the objectives
of controlled vocabulary are achieved. The first objective is achieved by controlling the
terminology. Control of terminology is achieved in various ways. First the form of term
is controlled, whether this involves grammatical form, spelling, singular and plural form,
abbreviations or compound form of terms. Second, a choice is made between two or
more synonyms, near-synonyms and quasi-synonyms. Third, homographs are
distinguished. The control of synonyms is achieved simply by choosing one of the possible
alternatives as the “Preferred term” and referring to this term (by using “See” or “Use”)
from the variants under which certain users may be likely to approach. It should be
obvious that the synonyms selected as the preferred term (i.e. the term under which
documents will actually be indexed and search for) must be the one under which the
majority of users will be likely to look first. Sometimes “quasi-synonyms” are created in
the same way as synonyms (i.e. one is chosen and the reference is made from the
other). The term “quasi-synonym” is not very precise. Many authors consider the quasi-
synonyms as the antonyms that represent opposite extremes on a continuum values. An
example is the pair of “roughness” and “smoothness”. Clearly “roughness” may be
regarded as merely the “absence of smoothness”, and vice-versa.
The controlled vocabulary also distinguishes among homographs (i.e. words with identical
spelling but the different meaning), usually by means of a parenthetical qualifier or scope 41
Indexing note. Thus CRANE (Bird) tells us that this term is to be used exclusively for a type of
birds and not as lifting equipment or any other possible context.
By controlling synonyms, near-synonyms and quasi-synonyms, and by distinguishing
among homographs, the vocabulary control device avoids the dispersion of like subject
matter and the collocation of unlike subject matter. In this way it helps to achieve the
object of consistent representation of subject matter in indexing and searching.
The second objective of the vocabulary control device is to link together terms that are
semantically related in order to facilitate the conduct of comprehensive searches. A
controlled vocabulary will bring together terms that are hierarchically related (in a formal
genus-species relationships), and it will also reveal semantic relationship across hierarchies
(i.e. terms having non-hierarchical associative relationships). This objective is achieved
by means of “See also” references among terms that are most closely related semantically.
Self Check Exercise
Note: i) Write your answer in the space given below.
ii) Check your answer with the answer given at the end of this Unit.
4) What is the difference between paradigmatic and syntagmatic relationships?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
5) What do you understand by ‘Controlled vocabulary’?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
6) State the objectives of vocabulary control. How are these objectives achieved?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
10.7.3 Thesaurus
The term thesaurus has been derived from Greek and Latin words which mean ‘a
treasury’ and it has been used for several centuries to mean a lexicon or treasury of
words. Modern usage may be said to date from 1852, when the first edition of Thesaurus
of English Words and Phrases was published by Peter Mark Roget. A thesaurus
(plural: thesauri) with which we are concerned is meant for information retrieval and is
used as a valuable vocabulary control device for indexing and searching in a specific
subject area. The journey of the thesaurus from the linguistic domain to information
46 retrieval is evident from the following timeline:
1736: The term ‘thesaurus’ first appeared in OED. It came from the Greek word Indexing Languages
‘thesauros’ which means ‘Treasury or storehouse of knowledge’.
1852: Appeared Peter Mark Roget’s Thesaurus. It was a linguistic thesaurus showing
the word(s) by which the given idea most fitly and aptly expressed, i.e.
Classification of ideas.
1957: Dorking Conference. Miss Helen Brownson first brought the idea of
‘thesaurus’ in terms if IR through a paper presented there.
1959: H P Luhn gave the idea of the application of thesaurus in IR.
1969: The first thesaurus used in IR system was developed by Du Pont in USA.
Definition
There are many different definitions of thesauri, varying from quite modest definitions
that focus on the relations between words without stating which kinds of relations that
are meant, to such definitions that state more exactly which relations that are concerned.
The definition of Thesaurus provided by World Science Information System of
UNESCO (known as UNISIST) on the basis of its function and structure seems to be
most comprehensive to understand the meaning and scope of the thesaurus:
“In terms of function, a thesaurus is a terminological control device used in translating
from the natural language of documents, indexers or users into a more constrained
‘system language’ (documentation language, information language)”.
“In terms of structure, a thesaurus is a controlled and dynamic vocabulary of semantically
and generically related terms which covers a specific domain of knowledge”.
Purpose
A thesaurus is a semantic network of terms. Its purposes are
a) To provide a map of given field of knowledge, how concepts or ideas about
concepts are related to one another, which helps an indexer or a searcher to
understand the structure of the field.
b) To provide a standard vocabulary for a given subject field which will ensure that
indexers are consistent when they are making index entries to an information storage
and retrieval system.
c) To provide a system of references between terms which will ensure that only one
term from a set of synonyms is used for indexing one concept, and that indexers
and searchers are told which of the set is the one chosen; and to provide guide to
terms which are related to any index term in other ways, either by classification
structure or otherwise in the literature.
d) To provide a guide for users of the system so that they choose a correct term for
a subject search; this stresses the importance of cross references. If an indexer
uses more than one synonym in the same index—for example, “abroad”, “foreign”
and “overseas”—then documents are liable to be indexed haphazardly under all
of these; a searcher who chooses one and finds documents indexed there will
assume that he has found the correct term and will stop his search without knowing
that there are other useful documents indexed under the other synonyms.
e) To locate a new concept in a scheme of relationships with existing concepts in a
way which makes sense to users of the system. 47
Indexing f) To provide classified hierarchies so that a search can be broadened and narrowed
systematically, if the first choice of search term produces either too few or too
many references to the materials in the store.
g) A desirable purpose, but one which it would be premature to say is being achieved,
is to provide a means by which the use of terms in a given subject field may be
standardised.
Basic Thesaural Relationships
Basic thesaural relationships or the semantic relationships in a thesaurus refer to two
types of relationships: (1) Hierarchical Relationship; (2) and Non-Hierarchical
Relationship. The following figure shows the different types of relationships displayed in
a thesaurus.
Thesaural Relationship
Equivalence Associative
Relationship Relationship
Hierarchical Relationship
Hierarchical relationships are based on degrees or levels of superordination and
subordination, where the superordinate term represents a class or a whole, and
subordinate terms refer to its members or parts. This relationship is of four types: Genus-
Species (Generic) relationship, Whole-Part relationship, Instance relationship and Poly-
hierarchical relationship.
Reciprocity in the hierarchical relationships is expressed by the relationship indicators:
BT (Broader Term), i.e. a label for the superordinate (parent) term; and NT (Narrower
Term), i.e. a label for the subordinate (child) term.
Genus-Species (Generic) Relationship links genus and species and represents
the basis of scientific taxonomic system. As for example
Examples of Hierarchical relationship indicator (BT and NT),
Mammals Vertebrates
BT Vertebrates NT Mammals
Whole-Part Relationship covers situations in which one concept is inherently
included in another, regardless of context, so that the terms can be organised into
logical hierarchies, with the whole treated as a broader term. As for example:
Central nervous system Spinal cord
NT Spinal cord BT Central nervous system
48 Instance Relationship identifies the link between a general category of things or
events, expressed by a common noun, and an individual instance of that category, Indexing Languages
often a proper name. As for examples:
Mountain regions Himalayas
NT Himalayas BT Mountain regions
Polyhierarchical Relationship occurs when some concepts belong, on logical
grounds, to more than one category. In the following example, the term pianos is
assigned to subordinate positions on the basis of its generic relationship to two
broader terms—in other words, pianos would be an NT to both stringed
instruments and wind instruments.
Musical Instruments
Organ
Non-Hierarchical Relationship
Relationship between terms other than hierarchical is called Non-hierarchical relationship,
which may further be grouped as Equivalence (or Preferential) Relationship and
Associative Relationship.
Equivalence (or Preferential) Relationship refers to the relationship between
preferred and non-preferred terms in which each term is regarded as referring to the
same concept. When the same concept can be expressed by two or more terms, one
of these is selected as the preferred term. A cross-reference to the preferred term
should be made from any “equivalent” entry term. Reciprocity in the equivalence
relationships is expressed by the relationship indicators: USE, which leads from a non-
preferred (entry) term to the preferred term, and UF or USED FOR, which leads from
the preferred entry term to the non-preferred term(s).
Four basic types of equivalence relationship are evident: (a) Synonyms; (b) Lexical
variants; (c) near-synonyms; and (d) Generic posting.
Synonyms: Synonymy occurs when a concept can be represented by multiple
terms having the same or similar meanings. A thesaurus compensates for the
problems caused by synonymy by ensuring that each concept is represented by a
single preferred term. It lists other synonyms and variants as non-preferred terms
with USE references to the preferred term.
Birds Aves
UF Aves USE Birds
Vaseline Petroleum jelly
UF Petroleum jelly USE Vaseline
Lexical variants: Lexical variants differ from synonyms in that synonyms are
different terms for the same concept, while lexical variants are different word
forms for the same expression. These forms may derive from spelling or grammatical
variation or from abbreviated formats. The following examples indicate the
preferred grammatical forms of terms. 49
Indexing Nouns and Noun Phrases: The grammatical form of a term should be a noun or noun
phrase. Nouns used as terms are divided into two categories: Count nouns and Noncount
(mass) nouns. Count Nouns are names of objects or concepts that are subject to the
question “How many?” but not “How much?” These should normally be expressed as
plurals. For examples: books, penguins, singers, vertebrates, windows, etc. Mass
(noncount) nouns are names of materials or substances that are subject to the question
“How much?” but not “How many?” These should be expressed in the singular. Some
examples of Singular mass nouns are: milk, water, etc.
Where the singular and plural forms of a term represent different concepts, separate
terms for each are entered in the thesaurus. The distinction should be indicated by a
qualifier. Some examples are: Bridge (game) / Bridges (structures); Damage (injury) /
Damages (law); Wood (material) / Woods (forested areas).
Noun phrases are compound terms that are established as preferred terms if they
represent a single concept. Noun phrases occur in two forms: (a) Adjectival noun phrases
like Red rose, Marine birds, Cold fusion, Historical drama, etc.; and (b) Prepositional
noun phrases like Plaster of Paris, Prisoners of war, Hospitals for children, etc.
Adjectives: Adjectives and adjectival phrases used alone are established as terms in a
thesaurus under certain special circumstances. Single adjectives is used in a “nominal”
way; that is, the noun is obvious from the context or the adjective is used to describe an
attribute of the content object other than topic, such as colour or size. For examples:
small, medium, large, blue, green, red, yellow, etc.
As an alternative to the creation of multiple compound terms, adjectives may appear as
separate terms when designed to be precoordinated in indexing or postcoordinated in
searching. They should generally not be assigned as indexing terms in isolation. Given
the possibility of false coordination in searching (e.g., the linking of an adjective with the
wrong noun), adjectival terms should be used sparingly. Some examples of the use of
adjectives as terms in pre- and post coordination are: Airborne / Airborne troops;
Offshore / Offshore drilling; Mobile / Mobile homes, etc.
Certain noun phrases may be used to modify other nouns, e.g., high frequency can
modify the noun waves.
Adjectives may be used alone in general cross references to direct the user to or from
a group of terms beginning with a corresponding noun, e.g., “cardiac . . . see also the
terms beginning with heart.” An example of a reference in the opposite direction (noun
to adjective) is: “France see also the terms beginning with French (French art, French
language, French literature, French wines).”
Adverbs: Adverbs such as “very” or “highly” should not be used alone as terms. A
phrase beginning with such an adverb may be accepted as a term only when it has
acquired a specialized meaning within a domain. Some examples of Adverbial phrases
are: very high frequency, very large scale integration, very low density lipoproteins, etc.
Abbreviations: Abbreviations are selected as preferred terms only when they have
become so well established that the full form of the term or proper name is rarely used,
e.g. AIDS rather than Acquired Immune Deficiency Syndrome; Lasers rather than
Light Amplification by Stimulated Emission of Radiation; UNESCO rather than
United Nations Educational, Scientific, and Cultural Organization; etc. The full form of
terms are selected as preferred terms when the abbreviated form is not widely used
and generally understood, e.g. Automated teller machine rather than ATM; Prisoners
of war rather than POW, etc. Cross-references should be made from the non-preferred
forms to preferred form.
50
Popular and Scientific Names: If a popular and a scientific name refer to the same Indexing Languages
concept, the form most likely to be sought by the users of the thesaurus should be
chosen as the preferred term. For example, Penguins is chosen as the preferred term
in a nontechnical thesaurus with a cross reference from the scientific equivalent,
Sphenisciformes. However, Sphenisciformes is selected as the preferred term in a
zoological thesaurus with a cross-reference from the popular name, Penguins.
Near-synonyms: Near-synonyms are terms whose meanings are generally
regarded as different, but which are treated as equivalents for the purposes of a
controlled vocabulary. The extent to which terms are treated as near-synonyms
depends in large measure upon the domain covered by the controlled vocabulary
and its size. Near-synonyms may include antonyms or represent points on a
continuum. As for examples, Sea water/salt water [variant terms]; Smoothness/
roughness [antonyms].
Generic Posting: It is a technique in which the name of a class and the names of
its members are treated as equivalents, with the broader class name functioning as
the preferred term. As for examples, Waxes UF Plant waxes; Plant waxes USE
Waxes.
Associative Relationships
This relationship covers associations between terms that are neither equivalent nor
hierarchical, yet the terms are semantically or conceptually associated to such an extent
that the link between them is made explicit in the thesaurus, on the grounds that it may
suggest additional terms for use in indexing or retrieval. The associative relationship
used in thesauri is indicated by the abbreviation RT (Related Term). As a general guideline,
whenever one term is used, the other should always be implied within the common
frames of reference shared by the users of the thesaurus. Either of the following types of
terms can be linked by the associative relationship:
a) Those belonging to the same category, and
b) Those belonging to different categories.
Relationships between terms belonging to the same category
Relationships are needed for terms belonging to the same category in various special
situations, primarily to guide the user in locating the desired term. Each of the terms
belonging to the same category has its own particular meaning, but the boundary between
them is often confused with common usage, to the extent that a user checking one of
them in the index should be informed of documents indicated by others. As for examples:
Ships Carpets
RT Boats RT Rugs
Boats Rugs
RT Ships RT Carpets
Relationships between terms belonging to the different categories
It is possible to establish many grounds for associating terms belonging to different
categories. Related Term references are often made between etymologically related
terms, i.e., terms that contain the same root, but which do not represent the same kind
of thing. The following are some representative examples of typical relational situations.
51
Indexing a) Process / Agent
Temperature control
RT Thermostats
Thermostats
RT Temperature control
b) Process / Counteragent
Inflammation
RT Anti-inflammatory agents
Anti-inflammatory agents
RT Inflammation
c) Action / Property
Polling
RT Public opinion
Public opinion
RT Polling
d) Action / Product
Weaving
RT Cloth
Cloth
RT Weaving
e) Action / Target
Harvesting
RT Crops
Crops
RT Harvesting
f) Cause / Effect
Cloud
RT Rain
Rain
RT Cloud
g) Concept or Object / Property
Poisons
RT Toxicity
Toxicity
RT Poisons
h) Concept or Object / Origins
Americans
52
RT United States
United States Indexing Languages
RT Americans
i) Concept or Object / Units or Mechanisms of Measurement Associative
Relationships
Electric current
RT Amperes
Amperes
RT Electric current
j) Raw Material / Product
Wheat
RT Flour
Flour
RT Wheat
k) Discipline or Field of Study / Object or Phenomenon Study
Neurology
RT Nervous system
Nervous system
RT Neurology
l) Discipline or Field of Study / Practitioner
Mathematics
RT Mathematicians
Mathematicians
RT Mathematics
m) Antonyms
Height
RT Depth
Depth
RT Height
n) Phrases Containing Syncategorematic Nouns and their Apparent Foci
Ships
RT Model ships
Model ships
RT Ships
o) Coordinate Ideas
Hinduism
RT Buddhism
Christianity
Islam 53
Indexing 10.7.4 Thesaurofacet
Thesaurofacet: a thesaurus and faceted classification for engineering and related
topics was developed from the English Electric Company’s Faceted classification
for Engineering, the first edition of which was published in 1958. Thesaurofacet came
about when the third edition of Faceted Classification for Engineering, published in
1961, was up for revision. This system was used to organise documents belonging to
the libraries of the corporation of English Electric. However, with the growing trends in
science and technology and the need for using computer techniques and post-coordinate
indexing, a decision was taken in 1967 to commission Jean Aitchison, a member of
Classification Research Group, to review the indexing needs of the company and the
result of that review was the compilation of a new and improved 4th edition of the
faceted classification system called Thesaurofacet, published in 1970. In the 4th edition,
the alphabetised index to the classification scheme was replaced with a thesaurus.
Thesaurofacet covers the whole field of science and technology but subjects are treated
in varying depth and only engineering and allied fields are covered exhaustively. Full
subject coverage includes engineering and fields directly related to engineering like
computers, measurement and testing, physics and management. Relevant management-
related concepts were borrowed from the “Classification of Business Studies” developed
by London Graduate School of Business Studies.
Thesaurofacet is considered as a multi-purpose retrieval language tool because it has
classification schedules and a faceted thesaurus. The classification consists of main
classes and facets and has a notation system that consists of letters in upper case and
numbers from 2 to 9. The faceted thesaurus is the key to the uniqueness of the tool
because it offers the user options to identify topics within the system. Because the two
are linked, each term in the system appears twice, once in the schedule, and once in the
thesaurus, with notation that links the two parts together. However, the information
given about the term in the thesaurus is not the same information given about that term
in the schedule. The two parts of the system are complementary and should be used
conjunctively and not separately. Finally, Thesaurofacet can be used for the arrangement
of books on the shelves and arrangement of entries in the subject catalogues. Further,
the index terms are intended to be used for indexing and searching.
If we are asked for information on Documentation, we turn to the thesaurus and find:
Documentation use
Information Science
At Information Science we find
Information Science ZR
UF Documentation
Librarianship
Library science
RT Communication (Sociology)
Data processing
Information theory
Librarians
54
We also see the notation to the right and we are told to look for ZR in the classification Indexing Languages
schedules. At ZR in the classification schedule we find that Thesaurofacet divides
Information Science using subjects and facets:
Main Class
Subject Field(s)
Fundamental Facets
Sub-Facets
Hierarchies and Arrays
ZR Information science
ZR2 LIBRARIES
By type:
ZR3 National libraries
ZR4 Public libraries
ZR5 Municipal libraries
ZR6 County libraries
ZRB Educational Libraries
—————————
—————————
By management:
ZRP Library management
—————————
—————————
By equipment:
—————————
—————————
By materials:
—————————
—————————
ZT INFORMATION RETRIEVAL
By type of language:
ZT2 Index languages
Subdivided by type:
ZT3 Natural language
ZT4 Single term natural language
55
Indexing ZT5 Concept natural language
ZT8 Controlled index languages
—————————
—————————
Here Information Science is called Main Class. Main class can be divided into subject
fields, in this case LIBRARIES, and INFORMATION RETRIEVAL. Subject fields
are broken down into fundamental facets, and are printed by bold typeface (e.g. by
type, by management, by equipment, etc.). This is where the schedules start to use
facet analysis. If we take a look at one of fundamental facets as an example, Information
Retrieval is broken down ‘By type of language’ into the facet Index languages. The
facet is broken down into sub-facet Natural Language and Controlled index
languages. Terms are then listed in hierarchies and arrays. The schedules are typically
used for a broad subject search and browsing through the possible topics that exist
within a class. So a user may start with the main class and navigate through the facets
and arrays to locate an ideal or more specific topic.
The classification schedules also allow for the combination or synthesis of topics and
notation. For example, there is a note in the main class ZL SOCIOLOGY that tells us
that this topic can be combined with ZM PSYCHOLOGY to create the synthesized
subject field called ZL/ZM SOCIAL PSYCHOLOGY. Synthesis can indeed be used
wherever required, there being no preferred combination order unless there is an
instruction in the schedules.
Thus it appears that two types or styles of “faceted classifications integrated with thesauri”:
First type uses subject fields as main subdivisions, and facet analysis is used to determine
the relationships and the second type of faceted thesauri are those in which concepts
are first divided by facets.
10.7.5 Classaurus
The vocabulary control device used for POPSI has been designated as Classaurus. It
is a category-based (faceted) systematic scheme of hierarchical classification in verbal
plane incorporating all the essential features of a conventional IR thesaurus—i.e. control
of synonyms, quasi-synonyms, etc. RTs are not shown in the classaurus. A scheme of
this type, for its application, calls for a complementary alphabetical index giving the
address of each term occurring in the systematic part. The purpose for which a classaurus
is used does not necessarily warrant any principle-based arrangement of the terms in
the array. Even if the terms in each array are arranged alphabetically the purpose is not
going to be disturbed. This feature of the classaurus makes it largely amenable to
computerisation.
The structure and style of presentation of a classaurus can be systematically presented
as follows:
A) Systematic Part
A1 Common Modifiers
A1.1 Form
A1.2 Time
A1.3 Environment
56 A1.4 Place
A2 Inter-subject Relation Modifiers Indexing Languages
59
Indexing Prepositional phrase headings
Prepositional phrase headings are used to enable the subject cataloguer to express
single but complex ideas for which there is no one word, e.g. Photography in
Psychiatry; Radioisotopes in Cardiology.
Inverted Headings
Inverted Headings serve the alphabetico-classed function of subordinating specific
descriptors under their broad generic categories, e.g. Proverb, Korean, Education,
Elementary. Names of geographic features have traditionally been inverted in order to
place a significant word in the initial position instead of the generic word. As for example:
“Lake Eric” is formulated as Eric, Lake so that the distinguishing part of the name,
“Eric” appears first.
Proper name headings
Any proper name can be used as a subject heading when that proper name becomes
the subject of a work. Names are not, however, all included in the LCSH. The only
ones appearing in the LCSH are those that are used as pattern headings, or as examples,
or that need special subject subdivisions or instructions printed under them. For
examples: Bermuda Islands, Shakespeare, William, 1564-1616, Canadian
Psychological Association.
10.8.2 Subdivisions
LCSH requires extensive use of subdivisions as a means of combining a number of
different concepts into single subject heading. Subject subdivisions are words or phrases
that can follow a subject heading, and are listed below the subject heading and indicated
by a dash. These words or phrases are used to make the subject more specific. Complex
topics are represented by subject headings followed by subdivisions. Some subdivisions
are printed in LCSH but greater number of subdivisions may be assigned according to
the rules specified in the Subject Headings Manual (2008). Library of Congress has
published a separate alphabetical listing of various subdivisions that can be used after
any listed subject heading unless otherwise specified. Subdivisions fall into several broadly
defined categories: topical and chronological subdivisions specific to particular headings,
form subdivisions, geographic subdivisions, free-floating subdivisions, and subdivisions
under pattern headings. Valid subdivisions are marked in bold letters. Primary
subdivisions are preceded by a single dash. Secondary subdivision—that is subdivisions
of subdivisions are preceded with a double dash. Each category is described below:
Topical Subdivisions
Topical subdivisions are used under the main headings or other subdivisions to limit the
scope of the concept expressed by the heading to a special subtopic. Such headings
are displayed by a long dash in LCSH: if two subdivisions are used, there are two long
dashes. For example: Women—Employment; Automobiles—Motors—
Carburetors. The rules for their application are found in the Subject Headings Manual
(2008) and in general references printed under the generic headings in LCSH.
Chronological Subdivisions
Chronological subdivisions are used to limit a heading, with or without subdivision, to a
particular time period, e.g. Philosophy, French—18th century. Specific topical
subdivisions and chronological subdivisions printed under names of countries and other
jurisdictions or regions may be used with them. The date subdivisions displayed under
60
United States—Economic conditions, United States—History, and United Indexing Languages
States—Politics and governments are illustrative.
Form Subdivisions
Form subdivisions are used to indicate the form in which the material on a subject is
organised and presented, and as such are added as the last element to any heading.
Form and topical subdivisions are both included within a general subdivision category.
As for examples: Art–Bibliography–Periodicals (a serially issued art bibliography);
Art–Periodicals–Bibliography (a bibliography of art journals). For combinations of
subdivisions not listed in Subject Cataloguing Manual or Free-floating subdivisions,
other methods must be employed. In most subject headings, the form subdivision appears
as the last element, following the general pattern of subdivision order,
Topic–Place–Time–Form.
Geographic Subdivisions
For geographic subdivisions, headings with the code (May Subd Geog) may have
place names added which are not specifically listed in LCSH. In the LCSH, the heading
Labor supply (May Subd Geog)” tells us that place name may be added with the
main heading “Labor supply”, e.g. Labor supply—France. If a heading contains both
a geographic subdivisions and topical or form subdivisions, the location of the geographic
subdivision depends on which elements can be subdivided by place, e.g. Construction
industry—Finance—Law and legislation—Italy. Sometimes geographic headings
are subdivided by topic, e.g. Massachusetts—History.
Free-floating Subdivisions
There are a number of identical (or nearly so) subdivisions that can be applied in
specifically defined situations to primary headings and which won’t have separate
authority records for each heading and subdivision combination. The five broad groups
of free-floating subdivisions are: general, under classes of persons and ethnic groups,
under names of corporate bodies, persons and families, under place names, and entries
controlled by pattern headings. All free-floating subdivisions are listed in Free-floating
Subdivision: an Alphabetical Index, published annually by Library of Congress (the
latest one is 21st edition, 2009). It is a useful tool for libraries, indicating the various
subdivisions that can be used after any listed subject heading unless otherwise specified.
Subdivisions Controlled by Pattern Headings
Certain form or topical subdivisions are common in a particular subject field or applicable
to headings in a particular category. Instead of authorising them heading by heading
within the category, they are listed under a chosen heading in the category. This chosen
heading then serves as a “pattern heading” of subdivisions for headings in that category.
The applicable subdivisions are displayed under the table of pattern headings in LCSH.
The subdivisions listed under a pattern heading may be transferred and used with another
heading in the same category even though the combination does not appear in LCSH.
The table of pattern headings is arranged alphabetically by category of headings covered
by the pattern headings. The specific headings under which the subdivisions will appear
in LCSH are listed in the right-hand column. For example, a set of subdivisions has
been developed under Cancer and Tuberculosis, the pattern headings for the category
Diseases. The subdivisions established under either of these pattern headings may be
used as subdivisions under any heading belonging to the category Diseases if it is
appropriate.
61
Indexing 10.8.3 Entry Structure
All preferred primary subject terms and subjects with subdivisions are listed in boldface
type, while those in the entry vocabulary only (i.e. references interfiled with the main
headings), appear in normal typeface. Each entry in the LCSH is accompanied by all or
some of the following:
Headings
Subject headings displayed in the LCSH consist of single word or multiple word headings
as discussed under the Section 10.8.1 of this Unit.
Class Numbers
Library of Congress (LC) classification numbers are included in LCSH for many primary
subject terms and some subject subdivisions. Class numbers are added only when
there is a class correspondence between the subject heading and the provisions of the
LC classification schedules, e.g.
Computer software
[QA76.755]
Scope Notes
This is a note under the subject heading that explains and clarifies what is meant and
what is not meant in the definition of the term and in its use as a subject heading. It
specifies the range of application for a heading or draw distinctions between related
terms. Scope notes may follow in separate paragraphs.
Reference Structure
References associated with the headings are then listed in groups under the following
codes:
USE= Use this heading instead of another
UF = Used For (do not use this heading)
BT = Broader Topic (less specific)
NT = Narrower Topic (more specific)
RT = ‘Related Topic (similar)
SA = See Also (additional terms)
Equivalence relationship: USE References
The code UF denotes non preferred headings for the given term/phrase. The codes
USE and UF function as reciprocal in the LCSH. As for examples:
Cars (Automobiles) Automobiles
USE Automobiles UF Cars (Automobiles)
In the catalogue, the USE and UF can be translated into “see” references.
Hierarchical Relationships: BT and NT References
Subject headings are linked to other subject headings through cross-references now
expressed as Broader Topic (BT) and Narrower Topic (NT). Headings connected by
62 the BT and NT references are all valid headings and the connections use reciprocals. A
heading appearing as a BT must be matched by the reversed relationship as an NT, as Indexing Languages
demonstrated by the following example:
Vehicles Transportation Motor vehicles
BT Transportation NT Vehicles BT Vehicles
NT Motor vehicles
The BT and NT can be translated into “see also form” and “see also” references
respectively in the catalogue, as demonstrated below by taking the above example:
Transportation Vehicles
see also see also
Vehicles Motor Vehicles
Associative Relationship: RT References
The associative relationship, expressed by the code RT (Related Term), links two
headings that are associated in some manner other than hierarchical or equivalence
sense. RT references are provided under both terms in a reciprocal way. For examples,
Ornithology Birds
RT Birds RT Ornithology
General References
A general reference, represented by the code “SA” (see also), is a reference made not
to specific individual headings but to an entire group of headings. The code “SA” is
translated into “see also” reference in the catalogue. For example,
Wood working industries
SA names of specific industries, e.g. Furniture industry and trade
It is expected that each library will make specific references to each individual industry
about which the library has documents in its collection.
10.9.2 Subdivisions
The purpose of subject subdivisions is to subdivide topics which are broad in scope or
have much written about them. Subdivisions provided by the SLSH indicate a specialised
aspect of a broad subject or point of view. There are different types of subdivisions,
including topical subdivisions, geographic subdivisions, chronological subdivisions, and
form subdivisions. Subdivisions are compounded under a given topic. For examples of
the different types of subdivisions [form, key (like LCSH pattern headings), special
topic, time, geographic] refer to the heading “United States—History—1861-1865,
Civil War—Medical care”. The subdivisions in SLSH usually follow the standard
order of [Topical]—[Geographic]—[Chronological]—[Form].
Form subdivisions are used to indicate the physical form of the exposition of the
document, e.g. ...—Bibliography.
Topical subdivisions provide the aspect of a subject or point of view presented
in a particular document, e.g. Oceanography—Research.
Key headings in SLSH, like LCSH, serve as patterns for the subdivisions and
may be used under all headings of the same type. Some of these ‘key headings’
are: Shakespeare, William, 1564-1616 (to illustrate subdivisions under any
author); President—United States (to illustrate subdivisions that may be used
under presidents, prime ministers, governors, and rulers of any country, state,
etc.); English language and English literature (to illustrate subdivisions that
may be used with any language and literature respectively).
Chronological subdivisions, i.e. time divisions, which apply most frequently to
history, define a specific chronology for the primary topic, e.g. Europe—History—
1789-1900. If a chronological period has a given name, this name is included in
the heading following the dates, e.g. United States—History—1600-1775,
Colonial period.
Geographic subdivisions provide geographical aspect of a broad subject. Two
forms of geographic divisions are evident in the SLSH: (a) Area—Subject, e.g.
Chicago (III)—Census and (b) Subject—Area, e.g. Geology—Bolivia.
Parenthetical instructions are provided with those headings which may be subdivided
67
by place, e.g. Folklore (May subdiv. Geog.).
Indexing 10.9.3 References
References in the SLSH are grouped into three main categories: specific See references,
specific See also references, and general references. With the fifteenth edition of SLSH
in 1994 See and “See also” references and the complementary “x” and “xx” were
replaced by the thesaurus symbols UF/USE, BT, NT, RT and SA as explained under
the Section 10.8.3 of this Unit.
Specific “See” References
The UF symbol stands for “Used for”, and it designates those preferred terms or
phrases form which “see” references are to be made. The most frequent and helpful
varieties of See references direct the users from:
a) Synonyms or near synonyms, e.g. Chemical geology USE Geochemistry.
b) The second part of a compound heading, e.g. Motels USE Hotels and motels.
c) Conjunctive i.e. terms connected by “and”, e.g. Religion and Science USE Science
and religion.
d) Variant spellings or initialism to the accepted spelling or full form, e.g. Gipsies
USE Gypsies and ESP USE Extrasensory perception.
e) Opposites when they are included without being specifically mentioned, e.g.
Disobedience USE Obedience.
f) Singular to plural when the two forms would not file together, e.g. Goose USE
Geese.
Specific “See also” References
A “See also” reference connects a heading to a related heading or headings designated
in the SLSH as NT (Narrower term(s)) and RT (Related term(s)) if the library has
material listed under both headings and it normally move downward from a general
term to a more specific term(s), e.g. Indoor gardening See also Terrariums; Window
gardening. In this example the general term refers to two more specific terms. The
user who pursues the index file by looking under Window gardening will find a still
more specific downward reference like Window gardening See also House plants.
It appears that See also references follow the rule that references are made from a
general headings to one more specific. Yet there are a high number of bilateral or
reciprocal references, where the movement is horizontal (between related subjects)
rather than vertically (towards a more specific term from a more general one). These
terms are designated as RT (Related term(s)) in the SLSH and the See also references
between related terms are reciprocal in the catalogue. Under the heading Window
gardening, the users will find Container gardening and Flower gardening as its
RTs, and reciprocal See also reference entries are evident in the following examples:
Window gardening See also Container gardening; Container gardening See also
Window gardening; Window gardening See also Flower gardening; Flower
gardening See also Window gardening.
General References
The symbol “SA” stands for “See also” and introduces a general reference, which
refers for one heading to an entire category or class of headings rather than an individual
heading. A general explanation or direction is given instead of a long list of individual
headings. For example,
68
Furniture Indexing Languages
69
Indexing 11) Discuss the different types of “See also” cross-references in the Sears List of
Subject Headings.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
12) Describe the filing order of subject headings in the SLSH.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
13) Prepare the “See” and “See also” cross-references from the following entry:
Fatigue
UF Exhaustion
Weariness
BT Physiology
NT Jet lag
RT Rest
10.10 SUMMARY
Indexing languages are controlled languages used to facilitate access to information.
The present Unit discusses various aspects of an indexing language in holistic manner.
The meaning, scope, structure and attributes of an indexing language and how it differs
from the natural language have been discussed at the outset. It is followed by the discussion
on meaning, need, objectives and method of vocabulary control. Types of indexing
languages are discussed. The subject headings list has been explained in the context of
the general principles associated with the choice and rendering of subject headings.
Subject authority file has also been considered in the light of use and maintenance of the
subject heading list. Thesaurus has been discussed with particular reference to its types,
functions, structure, nature of relationships between terms, and the important role they
play in indexing and searching. Salient features of two extended forms of thesaurus—
Thesaurofacet and Classaurus are explained. This unit also looks at the Library of
Congress Subject Headings (LCSH) and Sears’ List of subject headings (SLSH) in
greater detail.
10.12 KEYWORDS
Associative Relationship : A relationship between or among terms in a
controlled vocabulary that leads from one term
to other terms that are related to or associated
with it; begins with the words SEE ALSO or
related term (RT).
Broader Term : A term to which another term or multiple terms
are subordinate in a hierarchy. In thesauri, the
relationship indicator for this type of term is BT.
Classaurus : A category-based (faceted) systematic scheme
of hierarchical classification in verbal plane
incorporating all the essential features of a
conventional IR thesaurus.
Controlled Vocabulary : An authority list of terms showing their
interrelationships and indicating ways in which
they may usefully be combined to represent
specific subject of a document.
Cross Reference : A mechanism available in the indexing language
for directing the user from term/phrase not used
as headings to the term/phrase that is used, and
from broader and related topics to the one
chosen to represent a given subject. Two types
of cross-references are used in the indexing
language: See and See also.
Descriptor : See Preferred term
Entry Term : The non-preferred term in a cross reference that
leads to a term in a controlled vocabulary. Also
known as “lead-in term.” In thesauri, the
relationship indicator for this type of term is U
(USE); its reciprocal is UF (USED FOR). See
also preferred term.
Entry Vocabulary : The set of non-preferred terms (USE references)
which lead to the preferred terms in a controlled
vocabulary.
Equivalence relationship : A relationship between or among terms in a
controlled vocabulary that leads to one or more
terms that are to be used instead of the term from
which the cross-reference is made; begins with
74 the word SEE or USE.
Free-floating Subdivision : A term that can be added to a subject heading in Indexing Languages
a published list, as needed, whether or not it is
written in the published list following that heading.
In the LCSH, terms called ‘free-floating’ have
scope notes that express limitations on the use
of such terms.
Hierarchical Relationship : A relationship between or among terms in a
controlled vocabulary that depicts broader
(generic) to narrower (specific) or whole-part
relationships; begins with the words broader term
(BT), or narrower term (NT).
Homograph : One of two or more words that have the same
spelling, but different meanings and origins. In
controlled vocabularies, homographs are
generally distinguished by qualifiers.
Indexing Language : An indexing language is a set of codes and their
admissible expression used for representing the
content of the documents as well as queries of
the users.
Indexing Term : The representation of a concept in an indexing
language, generally in the form of a noun or noun
phrase. Terms, subject headings, and heading-
subheading combinations are examples of
indexing terms. Also called descriptor.
Lexical Variant : Lexical variants are different word forms for the
same expression resulting from spelling variants,
grammatical variation and variation in abbreviated
formats.
Narrower Term : A term that is subordinate to another term or to
multiple terms in a hierarchy. In thesauri, the
relationship indicator for this type of term is NT.
Natural Language : A language used by human beings for verbal
communication.
Near-Synonym : A term whose meaning is not exactly synonymous
with that of another term, yet which may
nevertheless be treated as its equivalent in a
controlled vocabulary.
Pattern Heading : A pattern heading is a representative heading
from a category of terms that would normally be
excluded from the LCSH (e.g. names of
individuals), included as an example of normal
subdivision practice within that category.
Paradigmatic Relation : A paradigmatic relation, also called semantic or
generic relation, is document independent
relationship which usually finds expression in the
hierarchical organisation of the vocabulary itself.
This relationship is document independent
75
Indexing relationship because it is established without any
reference to a document.
Preferred Term : One of two or more synonyms or lexical variants
selected as a term for inclusion in a controlled
vocabulary.
Reciprocity : Semantic relationships in controlled vocabularies
must be reciprocal, that is each relationship from
one term to another must also be represented by
a reciprocal relationship in the other direction.
Reciprocal relationships may be symmetric, e.g.
RT / RT, or asymmetric e.g. BT / NT.
Related Term : A term that is associatively but not hierarchically
linked to another term in a controlled vocabulary.
In thesauri, the relationship indicator for this type
of term is RT.
Relationship Indicator : A word, phrase, abbreviation, or symbol used in
thesauri to identify a semantic relationship
between terms. Examples of relationship
indicators are UF (USED FOR), and RT (related
term).
Scope Note : A note following a term explaining its coverage,
specialised usage, or rules for assigning it.
Semantics : Semantics is a study of meaning. In an indexing
language, semantic relationship refers to the
hierarchical and non-hierarchical relationships
between the subjects and is governed by see and
sees also references in an index file. Controlled
vocabulary serves as the source for see and see
also references.
Subject Heading : A word or group of words (phrase) indicating a
subject under which all materials dealing with
same theme is entered in a catalogue or
bibliography, or is arranged in a file.
Subject Heading List : A subject heading list is an alphabetical list terms
and phrases, with appropriate cross references
and notes, that can be used as a source of
subject headings in order to represent the subject
content of a document.
Subject proposition : Name of a subject, i.e. subject heading
Syndetic Device : An organisational framework in which related
names, topics, etc. are linked to each other by
connective terms Like ‘See’ and ‘See also’.
Synonym : A word or term having exactly or very nearly the
same meaning as another word or term.
Syntagmatic relation : Syntagmatic relationship, also called syntactical
relationship, refers to those relationships which
are achieved through the syntactic devices—
76
word or term order and relators or linking Indexing Languages
mechanism used in the given indexing language.
This relationship is document dependent
relationship because it is established with
reference to the concepts associated with the
content of a given document.
Syntax : The grammatical structure consisting of a set of
rules that govern the sequence of occurrence the
terms in a subject heading.
Thesaurus : A thesaurus is a controlled and dynamic
vocabulary of semantically related terms used in
translating from the natural language of
documents, indexers or users into an indexing
language.
Vocabulary Control : The process of organising a list of terms (a) to
indicate which of two or more synonymous terms
is authorised for use; (b) to distinguish between
homographs; and (c) to indicate hierarchical and
associative relationships among terms in the
context of a controlled vocabulary or subject
heading list. See also controlled vocabulary.
77