
T5-DIGITAL THESAURI

CONTENT INDEX
1. Thesaurus concept
2. Thesaurus composition
3. Thesaurus typology
4. Standards for thesauri
5. Methodology for the preparation of thesauri
6. Maintenance and updating of thesauri
7. Digital thesauri

THESAURUS CONCEPT

A FIRST DEFINITION
- A thesaurus (plural, thesauri), also known as a synonym dictionary, is a reference work for finding synonyms,
and sometimes antonyms, of words.
- It is a tool aimed at finding the words that most accurately and appropriately express an idea.
- Forms of organization:
o Systematic presentation: hierarchical taxonomy of concepts.
o Alphabetical presentation.
o Graphical presentation: tree, network or arrow diagram.

DEFINITIONS IN THE FIELD OF LIBRARY AND INFORMATION SCIENCE (LIS)


- A thesaurus is a type of knowledge organization system that is made up of analyzed and standardized
terms that have semantic and functional relationships with each other. The thesaurus is organized under
strong terminological control, in order to provide a suitable tool for the storage and retrieval of information in
specialized areas. […] In certain cases, it adds a notation (BARITÉ, 2015).

- A thesaurus is a controlled and formally structured vocabulary, made up of terms that have semantic and
generic relationships between them: equivalence, hierarchical and associative. It is an instrument of
terminological control that allows converting the natural language of documents into a controlled language,
thus univocally representing the content of documents, in order to serve for indexing and document retrieval
(LAMARCA, 2013).

- LEVÉRY (1976) formulated one of the most concise definitions of the thesaurus, asserting that it is "a
bridge between the language of the informed (the documentalist) and the language of the uninformed (the
user)".

THESAURUS COMPOSITION

Lexical units
o Descriptors (preferred terms): words or groups of words retained in the thesaurus and
chosen from a group of equivalent terms. These are the authorized and formalized terms of a thesaurus, which
are used to unambiguously represent the concepts contained in documents and in information retrieval
requests.
• Single terms: used when the concept is clear in itself, without the need to add
any other words. Ex.: photography.
• Compound terms: used when several words (e.g.
adjective + noun) are needed to narrow or specify a concept. Ex.: digital photography.

o Non-descriptors (non-preferred terms): words included in the thesaurus that belong to a
list of synonyms or quasi-synonyms and related terms linked to the descriptors by a semantic equivalence
relationship. They are likely to appear in documents or in requests, but they are not used to
formulate the query to the system. These terms are intended to improve the consistency with which
documents and queries are represented, by directing the indexer or user to the indexing term.

Semantic relationships

o Equivalence: It is the relationship between descriptors (preferred terms) and
non-descriptors (terms not used in indexing) for the same concept. Represented by USE (which points to the
preferred or authorized term) and UF (USED FOR) (which points to the non-preferred or non-authorized terms).
Example: Informatics (non-authorized term) USE Computer Science (authorized term)
Example: Classification systems (authorized term) UF Classification schemes (non-authorized term)

o Associative: It indicates a link between the meanings of two descriptors. These are
symmetrical relationships between two descriptors that are likely to evoke each other by reciprocal
association of ideas. Represented by RT (related term).
Example: Scientific information RT Open science

o Hierarchical: It is the vertical relationship between all the descriptors of the same class,
expressed in terms of the subordination of the concepts (one term is superior or generic to another).
Represented by BT (broader term) and NT (narrower term). They can be categorized as:
o Generic/specific (a type of, or a class of). Example: Vertebrates (BT) →
Birds (NT)
o Whole/part (a part of). Example: Spain (BT) → Region of Murcia (NT)
o Enumerative (a case of). Example: Operating System (BT) → Microsoft
Windows (NT)
o Polyhierarchy (a term falling into two categories). Example: Wind
instruments (BT) → Organ (NT) | Keyboard instruments (BT) → Organ (NT)
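
The three relationship types above can be sketched as a small data structure. This is a minimal illustration, not a standard implementation: the class name `Thesaurus` and its method names are hypothetical, and the terms are taken from the examples in this section. The point it shows is reciprocity: every USE has a matching UF, every BT a matching NT, and RT is stored in both directions.

```python
from collections import defaultdict


class Thesaurus:
    """Toy in-memory thesaurus (hypothetical design) with reciprocal links."""

    def __init__(self):
        self.use = {}                # non-descriptor -> descriptor (USE)
        self.uf = defaultdict(set)   # descriptor -> non-descriptors (UF)
        self.bt = defaultdict(set)   # term -> broader terms (BT)
        self.nt = defaultdict(set)   # term -> narrower terms (NT)
        self.rt = defaultdict(set)   # term -> related terms (RT)

    def add_equivalence(self, non_descriptor, descriptor):
        # Equivalence: USE and UF are two views of the same link.
        self.use[non_descriptor] = descriptor
        self.uf[descriptor].add(non_descriptor)

    def add_hierarchy(self, broader, narrower):
        # Hierarchical: a term may have several broader terms (polyhierarchy).
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_related(self, a, b):
        # Associative: symmetrical, so it is recorded on both terms.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def preferred(self, term):
        # Follow USE if the term is a non-descriptor; otherwise return it as is.
        return self.use.get(term, term)


t = Thesaurus()
t.add_equivalence("Informatics", "Computer Science")
t.add_related("Scientific information", "Open science")
t.add_hierarchy("Vertebrates", "Birds")
t.add_hierarchy("Wind instruments", "Organ")
t.add_hierarchy("Keyboard instruments", "Organ")

print(t.preferred("Informatics"))  # Computer Science
print(sorted(t.bt["Organ"]))       # ['Keyboard instruments', 'Wind instruments']
```

Storing both directions costs a little memory but makes it hard to end up with a BT that has no matching NT.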

Scope notes (SN)


- They are intended to clarify, describe, explain and/or restrict the meaning of a term in a thesaurus. They
may be: historical, of application, definitional or explanatory.
Example: Telematics (SN: for pre-1982 use “Telecommunication” and “Informatics”).
THESAURUS TYPOLOGY

According to the approach used


- Faceted thesaurus: this is a thesaurus that combines a systematic faceted
classification (which breaks subjects down into multiple facets/dimensions/characteristics) with an
alphabetical thesaurus.
- Term thesaurus: relates a concept to the terms with which it is generally associated.

According to the language used


- Monolingual thesaurus: a thesaurus containing descriptors in a single language.
- Multilingual thesaurus: contains descriptors in more than one language.

According to its structure


- Linear: presents the descriptors in a simple way, without connections.
- Tree: built following a tree hierarchy. Each class has a generic descriptor and several
specific descriptors related in an ascending or descending order.
- Grid: built in the form of a network in which the descriptors intersect. Each class can
have several generic and specific descriptors.

According to its presentation


- Alphabetical: descriptors and non-descriptors are grouped in a single alphabetical
sequence along with their relationships.
- Systematic: structured in two parts. The first one (main) contains the categories or
hierarchies; the second (auxiliary) consists of an alphabetical index that leads users to the
section to which the term belongs.
- Graphic: terms are presented in the form of a graphic figure in which related terms are
linked. This graphic representation is usually a tree, a network or an arrow diagram.
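
As an illustration of the alphabetical presentation, the sketch below prints descriptors and non-descriptors in one alphabetical sequence, each followed by its relationships. The function name `alphabetical_display`, the input layout (a dict of term to relationship lists) and the tag order are assumptions made for this example; real thesauri follow the display rules of their own standard or software.

```python
def alphabetical_display(entries):
    """Return the lines of a single-sequence alphabetical display.

    `entries` maps each term to a dict of relationship tag -> list of terms.
    The tag order below is an assumption for this sketch.
    """
    tag_order = ["USE", "UF", "BT", "NT", "RT"]
    lines = []
    for term in sorted(entries, key=str.casefold):
        lines.append(term)
        for tag in tag_order:
            for target in sorted(entries[term].get(tag, []), key=str.casefold):
                lines.append(f"  {tag} {target}")
    return lines


entries = {
    "Computer Science": {"UF": ["Informatics"]},
    "Informatics": {"USE": ["Computer Science"]},
    "Birds": {"BT": ["Vertebrates"]},
    "Vertebrates": {"NT": ["Birds"]},
}

display = alphabetical_display(entries)
print("\n".join(display))
```

A systematic presentation would instead walk the BT/NT hierarchy from the top terms down and use a list like this one only as its auxiliary index.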

STANDARDS FOR THESAURI

ISO 25964-1:2011. Information and documentation. Thesauri and interoperability with other
vocabularies. Part 1: Thesauri for information retrieval (will be replaced by ISO/AWI 25964-1, now
under development).

- It gives recommendations for the development and maintenance of thesauri (monolingual and multilingual)
intended for information retrieval applications. It is applicable to vocabularies used for retrieving information
about all types of information resources, irrespective of the media used (text, sound, still or moving image,
physical object or multimedia) including knowledge bases and portals, bibliographic databases, text,
museum or multimedia collections, and the items within them.

- It also provides a data model and recommended format for the import and export of thesaurus data.

ISO 25964-2:2013. Information and documentation. Thesauri and interoperability with other
vocabularies. Part 2: Interoperability with other vocabularies.

- It is applicable to thesauri and other types of vocabulary that are commonly used for information retrieval.
It describes, compares and contrasts the elements and features of these vocabularies that are implicated
when interoperability is needed. It gives recommendations for the establishment and maintenance of
mappings between multiple thesauri, or between thesauri and other types of vocabularies.

ANSI/NISO Z39.19-2005 (R2010). Guidelines for the Construction, Format, and Management of
Monolingual Controlled Vocabularies.

- “Presents guidelines and conventions for the contents, display, construction, testing, maintenance, and
management of monolingual controlled vocabularies. It focuses on controlled vocabularies that are used
for the representation of content objects in knowledge organization systems including lists, synonym
rings, taxonomies, and thesauri”.
METHODOLOGY FOR THE PREPARATION OF THESAURI
How do you build a thesaurus? (I)

• Gather terms from as many sources as possible (e.g., users, subject experts, documents, other existing
knowledge organization systems, etc.).
• Entry terms should include synonyms and abbreviations, acronyms, and alternative spellings for all of the
important concepts in your document collection.
• Define the preferred terms.
• Create guidelines for selecting preferred terms. For example, in a collection of health-related documents
that include terms such as cancer, oncology, skin, and dermatology, decide whether to select medical
terminology or regular English as the preferred terms, according to your primary audience.
• Whichever terminology you choose, it's important to be consistent in the way you define the
preferred terms.

• Link synonyms and near-synonyms. This is where you map the synonyms, abbreviations, acronyms, and
alternate spellings as "variant terms" to the preferred terms.
• The more entry terms you have, the easier it will be for indexers and users to find the preferred terms.
• Group preferred terms by subject. This forms the foundation of your thesaurus hierarchy.
• Definition of the subject hierarchy should be informed by a balance of top-down considerations (e.g.,
mission, vision, intended audiences) and bottom-up content analysis.

• Identify broader and narrower terms.


• Define where each term fits within the hierarchy.
• Existing thesauri (or other KOS) in your subject area or sector can be very useful in generating ideas for
broader and narrower terms.
• Establish associative relationships. The definition of related terms is highly subjective.
• For each term ask the question: "Where will users want to go from here?"
• Choose only the most obvious and important relationships.
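
The step of linking variant terms to preferred terms can be sketched as a lookup index. This is a minimal sketch under stated assumptions: the function names `normalize` and `build_entry_index` are hypothetical, and the sample vocabulary assumes an audience for which medical terminology was chosen as preferred. The groupings are illustrative only.

```python
def normalize(term):
    """Fold case and collapse whitespace so lookups tolerate minor variation."""
    return " ".join(term.casefold().split())


def build_entry_index(preferred_to_variants):
    """Map every entry term (preferred or variant) to its preferred term."""
    index = {}
    for preferred, variants in preferred_to_variants.items():
        for label in [preferred, *variants]:
            key = normalize(label)
            # An entry term must lead to exactly one preferred term.
            if key in index and index[key] != preferred:
                raise ValueError(
                    f"{label!r} maps to both {index[key]!r} and {preferred!r}"
                )
            index[key] = preferred
    return index


# Illustrative data only (assumption): medical terms chosen as preferred.
vocabulary = {
    "Oncology": ["cancer", "cancers"],
    "Dermatology": ["skin", "skin diseases"],
}

index = build_entry_index(vocabulary)
print(index.get(normalize("  Skin   Diseases ")))  # Dermatology
print(index.get(normalize("cardiology")))          # None
```

The conflict check reflects the consistency rule above: if the same variant were attached to two preferred terms, indexers and users would no longer be sent to a single indexing term.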

MAINTENANCE AND UPDATING OF THESAURI


• A thesaurus is never finished.
• The content and the terms used to describe concepts within that content will continue to grow and
evolve.
• New terms must be added, old terms deleted, and relationships between terms revisited.
• You should always be on the lookout for new variant terms.

DIGITAL THESAURI
THESAURUS.COM, UNESCO THESAURUS, VISUAL THESAURUS, VISUWORDS

SKOS (Simple Knowledge Organization System) is a W3C initiative in the form of an RDF application that
provides a model for representing the basic structure and content of concept schemes such as subject
heading lists, taxonomies, classification schemes, thesauri and any other kind of controlled vocabulary.

SKOS is a W3C standard that provides a set of terms, classes and properties to describe concepts and
relationships between them, enabling the creation of interoperable controlled vocabularies. Some of the key
terms in SKOS include "Concept" (skos:Concept), "Label" (skos:prefLabel, skos:altLabel), "Broader"
(skos:broader), "Narrower" (skos:narrower), "Related" (skos:related), and "Exact Match" (skos:exactMatch),
among others. Because SKOS is based on the Resource Description Framework (RDF), these representations are
machine-readable and can be exchanged between software applications and published on the World Wide
Web.
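
To make the mapping from thesaurus relationships to SKOS concrete, the sketch below writes a few concepts as RDF in Turtle syntax using plain string formatting. It is an illustration, not a full serializer: the function name `to_skos_turtle`, the `ex:` prefix and the `http://example.org/thesaurus/` base URI are assumptions, labels are not escaped, and concept identifiers are assumed to be simple names without spaces. It treats descriptors as skos:prefLabel, non-descriptors (UF) as skos:altLabel, BT as skos:broader and RT as skos:related, which is a common but simplified correspondence.

```python
def to_skos_turtle(concepts, base="http://example.org/thesaurus/"):
    """Serialize a dict of concepts as SKOS in Turtle (simplified sketch).

    `concepts` maps a concept id to a dict with keys "pref" (required) and
    optional "alt", "broader" and "related" lists. Labels are assumed to
    contain no quotes or backslashes, since no escaping is done here.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{base}> .",
        "",
    ]
    for concept_id, data in concepts.items():
        pref = data["pref"]
        statements = ["a skos:Concept", f'skos:prefLabel "{pref}"@en']
        statements += [f'skos:altLabel "{alt}"@en' for alt in data.get("alt", [])]
        statements += [f"skos:broader ex:{b}" for b in data.get("broader", [])]
        statements += [f"skos:related ex:{r}" for r in data.get("related", [])]
        lines.append(f"ex:{concept_id} " + " ;\n    ".join(statements) + " .")
        lines.append("")
    return "\n".join(lines)


concepts = {
    "vertebrates": {"pref": "Vertebrates"},
    "birds": {"pref": "Birds", "broader": ["vertebrates"]},
    "computerScience": {"pref": "Computer Science", "alt": ["Informatics"]},
}

turtle = to_skos_turtle(concepts)
print(turtle)
```

In practice an RDF library would normally handle escaping and validation; the string-based version is used here only to keep the example free of dependencies.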