
Revisiting the Syntactical and Structural Analysis of Library of Congress Subject Headings for the Digital Environment

Kwan Yi
School of Library and Information Science, University of Kentucky, 331 Little Library Building, Lexington,
Kentucky 40506-0224. E-mail: kwan.yi@uky.edu

Lois Mai Chan
School of Library and Information Science, University of Kentucky, 337 Little Library Building, Lexington,
Kentucky 40506-0224. E-mail: loischan@uky.edu

With the current information environment characterized by the proliferation of digital resources, including collaboratively created and shared resources, Library of Congress Subject Headings (LCSH) is facing the challenges of effective and efficient subject-based organization and retrieval of digital resources. To explore the feasibility of utilizing LCSH in a digital environment, we might need to revisit its basic characteristics. The objectives of our study were to analyze LCSH in both syntactic and relational structures, to discover the structural characteristics of LCSH, and to identify problems and issues for the feasibility of LCSH as an effective subject access tool. This study reports and discusses issues raised by the syntactic and hierarchical structures of LCSH that present challenges to its use in a networked environment. Given the results of this study, we recommend a number of provisional future directions for the development of LCSH toward becoming a viable system for digital and networked resources.

Introduction

Since its inception in the 19th century, Library of Congress Subject Headings (LCSH) has been widely used in the library community as a tool for creating and providing subject access points to traditional library collections. LCSH is a subject vocabulary covering all general disciplines, the largest general indexing vocabulary in the English language, and a de facto universal controlled vocabulary (O’Neill & Chan, 2003).

In the past decade, however, the long-standing traditional role of LCSH has been expanded and applied to the control of digital resources. A series of projects and applications have utilized LCSH for subject access of digital information in an automated manner: an automatic assignment of LCSH to cataloging records for electronic resources (Frank & Paynter, 2005); an association of LCSH to books in the Google Book Search project to facilitate access to a group of works on the same subject (Riley, 2007); and a potential linking of social tags and LCSH with the goal of organizing networked resources through a combination of tags and LCSH (Yi & Chan, 2009). Regarding the question of where LCSH stands currently in becoming a viable system in networked environments, Chan (2000, p. 172) has stated, “LCSH can become a versatile system that is capable of functioning in heterogeneous environments and can serve as the united basis for supporting diversified uses while maintaining semantic interoperability among them.”

Furthermore, in a networked environment, a vast number of heterogeneous information resources in distributed information repositories have been organized with different controlled vocabularies and different organization systems. The issue of interoperability of controlled vocabularies has been raised with regard to searching across repositories and across controlled vocabularies (Zeng & Chan, 2004). Interoperability between LCSH and a number of other controlled vocabularies, including Medical Subject Headings (MeSH; Olson, 2001), Dewey Decimal Classification (DDC; Vizine-Goetz, 1996), and the Education Resources Information Center (ERIC) thesaurus (Vizine-Goetz, Hickey, Houghton, & Thompson, 2004), has been explored in several projects in recent years.

Automated LCSH-based subject indexing of digital information entails the following requirements:

• Automated identification of the appropriate subjects of digital resources
• Automated selection of LC subject headings relevant to the identified subjects

Received October 23, 2009; revised November 23, 2009; accepted November 25, 2009
© 2010 ASIS&T • Published online 28 January 2010 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.21295

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 61(4):677–687, 2010
To meet these requirements, the syntactic and semantic structures of LCSH must be explored. The Association for Library Collections and Technical Services (ALCTS) Subcommittee on Metadata and Subject Analysis “recommends separating the consideration regarding semantics from that relating to application syntax, in other words, distinguishing between the vocabulary (LCSH per se) and the indexing system (i.e., how LCSH is applied in a particular implementation). This recommendation involves several important concepts that need to be reviewed. Semantics and syntax are two distinct aspects of a controlled vocabulary” (Chan, 2000, p. 167).

The current digital information environment may require us to revisit LCSH and its use in organizing Internet resources so that a blend of diverse information resources from traditional library collections, networked resources, and digital resources can be organized under a single umbrella of LCSH. To explore the feasibility of utilizing LCSH in the current environment, its basic characteristics should be examined in the context of automated assignment of LC subject headings to diverse resources. The objective of the present study is to analyze LCSH in both syntactic and relational structures, to discover the structural characteristics of LCSH, and to identify problems and issues for the feasibility of LCSH as an effective subject control tool for traditional and digital resources. This study attempts to answer the following three research questions and discuss their implications for employing LCSH in the current digital environment:

• What are the characteristics of LCSH in syntactic structure?
• What are the characteristics of LCSH in local hierarchical structure?
• What are the characteristics of LCSH in global hierarchical structure?

Related Work

We first examine previous research on the analysis of LCSH itself and then review research projects and applications with regard to the LCSH-based access and organization of digital resources in the past 15 years.

Analysis of LCSH

LCSH has long been the target of research in subject-centered information organization. However, there have been few comprehensive studies analyzing the entire LCSH itself. Among the few is an OCLC research project (Markey & Vizine-Goetz, 1988), which examined the characteristics of authority records for topical and geographic headings in LCSH and the cross-reference structure (see also references and see also from tracings) in the records. Based on the analysis of 160,706 subject authority machine-readable cataloging (MARC) records, the project reported statistical results of various components: topical and geographical subject headings, orphan headings, subdivided and unsubdivided established headings, and subject heading forms. Recently, in a study of the development of an LCSH visualization tool, Yi and Chan (2008) presented a display of hierarchical structures (tree structures) embedded in the entire LCSH via a complete set of subject authority records. The LCSH visualization tool also can be used to identify locations of a single heading in multiple trees along with their paths within the hierarchical structures.

In other studies, LCSH was closely examined in relation to other controlled vocabularies. Khosh-khui’s (1985) dissertation examined the association between LCSH and standard library classification schemes via a set of cataloging records. Larson (1992) attempted to automatically classify bibliographic MARC records into Library of Congress Classification (LCC) based on title and subject headings appearing in the bibliographic records; his automated classification method using a vector space model was built and tested only for the LC class Z. Vizine-Goetz (1998) conducted a study of mapping the most relevant LC subject headings to DDC numbers. Her study measured the similarity between LCSH and DDC by log-likelihood ratio and calculated the most closely associated pairs of LCSH and DDC numbers. A more recent OCLC research project (Vizine-Goetz, Hickey, Houghton, & Thompson, 2004) summarized various OCLC vocabulary projects exploring the direct or indirect mappings among various subject heading lists (including LCSH), thesauri, and library classification schemes.

Similar to the OCLC project (Markey & Vizine-Goetz, 1988), the present study examines and analyzes a full set of LC subject headings. Our study goes beyond the OCLC project in two aspects: (a) analyzing LC subject headings in the context of the hierarchical structures embedded in LCSH and (b) re-examining and updating the study of syntactic structures in LCSH nearly two decades after the OCLC project.

LCSH-Based Access and Organization of Digital Resources

LCSH and Folksonomy. Collaborative tagging (often called social tagging or social bookmarking) is a new approach in collecting and organizing information resources on the Web. In this context, information resources are selected, collected, and indexed by information users or seekers rather than by information professionals. Web applications and systems created and operated on the basis of social tagging are collectively called Web 2.0. In a Web 2.0 application, users create and assign their own keywords or tags to resources. A set of such user tags is collectively called a folksonomy (Vander Wal, 2007). Folksonomy plays a major role in organizing, accessing, and retrieving resources in the Web 2.0 environment. This is the opposite approach of using controlled vocabularies in information organization and control.

Advantages and disadvantages of folksonomies as opposed to controlled vocabularies in information management have been widely discussed and debated (Mathes, 2004;

Quintarelli, 2005). To achieve the best result in information organization and indexing, a bridge between folksonomies and controlled vocabularies is often suggested (Rosenfeld, 2005; Rolla, 2009; Chan, 2009). Yi (2008) proposed a conceptual framework for integrating LCSH into a folksonomy-based information application to improve information retrieval (IR). Yi and Chan (2009) conducted a close examination of the overlap between LCSH and folksonomy in Delicious (http://delicious.com) and the distribution of folksonomy over the hierarchical structure embedded in LCSH. Chan (2009) examined the practice of social tagging in LibraryThing and compared it to LCSH-based traditional indexing. A more practical mapping between folksonomy and LCSH can be found in a recent study by Yi (in review), which compares the semantic similarity of LC subject headings and social tags using five different statistical metrics. The recent studies mentioned above represent a series of attempts to link folksonomy and LC subject headings, moving toward an integration of Web 2.0 resources and traditional information systems using LCSH.

Google Book Search and LibraryThing

Google Book Search (http://books.google.com/) is a Google project that digitizes books in individual libraries and makes their textual content universally accessible, with the goal of 15 million books scanned over the next decade (New York Times). LibraryThing (http://www.librarything.com) is a social cataloging Web application to help people catalog their books and to store and share personal library catalogs and book lists, with about 43 million books cataloged as of August 18, 2009. LC subject headings play a direct or indirect role of bridging the services via the OCLC WorldCat bibliographic database so that easy access to similar items across different applications on the same subject becomes viable (Riley, 2007).

INFOMINE

INFOMINE (http://infomine.ucr.edu/) is a virtual library of selected Internet resources aimed at university-level academic personnel such as faculty, students, and teaching and research staff. The virtual library collection includes scholarly Internet resources such as databases, electronic journals and books, articles, and online library card catalogs (http://infomine.ucr.edu/about/index.shtml). Each cataloging record was manually prepared for an Internet resource in INFOMINE, and an LCC number was automatically assigned to the record. The appropriate LCC number is estimated from LC subject headings, using a support vector machine (SVM) algorithm (Frank & Paynter, 2004). Frank and Paynter extended Larson’s study by estimating LCC numbers for all LCC classes and demonstrated the practical use of these numbers. Manually assigned LC subject headings and automatically assigned LCC numbers are two essential elements of the INFOMINE cataloging records.

BUBL

BUBL (http://bubl.ac.uk/), originally called “Bulletin Board for Libraries,” is an information service comprising selected Internet resources covering all academic subject areas, particularly for the library and higher education communities (Gold, 1996). BUBL uses LCSH for specific subject categories of the resources and the DDC system to classify the resources. As in INFOMINE, all stored resources are cataloged and assigned LCSH-based subject keywords, DDC numbers, etc. Also as in INFOMINE, LC subject headings are manually assigned to records (Dawson, 1997).

InterCat: A Catalog of Internet Resources

This project (July, 1995), led by OCLC, was a collaborative effort at creating a database of MARC bibliographic records for Internet resources. More than 200 voluntary participants from libraries and information centers were involved in the creation of a collection of about 2,500 records. LCSH was a source for subject entry of the records. The database was publicly available via the InterCat Catalog Web site (http://www.oclc.org:6990). However, it is no longer available.

Electronic Journal Miner

Electronic Journal Miner was a project for collecting a wide range of electronic serials, including e-magazines and newsletters, which were offered by the publisher and available over the Internet. To facilitate easy access to e-journals, a series of indexes including LC subject headings was provided. The Web site no longer exists.

In the previous attempts at organizing and accessing networked digital resources using LCSH described above, the assignment of LC subject headings to resources has been made manually in all cases, a great challenge in view of the massive size of resources available. So far, automatic methods for labelling resources with subject terms remain elusive. With the enormously popular activity of social tagging of networked resources, a social tag-based approach through linking folksonomy and LCSH appears feasible toward the goal of automatic assignment of subject terms. The motivation for the present study can be traced to the studies (Vizine-Goetz, Hickey, Houghton, & Thompson, 2004; Yi & Chan, 2009) of linking LCSH to other controlled vocabularies and folksonomy. The hypothesis is that extracting the syntactic structure and semantic relationships existing in LCSH and applying this syntactic and semantic information to the linking study can improve the quality of the linking. The next section explores the embedded hierarchical structures and syntactical characteristics of LCSH.

Structures Embedded on Library of Congress Subject Headings

A main heading in LCSH can be linked to two different levels of LCSH-embedded hierarchical structures: one on a

local level and the other on a global level. The local-level hierarchical structure of an LC subject heading refers to a list of subject headings that have a hierarchical relationship with the heading at one immediate level above or below in their hierarchical positions (broader or narrower). The global-level hierarchical structure of an LC subject heading is similar to the local-level, except that subject headings are hierarchically related at one or more levels. Based on the hypothesis that the hierarchical structures of a given subject heading provide semantic contexts of the heading, the two immediate levels of hierarchical relationships of a subject heading can be viewed as two layers of semantic information related to a particular subject heading. These two structures are called local relational structure and global hierarchical structure. Each of the structures will be described below.

Local Relational Structure

A local relational structure (LRS) of LCSH comprises a manifestation of a main heading and its immediately related terms in equivalence, hierarchical, and associative relationships. There is a one-to-one relationship between an LRS and a main heading: given a main heading, a corresponding LRS exists, and vice versa. There is no LRS for lead-in terms.

There are three relationships that may occur in an LRS. First, the equivalence relationship exists between a valid or authorized heading and its synonyms. Reciprocal references, expressed by the codes USE and UF (meaning used for), are made between the authorized or preferred heading and the non-preferred terms, for example,

Subject analysis
    USE Subject cataloging
Subject cataloging
    UF Subject analysis

The term “Subject cataloging” is an authorized heading. The term “Subject analysis” is an unauthorized term corresponding to the authorized heading.

Second, the hierarchical relationship, expressed by the codes BT for broader term and NT for narrower term, links an established or preferred heading A (Cataloging) to another established heading B (Subject cataloging), such that heading B is a member (i.e., a narrower term) of the class represented by the heading A, and that the heading A is a broader term over B. As the terms involved in BT or NT are directly above and below each other in the hierarchy, they must be matched by the reversed relationship, as demonstrated by the following example:

Subject cataloging
    BT Cataloging
Cataloging
    NT Subject cataloging

In this case, the first entry will appear in an LRS for the heading “Subject cataloging” and the second one will appear in an LRS for the heading “Cataloging.”

Third, the associative relationship, expressed by the code RT meaning related term, indicates two headings associated in some manner other than hierarchical. For example,

Ornithology
    RT Birds
Birds
    RT Ornithology

As shown in the example above, the associative relationship is reciprocal. That is, when a heading A is in an associative relationship with another heading B, then B is also in an associative relationship with A. The first entry will appear in an LRS for the authorized heading of “Ornithology” and the second entry will appear in an LRS for the authorized heading of “Birds.”

Hierarchical references enable the display of systematically related headings that are more general or more specific than the heading being consulted. No matter the level at which one enters the hierarchy, one can follow either BTs or NTs to find the broadest or most specific heading available. The following headings illustrate this:

An example of an LRS for the established heading “Subject cataloging” is shown below:

Subject cataloging
    UF  Subject analysis
    BT  Cataloging
        Content analysis (Communication)
        Indexing
    NT  Classification—Books
        Online library catalogs—Subject access
        Subject headings
        World Wide Web—Subject access

This specific LRS has one UF, three BTs, and four NTs without any RT. Based on the display of this LRS, the user may infer that “Subject cataloging” will appear in three other LRSs as an NT and in four other LRSs as a BT, because of the reciprocal property between BT and NT.

Global Hierarchical Structure

A global hierarchical structure (GHS) of LCSH is a display of a group of preferred subject headings that are hierarchically connected to each other directly or indirectly. As the hierarchical relationship among the LC subject headings is presented in LRSs, GHSs can be built from LRSs. That is, a GHS is a concatenation of all possible LRSs that can be linked to each other in terms of BT/NT relationships. Figure 1 displays a portion of a GHS containing the “Subject cataloging” heading.

As shown in Figure 1, a GHS can be represented by a tree that comprises nodes (each of which corresponds to an entry in a row) and links (each of which is represented by left indentation) between them. In a GHS tree, an established LC subject heading is denoted by a tree node. Broader terms are denoted by the parent (upper in indentation) nodes of

established terms, and narrower terms are represented by the child (lower in indentation) nodes of established terms. Headings at the same indentation are called sibling nodes.

FIG. 1. A part of a GHS tree including “Subject cataloging” as a tree node, obtained from using the LCSH visualization tool (Yi and Chan, 2009).

Compare the GHS shown in Figure 1 with the “Subject cataloging” LRS in the LRS section. The LRS shows that the heading “Subject cataloging” has three different BTs and four different NTs. The GHS in Figure 1 shows only one BT and one NT. The rest of the BTs and NTs can appear in other parts of the same GHS or in other GHSs. Also, note that the BT or NT used in this GHS can occur in other GHSs as well.

Building the GHSs implied in LCSH in this way yields a set of n independent GHS trees, independent in that there is no hierarchical link between any two of the n trees. In other words, there is no hierarchical relationship found between any two nodes in different trees. This independence property leads us to the hypothesis that each of the n GHS trees represents and supports a separate concept or topic and that no two concepts represented by two independent trees overlap in hierarchy. The overall hierarchical structure of LC subject headings can, therefore, be viewed as n independent GHS trees.

Analysis of LCSH: Results

Dataset

Instances of LCSH can be found in diverse sources: printed versions; bibliographic records; Web applications through the LC Web site (http://authorities.loc.gov) or Classification Web (http://classificationweb.net/); and electronic versions available for purchase at the LC Web site (http://www.loc.gov/cds/contact.html). Recently, LC released a set of LC subject authority records as a public domain dataset. For this study, the June 5, 2009, RDF/XML version of LC authority records was downloaded from the LC site (http://id.loc.gov/authorities/search/). It comprises 342,684 subject authority records. A series of computer programs in Perl was written for the analysis.

Part-of-Speech Tagging

To identify adjectival phrase headings, two popular part-of-speech (POS) taggers were employed: MontyTagger version 1.2 (http://web.media.mit.edu/~hugo/montytagger/) and TreeTagger (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html#Linux). A POS tagger algorithm reads text and assigns each word of the text an appropriate POS tag, such as noun, verb, adjective, etc. The two taggers were built on the basis of different approaches: rule-based and probabilistic. MontyTagger, a rule-based POS tagger, implements Eric Brill’s transformation-based learning POS tagger (Brill, 1994), which is known as an effective tagger (Hasan, UzZaman, & Khan, 2007). TreeTagger (Schmid, 1994), developed at the Institute for Computational Linguistics of the University of Stuttgart, is one of the popular probabilistic POS taggers (Marcus, Santorini, & Marcinkiewicz, 1993). Both taggers were constructed on the basis of the same comprehensive text corpus, the Penn Treebank tag set (http://www.cis.upenn.edu/~treebank/).

Syntactic Analysis of LC Subject Headings

The set of subject authority records comprises 341,745 preferred headings (or established headings) and 939 annotated card (AC) headings. AC headings, a separate set of subject headings exclusively for juvenile collections, are not included in this study. The remaining preferred headings are separated into two groups of headings: unsubdivided and subdivided. Of the 341,745 preferred headings, there are 231,488 unsubdivided subject headings (about 67.7%) and 110,257 subdivided subject headings (about 32.3%). Of the established headings, 80,400 (23.52%) contain one subdivision, and 29,857 (8.74%) have two or more subdivisions.

The percentages of headings are closely comparable to the outcome of an earlier study (Markey & Vizine-Goetz, 1988), which examined a total of 160,706 established headings, less than 50% of the total number of headings used in the present study. Markey and Vizine-Goetz (1988, p. 67) reported 68.35% (vs. 67.7% from our result), 23.01% (vs. 23.52%), and 8.64% (vs. 8.74%) for the percentages of headings with no subdivision, one subdivision, and multiple subdivisions, respectively. Even with new headings added during the last two decades, the distribution of headings in terms of subdivision has remained more or less the same.

Table 1 shows the frequency distribution of LC subject headings over the number of words per subject heading. As shown in the table, two-term subject headings (36.9%) are the most common. After that, three-term (23.9%), four-term (14.5%), and single-term (12.6%) subject headings follow in decreasing order of frequency. The two longest headings contain 15 words each. The pattern of frequency distribution is shown in Table 1: except in the case of single-term headings, as the number of terms per heading decreases, the number of headings increases.

Table 2 shows a frequency distribution of LC subject headings with one or more subdivisions over the number of subdivisions. Among subdivided subject headings, 72.9% are subdivided once, and 27.1% are subdivided multiple times. The largest number of subdivisions per subject heading is five. As the number of subdivisions per subject heading increases, the number of headings sharply decreases.
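
As a rough illustration of how the distributions reported in Tables 1 and 2 could be tallied, the following Python sketch counts words per unsubdivided heading and subdivisions per subdivided heading. It is not the Perl code used in the study: the sample headings, the function names, and the use of "--" as the subdivision separator in the input are assumptions made for illustration only.

```python
from collections import Counter

# Illustrative sample only; the study used the full set of LC authority records.
# Assumption: subdivisions are separated by "--" in the input strings.
HEADINGS = [
    "Cataloging",
    "Subject cataloging",
    "Chemistry, Organic",
    "Classification--Books",
    "Online library catalogs--Subject access",
    "World Wide Web--Subject access",
]


def split_heading(heading, sep="--"):
    """Return the main heading and the list of its subdivisions."""
    main, *subdivisions = heading.split(sep)
    return main, subdivisions


def distributions(headings):
    """Tally words per unsubdivided heading and subdivisions per subdivided heading."""
    words_per_heading = Counter()         # unsubdivided headings only (cf. Table 1)
    subdivisions_per_heading = Counter()  # subdivided headings only (cf. Table 2)
    for heading in headings:
        main, subdivisions = split_heading(heading)
        if subdivisions:
            subdivisions_per_heading[len(subdivisions)] += 1
        else:
            # Words are taken to be whitespace-separated tokens.
            words_per_heading[len(main.split())] += 1
    return words_per_heading, subdivisions_per_heading


words, subs = distributions(HEADINGS)
for n_words, count in sorted(words.items()):
    print(f"{n_words} word(s): {count} heading(s)")
for n_subs, count in sorted(subs.items()):
    print(f"{n_subs} subdivision(s): {count} heading(s)")
```

Treating a word as a whitespace-separated token is a simplification; a heading such as “Chemistry, Organic” is counted as two words under this rule.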

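The relationship between LRSs and GHSs described earlier can also be sketched in code. The Python fragment below derives NT entries as the reciprocal of BT entries and then groups headings that are connected, directly or indirectly, by BT/NT links. It is a simplified reading under stated assumptions, not the authors' procedure: the toy data, the function names, and the treatment of a GHS as a connected group of headings are illustrative choices, and the tree display of the visualization tool (in which one heading may occupy several positions) is not reproduced.

```python
from collections import defaultdict, deque

# Toy data: heading -> list of broader terms (BT). The BT entries for
# "Subject cataloging" follow the LRS example in the text; treating "Birds"
# as unlinked to the other headings is an assumption of this sketch.
BROADER = {
    "Subject cataloging": ["Cataloging", "Content analysis (Communication)", "Indexing"],
    "Subject headings": ["Subject cataloging"],
    "Cataloging": [],
    "Content analysis (Communication)": [],
    "Indexing": [],
    "Birds": [],
}


def narrower_index(broader):
    """Derive NT entries as the reciprocal of the BT entries."""
    narrower = defaultdict(list)
    for heading, broader_terms in broader.items():
        for bt in broader_terms:
            narrower[bt].append(heading)
    return narrower


def ghs_groups(broader):
    """Group headings that are linked directly or indirectly by BT/NT."""
    narrower = narrower_index(broader)
    seen, groups = set(), []
    for start in broader:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first walk over BT and NT links
            heading = queue.popleft()
            group.add(heading)
            neighbours = list(broader.get(heading, [])) + narrower.get(heading, [])
            for neighbour in neighbours:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


groups = ghs_groups(BROADER)
print(len(groups), "independent group(s)")
```

By construction no BT/NT link crosses two groups, which mirrors the independence property stated for the n GHS trees.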
TABLE 1. Frequency distribution of a total of 231,488 unsubdivided Library of Congress subject headings.

No. of words per heading    No. of headings (ratio)
 1                          29,293 (12.6%)
 2                          85,484 (36.9%)
 3                          55,315 (23.9%)
 4                          33,462 (14.5%)
 5                          16,060 (6.9%)
 6                           6,953 (3.0%)
 7                           2,639 (1.1%)
 8                           1,292 (0.6%)
 9                             561 (0.2%)
10                             230 (0.1%)
11                             121 (0.1%)
12                              43 (0.0%)
13                              23 (0.0%)
14                              10 (0.0%)
15                               2 (0.0%)

TABLE 2. Frequency distribution of a total of 110,257 subdivided Library of Congress subject headings.

No. of subdivisions    No. of headings (ratio)
1                      80,400 (72.9%)
2                      26,454 (24.0%)
3                       3,164 (2.9%)
4                         226 (0.2%)
5                          13 (0.0%)

TABLE 3. Syntactical analysis of Library of Congress subject headings.

Syntactic structure                                                                  No. of subject headings (ratio)
Adjectival phrase headings (with TreeTagger)                                         43,195 (12.6%)
Adjectival phrase headings (with MontyTagger)                                        38,737 (11.3%)
Conjunctive phrase headings                                                           9,061 (2.7%)
Prepositional phrase headings (with a list of prepositions)                          19,117 (5.6%)
Prepositional phrase headings (with TreeTagger)                                      20,304 (5.9%)
Prepositional phrase headings (with MontyTagger)                                        327 (0.1%)
Inverted phrase headings                                                             45,426 (13.3%)
Free-floating phrase headings with the current free-floating components only            477 (0.1%)
Free-floating phrase headings with the current free-floating components plus old ones 6,313 (1.8%)
Qualifier                                                                            74,481 (21.8%)
  Qualifier without any heading for preceding term                                   73,415
  Qualifier with a separate heading for preceding term                                1,066

We then analyzed LCSH in its syntactical structure. Each LC subject heading, whether topical/form heading or heading for named entities, comprises one or more words (refer to Table 1). We focus here on counting various forms of multiple-word main headings (phrase headings) from a total of 231,488 unsubdivided LC subject headings. The phrase structure of LC subject headings displays several patterns: adjectival, conjunctive, prepositional, inverted, and free-floating (Chan, 2008, pp. 218–219). Table 3 shows the results of an analysis of LCSH phrase patterns.

In LCSH, adjectival phrase headings consist of a noun or noun phrase with an adjectival modifier in the form of a common adjective or a noun modifier (Chan, 2008). TreeTagger and MontyTagger were used to count all possible forms. As shown in Table 3, the number of adjectival phrase headings found with TreeTagger is 43,195 (12.6%) and that with MontyTagger is 38,737 (11.3%).

Conjunctive phrase headings contain the word “and” to connect two or more nouns, noun phrases, or both with or without modifiers. Conjunctive phrase headings were counted by checking whether each heading contains the word “and,” which yielded a result of 9,061 (2.7%).

Prepositional phrase headings contain a preposition connecting two or more nouns, noun phrases, or both with or without modifiers. Various prepositions occur in this type of heading. There is a list of 70 of the more common one-word prepositions at http://www.englishclub.com/vocabulary/prepositions.htm. Prepositional phrase headings were examined in three different ways: (a) headings containing at least one preposition from the list above without any semantic consideration, (b) headings counted using TreeTagger, and (c) headings counted using MontyTagger. Both TreeTagger and MontyTagger use the same tag coding system, the Penn Treebank tag set. The tag set poses an ambiguity in part-of-speech tags. For example, the Penn Treebank tag set uses “IN” for a preposition or a subordinating conjunction. It is therefore often difficult to distinguish the two components even by looking at the following tags because they can share a clause as a following tag. Consequently, all headings including the tag “IN” were counted for preposition or subordinating conjunction, and the result was used as the upper limit of the number of prepositional phrase headings. The frequencies of prepositional phrase headings are 19,117 (5.6%) when counted with a list of prepositions, 20,304 (5.9%) with TreeTagger, and 327 (0.1%) with MontyTagger.

In inverted phrase headings, the order of words is inverted, so that the more significant word precedes the less significant word(s), e.g., “Chemistry, Organic.” As shown in this example, the inverted form contains a comma. Therefore, the comma is used in order to approximate the number of inverted phrase headings. As a result, the number obtained is likely to be greater than the actual number of inverted phrase headings because commas are used in headings for other purposes as well. Therefore, the number is treated as the upper limit of the number of inverted phrase headings. The result shows that as many as 13.3% of the total of 231,488 LC subject headings contain a comma.

A free-floating heading is a type of heading with free-floating components which may be combined with any existing heading. The current Subject Headings Manual (2008) specifies the three free-floating components being used, namely, “Metropolitan Area,” “Region,” and “Suburban Area.” Examples of free-floating headings include “New York Metropolitan Area,” “Dallas Region (Tex.),” and “Atlanta Suburban Area (Ga.),” showing the different types of free-floating components, respectively. Because of the complexity of the contexts,

we counted headings containing any of the current free-floating components. We found only 477 free-floating headings, a relatively small portion compared to any other form of phrase headings. The Manual also notes that several free-floating phrase headings have been discontinued, such as [personal name] in fiction, drama, poetry, etc. and [name heading (except personal names)] in literature in August 1993 and [name heading (except personal names)] in art in January 1997. As the subject authority records, which were downloaded from the LC site for this study, contain headings created since 1985, we also counted all the headings encompassing the earlier types of free-floating components. Note that the frequencies of free-floating phrase headings were counted to be 477 (0.1%) only for the currently used three free-floating components and to be 6,313 (1.8%) for all free-floating components including the ones used in the past.

In LCSH, a qualifier may be added to a heading in two situations (Chan, 2008, p. 220): to resolve the problem of multiple meanings of a heading and to provide context for ambiguous or technical terms. In both cases, the qualifier is enclosed in parentheses. All headings including terms enclosed in parentheses were counted. Table 3 shows that 74,481 (21.8%) headings contain qualifiers. We also tried to count the use of qualifiers for the two situations. One way to distinguish the two situations is to see if the term preceding the qualifier is also an individual subject heading, e.g., to see if “Mercury” itself is a standalone heading when the subject heading “Mercury (automobile)” is considered. If the answer is yes, then it reflects the first situation. Otherwise, it reflects the second situation. Out of a total of 74,481 cases, only 1,066 cases fell in the first situation and 73,415 cases belonged to the second situation.

The accurate count of subject headings for each of the categories shown in Table 3 often requires a semantic interpretation of subject headings. However, the automated counting methods used in this study cannot interpret headings semantically and therefore cannot count them perfectly. Consequently, the resultant numbers and percentages reported in Table 3 should be taken as approximate.

Markey and Vizine-Goetz’s OCLC project also investigated some of the same categories used in our syntactical analysis. In the OCLC project, headings for each category were counted only by specific MARC tags, with a total number of 128,367 headings in tag 150 and 19,218 in tag 151. The OCLC project reported that headings in the category of “Noun/Adjectival” accounted for 65.48% and 48.71% of all subject headings in tags 150 and 151, respectively; 11.59% in tag 150 and 46.93% in tag 151 are for the category of “Qualified”; 10.86% in tag 150 and 3.53% in tag 151 belong in the category of “Inverted”; 5.83% in tag 150 and 0.53% in tag 151 are in the “Prepositional Phrase” category; and 4.15% in tag 150 and 0.27% in tag 151 are in the “Conjunctive Phrase” category. When the percentages in both tags are combined, our resultant percentages and those from the OCLC project appear quite similar. Two key findings are: (a) the percentages of headings for the categories calculated from two studies two decades apart are comparable; and (b) the categories associated with the three largest percentages of subject headings are “Qualifier,” “Inverted Phrase Headings,” and “Adjectival Phrase Headings,” which together total almost 50%.

Structural Analysis of LC Subject Headings

LRSs of LCSH. In this study, a total of 341,745 LRSs were constructed from the LCSH authority records used. An LRS was created for each specific established heading. The LRSs were divided into two separate groups: LRSs with unsubdivided top main headings (67.74%) and LRSs with subdivided top main headings (32.26%). As recorded on the second row of Table 4, there are a total of 231,488 LRSs of which top main headings (i.e., the first heading in each LRS) do not carry a subdivision. By contrast, the number of LRSs with subdivided top main headings is less than half of the number of those with unsubdivided top main headings. In the LRSs with unsubdivided top headings, the average numbers of UFs, BTs, NTs, and RTs per LRS are 1.23, 1.0, 0.67, and 0.09, respectively. In the subdivided LRSs, those of UFs, BTs, NTs, and RTs per LRS are 0.25, 0.15, 0.84, and 0.02, respectively.

TABLE 4. Analysis of the local relational structures.

                            Local relational structure    Local relational structure
                            with unsubdivided             with subdivided
                            top main heading              top main heading
No. of LRSs                 231,488                       110,257
No. of UFs (no. per LRS)    284,748 (1.23)                27,674 (0.25)
No. of BTs (no. per LRS)    230,806 (1.0)                 16,716 (0.15)
No. of NTs (no. per LRS)    155,422 (0.67)                92,100 (0.84)
No. of RTs (no. per LRS)    19,978 (0.09)                 1,704 (0.02)

The frequency distributions of UFs, BTs, NTs, or RTs within LRSs are plotted in Figures 2 through 5. In these figures, the x-axis represents the number of UFs, BTs, NTs, or RTs per LRS, and the y-axis represents the frequency of LRSs. Each point in the plots represents the frequency of LRSs for the corresponding number of UFs, BTs, NTs, or RTs appearing in the x-axis.

FIG. 2. Frequency distribution of UFs in Local Relational Structures.
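The per-LRS averages and the frequency distributions plotted in Figures 2 through 5 are simple counts over the LRSs. The following Python sketch is illustrative only: the record layout (one dictionary of UF, BT, NT, and RT lists per LRS), the toy headings, and the function names are assumptions made for this example, not the implementation used in the study. The last part recomputes the per-LRS averages from the totals reported in Table 4.

```python
from collections import Counter

RELATIONS = ("UF", "BT", "NT", "RT")

# Hypothetical toy LRSs; each lists the terms linked to its top main heading.
toy_lrss = [
    {"UF": ["Cars", "Motorcars"], "BT": ["Motor vehicles"], "NT": [], "RT": []},
    {"UF": [], "BT": ["Transportation"], "NT": ["Automobiles", "Trucks"], "RT": []},
    {"UF": ["Lorries"], "BT": ["Motor vehicles"], "NT": [], "RT": ["Trucking"]},
]


def average_per_lrs(lrss, relation):
    """Average number of terms of one relation type per LRS."""
    return sum(len(lrs[relation]) for lrs in lrss) / len(lrss)


def frequency_distribution(lrss, relation):
    """Map 'number of terms per LRS' (x-axis) to 'number of LRSs' (y-axis)."""
    return Counter(len(lrs[relation]) for lrs in lrss)


def averages_from_totals(totals, n_lrs):
    """Per-LRS averages, rounded to two decimals, from relation totals."""
    return {rel: round(count / n_lrs, 2) for rel, count in totals.items()}


# Totals as reported in Table 4.
TABLE4_UNSUBDIVIDED = {"UF": 284_748, "BT": 230_806, "NT": 155_422, "RT": 19_978}
TABLE4_SUBDIVIDED = {"UF": 27_674, "BT": 16_716, "NT": 92_100, "RT": 1_704}

if __name__ == "__main__":
    for rel in RELATIONS:
        print(rel, average_per_lrs(toy_lrss, rel), dict(frequency_distribution(toy_lrss, rel)))
    # Reproduces the averages in Table 4: 1.23, 1.0, 0.67, 0.09
    print(averages_from_totals(TABLE4_UNSUBDIVIDED, 231_488))
    # Reproduces the averages in Table 4: 0.25, 0.15, 0.84, 0.02
    print(averages_from_totals(TABLE4_SUBDIVIDED, 110_257))
```

Using a counter keyed by the number of terms per LRS keeps the distribution sparse, which suits the long-tailed shapes the figures describe.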

FIG. 3. Frequency distribution of BTs in Local Relational Structures.

FIG. 4. Frequency distribution of NTs in Local Relational Structures.

FIG. 5. Frequency distribution of RTs in Local Relational Structures.

In each of the unsubdivided and subdivided cases as listed in Table 4, the range of the average number is not wide: 1.14 for the unsubdivided case that is calculated by 1.23 minus 0.09, and 0.82 for the subdivided case that is calculated by 0.84 minus 0.02. However, as demonstrated in Figures 2 through 5, the distributed range of UFs, BTs, NTs, and RTs greatly varies in each case. The ranges of each term in any of the unsubdivided and subdivided cases are as follows: between 0 and 10 for BTs, between 0 and 17 for RTs, between 0 and 68 for UFs, and between 0 and 1,025 for NTs. A comparison of the plots in the figures demonstrates that the two frequency distributions from two different LRS sets in any UFs, BTs, NTs, or RTs case share a similar pattern. Furthermore, a common characteristic across all four frequency distributions of UFs, BTs, NTs, and RTs in each LRS set is that the number of UFs, BTs, NTs, or RTs per LRS tends to be inversely proportional to the number of LRSs. In other words, with more UFs, BTs, NTs, or RTs per LRS, the frequency of corresponding LRSs is inclined to decrease.

Some key findings here can be summarized: (a) our study reported statistical data related to different types of terms within LRSs: total frequency, frequency per LRS, and frequency distribution; and (b) our study discovered the similarity and variation across different types of terms in distribution pattern and related frequencies.

GHS of LCSH. A total of 155,905 GHS trees were identified and constructed from the LC subject authority records used in this study. Figure 6 demonstrates the frequency distribution of the GHS trees in terms of their size, i.e., the number of subject headings per GHS tree. A dominant pattern of the frequency distribution shown in the figure is that fewer GHS trees tend to have more subject headings, i.e., there is an inverse relationship between the frequency of GHS trees and their size in general. Another characteristic is a long tail of the distribution: the frequency of any GHS tree whose size is at least 114 is equal to or under five.

FIG. 6. Frequency distribution of GHS trees in terms of tree size.

The two extreme points of the plot are as follows: the point at the upper-left corner represents 131,479 GHS trees, each of which comprises only one subject heading, and the point at the lower-right corner indicates that the corresponding GHS tree has 1,455,026 subject headings. The largest tree is associated with “Science,” i.e., the top subject heading of the GHS tree is “Science.” The ordered list of the largest trees is as follows: 1,455,026 (“Science”), 769,526 (“Auxiliary sciences of history”), 367,964 (“Culture”), 292,639 (“Religion”), 145,050 (“Life”), 103,495 (“Rehabilitation”).

The GHS trees can be divided into two different groups, G1 (a group of trees with non-subdivided top subject headings) and G2 (a group with subdivided top subject headings). Of a total number of 155,905 GHS trees, 59,892 trees (38.4%) belong to G1 and 96,013 trees (61.6%) belong to G2. The frequency distributions of the GHS trees for both groups are separately plotted in Figure 7 for comparison. The two plots are similar in shape but differ in that G1 has a much longer tail and lower frequency over a broader range of distinct tree sizes in general. Some of the top largest trees in G1 are the same as those for all GHS trees. The ordered list of the largest trees in G2 is as follows: 54,068 (“Mathematics–Philosophy”), 21,029 (“Research–Equipment and supplies”), 6,524 (“Family–Study and teaching”), 6,261 (“Matter–Constitution”), 2,152 (“Home economics–Equipment and supplies”).

FIG. 7. Frequency distribution of two groups of GHS trees in terms of tree size.

In summary, the total of 341,745 established headings yields 155,905 GHS trees (45.62% of the total headings) with an average of 2.19 headings per GHS tree. The OCLC research reported that 39.74% of 128,367 topical subject headings (MARC tag 150) and 56.67% of 19,218 established headings for geographical names (MARC tag 151) are orphan headings, that is, headings that have no broader terms. By definition, then, the number of orphan headings must be equal to the number of GHS trees. By combining the numbers from tag 150 and 151, the orphan headings account for 41.94% (61,903) of the combined number of headings from tag 150 and 151 (147,585).

Discussion

The Potential Use of LCSH for Subject Access to Digital Networked Resources

Term coverage in LCSH. The subject authority file used in this study contains 341,745 main headings, which include both subdivided and unsubdivided ones. LCSH is known as the most comprehensive, controlled vocabulary in English (O’Neill & Chan, 2003). However, there are only 29,293 (equivalent to less than 10% of the total) single-word, unsubdivided main headings (see Table 1), which is much less than the number of words in a reasonably sized English dictionary. In a study comparing LCSH and social tags, Yi and Chan (2009) reported that only two thirds of the social tags examined in the study were also found in LCSH. Heckner et al. (2007) compared bibliographic metadata comprising title, abstract, author-assigned keyword, and full-text against social tags assigned to the same bibliographic items. They reported that only 54 percent of the social tags matched with the bibliographic metadata that do not include LC subject headings. What causes such mismatches between social tags and LCSH? Yi and Chan identified three barriers: technology-related social tags, inconsistent forms and patterns of multiword social tags, and incompatible forms of social tags such as abbreviations or acronyms. In LCSH-based automated subject indexing, the development of sophisticated algorithms for linking LC subject headings to target vocabulary is crucial as current state-of-the-art automated algorithms rely primarily on simple word matching. Therefore, for terms that are not in LCSH, a novel algorithm or approach should be devised to link them to proper LC subject headings.

Syntactic structure of LCSH. Understanding the precise syntactics of subject headings is a necessity in automated assignment or selection of proper subject headings. In spite of recent technological advances in artificial intelligence, the discovery of a resource’s primary subject expressed in natural language remains a challenging task. This study analyzed the syntactic characteristics of LCSH and classified them in several forms, as shown in Tables 2 and 3. Moreover, more than 90% of LC subject headings contain various phrase forms to represent complex subjects, through subdivisions, qualifiers, free-floating forms, etc. Identifying specific syntactic patterns of LCSH will significantly improve the process of identifying the semantics of LC subject headings.

In addition, we encourage those seeking to apply LCSH to digital resources to pay more attention to the synonym and homograph issues. Synonym control is implemented within LCSH. Synonyms of an LC subject heading are explicitly listed through the equivalence relationship using USE and UF. Homograph control is also practiced in LCSH through a specific syntactic form, the qualifier. However, as qualifiers are not used only for homograph control in LCSH, special care must be taken in identifying the qualifiers that serve homograph control if the semantics are to be extracted accurately. Furthermore, an attempt to understand synonyms and homographs in LCSH in the context of its LRS and GHS is particularly encouraged for better extraction of the accurate semantics of LCSH.

Hierarchical structure in LCSH. LCSH was originally a subject heading list that possessed explicit representation of the relationships between concepts, including hierarchical relationships. However, the hierarchical structures embedded in LCSH are not as rigid as in typical thesauri, because LCSH has been built from the bottom up, without the design

and guidance of an overall hierarchical structure. The structural characteristics of the GHS trees demonstrated this point. As illustrated in Figure 6, some GHS trees are extremely large, and some are minimal with only one subject heading. Some single-heading GHSs referring to the same subject may be combined into a new tree or may be integrated into existing GHS trees. Gigantic trees can be problematic when their hierarchical structure is too complex to understand and manage or when the hierarchical relationship between two terms shown at a considerable distance within a tree is too weak to be useful. Such quirky structures may arise because, when new headings were added, their hierarchical positions do not appear to have been taken into consideration in the context of the GHSs of LCSH. As an independent tree ideally represents a mutually exclusive single concept, trees with too many nodes (implying an overly broad concept) or too few nodes (implying an overly specific concept) are less useful for providing semantic information. For example, each of the following LC subject headings is a single-node tree: “Folk literature, Abazin,” “Folk literature, Bom,” “Folk literature, Bunak,” “Folk literature, Efik,” etc. However, splitting some of the gigantic trees, those comprising a few thousand nodes (headings), into several trees or combining very small trees, those comprising only one single node, into larger trees, based on semantics, appears to be quite challenging. Given the current situation, a short-term method might be to have terms in one or a few hierarchical levels placed into an independent single hierarchical block, preserving strong hierarchical relationships but abandoning weak ones.

Conclusion

With the proliferation of digital resources, especially user-created and shared resources, LCSH must confront the demand for systematic organization of and access to the massive amount of online resources. In response to the demand, this study intends to examine the syntactic and hierarchical structures of LCSH for the purpose of mining semantically contextual information of subject headings from the structures embedded in LCSH.

This study is similar to an earlier work by Markey and Vizine-Goetz (1988), but different in the approach to the analysis and the intended goal. Markey and Vizine-Goetz’s work focused on the analysis of LCSH itself in structural aspects, such as MARC tags, subfields, etc., to represent their statistical distribution. In contrast, the primary attention of the current study is to examine the syntactic and hierarchical structures of LCSH to explore the feasibility of mining the semantic and contextual information of LC subject headings and for its potential use in the construction of LCSH-based access to digital resources. The two studies, conducted a little over two decades apart, produced similar results in the syntactic analysis of LCSH regarding the distribution of headings into various syntactic types. The present study primarily relies on the POS method and lexical analysis. We experienced certain limitations in the analysis partly because of the obscure syntactic structure. Nevertheless, the value of this study lies in the structural analysis of LCSH, unique and hitherto unexplored. In the analysis, the hierarchical structures embedded in LCSH are analyzed from the perspectives of LRS and GHS. The empirical result identifies a number of structural issues in LCSH, particularly in GHS.

Given the results of this study, we recommend the following provisional future directions for the development of LCSH toward becoming a viable system for digital and networked resources:

• Efforts should be made to render the semantics and syntax of LCSH more consistent and predictable. As shown in the result, the automatic analysis on syntactic structures in LCSH failed to uncover the precise semantics intended in subject headings in all cases because of their innate inconsistency. Introducing predictable syntaxes into LCSH and using them consistently will greatly assist in mining correct semantics of subject headings, predictable in that intended semantics can be retrievable based on the syntax. Thus, with the aid of more predictable syntaxes, automatic tools of analyzing the LCSH syntax can facilitate a better understanding of the semantics of subject headings. Having predictable syntaxes is also beneficial to the computer-based understanding of LCSH, which will be essential in utilizing LCSH as a more useful tool in the network environment.
• The structures of LCSH should be made more rigorously hierarchical. LCSH contains rich semantic links among terms through cross-references. However, its hierarchical relationships are not always rigidly defined and implemented (Yi & Chan, 2008) and, as a result, highly valuable semantic information cannot be fully utilized. To make matters worse, the non-rigorous hierarchical relationships of a subject heading will provide blurred or vague hierarchical context over the heading and its related headings. A rigorous hierarchy in LCSH will be beneficial to the mining of semantic knowledge of LC headings and also facilitate the generation of domain-specific or subject-specific ontologies.
• Intelligent utilization of a new type of external knowledge source, folksonomy, may provide a novel approach for the interoperability of LCSH and other controlled vocabularies and the integration of different information systems via LCSH. Digital and traditional information resources are organized using a variety of controlled vocabularies and/or classification schemes. Achieving an enhanced interoperability among different systems, controlled vocabularies, metadata standards, and stored resources is essential in utilizing these resources to their best advantages.

Acknowledgments

The authors thank the anonymous reviewers of this article for their valuable and insightful input and comments.

References

Brill, E. (1994). Some advances in transformation-based part of speech tagging. Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94) (pp. 722–727). Menlo Park, CA: American Association for Artificial Intelligence.

Chan, L.M. (2000). Exploiting LCSH, LCC, and DDC to retrieve networked resources: Issues and challenges. Proceedings of the Bicentennial Conference on Bibliographic Control for the New Millennium: Confronting the Challenges of Networked Resources and the Web (pp. 159–178). Washington, DC: Library of Congress, Cataloging Distribution Service.
Chan, L.M. (2009, August). Social bookmarking and subject indexing. Paper presented at the Satellite Pre-Conference, IFLA Classification and Indexing Section, Florence, Italy. Retrieved January 12, 2010, from http://www.ifla2009satelliteflorence.it/meeting2/program/assets/Chan.pdf
Dawson, A. (1997). BUBL bursts out of bath. The Serials Librarian, 31(4). Retrieved October 22, 2009, from http://cdlr.strath.ac.uk/pubs/dawsona/ad199701.htm
Frank, E., & Paynter, G.W. (2004). Predicting Library of Congress classifications from Library of Congress subject headings. Journal of the American Society for Information Science and Technology, 55(3), 214–227.
Gold, J. (1996). The BUBL information service. The Serials Librarian, 29(3/4), 165–174.
Hasan, F.M., UzZaman, N., & Khan, M. (2007). Comparison of different POS tagging techniques (n-gram, HMM and Brill’s tagger) for Bangla. In K. Elleithy (Ed.), Advances and innovations in systems (pp. 121–126). Netherlands: Springer.
Heckner, M., Muhlbacher, S., & Wolff, C. (2007). Tagging tagging: A classification model for user keywords in scientific bibliography management systems. Proceedings of the 6th European Networked Knowledge Organization Systems (NKOS) Workshop at the 11th ECDL Conference, Budapest, Hungary. Retrieved November 2, 2008, from http://www.comp.glam.ac.uk/pages/research/hypermedia/nkos/nkos2007/papers/heckner.pdf
Jul, E. (1995). OCLC Internet cataloging project. D-Lib Magazine. Retrieved October 22, 2009, from http://www.dlib.org/dlib/december95/briefings/12oclc.html
Khosh-khui, A. (1985). Statistical analysis of the association between Library of Congress subject headings and their corresponding class notations in main classes of LCC and DDC. Indiana University.
Larson, R.R. (1992). Experiments in automatic Library of Congress classification. Journal of the American Society for Information Science, 43(2), 130–148.
Marcus, M.P., Santorini, B., & Marcinkiewicz, M.A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Markey, K., & Vizine-Goetz, D. (1988). Characteristics of subject authority records in the machine-readable Library of Congress subject headings. Dublin, OH: OCLC. Research report series, no. OCLC/OR/RR-88/2, OCLC control no. 18650558.
Mathes, A. (2004). Folksonomies – cooperative classification and communication through shared metadata. Retrieved July 7, 2009, from http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
McKiernan, G. (2001). Beyond bookmarks: Schemes for organizing the Web. Retrieved October 22, 2009, from http://www.public.iastate.edu/~CYBERSTACKS/CTW.htm
New York Times. Google book search. Retrieved October 22, 2009, from http://topics.nytimes.com/top/news/business/companies/google_inc/google_book_search/index.html
O’Neill, E.T., & Chan, L.M. (2003). FAST (Faceted Application of Subject Terminology): A simplified vocabulary based on the Library of Congress Subject Headings. IFLA Journal, 29(4), 336–342.
Olson, T. (2001). Integrating LCSH and MeSH in information systems. Proceedings of the Subject Retrieval in a Networked Environment at an IFLA Satellite Meeting sponsored by the IFLA Section on Classification and Indexing & IFLA Section on Information Technology. Dublin, OH: OCLC.
Quintarelli, E. (2005). Folksonomies: Power to the people. Paper presented at the ISKO-UniMIB Meeting. Retrieved July 7, 2009, from http://www.iskoi.org/doc/folksonomies.htm
Riley, J. (2007). Google book search and . . . LCSH? Retrieved October 19, 2007, from http://inquiringlibrarian.blogspot.com/2007/10/google-book-search-and-lcsh.html
Rosenfeld, L. (2005). Folksonomies? How about metadata ecologies? Retrieved May 29, 2009, from http://www.louisrosenfeld.com/home/bloug_archive/000330.html
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. Proceedings of the International Conference on New Methods in Language Processing (pp. 44–49). Manchester, UK.
Svenonius, E. (2000). LCSH: Semantics, syntax and specificity. Cataloging & Classification Quarterly, 29(1/2), 17–30.
Vander Wal, T. (2007). Folksonomy. Retrieved November 19, 2009, from http://vanderwal.net/folksonomy.html
Vizine-Goetz, D. (1996). Classification research at OCLC. Annual Review of OCLC Research 1997 (pp. 27–33).
Vizine-Goetz, D. (1998). Popular LCSH with Dewey numbers. Annual Review of OCLC Research 1997. Retrieved October 22, 2009, from http://worldcat.org/arcviewer/1/OCC/2003/03/18/0000002652/viewer/file108.html
Vizine-Goetz, D., Hickey, C., Houghton, A., & Thompson, R. (2004). Vocabulary mapping for terminology services. Journal of Digital Information, 4(4).
Woodward, J. (1996). Cataloging and classifying information resources on the Internet. Annual Review of Information Science and Technology (ARIST), 31, 189–220.
Yi, K. (2008). A conceptual framework for improving information retrieval in folksonomy using Library of Congress Subject Headings. Proceedings of the American Society for Information Science and Technology, 45(1), 1–6.
Yi, K. (2010). A semantic similarity approach to mapping social tags to Library of Congress subject headings. Manuscript submitted for publication.
Yi, K., & Chan, L.M. (2008). A visualization software tool for Library of Congress subject headings. Proceedings of the 10th International Conference of International Society for Knowledge Organization. In C. Arsenault & J.T. Tennis (Eds.), Advances in knowledge organization (Vol. 11, pp. 170–176). Würzburg, Germany: Ergon.
Yi, K., & Chan, L.M. (2009). Linking folksonomy to Library of Congress subject headings: An exploratory study. Journal of Documentation, 65(6), 872–900.
Zeng, M., & Chan, L.M. (2004). Trends and issues in establishing interoperability among knowledge organization systems. Journal of the American Society for Information Science and Technology, 55(5), 377–395.
