Introduction and Overview: Visualization, Retrieval, and Knowledge†

† This Perspectives issue is dedicated to the memory of Robert R. Korfhage: teacher, scholar, friend.

© 1999 John Wiley & Sons, Inc.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 50(9):790–793, 1999 CCC 0002-8231/99/090790-04

This Perspectives issue is assembled to provide an historical background to visualization in information retrieval. It is a review of the assumptions and technology configurations by which the current literature may be interpreted. The techniques of the authors of this issue differ, but all treat their techniques as manuals of description flowing from a history of common mathematical and technical influences.

All technologies have histories of development. The historical forces of visualization frame the current efforts and comprise the field in which new problem dimensions are addressed. No field of scientific inquiry emerges without a background. This issue adds to the depth necessary for the study of visualization by new students and new scholars.

Moreover, these articles also reflect developments by some of the world’s most prestigious research institutions. Not only are two American national laboratories represented, but also the National Aeronautics and Space Administration (NASA), the Institute for Scientific Information, the Digital Libraries Initiative in the Alexandria Project, the Canadian National Archives, and the Foundations of Advanced Information Visualization (FADIVA) European group. This issue is far from the last word on this topic, but surely it is among the most authoritative ones.

In the Beginning Was the Word

The history of information retrieval is mostly the history of word retrieval. Very early in this history, word frequency in document collections was used to convey distinctions among classes of documents. Much followed from this basic insight, and today the notion of combining like documents by common words is universal in practical information retrieval engineering. This concept has reached so refined a state, however, that discoveries now occur only in small increments.

There seems little amiss in the notion that knowledge of the world about us should be encapsulated by words. Yet, words may be false, or their meaning misunderstood. Take the notion of words which uniquely define knowledge of nature, for example. One might universally define “spring” as the period between the winter solstice and the vernal equinox, or the word “sun” as the brightest of all objects in the sky. But the first definition will elude the untutored and the second remain unverifiable for half of each planetary revolution.

In many circumstances, descriptive knowledge of the world must be limited to closed vocabularies such as mathematics or predicate logic. These special languages remove ambiguity, but frequently limit the power of expressiveness to the trivial, or distort common observations into the realm of the bizarre. The common sense in which words are used to encapsulate knowledge is the only sense in which words interest us. We seek in description to elucidate those properties of a phenomenon which most differentiate it from others, or which best provide the boundaries by which phenomena may be treated in common as a class. It may well be that in the future visualization in retrieval will be construed as such a special language, but one vastly more expressive and universal.

The digital world that technology has introduced bequeaths us ever more complex problems of knowledge differentiation and classification. No one would argue that images do not encapsulate knowledge, but how images encapsulating knowledge may be described from their primitive forms is still in the resolution stage. Images are not usually parsed into words as documents are. On an even more intricate plane of examples, the methodology by which sounds may be retrieved by images is a problem so confounded that no literature exists yet to address it.

In the early literature of information science, reference is often made to a paradigm known as “the answer document.” In this formalism, the question is immutable, and the world image is that of an “answer document” flying like an arrow to intercept the question and bring it to ground. This characterization is considered antique. Yet, in the test collections that we must use to determine the efficacy of varying approaches to retrieval, we essentially resort to the same formalism. Although these static methodologies may be altered by interactive experimentation, the quality of knowledge is of a different order when we do so.

The authors in this issue present an alternative vision. It is a vision which relies on user presentation of entire answer sets resting in a visual field. It is not an “answer document” upon which this vision relies, but rather an “answer set.” Is this vision of greater power than the ordering of retrieved objects by lists? All the authors in this issue claim that it is a more powerful vision. Moreover, this claim is made on the face validation presented by our human senses and the pre-linguistic evolutionary properties of the human visual neocortex. The visual processing skills which permitted our species’ early triumph over much stronger and better adapted
ones provide the prima facie rationale for the use of visualization in information retrieval. As one observer put it late at night at an ASIS conference, “Whatever we were doing to survive in the brush 50,000 years ago, you can bet that it wasn’t perfecting the skill of taking notes.”

No one claims that words are not useful in capturing knowledge. Indeed, for detailed control and description of complex phenomena, they represent our most valued cultural legacy. The illiterate cannot expect to profit much from visualization technology. Rather, the claim made in this issue is that the holistic expression of the interrelation of knowledge objects is more powerful than any other compilation of such objects. In this paradigm, a single Linnean taxonomic tree is more revelatory than all the nodes of description in the tree individually presented. And in this proposition, these authors propose new visual grammars for interpreting the manifold relations among words and images.

Such propositions do not resolve the issues of knowing by any user. Nor do they resolve the issues of classification boundaries by which like phenomena may be known. They do, on the other hand, offer a paradigm of alternative engineering strategies for user apprehension of a multimedia world. Alternatives are priceless. Even a rough and uneven road is preferable to a dead end. And although it is quite unfair to characterize advances in word retrieval as negligible, such advances have surely been limited in recent years.

The First Visual Interface

The first visual interface to a collection was designed and implemented at the Johnson Space Center of NASA in the years 1988 to 1992. In this interface, described in the article entitled “The NASA Image Collection Visual Thesaurus,” the authors assumed that the task of inferring images from terms and terms from images would introduce invariance in image indexing. The system remained in use for two years, but eventually failed because no automatic method to assign terms to images could be discovered, and the manual cost of such term assignment was too great to be supported.

The authors of this article attempted to use image descriptions clustered by cosine vector methods to identify a unique image for every thesaurus term. The candidate images suggested by this method were often heartbreakingly close to the mark. But close was not good enough. These developments were described in detail by Seloff (1990). Although the Seloff article has been widely cited, the initial article which specified the design parameters for the system of his report has remained unpublished. It appears in this article in the form originally presented at the ASIS midyear conference of 1988.

The article is significant because it represents the first identification of the components of a visual interface. Its heritage is reflected in the article “A Collection of Visual Thesauri for Browsing Large Collections of Geographic Images,” by Ramsey et al., in this issue. These authors were more successful than Rorvig et al., due to their use of Kohonen feature maps, a far more robust technique than the simple cosine vector intersection of terms. They were also able to take advantage of the homogeneity of their earth observation data. Their system is more than merely “promising.” It actually works.

Theoretical Foundations

The article by Henry Small of the Institute for Scientific Information contributes the most clearly defined statement of visualization techniques to appear in print. This is a rather bold statement, yet it is true. The article addresses the two-decade-long historical use of visualization techniques in calculating the relationships among scientific fields by their patterns of co-citation.

Researchers who seek “cookbook” renditions of algorithms as they developed over time will find this article to be a precise guide to alternatives. Small, in his article entitled “Visualizing Science by Citation Mapping,” begins with the simplest of algorithms as conceived within the computational limitations of the 1970s and ends with the most ambitious ones presently available through Sandia National Laboratories (SNL). In this article, students and scholars will find algorithms applicable to many different aspects of the co-citation problem, as Small frankly describes the research paths that were successful and led to further enhancements as well as the ones that were eventually discarded, either because of their inefficiency in computation or their failure to yield truthful insights validated by earlier techniques. Many of these algorithms may be transplanted to address similar problems with data that may be encountered by researchers who require some intermediate processing alternatives.

Exemplar Applications

This issue would surely be guilty of intellectual hubris if it did not provide a few examples of applications of retrieval visualization to common search problems. Although the bibliography provided by Robert Korfhage in the online site illustrates the widespread trials of visualization technology, these two applications illustrate what can be done on a completely practical level. Neither of the two systems presented in this section is a demonstration system; rather, both are part of larger efforts to control massive collections. The first of these two articles, by Ramsey et al., “A Collection of Visual Thesauri for Browsing Large Collections of Geographic Images,” describes the efforts to provide some meaningful access to the more than eight million earth-observing images made available in this decade. The second article, by Brooks and Campbell, “Interactive Graphical Queries for Bibliographic Search,” describes the use of a Canadian government-sponsored National Archives product designed to permit visual searching of text materials in a traditional boolean logic-based environment.
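
For readers unfamiliar with what a traditional boolean environment does with a query, the following is a minimal sketch, not taken from any article in this issue. It assumes a toy four-document collection and hypothetical helper names (`build_index`, `boolean_and`, `boolean_or`), and it shows only the general idea that term conjunction narrows an answer set by intersecting the sets of documents that contain each term.

```python
from collections import defaultdict

# Hypothetical toy collection; the texts are illustrative only.
DOCS = {
    1: "texture features for image retrieval",
    2: "visual thesaurus for image collections",
    3: "citation mapping of science",
    4: "visual interface for bibliographic retrieval",
}


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Conjunction: documents containing every term (set intersection)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Disjunction: documents containing at least one term (set union)."""
    postings = [index.get(term, set()) for term in terms]
    return set.union(*postings) if postings else set()


INDEX = build_index(DOCS)
print(boolean_or(INDEX, "image", "retrieval"))   # the broader answer set
print(boolean_and(INDEX, "image", "retrieval"))  # the narrower answer set
```

In this sketch each added conjunct can only shrink the answer set, while each added disjunct can only enlarge it. A visual interface to such a system would presumably display these nested sets rather than a ranked list.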



In Ramsey et al., earth-observing images are parsed as texts are parsed. These authors use Gabor filters to combine like terrains. No clearer description of this process is available in the present literature. A Gabor filter yields textures. By segmenting images into component texture boundaries, search classes may be derived without resorting to textual description. This technology thus succeeds where the NASA effort by Rorvig et al. failed. The results reported in this article are concrete and verifiable; indeed, anyone who has ever traveled over Arizona highways can authenticate these data. The authors acknowledge the contributions of the Alexandria Digital Libraries Project, particularly the work of Manjunath and Ma (1996), but claim their own extensions to this work as well.

Brooks and Campbell, on the other hand, describe the translation of interactive boolean interfaces with data to a visual display. The “Islands” interface which they illustrate harnesses the power of visualization to the process of commercial text retrieval. It is a fact that students are still mystified by these processes. One need only while away a few minutes on any college campus to realize that most persons still don’t have a clue about the meaning of term conjunction and its impact on search results. Anyone who has ever performed a boolean search will be able to examine the effect of visualization on this process, and this article is offered to permit a broad view of practice changes which can be expected in future systems. “Islands” may not be the ideal interface, but something like it, to paraphrase an advertising slogan, will be “. . . coming soon to a computer near you.”

Conference Notes and Bibliographic References

One of the landmark developments in visual retrieval occurred at a workshop held in Zurich in the summer of 1996 in conjunction with the Association for Computing Machinery’s Special Interest Group on Information Retrieval Annual Meeting. For the first time, both European and North American interests were represented in the development of criteria for evaluation of visual information retrieval. Among the Europeans, the newly formed FADIVA group played the dominant role. The workshop report reproduced in this issue has been widely circulated, but never before published. This conference led to the first visualization of native TREC/Tipster data as a prelude to formal visual information retrieval evaluation strategies (Rorvig & Fitzpatrick, 1998; Rorvig, 1998).

For practical use in permitting users to copy this bibliography, it is available through the ASIS SIGVIS website <http://www.asis.org/SIG/SIGVIS/references.html>, where future editions may be conveniently updated.

Mark Rorvig
Lois F. Lunin

References

Brooks, M. & Campbell, J. (1999). Interactive graphical queries for bibliographic search. Journal of the American Society for Information Science, 50, 814–825.

Manjunath, B.S. & Ma, W.Y. (1996). Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 837–841.

Ramsey, M.C., Chen, H., Zhu, B., & Schatz, B.R. (1999). A collection of visual thesauri for browsing large collections of geographic images. Journal of the American Society for Information Science, 50, 826–834.

Rorvig, M. (in press). Scaled and visualized TREC data and query feedback. Information Processing and Management.

Rorvig, M. & Fitzpatrick, S. (in press). Scaled and visualized structure in TREC IR test collection documents. Information Processing and Management.

Rorvig, M.E., Turner, C.H., & Moncada, J. (1999). The NASA image collection visual thesaurus. Journal of the American Society for Information Science, 50, 794–798.

Seloff, G.A. (1990). Automated access to the NASA-JSC image archives. Library Trends, 38, 682–696.

Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science, 50, 799–813.

About the Authors

Jennifer Campbell leads strategic projects at the Canada Institute for Scientific and Technical Information (CISTI), at the National Research Council of Canada. Ms. Campbell has extensive experience with information management and retrieval systems, having managed a national current awareness service and having worked intensively with clients searching for scientific and technical information requirements. Ms. Campbell has a B.Sc. (Hons.) in microbiology and a master of library sciences. (At the time of publication, Ms. Campbell is with Nortel’s International Optical Networks, London.) Martin Brooks leads the Interactive Information Group at the Institute for Information Technology, at the National Research Council of Canada. Dr. Brooks’s research interest is development of new algorithms and software architectures for modeling numerical data, with applications to content-based multimedia retrieval, pattern recognition, and robotics. Dr. Brooks has a B.Sc. in mathematics and a Ph.D. in computer science.

Marshall Ramsey is a Ph.D. student at the University of Arizona’s Department of Management Information Systems and a member of the UA/MIS Artificial Intelligence Group. He received his B.S., B.A., and M.S. in MIS from the University of Arizona in 1993 and 1997. Ramsey was awarded a graduate fellowship from the National Library of Medicine for a semantic analysis and retrieval project from 1996 to 1997. His research interests include semantic, cross-media, and translingual retrieval and intelligent agents. Hsinchun Chen is a professor of Management Information Systems at the University of Arizona and head of the UA/MIS Artificial Intelligence Group. He is also a visiting senior research scientist at the National Center for Supercomputing Applications (NCSA). He received an NSF Research Initiation Award in 1992, the Hawaii International Conference on System Sciences (HICSS) Best Paper



Award, and an AT&T Foundation Award in Science and Engineering in 1994 and 1995. He received a Ph.D. degree in information systems from New York University in 1989. Chen has published more than 40 journal articles covering semantic retrieval, search algorithms, knowledge discovery, and collaborative computing in many respected journals. He is a PI of the Illinois Digital Library Initiative project, funded by NSF/ARPA/NASA, 1994–1998, and has received several grants from NSF, DARPA, NASA, NIH, and NCSA. He is the guest editor of IEEE Computer special issue on “Building Large-Scale Digital Libraries” and the Journal of the American Society for Information Science special issue on “Artificial Intelligence Techniques for Emerging Information Systems Applications.” Bin Zhu is a Ph.D. student in management information systems at the University of Arizona and a member of the UA/MIS Artificial Intelligence Group. Her research interests include information analysis, information visualization, and human–computer interaction. Zhu received an undergraduate degree in meteorology in 1989 from Beijing University and an M.S. degree in atmospheric science in 1997 from the University of Arizona. Bruce R. Schatz is professor in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. He holds joint appointments in computer science, neuroscience, and health information sciences. He is senior research scientist at the National Center for Supercomputing Applications (NCSA), and at the PACI Partner for Digital Libraries he is head of the Digital Library Research Group in the information technologies division. Schatz is the founding Director of the CANIS (Community Architectures for Network Information Systems) Laboratory, a unique facility which develops fundamentally new technology for the Net and deploys it in large-scale testbeds. Previous flagship projects have included the Worm Community System in the NSF National Collaboratory program and the NSF/DARPA/NASA Digital Libraries initiative (DLI) project that built a production testbed of federated SGML for scientific journals. Schatz is currently the principal investigator of the flagship project in the DARPA Information Management program, which funded the work discussed in this article, and is developing the Interspace Prototype, the first large-scale analysis environment utilizing scalable semantics.

Mark Rorvig is an associate professor at the School of Library and Information Sciences, University of North Texas. He has conducted research into the measurement of information retrieval, visualization generally, and image retrieval for the last 12 years. Charles Hudson Turner and Jesus Moncada were students of Dr. Rorvig during his period as an assistant professor at the University of Texas at Austin.

Matthias Hemmje is presently a research associate at the Integrated Publication and Information Systems Institute (IPSI) of the German National Research Center for Information Technology located in Darmstadt, FRG. He has been active in the field of visual interface systems since the early 1990s, and is the principal architect of Lyberworld. His most recent publication is “Virgilio: A Non-Immersive VR System To Browse Multimedia Databases,” in Proceedings of the IEEE International Conference on Multimedia Computing and Systems, 1997. Dr. Hemmje has recently returned to the academic world from the private sector, where he performed considerable work on the commercialization of visual information retrieval interfaces.

Henry Small received a joint Ph.D. in chemistry and the history of science from the University of Wisconsin in 1971. After a brief career as an historian of science at the American Institute of Physics’s Center for History and Philosophy of Physics, he joined the staff of the Institute for Scientific Information in 1972, where he is currently Director of Contract Research. His 1973 paper in JASIS on co-citation in the scientific literature led to numerous papers on citation analysis and the mapping of science. His current research centers on delineating document trails through science. He has served on the JASIS editorial board since 1986, and in 1987 received the JASIS Best Paper Award. He is recipient of the Derek de Solla Price Medal from the journal Scientometrics, and is a fellow of the AAAS.
