You are on page 1of 20

Just-in-time

information
retrieval agents

by B. J. Rhodes
P. Maes

A just-in-time information retrieval agent (JITIR provide information from an unordered set of doc-
agent) is software that proactively retrieves and uments; no hand-coding of documents or special do-
presents information based on a person’s local
context in an easily accessible yet nonintrusive main-dependent techniques are required. This fea-
manner. This paper describes three implemented ture makes it easy to adapt systems to new domains
JITIR agents: the Remembrance Agent, Margin and information sources.
Notes, and Jimminy. Theory and design lessons
learned from these implementations are The word “agent” has many different definitions.
presented, drawing from behavioral psychology,
information retrieval, and interface design. They JITIR agents are “software agents” in that they are
are followed by evaluations and experimental long-lived, watch an environment, and can take ac-
results. The key lesson is that users of JITIR tion based on that environment without direct user
agents are not merely more efficient at retrieving intervention. They should not be confused with “em-
information, but actually retrieve and use more
information than they would with traditional bodied conversational agents” or synthetic charac-
search engines. ters, which are graphical interactive characters that
present a personality and interact with a user in an
animistic way. They should also not be confused with
distributed agent architectures or agent-oriented
programming models, which are both architectures
for software design rather than a class of applica-

I n this paper, just-in-time information retrieval


agents (JITIR agents) are introduced. JITIR agents
(pronounced “jitter agents”) are a class of software
tions. Unless specified, all references to the word
“agents” in this document specifically mean JITIR
agents.
agents that proactively present information based on
a person’s context in an easily accessible and non- The three necessary features of a JITIR agent are pro-
intrusive manner. They continuously watch a per- activity, the presentation of information in an acces-
son’s environment and present information that may sible yet nonintrusive manner, and awareness of the
be useful without requiring any action on the part user’s local context. These three features make JITIR
of the user. The environment that an agent looks at agents similar to search engines, alarms, and person-
is usually computational: e-mail, a Web page that a alized help systems, although they differ from each.
person is reading, or a document that he or she is
writing. However, it can also be a person’s physical
environment as sensed by a hand-held or wearable
娀Copyright 2000 by International Business Machines Corpora-
computer. The information a JITIR agent provides tion. Copying in printed form for private use is permitted with-
can come from any number of preindexed databases out payment of royalty provided that (1) each reproduction is done
of documents, e.g., e-mail archives, notes files, or without alteration and (2) the Journal reference and IBM copy-
documents from commercial databases such as the right notice are included on the first page. The title and abstract,
but no other portions, of this paper may be copied or distributed
INSPEC collection of the Institute of Electrical and
royalty free without further permission by computer-based and
Electronics Engineers (IEEE) technical paper ab- other information-service systems. Permission to republish any
stracts. A key feature of a JITIR agent is that it can other portion of this paper must be obtained from the Editor.

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 0018-8670/00/$5.00 © 2000 IBM RHODES AND MAES 685
Before the main discussion of the paper is begun, Notification systems present information from a rap-
the similarities and differences are discussed below. idly changing source (e.g., current stock prices),
based on relevance to a mostly static user profile.
Search engines and structured knowledge bases such JITIR agents are the reverse: they provide informa-
as Yahoo!** are inherently interactive: an informa- tion from a mostly static source (e.g., e-mail archives)
tion seeker has some query in mind and directly in- based on relevance to a user’s rapidly changing lo-
teracts with the program to obtain the desired in- cal context. Information provided by an agent is not
formation. JITIR agents, in contrast, are proactive. meant to pull a person out of his or her current con-
The user need not have a query in mind, or even text, but rather to add additional information that
know that information relevant to his or her situ- might be useful within that context.
ation exists. This proactivity has ramifications for the
information retrieval techniques that can be used, In summary, JITIR agents are similar to search en-
because the “query” utilized is limited to what can gines, alarms, and notification systems, but none of
be sensed in the environment. It also has ramifica- these systems has all three features necessary for
tions for interface design, because proactively dis- JITIR agents: proactivity, a nonintrusive yet acces-
played information can be far more distracting than sible interface, and attention to local context.
requested information.
Note that automatic help systems, such as the Mi-
When a cellular (cell) phone rings, it provides the crosoft Office Assistant**, fit the definition of a JITIR
information that a person is calling and may also in- agent. However, these systems are domain-specific;
dicate the identity of the caller by the tone of the they only provide information from a specialized help
ring. A ringing telephone is an alarm. It calls the user database, using information-retrieval techniques that
away from whatever task is currently being per- are specific for that particular help domain. Although
formed and cannot easily be ignored. If the cell no system can support every possible domain and
phone is turned off, the caller is often forwarded to environment, the JITIR agents described in this pa-
voice mail. A silent cell phone is extremely nonin- per are designed to provide information from a wide
trusive, but without the ringer it is necessary to call variety of sources using techniques that can be ap-
the voice-mail service directly to determine whether plied to a wide variety of task domains.
anyone has called. The information about who called
(if anyone) is less accessible than when it was indi- Implementations
cated by the ring. JITIR agents are designed to op-
erate in between these two extremes by presenting Three JITIR agents have been deployed in the course
information in such a way that it can be ignored, but of this research. The first and oldest is the Remem-
is still easy to access should it be desirable. Rather brance Agent (RA), an agent that operates within the
than presuppose whether or not a particular piece Emacs text editor, a program developed at the Mas-
of information is important or urgent, JITIR agents sachusetts Institute of Technology (MIT). The sec-
allow the user to decide whether to view or ignore ond implementation is Margin Notes, a Web-based
it, depending on the user’s current task and level of agent that automatically annotates Web pages as they
cognitive load. are being loaded into a browser. The third system
is Jimminy (also called the Wearable RA). Jimminy
Notification systems such as newspaper clipping ser- presents information via a head-mounted display at-
vices and alerts are proactive, but the information tached to a wearable computer, based on a person’s
they present is based on events outside of the user’s physical environment: where the person is, who he
local context. For example, an alert might be trig- or she is talking to, the time of day, etc. The system
gered whenever a new piece of e-mail arrives, a stock demonstrates the multidimensional aspects of the in-
price goes below a certain threshold, or news that formation retrieval system used and how these agents
fits a user’s personal profile hits the news wire. The can be applied “off the desktop.”
notifications are designed to pull a person out of his
or her current context (task) and provide informa- All three systems use the same information retrieval
tion about a different context that might require at- back end, called Savant, which was developed at the
tention. The urgency of a notification can range from MIT Media Laboratory especially for this research.
the immediacy of a fire alarm to a news briefing that Savant uses a template structure to identify file types
is announced but intended to be read whenever con- and parse out individual documents and fields. For
venient. example, it can identify a mail archive file and index

686 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
Figure 1 RA main display

individual e-mail documents by the from field, sub- For example, the system can be configured to dis-
ject line, date, and body of the message. Savant can play suggestions from e-mail archives in the first four
also handle multiple data types including the GPS lines and suggestions from note files in the next two
(Global Positioning System) and date as well as raw lines. The display can also show different “scopes”
text, and can be easily extended to include more types from the same database, e.g., the first few lines can
of fields and similarity metrics. show suggestions based on the past 20 words,
whereas the others can show suggestions based on
The Remembrance Agent. Emacs is a popular text the past 500 words.
editor for the UNIX** operating system. Although
it is often used for traditional text-editing tasks such Figure 1 shows a screen shot of the RA when writing
as writing papers or computer code, the editor is pow- an earlier version of the introduction to this paper.
erful enough that it is also used for reading and writ- In this case, the documents being suggested come
ing e-mail and network news, and even browsing the from a subset of the INSPEC database of paper ab-
Web. Emacs supports the use of multiple buffers, stracts (about 150000 citations). The suggestions are
with each buffer containing a different file. all for papers that might be relevant to the section
being written. The full Emacs window is larger in
The RA 1 continually presents a list of documents that normal operation, which means a larger ratio of ed-
are related to the current document being written itor lines to RA-suggestion lines.
or read. These suggestions appear in order of rel-
evance within a special display buffer at the bottom The summary lines are a combination of fields in the
of the Emacs window. When the user is typing, read- document and are designed to give the user an in-
ing e-mails, or otherwise changing his or her envi- dication of the content of the document as quickly
ronment, the list is updated every few seconds. Sug- as possible. These summary lines can be customized
gestions from multiple databases can be listed in the for individual databases. For example, suggestions
display buffer, each with a certain number of lines. from the INSPEC database display author, date of

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 687
Figure 2 Emacs RA result screen

publication, and paper title. Articles from The Bos- automatically runs a search for other documents as-
ton Globe, in contrast, use a longer field to show sociated with that person or field. A full search form
headlines and show publication date, but do not show is also available to perform manual queries. These
the author of the article. All summary lines also in- features allow two-way dialog between user and RA.
clude a relevance score, consisting of zero, one, or For example, in the example in Figure 1, one may
two plus signs. By default, if a suggestion is below not care about the “News on-demand . . .” paper but
a minimum threshold, it is not displayed, and “No might want to know about other papers on news.
suggestion” is shown instead. It is also possible to Left-clicking on the subject field brings up summa-
configure the system to display below-threshold sug- ries for other documents with the same words in the
gestions with a minus sign as the relevance. title. These suggestions may then lead the user to
perform a manual search, which in turn may lead to
Right-clicking on a suggestion causes a small pop-up more suggestions.
window to display the keywords that lead to the ab-
stract being suggested. These keywords are also dis-
played to the far right of the suggestion line, although Different suggestion databases can be associated with
due to space constraints they are only visible when specific buffers or buffer types. For example, the sys-
the user has a wide display window. To see the full tem can be configured to automatically draw sug-
text being suggested, the user types a keyboard short- gestions from an e-mail archive when reading or writ-
cut or clicks on the desired line number. The full text ing e-mail, and from the INSPEC database whenever
then replaces the current display buffer, as shown in writing a paper in the LaTeX formatting mode.
Figure 2. By default the RA will not make sugges- These defaults are set by hand in a configuration file.
tions based on a suggestion that has been retrieved, Databases can also be switched manually with a key-
though this is customizable. board shortcut. Database changes are sticky: If the
database is switched once for a particular buffer, then
The system can also be used as a normal search en- revisiting that buffer will automatically switch to that
gine. Left-clicking on a field (e.g., the author field) database thereafter.

688 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
Figure 3 Margin Notes screen shot

Choosing A Cell Phone Provider


''Re: cell phone
By J.A. Hitchcock question''
(rich)
Are you considering getting a cell phone, but don’t know which provider to choose? Do you already hackers–archive
May 24, 1999
have a cell phone but want to switch providers?

Digital vs. Analog Keywords: cell


phone phone plan
It used to be fairly easy when it came to buying a cell phone–– you went to your local provider, took coverage
advantage of their latest promotion (usually getting the phone for free with an annual contract), and
that was it. Today, not only do you need to decide whether you want an analog or digital phone or
one that does both, you also need to pick the provider that can provide analog and/or digital service.

What’s the difference?

Margin Notes. Margin Notes 2 is a JITIR agent that gives personal experiences with cellular service in the
automatically rewrites Web pages as they are loaded, MIT area. The circles at the top of a suggestion are
adding hyperlinks to personal files. As a Web page filled in with red to indicate the relevance of the sug-
is loaded, Margin Notes adds a black margin strip gestion.
to the right of the document. Like the RA, it then
compares each section of the document to prein- Placing the mouse over a suggestion note produces
dexed e-mail archives, notes files, and other text files, a list of the five keywords that were most important
based on keyword co-occurrence. If one of these in- in deciding the relevance of a suggestion to the an-
dexed files is found to be relevant to the current sec- notated Web page section (these are also shown in
tion of the Web page, a small “suggestion note” is the figure). Clicking on a suggestion note creates a
included in the margin next to the relevant section. new browser window that displays the full text of the
The note contains a quick description of the sug- e-mail, note file, or text being suggested. The sug-
gested text, a series of circles representing the rel- gested page also has feedback buttons used for eval-
evance of the suggestion, and a link to obtain more uating the system.
information. The suggestion note consists of a sub-
ject, date, and author for the suggested text, though Because Web pages will often cover many different
the exact makeup of the note is customizable based subjects, Margin Notes segments the Web pages it
on the database. annotates based on HTML (HyperText Markup Lan-
guage) tags. Each section receives its own annota-
Figure 3 shows a screen shot of a Web page on cel- tion, assuming the annotation is over a threshold rel-
lular phone service plans, annotated by Margin evance score. The exception is the first annotation
Notes. This example is a page and annotation that on each Web page, which is based on the entire page
came up during normal use of Margin Notes, though instead of a single section. This gives both a general
the relevance of the annotation is better than the annotation as well as a specific, focused view. Mar-
typical annotation. The database used for this ex- gin Notes uses the location of the annotation to in-
ample is the collection of Media Lab e-mail archives dicate scope: annotations appear at the top of the
since 1988, a total of over 180000 messages. The sug- section to which they are relevant. Thus it is anal-
gested document, listed in the black margin to the ogous to marginal notes in traditional print. The
right, is e-mail sent to the “hackers” mailing list and black margin strip achieves “branding”; all text to

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 689
the left of the margin is the page being annotated, a 418-MHz AM radio receiver detects unique radio
all text within the margin is placed there by Margin beacons that have been placed around the Media
Notes. Lab. 5 An alternate system uses IR (infrared light)
instead of radio, which gives a finer control over
Jimminy. Both the RA and Margin Notes provide where a beacon will be detected. 6 Beacon numbers
potentially useful information based on a person’s are converted to room numbers via a static lookup.
computational environment: his or her e-mail, doc- By putting these beacons into name badges, the wear-
uments, Web pages being viewed, etc. Jimminy pro- able can also detect who is currently in the same room
vides information based on a person’s physical envi- as the wearable user. This method is essentially iden-
ronment: his or her location, people in the room, tical to the active badge system designed at Olivetti. 7
time of day, and subject of conversation. 3 Process- However, because people do not generally wear
ing is performed on a shoulder-mounted “wearable these active badges, the people sensor is only used
computer,” and suggestions are presented on a head- for demonstration purposes. The wearable computer
mounted display. also has a 1.2-Mbit wireless network connection that
is used for communications and receiving informa-
The ultimate goal is that all information about the tion from sensors not directly attached to the com-
wearer’s physical environment will be available to puter. Sensors and Jimminy are interconnected us-
Jimminy through automatic sensors. However, the ing Hive, a distributed agent architecture developed
focus of this research is not sensor technology but by Nelson Minar. 8
rather what can be done with that technology once
it is available. To this end, Jimminy is a general ar- With recent improvements in sensor technology and
chitecture that can use plug-ins for any sensor that in the processor speed of wearable computers, it is
can be attached to a wearable computer. Informa- expected that other types of sensing technology will
tion that is not provided by sensors can be entered soon become available. One such technique is ASR
into the system by hand. The currently implemented (automatic speech recognition), which is now accu-
system has been demonstrated with passive sensors rate enough so that information retrieval on a da-
that detect a person’s physical location, people in the tabase of raw audio news stories can be performed
room, and the time of day. The general topic of a with over 55 percent precision. 9 This percentage is
conversation is entered by hand in the form of real- close to the level of performance that would be
time notes. needed to automatically generate queries for Jim-
miny by transcribing a person’s natural conversa-
The hardware for the system is the “Lizzy” wearable tional speech. Another technique is vision-based au-
computer designed by Thad Starner, 4 which consists tomatic face recognition, 10 which could be used
of a 100-MHz 486 processor running the Linux** instead of active badges to let the wearable know
operating system, a head-mounted display, and one- what other people are in the room.
handed chording keyboard. The entire system fits in
a small shoulder bag. The display is the “Private The wearable computer serves two main functions.
Eye**” made by Reflection Technology. Display res- First, it is used as a general note-taking system. In
olution is 720 ⫻ 280 pixels, monochrome red with conversations and lectures, notes are usually touch-
one level of dimness. This display gives a crisp 80- typed using the one-handed keyboard while main-
column by 25-row screen with a virtual image seem- taining eye contact with the person speaking. The
ing to float in front of the wearer. The keyboard is head-mounted display is occasionally viewed to see
the “Twiddler**,” a one-handed keyboard made by what has just been typed. Any note written using the
Handykey Corporation that uses a 17-button system, wearable computer can be tagged with people
on which multiple keys can be struck at once to ac- present, subject, location, and time stamp using a
cess all the symbols possible on a normal keyboard, single key combination. Over 850 notes have been
plus extra combinations for macros. Average typing taken and annotated on the wearable computer by
speed is about 35 words per minute using the Twid- this means over the course of four years. They range
dler, although Starner has been clocked at up to 60 from notes on classes and conversations at confer-
words per minute. ences to notes on dance steps. The second major use
of the wearable computer is to retrieve notes and
The wearable computer also includes several ways information anytime, anywhere. Reading and under-
to sense the outside world. When outdoors, a GPS standing information on the head-mounted display
is used to detect the wearer’s location. When indoors, is a more attention-demanding task than note-tak-

690 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
Figure 4 Jimminy screen shot

ing. It is also more obvious to observers that the user Figure 4 shows a screen shot of Jimminy as it would
is distracted: the user’s eyes are clearly focused on appear on the head-mounted display. The top 80 per-
the screen, and it is difficult to speak and read at the cent of the screen is reserved for notes being entered
same time. For these reasons information tends to or read, plus the standard Emacs mode line giving
be looked up during pauses in conversation or in lec- time and information about the file being edited. The
ture situations. next four lines show suggestions based on the wear-
er’s current context. The display is the same as the
The interface for Jimminy is based on the RA but RA except formatted for the 80-column monochrome
differs in a few important ways. First, screen real es- display. The bottom mode line shows a condensed
tate is scarce for a wearable computer, so sugges- view of current biases for location, subject, person,
tions are presented in abbreviated form. Second, the and current text being typed, followed by the con-
features of the environment that are currently be- text currently being used. The actual head-mounted
ing sensed are listed on the mode line. Having this display is bright red on black with 720 ⫻ 240 pixel
information is important because sensor data may resolution, with the number of characters and the
be incorrect, and the user also needs a reminder aspect ratio shown in the figure.
about what environmental information was last en-
tered by hand so it can be updated. Third, keys are For the example scenario of Figure 4, imagine the
defined on the chording keyboard to increment or wearer is talking with David Mizell, a fellow “wear-
decrement the bias on different field types. For ex- ables” researcher from Boeing, and has just entered
ample, one could set biases so that the person field Media Laboratory room E15-335, which is where the
is twice as important as other fields when finding use- automatic embroidery machine used in one of the
ful documents. The biases are listed on the mode research projects is kept. This example is hypothet-
line as well. Finally, the system will modify the bias ical, although the screen shot is of the actual system
for certain features based on recent-modification using the real database of notes written on the wear-
times. For example, if the wearer of the system en- able. The mode line shows the current location and
ters a new room, the bias for room location is tem- people in the wearer’s environment and shows that
porarily increased by three. After a minute in the the bias for location and people present is four, and
same room, the bias is returned to base-line levels. the bias for subject and current text being typed are

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 691
both one. The periods around the location and per- e-mail archives are indexed as people, whereas the
son biases indicate that they have temporarily been title fields of HTML documents and the subject fields
raised because these features changed recently. In of e-mail messages are indexed as subjects. Differ-
one minute it will go back to the hand-set value of ent field types can have different similarity metrics.
one. Biases can also be set using the chord keyboard. For example, a collaborator at British Telecom has
The first suggestion “embroidery machine class” is developed a version of the RA that can find docu-
a note that was taken during the training class for ments that are related to a current location based
the embroidery machine. As can be seen in the key- on GPS coordinates. 11 If a document is similar to a
words section (far right), the suggestion is based person’s current context based on more than one
mainly on the room number, but also on some of field (e.g., if both the “from” field and body of a mes-
the words in the notes being typed. The second and sage partially match), a linear combination of the
third notes are about David Mizell, the first being similarities are used based on weights defined in the
his entry in the wearer’s contact Rolodex** file and templates. The template system is also used for que-
the other being notes about his talk on augmented ries, so fields can be parsed out of e-mail being writ-
reality at a conference in 1998. The final suggestion ten or Web pages being read. Templates are hard-
is an e-mail note about the wearable fashion show coded into Savant, but are designed to be easily
for which the conductive cloth technology was be- modified or added with a recompilation of the source
ing developed at the Media Lab, based on keywords code.
that have been typed into the notes area.
Automatically generated queries tend to contain ex-
Savant. All three implemented JITIR agents use the traneous text, such as signature lines and e-mail
same back-end system, called Savant. The front end headers, that is not useful for retrieval. Indexed doc-
senses the environment (that is, the document or uments will likewise have HTML markup and head-
e-mail being written, the Web page being viewed, or ers that will dilute the value of important data when
the physical environment of the user of a wearable selecting documents. To address this problem, each
computer) and sends that information in text form template can associate a filter bank with each field.
to Savant as a “query.” Savant then works as an in- A filter bank is an ordered list of Perl-style regular
formation retrieval engine: given a query, it produces expressions that match text that should be removed
a rank-ordered list of preindexed documents that from the field before parsing. For example, filters
best match the query. Savant consists of two pro- associated with the e-mail body field recognize and
grams: ra-retrieve performs information retrieval remove e-mail signature lines, headers from included
based on a query, and ra-index creates index files so files, and common lines such as “Begin forwarded
retrieval can be performed quickly. Indexes can be message.” Filters associated with the e-mail person
created from generally useful sources such as a col- field remove all information except for the user
lection of newspaper or journal articles, organization- name, and filters associated with all fields in HTML
wide collections such as office memos, or from per- documents remove hypertext tags and comments.
sonal sources such as e-mail and notes. Previous
versions also allowed pages to be indexed directly Although Savant can simultaneously use different
from the Web. Documents are usually reindexed similarity metrics for different fields, the most com-
nightly to incorporate new changes and additions. mon metric is for similarity between two texts. The
particular algorithm used is the Okapi version of the
The power of Savant comes from a strong template- Term Frequency/Inverse Document Frequency
matching system that can recognize documents, parse (TF/iDF) algorithm, a fairly standard benchmark text
out fields, and index them all automatically. For ex- retrieval algorithm. 12
ample, if pointed at a top-level directory of heter-
ogeneous files, it will recognize and parse e-mail ar- Theoretical issues
chives, HTML files, LaTeX files, notes taken on the
wearable computer, paper abstracts from the INSPEC There are three primary questions being addressed
database, and raw text, while ignoring other file for- by this research, ranging from the fields of psychol-
mats. It will also break e-mail archives into individ- ogy and decision sciences to document retrieval to
ual messages. This means indexing can be performed interface design and cognitive science. An overview
completely automatically with no hand annotation. of the issues surrounding these questions is presented
Different fields from documents are identified and below. For a full discussion, interested readers are
indexed separately. For example, the from fields of referred to the dissertation by Rhodes. 13

692 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
How does the use of a JITIR agent affect the way in a primary task. For example, when performing a
which a person uses information? Imagine an ex- search for information about a digression, a person
tremely paranoid executive who wants to be sure he needs to use short-term memory to keep his or her
reads every piece of information related to his work. place in the larger framework of the task. The
To make sure he does not miss anything, he searches amount of short-term memory required, and thus
his old e-mails and office memos after every para- the amount of effort required in the task, will de-
graph he writes. This would imitate the effect of a pend on a number of factors including the complex-
JITIR agent. In fact, it would be better than a JITIR ity of the digression, the amount of time required
agent because he could perform his searches more to complete the task, and the similarity of the digres-
precisely than could any automated process. sion to the primary task.

Of course, no one is as meticulous as the executive When applied to the information search domain,
described. Sometimes a person wants a particular these theories suggest that if the cost of finding and
piece of information, and in this case a search en-
gine is the appropriate tool. Other times a person
will not bother to perform a search because of a lack
of time, because the information he or she already
has is “good enough,” or because the person expects JITR agents greatly reduce
a search will not turn up anything useful.
the cost of searching
Such action (or inaction) is in keeping with Zipf’s for information
Principle of Least Effort, 14 which states that people by doing most of the work
will try to minimize their total future work, given their automatically.
best estimates at the time. Payne, Bettman, and John-
son expand on this principle, arguing that people
choose decision-making strategies based on multi-
ple goals, including goals of accuracy and the con-
servation of limited cognitive responses. 15 This using information is more than the expected value
framework is based on anticipated accuracy versus of that information, then the search will not be per-
anticipated effort: a decision maker assesses the ben- formed. This result could be the case for several rea-
efits and costs of available strategies and chooses the sons. First, the desired information may not be im-
strategy with the best cost/benefit trade-off. The ef- portant enough. For example, the searcher might
fort-accuracy framework has been used to explain think he or she “remembers it well enough,” or there
why people behave differently using different kinds might be little reward for accuracy. The person might
of information displays in a computational environ- also think a search will not be fruitful, and thus the
ment. 16 It can also be used to explain why people expected value is low even if an unexpectedly good
make different purchases depending on whether per- result would be worthwhile. Finally, the person could
unit prices in a grocery store are listed on a single be under time pressure or dealing with a poor search
sign or only listed under each product. 17 interface, and thus the effort required for a search
is too costly.
Studies in computer response time indicate that small
increases in the effort required to perform a task can JITIR agents greatly reduce the cost of searching for
have large effects on whether a person will bother information by doing most of the work automatically.
acting at all. For example, Robert Miller argues that The downside is that queries are automatically gen-
for many tasks, more than two seconds of response erated and therefore will not be as exact as would
delay is unacceptable and will result in fewer uses a human-generated query. This low-cost search is
of a particular tool, even at the cost of decreased more than just a time-saver, it has the qualitative ef-
accuracy. 18 He also argues that there is not a linear fect of providing information in situations where an
decrease in efficiency as response delay increases; individual is not personally willing to perform a
there are response delay thresholds beyond which search. In terms of an effort-accuracy trade-off, a
sudden drops in mental efficiency will occur. Miller JITIR agent is a resource for retrieving information
is primarily discussing system response delays, but that would otherwise be too costly to search for or
the same short-term memory limitations he discusses where a search by other means has a low expected
also apply to performing subtasks that distract from benefit.

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 693
How can a JITIR agent automatically find informa- can be expected to provide information that might
tion that would be useful to a person by looking at be related to anything typed in that application, but
that person’s current environment? The perfect re- not in other applications. This constraint provides
trieval engine for a JITIR agent would instantly be some natural limits on the kinds of tasks being per-
able to know whether a piece of information would formed, though usually not as many as are enjoyed
be useful to a person. The next best thing would be by a specialized search engine. It is also sometimes
an engine that could know a person’s task, what in- possible to use features of the task environment to
formation he or she already knows, the thoughts he make a good first guess at the kind of information
or she is currently thinking, and how he or she pro- that might be most useful. For example, the RA uses
cesses new information. With such information, an the Emacs editing mode to tell whether a person is
agent could easily deduce whether information reading e-mail, writing code, or writing a paper. This
would be useful. Unfortunately, we have neither the information is used to choose between archived
ability to prognosticate nor read minds. An agent e-mail, a code library, or paper abstracts from the
must “make do” with whatever limited information INSPEC database.
it can sense automatically from a person’s compu-
tational or physical environment. Multiple topics. In a manual information system a per-
son usually searches for one piece of information at
Because it is impossible to directly sense whether a a time. A person’s environment will usually relate
document will be useful or not, features of the envi- to several different subjects. For example, the dis-
ronment that are detectable must be used as pre- cussion in this paper ranges from information re-
dictors, or proxies for usefulness. For the JITIR agents trieval to interface design to specific implementations
presented here, the similarity of a document to the of JITIR agents. E-mail will often cover even more
person’s current local context is used as the proxy disparate topics within the same message.
for usefulness. The more similar it is, the more use-
ful it is likely to be. Sometimes multiple-subject queries can be an ad-
vantage. Any documents that match more than one
Although Savant is designed to use many similarity of the subjects (e.g., in this case relating to both in-
metrics, the implemented systems all rely heavily on formation retrieval and interface design) are likely
text-retrieval techniques. The retrieval problems for to be useful documents, and documents relating to
JITIR agents differ from traditional text retrieval as just one subject represented in a query might still be
described in the following subsections. useful. However, multiple-subject queries can also
cause traditional IR techniques to miss documents
Lack of priors. A person chooses information re- that are very relevant to one particular subject in fa-
sources based on certain needs at the time. For ex- vor of documents only somewhat relevant to all sub-
ample, if a doctor needs a medical reference, he or jects represented. Furthermore, parts of the query
she will use the National Institutes of Health search might not be useful to a retrieval engine at all. For
site. If the doctor wants to know where to play golf, example, a signature line at the bottom of an e-mail
he or she will browse the tourism board database. may not contain any information relevant to a per-
The choice of information source is one of the best son’s current task or interests. This information must
indications of a person’s current needs. In terms of therefore be removed or otherwise ignored.
probability, the fact that a person chose a particular
database gives a strong prior (bias in probability) that Different criteria for evaluation. In the text-retrieval
information in that database will be useful. Usually field, algorithms are typically evaluated based on
JITIR agents do not have such a direct indication of whether the documents returned are relevant to the
what information might be valuable. Although one given query. It is assumed that the query is a good
could access a combined database of all information indication of the user’s interests. JITIR agent que-
that might be valuable, this solution neither scales ries are automatically created, so relevance is not
nor retrieves quality documents. good enough. To evaluate the information retrieval
algorithm of a JITIR agent, one needs to show that
Of course, some priors exist even with JITIR agents. the hits returned are useful to a person given a cur-
The kinds of information that can be provided by an rent task. Although relevance may correlate with use-
agent will be implicitly constrained by its interface, fulness, the two are not the same. For example, a
the sensors it uses, and the people who use it. For citation from the INSPEC database could be relevant
example, an agent embedded in a word processor to a paper a researcher is writing but still be useless

694 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
if the suggested document is already known. Al- documents. More suggestions, even if all of them
though relevance can be evaluated given only a were relevant, would be too much of a distraction.
query, usefulness must be evaluated for a given per- For this reason, the information retrieval algorithms
son in a given task environment. This requirement for JITIR agents should tend to favor precision over
makes it more difficult to generalize results. recall.

Queries are longer. In the past five years the IR field How should a JITIR agent present potentially use-
has been attempting to produce relevant documents ful information? The most important design con-
based on shorter queries. This trend has been spurred straint for JITIR agents is that reading provided in-
by the needs of Web search engines, where the av-
formation should be a secondary task. Unlike users
erage query length is less than three words. 19 Many
of a search engine, JITIR agent users are not actively
of the techniques developed have been ways to per-
seeking information being suggested. They are less
form query expansion, where a short query is auto-
tolerant of distraction from their primary task and
matically augmented with words appearing in the
best ranked documents of an initial probe search. 20 are less willing to dig through suggestions to find use-
With JITIR agents, the environment provides a large ful information. Furthermore, even if a suggested
amount of data that can potentially be a part of a text is highly relevant to a user’s current context, the
query, so query expansion is less important. person may not be interested in it. The text could
already be known, the user may not wish to be dis-
Both indexed documents and queries are multivariate. tracted, or may simply not want any new informa-
Both documents being suggested and queries them- tion. For this reason an interface must have a low
selves will often contain people’s names, dates, sub- cost for false positives. It must be an ignorable
jects, abstracts, locations, phone numbers, and a host interface, although not so ignorable that it is never
of other information types. Although manual que- used.
ries can also contain such information, they are of-
ten sparser because of the difficulties of entering The design of an ignorable interface must take into
large amounts of data. account two limitations on cognitive processes: fo-
cused attention and divided attention. Focused atten-
JITIR agents need both ranked-best evaluation and fil- tion is the ability to attend to intended stimuli and
tering. Search engines normally produce a ranked- tasks while ignoring others. The other side of the coin
best list of hits for a given query. The absolute rel- is divided attention: the ability to switch between tasks
evance score of a hit is not important as long as this or stimuli. Generally speaking, it is easier to focus
relative ranking is correct. JITIR agents require a com- one’s attention when the features of the environment
bination of ranked-best evaluation and filtering. They being attended are dissimilar to distractions. 22 For
need to display the top few hits, but also need to dis- example, it is easy to listen to talk radio while driv-
play an absolute confidence in the relevance of a sug- ing because driving is largely visual and spatial,
gestion and to potentially suppress the display of low- whereas listening to the radio is auditory and ver-
quality suggestions. This need is similar to the bal.
problem faced by text-filtering systems such as au-
tomatic newspaper-clipping services. However, these
systems assume that a stream of documents is com- Unfortunately, there is a trade-off between focused
pared to a long-lasting set of query profiles, such as and divided attention: the same similarity in display
a list of keywords expressing user interests. 21 The that makes it harder to focus on one stimulus and
queries in JITIR agents are not long-lasting, so most ignore the other makes it easier to switch focus be-
of the machine-learning techniques used by these tween two stimuli. This trade-off leads to the prox-
text-filtering systems cannot be applied. imity compatibility principle, which states that high
display proximity (similarity) helps in tasks with sim-
High precision is required. Information retrieval re- ilar mental proximity, and where information is re-
searchers often talk about the trade-off between pre- lated and needs to be treated together. 23 For exam-
cision (making sure all suggested documents are of ple, military pilots use head-up displays to place
high relevance) and recall (making sure all relevant annotations visually on top of enemy and friendly
documents are suggested). Because JITIR agents aircraft. This use of spatial proximity helps link the
largely play a supporting rather than a primary task annotation to the object but can make it more dif-
role, they usually should not suggest more than a few ficult to process only one without the other.

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 695
Divided attention is also easier when switching be- On a related note, it is important to be able to dis-
tween tasks because it does not require a large shift tinguish a suggestion from the environment. For ex-
in the contents of short-term memory. This situa- ample, it must be clear that a suggestion on a Web
tion is obviously the case when one of the tasks has page comes from Margin Notes and is not actually
low memory requirements. For example, turning on a part of the original Web page being viewed. One
a light while talking on the phone requires little shift- method is to ensure that suggestions appear out-of-
ing of memory because the location of the light switch band, for example, in a separate window or differ-
can be seen: it is knowledge in the world. However, ent modality. Another method is to ensure that sug-
gestions look nothing like the user’s other context.
For example, annotations in an augmented reality
system are never mistaken for the real world because
the resolution of computer graphics is not yet high
It is important
enough. The third method is branding: ensuring that
to be able to distinguish annotations have a unique “look and feel” that sets
a suggestion them apart. For example, annotations could use a
from the environment. different font, color, location, or modality than the
information being annotated.

To make divided attention (task switching) easier,


finding an object or reading e-mail while on the a JITIR agent should display information in a way that
phone requires more swapping of short-term mem- is congruous with the parts of the environment to
ory between tasks, and the phone conversation will which it relates. For example, it is likely easier to
probably suffer. Short-term memory also requires look at suggestions from an e-mail archive when
less swapping if the two tasks share similar mental reading or writing e-mail because the two formats
models, or schema. For example, several schema will have the same mental model. Similar mappings be-
be shared by tasks that relate to the same general tween suggestion and the context to which it is rel-
task or subject. 24 evant can be achieved with color and font. Probably
the most effective way to link information is to use
JITIR agents need to allow persons to focus their at- spatial proximity: put information near what it is
tention on their primary task, and also to divide their about. In the Remembrance Agent, the suggestion
attention between the primary task and the infor- display is in a buffer within the editor window rather
mation provided by the agent. than a separate window. This placement links the
RA with the particular text being edited and keeps
To make focused attention easier, a JITIR agent it from being linked with other applications running
should use different modalities or channels than are on the desktop. In Margin Notes, annotations ap-
used by a person’s primary task. Such use is espe- pear to the right of the paragraph or section to which
cially important when the primary task is demand- they relate. Moving the scroll bar in the Web browser
ing. For example, Jimminy is designed to give infor- moves the annotations as well, increasing the men-
mation to persons as they engage in conversation or tal linkage between the two.
listen to lectures. In these environments the audi-
tory modality is primary, so Jimminy uses a visual
display for output. The Margin Notes example reveals another use of
spatial proximity: It can indicate to which part of a
It is also important that suggested information is not user’s context a suggestion is relevant. Even if a per-
linked to parts of the environment to which it is not son’s full context is constrained to a single Web page,
relevant. For example, if an agent is giving informa- it should still be apparent whether a suggestion is
tion related to a text editor, the display should not relevant to a single paragraph, a section, or the en-
be near the Web browser, nor should it have a color tire Web page. Spatial proximity is a good way to
scheme or layout that associates it with the Web indicate this information, although when this is not
browser. Doing so would encourage checking the possible, the indication can still be by fiat, e.g., by
JITIR agent output at times when the suggestions are declaring that suggestions are chosen based on rel-
almost guaranteed not to be related to the current evance to the whole body, or by indicating the scope
task. of relevance in the suggestion in some other way.

696 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
Even when the interface is designed to help divided sensor watching the user’s actions. No user action
attention, a JITIR agent will still increase cognitive or attention is required to show the agent what to
load. When the information provided is valuable, the do. The output at this stage is potentially nothing:
trade-off is worthwhile, but false positives will still the agent simply passes the HTML to the browser and
be a problem. False positives are also guaranteed: continues.
a JITIR agent will never reach 100 percent useful in-
formation without perfect information about a per- When Margin Notes determines that a suggestion
son’s mental state. Given this limitation, informa- is above a certain relevance threshold, it automat-
tion should be displayed to minimize the cognitive ically jumps to the second stage and shows a sug-
and perceptual load the user must undertake to eval- gestion note. In keeping with the philosophy that
uate the quality of information provided. jumping to the next stage should be commensurate
with the amount of work or attention required by
One display technique is what we call a “ramping the current stage, no user thought or action is re-
interface,” where information is conveyed in stages. quired to display the suggestion note.
Each stage of a ramping interface provides a little
more information, at the cost of a little more atten- At this point the user may ignore the suggestion en-
tion required to read and understand it. The idea is tirely, and the interaction has cost nothing more than
to present useful information, while at the same time some screen real estate and a very minor cognitive
allowing a person to detect bad information and bail load. If the user wishes to go to the next stage, he
out as early as possible. or she can quickly view a row of filled-in circles in-
dicating the relevance value for that suggestion.
In a ramping interface the user should always be able
to obtain more information by going to the next stage, The next stage is reached by reading the suggestion
and the action required to get to that stage should description, which requires further attention on the
be proportional to the amount of information pro- part of the user. Information in the suggestion note
vided in the current stage. It should only require a is as concise as possible to allow rapid scanning for
simple action such as reading a pop-up window for content. The box also contains many different kinds
a user to go to early stages. Later stages might re- of information to try to contextualize the suggestion.
quire the user to click on an icon, trading off sim- For example, when e-mail is displayed, the box con-
plicity for increased control of what information is tains subject, author, date, and archive file name in
displayed. Note that stages are not defined by dis- the hope that at least one of these will be a good
play actions taken by the agent, but rather are de- indication of what the suggestion is about.
fined as pieces of information that can be individ-
ually processed by a user. For example, a Web page At the fourth stage of Margin Notes the system dis-
with a large-font title, keywords in boldface, and then plays the most relevant keywords. Going to this stage
the body of the page would contain three stages even requires physical action by the user (a mouse-over),
if the page is rendered all at once, because a reader and gives an incremental increase in the amount of
can easily process each chunk of information sep- information returned. While keyword information
arately and use that information to decide whether could have been included in the original suggestion
to read further. (reducing the system to a four-stage ramping inter-
face), it was decided that this information made the
Of the three implemented JITIR agents, Margin suggestion note too cluttered. To jump to the final
Notes best illustrates the idea of a ramping inter- stage, the user clicks on the link and obtains the fully
face. suggested text. At this point the user is completely
committed to seeing the text and has been distracted
The first stage could be referred to as the no-action, from the primary task. It is hoped that if the user
no-information stage. In this stage the system an- arrives at this point, the suggested text is worth the
alyzes a section of a Web page and decides whether attention spent.
to leave any suggestion at all. There are several rea-
sons a section might not be annotated. First, the best Note that the actions required to go through the
suggestions may still be below the required relevance stages of the Margin Notes ramping interface in or-
threshold. The section may also be too small to an- der are also the natural actions to obtain more in-
notate, or the document as a whole might be too formation in the context of Web browsing. The user
small. At this first stage, the input is simply a passive sees a link, reads it, moves the mouse to the link,

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 697
and clicks. This action sequence allows the user to other reported she had little knowledge about hous-
jump to the full suggestion quickly, while still seeing ing and so wrote about “what The Tech says about
all stages. housing.” Results were consistently good for the RA
in terms of preference and usage. Margin Notes,
When designing a ramping interface, it is also help- however, did poorly in both categories. This outcome
ful to consider at what stage a user is likely to re- is not unexpected, since Margin Notes was never dis-
ceive the information needed. In the Margin Notes playing information when the subjects were perform-
system, it is assumed that most of the time users will ing their primary task. Within the setup of this ex-
receive information from the full suggested text, but periment, the amount a subject used the search
that occasionally they will be reminded by the sug- engine set an upper bound to how many articles
gestion note itself and never need to follow the link. would even be displayed using Margin Notes.
In contrast, a JITIR agent intended for an automo-
bile or other attention-demanding environment Subjects who were given all three tools showed a con-
might be designed such that users normally need not sistent preference for the RA over the search engine
go past the first stage of a suggestion. and the search engine over Margin Notes in terms
of ranking, overall usefulness, and whether they
Evaluations would want the tool for a similar task. As can be seen
in Table 1, 78.5 percent of experimental subjects
Two user studies have been performed. The first is ranked the RA as their number one choice. (Because
a controlled-task experiment that compares the use- some answers were left blank, percentages do not
fulness of JITIR agents to a traditional search engine. add up to 100 percent.) The RA was also rated around
The second study evaluates the information retrieval a point (out of seven) higher than the search engine
used by the implemented JITIR agents. It also exam- for both usefulness and if the subject would want to
ines how the database used by an RA affects the qual- use the system for a similar task. Margin Notes, in
ity of suggestion. These systems have also been de- contrast, was consistently rated lower than the other
ployed for several years, and user feedback has been two systems. The differences between the RA and the
used to improve the systems. rank order of the search engine and whether the sys-
tem was useful are both statistically significant (p ⫽
Controlled task evaluation. The first experiment 0.05); the differences between the search engine and
compares JITIR agents to nonproactive information Margin Notes and the differences in whether the sub-
tools such as search engines. Twenty-seven subjects ject would want to use the system again are not sig-
were recruited from the MIT community and were nificant. Errors are to the p ⫽ 0.05 level.
divided randomly into experimental and control
groups. Subjects from both groups were asked to The usage patterns for the different tools point to
write a newspaper-style article about housing at MIT, similar conclusions. As can be seen from Table 2,
currently a hotly debated topic around campus. Con- subjects from the experimental group viewed around
trol group users were given a normal Emacs text ed- three times as many different Tech articles as did
itor and a Web browser accessing the MIT Tech news- those in the control group, and within the experi-
paper search page. 25 Experimental group subjects mental group subjects viewed around two-and-a-half
were given an Emacs augmented by the Remem- times as many articles using the RA as they did using
brance Agent and the MIT Tech search page aug- the search engine. The difference between total pages
mented by Margin Notes. The Emacs RA, Margin viewed in the two groups and the differences between
Notes, and The Tech search page all pulled from the search engine and RA use within the experimental
same archive of 16 240 news articles. All interactions group are significant to the 0.01 level. Less than one-
with the information tools were logged, and subjects third of the experimental subjects viewed any doc-
were also given pre- and post-task surveys. uments suggested by Margin Notes at all, and even
those did not view many.
A control group of 14 and an experimental group
of 13 subjects were tested. Of these, two outliers from The essays themselves were also examined and coded
the control group viewed a number of articles more to see whether a difference could be found between
than 1.5 times the inner-quartile distance from the the control and experimental groups. Articles were
third quartile and were eliminated. One of these two blinded and coded for number of facts mentioned,
reported that he had finished early and browsed The number of references to The Tech, and overall qual-
Tech about other issues for the remaining time, the ity. However, individual variance was high, and no

698 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
statistically significant difference was found between Table 1 Experimental group reported preferences (n ⫽ 13)
the two groups.
Search Emacs Margin
Engine RA Notes
From these results it is clear that the RA is not just
an alternative for a search engine; rather it facili- Found useful (1–7) 3.8 ⫾ 1.3 5.1 ⫾ 0.7 2.8 ⫾ 1.1
tates the retrieval of information that would not oth- Would want again (1–7) 4.8 ⫾ 1.1 6.0 ⫾ 0.8 3.9 ⫾ 1.2
Percent ranked #1 27 78.5 9
erwise be viewed at all. This conclusion is supported Percent ranked #2 64 15 9
by the fact that search engine use was not signifi- Percent ranked #3 9 8 82
cantly reduced when users were given the RA, and
the total number of articles viewed increased. This
fact follows directly from the accuracy versus effort
trade-off described earlier. As one subject com- Table 2 Number of unique Tech articles viewed (n ⫽ 12,
mented, “the hits [from the RA] weren’t as effective n ⫽ 13)
or relevant as the ones from the search engine, but Total Search Emacs Margin
I would never bother using the search engine.” Two Engine RA Notes
other experimental subjects commented that they
Control mean 2.8 ⫾ 1.7 2.8 ⫾ 1.7
“would be writing opinions, and the RA would bring Control median 2 2
up the facts to support those opinions automatical- Experimental 7.3 ⫾ 2.4 1.9 ⫾ 1.0 4.9 ⫾ 1.9 0.5 ⫾ 0.5
ly.” This comment is in contrast to a subject in the mean
control group, who complained that he bothered to Experimental 8 2 6 0
median
check his facts with the search engine, but it was a
great deal of effort, and he almost did not bother.
Perhaps the best review of the RA was a subject who
commented that it was “almost an unfair advantage Information retrieval test. A second experiment was
to have the RA.” performed explicitly to test the query-free informa-
tion retrieval system used by the RA, Margin Notes,
A few criticisms were made about the RA that could and Jimminy. Media Lab researchers were asked to
easily be remedied. First, several subjects commented submit conference papers and articles that they had
that they would start receiving very good suggestions, converted to HTML and placed on the Web. These
but that the display would “settle” and show the same papers were annotated using Margin Notes, running
suggestions after a few paragraphs had been writ- with a database containing a subset of the INSPEC
ten. This situation occurred because the RA was set collection of abstracts and citations. The database
up with only one scope, based on the past 500 words contained 152 860 citations that had been picked
of the text being written. As described earlier, the based on IEEE thesaurus labels that matched MIT Me-
RA is configurable for multiple scopes, each describ- dia Lab areas of research (e.g., “user interfaces,”
ing a different database or based on a different “computer vision,” etc.). Margin Notes was patched
amount of text. It would have been simple to con- to display all annotations regardless of relevance, and
figure the RA so a third of the suggestions were based the relevance score was blanked out. The research-
on the past 20 words being written, a third based on ers were then asked to rate each annotation in terms
of relevance to their paper in general, relevance to
the past 100 words, and a third based on the past
the section it annotates, and whether the citation
500 words written. However, this particular exper-
would be useful if they were writing or rereading their
iment only used the one scope.
paper. Nine researchers were asked to evaluate the
annotations on their papers, for a total of 112 spe-
Several subjects also requested the ability to high- cific annotations.
light a particular region or word for which they
wanted more information. While the RA does not Generally, the annotations were rated highly for rel-
have this feature yet, it does have the capability to evance. As can be seen in Table 3, the average score
perform full manual queries based on individual was 3.3 out of 5 for relevance to the paper in general
strings. However, subjects were not informed about and 3.4 out of 5 for relevance to a specific section.
this feature because the experiment was designed to In all, 7 ⫾ 9 percent of the annotations received a
examine the proactive nature of the RA. The high- score of four or five for general relevance, and 4 ⫾ 9
lighting feature will be added in a future release. percent for relevance to a specific section (p ⫽ 0.05).

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 699
Table 3 INSPEC annotation rating breakdown (5 ⫽ best) the raw unnormalized score. Unfortunately, the non-
normalized score had only a low correlation to hu-
General Section Usefulness man judgments of relevance (r ⫽ 0.36, p ⫽ 0.0002),
Relevance Relevance accounting for only 13 percent of the variation. Nor-
Score ⫽ 1 16% 21% 32%
malized scores were even lower (r ⫽ 0.13) and were
Score ⫽ 2 16% 9% 18% not statistically significant. The reason is query length
Score ⫽ 3 21% 16% 15% correlates to relevance (r ⫽ 0.27, p ⫽ 0.05), because
Score ⫽ 4 17% 19% 18% with more words in the query, the text-retrieval sys-
Score ⫽ 5 30% 35% 17% tem can find a better match. Correlations between
Average score 3.3 3.4 2.7 scores and usefulness were not significant, presum-
Error 0.3 0.3 0.3 ably because highly relevant annotations were more
likely to already be known by a reader and thus not
be useful.

Usefulness scores were not as good as relevance Related work


scores. The average usefulness score was only 2.7,
with 35 percent of the annotations receiving a 4 or Most directly related to this work are other JITIR
5. With a further look at the data, it is clear that, at agents: systems that proactively provide information
least for this task, relevance was a necessary but not based on a person’s local context in a nonintrusive
sufficient condition for usefulness. Of particular in- yet accessible manner. Several such systems are de-
terest is that, of the 4 ⫾ 7 percent of citations that scribed below.
were rated low on usefulness (1 or 2) but high on
general relevance (4 or 5), 100 percent were noted Watson 26 is a system that automatically produces and
as being not useful because the citation was already submits AltaVista** or other Web queries based on
known by the person doing the rating. Moreover, 68 a user’s current Web page or Microsoft Word** doc-
percent were not only known, but were in fact writ- ument. Results are clustered and displayed in a sep-
ten by the person doing the rating. This result is en- arate window. Watson is designed both to proactively
couraging for three reasons. First, it indicates that display information and to augment queries to Web
higher usefulness ratings could be obtained by us- pages with local context. It also includes domain-spe-
ing a small amount of domain knowledge to avoid cific features, such as automatically searching the Ar-
suggesting references that the user most likely al- riba Vista image search engine whenever a Microsoft
ready knows. For example, in this case the system Word user creates an image caption.
could avoid listing citations that are already refer-
enced in the paper or written by the user of the sys- SUITOR 27 watches a person’s Web browser, word pro-
tem. Second, the fact that these references were rel- cessor, and other applications and provides poten-
evant but already known by the author of the paper tially useful information above the task bar at the
being annotated means the annotations would prob- bottom of the screen. It is based on a blackboard
ably be useful to someone less well-versed in the par- architecture with multiple agents, each looking for
ticular subject. Third, just because a reference is al- domain-specific information. For example, if a per-
ready known does not mean it is not useful. Some son is looking at the IBM Web page and also looking
of the readers rated known annotations as still be- at financial pages, SUITOR will present IBM stock
ing useful, commenting that they were a good re- quotes at regular intervals.
minder.
Letizia 28 automatically recommends Web pages a
Because the relevance score is used both as an in- person might want to visit, based on a short-term user
dication of usefulness and for filtering low-relevance profile and the Web page currently being viewed.
suggestions, it was hoped that the score would have Letizia automatically creates a user profile by com-
a strong correlation to how relevant a person would piling keywords contained in previous pages viewed
find an annotation. Two relevance scores were tested. by the user. It then acts as an “advance scout,” fol-
The first is normalized by the number of unique lowing links from the user’s current Web page and
words in the query (the section being annotated). bringing up pages that match the user’s profile. In
When this normalization is not performed, relevance most JITIR agents the source of suggested informa-
scores for long queries can be orders of magnitude tion is personalized and static (though slowly updat-
higher than scores for short queries. The second was ed), and the user’s current context is used to retrieve

700 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
this information. Letizia does just the opposite: the in a small town in Japan would contain hyperlinks
source of information is the Web pages, and doc- from the number “220” to a description of what per-
uments are suggested based on the user’s profile that centage of the population was killed and how many
changes over the course of hours or days. Letizia also people that percentage would represent in the read-
shows the entirety of a recommended Web page in er’s own hometown.
a separate window rather than presenting a summary
of potential hits and letting the user pick which sug- Flyswat** 33 is another recent commercial venture.
gestions to view. Flyswat operates as a plug-in to a Web browser and
highlights words or phrases it identifies in a data-
Kenjin** 29 is a brand new commercial product by base. Clicking on the highlight can lead to dictionary
Autonomy that, according to user guides and press or thesaurus lookups or e-commerce pages that will
releases, provides information from product data- sell products related to the word.
bases, Web searches, or the user’s personal computer
that are related to information in currently running JITIR agents also share similarities with many per-
applications. Being a commercial product and hav- sonalized information systems. Of these, the most
ing just been released, few details are available. relevant are Lifestreams, 34 the Forget-Me-Not sys-
tem, 35–37 and Lotus Agenda**. 38 All these systems
Finally, RADAR 11 is a different front end for the Re- are designed to organize personal information with
membrance Agent that uses Microsoft Word instead minimal user intervention. However, Forget-Me-Not
of Emacs and displays suggestions in a separate win- is a direct memory aid, allowing users to search a
dow. RADAR was built in collaboration with the JITIR database of their own automatically recorded past
work described here, and also uses Savant as the in- actions. It falls into the same category as search en-
formation-retrieval engine. gines and other interactive information retrieval sys-
tems and is not designed to be proactive. Lifestreams
There are also several JITIR agents that proactively is an organizational structure to personal data rather
provide information based on local context but are than a JITIR agent, though it does provide a notifi-
domain-specific. cation system based on time. Agenda is a database
program that includes contextual triggers that au-
Fixit 30 is an expert system for copy machine repair tomatically perform an action when data that meet
that automatically brings up manual pages relating a particular criterion are entered.
to the fault being diagnosed. It uses the table of con-
tents for the repair manual to find useful pages, based JITIR agents are also related to work in the area of
on the diagnosis produced by the expert system. Al- context-aware applications, especially those that use
though the techniques used were domain-dependent, wearable computers. One such system is Audio Au-
the corpus (manual pages) was still a legacy system ra, 39 an audio-based wearable computer that uses
and did not need to be hand-coded for the system. ambient sound to automatically indicate e-mail,
group activity, and location-based information. No-
WebWatcher 31 is similar to Letizia, highlighting hy- madic Radio 40 is another audio-based wearable sys-
perlinks that best match a user’s stated interest. As tem to deliver news, voice mail, and e-mail. In par-
with Letizia, recommendations are only pulled from ticular, Nomadic Radio uses a concept called
links on the page currently being viewed, based on “dynamic scaling” that is quite similar to a ramping
a user profile. However, in WebWatcher, recom- interface. A third related system is the Context-
mended links are indicated by surrounding the link Aware Archaeological Assistant, 41 by which location
with a special icon. The system gives no further in- sensed by a GPS is used to provide field notes about
formation about the link or why one might want to animals in the area. This work is also related to many
follow it. WebWatcher uses the topology of links be- efforts in augmented reality 42,43 and ubiquitous com-
tween Web pages to make recommendations and is puting. 44
therefore only applicable to the Web.
Finally, several interface techniques have been de-
The Peace, Love, and Understanding Machine veloped for presentation of information related to
(PLUM) system 32 will automatically add hyperlinks current text, either automatically or by an author’s
to disaster news stories to better give a reader a per- design. One such system is VOIR, 45 which combines
sonalized understanding of the news. For example, elements of a document with a user’s interest pro-
a report that 220 people were killed in an earthquake file to automatically create hyperlinks. Another is

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 701
the XLibris pen-based system, 46 which supports the Acknowledgments
ability to hand-annotate documents using margin
links. Similarly, Fluid Links 47 allows Web designers We would like to thank our sponsors at British Tele-
to create “glosses,” which are a way of presenting com and Merrill Lynch. We would also like to thank
information incrementally. Thad Starner, without whom the project would never
have been started, and the RA team of Jesse Rabek,
Alex Park, Jesse Koontz, Pammie Mukerji, Matthew
Conclusions Burnside, Aneel Nazareth, Bayard Wenzel, and Jan
Nelson, without whom the project would never have
In conclusion, just-in-time information retrieval been completed. Finally, we would like to thank the
agents are systems that proactively provide informa- reviewers of the papers that have been written on
tion based on a person’s local context in a general, this subject over the years, and the Remembrance
task-independent way. They are related to search en- Agent for bringing those reviewer comments to the
gines, alarms, and news-clipping software, but differ foreground just when they were needed the most.
from each.
**Trademark or registered trademark of Yahoo! Inc., Microsoft
Corporation, The Open Group, Linus Torvalds, Reflection Tech-
The Remembrance Agent, Margin Notes, and Jim- nology, Inc., Handykey Corporation, Insilco Corporation, Alta-
miny have all been in use in the field for over three Vista Company, Autonomy, Inc., Flyswat, Inc., or Lotus Devel-
years. Although controlled experiments are useful opment Corporation.
for validating theories, it is these experiments “in the
wild” that give the best indication of what those the- Cited references
ories should be. Reports from long-term users have 1. B. Rhodes and T. Starner, “The Remembrance Agent: A Con-
verified the assertion that JITIR agents are not sim- tinuously Running Information Retrieval System,” The Pro-
ply a substitute for a traditional search tool; using ceedings of the First International Conference on Practical Ap-
a JITIR agent changes how people use information plications of Intelligent Agents and Multi-Agent Technology
(PAAM’96), London (April 1996), pp. 486 – 495.
in ways that having a search engine alone does not. 2. B. Rhodes, “Margin Notes: Building a Contextually Aware
For example, the comments of the two controlled- Associative Memory,” Proceedings of the International Con-
experiment subjects that “I typed opinions, and the ference on Intelligent User Interfaces (IUI’00), New Orleans,
RA gave me the facts to support them automatically” LA (January 9 –12, 2000), pp. 219 –224.
have been repeated many times in conversations with 3. B. Rhodes, “The Wearable Remembrance Agent: A System
for Augmented Memory,” Personal Technologies: Special Is-
long-term users. Variations include users who an- sue on Wearable Computing 1, 218 –224 (1997).
swer more e-mailed questions than they normally 4. T. Starner, Lizzy: MIT’s Wearable Computer Design 2.0.5,
would because the RA makes it simple, and users be- http://www.media.mit.edu/wearables/lizzy/ (1997).
coming generally more aware of the historical con- 5. B. Rhodes, “Wearable Computing Meets Ubiquitous Com-
puting: Reaping the Best of Both Worlds,” The Third Inter-
text surrounding issues about which they are read- national Symposium on Wearable Computers (ISWC’99), San
ing or writing. Francisco, CA (October 18 –19, 1999), pp. 141–149.
6. T. Starner, D. Kirsch, and S. Assefa, “The Locust Swarm: An
Environmentally-Powered, Networkless Location and Mes-
It is also clear from long-term use that JITIR agents saging System,” First International Symposium on Wearable
are situational applications: their effectiveness de- Computers (ISWC’97), Cambridge, MA (October 13–14,
pends a great deal on the environment in which they 1997), pp. 169 –170.
are used. If the environment and the agent are mis- 7. R. Want, A. Hopper, V. Falcão, and J. Gibbons, “The Active
matched, for example, using the INSPEC database Badge Location System,” ACM Transactions on Information
Systems 10, No. 1, 91–102 (January 1992).
when writing e-mail about nonresearch, then the re- 8. N. Minar, M. Gray, O. Roup, R. Krikorian, and P. Maes,
sults are not useful. Similarly, a database of personal “Hive: Distributed Agents for Networking Things,” Proceed-
e-mail works well for the owner of that e-mail but ings of the First International Symposium on Agent Systems and
does not produce results as good when used by an- Applications and Third International Symposium on Mobile
Agents (ASA/MA’99) (1999), pp. 118 –129.
other. These observations are the inspiration for de- 9. S. E. Johnson, P. Jourlin, K. Spark Jones, and P. C. Wood-
signing techniques that are generally applicable, so land, “Spoken Document Retrieval for TREC-8 at Cambridge
the agents can be adapted quickly to different do- University,” to appear in NIST Special Publication: The Eighth
mains, corpora, and individual preferences. It is also Text REtrieval Conference (TREC 8) NIST, Gaithersburg, MD
the inspiration for producing interfaces that are ig- (2000).
10. A. Pentland, B. Moghaddam, and T. Starner, “View-Based
norable, so the inevitable false-positive suggestions and Modular Eigenspaces for Face Recognition,” Proceed-
do not become more of a distraction than their oc- ings of IEEE Conference on Computer Vision and Pattern Rec-
casional usefulness is worth. ognition, Seattle, WA (June 1994), pp. 21–23.

702 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000
11. I. B. Crabtree, S. Soltysiak, and M. Thint, “Adaptive Personal 32. S. Elo, PLUM: Contextualizing News for Communities Through
Agents,” Personal Technologies 2, No. 3, 141–151 (1998). Augmentation, master’s thesis, MIT, Media Arts and Sciences,
12. S. Robertson, S. Walker, and M. Beaulieu, “Okapi at TREC-7: Cambridge, MA (1995).
Automatic ad hoc, Filtering, VLC and Interactive,” NIST Spe- 33. Flyswat, Inc., see http://www.flyswat.com.
cial Publication 500-242: The Seventh Text REtrieval Confer- 34. E. Freeman and D. Gelernter, “Lifestreams: A Storage Model
ence (TREC 7), NIST, Gaithersburg, MD (1999), pp. 253– for Personal Data,” ACM SIGMOD Bulletin, 80 – 86 (March
264. 1996).
13. B. Rhodes, Just-in-Time Information Retrieval, Ph.D. disser- 35. M. Lamming, P. Brown, K. Carter, M. Eldridge, M. Flynn,
tation, MIT, Media Arts and Sciences, Cambridge, MA (June and G. Louie, “The Design of a Human Memory Prosthe-
2000). sis,” The Computer Journal 37, No. 3, 153–163 (1994).
14. G. K. Zipf, Human Behavior and the Principle of Least Effort: 36. M. Lamming and M. Flynn, “Forget-Me-Not: Intimate Com-
An Introduction to Human Ecology, Addison-Wesley Publish- puting in Support of Human Memory,” Friend21: International
ing Co., Reading, MA (1949). Symposium on Next Generation Human Interface, Meguro Ga-
15. J. Payne, J. Bettman, and E. Johnson, The Adaptive Decision joen, Japan (1994), pp. 125–128.
Maker, Cambridge University Press, Cambridge, UK (1993). 37. P. Brown, J. Bovey, and X. Chen, “Context-Aware Applica-
tions: From the Laboratory to the Marketplace,” IEEE Per-
16. S. Jarvenpaa, “The Effect of Task Demands and Graphical
sonal Communications 4, No. 5, 58 – 63 (October 1997).
Format on Information Processing Strategies,” Management
38. S. Kaplan, M. Kapor, E. Belove, R. Landsman, and T. Drake,
Science 35, 285–303 (1989).
“Agenda: A Personal Information Manager,” Communica-
17. J. Russo, “The Value of Unit Price Information,” Journal of
tions of the ACM 33, No. 7, 105–116 (July 1990).
Marketing Research 14, 193–201 (1977). 39. E. Mynatt, M. Back, and R. Want, “Designing Audio Aura,”
18. R. Miller, “Response Time in Man-Computer Conversational Proceedings of the Conference on Human Factors in Comput-
Transactions,” AFIPS Conference Proceedings of the Fall Joint ing Systems (CHI’98), ACM Press (1998), pp. 566 –573.
Computer Conference 33, Part 1 (1968), pp. 267–277. 40. N. Sawhney and C. Schmandt, “Nomadic Radio: Scalable and
19. B. Jansen, A. Spink, and J. Bateman, “Searchers, the Sub- Contextual Notification for Wearable Audio Messaging,” Pro-
jects They Search, and Sufficiency: A Study of a Large Sam- ceedings of the ACM SICHI Conference on Human Factors in
ple of EXCITE Searches,” Proceedings of WebNet-98 (1998). Computing Systems, Pittsburgh, PA (May 15–20, 1999), pp.
20. D. Harman, “Overview of the Third Text REtrieval Confer- 96 –103.
ence (TREC-3),” NIST Special Publication 500-226: Overview 41. N. Ryan, J. Pascoe, and D. Morse, “Enhanced Reality Field-
of the Third Text REtrieval Conference (TREC-3), NIST, Gaith- work: The Context-Aware Archaeological Assistant,”
ersburg, MD (1995), pp. 1–20. V. Gaffney, M. van Leusen, and S. Exxon, Editors, Computer
21. D. Hull, “The TREC-7 Filtering Track: Description and Anal- Applications in Archaeology (1997, 1998).
ysis,” NIST Special Publication 500-242: The Seventh Text RE- 42. J. Rekimoto, Y. Ayatsuka, and K. Hayashi, “Augmentable
trieval Conference (TREC 7), NIST, Gaithersburg, MD (1998), Reality: Situated Communications Through Physical and Dig-
pp. 33–56. ital Spaces,” Digest of Papers: The Second International Sym-
22. C. D. Wickens, Engineering Psychology and Human Perfor- posium on Wearable Computers, Pittsburgh, PA, IEEE Com-
mance, Second Edition, Harper Collins Publishers, New York puter Society (October 19 –20, 1998), pp. 68 –75.
(1992). 43. S. Feiner, B. MacIntyre, and D. Seligmann, “Knowledge-
23. Wickens, op. cit., p. 98. Based Augmented Reality,” Communications of the ACM 36,
24. Y. Miyata and D. Norman, “Psychological Issues in Support No. 7, 52– 62 (July 1993).
of Multiple Activities,” User Centered System Design, D. Nor- 44. M. Weiser, “Some Computer Science Issues in Ubiquitous
man and S. Draper, Editors, Lawrence Erlbaum Associates, Computing,” Communications of the ACM 36, No. 7, 75– 84
Inc., Hillsdale, NJ (1986). (July 1993).
25. The Tech Archive Search, see http://www-tech.mit.edu/search. 45. G. Golovchinsky, “What the Query Told the Link: The In-
html. tegration of Hypertext and Information Retrieval,” The Pro-
26. J. Budzik and K. Hammond, “Watson: Anticipating and Con- ceedings of HyperText’97, Southampton, UK, ACM Press
textualizing Information Needs,” Proceedings of the Sixty-Sec- (1997), pp. 67–74.
ond Annual Meeting of the American Society for Information 46. M. Price, G. Golovchinsky, and B. Schilit, “Linking by Ink-
Science (1999). ing: Trailblazing in a Paper-Like Hypertext,” The Proceed-
ings of HyperText’98, Pittsburgh, PA, ACM Press (June 1998),
27. P. Maglio, R. Barrett, C. Campbell, and T. Selker, “SUIT-
pp. 30 –39.
OR: An Attentive Information System,” The Proceedings of
47. P. Zellweger, B. Chang, and J. Mackinlay, “Fluid Links for
the International Conference on Intelligent User Interfaces
Informed and Incremental Link Transitions,” The Proceed-
(IUI’00), Henry Lieberman, Editor, ACM Press (January
ings of HyperText’98, Pittsburgh, PA, ACM Press (June 1998),
9 –12, 2000), pp. 169 –176. pp. 50 –57.
28. H. Lieberman, “Autonomous Interface Agents,” Proceedings
of CHI’97, Atlanta, GA (March 1997), pp. 67–74.
29. Autonomy, Inc., San Francisco, CA, see http://www. Accepted for publication April 13, 2000.
kenjin.com.
30. P. Hart and J. Graham, “Query-Free Information Retriev- Bradley J. Rhodes MIT Media Laboratory, 20 Ames Street, Cam-
al,” IEEE Expert/Intelligent Systems & Their Applications 12, bridge, Massachusetts 02139-4307 (electronic mail: rhodes@media.
No. 5, 32–37 (September, October 1997). mit.edu). Dr. Rhodes is a recent graduate of MIT’s Media Lab,
31. T. Joachims, D. Freitag, and T. Mitchell, “WebWatcher: A where he worked under the direction of Pattie Maes in the Soft-
Tour Guide for the World Wide Web,” International Joint ware Agents Group. His research areas included the study of in-
Conference on Artificial Intelligence (August 1997), pp. 770 – telligence augmentation, just-in-time information retrieval, ubiq-
775. uitous computing, context-aware applications, and wearable

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 RHODES AND MAES 703
computing. He is one of the “Media Lab Cyborgs” and for the
past four years has been an active member of the MIT Wearable
Computing Project. He received his Ph.D. degree from the MIT
Media Lab in 2000, his S.M. degree from the MIT Media Lab
in 1996, and an S.B. degree in computer science from MIT in
1992.

Pattie Maes MIT Media Laboratory, 20 Ames Street, Cambridge,


Massachusetts 02139-4307 (electronic mail: pattie@media.mit.edu).
Dr. Maes is an associate professor at MIT’s Media Lab, where
she founded and directs the Software Agents Group and is prin-
cipal investigator of the Media Lab e-markets Special Interest
Group. Her team’s work has included personalized information
filtering agents, agents that automate behavior patterns, match-
making agents, just-in-time information retrieval agents, and
agents that buy and sell on behalf of a user. Dr. Maes received
her doctoral degree in computer science and artificial intelligence
from the Vrije Universiteit Brussel in Belgium in 1987.

704 RHODES AND MAES IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000

You might also like