Welcome to Scribd. Sign in or start your free trial to enjoy unlimited e-books, audiobooks & documents.Find out more
Download
Standard view
Full view
of .
Look up keyword
Like this
4Activity
0 of .
Results for:
No results containing your search query
P. 1
RH CHI99paper

RH CHI99paper

Ratings: (0)|Views: 195|Likes:
Published by Jamey Graham

More info:

Published by: Jamey Graham on Sep 10, 2008
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

12/15/2012

pdf

text

original

 
ABSTRACT
Over the last two centuries, reading styles have shifted awayfrom the reading of documents from beginning to end andtoward the skimming of documents in search of relevantinformation. This trend continues today where readers,often confronted with an insurmountable amount of text,seek more efficient methods of extracting relevant informa-tion from documents. In this paper, a new document reading
environment is introduced called the Reader’s Helper
TM
,which supports the reading of electronic and paperdocuments. The Reader’s Helper analyzes documents andproduces a relevance score for each of the reader’s topics of interest, thereby helping the reader decide whether the docu-ment is actually worth skimming or reading. Moreover, dur-ing the analysis process, topic of interest phrases areautomatically annotated to help the reader quickly locate rel-evant information. A new information visualization tool,called the Thumbar
TM
, is used in conjunction with relevancyscoring and automatic annotation to portray a continuous,dynamic thumb-nail representation of the document. Thisfurther supports rapid navigation of the text.
Keywords
document annotation, information visualization, content rec-ognition, intelligent agents, digital libraries, probabilisticreasoning, user interface design, reading online
INTRODUCTION
Around 1750AD there was a dramatic change in the waypeople read documents [8]. Before this time, readers con-sumed documents
intensively
, reading the document fromstart to finish, sometimes several times or even out loud to agroup. By the early 1800’s, however, readers tended to read
extensively
, reading documents only once or skimming thedocuments in search of relevant information to determinewhether the document was worth reading in its entirety.Today, with the advent of the World Wide Web (WWW) andthe growing collection of electronic documents, this style islikely to continue: there are simply too many potentially use-ful documents and not enough time to read them all [14].Office workers, in particular, are forced to optimize theirdaily reading by sifting through the vast amount of informa-tion, establishing a balance between in-depth understandingand expediency. Reading intensively versus extensively canbe thought of as
vertical
 
versus
 
horizontal
reading [6]. Thatis, in the past, readers read the document from beginning toend (vertical); now, they scan and browse the text(horizontal).Few applications available today fully support the readingprocess. There are, however, several applications whichcondense or locate documents for the user. Applicationssuch as [13, 15] provide a synopsis of the text which cansometimes be used to determine the document’s relevance.Other systems search for and retrieve documents relevant toan evolving user profile [3, 16]. The learning of user pro-files over time provides an evolutionary process whichenables the system to improve the quality of documentsretrieved for the user. Another system supports users as theysearch digital libraries by showing query keywords in thecontext of the sentences they appear in a document [17].Thus, users can quickly access the database based on thepresence or absence of a particular context they are seeking.Another application inserts supplemental information in theform of an annotation into a news story if the story containskey phrases from the subject database [9]. This offers thereader additional information not necessarily provided by theauthor of the original text. The work by [7] supports theskimming of documents by representing the topics of a textas
content capsules
. Using a special visualization tool por-traying the document as a thumb-nail image, topics are pre-sented to the reader at the location in the text where theyoccur. This allows the reader to quickly view the highlightsof the document in the context of the surrounding text struc-ture.Despite the growing number of applications used to locateand evaluate documents, there are few, if any, applicationsthat focus on the actual
text reading
process. I believe thatreaders require a personalized environment that supports theskimming of documents and the extraction of information. Ihave created a new document reading environment called theReader’s Helper, to act as both the reader’s document
The Reader’s Helper:A Personalized Document Reading Environment
Jamey Graham
California Research CenterRicoh Silicon Valley, Inc.2882 Sand Hill Road, Suite 115Menlo Park, CA 94025, USA jamey@rsv.ricoh.com
 
browser and personal agent, advising the reader of relevantdocuments and of the relevant text within each document.
The Reader’s Helper is not a search engine; it does notsearch for or deliver documents to the user. Instead, it
helpsreaders help themselves
to be more productive in reading byevaluating documents the reader views and by providingvisual tools for showing the locations of the relevant por-tions of the text. In the following sections the Reader’sHelper system is describe, both in terms of the user interfaceand the underlying content recognition subsystem. Futureissues and potential research directions are also discussed.
THE READER’S HELPER
The Reader’s Helper (RH) application integrates existingtechnology—a WWW browser, highlighting key words, andprobabilistic reasoning—with a unique information visual-ization tool in support of readers who read both online andpaper documents. The RH uses information about eachreader in evaluating the content of a document. It calculatesa
relevance
value to determine if a document is applicableto a reader. The current prototype is composed of a special-ized WWW browser and an annotation agent responsible forrecognizing the reader’s
topics of interest 
in a document.The way the annotation agent understands what is importantto the reader is through a
reader profile
which contains per-sonal information about the reader. In the following sec-tions the electronic document browsing environment isdescribed in terms of the user interface and followed by adescription of the
 paper-based 
version of the RH.
The User Interface
One of the most important aspects of an electronic docu-ment reading environment is the user interface. The diffi-culty of reading electronic documents is well known [18].Readers therefore often rely on the printed documentbecause of the high resolution and flexibility offered bypaper. A design goal for the RH was to provide the readerwith an easy to use interface that emphasizes the relevantcontent of the document, and, as a consequence,
 personal-izes
the document. This should not only increase the appealof reading electronic documents but also the efficiency withwhich they can be consumed. There are three main methodsthat work in concert to improve the readability of an elec-tronic document. First, a relevance score is computed foreach topic that is important to the reader with respect to adocument. Each topic score is an estimation of the rele-vancy of the document to the reader, offering a
 first appraisal
of the document. This is directly associated withthe horizontal reading trend mentioned earlier: most readersdo not have time to completely analyze all of the documentsthey must process and so it is desirable to have a quantita-tive measure of each document’s relevancy. Second, a newinformation visualization tool, called the Thumbar, showsan overview of both the document and the results of anno-tating the document. Hence, the reader can quickly navi-gate to the more relevant parts of a document based on thevisual information present in a thumb-nail. Thirdly, afterthe RH has completed the analysis of a document, it is auto-matically annotated to depict the most relevant portions of the document. As the reader reads the document, keyphrases are pointed out by way of highlighting to help guidethe reader through the text.
The Thumbar 
Figure 1 shows the RH document browser displaying aHTML source document (from the CHI ‘97 Online Proceed-ings [19]). On the left hand side of the display is the Thum-bar
(
and
)
. The Thumbar is a unique informationvisualization tool whose name is derived from the conceptof a thumb-nail image combined with the functionality of ascrollbar. The reader may drag a lens up and down toreposition the view of the document in the document areain the manner of a scrollbar. The Thumbar allows read-ers to
look ahead 
in the text to see more of the document’sstructure and content. By using a Thumbar, a reader canquickly scan the document for a desired image, chart oreven text structure and scroll accordingly. The work by [22]supports the notion that humans are very good at recogniz-ing small images. Recent work by [20] further exploits thisconcept in a system where icons of documents are used asthe retrieval cues to a large document database. It wasfound that users could recognize the thumb-nail images of documents (based on the text structure and the images in thetext) to retrieve several documents which “looked” like thedocument they were seeking.The Thumbar is a dynamic representation of the documentcontained in the browser. Its contents and shape change asthe document itself is personalized. For instance, afterannotating a document, the Thumbar presents relevant key-word phrases as red lines (instead of the typical gray orblack lines). This clearly indicates to the reader where therelevant information is located in the document, similar tothe way attribute-mapped scroll-bars [24], TileBars [10] andthe Mural [12] depict relevant information in a document.This visual information, however, can be changed. Forexample, by “turning off” a concept that has scored well inthe document, the red lines in the Thumbar change back tothe normal gray or black lines as if there were no annotationat that location at all. Thus, the reader can create a repre-sentation of the document based on a combination of con-cepts either turned on or off. Another way of alteringthe annotation is by using the
sensitivity threshold meter 
 which can be used to manipulate the concepts that are activein the document. This is done by setting a threshold andonly allowing concepts whose similarity scores equal or sur-pass this threshold to be visible as annotations.As with any HTML document, there is no formal paginationas long as the document is contained in the browser (thepagination is set when the document is sent to the printer).Instead, there is simply the concept of “a screenful” of thedocument which can change if the user resizes their browser
1.11.21.21.31.41.5
 
window. The Thumbar represents a reduced version of the original document based on a user defined reduction ratio(e.g. in figure 1, the reduction ratio is 6 which means that theThumbar is 1/6th the size of the document area ). Theentire representation of text in the Thumbar is presentedusing a proportionally reduced line for each word. Thismethod of portraying a navigable thumb-nail of text is simi-lar to [4], used for software visualization.If the document is resized, the Thumbar is resized to repre-sent the same screenful depicted in the document area. Theuse of a thumb-nail icon to represent a page in a document isof course not a new idea. Adobe Systems, for instance, usesa similar method of displaying icons for the individual pagesin a document [1]. The Adobe method, however, portrays astatic representation of the document (i.e., the thumb-nailimage does not automatically change when the document isedited) and does not support using a lens for navigationthroughout the document. The Adobe method is also basedon a formally paginated document with distinct thumb-nailsfor each page. In the Thumbar, there is no pagination andtherefore, the document is viewed as a continuous stream of text and images by which one can navigate to any location inthe document.If the document cannot be fully contained in the Thumbar(because the document is too long), a green line is drawnacross the bottom of the Thumbar. Dragging the lens downto the green line causes the entire thumb-nail image to scrollupwards, thereby revealing more of the document. Once thedocument has been scrolled in this way, a green line will alsoappear at the top of the Thumbar indicating that more of thedocument is available above. To go back to the beginning othe document, the reader may again drag the lens upward tothe green line which, thereby causing the thumb-nail imageto scroll back down. Alternatively, there are buttons at thetop of the Thumbar for repositioning the lens at either the top
 
or the bottom of the document. The user may alsochoose to view the entire document in the Thumbar. Thishas the disadvantage, however, of reducing the clarity of thumb-nail information.With the help of a Thumbar attached to the documentbrowser, the reader can look ahead in the document to evalu-ate its content and structure. This is not possible in mostdocument browsing systems available today where readersare restricted to one page or screenful at a time. This abilityto look ahead in a document is quite useful for reading docu-ments as it leverages off of the design principle of providinga global (macro) and local (micro) representation of informa-tion [23]. The global representation portrays coarse infor-
1.11.3
Figure 1 - The Reader’s Helper WWW Browser
1.11.21.3 1.41.5

You're Reading a Free Preview

Download
scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->