
CHI 97

22-27

MARCH 1997

ORGANIZATIONAL

OVERVIEWS

Multimodal Human Computer Interaction Research at Toshiba Research and Development Center


Yoichi Takebayashi and Miwako Doi Human Interface Technology Center, Research and Development Center, Toshiba Corporation 1 Komukai-Toshiba-cho, Saiwai-ku, Kawasaki 210, Japan +81 44 549 2243 {yoichi,doi}@eel.rdc.toshiba.co.jp

ABSTRACT
Toshiba's Human Interface Research Group is pursuing media understanding and intelligent interaction technologies to achieve natural multimodal HCI (human-computer interaction). In collaboration with Toshiba's other corporate laboratories, engineering laboratories and business divisions, we have been developing practical interactive systems and products related to information services, consumer electronics, document filing and industrial equipment.

KEYWORDS
Organizations, multimodal, HCI, information filtering, knowledge sharing, media understanding.

ORGANIZATION THEMES AND RESEARCH

Best known for the world's first letter handling system using handwritten character recognition and for the Japanese word processor using Kana-to-Kanji conversion, Toshiba Research and Development Center (RDC) has been developing various media conversion/understanding systems and natural language processing systems. We believe that these technologies play important roles in achieving user-centered multimodal human-computer interaction. While advances in computing environments have helped us gather and share large amounts of multimedia data, they have also forced us to work under stress due to a flood of information. To solve this problem, we are focusing our HCI research on information retrieval and knowledge sharing, based on media understanding technologies. Specifically, we have been exploring users' intentions and the contents of multimedia data from the viewpoint of media conversion and understanding functions and multimodal interfaces, because fully understanding both is crucial to retrieving useful information.

Toshiba's Human Interface Technology Center (HIC) was established in 1995 as a corporate organization, aiming to achieve reliable, human-centered media technologies in harmony with human society. To apply these technologies to various systems and products, about 30 researchers are collaborating with other organizations, including those in charge of computer and communication systems, consumer electronics, power systems, and industrial equipment. Our work covers a wide range of media conversion/understanding functions, from character recognition to document understanding and natural language understanding, as well as media interaction such as information filtering, knowledge/information sharing, speech dialogue, video browsing and human factors. Figure 1 shows the framework of an information retrieving and sharing system using a set of media conversion/processing functions, "HI-ware".

[Figure 1 diagram. Recoverable labels: questions, answers, multimedia information, management, structurization, document DB, organization DB, personnel DB]

Figure 1: Framework of information retrieving and sharing system

APPROACH
Structuring Multimedia Information
In order to create a user-centered multimodal interface, it is vitally important to structure multimedia information using media conversion. Structured multimedia information enables both humans and computers to share and retrieve information as they wish and to have a better understanding of each other. The fundamental basis for this task is a set of knowledge bases and language dictionaries, which we are currently building.
Enhancing Multimodal Interaction
We need to upgrade intelligent multimodal human-computer interaction technologies using agents so that users can find more enjoyment and comfort in working with computers. This means creating a system that understands users' intentions and situations from their utterances and gestures and provides services such as information retrieval, advice and suggestions, and whatever help they need, while conducting a natural dialogue with users.

Developing Sensors and Input-output Devices
Finally, we are also investigating new sensors and input-output devices. They extract information that users present both voluntarily and unintentionally. Such information facilitates the development of media understanding technologies and makes it possible for computers to understand users' intentions and situations.
The human brain consists of thousands of architectural types of computers, each of which has various functions including voice understanding, scene understanding, language understanding, translation, dialogue, speed reading, and problem-solving. Thus, a future picture for intelligent multimodal interfaces could be realized by upgrading the conversion of each multi-level medium and integrating those media through the organization of knowledge bases and language dictionaries, thereby assisting humans' academic activities. Now that environments for digital information are being put in place, we should accelerate our research and development for highly advanced acquisition, sharing, and dispatch of knowledge/information.

SELECTED PROJECTS
Personal Information Provider
We have been developing a multimodal Personal Information Provider (PIP) for enhancing information/knowledge sharing and closer human relations among groups. This system employs natural language and emotion understanding from speech and keyboard input, with a user-initiative dialogue manager and multimodal response generator. The system runs in real time on a personal computer with an interface agent that makes the user's stored information open to others with the user's permission. Experiments based on the PIP are being performed with about 300 people for knowledge and know-how sharing in our laboratories. Figure 1 shows the advice/help on demand system for our office knowledge/know-how sharing system.

Information Filtering System
We have developed an information filtering system for newspaper articles published every day in digital form. The system computes similarities between the user's information need and each article based on our expanded vector space model, and then selects articles that suit that need. The system also detects other similar articles, so that it can indicate a cluster of similar articles. The selected newspaper articles are provided to users through Internet communication tools, e.g., e-mail and WWW (World Wide Web). This system is being used as Japan's first information filtering service.
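
The selection step of the Information Filtering System can be illustrated with a minimal sketch. It uses a plain term-frequency vector space model with cosine similarity, not the expanded model mentioned in the text, whose details are not given here. The function names, the whitespace tokenizer, and both thresholds are assumptions made for illustration only.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_articles(profile, articles, threshold=0.2):
    """Return (index, score) pairs for articles matching the user's
    information need, best match first. The threshold is an assumed value."""
    p = term_vector(profile)
    scored = [(i, cosine(p, term_vector(a))) for i, a in enumerate(articles)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def similar_articles(index, articles, threshold=0.5):
    """Indices of other articles similar to articles[index], i.e. the
    cluster that could be shown alongside a selected article."""
    base = term_vector(articles[index])
    return [
        j for j, a in enumerate(articles)
        if j != index and cosine(base, term_vector(a)) >= threshold
    ]


if __name__ == "__main__":
    articles = [
        "speech recognition system released for personal computers",
        "new speech recognition system for personal computers announced",
        "stock market closes higher after rally",
    ]
    for i, score in select_articles("speech recognition", articles):
        print(i, round(score, 3), similar_articles(i, articles))
```

A real service would replace the whitespace tokenizer with language-specific morphological analysis and would likely weight terms (for example by inverse document frequency); both are left out to keep the sketch short.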

HI-ware (Common HI Service Environment)
We have been developing HI-ware (Common HI Service Environment), in which various kinds of HI functions such as speech recognition/synthesis, character recognition, and machine translation are easily and organically available for developing advanced HCI. As shown in Figure 2, the environment has two features. One is a standardized API (Application Programming Interface). The API keeps the HI functions consistent so that various kinds of HI applications can incorporate them in a common manner. The other is a common dictionary shared among the HI functions; a new word registered in the common dictionary becomes available to all HI functions.
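
The two features of HI-ware, a standardized API and a shared dictionary, can be pictured with the following toy sketch. It is not the HI-ware API, which is not specified here: the class names, the `process` method, and the word/reading dictionary layout are all hypothetical, chosen only to show how one registration could reach every function.

```python
from abc import ABC, abstractmethod


class CommonDictionary:
    """Word store shared by all HI functions (hypothetical layout:
    surface form -> reading)."""

    def __init__(self):
        self._entries = {}

    def register(self, word, reading):
        self._entries[word] = reading

    def lookup(self, word):
        return self._entries.get(word)


class HIFunction(ABC):
    """Assumed uniform interface: every function receives the shared
    dictionary and exposes the same process() entry point."""

    def __init__(self, dictionary):
        self.dictionary = dictionary

    @abstractmethod
    def process(self, data):
        ...


class ToySpeechSynthesizer(HIFunction):
    def process(self, data):
        # Replace each known word with its registered reading.
        return " ".join(self.dictionary.lookup(w) or w for w in data.split())


class ToyCharacterRecognizer(HIFunction):
    def process(self, data):
        # Keep only candidate strings that are known dictionary words.
        return [w for w in data if self.dictionary.lookup(w) is not None]


if __name__ == "__main__":
    shared = CommonDictionary()
    functions = [ToySpeechSynthesizer(shared), ToyCharacterRecognizer(shared)]
    shared.register("HIC", "eich-ai-see")  # one registration, seen by both
    print(functions[0].process("visit HIC"))
    print(functions[1].process(["HIC", "H1C"]))
```

Because both toy functions hold a reference to the same dictionary object, a word registered once is immediately usable by each of them, which is the behavior the text attributes to the common dictionary.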
[Figure 2 diagram. Recoverable labels: Speech recognition, Character recognition, Applications, Knowledge, Abstraction]

Figure 2: Configuration of HI-ware

PUBLICATIONS
1. Aoki, H. et al. "A Shot Classification Method to Select Effective Key-frames for Video Browsing," Proc. ACM Multimedia '96, 1996.
2. Ono, K. et al. "Abstract Generation Based on Rhetorical Structure Extraction," COLING '94, 1994.
3. Miike, S. et al. "A Full-Text Retrieval System with a Dynamic Abstract Generation Function," Proc. SIGIR '94, 1994.
