Voice Interfaces for Real-Time Translation of Common Tourist Conversation

Eduardo M. Pires, Lais V. Vital, Carina F. Alves, Alex S. Gomes
Centro de Informática, Universidade Federal de Pernambuco
Cid. Universitária s/n, CEP 50.740-560
{emp, lvv, cfa, asg}@cin.ufpe.br

ABSTRACT
In this paper, we present a user-centered design for a real-time cross-lingual translation system designed to fulfill needs in short communication between interlocutors speaking different languages. Through our design method, three incremental versions of the prototype were produced. In the end, we demonstrate that the interaction model can be applied in real situations. For that, validation tests were applied, in which taxi drivers and tourists interacted through a prototype.

Keywords
Real-time translator, short-term conversation, interaction model

1. INTRODUCTION
When visiting a foreign place, contact between people who speak different languages is quite common, whether traveling for pleasure or business. People face many difficulties in communication when abroad, such as when trying to find a particular place, service or tourist spot. Different systems have been designed to fulfill social interaction necessities, as can be read in [3] and [4]. Our focus is on real-time voice translators, which enable cross-cultural interaction through communication.

Our research focuses on analyzing the existing interaction models for real-time translation systems and the expectations of users, who need to minimize the communication difficulties encountered in brief daily conversations. In other words, this paper aims to present our design process of an automatic translation interaction style for cross-lingual short-term conversations.

This paper is organized as follows. In Section 2 we present and analyze the state of the art of real-time translation interfaces (RTTI), identifying the best design concepts.
In Section 3, we present our user-centered design process, which involves the observation of users dealing with low-fidelity prototypes. In Section 4, the design results are presented and we propose a simple interaction model for translators, which arose as a result of the evolutionary process. The findings of this study are presented in Section 5, which highlights the relevant features of an RTTI system.

2. CROSS-LINGUAL REAL-TIME SHORT-TERM TRANSLATION INTERACTION STYLES
In this section we present the evolution of real-time translation systems in the absence of a personal translator. The rapid development of this new portable computers paradigm has been facilitated by dynamic and fast prototyping methodology [5] and design with user validation.

In [7], relevant characteristics for RTTI systems on mobile devices were specified as follows. The latency should be kept as low as possible, so the application can react instantly to the users' voice actions. Moreover, power consumption should be minimal to increase battery life, and the device must be lightweight and operable in multiple orientations (vertical, horizontal). Other extra features are offered: custom configuration and scalability to different types of mobile devices. [7] also adds that the speech interaction style should use hands-free input, so there are no distractions for the user. One must be able to continue one's ongoing activities without being interrupted by the voice application.

It should be noted that the procedures used for real-time translation with technology (automatic speech recognition, ASR; translation engine, TE; and speech synthesis, TTS) are not free of faults. In order to avoid users' disappointment from having their conversation misinterpreted, interface solutions should be used.
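The three processing stages named above can be pictured as a simple chain. The following is a minimal sketch, not the implementation of any of the cited systems: the names `TranslationTrace`, `translate_utterance`, `fake_asr`, `fake_te`, `fake_tts` and `PHRASEBOOK` are hypothetical, and the three stages are trivial stand-ins for real engines. It only illustrates that keeping the output of every stage gives an interface something to show the user when one of the stages makes a mistake.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationTrace:
    """Intermediate results of one utterance, kept so a UI could display them."""
    recognized_text: str   # output of the ASR stage
    translated_text: str   # output of the TE stage
    audio: bytes           # output of the TTS stage


def translate_utterance(
    audio_in: bytes,
    asr: Callable[[bytes], str],
    te: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> TranslationTrace:
    """Chain ASR -> TE -> TTS and return every intermediate result."""
    recognized = asr(audio_in)
    translated = te(recognized)
    spoken = tts(translated)
    return TranslationTrace(recognized, translated, spoken)


# Hypothetical stand-ins for real engines, for illustration only.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already text


PHRASEBOOK = {"where is the hotel?": "onde fica o hotel?"}


def fake_te(text: str) -> str:
    # Unknown sentences pass through untranslated.
    return PHRASEBOOK.get(text.lower(), text)


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are synthesized audio


trace = translate_utterance(b"Where is the hotel?", fake_asr, fake_te, fake_tts)
print(trace.recognized_text)
print(trace.translated_text)
```

Passing the stages in as functions is only a convenience for the sketch; it lets each stage be swapped or inspected separately, which is where an interface could let the user check the recognized or translated text.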
The solution of this interface in [3], [4] and [6] uses text during the whole conversation, which allows visualization of partial results of the translation steps (ASR, TE, TTS). In an attempt to increase the user's confidence in the application, [4] and [6] allow, as an extra feature, errors in the transcribed text to be flagged and corrected in real time. Other solutions enhance the translation system by adding slang, colloquialisms and idiomatic expressions to increase naturalness [3]. The system works in a push-to-talk interaction style, where one has to touch a button to speak and press the speaker button to listen to something.

Most available solutions emphasize that the visual factor makes the conversation easier. Gestures, glances and facial expressions, combined with the use of real-time translation technology, increase user satisfaction and the sense of spontaneity when communicating.

The use of a translation system in electronic meetings in chat rooms is proposed in [1]. Polyglot uses a Google Translator API (Application Programming Interface) which currently supports 58 languages (http://translate.google.com). Among online translators, Google Translator is the most accurate. [1] says that the translated text should be clear and easily understood, and that it must keep the original meaning of the text.

The concept of "topic spotting" for the communication environment is introduced in [2]. Considering the location and context of a conversation, it is possible to take the translation to a more realistic level. This feature makes the system similar to a non-fluent speaker of a language who does not understand all the spoken words but understands the general meaning of what is being said.
It is also important that the user feels that he understands what is being said. Finally, [3] states that the usability validation of a real-time translation system helps the cross-lingual communication, making the understanding of the conversation easier. Furthermore, to enhance performance, it is necessary to increase the grammar and context-specific domains, and the text-to-speech colloquial expressions, which are responsible for incorrect translations.

After analyzing other features of interaction styles for automatic translation, we will present our user-centered design method.

3. USER-CENTERED DESIGN METHOD
We adopted a User-Centered Design approach [8] as the primary design method to guide the development steps. Thus, it was possible to define an application that reflects the users' needs from its conception. Finally, three interaction cycles were conducted.

The users chosen to guide the evaluation process were hotel and airport taxi drivers. It was found that they are in almost daily contact with tourists from different countries. Moreover, taxi drivers and tourists abroad have short-term conversations, such as 'where to go' questions or comments about everyday matters such as the weather, sights, events and more.

To understand users' needs precisely, different research techniques were used, such as individual qualitative interviews and observation. All interviews were recorded and later transcribed. The resulting content was analyzed by three different people who did not participate in the interview process. Each person categorized the content into a taxonomy defining the main influential factors in the cross-lingual interaction. The resulting taxonomies were then merged into a final one.

Once the influential factors were defined, possible solutions were identified and refined through a brainstorming session. Subsequently, low-fidelity prototypes were developed and evaluated by the potential users: the taxi drivers.
The "Wizard of Oz" technique was used during the validation. In this process, a person simulates the behavior of the device under test, assisting the interaction while it is being validated. The user should disregard this person, so that the interaction is closer to a real interaction with the device. This technique extends the features of the low-fidelity prototype, enabling a more natural interaction. During the validation, users' reactions and expectations were also observed, and current problems and insights were acknowledged.

The entire process was repeated for each cycle. At the end, a total of three prototypes were developed, as a result of the evolutionary design cycle. However, for prototype evaluation purposes, only the software application was evaluated and not the hardware device itself.

4. PROTOTYPES DEVELOPMENT
Based on the factors identified during the context and literature analysis, three prototypes were developed as design solutions for an intuitive real-time voice translator. All of them were emulated using slide presentations. The slides represented the screen. Regarding language, all tests were applied using English and Portuguese. The former was chosen for its lingua franca status, the latter because it is the potential users' native language.

The application scenario chosen to be prototyped consisted of conversations between a tourist in a taxi and the taxi driver. We designed an automobile device equipped with two similar modules: one for the driver and another for the tourist passenger. In this setting, we analyzed the structure of a real-time cross-lingual conversation. In order to make the interaction more natural, both modules had touch screens, a microphone and speakers integrated with the car. The prototype was called Tradutaxi.

For the prototyping, some evaluation aspects were considered, such as objectivity, ease of use and user experience during the interaction.
The prototype must not affect the flow of the conversation; therefore the user must be free to express himself through gestures and body language. It was essential to have an intuitive and user-friendly interface, as well as a hands-free one, so the users could concentrate only on the conversation.

4.1 First Prototyping Cycle
For the first low-fidelity prototype, an initial slide was used to represent the application screen where the user can choose from the available languages, as seen in Figure 1. The language options are represented as buttons with labels indicating each language in its native form. At the top of the screen there is the instruction: "Choose your language". The user must choose the language they wish to talk in.

Figure 1 - The first prototype's initial screen

After choosing the language, the user is led to a second slide: the translator's screen. Similarly, at the top of the screen there is the instruction: "Press speak if you wish to translate something". At the center of the screen, there is a circular gray button with a "Speak" label at the bottom, as shown in Figure 2. If the user wants to translate a sentence, he must press the button and speak the sentence to be translated. At this moment, the button becomes red with a "Translating" label. To end the translation the button must be pressed once more. The process works for both modules.

Figure 3 - The first prototype's translator's screen

During evaluation, the following results were observed. The users did not have difficulties while interacting with the initial screen. However, despite the simplicity of the prototype, some confusion occurred with the translator's screen. Many users could not distinguish the color representation of the button. In other words, they had difficulty identifying whether the translator was activated or not.
Therefore, the experience failed to achieve a sense of ease and comfort, because the button's representation acted as a distraction.

4.2 Second Prototyping Cycle
After the first validation cycle, some observations were inferred and a second prototype was developed. Concerning the interaction model, the prototype worked similarly to the first one. The prototype's initial screen was well evaluated; thus, no modifications were made to it for the second prototype.

The difference between the prototypes was the button's representation. In the translator's screen, the circular gray button was replaced with a green play button, a worldwide representation for video and music playback. The "Speak" label, in turn, was replaced with a "Translate" label. Therefore, the user can interact with the interface using his previous knowledge, which leads to a more natural experience.

Figure 4 - The translator's screen of the second prototype.

As with the first prototype, to translate speech the user needs to press the play button and speak what must be translated. After the button is pressed, it is replaced by a red stop button with the label "Stop Translating", as seen in Figure 5. When the user ends his speech, the stop button must be pressed. In this way, all possible actions are clearly presented to the user in a proper representation, preventing misinterpretations.

Figure 5 - The translator's screen of the second prototype.

The new button representation was well evaluated throughout the users' validation with the "Wizard of Oz" method. The different shapes and colors helped users to discriminate the translator's functionalities.

Although the second prototype provided a more natural and satisfactory interaction, new difficulties concerning the users' confidence in the translator were identified.
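The press-to-start, press-to-stop behavior of the second prototype's button can be read as a two-state toggle. The sketch below is only an illustration under that reading: the prototypes themselves were slide emulations, and the names `ButtonState` and `TranslateButton` are hypothetical. The two labels are the ones reported for the second prototype.

```python
from enum import Enum


class ButtonState(Enum):
    IDLE = "Translate"               # shown as the green play button
    RECORDING = "Stop Translating"   # shown as the red stop button


class TranslateButton:
    """Two-state toggle: one press starts capture, the next press ends it."""

    def __init__(self) -> None:
        self.state = ButtonState.IDLE

    @property
    def label(self) -> str:
        """Text displayed with the button in its current state."""
        return self.state.value

    def press(self) -> ButtonState:
        """Switch to the other state and return it."""
        if self.state is ButtonState.IDLE:
            self.state = ButtonState.RECORDING
        else:
            self.state = ButtonState.IDLE
        return self.state


button = TranslateButton()
print(button.label)
button.press()
print(button.label)
button.press()
print(button.label)
```

Because each state carries its own label, the only action available at any moment is the one named on the button, which is the property the evaluation of the second prototype points to.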
Despite the translator's ease of use, some users were uncomfortable because they had no guarantee that the translator was not misinterpreting their speech.

4.3 Third Prototyping Cycle
Finally, the third prototype was an evolution of the previous one. There were no modifications to the design of the translator; the progress concerns the addition of new features. Evaluation revealed the significance of providing not merely the translation button but also on-screen captions with the transcribed speech, shown as the user speaks. Consequently, it was possible to read the speech while speaking. The result of the third prototype can be seen in Figure 6.

Figure 6 - The third prototype's translator's screen

The new feature provided a more comfortable environment and a higher level of confidence in the application, as observed from users' reactions during validation.

5. CONCLUSION
In this paper, we presented the evolution of a voice interface real-time translation interaction style. The design requirements were identified in the literature and through individual qualitative interviews. We evolved the concept model over three design interactions involving users. We evaluated each version considering the users' feedback to perform the task analysis.

As a result, we developed a voice interface for a real-time translator, to be used in taxis that are in daily contact with tourists. The solution is simple and does not affect the dialogue between the parties. The 'touch to start translating' and 'touch to stop translating' method was the most useful and was well accepted among the users who participated in the interaction validation. The simplicity of the interaction allows easy adaptation to real-time translator systems.

REFERENCES
1. Aiken, M., Park, M., Simmons, L., and Lindblom, T. 2009. Automatic Translation in Multilingual Electronic Meetings. Translation Journal, 13(9), July.
2. Chung, J., Kern, R.
and Lieberman, H. 2005. Topic spotting common sense translation assistant. In CHI '05 extended abstracts on Human factors in computing systems (CHI EA '05). ACM, New York, NY, USA, 1280-1283.
3. Hattori, H. 2002. An Automatic Speech Translation System on PDAs for Travel Conversation. In Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI '02). IEEE Computer Society, Washington, DC, USA, 211-.
4. Metze, F., McDonough, J., Soltau, H., et al. 2002. The NESPOLE! speech-to-speech translation system. In Proceedings of the second international conference on Human Language Technology Research (HLT '02). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 378-383.
5. Rashmi, S. and Jonathan, B. 2004. Rapid information architecture prototyping. In Proceedings of the 5th conference on Designing interactive systems: processes, practices, methods, and techniques (DIS '04). ACM, New York, NY, USA, 349-352.
6. Shigenobu, T. 2007. Evaluation and usability of back translation for intercultural communication. In Proceedings of the 2nd international conference on Usability and internationalization (UI-HCII '07), Nuray Aykin (Ed.). Springer-Verlag, Berlin, Heidelberg, 259-265.
7. Smailagic, A., Siewiorek, D., Martin, R. and Reilly, D. 1999. CMU Wearable Computers for Real-Time Speech Translation. In Proceedings of the 3rd IEEE International Symposium on Wearable Computers (ISWC '99). IEEE Computer Society, Washington, DC, USA, 187-.
8. Vredenburg, K., Isensee, S. and Righi, C. 2002. User-Centered Design: An Integrated Approach. Upper Saddle River, NJ: Prentice Hall PTR. 198-203.
