Voice Interfaces for Real-Time Translation of Common
Tourist Conversation
Eduardo M. Pires, Lais V. Vital, Carina F. Alves, Alex S. Gomes
Centro de Informática
Universidade Federal de Pernambuco
Cidade Universitária s/n
CEP 50.740-560
{emp, lvv, cfa, asg}@cin.ufpe.br
ABSTRACT
In this paper, we present a user-centered design process for a real-time cross-lingual translation system designed to fulfill needs in short conversations between interlocutors speaking different languages. Through our design method, three incremental versions of a prototype were produced. In the end, we demonstrate that the interaction model can be applied in real situations. For that, validation tests were applied, in which taxi drivers and tourists interacted through a prototype.
Keywords
Real-time translator, short-term conversation, interaction
model
1. INTRODUCTION
When visiting a foreign place, contact between people of different languages is quite common, whether traveling for pleasure or business. People face many difficulties in communication when abroad, such as the attempt to find a particular place, service or a certain tourist spot. Different systems have been designed to fulfill social interaction necessities, as can be read in [3] and [4]. Our focus is on real-time voice translators.
Real-time voice translators enable cross-cultural interaction through communication. Our research focuses on analyzing the existing interaction models for real-time translation systems and the expectations of users, who need to minimize the communication difficulties encountered in daily brief conversations. In other words, this paper aims to present our design process for an automatic translation interaction style for cross-lingual short-term conversations.
This paper is organized as follows. In Section 2 we present and analyze the state of the art of real-time translation interfaces (RTTI), identifying the best design concepts. In Section 3, we present our user-centered design process, which involves the observation of users dealing with low-fidelity prototypes. In Section 4, the design results are presented and we propose a simple interaction model for translators, which arose as a result of the evolutionary process. The findings of this study are presented in Section 5, which highlights the relevant features of an RTTI system.
2. CROSS-LINGUAL REAL-TIME SHORT-TERM TRANSLATION INTERACTION STYLES
In this section we present the evolution of real-time translation systems in the absence of a personal translator. The rapid development of this new portable-computer paradigm has been facilitated by dynamic and fast prototyping methodologies [5] and design with user validation.
In [7], relevant characteristics for RTTI systems on mobile devices were specified as follows. The latency should be kept as low as possible, so the application can react instantly to the users' voice actions. Moreover, power consumption should be minimal to increase battery life, and the device must be lightweight and operable in multiple orientations (vertical, horizontal). Other extra features are offered: a custom configuration and scalability to different types of mobile devices.
[7] also adds that the interaction style through speech should use hands-free input, so there are no distractions for the user. One must be able to continue one's ongoing activities without being interrupted by the voice application.
It should be noted that the procedures used for real-time translation with technology (automatic speech recognition - ASR, translation engine - TE, and speech synthesis - TTS¹) are not free of faults. In order to avoid users' disappointment from having their conversation misinterpreted, interface solutions should be used. The interface solution in [3], [4] and [6] uses text during the whole conversation, which allows visualization of partial results of the translation steps (ASR, TE, TTS). In an attempt to increase the user's confidence in the application, [4] and [6] allow, as an extra feature, errors in the transcribed text to be flagged and corrected in real time.
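The ASR → TE → TTS pipeline and its partial-result display can be sketched as follows. This is an illustration only: the three stage functions are invented stand-ins (a dictionary lookup instead of a real speech stack), and the `on_partial` callback shows how intermediate results could be surfaced to the user, as in [3], [4] and [6].

```python
# Sketch of the three-stage real-time translation pipeline (ASR -> TE -> TTS),
# with partial results surfaced at each step so the user can spot
# misrecognitions early. All stage functions are placeholders.

def recognize(audio):
    # ASR stage (placeholder): pretend the transcript is already available.
    return audio["transcript"]

def translate(text, src, dst):
    # TE stage (placeholder): tiny phrase table instead of a real engine.
    table = {("pt", "en"): {"onde fica o hotel?": "where is the hotel?"}}
    return table.get((src, dst), {}).get(text, text)

def synthesize(text):
    # TTS stage (placeholder): mark the text as "spoken" output.
    return f"<spoken:{text}>"

def translate_utterance(audio, src, dst, on_partial=print):
    transcript = recognize(audio)
    on_partial(f"[ASR] {transcript}")   # caption shown while the user speaks
    translated = translate(transcript, src, dst)
    on_partial(f"[TE]  {translated}")   # translation shown before playback
    return synthesize(translated)

audio = {"transcript": "onde fica o hotel?"}
result = translate_utterance(audio, "pt", "en")  # prints the two partials
print(result)  # <spoken:where is the hotel?>
```

Displaying the `[ASR]` partial before translation runs is what lets a user flag a misrecognized sentence before it is spoken aloud in the wrong form.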
Other solutions enhance the translation system by simply adding slang, colloquialisms and idiomatic expressions to increase naturalness [3]. The system works in a push-to-talk interaction style, where one has to touch a button to speak, and press the speaker button to listen to something.
Most available solutions emphasize that the visual factor makes the conversation easier. Gestures, glances and facial expressions, combined with the use of real-time translation technology, increase user satisfaction and the sense of spontaneity when communicating.
The use of a translation system in electronic meetings in chat rooms is proposed in [1]. Polyglot uses the Google Translator API (Application Programming Interface), which currently supports 58 languages (http://translate.google.com). Among online translators, Google Translator is the most accurate. [1] says that the translated text should be clear, easily understood, and must keep the original meaning of the text.
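As an illustration of how a chat-room client such as Polyglot might use a translation engine, the sketch below hides the engine behind a plain callable so that a real service (e.g. the Google Translate API) could be plugged in; the dictionary-backed `fake_backend` here is an invented stand-in for the real HTTP call, not Google's actual API.

```python
class ChatTranslator:
    """Delivers a chat message to each participant in their own language.

    `backend` is any callable (text, source_lang, target_lang) -> text.
    In production it would wrap a translation service; here we inject a
    tiny dictionary stand-in so the example is self-contained.
    """

    def __init__(self, backend):
        self.backend = backend

    def deliver(self, message, source_lang, participants):
        # participants maps each name to that person's preferred language.
        return {
            name: message if lang == source_lang
            else self.backend(message, source_lang, lang)
            for name, lang in participants.items()
        }

# Invented phrase table standing in for a real translation engine.
PHRASES = {("en", "pt"): {"hello": "olá"}, ("en", "es"): {"hello": "hola"}}

def fake_backend(text, src, dst):
    return PHRASES.get((src, dst), {}).get(text, text)

room = ChatTranslator(fake_backend)
print(room.deliver("hello", "en", {"ana": "pt", "luis": "es", "bob": "en"}))
# {'ana': 'olá', 'luis': 'hola', 'bob': 'hello'}
```

Keeping the engine behind a callable is what lets the same meeting client swap engines as accuracy improves, which matters given [1]'s requirement that translations preserve the original meaning.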
The concept of "topic spotting" for a communication environment is introduced in [2]. Considering the location and context of a conversation, it is possible to take the translation to a more realistic level. This feature makes the system similar to a non-fluent speaker of a language, who does not understand all spoken words but understands the general meaning of what is being said. It is also important that the user feels that he understands what is being said.
Finally, [3] states that the usability validation of a real-time translation system helps cross-lingual communication, making the understanding of the conversation easier. Furthermore, to enhance performance, it is necessary to expand the grammar and the context-specific domains and colloquial expressions, which are responsible for incorrect translations. After analyzing other features of interaction styles for automatic translation, we will present our user-centered design method.

¹ Text-to-speech
3. USER-CENTERED DESIGN METHOD
We adopted a User-Centered Design approach [8] to guide the steps of development as the primary design method. Thus, it was possible to define an application that reflects the user's needs from its conception. Finally, three interaction cycles were conducted.
The users chosen to guide the evaluation process were hotel and airport taxi drivers. It was found that they keep almost daily contact with tourists from different countries. Moreover, taxi drivers and tourists abroad have short-term conversations, such as 'where to go' questions or comments about daily facts, such as the weather, sights, events and more.
To understand users' needs precisely, different research techniques were used, such as individual qualitative interviews and observation. All interviews were recorded and later transcribed. The resulting content was analyzed by three different people who did not participate in the interview process. Each person categorized the content into a taxonomy defining the main influential factors in the cross-lingual interaction. The resulting taxonomies were then merged into a final one.
Once the influential factors were defined, possible solutions were identified and refined through a brainstorming session. In sequence, low-fidelity prototypes were developed and evaluated by the potential users: the taxi drivers. The "Wizard of Oz" technique was used during the validation. In this process, a person simulates the behavior of the device under test, supporting the interaction while it is being validated. Also, the user should disregard this person to make the interaction closer to a real interaction with the device. The low-fidelity prototype's features are extended with this technique, enabling a more natural interaction. During the validation, users' reactions and expectations were also observed, and current problems and insights were acknowledged.
The entire process was repeated for each cycle. At the end, a total of three prototypes had been developed as a result of the evolutionary design cycle. However, for prototyping evaluation purposes, only the software application was evaluated, not the hardware device itself.
4. PROTOTYPES DEVELOPMENT
Based on the factors identified during the context and literature analysis, three prototypes were developed as design solutions for an intuitive real-time voice translator. All of them were emulated using slide presentations. The slides represented the screen. Regarding the language, all tests were applied using English and Portuguese. The former was chosen for its lingua franca characteristic, the latter because it is the potential users' native language.
The application scenario elected to be prototyped consisted of conversations between a tourist in a taxi and the taxi driver. We designed an automobile device equipped with two similar modules: one for the driver, and another for the tourist passenger. In this setting, we analyzed the structure of a real-time cross-lingual conversation. In order to make the interaction more natural, both modules had touch screens, a microphone and speakers integrated with the car.
The prototype was called Tradutaxi. For the prototyping, some evaluation aspects were considered, such as objectivity, ease of use and user experience during the interaction. The prototype must not affect the flow of the conversation; therefore the user must be free to express himself through gestures and body expressions. It was essential to have an intuitive and user-friendly interface, as well as one that is hands-free, so the users could concentrate only on the conversation.
4.1 First Prototyping Cycle
For the first low-fidelity prototype, an initial slide was used to represent the application screen where the user can choose from the available languages, as seen in Figure 1. The language options are represented as buttons with labels indicating each language in its native form. At the top of the screen there is the instruction: "Choose your language". The user must choose the language they wish to talk in.
Figure 1 - The first prototype’s initial screen
After choosing the language, the user is led to a second slide: the translator's screen. Similarly, at the top of the screen there is the instruction: "Press speak if you wish to translate something". At the center of the screen there is a circled gray button with a "Speak" label at the bottom, as shown in Figure 2.
If the user wants to translate a sentence, he must press the button and speak the sentence to be translated. At this moment, the button becomes red with a "Translating" label. To end the translation, the button must be pressed once more. The process works for both modules.
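The single-button interaction just described is a two-state toggle, which can be sketched as below. The state names and labels follow the prototype description; the class itself is illustrative, not the authors' implementation.

```python
# Two-state toggle for the prototype's push-to-talk button: pressing it
# switches between the idle gray "Speak" state and the active red
# "Translating" state, as on the translator's screen described above.

class TranslateButton:
    IDLE = ("gray", "Speak")        # waiting for the user to start speaking
    ACTIVE = ("red", "Translating") # capturing speech for translation

    def __init__(self):
        self.state = self.IDLE

    def press(self):
        """Toggle: start capturing speech, or stop and trigger translation."""
        self.state = self.ACTIVE if self.state == self.IDLE else self.IDLE
        return self.state

btn = TranslateButton()
print(btn.press())  # ('red', 'Translating') - capture starts
print(btn.press())  # ('gray', 'Speak')      - capture ends, translation runs
```

The evaluation result reported next, that users could not tell the two states apart, is a failure of this machine's visual encoding (color only), not of the toggle logic itself.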
Figure 3 - The first prototype’s translator's screen
During evaluation, some results were perceived as follows. The users did not present difficulties while interacting with the initial screen. However, despite the simplicity of the prototype, some confusion occurred with the translator's screen. Many users could not distinguish the color representation of the button. In other words, they had difficulty identifying whether the translator was activated or not. Therefore, the experience failed to achieve a sense of ease and comfort, because the button's representation operated as a distraction.
4.2 Second Prototyping Cycle
After the first validation cycle, some observations were inferred and a second prototype was developed. Concerning the interaction model, the prototype worked similarly to the first one. The prototype's initial screen was well evaluated; thus, no modifications to it were made for the second prototype.
The difference between the prototypes was the button's representation. On the translator's screen, the circled gray button was replaced with a green play button, a worldwide representation for video and music reproduction. The "Speak" label, in turn, was substituted with a "Translate" label. Therefore, the user can interact with the interface using his previous knowledge, which leads to a more natural experience.
Figure 4 - The translator's screen of the second prototype.
As with the first prototype, to translate a speech the user needs to press the play button and speak what must be translated. After pressing the button, it is replaced by a red stop button with the label "Stop Translating", as seen in Figure 5. When the user ends his speech, the stop button must be pressed. In this way, all possible actions are clearly presented to the user in a proper representation, preventing misinterpretations.
Figure 5 - The translator's screen of the second prototype.
The new button representation was well evaluated throughout the user validation with the "Wizard of Oz" method. The different shapes and colors helped users discriminate the translator's functionalities.
Although the second prototype provided a more natural and satisfactory interaction, new difficulties concerning the user's confidence in the translator were identified. Despite the translator's ease of use, some users were uncomfortable because they had no guarantee that the translator was not misinterpreting their speech.
4.3 Third Prototyping Cycle
Finally, the third prototype was an evolution of the previous one. Thus, there were no modifications regarding the design of the translator; the progress concerns the addition of new features.

Through evaluation, it was acknowledged that it is significant to enable not merely the translation button, but also screen captions with the transcribed speech as the user speaks. Consequently, it was possible to read the speech while speaking. The results of the third prototype can be seen in Figure 6.
Figure 6 - The third prototype's translator's screen
The new feature provided a more comfortable environment and a higher level of confidence in the application, as was observed from users' reactions during validation.
5. CONCLUSION
In this paper, we presented the evolution of a voice-interface real-time translation interaction style. The design requirements were identified in the literature and after applying individual qualitative interviews. We could evolve the concept model in three design iterations involving users. We evaluated each version considering the users' feedback to perform the task analysis.
As a result, we developed a voice interface for a real-time translator to be used in taxis that keep daily contact with tourists. The solution is simple and does not affect the dialogue between the parties. The 'touch to start translating' and 'touch to stop translating' method was the most useful and had great acceptance among the users who participated in the interaction validation. The simplicity of the interaction allows easy adaptation to real-time translator systems.
REFERENCES
1. Aiken, M., Park, M., Simmons, L., and Lindblom, T. 2009. Automatic Translation in Multilingual Electronic Meetings. Translation Journal, 13(9), July.
2. Chung, J., Kern, R. and Lieberman, H. 2005. Topic spotting common sense translation assistant. In CHI '05 Extended Abstracts on Human Factors in Computing Systems (CHI EA '05). ACM, New York, NY, USA, 1280-1283.
3. Hattori, H. 2002. An Automatic Speech Translation System on PDAs for Travel Conversation. In Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI '02). IEEE Computer Society, Washington, DC, USA, 211-.
4. Metze, F., McDonough, J., Soltau, H., et al. 2002. The NESPOLE! speech-to-speech translation system. In Proceedings of the Second International Conference on Human Language Technology Research (HLT '02). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 378-383.
5. Rashmi, S. and Jonathan, B. 2004. Rapid information architecture prototyping. In Proceedings of the 5th Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques (DIS '04). ACM, New York, NY, USA, 349-352.
6. Shigenobu, T. 2007. Evaluation and usability of back translation for intercultural communication. In Proceedings of the 2nd International Conference on Usability and Internationalization (UI-HCII '07), Nuray Aykin (Ed.). Springer-Verlag, Berlin, Heidelberg, 259-265.
7. Smailagic, A., Siewiorek, D., Martin, R. and Reilly, D. 1999. CMU Wearable Computers for Real-Time Speech Translation. In Proceedings of the 3rd IEEE International Symposium on Wearable Computers (ISWC '99). IEEE Computer Society, Washington, DC, USA, 187-.
8. Vredenburg, K., Isensee, S. and Righi, C. 2002. User-Centered Design: An Integrated Approach. Upper Saddle River, NJ: Prentice Hall PTR. 198-203.