Remote interpreting

A technical perspective on recent experiments*

Panayotis Mouzourakis
European Parliament Interpretation Directorate (EPID)

This article reviews recent remote interpreting (RI) experiments carried out
at the United Nations and European Union institutions, with emphasis on
their salient technical features, which are also summarized in the Appendix.
Motivations for remote interpreting with minimum technical requirements
for sound and image transmission in compressed form as well as the meth-
ods used in recent experiments for image capture in the meeting room and
display in the remote room are discussed. The impact of technical conditions
upon interpreters’ perception of remote interpreting is also examined
using questionnaire data, which seem to suggest that the interpreters’ visual
perception of the meeting room, as mediated by image displays, is the deter-
mining factor for the “alienation” or absence of a feeling of presence in the
meeting room universally experienced by interpreters under RI conditions.
The paper also points out the advantages of a more coherent research methodology
based upon the notion of presence in a virtual environment as well
as possible innovative approaches to providing the interpreter with meeting
room views.

Keywords: remote interpreting, technical standards, meeting room views,
image capture, image transmission, visual perception, alienation, presence

1. Introduction and terminology

Remote interpreting, a boon for some, a bane to others, has given rise to much
heated and emotional debate within the interpreting community. Although not
exactly a new idea, as the first attempts in this direction had already taken place
in the 1970s, namely the “Symphonie Satellite” (Thiery 1976) and the 1978 New
York–Buenos Aires (Chernov 2004: 82–83) tests, remote interpreting has
recently come into the limelight as a potential mode of conference interpreting.

Interpreting 8:1 (2006), 45–66.


issn 1384–6647 / e-issn 1569–982X © John Benjamins Publishing Company

No fewer than eight experiments using this technique have been carried out
since 1999 in major multilingual organizations such as the United Nations and
the European Union institutions.
Since definitions of the term remote interpreting tend to vary (Niska 1998),
in the context of the present article remote interpreting (RI) will be used to refer
to situations in which interpreters are no longer present in the meeting room,
but work from a screen and earphones without a direct view of the meeting
room or the speaker. This is different from videoconferencing, where interpreters
are still physically present in a booth, within the meeting room where most
participants are gathered and other participants intervene remotely via a video
link-up. RI should not be confused with video remote interpreting, a term used,
especially in the United States, to refer to a form of person-to-person video-
conferencing mostly (although not exclusively) used to convey sign language.
Experimentation is, by its very nature, often anarchic and recent experiments
in RI, using a bewildering variety of equipment and under conditions
often specific to the particular institution at hand, are probably no exception.
However, as the dust begins to settle, it is perhaps time to take stock, establish
what has been learned from these experiments and identify the still unresolved
issues. If RI is ever to become a routine form of interpreting, more systematic
research will be needed, to establish technique-specific criteria for technical
standards and interpreter working conditions.
The fascinating ergonomic and cognitive issues raised by the interpreter
experience under RI conditions, which require extensive research before they
can be resolved, and the consequences of the routine use of RI in multilingual
conferences at the organizational and administrative level, will not be included
in the present article. Such matters will only be noted superficially and only to
the extent they are relevant to the technical aspects of remote interpreting.

2. Motivations for RI

In general, there are two kinds of rationale for the introduction of RI in the
United Nations, European Union or other international organizations:
Remote interpreting can be seen as a means of dealing with problems of
interpreter availability and cost. For large multilingual organizations, travel to-
gether with per diems constitute, on the average, about one third of the total
cost of a free-lance interpreter. In addition, such organizations need to deal
with the logistics of putting together large interpreter teams, often requiring
free-lance interpreters to travel to distant locations, even for brief meetings.
In fact, if these interpreters did not have to travel but could work from their
homes, they might even be able to take advantage of different time zones to
service more than one meeting per working day, an obvious advantage in the
case of those covering “exotic” languages. Furthermore, for organizations such
as the UN, with staff interpreters dispersed among a number of duty stations
(New York, Geneva, Vienna, Nairobi etc.), remote interpreting could provide
for a more efficient use of human resources.
Another explanation has to do with physical building constraints, in par-
ticular a shortage of meeting rooms with enough booths to accommodate the
twenty (and soon more) official EU languages, or even the impossibility of fit-
ting that many booths into a small or medium-sized meeting room, without
compromising interpreters’ visibility. Other constraints might include a reluc-
tance to install booths in historic meeting rooms and security considerations
mandating the physical separation of interpreters from conferring participants.
In these cases, RI would offer the alternative of accommodating the interpret-
ers in different rooms (whether in booths or in custom-built installations), lo-
cated in the same building or at least within the same building complex.
Whatever the motivation, the ability to physically separate interpreters
from the meeting room opens up a host of new possibilities. In principle, RI
could, without the need for long-distance travel, enable interpreters to service
meetings anywhere in the world. In the future, perhaps interpreters could work
from one of a dense network of decentralized “interpreting stations” linking all
the major cities of the globe.

3. Technical requirements for sound and image transmission

One of the problems that must be resolved before such a vision can be realized
concerns the need to provide the proper technical framework for transmitting
audio and video to the remote interpreters and also sending the interpreta-
tion audio stream back to the meeting room. Satellite links, both analog and
digital, such as those routinely used to transmit TV broadcasts are used for
this purpose; however, the cost of these satellite connections is quite high. As a
result, much interest has been focused on alternative, terrestrial links, particu-
larly ISDN (Integrated Services Digital Network) lines,1 which can be readily
and inexpensively leased from telephone operators. ISDN lines provide syn-
chronized transmission of digitized audio and video streams in the form of a
number of data “blocks”, each such block providing a capacity or bandwidth of
64 kilobits/second (kbps),2 and equal to a digital telephone connection. Using
ISDN lines for videoconferencing relies on a standard audio/video transmission
protocol known as “H 320”.3
To reduce the amount of information to be transmitted, sound and image
information must first be “compressed” in accordance with the “H 320” protocol,
invariably entailing some loss in quality. For video streams, a reduced image
format (containing fewer pixels4 per frame than standard digital TV) is used.
Depending on the number of 64 kbps blocks available, the frame rate (number of
frames transmitted per second) may have to be reduced, often to such a degree that
motion becomes visibly jerky. For the audio stream, the standard procedure is
to retain only the lower part of the frequency spectrum used in human speech,
throwing away the higher frequencies (those less critical for speech compre-
hension). “H 320” provides either telephone quality encoding, which retains
speech frequencies only up to about 3.1 kHz,5 or a more advanced encoding,
“G 722”,6 with frequencies up to 7 kHz (Mouzourakis 1996).
Since the “H 320” protocol was primarily designed to address the needs
of the United States market, it does not take into account the subtleties of
multilingual communication. While perhaps adequate for the participants at
a meeting, a 7 kHz sound capacity does not provide adequate support for si-
multaneous interpretation, where faithful transmission of all speech requires,
by the relevant ISO 2603 and 4043 standards, frequencies between 125 Hz and
12.5 kHz (AIIC 2000). This obstacle, persistent in early videoconferencing experiments
(ETSI 1993; SCIC 1995; Braun 2004), was first overcome during the
earliest UN (Geneva-Vienna) RI experiment, where the audio stream trans-
mitted between the meeting room in Geneva and the remote room containing
the interpreter booths in Vienna was encoded in mp37 format (Esteban-Causo
1999), retaining speech frequencies up to 20 kHz, with a sound quality just
slightly inferior to that of a CD player. While it is technically possible, using the
“H 320” protocol and special equipment at both ends, to achieve this quality
when transmitting compressed sound signals between two duty stations of the
same organization, in all other cases, interoperability cannot be guaranteed.
The second UN remote experiment in New York, again using nonstandard
encoding for transmitting the audio signal over ISDN, concluded that while 10
kHz would suffice for the meeting participants, a sound signal containing all
speech frequencies up to 14 kHz provided sufficient audio quality for interpretation
purposes (UN 2001: 15). The same experiment also established that 384
kbps should be considered the minimum bandwidth necessary for image trans-
mission over ISDN, but recommended at least 512 kbps for the speaker view.
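The bandwidth and frequency figures quoted above lend themselves to a quick arithmetic check. The following Python sketch is illustrative only: the function and constant names are assumptions introduced here, not part of any experiment's tooling. It counts the 64 kbps ISDN blocks that a given video bandwidth occupies and compares each encoding's upper frequency limit with the 12.5 kHz upper bound cited from ISO 2603 and 4043.

```python
import math

ISDN_BLOCK_KBPS = 64         # capacity of one ISDN block, as stated in the text
ISO_UPPER_LIMIT_HZ = 12_500  # upper end of the 125 Hz to 12.5 kHz range

def isdn_blocks_needed(bandwidth_kbps: float) -> int:
    """Number of 64 kbps blocks required to carry the given bandwidth."""
    return math.ceil(bandwidth_kbps / ISDN_BLOCK_KBPS)

def covers_iso_upper_limit(upper_frequency_hz: float) -> bool:
    """True if an encoding retains frequencies up to the ISO upper bound."""
    return upper_frequency_hz >= ISO_UPPER_LIMIT_HZ

# Upper frequency limits quoted in the text, in Hz (labels are informal).
encodings = {
    "telephone quality": 3_100,
    "G 722": 7_000,
    "second UN experiment": 14_000,
    "mp3, first UN experiment": 20_000,
}

for label, limit_hz in encodings.items():
    print(label, covers_iso_upper_limit(limit_hz))

print(isdn_blocks_needed(384), isdn_blocks_needed(512))
```

Under these figures, the 384 kbps minimum corresponds to 6 blocks and the recommended 512 kbps to 8 blocks, while only the two nonstandard encodings reach the ISO upper bound.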
At the same time (the late 1990s), in view of the forthcoming increase in
the number of official EU languages, coupled with the practical difficulties of
installing more booths in meeting rooms, the EU institutions concentrated
their efforts on achieving proper conditions for remote interpreting between
meeting rooms in the same building and a number of experiments were con-
ducted at the European Commission’s Service Commun Interprétation–Con-
férences (SCIC), the EU Council and the European Parliament Interpretation
Directorate (EPID). The EU Council even briefly considered totally dispensing
with meeting-room interpreter booths and replacing them with special instal-
lations for its projected new building. Owing to this different perspective, EU
institutions’ remote experiments did not contemplate video and audio trans-
mission over ISDN; instead they implemented direct cable (coaxial or fiber
optics) connections between meeting rooms and the remote rooms (providing
supplementary booths) to transport sound and image signals without a loss of
quality. Thus, the focus of these experiments shifted to investigating the optimum
manner for capturing and displaying meeting room views.

4. Image capture and display

Image capture in RI experiments, by either fixed or mobile cameras, typically
involves one or more of the following:
– the speaker
– the podium, including, at the very minimum, the meeting chairperson
– a panoramic view of the meeting room
– partial views of the meeting room.
Many modern conference halls now include microphone-activated cameras
that can automatically focus on delegates as they take the floor. In the absence
of automatic cameras and except for the smallest meeting rooms, at least two
cameramen with mobile cameras are needed to cover the area and all the po-
tential speakers.
Podium shots are centered on the chairperson and typically include his/her
immediate neighbors; such shots can usually be provided by a properly adjust-
ed fixed camera and do not require the continued presence of a cameraman.
Panoramic shots are also usually static views of the meeting room, captured by
one or more fixed cameras. Partial views of the meeting room may be captured,
either by static unmanned cameras or by roaming cameramen selecting “rel-
evant” views, such as an expanded view of the speaker, a national delegation or
some other influential group of delegates, within the immediate environment.
In all but the simplest cases involving more than one mobile camera, a
director and an image-mixing station are needed to select the views projected
in the remote room. This is a demanding task, requiring a professional with
experience in working and directing cameramen to capture relevant views of
the meeting room; for any extended period of time, two directors working half
a day each would probably be needed for this task (Louvranges 2001: 10). Al-
though language skills are an advantage for the director at a multilingual meet-
ing, an experienced interpreter can act as a liaison with the director to facilitate
the proper flow of speech events. Cameramen need both sufficient experience
and time before they concurrently find their targets and provide smooth tran-
sitions between images. Another complication with image mixing is the ad-
ditional delay, relative to the audio signal, in the video signal, which must be
properly compensated to preserve sound-image (“lip-sync”) synchronization,
the absence of which can be very unsettling for interpreters.
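One generic way of compensating for a video path that is slower than the audio path is to delay the audio by the same amount. The Python sketch below is purely illustrative: the experiment reports cited here do not describe how the compensation was implemented, and the function name, sample rate and delay value are assumptions.

```python
def delay_audio(samples: list[float], sample_rate_hz: int,
                video_delay_ms: float) -> list[float]:
    """Prepend silence so that the audio lags by the video's extra delay."""
    if video_delay_ms < 0:
        raise ValueError("video_delay_ms must be non-negative")
    # Convert the delay from milliseconds to a whole number of samples.
    padding = round(sample_rate_hz * video_delay_ms / 1000)
    return [0.0] * padding + list(samples)

# Hypothetical example: a 40 ms mixing delay at a 48 kHz sample rate
# corresponds to 1920 samples of silence ahead of the audio.
aligned = delay_audio([0.1, -0.2, 0.3], 48_000, 40)
print(len(aligned))
```

In a live system the same idea would be applied to a continuous stream through a buffer rather than to a finished list of samples.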
Image capture quality is also sensitive to meeting-room lighting conditions.
Meeting-room design does not generally take image capture into account:
illumination is rarely uniform and the color temperature of the lighting does not
correspond to daylight conditions (5200°),8 resulting in an altered color balance.
Passive projection and plasma screens, cathode ray tube (CRT), liquid
crystal display (LCD) monitors, and combinations thereof have all been used
to display meeting-room images to interpreters in their remote rooms. Plasma
screens have been most favorably judged for their lack of flicker and conse-
quent minimal eye fatigue, as well as their compatibility with normal lighting
in the remote room. Yet they have been judged aggressive when placed too
close to the booth. In the EU Council 2001 remote-interpreting experiment,
plasma screens initially placed at 130 cm had to be subsequently moved to a
distance of 350 cm from the booths (EU Council 2001: 5). In the December
2001 EPID remote experiment, where plasma screens were placed at a distance
of 150 cm from the booth and could not be moved, interpreters complained of
excessive glare (EPID 2001b: 7). In general, there has been no comprehensive
evaluation of the pros and cons of different kinds of display, particularly the
optimum distance from the booth for screens and monitors.
In principle, relative to what can be achieved using standard definition
(SD), the quality of both image capture and projection should benefit from
the use of high definition (HD)9 video equipment, as tested during the EPID
2004 remote experiment. In practice the HD “silver bullet” failed to materialize
during the EPID 2004 experiment, since individually adjusting each projector
for HD to overcome persistent color balance problems essentially consumed
the first two weeks of what should have been a 3-week test (Barco 2004: 15).
In any case, since HD transmission would require more bandwidth than SD,
the use of HD for anything other than direct room-to-room connections is
highly problematic.

5. Meeting-room views

Whereas there is no clear consensus on which set of meeting-room views is
optimal for remote interpreting, there seems to be universal agreement on the
need for the remote interpreter to have a view of the speaker at least. Indeed
even under normal interpreting conditions, especially in large meeting rooms,
or when the interpreter can only see the speaker from the back or side, close-up
shots of the speaker projected on a screen or monitor can provide valuable in-
formation to the interpreter. A view of the podium, including the chairperson,
is also generally considered useful.
The interpreters’ verdict about partial views of the meeting room seems to
be influenced by the nature of the meeting, the room layout, and the partici-
pants’ seating arrangements. In the case of European Parliament meetings, for
instance, where MEPs are seated by political affiliation, rather than by nation-
ality, interpreters expressed only scant interest in having partial views of the
meeting room. On the other hand, for EU Council meetings, where partici-
pants are seated by national delegations (one of which holds the presidency)
and a delegation from the EU Commission is also present, interpreters partici-
pating in the EU Council 2001 remote experiment considered it essential to be
able to view their national delegations (Louvranges 2001: 8). Providing such
a view for each individual booth was of course technically impossible. Never-
theless, the Council report for the RI test suggested that such individualized
views, one per delegation, could be captured using the battery of fixed cameras
mounted on the meeting room ceiling (EU Council 2001: 7).
An alternative possibility, which has only been tested in a single RI experi-
ment, is to allow the interpreter a choice of meeting-room views (EPID 2001a:
22). The left half of the screens used in this experiment (with the right half
permanently showing the speaker) contained a “mosaic” of four meeting-room
views (speaker, podium, left and right halves of the meeting room), from which
the interpreter could select the image that would also appear on the monitor
in front of his/her booth. Unfortunately, the bulkiness of the device used to
select among the four available views discouraged most interpreters, who soon
settled on one of the four views for their monitors.
While all experiments to date have attempted to provide a “global”, panoramic
view of the meeting room, in practice, the added value provided by such
a view has been questionable. Panoramic views have certainly never achieved
their goal of providing the remote interpreters with “a perfect view of all occu-
pants of the room”, as was rather optimistically stated in the specifications for
the EPID 2004 remote experiment (EPID 2004: 10). Panoramic views have not
even provided sufficient visual information to allow the interpreters to identify
and follow the facial expressions and gestures of all participants, wherever they
might be seated in the meeting room.
For large meeting rooms, as in the 2004 EPID experiment, even two high
definition (HD) cameras could not resolve this problem. Although the use of
HD cameras did improve visibility, especially of participants seated in the back
rows of the meeting room, they did not provide adequate visibility of all partic-
ipants. Due to the distortion introduced by the wide-angle optics necessary to
cover the entire meeting room, it was not possible to fuse the images captured
by the two cameras into a single, seamless view at the projection stage as would
have been the case with normal optics. In spite of the sophisticated electronics
incorporated in the projectors, cusps and/or discontinuities at the juncture of
the two images could not be eliminated, resulting in an unnatural, incoherent
and disorienting view of the meeting room for the interpreters, most of whom
preferred working from extended speaker views.

6. Possible connections with interpreters’ perception of RI

Interpreters working in multilingual international institutions have consistently
rejected remote interpreting, judging it “unacceptable” (AIIC 2000). This
negative stance was initially attributed to the inferior image and especially the
sound quality characteristic of early videoconferencing and RI tests. However,
starting with the 1999 UN Geneva-Vienna remote interpreting experiment,
it has become clear that interpreter complaints were not only due to the infe-
rior technical conditions, but also the result of a number of physiological (sore
eyes, back and neck pain, headaches, nausea) and psychological complaints
(loss of concentration and motivation, feeling of alienation) stemming from
the remote interpreting conditions. These complaints resurfaced in subsequent
experiments, conducted in a variety of technical conditions and by a number
of multilingual organizations; it would thus be difficult to attribute them solely
to a particular technical setup or even to the working conditions provided by a
particular organization.
While it is beyond the scope of the present article to provide a full ac-
count of the cognitive aspects of remote interpreting, it might be instructive to
explore possible connections between individual features of the remote inter-
preting environment, such as the speaker and meeting-room views, on the one
hand, and the interpreters’ subjective judgment of key psychological parameters,
on the other, including their feelings of participation or of alienation. This
exploration draws on questionnaire data provided by interpreters who, after a meeting,
graded the physical and psychological parameters on a scale of −5 to +5, with 0
corresponding to the conditions of normal, “live” interpreting.
Intriguing results emerged from the analysis of the EPID January 2001
questionnaire data of participating interpreters who had been divided into two
groups: those who consistently wear glasses and those who use only reading
glasses or no glasses at all. The feeling of participation reported by those who
consistently wear glasses was found to be much lower (−3.98) than that of the
second group (−2.89). According to a two-tail t-test10 the probability that this
could be due to chance was only 4% (EPID 2001a: 30). This would indicate
that interpreters who consistently wear glasses are at a handicap under RI conditions,
and suggests a connection between visual perception and interpreters’
feelings of participation.
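To make the group comparison concrete, the following Python sketch computes a two-sample t statistic from two lists of questionnaire scores. The scores are invented placeholders, not EPID data, and the unequal-variance (Welch) form is an assumption, since the exact variant of the test used in the report is not specified here. Turning the statistic into a two-tailed probability additionally requires the t distribution, which is left out of this sketch.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Two-sample t statistic that allows unequal variances (Welch's form)."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Standard error of the difference between the two group means.
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

# Placeholder scores on the questionnaire's -5 to +5 scale (hypothetical).
glasses_wearers = [-5, -4, -3]
others = [-4, -3, -2]
print(welch_t(glasses_wearers, others))
```

A negative statistic means that the first group reported a lower average score than the second.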
A correlation analysis11 of questionnaire data from the EPID January 2001
remote experiment also showed a correlation coefficient between alienation
and the speaker view of +0.01 (complete independence), while that between
alienation and meeting-room view was +0.37 (−0.19 and +0.46, respectively
for the group of interpreters not permanently wearing glasses). Although
strictly speaking not statistically significant, this result appears to suggest that
alienation, while unrelated to the quality of the speaker view, is influenced by
the more “global” meeting-room view. The same conclusion emerges from a
meta-analysis of questionnaire data from a number of recent RI experiments,
showing that interpreter alienation/lack of motivation in RI is linked to the
view of the meeting room rather than to that of the speaker (Moser-Mercer
2005: 733).
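For readers unfamiliar with the statistic, the sketch below shows how a Pearson correlation coefficient is computed from paired scores, and how a coefficient r obtained from n pairs is converted into a t statistic with n − 2 degrees of freedom for a significance check. The function names and the sample size used in the example are assumptions; the number of questionnaires behind the coefficients quoted above is not given in this section.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for a correlation r from n pairs."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical sample size of 20 pairs, used only to illustrate the formula.
print(t_from_r(0.37, 20))
```

The resulting statistic is then compared with the critical value of the t distribution for the chosen significance level.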
Another result of the EPID January 2001 remote interpreting experi-
ment concerns the correlation between sound-image synchronization and
the speaker and meeting-room views. The respective correlation coefficients
were found to be +0.30 for the speaker view versus +0.04 for the meeting-
room view (+0.51 vs. +0.13 for interpreters not permanently wearing glasses).
These results could indicate a possible connection between sound and image
perception, i.e. a multi-modality of perception under remote interpreting
conditions. Such “parallel” processing of simultaneous complementary audible
and visual cues is known to be a feature of normal speech perception, reducing
ambiguity and increasing the likelihood of accurate speech signal detection
(Moser-Mercer 2005: 728).
Considering the caveats that must accompany any such analysis, it is too
early to claim that any of the above results constitutes a “smoking gun”, estab-
lishing a direct relationship between interpreter alienation and specific features
of the remote interpreting technical setup. These results do, however, suggest
an important role for the visual perception of the meeting room as a whole, as
well as a complex interrelationship between hearing and vision, under remote
interpreting conditions.

7. Towards purpose-built booths for RI?

Perhaps the single most remarkable feature of the RI experiments to date has
been a lack of ambition in design philosophy. Existing meeting rooms were
“converted” into remote rooms for RI purposes by simply installing screens or
other display devices; the configuration of interpreting booths in these rooms
did not undergo any major modification relative to their normal use for “live”
interpretation. However, since the general layout of these rooms and the positioning
of interpreting booths often severely limit the possibilities for installing
display devices, this approach is rather problematic. While cost considerations
have also favored this approach to RI, these were less important than the im-
plicit, diehard assumption that RI is just another variant of normal interpret-
ing, or put more crudely, that it makes no fundamental difference to interpret-
ers if they are looking at a real meeting room or at a screen.
While one might have expected that a thorough study of “live” interpreta-
tion in a given meeting room, aimed at the identification of potential interpret-
ing problems, be carried out prior to the RI experimentation in that setting,
this has not been the case to date. The only instance where such a study was
actually performed was the EPID 2004 remote experiment, but even there, due
to tight time constraints, it was impossible, based on ergonomic study data, to
significantly alter the remote experiment’s technical setup. Nevertheless, the
final report for this experiment was the first ever to offer TSI (Technology Sup-
ported Interpretation), a concrete concept encompassing both “live” and re-
mote interpreting booths (Mertens & Hoffman 2005: 134–161).
According to the TSI concept, normal interpreting with full access to visu-
al information from the meeting room and remote interpreting are but the two
end-points of a continuum that also includes a wide spectrum of intermediate
situations where physical restrictions limit the interpreter’s ability to see all
the participants from the interpreting booth properly, e.g. in large meeting
halls. Accessing the missing visual information for the interpreter might call
for varying kinds of assistance, ranging from the relay of voting results, to a
slide presentation or an improved speaker view or even providing a full replacement
view of the entire meeting room under RI conditions. The authors
of the 2004 EP remote interpreting report claim that this can best be
realized through the use of individual, ergonomic, computerized workstations
connected to the Internet, installed in an office environment or even within
existing booths, and integrating the functions of interpreting console and visual
display. These workstations would incorporate a flat 19’’ screen at approximately
90 cm from the interpreters’ eyes, and allow the interpreter to select one
or more windows with views of the speaker and podium, as well as partial and
panoramic views of the meeting room.
Since the TSI concept has not yet been tested, some of its claims should
probably be taken with a healthy dose of skepticism. For instance, it is not
particularly likely that a 19’’ screen will be capable of providing sufficient detail
for a panoramic view of the meeting room; nor would interpreters be expected
to relish working from a display placed at a distance of only 90 cm (in most
RI experiments interpreters have insisted that screens be placed as far as pos-
sible from the booth). It is also unclear whether the TSI concept, in its present
form, would be compatible with existing booths, especially since the minimum
dimensions for the interpreters’ offices or booths recommended by Mertens &
Hoffman would represent an increase of at least 20% over present booth sizes
(Mertens & Hoffman 2005: 140). Nevertheless, the TSI concept has at least pro-
vided a starting point for future efforts for defining new standards for booths
intended for remote interpreting and/or limited visibility situations.
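The geometry behind these reservations can be checked with a short calculation. The Python sketch below computes the visual angle subtended by a flat object seen face-on; the function name is an assumption, and applying the formula to the screen diagonal is a simplification, since the aspect ratio of the proposed screen is not stated.

```python
import math

def visual_angle_deg(extent_cm: float, distance_cm: float) -> float:
    """Angle, in degrees, subtended by an object of the given extent."""
    return math.degrees(2 * math.atan(extent_cm / (2 * distance_cm)))

# A 19-inch diagonal converted to centimetres, viewed from 90 cm.
diagonal_cm = 19 * 2.54
print(visual_angle_deg(diagonal_cm, 90))
```

On these assumptions, the diagonal of a 19’’ screen at 90 cm subtends roughly 30 degrees, which gives a sense of how small a share of the visual field such a display could fill.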

8. Future directions for remote interpreting research

While it would be rather premature to predict the precise direction of future re-
mote interpreting experiments, one can already identify two broad, interlinked,
areas which could prove fruitful for future research: a) the analysis of the causes
of interpreter alienation under RI conditions and b) the exploration of alterna-
tive more effective methods for visualizing the meeting room in RI conditions.
It is unlikely that much progress in RI will be achieved without a more
systematic understanding of the ergonomic and cognitive issues involved and,
in particular, interpreter alienation or the absence of a feeling of participation
in the meeting room universally observed and experienced in RI experiments.
Interpreter alienation has been shown to be strongly correlated with lack of
concentration (r = 0.55) and of motivation (r = 0.68) in at least one remote
experiment (EPID 2001a, Annex 5). It is worth noting that both the absence
of a feeling of participation and many related physical complaints (eye and
posture problems, nausea, headaches, etc.) are not unique to RI, but are also
common to other activities where the tasks are performed within a virtual
environment (VE) where a human operator is only remotely present. Studies
have shown that both task performance and the degree of physical and psy-
chological comfort experienced by the human operator crucially depend on
his/her sense of “being there”, or (self-) presence within that virtual environ-
ment (Mouzourakis 2003).
The premise that remote interpreting is an activity within a virtual environment
and the consequent notion of presence as the central metaphor for RI
study is, at present, only a working hypothesis. The notion of a presence metaphor
is the only available paradigm that can provide a wider framework within
which the relationship between the remote environment and the interpreters’
experience might be further elucidated. The presence metaphor provides an alternative
way to evaluate the available RI options in terms of minimizing interpreter
alienation and promoting a sense of their being present (even if only virtually)
in the meeting room, rather than evaluating these options exclusively on their
technical “attractiveness” or even expediency. If quantifiable metrics for an RI-
appropriate presence (Kalawsky 2000) can be identified, it might even become
possible to benchmark RI environments according to the degree of presence
they foster. For a presence paradigm to be suitable for RI, a consistent and immersive
technical environment will be needed, which has too often not been
the case, as screens and monitors have been indiscriminately thrown together
in front of otherwise empty, dark interpreter booths.
An RI-appropriate technical environment must also consider how hu-
man vision operates. Human vision has evolved to support our survival and
includes features such as depth perception and the presence of both foveal and
peripheral vision, rather than megapixel counts. Under RI conditions, with the
interpreter looking at two dimensional screens and monitors at viewing angles
such that peripheral vision (i.e. the component most sensitive to moving ob-
jects) essentially never comes into play, depth perception is limited. In RI en-
vironments another component of depth perception, parallax, the continuous
shift in point of view that we experience as we move our head or even our full
body relative to an object (exploited in virtual reality displays to make them
look real), is also missing.
Without an IMAX-like panoramic projection, or some equivalent technology,
or at least an immersive experience fully exploiting peripheral vision, it
will probably not be possible to restore anything close to normal vision. Res-
toration of depth perception would require the 3-D projection of images, and
interpreters would have to wear (color, polarizing, or even electronic) glasses
at all times. Parallax-dependent alternative approaches, such as integral pho-
tography, which dispenses with the need for glasses but which is for the time
being limited to narrow angles of view only for providing depth information,
are currently being explored.
Novel strategies for visualizing the meeting room will also need to consider
how verbal and non-verbal information is integrated in simultaneous inter-
preting. Eye gaze studies (Yarbus 1967) have shown that human vision is both
active and selective and how we look at a scene depends on the problem we are
trying to solve; irrelevant details may be overlooked entirely while the brain is
constantly “filling in the blanks”. Interpreters apparently scan meeting rooms
in ways that have much less to do with the particular person having the floor
than with the content of what is being said (Moser-Mercer 2002). Thus, the
interpreter must have some choice over the images displayed in the remote
room, above and beyond that provided by a mere “menu” of speaker and meet-
ing room views selected by a director/camera crew, a need that could perhaps
be partly met by individual control of a simple web camera. A more elegant
solution, dispensing with the need for a camera crew altogether, might be to
produce a composite panoramic meeting room view electronically combining
the information captured by multiple fixed cameras. This is the approach used
in several experimental systems such as the FlyCam panoramic video system
(Foote & Kimber 2000), where the output of a series of inexpensive, off-the-
shelf cameras (arranged in a manner similar to a fly’s eye) is combined into
a single panoramic image that can be digitally zoomed or panned into by a
“virtual” camera.
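The idea of a “virtual” camera moving over a composite panorama can be illustrated with a minimal sketch. It is not the FlyCam implementation: a real system also has to correct lens distortion and blend the overlap between neighbouring cameras, whereas this toy version simply assumes that the fixed cameras deliver equally tall, already aligned frames. The image format (rows of grey values) and the function names are hypothetical choices made for illustration only.

```python
from typing import List

# Hypothetical toy image format: a list of rows, each row a list of grey values.
Image = List[List[int]]


def stitch_side_by_side(frames: List[Image]) -> Image:
    """Join equally tall, pre-aligned frames left to right into one panorama."""
    height = len(frames[0])
    if any(len(frame) != height for frame in frames):
        raise ValueError("all frames must have the same height")
    # Concatenate row i of every frame, in camera order.
    return [sum((frame[i] for frame in frames), []) for i in range(height)]


def virtual_camera(panorama: Image, left: int, top: int,
                   width: int, height: int, zoom: int = 1) -> Image:
    """Digital pan (crop a window) followed by digital zoom (pixel repetition)."""
    window = [row[left:left + width] for row in panorama[top:top + height]]
    zoomed: Image = []
    for row in window:
        wide_row = [pixel for pixel in row for _ in range(zoom)]
        zoomed.extend(list(wide_row) for _ in range(zoom))
    return zoomed


if __name__ == "__main__":
    camera_a = [[1, 2], [3, 4]]
    camera_b = [[5, 6], [7, 8]]
    panorama = stitch_side_by_side([camera_a, camera_b])
    # An interpreter-controlled view straddling the seam between both cameras.
    print(virtual_camera(panorama, left=1, top=0, width=2, height=1, zoom=2))
```

Because panning and zooming operate only on the stored panorama, each booth could in principle steer its own view without any camera moving in the meeting room.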

9. Some tentative conclusions

Perhaps the first conclusion to be drawn from recent RI experiments is
methodological. Most experiments to date have focused, within the specific setting
and constraints of a given institution, on providing a “proof of concept” for
remote interpreting, often hoping to arrive at a set of “ideal” technical conditions
by sheer luck. While some important, primarily negative lessons have been
learned from testing different types of equipment under a variety of meeting
room configurations and arrangements, it should be clear by now that such an
incoherent approach is unlikely to lead to any real progress.
From a research point of view and to avoid the current cacophony of ex-
perimental conditions, it would be preferable (as well as more cost-effective)
to define a prototype remote booth configuration, perhaps such as suggested
by the TSI concept, within which future RI experimentation might be conducted
under standardized and controlled conditions. The comparability of
experimental results would be further enhanced by the definition of a minimum
common set of parameters to be studied under both “baseline” (normal
interpreting) and intermediate RI conditions involving reduced visibility of the
meeting room. Research in RI would also benefit from the increased dissemi-
nation of experimental results and conclusions, many of which are currently
unpublished or only available on restricted-access institutional websites.
Future research on remote interpreting should also investigate the mecha-
nisms through which visual information from the meeting room is perceived
by the interpreter and how visual and verbal information interact to form a
cohesive whole. To test and compare alternative options, especially innovative
ones, for image capture and display (and possibly also sound delivery), it is
imperative to agree on “benchmark” remote interpreting configurations. Perhaps
the concept of presence (or the absence of alienation), emphasizing the need
to offer a consistent, immersive environment for the interpreter, could serve
as the unifying metaphor for a more general approach to RI, embedding RI
research within a wider discipline (the study of human performance in virtual
environments).
A final conclusion might include the implications of the above consider-
ations for the regular use of remote interpreting in multilingual institutions.
Promises of cost savings, flexibility and overcoming building-related con-
straints have been predicated on the assumption that it will be possible to
implement this technique using already existing infrastructures and off-the-shelf
technologies. The viability of RI also rests on the assumption of “business
as usual” working conditions for interpreters, which, in the light of recent RI
experiments, seems a rather problematic proposition. If remote interpreting
booths must be redesigned and innovative image capture and visualization
methods deployed, and new working conditions for RI must be negotiated
with interpreters, the use of RI by multilingual institutions on a routine basis
will inevitably result in greater costs and complexity than hitherto envisaged.

Notes

* Disclaimer: The opinions expressed in the present article are purely those of the author
and do not reflect the point of view of the European Parliament Interpretation Directorate
or of any other European Parliament body.

1. ISDN telephone lines provide an alternative to what is known as POTS (“plain old
telephone service”): unlike normal telephone lines restricted to carrying voice information only
(in analog form) they can carry voice or data plus connection control information in digital
form (i.e. as strings of 0’s and 1’s).

2. The speed at which data can be transmitted over a digital line is usually expressed in kbps,
i.e. thousands of bits of information per second, each bit (“binary digit”) having a value of
either 0 or 1. This speed is also (rather improperly) commonly referred to as the “bandwidth”
of the line in question.

3. “H 320” is a standard agreed by ITU (the International Telecommunication Union) for
the transmission of digital audio and video information, typically structured in multiples of
64 kbps blocks (see note 2) over ISDN lines. Both the audio and video signals are usually
“compressed”, i.e. specially treated to remove redundant information, to enable as much
information as possible to fit within each standard 64 kbps block.

4. A pixel (short for “picture element”) is the elementary unit out of which images in digital
form are composed; a pixel is typically represented by a triplet of 8-bit values (each ranging
from 0 to 255), one for each of the three primary colors: red, green and blue. The number of pixels
in an image is referred to as the resolution of that image; a typical resolution for computer
monitor images would be 1024 by 768 pixels.
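The figures in this note make it easy to see why compression is unavoidable. The sketch below computes the uncompressed size of one 1024 by 768 image at 24 bits per pixel, and the raw bit rate of a sequence of such images. The frame rate of 25 images per second is an assumption introduced here for illustration, and the calculation does not describe the picture format actually used by any videoconferencing codec.

```python
def raw_frame_bits(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Uncompressed size of one image in bits (3 colors x 8 bits by default)."""
    return width * height * bits_per_pixel


def raw_video_kbps(width: int, height: int, frames_per_second: float,
                   bits_per_pixel: int = 24) -> float:
    """Uncompressed bit rate in kbps (thousands of bits per second)."""
    return raw_frame_bits(width, height, bits_per_pixel) * frames_per_second / 1000


if __name__ == "__main__":
    ASSUMED_FPS = 25  # assumption, not taken from the article
    bits = raw_frame_bits(1024, 768)
    print(f"one frame: {bits} bits ({bits // 8} bytes)")
    raw = raw_video_kbps(1024, 768, ASSUMED_FPS)
    print(f"raw stream: {raw:.1f} kbps")
    print(f"reduction needed to fit 384 kbps: about {raw / 384:.0f} to 1")
```

Under these assumptions the raw stream is more than a thousand times larger than a 384 kbps channel, which gives an idea of how much information a codec has to discard.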

5. Sound signals are characterized by their frequency, i.e. the number of times the sound
wave pattern repeats itself per second. 1 Hertz (Hz) corresponds to one such repetition per
second; one kHz is a thousand Hertz. The human ear is sensitive to frequencies ranging
roughly from 20 Hz to 20 kHz. ISO standards for simultaneous interpreting require the
faithful transmission of all speech frequencies between 125 Hz and 12.5 kHz (AIIC 2000).

6. A standard 64-kbps block (see note 3) is used in telephony to carry the human voice,
retaining frequencies from 0 to about 3.1 kHz. The “G 722” standard, contained in “H 320”,
provides for the compression of sound signals by a factor of roughly 2, so that a 64 kbps
block can now carry frequencies from 0 to about 7 kHz.

7. Mp3 (more properly MPEG-1 Audio Layer III) is a popular standard for compressed sound files,
providing nearly CD quality at roughly one tenth the CD file size.
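The relationship between sound frequencies and bit rates in notes 2, 6 and 7 can be made concrete with a small calculation for uncompressed digital sound. It uses two well-known facts: a signal must be sampled at a rate of at least twice its highest frequency, and the resulting bit rate is the sampling rate times the number of bits per sample times the number of channels. The telephony figures (8000 samples per second, 8 bits) and CD figures (44100 samples per second, 16 bits, stereo) are standard values, not taken from the article.

```python
def min_sampling_rate_hz(highest_frequency_hz: float) -> float:
    """Lowest sampling rate able to represent the given frequency (Nyquist)."""
    return 2 * highest_frequency_hz


def pcm_kbps(sampling_rate_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    """Bit rate of uncompressed (PCM) digital sound in kbps."""
    return sampling_rate_hz * bits_per_sample * channels / 1000


if __name__ == "__main__":
    print("telephony:", pcm_kbps(8000, 8), "kbps")      # one standard block
    cd = pcm_kbps(44100, 16, channels=2)
    print("CD:", cd, "kbps; one tenth:", round(cd / 10, 1), "kbps")
    # Sampling rate needed for the 12.5 kHz upper limit quoted in note 5.
    print("12.5 kHz needs at least", min_sampling_rate_hz(12500), "samples/s")
```

One tenth of the CD rate comes out at about 141 kbps, which is consistent with the bit rates commonly used for mp3 files.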

8. In video, as in photography, color balance depends on the color temperature of the
illuminating light source; this temperature is about 5200 K (comparable to the temperature of
the sun’s surface) for ambient sunlight, while artificial lighting has a lower color
temperature, typically around 3200 K.

9. High definition (HD) refers to a standard for TV and video, providing considerably more
resolution (up to 1920 by 1080 pixels) than “standard” digital (SD) TV (720 by 480 pixels).

10. The t-test is a standard statistical test for determining whether a difference observed in
the values of a variable between two subsets of a population is statistically significant or not.
A two-tailed t-test makes no presumption as to which subset is expected to return a higher
value for the variable.
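As an illustration of what such a test computes, the following sketch evaluates the t statistic for two independent samples. It assumes the classical (Student) form with a pooled variance; which variant was used in the experiments discussed in this article is not stated, and the sample values below are invented for the example. The statistic would then be compared with the tabulated critical value for the given degrees of freedom, in both tails for a two-tailed test.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for two independent samples.

    Assumes both samples come from populations with equal variance.
    """
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


if __name__ == "__main__":
    # Hypothetical questionnaire scores, for illustration only.
    normal_booth = [1, 2, 3, 4, 5]
    remote_booth = [2, 3, 4, 5, 6]
    t, df = pooled_t_statistic(normal_booth, remote_booth)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```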

11. In statistics, two variables are said to be correlated if they follow a similar trend, and
anticorrelated if they follow opposite trends. The correlation coefficient r between two variables
is defined in such a way as to have a value of +1 for perfect correlation and −1 for
perfect anticorrelation; a value of 0 corresponds to the absence of any linear relationship
between the variables. A significant correlation is deemed to exist if r > 0.5 (or r < −0.5).
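A minimal sketch of how the (Pearson) correlation coefficient r is computed may help in reading this note: r is the sum of the products of the deviations of the two variables from their means, divided by the product of the square roots of their summed squared deviations. The function name and the sample data are illustrative only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return co_deviation / (spread_x * spread_y)


if __name__ == "__main__":
    hours = [1, 2, 3, 4]
    print(pearson_r(hours, [2, 4, 6, 8]))   # same trend
    print(pearson_r(hours, [8, 6, 4, 2]))   # opposite trend
```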

References

AIIC (2000). Code for the use of new technologies in conference interpreting.
Communicate! March-April 2000. http://www.aiic.net/ViewPage.cfm/page120.htm (accessed 18
December 2005).
Barco N. V. (2004). Technical report lot 2. Remote interpreting test, European Parliament
Brussels, December 2004. Unpublished.
Braun, S. (2004). Kommunikation unter widrigen Umständen? Tübingen: Gunter Narr.
Chernov, G. V. (2004). Inference and anticipation in simultaneous interpreting. Amsterdam/
Philadelphia: John Benjamins.
EPID (2001a). Report on a remote interpreting test at the European Parliament. http://www.
europarl.eu.int/interp/remote_interpreting/ep_report1.pdf (accessed 18 December
2005).
EPID (2001b). Report on the second remote interpreting test at the European Parliament.
http://www.europarl.eu.int/interp/remote_interpreting/ep_report2.pdf (accessed 18
December 2005).
EPID (2004). Study concerning the constraints arising from remote interpreting. Special provi-
sions and specifications. http://www.europarl.eu.int/interp/online/english/techno/foru-
men.pdf (accessed 18 December 2005).
EU Council (2001). Rapport sur un test de téléinterprétation effectué au Secrétariat Général
du Conseil. http://www.europarl.eu.int/interp/remote_interpreting/sg_conseil_avr-
il2001.pdf (accessed 18 December 2005).
Esteban-Causo, J. (1999). Rapport de mission: Expérience de téléinterprétation ONU. Unpub-
lished SCIC internal note.
ETSI (1993). Study of ISDN videotelephony for conference interpreters. Unpublished report.
Foote, J. & Kimber, D. (2000). FlyCam: Practical Panoramic Video. In Proceedings of IEEE
International Conference on Multimedia and Expo, vol. III, pp. 1419–1422. http://www.
fxpal.com/publications/FXPAL-PR-00-090.pdf (accessed 18 December 2005).
ITU (2001). Remote interpretation — status report. Unpublished report submitted to IAM-
LADP.
Kalawsky, R. S. (2000). The validity of presence as a reliable human performance metric in
immersive environments. 3rd International Workshop on Presence, Delft, Netherlands.
http://www.presence-research.org/Kalawsky.pdf (accessed 18 December 2005).
Louvranges Broadcast (2001). Rapport technique: Tests de télé-interprétation Secrétariat
Général du Conseil de l’UE. Unpublished report.
Mertens & Hoffman Management Consultants Ltd. (2005). Final report on the December
2004 Remote Interpreting Test at the European Parliament. Unpublished.
Moser-Mercer, B. (2002). Situation models: The cognitive relation between interpreter,
speaker and audience. In F. Israël (Ed.), Identité, altérité, équivalence? La traduction
comme relation. Paris: Minard-Lettres modernes, 163–187.
Moser-Mercer, B. (2005). Remote interpreting: Issues of multi-sensory integration in a mul-
tilingual task. Meta 50 (2), 727–738.
Mouzourakis, P. (1996). Videoconferencing: Techniques and challenges. Interpreting 1 (1),
21–38.
Mouzourakis, P. (2003). That feeling of being there: Vision and presence in remote
interpreting. Communicate! Summer 2003. http://www.aiic.net/ViewPage.cfm?page_id=1173
(accessed 18 December 2005).
Niska, H. (1998). What is remote interpreting? http://lisa.tolk.su.se/remote-niska.html (ac-
cessed 18 December 2005).
SCIC (1995). Evaluation des questionnaires Beaulieu. Unpublished internal note.
SCIC (2000). Rapport concernant les tests de simulation de téléconférence au SCIC en janvier
2000. http://www.europarl.eu.int/interp/remote_interpreting/scic_janvier2000.pdf
(accessed 18 December 2005).
SCIC (2001). Essai de téléinterprétation au Conseil de l’Union Européenne. Unpublished
internal note.
Thiéry, C. (1976). Note on the UNESCO “Symphonie Satellite” interpretation experiment.
Unpublished report.
UN (1999). A joint experiment in remote interpretation UNHQ-UNOG-UNOV. Unpublished
report.
UN (2001). The second full-scale experiment in remote interpretation in the United Nations.
Unpublished report submitted to IAMLADP.
Yarbus, A. L. (1967). Eye movements and vision. New York: Plenum Press.

Author’s address
Panayotis Mouzourakis
Interpretation Directorate
European Parliament
Rue Wiertz
B-1047 Brussels
Belgium
E-mail: PMouzourakis@europarl.eu.int

About the author


Panayotis (“Takis”) Mouzourakis, a former particle physicist, has been a staff interpreter at
the European Parliament since 1983. He is particularly interested in the application of ICT
and “new technologies”, especially remote interpreting, in conference interpreting.

Appendix: Videoconferencing and RI experiments since 1990

What follows is a list of major videoconferencing and remote interpreting experiments
since 1990 together with a summary of their salient technical characteristics and/or main
conclusions:

Videoconferencing experiments using ISDN connections:


a. The European Telecommunications Standards Institute (ETSI) in Nürnberg, January
1993: five interpreters in two standard booths, working from 84’’ monitors placed at a
distance of 200 cm from the booths. Audio and video input from videotapes of real meetings,
encoded according to H320 standards. 3.1 kHz (considered unacceptable) or 7 kHz
sound; 128 kbps (considered unacceptable) or 384 kbps for the images (ETSI 1993).
The conclusions to be drawn from this study are, therefore, positive, in the sense that
we now know that the lower levels of video and audio bandwidth do not support
simultaneous interpretation, leaving open for further study the possibilities for the
higher level, especially with further improvements in codec design leading to im-
proved picture, and above all, sound.
b. Commission Européenne Service Commun Interprétation–Conférences (SCIC) at the
Beaulieu studio (Brussels) in the autumn of 1995: interpretation of a videoconference
(administrative meeting) between Brussels and Luxembourg. Interpreters work in nor-
mal booths visualizing distant participants on double screens of 150 cm diagonal each
plus booth monitors. 7 kHz sound, up to a total of 2048 kbps for images. Sound quality
found to be unacceptable by interpreters in the test who were unanimous in considering
that “an improvement in sound quality is a necessary prerequisite to a regular assignment
to such meetings” (SCIC 1995).
c. University of Tübingen, 1996–1998: the interpreter mediates between two speakers us-
ing different languages, visible on PC monitors. Technical conditions as in the ETSI test:
3.1/7 kHz sound and 128/384 kbps image; as in the ETSI experiment, only the higher bit
rates are considered acceptable. It is concluded that the interpreter can, through adaptive
strategies, overcome some of the constraints of the videoconferencing situation (Braun
2004: 337):
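Die erhobenen Daten lassen insgesamt den Schluss zu, dass durch Monitoring und
gegenseitige Unterstützung sowohl eine spontane als auch eine dauerhafte Anpassung
der Beteiligten an neue, unvertraute kommunikative Situationen grundsätzlich
möglich ist.
[Taken together, the data collected support the conclusion that, through monitoring
and mutual support, both a spontaneous and a lasting adaptation of the participants
to new, unfamiliar communicative situations is possible in principle.]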

Remote interpreting experiments using ISDN connections:


d. United Nations Organization (UN) Geneva-Vienna test, January-February 1999: two
weeks of meetings in Geneva interpreted by a team working in booths of a meeting room
in Vienna. This is the first time that the 7 kHz limitation for sound (inherent in H320) is
overcome by using non-standard, mp3 encoding (20 kHz sound); image capture in Geneva
using 3 cameras filming from different angles; projection in Vienna by 2000 lumen
projectors on large double screens, 4 by 4 m each at 15–20 m from the booths; 384 kbps
per screen. Left-hand side screen shows a static view of the meeting room with a small
close-up of the president in a corner. Right-hand side shows the speaker. Conclusions
(UN 1999: 25–27):
The experiment was a technical success, albeit a qualified one. Saying that it was a
successful experiment should not be interpreted as meaning that remote interpreta-
tion on a large scale is a viable and cost-effective alternative to on-site interpretation
… However, it is doubtful that RI [remote interpretation] will ever become standard
practice for interpreters who were trained to work on-site without some adjustments
to their working conditions. Indeed, this experiment seems to indicate that there are
components in simultaneous interpretation which do not lend themselves to techno-
logical solutions …
e. Ecole de Traduction et de l’Interprétation — International Telecommunications Union
(ETI-ITU) test in Geneva, April 1999: The first experiment to investigate the interpreter’s
psychological/physiological response to remote vs. normal interpreting when six inter-
preters alternate between a normal and a remote booth. Technical setup similar to UN
experiment above: image capture by three orientable cameras; one for president and two
for global view plus speaker. Mp3 quality sound, 384 kbps for the image projected on a
monitor, showing a global view of the meeting room on which a small portrait of the
speaker or president is superimposed. Conclusions (ITU 1999: 19):
The first controlled experiment to evaluate human factors and technical arrangements
in remote interpreting has demonstrated that for the same group of interpreters work-
ing live in a conference room is psychologically less stressful, less tiring and conducive
to better performance.
f. UN New York test, April 2001: Two weeks of meetings in a conference room in N.Y. are
interpreted by a team located in a different conference room, to which audio and video
are provided by a combined ISDN plus satellite (4.85 MHz) link. In the meeting room,
3 cameras are used, one facing participants and capturing speaker close-ups, one facing
the podium to provide an image of the chairperson, and one providing a general view
of the meeting room. There are 3 cameramen and a director. For each booth, images are
displayed by one 42-inch plasma screen (showing the speaker at 512 kbps), 14 feet from
the booth plus one 25-inch monitor (alternately showing the general room view or the
podium at 384 kbps), 11 feet from the booth. The experiment concluded that the minimum
requirements for remote interpreting were (UN 2001: 15):
14 kHz sound (requiring 128 kbps) for sending floor sound to the booths (14 kHz)
and 10 kHz sound (at 64 kbps) for sending interpretation back to the floor (10 kHz);
512 kbps for the image of the speaker plus 384 kbps for the floor/podium image.
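To give an idea of what these minimum requirements add up to, the sketch below sums the four bit rates quoted above and expresses the total in standard 64 kbps blocks (see note 3). It assumes a single instance of each stream; how many sound channels or image streams a real multilingual meeting would need is not specified here, so the result is a lower bound for illustration only.

```python
from math import ceil
from typing import Dict

BLOCK_KBPS = 64  # size of one standard block, as described in note 3

# Bit rates quoted from the UN (2001) minimum requirements, one stream each.
MINIMUM_STREAMS_KBPS: Dict[str, int] = {
    "floor sound to booths (14 kHz)": 128,
    "interpretation back to floor (10 kHz)": 64,
    "image of the speaker": 512,
    "floor/podium image": 384,
}


def total_kbps(streams: Dict[str, int]) -> int:
    """Sum of the bit rates of all streams."""
    return sum(streams.values())


def blocks_needed(kbps: int, block_kbps: int = BLOCK_KBPS) -> int:
    """Number of whole blocks required to carry the given bit rate."""
    return ceil(kbps / block_kbps)


if __name__ == "__main__":
    total = total_kbps(MINIMUM_STREAMS_KBPS)
    print(f"total: {total} kbps = {blocks_needed(total)} blocks of {BLOCK_KBPS} kbps")
```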

Remote experiments using room-to-room (cable) connections:

g. SCIC test in Brussels, January 2000: sound and image transmission by direct cabling.
Image capture by 5–6 cameras (two fixed and 3–4 mobile or one fixed and 5 mobile),
one mixing station, a director, three cameramen, all quadrilingual. A number of different
configurations were used for image display: 16/9 monitors, projection on big screens,
plasma screens, under both natural and reduced light conditions. Interpreters stressed
the added fatigue due to artificial lighting and were not satisfied by the speed of reaction
and target choice of the cameras; they also stressed the alienation of the interpreter from
the meeting room under such conditions (SCIC 2001: 3):
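Le manque de vision globale de la salle et du déroulement réel de la réunion entraîne
la perte d’éléments d’information essentiels à l’interprétation, relevant de la
communication non verbale…
… Toute la gestuelle échappe. Il devient impossible de suivre réactions et interactions
des délégués, d’identifier le prochain intervenant, d’anticiper la langue qu’il va parler,
de se rendre compte d’éventuels problèmes techniques (micro, mauvaise audition …)
et de proposer des remèdes aux délégués.
[The lack of an overall view of the room and of the actual course of the meeting leads
to the loss of items of information essential to interpreting, which belong to non-verbal
communication … All gestures are lost. It becomes impossible to follow the reactions and
interactions of delegates, to identify the next speaker, to anticipate the language he or
she will speak, to notice possible technical problems (microphone, poor audibility …)
and to suggest remedies to the delegates.]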
h. EPID test in Brussels, January 2001: One week of parliamentary meetings; sound and
image transmission by direct cabling. The meeting room was covered by five cameras,
one fixed camera showing the podium, one fixed camera providing a global view of the
meeting room and three mobile cameras on tripods (3 full or partial left-right views of
meeting room plus 2 cameras for speaker views), operated by 5 cameramen and a director.
In the remote room, images were projected by a pair of 3000 lumen projectors on
each of 3 large screens (placed in such a way as to be visible from at least one of the 11
interpreting booths) plus 11 monitors, one in front of each booth. The left-hand side of
the double screens was used to project a mosaic of four images which could be used by
the interpreter to select the image appearing on the monitor in front of each booth. The
right-hand screen usually showed a close-up of the speaker, as chosen by the director.
Interpreters had the possibility (through a rather unwieldy device) of choosing one of
the four mosaic images to be displayed on the monitor in front of the booth. Conclusions
(EPID 2001a: 5):
The technical set-up tested in the course of this experiment did not provide interpreters
with an adequate and coherent view of the meeting room. This led to a significant
loss of visual information, which was not compensated by the arrangement intro-
duced to allow individual booths to choose between multiple camera views …
… Remote interpreting resulted generally in a significant and cumulative increase in
fatigue and physical discomfort for the interpreters, who reported a marked deterio-
ration in their ability to concentrate and their motivation to work in such a setting, as
well as a significant feeling of alienation.
i. European Union (EU) Council test in Brussels, April 2001: sound and image transmis-
sion by direct cabling. 7 cameras used: 4 mobile cameras for the speakers, 3 fixed cam-
eras for the chair, for a (wide-angle) global view of the meeting room plus a view of
(a maximum of 3) selected delegations. 4 cameramen, 3 assistant cameramen, a director,
and a general coordinator. Display by 2 plasma screens (42’’) per booth, capable of showing
a total of 2, 3 or 4 images (each screen can display up to 2 images). Plasma screens
initially at 130 cm from the booths (too close), gradually moved to 350 cm at the request
of interpreters. Lack of flicker in plasma screens considered a plus, reducing eye fatigue.
Interpreters work in mixed-team mode (some booths working normally, others in remote).
They tend to prefer one screen displaying the image of the speaker, preferably in
the midst of his/her delegation. Interpreters consider the following visual information as
necessary (Louvranges 2001: 8):
– a view of the speaker plus his/her delegation
– a view of the chair plus secretaries to the right and left
– a panoramic view of the meeting room (constant or periodical)
– a view of the Commission (when they take the floor)
– a view of one’s own delegations (audience).
Image mixing introduces a delay (1–2 frames) of image relative to sound. Neon lighting
may interfere with image quality through parasitic currents unless it corresponds to daylight
color temperature (5200 K).
The EU Council report concludes that present booths are unsuited to RI. Visual communication
between booths is needed for relays. It is also impossible, with the present setup, for
interpreters to see their client delegations. One solution would be to install cameras (one per
delegation) on the ceiling of the meeting room (EU Council 2001: 7). A new ISO standard
will have to be defined before RI can be routinely used; reduced working hours will have to
be applied to interpreters working in RI conditions.
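The image delay of 1–2 frames reported for the EU Council test can be converted into milliseconds once a frame rate is fixed. The sketch below assumes 25 frames per second, the usual rate for European television; this value is an assumption, as the frame rate used in the test is not given. The same conversion shows how many frames the 90 ms audio delay introduced in the 2004 EPID test (item k) would correspond to at that rate.

```python
ASSUMED_FPS = 25.0  # assumption: European TV frame rate, not stated in the reports


def frames_to_ms(frames: float, fps: float = ASSUMED_FPS) -> float:
    """Duration of a number of video frames in milliseconds."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return frames * 1000.0 / fps


def ms_to_frames(milliseconds: float, fps: float = ASSUMED_FPS) -> float:
    """Number of video frames spanned by a duration in milliseconds."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return milliseconds * fps / 1000.0


if __name__ == "__main__":
    for frames in (1, 2):
        print(f"{frames} frame(s) of image delay = {frames_to_ms(frames):.0f} ms")
    print(f"90 ms of audio delay = {ms_to_frames(90):.2f} frames")
```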
j. EPID test in Brussels, December 2001 (EPID 2001b): One week of parliamentary meet-
ings; sound and image transmission by direct cabling. 5 cameras altogether: 2 fixed cam-
eras, capturing one half of the meeting room each plus a third fixed camera with a view
of the podium and two mobile cameras for the speakers. One director. Image display is
by two 50” plasma screens per booth, at about 150 cm, plus two 18” LCD screens in front of
(but below) the plasma screens. Problems reported: flicker (possibly due to interference with the
meeting room lighting), excessive brightness and excessive contrast. The placement of screens is
far from ideal; as they cannot be moved further away, interpreters resort to using their hands or
even wearing sunglasses (the so-called “Blues Brothers effect”) to shield themselves from
the plasma screens. LCD monitors (which interpreters prefer) are deemed better than
screens, but appear too dark next to plasma screens and are not ideally placed.
k. EPID test in Brussels, November 2004 (Barco 2004): 3 weeks of parliamentary meetings.
Sound and image transmission by direct cabling. 5 HD cameras, 2 cameras fixed behind
the podium providing a panoramic view of the meeting room, a third one providing a
fixed image of the podium, 2 mobile cameras providing a speaker image. Fiber optics
transport between meeting and remote room. 6 screens in the remote room (one per 2
booths): 4.02 m wide by 1.13 m high (aspect ratio = 32/9) at a distance of 6.00 m from
the booth. Two HD projectors per screen showing the two halves of the panoramic room
view, with a speaker (plus surroundings) shot superposed on the left side and the podium
(3–4 persons including the chair) on the right side. Sound-image synchronization by
introducing a 90 ms audio delay: perfect lip sync claimed.
Problems: Joining the two panoramic (fairly wide-angle) shots creates an optical distortion.
The alternative would be two different meeting room perspectives that would
not exactly match. Since the projector software was unable to provide the normal color
balance in HD mode, the experiment was run in standard definition mode for the first
two weeks; HD was achieved only during the last week and still fell short of the test
specifications.
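The screen geometry quoted for this test can be checked with a short calculation: the ratio of width to height should be close to 32/9, and simple trigonometry gives the horizontal angle under which the screen is seen from the booth. The sketch assumes a viewer sitting on the axis of the screen at the stated distance of 6.00 m, which is only an approximation since each screen served two booths.

```python
from math import atan, degrees


def aspect_ratio(width_m: float, height_m: float) -> float:
    """Width divided by height of the screen."""
    return width_m / height_m


def horizontal_viewing_angle_deg(width_m: float, distance_m: float) -> float:
    """Angle subtended by the screen width for a viewer centred on its axis."""
    return degrees(2 * atan((width_m / 2) / distance_m))


if __name__ == "__main__":
    width, height, distance = 4.02, 1.13, 6.00  # metres, as quoted for item k
    print(f"aspect ratio: {aspect_ratio(width, height):.3f} (32/9 = {32 / 9:.3f})")
    angle = horizontal_viewing_angle_deg(width, distance)
    print(f"horizontal viewing angle: {angle:.1f} degrees")
```

Under these assumptions the panoramic screen covers an angle of roughly 37 degrees, far short of the 180 degrees or more commonly quoted for the human horizontal field of view, which is in line with the remark that peripheral vision hardly comes into play under RI conditions.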
