You are on page 1of 5

t, based on examples of natural

language
instructions and corresponding
trajectories. In reinforcement
learning, the network receives a
reward whenever the robot
manages to reach its goal. Both of
these approaches are
fully learning-based, i.e. mapping
inputs to outputs without
performing any geometric reasoning
or relying on other
types of domain knowledge. They
are also ’greedy’, in the
sense that decisions are taken locally
(”what’s the next best
move?”) rather than planning ahead
overlooking the entire
* Equal contribution
1
Zehao Wang, Minye Wu and Tinne
Tuytelaars are members of the Elec-
trical Engineering Department (ESAT-PSI) of
KU Leuven, 3000 Leuven,
Belgium zehao.wang@esat.kuleuven.be
2
Mingxiao Li and Marie-Francine Moens are
members of the Com-
puter Science Department of KU Leuven,
3000 Leuven, Belgium
mingxiao.li@kuleuven.be
Fig. 1: In this paper, we introduce a
novel map-language
guided navigator in a propose-and-
discriminate fashion.
trajectory. This reduces their error-
tolerance. It also does not
allow to consider alternative routes.
In contrast, robot navigation in
general has been an active
area of research for decades, with
well established methods
for path planning based on
occupancy maps, that have proven
their value in a myriad of real world
applications. The
question then naturally arises how
this can be leveraged in
the context of language guided
navigation.
Recent works for language guided
navigation often include
a pre-exploration phase (e.g.[7]),
during which the robot
explores a new environment. This is
introduced to tackle
the domain gap and finetune the
visual encoder in a self-
supervised setting, and has been
shown to have a signifi-
cant impact on the performance.
Such pre-exploration phase
seems reasonab
ments. VLN-CE [1] are derived from
the Matter-
port [24] dataset which contains
annotated point clouds that
indicate the room type and the
category of objects in the envi-
ronment. There are forty object types
and thirty room types
in the dataset. Using these annotated
data, we render top-
view layouts of each scene an
ments. VLN-CE [1] are derived from
the Matter-
port [24] dataset which contains
annotated point clouds that
indicate the room type and the
category of objects in the envi-
ronment. There are forty object types
and thirty room types
in the dataset. Using these annotated
data, we render top-
view layouts of each scene an
nsions, while the notion of a dissimilarity metric may not be applicable to them. Among the
variety of approaches to their construction, weak semantic maps can be derived from semantic
relations, such as synonymy and antonymy, using the technique developed by Samsonovich and
Ascoli in previous works. This paper presents a weak semantic map constructed using this
technique, based on the Microsoft Word Russian Thesaurus. General characteristics of the map
are described. The new map is compared to the maps reported earlier, that were constructed
using the Microsoft Word English, French, German and Spanish thesauri. Differences and
similarities are noted that may be due to differences between languages or between particular co

You might also like