Smart Guiding Glasses for Visually Impaired People in Indoor Environment

IEEE Transactions on Consumer Electronics, Vol. 63, No. 3, August 2017
Abstract—To overcome the travelling difficulties of the visually impaired, this paper presents a novel ETA (Electronic Travel Aid): a smart guiding device in the shape of a pair of eyeglasses that guides these people efficiently and safely. Different from existing works, a novel multi-sensor fusion based obstacle avoiding algorithm is proposed, which utilizes both a depth sensor and an ultrasonic sensor to solve the problems of detecting small obstacles and transparent obstacles, e.g. French doors. For totally blind people, three kinds of auditory cues were developed to indicate the direction in which they can go ahead; for weak-sighted people, a visual enhancement that leverages the AR (Augmented Reality) technique and integrates the traversable direction is adopted. A prototype consisting of a pair of display glasses and several low-cost sensors was developed, and its efficiency and accuracy were tested by a number of users. The experimental results show that the smart guiding glasses can effectively improve the user's travelling experience in complicated indoor environments, so the device can serve as a consumer product that helps visually impaired people travel safely.

Index Terms—AR, depth sensor, ETA, sensor fusion, vision enhancement

Manuscript received July 1, 2017; accepted August 30, 2017. Date of publication September 5, 2017. This work was supported by CloudMinds Technologies Inc. (Corresponding author: J. Bai.)
Jinqiang Bai is with Beihang University, Beijing, 100083, China (e-mail: baijinqiang@buaa.edu.cn).
Shiguo Lian, Zhaoxiang Liu, and Kai Wang are with the AI Department, CloudMinds Technologies Inc., Beijing, 100102, China (e-mail: {scott.lian, robin.liu, kai.wang}@cloudminds.com).
Dijun Liu is with DT-LinkTech Inc., Beijing, 100083, China (e-mail: liudijun@datang.com).
Digital Object Identifier 10.1109/TCE.2017.014980

I. INTRODUCTION

ACCORDING to official statistics from the World Health Organization, there were about 285 million visually impaired persons in the world as of 2011: about 39 million completely blind and 246 million with weak sight [1]. This number will increase rapidly as the baby boomer generation ages [2]. Visually impaired people have great difficulty in perceiving and interacting with their surroundings, especially unfamiliar ones. Fortunately, some navigation systems and tools are available for visually impaired individuals. Traditionally, most rely on the white cane for local navigation, constantly swaying it in front of them to detect obstacles [3]. However, the white cane cannot adequately convey all the necessary information, such as volume or distance [4]. By comparison, an ETA (Electronic Travel Aid) can provide more information about the surroundings by integrating multiple electronic sensors, and ETAs have proved effective in improving the visually impaired person's daily life [4]; the device presented in this work belongs to this category.

RGB-D (Red, Green, Blue and Depth) sensor based ETAs [5], [6] can detect obstacles more easily and precisely than schemes based on other sensors (e.g. ultrasonic sensors or mono-cameras). However, the depth sensor has a limited working range when measuring obstacle distance and does not work well in the face of transparent objects such as glass, French windows, and French doors. To overcome this limitation, a multi-sensor fusion based obstacle avoiding algorithm, which utilizes both the depth sensor and the ultrasonic sensor, is proposed in this work.

Totally blind people can be informed through auditory and/or tactile feedback [7]. Tactile feedback does not occupy the auditory sense, which is the most important perceptual input source; however, it has the drawbacks of high power consumption and large size, which make it unsuitable for a wearable ETA like the glasses proposed in this work. Thus, sound or synthetic voice is the option adopted here for totally blind users. Some sound feedback based ETAs map the processed RGB image and/or depth image to acoustic patterns [8] or semantic speech [9] to help blind users perceive the surroundings. But the users still need to interpret the feedback sound and decide by themselves where to go, so such systems can hardly ensure that the right decision is made from the feedback sound alone. To address this problem, three kinds of auditory cues, converted from the traversable direction (produced by the multi-sensor fusion based obstacle avoiding algorithm), were developed in this paper for directly guiding the user where to go.

Since weak-sighted people retain some degree of visual perception, and vision can provide more information than other senses such as touch and hearing, a visual enhancement is proposed to help these users avoid obstacles: it uses the popular AR (Augmented Reality) technique [10], [11] to display the surroundings and the feasible direction on the eyeglasses.

The rest of the paper is organized as follows. Section II reviews the related work on guiding visually impaired people. The proposed smart guiding glasses are presented in Section III. Section IV shows some experimental results and demonstrates the effectiveness and robustness of the proposed system. Finally, conclusions are drawn in Section V.
II. RELATED WORK

As this work focuses on obstacle avoidance and guiding information feedback, the related work in these two fields is reviewed in this section.

A. Obstacle Avoidance

There exists a vast literature on obstacle detection and avoidance. According to the sensor type, obstacle avoidance methods can be categorized as ultrasonic sensor based [12], laser scanner based [13], and camera based [5], [6], [14]. Ultrasonic sensor based methods can measure the distance of an obstacle and compare it with a given distance threshold to decide whether to go ahead, but they cannot determine the exact direction in which to go forward, and may suffer from interference between the sensors themselves if an ultrasonic radar (ultrasonic sensor array) is used, or from other signals in the indoor environment. Although laser scanner based methods are widely used in mobile robot navigation for their high precision and resolution, the laser scanner is expensive, heavy, and power-hungry, so it is not suitable for a wearable navigation system. As for camera based methods, there are many approaches built on different cameras, such as mono-cameras, stereo-cameras, and RGB-D cameras. Based on the mono-camera, some methods process the RGB image to detect obstacles by, e.g., floor segmentation [15], [16] or deformable grid based obstacle detection [8]. However, these methods are too computationally expensive for real-time applications and can hardly measure the distance of the obstacle. To measure the distance, some stereo-camera based methods have been proposed. For example, the method in [17] uses local window based matching algorithms to estimate the distance of obstacles, and the method in [18] uses a genetic algorithm to generate dense disparity maps from which the distance of obstacles can also be estimated. However, these methods fail under low-texture or low-light scenarios and therefore cannot ensure secure navigation. Recently, RGB-D cameras have been widely used in many applications [5], [14], [19]-[21] for their low cost, good miniaturization, and ability to provide rich information. RGB-D cameras provide both dense range information from active sensing and color information from a passive sensor such as a standard camera. The RGB-D camera based method in [5] combines range information with color information to extend floor segmentation to the entire scene for detecting obstacles in detail. The one in [14] builds a 3D (three-dimensional) voxel map of the environment and analyzes 3D traversability for obstacle avoidance. But these methods are constrained to scenes with non-transparent objects due to the imperfection of the depth camera.

B. Guiding Information Feedback

There are three main techniques for providing guiding information to visually impaired people [22]: haptic, audio, and visual. Haptic feedback based systems often use vibrators on a belt [7], a helmet [23], or in a backpack [14]. Although they interfere far less with sensing the environment, they can hardly represent complicated information and require more training and concentration. Audio feedback based systems utilize acoustic patterns [8], [9], semantic speech [24], sounds of different intensities [25], or spatially localized auditory cues [26]. The methods in [8], [9] directly map the processed RGB image to acoustic patterns to help blind users perceive the surroundings. The method in [24] maps the depth image to semantic speech telling blind users about the obstacles. The method in [25] maps the depth image to sounds of different intensities to represent obstacles at different distances. The method in [26] maps the depth image to spatially localized auditory cues expressing the 3D information of the surroundings. However, users may misunderstand these auditory cues in noisy or complicated environments. Visual feedback based systems can be used by partially sighted individuals because they can provide more detailed information than haptic or audio feedback based systems. The method in [27] maps the distance of the obstacle to the brightness of an LED (Light Emitting Diode) display as a visual enhancement to help users notice the obstacle more easily. But the LED display can only show large obstacles due to its low resolution.

In this paper, a novel multi-sensor fusion based obstacle avoiding algorithm is proposed to overcome the above limitations; it utilizes both the depth sensor and the ultrasonic sensor to find the optimal traversable direction. The output traversable direction is then converted into three kinds of auditory cues, so that the most suitable one can be selected in different scenarios, and is also integrated into the AR technique based visual enhancement for guiding visually impaired people.

III. THE PROPOSED FRAMEWORK

A. The Hardware System

The proposed system includes a depth camera for acquiring the depth information of the surroundings; an ultrasonic rangefinder, consisting of an ultrasonic sensor and an MCU (Microprogrammed Control Unit), for measuring the obstacle distance; an embedded CPU (Central Processing Unit) board acting as the main processing module, which performs depth image processing, data fusion, AR rendering, guiding sound synthesis, etc.; a pair of AR glasses to display the visual enhancement information; and an earphone to play the guiding sound. The hardware configuration of the proposed system is illustrated in Fig. 1, and the initial prototype of the system is shown in Fig. 2.

Fig. 1. The hardware configuration of the proposed system.
Fig. 3. Depth information acquisition.

2) Ultrasonic Rangefinder
In this work, the ultrasonic sensor is mounted on the glasses. The sensor's transmitter emits ultrasonic pulses at 40 kHz; an object reflects the ultrasound wave, and the sensor's receiver picks up the reflected wave. The distance of the object can then be obtained from the time interval between sending the wave and receiving the reflection. As shown in Fig. 4, the Trig pin of the sensor must receive a high (5 V) pulse of at least 10 µs to start a measurement, which triggers the sensor to transmit 8 cycles of ultrasonic burst at 40 kHz and wait for the reflected burst. Once the 8-cycle burst has been sent, the Echo pin of the sensor is set high; when the reflected burst is received, the Echo pin is set low, producing a pulse at the Echo pin whose width is the round-trip travel time. If no reflected burst is received within 30 ms, the Echo pin stays high, and the distance is set to a very large value to represent that there is no obstacle.
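The timing just described maps directly onto a short measurement routine. The following is a minimal sketch, not the actual MCU firmware: the gpio_write/gpio_read helpers and the pin handles are assumed stand-ins for the real GPIO interface.

```python
import time

SPEED_OF_SOUND = 343.0    # m/s in air at room temperature
NO_ECHO_TIMEOUT = 0.030   # 30 ms, as described above

def measure_distance(gpio_write, gpio_read, trig_pin, echo_pin):
    """Return the obstacle distance in metres, or infinity when no echo
    arrives (the system then treats the path as free of obstacles).

    gpio_write/gpio_read and the pin handles are hypothetical stand-ins
    for the MCU's real GPIO interface.
    """
    # A >=10 us high pulse on Trig starts one measurement: the sensor sends
    # an 8-cycle burst at 40 kHz and listens for the reflection.
    gpio_write(trig_pin, 1)
    time.sleep(20e-6)
    gpio_write(trig_pin, 0)

    # Echo goes high once the burst has been sent.
    t0 = time.monotonic()
    while gpio_read(echo_pin) == 0:
        if time.monotonic() - t0 > NO_ECHO_TIMEOUT:
            return float("inf")
    rise = time.monotonic()

    # Echo falls when the reflected burst returns; the pulse width is the
    # round-trip time of flight.
    while gpio_read(echo_pin) == 1:
        if time.monotonic() - rise > NO_ECHO_TIMEOUT:
            return float("inf")       # no reflection within 30 ms
    fall = time.monotonic()

    return (fall - rise) * SPEED_OF_SOUND / 2.0
```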
B. The Steps
The overall algorithm is depicted in Fig. 5. The depth image acquired from the depth camera is processed by the depth-based way-finding algorithm, which outputs several candidate moving directions. The multi-sensor fusion based obstacle avoiding algorithm then uses the ultrasonic measurement data to select an optimal moving direction from the candidates. The AR rendering utilizes one depth image to generate and render the binocular images, together with the moving direction, to guide the user efficiently. The guiding sound synthesis takes the moving direction as input to produce the auditory cue for guiding totally blind people. Three kinds of auditory cues were developed and tested to allow the selection of the most suitable one in different scenarios.

Fig. 5. Diagram of the proposed system.
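To make the data flow of Fig. 5 concrete, the sketch below strings the modules together in one loop. The device objects and helper functions (sketched in the following subsections) are illustrative assumptions, not the system's actual API.

```python
def guiding_loop(depth_camera, rangefinder, ar_glasses, earphone,
                 u0, f, w, eps, delta):
    """One iteration of the Fig. 5 pipeline (all names are illustrative)."""
    depth = depth_camera.read()              # depth image acquisition
    row = depth[-1]                          # bottom image row (Fig. 6)
    angles, widths = way_finding(row, u0, f, w, eps, delta)  # A(alpha)
    alpha_opt = optimal_direction(angles, widths)            # eq. (9)
    alpha = fuse_direction(alpha_opt, rangefinder.distance(), delta)  # eq. (10)
    ar_glasses.render(depth, alpha)          # binocular AR images with cue
    earphone.play(beep_stereo(alpha))        # one of the three auditory cues
```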
1) Depth-based Way-finding
The depth-based way-finding algorithm finds candidate traversable directions based on the depth image. Different from floor-segmentation based way-finding methods, it only uses a region of interest to determine the traversable directions. Since the nearest obstacle always appears at the bottom of the depth image, the algorithm only selects a line at the bottom of the image as input, as shown in Fig. 6. Considering that the user walks slowly and gradually, this suffices to detect obstacles in time.

Fig. 6. The used depth image. The blue line represents the input of the depth-based way-finding algorithm.

The depth is relative to the camera, i.e. it is expressed in the camera coordinate system. As shown in Fig. 7, $O_c$ is the origin of the camera coordinate system $(X_c, Y_c, Z_c)$, i.e. the center of projection. $O$ is the origin of the image coordinate system $(u, v)$ in pixels. $O_I(u_0, v_0)$ is the principal point, i.e. the origin of the image coordinate system $(x, y)$ in millimeters. The distance from $O_c$ to the image plane is the focal length $f$. A 3D point in camera coordinates $N(x_1, y_1, z)$ is mapped onto the image plane $I$ at the intersection $n(u_1, v_1)$ of the ray connecting the 3D point $N$ with the center of projection $O_c$.

Fig. 7. Coordinate system transformation.

The depth-based way-finding algorithm uses the traversable width threshold $w$ and an adaptive sliding window to determine the candidate moving directions. The sliding window size is $1 \times D(z)$, where $D(z)$ is the adaptive width depending on the depth $z$. Every sliding step is computed as follows.

First, compute the corresponding 3D point of a given point in the depth image. As shown in Fig. 7, for a point $n$ in the depth image, $u_1$, $v_1$, and $z$ are known. Using the law of similar triangles, the 3D point $N(x_1, y_1, z)$ can be calculated by:

$$\begin{pmatrix} x_1 \\ y_1 \\ z \end{pmatrix} = \frac{z}{f} \begin{pmatrix} u_1 - u_0 \\ v_1 - v_0 \\ f \end{pmatrix} \tag{2}$$

Second, compute the sliding window width $D(z)$ in the image. According to the traversable threshold $w$, the 3D boundary point $M(x_2, y_2, z)$ of the traversable region can be obtained by:

$$\begin{pmatrix} x_2 \\ y_2 \\ z \end{pmatrix} = \begin{pmatrix} x_1 + w \\ y_1 \\ z \end{pmatrix} \tag{3}$$

Using the law of similar triangles again, the 2D projection $m$ of the 3D point $M$ on the depth image can be computed by:

$$\begin{pmatrix} u_2 \\ v_2 \\ 1 \end{pmatrix} = \frac{1}{z} \begin{pmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_2 \\ y_2 \\ z \end{pmatrix} \tag{4}$$

Substituting (2) and (3) into (4), we obtain:

$$\begin{pmatrix} u_2 \\ v_2 \end{pmatrix} = \begin{pmatrix} u_1 \\ v_1 \end{pmatrix} + \begin{pmatrix} \frac{fw}{z} \\ 0 \end{pmatrix} \tag{5}$$

Then the width $D(z)$ of the adaptive sliding window can be expressed as:

$$D(z) = u_2 - u_1 = \frac{fw}{z} \tag{6}$$

Third, judge whether the region between points $n$ and $m$ in the depth image is traversable. This is determined by:

$$\mathbb{1}_x(z) = \begin{cases} 1, & \text{if } \forall x_{0:4} \in \{x \mid z_x \in [z - \epsilon, z + \epsilon],\ z_x \geq \delta\} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where $x$ is a point in the depth image between $n$ and $m$, $x_{0:4}$ represents five continuous points, $z_x$ is the depth of the point $x$, $\epsilon$ is the measurement noise (set as a fixed value), and $\delta$ is the distance threshold.

If every five continuous points between $n$ and $m$ lie in the range $[z - \epsilon, z + \epsilon]$, and the depths of the five points exceed the distance threshold $\delta$ (so the obstacle can be avoided in time and safely), the region is considered traversable; otherwise, an obstacle is considered to be in this region, and the region is discarded.

Fourth, compute the steering angle $\alpha$. If $\mathbb{1}_x(z)$ in (7) is 1, i.e. the region is traversable, the steering angle can be calculated by:
$$\alpha = \arctan\left(\frac{u_1 + u_2 - 2u_0}{2f}\right) \tag{8}$$

If $\mathbb{1}_x(z)$ in (7) is 0, i.e. the region is not traversable, the steering angle is not calculated.

These four steps are repeated until all the input points have been traversed. The candidate direction set $A(\alpha)$, i.e. the set of steering angles $\alpha$, is then stored for later use.
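The four steps above condense into a short scan over the selected image row. The following is a simplified sketch: it tests every point inside each window rather than grouping points five at a time as in (7), it reports the threshold $w$ as the traversable width, and the parameter names are assumptions.

```python
import numpy as np

def way_finding(depth_row, u0, f, w, eps, delta):
    """Depth-based way-finding (Section III.B.1), simplified for illustration.

    depth_row: depths z (m) along the bottom image row (Fig. 6)
    u0: principal-point u-coordinate (px); f: focal length (px)
    w: traversable width threshold (m); eps: depth noise bound (m)
    delta: safety distance threshold (m)
    Returns candidate steering angles (deg) and a traversable width for each.
    """
    depth_row = np.asarray(depth_row, dtype=float)
    n = len(depth_row)
    angles, widths = [], []
    for u1 in range(n):
        z = depth_row[u1]
        if z <= 0:
            continue                          # invalid depth pixel
        u2 = u1 + int(round(f * w / z))       # eq. (6): D(z) = f*w/z pixels
        if u2 >= n:
            break
        win = depth_row[u1:u2 + 1]
        # eq. (7), simplified: every point stays within [z - eps, z + eps]
        # and beyond the safety threshold delta.
        if np.all(np.abs(win - z) <= eps) and np.all(win >= delta):
            # eq. (8): steering angle of the window centre.
            angles.append(np.degrees(np.arctan((u1 + u2 - 2 * u0) / (2.0 * f))))
            widths.append(w)  # lower bound; a fuller version would grow the window
    return angles, widths
```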
2) Multi-sensor Fusion Based Obstacle Avoiding
The depth camera projects an infrared laser pattern to measure distance, and the infrared laser passes through transparent objects, which produces incorrect measurement data. Thus, a multi-sensor fusion based method utilizing both the depth camera and the ultrasonic sensor is proposed to overcome this limitation of the depth camera. The algorithm's steps are as follows.

First, compute the optimal moving direction based on the depth image. The optimal moving direction can be obtained by minimizing a cost function, defined as:

$$\alpha_{opt} = \begin{cases} \underset{\alpha \in A(\alpha)}{\arg\min}\, f(\alpha) = \underset{\alpha \in A(\alpha)}{\arg\min} \left( \beta\,|\alpha| + \gamma\,\frac{1}{W(\alpha)} \right), & \text{if } A(\alpha) \neq \emptyset \\ \text{Null}, & \text{if } A(\alpha) = \emptyset \end{cases} \tag{9}$$

where $\alpha_{opt}$ is the optimal moving direction, $\alpha$ is the steering angle belonging to $A(\alpha)$ (see Section III.B.1), $W(\alpha)$ is the maximum traversable width centered on the direction $\alpha$, and $\beta$, $\gamma$ are the respective weights.

The function $f(\alpha)$ evaluates the cost of both the steering angle and the traversable region width. The smaller the steering angle, the faster the user can turn; the wider the traversable region, the safer it is. This cost function therefore ensures that the user moves effectively and safely.
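A minimal sketch of this selection follows, under our reading of (9) as $f(\alpha) = \beta|\alpha| + \gamma/W(\alpha)$; the weight values are illustrative, as the text does not specify them.

```python
import numpy as np

def optimal_direction(angles, widths, beta=1.0, gamma=1.0):
    """Eq. (9): pick the angle minimizing f(alpha) = beta*|alpha| + gamma/W(alpha).

    angles: candidate steering angles A(alpha) in degrees
    widths: traversable width W(alpha) (m) for each candidate
    beta, gamma: the two weights (default values are illustrative)
    Returns None for "Null" when A(alpha) is empty.
    """
    if not angles:
        return None
    costs = [beta * abs(a) + gamma / max(w, 1e-6)
             for a, w in zip(angles, widths)]
    return angles[int(np.argmin(costs))]
```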
Second, fuse the ultrasonic data to determine the final moving direction. Since the ultrasonic sensor can detect obstacles in the range of 0.03 m to 4.25 m within a 15° scanning field, the final moving direction is defined as:

$$\alpha'_{opt} = \begin{cases} \alpha_{opt}, & \text{if } (\alpha_{opt} \notin [-7.5°, 7.5°]) \lor (\alpha_{opt} \in [-7.5°, 7.5°] \wedge d \geq \delta) \\ \text{Null}, & \text{otherwise} \end{cases} \tag{10}$$

where $\alpha'_{opt}$ is the final moving direction, $\alpha_{opt}$ is as in (9), $d$ is the distance measured by the ultrasonic sensor, and $\delta$ is the same threshold as in (7).

This can be explained as follows. First, the algorithm judges whether the optimal moving direction from (9) lies within the view field of the ultrasonic sensor, i.e. [-7.5°, 7.5°]. If not, it directly outputs the optimal moving direction as in (9). If it does, the ultrasonic data are used to judge whether the measured distance exceeds the distance threshold $\delta$. If so, it also outputs the optimal moving direction as in (9); if not, it outputs Null, which means there is no moving direction. The workflow of this algorithm is shown in Fig. 8.

Fig. 8. The workflow of the proposed algorithm.

The resulting optimal moving directions are shown in Fig. 9, which shows that the multi-sensor fusion based method makes a correct decision in a transparent scenario, whereas the method using only the depth image does not.

Fig. 9. Results of the moving direction. (a) The input depth image. (b) The moving direction (the laurel-green region in the image) calculated by (9). (c) The moving direction (Null) calculated by (10).
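The decision rule of (10) and Fig. 8 reduces to a few lines. The sketch below mirrors the explanation above, with None standing in for Null.

```python
def fuse_direction(alpha_opt, d, delta):
    """Eq. (10): veto the depth-based direction with the ultrasonic reading.

    alpha_opt: optimal direction from eq. (9) in degrees, or None for Null
    d: ultrasonic distance (m); delta: distance threshold from eq. (7)
    """
    if alpha_opt is None:
        return None
    # Outside the ultrasonic field of view [-7.5, 7.5] deg, the ultrasonic
    # reading cannot contradict the depth result, so output it directly.
    if not (-7.5 <= alpha_opt <= 7.5):
        return alpha_opt
    # Inside the field of view, keep the direction only if the ultrasonic
    # distance also clears the threshold; this catches glass and other
    # transparent obstacles that the infrared depth sensing sees through.
    return alpha_opt if d >= delta else None
```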
3) AR Rendering with Guiding Cue
The visual enhancement, which adopts the AR technique, is used for weak-sighted people. To show the guiding cue to the user based on the single depth image, binocular parallax images need to be generated. This was realized in Unity3D [28] by adjusting the texture coordinates of the depth image. The rendered stereo images integrate the feasible direction (the circle in Fig. 10(a), (b)) for guiding the user. When the feasible direction is located inside the bounding box (the rectangular box in Fig. 10(a), (b)), the user can go forward (see (c) of the third row in Fig. 10). When the direction is outside the bounding box, the user should turn left (see the second row in Fig. 10) or right (see the last row in Fig. 10) according to the feasible direction until it lies within the bounding box. When the feasible direction is absent (see the first row in Fig. 10), there is no traversable way in the field of view, and the user should stop and turn left or right slowly, or even turn back, in order to find a traversable direction.

4) Guiding Sound Synthesis
For totally blind users, auditory cues are adopted in this work. The guiding sound synthesis module can produce three kinds of guiding signal: a stereo tone [26], recorded instructions, and beeps of different frequencies.

The first kind converts the feasible direction into a stereo tone. The stereo tone (see the loudspeaker in Fig. 11) sounds as if a person standing in the target direction were calling the user toward him. The second kind uses recorded speech to tell the user to turn left, turn right, or go forward. As shown in Fig. 11, the field of view is 60°; the middle region is 15°, and the two sides are divided equally. When an obstacle is in front of the user, the recorded speech will tell the user, for example, "Attention, obstacle in front of you, turn left 20 degrees". The recorded audio instructions are detailed in TABLE I.
The third kind converts the feasible direction into beeps of different frequencies; the beep frequency is proportional to the steering angle. When the user should turn left, the left channel of the earphone plays and the right one does not, and vice versa. When the user should go forward, the beep keeps silent. A minimal sketch of this third cue is given after TABLE I.

TABLE I
AUDIO INSTRUCTIONS

Condition | Audio instruction
Obstacle in front of the user, no feasible direction | Attention, obstacle in front of you, turn left or right slowly
Obstacle in front of the user, feasible direction on the left | Attention, obstacle in front of you, turn left xx^a degrees
Obstacle in front of the user, feasible direction on the right | Attention, obstacle in front of you, turn right xx^a degrees
Obstacle on the left of the user, feasible direction in front | Attention, obstacle in left of you, go straight
Obstacle on the right of the user, feasible direction in front | Attention, obstacle in right of you, go straight
Obstacle on the left of the user, feasible direction on the right | Attention, obstacle in left of you, turn right xx^a degrees
Obstacle on the right of the user, feasible direction on the left | Attention, obstacle in right of you, turn left xx^a degrees
No obstacle | Go straight

^a xx is the steering angle.
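The beep cue can be sketched as follows; the base frequency and frequency-per-degree constants are illustrative assumptions, as the text only states that the frequency is proportional to the steering angle.

```python
import numpy as np

def beep_stereo(alpha, sample_rate=44100, dur=0.2,
                base_freq=440.0, freq_per_deg=20.0):
    """Third auditory cue: a beep whose frequency grows with |alpha|, played
    on the earphone channel of the turn side; silence means go forward.

    alpha: steering angle (deg; negative = turn left), or None when there is
    no feasible direction. base_freq and freq_per_deg are illustrative.
    Returns an (N, 2) array of left/right channel samples.
    """
    n = int(sample_rate * dur)
    out = np.zeros((n, 2))
    if alpha is None or alpha == 0:
        return out                     # go forward: keep silent
    t = np.arange(n) / sample_rate
    tone = np.sin(2 * np.pi * (base_freq + freq_per_deg * abs(alpha)) * t)
    out[:, 0 if alpha < 0 else 1] = tone   # left channel for left turns
    return out
```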
Fig. 14. Accuracy under different transparent glass. (a) Home. (b) Office.

C. Computational Cost
The average computational time for each step of the proposed system was measured, and the results are shown in TABLE II. The depth image acquisition and the depth-based way-finding algorithm together take about 11 ms. The ultrasonic sensor measurement cost depends on the obstacle's distance.

TABLE II
COMPUTATIONAL TIME FOR EACH STEP OF THE PROPOSED ALGORITHM