Monocular Camera
Guilherme Kaname Reis Kano
guilherme.kano@tecnico.ulisboa.pt
Abstract
Railway tracks are one of the most important factors in the socio-economic development of a nation. Given this impact, railway monitoring should be treated as a high-priority task. In this work, two solutions for the automatic detection of obstacles on railway tracks, such as fallen trees or large rocks resulting from rockslides, are developed and tested: one based on computer vision and one based on deep learning. The computer vision-based algorithm employs image processing techniques such as edge detection and image segmentation using superpixels. The deep learning approach uses a pretrained network (transfer learning), which is modified and retrained on the target dataset. In this dissertation two types of obstacles are considered: direct and indirect obstacles. The first refers to elements directly obstructing the rails. The second refers to large obstacles located in the vicinity of the rails that are liable to collide with a passing train. Both algorithms operate on RGB images of railway tracks captured by a single monocular camera. Since no dataset depicting railway track obstacles is known to exist, one had to be created artificially to develop and test both methods. In a final phase, both algorithms were tested on a scale model composed of a webcam and model railway tracks. The best results were obtained with the deep learning algorithm.
Keywords: Railway Tracks, Obstacles, Automatic Detection, Computer Vision, Convolutional Neural
Networks
monitoring. Obstacle detection solutions often rely on LiDAR systems, which are used for monitoring level crossings, one of the locations where such incidents are most likely to occur.
To avoid high-cost sensors, which are often limited in effective range, solutions often turn to imaging sensors and image processing techniques [3].
Many researchers [3, 4, 5] have focused on the use of computer vision techniques, such as edge detectors, to extract the rail edges as the starting point for the detection of obstacles. In [5], the authors use a sliding-window technique in which each window encompasses a portion of the detected rail edges. At each window, the orientation and average intensity gradient of the rail edge are computed. If both the angle and the intensity differences between the edges of adjacent windows do not fall under a similarity threshold, the rail is said to be obstructed.
Other approaches, such as the one explored by Ukai et al. in [6], combine the information retrieved from different sensors. The authors developed an on-board monitoring device that employs both image recognition and high-resolution radar techniques to detect obstacles on the railway tracks.
In contrast to the available research on surface and railway track component maintenance using deep learning algorithms, it appears that no significant research is being conducted on the obstacle detection task using the same methods. While rail surface inspection allows for the collection of a vast amount of data, subsequently provided to the research community, the same does not extend to the obstacle detection field, as there is a significantly smaller number of registered observations, as mentioned by Nakhaee et al. in [7].
1.2. Objectives and Contributions
This thesis aims to develop, evaluate and compare computer vision and deep learning approaches for the automatic detection of obstacles on railway tracks, while being limited to the analysis of RGB images of railway tracks captured by a single monocular camera.
The proposed detection algorithm operates on images of railway tracks taken by a camera mounted on an inspection vehicle, whose orientation is kept approximately constant over the inspection course. This constraint dictates the normal operating conditions of the detection algorithm: the relative position and orientation of the railway tracks within the captured image, with the vertical bisector of the image approximately aligned with the middle of the rails.
Two types of obstacles are considered: direct obstacles and indirect obstacles. Direct obstacles are objects obstructing the railway rails. Indirect obstacles are objects that, although not obstructing the rails, are located in their vicinity and are liable to collide with a passing train.
The main contributions of this work are, first, the creation and collection of a novel dataset of railway track images with and without obstacles that can be used by the academic community. Secondly, the algorithm developed and described in Section 5 proposes an alternative to rail segmentation based on the detection of straight lines with the Hough transform. This topic is further addressed by Kano et al. in [8].
Finally, this work discusses the main advantages and drawbacks of adopting computer vision and deep learning for automatic obstacle detection on railway tracks.
2. Theoretical Background: Computer Vision
2.1. Linear Image Filtering
Linear image filtering modifies an image and can be used for feature detection, such as finding edges, and for noise suppression. The target pixel in the image $I$ at location $(u, v)$ is updated according to
\[ g(u, v) = \sum_{k,l} w(k, l)\, I(u + k, v + l) \tag{1} \]
with $(u, v)$ and $(k, l)$ being the image and the mask indices, respectively.
The Sobel operator is a common edge detector, defined by its horizontal and vertical components $G_x$ and $G_y$:
\[ G_x = \frac{1}{4}\begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}, \qquad G_y = \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \tag{2} \]
Alternatively, noise reduction can be achieved using a discrete approximation of the 2-D Gaussian function, defined by an $N \times N$ matrix and the standard deviation of the Gaussian function, $\sigma$, which controls the filter strength.
2.2. Image Segmentation
Image segmentation can be performed based on discontinuity, using edge detection algorithms such as the Sobel operator and the Canny edge detector, or based on similarity of intensity values, as in image oversegmentation using superpixels.
Superpixels are clusters of pixels that share perceptually similar features such as color or grayscale intensity levels. SLIC decomposes the image into several clusters using a 5-D distance definition that accounts for both the color similarity in the L*a*b* color space and the 2-D Euclidean distance.
2.3. Region-Based Descriptors
Region-based descriptors translate shape properties of a given region into a numeric feature. Major and
minor axis correspond to the maximum and minimum distances between the outermost pixels in a given region. These descriptors can be derived from other shapes, such as ellipses, that share the same second central moment as the considered binary object. Additionally, the orientation of the shape, given by the angle between the major axis and the horizontal axis, can also be computed using this ellipse (Fig. 1).
Figure 1: Major and minor axis and orientation descriptors of the region (left) can be obtained from the ellipse with the same second central moment (right).
The Euler number ($E$) of a binary image is the difference between the number of connected elements ($C$) and the number of holes ($H$):
\[ E = C - H \tag{3} \]
3. Theoretical Background: Deep Learning
3.1. Learning and Computer Vision
Deep learning has been a popular strategy in many scientific applications such as medical imaging diagnosis [9].
Having proven effective at solving many computer vision tasks, deep learning requires a large dataset of examples to train the models. In this respect, one of the most important contributions is ImageNet, an image database containing over 14 million images, on which many state-of-the-art visual recognition models have been trained. Many of these models have been used for transfer learning (TL), which consists of taking a model that has already been trained on a large-scale dataset and retraining it in a different domain.
4. Railway Dataset
The railway dataset is composed of railway track images with and without obstacles.
The No Obstacle dataset was assembled from two sources. The first source is a compilation of several images obtained from Google TrainView and retrieved by Pinho and Santos in [10]. The second source was compiled from a collection of YouTube videos from which frames were extracted.
Since there is no large dataset of railway track images with obstacles, the Obstacle dataset had to be artificially generated.
The Obstacle dataset images were generated following two different approaches. The first approach uses the algorithm developed by Pinho and Santos in [10], which automatically generates obstacle instances by placing obstacle images (.png images of trees and rocks) in the original image.
Alternatively, in the second method, the obstacle instances were manually generated using image manipulation software (GIMP), by outlining specific contours in the target image that could represent obstacles in a real scenario. The obstacle instances were generated by placing the contours so as to simulate rock slides or landslides (Fig. 2).
Figure 2: Process of obtaining an obstacle instance with GIMP. (a) Target image. (b) Contour outlining. (c) Obstacle instances.
Table 1 describes the number of images in the complete dataset (no obstacles and obstacles).
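The first dataset-generation approach of Section 4 places .png cut-outs of trees and rocks onto images of clear tracks. The algorithm of Pinho and Santos [10] is not described in this text, so the following is only a minimal sketch of the general idea, assuming the cut-out carries an alpha channel and that its position is chosen by the caller. The function name `paste_obstacle` and its parameters are hypothetical.

```python
import numpy as np


def paste_obstacle(track, obstacle_rgba, top, left):
    """Alpha-blend an RGBA obstacle cut-out onto an RGB track image.

    track:         (H, W, 3) uint8 image without obstacles.
    obstacle_rgba: (h, w, 4) uint8 cut-out; channel 3 is the alpha mask.
    top, left:     pixel position of the cut-out's upper-left corner
                   (assumed to be chosen by the caller).
    Returns a new (H, W, 3) uint8 image; the input is not modified.
    """
    h, w = obstacle_rgba.shape[:2]
    if top < 0 or left < 0 or top + h > track.shape[0] or left + w > track.shape[1]:
        raise ValueError("obstacle does not fit inside the track image")

    out = track.astype(np.float64)              # working copy in float
    region = out[top:top + h, left:left + w]    # view into the copy
    alpha = obstacle_rgba[..., 3:4] / 255.0     # (h, w, 1), values in [0, 1]
    region[:] = alpha * obstacle_rgba[..., :3] + (1.0 - alpha) * region
    return np.rint(out).astype(np.uint8)


# Tiny synthetic example: a grey "track" and a 2x2 cut-out with one opaque pixel.
track = np.full((4, 4, 3), 100, dtype=np.uint8)
rock = np.zeros((2, 2, 4), dtype=np.uint8)
rock[0, 0] = (200, 200, 200, 255)               # opaque light pixel
composite = paste_obstacle(track, rock, top=1, left=1)
print(composite[1, 1], composite[1, 2])         # [200 200 200] [100 100 100]
```

Fully transparent pixels of the cut-out leave the track image unchanged, so only the obstacle's silhouette is pasted.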
followed by a Gaussian filter is applied to diminish the presence of high-frequency details that could later be detected during the edge detection phase. Then, a Region of Interest (ROI) is defined to limit the search area on which the edge detection is performed.
The ROI is obtained by defining a binary mask, with the desired region represented by foreground pixels. It is obtained by first considering a sub-region A whose boundaries are defined by:
\[ \begin{cases} y < 0.6 \cdot \text{heightImage} \\ x < 0.5 \cdot \text{widthImage} \\ y < 2x \end{cases} \tag{4} \]
where $y = 2x$ is the equation of the line defined by the points $p_1 = (0, 0)$ and $p_2 = (0.3 \cdot \text{widthImage},\ 0.6 \cdot \text{heightImage})$ (Fig. 3). Then, it is mirrored around the image's vertical bisector to obtain the ROI.
\[ \theta_{\text{global}} = \theta_{\text{left rail}} \cup |\theta_{\text{right rail}}| = [61, 83]\ \text{deg} \tag{6} \]
Therefore, a BLOB is classified as a true rail BLOB if $|\theta_{\text{BLOB}}| \in \theta_{\text{global}}$.
Having obtained the partially reconstructed rail BLOBs, a closing operation using a square 50 × 50 structuring element (SE) is performed on each individual rail BLOB to obtain the final binary representation of the rails.
5.3. Obstacle Detection Phase
The obstacle detection phase is divided into two modules: the first is responsible for detecting the presence of direct obstacles obstructing the rails, and the second is only concerned with detecting large obstacles in the vicinity of the railway tracks.
Direct obstacles can be detected by computing the vertical distance VertDist of each rail. Two possible definitions were considered for this measurement: evaluating the major axis length and orientation of each rail (Fig. 4 middle), or assessing the height of the corresponding bounding box (Fig. 4 right).
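Two of the checks described above can be sketched in a few lines of NumPy: the orientation test of Eq. (6), and the bounding-box definition of VertDist. This is only an illustration under stated assumptions: the function names are hypothetical, the BLOB orientation is assumed to be already available in degrees, and the threshold used to decide that a rail is obstructed is not given in this text, so none is applied.

```python
import numpy as np

# Angle interval from Eq. (6), in degrees.
RAIL_ANGLE_RANGE_DEG = (61.0, 83.0)


def is_rail_orientation(theta_deg, angle_range=RAIL_ANGLE_RANGE_DEG):
    """True if |theta| lies inside the global rail-angle interval."""
    low, high = angle_range
    return low <= abs(theta_deg) <= high


def vert_dist_bbox(blob):
    """Bounding-box height (in pixels) of a binary BLOB; 0 if it is empty."""
    rows = np.flatnonzero(np.asarray(blob, dtype=bool).any(axis=1))
    if rows.size == 0:
        return 0
    return int(rows[-1] - rows[0] + 1)


# Synthetic rail mask: a vertical stroke covering rows 2..8 of a 10x6 image.
rail = np.zeros((10, 6), dtype=bool)
rail[2:9, 3] = True

# The same rail cut short, as if its upper part were hidden by an obstacle.
cut_rail = np.zeros((10, 6), dtype=bool)
cut_rail[6:9, 3] = True

print(vert_dist_bbox(rail), vert_dist_bbox(cut_rail))          # 7 3
print(is_rail_orientation(-70.0), is_rail_orientation(45.0))   # True False
```

A rail whose visible extent is shortened by an obstacle yields a smaller VertDist than an unobstructed one, which is what makes the measurement usable for detecting direct obstacles.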
Figure 5: No obstruction detected (left); right rail obstructed (right).
Figure 7: Separation of the surrounding area (second column) into the three regions (third to last column) using the lines $y_{\text{left}}$ and $y_{\text{right}}$ (red lines).
where $b_{\text{left}} = y_{\text{Cent,left}} - m_{\text{left}}\, x_{\text{Cent,left}}$ and $m_{\text{left}} = \tan(\theta_{\text{left}})$.
The same reasoning is applied to the right line.
The left region (Fig. 7 third image) is composed of the foreground pixels with coordinates $(x, y)$ that satisfy
\[ y > m_{\text{left}}\, x + b_{\text{left}} \tag{8} \]
the middle region (Fig. 7 fourth image) of the pixels that satisfy
\[ \begin{cases} y < m_{\text{left}}\, x + b_{\text{left}} \\ y < m_{\text{right}}\, x + b_{\text{right}} \end{cases} \tag{9} \]
and the right region (Fig. 7 fifth image) of the pixels that satisfy
\[ y > m_{\text{right}}\, x + b_{\text{right}} \tag{10} \]
Each individual binary image is subject to a closing and opening operation. The decision whether an
(caused by a large rock that does not divide a region but is large enough to be considered a threat), the corresponding Euler number is equal to 0. If a region is clear of any indirect obstacle, it is represented by a single foreground element with no holes and the corresponding Euler number is equal to 1. Therefore, the algorithm detects the presence of an indirect obstacle if there is at least one region with an Euler number different from 1.
Finally, the detection algorithm predicts the complete safety of the railway if, and only if, neither detection module detects any obstacle on the railway track.
6. Obstacle Detection - Deep Learning Algorithm (DetectDL)
6.1. Data Augmentation and Dataset Splitting
In order to have two classification classes with an equal number of images, the Obstacle class images are subject to data augmentation techniques to generate more cases.
The data augmentation routine used produces 3 new images from an existing one. The first image (Fig. 9 b) is obtained by flipping the original image (Fig. 9 a) around the vertical bisector axis. The second and the third (Fig. 9 c and d) are obtained by decreasing and increasing the image's contrast by up to 30%. Then one of these two images is also flipped around the vertical bisector axis, similarly to the first augmented image (Fig. 9 b).
Figure 9: Data augmentation routine. (a) Original image, (b) flipped image, (c) image with increased contrast and (d) flipped image with decreased contrast.
From the total of 1572 No Obstacle images, a proportion of 70%-15%-15% is selected for the training, validation and test sets, corresponding to 1100, 236 and 236 No Obstacle images, respectively. From the 747 distinct Obstacle images, 236 are selected for the test set. To avoid having images in the validation set that were generated by data augmentation from training examples, 236 of the remaining 511 are randomly selected as the validation set. The remaining 275 are subject to data augmentation, accounting for a total of 1100 Obstacle images to be used as the training set.
6.2. Hyperparameter Selection
In order to estimate the best model configuration, several hyperparameter combinations were tested using a grid search algorithm by varying (1) the number of frozen Inception modules, (2) the initial learning rate (ILR) and respective learning schedule, and (3) the dropout probability (p). Regarding the number of frozen Inception modules, the terminology adopted is the following: base-
of 0.1 every 2 training epochs and a dropout value p = 0.4. The corresponding performance curves for the accuracy (%) and loss (cross-entropy) for both training and validation are shown in Fig. 10.
[Figure 10 plots: "Baseline Hyperparameter configuration vs. Best Hyperparameter configuration"; accuracy (%) and loss (cross-entropy) versus # iterations, with training and validation curves for each configuration.]
Figure 10: Performance curves obtained from the baseline hyperparameter combination (blue) and best hyperparameter combination estimation (orange).
7. Results
7.1. DetectCV
The confusion matrix for the DetectCV algorithm on the test set is shown in Table 2. DetectCV has a reasonable capacity to correctly detect obstacles (Sensitivity = 95.3%), and about 68.0% of the images of railway tracks classified as having an obstacle are correctly classified. DetectCV correctly identifies 130 of the total of 236 images as free of any obstacle (Specificity = 55.1%). Overall, the algorithm assigns the correct label (Obstacle or No Obstacle) in 75.2% of the cases.
Table 2: Confusion matrix for the DetectCV algorithm applied to a 472-image test set.
Ground truth
Positive Negative
Pred.
able to detect the rail edges accurately (Fig. 13
(b)).
Positive 229 13
Negative 7 223
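The metrics quoted in the Results section (sensitivity, specificity, the share of positive predictions that are correct, and accuracy) all follow from the four cells of a confusion matrix. The sketch below computes them for the counts that survive in the two rows above. Their attribution is an assumption: the 13 in the second column matches the 13 false positives reported for DetectDL, so the rows are read here as DetectDL predictions (rows) against ground truth (columns). The function name is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # detected obstacles / all obstacles
        "specificity": tn / (tn + fp),            # clear tracks kept clear / all clear tracks
        "precision": tp / (tp + fp),              # correct "Obstacle" predictions
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts read from the two rows above (assumed layout: rows = prediction,
# columns = ground truth, positive class = Obstacle).
metrics = confusion_metrics(tp=229, fp=13, fn=7, tn=223)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity: 97.0%
# specificity: 94.5%
# precision: 94.6%
# accuracy: 95.8%

# Consistency check against a figure stated for DetectCV in Section 7.1:
# 130 of 236 obstacle-free images correctly identified.
print(f"{100 * 130 / 236:.1f}%")   # 55.1%
```

The last line reproduces the 55.1% specificity reported for DetectCV, which also shows that the counts above cannot be the DetectCV matrix.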
Figure 14: Ground truth: Obstacle. Top row - true positive. Bottom row - false negative. DetectDL class scores shown on the panels: (a) Predicted: Obstacle (Obstacle 1, No obstacle 7.2976e-12). (b) Predicted: Obstacle (Obstacle 1, No obstacle 1.4727e-14). (c) Predicted: No Obstacle (No obstacle 0.61295, Obstacle 0.38705). (d) Predicted: No Obstacle (No obstacle 0.86207, Obstacle 0.13793).
Fig. 15 (c), might not be clear, a more coherent justification can be given for the case depicted in Fig. 15 (d). The level crossing platform creates an illusion of a rail discontinuity, misleading the model's prediction. Of the total of 13 false positives, 5 are related to these infrastructures.
7.3. DetectCV vs DetectDL - Specific cases
Comparing both detection algorithms on low-quality images (high blurring effect), DetectDL performs better than DetectCV, whose performance depends on how well the rail edges are detected, which can be hard on low-quality images.
Fig. 16 shows two examples of low-quality railway track images and the respective output of the computer vision and the deep learning algorithms.
Figure 16: Comparison of DetectCV (left) and DetectDL (right) when subject to low-quality images. DetectCV failed in the classification, while DetectDL succeeded. DetectDL class scores shown on the panels: No obstacle 0.96019, Obstacle 0.039814; and No obstacle 0.99458, Obstacle 0.0054182.
DetectCV fails when the shadow causes a discontinuity in the rail edges (Fig. 17 top row). However, the algorithm performs well in cases where the shadow only changes the rail's surface color, keeping the rail edges unbroken (Fig. 17 middle and bottom row).
DetectDL, in turn, wrongly detects an obstacle in Fig. 17 (middle), where the shadow projection leads to a drastic change in the rail's surface color. In the other images (Fig. 17 top and bottom), the rails retain a uniform surface color even after being covered by the shadow and therefore do not mislead the detection algorithm.
7.4. Scale Model
Both algorithms were tested using a scale model, comprising a 1/87 scale track model and a commercial webcam HP 2300HD (Fig. 18).
detection range achieved was 23.0 cm, measured in the model.
Fig. 20 shows some frames processed by the deep learning algorithm (DetectDL) using the scale model. In the first frame (Fig. 20 (a)), the model does not yet detect the obstacle. In frames 2 and 3 (Fig. 20 (b) and (c)), the model was located in the same position. At this location, the model's prediction oscillated between No Obstacle and Obstacle. For the No Obstacle class, the respective CAM completely ignores (by representing with a darker shade in the heat map) the discriminative location within the image where the obstacle stands. Conversely, the CAM for the Obstacle class fully encloses the obstacle that was ignored by the CAM represented in Fig. 20 (c).
Finally, in the fourth frame, the model correctly detects the obstacle (white rock) (Fig. 20 (d)).
captured by a single monocular camera.
To address this topic, for which no dataset of images of railway tracks containing obstacles is known to exist, the first effort was focused on the creation of a novel dataset from existing images of railway tracks without obstacles, retrieved from multiple sources.
Regarding the computer vision algorithm DetectCV, some remarks can be drawn which fundamentally represent the paradigm of any computer vision system: the importance of establishing controlled working conditions for the algorithms to operate in. In this work, the working conditions were defined from the beginning, by considering only images with similar orientation and by defining a ROI. However, it was later verified that the dataset included some blurred images on which the algorithm had difficulty performing its task. Moreover, the selection of parameters in many computer vision systems is often based on heuristics, with no definitive guarantee that one solution fits every case. Nevertheless, the algorithm showed a good detection ability.
A deep learning approach was also explored. Deep learning models are data-driven models that require minimal human supervision, at the cost of requiring a significant amount of data to extract features from raw data. As such, their performance depends heavily on the data available for training and less on human expertise, contrary to computer vision algorithms. Despite the immense effort dedicated to assembling the dataset necessary for this task, the DetectDL algorithm would benefit from more images. Nevertheless, with the use of transfer learning, the algorithm was able to achieve even better results than the computer vision-based detection algorithm.
On a finishing note, all the objectives defined in this work were fulfilled. Nevertheless, some recommendations are provided.
The first suggestion is the use of supervised object detection algorithms, such as Faster R-CNN. Secondly, the deep learning network could be trained with images restricted to the ROI, instead of using the whole image (railway tracks + background). If saved in memory, the images must be saved in a lossless format such as Portable Network Graphics (.png) to avoid the presence of compression noise. Finally, since there is a lack of sufficient data dedicated to railway track inspection research, the expansion of the dataset is work worth exploring.
References
[1] D. Soukup and R. Huber-Mörk, "Convolutional neural networks for steel surface defect detection from photometric stereo images," in International Symposium on Visual Computing, pp. 668–677, Springer, 2014.
[2] Y. Santur, M. Karaköse, and E. Akin, "An adaptive fault diagnosis approach using pipeline implementation for railway inspection," Turkish Journal of Electrical Engineering & Computer Sciences, vol. 26, no. 2, pp. 987–998, 2018.
[3] F. Maire and A. Bigdeli, "Obstacle-free range determination for rail track maintenance vehicles," in 2010 11th International Conference on Control Automation Robotics & Vision, pp. 2172–2178, IEEE, 2010.
[4] L. F. Rodriguez, J. A. Uribe, and J. V. Bonilla, "Obstacle detection over rails using Hough transform," in 2012 XVII Symposium of Image, Signal Processing, and Artificial Vision (STSIVA), pp. 317–322, IEEE, 2012.
[5] H. Wang, X. Zhang, L. Damiani, P. Giribone, R. Revetria, and G. Ronchetti, "Video analysis for improving transportation safety: obstacles and collision detection applied to railways and roads," in Proceedings of the International MultiConference of Engineers and Computer Scientists 2017, pp. 909–915, 2017.
[6] M. Ukai, B. Nassu, N. Nagamine, M. Watanabe, and T. Inaba, "Obstacle detection on railway track by fusing radar and image sensor," in 9th World Congress on Railway Research (WCRR-2011), Lille, France, 2011.
[7] M. C. Nakhaee, D. Hiemstra, M. Stoelinga, and M. van Noort, "The recent applications of machine learning in rail track maintenance: A survey," in International Conference on Reliability, Safety, and Security of Railway Systems, pp. 91–105, Springer, 2019.
[8] G. Kano, T. Andrade, and A. Moutinho, "Automatic detection of obstacles in railway tracks using monocular camera," in Computer Vision Systems (D. Tzovaras, D. Giakoumis, M. Vincze, and A. Argyros, eds.), 11754, Springer International Publishing, 2019.
[9] S. Pereira, A. Pinto, V. Alves, and C. A. Silva, "Brain tumor segmentation using convolutional neural networks in MRI images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1240–1251, 2016.
[10] J. Pinho and J. Santos, "Technical report. Automatic detection of anomalies in railway tracks. Instituto Superior Técnico, Universidade de Lisboa." Project developed in Computer Vision lecture.