
Improvement of vision-based drone detection and tracking by removing cluttered background, shadow and water reflection with super resolution

1st Don Daven Christopher Trapal, Mechanical Engineering, National University of Singapore, Singapore, e0176118@u.nus.edu
2nd Bryan Chia Chee Leong, Mechanical Engineering, National University of Singapore, Singapore, e0175400@u.nus.edu
3rd Haw Wen Ng, Mechanical Engineering, National University of Singapore, Singapore, e0196875@u.nus.edu
4th John Tan Guan Zhong, Mechanical Engineering, National University of Singapore, Singapore, e0176118@u.nus.edu
5th Sutthiphong Srigrarom, Mechanical Engineering, National University of Singapore, Singapore, spot.srigrarom@nus.edu.sg
6th Teng Hooi Chan, Institute of Flight Systems Dynamics, Technical University of Munich, Germany, Cth.405998@tum-asia.edu.sg

Abstract—Detection and tracking of small and fast-moving aerial targets, especially drones, has gained attention in recent years. This paper focuses on the vision-based technique using images taken from observing cameras. In real-life situations, the target drones may be hidden in cluttered backgrounds such as tree shadows or foliage, rows of buildings, and other kinds of scenery that obscure any clear indication of the drones. The object is further confused by the presence of its shadow and its reflection from water or glass walls. For vision-based object detection, the clarity and ambiguity of the target images in the video stream are the keys to effective and successful target detection and tracking. Here, we present an improvement obtained by mitigating the effects of cluttered background, shadow and water reflection on the target images. We applied these schemes to make the drone more visible and clearer. We also implemented super resolution to increase the image resolution for more precise detection and tracking. As a result, the target drone could be detected and tracked throughout the sample clips. The comparative tracking results using DCF are presented. Likewise, we applied the water reflection removal scheme to eliminate the reflection and avoid confusing the tracker. With this, only the correct drone targets were detected and tracked. Overall, the drones could be detected and tracked all the way, as long as they appeared in the camera scene.

Index Terms—vision-based detection, aerial target, drone detection, drone tracking, cluttered background removal, shadow removal, water reflection removal, super resolution.

I. INTRODUCTION

Drone detection has gained much attention over the years. Presently, concepts such as radar, acoustics, radio frequency identification and visual-based methods are being used for detecting drones. In this work, we focus on visual-based detection of small and fast-moving drones, as most of the other techniques have limited reliability in detection [1].

The effectiveness of visual-based detection depends largely on the quality of the input images or video stream. Operationally, there are several environmental factors that degrade the effectiveness of such a technique. Common problems are cluttered backgrounds such as tree shadows and foliage. Cluttered background and shadow have an adverse effect as they can hide or mask the target drones, so that any detection scheme might miss the target. Other problems are shadows and reflections on water or reflective surfaces such as glass. These shadows and reflections produce mirror images of the target drones, which might confuse the detector and give false detections of duplicated drones. The overall scenario is illustrated in figure 1.

There are vast numbers of articles and schemes about cluttered background, shadow and reflection elimination. For cluttered background, especially tree shadow and foliage, Yadav et al. [2] treated the problem as background subtraction and suggested using a Gaussian Mixture Model (GMM) to mitigate it. Zhang et al. [3] applied the technique to animal detection and showed promising results. Hence, we applied a similar technique to drone detection in this work. Note that several other researchers work on this problem, but mostly based on deep learning, which requires a lot of training data to distinguish the tracked object, such as Hinterstoisser et al. [4] and Fan et al. [5]. We opted not to follow this direction as it limits the general applicability.

For shadow removal, Hsieh et al. [6] discussed applying a shadow removal scheme for moving object detection. Chen et al. [7] presented shadow removal for vehicle (car) detection.
However, to the authors' knowledge, none of the modern articles address the effect of shadow and reflection in drone detection, nor drone tracking afterwards. On the other hand, for water reflection, there are studies on water reflection removal, e.g. Shih et al. [8] and Li et al. [9]. Likewise, the authors find no literature on water reflection removal specifically purposed for object detection.

This paper addresses the issue of these environmental effects on aerial object (drone) detection and tracking. The paper can be broadly separated into two parts: 1) cluttered background and shadow removal, and 2) water reflection removal. The focus will not be on the removal schemes themselves, but rather on downstream object detection. We will also introduce Super-Resolution (SR) to improve the contrast of the image, so as to help detect the small drone better. Then, we will compare the effectiveness using the standard Discriminative Correlation Filter (DCF) tracking tool to detect and track the drones.

Fig. 1. Sample scenario of detecting a drone flying near a tree and a water pond on a sunny day. There exist the drone's shadow, cluttered background (tree shadow) and the drone's water reflection. The aerial camera with downward view will observe the actual drone together with the tree's shadow and the drone's shadow, whereas the ground camera will observe the drone and the drone's water reflection.

Fig. 2. Snapshot of a drone flying in the park, near to the trees (cluttered background); hence, there was shadow from the trees to hinder the visibility of the target drone, as well as the drone's own shadow. The target drone (DJI Tello) was observed by another drone (DJI Mavic) flying above. The full video can be seen at this Youtube link: https://youtu.be/wfpCVbRPlHQ

Fig. 3. Snapshot of a drone flying in the park, near to the water pond; hence, there was a reflection in the water body. The target drone (DJI Mavic) was observed by a static camera. The full video can be seen at this Youtube link: https://youtu.be/PAUJvD9hBBk
II. VIDEO SAMPLE INPUT

In this work, we focus on the improvement of vision-based drone detection. Hence, there are direct comparisons of drone detection before treatment (removal) and after treatment, where the original video input is the same.

For cluttered background and shadow effect removal, we flew a drone in the park, near to the trees; hence, there was shadow from the trees to hinder the visibility of the target drone, as well as the drone's own shadow. The target drone (DJI Tello) was observed by another drone (DJI Mavic) flying above. The setup is as shown in figure 1. The video sample clip is shown in figure 2 and the full video can be seen at this Youtube link: https://youtu.be/wfpCVbRPlHQ

For the water reflection removal comparison, we flew a DJI Mavic drone in the park above water, such that there was a water reflection of the Mavic in the water. The Mavic was observed by a static camera on a tripod next to the water. In the camera view, both the real drone and its reflection can be seen. The setup is as shown in figure 1. The video sample clip is shown in figure 3 and the full video can be seen at this Youtube link: https://youtu.be/PAUJvD9hBBk

III. SHADOW REMOVAL

In this section, we discuss the first treatment: shadow removal or shadow elimination. The shadow elimination process is widely used as a pre-processing operation in various video surveillance applications, such as environmental monitoring, motion detection, and security monitoring. Once detected, shadows in images are used for moving target detection in a video surveillance system, for detection of target shape and size, and for finding the number of light sources and the illumination conditions in natural images. Ignoring the existence of shadows in images can degrade the output quality.
A. Shadow Removal by simple OpenCV techniques

We can remove shadows in images using simple OpenCV techniques [10]. This is most useful for pre-processing and OCR text detection in images. The techniques used are Dilation, Blurring, Absdiff and Normalization. Before these techniques are applied, the image or frame is split into the red, green, and blue colour planes. The four techniques are then applied to each individual colour plane in a for loop. The blue plane is used below to explain what happens as each step is executed.

1) Dilation: This operation consists of convolving an image A with some kernel B, which can have any shape or size, usually a square or circle. The kernel B has a defined anchor point, usually the center of the kernel. As the kernel B is scanned over the image, we compute the maximal pixel value overlapped by B and replace the image pixel at the anchor point position with that maximal value. As you can deduce, this maximizing operation causes bright regions within an image to "grow" (hence the name "dilation").

Fig. 4. Dilation Effect

2) Blurring: Blurring is required to reduce noise in the output image. A Bilateral Filter is used as it can reduce unwanted noise very well while keeping edges sharp. However, it is very slow compared to most filters. MedianBlur can also be used for a faster result, but it does not keep the details as well. Each channel of a multi-channel image is processed independently, and in-place operation is supported.

Fig. 5. Blurring Effect

3) Absdiff: This is the step that makes the shadow disappear. cv2.absdiff is a function which finds the absolute difference between the pixels of two image arrays. Essentially, a matrix subtraction is done between two arrays, namely the original colour plane and the image resulting from the two techniques applied earlier (bg_img).

Fig. 6. Absdiff Effect

The blurred image can be observed to be different from the original plane image due to the dilation and filtering. However, the shadow did not seem to have been affected greatly by those two processes, so the pixel values in the shadow region did not vary greatly. Hence, when cv2.absdiff was applied, the shadow 'disappeared', while the details of the ground on which the shadow lay were maintained. Subsequently, the brightness of the diff_img was increased.

Fig. 7. Brightness increased

However, as seen above, this step comes at a cost. Neither diff_img nor diff2_img closely resembles the plane image, due to the matrix subtraction. When the three colour planes are put back together, they no longer keep the same colour content as the original image.

4) Normalization: This normalizes the norm or value range of an array. diff2_img is normalised, with the minimum pixel value set at 0 and the maximum at 255; the values in between are normalized so that they look more 'normal' to the human eye. However, in a single colour plane the effect is not as obvious.

Fig. 8. Normalization Effect

The resulting arrays from the three colour planes are then merged back together to form the output image.

Fig. 9. Input to Output Image
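Putting the four steps together, a minimal sketch of this per-channel pipeline is shown below. The kernel sizes, the inversion applied after cv2.absdiff, and the brightness offset are illustrative assumptions rather than the exact parameters used in this work:

import cv2
import numpy as np

def remove_shadow(img):
    """Per-channel shadow removal: dilate, blur, absdiff, normalize.
    Kernel sizes and brightness offset are illustrative guesses."""
    result_planes = []
    for plane in cv2.split(img):  # B, G, R colour planes
        # 1) Dilation: grow bright regions to wash out dark details
        dilated = cv2.dilate(plane, np.ones((7, 7), np.uint8))
        # 2) Blurring: median blur as the faster alternative to the
        #    bilateral filter mentioned in the text
        bg_img = cv2.medianBlur(dilated, 21)
        # 3) Absdiff: subtracting the background estimate removes the
        #    shadow; inverted so the ground detail stays bright
        diff_img = 255 - cv2.absdiff(plane, bg_img)
        # Brightness increase (hypothetical offset of 30)
        diff2_img = cv2.add(diff_img, 30)
        # 4) Normalization: stretch values to the full 0..255 range
        norm_img = cv2.normalize(diff2_img, None, alpha=0, beta=255,
                                 norm_type=cv2.NORM_MINMAX,
                                 dtype=cv2.CV_8UC1)
        result_planes.append(norm_img)
    # Merge the three processed planes back into one output image
    return cv2.merge(result_planes)

frame = cv2.imread("frame.png")  # hypothetical input frame
cv2.imwrite("shadow_free.png", remove_shadow(frame))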
IV. SUPER RESOLUTION

One of the difficulties in vision-based drone tracking is the detection of drones against low-contrast and cluttered backgrounds. There are numerous methods that work reliably when there is high contrast between the foreground and the background. However, most of these methods fail in low-contrast situations.

The integration of Super-Resolution (SR) into tracking was first proposed by Mise and Breckon [11] to create a robust method of drone motion detection that works well in low-contrast and cluttered environments.

SR is a class of techniques for enlarging images and enhancing their details. It is best understood when compared with digital enlargement. Typically, when we enlarge a small image using bi-linear interpolation, the result is a poor-quality image characterized by large pixels. SR makes use of a Convolutional Neural Network to "hallucinate" details based on prior information collected from a large set of images.

There are two broad categories of SR: Single Image Super Resolution (SISR) and Multi-Frame Super Resolution (MFSR). SISR uses pre-trained models to fill in the missing pixels and simulate a higher resolution image, while MFSR uses multiple images captured in burst, making use of slight hand movements to fill in the missing spots in an up-scaled image. For the purpose of this research, we limit the scope of SR to SISR applied to individual video frames, as merging multiple frames would result in positional error of the drone within the frame.

We tested this technique with the reference video "Tree Drone", which was 3 seconds long with 75 frames and captures a drone flying across a cluttered background with foliage, resulting in low contrast between the foreground (drone) and the background.

We used two methods of SISR available in OpenCV which have quick computational times and support real-time video upscaling: 1. Efficient Sub-Pixel Convolutional Neural Network (ESPCN), and 2. Fast Super-Resolution Convolutional Neural Network (FSRCNN). Both ESPCN and FSRCNN offer 2x, 3x and 4x magnification options.
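These two models are exposed through OpenCV's contrib dnn_superres module; a minimal usage sketch follows (the pre-trained .pb model files are distributed separately, and the file names below are assumptions):

import cv2
from cv2 import dnn_superres

# Create the super-resolution engine (requires opencv-contrib-python)
sr = dnn_superres.DnnSuperResImpl_create()

# Load a pre-trained model file (hypothetical local path) and select
# the algorithm name and magnification factor it was trained for
sr.readModel("ESPCN_x4.pb")   # or "FSRCNN_x4.pb"
sr.setModel("espcn", 4)       # or ("fsrcnn", 4); 2x and 3x also offered

frame = cv2.imread("drone_frame.png")  # hypothetical input frame
upscaled = sr.upsample(frame)          # CNN-based 4x enlargement

# For comparison: plain bi-linear digital enlargement
bilinear = cv2.resize(frame, None, fx=4, fy=4,
                      interpolation=cv2.INTER_LINEAR)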
To illustrate the enhanced resolution of SR, we compare the cropped image of the drone against the cluttered background from the normal image (Fig. 10) and from the ESPCN 4x magnification enhanced image (Fig. 11). The detected drone is highlighted within the superimposed green box (Fig. 11). From Fig. 10 and Fig. 11, we can see that the ESPCN-enhanced image has a higher resolution, which results in more robust motion detection that works better in a cluttered environment.

Fig. 10. Snapshot of Drone in Cluttered Background (No Super Resolution). The full video can be seen at this YouTube link: https://youtu.be/uzFZ2T4jtoM

Fig. 11. Snapshot of Drone in Cluttered Background (ESPCN with 4x magnification). The video clip can be seen at this YouTube link: https://youtu.be/QWXG49rUjC8

To study the effect of SR on motion detection, two experiments were carried out. The first experiment (Fig. 12) was carried out with a varying contour threshold for the different magnification levels. The rationale for this is that the same physical size is used for detection in each
detection program. For the default (No SR) program, a contour threshold of 50 pixels was used. For the ESPCN x2 and FSRCNN x2 programs, a contour threshold of 100 pixels was used. For the ESPCN x3 and FSRCNN x3 programs, a contour threshold of 150 pixels was used. For the ESPCN x4 and FSRCNN x4 programs, a contour threshold of 200 pixels was used.
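The detection programs themselves are not listed in this paper, but a frame-differencing detector of the kind implied might look like the following sketch, where the contour-area threshold scales with the magnification level as in the first experiment; the differencing and morphology details are our assumptions:

import cv2

def detect_motion(prev_gray, gray, magnification=1):
    """Flag moving blobs whose contour area exceeds a threshold that
    scales with SR magnification (50 px at 1x up to 200 px at 4x),
    mirroring the first experiment. Differencing details are assumed."""
    area_threshold = 50 * magnification
    diff = cv2.absdiff(prev_gray, gray)            # frame differencing
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)    # join fragments
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) >= area_threshold]
    return boxes  # (x, y, w, h) boxes drawn in green around detections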
Fig. 12. Experimental Results for Varying Contour Threshold

The second experiment (Fig. 13) was carried out with a constant contour threshold of 150 pixels for all programs. This meant that for higher levels of magnification, the program was more sensitive to smaller physical object motion.

Fig. 13. Experimental Results for Constant Contour Threshold

True Positive is defined as accurate motion detection of the moving drone. False Positive frames are defined as having motion detection on an object which is in reality stationary; this can happen in either a True Positive frame or a False Negative frame. False Negative is defined as null motion detection of the moving drone. Note that the sum of True Positive frames and False Negative frames gives the 75 frames of the clip evaluated.

From the results shown in figures 12 and 13, the best SR method is arguably FSRCNN x4, with the highest proportion of true positive frames (96 percent). However, it has the longest computational time and the second highest proportion of false positive frames (10.7 percent).

Combining the findings from both experiments, a compromise must be made when choosing an SR method for drone detection in cluttered backgrounds. We used FSRCNN x4 for drone detection in high-security environments, where utmost priority is placed on detecting drones accurately, accepting the sacrifice of longer computational time and a higher number of false positive detections. The computational time could potentially be reduced to enable real-time processing by using a computer with much higher computing power.

V. REMOVAL OF OBJECT REFLECTION IN WATER

As stated in the introduction, vision-based drone detection poses significant challenges in natural outdoor settings. One such great challenge is the detection of false objects in reflections cast by wet bodies on the ground. The method proposed in this paper seeks to improve the robustness of aerial drone detection in natural scenes by employing dense optical flow, the Gunnar-Farneback method [12], which represents the motion map of every pixel in the scene. The motion is processed to identify the presence of wet bodies and reflections in a particular scene. Subsequently, the reflection of the object in the water surface is removed through further processing.

Using OpenCV library functions [13], the basic image processing and video capture can be performed. First, the sample video file is opened using the cv2.VideoCapture function. The first frame of the video sample is extracted, resized for greater speed, and converted to a gray-scale version for optical flow computation. The same steps are performed to extract and process the next frame. Using the cv2.calcOpticalFlowFarneback function, the optical flow vector of each pixel is computed from the motion differences between the two consecutive frames, denoted as sets of optical flow vectors (u, v). Using the cv2.cartToPolar function, these vectors are converted into polar coordinates such that each pixel in the frame carries a single optical flow vector magnitude. The optical flow principle works on the assumption that the intensity of individual pixels remains similar between consecutive frames, i.e. the lighting condition of the scene remains constant.
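A minimal sketch of this capture-and-flow loop follows (the file name, resize factor and the Farneback parameters shown are common defaults, not necessarily the values used in this work):

import cv2

cap = cv2.VideoCapture("water_reflection.mp4")   # hypothetical sample file
ok, frame = cap.read()
frame = cv2.resize(frame, None, fx=0.5, fy=0.5)  # resize for speed (assumed factor)
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break                                    # end of video file
    frame = cv2.resize(frame, None, fx=0.5, fy=0.5)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense optical flow between consecutive frames; returns (u, v) per pixel
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Convert (u, v) to polar form: one magnitude (and angle) per pixel
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # ... scan `mag` for reflection boundaries (next paragraphs) ...
    prev_gray = gray                             # current frame becomes previous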
Naturally, objects that appear in the reflection of a wet surface tend to manifest slower movement than the surrounding ground scene when there is apparent motion of the scene. This behaviour shows up as a drastic change in the optical flow vector magnitude at the boundaries of the reflection compared to the rest of the pixels in the frame. Therefore, our approach is to iteratively identify the points with a sudden change in the optical flow vector magnitude in each row of the frame. This is achieved by comparing two adjacent pixels from left to right in each row, and recording the points where their difference is greater than a certain threshold. The result consists of two lists of points which represent the left and right boundaries of the reflection in each row of the frame.
Once the boundaries of the reflection have been determined, the reflection can be effectively removed by patching the region between the boundaries with the surrounding neighbouring pixels, row by row. The information in the current frame, such as the optical flow vector magnitude, is then carried forward as the previous frame, and a new frame is continually streamed from the video file until the end.
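A sketch of this row-wise boundary search and patching is given below, assuming a hypothetical jump threshold and taking the patch colour from the pixels just outside each boundary (the paper does not specify either detail):

import numpy as np

def remove_reflection(frame, mag, jump_threshold=2.0):
    """Row by row: find left/right reflection boundaries as sudden jumps
    in flow magnitude, then patch the span from neighbouring pixels.
    Threshold value and patch source are illustrative assumptions."""
    out = frame.copy()
    h, w = mag.shape  # mag must match the (resized) frame dimensions
    for y in range(h):
        # Differences of adjacent flow magnitudes, scanned left to right
        jumps = np.abs(np.diff(mag[y])) > jump_threshold
        xs = np.flatnonzero(jumps)
        if len(xs) >= 2:
            left, right = xs[0], xs[-1]  # left and right boundary points
            # Patch the region in between with surrounding neighbours:
            # here, the pixels just outside each boundary (assumed scheme)
            mid = (left + right) // 2
            out[y, left:mid] = frame[y, max(left - 1, 0)]
            out[y, mid:right + 1] = frame[y, min(right + 1, w - 1)]
    return out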
A. Reflection removal in the indoor scene

A preliminary video sample was recorded indoors to test the idea of this reflection removal. The indoor setting effectively minimizes the effect of disturbances and noise on the motion detection, as shown in Fig. 14, which shows that the reflection of the lights in the water is completely removed.

Fig. 14. Wet body with reflection (left) and without reflection (right)

B. Reflection removal in the outdoor scene

The same optical flow technique is applied to remove the reflection of the drone during tracking in the outdoor scene, as shown in Figures 15 and 16. Figure 16 shows that the reflection of the drone in the water is likewise completely removed. The video clip can be seen at this YouTube link: https://youtu.be/R5GWkJQeDO8

Fig. 15. Outdoor scene with reflection of drone

Fig. 16. Outdoor scene with reflection of drone removed

VI. APPLICATION TO DRONE TRACKING

Recall that the objective of this paper is to improve the drone detection and tracking ability in cluttered, shadowed and reflective environments by applying the removal and Super-Resolution schemes presented in sections III to V. Therefore, we applied the schemes and focused on the tracking results. The tracking is based on the Discriminative Correlation Filter (DCF). Comparisons of the detection and tracking before and after applying the schemes are presented.

A. CSR-DCF with Shadow Removal

The DCF tracker tracks an object using discriminative correlation methods, where a filter is learnt from the training image [14]. In particular, this work used the discriminative correlation filter tracker with spatial and channel reliability (CSR-DCF) [15].

The spatial reliability map constrains the filter to the part of the object suitable for tracking. It overcomes the limitations of the rectangular shape assumption commonly made in traditional DCF trackers. It also allows a larger training region to improve the discriminative power of the filter with background samples. Channel reliability scores estimated in the constrained optimization step are used for weighting the per-channel filter responses in the localization step.

The tracking algorithm consists of a localization step and an update step. In localization, the object is localized through the responses of the learnt correlation filters, weighted by the channel reliability scores. In the update step, foreground/background histograms are extracted and the spatial reliability map is constructed.
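OpenCV ships an implementation of CSR-DCF under the name CSRT; a minimal tracking loop might look like this sketch (the video file name and initial bounding box are hypothetical):

import cv2

cap = cv2.VideoCapture("treated_clip.mp4")  # hypothetical treated video
ok, frame = cap.read()

# CSR-DCF is exposed in OpenCV as the CSRT tracker
# (cv2.TrackerCSRT.create() in newer OpenCV builds)
tracker = cv2.TrackerCSRT_create()
tracker.init(frame, (450, 210, 40, 30))     # hypothetical initial drone box

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)      # localization + update steps
    if found:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("CSR-DCF tracking", frame)
    if cv2.waitKey(1) == 27:                # Esc to quit
        break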
B. Tracking results by CSR-DCF

We applied the cluttered background and shadow removal described in section III, as well as the Super-Resolution image enhancement described in section IV, to the original tracking video. The resulting video clips are shown at these YouTube links: https://youtu.be/TZc9C0IL0hk for the tracking result of the original video clip (before treatment), and https://youtu.be/dOAwO4oVP7Q for the tracking result of the modified video clip (after treatment). Sample snapshots are shown in figures 17 (before treatment) and 18 (after treatment) respectively.

We applied DCF tracking both before and after treatment. The tracking result is shown in figure 19. Before applying the shadow removal technique described in section III with the SR image enhancement of section IV, the target DJI Tello drone was seen and tracked at the beginning. Shortly after, the Tello was hidden inside the tree shadow; the filter eventually learnt the background and the tracking was lost. After we applied the cluttered background and shadow removal and the image enhancement treatment, the tracking recovered and the Tello drone could be tracked throughout, until the drone left the scene and the video clip finished, without interruptions or missed (dropped) detections.
Fig. 17. Snapshot of a video clip showing a drone flying in the park, near to the trees (cluttered background) BEFORE the cluttered background and shadow removal scheme is applied. The original video clip is the same video clip as shown in figure 2. The full video can be seen at this YouTube link: https://youtu.be/TZc9C0IL0hk

Fig. 18. Snapshot of a video clip showing a drone flying in the park, near to the trees (cluttered background) AFTER the cluttered background and shadow removal scheme is applied. The original video clip is the same video clip as shown in figure 17. The full video can be seen at this YouTube link: https://youtu.be/dOAwO4oVP7Q

Fig. 19. Comparison of drone detection and tracking by DCF, before and after cluttered background and shadow removal.

VII. CONCLUSION

For vision-based object detection, the clarity and ambiguity of the target images in the video stream are the keys to effective and successful detection and tracking. This paper addresses the effects of shadow and water reflection on the target images. We applied the cluttered-background and shadow removal scheme to make the drone more visible and clearer. We also introduced Super-Resolution to increase the image resolution for more precise detection and tracking. As a result, the target drone can be detected and tracked throughout the sample clip. Likewise, we applied the water reflection removal scheme to eliminate the reflection and avoid confusing the tracker. With this, only the correct drone target was detected and tracked.

REFERENCES

[1] S. Srigrarom, S. M. Lee, M. Lee, F. Shaohui, and P. Ratsamee, "An integrated vision-based detection-tracking-estimation system for dynamic localization of small aerial vehicles," in 2020 5th International Conference on Control and Robotics Engineering (ICCRE). IEEE, 2020, pp. 152–158.
[2] D. K. Yadav, "Efficient method for moving object detection in cluttered background using Gaussian mixture model," in 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2014, pp. 943–948.
[3] Z. Zhang, Z. He, G. Cao, and W. Cao, "Animal detection from highly cluttered natural scenes using spatiotemporal object region proposals and patch verification," IEEE Transactions on Multimedia, vol. 18, no. 10, pp. 2079–2092, 2016.
[4] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab, "Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes," in Asian Conference on Computer Vision. Springer, 2012, pp. 548–562.
[5] D.-P. Fan, M.-M. Cheng, J.-J. Liu, S.-H. Gao, Q. Hou, and A. Borji, "Salient objects in clutter: Bringing salient object detection to the foreground," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 186–202.
[6] J.-W. Hsieh, W.-F. Hu, C.-J. Chang, and Y.-S. Chen, "Shadow elimination for effective moving object detection by Gaussian shadow modeling," Image and Vision Computing, vol. 21, no. 6, pp. 505–516, 2003.
[7] C.-T. Chen, C.-Y. Su, and W.-C. Kao, "An enhanced segmentation on vision-based shadow removal for vehicle detection," in The 2010 International Conference on Green Circuits and Systems. IEEE, 2010, pp. 679–682.
[8] Y. Shih, D. Krishnan, F. Durand, and W. T. Freeman, "Reflection removal using ghosting cues," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3193–3201.
[9] T. Li, D. P. Lun, Y.-H. Chan et al., "Robust reflection removal based on light field imaging," IEEE Transactions on Image Processing, vol. 28, no. 4, pp. 1798–1812, 2018.
[10] K. Ravi, "Shadow removal with open-cv," 2020, last accessed 20 November 2020. [Online]. Available: https://medium.com/arnekt-ai/shadow-removal-with-open-cv-71e030eadaf5
[11] O. Mise and T. Breckon, "Super-resolution imaging applied to moving targets in high dynamic scenes," 2013.
[12] A. Agarwal, "OpenCV – the Gunnar-Farneback optical flow," 2020, last accessed 29 November 2020. [Online]. Available: https://www.geeksforgeeks.org/opencv-the-gunnar-farneback-optical-flow/
[13] A. Mordvintsev and K. Abid, "Optical flow," 2013, last accessed 2 December 2020. [Online]. Available: https://opencv-python-tutroals.readthedocs.io/latest/py_tutorials/py_video/py_lucas_kanade
[14] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, "Visual object tracking using adaptive correlation filters," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 2544–2550.
[15] A. Lukežič, T. Vojíř, L. Čehovin Zajc, J. Matas, and M. Kristan, "Discriminative correlation filter tracker with channel and spatial reliability," International Journal of Computer Vision, 2018.
