
National Chiao Tung University

EECS International Graduate Program

Master's Thesis

四軸飛行器追蹤之即時影像穩定設計

Real-Time Visual Stabilization for Quadcopter
Tracking using Zed Stereo Camera

Student: Samir Singh (沈颯米)

Advisor: Dr. Kai-Tai Song (宋開泰 博士)

January 2021
四軸飛行器追蹤之即時影像穩定設計
Real-Time Visual Stabilization for Quadcopter
Tracking using Zed Stereo Camera

Student: Samir Singh (沈颯米)

Advisor: Dr. Kai-Tai Song (宋開泰 博士)

National Chiao Tung University
EECS International Graduate Program
Master's Thesis

A Thesis
Submitted to EECS International Graduate Program
National Chiao Tung University
in Partial Fulfilment of the Requirements
for the Degree of
Master
in

Electrical Engineering and Computer Science

January 2021

Hsinchu, Taiwan, Republic of China

四軸飛行器追蹤之即時影像穩定設計

Student: Samir Singh (沈颯米)    Advisor: Dr. Kai-Tai Song (宋開泰 博士)

EECS International Graduate Program

National Chiao Tung University

摘要

Real-time stabilized video is required by almost every vision-based control system. In recent years, video stabilization has received growing attention for improving computer-vision applications such as video recording, indoor monitoring, industrial inspection, home cameras, security-camera detection of moving objects and object recognition. However, most existing algorithms are limited to detecting and extracting foreground and background features, which restricts their effectiveness in scenarios such as drastic object motion, dynamic objects close to the camera and illumination changes. This thesis designs a video stabilization algorithm that uses YOLO for object detection and feature extraction, and a bandpass filter together with a Kalman filter to stabilize the video output. The algorithm consists of five modules: the input frame is captured and its parameters are determined; the frame is passed to YOLO for object detection; the detected object's features are used for motion estimation and smoothing; and, after the motion parameters are extracted and the frames smoothed, motion compensation and filtering remove the accumulated motion error and noise to obtain a stable output. With this design, the noise and high-frequency vibration caused by the motion of the quadcopter can be handled effectively: the bandpass filter removes high- and low-frequency components and periodic system noise, while the Kalman filter predicts the frame displacement and reduces frame shake and distortion. Finally, a warping affine transformation is applied to produce a stable and meaningful output. Experimental results show that the proposed method is robust to various illumination and motion changes and improves video stabilization performance on an embedded system of limited size and resources. Comparison experiments further show that the proposed method outperforms four popular methods in both processing time and vibration error.

Keywords: Video Stabilization, Motion Compensation, Bandpass Filter, Kalman Filter

Real-Time Quadcopter Tracking and Visual Stabilization
using Zed Stereo Camera

Student: Samir Singh Advisor: Dr. Kai-Tai Song

EECS International Graduate Program


National Chiao Tung University

Abstract
Real-time video stabilization is one of the most needed techniques for various image-based applications such as video recording, indoor monitoring, industrial inspection, home video cameras, security cameras and object detection. However, most existing stabilization algorithms are limited to detecting and extracting foreground and background features. This limits their applicability in scenarios such as illumination changes or drastic motion of a near-foreground object with respect to the camera. Therefore, this thesis studies a stabilization method based on YOLOv3 for enhanced object detection in each frame, enabling precise and effective feature extraction, combined with bandpass and Kalman filters for the desired motion filtering. First, the proposed method estimates the input frame parameters and determines the bounding boxes of objects with confidence scores using YOLOv3. Then, motion estimation, motion smoothing, motion filtering and motion compensation are computed in the subsequent framework modules. It is assumed that the estimated motion of the input frames still contains noise and high-frequency vibration caused by the motion of the camera, which is mounted on a quadcopter. The quadcopter is used as the test application for this thesis because it covers various aspects of image stabilization applications. The bandpass filter removes high- and low-frequency transmission and periodic system noise, while the Kalman filter predicts the future state of the frame displacement and moderates the shakiness and distortion of the frames. Finally, a warping affine transformation is applied to produce stabilized output with meaningful information. Experimental results show that the proposed method is robust against variations in illumination and motion and improves overall video stabilization performance on an embedded system with limited computing resources. Comparison results further confirm that the proposed method outperforms four popular vision stabilization methods.

Keywords: Video Stabilization, Motion Estimation, Motion Filtering, Kalman Filter

Acknowledgement
First and foremost, I would like to express my deepest gratitude to all the people who have helped me achieve this goal. I would like to thank National Chiao Tung University and the EECS IGP for giving me the opportunity to study and carry out this research.

I would like to express my sincere gratitude to my advisor, Prof. Kai-Tai Song, for his continuous support during my master's study and research, and for his patience, motivation, enthusiasm and belief in me. It would have been impossible to complete my research and write my thesis without his guidance and feedback. He has always believed in me and provided me with every means to complete my research work. I would like to underline his great experience in his research fields as well as his wisdom and human qualities. Thank you very much, Professor, for everything you have done for me. I will always be grateful.

At the 621 laboratory, I have had the opportunity to meet many new people from different cultures. It was a truly enriching experience to work with them and learn from their experiences. I would like to extend my sincere thanks to all my lab mates, Kathy, Ping-Jui Hsieh and Bing-Yi Li, for their valuable input. It was an overwhelming experience to work and spend time with such a talented group of people. Thank you, everybody, for your kind and wise behavior.

My family has helped me along the way and been a source of inspiration and values. I am grateful to my parents, siblings and friends for their continuous moral support and their belief that I could do this. Last but not least, I extend my gratitude and devotion to the Almighty for pouring strength into me. Thank you, each and every soul.

Table of Contents
摘要 ............................................................................................................................................ i
Abstract ...................................................................................................................................... ii
Acknowledgement ....................................................................................................................iii
Table of Contents ...................................................................................................................... iv
List of Figures ........................................................................................................................... vi
List of Tables ............................................................................................................................ ix
Chapter 1 Introduction ............................................................................................................... 1
1.1 Motivation ............................................................................................................... 1
1.2 Related works .......................................................................................................... 2
1.3 Problem Description ................................................................................................ 3
1.4 Contribution of this Thesis ...................................................................................... 3
1.5 Organization of this Thesis ...................................................................................... 4
Chapter 2 Proposed Method ...................................................................................................... 5
2.1 Object Detection and Recognition ........................................................ 8
2.1.1 YOLO ........................................................................................... 8
2.2 Motion Estimation ................................................................................. 10
2.3 Motion Smoothing ................................................................................. 11
2.4 Motion Filtering ..................................................................................... 12
2.5 Motion Compensation ........................................................................... 14
2.6 Warping Design using OpenCV ............................................................ 15
2.7 Detailed Procedure for Proposed Visual Stabilization .......................... 16
2.8 Summary ................................................................................................ 17
Chapter 3 Quadcopter Control Design..................................................................................... 18
3.1 Quadcopter Design ................................................................................................ 19
3.2 Hardware Devices ................................................................................................. 20
3.2.1 Motors......................................................................................................... 20
3.2.2 Propellers .................................................................................................... 20
3.2.3 Electronic Speed Controller (ESC) ............................................................ 21
3.2.4 Pixhawk 4 ................................................................................................... 21
3.2.5 Nvidia Tx2 .................................................................................................. 22
3.2.6 Telemetry .................................................................................................... 22
3.2.7 Radio Transmitter ....................................................................................... 23

3.3 Sensors ................................................................................................................... 24
3.4 User Interface ........................................................................................................ 26
Chapter 4 Experimental Results .............................................................................................. 27
4.1 Hardware and Software Parameters Description .................................. 27
4.2 Experiment Scenario Explanation ......................................................... 27
4.3 Experimental Results ............................................................................. 32
4.3.1 Experimental Results with gradual motion changes of object ................... 32
4.3.2 Experimental results with drastic motion changes of object ...................... 35
4.3.3 Experimental Results with Near Foreground Motion Changes of Object .. 39
4.3.4 Experimental Results with Random Motion Changes of Object ................ 43
4.4 Evaluation and Discussion .................................................................... 44
4.4.1 Evaluation ................................................................................................... 44
4.4.2 Discussion................................................................................................... 55
Chapter 5 Conclusion and Future Work .................................................................................. 59
5.1 Conclusions ........................................................................................................... 59
5.2 Future Work........................................................................................................... 60
References ................................................................................................................................ 61

List of Figures
Fig. 2.1 The system architecture of the real-time visual stabilization using YOLO with zed
stereo camera………………………...………………………………………….........6

Fig. 2.2 The Block diagram of YOLO bounding box and confidence score determination.........9

Fig. 2.3 The detailed convolutional layer of YOLO….……………………………..………..9

Fig. 2.4 The block diagram of the motion filtering and trajectory correction…………….….13

Fig. 3.1 The Control design of the quadcopter and ground control……..…………………...18

Fig. 3.2 (a) shows the configuration and (b) the S500 frame design of the quadcopter ….……….19

Fig. 3.3 (a) shows the complete quadcopter design and (b) the fully assembled embedded system
on the quadcopter……………………………………………………………………...…..19

Fig. 3.4 Brushless motor…………………………………………………………………..….20

Fig. 3.5: Propellers (a) counter-clockwise (CCW), (b) clockwise (CW).………………..…..20

Fig. 3.6 ESC Motor Driver………………………………...……………………………..…...21

Fig. 3.7 The configuration and functionality of the pixhawk and its port description…………21

Fig. 3.8 Nvidia Tx2 development board………………………………………..…………......22

Fig. 3.9 Air and ground module for telemetry communication......………………………........23

Fig. 3.10 Turnigy radio transmitter...………….………………………….…………………..24

Fig. 3.11 Zed Stereo Camera………………………………..…………………………….......25

Fig. 3.12 The camera calibration using zed calibration utility…………………………....…..25

Fig. 3.13 Block diagram of onboard computer and ground station communication design…...26

Fig. 4.1 Experimental Scenario Setup in Engineering Building 5 …………………...……...29

Fig. 4.2 The Experimental scenario for the quadcopter motion with respect to the object motion
in both foreground and background of the scene.………………………………….. 30

Fig. 4.3 (a), (b) The different experimental scenario with the quadcopter for random motion
combining gradual, drastic and near foreground changes in one ……...….................30

Fig. 4.4 (a)(c)(e)(g)(i) shows the input image to the zed camera (b)(d)(f)(h)(j) shows the
stabilization algorithm in process with object detection and tracking in different
scenarios with illuminance changes …….…………………………………………..31

Fig. 4.5 Handheld camera motion (a)Stabilized dx trajectory for gradual motion w.r.t original
trajectory along the x-axis (b) Stabilized dy trajectory for gradual motion w.r.t original
trajectory along the y-axis (c) Stabilized da trajectory for gradual motion w.r.t original
trajectory. ………………………………………………………………….………..33

Fig. 4.6 Gradual motion with a handheld camera (a) Stabilized dx trajectory for the gradual
motion of camera w.r.t object along the x-axis (b)Stabilized dy trajectory for the
gradual motion of camera w.r.t object along the y-axis (c) Stabilized da trajectory for
the gradual motion of camera w.r.t object along the x-axis. …………………………34

Fig. 4.7 Gradual motion with a quadcopter (a) Stabilized dx trajectory for gradual motion w.r.t
original trajectory along the x-axis (b) Stabilized dy trajectory for gradual motion
w.r.t original trajectory along the y-axis (c) Stabilized da trajectory for gradual motion
w.r.t original trajectory..…………………………………………………………….36

Fig. 4.8 Handheld camera with drastic motion (a) Stabilized dx trajectory for drastic motion
w.r.t original trajectory along the x-axis (b) Stabilized dy trajectory for drastic motion
w.r.t original trajectory along the y-axis (c) Stabilized da trajectory for drastic motion
w.r.t original trajectory..…………………………………………………………….37

Fig. 4.9 Handheld camera motion w.r.t. Static object (a) stabilized dx trajectory for the drastic
motion of camera w.r.t object along the x-axis (b) stabilized dy trajectory for the
drastic motion of camera w.r.t object along the y-axis (c)stabilized da trajectory for
the drastic motion of camera w.r.t object along the x-axis………………………....38

Fig. 4.10 Drastic motion with a quadcopter (a) Stabilized dx trajectory for drastic motion w.r.t
original trajectory along the x-axis (b) Stabilized dy trajectory for drastic motion w.r.t
original trajectory along the y-axis (c) Stabilized da trajectory for drastic motion w.r.t
original trajectory along the x-axis…………………………………………………40

Fig. 4.11 Handheld camera with near foreground motion (a) Stabilized dx trajectory for near
foreground motion w.r.t original trajectory along the x-axis (b)Stabilized dy trajectory
for near foreground motion w.r.t original trajectory along the y-axis (c) Stabilized da
trajectory for near foreground motion w.r.t original trajectory along the x -
axis………………………………………………………………………………….41

Fig. 4.12 Handheld camera motion w.r.t. static objects (a) Stabilized dx trajectory for near
foreground motion of camera w.r.t object along the x-axis (b) Stabilized dy trajectory
for near foreground motion of camera w.r.t object along the y-axis (c) Stabilized da
trajectory for near foreground motion of camera w.r.t object………………………42

Fig. 4.13 (a) Stabilized dx trajectory for near foreground motion w.r.t original trajectory along
the x-axis (b)Stabilized dy trajectory for near foreground motion w.r.t original
trajectory along the y-axis (c) Stabilized da trajectory for near foreground motion
w.r.t original trajectory along the x-axis……………….…………………………….43

Fig. 4.14 Random motion with a quadcopter (a) Stabilized dx trajectory for random motion
w.r.t original trajectory along the x-axis (b) Stabilized dy trajectory for random motion
w.r.t original trajectory along the y-axis (c) Stabilized da trajectory for random motion
w.r.t original trajectory along the x-axis……………………………………………44

Fig. 4.15 (1) OpenCV Video Stabilization (2) Object Tracking and Video Stabilization using
OpenCV (3) Robust Video Stabilization Using Particle Keypoint update and L1-
Optimized Camera Path (4) Deep online Video Stabilization using Multi -Grid
Warping Transformation Learning, Stabilized dx trajectory for gradual motion w.r.t
original trajectory along the x-axis …………………………………….....................47

Fig. 4.16 Stabilized dy trajectory for gradual motion w.r.t original trajectory along the y-
axis…………………………………………………………………………………47

Fig. 4.17 Stabilized da trajectory for gradual motion w.r.t original trajectory along the x-
axis…………………………………………………………………….....................48

Fig. 4.18 Stabilized dx trajectory for drastic motion w.r.t original trajectory along with the x-
axis…………………………………………………………………………………49

Fig. 4.19 Stabilized dy trajectory for drastic motion w.r.t original trajectory along the y-
axis…………………………………………………………………………………49

Fig. 4.20 Stabilized da trajectory for drastic motion w.r.t original trajectory along the x -
axis…………………………………………………………………….....................50

Fig. 4.21 Stabilized dx trajectory for near foreground motion w.r.t original trajectory along the
x-axis………………………………………………………………………………51

Fig. 4.22 Stabilized dy trajectory for near foreground motion w.r.t original trajectory along the
y-axis………………………………………………………………………………52

Fig. 4.23 Stabilized da trajectory for near foreground motion w.r.t original trajectory along the
x-axis………………………………………………………………………………52

Fig. 4.24 Stabilized dx trajectory for random motion w.r.t original trajectory along the x-
axis………………………………………………………..………………………..53

Fig. 4.25 Stabilized dy trajectory for random motion w.r.t original trajectory along the y-
axis………………………………………………………….....................................54

Fig. 4.26 Stabilized da trajectory for random motion w.r.t original trajectory along the x-
axis…………………………………………………………………………………54

List of Tables
Table 3.1: Different formats of Video Output ……………….…………………...………25
Table 4.1: Comparison of the mean square error for translation and rotation after
stabilization………………….………………………………………………....57
Table 4.2: Comparison of performance in a different scenario and the computation time of
different methods…………….……..………..…….…………………………...58

Chapter 1 Introduction
1.1 Motivation
The application of cameras together with artificial intelligence is now fundamental to how UAVs perceive their environment. Recently, there has been rapid growth in computer-vision techniques for visual applications on devices such as drones, robots, embedded platforms and automobiles, in inspection and monitoring, mapping and video recording, to mention just a few. Precise estimation of the drone's position is crucial. In drone video stabilization, most existing methods work on the foreground and background motion of the image frame [1-3], which is not always effective when the object becomes blurred or when multiple objects move in both the foreground and the background. To accomplish the tasks mentioned above, UAVs must detect, stabilize and usually track objects present in the environment [4]. During flight, the camera suffers jitter caused by the vibration of the quadcopter's motors and by the flight motion itself. In that case, it becomes difficult to stabilize and track an object from the quadcopter without additional stabilizing hardware or software.

The major problem we face is that few algorithms can stabilize video in real time without additional hardware and sensors, and those that exist do not differentiate between the unintended motion of the quadcopter [5] and the jitter and shakiness produced by the quadcopter's motors during flight. For motion filtering, a Kalman filter together with a low-pass filter [6] is commonly used to remove the noise accumulated by the image motion, but this is not enough to remove all of the periodic system noise and transmission noise present in the image frames. Most stabilization algorithms also assume that there is little or no background movement and focus only on the foreground to stabilize the image. If both the background and the foreground move, they do not stabilize well and their performance decreases.

Stitching video for stabilization [7] is not effective for aerial video stabilization because the camera motion with respect to the object is fast and exhibits high jitter due to the motors. As the demand for autonomous drones increases, a stabilized visual system becomes a crucial part of stable quadcopter flight and of its various indoor and outdoor applications. Even with a handheld camera, shakiness caused by hand movement can add a lot of grittiness to the video, which becomes jarring to watch over time. A gyro [8] on a UAV can be used to stabilize the video, but doing so risks motor malfunction, adds the cost of hardware fusion and makes the system more complex. This thesis studies a solution for robust real-time visual stabilization using the onboard camera, without additional hardware cost and with a minimal embedded system configuration, to provide stable video for a better realization of the in-flight scene and its various applications in the field.

1.2 Related works


Developing a video stabilization algorithm for a quadcopter in indoor and outdoor environments presents a unique set of challenges, such as background movement, static and dynamic motion of the object, drastic motion of the object with respect to the quadcopter, jitter from the quadcopter's motors and propellers, and illumination changes. Most existing video stabilization algorithms are based on four assumptions [2][9][10]: (1) the feature motion is purely translational; (2) background movement is small or absent; (3) the noise added by the quadcopter's motors can be ignored; and (4) the unintended motion of the quadcopter itself can be ignored. OpenCV [9] is a vast open-source library that provides the motion filtering and compensation needed for video stabilization. The OpenCV approach uses a rigid Euclidean transform to obtain the trajectory of x, y and angle at each frame; a sliding average window is then used to smooth the trajectory, which yields a new transformation with the noise and jitter removed. This is further improved with object tracking in the VidStab class [10], which enhances feature matching and provides more prominent features to compensate for the image motion accumulated by jitter and shakiness: a bounding box is initialized to draw a rectangle around the tracked object and to select a region of interest for tracking. It calculates the key feature points using optical flow for each frame and applies an affine transform to generate a stabilized output. Camera path estimation has recently become central to fast and efficient stabilization. Jeon et al. [11] proposed a robust video stabilization method that uses particle keypoint updates with an optimized camera path. This method stabilizes the frames in three steps: (i) robust feature detection, (ii) camera path estimation and (iii) rendering to reconstruct a stabilized video. A flat-region map is generated from a pair of shaky input frames, and the new feature points generated in the flat regions using particle keypoints help determine an accurate homography for video rendering. Moreover, using an adaptive camera path, it stabilizes the frames with less noise owing to minimal temporal total variation, which makes it suitable for many robot-vision, autonomous-driving-assistant and visual-surveillance applications. To further improve the accuracy and speed of stabilization, deep learning and AI techniques have recently been applied to obtain robust and efficient stabilization methods for various industrial and everyday applications. Wang et al. [12] proposed deep online video stabilization using multi-grid warping transformation learning. This method is useful for most handheld captured videos as it eliminates high-frequency shakes. It uses a neural network model called StabNet, which learns to predict the stabilizing transformation of upcoming unsteady frames from the mesh-grid transformation information of previous frames. Very few stabilization algorithms use deep learning so far, so there is large scope for exploration in real-time video stabilization using AI and deep learning to make methods more robust, precise and real-time without much hardware cost.
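To make the OpenCV-style baseline of [9][10] concrete, the sketch below shows the sliding-average trajectory smoothing that such pipelines typically apply to per-frame rigid-transform estimates. It is a minimal illustration, not the cited implementation; the function name and the window radius are choices made for this example.

```python
import numpy as np

def smooth_trajectory(transforms, radius=15):
    """Sliding-average smoothing of a per-frame motion trajectory.

    transforms: (N, 3) array of per-frame [dx, dy, da] estimates.
    Returns adjusted per-frame transforms whose accumulated path
    follows the smoothed camera path.
    """
    trajectory = np.cumsum(transforms, axis=0)            # accumulated camera path
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)   # box (sliding average) window
    padded = np.pad(trajectory, ((radius, radius), (0, 0)), mode='edge')
    smoothed = np.vstack([np.convolve(padded[:, k], kernel, mode='valid')
                          for k in range(3)]).T
    return transforms + (smoothed - trajectory)
```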

1.3 Problem Description


Objects are often recorded at low resolution and undergo significant illumination variations, which makes UAV tracking more challenging than general tracking. Real-time stabilization faces several issues, such as memory limitations, computation cost, the relative motion between the camera and the object, and the magnitude of jitter and shakiness that can be rectified. For stabilization from a quadcopter, it is not always feasible to use the foreground and background of a frame to stabilize the image because of the sudden and fast motion during flight. Visual stabilization during flight is very important, especially in indoor environments, to avoid any kind of accident or damage. There is a lot of jitter and noise during flight, and not all of it can be removed by a Kalman filter alone.

1.4 Contribution of this Thesis


This thesis builds on the methods discussed in the related-work section to obtain efficient and robust video stabilization for UAV applications. The contributions of this thesis are:

(1) Fast and accurate detection of an object in the video frame for feature extraction, feature mapping and motion estimation.
(2) Robustness to foreground and background changes in the image frame during stabilization.
(3) Robustness to illumination changes in the image frame during stabilization.
(4) Removal of the high- and low-frequency noise present in the input frames using the bandpass and Kalman filters, and compensation of the image velocity accumulated by unintended jitter and shakiness.

These steps smooth and compensate for the deviation of the frame path caused by noise: with the Kalman filter and the bandpass filter we remove low- and high-frequency noise, resulting in a robust, stable output video that resolves both high-frequency jitter and low-frequency noise. The global motion vector of each frame is determined with the Kalman filter, and the resulting frame motion vectors are passed through the bandpass filter, which lets only frequencies within a certain range pass while rejecting the remaining high and low frequencies, producing a smooth output. The smoothed output is passed once more through the Kalman filter and a Gaussian blur to compensate for the pixel motion, remove any remaining jitter or noise and smooth the image for the final warping, yielding a stable and smooth output. This result can further help in several applications, such as object tracking, photo taking, monitoring, surveillance and precision agriculture.

1.5 Organization of this Thesis


This thesis is organized as follows. Chapter 1 gives the introduction and a literature review of video stabilization methods through the related works. Chapter 2 explains the proposed method and its architecture, including the implementation of YOLOv3 and bandpass motion filtering. Chapter 3 describes the quadcopter system design used to carry out the experiments. Chapter 4 presents the experimental results and compares them with existing methods to show the effectiveness of the proposed method. Chapter 5 draws conclusions from the experimental results and discusses the future scope of this work.

Chapter 2 Proposed Method
Video stabilization is an image-processing technique that removes the unwanted high-frequency and low-frequency noise caused by jitter and shakiness during handheld recording or sudden UAV motion. With the advancement of computer vision and image processing, real-time visual stabilization has become a key tool for state-of-the-art applications such as self-driving cars, autonomous drone navigation, cinematography and general video recording. There are several methods for video stabilization; OpenCV alone offers numerous tools such as VidStab, key-feature matching and optical flow. In this thesis, we make extensive use of the OpenCV library in our algorithm to stabilize and detect the object in real time during the flight of a drone, so that it can take a stable and precise video of the object in the frame.

The goal of this thesis is to study and implement YOLO together with bandpass and Kalman motion filtering for fast, precise and stabilized video from a quadcopter. The developed algorithm provides a fast and robust stabilization system. Most stabilization methods to date consider only the foreground and background of the image and stabilize the frame using feature matching, which is efficient in generic cases but fails in others. It is not always possible to extract features from the foreground and background of the image during drastic scene changes, or when objects move dynamically relative to the camera. In such cases it is difficult to obtain exact object features because the image sharpness and edges become blurred, which makes feature matching difficult and leads to poor stabilization results.

We propose a real-time stabilization method using YOLOv3 [13] and bandpass motion filtering together with motion compensation, as shown in Fig. 2.1.

Fig. 2.1 The system architecture of real-time visual stabilization using YOLO with Zed stereo camera

We take the input image data from the Zed stereo camera mounted on the quadcopter, determine its parameters and send it to the YOLO object detection model, which determines the bounding boxes of the objects in the image frame at that particular moment of the quadcopter flight. YOLO detects the object, defines the bounding box with the highest score around it, and sends the position coordinates and the detected features to the motion estimation module. This module determines the object's location and motion vectors and matches the pixel motion with the camera motion to determine the translation along x and y and the rotation angle. We focus on stabilizing the object with respect to the camera, so that the object remains stable in the image frame, using the feature point of the initial object coordinate in the first frame. Once the motion estimation parameters are determined, they are sent to the motion smoothing module, where the noise due to accumulated motion is removed and the image frame is smoothed using a Gaussian blur [14]. The smoothed output is then passed through the Kalman filter, which first determines the global motion vector of each frame; once the vector positions of the pixels in the frame are determined, it reconstructs the smoothed transform and predicts the next stable and uniform state of the pixels in the frame. The video can still contain some disturbance and noise after the Kalman filter because of motor vibration and unintended motion of the quadcopter. To overcome this, we use bandpass motion filtering in this thesis, which consists of Kalman filters and a bandpass filter. The bandpass filter truncates the periodic noise in the high- and low-frequency ranges; it sharpens the image, which can produce a coarse output, so a Gaussian blur is applied to smooth it. Once all possible noise has been removed from the incoming frame and the motion vectors of the image frame have been corrected, the frame is passed to the motion compensation module, where it is compared with the first frame, which is taken as the reference. The cumulative transform of each frame is summed and converted into a smooth affine transformation, which is finally passed to the warp module, where full-frame warps are computed between the original and the filtered motion to give the stable output.

In doing so, we are able to stabilize the object in the image frame and, to a certain extent, also the background, since the motion vectors of the detected object are correlated with the entire image frame. The most challenging part of this experiment is when the foreground contains more than one object along with background motion. To solve this problem, we introduce YOLO so that the proposed algorithm learns specific object features, the surrounding objects and their background features. When these features appear together or near the target in testing or in a real-world scenario, they increase the confidence of the YOLO detection, which in turn provides more reliable pixel coordinates of the bounding box to initiate the feature-matching search, and the bounding box coordinates with the highest pixel intensity value are selected. The algorithm continues to stabilize the image frames based on the detected object as long as it remains in the frame. If the object moves out of the frame, the next detected object is selected to stabilize the image frames based on its pixel coordinates. This plays an important role under occlusion and illumination variation: because YOLO is robust to these changes, it continues to provide the pixel information of the detected object, and the stabilization of the overall image frame is not affected. This may create a slight disturbance, but it does not affect the overall stabilization process, and the method continues to give a stable output without any large addition of noise or disturbance. Fig. 2.1 shows the complete system design of the method. For the implementation of our algorithm, Python 3.6 and OpenCV 4.3 are used to execute the program.
2.1 Object Detection and Recognition
Object detection is an important technique in many computer-vision and image-processing applications for detecting the occurrence of objects of a certain class within an input image frame. Object detection and recognition techniques can be classified into two main types: single-step and two-step approaches. Single-step approaches emphasize detection speed, for example YOLO, SSD and RetinaNet. Two-step approaches prioritize accuracy at some cost in detection speed, for example Faster R-CNN, Mask R-CNN and Cascade R-CNN. Object recognition is an important computer-vision and image-processing method for recognizing, identifying and locating objects within a picture together with their probabilities. With current deep-learning models, objects can be detected and recognized with much higher accuracy and efficiency. For this project, we use a custom YOLOv3 [13] for object detection and recognition, which facilitates the motion estimation and subsequent motion compensation needed to achieve a stabilized output.

2.1.1 YOLO

YOLOv3 [13] is a single-step convolutional network that predicts objects and simultaneously characterizes the bounding boxes and class probabilities for those boxes within the frame. It is trained on the classes of images that specifically optimize the detection performance. Unlike sliding-window and region-proposal-based approaches, YOLO sees the entire image during training and test time, so it implicitly encodes contextual information about the classes as well as their appearance. We train the convolutional layers shown in Fig. 2.3 on the ImageNet classification dataset [15] at 416 x 416 resolution with a learning rate of 0.001, a maximum batch size of 6000 and 18 convolutional filters. The resolution is then doubled for detection, with custom weights for our indoor purpose. In this thesis, YOLOv3 determines the object location, defines the bounding box of the detected object and passes the data to motion estimation for feature tracking and matching with the Lucas-Kanade algorithm [16]. Let the input image from the Zed stereo camera be (S_{i,j}), where i and j index the x- and y-axes of the image. The input image is passed through YOLO after the frame width and height have been determined. The system divides the input image into an S x S grid and then reduces the output to 13 x 13 x N x 23, where N is the number of bounding boxes.

Fig. 2.2 The Block diagram of YOLO bounding box and confidence score determination [13]

Fig. 2.3 The detailed convolutional layer of YOLO [13]


An object score is determined in addition to the object classification, and boxes with lower scores are discarded. Instead of predicting the absolute size of the box with respect to the entire image, the network predicts the anchor box that best matches the desired object of interest; the box width and height are then converted into image-frame coordinates. If the centre of an object lies within a grid cell, that grid cell is responsible for detecting the object. Each bounding box consists of five prediction parameters: i, j, w, h and confidence, predicted by the grid cell for a particular image frame. The (i, j) coordinates represent the centre of the bounding box relative to the bounds of the grid cell, while the width and height are predicted relative to the whole image.

Pr(Class_i | Object) · Pr(Object) · IOU = Pr(Class_i) · IOU     (1)

where Pr(Class_i) · IOU gives the class-specific confidence score for every box. Each grid cell predicts n conditional class probabilities Pr(Class_i | Object), conditioned on the grid cell containing an object, and IOU is the individual box confidence score. The result is passed through the convolutional neural network for object classification, and the classified bounding box with the highest confidence score is drawn around the object. The module then passes the centre pixel coordinate a = (a_i, a_j) of the detected object in the frame, together with the width and height of the detected bounding box, to the next module.
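As an illustration of how the highest-confidence bounding box and its centre pixel (a_i, a_j) can be obtained in practice, the sketch below uses OpenCV's DNN module to run a Darknet YOLOv3 model. The configuration and weight-file names and the threshold are placeholders for this sketch and are not taken from the thesis implementation.

```python
import cv2
import numpy as np

# Placeholder paths for the custom-trained YOLOv3 model described in the text.
net = cv2.dnn.readNetFromDarknet("yolov3_custom.cfg", "yolov3_custom.weights")
out_names = net.getUnconnectedOutLayersNames()

def detect_best_box(frame, conf_thresh=0.5):
    """Return (cx, cy, w, h) of the highest-confidence detection, or None."""
    H, W = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    best, best_score = None, conf_thresh
    for output in net.forward(out_names):
        for det in output:                        # det = [cx, cy, w, h, objectness, class scores...]
            score = float(det[4] * det[5:].max()) # objectness * best class probability
            if score > best_score:
                cx, cy, w, h = det[0] * W, det[1] * H, det[2] * W, det[3] * H
                best, best_score = (int(cx), int(cy), int(w), int(h)), score
    return best
```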

2.2 Motion Estimation
The motion estimation algorithm uses the pixel values to calculate the motion information per pixel and per frame. A typical video usually has more than one pixel of difference between consecutive frames, so an iterative coarse-to-fine approach is often used to calculate the pixel difference in a series of steps and to define the motion vectors. A coarse-to-fine iterative estimation, or pyramidal algorithm, transforms the original image into a coarse image using a Gaussian blur [14] to obtain the total movement between frames. This transformation focuses mostly on small pixel displacements at each step, while large displacements are usually ignored. However, most iterative approaches require an initial guess from which to start the search at some initial pixel coordinates in the image frame. Therefore, to provide the algorithm with an initial guess, we use the centre pixel coordinate of the bounding box of the object detected by YOLOv3 [13]. This saves the computation time that would otherwise be spent searching the entire frame for an initial point. In the Lucas-Kanade tracking algorithm [16], implemented with cv2.calcOpticalFlowPyrLK() [17], we first consider an image point a = [a_i, a_j]^T, obtained as the centre pixel of the detected bounding box with the help of YOLOv3, in the first image S. Using the image point a, we attempt to locate the point v = [v_i + d_i, v_j + d_j]^T in the second image K such that S(a) and K(v) are correlated, and feature matching proceeds to the next frame. The vector d = [d_i, d_j]^T is the image velocity at a, also known as the optical flow at a. In addition to the translation component d, an affine transformation A is estimated between S and K in the vicinity of the two image feature points a and v.

K(i, j) = S(d_i + a_i, d_j + a_j)     (2)

A = \begin{bmatrix} 1 + d_{ii} & d_{ij} \\ d_{ji} & 1 + d_{jj} \end{bmatrix}     (3)

where d_{ii}, d_{ij}, d_{ji} and d_{jj} characterize the transformation of the image patch. Then, using a Taylor series expansion, we minimize the residual function ε to find the vector d and the affine matrix A using eq. (4):

ε(d, A) = \sum_{x = a_i - w_i}^{a_i + w_i} \sum_{y = a_j - w_j}^{a_j + w_j} \left( S(i, j) - K(i + v_i, j + v_j) \right)^2     (4)

Once the magnitude and direction of the motion vectors are determined and matched with the previous frame, the trajectory between two consecutive frames is estimated in terms of scale, translation along the x- and y-axes, and rotation. The linear transformation and translation are given by:

v_i = a_1 + a_2 a_i + a_3 a_j     (5)
v_j = a_4 + a_5 a_i + a_6 a_j     (6)

where a_2 = λ cos θ, a_3 = −λ sin θ, a_5 = λ sin θ and a_6 = λ cos θ. Therefore, the polar representation of the affine transform matrix A_i of eq. (3) is given by

A_i = \begin{bmatrix} v_i \\ v_j \end{bmatrix} = \begin{bmatrix} λ cos θ & −λ sin θ \\ λ sin θ & λ cos θ \end{bmatrix} \begin{bmatrix} a_i \\ a_j \end{bmatrix} + \begin{bmatrix} a_1 \\ a_4 \end{bmatrix}     (7)

where i = 1, 2, 3, ..., n, λ is the scale, θ is the rotation of the image frame, and a_1, a_4 are the translations along the x- and y-axes respectively [18]. Once the motion parameters are determined, we feed the value of ε from eq. (4), along with the estimated motion parameters, to the motion smoothing module, where a Gaussian blur filter smooths the edges of the input frame and removes grittiness.
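For illustration, a minimal sketch of this per-frame motion estimation step is given below. It assumes grayscale versions of the previous and current frames and the YOLO box centre and size from Section 2.1; it tracks features inside the box with pyramidal Lucas-Kanade optical flow and recovers (dx, dy, da) from a partial affine fit. The parameter values and the use of cv2.estimateAffinePartial2D are choices made for this sketch, not necessarily those of the thesis implementation.

```python
import cv2
import numpy as np

def estimate_motion(prev_gray, curr_gray, box):
    """Estimate [dx, dy, da] between two frames from features inside the
    detected bounding box, using pyramidal Lucas-Kanade tracking."""
    cx, cy, w, h = box
    mask = np.zeros_like(prev_gray)
    mask[max(cy - h // 2, 0):cy + h // 2, max(cx - w // 2, 0):cx + w // 2] = 255
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01,
                                 minDistance=7, mask=mask)
    if p0 is None:
        return np.zeros(3)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    good_old = p0[status.flatten() == 1]
    good_new = p1[status.flatten() == 1]
    if len(good_old) < 3:
        return np.zeros(3)
    # Partial affine (rotation + translation + scale) between matched point sets
    m, _ = cv2.estimateAffinePartial2D(good_old, good_new)
    if m is None:
        return np.zeros(3)
    dx, dy = m[0, 2], m[1, 2]
    da = np.arctan2(m[1, 0], m[0, 0])   # rotation angle of the fitted transform
    return np.array([dx, dy, da])
```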

2.3 Motion Smoothing
In motion smoothing, the high-frequency shakiness accumulated from unintended camera motion is rectified and the edges are smoothed with a filter. Many motion smoothing techniques are available to determine the intended motion and smooth it, such as the particle filter, Kalman filter, Gaussian filter or median filter [1], which remove small noise and reject large outliers. In this thesis, we use a Gaussian blur based on a low-pass filter to smooth the edges and remove the noise and unwanted motion in the frame. The objective of motion smoothing is to obtain the smooth, desired movement of the feature points so that the motion parameters of the input frames can then be computed. It uses a Gaussian function to compute the transformation that is applied to each pixel in the image frame. The Gaussian function [14] in 1-D is given by:

G(i) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{d^2}{2\sigma^2}}     (8)

For 2-D, the Gaussian function is the product of the 1-D functions in each direction:

G(i, j) = \frac{1}{2\pi\sigma^2} e^{-\frac{d_i^2 + d_j^2}{2\sigma^2}}     (9)

where d_i and d_j are the distances from the reference point along the x- and y-axes, as defined in eq. (3), and the value of σ is set to 3. After applying the Gaussian blur, the surface generated by the function has concentric circular contours around the reference point in the image frame. A convolution matrix is therefore determined from the values of eq. (9) and convolved with the original image frame. The value of each new pixel is a weighted average of its neighbourhood: the heaviest weight comes from the convolution-matrix value at the original pixel, while neighbouring pixels receive smaller weights the farther they are from the centre pixel. As a result, the image frame is blurred and the noise is reduced, smoothing the frame with the cv2.GaussianBlur() [14] function in OpenCV for further motion filtering. The kernel width and height are set to 15 x 15.
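A minimal sketch of this smoothing step is shown below, assuming the frame is available as a NumPy/OpenCV image; the kernel size and sigma are the values stated above, and the function name is chosen for the example.

```python
import cv2

def smooth_frame(frame, ksize=(15, 15), sigma=3):
    """Gaussian smoothing before motion filtering.

    Applies the standard OpenCV Gaussian blur with the 15x15 kernel and
    sigma = 3 used in this section to suppress pixel-level grittiness."""
    return cv2.GaussianBlur(frame, ksize, sigma)

# Example usage on each incoming frame before the filtering stage:
# smoothed = smooth_frame(frame)
```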

2.4 Motion Filtering
After the image frame has been smoothed, the motion parameters and trajectories determined by the motion estimation module are fed to motion filtering, which removes the unwanted jitter and shakiness and corrects the frame path. A Kalman filter is employed to remove the high-frequency shakiness caused by the unintended motion of the quadcopter and to keep the overall motion of the model smooth and steady. However, residual jitter and noise still pass through this filter, which decreases the performance of the model. To remove such leftover noise, we introduce a bandpass filter in combination with the Kalman filter and a Gaussian blur to produce a smoother output.

Kalman filters are essentially low-pass filters: they cut only the high-frequency jitter, while telemetry transmission and sensor interference still contribute low-frequency noise. In addition, the noise present in the image frame itself, due to object and background motion, must be dealt with to obtain an efficiently stabilized output. Thus, a combination of a Kalman filter and a bandpass filter is used to resolve both the high-frequency jitter and the low-frequency noise, as shown in Fig. 2.4.

Fig. 2.4 The block diagram of the motion filtering and trajectory correction
In this method, the global motion vectors of each input image frame, obtained from eq. (4), are corrected by the Kalman filter and then further processed by bandpass filtering and Gaussian blurring. The Gaussian blur is used again here because the bandpass filter sharpens the image; the blur smooths it as explained in Section 2.3. The smoothed output is passed once more through the Kalman filter to rectify the pixel motion and remove any leftover jitter or noise. The Kalman filter estimates the upcoming state of the video frame from the values of the previous state of the system, without considering the whole sequence of past states. Let S_m be the state of the frame at time m, let T be the transfer matrix [19] that maps the current state to the next state so that the motion parameters of the state matrix (the pixel position and velocity) are preserved, and let N_m be the noise in the image frame. For the sake of brevity, the derivation of the state matrix and transfer matrix is omitted here; the detailed proof can be found in [19].

S_{m+1} = T S_m + N_m     (10)

For the current measurement of the system:

C_m = O S_m + H_m     (11)

where O is the observation matrix, H_m is the noise of the current frame, and C_m is the measurement of the system state S_m at time m. The task of the Kalman filter is to predict and correct the current state so that it approaches the true value of S_m. For the prediction, using cv.KalmanFilter.predict() [20]:

S'_m = T S_{m-1},
Q'_m = T Q_{m-1} T^T + E_m     (12)

where E_m is the covariance matrix of the noise in the current image frame, and S'_m and Q'_m are the estimated values of S_m and Q_m respectively. For the correction, using cv.KalmanFilter.correct() [20]:

A_g = Q'_m O^T (O Q'_m O^T + X)^{-1}
S_m = S'_m + A_g (C_m - O S'_m)
Q_m = (I - A_g O) Q'_m     (13)

where A_g is the Kalman gain, X is the covariance matrix for the noise of the current image frame, whose value is chosen here as 10^7 [21], and I is the identity matrix. For the Gaussian blur, a discrete Gaussian filter can be chosen by approximating the continuous Gaussian [14]:

g[m] = \frac{1}{G} e^{-(m/\sigma)^2}, where G = \sum_{i=-r}^{r} e^{-(i/\sigma)^2} and m \in [-r, r]     (14)

Then a Butterworth bandpass filter [22] is applied to remove periodic noise and avoid over-filtering. It is composed of a low-pass and a high-pass filter, so the transfer function of the bandpass filter is the product of the two:

T_L(i, j) = \frac{1}{1 + [f(i, j)/f_L]^{2n}}     (15)

T_H(i, j) = 1 - \frac{1}{1 + [f(i, j)/f_H]^{2n}}     (16)

T_{BPF}(i, j) = T_L(i, j) \cdot T_H(i, j)     (17)

where f(i, j) is the distance of the point (i, j) from the reference point of the image frame S(i, j) in the frequency domain, i.e. the centre origin of the frame, n defines the filter order, f_L is the low-pass cut-off frequency and f_H is the high-pass cut-off frequency. The high-pass cut-off frequency chosen here is 1.5 Hz and the low-pass cut-off frequency is set to 0.5 Hz. After trying a range of frequencies (0.2, 0.3, 0.5, 0.8, 1, 1.5 and 2 Hz), these two values were found to be the most suitable for this experiment. With a very small low-pass cut-off frequency the filtering is noisy and less efficient, since noise is passed along with the desired signal, giving a noisy image frame; with a higher cut-off frequency for the high-pass filter, significant image features may be removed, defeating the purpose of the filtering. Therefore, an ideal range of 0.5-1.5 Hz is selected. The filter has a continuous transfer function without interruptions, and the frequency range over which it operates smoothly depends on the filter order, which in this case is 3. After the frame is filtered by the bandpass filter, the motion parameters are discretely convolved using the equations above to obtain the new motion parameters, which are passed once more through the Kalman filter and the Gaussian blur of Section 2.3 to further remove unintended movement and noise and to smooth the image frame. The purpose of using the Gaussian blur here is to remove the sharpness of the edges caused by the bandpass filter.
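To make the filtering stage concrete, the sketch below filters one trajectory component (for example the per-frame dx values) with a 1-D constant-velocity Kalman filter from OpenCV and a third-order Butterworth bandpass with the 0.5-1.5 Hz passband discussed above. The Butterworth design uses scipy.signal, and the noise covariances and state model are illustrative assumptions rather than the exact values of the thesis implementation.

```python
import cv2
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, fps, f_lo=0.5, f_hi=1.5, order=3):
    """Butterworth bandpass over one trajectory component (e.g. dx per frame).
    fps is the frame rate in Hz; cut-offs follow the values chosen in the text."""
    b, a = butter(order, [f_lo / (fps / 2), f_hi / (fps / 2)], btype='band')
    return filtfilt(b, a, signal)

def kalman_smooth(signal):
    """1-D constant-velocity Kalman filter (position + velocity state)."""
    kf = cv2.KalmanFilter(2, 1)
    kf.transitionMatrix = np.array([[1, 1], [0, 1]], np.float32)       # T in eq. (10)
    kf.measurementMatrix = np.array([[1, 0]], np.float32)              # O in eq. (11)
    kf.processNoiseCov = (1e-4 * np.eye(2)).astype(np.float32)         # E_m (assumed)
    kf.measurementNoiseCov = np.array([[1e-1]], np.float32)            # X (assumed)
    out = []
    for z in signal:
        kf.predict()                                                   # eq. (12)
        est = kf.correct(np.array([[z]], np.float32))                  # eq. (13)
        out.append(float(est[0, 0]))
    return np.array(out)

# Example of the Kalman -> bandpass -> Kalman chain described in this section:
# dx_filtered = kalman_smooth(bandpass(kalman_smooth(raw_dx), fps=30))
```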

2.5 Motion Compensation
The corrected motion parameters obtained from motion filtering with eq. (17) are used to compensate the image frames and obtain a stabilized output. In this method, a reference frame is preselected and the subsequent frames are stabilized by comparing their features with those of the reference image. Consider a set of frames S_i, where i = 1, 2, 3, ..., n. The motion parameters estimated in the steps above are represented as the smooth affine transform A_smooth [23]:

A_{smooth} = \begin{bmatrix} λ cos θ & −λ sin θ & T_x \\ λ sin θ & λ cos θ & T_y \\ 0 & 0 & 1 \end{bmatrix}     (22)

where λ is the scale, θ is the rotation, and T_x and T_y are the corrected translation values of the image frame after applying the affine transform to compensate the motion parameters obtained from eq. (17).

A_{total} = \sum_{i=0}^{n} A_i     (23)

The total transform A_total represents the camera motion of all frames except the initial one, with A_i as in eq. (7). The sum of the affine transforms of every individual frame up to the current one is therefore used to stabilize the video: the bandpass and Kalman filters take the parameters generated by the image-frame motion through the total affine transform and reconstruct them into a smooth transform A_Smoothed, removing the noise generated by the quadcopter and the flight motion itself and damping the unintended motion of the frame by predicting the next state of the smooth trajectory with the Kalman filter.
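A minimal sketch of this step is shown below: the filtered parameters are packed into the 3 x 3 affine form of eq. (22), and the compensation transform applied to the current frame in eq. (24) is formed from the accumulated raw motion and the smoothed motion. Treating the scale λ as 1 is an assumption made for the sketch.

```python
import numpy as np

def affine_matrix(dx, dy, da, scale=1.0):
    """Build the 3x3 affine transform of eq. (22) from translation,
    rotation angle and scale."""
    c, s = scale * np.cos(da), scale * np.sin(da)
    return np.array([[c, -s, dx],
                     [s,  c, dy],
                     [0.0, 0.0, 1.0]])

def compensation_transform(raw_params, smooth_params):
    """Compose A_smoothed with the inverse of the accumulated raw transform,
    i.e. the matrix multiplied with the current frame in eq. (24)."""
    A_total = affine_matrix(*raw_params)        # accumulated (dx, dy, da) of the raw path
    A_smoothed = affine_matrix(*smooth_params)  # filtered (dx, dy, da)
    return A_smoothed @ np.linalg.inv(A_total)
```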

2.6 Warping Design using OpenCV


OpenCV provides functions such as warpAffine() that handle the stabilization process [10]. The affine matrices obtained from motion compensation are used to produce a stable visual output with the warping function. The pixel-position transformation between image frames is given by a warping function W with parameters u = (u_1, u_2, ..., u_n)^T. W(i, u) gives the new pixel coordinates in the image frame S obtained by warping the pixel at position v in the image coordinates of the previous frame, taken from the template P. The template image P(i) is the region of interest extracted at some location a = (a_i, a_j) in the previous frame. Affine warping is given by W(i, u) = A_affine · i, with the affine matrix of eq. (22), and the smoothed frame is generated using eq. (22). The sum of squared differences is the error measure used for tracking, in frame K, the feature locations corresponding to the features found at a = (a_i, a_j) of the previous frame S. To minimize the error between P(i) and W(i, u), the Newton-Raphson method computes the displacement ∆u iteratively until ∆u ≤ ε or a maximum number of iterations is reached; the parameters are updated after each frame using eq. (26). The current frame X_current is then passed to the warping module to obtain the smooth output frame X_Smoothed using the equation from OpenCV [11]:

X_{Smoothed} = A_{Smoothed} \cdot (A_{total})^{-1} \cdot X_{current}     (24)

\sum_{i=a_i-w}^{a_i+w} \sum_{j=a_j-w}^{a_j+w} \left[ S(W(i, u)) - P(i) \right]^2     (25)

u \leftarrow u + \Delta u     (26)

This function is executed until the end of the video sequence is reached, giving a final stable output with the motion stabilized and the jitter and shakiness removed. Once the end of the video sequence is reached, the program stops and the video is saved as an mp4 file. Then, using the Vidstab library, we extract the pixel motion trajectories of the original input video and of the stabilized video; these trajectories are plotted to verify the effectiveness of the method, as illustrated in the experimental results of Chapter 4.
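A minimal sketch of the final warping call is shown below; it applies the top two rows of the compensation matrix from the previous section to the current frame with cv2.warpAffine. The border handling is an assumption made for the sketch.

```python
import cv2

def warp_frame(frame, compensation_3x3):
    """Warp the current frame with the 2x3 affine part of the compensation
    transform (eq. (24)) to produce the stabilized output frame."""
    h, w = frame.shape[:2]
    M = compensation_3x3[:2, :]          # cv2.warpAffine expects a 2x3 matrix
    return cv2.warpAffine(frame, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
```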

2.7 Detailed Procedure for Proposed Visual Stabilization


The proposed method takes n unstabilized image frames as input and passes them through the modules described above (YOLO, motion estimation, motion smoothing, motion filtering and motion compensation) to construct a stabilized video. The algorithm is summarized as follows:

Algorithm: Real-time video stabilization


Input: n unstabilized image frames
Step 1 Get the width and height of the initial frame and store the information.
Step 2 Pass each frame through YOLO and resize the image
I. Detect the object having a maximum confidence score.
II. Apply CNN
III. Draw bounding box and pass centre pixel coordinates to motion estimation for
each frame.
Step 3 For each image frame
I. Estimate motion vectors using the Lucas-Kanade tracking algorithm
II. Reduce grittiness and smoothen upcoming image frame.
Step 4 Reduce high and low frequency jitter and shakiness in image frames
I. Predict and correct motion trajectories with Kalman filter
II. Reduce low and high frequency noise using bandpass filter
Step 5 Pass each denoised image frame to Warping function to generate a smooth and
stabilized output.
Output: Stabilized Image/Video.

2.8 Summary
The use of YOLOv3 [13] for object detection and recognition enhances the precision of feature detection and makes motion estimation more reliable, because it is not affected by background motion in the image frame. This is the most common limitation of existing algorithms, which assume little or no background motion and focus only on the object in the foreground, causing difficulties in a crowded indoor scenario. The Kalman filter and the Butterworth bandpass filter make the input frames noise-free and remove the high-frequency jitter and shakiness caused by the quadcopter's unintended motions, yielding a robust and efficient stabilization algorithm for aerial or handheld use without additional hardware such as a gyro or gimbal, which keeps it cheap and effective. We were also able to handle the dynamic motion of the camera with respect to the object under gradual, drastic, near-foreground and random conditions.

As the experimental results will show, the major issue other algorithms face under dynamic motion is a lack of detected object features to stabilize on, or an object so close to the camera that parallax error occurs and the algorithm fails to differentiate between foreground and background. In our algorithm, as long as the object is inside the frame and is detected, the image can be stabilized. Especially in the cases of drastic motion changes and near-foreground motion, we will observe in the later experiments that the object sometimes becomes blurry due to fast changes in position, making it difficult for the OpenCV feature-detection algorithms to detect the object in the frame and extract its features, which leads to poor stabilization results.

Chapter 3 Quadcopter Control Design
In recent times there has been huge demand for quadcopters for surveillance, rescue operations, video taking, industrial inspection, monitoring, precision farming and various other tasks, and quadcopters play a vital role in today's society. However, tracking a moving object with a UAV is very challenging because the object's features must be tracked at every instant, and without dedicated stabilization hardware it is difficult to provide a steady, stable view of the object to be tracked; digital stabilization is therefore necessary. Our method can make the output stable enough to extract sufficient features for efficient tracking of an object in almost all kinds of situations.

Fig. 3.1 shows the complete system design. The input from the zed stereo camera is fed
to the visual stabilization and YOLOv3[13] object detection module. For the experiment, we
used the turnigy radio transmitter [24] to control the quadcopter which is controlled by the
pixhawk[25] and the visual stabilization and object tracking is done by the Nvidia Tx2 [26] on
board as we will see in the later section. Pixhawk controls the ESC and the motor motion by
providing the location and altitude, information using the inbuilt sensors. Pixhawk is connected
with an fpv air module which communicates with the ground station i.e the laptop through the
ground module antenna which has 915hz frequency range through the software
QGroundControl [27]. Where we can change the flight mode and communication using that
interface which is explained in the next section. QGC can control the throttle, roll, pitch and
yaw of the Quadcopter using the radio link.

Fig. 3.1 The Control design of the Quadcopter and ground control
3.1 Quadcopter Design
The quadcopter consists of four motors, one mounted on each arm of the frame with alternating rotation directions. As shown in Fig. 3.2(a) [28], each motor carries a propeller fixed to rotate in a clockwise or counter-clockwise direction; the rotation of the propellers produces the thrust that lets the quadcopter hover and fly using its roll, pitch and yaw. For a positive roll, the speeds of motors W3 and W2 are decreased; for yaw, the speeds of W1 and W2 are increased; for a pitch that moves the quadcopter forward, the speeds of W4 and W2 are increased; and for throttle, the speed of all motors is increased. Fig. 3.2(b) [29] shows the carbon-fibre S500 frame used for the quadcopter experiments in this thesis; it is lightweight and durable, so it does not add much to the overall weight of the quadcopter with all components on board, which is around 2.3 kg in total. Fig. 3.3(a) and Fig. 3.3(b) show the complete quadcopter platform with the onboard computer (Nvidia TX2), Zed stereo camera [30], Pixhawk and radio antenna.
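To illustrate the general principle behind these motor-speed adjustments, the sketch below shows a generic symmetric mixer for an X-configuration quadcopter. The motor ordering and sign conventions are illustrative assumptions and do not necessarily match the W1-W4 numbering of Fig. 3.2 or the actual Pixhawk mixer.

def mix_motors(throttle, roll, pitch, yaw):
    # Each attitude command speeds some motors up and the opposite ones down:
    # roll shifts speed between the left and right pairs, pitch between the
    # front and rear pairs, and yaw between the two diagonal (CW/CCW) pairs.
    front_left  = throttle + pitch + roll - yaw
    front_right = throttle + pitch - roll + yaw
    rear_left   = throttle - pitch + roll + yaw
    rear_right  = throttle - pitch - roll - yaw
    return front_left, front_right, rear_left, rear_right

# Pure throttle raises all four motor speeds equally
print(mix_motors(0.6, 0.0, 0.0, 0.0))    # (0.6, 0.6, 0.6, 0.6)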

Fig. 3.2 (a) shows the motor configuration [28] and (b) the S500 frame design of the quadcopter [29]

Fig. 3.3 (a) shows the complete quadcopter design and (b) the fully assembled embedded system on the quadcopter

3.2 Hardware Devices

For our experiments we used Turnigy D2836/11 750 kV motors [31], as shown in Fig. 3.4. The motor accepts an input voltage of 7.4-11.1 V and rotates at around 8325 RPM. Its maximum current draw is 14 A at full load and 0.8 A at no load. The maximum power is 210 W with 0.16 Ω internal resistance, and the maximum thrust is 800 g.

Fig. 3.5(a) [32] shows the counter-clockwise (CCW) propeller used on the quadcopter and Fig. 3.5(b) the clockwise (CW) one. The quadcopter carries two CW and two CCW propellers, one mounted on each motor. These propellers are lightweight, rigid and ideal for multi-rotors; each is 10 inches in diameter with a 4.5-inch pitch.

Fig. 3.4 Brushless motor [31]

Fig. 3.5 Propellers (a) counter-clock wise (CCW), (b) clockwise (CW) [32].

As shown in Fig. 3.6, the ESC [33] is an electronic circuit that regulates the motor speed. This kind of motor driver is often used in radio-controlled models, most commonly with brushless motors. The ESC generates three-phase electric power for the motor from a low-voltage energy source. The ESC is controlled by a pulse-width signal: the motor turns on for pulses above 1060 µs, reaches full power at 1860 µs, and stops when the pulse falls below 1060 µs.
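As a small illustration of this pulse-width interface, the helper below maps a normalized throttle command onto the 1060-1860 microsecond range quoted for this ESC; the clamping behaviour is an assumption for illustration.

def throttle_to_pulse_us(throttle, stop_us=1060, full_us=1860):
    # Map a throttle command in [0, 1] to an ESC pulse width in microseconds.
    # Pulses at or below stop_us stop the motor; full_us is full power.
    throttle = max(0.0, min(1.0, throttle))
    return int(stop_us + throttle * (full_us - stop_us))

print(throttle_to_pulse_us(0.5))    # 1460 us, i.e. roughly half power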

The Pixhawk [25], shown in Fig. 3.7, is a middleware flight controller used to control and navigate the flight. Based on the fusion of built-in and external sensor data, it issues commands to the ESCs and motors. It is an open-source platform that can be adapted to individual needs.

Fig. 3.6 ESC Motor Driver [33]

Fig. 3.7 The configuration and functionality of the pixhawk and its port description

The Jetson TX2, shown in Fig. 3.8, is an AI supercomputer on a module [26]. Its compact, power-efficient form factor makes it well suited to deep learning and embedded applications such as robots, drones, smart cameras and portable medical devices, since complex data can be processed on-board at the edge. This means it can perform visual stabilization as well as object detection and tracking onboard during a quadcopter flight much faster than other embedded platforms such as the Raspberry Pi or Intel edge devices. Bridging AI with such a compact and fast platform unlocks huge potential for devices in network-constrained environments with minimal hardware cost and complexity.

Fig. 3.8 Nvidia TX2 development board

The FPV telemetry radio is a compact and inexpensive open-source platform that typically allows a range of around 500 m to 1 km, as shown in Fig. 3.9. The radio runs open-source firmware, communicates using MAVLink packets and integrates with Mission Planner and QGroundControl. These radios come in two frequency bands, 433 MHz and 915 MHz; the one we use is the 915 MHz version. It has a sensitivity of -121 dBm and a transmit power of up to 20 dBm (100 mW). It is based on the HM-TRP radio module [34], with a Si1000 8051 microcontroller and a Si4432 radio chip, and consists of one ground module and one air module. The ground module is connected to the computer running QGroundControl through a serial COM port, and the air module is connected to the Pixhawk's Telem1 port at a baud rate of 57600.
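A minimal sketch of opening this telemetry link from the ground-station side with pymavlink is shown below; the serial device name is an assumption and depends on how the ground module enumerates on the computer.

from pymavlink import mavutil

# The ground radio typically appears as a USB serial device (name is system-dependent)
link = mavutil.mavlink_connection("/dev/ttyUSB0", baud=57600)

# Block until the Pixhawk's heartbeat arrives over the 915 MHz radio link
link.wait_heartbeat()
print("Connected to system %d, component %d" % (link.target_system, link.target_component))

# Read a few telemetry packets (e.g. attitude) forwarded by the air module
for _ in range(5):
    msg = link.recv_match(type="ATTITUDE", blocking=True, timeout=5)
    if msg:
        print("roll=%.2f pitch=%.2f yaw=%.2f" % (msg.roll, msg.pitch, msg.yaw))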

The transmitter [24] is a low-power radio with SBUS/IBUS and CPPM/PWM support. It has a 9-channel mode with a frequency range of 2.4 GHz and works on a 16 V AA battery power supply. It has two joysticks for controlling the throttle, roll, pitch and yaw of the quadcopter, as shown in Fig. 3.10. Its transmit power is no more than 20 dBm and it has a bandwidth of 500 kHz at 2.4 GHz. Its multi-mode AUX channels can be assigned to different flight modes of the quadcopter as needed, such as Manual, Stable, Altitude, Position, Onboard, Hover, Land and Take-off.

Fig. 3.9 Air and ground module for telemetry communication [34]

Fig. 3.10 Turnigy radio transmitter

3.3 Sensors
3.3.1 Zed Stereo Camera
Computer stereo vision emulates human depth perception using stereo cameras. The Zed stereo camera is a passive stereo camera that replicates human vision and builds a 3D map by comparing the difference between the left and right images [30], as shown in Fig. 3.11. It comes with an SDK that helps configure it for task-specific use, and a built-in IMU that lets it determine its position in the global world frame without GPS. It offers four different video modes for use during flight, as shown in Table 3.1 below, and a pose update rate of up to 100 Hz, which helps record moving objects without frame loss. Its sensors have a resolution of 4 Mpixels with large 2-micron pixels and a native 16:9 format for a wider horizontal field of view. For the navigation of a quadcopter in an indoor environment, depth sensing and stereo vision perception are essential so that the objects and environment in front of it can be perceived as they are, without being reduced to a 2D representation.
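Because the ZED also enumerates as a standard UVC camera that outputs the left and right views side by side, stereo frames can be grabbed with plain OpenCV as sketched below; the device index and the 720p side-by-side geometry (two 1280x720 views, matching Table 3.1) are stated as assumptions for illustration, while the thesis itself configures the camera through the ZED SDK.

import cv2

cap = cv2.VideoCapture(0)                      # device index is system-dependent
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 2560)        # 720p mode: two 1280x720 views side by side
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 30)

ok, frame = cap.read()
if ok:
    h, w = frame.shape[:2]
    left, right = frame[:, :w // 2], frame[:, w // 2:]   # split the stereo pair
    print("left:", left.shape, "right:", right.shape)
cap.release()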

3.3.2 Zed Camera Calibration


The Zed stereo camera needs a calibration file to function properly. It generally ships with a factory-calibrated file tied to its serial number, but to improve accuracy for a user-specific task it can be recalibrated locally using the Zed calibration utility, as shown in Fig. 3.12. After the SDK is installed, the calibration utility can be found at the following location on our system.

darkratio@darkratio:~$ /usr/local/zed/tools/ZED/tools/Calibration
We follow the on-screen instructions to perform the calibration. Once the calibration is done, the new file is saved, overwriting the existing one.
Table 3.1 Different formats of video output [30]

Video mode    Frames per second    Output resolution
2.2K          15                   4416x1242
1080p         30                   3840x1080
720p          60                   2560x720
WVGA          100                  1344x376

Fig.3.11 Zed Stereo Camera [30]

Fig. 3.12 The camera calibration using zed calibration utility

3.4 User Interface
Fig. 3.13 shows the complete block diagram [35] of the communication between the ground station, the onboard computer and the Pixhawk over the radio link. From the left, it shows the quadcopter frame with its landing gear, followed by the motors, ESCs and propellers, which are controlled by the Pixhawk in the autopilot hardware system through voltage commands sent to the ESCs. Manual commands for throttle, roll, pitch and yaw are given by the radio control system: the RC receiver on board is connected to the Pixhawk through a PWM converter for the motor signal commands, while the RC transmitter sends the control signals over the radio link to the receiver, which in turn passes them to the Pixhawk so it can command the ESCs for motor speed control. The ground control computer is connected to the Pixhawk through the QGroundControl software [27], an open-source cross-platform application responsible for the connection and communication between the Pixhawk and the ground station. Its user-friendly interface lets us set the flight mode, control the speed of each mode, and set failsafe thresholds and other voltage and current distribution parameters. The link uses the FPV radio telemetry with its ground and air modules operating in the 915 MHz band.


Fig. 3.13 Block diagram of onboard computer and ground station communication design [35]

Chapter 4 Experimental Results
In this chapter, we demonstrate the effectiveness of the proposed method in four different motion cases with some manually created artefacts such as blurriness, illumination variation and occlusion. We first conducted experiments with a handheld camera to test the feasibility and effectiveness of the proposed method; the final experiments, used to compare its stabilization performance against established methods, were carried out with the camera mounted on the quadcopter. The quadcopter was controlled manually with the radio transmitter (joystick), and additional disturbances were introduced through the transmitter during flight, on top of the system's own vibrations and jitter. The experimental results show that the proposed method using YOLO was able to reduce the aforementioned artefacts.

Hardware and Software Parameters Description


For these experiments we used an Nvidia TX2 (GPU: NVIDIA Pascal architecture with 256 CUDA cores, 1.3 TFLOPS; CPU: dual-core Denver 2 64-bit plus quad-core ARM A57 complex; RAM: 8 GB 128-bit) running Ubuntu 18.04 with JetPack 4.4, a Zed stereo camera at HD720p and 30 fps, and a Pixhawk 4 for flight control. We trained the YOLOv3 model on a custom dataset drawn from Open Images Dataset V6+ [15]. The test videos are real-time videos recorded during the experiments and then used for benchmarking against other established methods. We implemented the algorithm in Python using OpenCV modules and YOLOv3.
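For reference, a minimal way to load a YOLOv3 model with OpenCV's DNN module and keep the highest-confidence detection is sketched below; the file names and the 0.5 confidence threshold are assumptions for illustration, not the exact training artefacts used in this thesis.

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")   # assumed file names
out_layers = net.getUnconnectedOutLayersNames()

def best_detection(frame, conf_thresh=0.5):
    # Return (x, y, w, h, score) of the highest-confidence box, or None.
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    best = None
    for output in net.forward(out_layers):
        for det in output:              # det = [cx, cy, bw, bh, objectness, class scores...]
            score = float(det[4] * det[5:].max())
            if score > conf_thresh and (best is None or score > best[4]):
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                best = (int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh), score)
    return best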

Experiment Scenario Explanation


For these experiments, we selected four motion cases to validate the robustness of our method against blurriness, illumination variation and occlusion, and to compare its effectiveness with established works in this field. The cases are as follows.

(1) Gradual motion


In video stabilization, the motion of the object to be stabilized plays an important role: the features determined by the movement of the object in the frame decide the accuracy of the motion estimation and hence of the motion compensation of the overall frame. For this case we chose two separate scenarios: (1) the object and the camera are both made to move slowly, and (2) only the camera moves w.r.t. the object, giving enough time to detect the object and extract its features for stabilization. This is the most general case of video stabilization, encountered in most applications.

(2) Drastic motion


In drastic motion, (1) the object and the camera both change their positions drastically w.r.t. each other, or (2) only the camera changes its position drastically w.r.t. a static object, which makes stabilization challenging: the object sometimes becomes blurry, making it even harder for the algorithm to work on and stabilize the frame.

(3) Near foreground motion


In this case both the object and the camera move near each other, and the object is made to alternate between positions close to and far from the camera to check robustness against parallax and against the object overfilling the frame. In such cases it is hard to differentiate the object from the foreground and background.

(4) Random motion of Object with quadcopter tracking


As shown in Fig. 4.1, we execute the experiment with random motion by randomly combining all the aforementioned cases in one scene, testing the robustness of the stabilization method in a more realistic scenario. In this test scenario, a person in the background depicts the drastic motion of an object by running randomly in the frame, while another person depicts the gradual motion change by walking slowly. In addition, a third person in the foreground moves close to the camera at varying speed, depicting a near-foreground object. The most challenging part of this experiment is that the foreground contains more than one object along with background motion. To handle this, we introduced YOLO so that the proposed algorithm learns specific object features together with the surrounding objects and their background features. When these features are present together or near the target in a testing or real-world scenario, the accuracy of the method further increases the confidence of the YOLO detection, which in turn provides more prominent pixel coordinates of the bounding box to initiate the feature-matching search; the bounding-box coordinates with the highest pixel intensity value are selected. The algorithm continues to stabilize the image frames based on the detected object as long as it is present in the frame; if it moves out of the frame, the next detected object is selected and the image frames are stabilized based on its pixel coordinates. This plays an important role under occlusion and illumination variation: since YOLO is robust to these changes, it continues to provide the pixel information of the detected object, and stabilization of the overall image frame is not affected.

However, such a switch creates a slight disturbance; it does not affect the overall stabilization process, and the method continues to give a stable output without adding much noise or disturbance. With this experimental case we thus try to verify the effectiveness of the method. In Fig. 4.2 the object is in motion executing the aforementioned cases while the Zed camera attached to the quadcopter captures the input video, which is passed through the stabilization algorithm, recorded and transferred to the ground station (the laptop) through a USB cable. Fig. 4.3 (a), (b) show different instances of the random-motion case, combining all the above-mentioned cases with random motion of the quadcopter, which captures the objects in the scene and passes the frames to YOLO for object detection and stabilization. The detection, object tracking and stabilization are shown in Fig. 4.4: (a), (c), (e), (g) and (i) illustrate the input images from the Zed camera, while (b), (d), (f), (h) and (j) show the stabilization algorithm in process with object detection and tracking in different scenarios with illuminance changes. The effect of YOLO on blurriness and illumination variation in the image frames can be clearly observed during the experiment.
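The selection rule described above can be summarized in a few lines; the sketch below is a simplified reading of that rule, with the scoring and the in-frame test stated as assumptions for illustration.

def choose_stabilization_anchor(detections, frame_shape):
    # detections: list of (x, y, w, h, score) boxes from the detector.
    # Keep only boxes that are still inside the frame; if the current target
    # has left the frame, the next best detection is used automatically.
    h_img, w_img = frame_shape[:2]
    inside = [d for d in detections
              if d[0] >= 0 and d[1] >= 0 and d[0] + d[2] <= w_img and d[1] + d[3] <= h_img]
    if not inside:
        return None                             # nothing to stabilize on in this frame
    x, y, w, h, _ = max(inside, key=lambda d: d[4])
    return (x + w // 2, y + h // 2)             # centre pixel passed to motion estimation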

Fig. 4.1 Experimental scenario setup in Engineering Building 5, with regions for gradual motion, drastic motion and near-foreground motion

Fig. 4.2 The Experimental scenario for the quadcopter motion with respect to the object motion in both
foreground and background of the scene.

Fig. 4.3 (a), (b) Different experimental scenarios with the quadcopter for random motion, combining gradual, drastic and near-foreground changes in one scene

Fig. 4.4 (a), (c), (e), (g), (i) show the input images from the Zed camera; (b), (d), (f), (h), (j) show the stabilization algorithm in process with object detection and tracking in different scenarios with illuminance changes.

Experimental Results

We plot the stabilized trajectory against the original trajectory along x and y for translation in the handheld-camera case, using eq. (7) for the original input video. For the stable video output, we plot the trajectory of the image frames using eq. (22), which is updated after passing through the motion filter before being fed to the warping module. In the plots below, dx and dy are the pixel translations in the x and y directions respectively, and da is the rotation of the pixels in degrees. These results use the trajectory data generated by motion estimation for the input data and for the stabilized output. In the following results, a trajectory that stays closer to its reference axis is more stabilized; the larger the deviation from the axis, the more noise it contains and the less stable it is.
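The trajectory plots in this section can be reproduced with a few lines of Matplotlib from the logged per-frame dx, dy and da values, for example as sketched below; the array names are placeholders.

import numpy as np
import matplotlib.pyplot as plt

def plot_trajectory(original, stabilized, label):
    # Plot original vs. stabilized displacement against frame number.
    frames = np.arange(len(original))
    unit = "degrees" if label == "da" else "pixels"
    plt.figure(figsize=(8, 3))
    plt.plot(frames, original, label="original " + label)
    plt.plot(frames, stabilized, label="stabilized " + label)
    plt.axhline(0, linewidth=0.8)               # reference axis: closer means more stable
    plt.xlabel("frame number")
    plt.ylabel("%s (%s)" % (label, unit))
    plt.legend()
    plt.show()

# e.g. plot_trajectory(orig_dx, stab_dx, "dx")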

4.3.1.1 Handheld camera motion w.r.t. the object


In this experiment we perform video stabilization with the proposed algorithm for a handheld camera; the object and the camera are both made to move gradually w.r.t. each other to depict a dynamic gradual motion change. From Fig. 4.5 (a), (b), (c) we can observe that the original video frames contained noise and a lot of disturbance, which is rectified by the proposed method: it stabilizes the translation and rotation of the input frames very efficiently and produces an almost flat trajectory w.r.t. the reference axis.

4.3.1.2 Gradual motion trajectory with a static object using a handheld camera
In this experiment we used a static object and put the camera into gradual motion to determine the stabilization efficiency of the proposed method. From Fig. 4.6 (a), (b) and (c) we can observe that the proposed method stabilizes the input frames by detecting the object in the image frame, performing effective motion estimation and using motion filtering to remove the noise, yielding an almost flat curve w.r.t. the original unstabilized trajectory. Even when the noise persists for a longer time, as between frames 60 and 170 of the unstabilized trajectory in Fig. 4.6 (a), the proposed method removes it and the stabilized output becomes smooth, with a flatter curve.

(a)

(b)

(c)
Fig. 4.5 Handheld camera motion (a)Stabilized dx trajectory for gradual motion w.r.t original trajectory
along the x-axis (b) Stabilized dy trajectory for gradual motion w.r.t original trajectory along the y-axis
(c) Stabilized da trajectory for gradual motion w.r.t original trajectory.

(a)

(b)

(c)
Fig. 4.6 Gradual motion with a handheld camera (a) Stabilized dx trajectory for the gradual motion of
camera w.r.t object along the x-axis (b)Stabilized dy trajectory for the gradual motion of camera w.r.t
object along the y-axis (c) Stabilized da trajectory for the gradual motion of camera w.r.t object along
the x-axis.

4.3.1.3 Gradual motion trajectory with the quadcopter
In this experiment the object and the quadcopter are both made to move gradually w.r.t. each other. In Fig. 4.7 (a), (b), (c) we can see that the proposed method stabilizes the input frames in translation and rotation: the curves are almost flat with respect to the reference axis, removing nearly all the noise accumulated in the input frames.

4.3.2.1 Handheld camera motion w.r.t. the object


In this experiment the object and the camera are made to move quickly with respect to each other. The purpose is to validate that even if the object changes position fast and sometimes becomes blurred, the proposed method can still detect the object and stabilize the frame. This is confirmed by Fig. 4.8 (a), (b), (c): the stabilized trajectory curve is smooth and very close to the reference axis, removing the noise accumulated over the motion of the input frames, although a few deviations remain, as shown in Fig. 4.8(c).

4.3.2.2 Drastic camera motion w.r.t. static object


In this experiment the object is static and only the camera is made to move drastically around it. In Fig. 4.9 (a), (b), (c) we observe that the dx trajectory is very smooth, whereas dy and da show small disturbances, but overall the image is stabilized. The small disturbances arise because both software and hardware stabilization methods can only remove shakiness and jitter up to a certain level; if the shakiness is too strong, the method can reduce its effect but cannot completely stabilize the frame.

(a)

(b)

(c)
Fig. 4.7 Gradual motion with a quadcopter (a) Stabilized dx trajectory for gradual motion w.r.t original
trajectory along the x-axis (b) Stabilized dy trajectory for gradual motion w.r.t original trajectory along
the y-axis (c) Stabilized da trajectory for gradual motion w.r.t original trajectory.

(a)

(b)

(c)
Fig. 4.8 Handheld camera with drastic motion (a) Stabilized dx trajectory for drastic motion w.r.t
original trajectory along the x-axis (b) Stabilized dy trajectory for drastic motion w.r.t original trajectory
along the y-axis (c) Stabilized da trajectory for drastic motion w.r.t original trajectory.

(a)

(b)

(c)
Fig. 4.9 Handheld camera motion w.r.t. Static object (a) stabilized dx trajectory for the drastic motion
of camera w.r.t object along the x-axis (b) stabilized dy trajectory for the drastic motion of camera w.r.t
object along the y-axis (c)stabilized da trajectory for the drastic motion of camera w.r.t object along the
x-axis.

4.3.2.3 Drastic motion trajectory with quadcopter


In this experiment the camera is attached to the quadcopter and the object is made to run in a certain pattern; the input data is captured and sent to the proposed stabilization method for processing. Fig. 4.10 (a), (b) show a flat translation trajectory with respect to the reference axis. For the rotation in Fig. 4.10 (c) some disturbance can be observed, but overall the video is stabilized and the jitter and shakiness are reduced to a large extent.

4.3.3.1 Near foreground motion trajectory with a handheld camera
In Fig. 4.11 (a), (b) and (c) we observe that dx contains some noise: around frame 150 there is a large deviation from the reference axis, and between frames 275 and 350 deviations due to jitter and shakiness appear. For dy and da, the remaining translation and the rotation, the curves are flat and smooth, making the overall video a stabilized output.

4.3.3.2 Near foreground camera motion w.r.t. the static object


In this experiment we move the camera close to a static object while jitter and shakiness are introduced by hand. In Fig. 4.12 (a), (b) and (c) we observe that the proposed method suppresses the jitter and shakiness and removes the noise to produce a stable, smooth curve. There is a slight deviation from the reference axis, but overall the jitter and trembling of the video frames are removed.

(a)

(b)

(c)
Fig. 4.10 Drastic motion with a quadcopter (a) Stabilized dx trajectory for drastic motion w.r.t original
trajectory along the x-axis (b) Stabilized dy trajectory for drastic motion w.r.t original trajectory along
the y-axis (c) Stabilized da trajectory for drastic motion w.r.t original trajectory along the x-axis

(a)

(b)

(c)
Fig. 4.11 Handheld camera with near foreground motion (a) Stabilized dx trajectory for near foreground
motion w.r.t original trajectory along the x-axis (b)Stabilized dy trajectory for near foreground motion
w.r.t original trajectory along the y-axis (c) Stabilized da trajectory for near foreground motion w.r.t
original trajectory along the x-axis

(a)

(b)

(c)

Fig. 4.12 Handheld camera motion w.r.t. static objects (a) Stabilized dx trajectory for near foreground
motion of camera w.r.t object along the x-axis (b) Stabilized dy trajectory for near foreground motion
of camera w.r.t object along the y-axis (c) Stabilized da trajectory for near foreground motion of camera
w.r.t object

4.3.3.3 Near foreground motion trajectory with the quadcopter


In this experiment the camera was attached to the quadcopter and the object moved in close proximity to it. In Fig. 4.13 (a), (b), (c) we can observe that the dx, dy and da trajectories produced by the proposed method reduce the shakiness of the input frames and remove the noise, giving more stable and smoother curves. The da trajectory shows more deviation from the reference axis, indicating that although the input frames contained some rotation, the shakiness and trembling of the input video were removed and the motion was made more stable and constant.

(a)

(b)

(c)
Fig. 4.13 (a) Stabilized dx trajectory for near foreground motion w.r.t original trajectory along the x-
axis (b)Stabilized dy trajectory for near foreground motion w.r.t original trajectory along the y-axis (c)
Stabilized da trajectory for near foreground motion w.r.t original trajectory along the x-axis.

(a)

(b)

(c)
Fig. 4.14 Random motion with a quadcopter (a) Stabilized dx trajectory for random motion w.r.t original
trajectory along the x-axis (b) Stabilized dy trajectory for random motion w.r.t original trajectory along
the y-axis (c) Stabilized da trajectory for random motion w.r.t original trajectory along the x-axis

Evaluation and Discussion

This section compares the results with several established methods to assess the efficiency and robustness of the proposed method in various cases. The methods used for comparison are:

1) OpenCV video stabilization [9]; for comparison, the code was taken from Learn OpenCV [36].
2) Object tracking and video stabilization using OpenCV [10]; for comparison, the code was taken from VidStab [10].
3) Robust video stabilization using particle keypoint update and L1-optimized camera path [11]; for comparison, the code was taken from [37].
4) Deep online video stabilization using multi-grid warping transformation learning [12]; for comparison, the code was taken from [38].
For benchmarking we consider the stabilization of the translation trajectories along the x- and y-axes and the rotation transformation. In the following graphs we compare the trajectories of dx and dy, the pixel displacements from the x- and y-axes respectively; the larger the displacement or deviation from the reference axis, the less stable the output video. We recorded a total of four videos with the Zed camera on the quadcopter, for gradual, drastic, near-foreground and random motion, and post-processed them with all of the above methods so that every method works under the same inference constraints from the input video. The graphs plot the output trajectories of the stabilized videos produced by each method. OpenCV stabilization and object tracking with video stabilization using OpenCV are preferred for generic stabilization, i.e. a static object with gradual motion, while robust video stabilization and deep online stabilization handle both static and dynamic motion of the object with respect to the camera. We summarize all the cases and test our method against the others to validate their effectiveness and robustness in different scenarios. The horizontal axis shows the frame number for each experiment and the vertical axis the pixel displacement along the x- or y-axis; for the rotation angle, the vertical axis gives the rotation in degrees. Finally, the mean squared error of each result is calculated and compared to quantify the graphs.

Mean Squared Error (MSE) [39] is an effective quantitative evaluation of video stabilization methods, and we use it as the quantitative measure in this thesis: the lower the MSE, the more stable the video. The mean squared error is given by [39]:

MSE(X_1, X_0) = \frac{1}{P\,Q} \sum_{i=1}^{P} \sum_{j=1}^{Q} \left( X_1(i,j) - X_0(i,j) \right)^2        (4.1)

where X_0(i, j) and X_1(i, j) represent the pixel values at point (i, j) of the adjacent image frames compensated by the above process. The smaller the value of MSE(X_1, X_0), the higher the image overlap between the two adjacent frames.
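Equation (4.1) translates directly into a few lines of NumPy, as sketched below; the frames are assumed to be greyscale arrays of equal size.

import numpy as np

def mse(frame_a, frame_b):
    # Mean squared error between two adjacent compensated frames, eq. (4.1)
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    return np.mean((a - b) ** 2)

# A lower average MSE over consecutive stabilized frames indicates a higher
# overlap between adjacent frames, i.e. a more stable output sequence.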
4.4.1.1 Benchmarking for the gradual motion of the object
Fig. 4.15 shows the translation trajectories of the four methods along the x-axis for gradual dynamic motion of the quadcopter and the object. As the graphs show, OpenCV stabilization and object tracking with video stabilization using OpenCV were not able to stabilize the input frames and almost followed the original frame path. Robust video stabilization using particle keypoint update and L1-optimized camera path also fails to stabilize the input video when the dynamic motion comes from the quadcopter: the quadcopter motion is fast, which makes it hard for the averaging window to determine its window size and perform stabilization. Deep online video stabilization using multi-grid warping transformation learning clearly can stabilize the input video, but some spikes remain due to large changes in motion. From the comparison we conclude that the proposed solution stabilizes the video more efficiently, which is backed by the mean squared error data showing that our solution has the least error compared to the rest.

Fig. 4.16 shows the pixel displacement along the y-axis, where dy denotes the amount of pixel displacement. OpenCV and object tracking with video stabilization using OpenCV were not able to stabilize the y-axis. Robust video stabilization reduces the spikes in some places, making the motion less bumpy, but still cannot completely stabilize the image. Deep online video stabilization works well at the start and smooths the trajectory, but as the spikes increase later it becomes slightly noisy in places and does not reach a completely stabilized output. The proposed solution shows small noise around frames 1300 and 1400 during the landing of the quadcopter, but otherwise stabilizes the input images very smoothly. This is confirmed by the mean squared error values of each method given in the table below, which show that our solution deviates least from the reference axis and stabilizes the input frames most efficiently.

For the rotation angle da (in degrees), shown in Fig. 4.17, OpenCV video stabilization keeps a stable trajectory up to about frame 300, after which it becomes noisy and almost follows the original transformation path. Object tracking with video stabilization using OpenCV stabilizes well up to about frame 400 and becomes unstable afterwards. Robust video stabilization performs well in this case and controls the rotation of the image frames to a large extent, with unstable frames only around frames 200 to 400 and from frame 1000 onwards. Deep online video stabilization performs worst and follows the original path. The proposed solution gives the best stabilization, which can be observed from the graph as well as from the mean squared error data for all five methods with gradual motion change.

Fig. 4.15 (1) OpenCV Video Stabilization (2) Object Tracking and Video Stabilization using OpenCV
(3) Robust Video Stabilization Using Particle Keypoint update and L1-Optimized Camera Path (4) Deep
online Video Stabilization using Multi-Grid Warping Transformation Learning, Stabilized dx trajectory
for gradual motion w.r.t original trajectory along the x-axis

Fig. 4.16 Stabilized dy trajectory for gradual motion w.r.t original trajectory along the y-axis

Fig. 4.17: Stabilized da trajectory for gradual motion w.r.t original trajectory along the x-axis

4.4.1.2 Benchmarking for the drastic motion of the object


For drastic motion, only 850 frames are used for comparison, as shown in Fig. 4.18, to test the stabilization efficiency of the methods. OpenCV fails to stabilize the x-axis translation trajectory, whereas object tracking with video stabilization using OpenCV stabilizes the input frames to some extent. Robust video stabilization, however, completely follows the original trajectory and fails to stabilize the input frames. Deep online video stabilization stabilizes the input frames along the x-axis very efficiently, but the proposed method outperforms it, providing the best stabilized output along the x-axis with the lowest mean squared error.

For the y-axis translation, OpenCV was able to stabilize the input frames, and the graph shows that it reduces the spikes and sudden changes. Object tracking stabilized the frames up to about frame 250 but failed from frame 300 onwards. Robust video stabilization completely follows the original trajectory and could not stabilize the frames. The deep learning method produces a smooth trajectory and removes the noise efficiently, but the proposed method outperforms it and gives the best stabilization result: after a slight disturbance in the initial frames, the trajectory is smooth and almost a straight line along the reference axis, with the lowest mean squared error, as shown in Fig. 4.19.

Fig. 4.18 Stabilized dx trajectory for drastic motion w.r.t original trajectory along the x-axis

Fig. 4.19 Stabilized dy trajectory for drastic motion w.r.t original trajectory along the y-axis

For the rotation angle comparison of the above methods, OpenCV could not stabilize the input frames, as is clear from Fig. 4.20. Object tracking with video stabilization using OpenCV was able to stabilize, but between frames 300 and 350 it failed because the region of interest defined for the object could not keep track of it during the drastic motion change, leading to an unstabilized output overall. Robust video stabilization, however, kept the rotation of the input frames well within a stable range, outperformed all the other methods and produced an almost stable output with the lowest mean squared error. Deep online video stabilization could not dampen the rotation of the input frames to produce a stable and smooth output. The proposed method smooths the rotation of the input frames and stabilizes them, but between frames 700 and 800, during the landing of the quadcopter, it could not stabilize the induced rotation.

Fig. 4.20 Stabilized da trajectory for drastic motion w.r.t original trajectory along the x-axis
4.4.1.3 Benchmarking of near-foreground motion of the object
In the benchmarking of near-foreground motion along the x-axis, OpenCV, object tracking with video stabilization using OpenCV and robust video stabilization all fail to stabilize the input frames. Deep online video stabilization stabilizes the frames to some extent, but the proposed method outperforms it by providing a smoother and more stable trajectory for the input image frames, as shown in Fig. 4.21.

In the dy pixel trajectory along the y-axis in Fig. 4.22, we can clearly observe that three methods could not stabilize the input image frames: OpenCV and robust video stabilization follow the exact path of the original trajectory, while object tracking with video stabilization using OpenCV fails because the region of interest does not follow the object when it comes close to the camera, after which the output becomes unstabilized. Deep online video stabilization gives the most stable image frame output along the y-axis, with the lowest mean squared error and a smooth, stable pixel trajectory. The proposed method produces a stable output but with some noise and spikes around frames 200 to 400 and 1400 to 1500, making it less smooth in places; its mean squared error is small but not the best along this axis.

In the da transformation graph of the rotation angle, deep online stabilization performs worst and fails to stabilize the rotation of the input frames at all; its mean squared error value is also the highest, showing that it could not stabilize the near-foreground object. OpenCV video stabilization was also not able to stabilize the rotation of the image frames, although it reduced the rotation angle to some extent, as can be seen in Fig. 4.23. Robust video stabilization, however, was able to stabilize the rotation of the input frames and provide a good stabilized output, dampening all the spikes and smoothing the trajectory. The proposed method nevertheless outperforms all of the above methods and gives the most stable and smooth trajectory in the da graph, which is further confirmed by its mean squared error value being the lowest.

Fig. 4.21 Stabilized dx trajectory for near foreground motion w.r.t original trajectory along the x-axis

Fig. 4.22: Stabilized dy trajectory for near foreground motion w.r.t original trajectory along the y-axis

Fig. 4.23: Stabilized da trajectory for near foreground motion w.r.t original trajectory along the x-axis

4.4.1.4 Benchmarking with established methods for random motion


In the dx trajectory for random motion along the x-axis, OpenCV fails to stabilize the image frames. Object tracking with video stabilization using OpenCV stabilizes the input frames up to about frame 150, as seen in Fig. 4.24, after which it becomes unstable: it loses the region of interest due to the random motion of the object and fails to track the different object motions in the foreground and background. Robust video stabilization gives poor stabilization for the initial frames but later smooths the jitter and produces a flat trajectory. Deep online video stabilization is stable but not smooth, with a few spikes, whereas the proposed method gives the best dx trajectory for the input image frames, almost a straight line close to the reference axis with the smallest mean squared error.

For the dy trajectory along the y-axis, object tracking with video stabilization using OpenCV stabilizes the input frames, as shown in Fig. 4.25, except between frames 1600 and 1800, where it fails and affects the overall stabilization. The deep online stabilization method stabilizes the input frames most efficiently and produces a smooth trajectory, which is confirmed by its mean squared error being the lowest among the above methods. The proposed method produces an almost flat stabilized trajectory with a small mean squared error, but it shows a disturbance between frames 1600 and 1800; although the overall trajectory of the proposed method is flatter and smoother, this large deviation at certain frames makes its error value larger than that of deep online stabilization.

For the rotation angle in Fig. 4.26, object tracking with video stabilization using OpenCV reduces the rotation of the input frames but still shows noise and spikes, with large rotations at certain frame numbers. Robust video stabilization produces a smooth and stable output, but the proposed method outperforms all of the above methods and gives a flatter, more stable output with the lowest mean squared error. Deep online stabilization, on the other hand, fails to reduce the rotation of the input frames and does not produce a smooth, stable output; the mean squared error table confirms that it has the maximum value, making it the least stable.

Fig. 4.24: Stabilized dx trajectory for random motion w.r.t original trajectory along the x-axis

Fig. 4.25: Stabilized dy trajectory for random motion w.r.t original trajectory along the y-axis

Fig. 4.26: Stabilized da trajectory for random motion w.r.t original trajectory along the x-axis

As the above benchmarking and MSE evaluation show, the proposed method does a much better job in all the setups: handheld dynamic, handheld static, and quadcopter-mounted with gradual, drastic and near-foreground motion w.r.t. the object. It dampens the vibration caused by jitter and shakiness as well as the noise from the UAV motors and propellers. Table 4.1 shows that our method has the lowest mean squared error on average. In the case of random object and camera motion, the performance of the method actually increases; one possible reason is that when the algorithm learns specific object features, it also learns the surrounding objects and background features, and when these features or objects are present together or near the target in a testing or real-world scenario, the accuracy of the method increases further, which can be observed in the results above. From Table 4.2 we can say that the proposed method is robust in real-world scenarios, unaffected by variations in illuminance, in object movement and in the number of objects in the scene, with acceptable real-time computation time.

Object tracking, as discussed, loses the track when the object motion is drastic, the illuminance changes or there are multiple objects in the scene. Deep online video stabilization, in the near-foreground case where the object is close to the camera, becomes computationally costly and loses accuracy, which is confirmed by its MSE values as well as by the translation and rotation stabilization plots. There are potential improvements and applications for this algorithm. One direction of improvement is to find out what causes the different spikes in the results: each test has some unique spikes that are currently unexplained, and memory placement and memory conflicts are the suspected cause. If the reason is found, the algorithm could become much faster on average.

OpenCV stabilization failed to give a stabilized output. Robust video stabilization was able to keep the frame but still showed jitter and blurriness and could not stabilize sudden variations. Object tracking with a bandpass filter was able to stabilize, but its computation time was high and it could not handle drastic and near-foreground changes; it drops the track if the jitter is too strong or the illuminance changes suddenly. Deep online video stabilization works by providing a stable and an unstable video of the scene and then cascading the two to match the ideal frame path of motion for stabilization. It works fine for normal unstable video with gradual changes, but in scenes with drastic motion in the dynamic case, or with extreme near-range foreground objects, it may fail because it learns to warp the unstable camera to a virtual stable camera in the presence of parallax. In the near-foreground case, the closeness of the object to the camera means it may leave the frame at times, leading to parallax error and miscalculation of the motion estimation, and therefore a poorly stabilized video.

Table 4.1: Comparison of the mean square error for translation and rotation after stabilization

MSE evaluation
Table 4.2 Comparison of performance in a different scenario and the computation time of different methods

Chapter 5 Conclusion and Future Work
5.1 Conclusions
The first contribution of this thesis, efficient and fast feature tracking, is well supported by the computation time needed to process a single frame, and the second contribution, the removal of all kinds of noise to provide the most robust and stabilized output possible, is supported by the benchmarking results and the MSE evaluation. We also saw that when the number of objects increased, the proposed method performed much better than the others overall, making it more reliable and efficient for real-time indoor applications such as photo taking and surveillance.

There are a few limitations to this method. Since it is a software-based stabilization, there is a limit to the trembling and jitter it can compensate; beyond a certain level it will not be able to stabilize the input video. In general, however, the shake and jitter produced by the quadcopter and by handheld camera motion are easily rectified, as verified by the experimental results. We can therefore conclude that our method significantly improves the stabilization performance while also providing more accurate translation and rotation-angle data for the image frames, even in the drastic motion condition, where the object moves fast and sometimes becomes blurry, making it difficult for any feature detection method to find features for video stabilization.

In this thesis we addressed that issue by using a state-of-the-art deep learning object detection model to detect the object in the frame and then extract features from the detected object, making the method more efficient and robust. Deep online stabilization works well in general handheld-camera cases, but when the camera is attached to a drone and the frames change quickly it becomes difficult for it to stabilize the input frames. The robust video stabilization method works well for gradual and drastic changes, but its performance decreases for near-foreground and random motion as the complexity increases. OpenCV stabilization and object tracking with video stabilization using OpenCV work well for a static object in the gradual case; in object tracking and video stabilization using OpenCV, the object is tracked only while there is a single moving object and the illuminance is constant, and under occlusion or with multiple objects it sometimes fails to track the object and its stabilization performance decreases. The proposed method provides the user with more precise information about the object to be tracked and stabilized, in dynamic as well as static conditions and in more realistic and complex situations, for various real-time applications. Hence, using the YOLO-stabilization algorithm we can improve the stability of the video and provide smooth, stable output.

5.2 Future Work


The development platform presented in this thesis enables a wide range of exploration in computer vision and AI with UAVs. The YOLO-based object detection can be further enhanced in future with newer models that are faster and lighter for onboard computation, making the computation more efficient in real time and allowing the image to be displayed at full resolution for industrial requirements. Kalman filtering alone could be turned into a hybrid Kalman-bandpass filter by setting low and high cutoff frequencies, instead of using a separate bandpass filter; this could make the algorithm faster and reduce the computational cost drastically. The Kalman filter is also capable of handling the delay and interpolation issues caused by telemetry data, which opens the possibility of better estimating the intended video motion during flight, keeping the system cost-effective and compact with minimal hardware requirements.

References
[1] Shuaicheng Liu, et al. “Meshflow: Minimum latency online video stabilization,” in
Proceedings of European Conference on Computer Vision, Springer, Cham, 2016, pp
800-815.
[2] J. Windau and L. Itti, “Multilayer real-time video image stabilization,” in Proceedings
of International Conference on Intelligent Robots and Systems, San Francisco, CA,
USA 2011, pp. 2397-2402.
[3] M. Zhao and Q. Ling, “Adaptively Meshed Video Stabilization,” IEEE Transactions
on Circuits and Systems for Video Technology, doi: 10.1109/TCSVT.2020.3040753.
[4] Nicholas Stewart Cross, Onboard video stabilization for unmanned air vehicles,
doi:10.15368/theses.2011.67, 2011
[5] David Linn Johansen, Video stabilization and target localization using feature
tracking with small UAV video, Diss. Department of Electrical and Computer
Engineering, Brigham Young University, 2006
[6] Lakshya Kejriwal and Indu Singh. “A Hybrid filtering approach of Digital Video
Stabilization for UAV using Kalman and Low Pass filter,” In Proceedings of
International Conference on Advances in Computing & Communications, Cochin,
India, 2016, pp. 359-366.
[7] H. Guo, S. Liu, T. He, S. Zhu, B. Zeng and M. Gabbouj, “Joint Video Stitching and
Stabilization from Moving Cameras,” IEEE Transactions on Image Processing, vol.
25, no. 11, pp. 5491-5503, Nov. 2016.
[8] S. G. Fowers, D. Lee, B. J. Tippetts, K. D. Lillywhite, A. W. Dennis and J. K.
Archibald, “Vision Aided Stabilization and the Development of a Quad-Rotor Micro
UAV,” in Proceedings of International Symposium on Computational Intelligence in
Robotics and Automation, Jacksonville, USA, 2007, pp. 143-148.
[9] https://docs.opencv.org/3.4/d5/d50/group_videostab.html [accessed on 10/12/2019]
[10] https://pypi.org/project/vidstab/ [accessed on 10/12/2019]
[11] Semi Jeon, Inhye Yoon, Jinbeum Jang, Seungji Yang, Jisung Kim and Joonki Paik,
“Robust Video Stabilization Using Particle Keypoint Update and l1-Optimized
Camera Path,” Sensors, vol. 2, pp. 337, 2017.
[12] Wang, Miao, Guo-Ye Yang, Jin-Kun Lin, Song-Hai Zhang, Ariel Shamir, Shao-Ping
Lu and Shi-Min Hu, “Deep online video stabilization with multi-grid warping

transformation learning,” IEEE Transactions on Image Processing, vol. 5, pp.2283-
2292, 2018.
[13] Joseph Redmon and Ali Farhadi, “YOLOv3: An incremental improvement,” arXiv
preprint arXiv:1804.02767, 2018.
[14] https://docs.opencv.org/master/d4/d13/tutorial_py_filtering.html [accessed on
20/11/2020]
[15] https://appen.com/datasets/open-images-annotated-with-bounding-boxes/ [accessed
on 20/11/2020]
[16] S. Tamgade and V. Bora, “Motion Vector Estimation of Video Image by Pyramidal
Implementation of Lucas Kanade Optical Flow,” in Proceedings of International
Conference on Emerging Trends in Engineering & Technology, Nagpur, Maharashtra,
India, 2009, pp. 914-917.
[17] https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_video/py_lucas_kanade/py_lucas_kanade.html [accessed on 3/03/2020]
[18] R. Hartley and R. Gupta, “Computing matched-epipolar projections,” in Proceedings
of IEEE Conference on Computer Vision and Pattern Recognition, New York, USA,
1993, pp. 549-555.
[19] D. Choukroun, Novel methods for attitude estimation from vector observations, PhD
diss., Ph. D. dissertation, Technion, Israel Institute of Technology, 2003.
[20] https://docs.opencv.org/3.4/dd/d6a/classcv_1_1KalmanFilter.html [accessed on
3/03/2020].
[21] M. He, C. Huang, C. Xiao and Y. Wen, “Digital video stabilization based on hybrid
filtering,” in Proceedings of 7th International Congress on Image and Signal
Processing, Dalian, China, 2014, pp. 94-98.
[22] H.G. Adelmann, “Butterworth equations for homomorphic filtering of images,”
Computers in Biology and Medicine, vol. 28, no.2, pp.169-181, 1998.
[23] https://docs.opencv.org/master/d9/dab/tutorial_homography.html[accessed on
3/03/2020]
[24] https://hobbyking.com/en_us/turnigy-9x-9ch-mode-2-transmitter-w-module-ia8-
receiver-afhds-2a-system.html [accessed on 10/07/2019]
[25] https://docs.px4.io/master/en/flight_controller/pixhawk_series.html [accessed on
1/05/2019]
[26] https://developer.nvidia.com/embedded/jetson-tx2[accessed on 13/04/2019]
[27] https://docs.qgroundcontrol.com/master/en/index.html [accessed on 1/05/2019]
[28] https://dronequadcopterx.blogspot.com/2018/11/design-of-quadcopter.html[accessed
on 10/11/2020]
[29] https://www.amazon.com/Readytosky-Quadcopter-Stretch-Version
Landing/dp/B01N0AX1MZ [accessed on 7/03/2020]
[30] https://www.stereolabs.com/zed/ [accessed on 15/04/2019]
[31] https://www.amazon.com/20Amp-Multi-rotor-Controller-SimonK-
Firmware/dp/B00MB1Y1TW[accessed on 13/03/2015]
[32] http://www.hobbyking.com/hobbyking/store/__64191__Turnigy_Slowfly_Propeller
_10x4_5_Orange_CW_2pcs_US_Warehouse_.html[accessed on 10/07/2019]
[33] https://en.wikipedia.org/wiki/Electronic_speed_control[accessed on 10/07/2019]
[34] https://ardupilot.org/copter/docs/common-sik-telemetry-radio.html [accessed on
1/05/2019]
[35] https://rflysim.com/en/2_Configuration/Introduction.html [accessed on 20/11/2020]
[36] https://www.learnopencv.com/video-stabilization-using-point-feature-matching-in-
opencv/ [accessed on 11/11/2020]
[37] https://github.com/ishank-juneja/Video-Stabilization [accessed on 11/11/2020]
[38] https://github.com/cxjyxxme/deep-online-video-stabilization-deploy [accessed on
11/11/2020]
[39] Yao Shen, Parthasarathy Guturu, Thyagaraju Damarla, Bill P. Buckles and
Kameswara Rao Namuduri, “Video stabilization using principal component analysis
and scale invariant feature transform in particle filter framework,” IEEE Transactions
on Consumer Electronics, vol.55, no. 3, pp.1714-1721, 2009.
