
360° Video Stabilization System

CHAPTER 1

1 INTRODUCTION

Earlier, conventional cameras were used to capture videos. Such a camera captures video only in the direction in which it is pointed or focused. Hand-held capture usually results in unwanted shakiness in the video, which can cause cybersickness in the viewer. Video stabilization makes such casually captured videos smooth to watch: it is the method by which the unwanted shakiness and blur introduced during capture are removed. It can also be defined as the process of estimating and compensating for the background image motion caused by the ego-motion of the camera. These techniques have been developed to smooth shaky camera motion in videos before viewing.

Figure 1.1: Normal camera

In our project, we worked on the stabilization of 360° videos. A 360° camera captures the whole view around us at the same time, without specifying a capture direction, and represents the whole surrounding view in a single frame.
frame format. Viewers can pan and rotate a 360◦ video’s perspective to watch it from
different angles by dragging with the mouse or finger on a computer or mobile device,

Department of ECE 1 ACE College of Engineering


360◦ Video Stabilization System

or by watching the video with a VR headset.360-degree video is typically recorded


using either a special rig of multiple cameras, or using a dedicated camera that
contains multiple camera lenses embedded into the device, and filming overlapping
angles simultaneously. This separate footage is merged into one spherical video
piece, through a method known as video stitching,and the color and contrast of
each shot is calibrated to be consistent with the others. This process is done either
by the camera itself, or using specialized software which can analyze common visuals
and audio to synchronize and link the different camera feeds together. Generally,
the only area that cannot be viewed is the view toward the camera support.

Figure 1.2: 360 degree camera

Processing 360° videos for optimal viewing is harder than processing conventional video. Researchers have been developing new algorithms for the stabilization of 360° videos. With the introduction of Virtual Reality (VR), 360° videos and their stabilization have found a wide range of applications, so the stabilization of 360° videos has become increasingly important.


1.1 USE OF 360° VIDEOS

360-degree video stabilization can be used in military applications such as localization, navigation and target tracking. 360-degree cameras can be installed on military equipment; during wartime this helps ensure the safety of soldiers by keeping enemy actions in view. Virtual Reality is also highly useful in the naval field: the navy conducts a wide range of training operations, some of which are dangerous and expensive, and 360-degree videos combined with VR technology can reduce the risk and cost.

360-degree video stabilization is a stepping stone towards the Virtual Reality (VR) world. Using VR headsets, we can create realistic images, sound and other sensations that simulate a user's presence in a virtual or imaginary environment. People will be able to "feel the action", "be the movie character" or "fly to the sky" while sitting at home. VR lets a person immerse fully in the videos; virtual reality is a revolutionary technology.

360-degree video stabilization will also be highly influential in the medical field, particularly in diagnostic applications like endoscopy and colonoscopy. Normally, endoscopy captures images of internal organs using a special camera. With 360-degree cameras, we can record videos of internal organs and observe the stabilized video.

It can be used in entertainment fields like gaming and movies to make the experience more realistic and to let individuals experience adventures under extreme conditions. It can be used in traffic systems, in scientific research laboratories so that scientists can more easily study a specific topic, and for security purposes in banks, offices and similar settings. It can also be used in the education industry, where it increases the level of understanding, and in the sports industry, where 360-degree cameras can show the full action of a game.


1.2 CHALLENGES IN 360° VIDEOS

One challenge in providing a 360-degree video experience is the polishing stage. The video may be shot in parts to cover different angles, but compiling it and stitching the different frames together is where the task becomes difficult. The resulting video often has a distorted top or bottom of the sphere, and the stitching seams are sometimes noticeable.

The viewer sits still while the camera movements are sometimes very rapid, as in a fast-paced shot such as a skydive or a bungee-jumping sequence. This causes motion sickness in the viewer and thus reduces the appeal of the Virtual Reality experience. Some other health effects observed in users of VR headsets are:

• Seizures

• Loss of attentiveness

• Disorientation

• Impaired balance

• Cyber-sickness


1.3 LITERATURE SURVEY

"Stabilizing First Person 360 Degree Videos" by Chetan Arora and Vivek
Kwatra proposed a 360 degree video stabilization technique in which the GFTT fea-
tures between a pair of cube maps is tracked using LK tracker. The tracked feature
points are converted to a 3D vector representation and transformed to the coordi-
nate system of a reference face, chosen arbitrarily. These tracked 3D points from
all faces are then jointly fed into a RANSAC based camera relative pose estimator.
The rotations for all intermediate frames of a batch are computed by solving an
optimization problem. After that an update rotation for each frame, which when
applied to a frame results in a smooth camera trajectory is computed. To better
visualize the effects of stabilization, angular rotations before and after stabilization
are analyzed. The smoother curve after the stabilization indicates that the shake in
camera orientation have stabilized.
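
As an illustration of the tracking step, GFTT corners can be detected and tracked with a KLT (Lucas-Kanade) tracker using MATLAB's Computer Vision Toolbox; this is only a minimal sketch, where face1 and face2 are assumed grayscale images of the same cube face in two consecutive frames:

points = detectMinEigenFeatures(face1);             % GFTT / min-eigenvalue corners
tracker = vision.PointTracker('MaxBidirectionalError', 1);
initialize(tracker, points.Location, face1);
[tracked, valid] = step(tracker, face2);            % LK tracking into the next frame
oldPts = points.Location(valid, :);                 % matched pairs that survive the
newPts = tracked(valid, :);                         % forward-backward check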

"360-degree Video Stabilization" by J.Kopf has proposed a simple algorithm


for 360° videos. They compute 3D rotations of the entire sequence first, and after
the estimation using the computertransformation each frame is simply warped to
the reference frame. This method has a problem when the videographer takes large
turn. Here the output video starts looking sideways because of the alignment of
reference frame direction. Apart from this problem it has excellent results.In this
paper a hybrid 3D-2D algorithm is used to stabilize 360 videos. A new deformed-
rotation motion model is used to undo shake in the videos. This method track
feature points and perform stabilization in terms of these accumulated trajectories.
The major steps involved here are tracking and Key Frame Generation, Estimat-
ing Rotations Between Key Frames,Optimizing the Inner Frames, Residual Jitter
Compensation.The rotation compensation is accurate because it uses 3D analysis,
so it can distinguish rotational and translational motion. The 3D reconstruction
estimates only the rotational component while ignoring translation. This algorithm
remaps the video frame time stamps to balance the apparent camera velocity is also
proposed here.

" Feature Detection for Color Images Using SURF" by Muthugnanambika

Department of ECE 5 ACE College of Engineering


360° Video Stabilization System

M and Dr. Padmavathi S explained that SURF features are extracted from the
color image and combined to detect a color object. Based on the experimental
results, an efficient way to detect the SURF features is derived.Object recognition
majorly depends on shape character and to some extent color also influences the
recognition process. Shape is identified based on the edges or corners of the object.
But since these are affected by the noise in other parameters like illumination, scale
etc. Some amount of interest points with the neighborhood can help better in
object recognition. For the speedy computation of the intensity values in the region
of interest for any image we construct an Integral image. This will have the size
same as that of the given image. The value at any point (x, y) of the integral
image is given as the sum of all the intensity values of the points in the image which
are lesser than or equal to (x, y).This integral image is used in the Hessian square
matrix to get the sum of the intensity values. The final matrix that we get with
the determinant values thresholded with a particular filter size is called the “blob
response map”.To get a proper scale, we interpolate the interest points and express
the final hessian.

"Real Time Recognition of Human Faces" by M. Shujah Islam Sameem,


Tehreem Qasim and Khush Bakhat explained about M-estimator sample consensus
(MSAC) is an algorithm to determine parameters of a mathematical model from a
set of observed data that contains inliers and outliers. Outliers are to be recognized
no influence on the values of the estimates. Therefore, it also can be explained as
an outlier detection method. It is a non-deterministic algorithm in the sense that
it produces a sensible result only with a certain probability. As this probability
increases more iteration are allowed.a sample set containing minimal data items
is randomly selected from the input dataset. A fitting model and the correlated
model parameters are computed using only the elements of this sample subset. The
sample subset is the smallest sufficient to estimate the model parameters. Then
this technique checks which elements of the entire dataset are consistent with the
model expressed by the model parameters obtained from the previous step. A data
element will be considered as an outlier if it does not fit the fitting model expressed
by the set of estimated model parameters within some error threshold that defines
the maximum deviation referable to the effect of

"An efficient solution to the five-point relative pose problem" by D. Nister explained the Random Sample Consensus (RANSAC) algorithm for robust parameter estimation, which has been applied to a wide variety of parametric entities. In many implementations the algorithm is tightly integrated with code pertaining to a specific parametric object. This paper introduces a generic RANSAC implementation that is independent of the estimated object, so the user is able to ignore outlying data elements potentially found in the input. To illustrate the use of the algorithm, the required components for estimating the parameter values of a hyperplane and a hypersphere are implemented.
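
To make the generic formulation concrete, here is a minimal sketch using MATLAB's ransac function (Computer Vision Toolbox) to fit a 2-D line, the simplest hyperplane, to synthetic data with gross outliers; all data and names here are our own illustration, not from the paper:

rng(0);                                       % reproducible synthetic data
x = (1:100)';  y = 2*x + 5 + randn(100,1);    % inliers near the line y = 2x + 5
y(1:10) = 200*rand(10,1);                     % ten gross outliers
data = [x y];
fitFcn  = @(pts) polyfit(pts(:,1), pts(:,2), 1);                   % fit a line to a sample
distFcn = @(model, pts) abs(pts(:,2) - polyval(model, pts(:,1)));  % point-to-model residuals
[lineModel, inlierIdx] = ransac(data, fitFcn, distFcn, 2, 5);      % 2-point samples, 5-px threshold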


CHAPTER 2

2 EXISTING SYSTEMS

2.1 360° Video Stabilization by Johannes Kopf

In this algorithm, Kopf proposes a hybrid 2D-3D approach. The 3D reasoning estimates the relative rotation between appropriately spaced keyframes. They then undo the relative rotation between the keyframes and interpolate the adjustment for the inner frames. They then switch to a 2D optimization of the inner-frame rotations in order to maximize the smoothness of the stabilized feature point trajectories. Their motion model allows slight deviations from pure rotation at inner frames to account for residual jitter from parallax, rolling shutter wobble, etc.

The input videos use equirectangular projection, which is highly distorted at the poles. The frames are converted to a less distorted cube map representation for tracking. While they track points on the planar cube faces, they immediately convert the 2D locations into 3D unit vectors and store them in a track table. After tracking the whole video, they cut "dangling ends" off the tracks, so that each track starts and ends at a keyframe.

Figure 2.1: A frame of 360 degree video

The next goal is to estimate the relative rotation between successive keyframes. Let K = {k_i} be the set of keyframes. For each pair of successive keyframes (k_i, k_i+1) they obtain from the feature tracks a set of matching point pairs, and a five-point algorithm inside a RANSAC procedure is used to estimate the relative rotation. If the number of features in a keyframe pair is lower than 8, they do not use the five-point algorithm but instead directly find the rotation that minimizes the relative distances between matched points. The rotations are chained to make each rotation relative to the first keyframe. In order to remove the rotations from the video, they store the inverse transformations; when these inverse transformations are applied to the keyframes, they stabilize them, since all the relative rotation is removed.
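
A sketch of this chaining and inversion step in MATLAB, assuming a cell array Rrel of 3×3 relative rotation matrices between successive keyframes (our notation, not the paper's):

R = eye(3);                        % keyframe 1 is the reference
Rinv = cell(1, numel(Rrel));
for i = 1:numel(Rrel)
    R = Rrel{i} * R;               % rotation of keyframe i+1 relative to keyframe 1
    Rinv{i} = R';                  % inverse of a rotation is its transpose; derotates keyframe i+1
end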

Next they need to find the rotations of the inner frames. In the tracking phase they computed a set of tracks, each a list of observations; these tracks form visual trajectories that can be plotted on the video. The inner-frame rotations should be optimized so that these trajectories become as smooth as possible. For that, first-order and second-order smoothness terms are considered: the first-order term encourages the trajectories to be as short as possible, while the second-order term encourages smoothness using a discrete Laplacian operator whose 3-tap footprint can reach across keyframes. The optimization does not necessarily recover the true rotations, but instead the ones that produce the smoothest possible result.

Even after the optimization of the inner frames, a small amount of residual jitter remains. A generic deformation model is designed to handle this. In this model, the keyframes use fixed rotations, which effectively prevents the rolling shutter compensation from drifting too far off, since the inner frames are interpolated between keyframes.

In order to render an output frame using backward warping, the deformation model must be inverted. Since the warping function is smooth, the warping coordinates are evaluated only for one in every 8×8 pixels and the remaining coordinates are interpolated bilinearly. The result looks visually identical but the warping is much faster.
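
A sketch of this coarse-grid evaluation in MATLAB, where warpFcn is a hypothetical function returning the source coordinates of the warp for given pixel coordinates:

H = size(frame, 1);  W = size(frame, 2);
[gx, gy] = meshgrid(1:8:W, 1:8:H);            % evaluate the warp on an 8-pixel lattice
[srcX, srcY] = warpFcn(gx, gy);               % expensive warp, coarse grid only
[fx, fy] = meshgrid(1:W, 1:H);
mapX = interp2(gx, gy, srcX, fx, fy);         % bilinear upsampling of the coordinate maps
mapY = interp2(gx, gy, srcY, fx, fy);         % (assumes W, H fall on the lattice; pad otherwise)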

The results of this algorithm are found to be sufficiently smooth to be played back at high speed-up factors, and Kopf's algorithm has become a basic building block of the 360° video pipeline.


2.2 Stabilizing First Person 360° Videos by Chetan Arora and Vivek Kwatra

This paper proposes a novel algorithm for 360° video stabilization that computes the optimal 3D camera rotations for stabilization without destroying the originally intended camera orientations. 3D rotations are found first between keyframes, and then between consecutive keyframes, as suggested by Kopf. Instead of warping to the reference frame, the computed 3D rotations are used to stabilize the camera orientations by minimizing the first, second and third derivatives of the resulting camera path. Warping to this smooth camera path keeps the output viewpoint close to the original even when the videographer takes a turn. The algorithm can be used for stabilizing narrow field of view (NFOV) videos as well.

Figure 2.2: A frame of 360 degree video

Figure 2.3: Equirect and corresponding cubemap representation

360° videos are usually represented in equirectangular format. This format is suitable for viewing but hard to process using computer vision techniques, so the equirectangular format is converted into the cubemap representation. This representation is obtained by projecting the viewing sphere onto the six faces of a unit cube. Each face is equivalent to an image captured by a pinhole camera of unit focal length placed at the centre of the cube.
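
For instance, a pixel at (row, col) on an N×N front face can be back-projected to its 3-D unit ray as in the following sketch, under the unit-focal-length assumption; the other faces differ only by a fixed rotation:

u = (2*col - 1)/N - 1;             % map the pixel centre to [-1, 1]
v = (2*row - 1)/N - 1;
ray = [u; v; 1];                   % front face lies on the z = +1 plane
ray = ray / norm(ray);             % unit vector on the viewing sphere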

GFTT (Good Features To Track) points are tracked between a pair of cubemaps using an LK tracker, with feature points tracked on each face of the cubemap independently. The tracked feature points are converted to a 3D vector representation and transformed to the coordinate system of a reference face. The tracked 3D points from all the faces are then jointly fed into the RANSAC-based relative camera pose estimator. The relative translation returned by the pose estimator is discarded and only the rotation component is used in the computation ahead. A frame is declared a keyframe when the relative motion is large enough to estimate pose reliably: whenever the average magnitude of the optical flow vectors exceeds 20 pixels, the frame is marked as a keyframe. The relative rotation is then estimated between each pair of consecutive keyframes independently, and the rotations are chained to compute the rotation of each keyframe relative to the first frame.
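
The keyframe test itself is simple. Given tracked point pairs oldPts and newPts (hypothetical N×2 arrays of pixel coordinates), a MATLAB sketch is:

flow = newPts - oldPts;                     % per-point displacement in pixels
meanMag = mean(sqrt(sum(flow.^2, 2)));      % average optical-flow magnitude
isKeyframe = meanMag > 20;                  % 20-pixel threshold from the paper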

The rotations between consecutive keyframes are computed so that they maximally smooth the feature trajectories. To achieve this, the input frames are divided into batches, each with a keyframe at the beginning and end and containing all the intermediate frames between the two keyframes. The rotations of all the intermediate frames of a batch are computed together by solving an optimization problem with two energy components: the first encourages first-order smoothness of the feature tracks, and the second minimizes second-order jitter.

The important difference from the previous method is that there the frames were derotated to the reference frame before finding the rotations for the intermediate frames, which results in smaller rotations for the intermediate frames. Here the frames are not derotated, which results in larger rotations with respect to the reference frame. Since the quaternion representation of larger angles is found to be more stable than the axis-angle representation, quaternions are used in the optimization problem.

It is found that camera rotation affects video stabilization more than translation. Translation does not induce much optical flow at far-away points, whereas rotation affects near and far points similarly, making the effect of rotational camera shake more pronounced. Therefore, this algorithm finds a steady camera trajectory based on the rotation component while ignoring the translation component.

An update rotation is then computed for each frame which, when applied to the frame, results in a smooth camera trajectory. Warping to this smooth camera trajectory yields the stabilized output video. The proposed technique is generic enough to be used for narrow-FOV videos as well.


CHAPTER 3

3 PROPOSED SYSTEM

Figure 3.1: Block diagram representation of Proposed system

3.1 CUBEMAP REPRESENTATION

Cube map representation is a method of projecting the sphere onto the six faces of a cube, with the six images arranged as an unfolded cube. Usually, 360-degree videos are represented in equirectangular format, which is appropriate for viewing but hard to process using computer vision techniques, so we convert the video to cube map format. Each face represents the view along one direction of the 360-degree frame (top, bottom, left, right, front and back). Compared to sphere mapping, the cube map format is much better suited to real-time rendering of reflections.

In the cubemap representation, the 360-degree camera is surrounded by a cube whose six faces each carry a texture map. These texture maps are created by rendering the scene with six 90-degree cameras, giving left, front, right, back, top and bottom textures. In a spherical map, by contrast, the camera is surrounded by a sphere with a single spherically distorted texture.
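
A minimal MATLAB sketch of this projection for the front face, assuming an equirectangular frame eq of size Heq×Weq×3, an assumed face resolution N, and one common axis convention (all variable names are ours): each face pixel is back-projected to a ray, the ray is converted to latitude/longitude, and the equirect image is sampled there.

N = 512;                                     % assumed face resolution
[c, r] = meshgrid(1:N, 1:N);
u = (2*c - 1)/N - 1;  v = (2*r - 1)/N - 1;   % face coordinates in [-1, 1]
x = u;  y = v;  z = ones(N);                 % rays through the z = +1 face
lon = atan2(x, z);                           % longitude of each ray
lat = atan2(y, sqrt(x.^2 + z.^2));           % latitude of each ray
[Heq, Weq, ~] = size(eq);
eqX = (lon/pi + 1) * Weq/2;                  % map angles to equirect pixel coords
eqY = (lat/(pi/2) + 1) * Heq/2;
face = zeros(N, N, 3, 'like', eq);
for ch = 1:3                                 % bilinear sampling per color plane
    face(:,:,ch) = interp2(double(eq(:,:,ch)), eqX, eqY, 'linear', 0);
end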


Figure 3.2: Sample input video frame in equirect format

Figure 3.3: Corresponding cubemap representation

3.2 SURF FEATURE EXTRACTION

Speeded-Up Robust Features (SURF) is a local feature detector and descriptor used in computer vision. It can be used for tasks such as object recognition, image registration, classification and 3D reconstruction. The standard form of SURF is several times faster than SIFT. SURF has been used to detect and identify objects, people and faces, to reconstruct 3D scenes, to track objects and to extract interest points.

Recognition using the SURF algorithm is composed of three steps: feature extraction, feature description and feature matching.
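
In MATLAB's Computer Vision Toolbox, these three steps map directly onto built-in functions. A sketch for two grayscale frames I1 and I2 (e.g. one color plane of a cube face):

pts1 = detectSURFFeatures(I1);               % 1. feature extraction
pts2 = detectSURFFeatures(I2);
[f1, v1] = extractFeatures(I1, pts1);        % 2. feature description
[f2, v2] = extractFeatures(I2, pts2);
pairs = matchFeatures(f1, f2);               % 3. feature matching
m1 = v1(pairs(:,1));                         % matched points in frame 1
m2 = v2(pairs(:,2));                         % matched points in frame 2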


3.2.1 Feature Extraction

The fundamental step in an object recognition algorithm is feature extraction: the process of extracting useful information, referred to as features, from the image. The extracted features should carry unique and important attributes of that image. In our system, feature extraction is done in all three color planes: red, green and blue (RGB).

The SURF algorithm uses square-shaped filters as an approximation of Gaussian smoothing. Square filtering of the image is much faster if the integral image is used:

S(a, b) = Σ_{i=1}^{a} Σ_{j=1}^{b} I(i, j)    (1)

The sum of the original image within any rectangle can then be evaluated quickly using the integral image, requiring only four look-ups, one at each corner of the rectangle.
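
Equation (1) and the four-corner rectangle sum can be sketched in MATLAB as follows, where I is a grayscale image and r1, c1, r2, c2 are assumed valid rectangle bounds greater than 1:

S = cumsum(cumsum(double(I), 1), 2);                          % S(a,b) = sum of I(1:a, 1:b)
boxSum = S(r2,c2) - S(r1-1,c2) - S(r2,c1-1) + S(r1-1,c1-1);   % sum of I(r1:r2, c1:c2)
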
SURF uses the Hessian matrix to find feature points: the determinant of the Hessian measures the local change around a point, and points are chosen where the determinant is maximal. For a point p = (a, b) in an image I, the Hessian matrix H(p, σ) at point p and scale σ is

H(p, σ) = [ Lxx(p, σ)  Lxy(p, σ)
            Lxy(p, σ)  Lyy(p, σ) ]

where Lxx(p, σ) is the convolution of the second-order derivative of the Gaussian with the image at the point, and similarly for Lxy and Lyy.

Interest points must be found at different scales, partly because the search for correspondences requires comparing images seen at different scales. Images are repeatedly smoothed with a Gaussian filter and then subsampled to obtain the next level of the pyramid, so several levels with masks of various sizes are computed. The scale space is subdivided into a number of octaves, where an octave is a series of filter response maps covering a doubling of scale.

3.2.2 Descriptor

The aim of a feature descriptor is to provide a distinctive and robust description of an image feature. Most descriptors are therefore computed in a local manner, and a descriptor is obtained for every interest point identified previously. The descriptor dimensionality has a direct effect on both its computational complexity and its feature-matching accuracy. The first step consists of fixing an orientation based on information from a circular region around the interest point. Then a square region aligned with the selected orientation is constructed, and the SURF descriptor is extracted from it.

3.2.3 Matching

Matching feature points are identified by comparing the descriptors obtained from different images. This is done in each of the RGB planes.

3.3 INLIER AND OUTLIER POINTS USING MSAC ALGORITHM

M-estimator sample consensus (MSAC) is an algorithm for determining the parameters of a mathematical model from a set of observed data that contains inliers and outliers. Outliers are to be recognized so that they have no influence on the values of the estimates, so the algorithm can also be seen as an outlier detection method. It is non-deterministic in the sense that it produces a sensible result only with a certain probability, and this probability increases as more iterations are allowed. In our system, the MSAC algorithm is run in all three color planes (RGB).

The fundamental assumption is that the data consists of inliers, i.e., points whose distribution can be explained by the model (possibly perturbed by noise), and outliers, which are data that do not fit the model. The MSAC algorithm essentially repeats two steps:

1. In the first step, a sample set containing the minimal number of data items is randomly selected from the input dataset. A fitting model and the corresponding model parameters are computed using only the elements of this sample subset, which is the smallest subset sufficient to estimate the model parameters.

2. In the next step, the method checks which elements of the entire dataset are consistent with the model given by the parameters obtained in the previous step. A data element is considered an outlier if it does not fit the model within some error threshold that defines the maximum deviation attributable to the effect of noise.


The set of inliers obtained for the model is known as the consensus set. The MSAC method repeats the two steps above until the consensus set obtained in some iteration has enough inliers. The input to the MSAC algorithm is a set of observed data values, and the following steps are repeated (a MATLAB sketch follows the list):

1. Select a random subset of the original data and treat it as a set of hypothetical inliers.

2. Fit a model to this set of hypothetical inliers.

3. Test all other data against the fitted model. Those points that fit the estimated model, according to some model-specific loss function, are considered part of the consensus set.

4. The estimated model is reasonably good if sufficiently many points have been classified as part of the consensus set.

5. The model may be improved by re-estimating it using all points of the consensus set.
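
In our MATLAB pipeline this whole loop is available as a built-in: estimateGeometricTransform runs MSAC internally. A sketch, reusing the matched SURF points m1 (reference frame) and m2 (current frame) from the sketch in Section 3.2:

% Fit an affine model mapping current-frame points onto the reference frame;
% MSAC separates the matches into inliers and outliers in the process.
[tform, inlierCur, inlierRef] = estimateGeometricTransform(m2, m1, 'affine');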

3.4 AFFINE TRANSFORMATION

An affine transformation is a linear mapping that preserves points, straight lines and planes. Affine transformations are commonly used to correct for geometric distortions that occur with non-ideal camera angles; satellite imagery, for example, uses them to correct wide-angle lens distortion, and they are used in panorama stitching and image registration. To eliminate distortion, it is desirable to transform and fuse the images into a large, flat coordinate system, which enables easier interaction and calculation.

Detected images (RGB planes) are subject to geometric distortion when the position of the camera with respect to the scene changes the apparent dimensions of the scene geometry. Applying an affine transformation to the distorted image can correct for a range of perspective distortions by transforming the computations from the ideal coordinates to those actually used. An affine transformation is an important class of linear 2-D geometric transformations: it maps pixel values at positions in the input image to new positions in the output image by applying a linear combination of translation, rotation, scaling and shearing operations.
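
Applying the estimated affine transform to a frame is then a single call; a sketch that keeps the output the same size as the input frame curFrame:

outView = imref2d([size(curFrame,1) size(curFrame,2)]);     % keep the original frame size
stabFrame = imwarp(curFrame, tform, 'OutputView', outView); % warp with the MSAC-estimated tform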


CHAPTER 4

4 RESULTS

4.1 INPUT VIDEO 1

We have implemented our algorithm in MATLAB. The input video in equirectangular form is first converted to the cubemap representation, as shown in Figures 4.1 and 4.2. The SURF features are then extracted from each frame of each cube face, and the inlier and outlier points are obtained through the MSAC algorithm. The inlier points are the ones in which the distortion is found, so, using the geometric transformation, that distortion is removed and the stabilized video is obtained. The input and stabilized output frames are shown in Figures 4.3, 4.4 and 4.5. The frame rate was found to be 30 frames/second, i.e. 30 frames are processed in a single second, and the elapsed time was 310.57 seconds.
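
For reference, here is a condensed sketch of how such a per-face processing loop can be timed in MATLAB, assuming a cell array faces of grayscale frames for one cube face (variable names are ours):

tic;
ref = faces{1};
stab = cell(size(faces));  stab{1} = ref;
for k = 2:numel(faces)
    p1 = detectSURFFeatures(ref);  p2 = detectSURFFeatures(faces{k});
    [f1, v1] = extractFeatures(ref, p1);
    [f2, v2] = extractFeatures(faces{k}, p2);
    idx = matchFeatures(f1, f2);
    tform = estimateGeometricTransform(v2(idx(:,2)), v1(idx(:,1)), 'affine');
    stab{k} = imwarp(faces{k}, tform, 'OutputView', ...
                     imref2d([size(ref,1) size(ref,2)]));
    ref = faces{k};                % stabilize each frame against its predecessor
end
elapsed = toc;                     % compare with the elapsed times reported here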

Figure 4.1: Input video 1 frame in equirect format


Figure 4.2: Cubemap representation of Input video 1 frame

Figure 4.3: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face


Figure 4.4: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom face

Figure 4.5: (a) Input and stabilized frame of left face (b) Input and stabilized frame of right face


4.2 INPUT VIDEO 2

The second input video in equirectangular form is converted to the cubemap representation, as shown in Figures 4.6 and 4.7. SURF features are extracted from each frame of each cube face, inlier and outlier points are obtained through the MSAC algorithm, and the distortion found at the inlier points is removed using the geometric transformation to obtain the stabilized video. The input and stabilized output frames are shown in Figures 4.8, 4.9 and 4.10. The frame rate was again 30 frames/second, and the elapsed time was 320.56 seconds.

Figure 4.6: Input video 2 frame in equirect format


Figure 4.7: Cubemap representation of Input video 2 frame

Figure 4.8: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face


Figure 4.9: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom face

Figure 4.10: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face


4.3 INPUT VIDEO 3

The third input video is processed in the same way: the equirectangular frames are converted to the cubemap representation (Figures 4.11 and 4.12), SURF features are extracted, inliers are found with MSAC, and the geometric transformation removes the distortion. The input and stabilized output frames are shown in Figures 4.13, 4.14 and 4.15. The frame rate was 30 frames/second, and the elapsed time was 302 seconds.

Figure 4.11: Input video 3 frame in equirect format


Figure 4.12: Cubemap representation of Input video 3 frame

Figure 4.13: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face


Figure 4.14: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom
face

Figure 4.15: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face


4.4 INPUT VIDEO 4

The fourth input video is converted to the cubemap representation (Figures 4.16 and 4.17) and processed with the same SURF-MSAC-transformation pipeline. The input and stabilized output frames are shown in Figures 4.18, 4.19 and 4.20. The frame rate was 30 frames/second, and the elapsed time was 338.36 seconds.

Figure 4.16: Input video 4 frame in equirect format


Figure 4.17: Cubemap representation of Input video 4 frame

Figure 4.18: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face


Figure 4.19: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom
face

Figure 4.20: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face


4.5 INPUT VIDEO 5

The fifth input video is converted to the cubemap representation (Figures 4.21 and 4.22) and processed with the same pipeline. The input and stabilized output frames are shown in Figures 4.23, 4.24 and 4.25. The frame rate was 30 frames/second, and the elapsed time was 332 seconds.

Figure 4.21: Input video 5 frame in equirect format


Figure 4.22: Cubemap representation of Input video 5 frame

Figure 4.23: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face


Figure 4.24: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom
face

Figure 4.25: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face


4.6 INPUT VIDEO 6

The sixth input video is converted to the cubemap representation (Figures 4.26 and 4.27) and processed with the same pipeline. The input and stabilized output frames are shown in Figures 4.28, 4.29 and 4.30. The frame rate was 30 frames/second, and the elapsed time was 339.57 seconds.

Figure 4.26: Input video 6 frame in equirect format


Figure 4.27: Cubemap representation of Input video 6 frame

Figure 4.28: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face


Figure 4.29: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom
face

Figure 4.30: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face


CHAPTER 5

5 ADVANTAGES

Parameters      | Existing System 1            | Existing System 2            | Proposed System
Frame Rate      | 29.9 frames/sec              | -                            | 30 frames/sec
Software        | OpenCV, OpenGV, Python, C++  | OpenCV, OpenGV, Python, C++  | MATLAB
Keyframes       | Spaced keyframes are used    | Frames taken in batches      | Adjacent frames
Points Tracked  | Motion of feature points     | GFTT features                | SURF features

Table 5.1: Comparison between the existing systems and the proposed system

1. The frame rate is 30 frames/second, meaning 30 frames are processed each second. Compared to the other systems, we can process more frames in a short period of time.

2. Stabilization is easier since adjacent frames are taken as keyframes.

3. MSAC performs robust estimation of the model parameters: it can estimate them with high accuracy even when a significant number of outliers are present in the data set.

4. Computation is much faster with the SURF algorithm.

5. SURF is good at handling image rotation.

6. The repeatability of SURF features is good.


CHAPTER 6

6 APPLICATIONS AND FUTURE SCOPE

a) 360-degree video stabilization can be used in military applications such as localization, navigation and target tracking. 360-degree cameras can be installed on military equipment; during wartime this helps ensure the safety of soldiers by keeping enemy actions in view.

b) 360-degree video stabilization will be highly influential in the medical field, particularly in diagnostic applications like endoscopy and colonoscopy. Normally, endoscopy captures images of internal organs using a special camera. With 360-degree cameras, we can record videos of internal organs and observe the stabilized video.

c) 360-degree video stabilization is a stepping stone towards the Virtual Reality (VR) world. Using VR headsets, we can create realistic images, sound and other sensations that simulate a user's presence in a virtual or imaginary environment. People will be able to 'feel the action', 'be the movie character' or 'fly to the sky' while sitting at home. VR lets a person immerse fully in the videos; virtual reality is a revolutionary technology.

d) Virtual Reality is highly useful in the naval field. The navy conducts a wide range of training operations, some of which are dangerous and expensive. Using 360-degree videos and VR technology, we can reduce the risk and cost.

e) It can be used in entertainment fields like gaming and movies to make the experience more realistic and to let individuals experience adventures under extreme conditions.

f) It can be used in traffic systems.

g) It can be used in scientific research laboratories so that scientists can more easily study a specific topic.


h) It can be used for security purposes, such as in banks and offices.

i) It can be used in the education industry, where it increases the level of understanding.

j) 360-degree cameras can be used in the sports industry to show the full, stabilized action of a game.


CHAPTER 7

7 CONCLUSION

This approach does not change the original orientation of the input video, which creates a more natural experience for the viewer. The 360-degree video stabilization algorithm is implemented in MATLAB. We obtained the stabilized video output quickly by combining SURF feature extraction, the MSAC algorithm and the geometric (affine) transformation. SURF extraction is much faster than other feature extraction techniques and is good at handling images with blur or rotation. The M-estimator sample consensus (MSAC) algorithm is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers. Finally, an affine transformation is applied to all the frames; affine transformations are commonly used to correct for geometric distortions that occur with non-ideal camera angles, and applying one to the distorted image can correct a range of perspective distortions by transforming the computations from the ideal coordinates to those actually used.
