CHAPTER 1
1 INTRODUCTION
Earlier, conventional cameras were used to capture video. Such a camera captures only the direction in which it is pointed or focused. Casual, hand-held capture usually introduces unwanted shakiness into the video, which can cause cybersickness in viewers. Video stabilization is the process of removing this unwanted shakiness and blur so that casually captured videos can be viewed smoothly. It can also be defined as the process of estimating and compensating for the background image motion caused by the ego-motion of the camera. These techniques have been developed to smooth shaky camera motion in videos before viewing.
Processing 360° videos for optimal viewing is still comparatively hard. Researchers have been developing new algorithms for stabilizing 360° videos because much of the field's future lies there. With the introduction of Virtual Reality (VR), 360° video and its stabilization have found a wide range of applications, so the stabilization of 360° videos has become increasingly important.
360° video stabilization can be used in military applications such as localization, navigation, and target tracking. Mounting 360° cameras on war equipment can help ensure the safety of soldiers during wartime by keeping enemy actions in view. Virtual Reality is also highly useful to the navy, which conducts a wide range of training operations, some of them dangerous and expensive; using 360° video and VR technology, both the risk and the cost can be reduced.
360° video stabilization will be a stepping stone towards the Virtual Reality (VR) world. Using VR headsets, we can create realistic images, sound, and other sensations that simulate a user's presence in a virtual or imaginary environment. People will be able to "feel the action", "be the movie character", or "fly through the sky" while sitting at home. VR lets a person become fully immersed in the video; it is a revolutionary technology.
360° video stabilization will also be influential in the medical field, particularly in diagnostic procedures such as endoscopy and colonoscopy. Endoscopy is normally performed by capturing images of internal organs with a special camera; with a 360° camera, we can instead record video of the internal organs and examine the stabilized result.
It can be used in entertainment fields such as gaming and movies to make the experience more realistic and to let individuals experience adventures under extreme conditions. It can be used in traffic systems, in scientific research laboratories to help scientists study a specific topic, and for security purposes in banks, offices, and similar settings. It can also be used in the education industry, where it raises the level of understanding, and in the sports industry, where 360° cameras can show the full action of a game.
The consumer sits still while the camera sometimes moves very rapidly, as in a fast-paced shot such as a skydive or a bungee-jump sequence. This mismatch causes motion sickness in the consumer and thus reduces the appeal of the Virtual Reality experience. Other health effects observed in users of VR headsets include:
• Seizures
• Loss of attentiveness
• Disorientation
• Impaired balance
• Cyber-sickness
"Stabilizing First Person 360 Degree Videos" by Chetan Arora and Vivek Kwatra proposes a 360° video stabilization technique in which GFTT features are tracked between pairs of cube maps using an LK tracker. The tracked feature points are converted to a 3D vector representation and transformed to the coordinate system of an arbitrarily chosen reference face. The tracked 3D points from all faces are then jointly fed into a RANSAC-based relative camera pose estimator. The rotations for all intermediate frames of a batch are computed by solving an optimization problem, after which an update rotation is computed for each frame that, when applied to the frame, yields a smooth camera trajectory. To visualize the effect of stabilization, the angular rotations before and after stabilization are analyzed; the smoother curve after stabilization indicates that the shake in camera orientation has been removed.
M and Dr. Padmavathi S explained that SURF features can be extracted from a color image and combined to detect a colored object; based on their experimental results, an efficient way to detect SURF features is derived. Object recognition depends mainly on shape, and to some extent color also influences the recognition process. Shape is identified from the edges or corners of the object, but these are affected by noise in parameters such as illumination and scale, so a set of interest points together with their neighborhoods can serve object recognition better. For speedy computation of the intensity sums in any region of interest, an integral image is constructed with the same size as the given image. The value at any point (x, y) of the integral image is the sum of the intensities of all image points whose coordinates are less than or equal to (x, y). This integral image is used in the Hessian matrix computation to obtain the intensity sums. The matrix of determinant values, thresholded at a particular filter size, is called the "blob response map". To obtain a proper scale, the interest points are interpolated and the final Hessian is expressed.
A further work explained that the Random Sample Consensus (RANSAC) algorithm for robust parameter estimation has been applied to a wide variety of parametric entities. In many implementations the algorithm is tightly integrated with code pertaining to one specific parametric object. That paper introduces a generic RANSAC implementation that is independent of the estimated object, allowing the user to ignore outlying data elements potentially present in the input. To illustrate the use of the algorithm, the authors implement the components required to estimate the parameter values of a hyperplane and a hypersphere.
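The estimator-agnostic structure described above can be sketched as follows; the `fit`/`residual` callables and the 2D line example are illustrative choices, not code from the paper:

```python
import numpy as np

def ransac(data, fit, residual, n_min, threshold, iters=200, seed=0):
    """Generic RANSAC, independent of the estimated object: `fit` maps a
    minimal sample to model parameters, `residual` maps (model, data) to
    per-element errors."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(iters):
        sample = data[rng.choice(len(data), n_min, replace=False)]
        model = fit(sample)
        inliers = residual(model, data) < threshold
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Plugging in a 2D line (a hyperplane in the plane) as the estimated object.
def fit_line(sample):
    (x1, y1), (x2, y2) = sample
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1

def line_residual(model, data):
    m, c = model
    return np.abs(data[:, 1] - (m * data[:, 0] + c))

x = np.linspace(0.0, 10.0, 50)
pts = np.column_stack([x, 2.0 * x + 1.0])
pts[::10, 1] += 20.0                      # 5 outlying data elements
model, inliers = ransac(pts, fit_line, line_residual, 2, 0.5)
```

Because the estimator is passed in as callables, the same `ransac` loop could be reused for a hypersphere or any other parametric object.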
CHAPTER 2
2 EXISTING SYSTEMS
Here the input videos use an equirectangular projection, which is highly distorted at the poles. The frames are converted to a less distorted cube map representation for tracking. Although points are tracked on the planar cube faces, the 2D locations are immediately converted to 3D unit vectors and stored in a track table. After the whole video has been tracked, the "dangling ends" of tracks are cut off so that each track starts and ends at a keyframe.
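A minimal sketch of lifting a tracked 2D cube-face location to a 3D unit vector, assuming a 90° field of view per face and one common axis convention (both assumptions, not details from the paper):

```python
import numpy as np

def face_point_to_unit_vector(u, v, size):
    """Map a pixel (u, v) on the front face of a cube map (face width
    `size`, 90-degree FOV) to a 3D unit vector on the viewing sphere."""
    # Normalize pixel coordinates to [-1, 1] on the face plane at z = 1.
    x = 2.0 * (u + 0.5) / size - 1.0
    y = 2.0 * (v + 0.5) / size - 1.0
    d = np.array([x, y, 1.0])        # ray through the face plane
    return d / np.linalg.norm(d)     # project the ray onto the unit sphere

v = face_point_to_unit_vector(255.5, 255.5, 512)  # center of a 512-px face
```

The center pixel of the face maps to the face's viewing direction (0, 0, 1), and every output lies on the unit sphere, which is what the track table stores.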
The next goal is to estimate the relative rotation between successive keyframes. Let K = {ki} be the set of keyframes. For each pair of successive keyframes (ki, ki+1), a set of matching point pairs is obtained from the feature tracks, and a five-point algorithm inside a RANSAC procedure estimates the relative rotation. If a keyframe pair has fewer than 8 features, the five-point algorithm is not used; instead, the rotation that minimizes the relative distances between matched points is found directly. The rotations are chained so that each is expressed relative to the first keyframe. To remove the rotations from the video, the inverse transformations are stored; applying these inverse transformations to the keyframes stabilizes them, since it removes all the relative rotation.
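The chaining of relative rotations and the storage of their inverses can be illustrated with plain rotation matrices (a toy sketch with made-up angles, not the authors' implementation):

```python
import numpy as np

def rot_y(deg):
    """Rotation matrix about the y-axis; stands in for an estimated
    relative rotation between two successive keyframes."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical relative keyframe rotations (angles made up for the demo).
relative = [rot_y(5), rot_y(-3), rot_y(4)]

# Chain them so every keyframe rotation is relative to the first keyframe.
absolute = [np.eye(3)]
for rel in relative:
    absolute.append(rel @ absolute[-1])

# Store the inverse transforms; a rotation's inverse is its transpose.
inverses = [a.T for a in absolute]

# Applying each inverse to its keyframe removes all relative rotation.
derotated = [inv @ a for inv, a in zip(inverses, absolute)]
```

After derotation every keyframe sits at the identity orientation, which is exactly the "stabilized" state the text describes.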
Next, the rotations of the inner frames must be found. In the tracking phase, a set of tracks was computed, each being a list of observations. These tracks form visual trajectories that can be plotted on the video, and the inner-frame rotations should be optimized so that these trajectories become as smooth as possible. For that, first-order and second-order smoothness terms are considered. The first-order term encourages the trajectories to be as short as possible; the second-order term encourages smoothness using a discrete Laplacian operator whose 3-tap footprint can reach across keyframes. This does not necessarily recover the true rotations, but rather the ones that produce the smoothest possible result.
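The flavor of this optimization can be shown on a simplified 1D stand-in: keep a trajectory close to its observations while penalizing its first and second discrete differences. This illustrates the smoothness terms only, not the paper's actual rotation optimization:

```python
import numpy as np

def smooth_trajectory(traj, w1=5.0, w2=50.0):
    """Least-squares smoothing: keep x close to the observed trajectory
    while penalizing the first difference (shortness term) and the
    second difference (discrete 3-tap Laplacian term)."""
    n = len(traj)
    D1 = np.diff(np.eye(n), axis=0)        # first-difference operator
    D2 = np.diff(np.eye(n), n=2, axis=0)   # discrete Laplacian operator
    # Normal equations of min |x - traj|^2 + w1|D1 x|^2 + w2|D2 x|^2.
    A = np.eye(n) + w1 * D1.T @ D1 + w2 * D2.T @ D2
    return np.linalg.solve(A, traj)

rng = np.random.default_rng(0)
shaky = np.sin(np.linspace(0, 2 * np.pi, 60)) + 0.3 * rng.standard_normal(60)
smooth = smooth_trajectory(shaky)
```

The weights `w1` and `w2` are arbitrary demo values; raising `w2` strengthens the Laplacian term and flattens the trajectory further.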
Even after the inner-frame optimization, a small amount of residual jitter remains. A generic deformation model is therefore designed to handle such problems. In this model the keyframes use fixed rotations, which effectively prevents the rolling-shutter compensation from drifting too far off, since the frames between keyframes are interpolated.
2.2 Stabilizing first person 360° videos by Chetan Arora and Vivek Kwatra
This paper proposes a novel algorithm for 360° video stabilization that computes the optimal 3D camera rotations for stabilization without destroying the originally intended camera orientations. The 3D rotations are found first between keyframes and then between consecutive frames, as suggested by Kopf. Instead of warping to a reference frame, the computed 3D rotations are used to stabilize the camera orientations by minimizing the first, second, and third derivatives of the resulting camera path. Warping to this smooth camera path keeps the output viewpoint close to the original even when the videographer takes a turn. The algorithm can also be used to stabilize narrow field of view (NFOV) videos.
360° videos are usually represented in equirectangular format. This format is suitable for viewing but hard to process with computer vision techniques, so the frames are first converted to a cube map representation.
The important difference from the previous approach is that, previously, the frames were derotated to the reference frame before the rotations of the intermediate frames were found, which results in smaller rotations for the intermediate frames. Here the frames are not derotated, which results in larger rotations with respect to the reference frame. Since the quaternion representation of large angles is more stable than the axis-angle representation, quaternions are used in the optimization problem.
It was found that camera rotation affects video stabilization more than translation: translation induces little optical flow at far-away points, whereas rotation affects near and far points alike, making rotational camera shake more pronounced. The algorithm therefore finds a steady camera trajectory based on the rotation component alone, ignoring translation.
An update rotation is then computed for each frame which, when applied to the frame, results in a smooth camera trajectory. Warping to this smooth trajectory yields the stabilized output video. The proposed technique is generic enough to be used for narrow-FOV videos as well.
CHAPTER 3
3 PROPOSED SYSTEM
Cube map representation projects the sphere onto the six faces of a cube, and the six images are arranged as an unfolded cube. 360° videos are usually stored in equirectangular format, which is appropriate for viewing but hard to process with computer vision techniques, so we convert the video to cube map format. Each face represents the view along one direction of the 360° frame (top, bottom, left, right, front, and back). Compared with sphere mapping, the cube map format gives much greater capacity to support real-time rendering of reflections.
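As an illustration of this conversion, the following sketch samples the front cube face from an equirectangular frame with a nearest-neighbour lookup; the axis conventions are one common choice, not necessarily those used in this project:

```python
import numpy as np

def front_face_from_equirect(equi, size):
    """Render the front face of a cube map (90-degree FOV, `size` x `size`
    pixels) from an equirectangular frame `equi` of shape (H, W)."""
    h, w = equi.shape[:2]
    # Pixel grid on the face plane z = 1, coordinates in [-1, 1].
    t = 2.0 * (np.arange(size) + 0.5) / size - 1.0
    x, y = np.meshgrid(t, t)
    z = np.ones_like(x)
    norm = np.sqrt(x * x + y * y + z * z)
    # Direction on the unit sphere -> longitude / latitude.
    lon = np.arctan2(x / norm, z / norm)      # in [-pi, pi]
    lat = np.arcsin(y / norm)                 # in [-pi/2, pi/2]
    # Longitude / latitude -> equirectangular pixel coordinates.
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = ((lat / np.pi + 0.5) * h).astype(int).clip(0, h - 1)
    return equi[v, u]

equi = np.arange(32 * 64).reshape(32, 64)     # synthetic test frame
face = front_face_from_equirect(equi, 101)
```

The five remaining faces follow by permuting and negating the (x, y, z) ray components before the longitude/latitude conversion.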
The sum of the original image within any rectangle can be evaluated quickly using the integral image, requiring lookups at only the four corners of the rectangle. SURF uses the Hessian matrix to find feature points: the determinant of the Hessian measures the local change around a point, and points where the determinant is maximal are chosen. For a point p = (a, b) in an image I, the Hessian matrix H(p, σ) at point p and scale σ is

H(p, σ) = [ Lxx(p, σ)   Lxy(p, σ) ]
          [ Lxy(p, σ)   Lyy(p, σ) ]

where Lxx(p, σ) is the convolution of the second-order Gaussian derivative in x with the image I at the point p, and Lxy and Lyy are defined analogously.
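The integral-image construction and the four-corner rectangle sum described above can be sketched as follows (illustrative NumPy, not the project's Matlab code):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img over all pixels at coordinates <= (x, y),
    with a zero row/column padded on top/left to simplify lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of the rectangle [top, bottom) x [left, right) using only the
    four corner values, independent of the rectangle's size."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 4, 4)   # equals img.sum()
```

This constant-time box sum is what makes SURF's box-filter approximations of the Gaussian second derivatives fast at every scale.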
Interest points must be found at different scales, partly because the search for correspondences requires comparing images viewed at different scales. Images are smoothed repeatedly with a Gaussian filter and then subsampled to obtain the next level of the pyramid, so several levels with filter masks of various sizes are computed. The scale space is subdivided into a number of octaves, where an octave refers to a series of response maps covering a doubling of scale.
3.2.2 Descriptor
First, an orientation is assigned to each interest point based on information from a circular region around it. Then a square region aligned to the selected orientation is constructed, and the SURF descriptor is extracted from it.
3.2.3 Matching
The fundamental assumption is that the data consists of inliers, i.e., points whose distribution can be explained by the model (possibly subject to noise), and outliers, which are data that do not fit the model. The MSAC algorithm essentially repeats two steps:
1. In the first step, a sample subset containing the minimal number of data items is randomly selected from the input dataset. A fitting model and the corresponding model parameters are computed using only the elements of this subset, which is the smallest sufficient to estimate the model parameters.
2. In the next step, the method checks which elements of the entire dataset are consistent with the model given by the parameters obtained in the previous step. A data element is considered an outlier if it does not fit the model within some error threshold defining the maximum deviation attributable to noise.
The set of inliers obtained for the model is known as the consensus set. The MSAC method repeats the two steps above until the consensus set obtained in some iteration contains enough inliers. Given a set of observed data values as input, the following steps are repeated:
1. Select a random subset of the original data; this sample subset is treated as the hypothetical inliers.
2. Fit a model to the hypothetical inliers and estimate its parameters.
3. Test all other data against the fitted model. Points that fit the estimated model, according to some model-specific loss function, become part of the consensus set.
4. The estimated model is reasonably good if sufficiently many points have been classified as part of the consensus set.
5. The model may be improved by re-estimating it using all points of the consensus set.
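The loop above can be sketched for a simple 2D line model. MSAC's difference from plain RANSAC is its cost: inliers contribute their squared residual and outliers a constant penalty, and the lowest total cost wins. This is an illustrative sketch, not the project's Matlab implementation:

```python
import numpy as np

def msac_line(pts, threshold=1.0, iters=300, seed=0):
    """Robustly fit y = m*x + c with MSAC (truncated quadratic cost)."""
    rng = np.random.default_rng(seed)
    best_cost, best_model = np.inf, None
    for _ in range(iters):
        # Step 1: minimal random sample (two points define a line).
        (x1, y1), (x2, y2) = pts[rng.choice(len(pts), 2, replace=False)]
        if x1 == x2:
            continue                       # degenerate sample, resample
        # Step 2: fit the model to the sample.
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        # Step 3: score all data with the M-estimator (truncated) cost.
        r2 = (pts[:, 1] - (m * pts[:, 0] + c)) ** 2
        cost = np.minimum(r2, threshold ** 2).sum()
        if cost < best_cost:
            best_cost, best_model = cost, (m, c)
    m, c = best_model
    inliers = (pts[:, 1] - (m * pts[:, 0] + c)) ** 2 < threshold ** 2
    # Step 5: refine by re-estimating on the whole consensus set.
    m, c = np.polyfit(pts[inliers, 0], pts[inliers, 1], 1)
    return m, c, inliers

x = np.linspace(0.0, 9.0, 40)
pts = np.column_stack([x, 3.0 * x - 2.0])
pts[::8, 1] += 15.0                        # gross outliers on 5 points
m, c, inliers = msac_line(pts)
```

Even with the outliers planted on their own parallel line, the truncated cost prefers the model that explains the 35 true inliers.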
An affine transformation is a linear mapping technique that preserves points, straight lines, and planes. It is usually used to correct geometric distortions that arise from non-ideal camera angles; satellite imagery, for example, uses affine transformations to correct wide-angle lens distortion and for panorama stitching and image registration. To eliminate distortion, it is desirable to transform and fuse the images into one large, flat coordinate system, which simplifies interactions and calculations and can eliminate the distortions. An affine transformation maps pixel values at a position in the input image to a new position in the output image by applying a linear combination of translation, rotation, scaling, and shearing operations.
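A sketch of estimating a 2x3 affine matrix from matched point pairs by least squares and applying it (illustrative; the correspondences here are synthetic, not output of the feature matcher):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine transform A with dst ~ A applied to src.
    Needs at least three non-collinear point correspondences."""
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])        # homogeneous source points
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)  # solves X @ A ~ dst
    return A.T                                   # 2x3: [linear | translation]

def apply_affine(A, pts):
    """Apply the linear part then the translation column to each point."""
    return pts @ A[:, :2].T + A[:, 2]

# Synthetic correspondences: rotation + uniform scale + translation.
theta, s, t = np.radians(10), 1.2, np.array([3.0, -1.0])
R = s * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src @ R.T + t
A = estimate_affine(src, dst)
```

Since the synthetic warp lies inside the affine family, least squares recovers the linear part and translation exactly; with real, noisy matches this step would sit after the MSAC inlier selection.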
CHAPTER 4
4 RESULTS
Figure 4.3: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face
Figure 4.4: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom face
Figure 4.5: (a) Input and stabilized frame of left face (b) Input and stabilized frame of right face
Figure 4.8: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face
Figure 4.9: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom face
Figure 4.10: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face
Figure 4.13: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face
Figure 4.14: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom face
Figure 4.15: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face
Figure 4.18: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face
Figure 4.19: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom face
Figure 4.20: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face
Figure 4.23: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face
Figure 4.24: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom face
Figure 4.25: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face
Figure 4.28: (a) Input and stabilized frame of back face (b) Input and stabilized frame of front face
Figure 4.29: (a) Input and stabilized frame of top face (b) Input and stabilized frame of bottom face
Figure 4.30: (a) Input and stabilized frame of right face (b) Input and stabilized frame of left face
CHAPTER 5
5 ADVANTAGES
1. The frame rate is 30 frames per second, i.e., 30 frames are processed every second. Compared to other systems, more frames can be processed in a short period of time.
2. Stabilization is easier because adjacent frames are taken as keyframes.
CHAPTER 6
6 APPLICATIONS
a) 360° video stabilization can be used in military applications such as localization, navigation, and target tracking. Installing 360° cameras on war equipment helps ensure the safety of soldiers during wartime by keeping enemy actions in focus.
b) 360° video stabilization will be influential in the medical field, particularly in diagnostic applications such as endoscopy and colonoscopy. Endoscopy is normally performed by capturing images of internal organs with a special camera; with a 360° camera, we can record video of the internal organs and observe the stabilized result.
c) 360° video stabilization will be a stepping stone towards the Virtual Reality (VR) world. Using VR headsets, we can create realistic images, sound, and other sensations that simulate a user's presence in a virtual or imaginary environment. People will be able to 'feel the action', 'be the movie character', or 'fly through the sky' while sitting at home. VR lets a person become fully immersed in the video; it is a revolutionary technology.
d) Virtual Reality is highly useful to the navy, which conducts a wide range of training operations, some of them dangerous and expensive. Using 360° video and VR technology, both the risk and the cost can be reduced.
e) It can be used in entertainment fields such as gaming and movies to make the experience more realistic and to let individuals experience adventures under extreme conditions.
f) It can be used in traffic systems.
g) It can be used in scientific research laboratories to help scientists study a specific topic.
h) It can be used for security purposes, such as in banks and offices.
i) It can be used in the education industry, where it raises the level of understanding.
j) 360° cameras can be used in the sports industry to show the full, stabilized action of a game.
CHAPTER 7
7 CONCLUSION
This approach does not change the original orientation of the input video, which creates a more natural experience for the viewer. The 360° video stabilization algorithm is implemented in Matlab. We obtain the stabilized video output quickly by combining the SURF extraction technique, the MSAC algorithm, and the geometric transformation technique. SURF is much faster than other feature extraction techniques and handles blurred or rotated images well. The M-estimator Sample Consensus (MSAC) algorithm is an iterative method for estimating the parameters of a mathematical model from a set of observed data containing outliers. Finally, an affine transformation is applied to all the frames. The affine transformation is usually used to correct geometric distortions that arise from non-ideal camera angles; applied to the distorted image, it can correct a range of perspective distortions by transforming the computations from the ideal coordinates to those actually used.