
Video Compression for Surveillance Applications Using a Deep Neural Network

Department of Computer Science and Engineering, RRCE
Rajarajeswari College of Engineering, Bangalore, Affiliated to Visvesvaraya Technological University
Vmsp4umohan@gmail.com, navyagowda270@gmail.com, mantoshkichni@gmail.com,
abhileshpatil1961@gmail.com, pratibha.deshak@gmail.com
Abstract: A fresh comparative analysis of video compression technologies is offered in this study. Video streaming applications are becoming increasingly popular as internet technology and computers advance at a rapid pace; as a result, uncompressed raw video today demands a great deal of disk space and network bandwidth to store and deliver. We describe a novel technique for surveillance video compression that improves on the shortcomings of previous approaches by replacing each standard component with a neural network counterpart. It delivers a higher-quality video stream at a consistent bit rate compared to previous standards; the appropriate video compression technology must therefore be selected to fulfil the requirements of the video application. Our work is founded on the common principle of reducing the bit rate while minimising distortion in the decoded frames by exploiting the spatial and temporal redundancy in video frames. We use neural networks to build a video compression strategy in the traditional sense and encode redundant data with fewer bits. Experiments reveal that our solution is successful and surpasses traditional MPEG encoding while retaining visual quality at similar bit rates. Although our approach is geared towards surveillance, it can easily be applied to other types of video.

Keywords: Video Compression, Algorithm, Deep Learning, Motion Estimation, Bit Rate Estimation

I. INTRODUCTION

Surveillance cameras are becoming more common in countries around the world. There are approximately 760 million security cameras deployed around the world, and the number is expected to climb to around 1 billion by 2023. The high level of interest in these technologies can be linked to benefits such as better security and lower crime rates. Most surveillance recordings have a lot of temporal redundancy, because the surroundings are usually static and only a few people are moving about: moving items alter only a portion of the frame. As a result, a significant portion of the bandwidth is used to broadcast scenes that are of little or no interest because they do not include semantically interesting items (for example, people, vehicles and animals). In a camera network with a large number of cameras, unprocessed video feeds from all cameras are frequently transferred to a single server for storage, processing and viewing, which puts a lot of strain on the primary server's processor. Furthermore, in most situations the streams must be stored for future use, which requires a significant amount of storage. These problems can be handled by employing a video compression technology that exploits spatiotemporal redundancy: the key details are preserved in the streams, making additional analysis and storage possible.

Many existing standards for video compression, such as MPEG and H.264, which use mathematical techniques to compress video, are widely used. While they have been meticulously created and fine-tuned, they are intended to be utilised in a generic environment; they can only be general-purpose in surveillance applications, which limits their potential to be specific.

Many problems have been solved using deep learning techniques, including in the field of picture compression, which has made significant progress. Commercial codecs have been outperformed by ML-based image compression techniques by a wide margin, and progress is still being made; advancements in hardware support for machine-learning applications have accelerated this process even further. However, while image compression is mostly focused on spatial estimation, video compression must also take motion information into account.

Deep learning methods for video compression are being investigated both for the full compression process and for portions of a typical pipeline. One approach applicable to our situation is to detect interesting sections of a video frame, which can be difficult, and to allocate more bits to them: the interesting regions are given a detailed compressed representation, while fewer bits are assigned to the rest of the frame. By smoothing out information that would otherwise be transmitted unnecessarily, a more compact stream can be created. However, although object detection techniques are robust and accurate, this strategy makes the solution very specialised.

Our solution is to compress video streams using an end-to-end deep learning model, yielding surveillance systems that are both reliable and light on memory.

The process of compressing a video file so that it takes up less space is known as video compression. The compressed file is smaller than the original and easier to send over the internet. Compression minimises the size of video files by removing unnecessary and non-functional data. Video encoding is the process of compressing and preparing a video file for playback in the appropriate formats and specifications. Although video compression reduces file size, it may have unintended consequences for video quality; video encoding, on the other hand, compresses the video file while maintaining its quality.

Deep Learning is a subset of machine learning that works with algorithms inspired by the structure and function of the brain in the form of neural networks. A convolutional neural network (CNN) is a kind of neural network that analyses large amounts of data using a deep learning method: given an input image, it assigns importance to various elements of the image and is able to distinguish one from another. The pre-processing required by a ConvNet is significantly less than that of other classification algorithms.

II. EASE OF USE

A. TRADITIONAL VIDEO COMPRESSION PIPELINE

The most widely employed standard for video compression today is H.264, which was conceived by a number of important participants in the video coding industry.

In general, video compression algorithms take advantage of the redundant responses of the human vision system (HVS). Traditional compression techniques use the Y:Cr:Cb colour space to transform the sampled Red, Green and Blue spatial sequence into a luma component and chroma components. The HVS is very sensitive to the luma component but much less so to the chroma components, so in the encoded signal the two chroma components can be sent at reduced resolution alongside the luma component.

B. PROPOSED SYSTEM

Motion Estimation: The displacement between the current and previous frames is determined.

Motion Compensation: Pixels from previous frames are shifted on the basis of the motion vectors or optical flow obtained during the motion estimation step to create the motion-compensated frame.

Residual Compression: The difference between the target frame and the motion-compensated frame is calculated and compressed.

Motion Compression: The motion data gathered during the motion estimation step is compressed and delivered to be encoded.

Encoding: After compression, the residual and motion information are encoded and delivered to the decoder.

Frame Reconstruction: To generate a reconstructed frame, the motion-compensated frame from the motion compensation step is combined with the residual from the residual compression step.
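The six stages of the proposed system can be sketched as a single encode/reconstruct loop. The following is a minimal illustration of the data flow only; each function is a placeholder for a trained network component (the names are illustrative, not from the paper), with simple quantisation standing in for the learned compression modules:

```python
import numpy as np

def estimate_motion(current, previous):
    """Motion estimation: displacement between current and previous frame
    (here a zero flow field, a stand-in for a learned motion network)."""
    return np.zeros(current.shape + (2,))              # (H, W, 2) flow field

def compress_motion(flow):
    """Motion compression: quantise the motion information for encoding."""
    return np.round(flow)

def compensate(previous, flow_hat):
    """Motion compensation: shift previous-frame pixels by the quantised
    motion; with a zero flow this is simply the previous frame."""
    return previous

def compress_residual(residual):
    """Residual compression: coarsely quantise the frame difference."""
    return np.round(residual / 8.0) * 8.0

def encode_frame(current, previous):
    flow = estimate_motion(current, previous)               # motion estimation
    flow_hat = compress_motion(flow)                        # motion compression
    predicted = compensate(previous, flow_hat)              # motion compensation
    residual_hat = compress_residual(current - predicted)   # residual compression
    return flow_hat, residual_hat                           # encoded and sent

def reconstruct_frame(previous, flow_hat, residual_hat):
    """Frame reconstruction: compensated frame plus decoded residual."""
    return compensate(previous, flow_hat) + residual_hat

prev = np.zeros((16, 16))
cur = np.full((16, 16), 10.0)
flow_hat, residual_hat = encode_frame(cur, prev)
recon = reconstruct_frame(prev, flow_hat, residual_hat)
print(float(np.abs(recon - cur).max()))   # small error introduced by quantisation
```

The reconstructed frame differs from the target only by the quantisation error of the residual, which is the trade-off the bit-rate/distortion loss of the real system would balance.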
C. TRANSFORMATION FUNCTIONS

The frame data is converted from the spatial domain using a transformation function. This work focuses on the most recent and widely discussed innovations of the last five years; as a result, H.264 and HEVC have been chosen for further investigation. In video coding techniques in general, and in H.264 and HEVC in particular, the transform basis functions are obtained from the DCT. The DCT is a method for transforming the picture elements of an image component from the spatial domain to the frequency domain, where redundancy can be discovered. In video compression the frame is broken into blocks with sizes ranging from 4x4 to 64x64; the JPEG image compression method, for example, breaks the image into 8x8 blocks and applies a two-dimensional discrete cosine transform (DCT) to each of them.

The DCT function is represented by equation (1), which in its standard two-dimensional form for an n x n block is

F(x, y) = (2/n) C(x) C(y) sum_{i=0}^{n-1} sum_{j=0}^{n-1} f(i, j) cos[(2i+1)x pi / 2n] cos[(2j+1)y pi / 2n]    (1)

In video decompression, the Inverse Discrete Cosine Transform (IDCT) is applied to the DCT coefficient blocks; the IDCT function is defined by equation (2):

f(i, j) = (2/n) sum_{x=0}^{n-1} sum_{y=0}^{n-1} C(x) C(y) F(x, y) cos[(2i+1)x pi / 2n] cos[(2j+1)y pi / 2n]    (2)

with C(0) = 1/sqrt(2) and C(k) = 1 otherwise, where f_k and F_k are the pixel value and the DCT coefficient, respectively, and k is the block index.

The uncompressed/raw as well as the losslessly compressed digital image requires high memory and bandwidth; the DCT significantly reduces the amount of memory and bandwidth required.

D. QUANTIZATION FUNCTION

Quantization is the inevitability of representing a value as a number with a definite number of decimal places. In video coding, the quantization scale code is divided element-wise by the quantization matrix, and each resulting element is rounded. The quantization parameter determines the step size for associating the transformed coefficients with a limited number of steps; in video coding its value is inversely proportional to the PSNR and directly proportional to the compression ratio (CR). In both standards, the DC value is the value with zero frequency. Inverse quantization formulas are applied to a transformed signal: equations (3) and (4) are the quantization equations, equations (5) and (6) are the quantization formulas for intra and inter coding of the AC values (the non-zero frequencies), and equation (7) is the inverse quantization formula used in the standards. [Equations (3)-(7) are not reproduced in this copy.]

Here F_k(x, y) is the DCT-transformed signal, k is the block index, and delta is the quantization parameter. In video coding, DC denotes the value at x = y = 0, AC denotes the values at (x, y) in (0,1), ..., (n, n), and n is the block size.

E. MOTION COMPENSATION AND MOTION ESTIMATION

This subsection examines the motion estimation (ME) and motion compensation (MC) functions, which are the most important aspects of inter coding. In video coding, motion is utilised to encode one frame in terms of another. The motion estimation technique uses the block motion estimation concept, which encodes frame data using shifted versions of blocks from other frames.

The full search (FS) algorithm is the most fundamental and basic search strategy: FS uses a cost function to identify the best block match out of all the pixels in the search range. The sum of absolute differences (SAD) is chosen in H.264 and HEVC because it requires fewer resources than the other cost functions while having acceptably low distortion. Equation (8) compares a block of candidates in the preceding frame (I_{t-1}) at location (x+u, y+v) with a template block in the present frame at position (x, y); in its standard form,

SAD(u, v) = sum_{i=0}^{n-1} sum_{j=0}^{n-1} | g_t(x+i, y+j) - g_{t-1}(x+u+i, y+v+j) |    (8)

where g_t(.) is a pixel value in frame I_t and g_{t-1}(.) is a pixel value in frame I_{t-1}. The motion vector (MV) (u, v) is then defined by equation (9) as the displacement that minimises this cost:

(u, v) = argmin_{(u', v')} SAD(u', v')    (9)
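As a concrete sketch of the full-search procedure of equations (8) and (9), the following example exhaustively scans a search window in the previous frame for the candidate block that minimises the SAD cost. It is a minimal illustration, not the optimised search of a real encoder:

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks (eq. 8)."""
    return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

def full_search(prev_frame, cur_frame, x, y, n=8, search_range=4):
    """Full-search block matching: return the motion vector (u, v) minimising
    the SAD between the n x n block at (x, y) in the current frame and
    candidate blocks in the previous frame (eq. 9)."""
    template = cur_frame[y:y + n, x:x + n]
    best_mv, best_cost = (0, 0), float("inf")
    for v in range(-search_range, search_range + 1):
        for u in range(-search_range, search_range + 1):
            yy, xx = y + v, x + u
            if yy < 0 or xx < 0 or yy + n > prev_frame.shape[0] or xx + n > prev_frame.shape[1]:
                continue  # candidate block falls outside the frame
            cost = sad(template, prev_frame[yy:yy + n, xx:xx + n])
            if cost < best_cost:
                best_cost, best_mv = cost, (u, v)
    return best_mv, best_cost

# A bright block that moved 2 pixels right and 1 pixel down between frames:
prev = np.zeros((32, 32), dtype=np.uint8)
prev[9:17, 10:18] = 200                 # object in the previous frame
cur = np.zeros((32, 32), dtype=np.uint8)
cur[10:18, 12:20] = 200                 # same object, shifted by (2, 1)
mv, cost = full_search(prev, cur, x=12, y=10)
print(mv, cost)                         # -> (-2, -1) 0
```

The recovered motion vector points from the current block back to its perfect match in the previous frame, so the residual after compensation would be zero for this block.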
F. ENTROPY CODING

Entropy encoding is a lossless data compression approach that is unaffected by the medium's particular features. Each unique symbol in the input is given a unique prefix code through entropy coding.

Figure: Proposed end-to-end video compression framework. There is a one-to-one mapping between the traditional video compression pipeline and our proposed framework.

G. CONVOLUTIONAL NEURAL NETWORK

Convolutional Neural Networks (CNNs) are neural networks trained to handle data with a grid-like structure, for example photographs. A digital image is a binary representation of visual data: it is made up of a grid of pixels whose values specify how bright and what colour each one should be. When we see an image, our brain processes a large amount of data. In the biological perception system, each neuron reacts to external stimuli only in a tiny region of the visual field called its receptive field, and the receptive fields of neighbouring neurons are linked to cover the full visual field; likewise, each neuron in a CNN analyses data only within its receptive field. Less complicated patterns (lines, curves, and so on) appear first in the early layers, followed by more complicated patterns (faces, objects, etc.). A CNN can thus be used to provide sight to a computer.

The CNN's most critical component is the convolution layer, which accounts for the majority of the system's processing. This layer computes a dot product between two matrices, one of which is the restricted region of the receptive field and the other a set of learnable parameters called a kernel. The kernel is spatially smaller than the picture but has greater depth: if the image has three (RGB) channels, the kernel height and width will be modest, but its depth will span all three channels. A convolutional neural network is a feed-forward neural network with up to 20 or 30 layers, and the convolutional layer, a special sort of layer, gives it its power.

CONCLUSION

In the past, video was stored on magnetic cassettes. The discrete cosine transform has been a common tool in video compression since its beginning. H.261, the first practical video coding standard, was born out of a series of projects. H.264 is a video compression standard created by a number of major participants in the video coding industry; it is today's most widely acknowledged and used video coding standard.

The process of reducing the file size and modifying the format of a video is known as video compression. It can save money by reducing the amount of storage space needed to store video, and it minimises the amount of bandwidth needed to send video, making media consumption more enjoyable for users. It eliminates frames that are duplicated or repeated, leaving only the ones that are required: if two frames are quite similar, compression removes the redundant data from one frame and replaces it with a reference to the other. The purpose of video compression is to send video data at a low bitrate while maintaining image quality.

Figure (a): Original (target) frame; (b): Reconstructed frame using the proposed work.
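The dot product between a kernel and each receptive field, as described in Section G, can be illustrated directly. Below is a minimal sketch with a single hand-written 3x3 edge-detection kernel; a real CNN would learn many such kernels rather than use a fixed one:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output pixel is the dot product
    of the kernel with the receptive field it currently covers ('valid'
    padding, single channel, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            receptive_field = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(receptive_field * kernel)  # the dot product
    return out

# A vertical-edge-detecting kernel responds strongly at the boundary:
image = np.zeros((5, 6))
image[:, 3:] = 1.0                          # right half bright, left half dark
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # 3x3 vertical edge detector
response = conv2d_valid(image, kernel)
print(response.max())                       # -> 3.0, at the dark/bright edge
```

The output is largest where the receptive field straddles the intensity edge and zero over uniform regions, which is exactly the locality property the section describes.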
Figure (a): Comparison between our model and MPEG based on MS-SSIM; (b): Comparison between our model and MPEG based on PSNR.
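The PSNR figure used in the comparison above is derived from the mean squared error between the original and reconstructed frames. A minimal sketch of the computation:

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    flat_o = [p for row in original for p in row]
    flat_r = [p for row in reconstructed for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_r)) / len(flat_o)
    if mse == 0:
        return float("inf")             # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)

# An 8x8 frame with a single pixel off by 10 grey levels: MSE = 100/64
original = [[128] * 8 for _ in range(8)]
noisy = [row[:] for row in original]
noisy[0][0] = 138
print(round(psnr(original, noisy), 2))  # -> 46.19
```

Higher PSNR at the same bit rate means less reconstruction distortion, which is how the curves in the figure compare the proposed model against MPEG.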