CHAPTER – 2
ARCHITECTURE AND DESIGN
This chapter gives a brief overview of the design theory and concepts used in the
project.
A video consists of an ordered sequence of frames. Each frame contains spatial information,
and the sequence of those frames contains temporal information. To model both of these
aspects, we use a hybrid architecture that consists of convolutional layers (for spatial
processing) as well as recurrent layers (for temporal processing). The first model, a CNN,
extracts the spatial features of each frame and converts them into an encoded feature vector,
so it is called the encoder. Similarly, the second model, an RNN, processes mini-batches of
encoded frames to produce the final classification result, so it is called the decoder.
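
The encoder-decoder data flow described above can be illustrated with a small, dependency-free sketch. It is not the project's model: the fixed 3x3 kernels, the weight matrices `W_IN`, `W_REC` and `W_OUT`, and the function names are all hypothetical and untrained, chosen only to show how each frame becomes a feature vector and how a recurrent update over those vectors yields class probabilities.

```python
import math
from typing import List, Sequence

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities

# Hypothetical fixed 3x3 kernels; a trained CNN would learn its own filters.
KERNELS = [
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # responds to vertical edges
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # responds to horizontal edges
]

# Hypothetical, untrained recurrent weights (2 hidden units, 2 classes).
W_IN = [[0.5, -0.5], [0.1, 0.3]]    # input -> hidden
W_REC = [[0.2, 0.0], [0.0, 0.2]]    # hidden -> hidden
W_OUT = [[1.0, -1.0], [-1.0, 1.0]]  # hidden -> class logits


def encode_frame(frame: Frame) -> List[float]:
    """Encoder: convolve, apply ReLU, then average-pool to one value per kernel."""
    height, width = len(frame), len(frame[0])
    features = []
    for kernel in KERNELS:
        total, count = 0.0, 0
        for r in range(height - 2):
            for c in range(width - 2):
                response = sum(
                    kernel[i][j] * frame[r + i][c + j]
                    for i in range(3) for j in range(3)
                )
                total += max(0.0, response)  # ReLU
                count += 1
        features.append(total / count)
    return features


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_sequence(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Decoder: simple recurrent update over the embeddings, then softmax."""
    hidden = [0.0] * len(W_REC)
    for x in embeddings:
        hidden = [
            math.tanh(
                sum(w * v for w, v in zip(W_IN[i], x))
                + sum(w * h for w, h in zip(W_REC[i], hidden))
            )
            for i in range(len(hidden))
        ]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W_OUT]
    return softmax(logits)


def classify_clip(frames: Sequence[Frame]) -> List[float]:
    """Encode every frame in order, then decode the sequence of embeddings."""
    return decode_sequence([encode_frame(f) for f in frames])
```

Calling `classify_clip` on a list of equally sized frames (at least 3x3 pixels each) returns one probability per class. In practice both stages would be trained networks from a deep learning framework; only the division of work between them is the point here.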
In real-time video processing, the data pipeline becomes more complex to handle. We aim to
minimize latency in the streaming video while still ensuring sufficient accuracy of the
implemented models.
The first part of this process, preprocessing and encoding frames, is serial: it is performed
on each frame as it arrives sequentially from the CCTV feed.
The second part, decoding a batch of embeddings into predicted probabilities for the
different classes, is performed once the required number of frames is available.
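
One way to realize this two-part flow is a small buffer that collects per-frame embeddings and invokes the decoder only when a full batch is available. The sketch below is an illustration under stated assumptions: the class name `EmbeddingWindow`, the `window_size` parameter, and the choice of non-overlapping batches are hypothetical and not taken from the project.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence

Embedding = List[float]


class EmbeddingWindow:
    """Collects embeddings one at a time and decodes each full batch.

    Assumption: batches do not overlap, so the buffer is emptied
    after every decode call.
    """

    def __init__(
        self,
        window_size: int,
        decode: Callable[[Sequence[Embedding]], List[float]],
    ) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.decode = decode
        self._buffer: Deque[Embedding] = deque()

    def push(self, embedding: Embedding) -> Optional[List[float]]:
        """Add one embedding; return class probabilities when a batch is full."""
        self._buffer.append(embedding)
        if len(self._buffer) < self.window_size:
            return None  # not enough frames yet
        batch = list(self._buffer)
        self._buffer.clear()
        return self.decode(batch)
```

Each incoming frame is encoded and passed to `push`, which returns `None` until the required number of embeddings has accumulated. An overlapping (sliding) window would be an alternative design that produces predictions more often at the cost of more decoder calls.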
To make the best use of the hardware resources and to reduce latency and lag while
monitoring the live feed, we can use a pipeline approach that splits the processing into
separate stages and runs those stages in parallel.
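
A minimal sketch of such a pipeline, assuming three stages (capture, encode, decode) connected by bounded queues and each running in its own thread. The function `run_pipeline`, its parameters, and the `_STOP` sentinel are hypothetical names introduced for illustration; the `encode` and `decode` callables stand in for the actual models.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel that tells the next stage to shut down


def run_pipeline(
    frames: Iterable[Any],
    encode: Callable[[Any], Any],
    decode: Callable[[List[Any]], Any],
    window_size: int,
    max_pending: int = 8,
) -> List[Any]:
    """Run capture, encode and decode as concurrent stages; return predictions."""
    # Bounded queues apply back-pressure so a slow stage cannot be flooded.
    frame_q: queue.Queue = queue.Queue(maxsize=max_pending)
    embed_q: queue.Queue = queue.Queue(maxsize=max_pending)
    results: List[Any] = []

    def capture_stage() -> None:
        for frame in frames:
            frame_q.put(frame)
        frame_q.put(_STOP)

    def encode_stage() -> None:
        while True:
            frame = frame_q.get()
            if frame is _STOP:
                embed_q.put(_STOP)
                return
            embed_q.put(encode(frame))

    def decode_stage() -> None:
        window: List[Any] = []
        while True:
            item = embed_q.get()
            if item is _STOP:
                return  # an incomplete final window is discarded
            window.append(item)
            if len(window) == window_size:
                results.append(decode(window))
                window = []

    stages = (capture_stage, encode_stage, decode_stage)
    threads = [threading.Thread(target=stage) for stage in stages]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, `run_pipeline(range(7), encode=lambda f: f * 2, decode=sum, window_size=3)` decodes two complete windows and drops the seventh item. In CPython, threads overlap usefully only when the stages release the interpreter lock, as I/O and most native inference libraries do; separate processes are an alternative when they do not.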