CCTV footage
Swati Shilaskar Ritika Sisodiya Tejas Chougule
E&TC Dept E&TC Dept E&TC Dept
VIT, Pune VIT, Pune VIT, Pune
swati.shilaskar@vit.edu ritika.sisodiya20@vit.edu tejas.chougule21@vit.edu
Abstract- Person tracking in multiple CCTV footages is a challenging problem in computer vision that has attracted increasing attention in the past few years. The main aim is to accurately track individuals as they move through different camera views, which is essential for a broad range of applications such as security, surveillance, and crowd monitoring. However, this task is complicated by a number of factors, including occlusions, changes in lighting, and camera angles. To address these challenges, researchers have developed a variety of techniques such as multi-camera calibration, feature extraction, and tracking algorithms. Deep learning methods have also shown promising results in improving person tracking accuracy. However, there are still many open research questions, particularly in cases of crowded scenes or rapid movements. In this paper, we review recent advances in person tracking in multiple CCTV footages, discussing both traditional methods and deep learning approaches. We also highlight some of the remaining challenges and future directions for research in this area. Overall, person tracking in multiple CCTV footages remains an active area of research with significant potential for real-world applications.

I. INTRODUCTION

Visual surveillance systems are widely used to monitor people and objects in different settings, including airports, public areas, and traffic management. With the advances in computer vision technologies, these systems can be automated to increase their efficiency and reduce the workload on human operators. However, conventional single-camera surveillance systems are not sufficient for complex surveillance tasks, which require multi-camera surveillance systems for object detection, tracking, and automatic calibration. This paper aims to provide an overview of multi-camera visual surveillance systems, particularly their challenges and limitations, the most recent methods for object detection and tracking, and the evaluation metrics and datasets used to measure their performance. The primary objective is to serve as a reference for researchers and practitioners interested in developing new approaches to object detection and tracking in multi-camera visual surveillance systems.

The paper is organized into three sections. The first section examines the challenges and limitations associated with multi-camera visual surveillance systems. It highlights the importance of inter-camera calibration and correspondence for effective object detection and tracking. The second section covers state-of-the-art object detection and tracking methods, including feature-based, appearance-based, and deep learning-based approaches. This section gives an overview of the advantages and limitations of each approach and highlights the importance of selecting the right method based on the surveillance scenario. The third section discusses the existing datasets and evaluation metrics used to measure the performance of multi-camera surveillance systems. This section emphasizes the importance of using standardized evaluation metrics and datasets to facilitate comparison and reproducibility of results. Overall, the paper emphasizes the importance of multi-camera visual surveillance systems and their potential to enhance safety and crime prevention measures in various fields. By providing an overview of the latest techniques and evaluation metrics, this paper aims to promote further research in this area and facilitate the development of more efficient and accurate object detection and tracking approaches for multi-camera visual surveillance systems.

II. LITERATURE REVIEW

One paper proposes a method for multi-person tracking across non-overlapping uncalibrated cameras using data association. The method involves building trajectories from each camera independently and finding associations between trajectories from different cameras. The paper presents a method to explore human part configurations on trajectories to describe inter-camera spatial-temporal constraints and formulates the association problem
as a multi-class classification problem using domain priors such as group activity. The proposed method achieves robust multi-person tracking, and experimental results on a benchmark dataset validate its effectiveness [1].

Another proposed system tracks the positions of multiple people from overlapping cameras using a novel two-step method that jointly estimates person position and track assignment. The system efficiently computes the K-best joint estimates for person position and track assignment under an approximation of the likelihood function using a subset of cues. In the hypothesis verification stage, the known person positions associated with these solutions are used to define a larger set of distinctive cues to re-rank the found assignments. The system outperforms the state of the art on four challenging multi-person datasets [2].

Another paper focuses on the problem of target-agnostic person tracking and recognition across multiple non-overlapping cameras. The study investigates existing algorithms for online person tracking using discriminative spatio-temporal features from video data, and identifies open problems and future research directions. The proposed approaches include a spatio-temporal model using LSTM networks and the use of motion metadata such as person bounding box and camera number. These methods achieve state-of-the-art performance on large-scale tracking datasets without requiring target-specific appearance models [3].

A new integrated framework addresses the problems of thermal-visible video registration, sensor fusion, and people tracking for far-range videos. The proposed framework uses RANSAC trajectory-to-trajectory matching for video registration, sensor fusion to compute sum-rule silhouettes, and multiple object tracking for tracking. Results show that the proposed framework obtains better results for both image registration and tracking than separate methods [4].

A real-time tracking system for surveillance and security using multiple cameras has also been proposed. The system is based on feature extraction, background subtraction, and object identification. It extends the concept of single detection using a stationary camera to multiple object detection under multiple cameras for indoor environments. The objective is to enhance security measures and achieve real-time detection of non-stationary objects in video sequences and live video streaming. This is accomplished by employing a robust algorithm that is capable of detecting and tracking human bodies with high reliability [8].

Another paper discusses the challenges of object tracking in real-world CCTV footage due to factors such as moving backgrounds, motion blur, and scale changes. The authors propose the use of convolutional neural networks (CNNs) for more efficient tracking and explore the use of heterogeneous training data and data augmentation to improve detection rates. They also propose incorporating the spatial transformation parameters of objects to predict and model camera parameters, which could lead to improved performance. To evaluate their approach, they test it on publicly available datasets as well as real-world CCTV videos [10].

Multi-camera visual surveillance systems have become increasingly popular for various applications, such as continuously tracking objects of interest and providing early warning of abnormal events. However, one of the main challenges in wide-area object tracking is establishing correspondences between different object sequences observed across multiple cameras, which may have either overlapping or non-overlapping views. Overlapping multi-camera tracking systems usually require camera calibration and rely on geometric constraints, while tracking objects across non-overlapping views presents even greater difficulties. To address this issue, many factors need to be taken into consideration, including illumination variance, viewpoint changes, pose changes, nonuniform clothing, self-occlusions, and differences in camera characteristics, among others [11].

Another paper introduces new features to enhance the re-identification of individuals within a group of people. The proposed method considers a group to be a cohesive set of individuals with shared relative positions and trajectories, even across different cameras. To increase the accuracy of re-identification, the approach combines traditional image cues that depict the appearance of each person with the newly developed group features. The method recognizes groups of people by classifying trajectories and uses features obtained from the recognized groups to re-identify people across non-overlapping cameras. In the proposed method for group detection, the spatio-temporal relationship between pairs of pedestrians is represented at each time point [12].

Multiview analysis can compensate for the effects of occlusion and noisy observations. However, cost and logistics considerations often limit the number of overlapping cameras that can be used. The main concern is how to work with as few as 3-4 overlapping cameras with very different oblique downward viewing directions. Most approaches use the object's visual hull obtained by volume carving, so the biggest challenge is establishing the correct correspondence between views at the object level. The task of recognizing and tracking detected objects as people is formulated as a bipartite graph matching problem. Hypotheses that associate 2D detections across views are constructed using volumetric reconstruction to form detections in 3D space. When it comes to detecting and tracking multiple people with multiple cameras, several key aspects need to be considered. These include finding the relation between different camera views, linking detections with tracks, and identifying false positives. The primary contribution of the approach is a two-step likelihood optimization technique that addresses all of these issues while also taking into account the positional information of foreground objects. Joint estimation involves using multi-view appearance models and occlusion information, with the added advantage of leveraging overlapping camera settings. The proposed method was compared to several state-of-the-art methods and different types of background estimation to evaluate its performance. Results showed that in scenarios where person densities are lower and static backgrounds are utilized across all methods, KSP-based methods perform better than the proposed recursive approach [13].

For large-scale surveillance cameras to be effective real-time systems, they must be capable of identifying and responding to events as they occur. However, the generation and processing of massive amounts of data represent significant challenges for any intelligent surveillance system. The primary aim of a surveillance system is to reduce the potential for economic or physical harm, and the speed at which decisions are made plays a crucial role in achieving this objective. The effectiveness of a video surveillance system is largely determined by the time it takes to make decisions. To improve the effectiveness of the system, the proposed method involves extracting features
such as color, shape, and texture from individuals observed by the cameras. The combination of these features is used to create a signature for each person, which enables the system to identify them. When dealing with multiple cameras, the model identifies relationships between each pair of cameras by comparing the signature of the same person across different cameras. To improve the accuracy of person identification, the model uses multiple features, including color, pose, and texture, to create a more comprehensive signature. In non-overlapping fields of view, a binary SVM is used to establish the relationship between two cameras to re-identify a person. False detections cannot be entirely avoided, but the model employs background subtraction to reduce their impact. By calculating the percentage of the detected region in motion, the system can minimize the effects of false detections. Overall, the proposed model and method show promise in achieving the aim of effective video surveillance systems [14].

In [9], various techniques for separating the object(s) from the input video frames via background subtraction are explored. The work essentially serves as an overview of background subtraction methods and discusses the Wallflower, W4, Cutler, Halevy, and other algorithms. The algorithms, their fundamental mathematical models for differentiating between foreground and background pixels, and remarks on their requirements are shown in Table I.

Table I Comparative Study

Algorithm | Model | Remarks
Heikkila and Olli [9] | $|I_i - B_i| > Th$; $B_{i+1} = \alpha I_i + (1 - \alpha) B_i$ | 1. Predefined constant threshold. 2. For background correction, $\alpha$ must be a very small constant.
Adaptive mixture of Gaussians [5] | $P(I_i) = \sum_{j=1}^{k} W_{j,i} \, F(I_i; \mu, \sigma^2)$ | 1. Background is updated each time before the foreground is detected. 2. Initial constant threshold required.
W4 [9] | $|Min - I_i| > Th$, $|Max - I_i| > Th$ | 1. Morphological operations such as erosion and dilation.
Cutler [14] | $\sum_{c \in RGB} |I_i(c) - B_i(c)| > M\sigma$ | 1. Noise standard deviation $\sigma$ (offline). 2. Basically used for template matching.
Wallflower [9] | $B_i = -\sum_{k=1}^{p} a(k) B_{i-k}$ | 1. The coefficients $a(k)$ are updated in each frame. 2. Two autoregressive background models are used.

Another work addresses the identification, extraction, and tracking of moving objects in surveillance videos captured from multiple angles. The proposed solution uses a multi-view foreground matching model that employs HOG detection and system clustering techniques to recognize and match the same foreground object from different viewpoints. The study's findings indicate that the model efficiently detects and extracts foreground moving objects and matches them across multiple surveillance videos [16].

Another paper reviews existing approaches for vision-based object detection and tracking and highlights the issue of occlusion. The paper suggests that a combination of features such as depth data, geometry, texture, color, and speed, together with a combination of techniques such as 3D techniques, optimal trajectory estimation, region-of-interest feature extraction, and feature fusion, can be used to track multiple objects under occlusion conditions. The paper also suggests that synchronizing multiple cameras with optimal feature extraction, trajectory estimation, and classification can be an effective solution for tracking multiple objects under occlusions, especially in traffic and outdoor surveillance applications [17].

A survey of various object tracking algorithms focuses on the role of feature selection. The algorithms are evaluated under different environmental conditions to determine their efficacy. The paper highlights the importance of considering shape, color, texture, object of interest, and motion in multiple directions for object tracking in video surveillance applications. The results show that a single algorithm can be designed by combining multiple features for more effective object tracking [18].

Another paper discusses a novel approach to tracking moving objects in video surveillance applications. Specifically, the proposed method involves a multi-feature fusion model that utilizes color and edge orientation information. The approach also incorporates a stochastic fusion scheme, which serves to enhance tracking robustness and address occlusion issues. The experimental results show that the proposed method is effective in achieving strong tracking performance [19].

Another paper presents a framework for robust identification, tracking, and categorisation of multiple moving objects in real time from UAV videos. This framework leverages off-the-shelf UAV systems and a laptop computer to offer a range of data, including the coordinates and velocities of identified objects. The system's efficacy is cited as surpassing human proficiency for detecting moving objects [20].

III. METHODOLOGY

Figure 1
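As a concrete illustration of the background models reviewed in Table I, the running-average model listed for Heikkila and Olli (foreground test $|I_i - B_i| > Th$, update $B_{i+1} = \alpha I_i + (1 - \alpha) B_i$) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation used in any cited work: frames are assumed to be small grayscale images stored as nested lists, the function names are hypothetical, and the default values of `alpha` and `threshold` are illustrative only.

```python
# Minimal sketch of running-average background subtraction.
# Assumptions (not from the paper): grayscale frames as nested lists of
# numbers, hypothetical function names, illustrative alpha and threshold.

def update_background(background, frame, alpha=0.05):
    """Apply B_{i+1} = alpha * I_i + (1 - alpha) * B_i to every pixel."""
    return [
        [alpha * i_px + (1 - alpha) * b_px for i_px, b_px in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark a pixel as foreground when |I_i - B_i| > Th."""
    return [
        [abs(i_px - b_px) > threshold for i_px, b_px in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


def detect_foreground(frames, alpha=0.05, threshold=30):
    """Use the first frame as the initial background, then process the rest.

    Returns one boolean mask per remaining frame and the final background.
    """
    background = [[float(px) for px in row] for row in frames[0]]
    masks = []
    for frame in frames[1:]:
        # Classify against the current background before updating it.
        masks.append(foreground_mask(background, frame, threshold))
        background = update_background(background, frame, alpha)
    return masks, background
```

A small `alpha` makes the background adapt slowly, which matches the remark in Table I that the correction constant must be very small; a larger value would absorb slow-moving people into the background more quickly.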
This section discusses recording video using multiple cameras to create a data set and briefly describes the research issues that arise when using multiple cameras. Collecting multiple CCTV footage streams from different angles can be a useful method for tracking people, objects, and activities in public spaces. The process involves identifying the location and cameras, obtaining permission, gathering and processing the footage using specialized software or tools, analyzing the footage with object detection and tracking algorithms, and refining the algorithms as needed. However, it is important to consider privacy concerns and ensure that the use of surveillance technology is transparent and accountable to the public. By balancing the benefits of the technology with ethical considerations, we can use multiple CCTV footage streams to improve safety and security in public spaces.

To analyze CCTV footage and track people or objects in public spaces, a deep learning-based object detection algorithm can be used. After gathering and processing the footage, the algorithm uses a neural network to learn relevant features and to detect and track objects in real time. The algorithm can be trained using tools like TensorFlow or PyTorch on a large dataset of labelled images, and can be refined by adjusting parameters or retraining the model with additional data. By using a deep learning-based object detection algorithm, we can achieve higher accuracy and efficiency.

To match tracks across video streams, overlap-based matching, appearance-based matching, or a combination of both techniques can be used. Overlap-based matching involves matching tracks based on the overlap of their bounding boxes in consecutive frames, while appearance-based matching involves matching tracks based on the similarity of their appearance features such as color, texture, or shape. By combining both techniques, the accuracy of matching can be improved, especially in complex environments with occlusions or poor footage quality. The choice of technique depends on application requirements such as the accuracy level or the complexity of the environment being monitored. These techniques can help to create a comprehensive view of individuals' movements, useful for identifying patterns of behavior or tracking individuals of interest. However, it is essential to address privacy concerns and ensure transparency and accountability when using surveillance technology.

To match individuals' tracks across different CCTV footage streams, data association is performed after object detection and tracking algorithms are used to analyze and track individuals in multiple streams.

Various techniques for computing optical flow have been examined. The approach outlined by Brox et al. in their paper [14] is employed in this study. This method is flow-driven, robust, and isotropic, and includes a gradient constancy term. Experiments documented in [14] demonstrate that Brox et al.'s technique yields notable accuracy, occasionally even surpassing the best previously reported results. Furthermore, the method is resilient to substantial noise levels. The energy functional for the Brox et al. algorithm can be expressed as $E = E_{\text{data}} + E_{\text{smooth}}$, where D is the
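The overlap-based and appearance-based matching described in this section can be illustrated with a small sketch. It is only one possible instance of the idea, under assumptions that are not in the paper: boxes are `(x1, y1, x2, y2)` tuples, appearance is a normalised colour histogram compared by histogram intersection, the two scores are combined with a hypothetical `weight`, and assignment is done greedily rather than with an optimal solver. All function names and the `min_score` cut-off are illustrative.

```python
# Minimal sketch of combined overlap-based and appearance-based matching.
# Assumptions (not from the paper): (x1, y1, x2, y2) boxes, normalised colour
# histograms, a weighted sum of both scores, and greedy assignment.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def histogram_intersection(hist_a, hist_b):
    """Similarity of two normalised histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def match_score(track, detection, weight=0.5):
    """Weighted sum of the overlap score and the appearance score."""
    overlap = iou(track["box"], detection["box"])
    appearance = histogram_intersection(track["hist"], detection["hist"])
    return weight * overlap + (1 - weight) * appearance


def greedy_match(tracks, detections, min_score=0.3):
    """Assign detections to tracks in order of decreasing score.

    Returns a dict mapping track id to the index of its matched detection.
    """
    candidates = sorted(
        (
            (match_score(track, det), track["id"], j)
            for track in tracks
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches, used = {}, set()
    for score, track_id, j in candidates:
        if score < min_score:
            break  # all remaining candidates score lower
        if track_id in matches or j in used:
            continue
        matches[track_id] = j
        used.add(j)
    return matches
```

Setting `weight` close to 1 relies mostly on bounding-box overlap, which suits consecutive frames of one stream; setting it close to 0 relies mostly on appearance, which is the only usable cue when the camera views do not overlap.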
1 Single Person
● Facial features extraction.
● Attire of the person detected.
● Background modeling
● Front angle of the CCTV camera.
2 Two persons
● Face detection of person 2
● Attire detection of person 2
● Area
● Back angle of the CCTV camera.

streams into a single database or visualization tool to provide a clear understanding of individuals' movements over time and space. This is an essential step in analyzing and understanding individuals' movements across different streams. Integration can be performed manually or automatically, depending on the complexity of the data and the desired level of accuracy. The resulting data can be used for various applications, such as identifying patterns of behavior, tracking individuals of interest, or detecting anomalies. To evaluate the performance of the person tracking system, the accuracy of the tracking algorithm and the data association techniques used can be measured using