
CRIMEGUARD

STRENGTHENING PUBLIC SAFETY THROUGH

ADVANCED DETECTION AND NOTIFICATION

A PROJECT REPORT
Submitted by,

DHEJASRI K (310820205016)
HARISHMA T (310820205025)
JESSY AMAL RANI F (310820205036)

In partial fulfillment of the requirements for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

INFORMATION TECHNOLOGY

JEPPIAAR ENGINEERING COLLEGE


ANNA UNIVERSITY
CHENNAI 600 025
MAY 2024
JEPPIAAR ENGINEERING COLLEGE
DEPARTMENT OF INFORMATION TECHNOLOGY
JEPPIAAR NAGAR, RAJIV GANDHI ROAD, CHENNAI-119

BONAFIDE CERTIFICATE

This is to certify that this Project Report “CRIMEGUARD: STRENGTHENING


PUBLIC SAFETY THROUGH ADVANCED DETECTION AND
NOTIFICATION” is the bonafide work of “JESSY AMAL RANI F, DHEJASRI K
and HARISHMA T” who carried out the project under my supervision.

SUPERVISOR HEAD OF DEPARTMENT

Mrs. Anuja T, M.E., (Ph.D.), Mrs. C. Anitha, M.E., Ph.D.,


Assistant Professor, Assistant Professor,
Department of IT, Department of IT,
Jeppiaar Engineering College, Jeppiaar Engineering College,
Chennai 600 119. Chennai 600 119.

Submitted for the project viva voce examination held on

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We are very much indebted to our (Late) Hon’ble Colonel Dr. JEPPIAAR, M.A.,
B.L., Ph.D., our Chairman and Managing Director Dr. M. REGEENA
JEPPIAAR, B.Tech., M.B.A., Ph.D., the Principal Dr. K. SENTHIL KUMAR,
M.E., Ph.D., FIE, and the Dean Academics Dr. SHALEESHA A. STANLEY,
M.Sc., M.Phil., Ph.D., for allowing us to carry out the project here.

We would like to express our deep sense of gratitude to Dr. C. Anitha, M.E.,
Ph.D., Head of the Department, and also to our guide Mrs. Anuja T, M.E., for
giving valuable suggestions for making this project a grand success.

We take this opportunity to express our sincere gratitude to our Project
Coordinator, Mrs. Anuja T, M.E., for giving us the opportunity to do this project
under her esteemed guidance.

We also thank the teaching and non-teaching staff members of the Department of
Information Technology for their constant support.

ABSTRACT

The world faces an average annual fatality rate of 7.9 per 10,000 people due to human
violence. Much of this violence occurs suddenly or in isolated areas, presenting a
significant challenge in preventing and addressing these acts. To tackle this issue, a
detection technique has been employed, leveraging the effectiveness of computer vision
algorithms, particularly in the realm of detecting moving objects from Closed-Circuit
Television (CCTV) footage. CCTV cameras have become ubiquitous on streets, serving as
invaluable tools in solving criminal cases. This study focuses on enhancing the proactive
detection of violent acts by utilizing deep learning techniques in computer vision to predict
and identify actions and properties from video data. The aim is to overcome the
information delay that often hampers timely intervention in violent situations. The study
employs YOLO-v5 models, a state-of-the-art deep learning architecture, to detect violent
acts, determine the number of individuals involved, and identify any weapons used in the
situation. The core of this study revolves around implementing deep learning models to
establish a comprehensive video detection system. YOLO-v5, which stands for "You Only
Look Once," is renowned for its efficiency in real-time object detection. By harnessing the
power of YOLO-v5, the system can rapidly analyze video feeds from CCTV cameras,
enabling law enforcement to identify and respond to violent incidents promptly. The
integration of YOLO-v5 allows for detecting various parameters essential for
understanding a violent situation. Not only can the system identify the occurrence of a
violent act, but it can also quantify the number of persons involved. Additionally, the
model is designed to recognize and report on any weapons present in the observed
scenario. The significance of this study lies in its potential to revolutionize the way law
enforcement responds to and prevents violent incidents.

TABLE OF CONTENTS

CHAPTER NO.  TITLE  PAGE NO.

ABSTRACT  III
LIST OF FIGURES  VII

1  CHAPTER 1: INTRODUCTION
   1.1 INTRODUCTION  1
2  CHAPTER 2: LITERATURE SURVEY
   2.1 LITERATURE SURVEY  4
3  CHAPTER 3: SYSTEM ANALYSIS
   3.1 EXISTING SYSTEM  9
   3.2 PROPOSED SYSTEM  9
   3.3 BLOCK DIAGRAM  10
   3.3.1 DESCRIPTION OF THE SYSTEM BLOCK DIAGRAM  10
   3.4 FLOW DIAGRAM  11
4  CHAPTER 4: METHODOLOGIES AND ALGORITHMS
   4.1 ENSEMBLE ALGORITHMS USED  12
   4.1.1 YOLO ALGORITHM  12
   4.1.2 CONVOLUTIONAL NEURAL NETWORK  13
   4.1.3 REAL-TIME CRIME DETECTION  15
   4.1.4 DEEP LEARNING MODELS  16
5  CHAPTER 5: SYSTEM DESIGN
   5.1 FUNCTIONAL AND NON-FUNCTIONAL REQUIREMENTS  23
   5.1.1 FUNCTIONAL REQUIREMENTS  23
   5.1.2 NON-FUNCTIONAL REQUIREMENTS  23
   5.2 SYSTEM SPECIFICATIONS  24
   5.2.1 HARDWARE SPECIFICATIONS  24
   5.2.2 SOFTWARE SPECIFICATIONS  24
   5.3 UML DIAGRAMS  25
   5.3.1 USE CASE DIAGRAM  26
   5.3.2 CLASS DIAGRAM  27
   5.3.3 SEQUENCE DIAGRAM  27
   5.3.4 COLLABORATION DIAGRAM  28
   5.3.5 DEPLOYMENT DIAGRAM  29
   5.3.6 ACTIVITY DIAGRAM  29
   5.3.7 COMPONENT DIAGRAM  30
   5.3.8 ER DIAGRAM  30
   5.3.9 DFD DIAGRAM  31
6  CHAPTER 6: SOFTWARE DESIGN
   6.1 SOFTWARE DEVELOPMENT LIFE CYCLE  33
   6.2 FEASIBILITY STUDY  34
   6.2.1 ECONOMIC FEASIBILITY  35
   6.2.2 TECHNICAL FEASIBILITY  35
   6.2.3 SOCIAL FEASIBILITY  35
   6.3 MODULES  36
   6.3.1 INPUT MODULE  36
   6.3.2 PRE-PROCESSING MODULE  36
   6.3.3 VIOLENCE DETECTION MODULE  37
   6.3.4 VISUALIZATION MODULE  38
7  CHAPTER 7: SOFTWARE TESTING
   7.1 INTRODUCTION  41
   7.2 TESTING TRADITIONAL SOFTWARE SYSTEMS V/S MACHINE LEARNING SYSTEMS  42
   7.3 MODEL TESTING AND MODEL EVALUATION  43
   7.3.1 WRITING TEST CASES  43
   7.4 PROJECT TESTING  45
8  CHAPTER 8: IMPLEMENTATION
   8.1 FRONT END CODING  47
   8.2 BACK END CODING  50
9  CHAPTER 9: OUTPUTS AND SNAPSHOTS  57
10 CHAPTER 10: CONCLUSION AND FUTURE WORK
   10.1 CONCLUSION  54
   10.2 FUTURE WORK  54

REFERENCES  66
LIST OF FIGURES

FIGURE NO.  NAME OF THE FIGURE  PAGE NO.

3.3  BLOCK DIAGRAM  10
3.4  FLOW DIAGRAM  11
4.1.1.1  YOLO ALGORITHM  13
4.1.1.2  YOLO ALGORITHM  13
4.1.2  CONVOLUTIONAL NEURAL NETWORK  15
5.3.1  USE CASE DIAGRAM  26
5.3.2  CLASS DIAGRAM  27
5.3.3  SEQUENCE DIAGRAM  28
5.3.4  COLLABORATION DIAGRAM  28
5.3.5  DEPLOYMENT DIAGRAM  29
5.3.6  ACTIVITY DIAGRAM  29
5.3.7  COMPONENT DIAGRAM  30
5.3.8  ER DIAGRAM  31
5.3.9  DFD DIAGRAM  32
6.1  WATERFALL MODEL  33
9.1  SNAPSHOTS  57
CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION

The rapid advancements in video and image processing technologies have been remarkable,
driven by the growing need to extract meaningful content for a variety of applications,
particularly in the realm of surveillance and security. One critical application involves the
recognition of actions and objects, including potentially harmful items such as knives or
guns. The rise in incidents of human violence in our daily lives has underscored the
importance of developing robust systems to automatically detect and respond to violent
activities, especially in surveillance footage, where manual monitoring is impractical due to
the sheer volume of data. The prevalence of millions of surveillance cameras worldwide
necessitates automated methods for detecting and responding to potential threats. Although
the percentage of human violence incidents may be relatively low, the potential dangers
exist in various settings, making it crucial to devise systems capable of estimating and
responding to such situations promptly. This study delves into the current landscape of
human violence detection systems, emphasizing the role of deep learning techniques in
addressing this pressing concern.

Deep learning algorithms play a pivotal role in automating the detection of violent
activities. The process involves multiple stages, including object detection, action
detection, and video classification. The objective is to create a system capable of
autonomously identifying and flagging instances of human violence without requiring
human intervention. Leveraging transfer learning, this study incorporates two prominent
deep learning models: GoogleNet Inception-v3 for image classification and YOLO (You
Only Look Once) v5 for object and face detection.

In this research, the pre-trained Inception-v3 model is a key component. This model
surpasses the basic structures of the earlier Inception v1 and v2 computer vision models.
Trained on the extensive ImageNet dataset, Inception-v3 exhibits a sophisticated
architecture that retains valuable information from the inception layers through to the top
layers. This model's proficiency in image classification is harnessed to recognize patterns
and features indicative of violent activities.

The YOLO-v5 model, renowned for its efficiency in object detection, further enhances the
system's capability to identify and locate relevant objects, including potential weapons or
violent actions, within video streams. YOLO-v5's real-time object detection capabilities
make it a valuable asset in scenarios where immediate responsiveness is crucial.

By integrating these deep learning models into a unified system, the aim is to create a
robust framework for automated human-violence detection. The synergy between image
classification, object detection, and real-time processing allows for a comprehensive
analysis of video streams. This system operates without the need for constant human
monitoring, making it a scalable and effective solution for enhancing security and safety
in diverse environments.

Thus, the integration of deep learning techniques, exemplified by models like Inception-v3
and YOLO-v5, holds immense promise in automating the detection of human-violence
activities. This research contributes to the ongoing efforts to leverage technology for
enhancing security measures, especially in the context of widespread surveillance. The
goal is to create intelligent systems that can promptly identify and respond to potential
threats, ultimately contributing to a safer and more secure environment.

CHAPTER 2
LITERATURE SURVEY

2.1 Deep Learning For Automatic Violence Detection: Tests On The AIRTLab Dataset

Author: Paolo Sernani, Nicola Falcionelli, Selene Tomassini, Paolo Contardo, Aldo
Franco Dragoni

With the growing availability of video surveillance cameras and the need for techniques to
automatically identify events in video footage, there is an increasing interest in automatic
violence detection in videos. Deep learning-based architectures, such as 3D Convolutional
Neural Networks, demonstrated their capability of extracting spatio-temporal features from
videos, being effective in violence detection. However, friendly behaviors or fast moves
such as hugs, small hits, claps, high fives, etc., can still cause false positives, interpreting a
harmless action as violent. To this end, we present three deep learning-based models for
violence detection and test them on the AIRTLab dataset, a novel dataset designed to check
the robustness of algorithms against false positives. The objective is twofold: on one hand,
we compute accuracy metrics on the three proposed models (two are based on transfer
learning and one is trained from scratch), building a baseline of metrics for the AIRTLab
dataset; on the other hand, we validate the capability of the proposed dataset of challenging
the robustness to false positives. The results of the proposed models are in line with the
scientific literature, in terms of accuracy, with transfer learning-based networks exhibiting
better generalization capabilities than the trained from scratch network. Moreover, the tests
highlighted that most of the classification errors concern the identification of non-violent
clips, validating the design of the proposed dataset. Finally, to demonstrate the significance
of the proposed models, the paper presents a comparison with the related literature, as well
as with models based on well-established pre-trained 2D Convolutional Neural Networks
(2D CNNs). Such comparison highlights that 3D models get better accuracy performance
than time-distributed 2D CNNs (merged with a recurrent module) in processing the
spatiotemporal features of video clips. The source code of the experiments and the
AIRTLab dataset are available in public repositories.

2.2 A Novel YOLO-Based Safety Helmet Detection in Intelligent Construction Platform

Author: Meng Yang, Zhile Yang, Yuanjun Guo, Shilong Su & Zesen

Safety is the foremost issue on a construction site. Wearing a safety helmet is
a compulsory requirement for every individual in the construction area, which greatly reduces
injuries and deaths. However, though workers are aware of the dangers associated with not
wearing safety helmets, many of them may forget to wear helmets at work, which leads to
significant potential security issues. To solve this problem, we have developed an
automatic computer-vision approach based on Convolutional Neural Network (YOLO) to
detect wearing conditions. We create a safety helmet image dataset of people working on
construction sites. The corresponding images are collected and labeled and are used to
train and test our model. The YOLO-based model is adopted and the parameters are well
tuned. The precision of the proposed model is 78.3% and the detection time is 20 ms. The
results demonstrate that the proposed model is an effective and comparatively fast method
for recognition and localization in real-time helmet detection.

2.3 Fast Personal Protective Equipment Detection for Real Construction Sites
Using Deep Learning Approaches

Author: Zijian Wang, Yimin Wu, Lichao Yang, Arjun Thirunavukarasu, Colin

The existing deep learning-based Personal Protective Equipment (PPE) detectors can only
detect limited types of PPE and their performance needs to be improved, particularly for
their deployment on real construction sites. This paper introduces an approach to train and
evaluate eight deep learning detectors, for real application purposes, based on You Only
Look Once (YOLO) architectures for six classes, including helmets with four colors,
person, and vest. Meanwhile, a dedicated high-quality dataset, CHV, consisting of 1330
images, is constructed by considering real construction site backgrounds, different gestures,
varied angles and distances, and multi-PPE classes. The comparison result among the eight
models shows that YOLO v5x has the best mAP (86.55%), and YOLO v5s has the fastest
speed (52 FPS) on GPU. The detection accuracy of helmet classes on blurred faces
decreases by 7%, while there is no effect on other person and vest classes. The proposed
detectors trained on the CHV dataset have superior performance compared to other deep
learning approaches on the same datasets. The novel multiclass CHV dataset is open for
public use.

2.4 Deep Learning-Based Automatic Safety Helmet Detection System for Construction
Safety

Author: Ahatsham Hayat and Fernando Morgado-Dias

Worker safety at construction sites is a growing concern for many construction industries.
Wearing safety helmets can reduce injuries to workers at construction sites, but due to
various reasons, safety helmets are not always worn properly. Hence, a computer vision-
based automatic safety helmet detection system is extremely important. Many researchers
have developed machine and deep learning-based helmet detection systems, but few have
focused on helmet detection at construction sites. This paper presents a You Only Look
Once (YOLO)-based real-time automatic safety helmet detection system for construction
sites. YOLO architecture is high-speed and can process 45 frames
per second, making YOLO-based architectures feasible to use in real-time safety helmet
detection. A benchmark dataset containing 5000 images of hard hats was used in this study,
which was further divided into a ratio of 60:20:20 (%) for training, testing, and validation,
respectively. The experimental results showed that the YOLOv5x architecture achieved the
best mean average precision (mAP) of 92.44%, thereby showing excellent results in
detecting safety helmets even in low-light conditions.

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

• The cutting-edge and interdisciplinary discipline of behavioral analytics for violence
prevention explores the intricate dynamics of human behavior to find patterns and
pre-indicators linked to violent tendencies.

• This method seeks to anticipate and avoid violent situations by utilizing cutting-edge
machine learning algorithms. It provides a proactive and data-driven approach to
improving public safety.

• The idea that some behavioral signs and patterns may foreshadow violent acts is one
of the core tenets of behavioral analytics. When thoroughly examined, human
interactions and activities can yield a wealth of information that can be used to
identify possible dangers.

• By harnessing the power of data and technology, authorities can identify patterns and
trends in violence, allowing for targeted interventions and prevention strategies.

DISADVANTAGES
• Does not have a large dataset available.
• For smaller amounts of data, the results may not be accurate.

3.2 PROPOSED SYSTEM

• The envisioned system represents a proactive stride towards fortifying real-time
responses to violent incidents, with a particular focus on swiftly assessing potentially
violent situations.

• The existing protocol involves law enforcement arriving at locations deemed prone to
violence, promptly checking the CCTV cameras for ongoing incidents, and
subsequently launching investigations.

• The objective is to empower law enforcement with a cutting-edge video detection
system capable of not only identifying violent acts but also extracting crucial details
such as the number of individuals involved and the presence of weapons in a given
scenario. The deployment of YOLO-v5 models marks a significant departure from
traditional methods, offering a more proactive and efficient means of handling security
concerns.

• In essence, the proposed system represents a paradigm shift in the approach to security
and crime prevention. By integrating state-of-the-art deep learning models into a video
detection system, the study seeks to bridge the gap between identifying potential
threats and responding promptly.

• As technology continues to evolve, the proposed system stands as a testament to the
integration of cutting-edge solutions in addressing real-world challenges.

• As the system is designed to operate in real-time, it holds the potential to significantly
enhance public safety by ensuring a swift and targeted response to violent incidents
captured by CCTV cameras.

ADVANTAGES
• Accurate Violence Detection
• Real-time Monitoring
• Scalability
• Flexibility

3.3 BLOCK DIAGRAM

Fig No 3.3 - BLOCK DIAGRAM

3.3.1 DESCRIPTION OF THE SYSTEM BLOCK DIAGRAM

1. Adequately define the problem (objective, desired outputs, etc.).
2. Gather data.
3. Choose a measure of success.
4. Set an evaluation protocol and review the different protocols available.
5. Prepare the data (dealing with missing values, categorical values, etc.).
6. Correctly split the data into training and test sets.
7. Apply machine learning algorithms and make predictions (a minimal sketch of steps 5-7 is given below).
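
The following is a minimal sketch of steps 5-7 using Scikit-Learn (one of the libraries listed in the software specifications); the file name and column names are hypothetical placeholders, not the project's actual dataset:

# Illustrative sketch of steps 5-7; 'frames.csv' and the column names are
# hypothetical placeholders, not the project's actual dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("frames.csv")        # step 5: load the data
data = data.dropna()                    # step 5: deal with missing values
X = data.drop(columns=["label"])        # features
y = data["label"]                       # target: violent / non-violent

# Step 6: split correctly into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 7: apply a machine learning algorithm and make predictions
model = RandomForestClassifier().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))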

3.4 FLOW DIAGRAM

Fig No 3.4 - FLOW DIAGRAM

CHAPTER 4

METHODOLOGY AND ALGORITHMS

METHODOLOGY

In an era marked by growing concerns over public safety, leveraging advanced


technologies becomes imperative for enhancing security measures. The CrimeGuard
project emerges as a proactive response to these challenges, aiming to fortify public safety
through the deployment of cutting-edge detection and notification systems. At the heart of
this initiative lies the utilization of the YOLOv7 algorithm, a state-of-the-art object
detection framework renowned for its efficiency and accuracy.

The genesis of CrimeGuard lies in a meticulous planning phase where project objectives
are delineated with precision. The overarching goal is clear: to bolster public safety by
detecting and preempting criminal activities. Scope definition plays a pivotal role,
outlining the gamut of crimes targeted for detection, encompassing theft, vandalism,
assault, and beyond. Additionally, the deployment environment is meticulously
scrutinized, ensuring that the system is tailored to the unique dynamics of urban spaces
and public settings.

Central to the success of CrimeGuard is the curation of a comprehensive dataset, teeming


with images and videos portraying scenarios pertinent to crime detection. These datasets
serve as the bedrock for training the YOLOv7 model, meticulously annotated with
bounding boxes encapsulating objects of interest. With the dataset in hand, the
preprocessing and augmentation phase ensues, where images are refined, normalized, and
augmented to ensure optimal model performance and generalization.

The selection of the YOLOv7 algorithm is a strategic choice, rooted in its prowess for
real-time object detection and high accuracy. Through supervised learning, the YOLOv7
model undergoes rigorous training on the annotated dataset, fine-tuning its parameters to
minimize detection loss. Validation and evaluation become paramount, as the model's
performance is scrutinized across validation and testing sets, under diverse environmental
conditions and scenarios.

As the CrimeGuard system takes shape, integration and deployment become focal points.
The YOLOv7 model is seamlessly integrated into the broader CrimeGuard ecosystem,
comprising surveillance cameras, sensors, and notification mechanisms. Real-time
monitoring capabilities are imbued within the system, enabling continuous analysis of
surveillance feeds for signs of suspicious activities. Crucially, the system is calibrated to
trigger notifications and alerts, facilitating swift interventions by law enforcement or
security personnel.

The efficacy of CrimeGuard extends beyond its technological prowess; user feedback and
iterative improvement form the cornerstone of its evolution. Stakeholder engagement and
community collaboration foster an environment of continuous enhancement, where user
insights inform refinements to the detection algorithm, notification mechanisms, and
system scalability.

In documenting the CrimeGuard journey, a narrative of innovation and impact unfolds.


Technical reports and publications elucidate the methodologies, algorithms, and results
underpinning the project. Stakeholders are equipped with comprehensive insights,
illuminating the tangible contributions of CrimeGuard to public safety and crime
prevention.

In essence, CrimeGuard stands as a testament to the transformative potential of technology


in safeguarding communities. Through the fusion of advanced detection algorithms like
YOLOv7 with proactive intervention strategies, CrimeGuard charts a course toward a
safer, more secure future. As it continues to evolve and adapt to emerging challenges,
CrimeGuard remains steadfast in its commitment to fortify public safety, one detection at a
time.

4.1 ENSEMBLE ALGORITHMS USED

1. YOLO ALGORITHM
2. CONVOLUTIONAL NEURAL NETWORK
3. REAL-TIME CRIME DETECTION
4. DEEP LEARNING MODELS

4.1.1 YOLO ALGORITHM

The YOLO (You Only Look Once) algorithm is a popular object detection system that
revolutionized the field of computer vision. Unlike traditional object detection algorithms,
which involve multiple stages like region proposal, feature extraction, and classification,
YOLO performs all these tasks in a single pass through the neural network. This results in
significantly faster inference speeds, making it well-suited for real-time applications.

Here's a basic overview of how YOLO works:

1. Input Image: YOLO takes an input image and divides it into a grid.

2. Bounding Box Prediction: For each grid cell, YOLO predicts bounding boxes. Each
bounding box contains the coordinates (x, y) of its center, width, height, and the
confidence score representing the probability that the bounding box contains an object.

3. Class Prediction: YOLO also predicts the probability distribution over all classes for
each bounding box.

4. Non-Maximum Suppression (NMS): To eliminate duplicate detections, YOLO applies
NMS, which removes redundant bounding boxes based on their overlap and confidence
scores (a minimal sketch of this step follows the list).

5. Output: The final output of YOLO is a set of bounding boxes, each associated with a
class label and a confidence score.
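
As a concrete illustration of step 4, the sketch below implements IoU and a greedy NMS over boxes given as (x1, y1, x2, y2, score); this is the textbook procedure, not YOLO's exact implementation:

# Greedy Non-Maximum Suppression over boxes given as (x1, y1, x2, y2, score).
# A textbook sketch of step 4, not YOLO's exact implementation.
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, iou_threshold=0.45):
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)  # highest confidence first
    keep = []
    while boxes:
        best = boxes.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much
        boxes = [b for b in boxes if iou(best, b) < iou_threshold]
    return keep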

YOLO has undergone several versions, each with improvements in accuracy and speed.
YOLOv1 was the original version, followed by YOLOv2, YOLOv3, and more recently,
YOLOv4 and YOLOv5, each refining the architecture and training methods to achieve
better performance.
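
For reference, a pretrained YOLOv5 model can be loaded and run in a few lines through PyTorch Hub; this assumes the public ultralytics/yolov5 repository is reachable, and the image path is a placeholder:

# Minimal YOLOv5 inference via PyTorch Hub; 'street.jpg' is a placeholder image.
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # pretrained small model
results = model('street.jpg')   # single forward pass, NMS included
results.print()                 # summary: classes found and timings
print(results.xyxy[0])          # boxes: x1, y1, x2, y2, confidence, class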

The YOLO algorithm finds applications in various fields such as autonomous vehicles,
surveillance systems, object tracking, and more, owing to its real-time capabilities and
accuracy.

Fig. No 4.1.1.1 – YOLO ALGORITHM

Fig. No 4.1.1.2 - YOLO ALGORITHM

4.1.2 CONVOLUTIONAL NEURAL NETWORK

1. Convolutional Layers: YOLO's backbone network consists of convolutional layers,


which are adept at capturing spatial hierarchies and patterns in images. These layers help
in extracting features relevant to object detection.

2. Single Pass Inference: YOLO's design enables it to perform object detection in a


single pass through the network. This architecture is well-aligned with the strengths of
CNNs, which excel at processing input data through layers of convolutions efficiently.

3. Feature Learning: CNNs are powerful tools for learning hierarchical representations
of visual data. YOLO leverages this capability to learn discriminative features for object
detection tasks.

4. End-to-end Training: YOLO is trained end-to-end, meaning that the entire network,
including the backbone CNN and the detection head, is trained simultaneously. This
holistic training approach is facilitated by the seamless integration of CNNs into YOLO's
architecture.

Overall, YOLO's reliance on CNNs contributes to its effectiveness and efficiency in object
detection tasks, making it a popular choice for real-time applications where speed and
accuracy are critical.

Fig. No 4.1.2 CONVOLUTIONAL NEURAL NETWORK
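
A minimal PyTorch sketch of the kind of convolutional feature extractor described above; the layer sizes are illustrative and are not YOLO's actual backbone:

# Small convolutional feature extractor; layer sizes are illustrative only,
# not YOLO's actual backbone.
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # low-level edges
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # mid-level patterns
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # high-level features
)

x = torch.randn(1, 3, 640, 640)   # one RGB frame
features = backbone(x)            # hierarchical feature map
print(features.shape)             # torch.Size([1, 64, 80, 80])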

4.1.3 REAL-TIME CRIME DETECTION

Real-time crime detection systems can be designed with a few key components in
mind:

1. Data Integration: Gather data from various sources such as surveillance cameras,
IoT devices, social media feeds, emergency calls, and criminal databases.

2. Data Processing: Employ algorithms to process and analyze the data in real time. This
includes object detection, facial recognition, license plate recognition, and natural
language processing for sentiment analysis of social media feeds.

3. Pattern Recognition: Utilize machine learning algorithms to recognize patterns


indicative of criminal activity. This could include abnormal behavior detection, identifying
known criminal patterns, or spotting anomalies in data streams.

4. Geospatial Analysis: Incorporate geospatial data to pinpoint the location of


incidents and identify crime hotspots. This helps in deploying resources effectively.

5. Alerting Mechanisms: Implement automated alerting mechanisms to notify law


enforcement agencies or relevant authorities when potential criminal activity is
detected.

6. User Interface: Develop intuitive dashboards or user interfaces for law
enforcement personnel to visualize real-time data, monitor alerts, and take
appropriate actions.

7. Scalability and Reliability: Ensure the system is scalable to handle large


volumes of data and reliable to operate 24/7 without downtime.

8. Privacy and Ethics: Prioritize privacy concerns by implementing measures to


anonymize data where necessary and ensure compliance with relevant regulations
such as GDPR or CCPA.

By integrating these elements, real-time crime detection systems can effectively aid
law enforcement agencies in identifying and responding to criminal activities
promptly.
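
A schematic sketch wiring components 1, 2, and 5 together; detect_violence() is a hypothetical stand-in for the trained model, and the camera URL and mail settings are placeholders (the Chapter 8 implementation likewise uses smtplib and EmailMessage for notification):

# Schematic real-time loop: capture -> detect -> alert.
# detect_violence(), the camera URL, and the mail settings are placeholders.
import cv2
import smtplib
from email.message import EmailMessage

def detect_violence(frame):
    return False  # stub: replace with the trained detection model

def send_alert(snapshot_path):
    msg = EmailMessage()
    msg["Subject"] = "CrimeGuard alert: possible violent incident"
    msg["From"] = "alerts@example.com"
    msg["To"] = "control-room@example.com"
    msg.set_content("Violence detected; snapshot saved at " + snapshot_path)
    with smtplib.SMTP("smtp.example.com", 587) as server:   # component 5: alerting
        server.starttls()
        server.login("alerts@example.com", "password")      # placeholder credentials
        server.send_message(msg)

cap = cv2.VideoCapture("rtsp://camera-1/stream")            # component 1: data source
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if detect_violence(frame):                              # component 2: processing
        cv2.imwrite("alert.jpg", frame)
        send_alert("alert.jpg")
cap.release()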

4.1.4 DEEP LEARNING MODELS

Deep learning models have revolutionized various fields, from computer vision to natural
language processing and beyond. Here's an overview of why they're so powerful:

1. Complexity Handling: Traditional machine learning models struggle with handling


complex relationships in data. Deep learning models, with their multilayer neural
networks, can automatically learn intricate patterns and representations from raw data,
making them adept at handling complex tasks.

2. Feature Learning: Deep learning models can automatically learn relevant features
from raw data, eliminating the need for manual feature engineering. This ability to learn
hierarchical representations of data enables them to perform well on tasks where the
features are not easily discernible.

3. Scalability: Deep learning models scale well with data, often performing better as more
data becomes available. With the advent of frameworks like TensorFlow and PyTorch,
training deep learning models on large datasets has become feasible, leading to improved
performance on various tasks.

4. Versatility: Deep learning models can be applied to a wide range of tasks, including
image classification, object detection, speech recognition, natural language understanding,
and more. They have shown state-of-the-art performance across multiple domains, often
outperforming traditional machine learning techniques.

5. Transfer Learning: Pre-trained deep learning models can be fine-tuned on specific


tasks with limited amounts of task-specific data, making them highly efficient in scenarios
where labeled data is scarce. This transfer learning capability allows for the reuse of
learned representations across different tasks, speeding up model development and
deployment.

6. Continuous Improvement: Deep learning research is a rapidly evolving field, with


continuous advancements in model architectures, optimization techniques, and training
methodologies. This ongoing progress leads to improvements in model performance and
generalization across various domains.

7. Interpretability Challenges: While deep learning models excel in performance, their


inherent complexity often results in reduced interpretability compared to traditional
machine learning models. Understanding why a deep learning model makes a certain
prediction can be challenging, raising concerns regarding transparency and trustworthiness
in critical applications.

8. Hardware Acceleration: Deep learning models require significant computational


resources for training and inference. The development of specialized hardware such as
GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units) has accelerated
the training process, enabling the efficient deployment of deep learning models at scale.

In summary, deep learning models offer unmatched capabilities in handling complex


tasks, learning representations from raw data, and achieving state-of-the-art performance
across various domains. However, challenges such as interpretability and computational
requirements remain areas of active research and development.
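
As a minimal sketch of the transfer learning idea in point 5, the pattern below freezes a pretrained backbone and retrains only a new two-class head; ResNet-18 is used here for brevity, but the same freeze-and-replace-head pattern applies to the Inception-v3 model discussed in Chapter 1 (assumes a recent torchvision):

# Transfer learning sketch: freeze a pretrained backbone, retrain a new head.
# ResNet-18 is used for brevity; the same pattern applies to Inception-v3.
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet weights
for param in model.parameters():
    param.requires_grad = False                  # freeze the pretrained layers

model.fc = nn.Linear(model.fc.in_features, 2)   # new trainable head: violent / non-violent

# Only the new head's parameters are handed to the optimizer.
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)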

CHAPTER 5
SYSTEM DESIGN

5.1 FUNCTIONAL AND NON-FUNCTIONAL REQUIREMENTS

Requirement analysis is a very critical process that enables the success of a system or
software project to be assessed. Requirements are generally split into two types:
Functional and nonfunctional requirements.

5.1.1 FUNCTIONAL REQUIREMENTS

These are the requirements that the end user specifically demands as basic facilities that
the system should offer. All these functionalities need to be necessarily incorporated into
the system as a part of the contract. These are represented or stated in the form of input to
be given to the system, the operation performed and the output expected. They are
basically the requirements stated by the user which one can see directly in the final
product, unlike the non-functional requirements.

Examples of functional requirements:

1. Object Detection Accuracy.

2. Real-time Processing.
3. Adaptability to different Environments.
4. Customization and Configuration

5.1.2 NON-FUNCTIONAL REQUIREMENTS

These are the quality constraints that the system must satisfy according to the project
contract. The priority or extent to which these factors are implemented varies from one
project to another. They are also called non-behavioral requirements.
They deal with issues like:
• Portability
• Security
• Maintainability
• Reliability
• Scalability
• Performance
• Reusability
• Flexibility

Examples of non-functional requirements:
1. Emails should be sent with a latency of no greater than 12 hours from such an activity.
2. The processing of each request should be done within 10 seconds.
3. The site should load within 3 seconds whenever the number of simultaneous users is greater than 10,000.

29
5.2 SYSTEM SPECIFICATIONS:

5.2.1 HARDWARE SPECIFICATIONS:

• CCTV Cameras
• Computer or Server
• High-Capacity Hard Drives

5.2.2 SOFTWARE SPECIFICATIONS:

• Operating System : Windows 10
• Server-side Script : Python 3.6
• IDE : PyCharm
• Framework : Flask
• Libraries Used : NumPy, Pandas, Scikit-Learn
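
Given the Flask framework listed above, a minimal server-side entry point might look like the following; the /detect route and the run_detection() helper are hypothetical placeholders, not the project's actual code:

# Minimal Flask entry point; '/detect' and run_detection() are hypothetical.
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_detection(video_path):
    # Placeholder: invoke the trained model on the uploaded clip.
    return {"violence": False, "persons": 0, "weapons": []}

@app.route("/detect", methods=["POST"])
def detect():
    clip = request.files["video"]        # uploaded CCTV clip
    clip.save("upload.mp4")
    return jsonify(run_detection("upload.mp4"))

if __name__ == "__main__":
    app.run(debug=True)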

5.3 UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard was
created by, and is managed by, the Object Management Group.
The goal is for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major components: a
meta-model and a notation. In the future, some form of method or process may also be
added to, or associated with, UML.
The Unified Modeling Language is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of software systems, as well as for business
modeling and other non-software systems.
The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

GOALS:

The Primary goals in the design of the UML are as follows:


1. Provide users with a ready-to-use, expressive visual modeling language so that they
can develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and
development processes.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks,
patterns, and components.
7. Integrate best practices.

5.3.1 USE CASE DIAGRAM

1. A use-case diagram in the Unified Modeling Language (UML) is a type of behavioral


diagram defined by and created from a Use-case analysis.
2. Its purpose is to present a graphical overview of the functionality provided by a system
in terms of actors, their goals (represented as use cases), and any dependencies
between those use cases.
3. The main purpose of a use case diagram is to show what system functions are
performed for which actor. The roles of the actors in the system can be depicted.

Fig.No 5.3.1 - USE CASE DIAGRAM

5.3.2 CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modeling Language (UML) is a


type of static structure diagram that describes the structure of a system by showing the
system's classes, attributes, operations (or methods), and the relationships among the

classes. It shows which class contains what information.

Fig.No 5.3.2 - CLASS DIAGRAM
5.3.3 SEQUENCE DIAGRAM

• A sequence diagram in Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order.
• It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes
called event diagrams, event scenarios, and timing diagrams.

Fig.No 5.3.3 - SEQUENCE DIAGRAM

5.3.4 COLLABORATION DIAGRAM:

In the collaboration diagram, the method call sequence is indicated by some


numbering technique as shown below. The number indicates how the methods are called
one after another. We have taken the same order management system to describe the
collaboration diagram. The method calls are similar to that of a sequence diagram. But the
difference is that the sequence diagram does not describe the object organization whereas
the collaboration diagram shows the object organization.

Fig.No 5.3.4 - COLLABORATION DIAGRAM

5.3.5 DEPLOYMENT DIAGRAM

The deployment diagram represents the deployment view of a system. It is related to the
component diagram, because the components are deployed using deployment diagrams.
A deployment diagram consists of nodes. Nodes are nothing but the physical hardware
used to deploy the application.

Fig.No 5.3.5 - DEPLOYMENT DIAGRAM

5.3.6 ACTIVITY DIAGRAM:

Activity diagrams are graphical representations of workflows of stepwise activities and


actions with support for choice, iteration and concurrency. In the Unified Modeling
Language, activity diagrams can be used to describe the business and operational step-by-
step workflows of components in a system. An activity diagram shows the overall flow of
control.

Fig.No 5.3.6 -ACTIVITY DIAGRAM

5.3.7 COMPONENT DIAGRAM:

A component diagram, also known as a UML component diagram, describes the


organization and wiring of the physical components in a system.

Component diagrams are often drawn to help model implementation details and double-
check that every aspect of the system's required function is covered by planned
development.

Fig.No 5.3.7 - COMPONENT DIAGRAM

5.3.8 ER DIAGRAM:

An Entity–relationship model (ER model) describes the structure of a database with the
help of a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER
model is a design or blueprint of a database that can later be implemented as a database.
The main components of the E-R model are: entity set and relationship set.
An ER diagram shows the relationship among entity sets. An entity set is a group of
similar entities and these entities can have attributes. In terms of DBMS, an entity is a
table or attribute of a table in a database, so by showing relationship among tables and
their attributes, ER diagram shows the complete logical structure of a database. Let’s have
a look at a simple ER diagram to understand this concept.

Fig.No 5.3.8 - ER DIAGRAM
5.3.9 DFD DIAGRAM:

A Data Flow Diagram (DFD) is a traditional way to visualize the information flows within
a system. A neat and clear DFD can depict a good amount of the system requirements
graphically. It can be manual, automated, or a combination of both. It shows how
information enters and leaves the system, what changes the information, and where
information is stored. The purpose of a DFD is to show the scope and boundaries of a
system as a whole. It may be used as a communications tool between a systems analyst
and any person who plays a part in the system and acts as the starting point for redesigning
a system.

Fig.No 5.3.9 - DFD DIAGRAM

CHAPTER 6

SOFTWARE DESIGN

6.1 SOFTWARE DEVELOPMENT LIFE CYCLE – SDLC:

In our project, we use the waterfall model as our software development cycle because
of its step-by-step procedure during implementation.

Fig.No 6.1 - Waterfall Model

Requirement Gathering and analysis − All possible requirements of the system to be


developed are captured in this phase and documented in a requirement specification
document.

System Design − The requirement specifications from the first phase are studied in this phase
and the system design is prepared. This system design helps in specifying hardware and
system requirements and helps in defining the overall system architecture.
Implementation − With inputs from the system design, the system is first developed in
small programs called units, which are integrated in the next phase. Each unit is developed
and tested for its functionality, which is referred to as Unit Testing.
Integration and Testing − All the units developed in the implementation phase are
integrated into a system after testing of each unit. Post integration the entire system is
tested for any faults and failures.
Deployment of system − Once the functional and non-functional testing is done; the
product is deployed in the customer environment or released into the market.
Maintenance − Some issues come up in the client environment. To fix those issues,
patches are released. Also, to enhance the product some better versions are released.
Maintenance is done to deliver these changes in the customer environment.

6.2 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis,
the feasibility study of the proposed system is to be carried out. This is to ensure that the
proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.
Three key considerations involved in the feasibility analysis are

• ECONOMIC FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

6.2.1 ECONOMIC FEASIBILITY:

This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and
development of the system is limited. The expenditures must be justified. Thus, the
developed system is well within the budget and this was achieved because most of the
technologies used are freely available. Only the customized products had to be purchased.

6.2.2 TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not have a high demand on the
available technical resources, as this would lead to high demands being placed on the
client. The developed system must have modest requirements, as only minimal or null
changes are required for implementing this system.

6.2.3 SOCIAL FEASIBILITY:

This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not
feel threatened by the system, but must instead accept it as a necessity. The level of
acceptance by the users solely depends on the methods that are employed to educate the
user about the system and to make him familiar with it. His level of confidence must be
raised so that he is also able to make some constructive criticism, which is welcomed, as
he is the final user of the system.

6.3 MODULES

1. Input Module
2. Preprocessing Module
3. Violence Detection Module
4. Visualization Module.

6.3.1 INPUT MODULE:

The designated module serves as the gateway for input data, comprising images or video
frames depicting altercations in public spaces. Its primary function involves the efficient
management of this visual content, retrieved from surveillance systems capturing incidents
of people engaging in physical confrontations. Serving as the initial stage in the system's
workflow, the module plays a crucial role in preparing the input data for subsequent
processing. Upon receiving images or video frames, the module undertakes tasks such as
format standardization, noise reduction, and data organization. This preprocessing ensures
that the input data is uniform and optimized for further analysis. The focus on altercations
in public places aligns with the system's objective of detecting and responding to violent
incidents, contributing to public safety. By facilitating the seamless transition of raw visual
data to a refined and standardized format, this module sets the foundation for subsequent
stages in the system. The prepared data, now cleansed of extraneous elements, is ready for
advanced processing using deep learning models or other analytical techniques, ultimately
enhancing the system's ability to detect, analyze, and respond to instances of physical
altercations in real-world scenarios.
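
A minimal sketch of the frame-acquisition step this module performs, using OpenCV; the stream URL is a placeholder:

# Reading frames from a surveillance stream; the URL is a placeholder.
import cv2

cap = cv2.VideoCapture("rtsp://cctv-camera/stream")   # or a local video file path
frames = []
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)   # raw BGR frames handed on to the preprocessing module
cap.release()
print("captured", len(frames), "frames")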

6.3.2 PREPROCESSING MODULE:

The preprocessing module is a pivotal component that conducts essential preparatory steps
on the input data before it is introduced to the detection model. This critical phase involves
a series of tasks aimed at optimizing the data for effective analysis. Tasks within the
preprocessing module encompass operations like resizing, normalization, and data
formatting, all of which are essential to guarantee compatibility with the subsequent
detection model. The resizing aspect involves adjusting the dimensions of the input data,
ensuring uniformity and adherence to the specifications of the detection model.
Normalization is employed to standardize pixel values, enhancing the model's ability to
discern patterns across different images. Data formatting ensures that the input adheres
to the structure expected by the detection model, facilitating seamless integration into the
overall workflow. Moreover, the preprocessing module plays a key role in labeling the
input data, distinguishing between instances of violence and non-violence. This labeling is
fundamental for the supervised learning process, enabling the detection model to learn and
differentiate between the two classes. By performing these preprocessing tasks, the
module lays the groundwork for a robust and streamlined workflow, ultimately enhancing
the detection model's accuracy and effectiveness in discerning violent and non-violent
content within images or video frames.
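
A sketch of the resizing, normalization, and formatting steps described above, assuming a 640x640 model input; the values are illustrative:

# Resize, normalize, and reorder one frame for the detector (illustrative values).
import cv2
import numpy as np

def preprocess(frame, size=640):
    img = cv2.resize(frame, (size, size))       # uniform dimensions
    img = img[:, :, ::-1].astype(np.float32)    # BGR -> RGB
    img /= 255.0                                # normalize pixel values to [0, 1]
    img = img.transpose(2, 0, 1)                # HWC -> CHW, as the model expects
    return np.expand_dims(img, 0)               # add a batch dimension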

6.3.3 VIOLENCE DETECTION MODULE:

The Violence Detection Module plays a pivotal role in the system, employing the
YOLOv7 object detection algorithm to identify instances of violence, specifically
detecting and categorizing humans engaged in physical altercations. Operating on
preprocessed data as its input, this module leverages the power of the trained YOLOv7
model to conduct precise and efficient detection of violent activities within images or
video frames. By utilizing the YOLOv7 algorithm, renowned for its accuracy and real-
time object detection capabilities, the module processes the preprocessed data and outputs
valuable information. The system generates bounding box coordinates that precisely
delineate the regions containing instances of violence. The YOLOv7 model's ability to
handle multiple object classes and its effectiveness in real-time applications make it a
suitable choice for violence detection in dynamic scenarios. The output from this module
serves as critical information for subsequent stages, aiding in the prompt response and
intervention by law enforcement or relevant authorities in situations of public disturbance
or violence. Overall, the Violence Detection Module showcases the synergy between
advanced object detection algorithms and real-world applications, contributing to
enhanced public safety and security.
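
Condensed from the Chapter 8 implementation, the core detection call reduces to the sketch below; it assumes the YOLOv7 repository utilities (attempt_load, non_max_suppression) are importable, 'best.pt' is the trained checkpoint, and preprocess() is the helper from Section 6.3.2:

# Core detection step, condensed from the Chapter 8 code; assumes the YOLOv7
# repo utilities are importable and 'best.pt' is the trained checkpoint.
import torch
from models.experimental import attempt_load
from utils.general import non_max_suppression

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = attempt_load('best.pt', map_location=device)

def detect_frame(frame):
    img = torch.from_numpy(preprocess(frame)).to(device)   # preprocess() from 6.3.2
    pred = model(img)[0]                                   # raw predictions
    det = non_max_suppression(pred, conf_thres=0.25, iou_thres=0.45)[0]
    return det   # rows of x1, y1, x2, y2, confidence, class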

6.3.4 VISUALIZATION MODULE:

The Visualization Module is a crucial component designed to enhance the interpretability


and analysis of violence detection results. Building upon the output from the Violence
Detection Module, this module transforms the detected information into a visual
representation. It overlays bounding boxes and labels onto the input images or video
frames, offering a clear and intuitive depiction of where instances of violence have been
identified. By superimposing bounding boxes, the module precisely highlights the regions
within the images or video frames where violent activities are detected. Labels associated
with each bounding box further provide immediate context, aiding observers in
understanding the nature of the identified events. This visual representation significantly
facilitates the interpretation of the violence detection output, allowing for quick
comprehension and decision-making by law enforcement or relevant personnel. A
noteworthy feature of the Visualization Module is its ability to save the results locally.
This ensures that the visualized output, complete with bounding boxes and labels, is
preserved for future reference or analysis. The local storage of results contributes to
documentation and serves as a valuable resource for further investigations or review. In
summary, the Visualization Module acts as a bridge between the raw detection data and
human interpretation. It transforms abstract detection results into tangible visualizations,
providing a valuable tool for law enforcement and decision-makers to swiftly comprehend
and respond to instances of violence captured by the system. The capability to save results
locally adds a layer of documentation and archival functionality, enhancing the overall
utility of the system.
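
A sketch of the overlay-and-save behavior described above, using OpenCV drawing primitives; det is the detector output from Section 6.3.3, and names is assumed to map class indices to labels:

# Draw boxes and labels on the frame and save the result locally.
# 'det' holds (x1, y1, x2, y2, conf, cls) rows; 'names' maps class ids to labels.
import cv2

def visualize(frame, det, names, out_path="result.jpg"):
    for x1, y1, x2, y2, conf, cls in det:
        p1, p2 = (int(x1), int(y1)), (int(x2), int(y2))
        cv2.rectangle(frame, p1, p2, (0, 0, 255), 2)                  # bounding box
        label = "%s %.2f" % (names[int(cls)], conf)
        cv2.putText(frame, label, (p1[0], p1[1] - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)    # class + confidence
    cv2.imwrite(out_path, frame)                                      # save locally
    return frame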

CHAPTER 7
SOFTWARE TESTING

7.1 INTRODUCTION

Testing forms an integral part of any software development project. Testing helps in
ensuring that the final product is by and large, free of defects and it meets the desired
requirements. Proper testing in the development phase helps in identifying the critical
errors in the design and implementation of various functionalities thereby ensuring
product reliability. Even though it is a bit time-consuming and a costly process at first, it
helps in the long run of software development.

Although machine learning systems are not traditional software systems, not testing them
properly for their intended purposes can lead to a huge impact in the real world. This is
because machine learning systems reflect the biases of the real world. Not accounting or
testing for them will inevitably have lasting and sometimes irreversible impacts. Some
examples of such failures include Amazon’s recruitment tool, which did not evaluate people
in a gender-neutral way, and Microsoft’s chatbot Tay, which responded with offensive and
derogatory remarks.

In this chapter, we will understand how testing machine learning systems is different from
testing traditional software systems, the difference between model testing and model
evaluation, and the types of tests for machine learning systems, followed by a hands-on
example of writing test cases for “insurance charge prediction”.

7.2 TESTING TRADITIONAL SOFTWARE SYSTEMS V/S MACHINE


LEARNING

In traditional software systems, code is written for having a desired behavior as the
outcome. Testing them involves testing the logic behind the actual behavior and how it
compares with the expected behavior. In machine learning systems, however, data and
desired behavior are the inputs and the models learn the logic as the outcome of the
training and optimization processes.

In this case, testing involves validating the consistency of the model’s logic and our
desired behavior. Due to the process of models learning the logic, there are some notable
obstacles in the way of testing Machine Learning systems. They are:

• Indeterminate outcomes: on retraining, it is highly possible that the model parameters
vary significantly.
• Generalization: it is a huge task for machine learning models to predict sensible
outcomes for data not encountered in their training.
• Coverage: there is no set method of determining test coverage for a machine learning
model.
• Interpretability: most ML models are black boxes and do not have a comprehensible
logic for a certain decision made during prediction.

These issues lead to a lower understanding of the scenarios in which models fail and the
reasons for that behavior, not to mention making it more difficult for developers to
improve them.

7.3 MODEL TESTING AND MODEL EVALUATION

From the discussion above, it may feel as if model testing is the same as model evaluation
but that’s not true. Model evaluations focus on the performance metrics of the models like
accuracy, precision, the area under the curve, f1 score, log loss, etc. These metrics are
calculated on the validation dataset and remain confined to that. Though the evaluation
metrics are necessary for assessing a model, they are not sufficient because they don’t
shed light on the specific behaviors of the model.

It is fully possible that a model’s evaluation metrics have improved but its behavior on a
core functionality has regressed. Or retraining a model on new data might introduce a bias
for marginalized sections of society all the while showing no particular difference in the
metric values. This is especially harmful in the case of ML systems, since such problems might
not come to light easily but can have devastating impacts.

In summary, model evaluation helps in covering the performance on validation datasets


while model testing helps in explicitly validating the nuanced behaviors of our models.
During the development of ML models, it is better to have both model testing and
evaluation to be executed in parallel.

7.3.1 WRITING TEST CASES

We usually write two different classes of tests for machine learning systems:
• Pre-train tests
• Post-train tests

Pre-train tests: The intention is to write such tests that can be run without trained
parameters so that we can catch implementation errors early on. This helps in avoiding the
extra time and effort spent in a wasted training job.

We can test the following in the pre-train tests (a few are sketched as code below):

• whether the shape of the model's predicted output is correct;
• test dataset leakage, i.e., checking that the data in the training and testing datasets has
no duplication;
• temporal data leakage, which involves checking that the dependencies between
training and test data do not lead to unrealistic situations in the time domain, such as
training on a future data point and testing on a past data point;
• output ranges: in cases where we predict outputs in a certain range (for example, when
predicting probabilities), we need to ensure the final prediction is not outside the
expected range of values;
• ensuring that a single gradient step on a batch of data leads to a decrease in the loss;
• data profiling assertions.
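
A few of these pre-train checks written as pytest-style tests; model, train_df, and test_df are assumed fixtures for the untrained model wrapper and the data splits, not actual project code:

# Pre-train checks as pytest-style tests; 'model', 'train_df', and 'test_df'
# are assumed fixtures for the untrained model wrapper and the data splits.
import numpy as np

def test_output_shape(model):
    x = np.zeros((1, 3, 640, 640), dtype=np.float32)
    assert model.predict(x).shape == (1, 2)     # two classes: violent / non-violent

def test_output_range(model):
    x = np.zeros((1, 3, 640, 640), dtype=np.float32)
    probs = model.predict(x)
    assert probs.min() >= 0.0 and probs.max() <= 1.0   # valid probabilities

def test_no_dataset_leakage(train_df, test_df):
    overlap = set(map(tuple, train_df.values)) & set(map(tuple, test_df.values))
    assert len(overlap) == 0                    # no rows duplicated across splits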

Post-train tests: Post-train tests are aimed at testing the model’s behavior. We want to test
the learned logic and it could be tested on the following points and more:

• Invariance tests involve testing the model by tweaking only one feature in a data point
and checking for consistency in model predictions. For example, if we are working with a
loan prediction dataset, then a change in sex should not affect an individual's eligibility
for the loan, given all other features are the same; or, in the case of Titanic survivor
probability prediction data, a change in the passenger's name should not affect their
chances of survival.
• Directional expectations, wherein we test for a direct relation between feature values
and predictions. For example, in the case of a loan prediction problem, having a higher
credit score should increase a person's eligibility for a loan.
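
Sketches of both behaviors for the loan example discussed above; predict_eligibility() and the applicant fixture are assumed wrappers around the trained model, not actual project code:

# Post-train behavioral tests for the loan example; predict_eligibility() and
# the 'applicant' fixture are assumed wrappers around the trained model.
def test_invariance_to_sex(applicant):
    a, b = dict(applicant), dict(applicant)
    a["sex"], b["sex"] = "male", "female"       # tweak only one feature
    assert abs(predict_eligibility(a) - predict_eligibility(b)) < 1e-3

def test_directional_credit_score(applicant):
    low, high = dict(applicant), dict(applicant)
    low["credit_score"], high["credit_score"] = 550, 800
    assert predict_eligibility(high) >= predict_eligibility(low)   # higher score helps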

Apart from this, you can also write tests for any other failure modes identified for your
model.

Now, let’s try a hands-on approach and write tests for the Medical Cost Personal Datasets.
Here, we are given a bunch of features and we have to predict the insurance costs.

7.4 PROJECT TESTING

Let’s see the features first. The following columns are provided in the dataset:

The dataset contains transactions made by credit cards in September. This dataset presents
transactions that occurred in two days, where we have 492 frauds out of 284,807
transactions. The dataset is highly unbalanced, the positive class (frauds) accounts for
0.172% of all transactions.
It contains only numeric input variables which are the result of a PCA transformation.
Unfortunately, due to confidentiality issues, we cannot provide the original features and
more background information about the data. Features V1, V2, … V28 are the principal
components obtained with PCA, the only features which have not been transformed with
PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each
transaction and the first transaction in the dataset. The feature 'Amount' is the transaction
Amount, this feature can be used for
example-dependent cost-sensitive learning. Feature 'Class' is the response variable and it
takes value 1 in case of fraud and 0 otherwise.
Doing a little bit of analysis on the dataset will reveal the relationship between various
features. Since the main aim of this chapter is to learn how to write tests, we will skip the
analysis part and directly write some basic tests, sketched below.
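
A minimal version of those basic tests, checked against the dataset description above; it assumes the data is available locally as creditcard.csv:

# Basic dataset tests for the fraud data; assumes a local 'creditcard.csv'.
import pandas as pd

df = pd.read_csv("creditcard.csv")

def test_expected_columns():
    expected = ["Time"] + ["V%d" % i for i in range(1, 29)] + ["Amount", "Class"]
    assert list(df.columns) == expected

def test_class_is_binary():
    assert set(df["Class"].unique()) <= {0, 1}      # 1 = fraud, 0 = otherwise

def test_fraud_rate_matches_description():
    assert 0.001 < df["Class"].mean() < 0.003       # ~0.172% positive class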

CHAPTER 8
IMPLEMENTATION

8.1 SAMPLE CODE

# Object Crop Using YOLOv7

import argparse
import time
import os
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random

import smtplib
import imghdr  # note: deprecated in recent Python versions
from email.message import EmailMessage

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, check_requirements, check_imshow, \
    non_max_suppression, apply_classifier, scale_coords, xyxy2xywh, \
    strip_optimizer, set_logging, increment_path
from utils.plots import plot_one_box
from utils.torch_utils import select_device, load_classifier, time_synchronized, TracedModel


def detect(save_img=False):
    source, weights, view_img, save_txt, imgsz, trace = \
        opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size, not opt.no_trace
    save_img = not opt.nosave and not source.endswith('.txt')  # save inference images
    webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
        ('rtsp://', 'rtmp://', 'http://', 'https://'))

    # Make the folder that will hold cropped detections
    if not os.path.exists("crop"):
        os.mkdir("crop")
    crp_cnt = 0  # counter for cropped images

    # Directories
    save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Initialize
    set_logging()
    device = select_device(opt.device)
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    stride = int(model.stride.max())  # model stride
    imgsz = check_img_size(imgsz, s=stride)  # check img_size
    if trace:
        model = TracedModel(model, device, opt.img_size)
    if half:
        model.half()  # to FP16

    # Second-stage classifier (disabled by default)
    classify = False
    if classify:
        modelc = load_classifier(name='resnet101', n=2)  # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']).to(device).eval()

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]

    # Run inference
    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
    old_img_w = old_img_h = imgsz
    old_img_b = 1

    t0 = time.time()
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # Warmup
        if device.type != 'cpu' and (old_img_b != img.shape[0] or old_img_h != img.shape[2]
                                     or old_img_w != img.shape[3]):
            old_img_b = img.shape[0]
            old_img_h = img.shape[2]
            old_img_w = img.shape[3]
            for i in range(3):
                model(img, augment=opt.augment)[0]

        # Inference
        t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]
        t2 = time_synchronized()

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes,
                                   agnostic=opt.agnostic_nms)
        t3 = time_synchronized()

        # Apply Classifier
        if classify:
            pred = apply_classifier(pred, modelc, img, im0s)

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0, frame = path[i], '%g: ' % i, im0s[i].copy(), dataset.count
            else:
                p, s, im0, frame = path, '', im0s, getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # img.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain

            # Send an alert email with a cropped frame attached.
            # NOTE: as written, this alert fires for every processed frame, and
            # credentials should be read from configuration rather than hardcoded.
            Sender_Email = "finalyrprj2024@gmail.com"
            Reciever_Email = "finalyrprj2024@gmail.com"
            Password = "vnfc lbdu mbdy vgpa"  # Gmail app password

            newMessage = EmailMessage()
            newMessage['Subject'] = "Violence detected!"
            newMessage['From'] = Sender_Email
            newMessage['To'] = Reciever_Email
            newMessage.set_content('People are involved in violent activities')

            # Hardcoded path to the first cropped frame
            with open(r'C:\Users\avant\Downloads\yolov7-voilance\yolov7-voilance\Source code\crop/0.jpg', 'rb') as f:
                image_data = f.read()
                image_type = imghdr.what(f.name)
                image_name = f.name

            newMessage.add_attachment(image_data, maintype='image', subtype=image_type,
                                      filename=image_name)

            with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp:
                smtp.login(Sender_Email, Password)
                smtp.send_message(newMessage)
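The snippet above references a global opt object. In the standard YOLOv7 detect.py, opt is built by an argparse block in the script’s entry point; the sketch below follows YOLOv7’s usual flag names, and the default values shown are assumptions rather than settings taken from the project itself.

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='yolov7.pt', help='model weights path')
    parser.add_argument('--source', type=str, default='0', help='file/folder, 0 for webcam')
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok')
    parser.add_argument('--no-trace', action='store_true', help='do not trace model')
    opt = parser.parse_args()

    with torch.no_grad():
        detect()

With such an entry point, the detector can be run as, for example, python detect.py --weights best.pt --source violence_clip.mp4.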

CHAPTER 9
OUTPUTS AND SNAPSHOTS

Fig.No 9.1 - HOME PAGE

Fig.No 9.2 - VIOLENCE DETECTION

Fig.No 9.3 - DATASETS

Fig.No 9.4 - APP PASSWORD GENERATION

FINAL OUTPUT:

Fig.No 9.5 - VIOLENCE INPUT VIDEO

Fig.No 9.6 - VIOLENCE OUTPUT MAIL

MATRIX

Fig.No 9.7 - MATRIX

HEAT MAP

Fig.No 9.8 - HEAT MAP

CHAPTER 10
CONCLUSIONS AND FUTURE WORK

10.1 CONCLUSION:

The development of techniques for the automatic recognition of surveillance footage is becoming more and more important, especially in the context of recognizing and detecting human aggression. An efficient method for addressing this problem has been to use deep learning models that rely on a convolutional neural network (CNN) as a key pre-trained component. In this work, the model’s understanding of complex patterns and temporal connections in the surveillance video data is enhanced by introducing fully connected layers together with long short-term memory (LSTM) units. Combining a CNN with an LSTM yields a more comprehensive picture of human interactions by effectively capturing both spatial and temporal aspects; a minimal sketch of this CNN-LSTM idea is given at the end of this section. The CNN further enhances the model’s ability to analyze particular movements and actions in the surveillance footage by delving into the local motion dynamics of the video. Through the use of multiple modalities, the video data is thoroughly examined, allowing the model to identify minute details that may be signs of impending violence. The main tool used in this study is the YOLO (You Only Look Once) v5 model, trained here specifically for the goal of identifying violent acts. YOLO v5, well known for its effectiveness in real-time object detection, can handle the challenges involved in identifying violent behavior in a video stream, and its strong object identification accuracy becomes a crucial advantage in distinguishing between situations involving human violence and those involving non-human violence.
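To make the CNN-LSTM combination described above concrete, here is a minimal sketch of the idea rather than the project’s exact architecture: a small CNN extracts per-frame spatial features and an LSTM models their temporal evolution across a clip. All layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class CNNLSTMViolenceClassifier(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        # Per-frame CNN feature extractor (spatial features).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # LSTM over the sequence of frame features (temporal dynamics).
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # violence / no-violence logit

    def forward(self, clips):
        # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])  # classify from the last time step

# e.g. CNNLSTMViolenceClassifier()(torch.randn(2, 16, 3, 112, 112)) -> (2, 1) logits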

10.2 FUTURE WORK

Future advancements could focus on several key areas to elevate the system's capabilities.
Integrating multi-modal fusion would allow the incorporation of diverse data sources,
fostering a more holistic understanding of potential criminal activities. Behavioral
analysis, using recurrent neural networks, could introduce a temporal dimension to the
model, enabling the identification of abnormal patterns in human
behavior. To address concerns related to the interpretability of deep learning models,
implementing Explainable AI (XAI) techniques would provide transparency in decision-
making processes. Moreover, prioritizing privacy-preserving mechanisms, such as
federated learning, can uphold individual privacy rights in the data analysis process.
Adaptive learning strategies would enable the system to evolve and adapt to changing
crime patterns continuously. The integration of edge computing can enhance real-time
processing capabilities, especially in resource-constrained environments. Collaborating
with IoT devices, including smart cameras and sensors, would expand the system's data

sources, offering richer contextual information for crime detection. Ensuring robustness to
adversarial attacks and fostering a human-in-the-loop system would fortify the system
against intentional manipulations and leverage human expertise. Furthermore,
collaboration with law enforcement agencies is crucial to align the system with legal and
ethical standards, establishing protocols for responsible use and compliance with
regulation.

