A PROJECT REPORT
Submitted by,
DHEJASRI K (310820205016)
HARISHMA T (310820205025)
JESSY AMAL RANI F (310820205036)
of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
BONAFIDE CERTIFICATE
ACKNOWLEDGEMENT
We are very much indebted to (Late) Hon'ble Colonel Dr. JEPPIAAR, M.A.,
B.L., Ph.D., our Chairman and Managing Director Dr. M. REGEENA
JEPPIAAR, B.Tech., M.B.A., Ph.D., the Principal Dr. K. SENTHIL KUMAR,
M.E., Ph.D., FIE, and the Dean Academics Dr. SHALEESHA A. STANLEY,
M.Sc., M.Phil., Ph.D., for the opportunity to carry out the project here.
We would like to express our deep sense of gratitude to Dr. C. Anitha, M.E.,
Ph.D., Head of the Department, and also to our guide Mrs. Anuja T, M.E., for
giving valuable suggestions for making this project a grand success.
We also thank the teaching and non-teaching staff members of the Department of
Information Technology for their constant support.
ABSTRACT
The world faces an average annual fatality rate of 7.9 per 10,000 people due to human
violence. Much of this violence occurs suddenly or in isolated areas, presenting a
significant challenge in preventing and addressing these acts. To tackle this issue, a
detection technique has been employed, leveraging the effectiveness of computer vision
algorithms, particularly in the realm of detecting moving objects from Closed-Circuit
Television (CCTV) footage. CCTV cameras have become ubiquitous on streets, serving as
invaluable tools in solving criminal cases. This study focuses on enhancing the proactive
detection of violent acts by utilizing deep learning techniques in computer vision to predict
and identify actions and properties from video data. The aim is to overcome the
information delay that often hampers timely intervention in violent situations. The study
employs YOLO-v5 models, a state-of-the-art deep learning architecture, to detect violent
acts, determine the number of individuals involved, and identify any weapons used in the
situation. The core of this study revolves around implementing deep learning models to
establish a comprehensive video detection system. YOLO-v5, which stands for "You Only
Look Once," is renowned for its efficiency in real-time object detection. By harnessing the
power of YOLO-v5, the system can rapidly analyze video feeds from CCTV cameras,
enabling law enforcement to identify and respond to violent incidents promptly. The
integration of YOLO-v5 allows for detecting various parameters essential for
understanding a violent situation. Not only can the system identify the occurrence of a
violent act, but it can also quantify the number of persons involved. Additionally, the
model is designed to recognize and report on any weapons present in the observed
scenario. The significance of this study lies in its potential to revolutionize the way law
enforcement responds to and prevents violent incidents.
TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION
CHAPTER 2: LITERATURE SURVEY
2.1 LITERATURE SURVEY
CHAPTER 3: SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.2 PROPOSED SYSTEM
3.3 BLOCK DIAGRAM
3.3.1 DESCRIPTION OF THE SYSTEM BLOCK DIAGRAM
3.4 FLOW DIAGRAM
CHAPTER 4: METHODOLOGIES AND ALGORITHMS
METHODOLOGIES
CHAPTER 5: SYSTEM DESIGN
5.1 REQUIREMENTS
5.1.1 FUNCTIONAL REQUIREMENTS
5.1.2 NON-FUNCTIONAL REQUIREMENTS
5.2 SYSTEM SPECIFICATIONS
5.2.1 HARDWARE SPECIFICATIONS
5.2.2 SOFTWARE SPECIFICATIONS
5.3 UML DIAGRAMS
5.3.1 USE CASE DIAGRAM
5.3.2 CLASS DIAGRAM
5.3.3 SEQUENCE DIAGRAM
5.3.4 COLLABORATION DIAGRAM
5.3.5 DEPLOYMENT DIAGRAM
5.3.6 ACTIVITY DIAGRAM
5.3.7 COMPONENT DIAGRAM
5.3.8 ER DIAGRAM
5.3.9 DFD DIAGRAM
CHAPTER 6: SOFTWARE DESIGN
6.1 SOFTWARE DEVELOPMENT LIFE CYCLE
6.2 FEASIBILITY STUDY
6.2.1 ECONOMIC FEASIBILITY
6.2.2 TECHNICAL FEASIBILITY
6.2.3 SOCIAL FEASIBILITY
6.3 MODULES
6.3.1 INPUT MODULE
6.3.2 PRE-PROCESSING MODULE
6.3.3 VIOLENCE DETECTION MODULE
6.3.4 VISUALIZATION MODULE
CHAPTER 7: TESTING
7.2 … V/S MACHINE LEARNING SYSTEMS
7.3 MODEL TESTING AND MODEL EVALUATION
7.3.1 WRITING TEST CASES
7.4 PROJECT TESTING
CHAPTER 8: IMPLEMENTATION
10.1 CONCLUSION
10.2 FUTURE WORK
REFERENCES

LIST OF FIGURES
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
The rapid advancements in video and image processing technologies have been remarkable,
driven by the growing need to extract meaningful content for a variety of applications,
particularly in the realm of surveillance and security. One critical application involves the
recognition of actions and objects, including potentially harmful items such as knives or
guns. The rise in incidents of human violence in our daily lives has underscored the
importance of developing robust systems to automatically detect and respond to violent
activities, especially in surveillance footage, where manual monitoring is impractical due to
the sheer volume of data. The prevalence of millions of surveillance cameras worldwide
necessitates automated methods for detecting and responding to potential threats. Although
the percentage of human violence incidents may be relatively low, the potential dangers
exist in various settings, making it crucial to devise systems capable of estimating and
responding to such situations promptly. This study delves into the current landscape of
human violence detection systems, emphasizing the role of deep learning techniques in
addressing this pressing concern. Deep learning algorithms play a pivotal role in automating
the detection of violent activities. The process involves multiple stages, including object
detection, action detection, and video classification. The objective is to create a system
capable of autonomously identifying and flagging instances of human violence without
requiring human intervention. Leveraging transfer learning, this study incorporates two
prominent deep learning models: GoogLeNet Inception-v3 for image classification and
YOLO (You Only Look Once) v5 for object and face detection.

In this research, the pre-trained Inception-v3 model is a key component. This model
surpasses the basic structures of the earlier Inception-v1 and -v2 computer vision models.
Trained on the extensive ImageNet dataset, Inception-v3 exhibits a sophisticated
architecture that retains valuable information from the inception layers through to the top
layers. This model's proficiency in image classification is harnessed to recognize patterns
and features indicative of violent activities.

The YOLO-v5 model, renowned for its efficiency in object detection, further enhances the
system's capability to identify and locate relevant objects, including potential weapons or
violent actions, within video streams. YOLO-v5's real-time object detection capabilities
make it a valuable asset in scenarios where immediate responsiveness is crucial.

By integrating these deep learning models into a unified system, the aim is to create a
robust framework for automated human-violence detection. The synergy between image
classification, object detection, and real-time processing allows for a comprehensive
analysis of video streams. This system operates without the need for constant human
monitoring, making it a scalable and effective solution for enhancing security and safety in
diverse environments.

Thus, the integration of deep learning techniques, exemplified by models like Inception-v3
and YOLO-v5, holds immense promise in automating the detection of human-violence
activities. This research contributes to the ongoing efforts to leverage technology for
enhancing security measures, especially in the context of widespread surveillance. The goal
is to create intelligent systems that can promptly identify and respond to potential threats,
ultimately contributing to a safer and more secure environment.
CHAPTER 2
LITERATURE SURVEY
With the growing availability of video surveillance cameras and the need for techniques to
automatically identify events in video footage, there is an increasing interest in automatic
violence detection in videos. Deep learning-based architectures, such as 3D Convolutional
Neural Networks, demonstrated their capability of extracting spatio-temporal features from
videos, being effective in violence detection. However, friendly behaviors or fast moves
such as hugs, small hits, claps, high fives, etc., can still cause false positives, interpreting a
harmless action as violent. To this end, we present three deep learning-based models for
violence detection and test them on the AIRTLab dataset, a novel dataset designed to check
the robustness of algorithms against false positives. The objective is twofold: on one hand,
we compute accuracy metrics on the three proposed models (two are based on transfer
learning and one is trained from scratch), building a baseline of metrics for the AIRTLab
dataset; on the other hand, we validate the capability of the proposed dataset to challenge
robustness against false positives. The results of the proposed models are in line with the
scientific literature in terms of accuracy, with the transfer learning-based networks
exhibiting better generalization capabilities than the network trained from scratch. Moreover, the tests
highlighted that most of the classification errors concern the identification of non-violent
clips, validating the design of the proposed dataset. Finally, to demonstrate the significance
of the proposed models, the paper presents a comparison with the related literature, as well
as with models based on well-established pre-trained 2D Convolutional Neural Networks
(2D CNNs). Such comparison highlights that 3D models get better accuracy performance
than time-distributed 2D CNNs (merged with a recurrent module) in processing the
spatiotemporal features of video clips. The source code of the experiments and the
AIRTLab dataset are available in public repositories.
Safety is the most important issue on a construction site. Wearing a safety helmet is
a compulsory requirement for every individual in the construction area, which greatly reduces
injuries and deaths. However, though workers are aware of the dangers associated with not
wearing safety helmets, many of them may forget to wear helmets at work, which leads to
significant potential security issues. To solve this problem, we have developed an
automatic computer-vision approach based on a Convolutional Neural Network (YOLO) to
detect helmet-wearing conditions. We create a safety helmet image dataset of people working on
construction sites. The corresponding images are collected and labeled and are used to
train and test our model. The YOLO-based model is adopted and the parameters are well
tuned. The precision of the proposed model is 78.3%, and it processes a frame in about 20 ms. The
results demonstrate that the proposed model is an effective and comparatively fast method
for recognition and localization in real-time helmet detection.
2.3 Fast Personal Protective Equipment Detection for Real Construction Sites
Using Deep Learning Approaches
The existing deep learning-based Personal Protective Equipment (PPE) detectors can only
detect limited types of PPE and their performance needs to be improved, particularly for
their deployment on real construction sites. This paper introduces an approach to train and
evaluate eight deep learning detectors, for real application purposes, based on You Only
Look Once (YOLO) architectures for six classes, including helmets with four colors,
person, and vest. Meanwhile, a dedicated high-quality dataset, CHV, consisting of 1330
images, is constructed by considering real construction site backgrounds, different gestures,
varied angles and distances, and multi-PPE classes. The comparison result among the eight
models shows that YOLO v5x has the best mAP (86.55%), and YOLO v5s has the fastest
speed (52 FPS) on GPU. The detection accuracy of helmet classes on blurred faces
decreases by 7%, while there is no effect on the person and vest classes. The proposed
detectors trained on the CHV dataset have superior performance compared to other deep
learning approaches on the same datasets. The novel multiclass CHV dataset is open for
public use.
2.4 Deep Learning-Based Automatic Safety Helmet Detection System for Construction
Safety
Worker safety at construction sites is a growing concern for many construction industries.
Wearing safety helmets can reduce injuries to workers at construction sites, but due to
various reasons, safety helmets are not always worn properly. Hence, a computer vision-
based automatic safety helmet detection system is extremely important. Many researchers
have developed machine and deep learning-based helmet detection systems, but few have
focused on helmet detection at construction sites. This paper presents a You Only Look
Once (YOLO)-based real-time automatic safety helmet detection
system at a construction site. YOLO architecture is high-speed and can process 45 frames
per second, making YOLO-based architectures feasible to use in real-time safety helmet
detection. A benchmark dataset containing 5000 images of hard hats was used in this study,
which was further divided into a ratio of 60:20:20 (%) for training, testing, and validation,
respectively. The experimental results showed that the YOLOv5x architecture achieved the
best mean average precision (mAP) of 92.44%, thereby showing excellent results in
detecting safety helmets even in low-light conditions.
CHAPTER 3
SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
This method seeks to anticipate and avoid violent situations by utilizing
machine learning algorithms. It provides a proactive and data-driven approach to
improving public safety.
The idea that some behavioral signs and patterns may foreshadow violent acts is one
of the core tenets of behavioral analytics. When thoroughly examined, human
interactions and activities can yield a wealth of information that can be used to
identify possible dangers.
By harnessing the power of data and technology, authorities can identify patterns and
trends in violence, allowing for targeted interventions and prevention strategies.
DISADVANTAGES
A sufficiently large dataset is not available.
With smaller amounts of data, the results may be inaccurate.
3.2 PROPOSED SYSTEM
The existing protocol involves law enforcement arriving at locations deemed prone to
violence, promptly checking the CCTV cameras for ongoing incidents, and
subsequently launching investigations.
In essence, the proposed system represents a paradigm shift in the approach to security
and crime prevention. By integrating state-of-the-art deep learning models into a video
detection system, the study seeks to bridge the gap between identifying potential
threats and responding promptly.
ADVANTAGES
Accurate Violence Detection
Real-time Monitoring
Scalability
Flexibility
CHAPTER 4
METHODOLOGY AND ALGORITHMS
METHODOLOGY
The genesis of CrimeGuard lies in a meticulous planning phase where project objectives
are delineated with precision. The overarching goal is clear: to bolster public safety by
detecting and preempting criminal activities. Scope definition plays a pivotal role,
outlining the gamut of crimes targeted for detection, encompassing theft, vandalism,
assault, and beyond. Additionally, the deployment environment is meticulously
scrutinized, ensuring that the system is tailored to the unique dynamics of urban spaces
and public settings.
The selection of the YOLOv7 algorithm is a strategic choice, rooted in its prowess for
real-time object detection and high accuracy. Through supervised learning, the YOLOv7
model undergoes rigorous training on the annotated dataset, fine-tuning its parameters to
minimize detection loss. Validation and evaluation become paramount, as the model's
performance is scrutinized across validation and testing sets, under diverse environmental
conditions and scenarios.
As the CrimeGuard system takes shape, integration and deployment become focal points.
The YOLOv7 model is seamlessly integrated into the broader CrimeGuard ecosystem,
comprising surveillance cameras, sensors, and notification mechanisms. Real-time
monitoring capabilities are imbued within the system, enabling continuous analysis of
surveillance feeds for signs of suspicious activities. Crucially, the system is calibrated to
trigger notifications and alerts, facilitating swift interventions by law enforcement or
security personnel.
The efficacy of CrimeGuard extends beyond its technological prowess; user feedback and
iterative improvement form the cornerstone of its evolution. Stakeholder engagement and
community collaboration foster an environment of continuous enhancement, where user
insights inform refinements to the detection algorithm, notification mechanisms, and
system scalability.
In this way, CrimeGuard works toward a safer, more secure future. As it continues to
evolve and adapt to emerging challenges, it remains steadfast in its commitment to fortify
public safety, one detection at a time.
1. YOLO ALGORITHM
2. CONVOLUTION NEURAL NETWORK
3. REAL-TIME CRIME DETECTION
4. DEEP LEARNING MODELS
The YOLO (You Only Look Once) algorithm is a popular object detection system that
revolutionized the field of computer vision. Unlike traditional object detection algorithms,
which involve multiple stages like region proposal, feature extraction, and classification,
YOLO performs all these tasks in a single pass through the neural network. This results in
significantly faster inference speeds, making it well-suited for real-time applications.
1. Input Image: YOLO takes an input image and divides it into a grid.
2. Bounding Box Prediction: For each grid cell, YOLO predicts bounding boxes. Each
bounding box contains the coordinates (x, y) of its center, width, height, and the
confidence score representing the probability that the bounding box contains an object.
3. Class Prediction: YOLO also predicts the probability distribution over all classes for
each bounding box.
4. Non-Maximum Suppression (NMS): To eliminate duplicate detections, YOLO applies
NMS, which removes redundant bounding boxes based on their overlap and confidence
scores.
5. Output: The final output of YOLO is a set of bounding boxes, each associated with a
class label and a confidence score.
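The prediction and suppression steps above can be sketched in Python. The corner-coordinate box format and the greedy suppression loop shown here are a simplified illustration of the idea, not YOLO-v5's exact implementation.

```python
# Sketch of steps 2-5 above: score boxes, compute overlap (IoU), and
# suppress duplicates with greedy Non-Maximum Suppression.
# Box format assumed here: (x1, y1, x2, y2) corner coordinates.

def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, two heavily overlapping boxes with scores 0.9 and 0.8 collapse to the single higher-scoring detection, while a distant third box survives.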
YOLO has undergone several versions, each with improvements in accuracy and speed.
YOLOv1 was the original version, followed by YOLOv2, YOLOv3, and more recently,
YOLOv4 and YOLOv5, each refining the architecture and training methods to achieve
better performance.
The YOLO algorithm finds applications in various fields such as autonomous vehicles,
surveillance systems, object tracking, and more, owing to its real-time capabilities and
accuracy.
Fig. No 4.1.1.1 – YOLO ALGORITHM
3. Feature Learning: CNNs are powerful tools for learning hierarchical representations
of visual data. YOLO leverages this capability to learn discriminative features for object
detection tasks.
4. End-to-end Training: YOLO is trained end-to-end, meaning that the entire network,
including the backbone CNN and the detection head, is trained simultaneously. This
holistic training approach is facilitated by the seamless integration of CNNs into YOLO's
architecture.
Overall, YOLO's reliance on CNNs contributes to its effectiveness and efficiency in object
detection tasks, making it a popular choice for real-time applications where speed and
accuracy are critical.
Fig. No 4.1.2 CONVOLUTIONAL NEURAL NETWORK
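The convolution operation at the heart of a CNN can be sketched in pure Python: a small kernel slides over a 2D input and each output value is the sum of elementwise products. The 3x3 vertical-edge kernel below is a hand-picked illustration; a real CNN learns its kernels during training, and real networks use optimized tensor libraries rather than nested loops.

```python
# Minimal 2D convolution (no padding, stride 1) over a nested-list image.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            # Sum of elementwise products over the kh x kw window.
            out[r][c] = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)
            )
    return out

# A vertical-edge kernel responds strongly where pixel values change
# sharply from left to right -- one example of a low-level "feature".
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]
```

Applied to an image with a sharp vertical edge, the output is near zero in flat regions and large in magnitude at the edge, which is the kind of hierarchical feature response the section above describes.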
Real-time crime detection systems can be designed with a few key components in
mind:
1. Data Integration: Gather data from various sources such as surveillance cameras,
IoT devices, social media feeds, emergency calls, and criminal databases.
2. Data Processing: Employ algorithms to process and analyze the data in real time. This
includes object detection, facial recognition, license plate recognition, and natural
language processing for sentiment analysis of social media feeds.
6. User Interface: Develop intuitive dashboards or user interfaces for law
enforcement personnel to visualize real-time data, monitor alerts, and take
appropriate actions.
By integrating these elements, real-time crime detection systems can effectively aid
law enforcement agencies in identifying and responding to criminal activities
promptly.
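The components above can be tied together in a minimal monitoring-loop sketch: frames arrive from a feed, a detector scores each one, and alerts above a confidence threshold are collected for personnel. The detector here is a hypothetical stub standing in for a trained model and a real camera feed.

```python
# Sketch of a real-time detection loop. `detect` is any callable that maps
# a frame to (label, confidence) pairs -- in deployment, a trained model.

from dataclasses import dataclass

@dataclass
class Alert:
    frame_id: int
    label: str
    confidence: float

def monitor(frames, detect, threshold=0.8):
    """Scan a stream of frames and collect alerts above the threshold."""
    alerts = []
    for frame_id, frame in enumerate(frames):
        for label, confidence in detect(frame):
            if confidence >= threshold:
                alerts.append(Alert(frame_id, label, confidence))
    return alerts
```

In a dashboard such as the one described in the user-interface component, each `Alert` would drive a notification rather than being collected in a list.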
Deep learning models have revolutionized various fields, from computer vision to natural
language processing and beyond. Here's an overview of why they're so powerful:
2. Feature Learning: Deep learning models can automatically learn relevant features
from raw data, eliminating the need for manual feature engineering. This ability to learn
hierarchical representations of data enables them to perform well on tasks where the
features are not easily discernible.
3. Scalability: Deep learning models scale well with data, often performing better as more
data becomes available. With the advent of frameworks like TensorFlow and PyTorch,
training deep learning models on large datasets has become feasible, leading to improved
performance on various tasks.
4. Versatility: Deep learning models can be applied to a wide range of tasks, including
image classification, object detection, speech recognition, natural language understanding,
and more. They have shown state-of-the-art performance across multiple domains, often
outperforming traditional machine learning techniques.
A further challenge is interpretability: deep learning models are often harder to interpret
than traditional machine learning models. Understanding why a deep learning model makes a
certain prediction can be challenging, raising concerns regarding transparency and
trustworthiness in critical applications.
CHAPTER 5
SYSTEM DESIGN
5.1 REQUIREMENTS
Requirement analysis is a very critical process that enables the success of a system or
software project to be assessed. Requirements are generally split into two types:
Functional and nonfunctional requirements.
5.1.1 FUNCTIONAL REQUIREMENTS
These are the requirements that the end user specifically demands as basic facilities that
the system should offer. All these functionalities need to be necessarily incorporated into
the system as a part of the contract. These are represented or stated in the form of input to
be given to the system, the operation performed and the output expected. They are
basically the requirements stated by the user which one can see directly in the final
product, unlike the non-functional requirements.
2. Real-time Processing.
3. Adaptability to different Environments.
4. Customization and Configuration
5.1.2 NON-FUNCTIONAL REQUIREMENTS
These are the quality constraints that the system must satisfy according to the project
contract. The priority or extent to which these factors are implemented varies from one
project to another. They are also called non-behavioral requirements.
They deal with issues like:
Portability
Security
Maintainability
Reliability
Scalability
Performance
Reusability
Flexibility
Examples of non-functional requirements:
1. Emails should be sent with a latency of no greater than 12 hours from such an
activity.
2. The processing of each request should be done within 10 seconds.
3. The site should load within 3 seconds whenever the number of simultaneous users
exceeds 10,000.
5.2 SYSTEM SPECIFICATIONS:
5.2.1 HARDWARE SPECIFICATIONS
CCTV Cameras
Computer or Server
High Capacity Hard drives
5.3 UML DIAGRAMS
The Unified Modeling Language (UML) has proved successful in the modeling of large and complex systems.
The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.
GOALS:
5.3.1 USE CASE DIAGRAM
A use case diagram shows which operations are performed for which actor. The roles of the
actors in the system can be depicted.
Fig.No 5.3.2 - CLASS DIAGRAM
5.3.3 SEQUENCE DIAGRAM
5.3.4 COLLABORATION DIAGRAM
A collaboration diagram shows the method calls of a scenario one after another. We have
taken the same order management system to describe the collaboration diagram. The method
calls are similar to those of a sequence diagram; the difference is that the sequence
diagram does not describe the object organization, whereas the collaboration diagram does.
5.3.5 DEPLOYMENT DIAGRAM
The deployment diagram represents the deployment view of a system. It is related to the
component diagram because the components are deployed using deployment diagrams. A
deployment diagram consists of nodes, which are the physical hardware used to deploy the
application.
5.3.6 ACTIVITY DIAGRAM:
5.3.7 COMPONENT DIAGRAM
Component diagrams are often drawn to help model implementation details and double-check
that every aspect of the system's required function is covered by planned development.
5.3.8 ER DIAGRAM:
An Entity–relationship model (ER model) describes the structure of a database with the
help of a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER
model is a design or blueprint of a database that can later be implemented as a database.
The main components of the E-R model are: entity set and relationship set.
An ER diagram shows the relationship among entity sets. An entity set is a group of
similar entities and these entities can have attributes. In terms of DBMS, an entity is a
table or attribute of a table in a database, so by showing relationship among tables and
their attributes, ER diagram shows the complete logical structure of a database. Let’s have
a look at a simple ER diagram to understand this concept.
Fig.No 5.3.8 - ER DIAGRAM
5.3.9 DFD DIAGRAM:
A Data Flow Diagram (DFD) is a traditional way to visualize the information flows within
a system. A neat and clear DFD can depict a good amount of the system requirements
graphically. It can be manual, automated, or a combination of both. It shows how
information enters and leaves the system, what changes the information, and where
information is stored. The purpose of a DFD is to show the scope and boundaries of a
system as a whole. It may be used as a communications tool between a systems analyst
and any person who plays a part in the system and acts as the starting point for redesigning
a system.
Fig.No 5.3.9 - DFD DIAGRAM
CHAPTER 6
SOFTWARE DESIGN
6.1 SOFTWARE DEVELOPMENT LIFE CYCLE
In our project, we use the waterfall model as our software development life cycle because
of its step-by-step procedure during implementation.
System Design − The requirement specifications from first phase are studied in this phase
and the system design is prepared. This system design helps in specifying hardware and
system requirements and helps in defining the overall system architecture.
Implementation − With inputs from the system design, the system is first developed in
small programs called units, which are
integrated in the next phase. Each unit is developed and tested for its functionality, which
is referred to as Unit Testing.
Integration and Testing − All the units developed in the implementation phase are
integrated into a system after testing of each unit. Post integration the entire system is
tested for any faults and failures.
Deployment of system − Once the functional and non-functional testing is done; the
product is deployed in the customer environment or released into the market.
Maintenance − Some issues come up in the client environment. To fix those issues,
patches are released. Also, to enhance the product some better versions are released.
Maintenance is done to deliver these changes in the customer environment.
6.2 FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis,
the feasibility study of the proposed system is to be carried out. This is to ensure that the
proposed system is not a burden to the company. For feasibility analysis, some
understanding of the major requirements for the system is essential.
Three key considerations involved in the feasibility analysis are
ECONOMIC FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
6.2.1 ECONOMIC FEASIBILITY
This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and
development of the system is limited. The expenditures must be justified. Thus, the
developed system is well within the budget and this was achieved because most of the
technologies used are freely available. Only the customized products had to be purchased.
6.2.2 TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, as this would in turn place high demands on the client. The
developed system must have modest requirements, as only minimal or no changes are
required for implementing this system.
6.2.3 SOCIAL FEASIBILITY
This study checks the level of acceptance of the system by the user. This includes the
process of training the user to use the system efficiently. The user must not feel
threatened by the system, but must instead accept it as a necessity. The level of
acceptance by the users depends on the methods employed to educate them about the system
and to make them familiar with it. Their confidence must be raised so that they can also
offer constructive criticism, which is welcomed, as they are the final users of the system.
6.3 MODULES
1. Input Module
2. Preprocessing Module
3. Violence Detection Module
4. Visualization Module.
6.3.1 INPUT MODULE
The designated module serves as the gateway for input data, comprising images or video
frames depicting altercations in public spaces. Its primary function involves the efficient
management of this visual content, retrieved from surveillance systems capturing incidents
of people engaging in physical confrontations. Serving as the initial stage in the system's
workflow, the module plays a crucial role in preparing the input data for subsequent
processing. Upon receiving images or video frames, the module undertakes tasks such as
format standardization, noise reduction, and data organization. This preprocessing ensures
that the input data is uniform and optimized for further analysis. The focus on altercations
in public places aligns with the system's objective of detecting and responding to violent
incidents, contributing to public safety. By facilitating the seamless transition of raw visual
data to a refined and standardized format, this module sets the foundation for subsequent
stages in the system. The prepared data, now cleansed of extraneous elements, is ready for
advanced processing using deep learning models or other analytical techniques, ultimately
enhancing the system's ability to detect, analyze, and respond to instances of physical
altercations in real-world scenarios.
6.3.2 PRE-PROCESSING MODULE
The preprocessing module is a pivotal component that conducts essential preparatory steps
on the input data before it is introduced to the detection model. This critical phase involves
a series of tasks aimed at optimizing the data for effective analysis. Tasks within the
preprocessing module encompass operations like resizing, normalization, and data
formatting, all of which are essential to guarantee compatibility with the subsequent
detection model. The resizing aspect involves adjusting the dimensions of the input data,
ensuring uniformity and adherence to the specifications of the detection model.
Normalization is employed to standardize pixel values, enhancing the model's ability to
discern patterns across different images. Data formatting ensures that the input adheres
to the structure expected by the detection model, facilitating seamless integration into the
overall workflow. Moreover, the preprocessing module plays a key role in labeling the
input data, distinguishing between instances of violence and non-violence. This labeling is
fundamental for the supervised learning process, enabling the detection model to learn and
differentiate between the two classes. By performing these preprocessing tasks, the
module lays the groundwork for a robust and streamlined workflow, ultimately enhancing
the detection model's accuracy and effectiveness in discerning violent and non-violent
content within images or video frames.
The Violence Detection Module plays a pivotal role in the system, employing the
YOLOv7 object detection algorithm to identify instances of violence, specifically
detecting and categorizing humans engaged in physical altercations. Operating on
preprocessed data as its input, this module leverages the power of the trained YOLOv7
model to conduct precise and efficient detection of violent activities within images or
video frames. By utilizing the YOLOv7 algorithm, renowned for its accuracy and real-
time object detection capabilities, the module processes the preprocessed data and outputs
valuable information. The system generates bounding box coordinates that precisely
delineate the regions containing instances of violence. The YOLOv7 model's ability to
handle multiple object classes and its effectiveness in real-time applications make it a
suitable choice for violence detection in dynamic scenarios. The output from this module
serves as critical information for subsequent stages, aiding in the prompt response and
intervention by law enforcement or relevant authorities in situations of public disturbance
or violence. Overall, the Violence Detection Module showcases the synergy between
advanced object detection algorithms and real-world applications, contributing to
enhanced public safety and security.
CHAPTER 7
TESTING
7.1 INTRODUCTION
Testing forms an integral part of any software development project. Testing helps in
ensuring that the final product is, by and large, free of defects and that it meets the
desired requirements. Proper testing in the development phase helps in identifying critical
errors in the design and implementation of various functionalities, thereby ensuring
product reliability. Even though it is somewhat time-consuming and costly at first, it pays
off in the long run of software development.
Although machine learning systems are not traditional software systems, failing to test
them properly for their intended purposes can have a huge impact in the real world. This is
because machine learning systems reflect the biases of the real world; not accounting or
testing for them will inevitably have lasting and sometimes irreversible impacts. Some
examples of such failures include Amazon's recruitment tool, which did not evaluate
candidates in a gender-neutral way, and Microsoft's chatbot Tay, which responded with
offensive and derogatory remarks.
In this chapter, we will understand how testing machine learning systems differs from
testing traditional software systems and how model testing differs from model evaluation,
review the types of tests for machine learning systems, and then work through a hands-on
example of writing test cases for credit card fraud prediction.
In traditional software systems, code is written to produce a desired behavior as the
outcome. Testing them involves checking the logic behind the actual behavior and how it
compares with the expected behavior. In machine learning systems, however, data and
desired behavior are the inputs, and the models learn the logic as the outcome of the
training and optimization processes.
In this case, testing involves validating the consistency between the model's logic and our
desired behavior. Because the models learn the logic themselves, there are some notable
obstacles in the way of testing machine learning systems. They are:
Indeterminate outcomes: on retraining, it is highly possible that the model parameters vary
significantly.
Generalization: it is a huge task for machine learning models to predict sensible outcomes
for data not encountered in their training.
Coverage: there is no set method of determining test coverage for a machine learning model.
Interpretability: most ML models are black boxes and do not have a comprehensible logic
for a certain decision made during prediction.
These issues lead to a poorer understanding of the scenarios in which models fail and the
reasons for that behavior, and they make it more difficult for developers to improve the
models.
From the discussion above, it may feel as if model testing is the same as model evaluation,
but that is not true. Model evaluation focuses on the performance metrics of the models,
like accuracy, precision, area under the curve, F1 score, and log loss. These metrics are
calculated on the validation dataset and remain confined to that. Though the evaluation
metrics are necessary for assessing a model, they are not sufficient because they don’t
shed light on the specific behaviors of the model.
It is entirely possible that a model's evaluation metrics have improved while its behavior
on a core functionality has regressed. Or retraining a model on new data might introduce a
bias against marginalized sections of society while showing no particular difference in the
metric values. This is especially harmful in the case of ML systems, since such problems
might not come to light easily but can have devastating impacts.
We usually write two different classes of tests for Machine Learning systems:
Pre-train tests
Post-train tests
Pre-train tests: The intention is to write tests that can be run without trained parameters,
so that we can catch implementation errors early on. This helps avoid the extra time and
effort spent on a wasted training job.
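A minimal sketch of such pre-train tests is shown below; it is runnable before any training job. The split helper and the binary-label check are hypothetical examples, not part of the project's code.

```python
import numpy as np

def make_split(n_rows: int, seed: int = 0):
    """Deterministic 80/20 train/validation split by row index."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    cut = int(0.8 * n_rows)
    return idx[:cut], idx[cut:]

def test_no_train_val_overlap():
    # catches a leakage bug before any compute is spent on training
    train_idx, val_idx = make_split(100)
    assert set(train_idx).isdisjoint(val_idx)

def test_labels_are_binary():
    y = np.array([0, 1, 0, 1, 1])  # stand-in labels
    assert set(np.unique(y)) <= {0, 1}

test_no_train_val_overlap()
test_labels_are_binary()
```

Neither test touches trained parameters, which is precisely what makes them cheap to run on every code change.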
Post-train tests: Post-train tests are aimed at testing the model’s behavior. We want to test
the learned logic and it could be tested on the following points and more:
Invariance tests involve testing the model by tweaking only one feature in a data point and
checking for consistency in model predictions. For example, if we are working with a loan
prediction dataset, then a change in sex should not affect an individual's eligibility for the
loan, given all other features are the same. Similarly, in the case of Titanic survivor
probability prediction data, a change in the passenger's name should not affect their
chances of survival.
Directional expectation tests, wherein we test for a direct relationship between feature
values and predictions. For example, in the case of a loan prediction problem, having a
higher credit score should increase a person's eligibility for a loan.
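A directional expectation test for the credit-score example might look like the sketch below; the synthetic data and the eligibility helper are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic loan data: columns = [income, credit_score] (hypothetical)
rng = np.random.default_rng(1)
X = rng.random((300, 2))
y = (X[:, 1] + 0.1 * X[:, 0] > 0.55).astype(int)
model = LogisticRegression().fit(X, y)

def eligibility(income: float, credit_score: float) -> float:
    """Predicted probability of loan eligibility."""
    return model.predict_proba([[income, credit_score]])[0, 1]

# Higher credit score, all else equal, should raise predicted eligibility
assert eligibility(0.5, 0.9) > eligibility(0.5, 0.3)
```

The assertion fixes every other feature and varies only the one under test, which is what separates a directional test from a plain accuracy check.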
Apart from this, you can also write tests for any other failure modes identified for your
model.
Now, let's try a hands-on approach and write tests for the Credit Card Fraud Detection
dataset. Here, we are given a set of anonymized features and we have to predict whether a
transaction is fraudulent. The following columns are provided in the dataset:
The dataset contains transactions made by credit cards in September. This dataset presents
transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The
dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all
transactions.
It contains only numeric input variables which are the result of a PCA transformation.
Unfortunately, due to confidentiality issues, we cannot provide the original features and
more background information about the data. Features V1, V2, … V28 are the principal
components obtained with PCA, the only features which have not been transformed with
PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each
transaction and the first transaction in the dataset. The feature 'Amount' is the transaction
amount; this feature can be used for example-dependent cost-sensitive learning. Feature
'Class' is the response variable, and it takes value 1 in case of fraud and 0 otherwise.
Doing a little bit of analysis on the dataset will reveal the relationships between various
features. Since the main aim of this chapter is to learn how to write tests, we will skip the
analysis and directly write basic tests.
CHAPTER 8
IMPLEMENTATION
# Imports assumed from the YOLOv7 repository's detect.py
import os
import time
import random
import imghdr
import smtplib
from email.message import EmailMessage
from pathlib import Path

import torch
import torch.backends.cudnn as cudnn

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, check_imshow, non_max_suppression, \
    apply_classifier, increment_path, set_logging
from utils.torch_utils import select_device, load_classifier, time_synchronized, TracedModel


def detect(save_img=False):
    source, weights, view_img, save_txt, imgsz, trace = \
        opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size, not opt.no_trace
    save_img = not opt.nosave and not source.endswith('.txt')  # save inference images
    webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
        ('rtsp://', 'rtmp://', 'http://', 'https://'))

    # Make crop folder for detected regions
    if not os.path.exists("crop"):
        os.mkdir("crop")
    crp_cnt = 0

    # Directories
    save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Initialize
    set_logging()
    device = select_device(opt.device)
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    stride = int(model.stride.max())  # model stride
    imgsz = check_img_size(imgsz, s=stride)  # check img_size

    if trace:
        model = TracedModel(model, device, opt.img_size)

    if half:
        model.half()  # to FP16

    # Second-stage classifier
    classify = False
    if classify:
        modelc = load_classifier(name='resnet101', n=2)  # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt',
                               map_location=device)['model']).to(device).eval()

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = check_imshow()
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz, stride=stride)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]

    # Run inference
    if device.type != 'cpu':
        model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
    old_img_w = old_img_h = imgsz
    old_img_b = 1

    t0 = time.time()
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # Warmup
        if device.type != 'cpu' and (old_img_b != img.shape[0] or
                                     old_img_h != img.shape[2] or
                                     old_img_w != img.shape[3]):
            old_img_b = img.shape[0]
            old_img_h = img.shape[2]
            old_img_w = img.shape[3]
            for i in range(3):
                model(img, augment=opt.augment)[0]

        # Inference
        t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]
        t2 = time_synchronized()

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres,
                                   classes=opt.classes, agnostic=opt.agnostic_nms)
        t3 = time_synchronized()

        # Apply Classifier
        if classify:
            pred = apply_classifier(pred, modelc, img, im0s)

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0, frame = path[i], '%g: ' % i, im0s[i].copy(), dataset.count
            else:
                p, s, im0, frame = path, '', im0s, getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # img.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + \
                ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain

            # Send an alert email with the cropped detection attached
            Sender_Email = "finalyrprj2024@gmail.com"
            Reciever_Email = "finalyrprj2024@gmail.com"
            Password = "vnfc lbdu mbdy vgpa"
            newMessage = EmailMessage()
            newMessage['Subject'] = "Violence detected!"
            newMessage['From'] = Sender_Email
            newMessage['To'] = Reciever_Email
            newMessage.set_content('People are involved in violent activities')
            with open(r'C:\Users\avant\Downloads\yolov7-voilance\yolov7-voilance\Source code\crop/0.jpg', 'rb') as f:
                image_data = f.read()
                image_type = imghdr.what(f.name)
                image_name = f.name
            newMessage.add_attachment(image_data, maintype='image',
                                      subtype=image_type, filename=image_name)
            with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp:
                smtp.login(Sender_Email, Password)
                smtp.send_message(newMessage)
CHAPTER 9
OUTPUTS AND SNAPSHOTS
Fig.No 9.2 - VIOLENCE DETECTION
Fig.No 9.4- APP PASSWORD GENERATION
FINAL OUTPUT:
Fig.No 9.6 VIOLENCE OUTPUT MAIL
MATRIX
HEAT MAP
Fig.No 9.8- HEAT MAP
CHAPTER 10
CONCLUSIONS AND FUTURE WORK
10.1 CONCLUSION
The proposed system identifies violent behavior by delving into the local motion dynamics
of the video. With the use of many modalities,
the video data is thoroughly examined, allowing the model to identify minute details that
may be signs of impending violence. The YOLO (You Only Look Once) v5 model, which
was created especially for the single goal of identifying violent acts, is the main tool used
in this study. YOLO v5, which is well-known for its effectiveness in real-time object
detection, can handle the challenges involved in identifying violent behaviors in the video
stream. Its exceptional object identification accuracy becomes a crucial advantage in
distinguishing between situations involving human violence and those involving
nonhuman violence.
Future advancements could focus on several key areas to elevate the system's capabilities.
Integrating multi-modal fusion would allow the incorporation of diverse data sources,
fostering a more holistic understanding of potential criminal activities. Behavioral
analysis, using recurrent neural networks, could introduce a temporal dimension to the
model, enabling the identification of abnormal patterns in human
behavior. To address concerns related to the interpretability of deep learning models,
implementing Explainable AI (XAI) techniques would provide transparency in decision-
making processes. Moreover, prioritizing privacy-preserving mechanisms, such as
federated learning, can uphold individual privacy rights in the data analysis process.
Adaptive learning strategies would enable the system to evolve and adapt to changing
crime patterns continuously. The integration of edge computing can enhance real-time
processing capabilities, especially in resource-constrained environments. Collaborating
with IoT devices, including smart cameras and sensors, would expand the system's data
sources, offering richer contextual information for crime detection. Ensuring robustness to
adversarial attacks and fostering a human-in-the-loop system would fortify the system
against intentional manipulations and leverage human expertise. Furthermore,
collaboration with law enforcement agencies is crucial to align the system with legal and
ethical standards, establishing protocols for responsible use and compliance with
regulations.
REFERENCES
[1] Kaya V, Tuncer S and Baran A 2021 Detection And Classification Of Different
Weapon Types Using Deep Learning. Applied Sciences 11 (16), 7535.
[2] Singh P and Pankajakshan V 2018 A Deep Learning Based Technique For Anomaly
Detection In Surveillance Videos. Proc. of the 24th National Conf. on Communications,
pp. 1-6.
[3] Dandage V, Gautam H, Ghavale A, Mahore R and Sonewar P A 2019 Review Of
Violence Detection System Using Deep Learning. Int. Research Journal of Engineering
and Technology 6 (12), pp. 1899-1902.
[4] Wang K, Liu M 2022 YOLOv3-MT: A YOLOv3 Using Multi-Target Tracking For
Vehicle Visual Detection. Appl. Intell. 52, pp. 2070–2091.
[5] Antoniou A and Angelov P 2016 A General Purpose Intelligent Surveillance System
For Mobile Devices Using Deep Learning. Proc. of the Int. Joint Conf. on Neural
Networks, pp. 2879-2886.