
A

MAJOR PROJECT REPORT

on

Automated Violence Detection in Video Streams

BE (Computer Science and Engineering)

VII Semester

By

T Sushanth Reddy (160120733173)

S Uday Kumar (160120733176)

Under the guidance of

Dr. T. Sridevi

Associate Professor

CSE Department

DEPARTMENT OF COMPUTER SCIENCE


CHAITANYA BHARATHI INSTITUTE OF TECHNOLOGY (A)
(Affiliated to Osmania University; Accredited by NBA(AICTE) and NAAC(UGC), ISO Certified 9001:2015)
KOKAPET(V), GANDIPET(M), RR District HYDERABAD - 75
Website: www.cbit.ac.in
2023-2024

This is to certify that the project work entitled “Automated Violence Detection in Video
Streams” submitted to CHAITANYA BHARATHI INSTITUTE OF TECHNOLOGY, in
partial fulfillment of the requirements for the completion of Major Project of VII Semester
B.E. in Computer Science and Engineering, during the Academic Year 2023-2024, is a record
of original work done by T Sushanth Reddy (160120733173) and S Uday Kumar Reddy
(160120733176) during the period of study in the Department of CSE, CBIT,
HYDERABAD, under our guidance.

Supervisor
Dr. T. Sridevi
Associate Professor,
Dept. of CSE,
CBIT, Hyderabad.

Head of the Department
Dr. Raman Dugyala
Professor,
Dept. of CSE,
CBIT, Hyderabad.

ACKNOWLEDGEMENTS

We would like to express our heartfelt gratitude to Dr. T. Sridevi, our Supervisor, for her
invaluable guidance, constant support, capable instruction, and persistent
encouragement.

We would like to take this opportunity to thank our Principal Dr. C. V. Narsimhulu, as well
as the Management of the Institute, for having designed an excellent learning atmosphere.

Our thanks are due to all members of the staff and our lab assistants for providing us with
the help required to carry out the groundwork of this project.

ABSTRACT

News of the violent incidents that occur daily around the world is overwhelming. Such incidents
seriously threaten personal safety and social stability. A variety of methods have been tried to
curb them, including the installation of surveillance systems. It would be of great significance if
surveillance systems could automatically detect violent activities and raise warning or alert
signals. The whole system can be implemented as a sequence of procedures. First, the system
identifies the presence of human beings in a video frame. Then, the frames predicted to contain
violent activity are extracted, and irrelevant frames are dropped at this stage. Finally, the trained
model detects violent behaviour, and the corresponding frames are saved separately as images.
These images are enhanced to detect the faces of the people involved in the activity, if possible.
The enhanced images, along with other necessary details such as time and location, are sent as an
alert to the concerned authority. The proposed method is a deep learning based automatic
detection approach that uses a Convolutional Neural Network (CNN) to detect violence in a video.
However, a CNN used on its own requires a lot of computation time and is less accurate. Hence
a pre-trained model, MobileNet, is used; it provides higher accuracy and acts as the starting point
for building the entire model. The alert message is sent to the concerned authorities through the
Telegram application.

CONTENTS

S.No Topics

Abstract

1 Introduction
1.1 Motivation
1.2 Objective of the Project
1.3 Problem Statement

2 Existing System
2.1 Literature survey

3 Proposed Methodology
3.1 System Specifications
3.2 List of Analysis Tasks
3.3 Data Flow Diagram

References

INTRODUCTION

Violence in public spaces, private premises, and online platforms poses significant
threats to individuals and communities. Timely intervention is essential to prevent harm and
ensure the safety of people and property. Traditional violence detection methods lack speed and
accuracy. Real-time deep learning approaches train deep neural networks on diverse datasets
covering both violent and non-violent behaviors. These models offer adaptability and continuous
learning, improving their recognition of various forms of violence. The trained networks can then
autonomously analyze live video streams or images, rapidly identifying potential instances of
violence with high precision. The ability to perform such tasks in real time makes this
technology valuable for a wide range of applications, including but not limited to security
systems, law enforcement, social media content moderation, and public safety. Detecting violent
activity is not a simple task, because it shares the general difficulties of anomaly detection and
of processing video data.

Violence detection in video streams using AI involves leveraging advanced machine learning
techniques to automatically identify and analyze instances of violent behavior. By extracting
spatial and temporal features from the video data, deep learning models, such as Convolutional
Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), can discern patterns associated
with violence. Training on labeled datasets, continuous monitoring, and real-time processing
enable the deployment of these models in security systems and public spaces, enhancing the ability
to promptly identify and respond to potential threats. Challenges include addressing data bias,
achieving real-time performance, and navigating privacy concerns in deployment.

1.1 MOTIVATION

The development of an automated violence detection system is motivated by its
potential to significantly enhance public safety and security. By leveraging the efficiency
and computational advantages of MobileNet, this system can operate on
mobile devices, expanding its reach to a broader range of contexts. The implementation
of such a system holds the promise of reducing response times to violent incidents,
thereby mitigating potential harm. Additionally, it can serve as a force multiplier for law
enforcement and security personnel, offering an extra layer of surveillance to better
monitor public spaces and critical environments. The societal impact includes fostering
safer communities, deterring violence, and providing a proactive tool for maintaining
public order.
Public Safety: The primary motivation is to enhance public safety and security.
Automated violence detection systems can be deployed in public spaces such as airports,
train stations, schools, and crowded events to identify potential violent incidents quickly.
This can aid in preventing or minimizing harm and responding to incidents more
effectively.
Early Intervention: A violence detection system can provide early intervention by
identifying potential threats before they escalate. This early detection can be crucial in
preventing violence and protecting individuals in the vicinity.
Reducing Response Time: Automated systems can significantly reduce the response
time of law enforcement or security personnel. Quick identification of violent behavior
allows for a faster and more targeted response, potentially preventing further harm.
Technological Advancements: Advances in deep learning and computer vision
technologies make it increasingly feasible to develop accurate and reliable violence
detection systems. These advancements contribute to the continuous improvement and
effectiveness of such systems over time.
Adaptability to Emerging Threats: MobileNet's adaptability allows for the continuous
improvement of the violence detection system. As new threats or patterns of violence
emerge, the system can be updated and retrained to recognize and respond to these
evolving challenges, ensuring its effectiveness over time.

1.2 OBJECTIVE OF THE PROJECT
The objectives of this project are:

• Accuracy Improvement:
Enhance the accuracy of violence detection to minimize false positives and negatives,
ensuring reliable identification of violent incidents.
• Real-time Detection:
Achieve real-time detection capabilities to enable immediate response and intervention
in violent situations.
• Adaptability to Different Scenarios:
Create a system that can adapt to different types of violence scenarios, considering
variations in behavior and context.
• Integration with Existing Infrastructure:
Ensure seamless integration with existing surveillance and security infrastructure to
facilitate widespread adoption and compatibility.
• Reducing Response Time:
Minimize the response time of security personnel and law enforcement by providing
timely alerts and information about violent incidents.

1.3 PROBLEM STATEMENT

Despite advancements in artificial intelligence for violence detection in video streams, there
exist challenges in ensuring accurate and unbiased identification of violent behavior. Issues
such as data bias, real-time processing constraints, and privacy concerns pose obstacles to the
seamless deployment of these systems in diverse environments. Developing robust models
that can generalize across demographic groups, operate in real-time scenarios, and address
ethical considerations is essential for the effective implementation of violence detection
technologies in security and public safety applications.

2. EXISTING SYSTEM

2.1 LITERATURE SURVEY

Existing systems are built on the following architectures. VGGNet is known for its simplicity and
effectiveness. It has various versions (e.g., VGG16, VGG19) and can be used as a feature
extractor in real-time violence detection systems. ResNet is designed to address the vanishing
gradient problem in deep networks. Its skip connections make it well suited for training very deep
networks, and it has been used in action recognition for real-time violence detection tasks. The
Inception architecture, particularly InceptionV3 and Inception ResNet, offers a good trade-off
between accuracy and computational efficiency. It can be used as a backbone network for feature
extraction.
The methodologies used in the research papers that we have gone through include:

[1] A violence detection approach is presented that uses a network similar to U-Net, with a
ResNet encoder to extract spatial features, followed by an LSTM block for the extraction of
temporal features and binary classification. The results of the trial showed a precision of 95%
and an accuracy of 94% on a dataset based on real-life situations. The recommended model
uses minimal computing resources while still producing useful results.

[1] The research initiative began by creating a specialized dataset tailored for skin disease
classification, encompassing four distinct classes. To bolster the model's generalization
capabilities, data augmentation techniques were applied, expanding the dataset's volume
through variations of existing images. An innovative Convolutional Neural Network (CNN)
model, optimized via hyper-parameter tuning, was introduced for classifying these skin
diseases. Benchmarking against other CNN algorithms demonstrated its superior
performance. Recognizing the sensitivity of medical images, especially those involving
private body parts, the study proactively addressed privacy concerns. A federated learning
approach was explored to enhance medical imaging security, employing the custom dataset.
Federated learning's decentralized model training mitigates privacy risks by keeping sensitive
data localized, contributing to a balance between effective skin disease classification and
preserving the privacy of sensitive medical images.

[2] The methodology introduces a robust 16-layered Convolutional Neural Network (CNN)
specifically designed for skin lesion classification. Trained from scratch on original
dermoscopic images and corresponding ground truth data, the CNN utilizes consistent filter
sizes and channels to ensure a coherent feature extraction process, tailoring the model to the
complexities of dermatological images. To augment the approach, the study adopts deep
HDCT saliency features instead of histograms, providing a more comprehensive
representation of various lesion properties beyond shape characteristics. Additionally, an
innovative image fusion technique, grounded in maximum mutual information, ensures
precise segmentation, crucial for accurate delineation of lesion boundaries. For classification
tasks, DenseNet CNN is employed, leveraging its dense connectivity pattern for effective
feature reuse and propagation. Further enhancing feature engineering, the methodology
incorporates a novel feature fusion method inspired by multi-set canonical correlation
analysis (MCCA). This technique strategically amalgamates information from diverse
sources, aiming to generate an optimal feature space that comprehensively captures nuances
in the input data. By integrating insights from multiple feature sets, the model gains the
capacity to discern and classify various skin lesions effectively, contributing to the overall
robustness of the proposed methodology.

[3] This study introduces a novel deep Convolutional Neural Network (CNN)-based model
for skin disease classification, utilizing the triplet loss function. Specifically, ResNet 152 and
InceptionResNet-V2 are fine-tuned using the triplet loss function to extract discriminative
features from skin disease images, offering a unique perspective in the realm of skin disease
classification for enhanced accuracy. An innovative layer-wise fine-tuning strategy is adopted
for pre-trained CNN models, deviating from block-wise fine-tuning. This strategic shift
focuses on optimizing the end-to-end learning process by fine-tuning individual layers,
aiming to better adapt the models to the specific features present in skin disease images. The
layer-wise fine-tuning strategy is expected to significantly enhance the overall performance
of the CNN models, ensuring their effective adaptation to dermatological patterns. In the
classification phase, the model computes L2 distances between images using learned
embeddings, enabling the comparison of distances to classify images into skin disease
categories based on their similarity. This methodology emphasizes the importance of feature
extraction and fine-tuning strategies for a holistic and innovative approach to skin disease
classification.
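
The triplet loss and the distance-based classification step described in [3] can be illustrated with a minimal sketch. It assumes embeddings are plain Python lists of floats and uses the non-squared L2 distance (some formulations square it); the function names, the margin value, and the class labels are illustrative assumptions, not taken from the cited paper.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive closer than the negative by `margin`."""
    return max(0.0, l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin)

def classify_by_distance(query, labelled_embeddings):
    """Return the label of the reference embedding nearest to `query`."""
    label, _ = min(labelled_embeddings, key=lambda item: l2_distance(query, item[1]))
    return label

# The negative is already far away, so this triplet contributes no loss.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# The negative is as close as the positive, so the full margin is penalised.
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])

# Hypothetical reference embeddings, one per class.
gallery = [("class_a", [0.0, 1.0]), ("class_b", [1.0, 0.0])]
predicted = classify_by_distance([0.1, 0.9], gallery)
print(easy, hard, predicted)
```

In a trained model the embeddings would come from the fine-tuned network rather than being written by hand; only the distance comparison is shown here.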
[4] This methodology introduces two key innovations for dermoscopic image classification.
Firstly, the Multi-weighted New Loss (MWNL) enhances accuracy in detecting crucial
classes, such as melanoma, addressing challenges in precise identification and ensuring
reliable classification outcomes. Secondly, the End-to-End Cumulative Learning Strategy
(CLS) offers an effective balance between representation and classifier learning without
additional computational costs. By prioritizing universal patterns for initialization and
progressively refining the model's understanding of different classes, this strategy provides a
comprehensive solution to challenges associated with class imbalances in dermoscopic image
datasets. The study challenges the prevailing belief in larger models' superiority, advocating
for moderately complex deep convolutional neural network (DCNN) models based on
empirical evidence showing their outperformance in dermoscopic image classification tasks.
The proposed methodology not only introduces novel techniques but also contributes
valuable insights for addressing challenges in dermoscopic image classification.

[5] The proposed system methodology involves the detection, classification, and
segmentation of skin lesions from images using two CNN models. The first CNN model is
utilized for detecting healthy skin from unhealthy skin, while the second CNN model is
employed for classifying unhealthy skin images into one of nine skin diseases. The main
steps of the proposed model include image acquisition, preprocessing, feature extraction, and
detection/classification. For lesion segmentation, computer vision techniques are employed,
and the performance of the segmentation process is compared with the PH2-Dataset, a
dermoscopic image database developed for research and benchmarking purposes. The dataset
used in the proposal consists of 21,485 skin-colored images with ten categories of skin
diseases. The images were collected from various publicly available websites such as Kaggle,
Dermnetnz, Dermcoll, Medical News Today, and MedLine Plus. The proposed model's
performance was tested in terms of detection, classification, and segmentation, with the
accuracy and loss function being measured during the training stage.

[7] The study employs a hybrid machine learning model combining the Whale Optimization
Algorithm (WOA) and a Support Vector Machine (SVM) for skin disease detection.
Image segmentation is performed using a level set approach, followed by feature extraction
using histogram and Local Binary Pattern (LBP) features. The WOA-based SVM
classifies diseases, demonstrating high accuracy and outperforming other
algorithms. Moreover, the research introduces a class-center based triplet loss to enhance
reliability in triplet-based learning, particularly for rare disease diagnosis. The triplet-based
solution proves highly effective in skin image classification, surpassing conventional
approaches for handling imbalances. The study also utilizes a CNN-NSVM combination for
categorizing skin lesions in non-dermoscopic digital images. The CNN extracts image properties,
and an SVM with neutrosophic logic identifies various lesion types, evaluated using a
malignancy lesion image database. Additionally, the methodology incorporates Support
Vector Machine-Based Black Widow Optimization (SVM-BWO) for skin disease
classification. Various disease images are collected, and a unique fuzzy set segmentation
approach is used for skin lesion region division. Color, texture, and form features are
extracted, achieving a classification accuracy of 92%, outperforming alternative methods.

[8] The study's methodology encompasses four key steps: data collection, observation,
pre-processing, and model processing. The dataset, acquired from Kaggle, comprises 23 skin
disease classes sourced from an online medical education portal. Following data collection,
individual image data points were identified before proceeding to pre-processing. This step
involved data annotation, resizing images to 416x416 pixels, and splitting the dataset into
75% training and 25% testing data. Model processing involved building a model using
Residual Network (ResNet) for feature extraction and Fast R-CNN for skin disease detection.
Fast R-CNN, a variant of R-CNN, streamlined runtime by employing a single model for
feature extraction and classification. Evaluation indicated a 90% accuracy for both
training and validation datasets, affirming the model's effectiveness. The study concludes that the
proposed Fast R-CNN-based methodology accurately classifies 23 skin disease types and
suggests its scaled-up implementation as a valuable contribution to dermatology, potentially
easing skin disease detection. In summary, the methodology involves data collection,
observation, pre-processing, and model processing, with Fast R-CNN and ResNet
contributing to the achieved high accuracy.
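
The 75%/25% train/test split mentioned in [8] can be sketched as follows. This is only an illustration under stated assumptions: the function name, the fixed seed, and the file names are hypothetical, and the cited study does not say how its split was drawn.

```python
import random

def split_dataset(items, train_fraction=0.75, seed=0):
    """Shuffle items reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for annotated images.
images = [f"image_{i:03d}.jpg" for i in range(100)]
train, test = split_dataset(images)
print(len(train), len(test))
```

Shuffling before cutting avoids a split that follows the order in which the files were collected.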

3. PROPOSED METHODOLOGY

3.1 SYSTEM SPECIFICATION

3.1.1 REQUIREMENTS

Software Requirements

1. Python 3.6
2. TensorFlow
3. Keras
4. VS Code (for running the program)
5. NumPy 1.18 or newer
6. Pandas 1.0 or newer
7. Pre-trained MobileNetV2 model with different models

Hardware Requirements

1. Recommended operating system: Windows or Linux
2. Windows: Windows 10 or above
3. Processor: Recommended 4 GHz or more
4. RAM: Recommended 8 GB or more
5. Graphics card: Integrated graphics or dedicated graphics card

3.2 LIST OF ANALYSIS TASKS
1. Data Collection
1.1 Finding new videos or live video
1.2 Collecting the data from various sources
2. Model Training
2.1 Classify extracted features
3. Pre-Processing
3.1 Input video frames
3.2 Background subtraction
3.3 Threshold
3.4 Output
4. Model Testing
5. Model Evaluation

1. Data Collection

1.1 Finding New Videos or Live Videos
In this phase, the project will actively seek out and acquire new videos or live video streams
relevant to the target domain. This involves comprehensive searches and collaborations with
relevant sources to ensure a diverse and representative dataset.
1.2 Collecting Data from Various Sources
The collected data will be curated to include video sets from hockey footage within the chosen
domain. This diversity is crucial for training a robust model capable of accurately recognizing
and classifying various conditions.
2. Model Training
The preprocessed data will be fed into the chosen deep learning architecture, MobileNetV2,
for feature extraction. During this phase, the model learns to extract features and to classify
them.
3. Data Preprocessing
A lightweight frame-differencing technique is applied to select and extract only the salient
motion frames, so that computational resources are used efficiently.
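
The frame-differencing and thresholding steps can be sketched as below. This is a minimal illustration, assuming grayscale frames held as NumPy arrays; the function name and both threshold values are assumptions chosen for the example, not values from this project.

```python
import numpy as np

def select_salient_frames(frames, pixel_threshold=25, motion_ratio=0.02):
    """Return indices of frames that differ noticeably from the previous frame.

    frames: sequence of 2-D uint8 grayscale arrays of equal shape.
    pixel_threshold: minimum per-pixel intensity change counted as motion (assumed value).
    motion_ratio: minimum fraction of moving pixels for a frame to be kept (assumed value).
    """
    salient = []
    previous = None
    for index, frame in enumerate(frames):
        current = frame.astype(np.int16)  # widen so subtraction cannot wrap around
        if previous is not None:
            difference = np.abs(current - previous)
            moving = difference > pixel_threshold  # binary motion mask (threshold step)
            if moving.mean() >= motion_ratio:
                salient.append(index)
        previous = current
    return salient

# Synthetic clip: frames 0 and 1 are identical, a bright block appears in frame 2.
still = np.zeros((40, 40), dtype=np.uint8)
moved = still.copy()
moved[10:20, 10:20] = 200
print(select_salient_frames([still, still, moved]))  # prints [2]
```

Only the frames returned by this step would be passed on to the classifier, which is where the saving in computation comes from.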
4. Model Testing
The selected salient motion frames are passed through the trained MobileNet model, which
classifies each as normal or abnormal activity. If the trained model detects abnormal activity,
an alarm is triggered to inform the concerned authority so that it can act.
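
The alert step can be sketched with the Telegram Bot API `sendMessage` method. The sketch only builds the HTTP request and does not send it. The decision threshold, function names, message wording, token, and chat ID are all illustrative assumptions, and the model's output is assumed to be a violence probability between 0 and 1.

```python
import urllib.parse
import urllib.request

VIOLENCE_THRESHOLD = 0.5  # assumed cut-off on the model's violence probability

def build_alert_request(bot_token, chat_id, probability, location, timestamp):
    """Build (but do not send) a Telegram Bot API sendMessage request."""
    text = f"Violence detected (p={probability:.2f}) at {location}, {timestamp}"
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")

def maybe_alert(probability, **details):
    """Return an alert request for abnormal activity, or None for normal activity."""
    if probability < VIOLENCE_THRESHOLD:
        return None
    return build_alert_request(probability=probability, **details)

# Placeholder credentials; a real bot token and chat ID would be needed to send.
request = maybe_alert(
    0.93,
    bot_token="TOKEN",
    chat_id="123456",
    location="Gate 2",
    timestamp="2024-01-01 10:00",
)
print(request.full_url)
# To actually send the alert: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the decision logic testable without network access.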
5. Model Evaluation
The model's performance will be assessed using various metrics such as accuracy, precision,
recall, and F1 score. This comprehensive evaluation will provide insights into the model's
strengths and weaknesses, guiding potential improvements and optimizations for future
iterations.
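
The four metrics named above can be computed from binary labels as in the following sketch, where 1 stands for violent and 0 for non-violent. The function name and the example labels are made up for illustration and are not results from this project.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = violent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # violent, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed incident
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 3 true positives, 3 true negatives, 1 false alarm, 1 miss.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

For an alerting system, precision reflects how many alarms are genuine and recall reflects how many violent incidents are caught, so the two are worth reporting alongside accuracy.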

3.3 Block Diagram

[Block diagram: Video Streams → Extract Frames → Video frames → Frames Background Subtraction → Salient features → MobileNet with features classification → Check for abnormal activity → Alarm if Violence Detected → Send alert → Notify concerned authority]

References
[1] Md. Nazmul Hossen, Vijayakumari Panneerselvam, Deepika Koundal, Kawsar
Ahmed, Francis M. Bui, and Sobhy M. Ibrahim, "Federated Machine Learning for
Detection of Skin Diseases and Enhancement of Internet of Medical Things (IoMT)
Security", IEEE Journal of Biomedical and Health Informatics, vol. 27, no. 2, February
2023.

[2] Muhammad Attique Khan, Khan Muhammad, Muhammad Sharif, Tallha Akram, and
Victor Hugo C. De Albuquerque, "Multi-Class Skin Lesion Detection and
Classification via Teledermatology", IEEE Journal of Biomedical and Health
Informatics, vol. 25, no. 12, December 2021.

[3] Yanyang Gu, Zongyuan Ge, C. Paul Bonnington, and Jun Zhou, "Progressive
Transfer Learning and Adversarial Domain Adaptation for Cross-Domain Skin
Disease Classification", IEEE Journal of Biomedical and Health Informatics, vol.
24, no. 5, May 2020.

[4] Belal Ahmad, Mohd Usama, Cheun-Min Huang, Kai Hwang, M. Shamim Hossain,
and Ghulam Muhammad, "Discriminative Feature Learning for Skin Disease
Classification Using Deep Convolutional Neural Network", IEEE Journal of
Biomedical and Health Informatics, vol. 8, February 2020.

[5] Peng Yao, Shuwei Shen, Mengjuan Xu, Peng Liu, Fan Zhang, Jinyu Xing, Pengfei Shao,
Benjamin Kaffenberger, and Ronald X. Xu, "Single Model Deep Learning on
Imbalanced Small Datasets for Skin Lesion Classification", IEEE Transactions on
Medical Imaging, vol. 41, no. 5, May 2022.

[6] Zaynab Habib, R. Naji, Nidhal K. Abbas, EL Abbadi, "Skin Diseases Detection,
Classification, and Segmentation", 2022 International Conference on Green Energy.

[7] Dharam Buddhi, Shaik Vaseem Akram, N Sathishkumar, S. Prabu, Arun Sekar
Rajasekaran, Piyush Kumar Pareeks, "Skin Disease Classification using Hybrid AI
based Localization Approach", 2022 International Conference on Knowledge
Engineering and Communication System.

[8] Prakriti Dwivedi, Akbar Ali Khan, Amit Gawade, Subodh Deolekar, "A deep learning
based approach for automated skin disease detection using Fast R-CNN", 2021 Sixth
International Conference on Image Information Processing.
