
ARTIFICIAL NEURAL NETWORKS: BUILD AND TRAIN A NEURAL NETWORK MODEL FOR INTELLIGENT VIDEO SURVEILLANCE SYSTEMS

A PROJECT REPORT

Submitted by


Anshit Sharma 22BCS15234
Ashman Verma 22BCS15181
Ansh Jain 22BCS15216
Mohd Aftab 22BCS15445

in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING
IN
Computer Science

Chandigarh University
February - June 2023
Table of Contents
CHAPTER 3. DESIGN FLOW/PROCESS

3.1 Concept Generation

3.2 Evaluation & Selection of Specifications/Features

3.3 Existing CNN Model Constraints

    3.3.1 Economics of the Project
    3.3.2 Environmental Impact
    3.3.3 Health Impact
    3.3.4 Safety Impact
    3.3.5 Professional Impact
    3.3.6 Ethical Impact

3.4 Best CNN Model Selection

3.5 Implementation Plan

    3.5.1 Block Diagram

3.6 Technology Used

3.7 Equipment Integrated with CNN Model


Chapter 3: Design Flow/Process

3.1 Concept Generation


Concept generation for this project, building and training a neural network model for intelligent video surveillance systems, covers access to a large amount of video data, preprocessing, the neural network architecture, training data, the loss function, the optimization algorithm, hyperparameters, and validation, testing, and evaluation. Preprocessing involves tasks such as resizing videos to a standard size, converting them to a specific format, and extracting relevant features from the video frames. Candidate neural network architectures include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs). The training data consists of the video data and the corresponding labels; the loss function measures the difference between the predicted outputs of the neural network and the true labels; the optimization algorithm adjusts the weights of the network to minimize the loss function; the hyperparameters include the learning rate, batch size, number of layers, and number of neurons per layer; and validation and testing evaluate the performance of the trained model.
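
As a concrete illustration of the preprocessing step, the following minimal Python sketch uses OpenCV to sample frames from a video, resize them to a standard size, and scale pixel values to [0, 1]. The file name, target size, and frame step are illustrative assumptions, not values fixed by the project.

    import cv2
    import numpy as np

    def extract_frames(video_path, size=(224, 224), frame_step=5):
        """Keep every frame_step-th frame, resized and scaled to [0, 1]."""
        frames = []
        cap = cv2.VideoCapture(video_path)
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % frame_step == 0:
                frame = cv2.resize(frame, size)  # resize to a standard input size
                frames.append(frame.astype(np.float32) / 255.0)
            index += 1
        cap.release()
        return np.array(frames)

    frames = extract_frames("surveillance_clip.mp4")  # hypothetical file name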

3.2 Evaluation & Selection of Specifications/Features


1. Network Architecture: The network architecture is the structure of the neural network
that determines how the neurons are interconnected. There are various types of
architectures like feedforward, recurrent, convolutional, etc. Depending on the
requirements, one has to choose an appropriate architecture.
2. Activation Functions: The activation function determines the output of a neuron given its
input. There are various types of activation functions like sigmoid, ReLU, tanh, etc.
Depending on the problem, one has to choose an appropriate activation function.
3. Loss Functions: The loss function is a mathematical function that measures how well the
network is predicting the output. Depending on the problem, one has to choose an
appropriate loss function.
4. Optimization Algorithm: The optimization algorithm is used to minimize the loss
function by adjusting the weights and biases of the neural network. There are various
types of optimization algorithms like stochastic gradient descent, Adam, etc.
5. Regularization Techniques: Regularization techniques are used to prevent overfitting of
the neural network. Some popular regularization techniques are dropout, L1 and L2
regularization, etc.
6. Batch Size and Epochs: Batch size is the number of samples processed by the neural
network at one time, and epochs are the number of times the entire dataset is passed
through the neural network during training. These parameters should be chosen carefully
based on the available computational resources and the size of the dataset.
7. Data Augmentation: Data augmentation techniques are used to increase the size of the
training dataset artificially by applying transformations like rotation, scaling, and flipping
to the existing data.
8. Preprocessing: The input data should be preprocessed before being fed into the neural
network. Preprocessing techniques like normalization, scaling, and feature extraction can
help to improve the performance of the neural network; a minimal example combining
several of the choices above follows this list.
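
To make these choices concrete, the following minimal Keras sketch puts several of them together: a small convolutional architecture, ReLU and softmax activations, dropout regularization, a cross-entropy loss, and the Adam optimizer. The layer sizes and the 48x48 grayscale input (matching the FER2013 format described in Section 3.6) are illustrative assumptions, not the project's final settings.

    from tensorflow.keras import layers, models, optimizers

    # Architecture: a small convolutional network for 48x48 grayscale frames
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu",   # ReLU activation
                      input_shape=(48, 48, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),                   # dropout regularization
        layers.Dense(64, activation="relu"),
        layers.Dense(7, activation="softmax"), # e.g. the seven FER2013 classes
    ])

    # Loss function and optimization algorithm
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])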

3.3 Existing CNN Model Constraints


3.3.1 Economics of the Project:

Real-time detection of suspicious activity in video surveillance systems requires quick response times
to detect and alert security personnel about potential threats. CNN models can be computationally
expensive and require a large number of computations to process each input frame, which can be a
significant constraint in resource-constrained environments. The computational cost of a CNN model is
determined by its architecture, size, and complexity. Larger models with more layers and parameters
require more computations, and hence, more computational resources. In resource-constrained
environments, such as embedded systems, IoT devices, or edge computing platforms, the limited
processing power and memory can make it challenging to deploy complex CNN models for real-time
detection.
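
As a rough way to quantify this constraint, the parameter counts of standard architectures can be compared directly in Keras; parameter count is only a proxy for per-frame compute, but the contrast below between a lightweight and a deeper architecture illustrates the trade-off described above.

    from tensorflow.keras.applications import MobileNet, ResNet50

    # Parameter count is a rough proxy for per-frame computational cost
    small = MobileNet(weights=None)  # lightweight architecture for edge devices
    large = ResNet50(weights=None)   # deeper architecture, more computation
    print(f"MobileNet parameters: {small.count_params():,}")
    print(f"ResNet50 parameters:  {large.count_params():,}")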

3.3.2 Environmental Impact:


CNN models require a significant amount of energy to train and run, which can contribute to
greenhouse gas emissions and other environmental impacts associated with energy production.
The deployment of CNN models requires hardware infrastructure, such as servers, GPUs, and
other computing equipment. The disposal of this hardware can contribute to electronic waste and
environmental impacts associated with the production and disposal of electronic equipment.

3.3.3 Health Impact:


In some cases, surveillance systems built on CNN models rely on sensors or devices that emit
radiation, such as X-rays, to detect suspicious activity. Exposure to radiation can have negative impacts on human
health, including an increased risk of cancer. Some real-time suspicious activity detection
systems rely on sensors or devices that produce loud noises to alert security personnel or other
responders to potential threats. Exposure to loud noises over time can cause hearing loss,
tinnitus, and other negative impacts on human health.

3.3.4 Safety Impact:


Equipment failures: Real-time suspicious activity detection systems rely on sensors, cameras,
and other equipment to detect and alert security personnel to potential threats. Equipment failures
or malfunctions can lead to false alarms, missed threats, and other safety risks.

Cybersecurity risks: The deployment of real-time suspicious activity detection systems can create
cybersecurity risks, as these systems often rely on networked devices and cloud-based platforms
to function. Cyber attacks on these systems can compromise the integrity of the data and pose
safety risks to individuals and organizations.

Human error: Real-time suspicious activity detection systems require human operators to monitor
and respond to potential threats. Human error, such as misinterpretation of data or delayed
response times, can lead to safety risks and potentially dangerous situations.

3.3.5 Professional Impact:


The deployment of real-time suspicious activity detection systems may result in the displacement
of certain jobs, such as security guards or other personnel who are responsible for monitoring
and responding to potential threats. This can have a significant impact on the livelihoods of
individuals and communities. Real-time suspicious activity detection systems require trained
personnel to operate and maintain the equipment and software. This can create a need for
specialized training and education programs, which can have professional impacts on individuals
and organizations. CNN models can be challenging to adapt to new scenarios or domains, which
can be a constraint when dealing with evolving threat landscapes or new types of suspicious
activity. Overfitting is another concern: it occurs when a model fits the training data too closely
and cannot generalize well to new data. Overfitting is a common problem in CNN models and can
result in poor performance in real-world scenarios.

3.3.6 Ethical Impact:


In our project, we followed ethical guidelines to ensure that no personal information of the
individuals appearing in the video data was compromised. We obtained data from publicly
available datasets and ensured that privacy was maintained. Additionally, we took steps to ensure
that our algorithm was fair and unbiased: we trained our models using diverse datasets so that
they could detect suspicious activity accurately across different demographics. Overall, our
project adhered to ethical principles to ensure that no individual was harmed and that our results
could be used to improve public safety.

3.4 Best CNN Model Selection


Selecting the best CNN model for detecting suspicious activity is crucial to achieving high
accuracy. CNN models differ in their architecture, parameters, and performance, making
model selection a crucial step in the development of a deep learning algorithm. The selection
process involves several stages, including training and validation, hyperparameter tuning, and
performance evaluation.
This step involves clearly defining the scope and nature of the problem that the CNN model is
being designed to solve. This includes specifying the types of suspicious activity that need to be
detected, the environmental conditions under which the activity occurs, and the specific
performance metrics that the model will be evaluated against. The performance of a CNN model
is heavily influenced by the quality and quantity of data used for training. Therefore, it is
essential to gather a large and diverse dataset that captures the range of suspicious activity and
environmental conditions of interest. Once the data is collected, it is often necessary to
preprocess it in order to improve its quality and suitability for training the model. This may
involve tasks such as resizing images, normalizing brightness and contrast levels, and removing
noise and artifacts.
There are many different CNN architectures to choose from, each with its own strengths and
weaknesses. The selection of an appropriate architecture depends on several factors, including
the complexity of the problem, the amount and quality of available data, and the computational
resources available for training and inference. Commonly used architectures for real-time
suspicious activity detection include YOLO (You Only Look Once), Faster R-CNN (Region-
based Convolutional Neural Networks), and SSD (Single Shot Detector).
Once a suitable architecture has been selected, the model must be fine-tuned to improve its
performance on the specific task at hand. This involves adjusting hyperparameters, such as
learning rate, batch size, and regularization strength, to optimize the model's performance. The
choice of loss function and optimization algorithm can also have a significant impact on the
model's performance. Once the model has been fine-tuned, it is essential to evaluate its
performance on a separate validation dataset. This allows the model to be tested on new and
previously unseen data, and helps to identify potential issues with overfitting or underfitting.
Evaluation metrics such as precision, recall, and F1 score are commonly used to quantify the
model's performance.
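
These metrics are available off the shelf in scikit-learn, which the project already uses; the following sketch computes them for a small set of illustrative binary labels (the values are made up purely to show the API).

    from sklearn.metrics import precision_score, recall_score, f1_score

    # Ground-truth labels and model predictions (illustrative values only)
    y_true = [0, 1, 1, 0, 1, 1, 0, 0]
    y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

    print("Precision:", precision_score(y_true, y_pred))
    print("Recall:   ", recall_score(y_true, y_pred))
    print("F1 score: ", f1_score(y_true, y_pred))
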
After evaluating the model's performance on validation data, the model can be deployed and
tested in a real-time environment. This involves integrating the model with a video feed and
evaluating its performance on live data. Real-time performance metrics such as frame rate and
processing time are important to consider in order to ensure that the model is suitable for
deployment in real-world scenarios. Real-time suspicious activity detection is an iterative
process, and it is essential to continually monitor and improve the model's performance over
time. This involves incorporating new data as it becomes available, fine-tuning the model to
improve its accuracy and speed, and continually evaluating its performance in real-world
settings.
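
A minimal OpenCV loop of the kind described here can measure per-frame processing time and display the effective frame rate. The camera index is an assumption, and run_model stands in for the trained detector's inference call, which this report does not specify.

    import time
    import cv2

    cap = cv2.VideoCapture(0)  # live camera feed; device index is an assumption
    while True:
        start = time.time()
        ok, frame = cap.read()
        if not ok:
            break
        # detections = run_model(frame)  # placeholder for the trained detector
        elapsed = time.time() - start
        fps = 1.0 / elapsed if elapsed > 0 else 0.0
        cv2.putText(frame, f"{fps:.1f} FPS", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("surveillance", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
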
By following these steps, it is possible to develop an accurate and effective CNN model for real-
time suspicious activity detection, which can help improve security and safety in a wide range of
environments.
Overall, selecting the best CNN model for detecting suspicious activity requires careful
consideration of various factors, including the dataset size, model architecture, hyperparameter
tuning, and performance evaluation. It also requires the use of statistical analysis to identify
patterns and relationships in the data that can be used to improve the model's performance. By
selecting the best CNN model, we can improve the detection of suspicious activity, which can
help in the early detection and prevention of security incidents.
3.5 Implementation Plan
Implementing a deep ensemble CNN model for detecting suspicious activity with the help of
surveillance cameras can be a complex task. However, with proper planning and execution, it can
be done efficiently. Here are some steps that can be taken to implement the proposed model.

Data collection is a critical step in the process of building an accurate and effective CNN model.
The performance of the model depends heavily on the quality and quantity of data used to train
it. Therefore, it is important to collect a large and diverse dataset that captures the range of
suspicious activity and environmental conditions of interest.

The first step in data collection is to define the scope of the problem and the types of suspicious
activity that need to be detected. This includes identifying the relevant locations, times of day,
and situations where suspicious activity is likely to occur. It is important to collect data that
reflects the real-world conditions where the model will be deployed. Next, data is typically
collected using a variety of sensors such as surveillance cameras, audio sensors, and other
devices that can capture relevant data. The data should be collected in different formats,
including video, audio, and sensor data, depending on the type of suspicious activity being
detected.

Once the data is collected, it needs to be preprocessed. The data needs to be cleaned, and any
anomalies or noise in the data need to be removed. The data should then be resized and
normalized to ensure that it is compatible with the model's input size. This step is critical in
ensuring that the model can learn from the data effectively.

The next step is to split the data into training and testing sets. The training data will be used to
train the model, while the testing data will be used to evaluate the model's performance. The
data should be split in a way that ensures that the model does not overfit the training data and
can generalize well on new data.
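
A common way to perform such a split is scikit-learn's train_test_split, sketched below with placeholder arrays standing in for the preprocessed frames and labels; the 80/20 ratio and stratification are illustrative choices rather than values fixed by the project.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Placeholder arrays standing in for preprocessed frames and labels
    X = np.random.rand(1000, 48, 48, 1)     # 1000 preprocessed grayscale frames
    y = np.random.randint(0, 2, size=1000)  # 0 = normal, 1 = suspicious

    X_train, X_test, y_train, y_test = train_test_split(
        X, y,
        test_size=0.2,    # hold out 20% of the data for evaluation
        stratify=y,       # keep class proportions equal in both splits
        random_state=42,  # reproducible split
    )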

The model architecture needs to be designed next. In the proposed model, we will use multiple
models and combine their outputs to produce the desired result, while accounting for various
parameters such as motion patterns, object detection, facial recognition, and time of day. The
models used include a standard CNN (Convolutional Neural Network), MobileNet, and ResNet
(Residual Network), arranged in a deep ensemble CNN architecture. This architecture consists of
three independent CNNs, each with a different architecture and number of layers. The outputs of
these three CNNs are then combined to form the final output of the model. The choice of CNN
architecture depends on the type of data being used and the complexity of the problem being
solved.
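
This report does not specify how the three outputs are combined; averaging the member predictions is one common strategy, and the following Keras sketch builds three small CNNs of different widths and depths and averages their softmax outputs. All filter counts, depths, and the input shape are illustrative assumptions.

    from tensorflow.keras import layers, models

    def make_cnn(filters, depth, input_shape=(48, 48, 1)):
        """Build one ensemble member with its own width and depth."""
        inp = layers.Input(shape=input_shape)
        x = inp
        for _ in range(depth):
            x = layers.Conv2D(filters, (3, 3), activation="relu",
                              padding="same")(x)
            x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Flatten()(x)
        out = layers.Dense(2, activation="softmax")(x)  # normal vs. suspicious
        return models.Model(inp, out)

    # Three independent CNNs, each with a different architecture and depth
    members = [make_cnn(16, 2), make_cnn(32, 3), make_cnn(64, 4)]

    # Combine the three outputs by averaging into the final ensemble output
    inp = layers.Input(shape=(48, 48, 1))
    ensemble = models.Model(inp, layers.Average()([m(inp) for m in members]))
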
Once the model architecture is designed, the model needs to be trained. The training process
involves feeding the training data into the model and adjusting the model's parameters to
minimize the error between the predicted output and the actual output. The training process can
take a long time, depending on the complexity of the model and the size of the data.
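
A minimal training call, reusing the ensemble and the data split from the earlier sketches, might look as follows; early stopping is added here as one common guard against the overfitting discussed earlier, and the epoch count, batch size, and joint training of all three members are illustrative choices, not the report's prescribed procedure.

    from tensorflow.keras.callbacks import EarlyStopping

    # `ensemble`, `X_train`, and `y_train` come from the earlier sketches
    ensemble.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",  # integer labels
                     metrics=["accuracy"])
    early_stop = EarlyStopping(monitor="val_loss", patience=5,
                               restore_best_weights=True)
    history = ensemble.fit(
        X_train, y_train,
        validation_split=0.2,  # part of the training data used for validation
        epochs=50,             # upper bound; early stopping usually ends sooner
        batch_size=32,
        callbacks=[early_stop],
    )
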
After the model is trained, it needs to be evaluated using the testing data. The model's
performance is measured using various metrics, including accuracy, sensitivity, specificity, and
precision. The proposed model achieved an accuracy of 66.6% with a loss of 82%; this can be
improved by increasing the dataset size and the number of training epochs. The program flow of
the model is shown in the block diagram in Section 3.5.1.
Finally, the model needs to be deployed. The model can be deployed in various ways, including
as a web application or a mobile application. The deployment process involves integrating the
model into the desired platform and ensuring that it runs efficiently and accurately.
In conclusion, implementing a deep ensemble CNN model for suspicious activity detection
requires careful planning and execution. It involves collecting and preprocessing the data,
designing the model architecture, training and evaluating the model, and finally deploying the
model. With a larger dataset and further tuning, the proposed model has the potential to improve
security outcomes and the early detection of crime.
3.5.1 Block Diagram:

3.6 Technology Used


1. Python:
Python is a high-level programming language that is widely used in the field of data science and
machine learning. It has an extensive library of tools and modules that are useful for various
tasks, including data analysis, visualization, and modeling.
2. Keras:

Keras is an open-source neural network library written in Python. It provides a user-friendly
interface for building and training deep learning models. Keras can run on top of TensorFlow,
CNTK, or Theano, making it a versatile tool for machine learning tasks.
3. TensorFlow:

TensorFlow is an open-source platform for building and training machine learning models. It
was developed by Google and is used widely in the industry for tasks such as image and speech
recognition, natural language processing, and predictive analytics.
4. Scikit-learn:

Scikit-learn is a free and open-source machine learning library for the Python programming
language. It provides tools for classification, regression, clustering, and dimensionality reduction,
among other tasks.
5. Google Colaboratory:

Google Colaboratory is a free cloud-based platform that provides access to Python programming
environments, including Jupyter Notebooks and TensorFlow. It is widely used by data scientists
and machine learning practitioners to experiment with code and algorithms without the need for
a local machine.
6. DICOM file format:

DICOM (Digital Imaging and Communications in Medicine) is a standard format for medical
images such as X-rays, CT scans, and MRIs. It includes metadata such as patient information,
imaging parameters, and pixel data.
7. LIDC-IDRI dataset:

The Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) is a
publicly available dataset of CT scans for lung cancer diagnosis. It contains over 1000 CT scans
with annotations for lung nodules by multiple radiologists, making it a valuable resource for
developing and testing machine learning algorithms for lung cancer detection.
8. FER(2013) dataset:
The FER2013 dataset is a widely-used facial expression recognition dataset that consists of
35,887 grayscale images in 48x48-pixel resolution. The images are categorized into seven
emotions: anger, disgust, fear, happiness, sadness, surprise, and neutral. The dataset was created
by Pierre-Luc Carrier and Aaron Courville and made available to the public in 2013. The images
in the dataset were collected from a range of sources, including the internet and manually-labeled
facial expression datasets. FER2013 has become a popular benchmark dataset for evaluating
facial expression recognition models and has been used in numerous studies in the field.
9. OpenCV:
OpenCV is a large open-source library for computer vision, machine learning, and image
processing, and it plays a major role in the real-time operation that is critical in today's systems.
With it, one can process images and videos to identify objects, faces, or even human handwriting.
When integrated with libraries such as NumPy, Python can process the OpenCV array structure
for analysis. To identify an image pattern and its various features, we use a vector space and
perform mathematical operations on these features. A small face-detection example is given
below.
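
As a small illustration of the face-detection capability mentioned above, the following sketch uses the Haar cascade classifier bundled with the opencv-python package; the input and output file names are hypothetical.

    import cv2

    # Load the Haar cascade face detector bundled with opencv-python
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    image = cv2.imread("frame.jpg")  # hypothetical frame from the video feed
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("frame_faces.jpg", image)
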
10. OS module:
The functions the OS module provides allow us to operate on underlying operating system tasks,
whether the platform is Windows, macOS, or Linux. In this project it is useful for tasks such as
locating dataset files and building file paths that work across platforms; a short example is given
below.
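
For example, the following sketch counts the video files under a dataset directory in an OS-independent way; the directory layout and file extensions are hypothetical.

    import os

    # Count video files under a dataset directory, independent of the OS
    dataset_dir = os.path.join("data", "videos")  # hypothetical layout
    count = 0
    for root, dirs, files in os.walk(dataset_dir):
        for name in files:
            if name.lower().endswith((".mp4", ".avi")):
                count += 1
    print(f"Found {count} video files under {os.path.abspath(dataset_dir)}")
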
11. Kaggle:
Kaggle is a popular platform for data scientists, machine learning engineers, and enthusiasts to
compete in data science challenges, collaborate on projects, and share knowledge. Kaggle
provides a variety of resources and tools for data analysis, such as datasets, Jupyter notebooks,
and competitions, to enable users to explore and analyze data, build models, and share their
results with the community.
12. Github:
GitHub is a web-based platform for version control and collaboration that allows developers to
store and manage their code, as well as track changes and collaborate with other team members.
It provides a variety of tools for developers, including the ability to host and review code,
manage projects and software releases, and collaborate on open-source software development.
With GitHub, developers can work on code together, share feedback, and make changes to code
in a controlled manner.

3.7 Equipment Integrated with CNN Model


The final equipment that has the CNN model integrated into it is a surveillance system. This
system consists of several components, including a camera with a lens, a shutter, and a power
supply, a computer, and a software program that can capture and store images and videos. The
CNN model is accompanied by a depth sensor for gesture detection and for foreground and
background separation, and the system is equipped with a powerful processor and cloud storage
so that our data is safe and cannot be manipulated.

The computer used in the system is equipped with a powerful CPU to perform the complex
computations required by deep learning models. A high-quality camera lens is needed to run
multiple algorithms at once and produce useful output; blurry and unclear video capture may
lead to unwanted and misleading results.
The software program used in the system is written in Python, a popular programming language
for machine learning and data analysis. It uses several libraries and frameworks, including
TensorFlow, Keras, and OpenCV, to implement the CNN model and perform image analysis
tasks.
Overall, the final surveillance equipment should detect threats reliably, inform the local
authorities in time, and serve the community well rather than act as a faulty machine.
