
PRO VISION

A MAJOR PROJECT REPORT

Submitted in partial fulfillment for the award of the degree of

BACHELOR OF COMPUTER APPLICATION

in

COMPUTER APPLICATIONS

Under The Guidance of

Dr. Sanjeev Solanki


(Professor)
Submitted by

1. Ashish Singh Rawat 236229285022


2. Chandra Shekar 236229285033
3. Chand Lokesh Jhalak 236229285031
4. Aryan Gusain 236229285021
5. Anshika Pundir 236229285012

Submitted To

DEPARTMENT OF COMPUTER APPLICATIONS


TULA'S INSTITUTE DEHRADUN
MAY-2025
DEPARTMENT OF COMPUTER
APPLICATIONS

TULA'S INSTITUTE, DEHRADUN

DECLARATION

We hereby declare that the major project entitled “PRO VISION”
submitted for the BCA (Bachelor of Computer Application) degree at
Tula's Institute, Dehradun, is our original work and that this major
project has not formed the basis for the award of any degree,
associateship, fellowship or any other similar title.

Place: Signature of the Student


Date:

Ashish Singh Rawat 236229285022

Chandra Shekar 236229285033

Chand Lokesh Jhalak 236229285031

Aryan Gusain 236229285021

Anshika Pundir 236229285012


DEPARTMENT OF COMPUTER
APPLICATIONS

TULA'S INSTITUTE, DEHRADUN

CERTIFICATE

This is to certify that the major project “PRO VISION” is the bonafide work
carried out by Ashish Singh Rawat, Chandra Shekar, Chand Lokesh Jhalak,
Aryan Gusain and Anshika Pundir, roll numbers 236229285022, 236229285033,
236229285031, 236229285021 and 236229285012, students of BCA, Tula's
Institute, Dehradun, during the year 2024-25, in partial fulfillment of the
requirement for the award of the degree of Bachelor of Computer Application,
and that the major project has not previously formed the basis for the award
of any degree, diploma, associateship, fellowship or any other similar title.

Place: Signature of the Guide


Date: Dr. Sanjeev Solanki
(Professor)
ACKNOWLEDGMENT

We would like to express our sincere gratitude to all the individuals who
have supported and contributed to the successful completion of our BCA
project. Their guidance, assistance, and encouragement have been
invaluable throughout this journey.

First and foremost, we would like to extend our heartfelt appreciation to
our project guide, Dr. Sanjeev Solanki, for his unwavering support and
expertise. His continuous guidance, valuable insights, and constructive
feedback have been instrumental in shaping this project and enhancing its
quality.

We are also grateful to the faculty members of Tula's Institute,
Dehradun, especially our HOD, Dr. Priya Matta, and our project
coordinator, Md Mursleen, for their valuable inputs, suggestions, and
encouragement. Their vast knowledge and expertise in the field of
computer science have been instrumental in broadening our
understanding of the subject matter.

Finally, we would like to acknowledge the support and encouragement
from our family. Their unwavering belief in our abilities and their
continuous support throughout our academic journey have been a
constant source of motivation.

In conclusion, we are deeply grateful to all the individuals mentioned
above and to anyone else who has played a part, no matter how big or
small, in the completion of this BCA project. Their contributions have
been indispensable, and without their support, this project would not have
been possible.
Thank you.

Ashish Singh Rawat


Chandra Shekar
Chand Lokesh Jhalak
Aryan Gusain
Anshika Pundir

Date:
ABSTRACT

Pro Vision is a smart surveillance system developed to overcome the
limitations of conventional CCTV setups by integrating advanced
computer vision techniques and real-time data processing. The system is
designed to perform intelligent monitoring through various automated
features, including motion detection, object monitoring, face recognition,
visitor entry/exit tracking, and real-time video recording. Built using
Python and supported by libraries such as OpenCV, NumPy, and Tkinter,
Pro Vision offers a modular and user-friendly graphical interface that
allows users to access all functionalities seamlessly.

At the core of the system, technologies like the Structural Similarity
Index (SSIM) are employed to detect changes in a scene, while the LBPH
(Local Binary Pattern Histogram) algorithm enables accurate face
recognition. Motion detection is handled by analyzing frame differences
and drawing contours around detected movement, ensuring high
responsiveness to activity within the monitored area. The system stores
important events in organized folders, categorizing them into incidents
like 'in', 'out', 'stolen', and general recordings, allowing for easy review.
By combining these intelligent modules, Pro Vision not only improves
the efficiency of surveillance but also reduces the need for constant
human supervision. It is well suited for homes, offices, and
institutions seeking low-cost, automated security solutions.
TABLE OF CONTENTS

Declaration
Certificate
Acknowledgement
Abstract
Chapter 1 Introduction
1.1 Background
1.2 Objectives
1.3 Scope
Chapter 2 Literature Review
Chapter 3 System Requirement
3.1 Software Requirements
3.2 Hardware Requirements
Chapter 4 System Design
4.1 Block Diagram
4.2 Architecture
4.3 Component Description
4.4 Model
Chapter 5 Coding
5.1 Main.py
5.2 Find_noise.py
5.3 Spot_diff.py
5.4 Identify.py
5.5 In_out.py
5.6 Motion.py
5.7 Finding noises in rectangle
5.8 Date and Time
Chapter 6 Implementation
6.1 Data Flow
6.2 Modules and Functionality
6.3 User Interface
Chapter 7 Future Scope
Chapter 8 Conclusion
References
Appendix
Chapter 1
INTRODUCTION

1.1 Background
With the rapid growth of urban infrastructure and increasing security
concerns, surveillance systems have become a crucial component in
maintaining safety and monitoring environments. Traditional CCTV
systems, while useful, often require manual intervention and lack
intelligent features. The rise of Computer Vision and Machine
Learning has enabled the development of smarter, more autonomous
security systems. This project, Pro Vision, is a smart CCTV surveillance
system designed to go beyond passive video capture and introduce
real-time intelligent monitoring features such as object theft detection,
motion analysis, face recognition, and visitor counting.

1.2 Objectives
The main objectives of this project are:

 To develop a Python-based GUI application that utilizes a webcam for real-time surveillance.
 To integrate intelligent features such as:
 Object theft detection using image similarity.
 Noise/motion detection to trigger alerts.
 Face recognition using the LBPH algorithm for identifying known individuals.
 Visitor tracking to count entries and exits in a monitored room.
 To ensure cross-platform functionality (Windows/Linux/Mac).

1.3 Scope
This system is intended for use in small-scale environments like homes,
offices, or labs where real-time monitoring with intelligent alerts can
enhance security. The scope includes:

Building a desktop application using Python and OpenCV.

Implementing basic Computer Vision techniques without the need for deep
learning models.

Allowing modular expansion in future iterations, such as weapon detection
or fire alerts using deep learning models.
CHAPTER 2
LITERATURE REVIEW
In the landscape of modern security, conventional closed-circuit
television (CCTV) systems are steadily becoming obsolete. These legacy
systems primarily rely on passive monitoring and continuous human
supervision, making them limited in their responsiveness and prone to
human error. The rise in demand for intelligent, automated surveillance
systems has led to the integration of advanced technologies such as
computer vision, machine learning, and real-time image processing.
Systems like Pro Vision represent a major leap in this direction by
incorporating a wide range of intelligent monitoring features that work
autonomously and respond to events as they occur.

One of the foundational elements in Pro Vision is the Structural
Similarity Index (SSIM), a perceptual metric used to compare the visual
similarity between two images. Unlike traditional methods that rely
solely on pixel-by-pixel comparison, SSIM analyzes three crucial
components of visual perception: luminance, contrast, and structure.
This approach allows the system to detect changes in a scene that may
otherwise go unnoticed in basic difference calculations. In surveillance
contexts, this is particularly useful for identifying when an object has
been removed or tampered with. For instance, when a valuable item
disappears from the camera's field of view, the system captures two
frames — one before and one after motion is detected — and applies
SSIM to determine if a significant structural change has occurred. The
use of the Python skimage library simplifies this complex computation,
offering high-level methods to apply SSIM in real-time monitoring
applications.
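As a minimal sketch of that comparison step (assuming two already-captured BGR frames; the 0.9 threshold is an illustrative value, not one prescribed by the system):

import cv2
from skimage.metrics import structural_similarity

def scene_changed(frame_before, frame_after, threshold=0.9):
    # SSIM is computed on grayscale versions of the two frames.
    g1 = cv2.cvtColor(frame_before, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(frame_after, cv2.COLOR_BGR2GRAY)
    # full=True also returns a per-pixel similarity map, useful for
    # locating exactly where the scene changed.
    score, diff_map = structural_similarity(g1, g2, full=True)
    return score < threshold, score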

Another core function within Pro Vision is face detection and
recognition, which allows the system not just to see but to understand
who is present in the monitored area. Face detection is implemented
using Haar Cascade Classifiers, a method introduced by Viola and Jones
that revolutionized real-time object detection. Haar cascades use simple
rectangular features trained through machine learning to rapidly scan
images and detect faces, even under varying conditions. Once a face is
detected, Pro Vision employs the Local Binary Pattern Histogram (LBPH)
algorithm to recognize the individual. LBPH works by dividing a face
image into smaller regions, computing binary patterns for each region,
and generating histograms that collectively serve as a facial signature.
This signature is then compared to a pre-trained dataset to identify
known individuals. The strength of LBPH lies in its robustness against
lighting variations and facial expressions, as well as its low
computational cost, making it ideal for real-time surveillance on devices
without high-end GPUs.
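The basic LBP operator at the heart of LBPH is easy to illustrate. The sketch below is a plain NumPy illustration of the idea rather than OpenCV's exact internals: each pixel's 3x3 neighbourhood is thresholded against the centre value to produce an 8-bit code, and histograms of these codes form the facial signature.

import numpy as np

def lbp_code(gray, r, c):
    # 8-bit local binary pattern for pixel (r, c) of a 2-D grayscale array.
    center = gray[r, c]
    # the eight neighbours, visited clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if gray[r + dr, c + dc] >= center:
            code |= 1 << bit
    return code  # value in 0..255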

Motion detection is another essential feature integrated into Pro Vision,
offering the capability to monitor dynamic activity in the camera’s field
of view. The technique used here is frame differencing, one of the
simplest yet effective methods for motion analysis. By computing the
absolute pixel-wise difference between two consecutive frames, the
system can isolate areas where movement occurs. OpenCV’s contour
detection tools are then applied to highlight the regions of interest.
When motion is detected, these contours are outlined and labeled
within the video stream. This real-time feedback allows the system to
identify events such as intrusions, unauthorized movements, or
environmental changes. It also acts as a triggering mechanism for other
system responses, such as object monitoring or visitor tracking.
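Condensed to its essence (the full version appears in the motion module of Chapter 5), the frame-differencing step looks like this:

import cv2

def motion_regions(frame1, frame2, min_area=25):
    # absolute per-pixel difference between consecutive frames
    diff = cv2.absdiff(cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY))
    _, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # bounding boxes of the regions that actually moved
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) > min_area]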

Among the more innovative capabilities of Pro Vision is its directional
visitor tracking feature. This function allows the system to determine
whether a person is entering or leaving a monitored space, using spatial
and temporal analysis of motion. By defining specific left and right zones
within the video frame, the system monitors the trajectory of movement.
If motion starts on the left side and ends on the right, the system
classifies the event as an "entry," and vice versa for an "exit." This simple
yet effective technique enables automated visitor counting and activity
logging without requiring biometric scanning or manual input. Such
functionality has been increasingly discussed in recent literature on
smart buildings and occupancy analytics, as it provides valuable data for
space usage and security management.
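A toy version of that zone logic is shown below; the 200/500 pixel thresholds are the ones used in the project's in_out module, and the entry/exit mapping follows the description above (the real module additionally keeps state across frames):

def classify_direction(start_x, end_x, left_edge=200, right_edge=500):
    # Classify a movement track from its first and last x-coordinates.
    if start_x < left_edge and end_x > right_edge:
        return "entry"  # started on the left, ended on the right
    if start_x > right_edge and end_x < left_edge:
        return "exit"   # started on the right, ended on the left
    return "unknown"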

The entire Pro Vision system is built on the Python programming
language, which was selected for its versatility, ease of use, and rich
ecosystem of libraries. Python has emerged as a dominant language in
computer vision and machine learning due to frameworks such as
OpenCV, NumPy, Pillow, and Tkinter. These tools collectively handle
tasks ranging from GUI rendering and image processing to mathematical
computations and real-time visual feedback. The integration of these
libraries allows for rapid prototyping and deployment across multiple
platforms, including Windows and Linux operating systems. Moreover,
Python’s open-source nature and active community support have
significantly reduced the development cycle and simplified debugging,
making it an ideal choice for both academic projects and commercial
applications.

From a broader perspective, the design choices in Pro Vision align with
current trends in smart surveillance. Research in the field has
increasingly emphasized the need for modular, automated, and locally-
processed security systems. While cloud-based surveillance offers
scalability, it introduces latency, privacy concerns, and dependency on
internet connectivity. In contrast, systems like Pro Vision, which process
data locally and store it securely on the device, offer faster response
times and greater control over sensitive information. This approach has
been highlighted in studies advocating edge computing for surveillance,
especially in privacy-sensitive or resource-constrained environments
such as homes and small businesses.

In conclusion, the literature strongly supports the methods and
technologies implemented in Pro Vision. The use of SSIM for structural
change detection, LBPH for efficient face recognition, and motion
analysis through frame differencing reflects a thoughtful integration of
reliable and well-supported computer vision techniques. These choices
ensure that Pro Vision delivers accurate, responsive, and meaningful
surveillance while remaining lightweight and accessible. As smart
surveillance continues to evolve, systems like Pro Vision provide a
practical foundation that can be extended with additional features such
as anomaly detection, night vision, and even deep learning capabilities
— offering a scalable path from academic prototype to real-world
deployment.

CHAPTER 3
SYSTEM REQUIREMENT
3.1 SOFTWARE REQUIREMENT

The software stack for Pro Vision is built on cross-platform tools and open-source
libraries to ensure maximum accessibility and flexibility. Below are the essential
software requirements:

Operating System: Windows

Programming Language: Python 3.x

Required Python Libraries:

OpenCV – for computer vision and camera handling

NumPy – for efficient array and numerical operations

skimage – for Structural Similarity Index and image comparison

Tkinter – for building the graphical user interface (GUI)

Pillow – for image display and processing in GUI

IDE/Text Editor: VS Code
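Assuming a standard Python 3 installation (Tkinter ships with the official installers), the third-party libraries can be installed with pip. Note that opencv-contrib-python is needed rather than plain opencv-python, because the LBPH recognizer used for face recognition lives in the cv2.face contrib module:

pip install opencv-contrib-python numpy scikit-image Pillow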

3.2 HARDWARE REQUIREMENT

The following hardware specifications are recommended:

Processor: Intel Core i5 or equivalent (Dual Core or higher)

RAM: Minimum 4 GB (8 GB recommended for smooth performance)

Storage: At least 1 GB free space for recorded data and images

Camera: In-built or external webcam with basic driver support

Display: Minimum 1366x768 resolution for GUI rendering

Optional: LED/Flashlight for low-light or night-time monitoring

CHAPTER 4
SYSTEM DESIGN
The system design of Pro Vision outlines how various components—both
hardware and software—interact to create a functional smart surveillance
solution. It includes the architectural flow, module design, and
component responsibilities that drive the core functionality of the system.

4.1 Block Diagram


The block diagram of Pro Vision represents the interaction between the
user interface, processing modules, hardware, and storage components.
At the core is the webcam that captures real-time video input, which is
processed through various modules based on the user’s selection via the
GUI. The results are either displayed in the interface or stored locally as
images or video recordings.

Block Diagram (described):

Input: Webcam Feed

GUI (Tkinter): Provides access to features like Monitor, Identify,
Noise, In/Out, Record

Processing Modules:

Motion Detection

SSIM Comparison

Face Detection & Recognition

Direction Analysis (In/Out)

Video Recording

Output:

Display (real-time GUI)

Image Storage (Screenshots for events)

Video Storage (Recordings)

Fig.1 BLOCK DIAGRAM
4.2 Architecture
Pro Vision follows a layered and modular architecture to separate
concerns and enhance maintainability. The architecture consists of the
following layers:

Presentation Layer: The user interface, developed using Tkinter, serves
as the access point for users to control surveillance operations.

Processing Layer: This layer contains all computer vision logic.
Depending on the selected feature, it handles real-time image processing,
frame differencing, face recognition, and similarity checks.

Data Layer: Responsible for storing outputs such as detected frames,
visitor snapshots, and recorded videos. This ensures evidence is logged
for later analysis.

Hardware Interaction Layer: Interfaces with the webcam and handles
frame capture using OpenCV.

The separation of logic into these layers allows for easy updates and
future integration with more advanced systems such as IoT modules or
deep learning algorithms.
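Concretely, this layering maps onto the project's file layout roughly as follows (module and folder names are taken from the source code in Chapter 5; note the motion-trigger module appears under both the names find_noise.py and find_motion.py in this report):

main.py        - presentation layer: the Tkinter GUI, one button per feature
find_noise.py  - find_motion(): motion trigger for the Monitor feature
spot_diff.py   - spot_diff(): SSIM comparison of before/after frames
identify.py    - face data collection, LBPH training and recognition
in_out.py      - directional visitor tracking
motion.py      - noise(): frame-differencing motion detection
rect_noise.py  - motion detection inside a user-selected rectangle
record.py      - timestamped video recording
haarcascade_frontalface_default.xml
persons/       - captured face samples used for training
stolen/        - frames saved by the Monitor feature
visitors/in/ and visitors/out/ - entry and exit snapshots
recordings/    - AVI recordings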

4.3 Component Description


Webcam: Captures continuous frames to be analyzed by the system.

Tkinter GUI: Allows users to initiate and interact with features using
buttons and icons.

Motion Detection Module: Detects movement using frame differencing
and contour analysis.

Monitor Module: Uses SSIM to compare frames and identify missing or
altered objects.

Face Recognition Module: Combines Haar cascade classifiers and the
LBPH algorithm to detect and recognize faces.

In/Out Module: Tracks direction of movement to identify room entries
and exits.

Recording Module: Records live video feed with timestamps and stores
it in AVI format.

Storage: Captures and stores event-specific images and full recordings
for evidence or review.

4.4 Model
The development of Pro Vision followed the Waterfall Model. This
model was chosen due to its simplicity and structured approach, making it
suitable for a compact, self-contained project like this. The phases
followed are:

Requirement Analysis: Deciding on the features and behavior of the
system.

Design: Creating architecture and deciding on algorithms (SSIM, LBPH,
etc.).

Implementation: Writing the code in Python using OpenCV, Tkinter,
and other libraries.

Testing: Verifying each module (monitoring, motion, recognition) across
multiple OS platforms.

Deployment: Running the complete GUI application and storing results.

Maintenance: Enhancing functionality and fixing minor bugs during
testing.

The Waterfall Model provided a clear development path and ensured that
every part of the system—from face recognition to recording—was tested
and working before moving to the next phase.

The Waterfall Model is a traditional software development methodology
that follows a linear and sequential approach. It divides the project life
cycle into distinct phases, where each phase must be completed before the
next one begins. For the Pro Vision smart surveillance system, the
Waterfall Model was selected due to its simplicity and suitability for
small to medium-sized projects where the requirements are well-
understood from the beginning.

Fig.2 WATERFALL MODEL

CHAPTER 5
CODING
The Pro Vision system was implemented in Python due to its simplicity,
portability, and extensive library support for computer vision and GUI
development. The system follows a modular programming approach,
where each feature is implemented as a separate Python module and
integrated into a unified GUI.

The main file (main.py) acts as the central controller, invoking different
functionalities like monitoring, noise detection, in/out tracking, face
identification, and recording based on user interaction through the GUI.
Below is an overview of each key module in the project.

5.1 Main.py

import tkinter as tk
import tkinter.font as font
from in_out import in_out
from motion import noise
from rect_noise import rect_noise
from record import record
from PIL import Image, ImageTk
from find_motion import find_motion
from identify import maincall

window = tk.Tk()
window.title("Smart cctv")
# The icon/image file names below are placeholders: the originals were lost during extraction.
window.iconphoto(False, tk.PhotoImage(file='icons/icon.png'))
window.geometry('1080x700')
frame1 = tk.Frame(window)

label_title = tk.Label(frame1, text="Smart cctv Camera")
label_font = font.Font(size=35, weight='bold', family='Helvetica')
label_title['font'] = label_font
label_title.grid(pady=(10, 10), column=2)

# Image.ANTIALIAS is named Image.LANCZOS in newer Pillow releases.
icon = Image.open('icons/icon.png')
icon = icon.resize((150, 150), Image.ANTIALIAS)
icon = ImageTk.PhotoImage(icon)
label_icon = tk.Label(frame1, image=icon)
label_icon.grid(row=1, pady=(5, 10), column=2)

btn1_image = Image.open('icons/btn1.png')
btn1_image = btn1_image.resize((50, 50), Image.ANTIALIAS)
btn1_image = ImageTk.PhotoImage(btn1_image)

btn2_image = Image.open('icons/btn2.png')
btn2_image = btn2_image.resize((50, 50), Image.ANTIALIAS)
btn2_image = ImageTk.PhotoImage(btn2_image)

btn5_image = Image.open('icons/btn5.png')
btn5_image = btn5_image.resize((50, 50), Image.ANTIALIAS)
btn5_image = ImageTk.PhotoImage(btn5_image)

btn3_image = Image.open('icons/btn3.png')
btn3_image = btn3_image.resize((50, 50), Image.ANTIALIAS)
btn3_image = ImageTk.PhotoImage(btn3_image)

btn6_image = Image.open('icons/btn6.png')
btn6_image = btn6_image.resize((50, 50), Image.ANTIALIAS)
btn6_image = ImageTk.PhotoImage(btn6_image)

btn4_image = Image.open('icons/btn4.png')
btn4_image = btn4_image.resize((50, 50), Image.ANTIALIAS)
btn4_image = ImageTk.PhotoImage(btn4_image)

btn7_image = Image.open('icons/btn7.png')
btn7_image = btn7_image.resize((50, 50), Image.ANTIALIAS)
btn7_image = ImageTk.PhotoImage(btn7_image)

# --------------- Buttons -------------------#

btn_font = font.Font(size=25)
btn1 = tk.Button(frame1, text='Monitor', height=90, width=180,
                 fg='green', command=find_motion, image=btn1_image, compound='left')
btn1['font'] = btn_font
btn1.grid(row=3, pady=(20, 10))

btn2 = tk.Button(frame1, text='Rectangle', height=90, width=180, fg='orange',
                 command=rect_noise, compound='left', image=btn2_image)
btn2['font'] = btn_font
btn2.grid(row=3, pady=(20, 10), column=3, padx=(20, 5))

btn3 = tk.Button(frame1, text='Noise', height=90, width=180, fg='green',
                 command=noise, image=btn3_image, compound='left')
btn3['font'] = btn_font
btn3.grid(row=5, pady=(20, 10))

btn4 = tk.Button(frame1, text='Record', height=90, width=180, fg='orange',
                 command=record, image=btn4_image, compound='left')
btn4['font'] = btn_font
btn4.grid(row=5, pady=(20, 10), column=3)

btn6 = tk.Button(frame1, text='In Out', height=90, width=180, fg='green',
                 command=in_out, image=btn6_image, compound='left')
btn6['font'] = btn_font
btn6.grid(row=5, pady=(20, 10), column=2)

btn5 = tk.Button(frame1, height=90, width=180, fg='red', command=window.quit,
                 image=btn5_image)
btn5['font'] = btn_font
btn5.grid(row=6, pady=(20, 10), column=2)

btn7 = tk.Button(frame1, text="identify", fg="orange", command=maincall,
                 compound='left', image=btn7_image, height=90, width=180)
btn7['font'] = btn_font
btn7.grid(row=3, column=2, pady=(20, 10))

frame1.pack()
window.mainloop()

5.2 find_noise.py

import cv2
from spot_diff import spot_diff
import time
import numpy as np

def find_motion():
    motion_detected = False
    is_start_done = False
    cap = cv2.VideoCapture(0)
    check = []
    print("waiting for 2 seconds")
    time.sleep(2)
    frame1 = cap.read()  # reference frame, kept as the full (ret, frame) tuple for spot_diff
    _, frm1 = cap.read()
    frm1 = cv2.cvtColor(frm1, cv2.COLOR_BGR2GRAY)

    while True:
        _, frm2 = cap.read()
        frm2 = cv2.cvtColor(frm2, cv2.COLOR_BGR2GRAY)

        # difference of consecutive frames isolates moving regions
        diff = cv2.absdiff(frm1, frm2)
        _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        contors = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)[0]
        contors = [c for c in contors if cv2.contourArea(c) > 25]

        if len(contors) > 5:
            cv2.putText(thresh, "motion detected", (50, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, 255)
            motion_detected = True
            is_start_done = False
        elif motion_detected and len(contors) < 3:
            if not is_start_done:
                start = time.time()
                is_start_done = True
            end = time.time()
            print(end - start)
            # once the scene has been quiet for 4 seconds, compare the
            # before/after frames with SSIM to see what changed
            if (end - start) > 4:
                frame2 = cap.read()
                cap.release()
                cv2.destroyAllWindows()
                x = spot_diff(frame1, frame2)
                if x == 0:
                    print("running again")
                    return
                else:
                    print("found motion, sending mail")
                    return
        else:
            cv2.putText(thresh, "no motion detected", (50, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, 255)

        cv2.imshow("winname", thresh)
        _, frm1 = cap.read()
        frm1 = cv2.cvtColor(frm1, cv2.COLOR_BGR2GRAY)

        if cv2.waitKey(1) == 27:
            break
    return
5.3 SPOT_DIFF.PY

import cv2
import time
from skimage.metrics import structural_similarity
from datetime import datetime

def spot_diff(frame1, frame2):
    frame1 = frame1[1]  # unpack the (ret, frame) tuples from cap.read()
    frame2 = frame2[1]

    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    g1 = cv2.blur(g1, (2, 2))
    g2 = cv2.blur(g2, (2, 2))

    (score, diff) = structural_similarity(g2, g1, full=True)
    print("Image similarity:", score)
    diff = (diff * 255).astype("uint8")

    thresh = cv2.threshold(diff, 100, 255, cv2.THRESH_BINARY_INV)[1]
    contors = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)[0]
    contors = [c for c in contors if cv2.contourArea(c) > 50]

    if len(contors):
        for c in contors:
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame1, (x, y), (x + w, y + h), (0, 255, 0), 2)
    else:
        print("nothing stolen")
        return 0

    cv2.imshow("diff", thresh)
    cv2.imshow("win1", frame1)
    # timestamped file name; ':' and '%-' strftime codes are avoided for Windows compatibility
    cv2.imwrite("stolen/" + datetime.now().strftime('%y-%m-%d-%H-%M-%S') + ".jpg", frame1)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    return 1
5.4 Identify.py

import cv2
import os
import numpy as np
import tkinter as tk
import tkinter.font as font

def collect_data():
    name = input("Enter name of person : ")
    count = 1
    ids = input("Enter ID: ")
    cap = cv2.VideoCapture(0)
    filename = "haarcascade_frontalface_default.xml"
    cascade = cv2.CascadeClassifier(filename)

    while True:
        _, frm = cap.read()
        gray = cv2.cvtColor(frm, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.4, 1)

        for x, y, w, h in faces:
            cv2.rectangle(frm, (x, y), (x + w, y + h), (0, 255, 0), 2)
            roi = gray[y:y + h, x:x + w]
            # save each face crop as name-count-id.jpg for later training
            cv2.imwrite(f"persons/{name}-{count}-{ids}.jpg", roi)
            count = count + 1
            cv2.putText(frm, f"{count}", (20, 20), cv2.FONT_HERSHEY_PLAIN,
                        2, (0, 255, 0), 3)
            cv2.imshow("new", roi)

        cv2.imshow("identify", frm)

        if cv2.waitKey(1) == 27 or count > 200:
            cv2.destroyAllWindows()
            cap.release()
            train()
            break

def train():
    print("training part initiated !")
    recog = cv2.face.LBPHFaceRecognizer_create()  # needs opencv-contrib-python
    dataset = 'persons'
    paths = [os.path.join(dataset, im) for im in os.listdir(dataset)]
    faces = []
    ids = []
    labels = []
    for path in paths:
        # file names follow the name-count-id.jpg convention used above
        fname = os.path.basename(path)
        labels.append(fname.split('-')[0])
        ids.append(int(fname.split('-')[2].split('.')[0]))
        faces.append(cv2.imread(path, 0))
    recog.train(faces, np.array(ids))
    recog.save('model.yml')  # model file name was elided in the original; 'model.yml' assumed
    return

def identify():
    cap = cv2.VideoCapture(0)
    filename = "haarcascade_frontalface_default.xml"

    paths = [os.path.join("persons", im) for im in os.listdir("persons")]
    labelslist = []
    for path in paths:
        if os.path.basename(path).split('-')[0] not in labelslist:
            labelslist.append(os.path.basename(path).split('-')[0])
    print(labelslist)

    recog = cv2.face.LBPHFaceRecognizer_create()
    recog.read('model.yml')
    cascade = cv2.CascadeClassifier(filename)

    while True:
        _, frm = cap.read()
        gray = cv2.cvtColor(frm, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.4, 1)

        for x, y, w, h in faces:
            cv2.rectangle(frm, (x, y), (x + w, y + h), (0, 255, 0), 2)
            roi = gray[y:y + h, x:x + w]
            label = recog.predict(roi)  # returns (id, confidence)
            cv2.putText(frm, f"{labelslist[label[0]]}", (x, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 3)

        cv2.imshow("identify", frm)

        if cv2.waitKey(1) == 27:
            cv2.destroyAllWindows()
            cap.release()
            break

def maincall():
    root = tk.Tk()
    root.geometry("480x100")
    root.title("identify")

    label = tk.Label(root, text="Select below buttons ")
    label.grid(row=0, columnspan=2)
    label_font = font.Font(size=35, weight='bold', family='Helvetica')
    label['font'] = label_font
    btn_font = font.Font(size=25)

    button1 = tk.Button(root, text="Add Member ", command=collect_data,
                        height=2, width=20)
    button1.grid(row=1, column=0, pady=(10, 10), padx=(5, 5))
    button1['font'] = btn_font

    button2 = tk.Button(root, text="Start with known ", command=identify,
                        height=2, width=20)
    button2.grid(row=1, column=1, pady=(10, 10), padx=(5, 5))
    button2['font'] = btn_font

    root.mainloop()
    return

5.5 IN_OUT.PY

import cv2
from datetime import datetime

def in_out():
    cap = cv2.VideoCapture(0)

    right, left = "", ""

    while True:
        _, frame1 = cap.read()
        frame1 = cv2.flip(frame1, 1)
        _, frame2 = cap.read()
        frame2 = cv2.flip(frame2, 1)

        diff = cv2.absdiff(frame2, frame1)
        diff = cv2.blur(diff, (5, 5))
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        _, threshd = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY)
        contr, _ = cv2.findContours(threshd, cv2.RETR_TREE,
                                    cv2.CHAIN_APPROX_SIMPLE)

        x = 300
        if len(contr) > 0:
            max_cnt = max(contr, key=cv2.contourArea)
            x, y, w, h = cv2.boundingRect(max_cnt)
            cv2.rectangle(frame1, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame1, "MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 2)

        # remember on which side the motion started; when it crosses to the
        # other side, classify the event and save a snapshot
        if right == "" and left == "":
            if x > 500:
                right = True
            elif x < 200:
                left = True
        elif right:
            if x < 200:
                print("to left")
                x = 300
                right, left = "", ""
                cv2.imwrite(f"visitors/in/{datetime.now().strftime('%y-%m-%d-%H-%M-%S')}.jpg", frame1)
        elif left:
            if x > 500:
                print("to right")
                x = 300
                right, left = "", ""
                cv2.imwrite(f"visitors/out/{datetime.now().strftime('%y-%m-%d-%H-%M-%S')}.jpg", frame1)

        cv2.imshow("", frame1)

        k = cv2.waitKey(1)
        if k == 27:
            cap.release()
            cv2.destroyAllWindows()
            break
5.6 Motion.py

import cv2

def noise():
    cap = cv2.VideoCapture(0)

    while True:
        _, frame1 = cap.read()
        _, frame2 = cap.read()

        # difference of two consecutive frames isolates moving regions
        diff = cv2.absdiff(frame2, frame1)
        diff = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        diff = cv2.blur(diff, (5, 5))

        _, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        contr, _ = cv2.findContours(thresh, cv2.RETR_TREE,
                                    cv2.CHAIN_APPROX_SIMPLE)

        if len(contr) > 0:
            max_cnt = max(contr, key=cv2.contourArea)
            x, y, w, h = cv2.boundingRect(max_cnt)
            cv2.rectangle(frame1, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame1, "MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 2)
        else:
            cv2.putText(frame1, "NO-MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255), 2)

        cv2.imshow("esc. to exit", frame1)

        if cv2.waitKey(1) == 27:
            cap.release()
            cv2.destroyAllWindows()
            break
5.7 FINDING NOISES IN RECTANGLE

import cv2

donel = False
doner = False
x1, y1, x2, y2 = 0, 0, 0, 0

def select(event, x, y, flag, param):
    global x1, x2, y1, y2, donel, doner
    # left click sets the top-left corner, right click the bottom-right
    if event == cv2.EVENT_LBUTTONDOWN:
        x1, y1 = x, y
        donel = True
    elif event == cv2.EVENT_RBUTTONDOWN:
        x2, y2 = x, y
        doner = True
    print(doner, donel)

def rect_noise():
    global x1, x2, y1, y2, donel, doner
    cap = cv2.VideoCapture(0)

    cv2.namedWindow("select_region")
    cv2.setMouseCallback("select_region", select)

    # first loop: show the live feed until the region has been selected
    while True:
        _, frame = cap.read()
        cv2.imshow("select_region", frame)

        if cv2.waitKey(1) == 27 or doner == True:
            cv2.destroyAllWindows()
            print("gone--")
            break

    # second loop: watch only the selected rectangle for motion
    while True:
        _, frame1 = cap.read()
        _, frame2 = cap.read()

        frame1only = frame1[y1:y2, x1:x2]
        frame2only = frame2[y1:y2, x1:x2]

        diff = cv2.absdiff(frame2only, frame1only)
        diff = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        diff = cv2.blur(diff, (5, 5))

        _, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        contr, _ = cv2.findContours(thresh, cv2.RETR_TREE,
                                    cv2.CHAIN_APPROX_SIMPLE)

        if len(contr) > 0:
            max_cnt = max(contr, key=cv2.contourArea)
            x, y, w, h = cv2.boundingRect(max_cnt)
            # offset the box back into full-frame coordinates
            cv2.rectangle(frame1, (x + x1, y + y1), (x + w + x1, y + h + y1),
                          (0, 255, 0), 2)
            cv2.putText(frame1, "MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 2)
        else:
            cv2.putText(frame1, "NO-MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255), 2)

        cv2.rectangle(frame1, (x1, y1), (x2, y2), (0, 0, 255), 1)
        cv2.imshow("esc. to exit", frame1)

        if cv2.waitKey(1) == 27:
            cap.release()
            cv2.destroyAllWindows()
            break
5.8 DATE AND TIME

import cv2
from datetime import datetime

def record():
    cap = cv2.VideoCapture(0)

    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(f'recordings/{datetime.now().strftime("%H-%M-%S")}.avi',
                          fourcc, 20.0, (640, 480))

    while True:
        _, frame = cap.read()

        # stamp the current date and time onto each frame
        cv2.putText(frame, datetime.now().strftime("%d-%m-%y %H:%M:%S"),
                    (50, 50), cv2.FONT_HERSHEY_COMPLEX, 0.6, (255, 255, 255), 2)

        out.write(frame)
        cv2.imshow("esc. to stop", frame)

        if cv2.waitKey(1) == 27:
            out.release()
            cap.release()
            cv2.destroyAllWindows()
            break
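One practical note on the modules above: cv2.imwrite and cv2.VideoWriter do not create directories, so the output folders used throughout (persons, stolen, visitors/in, visitors/out, recordings) must already exist. A small helper such as the following, our addition rather than part of the original scripts, can be called once at startup:

import os

def ensure_output_dirs():
    # cv2.imwrite simply returns False when the target folder is missing,
    # so create every output folder up front.
    for d in ("persons", "stolen", "visitors/in", "visitors/out", "recordings"):
        os.makedirs(d, exist_ok=True)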

Chapter-6
IMPLEMENTATION
The implementation of Pro Vision was carried out using Python,
leveraging libraries such as OpenCV for image processing, Tkinter for
GUI development, NumPy for numerical operations, and skimage for
structural similarity analysis. The system was designed in a modular
format to ensure scalability and ease of maintenance. Each core
functionality—monitoring, face recognition, motion detection, in/out
tracking, and video recording—was implemented as an independent
Python script and later integrated through a central graphical interface.
The GUI was designed to be user-friendly, featuring labeled buttons and
icons that allow users to trigger specific modules with a single click.
Features such as face detection were implemented using Haar cascade
classifiers, while face recognition was performed using the LBPH (Local
Binary Pattern Histogram) algorithm. Object monitoring was achieved by
comparing frames using the Structural Similarity Index (SSIM), and
direction-based tracking was coded by analyzing motion vectors across
frame boundaries. Once the backend logic was in place, the application
was tested across different platforms, including Windows and Linux, to
ensure platform independence and real-time performance.

6.1 Data Flow


In Pro Vision, data flow begins with the webcam feed, which
continuously captures real-time video frames. These frames are passed
into the system via the Graphical User Interface (GUI), where the user
selects the desired function such as Monitor, Noise Detection, Face
Identification, In/Out Tracking, or Video Recording.

Step 1: Data Capture


The webcam captures live video and sends individual frames to the
system in real-time using OpenCV.

Step 2: User Input via GUI


The user interacts with the Tkinter-based GUI, choosing one of the
surveillance features. This input controls which module processes the
video data.

Step 3: Data Processing in Modules
Based on the selected function:

Monitor Mode compares frames using SSIM to detect object removal.

Noise Detection uses frame differencing and contour detection for
motion.

Face Identification applies Haar cascades for detection and LBPH for
recognition.

In/Out Tracking analyzes the direction of movement based on object
position changes.

Recording Mode continuously writes video frames to a file with
timestamp overlays.

Step 4: Output Generation


After processing, the system either:

 Displays results in the GUI (e.g., face name, motion alert),
 Stores images (e.g., detected faces, motion snapshots), or
 Saves full video recordings to the system’s storage.

This structured flow ensures that data moves logically and efficiently
through each stage—from input acquisition to intelligent processing and
evidence storage—maintaining system responsiveness and reliability.

6.2 Modules And Functionality

For this project we have used various recent technologies, which are
explained in this chapter along with the reasons they were chosen.
We will divide this explanation of the technology based on the
modules/features in the project.
But first, let us look at the language used in this project.

Language Used:
We have used the Python language, as it is modern and comes with many
capabilities: we can do Machine Learning and Computer Vision, and also
build GUI applications with ease.

Reasons for selecting this language:

1 – Short and concise language.
2 – Easy to learn and use.
3 – Good technical support over the Internet.
4 – Many packages for different tasks.
5 – Runs on any platform.
6 – Modern and OOP language.
These are just the main points from our side; Python is a lot more
than this.

Some more notes about Python:
Python is a widely used general-purpose, high-level programming
language. It was created by Guido van Rossum in 1991 and further
developed by the Python Software Foundation. It was designed with an
emphasis on code readability, and its syntax allows programmers to
express their concepts in fewer lines of code. Python is a programming
language that lets you work quickly and integrate systems more
effectively. There are two major Python versions: Python 2 and Python 3.
Some specific features of Python are as follows:

 An interpreted (as opposed to compiled) language. Contrary to e.g.
C or Fortran, one does not compile Python code before executing it.
In addition, Python can be used interactively: many Python
interpreters are available, from which commands and scripts can be
executed.
 A free software released under an open-source license: Python can
be used and distributed free of charge, even for building
commercial software.
 Multi-platform: Python is available for all major operating
systems, Windows, Linux/Unix, MacOS X, most likely your
mobile phone OS, etc.
 A very readable language with clear non-verbose syntax
 A language for which a large variety of high-quality packages are
available for various applications, from web frameworks to
scientific computing.
 A language very easy to interface with other languages, in
particular C and C++.
 Some other features of the language are illustrated just below. For
example, Python is an object-oriented language, with dynamic
typing (the same variable can contain objects of different types
during the course of a program).
Below are the different features which can be performed using this
project:
1. Monitor
2. Identify the family member
3. Detect for Noises
4. Visitors in room detection

So let us see each feature one by one.

Monitor Feature:
This feature is used to find which object has been stolen from
the frame visible to the webcam. It constantly monitors the
frames and checks which object in the frame has been taken
away by a thief.

This uses Structural Similarity to find the differences between two frames.
The two frames are captured: the first before the noise (motion) happened,
and the second after the noise stopped happening in the frame.

SSIM is used as a metric to measure the similarity between two given
images. As this technique has been around since 2004, a lot of material
exists explaining the theory behind SSIM, but very few resources go deep
into the details, especially for a gradient-based implementation, as SSIM
is often used as a loss function.

The Structural Similarity Index (SSIM) metric extracts 3 key features
from an image:
 Luminance
 Contrast
 Structure

The comparison between the two images is performed on the basis of
these 3 features.
This system calculates the Structural Similarity Index between 2 given
images which is a value between -1 and +1. A value of +1 indicates that
the 2 given images are very similar or the same while a value of -1
indicates the 2 given images are very different. Often these values are
adjusted to be in the range [0, 1], where the extremes hold the same
meaning.

Luminance: Luminance is measured by averaging over all the pixel
values. It is denoted by μ (mu).

Contrast: Contrast is measured by taking the standard deviation of the
pixel values. It is denoted by σ (sigma).

Structure: The structural comparison is done by using a consolidated
formula, but in essence we divide the input signal by its standard
deviation, so that the result has unit standard deviation, which allows
for a more robust comparison.
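Following the standard definitions from Wang et al. (2004) (Reference 1), for an image window $x$ with $N$ pixels:

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)^2\right)^{1/2}$$

and the combined index for two windows $x$ and $y$ is

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\sigma_{xy}$ is the covariance of the two windows and $C_1$, $C_2$ are small constants that stabilize the division when the denominators approach zero.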

Luckily, thanks to the skimage package in Python, we do not have to
replicate all of this mathematical calculation ourselves, since skimage
has a pre-built function that does all of these tasks for us.

We just have to feed in the two images/frames captured earlier, and it
gives us back the masked difference image along with the score.

Identify the Family Member Feature

This is a very useful feature of our project. It is used to find out
whether the person in the frame is known or not. It does this in two steps:

1 – Find the faces in the frames.
2 – Use the LBPH face recognizer algorithm to predict the person from an
already trained model.

So let us divide this into the following categories.

1 – Detecting faces in the frames

This is done via Haar cascade classifiers, which are built into
the openCV module of Python.
A cascade classifier, namely a cascade of boosted classifiers working
with Haar-like features, is a special case of ensemble learning called
boosting. It typically relies on AdaBoost classifiers (and other models
such as Real AdaBoost, Gentle AdaBoost or LogitBoost).

Cascade classifiers are trained on a few hundred sample images that
contain the object we want to detect, and on other images that do not
contain it.

There are some common features that we find on most human faces:
 a dark eye region compared to the upper cheeks
 a bright nose-bridge region compared to the eyes
 specific locations of the eyes, mouth and nose

These characteristics are called Haar features. Haar features are similar
to convolution kernels, which are used to detect the presence of a given
feature in the image. For doing all of this, the openCV module in Python
has an in-built class called CascadeClassifier, which we have used to
detect faces in the frame.
2 – Using LBPH for face recognition

Now that we have detected faces in the frame, it is time to identify
them and check whether they are in the dataset we used to train our
LBPH model.

The LBPH uses 4 parameters:


 Radius: the radius is used to build the circular local binary pattern
and represents the radius around the central pixel. It is usually set
to 1.
 Neighbors: the number of sample points to build the circular local
binary pattern. Keep in mind: the more sample points you include,
the higher the computational cost. It is usually set to 8.
 Grid X: the number of cells in the horizontal direction. The more
cells, the finer the grid, the higher the dimensionality of the
resulting feature vector. It is usually set to 8.
 Grid Y: the number of cells in the vertical direction. The more
cells, the finer the grid, the higher the dimensionality of the
resulting feature vector. It is usually set to 8.
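In OpenCV's contrib module these four parameters are passed directly when the recognizer is created; a minimal sketch using the usual default values described above:

import cv2

# requires the opencv-contrib-python package
recog = cv2.face.LBPHFaceRecognizer_create(
    radius=1,     # radius of the circular local binary pattern
    neighbors=8,  # sample points on the circle
    grid_x=8,     # horizontal cells in the histogram grid
    grid_y=8,     # vertical cells in the histogram grid
)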

The first computational step of LBPH is to create an intermediate
image that describes the original image in a better way, by highlighting
the facial characteristics. To do so, the algorithm uses the concept of a
sliding window, based on the parameters radius and neighbors.

Extracting the Histograms: Now, using the image generated in the last
step, we can use the Grid X and Grid Y parameters to divide the image
into multiple grids. A histogram is extracted from each grid cell, and the
concatenated histograms form the face's signature.

After all of this the model is trained; later, when we want to make
predictions, the same steps are applied to the new image and its
histograms are compared with the already trained model. This is how the
feature works.
3 – Detect for Noises in the frame

This feature is used to find the noises (motion) in the frames. This
is something you would find in most CCTVs, but in this module we will
see how it works.

Putting it simply, all the frames are continuously analyzed and checked
for noise. Noise is checked across consecutive frames: we take the
absolute difference between the two frames, analyze the difference image,
and detect contours (the boundaries of the motion). If there are no
boundaries there is no motion, and if there are any, there is motion.
Every pixel carries a brightness value, and the differencing is performed
pixel by pixel.

We simply take the absolute difference, because a negative difference
would make no sense at all.
4 – Visitors in room detection

This is the feature which can detect whether someone has entered the
room or gone out.

It works using the following steps:

1 – It first detects noise (motion) in the frame.
2 – If any motion happens, it finds from which side it happened,
either left or right.
3 – Last, it checks: if motion started from the left and ended on the
right, it is detected as an entry and the frame is captured, and vice
versa for an exit.

So there is no complex mathematics going on in this specific feature.

Basically, to know from which side the motion happened, we first detect
the motion, then draw a rectangle over the noise, and the last step is to
check the coordinates: if those points lie on the left side, it is
classified as left motion.
6.3 USER INTERFACE

# Feature 1 – Monitor
Shown here is the use of the first feature; you can consider it the
output from feature 1. As you can see, it detects that the speaker has
been stolen, which is true.

# Feature 2 – Noise Detection
This is the captured output for NO-MOTION and MOTION being detected by
the application.

# Feature 3 – In/Out Detection
It has detected me entering the room, classified the event as entered,
and saved the image locally.

# Feature 4 – Face Identification
Since I have trained my model for sidhu, it predicts sidhu correctly.
It does not always predict correctly; sometimes it makes bad predictions
as well.
CHAPTER 7

Future Scope

As technology continues to advance, surveillance systems are evolving
from basic video capture tools into sophisticated, intelligent security
solutions. Pro Vision is built on a modular, scalable architecture that
positions it well for future enhancement and expansion. While the
current implementation provides efficient motion detection, face
recognition, and monitoring features, there is significant potential to
further develop the system to meet the rising demands of modern
security challenges.

One of the most promising directions for future improvement is the
development of portable CCTV systems. With the increasing availability
of compact single-board computers like Raspberry Pi and Jetson Nano, it
is now feasible to run lightweight versions of Pro Vision on small,
battery-powered devices. These portable units can be deployed in
remote or temporary locations where traditional CCTV infrastructure is
impractical. Combined with wireless connectivity and local data
processing, these devices could provide on-the-go security without
reliance on constant internet or power supply.

Another important area for enhancement is the integration of night
vision capability. Most basic webcams and consumer cameras struggle
in low-light environments, limiting their effectiveness during nighttime.
Future versions of Pro Vision could incorporate infrared (IR) sensors or
thermal imaging cameras to enable 24/7 surveillance regardless of
lighting conditions. In addition, implementing low-light image
enhancement algorithms can significantly improve visibility and accuracy
during poor illumination, allowing the system to detect objects and faces
even in complete darkness.

As computational power continues to improve, Pro Vision can be
enhanced by incorporating deep learning (DL) models to unlock more
advanced features. Currently, the system relies on classical computer
vision methods that are efficient and fast, but limited in complexity.
Deep learning models, particularly Convolutional Neural Networks
(CNNs), can provide significantly better accuracy in tasks like object
detection, face recognition, and anomaly identification. With DL
integration, Pro Vision could support features such as:

 Deadly weapon detection, which would identify firearms or knives in
real-time video streams.
 Accident detection, useful in public spaces or transportation hubs to
detect collisions, falls, or injuries.
 Fire and smoke detection, enabling the system to act as a basic fire
alarm when flames or smoke are visually detected.
 Suspicious behavior analysis, where DL models track posture,
movement speed, or activity duration to flag unusual patterns.

These features would transform Pro Vision into a more intelligent and
context-aware system, capable of not just recording events, but
understanding and responding to them in real-time.

Furthermore, efforts can be made to convert the system into a fully
standalone application, eliminating the need for Python installation or
manual setup. By packaging the project using tools like PyInstaller,
Electron-Python, or cross-platform frameworks, the entire application
can be distributed as an executable file (.exe for Windows, .app for
macOS), ready to run with a single click. This would greatly enhance
usability for non-technical users and make the system more appealing
for deployment in offices, schools, and homes.
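For example, a one-file build with PyInstaller could look like the following (assuming main.py is the entry point; data files such as the icons and the Haar cascade XML must be bundled with --add-data or shipped alongside the executable):

pyinstaller --onefile --windowed main.py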

On a more advanced level, the project could evolve into a standalone
surveillance device. This would involve creating custom hardware with
an embedded operating system, integrated camera module, and pre-
installed software. With the addition of a small touchscreen or wireless
interface, users could operate Pro Vision without needing an external
computer. These compact surveillance boxes could serve as plug-and-
play security units, configurable via mobile or desktop apps, and
networked together for large-scale deployments.

The growing field of Internet of Things (IoT) opens further possibilities
for remote monitoring and control. A future version of Pro Vision could
be linked to a cloud dashboard, enabling users to monitor multiple
locations through a centralized interface. Real-time alerts could be
pushed to smartphones through apps or SMS when anomalies like
intrusions or fire are detected. Although the current version focuses on
local processing, future iterations could offer cloud sync as an optional
add-on for those who require remote access and backup.

In addition to technological enhancements, attention should be paid to
privacy, security, and legal compliance. As surveillance systems grow
more intelligent, ensuring that they operate ethically and protect user
data becomes critical. Future updates could include encrypted storage,
access control via authentication, and user activity logging. Moreover,
the system could include customizable privacy zones, where specific
areas of the camera feed are ignored or blurred to respect privacy
concerns in shared spaces.

Finally, Pro Vision holds strong potential as an educational tool. With
modular code, rich visual feedback, and real-time interaction, it can be
used to teach students and developers about computer vision, GUI
development, and real-world AI applications. Adding documentation,
training datasets, and even a graphical setup wizard could help make Pro
Vision a part of classroom labs or developer toolkits.

In summary, Pro Vision is not just a project: it is a platform. With future
upgrades such as deep learning integration, portable device deployment,
night vision support, and standalone application packaging, it can grow
into a powerful, commercial-grade smart surveillance solution. As
technology advances, the opportunities for expanding the system’s
capabilities will only increase, making Pro Vision a forward-looking
investment in security, innovation, and accessibility.

Chapter 8
Conclusion

The Pro Vision surveillance system was conceptualized and developed as
a response to the increasing need for smarter, more autonomous security
systems that reduce the burden of human monitoring while providing
real-time insights. As traditional CCTV systems often lack the ability to
detect, respond to, or analyze ongoing activity, Pro Vision fills a crucial
gap by integrating multiple intelligent features within a lightweight, user-
friendly application. The project stands as a successful demonstration of
how modern computer vision techniques can transform conventional
video surveillance into a proactive security solution.

Throughout the development process, a strong focus was placed on
modularity, ease of use, and real-time performance. Built using Python
and key libraries such as OpenCV, NumPy, Tkinter, and scikit-image, the
system showcases the power of open-source technologies to create
complex applications without the need for expensive proprietary tools.
Each core feature — motion detection, face recognition, object
monitoring, directional tracking, and video recording — was designed as
an independent module and integrated through a central graphical
interface. This structure not only enhanced maintainability and flexibility
but also ensured that each function could be developed and tested in
isolation, reducing the risk of bugs or performance issues in the overall
application.

One of the most valuable aspects of Pro Vision is its ability to detect and
react to environmental changes. The use of the Structural Similarity Index
(SSIM) allows the system to recognize subtle differences in image
content, making it possible to detect theft or tampering by comparing
frames captured before and after movement is detected. Meanwhile, the
motion detection algorithm based on frame differencing and contour
analysis enables the system to alert users to dynamic activity in its field
of view. These capabilities allow the system to act not only as a recorder
but as a smart observer capable of identifying potential threats in real
time.

Another key achievement of the system is the integration of face
detection and recognition using Haar cascade classifiers and the LBPH
(Local Binary Pattern Histogram) algorithm. This feature enables the
application to identify known individuals and log entry events accurately.

The inclusion of directional tracking further expands its usefulness,
particularly in access-controlled areas or rooms where occupancy
tracking is important. Together, these features transform the system from
a passive video recorder into an interactive security assistant that logs,
identifies, and organizes surveillance data for easy review and analysis.

From a development perspective, Pro Vision was also a valuable learning
experience in software design, modular coding, GUI development, and
applied artificial intelligence. It presented opportunities to explore best
practices in user interface design, cross-platform compatibility, code
optimization, and real-time video processing. By building and testing the
application on both Windows and Linux operating systems, the team
ensured the system would be accessible to a wide range of users and
hardware configurations. The application’s performance on basic
hardware platforms shows that effective surveillance need not require
high-end infrastructure — a modest computer and webcam are sufficient
to run the complete system efficiently.

Moreover, Pro Vision holds significant promise for future extension. Its
architecture was intentionally designed to support the integration of more
advanced features as technology evolves. Potential enhancements include
deep learning–based object detection, real-time behavioral analysis,
integration with IoT devices for smart home automation, and even mobile
or cloud connectivity for remote monitoring. These additions could
elevate Pro Vision from a basic surveillance assistant to a comprehensive
security platform suitable for deployment in offices, campuses, homes, or
public environments. The modular codebase and clean UI further support
the idea of Pro Vision as an educational tool, allowing students and
developers to learn from and build upon the existing system.

The successful implementation of Pro Vision demonstrates that practical,
intelligent security solutions can be achieved using accessible and well-
documented tools. It also reinforces the idea that innovation does not
always require expensive hardware or complex cloud services — well-
designed software, when backed by strong logic and thoughtful
integration, can significantly improve everyday technologies. In addition,
the system provides a foundation for further academic or commercial
development, acting as a stepping stone toward more complex
surveillance platforms powered by artificial intelligence and edge
computing.

In conclusion, Pro Vision achieves its core objective of transforming
traditional CCTV into a smart, responsive, and user-friendly surveillance
system. It effectively merges multiple technologies into a cohesive
product that is as practical as it is educational. The success of this project
underscores the value of modular software development, the potential of
open-source libraries, and the impact of automation in the field of
security. With its strong foundation and future-ready design, Pro Vision is
not just a complete project—it is the beginning of a much larger
opportunity in smart surveillance innovation.

References
1. Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality
assessment: From error visibility to structural similarity. IEEE Transactions on
Image Processing, 13(4), 600–612.
2. Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of
simple features. Proceedings of the 2001 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR), 1, I-511–I-518.
3. Ahonen, T., Hadid, A., & Pietikäinen, M. (2006). Face description with local binary
patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 28(12), 2037–2041.
4. Bradski, G. (2000). The OpenCV library. Dr. Dobb's Journal of Software Tools.
5. Shapiro, L., & Stockman, G. (2001). Computer Vision. Prentice Hall.
6. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR), 1, 886–893.
7. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once:
Unified, real-time object detection. Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 779–788.
8. Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment
using multitask cascaded convolutional networks. IEEE Signal Processing Letters,
23(10), 1499–1503.
9. King, D. E. (2009). Dlib-ml: A machine learning toolkit. Journal of Machine
Learning Research, 10, 1755–1758.
10. Russakovsky, O., Deng, J., Su, H., Krause, J., et al. (2015). ImageNet Large Scale
Visual Recognition Challenge. International Journal of Computer Vision, 115(3),
211–252.
11. Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely
Connected Convolutional Networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 4700–4708.
12. Szeliski, R. (2010). Computer Vision: Algorithms and Applications. Springer.
13. Rosebrock, A. (2020). Practical Python and OpenCV + Case Studies.
PyImageSearch.
14. OpenCV Development Team. (2023). OpenCV-Python Tutorials.
15. van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual. CreateSpace.
16. Lundh, F. (2001). An Introduction to Tkinter. Pythonware.

17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., et al. (2011). Scikit-learn:
Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
18. Brownlee, J. (2022). Introduction to Computer Vision in Python. Machine Learning
Mastery.
19. Singh, A., & Sharma, R. (2021). Smart Surveillance System Using OpenCV and
Python. International Journal of Computer Applications, 174(10), 8–14.
20. Raut, A., & Jadhav, N. (2020). IoT Based Smart Surveillance System Using
Raspberry Pi. International Research Journal of Engineering and Technology
(IRJET), 7(4), 2399–2404.
21. Jain, R., & Gupta, A. (2022). An Efficient CCTV System Using Face Recognition and
Object Tracking. International Journal of Engineering Research & Technology
(IJERT), 11(02), 452–456.
22. Chandra, R., & Bose, D. (2019). AI-Powered Security Systems: Trends and
Challenges. ACM Computing Surveys, 52(5), Article 87.
23. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
24. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with
deep convolutional neural networks. Advances in Neural Information Processing
Systems, 25, 1097–1105.
25. Anwar, S., & Raychowdhury, A. (2020). Edge AI: Machine learning at the edge in
IoT. Proceedings of the IEEE, 108(12), 2561–2574.
26. Zhao, Z., Zheng, P., Xu, S., & Wu, X. (2019). Object Detection With Deep Learning:
A Review. IEEE Transactions on Neural Networks and Learning Systems, 30(11),
3212–3232.
27. Anil, K., & Lavanya, R. (2019). A Review on Real Time Video Surveillance System
Using Deep Learning. International Journal of Innovative Technology and Exploring
Engineering, 8(11), 3540–3543.
28. Chakraborty, T., & Saha, A. (2020). A Comparative Study on Face Recognition
Algorithms. International Journal of Advanced Computer Science and Applications,
11(4), 215–220.
29. Rajalakshmi, K., & Sangeetha, R. (2021). Real-Time Object Detection Using
OpenCV and Python. International Journal of Engineering and Technology (IJET),
13(1), 45–4.

Appendix

The appendix includes supplementary materials that support the development and
understanding of the Pro Vision system. It contains source code snippets, screenshots of
the user interface, sample output images from different modules (such as motion
detection, face recognition, and in/out tracking), and configuration details for software
dependencies. These artifacts provide additional insight into the technical implementation
and help validate the functionality of each component. The appendix also includes
references to third-party libraries used during development, such as OpenCV, NumPy,
and Tkinter, along with their installation instructions, ensuring the system can be
replicated or extended by other developers or researchers.
