Report
in
COMPUTER APPLICATIONS
Submitted To
DECLARATION
CERTIFICATE
This is to certify that the major project "PRO VISION" is the bona fide work
carried out by "Ashish Singh Rawat, Chandra Shekar, Chand Lokesh
Jhalak, Aryan Gusain, Anshika Pundir", Roll numbers "236229285022,
236229285033, 236229085031, 236229285021, 236229285012", students of
BCA, Tula's Institute, Dehradun, during the year 2024-25, in partial
fulfillment of the requirements for the award of the degree of Bachelor of
Computer Applications, and that the major project has not previously formed
the basis for the award of any degree, diploma, associateship, fellowship, or
any other similar title.
Date:
ABSTRACT
Declaration
Certificate
Acknowledgement
Abstract
Chapter 1 Introduction
1.1 Background
1.2 Objectives
1.3 Scope
Chapter 2 Literature Review
Chapter 3 System Requirements
3.1 Software Requirements
3.2 Hardware Requirements
Chapter 4 System Design
4.1 Block Diagram
4.2 Architecture
4.3 Component Description
4.4 Model
Chapter 5 Coding
5.1 main.py
5.2 find_noise.py
5.3 spot_diff.py
5.4 identify.py
5.5 in_out.py
5.6 motion.py
5.7 rect_noise.py (finding noise in a rectangle)
5.8 record.py (date and time)
Chapter 6 Implementation
6.1 Data Flow
6.2 Modules and Functionality
6.3 User Interface
Chapter 7 Future Scope
Chapter 8 Conclusion
References
Appendix
CHAPTER 1
INTRODUCTION
1.1 Background
With the rapid growth of urban infrastructure and increasing security
concerns, surveillance systems have become a crucial component in
maintaining safety and monitoring environments. Traditional CCTV
systems, while useful, often require manual intervention and lack
intelligent features. The rise of Computer Vision and Machine
Learning has enabled the development of smarter, more autonomous
security systems. This project, Smart CCTV Surveillance System, is
designed to go beyond passive video capture and introduce real-time
intelligent monitoring features such as object theft detection, motion
analysis, face recognition, and visitor counting.
1.2 Objectives
The main objectives of this project are:
1.3 Scope
This system is intended for use in small-scale environments like homes,
offices, or labs where real-time monitoring with intelligent alerts can
enhance security. The scope includes:
CHAPTER 2
LITERATURE REVIEW
In the landscape of modern security, conventional closed-circuit
television (CCTV) systems are steadily becoming obsolete. These legacy
systems primarily rely on passive monitoring and continuous human
supervision, making them limited in their responsiveness and prone to
human error. The rise in demand for intelligent, automated surveillance
systems has led to the integration of advanced technologies such as
computer vision, machine learning, and real-time image processing.
Systems like Pro Vision represent a major leap in this direction by
incorporating a wide range of intelligent monitoring features that work
autonomously and respond to events as they occur.
Among its core capabilities, Pro Vision performs face recognition using the
Local Binary Pattern Histogram (LBPH) algorithm, which works by dividing the face
image into smaller regions, computing binary patterns for each region,
and generating histograms that collectively serve as a facial signature.
This signature is then compared to a pre-trained dataset to identify
known individuals. The strength of LBPH lies in its robustness against
lighting variations and facial expressions, as well as its low
computational cost, making it ideal for real-time surveillance on devices
without high-end GPUs.
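To make the idea of a "binary pattern" concrete, the illustrative sketch below
(not code from the project) computes the classic 8-neighbour LBP code for the
centre pixel of a 3x3 patch: each neighbour is compared with the centre to
produce one bit, and the eight bits form the pixel's pattern value. Histograms
of these values, one per region, make up the facial signature described above.

import numpy as np

def lbp_code(patch: np.ndarray) -> int:
    """Compute the 8-neighbour LBP code for the centre pixel of a 3x3 patch."""
    center = patch[1, 1]
    # Clockwise neighbour order, starting from the top-left corner.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    # Each neighbour >= centre contributes a 1-bit; the bits form the code.
    bits = [1 if n >= center else 0 for n in neighbours]
    return sum(bit << i for i, bit in enumerate(bits))

patch = np.array([[90, 120, 60],
                  [80, 100, 110],
                  [70, 130, 95]], dtype=np.uint8)
print(lbp_code(patch))  # a value in 0..255; per-region histograms of these codes form the signature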
The entire system is implemented in Python, whose rich ecosystem of open-source
libraries allows for rapid prototyping and deployment across multiple
platforms, including Windows and Linux operating systems. Moreover,
Python’s open-source nature and active community support have
significantly reduced the development cycle and simplified debugging,
making it an ideal choice for both academic projects and commercial
applications.
From a broader perspective, the design choices in Pro Vision align with
current trends in smart surveillance. Research in the field has
increasingly emphasized the need for modular, automated, and locally-
processed security systems. While cloud-based surveillance offers
scalability, it introduces latency, privacy concerns, and dependency on
internet connectivity. In contrast, systems like Pro Vision, which process
data locally and store it securely on the device, offer faster response
times and greater control over sensitive information. This approach has
been highlighted in studies advocating edge computing for surveillance,
especially in privacy-sensitive or resource-constrained environments
such as homes and small businesses.
CHAPTER 3
SYSTEM REQUIREMENTS
3.1 SOFTWARE REQUIREMENTS
The software stack for Pro Vision is built on cross-platform tools and open-source
libraries to ensure maximum accessibility and flexibility. Below are the essential
software requirements:
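The core stack consists of Python 3 with OpenCV (including the contrib modules
required for LBPH face recognition), NumPy, Pillow, scikit-image, and Tkinter.
As a convenience, the short sketch below verifies that these libraries are
importable; the pip package names in the comment are assumptions based on
common distributions, not a list taken from this report.

# Sanity-check sketch for the Pro Vision software stack.
# Assumed pip packages: opencv-contrib-python, numpy, pillow, scikit-image
# (Tkinter ships with most standard Python installers.)
import importlib

for module in ("cv2", "numpy", "PIL", "skimage", "tkinter"):
    try:
        importlib.import_module(module)
        print(f"{module}: available")
    except ImportError:
        print(f"{module}: missing")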
CHAPTER 4
SYSTEM DESIGN
The system design of Pro Vision outlines how various components—both
hardware and software—interact to create a functional smart surveillance
solution. It includes the architectural flow, module design, and
component responsibilities that drive the core functionality of the system.
Processing Modules:
Motion Detection
SSIM Comparison
Video Recording
Output:
Fig.1 BLOCK DIAGRAM
4.2 Architecture
Pro Vision follows a layered and modular architecture to separate
concerns and enhance maintainability. The architecture consists of the
following layers:
The separation of logic into these layers allows for easy updates and
future integration with more advanced systems such as IoT modules or
deep learning algorithms.
4.3 Component Description
Tkinter GUI: Allows users to initiate and interact with features using
buttons and icons.
Recording Module: Records live video feed with timestamps and stores
it in AVI format.
4.4 Model
The development of Pro Vision followed the Waterfall Model. This
model was chosen due to its simplicity and structured approach, making it
suitable for a compact, self-contained project like this. The phases
followed were requirements analysis, system design, implementation, testing,
deployment, and maintenance.
The Waterfall Model provided a clear development path and ensured that
every part of the system—from face recognition to recording—was tested
and working before moving to the next phase.
The Waterfall Model is a traditional software development methodology
that follows a linear and sequential approach. It divides the project life
cycle into distinct phases, where each phase must be completed before the
next one begins. For the Pro Vision smart surveillance system, the
Waterfall Model was selected due to its simplicity and suitability for
small to medium-sized projects where the requirements are well-
understood from the beginning.
CHAPTER 5
CODING
The Pro Vision system was implemented in Python due to its simplicity,
portability, and extensive library support for computer vision and GUI
development. The system follows a modular programming approach,
where each feature is implemented as a separate Python module and
integrated into a unified GUI.
The main file (main.py) acts as the central controller, invoking different
functionalities such as monitoring, noise detection, in/out tracking, face
identification, and recording based on user interaction through the GUI.
Below is an overview of each key module in the project.
5.1 main.py
import tkinter as tk
import tkinter.font as font
from in_out import in_out
from motion import noise
from rect_noise import rect_noise
from record import record
from PIL import Image, ImageTk
from find_motion import find_motion
from identify import maincall

window = tk.Tk()
window.title("Smart cctv")
window.iconphoto(False, tk.PhotoImage(file='icons/logo.png'))  # placeholder file name; the original was lost
window.geometry('1080x700')

frame1 = tk.Frame(window)
# 'icon' is loaded earlier in the original file (listing truncated here)
label_icon = tk.Label(frame1, image=icon)
label_icon.grid(row=1, pady=(5, 10), column=2)

# Load, resize, and wrap each button icon
# (file names below are placeholders; the originals were lost).
btn1_image = Image.open('icons/btn1.png')
btn1_image = btn1_image.resize((50, 50), Image.ANTIALIAS)
btn1_image = ImageTk.PhotoImage(btn1_image)
btn2_image = Image.open('icons/btn2.png')
btn2_image = btn2_image.resize((50, 50), Image.ANTIALIAS)
btn2_image = ImageTk.PhotoImage(btn2_image)
btn5_image = Image.open('icons/btn5.png')
btn5_image = btn5_image.resize((50, 50), Image.ANTIALIAS)
btn5_image = ImageTk.PhotoImage(btn5_image)
btn3_image = Image.open('icons/btn3.png')
btn3_image = btn3_image.resize((50, 50), Image.ANTIALIAS)
btn3_image = ImageTk.PhotoImage(btn3_image)
btn6_image = Image.open('icons/btn6.png')
btn6_image = btn6_image.resize((50, 50), Image.ANTIALIAS)
btn6_image = ImageTk.PhotoImage(btn6_image)
btn4_image = Image.open('icons/btn4.png')
btn4_image = btn4_image.resize((50, 50), Image.ANTIALIAS)
btn4_image = ImageTk.PhotoImage(btn4_image)
btn7_image = Image.open('icons/btn7.png')
btn7_image = btn7_image.resize((50, 50), Image.ANTIALIAS)
btn7_image = ImageTk.PhotoImage(btn7_image)

btn_font = font.Font(size=25)

btn3 = tk.Button(frame1, text='Noise', height=90, width=180, fg='green',
                 command=noise, image=btn3_image, compound='left')
btn3['font'] = btn_font

btn3.grid(row=5, pady=(20, 10))
frame1.pack()
window.mainloop()
5.2 find_noise.py
import cv2
from spot_diff import spot_diff
import time
import numpy as np

def find_motion():
    motion_detected = False
    is_start_done = False
    cap = cv2.VideoCapture(0)
    check = []
    print("waiting for 2 seconds")
    time.sleep(2)
    frame1 = cap.read()  # keep the full (ret, frame) tuple for spot_diff
    _, frm1 = cap.read()
    frm1 = cv2.cvtColor(frm1, cv2.COLOR_BGR2GRAY)
    while True:
        _, frm2 = cap.read()
        frm2 = cv2.cvtColor(frm2, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(frm1, frm2)
        _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        contors = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)[0]
        # keep only contours large enough to matter
        contors = [c for c in contors if cv2.contourArea(c) > 25]
        if len(contors) > 5:
            cv2.putText(thresh, "motion detected", (50, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, 255)
            motion_detected = True
            is_start_done = False
        elif motion_detected and len(contors) < 3:
            if not is_start_done:
                start = time.time()
                is_start_done = True
            end = time.time()
            print(end - start)
            if (end - start) > 4:
                frame2 = cap.read()
                cap.release()
                cv2.destroyAllWindows()
                x = spot_diff(frame1, frame2)
                if x == 0:
                    print("running again")
                    return
                else:
                    print("found motion sending mail")
                    return
        else:
            cv2.putText(thresh, "no motion detected", (50, 50),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, 255)
        cv2.imshow("winname", thresh)
        _, frm1 = cap.read()
        frm1 = cv2.cvtColor(frm1, cv2.COLOR_BGR2GRAY)
        if cv2.waitKey(1) == 27:
            break
    return
5.3 spot_diff.py
import cv2
import time
from skimage.metrics import structural_similarity
from datetime import datetime

def spot_diff(frame1, frame2):
    # each argument is the (ret, frame) tuple returned by cap.read()
    frame1 = frame1[1]
    frame2 = frame2[1]
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    g1 = cv2.blur(g1, (2, 2))
    g2 = cv2.blur(g2, (2, 2))
    (score, diff) = structural_similarity(g2, g1, full=True)
    print("Image similarity", score)
    diff = (diff * 255).astype("uint8")
    thresh = cv2.threshold(diff, 100, 255, cv2.THRESH_BINARY_INV)[1]
    contors = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)[0]
    contors = [c for c in contors if cv2.contourArea(c) > 50]
    if len(contors):
        for c in contors:
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame1, (x, y), (x + w, y + h), (0, 255, 0), 2)
    else:
        print("nothing stolen")
        return 0
    cv2.imshow("diff", thresh)
    cv2.imshow("win1", frame1)
    # hyphenated timestamp keeps the file name valid on Windows as well
    cv2.imwrite("stolen/" + datetime.now().strftime('%y-%m-%d-%H-%M-%S') + ".jpg", frame1)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    return 1
5.4 identify.py
import cv2
import os
import numpy as np
import tkinter as tk
import tkinter.font as font

def collect_data():
    name = input("Enter name of person : ")
    count = 1
    ids = input("Enter ID: ")
    cap = cv2.VideoCapture(0)
    filename = "haarcascade_frontalface_default.xml"
    cascade = cv2.CascadeClassifier(filename)
    while True:
        _, frm = cap.read()
        gray = cv2.cvtColor(frm, cv2.COLOR_BGR2GRAY)
        # detect faces (scale factor and minNeighbors chosen illustratively)
        faces = cascade.detectMultiScale(gray, 1.4, 1)
        for x, y, w, h in faces:
            roi = gray[y:y + h, x:x + w]  # crop the detected face region
            cv2.imwrite(f"persons/{name}-{count}-{ids}.jpg", roi)
            count = count + 1
            cv2.putText(frm, f"{count}", (20, 20), cv2.FONT_HERSHEY_PLAIN,
                        2, (0, 255, 0), 3)
            cv2.imshow("new", roi)
        cv2.imshow("identify", frm)
        if cv2.waitKey(1) == 27 or count > 200:
            cv2.destroyAllWindows()
            cap.release()
            train()
            break

def train():
    print("training part initiated !")
    recog = cv2.face.LBPHFaceRecognizer_create()
    dataset = 'persons'
    paths = [os.path.join(dataset, im) for im in os.listdir(dataset)]
    faces = []
    ids = []
    labels = []
    for path in paths:
        labels.append(path.split('/')[-1].split('-')[0])
        ids.append(int(path.split('/')[-1].split('-')[2].split('.')[0]))
        faces.append(cv2.imread(path, 0))
    recog.train(faces, np.array(ids))
    recog.save('model.yml')  # placeholder model file name
    return

def identify():
    cap = cv2.VideoCapture(0)
    filename = "haarcascade_frontalface_default.xml"
    # (the remainder of this function was not captured in the listing)

def maincall():
    root = tk.Tk()
    root.geometry("480x100")
    root.title("identify")
    label = tk.Label(root, text="Select below buttons ")
    label.grid(row=0, columnspan=2)
    label_font = font.Font(size=35, weight='bold', family='Helvetica')
    label['font'] = label_font
    btn_font = font.Font(size=25)
    button1 = tk.Button(root, text="Add Member ", command=collect_data,
                        height=2, width=20)
    button1.grid(row=1, column=0, pady=(10, 10), padx=(5, 5))
    button1['font'] = btn_font
    button2 = tk.Button(root, text="Start with known ",
                        command=identify, height=2, width=20)
    button2.grid(row=1, column=1, pady=(10, 10), padx=(5, 5))
    button2['font'] = btn_font
    root.mainloop()
    return
5.5 in_out.py
import cv2
from datetime import datetime

def in_out():
    cap = cv2.VideoCapture(0)
    right, left = "", ""
    while True:
        _, frame1 = cap.read()
        frame1 = cv2.flip(frame1, 1)
        _, frame2 = cap.read()
        frame2 = cv2.flip(frame2, 1)
        # difference between consecutive frames reveals motion
        diff = cv2.absdiff(frame2, frame1)
        diff = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        contr = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                 cv2.CHAIN_APPROX_SIMPLE)[0]
        x = 300
        if len(contr) > 0:
            max_cnt = max(contr, key=cv2.contourArea)
            x, y, w, h = cv2.boundingRect(max_cnt)
            cv2.rectangle(frame1, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame1, "MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 2)
        # remember on which side the motion started
        if right == "" and left == "":
            if x > 500:
                right = True
            elif x < 200:
                left = True
        elif right:
            if x < 200:
                print("to left")
                x = 300
                right, left = "", ""
                cv2.imwrite(f"visitors/in/{datetime.now().strftime('%y-%m-%d-%H-%M-%S')}.jpg", frame1)
        elif left:
            if x > 500:
                print("to right")
                x = 300
                right, left = "", ""
                cv2.imwrite(f"visitors/out/{datetime.now().strftime('%y-%m-%d-%H-%M-%S')}.jpg", frame1)
        cv2.imshow("", frame1)
        k = cv2.waitKey(1)
        if k == 27:
            cap.release()
            cv2.destroyAllWindows()
            break
5.6 motion.py
import cv2

def noise():
    cap = cv2.VideoCapture(0)
    while True:
        _, frame1 = cap.read()
        _, frame2 = cap.read()
        # absolute difference between consecutive frames highlights motion
        diff = cv2.absdiff(frame2, frame1)
        diff = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        contr = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                 cv2.CHAIN_APPROX_SIMPLE)[0]
        if len(contr) > 0:
            max_cnt = max(contr, key=cv2.contourArea)
            x, y, w, h = cv2.boundingRect(max_cnt)
            cv2.rectangle(frame1, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame1, "MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 2)
        else:
            cv2.putText(frame1, "NO-MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255), 2)
        cv2.imshow("win", frame1)
        if cv2.waitKey(1) == 27:
            cap.release()
            cv2.destroyAllWindows()
            break
5.7 rect_noise.py (finding noise in a rectangle)
import cv2

donel = False
doner = False
x1, y1, x2, y2 = 0, 0, 0, 0

def select(event, x, y, flag, param):
    # left click fixes the top-left corner, right click the bottom-right
    global x1, x2, y1, y2, donel, doner
    if event == cv2.EVENT_LBUTTONDOWN:
        x1, y1 = x, y
        donel = True
    elif event == cv2.EVENT_RBUTTONDOWN:
        x2, y2 = x, y
        doner = True

def rect_noise():
    global donel, doner, x1, x2, y1, y2
    cap = cv2.VideoCapture(0)
    cv2.namedWindow("select_region")
    cv2.setMouseCallback("select_region", select)
    while True:
        _, frame = cap.read()
        cv2.imshow("select_region", frame)
        if doner:  # both corners chosen; start monitoring
            cv2.destroyAllWindows()
            break
        if cv2.waitKey(1) == 27:
            break
    while True:
        _, frame1 = cap.read()
        _, frame2 = cap.read()
        # compare only the selected rectangle of the two frames
        diff = cv2.absdiff(frame2[y1:y2, x1:x2], frame1[y1:y2, x1:x2])
        diff = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        contr = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                 cv2.CHAIN_APPROX_SIMPLE)[0]
        if len(contr) > 0:
            max_cnt = max(contr, key=cv2.contourArea)
            x, y, w, h = cv2.boundingRect(max_cnt)
            cv2.rectangle(frame1, (x + x1, y + y1), (x + w + x1, y + h + y1),
                          (0, 255, 0), 2)
            cv2.putText(frame1, "MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 2)
        else:
            cv2.putText(frame1, "NO-MOTION", (10, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255), 2)
        cv2.imshow("rect_noise", frame1)
        if cv2.waitKey(1) == 27:
            cap.release()
            cv2.destroyAllWindows()
            break
5.8 record.py (date and time)
import cv2
from datetime import datetime

def record():
    cap = cv2.VideoCapture(0)
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    out = cv2.VideoWriter(f'recordings/{datetime.now().strftime("%H-%M-%S")}.avi',
                          fourcc, 20.0, (640, 480))
    while True:
        _, frame = cap.read()
        # stamp the current date and time onto each frame
        cv2.putText(frame, f'{datetime.now().strftime("%D-%H-%M-%S")}',
                    (50, 50), cv2.FONT_HERSHEY_COMPLEX,
                    0.6, (255, 255, 255), 2)
        out.write(frame)
        cv2.imshow("recording", frame)
        if cv2.waitKey(1) == 27:
            out.release()
            cap.release()
            cv2.destroyAllWindows()
            break
CHAPTER 6
IMPLEMENTATION
The implementation of Pro Vision was carried out using Python,
leveraging libraries such as OpenCV for image processing, Tkinter for
GUI development, NumPy for numerical operations, and skimage for
structural similarity analysis. The system was designed in a modular
format to ensure scalability and ease of maintenance. Each core
functionality—monitoring, face recognition, motion detection, in/out
tracking, and video recording—was implemented as an independent
Python script and later integrated through a central graphical interface.
The GUI was designed to be user-friendly, featuring labeled buttons and
icons that allow users to trigger specific modules with a single click.
Features such as face detection were implemented using Haar cascade
classifiers, while face recognition was performed using the LBPH (Local
Binary Pattern Histogram) algorithm. Object monitoring was achieved by
comparing frames using the Structural Similarity Index (SSIM), and
direction-based tracking was coded by analyzing motion vectors across
frame boundaries. Once the backend logic was in place, the application
was tested across different platforms, including Windows and Linux, to
ensure platform independence and real-time performance.
Step 3: Data Processing in Modules
Based on the selected function:
Face Identification applies Haar cascades for detection and LBPH for
recognition.
This structured flow ensures that data moves logically and efficiently
through each stage—from input acquisition to intelligent processing and
evidence storage—maintaining system responsiveness and reliability.
This chapter also reviews the technologies used in the project and explains
in detail why each one was chosen. The discussion is organized by the
modules/features of the project, but first let us look at the language used.
Language Used
We used the Python language because it is mature, widely supported, and
lets us do machine learning, computer vision, and GUI development with
ease. Its advantages include:
1 – Short and concise syntax.
2 – Easy to learn and use.
3 – Good technical support available online.
4 – Many packages for different tasks.
5 – Runs on any platform.
6 – Modern, object-oriented language.
These are just a few highlights; Python offers a great deal more than this.
The key features of the project, each explained below, are:
1. Monitor
2. Identify the family member
3. Detect for noises
4. Visitors in room detection
Monitor Feature:
This feature determines what has been stolen from the scene visible to the
webcam. In other words, it constantly monitors the frames and checks which
object has been taken away from the frame.
It uses Structural Similarity to find the differences between two frames:
the first captured before the disturbance started, and the second after the
disturbance stopped.
Few resources go deep into the details of SSIM, and those that do are often
specific to a gradient-based implementation, since SSIM is frequently used
as a loss function.
The Structural Similarity Index (SSIM) metric extracts 3 key features
from an image:
Luminance
Contrast
Structure
This system calculates the Structural Similarity Index between two given
images, which is a value between -1 and +1. A value of +1 indicates that the
two images are very similar or identical, while a value of -1 indicates
that they are very different. Often these values are adjusted to be in the
range [0, 1], where the extremes hold the same meaning.
Structure: The structural comparison is done using a consolidated formula
(shown below), but in essence we divide the input signal by its standard
deviation, so that the result has unit standard deviation and allows for a
more robust comparison.
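For completeness, the luminance, contrast, and structure comparisons, and the
consolidated formula they combine into (as given in Wang et al., 2004, listed
in the References), can be written as:

l(x,y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad
c(x,y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad
s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}

\mathrm{SSIM}(x,y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where \mu_x, \mu_y are the local means, \sigma_x, \sigma_y the standard
deviations, \sigma_{xy} the covariance of the two image windows, and
C_1, C_2, C_3 small constants that stabilize the division; choosing
C_3 = C_2/2 collapses the three terms into the single SSIM expression.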
A cascade classifier, namely a cascade of boosted classifiers working with
Haar-like features, is a special case of ensemble learning called boosting.
It typically relies on AdaBoost classifiers (and other models such as Real
AdaBoost, Gentle AdaBoost, or LogitBoost).
Cascade classifiers are trained on a few hundred sample images that contain
the object we want to detect, and on other images that do not contain it.
There are some common features that we find on most human faces:
a dark eye region compared to the upper cheeks
a bright nose-bridge region compared to the eyes
specific locations of the eyes, mouth, and nose
These characteristics are called Haar features; the feature-extraction
process slides small rectangular masks over the image to measure them.
Haar features are similar to convolution kernels and are used to detect the
presence of a particular feature in a given image. For all of this, the
OpenCV module in Python provides a built-in class called CascadeClassifier,
which we use to detect faces in the frame.
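As a minimal sketch of how CascadeClassifier is typically used (the image
path and the scaleFactor/minNeighbors values here are illustrative choices,
not the project's exact settings):

import cv2

# Load the pre-trained frontal-face Haar cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

img = cv2.imread("sample.jpg")  # any test image (illustrative path)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces: scaleFactor shrinks the image pyramid step by step,
# minNeighbors controls how many overlapping detections are required.
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces.jpg", img)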
2 – Using LBPH for Face Recognition
Now that we have detected faces in the frame, it is time to identify them
and check whether they appear in the dataset used to train our LBPH model.
Extracting the histograms: using the image generated in the last step, we
apply the Grid X and Grid Y parameters to divide the image into multiple
regions, and a histogram of binary patterns is computed for each region.
Once this is done the model is trained; later, when we want to make a
prediction, the same steps are applied to the new face, and its histograms
are compared with those of the trained model. This is how the feature works.
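A minimal sketch of the LBPH train/predict cycle with OpenCV's contrib
module; the directory layout and label scheme below are illustrative
assumptions, simpler than the project's name-count-id file naming:

import os
import cv2
import numpy as np

recognizer = cv2.face.LBPHFaceRecognizer_create()

# Illustrative layout: faces/<numeric_id>.jpg, grayscale face crops.
faces, ids = [], []
for fname in os.listdir("faces"):
    faces.append(cv2.imread(os.path.join("faces", fname), 0))
    ids.append(int(os.path.splitext(fname)[0]))

recognizer.train(faces, np.array(ids))

# Predict: returns the best-matching label and a distance-like confidence
# (lower means a closer histogram match).
label, confidence = recognizer.predict(cv2.imread("probe.jpg", 0))
print(label, confidence)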
3 – Detect for Noises in the Frame
This feature finds noise (unexpected motion) in the frames. Most CCTVs
offer something similar, but in this module we will see how it works.
Put simply, consecutive frames are continuously analyzed and checked for
noise: we take the absolute difference between the two frames, detect
contours (the boundaries of motion) in the difference image, and decide
accordingly. If there are no boundaries there is no motion; if there are
any, motion is present. Every pixel of the difference image carries a
brightness value indicating how much that point changed between frames.
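The essence of this check fits in a few lines. The sketch below runs it on
two synthetic arrays instead of webcam frames so it is self-contained; the
threshold value is illustrative:

import cv2
import numpy as np

# Two synthetic grayscale "frames"; the second has a bright moving blob.
frame_a = np.zeros((100, 100), dtype=np.uint8)
frame_b = frame_a.copy()
cv2.rectangle(frame_b, (40, 40), (60, 60), 255, -1)

diff = cv2.absdiff(frame_a, frame_b)  # per-pixel change
_, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

print("motion" if contours else "no motion")  # -> motion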
4 – Visitors in Room Detection
This feature detects whether someone has entered the room or gone out.
To determine from which side the motion occurred, we first detect motion,
then draw a rectangle over it, and finally check the coordinates: if the
points lie on the left side the movement is classified as leftward, and
likewise for the right, as sketched below.
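A small sketch of the decision rule, reusing the 200/500-pixel boundary
lines that appear in in_out.py; the function name and return strings are
ours, for illustration only:

def classify_crossing(x_start: int, x_end: int,
                      left_line: int = 200, right_line: int = 500) -> str:
    """Classify a motion track by which vertical boundary lines it crossed.

    x_start/x_end are the x-coordinates of the tracked motion's bounding
    box at the start and end of the movement.
    """
    if x_start > right_line and x_end < left_line:
        return "entered (moved right -> left)"
    if x_start < left_line and x_end > right_line:
        return "exited (moved left -> right)"
    return "no crossing"

print(classify_crossing(520, 150))  # -> entered (moved right -> left)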
6.3 USER INTERFACE
Feature 1 – Monitor
Sample output produced by the first feature (the Monitor module).
Feature 3 – In/Out Detection
The system detected a person entering the room, classified the event as an
entry, and saved the corresponding image locally.
CHAPTER 7
FUTURE SCOPE
Deep learning (DL) models could substantially improve object detection, face recognition, and anomaly identification. With DL
integration, Pro Vision could support features such as:
These features would transform Pro Vision into a more intelligent and
context-aware system, capable of not just recording events, but
understanding and responding to them in real-time.
While the current design deliberately favors local processing, future iterations could offer cloud sync as an optional
add-on for those who require remote access and backup.
CHAPTER 8
CONCLUSION
One of the most valuable aspects of Pro Vision is its ability to detect and
react to environmental changes. The use of the Structural Similarity Index
(SSIM) allows the system to recognize subtle differences in image
content, making it possible to detect theft or tampering by comparing
frames captured before and after movement is detected. Meanwhile, the
motion detection algorithm based on frame differencing and contour
analysis enables the system to alert users to dynamic activity in its field
of view. These capabilities allow the system to act not only as a recorder
but as a smart observer capable of identifying potential threats in real
time.
The inclusion of directional tracking further expands its usefulness,
particularly in access-controlled areas or rooms where occupancy
tracking is important. Together, these features transform the system from
a passive video recorder into an interactive security assistant that logs,
identifies, and organizes surveillance data for easy review and analysis.
Moreover, Pro Vision holds significant promise for future extension. Its
architecture was intentionally designed to support the integration of more
advanced features as technology evolves. Potential enhancements include
deep learning–based object detection, real-time behavioral analysis,
integration with IoT devices for smart home automation, and even mobile
or cloud connectivity for remote monitoring. These additions could
elevate Pro Vision from a basic surveillance assistant to a comprehensive
security platform suitable for deployment in offices, campuses, homes, or
public environments. The modular codebase and clean UI further support
the idea of Pro Vision as an educational tool, allowing students and
developers to learn from and build upon the existing system.
In closing, Pro Vision stands as a complete, working smart surveillance
system. It effectively merges multiple technologies into a cohesive
product that is as practical as it is educational. The success of this project
underscores the value of modular software development, the potential of
open-source libraries, and the impact of automation in the field of
security. With its strong foundation and future-ready design, Pro Vision is
not just a complete project—it is the beginning of a much larger
opportunity in smart surveillance innovation.
References
1. Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality
assessment: From error visibility to structural similarity. IEEE Transactions on
Image Processing, 13(4), 600–612.
2. Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of
simple features. Proceedings of the 2001 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR), 1, I-511–I-518.
3. Ahonen, T., Hadid, A., & Pietikäinen, M. (2006). Face description with local binary
patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 28(12), 2037–2041.
4. Bradski, G. (2000). The OpenCV library. Dr. Dobb's Journal of Software Tools.
5. Shapiro, L., & Stockman, G. (2001). Computer Vision. Prentice Hall.
6. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR), 1, 886–893.
7. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once:
Unified, real-time object detection. Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 779–788.
8. Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment
using multitask cascaded convolutional networks. IEEE Signal Processing Letters,
23(10), 1499–1503.
9. King, D. E. (2009). Dlib-ml: A machine learning toolkit. Journal of Machine
Learning Research, 10, 1755–1758.
10. Russakovsky, O., Deng, J., Su, H., Krause, J., et al. (2015). ImageNet Large Scale
Visual Recognition Challenge. International Journal of Computer Vision, 115(3),
211–252.
11. Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely
Connected Convolutional Networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 4700–4708.
12. Szeliski, R. (2010). Computer Vision: Algorithms and Applications. Springer.
13. Rosebrock, A. (2020). Practical Python and OpenCV + Case Studies.
PyImageSearch.
14. OpenCV Development Team. (2023). OpenCV-Python Tutorials.
15. van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual. CreateSpace.
16. Lundh, F. (2001). An Introduction to Tkinter. Pythonware.
17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., et al. (2011). Scikit-learn:
Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
18. Brownlee, J. (2022). Introduction to Computer Vision in Python. Machine Learning
Mastery.
19. Singh, A., & Sharma, R. (2021). Smart Surveillance System Using OpenCV and
Python. International Journal of Computer Applications, 174(10), 8–14.
20. Raut, A., & Jadhav, N. (2020). IoT Based Smart Surveillance System Using
Raspberry Pi. International Research Journal of Engineering and Technology
(IRJET), 7(4), 2399–2404.
21. Jain, R., & Gupta, A. (2022). An Efficient CCTV System Using Face Recognition and
Object Tracking. International Journal of Engineering Research & Technology
(IJERT), 11(02), 452–456.
22. Chandra, R., & Bose, D. (2019). AI-Powered Security Systems: Trends and
Challenges. ACM Computing Surveys, 52(5), Article 87.
23. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
24. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with
deep convolutional neural networks. Advances in Neural Information Processing
Systems, 25, 1097–1105.
25. Anwar, S., & Raychowdhury, A. (2020). Edge AI: Machine learning at the edge in
IoT. Proceedings of the IEEE, 108(12), 2561–2574.
26. Zhao, Z., Zheng, P., Xu, S., & Wu, X. (2019). Object Detection With Deep Learning:
A Review. IEEE Transactions on Neural Networks and Learning Systems, 30(11),
3212–3232.
27. Anil, K., & Lavanya, R. (2019). A Review on Real Time Video Surveillance System
Using Deep Learning. International Journal of Innovative Technology and Exploring
Engineering, 8(11), 3540–3543.
28. Chakraborty, T., & Saha, A. (2020). A Comparative Study on Face Recognition
Algorithms. International Journal of Advanced Computer Science and Applications,
11(4), 215–220.
29. Rajalakshmi, K., & Sangeetha, R. (2021). Real-Time Object Detection Using
OpenCV and Python. International Journal of Engineering and Technology (IJET),
13(1), 45–4.
Appendix
The appendix includes supplementary materials that support the development and
understanding of the Pro Vision system. It contains source code snippets, screenshots of
the user interface, sample output images from different modules (such as motion
detection, face recognition, and in/out tracking), and configuration details for software
dependencies. These artifacts provide additional insight into the technical implementation
and help validate the functionality of each component. The appendix also includes
references to third-party libraries used during development, such as OpenCV, NumPy,
and Tkinter, along with their installation instructions, ensuring the system can be
replicated or extended by other developers or researchers.