Bachelor of Technology
in
Electronics and Communication Engineering
2023-24
Page 1 of 27
CERTIFICATE
This is to certify that this project report titled Smart Glasses for the Blind with Face Detection and Recognition System Based on Deep Learning is our own original work carried out as undergraduate students at Netaji Subhash Engineering College, except to the extent that assistance from other sources is duly acknowledged.
It is further certified that it contains no material which, to a substantial extent, has been submitted for the award of any degree/diploma in any institute or has been published in any form, except the assistance drawn from other sources, for which due acknowledgement is made.
All sources used for this project report have been fully and properly cited.
___________
Date: Signature of the Supervisor
Certificate of Approval
We hereby approve this dissertation titled
Smart glasses for the blind with Face detection and recognition system based on
Deep Learning
carried out by Dwaipyayan Mahata, Joyjit Dey and Kaustav Sarkar for the
award of the degree Bachelor of Technology (B.Tech) in << Program name >> of West
Bengal University of Technology.
Date:………..
Examiners’ signatures:
1. ………………………………………….
2. ………………………………………….
3. ………………………………………….
Declaration
We hereby declare that the work presented in this report entitled “Smart
Glasses for the Blind with Face Detection and Recognition System Based on
Deep Learning”, in partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology in Electronics and Communication
Engineering, submitted to the Department of Electronics and Communication
Engineering, Netaji Subhash Engineering College, is an authentic record of
our own work carried out under the supervision of Niladri Shekhar Mishra,
Department of Electronics and Communication Engineering. The matter
embodied in the report has not been submitted for the award of any other
degree or diploma.
Dwaipyayan Mahata
Joyjit Dey
Kaustav Sarkar
Dated:…………………
Abstract
Face recognition, propelled by advances in deep learning, has evolved into a
transformative technology with applications spanning security, authentication, and
human-computer interaction. Leveraging Convolutional Neural Networks (CNNs),
this methodology offers a robust solution to the inherent challenges posed by
variations in lighting, pose, and facial expressions.
The process begins with face detection using state-of-the-art CNN architectures
like Single Shot Multibox Detector (SSD) or You Only Look Once (YOLO). Detected
faces then undergo feature extraction through layers of convolutional and pooling
operations, enabling the automatic learning of hierarchical facial features crucial for
accurate identification.
During training, deep learning models employ techniques such as triplet loss to
ensure that the embeddings (numerical representations) of the same person's
faces are closer in the feature space, enhancing discrimination between
individuals. The utilization of large, diverse datasets for training is pivotal, allowing
models to generalize well to real-world scenarios.
Transfer learning plays a vital role, enabling the application of pre-trained models
on extensive datasets to face recognition tasks with limited labeled data. This
facilitates robust performance even in scenarios where training data is scarce.
Contents
1: INTRODUCTION Page No
1.1 Problem Definition.......................................................................................
2: REVIEW WORK
2.1 Existing System..........................................................................................
3: WORKING METHODOLOGY
3.1 Flowcharts & Circuit Diagram..................................................................
4: CONCLUSION
4.2 Conclusion...................................................................................................
5: REFERENCES
Chapter 1
Introduction
Face recognition using deep learning has emerged as a powerful and efficient technology
in the field of computer vision. Deep learning, a subset of machine learning, involves
training artificial neural networks on large datasets to learn and extract hierarchical
representations of data. In the context of face recognition, deep learning algorithms can
automatically identify and authenticate individuals by analyzing facial features.
The traditional methods of face recognition often relied on handcrafted features, such as
the position of eyes, nose, and mouth, which could be sensitive to variations in lighting,
pose, and facial expressions. Deep learning approaches, on the other hand, have shown
remarkable success in addressing these challenges by automatically learning relevant
features directly from raw data.
Here's a brief introduction to the key components of face recognition using deep learning:
6. Transfer Learning: Transfer learning is often employed to leverage pre-trained
models on large datasets for face recognition tasks with limited labeled data. This
helps in achieving good performance even when the available training data is
relatively small.
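To make the transfer-learning idea concrete, the sketch below shows how embeddings produced by a pre-trained network can be reused for recognition with only a small labeled gallery. This is an illustrative sketch, not the report's implementation: the `identify` helper, the 0.6 Euclidean threshold, and the toy three-dimensional vectors (standing in for real 128-dimensional embeddings) are all our assumptions.

```python
import numpy as np

def identify(probe, gallery, threshold=0.6):
    """Match a probe embedding against a gallery of known embeddings.

    `probe` is the embedding of a detected face; `gallery` maps a
    person's name to a reference embedding from the same pre-trained
    network. Returns the closest name, or None if nothing is close
    enough (the 0.6 threshold is an illustrative choice).
    """
    best_name, best_dist = None, float("inf")
    for name, reference in gallery.items():
        dist = float(np.linalg.norm(probe - reference))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# Toy 3-D "embeddings" stand in for real 128-D network outputs.
gallery = {"alice": np.array([1.0, 0.0, 0.0]),
           "bob":   np.array([0.0, 1.0, 0.0])}
print(identify(np.array([0.9, 0.1, 0.0]), gallery))  # alice
print(identify(np.array([0.0, 0.0, 1.0]), gallery))  # None
```

Because only the small gallery is task-specific, the expensive training stays in the pre-trained embedding network, which is exactly what makes this practical when labeled data is scarce.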
Face recognition by deep learning has found applications in various domains, including
security, surveillance, user authentication, and human-computer interaction. However, it's
essential to address ethical considerations, such as privacy concerns and potential biases
in the training data, when deploying face recognition systems in real-world applications.
Key Challenges:
3. Privacy Concerns:
Developing a facial recognition system that respects the privacy of
individuals and adheres to ethical standards is crucial to avoid potential
privacy violations.
Objectives:
2. Face Detection:
Implement a reliable face detection system that can identify faces in the
wearer's environment, providing information about the presence of
individuals nearby.
3. Face Recognition:
Incorporate deep learning-based face recognition to enable the smart
glasses to recognize and provide information about the identity of known
individuals, such as friends or acquaintances.
4. Real-Time Feedback:
Ensure that the system provides real-time auditory or haptic feedback to the
wearer, conveying information about the detected faces and recognized
individuals.
5. Navigation Assistance:
Integrate navigation features to help users navigate their surroundings,
providing information about obstacles and guiding them in unfamiliar
environments.
6. User-Friendly Interaction:
Design an intuitive and user-friendly interface to facilitate seamless
interaction with the smart glasses, allowing users to control and customize
the system easily.
7. Privacy Considerations:
Implement privacy-conscious features, ensuring that the facial recognition
system respects the privacy of individuals and complies with ethical
standards.
By achieving these objectives, the project aims to create a valuable tool that empowers
visually impaired individuals, enhances their social interactions, and contributes to their
overall independence and quality of life.
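The real-time feedback objective above could be wired up with pyttsx3, the offline text-to-speech package listed later in this report's library requirements. The sketch below is a minimal assumption-laden illustration: the `announcement` helper and its exact wording are our own, not part of the report's implementation.

```python
try:
    import pyttsx3  # offline TTS engine from the project's library list
except ImportError:  # keep the sketch importable where pyttsx3 is absent
    pyttsx3 = None

def announcement(names):
    """Compose the spoken message for a list of recognized names."""
    if not names:
        return "No familiar faces nearby."
    return "Recognized: " + ", ".join(names) + "."

def speak(message):
    """Read the message aloud if a TTS engine is available."""
    if pyttsx3 is None:
        return
    engine = pyttsx3.init()
    engine.say(message)
    engine.runAndWait()  # blocks until the utterance finishes

# Usage (produces audio on a machine with a sound device):
#   speak(announcement(["Alice", "Bob"]))
```

Separating message composition from speech output keeps the wording testable and would let the same messages drive a haptic channel instead.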
1.3 Project Overview
Dang and Sharma (2017) compared and analysed the precision and recall of four basic face
detection algorithms: (1) Viola-Jones, (2) Support Vector Machine-based, (3) neural
network-based, and (4) SMQT features with SNoW classifier. They concluded that
Viola-Jones is the best among these algorithms.
Datta, Datta and Banerjee (2015) also noted that the Viola-Jones face detector is a
real-time framework able to process face images rapidly with high true detection rates.
Rajeshwari and Anala (2015) agreed that the Viola-Jones method gives better results,
but it consumes more time than the skin-colour-based detection method and the
background subtraction method.
From the works mentioned above, the Viola-Jones algorithm can be considered a
popular method for face detection. However, the Viola-Jones algorithm cannot detect
faces across diverse positions or angles (Enriquez, 2018): detection accuracy drops
when the face is not presented front-on under proper lighting. In other words, the
Viola-Jones face detection method cannot handle non-frontal faces efficiently.
1.3.3 Face Detection by Multi-Task Cascaded Convolutional Network (MTCNN)
The Viola-Jones face detector was prevalent in face detection tasks for a decade. As
mentioned before, however, it degrades markedly under the greater visual variations of
faces that usually occur in real-world applications. Inspired by the achievements of deep
convolutional neural networks (CNNs) in computer vision tasks, numerous studies were
motivated to use this architecture for face detection. In this respect, Zhang et al. (2016)
proposed a Multi-task Cascaded Convolutional Networks (MTCNN) based framework for
joint face detection and alignment, which implements three stages of designed deep CNNs
in a cascaded structure to predict face and landmark locations.
In contrast to the Viola-Jones algorithm, CNNs are able to detect faces in various
positions and angles and under different lighting circumstances. In return, a CNN face
detection method must store a larger amount of information and needs much more space
than the Viola-Jones algorithm (Enriquez, 2018). Heavy use of RAM (random-access
memory) and the need for a stronger processing unit are constant obstacles to running
CNNs, which limits where they can be deployed. Therefore, while CNNs are much more
reliable in terms of face detection accuracy, the Viola-Jones algorithm is still widely
used today.
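A practical detail when using MTCNN is handling its output: the `mtcnn` Python package returns one dictionary per detected face. The helper below is our own illustrative addition (name and 0.90 confidence threshold are assumptions), shown against that documented output format; the actual detector call is left as a comment because it downloads model weights.

```python
def best_face(detections, min_confidence=0.90):
    """Pick the most confident detection above a threshold.

    `detections` follows the mtcnn package's output format:
    [{'box': [x, y, w, h], 'confidence': float, 'keypoints': {...}}, ...]
    Returns the chosen detection dict, or None if none qualifies.
    """
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return max(kept, key=lambda d: d["confidence"]) if kept else None

# Real usage (requires `pip install mtcnn`; the constructor fetches weights):
#   from mtcnn import MTCNN
#   detections = MTCNN().detect_faces(rgb_image)  # rgb_image: HxWx3 uint8
#   face = best_face(detections)
```

Filtering on the confidence score is one simple way to trade missed detections against false positives on an embedded platform.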
1.4 Specifications
This section outlines the hardware and software requirements of the
developed system.
Laptop webcam
Python language
VS Code / Jupyter Notebook (coding environment)
Anaconda (ML libraries and algorithms)
Python library packages:
(i) OpenCV Library
(ii) TensorFlow Library
(iii) NumPy Library
Chapter 2
Review Work
2.1 Existing System
Researchers and developers continue to improve existing algorithms and propose novel
solutions for face detection, making it a dynamic and evolving field within computer vision.
Note that the effectiveness of a specific method can depend on factors such as
the dataset used, the application context, and the computational resources available.
There have been notable developments in utilizing face detection for smart glasses
designed to assist individuals with visual impairments. These technologies aim to
enhance the daily lives of the blind by providing real-time information about their
surroundings. Key aspects of existing work include:
3. Real-Time Feedback:
Smart glasses offer real-time auditory or haptic feedback about detected
faces, enabling users to receive instant information about people around
them. This feature enhances the user's situational awareness and social
interactions.
4. Wearable Computer Vision Systems:
Researchers have developed wearable computer vision systems that
seamlessly integrate with smart glasses. These systems capture visual data
through embedded cameras, process it in real-time, and deliver relevant
information to the user, including the recognition of faces.
In recent years, artificial intelligence and deep learning approaches have been rapidly
entering all areas, including autonomous vehicle systems, robotics, space exploration,
medicine, pet and animal monitoring systems, and areas that start with the word smart,
such as smart city, smart home, smart agriculture, etc. Computer vision and artificial
intelligence methods play a key role in the development of smart glass systems. It is not
possible to build a smart glass system without computer vision methods such as object
detection and recognition, because the input data is an image or a video. Object detection
and recognition has garnered the attention of researchers, and numerous new approaches
are being developed every year. To narrow the review, we analyzed lightweight object
detection and recognition models designed for embedded systems.
In 2016, Iandola et al. designed three primary mechanisms to squeeze CNN networks,
named SqueezeNet: (1) 3 × 3 filters were replaced with 1 × 1 filters; (2) the number of
input channels to 3 × 3 filters was reduced; and (3) the network was down-sampled late.
These three approaches reduced the number of parameters in a CNN while preserving
accuracy within a limited parameter budget. Mobile deep learning is rapidly expanding.
The Tiny-YOLO net for iOS, introduced by Apte et al. in 2017, was developed for mobile
devices and tested with a Metal GPU for real-time applications, with accuracy
approximately matching the original YOLO. In the same year, Howard et al. built a
lightweight deep neural network named MobileNet using a depth-wise separable
convolution architecture for mobile and embedded systems. This model has inspired
researchers and has been used in various applications. In 2018, the MobileNet-SSD
network, derived from VGG-SSD, was proposed to improve the accuracy on small objects
at real-time speed. Further, Wong et al. developed a compact single-shot detection deep
CNN based on the remarkable performance of the Fire microarchitecture presented in
SqueezeNet and the macroarchitecture introduced in SSD. This Tiny SSD is built for real-
time embedded systems by reducing the model size, and consists of a Fire subnetwork
stack and optimized SSD-based convolutional feature layers. With the increasing
capabilities of processors for mobile and embedded devices, numerous effective mobile
deep CNNs for object detection and recognition have been introduced in recent years,
such as ShuffleNet, PeleeNet, and EfficientDet.
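The parameter savings behind MobileNet's depth-wise separable convolution, mentioned above, can be checked with simple arithmetic (bias terms ignored; the example layer sizes are illustrative):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depth-wise k x k filter per input channel, followed by a
    1 x 1 point-wise convolution mixing channels (bias ignored)."""
    return k * k * c_in + c_in * c_out

# A typical 3x3 layer with 32 input and 64 output channels:
print(standard_conv_params(3, 32, 64))   # 18432
print(separable_conv_params(3, 32, 64))  # 2336 (roughly 8x fewer)
```

This roughly order-of-magnitude reduction per layer is what makes such networks viable on the embedded processors targeted by smart glass systems.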
It's important to note that the field of assistive technologies for the blind is dynamic, and
ongoing research and development are likely to bring further advancements.
2.2 Proposed System
Our goal is to create convenience and opportunities for blind people to facilitate
independent travel during both day and night-time. To achieve this goal, the most
effective approach is a wearable smart glass with a multifunctional system that captures
images through a mini camera and returns object recognition results to users as voice
feedback. It is also conceivable to perceive visual information by touching the contours
of detected salient objects, according to the needs of blind people, via a refreshable
tactile display. The system is required to use deep CNNs to detect objects with high
accuracy, and a powerful processor to run them sufficiently fast in real time. Therefore,
we introduce a client–server architecture consisting of the smart glass and a
smartphone/tactile pad on the local side, and an artificial intelligence server that
performs the image processing tasks. Hereinafter, for simplicity, "smartphone" is written
instead of "smartphone/tactile pad". The local part comprises the smart glass and a
smartphone, which exchange data over a Bluetooth connection. Meanwhile, the artificial
intelligence server receives images from the local side, processes them, and returns the
result in audio format. Note that the smart glass hardware has a built-in speaker for
direct output and an earphone port, so audio results returned to the smartphone can be
conveyed to users.
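How the local side packages a captured frame for the server is not specified in this report; one simple, widely used scheme is length-prefixed framing over the socket. The sketch below illustrates that scheme under our own assumptions (the 4-byte big-endian header is our choice, not the report's protocol):

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix an image payload with its 4-byte big-endian length."""
    return struct.pack(">I", len(payload)) + payload

def unframe(buffer: bytes):
    """Extract one complete message; return (payload, leftover).

    Returns (None, buffer) while the message is still incomplete,
    which suits incremental socket reads.
    """
    if len(buffer) < 4:
        return None, buffer
    (length,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + length:
        return None, buffer
    return buffer[4:4 + length], buffer[4 + length:]

msg = frame(b"jpeg-bytes")
print(unframe(msg))  # (b'jpeg-bytes', b'')
```

Explicit framing matters here because TCP (and Bluetooth RFCOMM) deliver byte streams, not messages, so the server must know where each image ends before decoding it.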
Chapter 3
Working Methodology
3.1 Flow Chart:
3.2 Work done:
Step 1: Install the Required Software and Library Packages. The proposed face
recognition system is developed using Anaconda with the Python programming language.
Install Anaconda: Anaconda is essentially a nicely packaged Python distribution that
ships with many useful library packages, such as NumPy and Matplotlib. Anaconda also
uses the concept of environments to isolate different libraries and versions.
Install library packages: some Python library packages are required for the developed
system, including the OpenCV, TensorFlow, MTCNN, face_recognition, dlib, NumPy,
threading, os, time, pyttsx3, openpyxl and tkinter libraries. These packages can be
installed by entering the relevant commands in the terminal.
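The relevant commands are not reproduced in this report; assuming the common PyPI package names for the libraries listed above, the installation would look roughly like this (dlib typically also needs CMake and a C++ compiler to build):

```shell
# Run inside the activated Anaconda environment.
pip install opencv-python tensorflow mtcnn face_recognition dlib numpy pyttsx3 openpyxl
# threading, os, time and tkinter ship with Python itself and need no install.
```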
Step 2: Face Detection. Face detection is the first and essential step of the face
recognition system: a face must be captured in order to recognize it. The face detection
technique in the developed system uses the pre-trained MTCNN face detector model, whose
working principle was described earlier. It gives good face detection results and works
well even for large-angle, non-frontal faces. The figure below shows the result of
detecting facial regions. The locations and outlines of each person's eyes, nose, mouth
and chin can also be obtained using the MTCNN face detector; there are 68 landmark
coordinates on the face in total. However, these face landmarks are not mandatory for
the face recognition system.
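The 68 coordinates mentioned above follow the standard dlib landmark layout, which the face_recognition library (from the package list above) also uses. For reference, the per-region point counts below sum to exactly 68; the region names are the conventional ones, and the actual library call is left as a comment since it requires the dlib models.

```python
# Point counts of the standard 68-point facial landmark layout
# (dlib's shape predictor, also used by the face_recognition library).
LANDMARK_GROUPS = {
    "jaw": 17,
    "right_eyebrow": 5,
    "left_eyebrow": 5,
    "nose_bridge": 4,
    "nose_tip": 5,
    "right_eye": 6,
    "left_eye": 6,
    "outer_lip": 12,
    "inner_lip": 8,
}
print(sum(LANDMARK_GROUPS.values()))  # 68

# Real usage (requires `pip install face_recognition`):
#   import face_recognition
#   image = face_recognition.load_image_file("frame.jpg")
#   landmarks = face_recognition.face_landmarks(image)  # one dict per face
```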
Work is still pending on image recognition, dataset creation, dataset integration, and
the hardware, before the proposed goal is reached.
Chapter 4
Result Analysis
4.2 Conclusion
This report describes a smart glass system for blind people that includes object
detection, salient object extraction, and text recognition models built with computer
vision and deep learning. The proposed system is fully automatic and runs on an
artificial intelligence server. It detects and recognizes objects in low-light and
dark-scene images to assist blind people in regular, day-to-day environments. The
traditional smart glass system was extended with deep learning models, adding salient
object extraction for tactile graphics and text recognition for text-to-speech.
Smart glass systems demand considerable energy and memory from embedded systems
because they are based on deep learning models. Therefore, we run the models on an
artificial intelligence server to ensure real-time performance and avoid the energy
problem. With the advancement of the 5G era, transmitting image data to a server and
receiving real-time results is no longer a concern for users. The experimental results
showed that the object detection, salient object extraction, and text recognition models
were robust and performed well in dark-scene environments with the help of low-light
enhancement techniques. In the future, we aim to create low-light and dark-image
datasets with bounding-box and ground-truth data to address object detection and text
recognition tasks, as well as evaluations at night.
References
1. Adam, G. (2016) Machine Learning is Fun! Part 4: Modern Face Recognition with
Deep Learning. Available at: https://medium.com/@ageitgey/machinelearning-is-fun-part-4-
modern-face-recognition-with-deep-learningc3cffc121d78 (Accessed: 28 July 2019).
5. Cai, Z. et al. (2018) ‘Joint Head Pose Estimation with Multi-task Cascaded
Convolutional Networks for Face Alignment’, Proceedings - International
Conference on Pattern Recognition. IEEE, pp. 495–500. doi:
10.1109/ICPR.2018.8545898
6. Datta, A. K., Datta, M. and Banerjee, P. K. (2015) Face Detection and Recognition:
Theory and Practice. CRC Press, 2015.