Project Phase 1 Final

Department of Electronics and Communication Engineering
Project Synopsis Presentation – 2021-2022

Project Title: OBJECT DETECTION, IDENTIFICATION AND GESTURE CONTROL USING MACHINE LEARNING ON RASPBERRY PI

Group Number: 39

Name of the Student | USN
SHISHIR KUMAR SINGH | 1AY15EC090
HARISH PURTY | 1AY16EC033
NIKHIL PEDI REDDY | 1AY15EC065
SHAMEEK BARMAN | 1AY16EC088

Name of the Guide: MR. SANDEEP KUMAR K
Designation: Asst. Professor

Acharya Institute of Technology, Acharya Dr. Sarvepalli Radhakrishnan Road, Soladevanahalli, Bengaluru 560107
Motivation and Objectives
The aim of this project is to evaluate the classification performance of suitable deep learning models for
real-time object recognition and identification, and to use Convolutional Neural Networks for hand finger counting
and OpenCV for further tasks.
The following objectives have been identified to fulfil the aim of this work:
• Identify suitable and highly efficient deep learning models for real-time object recognition and
identification.
• Evaluate the classification performance of the selected deep learning models.
• Compare the classification performance of the selected models with each
other and present the results.
• Understand OpenCV and identify Convolutional Neural Networks that are efficient for the required tasks.
Survey of previous work/literature
1. LITERATURE SURVEY:
Date of paper published | Name of the paper | Conclusion of the paper
November 2004 | Agarwal, S., Awan, A., and Roth, D., “Learning to Detect Objects in Images via a Sparse, Part-Based Representation” | Presents an approach for learning to detect objects in images using a sparse, part-based representation. It also addresses several methodological issues that are important in evaluating object detection approaches.
October 2012 | Azizpour, H., and Laptev, I. (2012). “Object detection using strongly-supervised deformable part models,” in Computer Vision-ECCV | It proposes a novel misalignment-robust representation (MRR) model for real-time face recognition. By modelling vehicles and pedestrians as cuboids and cylinders of fixed sizes, we are able to estimate the deconvolution of the occupancy map and recover object locations.
March-April 2021 | Rohini M, Abishek Leo Kingston. j, Shriram GS, Siva Sankaran & Vasuki G, “Hand Gesture Recognition Using OpenCV” | This paper discusses how a camera can be used for detecting hand gestures and can be applied to any. We are using the camera as a detecting device as well as an input device for Augmented Reality.
Introduction
• A few years ago, the creation of software and hardware image processing systems was mainly limited to
the development of the user interface, which occupied most of the programmers at each firm. The
situation changed significantly with the advent of the Windows operating system, when the
majority of developers switched to solving the problems of image processing itself.
• Object recognition describes a collection of related computer vision tasks that involve
identifying objects in digital photographs. Image classification involves predicting the class
of one object in an image.
• Object localization refers to identifying the location of one or more objects in an image
and drawing a bounding box around their extent. Object detection combines these two tasks:
it localizes and classifies one or more objects in an image.
When a user or practitioner refers to “object recognition”, they often mean
“object detection”.
• Image classification assigns a class label to an image, whereas object localization
draws a bounding box around one or more objects in an image.
• Object detection is more challenging because it combines these two tasks: it draws a
bounding box around each object of interest in the image and assigns it a class label.
Together, all these problems are referred to as object recognition.
• Region-based Convolutional Neural Networks, or R-CNNs, are a family of techniques
for addressing object localization and recognition tasks, designed for model performance.
• You Only Look Once, or YOLO, is a second family of techniques for object recognition,
designed for speed and real-time use.
• In this gesture-based media player control system, we recognize both facial and
hand gestures.
• We achieve this through image processing and PyAutoGUI.
• PyAutoGUI is a Python library that allows a program to control the mouse and keyboard.
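As a minimal sketch of how a recognized gesture could drive the media player, the snippet below maps a finger count to a PyAutoGUI key name and presses it. The specific bindings in GESTURE_KEYS and the helper names are hypothetical, since the slides do not state which gesture triggers which action. PyAutoGUI is imported lazily because it needs a desktop session to load.

```python
# Hypothetical gesture-to-key table; the slides do not specify these bindings.
GESTURE_KEYS = {
    1: "playpause",
    2: "volumeup",
    3: "volumedown",
    4: "nexttrack",
    5: "prevtrack",
}


def key_for_gesture(finger_count):
    """Return the key name bound to a finger count, or None if unbound."""
    return GESTURE_KEYS.get(finger_count)


def send_gesture(finger_count, press=None):
    """Press the key bound to finger_count and return its name (or None)."""
    key = key_for_gesture(finger_count)
    if key is None:
        return None
    if press is None:
        # Imported here so the mapping logic works without a display.
        import pyautogui
        press = pyautogui.press
    press(key)
    return key
```

Passing a custom press callable (for example a list's append method) makes it possible to check the mapping on a machine without a display before running it on the Raspberry Pi.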
Images For Gesture Control
Flow Chart for System Architecture of Gesture Control
Methodology With Tools And Technology
InceptionV3:
• Inception v3 is a widely used image recognition model that has been shown to attain greater than
78.1% accuracy on the ImageNet dataset.
• The model is the culmination of many ideas developed by researchers over the years. It is based on “Rethinking
the Inception Architecture for Computer Vision” by Szegedy et al.
• The model is made of symmetric and asymmetric building blocks, including convolutions, average pooling,
max pooling, concats, dropouts, and fully connected layers.
[Figure: high-level diagram of the Inception v3 model]
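To give a feel for why asymmetric building blocks are attractive on a small device, the following sketch counts the weights of an n x n convolution against a 1 x n convolution followed by an n x 1 convolution. It is a back-of-the-envelope calculation, not part of the project code; it assumes equal input and output channel counts and ignores biases, and the function names are made up for illustration.

```python
def conv_weights(kernel_h, kernel_w, channels_in, channels_out):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * channels_in * channels_out


def factorization_saving(n, channels):
    """Fraction of weights saved by replacing n x n with 1 x n then n x 1."""
    full = conv_weights(n, n, channels, channels)
    factored = (conv_weights(1, n, channels, channels)
                + conv_weights(n, 1, channels, channels))
    return 1 - factored / full


print(round(factorization_saving(3, 64), 3))  # 0.333
print(round(factorization_saving(7, 64), 3))  # 0.714
```

The saving does not depend on the channel count under these assumptions, only on the kernel size n.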
Existing Methods:
1. ResNet:
• To train the network model more effectively, we adopt the same strategy as that
used for DSSD, since the residual network performs better than the VGG network.
• The goal is to improve accuracy. The first modification is to replace
the VGG network used in the original SSD with ResNet.
• We also add a series of convolutional feature layers at the end of the underlying network. These feature
layers gradually decrease in size, which allows detection results to be predicted at multiple scales.
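The shrinking feature layers can be illustrated with the standard convolution output-size formula. This is only a sketch: the 3 x 3 kernel, stride 2, padding 1 and the starting size of 38 are assumptions chosen for illustration, not values taken from the project.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_pyramid_sizes(first_size, extra_layers):
    """Sizes of the first map and of each extra layer appended after it."""
    sizes = [first_size]
    for _ in range(extra_layers):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


print(feature_pyramid_sizes(38, 5))  # [38, 19, 10, 5, 3, 2]
```

Large maps keep fine detail for small objects, while the small maps at the end cover large objects, which is what makes multi-scale prediction possible.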
2. R-CNN:
• To circumvent the problem of selecting a huge number of regions, Ross Girshick et al. proposed a method
that uses selective search to extract just 2000 regions from the image, which they called region
proposals.
• Therefore, instead of trying to classify a huge number of regions, we can work with just 2000 regions.
These 2000 region proposals are generated by the selective search algorithm.
3. Fast R-CNN:
• The same author as the previous paper (R-CNN) addressed some of the drawbacks of R-CNN to build a faster object
detection algorithm, called Fast R-CNN.
• The approach is similar to the R-CNN algorithm. But instead of feeding the region proposals to the CNN, we feed the
input image to the CNN to generate a convolutional feature map.
• From the convolutional feature map, we identify the regions of the proposals and warp them into squares. An RoI
pooling layer then reshapes them to a fixed size so that they can be fed into a fully connected layer.
• Fast R-CNN is faster than R-CNN because we do not have to feed 2000 region proposals to the
convolutional neural network every time. Instead, the convolution operation is done only once per image and
a feature map is generated from it.
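The RoI pooling step described above can be sketched in NumPy. This is a simplified illustration under stated assumptions: a single-channel feature map, integer region coordinates, and a region at least as large as the output grid. The function name and the toy feature map are made up for the example.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a 2-D feature map to a fixed output size.

    roi is (row_start, col_start, row_end, col_end) in feature-map cells,
    with the end indices exclusive.
    """
    r0, c0, r1, c1 = roi
    region = feature_map[r0:r1, c0:c1]
    out_h, out_w = output_size
    # Split the region into out_h x out_w roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cells = region[row_edges[i]:row_edges[i + 1],
                           col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = cells.max()
    return pooled


feature_map = np.arange(36).reshape(6, 6)
print(roi_max_pool(feature_map, (0, 0, 4, 6)))  # 4 x 6 region -> 2 x 2
print(roi_max_pool(feature_map, (2, 1, 5, 4)))  # 3 x 3 region -> 2 x 2
```

Both regions have different shapes but produce the same 2 x 2 output, which is what lets the fully connected layer accept proposals of any size.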
4. YOLO (You Only Look Once):
• All the previous object detection algorithms use regions to localize the object within the image.
• In those algorithms the network does not look at the complete image, only at the parts of the image that
have a high probability of containing the object.
• YOLO, or You Only Look Once, is an object detection algorithm that differs considerably from the region-based
algorithms seen above. In YOLO, a single convolutional network predicts the bounding boxes and the
class probabilities for these boxes.
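To make the single-network idea concrete, the sketch below decodes a YOLO-style output grid into detections. It assumes the layout of the original YOLO paper, where each grid cell holds B boxes of (x, y, w, h, confidence) followed by C class probabilities; the toy tensor, the threshold and the function name are illustrative only, and non-maximum suppression is left out.

```python
import numpy as np


def decode_grid(pred, num_boxes, num_classes, threshold=0.5):
    """Turn an S x S x (B*5 + C) prediction grid into a list of detections.

    x and y are offsets inside the cell; w and h are relative to the image.
    Returns (x_center, y_center, w, h, score, class_id) tuples in
    image-relative units for every box whose score reaches the threshold.
    """
    grid = pred.shape[0]
    assert pred.shape[2] == num_boxes * 5 + num_classes
    detections = []
    for row in range(grid):
        for col in range(grid):
            cell = pred[row, col]
            class_probs = cell[num_boxes * 5:]
            class_id = int(np.argmax(class_probs))
            for b in range(num_boxes):
                x, y, w, h, conf = cell[b * 5:b * 5 + 5]
                # Class-specific score: box confidence times class probability.
                score = float(conf * class_probs[class_id])
                if score >= threshold:
                    detections.append((float((col + x) / grid),
                                       float((row + y) / grid),
                                       float(w), float(h), score, class_id))
    return detections


# Toy prediction: 2 x 2 grid, 1 box per cell, 3 classes, one confident box.
pred = np.zeros((2, 2, 1 * 5 + 3))
pred[1, 0, :5] = [0.5, 0.5, 0.4, 0.6, 0.9]  # box in the bottom-left cell
pred[1, 0, 5:] = [0.1, 0.8, 0.1]            # most likely class is 1
print(decode_grid(pred, num_boxes=1, num_classes=3))
```

All boxes and class probabilities come from one forward pass, which is why this family of models suits real-time use.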
Software Used:
1. Python
2. TensorFlow Lite
3. NumPy
4. Keras
5. Matplotlib
6. OpenCV
7. VLC.py
8. ImageAI
9. PHP
10. HTML
Hardware Used:
1. Raspberry Pi
2. USB Camera
Applications & Advantages
For Object Detection:
• Object detection is spreading into a wide range of industries, with use cases ranging from personal
security to productivity in the workplace.
• Object detection is applied in many areas of image processing, including image retrieval, security,
surveillance, automated vehicle systems and machine inspection.
• Significant challenges remain in the field of object detection.
1. Tracking Objects:
• An object detection framework is also used for tracking objects, for instance tracking a ball
during a match in the football World Cup, tracking the swing of a cricket bat, or tracking a person in a video.
• Object tracking has a variety of uses, some of which are surveillance and security, traffic monitoring,
video communication, robot vision and activity.
2. Automated CCTV Surveillance:
• Surveillance is an essential part of security and patrolling.
• CCTV typically runs continuously, so a large amount of storage is needed for the recorded
video. With an object detection system we can automate CCTV so that recording starts only when
objects are detected.
• This avoids repeatedly recording the same image frames, which improves storage
efficiency and reduces the storage requirement.
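The detection-triggered recording idea can be sketched as a small function that decides which frames to keep. This is an illustration only: the per-frame detection flags are assumed to come from whichever detector is in use, and the hold_frames parameter (how long to keep recording after the last detection) is a hypothetical setting, not something the slides specify.

```python
def frames_to_record(detections, hold_frames=2):
    """Return the indices of the frames that should be stored.

    detections holds one boolean per frame, True when the detector found
    at least one object. Recording continues for hold_frames frames after
    the last detection so that clips do not cut off abruptly.
    """
    keep = []
    remaining = 0
    for index, detected in enumerate(detections):
        if detected:
            remaining = hold_frames + 1
        if remaining > 0:
            keep.append(index)
            remaining -= 1
    return keep


flags = [False, False, True, False, False, False, False, True, True, False]
print(frames_to_record(flags))  # [2, 3, 4, 7, 8, 9]
```

In this toy run only six of the ten frames are stored, which is the storage saving the slide describes.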
3. Person Detection:
• Person detection is a necessary and critical task in any intelligent video surveillance system, as it provides the
essential information for semantic understanding of the video recordings.
• Person detection is the computer vision task of locating and tracking people.
It means finding all instances of people present in an image, and it has most
commonly been achieved by searching all locations in the image, at all possible scales, and comparing a small
region at each location with known templates or patterns of people.
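The exhaustive search over locations and scales can be sketched as a sliding-window generator. The window size, stride and scales below are illustrative assumptions; in a full system each window would then be compared with a template or passed to a classifier, which is omitted here.

```python
def sliding_windows(image_w, image_h, window=64, stride=32, scales=(1.0, 2.0)):
    """Yield (x, y, size) for square windows at every location and scale."""
    for scale in scales:
        size = int(window * scale)
        for y in range(0, image_h - size + 1, stride):
            for x in range(0, image_w - size + 1, stride):
                yield x, y, size


windows = list(sliding_windows(128, 128))
print(len(windows))  # 10
```

Even this tiny 128 x 128 example produces ten candidate regions, and the count grows quickly with image size, which is one reason single-pass detectors are preferred for real-time use.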
4. Vehicle Detection:
• Vehicle detection plays an important part in daily life.
• Using vehicle detection techniques, we can detect the number plate of a speeding car or of a car involved
in an accident.
• This also improves public security and helps reduce the number of crimes committed using cars.
For Gesture Control:
1. Medical Operation:
• Gestures can be used to control the distribution of resources in hospitals, interact with medical
instrumentation, control visualization displays, and help handicapped users as part of their rehabilitation
therapy.
• Some of these concepts have been exploited to improve medical procedures and systems; for example, a
technology that satisfies the “come as you are” requirement, where surgeons control the motion of a
laparoscope by making appropriate facial gestures without hand or foot switches or voice input.
2. Gesture-Based Gaming Control:
• Computer games are a particularly technologically promising and commercially rewarding arena for
innovative interfaces due to the entertaining nature of the interaction.
• In computer-vision-based, hand-gesture-controlled games, the system must respond quickly to user
gestures: the “fast-response” requirement.
• In games, computer-vision algorithms must be robust and efficient, as opposed to applications (such as
inspection systems) that have no real-time requirement and where recognition performance is the highest
priority. Research efforts should thus focus on tracking and gesture/posture recognition with high-frame-rate
image processing.
Advantages:
1. Improve Accuracy:
• The most significant advantage of object detection projects is that they are more accurate than human vision.
• The human brain is astounding, so much so that it can complete pictures based on only a few snippets of
data. But this can sometimes also keep us from seeing what is actually there. The complete picture isn’t always
accurate because human brains make assumptions.
• Object detection also operates at the pixel level, which the human brain can’t process. This allows object
detection projects to provide more accurate results.
2. Deliver Faster Results:
• The human brain works fast and efficiently, but computers are better at multitasking, which permits object
detection projects to deliver quicker results for some applications.
• Using object detection to complete tasks not only delivers results in a fraction of the time but also frees
up valuable time to focus on higher-level tasks that truly require human cognition.
• Kiosks with a gesture control interface are faster than touchscreen kiosks. Since the gesture recognition tools track
movements in real time, the input is much quicker than on touchscreen kiosks.
3. Reduce Cost:
• After an object detection project has been trained, it can repeat the same tasks with minimal cost, and it even
continues to learn while it does that. This saves long hours of manual labour and the related expenses.
• Touch screens usually go through wear and tear over the years due to frequent use. There is also the cost of
keeping the screen clean at all times. With gesture controls, all these costs are eliminated.
4. Provide Unbiased Results:
• When an object detection project looks at an image with a specific goal, it does not consider any information not related
to that goal.
• This lessens the bias that humans might introduce to a process, whether intentionally or unintentionally.
5. Offer a Unique Customer Experience:
• Object detection projects have been used to improve the customer experience both online and in retail stores.
• Object detection can identify products or brands that an individual is most likely to buy via online platforms, based on
images in social media profiles.
• A gesture control interface can be used standalone or combined with other advanced technologies such as voice
recognition to provide users with more interaction options.
• Users can easily switch to speaking with the kiosk if they get tired of gesticulating.
• Flow Chart For Object Detection:
Read Image → Feature Understanding → Feature Matching → Homography
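The final homography stage of the flow chart can be sketched with NumPy alone. This is a hedged illustration, not the project's code: it assumes the feature matching stage has already produced at least four correct point correspondences, and it uses a plain direct linear transform solved by SVD in place of whatever OpenCV routine the project relies on. The function names and the toy points are made up for the example.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate the 3 x 3 homography that maps src points to dst points.

    src and dst are (N, 2) arrays of matched points, N >= 4, assumed to
    contain no outliers. Uses the direct linear transform solved by SVD.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, points):
    """Map (N, 2) points through h using homogeneous coordinates."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Toy correspondences: the "scene" is the object scaled by 2 and shifted.
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = src * 2 + np.array([10, 5])
h = estimate_homography(src, dst)
print(np.round(apply_homography(h, src), 3))
```

With real feature matches some correspondences are wrong, so a robust estimator such as RANSAC would normally wrap a step like this.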
• Working Model:
• Raspberry Pi 3B+ Specifications:
Model – Pi 3 B+
Released – Mar 2018
Architecture – ARMv8-A 64/32-bit
SoC – Broadcom BCM2837B0
CPU – 1.4 GHz ARM Cortex-A53
Cores – 4
GPU – Broadcom VideoCore IV HD 1080p
Memory RAM – 1 GB
Operating System – Primarily Linux based
USB 2.0 Ports – 4
Camera Input – 15-pin CSI (Camera Serial Interface)
Video Output – HDMI, Composite, DSI (Display Serial Interface)
Audio Output – Analog 3.5 mm jack, Digital via HDMI
Storage – Micro SD Slot
Ethernet – 10/100/1000 (max 300) Mbps
Onboard WiFi – 2.4 GHz and 5 GHz 802.11 b/g/n/ac
Onboard Bluetooth® – 4.2 BLE
Input/Output Pins – 40
Power (less peripherals) – 5 V, 950 mA
Size – 85 mm x 56 mm
• Steps To Set Up Raspberry Pi:
1. Load a 32 GB micro SD card with Raspbian OS.
2. Insert the 32 GB micro SD card into the Raspberry Pi.
3. Power up the Raspberry Pi and connect it to a monitor with an HDMI cable.
4. Connect the laptop and the Raspberry Pi to the same network.
5. On the laptop, open Remote Desktop and enter the IP address of the Raspberry Pi.
6. It will ask for a username and password. Enter the username pi and the password raspberry.
7. A window will open from which you can operate the Raspberry Pi.
8. Write the code and download all the necessary software by typing
commands in the bash terminal.
• Flow Chart for the main program:
Updating & Upgrading the Raspberry Pi OS → Installing Apache web server & PHP → Allowing execution of system commands from PHP → Installing TensorFlow Lite & USB Coral libraries → Installing OpenCV → Training Models & writing the code → Executing the bash script to run the code
Conclusion:
• Object detection and classification can be achieved with the help of different CNN
models. In our case we are using Inception models, specifically Inception V3, and MobileNet
models.
• To reduce the inference time, we use different Inception and MobileNet
models, as each has a different accuracy, inference rate and frame rate.
• To reduce the inference time further, we can use a Coral USB accelerator.
• At the output layer the sigmoid activation function is used, whereas in the hidden layers the ReLU
activation function is used.
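The activation choice in the last point can be illustrated with a tiny NumPy forward pass: ReLU in the hidden layer and sigmoid at the output. This is a sketch with made-up layer sizes and random weights, not the Inception or MobileNet models themselves.

```python
import numpy as np


def relu(x):
    """ReLU keeps positive values and zeroes out negative ones."""
    return np.maximum(0, x)


def sigmoid(x):
    """Sigmoid squashes each value independently into the range 0 to 1."""
    return 1.0 / (1.0 + np.exp(-x))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by a sigmoid output layer."""
    hidden = relu(x @ w_hidden + b_hidden)
    return sigmoid(hidden @ w_out + b_out)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 samples with 8 features (made-up sizes)
scores = forward(x,
                 rng.normal(size=(8, 16)), np.zeros(16),
                 rng.normal(size=(16, 3)), np.zeros(3))
print(scores.shape)  # (4, 3)
```

Because sigmoid scores each output independently, several classes can be reported as present in the same frame, which fits detection tasks with more than one object.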
References:
1. Agarwal, S., Awan, A., and Roth, D. (2004). Learning to detect objects in images via a sparse, part-based
representation. IEEE Trans. Pattern Anal. Mach. Intell. 26, 1475–1490. doi:10.1109/TPAMI.2004.108
2. Alexe, B., Deselaers, T., and Ferrari, V. (2010). “What is an object?,” in Computer Vision and Pattern
Recognition (CVPR), 2010 IEEE Conference on (San Francisco, CA: IEEE), 73–80.
doi:10.1109/CVPR.2010.5540226
3. Azizpour, H., and Laptev, I. (2012). “Object detection using strongly-supervised deformable part models,” in
Computer Vision-ECCV 2012 (Florence: Springer), 836–849.
4. Rohini M, Abishek Leo Kingston. j, Shriram G S, Siva Sankaran & Vasuki G, “Hand Gesture Recognition Using
OpenCV”, International Journal of Scientific Research in Science.
5. Tensorflow.org, Wikipedia, Google.com, Opencv.org, Raspberrypi.org, Python.org.
