
Synopsis on

VISUAL WRITING USING CV


Submitted in partial fulfillment of the requirements of the

BACHELOR OF ENGINEERING
IN
COMPUTER ENGINEERING

By

HARSH BADAPURE – 02
ABHISHEK NAGARE -53
ROHAN PATEL -61
ANIRUDH NAVALE -54

Name of the Mentor


Prof. SAMITA PATIL

Department of Computer Engineering


Shivajirao S. Jondhale College of Engineering.
Dombivli (E)
(Affiliated to University of Mumbai)
(AY 2023-24)
CERTIFICATE

This is to certify that the Synopsis on Mini Project entitled "VISUAL
WRITING USING CV" is a bonafide work of Harsh Badapure (02),
Abhishek Nagare (53), Rohan Patel (61), and Anirudh Navale (54),
submitted to the University of Mumbai in partial fulfillment of the requirement
for the award of the degree of "Bachelor of Engineering" in "Computer
Engineering".

Prof. Samita Patil


Mentor

Prof. Manisha Sonawane Dr. Uttara Gogate Dr. Pramod Rodge


Project Coordinator Head of Department Principal
Mini Project Approval

This Synopsis on Mini Project entitled "VISUAL WRITING USING CV" by Harsh

Badapure (02), Abhishek Nagare (53), Rohan Patel (61), and Anirudh Navale

(54) is approved for TE (Computer Engineering) Semester V during the

academic year 2023-24.

Examiners

1………………………………………
(Internal Examiner Name & Sign)

2…………………………………………
(External Examiner name & Sign)

Date:

Place:
Contents

Abstract I

Acknowledgments II

List of Abbreviations III

List of Figures IV

1 Introduction
1.1 Introduction 1
1.2 Motivation 4
1.3 Problem Statement & Objectives 4
1.4 Organization of the Report 5

2 Literature Survey

2.1 Survey of Existing System/SRS 6


2.2 Literature survey table 7

3 Proposed System

3.1 Introduction 8
3.2 System Architecture 9
3.3 Process Diagram 10
3.4 System Flowchart 11
3.5 Details of Hardware & Software 12
3.6 Application 13

4 References

4.1 Reference 14
ABSTRACT

In this project we have implemented "VISUAL WRITING" using OpenCV and an object
detection algorithm. Virtual writing and control systems have been a challenging research
area in the fields of image processing and pattern recognition in recent years. They contribute
greatly to the advancement of automation and can improve the interface between man and
machine in numerous applications. Several research works have focused on new techniques
and methods that reduce processing time while providing higher recognition accuracy. Given
real-time webcam data, this Jamboard-like Python application uses the OpenCV library to
track an object of interest (a human palm/finger in this case) and allows the user to draw by
moving the finger, which makes drawing simple things both fun and interesting.

I
ACKNOWLEDGEMENT

We sincerely wish to thank our project guide, Prof. Samita Patil, whose encouraging and inspiring guidance
helped us make our project a success. Her expert guidance, kind advice and timely motivation helped us see
our project through. We express our deepest thanks to our HOD, Dr. Uttara Gogate, who benevolently made
the computer facilities in our laboratory available to us and helped make the project a true success. Without
her kind and keen co-operation our project would have been stifled to a standstill. Lastly, we would like to
thank our college Principal, Dr. Pramod Rodge, for providing lab facilities and permitting us to go on with
our project. We would also like to thank our colleagues who helped us directly or indirectly during our
project. Thank you.

Harsh Badapure ………………


Abhishek Nagare ……………..
Rohan Patel ……………………
Anirudh Navale ……………….

II
List of Abbreviations

Sr. no | Abbreviation | Definition

1 | OpenCV | Open-source computer vision library

2 | RAM | Random access memory

3 | HCI | Human-computer interaction

4 | ML | Machine learning

5 | RCNN | Region-based convolutional neural network

III
List of Figures

Fig.no Figure Name Page no

3.1 ARCHITECTURE 09

3.2 PROCESS DESIGN 10

3.3 FLOWCHART 11

IV
1. Introduction

1.1 Introduction:
VISUAL WRITING using AI, OpenCV and MediaPipe is an application that tracks the
movement of an object. Using this tracking feature, the user can draw on the screen by moving
the object (in our project, the human hand) in the air in front of the webcam. The real-time
webcam data generated by tracking the object's movement lets the user draw simple things,
which is both interesting and challenging.

OpenCV (Open-Source Computer Vision) is a library of programming functions aimed
mainly at computer vision. Put simply, it is a library used for image processing.

In Python, `cv2` refers to the OpenCV (Open Source Computer Vision Library) module.
OpenCV is a popular open-source computer vision and image processing library that provides
tools and functions for a wide range of image and video analysis tasks. It is widely used in
computer vision applications, image and video manipulation, object detection, facial
recognition, and more.

The `cv2` module in Python provides a Python interface to the OpenCV library, allowing
developers to access and use OpenCV's functionality within Python programs. You can use it
to perform tasks such as:

1. Image Manipulation: You can load, display, save, and manipulate images using OpenCV.
It provides various functions for resizing, cropping, rotating, and filtering images.

2. Video Analysis: OpenCV allows you to capture, process, and analyze video streams from
webcams or video files. You can perform tasks like object tracking, motion detection, and video
stabilization.

3. Image Processing: It offers a wide range of image processing techniques, including filtering,
edge detection, thresholding, and morphological operations.

4. Feature Detection: OpenCV provides tools for detecting and matching features in images,
such as corners, keypoints, and descriptors. This is useful for tasks like object recognition and
tracking.

1
5. Object Detection: You can use OpenCV to perform object detection and recognition tasks.
It supports various pre-trained models for tasks like face detection, pedestrian detection, and
more.

6. Machine Learning: OpenCV has machine learning libraries for tasks like classification,
clustering, and regression. It also supports integration with popular machine learning
frameworks like TensorFlow and PyTorch.

7. Camera Calibration: OpenCV helps in camera calibration, which is crucial for tasks like
3D reconstruction from multiple images.

8. Computer Vision Algorithms: OpenCV provides a collection of computer vision


algorithms for tasks such as optical flow, stereo vision, and camera pose estimation.

It is mainly used for all operations related to images.

What it can do:

1. Read and write images.

2. Detect faces and their features.

3. Detect different shapes such as circles and rectangles in an image, e.g. detecting
coins in images.

4. Recognise text in images, e.g. reading number plates.

5. Modify the quality or colour of an image.

6. Develop augmented-reality apps.

OpenCV supports roughly all major programming languages and is most commonly used
with Python and C++. It can read or write an image and modify it, e.g. converting a coloured
image to grayscale, binary, or HSV. OpenCV is also open source.

2
MediaPipe is Google's open-source, graph-based framework for media processing. It mainly aims at
making media processing easier by providing machine-learning features and integrated computer vision.
One of its notable applications is face detection.

In Python, `mediapipe` is an open-source library developed by Google that provides a framework for
building various applications related to media and machine learning, including media processing (such as
video and audio analysis) and computer vision tasks. The abbreviation `mp` is commonly used as an alias
when importing the `mediapipe` library to make it more concise and convenient to use in your code.
`mediapipe` offers a range of pre-built solutions and pipelines for various tasks, making it easier to
develop applications involving real-time data processing and analysis. Some of the common use cases for
`mediapipe` include:

1. Hand Tracking: `mediapipe` can track hand movements and gestures in real-time, making it useful for
applications like virtual touchscreens and gesture recognition.
2. Face Detection and Recognition: It provides models for face detection, facial landmark detection, and
face recognition, which can be used in applications like augmented reality and emotion recognition.
3. Pose Estimation: `mediapipe` can estimate human body poses, making it valuable for fitness and
wellness applications, as well as motion capture.
4. Objectron: This feature allows 3D object detection and tracking, which can be useful for applications
involving object recognition and augmented reality.
5. Selfie Segmentation: It can segment the foreground and background in real-time, which is used in
various photo and video editing applications.
6. Holistic: Holistic is a comprehensive solution that combines face, hand, and pose detection, making it
suitable for applications requiring full-body tracking and analysis.
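The hand-tracking use case above is the one this project relies on. The sketch below shows the shape of such a pipeline; `track_fingertip` and `landmark_to_pixel` are illustrative helper names of our own, not MediaPipe API, and the MediaPipe objects are passed in as parameters so the sketch stays self-contained.

```python
def landmark_to_pixel(x_norm, y_norm, frame_w, frame_h):
    """MediaPipe reports landmark coordinates normalised to [0, 1];
    convert them to pixel coordinates on the webcam frame."""
    return int(x_norm * frame_w), int(y_norm * frame_h)

def track_fingertip(frame_bgr, hands, mp_hands):
    """Return the index fingertip position in pixels, or None.
    `hands` is a mediapipe.solutions.hands.Hands instance and
    `mp_hands` is the mediapipe.solutions.hands module."""
    import cv2
    h, w = frame_bgr.shape[:2]
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
    result = hands.process(rgb)
    if not result.multi_hand_landmarks:
        return None
    tip = result.multi_hand_landmarks[0].landmark[
        mp_hands.HandLandmark.INDEX_FINGER_TIP]
    return landmark_to_pixel(tip.x, tip.y, w, h)

print(landmark_to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
```

In the real application this function would be called once per webcam frame, and the returned pixel position fed to the drawing canvas.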

3
1.2 Motivation:

There is currently a large post-pandemic movement in the technology field to streamline
processes and improve current methods of conducting business, teaching, and so on. As with
any work, every effort is difficult without an "information at your fingertips" type of
application. Well-designed software can help people in different business fields, letting users
create specific highlights or draw and write things just by using hand movement. Several
problems with the current management system have been identified: the current model allows
the user to write on the canvas only by using an input device, i.e. a trackpad or mouse.
Additionally, the proposed system should improve communication between user and audience
by creating work in real time and efficiently reducing the time required. Further, it will help
increase interest among third parties and lets us unlock one more astonishing aspect of
technology.

1.3 Problem Statement:

During the pandemic the world was forced to live in quarantine, yet the deadly virus still
wasn't able to stop the busy world: work shifted almost 90% online. As tough as it was for
everyone to adjust completely to the world of technology, a lot of people got used to it quite
soon.

With teaching also moving online, it was tough for teachers as well as students to go from
blackboards to PPTs. As a result, big meeting platforms like Google Meet, Zoom, etc.
introduced a whiteboard feature, where the meeting organizer can use an online whiteboard
with input devices like trackpads or a mouse. As time goes by, this method turns out to be
very time-consuming and it lags a lot.

4
1.4 Organization of Project:
• Conducting a Survey
• Understanding Problem Statement
• Finalizing Proposed Solution
• Implement code
• Debug the code
• Execute code for the required output
• Implementation of code in real life as a working example
• Prepare a report

5
2. Literature Survey

2.1 Survey of Existing System / SRS:

In the past decades, gestures were usually identified by wearing data gloves to obtain the
angles and positions of each joint of the hand. Several papers and projects have targeted the
issue of hand-gesture recognition. However, glove-based approaches are difficult to use
widely due to the cost and inconvenience of wearing the sensor. In contrast, non-contact
visual inspection methods have the advantage of low cost and comfort for the human body,
and are the currently popular gesture-recognition methods. Chakraborty proposed skin-colour
models utilizing image pixel distribution in a given colour space, which can significantly
improve detection accuracy in the presence of varying illumination conditions. However, it
was difficult to achieve the desired results using the model-based methods because of light
sensitivity during the imaging process. Algorithm-based non-contact visual inspection
methods were also used to conduct gesture recognition. Computer vision is a rapidly growing
field, partly as a result of both cheaper and more capable cameras, partly because of
affordable processing power, and partly because vision algorithms are starting to mature. By
using hand gestures a user can communicate more information in less time, so human-computer
interaction (HCI) technology has great utility for improving the interface between users and
computers. OpenCV itself has played a role in the growth of computer vision by enabling
thousands of people to do more productive work in vision, with its focus on real-time
processing. It gives the reader a boost in implementing computer vision and machine learning
algorithms by providing many working coded examples to start from, and allows the reader
to do interesting and fun things rapidly in computer vision.

6
2.2 Literature Survey Table

Author | Published | Focus on | Advantage

S.U. Saoji, Nistha Dua, Akash Kumar C, Bharat Phogat | May 2019 | Data learning, data similarity, and a set of algorithms best suited to the type of data being obtained | The project takes advantage of this gap by concentrating on the development of a motion-to-text converter that might be used as software for intelligent wearable gadgets that allow users to write from the air

Nimisha K P and Agnes Jacob | July 2016 | Extracting features from accessible data and improving feature extraction using algorithms | The key advantage of this technique is the short response time

Kavitha and Tejaswini | Feb 2014 | Detecting the shape of the hand | Works well for tracking a large object

Rakibe and Patil | Dec 2012 | New algorithm based upon the background subtraction algorithm | Removes the noise and solves the background-interruption difficulty

Wang and Zhao | October 2010 | Background subtraction technique | Detection of moving objects

7
3. Proposed System

3.1 Introduction:

The objective is to create a free space where one can draw in the air freely. The camera
detects the fingertip and tracks its motion across the screen. Whenever the hand comes in
front of the camera, the first thing to do is detect the fingertip; there are various ways of
doing this. We aim to develop a system which can accurately detect fingertips. First we
detect the whole hand, and then region segmentation is done. Region segmentation is a
two-step approach comprising skin segmentation and background subtraction. This system
will work accurately in real time. For background subtraction we may use Faster R-CNN
methods. Determining the centre of gravity is important, as it is used to detect particular
hand gestures for the operations to be performed. The proposed system aims to use two
algorithms for centroid calculation and then take the average of both as the final result. A
distance transformation is applied, and the pixel with the highest intensity is taken as the
centre of gravity.

8
3.2 System Architecture:

Fig. 3.1. Architecture

The architecture has two sides, i.e. a client side and a server side. On the client side, the
machine reads the input and prepares frames from the client's webcam. The prepared frames
are then sent to the server side for further execution. On the server side, the server accepts
the frames from the client, detects the contour in the accepted frames, processes them, and
shows the output live.

9
3.3 Process Design

Fig 3.2. Process Design

When the user shows the palm to the webcam, the system detects the contour, focuses on the
fingertip and starts tracking it. From start point to end point it captures the hand trajectory
and recognises the gesture. After recognition it creates live output on both screens, i.e. the
canvas and the mapped web screen.

10
3.4 System Flowchart:

Fig. 3.3. Flowchart

When the program is executed, it first detects the palm on screen through the webcam.
The user has to select a tool, that is, a colour in this case; if none is selected it defaults
to blue. Once a tool is selected, the program starts tracking the hand trajectory and draws
it on the side canvas accordingly.

11
3.5 Details of Hardware and Software:

• The minimum hardware requirements to execute the system are as follows:


o Processor – Intel I5
o RAM – 4GB
o Storage – 1GB
o Web Camera
• The minimum software requirements to execute the system are as follows:
o Operating system – Windows 10
o Programming Language – python
o Front End – Python Tkinter and OpenCV
• The technology used to execute the system are as follows:
o Mediapipe
o OpenCV
o NumPy

12
3.6 Application:

Visual writing using computer vision (CV) involves the use of computer vision techniques to
analyze and generate written content based on visual data. This can have various practical
applications in different fields. Here are some applications of visual writing using CV:
1. Image Captioning: Computer vision can be used to generate descriptive captions for
images. This is useful in social media, content creation, and accessibility for visually
impaired individuals.
2. Automatic Report Generation: In industries like healthcare, CV can be used to
analyze medical images and automatically generate written reports or summaries,
saving time for medical professionals.
3. Visual Assistants: CV can power visual assistants that provide real-time descriptions
of the surroundings for individuals with visual impairments or tourists exploring new
places.
4. Content Generation: Content creators can use CV to analyze images and generate
written content for articles, blogs, or marketing materials.
5. Product Descriptions: E-commerce platforms can use CV to automatically generate
product descriptions from images, making it easier to manage large catalogs.
6. Social Media Posts: CV can assist users in generating captions for their photos on
social media, improving engagement and accessibility.
7. Visual Search: In e-commerce, users can search for products using images, and CV
can help generate textual descriptions for these visual queries.
8. Automated Image Metadata: CV can be used to generate metadata for image files,
making it easier to organize and search for images in digital archives.
9. Forensic Analysis: In law enforcement, CV can analyze images and generate reports
for evidence presented in court cases.
10. Agricultural Monitoring: Farmers can use CV to analyze drone or satellite imagery
to monitor crop health and generate reports on crop conditions.

13
4. References

[1] Pranavi Srungavarapu, Eswar Pavan Maganti, Srilekkha Sakhamuri, Sai Pavan
Kalyan Veerada, and Anuradha Chinta, "Virtual Sketch using OpenCV," ISSN: 2278-3075
(Online), Volume-10, Issue-8, June 2021.

[2] Prof. S.U. Saoji, Nishtha Dua, Akash Kumar Choudhary, and Bharat Phogat, "Air
Canvas Application using OpenCV and NumPy in Python," e-ISSN: 2395-005.

[3] Saira Beg, M. Fahad Khan, and Faisal Baig, "Text Writing in Air," Journal of
Information Display, Volume 14.

[4] Sidra Mehtab and Jaydip Sen, "Object Detection and Tracking Using OpenCV in
Python."

[5] Neeraj Bhardwaj, Subhash Chand Agrawal, and Rajesh Kumar Tripathi, "Virtual
Drawing: An Air Paint Application," IEEE.

[6] R. S. Jadon and G. R. S. Murthy, "A Review of Vision Based Hand Gestures
Recognition," International Journal of Information Technology and Knowledge
Management, vol. 2(2), pp. 405-410, 2009.

[7] Nadia R. Albelwi and Yasser M. Alginahi, "Real-Time Arabic Sign Language (ArSL)
Recognition," International Conference on Communications and Information
Technology, 2012.

[8] Geetha M. and Manjusha U. C., "A Vision Based Recognition of Indian Sign
Language Alphabets and Numerals Using B-Spline Approximation," International
Journal of Computer Science and Engineering (IJCSE), 2013.

[9] Adithya V., Vinod P. R., and Gopalakrishnan U., "Artificial Neural Network Based
Method for Recognition of Indian Sign Language," IEEE, 2013.

[10] P.V.V. Kishore, P. Rajesh Kumar, E. Kiran Kumar, and S.R.C. Kishore, "Video
Audio Interface for Recognising Gestures of Indian Sign Language," International
Journal of Image Processing (IJIP), Volume 5, Issue 4, 2011, pp. 479-503.

[11] C. Keskin, F. Kraç, Y. E. Kara, and L. Akarun, "Real time hand pose estimation
using depth sensors," in Consumer Depth Cameras for Computer Vision, Springer,
pp. 119-137.

[12] A. Halder and A. Tayad, "Real-time vernacular sign language recognition using
mediapipe and machine learning," 2021, www.ijrpr.com, ISSN vol. 2582, p. 7421.

[13] D. A. Kumar, A. S. C. S. Sastry, P. V. V. Kishore, and E. K. Kumar, "3D sign
language recognition using spatio temporal graph kernels," Journal of King Saud
University-Computer and Information Sciences, 2022.

[14] Z. Ren, J. Yuan, and Z. Zhang, "Robust hand gesture recognition based on
finger-earth mover's distance with a commodity depth camera," in Proceedings of the
19th ACM International Conference on Multimedia, 2011.

[15] A. K. Sahoo, "Indian sign language recognition using machine learning
techniques," Macromolecular Symposia, 2021.
15
