Intelligent Billing System Using Object Detection
Project report submitted to
Visvesvaraya National Institute of Technology, Nagpur in
partial fulfillment of the requirements for the award of
the degree
Bachelor of Technology
in
“Electronics and Communication Engineering”
by
2022
Declaration
We, Neeraj Chidella, N Kalyan Reddy, N Sai Dheeraj Reddy and Maddi
Mohan, hereby declare that this project work titled “Intelligent Billing System Using Object Detection”
Date:
Certificate
This is to certify that the project titled “Intelligent Billing System Using Object Detection”
Dr. A. G. Keskar
Head, Department of Electronics and Communication Engineering
VNIT, Nagpur
Date:
ACKNOWLEDGEMENT
The completion of this project could not have been possible without the support
and participation of a number of people who have always given their valuable
suggestions. We sincerely appreciate the constant guidance, support and inspiration of all
those who are involved in bringing success to this project.
We wish to take this opportunity to acknowledge, with a deep sense of gratitude
and respect, our Project Guide, Dr. Joydeep Sengupta, Assistant Professor, Department
of Electronics and Communication Engineering, VNIT, Nagpur, for his patient guidance,
constructive suggestions and constant encouragement throughout the period of this work.
Also, we would like to thank Dr. A. G. Keskar, Professor and Head of Department,
Department of Electronics and Communication Engineering, VNIT, Nagpur and Dr. P.M.
Padole, Director, VNIT, Nagpur for giving us the golden opportunity to work on this
wonderful project on the topic “Intelligent Billing System Using Object Detection”,
which helped us in doing a lot of research. We sincerely thank them for providing
every facility required for the successful completion of the project, despite the
difficult circumstances of the pandemic.
We would also like to thank our family and friends, since no attempt at any
level can be satisfactorily completed without their support and encouragement.
Finally, we are deeply indebted to everyone mentioned above for their moral support
throughout the project.
Neeraj Chidella
N Kalyan Reddy
N Sai Dheeraj Reddy
Maddi Mohan
ABSTRACT
LIST OF FIGURES
1.1. People waiting in shopping malls and road side shops ………… 2
1.2. Barcode Scanners which are used to scan the barcodes. ……... 2
4.3. List of all classes and their accuracies using YOLOv4 …………………… 22
4.8. Results obtained using YOLOv5 along with confidence scores ………. 25
4.9. Result showing actual class and predicted class using CNN ………… 25
LIST OF ACRONYMS
LIST OF PUBLICATIONS
Abstract - With rapid advances in machine learning, deep learning and artificial
intelligence, improving the billing system is an effective means of reducing wasted
time. Although barcode scanners have become faster than ever, fruits and vegetables
still have to be entered manually into the computer, which is a time-consuming and
hectic process. Vegetable and fruit markets have become an integral part of our lives,
so the environment in such places must be hassle-free and, more importantly, the
billing should be less laborious and more efficient. To overcome the problems
associated with barcodes and RFID tags, we propose an automatic billing system that
detects fruits and vegetables and then displays the final bill. The main objective of
this project is to detect the fruits, display the detected items and then bill them.
To achieve this, we used two different approaches: a fine-tuned Convolutional Neural
Network built from a base model, and, to increase accuracy for real-time object
detection and to display bounding boxes, the state-of-the-art YOLO family implemented
in PyTorch, since YOLO predicts bounding boxes and detects objects faster than other
detection algorithms and is more reliable.
INDEX
ABSTRACT …………………………………………………………… i
LIST OF FIGURES …………………………………………………... ii
LIST OF ACRONYMS ………………………………………………. iv
LIST OF PUBLICATIONS ………………………………………….. v
CHAPTER 1
INTRODUCTION
1.1. Objective
Wastage of time has been a major issue for many years. The world is constantly
evolving, and people are continuously competing with each other to be more productive
in less time. Automation of everyday processes has made life easier for mankind;
monotonous jobs are steadily being replaced by machines powered by artificial
intelligence and machine learning.
Fig 1.1. People waiting in shopping malls and road side shops
Nowadays, people want to spend more time with family and friends and maintain their
peace of mind instead of wasting time on monotonous tasks. One such task is the
billing process currently followed in India.
Billing in India today is mostly based on barcode scanners and RFID tags. This
process is acceptable in sparsely populated areas where malls and markets are not
crowded. In metropolitan cities, fruit and vegetable markets, and other densely
populated areas, however, scanning the barcode of every item in the checkout bag and
then waiting for the final bill takes a long time. This leads to long queues in
supermarkets, which in turn increases waiting times and decreases customer
satisfaction.
Fig 1.2. Barcode Scanners which are used to scan the barcodes
As the market for daily products, vegetables and fruits is huge, unsatisfied
customers will easily change where they shop if this problem persists.
1.2. Problem Flow
● Joseph Redmon, Santosh Divvala, Ross Girshick [1] proposed the algorithm of
YOLO (You Only Look Once) for the purpose of object detection. They used
bounding-box technique to detect objects in an image. This algorithm divides the
image into small grids and then checks whether the object is present in that
particular grid or not.
● Md Jan Nordin, Norshakirah Aziz, Ooi Wei Xin [4] proposed a model to apply
object detection using CNN on grocery objects. They proposed a model for
billing the obtained objects after object detection. They also created a website for
calculating the total bill of the detected objects.
● Chengji Liu, Yufan Tao, Jiawei Liang, Kai Li and Yihang Chen [5] developed an
object detection model using the YOLO algorithm. They proposed a method for
tackling real-world image-capture problems such as blurring and noise using image
degradation models based on YOLO.
● Xiaofeng Ning, Wen Zhu, Shifeng Chen [7] developed a model of image
recognition, object detection and segmentation for images of white background.
They used faster R-CNN algorithm for detection of objects.
● Kavan Patel [8] proposed a model to detect fruits and vegetables using YOLO
algorithm and developed a self-checking portal for the customers.
● Huimin Yuan, Ming Yan [10] proposed an intelligent food identification model
based on Cascade R-CNN and computer vision techniques, achieving good results with
high accuracy.
● Suraj Chopade, Prof. Smita Palnitkar, Sujit Chavan, Anirudha Deshpande [11]
implemented the automated billing system using image processing techniques.
This model detects the objects using image processing and the detected objects
are sent for billing.
● E. K. Jose and Veni. S [12] developed a YOLO based model for finding the open
parking space. They developed this model by detecting multiple objects in the
area using YOLO and hence open parking spaces were found.
● J. Redmon and A. Farhadi [13] introduced the YOLO9000 algorithm, which can
detect over 9000 object categories. They used the COCO dataset for
implementing this algorithm and obtained good results.
● G. M. Farinella, D. Allegra, M. Moltisanti [14] developed a model for
understanding the food items present in an image based on various computer
vision techniques. This model monitors the food intake of a person based on the
images of the food he takes.
● Zhuang-Zhuang Wang, Kai Xie, Xin-Yu Zhang, Hua-Quan Chen, Chang Wen,
Jian-Biao He [15] proposed a model to detect small objects present in an image
using YOLO and Dense Block. They used the Image Super Resolution technique.
1.4. Approach
For simplicity and ease of understanding, the functioning of the system has been
divided into major tasks.
1. First, a dataset should be found or created according to the model being
trained, then pre-processed and annotated, with techniques such as rotation and
augmentation applied.
2. Object detection and recognition of the different classes of objects is the
second task, performed using an object detection model: a CNN, You Only Look Once
version 4 (YOLOv4) and YOLOv5.
3. The third task is to supervise the results obtained after training all three
types of models.
4. The fourth task is to apply the trained weights to a set of detection images
and check the results, confidence scores and mAP (mean average precision) of the
model.
5. The fifth task is to assess the model's performance, make changes to improve
its accuracy, check whether the model satisfies our objectives and verify the
integrity of the detected objects.
6. The sixth task is to map the detected objects to their prices through Python
code.
7. The seventh task is to integrate the model after mapping the detected objects
to prices, producing a final bill by summing over all detected objects.
8. The last task is to build a webpage using PyTorch and Flask, integrating the
billing model through a Python script.
1.5. Datasets
1. The Fruits-360 dataset was used for training the model with the CNN algorithm.
This dataset is taken from Kaggle.
This dataset was chosen because it has a very large set of images, which helps in
achieving high accuracy and hence reliable detection.
● 90483 total images, split into train and test sets. The train set
consists of 67692 images and the test set consists of 22688 images.
2. A Custom Dataset was created in YOLOv4 format by collecting the images from
the Open Images Dataset.
● 3850 total images which are split into train and test sets. The train set
consists of 3650 images and the test set consists of 200 images.
Here are some of the images from this dataset
3. Another custom dataset was created by collecting images from two different
datasets on Kaggle and converting them into YOLO format using Roboflow, as the
images are pre-annotated.
● 1705 total images with 1654 images in the train set and 51 images in the
test set.
CHAPTER 2
METHODOLOGY
Convolution Layer
In this step, important features are extracted from an image. Many filters in a
convolution layer perform the convolution operation. Every image is seen as a matrix
of pixel values.
ReLU layer
The rectified linear unit is abbreviated as ReLU. After the feature maps have been
extracted, they are passed to a ReLU layer. ReLU goes through each element one by
one, converting all negative pixel values to zero. This makes the network non-linear,
and the result is a rectified feature map. For feature detection, the original image
is scanned with numerous convolution and ReLU layers.
Pooling Layer
The pooling layer down-samples each rectified feature map, reducing its
dimensionality while retaining the most important features. The overall pipeline is:
• The image is processed with many convolution and ReLU layers to locate
features.
• Pooling layers with various filters are used to identify specific parts of the
image.
• The pooled feature map is flattened and sent to a fully connected layer to
classify the image.
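The three operations above (convolution, ReLU, max pooling) can be illustrated with
a minimal NumPy sketch. This is a didactic toy, not the project's fine-tuned CNN;
real training used a deep learning framework:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Set every negative activation to zero, keeping positives unchanged."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Down-sample by taking the maximum over non-overlapping size x size blocks."""
    h, w = x.shape
    return x[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Tiny demo: a 4x4 "image" pooled down to 2x2 block maxima.
img = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool(img)  # -> [[5, 7], [13, 15]]
```

Stacking `conv2d`, `relu` and `max_pool` in sequence mirrors the layer order
described in this section.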
2.1.2. Object Detection Using YOLO
● Residual blocks
● Bounding box regression
● Intersection Over Union (IOU)
Residual blocks
The image is divided into an S x S grid, and bounding boxes and class probabilities
are predicted for each grid cell. Image classification and object localization are
applied to each cell, and each cell is assigned a label Y. The algorithm then goes
through the cells one by one, marking the labels of cells that contain objects along
with their bounding boxes; cells that contain no object are labelled zero.
pc: indicates whether an object is present in the grid cell; pc = 1 if present,
else pc = 0.
bx, by, bh, bw: the coordinates of the bounding box of the object (if present).
c: the class of the object (for example, person, fruit, etc.).
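The label vector just described can be sketched as a small helper. The function name,
the three-class setup and the sample coordinates are hypothetical, chosen only to
illustrate the layout y = [pc, bx, by, bh, bw, c1..cn]:

```python
def make_grid_label(object_present, box=None, class_id=None, num_classes=3):
    """Build the YOLO-style label y = [pc, bx, by, bh, bw, c1..cn] for one grid cell.

    box is (bx, by, bh, bw) in grid-relative coordinates; class_id indexes the
    one-hot class vector.
    """
    if not object_present:
        return [0.0] * (5 + num_classes)  # pc = 0; the remaining values are ignored
    one_hot = [0.0] * num_classes
    one_hot[class_id] = 1.0
    return [1.0, *box, *one_hot]

# Cell containing an object of class 1, centred in the cell.
y = make_grid_label(True, box=(0.5, 0.5, 0.2, 0.3), class_id=1)
# Empty cell: pc = 0 and every other entry is "don't care".
y_empty = make_grid_label(False)
```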
The concept of intersection over union (IOU) describes how boxes overlap in object
detection. YOLO uses IOU to produce an output box that properly surrounds the
object. Each grid cell predicts bounding boxes and their confidence scores. The IOU
is 1 if the predicted and actual bounding boxes are identical. This approach
removes predicted bounding boxes that do not match the actual box.
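IOU as described above is straightforward to compute. A minimal sketch, assuming
boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero so disjoint boxes give no intersection area.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

iou((0, 0, 2, 2), (0, 0, 2, 2))  # identical boxes -> 1.0
iou((0, 0, 2, 2), (1, 1, 3, 3))  # partial overlap -> 1/7
```

Identical boxes give IOU = 1, exactly as stated above, and the score falls toward 0
as the overlap shrinks.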
ACCURACY IMPROVEMENT
When plain bounding boxes are used for object detection, each grid cell can detect
only one object. To detect several objects, anchor boxes are used; any number of
anchor boxes can be used for a single image to detect multiple objects.
2.1.3. Object Detection Using YOLOv5
One of the object detection techniques that uses regression is You Only Look
Once (YOLO). By running the method a single time, all the objects of the various
classes in the image are detected, and bounding boxes are drawn around them. YOLO is
one of the most efficient object detection techniques. YOLOv4 is one of the fastest,
though less precise, object detection algorithms; YOLOv5, the most recent version,
has very high accuracy. It has been trained to recognise objects from 80 different
categories.
Architecture of YOLOv5
Figure 2.4 depicts the Yolov5 network architecture. Yolov5 was chosen as our initial
learner for three reasons. To begin, Yolov5 combined the cross stage partial network
(CSPNet) into Darknet, resulting in the creation of CSPDarknet as the network's
backbone. CSPNet solves the problem of recurrent gradient information in large-scale
backbones by including gradient changes into the feature map, reducing model
parameters and FLOPS (floating-point operations per second), ensuring inference speed
and accuracy while simultaneously reducing model size. In the detection of fruits and
vegetables or grocery, speed and accuracy are critical, and the size of the model impacts
its inference efficiency on resource-limited edge devices. Second, to improve
information flow, the Yolov5 used a path aggregation network (PANet) as its neck.
PANet uses a new feature pyramid network (FPN) topology with an improved bottom-up
approach to improve low-level feature propagation. Simultaneously, adaptive feature
pooling, which connects the feature grid to all feature levels, is employed to ensure that
meaningful information from each feature level reaches the next subnetwork. PANet
improves the use of precise localization signals in lower layers, which can significantly
improve the object's location accuracy. Finally, Yolov5's head, the Yolo layer, generates
three various sizes of feature maps (18 18, 36 36, 72 72) to provide multi-scale
prediction, allowing the model to handle tiny, medium, and large objects.
Fig.2.4. Architecture of YOLOv5 Showing the various layers
The main advantages of using YOLOv5 over earlier YOLO algorithms are:
1) Reduced size: the model is approximately 80 percent smaller than earlier YOLO
models.
2) Speed: YOLOv5, the latest YOLO algorithm, is reported to be up to 150 percent
faster than earlier YOLO versions.
2.2. Detecting grocery objects with CNN
As the fruits-360 dataset from kaggle was ordered, pre annotated and well built
dataset, A basic convolutional neural network was implemented using pre annotation
techniques like Rotation, Augmentation and Splitting. After training the dataset, an
accuracy of 98 percent was achieved on the split verification set. An API model was
incorporated to test the results with outdoor real - life images, but the CNN model wasn't
able to give good results with these images.
Basic CNN was not able to give good results with real life images so after a
thorough research, Yolo (You only look once) detection algorithm is chosen and a proper
dataset is formed using BBox annotation. As images are gathered randomly from Google
images and proper Bbox annotations were given. Though an accuracy of 70 percent was
achieved, many images were found to be unrelated to the dataset hence is the reason for
70 percent.
CHAPTER 3
SOFTWARE USED
3.1. Dataset Creation Tools
After importing the images from the Open Images dataset, to train the YOLO model we
need a .txt file for each image containing information about the bounding box of
each and every object inside that particular image. This annotation is done using
the BBox labelling tool.
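The resulting annotation is one plain-text file per image, one line per object, in
the standard Darknet/YOLO format. The class ids and coordinates below are
hypothetical examples; all coordinates are normalized to the image width and height:

```text
# <class_id> <x_center> <y_center> <width> <height>
0 0.512 0.448 0.310 0.275
2 0.143 0.702 0.095 0.120
```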
· Darknet:
Darknet is an open-source neural network framework used to train our custom model
on our custom datasets using YOLO. The framework contains the source files that
help in training the model on the provided dataset, with the configuration settings
required for the given dataset and model.
· Google Colab:
Colab is a coding environment that runs entirely on the cloud. It helps us execute
our Python code, and is used to clone the Darknet framework and use it with our
custom dataset and model. It provides GPUs and TPUs free of cost for training.
Our trained model is then deployed into a web app built using Flask.
Here, OpenCV is used to read the given model weights and predict the output for
the given input image. A user-friendly GUI, developed using the above-mentioned
software framework, takes an image as input and gives the final bill of all items
as output. The Atom code editor was used for writing all of this code.
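The web app just described can be sketched with Flask. This is a minimal outline
under assumptions: the template names follow Appendix A (page.html, success.html),
while `run_detector` is a hypothetical placeholder standing in for the OpenCV/YOLO
inference code, not the project's actual function:

```python
from flask import Flask, request, render_template

app = Flask(__name__)

def run_detector(image_bytes):
    # Placeholder: in the real app this would decode the image with OpenCV,
    # run the trained YOLO weights, and return the detected class names.
    return []

@app.route("/", methods=["GET"])
def index():
    # Landing page with the image-upload form.
    return render_template("page.html")

@app.route("/bill", methods=["POST"])
def bill():
    # Receive the uploaded cart image, detect items, and render the bill.
    image = request.files["image"].read()
    items = run_detector(image)
    return render_template("success.html", items=items)

if __name__ == "__main__":
    app.run(debug=True)
```

The detected item list returned here would feed the price-mapping step described in
the Approach section to produce the final bill.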
CHAPTER 4
RESULTS AND DISCUSSIONS
Fig 4.3. List of all classes and their accuracies using YOLOv4
4.3. Website Results
4.4. Results of Object Detection using YOLOv5
Fig 4.8. Results obtained using YOLOv5 along with confidence scores
Fig 4.9. Result showing actual class and predicted class using CNN
4.6. Model training metrics
CHAPTER 5
CONCLUSION AND FUTURE SCOPE
5.1. Conclusion
Vegetables, fruits and groceries are daily commodities, so the experience of
purchasing them should be hassle-free. Considering the advances in deep learning and
computer vision, we hope our idea can transform the billing process in India and
bring change to the existing methods.
In this project, we have proposed a novel solution to reduce wait times, long queues
and the required workforce in supermarkets. An image of the final cart with all the
products bought by a customer is fed as input to the website; every fruit, vegetable
or grocery item is identified and tallied, and a final bill is produced for the
customer after summing. Each product no longer needs to be scanned with a barcode or
RFID tag; instead, an image of all the cart products can be fed as input and the
customer can directly access the bill.
The proposed web application is designed to make billing easier for everyone by
creating a hassle-free experience and reducing wait times. It is also hands-free, so
the risk of transmitting any kind of virus decreases, which is good for customers'
health. The number of people working at the counters can be reduced with this
method, as fewer counters would be needed due to less demand for billing. Prices of
vegetables and fruits could also be standardized if this method were employed, as
everyone would follow the same dataset and could use the same prices, although the
dataset can also be modified according to the shopkeeper's needs.
The main objective of reducing wait times and speeding up the billing process is
achieved through this project.
5.2. Future Scope
In the coming years, we would like to fix some of the shortcomings of the project
and improve the detection accuracy of our model. We would also like to create a
separate, properly annotated dataset with a greater number of classes and many more
images per class, so that the model can learn distinguishing features more easily
and give more accurate results.
We would also like to extend this project by adding a few more features, such as
video input, which would take a video feed and give live billing output. We would
also like to integrate available offers, discounts and taxes into the final bill.
An app integrating the detection and billing system is proposed for the near
future. The dataset also needs improvement, which would in turn improve the model's
accuracy.
Once its accuracy is improved and it is integrated with mobile applications, this
model can be taken to market for business purposes.
In our observation, one of the major problems during training is computational
time. As technology improves, the time taken to train and test the model should
decrease substantially; we could then incorporate many more classes with a huge
dataset and train more complex and improved versions of the YOLO algorithms, making
the model more accurate while remaining usable with fewer resources. This project
could then be used efficiently even on small devices, for example in small stores
where the owner cannot afford RFID tags or barcode scanners and can directly use a
mobile phone to bill the items with a simple photo.
APPENDIX A
The GUI created is integrated with the trained model using a Python (app.py)
backend. The templating of the GUI is done using HTML (success.html, page.html),
and styling is done using CSS (styles.css) and Bootstrap. All the code files are
listed below.
app.py
page.html
success.html
styles.css
REFERENCES
[1] Joseph Redmon, Santosh Divvala, Ross Girshick. “You Only Look Once:
Unified, Real-Time Object Detection”. The IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2016, pp. 779-788.
[3] Marcus Klasson, Cheng Zhang, Hedvig Kjellstrom. “A hierarchical grocery store
image dataset with visual and semantic labels”.
[4] Md Jan Nordin, Norshakirah Aziz, Ooi Wei Xin. “Food image recognition for
price calculation using convolutional neural network”.
[5] Chengji Liu, Yufan Tao, Jiawei Liang, Kai Li and Yihang Chen. “Object detection
based on YOLO network”. 2018 IEEE 4th Information Technology and Mechatronics
Engineering Conference (ITOEC 2018).
[7] Xiaofeng Ning, Wen Zhu, Shifeng Chen. “Recognition, Object Detection and
Segmentation of white background photos based on Deep Learning”.
[8] Kavan Patel. “Fruits and vegetable detection for POS with Deep Learning”.
[10] Huimin Yuan, Ming Yan. “Food object recognition and intelligent billing system
based on Cascade R-CNN”. 2020 International Conference on Culture-oriented Science
and Technology (ICCST).
[11] Suraj Chopade, Prof. Smita Palnitkar, Sujit Chavan, Anirudha Deshpande.
“Automated Super Shop using image processing (Python)”. International Journal of
Future Generation Communication and Networking Vol. 13, No. 2s, (2020), pp.
382–388.
[12] E. K. Jose and Veni. S. “YOLO classification with multiple object tracking for
vacant parking lot detection”. Journal of Advanced Research in Dynamical and Control
Systems, vol. 10, pp. 683-689, 2018.
[13] J. Redmon and A. Farhadi. “YOLO9000: Better, Faster, Stronger”. 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017,
pp. 6517-6525.
[15] Zhuang-Zhuang Wang, Kai Xie, Xin-Yu Zhang, Hua-Quan Chen, Chang Wen,
Jian-Biao He. “Small-Object Detection Based on YOLO and Dense Block via Image
Super-Resolution”. IEEE Access, vol. 9, pp. 56416-56429.