2023 International Conference on Network, Multimedia and Information Technology (NMITCON)

Autonomous Sorting with 6 DOF Robotic Arm using Machine Vision

1st Nikhil S Patil
Dept of Manufacturing Engg and Industrial Mgmt
COEP Technological University
Pune, India
patilns21.mfg@coeptech.ac.in

2nd M. D. Jaybhaye
Dept of Manufacturing Engg and Industrial Mgmt
COEP Technological University
Pune, India
mdj.mfg@coeptech.ac.in

Abstract—Robotic automation is used extensively in industrial and logistical applications. To increase operational effectiveness and minimize human involvement, robots must be able to autonomously classify objects in cluttered surroundings. The use of robotic arms for autonomous sorting tasks has therefore gained significant attention for its potential to enhance efficiency and productivity in various industries. The robotic system presented here is based on the Niryo Ned robotic arm combined with an overhead USB webcam. For object detection, YOLO v7 (You Only Look Once version 7) and Faster R-CNN (Region-based Convolutional Neural Networks) models were trained on a custom dataset of differently colored cubes. After training, YOLO v7 gave a better accuracy of 98.4% compared to 94.1% for Faster R-CNN. As a result, YOLO v7 was chosen as the preferred model for autonomously sorting objects with the Niryo Ned robot. Further, by adding novel features to the trained model, autonomous sorting was successfully completed by the Niryo Ned robot.

Keywords—Machine vision, deep learning, object detection, autonomous sorting.

I. INTRODUCTION

In recent years, automation and artificial intelligence have brought about significant advancements in various industries, including manufacturing and logistics [1]. Among the areas of focus, autonomous sorting systems have gained substantial interest due to their potential to enhance efficiency, accuracy, and productivity when handling large quantities of objects [2]. To address the challenges associated with object recognition and sorting tasks, the use of machine vision techniques and robotics has emerged as a promising solution. This paper concentrates on implementing an autonomous sorting system that combines a 6 Degree of Freedom (DOF) robot with the latest object detection algorithms, specifically YOLO v7 and Faster R-CNN [3]. The objective is to leverage these advanced algorithms to achieve real-time and precise object detection within a given environment.

Machine vision is an interdisciplinary field encompassing various techniques that enable machines to capture, process, and interpret visual data [4]. Combined with deep learning algorithms, these techniques have achieved remarkable success in tasks such as object detection and recognition. YOLO v7 [5] and Faster R-CNN [6] are two algorithms highly regarded for their speed, accuracy, and precision in object detection. Due to its precision and speed, the YOLO v7 object detection algorithm excels in real-time applications: it divides an image into a grid using a single neural network and predicts bounding boxes and class probabilities simultaneously. Faster R-CNN, in contrast, is a region-based method that creates region proposals and performs object detection within these regions, reaching high precision. This study focuses on developing an autonomous sorting system that takes advantage of machine vision techniques, a 6 DOF robot, and these two object detection algorithms, with OpenCV [7] used alongside both. By adding novel features using OpenCV, the system aims to achieve accurate, real-time object detection within a given environment.

The remainder of this paper is organized as follows: Section II gives an overview of related work and previous research in the field of autonomous sorting. Section III presents the autonomous sorting system. Section IV discusses the experimental results and implementation. Section V concludes the research with a summary of findings and implications.

II. RELATED WORK

Autonomous sorting, the process of categorizing and organizing objects based on specific criteria, plays a crucial role in industries such as manufacturing, logistics, recycling, and warehousing. Traditionally, manual sorting has been employed, requiring significant human effort and time. However, with advancements in machine vision technology and artificial intelligence, autonomous sorting systems have emerged as efficient and reliable alternatives. George et al. [8] aimed to automate waste sorting using deep learning models: two approaches, CNN and SVM, were compared for waste classification, with the SVM giving a higher accuracy of 94.8% and showing adaptability to different waste types. The SVM model was implemented on a Raspberry Pi 3, achieving quick classification with an average time of 0.1 s per image. Joao et al. [9] propose an alternative deep learning strategy, built on a cutting-edge object detector such as Faster R-CNN, for automating the identification and categorization of trash in food trays.

Chen et al. [10] proposed a system to sort garbage using deep learning models, with an RPN and VGG-16 for object detection and pose estimation. Praneel et al. [11] present vision-based detection of used electronic parts in which customized deep learning models, including a shallow neural network, support vector machines, and CNNs, were used; of these, the CNN model had the best accuracy of 98.1%. Zainab et al. [12] proposed a voice-assisted robot to sort objects by their
colors and shapes. For this, a MATLAB toolbox was used: color recognition was based on RGB component extraction, and shape recognition was obtained by measuring the completeness of the part. Yue et al. [13] developed an empty-dish recycling robot in which YOLO v4 was used for object detection, with performance evaluated on precision, recall, and F1 score. Bui et al. [14] proposed an autonomous robot manipulator in which RGB-D data was used to train a convolutional neural network for object detection, and a 3D point cloud was generated for grasping. The robotic system presented here differs from these related works in that it uses a custom dataset of differently colored cubes for object detection together with advanced object detection models.
III. AUTONOMOUS SORTING

A. Workspace creation
To build the autonomous sorting workstation, the 6 DOF Niryo Ned robot arm is used along with an HD webcam. The Niryo Ned has an inbuilt Raspberry Pi controller and is controlled through the Robot Operating System (ROS) and a Python API. A workspace of size 0.194 m x 0.194 m is fixed, with the top-left marker as the origin marker and the other three markers as edge detection markers. The marker on the top left is used to transform workspace distances into the robot's world frame. The camera is placed at a height of 0.47 m, but its height can be varied as long as all four markers remain clearly visible. With its adaptability and user friendliness, the Niryo robot is an effective robot that can be used for a variety of industrial applications. Real-time visual data can be recorded and processed by integrating the HD webcam into the workspace, enabling the robot to correctly identify and classify objects based on predefined criteria [15]. A sketch of driving this workspace from Python is given below.

Fig. 1: Workspace containing Niryo ned robot and overhead camera

Fig. 1 shows the 6 degree of freedom Niryo Ned robot with standard gripper 1. It has a payload capacity of 300 g and a reach of 440 mm, with a repeatability of +/- 0.5 mm.
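The paper does not list its control code; as a hedged sketch of how the marker-defined workspace and the image-to-robot transform described above can be driven from Python, the following uses the pyniryo client API. The robot address, the workspace name "sorting_ws", the relative coordinates, and the drop-off pose are illustrative assumptions, not values from the paper.

from pyniryo import NiryoRobot, PoseObject

robot = NiryoRobot("169.254.200.200")   # assumed robot address
robot.calibrate_auto()
robot.update_tool()

# A detection centroid expressed as relative (x_rel, y_rel) in [0, 1]
# within the four-marker workspace is converted into a pose in the
# robot's world frame (the marker-based transform described above).
x_rel, y_rel = 0.42, 0.61               # illustrative values
pick_pose = robot.get_target_pose_from_rel("sorting_ws",
                                           height_offset=0.0,
                                           x_rel=x_rel, y_rel=y_rel,
                                           yaw_rel=0.0)
robot.pick_from_pose(pick_pose)

# Place the cube at a fixed, assumed drop-off pose for its class.
drop_pose = PoseObject(x=0.20, y=-0.15, z=0.12,
                       roll=0.0, pitch=1.57, yaw=0.0)
robot.place_from_pose(drop_pose)
robot.close_connection()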
B. Dataset creation
The dataset mainly contains cubes of three colours: red, green, and yellow. The cubes are of size 20 mm x 20 mm, constrained by the size of the gripper jaws. Images of the objects are captured in different orientations and under different lighting effects, at a resolution of 1280 x 720, and are reduced in size for faster training. In total, 400 images were captured and a further 400 were produced by augmentation, of which 70% are used for training, 15% for validation, and 15% for testing. Every image is labelled by creating a bounding box around the object, using the Roboflow annotation tool. There were some challenges while creating the dataset, notably its size: capturing a greater number of images is time consuming, so to address this, a data augmentation technique is used, which increases the size of the dataset artificially (a sketch follows below). Another main challenge is to create a dataset that covers different lighting conditions. Fig. 2 shows some sample images of cubes in different orientations and lighting conditions.

Fig. 2: Sample Dataset
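The exact augmentation pipeline is not given in the paper; the following is a minimal OpenCV sketch of the kind of artificial dataset growth described above (flips, small rotations, brightness shifts). The file names and parameter values are illustrative.

import cv2

def augment(image):
    """Yield simple augmented variants of one dataset image."""
    yield cv2.flip(image, 1)                      # horizontal flip
    h, w = image.shape[:2]
    for angle in (-15, 15):                       # small rotations
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        yield cv2.warpAffine(image, m, (w, h))
    for beta in (-40, 40):                        # darker / brighter
        yield cv2.convertScaleAbs(image, alpha=1.0, beta=beta)

img = cv2.imread("dataset/red_cube_001.jpg")
for i, aug in enumerate(augment(img)):
    cv2.imwrite(f"dataset/red_cube_001_aug{i}.jpg", aug)

Note that the geometric augmentations (flips, rotations) also require the bounding-box annotations to be transformed accordingly; annotation tools such as Roboflow can generate these variants automatically.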
C. Training object detection model
The popular one-stage object detection model YOLO v7, "You Only Look Once version 7," is well renowned for its quick response time and high accuracy. On the other hand, Faster R-CNN, "Region-based Convolutional Neural Networks," is a two-stage object detection framework that excels at accurate object localization and detection. For training, the custom dataset was exported as a .csv file containing fields such as filename, height, width, class, xmin, xmax, ymin, and ymax (a loader sketch is given below). NVIDIA GPUs were used to speed up training and enhance performance for both YOLO v7 and Faster R-CNN. To attain the best results, the tuning parameters of the models, such as learning rates, batch sizes, and optimizer configurations, were carefully adjusted. To evaluate the models' performance, assessment criteria such as mean average precision (mAP) and the losses are used to measure the accuracy and the overlap of predicted bounding boxes with ground truth annotations. Fig. 3 shows colored cubes correctly detected with their confidence scores.

Fig. 3: Image with labelled class and confidence score.
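Based on the export fields listed above, a loader along these lines can group the CSV rows into per-image annotations. This is a sketch under those assumptions, not the paper's code; the file name is hypothetical.

import csv
from collections import defaultdict

def load_annotations(csv_path):
    """Group rows into {filename: [(class, (xmin, ymin, xmax, ymax)), ...]}."""
    boxes = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            box = tuple(int(row[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
            boxes[row["filename"]].append((row["class"], box))
    return boxes

annotations = load_annotations("cubes_dataset.csv")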

Fig. 4: YOLO architecture.

1) YOLO V7:
YOLO (You Only Look Once) [16][17] is an object detection framework known for its speed and accuracy. The main idea behind YOLO is to perform object detection by directly predicting bounding boxes and class probabilities in a single pass of the neural network. A general overview of how YOLO models typically detect objects:
• Input Processing: The YOLO model takes an input image and divides it into a grid of cells.
• Bounding Box Prediction: For every grid cell, the model outputs multiple bounding boxes with corresponding confidence scores. These define the regions where objects are likely to be present.
• Class Prediction: The model also predicts the class probabilities for each bounding box, indicating the type of object present within it.
• Non-maximum Suppression: To eliminate redundant or overlapping bounding box predictions, a technique called non-maximum suppression is applied. It keeps the box with the highest score for each detected object and removes others that significantly overlap with it (see the sketch after this list).
• Object Detection: After non-maximum suppression, the final output contains the bounding box coordinates, corresponding class labels, and confidence scores for the detected objects.

Fig. 4 shows the architecture of YOLO v7, which is mainly divided into three parts: the YOLO backbone, a CNN that pools image pixels to form features at different granularities; the YOLO neck, which combines and mixes the ConvNet layer representations; and the YOLO head, which outputs the bounding box and class predictions.
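As an illustration of the suppression step above, here is a minimal sketch of greedy IoU-based non-maximum suppression in Python with NumPy; the IoU threshold value is an illustrative choice, not taken from the paper.

import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns indices of the boxes to keep."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]                        # keep the best remaining box
        keep.append(i)
        # Intersection of box i with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap box i too much; keep the rest
        order = order[1:][iou <= iou_thresh]
    return keep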

2) Faster RCNN:
• Faster R-CNN [18] is a highly effective object detection algorithm. Fig. 5 shows the architecture of Faster R-CNN, which consists of several key components. The first major component is the Region Proposal Network (RPN), a fully convolutional network. It creates region proposals, which are potential bounding box regions that may contain objects. The RPN operates on the last convolutional feature maps of a pre-trained deep network, predicting for each location an objectness score and a bounding box.
• The generated region proposals are given to the Region of Interest (RoI) pooling layer. This layer reshapes each proposal to a fixed spatial dimension and aligns it with a fixed-size feature map, allowing the network to handle regions of varying sizes and extract features from each region. Next, the RoI features are given to two parallel fully connected layers. One branch performs object classification, predicting the probability of object classes for each proposed region. The other branch performs bounding box regression, refining the coordinates of the proposed regions to improve localization accuracy.
• During training, the Faster R-CNN model goes through region proposal generation, ground truth assignment, and a multi-task loss computation that incorporates both classification and bounding box regression losses. The model is then trained using backpropagation and optimization techniques (a fine-tuning sketch follows below). During inference, the model generates region proposals, performs non-maximum suppression to filter out overlapping bounding boxes, and predicts object class probabilities and refined box coordinates. The final detections are obtained by applying thresholds to class probabilities and bounding box confidence scores.

Fig. 5: Faster RCNN architecture
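The paper does not specify its training code; as a rough illustration of how such a two-stage detector can be fine-tuned on a small custom dataset, the following sketch uses torchvision's Faster R-CNN implementation. The class count (three cube colors plus background) follows the dataset description; the optimizer settings and the data loader are assumptions.

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a Faster R-CNN pre-trained on COCO and replace its box predictor
# head for our classes: background + red, green, and yellow cubes.
num_classes = 4
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device).train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# `data_loader` is assumed to yield (images, targets), where each target
# is a dict with "boxes" (N, 4 in xyxy) and "labels" (N,) tensors.
for images, targets in data_loader:
    images = [img.to(device) for img in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss_dict = model(images, targets)      # returns the multi-task losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()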

IV. EXPERIMENTAL RESULTS AND DISCUSSION

When training an object detection model, various metrics and loss functions are commonly used to assess its performance. The different outcomes observed are explained below:
• Object Loss: Also known as localization loss or bounding box regression loss, object loss measures the model's error in predicting the bounding box coordinates of objects in an image. It indicates how accurately the model localizes objects within the image.
• Box Loss: Box loss is the component of object loss that focuses specifically on the accuracy of bounding box predictions. It quantifies the error in predicting the bounding box coordinates, such as the top-left corner, width, and height of the objects.
• Class Loss: Class loss, or classification loss, measures the model's error in predicting the class labels or categories of objects in an image. It quantifies how well the model can classify objects into the predefined classes.
• Precision: Precision evaluates the exactness or accuracy of the model's predictions. It measures the percentage of correctly predicted objects (true positives) out of all the objects predicted as positive. Precision is useful when minimizing false positives is important. Equation (1) gives the precision formula.

\text{Precision} = \frac{TP}{TP + FP} \quad (1)

• Recall: Recall, also known as sensitivity or true positive rate, measures the model's capability to detect all relevant objects in an image. It quantifies the percentage of correctly predicted objects (true positives) out of all the ground truth objects. Recall is valuable when minimizing false negatives is a priority. Equation (2) gives the recall formula.

\text{Recall} = \frac{TP}{TP + FN} \quad (2)
• mAP (mean Average Precision): mAP is an important metric for object detection models. It combines precision and recall across multiple object categories: it calculates the average precision (AP) of every class and then takes the mean across all classes. Higher mAP values indicate better overall performance of the model. Equation (3) gives the mAP formula.

\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i \quad (3)

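As a small illustration of Equations (1)–(3), here is a schematic Python sketch that computes precision, recall, and the mean over per-class average precisions from raw counts; it is not the evaluation code used in the paper, and the counts in the usage example are made up.

def precision(tp: int, fp: int) -> float:
    """Equation (1): fraction of positive predictions that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    """Equation (2): fraction of ground-truth objects that are found."""
    return tp / (tp + fn) if tp + fn else 0.0

def mean_average_precision(per_class_ap: dict) -> float:
    """Equation (3): mean of the average precision over all N classes."""
    return sum(per_class_ap.values()) / len(per_class_ap)

# Illustrative usage with made-up counts for one class:
p = precision(tp=94, fp=5)      # ~0.949
r = recall(tp=94, fn=6)         # 0.940
m = mean_average_precision({"red": 0.98, "green": 0.99, "yellow": 0.98})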
A. YOLO V7 results:
To analyse the performance of the model, TensorBoard was used; TensorFlow, a well-known machine learning framework, provides TensorBoard as a visualization tool. Initially the model was trained for 100 iterations, but the losses achieved were high; hence, to reduce the losses, the model was trained for 250 iterations. Fig. 6 shows the box, object, and class losses of YOLO v7, indicating that over the 250 iterations the loss curves decrease as the number of iterations increases, while Fig. 7 shows that recall and precision increase over the iterations.

Fig. 6: Box, object, and class losses of YOLO V7

Fig. 7: Precision, recall, and mAP of YOLO V7

B. Faster RCNN results:
Fig. 8 shows the precision, recall, and mAP of Faster R-CNN, and Fig. 9 shows its class, box, and object losses. Initially, at a small number of iterations, the accuracy is low, but it increases as the iterations progress. Ideal recall and mAP curves should increase steadily as the number of retrieved relevant instances increases.

Fig. 8: Precision, recall, and mAP of Faster RCNN

Fig. 9: Class, box, and object losses of Faster RCNN

C. Implementation:
For the implementation and experiments, a system with 8 GB RAM and an NVIDIA GeForce RTX 3050 GPU is used; CUDA is used additionally to accelerate training on the GPU. The system runs Ubuntu 18.04 as the operating system, and in terms of software, ROS Noetic, Google Colab notebooks, and Visual Studio Code are used.

Based on the evaluation and analysis of precision, recall, and mAP shown in Table I, it is evident that YOLO v7 outperformed Faster R-CNN in terms of object detection accuracy and localization. YOLO consistently demonstrated higher precision, recall, and mAP values, indicating its superior performance in detecting and localizing objects within images. These findings suggest that YOLO may be a more suitable choice for applications requiring real-time object detection and localization capabilities.

TABLE I: COMPARISON OF PERFORMANCES OF YOLO V7 AND FASTER RCNN

Model         Precision   Recall   mAP
YOLO V7       95.3%       94.0%    98.4%
Faster RCNN   93.2%       91.0%    94.1%
After selecting YOLO as the object detection model, the F1 score is checked for each class by obtaining the TP, FN, and FP counts per class, to assess the balance of the dataset.

F-measure (F1 Score): The F1-score, also known as the F-measure, is a performance metric used to evaluate the effectiveness of a detection model. It balances precision and recall, combining both parameters into a single value. The F1-score is particularly useful for assessing model performance on imbalanced datasets, where one class is significantly more dominant than the others. A perfect F1-score is 1, indicating perfect precision and recall; a lower F1-score indicates a less effective model. It ranges from 0 to 1, with 0 being the worst and 1 the best. Equation (4) gives the formula for the F1 score, and Table II shows the F1 scores for each class.

F\text{-measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)

TABLE II: F1 SCORES OF DIFFERENT CLASSES

Class         F1-Score
Red cube      0.85
Yellow cube   0.91
Green cube    0.89
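As a worked example of Equation (4), using the overall YOLO v7 values from Table I rather than per-class counts: with a precision of 0.953 and a recall of 0.940, F1 = 2 x (0.953 x 0.940) / (0.953 + 0.940) ≈ 0.946.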

After successfully detecting objects with the trained YOLO v7 model, which outputs bounding box coordinates and class labels, the Niryo Ned robot still needs the physical coordinates of an object in order to grasp it. To overcome this limitation, a two-step process is introduced: finding the contour of the detected object and determining its centroid [19]. By identifying the contour, a more precise representation of the object's shape can be obtained, allowing for better grasp planning. Additionally, the centroid serves as a crucial reference point for the robot to calculate the coordinates and execute the grasping action with improved accuracy. Fig. 10, an image taken from the camera mounted overhead of the Niryo Ned robot, shows the contour and centroid of an object; the position of the object is calculated with reference to the markers in the workspace. The calculated object positions are transferred to the robot over Ethernet. A sketch of the contour-and-centroid step is given below.

Fig. 10: Generating contour and centroid.
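The paper does not list its contour code; the following is a minimal OpenCV sketch of the described two-step process, assuming the detector has already returned a bounding-box crop of one cube. The thresholding choice is illustrative.

import cv2

def contour_centroid(crop_bgr):
    """Find the largest contour in a detected-object crop and its centroid.
    Returns (contour, (cx, cy)) in crop pixel coordinates."""
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu's threshold separates the cube from the workspace background.
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)               # spatial moments of the contour
    if m["m00"] == 0:                      # degenerate contour, no centroid
        return largest, None
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    return largest, (cx, cy)

The centroid in image coordinates can then be mapped, via the workspace markers, into the relative coordinates consumed by the robot-side pick routine sketched in Section III-A.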
Fig. 11 shows the robot picking the yellow cube from the workspace, and Fig. 12 shows the robot placing the yellow cube at its defined position. Similarly, the other cubes are also detected and placed at their defined positions.

Fig. 11: Niryo ned robot picking the yellow cube.

Fig. 12: Niryo ned robot placing the yellow cube.

V. CONCLUSION

This study presents an autonomous sorting system built around the Niryo Ned robot, in which YOLO v7 and Faster R-CNN models are used for object detection. The comparison between these models revealed that YOLO v7 performs better than Faster R-CNN in terms of accuracy. As a result, the YOLO v7 model was chosen to detect objects, and by adding novel features to the algorithm, the Niryo Ned robot can sort the objects with greater accuracy. The empirical evaluation showcases the system's capabilities and identifies its limitations, guiding future improvements. The research highlights the potential of automation in industrial processes, demonstrating the benefits of autonomous systems in enhancing efficiency and reducing human labor. Overall, it lays a foundation for further advancements in automation technologies for sorting applications. However, there are some limitations, such as generalization to new objects, sensitivity to environmental conditions, and potential misclassifications due to occlusions and overlapping objects. The computational complexity of the YOLO v7 and Faster R-CNN models may impact real-time processing, while calibration errors and dataset bias can affect the system's efficiency. In future work, the plan is to use real-world objects other than cubes and to replace the USB camera with a 3D depth camera; this upgrade aims to enhance grasping capabilities by accurately calculating the height of objects. Additionally, the intention is to introduce voice interaction with the robot, facilitating human-robot interaction.

REFERENCES

[1] D. Acemoglu and P. Restrepo, "Artificial intelligence, automation, and work," in The Economics of Artificial Intelligence: An Agenda, University of Chicago Press, 2018, pp. 197–236.
[2] A. K. R. Nadikattu, "Influence of Artificial Intelligence on Robotics Industry," International Journal of Creative Research Thoughts (IJCRT), ISSN 2320-2882, 2021.
[3] Y. Li, Z. Guo, F. Shuang, M. Zhang, and X. Li, "Key technologies of machine vision for weeding robots: A review and benchmark," Computers and Electronics in Agriculture, vol. 196, 106880, 2022.
[4] J. Andreu-Perez, F. Deligianni, D. Ravi, and G. Yang, "Artificial intelligence and robotics," arXiv preprint arXiv:1803.10813, 2018.
[5] J. Du, "Understanding of object detection based on CNN family and YOLO," Journal of Physics: Conference Series, vol. 1004, 012029, IOP Publishing, 2018.
[6] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, vol. 28, 2015.
[7] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.
[8] G. E. Sakr, M. Mokbel, A. Darwich, M. N. Khneisser, and A. Hadi, "Comparing deep learning and support vector machines for autonomous waste sorting," 2016 IEEE International Multidisciplinary Conference on Engineering Technology (IMCET), 2016, pp. 207–212.
[9] H.-D. Bui, H. Nguyen, H. M. La, and S. Li, "A Deep Learning-Based Autonomous Robot Manipulator for Sorting Application," 2020 Fourth IEEE International Conference on Robotic Computing (IRC), Taichung, Taiwan, 2020, pp. 298–305, doi: 10.1109/IRC.2020.00055.
[10] C. Zhihong, Z. Hebin, W. Yanbo, L. Binyan, and L. Yu, "A vision-based robotic grasping system using deep learning for garbage sorting," 2017 36th Chinese Control Conference (CCC), Dalian, China, 2017, pp. 11223–11226, doi: 10.23919/ChiCC.2017.8029147.
[11] P. Chand and S. Lal, "Vision-Based Detection and Classification of Used Electronic Parts," Sensors, vol. 22, no. 23, 9079, 2022, doi: 10.3390/s22239079.
[12] Z. AlSalman, N. AlSomali, S. AlSayari, and A. Bashar, "Speech Driven Robotic Arm for Sorting Objects Based on Colors and Shapes," 2018 3rd International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 2018, pp. 6–11, doi: 10.1109/ICICT43934.2018.9034306.
[13] X. Yue, H. Li, M. Shimizu, S. Kawamura, and L. Meng, "Deep Learning-based Real-time Object Detection for Empty-Dish Recycling Robot," 2022 13th Asian Control Conference (ASCC), Jeju, Republic of Korea, 2022, pp. 2177–2182, doi: 10.23919/ASCC56756.2022.9828060.
[14] H.-D. Bui, H. Nguyen, H. M. La, and S. Li, "A Deep Learning-Based Autonomous Robot Manipulator for Sorting Application," 2020 Fourth IEEE International Conference on Robotic Computing (IRC), Taichung, Taiwan, 2020, pp. 298–305, doi: 10.1109/IRC.2020.00055.
[15] P. Chand, "Investigating Vision Based Sorting of Used Items," 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 2022, pp. 1–5, doi: 10.1109/IICAIET55139.2022.9936813.
[16] T. Ahmad, Y. Ma, M. Yahya, B. Ahmad, S. Nazir, A. Haq, and R. Ali, "Object Detection through Modified YOLO Neural Network," Scientific Programming, 2020, doi: 10.1155/2020/8403262.
[17] P. Jiang, D. Ergu, F. Liu, Y. Cai, and B. Ma, "A Review of Yolo Algorithm Developments," Procedia Computer Science, vol. 199, pp. 1066–1073, Jan. 2022.
[18] R. Girshick, "Fast R-CNN," 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015, pp. 1440–1448, doi: 10.1109/ICCV.2015.169.
[19] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal: Software Tools for the Professional Programmer, vol. 25, no. 11, pp. 120–123, 2000.
