
Paper 01

Vision-Based Handling System Using Model-Based Vision and Stereo Ranging

Summary : This paper presents a vision-based robotic handling system for handling casting parts in a
manufacturing setting. The main focus of the research is a three-dimensional model-based matching
technique for determining the position and orientation of objects, particularly rigid objects with visible
patterns or features, such as line segments and closed curves, on their surfaces.

The researchers implemented a model-based vision technique using a local feature focus method to
identify objects and determine their three-dimensional position and orientation. They also employed
stereo ranging with area-based correlation to measure the heights of the model features. The system
comprises two distinct processes: a modeling process, which involves human interaction to create a
feature model, and an automatic recognition process. In the modeling process, the system captures an
image of a sample object, extracts features such as closed curves, arc segments, and line segments, and
presents them on a monitor. The operator then selects appropriate features to build a model of the
object, comprising lines, closed curves, and three types of matching models. Finally, the system generates
a three-dimensional feature model using stereo ranging measurements.
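
Stereo ranging with area-based correlation of this kind can be illustrated with a short sketch. The paper does not give its implementation, so the window size, search range, and the use of normalized cross-correlation below are assumptions for illustration only:

```python
import numpy as np

def stereo_disparity(left, right, y, x, win=5, max_disp=64):
    """Estimate the disparity of pixel (y, x) in a rectified stereo pair by
    normalized cross-correlation of a small window along the epipolar line."""
    h = win // 2
    patch = left[y-h:y+h+1, x-h:x+h+1].astype(np.float64)
    patch = (patch - patch.mean()) / (patch.std() + 1e-9)
    best_d, best_score = 0, -np.inf
    for d in range(max_disp + 1):
        if x - d - h < 0:                     # window would leave the image
            break
        cand = right[y-h:y+h+1, x-d-h:x-d+h+1].astype(np.float64)
        cand = (cand - cand.mean()) / (cand.std() + 1e-9)
        score = float((patch * cand).mean())  # normalized cross-correlation
        if score > best_score:
            best_score, best_d = score, d
    return best_d

# Height (depth) then follows from the pinhole relation Z = f * B / d,
# with focal length f (pixels) and baseline B from stereo calibration.
```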

In practical experiments, the researchers used their vision-based handling system to detect and locate
parts, achieving location errors of less than 2 mm in position and 3 degrees in orientation relative to the
objects' actual pose. The paper details the recognition methodology, the system setup, and successful
experiments in recognizing and picking up real-world objects in a manufacturing environment.

Comment : This paper provides valuable insights into the methodology, system setup, and successful
experiments conducted to recognize and pick up real-world objects, highlighting the potential
applications and benefits of vision-based robotic handling systems in manufacturing.

[11] T. Onda and N. Fujiwara, "Vision-based handling system using model-based vision and stereo
ranging," 1998 Second International Conference on Knowledge-Based Intelligent Electronic Systems,
Proceedings KES'98 (Cat. No. 98EX111), Adelaide, SA, Australia, 1998, pp. 199-204 vol. 2, doi:
10.1109/KES.1998.725911.

Paper 02

A Machine Vision System for the Recognition and Positioning of Two-Dimensional Partially Occluded
Objects

Summary : This paper tackles the challenge of identifying and positioning partially occluded two-
dimensional objects. It introduces a computer vision algorithm that compares concise descriptions of
scenes and models using HYPER (HYpotheses Predicted and Evaluated Recursively), a technique that
generates and verifies hypotheses while recursively estimating the transformation from model to scene.

The algorithm represents objects as polygons. This polygonal shape approximation, or piecewise linear
approximation, not only condenses the data but also smooths the object's boundary, addressing issues
arising from digitization, noise, and segmentation variations. Additionally, it extracts the features used
in subsequent processing.
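
As an illustration of polygonal (piecewise linear) boundary approximation, the sketch below uses OpenCV's Ramer-Douglas-Peucker routine; the file name and tolerance are hypothetical, and the paper's exact approximation procedure may differ:

```python
import cv2

# Load a binary object mask (hypothetical file) and trace its outer boundary.
mask = cv2.imread("part_mask.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contour = max(contours, key=cv2.contourArea)       # largest object boundary

# Piecewise linear approximation: epsilon trades data reduction against
# boundary fidelity, smoothing digitization and segmentation noise.
epsilon = 0.01 * cv2.arcLength(contour, closed=True)
polygon = cv2.approxPolyDP(contour, epsilon, closed=True)
print(f"{len(contour)} boundary points reduced to {len(polygon)} vertices")
```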

The implemented machine vision system takes input in the form of vertex coordinates for both the
model and the scene. The program processes this input data and produces output that includes the
number of generated hypotheses, the count of recognized segments, a quality measure, and a
parameter vector.

Comment: The paper presents experimental results illustrating the entire process of recognizing and
locating objects across diverse scene types and conditions. These results demonstrate the effectiveness
of the proposed system.

[12] M. Abou-El-Ela and F. El-Amroussy, "A machine vision system for the recognition and positioning of
two-dimensional partially occluded objects," Proceedings of 8th Mediterranean Electrotechnical
Conference on Industrial Applications in Power Systems, Computer Science and Telecommunications
(MELECON 96), Bari, Italy, 1996, pp. 1087-1092 vol.2, doi: 10.1109/MELCON.1996.551397.

Paper 03

Comparison of Deep-Learning-Based Segmentation Models: Using Top View Person Images

Summary : This article focuses on person segmentation using top-view datasets, an essential element in
various visual applications that enhance scene understanding. Image segmentation, which involves
dividing images or video frames into distinct objects or segments, plays a pivotal role in applications like
remote sensing, facial recognition, autonomous driving, computational photography, indoor object
analysis, and medical imaging.

The study explores three deep learning-based semantic segmentation models: the Fully Convolutional
Network (FCN) with a ResNet-101 backbone, U-Net with an encoder-decoder architecture, and
DeepLabV3 with an encoder-decoder structure. Initially, these models are tested using pre-trained weights.
To enhance performance, they are further fine-tuned using a dataset specific to top-view person
segmentation.
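
A minimal fine-tuning sketch along these lines uses torchvision's pre-trained FCN ResNet-101 with its head replaced for two classes (person and background); the `top_view_loader` DataLoader and the hyperparameters are hypothetical stand-ins for the paper's setup:

```python
import torch
import torchvision

# Pre-trained FCN with a ResNet-101 backbone; swap the final classifier
# layer so it predicts 2 classes instead of the original label set.
model = torchvision.models.segmentation.fcn_resnet101(weights="DEFAULT")
model.classifier[4] = torch.nn.Conv2d(512, 2, kernel_size=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, masks in top_view_loader:        # hypothetical DataLoader
    optimizer.zero_grad()
    logits = model(images)["out"]            # (N, 2, H, W) per-pixel scores
    loss = criterion(logits, masks)          # masks: (N, H, W) class indices
    loss.backward()
    optimizer.step()
```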

Remarkably, despite significant changes in person appearance due to camera perspective variations, the
pre-trained deep learning models still yield promising results in this research. Looking ahead, the study
may extend its scope to other deep learning segmentation models using multiple top-view object
datasets.

Comment : This article presents a well-structured exploration of person segmentation in top-view
datasets, showcasing the adaptability of deep learning models and their potential impact on a wide
range of applications requiring scene understanding and object segmentation.

[13] I. Ahmed, M. Ahmad, F. A. Khan and M. Asif, "Comparison of Deep-Learning-Based Segmentation
Models: Using Top View Person Images," in IEEE Access, vol. 8, pp. 136361-136373, 2020, doi:
10.1109/ACCESS.2020.3011406.

Paper 04

Computer Vision Based Gesture Recognition for Desktop Object Manipulation

Summary : This paper introduces a real-time gesture recognition system based on Kinect technology,
designed for manipulating desktop objects. The system comprises three key subsystems: first, it employs
the Kinect's depth sensor to detect the 3D position of the hand. Next, it analyzes these positional data to
recognize predefined gestures. Finally, the recognized gestures are implemented on the desktop to
manipulate various objects.

The paper presents a robust approach to detect static hand gestures, which can be utilized to control
desktop objects. Gesture recognition involves classifying gestures based on the finger positions and
follows a multi-step process, including image acquisition, segmentation, feature vector extraction,
gesture classification, and a training module.

To train the system, the researchers employ Hidden Markov Models (HMMs) fitted with the Baum-Welch
algorithm. Trained on more than 1200 gestures, the system achieved an impressive recognition
accuracy of 89%.
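
As a sketch of such training, the `hmmlearn` library's `fit()` runs Baum-Welch (EM) re-estimation; the state count, feature dimensionality, and random stand-in sequences below are illustrative, not the paper's configuration:

```python
import numpy as np
from hmmlearn import hmm

# Each sample is a sequence of hand-feature vectors (random stand-ins here).
rng = np.random.default_rng(0)
sequences = [rng.normal(size=(20, 3)) for _ in range(30)]
X = np.concatenate(sequences)                # stacked observations
lengths = [len(s) for s in sequences]        # per-sequence lengths

model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(X, lengths)                        # Baum-Welch re-estimation

# For classification, one HMM per gesture class is trained and the model
# with the highest log-likelihood for a new sequence wins.
print(model.score(sequences[0]))
```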

Comment: The primary objective of this research is to establish an interactive and dependable gesture-
based system for controlling desktop objects. To detect fingertip positions, the system employs the
Graham scan algorithm and contour detection algorithm, contributing to its robustness and accuracy.
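
A minimal Graham-scan convex hull, sketched below, illustrates the fingertip-candidate step; the point format and example values are assumptions, since the paper's implementation is not reproduced here:

```python
from math import atan2

def graham_scan(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    def cross(o, a, b):  # z-component of (a - o) x (b - o)
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

    pts = list(set(points))
    if len(pts) < 3:
        return pts
    pivot = min(pts, key=lambda p: (p[1], p[0]))     # lowest, leftmost point
    pts.sort(key=lambda p: (atan2(p[1]-pivot[1], p[0]-pivot[0]),
                            (p[0]-pivot[0])**2 + (p[1]-pivot[1])**2))
    hull = []
    for p in pts:                                    # keep only left turns
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

# Hull vertices of a hand contour are natural fingertip candidates:
print(graham_scan([(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]))
# -> [(0, 0), (4, 0), (4, 3), (0, 3)]
```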

[14] S. M. A. Hoque, M. S. Haq and M. Hasanuzzaman, "Computer Vision Based Gesture Recognition for
Desktop Object Manipulation," 2018 International Conference on Innovation in Engineering and
Technology (ICIET), Dhaka, Bangladesh, 2018, pp. 1-6, doi: 10.1109/CIET.2018.8660916.

Paper 05

Detection and Object Position Measurement using Computer Vision on Humanoid Soccer

Summary : This study presents a computer vision system inspired by the technology employed by the
winning humanoid soccer team of the 2011 season. The system is designed to operate on a soccer field
conforming to the regulations of the 2011 humanoid soccer tournament, where autonomous robots
compete, aiming to prepare robots for matches against human soccer players. The competition is divided
into different classes, including kid size, teen size, and big size robots.

The primary focus of the research was to determine the positions of both the ball and the players on the
field, with a specific emphasis on detecting bouncing balls. However, the results could not be directly
applied to humanoid robots, since the sensors must be mounted on the robots themselves rather than
positioned off the field. Moreover, the system's wide field of view differs significantly from the
perspective of a humanoid robot.

The paper underscores that goalkeepers in the competition tend to maintain relatively static positions
relative to objects on the field. The goalkeeper employs its vision system to recognize objects and gauge
the ball's position using image processing techniques. The process of measuring the ball's position
involves recognizing three distinct objects on the field: the ball, the goal's bar, and the field lines.
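
One common way to realize such color-based ball localization is HSV thresholding followed by contour analysis, sketched below with OpenCV; the color bounds and file name are placeholders rather than the authors' calibrated values:

```python
import cv2
import numpy as np

frame = cv2.imread("field.jpg")                        # hypothetical frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Threshold on the ball's color range (placeholder values to be tuned
# for the actual ball under match lighting).
lower, upper = np.array([5, 120, 120]), np.array([25, 255, 255])
mask = cv2.inRange(hsv, lower, upper)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
if contours:
    ball = max(contours, key=cv2.contourArea)
    (x, y), radius = cv2.minEnclosingCircle(ball)
    print(f"ball at image position ({x:.0f}, {y:.0f}), radius {radius:.0f}px")
```
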
Comment: In this paper, the researchers successfully demonstrate the development of a system capable
of simultaneously detecting three different objects. They also find that the addition of hidden layers in
neural networks does not significantly improve accuracy, highlighting that even a small number of
neurons can yield satisfactory results.

[15] I. Awaludin, P. Hidayatullah, J. Hutahaean and D. G. Parta, "Detection and object position
measurement using computer vision on humanoid soccer," 2013 International Conference on
Information Technology and Electrical Engineering (ICITEE), Yogyakarta, Indonesia, 2013, pp. 88-92, doi:
10.1109/ICITEED.2013.6676217.

Paper 06

Review on Deep based Object Detection

Summary : This paper offers a comprehensive overview of deep-based object detection techniques,
categorizing them into two main types: two-stage and one-stage detectors. The paper discusses the core
components of two-stage detectors, including feature extraction, proposal generation, and detection
subnetworks, as well as how one-stage detectors treat object detection as a regression problem.
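
The two families can be contrasted in a few lines using torchvision's reference models (Faster R-CNN as a two-stage detector, RetinaNet as a one-stage detector); this sketch merely runs inference on a dummy image and is not tied to any model evaluated in the paper:

```python
import torch
import torchvision

# Two-stage: proposal generation followed by a detection subnetwork.
two_stage = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
# One-stage: boxes and classes regressed densely in a single pass.
one_stage = torchvision.models.detection.retinanet_resnet50_fpn(weights="DEFAULT")

image = [torch.rand(3, 480, 640)]            # dummy input image
for name, model in [("two-stage", two_stage), ("one-stage", one_stage)]:
    model.eval()
    with torch.no_grad():
        out = model(image)[0]                # dict of boxes, labels, scores
    print(name, out["boxes"].shape, out["scores"].shape)
```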

Furthermore, the paper delves into the analysis of three prominent datasets used for evaluating object
detection algorithms: MS-COCO, PASCAL VOC, and ILSVRC. It provides insights into both inter-class and
intra-class comparisons within these datasets.

The study underscores the remarkable progress in deep learning-based object detection, highlighting its
applications in diverse fields such as the military, healthcare, transportation, life sciences, and security.

Comment: This research offers a concise survey of object detection methodologies, categorizing them
into two distinct groups while reviewing eleven recent object detection algorithms and five relevant
datasets. Additionally, the paper outlines potential avenues for future research, including topics like
dataset creation, few-shot object detection, weakly supervised detection, and video-based object
detection.

[16] P. Shf and C. Zhao, "Review on Deep based Object Detection," 2020 International Conference on
Intelligent Computing and Human-Computer Interaction (ICHCI), Sanya, China, 2020, pp. 372-377, doi:
10.1109/ICHCI51889.2020.00085.

Paper 07

Projected Pattern on Three-dimensional Objects for Image Feature Classification and Recognition

Summary : This paper introduces a novel technique for enhancing image feature classification and
recognition of objects using a combination of a projector and a camera. The approach projects a
suitable pattern onto the target object, producing a clearer image that can be analyzed by classification
and recognition algorithms.

Machine vision has gained popularity across various industries due to its cost-effectiveness and
performance. It finds applications in quality control, mobile robot navigation through stereo images,
medical diagnosis through tomography, and even weather forecasting. Within the realm of machine
vision, numerous processes such as 2D/3D matching, scanning, measuring, inspection, classification, and
recognition are crucial. Among these, image feature classification and recognition hold significant
importance in industrial settings.

The technique described in the paper greatly improves the accuracy of correlation-based matching
algorithms, allowing for the correct identification of objects like spheres, cones, cylinders, and boxes.
Furthermore, the use of a suitable pattern aids in verifying the precise edges of objects, which can also
enhance image measuring algorithms through pattern comparison.
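
A minimal correlation-based matching step, assuming OpenCV's normalized cross-correlation and hypothetical image files, might look like this:

```python
import cv2

scene = cv2.imread("scene_with_pattern.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("object_template.png", cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation: the projected pattern adds texture, which
# sharpens the correlation peak on otherwise uniform surfaces.
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
print(f"best match at {max_loc} with score {max_val:.3f}")
```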

Comment: This innovative technique has the potential to revolutionize the field of machine vision by
significantly improving the accuracy and reliability of feature classification and recognition processes. It
enables the correct identification of various objects and enhances the verification of object edges,
benefiting industries that rely on machine vision for quality control and other critical tasks.

[17] G. Phanomchoeng and R. Chanchareon, "Projected pattern on three-dimensional objects for image
feature classification and recognition," 2017 2nd International Conference on Control and Robotics
Engineering (ICCRE), Bangkok, Thailand, 2017, pp. 237-241, doi: 10.1109/ICCRE.2017.7935077.

Paper 08

Research on Rotary Object Recognition Technique based on Neural Network

Summary : This paper addresses the challenging task of recognizing rotary objects against complex
backgrounds, a significant concern in computer vision, particularly in military and manufacturing
applications. The proposed approach leverages a back-propagation (BP) neural network to achieve
accurate object extraction.

To enhance the recognition process, several key techniques are employed. First, a median filter
effectively reduces image noise, improving the quality of the input data. Additionally, an improved
method based on the maximum between-class variance criterion (Otsu's method) determines the image
segmentation threshold, enhancing segmentation accuracy.
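
This preprocessing pipeline, median filtering followed by a between-class-variance (Otsu) threshold, can be sketched with OpenCV; the kernel size and file name below are illustrative:

```python
import cv2

gray = cv2.imread("rotary_part.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image
denoised = cv2.medianBlur(gray, 5)           # median filter suppresses noise

# Otsu's method picks the threshold that maximizes between-class variance.
t, binary = cv2.threshold(denoised, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(f"selected threshold: {t}")
```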

The core of the system is an object recognition framework based on an improved BP neural network.
Seven invariant moments (Hu moments) extracted from the rotary objects serve as the input feature
vector for recognition. Experimental results demonstrate the effectiveness of the image processing
techniques, particularly in noise reduction and precise segmentation. Moreover, the seven invariant
moments align well with the characteristics of rotary objects, yielding a robust rotary object
recognition system based on the BP neural network.
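
As a sketch of this feature-plus-classifier design, the following computes the seven Hu moments with OpenCV and trains a multilayer perceptron (back-propagation) via scikit-learn as a stand-in for the paper's improved BP network; `training_images` and `training_labels` are hypothetical:

```python
import cv2
import numpy as np
from sklearn.neural_network import MLPClassifier

def hu_features(binary_image):
    """Seven invariant (Hu) moments, log-scaled for numerical stability."""
    hu = cv2.HuMoments(cv2.moments(binary_image)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

# X: one 7-dimensional feature vector per segmented object image.
X = np.array([hu_features(img) for img in training_images])  # hypothetical data
y = np.array(training_labels)

# MLP trained with back-propagation, standing in for the improved BP network.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)
```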

Comment: This research addresses a critical challenge in computer vision by focusing on the recognition
of rotary objects against complex backgrounds. The proposed approach, which combines image noise
reduction, improved image segmentation, and a well-tailored feature set, yields impressive recognition
results. The paper's contributions have practical applications in military and manufacturing fields,
where accurate object recognition is of paramount importance.

[18] S. -J. Jia, T. Wang and Y. -P. Cui, "Research on rotary object recognition technique based on neural
network," 2015 International Conference on Machine Learning and Cybernetics (ICMLC), Guangzhou,
China, 2015, pp. 684-689, doi: 10.1109/ICMLC.2015.7340637.

Paper 09

The comparison between two methods of object detection: Fast Yolo model and Delaunay Triangulation

Summary : This paper presents a comparative analysis of two techniques for recognizing moving objects
within video scenes. The first method employs deep learning, utilizing the Fast YOLO model for object
detection, while the second approach relies on object segmentation through Delaunay Triangulation and
combines features such as HOG, color histograms, and GLCM for each object. Classification for both
methods is performed using the AlexNet network. The experiments encompass various video clips from
highways and local roads with diverse traffic and lighting conditions.
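
The hand-crafted feature fusion of the second method can be sketched with scikit-image as below; window sizes, bin counts, and GLCM settings are illustrative assumptions, not the authors' parameters:

```python
import numpy as np
from skimage.feature import hog, graycomatrix, graycoprops

def describe(rgb_patch, gray_patch):
    """Concatenate HOG, color-histogram, and GLCM features for one object.
    rgb_patch: H x W x 3 uint8 array; gray_patch: H x W uint8 array."""
    hog_vec = hog(gray_patch, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    color_hist, _ = np.histogram(rgb_patch, bins=32, range=(0, 255))
    glcm = graycomatrix(gray_patch, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    texture = np.array([graycoprops(glcm, p)[0, 0]
                        for p in ("contrast", "homogeneity", "energy")])
    return np.concatenate([hog_vec, color_hist / color_hist.sum(), texture])
```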

The primary focus of the article is on real-time vehicle detection and classification. It juxtaposes two
distinct methods: deep learning-based detection with Fast YOLO and object segmentation using
Delaunay Triangulation. The results obtained from both approaches are assessed for object detection
and classification accuracy.

The paper also reviews several existing algorithms for object detection, including those based on inter-
frame differencing, optical flow estimation, and background subtraction with Gaussian mixture models.
Building on the work of Stauffer and Grimson, the researchers propose a method that uses three
Gaussians to represent the road, moving vehicles, and shadows.
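
OpenCV's MOG2 background subtractor is a Gaussian-mixture model in the Stauffer-Grimson family and also labels shadows separately, so it gives a feel for this idea; the clip name and parameters below are hypothetical:

```python
import cv2

# Mixture-of-Gaussians background model; detectShadows marks shadow
# pixels with the value 127 in the foreground mask.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                detectShadows=True)

cap = cv2.VideoCapture("highway.mp4")        # hypothetical traffic clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)           # 255 = moving object, 127 = shadow
cap.release()
```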

Comment: In conclusion, the researchers conducted a comparative study of two object detection
methods in video scenes. The first method, based on deep learning and Fast YOLO, achieved a precision
rate of 76.13%, while the second approach, utilizing Delaunay Triangulation and feature combination,
reached 50%. The paper highlights the challenge of overfitting with small datasets and suggests future
work should focus on enlarging the training dataset and extracting additional features to enhance the
Fast YOLO model's object detection performance.

[19] F. Benjelloun, I. E. Manaa, M. A. Sabri, A. Yahyaouy and A. Aarab, "The comparison between two
methods of object detection: Fast Yolo model and Delaunay Triangulation," 2020 International
Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 2020, pp. 1-6, doi:
10.1109/ISCV49265.2020.9204197.

Paper 10

To detect and Recognize Object from Videos for Computer Vision by Parallel Approach using Deep
Learning

Summary : This paper addresses computer vision problems with a solution based on deep learning
techniques and GPU acceleration. Computer vision involves interpreting 3D scenes from 2D images by
analyzing the structural properties of the scene, essentially reconstructing real-world objects as 3D
structures.

The proposed approach combines deep learning and machine learning methods to solve computer vision
challenges. It begins by employing a Convolutional Neural Network (CNN) to learn and extract features
from real-time videos. These features are then used for object classification through an extended linear
Support Vector Machine (SVM) classifier.
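
A minimal sketch of this CNN-features-into-linear-SVM pattern uses a torchvision ResNet-18 as a stand-in feature extractor together with scikit-learn's LinearSVC; `video_frames` and `frame_labels` are hypothetical data:

```python
import torch
import torchvision
from sklearn.svm import LinearSVC

# Pre-trained CNN as a fixed feature extractor: drop the final
# classification layer and keep the pooled convolutional features.
backbone = torchvision.models.resnet18(weights="DEFAULT")
backbone.fc = torch.nn.Identity()
backbone.eval()

def extract(frames):                         # frames: (N, 3, 224, 224) floats
    with torch.no_grad():
        return backbone(frames).numpy()      # (N, 512) feature vectors

# A linear SVM classifies the extracted features (hypothetical tensors/labels).
features = extract(video_frames)
svm = LinearSVC().fit(features, frame_labels)
```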

Deep CNN algorithms, being highly parallel, are effectively utilized in combination with GPU acceleration
to tackle computer vision tasks. The experimental results are evaluated in terms of performance,
accuracy, and simplicity, showcasing the effectiveness of this approach in solving computer vision
problems.

Comment: In this research, the authors introduce an effective solution to computer vision problems by
combining deep learning and machine learning techniques, with a focus on feature extraction using
CNNs. The utilization of GPU acceleration enhances the efficiency of the approach. The paper
underscores the automatic extraction and analysis of valuable information from images and videos,
contributing to the field of computer vision.

[20] G. Nalinipriya, B. Baluswarny, R. Patan, S. Kallam, T. Gs and M. R. Babu, "To detect and Recognize
Object from Videos for Computer Vision by Parallel Approach using Deep Learning," 2018 International
Conference on Advances in Computing and Communication Engineering (ICACCE), Paris, France, 2018,
pp. 336-341, doi: 10.1109/ICACCE.2018.8441718.
