ABSTRACT Robotic bin-picking is a common process in modern manufacturing, logistics, and warehousing that aims to pick up known or unknown objects with random poses out of a bin using a robot-camera system. Rapid and accurate object pose estimation has become an increasingly pressing requirement for robotic picking in recent years. In this paper, a fast 6-DoF (degrees of freedom) pose estimation pipeline for random bin-picking is proposed that is capable of recognizing different types of objects in various cluttered scenarios and uses an adaptive threshold segmentation strategy to accelerate estimation and matching for the robot picking task. In particular, our proposed method can be trained effectively with fewer samples by introducing geometric properties of objects such as contour, normal distribution, and curvature. An experimental setup with a Kinova 6-DoF robot and an Ensenso industrial 3D camera is designed to evaluate our proposed method on four different objects. The results indicate that our method achieves a 91.25% average success rate and a 0.265 s average estimation time, demonstrating that our approach provides competitive results for fast object pose estimation and can be applied to robotic random bin-picking tasks.
INDEX TERMS Robotic bin-picking, learning-based pose estimation, adaptive threshold, point pair features,
multilayer segmentation, 3D sensing.
FIGURE 2. Illustration of the proposed pipeline for 3D object detection and pose estimation. The 3D model database in the dotted rectangle is built off-line: feature extraction and point-cloud key-point description transform the 3D CAD model into the 3D model database used as the template.
FIGURE 5. The proposed pipeline for 3D object detection and pose estimation.
FIGURE 6. The key points and reference points of scenes, where yellow and blue denote the key point and the reference point under the PPF description, respectively.
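The captions above summarize the split between an off-line stage (a PPF model database built from the CAD model) and an on-line matching stage over key/reference point pairs. As a point of reference only, the following is a minimal sketch of a comparable PPF train-and-match pipeline using OpenCV's surface_matching contrib module; this is not the authors' implementation, and the file names and parameter values are illustrative placeholders.

```python
# Minimal PPF train/match sketch using OpenCV's surface_matching contrib
# module (opencv-contrib-python). NOT the paper's implementation; file
# names and parameters are illustrative.
import cv2

# Off-line stage: load a CAD-derived model (N x 6 array of points plus
# normals) and train the PPF detector once; the trained detector plays
# the role of the template database in Figure 2.
model = cv2.ppf_match_3d.loadPLYSimple("model.ply", 1)  # 1 = with normals
detector = cv2.ppf_match_3d_PPF3DDetector(0.025, 0.05)  # sampling / distance steps
detector.trainModel(model)

# On-line stage: match a segmented scene cloud against the template,
# then refine the strongest hypotheses with ICP.
scene = cv2.ppf_match_3d.loadPLYSimple("scene.ply", 1)
poses = detector.match(scene, 1.0 / 40.0, 0.05)  # scene sampling step, distance

icp = cv2.ppf_match_3d_ICP(100)                       # up to 100 ICP iterations
_, poses = icp.registerModelToScene(model, scene, poses[:5])  # refine top 5

best = poses[0]
print("votes:", best.numVotes, "residual:", best.residual)
print("4x4 pose:\n", best.pose)
```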
FIGURE 7. Mathematical model of the picking strategy.

The search range between the camera and the box can then be made narrower than in the usual solution: one layer less each time, with the search layer moving from the bottom of the box toward the top area of the target. When there are different types of targets, the layer gaps also differ because, within the field of view of the camera, the picking target is determined as described in the section above; given the major workpiece of one layer, the threshold lies between its minimum and maximum diameters, and this represents the adaptive threshold.

Because the 3D camera directly captures depth information and thus the distance between the camera and the actual scene, the adaptive threshold can remove the foreground and background, reduce the point cloud size, and speed up feature extraction and matching. The three-dimensional model of the object is used as prior knowledge. For the stacking case depicted in Figure 7, suppose we know the maximum height of the 3D camera's field of view, denoted as C, the height H of the box, and the maximum diameter L of the convex hull of the recognized target. The search strategy then operates in the region (C − H) ≤ d1 ≤ (C − H + l1) (top-down from the camera view), as shown in Figure 7, using the matching algorithm described above to search for the target. In addition, the results corresponding to the total number of layers can be obtained starting from n = 1, as formulated in Equation 3:

C − H ≤ d1 ≤ C − H + l1,   (n = 1)
C − H + l1 ≤ d2 ≤ C − H + l1 + l2,   (n = 2)
C − H + l1 + l2 ≤ d3 ≤ C − H + l1 + l2 + l3,   (n = 3)
⋮   (3)

Taking a special situation as an example, if all layer maximum diameter values L of the convex hull are the same, i.e., l1 = l2 = l3 = . . ., then the above search range can be described as C − H + (n − 1) ∗ L ≤ d ≤ C − H + n ∗ L. The pseudocode is given in Algorithm 1, whose input includes the distance between the box and the camera, denoted as C, the box height H, the maximum workpiece height Lmax, and the predefined minimum similarity score threshold Smin; the output is the target pose Pfinal and the similarity score Sfinal.

As the code shows, we consider the case in which the condition S > Smin is satisfied at n = n0. At the next step, n = n0 − 1, the pose P, the score S, and the search range ρ are updated according to the algorithm. If the condition is not satisfied, the search range is updated using the predefined threshold Δ. A side view of the result of our multilayer bin-picking algorithm is shown in Figure 8: the green coloration and coordinates represent the target recognized using the multilayer adaptive threshold picking strategy described above, and the piped corner-joint is masked in green on every recognized object. As shown in Figure 9, the first row is the real scene from the 3D camera, where a target recognized by our method is indicated by a green mask for convenience. If a target has been recognized, the green mask with ICP pose refinement is shown and the other objects retain their original color; that is, green coloration indicates recognized objects, whereas gray objects are not recognized. The second row is the processed point cloud in our experiment; from left to right, the figures show the targets highlighted by our method.
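Algorithm 1 is referenced but not reproduced in this excerpt. The Python sketch below gives one plausible reading of the multilayer search loop under the equal-layer assumption (l1 = l2 = . . . = L); the function match_in_band is a hypothetical stand-in for the PPF voting and ICP refinement stages, and widening the band by delta is an assumption about how the predefined threshold Δ updates the search range.

```python
import numpy as np

def multilayer_search(cloud, C, H, L_max, S_min, delta, match_in_band, n_layers):
    """Sketch of a multilayer adaptive-threshold search (cf. Algorithm 1).

    cloud:         scene point cloud, shape (N, 3); column 2 is the depth d
                   measured from the camera.
    C, H:          camera field-of-view height and box height; the band for
                   layer n is C - H + (n - 1) * L_max <= d <= C - H + n * L_max.
    S_min:         minimum similarity score for accepting a pose.
    delta:         predefined threshold used here to widen the band when no
                   match clears S_min (assumption).
    match_in_band: hypothetical matcher returning (pose, score) for the
                   points inside the current depth band.
    """
    P_final, S_final = None, -np.inf
    # Small n is the band nearest the camera, i.e. the topmost (first
    # pickable) layer of the stack; search proceeds downward into the box.
    for n in range(1, n_layers + 1):
        lo = C - H + (n - 1) * L_max
        hi = C - H + n * L_max
        # Adaptive threshold: keep only points whose depth falls inside the
        # current layer band, discarding fore- and background points.
        band = cloud[(cloud[:, 2] >= lo) & (cloud[:, 2] <= hi)]
        pose, score = match_in_band(band)
        if score > S_min:
            return pose, score  # first layer that satisfies S > Smin wins
        # Otherwise widen the band by delta, retry once, and remember the
        # best candidate before moving on to the next layer.
        band = cloud[(cloud[:, 2] >= lo - delta) & (cloud[:, 2] <= hi + delta)]
        pose, score = match_in_band(band)
        if score > S_final:
            P_final, S_final = pose, score
    return P_final, S_final
```

With illustrative values C = 1.0 m, H = 0.3 m, and L_max = 0.05 m, the n = 1 band is 0.70 m ≤ d ≤ 0.75 m and the n = 2 band is 0.75 m ≤ d ≤ 0.80 m, matching the pattern of Equation 3.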
TABLE 1. Recognition rates of four different types of targets with CAD models or real-scene models.
FIGURE 12. Object at the origin reference point (left image) and detection result with respect to the origin (right image), where the green labels at the bottom left of the two pictures give the position and orientation of the target.
TABLE 2. Comparisons of the resulting measurements and ground truth (unit: mm).
Because the 3D camera is sensitive to distance and noise in the Z direction, the error in the Z direction is more substantial than the errors in the X and Y directions.
V. CONCLUSION AND FUTURE WORK
This paper proposes a CAD-based 6-DoF pose estimation pipeline for robotic random bin-picking tasks using a 3D camera. Our method consumes less data and time for database building and pose estimation. The 3D model database can be generated from either 3D CAD parts or segmented real-scene data, which is more convenient in practical production scenarios than the most commonly employed methods. Additionally, for on-line pose estimation, the improved PPF voting algorithm with the multilayer adaptive threshold picking strategy was introduced, and a series of experiments involving practical robotic random bin-picking of multiple object types showed that the proposed system can pick up all randomly posed objects in the bin. A practical experiment in the laboratory achieved an average recognition rate of 91.275% and an average estimation time of 0.265 seconds. For the current system, there is still room to improve the quality of the predicted pose score in cluttered scenes, as well as the handling of small objects, occlusion, and the quality of selected suction-cup locations on objects. These will be the topics of our future research.
WU YAN received the M.E. degree in mechanical engineering from the Guangdong University of Technology, in 2016. Since then, he has been a Research Assistant with the Guangdong Institute of Intelligent Manufacturing, Guangzhou, China. His current research interests include 3D perception and robotics.

ZHIHAO XU (Student Member, IEEE) received the B.E. and Ph.D. degrees from the School of Automation, Nanjing University of Science and Technology, Nanjing, China, in 2010 and 2016, respectively. He is currently a Postdoctoral Fellow with the robotics team at the Guangdong Institute of Intelligent Manufacturing, Guangzhou, China. His main research interests include neural networks, force control, and intelligent information processing.

XUEFENG ZHOU received the M.E. and Ph.D. degrees in mechanical engineering from the South China University of Technology, Guangzhou, China, in 2006 and 2011, respectively. He is currently the Director of the robotics team at the Guangdong Institute of Intelligent Manufacturing, Guangzhou. His research interests include neural networks, force control, and intelligent information processing. He received the Best Student Paper Award at the 2010 IEEE International Conference on Robotics and Biomimetics.

QIANXING SU received the B.S. degree in mechanical engineering from Northeastern University at Qinhuangdao, China. Since then, he has been a Research Assistant with the Guangdong Institute of Intelligent Manufacturing, Guangzhou, China. His current research interests include force control and deep reinforcement learning.

HONGMIN WU received the Ph.D. degree in mechanical engineering from the Guangdong University of Technology, Guangzhou, China, in 2019. He is currently a Postdoctoral Fellow with the robotics team at the Guangdong Institute of Intelligent Manufacturing, Guangzhou. His research interests include multimodal time series modeling using Bayesian nonparametric methods, robot introspection, movement representation, and robot learning.