ABSTRACT
Retrieving a 3D model from a 3D database and simultaneously augmenting the retrieved model in an Augmented Reality (AR) system has become an issue in developing plausible AR environments in a convenient fashion. Sketch-based 3D object retrieval is considered an intuitive way of searching for 3D objects using human-drawn sketches as queries. In this paper, we propose a novel deep learning based approach for retrieving a sketch-based 3D object as an Augmented Reality model. For this work, we introduce a new method which uses a Sketch CNN, a Wasserstein CNN, and a Wasserstein center loss for retrieving a sketch-based 3D object. In particular, the Wasserstein center loss is used for learning the center of each object category and reducing the Wasserstein distance between the center and the features of the same category. The proposed 3D object retrieval and augmentation consist of three major steps. First, the Wasserstein CNN extracts features from 2D images taken from various directions of the 3D object using a CNN, and extracts the features of the 3D data by computing the Wasserstein barycenters of the features of each image. Second, the features of the sketch are extracted using a separate Sketch CNN. Finally, we adopt a sketch-based object matching method to localize the natural marker in the images in order to register a 3D virtual object in the AR system. Using the detected marker, the retrieved 3D virtual object is augmented in the AR system automatically. Through experiments, we show that the proposed method is efficient for retrieving and augmenting objects.
☞ keyword : Convolutional Neural Network, object retrieval, Deep Learning, Sketch-based 3D object retrieval, Augmented Reality, Wasserstein distance, Wasserstein center

In section 3, the Wasserstein distance and the Wasserstein center are introduced, along with the structure of the Wasserstein CNN and sketch-based 3D augmentation. In section 4, the experimental results of the proposed method are included. Finally, in section 5, concluding remarks and future work are discussed.

2. Related Works

The study of sketch-based 3D object retrieval has become a major issue in the field of content-based model retrieval. However, the difficulty in using sketches for retrieving a 3D object is that the sketch of an object is not uniquely defined; it depends on the person's subjective view. For this reason, 2D sketches of the same 3D object can be drawn in many different fashions.

In studies of the 2D projection of 3D objects, a composite descriptor called ZFEC, which includes a local region-based Zernike moment, a boundary-based Fourier descriptor, and eccentricity and roundness features, was introduced[2]. In another study, the silhouette of a 3D model is used as a 2D sketch of the model[8]. In a work on sketch-based 3D retrieval by learning features, Eitz utilized sketches and 2D projections of the 3D objects using Gabor local line-based features and a bag-of-features (BOF) histogram[1]. In addition, Furuya proposed BF-SIFT to describe sketches and 2D projections of 3D objects[9].

Recently, studies of sketch-based 3D object retrieval using CNNs (Convolutional Neural Networks) have been introduced. Wang retrieved 3D objects using two Siamese CNNs to extract the features of sketches and 3D objects[3]. In Xie's study, he extracts features from the images of the 3D object by CNN and obtains the Wasserstein center of the features to match the objects and sketches[10].

In studies of similarity measures based on loss functions, Hadsell[4] proposed the Contrastive loss and Schroff[5] proposed the Triplet loss for classifying the input data. Wen[6] introduced the Center loss for face recognition. He et al.[16] introduced the Triplet-Center loss, which combines the Triplet loss and the Center loss, for sketch-based 3D object retrieval. However, the demerit of the Contrastive loss is that learning can slow down when the pairs of data are not properly designed, and the Triplet loss conventionally needs a long training time because of the triplets of data.

Meanwhile, sketch-based image matching, which is known as a content-based retrieval[23,24,25] method that compares database images with sketch images drawn by users, is used to detect a desired object in an input image, and the detected object is used as a natural marker of AR for augmenting a virtual 3D object.

3. Proposed Method

The proposed sketch-based 3D model retrieval system described in Fig 1 consists of three major parts: two CNNs and the Wasserstein loss. The Wasserstein CNN extracts the features of the 3D models and the Sketch CNN extracts the features from the sketch, respectively. The Wasserstein center loss is used for learning both features obtained from the CNNs.
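The three-part flow above (a view CNN for the 3D model, a Sketch CNN for the query, and a shared loss on their features) can be sketched as follows. The CNN backbones are stood in for by hypothetical fixed random projections, and all dimensions (64-dim inputs, 32-dim features, 12 views) are illustrative; the point is the data flow, not the learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two learned backbones (view CNN / Sketch CNN).
W_view = rng.standard_normal((64, 32))    # maps a flattened 64-dim view to 32-dim
W_sketch = rng.standard_normal((64, 32))  # maps a flattened 64-dim sketch to 32-dim

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def view_features(views):
    """One feature distribution per rendered view (12 views in the paper)."""
    return np.array([softmax(v @ W_view) for v in views])

def sketch_feature(sketch):
    return softmax(sketch @ W_sketch)

# 12 rendered views of one 3D model and one query sketch (random placeholders).
views = rng.random((12, 64))
sketch = rng.random(64)

feats = view_features(views)         # (12, 32): per-view feature distributions
model_feat = feats.mean(axis=0)      # placeholder for the Wasserstein barycenter step
query_feat = sketch_feature(sketch)  # (32,): sketch feature in the same domain

print(model_feat.shape, query_feat.shape)  # (32,) (32,)
```

In the actual system, the per-view features are aggregated with the Wasserstein barycenter of section 3.2 rather than the arithmetic mean used here, and both branches are trained with the Wasserstein center loss.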
34 2020. 2
A Sketch-based 3D Object Retrieval Approach for Augmented Reality Models Using Deep Learning
In sections 3.1 and 3.2, the Wasserstein distance and the Wasserstein center are introduced. In section 3.3, the characteristics of the Wasserstein CNN and the Sketch CNN, which extract the features of the 3D model and the features of the user's sketch, respectively, are explained. In section 3.4, the Wasserstein center loss used for sketch-based 3D model retrieval is described.

3.1 Wasserstein Distance

For two probability distributions p and q over n bins, the set of admissible transport plans is defined as

U(p, q) = \{ T \in \mathbb{R}_{+}^{n \times n} \mid T \mathbf{1}_n = p, \; T^{\top} \mathbf{1}_n = q \}   (1)

In Eq. (1), T is the transport plan and \mathbf{1}_n is a column vector in which all elements are 1. The Wasserstein distance between p and q can then be defined as follows.

W(p, q) = \min_{T \in U(p, q)} \langle T, M \rangle   (2)

In Eq. (2), M \in \mathbb{R}^{n \times n} is the pairwise distance matrix between the bins of p and q, called the ground matrix, and \langle T, M \rangle is the dot product of T and M. The Wasserstein distance is the optimal transport cost for transporting the mass of p to q. In many cases, Eq. (2) does not have a unique solution, so we use Eq. (3) [14], which adds an entropy regularization term h(T) with weight \gamma.

W_{\gamma}(p, q) = \min_{T \in U(p, q)} \langle T, M \rangle - \gamma h(T)   (3)

3.2 Wasserstein Center

The Wasserstein barycenter is the center point of a set of probability distributions, calculated using the Wasserstein distance. When the probability distribution set is \{p_1, \dots, p_N\} \subset \mathbb{R}^{n}, the barycenter \bar{p} of this set is defined as follows[11].

\bar{p} = \arg\min_{p} \frac{1}{N} \sum_{i=1}^{N} W(p, p_i)   (4)

With the entropic regularization of Eq. (3) and the kernel K = e^{-M/\gamma}, the barycenter can be computed by iterative scaling updates.

v_i^{(t+1)} = p_i \oslash (K^{\top} u_i^{(t)})   (5)

\bar{p}^{(t+1)} = \prod_{i=1}^{N} \big( u_i^{(t)} \odot K v_i^{(t+1)} \big)^{1/N}, \quad u_i^{(t+1)} = \bar{p}^{(t+1)} \oslash (K v_i^{(t+1)})   (6)

In Eq. (6), t is the iteration index of the Wasserstein center, and u_i, v_i are auxiliary scaling variables[17].

3.3 Wasserstein CNN

As shown in Fig 1, the features of the 3D model can be extracted from the rendered multi-view images of the target model using a CNN, and the Wasserstein center is then obtained from the features[10]. In the first stage, in order to extract the features of the model, 12 images are taken around the 3D model at 30-degree rotational intervals, as illustrated in Fig 2. Those images are fed into the CNN to obtain the 3D feature of the model.
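Eqs. (1)-(6) can be sketched in a few lines of NumPy. The regularization weight gamma, the 16-bin grid, and the two toy histograms below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def sinkhorn_distance(p, q, M, gamma=0.2, iters=200):
    """Entropic-regularized Wasserstein distance of Eq. (3),
    computed with Sinkhorn scaling iterations [14]."""
    K = np.exp(-M / gamma)            # Gibbs kernel built from the ground matrix M
    u = np.ones_like(p)
    for _ in range(iters):            # alternately enforce the two marginals of Eq. (1)
        v = q / (K.T @ u)
        u = p / (K @ v)
    T = u[:, None] * K * v[None, :]   # regularized optimal transport plan
    return float((T * M).sum())       # transport cost <T, M> of Eq. (2)

def wasserstein_barycenter(ps, M, gamma=0.2, iters=200):
    """Iterative scaling updates of Eqs. (5)-(6) for the barycenter of Eq. (4),
    with u, v as the auxiliary variables [11, 17]."""
    K = np.exp(-M / gamma)
    N, n = ps.shape
    u = np.ones((N, n))
    for _ in range(iters):
        v = ps / (u @ K)                                 # Eq. (5): v_i = p_i / (K^T u_i)
        p = np.prod((u * (v @ K)) ** (1.0 / N), axis=0)  # Eq. (6): geometric mean
        u = p / (v @ K)                                  # Eq. (6): u_i = p / (K v_i)
    return p / p.sum()

# Toy 1-D histograms on a 16-bin grid; M holds squared distances between bins.
x = np.arange(16)
M = (x[:, None] - x[None, :]) ** 2 / 16.0
p = np.exp(-((x - 4.0) ** 2) / 4.0); p /= p.sum()
q = np.exp(-((x - 11.0) ** 2) / 4.0); q /= q.sum()

d = sinkhorn_distance(p, q, M)                      # positive cost between p and q
bary = wasserstein_barycenter(np.stack([p, q]), M)  # mass between the two modes
```

In the proposed system the inputs `ps` would be the N = 12 per-view feature distributions of one 3D model, and `bary` would be its Wasserstein-center feature.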
The proposed Wasserstein CNN plays the role of four major parts: a CNN for extracting a feature from each view, a Wasserstein barycenter module for extracting the features of the 3D model by calculating the Wasserstein center of all the views, CNN2 for mapping the obtained 3D features to the same domain as the sketch features, and a classifier for classifying the mapped features. Fig. 3(a) shows the structure of the proposed Wasserstein CNN.

3.4 Wasserstein Center Loss

The proposed loss is based on the center loss, which has been used in the face recognition area to compensate for the Softmax loss of supervised learning. The center loss obtains the center of a class and minimizes the distance between the center and each feature to be classified. The formula of the center loss can be defined as Eq. (7).

L_C = \frac{1}{2} \sum_{i=1}^{m} \| x_i - c_{y_i} \|_2^2   (7)

In Eq. (7), x_i is the i-th feature in a mini-batch of size m, and c_{y_i} is the center of its class y_i.
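Eq. (7) is straightforward to compute. A minimal NumPy sketch follows; the feature vectors, labels, and class centers are hypothetical stand-ins (in the paper the features come from the CNNs and the centers are learned parameters):

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss of Eq. (7): L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2.

    features: (m, d) mini-batch of feature vectors x_i
    labels:   (m,)   class index y_i of each feature
    centers:  (k, d) one center per class
    """
    diff = features - centers[labels]   # x_i - c_{y_i}, gathered per sample
    return 0.5 * float((diff ** 2).sum())

# Toy batch: two classes with 2-D features; centers near the class means.
feats = np.array([[1.0, 0.0], [1.2, 0.2], [-1.0, 0.0]])
labels = np.array([0, 0, 1])
centers = np.array([[1.1, 0.1], [-1.0, 0.0]])

loss = center_loss(feats, labels, centers)  # small: features sit near their centers
```

The proposed Wasserstein center loss follows the same pattern, but, as stated in the abstract, it replaces the squared Euclidean distance to the center with the Wasserstein distance between each feature distribution and its category center.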
H(x, y) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}   (11)

\det H(x, y) = I_{xx} I_{yy} - I_{xy}^{2}   (12)
In formula (11), I_{xx}, I_{yy}, and I_{xy} are the second-order derivatives of the grayscale image. Finally, the exact shape of the marker can be extracted by GrabCut, which utilizes a user-specified bounding box around the object to be segmented. GrabCut estimates the color distribution of the target object and that of the background using a Gaussian mixture model.

Fig 5 shows an example of sketch-based 3D object augmentation in the AR system.

The experiments are implemented with Python, the OpenCV library, and the PyTorch deep learning library. For the test of sketch-based 3D augmentation, a Logitech C920 PRO HD web camera is used.

(Table 1) Environments of the experiments

Resources      Description
CPU            AMD Ryzen 7 1700 3GHz
GPU            NVIDIA GeForce GTX 1080 Ti
RAM            32.00 GB
OS             Ubuntu 16.04
Language       Python 3.5
Develop Tool   Jupyter Notebook
Library        OpenCV, PyTorch
Camera         Logitech C920 PRO HD
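The second-order derivatives of formula (11) can be approximated with finite differences. Below is a small NumPy sketch of a determinant-of-Hessian response map, whose strong blob-like maxima serve as candidate marker locations; the image here is a synthetic placeholder for a captured frame:

```python
import numpy as np

def hessian_response(img):
    """Determinant-of-Hessian map built from the second-order
    derivatives I_xx, I_yy, I_xy of a grayscale image."""
    Iy, Ix = np.gradient(img.astype(float))  # first derivatives along rows, cols
    Ixy, Ixx = np.gradient(Ix)               # second derivatives of Ix
    Iyy, _ = np.gradient(Iy)                 # second derivative of Iy along rows
    return Ixx * Iyy - Ixy ** 2              # det H = I_xx * I_yy - I_xy^2

# Synthetic grayscale image with one bright blob standing in for the marker.
y, x = np.mgrid[0:33, 0:33]
img = np.exp(-((x - 16) ** 2 + (y - 16) ** 2) / 20.0)

resp = hessian_response(img)
# The strongest blob-like response sits at the blob center.
peak = np.unravel_index(np.argmax(resp), resp.shape)
```

In the pipeline described above, OpenCV's `cv2.grabCut` would then refine the marker region from a bounding box placed around the peak response.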
(Figure 8) 3D object retrieval results using TCL and the proposed WCL: (a) results using SHREC 13; (b) results using SHREC 14
5. Concluding Remarks
In this paper, we proposed a novel deep learning based approach for retrieving a sketch-based 3D object. We use two networks to extract the features of the 3D data and of the user-drawn sketch from each image with ResNet. The Wasserstein barycenters of the 2D images taken from various directions of the 3D data are evaluated from the extracted features of the 3D data. The second CNN, called 'CNN2', maps the Wasserstein barycenters of the 2D images and the sketch features to the corresponding outputs. In order to train the two networks, a Wasserstein distance loss function on the outputs is adopted. With respect to the accuracy of 3D object retrieval, the proposed method shows improved performance on both the SHREC 13 and SHREC 14 datasets. Moreover, we proposed a sketch-based object matching scheme that localizes the natural marker in the images in order to register a 3D virtual object in Augmented Reality. Using the detected sketch as a marker, the retrieved 3D object is augmented in AR automatically. From the experiments, we show that the proposed method is efficient for retrieving and augmenting objects.

References

[1] M. Eitz, R. Richter, T. Boubekeur, K. Hildebrand, and M. Alexa, "Sketch-based shape retrieval," ACM Transactions on Graphics, vol. 31, no. 4, pp. 1–10, 2012. https://doi.org/10.1145/2185520.2335382
[2] B. Li, Y. Lu, A. Godil, T. Schreck, B. Bustos, A. Ferreira, T. Furuya, M. J. Fonseca, H. Johan, T. Matsuda, R. Ohbuchi, P. B. Pascoal, and J. M. Saavedra, "A comparison of methods for sketch-based 3D shape retrieval," Computer Vision and Image Understanding, vol. 119, pp. 57–80, 2014. https://doi.org/10.1016/j.cviu.2013.11.008
[3] F. Wang, L. Kang, and Y. Li, "Sketch-based 3D shape retrieval using Convolutional Neural Networks," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. https://doi.org/10.1109/cvpr.2015.7298797
[4] R. Hadsell, S. Chopra, and Y. LeCun, "Dimensionality Reduction by Learning an Invariant Mapping," 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), vol. 2, 2006. https://doi.org/10.1109/cvpr.2006.100
[5] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. https://doi.org/10.1109/cvpr.2015.7298682
[6] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, "A Discriminative Feature Learning Approach for Deep Face Recognition," Lecture Notes in Computer Science, pp. 499–515, 2016. https://doi.org/10.1007/978-3-319-46478-7_31
[7] A. Rolet, M. Cuturi, and G. Peyré, "Fast dictionary learning with a smoothed Wasserstein loss," International Conference on Artificial Intelligence and Statistics, Cadiz, Spain, pp. 630–638, 2016. http://www.jmlr.org/proceedings/papers/v51/rolet16.pdf
[8] B. Li, Y. Lu, A. Godil, T. Schreck, M. Aono, H. Johan, J. M. Saavedra, and S. Tashiro, "SHREC'13 track: Large scale sketch-based 3D shape retrieval," Eurographics Workshop on 3D Object Retrieval, Girona, Spain, pp. 89–96, 2013. https://dx.doi.org/10.2312/3DOR/3DOR13/089-096
[9] T. Furuya and R. Ohbuchi, "Ranking on cross-domain manifold for sketch-based 3D model retrieval," International Conference on Cyberworlds, Yokohama, Japan, pp. 274–281, 2013. https://doi.org/10.1109/cw.2013.60
[10] J. Xie, G. Dai, F. Zhu, and Y. Fang, "Learning Barycentric Representations of 3D Shapes for Sketch-Based 3D Shape Retrieval," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. https://doi.org/10.1109/cvpr.2017.385
[11] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, "Iterative Bregman Projections for Regularized Transportation Problems," SIAM Journal on Scientific Computing, vol. 37, no. 2, pp. A1111–A1138, 2015. https://doi.org/10.1137/141000439
[12] V. I. Bogachev and A. V. Kolesnikov, "The Monge-Kantorovich problem: achievements, connections, and perspectives," Russian Mathematical Surveys, vol. 67, no. 5, pp. 785–890, 2012. https://doi.org/10.1070/rm2012v067n05abeh004808
[13] Y. Rubner, C. Tomasi, and L. J. Guibas, "The Earth Mover's Distance as a metric for image retrieval," International Journal of Computer Vision, vol. 40, no. 2, pp. 99–121, 2000. https://doi.org/10.1023/a:1026543900054
[14] M. Cuturi, "Sinkhorn distances: Lightspeed computation of optimal transport," Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, USA, pp. 2292–2300, 2013. https://papers.nips.cc/paper/4927-sinkhorn-distances-lightspeed-computation-of-optimal-transport.pdf
[15] R. Sinkhorn, "Diagonal Equivalence to Matrices with Prescribed Row and Column Sums," The American Mathematical Monthly, vol. 74, no. 4, p. 402, 1967. https://doi.org/10.2307/2314570
[16] X. He et al., "Triplet-Center Loss for Multi-View 3D Object Retrieval," arXiv preprint arXiv:1803.06189, 2018. http://openaccess.thecvf.com/content_cvpr_2018/CameraReady/1632.pdf
[17] N. Bonneel, G. Peyré, and M. Cuturi, "Wasserstein barycentric coordinates," ACM Transactions on Graphics, vol. 35, no. 4, pp. 1–10, 2016. https://doi.org/10.1145/2897824.2925918
[18] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, "A Tutorial on the Cross-Entropy Method," Annals of Operations Research, vol. 134, no. 1, pp. 19–67, 2005. https://doi.org/10.1007/s10479-005-5724-z
[19] L. van der Maaten and G. Hinton, "Visualizing high-dimensional data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008. http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
[20] B. Li, Y. Lu, C. Li, A. Godil, T. Schreck, M. Aono, M. Burtscher, H. Fu, T. Furuya, H. Johan, J. Liu, R. Ohbuchi, A. Tatsuma, and C. Zou, "Extended large scale sketch-based 3D shape retrieval," Eurographics Workshop on 3D Object Retrieval, Strasbourg, France, pp. 121–130, 2014. http://dx.doi.org/10.2312/3dor.20141058
[21] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. https://doi.org/10.1109/cvpr.2016.90
[22] S. Ferradans, G.-S. Xia, G. Peyré, and J.-F. Aujol, "Static and Dynamic Texture Mixing Using Optimal Transport," Scale Space and Variational Methods in Computer Vision, pp. 137–148, 2013. https://doi.org/10.1007/978-3-642-38267-3_12
[23] K. V. Shriram, P. L. K. Priyadarsini, and A. Baskar, "An intelligent system of content-based image retrieval for crime investigation," Int. J. of Advanced Intelligence Paradigms, vol. 7, no. 3/4, pp. 264–279, 2015. https://doi.org/10.1504/IJAIP.2015.073707
[24] M. Eitz, K. Hildebrand, et al., "Sketch-Based Image Retrieval: Benchmark and Bag-of-Features Descriptors," IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 11, pp. 1624–1636, 2010. https://doi.org/10.1109/TVCG.2010.266
[25] L. Nanni, A. Lumini, and S. Brahnam, "Ensemble of shape descriptors for shape retrieval and classification," Int. J. of Advanced Intelligence Paradigms, vol. 6, no. 2, pp. 136–156, 2014. https://doi.org/10.1504/IJAIP.2014.062177
◐ About the Authors ◑
Myunggeun Ji (지명근)
2017 B.S. in Computer Science, Kyonggi University, Suwon, Korea
2018 M.S. in Computer Science, Kyonggi University, Suwon, Korea
2018.03~Present Researcher at Huray, Seoul, Korea
Research Interests : Computer Vision, Augmented Reality
E-mail : jmg2968@gmail.com
Junchul Chun (전준철)
1984 B.S. in Computer Science, Chung-Ang University, Seoul, Korea
1986 M.S. in Computer Science(Software Engineering), Chung-Ang University, Seoul, Korea
1992 M.S. in Computer Science and Engineering (Computer Graphics), The Univ. of Connecticut, USA
1995 Ph.D. in Computer Science and Engineering (Computer Graphics), The Univ. of Connecticut, USA
2001.02~2002.02 Visiting Scholar, Michigan State Univ. Pattern Recognition and Image Processing Lab.
2009.02~2010.02 Visiting Scholar, Univ. of Colorado, Wellness Innovation and Interaction Lab.
1995.03~present, Professor at the Department of Computer Science, Kyonggi University.
Research Interests : Augmented Reality, Computer Vision, Human Computer Interaction
E-mail : jcchun@kgu.ac.kr