MeshNet: Mesh Neural Network for 3D Shape Representation
Yutong Feng,1 Yifan Feng,2 Haoxuan You,1 Xibin Zhao,1∗ Yue Gao1∗
1 BNRist, KLISS, School of Software, Tsinghua University, China.
2 School of Information Science and Engineering, Xiamen University
{feng-yt15, zxb, gaoyue}@tsinghua.edu.cn, {evanfeng97, haoxuanyou}@gmail.com
Abstract

Mesh is an important and powerful type of data for 3D shapes and widely studied in the field of computer vision and computer graphics.

[Figure 1: classification accuracy (Acc) of representative methods, grouped by data type (mesh, view, volume, point cloud), including MVCNN, GVCNN, Pairwise, SO-Net, PointNet, PointNet++, Kd-Net and FPNN.]
Figure 2: The architecture of MeshNet. The input is a list of faces with initial values, which are fed into the spatial and structural descriptors to generate initial spatial and structural features. The features are then aggregated with neighboring information in the mesh convolution blocks, labeled as “Mesh Conv”, and fed into a pooling function to output the global feature used for further tasks. Multi-layer perceptrons are labeled as “mlp”, with the numbers in parentheses indicating the dimensions of the hidden layers and output layer.
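The pooling step this caption describes is what makes the global feature independent of the order of faces: every face goes through the same per-face transform, and a symmetric reduction (max pooling here) erases the ordering. A minimal numpy sketch of that symmetry-function idea; the dimensions and the random linear layer are illustrative stand-ins, not the paper's actual layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a shared per-face MLP: one linear layer + ReLU.
# 9 input dims per face and 16 output dims are illustrative choices.
W = rng.standard_normal((9, 16))

def per_face_process(faces):
    """Apply the same transform to every face independently (n x 9 -> n x 16)."""
    return np.maximum(faces @ W, 0.0)

def global_feature(faces):
    """Symmetry function: max pooling over the face axis removes face order."""
    return per_face_process(faces).max(axis=0)

faces = rng.standard_normal((32, 9))       # 32 faces, 9 initial values each
shuffled = faces[rng.permutation(32)]      # same mesh, different face order

assert np.allclose(global_feature(faces), global_feature(shuffled))
```

Shuffling the face list leaves the global feature unchanged, which is exactly the permutation invariance the per-face processing relies on.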
with per-face processes and a symmetry function. Moreover, the feature of faces is split into spatial and structural features. Based on these ideas, we design the network architecture, with two blocks named spatial and structural descriptors for learning the initial features, and a mesh convolution block for aggregating neighboring features. In this way, the proposed method is able to solve the complexity and irregularity problem of mesh and represent 3D shapes well.

We apply our MeshNet method to the tasks of 3D shape classification and retrieval on the ModelNet40 (Wu et al. 2015) dataset. The experimental results show that MeshNet achieves significant improvement on 3D shape classification and retrieval using mesh data, and comparable performance with recent methods using other types of 3D data.

The key contributions of our work are as follows:

• We propose a neural network using mesh for 3D shape representation and design blocks for capturing and aggregating features of polygon faces in 3D shapes.

• We conduct extensive experiments to evaluate the performance of the proposed method; the experimental results show that the proposed method performs well on the 3D shape classification and retrieval tasks.

Related Work

Mesh Feature Extraction

There are plenty of handcrafted descriptors that extract features from mesh. Lien and Kajiya calculate moments of each tetrahedron in a mesh (Lien and Kajiya 1984), and Zhang and Chen develop more functions applied to each triangle and add all the resulting values as features (Zhang and Chen 2001). Hubeli and Gross extend the features of surfaces to a multiresolution setting to solve the unstructured problem of mesh data (Hubeli and Gross 2001). In SPH (Kazhdan, Funkhouser, and Rusinkiewicz 2003), a rotation-invariant representation is presented with existing orientation-dependent descriptors. Mesh difference of Gaussians (DOG) introduces Gaussian filtering to shape functions (Zaharescu et al. 2009). The intrinsic shape context (ISC) descriptor (Kokkinos et al. 2012) develops a generalization to surfaces and solves the problem of orientational ambiguity.

Deep Learning Methods for 3D Shape Representation

With the construction of large-scale 3D model datasets, numerous deep descriptors of 3D shapes have been proposed. Based on different types of data, these methods can be categorized into four types.

Voxel-based method. 3DShapeNets (Wu et al. 2015) and VoxNet (Maturana and Scherer 2015) propose to learn on volumetric grids, which partition the space into regular cubes. However, they introduce extra computation cost due to the sparsity of data, which restricts them from being applied to more complex data. Field probing neural networks (FPNN) (Li et al. 2016), Vote3D (Wang and Posner 2015) and octree-based convolutional neural networks (O-CNN) (Wang et al. 2017) address the sparsity problem, while they are still restricted as the input gets larger.

View-based method. Using 2D images of 3D shapes to represent them is proposed by multi-view convolutional neural networks (MVCNN) (Su et al. 2015), which aggregates 2D views from a loop around the object and applies a 2D
deep learning framework to them. Group-view convolutional neural networks (GVCNN) (Feng et al. 2018) proposes a hierarchical framework, which divides views into different groups with different weights to generate a more discriminative descriptor for a 3D shape. This type of method also expensively adds the computation cost and is hard to apply in some scenarios.

Point-based method. Due to its irregularity, point cloud is not suitable for previous frameworks. PointNet (Qi et al. 2017a) solves the disorder problem with per-point processes and a symmetry function, and PointNet++ (Qi et al. 2017b) adds aggregation with neighbors to solve this problem. Self-organizing network (SO-Net) (Li, Chen, and Lee 2018), kernel correlation network (KCNet) (Shen et al. 2018) and PointSIFT (Jiang, Wu, and Lu 2018) develop more detailed approaches for capturing local structures with nearest neighbors. Kd-Net (Klokov and Lempitsky 2017) proposes another approach to solve the irregularity problem using a k-d tree.

Fusion method. These methods learn on multiple types of data and fuse their features together. FusionNet (Hegde and Zadeh 2016) uses the volumetric grid and multi-view for classification. Point-view network (PVNet) (You et al. 2018) proposes the embedding attention fusion to exploit both point cloud data and multi-view data.

Method

In this section, we present the design of MeshNet. Firstly, we analyze the properties of mesh, propose the methods for designing the network and reorganize the input data. We then introduce the overall architecture of MeshNet and some blocks for capturing features of faces and aggregating them with neighbor information, which are then discussed in detail.

Overall Design of MeshNet

We first introduce the mesh data and analyze its properties. Mesh data of 3D shapes is a collection of vertices, edges and faces, in which vertices are connected with edges and closed sets of edges form faces. In this paper, we only consider triangular faces. Mesh data is dominantly used for storing and rendering 3D models in computer graphics, because it provides an approximation of the smooth surfaces of objects and simplifies the rendering process. Numerous studies on 3D shapes in the fields of computer graphics and geometric modeling are based on mesh.

Mesh data shows a stronger ability to describe 3D shapes compared with other popular types of data. Volumetric grid and multi-view are data types defined to avoid the irregularity of native data such as mesh and point cloud, while they lose some natural information of the original object. For point cloud, there may be ambiguity caused by random sampling, and the ambiguity is more obvious with a smaller number of points. In contrast, mesh is clearer and loses less natural information. Besides, when capturing local structures, most methods based on point cloud collect the nearest neighbors to approximately construct an adjacency matrix for further processing, while in mesh there are explicit connection relationships to show the local structure clearly. However, mesh data is also more irregular and complex, owing to its multiple compositions and varying numbers of elements.

To get full use of the advantages of mesh and solve the problem of its irregularity and complexity, we propose two key ideas of design:

• Regard face as the unit. Mesh data consists of multiple elements, and connections may be defined among them. To simplify the data organization, we regard face as the only unit and define a connection between two faces if they share a common edge. There are several advantages to this simplification. First, one triangular face can connect with no more than three faces, which makes the connection relationship regular and easy to use. More importantly, we can solve the disorder problem with per-face processes and a symmetry function, similar to PointNet (Qi et al. 2017a). And intuitively, a face also contains more information than a vertex or an edge.

• Split feature of face. Though the above simplification enables us to consume mesh data similarly to point-based methods, there are still some differences between point-unit and face-unit, because a face contains more information than a point. We only need to know “where you are” for a point, while we also want to know “what you look like” for a face. Correspondingly, we split the feature of faces into spatial feature and structural feature, which helps us capture features more explicitly.

Following the above ideas, we transform the mesh data into a list of faces. For each face, we define its initial values, which are divided into two parts (illustrated in Fig 3):

[Figure 3: Initial values of each face. There are four types of initial values, divided into two parts: center, corner and normal are the face information, and neighbor index is the neighbor information.]

• Face Information:
  – Center: coordinate of the center point
  – Corner: vectors from the center point to the three vertices
  – Normal: the unit normal vector

• Neighbor Information:
  – Neighbor Index: indexes of the connected faces (filled with the index of the face itself if the face connects with fewer than three faces)

At the end of this section, we present the overall architecture of MeshNet, illustrated in Fig 2. A list of faces with initial values is fed into two blocks, named spatial descriptor and structural descriptor, to generate the initial spatial and structural features of faces. The features are then passed through some mesh convolution blocks to aggregate neighboring information, which take features of the two types as input and output new features of them. It is noted that all the processes above work on each face respectively.
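The per-face initial values listed above can be derived mechanically from a vertex/face array representation of a mesh. A minimal numpy sketch, assuming a watertight triangular mesh given as a vertex array and a triangle index array; the function name and array layout are illustrative:

```python
import numpy as np

def face_initial_values(vertices, faces):
    """Per-face initial values: center (n x 3), corner vectors (n x 9),
    unit normal (n x 3), and neighbor index (n x 3)."""
    tri = vertices[faces]                          # (n, 3, 3): three corners per face
    center = tri.mean(axis=1)                      # coordinate of the center point
    corner = (tri - center[:, None, :]).reshape(len(faces), 9)  # center -> vertices
    normal = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normal = normal / np.linalg.norm(normal, axis=1, keepdims=True)  # unit normal

    # Two faces are connected if they share a common edge.
    edge_to_faces = {}
    for i, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_faces.setdefault(frozenset(e), []).append(i)

    # Default to the face's own index, matching the filling rule above.
    neighbor = np.tile(np.arange(len(faces))[:, None], (1, 3))
    for i, (a, b, c) in enumerate(faces):
        k = 0
        for e in ((a, b), (b, c), (c, a)):
            for j in edge_to_faces[frozenset(e)]:
                if j != i and k < 3:
                    neighbor[i, k] = j
                    k += 1
    return center, corner, normal, neighbor

# A tetrahedron: every face has exactly three neighbors.
V = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
F = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
center, corner, normal, neighbor = face_initial_values(V, F)
```

Per face this yields the 3 + 9 + 3 face-information values plus three neighbor indexes, matching the n x 3, n x 9 and neighbor-index inputs of Figure 2.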
the affinity between two vectors. In this paper, we generally choose the Gaussian kernel:

    Kσ(n, m) = exp(−‖n − m‖² / (2σ²)),   (4)

where ‖·‖ is the length of a vector in the Euclidean space, and σ is the hyper-parameter that controls the kernels' resolving ability; pairs of vectors that are closer to each other will get higher values. Since the parameters of kernels are learnable, they will turn to some common distributions on the surfaces of 3D shapes and be able to describe the local structures of faces. We set the value of KC(i, k) as the k-th feature of the i-th face. Therefore, with M learnable kernels, we generate features with the length of M for each face.

[Figure: the mesh convolution block. A combination operation fuses the spatial and structural features, and an aggregation operation gathers each face's structural feature with its neighbors' (via the neighbor index) using concatenation, max pooling, or average pooling, followed by MLPs of output sizes out1 and out2.]
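Eq. 4 can be checked numerically. Since the page that defines KC(i, k) in full is not part of this excerpt, the pairing below (averaging the Gaussian kernel over all pairs formed by a face's neighborhood normals and one kernel's learnable points, in the spirit of KCNet (Shen et al. 2018)) is an assumption:

```python
import numpy as np

def gaussian_kernel(n, m, sigma):
    """Eq. 4: K_sigma(n, m) = exp(-||n - m||^2 / (2 * sigma^2))."""
    d = np.asarray(n, dtype=float) - np.asarray(m, dtype=float)
    return float(np.exp(-d.dot(d) / (2.0 * sigma ** 2)))

def kernel_correlation(neighborhood_normals, kernel_points, sigma=0.2):
    """Average pairwise kernel value between a face's neighborhood normals
    (p x 3) and one learnable kernel's points (q x 3). The exact
    normalization is an assumed detail, not taken from this excerpt."""
    return float(np.mean([gaussian_kernel(n, m, sigma)
                          for n in neighborhood_normals
                          for m in kernel_points]))
```

With M such kernels, the M resulting values form the length-M structural feature of a face, matching the text above; closer vector pairs yield values nearer to 1, identical pairs yield exactly 1.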
Experiments

In the experiments, we first apply our network to 3D shape classification and retrieval. Then we conduct detailed ablation experiments to analyze the effectiveness of the blocks in the architecture. We also investigate the robustness to the number of faces and the time and space complexity of our network. Finally, we visualize the structural features from the two structural descriptors.

Table 2: Classification results of ablation experiments on ModelNet40.

Spatial     X    X    X    X    X
Stru.-FRC   X    X    X    X
Stru.-FKC   X    X    X    X
Mesh Conv   X    X    X    X    X
Acc. (%)    83.5   88.2   87.0   89.9   90.4   91.9
3D Shape Classification and Retrieval

We apply our network on the ModelNet40 dataset (Wu et al. 2015) for classification and retrieval tasks. The dataset contains 12,311 mesh models from 40 categories, in which 9,843 models are for training and 2,468 models for testing. For each model, we simplify the mesh data into no more than 1,024 faces, translate it to the geometric center, and normalize it into a unit sphere. Moreover, we also compute the normal vector and indexes of connected faces for each face. During training, we augment the data by jittering the positions of vertices with Gaussian noise with zero mean and 0.01 standard deviation. Since the number of faces varies among models, we randomly fill the list of faces to the length of 1,024 with existing faces for batch training.

For classification, we apply fully-connected layers (512, 256, 40) to the global features as the classifier, and add dropout layers with drop probability of 0.5 before the last two fully-connected layers. For retrieval, we calculate the L2 distances between the global features as similarities and evaluate the result with mean average precision (mAP). We use the SGD optimizer for training, with initial learning rate 0.01, momentum 0.9, weight decay 0.0005 and batch size 64.

Table 1 shows the experimental results of classification and retrieval on ModelNet40, comparing our work with representative methods. It is shown that, as a mesh-based representation, MeshNet achieves satisfying performance and makes great improvement compared with traditional mesh-based methods. It is also comparable with recent deep learning methods based on other types of data.

Table 1: Classification and retrieval results on ModelNet40.

Method                                             Modality   Acc. (%)   mAP (%)
3DShapeNets (Wu et al. 2015)                       volume     77.3       49.2
VoxNet (Maturana and Scherer 2015)                 volume     83.0       -
FPNN (Li et al. 2016)                              volume     88.4       -
LFD (Chen et al. 2003)                             view       75.5       40.9
MVCNN (Su et al. 2015)                             view       90.1       79.5
Pairwise (Johns, Leutenegger, and Davison 2016)    view       90.7       -
PointNet (Qi et al. 2017a)                         point      89.2       -
PointNet++ (Qi et al. 2017b)                       point      90.7       -
Kd-Net (Klokov and Lempitsky 2017)                 point      91.8       -
KCNet (Shen et al. 2018)                           point      91.0       -
SPH (Kazhdan, Funkhouser, and Rusinkiewicz 2003)   mesh       68.2       33.3
MeshNet                                            mesh       91.9       81.9

Our performance is attributed to the following reasons. With face-unit and per-face processes, MeshNet solves the complexity and irregularity problem of mesh data and makes it suitable for deep learning methods. Though deep learning has a strong ability to capture features, we do not simply apply it, but design blocks to get full use of the rich information in mesh. Splitting features into spatial and structural features enables us to consider the spatial distribution and local structure of shapes respectively. And the mesh convolution blocks widen the receptive field of faces. Therefore, the proposed method is able to capture detailed features of faces and conduct the 3D shape representation well.

On the Effectiveness of Blocks

To analyze the design of blocks in our architecture and prove their effectiveness, we conduct several ablation experiments, which compare the classification results while varying the settings of the architecture or removing some blocks.

For the spatial descriptor, labeled as “Spatial” in Table 2, we remove it together with the use of the spatial feature in the network, and maintain the aggregation of the structural feature in the mesh convolution.

For the structural descriptor, we first remove the whole of it and use max pooling to aggregate the spatial feature in the mesh convolution. Then we partly remove the face rotate convolution, labeled as “Structural-FRC” in Table 2, or the face kernel correlation, labeled as “Structural-FKC”, and keep the rest of the pipeline to prove the effectiveness of each structural descriptor.

For the mesh convolution, labeled as “Mesh Conv” in Table 2, we remove it and use the initial features to generate the global feature directly. We also explore the effectiveness of different aggregation methods in this block, and compare them in Table 3.

Table 3: Classification results of different aggregation methods on ModelNet40.

Aggregation Method   Accuracy (%)
Average Pooling      90.7
Max Pooling          91.1
Concatenation        91.9

The experimental results show that the adopted concatenation method performs better for aggregating neighboring features.
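The three aggregation methods compared in Table 3 reduce, for each face, its own structural feature together with the three neighbor features gathered through the neighbor index. A minimal numpy sketch; treating max/average as pooling over the face and its neighbors is one plausible reading, chosen here for illustration:

```python
import numpy as np

def aggregate(structural, neighbor_index, method="concatenation"):
    """structural: (n, d) structural features; neighbor_index: (n, 3) face indexes.
    Returns (n, 4d) for concatenation, (n, d) for max/average pooling."""
    gathered = structural[neighbor_index]              # (n, 3, d) neighbor features
    if method == "concatenation":
        return np.concatenate(
            [structural, gathered.reshape(len(structural), -1)], axis=1)
    if method == "max":                                # pool over face + neighbors
        return np.maximum(structural, gathered.max(axis=1))
    if method == "average":
        return (structural + gathered.sum(axis=1)) / 4.0
    raise ValueError(f"unknown method: {method}")

feat = np.arange(12.0).reshape(4, 3)                   # 4 faces, 3-dim features
idx = np.array([[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]])
out = aggregate(feat, idx)                             # (4, 12)
```

Concatenation keeps the per-neighbor information separate (hence the wider output), which is consistent with it outperforming the two pooling variants in Table 3.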
Acknowledgments

This work was supported by National Key R&D Program of China (Grant No. 2017YFC0113000), National Natural Science Funds of China (U1701262, 61671267), National Science and Technology Major Project (No. 2016ZX01038101), MIIT IT funds (Research and application of TCN key technologies) of China, and The National Key Technology R&D Program (No. 2015BAG14B01-02).

References

Chang, A. X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. 2015. ShapeNet: An Information-Rich 3D Model Repository. arXiv preprint arXiv:1512.03012.

Chen, D.-Y.; Tian, X.-P.; Shen, Y.-T.; and Ouhyoung, M. 2003. On Visual Similarity Based 3D Model Retrieval. In Computer Graphics Forum, volume 22, 223–232. Wiley Online Library.

Feng, Y.; Zhang, Z.; Zhao, X.; Ji, R.; and Gao, Y. 2018. GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 264–272.

Hegde, V., and Zadeh, R. 2016. FusionNet: 3D Object Classification Using Multiple Data Representations. arXiv preprint arXiv:1607.05695.

Hubeli, A., and Gross, M. 2001. Multiresolution Feature Extraction for Unstructured Meshes. In Proceedings of the Conference on Visualization, 287–294. IEEE Computer Society.

Jiang, M.; Wu, Y.; and Lu, C. 2018. PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation. arXiv preprint arXiv:1807.00652.

Johns, E.; Leutenegger, S.; and Davison, A. J. 2016. Pairwise Decomposition of Image Sequences for Active Multi-view Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3813–3822.

Kazhdan, M.; Funkhouser, T.; and Rusinkiewicz, S. 2003. Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors. In Symposium on Geometry Processing, volume 6, 156–164.

Klokov, R., and Lempitsky, V. 2017. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In Proceedings of the IEEE International Conference on Computer Vision, 863–872. IEEE.

Kokkinos, I.; Bronstein, M. M.; Litman, R.; and Bronstein, A. M. 2012. Intrinsic Shape Context Descriptors for Deformable Shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 159–166. IEEE.

Li, Y.; Pirk, S.; Su, H.; Qi, C. R.; and Guibas, L. J. 2016. FPNN: Field Probing Neural Networks for 3D Data. In Advances in Neural Information Processing Systems, 307–315.

Li, J.; Chen, B. M.; and Lee, G. H. 2018. SO-Net: Self-Organizing Network for Point Cloud Analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9397–9406.

Lien, S.-l., and Kajiya, J. T. 1984. A Symbolic Method for Calculating the Integral Properties of Arbitrary Nonconvex Polyhedra. IEEE Computer Graphics and Applications 4(10):35–42.

Maturana, D., and Scherer, S. 2015. VoxNet: A 3D Convolutional Neural Network for Real-time Object Recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 922–928. IEEE.

Qi, C. R.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; and Guibas, L. J. 2016. Volumetric and Multi-View CNNs for Object Classification on 3D Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5648–5656.

Qi, C. R.; Su, H.; Mo, K.; and Guibas, L. J. 2017a. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 77–85.

Qi, C. R.; Yi, L.; Su, H.; and Guibas, L. J. 2017b. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Advances in Neural Information Processing Systems, 5099–5108.

Shen, Y.; Feng, C.; Yang, Y.; and Tian, D. 2018. Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 4.

Su, H.; Maji, S.; Kalogerakis, E.; and Learned-Miller, E. 2015. Multi-View Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the IEEE International Conference on Computer Vision, 945–953.

Tsin, Y., and Kanade, T. 2004. A Correlation-Based Approach to Robust Point Set Registration. In European Conference on Computer Vision, 558–569. Springer.

Wang, D. Z., and Posner, I. 2015. Voting for Voting in Online Point Cloud Object Detection. In Robotics: Science and Systems, volume 1.

Wang, P.-S.; Liu, Y.; Guo, Y.-X.; Sun, C.-Y.; and Tong, X. 2017. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis. ACM Transactions on Graphics (TOG) 36(4):72.

Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; and Xiao, J. 2015. 3D ShapeNets: A Deep Representation for Volumetric Shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1912–1920.

You, H.; Feng, Y.; Ji, R.; and Gao, Y. 2018. PVNet: A Joint Convolutional Network of Point Cloud and Multi-view for 3D Shape Recognition. In Proceedings of the 26th ACM International Conference on Multimedia, 1310–1318. ACM.

Zaharescu, A.; Boyer, E.; Varanasi, K.; and Horaud, R. 2009. Surface Feature Detection and Description with Applications to Mesh Matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 373–380. IEEE.

Zhang, C., and Chen, T. 2001. Efficient Feature Extraction for 2D/3D Objects in Mesh Representation. In Proceedings of the IEEE International Conference on Image Processing, volume 3, 935–938. IEEE.