You are on page 1of 11

This WACV paper is the Open Access version, provided by the Computer Vision Foundation.

Except for this watermark, it is identical to the accepted version;


the final published version of the proceedings is available on IEEE Xplore.

RIMeshGNN: A Rotation-Invariant Graph Neural Network


for Mesh Classification

Bahareh Shakibajahromi, Edward Kim, David E. Breen


Department of Computer Science, Drexel University
Philadelphia, PA, USA
bs643@drexel.edu, ek826@drexel.edu, deb39@drexel.edu

Abstract aim to categorize 3D shapes into different classes based on


their geometric features and retrieve similar shapes based on
Shape analysis tasks, including mesh classification, seg- user queries. A descriptive and versatile representation of
mentation, and retrieval demonstrate symmetries in Eu- 3D object shapes is crucial for accomplishing these tasks. In
clidean space and should be invariant to geometric trans- the field of 3D shape analysis, deep learning methods such
formations such as rotation and translation. However, ex- as PointNet [40], PointNet++ [42], ShapeContextNet [57],
isting methods in mesh analysis often rely on extensive data PointGrid [26], DynamicGCN [54], and SampleNet [25]
augmentation and more complex analysis models to han- have demonstrated potential. In the context of 3D shape
dle 3D rotations. Despite these efforts, rotation invariance analysis, it is crucial for a neural network-based model to be
is not guaranteed, which can significantly reduce accuracy able to effectively learn relevant latent features, even when
when test samples undergo arbitrary rotations, because the the 3D shape has undergone geometric transformation.
analysis method struggles to generalize to the unknown ori- In this study, we employ adaptively sampled mesh mod-
entations of the test samples. To address these challenges, els to represent 3D shapes, instead of other popular meth-
our work presents a novel approach that employs graph ods, such as volumetric grids (voxels) [33, 41] and point
neural networks (GNNs) to analyze mesh-structured data. clouds [42,59]. Mesh models may provide higher resolution
Our proposed GNN layer, aggregation function, and local by assigning more vertices and polygons to complex geo-
pooling layer are equivariant to the rotation, reflection and metric areas. Conversely, in extensive, nearly flat regions
translation of 3D shapes, making them suitable building only a minimal number of polygons are utilized. Mesh
blocks for our proposed rotation-invariant network for the models are a useful choice due to their conceptual simplic-
classification of mesh models. Therefore, our proposed ap- ity, versatility, and efficient processing methods.
proach does not need rotation augmentation, and we can Shape analysis tasks, such as mesh classification, seg-
maintain accuracy even when test samples undergo arbi- mentation, and retrieval exhibit symmetries in Euclidean
trary rotations. Extensive experiments on various datasets space. These tasks involve predicting outcomes that should
demonstrate that our methods achieve state-of-the-art per- be unaffected by rotation, translation, and reflection, collec-
formance. tively referred to as E(3) transformations, that are applied
to the analyzed geometric models. However, the majority
of the mesh [3, 21, 24], point cloud [18, 29, 42, 54, 55], volu-
1. Introduction metric grids [33, 37] analysis methods are not robust to ro-
tation. Most 3D shape analysis methods have been trained
With the advancements in 3D scanning technology and and evaluated on 3D shape datasets, such as ModelNet [56]
modeling software, the amount of generated 3D geometric and ShapeNet [7], which contain shapes that have been pre-
data has increased exponentially. These data are used in var- aligned into a standard orientation. These methods fail to
ious fields such as gaming, augmented and virtual reality, generalize to models in unknown poses and experience a
medical imaging, robotics, self-driving cars, and many oth- significant performance drop when evaluated on objects in
ers. The analysis of 3D mesh data is challenging, and tradi- arbitrary orientations. A different set of techniques deals
tional methods can be time-consuming and limited in their with the input mesh as a graph and uses graph deep learning
capabilities. 3D shape classification and 3D shape matching techniques to analyze the input mesh for various tasks such
are among the main applications of shape analysis and have as classification and segmentation [34,35,44,52]. However,
been a long standing topic of active research. These tasks most spatial GNN layers, such as graph convolution, graph

3150
attention, and message passing layers, are not designed to line future work. In order to demonstrate the impact and the
handle node features that include attributes that vary with individual contribution of each component within our clas-
different coordinate systems. This means that the spatial sification network, we conducted a series of ablation stud-
coordinates and normal vectors of nodes change when sub- ies. The outcomes of these studies can be found in the sup-
jected to translation, rotation, and reflection. Consequently, plementary materials.
training a mesh classification network using these GNN
layers is expensive and requires significant data augmen- 2. Related Work
tation to expose the network to 3D meshes in different
orientations. Additionally, these GNN layers lack transla- A triangular mesh, composed of vertices (V), edges (E),
tion/rotation equivariance or invariance, requiring O(δ −3 ) and faces (F), is the most widely used and effective 3D
more filters to achieve an angular resolution of δ (which is shape representation in computer graphics. However, the
assumed to be less than 1) [50]. To tackle these challenges irregularity and unstructured nature of this representation
and reduce the computational complexity of the classifica- poses a challenge when applying Convolutional Neural Net-
tion network, we propose an approach that is translation, works (CNNs) to it, as each vertex has a varying number of
rotation and reflection invariant. neighbors at differing distances. Traditional deep learning
techniques tackle the irregularity of meshes by either con-
The objective of our research is to develop a represen- verting them into volumetric grids [33, 37, 41] or creating
tation of 3D objects that can be used for 3D shape classi- multiple 2D projections [23, 47] to make them compatible
fication that remains unaffected by geometric transforma- with CNNs. An alternative method involves extracting 3D
tions such as rotation and translation. To achieve this goal, point samples from the surface of a three-dimensional ob-
we construct a graph neural network framework with mesh- ject and then applying a point cloud technique to the sam-
specific features. We propose a unique Graph Neural Net- ples [40, 42, 48, 49, 53, 60]. These techniques aim to learn
work (GNN) as an instance of the message-passing GNN a representation for each point by analyzing its neighbors,
that can jointly learn node, edge, and graph embeddings. using either multi-layer perceptrons or convolutional layers.
Our rotation-, reflection- and translation-equivariant layer Recently, there has been a growing research trend ad-
uses geometric features of the mesh that change under rota- dressing 3D meshes directly. Early work on learning repre-
tion and translation, such as position and normals, as well sentations for 3D meshes generalized standard 2D Convo-
as rotation and translation invariant features. Furthermore, lutional Neural Networks (CNNs) to 3D. These generaliza-
we introduce a novel permutation and rotation invariant ag- tions treat 3D meshes as manifolds and apply patch-based
gregation function as a component of the proposed GNN. techniques on the underlying surface [31, 32, 35]. How-
This function leverages the coordinates of vertices by bin- ever, it is worth mentioning that most patch-based methods
ning them into spherical regions, which are then represented entail high computational complexity, hand-crafted tech-
by a complete graph. Each node in the complete graph rep- niques, and precomputed local systems of coordinates [4].
resents a bin and is assigned a feature vector that is cal- Researchers have attempted to establish a specific order
culated by taking the average of the rotation-invariant fea- for vertex neighborhoods to prevent expensive resamplings
tures of mesh vertices located in the corresponding spheri- [3, 4, 17, 24, 31]. Lim et al. [31] and Gong et al. [17] have
cal shell. We then utilize two layers of Graph Attention Net- introduced sampling operators that serialize vertex neigh-
works (GATs), which employ self-attention over the node borhoods along spiral trajectories to capture local geomet-
features, giving varying importance to different neighbors. ric structures. An alternative method for handling the irreg-
We aggregate the node features of the mesh by taking the ular structure of 3D meshes utilizes several random walks
average of the embedded features of the nodes of the com- to traverse the surface of the mesh, thereby examining its
plete graph. Finally, we present a local pooling layer, which local and global geometry [3, 24]. These walks are then
implements mesh coarsening at different levels, enabling fed into a Recurrent Neural Network (RNN) that can retain
multi-resolution mesh analysis. Combined together, these the history of each walk. To overcome the limitations of
components create RIMeshGNN, a (R)otation-(I)nvariant fixed discretization schemes, DiffusionNet [45] utilizes heat
graph neural network (GNN) for mesh classification. propagation across the mesh surface, generating a diffusion
The rest of this paper is structured as follows. In Sec- kernel that encodes vertex relationships in terms of connec-
tion 2 we provide a summary of related research. Section tivity and proximity. Point cloud methods tackle challenges
3 introduces the fundamental components of our approach arising from the irregular domain and the absence of order-
and the framework implemented for 3D mesh classification. ing in various ways [18, 42, 54, 55]. PCT [18] leverages the
Next, in Section 4 we discuss the experimental evaluation permutation invariant properties of transformers, combined
of our proposed model for 3D mesh classification. We also with farthest point sampling and nearest neighbor search,
provide a comparison to relevant supervised models for 3D to facilitate efficient point cloud learning. DeltaConv [55]
classification. We conclude the paper in Section 5 and out- adopts the DGCNN network architecture [54] while substi-

3151
tuting each EdgeConv block with an anisotropic convolu- SO(3)-equivariant neural networks for point-cloud process-
tion layer. ing. Li et al. [28] convert the point clouds into their canoni-
Another way to tackle the irregular, non-uniform struc- cal poses by developing a pose selector module that disam-
ture of a mesh is to exploit the inherent properties of mesh biguates PCA-based canonical poses. To address the same
elements that have regular forms. Particularly, in a 2- issue for meshes, MeshCNN [20] utilized rotation- and
manifold triangle mesh each edge is incident to exactly two translation-invariant geometric features of the mesh edges.
triangular faces and, therefore, adjacent to four other edges. In contrast, our method leverages rotation- and translation-
MeshCNN [20] designed a convolution operator for edges, invariant nodes and edge features, along with directional
where its spatial support is defined using the four incident and positional information that varies with geometric trans-
edges. Similarly, face-based techniques take advantage of formation. We accomplish this through a novel graph neural
the fact that each face in a 2-manifold triangle mesh is ad- network framework integrating mesh-specific geometric in-
jacent to exactly three other faces. MeshNet [14] lever- sights.
ages two mesh convolutional layers to aggregate the 1-ring
neighbors and learn the spatial and structural features of a 3. Method
face. SubdivNet [21] has also introduced a convolutional This section describes RIMeshGNN and the message-
operation for mesh faces that supports kernel size, stride, passing graph neural network (GNN) [2, 16, 43] used to en-
and dilation. However, this method requires a mesh with a code a 3D shape that remains invariant to E(3) transforma-
subdivision sequence. tions. In our context, all shapes are considered to be repre-
As triangle meshes are structured graphs, learning algo- sented by watertight, 2-manifold meshes.
rithms inspired by graph-based approaches can often be em-
ployed to learn from meshes. Our method is closely related 3.1. E(3)-Equivariant GNN
to a family of approaches that utilizes the graph represen- A 3D shape is defined as a mesh comprised of vertices,
tation of 3D meshes and relies on Graph Neural Networks edges, and faces. This mesh can be represented as a graph,
(GNNs). GNNs extract neighborhood features through a denoted by G = (V, E), where V = {vi }N i=1 is the set of
weight-sharing strategy and are classified into two main cat- N vertices in the mesh, E = {eij } is the set of E edges,
egories: 1) spectral- and 2) spatial-based methods. The where edge eij connects two vertices vi and vj . Each ver-
spectral-based methods use the eigen decomposition of the tex vi , edge eij , and the entire graph G are associated with
Laplacian associated with the graph [5, 9, 27]. Such mod- feature vectors fi , hij , and hG . Vertex features include
els generally suffer from large computational complexity, rotation equivariant properties, such as vertex coordinates
unstable decomposition, and poor generalization across dif- xi and normal ni , and rotation invariant feature hi , such
ferent graphs [1, 35, 52]. In contrast, spatial-based methods as mean and Gaussian curvature, area, and dihedral angle
aggregate deep feature information of neighboring nodes (fi = (xi , ni , hi )). We define the vertex area as one-third
[19, 36, 51]. These methods have attained considerable at- of the total face area of its 1-ring neighborhood. The vertex
tention due to their low computational complexity, gener- dihedral angle represents the maximum dihedral angle of all
alization, and localized properties. Milano et al. [34] pro- edges incident to the vertex. Edge features (hij ) are rotation
posed PD-MeshNet that performs dynamic, attention-based invariant, including dihedral and internal angles and edge-
feature aggregation using the primal-dual graph convolu- to-height ratios. The initial graph feature hG includes a his-
tional framework of Monti et al. [36]. Shakibajahromi et togram of the lengths and dihedral angles of all the edges
al. [44] introduced a 3D segmentation network, HyNet, that of the mesh. Figure 1 illustrates the input features for the
converts a 3D mesh into a hybrid graph consisting of three vertex, edge and mesh graph.
types of nodes representing vertices, edges, and faces.
Nevertheless, most end-to-end deep neural networks dis-
cussed earlier do not demonstrate invariance to 3D rota-
tions. As a result, the accuracy of these networks tends to
decline when the orientations of the input 3D objects differ
between training and inference. Several studies have re-
ported a substantial performance difference in shape anal-
ysis tasks between aligned and unaligned or rotationally
augmented settings, such as those by Poulenard et al. [39],
Tao et al. [49] and Sun et al. [48]. Recently several point
cloud techniques have been offered to address this issue Figure 1. Input feature vector for the vertex, edge and mesh graph.
[10, 28, 39, 48, 49, 61]. For example, Deng et al. [10] ex-
tended neurons from 1D scalars to 3D vectors, introducing Our proposed E(3) equivariant layer utilizes the nodes,

3152
edges, and graph features as input and generates corre- The global graph attribute hG is updated using a learn-
sponding output features by gathering information from the able transformation ΦG . This transformation takes the ag-
neighboring nodes in the mesh graph through nonlinear gregation of node updates sl+1 l+1
G , edge updates hij , and the
transformation and permutation-invariant aggregation func- l
graph attributes of the previous layer hG as input. The result
tions. To achieve sufficient expressive power, a learnable is an updated feature vector for hl+1
G , defined as:
transformation must be employed to convert the input fea-
tures into higher-level features. Specifically, we apply a

shared transformation Φe : RF → RF , i.e. a feed-forward h_{\mathcal {G}}^{l+1} &= \Phi _g(s_{\mathcal {G}}^{l+1}, \frac {1}{E} \sum _{i \in \mathcal {V}, j \in \mathcal {N}_i} h_{ij}^{l+1}, h_{\mathcal {G}}^l ) . \label {eq:graph_update_g} (6)
neural networks, to each edge, utilizing the edge’s and its
vertices’ features, and the graph embedding as inputs:
In order to calculate the aggregation of node features sl+1
G ,
we have designed a novel aggregation function for combin-
h_{ij}^{l+1} &= \Phi _e( h_{ij}^l, h_i^l, h_j^l, \|x_i^l - x_j^l\|_2, n_i^l \cdot {n_j^l}^T, h_{\mathcal {G}}^l) , \label {eq:edge_update_g} (1) ing node features, which will be discussed in the next sec-
tion. Additionally, we have chosen the MEAN function for
where hlij is the edge embedding and hli and hlj represent
edge feature aggregation, which is supported by the results
the embeddings of the rotation invariant portion of the two
of our numerous experiments and ablation studies.
incident vertices. ∥xli − xlj ∥2 is the distance between the
T
incident vertices’ coordinates. nli · nlj is the cosine of an-
(a)
gles between the incident vertices’ normal and hlG denotes
the graph embedding.
Furthermore, we split the node features update into rota-
tion equivariant and invariant portions. Particularly, we up- 2
3
3 3
2 2
date the vertex coordinates and normal using the weighted 1
1 1

sum of the relative differences between the vertex and


its neighboring nodes. The weights are separately com-
puted using the functions Φx and Φn , and are approxi- (b)
mated by Multilayer Perceptrons (MLPs), which take the
edge embedding hij corresponding to the relative differ-
ences (xi − xj ) and (ni − nj ) and generate scalar values 3
2 3 3
Φx (hij ) and Φn (hij ), as follows: 1
2 2
1
1

x_i^{l+1} &= x_i^l + \frac {1}{|\mathcal {N}_i|} \sum _{j \in \mathcal {N}_i}(x_i^l -x_j^l) \Phi _x( h_{ij}^{l+1}) \label {eq:pos_update_g}\\ n_i^{l+1} &= n_i^l + \frac {1}{|\mathcal {N}_i|} \sum _{j \in \mathcal {N}_i}(n_i^l -n_j^l) \Phi _n( h_{ij}^{l+1}) . \label {eq:normal_update_g}
𝒎𝒍+𝟏
𝟏
1

(c) 2 𝒎𝒍+𝟏
𝟐
(3)

𝒎𝒍+𝟏
𝑲 3 𝒎𝒍+𝟏
𝟑
k
The embedded normal vector is then normalized:

n_i^{l+1} &= \frac {n_i^{l+1}}{\|n_i^{l+1}\|_2} . \label {eq:normalization_g} (4)


Figure 2. Permutation and Rotation Invariant Aggregation Func-
tion: (a) illustrates the spherical bins and bucketizing process. (b)
To update the rotation invariant component of the node fea- demonstrates the rotation-invariance property of the process, and
ture hi , we utilize a learnable transformation Φh approx- the mesh vertices associated with the first bin and its correspond-
imated by an MLP that operates on each vertex vi . This ing feature vector ml+11 , (c) presents the complete graph with nbin
transformation takes as input the aggregation of edge up- nodes, where each node represents a spherical region.
dates for edges that are incident to vertex vi , the initial fea-
ture hli , and the entire graph feature hlG , as follows: 3.2. Permutation, Rotation and Reflection Invariant
Aggregation Function
h_i^{l+1} &= \Phi _h(h_i^l, \sum _{j \in \mathcal {N}_i} h_{ij}^{l+1}, h_{\mathcal {G}}^l) . \label {eq:node_update} (5)
After the input 3D shapes have been scaled to fit inside a
unit sphere centered at the origin, we proceed to bucketize
We have opted to use the summation function for aggrega- the nodes V into nbin groups based on their distance from
tion, and our various experiments and ablation studies have the origin. This process will assign an index ki to each node
provided supporting evidence for this choice. vi . To achieve this, we divide the range of distances from 0

3153
to 1 into equal intervals, creating spherical bins illustrated xi and xj , which is then added to xi . Any rotation applied
in Figure 2a and expressed by the following equation: to input coordinates will result in an equivalent rotation in
xi − xj and xi . Therefore, the output xl+1 i undergoes an
k_i &= \textit {Bucketize}( \|x_i\|, n_{bin}) . \label {eq:bin_g} (7) equivalent rotation as the input when the coordinates are ro-
tated. Likewise, translating the coordinates will alter xi , but
The node embeddings hl+1
i that belong to each bin are av- not alter the coordinate difference and thus cause an equiv-
eraged to form ml+1
k , which represents the bin: alent translation in the output. The same principle applies
to Equation 3.
m_k^{l+1} &= \frac {1}{|\{v_i | k_i=k\}|} \sum _{k_i = k} h_i^{l+1} \hspace {0.8em} k\in \{1,2, \dots , n_{bin}\} . \label {eq:binfeat_g} Moreover, indices ki are assigned to the initial coordi-
nates of shapes, which have already been translated and
(8) scaled to fit in a unit sphere centered at the origin, as illus-
trated in Equation 7. The process of bucketizing is rotation
Figure 2b illustrates the mesh vertices associated with the
invariant as it depends on the nodes’ distance to the cen-
first bin and its corresponding feature vector ml+1
1 . We con- ter and remains unchanged when shapes are rotated. Equa-
struct a complete graph with nbin nodes, where each node
tion 8 averages the invariant feature vectors of nodes within
is represented by ml+1 k , and apply a two-layer graph atten- the same spherical shell. Since bucketization is rotation
tion network (GAT) on the complete graph, as illustrated
invariant, the averaging of invariant features remains un-
in Figure 2c. The GAT [51] assigns varying importance
changed under E(3) transformations. Equation 9 computes
to the neighborhood of each node (spherical bin) in each
graph features based on the bin features ml+1 k . Since mk
l+1
layer, which enhances performance for some downstream
is invariant to E(3) transformations, Equation 9 is also in-
tasks. We calculate a global feature vector sl+1
G for the en-
variant to E(3) transformations. Figure 2b illustrates this
tire complete graph by aggregating the node (bin) embed-
characteristic. Finally, Equation 6 maintains invariance due
ding vectors using the MEAN function:
to its inputs’ translation and rotation invariance properties.
To summarize, the composition of our proposed GNN layer
s_{\mathcal {G}}^{l+1} &= \frac {1}{n_{bin}}\sum _{k=1}^{n_{bin}} \Phi _{GAT}(m_k^{l+1}) . \label {eq:nodeMean_g} (9) and aggregation functions exhibits E(3) equivariance induc-
tively. For classification we utilize the graph attribute ob-
tained through Equation 6, which is E(3) invariant.
This entire process serves as a permutation, rotation, and
reflection invariant aggregation function that consolidates 3.4. Local Pooling Layer
the rotation invariant node features of the input mesh. The
aggregated node features will then be used as one of the We introduce a local pooling layer for RIMeshGNN that
inputs for computing global mesh attributes utilized for the performs mesh coarsening at various scales; thus allowing
shape classification task. for mesh analysis at multiple resolutions. Our method is in-
fluenced by Garland and Heckbert’s surface simplification
3.3. Analysis of E(3) Equivariance and Invariance algorithm [15] and MeshCNN [20] and employs edge con-
Property traction as the basis for our pooling technique. We modify
and extend Diehl’s edge pooling method [11] for a geomet-
In this section, we explore the equivariance properties
ric mesh context to develop a learnable pooling strategy.
of our proposed GNN layer and aggregation function with
The pooling layer performs a series of edge collapse op-
respect to rotation and translation. We observe that cer-
erations, where each collapse operation results in an edge
tain geometric features, such as node features hi (mean and
collapsing to a vertex, and subsequently, two pairs of edges
Gaussian curvature, area, and dihedral angles), edge fea-
from its adjacent faces merge into two edges, as demon-
tures hij (dihedral and internal angles, and edge-to-height
strated in Figure 3. To identify the most appropriate edges
ratios), and shape feature hG (a histogram of the length and
for edge collapses within each pooling layer, we use the up-
dihedral angles of all edges of the mesh) remain invariant to
dated edge embeddings generated by the prior GNN layer in
rotation and translation. Furthermore, the relative distance
Equation 1 (hl+1
ij ). A score is computed for each edge using
between two coordinates ∥x0i − x0j ∥2 and cosine similarity
a single fully connected layer with softmax as a non-linear
between two normal vectors are also invariant to rotation
activation function:
and translation. This is because the distance between nodes
and the relative angles between adjacent normal vectors do S_{ij} = \Phi _{score}(h_{ij}^{l+1}) . (10)
not change when objects are rotated or translated. As a re-
sult, we can ensure that edge features generated by Equation We first rank all edges according to their scores. The
1 remain invariant to E(3) transformation. highest-scoring edge is chosen for the initial edge contrac-
Equation 2, similar to the one presented in [43], is com- tion. We proceed by choosing the next highest-scoring edge
puted as a weighted sum of differences between coordinates if its vertices have not been involved in prior contractions

3154
Edge Collapse

concatenation
Pooling Layer

Pooling Layer

Class
MLP
Figure 3. Local pooling layer based on edge collapse.

Graph feature
Graph feature
within the current pooling layer; otherwise, we skip that
edge. This precaution is taken to prevent redundancy and
optimize the efficiency of the process. Specifically, by de-
ferring the update of the mesh graph’s edges until after all
contractions are completed, the overall computational time Figure 4. RIMeshGNN architecture.
is reduced, minimizing the need for repeated updates after
each individual contraction.
We continue until this process reaches the end of the 3.5. Network Architecture
ranked list. This approach allows the network to collapse
the least important parts of the mesh while preserving the Our classification network employs multiple GNN and
more relevant regions in relation to the network’s purpose pooling layers to capture node and edge dependencies over
enforced by the loss function. It is important to recognize long distances and to facilitate information propagation
that not all edges can be collapsed. Our approach prohibits across multiple hops. Nevertheless, stacking several GNN
edge collapses that result in non-manifold faces, e.g. those layers can cause over-smoothing and performance issues in
that produce faces with zero area. Also, an edge is not suit- deep networks or graphs with small diameters [6, 58]. To
able for collapse if the intersection of the 1-ring vertices of address the problem of over-smoothing and enhance model
the edge’s end nodes contains three or more vertices. capacity, we concatenate the graph features from all lay-
More specifically, in each edge collapse operation, an ers, thereby directly fusing all the information from inter-
edge eij = (vi , vj ) is contracted to a new vertex vk . This mediate layers. The embedding vector produced for nodes,
action also removes two pairs of edges namely (emi = edges, and graphs can serve various mesh analysis purposes,
(vm , vi ), emj = (vm , vj ) and (eni = (vn , vi ), enj = including mesh classification, shape retrieval, and segmen-
(vn , vj )) and introduces emk = (vm , vk ) and enk = tation. In order to accomplish graph classification, we in-
(vn , vk ). We illustrate this process in Figure 3. To deter- corporate an MLP sub-network as the final layer. The pro-
mine the feature of emk = (vm , vk ), we take the average of posed neural network is then trained end-to-end to achieve
the features of edges emi and emj (hl+1 l+1
mk = 0.5 · (hmi +
mesh classification by minimizing cross-entropy loss over
l+1 all predicted categories and labeled graphs. Figure 4 pro-
hmj )). Similarly, the feature of enk is computed by aver-
aging the features of edges eni and enj . Additionally, we vides an overview of the proposed classification network.
combine the features of the two collapsed nodes vi and vj
to generate new vertex features for vk :
4. Experiments and Results
h_{k}^{l+1} = S_{ij} \cdot (h_{i}^{l+1} + h_{j}^{l+1}) \label {eq:inv_feat_merge}\\ x_{k}^{l+1} = 0.5 \cdot (x_{i}^{l+1} + x_{j}^{l+1})\\ n_{k}^{l+1} = \frac {n_{i}^{l+1} + n_{j}^{l+1}}{\|(n_{i}^{l+1} + n_{j}^{l+1}\|_2}\\ x_{k} = 0.5 \cdot (x_{i} + x_{j}) .
Our proposed E(3)-equivariant Graph Neural Network
and rotation-invariant aggregation layer have a broad range
of applications in 3D shape analysis, due to their flexibil-
ity and adaptability. These components form the founda-
tional elements of our E(3) invariant classification network.
(14) We quantitatively evaluated the network’s performance for
a shape classification task and compared the results with
Following the approach in [11], to ensure that the gradient the current state-of-the-art outcomes. The RIMeshGNN
can flow into the scores, we multiply the combined node model uses PyTorch [38] as the deep learning framework
features by the edge score Sij in Equation 11. Our pro- on a computer with a 3.8 GHz CPU (24 cores), a 64 GB
posed local pooling layer guarantees that the pooling pro- RAM, and a TITAN RTX 24GB GPU. The graph neural
cess maintains a watertight and manifold shape throughout network operations are performed with the Deep Graph Li-
the process. Furthermore, it is rotation-equivariant and does brary (DGL) [53]. We have conducted extensive experi-
not affect the rotation-invariance characteristic of our clas- ments to demonstrate the effectiveness of the key compo-
sification network. nents of our proposed GNN layer and model.

3155
Dataset SHREC11 investigation to include training the baseline methods using
Split Split 16 Split 10 augmented data generated through random rotations. The
Evaluation Process NR/NR NR/NR subsequent evaluation aimed to assign labels to arbitrarily
GWCNN [13] 96.6% 90.3% rotated test samples (AR/AR).
MeshCNN [20] 98.6% 91.0% SHREC11. The dataset introduced by Lian et al. [30]
MeshWalker [24] 98.6% 97.1% contains 30 different classes, each with 20 samples. Fol-
HodgeNet [46] 99.1% 94.6% lowing the setup in [13, 20, 34], we perform training and
SubdivNet [21] 99.9% 99.5% evaluation on two types of dataset splits: 1) Split 16: where
RIMeshGNN (Ours) 100% 100% the samples of each class are randomly split into 16 train-
ing examples and 4 testing examples, 2) Split 10: where
Table 1. Experimental Results for SHREC11 classification. the samples of each class are randomly split into 10 training
examples and 10 testing examples.
Our method is compared to multiple state-of-the-art deep
4.1. Data Preprocessing and Augmentation
learning methods that utilize a mesh representation of 3D
Within our framework, it is assumed that all shapes shapes as input. The evaluation results are shown in Ta-
are represented as water-tight manifold meshes. However, ble 1. Our classification network is rotation invariant and
many 3D shapes found in various datasets are not water- achieves an outstanding 100% accuracy in the NR/NR eval-
tight manifolds. As a result, the ManifoldPlus [22] method uation setting for both split 16 and split 10. It maintains this
is used in a preprocessing stage to convert the triangular accuracy even when an arbitrary rotation is applied to the
meshes into water-tight manifolds. While our method does test samples in the NR/AR evaluation setting for both data
not mandate meshes to have the same number of vertices splits. Our method outperforms all compared methods on
or faces, we simplify all meshes to precisely 750 faces to aligned and unaligned data. The unavailability of the meth-
reduce the computational cost of training. In addition, to ods’ code or pre-trained models for the SHREC dataset lim-
reduce network sensitivity to size, we translate and scale ited our ability to perform evaluations of the models within
all 3D meshes to ensure they fit within a unit sphere that is the NR/AR and AR/AR evaluation settings. The models
centered at the origin. GWCNN [13], MeshWalker [24], HodgeNet [46], and Sub-
Augmentation. Two distinct methods for data augmen- divNet [21], which utilize xyz and normal feature vectors,
tations are utilized: 1) adding varying Gaussian noise to are vulnerable to the impact of rotation. Consequently,
each vertex of the shape, and 2) randomly displacing ver- their performance suffers when evaluated on datasets that
tices to various locations on the surface of the mesh within include unknown rotation transformations in the NR/AR
a close-to-planar surface region [20]. As our classification assessment framework. In contrast, MeshCNN [20] is ca-
network is rotation invariant, including rotated 3D shapes in pable of handling random rotation due to its use of solely
the dataset does not help in network training. rotation- and translation-invariant edge features. Our pro-
posed approach takes advantage of both rotation-sensitive
4.2. Classification and rotation-invariant features of nodes and edges.
We evaluated our proposed graph neural network (GNN) ModelNet40. The ModelNet40 dataset contains 12,311
and aggregation function for mesh classification on two shapes categorized into 40 different classes. The dataset
datasets, namely SHREC11 [30] and ModelNet40 [56], and was introduced by Wu et al. [56], with 9,843 models used
conducted three separate experiments to demonstrate the for training and 2,488 models for testing. ModelNet40
effectiveness and superiority of our approach. Initially, has become a widely used benchmark for 3D geometric
the classification network was trained using the original learning, particularly in tasks related to classification and
datasets that contained aligned shapes (No random rota- retrieval. However, it should be noted that the majority
tion, NR). The test process was divided into two groups. of the shapes within the dataset are not watertight mani-
In the first experiment, the classification network predicted folds. To address this issue, we reconstructed the shapes
category labels for the original datasets with aligned test in ModelNet40, producing manilfold meshes using Mani-
samples (NR). In the second experiment, category labels foldPlus [22] and then simplifying them to 750 faces using
were predicted by the classification network after applying Garland’s method [15]. We refer to this modified dataset as
an arbitrary rotation transformation to the test data (AR). Manifold40. Due to reconstruction errors and simplification
This experiment simulated real-world scenarios where the distortions, Manilfold40 meshes present a more challeng-
3D mesh can be represented in any coordinate system, and ing classification task, resulting in slightly lower accuracies
the shape orientation is unknown to the network. In the compared to the results from ModelNet40 meshes.
third experiment, we examined the effect of augmentation We compare our model to several state-of-the-art deep
on rotation-sensitive baseline methods. We expanded our learning techniques that employ triangular mesh representa-

3156
Dataset Manifold40** (ModelNet40) # Parameters
Evaluation Setting NR/NR † NR/AR AR/AR Input
MeshNet [14] 88.4% (91.9%) 10.8% (11.0%) 81.1% Mesh 4.2 M
MeshWalker [24] 90.5% (92.3%) 11.2% (11.4%) -- Mesh 12.6 M
SubdivNet [21] 91.2% 10.2% 87.0% Mesh 0.8 M
Laplacian2Mesh [12] 90.9% - - Mesh -
RIMeshGNN (Ours) 90.7% 90.7% 90.7% Mesh 0.9 M
MeshWalker* [24] (94.4%) (13.4%) -- Mesh 12.6 M
Laplacian2Mesh* [12] 92.8% - - Mesh -
RIMeshGNN* (Ours*) 93.4% 93.4% 93.4% Mesh 0.9 M
DynamicGCN [54] 93.5% 16.6% 81.1% point cloud -
DeltaConv [55] 93.8% 15.6% -- point cloud -
ClusterNet [8] 87.1% 87.1% 87.1% point cloud -
Pose-Selector [28] 90.2% 90.2% 90.2% point cloud -
VN-DGCN [10] 90.0% 90.0% 90.0% point cloud -

Table 2. Experimental Results on Manifold40 (ModelNet40) Classification. The dash (-) indicates that the corresponding method’s code
is unavailable, preventing the evaluation of the model under the NR/AR and AR/AR settings and further examination of the network
architecture. - - denotes our inability to train the data with rotation augmentation due to the substantial computational demands of training
or data preprocessing. The accuracy of models that can handle non-watertight and non-manifold 3D shapes from the ModelNet40 dataset
is denoted by values in parentheses. * The reported accuracy excludes five cross-labeled classes (desk/table and plant/flower-pot/vase).
†The accuracy values for models utilizing mesh input on the Manifold40 dataset are reported from [12, 21] .

tions of 3D objects. Table 2 shows that our method achieves struggle, all while using significantly fewer learnable pa-
comparable results to prior work in the NR/NR evaluation rameters.
scenario, where both the training and testing are performed Furthermore, we compare our approach with state-of-
on aligned dataset samples. Our approach, in particular, the-art deep learning methods that utilize point cloud rep-
reaches a classification accuracy of 90.7%. In our analysis, resentation for 3D objects. While rotation-sensitive meth-
we identified some categories with cross-labeling issues, ods [42, 54, 55] experience a substantial accuracy drop in
such as desk/table and plant/flowerpot/vase, which contain a NR/AR setting, Clusternet [8], VN-DGCN [10] and Pos-
notable number of cross-labeled samples. Following the ap- eSelector [28] achieve rotation-invariant classification. No-
proach of MeshWalker [24] and Laplacian2Mesh [12], we tably, the accuracy of PoseSelector [28] decreases in classi-
excluded the models of these five categories. As a result, fication tasks involving datasets with non-symmetrical ob-
our classification accuracy reached 93.4%, a result that falls jects, where intra-class shapes do not share similar struc-
between the accuracy of Laplacian2Mesh and MeshWalker. tures. However, our proposed method consistently achieves
In NR/AR setup, we maintain the classification accuracy state-of-the-art performance in accuracy.
of 90.7% on the randomly rotated full test set and signifi-
cantly outperform all compared methods. The accuracy of 5. Conclusions
RIMeshGNN also remains at 93.4% for the reduced dataset
In this paper, we present a novel representation learn-
in the NR/AR setup. In contrast, the majority of other deep
ing model, RIMeshGNN, to address the challenge of rota-
learning approaches that incorporate Euclidean coordinates
tion invariance in shape analysis tasks. Our approach con-
and normal vectors struggle to generalize to unfamiliar ori-
sists of a E(3) equivariant GNN layer, aggregation func-
entations, resulting in a significant decline in performance
tion, and local pooling layer, which together form the basis
when assessed in the NR/AR evaluation setting. It could be
of our proposed rotation-invariant network for mesh model
argued that rotation augmentation might enhance the perfor-
classification. The method enables the analysis of mesh-
mance of rotation-sensitive models [14,18,21,24,42,54,55];
structured data while maintaining accuracy even when test
however, this requires training with extensive data augmen-
samples experience arbitrary rotations, without requiring
tation, substantially more complex analysis models, and in-
extensive training on rotation-augmented dataset. Compre-
creased computation resources, while our proposed method
hensive tests on various datasets demonstrate that our ap-
does not require any rotation augmentation. Furthermore,
proach achieves state-of-the-art performance. Additionally,
our method achieves comparable accuracy in the NR/NR
the proposed representation learning framework, can and
evaluation setup, and also maintains consistent accuracy in
will be adapted for other mesh analysis tasks, such as mesh
the NR/AR and AR/AR configuration, where other models
segmentation and retrieval.

3157
References [12] Qiujie Dong, Zixiong Wang, Manyi Li, Junjie Gao, Shuang-
min Chen, Zhenyu Shu, Shiqing Xin, Changhe Tu, and Wen-
[1] Muhammet Balcilar, Guillaume Renton, Pierre Héroux, ping Wang. Laplacian2Mesh: Laplacian-based mesh under-
Benoit Gauzere, Sebastien Adam, and Paul Honeine. Bridg- standing. IEEE Transactions on Visualization and Computer
ing the gap between spectral and spatial domains in graph Graphics, 2023. 8
neural networks. arXiv preprint arXiv:2003.11702, 2020. 3 [13] Danielle Ezuz, Justin Solomon, Vladimir G Kim, and Mirela
[2] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Al- Ben-Chen. GWCNN: A metric alignment layer for deep
varo Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Ma- shape analysis. In Computer Graphics Forum, volume 36,
linowski, Andrea Tacchetti, David Raposo, Adam Santoro, pages 49–57. Wiley Online Library, 2017. 7
Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Bal- [14] Yutong Feng, Yifan Feng, Haoxuan You, Xibin Zhao, and
lard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Yue Gao. MeshNet: Mesh neural network for 3D shape rep-
Allen, Charles Nash, Victoria Langston, Chris Dyer, Nico- resentation. In Proceedings of the AAAI Conference on Ar-
las Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, tificial Intelligence, volume 33, pages 8279–8286, 2019. 3,
Oriol Vinyals, Yujia Li, and Razvan Pascanu. Relational in- 8
ductive biases, deep learning, and graph networks, 2018. 3
[15] Michael Garland and Paul S Heckbert. Surface simplification
[3] Ran Ben Izhak, Alon Lahav, and Ayellet Tal. AttWalk: At- using quadric error metrics. In Proceedings of the 24th an-
tentive cross-walks for deep mesh analysis. In Proceed- nual conference on Computer graphics and interactive tech-
ings of the IEEE/CVF Winter Conference on Applications of niques, pages 209–216, 1997. 5, 7
Computer Vision, pages 1546–1555, 2022. 1, 2 [16] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol
[4] Giorgos Bouritsas, Sergiy Bokhnyak, Stylianos Ploumpis, Vinyals, and George E Dahl. Neural message passing for
Michael Bronstein, and Stefanos Zafeiriou. Neural 3D mor- quantum chemistry. In International conference on machine
phable models: Spiral convolutional networks for 3D shape learning, pages 1263–1272. PMLR, 2017. 3
representation learning and generation. In Proceedings of [17] Shunwang Gong, Lei Chen, Michael Bronstein, and Stefanos
the IEEE/CVF International Conference on Computer Vi- Zafeiriou. SpiralNet++: A fast and highly efficient mesh
sion, pages 7213–7222, 2019. 2 convolution operator. In Proceedings of the IEEE/CVF Inter-
[5] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann Le- national Conference on Computer Vision Workshops, pages
Cun. Spectral networks and locally connected networks on 0–0, 2019. 2
graphs. arXiv preprint arXiv:1312.6203, 2013. 3 [18] Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang
[6] Chen Cai and Yusu Wang. A note on over-smoothing for Mu, Ralph R Martin, and Shi-Min Hu. Pct: Point cloud
graph neural networks. arXiv preprint arXiv:2006.13318, transformer. Computational Visual Media, 7:187–199, 2021.
2020. 6 1, 2, 8
[7] Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat [19] William L Hamilton, Rex Ying, and Jure Leskovec. Induc-
Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Mano- tive representation learning on large graphs. In Proceedings
lis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, of the 31st International Conference on Neural Information
and Fisher Yu. ShapeNet: An Information-Rich 3D Model Processing Systems, pages 1025–1035, 2017. 3
Repository. Technical Report arXiv:1512.03012 [cs.GR], [20] Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar
Stanford University — Princeton University — Toyota Tech- Fleishman, and Daniel Cohen-Or. MeshCNN: A network
nological Institute at Chicago, 2015. 1 with an edge. ACM Transactions on Graphics (TOG),
[8] Chao Chen, Guanbin Li, Ruijia Xu, Tianshui Chen, Meng 38(4):1–12, 2019. 3, 5, 7
Wang, and Liang Lin. Clusternet: Deep hierarchical cluster [21] Shi-Min Hu, Zheng-Ning Liu, Meng-Hao Guo, Jun-Xiong
network with rigorously rotation-invariant representation for Cai, Jiahui Huang, Tai-Jiang Mu, and Ralph R Martin.
point cloud analysis. In Proceedings of the IEEE/CVF con- Subdivision-based mesh convolution networks. ACM Trans-
ference on computer vision and pattern recognition, pages actions on Graphics (TOG), 41(3):1–16, 2022. 1, 3, 7, 8
4994–5002, 2019. 8 [22] Jingwei Huang, Yichao Zhou, and Leonidas Guibas. Man-
[9] Michaël Defferrard, Xavier Bresson, and Pierre Van- ifoldPlus: A robust and scalable watertight manifold sur-
dergheynst. Convolutional neural networks on graphs face generation method for triangle soups. arXiv preprint
with fast localized spectral filtering. arXiv preprint arXiv:2005.11621, 2020. 7
arXiv:1606.09375, 2016. 3 [23] Asako Kanezaki, Yasuyuki Matsushita, and Yoshifumi
[10] Congyue Deng, Or Litany, Yueqi Duan, Adrien Poulenard, Nishida. RotationNet: Joint object categorization and pose
Andrea Tagliasacchi, and Leonidas J Guibas. Vector neu- estimation using multiviews from unsupervised viewpoints.
rons: A general framework for so (3)-equivariant networks. In Proceedings of IEEE International Conference on Com-
In Proceedings of the IEEE/CVF International Conference puter Vision and Pattern Recognition (CVPR), 2018. 2
on Computer Vision, pages 12200–12209, 2021. 3, 8 [24] Alon Lahav and Ayellet Tal. MeshWalker: Deep mesh under-
[11] Frederik Diehl, Thomas Brunner, Michael Truong Le, and standing by random walks. ACM Transactions on Graphics
Alois Knoll. Towards graph pooling by edge contraction. (TOG), 39(6):1–13, 2020. 1, 2, 7, 8
In ICML 2019 Workshop on Learning and Reasoning with [25] Itai Lang, Asaf Manor, and Shai Avidan. SampleNet: Dif-
Graph-Structured Data, 2019. 5, 6 ferentiable point cloud sampling. In Proceedings of the

3158
IEEE/CVF Conference on Computer Vision and Pattern [38] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer,
Recognition, pages 7578–7588, 2020. 1 James Bradbury, Gregory Chanan, Trevor Killeen, Zeming
[26] Truc Le and Ye Duan. PointGrid: A deep network for 3D Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison,
shape understanding. In Proceedings of the IEEE conference Andreas Kopf, Edward Yang, Zachary DeVito, Martin Rai-
on computer vision and pattern recognition, pages 9204– son, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner,
9214, 2018. 1 Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An
[27] Ron Levie, Federico Monti, Xavier Bresson, and Michael M imperative style, high-performance deep learning library. In
Bronstein. CayleyNets: Graph convolutional neural net- H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc,
works with complex rational spectral filters. IEEE Trans- E. Fox, and R. Garnett, editors, Advances in Neural Infor-
actions on Signal Processing, 67(1):97–109, 2018. 3 mation Processing Systems, volume 32. Curran Associates,
[28] Feiran Li, Kent Fujiwara, Fumio Okura, and Yasuyuki Mat- Inc., 2019. 6
sushita. A closer look at rotation-invariant deep point cloud [39] Adrien Poulenard and Leonidas J Guibas. A functional ap-
analysis. In Proceedings of the IEEE/CVF International proach to rotation equivariant non-linearities for tensor field
Conference on Computer Vision, pages 16218–16227, 2021. networks. In Proceedings of the IEEE/CVF Conference on
3, 8 Computer Vision and Pattern Recognition, pages 13174–
[29] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan 13183, 2021. 3
Di, and Baoquan Chen. PointCNN: Convolution on x- [40] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas.
transformed points. Advances in neural information process- PointNet: Deep learning on point sets for 3D classification
ing systems, 31, 2018. 1 and segmentation. In Proceedings of the IEEE conference
[30] Z Lian, A Godil, B Bustos, M Daoudi, J Hermans, S Kawa- on computer vision and pattern recognition, pages 652–660,
mura, Y Kurita, G Lavoua, P Dp Suetens, et al. Shape re- 2017. 1, 2
trieval on non-rigid 3D watertight meshes. In Eurographics [41] Charles R Qi, Hao Su, Matthias Nießner, Angela Dai,
workshop on 3D object retrieval (3DOR). Citeseer, 2011. 7 Mengyuan Yan, and Leonidas J Guibas. Volumetric and
[31] Isaak Lim, Alexander Dielen, Marcel Campen, and Leif multi-view CNNs for object classification on 3D data. In
Kobbelt. A simple approach to intrinsic correspondence Proceedings of the IEEE conference on computer vision and
learning on unstructured 3D meshes. In Proceedings of the pattern recognition, pages 5648–5656, 2016. 1, 2
European Conference on Computer Vision (ECCV) Work- [42] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J
shops, pages 0–0, 2018. 2 Guibas. PointNet++: Deep hierarchical feature learning on
[32] Jonathan Masci, Davide Boscaini, Michael Bronstein, and point sets in a metric space. Advances in neural information
Pierre Vandergheynst. Geodesic convolutional neural net- processing systems, 30, 2017. 1, 2, 8
works on Riemannian manifolds. In Proceedings of the [43] Vıctor Garcia Satorras, Emiel Hoogeboom, and Max
IEEE international conference on computer vision work- Welling. E(n) equivariant graph neural networks. In Interna-
shop, pages 37–45, 2015. 2 tional Conference on Machine Learning, pages 9323–9332.
[33] Daniel Maturana and Sebastian Scherer. VoxNet: A 3D con- PMLR, 2021. 3, 5
volutional neural network for real-time object recognition. [44] Bahareh Shakibajahromi, Saeed Shayestehmanesh, Daniel
In 2015 IEEE/RSJ International Conference on Intelligent Schwartz, and Ali Shokoufandeh. HyNet: 3D segmentation
Robots and Systems (IROS), pages 922–928, 2015. 1, 2 using hybrid graph networks. In 2021 International Confer-
[34] Francesco Milano, Antonio Loquercio, Antoni Rosinol, Da- ence on 3D Vision (3DV), pages 805–814. IEEE, 2021. 1,
vide Scaramuzza, and Luca Carlone. Primal-dual mesh con- 3
volutional neural networks. In Advances in Neural Informa- [45] Nicholas Sharp, Souhaib Attaiki, Keenan Crane, and Maks
tion Processing Systems, pages 952–963. Curran Associates, Ovsjanikov. Diffusionnet: Discretization agnostic learn-
Inc. 1, 3, 7 ing on surfaces. ACM Transactions on Graphics (TOG),
[35] Federico Monti, Davide Boscaini, Jonathan Masci, 41(3):1–16, 2022. 2
Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. [46] Dmitriy Smirnov and Justin Solomon. HodgeNet: Learning
Geometric deep learning on graphs and manifolds using spectral geometry on triangle meshes. ACM Transactions on
mixture model CNNs. In Proceedings of the IEEE con- Graphics (TOG), 40(4):1–11, 2021. 7
ference on computer vision and pattern recognition, pages [47] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and
5115–5124, 2017. 1, 2, 3 Erik G. Learned-Miller. 2
[36] Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, [48] Xiao Sun, Zhouhui Lian, and Jianguo Xiao. SRINet: Learn-
Or Litany, Stephan Günnemann, and Michael M Bronstein. ing strictly rotation-invariant representations for point cloud
Dual-primal graph convolutional networks. arXiv preprint classification and segmentation. In Proceedings of the 27th
arXiv:1806.00770, 2018. 3 ACM International Conference on Multimedia, pages 980–
[37] A. A. M. Muzahid, Wanggen Wan, Ferdous Sohel, 988, 2019. 2, 3
Naimat Ullah Khan, Ofelia Delfina Cervantes Villagómez, [49] Zhiyong Tao, Yixin Zhu, Tong Wei, and Sen Lin. Multi-
and Hidayat Ullah. 3D object classification using a volumet- head attentional point cloud classification and segmentation
ric deep neural network: An efficient octree guided auxiliary using strictly rotation-invariant representations. IEEE Ac-
learning approach. IEEE Access, 8:23802–23816, 2020. 1, 2 cess, 9:71133–71144, 2021. 2, 3

3159
[50] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann
Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field
networks: Rotation-and translation-equivariant neural net-
works for 3D point clouds. arXiv preprint arXiv:1802.08219,
2018. 2
[51] Petar Veličković, Guillem Cucurull, Arantxa Casanova,
Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph at-
tention networks. arXiv preprint arXiv:1710.10903, 2017. 3,
5
[52] Nitika Verma, Edmond Boyer, and Jakob Verbeek. FeaStNet:
Feature-steered graph convolutions for 3D shape analysis. In
Proceedings of the IEEE conference on computer vision and
pattern recognition, pages 2598–2606, 2018. 1, 3
[53] Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei
Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu
Gai, et al. Deep graph library: A graph-centric, highly-
performant package for graph neural networks. arXiv
preprint arXiv:1909.01315, 2019. 2, 6
[54] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma,
Michael M Bronstein, and Justin M Solomon. Dynamic
graph CNN for learning on point clouds.(2018). arXiv
preprint arXiv:1801.07829, 222, 2018. 1, 2, 8
[55] Ruben Wiersma, Ahmad Nasikun, Elmar Eisemann, and
Klaus Hildebrandt. Deltaconv: anisotropic operators for ge-
ometric deep learning on point clouds. ACM Transactions on
Graphics (TOG), 41(4):1–10, 2022. 1, 2, 8
[56] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Lin-
guang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3D
ShapeNets: A deep representation for volumetric shapes. In
Proceedings of the IEEE conference on computer vision and
pattern recognition, pages 1912–1920, 2015. 1, 7
[57] Saining Xie, Sainan Liu, Zeyu Chen, and Zhuowen Tu. At-
tentional ShapeContextNet for point cloud recognition. In
Proceedings of the IEEE conference on computer vision and
pattern recognition, pages 4606–4615, 2018. 1
[58] Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe,
Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representa-
tion learning on graphs with jumping knowledge networks.
In International conference on machine learning, pages
5453–5462. PMLR, 2018. 6
[59] Y. You, Y. Lou, R. Shi, Q. Liu, Y. Tai, L. Ma, W. Wang,
and C. Lu. PRIN/SPRIN: On extracting point-wise rotation
invariant features. IEEE Transactions on Pattern Analysis
amp; Machine Intelligence, 44(12):9489–9502, 2022. 1
[60] Zhiyuan Zhang, Binh-Son Hua, David W Rosen, and Sai-Kit
Yeung. Rotation invariant convolutions for 3D point clouds
deep learning. In 2019 International conference on 3D vision
(3DV), pages 204–213. IEEE, 2019. 2
[61] Zhiyuan Zhang, Binh-Son Hua, and Sai-Kit Yeung. RI-
Conv++: Effective rotation invariant convolutions for 3D
point clouds deep learning. International Journal of Com-
puter Vision, 130(5):1228–1243, 2022. 3

3160

You might also like