
GPU-ACCELERATED 3D POINT CLOUD PROCESSING WITH HIERARCHICAL GAUSSIAN MIXTURES
Ben Eckart, NVIDIA Research, Learning and Perception Group, 3/20/2019

3D POINT CLOUD DATA


Basic data type for unstructured 3D data
Emergence of commercial depth sensors has made it ubiquitous
POINT CLOUD PROCESSING CHALLENGES
Points are non-differentiable and non-probabilistic
Large amounts of often-noisy data
Often spatially redundant, with wide-ranging density variance
PREVIOUS APPROACHES
What have people done before?
Discrete Approaches
Voxel Grids/Lists, Octrees, TSDFs
Though efficient, they inherit the same non-differentiable, non-probabilistic problems as point clouds

(Figure: OctoMap)
PREVIOUS APPROACHES
What have people done before?
Continuous Approaches
Gaussian Mixture Models, Gaussian Processes
Though theoretically attractive, in practice these tend to be too slow for many applications

(Figures: GMM, Gaussian Process)


Proposal: Hierarchical Gaussian Mixture
Goals:
Efficiency benefits of hierarchical structures like the Octree
Theoretical benefits of a probabilistic generative model

(Figure: a tree of J=8 GMMs. The eight components of the "Level 2" GMM each expand into a J=8 GMM at "Level 3", and again at "Level 4".)
Talk Overview
• Background
– Theory of generative modeling for point clouds
• Single-Layer Model (GMMs)
– GPU-Accelerated Construction Algorithm
– Benefits: Compact and Data-Parallel
– Limitations: Scaling with model size, lack of memory coherence
• Hierarchical Models (HGMMs)
– GPU-Accelerated Construction Algorithm
– Benefits: Fast and Parallelizable on GPU
– Application: Registration
STATISTICAL / GENERATIVE MODELS
Interpret point cloud data (PCD) as an iid sampling of some unknown latent spatial probabilistic function
Generative property: the full joint probability space is represented
Modeling as an MLE Optimization

• Given a set of parameters describing the model, find the parameters that best "explain" the data (maximum data likelihood)
Parametric Model as a Modified GMM
Interpret point cloud data as an iid sampling from a small number (J << N) of Gaussian and uniform distributions:
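As a sketch of this construction (the uniform component's support $V$ and the exact weight parameterization are assumptions; the talk's precise form may differ), the modified mixture density looks like:

$$p(\mathbf{z} \mid \Theta) = \sum_{j=1}^{J} \pi_j\, \mathcal{N}(\mathbf{z} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j) + \pi_{J+1}\,\frac{1}{V}, \qquad \sum_{j=1}^{J+1} \pi_j = 1$$

The uniform term absorbs outliers and sensor noise that no Gaussian explains well.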
GMM for Point Clouds: Intuition
Point samples representing pieces of the same local geometry can be aggregated into clusters, with the local geometry encoded in the covariance of each cluster.
SOLVING FOR THE MLE GMM PARAMETERS
Typically done via the Expectation Maximization (EM) Algorithm

(Figure: the EM loop. E Step: update point-cluster associations. M Step: update the parameters Θ. Iterating over the point cloud carries the model from Θ_init to Θ_final.)
E Step: A Single Point
For each point z_i, we want to find the relative likelihood (expectation) of it having been generated by each cluster (O(N) points in total).
E Step: Expectation Vector
We calculate the probability of each point with respect to each of the J Gaussian clusters (O(J) work per point). The expected associations are denoted by the N×J matrix γ.
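For reference, the standard EM responsibility for point $\mathbf{z}_i$ and cluster $j$ (the known closed form for GMMs, consistent with the slides):

$$\gamma_{ij} = \frac{\pi_j\, \mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}{\sum_{k} \pi_k\, \mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}$$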
M STEP: CLOSED FORM WEIGHTED SUMS
For the GMM case, the M Step has closed-form solutions given the N×J matrix γ:

“Probabilistic generalization
of K-Means Clustering”
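The standard closed-form updates, shown here for reference (they follow directly from the weighted sums over γ):

$$\pi_j = \frac{1}{N}\sum_{i=1}^{N}\gamma_{ij}, \qquad \boldsymbol{\mu}_j = \frac{\sum_i \gamma_{ij}\,\mathbf{z}_i}{\sum_i \gamma_{ij}}, \qquad \boldsymbol{\Sigma}_j = \frac{\sum_i \gamma_{ij}\,(\mathbf{z}_i - \boldsymbol{\mu}_j)(\mathbf{z}_i - \boldsymbol{\mu}_j)^{\top}}{\sum_i \gamma_{ij}}$$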
GPU Data Parallelism
GMM Model Limitations

• Each point needs to access all J cluster parameters in CUDA (poor memory locality and linear O(J) scaling with J)
• The N×J expectation matrix is mostly sparse (thus wasted computation)
• A static number of Gaussians must be set a priori

HIERARCHICAL GAUSSIAN MIXTURE


Suppose we restrict J to be only 8 Gaussians
The model would then fit entirely in shared memory for each CUDA threadblock, removing the need for global memory accesses
The expectation matrix will be dense (N×8)
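A minimal sketch of what such an E-step kernel could look like (the struct layout, kernel name, and precomputed log-weight convention are illustrative assumptions, not the talk's actual implementation):

```cuda
#include <cuda_runtime.h>

#define J 8  // fixed mixture size; the whole model fits in shared memory

struct Gaussian {
    float3 mu;       // mean
    float  prec[6];  // upper triangle of the precision matrix (Sigma^-1)
    float  logw;     // log(pi_j) - 0.5*log|Sigma_j| - 1.5*log(2*pi), precomputed on host
};

__device__ float logLikelihood(float3 z, const Gaussian& g)
{
    float dx = z.x - g.mu.x, dy = z.y - g.mu.y, dz = z.z - g.mu.z;
    // squared Mahalanobis distance via the symmetric precision matrix
    float m = g.prec[0]*dx*dx + g.prec[3]*dy*dy + g.prec[5]*dz*dz
            + 2.f*(g.prec[1]*dx*dy + g.prec[2]*dx*dz + g.prec[4]*dy*dz);
    return g.logw - 0.5f * m;
}

__global__ void eStep8(const float3* points, int n,
                       const Gaussian* model, float* gamma /* n x J, dense */)
{
    __shared__ Gaussian s[J];            // one global load per threadblock
    if (threadIdx.x < J) s[threadIdx.x] = model[threadIdx.x];
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 z = points[i];
    float p[J], sum = 0.f;               // (a log-sum-exp would be more stable)
    for (int j = 0; j < J; ++j) {        // model reads hit shared memory only
        p[j] = expf(logLikelihood(z, s[j]));
        sum += p[j];
    }
    for (int j = 0; j < J; ++j)
        gamma[i * J + j] = p[j] / sum;   // normalized responsibilities
}
```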


HIERARCHICAL GAUSSIAN MIXTURE


After convergence of the J=8 GMM, we can use the N×8 expectation matrix as a partition function
Each point is partitioned via its maximum expectation
Now we have 8 partitions of roughly size N/8

HIERARCHICAL GAUSSIAN MIXTURE


We can now run the algorithm recursively on each partition
Each partition contains ~N/8 points that will be modeled as another J=8 GMM
Note that this produces 64 clusters in total
PARALLEL PARTITIONING USING CUDA
Given each point's max expectation and associated cluster index, we can "invert" this index using parallel scans to group together point IDs having the same partition #:

[0 0 1 0 1 1 1 2 0 2 2 2] ➔ [[0 1 3 8] [2 4 5 6] [7 9 10 11]]  (clusters 0, 1, 2)

Now we can run a 2D CUDA kernel where
Dimension 1: index into the original point cloud
Dimension 2: cluster of the parent
e.g. 3 clusters, 12 points, 2 threads/threadblock ➔ grid size of (2, 3)
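One way to realize this inversion (a sketch using Thrust's sort and vectorized search rather than hand-written scans; variable names are illustrative):

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/iterator/counting_iterator.h>

// Groups point IDs by cluster label and records where each group starts.
// Note: sorts `labels` in place.
void invertIndex(thrust::device_vector<int>& labels,    // size n: cluster per point
                 thrust::device_vector<int>& pointIds,  // out, size n: grouped IDs
                 thrust::device_vector<int>& offsets,   // out, size J+1: group starts
                 int J)
{
    int n = labels.size();
    pointIds.resize(n);
    thrust::sequence(pointIds.begin(), pointIds.end());  // [0, 1, ..., n-1]

    // Stable sort of point IDs by label: points with the same label become
    // contiguous, preserving their original order within each group.
    thrust::stable_sort_by_key(labels.begin(), labels.end(), pointIds.begin());

    // offsets[j] = first position whose label is >= j, so group j spans
    // [offsets[j], offsets[j+1]).
    offsets.resize(J + 1);
    thrust::counting_iterator<int> q(0);
    thrust::lower_bound(labels.begin(), labels.end(), q, q + J + 1, offsets.begin());
}
```

For the example above (labels [0 0 1 0 1 1 1 2 0 2 2 2]), this yields pointIds = [0 1 3 8 2 4 5 6 7 9 10 11] and offsets = [0 4 8 12].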
HGMM COMPLEXITY
Even though we now have 64 clusters, we only need to query 8 clusters for each point (avoiding the computation of all N×J (sparse) expectations)
Due to the 2D CUDA grid and indexing structure, segmenting the points into 64 clusters has exactly the same complexity/speed as the original "simple" J=8 GMM
Thus, we can keep increasing the complexity of the model eightfold while incurring only a linear time penalty
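Concretely (a back-of-the-envelope count consistent with the slide's claim): each level touches all $N$ points against at most 8 clusters, so an $L$-level tree with $8^L$ leaf clusters costs only

$$\sum_{\ell=1}^{L} O(8N) = O(8NL)$$

rather than the $O(8^L N)$ a flat mixture of the same size would require.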

HGMM ALGORITHM
Small EM algorithms (8 clusters at a time) are recursively performed on increasingly smaller partitions of the point cloud data (a host-side sketch follows after this list):
E Step: associate points to clusters
M Step: update mixture means, covariances, and weights
Partition Step: before each recursion, new point partitions are determined by the maximum-likelihood point-cluster associations from the last E Step
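A host-side sketch of that recursion, reusing the Gaussian struct from the E-step sketch earlier (the GMMNode layout and the runEM8/partitionByMaxGamma helpers are hypothetical names standing in for the kernels described above):

```cuda
struct GMMNode {
    Gaussian components[8];   // means, covariances, weights for this level
    GMMNode* children[8];     // one subtree per component (null at the leaves)
};

// Hypothetical helpers wrapping the E/M and partition kernels sketched earlier.
void runEM8(const float3* d_points, const int* d_ids, int n, GMMNode* node);
void partitionByMaxGamma(const float3* d_points, int* d_ids, int n,
                         const GMMNode* node, int counts[8], int offsets[8]);

void buildHGMM(const float3* d_points, int* d_ids, int n,
               int level, int maxLevel, GMMNode* node)
{
    if (node == nullptr || level > maxLevel || n < 64) return;  // stop criteria

    runEM8(d_points, d_ids, n, node);              // small EM, J = 8

    int counts[8], offsets[8];                     // scan-based partition step
    partitionByMaxGamma(d_points, d_ids, n, node, counts, offsets);

    for (int j = 0; j < 8; ++j)                    // recurse on each partition
        buildHGMM(d_points, d_ids + offsets[j], counts[j],
                  level + 1, maxLevel, node->children[j]);
}
```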
HGMM DATA STRUCTURE
(Figure: the HGMM tree. Each node is a J=8 GMM; the eight components at "Level 2" each expand into a J=8 GMM at "Level 3", and again at "Level 4". The structure combines the efficiency benefits of hierarchical structures like the Octree with the theoretical benefits of a probabilistic generative model.)
E Step Performance

COMPACTNESS VS FIDELITY

(Figure: Reconstruction Error (PSNR) vs Model Size (kB), with an annotation at 20 kB)
MODELING LARGE POINT CLOUDS
Endeavor snapshots: ~80 GB of point cloud data each

HGMM Level 6: <12 MB
Volume created from stochastically sampled Marching Cubes
Visualization is real-time: ~20 fps on a Titan X
ENDEAVOR DATA: BILLIONS OF POINTS

APPLICATION: RIGID REGISTRATION

Point-sampled surfaces displaced by some rigid transformation
Recover the translation and rotation that best overlap the point clouds
Registration as EM with HGMM
MLE over Space of Rotations, Translations
Goal: maximize the data likelihood over T, given some probability model θ
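In symbols (a sketch; the full EM treatment optimizes an expected complete-data log-likelihood, but the top-level objective has this shape):

$$\hat{T} = \operatorname*{arg\,max}_{T \in SE(3)} \;\sum_{i=1}^{N} \log p\big(T(\mathbf{z}_i) \mid \Theta\big)$$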

Outdoor Urban Velodyne Data
• Velodyne VLP-16
– ~15k pts/frame
– ~10 frames/sec
• Frame-to-frame model-building and registration with overlap estimation
HGMM-Based Registration
• Average frame-to-frame error: 0.0960

Robust Point-to-Plane ICP
• Average frame-to-frame error: 0.1519
• Best result obtained with libpointmatcher
Speed vs Accuracy Trade-Off
Test: random transformations of point cloud pairs while varying the subsampling rate.

Less subsampling yields better accuracy, but slower speeds.

Bottom left is fastest and most accurate.

Our proposed methods are red/teal/black.
HGMM COMING TO ISAAC
~350 fps on Titan Xp
~30 fps on Xavier
Error: ~0.05° yaw (median, 4 Hz updates)
DRIVEWORKS (Future Release)

With Velodyne HDL-64E:

~300 FPS on Titan Xp

~30 FPS on Xavier
DNN-BASED STEREO DEPTH MAPS

FINAL REMARKS

HGMMs have many nice properties for modeling point clouds:

Efficient: fast to compute via CUDA/GPU, scaling even to billions of points
Multi-Level: models the data distribution well at multiple levels simultaneously
Probabilistic: allows Bayesian optimization for applications like registration
Compact and Continuous: no voxels and no aliasing artifacts, easy to transform
QUESTIONS?
REGISTRATION FROM DNN-BASED STEREO
Noisy point cloud output is well-suited for HGMM representation

Stanford Lounge Dataset (Kinect)

Frame-to-frame registration from point cloud data only (no depth maps), subsampled to 2000 points, first 100 frames.
Histograms of average Euler angle error per frame are shown.

(Figure: histograms for GMM-Based, ICP-Based, and Proposed methods)
Noise Handling
• Test: random (uniform) noise injected at increasing amounts
• Result: mixture components "stick" to geometrically coherent, dense areas, disregarding areas of noise
SAMPLING FOR PROBABILISTIC OCCUPANCY

$$\hat{p} = L_{\Sigma}\, p + \mu, \qquad \forall\, (\mu, \Sigma) \in \Theta$$

where $p \sim \mathcal{N}(0, I)$ and $L_{\Sigma}$ is the Cholesky factor of $\Sigma$ (so that $L_{\Sigma} L_{\Sigma}^{\top} = \Sigma$).
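A minimal sketch of this sampling on the GPU (one sample per component; the Component layout and precomputed lower-triangular Cholesky factors are assumptions):

```cuda
#include <curand_kernel.h>

struct Component {
    float3 mu;    // mean
    float  L[6];  // lower-triangular 3x3 Cholesky factor: [l00 l10 l11 l20 l21 l22]
};

__global__ void sampleOccupancy(const Component* comps, int numComps,
                                float3* samples, unsigned long long seed)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= numComps) return;

    curandState st;
    curand_init(seed, j, 0, &st);                 // one RNG stream per component
    float3 p = make_float3(curand_normal(&st),    // p ~ N(0, I)
                           curand_normal(&st),
                           curand_normal(&st));

    const Component& c = comps[j];
    samples[j] = make_float3(                     // p_hat = L * p + mu
        c.L[0]*p.x                           + c.mu.x,
        c.L[1]*p.x + c.L[2]*p.y              + c.mu.y,
        c.L[3]*p.x + c.L[4]*p.y + c.L[5]*p.z + c.mu.z);
}
```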

MESHING UNDER NOISE

ADAPTIVE MULTI-SCALE
MULTI-SCALE MODELING
Multilevel cross-sections can be adaptively chosen for robustness

E Step: Parallelized Tree Search

Adaptive Thresholding Finds the Most Appropriate Scale to Associate Point Data to the Point Cloud Model

Point-model associations are found through a parallelized adaptive tree search in CUDA (a sketch follows below). A complexity measure over each node determines when to stop descending; other suitable heuristics are possible.
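A sketch of that descent, reusing the Gaussian/GMMNode types and logLikelihood helper from the earlier sketches (the complexityHeuristic here is an assumed stand-in for the slide's definition, which is not reproduced):

```cuda
// Assumed stand-in for the slide's complexity definition.
__device__ float complexityHeuristic(const GMMNode* node);

__global__ void adaptiveAssociate(const float3* points, int n,
                                  const GMMNode* root, float threshold,
                                  const GMMNode** assoc)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 z = points[i];
    const GMMNode* node = root;
    while (true) {
        int best = 0; float bestLL = -1e30f;
        for (int j = 0; j < 8; ++j) {              // follow the most likely child
            float ll = logLikelihood(z, node->components[j]);
            if (ll > bestLL) { bestLL = ll; best = j; }
        }
        const GMMNode* child = node->children[best];
        if (child == nullptr || complexityHeuristic(child) < threshold)
            break;                                 // appropriate scale reached
        node = child;
    }
    assoc[i] = node;                               // point-model association
}
```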
M-Step: Mahalanobis Estimation
We seek the transformation that maximizes the expected joint log-likelihood of our data and latent associations with respect to the posterior over our current association estimates.

The resulting form (1) is a weighted sum of squared Mahalanobis distances, further reduced to (2) by writing it in terms of sufficient statistics $M_j^{\{0,1\}}$.

Lastly, covariance eigendecomposition produces an equivalent weighted point-to-plane distance measure (3), which we can solve efficiently with least squares.
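The general shape of (1), shown as a sketch (the exact constants and the reduction to the $M_j$ statistics follow the talk's derivation, which is not reproduced here):

$$\hat{T} = \operatorname*{arg\,min}_{T} \;\sum_{i=1}^{N}\sum_{j} \gamma_{ij}\, \big(T(\mathbf{z}_i) - \boldsymbol{\mu}_j\big)^{\top} \boldsymbol{\Sigma}_j^{-1} \big(T(\mathbf{z}_i) - \boldsymbol{\mu}_j\big)$$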
