
GPU-ACCELERATED 3D POINT CLOUD PROCESSING WITH HIERARCHICAL GAUSSIAN MIXTURES
Ben Eckart, NVIDIA Research, Learning and Perception Group, 3/20/2019

3D POINT CLOUD DATA


Basic data type for unstructured 3D data
Emergence of commercial depth sensors has made it ubiquitous
POINT CLOUD PROCESSING CHALLENGES
Points are non-differentiable and non-probabilistic
Large amounts of often-noisy data
Often spatially redundant, with wide-ranging density variance
PREVIOUS APPROACHES
What have people done before?
Discrete Approaches
Voxel Grids/Lists, Octrees, TSDFs
Though efficient, they inherit the same non-differentiable, non-probabilistic problems as point clouds

(Figure: OctoMap)
PREVIOUS APPROACHES
What have people done before?
Continuous Approaches
Gaussian Mixture Models, Gaussian Processes
Though theoretically attractive, in practice these tend to be too slow for many applications

(Figures: GMM, Gaussian Process)


Proposal: Hierarchical Gaussian Mixture
Goals:
Efficiency benefits of hierarchical structures like the Octree
Theoretical benefits of a probabilistic generative model

(Figure: a tree of J=8 GMMs. The eight components of the "Level 2" GMM each expand into a J=8 GMM at "Level 3", and again at "Level 4".)
Talk Overview
• Background
– Theory of generative modeling for point clouds
• Single-Layer Model (GMMs)
– GPU-Accelerated Construction Algorithm
– Benefits: Compact and Data-Parallel
– Limitations: Scaling with model size, lack of memory coherence
• Hierarchical Models (HGMMs)
– GPU-Accelerated Construction Algorithm
– Benefits: Fast and Parallelizable on GPU
– Application: Registration
STATISTICAL / GENERATIVE MODELS
Interpret point cloud data (PCD) as an iid sampling of some unknown latent spatial probabilistic function
Generative property: the full joint probability space is represented
Modeling as an MLE Optimization

• Given a set of parameters describing the model, find the parameters that best "explain" the data (maximum data likelihood)
Parametric Model as a Modified GMM
Interpret point cloud data as an iid sampling from a small number (J << N) of Gaussian and uniform distributions:
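As a sketch of this construction (the uniform component's support $V$ and the exact weight parameterization are assumptions; the talk's precise form may differ), the modified mixture density looks like:

$$p(\mathbf{z} \mid \Theta) = \sum_{j=1}^{J} \pi_j\, \mathcal{N}(\mathbf{z} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j) + \pi_{J+1}\,\frac{1}{V}, \qquad \sum_{j=1}^{J+1} \pi_j = 1$$

The uniform term absorbs outliers and sensor noise that no Gaussian explains well.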
GMM for Point Clouds: Intuition
Point samples representing pieces of the same local geometry can be aggregated into clusters, with the local geometry encoded in the covariance of each cluster.
SOLVING FOR THE MLE GMM PARAMETERS
Typically done via the Expectation Maximization (EM) Algorithm

(Figure: the EM loop. E Step: update point-cluster associations. M Step: update the parameters Θ. Iterating over the point cloud carries the model from Θ_init to Θ_final.)
E Step: A Single Point
For each point z_i, we want to find the relative likelihood (expectation) of it having been generated by each cluster (O(N) points in total).
E Step: Expectation Vector
We calculate the probability of each point with respect to each of the J Gaussian clusters (O(J) work per point). The expected associations are denoted by the N×J matrix γ.
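For reference, the standard EM responsibility for point $\mathbf{z}_i$ and cluster $j$ (the known closed form for GMMs, consistent with the slides):

$$\gamma_{ij} = \frac{\pi_j\, \mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}{\sum_{k} \pi_k\, \mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}$$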
M STEP: CLOSED FORM WEIGHTED SUMS
For the GMM case, the M Step has closed-form solutions given the N×J matrix γ:

“Probabilistic generalization
of K-Means Clustering”
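The standard closed-form updates, shown here for reference (they follow directly from the weighted sums over γ):

$$\pi_j = \frac{1}{N}\sum_{i=1}^{N}\gamma_{ij}, \qquad \boldsymbol{\mu}_j = \frac{\sum_i \gamma_{ij}\,\mathbf{z}_i}{\sum_i \gamma_{ij}}, \qquad \boldsymbol{\Sigma}_j = \frac{\sum_i \gamma_{ij}\,(\mathbf{z}_i - \boldsymbol{\mu}_j)(\mathbf{z}_i - \boldsymbol{\mu}_j)^{\top}}{\sum_i \gamma_{ij}}$$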
GPU Data Parallelism
GMM Model Limitations

• Each point needs to access all J cluster parameters in CUDA (poor memory locality and linear O(J) scaling with J)
• The N×J expectation matrix is mostly sparse (thus wasted computation)
• A static number of Gaussians must be set a priori

HIERARCHICAL GAUSSIAN MIXTURE


Suppose we restrict J to be only 8 Gaussians
The model would then fit entirely in shared memory for each CUDA threadblock, removing the need for global memory accesses
The expectation matrix will be dense (N×8)
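A minimal sketch of what such an E-step kernel could look like (the struct layout, kernel name, and precomputed log-weight convention are illustrative assumptions, not the talk's actual implementation):

```cuda
#include <cuda_runtime.h>

#define J 8  // fixed mixture size; the whole model fits in shared memory

struct Gaussian {
    float3 mu;       // mean
    float  prec[6];  // upper triangle of the precision matrix (Sigma^-1)
    float  logw;     // log(pi_j) - 0.5*log|Sigma_j| - 1.5*log(2*pi), precomputed on host
};

__device__ float logLikelihood(float3 z, const Gaussian& g)
{
    float dx = z.x - g.mu.x, dy = z.y - g.mu.y, dz = z.z - g.mu.z;
    // squared Mahalanobis distance via the symmetric precision matrix
    float m = g.prec[0]*dx*dx + g.prec[3]*dy*dy + g.prec[5]*dz*dz
            + 2.f*(g.prec[1]*dx*dy + g.prec[2]*dx*dz + g.prec[4]*dy*dz);
    return g.logw - 0.5f * m;
}

__global__ void eStep8(const float3* points, int n,
                       const Gaussian* model, float* gamma /* n x J, dense */)
{
    __shared__ Gaussian s[J];            // one global load per threadblock
    if (threadIdx.x < J) s[threadIdx.x] = model[threadIdx.x];
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 z = points[i];
    float p[J], sum = 0.f;               // (a log-sum-exp would be more stable)
    for (int j = 0; j < J; ++j) {        // model reads hit shared memory only
        p[j] = expf(logLikelihood(z, s[j]));
        sum += p[j];
    }
    for (int j = 0; j < J; ++j)
        gamma[i * J + j] = p[j] / sum;   // normalized responsibilities
}
```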


HIERARCHICAL GAUSSIAN MIXTURE


After convergence of the J=8 GMM, we can use the N×8 expectation matrix as a partition function
Each point is partitioned via its maximum expectation
Now we have 8 partitions of roughly size N/8

HIERARCHICAL GAUSSIAN MIXTURE


We can now run the algorithm recursively on each partition
Each partition contains ~N/8 points that will be modeled as another J=8 GMM
Note that this produces 64 clusters in total
PARALLEL PARTITIONING USING CUDA
Given each point's max expectation and associated cluster index, we can "invert" this index using parallel scans to group together point IDs having the same partition #:

[0 0 1 0 1 1 1 2 0 2 2 2] ➔ [[0 1 3 8] [2 4 5 6] [7 9 10 11]]  (clusters 0, 1, 2)

Now we can run a 2D CUDA kernel where
Dimension 1: index into the original point cloud
Dimension 2: cluster of the parent
e.g. 3 clusters, 12 points, 2 threads/threadblock ➔ grid size of (2, 3)
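One way to realize this inversion (a sketch using Thrust's sort and vectorized search rather than hand-written scans; variable names are illustrative):

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/iterator/counting_iterator.h>

// Groups point IDs by cluster label and records where each group starts.
// Note: sorts `labels` in place.
void invertIndex(thrust::device_vector<int>& labels,    // size n: cluster per point
                 thrust::device_vector<int>& pointIds,  // out, size n: grouped IDs
                 thrust::device_vector<int>& offsets,   // out, size J+1: group starts
                 int J)
{
    int n = labels.size();
    pointIds.resize(n);
    thrust::sequence(pointIds.begin(), pointIds.end());  // [0, 1, ..., n-1]

    // Stable sort of point IDs by label: points with the same label become
    // contiguous, preserving their original order within each group.
    thrust::stable_sort_by_key(labels.begin(), labels.end(), pointIds.begin());

    // offsets[j] = first position whose label is >= j, so group j spans
    // [offsets[j], offsets[j+1]).
    offsets.resize(J + 1);
    thrust::counting_iterator<int> q(0);
    thrust::lower_bound(labels.begin(), labels.end(), q, q + J + 1, offsets.begin());
}
```

For the example above (labels [0 0 1 0 1 1 1 2 0 2 2 2]), this yields pointIds = [0 1 3 8 2 4 5 6 7 9 10 11] and offsets = [0 4 8 12].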
HGMM COMPLEXITY
Even though we now have 64 clusters, we only need to query 8 clusters for each point (avoiding the computation of all N×J (sparse) expectations)
Due to the 2D CUDA grid and indexing structure, segmenting the points into 64 clusters has exactly the same complexity/speed as the original "simple" J=8 GMM
Thus, we can keep increasing the complexity of the model eightfold while incurring only a linear time penalty
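Concretely (a back-of-the-envelope count consistent with the slide's claim): each level touches all $N$ points against at most 8 clusters, so an $L$-level tree with $8^L$ leaf clusters costs only

$$\sum_{\ell=1}^{L} O(8N) = O(8NL)$$

rather than the $O(8^L N)$ a flat mixture of the same size would require.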

HGMM ALGORITHM
Small EM algorithms (8 clusters at a time) are recursively performed on increasingly smaller partitions of the point cloud data (a host-side sketch follows after this list):
E Step: associate points to clusters
M Step: update mixture means, covariances, and weights
Partition Step: before each recursion, new point partitions are determined by the maximum-likelihood point-cluster associations from the last E Step
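A host-side sketch of that recursion, reusing the Gaussian struct from the E-step sketch earlier (the GMMNode layout and the runEM8/partitionByMaxGamma helpers are hypothetical names standing in for the kernels described above):

```cuda
struct GMMNode {
    Gaussian components[8];   // means, covariances, weights for this level
    GMMNode* children[8];     // one subtree per component (null at the leaves)
};

// Hypothetical helpers wrapping the E/M and partition kernels sketched earlier.
void runEM8(const float3* d_points, const int* d_ids, int n, GMMNode* node);
void partitionByMaxGamma(const float3* d_points, int* d_ids, int n,
                         const GMMNode* node, int counts[8], int offsets[8]);

void buildHGMM(const float3* d_points, int* d_ids, int n,
               int level, int maxLevel, GMMNode* node)
{
    if (node == nullptr || level > maxLevel || n < 64) return;  // stop criteria

    runEM8(d_points, d_ids, n, node);              // small EM, J = 8

    int counts[8], offsets[8];                     // scan-based partition step
    partitionByMaxGamma(d_points, d_ids, n, node, counts, offsets);

    for (int j = 0; j < 8; ++j)                    // recurse on each partition
        buildHGMM(d_points, d_ids + offsets[j], counts[j],
                  level + 1, maxLevel, node->children[j]);
}
```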
HGMM DATA STRUCTURE
(Figure: the HGMM tree. Each node is a J=8 GMM; the eight components at "Level 2" each expand into a J=8 GMM at "Level 3", and again at "Level 4". The structure combines the efficiency benefits of hierarchical structures like the Octree with the theoretical benefits of a probabilistic generative model.)
E Step Performance

COMPACTNESS VS FIDELITY

(Figure: Reconstruction Error (PSNR) vs Model Size (kB), with an annotation at 20 kB)
MODELING LARGE POINT CLOUDS
Endeavor snapshots: ~80 GB of point cloud data each

HGMM Level 6: <12 MB
Volume created from stochastically sampled Marching Cubes
Visualization is real-time: ~20 fps on a Titan X
ENDEAVOR DATA: BILLIONS OF POINTS

APPLICATION: RIGID REGISTRATION

Point-sampled surfaces displaced by some rigid transformation
Recover the translation and rotation that best overlap the point clouds
Registration as EM with HGMM
MLE over Space of Rotations, Translations
Goal: maximize the data likelihood over T, given some probability model θ
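In symbols (a sketch; the full EM treatment optimizes an expected complete-data log-likelihood, but the top-level objective has this shape):

$$\hat{T} = \operatorname*{arg\,max}_{T \in SE(3)} \;\sum_{i=1}^{N} \log p\big(T(\mathbf{z}_i) \mid \Theta\big)$$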

Outdoor Urban Velodyne Data
• Velodyne VLP-16
– ~15k pts/frame
– ~10 frames/sec
• Frame-to-frame model-building and registration with overlap estimation
HGMM-Based Registration
• Average frame-to-frame error: 0.0960

Robust Point-to-Plane ICP
• Average frame-to-frame error: 0.1519
• Best result obtained with libpointmatcher
Speed vs Accuracy Trade-Off
Test: random transformations of point cloud pairs while varying the subsampling rate.

Less subsampling yields better accuracy, but slower speeds.

Bottom left is fastest and most accurate.

Our proposed methods are red/teal/black.
HGMM COMING TO ISAAC
~350 fps on Titan Xp
~30 fps on Xavier
Error: ~0.05° yaw (median, 4 Hz updates)
DRIVEWORKS (Future Release)

With Velodyne HDL-64E:

~300 FPS on Titan Xp

~30 FPS on Xavier
DNN-BASED STEREO DEPTH MAPS

FINAL REMARKS

HGMMs have many nice properties for modeling point clouds:

Efficient: fast to compute via CUDA/GPU, scaling even to billions of points
Multi-Level: models the data distribution well at multiple levels simultaneously
Probabilistic: allows Bayesian optimization for applications like registration
Compact and Continuous: no voxels and no aliasing artifacts, easy to transform
QUESTIONS?
REGISTRATION FROM DNN-BASED STEREO
Noisy point cloud output is well-suited for HGMM representation

Stanford Lounge Dataset (Kinect)

Frame-to-frame registration from point cloud data only (no depth maps), subsampled to 2000 points, first 100 frames.
Histograms of average Euler angle error per frame are shown.

(Figure: histograms for GMM-Based, ICP-Based, and Proposed methods)
Noise Handling
• Test: random (uniform) noise injected at increasing amounts
• Result: mixture components "stick" to geometrically coherent, dense areas, disregarding areas of noise
SAMPLING FOR PROBABILISTIC OCCUPANCY

$$\hat{p} = L_{\Sigma}\, p + \mu, \qquad \forall\, (\mu, \Sigma) \in \Theta$$

where $p \sim \mathcal{N}(0, I)$ and $L_{\Sigma}$ is the Cholesky factor of $\Sigma$ (so that $L_{\Sigma} L_{\Sigma}^{\top} = \Sigma$).
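A minimal sketch of this sampling on the GPU (one sample per component; the Component layout and precomputed lower-triangular Cholesky factors are assumptions):

```cuda
#include <curand_kernel.h>

struct Component {
    float3 mu;    // mean
    float  L[6];  // lower-triangular 3x3 Cholesky factor: [l00 l10 l11 l20 l21 l22]
};

__global__ void sampleOccupancy(const Component* comps, int numComps,
                                float3* samples, unsigned long long seed)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= numComps) return;

    curandState st;
    curand_init(seed, j, 0, &st);                 // one RNG stream per component
    float3 p = make_float3(curand_normal(&st),    // p ~ N(0, I)
                           curand_normal(&st),
                           curand_normal(&st));

    const Component& c = comps[j];
    samples[j] = make_float3(                     // p_hat = L * p + mu
        c.L[0]*p.x                           + c.mu.x,
        c.L[1]*p.x + c.L[2]*p.y              + c.mu.y,
        c.L[3]*p.x + c.L[4]*p.y + c.L[5]*p.z + c.mu.z);
}
```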

MESHING UNDER NOISE

ADAPTIVE MULTI-SCALE
MULTI-SCALE MODELING
Multilevel cross-sections can be adaptively chosen for robustness

E Step: Parallelized Tree Search

Adaptive Thresholding Finds the Most Appropriate Scale to Associate Point Data to the Point Cloud Model

Point-model associations are found through a parallelized adaptive tree search in CUDA (a sketch follows below). A complexity measure over each node determines when to stop descending; other suitable heuristics are possible.
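A sketch of that descent, reusing the Gaussian/GMMNode types and logLikelihood helper from the earlier sketches (the complexityHeuristic here is an assumed stand-in for the slide's definition, which is not reproduced):

```cuda
// Assumed stand-in for the slide's complexity definition.
__device__ float complexityHeuristic(const GMMNode* node);

__global__ void adaptiveAssociate(const float3* points, int n,
                                  const GMMNode* root, float threshold,
                                  const GMMNode** assoc)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 z = points[i];
    const GMMNode* node = root;
    while (true) {
        int best = 0; float bestLL = -1e30f;
        for (int j = 0; j < 8; ++j) {              // follow the most likely child
            float ll = logLikelihood(z, node->components[j]);
            if (ll > bestLL) { bestLL = ll; best = j; }
        }
        const GMMNode* child = node->children[best];
        if (child == nullptr || complexityHeuristic(child) < threshold)
            break;                                 // appropriate scale reached
        node = child;
    }
    assoc[i] = node;                               // point-model association
}
```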
M-Step: Mahalanobis Estimation
We seek the transformation that maximizes the expected joint log-likelihood of our data and latent associations with respect to the posterior over our current association estimates.

The resulting form (1) is a weighted sum of squared Mahalanobis distances, further reduced to (2) by writing it in terms of sufficient statistics $M_j^{\{0,1\}}$.

Lastly, covariance eigendecomposition produces an equivalent weighted point-to-plane distance measure (3), which we can solve efficiently with least squares.
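The general shape of (1), shown as a sketch (the exact constants and the reduction to the $M_j$ statistics follow the talk's derivation, which is not reproduced here):

$$\hat{T} = \operatorname*{arg\,min}_{T} \;\sum_{i=1}^{N}\sum_{j} \gamma_{ij}\, \big(T(\mathbf{z}_i) - \boldsymbol{\mu}_j\big)^{\top} \boldsymbol{\Sigma}_j^{-1} \big(T(\mathbf{z}_i) - \boldsymbol{\mu}_j\big)$$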
