Point Cloud Compression in MPEG

Point Clouds – the convergence between video and graphics technologies
Convergence Future Communication Colloquium, Kyung Hee University, March 2021
Marius PREDA
MPEG 3D Graphics Convenor
Institut Polytechnique de Paris, FRANCE
Are we ready for the immersion in digital worlds?
A more realistic scenario …

Understanding the present by looking back to the history of visual content
Drawings, Paintings, Realistic Paintings, Photography, Computer Generated
A more recent history …
Traditional way of watching visual content vs new ways of consuming visual content:
• Fixed support vs mobile support
• Fixed user position vs dynamic user position
• User as a spectator vs user as a director
• Same experience for all vs personalized experience
What is the present and the near future?
Is the need for immersive applications real? Is realistic immersion technically feasible with current technologies?

The MPEG answer is YES! And even more …, standards are needed!
In 2017 MPEG started MPEG-I (Coded Representation of Immersive Media)
Immersion is obtained by:

• being in (e.g. video 360, a 3D scene),
• being synchronized with,
• interacting with
Cinema pioneered this track (stereoscopic content, huge screens, motion chairs, …), but for a multi-user experience
Yes, we are!
What are point clouds and how do they enable immersion?
How can we compress them? The MPEG-I approaches
What’s next?
Synergies by the confluence of two worlds of technologies
Visual capture:
• Multi-camera
• LDR, HDR
• HD, Full HD, 4K, 8K
• Stereoscopy
Visual synthesis:
• Geometric primitives
Easy to produce, high quality, interactivity, immersion
What is at the frontier?
Point Cloud – a convergence between 2 worlds: visual capture and visual synthesis
Point Cloud
A set of 3D points
• not ordered,
• without relations between them
Each point is defined by

• (X, Y, Z)
• (R, G, B) or (Y, U, V)
• reflectance, transparency, …
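The definition above can be sketched as a small data structure. This is only an illustration: the names `Point` and `same_cloud` are hypothetical, and the order-insensitive comparison is there to make the "not ordered" property concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Point:
    position: Tuple[int, int, int]   # (X, Y, Z)
    color: Tuple[int, int, int]      # (R, G, B) or (Y, U, V)
    # Optional extra attributes such as reflectance or transparency.
    extra: Dict[str, float] = field(default_factory=dict)


def same_cloud(a: List[Point], b: List[Point]) -> bool:
    """Two clouds are the same if they hold the same points, in any order."""
    def key(p: Point):
        return (p.position, p.color, sorted(p.extra.items()))
    return sorted(map(key, a)) == sorted(map(key, b))


cloud = [
    Point((0, 0, 0), (255, 0, 0)),
    Point((1, 2, 3), (0, 255, 0), {"reflectance": 0.5}),
]
shuffled = list(reversed(cloud))  # same set of points, different storage order
```

Nothing in the structure links one point to another, which is what separates a point cloud from a mesh.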
Sport viewing with point clouds
360° background
3D objects
1-3 Gbps per object

Presence in Augmented Reality
Realistic content
• spatially collocated,
• interactive
Environment mapping for autonomous driving
~20 million points
• 2,020,734,515 bytes
Point Cloud
800,000 points -> 1 000 Mbps (uncompressed)
Compression is required in order to make PC useful
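The raw bitrates quoted here follow from simple arithmetic: points per frame times frames per second times bits per point. The sketch below makes that explicit. The slide gives neither the frame rate nor the bits per point, so the 30 fps and the 54-bit layout (10 bits per coordinate, 8 bits per colour channel) are assumptions, and the function names are hypothetical.

```python
def raw_mbps(points_per_frame: int, fps: float, bits_per_point: int) -> float:
    """Uncompressed bitrate in Mbps (1 Mbps = 1e6 bits per second)."""
    return points_per_frame * fps * bits_per_point / 1e6


def implied_bits_per_point(mbps: float, points_per_frame: int, fps: float) -> float:
    """Bits per point that a quoted raw bitrate corresponds to."""
    return mbps * 1e6 / (points_per_frame * fps)


# Assumed layout: 3 coordinates of 10 bits plus 3 colour channels of 8 bits.
ASSUMED_BITS = 3 * 10 + 3 * 8

estimate = raw_mbps(800_000, 30, ASSUMED_BITS)       # assumed 30 fps
implied = implied_bits_per_point(1000, 800_000, 30)  # from the quoted 1 000 Mbps
```

Under these assumptions the estimate is about 1.3 Gbps, the same order of magnitude as the quoted 1 000 Mbps, which at 30 fps would correspond to roughly 42 bits per point.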

Point Cloud Compression – basic principles
Very sparse occupancy of the 3D space

- (usually) the objects are represented by their surface and not by volumes
- In 2D a pixel has 8 neighbors, in 3D it has 26, and many of them are transparent
Special constructs are needed: octree or KD-tree

- points are grouped into a hierarchical structure of branches and leaves
- better difference/residual coding between a representative point and its direct neighbors in a group
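The hierarchical grouping can be illustrated with an octree occupancy code. This is a minimal sketch, not the G-PCC coder: each internal node emits one byte whose bits say which of its 8 children contain points, and empty children are never visited. The function name and the bit order within the byte are assumptions made for the example.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]


def octree_occupancy(points: Set[Voxel], depth: int) -> List[int]:
    """Breadth-first 8-bit occupancy codes for a cube of side 2**depth."""
    codes: List[int] = []
    nodes = [((0, 0, 0), sorted(points))]  # (node origin, points inside it)
    for level in range(depth):
        half = 1 << (depth - level - 1)    # side length of the child cubes
        next_nodes = []
        for (ox, oy, oz), pts in nodes:
            children = {}
            for (x, y, z) in pts:
                # Child index: one bit per axis (x is the most significant).
                idx = ((x - ox >= half) << 2) | ((y - oy >= half) << 1) | (z - oz >= half)
                children.setdefault(idx, []).append((x, y, z))
            code = 0
            for idx in children:
                code |= 1 << idx
            codes.append(code)
            for idx in sorted(children):   # only occupied children are refined
                origin = (ox + half * ((idx >> 2) & 1),
                          oy + half * ((idx >> 1) & 1),
                          oz + half * (idx & 1))
                next_nodes.append((origin, children[idx]))
        nodes = next_nodes
    return codes


codes = octree_occupancy({(0, 0, 0), (3, 3, 3)}, depth=2)
```

For the two corner points of a 4x4x4 cube the result is three bytes: the root marks children 0 and 7 as occupied, then each of those nodes marks a single occupied leaf. The sparse regions cost nothing beyond the zero bits in the parent byte.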
Octree and KD-tree are great for static scenes

- it is very difficult to extend their performance to the temporal axis
- leaves jump from one branch to another in the octree, even after a simple motion
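A small sketch shows how a leaf can jump between branches. It lists the child index chosen at each octree level for a point, then moves the point by a single voxel. The function name and the bit order are assumptions made for the example.

```python
from typing import List, Tuple


def octree_path(point: Tuple[int, int, int], depth: int) -> List[int]:
    """Child index (0-7) taken at each level, from the root down to the leaf."""
    x, y, z = point
    path = []
    for level in range(depth - 1, -1, -1):
        path.append((((x >> level) & 1) << 2)
                    | (((y >> level) & 1) << 1)
                    | ((z >> level) & 1))
    return path


before = octree_path((7, 3, 3), depth=4)  # [0, 4, 7, 7]
after = octree_path((8, 3, 3), depth=4)   # [4, 0, 3, 3]
```

A one-voxel step along x crosses the central plane of the root cube, so the two paths differ at every level, starting at the root. A predictor that follows the tree structure from frame to frame sees an entirely different branch.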
Point Cloud Compression in MPEG
Timeline (2014 to 2020):
• MPEG initiated the work on PCC
• April 2017: MPEG issued a Call for Proposals
• October 2017: 9 technology-leading companies responded and MPEG evaluated them
• October 2018: First Committee Draft issued
• 04/2020: V-PCC
• 7/2020: G-PCC
V-PCC: Video-based PCC
G-PCC: Geometry-based PCC
[Figure: V-PCC encoder block diagram. Input point cloud frame -> patch generation -> packing -> geometry image generation and texture image generation -> image padding -> video compression -> compressed geometry video and compressed texture video. Side paths: occupancy map -> occupancy map compression -> compressed occupancy map; patch info -> auxiliary patch-info compression -> compressed auxiliary patch information; reconstructed geometry images -> smoothing -> smoothed geometry. A multiplexer combines the sub-streams into the compressed bitstream.]
Video-based Point Cloud Compression
Main ideas:
(1) a point coordinate is encoded as a distance with respect to a particular plane – inspired by "displacement mapping" in Graphics
Pixel intensity <-> Vertex height

(2) the color (or any attribute) associated with a 3D vertex is encoded in a 2D texture – inspired by "texture mapping" in Graphics
Vertex color <-> Pixel color
Projecting all the points on a single plane would result in several 3D points having the same 2D projection -> several depth values would have to be stored per pixel
Projecting per patch is preferred:
- A set of points (patch) in a small neighborhood is projected on the same plane
- The set of projection planes is very limited
- 6 faces of the cube
- 4 additional diagonal planes
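One plausible way to assign a point to a projection plane is to pick the plane whose normal is best aligned with the surface normal at the point. The sketch below assumes the normals have already been estimated and covers only the six cube faces. It is an encoder-side illustration with hypothetical names, not a normative procedure.

```python
from typing import Dict, Tuple

Vec = Tuple[float, float, float]

# The six axis-aligned projection directions (faces of the bounding cube).
PLANES: Dict[str, Vec] = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def dot(a: Vec, b: Vec) -> float:
    return sum(p * q for p, q in zip(a, b))


def best_plane(normal: Vec) -> str:
    """Name of the plane whose normal has the largest dot product with `normal`."""
    return max(PLANES, key=lambda name: dot(PLANES[name], normal))


choice = best_plane((0.1, -0.9, 0.2))  # a surface facing mostly towards -y
```

Neighboring points with similar normals end up on the same plane, which is what lets them be grouped into a patch.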
Projection Orientation
In V-PCC, 6 orientations are defined: (±x, ±y, ±z).
Additionally, the projection axis may be rotated by 45 degrees around each direction, giving 12 new orientations.
[Figure: rotation* around the Z-axis, the X-axis and the Y-axis]
*Rotations can be done losslessly in integers, using shearing and scaling.
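To illustrate why a 45-degree rotation can stay lossless on an integer grid, the sketch below combines the rotation with a scaling by sqrt(2), which turns it into integer additions and subtractions. It shows only the scaling idea and is an assumption made for illustration, not the transform specified in V-PCC.

```python
from typing import Tuple


def rotate45_scaled(x: int, y: int) -> Tuple[int, int]:
    """Rotate (x, y) by 45 degrees and scale by sqrt(2); the result is integer."""
    return x - y, x + y


def rotate45_scaled_inverse(u: int, v: int) -> Tuple[int, int]:
    """Exact inverse: u + v = 2x and v - u = 2y are always even."""
    return (u + v) // 2, (v - u) // 2


grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
round_trip_ok = all(
    rotate45_scaled_inverse(*rotate45_scaled(x, y)) == (x, y) for x, y in grid
)
```

The point (1, 0) maps to (1, 1), that is, a 45-degree turn with length sqrt(2), and every integer point is recovered exactly by the inverse.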
Encoding the 3D point clouds as a set of 2D patches
Geometry
Color (Attributes)

- Not all the pixels in the image are used for reconstruction; an occupancy map indicates which ones should be used
- Some points are still projected on already occupied pixels; an additional depth map is used
- To make the coding lossless, the missed points are encoded separately
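The three mechanisms listed above (occupancy map, additional depth map, separately coded missed points) can be sketched for one patch projected along the z axis. This is a simplified illustration with hypothetical names. In particular, it stores the farthest depth in the second map without any thickness limit, which is an assumption of the sketch.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]


def project_patch(points: Set[Voxel], width: int, height: int):
    """Project points along z onto a width x height image."""
    buckets = defaultdict(list)
    for x, y, z in points:
        buckets[(x, y)].append(z)
    occupancy = [[0] * width for _ in range(height)]
    near = [[0] * width for _ in range(height)]   # first depth map
    far = [[0] * width for _ in range(height)]    # additional depth map
    missed: List[Voxel] = []
    for (x, y), depths in buckets.items():
        depths = sorted(set(depths))
        occupancy[y][x] = 1
        near[y][x], far[y][x] = depths[0], depths[-1]
        # Points between the two stored depths fit in neither map.
        missed += [(x, y, z) for z in depths[1:-1]]
    return occupancy, near, far, missed


def reconstruct(occupancy, near, far, missed) -> Set[Voxel]:
    points = set(missed)
    for y, row in enumerate(occupancy):
        for x, used in enumerate(row):
            if used:  # unoccupied pixels carry no geometry
                points.add((x, y, near[y][x]))
                points.add((x, y, far[y][x]))
    return points


patch = {(0, 0, 5), (0, 0, 6), (0, 0, 7), (1, 0, 2)}
occupancy, near, far, missed = project_patch(patch, width=3, height=1)
```

In this example pixel (0, 0) receives three points: depths 5 and 7 go to the two depth maps, depth 6 becomes a missed point, and pixel (2, 0) stays unoccupied so it is ignored at reconstruction.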
Encoding the 3D point clouds as a set of 2D videos: depth, color and occupancy maps
MPEG is very good at video coding!
Problem solved :)
Not really … point cloud projections have very bad temporal behavior, which is not good for traditional video codecs
[Figure: projections at Frame 24 and Frame 26]
Per frame encoding criteria are used
[Figure: Frames 12, 13 and 14, showing the prediction type (intra prediction, inter prediction) and the motion vectors]
One way to improve the temporal redundancy: global packing
[Figure: no global packing, global packing without rotation, global flexible packing, global Tetris packing]
The good thing is that patch forming is out of the scope of the standard: a good opportunity to have competition at the encoder level!
1. Motion compensation based on octree segmentation
Up to 20% gain
2. Motion compensation based on skeleton segmentation
A priori, only rotations for anatomical parts
Weighted influence near the joints
Encoding 3D point clouds as a set of 2D videos: color, depth and occupancy map
100,000 points @ 30 fps -> 360 Mbps (uncompressed) -> 1 Mbps (MPEG PCC 2019)
7 Mbps 4.4 Mbps

Evaluation Dataset
10-bit positions, 8-bit texture, 0.8M-1M points, 30-50 fps
Longdress[1] loot[1] soldier[1] red&black[1] queen [2]

Licenses available on https://mpegfs.int-evry.fr/mpegcontent/
[1] E. d’Eon, B. Harrison, T. Myers, P. A. Chou, “8i Voxelized Full Bodies – A Voxelized Point Cloud Dataset,” ISO/IEC JTC1/SC29 Joint WG11/WG1 input document m40059/M74006, Geneva, January 2017.
[2] J. Ricard , C. Guède, R. Doré, S. Lasserre, “CGI-based dynamic point cloud test content,” ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m40050, Geneva, January 2017.
Objective Quality Metrics: Geometric Quality
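[Figure: reference point cloud A and decoded point cloud B]
• D1: Point-to-point
$$e^{D1}_{B,A} = \frac{1}{N_B} \sum_{b \in B} \min_{a \in A} \lVert b - a \rVert_2^2, \qquad \mathrm{PSNR}_{D1} = 10 \log_{10} \frac{3p^2}{\max\left(e^{D1}_{A,B},\, e^{D1}_{B,A}\right)}$$
• D2: Point-to-plane
$$e^{D2}_{B,A} = \frac{1}{N_B} \sum_{b \in B} \left( (b - a_b) \cdot n_{a_b} \right)^2, \qquad a_b = \arg\min_{a \in A} \lVert b - a \rVert_2, \qquad \mathrm{PSNR}_{D2} = 10 \log_{10} \frac{3p^2}{\max\left(e^{D2}_{A,B},\, e^{D2}_{B,A}\right)}$$
where $n_{a_b}$ is the surface normal at $a_b$ and $p$ is the maximum coordinate range (e.g., 1023 for 10-bit content)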
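As an illustration, the point-to-point (D1) metric can be computed as follows. The sketch assumes the usual form of the metric: the mean squared distance to the nearest neighbour is taken in both directions, the larger of the two is kept, and the PSNR peak is 3p², with p the maximum coordinate range. The brute-force nearest-neighbour search and the function names are choices made for brevity.

```python
import math
from typing import List, Tuple

Voxel = Tuple[int, int, int]


def mean_sq_nn_distance(src: List[Voxel], dst: List[Voxel]) -> float:
    """Mean squared distance from each point of src to its nearest point in dst."""
    total = 0.0
    for p in src:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
    return total / len(src)


def d1_psnr(reference: List[Voxel], decoded: List[Voxel], peak: int) -> float:
    """Symmetric point-to-point PSNR in dB; peak is e.g. 1023 for 10-bit content."""
    mse = max(mean_sq_nn_distance(reference, decoded),
              mean_sq_nn_distance(decoded, reference))
    if mse == 0:
        return math.inf
    return 10 * math.log10(3 * peak ** 2 / mse)


reference = [(0, 0, 0), (10, 0, 0)]
decoded = [(0, 0, 1), (10, 0, 0)]   # one point displaced by one voxel
psnr = d1_psnr(reference, decoded, peak=1023)
```

Here one of the two points is off by one voxel, so the mean squared error is 0.5 in both directions and the PSNR is about 68 dB.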

Objective Quality Metrics: Attributes Quality
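[Figure: reference point cloud A and decoded point cloud B]
$$e^{attr}_{B,A} = \frac{1}{N_B} \sum_{b \in B} \left( c(b) - c(a_b) \right)^2, \qquad \mathrm{PSNR}_{attr} = 10 \log_{10} \frac{p^2}{\max\left(e^{attr}_{A,B},\, e^{attr}_{B,A}\right)}$$
where $c(\cdot)$ is the attribute value, $a_b$ is the nearest neighbor of $b$ in $A$, and $p$ is the maximum attribute range (e.g., 255 for 8-bit content)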

Rate distortion metric: BD Rate
[Figure: quality versus bitrate curves for Codec C1 and Codec C2, with the area between the two curves shaded in green]
BD-rate [1] is the average, over the green area, of the bitrate difference (in %) for the same quality
[1] G. Bjontegaard, Calculation of Average PSNR Differences between RD curves, VCEG-M33, April 2001.
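The idea of averaging the bitrate difference at equal quality can be sketched in a few lines. Bjontegaard's method fits a third-order polynomial to each curve in the log-rate domain. The sketch below swaps in piecewise-linear interpolation to stay short, so its numbers will differ slightly from a reference BD-rate tool. All names are hypothetical.

```python
import math
from typing import List, Tuple

Curve = List[Tuple[float, float]]  # (bitrate, quality) pairs


def log_rate_at(curve: Curve, q: float) -> float:
    """log(bitrate) at quality q, by linear interpolation (curve sorted by quality)."""
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= q <= q1:
            t = (q - q0) / (q1 - q0)
            return math.log(r0) + t * (math.log(r1) - math.log(r0))
    raise ValueError("quality outside the curve")


def bd_rate(c1: Curve, c2: Curve, steps: int = 1000) -> float:
    """Average bitrate change (in %) of c2 relative to c1 at equal quality."""
    c1 = sorted(c1, key=lambda p: p[1])
    c2 = sorted(c2, key=lambda p: p[1])
    lo = max(c1[0][1], c2[0][1])      # overlapping quality interval
    hi = min(c1[-1][1], c2[-1][1])
    total = 0.0
    for i in range(steps):            # midpoint sampling of the interval
        q = lo + (hi - lo) * (i + 0.5) / steps
        total += log_rate_at(c2, q) - log_rate_at(c1, q)
    return (math.exp(total / steps) - 1) * 100


codec1 = [(2.0, 30.0), (4.0, 35.0), (8.0, 40.0)]
codec2 = [(1.0, 30.0), (2.0, 35.0), (4.0, 40.0)]  # half the bitrate at each quality
saving = bd_rate(codec1, codec2)
```

A negative value means the second codec needs less bitrate for the same quality. In this constructed example it needs half the bitrate everywhere, so the result is -50%.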
V-PCC Evolution (All Intra): TMC2-v11 vs. TMC2-v1 vs. Anchor[1]
[Figure: rate-distortion curves for Loot, Red & black and Soldier (all intra): D1-PSNR (dB) and Y-PSNR (dB) versus bitrate (MBits/s)]
[1] https://github.com/RufaelDev/pcc-mp3dg/tree/mpeg_standalone_branch
V-PCC Evolution (All Intra): TMC2-v11 vs. TMC2-v1
V-PCC Evolution (Random Access): TMC2-v11 vs. TMC2-v1 vs. Anchor
[Figure: rate-distortion curves for Loot, Red & black and Soldier (random access): D1-PSNR (dB) and Y-PSNR (dB) versus bitrate (MBits/s)]
V-PCC Evolution (Random Access): TMC2-v11 vs. TMC2-v1
V-PCC Bitstream Components
Video sub-streams make up 95%-99% of the overall V-PCC bitstream size

V-PCC: All intra vs. Random Access
[Figure: All Intra vs. Random Access rate-distortion curves for Loot, Red & black and Soldier: D1-PSNR (dB) and Y-PSNR (dB) versus bitrate (MBits/s)]
V-PCC All Intra: VVC (VTM 8.2) vs. HEVC (HM 16.2)
[Figure: all intra rate-distortion curves comparing VVC and HEVC for three sequences: D1-PSNR (dB) and Y-PSNR (dB) versus bitrate (MBits/s)]
V-PCC Random Access: VVC (VTM 8.2) vs. HEVC (HM 16.2)
[Figure: random access rate-distortion curves for Loot, Red & black and Soldier, comparing TMC2-HEVC and TMC2-VVC: D1-PSNR (dB) and Y-PSNR (dB) versus bitrate (MBits/s)]
V-PCC implementations publicly available

www.mpeg-pcc.org
An integrated real-time decoder and renderer is also available for Android
https://github.com/nokiatech/vpcc
About collaborations and environment
V-PCC progress was fast because video coding expertise was accessible
No other Graphics community has this privilege
Geometry-based Point Cloud Compression
Encoding 3D point clouds in their native format
100,000 points @ 10 fps -> 112 Mbps (uncompressed)

[Figure: G-PCC encode and decode flows]
Encode flow: point cloud -> coordinate conversion, voxelize -> geometry coding -> attribute transfer (for lossy geometry encoding) -> attribute coding -> G-PCC bitstream
Decode flow: ends with coordinate conversion and scaling
Geometry coding:
• Octree coding
• Trisoup
• Predictive geometry
Attribute coding:
• LoD scheme
• RAHT
Geometry-based PCC – main elements
Octree decomposition
• Optionally, it is possible to prune the octree and complete it with a surface model that approximates the surface within each leaf of the pruned octree (triangle soup)
Attributes coding
• Region Adaptive Hierarchical Transform (RAHT)
• Interpolation-based hierarchical nearest-neighbour prediction (Predicting Transform)
• Interpolation-based hierarchical nearest-neighbour prediction with an update/lifting step (Lifting Transform)
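To give a feel for RAHT, the sketch below implements its basic two-node step as described in the literature: two neighbouring nodes with weights w1 and w2 (the number of points each represents) are merged into a low-pass coefficient, which moves up the hierarchy, and a high-pass coefficient, which is quantized and entropy coded. This is an illustration with hypothetical function names, not the G-PCC reference implementation.

```python
import math
from typing import Tuple


def raht_forward(c1: float, w1: int, c2: float, w2: int) -> Tuple[float, float, int]:
    """Merge two coefficients; returns (low-pass, high-pass, merged weight)."""
    s = math.sqrt(w1 + w2)
    low = (math.sqrt(w1) * c1 + math.sqrt(w2) * c2) / s
    high = (-math.sqrt(w2) * c1 + math.sqrt(w1) * c2) / s
    return low, high, w1 + w2


def raht_inverse(low: float, high: float, w1: int, w2: int) -> Tuple[float, float]:
    """Undo raht_forward (the 2x2 transform is orthonormal)."""
    s = math.sqrt(w1 + w2)
    c1 = (math.sqrt(w1) * low - math.sqrt(w2) * high) / s
    c2 = (math.sqrt(w2) * low + math.sqrt(w1) * high) / s
    return c1, c2


# Three points with the same attribute value 100, merged in two steps.
low12, high12, w12 = raht_forward(100.0, 1, 100.0, 1)
low123, high123, w123 = raht_forward(low12, w12, 100.0, 1)
```

When the region is uniform, both high-pass coefficients are zero and all the energy sits in a single low-pass value. The weights make the transform adapt to how many points each node contains.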
Geometry: octree coding
[Figure: voxelized source point cloud (geometry only) -> octree representation -> arithmetic coding -> geometry bitstream]
Compact representation:
• Direct coding mode
• Planar mode
• QTBT mode
Entropy coding:
• Neighbor-Dependent Entropy Context
• Intra context prediction
• Angular/Azimuthal mode

Geometry: octree coding and triangle soup
[Figure: voxelized source point cloud (geometry only) -> octree representation (voxels for the mesh approximation) -> trisoup representation (mesh vertex positions) -> arithmetic coding -> geometry bitstream]
Surface approximation by triangle mesh representation.

Geometry: predictive geometry coding
[Figure: voxelized source point cloud (geometry only) -> predictive geometry (prediction mode, residual) -> arithmetic coding -> geometry bitstream]
[Figure: prediction tree with a root vertex, branch vertices with one, two or three children, and leaf vertices]
Point-by-point compression targeting low-latency applications
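The prediction-tree idea can be sketched with the simplest possible predictor: each vertex is predicted by its parent and only the residual is kept for entropy coding. The real codec offers several prediction modes. This sketch uses only the parent (delta) predictor and hypothetical names.

```python
from typing import List, Tuple

Voxel = Tuple[int, int, int]


def encode(points: List[Voxel], parents: List[int]) -> List[Voxel]:
    """Residual of each point against its parent (parent index -1 marks the root)."""
    residuals = []
    for p, parent in zip(points, parents):
        pred = (0, 0, 0) if parent < 0 else points[parent]
        residuals.append(tuple(c - q for c, q in zip(p, pred)))
    return residuals


def decode(residuals: List[Voxel], parents: List[int]) -> List[Voxel]:
    """Rebuild the points one by one; each parent is decoded before its children."""
    points: List[Voxel] = []
    for r, parent in zip(residuals, parents):
        pred = (0, 0, 0) if parent < 0 else points[parent]
        points.append(tuple(c + q for c, q in zip(r, pred)))
    return points


# A root vertex with two children; the first child has one child of its own.
points = [(100, 100, 100), (101, 100, 100), (102, 101, 100), (101, 99, 100)]
parents = [-1, 0, 1, 0]
residuals = encode(points, parents)
```

Apart from the root, every residual is a small vector, which is cheap to entropy code. Since each point needs only its already-decoded ancestors, points can be processed one at a time, which suits low-latency use.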

Attribute coding
[Figure: voxelized source point cloud with attribute -> LoD scheme (Predicting Transform, Lifting Transform) or RAHT (Region Adaptive Hierarchical Transform) -> quantized coefficients -> arithmetic coding -> attribute bitstream]
There may be several attributes in the input point cloud (e.g. Color + Reflectance).
Two main methods are provided for the various kinds of attribute data
Encoding 3D point clouds in their native format
100,000 points @ 10 fps -> 112 Mbps (uncompressed) -> 20 Mbps (lossless) (80k points)

G-PCC handles various content categories, while offering state-of-the-art RD performance
Coding tool | Solid | Dense | Sparse | Scant | Lidar-Fused | Lidar-Frame | Improvements

IDCM ✅ ✅ ✅ ✅ Complexity
Planar ✅ ✅ RD
Neighbor Dependent Entropy Context ✅ ✅ RD
Intra Occupancy Prediction ✅ ✅ RD
Angular ✅ RD
Predictive ✅ ✅ RD & Complexity
Trisoup ✅ RD
LoD ✅ ✅ ✅ ✅ RD
RAHT ✅ ✅ ✅ ✅ RD
What is next? Revisiting the past!
Point Cloud – a convergence between 2 worlds: visual capture and visual synthesis
Mesh – a surface approximation of the point cloud
Future of V-PCC
Mesh representation – the most used 3D graphics format

Traditionally used for (realistically) animating virtual characters
Traditional Mesh object – hand crafted
A set of 3D points
• ordered,
• connected to form polygons
An animated mesh is defined by

• (X_t, Y_t, Z_t)_n
• (v1, v2, v3)_m
• (R, G, B) – still image
• a mapping from texture to geometry
MPEG technologies for compressing traditional mesh:

• TFAN – static mesh
• FAMC – animated mesh
Introducing D-Mesh (Dynamic Mesh)
A set of 3D points
• ordered,
• connected to form polygons, varying in time
A D-Mesh is defined by
• (X_t, Y_t, Z_t)_n
• (v1_t, v2_t, v3_t)_m
• (R, G, B) – video instead of a still image
• a mapping from video texture to geometry
Automatically captured
V-PCC for the D-Mesh object
• Current V-PCC for vertex positions and colors
• TFAN connectivity per frame
Ongoing exploration in MPEG
Future of V-PCC
V-PCC - a base for encoding other types of immersive video content
3DoF+ (MIV) encoding pipeline

MIV data representation: reference + dis-occlusion patches

Differences in input data format and rendering, but similarities in how information is represented in the encoded domain (patch maps – atlas encoded as 2D frames)
Idea: adding something in V-PCC (e.g. camera parameters) and restricting something else (e.g. a null indicator for the occupancy map video codec) to make it more generic
[Figure: Intel Frog sequence: MIV atlas generation -> V-PCC packing/encoding/3D reconstruction]
[Figure: Classroom sequence: MIV atlas generation -> V-PCC packing/encoding/3D reconstruction]
Conclusions
Novel capturing systems and interactive 3D viewing experiences are creating new opportunities for realistic immersion
Point Cloud Compression enables interactive high quality 3D content by providing manageable bitrates and also reducing requirements in creation, transmission and rendering of 3D content
V-PCC leverages the existing hardware and software infrastructure for rapid deployment of new immersive experiences. G-PCC has the potential to do better (supposing that an expertise similar to the one in video coding can be built)
PCC provides a solid framework for the convergence between natural and synthetic 3D graphics. The mesh extension of PCC improves rendering and (potentially) saves bandwidth
Conclusion
We are at the beginning of a new era in which humanity will regain its third dimension in the digital space!
Disclaimer
Several pictures and videos used in this presentation are provided by
8i, Owli, Sony, Intel RealSense, Microsoft Hololens, Institut Mines Telecom
and the huge Internet image databases
Thank you!
