
Point Clouds – the convergence between video and graphics technologies

Convergence Future Communication Colloquium, Kyung Hee University, March 2021

Marius PREDA
MPEG 3D Graphics Convenor
Institut Polytechnique de Paris, FRANCE
Are we ready for the immersion in digital worlds?

A more realistic scenario …


Understanding the present by looking back to the history of visual content

Drawings, Paintings, Realistic Paintings, Photography, Computer Generated
A more recent history …

Traditional way of watching visual content vs. new ways of consuming visual content:
- Fixed support vs. mobile support
- Fixed user position vs. dynamic user position
- User as a spectator vs. user as a director
- Same experience for all vs. personalized experience
What is the present and the near future?

Is the need for immersive applications real? Is realistic immersion technically feasible with current technologies?

The MPEG answer is YES! And even more: standards are needed!
In 2017 MPEG started MPEG-I (Coded Representation of Immersive Media)

Immersion is obtained by:
• being in (e.g. video 360, a 3D scene),
• being synchronized with,
• interacting with

The Cinema was pioneering this track (stereoscopic content, huge screens, motion chairs, …) but for a multi-user experience
Are we ready for the immersion in digital worlds?

Yes, we are!

What are point clouds and how do they enable immersion?

How can we compress them? The MPEG-I approaches

What’s next?
Synergies by the confluence of two worlds of technologies

Visual capture: multi-camera; LDR, HDR; HD, Full HD, 4K, 8K; stereoscopy
Visual synthesis: geometric primitives

Visual capture: easy to produce, high quality
Visual synthesis: interactivity, immersion
What is at the frontier?

Visual capture
Point Cloud – a convergence between 2 worlds
Visual synthesis
Point Cloud

▪ A set of 3D points
• not ordered,
• without relations between them

▪ Each point is defined by
• (X, Y, Z)
• (R, G, B) or (Y, U, V)
• reflectance, transparency, …
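
As a minimal sketch of this definition, a point cloud can be held as an unordered collection of points, each carrying a position and some attributes. The `Point` class and its field names are illustrative assumptions, not part of any MPEG specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Position (X, Y, Z)
    x: int
    y: int
    z: int
    # Color attribute (R, G, B); (Y, U, V) would be stored the same way
    r: int = 0
    g: int = 0
    b: int = 0
    # Optional extra attribute, e.g. reflectance
    reflectance: float = 0.0


# No order and no relations between points: a set models this directly.
cloud = {
    Point(0, 0, 0, 255, 0, 0),
    Point(1, 0, 0, 0, 255, 0),
    Point(0, 1, 2, 0, 0, 255, reflectance=0.4),
}

# The same points inserted in another order give the same point cloud.
same_cloud = set(sorted(cloud, key=lambda p: (p.z, p.y, p.x)))
```

Unlike a mesh, nothing here records which points are neighbors; any such structure has to be rebuilt by the codec or the renderer.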
Point Clouds
Sport viewing with point clouds

360° background
3D objects
1-3 Gbps per object

Presence in Augmented Reality

▪ Realistic content
• spatially collocated,
• interactive
Environment mapping for autonomous driving
▪ ~20 million points
• 2,020,734,515 bytes
Point Cloud

800,000 points → 1,000 Mbps (uncompressed)

Compression is required to make point clouds useful


Point Cloud Compression – basic principles

Very sparse occupancy of the 3D space
- (usually) the objects are represented by their surface and not by volumes
- in 2D a pixel has 8 neighbors, in 3D a voxel has 26, and many of them are empty (transparent)
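
The neighbor counts and the sparsity can be checked with a few lines. This is only an illustration: the hollow cube of side 64 is an assumed toy object standing in for a surface-only capture.

```python
from itertools import product


def neighbor_offsets(dims):
    """All offsets in {-1, 0, 1}^dims except the origin itself."""
    return [o for o in product((-1, 0, 1), repeat=dims) if any(o)]


def occupancy_ratio(voxels, size):
    """Fraction of a size^3 voxel grid that is occupied."""
    return len(voxels) / size ** 3


# Toy object: only the surface of a cube is occupied, not its volume.
size = 64
surface = {
    (x, y, z)
    for x, y, z in product(range(size), repeat=3)
    if 0 in (x, y, z) or size - 1 in (x, y, z)
}

print(len(neighbor_offsets(2)), len(neighbor_offsets(3)))  # 8 26
print(f"{occupancy_ratio(surface, size):.3f}")
```

Fewer than one voxel in ten is occupied in this toy case, and the ratio drops further as the grid resolution grows.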
Point Cloud Compression – basic principles

Special constructs are needed: octree or KD-tree
- points grouped into a hierarchical structure of branches and leaves
- better difference/residual coding between a representative point and its direct neighbors in a group
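
A simplified sketch of the octree idea: each node is described by one byte whose bits say which of its eight children contain points. The function name and the breadth-first byte layout are assumptions made for illustration; a real codec would also entropy-code these bytes.

```python
def octree_occupancy(points, depth):
    """Breadth-first list of 8-bit occupancy codes.

    `points` are integer (x, y, z) tuples inside [0, 2**depth)^3.
    """
    codes = []
    level = [(0, 0, 0, list(points))]  # (node origin x, y, z, points inside)
    for d in range(depth):
        half = 1 << (depth - d - 1)    # edge length of the children
        next_level = []
        for ox, oy, oz, pts in level:
            children = {}
            for x, y, z in pts:
                # Child index: one bit per axis
                idx = ((x >= ox + half) << 2) | ((y >= oy + half) << 1) | (z >= oz + half)
                children.setdefault(idx, []).append((x, y, z))
            codes.append(sum(1 << i for i in children))
            for i in sorted(children):
                next_level.append((ox + half * (i >> 2 & 1),
                                   oy + half * (i >> 1 & 1),
                                   oz + half * (i & 1),
                                   children[i]))
        level = next_level
    return codes


codes = octree_occupancy([(0, 0, 0), (3, 3, 3)], depth=2)
print(codes)  # [129, 1, 128]
```

Empty children are never visited again, which is how the tree takes advantage of the sparse occupancy of the 3D space.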
Point Cloud Compression – basic principles

Octree and KD-tree are great for static scenes
- very difficult to extend their performance to the temporal axis
- leaves jump from one branch to another in the octree, even after a simple motion
Point Cloud Compression in MPEG

Timeline (axis from 2014 to 2020):
- MPEG initiated the work on PCC
- In April 2017 MPEG issued a Call for Proposals
- 9 leading technology companies responded and MPEG evaluated them in October 2017
- First Committee Draft issued in October 2018
- V-PCC: 04/2020
- G-PCC: 7/2020

V-PCC: Video-based PCC
G-PCC: Geometry-based PCC

[Figure: V-PCC encoder block diagram. Blocks: input point cloud frame, patch generation, packing, geometry image generation, texture image generation, image padding, video compression, smoothing, occupancy map compression, auxiliary patch-info compression, multiplexer. Outputs: compressed geometry video, compressed texture video, compressed occupancy map and compressed auxiliary patch information, multiplexed into the compressed bitstream.]
Video-based Point Cloud Compression

Main ideas:
(1) a point coordinate is encoded as a distance with respect to a particular plane, inspired by “displacement mapping” in Graphics

[Figure: pixel intensity encodes vertex height]


Video-based Point Cloud Compression

Main ideas:
(2) the color (or any attribute) associated with a 3D vertex is encoded in a 2D texture, inspired by “texture mapping” in Graphics

[Figure: vertex color stored as pixel color]


Video-based Point Cloud Compression

Projecting all the points on a single plane would result in several 3D points having the same 2D projection, so several depth values would have to be stored per pixel
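
The collision problem is easy to reproduce. The sketch below projects a toy object along the Z axis; the function name and the thin-box test object are illustrative assumptions.

```python
from collections import defaultdict


def project_on_xy(points):
    """Project points along Z; return {(x, y): sorted list of depths}."""
    pixels = defaultdict(list)
    for x, y, z in points:
        pixels[(x, y)].append(z)
    return {uv: sorted(depths) for uv, depths in pixels.items()}


# Front and back faces of a thin box: both land on the same 16 pixels.
box = [(x, y, z) for x in range(4) for y in range(4) for z in (0, 5)]
image = project_on_xy(box)
collisions = sum(1 for depths in image.values() if len(depths) > 1)
print(len(image), collisions)  # 16 16
```

Every pixel of this projection would need two depth values, which a single depth image cannot hold.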
Video-based Point Cloud Compression

Projecting per patch is preferred:
- A set of points (patch) in a small neighborhood is projected on the same plane
- The set of projection planes is very limited
  - 6 faces of the cube
  - 4 additional diagonal planes
Projection Orientation
▪ In V-PCC, 6 orientations are defined: (±x, ±y, ±z)
▪ Additionally, the projection axis may be rotated by 45 degrees around each direction, giving 12 new orientations

[Figure: rotation* around the Z-axis, the X-axis and the Y-axis]
*Rotations can be done losslessly in integers, using shearing and scaling.
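
One plausible way for an encoder to pick among the six axis-aligned orientations is to choose the one best aligned with the local surface normal. This is a sketch under assumptions: it presumes per-point normals are already estimated, the selection rule is an encoder choice rather than something the slides specify, and the 45-degree rotated orientations are left out.

```python
# The six axis-aligned projection directions (+x, -x, +y, -y, +z, -z).
ORIENTATIONS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def best_orientation(normal):
    """Index of the orientation with the largest dot product with `normal`."""
    dots = [sum(n * o for n, o in zip(normal, axis)) for axis in ORIENTATIONS]
    return max(range(len(ORIENTATIONS)), key=dots.__getitem__)


print(ORIENTATIONS[best_orientation((0.1, 0.2, -0.9))])  # (0, 0, -1)
```

Neighboring points that receive the same index can then be grouped into a patch and projected on the same plane.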
Video-based Point Cloud Compression

Encoding the 3D point clouds as a set of 2D patches

Geometry

Color (Attributes)
Video-based Point Cloud Compression

Encoding the 3D point clouds as a set of 2D patches
- not all the pixels in the image are used for reconstruction; an occupancy map indicates which ones should be used
Video-based Point Cloud Compression

Encoding the 3D point clouds as a set of 2D patches
- Some points are still projected on already occupied pixels; an additional depth map is used
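
A minimal sketch of how an occupancy map and two depth maps could represent one patch projected along Z. The near/far layout and the function names are assumptions made for illustration, not the normative V-PCC syntax.

```python
def patch_maps(points, width, height):
    """Return (occupancy, near, far) 2D lists for points projected along Z."""
    occupancy = [[0] * width for _ in range(height)]
    near = [[0] * width for _ in range(height)]
    far = [[0] * width for _ in range(height)]
    for x, y, z in points:
        if not occupancy[y][x]:
            occupancy[y][x] = 1
            near[y][x] = far[y][x] = z
        else:
            # Pixel already occupied: keep the closest and the farthest depth
            near[y][x] = min(near[y][x], z)
            far[y][x] = max(far[y][x], z)
    return occupancy, near, far


def reconstruct(occupancy, near, far):
    """Rebuild 3D points, reading depths only where the pixel is occupied."""
    points = set()
    for y, row in enumerate(occupancy):
        for x, used in enumerate(row):
            if used:
                points.add((x, y, near[y][x]))
                points.add((x, y, far[y][x]))
    return points


pts = [(0, 0, 3), (0, 0, 7), (1, 0, 2)]
occ, near, far = patch_maps(pts, width=2, height=1)
print(occ, near, far)  # [[1, 1]] [[3, 2]] [[7, 2]]
```

A third point at depth 5 on pixel (0, 0) would not survive this representation, which is the kind of missed point that has to be handled separately for lossless coding.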
Video-based Point Cloud Compression

Encoding the 3D point clouds as a set of 2D patches
- To guarantee lossless coding, the missed points are encoded separately
Video-based Point Cloud Compression

Encoding the 3D point clouds as a set of 2D videos: depth, color and occupancy maps

MPEG is very good at video coding!

Problem solved ☺
Video-based Point Cloud Compression

Not really … point cloud projections have very bad temporal behavior, not good for traditional video codecs

Per frame encoding criteria are used

[Figure: projections at frames 24 and 26, and at frames 12, 13 and 14]
Video-based Point Cloud Compression

[Figure: prediction type (inter prediction, intra prediction) and motion vectors]
Video-based Point Cloud Compression

One way to increase temporal redundancy: global packing

[Figure: no global packing; global packing without rotation; global flexible packing; global Tetris packing]
Video-based Point Cloud Compression

The good thing is that patch forming is out of the scope of the standard: a good opportunity to have competition at the encoder level!
Video-based Point Cloud Compression

1. Motion compensation based on octree segmentation

Up to 20% gain
Video-based Point Cloud Compression

2. Motion compensation based on skeleton segmentation

A priori, only rotations for anatomical parts

Weighted influence near the joints
Video-based Point Cloud Compression
Encoding 3D point clouds as a set of 2D videos: color, depth and occupancy map

100,000 points @ 30 fps → 360 Mbps (uncompressed) → 1 Mbps (MPEG PCC 2019)

[Figure labels: 7 Mbps, 4.4 Mbps]
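
The uncompressed figure can be reproduced with simple arithmetic. The 15 bytes per point used below (three 32-bit floats for the position plus three bytes of color) is an assumption that happens to match the 360 Mbps quoted above; the slides do not state the raw point format.

```python
def raw_bitrate_mbps(points_per_frame, fps, bytes_per_point=15):
    """Uncompressed bitrate in Mbps (1 Mbps = 10**6 bit/s)."""
    return points_per_frame * bytes_per_point * 8 * fps / 1e6


raw = raw_bitrate_mbps(100_000, 30)
print(raw)        # 360.0
print(raw / 1.0)  # compression ratio for a 1 Mbps stream: 360.0
```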


Evaluation Dataset
10-bit positions, 8-bit texture, 0.8M-1M points, 30-50 fps

Longdress [1], Loot [1], Soldier [1], Red&black [1], Queen [2]


Licenses available on https://mpegfs.int-evry.fr/mpegcontent/
[1] E. d’Eon, B. Harrison, T. Myers, P. A. Chou, “8i Voxelized Full Bodies – A Voxelized Point Cloud Dataset,” ISO/IEC JTC1/SC29 Joint WG11/WG1 input document m40059/M74006, Geneva, January 2017.
[2] J. Ricard, C. Guède, R. Doré, S. Lasserre, “CGI-based dynamic point cloud test content,” ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m40050, Geneva, January 2017.
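Objective Quality Metrics: Geometric Quality

A: reference point cloud, B: decoded point cloud

$$\mathrm{PSNR} = 10 \log_{10} \frac{3\,p^2}{\mathrm{MSE}}$$

• D1: Point-to-point
$$\mathrm{MSE}_{D1} = \frac{1}{|B|} \sum_{b \in B} \min_{a \in A} \lVert b - a \rVert^2$$

• D2: Point-to-plane
$$\mathrm{MSE}_{D2} = \frac{1}{|B|} \sum_{b \in B} \big( (b - a_b) \cdot n_{a_b} \big)^2, \qquad a_b = \arg\min_{a \in A} \lVert b - a \rVert$$

$p$ is the maximum coordinate range (e.g., 1023 for 10-bit content); $n_{a_b}$ is the surface normal at $a_b$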
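
A brute-force sketch of the D1 (point-to-point) metric, assuming the commonly used definition: mean squared distance from each decoded point to its nearest reference point, turned into a PSNR with the peak value and a factor 3. Function names are illustrative; evaluation tools often also compute the opposite direction and keep the worse of the two, and they use spatial indexing instead of this O(N²) search.

```python
import math


def d1_mse(reference, decoded):
    """Mean squared distance from each decoded point to its nearest reference point."""
    total = 0.0
    for b in decoded:
        total += min(sum((bi - ai) ** 2 for ai, bi in zip(a, b)) for a in reference)
    return total / len(decoded)


def geometry_psnr(mse, peak):
    """Geometry PSNR in dB; `peak` is the maximum coordinate range."""
    return 10 * math.log10(3 * peak ** 2 / mse)


reference = [(0, 0, 0), (10, 0, 0)]
decoded = [(0, 0, 1), (10, 0, 0)]
mse = d1_mse(reference, decoded)
print(mse)                                # 0.5
print(round(geometry_psnr(mse, 1023), 2))
```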


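Objective Quality Metrics: Attributes Quality

A: reference point cloud, B: decoded point cloud

$$\mathrm{PSNR} = 10 \log_{10} \frac{p^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{|B|} \sum_{b \in B} \big( c(b) - c(a_b) \big)^2$$

$c(\cdot)$ is the attribute value of a point and $a_b$ is the point of A nearest to $b$; $p$ is the maximum attribute range (e.g., 255 for 8-bit content)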


Rate distortion metric: BD Rate

[Figure: quality vs. bitrate curves for codec C1 and codec C2, with the area between the curves shaded in green]

BD-rate [1] is the average, over the green area, of the bitrate difference (in %) for the same quality

[1] G. Bjontegaard, Calculation of Average PSNR Differences between RD curves, VCEG-M33, April 2001.
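
A simplified sketch of the BD-rate idea. Bjontegaard's method fits a cubic polynomial to each RD curve in the log-rate domain; the version below swaps that for piecewise-linear interpolation to stay dependency-free, so its numbers will differ slightly from reference implementations. All function names and sample curves are illustrative.

```python
import math


def _interp(x, xs, ys):
    """Linear interpolation of y at x; xs must be strictly increasing."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside the curve")


def bd_rate(curve_ref, curve_test, samples=1000):
    """Average bitrate difference (%) of test vs. reference at equal quality.

    Each curve is a list of (bitrate, psnr) pairs sorted by increasing psnr.
    """
    q_ref = [q for _, q in curve_ref]
    r_ref = [math.log(r) for r, _ in curve_ref]
    q_tst = [q for _, q in curve_test]
    r_tst = [math.log(r) for r, _ in curve_test]
    lo, hi = max(q_ref[0], q_tst[0]), min(q_ref[-1], q_tst[-1])
    diff = 0.0
    for i in range(samples):
        q = lo + (hi - lo) * (i + 0.5) / samples  # midpoint rule over the overlap
        diff += _interp(q, q_tst, r_tst) - _interp(q, q_ref, r_ref)
    return (math.exp(diff / samples) - 1) * 100


c1 = [(1, 30), (2, 33), (4, 36), (8, 39)]
c2 = [(0.5, 30), (1, 33), (2, 36), (4, 39)]   # same quality at half the bitrate
print(round(bd_rate(c1, c2), 1))  # -50.0
```

A negative value means the test codec needs less bitrate than the reference for the same quality.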
V-PCC Evolution (All Intra): TMC2-v11 vs. TMC2-v1 vs. Anchor [1]

[Figure: rate-distortion curves for Loot, Red & black and Soldier (all intra): D1-PSNR (dB) vs. bitrate (MBits/s) and Y-PSNR (dB) vs. bitrate (MBits/s)]

[1] https://github.com/RufaelDev/pcc-mp3dg/tree/mpeg_standalone_branch
V-PCC Evolution (All Intra): TMC2-v11 vs. TMC2-v1
V-PCC Evolution (Random Access): TMC2-v11 vs. TMC2-v1 vs. Anchor

[Figure: rate-distortion curves for Loot, Red & black and Soldier (random access): D1-PSNR (dB) vs. bitrate (MBits/s) and Y-PSNR (dB) vs. bitrate (MBits/s)]
V-PCC Evolution (Random Access): TMC2-v11 vs. TMC2-v1
V-PCC Bitstream Components

Video sub-streams make up 95% to 99% of the overall V-PCC bitstream size


V-PCC: All intra vs. Random Access

[Figure: rate-distortion curves for Loot, Red & black and Soldier (All Intra vs. Random Access): D1-PSNR (dB) vs. bitrate (MBits/s) and Y-PSNR (dB) vs. bitrate (MBits/s)]
V-PCC All Intra: VVC (VTM 8.2) vs. HEVC (HM 16.2)

[Figure: rate-distortion curves for Loot, Red & black and Soldier (all intra): D1-PSNR (dB) vs. bitrate (MBits/s) and Y-PSNR (dB) vs. bitrate (MBits/s)]
V-PCC Random Access: VVC (VTM 8.2) vs. HEVC (HM 16.2)

[Figure: rate-distortion curves for Loot, Red & black and Soldier (random access), TMC2-HEVC vs. TMC2-VVC: D1-PSNR (dB) vs. bitrate (MBits/s) and Y-PSNR (dB) vs. bitrate (MBits/s)]
Video-based Point Cloud Compression

▪ V-PCC implementations are publicly available: www.mpeg-pcc.org

▪ An integrated real-time decoder and renderer is also available for Android: https://github.com/nokiatech/vpcc
About collaborations and environment

▪ V-PCC progress was fast because video coding expertise was accessible
▪ No other Graphics community has this privilege
Geometry-based Point Cloud Compression
▪ Encoding 3D point clouds in their native format

100,000 points @ 10 fps → 112 Mbps (uncompressed)


Geometry-based Point Cloud Compression

[Figure: G-PCC encode flow and decode flow, from point cloud to G-PCC bitstream]
Geometry branch: coordinate conversion, voxelization; geometry coding (octree coding, trisoup, predictive geometry)
Attribute branch: coordinate conversion, scaling; attribute transfer (for lossy geometry encoding); attribute coding (LoD scheme, RAHT)
Geometry-based PCC – main elements
▪ Octree decomposition
• Optionally, it is possible to prune the octree and complete with a surface model that approximates the surface within each leaf of the pruned octree (triangle soup)
▪ Attributes coding
• Region Adaptive Hierarchical Transform (RAHT)
• Interpolation-based hierarchical nearest-neighbour prediction (Predicting Transform)
• Interpolation-based hierarchical nearest-neighbour prediction with an update/lifting step (Lifting Transform)
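
To give a feel for RAHT, the sketch below shows its elementary step as usually described in the literature: two sibling nodes are merged by a weighted orthonormal butterfly, where the weight of a node is the number of points it covers. Function names are illustrative, and quantization and entropy coding are left out.

```python
import math


def raht_merge(c1, w1, c2, w2):
    """Merge the low-pass coefficients c1, c2 of two sibling nodes.

    At the leaves the coefficient is the attribute value and the weight is 1.
    Returns (low, high, weight) for the parent node.
    """
    s = math.sqrt(w1 + w2)
    low = (math.sqrt(w1) * c1 + math.sqrt(w2) * c2) / s
    high = (-math.sqrt(w2) * c1 + math.sqrt(w1) * c2) / s
    return low, high, w1 + w2


def raht_split(low, high, w1, w2):
    """Inverse of raht_merge (the butterfly is orthonormal)."""
    s = math.sqrt(w1 + w2)
    c1 = (math.sqrt(w1) * low - math.sqrt(w2) * high) / s
    c2 = (math.sqrt(w2) * low + math.sqrt(w1) * high) / s
    return c1, c2


# Three leaves with the same attribute value: every high-pass coefficient is 0.
low, high_a, w = raht_merge(100.0, 1, 100.0, 1)
low, high_b, w = raht_merge(low, w, 100.0, 1)
print(round(high_a, 9), round(high_b, 9), w)
```

Smooth regions therefore produce near-zero high-pass coefficients, which are cheap to code after quantization.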
Geometry: octree coding

Voxelized source point cloud (geometry only) → octree representation → arithmetic coding → geometry bitstream

Octree representation (compact representation): direct coding mode, planar mode, QTBT mode, angular/azimuthal mode
Arithmetic coding (entropy coding): neighbor-dependent entropy context, intra context prediction


Geometry: octree coding and triangle soup

Voxelized source point cloud (geometry only) → octree representation (voxels for the mesh approximation) → trisoup representation (mesh vertex positions) → arithmetic coding → geometry bitstream

Surface approximation by triangle mesh representation.


Geometry: predictive geometry coding

Voxelized source point cloud (geometry only) → predictive geometry (prediction mode, residual) → arithmetic coding → geometry bitstream

[Figure: prediction tree with a root vertex, branch vertices with one, two or three children, and leaf vertices]

Point-by-point compression targeting low-latency applications


Attribute coding

Voxelized source point cloud with attributes → LoD scheme (Predicting Transform, Lifting Transform) or RAHT (Region Adaptive Hierarchical Transform) → quantized coefficients → arithmetic coding → attribute bitstream

There may be several attributes in the input point cloud (e.g. color + reflectance).

Two main methods are provided for the various kinds of attribute data
Geometry-based Point Cloud Compression
▪ Encoding 3D point clouds in their native format

100,000 points @ 10 fps → 112 Mbps (uncompressed) → 20 Mbps (lossless) (80k points)


Geometry-based Point Cloud Compression
G-PCC handles various content categories, while offering state-of-the-art RD performance

Coding tool | Content categories (among Solid, Dense, Sparse, Scant, Lidar-Fused, Lidar-Frame) | Improvements
IDCM | ✅ ✅ ✅ ✅ | Complexity
Planar | ✅ ✅ | RD
Neighbor Dependent Entropy Context | ✅ ✅ | RD
Intra Occupancy Prediction | ✅ ✅ | RD
Angular | ✅ | RD
Predictive | ✅ ✅ | RD & Complexity
Trisoup | ✅ | RD
LoD | ✅ ✅ ✅ ✅ | RD
RAHT | ✅ ✅ ✅ ✅ | RD
What is next? Revisiting the past!

Visual capture
Point Cloud – a convergence between 2 worlds
Mesh – a surface approximation of the point cloud
Visual synthesis
Future of V-PCC

▪ Mesh representation – the most used 3D graphics format
▪ Traditionally used for (realistically) animating virtual characters
Traditional Mesh object – hand crafted

▪ A set of 3D points
• ordered,
• connected to form polygons

▪ An animated mesh is defined by
• (Xt, Yt, Zt)n
• (v1, v2, v3)m
• (R, G, B) – still image
• a mapping from texture to geometry
• reflectance, transparency, …
Traditional Mesh object – hand crafted

▪ MPEG technologies for compressing traditional mesh:
• TFAN – static mesh
• FAMC – animated mesh
Introducing D-Mesh (Dynamic Mesh)

▪ A set of 3D points
• ordered,
• connected to form polygons – varying in time
▪ A D-Mesh is defined by
• (Xt, Yt, Zt)n
• (v1t, v2t, v3t)m
• (R, G, B) – video (no longer a still image)
• mapping from video texture to geometry
• reflectance, transparency, …
▪ Automatically captured
V-PCC for the D-Mesh object

▪ Current V-PCC for vertex positions and colors
▪ TFAN connectivity per frame

Ongoing exploration in MPEG
Future of V-PCC

▪ V-PCC - a base for encoding other types of immersive video content

[Figure: 3DoF+ (MIV) encoding pipeline]

[Figure: MIV data representation: reference + dis-occlusion patches]

▪ Differences in input data format and rendering but similarities in how information is represented in the encoded domain (patch maps – atlas encoded as 2D frames)

▪ Idea: adding something in V-PCC (e.g. camera parameters) and restricting something else (e.g. a null indicator for occupancy map video codec) to make it more generic
Future of V-PCC
▪ Intel Frog sequence: MIV atlas generation, then V-PCC packing/encoding/3D reconstruction
▪ Classroom sequence: MIV atlas generation, then V-PCC packing/encoding/3D reconstruction
Conclusions
▪ Novel capturing systems and interactive 3D viewing experiences are creating new opportunities for realistic immersion

▪ Point Cloud Compression enables interactive high quality 3D content by providing manageable bitrates and also reducing requirements in creation, transmission and rendering of 3D content

▪ V-PCC leverages the existing hardware and software infrastructure for rapid deployment of new immersive experiences. G-PCC has the potential to do better (provided that expertise similar to that in video coding can be built)

▪ PCC provides a solid framework for the convergence between natural and synthetic 3D graphics. Mesh extension of PCC improves rendering and (potentially) saves bandwidth
Conclusion

▪ We are at the beginning of a new era when humanity will regain its third dimension in the digital space!
Disclaimer

▪ Several pictures and videos used in this presentation are provided by
• 8i, Owli, Sony, Intel RealSense, Microsoft Hololens, Institut Mines Telecom
• The huge Internet image databases

Thank you!
