Overview of 3D Talking Head Synthesis
Scene Representation Network: 3D View Synthesis
ER-NeRF [Li 2023]
DFRF: Learning Dynamic Facial Radiance Fields
Dynamic Facial Radiance Field
NeRF (static scenes only): an MLP maps a 3D query point and a 2D view direction to color and density.
DFRF adds identity learning on top of this mapping.
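The NeRF mapping above can be illustrated with a toy numpy sketch. It assumes a tiny, randomly initialized two-layer MLP (`init_mlp`, `query_field` and `render_ray` are hypothetical names) and uses the standard NeRF alpha-compositing rule to turn per-sample color and density into one pixel color. As in many NeRF implementations, the view direction is passed as a 3D unit vector rather than as the two angles on the slide. This is not DFRF's network, only the static-scene baseline it builds on.

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    """Map each coordinate to [x, sin(2^k x), cos(2^k x)] for k = 0..num_freqs-1."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(np.sin((2.0 ** k) * x))
        feats.append(np.cos((2.0 ** k) * x))
    return np.concatenate(feats, axis=-1)

def init_mlp(in_dim, hidden=64, seed=0):
    """Hypothetical toy weights: one hidden layer, 4 outputs (RGB + density)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, 4)), "b2": np.zeros(4),
    }

def query_field(params, points, view_dirs):
    """Toy radiance field: (3D point, view direction) -> (RGB color, density)."""
    h = np.concatenate([positional_encoding(points), positional_encoding(view_dirs)], axis=-1)
    h = np.maximum(h @ params["W1"] + params["b1"], 0.0)   # ReLU hidden layer
    out = h @ params["W2"] + params["b2"]
    rgb = 1.0 / (1.0 + np.exp(-out[..., :3]))               # sigmoid -> colors in (0, 1)
    sigma = np.log1p(np.exp(out[..., 3]))                    # softplus -> density >= 0
    return rgb, sigma

def render_ray(rgb, sigma, t_vals):
    """NeRF-style alpha compositing of the samples along one ray."""
    deltas = np.append(np.diff(t_vals), 1e10)                # spacing between samples
    alpha = 1.0 - np.exp(-sigma * deltas)                    # opacity of each sample
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1] + 1e-10))  # transmittance T_i
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0), weights
```

With 4 frequencies, each 3D input expands to 27 values, so `init_mlp(in_dim=54)` matches one point plus one direction; sampling `t = np.linspace(2.0, 6.0, 32)` along a ray and calling `render_ray` on the queried samples yields a single RGB value.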
Differentiable Face Warping
Why? A strict NeRF mapping fails to model complex facial movements.
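Inputs: a 3D query point and audio features. For each of the N reference images (n = 1, ..., N), the module predicts an offset Δo_n that warps the point's 2D projection p_ref^n in that image:
$p'^{\,n}_{ref} = p^{\,n}_{ref} + \Delta o_n$
Low-density points are more likely to be background, so they should receive smaller offsets.
Loss function: MSE plus a regularizer on the offsets of all sampled 3D points:
$L = \|C - I\|^2 + \lambda L_r$
where C is the rendered color and I is the ground truth.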
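A minimal sketch of the warping step and the loss, assuming a pinhole camera per reference image and a hypothetical single linear layer as the offset predictor (`project`, `warp_projections`, `total_loss`, `W`, `b` and `lam` are all illustrative names). The slide does not give the exact form of L_r; here it is assumed to be the squared offset weighted by exp(-σ), which only mirrors the stated intent that low-density points should move less. Sampling features from the reference images at the warped coordinates is omitted.

```python
import numpy as np

def project(points, K, R, t):
    """Pinhole projection of world points (P, 3) into one reference image -> (P, 2)."""
    cam = points @ R.T + t                  # world -> camera coordinates
    uv = cam @ K.T                          # camera -> homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]

def warp_projections(points, audio, cams, W, b):
    """Return p'_ref^n = p_ref^n + Δo_n for every point and reference image n.

    Hypothetical offset predictor: one linear layer on [point, audio feature]
    that outputs a separate 2D offset for each reference image.
    """
    n_refs = len(cams)
    audio_rep = np.broadcast_to(audio, (len(points), audio.size))
    feats = np.concatenate([points, audio_rep], axis=1)
    offsets = (feats @ W + b).reshape(len(points), n_refs, 2)           # Δo_n
    base = np.stack([project(points, *cam) for cam in cams], axis=1)    # p_ref^n, (P, N, 2)
    return base + offsets, offsets

def total_loss(rendered, target, offsets, sigma, lam=0.1):
    """L = ||C - I||^2 + λ L_r with an ASSUMED density-weighted offset regularizer."""
    mse = np.mean(np.sum((rendered - target) ** 2, axis=-1))
    weight = np.exp(-sigma)[:, None]        # close to 1 for low density, close to 0 for high
    l_r = np.mean(weight * np.sum(offsets ** 2, axis=-1))
    return mse + lam * l_r
```

Because the offsets enter the loss directly, gradients from both the color error and the regularizer reach the offset predictor, which is what makes the warping trainable end to end.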
Effectiveness of the Face Warping Module
Results showing the contribution of the proposed differentiable face warping module.
DFRF: Reported Results
ER-NeRF: Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
Observations
• Only the head region needs focus
-> Unrelated neurons can be pruned
• Audio affects different facial regions in distinct ways
-> Each region has its own audio-driven local motion
Proposal
Efficient Region-aware NeRF (ER-NeRF): different spatial regions receive uneven levels of attention
Contribution
• Tri-Plane Hash Representation
• Region Attention Module: captures the correlation between the audio condition and spatial regions
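The slides do not spell out the internals of the region attention module. One way to read "correlation between the audio condition and spatial regions" is sketched below, assuming the attention weights come from a small two-layer MLP on each point's spatial feature and rescale the audio feature channel by channel; `region_attention` and the weights `W1`, `b1`, `W2`, `b2` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def region_attention(spatial_feat, audio_feat, W1, b1, W2, b2):
    """Channel-wise region attention (sketch under the assumptions above).

    spatial_feat: (P, Ds) per-point geometry features
    audio_feat:   (Da,)  audio condition shared by all points of one frame
    Returns (P, Da): the audio feature rescaled differently at each location.
    """
    h = np.maximum(spatial_feat @ W1 + b1, 0.0)   # small ReLU MLP on the spatial feature
    attn = sigmoid(h @ W2 + b2)                     # per-point, per-channel weights in (0, 1)
    return attn * audio_feat[None, :]               # e.g. mouth points keep more audio signal
```

The design choice here is that the same audio vector is shared by the whole frame, while each point decides how much of each audio channel it should respond to.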
ER-NeRF: Tri-Plane Hash Representation
Problems
• Hash collisions increase linearly with the number of sampled points
• Every point in 3D space is sampled equally
-> The MLP must handle multiple audio features at the same time
• Naïve ways of reducing the number of samples lower the quality
-> Instead, avoid the hash collisions that come from high dimensions
(concatenate each plane's features with the other two planes' features)
Method
For each plane: project the 3D points to 2D -> hash encoding -> plane-level geometry features
Final tri-plane geometry feature = concatenation (+) of the three plane-level features, which is passed to the decoder
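The per-plane pipeline above can be sketched in numpy. This assumes a single resolution level per plane (ER-NeRF builds on multiresolution hash grids), bilinear interpolation over the four cell corners, the XOR-of-primes spatial hash popularized by Instant-NGP, and points normalized to [0, 1]^3; `hash2d`, `plane_hash_encode` and `triplane_hash_encode` are hypothetical names.

```python
import numpy as np

PRIMES = np.array([1, 2654435761], dtype=np.uint64)  # spatial-hash primes (Instant-NGP style)

def hash2d(ij, table_size):
    """Hash integer 2D grid corners: (i * p0 XOR j * p1) mod table_size."""
    ij = ij.astype(np.uint64)
    return ((ij[..., 0] * PRIMES[0]) ^ (ij[..., 1] * PRIMES[1])) % np.uint64(table_size)

def plane_hash_encode(uv, table, resolution):
    """Bilinearly interpolated hash-grid lookup for 2D coords uv in [0, 1]^2.

    table: (T, F) learnable feature table for this plane (one resolution level).
    """
    x = uv * resolution
    x0 = np.floor(x).astype(np.int64)
    w = x - x0                                     # fractional position inside the cell
    feat = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            corner = x0 + np.array([dx, dy])
            wx = w[:, 0] if dx else 1.0 - w[:, 0]
            wy = w[:, 1] if dy else 1.0 - w[:, 1]
            idx = hash2d(corner, len(table))       # colliding corners share a table row
            feat = feat + (wx * wy)[:, None] * table[idx]
    return feat

def triplane_hash_encode(points, tables, resolution=16):
    """Project 3D points onto the XY, YZ and XZ planes, encode each, then concatenate."""
    planes = [points[:, [0, 1]], points[:, [1, 2]], points[:, [0, 2]]]
    feats = [plane_hash_encode(uv, tab, resolution) for uv, tab in zip(planes, tables)]
    return np.concatenate(feats, axis=-1)         # (P, 3F) tri-plane geometry feature
```

Each table is indexed by 2D corners only, so the number of distinct keys competing for a table grows with a plane's area rather than with the full 3D volume, which is the intuition behind avoiding collisions from high dimensions.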
ER-NeRF: Region Attention Module
ER-NeRF: Reported Results
Results on lip-synchronization
Key-frame picking
ER-NeRF: Different training length results
[Chart: PSNR ↑ for each subject (Obama, Biden, reporter, Chinese man, Chinese woman, Trump, reporter, French man) under different training lengths]
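For reference, PSNR in the chart above follows the standard definition 10 · log10(MAX² / MSE). The sketch below assumes images scaled to [0, 1] (use `max_val=255` for 8-bit images); `psnr` is an illustrative helper, not the evaluation script used for these results.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")                     # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

A uniform error of 0.1 on [0, 1] images gives an MSE of 0.01 and therefore a PSNR of 20 dB.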
[Chart: LPIPS ↓ for each subject (Obama, Biden, reporter, Chinese man, Chinese woman, Trump, reporter, French man) under different training lengths]
[Chart: LMD ↓ for each subject (Obama, Biden, reporter, Chinese man, Chinese woman, Trump, reporter, French man) under different training lengths]
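LMD (landmark distance) measures lip-sync and shape accuracy as the mean Euclidean distance between corresponding facial landmarks of generated and ground-truth frames. The sketch below assumes the landmarks were already extracted by an external face-landmark detector (not shown); implementations differ in which landmarks they use (often only the mouth) and in how they normalize, so `landmark_distance` is only an illustrative version.

```python
import numpy as np

def landmark_distance(pred_lms, gt_lms):
    """Mean Euclidean distance between corresponding landmarks.

    pred_lms, gt_lms: (frames, landmarks, 2) arrays of 2D landmark coordinates.
    """
    pred = np.asarray(pred_lms, dtype=float)
    gt = np.asarray(gt_lms, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```

Shifting every landmark by (3, 4) pixels, for example, yields an LMD of 5.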
ER-NeRF: Cross-lingual tests
Future Plan
• Continue running experiments on custom data with DFRF and ER-NeRF
• Research and experiment with HideNeRF