
ZOOM, ENHANCE, SYNTHESIZE!

MAGIC UPSCALING AND MATERIAL SYNTHESIS USING DEEP LEARNING
Tuesday, 9 May 2017
Andrew Edelsten - NVIDIA Developer Technologies
DEEP LEARNING FOR ART
Active R&D but ready now

▪ Style transfer
▪ Generative networks creating images and voxels
▪ Adversarial networks (DCGAN) – still early but promising

▪ DL- and ML-based tools from NVIDIA and partners:


▪ NVIDIA

▪ Artomatix

▪ Allegorithmic

▪ Autodesk
STYLE TRANSFER
Something Fun
▪ Doodle a masterpiece!

[Figure: content image + style image]

▪ Uses a CNN to take the “style” from one image and apply it to another

▪ Sept 2015: “A Neural Algorithm of Artistic Style” by Gatys et al.

▪ Dec 2015: neural-style (github)

▪ Mar 2016: neural-doodle (github)

▪ Mar 2016: texture-nets (github)

▪ Oct 2016: fast-neural-style (github)

▪ 2 May 2017 (last week!): Deep Image Analogy (arXiv)

▪ Also numerous services: Vinci, Prisma, Artisto, Ostagram


[Image slide: style-transfer examples from http://ostagram.ru/static_pages/lenta]
STYLE TRANSFER
Something Useful
▪ Game remaster & texture enhancement
▪ Try Neural Style and use a real-world photo for the “style”

▪ For stylized or anime up-rez try https://github.com/nagadomi/waifu2x

▪ Experiment with art styles


▪ Dream or power-up sequences
▪ “Come Swim” by Kristen Stewart - https://arxiv.org/pdf/1701.04928v1.pdf

GAMEWORKS: MATERIALS & TEXTURES
Using DL for Game Development & Content Creation

▪ A set of tools for the game industry built with machine learning and deep learning
▪ Launched at the Game Developers Conference in March; the tools run as a web service
▪ Sign up for the beta at: https://gwmt.nvidia.com
▪ Tools in this initial release:
▪ Photo to Material: 2Shot

▪ Texture Multiplier

▪ Super-Resolution

PHOTO TO MATERIAL
The 2Shot Tool

▪ From two photos of a surface, generate a “material”


▪ Based on a SIGGRAPH 2015 paper by NVIDIA Research & Aalto University (Finland)
▪ “Two-Shot SVBRDF Capture for Stationary Materials”

▪ https://mediatech.aalto.fi/publications/graphics/TwoShotSVBRDF/

▪ Input is a pair of pixel-aligned “flash” and “guide” photographs


▪ Use tripod and remote shutter or bracket

▪ Or align later

▪ Use for flat surfaces with repeating patterns


MATERIAL SYNTHESIS FROM TWO PHOTOS

[Figure: flash image + guide image → diffuse albedo, specular, normals, glossiness, and anisotropy maps]
TEXTURE MULTIPLIER
Organic variations of textures

▪ Put simply: texture in, new texture out


▪ Inspired by Gatys, Ecker & Bethge (see the Gram-matrix sketch below)
▪ “Texture Synthesis Using Convolutional Neural Networks”

▪ https://arxiv.org/pdf/1505.07376.pdf

▪ Artomatix
▪ Similar product “Texture Mutation”

▪ https://artomatix.com/
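
The Gatys et al. approach characterizes a texture by the Gram matrices (channel correlations) of a pretrained CNN's feature maps, then optimizes a new image to match those statistics. A minimal sketch of the Gram-matrix texture loss, assuming PyTorch and feature activations from a pretrained network (this is an illustration, not the GameWorks implementation):

import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from one CNN layer
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    # Correlate every channel with every other channel, normalized by size
    return torch.bmm(flat, flat.transpose(1, 2)) / (c * h * w)

def texture_loss(gen_features, ref_features):
    # Match the Gram statistics of the generated texture to the reference
    return F.mse_loss(gram_matrix(gen_features), gram_matrix(ref_features))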

SUPER RESOLUTION

SUPER RESOLUTION
Zoom... ENHANCE!

[Comic: “Zoom in on the license plate” “OK!” “Can you enhance that?” “Sure!”]
SUPER RESOLUTION
The task at hand

Given a low-resolution image (W × H), construct a high-resolution image (nW × nH).

[Diagram: LR image, W × H → upscale (magic?) → HR image, nW × nH]
UPSCALE: CREATE MORE PIXELS
An ill-posed task?
[Figure: the known pixels of the given image, interleaved with the many unknown (?) pixels of the upscaled image]
TRADITIONAL APPROACH
▪ Interpolation (bicubic, Lanczos, etc.)
▪ Interpolation + sharpening (and other filtering)

[Figure: interpolation vs. filter-based sharpening]

▪ Rough estimation of the data behavior → too general
▪ Too many possibilities (an 8×8 grayscale patch has 256^(8×8) ≈ 10^154 pixel combinations!)
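
For comparison, a minimal sketch of the traditional pipeline, assuming Pillow; the filter choices and parameters are illustrative:

from PIL import Image, ImageFilter

def upscale_traditional(path, factor=4):
    img = Image.open(path)
    w, h = img.size
    # Interpolate to the target size (bicubic; Lanczos is also common)
    up = img.resize((w * factor, h * factor), Image.BICUBIC)
    # Sharpen to partially recover edge contrast lost to interpolation
    return up.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))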
A NEW APPROACH
First: narrow the possible set

[Figure: Venn diagram, photos and textures ⊂ natural images ⊂ all possible images]

Focus on the domain of “natural images”
A NEW APPROACH
Second: Place image in the domain, then reconstruct

Data from natural images is sparse: it is compressible in some domain.

Then “reconstruct” images (rather than create new ones).

[Diagram: image → compress → reconstruct (+ prior information, + constraints) → image]
PATCH-BASED MAPPING: TRAINING
[Diagram: training images → (LR, HR) pairs of patches → training → model parameters; the learned mapping takes a low-resolution patch to a high-resolution patch]
PATCH-BASED MAPPING
[Diagram: LR patch $x_L$ → encode → high-level information about the patch → decode → HR patch $x_H$]
PATCH-BASED MAPPING: SPARSE CODING
[Diagram: LR patch $x_L$ → encode → “features” (a sparse code) → decode → HR patch $x_H$]
PATCH FEATURES & RECONSTRUCTION
An image patch can be reconstructed as a sparse linear combination of features. The features are learned from the dataset over time.

$x = Dz = d_1 z_1 + \dots + d_K z_K$

where $D$ is the dictionary, $x$ is the patch, and $z$ is the sparse code.

[Figure: example reconstruction, $x \approx 0.8\,d_{36} + 0.3\,d_{42} + 0.5\,d_{63}$]
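
A minimal numpy sketch of this reconstruction step, assuming a dictionary has already been learned (the atoms and coefficients here are random/illustrative):

import numpy as np

# Dictionary of K learned features (atoms), each a flattened 8x8 patch
K, patch_size = 100, 8 * 8
rng = np.random.default_rng(0)
D = rng.standard_normal((patch_size, K))   # columns d_1 ... d_K

# A sparse code: only a few non-zero coefficients
z = np.zeros(K)
z[[36, 42, 63]] = [0.8, 0.3, 0.5]

# Reconstruct the patch as a sparse linear combination of atoms
x = D @ z                                  # x = 0.8*d_36 + 0.3*d_42 + 0.5*d_63
patch = x.reshape(8, 8)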
GENERALIZED PATCH-BASED MAPPING

[Diagram: LR patch → mapping → high-level representation of the LR patch (“features”) → mapping in feature space → high-level representation of the HR patch → mapping → HR patch]
GENERALIZED PATCH-BASED MAPPING

[Diagram: LR patch → W1 → LR features → W2 (mapping in feature space) → HR features → W3 → HR patch; W1, W2, and W3 are trainable parameters]
MAPPING OF THE WHOLE IMAGE
Using Convolutions
[Diagram: LR image → convolutional operators implementing the three mappings (LR → features, mapping in feature space, features → HR) → HR image]
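
This is essentially the SRCNN formulation: the three per-patch mappings become three convolutions applied to the whole image. A minimal PyTorch sketch, with layer sizes as illustrative assumptions:

import torch.nn as nn

class SRNet(nn.Module):
    """Three-stage convolutional mapping: extract features,
    map in feature space, reconstruct the HR image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4),  # W1: LR image -> features
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),            # W2: mapping in feature space
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),  # W3: features -> HR image
        )

    def forward(self, x):
        # x: bicubically pre-upscaled LR image, shape (batch, 1, H, W)
        return self.net(x)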

AUTO-ENCODERS

[Diagram: input → network → output ≈ input]
AUTO-ENCODER
[Diagram: input → encode → features → decode → output ≈ input]
AUTO-ENCODER
[Diagram: input $x$ → network $F_W$ → output $y$]

Parameters: $W$

Inference: $y = F_W(x)$

Training: $W = \arg\min_W \sum_i \mathrm{Dist}(x_i, F_W(x_i))$, where $\{x_i\}$ is the training set.
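
A minimal convolutional auto-encoder sketch in PyTorch; the architecture is an illustrative assumption, not the presenter's network:

import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encode: compress the input into a smaller feature representation
        self.encode = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decode: reconstruct the input from the features
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decode(self.encode(x))  # output ≈ input after training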

AUTO-ENCODER
Encode
▪ Our encoder is LOSSY by definition

[Diagram: input → encode → features; the compression implies information loss]
SUPER-RESOLUTION AUTO-ENCODER
[Diagram: LR input $x$ → SR AE $F_W$ → HR output $y$]

Parameters: $W$

Inference: $y = F_W(x)$

Training: $W = \arg\min_W \sum_i \mathrm{Dist}(x_i, F_W(x_i))$, where $\{x_i\}$ is the training set.
SUPER RESOLUTION AE: TRAINING
[Diagram: ground-truth HR image $x$ → downscaling $D$ → LR image $\hat{x}$ → SR AE $F_W$ → reconstructed HR image $y$]

$W = \arg\min_W \sum_i \mathrm{Dist}\big(x_i, F_W(D(x_i))\big)$, where $\{x_i\}$ is the training set.
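
A minimal training-loop sketch under this objective, assuming PyTorch, a network like the SRCNN-style sketch above (which takes a pre-upscaled input), and a loader of ground-truth HR crops; all names here are illustrative:

import torch
import torch.nn.functional as F

def downscale(hr, factor=4):
    # D: produce the LR input by downscaling the ground-truth HR image
    return F.interpolate(hr, scale_factor=1 / factor, mode='bicubic',
                         align_corners=False)

def train(model, hr_loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for hr in hr_loader:                      # hr: (batch, 1, H, W) ground truth
            lr_img = downscale(hr)                # x_hat = D(x)
            up = F.interpolate(lr_img, size=hr.shape[-2:],
                               mode='bicubic', align_corners=False)
            sr = model(up)                        # y = F_W(D(x))
            loss = F.mse_loss(sr, hr)             # Dist(x, F_W(D(x)))
            opt.zero_grad()
            loss.backward()
            opt.step()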
SUPER RESOLUTION AE: INFERENCE
[Diagram: given LR image $\hat{x}$ → SR AE $F_W$ → constructed HR image $y$]

$y = F_W(\hat{x})$
SUPER-RESOLUTION: ILL-POSED TASK?

THE LOSS FUNCTION

THE LOSS FUNCTION
Measuring the “distance” from a good result

The distance function is a key element in obtaining good results:

$W = \arg\min_W \sum_i D\big(x_i, F_W(x_i)\big)$

The choice of loss function is an important design decision.
LOSS FUNCTION
MSE (Mean Squared Error):

$\mathrm{MSE} = \frac{1}{N} \, \| x - F(x) \|^2$
LOSS FUNCTION: PSNR
MSE (Mean Squared Error):

$\mathrm{MSE} = \frac{1}{N} \, \| x - F(x) \|^2$

PSNR (Peak Signal-to-Noise Ratio):

$\mathrm{PSNR} = 10 \log_{10} \frac{\mathit{MAX}^2}{\mathrm{MSE}}$
LOSS FUNCTION: HFEN
MSE (Mean Squared Error):

$\mathrm{MSE} = \frac{1}{N} \, \| x - F(x) \|^2$

PSNR (Peak Signal-to-Noise Ratio):

$\mathrm{PSNR} = 10 \log_{10} \frac{\mathit{MAX}^2}{\mathrm{MSE}}$

HFEN (High Frequency Error Norm, see Ref A), a perceptual loss, where $HP$ is a high-pass filter:

$\mathrm{HFEN} = \| HP(x - F(x)) \|^2$

Ref A: http://ieeexplore.ieee.org/document/5617283/
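
A sketch of the three metrics in numpy, assuming float images in [0, 1] and a Laplacian-of-Gaussian high-pass filter for HP (the choice in Ref A; the sigma here is illustrative):

import numpy as np
from scipy.ndimage import gaussian_laplace

def mse(x, fx):
    return np.mean((x - fx) ** 2)

def psnr(x, fx, max_val=1.0):
    return 10 * np.log10(max_val ** 2 / mse(x, fx))

def hfen(x, fx, sigma=1.5):
    # HP = Laplacian of Gaussian: keeps edges / high-frequency content
    diff = gaussian_laplace(x - fx, sigma=sigma)
    return np.sum(diff ** 2)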
REGULAR LOSS
[Figure: two 4× upscaling results using the regular loss]
REGULAR LOSS + PERCEPTUAL LOSS
[Figure: two 4× upscaling results using regular + perceptual loss]
WARNING… THIS IS EXPERIMENTAL!

SUPER-RESOLUTION: GAN-BASED LOSS
[Diagram: LR image $x$ → Generator $F$ → $y = F(x)$ → Discriminator $D(y)$ → real / fake]

GAN loss $= -\ln D(F(x))$

Total loss = regular (MSE + PSNR + HFEN) loss + GAN loss
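
A minimal sketch of the generator-side objective, assuming PyTorch, pre-built generator/discriminator networks, and that the discriminator outputs a probability in (0, 1); the weighting of the GAN term is an assumption:

import torch

def generator_loss(gen, disc, lr_img, hr_img, regular_loss, gan_weight=1e-3):
    sr = gen(lr_img)                          # F(x): generated HR image
    # GAN term: push the discriminator's "real" score up for F(x)
    gan = -torch.log(disc(sr) + 1e-8).mean()  # -ln D(F(x)), stabilized
    # Total = regular (pixel/perceptual) loss + weighted GAN loss
    return regular_loss(sr, hr_img) + gan_weight * gan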


QUESTIONS?

Extended presentation from Game Developers Conference 2017:
https://developer.nvidia.com/deep-learning-games

GameWorks: Materials & Textures


https://gwmt.nvidia.com
