Deferred Shading

Shawn Hargreaves

Mark Harris, NVIDIA

The Challenge: Real-Time Lighting

Modern games use many lights on many objects covering many pixels

computationally expensive

Three major options for real-time lighting

Single-pass, multi-light

Multi-pass, multi-light

Deferred Shading

Each has associated trade-offs

Comparison: Single-Pass Lighting

For Each Object:
    Render object, apply all lighting in one shader

Hidden surfaces can cause wasted shading

Hard to manage multi-light situations

Code generation can result in thousands of combinations for a single template shader

Hard to integrate with shadows

Stencil = No Go

Shadow Maps = Easy to overflow VRAM

Comparison: Multipass Lighting

For Each Light:
    For Each Object Affected By Light:
        framebuffer += brdf( object, light )

Hidden surfaces can cause wasted shading

High Batch Count (1/object/light)

Even higher if shadow-casting

Lots of repeated work each pass:

Vertex transform & setup

Anisotropic filtering

Comparison: Deferred Shading

For Each Object:
    Render lighting properties to “G-buffer”
For Each Light:
    framebuffer += brdf( G-buffer, light )

Greatly simplifies batching & engine management

Easily integrates with popular shadow techniques

“Perfect” O(1) depth complexity for lighting

Lots of small lights ~ one big light
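
To make the batch-count difference between the multipass and deferred loops concrete, here is a minimal sketch. It assumes the worst case where every light affects every object; the function names are illustrative and not from the slides.

```python
def multipass_batches(num_objects, num_lights):
    """Multipass: one batch per (object, light) pair."""
    return num_objects * num_lights


def deferred_batches(num_objects, num_lights):
    """Deferred: one batch per object to fill the G-buffer,
    then one batch per light volume."""
    return num_objects + num_lights


print(multipass_batches(100, 20))  # 2000
print(deferred_batches(100, 20))   # 120
```

Shadow-casting lights would raise both counts, since each shadow map needs its own passes.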

Deferred Shading: Not A New Idea!

Deferred shading introduced by Michael Deering et al. at SIGGRAPH 1988

Their paper does not ever use the word “deferred”

PixelFlow used it (UNC / HP project)

Just now becoming practical for games!

What is a G-Buffer?

G-Buffer = All necessary per-pixel lighting terms

Normal

Position

Diffuse / Specular Albedo, other attributes

Limits lighting to a small number of parameters!

What You Need

Deferred shading is best with high-end GPU features:

Floating-point textures: must store position

Multiple Render Targets (MRT): write all G-buffer attributes in a single pass

Floating-point blending: fast compositing

Attributes Pass

Attributes written will depend on your shading

Attributes needed

Position

Normal

Color

Others: specular/exponent map, emissive, light map, material ID, etc.

Option: trade storage for computation

Store pos.z and compute xy from z + window.xy

Store normal.xy and compute z = sqrt(1 - x^2 - y^2)
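
The two storage-for-computation options can be sketched as follows. This is only an illustration under assumptions the slides do not state: a symmetric perspective projection, view-space z stored as a positive distance along the view direction, and normals that face the viewer (z >= 0). The parameter names `tan_half_fov_x` and `tan_half_fov_y` are hypothetical.

```python
import math


def reconstruct_normal_z(nx, ny):
    """Recover z of a unit normal from its stored x and y.

    The sign of z is lost, so this assumes z >= 0.
    """
    return math.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))


def reconstruct_position(win_x, win_y, z, width, height,
                         tan_half_fov_x, tan_half_fov_y):
    """Recover view-space (x, y, z) from window coordinates and stored z."""
    # Map the pixel centre to normalized device coordinates in [-1, 1].
    ndc_x = 2.0 * (win_x + 0.5) / width - 1.0
    ndc_y = 2.0 * (win_y + 0.5) / height - 1.0
    # Under a perspective projection, x and y scale linearly with z.
    return (ndc_x * tan_half_fov_x * z, ndc_y * tan_half_fov_y * z, z)
```

Clamping the square-root argument guards against small negative values caused by limited storage precision.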

MRT rules

Up to 4 active render targets

All must have the same number of bits

You can mix RTs with different numbers of channels

For example, this is OK:
    RT0 = R32f
    RT1 = G16R16f
    RT2 = ARGB8

This won’t work:
    RT0 = G16R16f
    RT1 = A16R16G16B16f
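
The two rules above (at most 4 targets, all with the same total bit count) can be checked mechanically. The sketch below covers only the formats named on this slide; the table and function name are illustrative, not part of any graphics API.

```python
# Total bits per pixel for the formats named on the slide.
FORMAT_BITS = {
    "R32f": 32,           # 1 channel  x 32 bits
    "G16R16f": 32,        # 2 channels x 16 bits
    "ARGB8": 32,          # 4 channels x  8 bits
    "A16R16G16B16f": 64,  # 4 channels x 16 bits
}
MAX_RENDER_TARGETS = 4


def mrt_layout_ok(formats):
    """Return True if the render-target list satisfies the MRT rules."""
    if not 1 <= len(formats) <= MAX_RENDER_TARGETS:
        return False
    # Channel counts may differ, but total bit depth must match.
    return len({FORMAT_BITS[f] for f in formats}) == 1


print(mrt_layout_ok(["R32f", "G16R16f", "ARGB8"]))  # True
print(mrt_layout_ok(["G16R16f", "A16R16G16B16f"]))  # False
```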

Example MRT Layout

Three 16-bit Float MRTs

RT0   Position.x   Position.y   Position.z   Emissive
RT1   Diffuse.r    Diffuse.g    Diffuse.b    Specular
RT2   Normal.x     Normal.y     Normal.z     Free

16-bit float is overkill for Diffuse reflectance…

But we don’t have a choice due to MRT rules
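
As a rough CPU-side illustration of this layout, the sketch below packs one pixel's attributes into three targets of four 16-bit floats each, using Python's `struct` half-precision format code `e`. The function names are hypothetical; a real renderer would write these values from a pixel shader instead.

```python
import struct


def pack_texel(position, emissive, diffuse, specular, normal):
    """Pack one G-buffer texel into three 4 x fp16 render targets."""
    rt0 = struct.pack("<4e", *position, emissive)
    rt1 = struct.pack("<4e", *diffuse, specular)
    rt2 = struct.pack("<4e", *normal, 0.0)  # fourth channel is free
    return rt0, rt1, rt2


def unpack_rt(raw):
    """Unpack one render-target texel back into four floats."""
    return struct.unpack("<4e", raw)


rt0, rt1, rt2 = pack_texel((1000.3, 2.0, 5.0), 0.0,
                           (0.5, 0.25, 1.0), 0.2,
                           (0.0, 0.0, 1.0))
print(len(rt0), len(rt1), len(rt2))  # 8 8 8
print(unpack_rt(rt0)[0])             # 1000.5
```

Each target is 64 bits wide, so the layout satisfies the equal-bit-count rule. The round trip of 1000.3 to 1000.5 shows how coarse fp16 becomes for large position values.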

Computing Lighting

Render convex bounding geometry

Spot Light = Cone

Point Light = Sphere

Directional Light = Quad or box

Read G-Buffer

Compute radiance

Blend into frame buffer

Courtesy of Shawn Hargreaves, GDC 2004

Lots of optimizations possible

Clipping, occlusion query, Z-cull, stencil cull, etc.

Lighting Details

Blend contribution from each light into accumulation buffer

Keep diffuse and specular separate

For each light:
    diffuse += diffuse(G-buff.N, L)
    specular += G-buff.spec * specular(G-buff.N, G-buff.P, L)

A final full-screen pass modulates diffuse color:
    framebuffer = diffuse * G-buff.diffuse + specular
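
The accumulation loop and final modulation pass might look like the following for a single pixel. The slides do not name a BRDF, so this sketch assumes Lambert diffuse and Blinn-Phong specular, monochrome light intensity, no attenuation, and view space with the eye at the origin. The dictionary keys and the `shininess` parameter are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return v if length == 0.0 else tuple(x / length for x in v)


def shade_pixel(gbuf, lights, shininess=16.0):
    """Accumulate diffuse and specular separately, then composite."""
    n, p = gbuf["N"], gbuf["P"]
    v = normalize(sub((0.0, 0.0, 0.0), p))   # direction to the eye
    diffuse_acc = 0.0
    specular_acc = 0.0
    for light in lights:
        l = normalize(sub(light["pos"], p))  # direction to the light
        n_dot_l = max(0.0, dot(n, l))
        diffuse_acc += light["intensity"] * n_dot_l
        if n_dot_l > 0.0:
            h = normalize(tuple(a + b for a, b in zip(l, v)))
            n_dot_h = max(0.0, dot(n, h))
            specular_acc += gbuf["spec"] * light["intensity"] * n_dot_h ** shininess
    # Final full-screen pass: modulate diffuse light by the diffuse albedo.
    return tuple(diffuse_acc * c + specular_acc for c in gbuf["diffuse"])
```

Keeping the diffuse albedo out of the per-light loop means it is read and multiplied once per pixel rather than once per light.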

Options for accumulation buffer(s)

Precision

16-bit floating point enables HDR

Can use 8-bit for higher performance

Beware of saturation

Channels

RGBA if monochrome specular is enough

2 RGBA buffers if RGB diffuse and specular are both needed.

Small shader overhead for each RT written
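
The saturation warning can be illustrated with a small sketch that models additive blending into an 8-bit channel, which clamps at 255, against an unclamped floating-point sum. Both function names are illustrative.

```python
def accumulate_8bit(contributions):
    """Additive blending into an 8-bit channel: clamps at 255."""
    total = 0
    for c in contributions:
        total = min(255, total + round(c * 255))
    return total / 255.0


def accumulate_float(contributions):
    """Additive blending into a floating-point channel: no clamp."""
    return sum(contributions)


lights = [0.6, 0.6, 0.6]
print(accumulate_8bit(lights))   # 1.0, detail above white is lost
print(accumulate_float(lights))  # about 1.8, available for HDR tone mapping
```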

Lighting Optimization

Only want to shade surfaces inside light volume

Anything else is wasted work

[Figure: view frustum intersecting a light volume. Labels: “Inside light volume”; “Outside volume, but will be shaded, and lighting discarded!”; “Outside volume, will not be shaded”.]

Optimization: Stencil Cull

Two-pass algorithm, but first pass is very cheap

Rendering without color writes = 2x pixels per clock

1. Render light volume with color write disabled

Depth Func = LESS, Stencil Func = ALWAYS

Stencil Z-FAIL = REPLACE (with value X)

Rest of stencil ops set to KEEP

2. Render with lighting shader

Depth Func = ALWAYS, Stencil Func = EQUAL, all ops = KEEP, Stencil Ref = X

Unlit pixels will be culled because stencil will not match the reference value
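
The two passes can be simulated per pixel as below. This is a sketch under assumptions: smaller depth means closer to the viewer, and `volume_depth` holds one light-volume fragment depth per pixel (or `None` where the volume does not cover the pixel). The slides do not say which faces of the volume are drawn, so that choice is left to the caller.

```python
X = 1  # stencil reference value ("X" on the slide)


def stencil_pass(scene_depth, volume_depth):
    """Pass 1: color writes off, Depth Func = LESS, Z-FAIL = REPLACE."""
    stencil = [0] * len(scene_depth)
    for i, (scene, volume) in enumerate(zip(scene_depth, volume_depth)):
        if volume is None:
            continue            # light volume does not cover this pixel
        if not volume < scene:  # depth test (LESS) fails
            stencil[i] = X      # Z-FAIL op: REPLACE with X
    return stencil


def lighting_pass(stencil):
    """Pass 2: Stencil Func = EQUAL, Ref = X. Returns shaded pixel indices."""
    return [i for i, s in enumerate(stencil) if s == X]


scene = [0.9, 0.5, 0.9]
volume = [0.6, 0.6, None]
print(lighting_pass(stencil_pass(scene, volume)))  # [1]
```

Only pixel 1 runs the lighting shader: at pixel 0 the scene lies behind the volume fragment, and pixel 2 is not covered at all.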

Setting up Stencil Buffer

Only regions that fail depth test represent objects within the light volume

[Figure: view frustum and light volume. Label: “Only these bits shaded.”]

Shadows

Shadow maps work very well with deferred shading

Work trivially for directional and spot lights

Point (omni) lights are trickier…

Don’t forget to use NVIDIA hardware shadow maps

Render to shadow map at 2x pixels per clock

Shadow depth comparison in hardware

4-sample percentage-closer filtering in hardware

Very fast high-quality shadows!

May want to increase shadow bias based on pos.z

If using fp16 for G-buffer positions

Virtual Shadow Depth Cube Texture

Solution for point light shadows

Technique created by Will Newhall & Gary King

Unrolls a shadow cube map into a 2D depth texture

Pixel shader computes ST and depth from XYZ

G16R16 cubemap efficiently maps XYZ->ST

Free bilinear filtering offsets extra per-pixel work

More details in ShaderX3

Charles River Media, October 2004
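
The idea of mapping a light-to-pixel direction XYZ to a 2D coordinate ST plus a depth can be sketched arithmetically. Note that this is not the technique as described on the slide, which uses a G16R16 cubemap lookup for the XYZ->ST mapping; the 3x2 atlas layout, the face numbering, and the face orientations below are all hypothetical choices made for illustration.

```python
def cube_to_atlas(x, y, z):
    """Map a direction to (s, t, depth) in a hypothetical 3x2 face atlas."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0.0 and ay == 0.0 and az == 0.0:
        raise ValueError("direction must be non-zero")
    # Pick the face from the major axis; depth is measured along that axis.
    if ax >= ay and ax >= az:
        face, depth, u, v = (0 if x > 0 else 1), ax, y, z
    elif ay >= az:
        face, depth, u, v = (2 if y > 0 else 3), ay, x, z
    else:
        face, depth, u, v = (4 if z > 0 else 5), az, x, y
    # Project onto the face and remap from [-1, 1] to [0, 1].
    u = 0.5 * (u / depth + 1.0)
    v = 0.5 * (v / depth + 1.0)
    # Faces 0-2 occupy the first atlas row, faces 3-5 the second.
    s = (face % 3 + u) / 3.0
    t = (face // 3 + v) / 2.0
    return s, t, depth
```

The returned depth is what would be compared against the value stored in the 2D shadow depth texture at (s, t).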

Multiple Materials w/ Deferred Shading

Deferred shading doesn’t scale to multiple materials

Limited number of terms in G-buffer

Shader is tied to light source – 1 BRDF to rule them all

Options:

Re-render light multiple times, 1 for each BRDF

Loses much of deferred shading’s benefit

Store multiple BRDFs in light shader, choose per-pixel

Use that last free channel in G-buffer to store material ID

Reasonably coherent dynamic branching

Should work well on pixel shader 3.0 hardware
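
The per-pixel BRDF selection option amounts to a branch on the stored material ID. A minimal sketch follows; the two BRDFs, their IDs, and the exponent are hypothetical placeholders rather than anything specified in the slides.

```python
def lambert(n_dot_l, n_dot_h):
    """Diffuse-only material."""
    return n_dot_l


def glossy(n_dot_l, n_dot_h):
    """Diffuse plus a tight specular highlight."""
    return n_dot_l + n_dot_h ** 32


# Material ID, as stored in the free G-buffer channel, selects the BRDF.
BRDFS = {0: lambert, 1: glossy}


def shade(material_id, n_dot_l, n_dot_h):
    return BRDFS[material_id](n_dot_l, n_dot_h)


print(shade(0, 0.5, 1.0))  # 0.5
print(shade(1, 0.5, 1.0))  # 1.5
```

On a GPU this lookup becomes a dynamic branch, which stays cheap when neighboring pixels share the same material ID.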

Transparency

Deferred shading does not support transparency

Only shades nearest surfaces

Just draw transparent objects last

Can use depth peeling

Blend into final image, sort back-to-front as always

Use “normal” shading / lighting

Make sure you use the same depth buffer as the rest

Also draw particles and other blended effects last
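
Blending sorted transparent surfaces over the finished deferred image can be sketched for one pixel as below. It assumes standard alpha blending and that a larger depth value means farther from the viewer; the tuple layout is illustrative.

```python
def composite(opaque_color, transparents):
    """Blend transparent surfaces over the deferred result, back to front.

    transparents: list of (depth, color, alpha) tuples.
    """
    result = opaque_color
    # Farthest surface first, so nearer surfaces blend over it.
    for _, color, alpha in sorted(transparents, key=lambda s: s[0], reverse=True):
        result = tuple(alpha * c + (1.0 - alpha) * r
                       for c, r in zip(color, result))
    return result


surfaces = [(1.0, (1.0, 0.0, 0.0), 0.5),   # near, red
            (2.0, (0.0, 0.0, 1.0), 0.5)]   # far, blue
print(composite((0.0, 0.0, 0.0), surfaces))  # (0.5, 0.0, 0.25)
```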

Post-Processing

G-buffer + accum buffers can be used as input to many post-process effects

Glow

Auto-Exposure

Distortion

Edge-smoothing

Fog

Whatever else!

HDR

See HDR talk

Anti-Aliasing with Deferred Shading

Deferred shading is incompatible with MSAA

API doesn’t allow antialiased MRTs

But this is a small problem…

AA resolve has to happen after accumulation!

Resolve = process of combining multiple samples

G-Buffer cannot be resolved

What happens to an FP16 position when resolved?

Shadow Edge, Correct AA Resolve

[Figure: scene with viewer, occluder, and shadow receiver, showing an anti-aliased edge; legend marks the Shadow Test Depth.]

AA depths: 0.3, 0.7, 0.3, 0.7

Occluder Depth = 0.3

AA shadow: 0, 1, 0, 1

shadow = 0.5

Shadow Edge, G-Buffer Resolve

[Figure: scene with viewer, occluder, and shadow receiver, showing an anti-aliased edge; legend marks the Shadow Test Depth.]

AA depths: 0.3, 0.7, 0.3, 0.7

Pre-resolve depth = 0.5

Occluder depth = 0.3

Shadow = 1.0
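
The numbers on these two slides can be reproduced with a short sketch. It assumes a binary shadow test in which a sample is shadowed (1) when its depth is greater than the occluder depth; the function names are illustrative.

```python
def shadow_test(depth, occluder_depth):
    """1.0 if the sample lies behind the occluder, else 0.0."""
    return 1.0 if depth > occluder_depth else 0.0


def shade_then_resolve(samples, occluder_depth):
    """Correct order: test each AA sample, then average the results."""
    return sum(shadow_test(d, occluder_depth) for d in samples) / len(samples)


def resolve_then_shade(samples, occluder_depth):
    """G-buffer resolve: average the depths first, then test once."""
    resolved_depth = sum(samples) / len(samples)
    return shadow_test(resolved_depth, occluder_depth)


aa_depths = [0.3, 0.7, 0.3, 0.7]
print(shade_then_resolve(aa_depths, 0.3))  # 0.5
print(resolve_then_shade(aa_depths, 0.3))  # 1.0
```

Averaging the depths produces a position (0.5) that belongs to neither surface, so the edge pixel ends up fully shadowed instead of half shadowed.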

Other AA options?

Supersampling lighting is a costly option

    Lighting is typically the bottleneck, pixel shader bound

    4x supersampled lighting would be a big perf. hit

“Intelligent Blur”: only filter object edges

    Based on depths and normals of neighboring pixels

    Set “barrier” high, to avoid interior blurring

    Full-screen shader, but cheaper than SSAA
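
One way to read the "Intelligent Blur" idea is sketched below in Python on a tiny grayscale image. This is an assumption-laden illustration, not the filter from the talk: the edge measure (largest depth difference or 1 minus the normal dot product over the 4 neighbors), the box blur, and the `barrier` default are all hypothetical choices.

```python
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def edge_strength(depths, normals, x, y):
    """Largest depth or normal discontinuity between (x, y) and its 4 neighbors."""
    h, w = len(depths), len(depths[0])
    strength = 0.0
    for dx, dy in NEIGHBORS:
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h:
            depth_diff = abs(depths[y][x] - depths[ny][nx])
            normal_diff = 1.0 - dot(normals[y][x], normals[ny][nx])
            strength = max(strength, depth_diff, normal_diff)
    return strength


def intelligent_blur(color, depths, normals, barrier=0.5):
    """Blur only pixels whose edge strength exceeds the barrier."""
    h, w = len(color), len(color[0])
    out = [row[:] for row in color]
    for y in range(h):
        for x in range(w):
            if edge_strength(depths, normals, x, y) <= barrier:
                continue  # interior pixel: leave it untouched
            samples = [color[y][x]]
            for dx, dy in NEIGHBORS:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    samples.append(color[ny][nx])  # read from the input image
            out[y][x] = sum(samples) / len(samples)
    return out


up = (0.0, 0.0, 1.0)
color = [[1.0, 1.0, 0.0, 0.0]]
depths = [[0.2, 0.2, 0.9, 0.9]]      # depth discontinuity between x=1 and x=2
normals = [[up, up, up, up]]
blurred = intelligent_blur(color, depths, normals)
print(blurred)
```

Only the two pixels on either side of the depth step are filtered; the outer pixels keep their original values, which is the point of setting the barrier high.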

Should I use Deferred Shading?

This is an ESSENTIAL question

Deferred shading is not always a win

    One major title has already scrapped it!

    Another came close

Many tradeoffs

    AA is problematic

    Some scenes work well, others very poorly

The benefit will depend on your application

    Game design

    Level design

When is Deferred Shading A Win?

Not when you have many directional lights

    Shading complexity will be O(R*L), R = screen res., L = number of lights

    Outdoor daytime scenes probably not a good case

Better when you have lots of local lights

    Ideal case is non-overlapping lights

    Shading complexity O(R)

    Nighttime scenes with many dynamic lights!

In any case, make sure G-Buffer pass is cheap
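
The two complexity figures above can be made concrete with a rough count of lit pixels. This Python sketch is only a back-of-the-envelope model: the 1024x768 resolution, the light count, and the function names are hypothetical, and it ignores the cost of the G-buffer pass itself.

```python
def directional_cost(screen_pixels, num_lights):
    """Every directional light touches every pixel: O(R * L)."""
    return screen_pixels * num_lights


def local_light_cost(light_pixel_areas):
    """Each local light shades only the pixels its screen-space bounds cover."""
    return sum(light_pixel_areas)


R = 1024 * 768                 # assumed screen resolution in pixels
L = 8                          # assumed number of lights

# Ideal case from the slide: 8 non-overlapping local lights tiling the screen.
areas = [R // L] * L

print(directional_cost(R, L))   # 6291456 lit pixels
print(local_light_cost(areas))  # 786432 lit pixels, i.e. R
```

With overlapping local lights the second figure grows toward the first, which is why the non-overlapping case is described as ideal.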

Gosh, what about z-cull & SM3.0?

Isn’t the goal of z-cull to achieve deferred shading?

    Do an initial front-to-back-sorted z-only pass.

    Then you will shade only visible surfaces anyway!

Shader Model 3.0 allows “uber shaders”

    Iterate over multiple lights of different types in “traditional” (non-deferred) shading

Combine these, and performance could be as good as (or better than?) deferred shading!

    More tests needed
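
The effect of the z-only pass can be illustrated by counting shader invocations per pixel. This is a simplified Python model under stated assumptions: fragments are plain depth values, smaller means nearer, and the function names and test data are hypothetical. It says nothing about how z-cull is implemented in hardware.

```python
def shaded_without_prepass(fragments_per_pixel):
    """Count shader runs when fragments arrive in the given order with a LESS z-test."""
    runs = 0
    for fragments in fragments_per_pixel:
        nearest = float("inf")
        for depth in fragments:
            if depth < nearest:      # passes the z-test, so it gets shaded
                nearest = depth
                runs += 1
    return runs


def shaded_with_prepass(fragments_per_pixel):
    """A z-only pass stores the nearest depth; the shading pass keeps only matches."""
    runs = 0
    for fragments in fragments_per_pixel:
        if not fragments:
            continue
        nearest = min(fragments)     # depth left behind by the z-only pass
        runs += sum(1 for depth in fragments if depth == nearest)
    return runs


# Three pixels; fragments listed back to front (worst case without a prepass).
pixels = [[0.9, 0.5, 0.2], [0.4], [0.8, 0.6]]

print(shaded_without_prepass(pixels))  # 6
print(shaded_with_prepass(pixels))     # 3
```

In this model the prepass brings expensive shading down to one invocation per covered pixel, the same property deferred shading relies on.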

We don’t have all the answers

We can’t tell you to use it or not

    Experimentation and analysis is important

    Depends on your application

Need to have a fallback anyway

Sorry to end it this way, but…

MORE RESEARCH IS NEEDED! PLEASE SHARE YOUR FINDINGS!

(you can bet we’ll share ours)

Questions?

http://developer.nvidia.com

mharris@nvidia.com

GeForce 6800 Guidance (1 of 6)

Allocate render targets FIRST

    Deferred Shading uses many RTs

    Allocating them first ensures they are in fastest RAM

Keep MRT usage to 3 or fewer render targets

    Performance cliff at 4 on GeForce 6800

    Each additional RT adds shader overhead

    Don’t render to all RTs if surface doesn’t need them

        e.g. Sky Dome doesn’t need normals or position

GeForce 6800 Guidance (2 of 6)

Use aniso filtering during G-buffer pass

    Will help image quality on parts of image that don’t benefit from “edge smoothing AA”

    Only on textures that need it!

Take advantage of early Z- and Stencil culling

    Don’t switch z-test direction mid-frame

    Avoid frequent stencil reference / op changes

GeForce 6800 Guidance (3 of 6)

Use hardware shadow mapping (“UltraShadow”)

    Use D16 or D24X8 format for shadow maps

    Bind 8-bit color RT, disable color writes on updates

    Use tex2Dproj to get hardware shadow comparison

    Enable bilinear filtering to get 4-sample PCF
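
For readers unfamiliar with 4-sample PCF (percentage-closer filtering), the general idea is to compare the receiver depth against each of the four nearest shadow-map texels and then blend the comparison results bilinearly, rather than blending the depths. The Python sketch below illustrates that idea only. The function name, the texel-unit coordinates, and the clamp addressing are assumptions, and it is not a description of what the hardware does internally.

```python
import math


def pcf_bilinear(shadow_map, u, v, receiver_depth):
    """Return the lit fraction (1.0 = fully lit, 0.0 = fully shadowed).

    u, v are in texel units, with texel centers at integer coordinates.
    """
    h, w = len(shadow_map), len(shadow_map[0])
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    fx, fy = u - x0, v - y0

    def lit(x, y):
        x = min(max(x, 0), w - 1)   # clamp addressing at the borders
        y = min(max(y, 0), h - 1)
        return 1.0 if receiver_depth <= shadow_map[y][x] else 0.0

    # Compare first, then filter the four comparison results.
    top = lit(x0, y0) * (1 - fx) + lit(x0 + 1, y0) * fx
    bottom = lit(x0, y0 + 1) * (1 - fx) + lit(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


# Left column holds an occluder at depth 0.3; right column is empty (depth 1.0).
shadow_map = [[0.3, 1.0],
              [0.3, 1.0]]

print(pcf_bilinear(shadow_map, 0.0, 0.0, 0.5))   # 0.0
print(pcf_bilinear(shadow_map, 0.5, 0.5, 0.5))   # 0.5
print(pcf_bilinear(shadow_map, 1.0, 0.0, 0.5))   # 1.0
```

The lit fraction ramps smoothly across the shadow boundary, which is what softens the shadow edge.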

GeForce 6800 Guidance (4 of 6)

Use fp16 filtering and blending

    Fp16 textures are fully orthogonal!

    No need to “ping-pong” to accumulate light sources

Use the lowest precision possible

    Lower-precision textures improve cache coherence, reduce bandwidth

    Use half data type in shaders
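
The bandwidth argument is easy to quantify. The Python sketch below computes the G-buffer footprint for three four-channel render targets at different precisions. The 1024x768 resolution, the target count, and the format labels are illustrative assumptions, not figures from the talk.

```python
BYTES_PER_CHANNEL = {"fp32": 4, "fp16": 2, "unorm8": 1}


def gbuffer_bytes(width, height, num_targets, fmt, channels=4):
    """Total bytes for num_targets render targets of the given format."""
    return width * height * num_targets * channels * BYTES_PER_CHANNEL[fmt]


for fmt in BYTES_PER_CHANNEL:
    mib = gbuffer_bytes(1024, 768, 3, fmt) / (1024 * 1024)
    print(f"{fmt}: {mib:.1f} MiB")
# fp32: 36.0 MiB
# fp16: 18.0 MiB
# unorm8: 9.0 MiB
```

Every lighting pass reads these buffers again, so halving the precision roughly halves the bytes fetched per lit pixel.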

GeForce 6800 Guidance (5 of 6)

Use write masks to tell optimizer sizes of operands

    Can schedule multiple instructions per cycle

        Two simultaneous 2-component ops, or

        One 3-component op + 1 scalar op

Without write masks, compiler must be conservative

GeForce 6800 Guidance (6 of 6)

Use fp16 normalize()

    Compiles to single-cycle nrmh instruction

    Only applies to half3, so:

half3 n = normalize(tex2D(normalmap, coords).xyz); // fast

float3 n = normalize(tex2D(normalmap, coords).xyz); // slow

half4 n = normalize(tex2D(normalmap, coords)); // slow

Example Attribute Layout

Normal: x, y, z

Position: x, y, z

Diffuse Reflectance: RGB

Specular Reflectance (“Gloss Map”, single channel)

Emissive (single channel)

One free channel

    Ideas on this later

Your application will dictate
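
Counting channels shows why this layout leaves exactly one channel free: the listed attributes add up to 11 values, and three four-channel render targets provide 12. The Python sketch below checks that arithmetic. The particular assignment of attributes to targets (names `RT0` to `RT2`, each 3-vector paired with one scalar) is a hypothetical packing for illustration, not one specified in the talk.

```python
import math

# Attribute sizes as listed on the slide, plus the free channel.
ATTRIBUTES = {
    "normal": 3,
    "position": 3,
    "diffuse": 3,
    "specular": 1,
    "emissive": 1,
    "free": 1,
}

# Hypothetical packing: each 3-vector shares a target with one scalar.
LAYOUT = {
    "RT0": ["normal.x", "normal.y", "normal.z", "specular"],
    "RT1": ["position.x", "position.y", "position.z", "emissive"],
    "RT2": ["diffuse.r", "diffuse.g", "diffuse.b", "free"],
}

total_channels = sum(ATTRIBUTES.values())
targets_needed = math.ceil(total_channels / 4)

print(total_channels)   # 12
print(targets_needed)   # 3
```

Three targets is also the limit suggested by the MRT guidance for GeForce 6800, so adding a thirteenth channel would cross the performance cliff mentioned there.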