
UE4 Rendering

Nick Penwarden
Priorities
Parallel Rendering
Next-gen Console Performance
Features
Baseline Performance
Mobile Rendering
Parallel Rendering
What have we done so far?
Progress on stateless RHI: resource tables + command lists
Works on all platforms except OpenGL, which needs some more work
When will we do the first merge back to main?
As soon as OpenGL is running well; some cross-compiler work is still needed
Gradual push towards a more and more stateless RHI; it is not going to be an abrupt switch
Want to make the transition for licensees gradual and manageable
Parallel Rendering
Want to make the transition as easy for licensees as possible
Incremental merges
Keep the rendering loop looking as single-threaded as possible
Current code will look largely unchanged
Queue up command lists, kick off tasks to populate them
After queuing up all work, block and submit command lists in program order
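The queue, populate, then submit pattern above can be sketched in standard C++. This is a minimal sketch, not the engine's code: `FakeCommandList` and `RenderFrame` are hypothetical names, `std::async` stands in for the engine's task system, and "submission" simply concatenates the recorded commands.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for an RHI command list: it only records command names.
struct FakeCommandList {
    std::vector<std::string> Commands;
};

// Render-thread side. The loop still reads as single-threaded: one command list per
// pass, a task fills each list, and submission happens afterwards in program order.
std::vector<std::string> RenderFrame(
    const std::vector<std::function<void(FakeCommandList&)>>& Passes)
{
    std::vector<FakeCommandList> Lists(Passes.size());
    std::vector<std::future<void>> Tasks;
    Tasks.reserve(Passes.size());

    // 1. Queue up command lists and kick off tasks to populate them.
    for (std::size_t i = 0; i < Passes.size(); ++i) {
        Tasks.push_back(std::async(std::launch::async,
                                   [&Passes, &Lists, i] { Passes[i](Lists[i]); }));
    }

    // 2. Block until every task has finished recording (rethrows task exceptions).
    for (auto& Task : Tasks) {
        Task.get();
    }

    // 3. "Submit" by list index, i.e. program order, not completion order.
    std::vector<std::string> Submitted;
    for (const auto& List : Lists) {
        Submitted.insert(Submitted.end(), List.Commands.begin(), List.Commands.end());
    }
    return Submitted;
}
```

Because the lists are walked by index rather than by completion time, the submitted order matches program order even when a later task finishes first.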
Resource Tables
Does for resources what constant buffers did for shader constants
Manage/cache resources at the frequency they change: per-material, per-primitive, per-view, etc.
Less engine work per resource on the render thread
Helps reduce the size of command lists (less state to track)
RHI has more freedom to implement them efficiently
DX12 and PS4 have native support; OpenGL can use bindless textures
Efficient simulation on D3D11, older OpenGL
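To illustrate why managing resources at the frequency they change means less state to track, here is a hedged sketch that counts bind commands for a few draws. `ResourceTable`, `Draw` and both counting functions are hypothetical; the per-resource baseline deliberately re-sets every resource on every draw, while the table path compares one pointer per frequency (view, material, primitive).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical resource table: resources grouped by how often they change.
struct ResourceTable {
    std::size_t NumResources; // textures, samplers, buffers bundled together
};

// A draw references one table per update frequency.
struct Draw {
    const ResourceTable* View;
    const ResourceTable* Material;
    const ResourceTable* Primitive;
};

// Baseline: every resource is set individually on every draw.
std::size_t CountPerResourceBinds(const std::vector<Draw>& Draws) {
    std::size_t Binds = 0;
    for (const Draw& D : Draws) {
        Binds += D.View->NumResources + D.Material->NumResources + D.Primitive->NumResources;
    }
    return Binds;
}

// Tables: one bind per slot, and only when that slot's table actually changes.
std::size_t CountTableBinds(const std::vector<Draw>& Draws) {
    const ResourceTable* Bound[3] = {nullptr, nullptr, nullptr};
    std::size_t Binds = 0;
    for (const Draw& D : Draws) {
        const ResourceTable* Wanted[3] = {D.View, D.Material, D.Primitive};
        for (int Slot = 0; Slot < 3; ++Slot) {
            if (Bound[Slot] != Wanted[Slot]) {
                Bound[Slot] = Wanted[Slot];
                ++Binds;
            }
        }
    }
    return Binds;
}
```

For example, four draws that share one view table (4 resources) and two material tables (3 resources each), each with its own 2-resource primitive table, need 36 per-resource binds but only 7 table binds.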
Resource Tables
/** The uniform shader parameters associated with precomputed lighting. */
BEGIN_UNIFORM_BUFFER_STRUCT(FPrecomputedLightingUniforms,ENGINE_API)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER(FVector4,LightMapCoordinateScaleBias)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER(FVector4,ShadowMapCoordinateScaleBias)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER_ARRAY_EX(FVector4,LightMapScale,[MAX_LIGHTMAP_COEF],EShaderPrecisionModifier::Half)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER_ARRAY_EX(FVector4,LightMapAdd,[MAX_LIGHTMAP_COEF],EShaderPrecisionModifier::Half)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER(FVector4,StaticShadowMapMasks)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER(FVector4,DistanceFieldParameters)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER_TEXTURE(Texture2D,LightMapTexture)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER_SAMPLER(SamplerState,LightMapSampler)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER_TEXTURE(Texture2D,SkyOcclusionTexture)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER_SAMPLER(SamplerState,SkyOcclusionSampler)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER_TEXTURE(Texture2D,StaticShadowTexture)
DECLARE_UNIFORM_BUFFER_STRUCT_MEMBER_SAMPLER(SamplerState,StaticShadowTextureSampler)
END_UNIFORM_BUFFER_STRUCT(FPrecomputedLightingUniforms)
Example: move all precomputed lighting parameters into a uniform buffer
Now we can get rid of all the logic from our lightmap policies
Allows us to unify draw lists -- no more static typing!
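One way to picture "no more static typing": instead of a draw list templated on a lightmap policy, each element carries a reference to its lighting uniform buffer, so meshes with and without precomputed lighting can share one list. This sketch rests on assumptions: `PrecomputedLightingUniforms` mirrors only a few members of `FPrecomputedLightingUniforms`, `std::shared_ptr` stands in for the engine's uniform buffer reference, and `UnifiedDrawList` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical CPU-side mirror of a few FPrecomputedLightingUniforms members.
struct PrecomputedLightingUniforms {
    float LightMapCoordinateScaleBias[4];
    float ShadowMapCoordinateScaleBias[4];
    int LightMapTexture; // stand-in handle for the lightmap texture
};

// Before (hypothetical, for contrast): one list type per lightmap policy, e.g.
//   template <typename LightMapPolicy> class PerPolicyDrawList;
// with each policy setting its own shader parameters.

// After: one non-templated list; lighting data travels as a buffer reference.
struct DrawElement {
    int MeshId;
    std::shared_ptr<const PrecomputedLightingUniforms> Lighting; // null: no precomputed lighting
};

class UnifiedDrawList {
public:
    void Add(DrawElement Element) { Elements.push_back(std::move(Element)); }
    std::size_t Size() const { return Elements.size(); }

    // Walks the list in order. Lighting data is bound through a pointer comparison,
    // with no per-policy code path. Returns the number of uniform buffer binds.
    int Submit() const {
        const PrecomputedLightingUniforms* Bound = nullptr;
        int Binds = 0;
        for (const DrawElement& Element : Elements) {
            if (Element.Lighting && Element.Lighting.get() != Bound) {
                Bound = Element.Lighting.get();
                ++Binds; // stand-in for binding the uniform buffer
            }
        }
        return Binds;
    }

private:
    std::vector<DrawElement> Elements;
};
```

Elements that share a lighting buffer also skip redundant binds for free, since the list only compares buffer pointers.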
RHI Command Lists
All drawing commands buffered at the RHI level
CmdList->SetTexture()
CmdList->DrawIndexedPrimitive()
Etc.
Current implementation is straightforward; there is room to optimize the format
Clearly separates costs and decouples engine work from RHI/driver work
Decoupling is the key here: we can parallelize engine work even on platforms without parallel submission
Will still leverage parallel submission on platforms that support it
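A minimal sketch of the record-then-replay idea, assuming a `std::variant` command format (the slides do not describe the real format). `CommandList`, `FakeDevice` and `Execute` are hypothetical; the device only logs the calls it receives, standing in for the RHI and driver.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical command payloads, loosely modeled on SetTexture/DrawIndexedPrimitive.
struct CmdSetTexture { int Slot; int Texture; };
struct CmdDrawIndexed { int NumPrimitives; };
using Command = std::variant<CmdSetTexture, CmdDrawIndexed>;

// Engine side: recording only appends to memory, so it can run on any thread.
class CommandList {
public:
    void SetTexture(int Slot, int Texture) { Cmds.push_back(CmdSetTexture{Slot, Texture}); }
    void DrawIndexedPrimitive(int NumPrimitives) { Cmds.push_back(CmdDrawIndexed{NumPrimitives}); }
    const std::vector<Command>& Commands() const { return Cmds; }

private:
    std::vector<Command> Cmds;
};

// RHI side: a stand-in "driver" that logs the calls it receives.
struct FakeDevice {
    std::vector<std::string> Log;
    void SetTexture(int Slot, int Texture) {
        Log.push_back("SetTexture " + std::to_string(Slot) + " " + std::to_string(Texture));
    }
    void DrawIndexedPrimitive(int NumPrimitives) {
        Log.push_back("Draw " + std::to_string(NumPrimitives));
    }
};

// Replays the buffered commands, in recorded order, against the device.
void Execute(const CommandList& List, FakeDevice& Device) {
    for (const Command& C : List.Commands()) {
        if (const auto* Set = std::get_if<CmdSetTexture>(&C)) {
            Device.SetTexture(Set->Slot, Set->Texture);
        } else if (const auto* Draw = std::get_if<CmdDrawIndexed>(&C)) {
            Device.DrawIndexedPrimitive(Draw->NumPrimitives);
        }
    }
}
```

Nothing reaches the device until `Execute` runs, which is the decoupling described above: recording can happen on worker threads, and replay happens wherever the platform requires.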

RHI Command Lists
Doesn't this just add overhead?
~10% faster single-threaded on D3D11, ~5% on XB1 and PS4
Timings from one (large) static draw list in an Infiltrator scene in D3D11:
14.2ms -> 12.75ms w/ command lists
8.6ms to buffer RHI commands (engine work, parallelizable in a platform-independent manner)
+0.65ms to execute command list w/o calling D3D
Command list size: 1.81MB; improvements to the format and to resource tables will help here!
+3.5ms to execute D3D calls (RHI + driver)
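The D3D11 timings above add up as follows. The last line is an idealized bound, assuming the 8.6 ms of buffering splits perfectly across N threads while replay and D3D calls (0.65 + 3.5 = 4.15 ms) stay serial, as on a platform without parallel submission.

```latex
\begin{aligned}
T_{\text{cmdlist}} &= \underbrace{8.6}_{\text{buffer}} + \underbrace{0.65}_{\text{replay}} + \underbrace{3.5}_{\text{D3D}} = 12.75\ \text{ms} \\
\frac{14.2 - 12.75}{14.2} &= \frac{1.45}{14.2} \approx 0.10 \quad (\text{the} \sim 10\%\ \text{single-threaded gain}) \\
\frac{8.6}{12.75} &\approx 0.67 \quad (\text{share of the frame that is parallelizable engine work}) \\
\lim_{N \to \infty} \frac{12.75}{4.15 + 8.6/N} &= \frac{12.75}{4.15} \approx 3.1
\end{aligned}
```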

Next Steps: Parallel Rendering
Get OpenGL working well, merge to main
Refactor DrawDynamicElements so it is buffered just like static lists
Distribute command list generation to multiple threads
Rework resource lifetime management a bit to reduce the amount of refcounting that happens during a frame
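The refcounting item can be illustrated with a frame-scoped holder that takes one reference per resource per frame, so each buffered command stores a plain pointer instead of incrementing a refcount. This is a hypothetical sketch: `std::shared_ptr` stands in for the engine's refcounted resource handles, `FrameResourceHolder` and `DrawCmd` are invented names, and it assumes `EndFrame()` is only called after the frame's commands have executed.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>
#include <vector>

struct Texture { int Id; };

// Hypothetical per-frame keep-alive: one reference per resource per frame.
class FrameResourceHolder {
public:
    // Returns a plain pointer that stays valid until EndFrame().
    // Only the first use of a resource in a frame touches its refcount.
    const Texture* Use(const std::shared_ptr<Texture>& Tex) {
        if (Seen.insert(Tex.get()).second) {
            Held.push_back(Tex);
        }
        return Tex.get();
    }

    // Assumed to run once the frame's commands have executed.
    void EndFrame() {
        Held.clear();
        Seen.clear();
    }

private:
    std::unordered_set<const Texture*> Seen;
    std::vector<std::shared_ptr<Texture>> Held;
};

// Buffered commands store plain pointers, so copying them never touches a refcount.
struct DrawCmd {
    const Texture* Tex;
};
```

After recording 1000 commands that reference the same texture, its `use_count()` is 2 (the caller's pointer plus one frame reference) rather than the 1001 that one `shared_ptr` per command would produce.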
Next-gen Console Performance
Obviously, parallel rendering is our major effort on the CPU
Postprocessing is a big target for us on GPU:
Improved algorithms
Micro optimizations
Move passes to compute where it makes sense
Combining passes where it makes sense
Better ESRAM utilization on XB1
Also looking at g-buffer reorganization
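Combining passes pays off because the intermediate image no longer makes a round trip through memory. The CPU-side sketch below shows the idea with two hypothetical per-pixel operations (`Exposure`, `Clamp01`) and counts pixel reads and writes; on a GPU the saving would be the bandwidth of the intermediate render target, which is an assumption about typical post-process passes rather than a measured result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-pixel operations standing in for two post-process passes.
inline float Exposure(float C) { return C * 2.0f; }
inline float Clamp01(float C) { return C < 0.0f ? 0.0f : (C > 1.0f ? 1.0f : C); }

// Counts pixel reads and writes against "memory" (the input/output vectors).
struct Traffic {
    std::size_t Reads = 0;
    std::size_t Writes = 0;
};

// Two passes: the intermediate image is written out, then read back in.
std::vector<float> TwoPasses(const std::vector<float>& In, Traffic& T) {
    std::vector<float> Mid(In.size()), Out(In.size());
    for (std::size_t i = 0; i < In.size(); ++i) { Mid[i] = Exposure(In[i]); ++T.Reads; ++T.Writes; }
    for (std::size_t i = 0; i < In.size(); ++i) { Out[i] = Clamp01(Mid[i]); ++T.Reads; ++T.Writes; }
    return Out;
}

// One combined pass: same math, no intermediate round trip.
std::vector<float> CombinedPass(const std::vector<float>& In, Traffic& T) {
    std::vector<float> Out(In.size());
    for (std::size_t i = 0; i < In.size(); ++i) { Out[i] = Clamp01(Exposure(In[i])); ++T.Reads; ++T.Writes; }
    return Out;
}
```

For a 4-pixel image the two-pass version performs 8 reads and 8 writes, the combined pass 4 and 4, with identical output.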
Features
Plan to round out our feature set:
Translucent surface lighting
Skin shading improvements
Separate textures + samplers, increase limit to 32
LPV improvements (Lionhead is doing great work here; we have artists giving us feedback internally)
Sky lighting
First pass with static occlusion available now
Working on fully dynamic occlusion now
Big feature, researchy/risky, but adds a lot to a scene!
