
Efficient Rendering with Tile Local Storage

Marius Bjørge, ARM, Trondheim, Norway, marius.bjorge@arm.com
Sam Martin, Geomerics, Cambridge, UK, sam.martin@geomerics.com
Sandeep Kakarlapudi, ARM, Trondheim, Norway, sandeep.kakarlapudi@arm.com
Jan-Harald Fredriksen, ARM, Trondheim, Norway, jan-harald.fredriksen@arm.com

1. Introduction

The most significant difference between mobile GPUs and their desktop equivalents is the availability of sustained memory bandwidth. Without a revolution in mobile memory bandwidth mobile graphics must be tailored to work efficiently in its bandwidth-scarce environment. This is true at all levels of the hardware-software stack. We showed previously that deferred rendering could be made bandwidth efficient by exploiting the on-chip memory used to store tile framebuffer contents in many tile-based GPUs [Martin et al, 2013]. We refer to this memory as tile local storage. In this talk, we build upon this research to demonstrate the versatility and effectiveness of tile local storage with real world content. We show how key rendering challenges can be met efficiently by use of tile local storage, and present an updated extension that has cross-vendor support.

2. Practical lighting and shading

In our previous work lighting with tile local storage was shown to be bandwidth efficient compared to traditional deferred lighting. We explore this further by investigating performance in a representative graphics pipeline on consumer devices. We take existing content from the “Transporter” demo, built in a commercial game engine, and demonstrate how physically-based shading, deferred lighting and order-independent transparency can be combined into a practical graphics pipeline. We describe a new tile local storage OpenGL ES 3.0 extension for fragment shading that has cross-vendor support and highlight the advances made from the initial draft extension. We compare the extension to multiple render target framebuffer read back and an alternative approach for tile-based light culling when tile local storage is not available.

3. Order-independent transparency

Efficient order-independent transparency (OIT) is a long standing challenge. Traditional alpha blending requires per-fragment sorting to correctly render overlapping surfaces which does not admit a fixed bound on per-fragment storage. Sorted alpha blending is a good approximation for thin surfaces, such as glass, but overlapping surfaces can also be used to approximate volumetric effects, such as smoke. In these circumstances, sorted alpha blending is not prescribed by the underlying model and alternative blending operations should be considered. We create two variations of existing approximate OIT techniques that enforce bounded storage [McGuire et al, 2013][Salvi et al, 2014]. We show how each can be adapted to use tile local storage, reducing memory bandwidth and increasing performance.

For viewer-facing particle systems, sorting of triangles by either the CPU or GPU is sufficient. In real-time games, where the cost of sorting may be significant, sorting into a fixed number of depth ‘buckets’ can offer a useful performance-quality trade-off. We investigate using tile local storage to store a fixed number of buckets and defer sorting until fragment shading. This simple approach has good performance but limited depth accuracy. We present a variation that uses an additional per-fragment “depth mask” which allows the buckets to adapt to the distribution of the depth values. This trades some performance in favor of quality.

Figure 1. “Transporter” mobile graphics demo

4. Deferred virtual texturing

Virtual texturing encodes multiple textures into a single, large, virtual texture and uses a page table mechanism to provide a means of updating regions. We show how tile local storage can be used to implement a deferred virtual texturing system. By deferring all texturing to a full screen resolve pass we ensure only visible fragments perform texture fetches, significantly reducing texture bandwidth. This more unified approach to state and texture handling provides further opportunities for savings on the CPU by reducing the number of draw calls required.

5. Tile local rendering pipelines

A uniquely expressive property of tile local storage is the persistence across draw calls and shader program changes within the lifetime of the framebuffer. This persistence allows the results of a shader program to be passed to the subsequent one. We show how the techniques described can be combined into a single tile local rendering pipeline, eliminating main memory transfers.

References

MARTIN, S., BJØRGE, M., KAKARLAPUDI, S., FREDRIKSEN, J. 2013. Challenges with High Quality Mobile Graphics. ACM SIGGRAPH Mobile 2013.

MCGUIRE, M., BAVOIL, L. 2013. Weighted Blended Order-Independent Transparency. Journal of Computer Graphics Techniques (JCGT), vol. 2, no. 2, 122-141.

SALVI, M., VAIDYANATHAN, K. 2014. Multi-Layer Alpha Blending. Symposium on Interactive 3D Graphics and Games.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
SIGGRAPH 2014, August 10 – 14, 2014, Vancouver, British Columbia, Canada.
2014 Copyright held by the Owner/Author.
ACM 978-1-4503-2960-6/14/08
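
Illustrative sketch for Section 3. The fixed depth-bucket idea can be made concrete with a small CPU-side model. This is a minimal sketch under stated assumptions, not the authors' shader code: the bucket count, the uniform split of the [0, 1] depth range, the per-bucket accumulation rule, and all names (NUM_BUCKETS, PixelBuckets, bucket_index, accumulate, resolve) are hypothetical. On a tile-based GPU the per-pixel bucket data would sit in tile local storage rather than in a Python object, and the adaptive "depth mask" variation is not modelled here.

```python
from dataclasses import dataclass, field

NUM_BUCKETS = 4  # assumption: a small fixed count keeps per-pixel storage bounded


@dataclass
class PixelBuckets:
    """Per-pixel state: one accumulator per depth bucket (hypothetical layout)."""
    # Sum of alpha-premultiplied RGB of the fragments that fell into each bucket.
    color: list = field(
        default_factory=lambda: [[0.0, 0.0, 0.0] for _ in range(NUM_BUCKETS)]
    )
    # Product of (1 - alpha) of the fragments that fell into each bucket.
    transmittance: list = field(default_factory=lambda: [1.0] * NUM_BUCKETS)


def bucket_index(depth, near=0.0, far=1.0):
    """Map a depth value to a bucket using a uniform split of [near, far]."""
    t = (depth - near) / (far - near)
    return min(NUM_BUCKETS - 1, max(0, int(t * NUM_BUCKETS)))


def accumulate(pixel, rgb, alpha, depth):
    """Add one transparent fragment; fragments may arrive in any order."""
    b = bucket_index(depth)
    for c in range(3):
        pixel.color[b][c] += rgb[c] * alpha
    pixel.transmittance[b] *= 1.0 - alpha


def resolve(pixel, background):
    """Composite the buckets front to back over an opaque background colour."""
    out = [0.0, 0.0, 0.0]
    remaining = 1.0  # fraction of light still passing through the nearer buckets
    for b in range(NUM_BUCKETS):
        for c in range(3):
            out[c] += remaining * pixel.color[b][c]
        remaining *= pixel.transmittance[b]
    return [out[c] + remaining * background[c] for c in range(3)]


def shade(fragments, background=(0.0, 0.0, 0.0)):
    """Convenience wrapper: accumulate every fragment, then resolve the pixel."""
    pixel = PixelBuckets()
    for rgb, alpha, depth in fragments:
        accumulate(pixel, rgb, alpha, depth)
    return resolve(pixel, background)
```

In this model, fragments that land in different buckets are composited in the correct depth order regardless of submission order. Fragments that share a bucket are merged without regard to their relative depth, which is one way to picture the limited depth accuracy mentioned in Section 3.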
