How Does a Mobile GPU Work?
Comic: Ikaridon Yu
Although the mobile and PC GPUs do the same job, porting directly from the PC is not a good idea!
... powerful ... directly now.
Dr. Arm, the mobile GPU expert.
I help developers by answering all sorts of questions about Mali GPUs.
Although the mobile GPU uses the same API as the PC GPU, the architectures of the two GPUs are quite different.
Hmm... but I still can't imagine the difference, even though you said that porting directly is not a good idea.
Not a problem. Let me take you inside a GPU.
Here is the inside of a PC GPU.
Wow! That is a huge power station.
The PC GPU doesn't need to worry about power consumption.
See the big warehouse over there? That's the video memory. All data that needs to be rendered is stored inside it.
There are many conveyors beside the warehouse. Where are they going?
They are connected to the GPU core, which is the factory over there.
There are so many porters carrying data over the conveyors!
Porters carry data to the GPU core whenever the GPU needs to render something. The PC GPU can run so many conveyors because it has far more energy available. That's why a PC GPU can transfer huge amounts of data at high speeds.
Next, let's go and see the inside of a GPU core.
There are two masters inside a GPU core: the Master Vertex shader and the Master Fragment shader. You can see there is a conveyor between them.
This one is the Master Vertex shader. All triangles that need to be rendered must be processed by him first.
This one is the Master Fragment shader. The Master Vertex shader sends transformed triangles to her, and she processes the triangles to generate many fragments.
Looks like an artist...
After that, the porters carry those fragments back to the video memory so the final image can be displayed on the screen.
So labor intensive!
Triangles are processed one by one, and a lot of fragments need to be moved back to the video memory after each triangle is processed. That's why the PC GPU needs so many conveyors.
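The PC-style flow just described (triangles shaded one by one, with fragments going back to video memory after each triangle) can be sketched as a toy traffic counter. This is a simplified model under stated assumptions, not how any real GPU is built: it assumes 4 bytes of color per pixel, ignores depth, caches and compression, and charges one read and one write of video memory for every fragment. The function names are hypothetical.

```python
# Toy model of an immediate-mode (PC-style) GPU. Assumptions (illustrative only):
# 4 bytes of color per pixel, no depth buffer, no caches, and every fragment
# reads and writes the framebuffer in video memory.
BYTES_PER_PIXEL = 4


def edge(ax, ay, bx, by, px, py):
    """Edge function: its sign tells which side of edge a->b the point (px, py) is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered_pixels(tri, width, height):
    """Yield (x, y) for every pixel whose center lies inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Accept both clockwise and counter-clockwise triangles.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield x, y


def immediate_mode_traffic(triangles, width, height):
    """Bytes moved to and from video memory when triangles are drawn one by one."""
    traffic = 0
    for tri in triangles:  # one triangle at a time
        for _ in covered_pixels(tri, width, height):
            traffic += 2 * BYTES_PER_PIXEL  # read old color, write new color
    return traffic


tri = ((0, 0), (4, 0), (0, 4))
print(immediate_mode_traffic([tri], 8, 8))       # 80
print(immediate_mode_traffic([tri, tri], 8, 8))  # 160
```

Drawing the same area twice doubles the traffic in this model, which is the point the comic makes: on a PC-style GPU the amount of data moved grows with the number of triangles rendered.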
Now, let's go inside a mobile GPU for comparison.
Did you see any difference from the PC GPU?
The power station is way too small!
That's right. The power station of a PC can provide 200 to 300 watts, but the power station of a mobile device can only provide 2 to 3 watts.
(Figure: DESKTOP vs. MOBILE power station. The mobile budget is about half a low-energy LED bulb.)
And there are only two conveyors outside of the video memory.
That's because conveyors consume a lot of energy, so a mobile GPU can't afford as many conveyors as a PC GPU.
So if you port your PC game directly to mobile, the conveyors will stall almost straight away and the battery will ...
So how can a mobile GPU fix this?
Let's take a close look at the mobile GPU core. It has the vertex shader to process triangles, but... the transformed triangles will not pass directly to the fragment shader. Instead, they will be transferred to the video memory.
What?! So what should the fragment shader do?
The fragment shader will then get the transformed triangles from the video memory.
But wouldn't that increase the size of the data transfer?
Exactly! Did you see a small warehouse beside the fragment shader?
Yes! I didn't see anything like that in the PC GPU core.
That small warehouse is the tile memory. On a mobile GPU, the screen is split into many tiles. Each tile contains 16x16 to 64x64 fragments.
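Splitting the screen into tiles, and parking the transformed triangles in video memory until the fragment shader needs them, can be sketched as a "binning" step: for each tile, build a list of the triangles that might touch it. This is a minimal sketch assuming square tiles and a conservative bounding-box test; the function names are hypothetical, and real GPUs do this in dedicated hardware with their own data structures.

```python
import math
from collections import defaultdict


def tile_grid(width, height, tile_size=16):
    """Return (columns, rows, total tiles); partial edge tiles count as whole tiles."""
    cols = -(-width // tile_size)  # ceiling division
    rows = -(-height // tile_size)
    return cols, rows, cols * rows


def bin_triangles(triangles, width, height, tile_size=16):
    """Map (tile_x, tile_y) -> indices of triangles whose bounding box overlaps the tile.

    Stands in for the vertex stage writing transformed triangles to video memory.
    The bounding-box test is conservative: a triangle may be listed in a tile it
    only nearly touches, which costs a little extra work but never drops pixels.
    """
    cols, rows, _ = tile_grid(width, height, tile_size)
    bins = defaultdict(list)
    for i, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Convert the bounding box to tile coordinates and clamp it to the screen.
        tx0 = max(0, math.floor(min(xs)) // tile_size)
        ty0 = max(0, math.floor(min(ys)) // tile_size)
        tx1 = min(cols - 1, math.floor(max(xs)) // tile_size)
        ty1 = min(rows - 1, math.floor(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(i)
    return dict(bins)


print(tile_grid(1920, 1080, 16))  # (120, 68, 8160)
print(tile_grid(1920, 1080, 64))  # (30, 17, 510)

triangles = [((1, 1), (10, 1), (1, 10)), ((20, 20), (40, 20), (20, 40))]
bins = bin_triangles(triangles, 64, 64)
print(sorted(bins))  # [(0, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
```

Tiles with no entry in `bins` have no triangles to process, so the fragment stage can skip straight past them.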
The fragment shader processes one tile at a time, and processes all the triangles in that tile at once. So the framebuffer data only needs to be transferred to the tile memory at the beginning of the tile processing.
Then the fragment shader can directly access the data in the tile memory and process all the triangles.
The whole tile is transferred back to the video memory when all the triangles in this tile have been processed.
Just need to transfer one tile, easy job!
I got it now. Although the data transfer for vertices increases, the tile design saves more of the data transfer for fragments.
Correct. And there is another advantage to using tiles for rendering...
If the GPU finds any triangle that is occluded by other triangles, those occluded triangles are discarded without being rendered.
You are invisible, don't waste my time!
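The per-tile flow above (start the tile in tile memory, find out which triangles are visible, shade only those, then write the finished tile back once) can be sketched as a toy renderer. This is a simplified model, not Mali's actual implementation: it assumes each triangle has one constant depth (smaller is closer), 4 bytes of color per pixel, and tiles that start cleared rather than loaded. For brevity it tests every triangle against every tile instead of reading a per-tile triangle list, and a color lookup stands in for the fragment shader. All names are hypothetical.

```python
TILE = 16
BYTES_PER_PIXEL = 4  # assumption: RGBA8 color only


def edge(ax, ay, bx, by, px, py):
    """Edge function: its sign tells which side of edge a->b the point is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def inside(tri, px, py):
    """True if point (px, py) lies inside the triangle (either winding)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    w0 = edge(x1, y1, x2, y2, px, py)
    w1 = edge(x2, y2, x0, y0, px, py)
    w2 = edge(x0, y0, x1, y1, px, py)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def render_tiled(triangles, depths, colors, width, height):
    """Return (framebuffer, video-memory bytes, fragments shaded, fragments covered)."""
    framebuffer = [[0] * width for _ in range(height)]  # stands in for video memory
    traffic = shaded = covered = 0
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            w, h = min(TILE, width - tx), min(TILE, height - ty)
            # Start of tile: on-chip tile memory, cleared (no video-memory read).
            nearest = [[None] * w for _ in range(h)]
            nearest_depth = [[float("inf")] * w for _ in range(h)]
            # Visibility pass: find the closest triangle for each pixel.
            for i, tri in enumerate(triangles):
                for y in range(h):
                    for x in range(w):
                        if inside(tri, tx + x + 0.5, ty + y + 0.5):
                            covered += 1
                            if depths[i] < nearest_depth[y][x]:
                                nearest_depth[y][x] = depths[i]
                                nearest[y][x] = i
            # Shading pass: occluded triangles never reach the "fragment shader".
            tile = [[0] * w for _ in range(h)]
            for y in range(h):
                for x in range(w):
                    if nearest[y][x] is not None:
                        tile[y][x] = colors[nearest[y][x]]
                        shaded += 1
            # End of tile: write the whole tile back to video memory once.
            for y in range(h):
                framebuffer[ty + y][tx:tx + w] = tile[y]
            traffic += w * h * BYTES_PER_PIXEL
    return framebuffer, traffic, shaded, covered


big = ((-100, -100), (200, -100), (-100, 200))  # covers a whole 32x32 screen
fb, traffic, shaded, covered = render_tiled([big, big], [0.5, 0.2], [1, 2], 32, 32)
print(traffic, shaded, covered)  # 4096 1024 2048
```

Stacking a third full-screen triangle raises `covered` to 3072 but leaves `shaded` at 1024 and `traffic` at 4096: hidden fragments are dropped inside the tile, and the write-back cost depends only on the screen size.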
Cool, so using tiles can save both memory bandwidth and power consumption. What a smart design.
Because the tile size is fixed, the size of the data transfer is also fixed. That's why the mobile GPU doesn't need so much ...
My size is fixed!
But on a PC GPU, the data transfer size is not fixed. It varies depending on the number of triangles that are rendered.
The occluded objects also need to be rendered, what a waste! ... consuming huge amounts ...
Why is the fps so low when there are not many objects on screen?
So what should I do to take advantage of this architecture when porting my PC game?
But we have learned enough for now. Let's recap what we've learned so far, and I will show you how to do that in the next episode.
(Recap figure: the GPU core with the vertex shader, the fragment shader and the tile memory, connected to the video memory.)
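The comic's comparison of data transfer (fixed per tile on a mobile GPU, growing with the number of rendered triangles on a PC GPU) can be put into rough numbers. This back-of-the-envelope sketch uses toy-model assumptions: 4-byte color, every shaded fragment on the PC-style GPU reads and writes video memory, each tile is written back once, and the extra vertex traffic of the tiled design (triangles written to video memory and read back) is passed in as a hypothetical `geometry_bytes` figure. Overdraw, the average number of fragments per pixel, is an assumed input rather than a measurement.

```python
BYTES_PER_PIXEL = 4  # assumption: RGBA8 color


def immediate_traffic(width, height, overdraw):
    """PC-style estimate: every fragment reads and writes the framebuffer."""
    fragments = round(width * height * overdraw)
    return fragments * 2 * BYTES_PER_PIXEL


def tiled_traffic(width, height, geometry_bytes):
    """Tile-based estimate: each pixel is written back once per frame,
    plus the transformed triangles written to and read from video memory."""
    return width * height * BYTES_PER_PIXEL + 2 * geometry_bytes


for overdraw in (1, 2, 3):
    print(overdraw, immediate_traffic(1920, 1080, overdraw), tiled_traffic(1920, 1080, 0))
# 1 16588800 8294400
# 2 33177600 8294400
# 3 49766400 8294400
```

In this model the tiled color traffic stays the same as overdraw rises, while the immediate-mode figure scales with it; whether the extra geometry traffic pays off depends on the scene, which is the trade-off the developer sums up in the comic.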
Go to the Arm developer site for more detailed information:
https://developer.arm.com/solutions/graphics-and-gaming/developer-guides/learn-the-basics/tile-based-rendering