GRAPHICS PROCESSING UNIT

PRESENTED BY LEKSHMI P A ROLL NO:19
08/30/08 1

Presentation Overview
Definition Comparison with CPU Architecture GPU-CPU Interaction GPU Memory

08/30/08

2

Why GPU?
• To provide separate, dedicated graphics resources, including a graphics processor and memory.
• To relieve some of the burden on the main system resources, namely the Central Processing Unit, main memory, and the system bus, which would otherwise be saturated with graphical operations and I/O requests.

Enter the GPU

What is a GPU?
• A Graphics Processing Unit, or GPU (occasionally also called a Visual Processing Unit, or VPU), is a dedicated processor that is efficient at manipulating and displaying computer graphics.
• Like the CPU (Central Processing Unit), it is a single-chip processor.

HOWEVER,
The abstract goal of a GPU is to enable a representation of a 3D world that is as realistic as possible. GPUs are therefore designed to provide additional computational power customized specifically for these 3D tasks.

GPU vs CPU
• A GPU is tailored for highly parallel operation, while a CPU executes programs serially.
• For this reason, GPUs have many parallel execution units, while CPUs have few execution units.
• GPUs have significantly faster and more advanced memory interfaces, as they need to shift around a lot more data than CPUs.
• GPUs have much deeper pipelines (several thousand stages vs. 10-20 for CPUs).

BRIEF HISTORY
• First-Generation GPUs
– Up to 1998; Nvidia’s TNT2, ATi’s Rage, and 3dfx’s Voodoo3; DX6 feature set.

• Second-Generation GPUs
– 1999-2000; Nvidia’s GeForce256 and GeForce2, ATi’s Radeon7500, and S3’s Savage3D; T&L; OpenGL and DX7; configurable.

• Third-Generation GPUs
– 2001; GeForce3/4Ti, Radeon8500, MS’s Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.

• Fourth-Generation GPUs
– 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.

• Fifth-Generation GPUs
– GeForce 8X; DirectX 10.

GPU Architecture
How many processing units?
– Lots.

How many ALUs?
– Hundreds.

Do you need a cache?
– Sort of.

What kind of memory?
– Very fast.

The difference…
[Figure: side-by-side comparison, without GPU vs. with GPU]

The GPU pipeline
• The GPU receives geometry information from the CPU as input and produces a picture as output.
• Let’s see how that happens…
host interface -> vertex processing -> triangle setup -> pixel processing -> memory interface

Details…

Host Interface
• The host interface is the communication bridge between the CPU and the GPU.
• It receives commands from the CPU and also pulls geometry information from system memory.
• It outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).

Vertex Processing
• The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space.
• This may be a simple linear transformation or a complex operation involving morphing effects.
• No new vertices are created in this stage, and no vertices are discarded (input and output have a 1:1 mapping).
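
The object-space to screen-space step above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the 4x4 row-major matrix layout, and the y-down viewport convention are assumptions, not something the slides specify.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_screen(vertices, mvp, width, height):
    """Transform object-space vertices (x, y, z) to screen space.

    Exactly one output vertex is produced per input vertex (1:1 mapping).
    """
    out = []
    for x, y, z in vertices:
        cx, cy, cz, cw = mat_vec(mvp, [x, y, z, 1.0])
        # Perspective divide: clip space -> normalized device coordinates.
        nx, ny, nz = cx / cw, cy / cw, cz / cw
        # Viewport mapping: [-1, 1] -> pixel coordinates (assumed y-down).
        sx = (nx + 1.0) * 0.5 * width
        sy = (1.0 - ny) * 0.5 * height
        out.append((sx, sy, nz))
    return out


# Hypothetical input: an identity transform and a single triangle.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
triangle = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
screen = to_screen(triangle, IDENTITY, 640, 480)
```

A "complex operation involving morphing effects" would replace the single matrix multiply with more work per vertex, but the one-in, one-out shape of the loop stays the same.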

Triangle setup
• In this stage, geometry information becomes raster information (screen-space geometry is the input, pixels are the output).
• Prior to rasterization, triangles that are backfacing or located outside the viewing frustum are rejected.
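
A rough sketch of the rejection step, under stated assumptions: triangles are given as 2D screen-space points in a y-up coordinate system, counter-clockwise winding is taken to mean front-facing, and the frustum test is reduced to a 2D viewport check (near and far planes are ignored). All names here are hypothetical.

```python
def signed_area(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_backfacing(a, b, c):
    # Assumption: counter-clockwise winding means front-facing.
    return signed_area(a, b, c) <= 0


def outside_viewport(tri, width, height):
    """True if all three vertices lie beyond the same viewport edge."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return (max(xs) < 0 or min(xs) >= width or
            max(ys) < 0 or min(ys) >= height)


def cull(triangles, width, height):
    """Keep only triangles that are front-facing and possibly visible."""
    return [t for t in triangles
            if not is_backfacing(*t) and not outside_viewport(t, width, height)]
```

Rejecting these triangles before rasterization means no per-pixel work is spent on geometry that cannot appear in the final image.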

Triangle Setup (cont…..)
• A pixel is generated if and only if its center is inside the triangle.
• Every pixel generated has its attributes computed as the perspective-correct interpolation of the attributes of the three vertices that make up the triangle.
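
The two rules above can be illustrated with a small software rasterizer. This is a sketch, not how any particular GPU implements it: the vertex layout `(x, y, w, attr)`, the counter-clockwise ordering, and the brute-force loop over every pixel are assumptions made for brevity. Pixel centers that fall exactly on an edge are accepted here, whereas real hardware applies a tie-breaking rule.

```python
def edge(a, b, p):
    """Edge function: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Yield (x, y, attr) for every pixel whose center is inside the triangle.

    Each vertex is (x, y, w, attr): screen position, clip-space w, and one
    scalar attribute. Vertices are assumed to be in counter-clockwise order.
    """
    area = edge(v0, v1, v2)
    if area <= 0:
        return  # degenerate or wrongly wound triangle: nothing to draw
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # pixel center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # center lies outside the triangle
            b0, b1, b2 = w0 / area, w1 / area, w2 / area
            # Perspective-correct interpolation: attr/w and 1/w are
            # interpolated linearly in screen space, then divided.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            attr = (b0 * v0[3] / v0[2] + b1 * v1[3] / v1[2]
                    + b2 * v2[3] / v2[2]) / inv_w
            yield x, y, attr
```

When all three `w` values are equal, the result reduces to plain linear interpolation; when they differ, the attribute is pulled toward the vertices that are closer to the viewer.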

Pixel Processing
• Each pixel provided by triangle setup is fed into pixel processing as a set of attributes, which are used to compute the final color for that pixel.
• The computations taking place here include texture mapping and math operations.
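
As a rough illustration of "texture mapping and math operations", the sketch below looks up a texel and modulates it by an interpolated color. The attribute names (`uv`, `color`), the nearest-neighbour filter, and the tiny checkerboard texture are all assumptions chosen to keep the example short.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbour lookup; texture is a list of rows of (r, g, b)."""
    height, width = len(texture), len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_pixel(attrs, texture):
    """Compute a final color from a pixel's interpolated attributes.

    attrs is a dict with 'uv' (texture coordinates in [0, 1]) and
    'color' (an interpolated per-vertex color).
    """
    texel = sample_nearest(texture, *attrs["uv"])
    # Math operation: modulate the texel by the interpolated color.
    return tuple(t * c for t, c in zip(texel, attrs["color"]))


# Hypothetical 2x2 checkerboard texture.
CHECKER = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
```

For example, `shade_pixel({"uv": (0.25, 0.25), "color": (1.0, 0.5, 0.25)}, CHECKER)` samples a white texel, so the interpolated color passes through unchanged.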

Memory Interface
• Pixel colors provided by the previous stage are written to the framebuffer.
• This stage used to be the biggest bottleneck before pixel processing took over.
• Before the final write occurs, some pixels are rejected by the z-buffer. On modern GPUs, z is compressed to reduce framebuffer bandwidth (but not size).
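
The z-buffer rejection described above amounts to a per-pixel depth comparison before the write. The class below is a minimal sketch; its name, the "smaller z is closer" convention, and the default far value of 1.0 are assumptions. Z compression is not modelled.

```python
class FrameBuffer:
    """Color buffer plus depth buffer with a simple depth test."""

    def __init__(self, width, height, far=1.0):
        self.color = [[(0, 0, 0)] * width for _ in range(height)]
        self.depth = [[far] * width for _ in range(height)]

    def write(self, x, y, z, rgb):
        """Write rgb at (x, y) only if z passes the depth test.

        Returns True if the pixel was written, False if it was rejected.
        Assumption: a smaller z means closer to the viewer.
        """
        if z >= self.depth[y][x]:
            return False  # hidden behind what is already stored
        self.depth[y][x] = z
        self.color[y][x] = rgb
        return True
```

Each accepted write touches both the depth and the color buffer, which hints at why this stage is sensitive to framebuffer bandwidth.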

Programmability in GPU pipeline

• In current state-of-the-art GPUs, vertex and pixel processing are programmable.
• The programmer can write programs that are executed for every vertex as well as for every pixel.
• This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.
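
One way to picture programmability is a pipeline whose vertex and pixel stages are functions supplied by the caller, while the stage in between stays fixed. The toy point renderer below is only a sketch under that assumption; real GPUs run shader programs written in languages such as HLSL, and every name here is hypothetical.

```python
def draw_points(vertices, vertex_program, pixel_program, width, height):
    """Render each vertex as one pixel, using caller-supplied programs."""
    image = {}
    for v in vertices:
        x, y, attrs = vertex_program(v)              # programmable stage
        px, py = int(x), int(y)                      # fixed-function step
        if 0 <= px < width and 0 <= py < height:
            image[(px, py)] = pixel_program(attrs)   # programmable stage
    return image


def shift_up(v):
    """Example vertex program: move the point up and record its height."""
    x, y = v
    return x, y + 1, {"height": y / 4}


def flat_red(attrs):
    """Example pixel program: ignore the attributes, output red."""
    return (255, 0, 0)


def gray_by_height(attrs):
    """Example pixel program: brightness follows the 'height' attribute."""
    level = int(255 * attrs["height"])
    return (level, level, level)
```

Swapping `flat_red` for `gray_by_height` changes the look of the output without touching the geometry or the pipeline itself.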

GPU Pipelined Architecture
(simplified view)
[Figure: CPU -> GPU (Vertex Setup -> Vertex Shader -> Rasterizer -> Pixel Shader -> Frame buffer), plus a Texture Storage + Filtering block; the early stages operate on vertices, the later stages on pixels]

One unit can limit the speed of the pipeline…
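
To make the bottleneck idea concrete: in a linear pipeline, the sustained rate is that of the slowest stage. The sketch below assumes each stage's rate is expressed in the same unit (here, frames per second the stage could sustain on its own for a given scene); the numbers are made up for illustration.

```python
def pipeline_bottleneck(stage_rates):
    """Return (slowest stage name, sustained rate) for a linear pipeline."""
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical per-stage rates in frames per second.
rates = {
    "vertex shader": 240.0,
    "rasterizer": 300.0,
    "pixel shader": 90.0,
    "frame buffer": 150.0,
}
bottleneck = pipeline_bottleneck(rates)
```

With these numbers, speeding up any stage other than the pixel shader would not change the overall frame rate.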

CPU/GPU interaction
• The CPU and GPU inside the PC work in parallel with each other.
• There are two “threads” going on, one for the CPU and one for the GPU, which communicate through a command buffer:
[Figure: command buffer; the GPU reads commands from one end, the CPU writes commands at the other, and pending GPU commands sit in between]

CPU/GPU interaction (cont)
• If this command buffer is drained empty, we are CPU limited, and the GPU will spin waiting for new input. All the GPU power in the universe isn’t going to make your application faster!
• If the command buffer fills up, the CPU will spin waiting for the GPU to consume it, and we are effectively GPU limited.
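
The CPU-limited and GPU-limited cases can be reproduced with a small tick-based simulation of a bounded command buffer. This is a deliberately simplified model, not a description of a real driver: the function name, the fixed per-command costs, and the one-command-at-a-time GPU are assumptions.

```python
from collections import deque


def simulate(cpu_cost, gpu_cost, capacity, ticks):
    """Count stalled ticks for a CPU and GPU sharing a bounded command buffer.

    cpu_cost / gpu_cost: ticks needed to produce / consume one command.
    Returns (cpu_stalls, gpu_stalls).
    """
    buffer = deque()
    cpu_progress = gpu_progress = 0
    gpu_busy = False
    cpu_stalls = gpu_stalls = 0
    for _ in range(ticks):
        # GPU side: fetch a command if idle, otherwise keep working.
        if not gpu_busy:
            if buffer:
                buffer.popleft()
                gpu_busy = True
                gpu_progress = 0
            else:
                gpu_stalls += 1  # buffer drained: CPU limited
        if gpu_busy:
            gpu_progress += 1
            if gpu_progress == gpu_cost:
                gpu_busy = False
        # CPU side: work on the next command, then try to enqueue it.
        if cpu_progress < cpu_cost:
            cpu_progress += 1
        if cpu_progress == cpu_cost:
            if len(buffer) < capacity:
                buffer.append("cmd")
                cpu_progress = 0
            else:
                cpu_stalls += 1  # buffer full: GPU limited
    return cpu_stalls, gpu_stalls
```

Calling `simulate(4, 1, 4, 100)` models a slow CPU and leaves the GPU idle most of the time, while `simulate(1, 4, 4, 100)` models a slow GPU and leaves the CPU waiting on a full buffer.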

Synchronization issues
• In the figure below, the CPU must not overwrite the data in the “yellow” block until the GPU is done with the “black” command, which references that data:
[Figure: command buffer; the GPU reads commands from one end and the CPU writes commands at the other; the “black” command references a separate “yellow” data block]
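
One common way to enforce this rule is a fence: every submitted command gets an increasing id, and the CPU reuses a data block only after the GPU reports having passed the command that references it. The slides do not name a mechanism, so the class below is an assumed illustration with hypothetical names, and the GPU is simulated by explicit calls.

```python
class FenceTracker:
    """Tracks which submitted commands the (simulated) GPU has finished."""

    def __init__(self):
        self.next_fence = 1   # id handed to the next submitted command
        self.completed = 0    # highest id the GPU has finished
        self.in_use = {}      # data block name -> id of last command using it

    def submit(self, block_name):
        """CPU side: record a command that references block_name."""
        fence = self.next_fence
        self.next_fence += 1
        self.in_use[block_name] = fence
        return fence

    def gpu_finished(self, fence):
        """GPU side: report that every command up to `fence` has executed."""
        self.completed = max(self.completed, fence)

    def can_overwrite(self, block_name):
        """CPU side: safe only once the referencing command has executed."""
        return self.in_use.get(block_name, 0) <= self.completed
```

In terms of the slide, `submit("yellow")` plays the role of the "black" command, and `can_overwrite("yellow")` stays false until `gpu_finished` is called with that command's fence id.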

Inlining data One way to avoid these problems is
to inline all data to the command buffer and avoid references to separate data: GPU reads commands from here

CPU writes commands here

 However, this is also bad for performance, since we may need to copy seve instead of merely passing around a pointer

08/30/08

29

GPU readbacks
• The output of a GPU is a rendered image on the screen. What will happen if the CPU tries to read it?
[Figure: command buffer; the GPU reads commands from one end, the CPU writes commands at the other, and pending GPU commands sit in between]
• The GPU must be synchronized with the CPU, i.e., it must drain its entire command buffer, and the CPU must wait while this happens.

GPU readbacks (cont)
• We lose all parallelism, since first the CPU waits for the GPU, then the GPU waits for the CPU (because the command buffer has been drained).
• Both CPU and GPU performance take a nosedive.
• Bottom line: the image the GPU produces is for your eyes, not for the CPU (treat the CPU -> GPU highway as a one-way street).
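
A back-of-the-envelope model of the cost, assuming fixed per-frame CPU and GPU times and ignoring buffering depth and partial overlap: when the two processors run in parallel, the slower one sets the frame time; with a readback every frame, their times add up. The function names and numbers are hypothetical.

```python
def frame_time_parallel(cpu_ms, gpu_ms):
    """CPU and GPU overlap: the slower of the two sets the frame time."""
    return max(cpu_ms, gpu_ms)


def frame_time_with_readback(cpu_ms, gpu_ms):
    """A readback every frame serializes the two: the costs add up."""
    return cpu_ms + gpu_ms


# Hypothetical workload: 8 ms of CPU work and 10 ms of GPU work per frame.
parallel_ms = frame_time_parallel(8.0, 10.0)
readback_ms = frame_time_with_readback(8.0, 10.0)
```

Under this crude model the frame time grows from 10 ms to 18 ms, which is the "nosedive" the slide refers to.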

About GPU memory…

Memory Hierarchy
CPU and GPU Memory Hierarchy
– Disk
– CPU Main Memory; GPU Video Memory
– CPU Caches; GPU Caches
– CPU Registers; GPU Constant Registers, GPU Temporary Registers

Where is GPU Data Stored?
– Vertex buffer
– Frame buffer
– Texture
[Figure: Vertex Buffer -> Vertex Processor -> Rasterizer -> Fragment Processor -> Frame Buffer(s), with Texture as a further data store]

CPU memory vs GPU memory
            CPU                GPU
Registers   Read/write         Read/write
Local Mem   Read/write stack   None
Global Mem  Read/write heap    Read-only during computation; write-only at end (to pre-computed address)
Disk        Read/write disk    None

Some applications…
• Computer-generated holography using a graphics processing unit
• Improving the performance of CAD tools
• Computer graphics in games

New…
NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said to represent the very latest in visual effects technologies.

THANK YOU