Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Standard view
Full view
of .
Save to My Library
Look up keyword
Like this
0 of .
Results for:
No results containing your search query
P. 1
Graphics Processing Unit

Graphics Processing Unit

Ratings: (0)|Views: 99 |Likes:
Published by Than Lwin Aung
Central Processing Unit (CPU) has been the heart of any computer for many decades, and the functionality and higher processing power of CPU have been achieved by raising the CPU clock speed. However, the problems related to high power consumption, more memory bandwidth and heat dissipation force CPU designers to look for new architectural solutions, such as streaming compute model. Traditionally, most CPU architectures utilize Von Neumann Model, which in fact shows some inefficiency for
Central Processing Unit (CPU) has been the heart of any computer for many decades, and the functionality and higher processing power of CPU have been achieved by raising the CPU clock speed. However, the problems related to high power consumption, more memory bandwidth and heat dissipation force CPU designers to look for new architectural solutions, such as streaming compute model. Traditionally, most CPU architectures utilize Von Neumann Model, which in fact shows some inefficiency for

More info:

Published by: Than Lwin Aung on Feb 15, 2011
Copyright:Attribution Non-commercial


Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less





Central Processing Unit (CPU) has been the heart ofany computer for many decades, and the functionalityand higher processing power of CPU have beenachieved by raising the CPU clock speed. However,the problems related to high power consumption, morememory bandwidth and heat dissipation force CPUdesigners to look for new architectural solutions, suchas streaming compute model. Traditionally, most CPUarchitectures utilize Von Neumann Model, which in factshows some inefficiency for massive computation,such as Ray-Tracing. With the streaming paradigm,GPU (Graphics Processing Unit) has begun to playmore important role than ever before, as it can beoptimized for stream computation. Furthermore, theextensive vector computation capability and theparallel graphics pipe-line provided by GPU becomeattractive to many scientific applications other thangraphics. This growing use of a graphics card forstream computation leads to the implementation ofGPU as a stream co-processor. In this research paper,I will primarily focus on the general concepts of ray-tracing, and the implementation of ray tracing on GPUas a specialized stream processor, and then I willexamine the convergence between CPU and GPU forstreaming computation, such as cell processors.
1 Introduction
Nowadays, various applications of computer graphics,such as 3D rendering, 3D modeling and 3D animation,have become common place. With the extensive useof computer graphics, the tradeoff between time andvisual fidelity has become the primary concern ofgraphics community. In fact, rendering photo realisticimagery has never been easy; it is really acomputationally expensive process. In 3D animated
movie “Cars”
, the average frame took 15 hours torender.However, producing real-time photo-realistic renderingstarted to seem possible with the utilization of streamprocessing model, SIMD (single instruction multipledata) and multi-core architecture
. The gap betweenrealistic and interactive (real time) rendering becomesnarrower thanks to new architectural design such ascell processors.
1.1 Ray Tracing
Ray Tracing is a powerful yet simple 3D renderingtechnique which utilizes the physics of light. In realworld, light is emitted from the light sources asphotons. Ray tracing technique uses the sameprinciple by modeling reflection and refraction of lightby recursively following the path of light particle
.3D scene or world (i.e., a list of objects to be rendered)is usually rendered on 2D view window from a givenviewpoint, usually called the eye or camera. Each point(pixel) in view window is caused by the simulated lightray reflected back from the objects towards the eye.Figure 1: Ray TracingAlthough, ray tracing uses the physics of light, there isa key difference. In real world, every ray of lightoriginates from light sources, or other objects as inreflected rays, towards the eye, but in ray tracing, allrays of light start from the eye, until the ray intersectsthe objects in the scene. However, because of thephysical property of light, called reciprocity, the finalresult is the same.Whenever, a ray from the eye intersects with an object(transparent, semi-transparent or opaque) in the scene,a new reflected or refracted ray is generated and traceduntil it reaches to the light source, thereby ray tracing isdefined by recursive reflection and refraction.Although the underlying principle is simple, in order toachieve high quality rendering, millions of rays need tobe traced and calculated, which obviously is not aneasy process. The situation can become worse, whenthere are many light sources in the scene, and indirectillumination caused by them.
1.2 Global Illumination
Global Illumination is another algorithm which can alsocapture indirect illumination. Ray tracing usually doesnot capture the effect of indirect lighting as it is analgorithm for direct illumination. Indirect lighting occurswhen the light bounces off between two surfaces
.Although the effects of global illumination, such as colorbleeding and caustics, are subtle, they are important forphoto-realistic imagery. Therefore, many enhancementsuch as photon mapping techniques, which trace bothrays from the eye and rays from the light source, andcombine the path to produce the final image, has beenincorporated into ray tracing algorithm.
2 Stream Model
Traditionally, Von Neumann Architecture has been thebasic architecture for CPU, but it has been observedthat Von Neumann style architecture creates bottleneck between CPU and main memory. In order toalleviate this bottle neck, many strategies has beenincorporated into CPU design such as fast memory,cache etc. However, the primary focus on solvingmemory bottle-neck has resulted in lack of devotion oncomputation speed
. The most prominent exampleis the computation power of GPU. Even with therequirement of very little memory, GPU can performvery fast computation by exploiting parallelism andlocality of data.Figure 2: Von Neumann ArchitectureRecently, stream processing and stream processorarchitecture have become popular. In fact, unliketraditional architecture, stream processors aredesigned to exploit the parallelism and locality inprograms. Stream processors require tens to hundredsof arithmetic processors for parallelism and a deephierarchy of registers to exploit locality. Streamingmodel also requires programs written for parallelismand locality.Figure 3: Stream ProcessorStream programming model is based on kernels andstreams. The kernel is the function which loads thedata from input stream, computes and writes the dataon the output streams.
2.1 Graphics Pipe Line
Even though, ray tracing can generate highly photo-realistic imagery, most GPU employs different techniquecalled rasterization, in which 3D rendering is achievedby shaders and various buffers, such as z- buffer,texture memory etc.Figure 4: Graphics PipelineFigure 4 shows how a general graphics pipe-line works
. Generally, the 3D scene is fed into the hardware asa sequence of triangles (which is the most basicgeometric figure from which more complex figure suchas circles, polygons, curves etc., can be generated byapproximation) Each triangle consists of vertices eachof which stored the coordinates and color information.Vertex processor generally transforms vertices frommodel (3D) coordinates to screen coordinates (2D) bymatrix transformation. Then, rasterizer transforms eachvertex into fragments, which, in fact, are pixels withcolor and depth information (stored in memory to berendered on the screen). Fragment processor modifiescolor of each fragment with texture mapping. Eachfragment is then compared to the value stored in Z-buffer (depth buffer), and displayed through framebuffer on screen.
2.2 GPU stream processing
The most exciting feature of GPU is the programmablefragment processor, which uses SIMD parallelism toimplement common operations, such as Add, Multiply,Dot Product, Cross Product and Texture fetches.However, fragment programs are not allowed for datadependent branching
 Figure 5: programmable fragment processorIn fact, GPU's fragment processor can be viewed as aspecialized stream processor. The stream elements, inthis case, are the fragments. Parallelism is achievedthrough running each fragment program in parallel oneach pixel. Also through dependent texture lookup,which is a memory fetch for texture information foreach fragment, GPU can be exploited for locality.
2.3 Ray Tracing on GPU
After explaining the general concepts of ray tracingand stream computation model, we will look into howray tracing can be achieved on standard GPU. Since,ray tracing is an algorithm which can be computed perpixel, parallel ray tracing is readily achieved, byindependently computation of each ray in parallel toproduce the final image. This is one of the reasonswhy ray tracing was first implemented on clusteredcomputers and massively parallel share memorysupercomputers. In fact, interactive ray tracing ispossible on those high end computers, which useshigh computation power with wide memory bandwidth.However, in 2004, Purcell proposed a solution how raytracing can be achieved on standard GPU, by utilizingstream processing model on fragment processor
.Actually, GPUs are designed for rasterization, not forray tracing. Rasterization transforms all objects to berendered into fragments and the fragments aredisplayed on the screen, depending on the depthinformation. However, according to Purcell
, we canbuild a ray tracer on the fragment processor byexploiting its parallelism and locality for ray tracing. Inhis model, the input to fragments programs are pixels,each of which corresponds the pixel caused by eachray, and the fragment processor computes recursiveray tracing in parallel on that pixels, and then finalimage is displayed on screen.Figure 6: Purcell's Streaming Ray TracerFigure 7: Data flow diagram for Ray Tracing KernelsFigure 7 shows ray tracing is divided into several corekernels (functions), which are implemented on fragmentprocessor by utilizing vector and matrix calculation.Although Purcell proposed how ray tracing can beimplemented solely on fragment processor, there aremany limitations on actual implementation especiallydue to speed required for ray tracer. Moreover, Purcell'sdesign is heavily dependent on dependent texturelookups, which are poorly optimized for ray tracing.In fact, many designs have been proposed to improvethe speed of computations on GPU by improvingtexture lookups, adding more parallelism etc. The othersolution is to build a custom piece of hardware, whichcan incorporate stream model. Many architecturaldesigns have been proposed which can bridge CPUand GPU, and I will discuss about one of the designarchitectures which uses Cell Broadband Engine

You're Reading a Free Preview

/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->