
Table Of Contents

What Is This Document?
Who Should Read This Guide?
1.1 Differences Between Host and Device
1.2 What Runs on an OpenCL-Enabled Device?
1.3 Maximum Performance Benefit
2.1.1 Using CPU Timers
2.1.2 Using OpenCL GPU Timers
2.2.1 Theoretical Bandwidth Calculation
2.2.2 Effective Bandwidth Calculation
3.1 Data Transfer Between Host and Device
3.1.1 Pinned Memory
A Simple Access Pattern
A Sequential but Misaligned Access Pattern
Effects of Misaligned Accesses
Strided Accesses
Shared Memory and Memory Banks
Shared Memory in Matrix Multiplication (C = AB)
Shared Memory in Matrix Multiplication (C = AA^T)
Shared Memory Use by Kernel Arguments
3.2.3 Local Memory
Textured Fetch vs. Global Memory Read
Register Pressure
4.2 Calculating Occupancy
4.3 Hiding Register Dependencies
4.4 Thread and Block Heuristics
4.5 Effects of Shared Memory
5.1 Arithmetic Instructions
5.1.1 Division and Modulo Operations
5.1.2 Reciprocal Square Root
5.1.4 Math Libraries
5.2 Memory Instructions
6.2 Branch Predication
A.2 High-Priority Recommendations
A.3 Medium-Priority Recommendations
A.4 Low-Priority Recommendations




Published by: targezzedd on Jun 27, 2011
Copyright:Attribution Non-commercial







You're Reading a Free Preview
Pages 5 to 49 are not shown in this preview.
