Aaftab Munshi
Opportunity: Processor
• Today’s processors are increasingly parallel
• CPUs
■ Multiple cores are driving performance increases
• GPUs
■ Transforming into general-purpose data-parallel computational coprocessors
■ Improving numerical precision (single and double)
■ Address Qualifiers
■ __private
■ __local
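As a rough illustration of how the address qualifiers read in a function signature, here is a minimal sketch written so that it compiles as plain C. The empty `#define` shims are an assumption made only for host-side compilation; in OpenCL C these qualifiers are built-in keywords. `__global` does not appear in the list above but is part of the OpenCL C language, and the function name `sum_via_scratch` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: empty shims so the OpenCL-style qualifiers compile as
   plain C. A real OpenCL C compiler treats them as keywords that
   select a memory region. */
#define __global   /* memory visible to all work-items            */
#define __local    /* memory shared by the work-items of one group */
#define __private  /* memory owned by a single work-item           */

/* Hypothetical helper: copies n floats from global memory into a
   local scratch area, then sums them into a private accumulator. */
static float sum_via_scratch(__global const float *in,
                             __local float *scratch,
                             size_t n)
{
    assert(in != NULL && scratch != NULL);

    __private float acc = 0.0f;

    /* Stage the input in the (emulated) local memory. */
    for (size_t i = 0; i < n; ++i)
        scratch[i] = in[i];

    /* Accumulate from local memory into the private variable. */
    for (size_t i = 0; i < n; ++i)
        acc += scratch[i];

    return acc;
}
```

On a device, the staging loop would typically be split across the work-items of a work-group and followed by a barrier; the sequential loops here only show which qualifier applies to which variable.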
// create a work-queue
queue = clCreateWorkQueue(context, NULL, NULL, 0);
memobjs[1] = clCreateBuffer(context,
                            CL_MEM_READ_WRITE,
                            sizeof(float)*2*num_entries, NULL);
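The factor of 2 in the buffer size most likely accounts for the real and imaginary parts of each complex entry, as is usual for FFT data. The sketch below assumes an interleaved layout (re0, im0, re1, im1, ...), which the slides do not state; the helper names `complex_buffer_bytes` and `store_complex` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for num_entries complex values stored as interleaved
   (real, imaginary) float pairs. Same arithmetic as the
   sizeof(float)*2*num_entries argument passed to clCreateBuffer. */
static size_t complex_buffer_bytes(size_t num_entries)
{
    return sizeof(float) * 2 * num_entries;
}

/* Writes complex entry i into an interleaved float buffer that holds
   num_entries complex values (assumed layout: re at 2*i, im at 2*i+1). */
static void store_complex(float *buf, size_t num_entries,
                          size_t i, float re, float im)
{
    assert(buf != NULL);
    assert(i < num_entries);
    buf[2 * i]     = re;
    buf[2 * i + 1] = im;
}
```

For example, 1024 complex entries need `complex_buffer_bytes(1024)` bytes, which is 8192 when `float` is 4 bytes wide.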
// execute kernel
clExecuteKernel(queue, kernel, NULL, range, NULL, 0, NULL);
localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));
// four radix-4 function calls
fftRadix4Pass(data); fftRadix4Pass(data + 4);
fftRadix4Pass(data + 8); fftRadix4Pass(data + 12);
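The slides do not show the body of `fftRadix4Pass`. As a sketch of what one such pass could compute, the plain C code below performs an in-place 4-point forward DFT (a radix-4 butterfly without twiddle factors) on four consecutive complex values and applies it at offsets 0, 4, 8 and 12 of a 16-element array, mirroring the four calls above. The `cfloat` type and the function names are hypothetical, and the real kernel may differ, for example in sign convention or output ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type standing in for the kernel's data type. */
typedef struct { float re, im; } cfloat;

/* In-place 4-point forward DFT of d[0..3]:
     X0 = x0 + x1 + x2 + x3
     X1 = (x0 - x2) - i(x1 - x3)
     X2 = x0 - x1 + x2 - x3
     X3 = (x0 - x2) + i(x1 - x3)                                   */
static void radix4_pass(cfloat *d)
{
    assert(d != NULL);

    /* Sums and differences of the even and odd inputs. */
    cfloat s02 = { d[0].re + d[2].re, d[0].im + d[2].im };
    cfloat d02 = { d[0].re - d[2].re, d[0].im - d[2].im };
    cfloat s13 = { d[1].re + d[3].re, d[1].im + d[3].im };
    cfloat d13 = { d[1].re - d[3].re, d[1].im - d[3].im };

    d[0].re = s02.re + s13.re;  d[0].im = s02.im + s13.im;
    d[2].re = s02.re - s13.re;  d[2].im = s02.im - s13.im;

    /* Multiplying by -i maps (x, y) to (y, -x); by +i to (-y, x). */
    d[1].re = d02.re + d13.im;  d[1].im = d02.im - d13.re;
    d[3].re = d02.re - d13.im;  d[3].im = d02.im + d13.re;
}

/* Four independent butterflies over a 16-element array. */
static void four_radix4_passes(cfloat *data)
{
    radix4_pass(data);
    radix4_pass(data + 4);
    radix4_pass(data + 8);
    radix4_pass(data + 12);
}
```

Each call touches a disjoint group of four elements, so the four passes are independent of one another.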