Multi-core Processor (Intel, AMD etc.)
2-4 processors.
Out-of-order issue.
Multi-level caches.
+
Graphics Processing Unit (NVIDIA, ATI etc.)
100 – 300 cores.
Scratchpad memory.
>1000 active contexts.
Problem...
Cell-specific APIs
What is OpenCL?
First open, royalty-free standard for cross-platform,
parallel programming of modern processors found in
personal computers, servers and handheld/embedded devices.
Initially proposed by Apple.
Created by the Khronos Group.
Currently in release 1.1.
Host – A computer running a standard operating system. In plain words, a CPU.
Device – Can be a GPU, multi-core CPU, DSP etc.
An OpenCL application runs on the host and submits commands to the devices' processing elements.
Processing elements execute either as SIMD units (in lockstep, sharing a single
stream of instructions) or as SPMD units (each PE maintains its own program
counter).
OpenCL Architecture – Execution Model
Host program – Driver program running on the host, which manages various OpenCL
devices.
Kernel – Code that executes on one or more OpenCL devices.
Execution is performed in parallel, by a number of concurrently executing
threads.
Each thread is called a work-item, and has a unique identifier. Each work-item
executes the same code.
Work-group: A group of work-items.
Work-groups are scheduled on compute units.
Threads within a work-group have a unique local ID.
Threads within a work-group can synchronize efficiently.
Threads within a work-group can communicate among themselves via fast
local memory, analogous to scratchpad memory.
There is no efficient way to communicate across work-groups.
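The relationship between a work-item's global ID, its work-group and its local ID can be sketched in plain C. This is a host-side illustration only, not OpenCL code: the type work_item_ids and the helpers decompose_id and compose_id are hypothetical names, and the sketch assumes a 1-dimensional launch with no global offset and a global size that is a multiple of the work-group size. Inside a real kernel, the corresponding values come from the built-ins get_global_id, get_group_id, get_local_id and get_local_size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the IDs one work-item would see in a 1-D launch. */
typedef struct {
    size_t global_id; /* unique across the whole index space */
    size_t group_id;  /* which work-group the work-item belongs to */
    size_t local_id;  /* unique only within its work-group */
} work_item_ids;

/* Split a 1-D global ID into (group ID, local ID).
 * Assumes no global offset and a uniform work-group size. */
work_item_ids decompose_id(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    work_item_ids ids;
    ids.global_id = global_id;
    ids.group_id  = global_id / local_size;
    ids.local_id  = global_id % local_size;
    return ids;
}

/* Inverse mapping: rebuild the global ID from group ID and local ID. */
size_t compose_id(size_t group_id, size_t local_id, size_t local_size)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

For example, with a work-group size of 64, the work-item with global ID 70 falls in work-group 1 and has local ID 6.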
NDRange – The index space supported in OpenCL.
N-dimensional index space (N = 1, 2 or 3).
Defined by an integer array of length N specifying the extent of the index space
in each dimension, starting at an offset index F (zero by default).
Each work-item's global ID and local ID are N-dimensional tuples.
In each dimension, the global ID component is a value in the range
F to F + (num_elements_in_dimension - 1).
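The global ID ranges described above can be illustrated with a small plain-C model of an NDRange. This is a sketch under stated assumptions, not part of the OpenCL API: the ndrange struct and the functions ndrange_count, ndrange_contains and ndrange_id_at are hypothetical names. In a real host program the offset and extent arrays would be passed to clEnqueueNDRangeKernel as global_work_offset and global_work_size. The enumeration order used here (dimension 0 varies fastest) is chosen only for illustration; OpenCL does not define the order in which work-items execute.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 3

/* Hypothetical model of an NDRange: N dimensions, offset F and extent per dimension. */
typedef struct {
    unsigned dims;           /* N: 1, 2 or 3 */
    size_t offset[MAX_DIMS]; /* F in each dimension (zero by default) */
    size_t size[MAX_DIMS];   /* number of elements in each dimension */
} ndrange;

/* Total number of work-items in the index space. */
size_t ndrange_count(const ndrange *r)
{
    size_t n = 1;
    for (unsigned d = 0; d < r->dims; ++d)
        n *= r->size[d];
    return n;
}

/* Returns 1 if every component of id lies in F .. F + (size - 1), else 0. */
int ndrange_contains(const ndrange *r, const size_t id[MAX_DIMS])
{
    for (unsigned d = 0; d < r->dims; ++d) {
        if (id[d] < r->offset[d] || id[d] - r->offset[d] >= r->size[d])
            return 0;
    }
    return 1;
}

/* Writes the global ID of the k-th work-item, counting with dimension 0
 * varying fastest (an illustrative ordering, not one OpenCL guarantees). */
void ndrange_id_at(const ndrange *r, size_t k, size_t id[MAX_DIMS])
{
    assert(k < ndrange_count(r));
    for (unsigned d = 0; d < MAX_DIMS; ++d) {
        if (d < r->dims) {
            id[d] = r->offset[d] + k % r->size[d];
            k /= r->size[d];
        } else {
            id[d] = 0; /* unused dimensions */
        }
    }
}
```

For a 2-dimensional range with offset (10, 20) and extent (4, 3), the global IDs run from (10, 20) to (13, 22), giving 12 work-items in total.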
Other OpenCL jargon (more on this shortly):
Devices.
Kernels.
Program objects.
Memory objects.
Command queues.
Kernel execution commands.
Memory Commands.
Synchronization commands.
OpenCL Architecture – Memory Model
Each work-item has access to global, local,
private and constant memories.
Identified in kernel code by address space
qualifiers (__global, __constant, __local etc).
Relaxed consistency semantics.
Local memory is consistent across work-items
in a single work-group at a work-group barrier.
Global memory is consistent across work-items
in a single work-group at a work-group barrier.
There are no consistency guarantees across work-groups.
Memory consistency for memory objects
shared between enqueued commands is
enforced at a synchronization point. (What's a
synchronization point?)
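To show why a work-group barrier matters for local memory, the following plain-C sketch emulates one work-group summing its inputs through a shared scratch buffer. It is a sequential emulation under stated assumptions, not OpenCL code: group_sum and GROUP_SIZE are hypothetical names, the scratch array stands in for a __local buffer, and each inner loop over lid plays the role of all work-items running one phase. The end of each loop corresponds to the point where a real kernel would call barrier(CLK_LOCAL_MEM_FENCE), so that the writes of one phase are visible to every work-item in the group before the next phase reads them.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 8 /* hypothetical work-group size; must be a power of two */

/* Emulates one work-group summing GROUP_SIZE values via "local memory". */
int group_sum(const int input[GROUP_SIZE])
{
    int scratch[GROUP_SIZE]; /* stands in for a __local buffer */

    assert((GROUP_SIZE & (GROUP_SIZE - 1)) == 0);

    /* Phase 0: each work-item copies one element into local memory. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
        scratch[lid] = input[lid];
    /* barrier here: all writes above must be visible to the whole group */

    /* Tree reduction: the number of active work-items halves each phase. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            scratch[lid] += scratch[lid + stride]; /* reads a value written in the previous phase */
        /* barrier here: the next phase reads what this phase wrote */
    }

    return scratch[0]; /* work-item 0 holds the group's total */
}
```

Without the barrier between phases, a work-item could read scratch[lid + stride] before its neighbour had finished writing it, because the memory model is relaxed. The sketch covers a single work-group only; combining results across work-groups would need a separate step, since no consistency is guaranteed between them.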