
Heterogeneous computing

A typical computer contains...

Multi-core Processor (Intel, AMD etc)
2-4 processors.
Out-of-order issue.
Multi-level caches.

+

Graphics Processing Unit (NVIDIA, ATI etc)
100 – 300 cores.
Scratchpad memory.
>1000 active contexts.

Clearly, we have a LOT of computing power.


Also...

CELL Broadband Engine (IBM/Sony/Toshiba)

1 PPE, dual-thread, dual-issue.
8 SPEs – separate register file.

Clearly, we have a LOT of computing power.

Problem: How to use these resources?


Problem...

Cell-specific APIs
What is OpenCL?
First open, royalty-free standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices.
Initially proposed by Apple.
Created by the Khronos Group.
Currently in release 1.1.
OpenCL Working Group :


Benefits of OpenCL
 Acceleration in parallel processing.
 Cross-vendor software portability.
OpenCL Architecture – Platform Model
From OpenCL's perspective, a system consists of a host connected to one or more devices. Each device is divided into compute units, and each compute unit into processing elements (PEs).

Host – A computer running a standard operating system. In plain words, a CPU.
Device – Can be a GPU, multi-core CPU, DSP etc.

An OpenCL application runs on the host and submits commands to a device's processing elements.

Processing elements execute a single stream of instructions either as SIMD units (in lockstep with the other PEs) or as SPMD units (each PE maintains its own program counter).
OpenCL Architecture – Execution Model
Host program – Driver program running on the host, which manages the various OpenCL devices.

Kernel – Code that executes on one or more OpenCL devices.

Execution is performed in parallel by a number of concurrently executing threads.

Each thread is called a work-item and has a unique identifier. Each work-item executes the same code.
Work-group: A group of work-items.
Work-groups are scheduled on compute units.
Threads within a work-group have a unique local ID.
Threads within a work-group can synchronize efficiently.
Threads within a work-group can communicate among themselves via fast local memory, analogous to scratchpad memory.
No effective way to communicate across work-groups.
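
The work-item and work-group ideas above can be sketched in plain Python. This is only a sequential emulation, not the OpenCL API: `square_kernel` and `run_ndrange_1d` are hypothetical names made up for illustration, and a real device would run the work-items in parallel rather than in a loop.

```python
def square_kernel(global_id, src, dst):
    """Kernel body: every work-item runs this same code on its own element."""
    dst[global_id] = src[global_id] * src[global_id]


def run_ndrange_1d(kernel, global_size, local_size, *args):
    """Sequentially emulate a 1-D kernel launch (hypothetical helper).

    Returns the (group ID, local ID, global ID) triple of every work-item.
    """
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of the work-group size")
    ids = []
    for group_id in range(global_size // local_size):  # one work-group at a time
        for local_id in range(local_size):             # work-items of that group
            # The global ID is unique across the launch; the local ID is
            # unique only inside its own work-group.
            global_id = group_id * local_size + local_id
            kernel(global_id, *args)
            ids.append((group_id, local_id, global_id))
    return ids


src = [1, 2, 3, 4, 5, 6]
dst = [0] * len(src)
ids = run_ndrange_1d(square_kernel, 6, 3, src, dst)
```

With six work-items in work-groups of three, `dst` ends up as `[1, 4, 9, 16, 25, 36]`, and the fifth work-item has group ID 1, local ID 1 and global ID 4.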
OpenCL Architecture – Execution Model
NDRange – The index space supported in OpenCL.
N-dimensional index space (N = 1, 2 or 3).

Defined by an integer array of length N specifying the extent of the index space in each dimension, starting at an offset index F (zero by default).

Each work-item's global ID and local ID are N-dimensional tuples.
The global ID components are values in the range F to F + (num_elements_in_dimension - 1).
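
A minimal Python sketch of the index arithmetic, assuming the work-group size divides the extent evenly in every dimension. The helpers `global_ids` and `decompose` are hypothetical, not OpenCL calls. The split into work-group ID and local ID follows the relation global ID = offset + group ID × work-group size + local ID, which comes from the OpenCL specification rather than from the text above.

```python
from itertools import product


def global_ids(global_size, offset=None):
    """Enumerate every global ID of an NDRange (hypothetical helper).

    global_size: extent of the index space in each dimension (length N).
    offset: the offset F in each dimension; zero everywhere by default.
    """
    n = len(global_size)
    if not 1 <= n <= 3:
        raise ValueError("an NDRange has 1, 2 or 3 dimensions")
    if offset is None:
        offset = (0,) * n
    # Each component runs from F to F + (extent - 1), inclusive.
    ranges = [range(f, f + size) for f, size in zip(offset, global_size)]
    return list(product(*ranges))


def decompose(global_id, local_size, offset=None):
    """Split a global ID into (work-group ID, local ID), per dimension."""
    if offset is None:
        offset = (0,) * len(global_id)
    group = tuple((g - f) // l for g, f, l in zip(global_id, offset, local_size))
    local = tuple((g - f) % l for g, f, l in zip(global_id, offset, local_size))
    return group, local


ids = global_ids((4, 2), offset=(10, 0))
group, local = decompose((13, 1), local_size=(2, 2), offset=(10, 0))
```

Here a 4 × 2 index space with F = (10, 0) yields eight global IDs, from (10, 0) to (13, 1). The work-item with global ID (13, 1) sits in work-group (1, 0) with local ID (1, 1).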
OpenCL Architecture – Execution Model
Other OpenCL jargon – more on this shortly.
Devices.
Kernels.
Program objects.
Memory objects.
Command queues.
Kernel execution commands.
Memory commands.
Synchronization commands.
OpenCL Architecture – Memory Model

Each work-item has access to global, local, private and constant memories.

Identified in kernel code by address space qualifiers (__global, __constant, __local etc.).

Relaxed consistency semantics.

Local memory is consistent across work-items in a single work-group at a work-group barrier.

Global memory is consistent across work-items in a single work-group at a work-group barrier.

No guarantees across work-groups.

Memory consistency for memory objects shared between enqueued commands is enforced at a synchronization point. (What's a synchronization point?)
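
The barrier rule can be illustrated with ordinary Python threads. This is an analogy only: `run_work_group` is a hypothetical helper, the lists stand in for local and global memory, and `threading.Barrier` plays the role that `barrier(CLK_LOCAL_MEM_FENCE)` plays inside an OpenCL C kernel.

```python
import threading


def run_work_group(values):
    """Emulate one work-group; each thread is a work-item with its own local ID."""
    size = len(values)
    local_mem = [None] * size          # stands in for __local memory
    result = [None] * size             # stands in for a __global output buffer
    barrier = threading.Barrier(size)  # stands in for a work-group barrier

    def work_item(local_id):
        # Phase 1: each work-item writes its own slot in local memory.
        local_mem[local_id] = values[local_id]
        # No work-item proceeds until all of them have reached this point,
        # so every phase-1 write is visible in phase 2.
        barrier.wait()
        # Phase 2: read the slot written by the neighbouring work-item.
        neighbour = (local_id + 1) % size
        result[local_id] = local_mem[local_id] + local_mem[neighbour]

    threads = [threading.Thread(target=work_item, args=(i,)) for i in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


sums = run_work_group([1, 2, 3, 4])
```

`sums` is `[3, 5, 7, 5]`. Without the barrier, a work-item could read its neighbour's slot before it had been written. Nothing in this sketch synchronizes two different work-groups, which matches the "no guarantees across work-groups" rule.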
