WOLFRAM WHITE PAPER

Heterogeneous Computing in Mathematica 8
®

Introduction
The past few years have seen an increase in the number of cores on both the CPU and GPU. Despite this, developers have not been able to fully utilize the parallel frameworks. As developers began to target the CPU and GPU frameworks, they realized that some algorithms map nicely onto the CPU while others work best with the GPU. As a result, much research has been done to develop algorithms that exploit both the CPU’s and GPU’s capabilities. The research culminated in a heterogeneous school of thought, with both the CPU and GPU collaborating to maximize the accuracy and speed of the user’s program. In such methodology, the GPU is utilized in parts where it excels, as is the CPU. Yet while heterogeneous algorithms are ideal, no system has exposed built-in access to both the CPU and the GPU with a concise and easy-to-use syntax until Mathematica 8 which makes heterogeneous computing a reality. By providing an environment where programs can be run on either the CPU or GPU, Mathematica is the only computational system that realizes the heterogeneous message. Coupled with Mathematica’s language, comprehensive import/export support, symbolic computation, extensive field coverage, state-of-the-art visualization features, platform neutrality, and ease of use, Mathematica is ideal for heterogeneous algorithm development. This white paper is divided as follows: first, we briefly discuss what Mathematica is and why you should use it for heterogeneous computing, then we look at some of Mathematica’s multicore and GPU features, develop a dozen applications along the way, and finally discuss why Mathematica has an advantage over other systems.

A Brief Introduction to Mathematica
Mathematica is a development environment that combines a flexible programming language with a wide range of symbolic and numeric computational capabilities, production of high-quality visualizations, built-in application area packages, and a range of immediate deployment options. Combined with integration of dynamic libraries, automatic interface construction, and C code generation, Mathematica is the most sophisticated build-to-deploy environment on the market today. Mathematica 8 introduces the ability to perform computations on the GPU and thus facilitates heterogeneous computing. For developers, this new integration means native access to Mathematica’s computing abilities—creating hybrid algorithms that combine the CPU and the GPU. Some of Mathematica’s features include:

Free-form linguistic input
Mathematica’s free-form linguistic input is the ability to interpret English text as Mathematica code. It is a breakthrough in usability, making development intuitive and simple.

Multiparadigm programming language
Mathematica provides its own highly declarative functional language, as well as several different programming paradigms, such as procedural and rule-based programming. Programmers can choose their own style for writing code with minimal effort. Along with comprehensive documentation and resources, Mathematica's flexibility greatly reduces the cost of entry for new users.

Heterogeneous Computing in Mathematica 8 | 1

such as statistics. as well as real-time access to data from Wolfram|Alpha®. and deployment. Wolfram Research also offers Wolfram Workbench’.Symbolic-numeric hybrid system The principle behind Mathematica is full integration of symbolic and numeric computing capabilities.NET. Full-featured. Java. Data access and connectivity Mathematica natively supports hundreds of formats for importing and exporting. . High-performance computing Mathematica has built-in support for multicore systems. and consistent. 2 | Heterogeneous Computing in Mathematica 8 . and Oracle. MySQL. It also provides APIs for accessing many programming languages and databases. streamlined. such as C/C++. formulas. This unified representation makes Mathematica’s language and functions extremely flexible. and built-in parallel constructs make high-performance programming easy. data visualization. Scientific and technical area coverage Mathematica provides thousands of built-in functions and packages that cover a broad range of scientific and technical computing areas. documents—can be represented as symbolic entities. development. Unified data representation At the core of Mathematica is the foundational idea that everything—data. a state-of-the-art integrated development engine based on the Eclipse platform. users reap the power of a hybrid computing system without needing knowledge of specific methodologies and algorithms. graphics. All functions are carefully designed and tightly integrated with the core system. Many functions automatically utilize the power of multicore processors. Through its full automation and preprocessing mechanisms. Wolfram Research’s computational knowledge engine’. Mathematica provides a streamlined workflow. utilizing all cores on the system for optimal performance. and image processing. programs. control systems. unified development environment Through its unique interface and integrated features for computation. called expressions.

4-. Built-in code generation functionality can be used to create standalone programs for independent distribution. and browser plugins. With CPUs routinely being 2-. and Mathematica 8 made many algorithms run on multicore systems in addition to introducing GPU programming. Yet as algorithms were implemented for the GPU. With an architecture that facilitates the addition of more ALUs. CPU Multicore Integration in Mathematica Most of the built-in functions. Mathematica 7 introduced multicore programming in Mathematica. Unlike Moore’s law for the CPU—which states that the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every 18 months—GPUs have been quadrupling the number of transistors every 18 months. making the GPU ideal for certain forms of computation. Mathematica embodies the heterogeneous computing philosophy. Also availableis webMathematica ’ which allows Mathematica programs to be run on a web server. the GPU has quickly surpassed the CPU in terms of raw computational power. as well as be used in conjunction with the GPU capabilities in Mathematica 8. already make use of all cores on the system. which uses the CPU in cases where the GPU is inefficient and the GPU where speedups are desired. Mathematica provides a wide range of options for deployment. with a modest GPU able to perform over 20 times more computations per second than a CPU. Motivations for Heterogeneous Computing in Mathematica Over the past few years. it became apparent that the lack of cache and control makes some algorithms either difficult or inefficient on the GPU. This is because of the GPU architecture. or 8-cores. The GPU has quickly surpassed the CPU in terms of performance. The GPU has also transitioned from being used only for graphics and gaming to a device capable of performing general computation. image processing. which reduces cache and RAM in favor of arithmetic logical units (ALU). allowing users to choose either the CPU or GPU based on the program’s performance. and wavelets. Scalability Wolfram Research’s gridMathematica ’ allows Mathematica programs to be parallelized on many machines in cluster or grid configuration. Wolfram CDF Player’. Mathematica also exposes to the user a few ways to accelerate programs using multiple cores. like linear algebra. Those functions can be further extended to run on a cluster of machines with the use of gridMathematica.Platform-independent deployment options Through its interactive notebooks. Heterogeneous Computing in Mathematica 8 | 3 . By providing an interface for running programs on both the CPU and the GPU. software has not been able to exploit the now-common multicore features. Hence the popularization of heterogeneous computing. multicore systems have transitioned from being found only on specialty devices to commodity devices.

it does afford the user significant speed improvements. The byte code can be targeted to Mathematica’s internal virtual machine (WVM) or C. Sin@xD + Compile`$2 1 Compile`$2 F. and although it sacrifices some Mathematica features. some expressions are automatically compiled for the user—when constructing a list using a function. for example. 8x. BlockB8Compile`$2<.p. and wavelets. when it makes sense. the user passes in the input variables along with a sequence of Mathematica expressions. Compile works only on a subset of the Mathematica language. . and. based on the input usage. performs type inferencing to construct a byte code representation of the program. like linear algebra. the function is automatically created to speed up the construction. performs common subexpression elimination evaluating x 2 once. already make use of all cores on the system. cf = CompileB8x<.Most of the built-in functions. The byte code can be printed to understand what code is being generated. Compile Mathematica’s Compile command takes a sequence of inputs and expressions. Sin@xD + x2 1 x2 F CompiledFunctionB8x<. Compile`$2 = x2 . Compile is used internally by many Mathematica functions and. image processing. as well as be used in conjunction with the GPU capabilities in Mathematica 8. It can be visualized as follows. Plot@cf@xD. Compile will automatically perform code optimization—the following code. Mathematica also exposes to the user a few ways to accelerate programs using multiple cores. for example. 4 | Heterogeneous Computing in Mathematica 8 . p<D 10 5 -3 -2 -1 -5 -10 -15 1 2 3 The Compile statement generates byte code from the Mathematica program input by the user. Those functions can be further extended to run on a cluster of machines with the use of gridMathematica. -CompiledCode-F The CompiledFunction returned behaves the same as any Mathematica function. To use the Compile command.

8a. evaluationPoints = Table@a + I b. 1. then ComÖ pile will generate a C version of the Mathematica program. Compilation Target By default. Sin@xD + Compile`$3 1 Compile`$3 F. The following. BlockB8Compile`$3<. CompilationTarget Ø "C"F CompiledFunctionB8x<. without going through low-level C programming. or JVM with both WVM and C built into Mathematica 8. This allows Mathematica to run at the same speed as C. _Complex<<. but if CompilationTarget->"C" is set.005<. CompilationTarget Ø "C". Boole@Norm@FixedPoint@Ò ^ 2 + c &.5. Compile targets the WVM. Sin@xD + x2 1 x2 .5.R3 R2 = R2 + R1 + R4 Return The byte code generated can be used to target the Wolfram Virtual Machine (WVM).Needs@"CCodeGenerator`"D CompilePrint@cfD 1 argument 5 Real registers Underflow checking off Overflow checking off Integer overflow checking on RuntimeAttributes -> 8< R0 = A1 Result = R2 1 2 3 4 5 6 R1 = Square@ R0D R2 = Sin@ R0D R3 = Reciprocal@ R1D R4 = . LLVM IR. The code is executed from within Mathematica. for example. 0. OpenCL. C. SameTest Ø HNorm@ÒD ¥ 4 &LDD ¥ 4D. The following sets the evaluation points for the Mandelbrot function.1. Compile`$3 = x2 . . 0. 1000. cf = CompileB8x<. . Heterogeneous Computing in Mathematica 8 | 5 . RuntimeAttributes Ø Listable D. -CompiledCode-F Since Mathematica is a highly expressive language.1. 8b. compileMandelbrot = Compile@88c. 0. uses Compile to generate a multi-threaded C program to compute the Mandelbrot set.005<D. 0. and run it. CUDA. compile it. programs that take dozens or hundreds of lines in C code can be written in two or three lines in Mathematica and have similar performance.

1<. y. colors. The ray tracer takes a list of sphere centers. colors_. _Real<. and might not have been portable across systems. Here.And finally. sphere radii. raySpheresIntersectionColor = CompileB88centers. y_D := Module@8 xx. CompilationTarget Ø "C". 2<. It then shines a parallel ray from the ray origin and records which sphere intersects the ray and associates a color with the intersection. center. 2<. yyD D 6 | Heterogeneous Computing in Mathematica 8 . _Real. IfBv < rad. 8radii. rad.. x_. and a ray origin position. Length@xDD. Making a CompiledFunction run in parallel is simple—the user only has to pass the option RuntimeAtÖ tributes->"Listable". Length@yDDD. radii_.v rad . RuntimeAttributes Ø Listable F. yy<. res F. 0. xx. 8ii. We will compile the code into "C" by passing the CompilationTarget->"C" option. 8colors. Compile will run in as many threads as there are on the system. we implement a basic ray tracer.0<. F. we invoke the function with the evaluation points and plot the result. 0. v. _Real. Automatic vectorization Mathematica 8 added the ability to automatically parallelize Compile statements. 0. ModuleB8ray = 8x. center = centers@@iiDD. Length@centersD< F. rayTraceCompile@centers_.. 8x. res = colors@@iiDD rad . As an example.centerD2 . xx = Transpose@ConstantArray@x. required project setup. res = 80. radii. DoB rad = radii@@iiDD2 . sphere colors. we define a Mathematica function that calls the above compiled function. v = Norm@ray . ArrayPlot@compileMandelbrot@evaluationPointsDD Written in C. From there. _Real. the above would have taken many hundreds of lines of code.<<. yy = ConstantArray@y. raySpheresIntersectionColor@centers. as we will use RuntimeAttributes->"Listable" to make the code run in parallel. 8y. _Real<<.

. nyDD ê ny .width 2 height . generating 100 spheres in random positions and creating the ray origins. RandomReal@100D>. radii.1]×[0. The most basic parallel program you can write is a Monte Carlo integrator for approximating p . width 2 >F. Point@Select@pts.1]×[0.1] region. We use the Mathematica symbol ColorData to get colors that follow the "DarkBands" color scheme. those points approximately cover a quarter of a unit circle.5L. >F. Point@Select@pts. Mathematica’s parallel tools allow users to make use of all cores on their system. nxDD ê nx . x. clusters. centers = TableB:RandomRealB: RandomRealB: . numSpheresD. 810 000. the speed of this program is comparable to a program written in C. imgc = rayTraceCompile@centers. 2<D. y = height * HN@Range@0. Parallel Computing Built into the Mathematica language since Version 7 is the capability for multicore computing.height . 8Red. 30<.. Norm@ÒD § 1 &DD<. developing more sophisticated projects that execute at a fraction of the time. Graphics@8AbsolutePointSize@2D. 1 ê numSpheresD. This is done by generating uniform random points in the [0. Image@imgcD Again. nx = ny = 300. 8numSpheres<F. Mathematica’s parallel infrastructure is also set up to allow seamless scaling to networks.5L. 8Black. colors = ReplaceAll@ ColorData@"DarkBands"D êü Range@0. 1. Norm@ÒD > 1 &DD<<D p . numSpheres = 100. yD. and grids. x = width * HN@Range@0.1] region with red points having a norm less than 1. but unlike the C version. As can be seen. 4 The following shows uniform points in the Heterogeneous Computing in Mathematica 8 | 7 . The proportion 4 of points with a norm less than 1 approximates [0. width = height = 300.This constructs the scene. 2 2 radii = RandomReal@85. RGBColor Ø ListD. This runs the ray tracer and visualizes the result. our code is around 20 lines long. colors. pts = RandomReal@1. .

The Mathematica language has extensive support for list processing. 1.784934 Parallel tools allow fine-grained control on how to split the parallel tasks and how many parallel kernels to run. 82<DD § 1. which performs automatic parallelization. gridMathematica allows for fine-grained control over the scheduling of tasks.0.0D. A queuing mechanism is also available to saturate the CPU usage and achieve the best speedup. 1. n = 1 000 000. to make the above program run on multiple cores. the user just has to place Parallelize around Table. This makes writing the above Monte Carlo p-approximation trivial. 82<DD § 1. Mean@Table@If@Norm@RandomReal@1. It also contains its own management software called Wolfram Lightweight Grid’ Manager. Mean@ Parallelize@ Table@If@Norm@RandomReal@1. 8n<DD 0. By installing the gridMathematica server on a machine. 8 | Heterogeneous Computing in Mathematica 8 Similar to Mathematica’s parallel tools. etc. PBS. and it is automatically run in parallel. gridMathematica also interfaces with existing grid management software from Microsoft. giving priority to certain kernels. The parallelization has been further simplified in Mathematica 8 by introducing ParalÖ lelize. 8n<D D D 0. 0. n = 1 000 000. LFS. the machine will broadcast its availability to the master machine.0D. So. . gridMathematica gridMathematica extends Mathematica’s parallel functionality to run on homogeneous and heterogeneous networks and clusters. Sun. 0.784838 Mathematica 7 introduced parallel primitives such as ParallelMap and ParallelTable.0.

CUDALink and OpenCLLink are supported on all platforms supported by Mathematica 8—Windows. GPUs that have CUDA or OpenCL support are supported by CUDALink or OpenCLLink with Mathematica automatically determining the precision of the card and executing the appropriate function based on the maximal floating point precision. GPU Integration in Mathematica CUDALink and OpenCLLink offer a high-level interface to the GPU built on top of Mathematica’s language and development technologies. Sun. OpenCLLink supports both NVIDIA and AMD GPUs. They allow users to execute code on their GPU with minimal effort.Similar to Mathematica’s parallel tools. gridMathematica allows for fine-grained control over the scheduling of tasks. hiding unnecessary complexity associated with GPU programming. Both the AMD video driver and AMD APP SDK are needed for AMD. OpenCLLink also supports CPU implementations of OpenCL provided by either AMD or Intel. PBS. Programs written using Mathematica’s parallel tools are valid gridMathematica programs. and lowering the learning curve for GPU computation. Making GPU Programming Easy By removing repetitive and low-level GPU related tasks. Mac OS X. users experience a more productive and efficient development cycle. Mathematica makes GPU programming simple. gridMathematica also interfaces with existing grid management software from Microsoft. Mathematica is bundled with the CUDA Toolkit. LFS. It also contains its own management software called Wolfram Lightweight Grid’ Manager. and Linux. etc. and thus Mathematica is easily scalable from single multicore systems to clusters. Heterogeneous Computing in Mathematica 8 | 9 . CUDALink and OpenCLLink Support CUDALink is supported on all CUDA-enabled NVIDIA hardware. both 32. giving priority to certain kernels. with only the NVIDIA driver being needed for NVIDIA GPUs.and 64-bit. This makes the NVIDIA driver and a supported C compiler the only requirements for CUDALink programming. By fully integrating and automating the GPU’s capabilities using Mathematica.

Automation of development project management Unlike other development frameworks that require the user to manage project setup. The Mathematica memory manager handles memory transfers intelligently in the background. and more. memory and thread management are automatically handled for the user. Mathematica streamlines the whole programming process. Mathematica makes the process of GPU programming transparent and automated. Automated GPU memory and thread management A GPU program written from scratch delegates memory and thread management to the programmer. Mathematica will automatically detect and configure the required tools to make it simple to use the GPU. 10 | Heterogeneous Computing in Mathematica 8 . and better performance. Zero device configuration Mathematica automatically finds. Integration with Mathematica’s built-in capabilities Mathematica’s GPU integration provides full access to Mathematica’s native language and built-in functions. Ready-to-use applications Mathematica provides several ready-to-use GPU functions that cover a broad range of topics such as computational mathematics. configures. As a result of hiding the bookkeeping and repetitive tasks found in GPU programming. and device configuration. With Mathematica. This bookkeeping is required in addition to the need to write the GPU kernel. Users can write hybrid algorithms that use the CPU and GPU depending on the efficiency of each algorithm. allowing for simpler code. image processing. financial engineering. is not copied to the GPU until computation is needed and is flushed out when the GPU memory gets full. shorter development cycle. Memory. platform dependencies. for example. and makes GPU devices available to users.

Maximum Image3D Height Ø 2048. Setting up OpenCLLink OpenCLLink supplies functions that query the system’s GPU hardware. Global Memory Cache Line Size Ø 128. Preferred Vector Width Short Ø 1. Local Memory Size Ø 49 152. Execution Capabilities Ø 8Kernel Execution<. Infinity. Needs@"OpenCLLink`"D OpenCLQ tells whether the current hardware and system configuration support OpenCLLink:. cl_nv_d3d10_sharing. Memory Data Type Align Size Ø 128. Because Mathematica caches program compilation and makes intelligent choices about GPU memory transfer. Maximum Write Image Arguments Ø 8. cl_khr_icd. Available Ø True. Maximum Parameter Size Ø 4352. Vendor ID Ø 4318. Compute Units Ø 14. Maximum Constant Buffer Size Ø 65 536. Version Ø OpenCL 1. Through OpenCLLink. Web deployment Mathematica can be deployed onto a web server using webMathematica. Local Memory Type Ø Local. cl_khr_global_int32_base_atomics. Profile Ø FULL_PROFILE. Preferred Vector Width Integer Ø 1. Users can also scale the setup across machines and networks using gridMathematica. Maximum Clock Frequency Ø 1147. Address Bits Ø 32. Driver Version Ø 270. Preferred Vector Width Float Ø 1. Round to Nearest. Maximum Image3D Width Ø 2048. cl_nv_device_attribute_query. Global Memory Size Ø 2 780 364 800. 64<. This means that GPU computation can be initiated from devices that do not have a GPU. Command Queue Properties Ø 8Out of Order Execution.Through Mathematica’s built-in parallel programming support. Round to Zero. Preferred Vector Width Character Ø 1. Name Ø Tesla C2050 ê C2070. cl_khr_local_int32_extended_atomics. Here. Mathematica’s OpenCLLink: Integrated GPU Programming OpenCLLink is a built-in Mathematica application that provides an interface for using OpenCL within Mathematica. NaNs. users experience better execution speed versus handwritten OpenCL programs. Maximum Read Image Arguments Ø 128. Maximum Work Item Sizes Ø 81024. To use OpenCLLink operations. cl_nv_d3d9_sharing. cl_nv_compiler_options. IEEE754-2008 Fused MAD<. Round to Infinity. or mobile devices. Vendor Ø NVIDIA Corporation. Floating Point Precision Configuration Ø 8Denorms. Profiling Timer Resolution Ø 1000. cl_khr_d3d10_sharing. Extensions Ø 8cl_khr_byte_addressable_store. which do not have GPUs. cl_nv_pragma_unroll. Profiling Enabled<< Example of a report generated by OpenCLInformation. users can execute OpenCL programs from Mathematica with little effort. cl_khr_gl_sharing. Global Memory Cache Type Ø Read Write.81. OpenCLQ@D True OpenCLInformation gives information on the available OpenCL hardware. cl_khr_fp64<. This is ideal for the classroom. Maximum Image3D Depth Ø 2048. Endian Little Ø True.0 CUDA. Global Memory Cache Size Ø 229 376. Image Support Ø True. Heterogeneous Computing in Mathematica 8 | 11 . Compiler Available Ø True. Maximum Memory Allocation Size Ø 695 091 200. 1D 8Type Ø GPU. 1024. we query information about the first platform and device installed on the system. cl_khr_global_int32_extended_atomics. Maximum Work Item Dimensions Ø 3. Memory Base Address Align Ø 4096. Maximum Work Group Size Ø 1024. where the instructor may want to control the development environment. Maximum Image2D Height Ø 32 768. cl_khr_local_int32_base_atomics. Maximum Samplers Ø 16. Maximum Constant Arguments Ø 9. Core Count Ø 448. users have to first load the OpenCLLink application. cl_nv_d3d11_sharing. OpenCLInformation@1. users can launch GPU programs on different devices. Preferred Vector Width Double Ø 1. Error Correction Support Ø True. Preferred Vector Width Long Ø 1. Maximum Image2D Width Ø 4096.

816. ImageChannels@imgDD : > OpenCLFunctionLoad follows a simple syntax. mint channelsL 8 int width = dim@0D. 88_Integer.OpenCLLink Programming Programming the GPU in Mathematica is straightforward. __global mint *dim. 16<D OpenCLFunction@<>. < <". c < channels. where the first argument is the OpenCL source. 8_Integer. 88_Integer. _Integer<D Now you can apply this new OpenCL function to an image. int xIndex = get_global_idH0L. "InputOutput"<. if HxIndex < width && yIndex < heightL 8 for Hint c = 0. img = . Several things need to happen behind the scenes at this stage to load the OpenCL code into Mathematica efficiently. c++L img@index + cD = 255 . int index = channels * HxIndex + yIndex*widthL. cudaColorNegate. yIndex = get_global_idH1L. The source code is passed to OpenCLFunctionLoad and the user gets a Mathematica function as output. ImageDimensions@imgD. OpenCLColorNegate = OpenCLFunctionLoad@src. 12 | Heterogeneous Computing in Mathematica 8 . InputOutput<. 8_Integer. _Integer<. the second argument is the function name to be invoked.img@index + cD. src = " __kernel void openclColorNegateH__global mint *img. The following OpenCL program negates colors of a multichannel image. height = dim@1D. It begins with writing an OpenCL program. and the final argument is the work group size (block dimension). the third argument is a list of function parameter types. "openclColorNegate". Input<. OpenCLColorNegate@img. "Input"<.

< <". mint lengthL 8 int ii = get_global_idH0L. OpenCLLink Applications in Mathematica In this section we discuss some applications that run on the GPU using OpenCLLink. and get the results from the GPU. we perform input checking to make sure it matches the input specification. We next load the data onto the GPU (memory copies are performed in a lazy fashion to minimize memory copy overhead). In Mathematica you can perform sophisticated heterogeneous computation easily by leveraging a variety of built-in features.HReal_tL2. We then invoke a synchronization to make sure all computation has been performed.5L __kernel void blackScholesH__global Real_t * call. __global Real_t * T. The output from this step is an OpenCLFunction that behaves like any Mathematica function. __global Real_t * S. It states that the call of a European style option can be modeled by a formula implemented by the following OpenCL program. Real_t d2 = d1 . __global Real_t * X. code = " Òifdef USING_DOUBLE_PRECISIONQ Òifdef OPENCLLINK_USING_NVIDIA Òpragma OPENCL EXTENSION cl_khr_fp64 : enable Òelse ê* OPENCLLINK_USING_NVIDIA *ê Òpragma OPENCL EXTENSION cl_amd_fp64 : enable Òendif ê* OPENCLLINK_USING_NVIDIA *ê Òendif ê* USING_DOUBLE_PRECISIONQ *ê Òdefine NHxL HerfHHxLêsqrtH2. Black–Scholes Equation The Black–Scholes equation is the basis of computational finance. we have to catch it early. if Hii < lengthL 8 Real_t d1 = HlogHS@iiDêX@iiDL+HR@iiD-Q@iiD+HpowHV@iiD. When an OpenCLFunction is invoked. call@iiD = S@iiD*expH-Q@iiD*T@iiDL*NHd1L . __global Real_t * R. Real_t is a type defined by Mathematica that maps to the highest precision of the OpenCL device. Second. Since this is the most common source of errors. we compile the OpenCL function and cache it. __global Real_t * Q. we need to check to see if the arguments to OpenCLFunctionLoad are valid.First. This ensures that users are getting the best accuracy when computing.0Lê2L*T@iiDLLêHV @iiD*sqrtHT@iiDLL.0LLê2+0.V@iiD*sqrtHT@iiDL. __global Real_t * V.X@iiD*expH-R@iiD*T@iiDL*NHd2L. Heterogeneous Computing in Mathematica 8 | 13 .

8_Real.10. "Input"<. "blackScholes". "Input"<. 8_Real. 0<. 8_Real. The rest of the data is randomly generated. res@@1DD<DD 1300. Needs@"Calendar`"D numberOfOptions = Length@data@@1DDD. which we use in this example as source for our data when calculating the one-touch option.0. R. Input<. 0. numberOfOptionsD. 0<<DD.01. _Integer<D This gets the stock price for the S&P 500 from the beginning of 2010 to March 2011. Input<. This visualization function is part of Mathematica’s comprehensive support for visualization and charting. "Input"<. 1100. Input<. res = OpenCLBlackScholes@call. 8_Real. 0. numberOfOptionsD. blackScholes. 8_Real. 88_Real<. 8_Real. V = RandomReal@80. X.04<. 14 | Heterogeneous Computing in Mathematica 8 . S. plotting surfaces. numberOfOptionsD. data = Transpose@FinancialData@"SP500". 0. 8_Real. _Integer<. "Input"<. and interacting with trading charts. This uses the S&P 500 data for the spot price and the dates for the expiration values. 8_Real. S = data@@2DD.. 128D OpenCLFunction@<>.50<. One such format is Excel 2007 (XLSX). 1200. 8_Real.1 * data@@2DD. which includes. 4. R = RandomReal@80. 82011. "Input"<. 0. Input<. call = ConstantArray@0. V. The following runs the computation on the OpenCL device. We visualize the result as a Kagi chart. This data is curated by Wolfram Research and accessible via a web connection. T = HDaysBetween@Ò. 1000. 4. 882010. T. 2009 2010 2011 Computing with Data from an Excel File Mathematica supports many import and exports formats.07<. Q = RandomReal@80. "Input"<. computing bar and pie charts. 20<D & êü data@@1DDL ê 365. Input<. numberOfOptionsD. numberOfOptionsD.The following loads the above OpenCL code into Mathematica. 88_Real<. Q. 82011. 8_Real. 8_Real. KagiChart@Transpose@8data@@1DD. X = 1. 8_Real. Input<.03. OpenCLBlackScholes = OpenCLFunctionLoad@code.

1. 1. numberOfOptionsD 8OpenCLMemory@<29650>....889275... data@"Spot Price"D. 0.. 1. 8_Real. FloatD. 2. power. < <". d1.844359. 3... 0. put. "Input"<. 0.5f * V@iiD * V@iiDL * T@iiDL ê HV@iiD * sqrtHT@iiDLL. 1. 8_Real. 8_Real. "Data"DD. 1.95156.D@iiD + 0. __global Real_t * S. We allocate the data as a "float".. 8row. 0. call = OpenCLMemoryAllocate@"Float"...0LLê2+0. Transpose@rawDataD<D. d5. 1..53988. 0..35102.0.... FloatD< This retrieves the result for the call option.xlsx". "TargetPrecision" Ø "Single"D. 0.07142. 8_Real. if Hii < lengthL 8 d1 = HlogHS@iiDêX@iiDL + HR@iiD . "Input"<.0485. 1. OpenCLMemory@<29636>. put = OpenCLMemoryAllocate@"Float".38213. 1. 88_Real... 0.913892. 1. 0..< Conway’s Game of Life Conway’s Game of Life is an example of a simple two-dimensional cellular automaton.. power = powHX@iiDêS@iiD.. 1.991108. 1. numberOfOptionsD. 0. 1. 8_Real. 1. 1. __global Real_t * T. data@"Interest"D. 1. From simple rules that look only at the eight neighbors... 1. 6.. "Input"<. 1. d5 = HlogHS@iiDêX@iiDL . OpenCLMemoryGet@callD 81. OpenCLOneTouchOption@call. call@iiD = S@iiD < X@iiD ? power * NHd5L + HS@iiDêX@iiDL*NHd1L : 1. This imports the data from an Excel file and stores it in a Mathematica table. 1. 1D. numberOfOptionsD.. 1. 0. 0. "Output"<.34359.999357. Here is a basic OpenCL program that implements the Game of Life. 1. numberOfOptions = Length@data@"Spot Price"DD. 0. 1. 0.. __global Real_t * V..933998.0. 1.63052... "onetouch". 8_Real.940062. Do@data@First@rowDD = Drop@row.00517. int ii = get_global_idH0L. put@iiD = S@iiD > X @iiD ? power * NH-d5L + HS@iiDêX@iiDL*NH-d1L : 1.0914. __global Real_t * put. This calls the function. "Input"<.689786. 0.13967.5L __kernel void onetouchH__global Real_t * call.939484. "Input"<.93962. 128. 0. 0. rawData = First@Import@"dataset. This allocates memory for both the call and put result. Heterogeneous Computing in Mathematica 8 | 15 . __global Real_t * D. 1.D@iiD + 0.719849. 2*R@iiDêHV@iiD*V@iiDLL. 1.982171. data@"Volatility"D. 0. 1. 0.852829.865092. 1. 1. data@"Strike Price"D.. 1. __global Real_t * X. OpenCLOneTouchOption = OpenCLFunctionLoad@code. 1. 8_Real.850779. 1. 0. 1. 1. "Input"<. it gives rise to complicated patterns. "Output"<.113093.HR@iiD . data@"Expiration"D.. 1.. mint lengthL 8 Real_t tmp.5f * V@iiD * V@iiDL * T@iiDL ê HV@iiD * sqrtHT@iiDLL.. 1... This loads the OpenCL function into Mathematica in single precision mode. _Integer<. 0.code = " Òdefine NHxL HerfHHxLêsqrtH2. 1. 1. 1. data@"Dividend"D. 1. 1.895195. 1. __global Real_t * R.

Output<. 512<D. 8_Integer. mint width. 1<. int index = xIndex + yIndex*width. srcf = FileNameJoin@8$OpenCLLinkPath. _Integer<. outputState. gol_kernel. 88_Integer. else nxt@indexD = Hneighbrs == 3L ? 1 : 0. 816. _Integer. UpdateInterval Ø 1 ê 60 D D Many-Body Physical Systems The N-body simulation is a classic Newtonian problem. initialState = RandomChoice@80. 512DD. 512<D. 16 | Heterogeneous Computing in Mathematica 8 . curr. mint heightL 8 int xIndex = get_global_idH0L. 0. < < < if Hcurr == 1L nxt@indexD = Hneighbrs == 2 »» neighbrs == 3L ? 1 : 0. ii++L 8 if HxIndex + ii >= 0 && xIndex+ii < widthL 8 for Hjj = -1. "NBody. _Integer<D We set the initial state using random choice. Dynamic@ Refresh@ initialState = First@OpenCLGameOfLife@initialState. 8_Integer. Input<. < <". The OpenCL implementation is included as part of the Mathematica distribution. This uses Dynamic to animate the result at 60 frames per second. for Hii = -1. We set the work group size to 16×16. This loads the function using OpenCLFunctionLoad. if HxIndex < width && yIndex < heightL 8 curr = prev@indexD. jj <= 1. ArrayPlot@initialState. jj++L 8 if HyIndex+jj >= 0 && yIndex+jj < heightL neighbrs += prev@xIndex + ii + HyIndex+jjL*widthD. "gol_kernel". 8512. ii <= 1. jj. yIndex = get_global_idH1L. ImageSize Ø MediumD. 16<D OpenCLFunction@<>. 88_Integer. _Integer. We set the output state to all zeros. neighbrs = -curr. "SupportFiles". neighbrs. __global mint * nxt. "Input"<.cl"<D. neighbrs = 0. 512. outputState = ConstantArray@0. 8512. "Output"<.7. int ii.src = " __kernel void gol_kernelH__global mint * prev. OpenCLGameOfLife = OpenCLFunctionLoad@src.3< Ø 80. making 70% of the 512×512 initial state set to zero while the others are set to 1.

3D & êü OpenCLMemoryGet@posDDD This animates the result using Dynamic.0. pos = OpenCLMemoryLoad@RandomReal@512. vel. 1024D. 8Local. numParticles. _. deltaT. vel. _. NBody = OpenCLFunctionLoad@8srcf<. _Integer. newVel. 8numParticles. This plots the body points. 8Float@4D. 4<D. Note that you can pass the vector type "float4" into the OpenCL program and Mathematica handles the conversion. vel = OpenCLMemoryLoad@RandomReal@1. _Integer. _. "Float". "Float@4D"D. Take@Ò. 88Float@4D. epsSqrt. 4<D. numParticles. Heterogeneous Computing in Mathematica 8 | 17 . Input<. 8Float@4D. 256 * 4. nbody_sim. deltaT = 0. 1024D. newPos. Input<. epsSqrt.05. Graphics3D@Point@Take@Ò. numParticles. epsSqrt = 50. "Float@4D"D. 8numParticles. pos. and epsilon distance are chosen. Float. "Float"<. vel. newVel. deltaT.This loads OpenCLFunction. pos. "Output"<<. NBody@newPos. deltaT. 8"Float@4D". deltaT. epsSqrt. 256 * 4. 8numParticles. newPos = OpenCLMemoryAllocate@"Float@4D". 3D & êü OpenCLMemoryGet@posD. 256 * 4. newPos. Output<. 4<D. numParticles. "Output"<. "nbody_sim". 8numParticles. 8"Float@4D". 4<D. time step. 8Float@4D. "Input"<. "Input"<. This sets the input and output memories. Float. This calls the NBody function. UpdateInterval Ø 0DDDD Real-time animation of the N-body simulation. NBody@newPos. vel. 88"Float@4D". Output<<D The number of particles. NBody@pos. numParticles = 1024. 1024D. "Float". newVel. newVel = OpenCLMemoryAllocate@"Float@4D". 256 * 4. epsSqrt. Float<. Graphics3D@Point@ Dynamic@Refresh@ NBody@pos. 1024D. 8"Local". 256D OpenCLFunction@<>. 8"Float@4D". newVel. _.

and the function is applied to an image. in some cases it is desirable to have a powerful workstation where users can invoke OpenCL computation from within the browser or mobile devices—invoking a financial computation using the latest stock data from a smart phone. webMathematica is a solution for such scenarios. This allows heterogeneous computation to be performed on the server from within a client’s web browser. The computed image is then displayed on the screen for the user. 18 | Heterogeneous Computing in Mathematica 8 . The above shows a teaching module developed to enable students to program an OpenCL kernel without being exposed to either Mathematica syntax or host side programming. the OpenCL kernel is compiled. When a user clicks submit. an OpenCLFunction is generated. There are many possible applications for this. which allows you to deploy Mathematica programs on the web by embedding them in JavaServer Pages (JSP). for example.OpenCL on the Web with webMathematica Wolfram Research also offers webMathematica. Aside from the academic applications.

Image Processing CUDALink’s image processing capabilities can be classified into three categories. To use CUDALink. In this section. there are the pixel operators. The first is convolution. division. we show some of the built-in functionality in CUDALink as well as how to use CUDALink to program the GPU from within Mathematica. By providing the same usage syntax as OpenCLLink. CUDALink supports pixel operations on one or two images. which contains abilities such as erosion. Heterogeneous Computing in Mathematica 8 | 19 . Convolving a microscopic image with a Sobel mask to detect edges. such as adding or multiplying pixel values from two images. Finally. and it can use Mathematica’s built-in filters as the kernel. The second is morphology. lists. opening. dilation. F Multiplication of two images. -1 0 1 -2 0 2 F -1 0 1 CUDAImageConvolveB . Needs@"CUDALink`"D CUDALink’s convolution is similar to Mathematica’s ListConvolve and ImageConvolve functions. users must first load it. CUDAImageMultiplyB . or CUDA memory references. which is optimized for CUDA. and closing. These are the image multiplication. Mathematica provides CUDA support. subtraction. and addition. Mathematica is unique in enabling easy porting of CUDA applications to OpenCL and vice versa. It will operate on images.Mathematica’s CUDALink Applications In addition to OpenCL support.

0. 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 F êê MatrixForm 4 5 15 30 45 60 75 Performing matrix multiplication. matrix-vector multiplication. Here we use the Manipulate function. morphology operations are supported. Methods such as matrix-matrix multiplication. 20 | Heterogeneous Computing in Mathematica 8 . and transposing matrices are all accelerated to use the GPU.Finally. The following shows the American put option’s surface plot as the spot price and expiry vary. finding minimum and maximum elements. or Monte Carlo methods depending on the type of option selected. radiusF. 1 2 CUDADotB 3 4 5 15 30 45 60 75 15 30 45 60 75 15 30 45 60 75 1 2 3 4 5 15 30 45 60 75 1 2 3 4 5 1 2 3 4 5 1 2 3 . Financial Engineering CUDALink has built-in financial options pricing capabilities. Here. ManipulateBCUDAErosionB . Fast Fourier Transforms and Linear Algebra Users can perform Fourier transforms and various linear algebra operations with CUDALink. 9<F Construction of an interface that performs a morphological operation on an image with varying radii. which use either the analytic solution. which makes creating GUI interfaces simple. 8radius. we multiply two matrices together. the binomial solution.

int height. int yIndex = threadIdx. ii < MAX_ITERATIONS && x*x + y*y < BAILOUT.x. which is a generalization of the Mandelbrot set. c. Real_t Real_t Real_t Real_t x = ZOOM_LEVEL*Hwidthê2 . DataRange Ø 8830. < c = logH0. ii++L 8 tmp = x*x . tmp. 8height. code = " __global__ void julia_kernelHReal_t * set. 816.06. 80. _Integer.. 130<. _Integer. height< = 8512. The following implements the CUDA kernel. Range@0.x*blockDim. Complex Dynamics CUDALink enables you to easily investigate computationally intensive complex dynamics structures.yIndexL. "Barriers" Ø 100. "ZOOM_LEVEL" Ø "0. _Real. 8"StrikePrice" Ø 80. "Output"<.y. int ii.xIndexL. set@xIndex + yIndex*widthD = c. 88_Real. 130. < <". "Expiration" Ø Ò<. "Defines" Ø 8"MAX_ITERATIONS" Ø 10.2. 1D.ListPlot3D@ParallelMap@CUDAFinancialDerivative@8"American".y*y + cx.2DD. macros are used here to allow the compiler to further optimize the code. We will compute the Julia set.x + blockIdx.0"<D.45. "Option"<D A three-dimensional plot of the CUDA-evaluated American put. "InterestRate" Ø 0. 8width. x = tmp. "Time". Heterogeneous Computing in Mathematica 8 | 21 .y*blockDim. 8"CurrentPrice" Ø Range@30. 10. width<D. 10<<.02. Notice that the syntax is the same as OpenCLFunctionLoad. "julia_kernel".y + blockIdx. While we did not show macros being used in OpenCLLink... 0. This loads the CUDAFunction. "Dividend" Ø 0. Real_t cx. Real_t cyL 8 int xIndex = threadIdx.2. "Rebate" Ø 5. AxesLabel Ø 8"Stock". if HxIndex < width && yIndex < heightL 8 for Hii = 0. In this case. y = ZOOM_LEVEL*Hheightê2 . y = 2*x*y + cy.0050". 512<. The width and height are set and the output memory is allocated.<D &. "BAILOUT" Ø "4. int width. JuliaCalculate = CUDAFunctionLoad@code. we utilize parallel programming over CPUs in addition to that provided by the GPU. "Volatility" Ø 0. 16<. jset = CUDAMemoryAllocate@Real.1f + sqrtHx*x + y*yLL. "Put"<. _Real<.

2. DataRange Ø 88. _Integer<.5D. 1<<. 2<<. The following sets the function parameters. height<D. 82. c@@2DD.x + blockIdx. Dashed. Appearance Ø Graphics@8Thick. 8width. "brownianMotion". It is used in computational chemistry. 88c. c@@2DD<<<D<D. c@@1DD. ColorFunction Ø "Rainbow".2<. ImageSize Ø 50D<D Interactive computation and rendering of a Julia set. Line@888c@@1DD. ReliefPlot@Reverse ü CUDAMemoryGet@jsetD. &rngStateL. c@@2DD<. Brownian Motion Brownian motion is a very important concept in many scientific fields.0<. . ImageSize Ø 512. cudaBM = CUDAFunctionLoad@code. 80. mint pathLen. _Integer. code = " Òinclude \"curand_kernel. 82. physics. In the following code.0. 88_Real.0. The following loads the above CUDA code into Mathematica. Frame Ø None. 64.2. ii++L 8 sum += curand_normalH&rngStateL. 2<. Dashed.2. We use a low path length and path number to make it easy to see the motion path. Circle@D<. Real_t sum = 0. 2. ii < pathLen. < < <". Manipulate@ JuliaCalculate@jset. out@ii*pathN + indexD = sum. "UnmangleCode" Ø FalseD. curand_initH1234.2. Epilog Ø 8Opacity@. 8. width. Opacity@.x*blockDim. 22 | Heterogeneous Computing in Mathematica 8 . . if Hindex < pathNL 8 out@indexD = sum.75D. we use the CURAND kernel library to generate random numbers for computing sample paths of Brownian motion. for Hint ii = 1. height. index.This creates an interface using Manipulate and ReliefPlot where you can adjust the value of the constant c interactively.0<<. int index = threadIdx. and finance. Locator. 8.2<.h\" extern \"C\" __global__ void brownianMotionHReal_t *out. 0. "Output"<.x. Thick. mint pathNL 8 curandState rngState.2. 88. 8c@@1DD.

pathN<D. or deployed using Wolfram’s Computable Document Format’. pathNDDD.3<. Heterogeneous Computing in Mathematica 8 | 23 . out = ConstantArray@0. pathLen.6. . The Mathematica Advantage If you were to combine the performance computing aspects in Mathematica with the following Mathematica features. And. 8x. with little effort. The interfaces can be used to experiment with parameter values. 6. pathN.1. pathN = 16. . programs can be written so that they execute on either the CPU or GPU depending on the detected hardware.pathLen = 1024. plot of sin x y 3D plot Plot3D@y * Sin@xD. 8pathLen. Mathematica uses Wolfram|Alpha to interpret the result. res = Transpose@First@cudaBM@out. giving the output and the interpreted Mathematica program to get the output. ListLinePlot@resD 50 200 400 600 800 1000 -50 The possibilities are open for more complicated and broader applications of the GPU capabilities in Mathematica.3. 1<D Simple Interface Creation Mathematica makes it simple to create interactive user interfaces. as teaching modules. 8y. The following visualizes the result. you could develop non-trivial heterogeneous programs intuitively. Free-Form Linguistic Input Mathematica is unique in providing an avenue for users to write programs in plain English.

tF.The following creates an interface that allows users to adjust the radius and threshold parameters for the Canny edge detector. mathematics.1. and physics. "threshold"<. lines = ImageLines@EdgeDetect@imgD.5<F Broad Field Coverage By using both the CPU and the GPU. medical imaging. . a reference implementation is likely to exist. . Graphics@8Thick. Users have written code that uses both the CPU and GPU concurrently on multiple machines to solve tasks in computer vision. img = . . for example. Line êü lines<DD 24 | Heterogeneous Computing in Mathematica 8 . Show@img. This makes benchmarking and testing simple. Red. This. "radius"<. 2. ManipulateBEdgeDetectB . . 0. 88r. 88t. 10<. finds all lines in an input image. Since Mathematica has broad field coverage.06D. and making them available to the user.28. r. Mathematica embodies the heterogeneous message. 1.

WaveletImagePlot@dwdD Import/Export Mathematica has extensive support for importing and exporting data from hundreds of formats. Colorize@ImageAdjust@Image@Reverse@dataDDD. This renders the dataset as an image. 1<D. It is similar in purpose to MathLink®. the following imports the dataset from a GRIB file. the function adds one to a given integer. The C file can then be compiled and run either as a Mathematica command (for native speed). LibraryLink LibraryLink allows you to load C functions as Mathematica functions. Automatic. Heterogeneous Computing in Mathematica 8 | 25 . ColorFunction Ø "ThermometerColors"D C Code Generation Mathematica 8 introduces the ability to export expressions written using Compile to a C file.Here is another example that computes the discrete wavelet transform of an image. 8"Datasets". These formats include PNG and JPEG for images. dwd = DiscreteWaveletTransformB . 2F. This loads a C function from a library. LaTeX and EPS for typesetting. For example. and XLS and CSV for spreadsheet data. data = Import@"ExampleDataêtemperature. but by running in the same process as the Mathematica kernel. it avoids the memory transfer cost associated with MathLink. or be integrated with an external application using the Wolfram Runtime Library. We can plot the wavelet decomposition as an image pyramid. This file format is common in meteorology to store historical and forecast weather data. "Temperature".grb".

IntegerD LibraryFunction@<>. the function adds one to a given integer.718281828459045 EulerGamma 0. ConstantD &DDD 8CDefine@Catalan. To use Mathematica’s symbolic C code generation capabilities. addOne@3D 4 CUDALink and OpenCLLink are written using LibraryLink and thus are prime examples of LibraryLink’s capabilities. 2. This loads a C function from a library.915966D. ToCCodeString@sD Òdefine Òdefine Òdefine Òdefine Òdefine Òdefine Òdefine Òdefine Òdefine Catalan 0. but by running in the same process as the Mathematica kernel.61803D. MemberQ@Attributes@ÒD. addOne = LibraryFunctionLoad@"demo". CDefine@E. 0.14159D< The symbolic expression can be converted to a C string using the ToCCodeString function.68545D. It is similar in purpose to MathLink®. N@ÒDD &.5772156649015329 Glaisher 1.2824271291006226 GoldenRatio 1.618033988749895 Khinchin 2. CDefine@GoldenRatio. 3.28243D. "demo_I_I". CDefine@Pi. Needs@"SymbolicC`"D This gets all constants in the Mathematica system context and uses SymbolicC’s CDefine to declare a C macro. Here.6854520010653062 MachinePrecision 15. CDefine@Khinchin. 8Integer<. 0. users can manipulate it using standard Mathematica techniques. we convert all the macros to constant values. CDefine@EulerGamma. 0. you first need to import the SymbolicC package.9546D.577216D.71828D. Map@ToExpression. 26 | Heterogeneous Computing in Mathematica 8 . it avoids the memory transfer cost associated with MathLink. IntegerD The library function is run with the same syntax as any other function.915965594177219 Degree 0. The example presented here creates macros for common math constants and manipulates the expression to convert the macros to constant declarations. 15. CDefine@Degree. 1. 2. Symbolic C Code Using Mathematica’s symbolic capabilities. CDefine@MachinePrecision.141592653589793 By representing the C program symbolically.017453292519943295 E 2. CDefine@Glaisher. 8Integer<. users can generate C programs within Mathematica. Select@Names@"System`*"D. s = Map@CDefine@ToString@ÒD. demo_I_I.954589770191003 Pi 3. 1.0174533D.LibraryLink allows you to load C functions as Mathematica functions.

CDeclare@8const. FlattenBPermutationsB: . Heterogeneous Computing in Mathematica 8 | 27 .14159DD< Again. CAssign@Pi. 0. ToCCodeString@sD const const const const const const const const const double double double double double double double double double Catalan = 0.718281828459045.s = ReplaceAll@s. CDeclare@8const. CDeclare@8const.954589770191003. EulerGamma = 0. valDDD 8CDeclare@8const. CDefine@name_. 8FF Wolfram Research also has many technological offerings that make scaling upward and downward possible.6854520010653062. >. 1.68545DD.28243DD. "Device" Ø $KernelIDD.9546DD. the code can be converted to a C string using ToCCodeString.915966DD. 15. val_D Ø CDeclare@8"const". CAssign@GoldenRatio. allowing users to choose the most convenient cost-effective plan for their needs. 2FF>FF.577216DD. 5. CDeclare@8const. CAssign@E.915965594177219.618033988749895. CDeclare@8const. The following performs an image morphological operation on the GPU using all four GPU cards installed on a system. GraphicsGridBPartitionBParallelizeB TableBCUDAErosion@img. . you can easily write domain-specific languages that facilitate meta-programming—programs that write other programs. Using Mathematica’s symbolic code generation tools. Glaisher = 1. 0. Scalability Mathematica programs from low-end netbooks to high-end workstations and clusters. Khinchin = 2. 1.0174533DD. Mathematica facilitates scalable GPU programming.2824271291006226. E = 2. 2. :img. CAssign@MachinePrecision. double<. Degree = 0.5772156649015329. double<. CDeclare@8const. MachinePrecision = 15. double<. The Mathematica licensing is also adaptive. CDeclare@8const. 2. CAssign@Khinchin. double<. Multi-GPU programming. double<. CAssign@EulerGamma. Through our support of all GPU cards and automatic floating point precision detection.71828DD. 3.61803DD. Pi = 3. double<. double<. CDeclare@8const. double<.141592653589793. CAssign@Glaisher.017453292519943295. for example. 0. "double"<. is as simple as wrapping Parallelize around a GPU function. . CAssign@Catalan. CAssign@name. GoldenRatio = 1. CAssign@Degree. double<.

Mathematica’s advantage lies in being able to provide all these features built into the product.com/mathematica-demo Learn more about OpenCL programming in Mathematica: U. or just yourself. gridMathematica. Wolfram CDF Player. cost-effective plan for your workgroup. Mathematica and MathLink are registered trademarks of Wolfram Research.S.com/mathematica/trial Schedule a technical demo: www. EVT1102 2298613 0611. in the past few pages we wrote a dozen heterogeneous programs in diverse fields that would have been difficult to do in any other system.co. and Wolfram Lightweight Grid are trademarks of Wolfram Research. Visit us online for more information: www. Inc.wolfram.Summary Mathematica provides several key built-in technologies that allow for easy transitioning to using heterogeneous-based computing. and Canada: Europe: 1-800-WOLFRAM (965-3726) +44-(0)1993-883400 info@wolfram. including network licensing for groups.com/mathematica-purchase Recommended Next Steps Try Mathematica for free: www.wolfram. having them be portable across operating systems. and Canada (except Europe and Asia): +1-217-398-0700 info@wolfram.uk Outside U. Inc. Wolfram Workbench. and making them scalable to low-end and high-end systems.wolfram.com Asia: +81-(0)3-3518-2880 info@wolfram. webMathematica. directorate. Inc. Pricing and Licensing Information Wolfram Research offers many flexible licensing options for both organizations and individuals. As proof of the simplicity. Mathematica is not associated with Mathematica Policy Research. university. department.com info@wolfram. All other trademarks are the property of their respective owners. Inc.jp © 2011 Wolfram Research.S.CP 28 | Heterogeneous Computing in Mathematica 8 . Wolfram|Alpha is a registered trademark and computational knowledge engine is a trademark of Wolfram Alpha LLC—A Wolfram Research Company.co. You can choose a convenient. providing an intuitive interface for their use through careful functionality design.

Sign up to vote on this title
UsefulNot useful