
Intel® SDK for OpenCL* - Sample for Bitonic Sorting

User's Guide

Copyright © 2010–2012 Intel Corporation
All Rights Reserved
Document Number: 325262-002US
Revision: 1.3
World Wide Web: http://www.intel.com


Disclaimer and Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number/.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel, Intel logo, Intel Core, VTune, Xeon are trademarks of Intel Corporation in the U.S. and other countries.

* Other names and brands may be claimed as the property of others.

OpenCL and the OpenCL* logo are trademarks of Apple Inc. used by permission by Khronos.

Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation.

Copyright © 2010-2012 Intel Corporation. All rights reserved.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Contents

About Bitonic Sorting Sample
    Path
    Introduction
    Motivation
    Algorithm
OpenCL* Implementation
    Code Highlights
    Limitations
Understanding OpenCL* Performance Characteristics
    Benefits of Using Vector Data Types
    Work-Group Size Considerations
Project Structure
Reference (Native) Implementation
Controlling the Sample
References

About Bitonic Sorting Sample

The Bitonic Sorting sample illustrates implementing calculation kernels using OpenCL* C99 and parallelizing kernels by running several work-groups in parallel.

Path

Location:
    <INSTALL_DIR>samples\BitonicSort

Executable:
    Win32\Release\BitonicSort.exe – 32-bit executable
    x64\Release\BitonicSort.exe – 64-bit executable
    Win32\Debug\BitonicSort.exe – 32-bit debug executable
    x64\Debug\BitonicSort.exe – 64-bit debug executable

Introduction

This sample demonstrates how to sort an arbitrary input array of integer values with OpenCL* using Single Instruction Multiple Data (SIMD) bitonic sorting networks.

Motivation

Sorting algorithms are among the most widely used building blocks. The bitonic sorting algorithm implemented in this sample is based on the properties of bitonic sequences and the principles of so-called sorting networks. It enables efficient SIMD-style parallelism through OpenCL* vector data types. This implementation is very general, so it permits you to add <key/value> sorting with relatively low effort.

Algorithm

For an array of length 2^N*4, this algorithm completes N stages of sorting. The first stage has one pass. As a result, the kernel forms bitonic sequences of size four using a SIMD sorting network inside each item of the input array. For each successive stage, the number of passes is incremented by one and the sequence size is doubled by merging two neighboring items. For general reference on bitonic sorting networks, see [1]. For reference on sorting networks using SIMD data types, see [2].

OpenCL* Implementation

Code Highlights

The bitonic sort OpenCL* kernel in the BitonicSort.cl file performs the specified stage of each pass. Every input array item or item pair (depending on the pass number) corresponds to a unique global ID that the kernel uses for their identification. The full sorting sequence consists of repetitive kernel calls performed in the ExecuteSortKernel() function of the BitonicSort.cpp file.

Limitations

For the sake of simplicity, the current version of the sample requires an input array of 4*2^N 32-bit integer items, where N is a positive integer.
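The stage and pass structure from the Algorithm section, together with the size restriction from Limitations, can be sketched on the host side as follows. This is only an illustration of the description above, not the code of ExecuteSortKernel(): the names stage_count_for and kernel_schedule are hypothetical, and the schedule assumes one kernel invocation per pass, with stage k consisting of k passes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns N if size == 4 * 2^N for a positive
// integer N (the input restriction stated under Limitations), else 0.
std::size_t stage_count_for(std::size_t size) {
    if (size < 8 || size % 4 != 0) return 0;   // N >= 1 means at least 8 items
    std::size_t quads = size / 4;              // number of 4-integer items
    if ((quads & (quads - 1)) != 0) return 0;  // quads must be a power of two
    std::size_t n = 0;
    while (quads > 1) { quads >>= 1; ++n; }
    return n;
}

// Hypothetical helper: lists (stage, pass) pairs in execution order,
// assuming stage k has k passes as described in the Algorithm section.
std::vector<std::pair<std::size_t, std::size_t>>
kernel_schedule(std::size_t size) {
    std::vector<std::pair<std::size_t, std::size_t>> schedule;
    const std::size_t stages = stage_count_for(size);
    for (std::size_t stage = 1; stage <= stages; ++stage) {
        for (std::size_t pass = 1; pass <= stage; ++pass) {
            schedule.emplace_back(stage, pass);  // one kernel call per pass
        }
    }
    return schedule;
}
```

Under these assumptions, the default array of 1048576 items gives N = 18, so the schedule contains 18*19/2 = 171 entries. An unsupported size yields an empty schedule, which a caller could treat as an input error.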

Understanding OpenCL* Performance Characteristics

Benefits of Using Vector Data Types

This sample implements the bitonic sort algorithm using vector data types. Explicit usage of these types, such as int4 or float4, enables the following optimizations:

• You can work with quads instead of single integers. This removes unnecessary branches, saves memory bandwidth, and optimizes CPU cache usage.
• You can use a sorting network inside a single vector item during the last pass of every stage. This permits merging the last two passes together to save an extra kernel invocation per stage, thus decreasing execution overhead.

Besides the maximum possible 4x speedup brought by SIMD register usage, these optimizations bring an additional 25% speedup to the explicitly vectorized version. As a result, you get approximately 5x speedup in total.

Work-Group Size Considerations

Valid work-group sizes on Intel platforms range from 1 to 1024 elements. To achieve peak performance, use work-groups of 64-128 elements.

Project Structure

This sample project has the following structure:

• BitonicSort.cpp – the host code, with OpenCL* initialization and processing functions
• BitonicSort.cl – OpenCL* sorting kernel source code
• BitonicSort.vcproj – Microsoft Visual Studio* 2008 software project file containing all the required dependencies
• BitonicSort.vcxproj – Microsoft Visual Studio* 2010 software project file containing all the required dependencies

Reference (Native) Implementation

The reference implementation is done in the ExecuteSortReference() routine of the BitonicSort.cpp file. This is single-threaded code that performs exactly the same bitonic sort sequence as the OpenCL* code, but uses a pure scalar C nested loop.

Controlling the Sample

The sample executable is a console application. To set the sorting direction and input array size, use command line arguments. If the command line is empty, the sample uses the default values:

• Sorting direction is ascending.
• Input array size is 1048576 (2^20) items.
• Run on CPU.

The following command line arguments are available:

--h command line argument prints help information
-s <arraySize> command line argument sets the input/output array size
-d command line argument sets descending sorting direction instead of the default ascending
-g command line argument runs the sample on the Intel® Processor Graphics device

References

[1] H. W. Lang. Bitonic Sort. http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/bitonicen.htm
[2] Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, Pradeep Dubey. Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB 1(2): 1313-1324 (2008). http://portal.acm.org/citation.cfm?id=1454159.1454171&coll=GUIDE&dl=GUIDE&CFID=105910684&CFTOKEN=82233064