Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved

Similar documents
Bitonic Sorting Intel OpenCL SDK Sample Documentation

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Intel Cache Acceleration Software for Windows* Workstation

Intel SDK for OpenCL* - Sample for OpenCL* and Intel Media SDK Interoperability

How to Create a.cibd File from Mentor Xpedition for HLDRC

How to Create a.cibd/.cce File from Mentor Xpedition for HLDRC

Drive Recovery Panel

High Dynamic Range Tone Mapping Post Processing Effect Multi-Device Version

Intel Atom Processor E3800 Product Family Development Kit Based on Intel Intelligent System Extended (ISX) Form Factor Reference Design

MICHAL MROZEK ZBIGNIEW ZDANOWICZ

Theory and Practice of the Low-Power SATA Spec DevSleep

Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017

Intel Atom Processor D2000 Series and N2000 Series Embedded Application Power Guideline Addendum January 2012

Intel Atom Processor E6xx Series Embedded Application Power Guideline Addendum January 2012

Intel RealSense Depth Module D400 Series Software Calibration Tool

Intel Core TM Processor i C Embedded Application Power Guideline Addendum

Intel Core TM i7-4702ec Processor for Communications Infrastructure

INTEL PERCEPTUAL COMPUTING SDK. How To Use the Privacy Notification Tool

Krzysztof Laskowski, Intel Pavan K Lanka, Intel

Intel RealSense D400 Series Calibration Tools and API Release Notes

Using the Intel VTune Amplifier 2013 on Embedded Platforms

Installation Guide and Release Notes

Intel Cache Acceleration Software - Workstation

Desktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

2013 Intel Corporation

Intel USB 3.0 extensible Host Controller Driver

Intel Open Source HD Graphics Programmers' Reference Manual (PRM)

Intel Integrated Native Developer Experience 2015 (OS X* host)

Intel Open Source HD Graphics, Intel Iris Graphics, and Intel Iris Pro Graphics

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

Intel Stereo 3D SDK Developer s Guide. Alpha Release

Intel Manycore Platform Software Stack (Intel MPSS)

Product Change Notification

LED Manager for Intel NUC

HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF

Intel Desktop Board DZ68DB

Product Change Notification

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Product Change Notification

Product Change Notification

Product Change Notification

Intel Galileo Firmware Updater Tool

Installation Guide and Release Notes

Intel Graphics Virtualization Technology. Kevin Tian Graphics Virtualization Architect

Product Change Notification

Product Change Notification

Product Change Notification

Product Change Notification

Product Change Notification

Intel Dynamic Platform and Thermal Framework (Intel DPTF), Client Version 8.X

Product Change Notification

Product Change Notification

Intel Manageability Commander User Guide

Product Change Notification

Product Change Notification

Optimizing the operations with sparse matrices on Intel architecture

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes

Product Change Notification

Product Change Notification

Reference Boot Loader from Intel

Product Change Notification

The Intel SSD Pro 2500 Series Guide for Microsoft edrive* Activation

Intel Embedded Media and Graphics Driver v1.12 for Intel Atom Processor N2000 and D2000 Series

Intel Media Server Studio 2017 R3 Essentials Edition for Linux* Release Notes

Product Change Notification

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)

Product Change Notification

Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes

True Scale Fabric Switches Series

Installation Guide and Release Notes

Intel Media Server Studio Professional Edition for Linux*

Product Change Notification

Customizing an Android* OS with Intel Build Tool Suite for Android* v1.1 Process Guide

Software Evaluation Guide for WinZip* esources-performance-documents.html

Product Change Notification

Product Change Notification

Data Plane Development Kit

Intel Cache Acceleration Software (Intel CAS) for Linux* v2.9 (GA)

Product Change Notification

Product Change Notification

Product Change Notification

Intel Integrated Native Developer Experience 2015 Build Edition for OS X* Installation Guide and Release Notes

Product Change Notification

Product Change Notification

Product Change Notification

Product Change Notification

Product Change Notification

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Product Change Notification

Intel vpro Technology Virtual Seminar 2010

OMNI-PATH FABRIC TOPOLOGIES AND ROUTING

Product Change Notification

Product Change Notification

Product Change Notification

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes

Intel Desktop Board D945GCLF2

Software Occlusion Culling

Intel Desktop Board DG41CN

Transcription:

Intel SDK for OpenCL* Applications Sample Documentation Copyright 2010 2012 Intel Corporation All Rights Reserved Document Number: 325262-002US Revision: 1.3 World Wide Web: http://www.intel.com Document Number: 325262-002US

Disclaimer and Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number/. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel, Intel logo, Intel Core, VTune, Xeon are trademarks of Intel Corporation in the U.S. and other countries. * Other names and brands may be claimed as the property of others. OpenCL and the OpenCL* logo are trademarks of Apple Inc. used by permission by Khronos. Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation. Copyright 2010-2012 Intel Corporation. All rights reserved. 2 Document Number: 325262-002US

Optimization Notice Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Intel SDK for OpenCL* Applications Sample Documentation 3

Contents About Bitonic Sorting Sample... 5 Path... 5 Introduction... 5 Motivation... 5 Algorithm... 6 OpenCL* Implementation... 6 Code Highlights... 6 Limitations... 6 Understanding OpenCL* Performance Characteristics... 7 Benefits of Using Vector Data Types... 7 Work-Group Size Considerations... 7 Reference (Native) Implementation... 7 Project Structure... 7 Controlling the Sample... 8 References... 8 4 Document Number: 325262-002US

About Bitonic Sorting Sample The Bitonic Sorting sample illustrates implementing calculation kernels using OpenCL* C99 and parallelizing kernels by running several work-groups in parallel. Path Location Executable <INSTALL_DIR>samples\BitonicSort Win32\Release\BitonicSort.exe 32- bit executable x64\release\bitonicsort.exe 64-bit executable Win32\Debug\BitonicSort.exe 32-bit debug executable x64\debug\bitonicsort.exe 64-bit debug executable Introduction This sample demonstrates how to sort arbitrary input array of integer values with OpenCL* using Single Instruction Multiple Data (SIMD) bitonic sorting networks. This implementation is very general, so it permits you to add <key/value> sorting with relatively low effort. Motivation Sorting algorithms are among most widely used building blocks. Bitonic sorting algorithm implemented in this sample is based on properties of bitonic sequence and principles of so-called sorting networks. It enables efficient SIMD-style parallelism through OpenCL* vector data types. Intel SDK for OpenCL* Applications Sample Documentation 5

Algorithm For an array of length 2N*4, this algorithm completes N stages of sorting. The first stage has one pass. As a result, the kernel forms bitonic sequences of size four using SIMD sorting network inside each item of the input array. For each successive stage, the number of passes is incremented by one and the sequence size is doubled by merging two neighboring items. For general reference on bitonic sorting networks, see [1]. For reference on sorting networks using SIMD data types, see [2]. OpenCL* Implementation Code Highlights Bitonic sort OpenCL* kernel of BitonicSort.cl file performs the specified stage of each pass. Every input array item or item pair (depending on the pass number) corresponds to a unique global ID that the kernel uses for their identification. The full sorting sequence consists of repetitive kernel calls performed in ExecuteSortKernel() function of BitonicSort.cpp file. Limitations For the sake of simplicity, the current version of the sample requires input array of size of 4*2^N 32-bit integer items, where N is a positive integer. 6 Document Number: 325262-002US

Understanding OpenCL* Performance Characteristics Benefits of Using Vector Data Types This sample implements the bitonic sort algorithm using vector data types. Explicit usage of these types, such as int4 or float4, enables the following optimizations: You can work with quads instead of single integers. This removes unnecessary branches, saves memory bandwidth, and optimizes CPU cache usage. You can use sorting network inside a single vector item during the last pass on every stage. This permits merging two last passes together to save an extra kernel invocation per stage, thus decreasing execution overhead. Beside the maximum possible 4x speedup brought by SIMD register usage, these optimizations bring additional 25% speedup to the explicitly vectorized version. As a result, you get approximately 5x speedup in total. Work-Group Size Considerations Valid work-group sizes on Intel platforms range from 1 to 1024 elements. To achieve peak performance, use work-groups of 64-128 elements. Reference (Native) Implementation Reference implementation is done in ExecuteSortReference() routine of BitonicSort.cpp file. This is single-threaded code that performs exactly the same bitonic sort sequence as OpenCL* code, but uses pure scalar C nested loop. Project Structure This sample project has the following structure: BitonicSort.cpp - the host code, with OpenCL* initialization and processing functions BitonicSort.cl OpenCL* sorting kernel source code BitonicSort.vcproj Microsoft Visual Studio* 2008 software project file containing all the required dependencies BitonicSort.vcxproj Microsoft Visual Studio* 2010 software project file containing all the required dependencies. Intel SDK for OpenCL* Applications Sample Documentation 7

Controlling the Sample The sample executable is a console application. To set the sorting direction and input array size, use command line arguments. If the command line is empty, the sample uses the default values: Sorting direction is ascending. Input array size is 1048576(2^20) items. Run on CPU. --h command line argument prints help information -s <arraysize> command line argument setups input/output array size -d command line argument sets descending sorting direction instead of default ascending; -g run sample on the Intel Processor Graphics device. References [1] H. W. Lang. Bitonic Sort. http://www.iti.fhflensburg.de/lang/algorithmen/sortieren/bitonic/bitonicen.htm. [2] Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, Pradeep Dubey: Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB 1(2): 1313-1324 (2008) http://portal.acm.org/citation.cfm?id=1454159.1454171&coll=guide&dl=guide&cfi D=105910684&CFTOKEN=82233064 8 Document Number: 325262-002US