OpenCL 2.1 Extended Instruction Set Specification (Provisional)


Boaz Ouriel, Intel
Version 0.99, Revision 30
April 2, 2015

Copyright The Khronos Group Inc. All Rights Reserved.

This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. It or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast, or otherwise exploited in any manner without the express prior written permission of Khronos Group. You may use this specification for implementing the functionality therein, without altering or removing any trademark, copyright or other notice from the specification, but the receipt or possession of this specification does not convey any rights to reproduce, disclose, or distribute its contents, or to manufacture, use, or sell anything that it may describe, in whole or in part.

Khronos Group grants express permission to any current Promoter, Contributor or Adopter member of Khronos to copy and redistribute UNMODIFIED versions of this specification in any fashion, provided that NO CHARGE is made for the specification and the latest available update of the specification for any version of the API is used whenever possible. Such distributed specification may be reformatted AS LONG AS the contents of the specification are not changed in any way. The specification may be incorporated into a product that is sold as long as such product includes significant independent work developed by the seller. A link to the current version of this specification on the Khronos Group website should be included whenever possible with specification distributions.

Khronos Group makes no, and expressly disclaims any, representations or warranties, express or implied, regarding this specification, including, without limitation, any implied warranties of merchantability or fitness for a particular purpose or noninfringement of any intellectual property. Khronos Group makes no, and expressly disclaims any, warranties, express or implied, regarding the correctness, accuracy, completeness, timeliness, and reliability of the specification. Under no circumstances will the Khronos Group, or any of its Promoters, Contributors or Members or their respective partners, officers, directors, employees, agents, or representatives be liable for any damages, whether direct, indirect, special or consequential damages for lost revenues, lost profits, or otherwise, arising from or in connection with these materials.

Khronos, SYCL, SPIR, WebGL, EGL, COLLADA, StreamInput, OpenVX, OpenKCam, glTF, OpenKODE, OpenVG, OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL and OpenMAX DL are trademarks and WebCL is a certification mark of the Khronos Group Inc. OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks and the OpenGL ES and OpenGL SC logos are trademarks of Silicon Graphics International used under license by Khronos. All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.

REVISION HISTORY

NUMBER  DATE        DESCRIPTION          NAME
1       Aug 2014    Created              jk
29      Mar 2015    Provisional Release  jk
30      2-Apr-2015  Provisional Release  jk

Contents

1 Introduction
2 Binary Form
    Math
    Integer
    Common
    Geometric
    Relational
    Vector Data Load and Store
    Miscellaneous Vector
    Misc
    Image functions
        Image encoding
        Sampler encoding
        Image format encoding
        Image read functions
        Image write functions
        Image query functions

Contributors and Acknowledgements

Yaxun Liu, AMD
Brian Sumner, AMD
Marty Johnson, AMD
Mandana Baregheh, AMD
Andrew Richards, Codeplay
Guy Benyei, Intel
Raun Krisch, Intel
Yuan Lin, NVIDIA
Lee Howes, Qualcomm
Chihong Zang, Qualcomm
Ben Gaster, Qualcomm
Jack Liu, QUALCOMM

1 Introduction

This is the specification of the OpenCL.std.21 extended instruction set. The library is imported into a SPIR-V module in the following manner:

<ext-inst-id> OpExtInstImport "OpenCL.std.21"

The library can only be imported when Memory Model is set to OpenCL21.

2 Binary Form

This section contains the semantics and exact form of execution of OpenCL 2.1 extended instructions using the OpExtInst instruction.

In this section we use the following naming conventions:

void denotes an OpTypeVoid.
half, float and double denote an OpTypeFloat with a width of 16, 32 and 64 bits respectively.
i8, i16, i32 and i64 denote an OpTypeInt with a width of 8, 16, 32 and 64 bits respectively.
bool denotes an OpTypeBool.
size_t denotes an i32 when the Addressing Model is Physical32 and i64 when the Addressing Model is Physical64.

vector(n) denotes an OpTypeVector where n indicates the component count.
vector(n1, n2, ..., ni) abbreviates vector(n1), vector(n2), ... or vector(ni).
integer denotes i8, i16, i32 or i64.
floating-point denotes half, float, double.
pointer(storage) denotes an OpTypePointer which points to storage Storage Class.
pointer(constant) denotes an OpTypePointer with UniformConstant Storage Class.
pointer(generic) denotes an OpTypePointer with Generic Storage Class.
pointer(global) denotes an OpTypePointer with WorkgroupGlobal Storage Class.
pointer(local) denotes an OpTypePointer with WorkgroupLocal Storage Class.
pointer(private) denotes an OpTypePointer with Private Storage Class.
pointer(s1, s2, ..., si) abbreviates pointer(s1), pointer(s2), ... or pointer(si).
image denotes all types of image memory objects (see the image encoding section).
sampler denotes a SPIR-V sampler object (see the sampler encoding section).

2.1 Math

This section describes the list of external math instructions. The external math instructions are categorized into the following:

A list of instructions that have scalar or vector argument versions, and,
A list of instructions that only take scalar float arguments.

The vector versions of the math instructions operate component-wise. The description is per-component.

The math instructions are not affected by the prevailing rounding mode in the calling environment, and always return the same value as they would if called with the round to nearest even rounding mode.
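The naming conventions above lend themselves to a small checker. The following is a minimal sketch, not part of the specification: the `(scalar, count)` tuple used to model a type and all helper names are assumptions made for this illustration.

```python
# Hypothetical model of the type shorthand used in this section.
# A type is a (scalar_name, component_count) pair; a count of 1 means scalar.
FLOATING_POINT = {"half", "float", "double"}
INTEGER = {"i8", "i16", "i32", "i64"}
VECTOR_SIZES = {2, 3, 4, 8, 16}


def is_floating_point_or_vector(ty):
    """True for floating-point or vector(2,3,4,8,16) of floating-point."""
    scalar, count = ty
    return scalar in FLOATING_POINT and (count == 1 or count in VECTOR_SIZES)


def check_unary_math(result_type, x_type):
    """Rule shared by most math instructions: Result Type and x are
    floating-point (or an allowed vector of it) and are the same type."""
    return is_floating_point_or_vector(result_type) and result_type == x_type


def size_t_type(addressing_model):
    """size_t is i32 under Physical32 and i64 under Physical64."""
    return {"Physical32": "i32", "Physical64": "i64"}[addressing_model]
```

For example, `check_unary_math(("float", 4), ("float", 4))` accepts a four-component float vector, while a five-component vector or an integer type is rejected.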

acos
Compute the arc cosine of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 0 | <id> x

acosh
Compute the inverse hyperbolic cosine of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 1 | <id> x

acospi
Compute acos(x) / π.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 2 | <id> x

asin
Compute the arc sine of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 3 | <id> x
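Each entry in this section is issued through OpExtInst. As a rough sketch of how such an instruction could be laid out as 32-bit words, the code below packs the word count and opcode into the first word (as SPIR-V does) and appends the fixed fields followed by the operand ids. The opcode value 44 is an assumption taken from the "44" that accompanies some entries of this provisional revision; the function name and the numeric ids are hypothetical.

```python
OP_EXT_INST = 44  # assumption: opcode as it appears in this provisional revision


def encode_ext_inst(result_type, result_id, set_id, number, operands):
    """Return the 32-bit words of one OpExtInst instruction.

    Word 0 holds the word count in its high 16 bits and the opcode in its
    low 16 bits; the remaining words are ids, except `number`, which is the
    literal instruction number within the imported set.
    """
    word_count = 5 + len(operands)  # header + 4 fixed words + operand ids
    header = (word_count << 16) | OP_EXT_INST
    return [header, result_type, result_id, set_id, number, *operands]


# acos is instruction number 0 and takes a single operand x.
# The ids 2, 7, 1 and 5 are arbitrary placeholders.
words = encode_ext_inst(result_type=2, result_id=7, set_id=1, number=0, operands=[5])
```

With one operand the word count is 6; a two-operand instruction such as atan2 would give 7.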

asinh
Compute the inverse hyperbolic sine of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 4 | <id> x

asinpi
Compute asin(x) / π.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 5 | <id> x

atan
Compute the arc tangent of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 6 | <id> x

atan2
Compute the arc tangent of y / x.
Result Type, y and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 7 | <id> y | <id> x

atanh
Compute the hyperbolic arc tangent of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 8 | <id> x

atanpi
Compute atan(x) / π.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 9 | <id> x

atan2pi
Compute atan2(y, x) / π.
Result Type, y and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 10 | <id> y | <id> x

cbrt
Compute the cube-root of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 11 | <id> x
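The "pi" variants divide the inverse trigonometric result by π, so the result is expressed in half-turns rather than radians. A minimal double-precision sketch using Python's `math` module follows; it illustrates the definitions only and makes no claim about the accuracy a conforming implementation must reach.

```python
import math


def acospi(x):
    """acos(x) / pi."""
    return math.acos(x) / math.pi


def atanpi(x):
    """atan(x) / pi."""
    return math.atan(x) / math.pi


def atan2pi(y, x):
    """atan2(y, x) / pi; note that y comes first, as in atan2."""
    return math.atan2(y, x) / math.pi
```

For instance, `atan2pi(1.0, -1.0)` is close to 0.75, which is three quarters of a half-turn.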

ceil
Round x to integral value using the round to positive infinity rounding mode.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 12 | <id> x

copysign
Returns x with its sign changed to match the sign of y.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 13 | <id> x | <id> y

cos
Compute the cosine of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 14 | <id> x

cosh
Compute the hyperbolic cosine of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 15 | <id> x

cospi
Compute cos(π x).
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 16 | <id> x

erfc
Complementary error function of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 17 | <id> x

erf
Error function of x encountered in integrating the normal distribution.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 18 | <id> x

exp
Compute the base-e exponential of x (i.e. e^x).
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 19 | <id> x

exp2
Computes 2 raised to the power of x (i.e. 2^x).
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 20 | <id> x

exp10
Computes 10 raised to the power of x (i.e. 10^x).
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 21 | <id> x

expm1
Computes e^x - 1.0.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 22 | <id> x

fabs
Compute the absolute value of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 23 | <id> x

fdim
Compute x - y if x > y, +0 if x is less than or equal to y.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 24 | <id> x | <id> y

floor
Round x to the integral value using the round to negative infinity rounding mode.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 25 | <id> x

fma
Compute the correctly rounded floating-point representation of the sum of c with the infinitely precise product of a and b. Rounding of intermediate products shall not occur. Edge case behavior is per the IEEE standard.
Result Type, a, b and c must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 26 | <id> a | <id> b | <id> c

fmax
Returns y if x < y, otherwise it returns x. If one argument is a NaN, fmax returns the other argument. If both arguments are NaNs, fmax returns a NaN.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
Note: fmax behaves as defined by C99 and may not match the IEEE definition for maxnum with regard to signaling NaNs. Specifically, signaling NaNs may behave as quiet NaNs.
7 | 44 | <id> Result Type | Result <id> | set <id> | 27 | <id> x | <id> y
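Two points on this page benefit from a worked example: the single rounding of the fused multiply-add, and the NaN rule of the maximum function. The sketch below is illustrative only. `fma_ref` is a hypothetical reference helper that uses exact rational arithmetic, so it handles finite inputs only and does not model signed zeros or other edge cases; `fmax` mirrors the NaN handling described above.

```python
import math
from fractions import Fraction


def fma_ref(a, b, c):
    """a * b + c rounded once, computed exactly first (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fmax(x, y):
    """Larger of x and y; a single NaN argument is ignored."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return y if x < y else x


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
fused = fma_ref(a, b, -1.0)   # exact product 1 - 2**-60 survives
unfused = a * b + -1.0        # product is rounded to 1.0 before the add
```

In double precision the unfused expression loses the low-order term entirely, while the single-rounding result keeps it.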

fmin
Returns y if y < x, otherwise it returns x. If one argument is a NaN, fmin returns the other argument. If both arguments are NaNs, fmin returns a NaN.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
Note: fmin behaves as defined by C99 and may not match the IEEE definition for minnum with regard to signaling NaNs. Specifically, signaling NaNs may behave as quiet NaNs.
7 | 44 | <id> Result Type | Result <id> | set <id> | 28 | <id> x | <id> y

fmod
Modulus. Returns x - y * trunc(x / y).
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 29 | <id> x | <id> y

fract
Returns fmin(x - floor(x), 0x1.fffffep-1f). floor(x) is returned in ptr.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. ptr must be a pointer(generic) to floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type, or must be a pointer to the same type.
<id> Result Type | Result <id> | set <id> | 30 | <id> x | <id> ptr
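The fmod and fract formulas can be tried directly. This is a minimal sketch in Python doubles, assuming finite inputs; the function names are chosen for this illustration, and `fract` returns the floor as a second tuple element instead of writing through a pointer. The clamp constant is the largest single-precision value below 1.0, which keeps the fractional part strictly less than one even when `x - floor(x)` rounds up to 1.0.

```python
import math

# 0x1.fffffep-1: the largest float32 value that is still below 1.0
ONE_MINUS_ULP_F32 = float.fromhex("0x1.fffffep-1")


def fmod_ref(x, y):
    """x - y * trunc(x / y), as written in the fmod description."""
    return x - y * math.trunc(x / y)


def fract(x):
    """Return (fractional part, floor(x)) for a finite x."""
    whole = float(math.floor(x))
    return min(x - whole, ONE_MINUS_ULP_F32), whole
```

For a tiny negative input such as `-1e-20`, `x - floor(x)` rounds to exactly 1.0, and the clamp brings the result back below one.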

frexp
Extract the mantissa and exponent from x. The result holds the mantissa, and exp points to the exponent. For each component the mantissa returned is a floating-point value with magnitude in the interval [1/2, 1) or 0. Each component of x equals mantissa returned * 2^exp.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. exp must be a pointer(generic) to i32 or vector(2,3,4,8,16) of i32 values. Result Type and x operands must be of the same type. exp operand must point to an i32 with the same component count as Result Type and x operands.
<id> Result Type | Result <id> | set <id> | 31 | <id> x | <id> exp

hypot
Compute the value of the square root of x^2 + y^2 without undue overflow or underflow.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 32 | <id> x | <id> y

ilogb
Return the exponent of x as an i32 value.
Result Type must be i32 or vector(2,3,4,8,16) of i32 values. x must be floating-point or vector(2,3,4,8,16) of floating-point values. Result Type and x operands must have the same component count.
<id> Result Type | Result <id> | set <id> | 33 | <id> x
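Python's `math` module happens to follow the same conventions for these three operations, which makes a quick illustration possible. This is a sketch in double precision; the `ilogb` helper is a hypothetical name defined here for finite, non-zero inputs only.

```python
import math


def ilogb(x):
    """Exponent of a finite, non-zero x as an integer."""
    _, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return e - 1          # shift to the 1 <= |m| < 2 convention


mantissa, exponent = math.frexp(8.0)  # 8.0 == 0.5 * 2**4

# Squaring large inputs overflows to infinity, hypot does not.
naive = math.sqrt(3e200 * 3e200 + 4e200 * 4e200)
safe = math.hypot(3e200, 4e200)
```

The mantissa and exponent recombine with `math.ldexp(mantissa, exponent)`, which mirrors the relation "x equals mantissa * 2^exp" in the frexp description.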

ldexp
Multiply x by 2 to the power k.
k must be i32 or vector(2,3,4,8,16) of i32 values. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. Result Type and x operands must be of the same type. k operand must have the same component count as Result Type and x operands.
<id> Result Type | Result <id> | set <id> | 34 | <id> x | <id> k

lgamma
Log gamma function of x. Returns the natural logarithm of the absolute value of the gamma function.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 35 | <id> x

lgamma_r
Log gamma function of x. Returns the natural logarithm of the absolute value of the gamma function. The sign of the gamma function is returned in the signp operand.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. signp must be a pointer(generic) to i32 or vector(2,3,4,8,16) of i32 values. Result Type and x operands must be of the same type. signp operand must point to an i32 with the same component count as Result Type and x operands.
<id> Result Type | Result <id> | set <id> | 36 | <id> x | <id> signp

log
Compute natural logarithm of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

<id> Result Type | Result <id> | set <id> | 37 | <id> x

log2
Compute a base 2 logarithm of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 38 | <id> x

log10
Compute a base 10 logarithm of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 39 | <id> x

log1p
Compute log_e(1.0 + x).
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 40 | <id> x

logb
Compute the exponent of x, which is the integral part of log_r |x|.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 41 | <id> x

mad
mad approximates a * b + c. Whether or how the product of a * b is rounded and how supernormal or subnormal intermediate products are handled is not defined. mad is intended to be used where speed is preferred over accuracy.
Result Type, a, b and c must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
Note: For some usages, e.g. mad(a, b, -a*b), the definition of mad() is loose enough that almost any result is allowed from mad() for some values of a and b.
<id> Result Type | Result <id> | set <id> | 42 | <id> a | <id> b | <id> c

maxmag
Returns x if |x| > |y|, y if |y| > |x|, otherwise fmax(x, y).
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 43 | <id> x | <id> y

minmag
Returns x if |x| < |y|, y if |y| < |x|, otherwise fmin(x, y).
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 44 | <id> x | <id> y

modf
Decompose a floating-point number. The modf function breaks the argument x into integral and fractional parts, each of which has the same sign as the argument. It stores the integral part in the object pointed to by iptr.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. iptr must be a pointer(generic) to floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type, or must be a pointer to the same type.

<id> Result Type | Result <id> | set <id> | 45 | <id> x | <id> iptr

nan
Returns a quiet NaN. The nancode may be placed in the significand of the resulting NaN.
nancode must be i32 or vector(2,3,4,8,16) of i32 values. Result Type must be floating-point or vector(2,3,4,8,16) of floating-point values. Result Type and nancode operands must have the same component count.
<id> Result Type | Result <id> | set <id> | 46 | <id> nancode

nextafter
Computes the next representable floating-point value following x in the direction of y. Thus, if y is less than x, nextafter() returns the largest representable floating-point number less than x.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 47 | <id> x | <id> y

pow
Compute x to the power y.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 48 | <id> x | <id> y
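The stepping behaviour of nextafter can be observed with Python's `math.nextafter` (available from Python 3.9). The sketch works in double precision, so the step sizes shown are those of a 64-bit float; a 32-bit float would step by 2^-24 and 2^-23 around 1.0 instead.

```python
import math

# Largest double below 1.0: the target 0.0 is less than 1.0.
below_one = math.nextafter(1.0, 0.0)

# Smallest double above 1.0: the target 2.0 is greater than 1.0.
above_one = math.nextafter(1.0, 2.0)
```

The gap below 1.0 is half the gap above it, because the exponent changes at the power of two.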

pown
Compute x to the power y, where y is an i32 integer.
y must be i32 or vector(2,3,4,8,16) of i32 values. x must be floating-point or vector(2,3,4,8,16) of floating-point values. Result Type and x operands must be of the same type. y operand must have the same component count as Result Type and x operands.
<id> Result Type | Result <id> | set <id> | 49 | <id> x | <id> y

powr
Compute x to the power y, where x is >= 0.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 50 | <id> x | <id> y

remainder
Compute the value r such that r = x - n*y, where n is the integer nearest the exact value of x/y. If there are two integers closest to x/y, n shall be the even one. If r is zero, it is given the same sign as x.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 51 | <id> x | <id> y
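The remainder rule (nearest integer quotient, ties to even, zero result signed like x) can be sketched as follows. `remainder_ref` is a hypothetical helper for this illustration; it forms the quotient in floating point, so it is only dependable for small, well-behaved inputs. Python's `math.remainder` provides the exact IEEE operation for comparison.

```python
import math


def remainder_ref(x, y):
    """r = x - n*y with n the integer nearest x/y, ties going to even."""
    n = round(x / y)  # Python's round() breaks ties toward the even integer
    r = x - n * y
    # A zero result takes the sign of x.
    return math.copysign(0.0, x) if r == 0 else r
```

Note the contrast with fmod: for x = 7 and y = 2 the quotient 3.5 rounds to 4, so the remainder is -1, whereas the truncating modulus would give +1.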

remquo
The remquo function computes the value r such that r = x - k*y, where k is the integer nearest the exact value of x/y. If there are two integers closest to x/y, k shall be the even one. If r is zero, it is given the same sign as x. This is the same value that is returned by the remainder function. remquo also calculates the lower seven bits of the integral quotient x/y, and gives that value the same sign as x/y. It stores this signed value in the object pointed to by quo.
Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. quo must be a pointer(generic) to i32 or vector(2,3,4,8,16) of i32 values. Result Type, x and y operands must be of the same type. quo operand must point to an i32 with the same component count as Result Type, x and y operands.
<id> Result Type | Result <id> | set <id> | 52 | <id> x | <id> y | <id> quo

rint
Round x to integral value (using round to nearest even rounding mode) in floating-point format.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 53 | <id> x

rootn
Compute x to the power 1/y.
y must be i32 or vector(2,3,4,8,16) of i32 values. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. Result Type and x operands must be of the same type. y operand must have the same component count as Result Type and x operands.
<id> Result Type | Result <id> | set <id> | 54 | <id> x | <id> y
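A rough sketch of the remquo description follows. `remquo_ref` is a hypothetical helper that returns the quotient bits as a second tuple element instead of storing them through a pointer, and it forms the quotient in floating point, so it is meant for small illustrative inputs rather than as a reference implementation.

```python
import math


def remquo_ref(x, y):
    """Return (r, quo): the remainder and the signed low seven quotient bits."""
    k = round(x / y)  # nearest integer, ties toward even
    r = x - k * y
    if r == 0:
        r = math.copysign(0.0, x)  # zero remainder takes the sign of x
    quo = (abs(k) & 0x7F) * (-1 if k < 0 else 1)
    return r, quo
```

For 1000 / 3 the nearest integer quotient is 333, whose low seven bits are 77, so only part of the quotient is recoverable from quo.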

round
Return the integral value nearest to x rounding halfway cases away from zero, regardless of the current rounding direction.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 55 | <id> x

rsqrt
Compute inverse square root of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 56 | <id> x

sin
Compute sine of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 57 | <id> x

sincos
Compute sine and cosine of x. The computed sine is the return value and the computed cosine is returned in cosval.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. cosval must be a pointer(generic) to floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type, or must be a pointer to the same type.
<id> Result Type | Result <id> | set <id> | 58 | <id> x | <id> cosval
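The two round-to-integer instructions differ only on halfway cases: round moves them away from zero, while rint sends them to the even neighbour. A minimal sketch for finite inputs, with helper names chosen for this illustration:

```python
import math


def round_half_away(x):
    """Nearest integral value, halfway cases rounded away from zero."""
    t = float(math.trunc(x))
    if abs(x - t) >= 0.5:
        t += math.copysign(1.0, x)
    return t


def rint(x):
    """Nearest integral value, halfway cases rounded to even."""
    return float(round(x))  # Python's round() uses ties-to-even
```

So 2.5 becomes 3.0 under round but 2.0 under rint, and the two agree again at 3.5, where both give 4.0.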

sinh
Compute hyperbolic sine of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 59 | <id> x

sinpi
Compute sin(π x).
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 60 | <id> x

sqrt
Compute square root of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 61 | <id> x

tan
Compute tangent of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 62 | <id> x

tanh
Compute hyperbolic tangent of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 63 | <id> x

tanpi
Compute tan(π x).
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 64 | <id> x

tgamma
Compute the gamma function of x.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 65 | <id> x

trunc
Round x to integral value using the round to zero rounding mode.
Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
<id> Result Type | Result <id> | set <id> | 66 | <id> x

half_cos
Compute cosine of x, where x must be in the range -2^16 ... +2^16.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 67 | <id> x

half_divide
Compute x / y.
Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 68 | <id> x | <id> y

half_exp
Compute the base-e exponential of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 69 | <id> x
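The "10 bits of accuracy" and "8192 ulp" figures quoted for the half_ instructions are two views of the same bound: a 32-bit float stores 23 significand bits, and allowing the lowest 13 of them to be wrong gives 2^13 = 8192 units in the last place. The sketch below measures ULP distance by comparing float32 bit patterns; the helper names are assumptions for this illustration, and the distance function is only meaningful for finite values of the same sign.

```python
import struct

HALF_MAX_ULP = 2 ** (23 - 10)  # 23 stored significand bits, 10 kept accurate


def f32_bits(v):
    """Bit pattern of v after rounding to IEEE single precision."""
    return struct.unpack("<I", struct.pack("<f", v))[0]


def ulp_distance_f32(a, b):
    """Number of float32 steps between two finite values of the same sign."""
    return abs(f32_bits(a) - f32_bits(b))


def within_half_accuracy(approx, exact):
    """True if approx is within the 8192 ulp bound of exact."""
    return ulp_distance_f32(approx, exact) <= HALF_MAX_ULP
```

Near 1.0 the bound corresponds to a relative error of about 2^-10, so `1.0 + 2.0 ** -10` is just inside the tolerance for an exact result of 1.0.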

half_exp2
Compute the base-2 exponential of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 70 | <id> x

half_exp10
Compute the base-10 exponential of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 71 | <id> x

half_log
Compute natural logarithm of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 72 | <id> x

half_log2
Compute a base 2 logarithm of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 73 | <id> x

half_log10
Compute a base 10 logarithm of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 74 | <id> x

half_powr
Compute x to the power y, where x is >= 0.
Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 75 | <id> x | <id> y

half_recip
Compute reciprocal of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 76 | <id> x

half_rsqrt
Compute inverse square root of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 77 | <id> x

half_sin
Compute sine of x, where x must be in the range -2^16 ... +2^16.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 78 | <id> x

half_sqrt
Compute the square root of x.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 79 | <id> x

half_tan
Compute tangent value of x, where x must be in the range -2^16 ... +2^16.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
This function is implemented with a minimum of 10-bits of accuracy, i.e. an ULP value <= 8192 ulp. The support for denormal values is optional and may return any result allowed even when the -cl-denormals-are-zero flag is not in force.
<id> Result Type | Result <id> | set <id> | 80 | <id> x

native_cos
Compute cosine of x over an implementation-defined range. The maximum error is implementation-defined.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.
6 | 44 | <id> Result Type | Result <id> | set <id> | 81 | <id> x

native_divide
Compute x / y over an implementation-defined range. The maximum error is implementation-defined.
Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.
7 | 44 | <id> Result Type | Result <id> | set <id> | 82 | <id> x | <id> y

native_exp
Compute the base-e exponential of x over an implementation-defined range. The maximum error is implementation-defined.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.
6 | 44 | <id> Result Type | Result <id> | set <id> | 83 | <id> x

native_exp2
Compute the base-2 exponential of x over an implementation-defined range. The maximum error is implementation-defined.
Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.
The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.
6 | 44 | <id> Result Type | Result <id> | set <id> | 84 | <id> x

native_exp10

Compute the base-10 exponential of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 85 | x

native_log

Compute the natural logarithm of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 86 | x

native_log2

Compute the base-2 logarithm of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 87 | x

native_log10

Compute the base-10 logarithm of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 88 | x

native_powr

Compute x to the power y, where x is >= 0.

Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

7 | 44 | Result Type | Result | set | 89 | x | y

native_recip

Compute the reciprocal of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 90 | x

native_rsqrt

Compute the inverse square root of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 91 | x

native_sin

Compute sine of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 92 | x

native_sqrt

Compute the square root of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 93 | x

native_tan

Compute tangent value of x over an implementation-defined range. The maximum error is implementation-defined.

Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

The function may map to one or more native device instructions and will typically have better performance compared to the corresponding non-native functions. Support for denormal values is implementation-defined for this function.

6 | 44 | Result Type | Result | set | 94 | x

2.2 Integer instructions

This section describes the list of integer instructions that take scalar or vector arguments. The vector versions of the integer functions operate component-wise. The description is per-component.

s_abs

Returns |x|, where x is treated as a signed integer.

Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 141 | x

s_abs_diff

Returns |x - y| without modulo overflow, where x and y are treated as signed integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 142 | x | y
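The phrase "without modulo overflow" in s_abs_diff can be illustrated with a small sketch. It assumes the difference is formed in unbounded precision and that the result is then read as an unsigned bit pattern of the operand width, as the OpenCL C built-in abs_diff does; the helper names and the bits parameter are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def s_abs_diff(x, y, bits=32):
    """|x - y| for signed operands, computed without wraparound.

    Assumption: the result is returned as an unsigned `bits`-wide pattern,
    which always fits because the largest difference is 2**bits - 1.
    """
    diff = abs(to_signed(x, bits) - to_signed(y, bits))
    return diff & ((1 << bits) - 1)
```

For 8-bit operands, the distance between -128 and 127 is 255, which a wrapping signed subtraction could not represent.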

s_add_sat

Returns the saturated value of x + y, where x and y are treated as signed integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 143 | x | y

u_add_sat

Returns the saturated value of x + y, where x and y are treated as unsigned integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 144 | x | y

s_hadd

Returns the value of (x + y) >> 1, where x and y are treated as signed integers. The intermediate sum does not modulo overflow.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 145 | x | y

u_hadd

Returns the value of (x + y) >> 1, where x and y are treated as unsigned integers. The intermediate sum does not modulo overflow.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 146 | x | y
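The saturating and halving additions above can be modeled on scalars as follows. This is a minimal sketch that relies on Python's unbounded integers for the non-overflowing intermediate sum; the bits parameter and the to_signed helper are assumptions made for illustration, and signed results are returned as ordinary Python integers rather than bit patterns.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def s_add_sat(x, y, bits=32):
    """Signed add that clamps to the representable range instead of wrapping."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, to_signed(x, bits) + to_signed(y, bits)))


def u_add_sat(x, y, bits=32):
    """Unsigned add that clamps to the all-ones value instead of wrapping."""
    mask = (1 << bits) - 1
    return min(mask, (x & mask) + (y & mask))


def s_hadd(x, y, bits=32):
    """(x + y) >> 1 on signed operands; the sum is formed without overflow."""
    return (to_signed(x, bits) + to_signed(y, bits)) >> 1


def u_hadd(x, y, bits=32):
    """(x + y) >> 1 on unsigned operands; the sum is formed without overflow."""
    mask = (1 << bits) - 1
    return ((x & mask) + (y & mask)) >> 1
```

With 8-bit operands, u_hadd(255, 255, 8) yields 255, whereas a wrapping 8-bit sum followed by a shift would yield 127.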

s_rhadd

Returns the value of (x + y + 1) >> 1, where x and y are treated as signed integers. The intermediate sum does not modulo overflow.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 147 | x | y

u_rhadd

Returns the value of (x + y + 1) >> 1, where x and y are treated as unsigned integers. The intermediate sum does not modulo overflow.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 148 | x | y

s_clamp

Returns s_min(s_max(x, minval), maxval). Results are undefined if minval > maxval.

Result Type, x, minval and maxval must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 149 | x | minval | maxval

u_clamp

Returns u_min(u_max(x, minval), maxval). Results are undefined if minval > maxval.

Result Type, x, minval and maxval must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 150 | x | minval | maxval
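A scalar sketch of the rounding halving add and the clamp described above. The specification leaves the result undefined when minval > maxval; raising an exception in that case is only a choice made for this illustration, and the bits parameter is hypothetical.

```python
def u_rhadd(x, y, bits=32):
    """(x + y + 1) >> 1 on unsigned operands, rounding the average upward."""
    mask = (1 << bits) - 1
    return ((x & mask) + (y & mask) + 1) >> 1


def u_clamp(x, minval, maxval):
    """min(max(x, minval), maxval) for operands already in unsigned range."""
    if minval > maxval:
        # Undefined in the specification; rejected here for clarity only.
        raise ValueError("minval must not exceed maxval")
    return min(max(x, minval), maxval)
```

Compared with the plain halving add, u_rhadd(1, 2, 8) returns 2 rather than 1 because of the added rounding term.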

clz

Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, returns the size in bits of the type of x or component type of x, if x is a vector.

Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 151 | x

ctz

Returns the count of trailing 0-bits in x. If x is 0, returns the size in bits of the type of x or component type of x, if x is a vector.

Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 152 | x

s_mad_hi

Returns mul_hi(a, b) + c, where a, b and c are treated as signed integers.

Result Type, a, b and c must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 153 | a | b | c

s_max

Returns y if x < y, otherwise it returns x, where x and y are treated as signed integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 156 | x | y
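The clz and ctz definitions above, including the zero-input case, can be sketched per component as follows. The bits parameter stands in for the component width and is an assumption of this example.

```python
def clz(x, bits=32):
    """Count leading 0-bits from the most significant bit; `bits` if x is 0."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()


def ctz(x, bits=32):
    """Count trailing 0-bits; `bits` if x is 0."""
    x &= (1 << bits) - 1
    if x == 0:
        return bits
    # x & -x isolates the lowest set bit; its position is the trailing-zero count.
    return (x & -x).bit_length() - 1
```

Both functions return the component width for a zero input, matching the special case in the descriptions.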

u_max

Returns y if x < y, otherwise it returns x, where x and y are treated as unsigned integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 157 | x | y

s_min

Returns y if y < x, otherwise it returns x, where x and y are treated as signed integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 158 | x | y

u_min

Returns y if y < x, otherwise it returns x, where x and y are treated as unsigned integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 159 | x | y

s_mul_hi

Computes x * y and returns the high half of the product of x and y, where x and y are treated as signed integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 160 | x | y
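The "high half of the product" in s_mul_hi can be read as the upper N bits of the full 2N-bit product of two N-bit operands. The sketch below models that reading with Python's unbounded integers; the bits parameter and the to_signed helper are hypothetical, and the result is returned as a signed Python integer.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def s_mul_hi(x, y, bits=32):
    """Upper `bits` bits of the signed 2*bits-wide product of x and y."""
    product = to_signed(x, bits) * to_signed(y, bits)
    # Python's >> is an arithmetic shift, so the sign is preserved.
    return product >> bits
```

For example, 2^30 * 4 = 2^32 does not fit in 32 bits, so its high half is 1 and its low half is 0.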

rotate

For each element in v, the bits are shifted left by the number of bits given by the corresponding element in i. Bits shifted off the left side of the element are shifted back in from the right.

Result Type, v and i must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 161 | v | i

s_sub_sat

Returns the saturated value of x - y, where x and y are treated as signed integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 162 | x | y

u_sub_sat

Returns the saturated value of x - y, where x and y are treated as unsigned integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 163 | x | y
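The rotate and saturating subtraction described above can be sketched on a single component as follows. Reducing the rotate count modulo the component width is an assumption of this example (the text above does not state it), and the bits parameter and to_signed helper are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def rotate(v, i, bits=32):
    """Rotate v left by i bits; bits leaving on the left re-enter on the right."""
    mask = (1 << bits) - 1
    v &= mask
    i %= bits  # assumption: the count wraps at the component width
    return ((v << i) | (v >> (bits - i))) & mask


def s_sub_sat(x, y, bits=32):
    """Signed subtract that clamps to the representable range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, to_signed(x, bits) - to_signed(y, bits)))


def u_sub_sat(x, y, bits=32):
    """Unsigned subtract that clamps at zero instead of wrapping."""
    mask = (1 << bits) - 1
    return max(0, (x & mask) - (y & mask))
```

Rotating 0x12345678 left by 8 bits moves the top byte to the bottom, giving 0x34567812.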

u_upsample

When hi and lo component type is i8: Result = ((upcast... to i16)hi << 8) | lo
When hi and lo component type is i16: Result = ((upcast... to i32)hi << 16) | lo
When hi and lo component type is i32: Result = ((upcast... to i64)hi << 32) | lo

hi and lo are treated as unsigned integers.

hi and lo must be i8, i16 or i32 or vector(2,3,4,8,16) of i8, i16 or i32 values. Result Type must be i16, i32 or i64 or vector(2,3,4,8,16) of i16, i32 or i64 values. hi and lo operands must be of the same type.

When hi and lo component type is i8, the Result Type component type must be i16. When hi and lo component type is i16, the Result Type component type must be i32. When hi and lo component type is i32, the Result Type component type must be i64. Result Type must have the same component count as hi and lo operands.

Result Type | Result | set | 164 | hi | lo

s_upsample

When hi and lo component type is i8: Result = ((upcast... to i16)hi << 8) | lo
When hi and lo component type is i16: Result = ((upcast... to i32)hi << 16) | lo
When hi and lo component type is i32: Result = ((upcast... to i64)hi << 32) | lo

hi and lo are treated as signed integers.

hi and lo must be i8, i16 or i32 or vector(2,3,4,8,16) of i8, i16 or i32 values. Result Type must be i16, i32 or i64 or vector(2,3,4,8,16) of i16, i32 or i64 values. hi and lo operands must be of the same type.

When hi and lo component type is i8, the Result Type component type must be i16. When hi and lo component type is i16, the Result Type component type must be i32. When hi and lo component type is i32, the Result Type component type must be i64. Result Type must have the same component count as hi and lo operands.

Result Type | Result | set | 165 | hi | lo

popcount

Returns the number of non-zero bits in x.

Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 166 | x

s_mad24

Multiply two 24-bit integer values x and y and add the 32-bit integer result to the 32-bit integer z. Refer to the definition of s_mul24 to see how the 24-bit integer multiplication is performed.

Result Type, x, y and z must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 167 | x | y | z

u_mad24

Multiply two 24-bit integer values x and y and add the 32-bit integer result to the 32-bit integer z. Refer to the definition of u_mul24 to see how the 24-bit integer multiplication is performed.

Result Type, x, y and z must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 168 | x | y | z

s_mul24

Multiply two 24-bit integer values x and y, where x and y are treated as signed integers. x and y are 32-bit integers but only the low 24 bits are used to perform the multiplication. s_mul24 should only be used when values in x and y are in the range [-2^23, 2^23 - 1]. If x and y are not in this range, the multiplication result is implementation-defined.

Result Type, x and y must be i32 or vector(2,3,4,8,16) of i32 values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 169 | x | y

u_mul24

Multiply two 24-bit integer values x and y, where x and y are treated as unsigned integers. x and y are 32-bit integers but only the low 24 bits are used to perform the multiplication. u_mul24 should only be used when values in x and y are in the range [0, 2^24 - 1]. If x and y are not in this range, the multiplication result is implementation-defined.

Result Type, x and y must be i32 or vector(2,3,4,8,16) of i32 values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 170 | x | y

u_abs

Returns |x|, where x is treated as an unsigned integer.

Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 201 | x

u_abs_diff

Returns |x - y| without modulo overflow, where x and y are treated as unsigned integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 202 | x | y

u_mul_hi

Computes x * y and returns the high half of the product of x and y, where x and y are treated as unsigned integers.

Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 203 | x | y
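The u_upsample, u_mul24 and u_mad24 instructions described earlier in this section can be modeled per component as below. This is a sketch under stated assumptions: the upsample shift equals the width of the source component, following the OpenCL C built-in upsample; the 24-bit product is truncated to its low 32 bits; and out-of-range inputs to u_mul24, which the specification leaves implementation-defined, are simply masked to 24 bits here. The bits parameter is hypothetical.

```python
def u_upsample(hi, lo, bits=8):
    """Concatenate hi and lo into one component of twice the width."""
    mask = (1 << bits) - 1
    # Assumption: hi is shifted by the source component width.
    return ((hi & mask) << bits) | (lo & mask)


def u_mul24(x, y):
    """Multiply the low 24 bits of x and y; keep the low 32 bits of the product."""
    return ((x & 0xFFFFFF) * (y & 0xFFFFFF)) & 0xFFFFFFFF


def u_mad24(x, y, z):
    """u_mul24(x, y) + z, wrapped to 32 bits."""
    return (u_mul24(x, y) + z) & 0xFFFFFFFF
```

For 8-bit inputs, u_upsample(0x12, 0x34) packs the two bytes into the 16-bit value 0x1234.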

u_mad_hi

Returns mul_hi(a, b) + c, where a, b and c are treated as unsigned integers.

Result Type, a, b and c must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 204 | a | b | c

2.3 Common instructions

This section describes the list of common instructions that take scalar or vector arguments. The vector versions of the common functions operate component-wise. The description is per-component. The common instructions are implemented using the round to nearest even rounding mode.

fclamp

Returns fmin(fmax(x, minval), maxval). Results are undefined if minval > maxval.

Result Type, x, minval and maxval must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 95 | x | minval | maxval

degrees

Converts radians to degrees, i.e. (180 / π) * radians.

Result Type and radians must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

Result Type | Result | set | 96 | radians

fmax_common

Returns y if x < y, otherwise it returns x. If x or y are infinite or NaN, the return values are undefined.

Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
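The fclamp and degrees formulas from this section can be written out directly. The sketch below uses Python's double-precision floats, so it illustrates the formulas only and not the rounding behavior of a device; raising on minval > maxval is a choice made for this example, since the specification leaves that case undefined.

```python
import math


def degrees(radians):
    """Convert radians to degrees as (180 / pi) * radians."""
    return (180.0 / math.pi) * radians


def fclamp(x, minval, maxval):
    """min(max(x, minval), maxval) for ordinary finite inputs."""
    if minval > maxval:
        # Undefined in the specification; rejected here for clarity only.
        raise ValueError("minval must not exceed maxval")
    return min(max(x, minval), maxval)
```

For example, fclamp(1.5, 0.0, 1.0) returns 1.0, and degrees(math.pi) is 180 up to floating-point rounding.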
