OpenCL Extended Instruction Set Specification


Boaz Ouriel, Intel
Version 1.00, Revision 2
May 15, 2017

Copyright The Khronos Group Inc. All Rights Reserved.

This specification is protected by copyright laws and contains material proprietary to the Khronos Group, Inc. It or any components may not be reproduced, republished, distributed, transmitted, displayed, broadcast, or otherwise exploited in any manner without the express prior written permission of Khronos Group. You may use this specification for implementing the functionality therein, without altering or removing any trademark, copyright or other notice from the specification, but the receipt or possession of this specification does not convey any rights to reproduce, disclose, or distribute its contents, or to manufacture, use, or sell anything that it may describe, in whole or in part.

Khronos Group grants express permission to any current Promoter, Contributor or Adopter member of Khronos to copy and redistribute UNMODIFIED versions of this specification in any fashion, provided that NO CHARGE is made for the specification and the latest available update of the specification for any version of the API is used whenever possible. Such distributed specification may be reformatted AS LONG AS the contents of the specification are not changed in any way. The specification may be incorporated into a product that is sold as long as such product includes significant independent work developed by the seller. A link to the current version of this specification on the Khronos Group website should be included whenever possible with specification distributions.

Khronos Group makes no, and expressly disclaims any, representations or warranties, express or implied, regarding this specification, including, without limitation, any implied warranties of merchantability or fitness for a particular purpose or non-infringement of any intellectual property. Khronos Group makes no, and expressly disclaims any, warranties, express or implied, regarding the correctness, accuracy, completeness, timeliness, and reliability of the specification. Under no circumstances will the Khronos Group, or any of its Promoters, Contributors or Members or their respective partners, officers, directors, employees, agents, or representatives be liable for any damages, whether direct, indirect, special or consequential damages for lost revenues, lost profits, or otherwise, arising from or in connection with these materials.

Khronos, SYCL, SPIR, WebGL, EGL, COLLADA, StreamInput, OpenVX, OpenKCam, gltf, OpenKODE, OpenVG, OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL and OpenMAX DL are trademarks and WebCL is a certification mark of the Khronos Group Inc. OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks and the OpenGL ES and OpenGL SC logos are trademarks of Silicon Graphics International used under license by Khronos. All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.

Contents

1 Introduction
2 Binary Form
    Math
    Integer
    Common
    Geometric
    Relational
    Vector Data Load and Store
    Miscellaneous Vector
    Misc
    Image encoding
    Sampler encoding
A Changes and TBD
    A.1 Changes from Version 0.99, Revision
    A.2 Changes from Version 0.99, Revision
    A.3 Changes from Version 0.99, Revision
    A.4 Changes from Version 1.00, Revision
    A.5 Changes from Version 1.00, Revision

Contributors and Acknowledgments

Brian Sumner, AMD
Mandana Baregheh, AMD
Marty Johnson, AMD
Yaxun Liu, AMD
Andrew Richards, Codeplay
Alexey Bader, Intel
Ben Ashbaugh, Intel
Guy Benyei, Intel
Raun Krisch, Intel
Yuan Lin, NVIDIA
Ben Gaster, Qualcomm
Chihong Zang, Qualcomm
Jack Liu, Qualcomm
Lee Howes, Qualcomm

1 Introduction

This is the specification of the OpenCL.std extended instruction set. The library is imported into a SPIR-V module in the following manner:

<ext-inst-id> OpExtInstImport "OpenCL.std"

The library can only be imported when Memory Model is set to OpenCL.

2 Binary Form

This section contains the semantics and exact form of execution of OpenCL extended instructions using the OpExtInst instruction.

In this section we use the following naming conventions:

void denotes an OpTypeVoid.
half, float and double denote an OpTypeFloat with a width of 16, 32 and 64 bits respectively.
i8, i16, i32 and i64 denote an OpTypeInt with a width of 8, 16, 32 and 64 bits respectively.
bool denotes an OpTypeBool.
size_t denotes an i32 when the Addressing Model is Physical32 and i64 when the Addressing Model is Physical64.

vector(n) denotes an OpTypeVector where n indicates the component count.
vector(n1, n2, ..., ni) abbreviates vector(n1), vector(n2), ... or vector(ni).
integer denotes i8, i16, i32 or i64.
floating-point denotes half, float or double.
pointer(storage) denotes an OpTypePointer which points to the storage Storage Class.
pointer(constant) denotes an OpTypePointer with UniformConstant Storage Class.
pointer(generic) denotes an OpTypePointer with Generic Storage Class.
pointer(global) denotes an OpTypePointer with CrossWorkgroup Storage Class.
pointer(local) denotes an OpTypePointer with Workgroup Storage Class.
pointer(private) denotes an OpTypePointer with Function Storage Class.
pointer(s1, s2, ..., si) abbreviates pointer(s1), pointer(s2), ... or pointer(si).
image denotes all types of image memory objects (see the image encoding section).
sampler denotes a SPIR-V sampler object (see the sampler encoding section).
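The shorthand above can be made concrete with a small sketch. The helper names (`size_t_type`, `expand`) are hypothetical and exist only for illustration; only the widths, vector sizes, and the size_t rule are taken from the conventions listed here.

```python
# Illustrative only: these helpers are not part of SPIR-V or any Khronos tool.
VECTOR_SIZES = (2, 3, 4, 8, 16)
FLOAT_WIDTHS = {"half": 16, "float": 32, "double": 64}
INT_WIDTHS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}


def size_t_type(addressing_model):
    """size_t is i32 under Physical32 and i64 under Physical64."""
    return {"Physical32": "i32", "Physical64": "i64"}[addressing_model]


def expand(scalars):
    """Every type matched by 'T or vector(2,3,4,8,16) of T' for T in scalars."""
    out = list(scalars)
    out += [f"vector({n}) of {s}" for s in scalars for n in VECTOR_SIZES]
    return out


print(size_t_type("Physical64"))
print(len(expand(FLOAT_WIDTHS)), "floating-point result types")
```

Read this way, a constraint such as "floating-point or vector(2,3,4,8,16) of floating-point values" admits three scalar types plus fifteen vector types.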

2.1 Math

This section describes the list of external math instructions. The external math instructions are categorized into the following:

A list of instructions that have scalar or vector argument versions, and,
A list of instructions that only take scalar float arguments.

The vector versions of the math instructions operate component-wise. The description is per-component.

The math instructions are not affected by the prevailing rounding mode in the calling environment, and always return the same value as they would if called with the round to nearest even rounding mode.

acos
Compute the arc cosine of x. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

acosh
Compute the inverse hyperbolic cosine of x. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

acospi
Compute acos(x) / π. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

asin
Compute the arc sine of x. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

asinh
Compute the inverse hyperbolic sine of x. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

asinpi
Compute asin(x) / π. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

atan
Compute the arc tangent of x. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

atan2
Compute the arc tangent of y / x. Result is an angle in radians. Result Type, y and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

atanh
Compute the hyperbolic arc tangent of x. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

atanpi
Compute atan(x) / π. Result is an angle in radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

atan2pi
Compute atan2(y, x) / π. Result is an angle in radians. Result Type, y and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
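As a rough scalar model of the "pi" variants defined so far, the sketch below divides the ordinary inverse trigonometric results by π using Python's math module. It works in double precision only and says nothing about the accuracy an implementation must deliver; the function names simply mirror the instruction names.

```python
import math


def acospi(x):
    return math.acos(x) / math.pi


def asinpi(x):
    return math.asin(x) / math.pi


def atanpi(x):
    return math.atan(x) / math.pi


def atan2pi(y, x):
    # Operand order follows atan2: y first, then x.
    return math.atan2(y, x) / math.pi


print(acospi(-1.0), asinpi(1.0), atan2pi(0.0, -1.0))
```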

cbrt
Compute the cube-root of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

ceil
Round x to integral value using the round to positive infinity rounding mode. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

copysign
Returns x with its sign changed to match the sign of y. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

cos
Compute the cosine of x radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

cosh
Compute the hyperbolic cosine of x radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

cospi
Compute cos(π x). Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

erfc
Complementary error function of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

erf
Error function of x encountered in integrating the normal distribution. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

exp
Compute the base-e exponential of x (i.e. e^x). Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

exp2
Computes 2 raised to the power of x (i.e. 2^x). Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

exp10
Computes 10 raised to the power of x (i.e. 10^x). Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

expm1
Computes e^x - 1.0. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
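The exponential family can be modelled in a few lines. The sketch below is a double-precision illustration in Python, not a statement about required accuracy; `exp2_ref`, `exp10_ref` and `expm1_naive` are names made up for this example. It also shows why a dedicated expm1 is useful: for tiny x, computing e^x first and then subtracting 1.0 loses the whole result to rounding.

```python
import math


def exp2_ref(x):
    return 2.0 ** x


def exp10_ref(x):
    return 10.0 ** x


def expm1_naive(x):
    # Rounds e^x to the nearest double before subtracting, so small x vanish.
    return math.exp(x) - 1.0


tiny = 1e-20
print(expm1_naive(tiny))   # the information in x is lost
print(math.expm1(tiny))    # stays close to x
```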

fabs
Compute the absolute value of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

fdim
Compute x - y if x > y, +0 if x is less than or equal to y. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

floor
Round x to the integral value using the round to negative infinity rounding mode. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

fma
Compute the correctly rounded floating-point representation of the sum of c with the infinitely precise product of a and b. Rounding of intermediate products shall not occur. Edge case behavior is per the IEEE standard. Result Type, a, b and c must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
Operands: a, b, c
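The single-rounding requirement of fma can be illustrated with exact rational arithmetic. The sketch below is an assumption-laden model for finite double-precision inputs only: `fma_ref` and `fdim_ref` are hypothetical names, and the exact product and sum are formed with `fractions.Fraction` and rounded once on conversion back to float.

```python
from fractions import Fraction


def fma_ref(a, b, c):
    # Exact a*b + c, rounded a single time when converted back to float.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fdim_ref(x, y):
    # x - y if x > y, otherwise +0.
    return x - y if x > y else 0.0


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
print(a * b + c)         # product is rounded to 1.0 first, so the sum is 0.0
print(fma_ref(a, b, c))  # keeps the -2**-60 term
```

The two results differ because the exact product 1 - 2^-60 is not representable in double precision and rounds to 1.0 when the multiplication is performed separately.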

fmax
Returns y if x < y, otherwise it returns x. If one argument is a NaN, fmax returns the other argument. If both arguments are NaNs, fmax returns a NaN. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
Note: fmax behaves as defined by C99 and may not match the IEEE definition for maxnum with regard to signaling NaNs. Specifically, signaling NaNs may behave as quiet NaNs.

fmin
Returns y if y < x, otherwise it returns x. If one argument is a NaN, fmin returns the other argument. If both arguments are NaNs, fmin returns a NaN. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
Note: fmin behaves as defined by C99 and may not match the IEEE definition for minnum with regard to signaling NaNs. Specifically, signaling NaNs may behave as quiet NaNs.

fmod
Modulus. Returns x - y * trunc(x / y). Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
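The NaN rules for fmax and fmin and the fmod formula translate directly into a scalar sketch. The `*_ref` names are illustrative, and the fmod version applies the formula literally in double precision, so it is only trustworthy for modest quotients where x / y and the product are exact enough.

```python
import math


def fmax_ref(x, y):
    if math.isnan(x):
        return y            # also yields NaN when both are NaN
    if math.isnan(y):
        return x
    return y if x < y else x


def fmin_ref(x, y):
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return y if y < x else x


def fmod_ref(x, y):
    # Literal reading of x - y * trunc(x / y); the result takes the sign of x.
    return x - y * math.trunc(x / y)


print(fmax_ref(float("nan"), 1.0), fmin_ref(2.0, float("nan")))
print(fmod_ref(5.5, 2.0), fmod_ref(-5.5, 2.0))
```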

fract
Returns fmin(x - floor(x), 0x1.fffffep-1f). floor(x) is returned in ptr. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. ptr must be a pointer(global, local, private, generic) to floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type, or must be a pointer to the same type.
Operands: x, ptr

frexp
Extract the mantissa and exponent from x. The result holds the mantissa, and exp points to the exponent. For each component the mantissa returned is a floating-point value with magnitude in the interval [1/2, 1) or 0. Each component of x equals mantissa returned * 2^exp. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. exp must be a pointer(global, local, private, generic) to i32 or vector(2,3,4,8,16) of i32 values. Result Type and x operands must be of the same type. exp operand must point to an i32 with the same component count as Result Type and x operands.
Operands: x, exp

hypot
Compute the value of the square root of x^2 + y^2 without undue overflow or underflow. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
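A scalar sketch of these three instructions follows. It models pointer outputs as extra return values, works in double precision (the clamp constant 0x1.fffffep-1 is the largest single-precision value below 1), and uses the hypothetical name `fract_ref`; `math.frexp` and `math.hypot` are the standard Python functions.

```python
import math

# Largest float32 value below 1.0, as written in the fract definition.
FRACT_MAX = float.fromhex("0x1.fffffep-1")


def fract_ref(x):
    """Return (fractional part, floor(x)); the second value models *ptr."""
    fl = math.floor(x)
    return min(x - fl, FRACT_MAX), float(fl)


print(fract_ref(2.75))
print(fract_ref(-1e-10))          # clamped so the result stays below 1.0

m, e = math.frexp(10.0)           # mantissa in [1/2, 1), x == m * 2**e
print(m, e, math.ldexp(m, e))

x, y = 3e200, 4e200
print(math.sqrt(x * x + y * y))   # intermediate squares overflow
print(math.hypot(x, y))           # stays finite
```

The clamp matters for small negative inputs, where x - floor(x) would otherwise round up to 1.0 in the narrower type.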

ilogb
Return the exponent of x as an i32 value. Result Type must be i32 or vector(2,3,4,8,16) of i32 values. x must be floating-point or vector(2,3,4,8,16) of floating-point values. x and Result Type operands must have the same component count.

ldexp
Multiply x by 2 to the power k. k must be i32 or vector(2,3,4,8,16) of i32 values. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. Result Type and x operands must be of the same type. k operand must have the same component count as Result Type and x operands.
Operands: x, k

lgamma
Log gamma function of x. Returns the natural logarithm of the absolute value of the gamma function. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
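For finite, non-zero inputs, ilogb and ldexp are near inverses of each other. The sketch below derives an ilogb-style exponent from Python's `math.frexp` (an assumption made for illustration; `ilogb_ref` is not a library function) and scales with `math.ldexp`.

```python
import math


def ilogb_ref(x):
    # frexp gives x == m * 2**e with m in [1/2, 1), so the exponent of the
    # normalized form 1.f * 2**k is k == e - 1. Finite, non-zero x only.
    return math.frexp(x)[1] - 1


print(ilogb_ref(8.0), ilogb_ref(0.1))
print(math.ldexp(1.5, 3))   # 1.5 * 2**3
```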

lgamma_r
Log gamma function of x. Returns the natural logarithm of the absolute value of the gamma function. The sign of the gamma function is returned in the signp operand. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. signp must be a pointer(global, local, private, generic) to i32 or vector(2,3,4,8,16) of i32 values. Result Type and x operands must be of the same type. signp operand must point to an i32 with the same component count as Result Type and x operands.
Operands: x, signp

log
Compute natural logarithm of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

log2
Compute a base 2 logarithm of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

log10
Compute a base 10 logarithm of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

log1p
Compute log_e(1.0 + x). Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

logb
Compute the exponent of x, which is the integral part of log_r |x|. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

mad
mad approximates a * b + c. Whether or how the product of a * b is rounded and how supernormal or subnormal intermediate products are handled is not defined. mad is intended to be used where speed is preferred over accuracy. Result Type, a, b and c must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
Note: For some usages, e.g. mad(a, b, -a*b), the definition of mad() is loose enough that almost any result is allowed from mad() for some values of a and b.
Operands: a, b, c

maxmag
Returns x if |x| > |y|, y if |y| > |x|, otherwise fmax(x, y). Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

minmag
Returns x if |x| < |y|, y if |y| < |x|, otherwise fmin(x, y). Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

modf
Decompose a floating-point number. The modf function breaks the argument x into integral and fractional parts, each of which has the same sign as the argument. It stores the integral part in the object pointed to by iptr. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. iptr must be a pointer(global, local, private, generic) to floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type, or must be a pointer to the same type.
Operands: x, iptr

nan
Returns a quiet NaN. The nancode may be placed in the significand of the resulting NaN. Result Type must be floating-point or vector(2,3,4,8,16) of floating-point values. nancode must be integer or vector(2,3,4,8,16) of integer values. Result Type and nancode operands must have the same component count. The primitive data type size of nancode and Result Type must be equal.
Operands: nancode
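The magnitude-based selection of maxmag and minmag, and the split performed by modf, can be sketched for scalars as follows. The helper names are hypothetical; `math.modf` is the standard Python function and returns the fractional part first, with the integral part standing in for the value stored through iptr.

```python
import math


def _fmax(x, y):
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return y if x < y else x


def _fmin(x, y):
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return y if y < x else x


def maxmag_ref(x, y):
    if abs(x) > abs(y):
        return x
    if abs(y) > abs(x):
        return y
    return _fmax(x, y)       # equal magnitudes (or NaN) fall through to fmax


def minmag_ref(x, y):
    if abs(x) < abs(y):
        return x
    if abs(y) < abs(x):
        return y
    return _fmin(x, y)


print(maxmag_ref(-3.0, 2.0), minmag_ref(-3.0, 2.0), maxmag_ref(-2.0, 2.0))
print(math.modf(-3.25))      # (fractional part, integral part), both negative
```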

nextafter
Computes the next representable floating-point value following x in the direction of y. Thus, if y is less than x, nextafter() returns the largest representable floating-point number less than x. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

pow
Compute x to the power y. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

pown
Compute x to the power y, where y is an i32 integer. y must be i32 or vector(2,3,4,8,16) of i32 values. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. Result Type and x operands must be of the same type. y operand must have the same component count as Result Type and x operands.

powr
Compute x to the power y, where x is >= 0. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
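To see what "next representable value" means in practice, the sketch below uses `math.nextafter`, which is available in Python 3.9 and later (an assumption about the interpreter, not about OpenCL). It operates on doubles; `pown_ref` is a hypothetical name for the integer-exponent power.

```python
import math


def pown_ref(x, n):
    # Integer exponent, floating-point base.
    return x ** int(n)


below = math.nextafter(1.0, 0.0)   # largest double less than 1.0
above = math.nextafter(1.0, 2.0)   # smallest double greater than 1.0
print(1.0 - below, above - 1.0)    # the spacing differs on each side of 1.0
print(pown_ref(2.0, -3))
```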

remainder
Compute the value r such that r = x - n*y, where n is the integer nearest the exact value of x/y. If there are two integers closest to x/y, n shall be the even one. If r is zero, it is given the same sign as x. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

remquo
The remquo function computes the value r such that r = x - k*y, where k is the integer nearest the exact value of x/y. If there are two integers closest to x/y, k shall be the even one. If r is zero, it is given the same sign as x. This is the same value that is returned by the remainder function. remquo also calculates the lower seven bits of the integral quotient x/y, and gives that value the same sign as x/y. It stores this signed value in the object pointed to by quo. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. quo must be a pointer(global, local, private, generic) to i32 or vector(2,3,4,8,16) of i32 values. Result Type, x and y operands must be of the same type. quo operand must point to an i32 with the same component count as Result Type, x and y operands.
Operands: x, y, quo

rint
Round x to integral value (using round to nearest even rounding mode) in floating-point format. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
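The remainder and remquo definitions can be followed step by step with exact rational arithmetic. The sketch below is a model for finite inputs with non-zero y; `remquo_ref` is a hypothetical name, the quotient is returned as a second value instead of being stored through quo, and `math.remainder` (Python 3.7+) is used only as a cross-check.

```python
import math
from fractions import Fraction


def remquo_ref(x, y):
    q = Fraction(x) / Fraction(y)        # exact value of x / y
    k = round(q)                         # nearest integer, ties to even
    r = float(Fraction(x) - k * Fraction(y))
    if r == 0.0:
        r = math.copysign(0.0, x)        # a zero result takes the sign of x
    low7 = abs(k) & 0x7F                 # lower seven bits of the quotient
    quo = -low7 if q < 0 else low7       # same sign as x / y
    return r, quo


print(remquo_ref(10.0, 3.0))
print(remquo_ref(5.0, 2.0))    # 2.5 ties to the even quotient 2
print(remquo_ref(-7.0, 2.0))   # -3.5 ties to the even quotient -4
print(math.remainder(-7.0, 2.0))
```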

rootn
Compute x to the power 1/y. y must be i32 or vector(2,3,4,8,16) of i32 values. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. Result Type and x operands must be of the same type. y operand must have the same component count as Result Type and x operands.

round
Return the integral value nearest to x, rounding halfway cases away from zero, regardless of the current rounding direction. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

rsqrt
Compute inverse square root of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

sin
Compute sine of x radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
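The difference between round (halfway cases away from zero) and rint (halfway cases to even) only shows up on exact ties. The sketch below models both for finite doubles; `round_ref` and `rint_ref` are hypothetical names, and Python's built-in `round` happens to use ties-to-even, which is why it serves as the rint model here.

```python
import math


def round_ref(x):
    t = math.trunc(x)                    # integral part, toward zero
    if abs(x - t) >= 0.5:                # exact for doubles
        t += 1 if x > 0 else -1
    return math.copysign(float(t), x)    # keeps the sign of a zero result


def rint_ref(x):
    return math.copysign(float(round(x)), x)


for v in (2.5, 3.5, -2.5, 0.49999999999999994):
    print(v, round_ref(v), rint_ref(v))
```

Truncating first and comparing the leftover fraction avoids the classic pitfall of `floor(x + 0.5)`, which rounds the last test value up to 1.0.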

sincos
Compute sine and cosine of x radians. The computed sine is the return value and computed cosine is returned in cosval. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. cosval must be a pointer(global, local, private, generic) to floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type, or must be a pointer to the same type.
Operands: x, cosval

sinh
Compute hyperbolic sine of x radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

sinpi
Compute sin(π x). Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
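One reason to have sinpi as its own instruction is that π is not exactly representable, so sin(π * x) evaluated naively is slightly off even at integer x. The sketch below is one possible scalar model, not the method any implementation is known to use: `sinpi_ref` is a hypothetical name, and the exact reduction with `math.fmod` is an assumption made for illustration.

```python
import math


def sinpi_ref(x):
    r = math.fmod(x, 2.0)                # exact reduction to (-2, 2)
    if r == 0.0 or abs(r) == 1.0:
        return math.copysign(0.0, x)     # integer x: the sine is exactly zero
    return math.sin(math.pi * r)


print(math.sin(math.pi * 1.0))   # small but non-zero
print(sinpi_ref(1.0))
print(sinpi_ref(0.5))
```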

sqrt
Compute square root of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

tan
Compute tangent of x radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

tanh
Compute hyperbolic tangent of x radians. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

tanpi
Compute tan(π x). Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

tgamma
Compute the gamma function of x. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

trunc
Round x to integral value using the round to zero rounding mode. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

half_cos
Compute cosine of x radians, where x must be in the range -2^16 ... +2^16. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_divide
Compute x / y. Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_exp
Compute the base-e exponential of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_exp2
Compute the base-2 exponential of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_exp10
Compute the base-10 exponential of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_log
Compute natural logarithm of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_log2
Compute a base 2 logarithm of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_log10
Compute a base 10 logarithm of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_powr
Compute x to the power y, where x is >= 0. Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_recip
Compute reciprocal of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_rsqrt
Compute inverse square root of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_sin
Compute sine of x radians, where x must be in the range -2^16 ... +2^16. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_sqrt
Compute the square root of x. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

half_tan
Compute tangent value of x radians, where x must be in the range -2^16 ... +2^16. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type.

native_cos
Compute cosine of x radians over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_divide
Compute x / y over an implementation-defined range. The maximum error is implementation-defined. Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_exp
Compute the base-e exponential of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_exp2
Compute the base-2 exponential of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_exp10
Compute the base-10 exponential of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_log
Compute natural logarithm of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_log2
Compute a base 2 logarithm of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_log10
Compute a base 10 logarithm of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_powr
Compute x to the power y, where x is >= 0. Result Type, x and y must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_recip
Compute reciprocal of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_rsqrt
Compute inverse square root of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_sin
Compute sine of x radians over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_sqrt
Compute the square root of x over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

native_tan
Compute the tangent of x radians over an implementation-defined range. The maximum error is implementation-defined. Result Type and x must be float or vector(2,3,4,8,16) of float values. All of the operands, including the Result Type operand, must be of the same type. The function may map to one or more native device instructions and will typically have better performance compared to the non-native corresponding functions. Support for denormal values is implementation-defined for this function.

2.2 Integer instructions

This section describes the list of integer instructions that take scalar or vector arguments. The vector versions of the integer functions operate component-wise. The description is per-component.

s_abs
Returns |x|, where x is treated as a signed integer. Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_abs_diff
Returns |x - y| without modulo overflow, where x and y are treated as signed integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_add_sat
Returns the saturated value of x + y, where x and y are treated as signed integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.
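The "without modulo overflow" and "saturated" wording can be illustrated with a small host-side sketch in C for the 32-bit scalar case. The helper names s_abs_diff32 and s_add_sat32 are hypothetical and not part of the instruction set; returning the absolute difference as an unsigned value is an assumption made so that the full range is representable in C.

```c
#include <stdint.h>

/* Hypothetical reference helper: |x - y| for signed 32-bit inputs.
   The subtraction is carried out on the unsigned bit patterns, so the
   difference is exact even when x - y would overflow int32_t. */
static uint32_t s_abs_diff32(int32_t x, int32_t y) {
    uint32_t ux = (uint32_t)x;
    uint32_t uy = (uint32_t)y;
    return (x > y) ? ux - uy : uy - ux;
}

/* Hypothetical reference helper: x + y clamped to the int32_t range.
   The sum is formed in a 64-bit type, where it cannot overflow. */
static int32_t s_add_sat32(int32_t x, int32_t y) {
    int64_t sum = (int64_t)x + (int64_t)y;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

A vector version would apply the same computation to each component independently.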

u_add_sat
Returns the saturated value of x + y, where x and y are treated as unsigned integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_hadd
Returns the value of (x + y) >> 1, where x and y are treated as signed integers. The intermediate sum does not modulo overflow. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_hadd
Returns the value of (x + y) >> 1, where x and y are treated as unsigned integers. The intermediate sum does not modulo overflow. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_rhadd
Returns the value of (x + y + 1) >> 1, where x and y are treated as signed integers. The intermediate sum does not modulo overflow. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.
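One common way to obtain the halving adds without a wider intermediate type is a pair of bitwise identities. The sketch below shows them for 32-bit unsigned scalars, together with a saturating add; the helper names are hypothetical, and an implementation is free to use any other method that gives the same results.

```c
#include <stdint.h>

/* (x + y) >> 1 without overflow: shared bits contribute fully,
   differing bits contribute half. */
static uint32_t u_hadd32(uint32_t x, uint32_t y) {
    return (x & y) + ((x ^ y) >> 1);
}

/* (x + y + 1) >> 1 without overflow: start from the bitwise OR and
   subtract half of the differing bits, which rounds upward. */
static uint32_t u_rhadd32(uint32_t x, uint32_t y) {
    return (x | y) - ((x ^ y) >> 1);
}

/* x + y clamped to UINT32_MAX: unsigned wrap-around is detected by
   the sum being smaller than one of the inputs. */
static uint32_t u_add_sat32(uint32_t x, uint32_t y) {
    uint32_t sum = x + y;
    return (sum < x) ? UINT32_MAX : sum;
}
```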

u_rhadd
Returns the value of (x + y + 1) >> 1, where x and y are treated as unsigned integers. The intermediate sum does not modulo overflow. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_clamp
Returns s_min(s_max(x, minval), maxval), where x, minval and maxval are treated as signed integers. Results are undefined if minval > maxval. Result Type, x, minval and maxval must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_clamp
Returns u_min(u_max(x, minval), maxval), where x, minval and maxval are treated as unsigned integers. Results are undefined if minval > maxval. Result Type, x, minval and maxval must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

clz
Returns the number of leading 0 bits in x, starting at the most significant bit position. If x is 0, returns the size in bits of the type of x or component type of x, if x is a vector. Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.
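As a minimal sketch of the clamp and leading-zero-count behavior for 32-bit scalars, the following C helpers can serve as a reference. The names are hypothetical; the assert simply makes the "undefined if minval > maxval" precondition visible on the host and is not something the specification requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper for the signed clamp:
   s_min(s_max(x, minval), maxval). */
static int32_t s_clamp32(int32_t x, int32_t minval, int32_t maxval) {
    assert(minval <= maxval);                  /* otherwise undefined */
    int32_t lo = (x < minval) ? minval : x;    /* s_max(x, minval) */
    return (maxval < lo) ? maxval : lo;        /* s_min(lo, maxval) */
}

/* Hypothetical reference helper: count leading 0 bits, scanning from
   the most significant bit. Returns 32 when x is 0. */
static uint32_t clz32(uint32_t x) {
    uint32_t n = 0;
    uint32_t mask = UINT32_C(1) << 31;
    while (mask != 0 && (x & mask) == 0) {
        n++;
        mask >>= 1;
    }
    return n;
}
```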

ctz
Returns the count of trailing 0 bits in x. If x is 0, returns the size in bits of the type of x or component type of x, if x is a vector. Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_mad_hi
Returns mul_hi(a, b) + c, where a, b and c are treated as signed integers. Result Type, a, b and c must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_mad_sat
Returns x * y + z and saturates the result, where x, y and z are treated as unsigned integers. Result Type, x, y and z must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_mad_sat
Returns x * y + z and saturates the result, where x, y and z are treated as signed integers. Result Type, x, y and z must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_max
Returns y if x < y, otherwise it returns x, where x and y are treated as signed integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_max
Returns y if x < y, otherwise it returns x, where x and y are treated as unsigned integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_min
Returns y if y < x, otherwise it returns x, where x and y are treated as signed integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_min
Returns y if y < x, otherwise it returns x, where x and y are treated as unsigned integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_mul_hi
Computes x * y and returns the high half of the product of x and y, where x and y are treated as signed integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

rotate
For each element in v, the bits are shifted left by the number of bits given by the corresponding element in i. Bits shifted off the left side of the element are shifted back in from the right. Result Type, v and i must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_sub_sat
Returns the saturated value of x - y, where x and y are treated as signed integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_sub_sat
Returns the saturated value of x - y, where x and y are treated as unsigned integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.
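The rotate and saturating-subtract descriptions can be sketched for 32-bit scalars as follows. The helper names are hypothetical. Reducing the rotate count modulo the bit width is an assumption of this sketch; it keeps the C shifts well defined and matches the usual meaning of a rotation.

```c
#include <stdint.h>

/* Hypothetical reference helper: rotate v left by i bits. Bits shifted
   out on the left re-enter on the right. The count is reduced modulo
   32 (an assumption of this sketch). */
static uint32_t rotate32(uint32_t v, uint32_t i) {
    i &= 31u;
    return (i == 0u) ? v : (v << i) | (v >> (32u - i));
}

/* x - y clamped at 0 for unsigned operands. */
static uint32_t u_sub_sat32(uint32_t x, uint32_t y) {
    return (x > y) ? x - y : 0u;
}

/* x - y clamped to the int32_t range, using a 64-bit intermediate. */
static int32_t s_sub_sat32(int32_t x, int32_t y) {
    int64_t diff = (int64_t)x - (int64_t)y;
    if (diff > INT32_MAX) return INT32_MAX;
    if (diff < INT32_MIN) return INT32_MIN;
    return (int32_t)diff;
}
```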

u_upsample
When hi and lo component type is i8: Result = ((upcast... to i16)hi << 8) | lo
When hi and lo component type is i16: Result = ((upcast... to i32)hi << 16) | lo
When hi and lo component type is i32: Result = ((upcast... to i64)hi << 32) | lo
hi and lo are treated as unsigned integers. hi and lo must be i8, i16 or i32 or vector(2,3,4,8,16) of i8, i16 or i32 values. Result Type must be i16, i32 or i64 or vector(2,3,4,8,16) of i16, i32 or i64 values. hi and lo operands must be of the same type. When hi and lo component type is i8, the Result Type component type must be i16. When hi and lo component type is i16, the Result Type component type must be i32. When hi and lo component type is i32, the Result Type component type must be i64. Result Type must have the same component count as hi and lo operands.

s_upsample
When hi and lo component type is i8: Result = ((upcast... to i16)hi << 8) | lo
When hi and lo component type is i16: Result = ((upcast... to i32)hi << 16) | lo
When hi and lo component type is i32: Result = ((upcast... to i64)hi << 32) | lo
hi and lo are treated as signed integers. hi and lo must be i8, i16 or i32 or vector(2,3,4,8,16) of i8, i16 or i32 values. Result Type must be i16, i32 or i64 or vector(2,3,4,8,16) of i16, i32 or i64 values. hi and lo operands must be of the same type. When hi and lo component type is i8, the Result Type component type must be i16. When hi and lo component type is i16, the Result Type component type must be i32. When hi and lo component type is i32, the Result Type component type must be i64. Result Type must have the same component count as hi and lo operands.

popcount
Returns the number of non-zero bits in x. Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.
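A short C sketch of the unsigned upsample and of popcount may help make the bit layout concrete. It assumes the shift amount equals the bit width of lo (8, 16 or 32), which is how the OpenCL C upsample built-in is defined; the helper names are hypothetical.

```c
#include <stdint.h>

/* hi occupies the upper half of the result and lo the lower half. */
static uint16_t u_upsample8(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

static uint32_t u_upsample16(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | lo;
}

static uint64_t u_upsample32(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

/* Count set bits by clearing the lowest set bit until none remain. */
static uint32_t popcount32(uint32_t x) {
    uint32_t n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}
```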

s_mad24
Multiply two 24-bit integer values x and y and add the 32-bit integer result to the 32-bit integer z. Refer to the definition of s_mul24 to see how the 24-bit integer multiplication is performed. Result Type, x, y and z must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_mad24
Multiply two 24-bit integer values x and y and add the 32-bit integer result to the 32-bit integer z. Refer to the definition of u_mul24 to see how the 24-bit integer multiplication is performed. Result Type, x, y and z must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

s_mul24
Multiply two 24-bit integer values x and y, where x and y are treated as signed integers. x and y are 32-bit integers but only the low-order 24 bits are used to perform the multiplication. s_mul24 should only be used when values in x and y are in the range [-2^23, 2^23 - 1]. If x and y are not in this range, the multiplication result is implementation-defined. Result Type, x and y must be i32 or vector(2,3,4,8,16) of i32 values. All of the operands, including the Result Type operand, must be of the same type.

u_mul24
Multiply two 24-bit integer values x and y, where x and y are treated as unsigned integers. x and y are 32-bit integers but only the low-order 24 bits are used to perform the multiplication. u_mul24 should only be used when values in x and y are in the range [0, 2^24 - 1]. If x and y are not in this range, the multiplication result is implementation-defined. Result Type, x and y must be i32 or vector(2,3,4,8,16) of i32 values. All of the operands, including the Result Type operand, must be of the same type.

u_abs
Returns |x|, where x is treated as an unsigned integer. Result Type and x must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_abs_diff
Returns |x - y| without modulo overflow, where x and y are treated as unsigned integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

u_mul_hi
Computes x * y and returns the high half of the product of x and y, where x and y are treated as unsigned integers. Result Type, x and y must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.
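The unsigned 24-bit multiply, the corresponding multiply-add, and the high-half multiply can be sketched in C as below. The helper names are hypothetical. Masking the inputs to 24 bits is only one possible behavior for out-of-range operands, since the result in that case is implementation-defined; wrapping the multiply-add sum to 32 bits is likewise an assumption of this sketch.

```c
#include <stdint.h>

/* Multiply the low-order 24 bits of x and y and keep the low 32 bits
   of the product. Masking is one possible out-of-range behavior. */
static uint32_t u_mul24_ref(uint32_t x, uint32_t y) {
    return (x & 0xFFFFFFu) * (y & 0xFFFFFFu);
}

/* 24-bit multiply followed by a 32-bit add (wrapping, by assumption). */
static uint32_t u_mad24_ref(uint32_t x, uint32_t y, uint32_t z) {
    return u_mul24_ref(x, y) + z;
}

/* High 32 bits of the full 64-bit product of two 32-bit operands. */
static uint32_t u_mul_hi32(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * (uint64_t)y) >> 32);
}
```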

u_mad_hi
Returns mul_hi(a, b) + c, where a, b and c are treated as unsigned integers. Result Type, a, b and c must be integer or vector(2,3,4,8,16) of integer values. All of the operands, including the Result Type operand, must be of the same type.

2.3 Common instructions

This section describes the list of common instructions that take scalar or vector arguments. The vector versions of the common functions operate component-wise. The description is per-component. The common instructions are implemented using the round to nearest even rounding mode.

fclamp
Returns fmin(fmax(x, minval), maxval). Results are undefined if minval > maxval. Result Type, x, minval and maxval must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

degrees
Converts radians to degrees, i.e. (180 / π) * radians. Result Type and radians must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

fmax_common
Returns y if x < y, otherwise it returns x. If x or y are infinite or NaN, the return values are undefined. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

fmin_common
Returns y if y < x, otherwise it returns x. If x or y are infinite or NaN, the return values are undefined. Result Type, x and y must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

mix
Returns the linear blend of x and y implemented as: x + (y - x) * a. Result Type, x, y and a must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type. Note: This function can be implemented using contractions such as mad or fma.

radians
Converts degrees to radians, i.e. (π / 180) * degrees. Result Type and degrees must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

step
Returns 0.0 if x < edge, otherwise it returns 1.0. Result Type, edge and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
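The linear blend, the step function and the angle conversions are simple enough to express directly. The following C sketch covers the scalar float case; the helper names and the PI_REF constant are hypothetical, and the sketch does not attempt to model contraction into mad or fma.

```c
/* Single-precision approximation of pi used by this sketch only. */
static const float PI_REF = 3.14159265358979f;

/* Linear blend of x and y: x + (y - x) * a. */
static float mix_ref(float x, float y, float a) {
    return x + (y - x) * a;
}

/* 0.0 below the edge, 1.0 at or above it. */
static float step_ref(float edge, float x) {
    return (x < edge) ? 0.0f : 1.0f;
}

/* (pi / 180) * degrees */
static float radians_ref(float degrees) {
    return (PI_REF / 180.0f) * degrees;
}

/* (180 / pi) * radians */
static float degrees_ref(float radians) {
    return (180.0f / PI_REF) * radians;
}
```

With a = 0 the blend returns x, and with a = 1 it returns y (up to rounding).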

smoothstep
Returns 0.0 if x <= edge0 and 1.0 if x >= edge1 and performs smooth Hermite interpolation between 0 and 1, when edge0 < x < edge1. This is equivalent to:
t = fclamp((x - edge0) / (edge1 - edge0), 0, 1);
return t * t * (3 - 2 * t);
Results are undefined if edge0 >= edge1 or if x, edge0 or edge1 is a NaN. Result Type, edge0, edge1 and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type. Note: This function can be implemented using contractions such as mad or fma.

sign
Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if x < 0. Returns 0.0 if x is a NaN. Result Type and x must be floating-point or vector(2,3,4,8,16) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.
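The Hermite interpolation and the sign rules translate almost directly into C. The sketch below is for scalar float values; the helper names are hypothetical, and it assumes ordinary IEEE 754 comparison semantics (in particular, that x != x holds only for NaN).

```c
/* Clamp x to [minval, maxval]; assumes minval <= maxval. */
static float fclamp_ref(float x, float minval, float maxval) {
    float lo = (x < minval) ? minval : x;
    return (maxval < lo) ? maxval : lo;
}

/* Smooth Hermite interpolation between 0 and 1 across [edge0, edge1].
   Assumes edge0 < edge1 and no NaN inputs. */
static float smoothstep_ref(float edge0, float edge1, float x) {
    float t = fclamp_ref((x - edge0) / (edge1 - edge0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

/* 1.0 for positive x, -1.0 for negative x, 0.0 for NaN; a zero input
   is returned unchanged so that its sign is preserved. */
static float sign_ref(float x) {
    if (x != x) return 0.0f;   /* NaN */
    if (x > 0.0f) return 1.0f;
    if (x < 0.0f) return -1.0f;
    return x;                  /* +0.0 or -0.0 */
}
```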

2.4 Geometric instructions

This section describes the list of geometric instructions. In this section x, y, z and w denote the first, second, third and fourth component respectively, of vectors with three and four components. The geometric instructions are implemented using the round to nearest even rounding mode. Note: The geometric functions can be implemented using contractions such as mad or fma.

cross
Returns the cross product of p0.xyz and p1.xyz. When the vector component count is 4, the w component returned will be 0.0. Result Type, p0 and p1 must be vector(3,4) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

distance
Returns the distance between p0 and p1. This is calculated as length(p0 - p1). Result Type must be floating-point. p0 and p1 must be floating-point or vector(2,3,4) of floating-point values. p0 and p1 operands must have the same type. Result Type, p0 and p1 operands must have the same component type.

length
Returns the length of vector p, i.e. sqrt(p.x^2 + p.y^2 + ...). Result Type must be floating-point. p must be vector(2,3,4) of floating-point values. Result Type and p operands must have the same component type.
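For the four-component case of the cross product, a minimal C sketch could look as follows. The struct float4_ref and the function cross4_ref are hypothetical names introduced for illustration; the sketch uses only the x, y and z components of its inputs and sets w to 0.0.

```c
/* Hypothetical four-component vector type for this sketch. */
typedef struct {
    float x, y, z, w;
} float4_ref;

/* Cross product of the xyz parts of p0 and p1. The w components of the
   inputs are ignored and the returned w is 0.0. */
static float4_ref cross4_ref(float4_ref p0, float4_ref p1) {
    float4_ref r;
    r.x = p0.y * p1.z - p0.z * p1.y;
    r.y = p0.z * p1.x - p0.x * p1.z;
    r.z = p0.x * p1.y - p0.y * p1.x;
    r.w = 0.0f;
    return r;
}
```

For example, crossing the unit x axis with the unit y axis gives the unit z axis.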

normalize
Returns a vector in the same direction as p but with a length of 1. Result Type and p must be floating-point or vector(2,3,4) of floating-point values. All of the operands, including the Result Type operand, must be of the same type.

fast_distance
Returns fast_length(p0 - p1). Result Type must be floating-point. p0 and p1 must be floating-point or vector(2,3,4) of floating-point values. p0 and p1 operands must have the same type. Result Type, p0 and p1 operands must have the same component type.

fast_length
Returns the length of vector p computed as: half_sqrt(p.x^2 + p.y^2 + ...). Result Type must be floating-point. p must be vector(2,3,4) of floating-point values. Result Type and p operands must have the same component type.
