C Fast RTS Library User Guide (Rev 1.0)

Revision History

Date          Description        Version
22 Sep 2008   Initial Revision   v. 1.0

IMPORTANT NOTICE

Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgment, including those pertaining to warranty, patent infringement, and limitation of liability.

TI warrants performance of its products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

Customers are responsible for their applications using TI components. In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, license, warranty or endorsement thereof.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations and notices.
Representation or reproduction of this information with alteration voids all warranties provided for an associated TI product or service, is an unfair and deceptive business practice, and TI is neither responsible nor liable for any such use.

Resale of TI's products or services with statements different from or beyond the parameters stated by TI for that product or service voids all express and any implied warranties for the associated TI product or service, is an unfair and deceptive business practice, and TI is neither responsible nor liable for any such use.

Also see: Standard Terms and Conditions of Sale for Semiconductor Products. www.ti.com/sc/docs/stdterms.htm

Mailing Address:
Texas Instruments
Post Office Box 655303
Dallas, Texas 75265

Copyright 2008, Texas Instruments Incorporated

Contents

Figures
Tables
1 Introduction
  1.1 Introduction
  1.2 Release package and directory structure
  1.3 FastRTS C functions
  1.4 Macros provided
  1.5 Usage
  1.6 Comparison between FastRTS and C FastRTS
2 Function Descriptions
  2.1 addsp_i: Single precision floating-point addition
  2.2 subsp_i: Single precision floating point subtraction
  2.3 uintsp_i: Convert 32-bit unsigned integer to single precision floating point
  2.4 intsp_i: Convert 32-bit signed integer to single-precision floating-point
  2.5 mpysp_i: Single precision floating-point multiplication
  2.6 recipsp_i: Single precision floating point reciprocal
  2.7 spint_i: Single precision floating point to 32-bit signed integer
  2.8 spuint_i: Single precision floating point to 32-bit unsigned integer
3 Benchmarks
  3.1 C64x and C64x+ FastRTS C Library Benchmarks
4 Flow Charts
  4.1 Single Precision Addition (addsp_i)
  4.2 Single Precision Subtraction (subsp_i)
  4.3 Single Precision Multiplication (mpysp_i)
  4.4 Single Precision Division (divsp_i)
  4.5 Single Precision Reciprocal (recipsp_i)

Figures

Figure 1: Directory structure
Figure 2: addsp_i
Figure 3: subsp_i
Figure 4: mpysp_i
Figure 5: divsp_i
Figure 6: recipsp_i

Tables

Table 1: Fast RTS C functions
Table 2: Function Performance

1 Introduction

1.1 Introduction

The C62x/C64x/C64x+ FastRTS C library is an optimized floating-point function library. It provides C implementations of a subset of the functions available in the FastRTS library. Because the functions are written in C, the compiler can inline them, which gives much improved performance. To learn more about inlining, refer to SPRU187.

1.2 Release package and directory structure

The C package is released as part of the FastRTS library. The release directory structure is shown in Figure 1.

Figure 1: Directory structure

1.3 FastRTS C functions

Table 1. Fast RTS C functions.

FastRTS C function   Description
addsp_i              Single precision floating point addition
divsp_i              Single precision floating point division
intsp_i              32-bit signed integer to single precision floating point number
mpysp_i              Single precision floating point multiplication
recipsp_i            Single precision floating point reciprocal
spint_i              Single precision floating point number to 32-bit signed integer
spuint_i             Single precision floating point number to 32-bit unsigned integer
sqrtsp_i             Single precision floating point square root
subsp_i              Single precision floating point subtraction
uintsp_i             32-bit unsigned integer to single precision floating point number

1.4 Macros provided:

Two macros are used in the code.

DEBUG      Switches on the underflow and overflow checks in the code. See the flow charts and the individual function descriptions for further details.
INLINE_C   Enables inlining of the C FastRTS functions.

1.5 Usage:

Follow these steps to use the C FastRTS library:

1. Include the fastrts_i.h file in your source files.
2. Call the appropriate functions in your code.
3. Define the macros above as required.

The remaining build process is unchanged. An example project demonstrating the use of the C FastRTS library is provided in the release.

The C library works for all TI C6x architectures, namely the C62x, the C64x and the C64x+. Appropriate code for a particular architecture is generated based on the compiler options selected by the user.

1.6 Comparison between FastRTS and C FastRTS

The FastRTS library is written in optimized assembly to get maximum performance. The drawback is that, because the kernels are written in assembly, the compiler cannot inline them. The FastRTS C library is written completely in C, so the compiler can inline the kernels to get maximum advantage.

Unlike the RTS library, both the FastRTS library and the FastRTS C library trade some accuracy for better performance. The compromises include omitting underflow and overflow checks; for most use cases the resulting accuracy loss is acceptable. Unlike the FastRTS library, the FastRTS C library includes the code for such checks under the DEBUG macro. Enable this macro for debugging only, because it reduces performance.
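The usage steps in Section 1.5 can be sketched as follows. This is a minimal illustration, not code from the release. The names fastrts_i.h, INLINE_C, DEBUG, addsp_i and mpysp_i come from this guide; the USE_TI_FASTRTS switch, the host stand-in functions and scale_and_offset are hypothetical and exist only so that the sketch also builds without the TI package. The sketch assumes the macros are defined before the header is included (or on the compiler command line), which the guide does not state explicitly.

```c
/* Sketch: calling FastRTS C functions from application code. */
#ifdef USE_TI_FASTRTS            /* hypothetical build switch, not part of the library */
#define INLINE_C                 /* enable inlining of the C FastRTS functions */
/* #define DEBUG */              /* uncomment to enable underflow/overflow checks */
#include "fastrts_i.h"
#else
/* Host stand-ins (assumption): plain C arithmetic so the sketch compiles
 * without the TI release. They do not reproduce the library's rounding. */
static float addsp_i(float x, float y) { return x + y; }
static float mpysp_i(float x, float y) { return x * y; }
#endif

/* Hypothetical helper: computes x * gain + offset using the FastRTS calls. */
static float scale_and_offset(float x, float gain, float offset)
{
    return addsp_i(mpysp_i(x, gain), offset);
}
```

With INLINE_C defined, a helper like this gives the compiler the chance to inline both calls into the caller's loop, which is where the inlined figures in the benchmark section come from.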

2 Function Descriptions

2.1 addsp_i: Single precision floating-point addition
Syntax: float addsp_i(float x, float y)
Defined in: addsp_i.h
Description: The sum of two 32-bit floating-point numbers is generated.
Special Cases:
- Zero inputs return zero output
- Underflow and overflow are checked only in DEBUG mode

2.2 subsp_i: Single precision floating point subtraction
Syntax: float subsp_i(float x, float y)
Defined in: subsp_i.h
Description: The difference of two single precision floating point numbers is generated.
Special Cases:
- Underflow and overflow are checked only in DEBUG mode

2.3 uintsp_i: Convert 32-bit unsigned integer to single precision floating point
Syntax: float uintsp_i(unsigned int x)
Defined in:
Description: A 32-bit unsigned integer is converted to a single precision floating point number.

divsp_i: Single-precision floating-point division
Syntax: float divsp_i(float x, float y)
Defined in: divsp_i.h
Description: The quotient of two 32-bit floating-point numbers is generated.
Special Cases:

- Underflow and overflow of the quotient are checked only in DEBUG mode
- Zero divided by zero returns NaN (1.#NAN)
- Non-zero divided by zero returns infinity

2.4 intsp_i: Convert 32-bit signed integer to single-precision floating-point
Syntax: float intsp_i(int x)
Defined in: intsp_i.h
Description: An input 32-bit signed integer is converted to a 32-bit single precision floating point number.

2.5 mpysp_i: Single precision floating-point multiplication
Syntax: float mpysp_i(float x, float y)
Defined in: mpysp_i.h
Description: The product of two 32-bit floating point numbers is generated.

2.6 recipsp_i: Single precision floating point reciprocal
Syntax: float recipsp_i(float x)
Defined in: recipsp_i.h
Description: The reciprocal of an input 32-bit floating point number is generated.
Special Cases:
- Underflow and overflow are checked only in DEBUG mode
- The reciprocal of zero returns infinity

2.7 spint_i: Single precision floating point to 32-bit signed integer
Syntax: int spint_i(float x)
Defined in: spint_i.h
Description: A single precision floating point number is converted to a 32-bit signed integer.

2.8 spuint_i: Single precision floating point to 32-bit unsigned integer
Syntax: unsigned int spuint_i(float x)
Defined in: spuint_i.h

Description: A single precision floating point number is converted to a 32-bit unsigned integer.
Special Cases:
- Numbers less than 1.0 return zero
- Results that need more than 32 bits generate the following saturation values:
  o 0xffff_ffff for positive numbers
  o 0x0000_0000 for negative numbers
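The special cases listed for spuint_i and divsp_i can be captured in a small host-side reference model, for example to cross-check results from the library. The sketch below is such a model under stated assumptions, not the library code: the names model_spuint and model_divsp are hypothetical, in-range conversion is assumed to truncate, the sign given to infinity and the value returned for a NaN input are assumptions, and ordinary C division stands in for the library's algorithm.

```c
#include <math.h>     /* NAN, INFINITY */
#include <stdint.h>

/* Hypothetical host model of the spuint_i special cases. */
static uint32_t model_spuint(float x)
{
    if (x != x)             return 0u;           /* NaN: not covered by the guide; 0 is an assumption */
    if (x < 1.0f)           return 0u;           /* numbers less than 1.0, including negatives */
    if (x >= 4294967296.0f) return 0xFFFFFFFFu;  /* 2^32 or more: saturate */
    return (uint32_t)x;                          /* in range: truncation assumed */
}

/* Hypothetical host model of the divsp_i special cases. */
static float model_divsp(float x, float y)
{
    if (y == 0.0f) {
        if (x == 0.0f) return NAN;                  /* zero divided by zero */
        return (x < 0.0f) ? -INFINITY : INFINITY;   /* non-zero over zero; sign is an assumption */
    }
    return x / y;                                   /* stand-in for the library algorithm */
}
```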

3 Benchmarks

3.1 C64x and C64x+ FastRTS C Library Benchmarks

Table 2 gives sample execution cycle counts. The figures in columns 3 and 5 (Function Call) include the overhead of the function call. The benchmarks were taken using the TMS320C64x+ simulator (little endian) with a flat memory architecture without overheads. The code has been tested with a large number of inputs.

Table 2: Function Performance

             Execution Cycles for C64x        Execution Cycles for C64x+
             (FastRTS optimized C)            (FastRTS optimized C)
Function     Inlined and    Function          Inlined and    Function
             Pipelined      Call              Pipelined      Call
addsp_i      21.17          37.12             11.33          36
subsp_i      21.18          38.12             11.33          37.12
mpysp_i      6.041          31.012            5.03           27.01
divsp_i      17.08          63.012            17.08          62.01
recipsp_i    15.07          62.012            15.08          60.01
intsp_i      4.025          22.012            4.02           22.01
spint_i      7.032          20.012            6.01           22.01
spuint_i     8.027          22.012            8.02           22.01
sqrtsp_i     -              559.75            -              545.15
uintsp_i     3.26           16.12             3.21           16.12

*Compiler version used for benchmarking is v6.0.18
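One way to read Table 2 is as the ratio of function-call cycles to inlined cycles for each function. The sketch below computes that ratio for a few C64x+ rows copied from the table; the struct, array and function names are hypothetical and only serve the illustration.

```c
#include <stddef.h>

/* One Table 2 row (C64x+ columns): cycles when inlined and when called. */
struct bench_row {
    const char *name;
    double inlined_cycles;
    double call_cycles;
};

/* Values copied from Table 2, C64x+ columns. */
static const struct bench_row c64xplus_rows[] = {
    { "addsp_i", 11.33, 36.0  },
    { "divsp_i", 17.08, 62.01 },
    { "intsp_i",  4.02, 22.01 },
};

/* Ratio of function-call cycles to inlined cycles for one row. */
static double inline_speedup(const struct bench_row *row)
{
    return row->call_cycles / row->inlined_cycles;
}

/* Average ratio over a set of rows. */
static double mean_speedup(const struct bench_row *rows, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; i++)
        total += inline_speedup(&rows[i]);
    return count ? total / (double)count : 0.0;
}
```

For addsp_i on the C64x+ this gives 36 / 11.33, roughly 3.2 times fewer cycles when the function is inlined and pipelined.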

4 Flow Charts

4.1 Single Precision Addition (addsp_i):

Inputs: Op1, Op2

1. If both operands are 0, set the ZERO FLAG.
2. Extract the exponent, the fraction and the sign.
3. Insert the hidden bit.
4. If an operand is < 0, take its 2's complement.
5. Shift the fractions to align the radix point and add.
6. Round and normalize the result (24 bits only).
7. Check for overflow and underflow (DEBUG mode only).
8. If the ZERO FLAG is set, make the exponent and fraction of the result 0.
9. Assemble the result and return.

Figure 2: addsp_i
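The addition flow can be illustrated with integer operations on the IEEE 754 bit pattern. The following is a minimal sketch under stated assumptions, not the library source: the names f2u, u2f and sketch_addsp are hypothetical, a zero exponent field is treated as zero (no denormals), infinity and NaN inputs are not handled, rounding is round-half-up on the magnitude, and the operands are aligned before the 2's complement step so that no negative value is ever shifted. The overflow and underflow results (infinity and zero) are also assumptions, and the check is always compiled in here, whereas the guide performs it only in DEBUG mode.

```c
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f;    memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical sketch of the addsp_i flow; not the library implementation. */
static float sketch_addsp(float x, float y)
{
    uint32_t ux = f2u(x), uy = f2u(y);
    int ex = (int)((ux >> 23) & 0xFFu);          /* extract the exponents */
    int ey = (int)((uy >> 23) & 0xFFu);

    if (ex == 0 && ey == 0) return 0.0f;         /* ZERO FLAG: both operands zero */

    /* Fractions with the hidden bit inserted, plus 30 guard bits (hidden bit at bit 53). */
    uint64_t mx = ex ? (uint64_t)((ux & 0x7FFFFFu) | 0x800000u) << 30 : 0;
    uint64_t my = ey ? (uint64_t)((uy & 0x7FFFFFu) | 0x800000u) << 30 : 0;

    /* Align the radix points on the larger exponent. */
    int e  = ex > ey ? ex : ey;
    int dx = e - ex, dy = e - ey;
    mx = dx > 54 ? 0 : mx >> dx;
    my = dy > 54 ? 0 : my >> dy;

    /* 2's complement for negative operands, then add. */
    int64_t sum = ((ux >> 31) ? -(int64_t)mx : (int64_t)mx)
                + ((uy >> 31) ? -(int64_t)my : (int64_t)my);
    if (sum == 0) return 0.0f;                   /* exact cancellation */
    uint32_t sign = sum < 0 ? 0x80000000u : 0u;
    uint64_t mag  = (uint64_t)(sum < 0 ? -sum : sum);

    /* Normalize so the hidden bit is back at bit 53, then round to 24 bits. */
    while (mag >= (1ULL << 54)) { mag >>= 1; e++; }
    while (mag <  (1ULL << 53)) { mag <<= 1; e--; }
    mag += 1ULL << 29;                           /* half of the last kept bit */
    if (mag >= (1ULL << 54)) { mag >>= 1; e++; }

    /* Overflow / underflow check (DEBUG-only in the guide; results assumed). */
    if (e >= 255) return u2f(sign | 0x7F800000u);
    if (e <= 0)   return u2f(sign);

    /* Assemble sign, exponent and 23-bit fraction. */
    return u2f(sign | ((uint32_t)e << 23) | ((uint32_t)(mag >> 30) & 0x7FFFFFu));
}
```

Keeping 30 guard bits below the fraction means that operands whose exponents differ by up to 30 are added exactly before the single rounding step.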

4.2 Single Precision Subtraction (subsp_i):

Inputs: Op1, Op2

1. If both operands are 0, set the ZERO FLAG.
2. Take the 2's complement of Op2.
3. Extract the exponent, the fraction and the sign.
4. Insert the hidden bit.
5. If an operand is < 0, take its 2's complement.
6. Shift the fractions to align the radix point and add.
7. Round and normalize the result (24 bits only).
8. Check for overflow and underflow (DEBUG mode only).
9. If the ZERO FLAG is set, make the exponent and fraction of the result 0.
10. Assemble the result and return.

Figure 3: subsp_i
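The subtraction flow differs from addition only in the extra step applied to Op2. Read as negating Op2, that step reduces subtraction to x + (-y). The sketch below illustrates this reading; sketch_subsp is a hypothetical name, the sign-bit flip is the usual way to negate an IEEE 754 value and is an assumption about what the flow chart step amounts to, and the host + operator stands in for the addition flow.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: x - y computed as x + (-y). */
static float sketch_subsp(float x, float y)
{
    uint32_t uy;
    memcpy(&uy, &y, sizeof uy);
    uy ^= 0x80000000u;             /* negate Op2 by flipping its sign bit */
    memcpy(&y, &uy, sizeof y);
    return x + y;                  /* stand-in for the addition flow */
}
```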

4.3 Single Precision Multiplication (mpysp_i):

Inputs: Op1, Op2

1. If any operand is 0, set the ZERO FLAG.
2. Extract the exponent, the fraction and the sign.
3. Insert the hidden bit.
4. Perform 32-bit multiplication.
5. Round and normalize the result (24 bits only).
6. Check for overflow and underflow (DEBUG mode only).
7. If the ZERO FLAG is set, make the exponent and fraction of the result 0.
8. Assemble the result and return.

Figure 4: mpysp_i
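The multiplication flow can be sketched in the same style. This is an illustration under stated assumptions rather than the library source: the names f2u, u2f and sketch_mpysp are hypothetical, a 64-bit product is used for clarity where the flow chart specifies 32-bit multiplication, a zero exponent field is treated as zero, infinity and NaN inputs are not handled, rounding is round-half-up, and the overflow and underflow results are assumed.

```c
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f;    memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical sketch of the mpysp_i flow; not the library implementation. */
static float sketch_mpysp(float x, float y)
{
    uint32_t ux = f2u(x), uy = f2u(y);
    uint32_t sign = (ux ^ uy) & 0x80000000u;     /* sign of the product */
    int ex = (int)((ux >> 23) & 0xFFu);
    int ey = (int)((uy >> 23) & 0xFFu);

    /* ZERO FLAG: any operand zero gives zero exponent and fraction. */
    if (ex == 0 || ey == 0) return u2f(sign);

    /* Multiply the 24-bit fractions (hidden bit inserted): 2^46 <= p < 2^48. */
    uint64_t p = (uint64_t)((ux & 0x7FFFFFu) | 0x800000u)
               * (uint64_t)((uy & 0x7FFFFFu) | 0x800000u);

    /* Normalize: keep 24 bits starting at the leading one, and round. */
    int shift = (p >> 47) ? 24 : 23;
    int e = ex + ey - 127 + (shift - 23);
    p += 1ULL << (shift - 1);                    /* half of the last kept bit */
    uint32_t m = (uint32_t)(p >> shift);
    if (m & 0x1000000u) { m >>= 1; e++; }        /* rounding carried out */

    /* Overflow / underflow check (DEBUG-only in the guide; results assumed). */
    if (e >= 255) return u2f(sign | 0x7F800000u);
    if (e <= 0)   return u2f(sign);

    return u2f(sign | ((uint32_t)e << 23) | (m & 0x7FFFFFu));
}
```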

4.4 Single Precision Division (divsp_i):

Inputs: Op1 (dividend), Op2 (divisor)

1. If Op2 is 0, set the INFINITY FLAG.
2. If Op1 is 0, set the ZERO FLAG.
3. Extract the exponent, the fraction and the sign.
4. Insert the hidden bit.
5. Loop: perform division by repeated subtraction.
6. Round and normalize the result (24 bits only).
7. Check for overflow and underflow (DEBUG mode only).
8. If the ZERO FLAG is set, make the result 0.
9. If the INFINITY FLAG is set, make the result = INFNAN.
10. If both flags are set, make the result = NAN.
11. Assemble the result and return.

Figure 5: divsp_i
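The division flow, including its two flags, can be sketched with a shift-and-subtract loop, which is one reading of "division by repeated subtraction". This is an illustration under stated assumptions, not the library source: the names f2u, u2f and sketch_divsp are hypothetical, a zero exponent field is treated as zero, infinity and NaN inputs are not handled, the sign given to an infinite result and the NaN bit pattern are assumptions, and the flags are resolved before the loop rather than after it.

```c
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f;    memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical sketch of the divsp_i flow; not the library implementation. */
static float sketch_divsp(float x, float y)
{
    uint32_t ux = f2u(x), uy = f2u(y);
    uint32_t sign = (ux ^ uy) & 0x80000000u;
    int ex = (int)((ux >> 23) & 0xFFu);
    int ey = (int)((uy >> 23) & 0xFFu);
    int zero_flag = (ex == 0);                   /* Op1 (dividend) is zero */
    int inf_flag  = (ey == 0);                   /* Op2 (divisor) is zero */

    if (zero_flag && inf_flag) return u2f(0x7FC00000u);         /* 0/0: NaN */
    if (inf_flag)              return u2f(sign | 0x7F800000u);  /* x/0: infinity */
    if (zero_flag)             return u2f(sign);                /* 0/y: zero */

    uint32_t n = (ux & 0x7FFFFFu) | 0x800000u;   /* fractions with hidden bit */
    uint32_t d = (uy & 0x7FFFFFu) | 0x800000u;
    int e = ex - ey + 127;
    if (n < d) { n <<= 1; e--; }                 /* now d <= n < 2d, quotient in [1, 2) */

    /* Shift-and-subtract: 24 quotient bits plus one guard bit. */
    uint32_t q = 0;
    for (int i = 0; i < 25; i++) {
        q <<= 1;
        if (n >= d) { n -= d; q |= 1u; }
        n <<= 1;
    }

    q = (q + 1u) >> 1;                           /* round using the guard bit */
    if (q & 0x1000000u) { q >>= 1; e++; }        /* rounding carried out */

    /* Overflow / underflow check (DEBUG-only in the guide; results assumed). */
    if (e >= 255) return u2f(sign | 0x7F800000u);
    if (e <= 0)   return u2f(sign);

    return u2f(sign | ((uint32_t)e << 23) | (q & 0x7FFFFFu));
}
```

The loop runs once per result bit, which is consistent with division and reciprocal being the most expensive of the four arithmetic operations in the benchmark table.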

4.5 Single Precision Reciprocal (recipsp_i):

Inputs: Op2 (the input value); Op1 is set to 1

1. If Op2 is 0, set the INFINITY FLAG.
2. Set Op1 = 1.
3. Extract the exponent, the fraction and the sign.
4. Insert the hidden bit.
5. Loop: perform division by repeated subtraction.
6. Round and normalize the result (24 bits only).
7. Check for overflow and underflow (DEBUG mode only).
8. If the INFINITY FLAG is set, make the result = INFNAN.
9. Assemble the result and return.

Figure 6: recipsp_i
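The reciprocal flow is the division flow with the dividend fixed at 1. A compact host-side sketch of that relationship follows; sketch_recipsp is a hypothetical name, host division stands in for the subtraction loop, and returning positive infinity for either sign of zero is an assumption, since the guide only states that the reciprocal of zero is infinity.

```c
#include <math.h>   /* INFINITY */

/* Hypothetical sketch: reciprocal as division with Op1 = 1. */
static float sketch_recipsp(float x)
{
    if (x == 0.0f) return INFINITY;   /* INFINITY FLAG set; sign is an assumption */
    return 1.0f / x;                  /* Op1 = 1; host division stands in for the loop */
}
```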