C Fast RTS Library User Guide (Rev 1.0)


Revision History

22 Sep 2008    Initial Revision v. 1.0

IMPORTANT NOTICE

Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgment, including those pertaining to warranty, patent infringement, and limitation of liability.

TI warrants performance of its products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

Customers are responsible for their applications using TI components. In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, license, warranty or endorsement thereof.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations and notices. Representation or reproduction of this information with alteration voids all warranties provided for an associated TI product or service, is an unfair and deceptive business practice, and TI is neither responsible nor liable for any such use.

Resale of TI's products or services with statements different from or beyond the parameters stated by TI for that product or service voids all express and any implied warranties for the associated TI product or service, is an unfair and deceptive business practice, and TI is not responsible nor liable for any such use.

Also see: Standard Terms and Conditions of Sale for Semiconductor Products. www.ti.com/sc/docs/stdterms.htm

Mailing Address:
Texas Instruments
Post Office Box 655303
Dallas, Texas 75265

Copyright 2008, Texas Instruments Incorporated

1 Contents

1 Contents
2 Figures
3 Tables
1 Introduction
    1.1 Introduction
    1.2 Release package and directory structure
    1.3 FastRTS C functions
    1.4 Macros provided
    1.5 Usage
    1.6 Comparison between FastRTS and C FastRTS
2 Function Descriptions
    2.1 addsp_i: Single precision floating-point addition
    2.2 subsp_i: Single precision floating point subtraction
    2.3 uintsp_i: Convert 32-bit unsigned integer to single precision floating point
    2.4 intsp_i: Convert 32-bit signed integer to single-precision floating-point
    2.5 mpysp_i: Single precision floating-point multiplication
    2.6 recipsp_i: Single precision floating point reciprocal
    2.7 spint_i: Single precision floating point to 32-bit signed integer
    2.8 spuint_i: Single precision floating point to 32-bit unsigned integer
3 Benchmarks
    3.1 C64x and C64x+ FastRTS C Library Benchmarks
4 Flow Charts
    4.1 Single Precision Addition (addsp_i)
    4.2 Single Precision Subtraction (subsp_i)
    4.3 Single Precision Multiplication (mpysp_i)
    4.4 Single Precision Division (divsp_i)
    4.5 Single Precision Reciprocal (recipsp_i)

2 Figures

Figure 1: Directory structure
Figure 2: addsp_i
Figure 3: subsp_i
Figure 4: mpysp_i
Figure 5: divsp_i
Figure 6: recipsp_i

3 Tables

Table 1. FastRTS C functions
Table 2: Function Performance

1 Introduction

1.1 Introduction

The C62x/C64x/C64x+ FastRTS C library is an optimized floating-point function library. It provides C implementations of a subset of the functions available in the FastRTS library. Because the functions are supplied as C code, the compiler can inline them, which improves performance considerably compared with a function call (see Table 2). To learn more about inlining, refer to SPRU187.

1.2 Release package and directory structure

The C package is released as part of the FastRTS library. The directory structure of the release package is shown in Figure 1.

Figure 1: Directory structure

1.3 FastRTS C functions

Table 1. FastRTS C functions

Function     Description
addsp_i      Single precision floating point addition
divsp_i      Single precision floating point division
intsp_i      32-bit signed integer to single precision floating point number
mpysp_i      Single precision floating point multiplication
recipsp_i    Single precision floating point reciprocal
spint_i      Single precision floating point number to 32-bit signed integer
spuint_i     Single precision floating point number to 32-bit unsigned integer
sqrtsp_i     Single precision floating point square root
subsp_i      Single precision floating point subtraction
uintsp_i     32-bit unsigned integer to single precision floating point number

1.4 Macros provided

Two macros are used in the code.

DEBUG       Switches on the underflow and overflow checks in the code. See the flow charts and the individual function descriptions for details.
INLINE_C    Enables inlining of the C FastRTS functions.

1.5 Usage

Follow these steps to use the C FastRTS library:

- Include the fastrts_i.h file in your source files.
- Call the appropriate functions in your code.
- Define the macros above as required.

The rest of the build process is unchanged. An example project demonstrating the use of the C FastRTS library is provided in the release.

The C library works for all TI C6x architectures, namely the C62x, the C64x and the C64x+. Code appropriate to a particular architecture is generated based on the compiler options selected by the user.

1.6 Comparison between FastRTS and C FastRTS

The FastRTS library is written in optimized assembly to get maximum performance. The drawback is that, because the kernels are written in assembly, the compiler cannot inline them. The FastRTS C library is written entirely in C, so the compiler can inline the kernels to get maximum advantage.

Unlike the RTS library, both the FastRTS library and the FastRTS C library trade some accuracy for better performance. These compromises include omitting underflow and overflow checks; for most use cases, the accuracy loss is acceptable. Unlike the FastRTS library, the FastRTS C library includes the code for such checks under the DEBUG macro. This macro should be enabled for debug purposes only, because it reduces performance.
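The usage steps in section 1.5 might look like the following in a source file. This is a minimal sketch rather than code from the release: the guard macro HAVE_FASTRTS_C, the host-side stand-in functions and the dot_product helper are hypothetical, and defining INLINE_C in the source before the include (instead of through a compiler option) is an assumption.

```c
#include <assert.h>

#ifdef HAVE_FASTRTS_C     /* hypothetical guard: defined when building against the release */
#define INLINE_C          /* assumption: defined before the include to request inlining */
/* Assumption: DEBUG would be defined here as well, for debug builds only. */
#include "fastrts_i.h"
#else
/* Host-only stand-ins so this sketch compiles without the library. */
static float addsp_i(float x, float y) { return x + y; }
static float mpysp_i(float x, float y) { return x * y; }
#endif

/* Multiply-accumulate over two arrays through the library entry points. */
float dot_product(const float *a, const float *b, int n)
{
    float acc = 0.0f;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        acc = addsp_i(acc, mpysp_i(a[i], b[i]));
    }
    return acc;
}
```

With inlining enabled, a loop of this kind is where the compiler can pipeline the arithmetic instead of paying the call overhead on every iteration.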

2 Function Descriptions

2.1 addsp_i: Single precision floating-point addition

Syntax: float addsp_i(float x, float y)
Defined in: addsp_i.h
Description: Generates the sum of two 32-bit floating-point numbers.
Special Cases:
- Zero inputs return a zero output.
- Underflow and overflow are checked only in DEBUG mode.

2.2 subsp_i: Single precision floating point subtraction

Syntax: float subsp_i(float x, float y)
Defined in: subsp_i.h
Description: Generates the difference of two single precision floating point numbers.
Special Cases:
- Underflow and overflow are checked only in DEBUG mode.

2.3 uintsp_i: Convert 32-bit unsigned integer to single precision floating point

Syntax: float uintsp_i(unsigned int x)
Defined in:
Description: A 32-bit unsigned integer is converted to a single precision floating point number.

divsp_i: Single-precision floating-point division

Syntax: float divsp_i(float x, float y)
Defined in: divsp_i.h
Description: Generates the quotient of two 32-bit floating-point numbers.
Special Cases:
- Underflow and overflow of the quotient are checked only in DEBUG mode.
- Zero divided by zero returns 1.#NAN.
- Non-zero divided by zero returns infinity.

2.4 intsp_i: Convert 32-bit signed integer to single-precision floating-point

Syntax: float intsp_i(int x)
Defined in: intsp_i.h
Description: An input 32-bit signed integer is converted to a 32-bit single precision floating point number.

2.5 mpysp_i: Single precision floating-point multiplication

Syntax: float mpysp_i(float x, float y)
Defined in: mpysp_i.h
Description: Generates the product of two 32-bit floating point numbers.

2.6 recipsp_i: Single precision floating point reciprocal

Syntax: float recipsp_i(float x)
Defined in: recipsp_i.h
Description: Generates the reciprocal of an input 32-bit floating point number.
Special Cases:
- Underflow and overflow are checked only in DEBUG mode.
- The reciprocal of zero returns infinity.

2.7 spint_i: Single precision floating point to 32-bit signed integer

Syntax: int spint_i(float x)
Defined in: spint_i.h
Description: A single precision floating point number is converted to a 32-bit signed integer.

2.8 spuint_i: Single precision floating point to 32-bit unsigned integer

Syntax: unsigned int spuint_i(float x)
Defined in: spuint_i.h
Description: A single precision floating point number is converted to a 32-bit unsigned integer.
Special Cases:
- Numbers less than 1.0 return zero.
- Results that do not fit in 32 bits generate the following saturation values:
  o 0xffff_ffff for positive numbers
  o 0x0000_0000 for negative numbers
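The special cases listed for spuint_i can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the library source: spuint_model and spuint_model_check are hypothetical names, truncation toward zero for in-range values is assumed (the guide does not state the rounding behavior), and mapping NaN to zero is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the special cases documented for spuint_i. */
uint32_t spuint_model(float x)
{
    if (!(x >= 1.0f)) {          /* below 1.0 or negative; NaN also lands here (assumption) */
        return 0x00000000u;
    }
    if (x >= 4294967296.0f) {    /* 2^32 and above does not fit in 32 bits */
        return 0xFFFFFFFFu;
    }
    return (uint32_t)x;          /* in range: assumed to truncate toward zero */
}

/* Spot checks of the documented cases. */
void spuint_model_check(void)
{
    assert(spuint_model(0.25f) == 0u);            /* less than 1.0 returns zero */
    assert(spuint_model(-5.0f) == 0u);            /* negative saturates to 0x0000_0000 */
    assert(spuint_model(1.0e20f) == 0xFFFFFFFFu); /* too large saturates to 0xffff_ffff */
}
```

A model like this can serve as a comparison point when testing call sites on a host before moving to the target.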

3 Benchmarks

3.1 C64x and C64x+ FastRTS C Library Benchmarks

Table 2 gives sample execution clock cycle counts. The times in columns 3 and 5 (function call) include the overhead of the function call. The benchmarks were taken using the TMS320C64x+ simulator (Little Endian) with a flat memory architecture and no overheads. The code has been tested with a large number of inputs.

Table 2: Function Performance

             Execution cycles for C64x           Execution cycles for C64x+
             (FastRTS optimized C)               (FastRTS optimized C)
Function     Inlined and       Function call     Inlined and       Function call
             pipelined                           pipelined
addsp_i      21.17             37.12             11.33             36
subsp_i      21.18             38.12             11.33             37.12
mpysp_i      6.041             31.012            5.03              27.01
divsp_i      17.08             63.012            17.08             62.01
recipsp_i    15.07             62.012            15.08             60.01
intsp_i      4.025             22.012            4.02              22.01
spint_i      7.032             20.012            6.01              22.01
spuint_i     8.027             22.012            8.02              22.01
sqrtsp_i     -                 559.75            -                 545.15
uintsp_i     3.26              16.12             3.21              16.12

*Compiler version used for benchmarking is v6.0.18.

4 Flow Charts

4.1 Single Precision Addition (addsp_i):

Inputs: Op1, Op2

- If both operands are 0, set the ZERO FLAG.
- Extract the exponent, the fraction and the sign.
- Insert the hidden bit.
- If an operand is < 0, take its 2's complement.
- Shift the fractions to align the radix point and add.
- Round and normalize the result (24 bits only).
- If the ZERO FLAG is set, make the exponent and fraction of the result = 0.
- DEBUG mode: check for overflow and underflow.
- Assemble the result and return.

Figure 2: addsp_i
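The addition steps above can be sketched in portable C as follows. This is an illustrative host-side sketch, not the library source: addsp_sketch, f2u and u2f are hypothetical names, and several simplifications are assumptions of the sketch (operands with a zero exponent field are treated as zero, rounding is round-half-up on three guard bits, the overflow and underflow checks are always on rather than tied to DEBUG, and infinities and NaNs are rejected).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

float addsp_sketch(float x, float y)
{
    uint32_t ux = f2u(x), uy = f2u(y);
    int ex = (int)((ux >> 23) & 0xFFu);      /* extract the exponents */
    int ey = (int)((uy >> 23) & 0xFFu);
    uint32_t mx, my, mag, sign;
    int32_t sx, sy, sum;
    int e, d;

    assert(ex != 0xFF && ey != 0xFF);        /* infinities and NaNs are outside this sketch */

    /* Insert the hidden bit and keep three guard bits; exponent 0 counts as zero. */
    mx = ex ? (((ux & 0x7FFFFFu) | 0x800000u) << 3) : 0u;
    my = ey ? (((uy & 0x7FFFFFu) | 0x800000u) << 3) : 0u;

    /* Align the radix points on the larger exponent. */
    e = ex > ey ? ex : ey;
    d = e - ex; mx = d > 31 ? 0u : mx >> d;
    d = e - ey; my = d > 31 ? 0u : my >> d;

    /* Negative operands enter the sum in two's complement form. */
    sx = (ux >> 31) ? -(int32_t)mx : (int32_t)mx;
    sy = (uy >> 31) ? -(int32_t)my : (int32_t)my;
    sum = sx + sy;
    if (sum == 0) return 0.0f;               /* both operands zero, or exact cancellation */

    sign = sum < 0 ? 0x80000000u : 0u;
    mag = (uint32_t)(sum < 0 ? -sum : sum);

    /* Normalize so the hidden bit sits at bit 26 (23 fraction bits + 3 guard bits). */
    while (mag >= (1u << 27)) { mag >>= 1; e++; }
    while (mag < (1u << 26)) { mag <<= 1; e--; }

    /* Round to 24 bits (half up) and renormalize if the rounding carries out. */
    mag += 4u;
    if (mag >= (1u << 27)) { mag >>= 1; e++; }

    if (e <= 0) return u2f(sign);                   /* underflow: flush to zero */
    if (e >= 255) return u2f(sign | 0x7F800000u);   /* overflow: infinity */
    return u2f(sign | ((uint32_t)e << 23) | ((mag >> 3) & 0x7FFFFFu));
}
```

For example, addsp_sketch(1.5f, 2.25f) aligns 1.5 to the exponent of 2.25, adds the fractions and reassembles 3.75. Results can differ from native addition in the last bit, because the sketch does not implement round-to-nearest-even.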

4.2 Single Precision Subtraction (subsp_i):

Inputs: Op1, Op2

- If both operands are 0, set the ZERO FLAG.
- Take the 2's complement of Op2.
- Extract the exponent, the fraction and the sign.
- Insert the hidden bit.
- If an operand is < 0, take its 2's complement.
- Shift the fractions to align the radix point and add.
- Round and normalize the result (24 bits only).
- If the ZERO FLAG is set, make the exponent and fraction of the result = 0.
- DEBUG mode: check for overflow and underflow.
- Assemble the result and return.

Figure 3: subsp_i

4.3 Single Precision Multiplication (mpysp_i):

Inputs: Op1, Op2

- If any operand is 0, set the ZERO FLAG.
- Extract the exponent, the fraction and the sign.
- Insert the hidden bit.
- Perform 32-bit multiplication.
- Round and normalize the result (24 bits only).
- If the ZERO FLAG is set, make the exponent and fraction of the result = 0.
- DEBUG mode: check for overflow and underflow.
- Assemble the result and return.

Figure 4: mpysp_i
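The multiplication steps above can be sketched in portable C as follows. This is an illustrative host-side sketch, not the library source: mpysp_sketch, f2u and u2f are hypothetical names. As assumptions of the sketch, a single 64-bit product stands in for the 32-bit multiplication named in the flow chart, operands with a zero exponent field are treated as zero, rounding is round-half-up, the range checks are always on, and infinities and NaNs are rejected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

float mpysp_sketch(float x, float y)
{
    uint32_t ux = f2u(x), uy = f2u(y);
    uint32_t sign = (ux ^ uy) & 0x80000000u;   /* sign of the product */
    int ex = (int)((ux >> 23) & 0xFFu);
    int ey = (int)((uy >> 23) & 0xFFu);
    uint64_t prod;
    int e;

    assert(ex != 0xFF && ey != 0xFF);          /* infinities and NaNs are outside this sketch */

    /* ZERO FLAG: any zero operand gives a zero result. */
    if (ex == 0 || ey == 0) return u2f(sign);

    /* Insert the hidden bits and multiply the 24-bit fractions (48-bit product). */
    prod = (uint64_t)((ux & 0x7FFFFFu) | 0x800000u) *
           (uint64_t)((uy & 0x7FFFFFu) | 0x800000u);
    e = ex + ey - 127;                         /* remove one copy of the bias */

    /* Normalize so the leading one sits at bit 46. */
    if (prod & (1ULL << 47)) { prod >>= 1; e++; }

    /* Round to 24 bits (half up) and renormalize if the rounding carries out. */
    prod += 1ULL << 22;
    if (prod & (1ULL << 47)) { prod >>= 1; e++; }

    if (e <= 0) return u2f(sign);                   /* underflow: flush to zero */
    if (e >= 255) return u2f(sign | 0x7F800000u);   /* overflow: infinity */
    return u2f(sign | ((uint32_t)e << 23) | ((uint32_t)(prod >> 23) & 0x7FFFFFu));
}
```

The exponent arithmetic is the part that is easiest to get wrong: both inputs carry the bias of 127, so one copy has to be subtracted from their sum before the result is assembled.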

4.4 Single Precision Division (divsp_i):

Inputs: Op1, Op2

- If Op2 is 0, set the INFINITY FLAG.
- If Op1 is 0, set the ZERO FLAG.
- Extract the exponent, the fraction and the sign.
- Insert the hidden bit.
- Loop: perform division by repeated subtraction.
- Round and normalize the result (24 bits only).
- If the ZERO FLAG is set, make the result 0.
- If the INFINITY FLAG is set, make the result = INFNAN.
- If both flags are set, make the result = NAN.
- DEBUG mode: check for overflow and underflow.
- Assemble the result and return.

Figure 5: divsp_i
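The division steps above can be sketched in portable C as follows. This is an illustrative host-side sketch, not the library source: divsp_sketch, f2u and u2f are hypothetical names. The loop is a shift-and-subtract divider, which is one reading of "division by repeated subtraction". Further assumptions of the sketch: the flags are acted on before the loop instead of overriding the result afterwards, the NaN bit pattern 0x7FC00000 is chosen for illustration, operands with a zero exponent field are treated as zero, rounding is round-half-up, the range checks are always on, and infinite or NaN inputs are rejected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

float divsp_sketch(float x, float y)
{
    uint32_t ux = f2u(x), uy = f2u(y);
    uint32_t sign = (ux ^ uy) & 0x80000000u;
    int ex = (int)((ux >> 23) & 0xFFu);
    int ey = (int)((uy >> 23) & 0xFFu);
    int zero_flag = (ex == 0);                 /* dividend is zero */
    int inf_flag = (ey == 0);                  /* divisor is zero */
    uint32_t rem, divisor, q = 0u;
    int e, i;

    assert(ex != 0xFF && ey != 0xFF);          /* infinities and NaNs are outside this sketch */

    if (zero_flag && inf_flag) return u2f(0x7FC00000u);    /* 0/0: NaN (pattern assumed) */
    if (inf_flag) return u2f(sign | 0x7F800000u);          /* x/0: infinity */
    if (zero_flag) return u2f(sign);                       /* 0/y: zero */

    /* Insert the hidden bits. */
    rem = (ux & 0x7FFFFFu) | 0x800000u;
    divisor = (uy & 0x7FFFFFu) | 0x800000u;
    e = ex - ey + 127;
    if (rem < divisor) { rem <<= 1; e--; }     /* make the first quotient bit a one */

    /* One quotient bit per pass: 24 result bits plus one rounding bit. */
    for (i = 0; i < 25; i++) {
        q <<= 1;
        if (rem >= divisor) { rem -= divisor; q |= 1u; }
        rem <<= 1;
    }

    /* Round half up on the extra bit and renormalize if the rounding carries out. */
    q = (q + 1u) >> 1;
    if (q & 0x1000000u) { q >>= 1; e++; }

    if (e <= 0) return u2f(sign);                   /* underflow: flush to zero */
    if (e >= 255) return u2f(sign | 0x7F800000u);   /* overflow: infinity */
    return u2f(sign | ((uint32_t)e << 23) | (q & 0x7FFFFFu));
}
```

Calling the same routine with the dividend fixed at 1.0f gives a reciprocal, which matches the Op1 = 1 step in the reciprocal flow chart.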

4.5 Single Precision Reciprocal (recipsp_i):

Input: Op2

- If Op2 is 0, set the INFINITY FLAG.
- Set Op1 = 1.
- Extract the exponent, the fraction and the sign.
- Insert the hidden bit.
- Loop: perform division by repeated subtraction.
- Round and normalize the result (24 bits only).
- If the INFINITY FLAG is set, make the result = INFNAN.
- DEBUG mode: check for overflow and underflow.
- Assemble the result and return.

Figure 6: recipsp_i