Engineer To Engineer Note EE-186

Technical Notes on using Analog Devices' DSP components and development tools
Contact our technical support by phone: (800) ANALOG-D or e-mail: dsp.support@analog.com
Or visit our on-line resources http://www.analog.com/dsp and http://www.analog.com/dsp/ezanswers

Extended-Precision Fixed-Point Arithmetic on the Blackfin Processor Platform

Contributed by DSP Apps
May 13, 2003

Copyright 2003, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of customers' products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however no responsibility is assumed by Analog Devices regarding technical accuracy and topicality of the content provided in Analog Devices' Engineer-to-Engineer Notes.

Introduction

The Blackfin Processor platform was designed to efficiently perform 16-bit fixed-point arithmetic operations. There are times, however, when it may become necessary to increase accuracy by extending precision up to 32 bits. The first part of this document describes an extended-precision, fixed-point arithmetic technique that can be emulated on Blackfin Processors using the native 16-bit ALU instructions. The second part illustrates Blackfin Processor assembly implementations of the 31- and 32-bit-accurate FIR filters. An accompanying source code package contains full FIR and IIR assembly programs.

Background

Extended-precision arithmetic is a natural software extension for 16-bit fixed-point processors. In machines with 16-bit register files, two registers can be used to represent one 31-bit or 32-bit fixed-point number. Blackfin Processors are ideally suited for extended-precision arithmetic, because the register file is based on 32-bit registers, which can either be treated as 32-bit entities or two 16-bit halves. Before getting into specific DSP algorithms, it is important to see how basic arithmetic operations can be implemented with extended precision.

Addition

The Blackfin Processor instruction set contains a single-cycle 32-bit addition of the form R0 = R1 + R2. Therefore, no emulation is necessary for adding two 32-bit numbers. Subtraction of 32-bit numbers is also natively supported in the same form as addition: R0 = R1 - R2. Note that, in these addition and subtraction instructions, any combination of data registers can be used. More detailed information on the native Blackfin Processor operations can be found in the Blackfin Processor Instruction Set Reference.

Multiplication

In order to introduce the concept of extended-precision multiplication, it is useful to review the already familiar decimal multiplication.

Two-Digit Decimal Multiplication

Let's start by recalling how any decimal multiplication can be performed by knowing how to multiply single-digit numbers. As an example, consider this two-digit by two-digit decimal multiplication:

    23 x 98 = 2254

Figure 1 illustrates how this particular operation can be broken down into smaller operations. This is basically multiplication by hand.

Figure 1. Decimal multiplication in detail

    1000's place   100's place   10's place   1's place
                                      2           3
                                 x    9           8
    ---------------------------------------------------
    {a}     8 x 3 = 24
    {b}  +  8 x 2 = 16 x 10^1
    {c}  +  9 x 3 = 27 x 10^1
    {d}  +  9 x 2 = 18 x 10^2
    ---------------------------------------------------
    {e}  18 x 10^2 + 27 x 10^1 + 16 x 10^1 + 24 x 10^0 = 2254

To compute the final result, the following operations are necessary:

- Four single-digit multiplications (lines {a}, {b}, {c}, {d} in Figure 1):
  8 x 3 = 24, 8 x 2 = 16, 9 x 3 = 27, 9 x 2 = 18
- Three operations to shift the sub-products into the correct digit-significant slot (lines {b}, {c}, {d} in Figure 1):
  18 x 10^2, 27 x 10^1, 16 x 10^1
- Three additions (line {e} in Figure 1):
  18 x 10^2 + 27 x 10^1, 16 x 10^1 + 24, (18 x 10^2 + 27 x 10^1) + (16 x 10^1 + 24)

Two-Digit Hexadecimal Multiplication

Hexadecimal multiplication is not much different from its decimal counterpart. Let's consider multiplication of two 32-bit fractional numbers, where the operands are stored in the 32-bit general-purpose data registers R0 and R1. Blackfin Processors actually have a built-in 32-bit multiply operation of the form R1 *= R0. It is a multi-cycle instruction that takes 5 cycles to execute from L1 memory. It is possible to improve this performance with the 16-bit multiplication technique that follows.

32-Bit Accuracy with 16-Bit Multiplication

Instead of relying on this instruction, one can use elementary arithmetic to achieve a 32-bit multiplication result with single-cycle 16-bit multiplications. Each of the two 32-bit operands (R0 and R1) can be broken up into two 16-bit halves (R0.H, R0.L, R1.H, and R1.L), as shown in Figure 2.
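The by-hand procedure of Figure 1 carries over unchanged when each "digit" is a 16-bit half-word. The following is a minimal Python sketch of that idea, not code from the accompanying package: the function name `multiply_by_parts` and its `base` parameter are illustrative assumptions, and it handles unsigned integer operands only.

```python
def multiply_by_parts(a, b, base):
    """Multiply two non-negative two-digit numbers (digits in `base`)
    using only single-digit products, shifts, and additions.

    Hypothetical helper for illustration; unsigned operands only.
    """
    a_hi, a_lo = divmod(a, base)
    b_hi, b_lo = divmod(b, base)
    assert a_hi < base and b_hi < base, "operands must fit in two digits"

    # Four single-digit multiplications (lines {a}..{d} in Figure 1).
    lo_lo = b_lo * a_lo
    lo_hi = b_lo * a_hi
    hi_lo = b_hi * a_lo
    hi_hi = b_hi * a_hi

    # Three shifts into the correct digit slot, then three additions,
    # grouped the same way as line {e} of Figure 1.
    return (hi_hi * base**2 + hi_lo * base) + (lo_hi * base + lo_lo)


# Decimal case from the text: digits are 0..9.
print(multiply_by_parts(23, 98, 10))  # 2254

# Same routine with 16-bit "digits": two unsigned 32-bit operands.
print(hex(multiply_by_parts(0x12345678, 0x9ABCDEF0, 1 << 16)))
```

This integer sketch shifts sub-products to the left, toward the most significant digit. The fractional formulation in this note instead keeps the most significant sub-product in place and shifts the others to the right, so that the low-order bits can be dropped.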
Figure 2. Hexadecimal multiplication in detail

    bits    63:48    47:32    31:16    15:0
                               R0.H    R0.L
                          x    R1.H    R1.L
    ---------------------------------------
    {a}     (R1.L x R0.L) >> 32
    {b}  +  (R1.L x R0.H) >> 16
    {c}  +  (R1.H x R0.L) >> 16
    {d}  +  (R1.H x R0.H)
    ---------------------------------------
    {e}  (R1.H x R0.H) + ((R1.L x R0.H) >> 16) + ((R1.H x R0.L) >> 16) + ((R1.L x R0.L) >> 32) = R1 x R0

From this figure, it is easy to see the operations required to emulate the 32-bit multiplication R0 x R1 with a combination of instructions using 16-bit multipliers:

- Four 16-bit multiplications to yield four 32-bit results (lines {a}, {b}, {c}, {d} in Figure 2):
  R1.L x R0.L, R1.L x R0.H, R1.H x R0.L, R1.H x R0.H
- Three operations to shift the sub-products into the correct digit-significant slot (lines {a}, {b}, {c} in Figure 2):
  (R1.L x R0.L) >> 32, (R1.L x R0.H) >> 16, (R1.H x R0.L) >> 16
- Three operations (additions) to preserve bit place in the final answer (line {e} in Figure 2):
  ((R1.L x R0.L) >> 32) + ((R1.L x R0.H) >> 16),
  ((R1.H x R0.L) >> 16) + (R1.H x R0.H),
  (((R1.L x R0.L) >> 32) + ((R1.L x R0.H) >> 16)) + (((R1.H x R0.L) >> 16) + (R1.H x R0.H))

Since we are performing fractional arithmetic, the result is 1.63 (1.31 x 1.31 = 2.62 with a redundant sign bit). Most of the time, the result can be truncated to 1.31 in order to fit in a native 32-bit data register. Therefore, the result of the multiplication should be in reference to the sign bit, or the most significant bit. In this way, the rightmost least significant bits can be safely discarded in truncation.

The final expression for 32-bit multiplication is:

    R1 x R0 = (((R1.L x R0.L) >> 32) + ((R1.L x R0.H) >> 16)) + (((R1.H x R0.L) >> 16) + (R1.H x R0.H))

31-Bit Accuracy with 16-Bit Multiplication

From Figure 2, it is easy to see that the multiplication of the least significant half-words, R1.L x R0.L, does not contribute much to the final result. In fact, if the final result is ultimately truncated to 1.31 anyway, then this multiplication can only have an effect on the least significant bit of the 1.31 result. For many applications, the loss of accuracy due to losing this bit is balanced by the performance increase over the 32-bit multiplication. Three operations (one 16-bit multiplication, one shift, and one addition) can be eliminated if 31-bit accuracy is acceptable in the final design, namely those associated with the term (R1.L x R0.L) >> 32.

The remaining operations necessary to get a 31-bit-accurate 1.31 answer are three 16-bit multiplications, two additions, and the shifts of the two cross products:

    R1 x R0 = ((R1.L x R0.H) >> 16) + (((R1.H x R0.L) >> 16) + (R1.H x R0.H))

Further rearrangement of terms combines the two shifts into one and yields the final form of the 31-bit-accurate multiplication:

    R1 x R0 = (((R1.L x R0.H) + (R1.H x R0.L)) >> 16) + (R1.H x R0.H)

Double-Precision FIR Filter Implementation

32-Bit-Accurate FIR Filter

If we consider R0 to be the data value and R1 to be a coefficient value, then each multiplication in the FIR will be of the 32-bit form described earlier:

    R1 x R0 = (((R1.L x R0.L) >> 32) + ((R1.L x R0.H) >> 16)) + (((R1.H x R0.L) >> 16) + (R1.H x R0.H))

The kernel for a 32-bit-accurate FIR implementation is shown in Listing 1. The number of cycles needed to execute the full implementation is 28 + N*(3*T+5) cycles, where N is the size of the input buffer and T is the number of filter taps. Complete source code for 31- and 32-bit-accurate FIR and IIR filters is contained in the compressed package accompanying this document.
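The two multiplication forms, and the way an FIR sum accumulates them, can be cross-checked with a small integer model. The Python sketch below is an illustration under stated assumptions, not the code from the accompanying package: all function names are hypothetical, operands are signed 32-bit integers read as 1.31 fractions, and saturation is not modeled. It assumes the H x H product is a fractional product (shifted left by one) while the cross products are plain integer products, which is why the model shifts by 15, as the R3=R3>>>15 instruction in the listings does, rather than by 16.

```python
def split_halves(x):
    """Split a signed 32-bit value into a signed high half and an
    unsigned low half, so that x == hi * 65536 + lo."""
    return x >> 16, x & 0xFFFF


def mul_exact(a, b):
    """Reference: full 1.31 x 1.31 product truncated to 1.31."""
    return (a * b) >> 31


def mul_32bit(a, b):
    """32-bit-accurate form: all four 16-bit sub-products are used."""
    ah, al = split_halves(a)
    bh, bl = split_halves(b)
    hh = (ah * bh) << 1        # signed x signed, fractional product
    cross = ah * bl + al * bh  # signed x unsigned cross products
    ll = al * bl               # unsigned x unsigned low product
    return hh + ((cross + (ll >> 16)) >> 15)


def mul_31bit(a, b):
    """31-bit-accurate form: the L x L sub-product is omitted."""
    ah, al = split_halves(a)
    bh, bl = split_halves(b)
    return ((ah * bh) << 1) + ((ah * bl + al * bh) >> 15)


def fir_32bit(samples, coeffs):
    """One FIR output: accumulate each class of sub-product over all
    taps first, then apply the shifts once to the running sums."""
    hh = cross = ll = 0
    for x, c in zip(samples, coeffs):
        xh, xl = split_halves(x)
        ch, cl = split_halves(c)
        hh += (xh * ch) << 1
        cross += xh * cl + xl * ch
        ll += xl * cl
    return hh + ((cross + (ll >> 16)) >> 15)


# 0.5 x 0.25 = 0.125 in 1.31 format.
print(hex(mul_32bit(0x40000000, 0x20000000)))  # 0x10000000
```

In this model `mul_32bit` reproduces the truncated full-precision product, and `mul_31bit` differs from it only in the lowest bits of the 1.31 result, which is the trade-off described above.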
Listing 1. Kernel of 32-bit-accurate FIR

    // I0 = address of the delay line buffer
    // I1 = address of the input array
    // I2 = address of the coefficient array
    // I3 = address of the output array
    // P0 = number of input samples
    // P2 = number of coefficients

    // The outer loop iterates over all the data samples
    LSETUP(FIR_START, FIR_END) LC0=P0;
    FIR_START:
    // The first section performs multiply-accumulate on the least
    // significant halves of the data and coefficients (R0.L*R1.L),
    // and implicitly shifts the result >> 32 by placing it in
    // accumulator A1
        LSETUP(M_ST, M_ST) LC1=P2;
        A0=R0.L*R1.L (FU) || R0=[I1--] || R1=[I2++];
    M_ST: R3.L=(A0+=R0.L*R1.L) (FU) || R0=[I1--] || R1=[I2++];
        A1=R3;
    // In this section, the product of the most significant words
    // (R0.H*R1.H) gets accumulated to A1, and the products R0.L*R1.H
    // and R1.L*R0.H get accumulated into A0 onto the running sum from
    // the first section. The bit placement shift is explicit in the
    // R3=R3>>>15 instruction
        A0=R0.H*R1.H, A1+=R0.H*R1.L (M) || [I3++]=R2;
        LSETUP(MAC_ST,MAC_END) LC1=P2;
    MAC_ST: A1+=R1.H*R0.L (M) || R0=[I1--] || R1=[I2++];
    MAC_END: R2=(A0+=R0.H*R1.H), A1+=R0.H*R1.L (M);
        R3=(A1+=R1.H*R0.L) (M) || I4+=4 || R0=[I0++];
        R3=R3>>>15 || [I1--]=R0 || R1=[I2++];
    // The final sum gives the answer
    FIR_END: R2=R2+R3 (S);

31-Bit-Accurate FIR Filter

A 31-bit-accurate FIR filter can be useful for extended precision in audio algorithms. The 31-bit-accurate multiplication (illustrated above) can be used for the FIR kernel computation:

    R1 x R0 = (((R1.L x R0.H) + (R1.H x R0.L)) >> 16) + (R1.H x R0.H)

The Blackfin Processor source code for the 31-bit-accurate FIR filter is shown in Listing 2. The number of cycles needed to execute the full implementation is 23 + N*(2*T+4) cycles, where N is the size of the input buffer and T is the number of filter taps.

Listing 2. Kernel of 31-bit-accurate FIR

    // I0 = address of the delay line buffer
    // I1 = address of the input array
    // I2 = address of the coefficient array
    // I3 = address of the output array
    // P0 = number of input samples
    // P2 = number of coefficients
    // M0 = 8

    // The outer loop iterates over all the data samples
        A1=A0=0 || R0=[I1--] || R1=[I2++];
        LSETUP(FIR_START, FIR_END) LC0=P0;
    FIR_START:
    // Compared to the first section in the 32-bit-accurate FIR, this
    // implementation omits the least significant halves (R0.L and
    // R1.L) of the operands. The product of the most significant
    // words (R0.H*R1.H) gets accumulated to A0, and the products
    // R0.L*R1.H and R1.L*R0.H get accumulated into A1. The bit
    // placement shift is explicit in the R3=R3>>>15 instruction
        LSETUP(MAC_ST,MAC_END) LC1=P2;
    MAC_ST: R2=(A0+=R0.H*R1.H), A1+=R0.H*R1.L (M);
    MAC_END: R3=(A1+=R1.H*R0.L) (M) || R0=[I1--] || R1=[I2++];
        R3=R3>>>15 || I1+=M0 || R0=[I0++];
    // R3 holds the final answer
        R3=R2+R3 (S) || [I1--]=R0;
    FIR_END: A1=A0=0 || [I3++]=R3;

Summary

This application note described an effective method for implementing extended-precision arithmetic on Blackfin Processors. The discussion about the tradeoffs between 31-bit accuracy and 32-bit accuracy was supported by code segments for an FIR filter.
Table 1 summarizes the performance of the FIR and IIR filters found in the compressed package supplied with this document.

Table 1. Computation time for 31-bit and 32-bit filter implementations on Blackfin Processor

            32-bit accuracy          31-bit accuracy
    FIR     28+N*(3*T+5) cycles      23+N*(2*T+4) cycles
    IIR     23+18*N cycles           23+12*N cycles
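As a worked example of the Table 1 formulas, the Python sketch below evaluates them for one buffer size and tap count. The function names and the values N = 256 and T = 32 are assumptions chosen for illustration; they do not come from the note or its source package.

```python
def fir_cycles(n, t, accuracy_bits):
    """Cycle count of the FIR implementations, per Table 1.

    n = size of the input buffer, t = number of filter taps.
    """
    if accuracy_bits == 32:
        return 28 + n * (3 * t + 5)
    if accuracy_bits == 31:
        return 23 + n * (2 * t + 4)
    raise ValueError("accuracy_bits must be 31 or 32")


def iir_cycles(n, accuracy_bits):
    """Cycle count of the IIR implementations, per Table 1."""
    if accuracy_bits == 32:
        return 23 + 18 * n
    if accuracy_bits == 31:
        return 23 + 12 * n
    raise ValueError("accuracy_bits must be 31 or 32")


# Illustrative sizes only: 256 input samples, 32 taps.
N, T = 256, 32
print(fir_cycles(N, T, 32))  # 25884
print(fir_cycles(N, T, 31))  # 17431
print(iir_cycles(N, 32))     # 4631
print(iir_cycles(N, 31))     # 3095
```

For these sizes the 31-bit FIR needs roughly two thirds of the cycles of the 32-bit FIR, which follows from the per-tap cost dropping from 3 cycles to 2.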

Document History

    Version                             Description
    May 13, 2003 by T. Lukasiak         Updated according to new naming conventions
    April 1, 2003 by T. Lukasiak        Revision to source code snippets and accompanying source code
    February 26, 2003 by T. Lukasiak    Initial release