Wordlength Optimization


EE216B: VLSI Signal Processing
Wordlength Optimization
Prof. Dejan Marković
ee216b@gmail.com

12.2 Number Systems: Algebraic

Algebraic number, e.g. a = π + b [1]

- High level abstraction
- Infinite precision
- Often easier to understand
- Good for theory/algorithm development
- Hard to implement

[1] C. Shi, Floating-point to Fixed-point Conversion, Ph.D. Thesis, University of California, Berkeley, 2004.

12.3 Number Systems: Floating Point

- Widely used in CPUs
- Floating precision
- Good for algorithm study and validation

$$\text{Value} = (-1)^{\text{Sign}} \times \text{Fraction} \times 2^{(\text{Exponent} - \text{Bias})}$$

IEEE 754 standard:

                           Sign     Exponent     Fraction     Bias
Single precision [31:0]    1 [31]   8 [30:23]    23 [22:0]    127
Double precision [63:0]    1 [63]   11 [62:52]   52 [51:0]    1023

A short floating-point number (Bias = 3):

π ≈ 0 | 1 1 0 0 1 0 | 1 0 1   (Sign | Frac | Exp)

$$\pi \approx (-1)^0 \times (1\cdot2^{-1} + 1\cdot2^{-2} + 0\cdot2^{-3} + 0\cdot2^{-4} + 1\cdot2^{-5} + 0\cdot2^{-6}) \times 2^{(1\cdot2^{2} + 0\cdot2^{1} + 1\cdot2^{0} - 3)} = 3.125$$

12.4 Number Systems: Fixed Point

2's complement, with overflow-mode and quant.-mode:

π ≈ 0 | 0 1 1 | 0 0 1 0 0 1   (Sign | W_Int | W_Fr)

Unsigned magnitude, with overflow-mode and quant.-mode:

π ≈ 0 0 1 1 | 0 0 1 0 0 1   (W_Int | W_Fr)

$$\pi \approx 0\cdot2^{3} + 0\cdot2^{2} + 1\cdot2^{1} + 1\cdot2^{0} + 0\cdot2^{-1} + 0\cdot2^{-2} + 1\cdot2^{-3} + 0\cdot2^{-4} + 0\cdot2^{-5} + 1\cdot2^{-6} = 3.140625$$

- Economical implementation
- W_Int and W_Fr suitable for predictable range
- o-mode (saturation, wrap-around)
- q-mode (rounding, truncation)
- Useful built-in MATLAB functions: fix, round, ceil, floor, dec2bin, bin2dec, etc.
- In MATLAB: dec2bin(round(pi*2^6),10) gives the bit pattern; bin2dec(above)*2^-6 converts it back to a value
- In Simulink: SynDSP and SysGen
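The two encodings of π on slides 12.3 and 12.4 can be checked numerically. The following is a minimal Python sketch, not part of the original slides. The function names are hypothetical, and it assumes the fixed-point word has 1 sign bit, W_Int = 3 and W_Fr = 6, which is how the 10-bit pattern on slide 12.4 appears to be split.

```python
import math

def to_fixed(x, w_int, w_fr, q_mode="round", o_mode="saturate"):
    """Quantize x to a signed two's-complement code with 1 sign bit,
    w_int integer bits and w_fr fractional bits (assumed layout)."""
    scaled = x * 2 ** w_fr
    if q_mode == "round":
        code = math.floor(scaled + 0.5)      # q-mode: rounding (half up)
    elif q_mode == "truncate":
        code = math.floor(scaled)            # q-mode: truncation (drop LSBs)
    else:
        raise ValueError("unknown q_mode")
    lo = -(2 ** (w_int + w_fr))              # most negative code
    hi = 2 ** (w_int + w_fr) - 1             # most positive code
    if o_mode == "saturate":
        code = max(lo, min(hi, code))        # o-mode: saturation
    elif o_mode == "wrap":
        code = (code - lo) % (hi - lo + 1) + lo   # o-mode: wrap-around
    else:
        raise ValueError("unknown o_mode")
    return code

def fixed_to_float(code, w_fr):
    """Value represented by an integer code with w_fr fractional bits."""
    return code / 2 ** w_fr

def fixed_bits(code, w_int, w_fr):
    """Bit string of the code, sign bit first."""
    n = 1 + w_int + w_fr
    return format(code % (1 << n), "0{}b".format(n))

def decode_short_float(bits, frac_bits=6, exp_bits=3, bias=3):
    """Decode the short format of slide 12.3: sign | fraction | exponent."""
    sign = int(bits[0])
    frac = int(bits[1:1 + frac_bits], 2) / 2 ** frac_bits
    exp = int(bits[1 + frac_bits:1 + frac_bits + exp_bits], 2)
    return (-1) ** sign * frac * 2.0 ** (exp - bias)

code = to_fixed(math.pi, 3, 6)
print(fixed_bits(code, 3, 6), fixed_to_float(code, 6))
print(decode_short_float("0110010101"))
```

With the same 10 bits, the fixed-point word is closer to π than the short floating-point word, because none of its bits are spent on an exponent; the price is a fixed range, which is why the overflow mode matters.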

12.5 Motivation for Floating-to-Fixed Point Conversion

Algorithms are designed in algebraic arithmetic (a = π + b) and verified in floating-point or very large fixed-point arithmetic. VLSI implementation is in fixed-point arithmetic:

π ≈ S | 0 1 1 | 0 0 1 0 0 1   (Sign | W_Int | W_Fr, with overflow-mode and quant.-mode)

Design flow:

- Idea
- Floating-pt algorithm; OK? If no, revise; if yes, continue
- Quantization
- Fixed-pt algorithm; OK? If no, return to quantization; if yes, continue
- Hardware mapping

The quantization loop is time consuming and error prone (> 1 month).

12.6 Optimization Techniques: FRIDGE [2]

- Set of test vectors for inputs → range detection through simulation → W_Int in all internal nodes
  + Conservative but good for avoiding overflow
- Pre-assigned W_Fr at all inputs → deterministic propagation → W_Fr in all internal nodes
  Drawbacks: unjustified input W_Fr; overly conservative

[2] H. Keding et al., "FRIDGE: A Fixed-point Design and Simulation Environment," in Proc. Design, Automation and Test in Europe, Feb. 1998, pp. 429-435.

12.7 Optimization Techniques: Robust Ad Hoc

- Fixed-point system treated as a black box: logic-block WLs go in; system specifications (from bit-true simulation) and hardware cost come out
- Ad hoc search [3] or procedural [4]
- Long bit-true simulation, large number of iterations [5]
- Impractical for large systems

[3] W. Sung and K.-I. Kum, "Simulation-based Word-length Optimization Method for Fixed-point Digital Signal Processing Systems," IEEE Trans. Sig. Proc., vol. 43, no. 12, pp. 3087-3090, Dec. 1995.
[4] S. Kim, K.-I. Kum, and W. Sung, "Fixed-Point Optimization Utility for C and C++ Based on Digital Signal Processing Programs," IEEE Trans. Circuits and Systems-II, vol. 45, no. 11, pp. 1455-1464, Nov. 1998.
[5] M. Cantin, Y. Savaria, and P. Lavoie, "A Comparison of Automatic Word Length Optimization Procedures," in Proc. Int. Symp. Circuits and Systems, vol. 2, May 2002, pp. 612-615.

12.8 Problem Formulation: Optimization

Minimize hardware cost:

$$f(W_{Int,1}, W_{Fr,1};\ W_{Int,2}, W_{Fr,2};\ \ldots;\ \text{o-q-modes})$$

Subject to quantization-error specifications:

$$S_j(W_{Int,1}, W_{Fr,1};\ W_{Int,2}, W_{Fr,2};\ \ldots;\ \text{o-q-modes}) < \text{spec}_j, \quad \forall j$$

Feasibility:

$$\exists N \in \mathbb{Z}^{+} \ \text{s.t.}\ S_j(N, N;\ \ldots;\ \text{any mode}) < \text{spec}_j, \quad \forall j$$

Stopping criteria:

$$f < (1 + a)\, f_{opt}, \quad \text{where } a > 0$$

From now on, concentrate on W_Fr [1].

[1] C. Shi, Floating-point to Fixed-point Conversion, Ph.D. Thesis, University of California, Berkeley, 2004.

12.9 Output MSE Specs: Perturbation Theory on MSE [6]

$$\text{MSE} = E\left[(\text{Infinite-precision output} - \text{Fixed-point output})^2\right]$$

$$\text{MSE} \approx \mu^T B \mu + \sum_{i=1}^{p} c_i\, 2^{-2 W_{Fr,i}}$$

for a datapath of p WLs, with $B \in \mathbb{R}^{p \times p}$ and $C \in \mathbb{R}^{p}$, where

$$\mu_i = \begin{cases} \tfrac{1}{2}\, q_i\, 2^{-W_{Fr,i}}, & \text{datapath} \\ \text{fix-pt}(c_i, 2^{-W_{Fr,i}}) - c_i, & \text{const } c_i \end{cases} \qquad q_i = \begin{cases} 0, & \text{round-off} \\ 1, & \text{truncation} \end{cases}$$

[6] C. Shi and R.W. Brodersen, "A Perturbation Theory on Statistical Quantization Effects in Fixed-point DSP with Non-stationary Input," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 3, May 2004, pp. 373-376.

12.10 Actual vs. Computed MSE

[Figure: actual vs. computed MSE for an 11-tap LMS adaptive filter and for SVD U-Sigma]

Further improvement can be made by considering correlation:

$$\text{MSE} \approx \mu^T B \mu + \sigma^T C \sigma, \quad \text{with } B, C \in \mathbb{R}^{p \times p} \text{ and } \sigma_i = 2^{-W_{Fr,i}}$$

- More simulations required
- Usually not necessary
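To make the slide 12.9 model concrete, here is a minimal Python sketch that evaluates μ^T B μ + Σ c_i 2^(-2 W_Fr,i) for given wordlengths. In the flow described in these slides, B and c come from simulation and data fitting. The single-node values used here (B = [[1]], c = [1/12]) are an assumption for a unit-gain path with uniformly distributed quantization error, and the function names are hypothetical.

```python
import math

def mean_error(w_fr, truncation):
    """mu_i for a datapath node: 0 for round-off, half an LSB for truncation."""
    q = 1 if truncation else 0
    return 0.5 * q * 2.0 ** (-w_fr)

def mse_model(w_fr, mu, B, c):
    """Evaluate mu^T B mu + sum_i c_i * 2^(-2 * W_Fr,i)."""
    p = len(w_fr)
    bias_term = sum(mu[i] * B[i][j] * mu[j] for i in range(p) for j in range(p))
    variance_term = sum(c[i] * 2.0 ** (-2 * w_fr[i]) for i in range(p))
    return bias_term + variance_term

# One quantizer with W_Fr = 8 feeding the output with unit gain (assumed).
w = [8]
B = [[1.0]]
c = [1.0 / 12.0]   # variance of a uniform error over one LSB is LSB^2 / 12

mse_round = mse_model(w, [mean_error(8, truncation=False)], B, c)
mse_trunc = mse_model(w, [mean_error(8, truncation=True)], B, c)
print(mse_round, mse_trunc)
```

For this single node the model reproduces the textbook values: LSB²/12 for rounding and LSB²/3 for truncation, the latter being the sum of the squared half-LSB bias and the LSB²/12 variance.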

12.11 FPGA Hardware Resource Estimation

Design mapping flow: Designs in SysGen/SynDSP → Simulink Compiler → Netlister → VHDL/Core Generation → Synthesis Tool → Mapper → Map Report with Area Info

The mapping flow is accurate, but:

- Sometimes unnecessarily accurate
- Slow (minutes to hours)
- Excessive exposure to low-end tools
- No direct way to estimate a subsystem
- Hard to realize for an incomplete design

Fast and flexible resource estimation is important for FFC! The tool needs to be orders of magnitude faster.

12.12 Model-based Resource Estimation [*]

- Individual MATLAB function created for each type of logic
- MATLAB function estimates each logic-block area based on design parameters (input/output WL, o, q, # of inputs, etc.)
- Area accumulates for each logic block
- Total area accumulated from individual area functions (register_area, accum_area, etc.)
- Xilinx area functions are proprietary, but ASIC area functions can be constructed through synthesis characterizations

[*] by C. Shi and Xilinx Inc. (© Xilinx)
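The model-based approach of slide 12.12 amounts to one area function per block type plus an accumulation loop. The Python sketch below illustrates that structure only. The name register_area follows the slide, while the other names and all area formulas are hypothetical placeholders (the Xilinx functions are proprietary and the slides do not give them).

```python
def register_area(wl):
    """Hypothetical: one storage cell per bit."""
    return wl

def adder_area(wl_a, wl_b):
    """Hypothetical: one adder cell per output bit, with one bit of growth."""
    return max(wl_a, wl_b) + 1

def mult_area(wl_a, wl_b):
    """Hypothetical: array multiplier, one cell per partial-product bit."""
    return wl_a * wl_b

# Dispatch table: block type -> area function (placeholder models).
AREA_FUNCTIONS = {
    "register": register_area,
    "adder": adder_area,
    "mult": mult_area,
}

def total_area(blocks):
    """Accumulate area over (block_type, parameters) pairs."""
    return sum(AREA_FUNCTIONS[kind](*params) for kind, params in blocks)

# A multiply-accumulate stage described only by block types and wordlengths.
design = [("mult", (12, 10)), ("adder", (22, 22)), ("register", (23,))]
print(total_area(design))
```

Because each estimate is a closed-form function of the wordlengths, the total can be re-evaluated for every candidate wordlength assignment without running synthesis or mapping, which is the speed-up slide 12.11 asks for.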

12.13 ASIC Area Estimation

- ASIC logic block area is a multi-dimensional function of its input/output WL and speed, constructed based on synthesis
- Each WL setting characterized for LP, MP, and HP
- Perform curve-fitting to fit the data onto a quadratic function

[Figure: adder area vs. adder output wordlength and max(input wordlength); multiplier area (×10^4) vs. multiplier input 1 and input 2 wordlengths]

12.14 Analytical Hardware-Cost Function: FPGA

If all design parameters (latency, o, q, etc.) and all W_Int's are fixed, then the FPGA area is roughly quadratic in W_Fr:

$$f(W) \approx W^T H_1 W + H_2 W + h_3, \quad \text{where } W = (W_{Fr,1}, W_{Fr,2}, \ldots)^T$$

[Figure: check of hardware-cost fitting behavior. Quadratic-fit hardware cost vs. actual hardware cost for FPGA and for ASIC (×10^4), each with quadratic-fit, linear-fit, and ideal-fit curves]

ASIC area is modeled by the same f(W).
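For a single wordlength group, the quadratic model of slide 12.14 reduces to f(w) = h1·w² + h2·w + h3, and the three coefficients can be fitted by least squares from a handful of area samples. The sketch below shows this one-dimensional special case in plain Python. The sample data are synthetic and the function names are hypothetical; a real flow would fit the area values returned by the resource estimator.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_quadratic(wls, areas):
    """Least-squares fit of area ~ h1*w^2 + h2*w + h3 via normal equations."""
    rows = [[float(w * w), float(w), 1.0] for w in wls]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * y for r, y in zip(rows, areas)) for i in range(3)]
    return solve_linear(ata, atb)

def predict(coeffs, w):
    """Evaluate the fitted quadratic at wordlength w."""
    h1, h2, h3 = coeffs
    return h1 * w * w + h2 * w + h3

# Synthetic samples generated from a known quadratic (illustration only).
sample_wls = [4, 8, 12, 16, 20]
sample_areas = [0.5 * w * w + 3.0 * w + 20.0 for w in sample_wls]
coeffs = fit_quadratic(sample_wls, sample_areas)
print(coeffs, predict(coeffs, 10))
```

With several groups the same idea applies, with W^T H_1 W contributing one coefficient per pair of groups, which is why slide 12.21 distinguishes a quadratic number of iterations (small designs) from a linear number (large designs).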

12.15 Wordlength Optimization Flow

Flow, starting from a Simulink design in XSG or SynDSP (slide numbers in parentheses):

- Initial Setup (12.16)
- WL Analysis & Range Detection (12.18) → Optimal W_Int
- WL Connectivity & WL Grouping (12.19-20)
- HW Models for ASIC Estimation (12.13)
- Create Cost-function for ASIC (12.12) / Create Cost-function for FPGA (12.12)
- Data-fit to Create HW Cost Function (12.21)
- MSE-specification Analysis (12.22); HW-acceleration / Parallel Sim. (under development)
- Data-fit to Create MSE Cost Function (12.22)
- Wordlength Optimization
- Optimization Refinement (12.23) → Optimal W_Fr

[7] See the book website for tool download.

12.16 Initial Setup

- Insert an FFC setup block from the library (see notes)
- Insert a Spec Marker for every output requiring MSE analysis
- Generally every output needs one

12.17 Wordlength Reader

- Captures the WL information of each block
- If the user specifies a WL, store the value
- If no WL is specified, back-trace the source block until a specified WL is found
- If the source is the input port of a block, find the source of its parent

12.18 Wordlength Analyzer

- Determines the integer WL of every block
- Inserts a Range Detector at every active/non-constant node
- Each detector stores signal range and other statistical info
- Runs one simulation, unless multiple test vectors are specified

[Figure: Range Detector blocks in Xilinx and SynDSP]
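The range-detection step on slide 12.18 can be sketched as a pass-through probe that records the minimum and maximum of a signal, followed by a rule that converts the observed range into an integer wordlength. The Python below is an illustrative sketch, not the tool's implementation. The class and function names are hypothetical, and it assumes W_Int counts integer bits excluding the sign bit of a two's-complement word.

```python
class RangeDetector:
    """Pass-through probe that records the observed signal range."""

    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, x):
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)
        return x                      # signal is passed on unchanged

def integer_wordlength(lo, hi):
    """Smallest w_int (sign bit excluded) with -2^w_int <= lo and hi < 2^w_int."""
    w_int = 0
    while not (-(2 ** w_int) <= lo and hi < 2 ** w_int):
        w_int += 1
    return w_int

# One simulation run over a test vector (values chosen for illustration).
detector = RangeDetector()
for sample in [-3.2, 0.5, 3.14, -0.01]:
    detector.observe(sample)
print(detector.lo, detector.hi, integer_wordlength(detector.lo, detector.hi))
```

The result is only as good as the test vectors: a range that never occurs in simulation is not covered, which is one reason the overflow mode still has to be chosen deliberately.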

12.19 Wordlength Connectivity

- Connect wordlength information through WL-passive blocks
- Back-trace until a WL-active block is reached
- Essentially flattens the design hierarchy
- First step toward reducing # of independent WLs

[Figure: example design with connected blocks highlighted]

12.20 Wordlength Grouping

- Deterministic
  - Fixed WL (mux select, enable, reset, address, constant, etc.)
  - Same WL as driver (register, shift reg, up/down-sampler, etc.)
- Heuristic (WL rules)
  - Multi-input blocks have the same input WL (adder, mux, etc.)
  - Tradeoff between design optimality and simulation complexity

[Figure: example design with fixed, heuristic, and deterministic groups marked]
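Connectivity and grouping both reduce the number of independent wordlengths by declaring that certain signals must share one. A union-find structure is one natural way to implement this merging. The Python sketch below is an assumption about how it could be done rather than a description of the actual tool, and the signal names are hypothetical.

```python
def find(parent, x):
    """Return the representative of x's group, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    """Merge the groups containing a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a

def group_wordlengths(signals, same_wl_pairs):
    """Partition signals into groups that must share one wordlength."""
    parent = {s: s for s in signals}
    for a, b in same_wl_pairs:
        union(parent, a, b)
    groups = {}
    for s in signals:
        groups.setdefault(find(parent, s), []).append(s)
    return sorted(groups.values())

# Hypothetical accumulator: x -> register -> multiplier -> adder -> register.
signals = ["x", "x_reg", "prod", "sum", "sum_reg"]
same_wl_pairs = [
    ("x", "x_reg"),        # deterministic: register has the same WL as its driver
    ("sum", "sum_reg"),    # deterministic: register has the same WL as its driver
    ("prod", "sum_reg"),   # heuristic: both adder inputs share one WL
]
groups = group_wordlengths(signals, same_wl_pairs)
print(groups)
```

Five signals collapse into two independent wordlengths here. Dropping the heuristic pair would leave three groups: more freedom for the optimizer, at the price of more simulations, which is the tradeoff slide 12.20 points out.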

12.21 Resource-Estimation Function, Analyze HW Cost

- Creates a function call for each block (Slides 12.12, 12.14)
- HW cost is analyzed as a function of WL
- One or two WL groups are toggled with the other groups fixed
  - Quadratic iterations for small # of WLs
  - Linear iterations for large # of WLs

12.22 Analyze Specifications, Analyze Optimization

- Computes the MSE's sensitivity to each WL group (Slides 12.9, 12.10)
  - First simulate with all WLs at maximum precision
  - WL of each group is reduced individually
- Once the MSE function and HW cost function are computed, the user may enter the MSE requirement
  - Specify 1 MSE for each Spec Marker
- Optimization algorithm summary
  1) Find the minimum W_Fr for a given group (others high)
  2) Based on all the minimum W_Fr's, increase all WLs to meet spec
  3) Temporarily decrease each W_Fr separately by one bit; keep only the one with the greatest HW reduction that still meets spec
  4) Repeat 3) until W_Fr cannot be reduced anymore
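The four-step algorithm summarized on slide 12.22 can be sketched in a few lines of Python. This is an illustrative reading of the summary, not the tool's code: step 2 is interpreted as raising all groups together by one bit at a time, and the MSE and cost functions are toy models (a diagonal MSE model and a linear cost) with made-up coefficients.

```python
def optimize_wordlengths(n_groups, mse, cost, spec, w_max=32):
    """Greedy W_Fr search following the four steps of the summary."""
    if mse([w_max] * n_groups) >= spec:
        raise ValueError("spec is infeasible even at maximum precision")

    # Step 1: minimum W_Fr of each group with all other groups kept high.
    w_min = []
    for i in range(n_groups):
        w = [w_max] * n_groups
        while w[i] > 0:
            w[i] -= 1
            if mse(w) >= spec:
                w[i] += 1             # last reduction broke the spec; undo it
                break
        w_min.append(w[i])

    # Step 2: start from the minima and raise all WLs until the spec is met.
    w = list(w_min)
    while mse(w) >= spec:
        w = [min(v + 1, w_max) for v in w]

    # Steps 3 and 4: repeatedly remove the single bit that saves the most.
    while True:
        best = None
        for i in range(n_groups):
            if w[i] == 0:
                continue
            trial = list(w)
            trial[i] -= 1
            if mse(trial) < spec:
                saving = cost(w) - cost(trial)
                if best is None or saving > best[0]:
                    best = (saving, trial)
        if best is None:
            return w                  # no single-bit reduction meets the spec
        w = best[1]

# Toy models with made-up coefficients (illustration only).
C = [1.0, 4.0, 0.25]                  # MSE sensitivity of each group
A = [1.0, 2.0, 1.0]                   # cost per bit of each group

def toy_mse(w):
    return sum(c * 2.0 ** (-2 * wi) for c, wi in zip(C, w))

def toy_cost(w):
    return sum(a * wi for a, wi in zip(A, w))

result = optimize_wordlengths(3, toy_mse, toy_cost, spec=1e-4)
print(result, toy_mse(result), toy_cost(result))
```

Because the MSE and cost functions are closed-form fits rather than bit-true simulations, each trial in steps 3 and 4 is cheap, which is consistent with the slide 12.23 remark that re-optimization takes only seconds.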

12.23 Optimization Refinement and Result

- The result is then examined by the user for suitability
- Re-optimize if necessary; this only takes seconds

Example: 1/sqrt() on an FPGA

[Figure: 1/sqrt() datapath annotated with wordlength pairs: (16,12) (13,11) (14,9) (8,4) (24,16) (13,8) (24,16) (11,6) (24,16) (10,6) (16,11) (11,7) (12,9) (10,7) (16,11) (16,11) (8,7) (8,7)]

Legend: red = WL optimal, 409 slices; black = fixed WL, 877 slices

About 50% area reduction.

12.24 ASIC Example: FIR Filter [8]

- Original design: area = 48916 μm²
- Optimized for MSE = 10^-6: area = 18356 μm²

[8] C.C. Wang, Design and Optimization of Low-power Logic, M.S. Thesis, UCLA, 2009. (Appendix A)

12.25 Example: Jitter Compensation Filter [9]

[Figure: block diagram with Derivative, HPF, LPF, adder, and Mult blocks; two plots of SNR (dB) vs. time (us), annotated 29.4 dB and 30.8 dB]

[9] Z. Towfic, S.-K. Ting, A. Sayed, "Sampling Clock Jitter Estimation and Compensation in ADC Circuits," in Proc. IEEE Int. Symp. Circuits and Systems, June 2010, pp. 829-832.

12.26 Tradeoff: MSE vs. Hardware-Cost

[Figure: HW cost (kLUTs) as a function of MSE_cos and MSE_sin, each from 10^-6 to 10^-2, showing the acceptable-MSE region and the WL-optimal design; annotations: ACPR (MSE = 6×10^-3), 46 dB, ACPR (MSE = 7×10^-3)]

12.27 Summary

- Wordlength minimization is important in the implementation of fixed-point systems in order to reduce area and power
  - Integer wordlength can be found simply by using range detection, based on input data
  - Fractional wordlengths require more elaborate perturbation theory to minimize hardware cost subject to a constraint on the MSE due to quantization
- Design-specific information can be used
  - Wordlength grouping (e.g. in multiplexers)
  - Hierarchical optimization (with fixed input/output WLs)
  - WL optimization for recursive systems takes longer due to the time required for algorithm convergence
- FPGA/ASIC hardware resource estimation results are used to minimize WLs for FPGA/ASIC implementations

12.28 References (1/3)

[1] C. Shi, Floating-point to Fixed-point Conversion, Ph.D. Thesis, University of California, Berkeley, 2004.
[2] H. Keding et al., "FRIDGE: A Fixed-point Design and Simulation Environment," in Proc. Design, Automation and Test in Europe, Feb. 1998, pp. 429-435.
[3] W. Sung and K.-I. Kum, "Simulation-based Word-length Optimization Method for Fixed-point Digital Signal Processing Systems," IEEE Trans. Sig. Proc., vol. 43, no. 12, pp. 3087-3090, Dec. 1995.
[4] S. Kim, K.-I. Kum, and W. Sung, "Fixed-Point Optimization Utility for C and C++ Based on Digital Signal Processing Programs," IEEE Trans. Circuits and Systems-II, vol. 45, no. 11, pp. 1455-1464, Nov. 1998.

12.29 References (2/3)

[5] M. Cantin, Y. Savaria, and P. Lavoie, "A Comparison of Automatic Word Length Optimization Procedures," in Proc. Int. Symp. Circuits and Systems, vol. 2, May 2002, pp. 612-615.
[6] C. Shi and R.W. Brodersen, "A Perturbation Theory on Statistical Quantization Effects in Fixed-point DSP with Non-stationary Input," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 3, May 2004, pp. 373-376.
[7] See book supplement website for tool download. Also see: http://bwrc.eecs.berkeley.edu/people/grad_students/ccshi/research/ffc/documentation.htm

12.30 References (3/3)

[8] C.C. Wang, Design and Optimization of Low-power Logic, M.S. Thesis, UCLA, 2009. (Appendix A)
[9] Z. Towfic, S.-K. Ting, A.H. Sayed, "Sampling Clock Jitter Estimation and Compensation in ADC Circuits," in Proc. IEEE Int. Symp. Circuits and Systems, June 2010, pp. 829-832.

12.31 Course Wiki

- CAD Tutorials: WL Optimization
- Tool source code
- Tested with MATLAB 2006b and SynDSP 3.6