Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage silage@temple.edu ECE Temple University www.temple.edu/scdl

Signal Processing Algorithms into Fixed Point FPGA Hardware: Motivation; Development of Programmable Logic; Algorithms into Fixed Point; The MathWorks Fixed Point Designer; Fixed Point Design Project Example; FPGA Synthesis; SCDL

Motivation: Why use FPGAs in digital signal processing? Application-specific performance in a non-ASIC configuration; low power consumption and heat dissipation; vector and parallel processing.

Motivation: Why use fixed-point signal processing in FPGAs? No register size restriction; no overflow of the kind that occurs in fixed register processing; synthesis (put-and-place) may be augmented; high-speed integer arithmetic.

Development of Programmable Logic (figures: DEC PDP-8/E, ENIAC)

Development of Programmable Logic: The earliest programmable ICs (1978) were the Programmable Array Logic (PAL) devices. Example device: PAL16L8.

Development of Programmable Logic: Combinational logic and registered-output PAL devices were then configured as macrocells (1983). Example device: PAL22V10.

Development of Programmable Logic: Complex Programmable Logic Devices (CPLDs) combined several macrocells and a programmable interconnection system (1988). Example device: Xilinx XC2Cxx CPLD.

Development of Programmable Logic: As an alternative architecture, the early FPGA (1994) provided an array of configurable logic blocks (CLBs) dominated by a complex routing scheme. Example device: Xilinx Spartan FPGA.

Development of Programmable Logic: The Xilinx Spartan-IIE FPGA (1998) uses a matrix of routing channels surrounding the CLBs and input-output blocks (IOBs) for fine-grained interconnection.

Development of Programmable Logic: The Xilinx Spartan-3E FPGA (2003) had routing channels surrounding the CLBs but added coarse-grained structures such as block RAM, digital clock managers (DCMs), and multipliers.

Development of Programmable Logic: Xilinx FPGAs have now split into two families: the low-cost Spartan series and the high-performance Virtex, Kintex, and Artix series.

Development of Programmable Logic: Except for the first of the Virtex series, all the Xilinx FPGAs feature a coarse-grained architecture: multipliers in hardware (DSP48); block RAM with variable size registers; digital clock manager (DCM).

Development of Programmable Logic: Although applications in digital signal processing (DSP) and digital image processing (DIP) are facilitated, the coarse-grained architecture has drawbacks for other applications: hardware multipliers consume power; block RAM may be an impediment in some applications; synthesis (put-and-place) may be difficult because of the coarse grain.

Development of Programmable Logic: The first in the Xilinx Virtex series (1999) was a sea of CLBs. However, the largest of the class was the XCV1000, with an array of only 64 by 96 (6144) 2-slice CLBs and a 200 MHz clock. (Figure: Virtex I 2-slice CLB.)

Development of Programmable Logic: The configuration of the CLB remains the same today, with increases only in width. Virtex I has a 4-input look-up table (LUT) for combinational logic. Virtex I has no hardware multipliers but some block RAM (128 Kbits).
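As an illustrative sketch only, a 4-input LUT can be modelled as a 16-bit configuration word that is addressed by the four inputs, so any Boolean function of four variables fits in one LUT. The names make_lut4 and lut4 are hypothetical and are not taken from any Xilinx tool.

```python
# Illustrative model of a 4-input look-up table (LUT); all names are hypothetical.

def make_lut4(func):
    """Build the 16-bit configuration word for a 4-input Boolean function."""
    init = 0
    for addr in range(16):
        a = addr & 1
        b = (addr >> 1) & 1
        c = (addr >> 2) & 1
        d = (addr >> 3) & 1
        if func(a, b, c, d):
            init |= 1 << addr  # store a 1 at this input combination
    return init


def lut4(init, a, b, c, d):
    """Evaluate the LUT: the four inputs form an address into the stored word."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> addr) & 1


# Two example functions mapped onto the same LUT structure.
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
```

Only the stored word changes between the two functions; the lookup hardware is identical, which is what makes the CLB configurable.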

Development of Programmable Logic: The XCV1000 Virtex I had 512 IO pins and was designed for integration.

Development of Programmable Logic: The versatility of the Virtex I lies in its sea of CLBs, although the array is small by today's standards.

Development of Programmable Logic: The largest Virtex 7 has 178,000 2-slice CLBs, roughly a 30-fold increase, but also 3360 hardware multipliers and 68 Mbit of block RAM in its coarse-grained architecture.

Development of Programmable Logic: Although the device is over 18 years old, the SCDL still uses the Virtex I XCV1000 for proof of concept because of its ease of synthesis (put-and-place) in vector operations and its IO capabilities for performance verification (www.temple.edu/scdl).

Algorithms into Fixed Point: FPGA designs implemented in fixed point will always be more efficient than their equivalent in floating point because fixed point implementations consume fewer resources and less power.

Algorithms into Fixed Point: Fixed point computer arithmetic references and research form the basis for implementation.
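A minimal sketch of the underlying arithmetic may help here. It assumes a Q15 format (15 fraction bits) and uses hypothetical helper names; range checking and overflow handling are deliberately left out.

```python
# Minimal fixed-point (Q-format) sketch; FRAC and the helper names are assumptions.
FRAC = 15  # number of fraction bits (Q15 if stored in a 16-bit signed word)


def to_fixed(x, frac=FRAC):
    """Scale a real value to an integer carrying `frac` fraction bits."""
    return int(round(x * (1 << frac)))


def to_float(q, frac=FRAC):
    """Interpret a scaled integer as a real value."""
    return q / (1 << frac)


def fixed_add(a, b):
    """Operands in the same format add as plain integers."""
    return a + b


def fixed_mul(a, b, frac=FRAC):
    """The raw product carries 2*frac fraction bits, so shift back by frac."""
    return (a * b) >> frac


a = to_fixed(0.5)    # stored as the integer 16384
b = to_fixed(0.25)   # stored as the integer 8192
product = to_float(fixed_mul(a, b))
total = to_float(fixed_add(a, b))
```

Every operation is an integer add, multiply, or shift, which is why fixed-point designs map onto FPGA logic with few resources.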

Algorithms into Fixed Point: Discrete impulse applied to both FIR filters.
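The impulse test can be sketched as follows, assuming the two filters are a floating-point reference and a fixed-point version with the same taps. The coefficients and the Q15 coefficient format are hypothetical and are not taken from the slides.

```python
# Impulse-response comparison of a floating-point and a fixed-point FIR filter.
# The taps and the Q15 coefficient format are assumptions for illustration.
FRAC = 15


def fir(x, h):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


h_float = [0.1, 0.25, 0.3, 0.25, 0.1]                 # hypothetical taps
h_fixed = [round(c * (1 << FRAC)) for c in h_float]   # quantized to Q15 integers

# A unit impulse returns the coefficients themselves as the output sequence.
y_float = fir([1.0] + [0.0] * 7, h_float)
y_fixed = [v / (1 << FRAC) for v in fir([1] + [0] * 7, h_fixed)]

max_error = max(abs(a - b) for a, b in zip(y_float, y_fixed))
```

Because the impulse response of an FIR filter is its coefficient list, max_error here is exactly the coefficient quantization error, which is bounded by half a least significant bit.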

Algorithms into Fixed Point: Dynamic power saving >80%.

Algorithms into Fixed Point: CORDIC (COordinate Rotation DIgital Computer), or Volder's algorithm, is a simple and efficient process for calculating hyperbolic and trigonometric functions, typically converging at one digit (or bit) per iteration.

Algorithms into Fixed Point: CORDIC implementations using fixed-point arithmetic are attractive as they can exhibit high performance and low resource usage.
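A rotation-mode CORDIC for sine and cosine can be sketched with integer shifts and adds only. The 16 fraction bits, the 16 iterations, and all names below are assumptions chosen for illustration, not parameters from the slides.

```python
import math

# Rotation-mode CORDIC in integer arithmetic; word sizes and names are assumptions.
FRAC = 16            # fraction bits of the fixed-point format
ITERATIONS = 16      # roughly one bit of accuracy per iteration
ONE = 1 << FRAC

# Precomputed arctangent table, atan(2^-i), in fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The micro-rotations stretch the vector by a known gain; start with its inverse.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_INV = round(ONE / _gain)


def cordic_sin_cos(angle):
    """Return (sin, cos) of an angle in radians, for |angle| <= pi/2."""
    x, y = K_INV, 0
    z = round(angle * ONE)  # residual angle still to be rotated
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE


s, c = cordic_sin_cos(math.pi / 6)
```

The loop body needs no multiplier, only shifters and adders, which is the source of the low resource usage on an FPGA.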

MathWorks Fixed Point Designer: Provides data types and tools for developing fixed-point algorithms to optimize performance on embedded hardware. Analyzes the design and proposes data types and attributes such as word length and scaling.

MathWorks Fixed Point Designer: Performs bit-true simulations to observe the impact of limited range and precision. Converts double-precision algorithms to fixed point. Specifies data attributes such as rounding mode and overflow action.
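The effect of rounding mode and overflow action can be sketched in a few lines. This is a stand-alone illustration with hypothetical names and option strings; it is not the Fixed Point Designer API.

```python
import math

# Stand-alone sketch of rounding mode and overflow action on a signed word.
# The function name and the option strings are assumptions, not a MathWorks API.


def quantize(x, word_length, frac_length, rounding="nearest", overflow="saturate"):
    """Quantize x to a signed fixed-point value and return it as a float."""
    scaled = x * (1 << frac_length)
    if rounding == "floor":
        q = math.floor(scaled)            # truncate toward minus infinity
    elif rounding == "nearest":
        q = math.floor(scaled + 0.5)      # round half up
    else:
        raise ValueError("unknown rounding mode")

    lo = -(1 << (word_length - 1))
    hi = (1 << (word_length - 1)) - 1
    if overflow == "saturate":
        q = max(lo, min(hi, q))           # clip to the representable range
    elif overflow == "wrap":
        q = (q - lo) % (1 << word_length) + lo   # two's-complement wraparound
    else:
        raise ValueError("unknown overflow action")
    return q / (1 << frac_length)


# 8-bit word with 6 fraction bits: representable range is -2.0 to 1.984375.
floor_value = quantize(0.37, 8, 6, rounding="floor")
nearest_value = quantize(0.37, 8, 6, rounding="nearest")
saturated = quantize(2.5, 8, 6, overflow="saturate")
wrapped = quantize(2.5, 8, 6, overflow="wrap")
```

Saturation keeps an out-of-range value close to the intended one, whereas wrapping can flip its sign, which is why the overflow action is a design attribute rather than a detail.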

MathWorks Fixed Point Designer: Optimizes data types to meet numerical accuracy requirements and target hardware constraints. Compares fixed-point results with floating-point baselines and supports Verilog HDL code generation.

MathWorks Fixed Point Designer: The fixed point design process starts with a floating point algorithm, which is kept as the reference for later verification.

MathWorks Fixed Point Designer: The Fixed Point Designer iteratively translates the floating point algorithm to fixed point.

MathWorks Fixed Point Designer: The fixed point algorithm is verified as an intermediary Register Transfer Language (RTL).

MathWorks Fixed Point Designer: The verified RTL fixed point algorithm is implemented in an FPGA with either the MathWorks HDL Coder or the Xilinx System Generator.

MathWorks Fixed Point Designer: There are two design flows for conversion from floating point to fixed point in MATLAB/Simulink: automatic conversion with the Fixed-Point Converter App, and manual conversion with fi at the command line.

MathWorks Fixed Point Designer, Fixed Point Converter App: verifies the full intended operating range of the algorithm using code coverage results; proposes fraction lengths based on default word lengths; proposes word lengths based on default fraction lengths; optimizes whole numbers.

MathWorks Fixed Point Designer, Fixed Point Converter App: specifies safety margins for min/max data; shows a histogram of the bits that each variable uses.
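The idea of proposing a fraction length from logged min/max data can be sketched as below. This is only an illustration of the principle under a simple assumption (pick the largest fraction length whose signed range still covers the data plus a margin); the function name is hypothetical and the app's actual procedure is not shown in the slides.

```python
# Sketch of proposing a fraction length from observed min/max data.
# propose_fraction_length is a hypothetical helper, not the app's algorithm.


def propose_fraction_length(min_val, max_val, word_length=16, safety_margin=0.0):
    """Largest fraction length whose signed range covers the widened data range."""
    lo_needed = min(min_val, 0.0) * (1.0 + safety_margin)
    hi_needed = max(max_val, 0.0) * (1.0 + safety_margin)
    for frac in range(word_length - 1, -word_length, -1):
        lsb = 2.0 ** -frac
        lo = -(1 << (word_length - 1)) * lsb        # most negative value
        hi = ((1 << (word_length - 1)) - 1) * lsb   # most positive value
        if lo <= lo_needed and hi_needed <= hi:
            return frac
    raise ValueError("range does not fit in the given word length")


tight = propose_fraction_length(-1.0, 0.9)                         # no margin
padded = propose_fraction_length(-1.0, 0.9, safety_margin=0.25)    # 25% margin
wide = propose_fraction_length(0.0, 100.0)
```

A safety margin costs precision: widening the range by 25% in this example gives up one fraction bit in exchange for headroom against inputs not seen during simulation.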

MathWorks Fixed Point Designer, Fixed Point Converter App: detects overflows; HDL Coder for FPGA synthesis.

MathWorks Fixed Point Designer, Fixed Point Converter App: Example: fixed point parallel-form digital filter, with parallel execution in the FPGA.
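A parallel-form filter can be sketched as a bank of independent sections whose outputs are summed. The two first-order sections, their coefficients, and the Q14 format below are hypothetical and are not the filter from the slides; the word length is not enforced.

```python
# Parallel-form filter sketch: independent first-order sections, outputs summed.
# Sections, coefficients, and the Q14 format are assumptions for illustration.
FRAC = 14
ONE = 1 << FRAC

# Each section computes y_i[n] = b_i * x[n] + a_i * y_i[n - 1].
SECTIONS = [(1.0, 0.5), (0.5, -0.25)]   # (gain b_i, pole a_i)


def q(value):
    """Quantize a real value to a Q14 integer."""
    return int(round(value * ONE))


def parallel_filter_fixed(x_fixed, sections=SECTIONS):
    coeffs = [(q(b), q(a)) for b, a in sections]
    states = [0] * len(coeffs)
    out = []
    for x in x_fixed:
        # Each section uses only the shared input and its own state, so the
        # sections could be evaluated concurrently in hardware.
        states = [(b * x + a * s) >> FRAC for (b, a), s in zip(coeffs, states)]
        out.append(sum(states))
    return out


def parallel_filter_float(x, sections=SECTIONS):
    states = [0.0] * len(sections)
    out = []
    for v in x:
        states = [b * v + a * s for (b, a), s in zip(sections, states)]
        out.append(sum(states))
    return out


impulse = [1.0] + [0.0] * 7
y_fixed = [v / ONE for v in parallel_filter_fixed([q(v) for v in impulse])]
y_float = parallel_filter_float(impulse)
max_error = max(abs(a - b) for a, b in zip(y_fixed, y_float))
```

Since no section waits on another, the critical path is one multiply-add plus the final sum, regardless of how many sections are added.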

MathWorks Fixed Point Designer, Fixed Point Converter App: Analysis results for the fixed point parallel filter.

MathWorks Fixed Point Designer, Manual Conversion: fi constructs a fixed point numeric object.

MathWorks Fixed Point Designer, Manual Conversion: However, using fi requires more experience.

Fixed Point Designer Project Example: Real-time calculation of the projection matrix for the Algebraic Reconstruction Technique (ART) in 3D tomographic chemical threat mapping using hyperspectral imaging (MESH, Inc.).

Fixed Point Designer Project Example: Path measurements from a limited number of sensors produce a concentration map. ART is used for noisy, sparsely sampled data from a limited number of sensors (MESH, Inc.).

Fixed Point Designer Project Example: A path-integrated concentration length (CL) is determined from wind velocity and integration time. ART uses the projection matrix to produce the object concentration vector for mapping.

Fixed Point Designer Project Example: The object vector is solved for iteratively, adjusting the part that is affected by the current CL measurement on each iteration.
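The row-by-row update can be sketched with the classical ART (Kaczmarz) iteration. The sketch below is in floating point for clarity and uses a small hypothetical projection matrix; the project's fixed-point, real-time FPGA implementation is not shown in the slides and is not reproduced here.

```python
# Classical ART (Kaczmarz) iteration in floating point, for illustration only.
# The matrix, measurements, and function name are hypothetical.


def art_solve(A, b, sweeps=50, relaxation=1.0):
    """Estimate x in A x = b by correcting against one measurement at a time."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, measured in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # this path crosses no cells
            predicted = sum(a * xi for a, xi in zip(row, x))
            step = relaxation * (measured - predicted) / norm_sq
            # Only cells with a nonzero weight in this row are adjusted.
            x = [xi + step * a for xi, a in zip(x, row)]
    return x


# Three paths over three cells; each row holds the weights of one path.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0]]
b = [3.0, 5.0, 4.0]          # path measurements consistent with x = [1, 2, 3]
x_est = art_solve(A, b)
```

Within one row update, the inner products and the per-cell corrections are independent multiply-adds, which is the part that lends itself to parallel execution.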

Fixed Point Designer Project Example: The iterative adjustment is an intensive parallel operation, and the calculation is done in fixed point and in real time on an FPGA.

FPGA Synthesis: Synthesis of a digital logic system to meet functionality and timing constraints is only the first task.

FPGA Synthesis: Xilinx provides the Integrated Logic Analyzer (ILA), which samples logic signals at design speeds and stores them on-chip in block RAM (BRAM). Porting a signal out of the FPGA for an external logic analyzer can distort the timing.

FPGA Synthesis: However, the most vexing problem is the put-and-place hardware synthesis of the design, because of the coarse-grained architecture of the FPGA. The Xilinx PlanAhead tool eases but does not eliminate this problem.

FPGA Synthesis: The coarse-grained architecture often thwarts put-and-place, leaving only a smaller percentage of FPGA resources available for implementation.

FPGA Synthesis: The fine-grained architecture of the Virtex I FPGA facilitates put-and-place.

System Chip Design Laboratory (www.temple.edu/scdl): The SCDL was established in 1999 with funding from the Western Design Center, the developer of the 65C02/65C816/65C832 microprocessors. (Figure: SCDL in 1999.)

System Chip Design Laboratory: An integrated design environment (IDE) to facilitate the rapid hardware/software codesign of a digital signal processor (DSP) or process control soft cores in Xilinx FPGAs.

System Chip Design Laboratory: The Xilinx System Generator and MATLAB/Simulink for Xilinx FPGA hardware-in-the-loop embedded processing.

System Chip Design Laboratory: Real-time benchmarking and performance measurement of the Xilinx Zynq System-on-Chip (SoC) processor and bus. The Zynq integrates a dual-core ARM Cortex-A9 processor, FPGA, hard core peripherals, and the AMBA bus.

System Chip Design Laboratory: Development of a hardware scheduler of multiple tasks in a hard real-time operating system (RTOS). Implementation in hardware of services provided by a software RTOS kernel, such as scheduling and inter-process communication.

System Chip Design Laboratory: Education in embedded system design using the Xilinx FPGA.

System Chip Design Laboratory: Education in embedded system design now using the Xilinx Zynq SoC.
