DSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions

Similar documents
System-on Solution from Altera and Xilinx

FPGA Based FIR Filter using Parallel Pipelined Structure

FPGAs: FAST TRACK TO DSP

Utility Reduced Logic (v1.00a)

Stratix II vs. Virtex-4 Performance Comparison

The S6000 Family of Processors

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

INTRODUCTION TO FPGA ARCHITECTURE

The Nios II Family of Configurable Soft-core Processors

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Utility Bus Split (v1.00a)

Developing and Integrating FPGA Co-processors with the Tic6x Family of DSP Processors

Cache Justification for Digital Signal Processors

Summary. Introduction. Application Note: Virtex, Virtex-E, Spartan-IIE, Spartan-3, Virtex-II, Virtex-II Pro. XAPP152 (v2.1) September 17, 2003

AXI4 Interconnect Paves the Way to Plug-and-Play IP

Learning Outcomes. Spiral 3 1. Digital Design Targets ASICS & FPGAS REVIEW. Hardware/Software Interfacing

Parallel FIR Filters. Chapter 5

Reference System: MCH OPB EMC with OPB Central DMA Author: Sundararajan Ananthakrishnan

Fibre Channel Arbitrated Loop v2.3

SoC Basics Avnet Silica & Enclustra Seminar Getting started with Xilinx Zynq SoC Fribourg, April 26, 2017

Intro to System Generator. Objectives. After completing this module, you will be able to:

FPGA How do they work?

White Paper Using Cyclone III FPGAs for Emerging Wireless Applications

DSP Builder Handbook Volume 1: Introduction to DSP Builder

Agenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs

LogiCORE IP AXI DMA (v3.00a)

Embedded Systems. 7. System Components

An introduction to Digital Signal Processors (DSP) Using the C55xx family

ARM Processors for Embedded Applications

Power Consumption in 65 nm FPGAs

Spiral 3-1. Hardware/Software Interfacing

RUN-TIME PARTIAL RECONFIGURATION SPEED INVESTIGATION AND ARCHITECTURAL DESIGN SPACE EXPLORATION

MAX 10 FPGA Device Overview

Reference System: MCH OPB SDRAM with OPB Central DMA Author: James Lucero

Melon S3 FPGA Development Board Product Datasheet

Stratix vs. Virtex-II Pro FPGA Performance Analysis

The DSP Primer 8. FPGA Technology. DSPprimer Home. DSPprimer Notes. August 2005, University of Strathclyde, Scotland, UK

FPGA Algorithm Development Using a Graphical Environment

Interfacing LVPECL 3.3V Drivers with Xilinx 2.5V Differential Receivers Author: Mark Wood

ISE Design Suite Software Manuals and Help

Achieving High Performance DDR3 Data Rates in Virtex-7 and Kintex-7 FPGAs

FPGA Co-Processing Architectures for Video Compression

Scaling Up to TeraFLOPs Performance with the Virtex-7 Family and High-Level Synthesis

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Xilinx DSP. High Performance Signal Processing. January 1998

White Paper. The advantages of using a combination of DSP s and FPGA s. Version: 1.0. Author: Louis N. Bélanger. Date: May, 2004.

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

DSP Builder Handbook Volume 1: Introduction to DSP Builder

MAX 10 FPGA Device Overview

LogiCORE IP AXI DMA (v4.00.a)

Simplifying FPGA Design for SDR with a Network on Chip Architecture

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study

Teaching Computer Architecture with FPGA Soft Processors

LogiCORE IP Serial RapidIO Gen2 v1.2

The Next Generation 65-nm FPGA. Steve Douglass, Kees Vissers, Peter Alfke Xilinx August 21, 2006

AC : INCORPORATING SYSTEM-LEVEL DESIGN TOOLS INTO UPPER-LEVEL DIGITAL DESIGN AND CAPSTONE COURSES

FPGA for Complex System Implementation. National Chiao Tung University Chun-Jen Tsai 04/14/2011

Avnet Speedway Design Workshop

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

FPGAs Provide Reconfigurable DSP Solutions

LogiCORE IP Serial RapidIO v5.6

logibayer.ucf Core Facts

High-Performance DDR3 SDRAM Interface in Virtex-5 Devices Author: Adrian Cosoroaba

Intel MAX 10 FPGA Device Overview

Support Triangle rendering with texturing: used for bitmap rotation, transformation or scaling

Controller IP for a Low Cost FPGA Based USB Device Core

DESIGN AND IMPLEMENTATION OF 32-BIT CONTROLLER FOR INTERACTIVE INTERFACING WITH RECONFIGURABLE COMPUTING SYSTEMS

Hardware Design with VHDL PLDs IV ECE 443

Choosing a Processor: Benchmarks and Beyond (S043)

VLSI Design Automation. Maurizio Palesi

Digital Integrated Circuits

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

Microprocessor Soft-Cores: An Evaluation of Design Methods and Concepts on FPGAs

DESIGN AND IMPLEMENTATION OF SDR SDRAM CONTROLLER IN VHDL. Shruti Hathwalia* 1, Meenakshi Yadav 2

DESIGN STRATEGIES & TOOLS UTILIZED

White Paper Low-Cost FPGA Solution for PCI Express Implementation

Bringing the benefits of Cortex-M processors to FPGA

Full Linux on FPGA. Sven Gregori

iphone Noise Filtration Hardware

Dynamic Phase Alignment for Networking Applications Author: Tze Yi Yeoh

Evolution of CAD Tools & Verilog HDL Definition

Spiral 2-8. Cell Layout

Core Facts. Documentation. Encrypted VHDL Fallerovo setaliste Zagreb, Croatia. Verification. Reference Designs &

AccelDSP Synthesis Tool

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of

Table 1: Example Implementation Statistics for Xilinx FPGAs

Separating Reality from Hype in Processors' DSP Performance. Evaluating DSP Performance

Solving the Challenges for Terabit Networking and Beyond Author: Dean Eden

Virtex-4 Family Overview

Programmable Logic Design Grzegorz Budzyń Lecture. 15: Advanced hardware in FPGA structures

Documentation. Design File Formats. Constraints Files. Verification. Slices 1 IOB 2 GCLK BRAM

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased

Core Facts. Documentation Design File Formats. Verification Instantiation Templates Reference Designs & Application Notes Additional Items

A Scalable Multiprocessor for Real-time Signal Processing

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

New Computational Modeling for Solving Higher Order ODE based on FPGA

LogiCORE IP AXI Video Direct Memory Access v4.00.a

LogiCORE IP AXI DataMover v3.00a

Transcription:

White Paper: Spartan-3 FPGAs WP212 (v1.0) March 18, 2004 DSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions By: Steve Zack, Signal Processing Engineer Suhel Dhanani, Senior Solutions Marketing Manager FPGAs have been used in DSP applications for years as logic aggregators, bus bridges and peripherals. More recently FPGAs have been gaining considerable traction in high-performance digital signal processing applications and are emerging as ideal co-processors for standard DSP devices. In these latter two roles FPGAs provide tremendous computational throughput by using highly parallel architectures, and are hardware reconfigurable; allowing the designer to develop customized architectures for ideal implementation of their algorithms. The new generation of FPGAs developed using 90-nm process technology not only provide an effective way of implementing high-performance DSP functions but also provide the designer with an even more cost-effective solution. This white paper takes a look at some common high-performance DSP functions and calculates their effective implementation cost based on the published pricing of the underlying FPGA. For the purposes of this calculation, we will look at the Xilinx Spartan-3 FPGA family (introduced in 2003) which has a range of advanced features that enable the area-efficient implementation of DSP functions. 2004 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and disclaimers are as listed at http://www.xilinx.com/legal.htm. All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice. WP212 (v1.0) March 18, 2004 www.xilinx.com 1

Using FPGAs for Implementing DSP Functions In many cases, FPGAs work in conjunction with a conventional DSP typically integrating pre- and post-processing functions, along with high performance signal processing. FPGAs can also integrate all the logic, bus-bridging, and peripheral functions, thus reducing system costs and affording a higher level of system integration. FPGAs bring two key advantages to digital signal processing. First their architectures are well suited for highly parallel implementation of DSP functions, allowing for very high performance. Second, user programmability allows designers to trade-off device area vs. performance by selecting the appropriate level of parallelism to implement their functions. By programming the FPGA to use more on-chip resources, designers can achieve higher performance. By using less resources (and accepting a corresponding lower performance), designers can optimize the design for low cost. These advantages are further illustrated below. FPGAs are essentially arrays of uncommitted logic and signal processing resources. These allow the designer to implement DSP functions using highly scalable, parallel processing techniques. For example, whereas a traditional DSP solution would implement multiple MAC functions in a serial manner, an FPGA allows designers to implement these in parallel using dedicated multipliers and registers that are now available in many FPGAs (illustrated in Figure 1). As an example of the real-life benefit of this capability consider a 256-tap FI filter. By using resources that are available in the FPGA fabric, the designer can design a highly parallel implementation and achieve higher performance. 2 www.xilinx.com WP212 (v1.0) March 18, 2004

Figure 1: FPGAs Parallel Approach to DSP Enables Higher Computational Throughput Because FPGAs are completely hardware configurable, the designer has the flexibility to use only the necessary resources that the algorithm demands (see Figure 2). Put another way, the designer can optimize the hardware architecture to suit the ideal algorithm. In Figure 2 we show the different ways of implementing four multiply-accumulate (MAC) functions. By using four embedded multipliers within the FPGA fabric, this can be done at maximum speed. Alternatively the designer can opt to conserve area and implement the same function at a lower performance, by using only one multiplier, one accumulator, and a register; or the WP212 (v1.0) March 18, 2004 www.xilinx.com 3

designer can use the semi-parallel approach shown in Figure 2. Parallel Semi-Parallel Serial D Device Area: 4A Device Area: 2A Device Area: 1A MMAC/s: 600 MMAC/s: 300 MMAC/s: 150 HIGH SPEED OPTIMIZED FO SMALL AEA D Figure 2: Customize the FPGA to Suit Your Needs While FPGAs bring significant benefits to digital signal processing, it is important to analyze the effective cost of implementing DSP functions within the FPGA fabric. For the purpose of this analysis, the new Spartan-3 FPGA family is considered. This is the latest low-cost FPGA family from Xilinx that combines both low unit costs and system features for DSP. Spartan-3 FPGAs Architecture Optimized for DSP Spartan-3 FPGAs use the latest 90-nm manufacturing technology to achieve low silicon die costs. These devices are also the only low-cost FPGAs that have all the features required for efficiently implementing DSP functions. Features that were once the exclusive domain of high-end FPGAs are now available within the Spartan-3 lowcost fabric see Table 1. Table 1: Manner List of Spartan-3 Features that Enable DSP Functions in an Area-Efficient Spartan-3 Silicon Features Embedded 18x18 Multipliers Distributed AM Shift egister Logic Up to 104 18 Kb Block AM Customer Benefits Area efficient implementation of multiplyaccumulate function Local storage for DSP coefficients, small FIFOs 16-bit Shift egister ideal for capturing high-speed or burst mode data and to store data in DSP applications Video line buffers, cache tag memory, scratch-pad memory, packet buffers, large FIFOs With this family of FPGAs, the designer can implement high-performance complex DSP functions in a small fraction of the total device, leaving the rest of the device free to implement system logic or interfacing functions providing both lower costs and higher system integration. Table 2 demonstrates how the combination of advanced features and low cost work together to provide digital signal processing capability at a low cost. This table shows 4 www.xilinx.com WP212 (v1.0) March 18, 2004

a sampling of available Spartan-3 parts, the number of embedded multipliers within each device, and the cost for MMAC/s using the 50,000 unit price for each part. Table 2: Device Calculating Cost Per MMAC/s Embedded Mults (18x18) MMAC/second (Number of Mults x 150 MHz) Cost for MMAC/s (1) XC3S50 4 600 $0.0055 XC3S200 12 1,800 $0.0024 XC3S400 16 2,400 $0.0030 XC3S1000 24 3,600 $0.0037 XC3S1500 32 4,800 $0.0044 Notes: 1. Calculated using customer resale price for 50,000 units slowest speed grade and smallest package. The Million Multiply-Accumulate per second (MMAC/s) column is calculated by multiplying the number of multipliers with the operating frequency of the multipliers, which in the case of the Spartan-3 FPGAs, is 150 MHz in the slowest speed grade. Then looking at the published 50,000 price for the slow speed grade of the appropriate device, we calculate the cost for MMAC/s. This is one of the quoted industry benchmarks. With cost per MMAC/s approaching less than quarter of a cent Spartan-3 FPGAs provide an economically viable DSP co-processing option. Spartan-3 FPGAs Achieving Lowest DSP Function Cost There is no standard way of estimating the actual cost of implementing DSP functions onto FPGAs. For the purpose of this analysis, we introduce the notion of using effective cost which is determined as the cost based on percentage of the silicon area that is utilized, multiplied by the unit device cost. There are various ways of estimating effective cost, but effective cost based on the percentage of the device utilized seems to be a fair way of calculating the cost to the user; since the remainder of the FPGA can be used to implement other system functions. To calculate the effective cost of a DSP function when implemented in an FPGA, we considered the Spartan-3 XC3S1000 device which is a mid-range member of the entire Spartan-3 FPGA family. In many cases a given DSP function uses not only the FPGA logic but also embedded multipliers and block AMs. In that case we estimate the amount of die space taken by these embedded functions and add that to the die area used by the logic. WP212 (v1.0) March 18, 2004 www.xilinx.com 5

Table 3 shows some of these functions and the cost of implementing these within the Spartan-3 silicon. Table 3: DSP Function Effective Costs in Spartan-3 Devices (1) Functions 1024-point complex FFT Single channel 64-tap FI filter Digital down converter per channel Digital up converter per channel % of the XC3S1000 Device Utilized Effective Cost (2) (50K Units) Key Specification Other Specifications 24.1% $3.23 20 µs transform 20 µs transform, burst I/O, 16-bit input and phase factor 3.0% $0.41 8.1 MSPS 16-bit data and co-efficient, MAC implementation, 8.1 MSPS 18.6% $2.49 Sample rate 100 MSPS 18.6% $2.49 Sample rate 100 MSPS Viterbi decoder 37.8% $5.06 1.9 MSPS per channel Parallel mode, trace-back 42, constraint length = 7, 32-channel, 1.9 MSPS per channel eed Solomon G.709 encoder eed Solomon G.709 decoder 1.3% $0.17 120 MHz 6.9% $0.92 60 MHz Notes: 1. These costs do not include the cost for the programming POM since in many cases the existing EPOM on-board can be used for programming the FPGA. Effective costs are calculated using the 50,000 unit customer resale prices (slowest speed grade, smallest package). 2. Using Xilinx direct 50,000 unit price of $13.40 for the XC3S1000 device. Some of the most common functions used in DSP applications are Fast Fourier Transforms (FFTs) and FI filters. A single channel 64-tap MAC FI filter running at 8.1 MSPS can be implemented for an effective cost of $0.41. Note that this filter utilizes 200 logic slices and four embedded multipliers approximately 3% of the die area. Forward error correction DSP cores such as Viterbi and eed Solomon functions can be implemented at a low cost within the Spartan-3 device. A 32-channel, parallel mode Viterbi decoder running at 1.9 MSPS per channel has an effective cost of $5.06 or $0.16 per channel. A eed Solomon G.709 decoder function running at 60 MHz takes only 6.9% of the same device (effective cost $0.92). Complex functions such as a digital down converter (DDC) or a digital up converter (DUC) commonly used in wireless base stations take less than 20% of the XC3S1000 device (effective cost of $2.49). 6 www.xilinx.com WP212 (v1.0) March 18, 2004

Other Benefits Development Tool Flow In addition to the inherent cost advantage that the Spartan-3 family offers, there are a host of other advantages to using these FPGAs in a DSP application: Ability to integrate system logic Since the Spartan-3 architecture has all the features optimized for DSP functions, the implementation is extremely areaefficient. There are still plenty of resources left to handle traditional FPGA tasks such as the interfaces between other devices on the PC board. Both the system logic and the DSP functions can be implemented in one cost-effective fabric. Ability to integrate the system control path Many systems have an external microcontroller. Xilinx offers a powerful 32-bit soft processor called the MicroBlaze processor that can be integrated within the Spartan-3 fabric. This processor which occupies only 6% of the XC3S1000 device (with performance up to 68 D-MIPS), can address many traditional 16 and 32-bit microprocessor applications within control, instrumentation, test and measurement, automotive, and high-end consumer applications. With Xilinx, DSP designs can be done using industry standard development tools. Using MATLAB and Simulink from MathWorks, coupled with Xilinx System Generator for DSP, system engineers can now model, simulate, and verify their signal processing algorithms, on their target hardware platform, all without leaving the Simulink environment. The design flow typically involves the following steps: 1. A DSP designer develops and verifies the hardware model using industrystandard tools from MathWorks in conjunction with the Xilinx System Generator for DSP. 2. With a push of a button, Xilinx System Generator generates an HDL circuit representation that is bit- and cycle-true meaning that the behavior is guaranteed to match the functionality seen in the Simulink/System Generator model. 3. Then the ISE design tools synthesize the design and produce a bitstream that can be used to program the FPGA. The error-prone and time-consuming step of having an FPGA designer translate the system engineer s design into HDL is thus eliminated. With recent advances in the Xilinx System Generator, DSP designers can now generate an FPGA bitstream directly using Simulink/System Generator. WP212 (v1.0) March 18, 2004 www.xilinx.com 7

Figure 3 shows a typical design flow using the Xilinx System Generator. XILINX BLOCKSET Function/System Modeling.mdl Auto HDL Generation.hdl,.tv,.do ModelSim -- SYNTHESIS -- PLACE & OUTE -- BITSTEAM GENEATION FPGA Implementation FPGA bitstream Figure 3: DSP Design Methodology That Works Within an Existing Tool Flow Conclusion With its combination of low unit costs and architecture optimized for DSP functions, Spartan-3 devices now enable the industry s lowest price-points for high-performance DSP functions. Xilinx further enables embedded DSP functions by providing design tools that fit within the standard DSP designer s tool flow, and enhances productivity by automating the FPGA implementation process. With the availability of the Spartan-3 devices, the associated design tools, and the increasing number of off-the-shelf DSP functions optimized for this fabric, designers must evaluate embedding DSP functions within the Spartan-3 FPGA as one of the options when designing systems. evision History The following table shows the revision history for this document. Date Version evision 03/18/04 1.0 Initial Xilinx release. 8 www.xilinx.com WP212 (v1.0) March 18, 2004