FPGA Based Digital Signal Processing Applications & Techniques. Nathan Eddy Fermilab BIW12 Tutorial

Similar documents
REAL TIME DIGITAL SIGNAL PROCESSING

Fatima Michael College of Engineering & Technology

Parallel FIR Filters. Chapter 5

REAL TIME DIGITAL SIGNAL PROCESSING

19. Implementing High-Performance DSP Functions in Stratix & Stratix GX Devices

LWA DRX C Language Simulation Report - v0.1

How to validate your FPGA design using realworld

CHAPTER 4. DIGITAL DOWNCONVERTER FOR WiMAX SYSTEM

D. Richard Brown III Professor Worcester Polytechnic Institute Electrical and Computer Engineering Department

D. Richard Brown III Associate Professor Worcester Polytechnic Institute Electrical and Computer Engineering Department

Implementation of a Low Power Decimation Filter Using 1/3-Band IIR Filter

Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training

Digital Filters in Radiation Detection and Spectroscopy

REAL TIME DIGITAL SIGNAL PROCESSING

Selecting PLLs for ASIC Applications Requires Tradeoffs

VIII. DSP Processors. Digital Signal Processing 8 December 24, 2009

isplever Parallel FIR Filter User s Guide October 2005 ipug06_02.0

Representation of Numbers and Arithmetic in Signal Processors

EQUALIZER DESIGN FOR SHAPING THE FREQUENCY CHARACTERISTICS OF DIGITAL VOICE SIGNALS IN IP TELEPHONY. Manpreet Kaur Gakhal

ECE4703 B Term Laboratory Assignment 2 Floating Point Filters Using the TMS320C6713 DSK Project Code and Report Due at 3 pm 9-Nov-2017

Filterbanks and transforms

PCI-16HSDI: 16-Bit, Six-Channel Sigma-Delta Analog Input PMC Board. With 1.1 MSPS Sample Rate per Channel, and Two Independent Clocks

Introduction to Field Programmable Gate Arrays

Virtex-II Architecture. Virtex II technical, Design Solutions. Active Interconnect Technology (continued)

Chapter 1 Introduction

Head, Dept of Electronics & Communication National Institute of Technology Karnataka, Surathkal, India

Xilinx DSP. High Performance Signal Processing. January 1998

Data Manual GC4016 MULTI-STANDARD QUAD DDC CHIP

New Software-Designed Instruments

Image Compression System on an FPGA

GC2011A 3.3V DIGITAL FILTER CHIP DATASHEET. March 21, 2000

Basic Xilinx Design Capture. Objectives. After completing this module, you will be able to:

ECE 450:DIGITAL SIGNAL. Lecture 10: DSP Arithmetic

Digital Signal Processing with Field Programmable Gate Arrays

Computational Methods. H.J. Bulten, Spring

Field Programmable Gate Array (FPGA)

Classification of Semiconductor LSI

RISC IMPLEMENTATION OF OPTIMAL PROGRAMMABLE DIGITAL IIR FILTER

Chapter 2. Cyclone II Architecture

HIGH SPEED SIGNAL CHAIN SELECTION GUIDE

4.1 QUANTIZATION NOISE

DIGITAL SYSTEM. Technology Overview Nordco. All rights reserved. Rev C

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers

Intel Stratix 10 Clocking and PLL User Guide

Qsys and IP Core Integration

Evaluating MMX Technology Using DSP and Multimedia Applications

REAL-TIME DIGITAL SIGNAL PROCESSING

Digital Signal Processing Lecture Notes 22 November 2010

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

CPCI-16HSDI. 16-Bit, Six-Channel Sigma-Delta Analog Input Board. With 1.1 MSPS Sample Rate per Channel, and Two Independent Clocks.

Floating-point to Fixed-point Conversion. Digital Signal Processing Programs (Short Version for FPGA DSP)

Digital Signal Processor 2010/1/4

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

Optimal Management of System Clock Networks

8. Migrating Stratix II Device Resources to HardCopy II Devices

Interfacing RLDRAM II with Stratix II, Stratix,& Stratix GX Devices

Intel MAX 10 Clocking and PLL User Guide

Hardware Design with VHDL PLDs IV ECE 443

2008/12/23. System Arch 2008 (Fire Tom Wada) 1

4. DSP Blocks in Stratix IV Devices

An introduction to Digital Signal Processors (DSP) Using the C55xx family

Characterizing Your PLL-based Designs To Manage System Jitter. Agilent Technologies

Implementing Biquad IIR filters with the ASN Filter Designer and the ARM CMSIS DSP software framework

CompuScope Ultra-fast waveform digitizer card for PCI bus. APPLICATIONS. We offer the widest range of

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes

Intel MAX 10 Clocking and PLL User Guide

User Manual for FC100

Data Acquisition Specifications a Glossary Richard House

Accuracy versus precision

COMP Overview of Tutorial #2

PSEC-4: Review of Architecture, etc. Eric Oberla 27-oct-2012

02 - Numerical Representations

ALTERA FPGAs Architecture & Design

Modeling and implementation of dsp fpga solutions

A Guide. DSP Library

SLC ultra low jitter Clock Synthesizer 2 MHz to 7 GHz

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com

A 20 GSa/s 8b ADC with a 1 MB Memory in 0.18 µm CMOS

PC104P66-16HSDI4AO4:

Intel Stratix 10 Variable Precision DSP Blocks User Guide

University Syllabus DSP Algorithms and Architecture

EECS 452 Midterm Closed book part Fall 2010

Product Information Sheet PX Channel, 14-Bit Waveform Digitizer

Detector Data Acquisition Hardware Designs and Features of NGC (New General Detector Controller)

FFT/IFFTProcessor IP Core Datasheet

MULTIPLE CHANNEL COHERENT AMPLITUDE MODULATED (AM) TIME DIVISION MULTIPLEXING (TDM) SOFTWARE DEFINED RADIO (SDR) RECEIVER

Intel MAX 10 FPGA Device Overview

Digital Signal Processing Introduction to Finite-Precision Numerical Effects

Xilinx System Generator v Xilinx Blockset Reference Guide. for Simulink. Introduction. Xilinx Blockset Overview.

MAX 10 FPGA Device Overview

The Benefits of FPGA-Enabled Instruments in RF and Communications Test. Johan Olsson National Instruments Sweden AB

PLATINUM BY MSB TECHNOLOGY

systems such as Linux (real time application interface Linux included). The unified 32-

Implementing FIR Filters

MAX 10 - ADC. Last updated 8/12/18

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

Analysis of Radix- SDF Pipeline FFT Architecture in VLSI Using Chip Scope

Storage I/O Summary. Lecture 16: Multimedia and DSP Architectures

Product Information Sheet PDA16 2 Channel, 16-Bit Waveform Digitizer

Transcription:

FPGA Based Digital Signal Processing Applications & Techniques BIW12 Tutorial

Outline Digital Signal Processing Basics Modern FPGA Overview Instrumentation Examples

Advantages of Digital Signal Processing (DSP) NO DRIFT due to temperature or age ACCURACY defined by number of bits PREDICTABILITY from simulation PERFORMANCE Linear Phase Response possible Adaptability in terms of resources PRODUCTION identical units, no tuning FLEXIBILITY via firmware modifications DYNAMIC RANGE

For Beam Instrumentation Need to work with analog input signals Beam pickups, Schottky detectors, Torroids, etc Requires Analog to Digital Converters (ADCs) Need to produce analog output signals To act on the beam RF, kick signals, etc Require Digital to Analog Converters (DACs) The effectiveness of FPGA solutions is largely dominated by the performance of the converters

Practical Limitations of Sampling Common sources of sampling error Aliasing Quantization Sample Clock Jitter Can be characterized by their impact on the Signal to Noise Ratio (SNR) of the sampled signal SNR ~ log(sa/na) Typically expressed in decibels (db)

Discrete Time Sampling Si = S(t)* (t) where (t) = The sequence Si is the sampled version of S(t) (t-nt) The sampling frequency Fs is 1/T

Sampling Theorem For ideal reconstruction Fs > 2B where B is the highest frequency in the signal of interest Sampling ambiguity

Aliasing Signals at frequencies larger Fs/2 than will alias into the first Nyquist band Undersampling makes use of this effect

Quantization Results in signal noise SNRdb = 1.76 + 6.02N, N=number of bits

Oversampling Improves SNR Gain SNRdb = 10log(R) where R=Fs/2B Fs is the sampling Frequency B is signal bandwidth Also relaxes the requirements on the antialiasing analog filter

Sampling Clock Jitter Another source of signal noise Proportional to maximum signal frequency, fmax RMS sampling clock jitter, Ta SNRdb = 20log(1/(2 fmaxta)) A significant effect for undersampling applications

Theoretical SNR Comparison

The Z domain A sampled sequence, x(n) can be represented X(z) = n=0 x(n)z-n Where z-1 is the unit delay related to the sample period (T) X(z) is the z-transform of x(n)

Difference Eq & Transfer Function A constant coefficient difference equation is a recursive relationship where-by the output of a discrete time system can be calculated using a combination of past output values and past and present input values N k=0 aky[n-k] = M k=0 Gives z-domain transfer function, H(z) H(z) = Y(z) = X(z) bkx[n-k] M -k b z k k=0 N -k a z k k=0 Digital filters are specified by a,b coefficients

Infinite Impulse Response (IIR) Filter Feedback provides infinite impulse response an non-zero for n 1 Properties Very efficient Non-linear Phase Can be unstable Feedforward Feedback

Finite Impulse Response (FIR) Filter Nth Order FIR filter Just the feedforward block a0=1, all others zero Has linear phase if coefficients are symmetric That is {b0,,bn} = {bn,,b0} No analog equivalent

Cascade Integrating Comb (CIC) Filter Comb Filter Integrator H(z) = 1-z-RM H(z) = 1/(1+z-1) For an Nth Order CIC Filter, the transfer function is (1-z-RM)N RM-1 -k N H(z) = =( z ) -1 N (1+z ) k=0 Equivalent to N FIR filters with unit coefficients -> symmetric Linear Phase even though it has infinite response filter sections Used as very efficient way to filter and change rate Can be used as Interpolation filter by reversing I & C sections

Quadrature Signals, I & Q Complex signal representation Imag Q(t) A(t) φ(t) Real I(t) In-Phase (real) & Quadrature (imaginary) I(t) = A(t)cos(φ(t)) Q(t) = A(t)sin(φ(t))

Digital Down-Conversion cos(2 f) X I = Xcos(2 f) Signal X sin(2 f) X Multiply a signal X by sin(2 f) & cos(2 f) where f is the the down-conversion frequency Q = Xsin(2 f) Shifts the spectral components of X down in frequency by f Commonly followed by decimating low pass filter Remove sum frequency components Process narrow bandwidth signals at lower rate in baseband

Discrete Fourier Transform (DFT) Discrete Fourier Transform of length N X(k) = N-1 n=0 x(n) [cos(2 nk/n) +isin(2 nk/n)] X(k) determines the complex signal contribution of the frequency in the composition of x(n) Phase or direction (forward, inverse) given by the sign of the imaginary term

Fast Fourier Transform (FFT) Works as a bank of band-pass filters An N point DFT requires N2 operations Can compute the N point DFT as two N/2 point DFTs The output magnitude from each filter is proportional to the input energy in each band Extrapolate to the limit where N is a power of 2 for an N point FFT An N point FFT requires N/2log2(N) operations Note, FFT are almost always implemented as a power of 2 but can be any prime factor

Outline Digital Signal Processing Basics Modern FPGA Overview Instrumentation Examples

FPGA Advantages for DSP Parallel processing power (Speed) Pipeline architecture Able to consume large quantities of data Alleviate bandwidth bottlenecks in front-ends Efficient digital filter implementation Perform operations under strict timing (latency) control Dedicated DSP (multiplier) blocks Flexibility Clocking synchronous with accelerator Easy integration with ADCs and DACs Easy to modify algorithms and functionality

High End FPGA Architecture

Flexible I/O Options Dedicated hardware support for almost all standard I/O protocols Transceivers with equalization for fast differential serial I/O External memory interface support PCI Express, RapidIO, etc Support for over 20GBps links! DDR, DDR2, DDR3 Speed and quantity of available I/O -> $$

Clock Domains and Distribution Networks Multiple PLLs and DLLs for sophisticated clock management Support for run-time reconfiguration Fractional PLLs for arbitrary clock synthesis Dedicated global and regional clock distribution networks Support high rate synchronous design Support parallel & pipelined design Up to 16 Global Clocks Up to 23 Regional Clocks per Quadrant

Synchronous Design - Pipelining Reliable designs require synchronous design principles Increase speed and bandwidth Trade resources and latency Simple Delay Pipeline Data Clock

Digital Signal Processing (DSP) Blocks Based upon dedicated multiplier blocks Provide efficient flexibility and power Fully customize, dedicated support for FIRs, FFT, etc to optimize for speed and efficiency Modern blocks provide capability for 32 or 64 bit floating point operations

Integer Representation 2s Complement N Bit representation N-1 Sign Bit (N-1) Integer Bits 2N-2 0 20 Positive numbers just simple binary representation Negative numbers are binary number that when added to a positive number of the same magnitude equals zero Convenient Properties Simple arithmetic addition & subtraction the same 2N-1-1 -1 = IBs all 1-2N-1-2N = IBs all 0

Fixed Point Signal Processing Bit widths double on each multiply! Need to control bit widths for efficient resource use Truncation or Rounding Saturation or Roll-over Fixed Point N Bit Representation N-1 M 0 Sign Integer Bits Fractional Bits Bit 2-1 2-M Can facilitate keeping track of bit widths

Floating Point Representation IEEE-754 Standard Followed Sign Exponent Bit 32 bit Single Precision 8 bit Exponent & 23 bit Mantissa 64 bit Double Precision Mantissa 11 bit Exponent & 52 bit Mantissa Single-Extended Precision Exponent and Mantissa widths are not fixed Minimum 11 Exponent Bits (Exponent<Mantissa) Minimum 31Mantissa Bits Total Bits at least 43 up to 64

Running Average Example Can implement an N sample running average as an FIR with N taps with coefficients of 1/N Can be efficiently implemented using two accumulators and a delay line x[n] z-n Acc Acc - /N y[n]

Intellectual Property (IP) Cores Functional cores which can be used to greatly speed up and simplify the design process Each FPGA manufacturer provides cores for all basic components PLLs, RAM, FIFOs, Flip-Flops, etc Accumulators, Add, Sub, Multiply, Divide, etc Take advantage of chip resources Advanced cores available for almost any task Simple implementation via GUI parametrization

Embedded Systems Hard Core Embedded processor is a dedicated physical component of the chip, separate from the programmable logic 2-4 times faster than Soft Core More efficient if you need a processor Soft Core Embedded processor is built out of the programmable logic on the chip Hard Core Resource Allocation A 32 bit RISC processor uses about few percent of total resources Have option to re-allocate resources if processor not needed Soft Core Resource Allocation

System On a Programmable Chip (SOPC) A wide variety of design tools and options available Focus upon developing System On a Programmable Chip Pure HDL entry still possible Aim to simplify the design process as chip architectures become more complicated Tightly couple design and simulation at all levels

Outline Digital Signal Processing Basics Modern FPGA Overview Instrumentation Examples

Booster Digital Damper

Digital Damper Board 4 DACs DDR SoDIMM Big FPGA 4 ADCs

Firmware Overview 84 FIRs In parallel Beam Pos Beam Kick Design operates synchronous with Booster RF sweeping from 37MHz to 53MHz Provides bucket by bucket damping on all 84 Booster bunches

ADC Input clocked at 4*RF RF Ref PLL 4RF ADC 4RF LVDS RX RF RF FIFO ADC clock is shifted to provide ideal sampling FIFO used to shift data from ADC clock to internal clock

RF Reference PLL Use one of the Enhanced PLLs Real-time reconfigurability Dedicated clock pin for RF Reference Directly connected to clock output pins for each ADC clock Use to adjust phase of ADC clock in 15º increments (~200ps @ 50Mhz) Instantiated with parametrized IP

PLL Instantiation

PLL Instantiation

LVDS Receiver Instantiated with parametrized IP Able to take full advantage of built in RX blocks Dynamic Phase Alignment circuitry Automatically de-serialize data from 4*RF rate to 4 time multi-plexed samples at RF rate No need to worry about timing constraints!

LVDS Receiver IP

LVDS Receiver IP

LVDS Receiver IP

Programmable Delay Dual-port RAM to provide programmable delay Offset read & write pointer to requested delay value Reset Read Counter to zero when Write Counter = Delay-1 Maximum delay set by the depth of the RAM Data Clock Write Counter Read Counter Clock WrAddr RdAddr RdData WrData Data In Dual-port RAM Data Out

Handling Frequency Sweep Digital Portion of design is operating locked to the Booster RF and sweeps along with it Need to adjust output delay to account for fixed input/output cable & amplifier delays Use lookup table to specify output delay Clock Turn Counter Look-up Table RAM Delay Out Turn Marker Enable

Implementation of Fine Delay Filter produces kick for each RF bucket Operate DAC at 12*RF clock rate Kick @RF Kick @12*RF Latch 12 Kick @RF LVDS TX To DAC Can shift data fed to LVDS Transmitter to provide RF/12 delay resolution

Tune Measurement 128 Turn Excite single bunch with noise via the damper Simple State Machine to Control Measurement User programmable excitation, noise or anti-damping Can vary gain and duration of excitation at each measurement point Up to 64 measurements at selectable time (turn) within the machine cycle Select bunch position to pass to FFT engine Measure response of single bunch over 128 turns Instantiate fixed point FFT using commercial IP

Tune Measurement

8 Channel 125MS/s Digitizer

Digital BPM Receiver Firmware

Narrowband (Closed Orbit) Filter Beam Calib Digital Down Converter (DDC) Section 32 separate filter paths simultaneously 8 channels, I&Q, 2 frequencies (beam, calibration) CIC filters operating in parallel at 71MHz Serial FIR filter at 4.2KHz

Numerically Controlled Oscillator (NCO) IP Generate Sin & Cos signals at requested frequency Frequency Output = FCLKPhi/2N Frequency Resolution = FCLK/2N

NCO IP

DDC Quantization Error Will always have a slight offset between the NCO frequency and the beam frequency Easy solution is to offset the NCO frequency to get an integer number of periods in the averaging window

CIC Filter IP Commercial IP allows for single instantiation for all 32 CIC filters needed Provides standard serial data output which can be directly interfaced to serial FIR filter

FIR Filter IP Provides simple filter design tool or ability to import filter coefficients Option to allow modification of coefficients

System Integration Tools Streamline system integration and design Use well defined interfaces Connect standard interfaces Internal memory, external memory, configuration devices, etc Connect custom interfaces Easy CPU integration Handles addressing & interrupts Even generates drivers for system components! Generates all logic for system interconnects! Handles all the timing - clock domains, multiplexing, etc Built in error checking at compile time Facilitates implementing re-usable HDL blocks and group design methodologies

System Integration Tool Example System Clocks Design Comps Component Library

System Integration Tool Example

System Integration Tool Example

Summary - the FPGA Pitch Sure things Looks promising FPGA are now the acknowledged leader of cutting edge fast DSP applications where speed and flexibility are needed Accelerator Control and Instrumentation is already using FPGAs to implement fast online applications, especially feedback & control The size, speed, and feature sets continue to grow by leaps and bounds Today s mid level chips are offering features only available in high end chips just a few years ago at a fraction of the cost Design tools are getting closer to traditional programming and becoming easier to use Use of FPGA s to implement online orbit measurements and optic calcuations which could be used for realtime feedback The next step is cluster and mesh architectures using FPGAs to further increase the processing power It could happen.. FPGA based co-processors for dedicated calculations FPGA based super computers which configure their hardware to optimize the performance for the algorithms being used

Thanks for Your Attention!