DSP Resources. Main features: 1 adder-subtractor, 1 multiplier, 1 add/sub/logic ALU, 1 comparator, several pipeline stages

DSP Resources
- Specialized FPGA columns for complex arithmetic functionality
- DSP48 tile: two DSP48 slices plus interconnect
- Each DSP48 slice is a self-contained arithmetic-logic unit with add/sub/multiply/logic operations

DSP Resources
- DSP columns are, whenever possible, paired with Block RAM columns
- DSP48 tiles line up with RAMB36 (36Kb Block RAM)
- DSP48 slices line up with RAMB18
- This arrangement gives very fast routing from Block RAM to DSP slices

DSP Resources
- Main features: 1 adder-subtractor, 1 multiplier, 1 add/sub/logic ALU, 1 comparator, several pipeline stages

Common DSP48 Applications
- Generic multiplication
- Generic multiply-accumulate: $y = \sum_i a_i b_i$
- FIR filters: $y[n] = \sum_{k=0}^{N-1} h_k \, x[n-k]$, with symmetric coefficients $h_k = \pm h_{N-1-k}$
- Barrel shifters
- Replacing fabric adder trees with fast adder cascades
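As a rough illustration of why symmetric FIR coefficients matter, the sketch below pre-adds the two samples that share a coefficient, so a 4-tap filter needs two multiplications instead of four. The module name, port names and 16-bit widths are assumptions chosen for the example; whether the pre-adders end up inside DSP48 slices depends on the synthesis tool and device.

```verilog
// Hypothetical 4-tap symmetric FIR: h3 = h0 and h2 = h1.
module fir4_sym (
    input  wire               clock,
    input  wire signed [15:0] x,       // input sample
    input  wire signed [15:0] h0, h1,  // the two distinct coefficients
    output reg  signed [33:0] y        // wide enough for the full sum
);
    reg signed [15:0] x0, x1, x2, x3;  // delay line holding four samples

    always @(posedge clock) begin
        // shift the delay line
        x0 <= x;
        x1 <= x0;
        x2 <= x1;
        x3 <= x2;
        // pre-add samples that share a coefficient, then multiply once each
        y <= h0 * (x0 + x3) + h1 * (x1 + x2);
    end
endmodule
```

Because the assignment target is 34 bits wide and every operand is signed, Verilog sign-extends the operands before adding and multiplying, so the pre-add cannot overflow.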

DSP48 Slice

DSP48 Slice
- 4 main control signals:
  - INMODE (5b) controls pre-adders and input registers
  - OPMODE (7b) controls X, Y, Z multiplexers
  - CARRYINSEL (3b) controls source of carry for the final ALU
  - ALUMODE (4b) controls ALU function
- Each has positive clock-enable, synchronous positive reset (OPMODE and CARRYINSEL share control signals)

DSP48 Slice: Input Preadder Block
- Preadder on A, D inputs
- Pipeline registers

DSP48 Slice: Input Pipeline Block
- Pipeline on B input

DSP48E1 Slice: Multiplexers

DSP48E1 Slice: ALU
- 2 modes: 3-input and 2-input
- 3-input mode (arithmetic):

    ALUMODE   Function
    4'd0      X + Y + Z + CIN
    4'd1      X + Y + CIN - Z - 1
    4'd2      -X - Y - Z - CIN - 1
    4'd3      Z - (X + Y + CIN)

- 2-input mode (logic): 16 logic functions between X and Z
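The four arithmetic modes can be checked in simulation with a small behavioural model such as the one below. This is a sketch for experimenting with the ALUMODE table, not the DSP48E1 primitive: the module and port names are made up, and the logic modes are not modelled.

```verilog
// Behavioural model of the 3-input arithmetic ALU modes (illustrative names).
module alu3_model (
    input  wire [47:0] x,
    input  wire [47:0] y,
    input  wire [47:0] z,
    input  wire        cin,
    input  wire [3:0]  alumode,
    output reg  [47:0] p
);
    // X + Y + CIN is common to all four modes
    wire [47:0] xy = x + y + {47'd0, cin};

    always @* begin
        case (alumode)
            4'd0:    p = z + xy;          // X + Y + Z + CIN
            4'd1:    p = xy - z - 48'd1;  // X + Y + CIN - Z - 1
            4'd2:    p = ~(z + xy);       // -X - Y - Z - CIN - 1
            4'd3:    p = z - xy;          // Z - (X + Y + CIN)
            default: p = 48'd0;           // logic modes not modelled here
        endcase
    end
endmodule
```

Mode 4'd2 relies on the two's-complement identity ~v = -v - 1, which is where the trailing "- 1" terms in the table come from.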

DSP48E1 Slice: ALU
- Arithmetic can be segmented into 2 or 4 SIMD lanes (two 24-bit or four 12-bit operations)
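To make the SIMD segmentation concrete, here is a minimal behavioural sketch of a 48-bit adder split into four independent 12-bit lanes. The lane width and all names are assumptions for the example, and mapping such code onto the DSP48 SIMD mode is tool-dependent (it may require instantiating the primitive directly).

```verilog
// Four independent 12-bit additions packed into 48-bit words (illustrative).
module simd_add4x12 (
    input  wire [47:0] a,
    input  wire [47:0] b,
    output wire [47:0] sum
);
    genvar i;
    generate
        for (i = 0; i < 4; i = i + 1) begin : lane
            // each lane wraps on its own; carries never cross a lane boundary
            assign sum[12*i +: 12] = a[12*i +: 12] + b[12*i +: 12];
        end
    endgenerate
endmodule
```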

HDL Guide to Multipliers
- Described using the multiplication operator (*)
- Can be combinatorial or registered
- Implemented in fabric (LUTs) or in DSP48 slices based on operand size
- Per-device thresholds; Virtex-7: 6x6 bit multiplier in fabric, larger in DSP48
- DSP48 multiplication is natively signed
  - 18x25 bit signed multiplication
  - 17x24 bit unsigned multiplication (sign bit set to zero)

HDL Guide to Multipliers
- Pipeline registers can be absorbed into the DSP48 slice
- Up to 2 output pipeline levels
- Control signals: synchronous positive reset, positive enable

    always @(posedge clock) begin
        if (reset) begin
            result_r1 <= 0;
            result_r2 <= 0;
        end else if (en) begin
            result_r1 <= op_a * op_b;
            result_r2 <= result_r1;
        end
    end

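HDL Guide to Multipliers
- Preadd-multipliers

    always @(posedge clock)
        if (reset) begin
            preadd    <= 0;
            result_r1 <= 0;
            result_r2 <= 0;
        end else if (en) begin
            preadd    <= op_b + op_c;       // pre-adder stage
            result_r1 <= op_a * preadd;     // multiplier stage
            result_r2 <= result_r1;         // output pipeline stage
        end

- Adder, multiplier and all pipeline stages are absorbed into a DSP48 slice
- Same result for pre-subtraction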

HDL Guide to Multipliers
- Preaddsub-multipliers (?)

    always @(posedge clock)
        if (reset) begin
            mult_in_preadd <= 0;
            result_r1 <= 0;
            result_r2 <= 0;
        end else if (en) begin
            if (paddsub)
                mult_in_preadd <= op_b - op_c;
            else
                mult_in_preadd <= op_b + op_c;
            result_r1 <= op_a * mult_in_preadd;
            result_r2 <= result_r1;
        end

HDL Guide to Multipliers
- Preaddsub-multipliers
  - B and C operands do not exceed 25 bits
  - The paddsub control signal selects subtraction when high (a fabric inverter is inserted if not)
  - Optionally, a clock-enabled register on the paddsub input can be absorbed to improve timing
  - Because paddsub controls part of the INMODE register in the DSP48 slice, and INMODE is reset as a whole, reset is not available for paddsub only
  - Use of reset will result in a fabric flip-flop for paddsub
  - Clock-enable can be dedicated

HDL Guide to MAC
- Multiply-accumulate block with pre-adder
- Multiplication result can be accumulated into the 48-bit result output
- 32-sample accumulation at maximum multiplication operand size (32-stage FIR): a 25x18 product is 43 bits wide, which leaves 5 guard bits in the 48-bit accumulator, and 2^5 = 32
- Related function: preaddsub-multiply-postaddsub
  - Multiplier output is added to or subtracted from a 48-bit input
  - Switching operands results in a fabric postaddsub
  - Postaddsub control signals are the same as for preaddsub
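A multiply-accumulate in the same coding style as the multiplier examples might look like the sketch below. The `clear` input, the module name and the port names are hypothetical additions for illustration; whether the whole structure is absorbed into one DSP48 slice should be confirmed in the synthesis report.

```verilog
// Registered multiply-accumulate sketch (names and clear input are assumptions).
module mac_sketch (
    input  wire               clock,
    input  wire               reset,  // synchronous, active high
    input  wire               en,     // clock enable, active high
    input  wire               clear,  // load the product instead of accumulating
    input  wire signed [24:0] op_a,
    input  wire signed [17:0] op_b,
    output reg  signed [47:0] acc
);
    reg signed [42:0] prod;  // 25 + 18 = 43-bit product

    always @(posedge clock) begin
        if (reset) begin
            prod <= 0;
            acc  <= 0;
        end else if (en) begin
            prod <= op_a * op_b;
            // prod is sign-extended to 48 bits since all operands are signed;
            // clear lines up with prod, one cycle after the matching inputs
            acc  <= clear ? prod : acc + prod;
        end
    end
endmodule
```

Asserting `clear` with the first product of each block restarts the sum without spending a cycle on reset.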

HDL Guide to DSP48 Adder Cascades
- Adder trees can only be implemented in fabric
  - Asymptotically optimal delay: O(log N)
  - Slow speed (fmax)
  - Consumes resources
  - Wastes power
- 8-input adder tree (48b): ~4ns delay and 336 LUTs

HDL Guide to DSP48 Adder Cascades
- Adder cascades
  - Non-optimal delay: O(N)
  - Fast speed (fmax)
  - Requires only DSP48 resources
- An 8-input, 48b adder tree converts to a 7-stage adder cascade; at 700MHz: ~10ns delay (7 DSP48 slices)
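The cascade structure can be sketched for four operands as follows: three registered two-input adds in a chain, plus balancing registers so that each operand arrives together with the matching running sum. The 8-input case extends the same pattern to 7 stages. Names are illustrative, and the mapping of each stage onto a DSP48 slice is an assumption to verify after synthesis.

```verilog
// 4-input adder cascade: one registered two-input add per stage.
module adder_cascade4 #(parameter W = 48) (
    input  wire         clock,
    input  wire [W-1:0] a, b, c, d,
    output reg  [W-1:0] sum          // a + b + c + d, 3 cycles after the inputs
);
    reg [W-1:0] s1, s2;              // running sums after stages 1 and 2
    reg [W-1:0] c_d1;                // c delayed by 1 cycle
    reg [W-1:0] d_d1, d_d2;          // d delayed by 1 and 2 cycles

    always @(posedge clock) begin
        // balancing registers
        c_d1 <= c;
        d_d1 <= d;
        d_d2 <= d_d1;
        // cascade stages
        s1  <= a + b;
        s2  <= s1 + c_d1;
        sum <= s2 + d_d2;
    end
endmodule
```

Latency grows linearly with the number of operands, which is the O(N) delay noted above, but every stage is a single short add, so the clock rate stays high.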

HDL Guide to DSP48 Barrel Shifters
- Traditional way to code a barrel shifter (here a 16-bit rotate):

    assign sh = (din << amount) | (din >> (16 - amount));

- 16-bit input results in a 48-LUT implementation
- Idea: use shift-by-multiply:

    assign mult = din * (1 << amount);
    assign sh = mult[15:0] | mult[31:16];

- Implemented using 1 DSP48 and 32 LUTs
- If amount is already decoded: 1 DSP48, 16 LUTs
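Wrapped into a complete module, the shift-by-multiply idea might look like the sketch below. It assumes the intended operation is a 16-bit rotate left; the module and signal names are made up for the example.

```verilog
// 16-bit rotate left built from one multiplication (illustrative names).
module rotl16_mult (
    input  wire [15:0] din,
    input  wire [3:0]  amount,
    output wire [15:0] sh
);
    // one-hot decode: multiplying by 2**amount shifts left by amount
    wire [15:0] onehot = 16'd1 << amount;
    // the 32-bit product keeps the bits shifted out of the low half
    wire [31:0] mult = din * onehot;
    // OR the two halves so the overflowed bits wrap around to the bottom
    assign sh = mult[15:0] | mult[31:16];
endmodule
```

For example, with din = 16'h8001 and amount = 1 the product is 32'h0001_0002, and OR-ing the halves gives 16'h0003, which is the input rotated left by one bit. If the caller already has the one-hot value, the decode step disappears and only the OR remains in fabric.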

HDL Guide to DSP48 Barrel Shifters Eliminate LUTs by using DSP48 built-in wire shift and adders

HDL Guide to DSP48 Barrel Shifters Implement in single DSP48 slice using multiply-accumulate and time multiplexing

Exercise: Complex Number Multiplier
- X = a + bj, Y = c + dj
- Compute X*Y using DSP48 slices and report resource utilization and fmax
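One possible starting point for the exercise is the direct form X*Y = (ac - bd) + (ad + bc)j, written with four multiplications. The module name, the 18-bit operand width and the two-stage pipeline are assumptions; resource usage and fmax still have to be read from the implementation reports, and a three-multiplier variant is worth comparing.

```verilog
// Direct-form complex multiplier sketch: 4 multiplies, 2 adds, 2-cycle latency.
module cmul_sketch #(parameter W = 18) (
    input  wire                clock,
    input  wire signed [W-1:0] a, b,   // X = a + bj
    input  wire signed [W-1:0] c, d,   // Y = c + dj
    output reg  signed [2*W:0] re,     // ac - bd
    output reg  signed [2*W:0] im      // ad + bc
);
    reg signed [2*W-1:0] ac, bd, ad, bc;  // registered partial products

    always @(posedge clock) begin
        ac <= a * c;
        bd <= b * d;
        ad <= a * d;
        bc <= b * c;
        re <= ac - bd;   // one extra bit absorbs the add/sub growth
        im <= ad + bc;
    end
endmodule
```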

Further Reading
- Virtex-4 FPGA XtremeDSP User Guide
- 7-Series FPGA DSP Resources User Guide
- DSP: Designing for Optimal Results