Each DSP includes: 3-input, 48-bit adder/subtractor

Virtex-4 DSP Architecture
- 2 DSP slices per tile
- 16-256 tiles in 1-8 columns
- Each DSP includes:
  - 3-input, 48-bit adder/subtractor: P = Z ± (X + Y + Cin)
    - Optional accumulator register
  - 18x18-bit two's-complement multiplier (w/o adder)
  - User-controlled operational modes for the X, Y, & Z MUXs
  - Configuration bits control other MUXs
    - Pipelining registers
    - Accumulator register
[Figure: block diagram of the two DSP slices in a tile, each with inputs A (18), B (18), and C (48), X/Y/Z MUXs, a ± adder/subtractor, and output P (48), plus cascade inputs and outputs with dedicated routing]
C. Stroud 1/08 VLSI D&T Seminar
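The adder/subtractor function P = Z ± (X + Y + Cin) can be modelled behaviourally in a few lines. This is only a sketch: the function name `dsp_add_sub` and the `subtract` flag are hypothetical, and it assumes the 48-bit result simply wraps around (two's complement), which the slide does not state.

```python
MASK48 = (1 << 48) - 1  # 48-bit datapath width, taken from the slide


def dsp_add_sub(z, x, y, cin=0, subtract=False):
    """Return P = Z +/- (X + Y + Cin), wrapped to 48 bits.

    Assumption: overflow wraps modulo 2**48 (two's-complement behaviour).
    """
    inner = x + y + cin
    p = z - inner if subtract else z + inner
    return p & MASK48


if __name__ == "__main__":
    print(hex(dsp_add_sub(10, 3, 4, cin=1)))
    print(hex(dsp_add_sub(0, 1, 0, subtract=True)))
```

A model like this is useful as the fault-free reference when checking the responses of a test sequence.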

Adder BIST
- Test algorithm depends on architecture
  - But architecture is not specified in data sheets
  - Eliminate sequential logic architectures
    - Based on modified Booth
- Adder choices include:
  - Ripple carry
  - Carry select
  - Carry save
  - Carry-look-ahead (CLA)
    - Our assumption based on area/performance analysis
    - But multiple types of CLA
- Our goal: find/develop architecture-independent test algorithm(s)

Carry-Look-Ahead Adder
- Recall CLA was more difficult to test
- Basic CLA is 4 bits
- 4-bit CLAs then combined to form larger adders
  - Ripple CLAs
  - 2 types based on Lookahead Carry Unit (LCU):
    - Ripple LCU
    - Multi-stage LCU
[Figure: four full adders (inputs A3..A0, B3..B0; sums S3..S0) supply P3..P0 and G3..G0 to a 4-bit carry-look-ahead block, which takes C0 and produces C1..C4, PG, and GG]

Generate and propagate:
  Gi = Ai·Bi
  Pi = Ai + Bi
Carries:
  C1 = G0 + P0·C0
  C2 = G1 + G0·P1 + P1·P0·C0
  C3 = G2 + G1·P2 + G0·P1·P2 + P2·P1·P0·C0
  C4 = G3 + G2·P3 + G1·P2·P3 + G0·P1·P2·P3 + P3·P2·P1·P0·C0
Group signals:
  PG = P0·P1·P2·P3
  GG = G3 + G2·P3 + G1·P2·P3 + G0·P1·P2·P3
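As a sanity check on the carry equations above, the following sketch evaluates them bit by bit and compares the result with ordinary integer addition. The function name `cla4` is hypothetical, and the sum bits are assumed to be Ai XOR Bi XOR Ci, as in a standard full adder.

```python
def cla4(a, b, c0):
    """4-bit carry-look-ahead addition using the slide's G/P equations.

    Returns (sum, c4, pg, gg). Sum bits are assumed to be Ai ^ Bi ^ Ci.
    """
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]  # generate:  Gi = Ai.Bi
    P = [A[i] | B[i] for i in range(4)]  # propagate: Pi = Ai + Bi

    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (G[0] & P[1]) | (P[1] & P[0] & c0)
    c3 = G[2] | (G[1] & P[2]) | (G[0] & P[1] & P[2]) | (P[2] & P[1] & P[0] & c0)

    pg = P[0] & P[1] & P[2] & P[3]
    gg = G[3] | (G[2] & P[3]) | (G[1] & P[2] & P[3]) | (G[0] & P[1] & P[2] & P[3])
    c4 = gg | (pg & c0)  # same as the expanded C4 equation

    carries = [c0, c1, c2, c3]
    s = sum((A[i] ^ B[i] ^ carries[i]) << i for i in range(4))
    return s, c4, pg, gg


def check_all():
    """Compare cla4 against integer addition for every input combination."""
    for a in range(16):
        for b in range(16):
            for c0 in (0, 1):
                s, c4, _, _ = cla4(a, b, c0)
                total = a + b + c0
                if s != (total & 0xF) or c4 != (total >> 4):
                    return False
    return True
```

Calling `check_all()` exercises all 512 input combinations, which is the same exhaustive approach that becomes impractical for a 48-bit adder and motivates the short test sequences discussed next.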

CLA Test Algorithms
- On the Adders with Minimum Tests, Kajihara and Sasao, Proc. VLSI Test Symp., pp. 10-15, 1997 (VTS 97)
  - 10 vectors detect all single and multiple faults
    - In any size ripple CLA (not an LCU implementation)
- Scalable Test Generators for High-Speed Datapath Circuits, Al-Asaad, Hayes, and Murray, J. Electronic Testing, vol. 12, pp. 111-125, 1998 (JETTA 98)
  - 2(N+1) vector sequence (for an N-bit adder)
  - TPG implementation requires:
    - N+1-bit shift register
    - N XOR gates, N XNOR gates, and 1 inverter

CLA BIST Scheme
- Easy BIST circuit to implement
- But we found a problem in design
  - 2 missing patterns needed for 100% FC
  - Replace inverter with flip-flop reset
  - 2(N+2) vector sequence
[Figure: N+1-bit serial shift register; adjacent outputs Qi and Qi+1 generate Ai and Bi, and the register also drives the CLA carry-in]

Ai         Bi         Cin
111111111  000000000  0
111111111  000000000  1
111111110  000000000  1
111111101  000000001  1
111111011  000000011  1
111110111  000000111  1
111101111  000001111  1
111011111  000011111  1
110111111  000111111  1
101111111  001111111  1
011111111  011111111  1
000000000  111111111  1
000000000  111111111  0
000000001  111111111  0
000000010  111111110  0
000000100  111111100  0
000001000  111111000  0
000010000  111110000  0
000100000  111100000  0
001000000  111000000  0
010000000  110000000  0
100000000  100000000  0
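The vectors listed on this slide follow a regular pattern that can be reproduced for any operand width. The sketch below is a behavioural reconstruction inferred from the listed rows only, assuming each row is N bits of A, then N bits of B, then Cin. It does not model the shift register or the XOR/XNOR network, and the names `cla_bist_vectors` and `as_bits` are hypothetical.

```python
def cla_bist_vectors(n):
    """Return 2*(n+2) test vectors as (a, b, cin) tuples.

    Pattern inferred from the slide's table (an assumption, not the
    published TPG): the first half walks a 0 through A while B fills
    with ones from the LSB; the second half is the bitwise complement.
    """
    ones = (1 << n) - 1
    first_half = [(ones, 0, 0), (ones, 0, 1)]
    for k in range(n):
        a = ones & ~(1 << k)  # all ones except bit k
        b = (1 << k) - 1      # ones in the k bits below bit k
        first_half.append((a, b, 1))
    second_half = [(a ^ ones, b ^ ones, cin ^ 1) for a, b, cin in first_half]
    return first_half + second_half


def as_bits(vector, n):
    """Format one vector the way the slide lists it: A bits, B bits, Cin."""
    a, b, cin = vector
    return f"{a:0{n}b}{b:0{n}b}{cin}"


if __name__ == "__main__":
    for v in cla_bist_vectors(9):
        print(as_bits(v, 9))
```

With `n = 9` this yields 22 vectors, matching the 2(N+2) count on the slide; for the 48-bit adder the same pattern would give 100 vectors.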

Fault Simulation Results
- JETTA 98 approach gives best overall fault coverage regardless of adder implementation
- Undetected faults in JETTA 98 approach can be detected
  - Results in New BIST column for 2(N+2) vector sequence
- JETTA 98 also claims similar BIST approach for Modified-Booth multiplier
  - But description of test algorithm is very sketchy

48-bit CLA adder, fault coverage by test algorithm:

Implementation    Gate Delays   # Faults   VTS 97   JETTA 98   New BIST
Ripple CLA        28            1392       100%     99.9%      100%
Ripple LCU        12            1542       95.7%    99.9%      100%
Multi-stage LCU   10            1506       95.9%    99.9%      100%

Adder in Virtex-4 DSP
- Adder has 3 input ports: P = Z ± (X + Y + Cin)
- We interpret this as a 2-stage CLA adder/subtractor implementation
- Apply test patterns to each stage in turn
  - 2 clock cycles per vector
[Figure: two 48-bit CLA stages with inputs from the X MUX (A port), Y MUX (B port), Z MUX (C port), CIN, and Subtract; a table of OPMODE control settings for X, Y, and Z shows where the test vector is applied in clock cycles #1 and #2]

BIST Approach for Virtex-5 DSP
- Optional regs like V4
  - but data sheets have less info
- Larger multiplier
  - but same test algorithm
- Logical operations
  - but 48-bit cascade of A:B allows direct testing
- Pattern detect
  - but known algorithm for = comparator

Multiplier BIST
- Test algorithm depends on architecture
  - Virtex-4/5 architecture is not specified in data sheets
  - Eliminate sequential logic architectures
    - Based on modified Booth
- Multiplier choices include:
  - Unsigned array
  - Baugh-Wooley
  - Modified Booth
  - Modified Booth/Wallace tree
    - Our assumption based on area/performance analysis
- Our goal: find/develop architecture-independent test algorithm(s)
10/15/2010 VLSI D&T Seminar

Modified Booth Test Algorithms
- Test algorithm uses 8-bit counter (256 vectors)
- Effective Built-In Self-Test for Booth Multipliers, Gizopoulos, Paschalis & Zorian, IEEE Design & Test of Computers, 1998
  - Claim fault coverage ~ 99.8%
  - 4x4 connections to multiplier inputs
[Figure: 4x4 algorithm. The 4 MSBs and 4 LSBs of the 8-bit counter drive the two n-bit inputs of an n x n multiplier (one through Booth encoding); the product is 2n bits]

Counter bit Cj driving each multiplier input bit:
  A7..A0 = C7 C6 C5 C4 C7 C6 C5 C4
  B7..B0 = C3 C2 C1 C0 C3 C2 C1 C0

A7..A0    B7..B0
00000000  00000000
00000000  00010001
00000000  00100010
00000000  00110011
...
00000000  11101110
00000000  11111111
00010001  00000000
00010001  00010001
00010001  00100010
...
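The 4x4 connections amount to replicating each counter nibble across one 8-bit operand. A minimal sketch of that mapping follows, assuming the bit assignment given on the slide (A7..A0 = C7 C6 C5 C4 C7 C6 C5 C4 and B7..B0 = C3 C2 C1 C0 C3 C2 C1 C0); the function name `vector_4x4` is hypothetical.

```python
def vector_4x4(count):
    """Map an 8-bit counter value to the (A, B) multiplier inputs.

    The upper counter nibble is copied into both nibbles of A and the
    lower counter nibble into both nibbles of B.
    """
    hi = (count >> 4) & 0xF  # counter bits C7..C4
    lo = count & 0xF         # counter bits C3..C0
    a = (hi << 4) | hi
    b = (lo << 4) | lo
    return a, b


# One vector per counter state: 256 vectors in total.
vectors_4x4 = [vector_4x4(c) for c in range(256)]

if __name__ == "__main__":
    for a, b in vectors_4x4[:4]:
        print(f"{a:08b} {b:08b}")
```

Because the mapping is pure wiring, a hardware TPG needs no logic beyond the counter itself, which is why switching between algorithms only requires multiplexers on the multiplier inputs.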

Modified Booth Wallace Tree Algorithms
- Test algorithm also uses 8-bit counter (256 vectors)
- Effective BIST Architecture for Fast Multiplier Cores, Paschalis, Kranitis, Psarakis, Gizopoulos & Zorian, Proc. Design Automation and Test in Europe Conf., 1999
- Low Power BIST for Wallace Tree-based Fast Multipliers, Bakalis, Kalligeros, Nikolos, Vergos & Alexiou, Proc. Int. Symp. on Quality of Electronic Design, 2000
  - 5x3 connections with 5 inputs to Booth encoding port
- Both papers claim fault coverage > 99%
[Figure: 5x3 algorithm. The 5 MSBs and 3 LSBs of the 8-bit counter drive the two n-bit inputs of an n x n multiplier (the 5-bit group through Booth encoding); the product is 2n bits]

Counter bit Cj driving each multiplier input bit:
  A7..A0 = C5 C4 C3 C7 C6 C5 C4 C3
  B7..B0 = C1 C0 C2 C1 C0 C2 C1 C0

A7..A0    B7..B0
00000000  00000000
00000000  01001001
00000000  10010010
00000000  11011011
...
00000000  10110110
00000000  11111111
00100001  00000000
00100001  01001001
00100001  10010010
...

Modified Booth Test Algorithms
- Test algorithm uses 8-bit counter (256 vectors)
- But which side is Booth encoding?
  - Xilinx does not specify
- Our original approach
  - Run 5x3 algorithm: 256 vectors
  - and run 3x5 algorithm: 512 vectors
  - Include 4x4 if fault coverage improves: 768 vectors
- Additional algorithms only require multiplexers to change inputs
  - Use same 8-bit counter
[Figure: 5x3 and 3x5 algorithms. The 8-bit counter (MSB and LSB groups of 5 and 3 bits, or 3 and 5 bits) drives the two n-bit inputs of an n x n multiplier with Booth encoding on one input; the product is 2n bits]

Analysis
- Multipliers evaluated
  - Unsigned array
  - Signed array (Baugh-Wooley)
  - Modified Booth
    - Carry look-ahead adders sum partial products in every stage
  - Modified Booth Wallace Tree
    - Carry look-ahead adder sums final stage partial products
    - Carry select adder sums final stage partial products
    - Ripple carry adder sums final stage partial products
  - Custom implementation of Modified Booth

Analysis
- Designed 8-bit models of the multipliers
- Fault model: collapsed single stuck-at gate-level faults
- Exhaustive testing
  - To determine undetectable faults
- Test algorithms evaluated
  - 4x4
  - 5x3
  - 3x5
  - 5x3 & 3x5
  - 4x4, 5x3 & 3x5

Number of faults detected by each test algorithm (effective fault coverage, %):

Multiplier                Total faults  Exhaustive   4x4           5x3           3x5           5x3 & 3x5     5x3, 3x5 & 4x4
Unsigned array            1648          1644 (100)   1644 (100)    1644 (100)    1621 (98.60)  1644 (100)    1644 (100)
Signed array              1648          1644 (100)   1644 (100)    1644 (100)    1644 (100)    1644 (100)    1644 (100)
Mod-Booth                 2499          2196 (100)   2180 (99.27)  2168 (98.72)  2179 (99.23)  2182 (99.36)  2193 (99.86)
Mod-Booth Wall-Tree CLA   2184          2090 (100)   2061 (98.61)  2068 (98.95)  2070 (99.04)  2071 (99.09)  2074 (99.23)
Mod-Booth Wall-Tree CSA   2422          2243 (100)   2215 (98.75)  2217 (98.84)  2218 (98.89)  2222 (99.06)  2228 (99.33)
Mod-Booth Wall-Tree RCA   2021          1962 (100)   1937 (98.73)  1944 (99.08)  1944 (99.08)  1944 (99.08)  1947 (99.24)
Custom Mod-Booth          1908          1805 (100)   1781 (98.67)  1787 (99.00)  1785 (98.89)  1791 (99.22)  1793 (99.34)
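The percentages in these results are consistent with effective fault coverage being computed against the faults found by exhaustive testing rather than against the total fault count. The sketch below shows that calculation under that assumption; the function name `effective_coverage` is hypothetical, and the sample numbers are taken from the results above.

```python
def effective_coverage(detected, detectable):
    """Effective fault coverage in percent.

    Assumption: 'detectable' is the number of faults found by exhaustive
    testing, so undetectable faults are excluded from the denominator.
    """
    return round(100.0 * detected / detectable, 2)


if __name__ == "__main__":
    # Unsigned array, 3x5 algorithm: 1621 of 1644 detectable faults.
    print(effective_coverage(1621, 1644))
    # Mod-Booth, 4x4 algorithm: 2180 of 2196 detectable faults.
    print(effective_coverage(2180, 2196))
```

Measured against total faults instead, the Mod-Booth 4x4 figure would be 2180 / 2499, or about 87%, which is why the distinction matters when comparing algorithms.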

Summary
- If the architecture of the multiplier is not known:
  - 3x5 algorithm gives best overall fault coverage for most multipliers
    - Contradicting the claim of the authors who proposed 5x3
  - Running 3x5 & 5x3 gives better fault coverage for all multipliers
  - Running all three test algorithms (3x5, 5x3 and 4x4) provides the best fault coverage for all multipliers
    - Architecture-independent testing
- Virtex-4 & Virtex-5 multipliers
  - Original approach was 3x5 and 5x3
  - Better approach would be 3x5 and 4x4

Summary
- Adder test algorithm in JETTA 98
  - Easy to implement and excellent FC on all adders (stuck-at and bridging faults)
    - 100% FC on most adders
  - Easily adapted to subtractors & adder/subtractors
  - Used in BIST for Virtex-4 & 5 FPGAs
- Both test algorithms (adder & multiplier) will be used on Spartan-6 FPGA DSPs
  - Currently under development

Read More About It
- M. Pulukuri, G. Starr & C. Stroud, On BIST for Multipliers, Proc. IEEE Southeast Regional Conf., pp. 25-28, 2010
- M. Pulukuri & C. Stroud, On BIST for Adders, J. Electronic Testing: Theory & Applications, pp. 343-346, 2009
- M. Pulukuri & C. Stroud, BIST for DSPs in Virtex-4 FPGAs, Proc. IEEE Southeast Symp. on System Theory, pp. 34-38, 2009
- M. Pulukuri, BIST of DSP Cores in Virtex-4 & 5 FPGAs, AU MS Thesis, 2010