Built-In Self-Test of DSPs in Virtex-4 FPGAs (Funded by NSA) Charles Stroud Dept. of Electrical & Computer Engineering Auburn University

Outline of Presentation
- History of DSP architectures in FPGAs
- Overview of Virtex-4 DSP
- Prior testing R&D vs. our analysis for application to Virtex-4 DSPs
  - Literature on DSP test not applicable
  - No papers published on DSPs in FPGAs
  - Literature on multipliers and adders
- BIST for DSPs in Virtex-4
  - Architecture, operation, and implementation
  - Timing and fault injection analysis
- Summary and conclusions
  - Plans for application to Virtex-5
C. Stroud 1/08 VLSI D&T Seminar

Xilinx FPGA Architectures
- 4000/Spartan
  - NxN array of unit cells
  - Unit cell = CLB + routing
  - Fast carry logic in CLBs for adders
- Virtex/Spartan-2
  - MxN array of unit cells
  - Carry logic + AND gate for array multipliers
  - 4K block RAMs at edges
- Virtex-2/Spartan-3
  - 18K block RAMs in array
  - 18x18-bit multipliers with each RAM, based on modified Booth architecture
- Virtex-4/Virtex-5
  - Added 48-bit DSP cores w/ multipliers
- Altera includes 9x9 multipliers based on modified Booth architecture

Virtex-4 DSP Architecture
- 2 DSP slices per tile
  - 16-256 tiles in 1-8 columns
- Each DSP includes:
  - 3-input, 48-bit adder/subtractor: P = Z ± (X + Y + Cin)
  - Optional accumulator register
  - 18x18-bit 2's-complement multiplier (w/o adder)
- User-controlled operational modes for X, Y, & Z MUXs
- Configuration bits control other MUXs
  - Pipelining registers
  - Accumulator register
[Figure: two DSP slices, each with A(18), B(18), and shared C(48) inputs, X/Y/Z MUXs, a ± adder, and a P(48) output; inputs for cascading and outputs with dedicated routing]
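The adder equation on this slide can be illustrated with a small behavioral model. This is only a sketch: the function name `dsp_adder` and the modulo-2^48 wraparound are assumptions made for illustration, not details taken from the Virtex-4 data sheet.

```python
MASK48 = (1 << 48) - 1  # 48-bit result width, as stated on the slide


def dsp_adder(z, x, y, cin=0, subtract=False):
    """Behavioral model of P = Z +/- (X + Y + Cin).

    Assumption: the 48-bit result wraps modulo 2**48.
    """
    inner = x + y + cin
    p = z - inner if subtract else z + inner
    return p & MASK48  # Python's & on a negative int yields the wrapped value


if __name__ == "__main__":
    print(dsp_adder(5, 1, 2, cin=1))                 # add mode
    print(dsp_adder(5, 1, 2, cin=1, subtract=True))  # subtract mode
```

A model like this is only useful as a golden reference when checking test vectors; it says nothing about the internal adder architecture, which is the open question the following slides address.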

Multiplier and Adder Architectures
- Test algorithm depends on architecture
  - But architecture is not specified in data sheets
- Eliminate sequential logic architectures (based on modified Booth)
- Adder choices include:
  - Ripple carry
  - Carry select
  - Carry save
  - Carry-look-ahead (CLA): our assumption, based on area/performance analysis
    - But multiple types of CLA
- Multiplier choices include:
  - Array
  - Booth
  - Modified Booth
  - Wallace tree
  - Modified Booth/Wallace tree: our assumption, based on area/performance analysis
- Our goal: find/develop architecture-independent test algorithm(s)

Array Multiplier Test Algorithm
- Kalyana Kantipudi's MS thesis
  - 10 vectors give 100% fault coverage for C6288, a 16x16-bit array multiplier
- 18x18-bit array multiplier results (Chris Erickson's results)
  - Only achieved 95% fault coverage
  - Pattern expansion required for 16x16-bit to 18x18-bit
  - Potential for mistakes if patterns not expanded properly
- Modified Booth multiplier results
  - 62% with carry-save adder
  - 37% with CLA
  - Note difference in FC wrt adder implementation
- Conclusion: array multiplier test vectors do not adequately test modified Booth multiplier

Modified Booth Test Algorithms
- Two test algorithms using 8-bit counter (256 vectors)
- "Low Power BIST for Wallace Tree-based Fast Multipliers", Bakalis, Kalligeros, Nikolos, Vergos & Alexiou, Proc. Int. Symp. on Quality of Electronic Design, pp. 433-438, 2000
  - 5x3 connections with 5 inputs to Booth encoding
  - But which side is Booth encoding?
  - Our approach: run both 5x3 and 3x5 algorithms
- "Effective Built-In Self-Test for Booth Multipliers", Gizopoulos, Paschalis & Zorian, IEEE Design & Test of Computers, pp. 105-111, 1998
  - 4x4 connections to multiplier inputs
  - Algorithm used in Srinivas Garimella's MS thesis for Virtex-2 multipliers
  - Our approach: also include 4x4 if fault coverage improves
[Figure: 8-bit counter (MSB to LSB) connected to the inputs of an n x n multiplier with Booth encoding and 2n-bit output, for the 5x3, 4x4, and 3x5 algorithms]

4x4 Booth Multiplier Test Algorithm (Chris Erickson's results)
- 18x18-bit array multiplier results
  - 99.99% (1 undetected fault)
- Booth multiplier results
  - 90% with ripple-carry adder
  - 90% with carry-save adder
  - 70% with CLA
  - Note difference in FC wrt adder implementation
- Conclusion: modified Booth multiplier test vectors do test array multiplier
- But Modified-Booth/Wallace-Tree appears to be most likely candidate for Virtex-4 DSP multiplier implementation
  - Also for Virtex-5 and Altera

Other Multiplier Results
- 4x4-bit implementations
  - Exhaustive test patterns
  - Undetected faults are undetectable
  - Same as 4x4, 5x3, & 3x5 algorithm for 4x4-bit multiplier
- Simulation results discrepancy for array multiplier
  - 4 undetected faults in 4x4-bit implementation
  - 1 undetected fault in 18x18 multiplier w/ 4x4 algorithm in Chris Erickson's results

Chitanya Bandi's results:

Multiplier      # faults   # detected   # undetected   FC
Array           272        268          4              98.5%
Signed Array    337        320          17             95.0%
Wallace Tree    283        280          3              98.9%

8x8 Modified-Booth/Wallace-Tree Multiplier
Fault simulation results (Chitanya Bandi's results; note: used ripple carry adder to sum partial products):

Version          Test Algorithm   # Vectors   Total Faults   Detected   Not Detected   Fault Coverage   Effective FC
No reduction     Exhaustive       65,536      3,402          2,520      882            74.1%            100%
No reduction     4x4              256         3,402          2,477      925            72.8%            99.0%
With reduction   Exhaustive       65,536      2,341          2,031      310            86.8%            100%
With reduction   4x4              256         2,341          2,005      336            85.6%            98.7%
With reduction   5x3              256         2,341          2,011      330            85.9%            99.0%
With reduction   3x5              256         2,341          2,015      326            86.1%            99.2%
With reduction   4x4 & 5x3        512         2,341          2,015      326            86.1%            99.2%
With reduction   4x4 & 3x5        512         2,341          2,019      322            86.2%            99.4%
With reduction   5x3 & 3x5        512         2,341          2,029      312            86.7%            99.9%

- 5x3 plus 3x5 give best fault coverage
- No additional faults detected with 4x4
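The two coverage figures in these results can be recomputed from the fault counts. The sketch below assumes that "effective FC" means faults detected divided by the faults detected by the exhaustive test (that is, excluding undetectable faults); the slide does not define the term, so treat that as an interpretation. The function names are hypothetical.

```python
def fault_coverage(detected, total_faults):
    """Raw fault coverage in percent: detected / total."""
    return 100.0 * detected / total_faults


def effective_coverage(detected, detected_by_exhaustive):
    """Coverage relative to detectable faults only (assumed definition)."""
    return 100.0 * detected / detected_by_exhaustive


if __name__ == "__main__":
    total = 2341        # total faults, "with reduction" version
    exhaustive = 2031   # faults detected by exhaustive patterns
    for name, det in [("4x4", 2005), ("5x3", 2011), ("3x5", 2015), ("5x3 & 3x5", 2029)]:
        print(f"{name:10s} FC={fault_coverage(det, total):.1f}% "
              f"effective={effective_coverage(det, exhaustive):.1f}%")
```

Under that assumption the "with reduction" figures on the slide are reproduced to one decimal place.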

Carry-Look-Ahead Adder
- Recall CLA was more difficult to test
- Basic CLA is 4 bits
- 4-bit CLAs then combined to form larger adders
  - Ripple CLAs
  - 2 types based on Lookahead Carry Unit (LCU): ripple LCU, multi-stage LCU

Gi = Ai·Bi
Pi = Ai + Bi
C1 = G0 + P0·C0
C2 = G1 + G0·P1 + P1·P0·C0
C3 = G2 + G1·P2 + G0·P1·P2 + P2·P1·P0·C0
C4 = G3 + G2·P3 + G1·P2·P3 + G0·P1·P2·P3 + P3·P2·P1·P0·C0
PG = P0·P1·P2·P3
GG = G3 + G2·P3 + G1·P2·P3 + G0·P1·P2·P3

[Figure: 4-bit carry-look-ahead adder built from four full adders (inputs A3..A0, B3..B0, sums S3..S0) and a look-ahead block taking P3..P0, G3..G0, and C0 and producing C3..C1, C4, PG, and GG]
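The carry equations on this slide can be checked exhaustively against ordinary integer addition. The following is a minimal sketch, with `cla4` as a hypothetical helper name; it uses Pi = Ai OR Bi as on the slide and forms each sum bit as Ai XOR Bi XOR Ci.

```python
def cla4(a, b, c0):
    """4-bit carry-look-ahead adder; returns (4-bit sum, carry-out C4)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]   # generate:  Gi = Ai AND Bi
    P = [A[i] | B[i] for i in range(4)]   # propagate: Pi = Ai OR Bi
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (G[0] & P[1]) | (P[1] & P[0] & c0)
    c3 = G[2] | (G[1] & P[2]) | (G[0] & P[1] & P[2]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (G[2] & P[3]) | (G[1] & P[2] & P[3])
          | (G[0] & P[1] & P[2] & P[3]) | (P[3] & P[2] & P[1] & P[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((A[i] ^ B[i] ^ carries[i]) << i for i in range(4))
    return s, c4


if __name__ == "__main__":
    ok = all(cla4(a, b, c)[0] + 16 * cla4(a, b, c)[1] == a + b + c
             for a in range(16) for b in range(16) for c in (0, 1))
    print("all 512 input combinations match:", ok)
```

All carries are computed directly from the G, P, and C0 terms rather than rippling, which is the property that makes the structure fast and, per the earlier fault-coverage results, harder to test.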

CLA Test Algorithms
- "On the Adders with Minimum Tests", Kajihara and Sasao, Proc. VLSI Test Symp., pp. 10-15, 1997 (VTS 97)
  - 10 vectors detect all single and multiple faults
  - In any size ripple CLA (not an LCU implementation)
- "Scalable Test Generators for High-Speed Datapath Circuits", Al-Asaad, Hayes, and Murray, J. Electronic Testing, vol. 12, pp. 111-125, 1998 (JETTA 98)
  - 2(N+1) vector sequence (for an N-bit adder)
  - TPG implementation requires:
    - N+1-bit shift register
    - N XOR gates, N XNOR gates, and 1 inverter

CLA BIST Scheme
- Easy BIST circuit to implement
- But we found a problem in design
  - 2 missing patterns needed for 100% FC
  - Replace inverter with flip-flop
  - 2(N+2) vector sequence
[Figure: N+1-bit serial shift register (stages Qi, Qi+1) with reset, generating Ai, Bi, and the CLA carry-in]

Test vector sequence (Ai, Bi, Cin):
1111111110000000000
1111111110000000001
1111111100000000001
1111111010000000011
1111110110000000111
1111101110000001111
1111011110000011111
1110111110000111111
1101111110001111111
1011111110011111111
0111111110111111111
0000000001111111111
0000000001111111110
0000000011111111110
0000000101111111100
0000001001111111000
0000010001111110000
0000100001111100000
0001000001111000000
0010000001110000000
0100000001100000000
1000000001000000000
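The 22 vectors listed on this slide are consistent with a 2(N+2) sequence for N = 9 if each string is read as 9 bits of A, 9 bits of B, and 1 bit of Cin. The sketch below regenerates them from an (N+2)-bit twisted-ring (Johnson) counter. That structure and the bit mapping are a reconstruction inferred from the listed patterns, not the TPG circuit from the slides or the JETTA 98 paper, and `cla_bist_vectors` is a hypothetical name.

```python
def cla_bist_vectors(n):
    """Generate 2*(n+2) adder test vectors as 'A...A B...B Cin' bit strings.

    Assumed structure: an (n+2)-bit twisted-ring counter q, where
      Cin = q[0], B[i] = q[i+2], A[i] = q[i+1] XOR q[i+2] XOR NOT q[n+1].
    """
    width = n + 2
    q = [0] * width
    vectors = []
    for _ in range(2 * width):
        phase = 1 - q[n + 1]  # 1 during the first half of the sequence
        a = [q[i + 1] ^ q[i + 2] ^ phase for i in range(n)]
        b = [q[i + 2] for i in range(n)]
        bits = a[::-1] + b[::-1] + [q[0]]  # MSB first, Cin last
        vectors.append("".join(str(bit) for bit in bits))
        q = [1 - q[-1]] + q[:-1]  # shift, feeding back the inverted last stage
    return vectors


if __name__ == "__main__":
    for v in cla_bist_vectors(9):
        print(v)
```

The second half of the sequence is the bitwise complement of the first half, which matches the listing.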

Fault Simulation Results
- JETTA 98 approach gives best overall fault coverage regardless of adder implementation
- Undetected faults in JETTA 98 approach can be detected
  - Results in "New BIST" column for 2(N+2) vector sequence
- JETTA 98 also claims similar BIST approach for Modified-Booth multiplier
  - But description of test algorithm is very sketchy

48-bit CLA adder, fault coverage by test algorithm:

Implementation    Gate Delays   # Faults   VTS 97   JETTA 98   New BIST
Ripple CLA        28            1392       100%     99.9%      100%
Ripple LCU        12            1542       95.7%    99.9%      100%
Multi-stage LCU   10            1506       95.9%    99.9%      100%

Adder in Virtex-4 DSP
- Adder has 3 input ports: P = Z ± (X + Y + Cin)
- We interpret this as a 2-stage CLA adder/subtractor implementation
- Apply test patterns to each stage in turn
  - 2 clock cycles per vector
[Figure: two cascaded 48-bit CLAs fed from the X MUX (A port), Y MUX (B port), and Z MUX (C port), with CIN and Subtract inputs; a table of OPMODE control showing which of X, Y, Z carries the test vector in clock cycles #1 and #2]

DSP BIST Modes & Sequences
- Test pattern sequence
  - Four groups of 256 clock cycles (ccs) each
  - Allows control of operational modes (OPMODEs) of DSP
- Test mode controlled by 4-bit shift register
  - Bits include: Test Mode (2), Invert Control Signals, Reset
  - Contents loaded via Boundary Scan interface
  - Reduces the number of downloads to FPGA

Group            00 (multiply)   01 (adder)   01 (adder), Preg=1 only   10 (cascade)                           Control signals
First 256 ccs    P = AxB         P = Z(C)     P = X(P)+Y(C)             P1 = A:B+Z(PC), P0 = Z(C)              Constant
Second 256 ccs   P = AxB         P = Y(C)     P = Y(C)+Z(P)             P1 = A:B+Z(ShiftPC), P0 = Z(C)         Constant
Third 256 ccs    P = AxB+C       P = Z(C)     P = Y(C)+Z(P)             P1 = Z(C), P0 = A:B+Z(PC)              Pseudo-random
Fourth 256 ccs   P = A:B+C       P = Y(C)     P = Y(C)+Z(ShiftP)        P1 = Z(C), P0 = A:B+Z(ShiftPC)         Pseudo-random

BIST Architecture
- 2 TPGs drive alternate rows of DSP tiles
  - TPG drives both DSPs in tile
  - DSPs driven by different TPGs are compared
  - Prevents faulty TPG from escaping detection
- Like DSPs compared
  - Slice 0 compared to slice 0
  - Slice 1 compared to slice 1
- Top DSPs compared to bottom DSPs in circular comparison
[Figure: BSCAN shift register (test mode), TPG 0 and TPG 1, and a column of DSP tiles, each with slices s1 and s0]
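The comparison scheme described here can be enumerated in a few lines. This is a sketch under assumptions: tiles in a column are numbered 0..n-1 from bottom to top, even-numbered tiles are driven by TPG 0 and odd-numbered tiles by TPG 1, and each slice is compared with the same slice in the next tile, wrapping from top to bottom. The names `comparison_pairs` and `tpg_for_tile` are hypothetical.

```python
def tpg_for_tile(tile):
    """Assumed assignment: alternate rows of tiles use alternate TPGs."""
    return tile % 2


def comparison_pairs(n_tiles):
    """List ((tile, slice), (tile, slice)) pairs for a circular comparison."""
    pairs = []
    for tile in range(n_tiles):
        neighbor = (tile + 1) % n_tiles  # top tile wraps around to the bottom
        for s in (0, 1):                 # like slices only: s0-s0, s1-s1
            pairs.append(((tile, s), (neighbor, s)))
    return pairs


if __name__ == "__main__":
    for left, right in comparison_pairs(4):
        print(left, "vs", right,
              "TPGs:", tpg_for_tile(left[0]), tpg_for_tile(right[0]))
```

With an even number of tiles per column, every comparison pairs DSPs driven by different TPGs, so a faulty TPG shows up as mismatches. Each DSP is also observed by two comparisons, which helps diagnosis.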

TPG Architecture
- Counter
  - 5x3 and 3x5 multiplier test to ports A & B
- Shift register
  - 2(N+2) vector adder test to port C
- FSM
  - OPMODE control for 4 group sequences
- LFSR
  - Pseudo-random patterns to other control inputs during last two groups of 256 clock cycles
[Figure: TPG containing counter, shift register, FSM, and 32-bit LFSR, driving the A and B ports (36 bits), C port (48 bits), and OPMODE control (7 bits) of DSP slices 0 and 1, each with a 48-bit P port output]
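The LFSR's role as a pseudo-random source for the control inputs can be sketched in software. The slides give the LFSR width (32 bits) but not its polynomial, so the tap positions 32, 22, 2, 1 used below are an assumption taken from commonly published LFSR tap tables, and the shift direction and function name are likewise illustrative.

```python
MASK32 = 0xFFFFFFFF
TAPS = (32, 22, 2, 1)  # assumed tap positions; tap k corresponds to bit k-1


def lfsr32_step(state):
    """Advance a 32-bit Fibonacci-style LFSR by one clock."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & MASK32


if __name__ == "__main__":
    state = 1  # any non-zero seed; the all-zero state would lock up
    for _ in range(5):
        state = lfsr32_step(state)
        print(f"{state:032b}")
```

Because the bit shifted out is part of the feedback, the update is invertible, so a non-zero seed can never reach the all-zero lock-up state.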

ORA Implementation
- Old comparison-based ORA
  - Logic 1 latched in FF due to mismatches
  - Configuration memory readback used to get results
- CLBs have dedicated carry chain for fast adders and counters
- New ORA latches logic 0 due to mismatch
  - Carry chain performs iterative OR function
  - Single pass/fail indication at end of BIST sequence
  - Only read configuration memory to get failing results for diagnosis
[Figure: old ORA (LUT comparing DSP i output k with DSP j output k, feeding a flip-flop) and new ORA (LUT and flip-flop driving a 0/1 carry MUX between carry-in and carry-out); ORAs chained from TDI to TDO]
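The behavior of the new ORA can be modeled in software as a sticky flag per comparison plus a chained OR across all ORAs. This is a behavioral sketch only: the class and function names are hypothetical, and the loop in `any_failure` merely stands in for the iterative OR that the slide says the carry chain performs.

```python
class ORA:
    """Comparison-based output response analyzer with a sticky pass flag."""

    def __init__(self):
        self.pass_ff = 1  # flip-flop holds 1 until a mismatch is seen

    def clock(self, out_i, out_j):
        """Compare one output bit from each of two DSPs on a clock edge."""
        if out_i != out_j:
            self.pass_ff = 0  # latch logic 0 on mismatch; never set back to 1


def any_failure(oras):
    """Iterative OR over all ORAs, one stage per ORA (carry-chain analogy)."""
    carry = 0
    for ora in oras:
        carry |= 1 - ora.pass_ff
    return carry


if __name__ == "__main__":
    oras = [ORA() for _ in range(4)]
    oras[2].clock(1, 0)  # inject one mismatch
    oras[2].clock(1, 1)  # later agreement does not clear the failure
    print("fail" if any_failure(oras) else "pass")
```

The single result bit is what allows the configuration memory to be read back only when a failure needs to be diagnosed.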

BIST Configurations
- 5 downloads to FPGA
  - 1 compressed download (<50% of full config)
  - + 4 partial reconfigurations (<0.5% of full config each)
    - Only change DSP configuration bits
- 7 BIST sequences
  - BIST configurations #2 & #3 run twice
    - Different control register values for multiplier/adder test algorithms

BIST Config                    1            2            3                         4            5
Pipeline registers             All Regs=0   All Regs=1   A&Breg=2, Other Regs=1    All Regs=1   All Regs=1
Signals active level           High         High         Low                       High         Low
B input source, Slice0         Direct       Direct       Direct                    Direct       Cascade
B input source, Slice1         Direct       Direct       Direct                    Cascade      Direct
Test mode applied: Multiply    Yes (1)      Yes (2)      Yes (4)                   Yes (6)      Yes (7)
Test mode applied: Adder       No           Yes (3)      Yes (5)                   No           No
Test mode applied: Cascade     No           No           No                        No           No

Numbers in parentheses give the order of the 7 BIST sequences. Configurations 4 and 5 show bottom row failures due to unconnected cascade inputs.

Cascade Mode Testing
- One slice from pair put in cascade mode at a time
- Circular comparison of slices sees identical behavior
- Cascade inputs to bottom DSP are not connected
  - Expected failures in comparing that DSP's outputs

DSP BIST Implementations
- Circular comparison per DSP column
- Each slice in tile compared with its counterpart
  - slice0-to-slice0
  - slice1-to-slice1
- CLB carry chain used to provide pass/fail indication
- Only read config memory contents to get results for diagnosis
[Figure: SX25 layout showing DSPs, TPG0, TPG1, BSCAN, TDI, and TDO]

Automated BIST Configurations
- C program generates .xdl file
- .xdl to .ncd
  - xdl -xdl2ncd, producing bist.ncd
- FPGA Editor
  - Design Rule Check
  - Route design
- .ncd to .bit
  - BitGen
- Download into FPGA
- .ncd to .xdl
  - Modification program for generating remaining 4 BIST configurations
[Figure: tool flow from BIST programs to XDL file, XDL.exe, NCD file (with FPGA Editor), BitGen.exe, BIT file, and download]

DSP BIST Implementations
- Brad Dutton generated BIST configurations for all Virtex-4 FPGAs and verified BIST on LX25, LX60, SX35, & FX12 via download and execution
[Figure: LX15 and FX12 layouts showing TPG0, TPG1, DSPs, and the PowerPC]

BIST Timing Analysis (David Baumann's results)
- Bogus timing analysis by Xilinx tools due to unused cascade path with no pipeline registers
[Figure: bar chart of maximum clock frequency (MHz), axis 0 to 150, for Config 1 through Config 5]

BIST Timing Analysis
- Based on configuration #3
- Fmax is a function of #DSPs & size of array
- 4 TPGs might improve Fmax
[Figure: bar chart of maximum clock frequency (MHz), axis 0 to 80, for FX12, FX25, FX40, FX60, FX100, SX25, SX35, SX55, LX15, LX25, LX40, LX60, LX80, and LX100, each bar labeled with #DSPs and #DSP columns]

Physical Fault Injection
- Faulty FPGAs are difficult to find
  - 1 ORCA with faulty PLB & 2 ORCAs with faulty routing
- Physical fault insertion
  - Etch package down to bare die and zap
- We use fault injection emulation (Mustfa Ali's work)
  - Modify configuration bits before or after download (read-modify-write, RMW)
  - Can inject single and/or multiple faults
  - Stuck-at faults & bridging faults
  - Faults limited to effects of configuration bits
[Figure: system or BIST configuration file (110100010 001010101) combined with a fault mask (000010000 ...) and stuck-at values (011011001 ...) to produce the download file (110110010 001000101) sent to the FPGA]
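The mask-and-value scheme in the figure can be sketched as a bitwise merge: wherever the fault mask is 1, the configuration bit is replaced by the stuck-at value; elsewhere it is left unchanged. The function name is hypothetical, and bit strings are used only to mirror the figure. The example uses the first 9-bit word from the slide.

```python
def inject_faults(config_bits, fault_mask, stuck_values):
    """Return configuration bits with masked positions forced to stuck values.

    All three arguments are equal-length strings of '0' and '1'.
    """
    if not (len(config_bits) == len(fault_mask) == len(stuck_values)):
        raise ValueError("bit strings must have equal length")
    return "".join(
        stuck if mask == "1" else cfg
        for cfg, mask, stuck in zip(config_bits, fault_mask, stuck_values)
    )


if __name__ == "__main__":
    faulty = inject_faults("110100010", "000010000", "011011001")
    print(faulty)  # matches the first word of the download file on the slide
```

Setting several mask bits at once models multiple faults with the same routine.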

Fault Injection Emulation Results
1) Download BIST configuration
2) Manipulate configuration bit via read-modify-write
3) Run BIST sequence
4) Get BIST results
[Figure: bar chart of # BIST configs detecting fault (0 to 6) for stuck-at-0 and stuck-at-1 on each configuration bit: Cinib Csel0ib Csel1ib Subib Op0ib Op1ib Op2ib Op3ib Op4ib Op5ib Op6ib Ceaib Cebib Cemib Cepib Ceccrtlib Cecinsubib Cecinib Rstaib Rstbib Rstmib Rstpib Rstctlib Rstcinib Areg0b Areg2b Breg0b Breg2b Mreg0b Preg0b Cinreg0b Cselreg0b Opreg0b Subreg0b Clkib Cascb Creg0b Cecib&t nocfgb]

Summary
- Investigated known test algorithms for multipliers and adders
  - Looked for architecture-independent tests with highest fault coverage
  - JETTA 98 approach easy to implement
    - Needs modification for 100% FC
- 7 DSP BIST sequences with 5 downloads
  - New ORA eliminates config memory readback
  - Total testing time < 52% of 1 full download
    - Using compressed and partial reconfiguration
    - Only DSP configuration bits need to be changed
- Application to Virtex-5 DSPs

BIST Approach for Virtex-5 DSP
- Optional regs like V4, but data sheets have less info
- Larger multiplier, but same test algorithm
- Logical operations, but 48-bit cascade of A:B allows direct testing
- Pattern detect, but known algorithm for = comparator