
Virtex-4 DSP Architecture
- 2 DSP slices per tile
  - 16-256 tiles in 1-8 columns
- Each DSP includes:
  - 3-input, 48-bit adder/subtractor
    - P = Z ± (X + Y + Cin)
    - Optional accumulator register
  - 18x18-bit 2's-complement multiplier (w/o adder)
  - User-controlled operational modes
    - For X, Y, & Z MUXs
  - Configuration bits control other MUXs
    - Pipelining registers
    - Accumulator register
  - Inputs for cascading
  - Outputs with dedicated routing
[Figure: block diagram of one DSP tile with two slices. Each slice has inputs A (18) and B (18), X/Y/Z MUXs, a ± adder/subtractor, and output P (48); input C (48) is shared.]
C. Stroud 1/08 VLSI D&T Seminar 1

Adder BIST
- Test algorithm depends on architecture
  - But architecture is not specified in data sheets
- Eliminate sequential logic architectures
  - Based on modified Booth
- Adder choices include:
  - Ripple carry
  - Carry select
  - Carry save
  - Carry-look-ahead (CLA)
    - Our assumption, based on area/performance analysis
    - But there are multiple types of CLA
- Our goal: find/develop architecture-independent test algorithm(s)

Carry-Look-Ahead Adder
- Recall CLA was more difficult to test
- Basic CLA is 4 bits
  - 4-bit CLAs are then combined to form larger adders
    - Ripple CLAs
    - 2 types based on Lookahead Carry Unit (LCU):
      - Ripple LCU
      - Multi-stage LCU
- Generate and propagate:
  $G_i = A_i B_i$
  $P_i = A_i + B_i$
- Carries:
  $C_1 = G_0 + P_0 C_0$
  $C_2 = G_1 + G_0 P_1 + P_1 P_0 C_0$
  $C_3 = G_2 + G_1 P_2 + G_0 P_1 P_2 + P_2 P_1 P_0 C_0$
  $C_4 = G_3 + G_2 P_3 + G_1 P_2 P_3 + G_0 P_1 P_2 P_3 + P_3 P_2 P_1 P_0 C_0$
- Group propagate and generate:
  $PG = P_0 P_1 P_2 P_3$
  $GG = G_3 + G_2 P_3 + G_1 P_2 P_3 + G_0 P_1 P_2 P_3$
[Figure: four full adders (inputs A3..A0, B3..B0; outputs S3..S0) feeding P and G signals to a 4-bit carry-look-ahead block that produces C1..C4, PG, and GG from carry-in C0.]
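The carry equations on this slide can be checked by direct simulation. The following is a minimal Python sketch, not part of the original slides; the names `cla4` and `check_cla4` are hypothetical. It builds a 4-bit CLA from the G/P definitions and compares it exhaustively against integer addition.

```python
def cla4(a, b, c0):
    """4-bit carry-look-ahead adder built from the slide's equations.

    a, b: integers in 0..15; c0: carry-in (0 or 1).
    Returns (sum, C4, PG, GG).
    """
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]  # generate:  Gi = Ai AND Bi
    P = [A[i] | B[i] for i in range(4)]  # propagate: Pi = Ai OR Bi

    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (G[0] & P[1]) | (P[1] & P[0] & c0)
    c3 = G[2] | (G[1] & P[2]) | (G[0] & P[1] & P[2]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (G[2] & P[3]) | (G[1] & P[2] & P[3])
          | (G[0] & P[1] & P[2] & P[3])
          | (P[3] & P[2] & P[1] & P[0] & c0))

    # Each sum bit is the XOR of the operand bits and the incoming carry.
    carries = [c0, c1, c2, c3]
    s = sum((A[i] ^ B[i] ^ carries[i]) << i for i in range(4))

    pg = P[0] & P[1] & P[2] & P[3]
    gg = G[3] | (G[2] & P[3]) | (G[1] & P[2] & P[3]) | (G[0] & P[1] & P[2] & P[3])
    return s, c4, pg, gg


def check_cla4():
    """Compare the CLA against integer addition for all 512 input combinations."""
    for a in range(16):
        for b in range(16):
            for c0 in (0, 1):
                s, c4, pg, gg = cla4(a, b, c0)
                assert (c4 << 4) | s == a + b + c0
                assert c4 == gg | (pg & c0)  # C4 = GG + PG*C0
    return True
```

The second assertion shows why PG and GG matter: a larger adder's LCU only needs these two signals per 4-bit block to compute the block carry-out.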

CLA Test Algorithms
- "On the Adders with Minimum Tests"
  - Kajihara and Sasao
  - Proc. VLSI Test Symp., pp. 10-15, 1997 (VTS 97)
  - 10 vectors detect all single and multiple faults
    - In any size ripple CLA (not an LCU implementation)
- "Scalable Test Generators for High-Speed Datapath Circuits"
  - Al-Asaad, Hayes, and Murray
  - J. Electronic Testing, vol. 12, pp. 111-125, 1998 (JETTA 98)
  - 2(N+1) vector sequence (for an N-bit adder)
  - TPG implementation requires:
    - N+1-bit shift register
    - N XOR gates, N XNOR gates, and 1 inverter

CLA BIST Scheme
- Easy BIST circuit to implement
- But we found a problem in the design
  - 2 missing patterns needed for 100% FC
  - Replace inverter with flip-flop reset
  - 2(N+2) vector sequence
[Figure: TPG schematic with an N+1-bit serial shift register; taps Q_i and Q_i+1 drive A_i and B_i, and one output goes to the CLA carry-in.]

A_i        B_i        Cin
111111111  000000000  0
111111111  000000000  1
111111110  000000000  1
111111101  000000001  1
111111011  000000011  1
111110111  000000111  1
111101111  000001111  1
111011111  000011111  1
110111111  000111111  1
101111111  001111111  1
011111111  011111111  1
000000000  111111111  1
000000000  111111111  0
000000001  111111111  0
000000010  111111110  0
000000100  111111100  0
000001000  111111000  0
000010000  111110000  0
000100000  111100000  0
001000000  111000000  0
010000000  110000000  0
100000000  100000000  0
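The 22 listed vectors can be reproduced in software. The sketch below is a behavioural reconstruction inferred from the listed patterns only; it is not the authors' shift-register hardware. It assumes each 19-bit row splits into 9 bits of A, 9 bits of B, and 1 bit of Cin (so N = 9), and that the second half of the sequence is the bitwise complement of the first half. The name `bist_sequence` is hypothetical.

```python
def bist_sequence(n):
    """Return the 2*(n+2) test vectors as 'A B Cin' bit strings, MSB first.

    Assumed structure (inferred from the listed vectors, not from a schematic):
      first half  : A all ones with Cin = 0, then Cin = 1, then a 0 walking
                    up through A while B fills with 1s below it
      second half : bitwise complement of every first-half vector
    """
    mask = (1 << n) - 1
    half = [(mask, 0, 0), (mask, 0, 1)]
    for j in range(n):
        a = mask & ~(1 << j)   # walking 0 at bit j
        b = (1 << j) - 1       # 1s in all bits below j
        half.append((a, b, 1))
    full = half + [(a ^ mask, b ^ mask, cin ^ 1) for a, b, cin in half]
    return [f"{a:0{n}b} {b:0{n}b} {cin}" for a, b, cin in full]


if __name__ == "__main__":
    for row in bist_sequence(9):
        print(row)
```

With `n = 9` this yields 2(N+2) = 22 vectors; with `n = 48` the same structure would give 100 vectors for a 48-bit adder.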

Fault Simulation Results
- JETTA 98 approach gives best overall fault coverage
  - Regardless of adder implementation
- Undetected faults in JETTA 98 approach can be detected
  - Results in New BIST column for 2(N+2) vector sequence
- JETTA 98 also claims similar BIST approach for Modified-Booth multiplier
  - But description of test algorithm is very sketchy

48-bit CLA Adder   Gate     #        Test Algorithm
Implementation     Delays   Faults   VTS 97   JETTA 98   New BIST
Ripple CLA         28       1392     100%     99.9%      100%
Ripple LCU         12       1542     95.7%    99.9%      100%
Multi-stage LCU    10       1506     95.9%    99.9%      100%

Adder in Virtex-4 DSP
- Adder has 3 input ports
  - P = Z ± (X + Y + Cin)
- We interpret this as a 2-stage CLA adder/subtractor implementation
- Apply test patterns to each stage in turn
  - 2 clock cycles per vector
[Figure: two cascaded 48-bit CLAs under OPMODE control. The A port enters through the X MUX, the B port through the Y MUX, and the C port through the Z MUX, with CIN and Subtract inputs; the X, Y, Z test vector is marked against clock cycles #1 and #2.]
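A small behavioural model makes the 2-stage interpretation concrete. This is a sketch of the slide's stated interpretation only, not Xilinx's documented implementation; `dsp_adder` and `MASK48` are hypothetical names, and 48-bit wrap-around arithmetic is assumed.

```python
MASK48 = (1 << 48) - 1  # results wrap to 48 bits


def dsp_adder(x, y, z, cin=0, subtract=False):
    """Model P = Z +/- (X + Y + Cin) as two cascaded 48-bit additions."""
    stage1 = (x + y + cin) & MASK48                      # first adder: X + Y + Cin
    p = (z - stage1) if subtract else (z + stage1)       # second adder/subtractor
    return p & MASK48
```

Under this model, holding Z at 0 exposes the first stage at P, and holding Y and Cin at 0 exposes the second stage with operands Z and X, which is one way to read "apply test patterns to each stage in turn".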

BIST Approach for Virtex-5 DSP
- Optional registers
  - Like V4, but data sheets have less info
- Larger multiplier
  - But same test algorithm
- Logical operations
  - But 48-bit cascade of A:B allows direct testing
- Pattern detect
  - But known algorithm for = comparator

Multiplier BIST
- Test algorithm depends on architecture
  - Virtex-4/5 architecture is not specified in data sheets
- Eliminate sequential logic architectures
  - Based on modified Booth
- Multiplier choices include:
  - Unsigned array
  - Baugh Wooley
  - Modified Booth
  - Modified Booth/Wallace tree
    - Our assumption, based on area/performance analysis
- Our goal: find/develop architecture-independent test algorithm(s)
10/15/2010 VLSI D&T Seminar 9

Modified Booth Test Algorithms
- Test algorithm uses 8-bit counter (256 vectors)
  - "Effective Built-In Self-Test for Booth Multipliers"
  - Gizopoulos, Paschalis & Zorian
  - IEEE Design & Test of Computers, 1998
  - Claim fault coverage ~ 99.8%
- 4x4 connections to multiplier inputs
  - A[7:0] = C7 C6 C5 C4 C7 C6 C5 C4
  - B[7:0] = C3 C2 C1 C0 C3 C2 C1 C0
[Figure: 4x4 algorithm. An 8-bit counter (MSB to LSB) is split 4/4 to drive the n-bit Booth-encoding input and the other n-bit input of an n x n multiplier with a 2n-bit output.]

A[7:0]    B[7:0]
00000000  00000000
00000000  00010001
00000000  00100010
00000000  00110011
...
00000000  11101110
00000000  11111111
00010001  00000000
00010001  00010001
00010001  00100010
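The 4x4 wiring amounts to replicating each counter nibble across one operand. A minimal Python sketch of that mapping follows; it assumes the 8-bit multiplier shown in the pattern listing, and the function names are hypothetical.

```python
def pattern_4x4(count):
    """Map an 8-bit counter value to the (A, B) operands of the 4x4 algorithm.

    A[7:0] = C7 C6 C5 C4 C7 C6 C5 C4  (upper nibble repeated)
    B[7:0] = C3 C2 C1 C0 C3 C2 C1 C0  (lower nibble repeated)
    """
    hi = (count >> 4) & 0xF
    lo = count & 0xF
    return (hi << 4) | hi, (lo << 4) | lo


def vectors_4x4():
    """All 256 test vectors, in counter order."""
    return [pattern_4x4(c) for c in range(256)]


if __name__ == "__main__":
    for a, b in vectors_4x4()[:4]:
        print(f"{a:08b} {b:08b}")
```

For wider multipliers the same nibble would presumably be repeated more times; that extension is not shown on the slide.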

Modified Booth Wallace Tree Algorithms
- Test algorithm also uses 8-bit counter (256 vectors)
  - "Effective BIST Architecture for Fast Multiplier Cores"
    - Paschalis, Kranitis, Psarakis, Gizopoulos & Zorian
    - Proc. Design Automation and Test in Europe Conf., 1999
  - "Low Power BIST for Wallace Tree-based Fast Multipliers"
    - Bakalis, Kalligeros, Nikolos, Vergos & Alexiou
    - Proc. Int. Symp. on Quality of Electronic Design, 2000
- 5x3 connections with 5 inputs to Booth encoding port
  - A[7:0] = C5 C4 C3 C7 C6 C5 C4 C3
  - B[7:0] = C1 C0 C2 C1 C0 C2 C1 C0 (as implied by the listed vectors)
- Both papers claim fault coverage > 99%
[Figure: 5x3 algorithm. An 8-bit counter (MSB to LSB) is split 5/3 to drive the n-bit Booth-encoding input and the other n-bit input of an n x n multiplier with a 2n-bit output.]

A[7:0]    B[7:0]
00000000  00000000
00000000  01001001
00000000  10010010
00000000  11011011
...
00000000  10110110
00000000  11111111
00100001  00000000
00100001  01001001
00100001  10010010
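The 5x3 wiring can be sketched the same way. The bit mapping below is taken from the listed vectors (for example, counter value 1 gives B = 01001001 and counter value 8 gives A = 00100001). It assumes an 8-bit multiplier, and `pattern_5x3` is a hypothetical name.

```python
def pattern_5x3(count):
    """Map an 8-bit counter value to the (A, B) operands of the 5x3 algorithm.

    The upper 5 counter bits (C7..C3) repeat across A from the LSB upward;
    the lower 3 counter bits (C2..C0) repeat across B from the LSB upward.
    """
    five = (count >> 3) & 0x1F
    three = count & 0x7
    a = ((five & 0x7) << 5) | five                   # A[7:5] = C5..C3, A[4:0] = C7..C3
    b = ((three & 0x3) << 6) | (three << 3) | three  # B[7:6] = C1 C0, then C2..C0 twice
    return a, b


if __name__ == "__main__":
    for count in (0, 1, 2, 3, 6, 7, 8, 9, 10):
        a, b = pattern_5x3(count)
        print(f"{a:08b} {b:08b}")
```

The usage loop prints the nine rows listed on the slide, in the same order.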

Modified Booth Test Algorithms
- Test algorithm uses 8-bit counter (256 vectors)
- But which side is Booth encoding?
  - Xilinx does not specify
- Our original approach
  - Run 5x3 algorithm (256 vectors)
  - and run 3x5 algorithm (512 vectors total)
  - Include 4x4 if fault coverage improves (768 vectors total)
- Additional algorithms only require multiplexers to change inputs
  - Use same 8-bit counter
[Figure: 5x3 and 3x5 algorithms. One 8-bit counter (MSB to LSB) drives both inputs of an n x n multiplier with a 2n-bit output; multiplexers select the 5/3 or 3/5 split.]
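The shared-counter idea can be illustrated with a table-driven generator in which each algorithm is only a different wiring of counter bits to operand bits. This sketch assumes that 3x5 is the 5x3 wiring with the two operands swapped, which the slide implies but does not state. The wiring strings list counter-bit indices MSB first, and all names are hypothetical.

```python
# Counter-bit index feeding each operand bit, MSB first: (A wiring, B wiring).
WIRING = {
    "5x3": ("54376543", "10210210"),
    "4x4": ("76547654", "32103210"),
}


def expand(count, wiring):
    """Build an operand by picking counter bits in the order the wiring lists."""
    value = 0
    for idx in wiring:
        value = (value << 1) | ((count >> int(idx)) & 1)
    return value


def run(phases):
    """Yield (phase, A, B) for each phase; every phase reuses the 8-bit counter."""
    for name in phases:
        swap = name == "3x5"  # assumption: 3x5 is 5x3 with operands swapped
        wire_a, wire_b = WIRING["5x3" if swap else name]
        for count in range(256):
            a, b = expand(count, wire_a), expand(count, wire_b)
            yield (name, b, a) if swap else (name, a, b)
```

Running `run(["5x3", "3x5"])` produces 512 vectors and adding `"4x4"` produces 768, matching the totals on the slide. In hardware the `WIRING` lookup corresponds to the input multiplexers.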

Analysis
- Multipliers evaluated
  - Unsigned array
  - Signed array
    - Baugh Wooley
  - Modified Booth
    - Carry look-ahead adders sum partial products in every stage
  - Modified Booth Wallace Tree
    - Carry look-ahead adder sums final stage partial products
    - Carry select adder sums final stage partial products
    - Ripple carry adder sums final stage partial products
  - Custom implementation of Modified Booth

Analysis
- Designed 8-bit models of the multipliers
- Fault model:
  - Collapsed single stuck-at gate-level faults
- Exhaustive testing
  - To determine undetectable faults
- Test algorithms evaluated
  - 4x4
  - 5x3
  - 3x5
  - 5x3 & 3x5
  - 4x4, 5x3 & 3x5

Test algorithm results: # faults detected (effective fault coverage, %)

                         Total                                                                     5x3 &         5x3, 3x5
Multiplier               Faults  Exhaust      4x4           5x3           3x5           3x5           & 4x4
Unsigned array           1648    1644 (100)   1644 (100)    1644 (100)    1621 (98.60)  1644 (100)    1644 (100)
Signed array             1648    1644 (100)   1644 (100)    1644 (100)    1644 (100)    1644 (100)    1644 (100)
Mod-Booth                2499    2196 (100)   2180 (99.27)  2168 (98.72)  2179 (99.23)  2182 (99.36)  2193 (99.86)
Mod-Booth Wall-Tree CLA  2184    2090 (100)   2061 (98.61)  2068 (98.95)  2070 (99.04)  2071 (99.09)  2074 (99.23)
Mod-Booth Wall-Tree CSA  2422    2243 (100)   2215 (98.75)  2217 (98.84)  2218 (98.89)  2222 (99.06)  2228 (99.33)
Mod-Booth Wall-Tree RCA  2021    1962 (100)   1937 (98.73)  1944 (99.08)  1944 (99.08)  1944 (99.08)  1947 (99.24)
Custom Mod-Booth         1908    1805 (100)   1781 (98.67)  1787 (99.00)  1785 (98.89)  1791 (99.22)  1793 (99.34)
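The percentages in this table are consistent with "effective" fault coverage being computed against the faults detected by exhaustive testing rather than against the total fault count. That reading is an inference from the numbers, not a definition given on the slide. A short Python check, with the hypothetical helper `effective_fc`:

```python
def effective_fc(detected, detectable):
    """Percentage of detectable faults that a test algorithm detects.

    'detectable' is the number of faults found by exhaustive testing, so
    faults that no input pattern can expose are excluded from the denominator.
    """
    return round(100.0 * detected / detectable, 2)


if __name__ == "__main__":
    # Modified Booth row: 2499 total faults, 2196 detected exhaustively.
    print(effective_fc(2180, 2196))  # 4x4 against detectable faults
    print(effective_fc(2180, 2499))  # 4x4 against all faults, for contrast
```

For the Modified Booth row, the 4x4 algorithm's 2180 detected faults give 99.27% against the 2196 detectable faults, but only 87.23% against all 2499 faults.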

Summary
- If the architecture of the multiplier is not known:
  - 3x5 algorithm gives best overall fault coverage for most multipliers
    - Contradicting the claim of the authors who proposed 5x3
  - Running 3x5 & 5x3 gives better fault coverage for all multipliers
  - Running all three algorithms (3x5, 5x3, and 4x4) provides the best fault coverage for all multipliers
    - Architecture-independent testing
- Virtex-4 & Virtex-5 multipliers
  - Original approach was 3x5 and 5x3
  - Better approach would be 3x5 and 4x4

Summary
- Adder test algorithm in JETTA 98
  - Easy to implement, with excellent FC on all adders (stuck-at and bridging faults)
    - 100% FC on most adders
  - Easily adapted to subtractors & adder/subtractors
  - Used in BIST for Virtex-4 & 5 FPGAs
- Both test algorithms (adder & multiplier) will be used on Spartan-6 FPGA DSPs
  - Currently under development

Read More About It
- M. Pulukuri, G. Starr & C. Stroud, "On BIST for Multipliers," Proc. IEEE Southeast Regional Conf., pp. 25-28, 2010
- M. Pulukuri & C. Stroud, "On BIST for Adders," J. Electronic Testing: Theory & Applications, pp. 343-346, 2009
- M. Pulukuri & C. Stroud, "BIST for DSPs in Virtex-4 FPGAs," Proc. IEEE Southeast Symp. on System Theory, pp. 34-38, 2009
- M. Pulukuri, "BIST of DSP Cores in Virtex-4 & 5 FPGAs," AU MS Thesis, 2010