Defect Tolerance in VLSI Circuits

Prof. Naga Kandasamy

These notes are adapted from: B. W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison Wesley, 1989.

We will consider the following redundancy techniques to tolerate defects in VLSI circuits:

- Duplication with complementary logic (physical redundancy).
- Permanent fault detection using time redundancy.
- Self-checking circuits.
- Reconfigurable memory arrays.

1 Duplication with Complementary Logic

This technique duplicates a given module and compares the outputs of the resulting two modules. As long as the comparator works correctly, a failure of either module is detected. Duplication with comparison has two problems: (1) the comparator may fail, and (2) the approach assumes that only one of the two duplicated modules fails at any given time; that is, it ignores common-mode failures that cause the two modules to fail in the same fashion at the same time. So, we need to modify the design of duplication with comparison schemes to minimize the effect of common-mode failures.

One technique for tackling common-mode failures in VLSI circuits is complementary logic, where one circuit uses positive logic (a high voltage level represents logic 1) while the other circuit uses negative logic (a high voltage level represents logic 0). If we know the Boolean function realized by a circuit using positive logic, we can easily determine the function realized by the same circuit using negative logic through the concept of duality. Recall from Boolean algebra that the dual of a Boolean function is formed by replacing AND operations with OR operations, OR operations with AND operations, 1s with 0s, and 0s with 1s. The variables and complement operations are not changed. For example, consider the function

$f(x_1, x_2, x_3) = x_1 x_2 + x_3$

The dual of the function f is given by

$f_d(x_1, x_2, x_3) = (x_1 + x_2) x_3$

We can use the dual function $f_d$ to obtain the complement of f by replacing each variable in $f_d$ with its complement:

$\bar{f}(x_1, x_2, x_3) = f_d(\bar{x}_1, \bar{x}_2, \bar{x}_3) = (\bar{x}_1 + \bar{x}_2) \bar{x}_3$

Let X be a vector consisting of n input bits given by $X = (x_1, x_2, \ldots, x_n)$. If we apply X to an arbitrary Boolean function f and then apply $\bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ to the function $f_d$, where f and $f_d$ are duals, the resulting outputs will be complementary. That is, $f_d(\bar{X}) = \bar{f}(X)$.
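The duality relation can be checked exhaustively for the example function above. The following is a minimal sketch, not part of the original notes; the helper name `outputs_are_complementary` is an assumption made for illustration.

```python
from itertools import product


def f(x1, x2, x3):
    """Example function from the text: f = x1 x2 + x3."""
    return (x1 & x2) | x3


def f_dual(x1, x2, x3):
    """Its dual: f_d = (x1 + x2) x3 (AND and OR swapped)."""
    return (x1 | x2) & x3


def outputs_are_complementary(func, dual, n):
    """Check that dual(complement of X) equals the complement of func(X)
    for every n-bit input vector X (hypothetical helper)."""
    for bits in product((0, 1), repeat=n):
        complemented = [1 - b for b in bits]
        if dual(*complemented) != 1 - func(*bits):
            return False
    return True


if __name__ == "__main__":
    print(outputs_are_complementary(f, f_dual, 3))
```

Passing a function that is not the dual of `f` (for example `f` itself) makes the check fail, which is the condition a comparator would flag in a duplicated design.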

Fig. 1: Implementation of the function f(x 1, x 2, x 3) = x 1x 2 x 3 + x 1 x 2x 3 and its dual.

Complementary logic can be used to implement a duplication with comparison approach to fault detection. Rather than use exact replicas of each module, the modules are designed as duals of each other. One module operates using positive logic and the other module operates using negative logic. If both modules are operating properly, the outputs will be complementary. There are three advantages of using complementary logic:

(1) The use of dual implementations forces the use of separate masks to create the two modules. The possibility of common-mode failures resulting from design mistakes or mask problems is reduced.
(2) The voltage transitions on the corresponding lines in the two modules are in opposite directions, so faults that are sensitive to voltage transitions are less likely to produce identical effects in both modules.
(3) Corresponding lines in the two modules are always at different voltage levels, so a short between two such lines always results in one of the two lines having an erroneous value and the other line having the correct value. Consequently, the fault can be detected.

Let us consider the design of a duplicate and compare scheme that uses complementary logic to realize the function

f(x 1, x 2, x 3 ) = x 1 x 2 x 3 + x 1 x 2 x 3

The dual of f is given by

f d (x 1, x 2, x 3 ) = (x 1 + x 2 + x 3 )( x 1 + x 2 + x 3 )

Fig. 1 shows the logic diagrams of the circuits that realize f and f d, respectively. The original function and its dual are now operated in parallel using complementary input combinations, as shown in Fig. 2. Logic values on corresponding lines in the two modules are complementary. The outputs, in the fault-free case, will also be complements, and can be compared to detect faults.

2 Fault Detection using Time Redundancy

One of the problems with the duplicate and compare approach is the penalty paid in extra hardware.
Time redundancy is a way to decrease the hardware overhead needed to achieve fault detection (or fault tolerance), at the expense of additional time. The basic concept of time redundancy is to repeat computations in a way that allows faults (both transient and permanent) to be detected. The approach used to detect transient faults is shown in Fig. 3. To detect permanent faults using time redundancy, the computation (or the data) must be modified when it is performed the second time, as shown in Fig. 4. We will consider two approaches that use time redundancy to detect permanent faults in VLSI circuits: (1) alternating logic and (2) recomputing with shifted operands (RESO).

Fig. 2: Duplication with comparison using complementary logic. An example input and the values of the internal lines are also shown. Note the complementary values on the corresponding lines.

Fig. 3: In time redundancy, the computations are repeated at different points in time, and the results are then compared.

Fig. 4: If time redundancy is used to tolerate permanent faults, the computations must be modified when they are performed the second time.

Alternating Logic

The concept of alternating logic can be applied to general combinational logic circuits if the circuit possesses the property of self-duality. A combinational circuit is said to be self-dual if and only if

$f(X) = \bar{f}(\bar{X})$,

where f is the Boolean expression for the circuit and X is the input vector for the circuit. In other words, a combinational circuit is self-dual if the output of the circuit for the input vector X is the complement of the output when the input vector $\bar{X}$ is applied. So, for a self-dual circuit, the application of an input X followed by the input $\bar{X}$ produces outputs that alternate between 1 and 0. The key to detecting faults is determining that at least one input combination exists for which the fault does not result in alternating outputs. The full-adder circuit shown in Fig. 5 is a self-dual circuit.

Any combinational circuit with n inputs can be transformed into a self-dual circuit with no more than n + 1 input variables. The dual $f_d$ of an n-variable function f is given by

$f_d = \bar{f}(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$

The function given by

$f_{sd} = x_{n+1} f + \bar{x}_{n+1} f_d$

is a self-dual function because when $x_{n+1} = 1$, that is, when $X = (x_1, x_2, \ldots, x_n, x_{n+1}) = (x_1, x_2, \ldots, x_n, 1)$, the value of $f_{sd}$ is f. When $x_{n+1} = 0$, that is, when we provide $\bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n, 0)$, the value of $f_{sd}$ is $f_d$ evaluated at the complemented inputs, which is the complement of $f(x_1, x_2, \ldots, x_n)$. Thus, $x_{n+1}$ is a control line that determines which of the two functions, f or $f_d$, appears on the output line.

Alternating logic detects a set of faults if, for every fault within the set, there is at least one input combination that produces non-alternating outputs. Fig. 6 shows the resulting truth table for the various stuck-at-1 or stuck-at-0 faults present in the full-adder circuit in Fig. 5. As we can see, each stuck-type fault results in at least one set of non-alternating outputs being produced for complementary inputs at either the carry or the sum output.

When using alternating logic, it is important to note that faults may not be immediately detected. For example, suppose that the full adder contains a stuck-at-0 fault on line D.
As we can see from the truth table, the sum output is not affected by this fault. So, we must depend on the carry output to detect this fault. The carry output, however, will have alternating outputs for the complementary input combinations (000) and (111) as well as (001) and (110). So, the fault D/0 is not detected until the input combination (010) and (101), or the combination (011) and (100), is applied to the circuit. Depending on when these combinations are actually applied to the circuit, the time elapsed before the detection of the fault can be significant.
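The self-duality test, the construction of $f_{sd}$, and the search for non-alternating input pairs can be sketched in a few lines. This is an illustrative sketch only: the helper names are assumptions, and the injected fault (one AND term of the carry logic stuck at 0) is a hypothetical example, not line D of Fig. 5, whose gate-level structure is not reproduced here.

```python
from itertools import product


def full_adder(a, b, cin):
    """Behavioral full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def is_self_dual(func, n):
    """True if func(X) is the complement of func(complement of X) for all X."""
    return all(func(*bits) == 1 - func(*[1 - b for b in bits])
               for bits in product((0, 1), repeat=n))


def make_self_dual(func, n):
    """Build f_sd = x_{n+1} f + (not x_{n+1}) f_d, where f_d(X) is the
    complement of f evaluated at the complemented inputs."""
    def f_sd(*bits):
        *x, ctrl = bits
        f_val = func(*x)
        fd_val = 1 - func(*[1 - b for b in x])
        return (ctrl & f_val) | ((1 - ctrl) & fd_val)
    return f_sd


def non_alternating_inputs(func, n):
    """Inputs X for which func(X) equals func(complement of X); applying
    such an X and then its complement exposes the fault."""
    return [bits for bits in product((0, 1), repeat=n)
            if func(*bits) == func(*[1 - b for b in bits])]


def faulty_carry(a, b, cin):
    """Hypothetical fault: the a AND b term of the carry is stuck at 0."""
    return (a & cin) | (b & cin)


if __name__ == "__main__":
    print(is_self_dual(lambda a, b, c: full_adder(a, b, c)[1], 3))
    print(non_alternating_inputs(faulty_carry, 3))
```

In this sketch only one complementary input pair exposes the hypothetical fault, which mirrors the point made above: detection can be delayed until a sensitizing pair is actually applied.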

Fig. 5: A full-adder is a self-dual circuit. Complementary inputs produce complementary outputs.

Recomputing with Shifted Operands

Another form of time redundancy is called recomputing with shifted operands (RESO). RESO was developed as a method to detect errors in arithmetic logic units (ALUs). (RESO is discussed on page 160 of the textbook.) We will illustrate how RESO is used with the example of an n-bit ripple-carry adder. Suppose that the i-th full-adder cell (or slice) is faulty and produces an erroneous value for the function's output at that bit slice. During the first computation, when the operands are not shifted, the i-th output of the circuit is erroneous. When the input operands are shifted left by one bit, the faulty bit slice then operates on, and corrupts, the (i-1)-th bit. When the result is shifted back to the right, the two results (the first with unshifted operands and the second with shifted operands) are either both correct, or they disagree in either (or both) the i-th or the (i-1)-th bits.

Suppose we compute R = A + B, and the i-th full adder is faulty. When the operands are unshifted,

$R_{\text{fault-free}} = r_n r_{n-1} \ldots r_i r_{i-1} \ldots r_1 r_0$
$R_{\text{faulty}} = r_n r_{n-1} \ldots \bar{r}_i r_{i-1} \ldots r_1 r_0$   (1)

where $\bar{r}_i$ denotes the result bit in error due to the faulty bit slice. A faulty bit slice can have one of three effects: the sum bit can be stuck at 0 or 1, the carry bit can be stuck at 0 or 1, or both the sum bit and the carry bit may be in error. The following table shows the effect of each possible error on the sum R.

Fig. 6: The truth table for single stuck-line faults in the full adder circuit of Fig. 5.

Error                  Effect on sum
Sum is 0               $-2^i$
Sum is 1               $+2^i$
Carry is 0             $-2^{i+1}$
Carry is 1             $+2^{i+1}$
Sum is 0, carry is 0   $-(2^{i+1} + 2^i) = -3 \cdot 2^i$
Sum is 0, carry is 1   $2^{i+1} - 2^i = +2^i$
Sum is 1, carry is 0   $2^i - 2^{i+1} = -2^i$
Sum is 1, carry is 1   $2^{i+1} + 2^i = +3 \cdot 2^i$

In summary, the result generated for the unshifted operands, if bit slice i is faulty, is incorrect by one of $[0, \pm 2^i, \pm 2^{i+1}, \pm 3 \cdot 2^i]$.

When the operands A and B are shifted to the left by two bits, the sum R is computed, and the result is then shifted right by two bits, we obtain

$R_{\text{fault-free}} = r_n r_{n-1} \ldots r_{i-1} r_{i-2} \ldots r_1 r_0$
$R_{\text{faulty}} = r_n r_{n-1} \ldots r_{i-1} \bar{r}_{i-2} \ldots r_1 r_0$   (2)

and a similar analysis of possible bit errors and their effect on the sum gives us

Error                  Effect on sum
Sum is 0               $-2^{i-2}$
Sum is 1               $+2^{i-2}$
Carry is 0             $-2^{i-1}$
Carry is 1             $+2^{i-1}$
Sum is 0, carry is 0   $-(2^{i-1} + 2^{i-2}) = -3 \cdot 2^{i-2}$
Sum is 0, carry is 1   $2^{i-1} - 2^{i-2} = +2^{i-2}$
Sum is 1, carry is 0   $2^{i-2} - 2^{i-1} = -2^{i-2}$
Sum is 1, carry is 1   $2^{i-1} + 2^{i-2} = +3 \cdot 2^{i-2}$
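The two tables can be cross-checked numerically, and the RESO procedure itself can be simulated on a bit-level ripple-carry adder with one faulty slice. The sketch below is illustrative only: the fault encoding `(slice, forced_sum, forced_carry)`, the function names, and the adder width of n + 3 bits (chosen here so that the shifted sum, including its carry-out, still fits) are assumptions of this example rather than details taken from the notes.

```python
def error_set(k):
    """Possible errors in the sum when the faulty slice acts on bit k."""
    unit = 2 ** k
    return {0, unit, -unit, 2 * unit, -2 * unit, 3 * unit, -3 * unit}


def ripple_add(a, b, width, fault=None):
    """Bit-level ripple-carry addition over `width` slices.

    fault = (slice, forced_sum, forced_carry); a forced value of None
    leaves that output of the slice unaffected.
    """
    carry, result = 0, 0
    for k in range(width):
        x, y = (a >> k) & 1, (b >> k) & 1
        s = x ^ y ^ carry
        c = (x & y) | (carry & (x ^ y))
        if fault is not None and fault[0] == k:
            s = s if fault[1] is None else fault[1]
            c = c if fault[2] is None else fault[2]
        result |= s << k
        carry = c
    return result


def reso(a, b, width, fault=None, shift=2):
    """Compute a + b twice (unshifted, then with operands shifted left
    and the result shifted back) and report whether the results differ."""
    first = ripple_add(a, b, width, fault)
    second = ripple_add(a << shift, b << shift, width, fault) >> shift
    return first, second, first != second


if __name__ == "__main__":
    # The two error sets share only the value 0.
    print(error_set(5) & error_set(3))
    # 4-bit operands, 7-bit adder, sum output of slice 1 stuck at 0.
    print(reso(5, 6, 7, fault=(1, 0, None)))
```

A mismatch between the two results signals a fault; when the two results agree, the analysis in the tables implies that both are correct.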

Fig. 7: The ALU structure using RESO.

Summarizing, the result will be incorrect by one of $[0, \pm 2^{i-2}, \pm 2^{i-1}, \pm 3 \cdot 2^{i-2}]$. Comparing the two tables, we see that the only error value common to both is 0, so the results of the two computations (that is, the unshifted one and the one where the operands are shifted by two) cannot agree unless both are correct.

The structure of an ALU that uses the RESO technique is shown in Fig. 7. The additional hardware required for the technique consists of the three shifters, the storage register to hold the result of the first computation, and the comparator. Also, the ALU must be extended by 2 bits to allow the two-bit arithmetic shift to be performed without an overflow. The primary issues with the RESO approach are the additional hardware required and the lack of coverage provided for faults in the shifters and the comparator.

3 Self-Checking Logic

Self-checking logic is needed to tackle the "checking the checker" problem. In duplicate and compare approaches, it is necessary to compare the outputs of two modules. So, the basic problem is either to ensure that the comparator is fault free or to design a comparator that can detect its own faults, that is, a self-checking comparator.

First, we define several terms that are important for understanding self-checking technology. A circuit is said to be self-checking if it has the ability to detect the existence of a fault without the need for any externally applied stimulus (such as the test patterns used in circuit testing). In other words, a self-checking circuit determines if it contains a fault during the normal course of its operation. Self-checking logic is typically designed using coding techniques, where the basic idea is to design a circuit that, when fault free and presented with a valid input code word, will produce the correct output code word. If a fault exists, however, the circuit should produce an invalid output code word so that the fault can be detected.
A circuit is fault secure if any single fault within the circuit results in that circuit either producing the correct code word or producing a non-code word, for any valid input code word. In other words, if the circuit is fault secure, then the fault either has no effect on the output or the output is affected in such a way that it becomes an invalid code word. A circuit is self-testing if there exists at least one valid input code word that will produce an invalid output code word when a single fault is present in the circuit. A circuit is said to be totally self-checking (TSC) if it is both fault secure and self-testing. So, in a TSC circuit, all single faults are detectable by at least one valid code word input, and if a given input combination does not detect the fault, the output is a correct code word output.

Fig. 8: Basic structure of a TSC circuit.

The general structure of a TSC circuit is shown in Fig. 8. During normal operation, coded inputs are applied to the circuit and coded outputs are produced at the circuit's output. Note that, rather than have a single-bit output that provides a "faulty" or "not faulty" indication, the checker output consists of two bits that are: (1) complementary if the input to the checker is a valid code word and the checker is fault free, or (2) non-complementary if the input to the checker is not a valid code word or the checker contains a fault. An obvious reason for using two checker outputs is to overcome the problem of the checker output becoming stuck at either the logic 0 or the logic 1 value.

The most common TSC checker is the two-rail checker shown in Fig. 9. The two-rail checker is used to compare two words that would normally be complementary. If the words are complementary and the checker itself is fault free, the outputs of the checker should also be complementary. If the two words are not complementary or the checker contains a fault, the outputs of the checker should not be complementary. A simple design of a 2-bit TSC two-rail checker is shown in Fig. 9, where each of the two input words is two bits.
The first input word is $(x_0, x_1)$, and the second input word is $(y_0, y_1)$. Valid code words on the inputs will have $x_0 = \bar{y}_0$ and $x_1 = \bar{y}_1$. From the logic of the circuit, we see that

$e_1 = x_0 y_1 + y_0 x_1$
$e_2 = x_0 x_1 + y_0 y_1$   (3)

Fig. 9: The basic block diagram of the two-rail checker, and a simple 2-bit TSC two-rail checker.

Provided the checker is fault free and the inputs are valid code words ($y_0 = \bar{x}_0$ and $y_1 = \bar{x}_1$), the outputs of the TSC two-rail checker reduce to

$e_1 = x_0 \bar{x}_1 + \bar{x}_0 x_1 = x_0 \oplus x_1$
$e_2 = x_0 x_1 + \bar{x}_0 \bar{x}_1 = \overline{x_0 \oplus x_1}$   (4)

and $e_1$ and $e_2$ are always complementary.

Now, consider the cases where the checker is fault free, but the inputs are not complementary. In the first case, where $x_0 = y_0$ and $x_1 = \bar{y}_1$, the checker outputs become

$e_1 = x_0 \bar{x}_1 + x_0 x_1$
$e_2 = x_0 x_1 + x_0 \bar{x}_1$   (5)

which are identical for all possible values of $x_0$ and $x_1$. In the second case, where $x_0 = \bar{y}_0$ and $x_1 = y_1$, the outputs of the checker become

$e_1 = x_0 x_1 + \bar{x}_0 x_1$
$e_2 = x_0 x_1 + \bar{x}_0 x_1$   (6)

which are also identical for all possible values of $x_0$ and $x_1$. In the final case, where $x_0 = y_0$ and $x_1 = y_1$, the outputs of the checker become

$e_1 = x_0 x_1 + x_0 x_1 = x_0 x_1$
$e_2 = x_0 x_1 + x_0 x_1 = x_0 x_1$   (7)

which are identical. We can also show that the TSC circuit is fault secure with respect to single stuck-line faults, and that it also satisfies the self-testing property. The proof is left to the reader.

Finally, it is possible to create TSC two-rail checkers with a larger number of input bits using the circuit in Fig. 9 as the basic building block. Fig. 10 shows a hierarchical construction of an 8-bit TSC checker using 2-bit TSC checkers as building blocks. The notation $e_i^j$ is used to denote the i-th error signal from the j-th checker, and $e_1$ and $e_2$ denote the primary error-signal outputs. The four checkers in the first level of the hierarchy each compare 2 bits from the 8-bit operands and each produce two error signals. Checkers in the second and third levels of the hierarchy verify that the error signals from the checkers at the first level are indeed complementary.

Fig. 10: An 8-bit TSC checker using 2-bit TSC checkers as building blocks.

A natural feature of the two-rail checker is the requirement that the two input operands be complements in the fault-free case. If we simply consider duplication with comparison, then the output of one of the modules must be inverted before the checking process.
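The behavior of the fault-free two-rail checker in equations (3) through (7), and the hierarchical construction, can be checked by exhaustive simulation. The sketch below is behavioral only: it does not model faults inside the checker, the function names are assumptions, and `tree_checker` assumes the word length is a power of two (as in the 8-bit example).

```python
from itertools import product


def two_rail_checker(x0, x1, y0, y1):
    """2-bit two-rail checker from equation (3)."""
    e1 = (x0 & y1) | (y0 & x1)
    e2 = (x0 & x1) | (y0 & y1)
    return e1, e2


def tree_checker(xs, ys):
    """Hierarchical checker: each rail pair (x_k, y_k) is a leaf, and
    2-bit checkers combine neighboring pairs level by level until one
    pair of error signals (e1, e2) remains."""
    pairs = list(zip(xs, ys))
    while len(pairs) > 1:
        next_level = []
        for k in range(0, len(pairs), 2):
            (a0, b0), (a1, b1) = pairs[k], pairs[k + 1]
            next_level.append(two_rail_checker(a0, a1, b0, b1))
        pairs = next_level
    return pairs[0]


def checker_flags_only_invalid_inputs():
    """Outputs are complementary exactly when both input pairs are."""
    return all(
        (e1 != e2) == (x0 != y0 and x1 != y1)
        for x0, x1, y0, y1 in product((0, 1), repeat=4)
        for e1, e2 in [two_rail_checker(x0, x1, y0, y1)]
    )


if __name__ == "__main__":
    print(checker_flags_only_invalid_inputs())
    word = (1, 0, 1, 1, 0, 0, 1, 0)
    complement = tuple(1 - b for b in word)
    print(tree_checker(word, complement))
```

For an 8-bit word, `tree_checker` uses four checkers at the first level, two at the second, and one at the third, matching the structure described for Fig. 10.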