
HARDWARE EMULATION OF SEQUENTIAL ATPG-BASED BOUNDED MODEL CHECKING

by

GREGORY FICK FORD

Submitted in partial fulfilment of the requirements for the degree of Master of Science

Thesis Advisor: Dr. Daniel Saab

Department of Electrical Engineering & Computer Science
CASE WESTERN RESERVE UNIVERSITY
January, 2014

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis of Gregory Fick Ford, candidate for the Master of Science degree.

Dr. Daniel Saab
Dr. Francis Merat
Dr. Christos Papachristou

Date: 11/08/2013

* We also certify that written approval has been obtained for any proprietary material contained therein.

Contents

1  Background
   1.1  Properties & Temporal Logic
        Linear Temporal Logic
        Computational Tree Logic
        SUGAR
        OpenVera
        Property Examples
   1.2  Model Checking
        Ordered Binary Decision Diagrams
        SAT Modeling
        Bounded Model Checking
        Automatic Test Pattern Generation
   1.3  Prior Work
2  Algorithm
3  Architecture
   3.1  PI/PPI Decision Block
   3.2  Objective Decision Block
        Forward Network Derivation
        Backward Network Derivation
        Backward Network Fanout Handling
        Backward Network Conflict Detection
        Backward Network Decoder
        Backward Network Encoder
        Translating Circuits into Forward/Backward Networks
        Processing Input Data
        In-Memory Data Structure
        Writing Output Networks
4  Results
5  Conclusion & Future Work
A  Example Input Circuit and Network Translations
B  Example DONE Simulation for c17 Benchmark Circuit
C  Example FAIL Simulation for s27 Benchmark Circuit

D  FPGA Emulation Algorithm Implementation Base Verilog Code
E  Forward / Backward Network Generation Program C++ Code
F  FPGA Algorithm & Network Integration TCL Script
G  Formal CTL Rules Generation TCL Script
Bibliography

List of Figures

1.1  CTL Time/State Representation
1.2  Example Simple Search System Model
1.3  OBDD Graphical Representation Example
1.4  Variable Order Dependency in OBDDs
1.5  Example of the One-Literal Rule in SAT
1.6  Propagation and Consistency in the D-Algorithm
2.1  ILA Model of Sequential Circuit for k Time Frames
2.2  Objective Tracing Within a Frame
2.3  Algorithm Flow Diagram
3.1  Top Level Architecture Block Diagram
3.2  PI/PPI Decision Block Structure Diagram
3.3  State Transition Model of More Justification Operation
3.4  State Transition Model of Move to Ti-1 Operation
3.5  State Containment Check Implementation
3.6  State Transition Model of Backtrack Operation
     State Transition Model of Move to Ti+1 Operation
     Objective Decision Block Structure Diagram
     Objective Decision Block PPI Objective Read In
     Assembly of Frame-k Objective Word
     Objective Decision Block Conflict/Done Decision Logic
     Objective Decision Block Sequencer Operation
     Abstract Backward Gate Values Dependence
     Backward Fanout Signal Contention
     Priority Encoder Structure for Required Objective Bit
     Priority Encoder Structure for Required Objective Bit with Reset
     Priority Encoder Structure for Objective Value Bit
     Complete Structure for Backward Network Priority Encoder
     Trace-blocking Conflict
     Non-blocking Conflict
     State Transition Diagram for Backward Network Decoder
     Backward Network Decoder Value Assignment
     State Transition Diagram of Backward Network Encoder
     Backward Network Encoder Shiftout Operation
     Network Translation Flow
     Internally Consistent Pre-Processing Gate Data Structure

3.28  Gate Fan-out List Structure
      Level List Structure
      Virtex-6 Utilization vs. ISCAS89 Benchmark Size
      FPGA vs. Software Solve Time for ISCAS89 Benchmarks
      FPGA vs. Software Total Time for ISCAS89 Benchmarks
B.1   c17 Benchmark Circuit Structure
B.2   c17 Simulation Circuit Structure
B.3   c17 Simulation Backtrace
B.4   Contents of c17 Block RAM After Objective 1
B.5   c17 Simulation Trace / Implication
B.6   Final c17 Block RAM Contents
C.1   s27 Benchmark Circuit Structure
C.2   s27 Benchmark Simulation Structure
C.3   s27 Simulation Frame k Backtrace
C.4   Contents of Block RAM After Frame k
C.5   Contents of Block RAM After Move to Frame k
C.6   Contents of RF in Frame k
C.7   s27 Simulation Frame k-1 Backtrace 1
C.8   Contents of RAM in Frame k-1 with Backtrace 1
C.9   s27 Simulation Frame k-1 Imply
C.10  Contents of RF and PO/PPO in Frame k-1, Imply
C.11  s27 Simulation Frame k-1 Backtrace 2
C.12  s27 Simulation Frame k-1 Imply
C.13  Contents of RF and PO in Frame k-1, Imply 2
C.14  State Check Comparison for Frame k
C.15  Contents of RF in Frame k
C.16  Contents of Block RAM at Frame k-2 Start
C.17  s27 Simulation Frame k-2 Backtrace 1
C.18  s27 Simulation Frame k-2 Backtrace 2
C.19  Contents of Block RAM after First Clear Top
C.20  Contents of Block RAM after First Swap Value
C.21  s27 Simulation Frame k-2, Backtrack 1 Imply
C.22  Simulation Frame k-2, Backtrack 1 Backtrace
C.23  s27 Simulation Frame k-2, Backtrack 2 Imply
C.24  Block RAM Contents after Move to Ti+1 and Swap Value
C.25  s27 Simulation Frame k-1, Backtrack 1 Imply
C.26  RF Contents and PO/PPO after Move to Ti+1, Imply 1

C.27  Block RAM Contents after Move to Ti+1 and Backtrack
C.28  Block RAM Contents after Second Move to Ti+1 and Swap Value
C.29  Simulation Frame k, Backtrack 1 Imply

List of Tables

1.1  Propositional Operators for LTL Formulae
1.2  Temporal Operators for LTL Formulae
1.3  Temporal Operators for CTL Formulae
1.4  Operators of the Temporal Layer in Sugar
1.5  Directive Operators in OpenVera
1.6  Temporal Operators in OpenVera
1.7  Language Representations of a Safety Property
1.8  Language Representations of a Liveness Property
3.1  PI/PPI Decision Block Control Logic States
3.2  More Justification Operation Pseudo-code
3.3  Pseudo-code for Move to Ti-1 Operation
3.4  Backtrack Operation Pseudo-code
     Move to Ti+1 Operation Pseudo-code
     Full One-Bit Truth Table for AND Gate
     Two-Bit Truth Table for AND Gate
     Split K-maps for AND Gate Output Bits
     Minterm Expressions for AND Gate Output Bits
     Forward Translation Equations for Basic Gates
     Pseudo-code for AND Gate Backward Model Outputs
     Truth Table for Backward Model of AND Gate
     Backward Translation Equations for Basic Gates
     Input Network Format
     Forward Network Module Interface Example
     Virtex-6 Resource Utilization for ISCAS89 Benchmarks
     Runtime Comparison for ISCAS89 Benchmarks
A.1  Benchmark Code for s27 Circuit
A.2  Forward Network Verilog for s27 Benchmark Circuit
A.3  Backward Network Verilog for s27 Benchmark Circuit
A.4  Backward Network Priority Encoder Verilog Example
B.1  Circuit c17 Simulation Input Stimulus
B.2  Circuit c17 Initial Reset
B.3  Circuit c17 Simulation Cycle 1
B.4  Circuit c17 Simulation Cycles
B.5  Circuit c17 Simulation Cycles

B.6  Circuit c17 Simulation Cycles
B.7  Circuit c17 Simulation Cycles
B.8  Circuit c17 Simulation Cycle
B.9  Circuit c17 Simulation Cycles
C.1  Circuit s27 Simulation Cycles
C.2  Circuit s27 Simulation Cycles
C.3  Circuit s27 Simulation Cycles
C.4  Circuit s27 Simulation Cycles
C.5  Circuit s27 Simulation Cycles
C.6  Circuit s27 Simulation Cycles
C.7  Circuit s27 Simulation Cycles
C.8  Circuit s27 Simulation Cycles
C.9  Circuit s27 Simulation Cycles
C.10 Circuit s27 Simulation Cycles
C.11 Circuit s27 Simulation Cycles
C.12 Circuit s27 Simulation Cycles
C.13 Circuit s27 Simulation Cycles
C.14 Circuit s27 Simulation Cycles
C.15 Circuit s27 Simulation Cycles
C.16 Circuit s27 Simulation Cycles
C.17 Circuit s27 Simulation Cycles
C.18 Circuit s27 Simulation Cycles
C.19 Circuit s27 Simulation Cycles
C.20 Circuit s27 Simulation Cycles
C.21 Circuit s27 Simulation Cycles
C.22 Circuit s27 Simulation Cycles
C.23 Circuit s27 Simulation Cycles
C.24 Circuit s27 Simulation Cycles
C.25 Circuit s27 Simulation Cycles
C.26 Circuit s27 Simulation Cycles
C.27 Circuit s27 Simulation Cycles
C.28 Circuit s27 Simulation Cycles

Hardware Emulation of Sequential ATPG-Based Bounded Model Checking

Abstract

by GREGORY FICK FORD

The size and complexity of integrated circuits are continually increasing, in accordance with Moore's law. Along with this growth comes an expanded exposure to subtle design errors, placing a greater burden on the process of formal verification. Existing methods for formal verification, including Automatic Test Pattern Generation (ATPG), are susceptible to exploding model sizes and run times for larger and more complex circuits. In this paper, a method is presented for emulating the process of sequential ATPG-based Bounded Model Checking on reconfigurable hardware. This achieves a speed-up over software-based methods, due to the fine-grain massive parallelism inherent to hardware.

1. Background

The complexity of integrated circuits is continually increasing, and with it, the chances of having subtle errors in a design. This growth also increases the amount of work needed in verification, hence driving up the total time that projects spend in verification and affecting the bottom line of time-to-market. In 2010, the Wilson Research Group published a study on functional verification, showing, among other things, that verification now accounts for over half of the total time of digital design projects [1]. On top of that, this percentage is continuing to increase, from 50% in 2007 to 55% in 2010. Historically, simulation has been used as the main means of discovering bugs in a design, but as designs grow larger, the chance of finding these bugs through simulation alone becomes smaller and smaller. Attention has turned toward formal verification as a means to augment design verification in the face of growing designs. Formal verification is well suited to the problem of comparing the register transfer level (RTL) with the logic level for a design, as this is simply proving combinational equivalence. On the other hand, comparing between the RTL and behavioral level is a much more complicated problem, as the behavioral model is generally defined using code that follows a natural language structure.

1.1. Properties & Temporal Logic

To address the problem of bridging the gap in formal verifiability between the RTL and behavioral level, the concept of properties is introduced. These are statements about the function of small portions of the total design, written by the designers. These functional statements can then be used as a means of comparison against the RTL. As described by Lamport in [2], properties can be divided into two main categories. The first is a safety property, which is a statement that a defined bad event will never happen. The second is a liveness property, which is a statement that a defined good event will eventually happen. This, then, presents the issue of how to formally define properties such that they can be verified.

Linear Temporal Logic

These properties are generally represented using temporal logic, which provides a means for applying an assertion over a period of time. For example, if condition p is true in the present, then another condition q must be true at some point in the future. Pnueli examined the applicability of temporal logic to programs in [3], which can similarly be transferred to digital design. In Linear Temporal Logic (LTL), time is represented as a linearly ordered set, or in the context of digital design, a sequence of design states. As defined by Huth and Ryan in [4], a formula in LTL is constructed from three different parts. The first are propositional atoms, which are representative of conditions in the digital system (as p and q, above). The second are propositional operators, which function as modifiers to propositional atoms, outside any concept of time. Table 1.1 provides the definition of these operators.

Operator   Name          Description
True       Always True
False      Always False
¬p         Negation      True when p is False.
p ∧ q      Conjunction   True when both p AND q are True.
p ∨ q      Disjunction   True when p OR q (or both) is True.
p → q      Implication   IF p is True THEN q must also be True.

Table 1.1: Propositional Operators for LTL Formulae

The third, and final, are temporal operators, which function as modifiers to propositional atoms based on a concept of time represented by design states occurring either before or after the current state in a sequence. Table 1.2 provides the definition of these operators.

Operator   Name          Description
X p        Next          p is true at the next point in time.
F p        Eventuality   p is true at some time in the future.
G p        Globally      p is true at all future points in time.
p U q      Until         p is true, until a point in the future when q becomes true.

Table 1.2: Temporal Operators for LTL Formulae

It is important to note that in the case of the Until operator, there are two possible conditions under which the operator produces a true result. In the first, p holds true up to some point in the future at which q becomes true. If q must be true at some point in the future (F q), then p U q is called a Strong Until. If q is not necessarily true at any point in the future, then p U q is called a Weak Until. That is, the weak form of p U q can still be considered true when q is never true at any point in the future, provided p remains true.
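To make these operator definitions concrete, the following C++ sketch (not part of the thesis; all type and function names are illustrative) evaluates the temporal operators of Table 1.2 over a finite sequence of design states, the same bounded view of time that Bounded Model Checking adopts later in this chapter. Only the strong form of Until is coded; the weak variant is noted in a comment.

    // Illustrative only: evaluating the LTL operators of Table 1.2 over a finite
    // trace of design states.  Names (State, Prop, holds_*) are hypothetical.
    #include <cstddef>
    #include <functional>
    #include <vector>

    using State = std::vector<bool>;               // one bit per propositional atom
    using Prop  = std::function<bool(const State&)>;

    // X p : p holds at the next point in time (false if there is no next state).
    bool holds_next(const Prop& p, const std::vector<State>& trace, std::size_t i) {
        return i + 1 < trace.size() && p(trace[i + 1]);
    }

    // F p : p holds at some point from i onward (bounded by the end of the trace).
    bool holds_eventually(const Prop& p, const std::vector<State>& trace, std::size_t i) {
        for (std::size_t j = i; j < trace.size(); ++j)
            if (p(trace[j])) return true;
        return false;
    }

    // G p : p holds at every point from i onward.
    bool holds_globally(const Prop& p, const std::vector<State>& trace, std::size_t i) {
        for (std::size_t j = i; j < trace.size(); ++j)
            if (!p(trace[j])) return false;
        return true;
    }

    // p U q (strong until): q holds at some point j >= i, and p holds at every
    // point before j.  The weak variant would also return true if q never holds
    // but p holds for the rest of the trace.
    bool holds_until(const Prop& p, const Prop& q,
                     const std::vector<State>& trace, std::size_t i) {
        for (std::size_t j = i; j < trace.size(); ++j) {
            if (q(trace[j])) return true;
            if (!p(trace[j])) return false;
        }
        return false;
    }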

Computational Tree Logic

Another system for constructing temporal logic formulae is Computational Tree Logic (CTL), as defined by Clarke et al. in [5]. CTL's representation is based on the fact that for any point in time, there are many possible futures; or, for any design state, there are many possible sequences of design states that follow, based on inputs. This is realized for any starting state, S0, where each possible next state becomes a branch from S0 in the tree representation, and each further state from those next states becomes a branch from them, ad infinitum. An example of this construction is shown in Figure 1.1.

Figure 1.1: CTL Time/State Representation

CTL has its own set of temporal operators, which provide a means to formulate expressions that can assert properties based on conditions in branches of the tree. These are used in addition to the base set of LTL operators, which are used to assert properties within a branch of the tree. Table 1.3 provides the definitions of these operators.

Operator   Name        Description
A q        Necessary   q will always be true, along all branches.
E q        Possible    q is sometimes true, along some branches.

Table 1.3: Temporal Operators for CTL Formulae

SUGAR

Sugar 2.0 is another example of a formal specification language that can be used for verification, as introduced by Eisner and Fisman in [6]. Sugar expressions are composed in four different layers. The first layer is the boolean layer, which consists of Boolean operations on propositional atoms, where any such expression can evaluate to a true or false logic value. This can be thought of as equivalent to the propositional operators component of LTL, as outlined in Table 1.1. The second layer is the temporal layer, which is used to define relationships between expressions in the Boolean layer, over a period of time. The operators used in the temporal layer are defined in Table 1.4.

Operator       Description
always p       p is true at all times.
never p        p is false at all times.
next p         p is true in the following cycle.
eventually p   p is true at some cycle in the future.
p until q      p is true until a point when q becomes true.
p before q     p is true at a point in time before q is true.

Table 1.4: Operators of the Temporal Layer in Sugar 2.0

As an addition to these temporal operators, Sugar supports postfixes of ! and _, in appropriate cases. The exclamation point postfix is supported by eventually, until and before. It defines the operator as being strong, as opposed to the default of weak. This is the same context as strong until, as defined earlier, in that a strong operator requires that the

second argument be true at some point in time, excluding the case where the expression is true by virtue of a case where the second argument never occurs (thereby never requiring a check on the first argument). The underscore postfix is supported by until and before. The underscore defines that there is an allowed overlap between the two arguments of one cycle. In the context of until, this would mean that the first argument is true up to and including the first cycle that the second argument is true. In the context of before, this would mean that the first argument must be true before, or at the same time as, the second argument. The third layer is the verification layer, which provides direction to a verification tool reading the Sugar expressions. If an expression from the temporal layer is defined with assert in this layer, then that instructs the verification tool that it must verify the defined property. If an expression is defined with assume, then that instructs the verification tool that it can assume the behavior defined in the property to be true. The final layer is the modeling layer, which allows for definition of the behavior of the propositional atoms in the expressions. In the context of circuit verification, this can be thought of as assigning behavior to signals in the design.

OpenVera

A final example of a formal specification language used in verification is OpenVera, as defined in [7]. OpenVera Assertions (OVAs) can be thought of as being divided into two distinct components: directives and events. Directives make statements about events, defining what conditions the verification tool should be checking for. This is similar in function to the verification layer of Sugar 2.0. Table 1.5 shows the available directives.

Operator    Description
check(e)    Event e should always hold true.
forbid(e)   Event e should never hold true.

Table 1.5: Directive Operators in OpenVera

Events are separately defined entities that contain expressions composed of Boolean and temporal logic. Events can be thought of as similar to a combination of the boolean layer and temporal layer in Sugar 2.0. The operators used at this layer are summarized in Table 1.6.

Operator          Description
#n p              After n cycles, p is true.
#[n..m] p         After between n and m cycles, p is true.
p followed_by q   q is true at some point after p is true.
p triggers q      q is true immediately after p is true.
p until q         p is true until a point where q is true.
next p            p is true in the next cycle.

Table 1.6: Temporal Operators in OpenVera

Property Examples

To illustrate the use of the different formal specification languages discussed here, consider a simple searching system, as illustrated in Figure 1.2. This system has a memory containing arbitrary data that can be searched. The system remains in an idle state until a request (req) arrives with a pattern to be searched for. The system moves to a search state, and examines locations in the memory for the pattern until one is found, at which point it acknowledges (ack) that the pattern exists in memory.

Figure 1.2: Example Simple Search System Model

In this system, a safety property could be that the search never overflows from the memory. That is, index should never exceed the number of locations in memory (defined here as mem_rows). Table 1.7 shows how this property can be realized in each of the languages discussed.

LTL         G ¬(index > mem_rows)
CTL         AG ¬(index > mem_rows)
Sugar 2.0   never (index > mem_rows)
OpenVera    assert a_overflow : forbid(e_overflow) ;
            event e_overflow : (index > mem_rows) ;

Table 1.7: Language Representations of a Safety Property

In LTL, the overflow condition (index > mem_rows) can be used as a negated argument to the global operator G; that is, for all points in time the overflow condition should not be true. In CTL, similarly, the negated overflow condition is used as an argument to the necessary operator; that is, for all branches the overflow condition must not be true. In Sugar, the overflow condition can be directly used as an argument to the never operator, stating that the condition can never occur. In OpenVera, a directive is defined with the forbid operator, stating that the event e_overflow can never happen. The event e_overflow is then defined as the overflow condition.

In this system, a liveness property could be that the system always returns to its idle state. That is, ack should eventually be asserted as a response to req. Table 1.8 shows how this property can be implemented in each of the languages discussed.

LTL         G (req → F ack)
CTL         AG (req → F ack)
Sugar 2.0   always (req -> eventually! ack)
OpenVera    assert a_return : check(e_return) ;
            event e_return : req followed_by ack ;

Table 1.8: Language Representations of a Liveness Property

In LTL, the return-to-idle condition can be represented as req implying that ack will eventually be true. This statement is then asserted to be globally true. In CTL, the same set of statements can be applied, with the addition of the necessary operator, adding that the condition must be true for all branches of the tree. In Sugar, the return-to-idle condition is represented with req implying ack with a strong eventually. Explicitly defining this as a strong eventually prevents the case where req stays true forever, and ack never becomes true (thus potentially masking a bug). In OpenVera, a directive is defined with the check operator, stating that the event e_return should always happen. The event e_return then uses the followed_by operator to state that the signal req becoming true must be followed by the signal ack becoming true at some point in the future.

1.2. Model Checking

Properties expressed in temporal logic provide one of the inputs necessary for model checking. The other input needed is a model of the circuit being verified. In model checking, the supplied design model is analyzed with respect to the input properties. The result of the process is either a sequence of states showing that the property is satisfied, or a statement that no satisfying sequence could be found. One of the early implementations of model checking is temporal logic model checking, as described by Clarke and Emerson in [8]. In this system, properties are expressed in CTL, and the design is modeled as a state-transition diagram. The major limitation of this early implementation is that both the CTL expressions needed to represent properties and the design state model grow exponentially with respect to the size of the design being verified. This is known as the state explosion problem.

Ordered Binary Decision Diagrams

The first major development in combating the state explosion problem was the SMV system, introduced by McMillan in [9]. SMV reduces the growth of the design state space by applying Ordered Binary Decision Diagrams (OBDDs) to the modeling of design states. OBDDs were first detailed by Bryant in [10]. An OBDD for a given Boolean function can be represented as a binary tree, where each level in the tree is assigned to a variable from the function, in order. At any non-terminal vertex in the tree (representing a variable), there will be two possible transitions to a following level of the tree. A transition to the left indicates a logical low, and a transition to the right represents a logical high. If, for a given transition, the overall value of the function remains indeterminate, the transition will be to the next variable in order. If the transition provides determination of the function value, then the transition will be to a terminal

node in the tree (0 or 1). Figure 1.3 shows an example OBDD graph representation, where circles are non-terminal (variable) nodes and squares are terminal nodes.

Figure 1.3: OBDD Graphical Representation Example

Evaluation for the function (w·x + y·z) begins with variable w. If w is false, then the value of x does not matter, so the transition proceeds to y. If w is true, then the transition proceeds to x. In this case, there is only one instance of x, so if x is true, that means that the function is true (since both w and x are true); the transition will then proceed to the 1 terminal vertex. If x is false, then evaluation continues and the transition will proceed to y. Again, there is only one vertex for y, and the condition for being at vertex y is that the first term in the equation is false. Then, if y is false, the second term is false, and the entire function is also false; the transition will then proceed to terminal vertex 0. If y is true, then evaluation continues by transitioning to vertex z. Since vertex z is unique, the condition for being at the vertex is that the first term is false and y is true, meaning that the overall function value will be determined by the value of z. If z is false, then the second term is also false, making the overall function false; the transition then proceeds
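The evaluation walk described above amounts to following one lo/hi edge per variable until a terminal is reached. The C++ sketch below (illustrative only; the node layout and the hand-built diagram for w·x + y·z are assumptions, not taken from the thesis) shows that traversal.

    // Illustrative sketch of evaluating an OBDD by walking from the root to a
    // terminal, as in the walkthrough of Figure 1.3.
    #include <cassert>
    #include <vector>

    struct BddNode {
        int var;   // variable index tested at this node; -1 marks a terminal
        int lo;    // next node index when the variable is 0 (left transition)
        int hi;    // next node index when the variable is 1 (right transition)
        int value; // 0 or 1, meaningful only for terminals
    };

    // Follow lo/hi edges according to the assignment until a terminal is reached.
    int evaluate(const std::vector<BddNode>& bdd, int root,
                 const std::vector<bool>& assignment) {
        int i = root;
        while (bdd[i].var != -1)
            i = assignment[bdd[i].var] ? bdd[i].hi : bdd[i].lo;
        return bdd[i].value;
    }

    int main() {
        // Variables ordered w=0, x=1, y=2, z=3; nodes mirror the shape of Figure 1.3.
        std::vector<BddNode> bdd = {
            {-1, 0, 0, 0},   // 0: terminal 0
            {-1, 0, 0, 1},   // 1: terminal 1
            { 3, 0, 1, 0},   // 2: z
            { 2, 0, 2, 0},   // 3: y  (y=0 -> 0, y=1 -> test z)
            { 1, 3, 1, 0},   // 4: x  (x=1 -> function true, x=0 -> test y)
            { 0, 3, 4, 0},   // 5: w  (root)
        };
        assert(evaluate(bdd, 5, {true, true, false, false}) == 1);  // w·x term
        assert(evaluate(bdd, 5, {false, false, true, true}) == 1);  // y·z term
        assert(evaluate(bdd, 5, {true, false, true, false}) == 0);
        return 0;
    }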

to terminal vertex 0. If z is true, then the second term is true, and the overall function is true; the transition proceeds to terminal vertex 1. The usefulness of OBDDs for reducing the state explosion problem does have limitations, though. One key issue faced in OBDD construction is that the resulting graphical structure is directly linked to the variable ordering used in the Boolean equations being evaluated. The fact that variables are evaluated in order can cause large variations in graph representation efficiency among Boolean equations of the same structure. This problem is illustrated in Figure 1.4.

Figure 1.4: Variable Order Dependency in OBDDs

Consider a new equation, based on the one described in Figure 1.3, where the order of two of the variables is changed. In both cases, the variable ordering w-x-y-z is used in evaluation, but vastly different graph efficiencies result, even though the Boolean equation structure is the same. One observation that can be made for this situation is that when related variables are not close together in evaluation, the complexity of the graph grows. In the case of the new equation, w and y are closely related, as are x and z. However, both of these pairs of related variables are

separated in the variable evaluation ordering. Observations like this can be used as heuristics to improve variable ordering, helping to maintain high efficiency in OBDDs. As an example, Fujita et al. in [11] and Malik et al. in [12] present that a depth-first traversal of a circuit being verified can often provide a reasonable variable ordering. In situations where heuristics fail to give reasonable results, Rudell presents a solution in [13] called dynamic reordering. In this solution, a sifting algorithm is periodically run within the OBDD in an attempt to minimize it. In each sifting operation, given n variables, one variable is selected for optimization and the order of the n-1 other variables is fixed. The position of the selected variable is then shifted to a more ideal spot, out of the n possible choices. Additionally, further enhancements to the OBDD model have been proposed by Brace et al. in [14]. Specifically, more complicated sets of Boolean equations can be modeled as a multi-rooted OBDD, where the different functions have opportunities to share sub-trees (instead of distinct single-rooted trees for each equation).

SAT Modeling

OBDDs provide some mitigation of the state explosion problem, but they are still complete state-space representations of a circuit being verified, and as such the model size can still quickly grow too large as circuit complexity increases. Another approach that has been used to get around this problem is modeling as satisfiability (SAT) problems. In this case, the circuit being verified is modeled as a set of Boolean propositions, as opposed to a full expansion of design states. One of the major methods for SAT problems is the Davis-Putnam method [15]. This method consists of two parts. The first is the QFL-Generator, which uses the formula for the

property being verified to create a growing propositional calculus formula. The second part is the Processor, which continually checks the propositional calculus formula for consistency. If, at any point, the formula is found to be inconsistent, that provides a proof to the original formula being verified. SATO is one implementation of a SAT solver that leverages the Davis-Putnam method, as presented by Zhang in [16]. The process of checking a propositional calculus formula for consistency in this context involves reduction of the formula via elimination of terms. This reduction is achieved with a variety of rules. One-literal clauses can assist in reduction using the one-literal rule (also known as unit propagation). This rule states that for a set of clauses containing a unit clause (a clause that is a single literal), each clause containing the unit literal can be eliminated and each occurrence of the negation of the unit literal in other clauses can be deleted. The resulting reduced set of clauses will be logically equivalent to the starting set. An example of this process is shown in Figure 1.5.

Figure 1.5: Example of the One-Literal Rule in SAT

In this example, the one-literal clause is x. Therefore, each remaining clause in the set is examined for inclusion of x. The clause y ∨ z does not contain x, and remains the same. The clause x ∨ z contains an affirmative reference to x, and thus the clause is dropped. The

clause ¬x ∨ y contains a negative reference to x, so the negative reference to x is removed from the clause. This then yields the reduced set of clauses {x, y ∨ z, y}. Another rule for formula reduction is the affirmative-negative rule (also known as pure literal elimination). This rule states that if a propositional variable only appears in a single form (either affirmative or negated) across all uses in a set of clauses, then all clauses containing that variable can be eliminated. If a propositional variable only appears with a single polarity (also called a pure variable), then an assignment can always be made to make all clauses containing the pure variable true. A final rule that can be applied when the previous two rules have been exhausted is the splitting rule, which allows for a re-structuring of clauses. Davis et al. examine this rule more closely in [17]. The rule states that a formula F should first be put into the form (A ∨ p) ∧ (B ∨ ¬p) ∧ R. This can be achieved by creating three groups of clauses: those containing p, those containing ¬p and those not containing p. p and ¬p can then be factored out of the first two groups of clauses to create the desired form. It can then be stated that formula F is inconsistent if and only if (A ∨ B) ∧ R is inconsistent. This can also be represented in another form, stating that formula F is inconsistent if and only if A ∧ R and B ∧ R are both inconsistent. The splitting rule does, though, also present one of the limitations of the model. The problem of how to select p for splitting is difficult to solve, as the answer will vary depending on the model being verified. Many heuristics exist to assist in selecting a reasonable p, as a poor selection of p can reduce the performance of the Davis-Putnam SAT model by orders of magnitude.
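As a concrete illustration of these three reduction rules (this sketch is not from the thesis; the clause encoding and function names are invented for the example), the fragment below applies the one-literal rule, tests for pure literals, and uses the second form of the splitting rule recursively. A practical Davis-Putnam style solver would apply the first two rules exhaustively and pick the splitting variable heuristically, for the reasons noted above.

    // Illustrative sketch only.  Literals are signed ints (+v / -v); a formula is
    // a set of CNF clauses.
    #include <vector>

    using Clause  = std::vector<int>;
    using Formula = std::vector<Clause>;

    // One-literal rule: given unit literal u, drop clauses containing u and delete
    // occurrences of -u from the remaining clauses.
    Formula unit_propagate(const Formula& f, int u) {
        Formula out;
        for (const Clause& c : f) {
            bool satisfied = false;
            Clause reduced;
            for (int lit : c) {
                if (lit == u) { satisfied = true; break; }
                if (lit != -u) reduced.push_back(lit);
            }
            if (!satisfied) out.push_back(reduced);
        }
        return out;
    }

    // Affirmative-negative rule: a variable appearing with only one polarity lets
    // every clause that contains it be dropped.
    bool is_pure(const Formula& f, int var) {
        bool pos = false, neg = false;
        for (const Clause& c : f)
            for (int lit : c) {
                if (lit ==  var) pos = true;
                if (lit == -var) neg = true;
            }
        return pos != neg;
    }

    // Splitting rule (second form): F is inconsistent iff both F with p assigned
    // true and F with p assigned false are inconsistent.  Variables are assumed
    // to be numbered 1, 2, 3, ... for this sketch.
    bool unsatisfiable(const Formula& f, int next_var = 1) {
        for (const Clause& c : f)
            if (c.empty()) return true;        // empty clause: inconsistent
        if (f.empty()) return false;           // no clauses left: consistent
        return unsatisfiable(unit_propagate(f,  next_var), next_var + 1) &&
               unsatisfiable(unit_propagate(f, -next_var), next_var + 1);
    }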

Bounded Model Checking

A further technique for combating the problem of state explosion is Bounded Model Checking (BMC), as introduced by Biere et al. in [18]. In the BMC technique, a limit of k is set on the number of state transitions within which a property must hold. This means that the paths to be searched in the model can have at most k + 1 states. Biere et al. also propose that in this model, k should begin with a value of 0 (searching for a single-state counterexample). k can then be continually increased until either an imposed limit is hit, implying that there is no counterexample, or a counterexample is found with a length of k + 1. This imposed upper limit on k is information that would be provided by a user of the BMC system. As logic designers generally know the bounds within which a given property should hold, expecting this input for BMC is a reasonable assumption. Copty et al. investigated the effectiveness of BMC in an industry setting in [19]. Portions of the Pentium 4 were used to test BMC with a SAT solver against an OBDD symbolic model checker. BMC was found to provide improved productivity over the OBDD solution, mainly due to the high amount of manual tuning required with OBDD to optimize its performance.

Automatic Test Pattern Generation

Automatic Test Pattern Generation (ATPG) is another method that avoids the state-explosion problem by employing a different approach to model checking. ATPG focuses on the stuck-at fault model, which is designed to detect faults where a signal in the circuit being verified is stuck at a constant value, regardless of circuit inputs. A Boolean model of the circuit being verified is stored in memory, but a full state expansion is not required. This then presents the necessity to have a method for modeling a circuit to be verified in memory for the ATPG algorithm to work

on. Armstrong presents a method for applying ATPG to combinational circuits in [20]. Hsiao and Chia then extended this to implement a solution for ATPG in sequential circuits in [21]. One of the first ATPG methods was the D-Algorithm, as introduced by Roth et al. in [22]. In this method, combinations of primary input (PI) assignments are examined by making assignments at internal circuit nodes, based on the fault being tested. A new logic value of D is introduced in these assignments, which represents a value of 1 for a good circuit when testing a stuck-at-0 fault (D represents 0 when testing a stuck-at-1 fault). When working with a fault at some internal net in a circuit, this method will both propagate the D value forward, and maintain consistency by implying values back through the circuit based on propagated values. An example of this is shown in Figure 1.6.

Figure 1.6: Propagation and Consistency in the D-Algorithm

Given a stuck-at-0 fault under test for net N, the first implication is that in a good circuit, gate 1 must output a 1 (D). This then requires inputs A and B to both be 1. Further, to observe this fault on N, the value of D must propagate forward to output Z, passing through gate 2. For gate

2 to have a value of D on its output (Z), its other input must be 1, which implies that net P must have a value of 1 to maintain consistency. That then requires gate 3 to have both its inputs be 1, meaning that both net M and input D must be 1. Similarly, gate 4 must output a 1, implying that input C must have a value of 0. This has then generated a complete assignment on all inputs. One weakness in the D-Algorithm was first seen when exercising the method on circuits that included error correction code (ECC) logic. With ECC logic, an XOR tree to compute parity exists that is then reconvergent with the main logic being checked. This presents an efficiency issue when testing a stuck-at fault in the main logic, as the entire ECC parity tree will need to be evaluated to make a consistent assignment. To solve this problem, Goel presented the Path Oriented Decision Making (PODEM) system in [23]. PODEM differs in approach from the D-Algorithm, as instead of evaluating from the point of the stuck-at fault, it directly assigns PI values and tracks their effects to generate a complete PI assignment. Initially, all PIs to the circuit are assigned as don't care (X). PODEM then chooses a PI to make an assignment on, and implies that assignment forward through the circuit (similar to D-Algorithm propagation). If the assignment made is consistent with the required stuck-at test, then further assignments are selected, continuing until a complete assignment is made that satisfies the test. If an assignment is determined to be inconsistent with the stuck-at fault being tested, that assignment will be undone. One of the limitations of PODEM is that a good choice for which PI to assign and what logic value to use is critical to finding a complete PI assignment without extraneous evaluation. To assist with this, Goel also presents heuristics for selecting a good assignment, which are based on finding a gate that has the stuck-at fault value (D) as an input, a don't care (X) on its output, and is close to a primary output (PO) of the circuit. The logic is then backtraced from this gate to

find the closest PI related to that gate, which becomes the PI for which an assignment will be made. A further efficiency improvement over PODEM is the fan-out oriented test generation algorithm (FAN), as presented by Fujiwara et al. in [24]. The core method used for improving efficiency is limiting the extent of back-trace operations. This is done using the concepts of head-lines and fan-out points. A head-line is a net in the circuit such that it is assigned a value of X (and all of its generation logic is also assigned X), and it is adjacent to another net with an assigned value. For example, a two-input AND gate has one input with an assigned value, and another input with an X value. That X-valued input net would then be a head-line, which means that backtracing can stop after a value assignment is made to the head-line. Since all final PI assignments associated with the head-line assignment are directly implied by the head-line, they can be deferred until the very end of the operation. Fan-out points are nets in the circuit that fan out to multiple gates. These are convergence points, where consistency must be maintained while back-tracing, since all of the fan-out points on the net must have values that do not present a conflict, or the backtrace must be stopped and a new assignment will be tried. In FAN, backtracing is stopped at the fan-out points, until all other objectives are exhausted. If one of the other objectives were to present a conflict at a fan-out point, that conflict will be detected, and a new assignment can be tried, without having to backtrace through all of the fan-out points. Yet another efficiency improvement, building off of FAN, is the Structure-Oriented Cost-Reducing Automatic TESt pattern generation system (SOCRATES), as presented by Schulz et al. in [25]. One of the main ways that SOCRATES improves on FAN is by the addition of

implication learning. During value propagation, values that are being assigned are implied by the original stuck-at value being propagated. SOCRATES evaluates these assignments to find non-local implications; that is, implications that aren't trivially determined by the normal FAN implication procedures. These learned implications are then stored, such that they can help speed up the future implication process.
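The following C++ sketch (illustrative only; the netlist encoding and function names are not taken from the thesis) shows the two moves a PODEM-style engine alternates between: implying assigned PI values forward through the circuit, and backtracing an objective through X-valued logic to choose the next PI assignment. The head-line and fan-out handling of FAN and the learned implications of SOCRATES are omitted.

    // Illustrative sketch of PODEM-style imply and backtrace on a gate-level
    // netlist with three-valued logic.  Gates are assumed to be listed in
    // topological order; the objective net is assumed to currently evaluate to X.
    #include <cstdint>
    #include <utility>
    #include <vector>

    enum Val : uint8_t { V0, V1, VX };
    enum GateType { INPUT, NOT_G, AND_G, OR_G };

    struct Gate { GateType type; std::vector<int> fanin; };

    // Forward implication: evaluate every gate from assigned PI values.
    void imply(const std::vector<Gate>& ckt, std::vector<Val>& val) {
        for (std::size_t g = 0; g < ckt.size(); ++g) {
            if (ckt[g].type == INPUT) continue;
            if (ckt[g].type == NOT_G) {
                Val a = val[ckt[g].fanin[0]];
                val[g] = (a == VX) ? VX : (a == V0 ? V1 : V0);
                continue;
            }
            Val ctrl = (ckt[g].type == AND_G) ? V0 : V1;   // controlling value
            bool any_x = false, any_ctrl = false;
            for (int f : ckt[g].fanin) {
                if (val[f] == VX)   any_x = true;
                if (val[f] == ctrl) any_ctrl = true;
            }
            val[g] = any_ctrl ? ctrl : (any_x ? VX : (ctrl == V0 ? V1 : V0));
        }
    }

    // Backtrace: walk an objective (net, value) back through X-valued logic
    // until a primary input is reached; return the PI and the value to try.
    std::pair<int, Val> backtrace(const std::vector<Gate>& ckt,
                                  const std::vector<Val>& val, int net, Val want) {
        while (ckt[net].type != INPUT) {
            const Gate& g = ckt[net];
            if (g.type == NOT_G) { net = g.fanin[0]; want = (want == V0) ? V1 : V0; continue; }
            int next = g.fanin[0];
            for (int f : g.fanin)            // pick any fan-in still at X
                if (val[f] == VX) { next = f; break; }
            net = next;                      // for AND/OR the required value on the
        }                                    // chosen input equals the output objective
        return {net, want};
    }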

1.3. Prior Work

Keller et al. provided one of the first looks at the use of ATPG engines in [26], where they were applied to general problems where a search space needed to be examined. Boppana et al. investigated the use of sequential ATPG for model checking in [27]. Their work focused on safety properties and noted the fact that ATPG does not require explicit state-space storage between time-frames as a major advantage. Cheng et al. discussed the use of ATPG for property checking in [28]. Their method involved mapping the property being checked into a combinational circuit, where the output would be tested for a stuck-at fault. Parthasarathy et al. compared SAT and ATPG algorithms on combinational circuits in [29], finding that there is no performance gap between the two. This comparison is extended by Abraham et al. in [30]. A software implementation for sequential ATPG was presented, and tested on benchmark circuits for checking temporal logic properties. In this case it was shown that ATPG outperformed SAT on smaller circuits, and further, SAT was unable to model some of the largest circuits. Qiang et al. show in [31] that even with ATPG, larger circuits can cause software solvers to fail. An effective way to push beyond these limitations is to leverage parallelism. Czutro et al. applied this idea with TIGUAN in [32]. TIGUAN applies a two-stage approach to SAT solving, where the first stage is a single-threaded run, quickly working out the easy-to-solve faults. The second stage applies multi-threaded test generation to achieve a speedup on the hard-to-solve faults. Cai et al. extended the application of threading in [33]. By creating a system where test generation

and good/fault simulation were all threaded, they were able to achieve a linear speedup of ATPG, up to the maximum of 8 CPUs used in testing. Another way to approach parallelism is to take advantage of the inherent parallelism of hardware. Sarmiento and Fernandez applied this method for fault emulation in [34]. By translating a circuit under test onto reconfigurable hardware, they were able to take advantage of hardware parallelism during propagation. This resulted in emulation being 27 to 2200 times faster than the associated software-based simulation, in their testing. Dunbar and Nepal used a similar strategy in [35], and found that by implementing multiple instantiations of a circuit under test on an FPGA, they could reduce the final test pattern set by 13% on average, while maintaining fault coverage and run time. Abramovici et al. apply emulation on reconfigurable hardware to a PODEM-like instance-specific SAT solver in [36], though only one objective is backtraced at a time. They put forth a new architecture in [37] where multiple objectives are backtraced in parallel, taking greater advantage of the parallelism available in hardware, and achieving an average 10x speed-up over software-based solvers. Gulati et al. take this method further in [38], implementing an application-specific SAT solver on reconfigurable hardware. To achieve this, the Conjunctive Normal Form clauses of the problem are split up into bins, which are then sequentially evaluated by the FPGA-based solver. This allows very large problems, which normally may not fit onto an FPGA, to be evaluated with this

architecture. Ultimately, they demonstrated an average 17x speed-up over the best software-based solvers. Kocan and Saab also leverage reconfigurable hardware to implement a concurrent D-Algorithm in [39], handling both propagation and justification. They found this to be 3.25 to 14.8 times faster than equivalent software, with higher speed-up for larger circuits. This work extends the idea of reconfigurable hardware emulation to the more complex problem of sequential ATPG. One of the unique core requirements of sequential ATPG is that multiple frames must be modeled to account for the temporal aspect of the circuit. Like the previously discussed works in reconfigurable hardware emulation, fine-grain massive parallelism is leveraged to gain a significant benefit over software solvers.

2. Algorithm

The first step in designing an algorithm to support hardware emulation of sequential ATPG is determining how to model the circuit being verified. Any sequential circuit can be thought of as being composed of two parts: combinational logic and flip-flops (state elements). To model the circuit's behavior over time, it can be unrolled into an Iterative Logic Array (ILA), as described by Abramovici et al. in [40]. In this process, all flip-flops are removed from the circuit, with their inputs and outputs being re-purposed as pseudo-primary inputs and outputs of the circuit. The ILA consists of k instances of the circuit (where k is the search bound), with each instance being referred to as a time frame. Each of these frames is connected together, in order, by the state inputs/outputs from the removed flip-flops. This structure is illustrated in Figure 2.1.

Figure 2.1: ILA Model of Sequential Circuit for k Time Frames

As part of the ILA modeling method, the property that is being checked is also transformed into a structural monitor block on the primary outputs of the final frame of the ILA model. This monitor is designed such that the property is achieved by assigning a value of 1 to line k (the output of the monitor). Then, an ATPG-based justification algorithm can be applied to find a set of conditions to satisfy line k = 1.
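A software view of this unrolling step is sketched below (the data structures are hypothetical and are not the thesis's network-generation code): each flip-flop's output becomes a PPI and its input a PPO, and the PPO nets of frame t reuse the PPI net names of frame t+1 so that the k frames chain together as in Figure 2.1.

    // Illustrative sketch of unrolling a sequential circuit into an Iterative
    // Logic Array of k time frames.
    #include <cstddef>
    #include <string>
    #include <vector>

    struct CombCircuit {                  // combinational core after FF removal
        std::vector<std::string> pis;     // primary inputs
        std::vector<std::string> pos;     // primary outputs
        std::size_t num_ffs;              // flip-flops removed (PPI/PPO pairs)
    };

    struct Frame {
        int index;                        // 1 .. k
        std::vector<std::string> pi_nets, po_nets, ppi_nets, ppo_nets;
    };

    // Instantiate k copies of the combinational core; the PPO nets of frame t are
    // given the same names as the PPI nets of frame t+1, stitching the frames.
    std::vector<Frame> unroll(const CombCircuit& c, int k) {
        std::vector<Frame> ila;
        for (int t = 1; t <= k; ++t) {
            Frame f;
            f.index = t;
            for (const auto& n : c.pis) f.pi_nets.push_back(n + "_t" + std::to_string(t));
            for (const auto& n : c.pos) f.po_nets.push_back(n + "_t" + std::to_string(t));
            for (std::size_t i = 0; i < c.num_ffs; ++i) {
                f.ppi_nets.push_back("state" + std::to_string(i) + "_t" + std::to_string(t - 1));
                f.ppo_nets.push_back("state" + std::to_string(i) + "_t" + std::to_string(t));
            }
            ila.push_back(f);
        }
        return ila;
    }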

This property justification uses a PODEM-like approach to trace an objective on the output of a frame back to a set of required inputs to the frame. To distinguish the original combinational inputs/outputs of the circuit from the new, state-based inputs/outputs, the originals are called Primary Inputs and Primary Outputs (PIs and POs) and the state-based are called Pseudo-Primary Inputs and Pseudo-Primary Outputs (PPIs and PPOs). The objective tracing function within each frame is shown in Figure 2.2.

Figure 2.2: Objective Tracing Within a Frame

The algorithm works on one frame at a time, starting with frame k. As the first objective (line k = 1) is traced back, any objective values that propagate to a PPI of the frame then imply an objective on the PPO of the previous frame that must also be traced. Objective values traced back to PI values require no further justification, as these values can be set independent of the time frame. Once all objective values for the current frame have been successfully traced back, processing will make a decision on what more needs to be done. If only PI objectives were traced back in the current frame (or the PPI objectives match an initial state, PPI 1), then the algorithm has successfully found a test pattern to generate the original objective of line k = 1, and is done. If there are any PPI objectives that require further justification, processing will move to

the previous frame in the ILA. This strategy is called Reverse-Time Processing (RTP), as described by Marlett in [41]. If, while tracing objectives in a frame, a conflict is found (two objective traces require different values on the same signal), a Backtrack operation will remove the conflicting assignment. This will move processing back to the previous assignment. If there were no other assignments remaining in the current frame, then processing will return to the prior frame (frame Ti+1) to get back to the previous assignment. If conflicts are encountered such that line k = 1 cannot be justified in frame k, then the algorithm fails. This decision flow is shown in Figure 2.3.

Figure 2.3: Algorithm Flow Diagram

The implementation of the algorithm flow chart shown in Figure 2.3 can be thought of in terms of four separate categories. The first category consists of decisions related to the current time frame: based on

the current state of objectives, should more processing be done on the current frame, or should processing move to a different frame? These decisions fall to the PI/PPI Decision Block. The second category consists of decisions based on objectives in the current frame: which objective should be backtraced, are there any conflicts, and are all objectives satisfied? These are part of the Objective Decision Block. The third category handles the implication portion of the PODEM-based logic tracing. This is where current input objective values are propagated forward through the circuit under test to verify the consistency of the output objective values being traced. This is handled by the Forward Network. The fourth, and final, category handles the backtrace portion of the PODEM-based logic tracing. This is where an output objective value is backtraced to necessary objective values on inputs. This is handled by the Backward Network.
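Before describing the hardware realization of these four groups, the decision flow of Figure 2.3 can be summarized as the following control loop (a sketch only; the callbacks are hypothetical stand-ins for the four blocks, and the return to frame Ti+1 during backtracking is folded into the backtrack() callback).

    // Sketch of the decision flow of Figure 2.3 as a control loop.
    #include <functional>

    struct JustificationEngine {
        std::function<bool()> objectives_remaining;    // Objective Decision Block
        std::function<bool()> backtrace_and_imply;     // Backward + Forward Networks
                                                       // (returns false on conflict)
        std::function<bool()> only_pi_objectives;      // frame satisfied by PIs or
                                                       // an initial-state match?
        std::function<bool()> backtrack;               // PI/PPI Decision Block
                                                       // (returns false if exhausted)
        std::function<void()> move_to_previous_frame;  // move to frame Ti-1
    };

    enum class Result { Done, Fail };

    Result justify(JustificationEngine& e) {
        for (;;) {
            while (e.objectives_remaining()) {
                if (!e.backtrace_and_imply() &&        // conflict found
                    !e.backtrack())                    // nothing left to undo
                    return Result::Fail;
            }
            if (e.only_pi_objectives())                // nothing left to justify
                return Result::Done;
            e.move_to_previous_frame();                // reverse-time processing
        }
    }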

3. The Architecture

As discussed in the previous section, the algorithm presented here can be divided into four discrete functional groups. These groups directly correspond to the different modules used in implementing the algorithm on reconfigurable hardware. The required connections in the algorithm flowchart then become data signals between each of the modules, as shown in Figure 3.1. The Verilog code implementing all of the modules described in this section is included in Appendix D.

Figure 3.1: Top Level Architecture Block Diagram

The PI/PPI Decision Block controls time-frame-based decisions, which are contingent on the current state of objectives in the frame. Thus, the Objective Decision Block communicates this

39 state using Done, Conflict, line_k_x, and line_k_1. The PI/PPI Decision Block must set up each new time frame when moving to a new one, so it must set the new values in the Forward Network (via the PI and PPI busses) as well as providing the objectives for the new frame to the Objective Decision Block (via the ppi bus). The Objective Decision Block is responsible for all objective-based operations. It must examine the output of the Forward Network to ensure that all current objectives are consistent (via the PO and PPO busses). When it selects the next objective to be worked on, it passes that objective to the Backward Network over the obj bus. It must then also signal the PI/PPI Decision Block to make a decision on the current frame, once the objective operations are complete. The Forward and Backward Networks implement the logical tracing of objective values for the circuit being tested. To maintain proper consistency when objectives are being backtraced, all node values in the Forward Network must be passed to the Backward Network (via the STATE busses). Finally, backtraced objective values from the Backward Network must be passed back to the PI/PPI Decision Block for storage (via the in bus), as they represent objectives for the next frame. 29

3.1 PI/PPI Decision Block

The PI/PPI Decision Block provides the central control system for the architecture, as well as the main storage. As such, this block can be thought of in terms of two major components: the control logic, and the results storage RAM. The full structure of this block can be seen in Figure 3.2.

Figure 3.2: PI/PPI Decision Block Structure Diagram

The results storage RAM component of the block is a 14-bit-wide by 8K-deep FPGA block RAM. The 14-bit word width of the RAM is fixed, based on a 14-bit objective encoding scheme, described in Section 3.2. The depth of the RAM is a variable limit on the total number of objectives that can be stored, constrained by the available block RAM sizes on the FPGA being utilized.
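For illustration, one such 14-bit objective word can be modeled in software as below. The split of bits [13:3] as the signal address and bits [2:1] as the two-bit value follows the pseudo-code of Table 3.2 later in this section; treating bit [0] as the swap flag used by Backtrack, and the all-ones word as the "no more objectives" marker, are assumptions made for this sketch (the frame-top mark kept by the RAM is not modeled).

    // Sketch of one 14-bit objective word as stored in the results RAM.
    #include <cstdint>

    struct ObjectiveWord {
        uint16_t bits;                                          // only [13:0] used

        unsigned addr()  const { return (bits >> 3) & 0x7FF; }  // bits [13:3]
        unsigned value() const { return (bits >> 1) & 0x3; }    // bits [2:1]
        bool     flag()  const { return bits & 0x1; }           // bit  [0] (assumed)

        static ObjectiveWord make(unsigned addr, unsigned value, bool flag = false) {
            return { static_cast<uint16_t>(((addr & 0x7FF) << 3) |
                                           ((value & 0x3) << 1) | (flag ? 1u : 0u)) };
        }
        // All-ones word assumed to mean "no further objectives" (see the
        // More Justification operation below).
        static constexpr uint16_t kNoMoreObjectives = 0x3FFF;
    };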

The control logic takes input signals from the Objective Decision Block (Done, Conflict, line_k_1, line_k_x) and uses these to make decisions on how to proceed in the algorithm. These decisions will result in one of four main operations performed by the PI/PPI Decision Block. Inside the control logic is a state machine which executes these operations through multiple state transitions. Table 3.1 summarizes these states and transitions.

State  Transition(s)    Operation          Description
s0     init             Idle / NoOp        Waiting for external trigger.
s1     s16, s17         Make Decision      Current frame assignment incomplete.
s2     s12              Pop Value          Pop the top obj value from the RAM.
s3     s3, s7, s8, s22  Frame Readout      Handle frame value readout when moving between frames.
s4     s0               Set Top Mark       Set frame top mark bit for move to frame Ti-1.
s5     s3               Move to Ti+1       Setup to start move to frame Ti+1 (back off frame).
s6     s12              Swap Val           Swap the obj value of the top word in the current frame.
s7     s6, s9           Backtrack Return   Handle returning from Ti+1 to Backtrack.
s8     s11              Start Counter      Start counting up or down depending on cnt_dir.
s9     s12              Clear Val          Clear the top word in the current frame from the RAM.
s10    s0, s2, s19      VF Assign (top)    Assign current value on top into fwd value buffer registers.
s11    s4               Stop Counter       Stop counter after one decrement.
s12    s10, s18, s20    Backtrack Update   Update VF based on last Backtrack operation.
s13    s13 (end)        Fail               Justification failed.
s14    s14 (end)        Done               Justification complete. Results in RAM.
s15    s3               Move to Ti-1       Setup to start move to frame Ti-1 (start new frame).
s16    s0               Done with Objs     All objs in received. Push onto Forward Network.
s17    s10              Request Obj        Request next obj from Backward Network Decoder.
s18    s5, s6, s9       Backtrack Op       Selects current operation to perform in Backtrack.
s19    s10              Read RAM           Read current addr in RAM.
s20    s10              VF Assign (ow)     Assign current value on ow into fwd value buffer registers.
s21    s9, s15          State Check        Verify that current frame is unique before moving to Ti-1.
s22    s3               VF Assign (top)    Assign current value on top into fwd value buffer registers.

Table 3.1: PI/PPI Decision Block Control Logic States

The first major operation performed by the PI/PPI Decision Block is the More Justification operation. This operation occurs when working inside a frame, once a new objective has been

backtraced, and new objectives are available on the Backward Network. Since the Backward Network may have backtraced multiple new objectives, the goal of the More Justification operation is to iterate as many times as necessary to push all objectives from the Backward Network into the block RAM, and onto the Forward Network. The pseudo-code for this operation is shown in Table 3.2.

    while (in != 14'b11111111111111) begin
        push in onto Block RAM;
        push in[2:1] onto fwd buffer at addr in[13:3];
        signal back_encoder for next in value;
    end
    push fwd buffer values onto Fwd Network;

Table 3.2: More Justification Operation Pseudo-code

To achieve the function of this operation, the control logic loops through four states, with an additional completion state for when looping finishes, as illustrated in Figure 3.3.

Figure 3.3: State Transition Model of More Justification Operation
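The same loop can be written as a software model (a sketch under the word-encoding assumptions noted earlier; the structure and callback names are hypothetical and are not the Verilog of Appendix D).

    // Software-level sketch of the More Justification loop of Table 3.2: drain
    // backtraced objectives from the Backward Network Encoder, push each one into
    // the results RAM and into the Forward Network value buffer, then launch the
    // next Imply.
    #include <cstdint>
    #include <functional>
    #include <vector>

    constexpr uint16_t kNoMoreObjectives = 0x3FFF;    // 14'b11111111111111 on "in"

    struct Blocks {
        std::function<uint16_t()> next_objective;     // Backward Network Encoder
        std::vector<uint16_t>     results_ram;        // block RAM model (stack)
        std::vector<uint8_t>      fwd_value_buffer;   // one 2-bit value per PI/PPI
        std::function<void(const std::vector<uint8_t>&)> push_to_forward_network;
    };

    void more_justification(Blocks& b) {
        for (;;) {
            uint16_t in = b.next_objective();         // "signal back_encoder for next in"
            if (in == kNoMoreObjectives) break;       // encoder buffer exhausted
            b.results_ram.push_back(in);              // "push in onto Block RAM"
            unsigned addr  = (in >> 3) & 0x7FF;       // in[13:3]
            unsigned value = (in >> 1) & 0x3;         // in[2:1]
            b.fwd_value_buffer.at(addr) = static_cast<uint8_t>(value);
        }
        b.push_to_forward_network(b.fwd_value_buffer);  // start the Imply operation
    }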

43 From the idle state of s0, when the Backward Network Encoder receives backtraced objectives from the Backward Network, it saves them into a buffer and sets the NReady signal low. This triggers the PI/PPI Decision Block to move into state s1. In this state, the PI/PPI Decision Block sets the genobj signal high, which triggers the Backward Network Encoder to send an objective value. The encoder scans through its buffer, starting from the last position of the objective pointer (last objective sent out). If another objective is encountered, the 14 bit objective value is pushed onto the in bus, back to the PI/PPI Decision Block. If the end of the buffer is reached (no further objectives), in is set to all 1s. The PI/PPI Decision Block makes a transition based on this. In the case where an objective value was set, the state moves to s17, where the objective value on in is pushed into the block RAM. In the next clock cycle the state always proceeds to s10, where the value that was just pushed into the RAM is also pushed through to the forward network value buffers (within the PI/PPI Decision Block). The state then returns to s0; idle. In the case where all 1s are set on in, the state moves to s16, which pushes all values in the Forward Network values buffer onto the Forward Network, starting the Imply operation. The second major operation of the PI/PPI Decision Block is Move to T i-1. This operation is triggered when all objectives in the current frame have been satisfied. In frame_k, this means that the value on line_k=1 (signal line_k_1 high and line_k_x low). Beyond frame_k, this means that all PPI objectives from the previous frame match the PPO values of the current frame (signal Done high and Conflict low). The signals that trigger this operation are all generated by the Objective Decision Block. The goal of the Move to T i-1 operation is to set up the system to start working on the next frame in the justification. The Move to T i-1 operation consists of three main parts. The first is state containment, where the assignment made in the current frame 33

is checked against all past frames to ensure a loop has not been encountered. The second is the process of pushing all PPO objective values from the current frame out to the RF in the Objective Decision Block, to be justified as PPIs in the frame that is being moved to. The third, and last, is the clearing of the Forward Network, which then triggers the start of the Imply operation. The pseudo-code for this operation is shown in Table 3.3:

    // State Containment
    while (addr != 14'b...) begin
        if (top flag set) begin
            if (current_frame == past_frame) begin
                Stop Move to Ti-1;
                Start Backtrack;
            end
        end
        addr--;
    end
    // Push PPOs to Obj Dec Block
    Reset addr to top;
    while (!top flag set) begin
        if (val at addr is PPI) begin
            Push value to Obj Dec Block RF;
        end
        addr--;
    end
    // Clear Fwd Network
    Set all values in Fwd Network buffer to 2'b11;
    Push Fwd Network buffer values to Fwd Network;
    // Update frame counter
    frame_count--;

Table 3.3: Pseudo-code for Move to Ti-1 Operation

The implementation of this operation utilizes 7 states in sequence, with 2 states having internal loops to iterate over multiple objective values inside of and across frames. This logic flow is shown in Figure 3.4:

45 Figure 3.4: State Transition Model of Move to T i-1 Operation The triggering for the Move to T i-1 operation begins within the Objective Decision Block. Whenever the Imply operation completes on the Forward Network (new PO/PPO values arrive at the Objective Decision Block), these values are checked to see if the current objectives have been met. If the system is currently in frame_k, then the only check is if line_k is 1. If so, line_k_1 is set high and line_k_x is set low. If the system is beyond frame_k, then each PPO value from the Forward Network is compared against each PPI value stored in the Objective Decision Block s RF. If all values match, then Done is set high and Conflict is set low. Once this check completes positively, the Objective Decision Block sets the newframe_ready signal high. This signal triggers the Backward Network Encoder, which controls the NReady signal. The raising of the newframe_ready signal translates into a raising of the NReady signal, which triggers the PI/PPI Decision Block to move out of idle. Given the signals set by the Objective Decision Block, the PI/PPI Decision Block will begin the Move to T i-1 operation by moving to state 21. This state implements the state containment check, as shown in Figure 3.5: 35

Figure 3.5: State Containment Check Implementation

The check keys off the fact that, at the end of working on a frame, the full PI/PPI assignment for that frame is contained in the buffer to the Forward Network (valuestoforward). A duplicate of that buffer, called pastframevalues, is used. A value from past frames stored in the block RAM is read into pastframevalues each clock cycle. When the top mark bit is set on a value being read out, this indicates that the last value read out was the last value of a complete frame, which triggers the endofframe signal. When endofframe is set, the two sets of values in the buffers are compared to check for an exact match, and the result becomes the statecheckresult value. If the end of the values in the block RAM has not yet been reached, the values in pastframevalues are cleared and the next frame's values continue to be read in for another round of comparison. If all values in the block RAM have been read out and statecheckresult remains 0, the check passes. When the check ends, either passing or failing, the donewithstatecheck signal is raised. When donewithstatecheck is raised, statecheckresult determines the next state transition: if 1, the state containment check failed, and the state transitions over to the Backtrack operation; if 0, the state containment check passed, and the state transitions to state 15 to continue the Move to Ti-1 operation.
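A minimal behavioral sketch of this comparison step is given below. The valuestoforward, pastframevalues, endofframe, statecheckresult and donewithstatecheck names follow the description above; the FRAME_BITS parameter, the allframesread input and the module boundary itself are illustrative assumptions rather than the implementation in Appendix D.

    // Sketch of the state containment comparison (assumptions noted above).
    module state_containment_sketch #(parameter FRAME_BITS = 28) (
        input  wire                  clk,
        input  wire                  endofframe,        // last word of a past frame has been read out
        input  wire                  allframesread,     // assumed name: all past frames compared
        input  wire [FRAME_BITS-1:0] valuestoforward,   // current frame's full PI/PPI assignment
        input  wire [FRAME_BITS-1:0] pastframevalues,   // past frame re-read from the block RAM
        output reg                   statecheckresult,  // 1 = a past frame matched (loop found)
        output reg                   donewithstatecheck
    );
        always @(posedge clk) begin
            if (endofframe && (pastframevalues == valuestoforward))
                statecheckresult <= 1'b1;       // exact match: the check fails
            if (allframesread)
                donewithstatecheck <= 1'b1;     // comparison rounds exhausted
        end
    endmodule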

State 15 will always transition to state 3 on the next clock cycle. In state 3, the block RAM is enabled in read mode, and one objective value per clock cycle is read out from the current frame. These values are examined by the isppi module, which compares the address bits of the objective to the number of PIs in the circuit to decide whether the objective is a PPI. If the objective is a PPI, the value is set on the ppi signal. This signal travels to the Objective Decision Block, where the value will be read into the RF. When the top mark bit is encountered in the block RAM, the current frame has been completely read out, and the endofframe signal is raised. This triggers state 3 to transition to state 8. State 8 enables the frame counter to count down (to track the move back by one frame to Ti-1) by setting Cnt_enable high and Cnt_dir low. On the next clock cycle, the state will always transition to state 11, which disables the counter by setting Cnt_enable low. As the counter is allowed to run for one clock cycle, it counts down by 1, tracking to the new frame being moved to, Ti-1. State 11 also sets the nclear signal to 2'b00, which triggers the valuestoforward buffer to be cleared out and the cleared values to be pushed onto the Forward Network. This then begins the next frame's Imply operation. The PI/PPI Decision Block then completes one final action by proceeding to state 4, which triggers the block RAM to set the top mark value on the top word by setting rewrite1 high and rewrite0 low. This marks the last word from the frame pushed into the block RAM as the end of the current frame. The next clock cycle will always transition back to state 0, idle, for the PI/PPI Decision Block to wait for its next trigger based on the new Imply operation that was started.
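The isppi check used in state 3 above can be sketched as a simple address comparison. The >= comparison and the all-1s idle pattern on the forwarded bus are assumptions for illustration; the actual module is part of the PI/PPI Decision Block Verilog in Appendix D.

    module isppi_sketch #(
        parameter m = 4            // number of primary inputs in the circuit under test
    ) (
        input  wire [13:0] top,    // objective word read out of the block RAM
        output wire        is_ppi, // 1 when the word addresses a PPI rather than a PI
        output wire [13:0] ppi     // word forwarded to the Objective Decision Block
    );
        // Assumed layout: the address occupies the upper bits of the word (obj[13:4]),
        // with the m primary inputs occupying the lowest address values.
        wire [9:0] addr = top[13:4];
        assign is_ppi = (addr >= m);
        assign ppi    = is_ppi ? top : {14{1'b1}};  // idle pattern on the ppi bus is assumed here
    endmodule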

The third major operation of the PI/PPI Decision Block is Backtrack. This operation is triggered by the Conflict signal from the Objective Decision Block. This signal being raised indicates that either a PPO from the Forward Network conflicts with a stored PPI, or a conflicting objective assignment has caused an objective propagation failure inside the Backward Network. The Backtrack operation consists of two functions that are used in a complementary fashion. The Swap Val function changes the value of the top objective between 0 and 1, and also sets the flag bit to indicate that Swap Val has been run on this objective. The Clear Top function removes the top objective from the stack for the current frame. Whenever a Conflict is encountered, any objectives that have already been swapped are cleared, and then the next un-swapped objective value is swapped. This operation is detailed in the pseudo-code shown in Table 3.4:

    //Clear Top
    while (flag set on top word) begin
        Rewrite top word: value=2'b11;
        Push top to Fwd Network buffer;
        Pop top off block RAM;
        if (mark set on top word) begin
            Start Move to Ti+1 Operation;
        end
        addr--;
    end
    //Swap Val
    Rewrite top word: swap value, set flag;
    Push top to Fwd Network buffer;

Table 3.4: Backtrack Operation Pseudo-code

The Backtrack operation is implemented over a total of 7 states, with 4 handling the functions and 2 handling the control flow (plus the idle state). This construction is shown as a state transition model in Figure 3.6:

49 Figure 3.6: State Transition Model of Backtrack Operation The Backtrack operation begins external to the PI/PPI Decision Block, with a conflict in objectives being detected. In one case, if a PPO value read out of the Forward Network disagrees with a PPI value stored in the Objective Decision Block s RF, the Conflict signal will be raised and the newframe_ready signal will toggle. The toggling of newframe_ready triggers the Backward Network Encoder to set the NReady signal low. In the other case, a new obj is pushed onto the Backward Network. If this does not resolve into a change in the output of the Backward Network by the next clock cycle, the Backward Network Encoder will time out and raise the propfail signal and set delayed_nready. The propfail signal triggers the Objective Decision Block to raise the Conflict signal. The delayed_nready signal will have the Backward Network Encoder wait until the next clock cycle before setting NReady low, which allows time for the Objective Decision Block to receive propfail and assert the Conflict signal. Once the PI/PP Decision Block receives the NReady signal, it is triggered to leave the idle state, s0. In this case, with Conflict set, there are three possible transitions that it could make. If the 39

50 current top word on the block RAM has its mark bit set, the Move to T i+1 operation will start, as the mark bit indicates that the current top word is the last of a previous frame. Otherwise, a function will be run on the top word based on its flag bit. If the flag bit is not set, the value will be swapped, the flag bit will be set and the word will be written back to the block RAM. If the flag bit is set, this word has already been swapped, so the value will be cleared (set to 2 b11), the clearvaluescheck control var will be set and the word will be written back to the block RAM. Both states then transition on to state 12. State 12 acts as the main control state for the Backtrack operation. State 12 will always set the control value sendvaluestoforward=0, which will suppress any value updates in the Forward Network buffer from being sent through to the Forward Network. This allows for multiple value updates (ie, Clear Val and then Swap Val ) before finally triggering the Forward Network with a new Imply operation. When first visiting state 12, no control bits are set, so the process continues on to state 10. In state 10, the current top word value is pushed to the valuestoforward buffer. If the previous function that was run was a Swap Val, then the operation is complete and the state will transition back to the idle state s0. In this transition the sendvaluestoforward control value will be set back to 1, which will then trigger the Imply operation to start onto the Forward Network with the updated values on the Forward Network buffer. If the previous function run was Clear Val, then the clearvaluescheck control value was set, which will cause state 10 to transition to state 2 (clearvaluescheck will also be set back to 0 at this point). State 2 will begin a pop operation in the block RAM, which will remove the current top word from the stack. The popcheck control var will be set to 1 at this point, and the process will then 40

51 transition back to state 12. On this second visit to state 12, the control var popcheck has been set, which will cause state 12 to redirect the process flow to state 18 (as well as resetting the popcheck var to 0). In state 18, the new top word will be examined to determine what function runs next (the same initial decision that was made transitioning out of state 0). If the mark bit is set, the Move to T i+1 operation will start. If the flag bit is set, processing will return again to state 9 to clear this word. In the default case, neither bit is set, indicating a fresh word. Processing will then proceed to state 6. In state 6, the value of the top word will be swapped and written back to the block RAM, along with the flag bit being set. The process continues on, back to state 12. This time through, again no control vars are set, so processing continues on to state 10. Assuming that this time through, state 6 was visited and a value was swapped, the clearvaluescheck control var will not be set and processing will finally return to state 0. As part of this transition, the sendvaluestoforward control value will be set back to 1, which will trigger the updated Forward Network buffer to send its values onto the Forward Network, beginning a new Imply operation. The fourth major operation of the PI/PPI Decision Block is Move to T i+1. This operation is called as part of the Backtrack operation, when all objectives in the current frame have been exhausted. That is triggered when the top word on the block RAM has its mark bit set, indicating that it is the last objective of the previous frame in the stack. The core function of the Move to T i+1 operation is to restore the state of the system to what it was at the end of the previous frame in the stack (T i+1 ). The first step to accomplish this is to un-mark the top word 41

52 for frame T i+1. Then, all objective values in frame T i+1 need to be read back onto the Forward Network. Finally, all PPI objective values from frame T i+2 need to be loaded back into the RF in the Objective Decision Block. Once this is complete, the Bracktrack operation which started the Move to T i+1 operation can continue. The function described here is illustrated in pseudocode in Table 3.5: Rewrite top word: mark=1 b0; //Read Ti+1 objs to Fwd Network Buffer while (!endofframe) begin Read obj at block RAM addr to Fwd Network Buffer; addr--; end //Read Ti+2 PPIs to Obj Dec RF while (!endofframe) begin if (val at addr is PPI) begin Push val to Obj Dec Block RF; end addr--; end //Update frame counter frame_count++; //Return to Backtrack operation Return to calling Backtrack; Table 3.5: Move to T i+1 Operation Pseudo-code The Move to T i+1 operation is implemented in a total of 4 states. One handling the starting unmarking of the top word, another handling the frame counter update and return to Backtrack, and the remaining two handling the looping read-out of objectives from the block RAM. The processing flow of these states is shown with state transitions in Figure 3.7: 42

53 Figure 3.7: State Transition Model of Move to T i+1 Operation The Move to T i+1 operation always starts coming from a Backtrack operation, where the top word on the block RAM has its mark bit set, indicating that all objective words for the current frame have been exhausted. The first way for this to occur is transitioning from state 0. If the Conflict signal is set, as discussed in the startup of the Backtrack operation, but the top objective word on the block RAM has its mark bit set, processing proceeds directly into the Move to T i+1 operation, going to state 5. The alternate case is when the Backtrack operation is proceeding, but after clearing all flagged objectives, the next one encountered on the block RAM has its mark bit set. In this case, state 18 in the Backtrack operation will transition to state 5 to start up the Move to T i+1 operation, so that Backtrack can continue. State 5 completes a rewrite operation for the top word in the block RAM, reading the word out, setting the mark bit to 1 b0 and then writing it back to the same address. Once the unmarking is complete, state 5 transitions to state 3 in the next clock cycle. State 3 handles reading out objectives from the memory to restore the system state to that of the previous frame. It puts the 43

block RAM into read mode, sets the current addr to the top word on the block RAM, and will decrement addr every clock cycle, to read out one objective per cycle. When first transitioning to state 5, the blockisppi2 control var is set, which blocks the output of the RAM from being evaluated by the isppi block (which forwards PPIs to the Objective Decision Block RF). The first set of objectives to be read out is frame Ti+1, so nothing should go through isppi to the RF. While blockisppi2 is set, each clock cycle the process transitions to state 22. In this state, the objective word that was just read out of the block RAM (into the top signal) is assigned into the Forward Network buffer. State 22 then always transitions back to state 3 for the next clock cycle. In state 3, once the first mark bit is encountered while reading off objective words, the blockisppi2 control var is set to 0. This mark bit indicates that addr is now pointing into frame Ti+2, so objective words should no longer be assigned into the Forward Network buffer and should instead be passed through isppi to the Objective Decision Block RF. Processing will now stay in state 3, as each clock cycle a new objective word is read from the block RAM into the top signal, which feeds into the isppi block. For each of these objective words, if the address value is greater than the number of PIs in the circuit under test, the objective value will be forwarded on to the Objective Decision Block via the ppi signal, where it will be read into the RF. When the second mark bit is encountered in state 3, processing will proceed to state 7. The second mark bit indicates that the current address is pointing into frame Ti+3, so there are no more objective words to be read out in this operation. State 7 enables the frame counter to count

up by setting Cnt_enable to 1'b1 and Cnt_dir to 1'b1. The flag bit on the top word in the block RAM is then examined to decide which state in the Backtrack operation to return to. If the flag bit is set, the current objective word has already been swapped, so processing transitions to state 9 to clear the value and continue on in the Backtrack flow. If the flag bit is not set, the current objective has not yet been swapped, so processing transitions to state 6 to swap the value and continue with Backtrack. Note that both state 6 and state 9 set Cnt_enable to 1'b0, disabling the counter, which locks the counter value in after it increments by 1 to track to the new current frame.
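A minimal sketch of the Swap Val and Clear Top word rewrites used throughout the Backtrack flow is shown below. The assumed word layout of {addr[9:0], value[1:0], flag, mark} is an illustrative guess based on the obj[13:4] address field and the 2-bit value encoding described later in this chapter; the actual bit positions are defined by the Verilog in Appendix D.

    module backtrack_word_ops_sketch (
        input  wire [13:0] top_word, // current top word from the block RAM
        output wire [13:0] swapped,  // Swap Val result written back to the RAM
        output wire [13:0] cleared   // Clear Top result pushed to the Fwd Network buffer
    );
        // Swap Val: invert the 2-bit value (01 <-> 10) and set the flag bit.
        assign swapped = {top_word[13:4], ~top_word[3:2], 1'b1, top_word[0]};
        // Clear Top: return the value to the unassigned/X encoding 2'b11.
        assign cleared = {top_word[13:4], 2'b11, top_word[1:0]};
    endmodule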

56 3.2 Objective Decision Block The Objective Decision Block serves as the main interface between the Forward Network and Backward Network; providing two core functions. The first is selection of objective values to be pushed onto the Backward Network as input to the Backtrace operation. The second is signaling the PI/PPI Decision Block as to the status of objective satisfaction within the current frame being operated on. To provide these functions, the Objective Decision Block must store a full set of objectives for the current frame. To this end, the heart of the Objective Decision Block is a register file which is sized to be able to contain as many objectives as there are PPI/PPOs in the circuit under test (l). The high level structure of implementation for the Objective Decision Block is illustrated in Figure 3.8: Figure 3.8: Objective Decision Block Structure Diagram Storing objectives for the current frame is an important action in the Objective Decision Block which is the first action taken when a new frame is being evaluated. When the PI/PPI Decision Block begins moving to a new frame, a signal will be sent to the Objective Decision Block. In the case of moving to frame T i+1, the tiplus1 signal will be set to 1. In the case of moving to 46

frame Ti-1, the output signals from the State Check sub-module of the PI/PPI Decision Block provide the indication to the Objective Decision Block (donewithstatecheck=1'b1 and statecheckresult=1'b0 indicates a successful move to frame Ti-1). When this new frame signal is received, any current values in the RF are flushed out, replacing all words with 14'b0. After the RF has been cleared out, the PI/PPI Decision Block begins shifting all objectives for the current frame to the Objective Decision Block over the ppi bus. The idle state for the ppi bus is a fixed reserved pattern (14'b...), so any time the bus is assigned a different value, that indicates that a new objective word is being pushed from the PI/PPI Decision Block. This value will then be read from the ppi bus into the Objective Decision Block RF. The reading process involves decoding the objective word, to insert the objective value into the proper associated RF address. Since the value of the address represents the position of the objective value across all PIs and PPIs, the address must be corrected by the number of PIs in the circuit under test (represented by the parameter m). This shifts the address of the objective word to a value from 0 to l-1, which matches the size of the Objective Decision Block RF. This value read-in operation is illustrated in Figure 3.9:

58 Figure 3.9: Objective Decision Block PPI Objective Read In Selection of an objective value to begin a Backtrace operation with can be broken down into two distinct cases. The first, simpler, case is when operating in frame k. In this case, there are no PPI values from a prior frame to be satisfied; there is only one objective, which is the base objective that the test is attempting to find a pattern for; line_k = 1. With line_k being the first PO (based on the bit order of the PO bus), the objective is assembled with an address component of 0, and an objective value of 1 (encoded with the standard three-value 2-bit encoding as 10). The flag and mark bits are set to 0, as the PI/PPI Decision Block is the only place where these bits can be modified to 1. The assembly of this bit data into the 14-bit objective word format is shown in Figure 3.10: 48

59 Figure 3.10: Assembly of Frame-k Objective Word The second, more complicated, case for selecting an objective is in a frame other than frame k. In this case, there is a set of objectives that must be satisfied for the current frame; so the Objective Decision Block must evaluate the current satisfaction of objectives in the frame and select the next un-met objective for Backtrace. This evaluation occurs as a logical comparison between the current fame objective values in the RF against the current PPO values from the Forward Network. These comparisons can all occur in parallel, as shown in Figure 3.11: 49

Figure 3.11: Objective Decision Block Conflict/Done Decision Logic

The results of all comparison operations are combined together to produce composite Done and Conflict signals. Done indicates that all current frame objective values match their associated values on the Forward Network PPO, hence the current frame is completely satisfied. Conflict indicates that a value on the Forward Network PPO is incompatible with the associated current frame objective in the RF. If either Done or Conflict is set, then the newframe_ready signal will also be toggled. This is seen by the Backward Network Encoder, which subsequently sets the NReady signal low. The setting of the NReady signal triggers the PI/PPI Decision Block to start evaluation.
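A minimal sketch of this parallel comparison is given below. It assumes each RF entry holds a 2-bit objective value in the same 01/10/11 encoding as the Forward Network PPO bits, and that a cleared entry reads as 2'b00; the rf_val port and the internal match/clash names are illustrative.

    module done_conflict_sketch #(parameter l = 3) (
        input  wire [2*l-1:0] rf_val,   // current frame objectives, 2 bits per PPI position
        input  wire [l-1:0]   PPO_1,    // Forward Network PPO bus, higher order bits
        input  wire [l-1:0]   PPO_0,    // Forward Network PPO bus, lower order bits
        output wire           Done,     // every stored objective matches its PPO value
        output wire           Conflict  // some assigned PPO contradicts a stored objective
    );
        wire [l-1:0] match, clash;
        genvar i;
        generate
            for (i = 0; i < l; i = i + 1) begin : cmp
                wire [1:0] obj = rf_val[2*i+1:2*i];
                wire [1:0] ppo = {PPO_1[i], PPO_0[i]};
                wire       set      = (obj == 2'b01) || (obj == 2'b10);   // a real objective is stored
                wire       assigned = (ppo == 2'b01) || (ppo == 2'b10);   // PPO carries a 0/1 value
                assign match[i] = !set || (ppo == obj);                   // satisfied, or nothing to satisfy
                assign clash[i] = set && assigned && (ppo != obj);        // assigned and contradictory
            end
        endgenerate
        assign Done     = &match;
        assign Conflict = |clash;
    endmodule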

When Done is seen while operating outside of frame k, a Move to Ti-1 operation will begin. When Conflict is seen outside of frame k, the PI/PPI Decision Block will take corrective action via the Backtrack operation. In the case where neither the Done nor the Conflict signal is set, more justification is required for the current frame. This situation triggers the sequencer to select a new objective value to start a Backtrace on. The sequencer examines each objective value in the RF until one is found that is associated with an X (2'b11) value on the Forward Network PPO. This objective value is then selected as the next objective to be pushed to the Backward Network for Backtrace. This function of the sequencer is illustrated in Figure 3.12:

Figure 3.12: Objective Decision Block Sequencer Operation
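The sequencer's scan can be sketched as a simple priority search over the RF, as below. The rf_val packing and the next_idx / next_valid output names are illustrative assumptions; the actual sequencer is part of the Objective Decision Block Verilog in Appendix D.

    module sequencer_scan_sketch #(parameter l = 3) (
        input  wire [2*l-1:0] rf_val,     // stored objectives for the current frame
        input  wire [l-1:0]   PPO_1,      // Forward Network PPO bus, higher order bits
        input  wire [l-1:0]   PPO_0,      // Forward Network PPO bus, lower order bits
        output reg  [9:0]     next_idx,   // RF position of the selected objective
        output reg            next_valid  // 1 when an unjustified objective was found
    );
        integer i;
        always @* begin
            next_idx   = 10'd0;
            next_valid = 1'b0;
            // the lowest-index stored objective whose PPO value is still X (2'b11) wins
            for (i = 0; i < l; i = i + 1) begin
                if (!next_valid && (rf_val[2*i+1 -: 2] != 2'b00)
                                && ({PPO_1[i], PPO_0[i]} == 2'b11)) begin
                    next_idx   = i;
                    next_valid = 1'b1;
                end
            end
        end
    endmodule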

62 In addition to selecting objective values, the Objective Decision Block is responsible for maintaining state control signals that are output to the PI/PPI Decision Block. These include Done and Conflict (as previously discussed), as well as line_k_x and line_k_1. Similar to how Done and Conflict provide signaling to the PI/PPI Decision Block when not in frame k, both of the line_k_* signals are used when in frame k. If the value of line k currently coming out of the Forward Network is X/2 b11 (this is the state after initial reset), the Objective Decision Block sets the objective of line_k=0, and also sets the line_k_x signal to the PI/PPI Decision Block. In frame k, line_k_x will trigger the PI/PPI Decision Block to accept backtraced objective values that will be coming from the Backward Network based on the selected objective. If in frame k, and the value on line k from the Forward Network is 1, then frame k is completely satisfied. The line_k_1 signal is set, and the newframe_ready signal is toggled. The toggle of newframe_ready is detected by the Backward Network Encoder, which will consequently set the NReady signal low. This then triggers the PI/PPI Decision Block, reading line_k_1 in frame k, causing a Move to T i-1 operation to start. One final control signal interlock managed by the Objective Decision Block is the propfail signal, originating from the Backward Network Encoder. When an objective pushed onto the Backward Network as part of a Backtrace operation does not produce any objective values, an objective propagation failure has occurred. When this happens, a Backtrack operation is needed from the PI/PPI Decision Block to undo the cause of the propagation failure. So, when the Backward Network Encoder detects a propagation failure, the first action it takes is sending the propfail signal to the Objective Decision Block. This signal triggers the Objective Decision Block to set the Conflict signal. The Backward Network Encoder s next action is then to set the 52

63 NReady signal low. This causes the PI/PPI Decision Block to begin an operation. As the Conflict signal has been set, the PI/PPI Decision Block will begin the needed Backtrack operation. 53

3.3 Forward Network Derivation

One important facet of the algorithm described here is the need to model 0, 1 and X values in the circuit, and hence the need for signals in the network to be represented by 2-bit values. This means that to be modeled in this architecture, circuits under test must first be transformed into two-bit equivalent networks to be compatible. The more straightforward of the two required network translations is the Forward Network, as it is a direct mapping of function from one bit to two. In mapping the Forward Network, the representations that will be used are 01 as a logic 0, 10 as a logic 1 and 11 as a logic X. The first step in translation is to expand a gate's one-bit truth table into a full form, including the value X in all possible combinations. This expansion is shown in Table 3.6, using an AND gate as an example.

    A  B  Y
    0  0  0
    0  1  0
    0  X  0
    1  0  0
    1  1  1
    1  X  X
    X  0  0
    X  1  X
    X  X  X

Table 3.6: Full One-Bit Truth Table for AND Gate

The fully expanded truth table can then be directly mapped into a two-bit representation. Note that in this translation to two-bit representation, the meaning of X changes. Xs from Table 3.6 are translated to 11, but since 00 is an unassigned value in the Forward Network, when either A

or B is 00, the output will be don't care or X. Table 3.7 continues the example of this using an AND gate.

    A1 A0  B1 B0  Y1 Y0
    0  0   0  0   X  X
    0  0   0  1   X  X
    0  0   1  0   X  X
    0  0   1  1   X  X
    0  1   0  0   X  X
    0  1   0  1   0  1
    0  1   1  0   0  1
    0  1   1  1   0  1
    1  0   0  0   X  X
    1  0   0  1   0  1
    1  0   1  0   1  0
    1  0   1  1   1  1
    1  1   0  0   X  X
    1  1   0  1   0  1
    1  1   1  0   1  1
    1  1   1  1   1  1

Table 3.7: Two-Bit Truth Table for AND Gate

The expanded two-bit truth table can then be broken up into two separate Karnaugh maps, one for the output bit Y1 and another for the output bit Y0. Table 3.8 continues the example using the AND gate, showing the associated K-maps with minimum groupings highlighted.

Table 3.8: Split K-maps for AND Gate Output Bits

From these Karnaugh maps, minterm expressions can be derived for both Y1 and Y0. Note again that 00 is an unassigned value in the Forward Network, thus Xs are assigned to those spots in the K-map, which can then be used to extend minterm groups for further simplification. Table 3.9 completes the example using the AND gate.

    Y1 = A1 · B1
    Y0 = A0 + B0

Table 3.9: Minterm Expressions for AND Gate Output Bits

Using this method for each of the three basic gates, a full set of translation equations for the Forward Network can be obtained. These equations are summarized in Table 3.10.

    AND Gate:  Y1 = A1 · B1     Y0 = A0 + B0
    OR Gate:   Y1 = A1 + B1     Y0 = A0 · B0
    NOT Gate:  Y1 = A0          Y0 = A1

Table 3.10: Forward Translation Equations for Basic Gates
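The Table 3.10 equations map directly onto Verilog, one small cell per basic gate, as in the sketch below. The module and port names are illustrative; the generated networks emit the same equations as inline assignments (see Appendix A).

    module fwd_and2 (input wire A1, A0, B1, B0, output wire Y1, Y0);
        assign Y1 = A1 & B1;  // two-bit AND per Table 3.10
        assign Y0 = A0 | B0;
    endmodule

    module fwd_or2 (input wire A1, A0, B1, B0, output wire Y1, Y0);
        assign Y1 = A1 | B1;  // two-bit OR per Table 3.10
        assign Y0 = A0 & B0;
    endmodule

    module fwd_not (input wire A1, A0, output wire Y1, Y0);
        assign Y1 = A0;       // swapping the bits inverts 01 <-> 10 and leaves 11 (X) unchanged
        assign Y0 = A1;
    endmodule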

3.4 Backward Network Derivation

The more complex of the two network translations is the Backward Network, due to the fact that the function of each gate must be transformed, as opposed to being directly mapped between bit representations. One important consideration in backtracing is that consistency must be maintained. That is, if a backtrace is attempting to assign a value in the Backward Network that does not match a value for that node that has already been assigned in the Forward Network, that assignment must be prevented. In the context of backtracing through a gate, this means that propagating an objective value requires that the corresponding signal be don't care (X) in the Forward Network. This can either be implemented as a check against the inputs of each gate being evaluated, or against the output of each gate being evaluated. Here, the output of each gate is used as the check, to minimize the number of required consistency checks. This dependence is illustrated in Figure 3.13.

Figure 3.13: Abstract Backward Gate Values Dependence

From this abstract model of a gate in the Backward Network, a functional definition must be derived. To this end, pseudo-code defining the values for the A and B outputs can be written, as in Table 3.11:

    A Obj Output
        If (Z_obj is required obj) then
            If (Z_fwd = X) then
                A_obj = Z_obj
            End
        End

    B Obj Output
        If (Z_obj is required obj) then
            If (Z_fwd = X) then
                If (Z_obj = 1) then
                    B_obj = Z_obj
                End
            End
        End

Table 3.11: Pseudo-code for AND Gate Backward Model Outputs

Note that the behaviors of the A and B objective values differ based on the Z objective value. This is based on the requirement to complete a minimized backtrace operation. In the case of the AND gate, when an objective value of 1 is required on Z, both A and B must also have a required objective value of 1 to achieve this. When an objective value of 0 is required on Z, however, only one input needs to be 0 to achieve this. In this case, A is used for propagating the required objective of 0 and B is ignored. The next step from this point is to derive a truth table to model the described functionality of this gate. One important point in modeling this gate is that while the A/B state inputs from the Forward Network already have a format definition (0=01, 1=10, X=11), the objective inputs/outputs that are part of the flow of the Backward Network need a separate format. This is required because while the Forward Network models 3 signal values, the Backward Network must model 2 signal values, as well as whether or not the value is a required objective being backtraced. To this end, the Backward Network also uses 2-bit representations of signals, where

69 the higher order bit represents whether a value is an objective (1=yes, 0=no) and the lower order bit represents the value of the signal. Using this signal modeling, the truth table for a backward model of a gate can be derived, as illustrated in Table 3.12 continuing the AND gate example. Z_obj_1 Z_obj_0 Z_1 Z_0 A_obj_1 A_obj_0 B_obj_1 B_obj_ X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Table 3.12: Truth Table for Backward Model of AND Gate Many of the output values defined in the truth table are don t care, since this will occur for cases where the objective Z value is not a required objective, or if the Z value in the Forward Network is in an unassigned/illegal state (00). This indicates that equations defining this backwards gate model are likely to have a reasonably compact final form. In this case, the pseudo-code presented earlier can be directly translated into the simplified Boolean equations defining the backward model of the gate, as shown in Table 3.13: 59

    AND Gate:   A1 = Z1 · Z0 · Zo1
                A0 = Zo0
                B1 = Z1 · Z0 · Zo1 · Zo0
                B0 = Zo0

    OR Gate:    A1 = Z1 · Z0 · Zo1
                A0 = Zo0
                B1 = Z1 · Z0 · Zo1 · !Zo0
                B0 = Zo0

    NOT Gate:   A1 = Z1 · Z0 · Zo1
                A0 = !Zo0

Table 3.13: Backward Translation Equations for Basic Gates

Since, in the Backward Network encoding, the value is completely represented by the 0 bit, that value can always be directly passed from the gate output to the gate inputs (inverted in the case of NOT). The real decision being made in the backward model is with the 1 bit: whether or not the required objective propagates. For all inputs, the Forward Network gate value must be X (Z1 = 1, Z0 = 1). For the A input, the required objective bit on the gate output in the Backward Network must also be 1 (Zo1 = 1). For the B input, on top of all previous checks, the value of the required objective in the Backward Network must be one for which all inputs of the gate have to be set in order to achieve the objective (Zo0 = 1 for AND, Zo0 = 0 for OR).
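These equations also translate directly into a small Verilog cell per gate. The sketch below renders the AND-gate column of Table 3.13; the module and port names are illustrative, with Z1/Z0 taken from the Forward Network and Zo1/Zo0 being the incoming objective as in Figure 3.13.

    module bwd_and2 (
        input  wire Z1, Z0,          // Forward Network value of the gate output (11 = X)
        input  wire Zo1, Zo0,        // incoming objective: Zo1 = required, Zo0 = value
        output wire A_obj1, A_obj0,  // objective propagated toward input A
        output wire B_obj1, B_obj0   // objective propagated toward input B
    );
        assign A_obj1 = Z1 & Z0 & Zo1;        // propagate only while the forward value is still X
        assign A_obj0 = Zo0;
        assign B_obj1 = Z1 & Z0 & Zo1 & Zo0;  // B is only required when the objective value is 1
        assign B_obj0 = Zo0;
    endmodule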

3.4.1 Backward Network Fanout Handling

One added complexity in the Backward Network is logic fanout. The output of one gate driving multiple inputs is a fairly common occurrence in the forward logic, but this presents a problem in the Backward Network. When the logic flow is reversed, this would result in multiple gate outputs driving single inputs, resulting in contention on the input, as illustrated in Figure 3.14.

Figure 3.14: Backward Fanout Signal Contention

An important consideration in resolving this issue is that delay through different paths will vary, so different values may arrive at the fanout junctions at arbitrary times. This opens the possibility of transient value states at the junction that could propagate value glitches to the output. To behave like a single-input / single-output node, these multiple-input backward nodes must switch once and retain that value until the Backward Network inputs are changed. To this end, a priority encoder scheme is employed, such that the first objective value to arrive at a multiple-input node is locked in on the output and any further input changes are blocked from propagating. Since signals in the Backward Network are composed of two bits, one determining whether a value is a required objective and the other carrying the objective value, the priority encoding scheme requires two modules. The first module generates the higher order, required objective bit. This is the simpler of the two from an implementation standpoint. The first time a required objective arrives at the priority encoder, the 1 value should pass through to the output and lock in

72 at that value. This can be implemented in a straightforward way by using an OR gate to merge required objective signals, and a feedback of the output to lock in a value of 1. This structure is illustrated in Figure Figure 3.15: Priority Encoder Structure for Required Objective Bit This architecture does present one additional challenge, though. With the loopback value lock directly feeding back into its own generating OR gate, this effects a permanent lock of the gate output value. Given that the Backward Network will be required to execute many backtraces in any given justification operation, there needs to be a way to unlock these priority encoded values. This can be implemented by a new priority_reset signal that is ANDed together with the Final Req Obj output before the loopback value lock. In normal operation, the priority_reset is held at 1, which allows the Final Req Obj to pass through. When the Backward Network values need to be cleared out, the priority_reset signal is set to 0, which causes the loopback value lock to reset. The priority encoder is then reset to an unlocked state, and is ready to re-lock when the next required objective propagates. This addition to the structure is illustrated in Figure

73 Figure 3.16: Priority Encoder Structure for Required Objective Bit with Reset The second module will generate the lower order, objective value bit. This structure will be slightly more complicated, because in addition to combining and locking values, values must only pass through to the final locking stage if their corresponding required objective bit is set. Given the requirement of the value passing with a control signal of 1 and being blocked with a control signal of 0, a simple AND between the required objective bit and objective value bit will generate the output value bit into the combination logic. These output value bits will remain 0 except in a case where a required objective of 1 arrives at the input of the node. In that case, the corresponding output value bit will reflect the input objective value bit. When this happens, the value must pass through as the final value bit into the locking logic. To achieve this, all output value bits are ORed together. Since all non-required objective bits are locked to 0, this then allows the value of the now required objective to pass through this stage of logic as well. This structure is illustrated in Figure

Figure 3.17: Priority Encoder Structure for Objective Value Bit

This half of the priority encoder requires locking and reset logic as well. One important note in considering this implementation is that, unlike the required objective bit, the objective value bit does not have a set transition at which it needs to be locked. For this reason, the locking bit from the required objective bit must also be used to control value locking for the objective value bit. Since both halves of the priority encoder share common locking and reset control signals, the final structure can be represented as a single, complete unit, as shown in Figure 3.18:

Figure 3.18: Complete Structure for Backward Network Priority Encoder
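A behavioral rendering of the complete priority encoder in Figure 3.18, for a two-input fanout node, is sketched below. The level-sensitive lock used here is one way to express the lock-and-release behavior; the actual implementation builds it from the OR/AND feedback structure of Figures 3.15 through 3.17, and the port names are illustrative apart from priority_reset.

    module bwd_priority_enc_sketch (
        input  wire a_req, a_val,    // objective pair arriving from the first backward branch
        input  wire b_req, b_val,    // objective pair arriving from the second backward branch
        input  wire priority_reset,  // held at 1 in normal operation; 0 clears the lock
        output reg  final_req,       // locked required objective bit
        output reg  final_val        // locked objective value bit
    );
        always @* begin
            if (!priority_reset) begin
                final_req = 1'b0;    // unlock: outputs return to the idle state
                final_val = 1'b0;
            end else if (!final_req) begin
                // not yet locked: the first arriving required objective passes through,
                // with each value bit gated by its own required bit
                final_req = a_req | b_req;
                final_val = (a_req & a_val) | (b_req & b_val);
            end
            // once final_req is 1, both outputs hold until priority_reset drops
        end
    endmodule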

This final structure achieves the required goal of resolving fanout node values in the Backward Network. When the Backward Network is idle / cleared, the priority encoder passes through a required objective bit of 0 and a 0 value bit. When a required objective value arrives at the encoder, if the encoder is unlocked, the objective value will pass through the required AND filter and the OR value combination and be output as the final objective value. The required objective bit will pass through the OR value combination and be passed on to the output as the final required objective bit. This signal will also lock in both the objective value bit and the required objective bit, so that further input changes are blocked from affecting the priority encoder output, and hence the final network outputs. Finally, the priority reset signal serves to unlock the priority encoder, so that the next new required objective can be locked in, by forcing the lock value bit to 0 and allowing values from the inputs to again pass through to the outputs.

3.4.2 Backward Network Conflict Detection

While backtracing an objective, it is possible for a conflict to occur. This is a situation where a prior objective in the same frame requires the output of a gate to be the opposite of the current objective's requirement. Since the Backward Network is designed to produce a single minimal objective assignment, a conflict condition means that the frame cannot converge with its current set of objectives. There are two possible conflict scenarios that can occur in backtracing. The first, as shown in Figure 3.19, is a trace-blocking conflict. In this situation, the objective Z2=0 is backtraced first, generating two new objectives (A=1, C=0). The second backtrace is for the objective Z1=1, which encounters a conflict on the assignment of the output of G1. As this is the only active

backtracing path for Z1, propagation is completely blocked. With the objective propagation blocked, no new values are propagated through to the Backward Network Encoder module. The Objective Decision Block is signaled (via the propfail signal), causing it to raise the Conflict signal.

Figure 3.19: Trace-blocking Conflict

The second possible conflict scenario, as shown in Figure 3.20, is the non-blocking conflict. In this situation, the first objective backtraced is Z1=1. The second objective is Z2=0. This objective requires the output of G3 (an OR gate) to be 0, hence all inputs must propagate the objective of 0. As in the previous example, a conflict occurs at the output of G1, but in this case there are two active backtracing paths. While the path backtracing to A=1 is blocked, the path backtracing to C=0 is not, so the Backward Network Encoder module still receives an objective. The Backward Network Encoder will then provide this objective to the PI/PPI Decision Block as normal. Once the PI/PPI Decision Block pushes the new set of objectives to the Forward Network, though, the Objective Decision Block will see that the objective of Z2=0

was not achieved (it will remain 1 due to the output of G1). This will then directly cause the Objective Decision Block to raise the Conflict signal.

Figure 3.20: Non-blocking Conflict

3.4.3 Backward Network Decoder

Due to the previously described need for the Backward Network to use a different value encoding from the rest of the circuit, encoding and decoding modules are needed as wrappers around the Backward Network. The Backward Network Decoder module acts as the interface between the Objective Decision Block and the Backward Network. In addition to translating objective value encoding between these two blocks, the decoder must also handle resetting the priority encoding logic that is used to lock in values inside traces in the Backward Network. The entire operation to assign a new objective takes 3 clock cycles to complete, implemented as a state machine (controlled by the mode register) as shown in Figure 3.21:

78 Figure 3.21: State Transition Diagram for Backward Network Decoder Operation in the Backward Network Decoder is triggered by a change in the obj_set signal from the Objective Decision Block (a new objective value has been assigned to be backtraced). This causes the decoder to move from idle into the Backward Network reset state. In this state, the priority_reset signal is toggled, which causes priority encoder value locks within the Backward Network to release. At the same time, all PI/PPI input values to the Backward Network are cleared (value 2 b00) to flush out the network for a fresh value trace. On the next clock cycle, the decoder moves into the assignment state. In this state, the decoder makes an assignment to each PI/PPI Backward Network input bit. For the bit that corresponds to the address in the new obj from the Objective Decision Block, that value is assigned into the PI/PPI bus as a required objective. For all other bits, values are directly assigned from the corresponding PO/PPO values output by the Forward Network. These values are not required objectives, so their higher order objective bit is set to 0. This value assignment is detailed in Figure 3.22: 68

Figure 3.22: Backward Network Decoder Value Assignment

Each address in the PI/PPI Backward Network input bus is compared to the address contained in obj[13:4]. When these match, the associated objective value (obj[2]) is assigned into that PI/PPI bus bit as a required objective (the lower order bit is set to obj[2]; the higher order bit is set to 1'b1). For any other bits, the associated higher order bit on the PO/PPO output bus from the Forward Network is assigned through into the lower order bits on the PI/PPI Backward Network input bus. Note that because of the way values are encoded outside the Backward Network, and because the Backward Network only deals with 0/1 (not X), the higher order value bit will always represent the binary value of that 3-state value. Once all bits in the PI/PPI Backward Network input bus have been assigned, the trace_start signal is toggled. This signal is output to the Backward Network Encoder, and notifies it that a new backtrace has started, so the Backward Network outputs should be monitored for a value change. After toggling the signal to the encoder, the decoder returns to an idle state.
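The per-position assignment in Figure 3.22 can be sketched as follows. The packing of the backward PI/PPI input bus as {required, value} pairs, the fwd_hi port and the total parameter are illustrative assumptions; obj[13:4] and obj[2] are used as described above.

    module bwd_decoder_assign_sketch #(parameter total = 7) (  // total = number of PI/PPI positions
        input  wire [13:0]        obj,        // objective selected by the Objective Decision Block
        input  wire [total-1:0]   fwd_hi,     // higher order PO/PPO bits from the Forward Network
        output reg  [2*total-1:0] bwd_pi_ppi  // Backward Network PI/PPI inputs, {required, value} pairs
    );
        integer i;
        always @* begin
            for (i = 0; i < total; i = i + 1) begin
                if (obj[13:4] == i)
                    bwd_pi_ppi[2*i+1 -: 2] = {1'b1, obj[2]};     // the required objective bit and value
                else
                    bwd_pi_ppi[2*i+1 -: 2] = {1'b0, fwd_hi[i]};  // forward value passed through, not required
            end
        end
    endmodule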

80 3.4.4 Backward Network Encoder The Backward Network Encoder operates on the opposite side of the Backward Network, taking the output from the Backward Network, encoding it back into the 14-bit objective scheme of the architecture, and passing the data on to the PI/PPI Decision Block. In addition to value translation, the encoder must also be able to track the output values of the backward network to detect when a backtrace operation has failed to produce results. In order to implement these functions, the Backward Network Encoder is constructed via three states (tracked by the state register), as illustrated in Figure 3.23: Figure 3.23: State Transition Diagram of Backward Network Encoder The Backward Network Encoder begins operation in the idle state. When a Backtrace operation is started by the Backward Network Decoder, the trace_start signal is toggled. This signal triggers the encoder to move to the check state. At the start of the next clock cycle, in the check state, the current output of the Backward Network is compared against the last saved output from the network (the decoder saves the values output from the Backward Network every cycle that it is not in the check state). If the values differ, that indicates that the Backtrace operation successfully produced results. If that is the case, the higher order outputs bits are then checked to ensure that at least one is set to 1 b1 (indicating a required objective). Once that has been verified, the encoder moves on to the shiftout state. If the output values from the 70

81 Backward Network have not changed, or there are no required objectives, that indicates a Backtrace propagation failure. The propfail signal is set, which triggers the Objective Decision Block, which in turn triggers the PI/PPI Decision Block to start a corrective Backtrack operation. The shiftout operation is where communication of objective values happens between the Backward Network Encoder and the PI/PPI Decision Block. Each objective value saved from the Backward Network is sequentially shifted out to the PI/PPI Decision Block over the in signal bus. Because the operation of the PI/PPI Decision Block to write a received objective value into the block RAM takes multiple clock cycles, the triggering signals NReady and genobj are used to communicate readiness in the operation. The structure of this communication is illustrated in Figure 3.24: Figure 3.24: Backward Network Encoder Shiftout Operation 71

When first transitioning to the shiftout state, the NReady signal is set low, which communicates to the PI/PPI Decision Block that a backtrace result will be ready. The PI/PPI Decision Block responds by setting genobj high, indicating that it is ready to accept a new required objective value on the in bus. This signal triggers the internal index in the encoder to start incrementing, searching for the next required objective in the saved values. Once the next required objective value is found, the value is encoded onto the in bus, and the NReady signal is again set to 0. This triggers the PI/PPI Decision Block to read the new value on in and store it into the block RAM. Once this operation is complete, the PI/PPI Decision Block again sets genobj, triggering the Backward Network Encoder to start scanning for the next required objective. When the index reaches the end of the PO/PPOs saved from the Backward Network, indicating that there are no more required objectives, the encoder sets the in bus to all 1s and sets NReady low. This value on the in bus then causes the PI/PPI Decision Block to move on to start the next Imply operation on the Forward Network. One additional function provided by the Backward Network Encoder, because it controls the NReady signal, is forwarding trigger requests on to the PI/PPI Decision Block. Specifically, this relates to the newframe_ready signal, which comes from the Objective Decision Block. When it is determined that the current frame is Done (all previous frame PPIs match current frame PPOs) or in Conflict (a value conflict between a previous frame PPI and a current frame PPO), the corresponding signals are set from the Objective Decision Block to the PI/PPI Decision Block. Once complete, the NReady signal still needs to be set low to trigger the PI/PPI Decision Block to start an operation and examine those set values. To this end, the Objective Decision Block

toggles the newframe_ready signal, which signals the Backward Network Encoder to set NReady low.
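The shiftout handshake described in this section can be sketched from the encoder's side as below. The saved_req/saved_val buffers, the load input, the idx register and the pack_obj helper (including its assumed word layout) are illustrative; only genobj, NReady and the in bus follow the signal names used above.

    module encoder_shiftout_sketch #(parameter total = 7) (
        input  wire             clk,
        input  wire             load,       // assumed: pulses when new Backward Network outputs are saved
        input  wire             genobj,     // PI/PPI Decision Block is ready for the next objective
        input  wire [total-1:0] saved_req,  // required objective bits saved from the Backward Network
        input  wire [total-1:0] saved_val,  // corresponding objective value bits
        output reg  [13:0]      in,         // objective word pushed to the PI/PPI Decision Block
        output reg              NReady      // driven low when a new value is present on in
    );
        reg [9:0] idx;
        // pack_obj is an illustrative helper; the real 14-bit field layout is defined in Appendix D
        function [13:0] pack_obj (input [9:0] a, input v);
            pack_obj = {a, v, ~v, 2'b00};   // assumed {addr, value[1:0], flag, mark}
        endfunction
        always @(posedge clk) begin
            NReady <= 1'b1;
            if (load) begin
                idx <= 10'd0;
            end else if (genobj) begin
                if (idx >= total) begin
                    in     <= {14{1'b1}};              // all 1s: no further objectives, start Imply
                    NReady <= 1'b0;
                end else if (saved_req[idx]) begin
                    in     <= pack_obj(idx, saved_val[idx]);
                    NReady <= 1'b0;
                    idx    <= idx + 10'd1;
                end else begin
                    idx    <= idx + 10'd1;             // keep scanning for the next required objective
                end
            end
        end
    endmodule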

84 3.5 Translating Circuits into Forward/Backward Networks In order to place a specific circuit under test in the ATPG architecture outlined here, the network must first be translated from its basic form into a structure that is compatible to properly interface with adjacent blocks in the ATPG architecture. To accomplish this in an automated fashion, a translation program is implemented in C++ code to perform the mapping, which is included in Appendix E. This section outlines the operation and flow of this utility, as shown in Figure Figure 3.25: Network Translation Flow Processing Input Data The translation program is designed around reading a specific function-level format for networks to be translated. In this format, each net in the circuit is functionally defined with one line of code. The exception to this are output nets, which are defined by two lines; one with their logical definition and the other defining them as an output net. Appendix A contains a sample input circuit (along with translated circuits output) and Table 3.14 shows the general definition of this format. 74

85 Input Code Functional Description INPUT(G0) Defines G0 as an input to the circuit. OUTPUT(G1) Defines G1 as an output of the circuit. G1 = not(g0) Defines G1 as the inverse of G0. G2 = and(g0,g1) Defines G2 as a 2-input and of G0 and G1. G3 = or(g0,g1,g2) Defines G3 as a 3-input or of G0, G1 and G2. G4 = xor(g2,g3) Defines G4 as the exclusive or of G2 and G3. G5 = dff(g4) Defines a D-flip-flop with input G4 and output G5. Table 3.14: Input Network Format When reading in circuit data, the program does some pre-processing of structures before storing them in memory. This pre-processing consists of breaking arbitrary complex logic gates down into constituent base components (AND, OR, NOT). By employing this method of only modeling the circuit with the most basic components, the task of later translating these circuit gates into the functional blocks that make up the Forward/Backward Networks is simplified to a limited number of transforms. This also allows for full flexibility to support any arbitrary complex gate that is employed in the circuit to be translated. Figure 3.26 provides a simple example of how the pre-processing is handled in an internally consistent way. Figure 3.26: Internally Consistent Pre-Processing In-Memory Data Structure Once the input circuit has been pre-processed, there are gates that must be stored in memory to maintain a model of the circuit. Each gate that is read in is modeled into a custom data structure. This structure contains essential information about the gate being stored, such as name, function, 75

86 logical level in the circuit, other gates that are inputs to this one and other gates that are outputs of this one. This data structure is shown in Figure Figure 3.27: Gate Data Structure Multiple gates are strung together in linked lists, using gate-nodes as the linking elements in the list. These lists are employed in many functions, including global lists of inputs, outputs, DFFs and circuit logic levels. Also of note is the fact that a gate may have multiple gates fanning in to it or out from it, so these attributes of a gate are also modeled in a linked fashion, as shown in Figure Figure 3.28: Gate Fan-out List Structure 76

87 The main global data structure that ties together gates into the model used is the level list. This is a two-way linked list, which in its first dimension links together each logical level of the circuit and in its second dimension links together the list of gates associated with a given logic level. Inputs and DFF outputs (pseudo-primary inputs) are placed on level 0. Remaining levels are populated by placing each gate on a level that is one higher than the highest level of its entire set of fan in gates. This structure is illustrated in Figure Figure 3.29: Level List Structure Other structured lists of gates include the input list, which links together all input records for the circuit, the output list, which links together all gates that are also drivers of outputs, the DFF list, which links together all DFF gates, and the gate list, which links together all combinational gates in the circuit. Note that the gate list and DFF list are mutually exclusive, because DFFs are 77

88 translated into inputs/outputs in the final network output, unlike combination gates that are translated into Forward/Backward Network equation equivalents Writing Output Networks The final output of the translation program is the Forward and Backward Network Verilog, which then interface with the rest of the ATPG architecture. As the in-memory model of the circuit only contains AND, OR and NOT gates, only three transforms are needed at this point to map the circuit model into the Forward or Backward Network. To write out a network Verilog, standard module headers are first written out, which include parameterized interface constructs for primary inputs, pseudo-primary inputs, primary outputs, pseudo-primary outputs and state bits for all gates. Table 3.15 shows an example of how the interface is defined. module fwd_net (PI_1, PI_0, PPI_1, PPI_0, STATE_1, STATE_0, PO_1, PO_0, PPO_1, PPO_0); parameter n = 1; parameter m = 4; parameter l = 3; parameter s = 20; input [m-1:0] PI_1; input [m-1:0] PI_0; input [l-1:0] PPI_1; input [l-1:0] PPI_0; output [s-1:0] STATE_1; output [s-1:0] STATE_0; output [n-1:0] PO_1; output [n-1:0] PO_0; output [l-1:0] PPO_1; output [l-1:0] PPO_0; Table 3.15: Forward Network Module Interface Example The parameters defined at the top of the module (n, m, l, s) are used to size the interface busses based on the attributes of the circuit in memory. The parameter n represents the number of primary outputs in the circuit, and thus sizes the PO_1 and PO_0 interface busses. The 78

parameter m represents the number of primary inputs and thus sizes the PI_1 and PI_0 interface busses. The parameter l represents the number of D-flip-flops and thus sizes both the PPO_1/PPO_0 and PPI_1/PPI_0 interface busses. The parameter s represents the total number of gates in the circuit and thus sizes the STATE_1 and STATE_0 interface busses. Once the module header information has been written out, the next phase is to handle assignment of all bits in the input busses to internal signals. Since the input busses consist of both circuit inputs (in the case of the Forward Network) and circuit DFFs, the program loops through all defined inputs and then all defined DFFs to generate a complete internal assignment from the module inputs. The next step is to loop through all gates in the design, in level order. For the Forward Network, levels are traversed in ascending order, and for the Backward Network, in descending order. Each gate modeled in memory is translated to its equivalent network equations based on the transforms outlined in sections 3.3 and 3.4. Once equations for each gate have been written out, the final phase is to assign internal network bit values onto the module output busses. In the case of the Forward Network, all primary outputs and DFF inputs (pseudo-primary outputs) have a corresponding internal gate driving them that defines their value. So, once equations are written out for all gates, values are available for all POs and PPOs. By traversing the list of all outputs and all DFFs, the corresponding gate output bits are assigned into the module output busses.
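To illustrate the kind of Verilog the program emits inside the generated module body, the fragment below shows a hand-written cell for a single gate G2 = and(G0, G1) using the Table 3.10 equations. The wire names follow the gate names from the input format; the actual generated output for a full circuit is shown in Appendix A.

    module fwd_net_body_example (
        input  wire G0_1, G0_0, G1_1, G1_0,  // two-bit encodings of nets G0 and G1
        output wire G2_1, G2_0               // two-bit encoding of G2 = and(G0, G1)
    );
        assign G2_1 = G0_1 & G1_1;  // forward AND translation, per Table 3.10
        assign G2_0 = G0_0 | G1_0;
    endmodule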

90 4. Results When setting up testing for the implementation of the architecture described here, it is important to note that the architecture itself has no function without a test circuit being integrated in the form of Forward and Backward Networks. To this end, the ISCAS89 benchmark circuits are integrated as test circuits. This provides a wide range of size, structure and complexity which will then produce a robust set of testing results. The first property of interest in the architecture proposed here is the intent that it be implemented on an FPGA. This means that to be consistent with that goal, the architecture must both successfully synthesize with an FPGA library, and also fit within the resource constraints of the FPGA. To test against this constraint, the Xilinx Virtex- 6 board was selected, and synthesis was completed using Mentor Graphics Precision RTL Synthesis 2012b.10_64-bit. Table 4.1 summarizes utilization of the Virtex-6 resources for the ISCAS89 circuits, once integrated into the architecture. The s35932, s38417 and s38584 benchmark circuits are not listed in the compiled results. As described in section 3.2, the 14-bit encoding scheme for data in this system uses 10 bits for address. Addresses in this context index the set of PI/PPIs for the circuit being tested. That then places a limit of 1024 on the number of PIs plus sequential elements in a circuit to be tested. The three benchmark circuits excluded all have more than 1024 flip-flops, which then overflows the current addressing scheme, leading to unpredictable results. 80

91 It is important to note that while all resource utilization increases along with the relative size / complexity of the benchmark circuit, LUT and CLB utilization represents the real constraint on size, as DFF usage is much lower. Aside from DFF usage for the RF within the Objective Decision Block, and DFF buffering of busses between modules, the size of the non-network portion of the architecture is largely static. 81

92 Circuit LUTs CLB Slices DFF/Latches c % 1.25% 0.30% c % 2.97% 0.62% c % 4.05% 0.66% c % 4.02% 0.84% c % 7.12% 0.96% c % 6.44% 1.00% c % 13.96% 2.72% c % 9.42% 1.11% c % 14.05% 2.37% c % 24.52% 2.06% c % 24.55% 3.19% s % 1.35% 0.35% s % 2.21% 0.54% s % 2.47% 0.58% s % 2.88% 0.69% s % 2.94% 0.69% s % 2.37% 0.46% s % 3.46% 0.82% s % 3.14% 0.74% s % 3.20% 0.60% s % 3.07% 0.71% s526n 3.09% 3.09% 0.71% s % 4.34% 1.02% s % 4.59% 1.05% s % 3.24% 0.55% s % 3.17% 0.55% s % 6.02% 1.35% s % 5.77% 1.09% s % 3.31% 0.72% s % 3.50% 0.72% s % 8.00% 2.08% s % 3.66% 0.49% s % 3.75% 0.49% s % 17.92% 4.80% s % 22.76% 4.46% s % 62.21% 13.20% s % 59.57% 12.46% Table 4.1: Virtex-6 Resource Utilization for IACAS89 Benchmarks 82

This then points to the generated Forward / Backward Networks as the core contributor to utilization, which directly relates to the size and complexity of the input benchmark circuit. The relationship between benchmark circuit size and utilization can be seen plotted in Figure 4.1.

Figure 4.1: Virtex-6 Utilization vs. ISCAS89 Benchmark Size

The data shows a roughly linear relationship between benchmark circuit size and FPGA utilization. It is also important to note that the variance from this relationship increases with the size of the circuit. This behavior is expected, as larger circuits have more nodes where variations in net fanout can occur. Higher net fanouts result in more required fanout-handling circuitry in the Backward Network, as discussed in section 3.4.1; hence the more nets there are in the benchmark circuit, the more potential variation in the resulting utilization of the translated circuit.

94 The second point of interest for the architecture is simulating the resulting benchmark circuits. This determines the number of clock cycles required for operation to complete in each of the benchmark circuits, as well as a check that the intended behavior of the architecture is followed. For this purpose, the ISCAS89 benchmark circuits that were synthesized were also simulated in Mentor Graphics QuestaSim f r This data is then combined with the maximum achievable clock frequency reported in Precision during synthesis to obtain the total runtime on the FPGA. Finally, as a point of comparison, the Formal software ATPG solver tool was run for each of the same benchmark circuits. This data is summarized in Table 4.2. Circuit ATPG Emulation Software Solver Clocking Simulation Run Time Run Time Freq Gen Compile Synthesis Sim Time Total Rules Model Solve (Mhz) Sim Cycles (s) (s) (s) (s) Time (s) Gen (s) Gen (s) (s) Total Time (s) s E s E s E s E s E s E s E s526n E s E s E s E s E s E s E s E s E s E s E s E s E s E s E Table 4.2: Runtime Comparison for ISCAS89 Benchmarks 84

The runtime data is broken down into the required processes, from start to finish, for each of the solvers. In the case of the ATPG emulation architecture, this starts with the time it takes to generate the Forward/Backward Network Verilog code from the input benchmark circuit. The complete Verilog must then be compiled and synthesized before it can finally be loaded onto an FPGA and run to complete the solve operation. In the case of the Formal software solver, the first step is to generate a CTL rule statement based on the circuit to match what is being solved by the FPGA (in this case, the ANDing of all POs). This rule generation is automated for this test case with the TCL script included in Appendix G. The rule is then merged with the benchmark circuit, and the combined code is translated into a model that Formal understands. That model is read into Formal and the solve operation is run. Note that the Formal solve operation has an internal time limit, after which it gives up early. The solve times highlighted in Table 4.2 indicate runs where Formal gave up early; those benchmarks are not included in further analysis, as no direct comparison could be drawn. There are two interesting sets of data to compare within the run time results. The first is the raw solve time of the FPGA versus the software-based solver, as shown in Figure 4.2.

Figure 4.2: FPGA vs. Software Solve Time for ISCAS89 Benchmarks

The solve times are plotted on a logarithmic scale due to the large difference between the two. On average, the FPGA architecture solves the ATPG problem three orders of magnitude faster than the software-based solver (an average of 6991x faster, with a minimum of 131x for s713 and a maximum of 55939x for s510). In addition to this result, though, the complete process run time must also be considered, as shown in Figure 4.3.

Figure 4.3: FPGA vs. Software Total Time for ISCAS89 Benchmarks

When the complete process time is considered, the results are the reverse of the solve time alone, with the software-based solver being four orders of magnitude faster than the FPGA-based solution (an average of 36309x faster, with a minimum of 2048x for s510 and the largest gap occurring for s13207). This discrepancy shows that the FPGA-based solution is still limited by the pre-processing required before the FPGA can actually solve a problem. Compile and synthesis times dwarf all other run time considerations for the FPGA solution, and present the largest barrier between the FPGA and software approaches.

5. Conclusion & Future Work

Circuits are continually increasing in size and complexity, and this growth increases exposure to subtle bugs in circuit function. This leads to increased reliance on formal verification methods to catch design flaws and provide assurance of correct function. Formal verification methods themselves suffer from issues such as state explosion and runtimes that grow with circuit size. To keep up, formal verification techniques continue to evolve, as described in section 1.2, which has led to the current use of ATPG-based methods for formal verification. As circuit sizes approach the limits of even ATPG-based methods, further solutions are required.

A method has been presented here for implementing an ATPG-based algorithm for formal verification in reconfigurable hardware (an FPGA). This implementation has been shown to have a linear relationship between the size of the circuit being verified and the resulting FPGA resource utilization. This implies a reasonable bound on the size of the implementation, as opposed to an exponential utilization explosion as circuit size increases. One limitation that prevented simulation of the three largest benchmark circuits was the limit of 1024 PIs/flops for a circuit under test, due to the 10-bit addressing scheme used in the current implementation. With larger FPGAs to accommodate larger test circuits, this limit could be raised. Increasing the bit width of the data words across the emulation implementation to support a larger address is mostly trivial; the only portion requiring more rework is the interface with the block RAM, as the address/data bit allocation would need to change to support the new address size.

This method has been shown to be an average of three orders of magnitude faster than a similar software-based approach, based on the time to solve a given ATPG problem. At the same time, total runtime for the FPGA-emulation-based implementation is significantly limited by the parts of its process that remain in software (mainly compilation and synthesis). One future enhancement that could address this limitation would be to split the property monitor portion of the circuit under test into a separate module. Currently the property monitor for a given CTL rule is integrated into the Forward and Backward Networks, so the whole set of networks must be re-compiled and re-synthesized for each new property to be tested. If the property monitor were separated out, only that relatively small portion of the total circuit would need to be re-compiled and re-synthesized for each different property on the same circuit. This would reduce the impact of the compile and synthesis overhead and make FPGA-based emulation a more attractive substitute for software-based solvers, with the benefit growing with circuit size. A minimal sketch of what such a stand-alone property monitor could look like is given below.
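The sketch assumes the same two-bit (z1/z0) value encoding used by the Forward Network and the simple property used in the simulations of Appendices B and C (the ANDing of all POs). The module and port names are illustrative and are not taken from the implementation code in the appendices.

// Minimal sketch of a separated property monitor, assuming the two-bit (z1/z0)
// encoding and a property that ANDs all POs. Names are illustrative only.
module prop_monitor #(parameter N = 1) (
  input  [N-1:0] po_1,      // "can be 1" bits of the circuit POs
  input  [N-1:0] po_0,      // "can be 0" bits of the circuit POs
  output         line_k_1,  // property output in the same 2-bit encoding
  output         line_k_0
);
  // AND of all POs under the 3-valued encoding: the result can be 1 only if
  // every PO can be 1, and can be 0 if any PO can be 0.
  assign line_k_1 = &po_1;
  assign line_k_0 = |po_0;
endmodule

With a structure along these lines, only this small module would need re-compilation and re-synthesis when the property under test changes, while the generated Forward and Backward Networks for the circuit would remain untouched.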

Appendix A: Example Input Circuit and Network Translations

The code used to generate Forward and Backward Networks for the architecture described here is designed to accept a specific input format. The constructs used in this format were described as part of the network translation flow in chapter 3. Those constructs can be applied to create an input circuit for translation, as exemplified in Table A.1, which lists the ISCAS89 circuit s27.

# 4 inputs
# 1 outputs
# 3 D-type flipflops
# 2 inverters
# 8 gates (1 ANDs + 1 NANDs + 2 ORs + 4 NORs)
INPUT(G0)
INPUT(G1)
INPUT(G2)
INPUT(G3)
OUTPUT(G17)
G5 = DFF(G10)
G6 = DFF(G11)
G7 = DFF(G13)
G14 = NOT(G0)
G17 = NOT(G11)
G8 = AND(G14, G6)
G15 = OR(G12, G8)
G16 = OR(G3, G8)
G9 = NAND(G16, G15)
G10 = NOR(G14, G11)
G11 = NOR(G5, G9)
G12 = NOR(G1, G7)
G13 = NOR(G2, G12)

Table A.1: Benchmark Code for s27 Circuit

This input circuit code is read into the in-memory model of the translation code, which then generates the networks.

The first network to be generated is the Forward Network, which is a direct translation of the circuit, with the DFFs mapped to PPIs/PPOs and the functional bit encoding changed to a 2-bit representation (to model the three logic values 0, 1 and X). Given the direct nature of this translation, each wire line defining a gate in the Forward Network is directly linked to (and has the same base name as) a gate from the input circuit, in a one-to-one relationship. Table A.2 shows the generated Forward Network code for s27.

module fwd_net (PI_1, PI_0, PPI_1, PPI_0, STATE_1, STATE_0, PO_1, PO_0, PPO_1, PPO_0);
parameter n = 1;
parameter m = 4;
parameter l = 3;
parameter s = 20;
input [m-1:0] PI_1;
input [m-1:0] PI_0;
input [l-1:0] PPI_1;
input [l-1:0] PPI_0;
output [s-1:0] STATE_1;
output [s-1:0] STATE_0;
output [n-1:0] PO_1;
output [n-1:0] PO_0;
output [l-1:0] PPO_1;
output [l-1:0] PPO_0;
wire G3_1, G2_1, G1_1, G0_1, G3_0, G2_0, G1_0, G0_0, G7_z1, G6_z1, G5_z1, G7_z0, G6_z0, G5_z0;
assign {G3_1, G2_1, G1_1, G0_1} = PI_1;
assign {G3_0, G2_0, G1_0, G0_0} = PI_0;
assign {G7_z1, G6_z1, G5_z1} = PPI_1;
assign {G7_z0, G6_z0, G5_z0} = PPI_0;
wire G14_z1 = G0_0;
wire G14_z0 = G0_1;
wire G12_base_z1 = G7_z1 | G1_1;
wire G12_base_z0 = G7_z0 & G1_0;
wire G12_z1 = G12_base_z0;
wire G12_z0 = G12_base_z1;
wire G8_z1 = G6_z1 & G14_z1;
wire G8_z0 = G6_z0 | G14_z0;
wire G16_z1 = G8_z1 | G3_1;
wire G16_z0 = G8_z0 & G3_0;
wire G15_z1 = G8_z1 | G12_z1;
wire G15_z0 = G8_z0 & G12_z0;
wire G13_base_z1 = G12_z1 | G2_1;
wire G13_base_z0 = G12_z0 & G2_0;
wire G13_z1 = G13_base_z0;
wire G13_z0 = G13_base_z1;
wire G9_base_z1 = G15_z1 & G16_z1;
wire G9_base_z0 = G15_z0 | G16_z0;
wire G9_z1 = G9_base_z0;
wire G9_z0 = G9_base_z1;
wire G11_base_z1 = G9_z1 | G5_z1;
wire G11_base_z0 = G9_z0 & G5_z0;
wire G11_z1 = G11_base_z0;
wire G11_z0 = G11_base_z1;
wire G10_base_z1 = G11_z1 | G14_z1;
wire G10_base_z0 = G11_z0 & G14_z0;
wire G17_z1 = G11_z0;

wire G17_z0 = G11_z1;
wire G10_z1 = G10_base_z0;
wire G10_z0 = G10_base_z1;
assign STATE_1 = {G3_1, G2_1, G1_1, G0_1, G7_z1, G6_z1, G5_z1, G14_z1, G12_base_z1, G12_z1, G8_z1, G16_z1, G15_z1, G13_base_z1, G13_z1, G9_base_z1, G9_z1, G11_base_z1, G11_z1, G10_base_z1, G17_z1, G10_z1};
assign STATE_0 = {G3_0, G2_0, G1_0, G0_0, G7_z0, G6_z0, G5_z0, G14_z0, G12_base_z0, G12_z0, G8_z0, G16_z0, G15_z0, G13_base_z0, G13_z0, G9_base_z0, G9_z0, G11_base_z0, G11_z0, G10_base_z0, G17_z0, G10_z0};
assign PO_1 = {G17_z1};
assign PO_0 = {G17_z0};
assign PPO_1 = {G13_z1, G11_z1, G10_z1};
assign PPO_0 = {G13_z0, G11_z0, G10_z0};
endmodule

Table A.2: Forward Network Verilog for s27 Benchmark Circuit

The Backward Network is more complex in its relationship back to the input circuit, since each input gate maps to multiple Backward Network gates, and special circuitry to handle fanout conditions needs to be inserted. The same base names from the input circuit are still used for the related gates in the Backward Network, though a number of post-fixes are used to handle the one-to-many mapping. Table A.3 shows the generated Backward Network code for s27.

module back_net (priority_reset, PO_1, PO_0, PPO_1, PPO_0, STATE_1, STATE_0, PI_1, PI_0, PPI_1, PPI_0);
parameter n = 1;
parameter m = 4;
parameter l = 3;
parameter s = 20;
input priority_reset;
input [n-1:0] PO_1;
input [n-1:0] PO_0;
input [l-1:0] PPO_1;
input [l-1:0] PPO_0;
input [s-1:0] STATE_1;
input [s-1:0] STATE_0;
output [m-1:0] PI_1;
output [m-1:0] PI_0;
output [l-1:0] PPI_1;
output [l-1:0] PPI_0;
wire G17_zo1, G17_zo0, G3_1, G2_1, G1_1, G0_1, G7_z1, G6_z1, G5_z1, G14_z1, G12_base_z1, G12_z1, G8_z1, G16_z1, G15_z1, G13_base_z1, G13_z1, G9_base_z1, G9_z1, G11_base_z1, G11_z1, G10_base_z1, G17_z1, G10_z1, G3_0, G2_0, G1_0, G0_0, G7_z0, G6_z0, G5_z0, G14_z0, G12_base_z0, G12_z0, G8_z0, G16_z0, G15_z0, G13_base_z0, G13_z0, G9_base_z0, G9_z0, G11_base_z0, G11_z0, G10_base_z0, G17_z0, G10_z0, G13_zo1, G11_zo1, G10_zo1, G13_zo0, G11_zo0, G10_zo0;
assign {G17_zo1} = PO_1;
assign {G17_zo0} = PO_0;
assign {G13_zo1, G11_zo1, G10_zo1} = PPO_1;
assign {G13_zo0, G11_zo0, G10_zo0} = PPO_0;

assign {G3_1, G2_1, G1_1, G0_1, G7_z1, G6_z1, G5_z1, G14_z1, G12_base_z1, G12_z1, G8_z1, G16_z1, G15_z1, G13_base_z1, G13_z1, G9_base_z1, G9_z1, G11_base_z1, G11_z1, G10_base_z1, G17_z1, G10_z1} = STATE_1;
assign {G3_0, G2_0, G1_0, G0_0, G7_z0, G6_z0, G5_z0, G14_z0, G12_base_z0, G12_z0, G8_z0, G16_z0, G15_z0, G13_base_z0, G13_z0, G9_base_z0, G9_z0, G11_base_z0, G11_z0, G10_base_z0, G17_z0, G10_z0} = STATE_0;
wire G17_zo1_1 = G17_zo1;
wire G17_zo0_1 = G17_zo0;
wire G10_base_zo1 = G10_zo1 & G10_base_z1 & G10_base_z0;
wire G10_base_zo0 = ~G10_zo0;
wire G11_zo1_1 = G10_base_zo1 & G11_z1 & G11_z0;
wire G11_zo0_1 = G10_base_zo0;
wire G14_zo1 = G10_base_zo1 & ~G10_base_zo0 & G14_z1 & G14_z0;
wire G14_zo0 = G10_base_zo0;
reg G17_priority_0;
reg G17_priority_1;
reg G17_priority_last_reset;
always @(priority_reset, G17_zo1, G17_priority_1, G17_priority_0, G17_zo1_1) begin
if (G17_priority_last_reset != priority_reset) begin
G17_priority_0 = 1'b0;
G17_priority_1 = 1'b0;
G17_priority_last_reset = priority_reset;
end else begin
G17_priority_0 = G17_zo1 & ~G17_priority_1;
G17_priority_1 = ~G17_priority_0 & G17_zo1_1;
end
end
wire G17_zo1_2 = G17_priority_0 | G17_priority_1;
wire G17_zo0_2 = (G17_priority_0 & G17_zo0) | (G17_priority_1 & G17_zo0_1);
wire G11_zo1_2 = G17_zo1_2 & G11_z1 & G11_z0;
wire G11_zo0_2 = ~G17_zo0_2;
reg G11_priority_0;
reg G11_priority_1;
reg G11_priority_2;
reg G11_priority_last_reset;
always @(priority_reset, G11_zo1, G11_priority_1, G11_priority_2, G11_priority_0, G11_zo1_1, G11_priority_2, G11_priority_0, G11_priority_1, G11_zo1_2) begin
if (G11_priority_last_reset != priority_reset) begin
G11_priority_0 = 1'b0;
G11_priority_1 = 1'b0;
G11_priority_2 = 1'b0;
G11_priority_last_reset = priority_reset;
end else begin
G11_priority_0 = G11_zo1 & ~G11_priority_1 & ~G11_priority_2;
G11_priority_1 = ~G11_priority_0 & G11_zo1_1 & ~G11_priority_2;
G11_priority_2 = ~G11_priority_0 & ~G11_priority_1 & G11_zo1_2;
end
end
wire G11_zo1_3 = G11_priority_0 | G11_priority_1 | G11_priority_2;
wire G11_zo0_3 = (G11_priority_0 & G11_zo0) | (G11_priority_1 & G11_zo0_1) | (G11_priority_2 & G11_zo0_2);
wire G11_base_zo1 = G11_zo1_3 & G11_base_z1 & G11_base_z0;
wire G11_base_zo0 = ~G11_zo0_3;
wire G9_zo1 = G11_base_zo1 & G9_z1 & G9_z0;
wire G9_zo0 = G11_base_zo0;
wire G5_o1_0 = G11_base_zo1 & ~G11_base_zo0 & G5_z1 & G5_z0;
wire G5_o0_0 = G11_base_zo0;
wire G9_base_zo1 = G9_zo1 & G9_base_z1 & G9_base_z0;
wire G9_base_zo0 = ~G9_zo0;
wire G13_base_zo1 = G13_zo1 & G13_base_z1 & G13_base_z0;

wire G13_base_zo0 = ~G13_zo0;
wire G15_zo1 = G9_base_zo1 & G15_z1 & G15_z0;
wire G15_zo0 = G9_base_zo0;
wire G16_zo1 = G9_base_zo1 & G9_base_zo0 & G16_z1 & G16_z0;
wire G16_zo0 = G9_base_zo0;
wire G8_zo1 = G16_zo1 & G8_z1 & G8_z0;
wire G8_zo0 = G16_zo0;
wire G3_o1_0 = G16_zo1 & ~G16_zo0 & G3_1 & G3_0;
wire G3_o0_0 = G16_zo0;
wire G8_zo1_1 = G15_zo1 & G8_z1 & G8_z0;
wire G8_zo0_1 = G15_zo0;
wire G12_zo1 = G15_zo1 & ~G15_zo0 & G12_z1 & G12_z0;
wire G12_zo0 = G15_zo0;
wire G12_zo1_1 = G13_base_zo1 & G12_z1 & G12_z0;
wire G12_zo0_1 = G13_base_zo0;
wire G2_o1_0 = G13_base_zo1 & ~G13_base_zo0 & G2_1 & G2_0;
wire G2_o0_0 = G13_base_zo0;
reg G12_priority_0;
reg G12_priority_1;
reg G12_priority_last_reset;
always @(priority_reset, G12_zo1, G12_priority_1, G12_priority_0, G12_zo1_1) begin
if (G12_priority_last_reset != priority_reset) begin
G12_priority_0 = 1'b0;
G12_priority_1 = 1'b0;
G12_priority_last_reset = priority_reset;
end else begin
G12_priority_0 = G12_zo1 & ~G12_priority_1;
G12_priority_1 = ~G12_priority_0 & G12_zo1_1;
end
end
wire G12_zo1_2 = G12_priority_0 | G12_priority_1;
wire G12_zo0_2 = (G12_priority_0 & G12_zo0) | (G12_priority_1 & G12_zo0_1);
wire G12_base_zo1 = G12_zo1_2 & G12_base_z1 & G12_base_z0;
wire G12_base_zo0 = ~G12_zo0_2;
reg G8_priority_0;
reg G8_priority_1;
reg G8_priority_last_reset;
always @(priority_reset, G8_zo1, G8_priority_1, G8_priority_0, G8_zo1_1) begin
if (G8_priority_last_reset != priority_reset) begin
G8_priority_0 = 1'b0;
G8_priority_1 = 1'b0;
G8_priority_last_reset = priority_reset;
end else begin
G8_priority_0 = G8_zo1 & ~G8_priority_1;
G8_priority_1 = ~G8_priority_0 & G8_zo1_1;
end
end
wire G8_zo1_2 = G8_priority_0 | G8_priority_1;
wire G8_zo0_2 = (G8_priority_0 & G8_zo0) | (G8_priority_1 & G8_zo0_1);
wire G6_o1_0 = G8_zo1_2 & G6_z1 & G6_z0;
wire G6_o0_0 = G8_zo0_2;
wire G14_zo1_1 = G8_zo1_2 & G8_zo0_2 & G14_z1 & G14_z0;
wire G14_zo0_1 = G8_zo0_2;
reg G14_priority_0;
reg G14_priority_1;
reg G14_priority_last_reset;
always @(priority_reset, G14_zo1, G14_priority_1, G14_priority_0, G14_zo1_1) begin
if (G14_priority_last_reset != priority_reset) begin
G14_priority_0 = 1'b0;
G14_priority_1 = 1'b0;

G14_priority_last_reset = priority_reset;
end else begin
G14_priority_0 = G14_zo1 & ~G14_priority_1;
G14_priority_1 = ~G14_priority_0 & G14_zo1_1;
end
end
wire G14_zo1_2 = G14_priority_0 | G14_priority_1;
wire G14_zo0_2 = (G14_priority_0 & G14_zo0) | (G14_priority_1 & G14_zo0_1);
wire G0_o1_0 = G14_zo1_2 & G0_1 & G0_0;
wire G0_o0_0 = ~G14_zo0_2;
wire G7_o1_0 = G12_base_zo1 & G7_z1 & G7_z0;
wire G7_o0_0 = G12_base_zo0;
wire G1_o1_0 = G12_base_zo1 & ~G12_base_zo0 & G1_1 & G1_0;
wire G1_o0_0 = G12_base_zo0;
wire G3_o1 = G3_o1_0;
wire G3_o0 = G3_o0_0;
wire G2_o1 = G2_o1_0;
wire G2_o0 = G2_o0_0;
wire G1_o1 = G1_o1_0;
wire G1_o0 = G1_o0_0;
wire G0_o1 = G0_o1_0;
wire G0_o0 = G0_o0_0;
wire G7_zo1 = G7_o1_0;
wire G7_zo0 = G7_o0_0;
wire G6_zo1 = G6_o1_0;
wire G6_zo0 = G6_o0_0;
wire G5_zo1 = G5_o1_0;
wire G5_zo0 = G5_o0_0;
assign PI_1 = {G3_o1, G2_o1, G1_o1, G0_o1};
assign PI_0 = {G3_o0, G2_o0, G1_o0, G0_o0};
assign PPI_1 = {G7_zo1, G6_zo1, G5_zo1};
assign PPI_0 = {G7_zo0, G6_zo0, G5_zo0};
endmodule

Table A.3: Backward Network Verilog for s27 Benchmark Circuit

One piece of repeated code that is important to note in the Backward Network is the priority encoder logic used to handle logical fanout when translating the network into the reverse direction. Table A.4 shows an excerpt from the Backward Network code which implements the priority encoder.

...
assign {G13_zo1, G11_zo1, G10_zo1} = PPO_1;
assign {G13_zo0, G11_zo0, G10_zo0} = PPO_0;
...
wire G11_zo1_1 = G10_base_zo1 & G11_z1 & G11_z0;
wire G11_zo0_1 = G10_base_zo0;
...
wire G11_zo1_2 = G17_zo1_2 & G11_z1 & G11_z0;
wire G11_zo0_2 = ~G17_zo0_2;
reg G11_priority_0;
reg G11_priority_1;
reg G11_priority_2;
reg G11_priority_last_reset;
always @(priority_reset, G11_zo1, G11_priority_1, G11_priority_2, G11_priority_0, G11_zo1_1, G11_priority_2, G11_priority_0, G11_priority_1, G11_zo1_2) begin
if (G11_priority_last_reset != priority_reset) begin
G11_priority_0 = 1'b0;
G11_priority_1 = 1'b0;
G11_priority_2 = 1'b0;
G11_priority_last_reset = priority_reset;
end else begin
G11_priority_0 = G11_zo1 & ~G11_priority_1 & ~G11_priority_2;
G11_priority_1 = ~G11_priority_0 & G11_zo1_1 & ~G11_priority_2;
G11_priority_2 = ~G11_priority_0 & ~G11_priority_1 & G11_zo1_2;
end
end
wire G11_zo1_3 = G11_priority_0 | G11_priority_1 | G11_priority_2;
wire G11_zo0_3 = (G11_priority_0 & G11_zo0) | (G11_priority_1 & G11_zo0_1) | (G11_priority_2 & G11_zo0_2);
...

Table A.4: Backward Network Priority Encoder Verilog Example

Each time the translation code processes a gate input while generating the Backward Network, the name of the driving cell is recorded. If another instance of a gate input sourced by the same driver cell is encountered, the next incremental post-fix is selected to reference that version of the driver cell output in the backward model. Once processing arrives at a driving gate that has multiple sinks, and thus multiple versions of its output, those signals need to be resolved into a single signal so that propagation can continue through the Backward Network. At this point, a priority encoder is instantiated.

The priority encoder is modeled as a block that triggers on any change in the required-objective bits of any of the versions of the driver cell output signal (or on the global priority_reset signal used for clearing the Backward Network). When the first change occurs on those signals, the priority bits in the encoder lock in, preventing any further changes until a priority reset occurs. The final portion of the priority encoder sits outside the detect/lock block: the following two wire statements define the merged driver cell signal. The zo1 (required objective) bit is formed by ORing together all of the priority signals; since one of them locks in at 1 as soon as the first objective arrives, the final output also locks in at 1. The zo0 (objective value) bit is formed by ANDing each objective value with its priority bit and then ORing the results together. Because the AND passes an objective value through only when its priority input is 1, only the term with the locked-in priority bit is passed (all other terms are 0), so the single locked-in objective propagates into the final merged objective value bit. A simplified, generic form of this merge is sketched below.
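The sketch strips the pattern of Table A.4 down to a single driver with two fanout branches; the signal and module names are illustrative, while the generated code follows the same shape with per-gate names and as many branches as the net has sinks.

// Stripped-down, generic form of the fanout-merge pattern in Table A.4, for
// two backward-network versions of one driver output. Names are illustrative.
module fanout_merge2 (
  input  priority_reset,
  input  br0_zo1, br0_zo0,   // required-objective / objective-value, branch 0
  input  br1_zo1, br1_zo0,   // required-objective / objective-value, branch 1
  output m_zo1,  m_zo0       // merged driver objective
);
  reg p0, p1, last_reset;
  // Lock in whichever branch raises its required-objective bit first; hold it
  // until the global priority_reset toggles.
  always @(priority_reset, br0_zo1, br1_zo1, p0, p1) begin
    if (last_reset != priority_reset) begin
      p0 = 1'b0;
      p1 = 1'b0;
      last_reset = priority_reset;
    end else begin
      p0 = br0_zo1 & ~p1;
      p1 = ~p0 & br1_zo1;
    end
  end
  // Merge: the required bit is the OR of the locked priority bits; the value
  // bit is the winning branch's value, gated by its priority bit.
  assign m_zo1 = p0 | p1;
  assign m_zo0 = (p0 & br0_zo0) | (p1 & br1_zo0);
endmodule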

Appendix B: Example DONE Simulation for c17 Benchmark Circuit

The following results are taken from simulation of the c17 benchmark circuit using Mentor Graphics QuestaSim. For reference, the logical structure of c17 is shown in Figure B.1. Note that as a c* class benchmark circuit, it has no sequential elements, and thus no PPIs/PPOs. Also note that the model of the circuit used in the verification architecture uses separate AND + NOT structures in place of the NANDs defined in the base circuit; for the sake of diagram simplicity, these remain abstracted as single NAND gates.

Figure B.1: c17 Benchmark Circuit Structure

One final note about the structure of the circuit: in this simulation, a property monitor is used which ANDs together all POs, effectively translating the test of line_k=1 into testing whether all POs can be 1 at the same time. Thus, the structure being simulated is the one shown in Figure B.2.

Figure B.2: c17 Simulation Circuit Structure

To begin simulation of the circuit, the clk input is defined with a 100ps period (first rising edge arriving at +50ps) and the global_reset signal is pulsed high for 5ps to initialize the circuit:

force -freeze sim:/top/clk 0 0, 1 {50 ps} -r 100
force -freeze sim:/top/global_reset 1 0 -cancel 5

Table B.1: Circuit c17 Simulation Input Stimulus

When global_reset goes high, the first operation in the circuit is executed, with all modules resetting to their initial states.

# fourcounter reseting the frame counter
# statecheck reset
# obj-dec RESET
# back-bec RESET
# back-enc RESET
# fourcounter global reset
# PPI Decision Block / Reset of PPI Decision Block
# memram ADDRESS=0=
# PPI Decision Block / Reset of PPI Decision Block
# PPI Decision Block / Reset of PPI Decision Block
# fourcounter global reset
# mycontrol executing state 0

Table B.2: Circuit c17 Initial Reset
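An equivalent stimulus can also be expressed as a small testbench. The sketch below assumes the emulation top level is named top with ports clk and global_reset (names taken from the force commands and the simulation transcript); the real integration may drive these signals differently.

// Testbench-style equivalent of the Table B.1 force commands: 100 ps clock
// period (first rising edge at +50 ps) and a 5 ps global_reset pulse.
`timescale 1ps/1ps
module tb;
  reg clk = 1'b0;
  reg global_reset = 1'b0;
  always #50 clk = ~clk;          // 100 ps period, first rising edge at +50 ps
  initial begin
    global_reset = 1'b1;          // hold reset high for the first 5 ps
    #5 global_reset = 1'b0;
  end
  // top dut (.clk(clk), .global_reset(global_reset), ...);  // illustrative hookup
endmodule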

As part of this reset, the values sent from the PI/PPI Decision Block to the Forward Network are cleared to all Xs (2'b11). In the first clock cycle, the Objective Decision Block therefore sees that it is in frame_k with line_k=X, so an objective of line_k=1 is set and pushed to the Backward Network Decoder.

# mycontrol in state 0
# obj-dec line_k = X
# obj-dec push obj:

Table B.3: Circuit c17 Simulation Cycle 1

The Backward Network Decoder receives this objective value and begins the process of a Backtrace on the Backward Network. This involves two cycles of operation. In the first, the Backward Network is cleared, so that all priority encoders are unlocked and new propagation can occur. In the second cycle, the objective is pushed onto the Backward Network, and the Backward Network Encoder is signaled that a Backtrace operation has started via toggling of the trace_start signal.

# mycontrol in state 0
# back-dec received new obj:
# mycontrol in state 0
# back-dec clearing back-net
# mycontrol in state 0
# back-dec pushing obj onto back-net:

Table B.4: Circuit c17 Simulation Cycles 2-4

Once the objective is pushed onto the Backward Network, tracing takes place in a single cycle, while the Backward Network Encoder waits for trace results. Figure B.3 shows the result of the Backtrace on the circuit structure. Note that in the circuit diagram, the bottom inputs to gates

111 are considered the A input, and as such when a gate output objective only requires one input to be set as an objective, the bottom input will be set. Figure B.3: c17 Simulation Backtrace While this Backtrace is happening, the Backward Network Encoder is waiting for results. Once the results are available, the Backward Network Encoder sets the NReady signal low to indicate to the PI/PPI Decision Block that backtraced results are ready to send to it. It moves to state 1 to prepare to accept a new objective value from the Backward Network. # mycontrol in state 0 # back-enc trace started, waiting for results # mycontrol in state 0 # back-enc traced values received # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 Table B.5: Circuit c17 Simulation Cycles 5-7 From state 1, the PI/PPI Decision Block moves to state 16, where it sets genobj to high, indicating to the Backward Network Encoder that it is ready to accept an objective value. In the following cycle the PI/PPI Decision Block returns to state 0 (idle), while the Backward Network 101

Encoder finds the first objective value on the Backward Network outputs. It finds an objective value of 1 (encoding changed from 11 to 10) at PI index 1, which corresponds to the assignment of G2gat to 1. This objective is pushed onto the in bus and NReady is set low, triggering the PI/PPI Decision Block to action. The PI/PPI Decision Block moves through state 1 and on to state 17.

# mycontrol in state 1
# mycontrol executing state 16
# mycontrol in state 16
# back-enc found obj at PI index : 10
# mycontrol executing state 0
# mycontrol in state 0
# back-enc reset nready signal
# mycontrol executing state 1
# memram beginning push
# memram ADDRESS=0=
# mycontrol in state 1
# mycontrol executing state 17
# memram pushing on to ram
# input word=
# memram ADDRESS=li=0000
# mycontrol in state 17
# mycontrol executing state 10
# PPI Decision Block Assign vf (from top)
# addr=
# val=10

Table B.6: Circuit c17 Simulation Cycles 8-12

In state 17, the PI/PPI Decision Block pushes the value on in onto the Block RAM, which then contains its first value, as shown in Figure B.4.

113 Figure B.4: Contents of c17 Block RAM After Objective 1 The PI/PPI Decision Block then moves on to state 10, where the same objective value is loaded into the Forward Network input buffer. At the same time, genobj is set, signaling the Backward Network Encoder that the PI/PPI Decision Block will be ready to accept another objective value. The PI/PPI Decision Block then returns to state 0, and the same cycle that just completed runs 2 more times to pass on the objective values set on G6gat and G7gat. # mycontrol in state 10 # back-enc found obj at PI index : 01 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0001 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=01 # mycontrol in state 10 # back-enc found obj at PI index : 10 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 103

# memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0010 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=10
Table B.7: Circuit c17 Simulation Cycles
After the final objective value has been passed from the Backward Network Encoder to the PI/PPI Decision Block, genobj is again set high. This time, the Backward Network Encoder has no more objectives to pass. Detecting that it is done, it sets the in bus to all 1s, indicating no value, and again triggers the PI/PPI Decision Block by setting NReady to 0. Receiving this signal that there are no further objectives, the PI/PPI Decision Block pushes the values in the buffer onto the Forward Network, starting a new trace. The newvaluestoforward signal is also toggled to signal the Objective Decision Block that a new trace has started.
# mycontrol in state 10 # back-enc no objectives; done # PPI Decision Block Assign vf (from top) # addr= # val=10 # mycontrol executing state 0 # PPI Decision Block Sending Values to Forward Network
Table B.8: Circuit c17 Simulation Cycle 21
The same three objective values that were backtraced are pushed onto the Forward Network, which propagates the values from PI to PO. The flow of this trace on the circuit structure is shown in Figure B.5.

Figure B.5: c17 Simulation Trace / Implication

In the next cycle, the Objective Decision Block sees that newvaluestoforward has toggled, and checks the values on the Forward Network. The PO value of final_zo has changed from X (2'b11) to 1 (2'b10), indicating a successful trace. Since the circuit is in frame_1 (circuits without sequential elements only operate in a single frame; frame_k=frame_1), the Objective Decision Block checks the Forward Network state against the objective of line_k=1. Since line_k (final_zo) is set to 1, the final objective in frame_1 has been satisfied. The Done signal is set high, indicating a done state, and the newframe_ready signal is toggled, indicating that the current frame is complete (in a sequential circuit this would mean moving to the next frame).

# mycontrol in state 0
# obj-dec line_k = 1
# obj-dec DONE with frame
# mycontrol in state 0
# back-enc forwarding newframe nready
# DONE!!
# cycle= 23 FAIL=0 DONE=1
# back-enc recovering nready in idle

Table B.9: Circuit c17 Simulation Cycles
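For reference while reading these traces, the sketch below spells out the two-bit value encoding seen on the Forward Network outputs. The codes 2'b11 (X) and 2'b10 (logic 1) are stated above; 2'b01 for logic 0 is an inference from the z1/z0 bit-pair convention, and the module name is illustrative.

// Reference decode of the two-bit (z1/z0) value encoding used in the traces.
// 2'b11 = X and 2'b10 = 1 are stated in the text; 2'b01 = 0 is inferred.
module value_decode (
  input  [1:0] val,     // {z1, z0}
  output       is_x,    // unassigned / don't-care
  output       is_one,
  output       is_zero
);
  assign is_x    = (val == 2'b11);
  assign is_one  = (val == 2'b10);
  assign is_zero = (val == 2'b01);
endmodule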

116 In the final cycle, the Backward Network Encoder receives the toggle of the newframe_ready signal, which triggers it to set NReady low, passing on the signal to the PI/PPI Decision Block. The PI/PPI Decision Block receives the NReady signal and also sees that the signals Done=1, Conflict=0 and frame_1=1, indicating that the final objective has been satisfied in frame_1. The PI/PPI Decision Block then moves to the final state 14, where DONE=1 is passed to the global output, completing the simulation. At this point, the full set of PI/PPI assignment vectors required to reproduce the line_k=1 objective are located in the Block RAM for extraction. Figure B.6: Final c17 Block RAM Contents 106

Appendix C: Example FAIL Simulation for s27 Benchmark Circuit

The following results are taken from simulation of the s27 ISCAS89 benchmark circuit using Mentor Graphics QuestaSim. For reference, the logical structure of s27 is shown in Figure C.1.

Figure C.1: s27 Benchmark Circuit Structure

Note that as part of the translation into the Forward and Backward Networks, the sequential elements are removed and converted into PPIs/PPOs of the circuit. In the illustration, the PPIs are located along the bottom of the circuit, using lower-case notation; their corresponding PPOs are along the right side of the circuit, using the same names in upper-case notation. Also note that inverting gates are converted into a non-inverting gate plus a separate NOT gate, but for the sake of illustration simplicity these gates remain singular in this example. The final network structure for tracing in simulation is shown in Figure C.2.

118 Figure C.2: s27 Benchmark Simulation Structure Simulation begins with an input clock defined with a 100ps period (first rising edge at 50ps), and a pulse of global_reset to high for 5ps, to trigger initialization/reset of the circuit. From this point, simulation of the first frame begins. Since the first frame objective is always line_k=1, simulation proceeds exactly as in Appendix B, up until the point that the first frame is complete. Simulation output from the first frame is shown in Table C.1, but is not discussed in detail for this reason. # fourcounter reseting the frame counter # statecheck reset # obj-dec RESET # back-bec RESET # back-enc RESET # fourcounter global reset # PPI Decision Block / Reset of PPI Decision Block # memram ADDRESS=0= # PPI Decision Block / Reset of PPI Decision Block # PPI Decision Block / Reset of PPI Decision Block # fourcounter global reset # mycontrol executing state 0 # mycontrol in state 0 # obj-dec line_k = X # obj-dec push obj: # mycontrol in state 0 # back-dec received new obj:

119 # mycontrol in state 0 # back-dec clearing back-net # mycontrol in state 0 # back-dec pushing obj onto back-net: # mycontrol in state 0 # back-enc trace started, waiting for results # mycontrol in state 0 # back-enc traced values received # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # mycontrol in state 1 # mycontrol executing state 16 # mycontrol in state 16 # back-enc found obj at PPI index : 01 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0000 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=01 # mycontrol in state 10 # back-enc found obj at PPI index : 10 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal 109

120 # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0001 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=10 # mycontrol in state 10 # back-enc no objectives; done # PPI Decision Block Assign vf (from top) # addr= # val=10 # mycontrol executing state 0 # PPI Decision Block Sending Values to Forward Network # mycontrol in state 0 # obj-dec line_k = 1 # obj-dec DONE with frame # mycontrol in state 0 # back-enc forwarding newframe nready # mycontrol in state 0 # back-enc recovering nready in idle # mycontrol executing state 15 Table C.1: Circuit s27 Simulation Cycles 1-20 The only difference in the first frame between s27 and Appendix B is the result of the Backtrace operation in the Backward Network, and hence the set of values passed to the PI/PPI Decision Block to be stored in the Block RAM. The Backtrace of the circuit for the first processed frame (frame_k; line_k=1) is shown in Figure C.3, where G17 is line_k, being the only PO of the circuit. 110

121 Figure C.3: s27 Simulation Frame k Backtrace At this point, the Block RAM contains two entries, storing the two PPI objective values that were backtraced. These contents are shown in Figure C.4. Note that the top bit for the last objective read onto the memory is currently 0. Figure C.4: Contents of Block RAM After Frame k This is where processing diverges from the example in Appendix B. Since sequential elements exist, the current frame is k, but not 1. Thus, the PI/PPI Decision Block, having PPIs to justify, moves to state 15 to begin a move to frame k-1. From here processing moves to state 3, where each PPI assignment from the previous frame is read from the RAM onto the ppi bus to the Objective Decision Block. The Objective Decision Block reads these values into the RF. # mycontrol in state 15 # mycontrol executing state 3 # memram ADDRESS=addr= # Reading the ram # mycontrol in state 3 (Ti-1) 111

# memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # isppi evaluating output to RF; in= # mycontrol in state 3 (Ti-1) # obj-dec read in PPI: # memram ram reading out - = # memram ADDRESS=li=0001 # Reading the ram # isppi evaluating output to RF; in= # mycontrol in state 3 (Ti-1) # obj-dec read in PPI: # mycontrol executing state 8 # fourcounter counting a frame DOWN # memram setting Top in RAM
Table C.2: Circuit s27 Simulation Cycles
Within the PI/PPI Decision Block, the frame counter counts down by 1. After this, the top mark bit is set on the last objective from this frame. This process requires reading from and writing back to the memory over multiple cycles, and thus it occurs concurrently with the other processes over the next 3 cycles. The contents of the Block RAM after this operation are shown in Figure C.5.
Figure C.5: Contents of Block RAM After Move to Frame k-1
The PI/PPI Decision Block then finishes the Move to Ti-1 operation by clearing the Forward Network and toggling newvaluestoforward, which triggers the Objective Decision Block to action.

123 # mycontrol in state 8 # mycontrol executing state 11 # PPI Decision Block Clearing ValuestoForward # mycontrol in state 11 # obj-dec not frame k, check for conflict/done # obj-dec push obj: # mycontrol executing state 4 # memram beginning rewrite1 # memram ADDRESS=li=0001 Table C.3: Circuit s27 Simulation Cycles The Objective Decision Block again has all Xs from the output of the cleared Forward Network, but this time it is no longer frame_k, so the contents of the RF (PPI objectives from previous frame; PPO objectives for this frame) must be checked against the current Forward Network PPOs to determine the state of the frame. The current contents of the RF are shown in Figure C.6. Figure C.6: Contents of RF in Frame k-1 Since all PPOs are currently X, the first objective value in the RF is selected to be the next objective, and pushed to the Backward Network Decoder. # mycontrol in state 4 # back-dec received new obj: # mycontrol executing state 0 # memram setting the flag in RAM # mycontrol in state 0 # back-dec clearing back-net # mycontrol in state 0 # back-dec pushing obj onto back-net: # mycontrol in state 0 113

# back-enc trace started, waiting for results
Table C.4: Circuit s27 Simulation Cycles
The Backward Network Decoder receives the new objective and starts by clearing the Backward Network of the values locked into the priority encoders from the previous Backtrace. Once complete, the new objective is pushed into the Backward Network, and the trace_start signal is toggled, triggering the Backward Network Encoder to begin waiting for Backtrace results. The first Backtrace in the current frame (k-1) is shown in Figure C.7.
Figure C.7: s27 Simulation Frame k-1 Backtrace 1
The Backtrace completes, and again the objective values are passed on to the PI/PPI Decision Block, where they are pushed into the Block RAM. # mycontrol in state 0 # back-enc traced values received # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # mycontrol in state 1 # mycontrol executing state 16

125 # mycontrol in state 16 # back-enc found obj at PPI index : 01 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0010 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=01 # mycontrol in state 10 # back-enc found obj at PPI index : 10 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0011 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=10 Table C.5: Circuit s27 Simulation Cycles

At this point the Block RAM contains a completed frame k assignment, and a partial assignment for frame k-1 (only the justification of the first objective has been completed), as shown in Figure C.8.
Figure C.8: Contents of RAM in Frame k-1 with Backtrace 1
After receiving the final objective, the PI/PPI Decision Block pushes the received values onto the Forward Network to complete the Imply operation and generate the new objective's resultant PO/PPO values. The PI/PPI Decision Block also toggles the newvaluestoforward signal, triggering the Objective Decision Block to take action. This forward trace is shown in Figure C.9.
Figure C.9: s27 Simulation Frame k-1 Imply 1

127 # mycontrol in state 10 # back-enc no objectives; done # PPI Decision Block Assign vf (from top) # addr= # val=10 # mycontrol executing state 0 # PPI Decision Block Sending Values to Forward Network # mycontrol in state 0 # obj-dec not frame k, check for conflict/done # obj-dec push obj: Table C.6: Circuit s27 Simulation Cycles In the following cycle, the Objective Decision Block again checks the state of the current frame. This time, the first value in the RF is satisfied by having an equal assignment on its corresponding PPO from the Forward Network, as shown in Figure C.10. Figure C.10: Contents of RF and PO/PPO in Frame k-1, Imply 1 Thus, the Objective Decision Block selects the second (and final) value in the RF as the objective for further justification, as its corresponding PPO value from the Forward Network is still X. The objective value is pushed to the Backward Network Decoder, which again clears the Backward Network for a new Backtrace operation. The objective is then pushed onto the Backward Network, and trace_start is toggled, triggering the Backward Network Encoder to begin waiting for Backtrace results. 117
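As an aside, the per-entry check described here — an RF objective is satisfied when its Forward Network PPO carries the same non-X value, and still needs justification while that PPO is X — can be written compactly in the two-bit encoding. The sketch below is illustrative only; the signal and module names are not taken from the implementation code in the appendices.

// Illustrative check of one RF entry against its Forward Network PPO, using
// the two-bit (z1/z0) encoding. Names are illustrative only.
module rf_entry_check (
  input  rf_z1,  rf_z0,    // stored PPO objective for this frame
  input  ppo_z1, ppo_z0,   // current Forward Network PPO value
  output needs_just,       // PPO still X: select this entry for backtrace
  output satisfied         // PPO carries the same non-X value
);
  wire ppo_is_x = ppo_z1 & ppo_z0;   // 2'b11 = X (unassigned)
  assign needs_just = ppo_is_x;
  assign satisfied  = ~ppo_is_x & (ppo_z1 == rf_z1) & (ppo_z0 == rf_z0);
endmodule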

128 # mycontrol in state 0 # back-dec received new obj: # mycontrol in state 0 # back-dec clearing back-net # mycontrol in state 0 # back-dec pushing obj onto back-net: # mycontrol in state 0 # back-enc trace started, waiting for results Table C.7: Circuit s27 Simulation Cycles Note that this time in Backtrace, the Forward Network is not cleared (all Xs), so there are current STATE values for each gate that factor in to whether or not an objective will continue propagating in the Backtrace. These values are shown as blue in the Backtrace 2 illustration, Figure C.11. Figure C.11: s27 Simulation Frame k-1 Backtrace 2 The value being backtraced on G7 does eventually interact with a current STATE value from the Forward Network, at G12. Here the gate is already assigned a value of 0 in the Forward Network. Since the current value is not X, the Backtrace stops on this path. Since the values are the same, the justification for this part of the path was already completed and further 118

129 evaluation along this path is not required. If the values were not equal, that would present a conflict. Evaluation on the path would still stop, but a conflict would then be detected by the Objective Decision Block in the subsequent Imply operation on the Forward Network, as a complete assignment for the backtraced objective will not have been generated. As shown, one objective is found in the second Backtrace. This value is received by the encoder and passed to the PI/PPI Decision Block. The PI/PPI Decision Block stores this value in memory, and adds it into the current Forward Network output buffer (in addition to the values already in place from the first forward trace). # mycontrol in state 0 # back-enc traced values received # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # mycontrol in state 1 # mycontrol executing state 16 # mycontrol in state 16 # back-enc found obj at PI index : 01 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0100 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) 119

130 # addr= # val=01 # mycontrol in state 10 # back-enc no objectives; done # PPI Decision Block Assign vf (from top) # addr= # val=01 # mycontrol executing state 0 # PPI Decision Block Sending Values to Forward Network Table C.8: Circuit s27 Simulation Cycles Once it is determined that there are no more objective values from the Backward Network Encoder, the current values in the buffer are pushed onto the Forward Network and newvaluestoforward is toggled to inform the Objective Decision Block that a trace has started and action will be necessary. This second trace / Imply operation is shown in Figure C.12. Figure C.12: s27 Simulation Frame k-1 Imply 2 In the next cycle, the Imply operation is complete and updated values are available on the output of the Forward Network. The current PPO values are checked against their associated values in the RF, as shown in Figure C.13. This time, both values in the RF are satisfied by their corresponding Forward Network PPO values, so the Done signal is set and newframe_ready is 120

131 toggled, making the Backward Network encoder signal the PI/PPI Decision Block to take action via NReady. # mycontrol in state 0 # obj-dec not frame k, check for conflict/done # obj-dec DONE with frame # obj-dec signal newframe_ready # mycontrol in state 0 # back-enc forwarding newframe nready Table C.9: Circuit s27 Simulation Cycles Figure C.13: Contents of RF and PO in Frame k-1, Imply 2 Since Done is asserted, it is not frame_1, and there are no PPOs in the current frame to be further justified, processing must move back yet another frame, to k-2. This time, though, before proceeding, the PI/PPI Decision Block goes to state 21. This is the State Check, which is run for each Move to T i-1 operation beyond frame_k. # mycontrol in state 0 # back-enc recovering nready in idle # mycontrol executing state 21 # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state

132 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck end of frame compare (Addr= ) # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=li=0100 # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram setting Top in RAM # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck end of frame compare (Addr= ) # statecheck check finished with no duplicates # memram ram reading out - =xxxxxxxxxxxxxx # memram ADDRESS=addr= # Reading the ram Table C.10: Circuit s27 Simulation Cycles The Forward Network output buffer contains the PI/PPI assignment that defines the current frame to be locked into memory. To check this, State Check has a separate buffer that each past frame in the memory is read out to. Each cycle, another past objective from the RAM is read out into the State Check buffer. When a top mark bit is hit, it indicates the current RAM entry is the start of a different frame. The current values in the State Check buffer are then compared to 122

133 the Forward Network output buffer. If the first entry in the RAM is reached and the final State Check comparison passes, then no duplicates were found, and the Move to T i-1 process will begin. In this example, the current assignment for frame k-1 is compared to the assignment for frame k. It is found to be different, as shown in Figure C.14, so processing continues. Figure C.14: State Check Comparison for Frame k-1 With the state check complete, the PI/PPI Decision Block continues through the Move to T i-1 operation. Upon seeing the donewithstatecheck signal, the Objective Decision Block clears the values currently in the RF, in preparation for the next frame to begin. The PI/PPI Decision Block then begins passing the PPI objectives from the last frame to the Objective Decision Block, which again stores them in the RF as the PPO objectives for the next frame. # mycontrol in state 21 # obj-dec clearing RF for new frame # mycontrol executing state 15 # memram ram reading out - =xxxxxxxxxxxxxx # memram ADDRESS=addr= # Reading the ram # mycontrol in state 15 # mycontrol executing state 3 # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 3 (Ti-1) # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 3 (Ti-1) # memram ram reading out - = # memram ADDRESS=addr=

134 # Reading the ram # isppi evaluating output to RF; in= # mycontrol in state 3 (Ti-1) # obj-dec read in PPI: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # isppi evaluating output to RF; in= # mycontrol in state 3 (Ti-1) # obj-dec read in PPI: # memram ADDRESS=addr= # Reading the ram # mycontrol in state 3 (Ti-1) # obj-dec read in PPI: # mycontrol executing state 8 # fourcounter counting a frame DOWN # memram ram reading out - = # mycontrol in state 8 # mycontrol executing state 11 # PPI Decision Block Clearing ValuestoForward Table C.11: Circuit s27 Simulation Cycles Note that although three objectives were written to the Block RAM as part of frame k-1, only two objective values are transferred to the RF for frame k-2. This is because one of the objectives in frame k-1 is a PI, which can be set arbitrarily, and thus does not require further justification. Thus, the isppi module filters that objective in the memory while reading, and it is not passed on to the Objective Decision Block. The contents of the RF after these new values have been shifted in can be seen in Figure C.15. Figure C.15: Contents of RF in Frame k-2 124
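The filtering just described can be illustrated with a small sketch. The address-range test below is an assumption made purely for illustration; the actual isppi module may use a different mapping between addresses and PIs/PPIs, and the module and parameter names are not taken from the implementation code.

// Illustrative only: one simple way an isppi-style filter could distinguish PI
// objectives (settable arbitrarily, no further justification needed) from PPI
// objectives that must be carried into the previous frame as PPO objectives.
module isppi_filter #(
  parameter ADDR_BITS = 10,
  parameter NUM_PI    = 4      // e.g. s27 has 4 PIs
) (
  input  [ADDR_BITS-1:0] addr,   // PI/PPI address field of a stored objective
  output                 is_ppi  // pass this entry on to the RF
);
  // Assumed convention: PIs occupy the low addresses and PPIs follow them.
  assign is_ppi = (addr >= NUM_PI);
endmodule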

135 The PI/PPI Decision Block then clears the Forward Network and proceeds to mark the last objective in the Block RAM with the top bit, indicating the end of the last frame. The contents of the RAM at this point can be seen in Figure C.16. Figure C.16: Contents of Block RAM at Frame k-2 Start At the same time that the top bit is being marked, the Objective Decision Block has again been triggered to action. All PO/PPOs from the Forward Network are cleared (values of X), and it is not frame_1, so an objective from the RF will be selected for justification. The first value is selected and passed to the Backward Network Decoder. The decoder clears the Backward Network, pushes the new objective on, and signals the Backward Network Encoder that a new Backtrace is starting. # mycontrol in state 11 # obj-dec not frame k, check for conflict/done # obj-dec push obj: # mycontrol executing state 4 # memram beginning rewrite1 # memram ADDRESS=li=0100 # mycontrol in state 4 # back-dec received new obj: # mycontrol executing state 0 # memram setting the flag in RAM # mycontrol in state 0 # back-dec clearing back-net # mycontrol in state 0 # back-dec pushing obj onto back-net:

136 # mycontrol in state 0 # back-enc trace started, waiting for results Table C.12: Circuit s27 Simulation Cycles Note that this first objective is the same as the first objective Backtraced as part of frame k-1, and as such, the results of the Backtrace operation will be the same, as shown in Figure C.17. Figure C.17: s27 Simulation Frame k-2 Backtrace 1 The same traced values as before are received and passed back to the PI/PPI Decision Block. These values are again stored in the Block RAM, and pushed onto the Forward Network, leading to the same situation as in frame k-1. # mycontrol in state 0 # back-enc traced values received # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # mycontrol in state 1 # mycontrol executing state 16 # mycontrol in state 16 # back-enc found obj at PPI index : 01 # mycontrol executing state 0 126

137 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0101 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=01 # mycontrol in state 10 # back-enc found obj at PPI index : 10 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0110 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=10 # mycontrol in state 10 # back-enc no objectives; done # PPI Decision Block Assign vf (from top) # addr= # val=10 # mycontrol executing state 0 # PPI Decision Block Sending Values to Forward Network 127

138 # mycontrol in state 0 # obj-dec not frame k, check for conflict/done # obj-dec push obj: # mycontrol in state 0 # back-dec received new obj: # mycontrol in state 0 # back-dec clearing back-net # mycontrol in state 0 # back-dec pushing obj onto back-net: # mycontrol in state 0 # back-enc trace started, waiting for results Table C.13: Circuit s27 Simulation Cycles The second objective to be pushed to the Backward Network is also the same, and the state of the Forward Network is the same, so the second Backtrace operation is also identical, as shown in Figure C.18. Figure C.18: s27 Simulation Frame k-2 Backtrace 2 Again, the same objective is backtraced and returned to the PI/PPI Decision Block, which in turn pushes the update onto the Forward Network. The Objective Decision Block is now in the exact same state as it was in frame k-1. Since both objectives in the RF are satisfied, the Done signal is set and the PI/PPI Decision Block is again triggered that the current frame is done. 128

139 # mycontrol in state 0 # back-enc traced values received # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # mycontrol in state 1 # mycontrol executing state 16 # mycontrol in state 16 # back-enc found obj at PI index : 01 # mycontrol executing state 0 # mycontrol in state 0 # back-enc reset nready signal # mycontrol executing state 1 # memram beginning push # memram ADDRESS=0= # mycontrol in state 1 # mycontrol executing state 17 # memram pushing on to ram # input word= # memram ADDRESS=li=0111 # mycontrol in state 17 # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=01 # mycontrol in state 10 # back-enc no objectives; done # PPI Decision Block Assign vf (from top) # addr= # val=01 # mycontrol executing state 0 # PPI Decision Block Sending Values to Forward Network # mycontrol in state 0 # obj-dec not frame k, check for conflict/done # obj-dec DONE with frame # obj-dec signal newframe_ready # mycontrol in state 0 # back-enc forwarding newframe nready Table C.14: Circuit s27 Simulation Cycles

140 Since it is not frame_k or frame_1 and there are PPI objectives from the last frame, a Move to T i-1 operation is desired. This then again triggers the State Check operation, which begins reading out past frame values to the State Check buffer for comparison vs. the last frame s assignment. # mycontrol in state 0 # back-enc recovering nready in idle # mycontrol executing state 21 # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck end of frame compare (Addr= ) # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - =

141 # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck store value in Out: # memram ram reading out - = # memram ADDRESS=addr= # Reading the ram # mycontrol in state 21 # statecheck end of frame compare (Addr= ) # statecheck duplicate found # memram ram reading out - = # memram ADDRESS=li=0111 # Reading the ram Table C.15: Circuit s27 Simulation Cycles This time in the State Check, a duplicate is found, as frame k-1 was exactly the same as the current frame k-2. The presence of the duplicate indicates a loop in frame assignments, so a Backtrack operation is started. In this case of a State Check failure, the PI/PPI Decision Block goes directly to state 9, which completes a Clear Top operation, removing the last set objective on the RAM. # mycontrol in state 21 # isppi evaluating output to RF; in= # mycontrol executing state 9 # memram setting Top in RAM # memram beginning cleartop # memram ADDRESS=li=0111 # mycontrol in state 9 (cycle 1) # mycontrol executing state 9 # memram clearing top value in RAM # mycontrol in state 9 (cycle 2) # mycontrol executing state 12 # memram ADDRESS=li=0111 # Reading the ram # mycontrol in state 12 (cycle 1) # memram setting Top in RAM # memram ADDRESS=li=0111 # Reading the ram 131

# mycontrol in state 12 (cycle 2) # mycontrol executing state 10 # PPI Decision Block Assign vf (from top) # addr= # val=11 # memram setting Top in RAM # mycontrol in state 10 # mycontrol executing state 2 # memram beginning pop # memram ADDRESS=li=0111 # mycontrol in state 2 # mycontrol executing state 12 # memram popping off of RAM # memram ADDRESS=li=0110 # Reading the ram # mycontrol in state 12 (cycle 1) # memram setting Top in RAM # memram ADDRESS=li=0110 # Reading the ram # mycontrol in state 12 (cycle 2) # mycontrol executing state 18 # memram setting Top in RAM # memram ADDRESS=li=0110 # Reading the ram
Table C.16: Circuit s27 Simulation Cycles
After clearing the value in the RAM, the Forward Network output buffer is also updated to clear the associated value to X. Finally, the top mark bit is moved to the second-to-last objective value from the last frame, as the cleared value is no longer part of that frame. The contents of the Block RAM after this clear operation are shown in Figure C.19.

After the Clear Top operation completes, the PI/PPI Decision Block moves to state 6 to continue the Backtrack with a Swap Value operation. In this process the current top objective in the RAM for the last frame is read out, its value is swapped from 1 to 0, and it is written back into the Block RAM. The contents of the Block RAM after the Swap Value operation are shown in Figure C.20.

# mycontrol in state 18
# mycontrol executing state 6
# memram setting Top in RAM
# memram beginning swapwrite
# memram ADDRESS=li=0110
# mycontrol in state 6 (cycle 1)
# mycontrol executing state 6
# memram swapping values in RAM
# ram_data_in before swap: xx
# memram ram_data_in after swap: xx
# memram ram_address=
# mycontrol in state 6 (cycle 2)
# mycontrol executing state 12
# memram ADDRESS=li=0110
# Reading the ram
# mycontrol in state 12 (cycle 1)
# memram setting Top in RAM
# memram ADDRESS=li=0110
# Reading the ram
# mycontrol in state 12 (cycle 2)
# mycontrol executing state 10
# PPI Decision Block Assign vf (from top)
# addr=
# val=01
# memram setting Top in RAM
# mycontrol in state 10
# mycontrol executing state 0
# PPI Decision Block Sending Values to Forward Network

Table C.17: Circuit s27 Simulation Cycles

Figure C.20: Contents of Block RAM after First Swap Value

After the objective value is updated in both the Block RAM and the Forward Network output buffer, a new trace is started on the Forward Network, and the Objective Decision Block is signaled to expect trace results. The new Forward Trace with the swapped value is shown in Figure C.21. Note that the value that was cleared was associated with G2 and the value that was swapped was associated with G6. The value of G7 remains the same.
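The Clear Top and Swap Value steps traced in Tables C.16 and C.17 behave like stack operations on the decision memory: Clear Top discards the most recent objective (and its slot in the Forward Network output buffer goes back to X), while Swap Value complements the trial value of the objective that is now on top. The C++ model below sketches this behavior under the assumption of a simple in-memory stack; the DecisionStack class and its fields are invented for illustration and are not the memram implementation.

#include <cassert>
#include <vector>

// Three-valued assignment used throughout the networks.
enum class Val { ZERO, ONE, X };

struct StackEntry {
    unsigned addr;     // PI/PPI index of the objective
    Val      value;    // current trial assignment
    bool     topMark;  // set on the last objective of the previous frame
};

// Software model of the decision stack kept in the Block RAM.  The operation
// names follow the text; the data layout and class name are assumptions.
class DecisionStack {
public:
    void push(const StackEntry& e) { entries.push_back(e); }

    // Clear Top: remove the most recent objective.  The caller returns the
    // matching slot of the Forward Network output buffer to X.  (The real
    // memram also moves the top mark bit to the new last objective of the
    // frame; that bookkeeping is omitted here.)
    StackEntry clearTop() {
        assert(!entries.empty());
        StackEntry cleared = entries.back();
        entries.pop_back();
        return cleared;
    }

    // Swap Value: complement the trial value of the objective now on top,
    // e.g. 1 becomes 0 on the first backtrack of a decision (Table C.17).
    void swapValue() {
        assert(!entries.empty());
        StackEntry& top = entries.back();
        top.value = (top.value == Val::ONE) ? Val::ZERO : Val::ONE;
    }

    // A set top mark means every alternative in the current frame has been
    // exhausted, so the Backtrack must continue with a Move to Ti+1 instead.
    bool topExhausted() const {
        return !entries.empty() && entries.back().topMark;
    }

private:
    std::vector<StackEntry> entries;
};

When topExhausted() reports true, as happens later in this example, the Backtrack can no longer stay in the current frame and a Move to Ti+1 is required.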

Figure C.21: s27 Simulation Frame k-2, Backtrack 1 Imply

Although two objective values are pushed onto the Forward Network, they both stop propagating within the circuit, which results in the PO/PPO feeding the Objective Decision Block with all Xs. Thus, for the Objective Decision Block, the current state looks the same as the start of frame k-2, with two objectives in the RF to justify and no current PPO values from the Forward Network. The first objective (G6=0) is selected.

# mycontrol in state 0
# obj-dec not frame k, check for conflict/done
# obj-dec push obj:
# mycontrol in state 0
# back-dec received new obj:
# mycontrol in state 0
# back-dec clearing back-net
# mycontrol in state 0
# back-dec pushing obj onto back-net:
# mycontrol in state 0
# back-enc trace started, waiting for results

Table C.18: Circuit s27 Simulation Cycles

The Backward Network Decoder clears the Backward Network and pushes this new objective on to be backtraced. Note that this time, for the Backtrace of G6=0, there are different STATE values coming from the Forward Network, which change the behavior of the operation, as shown in Figure C.22.

Figure C.22: Simulation Frame k-2, Backtrack 1 Backtrace

One portion of the Backtrace is stopped at gate G8 due to a prior assignment in the Forward Network. The other portion of the Backtrace, going to G7, is also stopped, though the reason is not apparent from the illustration because of another abstraction. As part of DFF handling, isolation buffers are added to all DFF outputs. This prevents no-logic paths from being introduced into the networks by DFFs feeding other DFFs. One side effect is that these buffers are present on all PPI inputs in the networks. This virtual buffer on G7 has an inherited value of 0 from the Forward Network due to the traced assignment, so the value that the Backtrace attempts to drive onto G7 is blocked. This situation leads to no change in the Backward Network output. This trace failure is caught by the Backward Network Encoder, which signals the Objective Decision Block via the propfail signal. The Objective Decision Block sees this signal and raises Conflict to the PI/PPI Decision Block. At the same time, the Backward Network Encoder also sets NReady low to trigger the PI/PPI Decision Block to action.

# mycontrol in state 0
# back-enc no obj propagated; signaling propagation failure
# mycontrol in state 0
# obj-dec back-net propagation failure
# back-enc asserting delayed NReady

Table C.19: Circuit s27 Simulation Cycles

With Conflict asserted, the PI/PPI Decision Block again moves to state 9, starting another Backtrack operation with a Clear Top. After that, a Swap Value is again performed, updating both the Block RAM and the Forward Network output buffer. Upon completion of this Backtrack, the latest values are again pushed onto the Forward Network.

# mycontrol in state 0
# obj-dec back-net propagation failure
# back-enc recovering nready in idle
# mycontrol executing state 9
# memram beginning cleartop
# memram ADDRESS=li=0110
# mycontrol in state 9 (cycle 1)
# mycontrol executing state 9
# memram clearing top value in RAM
# mycontrol in state 9 (cycle 2)
# mycontrol executing state 12
# memram ADDRESS=li=0110
# Reading the ram
# mycontrol in state 12 (cycle 1)
# memram setting Top in RAM
# memram ADDRESS=li=0110
# Reading the ram
# mycontrol in state 12 (cycle 2)
# mycontrol executing state 10
# PPI Decision Block Assign vf (from top)
# addr=
# val=11
# memram setting Top in RAM
# mycontrol in state 10
# mycontrol executing state 2
# memram beginning pop
# memram ADDRESS=li=0110
# mycontrol in state 2
# mycontrol executing state 12
# memram popping off of RAM
# memram ADDRESS=li=0101
# Reading the ram
# mycontrol in state 12 (cycle 1)
# memram setting Top in RAM
# memram ADDRESS=li=0101
# Reading the ram
# mycontrol in state 12 (cycle 2)
# mycontrol executing state 18
# memram setting Top in RAM
# memram ADDRESS=li=0101
# Reading the ram
# mycontrol in state 18
# mycontrol executing state 6
# memram setting Top in RAM
# memram beginning swapwrite
# memram ADDRESS=li=0101
# mycontrol in state 6 (cycle 1)
# mycontrol executing state 6
# memram swapping values in RAM
# ram_data_in before swap: xx
# memram ram_data_in after swap: xx
# memram ram_address=
# mycontrol in state 6 (cycle 2)
# mycontrol executing state 12
# memram ADDRESS=li=0101
# Reading the ram
# mycontrol in state 12 (cycle 1)
# memram setting Top in RAM
# memram ADDRESS=li=0101
# Reading the ram
# mycontrol in state 12 (cycle 2)
# mycontrol executing state
# PPI Decision Block Assign vf (from top)
# addr=
# val=10
# memram setting Top in RAM
# mycontrol in state 10
# mycontrol executing state 0
# PPI Decision Block Sending Values to Forward Network

Table C.20: Circuit s27 Simulation Cycles

Note that in this second Backtrack operation the value of G7 was cleared, and the value of G6 was swapped from 0 to 1. Once these updated values are pushed onto the Forward Network, the resulting Imply operation is as shown in Figure C.23.

Figure C.23: s27 Simulation Frame k-2, Backtrack 2 Imply

This time in the Imply operation, the remaining value of G6=1 is immediately stopped: it directly feeds a single AND gate, where a 1 is non-controlling and the X on the other input keeps the gate output at X. No PO/PPO values change at the Forward Network outputs, and the Objective Decision Block therefore detects a propagation failure on the Forward Network. The Conflict signal is again raised, with newframe_ready sent to the Backward Network Encoder so that it signals the PI/PPI Decision Block with NReady. The PI/PPI Decision Block, with Conflict set again, begins another Backtrack operation.
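Both failed traces above come down to three-valued gate evaluation: for an AND gate a 0 is controlling and a 1 is not, so a 1 driven against an X leaves the output at X and the objective can go no further, and when a trace leaves every observed output unchanged the corresponding encoder reports a propagation failure. The short C++ sketch below illustrates both ideas; the function names are assumptions and are not taken from the thesis code.

#include <cstddef>
#include <vector>

enum class Val { ZERO, ONE, X };

// Three-valued AND: 0 is controlling, 1 is not, everything else stays X.
// A 1 driven against an X therefore leaves the gate output at X, which is
// exactly why the G6=1 objective dies at the AND gate in Figure C.23.
Val and3(Val a, Val b) {
    if (a == Val::ZERO || b == Val::ZERO) return Val::ZERO;
    if (a == Val::ONE  && b == Val::ONE)  return Val::ONE;
    return Val::X;
}

// Propagation-failure test, as either network encoder applies it in spirit:
// if a trace leaves every observed output exactly as it was, no objective
// reached the outputs and propfail / a Forward Network failure is reported.
bool propagationFailed(const std::vector<Val>& before,
                       const std::vector<Val>& after) {
    for (std::size_t i = 0; i < before.size(); ++i)
        if (before[i] != after[i])
            return false;                 // at least one output moved
    return true;                          // nothing changed
}

The hardware, of course, evaluates these conditions as combinational logic across the whole network at once; the loop above is only for illustration.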

# mycontrol in state 0
# obj_dec fwd network propagation failure
# mycontrol in state 0
# back-enc forwarding newframe nready
# mycontrol in state 0
# back-enc recovering nready in idle
# mycontrol executing state 9
# memram beginning cleartop
# memram ADDRESS=li=0101
# mycontrol in state 9 (cycle 1)
# mycontrol executing state 9
# memram clearing top value in RAM
# mycontrol in state 9 (cycle 2)
# mycontrol executing state 12
# memram ADDRESS=li=0101
# Reading the ram
# mycontrol in state 12 (cycle 1)
# memram setting Top in RAM
# memram ADDRESS=li=0101
# Reading the ram
# mycontrol in state 12 (cycle 2)
# mycontrol executing state 10
# PPI Decision Block Assign vf (from top)
# addr=
# val=11
# memram setting Top in RAM
# mycontrol in state 10
# mycontrol executing state 2
# memram beginning pop
# memram ADDRESS=li=0101
# mycontrol in state 2
# mycontrol executing state 12
# memram popping off of RAM
# memram ADDRESS=li=0100
# Reading the ram
# mycontrol in state 12 (cycle 1)
# memram setting Top in RAM
# memram ADDRESS=li=0100
# Reading the ram
# mycontrol in state 12 (cycle 2)
# mycontrol executing state 18
# memram setting Top in RAM
# memram ADDRESS=li=0100
# Reading the ram
# mycontrol in state 18
# mycontrol executing state 5
# memram setting Top in RAM
# memram beginning rewrite0
# memram ADDRESS=li=0100

Table C.21: Circuit s27 Simulation Cycles

This time, there is only one objective left in the current frame k-2. Once the Clear Top operation has completed, the PI/PPI Decision Block sees the top mark bit set on the next objective on which it would need to execute a Swap Value. This indicates that all options in the current frame have been exhausted and a Move to Ti+1 operation must be completed to continue the Backtrack. When the Move to Ti+1 operation begins, the PI/PPI Decision Block sets the tiplus1 signal, which triggers the Objective Decision Block to clear the RF in preparation for a new frame. The first step the PI/PPI Decision Block must complete is restoring the final state of the frame being moved to: each objective value from the new frame is read out of the Block RAM and into the Forward Network output buffer, cycling between states 3 and 22. Once complete, the previous PPO objective values in the RF must be restored. The objective values from the frame prior to the one being moved to are also read out and passed over the ppi bus to the Objective Decision Block, which reads them into the RF. Once all objective values have been read out, the top mark bit is cleared from the last objective in the RAM, unlocking the frame being moved to, and the counter increments by 1 to reflect the new current state.
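The Move to Ti+1 sequence just described is essentially frame-level bookkeeping: drop the exhausted frame, replay the destination frame's objectives into the Forward Network output buffer, hand the objectives of the frame prior to it back to the RF over the ppi bus, clear the destination frame's top mark, and count the frame up. The C++ sketch below strings these steps together over an assumed in-memory frame stack; every name, the buffer indexing, and the counter handling are illustrative assumptions rather than the thesis RTL.

#include <cassert>
#include <vector>

enum class Val { ZERO, ONE, X };

struct Objective { unsigned addr; Val value; };

struct Frame {
    std::vector<Objective> objectives;
    bool topMarked = false;   // locked while a deeper frame is being explored
};

// frames[0] is frame k; frames.back() is the current (deepest) frame.
// fwdBuffer and rf are assumed to be pre-sized, with a frame's PPI index
// addressing the matching PPO goal slot in the RF.
struct FrameBookkeeping {
    std::vector<Frame> frames;
    std::vector<Val>   fwdBuffer;    // Forward Network output buffer
    std::vector<Val>   rf;           // Objective Decision Block register file
    int                frameCounter = 0;

    // Move to Ti+1: abandon the exhausted current frame and resume its parent.
    void moveToTiPlus1() {
        assert(frames.size() >= 2);               // never taken at frame k itself
        frames.pop_back();                        // drop the exhausted frame
        Frame& dest = frames.back();              // the frame being moved to

        // 1. tiplus1: the Objective Decision Block clears the RF.
        for (Val& v : rf) v = Val::X;

        // 2. Restore the destination frame's assignment into the forward
        //    buffer (states 3 and 22 in the transcript).
        for (Val& v : fwdBuffer) v = Val::X;
        for (const Objective& o : dest.objectives)
            fwdBuffer[o.addr] = o.value;

        // 3. Restore the PPO goals from the frame prior to the destination
        //    (one frame closer to frame k) into the RF over the ppi bus.
        if (frames.size() >= 2) {
            const Frame& prior = frames[frames.size() - 2];
            for (const Objective& o : prior.objectives)
                rf[o.addr] = o.value;
        }

        // 4. Unlock the destination frame and count the frame up.
        dest.topMarked = false;
        ++frameCounter;                           // "fourcounter counting a frame UP"
    }
};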

# mycontrol in state 5 (cycle 1)
# obj-dec clearing RF for new frame
# clearing the flag in RAM
# mycontrol in state 5 (cycle 2)
# mycontrol executing state 3
# memram ADDRESS=addr=
# Reading the ram
# mycontrol in state 3 (Ti+1)
# mycontrol executing state 22
# PPI Decision Block Assign vf (from Out)
# addr=
# val=11
# memram ram reading out - =
# memram ADDRESS=addr=
# Reading the ram
# PPI Decision Block Assign vf (from Out)
# addr=
# val=01
# mycontrol in state 22
# mycontrol executing state 3
# memram ram reading out - =
# memram ADDRESS=addr=
# Reading the ram
# mycontrol in state 3 (Ti+1)
# mycontrol executing state 22
# PPI Decision Block Assign vf (from Out)
# addr=
# val=10
# memram ram reading out - =
# memram ADDRESS=addr=
# Reading the ram
# mycontrol in state 22
# mycontrol executing state 3
# memram ram reading out - =
# memram ADDRESS=addr=
# Reading the ram
# mycontrol in state 3 (Ti+1)
# mycontrol executing state 22
# PPI Decision Block Assign vf (from Out)
# addr=
# val=01
# memram ram reading out - =
# memram ADDRESS=addr=
# Reading the ram
# mycontrol in state 22
# mycontrol executing state 3
# memram ram reading out - =
# memram ADDRESS=addr=
# Reading the ram
# mycontrol in state 3 (Ti+1)
# isppi evaluating output to RF; in=
# memram ram reading out - =
# memram ADDRESS=addr=
# Reading the ram
# mycontrol in state 3 (Ti+1)
# obj-dec read in PPI:
# memram ram reading out - =
# memram ADDRESS=li=0100
# Reading the ram
# isppi evaluating output to RF; in=
# mycontrol in state 3 (Ti+1)
# obj-dec read in PPI:
# mycontrol executing state 7
# fourcounter counting a frame UP
# memram setting Top in RAM

Table C.22: Circuit s27 Simulation Cycles

Once the Move to Ti+1 operation has completed, the Backtrack that was in progress can continue. It left off waiting to run a Swap Value on the last value in the frame that is now the current frame (k-1). That Swap Value is now executed, and the memory state after both the Move to Ti+1 and the Swap Value is shown in Figure C.24.

Figure C.24: Block RAM Contents after Move to Ti+1 and Swap Value

Once the value is swapped in the Block RAM, it is also updated in the Forward Network output buffer. The PI/PPI Decision Block then pushes these updated values into the Forward Network to start a new Imply operation. Note that this time the value that was swapped was that of G2, changing from 0 to 1, while G7 remains 1 and G6 remains 0 for frame k-1. This Imply operation is shown in Figure C.25.

Figure C.25: s27 Simulation Frame k-1, Backtrack 1 Imply

# mycontrol in state 7
# mycontrol executing state 6
# memram beginning swapwrite
# memram ADDRESS=li=0100
# mycontrol in state 6 (cycle 1)
# mycontrol executing state 6
# memram swapping values in RAM
# ram_data_in before swap: xx
# memram ram_data_in after swap: xx
# memram ram_address=
# mycontrol in state 6 (cycle 2)
# mycontrol executing state 12
# memram ADDRESS=li=0100
# Reading the ram
# mycontrol in state 12 (cycle 1)
# memram setting Top in RAM
# memram ADDRESS=li=0100
# Reading the ram
# mycontrol in state 12 (cycle 2)
# mycontrol executing state 10
# PPI Decision Block Assign vf (from top)
# addr=
# val=10
# memram setting Top in RAM
# mycontrol in state 10
# mycontrol executing state 0
# PPI Decision Block Sending Values to Forward Network

Table C.23: Circuit s27 Simulation Cycles

Note that this time, when the Imply operation completes, the PO/PPO assignment from the Forward Network includes the assignment G7=0. Comparing this with the contents of the RF, as shown in Figure C.26, there is now a value conflict between the required PPO value of G7 from the previous frame (1) and the assigned value in the current frame (0).

Figure C.26: RF Contents and PO/PPO after Move to Ti+1, Imply 1

The Objective Decision Block detects this conflict, causing it to raise the Conflict signal again. The newframe_ready signal is sent to the Backward Network Encoder, which again sets the NReady signal, triggering the PI/PPI Decision Block to take action. With the Conflict signal raised, the PI/PPI Decision Block will run another iteration of the Backtrack operation.

# mycontrol in state 0
# obj-dec not frame k, check for conflict/done
# obj-dec CONFLICT found
# obj-dec signal newframe_ready
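The CONFLICT decision recorded in these last log lines is a three-valued compatibility check between the required PPO values held in the RF and the PO/PPO values the Forward Network just produced: opposite binary values on the same flip-flop are a conflict, while an X on either side remains compatible. A minimal sketch of that check is given below, with the G7 case of Figure C.26 in mind; the function name and value encoding are illustrative assumptions.

#include <cstddef>
#include <vector>

enum class Val { ZERO, ONE, X };

// Conflict test between the RF (required PPO values carried over from the
// previous frame) and the PO/PPO values produced by the Forward Network.
// G7 required at 1 but implied to 0, as in Figure C.26, trips this check.
bool rfConflict(const std::vector<Val>& required,    // RF contents
                const std::vector<Val>& implied) {   // Forward Network PO/PPO
    for (std::size_t i = 0; i < required.size() && i < implied.size(); ++i) {
        if (required[i] == Val::X || implied[i] == Val::X)
            continue;                // unassigned on either side: compatible
        if (required[i] != implied[i])
            return true;             // opposite binary values: raise Conflict
    }
    return false;
}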
