Sequential Logic Synthesis with Retiming in Encounter RTL Compiler (RC)


Christoph Albrecht (1), Shrirang Dhamdhere (1), Suresh Nair (1), Krishnan Palaniswami (2), Sascha Richter (1)
(1) Cadence Design Systems, (2) Focus Semiconductor

Session Track: Digital IC Design
Session Number: 2.3
Relevant Cadence Products: Encounter RTL Compiler (RC), Encounter Conformal Logic Equivalence Checker (LEC)

Abstract

Typical ASIC designs are highly unbalanced with respect to the timing criticality of their combinational logic paths. This is mainly due to the ad-hoc manual design specification of the register transfer level (RTL), which does not use any information regarding the sequential timing criticality. Traditional logic synthesis does not support borrowing of timing slack across registers, and the optimization is restricted by fixed positions of the registers. This may result in a suboptimal solution, in a loss of performance, and in unnecessary area and power consumption.

This paper explains the concept of clock scheduling and retiming used by Encounter RTL Compiler (RC) to optimize across register boundaries. Retiming is a structural transformation which changes the positions of the registers without modifying the input-output behavior of the circuit. The reader will understand how the area, the number of registers, or the delay of the design is minimized. Computational results show the tradeoff between these two objectives.

Practical applications are discussed: Registers may have different control signals, enable signals, or reset signals. This leads to the multiclass retiming problem and the reset line justification problem.

Retiming used to be a difficult challenge for equivalence checking. However, together with Encounter Conformal Logic Equivalence Checker (LEC) the verification is now simple: RC writes out checkpoint netlist files and one script, which LEC can then process to automatically verify the golden RTL against the final netlist.
We present a case study showing how retiming was used by Focus Semiconductor, a division of Focus Enhancements, on a 1.5 M instance UWB baseband chip. Retiming substantially improved the Quality of Results (QoR) and helped to meet the design objectives. CDNLive! Silicon Valley

1 Introduction

Traditional combinational logic synthesis focuses all the optimization efforts on the combinational paths between the registers. It does not support any tradeoff between tight paths and loose paths when these are separated by registers. To motivate the use of sequential logic synthesis with retiming, we discuss the slack distribution of a typical ASIC design.

Figure 1: Slack distribution of a typical ASIC design.

Figure 1 shows the slack distribution, more specifically the distribution of the setup slacks of a late-mode analysis after synthesis. For each slack interval on the x-axis, the number of combinational paths which have a slack value within that interval is shown. The design has a worst negative slack of -529 ps.

Figure 2: Slack distribution of the same ASIC design as in Figure 1, this time with optimized clock latencies.

Figure 2 shows an optimized slack distribution for the same design. The netlist was not changed, only the clock latencies at the registers. The latencies were computed with a slack balancing algorithm which we will discuss later. The number of critical paths has decreased drastically. Only a small fraction of the paths have a negative slack. In this case it was not possible to improve the worst negative slack, because the worst path in this design is a path from a primary input to a primary output.

Figure 1 and Figure 2 together demonstrate the optimization potential which becomes available when the registers are unlocked rather than kept fixed as hard boundaries that constrain the synthesis optimization algorithms. With the optimized clock latencies, many paths become uncritical. The additional slack can be used to downsize the combinational gates or even to use a different logic structure that has smaller area and power consumption.
While clock scheduling was not able to reduce the worst negative slack for this specific design, it was able to improve the slack of the side paths. These are either combinational paths that start at the primary input of the critical path and end at a register, or paths that start at a register and end at the primary output of the critical path. This is helpful for the synthesis optimization algorithms in RC: RC is able to improve the slack of a path by using the slack of its side paths.

In this paper we discuss the two sequential optimization techniques, clock scheduling and retiming, and show how the combination of both techniques is used in RC.

The paper is organized as follows: In Section 2 we discuss clock scheduling. Clock scheduling is also known as useful skew. It changes the latencies of the clock signal but does not change the logic. The different latencies need to be realized by a sophisticated clock network. In Section 3 we describe retiming. Retiming is a structural transformation. While retiming does not change the combinational gates, it modifies the netlist by moving the registers forward and backward in the logic. RC can use clock scheduling as an intermediate step to drive the logic synthesis and optimization process. Ultimately, it realizes the different latencies by retiming so that a conventional zero or limited skew backend flow can place the design, construct the clock network, and route the nets. This is described in Section 4. In practice, retiming can be constrained by registers that have different control signals (for example, enable signals, asynchronous set or reset signals). Section 5 discusses these constraints. In Section 6 we discuss the automatic verification flow with LEC. In the last section we present a case study of how retiming was used on a UWB baseband chip from Focus Semiconductor.

2 Clock Scheduling

The following figure shows how the worst slack of a design can be improved by changing the clock latencies: Buffers are added to the clock distribution network and the switching time of the register is delayed. In this case the worst slack is improved from -2 ns to 0 ns and the design meets the timing requirements.
If the clock latency of the capturing register of a combinational path is increased, the slack of the combinational path increases by the same amount. If, on the other hand, the clock latency of the capturing register is decreased, the slack of the combinational path decreases. Increasing the clock latency of the launching register decreases the slack, and decreasing the latency has the opposite effect on the slack of the path.

Figure 3: The worst slack is improved by adjusting the clock latencies. Target clock period: 5 ns. Worst slack without clock latencies: -2 ns. Worst slack with clock latencies: 0 ns.
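The effect of launch and capture latencies on the setup slack can be written down in a few lines. The following is a minimal sketch, not RC's timing engine: the function name `setup_slack` and the two-stage pipeline with path delays of 7 ns and 3 ns are hypothetical and were chosen only so that the worst slack moves from -2 ns to 0 ns at a 5 ns clock period, as in Figure 3.

```python
def setup_slack(period, delay, launch_latency=0.0, capture_latency=0.0):
    """Setup slack of one register-to-register path.

    Data is launched at launch_latency and must arrive before the
    capturing clock edge at capture_latency + period.
    """
    return (capture_latency + period) - (launch_latency + delay)


# Hypothetical pipeline: in -> [7 ns logic] -> R -> [3 ns logic] -> out
PERIOD = 5.0
worst_without = min(setup_slack(PERIOD, 7.0), setup_slack(PERIOD, 3.0))

# Delay the clock of register R by 2 ns: R captures later and launches later.
worst_with = min(
    setup_slack(PERIOD, 7.0, capture_latency=2.0),  # path ending at R gains 2 ns
    setup_slack(PERIOD, 3.0, launch_latency=2.0),   # path starting at R loses 2 ns
)

print(worst_without, worst_with)
```

The path ending at R gains exactly what the path starting at R loses, which is why clock scheduling can only redistribute slack between neighboring paths.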

A linear programming formulation

The clock scheduling problem can be formulated as a linear program. This was first done by Fishburn in 1990 [1]. Let $T$ be the clock period, which is to be minimized. Furthermore, let $l_i$ be the latency of the clock signal arriving at register $i$, and let $d_{ij}$ be the maximum delay of all combinational paths from register $i$ to register $j$.

$\min T$ subject to $l_i + d_{ij} \le l_j + T$ for all combinational paths $(i, j)$.

The difference between the right-hand side and the left-hand side of the inequality is the slack of the path. Should the design have constrained primary inputs or outputs, we can represent all these inputs and outputs by one dummy register that can have, without loss of generality, a clock latency of zero. Hence, we can assume that even in this case the linear program has the form above.

The linear program has a very special structure and can be solved efficiently with combinatorial algorithms. It can be proved that the minimum clock period achievable by clock scheduling is equal to the maximum mean delay of all cycles (the total delay of a cycle divided by its number of edges) in the register-to-register timing graph. The register-to-register timing graph contains a node for every register and an edge whenever there is a combinational path between two registers, with a weight equal to the maximum delay of these paths.

In general, the linear program does not have a unique solution, and an arbitrary solution that minimizes the clock period is usually not desirable. Consider, for example, the ASIC design for which the two different slack distributions are shown in Figure 1 and Figure 2. The worst negative slacks of the two slack distributions are equal and so are the clock periods at which the chips can operate without failure.

Clock scheduling optimally balancing the slack

In the following we discuss how it is possible to compute a clock schedule with a specific property which we call optimally balanced slack. As a result of this property many paths are uncritical and have a lot of slack.
This part is more theoretical. If the reader's time is limited, we recommend skipping it, because the following sections are more important for practical use.

We consider a small example circuit with four registers, a, b, c, and d, shown in Figure 4.

Figure 4: Example circuit with combinational gates and four registers. The numbers specify the delay of the gates.

From the circuit we can construct the register-to-register timing graph which is shown in the following figure. The graph has one node for each of the four registers and an edge between two nodes whenever there is a combinational path between the corresponding registers. Associated with the edges is the maximum delay of the combinational paths.

Figure 5: Register-to-register timing graph for the circuit in Figure 4.

Without clock latencies, the minimum feasible clock period for this circuit is equal to the maximum delay of the combinational paths, in this case T = 11. By increasing the clock latency for the register b to +1, the clock period can be decreased to T = 10. This is the minimum clock period which can be achieved by clock scheduling, because with these latencies the two paths (b,d) and (d,b), which form a cycle, have a slack of zero. Figure 6 shows the register-to-register timing graph with the latency +1 at register b. In addition to the combinational delays we also show the slacks for the clock period T = 10 in brackets.

Figure 6: A clock schedule applied to the registers such that the worst incoming slack equals the worst outgoing slack for every register. The edges corresponding to the critical paths with a slack smaller than or equal to 1 are shown in red. (Clock period T = 10; register b has latency +1; edge labels give the delay and, in brackets, the slack.)

The clock schedule shown in Figure 6 has the property that for every register the worst incoming slack is equal to the worst outgoing slack. Changing the clock latency of one single register alone does not give an improvement, since the worst slack of all the paths starting or ending at the register can only get worse.

Figure 6 shows that there is one critical edge in red, the edge (d,c), which is not part of a critical cycle. It is possible to increase the slack of this edge by increasing the clock latency of the registers a and c simultaneously. This does not affect the two critical edges (c,a) and (a,c). The result is shown in Figure 7. In this figure the worst incoming slack equals the worst outgoing slack for every subset of the registers.
Note that before, in Figure 6, the worst outgoing slack for the registers a and c together is equal to 5 whereas the worst incoming slack is only 1.

Figure 7: An optimally balanced clock schedule: The worst incoming slack equals the worst outgoing slack for every subset of the registers. (Clock period T = 10.)
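The minimum clock period of the worked example can be checked numerically. The following is a minimal sketch, not the algorithm used in RC: for a fixed period the constraints l_i + d_ij <= l_j + T are difference constraints, which a Bellman-Ford style relaxation can test for feasibility, and a bisection over T then approximates the minimum period. The edge list is an assumption: it keeps the critical cycle between b and d (delays 11 and 9) and a few further edges read off Figure 6, and it is not claimed to be the complete graph of Figure 5. The helper names are hypothetical.

```python
def feasible_latencies(nodes, edges, period):
    """Return latencies with l_i + d_ij <= l_j + period for all edges, or None."""
    lat = {n: 0.0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for i, j, d in edges:
            bound = lat[j] + period - d      # largest latency allowed at register i
            if lat[i] > bound + 1e-9:
                lat[i] = bound
                changed = True
        if not changed:
            return lat
    return None  # still relaxing after |V| passes: a cycle is too slow for this period


def min_period(nodes, edges, iterations=60):
    """Bisection on the clock period between 0 and the largest path delay."""
    lo, hi = 0.0, max(d for _, _, d in edges)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if feasible_latencies(nodes, edges, mid) is None:
            lo = mid
        else:
            hi = mid
    return hi


# Assumed register-to-register timing graph: (launch, capture, max delay)
NODES = ["a", "b", "c", "d"]
EDGES = [("d", "b", 11), ("b", "d", 9), ("a", "c", 9),
         ("c", "a", 9), ("a", "b", 6), ("d", "c", 9)]

zero_latency_period = max(d for _, _, d in EDGES)   # 11 without clock latencies
best_period = min_period(NODES, EDGES)              # close to 10, the mean of 11 and 9
schedule = feasible_latencies(NODES, EDGES, best_period)
slacks = {(i, j): schedule[j] + best_period - schedule[i] - d for i, j, d in EDGES}
```

The schedule returned by the relaxation is only one of many schedules that achieve the minimum period. It is generally not the optimally balanced one, which requires the minimum balance algorithm discussed in this section.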

The clock schedule shown in Figure 2, in which the number of critical paths has decreased so drastically, has exactly this property. It is computationally too expensive to consider all subsets of the registers explicitly, because there are exponentially many of them. Nevertheless, the efficient minimum balance algorithm by Young, Tarjan, and Orlin [3] can find such a solution by iteratively finding critical cycles and contracting them.

For synthesis operations it is helpful if the side paths of a critical path have additional slack. The slack can be used to reduce the delay of the critical path. An example of such a synthesis operation is Shannon decomposition, shown in the following figure.

Figure 8: A critical path becomes short and fast using Shannon decomposition.

If only one path starting at a point a and ending at a point x is critical and all other paths ending at x are uncritical, then the fanin logic of x can be duplicated: in one copy the value of a is permanently set to zero and in the other copy it is set to one. The two outputs of the replicated logic feed a multiplexer that chooses the right value for x depending on the value of a. The constant values for a are propagated to simplify the logic. After this transformation the path from a to x is very short and hence very fast.

Limitations of clock scheduling

Clock scheduling has limitations. Changing the clock latencies may increase the number of hold violations. The hold constraint ensures that data signals do not arrive too early at the data input pin of the register at the end of the path. The signal has to arrive after the register has closed. A high number of hold violations can lead to an enormous number of hold buffers, which need to be added at the end of the flow.

Due to process variations the final delay of the paths on the fabricated chip can deviate from the computed delay. This limits the use of clock scheduling further.
For example, it is not possible to have a long combinational path with a combinational delay equal to ten times the clock period and to realize the timing constraints by adjusting the latencies of the clock signals at the launching and receiving registers. On such a combinational path there would be 10 different data signals at the same time. These signals need to arrive at the receiving register at the right time. If the combinational delay of the path were only 10% smaller on the final fabricated chip due to process variations, the signal would arrive too early and this would result in a hold time violation. As the delay could also increase, it is not possible to fix this hold violation by adding additional delay with hold buffers.

Nevertheless, RC can internally use large positive and negative clock latencies and optimize the combinational logic with these latencies. In the end, the latencies are realized by retiming, that is, by moving the registers through the combinational logic. The latencies are only bounded by the number and the movement of the registers.
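The Shannon decomposition of Figure 8 can be checked exhaustively on a small function. The following is a minimal sketch: `fanin_logic` is a hypothetical stand-in for the fanin logic of x, with a as the late-arriving input, and is not taken from the paper.

```python
from itertools import product


def fanin_logic(a, b, c):
    """Hypothetical fanin logic of x; all signals are 0 or 1."""
    return (a & b) | ((1 - a) & (b ^ c))


def shannon_decomposed(a, b, c):
    """Two copies of the logic with a tied to 0 and to 1, then a multiplexer on a."""
    x0 = fanin_logic(0, b, c)   # copy with a = 0; constant propagation leaves b ^ c
    x1 = fanin_logic(1, b, c)   # copy with a = 1; constant propagation leaves b
    return x1 if a else x0      # a now only drives the multiplexer select


equivalent = all(
    fanin_logic(a, b, c) == shannon_decomposed(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print(equivalent)
```

After the transformation the path from a to x passes through the multiplexer only. The price is the duplicated logic, which is acceptable when the paths from b and c have enough slack.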

3 Retiming

Retiming is a powerful sequential optimization technique which overcomes the limitations of clock scheduling. Retiming moves the registers across the combinational logic to improve the performance without changing the input/output behavior of the circuit.

The following figure shows how the slack of a circuit can be improved by retiming. It is the same circuit to which we applied clock scheduling in Figure 3. The registers are retimed backward against the direction of the signal propagation.

Figure 9: The worst slack is improved by retiming the registers backward against the direction of the signal propagation. Target clock period: 5 ns. Worst slack before retiming: -2 ns. Worst slack after retiming: 0 ns.

This example shows that retiming changes the number of registers. In this case, the number of registers increases. However, the number of registers can also decrease. RC minimizes the clock period as a first objective. Among all possible retiming solutions that achieve the minimum clock period, RC finds the solution with the minimum number of registers. In addition, RC has the option to minimize the number of registers without increasing the current clock period.

Any retiming can be achieved by a sequence of two elementary retiming steps: Forward retiming removes the registers at the inputs of a gate and creates new registers at the outputs. Backward retiming does the opposite: It removes the registers at the outputs and creates a new register at each input. The two retiming steps are shown in the following figure.

Figure 10: Registers retimed forward and backward over an AND gate.

For forward retiming it is necessary that each input of the gate is driven by a register. Similarly, for backward retiming the gate must not drive any combinational gate but only registers.
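That an elementary retiming step preserves the input/output behavior can be seen in a small cycle-accurate simulation. The following is a minimal sketch under stated assumptions: the two functions model an AND gate with one register at each input (before forward retiming) and with a single register at its output (after), and the initial register value after retiming is taken to be the AND of the two original initial values. Function names and the random stimulus are hypothetical.

```python
import random


def simulate_before(a_seq, b_seq, ra=0, rb=0):
    """Registers ra and rb sit at the inputs of the AND gate."""
    out = []
    for a, b in zip(a_seq, b_seq):
        out.append(ra & rb)      # the gate sees the register outputs
        ra, rb = a, b            # clock edge: both registers capture new data
    return out


def simulate_after(a_seq, b_seq, r=0):
    """A single register r sits at the output of the AND gate."""
    out = []
    for a, b in zip(a_seq, b_seq):
        out.append(r)
        r = a & b                # clock edge: the register captures the gate output
    return out


rng = random.Random(0)
a_seq = [rng.randint(0, 1) for _ in range(200)]
b_seq = [rng.randint(0, 1) for _ in range(200)]

before = simulate_before(a_seq, b_seq, ra=1, rb=1)
after = simulate_after(a_seq, b_seq, r=1 & 1)   # initial value propagated through the gate
print(before == after)
```

In this direction the register count drops from two to one. Backward retiming over the same gate is the reverse step and increases it from one to two.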
In order to ensure equivalent input/output behavior of the circuit, retiming cannot change the number of registers on any loop and on any path from a primary input to a primary output. This is guaranteed by the two operations. Of course, it may still be possible to retime registers forward or backward over a gate if this condition does not hold for the original circuit, but the condition then has to be established first by elementary retiming steps applied to other gates.

Constants and dangling logic (logic that does not drive anything) are an exception. Constant propagation as part of the RC synthesis operations simplifies any logic driven by a constant, unless the gates are preserved by an attribute. Similarly, dangling logic is removed. However, should this logic be preserved, retiming is able to create or remove registers at constants and dangling logic.

The following figure shows an example in which retiming cannot improve the critical path because no elementary retiming step is possible:

Figure 11: An example in which retiming cannot improve the clock period because the register cannot be moved forward. (The circuit has primary inputs A and B and primary output C.)

Depending on the clk-to-q delay of the register, the critical path goes from the register to the primary output C. If the primary inputs are unconstrained, then the critical path starts at the register in any case. Just checking the slack at the data input pin and the output pin of the register, the user may wonder why the register was not moved forward. This is not possible, because there is no register directly following the primary input B.

Efficient algorithms for retiming have been developed and published. We refer the interested reader to the fundamental paper by Leiserson and Saxe published in 1991 [2], in which the problem of finding a retiming that realizes a given clock period and minimizes the number of registers is formulated and solved as a minimum cost flow problem. Polynomial time algorithms have been developed for this problem. A comprehensive treatment of timing in general, and of clock scheduling and retiming in particular, is the recent book by S. Sapatnekar [5].
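In the standard formulation of Leiserson and Saxe [2], a retiming assigns an integer label r(v) to every gate v, and an edge from u to v that carries w registers carries w + r(v) - r(u) registers afterwards. The following is a minimal sketch of that bookkeeping on a hypothetical three-gate ring. The gate names and edge weights are assumptions, and the sketch checks only legality and the register count on the loop, not the clock period.

```python
def retime(edges, r):
    """Apply retiming labels r to edges given as (u, v, register_count)."""
    return [(u, v, w + r.get(v, 0) - r.get(u, 0)) for u, v, w in edges]


def is_legal(edges):
    """A retiming is legal only if no edge ends up with a negative register count."""
    return all(w >= 0 for _, _, w in edges)


def registers_on(edges):
    """Total number of registers on the given edges."""
    return sum(w for _, _, w in edges)


# Hypothetical loop g1 -> g2 -> g3 -> g1 with two registers on it
ring = [("g1", "g2", 1), ("g2", "g3", 0), ("g3", "g1", 1)]

# r(g2) = -1 is one forward retiming step over g2:
# the register at its input moves to its output.
moved = retime(ring, {"g2": -1})

# r(g2) = -2 would need a second register at the input of g2, which does not exist.
too_far = retime(ring, {"g2": -2})

print(moved, is_legal(moved), is_legal(too_far))
```

Because the labels cancel around any cycle, the register count on the loop is the same before and after. This is the invariant stated at the start of this discussion.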
Relationship between clock scheduling and retiming

The two sequential optimization techniques, clock scheduling and retiming, are related: It can be proved that the clock period achievable by clock scheduling (ignoring any hold constraints) is a lower bound on the clock period that can be achieved by retiming [3]. It can also be proved that retiming can almost achieve this clock period: The minimum clock period achievable by retiming is at most the minimum clock period achievable by clock scheduling plus the maximum delay of all gates.

If a clock schedule is given, a retiming can be computed as follows: Find a register with the maximum positive clock latency. Decrease the clock latency until the incoming slack is zero. If the slack is already zero, perform a backward retiming over the gate driving the register. The new registers added in front of the gate get a clock latency equal to the latency of the original register minus the delay of the gate. This procedure is repeated until the clock latency of each register is smaller than half the delay of the gate driving the register.

Then a similar procedure is applied to registers with the minimum negative clock latency. The registers are moved forward and the clock latency is increased by the delay of the gate until the clock latency of each register is larger than the negative value of half the delay of the gate driven by the register. If the clock latency of every register is then set to zero, the retimed circuit has a clock period which is at most the clock period of the original circuit with clock scheduling plus the maximum delay of all gates.

4 The global sequentially driven synthesis flow in RC

RC combines the two sequential optimization techniques, clock scheduling and retiming, in a global sequential synthesis flow shown in the following figure.

Figure 12: The global sequentially driven synthesis flow in RC: sequentially driven synthesis and optimization (combinational synthesis interlinked with clock scheduling), followed by retiming, followed by combinational synthesis.

The logic synthesis and optimization algorithms are tightly interlinked with clock scheduling. Clock scheduling computes clock latencies which improve the clock period and the slack of the combinational paths. The synthesis algorithms can use the slack of side paths to further improve critical paths. In the next step, retiming moves the registers through the combinational logic. It minimizes the clock period and, as a second objective, the number of registers. Finally, retiming is followed once more by combinational synthesis. This is necessary because the loads of the gates have changed as the registers were moved.

RC performs these steps automatically. The user only has to set the attribute retime to true for either the top design or the subdesigns for which retiming should be performed and then call the synthesize command.

5 Special cases for retiming

In this section we describe special cases for retiming due to control signals at the registers. The control signals at the registers may constrain the movement of the registers. First we discuss the retiming of registers with enable signals. Then we describe the case when registers with an enable signal are implemented by a simple register with a multiplexer feedback loop. Finally, we discuss asynchronous set and reset signals.

Retiming of registers with different enable signals

In practice, the retiming of the registers can be constrained: The registers in the circuit may have different control signals, for example enable signals. Retiming cannot combine registers which have different control signals.
Figure 13 shows an example. To improve the timing, the two registers should be combined and retimed backward. However, this is not possible because the two registers receive different enable signals. RC can combine and retime registers forward or backward only if they receive the same enable signals.

Figure 13: The two registers cannot be moved backward because they receive different enable signals (enable 1 and enable 2).

Multiplexer feedback loop

Registers with an enable signal can also be implemented by a simple register and a multiplexer. This may be an advantage for retiming because the registers can then be merged even though the enable signals are different. It may, however, also constrain the register movement and increase the number of registers.

Figure 14 shows that the number of registers can be larger. It is a pipeline design with three stages of registers at the primary outputs. The enable is realized by a multiplexer. When the registers are retimed into the combinational logic (applying only the elementary retiming steps in Figure 10), one register has to remain in each loop with the multiplexer. Furthermore, registers pile up at the select lines of the multiplexers.

Figure 14: Registers with enable can be implemented by a simple register and a multiplexer. This may increase the register count when the registers are moved backward.

If, instead of a loop with a multiplexer, the registers have an enable signal that can be moved with the registers, then the number of registers after retiming is smaller.

If the registers with the multiplexers are at the primary inputs and have to be moved forward, the problem is different: only the last register can be retimed forward. To retime more registers forward it would be necessary to have additional registers at the select lines of the multiplexers.

By default RC uses registers which have enable logic built into the register. Only if the variable hdl_ff_keep_feedback is true does RC use simple registers which are in a loop with a multiplexer. The results depend on the structure of the design and can differ drastically.
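The two implementations of an enabled register behave identically at their outputs. They differ only in what retiming is allowed to move. The following is a minimal behavioral sketch with hypothetical function names: the first model keeps the enable inside the register, the second uses a plain register whose data input is driven by a multiplexer that feeds the register output back.

```python
import random


def enable_register(d_seq, en_seq, q=0):
    """Register with built-in enable: captures d only when en is 1."""
    out = []
    for d, en in zip(d_seq, en_seq):
        out.append(q)
        if en:
            q = d
    return out


def mux_feedback_register(d_seq, en_seq, q=0):
    """Plain register; a multiplexer selects between new data and the fed-back q."""
    out = []
    for d, en in zip(d_seq, en_seq):
        out.append(q)
        mux_out = (en & d) | ((1 - en) & q)   # multiplexer with en as select line
        q = mux_out                           # the plain register captures every cycle
    return out


rng = random.Random(1)
d_seq = [rng.randint(0, 1) for _ in range(200)]
en_seq = [rng.randint(0, 1) for _ in range(200)]
same = enable_register(d_seq, en_seq) == mux_feedback_register(d_seq, en_seq)
print(same)
```

In the second model the register sits on a loop through the multiplexer. Since retiming cannot change the number of registers on a loop, one register has to stay in that loop, which is the restriction illustrated by Figure 14.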
Retiming of registers with asynchronous set and reset signals

Retiming of registers with asynchronous set or reset signals is more involved. When these registers are retimed forward or backward through the combinational logic it is necessary to compute the new reset values. Moving these registers forward through the combinational logic is simple: The reset values are propagated through the logic. Figure 15 shows an example.

Figure 15: The registers are retimed forward. The reset values are propagated to the registers in the new locations.

Moving registers backward is more complicated. First, all the registers driven by the gate need to have the same reset values. Second, the reset values of the new registers that drive the inputs of the gate are not unique. A naive approach that moves the registers over the gates one gate after the other and chooses arbitrary reset values does not work: The wrong reset values could be chosen, such that later the registers cannot be retimed backward over a gate because the reset values are different. Hence, it is necessary to solve a global problem: what are the required 0/1 reset values for the registers in the new locations such that propagating these values through the logic results in the given reset values at the registers in the original locations? This problem can be transformed into a satisfiability problem. It is very similar to verifying that two netlists are equivalent, in which we ask the question: do 0/1 values exist for the registers and primary inputs such that propagating these values through the logic results in different values at an input of a register or a primary output?

Sometimes no 0/1 reset values exist for the registers in the new locations such that propagating these values forward would result in the given values at the original locations. The following figure shows an example. In this case no valid reset values exist if the registers were moved further backward. RC can move registers with asynchronous set or reset backward only as far as valid reset values for the registers exist.

Figure 16: It is not possible to find reset values for the registers in the new locations such that propagating these values results in the given values for the registers in the original locations.
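For a handful of signals the reset line justification problem can be illustrated by plain enumeration. The following is a minimal sketch and not the method used in RC, which the text describes as a satisfiability problem. The function `justify_reset` and the small example circuits are hypothetical, and the unsolvable case (one signal feeding a buffer and an inverter) is an assumption that is not claimed to be the circuit of Figure 16.

```python
from itertools import product


def justify_reset(logic, num_inputs, required):
    """Search reset values for registers moved backward to the inputs of `logic`.

    `logic` maps input values to a tuple of output values; `required` holds the
    reset values of the registers at the original (output) locations.
    Returns one satisfying assignment, or None if none exists.
    """
    for values in product((0, 1), repeat=num_inputs):
        if tuple(logic(*values)) == tuple(required):
            return values
    return None


def and_gate(a, b):
    return (a & b,)


def buffer_and_inverter(x):
    return (x, 1 - x)


# Forward retiming is plain propagation: input resets (1, 0) give output reset 0.
forward = and_gate(1, 0)

# Backward over an AND gate: reset value 1 forces both inputs to 1,
# reset value 0 has several solutions (the search returns the first one).
back_one = justify_reset(and_gate, 2, (1,))
back_zero = justify_reset(and_gate, 2, (0,))

# Two registers behind a buffer and an inverter, both with reset value 1:
# no reset value at the common input can produce this.
unsolvable = justify_reset(buffer_and_inverter, 1, (1, 1))

print(forward, back_one, back_zero, unsolvable)
```

Enumeration is exponential in the number of new register locations, which is why a SAT formulation is the practical choice for the global problem.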
If all the registers that retiming needs to merge and move either forward or backward receive equivalent control signals, and if the reset line justification problem is also solvable, then retiming is more powerful than clock scheduling. It is possible to have extremely long combinational paths that have a delay as large as several times the clock period. If there are sufficient registers at the beginning or end of the paths, retiming can move these registers into the combinational logic and still achieve the target clock period. Earlier we saw that clock scheduling is limited because hold constraints need to be considered. If the delays of the paths as well as the variations of the path delays are too large, it is at some point impossible to realize the hold constraints together with the setup constraints.

Retiming may increase the number of registers. This is the only drawback. For some designs the increase can be significant. However, RC can also decrease the number of registers. Usually for larger designs that have only one critical part, RC can improve the clock period as well as decrease the number of registers: In the uncritical parts the locations of the registers are very flexible and hence the registers can be moved and possibly merged.

6 An automated verification flow

Retiming used to pose fundamental hurdles for equivalence checking. Proving that two netlists are equivalent if one netlist was generated from the other through combinational synthesis as well as through retiming is a problem of enormous complexity.

To address these verification challenges RC writes out checkpoint files (Verilog netlists) that describe the design at a particular stage. When retiming is used, RC can write out the checkpoint files before and after retiming as shown in the following diagram.

Figure 17: The automated synthesis and verification flow with checkpoint files generated by RC and read by LEC. RC: read RTL, combinational synthesis, write checkpoint file, retiming, write checkpoint file, combinational synthesis, write final netlist. LEC: equivalence check 1 (combinational) compares the initial RTL with the pre-retiming checkpoint netlist, equivalence check 2 (retiming) compares the pre-retiming with the post-retiming checkpoint netlist, and equivalence check 3 (combinational) compares the post-retiming checkpoint netlist with the final netlist.

Along with each checkpoint file, RC also generates a corresponding dofile, a command script used by Conformal Logic Equivalence Checker (LEC). Equivalence between the RTL and the final netlist is established through a series of verification steps which compare the initial RTL with the first checkpoint file, checkpoint file with checkpoint file, and the last checkpoint file with the final netlist. The appropriate dofile sets up the verification of the corresponding stages as shown in the diagram. Conformal verifies the equivalence under the assumption that either only combinational synthesis operations were performed or only the registers were moved by retiming operations.

7 Case study: Retiming for a UWB baseband chip from Focus Enhancements

As a case study we describe how retiming in RC was used by Focus Semiconductor, a division of Focus Enhancements, for the dual-phy UWB baseband chip MADRAS.
This chip supports a proprietary Focus (Turbo) mode and a WiMedia mode which is compliant with the Multiband OFDM Alliance (MBOA). The Focus mode is more powerful than the MBOA mode: The ratio of the bandwidth versus the distance is about 2x greater. The chip is designed in a 0.13 um CMOS TSMC process technology with an analog front end. It has about 4 million transistors which correspond to approximately 1.5 million instances.

The Synchronization Module has a three stage hierarchical datapath implementation. Each stage is composed of a finite impulse response (FIR) filter which required datapath optimization support from RC. The Synchronization Peak Finder Module contains a divider which is used to normalize the synchronization threshold. Sufficient pipeline registers were added at the inputs and outputs of the block. RC then rebalances the combinational paths by retiming the registers into the combinational logic.

The Coarse Equalization Module consists of a Media Access Controller (MAC) and scratchpad memory. Retiming was also used for this module. Pipeline registers were added at the primary inputs and outputs and retiming automatically moved these registers into the logic and rebalanced the delay of the combinational paths. The Fine Equalization and the Tracking Module use a similar MAC and memory that made the use of retiming for these modules necessary.

A top-down sequential synthesis flow with retiming

The design consists of a 600K instance top level block FPT which was synthesized top-down. The retime attribute was set on 16 submodules corresponding to about 45% of the total logic and 49% of the registers. The following table shows all the modules for which the retime attribute was set to true in the automatic synthesize retime flow.

Columns: subdesign | gates | PIs | POs | number of registers (before, after, change) | clock period in ps (before, after, change)

block_1 51, ,589 2, % 12,908 3, %
block_2 13, ,766 2, % 13,119 3, %
block_3 28, ,283 6, % 6,583 3, %
block_4 2, % 6,724 3, %
block_5 17, % 5,489 3, %
block_6 8, % 9,044 4, %
block_7-a 7, ,269 1, % 5,484 3, %
block_7-b 7, ,269 1, % 5,484 3, %
block_7-c 7, ,269 1, % 5,451 3, %
block_7-d 7, ,269 1, % 5,446 3, %
block_7-e 7, ,269 1, % 5,465 3, %
block_7-f 7, ,269 1, % 5,459 3, %
block_8 7, ,088 1, % 8,421 5, %
block_9 28, ,500 1, % 12,291 5, %
block_10 18, ,862 3, % 9,195 4, %
block_11 88,925 1,683 1,700 6,694 5, % 5,212 4, %
Average 19, ,081 2, % (1) 7,611 3, % (2)

(1) percentage change of the average number of registers before and after retiming
(2) average of the percentage change of the clock period before and after retiming

The table shows the number of combinational gates, the number of primary inputs (PIs), and the number of primary outputs (POs). The next three columns show the number of registers before and after retiming and the percentage change.
The last three columns show the clock period in picoseconds before and after retiming and the percentage change. The table shows that retiming can increase and decrease the number of registers. Overall the number of registers decreases by 0.6%. The clock period improves always. For many of the subdesigns it is expected that the clock period decreases by a large amount because pipeline registers were added at either the primary inputs or primary outputs. CDNLive! Silicon Valley
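Footnotes (1) and (2) of the table describe two different ways of summarizing a percentage change, and they generally give different numbers. The following sketch uses made-up values (not taken from the table) to show the difference; the function names are hypothetical.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


def change_of_averages(rows):
    """Footnote (1) style: compare the column averages."""
    avg_before = sum(b for b, _ in rows) / len(rows)
    avg_after = sum(a for _, a in rows) / len(rows)
    return pct_change(avg_before, avg_after)


def average_of_changes(rows):
    """Footnote (2) style: average the per-row percentage changes."""
    return sum(pct_change(b, a) for b, a in rows) / len(rows)


# Illustrative (before, after) pairs only; these are NOT values from the paper.
rows = [(1000, 900), (100, 120)]
print(round(change_of_averages(rows), 2))  # -7.27
print(round(average_of_changes(rows), 2))  # 5.0
```

The first measure weights large subdesigns more heavily, which suits a total such as the register count. The second treats every subdesign equally, which suits a per-block quantity such as the clock period.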

Conclusion

With increasing demands for faster designs and shorter time-to-market, it is important for designers to look for efficient optimization techniques. Retiming in Encounter RTL Compiler is a very powerful technique that can achieve substantial improvements in performance. In this paper we have described how RTL Compiler uses clock scheduling in a sequentially driven synthesis flow and then performs retiming to minimize the clock period and the number of registers. We have discussed special cases of retiming: registers with enable signals, registers with a multiplexer feedback loop, and registers with asynchronous set and reset signals. With RTL Compiler it is easy to perform retiming, and the direct link to Conformal Logic Equivalence Checker provides a complete verification solution.

References

[1] J. P. Fishburn, Clock Skew Optimization, IEEE Transactions on Computers, vol. 39, July.
[2] C. Leiserson and J. Saxe, Retiming Synchronous Circuitry, Algorithmica, vol. 6, pp. 5-35.
[3] N. E. Young, R. E. Tarjan, J. B. Orlin, Faster Parametric Shortest Path and Minimum Balance Algorithms, Networks, vol. 21, 1991.
[4] S. S. Sapatnekar, R. B. Deokar, Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 10, October.
[5] S. S. Sapatnekar, Timing, Kluwer Academic Publishers, Boston, MA.

Appendix: Encounter RTL Compiler commands for retiming

Automatic synthesis with retiming

It is easy to use retiming in RC: only the attribute retime needs to be set to true for the design or subdesign that should be retimed. During synthesis the design or subdesign is then processed automatically by the sequentially driven synthesis flow with retiming as described in Section 4.

    set_attr retime true [subdesign]
    synthesize -to_mapped

Manual retiming flow

This flow can be used when one or more specific modules need to be retimed. It can serve as an exploratory tool to see what retiming can do for a subdesign in a mapped design. The first step, retime -prepare, prepares the design for retiming, and retime -min_delay performs the actual retiming. Even though retime -min_delay performs a local mapping of the logic immediately around the flops, it is recommended to follow it with an incremental synthesis or, preferably, a global synthesis, depending on the granularity of the changes.

    retime -prepare [subdesign | design]
    retime -min_delay [subdesign | design]
    synthesize -to_mapped [-incr]

Manual retiming flow minimizing the number of registers

This flow explicitly tries to minimize the number of registers and thus the area. It should be used only for a design that has positive slack.

    synthesize -to_mapped
    retime -min_area [subdesign | design]
    synthesize -to_mapped [-incr]

Attributes

    set_attr dont_retime true [flop]
Do not retime the register specified.

    set_attr retime_hard_region true [subdesign]
Retiming cannot move registers into or out of the subdesign.

    set_attr boundary_opto false [subdesign]
Disable boundary optimization (constant propagation and rewiring of equivalent signals across hierarchy) and preserve the input and output pins of a subdesign. This enables easier ECO for the blocks and might be necessary for formal verification.

    set_attr retime_async_reset true
Enable retiming on flops with asynchronous set or reset signals. The runtime may increase if registers need to be moved backward. By default, registers with asynchronous set or reset signals are excluded from retiming.

    set_attr retime_optimize_reset true
If this attribute is used in combination with the previous attribute, the reset logic is optimized by replacing asynchronous flops with simple flops wherever possible.

For more information refer to the Encounter RTL Compiler User Guide, chapter 9, Retiming the Design.
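The remark that backward moves of registers with set or reset signals can cost extra runtime relates to the reset line justification problem mentioned in the abstract. When a register is moved from the output of a gate to its inputs, the new registers need reset values that reproduce the original reset value at the gate output. The following brute-force sketch illustrates the search on a single gate; it is an illustration under simplifying assumptions, not the method RC uses, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple


def justify_reset(gate: Callable[..., int], n_inputs: int,
                  reset_value: int) -> List[Tuple[int, ...]]:
    """Return every assignment of reset values to the gate inputs that
    makes the gate output equal to the original register's reset value."""
    return [bits for bits in product((0, 1), repeat=n_inputs)
            if gate(*bits) == reset_value]


def and2(a: int, b: int) -> int:
    return a & b


def const0(a: int) -> int:
    # Redundant logic whose output is always 0.
    return a & (1 - a)


print(justify_reset(and2, 2, 1))    # [(1, 1)]
print(justify_reset(and2, 2, 0))    # [(0, 0), (0, 1), (1, 0)]
print(justify_reset(const0, 1, 1))  # []
```

An empty result means the backward move across that gate cannot preserve the reset state. On real logic cones the search space grows with the number of inputs, which gives some intuition for why backward moves may take longer.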

Interface to Conformal Logic Equivalence Checker (LEC)

The checkpoint files of the automatic verification flow described in Section 6 and the corresponding dofiles for LEC are generated by RC if the checkpoint attributes are set as shown below.

    set_attribute checkpoint_flow true
    set_attribute library my_library.lib
    read my_design.v
    elaborate
    set_attribute checkpoint_netlist_naming_style \
        my_chk_dir/chk_%d.v /designs/my_top
    set_attribute checkpoint_dofile_naming_style \
        my_chk_dir/chk_%d_to_chk_%d.do /designs/my_top
    read_sdc my_constraints.sdc
    set_attr retime true my_top
    synthesize -to_mapped
    write -m > final.v
    write_do_lec -revised final.v > final.do

To run LEC:

    lec -ultra -Dofile hdl_to_chk_01.do
    lec -ultra -Dofile chk_01_to_chk_02.do
    lec -ultra -Dofile final.do

For more information refer to the document Interfacing between RTL Compiler and Conformal.
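The LEC runs listed above form a chain from the golden RTL through each checkpoint netlist to the final netlist. As a small illustration, the sketch below generates the dofile names in verification order for a given number of checkpoints. The two-digit numbering and the names of the first and last dofile are inferred from the listing above and are assumptions; the helper itself is hypothetical and not part of RC or LEC.

```python
from typing import List


def lec_dofiles(num_checkpoints: int,
                style: str = "chk_%02d_to_chk_%02d.do") -> List[str]:
    """Dofile names in the order they should be verified:
    RTL -> chk_01 -> chk_02 -> ... -> final netlist.

    Assumption: checkpoints are numbered from 1 with two digits,
    as in the example run listed in the text.
    """
    names: List[str] = []
    if num_checkpoints >= 1:
        names.append("hdl_to_chk_01.do")
    for i in range(1, num_checkpoints):
        names.append(style % (i, i + 1))
    names.append("final.do")
    return names


print(lec_dofiles(2))
# ['hdl_to_chk_01.do', 'chk_01_to_chk_02.do', 'final.do']
```

Every link in the chain has to pass for the final netlist to be considered equivalent to the golden RTL.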


Next-generation Power Aware CDC Verification What have we learned? Next-generation Power Aware CDC Verification What have we learned? Kurt Takara, Mentor Graphics, kurt_takara@mentor.com Chris Kwok, Mentor Graphics, chris_kwok@mentor.com Naman Jain, Mentor Graphics, naman_jain@mentor.com

More information

DESIGNING MULTI-FPGA PROTOTYPES THAT ACT LIKE ASICS

DESIGNING MULTI-FPGA PROTOTYPES THAT ACT LIKE ASICS Design Creation & Synthesis White Paper DESIGNING MULTI-FPGA PROTOTYPES THAT ACT LIKE ASICS May 2009 ABSTRACT FPGA prototyping has become indispensable for functional verification and early software integration

More information

Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure

Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Subhendu Roy 1, Pavlos M. Mattheakis 2, Laurent Masse-Navette 2 and David Z. Pan 1 1 ECE Department, The University of Texas at Austin

More information

Compiler User Guide. Intel Quartus Prime Pro Edition. Updated for Intel Quartus Prime Design Suite: Subscribe Send Feedback

Compiler User Guide. Intel Quartus Prime Pro Edition. Updated for Intel Quartus Prime Design Suite: Subscribe Send Feedback Compiler User Guide Intel Quartus Prime Pro Edition Updated for Intel Quartus Prime Design Suite: 18.0 Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1. Design Compilation...

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Digital System Design with SystemVerilog

Digital System Design with SystemVerilog Digital System Design with SystemVerilog Mark Zwolinski AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid Capetown Sydney Tokyo

More information

Synthesis Options FPGA and ASIC Technology Comparison - 1

Synthesis Options FPGA and ASIC Technology Comparison - 1 Synthesis Options Comparison - 1 2009 Xilinx, Inc. All Rights Reserved Welcome If you are new to FPGA design, this module will help you synthesize your design properly These synthesis techniques promote

More information

CS250 DISCUSSION #2. Colin Schmidt 9/18/2014 Std. Cell Slides adapted from Ben Keller

CS250 DISCUSSION #2. Colin Schmidt 9/18/2014 Std. Cell Slides adapted from Ben Keller CS250 DISCUSSION #2 Colin Schmidt 9/18/2014 Std. Cell Slides adapted from Ben Keller LAST TIME... Overview of course structure Class tools/unix basics THIS TIME... Synthesis report overview for Lab 2 Lab

More information

General Framework for Removal of Clock Network Pessimism

General Framework for Removal of Clock Network Pessimism General Framework for Removal of Clock Network Pessimism Jindrich Zejda Synopsys, Inc. 700 East Middlefield Road Mountain View, CA 94043, U.S.A. +1 650 584-5067 zejdaj@synopsys.com Paul Frain Synopsys,

More information

Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs

Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs Built-In Self-Test for Programmable I/O Buffers in FPGAs and SoCs Sudheer Vemula, Student Member, IEEE, and Charles Stroud, Fellow, IEEE Abstract The first Built-In Self-Test (BIST) approach for the programmable

More information

COPYRIGHTED MATERIAL. Architecting Speed. Chapter 1. Sophisticated tool optimizations are often not good enough to meet most design

COPYRIGHTED MATERIAL. Architecting Speed. Chapter 1. Sophisticated tool optimizations are often not good enough to meet most design Chapter 1 Architecting Speed Sophisticated tool optimizations are often not good enough to meet most design constraints if an arbitrary coding style is used. This chapter discusses the first of three primary

More information

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp

160 M. Nadjarbashi, S.M. Fakhraie and A. Kaviani Figure 2. LUTB structure. each block-level track can be arbitrarily connected to each of 16 4-LUT inp Scientia Iranica, Vol. 11, No. 3, pp 159{164 c Sharif University of Technology, July 2004 On Routing Architecture for Hybrid FPGA M. Nadjarbashi, S.M. Fakhraie 1 and A. Kaviani 2 In this paper, the routing

More information

Graphics: Alexandra Nolte, Gesine Marwedel, Universität Dortmund. RTL Synthesis

Graphics: Alexandra Nolte, Gesine Marwedel, Universität Dortmund. RTL Synthesis Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Universität Dortmund RTL Synthesis Purpose of HDLs Purpose of Hardware Description Languages: Capture design in Register Transfer Language form i.e. All

More information

A framework for verification of Program Control Unit of VLIW processors

A framework for verification of Program Control Unit of VLIW processors A framework for verification of Program Control Unit of VLIW processors Santhosh Billava, Saankhya Labs, Bangalore, India (santoshb@saankhyalabs.com) Sharangdhar M Honwadkar, Saankhya Labs, Bangalore,

More information

Comprehensive Place-and-Route Platform Olympus-SoC

Comprehensive Place-and-Route Platform Olympus-SoC Comprehensive Place-and-Route Platform Olympus-SoC Digital IC Design D A T A S H E E T BENEFITS: Olympus-SoC is a comprehensive netlist-to-gdsii physical design implementation platform. Solving Advanced

More information

Design of a Low Density Parity Check Iterative Decoder

Design of a Low Density Parity Check Iterative Decoder 1 Design of a Low Density Parity Check Iterative Decoder Jean Nguyen, Computer Engineer, University of Wisconsin Madison Dr. Borivoje Nikolic, Faculty Advisor, Electrical Engineer, University of California,

More information

12. Use of Test Generation Algorithms and Emulation

12. Use of Test Generation Algorithms and Emulation 12. Use of Test Generation Algorithms and Emulation 1 12. Use of Test Generation Algorithms and Emulation Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin

More information

Static Timing Verification of Custom Blocks Using Synopsys NanoTime Tool

Static Timing Verification of Custom Blocks Using Synopsys NanoTime Tool White Paper Static Timing Verification of Custom Blocks Using Synopsys NanoTime Tool September 2009 Author Dr. Larry G. Jones, Implementation Group, Synopsys, Inc. Introduction With the continued evolution

More information

Selecting PLLs for ASIC Applications Requires Tradeoffs

Selecting PLLs for ASIC Applications Requires Tradeoffs Selecting PLLs for ASIC Applications Requires Tradeoffs John G. Maneatis, Ph.., President, True Circuits, Inc. Los Altos, California October 7, 2004 Phase-Locked Loops (PLLs) are commonly used to perform

More information

When addressing VLSI design most books start from a welldefined

When addressing VLSI design most books start from a welldefined Objectives An ASIC application MSDAP Analyze the application requirement System level setting of an application Define operation mode Define signals and pins Top level model Write a specification When

More information