Sequential Logic Synthesis with Retiming in Encounter RTL Compiler (RC)
Christoph Albrecht (1), Shrirang Dhamdhere (1), Suresh Nair (1), Krishnan Palaniswami (2), Sascha Richter (1)
(1) Cadence Design Systems, (2) Focus Semiconductor

Session Track: Digital IC Design
Session Number: 2.3
Relevant Cadence Products: Encounter RTL Compiler (RC), Encounter Conformal Logic Equivalence Checker (LEC)

Abstract

Typical ASIC designs are highly unbalanced with respect to the timing criticality of their combinational logic paths. This is mainly due to the ad-hoc manual design specification of the register transfer level (RTL), which does not use any information regarding the sequential timing criticality. Traditional logic synthesis does not support borrowing of timing slack across registers, and the optimization is restricted by fixed positions of the registers. This may result in a suboptimal solution, in a loss of performance, and unnecessary area and power consumption.

This paper explains the concept of clock scheduling and retiming used by Encounter RTL Compiler (RC) to optimize across register boundaries. Retiming is a structural transformation which changes the positions of the registers without modifying the input-output behavior of the circuit. The reader will understand how the area, the number of registers, or the delay of the design is minimized. Computational results show the tradeoff between these two objectives.

Practical applications are discussed: Registers may have different control signals, enable signals, or reset signals. This leads to the multiclass retiming problem and the reset line justification problem. Retiming used to be a difficult challenge for equivalence checking. However, together with Encounter Conformal Logic Equivalence Checker (LEC) the verification is now simple: RC writes out checkpoint netlist files and one script, which LEC can then process to automatically verify the golden RTL against the final netlist.
We present a case study showing how retiming was used by Focus Semiconductor, a division of Focus Enhancements, on a 1.5 M instance UWB baseband chip. Retiming substantially improved the Quality of Results (QoR) and helped to meet the design objectives. CDNLive! Silicon Valley
1 Introduction

Traditional combinational logic synthesis focuses all the optimization efforts on the combinational paths between the registers. It does not support any tradeoff between tight paths and loose paths when these are separated by registers. To motivate the use of sequential logic synthesis with retiming, we will discuss the slack distribution of a typical ASIC design.

Figure 1: Slack distribution of a typical ASIC design.

Figure 1 shows the slack distribution, more specifically the distribution of the setup slacks of a late-mode analysis after synthesis. For each slack interval on the x-axis, the number of combinational paths which have a slack value within that interval is shown. The design has a worst negative slack of -529 ps.

Figure 2: Slack distribution of the same ASIC design as in Figure 1, this time with optimized clock latencies.

Figure 2 shows an optimized slack distribution for the same design. The netlist was not changed, only the clock latencies at the registers. The latencies were computed with a slack balancing algorithm which we will discuss later. The number of critical paths has decreased drastically. Only a small fraction of the paths have a negative slack. In this case it was not possible to improve the worst negative slack, because the worst path in this design is a path from a primary input to a primary output.

Figure 1 and Figure 2 demonstrate the optimization potential which becomes available when the registers are unlocked and not kept fixed as hard boundaries that constrain the synthesis optimization algorithms. With the optimized clock latencies, many paths become uncritical. The additional slack can be used to downsize the combinational gates or even to use a different logic structure that has smaller area and power consumption.
While clock scheduling was not able to reduce the worst negative slack for this specific design, clock scheduling was able to improve the slack of the side paths. These are either combinational paths that start
at the primary input of the critical path and end at a register or paths that start at a register and end at a primary output. This is helpful for the synthesis optimization algorithms in RC. RC is able to improve the slack of a path by using slack of the side paths.

In this paper we discuss the two sequential optimization techniques, clock scheduling and retiming, and show how the combination of both techniques is used in RC. The paper is organized as follows: In Section 2 we discuss clock scheduling. Clock scheduling is also known as useful skew. It changes the latencies of the clock signal but does not change the logic. The different latencies need to be realized by a sophisticated clock network. In Section 3 we describe retiming. Retiming is a structural transformation. While retiming does not change the combinational gates, it modifies the netlist by moving the registers forward and backward in the logic. RC can use clock scheduling as an intermediate step to drive the logic synthesis and optimization process. Ultimately, it realizes the different latencies by retiming so that a conventional zero or limited skew backend flow can place the design, construct the clock network, and route the nets. This is described in Section 4. In practice, retiming can be constrained by registers that have different control signals (for example, enable signals, asynchronous set or reset signals). Section 5 discusses these constraints. In Section 6 we discuss the automatic verification flow with LEC. In the last section we present a case study of how retiming was used on a UWB baseband chip from Focus Semiconductor.

2 Clock Scheduling

The following figure shows how the worst slack of a design can be improved by changing the clock latencies: Buffers are added to the clock distribution network and the switching time of the register is delayed. In this case the worst slack is improved from -2 ns to 0 ns and the design meets the timing requirements.
If the clock latency of the capturing register of a combinational path is increased, the slack of the combinational path increases by the same amount. If, on the other hand, the clock latency of the capturing register is decreased, the slack of the combinational path decreases. Increasing the clock latency of the launching register decreases the slack, and decreasing the latency has the opposite effect on the slack of the path.

Figure 3: The worst slack is improved by adjusting the clock latencies. Target clock period: 5 ns. Clock latencies added: +2 ns, +1 ns, +1 ns. Worst slack without clock latencies: -2 ns. Worst slack with clock latencies: 0 ns.
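The effect of the latencies on the setup slack can be written as slack = T + l(capture) - l(launch) - delay. The following is a minimal sketch of that rule. The three-stage pipeline, its delays, and the register names are assumed values chosen for illustration; they are not the circuit of Figure 3.

```python
def setup_slacks(paths, latency, period):
    """Return the setup slack of every register-to-register path.

    paths:   dict mapping (launch, capture) -> maximum combinational delay
    latency: dict mapping register -> clock latency (missing means 0)
    """
    return {
        (i, j): period + latency.get(j, 0) - latency.get(i, 0) - delay
        for (i, j), delay in paths.items()
    }


# Hypothetical pipeline r0 -> r1 -> r2 -> r3, delays in ns (assumed values).
paths = {("r0", "r1"): 7, ("r1", "r2"): 4, ("r2", "r3"): 4}
period = 5

no_skew = setup_slacks(paths, {}, period)
with_skew = setup_slacks(paths, {"r1": 2, "r2": 1}, period)

print(min(no_skew.values()))    # worst slack without clock latencies
print(min(with_skew.values()))  # worst slack with clock latencies
```

Delaying the clock at r1 gives the long first stage more time, and the smaller delay at r2 passes part of that borrowed time on to the next stage.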
A linear programming formulation

The clock scheduling problem can be formulated as a linear program. This was first done by Fishburn in 1990 [1]. Let $T$ be the clock period, which should be minimized. Furthermore, let $l_i$ be the latency of the clock signal arriving at register $i$, and let $d_{ij}$ be the maximum delay of all combinational paths from register $i$ to register $j$.

\[ \min T \quad \text{subject to} \quad l_i + d_{ij} \le l_j + T \quad \text{for all combinational paths } (i, j). \]

The difference between the two sides of the inequality is the slack. Should the design have constrained primary inputs or outputs, we can represent all these inputs and outputs by one dummy register that can have, without loss of generality, a clock latency of zero. Hence, we can assume that even in this case the linear program has the form above.

The linear program has a very special structure and can be solved efficiently with combinatorial algorithms. It can be proved that the minimum clock period achievable by clock scheduling is equal to the maximum mean delay over all cycles in the register-to-register timing graph, that is, the total delay of a cycle divided by its number of edges. The register-to-register timing graph contains a node for every register and an edge whenever there is a combinational path between the registers, with a weight equal to the maximum delay of these paths.

In general, the linear program does not have a unique solution, and an arbitrary solution that minimizes the clock period is usually not the desirable one. For example, we examined the ASIC design for which the two different slack distributions are shown in Figure 1 and Figure 2. The worst negative slacks of the two slack distributions are equal and so are the clock periods at which the chips can operate without failure.

Clock scheduling optimally balancing the slack

In the following we discuss how it is possible to compute a clock schedule with a specific property which we call optimally balanced slack. As a result of this property many paths are uncritical and have a lot of slack.
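One direction of the cycle statement in the linear programming formulation follows directly from the constraints. As a short sketch, take any cycle $C$ in the register-to-register timing graph and add up the constraint $l_i + d_{ij} \le l_j + T$ over its edges. Every latency appears once on each side and cancels, which leaves a lower bound on $T$ (here $|C|$ denotes the number of edges of the cycle, a symbol introduced only for this derivation):

```latex
\sum_{(i,j) \in C} \left( l_i + d_{ij} \right) \;\le\; \sum_{(i,j) \in C} \left( l_j + T \right)
\quad\Longrightarrow\quad
\sum_{(i,j) \in C} d_{ij} \;\le\; |C| \, T
\quad\Longrightarrow\quad
T \;\ge\; \frac{1}{|C|} \sum_{(i,j) \in C} d_{ij}.
```

No choice of latencies can therefore push the clock period below the mean delay of any cycle. That this bound is actually attained by the cycle with the largest mean delay is the part that needs the proof referred to in the text.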
This part is more theoretical. If the reader's time is limited, we recommend skipping it, because the sections that follow are more important for practical use. We consider a small example circuit with four registers, a, b, c, and d, shown in Figure 4.

Figure 4: Example circuit with combinational gates and four registers. The numbers specify the delay of the gates.

From the circuit we can construct the register-to-register timing graph which is shown in the following figure. The graph has one node for each of the four registers and an edge between two nodes whenever there is a combinational path between the corresponding registers. Associated with the edges is the maximum delay of the combinational paths.
Figure 5: Register-to-register timing graph for the circuit in Figure 4.

Without clock latencies, the minimum feasible clock period for this circuit is equal to the maximum delay of the combinational paths, in this case T = 11. By increasing the clock latency for the register b to +1, the clock period can be decreased to T = 10. This is the minimum clock period which can be achieved by clock scheduling, because with these latencies the two paths (b,d) and (d,b) have a slack of zero. Since these two paths form a cycle, shifting a latency to give one of them more slack takes the same amount of slack away from the other. Figure 6 shows the register-to-register timing graph with the latency +1 at register b. In addition to the combinational delays we also show the slacks for the clock period T = 10 in brackets.

Figure 6: A clock schedule applied to the registers such that the worst incoming slack equals the worst outgoing slack for every register. Clock period T = 10, latency +1 at register b, edges labeled delay (slack). The edges corresponding to the critical paths with a slack smaller than or equal to 1 are shown in red.

The clock schedule shown in Figure 6 has the property that for every register the worst incoming slack is equal to the worst outgoing slack. Changing the clock latency of one single register alone does not give an improvement, since the worst slack of all the paths starting or ending at the register can only get worse. Figure 6 shows that there is one critical edge in red, the edge (d,c), which is not part of a critical cycle. It is possible to increase the slack of this edge by increasing the clock latency of the registers a and c simultaneously. This does not affect the two critical edges (c,a) and (a,c). The result is shown in Figure 7. In this figure the worst incoming slack equals the worst outgoing slack for every subset of the registers.
Note that before, in Figure 6, the worst outgoing slack for the registers a and c together is equal to 5 whereas the worst incoming slack is only 1.

Figure 7: An optimally balanced clock schedule: The worst incoming slack equals the worst outgoing slack for every subset of the registers. Clock period T = 10, latency +2 at registers a and c, edges labeled delay (slack).
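The clock periods quoted for the four-register example can be checked numerically. The sketch below computes the maximum mean cycle delay with Karp's recurrence (in its maximization form) and then derives latencies for that period by repeated relaxation of the constraints. The edge delays are taken from the labels of Figure 6; which of the registers a and c some of the edges attach to is an assumption, since the drawing is not available in this text, and the function names are illustrative only.

```python
from fractions import Fraction


def max_mean_cycle(nodes, delay):
    """Maximum mean cycle delay (Karp's recurrence, maximization form)."""
    n = len(nodes)
    # walk[k][v]: largest total delay of a walk with exactly k edges ending at v
    walk = [{v: None for v in nodes} for _ in range(n + 1)]
    for v in nodes:
        walk[0][v] = 0
    for k in range(1, n + 1):
        for (i, j), d in delay.items():
            if walk[k - 1][i] is not None:
                cand = walk[k - 1][i] + d
                if walk[k][j] is None or cand > walk[k][j]:
                    walk[k][j] = cand
    best = None
    for v in nodes:
        if walk[n][v] is None:
            continue
        ratio = min(Fraction(walk[n][v] - walk[k][v], n - k)
                    for k in range(n) if walk[k][v] is not None)
        if best is None or ratio > best:
            best = ratio
    return best


def latencies_for(nodes, delay, period):
    """Smallest non-negative latencies with l_i + d_ij <= l_j + period."""
    lat = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for (i, j), d in delay.items():
            if lat[i] + d - period > lat[j]:
                lat[j] = lat[i] + d - period
                changed = True
        if not changed:
            return lat
    raise ValueError("period is below the maximum mean cycle delay")


nodes = ["a", "b", "c", "d"]
# Delays from the Figure 6 labels; some edge endpoints are assumed.
delay = {("a", "c"): 9, ("c", "a"): 9, ("d", "c"): 9, ("a", "b"): 6,
         ("d", "a"): 7, ("c", "d"): 5, ("b", "d"): 9, ("d", "b"): 11}

period = max_mean_cycle(nodes, delay)
print(max(delay.values()), period)          # period without and with scheduling
print(latencies_for(nodes, delay, period))  # one feasible clock schedule
```

The relaxation only finds one feasible schedule for the minimum period. It does not balance the remaining slack, which is the job of the minimum balance algorithm discussed in the text.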
The clock schedule shown in Figure 2, in which the number of critical paths has decreased so drastically, has exactly this property. It is computationally too expensive to consider all subsets of the registers explicitly, because there are exponentially many of them. Nevertheless, the efficient minimum balance algorithm by Young, Tarjan, and Orlin [3] can find such a solution by iteratively finding critical cycles and contracting them.

For synthesis operations it is helpful if the side paths of a critical path have additional slack. The slack can be used to reduce the delay of the critical path. An example of such a synthesis operation is Shannon decomposition, shown in the following figure.

Figure 8: A critical path becomes short and fast using Shannon decomposition.

If only one path starting at a point a and ending at a point x is critical and all other paths ending at x are uncritical, then the fanin logic of x can be duplicated: in one copy the value of a is permanently set to zero and in the other it is set to one. The two outputs of the replicated logic feed a multiplexer that chooses the right value for x depending on the value of a. The constant values for a are propagated to simplify the logic. After this transformation the path from a to x is very short and hence very fast.

Limitations of clock scheduling

Clock scheduling has limitations. Changing the clock latencies may increase the number of hold violations. The hold constraint ensures that data signals do not arrive too early at the data input pin of the register at the end of the path. The signal has to arrive after the register has closed. A high number of hold violations can potentially lead to an enormous number of hold buffers, which need to be added at the end of the flow. Due to process variations the final delay of the paths on the fabricated chip can deviate from the computed delay. This limits the use of clock scheduling further.
For example, it is not possible to have a long combinational path that has a combinational delay equal to ten times the clock period and realize the timing constraints by adjusting the latencies of the clock signals at the launching and receiving register. On such a combinational path there would be 10 different data signals at the same time. These signals need to arrive at the receiving register at the right time. If the combinational delay of the path were only 10% smaller on the final fabricated chip due to process variations, the signal would arrive too early and this would result in a hold time violation. As the delay could also increase, it is not possible to fix this hold violation by adding delay with hold buffers. Nevertheless, RC can internally use large positive and negative clock latencies and optimize the combinational logic with these latencies. In the end, the latencies are realized by retiming and moving the registers through the combinational logic. The latencies are only bounded by the number and the movement of the registers.
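The Shannon decomposition used earlier in this section as an example of a synthesis operation can be illustrated on a small function. This is a sketch only: the function f below is a hypothetical fanin cone of x, with a as the late-arriving input, and is not taken from the paper.

```python
from itertools import product


def f(a, b, c, d):
    """Hypothetical fanin logic of x; 'a' is the critical input."""
    return ((a & (b | c)) ^ (d & (1 - a))) & 1


def f_shannon(a, b, c, d):
    """Same function after Shannon decomposition with respect to 'a'."""
    x0 = f(0, b, c, d)  # copy of the logic with a tied to 0
    x1 = f(1, b, c, d)  # copy of the logic with a tied to 1
    return x1 if a else x0  # 'a' now only drives the final multiplexer


# Exhaustive check that the decomposition preserves the function.
equivalent = all(
    f(a, b, c, d) == f_shannon(a, b, c, d)
    for a, b, c, d in product((0, 1), repeat=4)
)
print(equivalent)
```

After constant propagation the two copies of this particular f collapse to d (for a = 0) and b OR c (for a = 1), so the path from a to x passes through the multiplexer only.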
3 Retiming

Retiming is a powerful sequential optimization technique which overcomes the limitations of clock scheduling. Retiming moves the registers across the combinational logic to improve the performance without changing the input/output behavior of the circuit. The following figure shows how the slack of a circuit can be improved by retiming. It is the same circuit to which we applied clock scheduling in Figure 3. The registers are retimed backward against the direction of the signal propagation.

Figure 9: The worst slack is improved by retiming the registers backward against the direction of the signal propagation. Target clock period: 5 ns. Worst slack before retiming: -2 ns. Worst slack after retiming: 0 ns.

This example shows that retiming changes the number of registers. In this case, the number of registers increases. However, the number of registers can also decrease. RC minimizes the clock period as a first objective. Among all possible retiming solutions that achieve the minimum clock period, RC finds the solution with the minimum number of registers. In addition, RC has the option to minimize the number of registers without increasing the current clock period.

Any retiming can be achieved by a sequence of the two elementary retiming steps: Forward retiming removes the registers at the inputs of a gate and creates new registers at the outputs. Backward retiming does the opposite: It removes the registers at the outputs and creates a new register at each input. The two retiming steps are shown in the following figure.

Figure 10: Registers retimed forward and backward over an AND gate.

For forward retiming it is necessary that each input of the gate is driven by a register. Similarly, for backward retiming the gate must not drive any combinational gate but only registers.
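The two elementary steps can be modeled with the edge-weight view used by Leiserson and Saxe [2]: every wire carries a register count, and a retiming assigns each gate an integer lag. The sketch below is a simplified model, not RC's implementation. The three-gate loop, the gate names, and the sign convention (+1 for one backward move over a gate, -1 for one forward move) are assumptions made for the example.

```python
def retime(registers, r):
    """Apply a retiming r to a circuit graph.

    registers: dict mapping wire (u, v) -> number of registers on that wire
    r:         dict mapping gate -> lag; +1 is one backward move over the
               gate, -1 is one forward move (gates not listed stay at 0)
    """
    result = {}
    for (u, v), count in registers.items():
        new_count = count + r.get(v, 0) - r.get(u, 0)
        if new_count < 0:
            # The elementary step needs a register that is not there.
            raise ValueError(f"no register available on wire {u}->{v}")
        result[(u, v)] = new_count
    return result


# Hypothetical loop g1 -> g2 -> g3 -> g1 with two registers after g3.
loop = {("g1", "g2"): 0, ("g2", "g3"): 0, ("g3", "g1"): 2}

backward = retime(loop, {"g3": 1})   # pull one register back over g3
forward = retime(loop, {"g1": -1})   # push one register forward over g1

print(backward, sum(backward.values()))
print(forward, sum(forward.values()))
```

In both results the loop still contains two registers, which is the invariant that keeps the input/output behavior unchanged. Asking for a forward move over g2, whose input wire carries no register, raises the error, matching the precondition stated above.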
In order to ensure equivalent input/output behavior of the circuit, retiming cannot change the number of registers on any loop and on any path from a primary input to a primary output. This is guaranteed by the two operations. Of course, it may still be possible to retime registers forward or backward over a gate if
this condition does not hold for the original circuit, but the condition has to be achieved by elementary retiming steps applied to the other gates before. Constants and dangling logic (logic that does not drive anything) are an exception. Constant propagation as part of the RC synthesis operations simplifies any logic driven by a constant, unless the gates are preserved by an attribute. Similarly, dangling logic is removed. However, should this logic be preserved, retiming is able to create or remove registers at constants and dangling logic.

The following figure shows an example in which retiming cannot improve the critical path because no elementary retiming step is possible:

Figure 11: An example with primary inputs A and B and primary output C in which retiming cannot improve the clock period because the register cannot be moved forward.

Depending on the clk-to-q delay of the register, the critical path goes from the register to the primary output C. If the primary inputs are unconstrained, then the critical path starts at the register in any case. Just checking the slack at the data input pin and the output pin of the register, the user may wonder why the register was not moved forward. This is not possible, because there is no register directly following the primary input B.

Efficient algorithms for retiming have been developed and published. We refer the interested reader to the fundamental paper by Leiserson and Saxe published in 1991 [2], in which the problem of finding a retiming that realizes a given clock period and minimizes the number of registers is formulated and solved as a minimum cost flow problem. Polynomial time algorithms have been developed for this problem. A comprehensive treatment of timing in general, and of clock scheduling and retiming in particular, is the recent book by S. Sapatnekar [5].
Relationship between clock scheduling and retiming

The two sequential optimization techniques, clock scheduling and retiming, are related: It can be proved that the clock period achievable by clock scheduling (ignoring any hold constraints) is a lower bound on the clock period that can be achieved by retiming [3]. It can also be proved that retiming can almost achieve this clock period: The minimum clock period achievable by retiming is at most the minimum clock period achievable by clock scheduling plus the maximum delay of all gates.

If a clock schedule is given, a retiming can be computed as follows: Find a register with the maximum positive clock latency. Decrease the clock latency until the incoming slack is zero. If the slack is already zero, perform a backward retiming over the gate driving the register. The new registers added in front of the gate get a clock latency equal to the latency of the original registers minus the delay of the gate. This procedure is repeated until the clock latency of each register is smaller than half the delay of the gate driving the register. Then a similar procedure is applied to registers with the minimum negative clock latency. The registers are moved forward and the clock latency is increased by the delay of the gate until the clock latency of each register is larger than the negative value of half the delay of the gate driven by the register. If the clock latency of every register is then set to zero, the retimed circuit has a clock period which is at most the clock period of the original circuit with clock scheduling plus the maximum delay of all gates.
4 The global sequentially driven synthesis flow in RC

RC combines the two sequential optimization techniques, clock scheduling and retiming, in a global sequential synthesis flow shown in the following figure.

Figure 12: The global sequentially driven synthesis flow in RC. [Diagram: sequentially driven synthesis and optimization (combinational synthesis interlinked with clock scheduling), followed by retiming, followed by combinational synthesis.]

The logic synthesis and optimization algorithms are tightly interlinked with clock scheduling. Clock scheduling computes clock latencies which improve the clock period and the slack of the combinational paths. The synthesis algorithms can use slack of side paths to further improve critical paths. In the next step, retiming moves the registers through the combinational logic. It minimizes the clock period and, as a second objective, minimizes the number of registers. Finally, retiming is followed once more by combinational synthesis. This is necessary because the loads of the gates have changed as the registers were moved.

RC performs these steps automatically. The user only has to set the attribute retime to true for either the top design or the subdesigns for which retiming should be performed and then call the synthesize command.

5 Special cases for retiming

In this section we describe special cases for retiming due to control signals at the registers. The control signals at the registers may constrain the movement of the registers. First we discuss the retiming of registers with enable signals. Then we describe the case when registers with an enable signal are implemented by a simple register with a multiplexer feedback loop. Finally, we discuss asynchronous set and reset signals.

Retiming of registers with different enable signals

In practice, the retiming of the registers can be constrained: The registers in the circuit may have different control signals, for example enable signals. Retiming cannot combine registers which have different control signals.
Figure 13 shows an example. To improve the timing, the two registers should be combined and retimed backward. However, this is not possible because the two registers receive different enable signals. RC can combine and retime registers forward or backward only if they receive the same enable signals.
Figure 13: The two registers cannot be moved backward because they receive different enable signals.

Multiplexer feedback loop

Registers with an enable signal can also be implemented by a simple register and a multiplexer. This may be an advantage for retiming because the registers can then be merged even though the enable signals are different. It may, however, also constrain the register movement and increase the number of registers.

Figure 14 shows that the number of registers can be larger. It is a pipeline design with three stages of registers at the primary outputs. The enable is realized by a multiplexer. When the registers are retimed into the combinational logic (applying only the elementary retiming steps in Figure 10), one register has to remain in each loop with the multiplexer. Furthermore, registers pile up at the select lines of the multiplexer.

Figure 14: Registers with enable can be implemented by a simple register and a multiplexer. This may increase the register count when the registers are moved backward.

If the registers have a built-in enable signal, which can be moved with the registers, instead of a loop with a multiplexer, then the number of registers after retiming is smaller. If the registers with the multiplexers are at the primary inputs and have to be moved forward, the problem is different: only the last register can be retimed forward. To retime more registers forward it would be necessary to have additional registers at the select lines of the multiplexers.

By default RC uses registers which have enable logic built into the register. RC uses simple registers in a loop with a multiplexer only if the variable hdl_ff_keep_feedback is true. The results depend on the structure of the design and can differ drastically.
Retiming of registers with asynchronous set and reset signals

Retiming of registers with asynchronous set or reset signals is more involved. When these registers are retimed forward or backward through the combinational logic it is necessary to compute the new reset values. Moving these registers forward through the combinational logic is simple: The reset values are propagated through the logic. Figure 15 shows an example.
Figure 15: The registers are retimed forward. The reset values are propagated to the registers in the new locations.

Moving registers backward is more complicated. First, all the registers driven by the gate need to have the same reset values. Second, the reset values of the new registers that drive the inputs of the gate are not unique. A naive approach that moves the registers over the gates one gate after the other and arbitrarily chooses reset values does not work. The wrong reset values could be chosen such that later the registers cannot be retimed backward over a gate because the reset values are different. Hence, it is necessary to solve a global problem: what are the required 0/1 reset values for the registers in the new locations such that propagating these values through the logic results in the given reset values at the registers in the original locations? This problem can be transformed into a satisfiability problem. It is very similar to verifying that two netlists are equivalent, in which we ask the question: do 0/1 values exist for the registers and primary inputs such that propagating these values through the logic results in different values at an input of a register or a primary output?

Sometimes no 0/1 reset values exist for the registers in the new locations such that propagating these values forward would result in the given values at the original locations. The following figure shows an example. In this case no valid reset values exist if the registers were moved further backward. RC can move registers with asynchronous set or reset backward only as far as valid reset values for the registers exist.

Figure 16: It is not possible to find reset values for the registers in the new locations such that propagating these values results in the given values for the registers in the original locations.
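The reset line justification problem can be illustrated with a brute-force search on a tiny example. This is a sketch under stated assumptions: the text describes a satisfiability formulation, and exhaustive enumeration is swapped in here only because the example has two signals. The gates, signal names, and the helper justify_reset are hypothetical.

```python
from itertools import product


def justify_reset(gates, new_regs, required):
    """Enumerate reset values for registers moved backward over 'gates'.

    gates:    dict mapping original register -> function of the new
              register values that computes the gate output feeding it
    new_regs: names of the registers in the new (backward) locations
    required: dict mapping original register -> its given reset value
    """
    solutions = []
    for bits in product((0, 1), repeat=len(new_regs)):
        values = dict(zip(new_regs, bits))
        if all(gates[reg](values) == val for reg, val in required.items()):
            solutions.append(values)
    return solutions


# One register behind an AND gate: several valid choices for reset value 0.
and_gate = {"y": lambda v: v["p"] & v["q"]}
print(justify_reset(and_gate, ["p", "q"], {"y": 0}))

# Two registers behind an AND gate and an inverter that share input p:
# both reset to 1 would need p = 1 and p = 0 at the same time.
shared = {"y1": lambda v: v["p"] & v["q"], "y2": lambda v: 1 - v["p"]}
print(justify_reset(shared, ["p", "q"], {"y1": 1, "y2": 1}))
```

The first call shows that the new reset values are not unique, and the second returns an empty list, which corresponds to the situation where the registers cannot be moved further backward.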
If all the registers that retiming needs to merge and move either forward or backward receive equivalent control signals and if also the reset line justification problem is solvable, then retiming is more powerful than clock scheduling. It is possible to have extremely long combinational paths that have a delay as large as several times the clock period. If there are sufficient registers at the beginning or end of the paths, retiming can move these registers into the combinational logic and still achieve the target clock period. Earlier we had seen that clock scheduling is limited because hold constraints need to be considered. If the delays of the paths as well as the variations of the path delays are too large, it is at some point impossible to realize the hold constraints together with the setup constraints. Retiming may increase the number of registers. This is the only drawback. For some designs the increase can be significant. However, RC can also decrease the number of registers. Usually for larger designs that have only one critical part, RC can improve the clock period as well as decrease the number of registers: In the uncritical parts the locations of the registers are very flexible and hence the registers can be moved and possibly merged.
6 An automated verification flow

Retiming used to pose fundamental hurdles for equivalence checking. Proving that two netlists are equivalent if one netlist was generated from the other through combinational synthesis as well as through retiming is a problem of enormous complexity. To address these verification challenges RC writes out checkpoint files (Verilog netlists) that describe the design at a particular stage. When retiming is used, RC can write out the checkpoint files before and after retiming as shown in the following diagram.

Figure 17: The automated synthesis and verification flow with checkpoint files generated by RC and read by LEC. [Diagram. RC steps: read RTL, combinational synthesis, write checkpoint file, retiming, write checkpoint file, combinational synthesis, write final netlist. LEC steps: equivalence check 1 (combinational) between the initial RTL and the pre-retiming checkpoint netlist; equivalence check 2 (retiming) between the pre-retiming and post-retiming checkpoint netlists; equivalence check 3 (combinational) between the post-retiming checkpoint netlist and the final netlist.]

Along with each checkpoint file, RC also generates a corresponding dofile, a command script used by Conformal Logic Equivalence Checker (LEC). Equivalence between the RTL and the final netlist is established through a series of verification steps which compare the initial RTL with the first checkpoint file, checkpoint file with checkpoint file, and the last checkpoint file with the final netlist. The appropriate dofile sets up the verification of the corresponding stages as shown in the diagram. Conformal verifies the equivalence under the assumption that either only combinational synthesis operations were performed or only the registers were moved by retiming operations.

7 Case study: Retiming for a UWB baseband chip from Focus Enhancements

As a case study we describe how retiming in RC was used by Focus Semiconductor, a division of Focus Enhancements, for the dual-phy UWB baseband chip MADRAS.
This chip supports a proprietary Focus (Turbo) mode and a WiMedia mode which is compliant with the Multiband OFDM Alliance (MBOA). The Focus mode is more powerful than the MBOA mode: The ratio of the bandwidth versus the distance is about 2x greater. The chip is designed in a 0.13 µm CMOS TSMC process technology with an analog front end. It has about 4 million transistors which correspond to approximately 1.5 million instances.

The Synchronization Module has a three stage hierarchical datapath implementation. Each stage is composed of a finite impulse response (FIR) filter which required datapath optimization support from RC. The Synchronization Peak Finder Module contains a divider which is used to normalize the synchronization threshold. Enough pipeline registers were added at the inputs and outputs of the block. RC then rebalances the combinational paths by retiming the registers into the combinational logic.
The Coarse Equalization Module consists of a Media Access Controller (MAC) and scratchpad memory. Retiming was also used for this module. Pipeline registers were added at the primary inputs and outputs and retiming automatically moved these registers into the logic and rebalanced the delay of the combinational paths. The Fine Equalization and the Tracking Module use a similar MAC and memory that made the use of retiming for these modules necessary.

A top-down sequential synthesis flow with retiming

The design consists of a 600K instance top level block FPT which was synthesized top-down. The retime attribute was set on 16 submodules corresponding to about 45% of the total logic and 49% of the registers. The following table shows all the modules for which the retime attribute was set to true in the automatic synthesize retime flow.

Columns: subdesign, gates, PIs, POs, number of registers (before, after, change), clock period in ps (before, after, change)

block_1 51, ,589 2, % 12,908 3, %
block_2 13, ,766 2, % 13,119 3, %
block_3 28, ,283 6, % 6,583 3, %
block_4 2, % 6,724 3, %
block_5 17, % 5,489 3, %
block_6 8, % 9,044 4, %
block_7-a 7, ,269 1, % 5,484 3, %
block_7-b 7, ,269 1, % 5,484 3, %
block_7-c 7, ,269 1, % 5,451 3, %
block_7-d 7, ,269 1, % 5,446 3, %
block_7-e 7, ,269 1, % 5,465 3, %
block_7-f 7, ,269 1, % 5,459 3, %
block_8 7, ,088 1, % 8,421 5, %
block_9 28, ,500 1, % 12,291 5, %
block_10 18, ,862 3, % 9,195 4, %
block_11 88,925 1,683 1,700 6,694 5, % 5,212 4, %
Average 19, ,081 2, % (1) 7,611 3, % (2)

(1) percentage change of the average number of registers before and after retiming
(2) average of the percentage change of the clock period before and after retiming

The table shows the number of combinational gates, the number of primary inputs (PIs), and the number of primary outputs (POs). The next three columns show the number of registers before and after retiming and the percentage change.
The last three columns show the clock period in picoseconds before and after retiming and the percentage change. The table shows that retiming can both increase and decrease the number of registers. Overall, the number of registers decreases by 0.6%. The clock period always improves. For many of the subdesigns, a large decrease in the clock period is expected because pipeline registers were added at either the primary inputs or the primary outputs.
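The two footnotes of the table use different averaging conventions, which in general give different numbers. The Python sketch below uses made-up values (assumptions for illustration only, not the table's data) to contrast (1) the percentage change of the column averages with (2) the average of the per-row percentage changes.

```python
def pct_change(before, after):
    """Percentage change from `before` to `after` (negative means a decrease)."""
    return 100.0 * (after - before) / before

def change_of_average(pairs):
    """Convention (1): percentage change of the column averages."""
    avg_before = sum(b for b, _ in pairs) / len(pairs)
    avg_after = sum(a for _, a in pairs) / len(pairs)
    return pct_change(avg_before, avg_after)

def average_of_changes(pairs):
    """Convention (2): average of the per-row percentage changes."""
    return sum(pct_change(b, a) for b, a in pairs) / len(pairs)

# Hypothetical (before, after) pairs; NOT values from the table.
rows = [(1000, 900), (200, 240), (4000, 3800)]

print(round(change_of_average(rows), 2))
print(round(average_of_changes(rows), 2))
```

Convention (1) weights large subdesigns more heavily, which suits a total such as the register count. Convention (2) treats every subdesign equally, which suits a per-block quantity such as the clock period. With the assumed values the two conventions even differ in sign.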
Conclusion

With increasing demands for faster designs and shorter time-to-market, it is important for designers to look for efficient optimization techniques. Retiming in Encounter RTL Compiler is a powerful technique that can achieve substantial improvements in performance. In this paper we have described how RTL Compiler uses clock scheduling in a sequentially driven synthesis flow and then performs retiming to minimize the clock period and the number of registers. We have discussed special cases of retiming: registers with enable signals, registers with a multiplexer feedback loop, and registers with asynchronous set and reset signals. With RTL Compiler it is easy to perform retiming, and the direct link to Conformal Logic Equivalence Checker (LEC) provides a complete verification solution.
Appendix: Encounter RTL Compiler commands for retiming

Automatic synthesis with retiming

It is easy to use retiming in RC: only the attribute retime needs to be set to true for the design or subdesign that should be retimed. During synthesis, the design or subdesign is then processed automatically by the sequentially driven synthesis flow with retiming, as described in Section 4.

    set_attr retime true [subdesign]
    synthesize -to_mapped

Manual retiming flow

This flow can be used when one or more specific modules need to be retimed. It can also serve as an exploratory tool to see the impact of retiming on a subdesign in a mapped design. The first step, retime -prepare, prepares the design for retiming, and retime -min_delay performs the actual retiming. Even though retime -min_delay performs a local mapping of the logic immediately around the flops, it is recommended to follow it with an incremental synthesis or, preferably, a global synthesis, depending on the granularity of the changes.

    retime -prepare [subdesign | design]
    retime -min_delay [subdesign | design]
    synthesize -to_mapped [-incr]

Manual retiming flow minimizing the number of registers

This flow explicitly tries to minimize the number of registers and thus the area. It should be used only for a design which has positive slack.

    synthesize -to_mapped
    retime -min_area [subdesign | design]
    synthesize -to_mapped [-incr]

Attributes

set_attr dont_retime true [flop]
    Do not retime the register specified.

set_attr retime_hard_region true [subdesign]
    Retiming cannot move registers into or out of the subdesign.

set_attr boundary_opto false [subdesign]
    Disable boundary optimization (constant propagation and rewiring of equivalent signals across hierarchy) and preserve the input and output pins of a subdesign. This enables easier ECO for the blocks and might be necessary for formal verification.

set_attr retime_async_reset true
    Enable retiming on flops with asynchronous set or reset signals. The runtime may increase if registers need to be moved backward. By default, registers with asynchronous set or reset signals are excluded from retiming.

set_attr retime_optimize_reset true
    If this attribute is used in combination with the previous attribute, the reset logic is optimized by replacing asynchronous flops with simple flops wherever possible.

For more information refer to the Encounter RTL Compiler User Guide, chapter 9, Retiming the Design.
Interface to Conformal Logic Equivalence Checker (LEC)

The checkpoint files of the automatic verification flow described in Section 6 and the corresponding dofiles for LEC are generated by RC if the checkpoint attributes are set as shown below.

    set_attribute checkpoint_flow true
    set_attribute library my_library.lib
    read my_design.v
    elaborate
    set_attribute checkpoint_netlist_naming_style \
        my_chk_dir/chk_%d.v /designs/my_top
    set_attribute checkpoint_dofile_naming_style \
        my_chk_dir/chk_%d_to_chk_%d.do /designs/my_top
    read_sdc my_constraints.sdc
    set_attr retime true my_top
    synthesize -to_mapped
    write -m > final.v
    write_do_lec -revised final.v > final.do

To run LEC:

    lec -ultra -Dofile hdl_to_chk_01.do
    lec -ultra -Dofile chk_01_to_chk_02.do
    lec -ultra -Dofile final.do

For more information refer to the document Interfacing between RTL Compiler and Conformal.
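The verification runs form a chain of pairwise comparisons, each proving one step equivalent to the next, so that the golden RTL is equivalent to the final netlist by transitivity. The Python sketch below only lists the dofiles in the order in which they would be run for a given number of checkpoints. The helper name `dofile_sequence` and the two-digit numbering are assumptions inferred from the file names in the example; the files actually written by RC should be taken from the checkpoint directory.

```python
def dofile_sequence(num_checkpoints, final_dofile="final.do"):
    """Return dofile names in the order in which they would be run.

    Assumes two-digit checkpoint numbers, as in the file names of the
    example (hdl_to_chk_01.do, chk_01_to_chk_02.do).
    """
    names = ["hdl"] + [f"chk_{i:02d}" for i in range(1, num_checkpoints + 1)]
    steps = [f"{src}_to_{dst}.do" for src, dst in zip(names, names[1:])]
    return steps + [final_dofile]

for dofile in dofile_sequence(2):
    print(dofile)
```

Running the comparisons in this order means that a failure points directly at the synthesis step that introduced the mismatch.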