Overcoming Wireload Model Uncertainty During Physical Design

Padmini Gopalakrishnan, Altan Odabasioglu, Lawrence Pileggi, Salil Raje
Monterey Design Systems
894 Ross Drive, Suite, Sunnyvale, CA
{padmini, altan, pileggi,

ABSTRACT

The advent of deep sub-micron technologies has created a number of problems for existing design methodologies. Most prominent among them is the problem of timing closure, whereby design time is dramatically increased due to iterations between gate-level synthesis and physical design. It is well known that the heart of this problem lies in the use of wireload models based on wirelength statistics from legacy designs. Some technology projections in [3] have suggested that wireload models will remain effective to block sizes on the order of 5k gates. This suggests that synthesis will not have to be changed much, since this is approximately the maximum size for which logic synthesis is effective. However, our analyses on production designs show that the problem is not quite so straightforward, and the efficacy of synthesis using wireload models depends upon technology data as well as specific characteristics of the design. We analyze these effects and dependencies in detail in this paper, and draw some conclusions about the amount of physical information that is required for synthesis to be effective. Finally, we discuss the implications for hierarchical design flows, and propose a solution via physical prototyping.

1 INTRODUCTION

Until deep sub-micron (DSM) issues began to surface, design methodologies for synthesis and logic optimization were decoupled from placement and routing. Prior to physical design, wireload models based on statistical information from design legacy [4] were used to provide gate load models during logic optimization.
For pre-DSM technologies the error associated with wireload estimates of interconnect capacitance had very little impact on the actual delays, since the device load-capacitances dominated the total net capacitance. However, as interconnect capacitance became more dominant at and below .25 microns, designers were forced to iterate, feeding back interconnect information from place and route to redo gate-level logic optimization [2]. Unfortunately, this loop has no guarantee of convergence, since the re-optimized netlist could result in a different place and route solution, with new values for the interconnect capacitances and resistances.

As we will demonstrate with several examples from industrial designs, wireload models are always inaccurate in a relative sense, even under the best of circumstances [5]. Whether or not they are acceptable in an absolute sense depends on the ratio of interconnect to device capacitance, and the criticality of the paths on which they lie. As process technologies scale, the impact of interconnect on the delays becomes more prominent, and in some cases (such as for global nets) so dominant that it must be carefully considered as part of the micro-architectural decision process.

Clearly the trend is not just to estimate interconnect effects more accurately, but to do so effectively as early in the design flow as possible. Obviously these are somewhat conflicting requirements. The accuracy of interconnect estimation depends on the resolution of physical (e.g. placement) information, which improves at late stages in the design flow. This raises the question of a suitable middle ground, or a point in the design flow when early interconnect estimation provides acceptable accuracy in spite of the physical uncertainties. Studies such as [3] suggested that this point corresponds to a block size of 5k gates. Our results in this paper, however, show through a series of experiments that the conclusions are not quite so simple. We find that the wireload model efficacy is strongly dependent on technology parameters and specific characteristics of the netlist topology and the floorplan. Understanding these dependencies has significant implications for understanding and practising block-based and hierarchical design. Following our detailed analyses of the limitations of wireload models, we discuss some of these implications and propose some solutions.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISPD', April -4, 2, Sonoma, California, USA. Copyright 2 ACM //4.$5..

2 WIRELOAD MODELS

The relative impact of interconnect and device capacitances on delay determines whether or not wireload estimation is acceptable. To illustrate this point we performed the following experiments on production designs. Given the detailed placement of a finalized gate-level netlist, we divided the chip into rectangular partitions of equal dimensions and area. Each partition represents a block of gates; thus any given block size represents a level of granularity in the placement. One can think of the nets within a partition as being local nets and the nets going across partitions as global nets. Figure 1 shows this setup of a detailed placement partitioned into blocks, including an example of a local net.

[Figure 1. Placement partitioned into blocks, with an example local net.]

We then analyzed the delays over different partition sizes, using various approximations for the local interconnect capacitances. The objective was to analyze the impact on the stage delays of estimating local interconnect via wireload models given various levels of physical design resolution.
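The partitioning step above can be sketched as follows. This is a minimal illustration, assuming a uniform rectangular grid and a placement given as cell coordinates; the function names, the toy cells, and the net list are hypothetical and are not taken from the paper's tooling.

```python
from collections import defaultdict

def partition_index(x, y, part_w, part_h):
    """Return the (column, row) of the grid partition containing point (x, y)."""
    return (int(x // part_w), int(y // part_h))

def classify_nets(nets, cell_xy, part_w, part_h):
    """Split nets into local (all pins in one partition) and global (spanning several)."""
    local = defaultdict(list)      # partition index -> names of nets local to it
    global_nets = []
    for name, cells in nets.items():
        parts = {partition_index(*cell_xy[c], part_w, part_h) for c in cells}
        if len(parts) == 1:
            local[parts.pop()].append(name)
        else:
            global_nets.append(name)
    return dict(local), global_nets

# Hypothetical toy placement: cell name -> (x, y) in microns.
cell_xy = {"u1": (10.0, 10.0), "u2": (40.0, 30.0), "u3": (160.0, 20.0)}
nets = {"n1": ["u1", "u2"], "n2": ["u2", "u3"]}

# The same placement viewed at a coarse and at a fine partition size.
coarse_local, coarse_global = classify_nets(nets, cell_xy, 100.0, 100.0)
fine_local, fine_global = classify_nets(nets, cell_xy, 25.0, 25.0)
```

In this toy case the net n1 is local at the coarse partition size and becomes global at the fine one, which mirrors the observation that larger partitions contain more local nets.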

2.1. The Impact of Interconnect on Delay

To obtain some understanding of the impact of local net capacitance on delay, our first experiment was to consider local delays with and without interconnect. The delay profiles for each partition size are generated as follows. Every cell retains its detailed placement coordinates. The topology of a net is modeled by a Steiner tree approximation [6]. We applied a crude layer assignment algorithm to assign higher metal layers to longer nets and lower metal layers to shorter nets. For each net, we compute the worst case delay from an input pin of the net driver to one of its fanouts. We consider the following cases.

1. Assume that the entire load on the driver is due to pin capacitances, and that interconnect has no effect: this is delay d1.
2. Assume that the load on the driver is due to pin capacitances, as well as interconnect capacitance and resistance from the Steiner model: this is delay d2.

We then plotted the distribution of the ratio (d1/d2) over all the nets in the design that are completely within a partition (local nets). This ratio is always between 0 and 1. Smaller values of the ratio imply that interconnect has a significant impact on the worst case delay of this net; values close to 1 imply that device capacitances dominate. Since we consider nets that are completely within a partition, the bounding box of any such net must lie within the bounding box of the partition that encloses it.

Profiles for a .8 micron process

Here we show these distributions for an industrial design in a .8 micron process. The design has approximately 44k gates. In Figures 2-4 we show profiles for all 2 pin local nets over three different levels of partition sizes. Note from the figures that for larger partition sizes there are more local nets. As one would expect, the wirelength distributions for these local nets show greater deviation from the mean for the larger partition sizes.
As a result, at larger partition sizes we can observe a larger number of nets where the ratio (d1/d2) is much less than 1. However, it is important to note that even at the smallest partition size that we considered, where each partition contains only 6 gates, we see some local nets with a ratio as small as .65. If such a net lies on a critical path, the wireload model error associated with it can easily result in a failure to achieve timing closure.

[Figure 2. Distribution of (stage-delay w/o interconnect)/(stage-delay with interconnect). Partition size is roughly 34 x 28 sq. microns, corresponding to approximately 7k gates per partition.]
[Figure 3. Distribution of (stage-delay w/o interconnect)/(stage-delay with interconnect). Partition size is roughly 22 x 8 sq. microns, corresponding to approximately 3 gates per partition.]
[Figure 4. Distribution of (stage-delay w/o interconnect)/(stage-delay with interconnect). Partition size is roughly 6 x 5 sq. microns, corresponding to approximately 2 gates per partition.]

To better understand why interconnect dominates some nets more than others, we look at the following parameters for all nets that are local to a partition.

1. The ratio of the net-length to the half perimeter of the rectangle that forms a partition, which we will refer to as r1. This ratio is a measure of the relative length of a net.
2. The ratio (d1/d2) described earlier in this section, which we will refer to as r2. As mentioned above, this is a measure of how dominant interconnect is for a net.

Nets with a low value of r1 and a high value of r2 are short nets for which interconnect does not significantly impact delay. Nets with a low value of r1 and a low value of r2 are short nets for which interconnect is dominant because the driver is weak. Nets with a high value of r1 and a high value of r2 are long nets, but generally strong drivers lessen the effect of the interconnect. Nets with a high value of r1 and a low value of r2 are the ones which fall into the category of interconnect dominated.
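The two ratios defined above, and the four categories of nets they separate, can be illustrated with a small sketch. It assumes a simple Elmore-style RC model for a 2 pin net (lumped driver resistance, distributed wire, one sink pin), which is a stand-in and not the delay calculator used for the profiles in this paper. The function names, the cut-off values, and all electrical values are hypothetical.

```python
def stage_delays(r_drv, c_pin, length_um, r_per_um, c_per_um):
    """Return (delay without interconnect, delay with interconnect) in seconds.

    Assumed Elmore-style model: the driver is a resistor r_drv (ohms), the wire
    has r_per_um (ohms/um) and c_per_um (F/um), and the sink pin loads c_pin (F).
    """
    c_wire = c_per_um * length_um
    r_wire = r_per_um * length_um
    d_no_wire = r_drv * c_pin
    d_with_wire = r_drv * (c_wire + c_pin) + r_wire * (0.5 * c_wire + c_pin)
    return d_no_wire, d_with_wire

def net_ratios(length_um, part_w, part_h, r_drv, c_pin, r_per_um, c_per_um):
    """Return (net length / partition half perimeter, delay w/o wire / delay with wire)."""
    d_no_wire, d_with_wire = stage_delays(r_drv, c_pin, length_um, r_per_um, c_per_um)
    length_ratio = length_um / (part_w + part_h)
    delay_ratio = d_no_wire / d_with_wire
    return length_ratio, delay_ratio

def classify(length_ratio, delay_ratio, length_cut=0.5, delay_cut=0.8):
    """Map the two ratios onto the four cases in the text (cut-offs are arbitrary)."""
    long_net = length_ratio >= length_cut
    wire_matters = delay_ratio < delay_cut
    if not long_net and not wire_matters:
        return "short net, interconnect negligible"
    if not long_net and wire_matters:
        return "short net, weak driver"
    if long_net and not wire_matters:
        return "long net, strong driver"
    return "interconnect dominated"

# Hypothetical short net behind a weak driver, inside a 340 x 280 um partition.
length_ratio, delay_ratio = net_ratios(50.0, 340.0, 280.0, 5000.0, 2e-15, 0.1, 0.2e-15)
category = classify(length_ratio, delay_ratio)
```

Under this model the delay ratio equals 1 for a zero-length wire and falls toward 0 as wire capacitance and resistance grow, which matches the interpretation of the ratio given in the text.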
In Figures 5-7 below we show scatter plots of the local 2 pin nets profiled in Figures 2-4, with r2 on the x-axis and r1 on the y-axis. Note that the length of a net is the same, irrespective of the size of the partitions. However, its length as a fraction of partition size decreases as partition size increases. The value of r2 is a constant for a given net. The profiles show what we would expect: in general, the interconnect has a greater impact for longer nets. Some of the extreme cases that were observed, especially for the smallest partition size, are primarily attributable to weak drivers.

[Figure 5. Ratio r1 versus ratio r2. Partition size is roughly 34 x 28 sq. microns, corresponding to approximately 7k gates per partition.]
[Figure 6. Ratio r1 versus ratio r2. Partition size is roughly 22 x8 sq. microns, corresponding to approximately 3 gates per partition.]
[Figure 7. Ratio r1 versus ratio r2. Partition size is roughly 6 x 5 sq. microns, corresponding to approximately 2 gates per partition.]

Since we use exact detailed placement coordinates for cells, but only a routing model for nets, the profiles shown in Section 2.1.1 present a best case picture from a routing perspective. Namely, routing obstacles and congestion, which can cause nets to be even longer, were not considered. There could potentially be further variation since the detailed routes include exact layer assignments, meandering due to congestion, vias and jogs in the routes, and more precise capacitance and resistance values.

Profiles for a .25 micron technology

We now profile nets for an industrial design in a .25 micron technology that contains approximately 48k gates. As for the data in Figures 2-4, the delay calculation uses a Steiner tree to model net topology, and does a rough layer assignment based on net length. A profile for 2 pin local nets at a partition size of 3 x 3 sq. microns is shown in Figure 8. The level of placement granularity that it corresponds to is roughly the same (actually slightly coarser) as that of the .8 micron design shown in Figure 3. Comparing the two profiles, we can see that a larger percentage of the nets profiled here have a ratio of (d1/d2) close to 1. Thus, for this design, errors in wireload estimates impact stage delays to a smaller degree.

[Figure 8. Distribution of (stage-delay w/o interconnect)/(stage-delay with interconnect). Partition size is roughly 3 x 3 sq. microns, corresponding to approximately 3k gates per partition.]

2.2. A Perfect Wireload Model

In Section 2.1, we showed the error that would be incurred by completely ignoring the impact of the local interconnect for an industrial design. Next we consider the error incurred by using the best wireload model. In general, wireload models are assembled from statistical data over a population of designs. In this experiment we consider wireload statistics generated from exact data for our design under test. For example, we derive the wireload statistics from the actual detailed placement coordinates, which, while impractical for the general design problem, will clearly represent a best case for the wireload modeling error. Even with this best case model, we can show that at some level of block size the error incurred is too large.

Starting with the detailed placement of the design under investigation, we divide the chip area into partitions as described in Section 2.1. For each partition size we generate a wireload model that estimates the interconnect length of a local net as a function of its pin-count. The wireload model is generated as follows. For a given partition size, we determine which nets are local. Then we generate a wirelength distribution for these local nets for each pin-count (i.e. we have one distribution for 2 pin nets, one distribution for 3 pin nets and so on). As before, a net is modelled by a Steiner tree. We compute the worst case delay for each local net from a driver input to a fanout by substituting its actual length with the length estimated by the wireload model. In these initial experiments we used the mean, or average, wirelength as our wireload model predictor. The wireload model uses only these statistics from the detailed placement, and there is no motion across partitions or any kind of change in the netlist after the statistics are compiled. We consider only local nets, which are those nets fully contained within a partition. Therefore, for a given partition size, this wireload model represents the most accurate average prediction of wirelength that is possible as a function of only the pin-count of a net.

In Figures 9-11 we show scatter plots of 2 pin local nets at different partition sizes, similar to those shown without wireload models in Figures 2-4. The plots show the actual delay of a net along the x axis and the wireload model predicted delay along the y axis. The design example here is the same as in Section 2.1. From these plots we can see that there is a large difference between the actual and predicted delays at large partition sizes. The variation in the delays of 2 pin local nets is quite significant at large partition sizes, as shown in Figure 9. The correlation becomes better as partition size decreases, with the points clustering closer to the straight line x = y. As can be seen from Figure 11, the variation in the actual delays of these nets is much smaller too.

[Figure 9. Estimated delay versus actual delay. Partition size is roughly 34 x 28 sq. microns, corresponding to approximately 7k gates per partition.]

To quantify the errors in estimation, we generate a distribution of the ratio of the estimated delay of a net to its actual delay, which we will refer to as r3. We show these distributions for 2 pin local nets at different partition sizes in Figures 12-14.

Figures 12-14 clearly show the distribution getting narrower at smaller partition sizes, and as expected, the wireload estimate becoming more accurate. We can also see that there are partition sizes at which the error in estimation is very large. In other words, at these levels of placement granularity, the wireload model breaks down. Optimizations that are based on these estimates would be significantly in error. Further, we have shown earlier in this section that this is the best possible wireload model that could be found; so a wireload model based on design legacy statistics would in all likelihood be much worse. Moreover, given that the wireload model has significant error even with coarse placement information, it will have much
greater error when used in gate-level synthesis, which is completely devoid of placement information.

[Figure 10. Estimated delay versus actual delay. Partition size is roughly 22 x8 sq. microns, corresponding to approximately 3 gates per partition.]
[Figure 11. Estimated delay versus actual delay. Partition size is roughly 6 x 5 sq. microns, corresponding to approximately 2 gates per partition.]
[Figure 12. Distribution of the ratio of estimated delay to actual delay. Partition size is roughly 34 x 28 sq. microns, corresponding to approximately 7k gates per partition.]
[Figure 13. Distribution of the ratio of estimated delay to actual delay. Partition size is roughly 22 x 8 sq. microns, corresponding to approximately 3 gates per partition.]
[Figure 14. Distribution of the ratio of estimated delay to actual delay. Partition size is roughly 6 x 5 sq. microns, corresponding to approximately 2 gates per partition.]

Adding More Pessimism?

The obvious next question to ask is: what if we use a more pessimistic wireload model? For example, what if we use the mean + standard-deviation of the distribution instead of just the mean? Figures 15-17 again show the ratio distributions of the estimated delay to the actual delay for 2 pin local nets at different partition sizes. Comparing these distributions with those from Figures 12-14, we can clearly observe that the means of the distributions shift to a greater value as a result of the increased pessimism. But figuring out how much to shift these estimates, without overdesigning, is a difficult problem. Moreover, too much of a shift can adversely impact the fast-path problem in terms of hold margins, which is becoming an increasingly difficult problem with faster operating frequencies and shallower logic depths.

[Figure 15. Distribution of the ratio of estimated delay to actual delay, pessimistic model. Partition size is roughly 34 x 28 sq. microns, corresponding to approximately 7k gates per partition.]
[Figure 16. Distribution of the ratio of estimated delay to actual delay, pessimistic model. Partition size is roughly 22 x 8 sq. microns, corresponding to approximately 3 gates per partition.]
[Figure 17. Distribution of the ratio of estimated delay to actual delay, pessimistic model. Partition size is roughly 6 x 5 sq. microns, corresponding to approximately 2 gates per partition.]

3 DSM TECHNOLOGY IMPLICATIONS

From the data in the preceding section it is apparent that predicting the impact of interconnect has become a challenge since we have entered the DSM range for technologies. As expected, we see that errors in interconnect estimation have the greatest impact for large block sizes. Further, we also showed that even a non-causal wireload model that is based on the actual placement breaks down at large block sizes; hence using wireload models for gate-level synthesis is not meaningful. But what do we expect with further scaling for CMOS technologies? In general, we would expect things to get worse, but why, and by how much?

3.1. Increasing Interconnect Dominance

For pre-DSM technologies, shrinking device sizes were evidenced by improvement in switching speeds. This was primarily due to the increase in drive currents with reductions in channel length. As channel lengths reduce to less than .25 micron, however, the drive current remains more or less constant because of velocity saturation. Decreases in gate delays are, therefore, due mainly to reductions in gate oxide thickness. One would thus expect to see a slower rate of increase in device speeds with continued scaling [3].

At the same time, interconnect delays are increasing with scaling for two reasons. First of all, interconnect capacitance dominates the total net capacitance due to: a) increased routing densities that have led to shrinking wiring pitches; and b) aspect ratios that attempt to keep the resistance of these narrower wires constant. Both have resulted in an increase in the capacitance per unit length, particularly due to inter-layer coupling capacitance []. Secondly, since chip sizes are also growing, global wires are longer than ever before. As wire widths decrease with scaling to accommodate a greater density of routing, the interconnect resistance effects for these long wires start to become evident.
Via resistances also increase as processes scale, making long interconnect delays very dependent on detailed routing, layer assignment and the number of layer changes in the routes.

To study the trends of increasing interconnect dominance we consider a logic stage consisting of a NAND gate driving a net with a fixed length, layer assignment and capacitive load on its fanouts. We compute the worst stage delay to a fanout point with and without interconnect loading included. This is done for different driver sizes in process technologies at .25 and .8 microns respectively. The length of the net is approximately 38 microns; hence any contribution of interconnect delay to the stage delay is mostly due to capacitive rather than resistive effects. We measure the dominance of the interconnect by the ratio r of stage delay without interconnect to stage delay with interconnect. A smaller value of r indicates that interconnect delay dominates to a greater extent. Delays are computed assuming that this stage is driven by a close-by buffer which is driven by an input transition of . ns. The results of these measurements are compiled in Table 1 and Table 2.

[Table 1. Dependence on driver sizes in .8 micron. Columns: drive strength of driver, worst delay without interconnect, worst delay with interconnect, worst slope at fanout, ratio r.]
[Table 2. Dependence on driver sizes in .25 micron. Columns: drive strength of driver, worst delay without interconnect, worst delay with interconnect, worst slope at fanout, ratio r.]

We can see that for any given driver size the value of r is smaller in the .8 micron process, which measures the difference in interconnect dominance. As driver sizes increase for both technologies, we can see that interconnect delays are gradually swamped out, as shown by the asymptotic increase in the value of r. In Table 1 and Table 2 we also show the worst slopes to a fanout point.
It is easy to see that the driver with drive-strength of 2x gives the minimum delay for this stage and also has a reasonable slope at its output. Since we have assumed that the driver was driven by a close-by buffer, we can assume that upstream gates are shielded from any effect of sizing the NAND gate. This driver size therefore represents an optimal choice for this stage. It is important to note that the optimal point has a relatively low value of r. Thus, picking a driver size that would allow us to neglect the effect of interconnect for this stage is clearly sub-optimal from the point of view of performance, even for this local net for which only capacitive effects are evident. For global wires that are dominated by metal resistance as well, accounting for interconnect will be even more important.

Criticality of Layer Assignment

One implication of the increasing dominance of interconnect is that layer assignment can have a dramatic impact on the delay of a net. The extent of this varies from one technology to the next; some processes have somewhat balanced capacitances per layer, whereas others do not. We have computed stage delays with and without interconnect for the stage described in Section 3.1 by varying only the layer assignment of the interconnect. The length of the net considered is about 66.6 microns, hence both resistive and capacitive effects show up in the delay. Table 3 shows the capacitance per unit length in pF per micron (including both the lateral and fringe capacitances), and the resistance in ohms per square for each layer considered.

[Table 3. Interconnect capacitances and resistances. Columns: metal layer, capacitance (.8 um tech.), resistance (.8 um tech.), capacitance (.25 um tech.), resistance (.25 um tech.); one row per metal layer.]

We can see from Figure 18 that there are variations in stage delays as a function of routing layer assignment only.
This makes the problem of accurate interconnect estimation more complex, since the routing layer is difficult to predict prior to global routing.
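To make the layer dependence concrete, the sketch below sweeps one fixed-length net over two routing layers and reports the ratio of stage delay without interconnect to stage delay with interconnect. It assumes the same kind of simple Elmore-style stage model as a stand-in for a real delay calculator. The layer names, wire widths, sheet resistances (ohms per square) and capacitances (pF per micron) are hypothetical placeholders, not values from Table 3.

```python
def stage_ratio(r_drv, c_pin, length_um, width_um, r_sheet, c_pf_per_um):
    """Ratio of stage delay without interconnect to stage delay with interconnect.

    Assumed model: wire resistance = sheet resistance * (length / width),
    wire capacitance = capacitance per micron * length, Elmore-style delay.
    """
    r_wire = r_sheet * length_um / width_um        # ohms
    c_wire = c_pf_per_um * length_um * 1e-12       # farads
    d_no_wire = r_drv * c_pin
    d_with_wire = r_drv * (c_wire + c_pin) + r_wire * (0.5 * c_wire + c_pin)
    return d_no_wire / d_with_wire

# Hypothetical layer data: name -> (wire width in um, ohms per square, pF per um).
LAYERS = {
    "lower": (0.3, 0.08, 0.00020),
    "upper": (0.6, 0.02, 0.00012),
}

def sweep_layers(r_drv, c_pin, length_um, layers=LAYERS):
    """Evaluate the same net on every layer; only the layer assignment changes."""
    return {
        name: stage_ratio(r_drv, c_pin, length_um, width, r_sheet, c_pf)
        for name, (width, r_sheet, c_pf) in layers.items()
    }

# Same driver (500 ohms), same sink (10 fF), same 1000 um net, different layers.
ratios = sweep_layers(500.0, 1e-14, 1000.0)
```

With these placeholder numbers the ratio roughly doubles when the net moves from the lower to the upper layer, which illustrates why a layer-agnostic estimate made before global routing can be noticeably off.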

[Figure 18. Ratio r versus routing layer for increasing driver strength. Results showing the impact of layer assignment for a .8 micron process.]
[Figure 19. Ratio r versus routing layer for increasing driver strength. Results showing the impact of layer assignment for a .25 micron process.]

4 IMPACT ON DESIGN FLOWS

In the previous sections we have analyzed the impact of increasing interconnect dominance in DSM technologies, and taken a closer look at the limitations of wireload models. The ultimate question to answer is: what impact do these trends and issues have on current and future design methodologies? What must be changed in the way we do gate-level synthesis for DSM designs, and block level assembly for hierarchical designs?

4.1. Appropriate Block Sizes for Synthesis

In [3] it was predicted that an approximate block size of 5k gates, which is about the size of a logic block that a designer might want to deal with, would be of acceptable size for wireload models to be effective, now and into the foreseeable future. Based on our analyses above, however, we believe that other technology and design factors must be considered, and that only the granularity of the physical information can ultimately determine the efficacy of the wireload models.

Technology and Design Dependence

The influence of interconnect on delay is dependent on a number of factors, including the process technology, as shown in Section 3. One example was the increasing influence of interconnect layer assignment on overall performance. There are also effects which are a combination of technology issues and design dependence. For example, intra-layer capacitance is becoming more dominant for smaller feature sizes, which makes the impact of interconnect more dependent on neighboring line switching and routing congestion. Routes are forced to meander in congested regions, thereby increasing the overall net capacitance.
The impact of neighboring line switching can considerably increase the effective inter-layer capacitance, which is becoming a more dominant component of the total capacitance.

Since congestion impacts wireload model predictability, the pre-layout timing prediction for a block is also impacted by the overall netlist connectivity. Some netlists have an inherently higher connectivity than others, forcing certain blocks of logic to be placed together, sometimes resulting in congestion hotspots. Datapath dominated designs are a good example of designs with this strong dependency.

To illustrate this particular form of design dependency we performed the following experiment on the 48k gate, .25um datapath design profiled earlier. We reordered the IO pins on the block slightly from the ordering used above (simply interchanged the bit orderings for two 64-bit busses entering the block), then compared the placed and routed results for both cases. Figure 20 shows a scatter plot of the delays of 2 pin nets for both placements. The variation is quantified in Figure 21, which shows a profile of the ratio of the delay of a stage in one placement to that in the other over all nets. The mean of this profile is approximately .95, and there are a significant number of nets for which this ratio is substantially different from 1.

[Figure 20. Stage delay with floorplan 1 versus stage delay with floorplan 2. Scatter plot showing the impact of IO dependencies.]
[Figure 21. Number of 2 pin nets versus (stage-delay with floorplan 1)/(stage-delay with floorplan 2). Distribution showing the impact of IO dependencies.]

With such a substantial dependence on the chosen technology and the design specificity, stating some absolute block size as appropriate for synthesis seems questionable. One could perhaps only calculate an upper bound on such a block size, and for our results shown here for .8 micron technologies, such a bound would be significantly smaller than 5k gates.
5 GETTING MORE PHYSICAL

In order to account for physical effects during synthesis, some form of early estimation of net capacitances is clearly necessary. We showed previously that a wireload estimator based on statistics from legacy designs breaks down at some level of placement granularity, even for small designs. From these results we would expect that floorplanning provides insufficient physical detail for wireload prediction. This suggests the need for a new block synthesis methodology.

These block-design methodology implications also have an impact on hierarchical design styles and capabilities. When blocks are designed separately and then assembled together at the chip level, their netlists and constraints may be in different stages of completion at different times in the design process. The challenge in hierarchical design is to be able to efficiently implement individual blocks while taking into account the global view of the chip. Recall that changing the pin orderings for a small datapath had a significant impact on the performance of the datapath block. Should the pin assignment for blocks be done top-down or bottom-up?

5.1. Approaches to Physical Synthesis

We first consider proposed solutions for block level synthesis. Recent approaches to synthesis begin with some estimate of physical interconnect effects for a first pass of synthesis, followed by some interactive loop between physical design and synthesis to achieve timing closure. While such approaches can alleviate the wiring dominance problem, clearly we should be searching for new opportunities to incorporate the ultimate physical realities as early as possible in the synthesis flow.

Another possibility would be to use drivers that are strong enough to make any errors in interconnect estimation inconsequential. This assumption was implicit in the 5k gate block size result in [3], where a typical driving transistor was assumed to have a W/L ratio of 2. While this approach can make wireload models and predictability more effective, there is a price paid in terms of overdesigning, as illustrated by the results in Table 1. Since power is becoming an extremely precious commodity in IC design, this style of synthesis might be unacceptable.

Our best hope, therefore, may be to determine the point in the physical design or floorplanning flow where we can achieve sufficient confidence in the accuracy of interconnect estimation, but prior to the actual completion of the physical design so that gate sizing can still be controlled and modified. Only placement data can guarantee some level of resolution where the error in interconnect estimation is acceptable for DSM designs. The coarsest level of placement detail that provides acceptable estimation will be a function of the design style and the process technology, as discussed earlier.
Once this level has been reached, synthesis can be done with confidence in the accuracy and optimality of the result.

Hierarchical design flows

Overcoming the wireload modeling inaccuracy for synthesis and physical creation of the blocks is only half of the problem. An equally difficult task, especially due to the increasing dominance of the global interconnect, is the assembly of these blocks as part of a hierarchical design flow. In current methodologies, individual blocks in the hierarchy are designed independently using conservative constraints on area and timing, then assembled at the chip level using an abstract timing model for each block. There are several problems with this approach. Firstly, the floorplan level will not, in general, provide sufficient physical detail for estimating timing behavior. Chip-level constraints are arrived at initially without any knowledge of whether individual blocks are feasible or not. The chip-level context is not very accurately known before individual blocks are implemented. Furthermore, the implementation of a block depends on factors such as global routes and pin assignments which are known only at the chip level. Adjustments in the block timing budgets during chip assembly are what lead to costly design iterations with no guarantee of convergence. This bottom-up methodology also makes it difficult to implement ECO changes.

Instead of using an abstract model, another approach is to instantiate individual blocks flat at the chip level after their physical implementation is completed. While this enables accurate estimates of timing, congestion and area, it leads to problems with capacity, since detailed information about each block must now be handled at the chip level. Once again, a large number of iterations would be required since the implementation of each block is independent of a global design view.
Clearly, rather than performing bottom-up design, an ideal flow would look at chip-level and block-level issues concurrently. Any practical implementation of this would also require quick and accurate estimates of how changes in any one context affect the other. In the following section, we will discuss such a methodology, called physical prototyping.

6 PHYSICAL PROTOTYPING

Typically, verifying that a design satisfies timing constraints is done at the floorplanning stage following the first gate-level synthesis with wireload models. Since this stage does not always provide sufficient modeling accuracy, we propose to refine the coarse placement further until the wireload modeling error becomes acceptable.

6.1. How much physical detail is enough?

Assuming that a coarse placement is available, we quantify the level of resolution as follows. The placement area is divided into regions of roughly equal area such that the exact standard cell placement location is known to within the precision specified by the size of that region. This is analogous to the location uncertainty for the cell locations described by the region partitioning in Figure.

To determine the size of the regions for which such a coarse placement would provide sufficient modeling precision, we can consider an extension of the experiments in Section 2.2. In this experiment we partitioned the design into regions of roughly equal area, as shown in Figure. A wireload model of choice was used to compute the delays of all local nets (i.e. nets which are completely within a block). The region sizes for which the mean and deviation of the wirelength profiles are acceptable would be the level at which this particular wireload model can be used for synthesis. Obviously, the more closely the wireload model correlates with the actual placement statistics, the less physical modeling detail is required.
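As a rough illustration of the experiment described above, the following Python sketch partitions a placement into an n x n grid of equal-area regions, keeps only the nets that lie entirely within one region, and collects the error between a wireload estimate and each local net's length. The half-perimeter length metric, the fanout-indexed wireload table, and all function names here are assumptions made for illustration; the paper does not specify them.

```python
from statistics import mean, pstdev


def region_of(x, y, die_w, die_h, n):
    """Map a cell location to its (column, row) in an n x n grid of equal-area regions."""
    col = min(int(x / die_w * n), n - 1)
    row = min(int(y / die_h * n), n - 1)
    return col, row


def hpwl(pins):
    """Half-perimeter wirelength of a net (an assumed stand-in for routed length)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def local_net_errors(nets, die_w, die_h, n, wireload_by_fanout):
    """Return (wireload estimate - placed length) for every net local to one region.

    nets: list of pin-location lists; the first pin is taken to be the driver.
    wireload_by_fanout: hypothetical table mapping fanout to estimated length.
    """
    errors = []
    for pins in nets:
        regions = {region_of(x, y, die_w, die_h, n) for x, y in pins}
        if len(regions) != 1:
            continue  # net crosses a region boundary, so it is not local
        fanout = len(pins) - 1
        errors.append(wireload_by_fanout[fanout] - hpwl(pins))
    return errors


def summarize(errors):
    """Mean and population standard deviation of the errors, or None if there are none."""
    if not errors:
        return None
    return mean(errors), pstdev(errors)
```

Sweeping `n` from coarse to fine and calling `summarize` at each step gives one way to look for the coarsest grid at which the mean and deviation fall inside whatever tolerance the designer considers acceptable.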
Estimating the wirelengths within these bounded regions can be done in a variety of ways, including via analytical models [7][8], graph properties of the netlist [9], or empirical observations [10][11][12][13]. The deviation in net delays for local nets is computed for each block size, and the region size at which this deviation is acceptable is the level at which wireload models can be used.

6.2. Prototyping Designs

Once we have obtained an acceptable level of placement detail for wireload estimation, we can construct a physical prototype of the final design. At this level of coarse placement, gate sizing and remapping will work with a correct knowledge of path criticality and stage delays. Furthermore, we also have an estimate of interconnect length such that required routing resources can be approximated. Performing a congestion analysis with this level of physical detail will determine whether or not routability constraints can be satisfied. The corresponding availability of accurate delay estimates also enables approximate clock tree synthesis, which in turn improves the accuracy of congestion estimation.

Since the error in wireload estimation is reasonably bounded, a timing analysis of this prototype will correlate closely to a timing analysis of the final physical design. If the design constraints are not satisfied, the designer can make the necessary RTL or behavioral-level changes to the netlist, modify the floorplan and constraints, and then repeat the prototyping process. Power and IR drop analysis of the design can also be done at this point and, if necessary, changes to the power grid incorporated.

6.3. Designing blocks

For the coarse placement that is used to construct a physical prototype, the placement area is divided into regions and standard cells are distributed among these regions.
After converging on a level of physical detail for the prototype that satisfies top-level constraints, the physical implementation of the blocks is carried out via concurrent synthesis and placement for each of them individually, while maintaining the global view from the physical prototype.
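The congestion analysis mentioned under Prototyping Designs becomes possible once cells have been assigned to regions. The sketch below shows one simple way such an estimate could be formed; it is not the method used by the authors. It assumes that a net whose bounding box covers w x h regions consumes routing in w + h - 1 of them (the number of regions a monotone route passes through) and spreads that demand uniformly over the box. The demand model, the capacity threshold, and the function names are all illustrative assumptions.

```python
def congestion_map(nets, n):
    """Accumulate estimated routing demand on an n x n grid of regions.

    nets: list of nets, each a list of (col, row) region coordinates of its pins.
    Returns a grid indexed as demand[row][col].
    """
    demand = [[0.0] * n for _ in range(n)]
    for pins in nets:
        cols = [c for c, _ in pins]
        rows = [r for _, r in pins]
        c0, c1 = min(cols), max(cols)
        r0, r1 = min(rows), max(rows)
        w = c1 - c0 + 1
        h = r1 - r0 + 1
        # A monotone route crosses w + h - 1 regions; spread that over the box.
        share = (w + h - 1) / (w * h)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                demand[r][c] += share
    return demand


def overflowed(demand, capacity):
    """List the (col, row) regions whose estimated demand exceeds the capacity."""
    return [
        (c, r)
        for r, row in enumerate(demand)
        for c, d in enumerate(row)
        if d > capacity
    ]
```

A non-empty result from `overflowed` would suggest that routability is at risk in those regions at the current level of placement detail, under the stated assumptions.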

6.4. Hierarchical Design

Ideally, during hierarchical design we would maintain both a global context for the entire chip and a local context for each block in the hierarchy. The initial chip context is obtained from something equivalent to an initial block-level floorplan using very abstract timing and area estimates for individual blocks. First, a physical prototype of the entire chip is generated using these models. Here top-level routing and optimizations such as buffer insertion, pin assignment and top-level clock tree synthesis are done. These are then used to generate constraints for the individual blocks.

Figure 22. Hierarchical design showing block and chip contexts.

Figure 23. Physical Prototyping - comparison of cycle times with actual physical design. [Bar chart: cycle time (ns) for six designs, after physical prototyping and after physical design.]

Given these block-level constraints, a physical prototype can be obtained for an individual block. This uses information pushed down from the global context, such as the drivers at the block's inputs, the loads on its outputs and the topology of top-level routes. Once the physical prototype of a block is generated, it is used to refine the block-level context and constraints. It is also used to refine the chip-level context, since information is now available about the loads on the input pins of the block, drivers on the output pins and the resources available for routing over the block.

Physical prototyping enables the accurate estimation of what the final placed and routed solution of a block will look like, and thus is a powerful tool that can be used in chip-level optimization. Given a design, it can also be used to generate an optimal partition of the netlist into hierarchical blocks and provide timing and area estimates for them.
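The exchange between chip and block contexts described above can be pictured as a small budgeting loop. The following sketch is only a hypothetical illustration: it assumes a single chip-level path that leaves one block, crosses a top-level route, and ends in another block, and it divides the time left after the route among the blocks in proportion to their current delay estimates. Proportional budgeting, the function names, and the example numbers are assumptions, not something the paper prescribes.

```python
def path_slack(cycle_ns, launch_ns, route_ns, capture_ns):
    """Slack of a chip-level path: cycle time minus block and top-level route delays."""
    return cycle_ns - (launch_ns + route_ns + capture_ns)


def block_budgets(cycle_ns, route_ns, estimates):
    """Split the time left after top-level routing among the blocks on the path.

    estimates: dict mapping block name to its current delay estimate in ns,
    taken first from abstract models and later from block prototypes.
    """
    available = cycle_ns - route_ns
    total = sum(estimates.values())
    return {name: available * delay / total for name, delay in estimates.items()}


# Hypothetical numbers: abstract estimates first, then block B is prototyped.
abstract = {"A": 3.0, "B": 5.0}
refined = dict(abstract, B=4.0)
initial_budgets = block_budgets(10.0, 2.0, abstract)
refined_budgets = block_budgets(10.0, 2.0, refined)
```

Each time a block prototype replaces an abstract estimate, the slack and the budgets can be recomputed, which mirrors the refinement of chip-level and block-level contexts described in this section.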
7 SOME RESULTS

To demonstrate that the prototype does present a realistic picture of the final design, we present results of some industry designs using the physical design system described above. Figure 23 shows comparisons between the cycle time estimated at the physical prototyping level and the cycle time at the end of placement and routing. The margin of error in most cases is within %. The runtime required for physical prototyping is between % to 2% of the runtime required to obtain a detailed placement; as described earlier, it depends on technology and design characteristics. If runtimes for global and detailed routing are included as well, the fraction of time required for physical prototyping will be even smaller.

8 CONCLUSIONS

In summary, the level of placement resolution at which wireload models can be used depends on the extent to which interconnect delays dominate stage delays. For pre-DSM technologies, this corresponded to the floorplan level. However, our analyses clearly show that this is no longer sufficient for the post-DSM era. We further conclude that specifying a block size for which synthesis with wireload models is effective is an over-simplification, since such a block size depends on combinations of process technology, design style, floorplan and the wireload estimator that is used. Our results clearly show that a 5k or any other fixed block size assumption for the acceptable level at which wireload models are applicable is not realistic even for today's .8 micron designs. Based on these analyses, we proposed the generation of a physical prototype to accurately estimate timing from the coarse placement information. We described the implications of this methodology on block and hierarchical design flows.

9 REFERENCES

[1] Semiconductor Industry Association, National Technology Roadmap for Semiconductors, 1999.
[2] S. Hojat and P. Villarrubia, An Integrated Placement and Synthesis Approach for Timing Closure of PowerPC Microprocessors, Intl. Conference on Computer Design, October 1997.
[3] D. Sylvester and K. Keutzer, Getting to the Bottom of Deep Submicron, Intl. Conference on Computer-Aided Design, November 1998.
[4] N.H.E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Addison Wesley, 2nd Edition, 1993.
[5] D. MacMillen, DSM: It's the Heights and not the Depths that are Dangerous, IEEE/ACM Workshop on Timing in the Specification and Analysis of Digital Systems (TAU), March 1999.
[6] F.K. Hwang, D.S. Richards and P. Winter, The Steiner Tree Problem, Elsevier Science Publishers, 1992.
[7] W.E. Donath, Placement and average interconnection lengths of computer logic, IEEE Trans. on Circuits and Systems, 26(4), April 1979.
[8] A.E. Caldwell, A.B. Kahng, S. Mantik, I.L. Markov and A. Zelikovsky, On Wirelength Estimations for Row-Based Placement, IEEE Trans. on CAD, 18(9), 1999.
[9] T. Hamada, C.-K. Cheng and P.M. Chau, A wire length estimation technique utilizing neighborhood density equations, Proc. ACM/IEEE Design Automation Conf., 1992.
[10] D. Stroobandt and J. Van Campenhout, Accurate Interconnection Length Estimations for Predictions Early in the Design Cycle, VLSI Design, Special Issue on Physical Design in Deep Submicron, 10(1), 1999.
[11] M. Pedram and B. Preas, Interconnection length estimation for optimized standard cell layouts, Intl. Conf. on Computer-Aided Design, 1989.
[12] C. Sechen, Average interconnection length estimation for random and optimized placements, Intl. Conf. on Computer-Aided Design, 1987.
[13] S. Bodapati and F.N. Najm, Pre-Layout Estimation of Individual Wire Lengths, Proceedings ACM International Workshop on System-Level Interconnect Prediction (SLIP), 2000.


More information

Actel s SX Family of FPGAs: A New Architecture for High-Performance Designs

Actel s SX Family of FPGAs: A New Architecture for High-Performance Designs Actel s SX Family of FPGAs: A New Architecture for High-Performance Designs A Technology Backgrounder Actel Corporation 955 East Arques Avenue Sunnyvale, California 94086 April 20, 1998 Page 2 Actel Corporation

More information

EEM870 Embedded System and Experiment Lecture 2: Introduction to SoC Design

EEM870 Embedded System and Experiment Lecture 2: Introduction to SoC Design EEM870 Embedded System and Experiment Lecture 2: Introduction to SoC Design Wen-Yen Lin, Ph.D. Department of Electrical Engineering Chang Gung University Email: wylin@mail.cgu.edu.tw March 2013 Agenda

More information

Place and Route for FPGAs

Place and Route for FPGAs Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming

More information

Overview of Digital Design with Verilog HDL 1

Overview of Digital Design with Verilog HDL 1 Overview of Digital Design with Verilog HDL 1 1.1 Evolution of Computer-Aided Digital Design Digital circuit design has evolved rapidly over the last 25 years. The earliest digital circuits were designed

More information

THE latest generation of microprocessors uses a combination

THE latest generation of microprocessors uses a combination 1254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 11, NOVEMBER 1995 A 14-Port 3.8-ns 116-Word 64-b Read-Renaming Register File Creigton Asato Abstract A 116-word by 64-b register file for a 154 MHz

More information

Digital VLSI Design. Lecture 7: Placement

Digital VLSI Design. Lecture 7: Placement Digital VLSI Design Lecture 7: Placement Semester A, 2016-17 Lecturer: Dr. Adam Teman 29 December 2016 Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from

More information

Digital Design Methodology

Digital Design Methodology Digital Design Methodology Prof. Soo-Ik Chae Digital System Designs and Practices Using Verilog HDL and FPGAs @ 2008, John Wiley 1-1 Digital Design Methodology (Added) Design Methodology Design Specification

More information

2. TOPOLOGICAL PATTERN ANALYSIS

2. TOPOLOGICAL PATTERN ANALYSIS Methodology for analyzing and quantifying design style changes and complexity using topological patterns Jason P. Cain a, Ya-Chieh Lai b, Frank Gennari b, Jason Sweis b a Advanced Micro Devices, 7171 Southwest

More information

Exploring Logic Block Granularity for Regular Fabrics

Exploring Logic Block Granularity for Regular Fabrics 1530-1591/04 $20.00 (c) 2004 IEEE Exploring Logic Block Granularity for Regular Fabrics A. Koorapaty, V. Kheterpal, P. Gopalakrishnan, M. Fu, L. Pileggi {aneeshk, vkheterp, pgopalak, mfu, pileggi}@ece.cmu.edu

More information

Lattice Semiconductor Design Floorplanning

Lattice Semiconductor Design Floorplanning September 2012 Introduction Technical Note TN1010 Lattice Semiconductor s isplever software, together with Lattice Semiconductor s catalog of programmable devices, provides options to help meet design

More information

Design Methodologies

Design Methodologies Design Methodologies 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 Complexity Productivity (K) Trans./Staff - Mo. Productivity Trends Logic Transistor per Chip (M) 10,000 0.1

More information

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA A Path Based Algorithm for Timing Driven Logic Replication in FPGA By Giancarlo Beraudo B.S., Politecnico di Torino, Torino, 2001 THESIS Submitted as partial fulfillment of the requirements for the degree

More information

A Framework for Systematic Evaluation and Exploration of Design Rules

A Framework for Systematic Evaluation and Exploration of Design Rules A Framework for Systematic Evaluation and Exploration of Design Rules Rani S. Ghaida* and Prof. Puneet Gupta EE Dept., University of California, Los Angeles (rani@ee.ucla.edu), (puneet@ee.ucla.edu) Work

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Professor Jason Cong UCLA Computer Science Department Los Angeles, CA 90095 http://cadlab.cs.ucla.edu/~ /~cong

More information

VLSI Test Technology and Reliability (ET4076)

VLSI Test Technology and Reliability (ET4076) VLSI Test Technology and Reliability (ET4076) Lecture 4(part 2) Testability Measurements (Chapter 6) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Previous lecture What

More information

Contemporary Design. Traditional Hardware Design. Traditional Hardware Design. HDL Based Hardware Design User Inputs. Requirements.

Contemporary Design. Traditional Hardware Design. Traditional Hardware Design. HDL Based Hardware Design User Inputs. Requirements. Contemporary Design We have been talking about design process Let s now take next steps into examining in some detail Increasing complexities of contemporary systems Demand the use of increasingly powerful

More information

Physical Implementation

Physical Implementation CS250 VLSI Systems Design Fall 2009 John Wawrzynek, Krste Asanovic, with John Lazzaro Physical Implementation Outline Standard cell back-end place and route tools make layout mostly automatic. However,

More information

E 4.20 Introduction to Digital Integrated Circuit Design

E 4.20 Introduction to Digital Integrated Circuit Design E 4.20 Introduction to Digital Integrated Circuit Design Peter Cheung Department of Electrical & Electronic Engineering Imperial College London URL: www.ee.ic.ac.uk/pcheung/ E-mail: p.cheung@imperial.ac.uk

More information

Abstract Page. On the Synthesis-Oriented characteristics of high performance, deep-submicron CMOS VLSI cell libraries.

Abstract Page. On the Synthesis-Oriented characteristics of high performance, deep-submicron CMOS VLSI cell libraries. Abstract Page On the Synthesis-Oriented characteristics of high performance, deep-submicron CMOS VLSI cell libraries. Abstract A method to evaluate the synthesis-oriented quality of cell libraries, as

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

On Wirelength Estimations for Row-Based Placement

On Wirelength Estimations for Row-Based Placement On Wirelength Estimations for Row-Based Placement Andrew E. Caldwell, Andrew B. Kahng, Stefanus Mantik, Igor L. Markov and Alex Zelikovsky UCLA Computer Science Department, Los Angeles, CA 90095-1596 fcaldwell,abk,stefanus,imarkov,alexzg@cs.ucla.edu

More information

VLSI Physical Design: From Graph Partitioning to Timing Closure

VLSI Physical Design: From Graph Partitioning to Timing Closure VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 5 Global Routing Original uthors: ndrew. Kahng, Jens, Igor L. Markov, Jin Hu VLSI Physical Design: From Graph Partitioning to Timing

More information

Interconnect Design for Deep Submicron ICs

Interconnect Design for Deep Submicron ICs ! " #! " # - Interconnect Design for Deep Submicron ICs Jason Cong Lei He Kei-Yong Khoo Cheng-Kok Koh and Zhigang Pan Computer Science Department University of California Los Angeles CA 90095 Abstract

More information

3D systems-on-chip. A clever partitioning of circuits to improve area, cost, power and performance. The 3D technology landscape

3D systems-on-chip. A clever partitioning of circuits to improve area, cost, power and performance. The 3D technology landscape Edition April 2017 Semiconductor technology & processing 3D systems-on-chip A clever partitioning of circuits to improve area, cost, power and performance. In recent years, the technology of 3D integration

More information

ECE 5745 Complex Digital ASIC Design Topic 7: Packaging, Power Distribution, Clocking, and I/O

ECE 5745 Complex Digital ASIC Design Topic 7: Packaging, Power Distribution, Clocking, and I/O ECE 5745 Complex Digital ASIC Design Topic 7: Packaging, Power Distribution, Clocking, and I/O Christopher Batten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece5745

More information

Introduction to CMOS VLSI Design (E158) Lecture 7: Synthesis and Floorplanning

Introduction to CMOS VLSI Design (E158) Lecture 7: Synthesis and Floorplanning Harris Introduction to CMOS VLSI Design (E158) Lecture 7: Synthesis and Floorplanning David Harris Harvey Mudd College David_Harris@hmc.edu Based on EE271 developed by Mark Horowitz, Stanford University

More information

Introduction 1. GENERAL TRENDS. 1. The technology scale down DEEP SUBMICRON CMOS DESIGN

Introduction 1. GENERAL TRENDS. 1. The technology scale down DEEP SUBMICRON CMOS DESIGN 1 Introduction The evolution of integrated circuit (IC) fabrication techniques is a unique fact in the history of modern industry. The improvements in terms of speed, density and cost have kept constant

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 6 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Contents FPGA Technology Programmable logic Cell (PLC) Mux-based cells Look up table PLA

More information

FROSTY: A Fast Hierarchy Extractor for Industrial CMOS Circuits *

FROSTY: A Fast Hierarchy Extractor for Industrial CMOS Circuits * FROSTY: A Fast Hierarchy Extractor for Industrial CMOS Circuits * Lei Yang and C.-J. Richard Shi Department of Electrical Engineering, University of Washington Seattle, WA 98195 {yanglei, cjshi@ee.washington.edu

More information

Cell Libraries and Design Hierarchy. Instructor S. Demlow ECE 410 February 1, 2012

Cell Libraries and Design Hierarchy. Instructor S. Demlow ECE 410 February 1, 2012 Cell Libraries and Design Hierarchy Instructor S. Demlow ECE 410 February 1, 2012 Stick Diagrams Simplified NAND Layout Simplified NOR Layout Metal supply rails blue n and p Active green Poly gates red

More information

Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices

Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices Deshanand P. Singh Altera Corporation dsingh@altera.com Terry P. Borer Altera Corporation tborer@altera.com

More information