Analyzing Timing Uncertainty in Mesh-based Clock Architectures


Subodh M. Reddy (Fujitsu Laboratories of America, Inc., California, USA)
Gustavo R. Wilke* (UFRGS, Porto Alegre, Brazil)
Rajeev Murgai (Fujitsu Laboratories of America, Inc., California, USA)

* This work was done when the author was an intern at Fujitsu Labs. of America.

(Proceedings footer: DATE, EDAA)

Abstract

Mesh architectures are used to distribute critical global signals on a chip, such as clock and power/ground. Redundancy created by mesh loops smooths out undesirable variations between signal nodes spatially distributed over the chip. However, one problem with the mesh architectures is the difficulty in accurately analyzing large instances. Furthermore, variations in process and temperature, supply noise and crosstalk noise cause uncertainty in the delay from clock source to flip-flops. In this paper, we study the problem of analyzing timing uncertainty in mesh-based clock architectures. We propose solutions for both pure mesh and (mesh + global-tree) architectures. The solutions can handle large design and mesh instances. The maximum error in uncertainty values reported by our solutions is 1-3ps with respect to the golden Monte Carlo simulations, which is at most 0.5% of the nominal clock latency of about 600ps.

1 Introduction

Mesh or grid architectures are popular for distributing critical global signals on a chip, such as clock and power/ground. The mesh architecture uses inherent redundancy created by loops to smooth out undesirable variations between signal nodes spatially distributed over the chip. These variations can be due to non-uniform switching activity in the design, within-die process variations, or asymmetric distribution of circuit elements (such as flip-flops). For power/ground, mesh can help reduce voltage variations at different nodes in the network due to non-uniform switching activities. For the clock signal, a mesh (Figure 1) has been shown to achieve very low skew in microprocessor designs, e.g., Digital Alpha [2]; IBM G5 S/390 [6], Power4 and PowerPC [12]; SUN Sparc V9 [13]. Mesh also has excellent jitter mitigation properties.

Figure 1: A mesh-based clock architecture

However, one major problem that has limited the applicability of mesh architectures is the difficulty in analyzing them with sufficient accuracy. The main reasons are the huge number of circuit nodes needed to accurately model a fine mesh in a large design and a large number of metal loops present in the mesh structure. As a result, circuit simulators such as SPICE either require inordinate amount of memory or run-time. In fact, HSPICE and HSIM (Synopsys) failed to analyze even coarse meshes for an industrial design [4]. An added degree of complication is brought forth by variations in parameters that affect clock latency [18, 14, 3, 10]. Examples of such parameters are process (channel length, oxide thickness, interconnect width and thickness, etc), supply voltage, temperature and crosstalk noise. Variations in these parameters cause variations or uncertainty in delay from the clock root to flip-flops, both die-to-die and clock cycle-to-clock cycle [17, 8].

With technology scaling, the magnitude of parameter variations and the sensitivity of clock latency towards variations are increasing. The focus of this paper is to analyze the timing uncertainty of mesh-based clock architectures in the presence of parameter variations. We believe this is the first work that addresses this problem. We propose solutions for both pure mesh and (mesh + global-tree) architectures. The solutions can handle large design and mesh instances. We show that uncertainty values reported by our solutions are within 1-3ps of those obtained from the golden Monte Carlo simulations (e.g., 35ps vs. 33ps), where the nominal clock latency is about 600ps. Another major benefit of our scheme is that it is easily amenable to distributed- or grid-computing.

The paper is organized as follows. Section 2 gives preliminaries. Section 3 describes previous work on clock mesh analysis. An overview of our methodology for uncertainty analysis of clock meshes under parameter variations is presented in Section 4. The details of our methodology are presented along with experimental results in Section 5. We conclude and give directions for future work in Section 6.

2 Preliminaries

2.1 Mesh-based Clock Architecture

Figure 1 shows a typical mesh architecture used for distributing the clock signal from the PLL or root buffer to sequential elements such as flip-flops (FFs) and latches on the chip. It has three main components: 1) a (uniform) mesh, 2) a global buffered tree that drives the mesh, and 3) local interconnect, which connects the clock inputs of FFs directly to the nearest point on the mesh. The mesh is a uniform rectangular grid of wires spanning the entire chip area, driven by the mesh buffers and propagating the clock to the FFs. An $m \times n$ mesh or grid has $m$ rows (horizontal wires) and $n$ columns (vertical wires). The size of the mesh is $m \times n$. For a given chip size, the greater the mesh size, the more fine-grain the mesh is. A mesh node (or grid node) is the point where each row is connected to each column. As shown in Figure 1, the global (H-)tree delivers the clock signal to the mesh nodes via buffers called mesh buffers. We assume a uniform array of $k \times \ell$ mesh buffers. In Figure 1, $k = m = 4$ and $\ell = n = 4$. The mesh wire between two adjacent mesh nodes is called a mesh segment, and represents one grid unit.

Figure 2: Single-π model for interconnect
Figure 3: 3-π model for interconnect

Clock Network Model

Each buffer (mesh buffer and tree buffer) is modeled using the BSIM3 transistor models for NMOS and PMOS. Since the mesh is largely composed of wires, it is important to have an accurate wire model. To model wires shorter than 100 μm, a single-π model, which has two capacitors, a resistor and an inductor, is used (Figure 2). For longer wires, a 3-π model is used, as shown in Figure 3. Our study on Fujitsu's 0.11 μm technology showed that this scheme is delay-accurate within 0.5% of 4-π and 5-π models [16]. It helps reduce the number of nodes in the SPICE model. The same rule is used to model wires that connect FFs to the mesh and wires on the global tree. The clock pin of a FF is modeled as an equivalent capacitance.

2.2 Clock Timing and Uncertainty

In any clock distribution scheme, one of the most important concerns is to accurately compute the clock arrival time $a$ (also called clock delay or latency) at the clock input pin of each FF. Assume we have a path $P$ in a design whose start and end gates are FFs $FF_s$ and $FF_e$. Let clock arrival times at these FFs be $a_s$ and $a_e$ respectively. The maximum delay $d_{\max}$ allowed on $P$ is a function of $(a_e - a_s)$, the difference in clock arrival times at the two FFs:

$$d_{\max} \le a_e - a_s + \tau - t_{\text{setup}}, \qquad (1)$$

where $\tau$ is the clock cycle and $t_{\text{setup}}$ is the set-up time for $FF_e$. The quantity $a_e - a_s$ is known as the skew between $FF_s$ and $FF_e$. By comparing the arrival times among all FFs, we can compute the worst relevant clock skew in the design. This is the maximum negative difference in arrival times at two FFs that are connected by a data path. For a fixed clock cycle, the worst skew limits the maximum delay in the data path. Thus, it has a direct impact on the design turnaround time. Alternatively, for a given design, the skew impacts the maximum clock frequency for which the design will function correctly.
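As a worked illustration of constraint (1), the following Python sketch computes the delay budget for one launch/capture flip-flop pair and the worst relevant skew over a set of connected pairs. The function names, the dictionary layout, and every numeric value are illustrative assumptions, not data from the paper.

```python
def max_path_delay(a_s, a_e, tau, t_setup):
    """Upper bound on data-path delay from constraint (1):
    d_max <= a_e - a_s + tau - t_setup (all values in ps)."""
    return a_e - a_s + tau - t_setup


def worst_relevant_skew(arrival, paths):
    """Most negative a_e - a_s over FF pairs joined by a data path.

    arrival: dict mapping FF name -> clock arrival time (ps)
    paths:   iterable of (start_ff, end_ff) pairs
    """
    return min(arrival[e] - arrival[s] for s, e in paths)


# Hypothetical arrival times (ps) and data paths, for illustration only.
arrival = {"ff1": 600.0, "ff2": 590.0, "ff3": 612.0}
paths = [("ff1", "ff2"), ("ff2", "ff3"), ("ff3", "ff1")]

skew = worst_relevant_skew(arrival, paths)
budget = max_path_delay(arrival["ff1"], arrival["ff2"], tau=1000.0, t_setup=50.0)
print(skew, budget)
```

In this toy data the capture clock of the pair (ff3, ff1) arrives 12ps earlier than the launch clock, so that pair sets the worst relevant skew.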
In practice, at a given flip-flop on a chip, two consecutive clock rising (or falling) edges may not be $\tau$ time units apart. Moreover, for the same corresponding flip-flop on two chips, the clock latencies from the clock source may be different. Clock timing uncertainty denotes the deviation of the timing of the clock edge from its expected value. Uncertainty affects $a_s$ and $a_e$ in (1) and hence $d_{\max}$ or $\tau$, as discussed above. Uncertainty in clock timing can be due to several factors.

1. Supply (V) noise: This is caused by different sets of gates switching in different clock cycles. Since gate delay depends on the value of supply voltage, any change in the supply voltage of a clock buffer changes the clock arrival time at the FF.
2. Temperature (T) variation: This variation arises due to different switching activities on the chip (both spatial and temporal) and because power and temperature are strongly coupled to each other, especially for leakage-dominant technologies. A block with higher switching activity dissipates higher dynamic power, leading to higher local temperatures. That, in turn, increases the leakage power dissipation, further increasing the total power. A gate operating at a higher temperature exhibits higher delay due to reduced carrier mobility.
3. Process variations (within die and die-to-die) P: Examples of process variations include intrinsic variations such as random dopant fluctuations in a MOSFET channel and extrinsic variations such as channel length and oxide thickness variations. In a chemical mechanical planarization (CMP) process, interconnect width, thickness, spacing and height may vary significantly from the intended values. These variations cause gate and wire delays to deviate from their desired values. It is difficult to predict the precise magnitude of variations and hence the exact values of wire and gate delays after manufacturing.
4. Crosstalk noise X: Delay of a clock wire v can change if there is an aggressor a that is physically close to v and is switching. Since the aggressor's switching behavior can change from one cycle to the next, it can lead to timing variation on the victim. Clock is one of the most important signals in the design. $V_{dd}$/$V_{ss}$ shielding is typically done on both sides of the clock to eliminate such crosstalk impact. Shielding, however, does not prevent crosstalk from the top and bottom layers, when a wide bus is going over the clock line.
5. PLL jitter: Clock generated from the PLL has an inherent jitter.

Some of these parameters (such as process) have random unknown variation components, but once the chip is manufactured, the values are fixed. Other parameter variations are deterministic: they depend on the state of the design and the last and current signal values, and have to be computed for each cycle. Examples are supply and crosstalk noise. Their exact computation typically requires prohibitive CPU and memory resources and may be infeasible in practice. Nevertheless, both kinds of parameter variations cause uncertainty in the timing of the clock edge at a flip-flop from its expected value.

Let $D$ denote the latency (path delay) from the clock root to a flip-flop. In general, $D$ is a function of supply voltage $V_i$ at each clock buffer $B_i$ on the path, the temperature $T_i$ at each clock buffer and wire, the set of process parameters $P$, and crosstalk noise $X$. In short, we write $D(\vec{V}, \vec{T}, \vec{P}, \vec{X})$, where $\vec{V}$ denotes the vector of all buffer voltages $\{V_i\}$. In the presence of parameter variations, $D$ is a distribution with mean $\mu$ and standard deviation $\sigma$. We define uncertainty in $D$, denoted $U(D)$, as $k\sigma$. In this paper, we use $k = 3$.
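The definition $U(D) = k\sigma$ can be evaluated directly from a set of latency samples. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, the samples are synthetic Gaussian draws rather than SPICE results, and the 10ps standard deviation is invented (only the 600ps nominal latency echoes the paper).

```python
import random
import statistics


def uncertainty(latencies, k=3.0):
    """U(D) = k * sigma, with sigma taken as the sample standard deviation."""
    return k * statistics.stdev(latencies)


# Synthetic stand-in for Monte Carlo latency samples at one flip-flop (ps).
rng = random.Random(0)
samples = [rng.gauss(600.0, 10.0) for _ in range(400)]

u = uncertainty(samples)
print(f"U(D) = {u:.2f} ps")
```

With a true standard deviation of 10ps the estimate should land near 30ps; the scatter around that value shrinks as the number of samples grows.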
Problem Statement: Given a mesh-based clock network and VTPX parameter variations for each component of the clock network (i.e., clock buffers and wires), determine the timing uncertainty $U(D_i)$ in the clock latency $D_i$ from the clock root to each flip-flop $FF_i$.

3 Previous Work

If the clock network is a tree, uncertainty analysis can be carried out using gate-level statistical static timing analysis [9, 15, 1]. However, such an approach is not directly applicable for a mesh-based clock network due to metal loops (cycles) present in the mesh. We are not aware of any work on clock mesh uncertainty analysis. The only known solution is that if the mesh model fits in the memory, we can run Monte Carlo simulations (MCSs) [7] assuming some distribution for parameter variations and obtain a delay distribution at each FF, from which timing uncertainties at FFs could be derived. However, this is possible only for small design and mesh instances.

Not much has been published on the problem of clock mesh latency analysis. [12, 5] present a scheme to break the clock mesh into a tree and apply a smoothing algorithm to redistribute the mesh loads. The tree is analyzed for latency. However, no accuracy results are shown. In [2], the clock mesh is verified in two steps. First, an AWE-based reduction [11] is performed on the mesh to simplify the mesh elements. Then, the simplified circuits are simulated using SPICE. The accuracy and efficiency of this method depend on the accuracy and stability of the moment matching technique. Recently, a sliding window scheme (SWS) was proposed for latency analysis of clock meshes [4]. Since uncertainty analysis derives its basic idea from SWS, we describe it next.

3.1 Sliding Window Scheme for Mesh Latency

In SWS, the mesh is modeled with two different resolutions: a detailed circuit model is used for the mesh elements geometrically close

to the nodes whose latency we are measuring, and a simplified model is used for the mesh elements far from the nodes being measured. The simplification is with respect to the local FF connections. Given a mesh of size $m \times n$, define a rectangular window $W$ of size $r \times s$, where $r < m$ and $s < n$. Expand $W$ by some border to obtain $W'$ (Figure 4, in which the border is 1 grid unit). If the lower left corner of $W'$ is fixed to a point on the mesh, $W'$ covers some fixed region of the mesh (Figure 4).

Figure 4: The sliding window scheme (annotations: window W; border around W; complete mesh; preserve circuit detail inside the window; lump capacitance and ignore resistance outside the window, except on mesh segments; slide W)

The connection of a FF within $W'$ to the nearest mesh segment is modeled accurately by an appropriate π model, as described under Clock Network Model (single-π or 3-π, depending on the length of the connection). The clock input pin of the FF is modeled as a capacitance. FFs that lie outside $W'$ and their connections to the mesh are modeled approximately. The wire connecting such a FF to the mesh is replaced by an equivalent single capacitance; the wire resistance is ignored. Given a mesh node $a$ outside $W'$, the region covered by $a$ is the unit rectangle shown in Figure 4. Let $C_a$ be the sum of the clock input pin capacitances of all the FFs in this region along with the capacitances of the wires connecting them to the mesh. Then, $C_a$ is lumped as a single capacitance at $a$. The mesh segments outside $W'$ are still modeled with appropriate π models. The SPICE file corresponding to this model for the window location is generated and simulated. The clock latencies at all FFs in the inner window $W$ are measured (see footnote 1). Next, the window $W$ is slid horizontally or vertically so as not to overlap with the previous locations. Once again, a SPICE model is created and run. The entire mesh simulation is broken down into multiple independent window-based simulations. In fact, $\lceil \frac{m-1}{r-1} \rceil \times \lceil \frac{n-1}{s-1} \rceil$ SPICE simulations are needed to cover the entire mesh and all the FFs in the design.

SWS is a divide-and-conquer partitioning technique. Approximating the region outside the window reduces the number of nodes in the circuit model. Approximating each FF saves either 7 nodes (if the wire is longer than 100 μm) or 3 nodes (otherwise). In a typical design, where there are hundreds of thousands of FFs, reduction in the SPICE model size can be huge. It was shown in [4] that HSPICE could not finish on a 65x65 mesh with 100K FFs. It needed more than 2GB of memory, whereas SWS could complete in less than 1.5 hours within 1GB memory using four machines. The latencies computed by SWS, using a border of 1 grid unit, are almost always within 1% of the latencies computed from SPICE simulation of the complete mesh. It was also shown that using no border (i.e., a border of 0 grid units) does not yield accurate results; errors of up to 30% were seen. By increasing the border beyond 1 grid, the accuracy does not improve much. However, the runtime increases significantly. In short, empirically a border of 1 grid unit was found to be optimum. Also, window size was shown to have very little impact on accuracy. However, smaller window size means smaller model and hence better chances for large designs to fit in the memory. But smaller window also implies more simulations.

Footnote 1: Latencies of the FFs in the border of W are ignored. These will be measured when these FFs fall in the non-border area of other window(s).

4 Clock Mesh Uncertainty Analysis

4.1 Modeling Sources of Uncertainty

We model various sources of uncertainty as follows. Refer to Figure 5, where inverting buffer1 drives inverting buffer2 through a wire.

Figure 5: Statistical simulation model for a buffer driving a wire

1. Supply Noise V: Supply noise is modeled by supplying independent power supplies to each clock buffer, and allowing them to vary randomly according to a noise model.
The amount of variation is controlled by a user input parameter, supply tolerance.
2. Temperature Variation T: Rising temperature causes CMOS circuits to operate more slowly, and wiring resistances to increase. Temperature variation of transistors is modeled by specifying an underlying temperature for the entire chip and then applying random local temperature variations on each clock buffer and interconnect. The variation to apply is given by a user input parameter, max deltemp.
3. Process Variation P: As shown in Figure 5, process variation of transistors is modeled using only channel length ($l_p$ and $l_n$ for PMOS and NMOS transistors respectively) and threshold voltage ($delvt_n$ and $delvt_p$). Other variations, such as oxide thickness and dopant concentration, have the overall effect of varying the threshold voltage and hence are indirectly included in our model. The variations of threshold voltage and channel lengths are passed into each instance of the buffer sub-circuit models. Process variation of wiring is modeled by applying random process factors $pf_c$ and $pf_r$ to the wiring capacitance and resistance respectively in the wire models.
4. Crosstalk Noise X: Crosstalk noise is modeled by attaching external noise sources to the wire model (Figure 5) and by applying random inputs at these sources based on some probability distribution. The crosstalk factor associated with the instances must also be defined whenever a wire is instantiated. The crosstalk factor is a unique property of each design, and is supplied by the user through the parameter xtfactor.
5. PLL jitter: We will assume a maximum PLL jitter of $3\sigma_{PLL}$.

4.2 Computing Uncertainty: Basic Idea

The basic idea is simple: we use SWS for analyzing timing uncertainty of a mesh. We attach variation parameters with each buffer and wire on the clock network, as illustrated in Figure 5.
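One possible way to attach variation parameters to each buffer and wire for a single Monte Carlo run is sketched below in Python. This is an assumption-laden illustration, not the authors' implementation: the paper only says the parameters vary "according to a noise model", so the Gaussian draws with standard deviation equal to one third of the 3σ tolerance, the dictionary keys, the nominal supply and temperature, and all function names are hypothetical. The tolerance values mirror the legible entries of Table 1; channel length and wire capacitance are left out because their table entries did not survive transcription.

```python
import random

# 3-sigma tolerances mirroring the legible entries of Table 1.
TOL_3SIGMA = {
    "vdd_frac": 0.10,   # supply voltage, fraction of nominal
    "temp_c": 20.0,     # temperature, degrees C
    "delvt_v": 0.020,   # threshold-voltage shift, volts
    "pf_r_frac": 0.20,  # interconnect resistance, fraction of nominal
}
XTALK_SWITCH_PROB = 0.5  # crosstalk switching probability


def draw(rng, nominal, tol_3sigma):
    """Gaussian sample whose 3-sigma deviation equals tol_3sigma (assumed model)."""
    return rng.gauss(nominal, tol_3sigma / 3.0)


def sample_buffer(rng, vdd_nom, temp_nom):
    """Independent supply, temperature and threshold-voltage draws for one buffer."""
    return {
        "vdd": draw(rng, vdd_nom, TOL_3SIGMA["vdd_frac"] * vdd_nom),
        "temp": draw(rng, temp_nom, TOL_3SIGMA["temp_c"]),
        "delvt_n": draw(rng, 0.0, TOL_3SIGMA["delvt_v"]),
        "delvt_p": draw(rng, 0.0, TOL_3SIGMA["delvt_v"]),
    }


def sample_wire(rng, temp_nom):
    """Temperature, resistance factor and aggressor activity for one wire."""
    return {
        "temp": draw(rng, temp_nom, TOL_3SIGMA["temp_c"]),
        "pf_r": draw(rng, 1.0, TOL_3SIGMA["pf_r_frac"]),
        "aggressor_switches": rng.random() < XTALK_SWITCH_PROB,
    }


def sample_run(rng, n_buffers, n_wires, vdd_nom=1.2, temp_nom=85.0):
    """Parameter set for one Monte Carlo run (nominal values are hypothetical)."""
    return {
        "buffers": [sample_buffer(rng, vdd_nom, temp_nom) for _ in range(n_buffers)],
        "wires": [sample_wire(rng, temp_nom) for _ in range(n_wires)],
    }


rng = random.Random(1)
run = sample_run(rng, n_buffers=16, n_wires=24)
print(run["buffers"][0])
```

Each component gets its own independent draw, which matches the independence assumption the paper states for this work; spatially correlated supply or temperature would need a different sampler.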
For each window $W'$ of SWS, a SPICE model of the mesh is created (just as in [4]) and Monte Carlo simulations (MCSs) are carried out. In each run of the MCSs, the values of VTPX parameters for each component of the clock network are determined from their respective distributions, and the latency $D_i$ of each flip-flop $FF_i$ that lies in the core of $W'$ (i.e., in the inner window $W$) is computed. After all runs are completed, a distribution of the delay $D_i$ is available for each such $FF_i$. The uncertainty $U(D_i) = 3\sigma(D_i)$ is then computed from this distribution. Finally, the $U(D_i)$ values are collected from all windows $W'$ to yield uncertainties at all the FFs in the design.

In this paper, we do not use large design and mesh instances. The feasibility of SWS for those has already been shown [4]. Our focus is

to determine if SWS can be used for accurate uncertainty analysis of both pure mesh and (mesh + global tree) architectures, and if so, to derive a practical and usable methodology. The next section presents detailed results of our study.

5 Results

5.1 Experimental Set-up & Definitions

All our experiments were conducted in Fujitsu's 0.11 μm technology. The 3σ variations for various parameters are shown in Table 1.

Table 1: 3σ variations for different parameters

  Parameter                          3σ variation
  NMOS/PMOS channel length           08μ
  NMOS/PMOS threshold voltage        20mV
  interconnect resistance            20%
  interconnect capacitance           0%
  temperature                        20C
  V_dd                               10%
  crosstalk switching probability    0.5

[Table 2: Reduction of uncertainty by mesh. Columns: input uncertainty (ps); maximum output uncertainty (ps) for the 8x8 mesh and for the 16x16 mesh.]

In the following, we will compare FF uncertainties obtained from a methodology M against those from a golden reference methodology G. For instance, M may correspond to the SWS-based uncertainty analysis, and G, to running MCSs on the flat single model of the mesh-based clock network. To evaluate the quality of uncertainty results, we use two metrics: 1) error in the maximum uncertainty, E-UMAX, and 2) the maximum uncertainty-error at a FF, MAXE-FF. E-UMAX is obtained by first computing UMAX, the maximum over uncertainties at all the flip-flop clock pins (i.e., $UMAX = \max_{FF_i} \{U(D_i)\}$), using M and then comparing it with the UMAX computed by G. MAXE-FF is calculated by first computing the percentage error in uncertainty at each FF under M with respect to the golden uncertainty value at that FF and then picking the maximum percentage error value over all the FFs. Note that MAXE-FF $\ge$ E-UMAX.

Since we use Monte Carlo simulations to compute timing uncertainties, the accuracy of results depends on the number of simulations. More simulations usually means higher accuracy.
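The two metrics defined above can be sketched in Python as follows. The function names, the dictionary representation of per-flip-flop uncertainties, and the numeric values are illustrative assumptions; errors are reported as absolute percentages, which is one reasonable reading of the definitions.

```python
def e_umax(u_m, u_g):
    """E-UMAX: percentage error between the maximum uncertainty under
    methodology M and the maximum uncertainty under the golden G."""
    umax_m = max(u_m.values())
    umax_g = max(u_g.values())
    return 100.0 * abs(umax_m - umax_g) / umax_g


def maxe_ff(u_m, u_g):
    """MAXE-FF: largest per-flip-flop percentage error of M against G."""
    return max(100.0 * abs(u_m[ff] - u_g[ff]) / u_g[ff] for ff in u_g)


# Hypothetical 3-sigma uncertainties (ps) at three flip-flops.
u_g = {"ff1": 30.0, "ff2": 25.0, "ff3": 20.0}  # golden methodology G
u_m = {"ff1": 31.5, "ff2": 24.0, "ff3": 22.0}  # methodology M under test

print(e_umax(u_m, u_g), maxe_ff(u_m, u_g))
```

In this toy data the flip-flop with the largest uncertainty is off by 5%, while ff3 is off by 10%, so MAXE-FF exceeds E-UMAX, consistent with the inequality noted in the text.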
Since it was not feasible for us to run a large number of simulations due to limited CPU resources, we did an experiment to determine the number of simulations that yield uncertainty values within 10% accuracy, where the golden result used 800 simulations. It turned out that running 400 simulations resulted in MAXE-FF of about 5.5% with respect to the golden result, whereas with 100 simulations, we obtained MAXE-FF of about 16%. So we use 400 simulations in all our MCS runs (unless stated otherwise). We present results for two architectures: pure mesh with no global tree, and complete clock network with mesh and global tree.

5.2 Pure Mesh

First, we study effectiveness of clock mesh in mitigating uncertainty. Then, we investigate accuracy of SWS-based uncertainty analysis methodology. In both experiments, only the mesh along with mesh buffers was modeled and simulated. The global tree was not explicitly included in the model.

Effectiveness in Uncertainty Mitigation

Although the global tree was not explicitly included in the model, different values for maximum skew and uncertainty were used on the inputs of the mesh buffers. These model the skew and uncertainty due to the global tree driving the mesh. The mesh buffer inputs were assumed to be independent Gaussian distributions with mean clock arrival times satisfying the maximum skew and standard deviation $\sigma$ related to uncertainty. The interconnect resistance and capacitance variations shown in Table 1 were applied to each wire in the mesh. A chip of size 500 μm x 500 μm was used, with 1000 flip-flops placed randomly with a uniform distribution. Two different mesh sizes were tried: 8x8 and 16x16. The flip-flops are connected to the closest mesh node. Different experiments were run for maximum input skews of 0ps and 5ps, and for 3σ uncertainties of 3ps, 15ps, 30ps and 150ps (with Gaussian distributions) at mesh buffer inputs. Monte Carlo simulations were performed in each case.
The values of the maximum 3σ output uncertainty over all mesh nodes for input skew of 0ps and different input uncertainties are presented in Table 2 for both 8x8 and 16x16 meshes. From the column 8x8 mesh, we see that the 8x8 mesh is able to reduce the uncertainty at the mesh nodes by a factor of 7 to 8 when compared to the uncertainty at mesh buffer inputs. From the column 16x16 mesh, it is clear that by increasing the mesh size from 8x8 to 16x16, the uncertainty also reduces by a factor of 2. Thus, we can draw the following two conclusions.

1. Mesh is very effective in reducing timing uncertainty.
2. For the same chip size, a finer-grain mesh is more effective in reducing uncertainty than a coarse-grain mesh.

The results for maximum input skew of 5ps are similar to those presented above and are omitted.

SWS

In this section, we investigate if SWS-based MCSs can be used for mesh uncertainty analysis. We used a chip size of 5mm x 5mm, three different mesh sizes of 10x10, 18x18 and 26x26, and 1000 FFs distributed randomly over the chip. The clock root is directly connected to all the mesh buffer inputs. We compare the SWS-based methodology with respect to a golden reference methodology, in which Monte Carlo simulations are run on the entire mesh model. We intentionally chose a small problem size so that the golden model could fit in the memory and run in reasonable CPU time. The window size in SWS was fixed at one-fourth the mesh size. Figure 6 shows E-UMAX (both in ps and percentage) as a function of the border length for different mesh sizes. The golden UMAX is the lowest horizontal line in all the graphs. It can be seen that for all mesh sizes, E-UMAX is very high (more than 50%) for a border of 0 (i.e., no border), but decreases rapidly as the border is expanded. For 10x10 mesh and border of 1 grid, the error is almost 0%; for 18x18 mesh and border of 2 grids, the error is around 7%, and for 26x26 mesh and border of 3, the error is around 15%.
In all cases, SWS was able to achieve E-UMAX of less than 0.1ps. Interestingly, this behavior is markedly different from that of SWS latency, where increasing the border from 0 to 1 grid units reduced the latency error significantly, but no further improvement was obtained by increasing the border beyond 1 unit. We conclude that the SWS-based methodology is effective for accurately analyzing timing uncertainty of clock meshes. The error in uncertainty goes down rapidly as the window border is increased. The border required to achieve a given accuracy in uncertainty vis-a-vis the golden reference is a monotonic function of the mesh size.

5.3 Complete Clock Network

Having established that SWS is accurate for analyzing the timing uncertainty of a pure clock mesh, we now investigate if the SWS-based uncertainty analysis can handle the clock network of Figure 1, which includes, in addition to the mesh, a global tree that drives the mesh through mesh buffers.

Figure 6: Impact of border on SWS accuracy for mesh uncertainty analysis

[Table 3: UMAX & E-UMAX for tree-mesh decoupling. Columns: mesh size; Golden UMAX (ps); M-Uncorrelated UMAX (ps) and E-UMAX (%); M-Correlated UMAX (ps) and E-UMAX (%). Rows: 8x8 and 16x16 meshes.]

[Table 4: MAXE-FF for tree-mesh decoupling experiment. Columns: mesh size; M-Uncorrelated U: M (G) (ps) and MAXE-FF (%); M-Correlated U: M (G) (ps) and MAXE-FF (%). Surviving entries: 8x8: (30.6), (29.53), 15; 16x16: 4 (28.72), (19.88), 5.46.]

Figure 7: Global tree uncertainty analysis

One straightforward way of analyzing uncertainty of the complete clock network with SWS is to include the entire tree for each location of the window in SWS-based Monte Carlo simulations. Though accurate, this scheme is time consuming, memory intensive and wasteful, since it re-analyzes the same tree for each window location. If we can decouple the tree uncertainty analysis from the mesh analysis and carry out the two separately, the complete clock network uncertainty analysis can be sped up, using less memory as well.

Decoupling Tree Analysis and Mesh Analysis

To ascertain the validity of decoupling for analyzing timing uncertainty, we carried out the following comparison. The golden methodology G consisted of running MCSs on the entire monolithic clock network model (with global tree and mesh together), and measuring the uncertainty at each FF. The methodology M corresponded to decoupling the tree and mesh analyses. It consisted of running MCSs on the global tree, deriving mean and standard deviation of the clock arrival time at the input of each mesh buffer, and using them as inputs to the mesh uncertainty analysis. One single simulation model was created for the mesh. The mean and standard deviation of the latency at the input of a mesh buffer are the same as those derived from the global tree analysis. Moreover, the latency variables for mesh buffer inputs are assumed to be independent Gaussian variables. The mesh uncertainty analysis computes uncertainty at every FF.
The comparison of M and G results is shown in Tables 3 and 4 for two mesh sizes (8x8 and 16x16) for a 5mm x 5mm chip having 1000 FFs placed with a uniform random distribution. Table 3 shows UMAX, the maximum uncertainty over all FFs, for the golden methodology G (column Golden) and the methodology M (column M-Uncorrelated). It can be seen that E-UMAX is huge: both in ps (20ps and 25ps) and in percentage (57% and 77%) for the two mesh sizes respectively. Table 4 column M-Uncorrelated shows results for the flip-flop with maximum error in uncertainty (i.e., MAXE-FF). For the 8x8 mesh case, 11.48ps is the uncertainty U (with the decoupled methodology) of the FF with maximum error, whereas its golden uncertainty is 30.6ps, resulting in a percentage error of 62.5%. MAXE-FF for the 16x16 mesh is even larger: 82.5%.

Figure 8: E-UMAX for complete clock network using SWS (y-axis: % error; panels: 8x8 mesh with window = 2 and window = 5; 16x16 mesh with window = 4, window = 8 and window = 12)

The reason for such huge errors is that the latency variables at the mesh buffer inputs are not all independent: they are correlated to each other. Correlation between the latency variables at two mesh buffers depends on the tree edges shared between the paths from the clock tree root to the two buffers. This is shown in Figure 7, where the paths from the root A to mesh buffers X and Y share edges AB and BC. Each of these edges contributes the same delay to the two paths. This is not considered in the independent variable assumption.

One way to incorporate common path correlations at mesh buffer inputs is as follows. The tree uncertainty analysis generates delay distribution for each stage of the global clock tree (e.g., mean $\mu_{AB}$ and standard deviation $\sigma_{AB}$ of the delay of edge AB in Figure 7).

For each run of the mesh MCS, generate a delay sample for each tree stage from its delay distribution. Generate latency of each path in the clock tree by adding the delays of stages on the path. Use these path latency values as inputs in a particular MC run of the mesh analysis. Thus, each edge in the tree contributes the same delay to all the paths it belongs to. The results using this approach are shown in the column M-Correlated in Tables 3 and 4. It can be seen that E-UMAX values for the two mesh sizes are 6% and 2.5%, whereas MAXE-FF values are 10% and 5.5%. The absolute ps difference in uncertainties is at most 3ps. This implies that when decoupling the tree and mesh analyses, common path correlations in the tree must be taken into account. Then the decoupling methodology yields accurate results vis-a-vis the golden monolithic methodology. One problem with our approach is that it ignores delay correlations between two successive stages on a single path. Accuracy of the decoupling approach can be further improved by incorporating the stage delay correlations.

Decoupling with SWS

In this experiment, we used the same set-up and the golden methodology G as in the previous section (Decoupling Tree Analysis and Mesh Analysis). However, the methodology M used the decoupled tree and mesh analyses with correlations, using SWS for the mesh with different window sizes. Figures 8 and 9 show E-UMAX and MAXE-FF values respectively. With window dimension about half of the mesh dimension (e.g., 8x8 window for 16x16 mesh), E-UMAX is <7% and MAXE-FF is <12%. From Table 3 column Golden UMAX, the maximum FF uncertainty values were in the 30-35ps range. A 12% error in uncertainty translates to about 4ps, which is really small, given nominal clock latencies of 570ps for the 8x8 mesh and 647ps for the 16x16 mesh.

Figure 9: MAXE-FF for complete clock network using SWS (y-axis: % error; panels: 8x8 mesh with window = 2 and window = 5; 16x16 mesh with window = 4, window = 8 and window = 12)
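The common-path sampling used for the M-Correlated results (one delay draw per tree stage, summed along each root-to-buffer path) can be sketched in Python as below, next to the independent-input assumption it replaces. This is an illustrative sketch only: the edge names follow Figure 7, but the extra edges CX and CY, all means and standard deviations, and the function names are hypothetical.

```python
import random
import statistics

# Hypothetical per-stage delay distributions (mean ps, sigma ps), after Figure 7.
EDGES = {"AB": (100.0, 3.0), "BC": (80.0, 3.0), "CX": (60.0, 3.0), "CY": (60.0, 3.0)}
PATHS = {"X": ["AB", "BC", "CX"], "Y": ["AB", "BC", "CY"]}


def sample_correlated(rng):
    """One MC run: draw each stage delay once, then sum it along every path,
    so a shared edge contributes the same delay to all paths through it."""
    delay = {e: rng.gauss(mu, sigma) for e, (mu, sigma) in EDGES.items()}
    return {leaf: sum(delay[e] for e in path) for leaf, path in PATHS.items()}


def sample_uncorrelated(rng):
    """One MC run under the independence assumption: each mesh-buffer input
    is an independent Gaussian with the path's mean and standard deviation."""
    out = {}
    for leaf, path in PATHS.items():
        mu = sum(EDGES[e][0] for e in path)
        sigma = sum(EDGES[e][1] ** 2 for e in path) ** 0.5
        out[leaf] = rng.gauss(mu, sigma)
    return out


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def input_correlation(sampler, runs=4000, seed=0):
    """Correlation between the latencies at mesh buffers X and Y."""
    rng = random.Random(seed)
    draws = [sampler(rng) for _ in range(runs)]
    return correlation([d["X"] for d in draws], [d["Y"] for d in draws])


rho_corr = input_correlation(sample_correlated)
rho_uncorr = input_correlation(sample_uncorrelated)
print(f"correlated: {rho_corr:.2f}, uncorrelated: {rho_uncorr:.2f}")
```

Both samplers give each mesh buffer input the same mean and standard deviation; they differ only in the correlation between inputs. With the values assumed here, the shared stages AB and BC carry 18 of the 27 variance units on each path, so the correlated sampler should show a correlation near 0.67, which the independent sampler drops to about zero.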
As for the impact of border, the percentage errors generally go down with increasing border. However, in some cases, the error goes up. One possible explanation is that a 1-2% change in the percentage error with border corresponds to a very small change in ps, which falls within the accuracy limit of SPICE.

6 Conclusions

We addressed the problem of computing the timing uncertainty of mesh-based clock architectures in the presence of parameter variations. We believe ours is the first work to address this problem. First, we showed that clock meshes are effective in reducing timing uncertainty, with finer meshes being more effective than coarser meshes. We developed an efficient and accurate solution based on the sliding window scheme (SWS), which was proposed recently for computing clock latency in mesh-based architectures. However, there are significant differences in the behavior of the SWS-based latency and uncertainty schemes, e.g., the optimum border length. We applied our solution to pure mesh and (mesh + global-tree) architectures. For (mesh + global-tree), we showed that the decoupled methodology must take into account common path correlations in the tree. By doing so, this methodology yielded a maximum error of 1-3ps in uncertainty values with respect to the golden MCS-based values on the monolithic complete clock network model, which is at most 0.5% of the nominal clock latency (around 600ps). Since our methodology is based on SWS, it is capable of analyzing the uncertainty of large meshes and design instances, and is easily amenable to distributed or grid computing.

Future work is in the following directions. 1) Running several hundred MCSs on a large design and fine mesh can be time consuming if hundreds of compute servers are not available. We plan to work on making our methodology faster. 2) For the complete clock network, the decoupling method should handle correlations between consecutive stages on a path. 3) In this work, we modeled variation sources for each wire and buffer independently.
However, supply voltage and temperature of components located close to each other are usually correlated. We will extend our model to handle these correlations.

References

[1] A. Agarwal, V. Zolotov, and D. T. Blaauw. Statistical Clock Skew Analysis Considering Intra-die Process Variations. In IEEE Trans. on CAD, August.
[2] D. W. Bailey and B. J. Benscheneider. Clocking Design and Analysis for a 600-MHz Alpha Microprocessor. In IEEE JSSC, Vol. 33, No. 11, November.
[3] K. A. Bowman, S. G. Duvall, and J. D. Meindl. Impact of Die-to-Die and Within-Die Parameter Fluctuations on the Maximum Clock Frequency Distribution for Gigascale Integration. In IEEE JSSC, February.
[4] H. Chen, C. Yeh, G. Wilke, S. Reddy, H. Nguyen, W. Walker, and R. Murgai. A Sliding Window Scheme for Accurate Clock Mesh Analysis. In ICCAD, November.
[5] P. J. Camporese et al. X-Y Grid Tree Tuning Method. U.S. Patent No. 6,205,571 B1, March.
[6] G. Northrop et al. A 600-MHz G5 S/390 Microprocessor. In ISSCC Tech. Dig., pages 88-89, February.
[7] R. Hitchcock. Timing Verification and the Timing Analysis Program. In DAC, June.
[8] Y. Liu, S. R. Nassif, L. T. Pillegi, and A. J. Strojwas. Impact of Interconnect Variations on the Clock Skew of a Gigahertz Microprocessor. In DAC, June.
[9] M. Berkelaar. Statistical Delay Calculation, A Linear Time Method. In TAU, pages 15-24, December.
[10] M. Orshansky, L. Milor, P. Chen, K. Keutzer, and C. Hu. Impact of Systematic Spatial Intra-chip Gate Length Variability on Performance of High-speed Digital Circuits. In ICCAD, pages 62-67, November.
[11] L. T. Pillage and R. A. Rohrer. Asymptotic Waveform Evaluation for Timing Analysis. In IEEE Transactions on Computer-Aided Design, April.
[12] P. J. Restle et al. A Clock Distribution Network for Microprocessor. In IEEE JSSC, Vol. 36, No. 5, May.
[13] R. Heald et al. Implementation of a 3rd-Generation SPARC V9 64b Microprocessor. In ISSCC Dig. Tech. Papers, February.
[14] S. B. Samaan. The Impact of Device Parameter Variations on the Frequency and Performance of VLSI Chips. In ICCAD, November.
[15] C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, and S. Narayan. First-Order Incremental Block-Based Statistical Timing Analysis. In DAC, June.
[16] G. Wilke and R. Murgai. Accuracy of Interconnect Pi Models. Fujitsu Laboratories of America Internal Document, August.
[17] S. Zanella, A. Nardi, A. Neviani, M. Quarantelli, S. Saxena, and C. Guardiani. Analysis of the Impact of Process Variations on Clock Skew. In IEEE Trans. on Semiconductor Manufacturing, November.
[18] P. S. Zuchowski, P. A. Habitz, J. D. Hayes, and J. H. Oppold. Process and Environmental Variation Impacts on ASIC Timing. In ICCAD, November 2004.


More information

LPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability. Subhasis Bhattacharjee and Dhiraj K. Pradhan, Fellow, IEEE

LPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability. Subhasis Bhattacharjee and Dhiraj K. Pradhan, Fellow, IEEE IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 5, MAY 2004 637 LPRAM: A Novel Low-Power High-Performance RAM Design With Testability and Scalability Subhasis

More information

2015 Paper E2.1: Digital Electronics II

2015 Paper E2.1: Digital Electronics II s 2015 Paper E2.1: Digital Electronics II Answer ALL questions. There are THREE questions on the paper. Question ONE counts for 40% of the marks, other questions 30% Time allowed: 2 hours (Not to be removed

More information

Empirical Comparisons of Fast Methods

Empirical Comparisons of Fast Methods Empirical Comparisons of Fast Methods Dustin Lang and Mike Klaas {dalang, klaas}@cs.ubc.ca University of British Columbia December 17, 2004 Fast N-Body Learning - Empirical Comparisons p. 1 Sum Kernel

More information

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),

More information

Frequency and Voltage Scaling Design. Ruixing Yang

Frequency and Voltage Scaling Design. Ruixing Yang Frequency and Voltage Scaling Design Ruixing Yang 04.12.2008 Outline Dynamic Power and Energy Voltage Scaling Approaches Dynamic Voltage and Frequency Scaling (DVFS) CPU subsystem issues Adaptive Voltages

More information

Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor

Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998 707 Issue Logic for a 600-MHz Out-of-Order Execution Microprocessor James A. Farrell and Timothy C. Fischer Abstract The logic and circuits

More information

Crosstalk Aware Static Timing Analysis Environment

Crosstalk Aware Static Timing Analysis Environment Crosstalk Aware Static Timing Analysis Environment B. Franzini, C. Forzan STMicroelectronics, v. C. Olivetti, 2 20041 Agrate B. (MI), ITALY bruno.franzini@st.com, cristiano.forzan@st.com ABSTRACT Signals

More information

Determination of Worst-case Crosstalk Noise for Non-Switching Victims in GHz+ Interconnects

Determination of Worst-case Crosstalk Noise for Non-Switching Victims in GHz+ Interconnects Determination of Worst-case Crosstalk Noise for Non-Switching Victims in GHz+ Interconnects Jun Chen ECE Department University of Wisconsin, Madison junc@cae.wisc.edu Lei He EE Department University of

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 13

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2017 Lecture 13 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2017 Lecture 13 COMPUTER MEMORY So far, have viewed computer memory in a very simple way Two memory areas in our computer: The register file Small number

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 10: Three-Dimensional (3D) Integration

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 10: Three-Dimensional (3D) Integration 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 10: Three-Dimensional (3D) Integration Instructor: Ron Dreslinski Winter 2016 University of Michigan 1 1 1 Announcements

More information

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Preeti Ranjan Panda and Nikil D. Dutt Department of Information and Computer Science University of California, Irvine, CA 92697-3425,

More information

Physical Implementation

Physical Implementation CS250 VLSI Systems Design Fall 2009 John Wawrzynek, Krste Asanovic, with John Lazzaro Physical Implementation Outline Standard cell back-end place and route tools make layout mostly automatic. However,

More information

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache

A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache Stefan Rusu Intel Corporation Santa Clara, CA Intel and the Intel logo are registered trademarks of Intel Corporation or its subsidiaries in

More information

EPSON. Technical Note. Oscillator Jitter and How to Measure It. Introduction. Jitter. Cycle-Cycle Jitter

EPSON. Technical Note. Oscillator Jitter and How to Measure It. Introduction. Jitter. Cycle-Cycle Jitter 1960 E. Grand Ave., 2 nd Floor El Segundo, California 90245 Phone: 310.955.5300 Fax: 310.955.5400 Technical Note Oscillator Jitter and How to Measure It Introduction Jitter is a term that is becoming widely

More information

McGill University - Faculty of Engineering Department of Electrical and Computer Engineering

McGill University - Faculty of Engineering Department of Electrical and Computer Engineering McGill University - Faculty of Engineering Department of Electrical and Computer Engineering ECSE 494 Telecommunication Networks Lab Prof. M. Coates Winter 2003 Experiment 5: LAN Operation, Multiple Access

More information

Pentium Processor Compatible Clock Synthesizer/Driver for ALI Aladdin Chipset

Pentium Processor Compatible Clock Synthesizer/Driver for ALI Aladdin Chipset 1CY 225 7 fax id: 3517 Features Multiple clock outputs to meet requirements of ALI Aladdin chipset Six CPU clocks @ 66.66 MHz, 60 MHz, and 50 MHz, pin selectable Six PCI clocks (CPUCLK/2) Two Ref. clocks

More information

Quantifying Robustness Metrics in Parameterized Static Timing Analysis

Quantifying Robustness Metrics in Parameterized Static Timing Analysis Quantifying Robustness Metrics in Parameterized Static Timing Analysis Khaled R. Heloue ECE Department University of Toronto Toronto, Ontario, Canada khaled@eecg.utoronto.ca Chandramouli V. Kashyap Strategic

More information

MEMORIES. Memories. EEC 116, B. Baas 3

MEMORIES. Memories. EEC 116, B. Baas 3 MEMORIES Memories VLSI memories can be classified as belonging to one of two major categories: Individual registers, single bit, or foreground memories Clocked: Transparent latches and Flip-flops Unclocked:

More information

DECOUPLING LOGIC BASED SRAM DESIGN FOR POWER REDUCTION IN FUTURE MEMORIES

DECOUPLING LOGIC BASED SRAM DESIGN FOR POWER REDUCTION IN FUTURE MEMORIES DECOUPLING LOGIC BASED SRAM DESIGN FOR POWER REDUCTION IN FUTURE MEMORIES M. PREMKUMAR 1, CH. JAYA PRAKASH 2 1 M.Tech VLSI Design, 2 M. Tech, Assistant Professor, Sir C.R.REDDY College of Engineering,

More information