Clock Skew Optimization Considering Complicated Power Modes


Chiao-Ling Lung 1,2, Zi-Yi Zeng 1, Chung-Han Chou 1, Shih-Chieh Chang 1
1 National Tsing-Hua University, HsinChu, Taiwan
2 Industrial Technology Research Institute, HsinChu, Taiwan
cllung0608@gmail.com, zen.ziyi@gmail.com, u942518@oz.nthu.edu.tw, scchang@cs.nthu.edu.tw

Abstract

To conserve energy, designs that use multiple power modes have been widely adopted. However, when a design has many different power modes, clock tree optimization (CTO) becomes very difficult. In this paper, we propose a two-level power-mode-aware CTO methodology. Across all power modes, the chip-level CTO globally reduces clock skew among modules, whereas the module-level CTO reduces clock skew within a single module. Our experimental results show that the power-mode-aware CTO can achieve significant improvement in the worst-case condition with only a minor penalty in area.

Keywords: power modes, clock tree, clock skew

I. INTRODUCTION

Due to technology scaling, the ITRS roadmap 2008 Update [24] predicts that, by 2015, high performance integrated circuits will work with on-chip local clock frequencies up to 8.5 GHz. However, in synchronous design, the performance is limited not only by the speed capability of devices but also by the synchronization ability of data signals. The clock skew, the maximum difference among the clock arrival times of sequential elements, imposes important constraints on the system performance.

Figure 1. Industrial example.

Power Mode    MPU     DSP1    DSP2
Full Speed    1.2V    1.2V    1.2V
Active1       1.2V    1.2V    1.0V
Active2       1.2V    1.0V    1.2V
Suspend       1.0V    1.0V    1.0V
Inactive      1.0V    0V      0V

Many previous works have concentrated on the problem of clock skew minimization. In [2] [3], clock trees are constructed by zero- or bounded-skew routing. To achieve further skew control, buffer and wire-sizing techniques have been proposed by [4] [6] [18-19].
To account for process variation, a statistical timing model is used for clock tree optimization [9] [12]. Some researchers [8] [10] use intentional useful-skew scheduling to improve system performance. Special structures such as hybrid structures and clock meshes have been studied in [7] [14-15] [17]. To lower power consumption, [13] suggests a low-power clock scheme that distributes the clock signal at a lower voltage and translates it to a higher voltage at the utilization points. A type-matching method is proposed by [5] to consider the impact of clock gating. Chip-level clock tree synthesis is presented by [16] to construct a clock tree for an SoC. A novel clock distribution methodology is presented by [11] to perform dynamic de-skewing during the operation of the chip.

Despite many studies on clock tree optimization, clock skew minimization is still difficult to achieve in advanced power-saving methodologies where many different power modes are used. Take the industrial case shown in Figure 1 as an example. The design has over 40 modules, some of which may operate at 1.2V or 1.0V, or may shut down completely. The design has a total of 64 power modes to fit various operating requirements. Some power modes are shown in Figure 1. Since the operating voltage has great influence on the delay of a clock buffer, the clock arrival times of FF sinks in a module may vary greatly when the module operates at a different voltage. As a result, it is extremely difficult to implement a single clock network that satisfies the clock skew constraints in all possible power modes. The difficulty of generating a single clock tree that satisfies clock skew constraints in multiple power modes has been pointed out in several industrial publications [21-23]. One way to resolve the clock skew problem is to adopt the asynchronous design style.
However, an asynchronous design is difficult to verify and requires an additional synchronizer circuit to handle data synchronization. The previous work [21] uses a delay-locked loop (DLL) to synchronize the clock between power domains. As far as we know, no previous work has proposed a solution to the problem of clock skew minimization under complicated power modes in a synchronous design style.

In this paper, we propose a Power-Mode-Aware CTO framework to resolve the skew issue in complicated power modes. Our framework consists of two major subcomponents: the chip-level CTO and the module-level CTO. The chip-level CTO attempts to reduce the global clock skew in a design among all possible power modes. In contrast, the module-level CTO tries to reduce the local clock skew within a module among all different operating voltages.

In the chip-level CTO, we propose novel power-mode-aware buffers (PMABs), which are inserted into a chip-level clock tree to balance the clock skew among various modules of differing voltage modes. The PMAB is a super buffer with mode-selection capability. The delays of a PMAB can be adjusted under various mode conditions. In this paper, we present two different ways of implementing a PMAB to reduce inter-module clock skew. In the module-level CTO, we follow the popular approach [7] [19] of using linear programming to reduce the clock skew. We have used an industrial 65nm technology library to perform a set of experiments, and the results are very promising.

The major contributions of this paper are summarized as follows.
- We propose to resolve the clock skew problem due to complicated power modes by using a PMAB, which has various propagation delays to be chosen by a voltage mode.
- To reduce the area penalty of a PMAB, we explore the flexibility of designing a PMAB.
- Our methodology can cope with the current design flow.

The rest of this paper is organized as follows. Section II introduces the chip-level CTO. Section III describes the module-level CTO. Then Section IV demonstrates how to implement our framework with a commercial design flow to achieve one-pass clock skew optimization. In Section V, we show experimental results on benchmark circuits. Section VI summarizes our findings to conclude the paper.

/DATE EDAA

II. CHIP-LEVEL CTO

Figure 2. An example of clock tree with PMABs.

This section describes the design of a chip-level CTO which inserts PMABs to balance clock skew among modules. An example of a clock tree with PMABs is given in Figure 2, where triangles stand for PMABs, solid lines represent clock signals and dotted lines are selection signals generated from the power mode controller. In this section, we first present a possible implementation of a PMAB design and then present important lemmas relating to a PMAB. After that, we propose a modified PMAB which has less area cost and better efficiency in clock latency than the original one.

A. PMAB Design

First, we would like to clarify the terms voltage mode and power mode. Throughout this paper, the term voltage mode describes different operating voltages for a module, whereas the term power mode describes different configurations of the operating voltages of modules. For example, in Figure 1, we have five power modes for the design: Full Speed, Active 1, Active 2, Suspend and Inactive. However, for all modules including MPU, DSP1 and DSP2, we have only two voltage modes, 1.2V and 1.0V.

Table I. An example design with four power modes. (Columns: Power Mode, then V, L and E for each of M1, M2 and M3. Rows: pm1, pm2, pm3, pm4. V: Voltage; L: Latest latency; E: Earliest latency. The numeric entries are missing from this transcription.)

We now describe the steps for designing a PMAB. In the first step, we analyze and record the global latest clock latency, called $L_{global}$, among all modules of possible voltage modes. Consider the example in Table I, where the design has three modules (M1, M2, M3), four power modes (pm1, pm2, pm3, pm4) and two voltage modes (1.2V and 1.0V). In power mode pm4, module M1 operates in voltage mode 1.0V with the latest clock latency of 14. Similarly, we have a latency of 9 for M2 operating in 1.2V and a latency of 13 for M3 in 1.0V. Among all modules in all voltage modes, the global latest clock latency is $L_{global} = 14$, which is the latest clock latency of module M1 operating in 1.0V. In addition, there is a clock skew of 7 between the latency of M1 in 1.0V and the latency of M2 in 1.2V. The clock skew of 7 is called the global clock skew of the design and is denoted as $Skew_{global}$.

Figure 3. An example of a PMAB and a clock tree with PMABs: (a) original clock tree, (b) alignment delay (columns: Voltage Mode, SELv, Alignment Delay), (c) clock tree with PMABs, (d) an example of a PMAB.

Next, for each voltage mode of a module, we calculate the delay needed to align its latest clock latency with $L_{global}$. The delay to align the clock latency with $L_{global}$ for a module m in a voltage mode v is called the alignment delay of module m in voltage mode v and is denoted as $\delta_{m,v}$. In the same example, the latest clock latency of module M2 in 1.2V is 9. To align with $L_{global}$ (14), we need a delay of 5 (= 14 - 9) so that the latest clock latency will be the same as $L_{global}$. Therefore, we say that the alignment delay of 1.2V for module M2 is $\delta_{M2,1.2} = 5$. As another example, the alignment delay of 1.0V for module M3 is $\delta_{M3,1.0} = 1$ (= 14 - 13).

For a module, we can calculate the alignment delays of all its voltage modes. Then, we design the PMAB of a module as a tunable delay element which uses the voltage mode as the select signal to choose the corresponding alignment delay. In the same example, the PMAB for module M3 has two voltage modes, 1.2V and 1.0V. The alignment delay of 1.2V is $\delta_{M3,1.2} = 3$ and the alignment delay of 1.0V is $\delta_{M3,1.0} = 1$. The PMAB of module M3 can be designed using a MUX which has the voltage mode as the select signal and two delay buffers with the delay of 3 for 1.2V and the delay of 1 for 1.0V, as shown in Figure 3(b) and Figure 3(d). After PMAB insertion, we can reduce $Skew_{global}$ from 7 to 4. Figure 3(a) shows the original clock tree and Figure 3(c) demonstrates a clock tree with PMABs.

B. Characteristics of a PMAB

With the insertion of the PMAB for a module, we can align the latest clock latency of the module in each voltage mode to $L_{global}$. As a result, after inserting PMABs, we have the important property that the latest clock latency of each module is equal to $L_{global}$ for any given voltage mode. We have the following lemmas.

Lemma 1: After inserting a PMAB, the clock latencies within a module vary at the same pace; in other words, the clock skew within a module does not change.

Informal proof: Since the sequential elements in a module belong to the same PMAB, no matter how much delay is padded by the PMAB, the clock latencies in the same module increase by the same quantity every time. Q.E.D.

Lemma 2: After inserting PMABs, we can obtain the optimal global clock skew of a design, and the optimal global clock skew equals the maximal local clock skew of all modules among voltage modes.

Informal proof: According to Lemma 1, the local clock skew of a module cannot be improved by a PMAB.
As a result, the best possible global clock skew that can be achieved is the largest local clock skew. Q.E.D.

The above lemmas state that the use of PMABs allows us to neglect the inter-module clock skew. Thus, we need only focus on the reduction of clock skew within a module. In the same example in Table I, among all local clock skews, the largest local clock skew is 4, when module M1 operates in 1.0V. In general, without PMABs, the global clock skew can be larger than the largest local clock skew of 4. However, Lemma 2 states that after inserting PMABs, the global clock skew is equal to the largest local clock skew of 4.

Table II. Symbol definitions (examples taken from Table I).

Symbol           Description                                                                      Example
$L_{global}$     The maximal latest clock latency among all modules of possible voltage modes     $L_{global} = 14$
$E_{global}$     The minimal earliest clock latency among all modules of possible voltage modes   $E_{global} = 7$
$Skew_{global}$  The difference between $L_{global}$ and $E_{global}$                             $Skew_{global} = 7$
$Skew_{local}$   The maximal local skew within a module                                           $Skew_{local} = 4$
$L_{local}$      The latest clock latency of the module with $Skew_{local}$                       $L_{local} = 14$
$E_{local}$      The earliest clock latency of the module with $Skew_{local}$                     $E_{local} = 10$

Figure 4. The alignment delays of the case listed in Table I: (a) original, (b) PMAB, (c) modified PMAB.

C. A Modified PMAB Design

The PMAB design described above tries to align the latest clock latencies of all modules to $L_{global}$. In this section, we show that despite the simplicity of a PMAB design, the restriction of aligning only to $L_{global}$ is unnecessary in certain power modes and may cause a large area penalty. We now present a modified PMAB design that relaxes this restriction while still maintaining the good properties of Lemmas 1 and 2. Before the discussion of a modified PMAB, we need new definitions of symbols.
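Before the new symbols are introduced, the basic PMAB of Sections II.A and II.B can be checked numerically. The sketch below is only an illustration: it computes the alignment delays against $L_{global}$ and confirms that, after padding, the global skew drops to the largest local skew as Lemma 2 states. Only the latencies quoted in the text are taken from the paper; every other table entry, and all function names, are assumptions introduced for the example.

```python
# Latency windows (earliest, latest) per module and voltage mode.
# Quoted in the text: M1@1.0V = (10, 14), M2@1.2V = (7, 9), M3@1.0V = (11, 13),
# and a latest latency of 11 for M3@1.2V (alignment delay 3).
# All remaining numbers are assumptions made only to complete the example.
LATENCY = {
    "M1": {"1.2V": (9, 11), "1.0V": (10, 14)},  # 1.2V window assumed
    "M2": {"1.2V": (7, 9), "1.0V": (10, 12)},   # 1.0V window assumed
    "M3": {"1.2V": (9, 11), "1.0V": (11, 13)},  # earliest latency at 1.2V assumed
}


def windows(latency):
    """Flatten the table into a list of (earliest, latest) windows."""
    return [w for modes in latency.values() for w in modes.values()]


def global_skew(latency):
    """Skew_global = L_global - E_global over all modules and voltage modes."""
    ws = windows(latency)
    return max(late for _, late in ws) - min(early for early, _ in ws)


def max_local_skew(latency):
    """Skew_local = largest (latest - earliest) within any single window."""
    return max(late - early for early, late in windows(latency))


def alignment_delays(latency):
    """delta[m][v] = L_global - latest latency of module m in voltage mode v."""
    l_global = max(late for _, late in windows(latency))
    return {
        m: {v: l_global - late for v, (_, late) in modes.items()}
        for m, modes in latency.items()
    }


def apply_delays(latency, delays):
    """Shift every window by its PMAB delay (Lemma 1: window width is unchanged)."""
    return {
        m: {v: (early + delays[m][v], late + delays[m][v])
            for v, (early, late) in modes.items()}
        for m, modes in latency.items()
    }


delays = alignment_delays(LATENCY)
aligned = apply_delays(LATENCY, delays)
print(delays["M2"]["1.2V"], delays["M3"]["1.0V"])  # 5 1
print(global_skew(LATENCY), global_skew(aligned))  # 7 4
```

With these numbers the padded design reaches a global skew of 4, which is also the value returned by `max_local_skew`; changing the assumed entries changes the individual delays but not that relationship.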
First, among all modules in all voltage modes, we say that the maximal local clock skew is $Skew_{local}$ and its corresponding earliest and latest clock latencies are $E_{local}$ and $L_{local}$, i.e., $Skew_{local} = L_{local} - E_{local}$. Then, as with $L_{global}$, we define a new symbol $E_{global}$, which is the global earliest clock latency among all modules of possible voltage modes. We summarize all symbols in Table II. For example, in Table I, the largest local clock skew within a module, $Skew_{local}$, is 4 when M1 operates in 1.0V with $E_{local} = 10$ and $L_{local} = 14$. In addition, $Skew_{global}$ is 7 when M2 operates in 1.2V with $E_{global} = 7$ and M1 operates in 1.0V with $L_{global} = 14$.

According to Lemma 2, after PMAB insertion, we have $Skew_{global} = Skew_{local}$, $E_{global} = E_{local}$, and $L_{global} = L_{local}$. As a result, we need only make sure all other clock latencies are located between $E_{global}$ and $L_{global}$. Based on this observation, we have the flexibility of assigning the delays of a PMAB anywhere within this range and still achieve the optimal clock skew. With the flexible delay assignment, we can reduce the area for designing a PMAB.

Figure 4 shows the clock latency and skew information for the example shown in Table I. The solid bar represents a range from the earliest clock latency to the latest clock latency, and the dashed bar represents the alignment delay for each module in each voltage mode. The double-headed arrow represents the global clock skew, and the dashed arrow represents the skew improvement. Figure 4(a) illustrates the original clock latency and skew information before PMAB insertion, and Figure 4(b) shows the result after PMAB insertion, where all latest clock latencies have been aligned to $L_{global}$ of 14. A modified PMAB, which will be described later, may have the clock latencies shown in Figure 4(c). All of them are within the range but do not align to the latest one. Take module M3 in 1.0V as an example in Figure 4(a): the latest clock latency of 13 is less than $L_{local}$ of 14, and the earliest clock latency of 11 is greater than $E_{local}$ of 10. For a modified PMAB, we can assign $\delta_{M3,1.0} = 0$ and keep the clock skew unchanged. A delay $\delta_{M3,1.0}$ of 0 means that there is no need for a delay buffer. On the other hand, the delay $\delta_{M1,1.2}$ can be anywhere within the range from 1 to 3 without affecting the optimal clock skew.

We now show that under different conditions among $L_{global}$, $L_{local}$, $E_{global}$ and $E_{local}$, we need to use different formulations to calculate the flexibility of alignment delays. We have exhausted all possible conditions and categorize them into four types. The mathematical expressions of the four types are as follows.

Type 1. $L_{local} = L_{global}$ and $E_{local} = E_{global}$
Type 2. $L_{local} < L_{global}$ and $E_{local} > E_{global}$
Type 3. $L_{local} < L_{global}$ and $E_{local} = E_{global}$
Type 4. $L_{local} = L_{global}$ and $E_{local} > E_{global}$

    delay_assignment {
        case (Type = 1)
            do nothing
        case (Type = 2 or 3) {
            δ_local = L_global - L_local
            E_local = E_local + δ_local
            foreach (module m)
                foreach (operating voltage v)
                    if (E_m,v < E_local) then
                        δ_m,v = E_local - E_m,v
        }
        case (Type = 4) {
            foreach (module m)
                foreach (operating voltage v)
                    if (E_m,v < E_local) then
                        δ_m,v = E_local - E_m,v
        }
    }

Figure 5. Pseudo code of delay assignment.

The procedures to calculate alignment delays for each type are described in Figure 5, and the complexity is $O(kN)$, where k is the number of voltage modes and N is the number of modules.

III. MODULE-LEVEL CTO

The purpose of the module-level CTO is to build a clock tree which has the smallest skew possible within a module. In our framework, we utilize a linear programming methodology similar to [7] [19], which is commonly used for clock skew minimization. We derive an LP formulation whose goal is to minimize the maximum clock skew within a module. Our LP formulation consists of two categories of LP constraints: clock path constraints and clock skew constraints. The clock path constraints describe the delay of a clock path by summing up the delays of buffers and wires on the clock path. The clock skew constraints calculate the maximum clock skew.

Inputs:
1. An initial buffered clock tree topology T
2. $d_i$, the delay of $b_i$, $i \in \{1,\dots,N\}$
3. $w_i$, the delay of the wire between $b_i$ and its parent, $i \in \{1,\dots,N\}$
4. $P_j$, the set of buffers from the clock source to $s_j$, $j \in \{1,\dots,M\}$
Decision variables: $\Delta d_i$, $i \in \{1,\dots,N\}$
Objective function: minimize $skew$
Subject to:
  clock path constraints: $a_j = \sum_{i \in P_j} (w_i + d_i + \Delta d_i)$, $j \in \{1,\dots,M\}$
  clock skew constraints: $a_{max} \ge a_j$, $a_{min} \le a_j$, $j \in \{1,\dots,M\}$; $skew = a_{max} - a_{min}$
Outputs:
1. The optimal latency $at_j$ of $s_j$, $j \in \{1,\dots,M\}$
2. The optimal delay $dt_i$ of $b_i$, $i \in \{1,\dots,N\}$

Figure 6. LP formulation.

Given an initial clock tree T with N buffers and M sinks, the LP formulation can be stated as in Figure 6, where $b_i$ and $s_j$ denote the i-th buffer and the j-th sink on the clock tree; $d_i$ and $dt_i$ are the delay and target delay of $b_i$; $a_j$ and $at_j$ are the clock latency and target clock latency of $s_j$; $a_{max}$ and $a_{min}$ are the maximum and minimum clock latency; and $skew$ is the maximum skew.

Although the LP formulation can provide an optimal clock skew, an exact solution requires a rich set of delay buffers with various delay values. However, only a limited range of buffer sizes is available in a library. Traditionally, a mapping stage has been required to map a delay solution from an LP to the buffer with the closest delay. We found that the optimal delay for those buffers that are not on critical paths can be stated as a range that still achieves the optimal clock skew. This observation provides more flexibility when mapping the LP's solution to library cells.
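The constraints of Figure 6 can be exercised on a toy tree. The sketch below does not solve the LP; it only evaluates the clock path constraints and the clock skew constraints for a candidate $\Delta d$, and illustrates the remark that a buffer off the critical paths can take a range of delays without changing the skew. The tree, its delay values and all function names are hypothetical, and the range helper assumes the padded leaf buffer drives only the sink in question.

```python
def arrival_times(paths, wire_delay, buf_delay, delta):
    """Clock path constraint: a_j = sum over i in P_j of (w_i + d_i + delta_d_i)."""
    return {
        sink: sum(wire_delay[b] + buf_delay[b] + delta.get(b, 0) for b in path)
        for sink, path in paths.items()
    }


def clock_skew(arrivals):
    """Clock skew constraint: skew = a_max - a_min."""
    return max(arrivals.values()) - min(arrivals.values())


def extra_delay_range(arrivals, sink):
    """Extra delay on the sink's own leaf buffer that keeps a_j inside [a_min, a_max].

    A negative lower bound means the buffer could also be made faster.
    """
    a_min, a_max = min(arrivals.values()), max(arrivals.values())
    return (a_min - arrivals[sink], a_max - arrivals[sink])


# Hypothetical tree: root buffer b1 drives leaf buffers b2, b3, b4, one sink each.
PATHS = {"s1": ["b1", "b2"], "s2": ["b1", "b3"], "s3": ["b1", "b4"]}
WIRE = {"b1": 0, "b2": 1, "b3": 2, "b4": 1}   # w_i, assumed values
BUF = {"b1": 3, "b2": 2, "b3": 4, "b4": 3}    # d_i, assumed values

base = arrival_times(PATHS, WIRE, BUF, {})
lo, hi = extra_delay_range(base, "s3")
# Pad b4 by the largest value in its range; the skew is unchanged.
padded = arrival_times(PATHS, WIRE, BUF, {"b4": hi})
print(base, clock_skew(base))
print((lo, hi), clock_skew(padded))
```

In this example sink s3 is neither the earliest nor the latest sink, so any extra delay between `lo` and `hi` on b4 leaves the skew where it was. That is the kind of slack a mapping stage could use when the exact LP delay is not available in the library.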

IV. OUR FRAMEWORK

To enable automation, our framework works with a commercial design flow. We use PrimeTime as the static timing analysis engine. In addition, since the interconnection delay becomes an increasingly large component of the total delay in advanced technologies, the interconnection delay should also be considered. The Standard Parasitic Exchange Format (SPEF) [25], the widely adopted format which records wire resistance and capacitance, is used in our framework to take interconnection delay into account.

Figure 7. Experimental Flow.

Our experimental flow is shown in Figure 7. First, the clock trees of all modules are generated by the tool SOC Encounter with a level-shifter inserted. Second, we extract the clock tree structure and the interconnect information generated by SOC Encounter, where the interconnect information is recorded in SPEF with wire resistance and capacitance.

The module-level CTO is performed as follows. We use PrimeTime to extract clock latency and skew information, and to generate linear programming constraints. The linear programming constraints are solved by lpsolve_5.5. Our delay-mapping algorithm uses the result of the LP to generate the final clock tree for each module.

After finishing the module-level CTO, we then perform the chip-level CTO. We insert a PMAB for each module. Utilizing the clock latency information, we determine the alignment delays. During the construction of a PMAB, an alignment delay is formed by a buffer chain in which the buffers have been selected from industrial technology libraries. Finally, we generate the new design with PMABs inserted, and the report of clock information.

V. EXPERIMENTAL RESULTS

We have implemented our approach as shown in Figure 7, and applied the approach to a large industrial design with more than 56 power modes. To test more designs, we also created a set of new designs consisting of two or three modules instantiated from ISCAS89 benchmark circuits.
Each new circuit is assumed to have two voltage modes, 1.32V and 0.9V. The initial clock tree given to our approach is constructed as follows. We first use Design Compiler to map all circuits to an industrial 65nm technology library and use SOC Encounter to perform placement, clock tree synthesis and routing. After that, we obtain the initial clock tree by running SOC Encounter under the assumption that all modules operate at the high voltage, because timing is normally critical in this power mode. We ran all experiments on a Linux workstation with a 2.8 GHz CPU and 4 GB of memory.

The experimental results are shown in Table III. Columns one to three show the name of the circuit, the total number of sequential elements (FF), and the number of power modes (PM) in a circuit, respectively. Columns four to seven show the worst clock skew over all power modes for SOC Encounter (SOCE), PMAB and modified PMAB (mpmab), and the skew improvement of mpmab compared with SOCE (in %), respectively. Columns eight to eleven show the average clock skew over all power modes for SOCE, PMAB and mpmab, and the skew improvement of mpmab compared with SOCE (in %), respectively. Columns twelve to fifteen show the worst clock latency of SOCE, PMAB and mpmab, and the latency overhead of mpmab compared with SOCE, respectively. Columns sixteen to eighteen show the area overhead of PMAB and mpmab, and the area overhead improvement of mpmab compared with PMAB. Finally, column nineteen shows the runtime of mpmab.

For the case of IND1 in Table III, the worst clock skew of SOC Encounter is ps and the average clock skew is ps. After applying modified PMAB, the worst clock skew becomes 163.7ps and the average clock skew is ps. Our approach achieves a 66.94% improvement in the worst clock skew and a 65.75% improvement in the average clock skew. In this case, the clock latency penalty due to the PMAB is 36.78ps and the area overhead of the PMAB is only 0.05% of the total cell area, which does not include the routing overhead.
In addition, compared with PMAB, modified PMAB reduces the area overhead by 16.67% while keeping the skew unchanged. The clock latency distributions after applying PMAB and after applying modified PMAB are shown in Figure 8.

On average, both PMAB and modified PMAB improve the worst clock skew by 74%, whereas the average worst latency penalty of modified PMAB is 39.26ps. Furthermore, the average area overheads of PMAB and modified PMAB are 0.16% and 0.12%. Although the worst clock skew and worst clock latency of modified PMAB are as good as those of PMAB, the average latency overhead and area overhead of modified PMAB are lower. Our experimental results show that, compared with PMAB, modified PMAB improves the average latency overhead by 16.41% and the area overhead by 25.61% on average. Furthermore, there are minor differences between a PMAB and the corresponding modified PMAB in the columns regarding the worst clock skew, the average clock skew and the worst latency. These minor differences are caused by mapping inaccuracy when delay buffers are used to implement the required delays.

Table III. Experimental results. (Only the columns whose values survive in this transcription are shown; the #FF entries are truncated, and the #PM, clock skew and latency values are missing.)

Circuit   #FF   Area overhead (mpmab)   Area overhead improvement vs. PMAB   Runtime (s)
IND1      18,   0.05%                   16.67%                               1934
case1     3,    0.10%                   37.50%                               33
case2     2,    0.09%                   25.00%                               27
case3     3,    0.08%                   27.27%                               31
case4     3,    0.09%                   25.00%                               26
case5     1,    0.28%                   22.22%                               7
Avg             0.12%                   25.61%                               343

The Avg row also reports an average clock skew improvement of 63.92% for mpmab compared with SOCE.

Figure 8. The experimental result of IND1: (a) original, (b) PMAB, (c) modified PMAB.

VI. CONCLUSIONS

In this paper, we have proposed efficient ways to optimize clock skew considering the complicated power modes in an SoC design. Our methodology consists of a chip-level CTO and a module-level CTO. We also present our flow, which adapts to a current design flow. Our experiments show that both the PMAB and modified PMAB approaches dramatically improve the clock skew while incurring very little additional area overhead for designs with complicated power modes. Compared with PMAB, the modified PMAB approach uses less area and latency, while still maintaining the quality of results.

REFERENCES

[1] P. Ampadu, "Ultra-low voltage VLSI: are we there yet?," in Proc. of ISCAS, 2006.
[2] K. D. Boese and A. B. Kahng, "Zero-skew clock routing trees with minimum wirelength," in Proc. of IEEE 5th Int. ASIC Conf.
[3] T. H. Chao, Y. C. Hsu, J. M. Ho, K. D. Boese and A. B. Kahng, "Zero skew clock routing with minimum wire length," in IEEE Trans. on Circuits Systems, vol. 39.
[4] C. C. N. Chu and D. F. Wong, "An efficient and optimal algorithm for simultaneous buffer and wire sizing," in IEEE Trans. on Computer-Aided Design, vol. 18, Sept.
[5] C. M. Chang, S. H. Huang, Y. K. Ho, J. Z. Lin, H. P. Wang and Y. S. Lu, "Type-matching clock tree for zero skew clock gating," in Proc. of DAC, 2008.
[6] J. Cong and K. S. Leung, "Optimal wiresizing under the distributed Elmore delay model," in IEEE Trans. on CAD, vol. 14, Mar.
[7] M. P. Desai, R. Cvijetic, and J. Jensen, "Sizing of clock distribution networks for high performance CPU chips," in Proc. of DAC.
[8] E. G. Friedman, "Clock distribution networks in synchronous digital integrated circuits," in Proc. IEEE, vol. 89, May.
[9] M. Hashimoto, T. Yamamoto, and H. Onodera, "Statistical analysis of clock skew variation in H-tree structure," in Proc. of ISQED.
[10] J. L. Neves and E. G. Friedman, "Optimal clock skew scheduling tolerant to process variations," in Proc. of DAC, June 1996.
[11] P. Mahoney, E. Fetzer, B. Doyle and S. Naffziger, "Clock Distribution on a Dual-Core, Multi-Threaded Itanium-Family Processor," in IEEE ISSCC.
[12] U. Padmanabhan, Janet M. Wang, J. Hu, "Statistical clock tree routing for robustness to process variations," in Proc. of ISPD.
[13] J. Pangjun, S. S. Sapatnekar, "Low-power clock distribution using multiple voltages and reduced swings," in IEEE Trans. on VLSI, vol. 10, Jun.
[14] S. Pullela, N. Menezes and L. T. Pillage, "Reliable non-zero skew clock tree using wire width optimization," in Proc. of DAC.
[15] A. Rajaram, J. Hu, R. Mahapatra, "Reducing clock skew variability via cross links," in Proc. of DAC, June 2004.
[16] A. Rajaram and D. Z. Pan, "Robust chip-level clock tree synthesis for SOC designs," in Proc. of DAC, 2008.
[17] H. Su and S. S. Sapatnekar, "Hybrid structured clock network construction," in Proc. of ICCAD, 2001.
[18] J. L. Tsai, T. H. Chen, and C. C. Chen, "Zero skew clock-tree optimization with buffer insertion/sizing and wire sizing," in IEEE Trans. on CAD, vol. 23, April.
[19] K. Wang, Y. Ran, and M. Marek-Sadowska, "General skew constrained clock network sizing based on sequential linear programming," in IEEE Trans. on CAD, vol. 24, May.
[20] Q. Zhu and W. W. M. Dai, "High-speed clock network sizing optimization based on distributed RC and lossy RLC interconnect models," in IEEE Trans. on CAD, vol. 15, Sep.
[21] pdf
[22] "A practical guide to low-power design," Power Forward Initiative (PFI).
[23] pdf
[24] International Technology Roadmap for Semiconductors (ITRS), 2007 Edition.
[25] IEEE 1481 Standard for Integrated Circuit (IC) Delay and Power Calculation System.


More information

Fault-Tolerant 3D Clock Network

Fault-Tolerant 3D Clock Network Fault-Tolerant Clock Network Chiao-Ling Lung 1,2, Yu-Shih Su 2, Shih-Hsiu Huang 1, Yiyu Shi 3, and Shih-Chieh Chang 1 1 Department of Computer Science National Tsing Hua University HsinChu 30013, Taiwan

More information

Process-Induced Skew Variation for Scaled 2-D and 3-D ICs

Process-Induced Skew Variation for Scaled 2-D and 3-D ICs Process-Induced Skew Variation for Scaled 2-D and 3-D ICs Hu Xu, Vasilis F. Pavlidis, and Giovanni De Micheli LSI-EPFL July 26, 2010 SLIP 2010, Anaheim, USA Presentation Outline 2-D and 3-D Clock Distribution

More information

Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis

Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis 2013 IEEE Computer Society Annual Symposium on VLSI Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis Xin Li, Wulong Liu, Haixiao Du, Yu Wang, Yuchun Ma, Huazhong Yang Tsinghua National Laboratory

More information

Cluster-based approach eases clock tree synthesis

Cluster-based approach eases clock tree synthesis Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network

More information

Clock Gating Optimization with Delay-Matching

Clock Gating Optimization with Delay-Matching Clock Gating Optimization with Delay-Matching Shih-Jung Hsu Computer Science and Engineering Yuan Ze University Chung-Li, Taiwan Rung-Bin Lin Computer Science and Engineering Yuan Ze University Chung-Li,

More information

[14] M. A. B. Jackson, A. Srinivasan and E. S. Kuh, Clock routing for high-performance ICs, 27th ACM

[14] M. A. B. Jackson, A. Srinivasan and E. S. Kuh, Clock routing for high-performance ICs, 27th ACM Journal of High Speed Electronics and Systems, pp65-81, 1996. [14] M. A. B. Jackson, A. Srinivasan and E. S. Kuh, Clock routing for high-performance ICs, 27th ACM IEEE Design AUtomation Conference, pp.573-579,

More information

Parallel-computing approach for FFT implementation on digital signal processor (DSP)

Parallel-computing approach for FFT implementation on digital signal processor (DSP) Parallel-computing approach for FFT implementation on digital signal processor (DSP) Yi-Pin Hsu and Shin-Yu Lin Abstract An efficient parallel form in digital signal processor can improve the algorithm

More information

A Novel Performance-Driven Topology Design Algorithm

A Novel Performance-Driven Topology Design Algorithm A Novel Performance-Driven Topology Design Algorithm Min Pan, Chris Chu Priyadarshan Patra Electrical and Computer Engineering Dept. Intel Corporation Iowa State University, Ames, IA 50011 Hillsboro, OR

More information

Floorplan considering interconnection between different clock domains

Floorplan considering interconnection between different clock domains Proceedings of the 11th WSEAS International Conference on CIRCUITS, Agios Nikolaos, Crete Island, Greece, July 23-25, 2007 115 Floorplan considering interconnection between different clock domains Linkai

More information

On GPU Bus Power Reduction with 3D IC Technologies

On GPU Bus Power Reduction with 3D IC Technologies On GPU Bus Power Reduction with 3D Technologies Young-Joon Lee and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, Georgia, USA yjlee@gatech.edu, limsk@ece.gatech.edu Abstract The

More information

Multi-Corner Multi-Voltage Domain Clock Mesh Design

Multi-Corner Multi-Voltage Domain Clock Mesh Design Multi-Corner Multi-Voltage Domain Clock Mesh Design Can Sitik Electrical and Computer Engineering Drexel University Philadelphia, PA, 19104 USA E-mail: as3577@drexel.edu Baris Taskin Electrical and Computer

More information

Crosstalk Noise Optimization by Post-Layout Transistor Sizing

Crosstalk Noise Optimization by Post-Layout Transistor Sizing Crosstalk Noise Optimization by Post-Layout Transistor Sizing Masanori Hashimoto hasimoto@i.kyoto-u.ac.jp Masao Takahashi takahasi@vlsi.kuee.kyotou.ac.jp Hidetoshi Onodera onodera@i.kyoto-u.ac.jp ABSTRACT

More information

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 8 (2013), pp. 907-912 Research India Publications http://www.ripublication.com/aeee.htm Circuit Model for Interconnect Crosstalk

More information

Simultaneous OPC- and CMP-Aware Routing Based on Accurate Closed-Form Modeling

Simultaneous OPC- and CMP-Aware Routing Based on Accurate Closed-Form Modeling Simultaneous OPC- and CMP-Aware Routing Based on Accurate Closed-Form Modeling Shao-Yun Fang, Chung-Wei Lin, Guang-Wan Liao, and Yao-Wen Chang March 26, 2013 Graduate Institute of Electronics Engineering

More information

A Novel Framework for Multilevel Full-Chip Gridless Routing

A Novel Framework for Multilevel Full-Chip Gridless Routing A Novel Framework for Multilevel Full-Chip Gridless Routing Tai-Chen Chen Yao-Wen Chang Shyh-Chang Lin Graduate Institute of Electronics Engineering Graduate Institute of Electronics Engineering SpringSoft,

More information

Interconnect Delay and Area Estimation for Multiple-Pin Nets

Interconnect Delay and Area Estimation for Multiple-Pin Nets Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Z. Pan UCLA Computer Science Department Los Angeles, CA 90095 Sponsored by SRC and Avant!! under CA-MICRO Presentation

More information

Fine-Grained Sleep Transistor Sizing Algorithm for Leakage Power Minimization

Fine-Grained Sleep Transistor Sizing Algorithm for Leakage Power Minimization 6.1 Fine-Grained Sleep Transistor Sizing Algorithm for Leakage Power Minimization De-Shiuan Chiou, Da-Cheng Juan, Yu-Ting Chen, and Shih-Chieh Chang Department of CS, National Tsing Hua University, Hsinchu,

More information

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Zhigang Pan Department of Computer Science University of California, Los Angeles, CA 90095 Email: fcong,pang@cs.ucla.edu

More information

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER Bhuvaneswaran.M 1, Elamathi.K 2 Assistant Professor, Muthayammal Engineering college, Rasipuram, Tamil Nadu, India 1 Assistant Professor, Muthayammal

More information

Postgrid Clock Routing for High Performance Microprocessor Designs

Postgrid Clock Routing for High Performance Microprocessor Designs IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 31, NO. 2, FEBRUARY 2012 255 Postgrid Clock Routing for High Performance Microprocessor Designs Haitong Tian, Wai-Chung

More information

Three DIMENSIONAL-CHIPS

Three DIMENSIONAL-CHIPS IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 4 (Sep-Oct. 2012), PP 22-27 Three DIMENSIONAL-CHIPS 1 Kumar.Keshamoni, 2 Mr. M. Harikrishna

More information

Timing-Constrained I/O Buffer Placement for Flip- Chip Designs

Timing-Constrained I/O Buffer Placement for Flip- Chip Designs Timing-Constrained I/O Buffer Placement for Flip- Chip Designs Zhi-Wei Chen 1 and Jin-Tai Yan 2 1 College of Engineering, 2 Department of Computer Science and Information Engineering Chung-Hua University,

More information

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),

More information

Optimal Prescribed-Domain Clock Skew Scheduling

Optimal Prescribed-Domain Clock Skew Scheduling Optimal Prescribed-Domain Clock Skew Scheduling Li Li, Yinghai Lu, Hai Zhou Electrical Engineering and Computer Science Northwestern University 6B-4 Abstract Clock skew scheduling is an efficient technique

More information

Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering

Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering Aiman El-Maleh, Saqib Khurshid King Fahd University of Petroleum and Minerals Dhahran, Saudi Arabia

More information

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic A Novel Design of High Speed and Area Efficient De-Multiplexer Using Pass Transistor Logic K.Ravi PG Scholar(VLSI), P.Vijaya Kumari, M.Tech Assistant Professor T.Ravichandra Babu, Ph.D Associate Professor

More information

Efficient Static Timing Analysis Using a Unified Framework for False Paths and Multi-Cycle Paths

Efficient Static Timing Analysis Using a Unified Framework for False Paths and Multi-Cycle Paths Efficient Static Timing Analysis Using a Unified Framework for False Paths and Multi-Cycle Paths Shuo Zhou, Bo Yao, Hongyu Chen, Yi Zhu and Chung-Kuan Cheng University of California at San Diego La Jolla,

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 URL: http://cadlab.cs.ucla.edu/~cong Exponential Device

More information

Retiming and Clock Scheduling for Digital Circuit Optimization

Retiming and Clock Scheduling for Digital Circuit Optimization 184 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Retiming and Clock Scheduling for Digital Circuit Optimization Xun Liu, Student Member,

More information

A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction

A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction Kwangsoo Han, Andrew B. Kahng, Jongpil Lee, Jiajia Li and Siddhartha Nath CSE and ECE Departments,

More information

8D-3. Experiences of Low Power Design Implementation and Verification. Shi-Hao Chen. Jiing-Yuan Lin

8D-3. Experiences of Low Power Design Implementation and Verification. Shi-Hao Chen. Jiing-Yuan Lin Experiences of Low Power Design Implementation and Verification Shi-Hao Chen Global Unichip Corp. Hsin-Chu Science Park, Hsin-Chu, Taiwan 300 +886-3-564-6600 hockchen@globalunichip.com Jiing-Yuan Lin Global

More information

Nanometer technologies enable higher-frequency designs

Nanometer technologies enable higher-frequency designs By Ron Press & Jeff Boyer Easily Implement PLL Clock Switching for At-Speed Test By taking advantage of pattern-generation features, a simple logic design can utilize phase-locked-loop clocks for accurate

More information

An Efficient Algorithm For RLC Buffer Insertion

An Efficient Algorithm For RLC Buffer Insertion An Efficient Algorithm For RLC Buffer Insertion Zhanyuan Jiang, Shiyan Hu, Jiang Hu and Weiping Shi Texas A&M University, College Station, Texas 77840 Email: {jerryjiang, hushiyan, jianghu, wshi}@ece.tamu.edu

More information

DUE to the high computational complexity and real-time

DUE to the high computational complexity and real-time IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen

More information

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY Clock Power Reduction Using Merged Flip Flops Technique S.Murugan ME VLSI Design, SCAD College of Engineering and Technology,

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE Model

Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE Model 446 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 19, NO. 4, APRIL 2000 Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE

More information

Iterative-Constructive Standard Cell Placer for High Speed and Low Power

Iterative-Constructive Standard Cell Placer for High Speed and Low Power Iterative-Constructive Standard Cell Placer for High Speed and Low Power Sungjae Kim and Eugene Shragowitz Department of Computer Science and Engineering University of Minnesota, Minneapolis, MN 55455

More information

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology

A 256-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology http://dx.doi.org/10.5573/jsts.014.14.6.760 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.14, NO.6, DECEMBER, 014 A 56-Radix Crossbar Switch Using Mux-Matrix-Mux Folded-Clos Topology Sung-Joon Lee

More information

WITH the development of the semiconductor technology,

WITH the development of the semiconductor technology, Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip Guang Sun, Yong Li, Yuanyuan Zhang, Shijun Lin, Li Su, Depeng Jin and Lieguang zeng Abstract Network on Chip (NoC)

More information

Solving MIPI D-PHY Receiver Test Challenges

Solving MIPI D-PHY Receiver Test Challenges Stefan Walther and Yu Hu Verigy stefan.walther@verigy.com yu.hu@verigy.com Abstract MIPI stands for the Mobile Industry Processor Interface, which provides a flexible, low-cost, high-speed interface solution

More information

A General Sign Bit Error Correction Scheme for Approximate Adders

A General Sign Bit Error Correction Scheme for Approximate Adders A General Sign Bit Error Correction Scheme for Approximate Adders Rui Zhou and Weikang Qian University of Michigan-Shanghai Jiao Tong University Joint Institute Shanghai Jiao Tong University, Shanghai,

More information

A Survey on Buffered Clock Tree Synthesis for Skew Optimization

A Survey on Buffered Clock Tree Synthesis for Skew Optimization A Survey on Buffered Clock Tree Synthesis for Skew Optimization Anju Rose Tom 1, K. Gnana Sheela 2 1, 2 Electronics and Communication Department, Toc H Institute of Science and Technology, Kerala, India

More information

Combinatorial Algorithms for Fast Clock Mesh Optimization

Combinatorial Algorithms for Fast Clock Mesh Optimization Combinatorial Algorithms for Fast Clock Mesh Optimization Ganesh Venkataraman, Zhuo Feng, Jiang Hu, Peng Li Dept. of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843

More information

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology

Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network Topology JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.1, FEBRUARY, 2015 http://dx.doi.org/10.5573/jsts.2015.15.1.077 Design of Low-Power and Low-Latency 256-Radix Crossbar Switch Using Hyper-X Network

More information

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver E.Kanniga 1, N. Imocha Singh 2,K.Selva Rama Rathnam 3 Professor Department of Electronics and Telecommunication, Bharath

More information

Asia and South Pacific Design Automation Conference

Asia and South Pacific Design Automation Conference Asia and South Pacific Design Automation Conference Authors: Kuan-Yu Lin, Hong-Ting Lin, and Tsung-Yi Ho Presenter: Hong-Ting Lin chibli@csie.ncku.edu.tw http://eda.csie.ncku.edu.tw Electronic Design Automation

More information

Buffered Steiner Trees for Difficult Instances

Buffered Steiner Trees for Difficult Instances Buffered Steiner Trees for Difficult Instances C. J. Alpert 1, M. Hrkic 2, J. Hu 1, A. B. Kahng 3, J. Lillis 2, B. Liu 3, S. T. Quay 1, S. S. Sapatnekar 4, A. J. Sullivan 1, P. Villarrubia 1 1 IBM Corp.,

More information

Making Fast Buffer Insertion Even Faster Via Approximation Techniques

Making Fast Buffer Insertion Even Faster Via Approximation Techniques 1A-3 Making Fast Buffer Insertion Even Faster Via Approximation Techniques Zhuo Li 1,C.N.Sze 1, Charles J. Alpert 2, Jiang Hu 1, and Weiping Shi 1 1 Dept. of Electrical Engineering, Texas A&M University,

More information

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Chen-Wei Liu 12 and Yao-Wen Chang 2 1 Synopsys Taiwan Limited 2 Department of Electrical Engineering National Taiwan University,

More information

Processor and DRAM Integration by TSV- Based 3-D Stacking for Power-Aware SOCs

Processor and DRAM Integration by TSV- Based 3-D Stacking for Power-Aware SOCs Processor and DRAM Integration by TSV- Based 3-D Stacking for Power-Aware SOCs Shin-Shiun Chen, Chun-Kai Hsu, Hsiu-Chuan Shih, and Cheng-Wen Wu Department of Electrical Engineering National Tsing Hua University

More information

Architecture-Level Synthesis for Automatic Interconnect Pipelining

Architecture-Level Synthesis for Automatic Interconnect Pipelining Architecture-Level Synthesis for Automatic Interconnect Pipelining Jason Cong, Yiping Fan, Zhiru Zhang Computer Science Department University of California, Los Angeles, CA 90095 {cong, fanyp, zhiruz}@cs.ucla.edu

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h)

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h) HAI ZHOU Electrical and Computer Engineering Northwestern University 2535 Happy Hollow Rd. Evanston, IL 60208-3118 Glenview, IL 60025 haizhou@ece.nwu.edu www.ece.nwu.edu/~haizhou (847) 491-4155 (o) (847)

More information

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Design and Implementation of CVNS Based Low Power 64-Bit Adder Design and Implementation of CVNS Based Low Power 64-Bit Adder Ch.Vijay Kumar Department of ECE Embedded Systems & VLSI Design Vishakhapatnam, India Sri.Sagara Pandu Department of ECE Embedded Systems

More information

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University Abbas El Gamal Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program Stanford University Chip stacking Vertical interconnect density < 20/mm Wafer Stacking

More information

Implementation of Asynchronous Topology using SAPTL

Implementation of Asynchronous Topology using SAPTL Implementation of Asynchronous Topology using SAPTL NARESH NAGULA *, S. V. DEVIKA **, SK. KHAMURUDDEEN *** *(senior software Engineer & Technical Lead, Xilinx India) ** (Associate Professor, Department

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Tree Structure and Algorithms for Physical Design

Tree Structure and Algorithms for Physical Design Tree Structure and Algorithms for Physical Design Chung Kuan Cheng, Ronald Graham, Ilgweon Kang, Dongwon Park and Xinyuan Wang CSE and ECE Departments UC San Diego Outline: Introduction Ancestor Trees

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,

More information

Vdd Programmability to Reduce FPGA Interconnect Power

Vdd Programmability to Reduce FPGA Interconnect Power Vdd Programmability to Reduce FPGA Interconnect Power Fei Li, Yan Lin and Lei He Electrical Engineering Department University of California, Los Angeles, CA 90095 ABSTRACT Power is an increasingly important

More information

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Jin Hee Kim and Jason Anderson FPL 2015 London, UK September 3, 2015 2 Motivation for Synthesizable FPGA Trend towards ASIC design flow Design

More information

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Physical Design of Digital Integrated Circuits (EN029 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Lecture 08: Interconnect Trees Introduction to Graphs and Trees Minimum Spanning

More information

SF-LRU Cache Replacement Algorithm

SF-LRU Cache Replacement Algorithm SF-LRU Cache Replacement Algorithm Jaafar Alghazo, Adil Akaaboune, Nazeih Botros Southern Illinois University at Carbondale Department of Electrical and Computer Engineering Carbondale, IL 6291 alghazo@siu.edu,

More information

NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods

NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods 1 NoCIC: A Spice-based Interconnect Planning Tool Emphasizing Aggressive On-Chip Interconnect Circuit Methods V. Venkatraman, A. Laffely, J. Jang, H. Kukkamalla, Z. Zhu & W. Burleson Interconnect Circuit

More information

Multicycle-Path Challenges in Multi-Synchronous Systems

Multicycle-Path Challenges in Multi-Synchronous Systems Multicycle-Path Challenges in Multi-Synchronous Systems G. Engel 1, J. Ziebold 1, J. Cox 2, T. Chaney 2, M. Burke 2, and Mike Gulotta 3 1 Department of Electrical and Computer Engineering, IC Design Research

More information

Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers

Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers , October 20-22, 2010, San Francisco, USA Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers I-Lun Tseng, Member, IAENG, Huan-Wen Chen, and Che-I Lee Abstract Longest-path routing problems,

More information

On the Decreasing Significance of Large Standard Cells in Technology Mapping

On the Decreasing Significance of Large Standard Cells in Technology Mapping On the Decreasing Significance of Standard s in Technology Mapping Jae-sun Seo, Igor Markov, Dennis Sylvester, and David Blaauw Department of EECS, University of Michigan, Ann Arbor, MI 48109 {jseo,imarkov,dmcs,blaauw}@umich.edu

More information

Static Compaction Techniques to Control Scan Vector Power Dissipation

Static Compaction Techniques to Control Scan Vector Power Dissipation Static Compaction Techniques to Control Scan Vector Power Dissipation Ranganathan Sankaralingam, Rama Rao Oruganti, and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer

More information

Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis

Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. XX, NO. XX, XXXX 1 Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis Wulong Liu, Student Member, IEEE, Yu Wang, Senior Member,

More information

Li Minqiang Institute of Systems Engineering Tianjin University, Tianjin , P.R. China

Li Minqiang Institute of Systems Engineering Tianjin University, Tianjin , P.R. China Multi-level Genetic Algorithm (MLGA) for the Construction of Clock Binary Tree Nan Guofang Tianjin University, Tianjin 07, gfnan@tju.edu.cn Li Minqiang Tianjin University, Tianjin 07, mqli@tju.edu.cn Kou

More information

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset M.Santhi, Arun Kumar S, G S Praveen Kalish, Siddharth Sarangan, G Lakshminarayanan Dept of ECE, National Institute

More information

FPGA Clock Network Architecture: Flexibility vs. Area and Power

FPGA Clock Network Architecture: Flexibility vs. Area and Power FPGA Clock Network Architecture: Flexibility vs. Area and Power Julien Lamoureux and Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, B.C.,

More information

Design Compiler Graphical Create a Better Starting Point for Faster Physical Implementation

Design Compiler Graphical Create a Better Starting Point for Faster Physical Implementation Datasheet Create a Better Starting Point for Faster Physical Implementation Overview Continuing the trend of delivering innovative synthesis technology, Design Compiler Graphical streamlines the flow for

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

Interconnect Design for Deep Submicron ICs

Interconnect Design for Deep Submicron ICs ! " #! " # - Interconnect Design for Deep Submicron ICs Jason Cong Lei He Kei-Yong Khoo Cheng-Kok Koh and Zhigang Pan Computer Science Department University of California Los Angeles CA 90095 Abstract

More information

CATALYST: Planning Layer Directives for Effective Design Closure

CATALYST: Planning Layer Directives for Effective Design Closure CATALYST: Planning Layer Directives for Effective Design Closure Yaoguang Wei 1, Zhuo Li 2, Cliff Sze 2 Shiyan Hu 3, Charles J. Alpert 2, Sachin S. Sapatnekar 1 1 Department of Electrical and Computer

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

Calibrating Achievable Design GSRC Annual Review June 9, 2002

Calibrating Achievable Design GSRC Annual Review June 9, 2002 Calibrating Achievable Design GSRC Annual Review June 9, 2002 Wayne Dai, Andrew Kahng, Tsu-Jae King, Wojciech Maly,, Igor Markov, Herman Schmit, Dennis Sylvester DUSD(Labs) Calibrating Achievable Design

More information

LOGIC EFFORT OF CMOS BASED DUAL MODE LOGIC GATES

LOGIC EFFORT OF CMOS BASED DUAL MODE LOGIC GATES LOGIC EFFORT OF CMOS BASED DUAL MODE LOGIC GATES D.Rani, R.Mallikarjuna Reddy ABSTRACT This logic allows operation in two modes: 1) static and2) dynamic modes. DML gates, which can be switched between

More information

An Efficient Routing Tree Construction Algorithm with Buffer Insertion, Wire Sizing and Obstacle Considerations

An Efficient Routing Tree Construction Algorithm with Buffer Insertion, Wire Sizing and Obstacle Considerations An Efficient Routing Tree Construction Algorithm with uffer Insertion, Wire Sizing and Obstacle Considerations Sampath Dechu Zion Cien Shen Chris C N Chu Physical Design Automation Group Dept Of ECpE Dept

More information

Testability Optimizations for A Time Multiplexed CPLD Implemented on Structured ASIC Technology

Testability Optimizations for A Time Multiplexed CPLD Implemented on Structured ASIC Technology ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY Volume 14, Number 4, 2011, 392 398 Testability Optimizations for A Time Multiplexed CPLD Implemented on Structured ASIC Technology Traian TULBURE

More information

arxiv: v1 [cs.ar] 14 May 2017

arxiv: v1 [cs.ar] 14 May 2017 Fast Statistical Timing Analysis for Circuits with Post-Silicon Tunable Clock Buffers Bing Li, Ning Chen, Ulf Schlichtmann Institute for Electronic Design Automation, Technische Universitaet Muenchen,

More information