Clock Skew Optimization Considering Complicated Power Modes


Chiao-Ling Lung (1,2), Zi-Yi Zeng (1), Chung-Han Chou (1), Shih-Chieh Chang (1)
(1) National Tsing-Hua University, HsinChu, Taiwan
(2) Industrial Technology Research Institute, HsinChu, Taiwan
cllung0608@gmail.com, zen.ziyi@gmail.com, u942518@oz.nthu.edu.tw, scchang@cs.nthu.edu.tw

Abstract

To conserve energy, designs that use multiple power modes have been widely adopted. However, when a design has many different power modes, clock tree optimization (CTO) becomes very difficult. In this paper, we propose a two-level power-mode-aware CTO methodology. Across all power modes, the chip-level CTO globally reduces clock skew among modules, whereas the module-level CTO reduces clock skew within a single module. Our experimental results show that the power-mode-aware CTO achieves a significant improvement in the worst-case condition with only a minor penalty in area.

Keywords: power modes, clock tree, clock skew

I. INTRODUCTION

Due to technology scaling, the ITRS roadmap 2008 Update [24] predicts that, by 2015, high performance integrated circuits will work with on-chip local clock frequencies up to 8.5 GHz. However, in synchronous design, the performance is limited not only by the speed of the devices but also by how well data signals can be synchronized. The clock skew, defined as the maximum difference among the clock arrival times of sequential elements, imposes important constraints on the system performance.

Figure 1. Industrial example.

Power Mode | MPU  | DSP1 | DSP2
Full Speed | 1.2V | 1.2V | 1.2V
Active1    | 1.2V | 1.2V | 1.0V
Active2    | 1.2V | 1.0V | 1.2V
Suspend    | 1.0V | 1.0V | 1.0V
Inactive   | 1.0V | 0V   | 0V

Many previous works have concentrated on the problem of clock skew minimization. In [2] [3], clock trees are constructed by zero- or bounded-skew routing. To achieve further skew control, buffer and wire-sizing techniques have been proposed by [4] [6] [18-19].
To account for process variation, a statistical timing model is used for clock tree optimization in [9] [12]. Some researchers [8] [10] use intentional useful-skew scheduling to improve system performance. Special structures such as hybrid networks and clock meshes have been studied in [7] [14-15] [17]. To lower power consumption, [13] suggests a low-power clock scheme that distributes the clock signal at a lower voltage and translates it to a higher voltage at the utilization points. A type-matching method is proposed by [5] to consider the impact of clock gating. Chip-level clock tree synthesis is presented by [16] to construct a clock tree for an SoC. A novel clock distribution methodology is presented by [11] to perform dynamic de-skewing during the operation of the chip.

Despite many studies on clock tree optimization, clock skew minimization is still difficult to achieve in advanced power-saving methodologies where many different power modes are used. Take the industrial case shown in Figure 1 as an example. The design has over 40 modules, some of which may operate at 1.2V or 1.0V, or may be completely shut down. The design has a total of 64 power modes to fit various operating requirements. Some of the power modes are shown in Figure 1. Since the operating voltage has great influence on the delay of a clock buffer, the clock arrival times of the FF sinks in a module may vary greatly when the module operates at a different voltage. As a result, it is extremely difficult to implement a single clock network that satisfies the clock skew constraints in all possible power modes. The difficulty of generating a single clock tree that satisfies clock skew constraints in multiple power modes has been pointed out in several industrial publications [21-23]. One way to resolve the clock skew problem is to adopt the asynchronous design style.
However, an asynchronous design is difficult to verify and requires additional synchronizer circuits to handle data synchronization. The previous work [21] uses a delay-locked loop (DLL) to synchronize the clock between power domains. As far as we know, no previous work has proposed a solution to the problem of clock skew minimization under complicated power modes in a synchronous design.

In this paper, we propose a Power-Mode-Aware CTO framework to resolve the skew issue under complicated power modes. Our framework consists of two major components: the chip-level CTO and the module-level CTO. The chip-level CTO attempts to reduce the global clock skew of a design across all possible power modes. In contrast, the module-level CTO tries to reduce the local clock skew within a module across all of its operating voltages.

In the chip-level CTO, we propose novel power-mode-aware buffers (PMABs), which are inserted into the chip-level clock tree to balance the clock skew among modules operating in different voltage modes. A PMAB is a super buffer with mode-selection capability: its delay can be adjusted according to the mode condition. We present two different ways of implementing a PMAB, both of which aim to reduce inter-module clock skew. In the module-level CTO, we follow the popular approach [7] [19] of using linear programming to reduce the clock skew. We have used an industrial 65nm technology library to perform a set of experiments, and the results are very promising.

The major contributions of this paper are summarized as follows.
- We propose to resolve the clock skew problem caused by complicated power modes by using a PMAB, whose propagation delay is selected by the voltage mode.
- To reduce the area penalty of a PMAB, we explore the flexibility available in designing a PMAB.
- Our methodology can cope with the current design flow.

The rest of this paper is organized as follows. Section II introduces the chip-level CTO. Section III describes the module-level CTO. Section IV demonstrates how to implement our framework with a commercial design flow to achieve one-pass clock skew optimization. In Section V, we show experimental results on benchmark circuits. Section VI summarizes our findings and concludes the paper.

II. CHIP-LEVEL CTO

This section describes the design of a chip-level CTO which inserts PMABs to balance clock skew among modules. An example of a clock tree with PMABs is given in Figure 2, where triangles stand for PMABs, solid lines represent clock signals, and dotted lines are selection signals generated by the power mode controller. We first present a possible implementation of a PMAB and then present important lemmas relating to a PMAB. After that, we propose a modified PMAB which has lower area cost and lower clock latency than the original one.

Figure 2. An example of clock tree with PMABs.

A. PMAB Design

First, we would like to clarify the terms voltage mode and power mode. Throughout this paper, the term voltage mode describes an operating voltage of a module, whereas the term power mode describes a configuration of the operating voltages of all modules. For example, in Figure 1 we have five power modes for the design: Full Speed, Active 1, Active 2, Suspend and Inactive. However, for all modules including MPU, DSP1 and DSP2, we have only two voltage modes, 1.2V and 1.0V.

Table I. An example design with four power modes.

Power Mode | M1 (V, L, E) | M2 (V, L, E) | M3 (V, L, E)
pm1        | …            | …            | …
pm2        | …            | …            | …
pm3        | …            | …            | …
pm4        | 1.0V, 14, 10 | 1.2V, 9, 7   | 1.0V, 13, 11

V: Voltage; L: Latest latency; E: Earliest latency

We now describe the steps for designing a PMAB. In the first step, we analyze and record the global latest clock latency, called L_global, among all modules in all possible voltage modes. Consider the example in Table I, where the design has three modules (M1, M2, M3), four power modes (pm1, pm2, pm3, pm4) and two voltage modes (1.2V and 1.0V). In power mode pm4, module M1 operates in voltage mode 1.0V with a latest clock latency of 14. Similarly, we have a latency of 9 for M2 operating in 1.2V and a latency of 13 for M3 in 1.0V. Among all modules in all voltage modes, the global latest clock latency is L_global = 14, which is the latest clock latency of module M1 operating in 1.0V. In addition, there is a clock skew of 7 between the latest latency of M1 in 1.0V (14) and the earliest latency of M2 in 1.2V (7). This clock skew of 7 is called the global clock skew of the design and is denoted as Skew_global.

Figure 3. An example of a PMAB and a clock tree with PMABs. (a) Original clock tree. (b) Alignment delay (voltage mode, SELv, and alignment delay of buffers B1, B2, B3). (c) Clock tree with PMABs. (d) An example of a PMAB.

Next, for each voltage mode of a module, we calculate the delay needed to align its latest clock latency with L_global. The delay that aligns the clock latency of a module m in a voltage mode v with L_global is called the alignment delay of module m in voltage mode v and is denoted as δ_{m,v}. In the same example, the latest clock latency of module M2 in 1.2V is 9. To align with L_global (14), we need a delay of 5 (= 14 - 9) so that the latest clock latency becomes the same as L_global. Therefore, we say that the alignment delay of 1.2V for module M2 is δ_{M2,1.2} = 5. As another example, the alignment delay of 1.0V for module M3 is δ_{M3,1.0} = 1 (= 14 - 13). In this way, we can calculate the alignment delay of every voltage mode of a module.

Then, we design the PMAB of a module as a tunable delay element which uses the voltage mode as the select signal to choose among the corresponding alignment delays. In the same example, the PMAB for module M3 has two voltage modes, 1.2V and 1.0V. The alignment delay of 1.2V is δ_{M3,1.2} = 3 and the alignment delay of 1.0V is δ_{M3,1.0} = 1. The PMAB of module M3 can be designed using a MUX which has the voltage mode as the select signal and two delay buffers, one with a delay of 3 for 1.2V and one with a delay of 1 for 1.0V, as shown in Figure 3(b) and Figure 3(d). After PMAB insertion, we can reduce Skew_global from 7 to 4. Figure 3(a) shows the original clock tree and Figure 3(c) shows the clock tree with PMABs.

B. Characteristics of a PMAB

With the insertion of a PMAB for a module, we can align the latest clock latency of the module in each voltage mode to L_global. As a result, after inserting PMABs, we have the important property that the latest clock latency of each module is equal to L_global in any given voltage mode. We have the following lemmas.

Lemma 1: After inserting a PMAB, the clock latencies within a module shift by the same amount; in other words, the clock skew within a module does not change.

Informal proof: Since the sequential elements in a module are driven through the same PMAB, no matter how much delay is padded by the PMAB, the clock latencies in the same module all increase by the same quantity. Q.E.D.

Lemma 2: After inserting PMABs, we obtain the optimal global clock skew of a design, and this optimal global clock skew equals the maximal local clock skew of all modules among all voltage modes.

Informal proof: According to Lemma 1, the local clock skew of a module cannot be improved by a PMAB.
As a result, the best possible global clock skew that can be achieved is the largest local clock skew. Q.E.D.

The above lemmas state that the use of PMABs allows us to neglect the inter-module clock skew. Thus, we need only focus on reducing the clock skew within a module. In the same example in Table I, among all local clock skews, the largest local clock skew is 4, which occurs when module M1 operates in 1.0V. In general, without PMABs, the global clock skew can be larger than the largest local clock skew of 4. However, Lemma 2 states that after inserting PMABs, the global clock skew is equal to the largest local clock skew of 4.

Table II. Symbols Definition

Symbol      | Description                                                                   | Example (Table I)
L_global    | The maximal latest clock latency among all modules of possible voltage modes  | L_global = 14
E_global    | The minimal earliest clock latency among all modules of possible voltage modes | E_global = 7
Skew_global | The difference between L_global and E_global                                  | Skew_global = 7
Skew_local  | The maximal local skew within a module                                        | Skew_local = 4
L_local     | The latest clock latency of the module with Skew_local                        | L_local = 14
E_local     | The earliest clock latency of the module with Skew_local                      | E_local = 10

Figure 4. The alignment delays of the case listed in Table I. (a) Original. (b) PMAB. (c) Modified PMAB.

C. A modified PMAB design

The PMAB design described above aligns the latest clock latencies of all modules to L_global. In this section, we show that despite the simplicity of this PMAB design, the restriction of aligning only to L_global is unnecessary in certain power modes and may cause a large area penalty. We now present a modified PMAB design that relaxes this unnecessary restriction while still maintaining the good properties stated in Lemmas 1 and 2. Before discussing the modified PMAB, we need new definitions of symbols.
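As a concrete check of the basic PMAB design and of Lemma 2, the following Python sketch recomputes the quantities used so far. It is only an illustration: the data are the three pm4 entries quoted in the text (the remaining entries of Table I are not used), and the function names `global_stats`, `alignment_delays` and `apply_delays` are hypothetical helpers, not part of the authors' flow.

```python
# (module, voltage mode) -> (earliest latency E, latest latency L)
# Only the pm4 entries quoted in the text are used here.
latencies = {
    ("M1", "1.0V"): (10, 14),
    ("M2", "1.2V"): (7, 9),
    ("M3", "1.0V"): (11, 13),
}


def global_stats(lat):
    """Return (L_global, E_global, Skew_global) for a latency table."""
    latest = max(l for _, l in lat.values())
    earliest = min(e for e, _ in lat.values())
    return latest, earliest, latest - earliest


def alignment_delays(lat):
    """Basic PMAB: pad each entry so its latest latency reaches L_global."""
    l_global, _, _ = global_stats(lat)
    return {key: l_global - latest for key, (_, latest) in lat.items()}


def apply_delays(lat, delays):
    """A PMAB shifts every latency of a module by the same amount (Lemma 1)."""
    return {k: (e + delays[k], l + delays[k]) for k, (e, l) in lat.items()}


l_global, e_global, skew_global = global_stats(latencies)
delays = alignment_delays(latencies)
aligned = apply_delays(latencies, delays)

# Largest local skew before insertion, and global skew after insertion.
skew_local = max(l - e for e, l in latencies.values())
_, _, skew_after = global_stats(aligned)

print("L_global, E_global, Skew_global:", l_global, e_global, skew_global)
print("alignment delays:", delays)
print("global skew after PMAB insertion:", skew_after, "(Skew_local =", skew_local, ")")
```

With these inputs the sketch reproduces the numbers in the text: alignment delays of 5 for M2 in 1.2V and 1 for M3 in 1.0V, and a global skew that drops from 7 to the largest local skew of 4.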
First, among all modules in all voltage modes, we call the maximal local clock skew Skew_local, and its corresponding earliest and latest clock latencies E_local and L_local, i.e., Skew_local = L_local - E_local. Then, analogously to L_global, we define a new symbol E_global, which is the global earliest clock latency among all modules in all possible voltage modes. We summarize all symbols in Table II. For example, in Table I the largest local clock skew within a module, Skew_local, is 4 when M1 operates in 1.0V with E_local = 10 and L_local = 14. In addition, Skew_global is 7, where M2 operating in 1.2V gives E_global = 7 and M1 operating in 1.0V gives L_global = 14.

According to Lemma 2, after PMAB insertion we have Skew_global = Skew_local, E_global = E_local, and L_global = L_local. As a result, we need only make sure that all other clock latencies lie between E_global and L_global. Based on this observation, we have the flexibility of assigning the delays of a PMAB anywhere within this range while still achieving the optimal clock skew. With this flexible delay assignment, we can reduce the area needed to implement a PMAB.

Figure 4 shows the clock latency and skew information for the example in Table I. A solid bar represents the range from the earliest clock latency to the latest clock latency, and a dashed bar represents the alignment delay of a module in a voltage mode. The double-headed arrow represents the global clock skew, and the dashed arrow represents the skew improvement. Figure 4(a) illustrates the original clock latency and skew information before PMAB insertion, and Figure 4(b) shows the result after PMAB insertion, where all latest clock latencies have been aligned to L_global of 14. A modified PMAB may instead produce the clock latencies shown in Figure 4(c): all of them are within the range but are not aligned to the latest one. Take module M3 in 1.0V in Figure 4(a) as an example. Its latest clock latency of 13 is less than L_local of 14, and its earliest clock latency of 11 is greater than E_local of 10. For a modified PMAB, we can therefore assign δ_{M3,1.0} = 0 and keep the clock skew unchanged. A delay of δ_{M3,1.0} = 0 means that no delay buffer is needed. On the other hand, the delay δ_{M1,1.2} can be anywhere in the range from 1 to 3 without affecting the optimal clock skew.

We now show that under different relations among L_global, L_local, E_global and E_local, we need different formulations to calculate the flexibility of the alignment delays. We have enumerated all possible conditions and categorized them into four types:

Type 1. L_local = L_global and E_local = E_global
Type 2. L_local < L_global and E_local > E_global
Type 3. L_local < L_global and E_local = E_global
Type 4. L_local = L_global and E_local > E_global

delay_assignment {
    case (Type = 1)
        do nothing
    case (Type = 2 or 3) {
        δ_local = L_global - L_local
        E_local = E_local + δ_local
        foreach (module m)
            foreach (operating voltage v)
                if (E_{m,v} < E_local) then
                    δ_{m,v} = E_local - E_{m,v}
    }
    case (Type = 4) {
        foreach (module m)
            foreach (operating voltage v)
                if (E_{m,v} < E_local) then
                    δ_{m,v} = E_local - E_{m,v}
    }
}

Figure 5. Pseudo code of delay assignment.

The procedures to calculate the alignment delays for each type are described in Figure 5. The complexity is O(kN), where k is the number of voltage modes and N is the number of modules.

III. MODULE-LEVEL CTO

The purpose of the module-level CTO is to build a clock tree which has the smallest possible skew within a module. In our framework, we use a linear programming methodology similar to [7] [19], which is commonly used for clock skew minimization. We derive an LP formulation whose goal is to minimize the maximum clock skew within a module. Our LP formulation consists of two categories of constraints: clock path constraints and clock skew constraints. The clock path constraints describe the delay of a clock path by summing up the delays of the buffers and wires on the path. The clock skew constraints calculate the maximum clock skew.

Inputs:
1. An initial buffered clock tree topology T
2. $d_i$, the delay of buffer $b_i$, $i \in \{1,\dots,N\}$
3. $w_i$, the delay of the wire between $b_i$ and its parent, $i \in \{1,\dots,N\}$
4. $P_j$, the set of buffers from the clock source to sink $s_j$, $j \in \{1,\dots,M\}$
Decision variables: $\Delta d_i$, $i \in \{1,\dots,N\}$
Objective function: minimize $skew$
Subject to:
Clock path constraints: $a_j = \sum_{i \in P_j} (w_i + d_i + \Delta d_i)$, $j \in \{1,\dots,M\}$
Clock skew constraints: $a_{max} \ge a_j$, $a_{min} \le a_j$, $j \in \{1,\dots,M\}$; $skew = a_{max} - a_{min}$
Outputs:
1. Optimal latency $at_j$ of $s_j$, $j \in \{1,\dots,M\}$
2. Optimal delay $dt_i$ of $b_i$, $i \in \{1,\dots,N\}$

Figure 6. LP formulation.

Given an initial clock tree T with N buffers and M sinks, the LP formulation can be stated as in Figure 6, where b_i and s_j denote the i-th buffer and the j-th sink on the clock tree; d_i and dt_i are the delay and target delay of b_i; a_j and at_j are the clock latency and target clock latency of s_j; a_max and a_min are the maximum and minimum clock latency; and skew is the maximum skew.

Although the LP formulation can provide an optimal clock skew, realizing the exact solution requires a rich set of delay buffers with many different delay values. However, only a limited range of buffer sizes is available in a library. Traditionally, a mapping stage has been required to map each delay in the LP solution to the buffer with the closest delay. We found that, for buffers that are not on the critical paths, the optimal delay can be stated as a range that still achieves the optimal clock skew. This observation provides more flexibility when mapping the LP solution to library cells.
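To make the clock path and clock skew constraints concrete, the sketch below evaluates them on a tiny tree in Python. It is a minimal illustration under stated assumptions, not the authors' flow: the three-buffer tree, its delay values, and the helper names `sink_latencies` and `skew` are all hypothetical, and an exhaustive search over small non-negative integer values of Δd_i stands in for a real LP solver.

```python
from itertools import product

# Hypothetical tree: buffer 1 is the root; sink s1 is reached through
# buffers 1 and 2, sink s2 through buffers 1 and 3.
w = {1: 1, 2: 1, 3: 1}            # wire delay between b_i and its parent
d = {1: 2, 2: 3, 3: 1}            # current delay of buffer b_i
paths = {"s1": [1, 2], "s2": [1, 3]}   # P_j for each sink s_j


def sink_latencies(paths, w, d, delta):
    """Clock path constraints: a_j = sum over i in P_j of (w_i + d_i + delta_i)."""
    return {s: sum(w[i] + d[i] + delta[i] for i in p) for s, p in paths.items()}


def skew(lat):
    """Clock skew constraints: skew = a_max - a_min."""
    return max(lat.values()) - min(lat.values())


# Latencies of the initial tree (all delta_i = 0).
initial = sink_latencies(paths, w, d, {i: 0 for i in d})

# Brute-force stand-in for the LP solver (assumption: delta_i in {0,1,2,3}).
# Ties on skew are broken by the smallest total added delay.
candidates = (dict(zip(sorted(d), combo))
              for combo in product(range(4), repeat=len(d)))
best = min(candidates,
           key=lambda delta: (skew(sink_latencies(paths, w, d, delta)),
                              sum(delta.values())))

print("initial latencies:", initial, "skew:", skew(initial))
print("chosen delta:", best,
      "skew:", skew(sink_latencies(paths, w, d, best)))
```

In this toy instance only the buffer on the faster path is slowed down, which matches the intuition behind the formulation: the shared root buffer cannot change the skew, so any delay added there would be wasted.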

IV. OUR FRAMEWORK

To support automation, our framework is designed to work with a commercial design flow. We use PrimeTime as the static timing analysis engine. In addition, since interconnect delay has become an increasingly large component of the total delay in advanced technologies, it must also be considered. The Standard Parasitic Exchange Format (SPEF) [25], a widely adopted format which records wire resistance and capacitance, is used in our framework to take interconnect delay into account.

Figure 7. Experimental Flow.

Our experimental flow is shown in Figure 7. First, the clock trees of all modules are generated by the tool SOC Encounter with level shifters inserted. Second, we extract the clock tree structure and the interconnect information generated by SOC Encounter, where the interconnect information is recorded in SPEF.

The module-level CTO is performed as follows. We use PrimeTime to extract clock latency and skew information and to generate linear programming constraints. The linear programming constraints are solved by lp_solve 5.5. Our delay-mapping algorithm then uses the LP result to generate the final clock tree for each module.

After finishing the module-level CTO, we perform the chip-level CTO. We insert a PMAB for each module. Using the clock latency information, we determine the alignment delays. During the construction of a PMAB, each alignment delay is realized by a buffer chain whose buffers are selected from industrial technology libraries. Finally, we generate the new design with PMABs inserted, together with a report of the clock information.

V. EXPERIMENTAL RESULTS

We have implemented our approach as shown in Figure 7 and applied it to a large industrial design with more than 56 power modes. To test more designs, we also created a set of new designs, each consisting of two or three modules instantiated from ISCAS89 benchmark circuits.
Each new circuit is assumed to have two voltage modes, 1.32V and 0.9V. The initial clock tree given to our approach is constructed as follows. We first use Design Compiler to map all circuits to an industrial 65nm technology library and use SOC Encounter to perform placement, clock tree synthesis and routing. The initial clock tree is obtained by running SOC Encounter under the assumption that all modules operate at the high voltage, because timing is normally critical in this power mode. We ran all experiments on a Linux workstation with a 2.8 GHz CPU and 4 GB of memory.

The experimental results are shown in Table III. Columns one to three show the name of the circuit, the total number of sequential elements (FF), and the number of power modes (PM) in the circuit, respectively. Columns four to seven show the worst clock skew over all power modes for SOC Encounter (SOCE), PMAB, and modified PMAB (mpmab), and the skew improvement of mpmab compared with SOCE (in %), respectively. Columns eight to eleven show the average clock skew over all power modes for SOCE, PMAB, and mpmab, and the skew improvement of mpmab compared with SOCE (in %), respectively. Columns twelve to fifteen show the worst clock latency of SOCE, PMAB, and mpmab, and the latency overhead of mpmab compared with SOCE, respectively. Columns sixteen to eighteen show the area overhead of PMAB and mpmab, and the area overhead improvement of mpmab compared with PMAB. Finally, column nineteen shows the runtime of mpmab.

For the case of IND1 in Table III, applying the modified PMAB reduces the worst clock skew of SOC Encounter to 163.7 ps and the average clock skew from 460.66 ps to 157.78 ps. Our approach thus achieves a 66.94% improvement in the worst clock skew and a 65.75% improvement in the average clock skew. In this case, the clock latency penalty due to the PMAB is 36.78 ps and the area overhead of the PMAB is only 0.05% of the total cell area, which does not include the routing overhead.
In addition, compared with PMAB, the modified PMAB reduces the area overhead by 16.67% while keeping the skew essentially unchanged. The clock latency distributions after applying PMAB and after applying modified PMAB are shown in Figure 8.

Figure 8. The experimental result of IND1. (a) Original. (b) PMAB. (c) Modified PMAB.

On average, both PMAB and modified PMAB improve the worst clock skew by 74%, whereas the average worst latency penalty of modified PMAB is 39.26 ps. Furthermore, the average area overheads of PMAB and modified PMAB are 0.16% and 0.12%, respectively. Although the worst clock skew and worst clock latency of modified PMAB are as good as those of PMAB, the average latency overhead and area overhead of modified PMAB are lower. Our experimental results show that, compared with PMAB, modified PMAB improves the average latency overhead by 16.41% and the area overhead by 25.61% on average. There are also minor differences between a PMAB and the corresponding modified PMAB in the columns for the worst clock skew, the average clock skew and the worst latency. These minor differences are caused by mapping inaccuracy, since the required delays must be implemented with the delay buffers available in the library.

Table III. Experimental results

        |       |     | Worst Clock Skew (ps)         | Average Clock Skew (ps)       | Worst Latency (ps)                | Area Overhead          |
Circuit | #FF   | #PM | SOCE   | PMAB   | mpmab  | %      | SOCE   | PMAB   | mpmab  | %      | SOCE   | PMAB   | mpmab  | Overhead | PMAB  | mpmab | %      | Runtime (s)
IND1    | 18,…  | …   | …      | 164.59 | 163.70 | 66.94% | 460.66 | 150.79 | 157.78 | 65.75% | 752.09 | 786.39 | 788.87 | -36.78   | 0.06% | 0.05% | 16.67% | 1934
case1   | 3,568 | 8   | 315.43 | 93.79  | 92.56  | 70.66% | 252.87 | 92.45  | 91.82  | 63.69% | 505.94 | 544.03 | 545.76 | -39.82   | …     | 0.10% | 37.50% | 33
case2   | 2,…   | …   | …      | …      | …      | …      | …      | …      | …      | …      | …      | …      | …      | …        | …     | 0.09% | 25.00% | 27
case3   | 3,…   | …   | …      | …      | …      | …      | …      | …      | …      | …73%   | 480.43 | 531.30 | 522.07 | -41.64   | 0.11% | 0.08% | 27.27% | 31
case4   | 3,028 | 4   | 317.86 | 90.20  | 90.20  | 71.62% | 188.17 | 87.10  | 87.00  | 53.77% | 498.45 | 532.76 | 532.76 | -34.31   | 0.12% | 0.09% | 25.00% | 26
case5   | 1,007 | 4   | 248.52 | 45.81  | 45.…   | …      | …      | …      | …      | …      | …      | …      | …      | …        | …     | 0.28% | 22.22% | 7
Avg     |       |     |        |        |        | …      |        |        |        | 63.92% |        |        |        | …        | …     | 0.12% | 25.61% | 343

VI. CONCLUSIONS

In this paper, we have proposed efficient ways to optimize clock skew under the complicated power modes of an SoC design. Our methodology consists of a chip-level CTO and a module-level CTO. We also present a flow that fits into a current design flow. Our experiments show that both the PMAB and the modified PMAB approaches dramatically improve the clock skew while incurring very little additional area overhead for designs with complicated power modes. Compared with PMAB, the modified PMAB approach uses less area and adds less latency, while still maintaining the quality of results.

REFERENCES

[1] P. Ampadu, "Ultra-low voltage VLSI: are we there yet?", in Proc. of ISCAS, 2006.
[2] K. D. Boese and A. B. Kahng, "Zero-skew clock routing trees with minimum wirelength," in Proc. of IEEE 5th Int. ASIC Conf.
[3] T. H. Chao, Y. C. Hsu, J. M. Ho, K. D. Boese and A. B. Kahng, "Zero skew clock routing with minimum wire length," in IEEE Trans. on Circuits Systems, vol. 39.
[4] C. C. N. Chu and D. F. Wong, "An efficient and optimal algorithm for simultaneous buffer and wire sizing," in IEEE Trans. on Computer-Aided Design, vol. 18, Sept.
[5] C. M. Chang, S. H. Huang, Y. K. Ho, J. Z. Lin, H. P. Wang and Y. S. Lu, "Type-matching clock tree for zero skew clock gating," in Proc. of DAC, 2008.
[6] J. Cong and K. S. Leung, "Optimal wiresizing under the distributed Elmore delay model," in IEEE Trans. on CAD, vol. 14, Mar.
[7] M. P. Desai, R. Cvijetic, and J. Jensen, "Sizing of clock distribution networks for high performance CPU chips," in Proc. of DAC.
[8] E. G. Friedman, "Clock distribution networks in synchronous digital integrated circuits," in Proc. IEEE, vol. 89, May.
[9] M. Hashimoto, T. Yamamoto, and H. Onodera, "Statistical analysis of clock skew variation in H-tree structure," in Proc. of ISQED.
[10] J. L. Neves and E. G. Friedman, "Optimal clock skew scheduling tolerant to process variations," in Proc. of DAC, June 1996.
[11] P. Mahoney, E. Fetzer, B. Doyle and S. Naffziger, "Clock Distribution on a Dual-Core, Multi-Threaded Itanium-Family Processor," in IEEE ISSCC.
[12] U. Padmanabhan, J. M. Wang, and J. Hu, "Statistical clock tree routing for robustness to process variations," in Proc. of ISPD.
[13] J. Pangjun and S. S. Sapatnekar, "Low-power clock distribution using multiple voltages and reduced swings," in IEEE Trans. on VLSI, vol. 10, Jun.
[14] S. Pullela, N. Menezes and L. T. Pillage, "Reliable non-zero skew clock tree using wire width optimization," in Proc. of DAC.
[15] A. Rajaram, J. Hu, and R. Mahapatra, "Reducing clock skew variability via cross links," in Proc. of DAC, June 2004.
[16] A. Rajaram and D. Z. Pan, "Robust chip-level clock tree synthesis for SOC designs," in Proc. of DAC, 2008.
[17] H. Su and S. S. Sapatnekar, "Hybrid structured clock network construction," in Proc. of ICCAD, 2001.
[18] J. L. Tsai, T. H. Chen, and C. C. Chen, "Zero skew clock-tree optimization with buffer insertion/sizing and wire sizing," in IEEE Trans. on CAD, vol. 23, April.
[19] K. Wang, Y. Ran, and M. Marek-Sadowska, "General skew constrained clock network sizing based on sequential linear programming," in IEEE Trans. on CAD, vol. 24, May.
[20] Q. Zhu and W. W. M. Dai, "High-speed clock network sizing optimization based on distributed RC and lossy RLC interconnect models," in IEEE Trans. on CAD, vol. 15, Sep.
[21] pdf
[22] "A practical guide to low-power design," Power Forward Initiative (PFI).
[23] pdf
[24] International Technology Roadmap for Semiconductors (ITRS), 2007 Edition.
[25] IEEE 1481 Standard for Integrated Circuit (IC) Delay and Power Calculation System.
