Power-Optimal Pipelining in Deep Submicron Technology Λ

Size: px
Start display at page:

Download "Power-Optimal Pipelining in Deep Submicron Technology Λ"

Transcription

1 8. Power-Optimal Pipelining in Deep Submicron Technology Λ Seongmoo Heo and Krste Asanović MIT Computer Science and Artificial Intelligence Laboratory Vassar Street, Cambridge, MA 9 fheomoo,krsteg@csail.mit.edu ABSTRACT This paper explores the effectiveness of pipelining as a power saving tool, where the reduction in logic depth per stage is used to reduce supply voltage at a fixed clock frequency. We examine poweroptimal pipelining in deep submicron technology, both analytically and by simulation. Simulation uses a nm predictive process with a fanout-of-four ierter chain model including input/output flipflops, and results are shown to match theory well. The simulation results show that power-optimal logic depth is to 8 FO and optimal power saving varies from to 8% compared to a FO logic depth, depending on threshold voltage,, and presence of clock-gating. We decompose the power consumption of a circuit into three components, switching power, leakage power, and idle power, and present the following insights into power-optimal pipelining. First, power-optimal logic depth decreases and s increase for larger s, where switching power dominates over leakage and idle power. Second, pipelining is more effective with lower threshold voltages at high s, but higher threshold voltages give better results at lower s where leakage current dominates. Lastly, clock-gating enables deeper pipelining and more power saving because it reduces timing element overhead when the is low. Categories and Subject Descriptors: B.. [Integrated Circuits]: Types and Design Styles Advanced Technologies, Microprocessors and Microcomputers, VLSI General Terms:Performance, Design, Theory Keywords:Pipelining, Supply Voltage Reduction, Power Scaling. INTRODUCTION Pipelining reduces the number of logic levels between registers and is usually employed by digital systems designers to increase achievable clock frequency. But the time slack obtained from pipelining can also be used to reduce power consumption by low- Λ This work was partly funded by NSF CAREER award CCR- 9, NSF ITR award CCR-9, and a donation from Intel Corporation. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED, August 9,, Newport Beach, California, USA. Copyright ACM //8...$.. ering supply voltage at a fixed clock frequency. This technique can be very effective for digital systems with fixed throughput requirements and highly parallel computations. Supply voltage scaling is by far one of the most effective techniques for trading time slack for power. Supply voltage reduction leads to a quadratic reduction in active power and also a super-linear reduction in leakage power, as leakage current has a strong dependency on drain voltage in deep submicron processes. A parallel architecture could also be used to provide excess performance to trade for lower power, but pipelining has the advantage of a lower area penalty. Power reductions from pipelining are eventually limited by the power overhead of the additional pipeline latches or flip-flops required for each additional pipe stage, leading to a power-optimal level of pipelining. In this paper, we show how power-optimal pipelining varies for different operating regimes in deep submicron technology. We examine the tradeoffs between pipeline depth, supply voltage, threshold voltage, and total power using circuit-level simulations and analytical models. We also explore the effect of and clock gating.. RELATED WORK The trend towards deeper pipelines in microprocessors is clearly seen in the evolution of Intel x8 family, with a factor of reduction in logic depth per stage over the last decade [9]. This reduction in logic depth has combined with improvements in transistor speed from technology scaling to yield an even larger increase in processor clock frequency. Increasing the number of pipeline stages for an operation increases its latency in clock cycles, which in turn increases the number of pipeline stalls experienced by dependent operations. The resulting reduction in instructions completed per cycle (IPC) reduces the performance advantage from greater clock frequency, with greater impact on codes with lower instructionlevel parallelism (ILP). Processor architects have explored this tradeoff between increased clock frequency and reduced IPC to determine performance-optimal pipelining depth. Early work by Kunkel and Smith [] considered pipelining in vector supercomputers and found that 8 ECL gate levels was performance-optimal for scalar code, and as little as gate levels for more parallel vector code. Recently, several authors have iestigated the performanceoptimal pipeline depth for superscalar microprocessors [, 9, ], with a consensus in the range of 8 FO delays for SPEC integer codes and around FO delays for SPEC floating-point codes, which generally have higher ILP. These performance-optimal numbers ignore power as well as the design and verification complexity that would accompany such high-frequency designs (roughly twice the clock rate of existing systems []). Several authors have extended superscalar performance mod- 8

2 els with power models that include the power overhead of additional pipeline latches [, ]. Srinivasan et al. [] found that power-performance optimal logic depth increases to about 8 FO for SPEC benchmarks and around 8 FO for TPC-C, a commercial application. Hartstein and Puzak [] found. FO is the power-performance optimum according to their powerperformance metric. They also found that clock gating pushes the optimum back to deeper pipelines [] which agrees with our results. This previous work focuses on processor performance, where limited instruction parallelism reduces the benefits of deep pipelines, and these studies limit power optimization to the selection of the correct number of additional pipeline stages. Other types of digital system, including digital signal processors, network processors, and graphics engines, have much greater levels of parallelism and often have fixed throughput requirements. For these systems, pipelining can be used together with voltage and threshold scaling to reduce total energy consumption while maintaining a fixed clock rate. The use of pipelining for power reduction was proposed by Chandrakasan et al.[] but without an attempt to determine the power-optimal pipelining strategy.. METHODOLOGY In this paper, our main target is a logic-dominant pipeline stage. We make several simplifying assumptions in our analysis. We are interested in fixed-throughput designs for highly parallel computations and so do not include any performance loss from an increased frequency of pipeline stalls as pipeline depths increase. Global wire delay does not scale as fast as gate delay as feature size is reduced, and some modern microprocessors have so-called drive stages which include only wires and repeaters [8]. We leave wire-dominant pipeline stages for future work but note that wire RC delay become relatively less important as supply voltage is scaled down in a fixed technology, because wire resistance remains constant while effective transistor resistance increases. We do not include local wire capacitance due to the absence of detailed circuit layouts, but note that wire cap can be an important component of total load in deep submicron technology even for a logic-dominant stage. INPUT stages OUTPUT Figure shows the baseline pipeline stage model assumed in our study. To model a well-designed path in a circuit, we use a simple static ierter chain with each ierter driving four copies of itself to yield a FO load. We use FO delays as a baseline clock period, representing a current high-performance processor circuit (the high-frequency Pentium- has a FO cycle time [], and most other designs have somewhat shallower pipelines). Even though different circuit styles and logic gates might lead to different power-optimal pipelining results, we assume that our FO ierter chain model is fairly representative and insights gathered from our simulation results can be applied to other cases. Flipflops were chosen as the timing elements rather than latches due to their simplicity of usage, and the PowerPC transmission-gate flipflop was chosen because it is a popular choice due to its robustness and energy-efficiency []. While the transistor sizes of ierters and flip-flops were fixed, the sizes of clock buffers were varied to ensure the appropriate clock rise and fall times when varying the depth of pipelining. We used the BPTM nm transistor models with different threshold voltages [] and HSPICE for circuit simulation. Throughout the paper, clock frequency was fixed at GHz and temperature was constant at o C. We only considered subthreshold leakage; although gate leakage might become significant at some point in these technology generations, it is also likely that new gate dielectrics will make gate leakage insignificant again.. PIPELINING AND SUPPLY VOLTAGE We begin by showing the effect of pipeline depth on supply voltage. With delay fixed, supply voltage scales down as pipelining deepens because the logic amount per pipeline stage decreases. Synchronous circuit delay is approximately given by delay / (N + k) V dd (V dd V th ) ff () where N is the logic depth per pipeline stage in term of FO delay (or the ierters per pipeline stage), k is the timing element delay normalized by FO delay, ff is a velocity saturation effect factor, V dd and V th are supply and threshold voltages respectively. Assuming ff is (actual value of ff in deep submicron technology is close to. due to the short-channel effect), N + k / V dd V th + V th V dd () Now assuming V th V dd is close to zero, we get a simple linear equation between V dd and N, wherea is a constant: V dd = a N + a () ( a = k + V th ) () a a Figure shows the simulated supply voltages when varying the ierters per stage for different threshold voltages. LV T, MV T, andhv T represent low, medium, and high threshold voltages respectively and their values are shown in Table. Low threshold voltage results in low supply voltage for the same delay. a and a were calculated using least square method and shown in Table. We can see that a is proportional to V th as well as a (our simplified equations fail to explain the effect). Figure : Baseline pipeline stage model. Input and clock buffers are not shown..9.8 Vdd (V) Figure : Supply voltage scaling shown as voltage required to achieve GHz with given logic levels per pipeline stage. 9

3 V th (V) NMOS(PMOS) name a a. (-.)...9 (-.)..9. (-.).8.89 Table : Threshold voltages and supply voltage scaling coefficients.. PIPELINING POWER COMPONENTS In this section, we explore the impact of pipelining on the components of total power consumption when delay is fixed. We use the supply voltage scaling results shown above in Section and iestigate switching, leakage, and idle components of power consumption assuming no clock-gating mechanism.. Pipelining and Switching Power Switching power remains the dominant component of total power consumption when the is high, even in leaky deep submicron technology. Switching power is the power consumed while charging and discharging load capacitances. The load capacitances include transistor parasitic and wire capacitances. Because we assume our pipeline stage is logic-dominant, wire capacitances are not included in our simulation. The switching power of a pipelined logic stage can be divided between that due to logic gates and that due to timing elements, and can be modeled as: P switching = (b + b N )V dd () = b a( + b a )(N + ) () b N a The overhead includes clock and switching power of timing elements and it is iersely proportional to, N, the number of logic gates per stage. We assume that the number of latches increases linearly with the number of pipeline stages. All the switching power components are proportional to Vdd. The ratio of switching power coefficients b b is approximately the ratio of the parasitic capacitances of one FO ierter and one timing element. When N is much greater than a a and b b, P switching becomes quadratic to N as shown in Eq., which represents the dominance of logic gate switching power. P switching ß b a N () On the other hand, if N gets much smaller than a a and b b, P switching becomes iersely proportional to N, as shown in Eq. 8, which represents the dominance of timing element switching power: P switching ß b a N Note that the (N + a a ) term in Eq. makes relative P switching scaledownslowlywhen a a is large. Since a higher V th process has a higher a a, as shown in Table, a higher V th process gets less switching power saving from pipelining. The optimal logic depth N Λ is given by: s N Λ = ( b b (8) +8 a a b b b b ) (9) The equation indicates that the capacitance ratio of a timing element and an FO ierter b b is positively correlated to N Λ.Thatis, larger timing element parasitics lead to less deep pipelining. However, N Λ is not as sensitive to b b as it is to a a. This means that V th and timing element delay k affect N Λ and correspondingly optimal power saving more significantly (Eq. ) Figure : Switching power scaling. Figure shows the switching power obtained through HSPICE simulation of our model pipeline stage. Optimal logic depth N Λ was and was from 9 to 8% compared to the baseline of N =. The graphs show that lower threshold voltages gives slightly lower optimal logic depth and also slightly greater switching power saving. The variances are quite small since the variance of a a is small.. Pipelining and Leakage Power The rapid reduction in gate length and accompanying downscaling of threshold voltages over the last few process generations has led to an exponential growth in leakage power. Within a few process generations, it is predicted power dissipation from static leakage current could be comparable to dynamic switching energy [, ]. The leakage power of our pipelined circuit can be given by the following equations: P leak =(c + c + V dd N )V dd e T () a c a (N+ a = c a e T ( + )(N + ) e T a ) () c N a where T is a constant representing leakage current slope, is a Drain-Induced-Barrier-Lowering (DIBL) coefficient, and c c is the ratio of leakage power of one FO ierter versus one timing element. As in the switching power model, the leakage power in a stage can be divided into logic gate leakage and timing element leakage, with timing element leakage iersely proportional to N. When N is much greater than a a and c c, P leak becomes proportional to the product of N and the exponential term e N T a : P leak a N ß c a e T Ne T a (N+ a T () a The exponential term, e ) represents the dependence of leakage current on the drain voltage (from DIBL). In modern deep submicron technology, for an appropriate supply voltage range, this term is larger than O() but smaller than O(N ), therefore leakage power is reduced in a super-linear fashion as N decreases, though less than the quadratic reduction for switching power. Also, it is noted that the exponential term scales down faster as N decreases when a is larger. A higher V th process has higher a as shown in Table, and so it is expected that higher V th process will see greater leakage power saving from pipelining, which is the

4 opposite to the switching power case, but higher V th processes have less absolute leakage to begin with. On the other hand, if N becomes much smaller than a a and c c, P leak becomes iersely proportional to N just as in the P switching case: P leak ß c a e T N e T () Figure shows the simulated leakage power while varying the number of logic gates per stage. Optimal logic depth N Λ was around six and was around %. The graphs show that lower threshold voltages gives less leakage power saving and slightly greater optimal logic depth a.. Figure : Leakage power versus logic depth per stage.. Idle Power without Clock-Gating Clock-gating is a popular switching power reduction technique which inactivates the clock signal to timing elements within an inactive block when a circuit block is idle. But clock gating is not always possible due to the increase control complexity or the insufficient setup time of the clock enable signal. This section focuses on the impact of pipelining on an idle pipeline stage without clockgating. The following section discusses the effects of clock-gating. The following equations model idle power with no clock-gating mechanism as simply the sum of the switching power of the timing elements and the total leakage power. Because of the exponential dependency of leakage current on V th as represented in the e T term, P idle approximately follows the switching power of the timing elements when V th is high and follows the total leakage power when V th is low. P idle = b N V dd +(c + c + V dd N )V dde T () P idle = b a a (N + N ) + () a c a e T ( + c a a (N+ a )(N + )e T a () ) c N a When N is much greater than a a and c c, P idle becomes proportional to the product of N and the exponential function of N or just proportional to N, depending on V th : P idle (HighV th ) ß b a N () P idle (LowV th ) ß c a e T Ne a N T (8) When V th is high, P idle shows a linear reduction as N decreases, which is slower than a quadratic reduction as in switching power or a super-linear reduction as in leakage power. Thus, we can expect that idle power savings from pipelining are lower than those of switching and leakage power saving when V th is high. On the other hand, if N is much smaller than a a and c c, P idle becomes iersely proportional to N: P idle (HighV th ) ß b a N (9) P idle (LowV th ) ß c a e T N e T () Figure shows the simulated idle power without clock-gating, varying the ierters per pipeline stage. Optimal logic depth N Λ was 8, which is greater than the optimal logic depths for switching and leakage power. Also, was smaller ( to %) compared to the switching and leakage power cases. For idle stages, the overhead of timing elements is more significant compared to active stages. The graphs show that lower threshold voltages gives more idle power saving and slightly lower optimal logic depth Figure : Idle power scaling. a... RESULTS In this section, we combine the results for the individual power components to calculate optimal logic depths and optimal power savings for different operating regimes including threshold voltage,, and presence of clock-gating. Power-optimal pipelining varies depending on and V th because these change the proportion of switching power and leakage power (or idle power with no clock-gating mechanism), and each impacts pipelining power differently as seen in Section. This section is divided into two parts: the first part details the case when there is a clock-gating mechanism for pipeline stages and the second part considers the case without clock-gating.. Case : Clock-Gating Present Figure shows the simulated total power when a clock-gating mechanism is present for different s. With a low activity factor, total power curves follow leakage power curves and high V th leads to more power saving by pipelining. As the activity factor increases, total power curves follow switching power curves and high V th leads to less power saving by pipelining. Figure shows the simulated optimal total power saving when a clock-gating mechanism is present. With zero, optimal power savings compared to a FO design vary from

5 to % depending on V th. Since switching power savings from pipelining are less dependent upon V th, s reach around 8% regardless of V th as increases. Because both switching power and leakage power are minimized when N is as seen in Section. and Section., optimal logic depth was found to be regardless of or threshold voltage when a clock-gating mechanism is present. However, as seen in Figure and Figure, both switching and leakage power curves are quite flat around the optimum and power saving by pipelining is quite insensitive to modest deviations from the optimum. Therefore, 8 FO delays per stage might be a better choice since it simplifies design complexity with a small loss of power saving = = = =. Figure : Total power scaling with a clock-gating mechanism. 8 8 base base base pipe pipe pipe Figure : Optimal power saving with a clock-gating mechanism.. Case : No Clock-Gating Present Figure 8 shows the simulated total power without clock-gating for different s. With a low, total power curves follow idle power curves and low V th leads to more power saving (Section.). As the increases, total power curves follow switching power curves. Figure 9 shows the simulated optimal total power saving when there is no clock-gating mechanism. With zero, optimal power savings are around to % less than the clock-gating present case because of the timing element switching power overhead which is not present when there is a clock-gating scheme. Optimal power savings reach 8% slowly as increases compared to the clock-gated case. It is noted that low V th gets the most power saving regardless of. Figure shows the optimal logic depths when clock is not gated for different threshold voltages. Because the idle power is minimized when N is 8 (Section.), optimal logic depths remain at 8 until reaches around. (. at high V th )andafter. (. at high V th ), it falls to..8. = = = =. Figure 8: Total power scaling with no clock-gating mechanism. 8 8 base base base pipe pipe pipe Figure 9: Optimal power saving with no clock-gating mechanism.. DISCUSSION Our study has a number of limitations. The number of latches was assumed to grow linearly with the number of pipeline stages, whereas previous authors have used a superlinear latch count scaling formula of the form N, with an exponent ß : [, ]. It is not clear how latch counts scale in highly parallel architectures, but larger values of would increase the optimal logic depth. Depending on the computation being parallelized, additional state in the form of larger memory arrays might be required to track the increased number of operations in flight. A growth in the size of these memory structures would tend to increase energy per operation and hence increase optimal logic depth per stage, though

6 8 low Vt. 8 medium Vt. 8 high Vt. Figure : Optimal logic depth with no clock-gating mechanism. we expect this effect to be minor as memories are generally lower power than processing units. Our study did not include the effects of glitching on power. Others have noted that glitching activity reduces linearly with pipeline depth as it becomes less likely that inputs to a gate would have very different path lengths []. This effect would tend to push the optimum towards shallower pipeline stages. We did not include parasitic wire capacitance. Adding wire load capacitance to our model will increase total switching power, and so will again push the optimum towards shallower pipeline stages. For deeply pipelined circuits, fast path problems are more likely, as there will be an increase in the number of short logic paths between timing elements and an decrease in the relative wire delay. Because clock frequency is not increased, clock skew and jitter problems are not as apparent as in a frequency-scaled design, but clock jitter might increase as power supply to the clock drivers is reduced. One benefit of supply scale-down is that wire delay becomes relatively less significant as gates slow down. This helps reduce some of the design effort of building a highly pipelined circuit compared with pipelining for increased clock frequency. 9. REFERENCES [] A. Chandrakasan, W. J. Bowhill, and F. Fox. Design of High Performance Microprocessor Circuits. IEEE Press,. [] A. Chandrakasan et al. Low-power CMOS digital design. IEEE JSSC, (): 8, Apr. 99. [] V. De and S. Borkar. Technology and design challenges for low power and high performance. In ISLPED, pages 8, 999. [] Device Group at UC Berkeley. Predictive technology model. Technical report, UC Berkeley,. ptm/. [] A. Hartstein and T. Puzak. The optimum pipeline depth for a microprocessor. In ISCA 9, pages, May. [] A. Hartstein and T. Puzak. Optimum power/performance pipeline depth. In MICRO, Dec.. [] S.Heo,R.Krashinsky,andK.Asanović. Activity-sensitive flip-flop and latch selection for reduced energy. In 9th Conference on Advanced Research in VLSI, Salt Lake City,UT USA, March. [8] G. Hinton et al. A.8 μm CMOS IA- processor with a -GHz integer execution unit. IEEE JSSC, ():, Nov.. [9] M. Hrishikesh et al. The optimal logic depth per pipeline stage is to 8 FO ierter delays. In ISCA 9, pages, May. [] S. R. Kunkel and J. E. Smith. Optimal pipelining in supercomputers. In Proceedings th Symposium on Computer Architecture, pages, Tokyo, Japan, June 98. [] E. Sprangle and D. Carmean. Increasing processor performance by implementing deeper pipelines. In ISCA 9, pages, May. [] V. Srinivasan et al. Optimizing pipelines for power and performance. In MICRO, Nov.. 8. CONCLUSIONS Pipelining can be an effective power-reduction tool when used to support voltage scaling in digital systems implementing highly parallel computations. Simulation results show that power-optimal logic depth is to 8 FO and varies from to 8% compared with a FO design depending on threshold voltage,, and the presence of clock-gating. Even though the exact power-optimal pipelining is technologydependent, we can gain some important insights from the simulation results. First, higher s decrease the poweroptimal logic depth and increase the because pipelining is most effective at saving the additional switching power. Second, pipelining is more effective with lower threshold voltages, resulting in lower logic depths and lowest power, except for low s when leakage power is dominant. Third, clock-gating enables deeper pipelining and more power saving because it reduces timing element overhead when is low. Therefore, power-optimal pipelining with clock gating should be an efficient low-power technique for high throughput blocks in systems implementing highly parallel computations.

Power-Optimal Pipelining in Deep Submicron Technology

Power-Optimal Pipelining in Deep Submicron Technology PowerOptimal Pipelining in Deep Submicron Technology Seongmoo Heo and Krste Asanović MIT Computer Science and Artificial Intelligence aboratory Vassar Street, Cambridge, MA heomoo,krste csailmitedu ABSTRACT

More information

Total Power-Optimal Pipelining and Parallel Processing under Process Variations in Nanometer Technology

Total Power-Optimal Pipelining and Parallel Processing under Process Variations in Nanometer Technology otal Power-Optimal Pipelining and Parallel Processing under Process ariations in anometer echnology am Sung Kim 1, aeho Kgil, Keith Bowman 1, ivek De 1, and revor Mudge 1 Intel Corporation, Hillsboro,

More information

CSE 548 Computer Architecture. Clock Rate vs IPC. V. Agarwal, M. S. Hrishikesh, S. W. Kechler. D. Burger. Presented by: Ning Chen

CSE 548 Computer Architecture. Clock Rate vs IPC. V. Agarwal, M. S. Hrishikesh, S. W. Kechler. D. Burger. Presented by: Ning Chen CSE 548 Computer Architecture Clock Rate vs IPC V. Agarwal, M. S. Hrishikesh, S. W. Kechler. D. Burger Presented by: Ning Chen Transistor Changes Development of silicon fabrication technology caused transistor

More information

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 09, 2016 ISSN (online): 2321-0613 A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM Yogit

More information

Cache Implications of Aggressively Pipelined High Performance Microprocessors

Cache Implications of Aggressively Pipelined High Performance Microprocessors Cache Implications of Aggressively Pipelined High Performance Microprocessors Timothy J. Dysart, Branden J. Moore, Lambert Schaelicke, Peter M. Kogge Department of Computer Science and Engineering University

More information

The Impact of Wave Pipelining on Future Interconnect Technologies

The Impact of Wave Pipelining on Future Interconnect Technologies The Impact of Wave Pipelining on Future Interconnect Technologies Jeff Davis, Vinita Deodhar, and Ajay Joshi School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30332-0250

More information

POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS

POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS Radu Zlatanovici zradu@eecs.berkeley.edu http://www.eecs.berkeley.edu/~zradu Department of Electrical Engineering and Computer Sciences University

More information

VLSI microarchitecture. Scaling. Toni Juan

VLSI microarchitecture. Scaling. Toni Juan VLSI microarchitecture Scaling Toni Juan Short index Administrative glass schedule projects? Topic of the day: Scaling VLSI microarchitecture Toni Juan, Nov 2000 2 Administrative Class Schedule Day Topic

More information

Optimizing Pipelines for Power and Performance

Optimizing Pipelines for Power and Performance Optimizing Pipelines for Power and Performance Viji Srinivasan, David Brooks, Michael Gschwind, Pradip Bose, Victor Zyuban, Philip N. Strenski, Philip G. Emma IBM T.J. Watson Research Center Yorktown Heights,

More information

ECE 637 Integrated VLSI Circuits. Introduction. Introduction EE141

ECE 637 Integrated VLSI Circuits. Introduction. Introduction EE141 ECE 637 Integrated VLSI Circuits Introduction EE141 1 Introduction Course Details Instructor Mohab Anis; manis@vlsi.uwaterloo.ca Text Digital Integrated Circuits, Jan Rabaey, Prentice Hall, 2 nd edition

More information

ECE 486/586. Computer Architecture. Lecture # 2

ECE 486/586. Computer Architecture. Lecture # 2 ECE 486/586 Computer Architecture Lecture # 2 Spring 2015 Portland State University Recap of Last Lecture Old view of computer architecture: Instruction Set Architecture (ISA) design Real computer architecture:

More information

Design of Low Power Wide Gates used in Register File and Tag Comparator

Design of Low Power Wide Gates used in Register File and Tag Comparator www..org 1 Design of Low Power Wide Gates used in Register File and Tag Comparator Isac Daimary 1, Mohammed Aneesh 2 1,2 Department of Electronics Engineering, Pondicherry University Pondicherry, 605014,

More information

Device And Architecture Co-Optimization for FPGA Power Reduction

Device And Architecture Co-Optimization for FPGA Power Reduction 54.2 Device And Architecture Co-Optimization for FPGA Power Reduction Lerong Cheng, Phoebe Wong, Fei Li, Yan Lin, and Lei He Electrical Engineering Department University of California, Los Angeles, CA

More information

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Clock Routing Problem Formulation Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Better to develop specialized routers for these nets.

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Delay Modeling and Static Timing Analysis for MTCMOS Circuits

Delay Modeling and Static Timing Analysis for MTCMOS Circuits Delay Modeling and Static Timing Analysis for MTCMOS Circuits Naoaki Ohkubo Kimiyoshi Usami Graduate School of Engineering, Shibaura Institute of Technology 307 Fukasaku, Munuma-ku, Saitama, 337-8570 Japan

More information

Microprocessor Trends and Implications for the Future

Microprocessor Trends and Implications for the Future Microprocessor Trends and Implications for the Future John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 4 1 September 2016 Context Last two classes: from

More information

Enhanced Multi-Threshold (MTCMOS) Circuits Using Variable Well Bias

Enhanced Multi-Threshold (MTCMOS) Circuits Using Variable Well Bias Bookmark file Enhanced Multi-Threshold (MTCMOS) Circuits Using Variable Well Bias Stephen V. Kosonocky, Mike Immediato, Peter Cottrell*, Terence Hook*, Randy Mann*, Jeff Brown* IBM T.J. Watson Research

More information

The Optimum Pipeline Depth for a Microprocessor

The Optimum Pipeline Depth for a Microprocessor The Optimum Pipeline Depth for a Microprocessor A. Hartstein and Thomas R. Puzak IBM- T. J. Watson Research Center Yorktown Heights, NY 10598 hart@ watson, ibm. corn trpuzak @ us. ibrn.com Abstract The

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

Design and Low Power Implementation of a Reorder Buffer

Design and Low Power Implementation of a Reorder Buffer Design and Low Power Implementation of a Reorder Buffer J.D. Fisher, C. Romo, E. John, W. Lin Department of Electrical and Computer Engineering, University of Texas at San Antonio One UTSA Circle, San

More information

Power Optimization in FPGA Designs

Power Optimization in FPGA Designs Mouzam Khan Altera Corporation mkhan@altera.com ABSTRACT IC designers today are facing continuous challenges in balancing design performance and power consumption. This task is becoming more critical as

More information

Advanced Computer Architecture (CS620)

Advanced Computer Architecture (CS620) Advanced Computer Architecture (CS620) Background: Good understanding of computer organization (eg.cs220), basic computer architecture (eg.cs221) and knowledge of probability, statistics and modeling (eg.cs433).

More information

Power Measurement Using Performance Counters

Power Measurement Using Performance Counters Power Measurement Using Performance Counters October 2016 1 Introduction CPU s are based on complementary metal oxide semiconductor technology (CMOS). CMOS technology theoretically only dissipates power

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

FPGA Power Management and Modeling Techniques

FPGA Power Management and Modeling Techniques FPGA Power Management and Modeling Techniques WP-01044-2.0 White Paper This white paper discusses the major challenges associated with accurately predicting power consumption in FPGAs, namely, obtaining

More information

10. Interconnects in CMOS Technology

10. Interconnects in CMOS Technology 10. Interconnects in CMOS Technology 1 10. Interconnects in CMOS Technology Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October

More information

EE586 VLSI Design. Partha Pande School of EECS Washington State University

EE586 VLSI Design. Partha Pande School of EECS Washington State University EE586 VLSI Design Partha Pande School of EECS Washington State University pande@eecs.wsu.edu Lecture 1 (Introduction) Why is designing digital ICs different today than it was before? Will it change in

More information

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,

More information

Transistors and Wires

Transistors and Wires Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis Part II These slides are based on the slides provided by the publisher. The slides

More information

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry High Performance Memory Read Using Cross-Coupled Pull-up Circuitry Katie Blomster and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 84 CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 3.1 INTRODUCTION The introduction of several new asynchronous designs which provides high throughput and low latency is the significance of this chapter. The

More information

Columbia Univerity Department of Electrical Engineering Fall, 2004

Columbia Univerity Department of Electrical Engineering Fall, 2004 Columbia Univerity Department of Electrical Engineering Fall, 2004 Course: EE E4321. VLSI Circuits. Instructor: Ken Shepard E-mail: shepard@ee.columbia.edu Office: 1019 CEPSR Office hours: MW 4:00-5:00

More information

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Practical Information

EE241 - Spring 2000 Advanced Digital Integrated Circuits. Practical Information EE24 - Spring 2000 Advanced Digital Integrated Circuits Tu-Th 2:00 3:30pm 203 McLaughlin Practical Information Instructor: Borivoje Nikolic 570 Cory Hall, 3-9297, bora@eecs.berkeley.edu Office hours: TuTh

More information

Low-Power Programmable FPGA Routing Circuitry

Low-Power Programmable FPGA Routing Circuitry 1 Low-Power Programmable FPGA Routing Circuitry Jason H. Anderson, Member, IEEE, and Farid N. Najm, Fellow, IEEE Abstract We consider circuit techniques for reducing FPGA power consumption and propose

More information

CS/EE 6810: Computer Architecture

CS/EE 6810: Computer Architecture CS/EE 6810: Computer Architecture Class format: Most lectures on YouTube *BEFORE* class Use class time for discussions, clarifications, problem-solving, assignments 1 Introduction Background: CS 3810 or

More information

LOW-POWER PROCESSOR DESIGN

LOW-POWER PROCESSOR DESIGN LOW-POWER PROCESSOR DESIGN Ricardo E. Gonzalez Technical Report No. CSL-TR-97-726 June 1997 This research has been supported by ARPA contract FBI-92-194. LOW-POWER PROCESSOR DESIGN Ricardo E. Gonzalez

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

Lecture 5. Other Adder Issues

Lecture 5. Other Adder Issues Lecture 5 Other Adder Issues Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu Copyright 24 by Mark Horowitz with information from Brucek Khailany 1 Overview Reading There

More information

CAD Technology of the SX-9

CAD Technology of the SX-9 KONNO Yoshihiro, IKAWA Yasuhiro, SAWANO Tomoki KANAMARU Keisuke, ONO Koki, KUMAZAKI Masahito Abstract This paper outlines the design techniques and CAD technology used with the SX-9. The LSI and package

More information

Power Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas

Power Solutions for Leading-Edge FPGAs. Vaughn Betz & Paul Ekas Power Solutions for Leading-Edge FPGAs Vaughn Betz & Paul Ekas Agenda 90 nm Power Overview Stratix II : Power Optimization Without Sacrificing Performance Technical Features & Competitive Results Dynamic

More information

DVFS Space Exploration in Power-Constrained Processing-in-Memory Systems

DVFS Space Exploration in Power-Constrained Processing-in-Memory Systems DVFS Space Exploration in Power-Constrained Processing-in-Memory Systems Marko Scrbak and Krishna M. Kavi Computer Systems Research Laboratory Department of Computer Science & Engineering University of

More information

Motivation. Banked Register File for SMT Processors. Distributed Architecture. Centralized Architecture

Motivation. Banked Register File for SMT Processors. Distributed Architecture. Centralized Architecture Motivation Banked Register File for SMT Processors Jessica H. Tseng and Krste Asanović MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, USA BARC2004 Increasing demand on

More information

CS250 VLSI Systems Design Lecture 9: Memory

CS250 VLSI Systems Design Lecture 9: Memory CS250 VLSI Systems esign Lecture 9: Memory John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 CMOS Bistable Flip State 1 0 0 1 Cross-coupled

More information

Lecture 4a. CMOS Fabrication, Layout and Simulation. R. Saleh Dept. of ECE University of British Columbia

Lecture 4a. CMOS Fabrication, Layout and Simulation. R. Saleh Dept. of ECE University of British Columbia Lecture 4a CMOS Fabrication, Layout and Simulation R. Saleh Dept. of ECE University of British Columbia res@ece.ubc.ca 1 Fabrication Fabrication is the process used to create devices and wires. Transistors

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN

International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 938 LOW POWER SRAM ARCHITECTURE AT DEEP SUBMICRON CMOS TECHNOLOGY T.SANKARARAO STUDENT OF GITAS, S.SEKHAR DILEEP

More information

Receiver Modeling for Static Functional Crosstalk Analysis

Receiver Modeling for Static Functional Crosstalk Analysis Receiver Modeling for Static Functional Crosstalk Analysis Mini Nanua 1 and David Blaauw 2 1 SunMicroSystem Inc., Austin, Tx, USA Mini.Nanua@sun.com 2 University of Michigan, Ann Arbor, Mi, USA Blaauw@eecs.umich.edu

More information

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University

Abbas El Gamal. Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program. Stanford University Abbas El Gamal Joint work with: Mingjie Lin, Yi-Chang Lu, Simon Wong Work partially supported by DARPA 3D-IC program Stanford University Chip stacking Vertical interconnect density < 20/mm Wafer Stacking

More information

FPGA Programming Technology

FPGA Programming Technology FPGA Programming Technology Static RAM: This Xilinx SRAM configuration cell is constructed from two cross-coupled inverters and uses a standard CMOS process. The configuration cell drives the gates of

More information

UNTIL VERY recently, only dynamic power has been a. Quantitative Analysis and Optimization Techniques for On-Chip Cache Leakage Power

UNTIL VERY recently, only dynamic power has been a. Quantitative Analysis and Optimization Techniques for On-Chip Cache Leakage Power IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 10, OCTOBER 2005 1147 Quantitative Analysis and Optimization Techniques for On-Chip Cache Leakage Power Nam Sung Kim, David

More information

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

Design and verification of low power SRAM system: Backend approach

Design and verification of low power SRAM system: Backend approach Design and verification of low power SRAM system: Backend approach Yasmeen Saundatti, PROF.H.P.Rajani E&C Department, VTU University KLE College of Engineering and Technology, Udhayambag Belgaum -590008,

More information

Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology

Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology Vol. 3, Issue. 3, May.-June. 2013 pp-1475-1481 ISSN: 2249-6645 Design and Simulation of Low Power 6TSRAM and Control its Leakage Current Using Sleepy Keeper Approach in different Topology Bikash Khandal,

More information

Introduction 1. GENERAL TRENDS. 1. The technology scale down DEEP SUBMICRON CMOS DESIGN

Introduction 1. GENERAL TRENDS. 1. The technology scale down DEEP SUBMICRON CMOS DESIGN 1 Introduction The evolution of integrated circuit (IC) fabrication techniques is a unique fact in the history of modern industry. The improvements in terms of speed, density and cost have kept constant

More information

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),

More information

Defining Wakeup Width for Efficient Dynamic Scheduling

Defining Wakeup Width for Efficient Dynamic Scheduling Defining Wakeup Width for Efficient Dynamic Scheduling Aneesh Aggarwal ECE Depment Binghamton University Binghamton, NY 9 aneesh@binghamton.edu Manoj Franklin ECE Depment and UMIACS University of Maryland

More information

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007 Minimizing Power Dissipation during Write Operation to Register Files Kimish Patel, Wonbok Lee, Massoud Pedram University of Southern California Los Angeles CA August 28 th, 2007 Introduction Outline Conditional

More information

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 8 (2013), pp. 907-912 Research India Publications http://www.ripublication.com/aeee.htm Circuit Model for Interconnect Crosstalk

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 1. Copyright 2012, Elsevier Inc. All rights reserved. Computer Technology Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

Design of a Low Power Content Addressable Memory (CAM)

Design of a Low Power Content Addressable Memory (CAM) Design of a Low Power Content Addressable Memory (CAM) Scott Beamer, Mehmet Akgul Department of Electrical Engineering & Computer Science University of California, Berkeley {sbeamer, akgul}@eecs.berkeley.edu

More information

On Test Generation for Transition Faults with Minimized Peak Power Dissipation

On Test Generation for Transition Faults with Minimized Peak Power Dissipation 30.3 On Test Generation for Transition Faults with Minimized Peak Power Dissipation Wei Li Sudhakar M. Reddy Irith Pomeranz 2 Dept. of ECE School of ECE Univ. of Iowa Purdue University Iowa City, IA 52242

More information

Tailoring Tests for Functional Binning of Integrated Circuits

Tailoring Tests for Functional Binning of Integrated Circuits 2012 IEEE 21st Asian Test Symposium Tailoring Tests for Functional Binning of Integrated Circuits Suraj Sindia Vishwani D. Agrawal Department of Electrical and Computer Engineering Auburn University, Alabama,

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric.

SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, A Low-Power Field-Programmable Gate Array Routing Fabric. SUBMITTED FOR PUBLICATION TO: IEEE TRANSACTIONS ON VLSI, DECEMBER 5, 2007 1 A Low-Power Field-Programmable Gate Array Routing Fabric Mingjie Lin Abbas El Gamal Abstract This paper describes a new FPGA

More information

Cluster-based approach eases clock tree synthesis

Cluster-based approach eases clock tree synthesis Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network

More information

EE5780 Advanced VLSI CAD

EE5780 Advanced VLSI CAD EE5780 Advanced VLSI CAD Lecture 1 Introduction Zhuo Feng 1.1 Prof. Zhuo Feng Office: EERC 513 Phone: 487-3116 Email: zhuofeng@mtu.edu Class Website http://www.ece.mtu.edu/~zhuofeng/ee5780fall2013.html

More information

CAD for VLSI. Debdeep Mukhopadhyay IIT Madras

CAD for VLSI. Debdeep Mukhopadhyay IIT Madras CAD for VLSI Debdeep Mukhopadhyay IIT Madras Tentative Syllabus Overall perspective of VLSI Design MOS switch and CMOS, MOS based logic design, the CMOS logic styles, Pass Transistors Introduction to Verilog

More information

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System 1 1 1 Centip3De: A 64-Core, 3D Stacked, Near-Threshold System Ronald G. Dreslinski David Fick, Bharan Giridhar, Gyouho Kim, Sangwon Seo, Matthew Fojtik, Sudhir Satpathy, Yoonmyung Lee, Daeyeon Kim, Nurrachman

More information

PARE: A Power-Aware Hardware Data Prefetching Engine

PARE: A Power-Aware Hardware Data Prefetching Engine PARE: A Power-Aware Hardware Data Prefetching Engine Yao Guo Mahmoud Ben Naser Csaba Andras Moritz Department of Electrical and Computer Engineering University of Massachusetts, Amherst, MA 01003 {yaoguo,

More information

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Prof. Lei He EE Department, UCLA LHE@ee.ucla.edu Partially supported by NSF. Pathway to Power Efficiency and Variation Tolerance

More information

Static Energy Reduction Techniques for Microprocessor Caches

Static Energy Reduction Techniques for Microprocessor Caches Appears in the Proceedings of the 1 International Conference on Computer Design Static Energy Reduction Techniques for Microprocessor Caches Heather Hanson M.S. Hrishikesh Vikas Agarwal Stephen W. Keckler

More information

Reducing Power with Dynamic Critical Path Information

Reducing Power with Dynamic Critical Path Information Reducing Power with Dynamic Critical Path Information John S. Seng Eric S. Tune Dean M. Tullsen Dept. of Computer Science and Engineering University of California, San Diego La Jolla, CA 9293-4 fjseng,etune,tullseng@cs.ucsd.edu

More information

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad NoC Round Table / ESA Sep. 2009 Asynchronous Three Dimensional Networks on on Chip Frédéric ric PétrotP Outline Three Dimensional Integration Clock Distribution and GALS Paradigm Contribution of the Third

More information

Dynamic Voltage and Frequency Scaling Circuits with Two Supply Voltages

Dynamic Voltage and Frequency Scaling Circuits with Two Supply Voltages Dynamic Voltage and Frequency Scaling Circuits with Two Supply Voltages ECE Department, University of California, Davis Wayne H. Cheng and Bevan M. Baas Outline Background and Motivation Implementation

More information

A Robust Compressed Sensing IC for Bio-Signals

A Robust Compressed Sensing IC for Bio-Signals A Robust Compressed Sensing IC for Bio-Signals Jun Kwang Oh Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2014-114 http://www.eecs.berkeley.edu/pubs/techrpts/2014/eecs-2014-114.html

More information

Reconfigurable Multicore Server Processors for Low Power Operation

Reconfigurable Multicore Server Processors for Low Power Operation Reconfigurable Multicore Server Processors for Low Power Operation Ronald G. Dreslinski, David Fick, David Blaauw, Dennis Sylvester, Trevor Mudge University of Michigan, Advanced Computer Architecture

More information

On the Decreasing Significance of Large Standard Cells in Technology Mapping

On the Decreasing Significance of Large Standard Cells in Technology Mapping On the Decreasing Significance of Standard s in Technology Mapping Jae-sun Seo, Igor Markov, Dennis Sylvester, and David Blaauw Department of EECS, University of Michigan, Ann Arbor, MI 48109 {jseo,imarkov,dmcs,blaauw}@umich.edu

More information

EE241 - Spring 2004 Advanced Digital Integrated Circuits

EE241 - Spring 2004 Advanced Digital Integrated Circuits EE24 - Spring 2004 Advanced Digital Integrated Circuits Borivoje Nikolić Lecture 2 Impact of Scaling Class Material Last lecture Class scope, organization Today s lecture Impact of scaling 2 Major Roadblocks.

More information

High-Performance Full Adders Using an Alternative Logic Structure

High-Performance Full Adders Using an Alternative Logic Structure Term Project EE619 High-Performance Full Adders Using an Alternative Logic Structure by Atulya Shivam Shree (10327172) Raghav Gupta (10327553) Department of Electrical Engineering, Indian Institure Technology,

More information

ADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING

ADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING ADVANCES IN PROCESSOR DESIGN AND THE EFFECTS OF MOORES LAW AND AMDAHLS LAW IN RELATION TO THROUGHPUT MEMORY CAPACITY AND PARALLEL PROCESSING Evan Baytan Department of Electrical Engineering and Computer

More information

THE latest generation of microprocessors uses a combination

THE latest generation of microprocessors uses a combination 1254 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 30, NO. 11, NOVEMBER 1995 A 14-Port 3.8-ns 116-Word 64-b Read-Renaming Register File Creigton Asato Abstract A 116-word by 64-b register file for a 154 MHz

More information

Low Power PLAs. Reginaldo Tavares, Michel Berkelaar, Jochen Jess. Information and Communication Systems Section, Eindhoven University of Technology,

Low Power PLAs. Reginaldo Tavares, Michel Berkelaar, Jochen Jess. Information and Communication Systems Section, Eindhoven University of Technology, Low Power PLAs Reginaldo Tavares, Michel Berkelaar, Jochen Jess Information and Communication Systems Section, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands {regi,michel,jess}@ics.ele.tue.nl

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture The Computer Revolution Progress in computer technology Underpinned by Moore s Law Makes novel applications

More information

Low-Power Technology for Image-Processing LSIs

Low-Power Technology for Image-Processing LSIs Low- Technology for Image-Processing LSIs Yoshimi Asada The conventional LSI design assumed power would be supplied uniformly to all parts of an LSI. For a design with multiple supply voltages and a power

More information

Analytical Modeling of Parallel Systems. To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003.

Analytical Modeling of Parallel Systems. To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. Analytical Modeling of Parallel Systems To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. Topic Overview Sources of Overhead in Parallel Programs Performance Metrics for

More information

Implementation of ALU Using Asynchronous Design

Implementation of ALU Using Asynchronous Design IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 6 (Nov. - Dec. 2012), PP 07-12 Implementation of ALU Using Asynchronous Design P.

More information

Fine-Grained Sleep Transistor Sizing Algorithm for Leakage Power Minimization

Fine-Grained Sleep Transistor Sizing Algorithm for Leakage Power Minimization 6.1 Fine-Grained Sleep Transistor Sizing Algorithm for Leakage Power Minimization De-Shiuan Chiou, Da-Cheng Juan, Yu-Ting Chen, and Shih-Chieh Chang Department of CS, National Tsing Hua University, Hsinchu,

More information

Application of Power-Management Techniques for Low Power Processor Design

Application of Power-Management Techniques for Low Power Processor Design 1 Application of Power-Management Techniques for Low Power Processor Design Sivaram Gopalakrishnan, Chris Condrat, Elaine Ly Department of Electrical and Computer Engineering, University of Utah, UT 84112

More information

ESD Protection Circuits: Basics to nano-metric ASICs

ESD Protection Circuits: Basics to nano-metric ASICs ESD Protection Circuits: Basics to nano-metric ASICs Manoj Sachdev University of Waterloo msachdev@ece.uwaterloo.ca September 2007 1 Outline Group Introduction ESD Basics Basic ESD Protection Circuits

More information

Low-power Architecture. By: Jonathan Herbst Scott Duntley

Low-power Architecture. By: Jonathan Herbst Scott Duntley Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media

More information

Wave-Pipelining the Global Interconnect to Reduce the Associated Delays

Wave-Pipelining the Global Interconnect to Reduce the Associated Delays Wave-Pipelining the Global Interconnect to Reduce the Associated Delays Jabulani Nyathi, Ray Robert Rydberg III and Jose G. Delgado-Frias Washington State University School of EECS Pullman, Washington,

More information

EECS4201 Computer Architecture

EECS4201 Computer Architecture Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis These slides are based on the slides provided by the publisher. The slides will be

More information

Microelettronica. J. M. Rabaey, "Digital integrated circuits: a design perspective" EE141 Microelettronica

Microelettronica. J. M. Rabaey, Digital integrated circuits: a design perspective EE141 Microelettronica Microelettronica J. M. Rabaey, "Digital integrated circuits: a design perspective" Introduction Why is designing digital ICs different today than it was before? Will it change in future? The First Computer

More information

Analysis of Power Consumption on Switch Fabrics in Network Routers

Analysis of Power Consumption on Switch Fabrics in Network Routers Analysis of Power Consumption on Switch Fabrics in Network Routers Terry Tao Ye Computer Systems Lab Stanford University taoye@stanford.edu Luca Benini DEIS University of Bologna lbenini@deis.unibo.it

More information

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 797- flur,chengkokg@ecn.purdue.edu

More information

Energy Characterization of Hardware-Based Data Prefetching

Energy Characterization of Hardware-Based Data Prefetching Energy Characterization of Hardware-Based Data Prefetching Yao Guo, Saurabh Chheda, Israel Koren, C. Mani Krishna, and Csaba Andras Moritz Electrical and Computer Engineering, University of Massachusetts,

More information

A Case for Merge Joins in Mediator Systems

A Case for Merge Joins in Mediator Systems A Case for Merge Joins in Mediator Systems Ramon Lawrence Kirk Hackert IDEA Lab, Department of Computer Science, University of Iowa Iowa City, IA, USA {ramon-lawrence, kirk-hackert}@uiowa.edu Abstract

More information

FROSTY: A Fast Hierarchy Extractor for Industrial CMOS Circuits *

FROSTY: A Fast Hierarchy Extractor for Industrial CMOS Circuits * FROSTY: A Fast Hierarchy Extractor for Industrial CMOS Circuits * Lei Yang and C.-J. Richard Shi Department of Electrical Engineering, University of Washington Seattle, WA 98195 {yanglei, cjshi@ee.washington.edu

More information