Energy Efficiency and Resilience in Future ICs


1 Energy Efficiency and Resilience in Future ICs. Andrew B. Kahng, UCSD VLSI CAD Laboratory, CSE and ECE Departments, University of California, San Diego

2 Outline: The Power Gap; Low-Power Design; Beyond Low-Power: Resilience; Conclusion

3 Part I: Power Crisis. Outline: The Power Gap; Low-Power Design; Beyond Low-Power: Resilience; Conclusion

4 Value From Semiconductor Scaling. Value is enabled by integration: greater utility, less cost. Scaling enables new products. Density: more functions per chip. Device: better performance. [Diagram: Device, Density, Product.] Challenge: maintain scaling of value. Variability: larger design guardband. Cost: technology and design risk. Leakage: waste of increasingly expensive energy.

5 Scaling of Density. Layout density increase. Capacitance density increase (nF/mm^2): capacitance density +10% / year. $P_{dynamic} \propto C V^2 f$. [Figure: transistors per chip on a log scale (1,000 to 10,000,000,000) for processors including Intel Pentium, Pentium II, Pentium III, Pentium 4, Itanium, Itanium 2, Xeon and Dual-Core Itanium; NVIDIA G80, GT200 and GF100; and AMD RV700.]

6 Scaling of Product. [Figure: required performance for multimedia processing (GOPS: Giga Ops/Sec), up to 100 GOPS. Categories: Video, Audio, Voice, Graphics, Communication, Recognition. Workloads shown include JPEG, MPEG1, MPEG2 and MPEG4 extraction and compression, Dolby-AC3, MPEG audio, word recognition, sentence translation, voice auto translation, 2D graphics, 3D graphics (10 Mpps, 100 Mpps), modem, FAX, VoIP modem, SW defined radio, face recognition, voice print recognition, and moving picture recognition.] 2007 ITRS Consumer-Stationary SOC Driver: 220 TFlops on a single chip by ...

7 Scaling of Device. To meet the performance scaling: mobility enhancement; Vdd slowly lowering. $I_{ds} \propto C_{ox} (V_{dd} - V_{th})$. $C_{ox}$ ($\propto 1/t_{ox}$) increasing: $t_{ox}$ lowering, hence gate leakage. $V_{th}$ lowering, hence subthreshold leakage. [Figure: scaling of transistor intrinsic speed of high-performance logic (ITRS 2009, 13%/year).]

8 Scaling of Device

9 The Power Gap. [Diagram: Density (functions, capacitance) and Device (frequency, tox, Vth; delay CV/I) feed the Product; P dynamic and P leakage together produce the Power Crisis.] VDD? Higher VDD: quadratic P dynamic increase. Lower VDD: lower Vth, hence exponential P leakage increase.
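
The VDD dilemma on this slide can be made concrete with a small numeric sketch. It assumes the textbook first-order models $P_{dynamic} \propto C V_{dd}^2 f$ and subthreshold leakage $\propto e^{-V_{th}/(n v_T)}$; the slope factor n = 1.5 and thermal voltage of 26 mV are illustrative assumptions, not values from the talk.

```python
import math

def dynamic_power_ratio(vdd_new, vdd_old):
    """Relative dynamic power at fixed capacitance and frequency (P ~ C * Vdd^2 * f)."""
    return (vdd_new / vdd_old) ** 2

def leakage_ratio(vth_new, vth_old, n=1.5, v_thermal=0.026):
    """Relative subthreshold leakage, assuming I_sub ~ exp(-Vth / (n * vT)).

    n and v_thermal are assumed, illustrative device parameters.
    """
    return math.exp((vth_old - vth_new) / (n * v_thermal))

if __name__ == "__main__":
    # Raising Vdd by 10% costs about 21% more dynamic power.
    print(dynamic_power_ratio(1.1, 1.0))
    # Lowering Vth by 50 mV multiplies leakage several times over.
    print(leakage_ratio(0.25, 0.30))
```

Under these assumptions a modest threshold reduction costs far more in leakage than the same relative change in supply voltage costs in dynamic power, which is the asymmetry the slide points at.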

10 Power Limits the Technology Roadmap. ITRS MPU clock frequency roadmap (ABKGroup since 2001 ITRS). [Figure: normalized frequency and frequency (GHz) by year, with high-performance device intrinsic speed ($1/\tau$, where $\tau = CV/I$). Series: 2001 ITRS; before ITRS 2007; 2007 ITRS; 13% / year (2009 ITRS); 8% / year (2011 ITRS, tentative / planned).] Link: View ITRS MPU Model

11 Outline: The Power Gap (roadmapped by UCSD since 2001); Low-Power Design; Beyond Low-Power: Resilience; Conclusion

12 Digression: Concept of Timing Slack. Many power optimizations convert positive timing slack into power reductions: smaller transistors, area, power, ... But this is not easy! Transistors in positive-slack cells can have higher Vth, larger Lgate, more variation, ... $\text{Slack} = T_{required} - T_{arrival}$. [Figure: a path between two CLK-driven registers, annotated with T arrival and T required.]
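
As a minimal illustration of the slack definition above, the sketch below computes slack per path and keeps the paths with positive slack, which are the candidates for power reduction. The path names and timing values in the usage example are hypothetical.

```python
def slack(t_required, t_arrival):
    """Timing slack in ns: positive means the signal arrives early."""
    return t_required - t_arrival

def positive_slack_paths(paths):
    """paths maps a path name to (t_required, t_arrival) in ns.

    Returns only the paths with positive slack, mapped to their slack.
    """
    result = {}
    for name, (t_required, t_arrival) in paths.items():
        s = slack(t_required, t_arrival)
        if s > 0:
            result[name] = s
    return result

if __name__ == "__main__":
    example = {"p0": (1.0, 0.75), "p1": (1.0, 1.0), "p2": (1.0, 1.25)}
    print(positive_slack_paths(example))
```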

13 Even If We Slow Down Frequency Scaling. Capacitance scaling follows transistor density (2x / 2 years) and Lgate. Frequency scaling: 4% / year. [Figure: normalized frequency, normalized capacitance, and MPU power (W, log scale up to 10,000) by year, against the practical power limit.] Roadmap of low-power techniques: Clock Gating, Multi-Vth, Multi-Core Arch., Lgate Bias*, Power Gating*, Adaptive Body Bias, Multi-Vdd, DVFS*. (* Work at UCSD)

14 Clock Gating: P dynamic Reduction. [Figure, two panels: Dynamic Power Reduction and Leakage Power Reduction, each plotting cumulative efficiency (%) against technology node (nm). Dynamic panel (0% to 9000%): clock gating highlighted; data labels 100%, 300%, 270%, 270%, 525%, 2834%, 8503%. Leakage panel (0% to 35000%): data labels 100%, 100%, 300%, 300%, 9000%, 32400%.]

15 Multi-Vth: P leakage Reduction. [Diagram: critical timing path, with low-Vth and high-Vth cells.] [Figure, two panels: Dynamic Power Reduction and Leakage Power Reduction, cumulative efficiency (%) against technology node (nm). Dynamic panel legend: clock gating. Leakage panel legend: Multi-Vth.]

16 Multi-Core Architecture: P dynamic Reduction. [Figure, two panels: Dynamic Power Reduction and Leakage Power Reduction Factor, cumulative efficiency (%) against technology node (nm). Dynamic panel legend: clock gating, Multi-Vth, multi-cores.] Source: M. Domeika, drdobbs.com, Dec. 27

17 Gate Length Biasing: P leakage Reduction. [Figure, two panels: Dynamic Power Reduction and Leakage Power Reduction, cumulative efficiency (%) against technology node (nm). Dynamic panel legend: multi-cores, clock gating, Multi-Vth. Leakage panel legend: body biasing, power gating, Lgate biasing.]

18 Transistor Gate-Length Biasing (UCSD 2003). [Figure: leakage and delay vs. gate length.] Bias impact: exponential (I sub) leakage reduction; variability reduction; linear performance reduction. Apply very small biases (+2nm, +4nm, etc.) just before tapeout.

19 Transistor Gate-Length Biasing (UCSD 2003). Transistor on non-critical path: target CD 70nm. Transistor on near-setup-critical path: target CD 66nm. Transistor on setup-critical path: target CD 65nm. Challenging global optimization over millions of gates, with complex timing constraints. (Spent two years developing a leading industry tool.)

20 Transistor Gate-Length Biasing (UCSD 2003). [Figure: leakage and delay vs. gate length.] Bias impact: exponential (I sub) leakage reduction; variability reduction; linear performance reduction. Apply very small biases (+2nm, +4nm, etc.) just before tapeout. Chip-scale optimization: trade timing slack for leakage power everywhere possible. UCSD-patented flow currently offered in TSMC's Green Power Trim service. Energy savings for just AMD/ATI Radeon GPUs: >> 10^9 watt-hours.
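
The trade of slack for leakage can be sketched per cell as follows. This is a deliberately simplified illustration, not the UCSD flow: it treats each cell's slack as independent (the real problem is a global optimization because cells share paths), and the bias set, delay cost per nm, and leakage decay constant are all assumed values.

```python
import math

# Candidate gate-length biases in nm (the slides mention +2nm, +4nm).
BIASES_NM = (0, 2, 4)

def choose_bias(slack_ps, delay_per_nm_ps):
    """Pick the largest bias whose (assumed linear) delay cost fits in the slack."""
    feasible = [b for b in BIASES_NM if b * delay_per_nm_ps <= slack_ps]
    return max(feasible, default=0)

def leakage_after_bias(leak_nominal, bias_nm, decay_per_nm=0.1):
    """Leakage after biasing, assuming exponential decay with gate length.

    decay_per_nm is a hypothetical fitting constant.
    """
    return leak_nominal * math.exp(-decay_per_nm * bias_nm)

if __name__ == "__main__":
    for slack_ps in (0, 12, 25):
        bias = choose_bias(slack_ps, delay_per_nm_ps=5)
        print(slack_ps, bias, leakage_after_bias(1.0, bias))
```

Cells with no slack keep their nominal length, matching the slide's setup-critical case; cells with more slack take larger biases and leak less.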

21 Power Gating: P leakage Reduction. [Figure, two panels: Dynamic Power Reduction and Leakage Power Reduction, cumulative efficiency (%) against technology node (nm). Dynamic panel legend: multi-cores, clock gating, Multi-Vth. Leakage panel legend: body biasing, power gating, Lgate biasing.]

22 Power Gating. Typical operation modes: active, idle. Power gating: cut off leakage path during predefined idle modes. [Figure: typical operation with power gating. A logic block sits between VDD and VSS_INT, with a PG_ENABLE-controlled switch to VSS; current is shown over active, idle, active phases without and with power gating, including sleep and wake-up.] More than 10x leakage saving during idle mode.

23 New: Runtime Power Gating (joint work with Tajana Rosing and Rick Strong, UCSD). Clock gating and power gating address dynamic and leakage power during idle mode. But in active mode, cycles are still wasted, e.g., execution time spent waiting for memory access. What if circuit design (wakeup logic) enabled faster, more flexible wakeup of one or more cores? The result: shorter power gating intervals, i.e., runtime power gating.

24 Token-Based Power Gating (joint work with Tajana Rosing and Rick Strong, UCSD). Architectural power gating: send tokens to control core power gating. About tokens: sent by cache or memory controller; received by core; stamped with the system cycle in which it was generated; has estimated request wait latency; sets minimum core wakeup latency; managed by token controller. [Figure: timeline of a core with an L1$ miss and an L2$ miss going to memory, and the corresponding gated intervals PG-L1-Miss and PG-L2-Miss.]

25 Token-Based Power Gating (joint work with Tajana Rosing and Rick Strong, UCSD). About token controller: manages token properties; queries cores for performance information; maintains peak current constraint by managing core wakeup latencies; maximizes energy savings, e.g., by balancing wakeup latencies for parallel apps, or increasing wakeup latencies for underutilized cores. [Figure: timeline of a core with an L1$ miss and an L2$ miss going to memory, and the corresponding gated intervals PG-L1-Miss and PG-L2-Miss.]

26 System Model and Tool Flow (joint work with Tajana Rosing and Rick Strong, UCSD). Multi-core assumptions: 4 cores; private L1 (32KB, 2-way, 0.5ns) and L2 (2MB, 2-way, 9ns) caches; MESI cache coherence protocol; memory: size = 2GB, latency = 40ns; core type: in-order EV4; ISA: ALPHA64; frequency: 2GHz; width: 2; OS: Vanilla-Linux. Tools: M5 Full-System Simulator (Spec2006, Parsec2.0, Splash2.0 benchmarks); McPAT used to generate power numbers.

27 Wakeup Latencies vs. Energy Saving (joint work with Tajana Rosing and Rick Strong, UCSD). Different memory hierarchy levels have different latencies: L1 hit latency = 0.5ns; L2 hit latency = 9ns; memory latency = 40ns. Lowering the core wakeup latency (10ns to 5ns to 2ns) can make power gating for smaller idle periods more attractive. PGT (for both L1 and L2 misses) vs. PGTL2 (for only L2 misses). ~40% energy saving.
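
The interaction between wakeup latency and stall length can be sketched with a simple gating rule. This is an assumed decision rule for illustration, not the policy evaluated in the joint work: the `Token` fields mirror the token properties listed on the earlier slide, while `should_power_gate` and its `break_even_ns` parameter are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Token:
    issue_cycle: int    # system cycle in which the token was generated
    est_wait_ns: float  # estimated request wait latency

def should_power_gate(token, wakeup_ns, break_even_ns=0.0):
    """Gate the core only if the expected stall outlasts the wakeup latency.

    break_even_ns is a hypothetical extra margin covering the energy cost
    of the sleep/wake transition itself.
    """
    return token.est_wait_ns > wakeup_ns + break_even_ns

if __name__ == "__main__":
    l2_hit = Token(issue_cycle=100, est_wait_ns=9.0)      # L1 miss served by L2
    mem_access = Token(issue_cycle=120, est_wait_ns=40.0)  # L2 miss served by memory
    for wakeup_ns in (10.0, 5.0, 2.0):
        print(wakeup_ns, should_power_gate(l2_hit, wakeup_ns),
              should_power_gate(mem_access, wakeup_ns))
```

With a 10ns wakeup only the 40ns memory stall is worth gating; at 5ns or 2ns the 9ns L2 stall qualifies as well, which is the effect the slide describes.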

28 And More: ABB / Multi-VDD / DVFS / NTC / ... [Figure, two panels: Dynamic Power Reduction and Leakage Power Reduction, cumulative efficiency (%) against technology node (nm). Dynamic panel legend: clock gating, Multi-Vth, multi-cores, Multi-VDD, DVFS, AVS, RTL-Opt, data gating, DC-DC efficiency, near-threshold computing. Leakage panel legend: body biasing, power gating, Lgate biasing.]

29 DVFS: Dynamic Voltage/Frequency Scaling (joint work with Rakesh Kumar and John Sartori, UIUC). DVFS enables operation at multiple power-performance points and adaptation to different operating conditions or modes. Observation 1: DVFS changes only voltage and frequency, not the design itself. Is a fixed design always optimal? Observation 2: lifetime energy changes with scenario (R * X). Is a scenario-oblivious design always optimal? Different duty cycle (R): [diagram: lifetime split between, e.g., talk mode and standby mode]. Different frequency scaling (X): X = (high performance mode clock frequency) / (low performance mode clock frequency).

30 DVFS Suboptimality #1 (joint work with Rakesh Kumar and John Sartori, UIUC). No single design can work well in all modes: jack of all trades, master of none. What if: selective replication. Replication benefits are different in each module, so optimal use of replication = knapsack formulation. [Figure: multi-mode design vs. selective replication design.] Example: CTL module has 12% energy savings through replication.

31 DVFS Suboptimality #1 (joint work with Rakesh Kumar and John Sartori, UIUC). No single design can work well in all modes: jack of all trades, master of none. What if: selective replication. Replication benefits are different in each module, so optimal use of replication = knapsack formulation. Avg 9% energy savings. [Figure: multi-mode design in which a mode signal drives MUXes that select between a high-performance replica and a low-performance replica of selected modules (IFU H / IFU L, FFU H / FFU L, TLU H / TLU L); other blocks shown: ITLB, IRF, SPU, DTLB, EXU, LSU, FPRF, instruction cache, data cache.]
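
The knapsack view of selective replication can be sketched with a standard 0/1 knapsack dynamic program: each module has an area cost for its replica and an energy saving, and replicas are chosen to maximize total saving under an area budget. The module costs and savings in the usage example are hypothetical, and the actual formulation in the joint work may use different costs and constraints.

```python
def select_replicas(modules, area_budget):
    """0/1 knapsack over modules.

    modules: list of (name, area_cost, energy_saving), area_cost a positive int.
    Returns (best_total_saving, chosen_module_names) within area_budget.
    """
    # best[cap] holds the best (saving, names) using at most `cap` area units.
    best = [(0.0, [])] * (area_budget + 1)
    for name, cost, saving in modules:
        new_best = list(best)
        for cap in range(cost, area_budget + 1):
            candidate = best[cap - cost][0] + saving
            if candidate > new_best[cap][0]:
                new_best[cap] = (candidate, best[cap - cost][1] + [name])
        best = new_best
    return best[area_budget]

if __name__ == "__main__":
    # Hypothetical (area cost, energy saving) values for three modules.
    modules = [("IFU", 3, 4.0), ("EXU", 4, 5.0), ("LSU", 2, 3.0)]
    print(select_replicas(modules, area_budget=5))
```

In this toy instance the two cheaper replicas together beat the single most beneficial one, which is why a per-module greedy choice is not enough.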

32 DVFS Suboptimality #2 (joint work with Rakesh Kumar and John Sartori, UIUC). Lifetime energy is not optimal. Why this happens: [figure: energy vs. delay for Design A and Design B, from high performance to low performance]. Design A saves energy in high perf mode. Design B saves energy in low perf mode. Neither is optimal for lifetime energy. $E = P_{hi} R + P_{lo} (1 - R)$. [Table: timing slack (ns) of Path A and Path B at each operating point (frequency, voltage): 1.0GHz at 0.95V, and ... MHz at 0.60V.] How to optimize for multiple performance modes?
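
The lifetime-energy expression on this slide can be evaluated directly. The sketch below assumes R is the fraction of lifetime spent in the high-performance mode; the power numbers for designs A and B are hypothetical and chosen only so that A is better in high-performance mode and B in low-performance mode, as on the slide.

```python
def lifetime_energy(p_hi, p_lo, duty_cycle):
    """E = P_hi * R + P_lo * (1 - R), per unit of lifetime."""
    return p_hi * duty_cycle + p_lo * (1.0 - duty_cycle)

def best_design(designs, duty_cycle):
    """designs maps a name to (p_hi, p_lo); returns the minimum-energy name."""
    return min(designs, key=lambda name: lifetime_energy(*designs[name], duty_cycle))

if __name__ == "__main__":
    # Hypothetical powers: A is better in high-perf mode, B in low-perf mode.
    designs = {"A": (1.0, 0.5), "B": (1.25, 0.25)}
    for r in (0.75, 0.25):
        print(r, best_design(designs, r))
```

Which design wins flips with the duty cycle, so a design tuned to a single mode cannot be lifetime-optimal across scenarios.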

33 Context-Driven Multi-Mode Design (joint work with Rakesh Kumar and John Sartori, UIUC). Goal: find a minimum lifetime energy design. Conventional design flow sets constraints (frequency, voltage) before implementation (but the min-energy constraints are unknown!). What if: context-driven multi-mode design, i.e., design to the scenario rather than to constraints. Avg 8% energy savings. [Figure: power consumption.]

34 Outline: The Power Gap (roadmapped by UCSD since 2001); Low-Power Design (Lgate biasing, runtime power gating, scenario-aware and replication-based DVFS); Beyond Low-Power: Resilience; Conclusion

35 New Mindset. Better-than-worst-case (typical case) design. Dynamic reliability (error) management. Living with variations. [Figure: further energy reductions under voltage scaling, from the limit of worst-case design to resilient design (typical-case + error-tolerance).]

36 Types of Resilience
Error Acceptance. Key ideas: allow errors; approximate computation for accuracy-insensitive applications. UCSD work: approximate arithmetic design.
Error Tolerance. Key ideas: detect and correct errors dynamically; error-detection FF + architectural correction schemes. UCSD work: recovery-driven design.
Error Avoidance. Key ideas: no error allowed; DVFS + canary circuits. UCSD work: Design-Dependent Ring Oscillator.

37 Outline: The Power Gap; Low-Power Design; Beyond Low-Power: Resilience (Error Tolerance: Recovery-Driven Design; Error Acceptance: Approximate Arithmetic Logic; Error Avoidance: Design-Dependent Ring Oscillator); Conclusion

38 Recovery-Driven Design [HPCA10] [DAC10] (joint work with Rakesh Kumar and John Sartori, UIUC). Motivation #1: if the design uses an error-tolerance mechanism, then the design process should be modified accordingly. Motivation #2: error rate demands the use of functional information.

39 Recovery-Driven Design [HPCA10] [DAC10] (joint work with Rakesh Kumar and John Sartori, UIUC). Low-power methodology for error-tolerant designs: minimize power for a target error rate; slack redistribution with functional information. Flow: (1) Voltage scaling: reduce voltage until the error rate exceeds a target. (2) Path optimization: optimize frequently exercised, negative-slack paths. (3) Power reduction: reduce power w/o affecting error rate.
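
The voltage scaling step of this flow can be sketched as a simple search: keep lowering the supply while the estimated error rate stays within the target. The function name, the millivolt step, and the toy error model in the usage example are assumptions for illustration; in the actual methodology the error rate comes from path toggling extraction and timing analysis.

```python
def lowest_safe_voltage_mv(error_rate_at, v_start_mv, v_min_mv, step_mv, target_error_rate):
    """Step Vdd down (in mV) while the estimated error rate stays within the target.

    error_rate_at is a caller-supplied estimator mapping Vdd in mV to an error rate.
    """
    v = v_start_mv
    while v - step_mv >= v_min_mv and error_rate_at(v - step_mv) <= target_error_rate:
        v -= step_mv
    return v

if __name__ == "__main__":
    # Toy error model (assumption): no errors at or above 900 mV,
    # then error rate grows linearly as the supply drops.
    def toy_error_rate(vdd_mv):
        return max(0, 900 - vdd_mv) / 2000.0

    print(lowest_safe_voltage_mv(toy_error_rate, 1000, 600, 25, 0.02))
```

The search stops at the last voltage whose estimated error rate is still acceptable; the path optimization step would then reshape slack so that the same error target is met at an even lower voltage.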

40 Recovery-Driven Design: Experimental Results. Path toggling extraction and error rate estimation: accurate, fast (20X). Power comparison across design techniques: 25% power savings w/ 2% error; 22% power savings w/ Razor flip-flop. [Figure: power consumption (W) vs. error rate (%), with error rates up to 0.25%, 0.50%, 1.00%, 2.00%, 4.00%, 8.00%, for Conventional P&R, Tight P&R, PCT, Slack Optimizer, and Power Optimizer.]

41 Resilient Overhead Reduction (Ongoing Work). Resilient overhead: resilience requires design overheads, i.e., additional circuits and operations (pipeline flush). New tradeoffs in resilient design with error-tolerant (ET) registers (e.g., Razor flip-flop). Pros: avoid over-design, scale voltage further. Cons: cost of ET registers, recovery overhead. Goal: minimize the cost function (power) using the tradeoffs. Approach: find optimal assignment of registers (error-tolerant or normal).

42 Outline: The Power Gap; Low-Power Design; Beyond Low-Power: Resilience (Error Tolerance: Recovery-Driven Design; Error Acceptance: Approximate Arithmetic Logic; Error Avoidance: Design-Dependent Ring Oscillator); Conclusion

43 Error Avoidance. Adjust Vdd and frequency according to delay margin. No error recovery mechanism required. [Diagram: a monitoring circuit feeds back the estimated delay to a DVFS controller, which sets Vdd and frequency for the actual circuit. Question posed: how to monitor the circuit delay margin?] [Figure: delay vs. time against the delay constraint, comparing the guardband of the error avoidance system with worst-case design.] Conventional approaches: Inverter-based RO: critical paths have different sensitivity to process variations. Critical path RO: replicating a critical path with long interconnect costs area.
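
The feedback loop described above can be sketched as a one-step controller that nudges Vdd based on the monitored delay margin. The thresholds (raise below one guardband of margin, lower above two), step size, and voltage limits are illustrative assumptions, not values from the talk.

```python
def next_vdd_mv(vdd_mv, est_delay_ns, clock_period_ns, guardband_ns,
                step_mv=10, vdd_min_mv=600, vdd_max_mv=1100):
    """One control step: adjust Vdd from the monitor's estimated delay.

    Raises Vdd when the margin falls below the guardband, lowers it when the
    margin exceeds twice the guardband, and otherwise holds (assumed policy).
    """
    margin_ns = clock_period_ns - est_delay_ns
    if margin_ns < guardband_ns:
        return min(vdd_mv + step_mv, vdd_max_mv)
    if margin_ns > 2 * guardband_ns:
        return max(vdd_mv - step_mv, vdd_min_mv)
    return vdd_mv

if __name__ == "__main__":
    for est_delay_ns in (0.98, 0.93, 0.80):
        print(est_delay_ns, next_vdd_mv(900, est_delay_ns, 1.0, 0.05))
```

How tight the guardband can be depends on how well the monitor tracks the real critical paths, which motivates the design-dependent ring oscillator on the next slides.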

44 Monitor for Resilience: DDRO. Problem: measure real-time performance variation in an adaptive system. Approach: select gates to form design-dependent ring oscillators (DDROs) with similar delay sensitivity to variations (Lgate, Vth, Tox, V, T, ...) as actual critical paths. Potential benefits: specific to a path's rising or falling transition; can cluster critical paths having similar sensitivities to reduce the number of ROs; low area overhead; automated design flow, standard cells only. [Figure: Gate A, Gate B, path (A+B), the critical path, and the DDRO placed in the plane of normalized delay sensitivities $\frac{1}{Delay_{nom}} \frac{\partial Delay}{\partial V_{th}}$ and $\frac{1}{Delay_{nom}} \frac{\partial Delay}{\partial L_{gate}}$.]

45 DDRO Synthesis Flow. Inputs: gate sensitivities and critical path sensitivities. Steps: cluster critical paths; for each cluster, synthesize a DDRO using an integer linear program. [Figure: critical paths grouped into Cluster 1 and Cluster 2 in the plane of normalized delay sensitivities to Vth and Lgate, with the DDRO error marked.] Synthesis result (45nm SOI test chip, ARM Cortex M3): [figure: delay sensitivity error (%) of INV. RO, CPRO, and DDRO for Clusters 1 to 5 and their average].
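
The per-cluster synthesis step can be illustrated with a toy search. The slide uses an integer linear program; the sketch below swaps in brute-force enumeration over small gate counts, which is only practical for tiny libraries. It assumes the oscillator's normalized sensitivity is the delay-weighted average of its gates' sensitivities, and it ignores oscillation constraints such as an odd number of inverting stages. The gate data in the usage example are hypothetical.

```python
from itertools import product

def ro_sensitivity(counts, gates):
    """Delay-weighted average sensitivity of a chain with counts[i] copies of gates[i]."""
    total_delay = sum(n * g["delay"] for n, g in zip(counts, gates))
    dims = len(gates[0]["sens"])
    return [
        sum(n * g["delay"] * g["sens"][k] for n, g in zip(counts, gates)) / total_delay
        for k in range(dims)
    ]

def synthesize_ddro(gates, target, max_count=3):
    """Enumerate gate counts and return (counts, error) closest to the target sensitivity."""
    best_counts, best_err = None, float("inf")
    for counts in product(range(max_count + 1), repeat=len(gates)):
        if sum(counts) == 0:
            continue  # an empty chain is not an oscillator
        sens = ro_sensitivity(counts, gates)
        err = sum(abs(s - t) for s, t in zip(sens, target))
        if err < best_err:
            best_counts, best_err = counts, err
    return best_counts, best_err

if __name__ == "__main__":
    # Hypothetical gates: sens = (sensitivity to Vth, sensitivity to Lgate).
    gates = [
        {"delay": 1.0, "sens": [1.0, 0.0]},
        {"delay": 1.0, "sens": [0.0, 1.0]},
    ]
    print(synthesize_ddro(gates, target=[0.5, 0.5]))
```

Mixing gate types lets the oscillator land on a sensitivity that no single gate has, which is the idea behind matching a cluster of critical paths.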

46 Monte Carlo Simulation Results (30 samples). [Figures: estimated delay (ns) vs. actual delay (ns) for DDRO, critical path RO, and inv. RO.]
Without within-die variation modeling, estimation error: DDRO -1.4 % ~ 3.7 %; critical path RO -2.0 % ~ 4.1 %; inv. RO -4.3 % ~ 7.1 %.
With within-die variation modeling, estimation error: DDRO -0.5 % ~ 3.7 %; critical path RO -1.3 % ~ 3.6 %; inv. RO -1.7 % ~ 5.1 %.

47 Outline: The Power Gap (roadmapped by UCSD since 2001); Low-Power Design (Lgate biasing, runtime power gating, scenario-aware and replication-based DVFS); Beyond Low-Power: Resilience (recovery-driven design, approximate arithmetic, design-dependent RO); Conclusion

48 The Elephant. Big picture for power and resilience spans: software and applications; architectures; interconnects; memories; circuits and devices; technology; fundamental limits. Different animal from recent talk topics: Design for Manufacturability, Technology Roadmap, 22nm Chip Implementation, 3D PDN Pathfinding, ...!

49 What I Spend My Time On. Connecting and building: dots, bridges, big pictures, ... The IC Design-Manufacturing Interface; the ITRS roadmap; IC physical design (clustering, placement, interconnect design, ...); NOC modeling and optimization (ORION2.0, trace-driven optimizations, ...); utilities for teaching (math drill generator, automatic editor, ...).
Current projects:
MARCO Gigascale Systems Research Center: Physical Architecture Components: Models, Roadmaps and Integrations = system-level impacts of 3D, new memories, new design optimizations.
UC Discovery: Integrated Modeling, Process, and Computation for Technology (IMPACT) Center (Design-Manufacturing Interface).
SRC (with UIUC): New Directions in Architecture and Design of Scalable Energy Constrained SoCs.
SRC (with UCLA): New Directions in Design-Aware Manufacturing.
NSF (with UCLA): Research on Benchmarking and Robustness of VLSI Sizing Optimizations.
Qualcomm: Power Delivery Pathfinding for 3D Through-Silicon Stacking.
STMicroelectronics: Across-Field Variation Mapping From Silicon Measurements, and Design-Driven DoseMap Flow.

50 Thank You!


More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 24

ECE 571 Advanced Microprocessor-Based Design Lecture 24 ECE 571 Advanced Microprocessor-Based Design Lecture 24 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 25 April 2013 Project/HW Reminder Project Presentations. 15-20 minutes.

More information

Accelerating Innovation

Accelerating Innovation Accelerating Innovation In the Era of Exponentials Dr. Chi-Foon Chan President and co-chief Executive Officer, Synopsys, Inc. August 27, 2013 ASQED 1 Accelerating Technology Innovation Exciting time to

More information

Advanced Digital Integrated Circuits. Lecture 9: SRAM. Announcements. Homework 1 due on Wednesday Quiz #1 next Monday, March 7

Advanced Digital Integrated Circuits. Lecture 9: SRAM. Announcements. Homework 1 due on Wednesday Quiz #1 next Monday, March 7 EE241 - Spring 2011 Advanced Digital Integrated Circuits Lecture 9: SRAM Announcements Homework 1 due on Wednesday Quiz #1 next Monday, March 7 2 1 Outline Last lecture Variability This lecture SRAM 3

More information

Introduction. Summary. Why computer architecture? Technology trends Cost issues

Introduction. Summary. Why computer architecture? Technology trends Cost issues Introduction 1 Summary Why computer architecture? Technology trends Cost issues 2 1 Computer architecture? Computer Architecture refers to the attributes of a system visible to a programmer (that have

More information

Computer Architecture

Computer Architecture Informatics 3 Computer Architecture Dr. Boris Grot and Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh General Information Instructors: Boris

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor POWER7: IBM's Next Generation Server Processor Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002 Outline

More information

CS250 VLSI Systems Design Lecture 9: Memory

CS250 VLSI Systems Design Lecture 9: Memory CS250 VLSI Systems esign Lecture 9: Memory John Wawrzynek, Jonathan Bachrach, with Krste Asanovic, John Lazzaro and Rimas Avizienis (TA) UC Berkeley Fall 2012 CMOS Bistable Flip State 1 0 0 1 Cross-coupled

More information

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System 1 1 1 Centip3De: A 64-Core, 3D Stacked, Near-Threshold System Ronald G. Dreslinski David Fick, Bharan Giridhar, Gyouho Kim, Sangwon Seo, Matthew Fojtik, Sudhir Satpathy, Yoonmyung Lee, Daeyeon Kim, Nurrachman

More information

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications

More information

ECE484 VLSI Digital Circuits Fall Lecture 01: Introduction

ECE484 VLSI Digital Circuits Fall Lecture 01: Introduction ECE484 VLSI Digital Circuits Fall 2017 Lecture 01: Introduction Adapted from slides provided by Mary Jane Irwin. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] CSE477 L01 Introduction.1

More information

POWER7+ TM IBM IBM Corporation

POWER7+ TM IBM IBM Corporation POWER7+ TM 2012 Corporation Outline POWER Processor History Design Overview Performance Benchmarks Key Features Scale-up / Scale-out The new accelerators Advanced energy management Summary * Statements

More information

VLSI Design Automation. Maurizio Palesi

VLSI Design Automation. Maurizio Palesi VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 Outline Technology trends VLSI Design flow (an overview) 3 IC Products Processors CPU, DSP, Controllers Memory chips

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management Next-Generation Mobile Computing: Balancing Performance and Power Efficiency HOT CHIPS 19 Jonathan Owen, AMD Agenda The mobile computing evolution The Griffin architecture Memory enhancements Power management

More information

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Sandro Bartolini* Department of Information Engineering, University of Siena, Italy bartolini@dii.unisi.it

More information

Introduction 1. GENERAL TRENDS. 1. The technology scale down DEEP SUBMICRON CMOS DESIGN

Introduction 1. GENERAL TRENDS. 1. The technology scale down DEEP SUBMICRON CMOS DESIGN 1 Introduction The evolution of integrated circuit (IC) fabrication techniques is a unique fact in the history of modern industry. The improvements in terms of speed, density and cost have kept constant

More information

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano Modeling and Simulation of System-on on-chip Platorms Donatella Sciuto 10/01/2007 Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32, 20131, Milano Key SoC Market

More information

Architecture Evaluation for

Architecture Evaluation for Architecture Evaluation for Power-efficient FPGAs Fei Li*, Deming Chen +, Lei He*, Jason Cong + * EE Department, UCLA + CS Department, UCLA Partially supported by NSF and SRC Outline Introduction Evaluation

More information

Advanced Computer Architecture (CS620)

Advanced Computer Architecture (CS620) Advanced Computer Architecture (CS620) Background: Good understanding of computer organization (eg.cs220), basic computer architecture (eg.cs221) and knowledge of probability, statistics and modeling (eg.cs433).

More information

Advanced Digital Integrated Circuits. Lecture 9: SRAM. Announcements. Homework 1 due on Wednesday Quiz #1 next Monday, March 7

Advanced Digital Integrated Circuits. Lecture 9: SRAM. Announcements. Homework 1 due on Wednesday Quiz #1 next Monday, March 7 EE24 - Spring 20 Advanced Digital Integrated Circuits Lecture 9: SRAM Announcements Homework due on Wednesday Quiz # next Monday, March 7 2 Outline Last lecture Variability This lecture SRAM 3 Practical

More information

Real-Time Dynamic Voltage Hopping on MPSoCs

Real-Time Dynamic Voltage Hopping on MPSoCs Real-Time Dynamic Voltage Hopping on MPSoCs Tohru Ishihara System LSI Research Center, Kyushu University 2009/08/05 The 9 th International Forum on MPSoC and Multicore 1 Background Low Power / Low Energy

More information

ECE 261: Full Custom VLSI Design

ECE 261: Full Custom VLSI Design ECE 261: Full Custom VLSI Design Prof. James Morizio Dept. Electrical and Computer Engineering Hudson Hall Ph: 201-7759 E-mail: jmorizio@ee.duke.edu URL: http://www.ee.duke.edu/~jmorizio Course URL: http://www.ee.duke.edu/~jmorizio/ece261/261.html

More information

TDT 4260 lecture 2 spring semester 2015

TDT 4260 lecture 2 spring semester 2015 1 TDT 4260 lecture 2 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Chapter 1: Fundamentals of Quantitative Design and Analysis, continued

More information

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors:!

More information

Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014

Novel Nonvolatile Memory Hierarchies to Realize Normally-Off Mobile Processors ASP-DAC 2014 Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors" ASP-DAC 2014 Shinobu Fujita, Kumiko Nomura, Hiroki Noguchi, Susumu Takeda, Keiko Abe Toshiba Corporation, R&D Center Advanced

More information

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements EE241 - Spring 2007 Advanced Digital Integrated Circuits Lecture 22: SRAM Announcements Homework #4 due today Final exam on May 8 in class Project presentations on May 3, 1-5pm 2 1 Class Material Last

More information

Digital Integrated Circuits A Design Perspective. Jan M. Rabaey

Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Digital Integrated Circuits A Design Perspective Jan M. Rabaey Outline (approximate) Introduction and Motivation The VLSI Design Process Details of the MOS Transistor Device Fabrication Design Rules CMOS

More information

POWER7: IBM's Next Generation Server Processor

POWER7: IBM's Next Generation Server Processor Hot Chips 21 POWER7: IBM's Next Generation Server Processor Ronald Kalla Balaram Sinharoy POWER7 Chief Engineer POWER7 Chief Core Architect Acknowledgment: This material is based upon work supported by

More information

Package level Interconnect Options

Package level Interconnect Options Package level Interconnect Options J.Balachandran,S.Brebels,G.Carchon, W.De Raedt, B.Nauwelaers,E.Beyne imec 2005 SLIP 2005 April 2 3 Sanfrancisco,USA Challenges in Nanometer Era Integration capacity F

More information

Trends in the Infrastructure of Computing

Trends in the Infrastructure of Computing Trends in the Infrastructure of Computing CSCE 9: Computing in the Modern World Dr. Jason D. Bakos My Questions How do computer processors work? Why do computer processors get faster over time? How much

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

Robust System Design with MPSoCs Unique Opportunities

Robust System Design with MPSoCs Unique Opportunities Robust System Design with MPSoCs Unique Opportunities Subhasish Mitra Robust Systems Group Departments of Electrical Eng. & Computer Sc. Stanford University Email: subh@stanford.edu Acknowledgment: Stanford

More information

When will the cost of dependability end innovation in computer design?

When will the cost of dependability end innovation in computer design? When will the cost of dependability end innovation in computer design? VTS-2015 panel session Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu http://vlsicad.ucsd.edu 1 Huh??? IMHO, The cost of

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN

International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February ISSN International Journal of Scientific & Engineering Research, Volume 5, Issue 2, February-2014 938 LOW POWER SRAM ARCHITECTURE AT DEEP SUBMICRON CMOS TECHNOLOGY T.SANKARARAO STUDENT OF GITAS, S.SEKHAR DILEEP

More information

VLSI Design Automation. Calcolatori Elettronici Ing. Informatica

VLSI Design Automation. Calcolatori Elettronici Ing. Informatica VLSI Design Automation 1 Outline Technology trends VLSI Design flow (an overview) 2 IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing

More information

Evaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability

Evaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability Evaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability Liangzhen Lai, Vikas Chandra* and Puneet Gupta UCLA Electrical Engineering Department ARM Research* Radiation-Induced

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits

More information

L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers

L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers I N S T I T U T D E R E C H E R C H E T E C H N O L O G I Q U E L évolution des architectures et des technologies d intégration des circuits intégrés dans les Data centers 10/04/2017 Les Rendez-vous de

More information

The Design of the KiloCore Chip

The Design of the KiloCore Chip The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory

More information

Efficient Systems. Micrel lab, DEIS, University of Bologna. Advisor

Efficient Systems. Micrel lab, DEIS, University of Bologna. Advisor Row-based Design Methodologies To Compensate Variability For Energy- Efficient Systems Micrel lab, DEIS, University of Bologna Mohammad Reza Kakoee PhD Student m.kakoee@unibo.it it Luca Benini Advisor

More information

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,

More information

Regularity for Reduced Variability

Regularity for Reduced Variability Regularity for Reduced Variability Larry Pileggi Carnegie Mellon pileggi@ece.cmu.edu 28 July 2006 CMU Collaborators Andrzej Strojwas Slava Rovner Tejas Jhaveri Thiago Hersan Kim Yaw Tong Sandeep Gupta

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Many-Core Computing Era and New Challenges. Nikos Hardavellas, EECS

Many-Core Computing Era and New Challenges. Nikos Hardavellas, EECS Many-Core Computing Era and New Challenges Nikos Hardavellas, EECS Moore s Law Is Alive And Well 90nm 90nm transistor (Intel, 2005) Swine Flu A/H1N1 (CDC) 65nm 2007 45nm 2010 32nm 2013 22nm 2016 16nm 2019

More information

A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias. David Kidd August 26, 2013

A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias. David Kidd August 26, 2013 A 50% Lower Power ARM Cortex CPU using DDC Technology with Body Bias David Kidd August 26, 2013 1 HOTCHIPS 2013 Copyright 2013 SuVolta, Inc. All rights reserved. Agenda DDC transistor and PowerShrink platform

More information

Response Time and Throughput

Response Time and Throughput Response Time and Throughput Response time How long it takes to do a task Throughput Total work done per unit time e.g., tasks/transactions/ per hour How are response time and throughput affected by Replacing

More information

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp

Interconnect Challenges in a Many Core Compute Environment. Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Interconnect Challenges in a Many Core Compute Environment Jerry Bautista, PhD Gen Mgr, New Business Initiatives Intel, Tech and Manuf Grp Agenda Microprocessor general trends Implications Tradeoffs Summary

More information

More Course Information

More Course Information More Course Information Labs and lectures are both important Labs: cover more on hands-on design/tool/flow issues Lectures: important in terms of basic concepts and fundamentals Do well in labs Do well

More information

FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES

FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES , suitable for DFA on AES Jonas Krautter, Dennis R.E. Gnad, Mehdi B. Tahoori 10.09.2018 INSTITUTE OF COMPUTER ENGINEERING CHAIR OF DEPENDABLE NANO COMPUTING KIT Die Forschungsuniversität in der Helmholtz-Gemeinschaft

More information

Design and Analysis of Ultra Low Power Processors Using Sub/Near-Threshold 3D Stacked ICs

Design and Analysis of Ultra Low Power Processors Using Sub/Near-Threshold 3D Stacked ICs Design and Analysis of Ultra Low Power Processors Using Sub/Near-Threshold 3D Stacked ICs Sandeep Kumar Samal, Yarui Peng, Yang Zhang, and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta,

More information

Design of Low Power Wide Gates used in Register File and Tag Comparator

Design of Low Power Wide Gates used in Register File and Tag Comparator www..org 1 Design of Low Power Wide Gates used in Register File and Tag Comparator Isac Daimary 1, Mohammed Aneesh 2 1,2 Department of Electronics Engineering, Pondicherry University Pondicherry, 605014,

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

Towards Energy-Proportional Datacenter Memory with Mobile DRAM

Towards Energy-Proportional Datacenter Memory with Mobile DRAM Towards Energy-Proportional Datacenter Memory with Mobile DRAM Krishna Malladi 1 Frank Nothaft 1 Karthika Periyathambi Benjamin Lee 2 Christos Kozyrakis 1 Mark Horowitz 1 Stanford University 1 Duke University

More information

CS/EE 6810: Computer Architecture

CS/EE 6810: Computer Architecture CS/EE 6810: Computer Architecture Class format: Most lectures on YouTube *BEFORE* class Use class time for discussions, clarifications, problem-solving, assignments 1 Introduction Background: CS 3810 or

More information