Ultra Low Power (ULP) Challenge in System Architecture Level

Ultra Low Power (ULP) Challenge in System Architecture Level - New architectures for 45-nm, 32-nm era
ASP-DAC 2007 Designers' Forum 9D: Panel Discussion: Top 10 Design Issues
Toshinori Sato (Kyushu U)

Global View Helps ULP Design
- Reducing power alone is not enough: variation tolerance, soft-error tolerance, and high performance are still required
- High-level consideration of power reduction is required
  - Software optimization increases design flexibility
  - Speculation can create new frontiers for optimization
  - Architecture selection can change the characteristics of circuits
- Variation-aware (VA) ULP design examples

VA ULP Cache Architecture
- Process variations create ultra-leaky transistors
- Fortunately, the leakage current of an SRAM cell depends on the logic value stored
- Store leakage-safe values on entering standby mode
- Power saving with negligible performance penalty

[Figure: threshold-voltage distribution running from a large-leak tail through the mean to a large-delay tail, annotated "Vth = 0.3V" and "100 tr.". 1 transistor out of a 512K-bit SRAM lies at 5σ. Delay is 2x the average; leakage is 1,400x higher than average. Other annotations: 330x, 1.8x.]

Share of transistors within each range:
±σ: 68.3%
±2σ: 95.4%
±3σ: 99.7%
±4σ: 99.9936%
±5σ: 99.99994%

M. Goudarzi: A Software Technique to Improve Yield of Processor Chips in Presence of Ultra-Leaky SRAM Cells Caused by Process Variation, Session 9A @Room 411+412, just NOW.
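The sigma coverage figures and the "1 transistor out of 512K-bit SRAM" remark can be checked with a short calculation. This is a minimal sketch that assumes threshold voltage is normally distributed, that each SRAM cell has six transistors (a 6T cell, which the slide does not state), and that only one tail of the distribution counts as ultra-leaky. The function names are illustrative.

```python
import math

def coverage(k):
    """Fraction of a normal distribution lying within +/- k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

def one_sided_tail(k):
    """Fraction of a normal distribution lying beyond k sigma on one side."""
    return 0.5 * math.erfc(k / math.sqrt(2))

CELLS = 512 * 1024            # 512K-bit SRAM, one cell per bit
TRANSISTORS_PER_CELL = 6      # assumption: conventional 6T cell

# Expected number of transistors more than 5 sigma out on one tail.
expected_outliers = CELLS * TRANSISTORS_PER_CELL * one_sided_tail(5)

if __name__ == "__main__":
    for k in range(1, 6):
        print(f"+/-{k} sigma: {100 * coverage(k):.5f}%")
    print(f"expected 5-sigma outliers: {expected_outliers:.2f}")
```

Under these assumptions the expected count comes out at about 0.9, which is consistent with roughly one 5σ transistor per 512K-bit array.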

VA ULP Cache Architecture

[Figure: 4-way set-associative cache memory with sets 0 to 7 and ways tag0/data0 through tag3/data3. Example stored bit patterns 0110100101 and 1110110011, with individual cells marked as 1-leaky or 0-leaky.]
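The idea of storing leakage-safe values can be sketched as a bit-mask operation on one cache word. This is only an illustration, not the method from the cited paper: the masks `leaky_when_1` and `leaky_when_0` are hypothetical inputs that would have to come from some per-chip characterization, and the sketch ignores how the original cache contents are written back or invalidated before standby.

```python
def leakage_safe_word(current, leaky_when_1, leaky_when_0, width=10):
    """Return the value to store in a word before entering standby.

    leaky_when_1: mask of cells that leak heavily when holding 1 (store 0 there).
    leaky_when_0: mask of cells that leak heavily when holding 0 (store 1 there).
    All other bits keep their current value.
    """
    if leaky_when_1 & leaky_when_0:
        raise ValueError("a cell cannot be marked both 1-leaky and 0-leaky")
    mask = (1 << width) - 1
    safe = (current & ~leaky_when_1) | leaky_when_0
    return safe & mask

if __name__ == "__main__":
    word = 0b0110100101
    safe = leakage_safe_word(word, leaky_when_1=0b0000000100, leaky_when_0=0b1000000000)
    print(f"{safe:010b}")
```

In the usage example, bit 2 is marked 1-leaky and bit 9 is marked 0-leaky, so the stored word changes only in those two positions.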

VA ULP Cache Architecture

[Figure: power saving (nW, 0 to 1500) versus performance loss (ns, 0 to 70) for ARM920 and M32R.]

VA ULP Logic Architecture
- Typical-case design
  - Optimizing not for worst cases but for typical cases
- Combination of two circuits
  - Main for power reduction
  - Checker for correctness
- Examples
  - Razor FF
  - Canary FF
- Potential of over 30% energy reduction
- Limited soft-error tolerance

[Figure: Razor FF between two logic stages. The main flip-flop is clocked by clk, a second element is clocked by a delayed clk, and a comparator raises an error signal.]

D. Ernst: Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation, MICRO, 2003.
T. Sato: A Simple Flip-Flop Circuit for Typical-Case Designs for DFM, ISQED, 2007.
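The main-plus-checker idea behind the Razor FF can be illustrated with a small behavioral model. This is a simplified sketch, not the circuit from the cited paper: it assumes a single data transition per cycle that settles at `arrival_ns`, a main flip-flop that samples at the clock edge, and a checker that samples the same signal `shadow_delay_ns` later. All names and numbers are illustrative.

```python
def sample(arrival_ns, sample_time_ns, old, new):
    """Value seen by a storage element: new data only if it arrived in time."""
    return new if arrival_ns <= sample_time_ns else old

def razor_ff(arrival_ns, period_ns, shadow_delay_ns, old, new):
    """Model one cycle of a Razor-style flip-flop.

    Returns (main_value, shadow_value, error). The main FF samples at the
    clock edge; the shadow element samples on the delayed clock. A mismatch
    means the main FF captured stale data, and the shadow value can be used
    to recover.
    """
    main = sample(arrival_ns, period_ns, old, new)
    shadow = sample(arrival_ns, period_ns + shadow_delay_ns, old, new)
    return main, shadow, main != shadow

if __name__ == "__main__":
    # Data arrives before the clock edge: no error.
    print(razor_ff(0.9, 1.0, 0.3, old=0, new=1))
    # Data arrives late but within the delayed-clock window: error detected.
    print(razor_ff(1.2, 1.0, 0.3, old=0, new=1))
```

The model makes the trade-off visible: the main path can be run with less margin, and late arrivals are caught after the fact as long as they fall inside the delayed-clock window.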

VA ULP Logic Architecture

[Figure: Canary FF between two logic stages. The main flip-flop is clocked by clk, a second element on the same clk receives the data through a delay, and a comparator raises a trigger signal.]

[Figure: bar chart with a percentage axis from 0% to 40% for the benchmarks gzip, vpr, gcc, parser, vortex, and bzip2.]
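A Canary FF can be modeled in the same spirit as a predictive checker. This is a rough behavioral sketch under assumptions not stated on the slide: one data transition per cycle settling at `arrival_ns`, a main flip-flop sampling at the clock edge, and a canary element that sees the same data after an extra `canary_delay_ns`. The names and numbers are illustrative.

```python
def sample(arrival_ns, sample_time_ns, old, new):
    """Value seen by a storage element: new data only if it arrived in time."""
    return new if arrival_ns <= sample_time_ns else old

def canary_ff(arrival_ns, period_ns, canary_delay_ns, old, new):
    """Model one cycle of a Canary-style flip-flop.

    Returns (main_value, canary_value, trigger). Both elements use the same
    clock, but the canary sees the data late by canary_delay_ns, so it fails
    first. A mismatch warns that timing margin is nearly exhausted while the
    main value is still correct.
    """
    main = sample(arrival_ns, period_ns, old, new)
    canary = sample(arrival_ns + canary_delay_ns, period_ns, old, new)
    return main, canary, main != canary

if __name__ == "__main__":
    # Plenty of margin: no trigger.
    print(canary_ff(0.7, 1.0, 0.2, old=0, new=1))
    # Margin smaller than the canary delay: trigger, main value still correct.
    print(canary_ff(0.9, 1.0, 0.2, old=0, new=1))
```

In this model the trigger fires before the main flip-flop fails, so a controller could back off voltage or frequency scaling without needing a recovery step.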

VA ULP CMP Architecture
- Statistical characteristics of circuit delay
  - As the number of critical paths increases, the mean delay increases and the standard deviation decreases
- A CMP with simple CPU cores
  - reduces critical path delay, and
  - increases the number of critical paths
  - so it is more variation-tolerant

[Figure: plot with a vertical axis from 0 to 1.2 and a horizontal axis from 5 to 11, annotated "100" and "x2".]

M. Hashimoto: Increase in Delay Uncertainty by Performance Optimization, ISCAS, 2001.
T. Sato: Architectures Study beyond Physical Limitations, NGArch Forum, July 2006.
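The statistical claim about critical paths can be reproduced with a small Monte Carlo experiment. This is a minimal sketch assuming that path delays are independent and normally distributed and that circuit delay is the maximum over all critical paths. The mean, sigma, trial count, and seed are arbitrary illustrative values, not figures from the cited work.

```python
import random
import statistics

def max_delay_stats(num_paths, trials=2000, mean_ns=1.0, sigma_ns=0.1, seed=0):
    """Estimate mean and standard deviation of the slowest of num_paths paths.

    Each path delay is drawn independently from a normal distribution
    (an assumption); the circuit delay in each trial is the maximum draw.
    """
    rng = random.Random(seed)
    samples = [
        max(rng.gauss(mean_ns, sigma_ns) for _ in range(num_paths))
        for _ in range(trials)
    ]
    return statistics.mean(samples), statistics.pstdev(samples)

if __name__ == "__main__":
    for n in (1, 10, 100):
        mean, std = max_delay_stats(n)
        print(f"{n:4d} critical paths: mean = {mean:.3f} ns, std = {std:.3f} ns")
```

As the number of paths grows, the estimated mean of the maximum rises while its spread shrinks, which is the behavior the slide attributes to designs with many near-critical paths.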