Low Power Cache Design. Angel Chen Joe Gambino

Similar documents
Breaking the Energy Barrier in Fault-Tolerant Caches for Multicore Systems

Very Large Scale Integration (VLSI)

HDL IMPLEMENTATION OF SRAM BASED ERROR CORRECTION AND DETECTION USING ORTHOGONAL LATIN SQUARE CODES

AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES

Energy-Efficient Cache Design Using Variable-Strength Error-Correcting Codes

Two-Layer Error Control Codes Combining Rectangular and Hamming Product Codes for Cache Error

An Integrated ECC and BISR Scheme for Error Correction in Memory

DETECTION AND CORRECTION OF CELL UPSETS USING MODIFIED DECIMAL MATRIX

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

Exploiting Unused Spare Columns to Improve Memory ECC

An Integrated ECC and Redundancy Repair Scheme for Memory Reliability Enhancement

Adaptive Multi-bit Crosstalk-Aware Error Control Coding Scheme for On-Chip Communication

Comparative Analysis of DMC and PMC on FPGA

An Area-Efficient BIRA With 1-D Spare Segments

Post-Manufacturing ECC Customization Based on Orthogonal Latin Square Codes and Its Application to Ultra-Low Power Caches

Majority Logic Decoding Of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

Error Correction Using Extended Orthogonal Latin Square Codes

Improving Memory Repair by Selective Row Partitioning

Analysis of 8T SRAM Cell Using Leakage Reduction Technique

Scalable Controller Based PMBIST Design For Memory Testability M. Kiran Kumar, G. Sai Thirumal, B. Nagaveni M.Tech (VLSI DESIGN)

Reliability of Memory Storage System Using Decimal Matrix Code and Meta-Cure

An Energy-Efficient Scan Chain Architecture to Reliable Test of VLSI Chips

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

FPGA Implementation of ALU Based Address Generation for Memory

A Robust Bloom Filter

ABSTRACT I. INTRODUCTION

An Energy Improvement in Cache System by Using Write Through Policy

Outline of Presentation Field Programmable Gate Arrays (FPGAs(

Outline. Parity-based ECC and Mechanism for Detecting and Correcting Soft Errors in On-Chip Communication. Outline

A Low-Cost Correction Algorithm for Transient Data Errors

A Performance Degradation Tolerable Cache Design by Exploiting Memory Hierarchies

Available online at ScienceDirect. Procedia Technology 25 (2016 )

POWERFUL BISR DESIGN FOR EMBEDDED SRAM WITH SELECTABLE REDUNDANCY

Improved Error Correction Capability in Flash Memory using Input / Output Pins

Overview ECE 753: FAULT-TOLERANT COMPUTING 1/21/2014. Recap. Fault Modeling. Fault Modeling (contd.) Fault Modeling (contd.)

A Universal Test Pattern Generator for DDR SDRAM *

Error Detecting and Correcting Code Using Orthogonal Latin Square Using Verilog HDL

SRAM Delay Fault Modeling and Test Algorithm Development

Built-in Self-Test and Repair (BISTR) Techniques for Embedded RAMs

ReSpace/MAPLD Conference Albuquerque, NM, August A Fault-Handling Methodology by Promoting Hardware Configurations via PageRank

FPGA Implementation of Double Error Correction Orthogonal Latin Squares Codes

Design and Implementation of Microcode based Built-in Self-Test for Fault Detection in Memory and its Repair

FLEXIBLE PRODUCT CODE-BASED ECC SCHEMES FOR MLC NAND FLASH MEMORIES

ARCHITECTURE DESIGN FOR SOFT ERRORS

A Proposed RAISIN for BISR for RAM s with 2D Redundancy

Area Versus Detection Latency Trade-Offs in Self-Checking Memory Design

Modeling And Simulation Of Microcode Based Asynchronous Memory Built In Self Test For Fault Detection Using Verilog

Chapter 8. Coping with Physical Failures, Soft Errors, and Reliability Issues. System-on-Chip EE141 Test Architectures Ch. 8 Physical Failures - P.

Yield Enhancement Considerations for a Single-Chip Multiprocessor System with Embedded DRAM

A Low-Power ECC Check Bit Generator Implementation in DRAMs

ECE 1767 University of Toronto

TESTING OF FAULTS IN VLSI CIRCUITS USING ONLINE BIST TECHNIQUE BASED ON WINDOW OF VECTORS

CALCULATION OF POWER CONSUMPTION IN 7 TRANSISTOR SRAM CELL USING CADENCE TOOL

3D Memory Formed of Unrepairable Memory Dice and Spare Layer

International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 ISSN

DESIGN OF FAULT SECURE ENCODER FOR MEMORY APPLICATIONS IN SOC TECHNOLOGY

Effective Implementation of LDPC for Memory Applications

Survey on Stability of Low Power SRAM Bit Cells

RAM Testing Algorithms for Detection Multiple Linked Faults

Error Detection And Correction

Efficient Algorithm for Test Vector Decompression Using an Embedded Processor

Associate Professor Dr. Raed Ibraheem Hamed

Analysis of Soft Error Mitigation Techniques for Register Files in IBM Cu-08 90nm Technology

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead

Chapter 5. Internal Memory. Yonsei University

Single error correction, double error detection and double adjacent error correction with no mis-correction code

AUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM

Test/Repair Area Overhead Reduction for Small Embedded SRAMs

FAULT TOLERANT SYSTEMS

Safety and Reliability of Software-Controlled Systems Part 14: Fault mitigation

Efficient Implementation of Single Error Correction and Double Error Detection Code with Check Bit Precomputation

Error Detection and Correction by using Bloom Filters R. Prem Kumar, Smt. V. Annapurna

Improving Fault Tolerance of Network-on-Chip Links via Minimal Redundancy and Reconfiguration

Near-Threshold Computing: How Close Should We Get?

Block Sparse and Addressing for Memory BIST Application

[Kalyani*, 4.(9): September, 2015] ISSN: (I2OR), Publication Impact Factor: 3.785

Transient Fault Detection and Reducing Transient Error Rate. Jose Lugo-Martinez CSE 240C: Advanced Microarchitecture Prof.

ECC Protection in Software

DESIGN AND ANALYSIS OF TRANSIENT FAULT TOLERANCE FOR MULTI CORE ARCHITECTURE

Implementation of Decimal Matrix Code For Multiple Cell Upsets in Memory

CSCI 402: Computer Architectures. Performance of Multilevel Cache

Complex test pattern generation for high speed fault diagnosis in Embedded SRAM

Efficient BISR strategy for Embedded SRAM with Selectable Redundancy using MARCH SS algorithm. P. Priyanka 1 and J. Lingaiah 2

Scan-Based BIST Diagnosis Using an Embedded Processor

On-Line Failure Detection and Confinement in Caches

ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture) Availability. Copyright 2010 Daniel J. Sorin Duke University

Computer Architecture: Multithreading (III) Prof. Onur Mutlu Carnegie Mellon University

Lecture 5: Refresh, Chipkill. Topics: refresh basics and innovations, error correction

Design of high speed low power Content Addressable Memory (CAM) using parity bit and gated power matchline sensing

At-Speed On-Chip Diagnosis of Board-Level Interconnect Faults

ISSN Vol.04,Issue.01, January-2016, Pages:

An Advanced and more Efficient Built-in Self-Repair Strategy for Embedded SRAM with Selectable Redundancy

N-Model Tests for VLSI Circuits

120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 61, NO. 2, FEBRUARY 2014

International Journal of Advance Engineering and Research Development LOW POWER AND HIGH PERFORMANCE MSML DESIGN FOR CAM USE OF MODIFIED XNOR CELL

LOW POWER SRAM CELL WITH IMPROVED RESPONSE

A Survey on Dram Testing and Its Algorithms

VERY large scale integration (VLSI) design for power

Model EXAM Question Bank

Memory Bus Encoding for Low Power: A Tutorial

Transcription:

Low Power Cache Design Angel Chen Joe Gambino

Agenda Why is low power important? How does cache contribute to the power consumption of a processor? What are some design challenges for low power caches? Existing solutions for low power cache design Researched ultra-low-voltage cache design solution for many-core systems Detailed architectural overview of the design performance comparison with existing solutions

Background The cost/performance ratio of computing systems have seen a steady decline due to advances in: Integrated circuit technology Architectural improvements in CPU design At the same time, more and more emphasis is given to the minimization of environmental footprint of computing systems. In particular, energy efficiency is a key concern in future many-core (1000+ processors) system design.

Background (cont.) Among all processor components, the cache consumes about 40% of the total area and 20% of power. Supply voltage scaling can reduce system energy consumption at the cost of performance and reliability. (Especially with the current trend toward many-core processors and deep nanometer designs)

Existing Solutions Cache designs that intrinsically consume less energy. Employ advanced fault tolerance techniques to maintain reliability under low supply voltage.

Existing Solutions (cont.) Redundant Memory Excessive area and power overheads when cell-failure probability increases Cache Line Disabling This is only effective if the cell-failure probability is low ECC (error correcting codes) E.g. extended hamming codes, multi-bit clustered ECC

Cache Fault Analysis Soft cache fault is generally caused by random noises or particle strikes. It occurs transiently and is recoverable through error-detecting/correcting codes. Hard cache fault is persistent memory cell fault that occurs as a result of chip manufacturing defects or device aging. Error correction codes can also be used to correct this failure but can only correct single-bit errors.

Cache Fault Analysis (cont.)

Cache Fault Analysis (cont.)

Researched Solution Cache Line Disabling + Double Error Correcting Triple Error Detecting Codes (1-bit error correction for hard errors and 1-bit error correction for soft errors)

Researched Solution (cont.)

Cache Line Disabling On ROM software called Built In Self Test checks for errors during initialization of processor Detects Hard Cache faults Disables Cache lines with more than one fault Only works if the number of faulty lines is low Requires no additional hardware overhead

Cache Line Disabling (cont.) SRAM - Modeled as 3 blocks Memory Cell Array, Address Decoder, and Read/Write Circuit All faults can be detected by checking memory cells Two general types of cache errors Single-Cell and Multi-Cell Stuck-At 1/0, Stuck Open, State Transition, Data Retention Cell State Coupling, Multiple Address

Cache Line Disabling (cont.) Any BIST with 100% correctness can be used Most efficient is 9N with an added Data Retention test 9N utilizes Marching For Word-Oriented SRAM, each march element is a word Introduces Multi-Cell errors Writes one march element at a time to each word address, reads it, compares them After marching, data retention test runs This method performs all checks with the best speed/rom space ratio

ECC Overview Error Detection with Parity Codes can detect all odd number faults, cannot correct error Single-Error Correction Codes - product codes (2-D parity check), parity check matrices - such coding scheme is commonly referred to as hamming codes - cannot simultaneously detect double-bit errors

ECC Overview (cont.)

ECC Overview (cont.) Single-Error Correct Double-Error Detect Codes - widely used to protect memory cells - extend SEC by adding an extra bit that represents the parity over the seven bits of the SEC code word

DECTED Double-Error Correcting Triple-Error Detecting Codes

DECTED (cont.)

Simulation Comparison

Simulation Comparison (cont.)

Simulation Comparison (cont.)

Simulation Comparison (cont.)

Simulation Comparison (cont.)

Simulation Comparison (cont.)

References [1] M. Zhang, V. Stojanovic and P. Ampadu, "Reliable ultra-low-voltage cache design for many-core systems," IEEE Trans. Circuits Syst. II: Regular Papers,, vol. 59, no. 12, pp. 858-862, Dec. 2012. [2] A. Aladmeldeen, I. Wagner, Z. Chishti, W. Wu, C. Wilkerson, and S.-L. Lu, Energy-efficient cache design using variable-strength errorcorrecting codes, in Proc. 38th Int. Symp. Comput. Architecture, 2011, pp. 461 472. [3] Mukherjee, Shubu. Architecture Design for Soft Errors. Amsterdam: Morgan Kaufmann/Elsevier, 2008. Print. [4]A.Agarwal,B.C.Paul,H.Mahmoodi,A.Datta,and K.Roy, A processtolerant cache architecture for improved yield in nanoscale technologies, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 1, pp. 27 38, Jan. 2005. [5] M. H. Tehranipour, Z. Navabi, and S. M. Falkhrai, An efficient BIST method for testing of embedded SRAMs, in Proc. IEEE Int. Symp. Circuits and Systems, vol. 5, May 2001, pp. 73 76. [6] Kenyon, Samantha Rose, "Drowsy Cache Partitioning for Multithreaded Systems and High Level Caches" (2014). Thesis. Rochester Institute of Technology. Accessed from RIT Scholar Works Thesis/Dissertation Collections.

Questions?