Lecture 5: Scheduling and Reliability. Topics: scheduling policies, handling DRAM errors

PAR-BS (Mutlu and Moscibroda, ISCA 2008)
A batch of requests is formed per bank; each thread can contribute at most R requests to the batch, and batch requests have priority over non-batch requests.
Within a batch, priority is given first to row-buffer hits, then to threads with a higher rank, then to older requests.
Rank is computed from each thread's memory intensity; low-intensity threads receive higher rank. This policy improves batch completion time and overall throughput.
Because ranking prioritizes all of a thread's requests together, its requests to different banks are serviced in parallel; hence, parallelism-aware batch scheduling.
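To make the priority order concrete, here is a minimal Python sketch of the PAR-BS arbitration rule (batch membership, then row-buffer hit, then thread rank, then age); the request fields, helper names, and the flat request list are illustrative simplifications, not the paper's implementation.

```python
# Illustrative sketch of PAR-BS request prioritization (not the authors' code).
# A request is preferred if it is (1) in the current batch, (2) a row-buffer hit,
# (3) from a higher-ranked (lower memory-intensity) thread, (4) older.

from dataclasses import dataclass

@dataclass
class Request:
    thread_id: int
    in_batch: bool      # marked when the batch was formed (up to R per thread per bank)
    row_hit: bool       # hits the currently open row in its bank
    arrival_time: int   # smaller = older

def priority_key(req, thread_rank):
    # thread_rank[tid]: higher value = higher rank (lower memory intensity)
    return (req.in_batch, req.row_hit, thread_rank[req.thread_id], -req.arrival_time)

def pick_next(ready_requests, thread_rank):
    # Each scheduling cycle, issue the highest-priority ready request.
    return max(ready_requests, key=lambda r: priority_key(r, thread_rank))
```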

TCM (Kim et al., MICRO 2010)
Threads are organized into latency-sensitive and bandwidth-sensitive clusters based on memory intensity; the latency-sensitive cluster gets higher priority.
Within the bandwidth-sensitive cluster, priority is based on rank.
Rank is determined by a thread's "niceness", and the ranking is periodically shuffled with insertion shuffling or random shuffling (the former is used if there is a big gap in niceness).
Threads with low row-buffer hit rates and high bank-level parallelism are considered nice to others.
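A rough sketch of the clustering and niceness ideas, assuming simple per-thread statistics (memory intensity, bandwidth share, row-buffer hit rate, bank-level parallelism); the threshold value and the niceness formula below are placeholders, not TCM's exact definitions.

```python
# Rough sketch of TCM-style clustering and niceness (threshold and formula are
# placeholders, not the paper's exact definitions).

def cluster_threads(threads, intensity, bandwidth, cluster_frac=0.1):
    # Least memory-intensive threads join the latency-sensitive cluster until
    # their combined bandwidth share reaches cluster_frac of total bandwidth.
    total_bw = sum(bandwidth[t] for t in threads)
    latency_sensitive, used = [], 0.0
    for t in sorted(threads, key=lambda t: intensity[t]):
        if used + bandwidth[t] <= cluster_frac * total_bw:
            latency_sensitive.append(t)
            used += bandwidth[t]
    bw_sensitive = [t for t in threads if t not in latency_sensitive]
    return latency_sensitive, bw_sensitive

def niceness(row_hit_rate, bank_level_parallelism):
    # Low row-buffer locality and high bank-level parallelism mean less
    # interference with other threads, hence a "nicer" thread and a higher rank.
    return bank_level_parallelism - row_hit_rate
```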

Minimalist Open-Page (Kaseridis et al., MICRO 2011)
Place 4 consecutive cache lines in one bank, the next 4 in a different bank, and so on; this provides the best balance between row-buffer locality and bank-level parallelism.
With this mapping, fairness is less of a concern.
Scheduling first takes priority into account, where priority is determined by wait time, prefetch distance, and the MLP in the thread.
A row is precharged after 50 ns, or immediately after a prefetch-dictated large burst.
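The interleaving can be expressed as a simple address decomposition; the sketch below assumes 64-byte cache lines and 8 banks purely for illustration.

```python
# Sketch of the minimalist address interleaving: 4 consecutive cache lines map
# to one bank, the next 4 go to the next bank, and so on.
# Field widths are illustrative (64B lines, 8 banks).

LINE_BYTES = 64
LINES_PER_BANK_CHUNK = 4
NUM_BANKS = 8

def map_address(addr):
    line = addr // LINE_BYTES
    chunk = line // LINES_PER_BANK_CHUNK       # groups of 4 consecutive lines
    bank = chunk % NUM_BANKS                   # spread chunks round-robin over banks
    row = chunk // NUM_BANKS                   # remaining bits select the row
    offset_in_chunk = line % LINES_PER_BANK_CHUNK
    return bank, row, offset_in_chunk

# Lines 0-3 land in bank 0, lines 4-7 in bank 1, ..., lines 32-35 back in bank 0.
```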

Other Scheduling Ideas
Using reinforcement learning: Ipek et al., ISCA 2008.
Coordinating across multiple memory controllers: Kim et al., HPCA 2010.
Coordinating requests from the GPU and CPU: Ausavarungnirun et al., ISCA 2012.
Several schedulers in the Memory Scheduling Championship at ISCA 2012.
Predicting the number of row-buffer hits: Awasthi et al., PACT 2011.

Basic Reliability
Every 64-bit data transfer is accompanied by an 8-bit (Hamming) code, typically stored in an extra x8 DRAM chip.
Guaranteed to correct any single-bit error and detect any two-bit error (SECDED).
Such DIMMs are commodities and are sufficient for most applications, many of which are inherently error-tolerant (e.g., search).
The cost is 12.5% overhead in storage and energy.
For a BCH code, correcting t errors in k bits of data requires an r-bit code, where r = t * ceil(log2 k) + 1.
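The storage-overhead number on this slide follows directly from the code-size formula; a short worked example:

```python
# Worked example of the code-size arithmetic on the slide.
import math

def bch_check_bits(k, t):
    # r = t * ceil(log2 k) + 1 check bits to correct t errors in k data bits
    return t * math.ceil(math.log2(k)) + 1

k = 64
r_sec = bch_check_bits(k, 1)       # 7 bits for single-error correction
r_secded = r_sec + 1               # one extra parity bit for double-error detection
print(r_secded, r_secded / k)      # 8 check bits -> 8/64 = 12.5% storage overhead
```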

Terminology
Hard errors: caused by permanent device-level faults.
Soft errors: caused by particle strikes, noise, etc.
SDC: silent data corruption (the error was never detected).
DUE: detected unrecoverable error.
A DUE in memory caused by a hard error will typically lead to DIMM replacement.
Scrubbing: a background scan of memory (1 GB every 45 minutes) to detect and correct 1-bit errors.
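At the quoted scrub rate, a full pass over a large memory takes a long time; a quick back-of-the-envelope calculation (the 256 GB capacity is just an example):

```python
# Back-of-the-envelope scrub time at the rate quoted on the slide (1 GB / 45 min).
capacity_gb = 256                       # example server capacity (illustrative)
scrub_rate_gb_per_hour = 1 / 0.75       # 1 GB every 45 minutes
full_pass_hours = capacity_gb / scrub_rate_gb_per_hour
print(full_pass_hours)                  # 192 hours, i.e. ~8 days per full pass
```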

Field Studies (Schroeder et al., SIGMETRICS 2009)
Memory errors are among the top causes of hardware failures in servers, and DIMMs are among the most frequently replaced server components.
The study examined Google servers during 2006-2008, covering DDR1, DDR2, and FBDIMM.

Field Studies (Schroeder et al., SIGMETRICS 2009), continued
A machine with past errors is more likely to have future errors; 20% of DIMMs account for 94% of errors.
DIMMs in platforms C and D see higher uncorrectable-error rates because they do not have chipkill.
65-80% of uncorrectable errors are preceded by a correctable error in the same month, but predicting an uncorrectable error is still very difficult.
Chip/DIMM capacity does not strongly influence error rates.
Higher temperature by itself does not cause more errors, but higher system utilization does.
Error rates do increase with age; the increase is steep in the 10-18 month range and then flattens out.

Chipkill
Chipkill-correct systems can withstand the failure of an entire DRAM chip.
For chipkill correctness, the 72-bit word must be spread across 72 DRAM chips (one bit per chip, so a chip failure corrupts only one bit of any word, which SECDED can correct), or a 13-bit word (8-bit data and 5-bit ECC) must be spread across 13 DRAM chips.
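The two organizations on this slide trade activation width against storage overhead, because the number of SECDED check bits does not shrink proportionally with the data word. A small sketch of that arithmetic, using the Hamming bound 2^r >= k + r + 1 plus one extra bit for double-error detection:

```python
# Why narrow-word chipkill is expensive: SECDED check bits per word barely
# shrink as the data word gets narrower.

def secded_bits(k):
    # Smallest r with 2^r >= k + r + 1 (single-error correction),
    # plus one extra parity bit for double-error detection.
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r + 1

print(secded_bits(64), secded_bits(64) / 64)   # 8 bits -> 12.5% overhead (72 x1 chips)
print(secded_bits(8), secded_bits(8) / 8)      # 5 bits -> 62.5% overhead (13 x1 chips)
```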

RAID-like DRAM Designs
DRAM chips do not have built-in error detection.
One option is a 9-chip rank with ECC to detect and recover from a single error; in case of a multi-bit error, rely on a second tier of error correction.
Another option is parity across DIMMs (needs an extra DIMM): use the ECC within a DIMM to recover from 1-bit errors, and the parity across DIMMs to recover from multi-bit errors within one DIMM.
Reads are cheap (must access only 1 DIMM); writes are expensive (must read and write 2 DIMMs), as sketched below.
This approach is used in some HP servers.
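A minimal sketch of the read/write asymmetry with parity across DIMMs; the word-indexed lists stand in for DIMMs, and the function names are illustrative.

```python
# Sketch of the RAID-like write cost with parity across DIMMs: a write must read
# the old data and old parity and write new data and new parity (two DIMMs
# touched), while a read touches only the data DIMM.

def write_block(data_dimm, parity_dimm, index, new_data):
    old_data = data_dimm[index]            # read 1: data DIMM
    old_parity = parity_dimm[index]        # read 2: parity DIMM
    data_dimm[index] = new_data            # write 1
    parity_dimm[index] = old_parity ^ old_data ^ new_data   # write 2

def read_block(data_dimm, index):
    return data_dimm[index]                # single-DIMM access; ECC checked locally

data, parity = [0] * 8, [0] * 8
write_block(data, parity, 3, 0xAB)         # touches 2 DIMMs
print(read_block(data, 3))                 # touches 1 DIMM
```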

RAID-like DRAM (Udipi et al., ISCA 2010)
Add a checksum to every row in DRAM, verified at the memory controller.
This adds area overhead, but provides self-contained error detection.
When a chip fails, its data can be reconstructed by examining an additional parity DRAM chip.
Overheads can be controlled by keeping one checksum for a large row, or one parity chip for many data chips.
Writes are again problematic.
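A toy sketch of the detect-then-reconstruct flow: the per-row checksum (checked at the memory controller) identifies the failed chip, and the lost data is rebuilt by XORing the surviving chips with the parity chip. The data values and layout below are made up for illustration.

```python
# Toy reconstruction of a failed chip's contribution from a parity chip.
from functools import reduce

def reconstruct(chips, parity, failed_idx):
    # chips: per-chip data words; parity = XOR of all chips' words
    survivors = [c for i, c in enumerate(chips) if i != failed_idx]
    return reduce(lambda a, b: a ^ b, survivors, parity)

chips = [0x11, 0x22, 0x33, 0x44]
parity = reduce(lambda a, b: a ^ b, chips, 0)
assert reconstruct(chips, parity, 2) == 0x33   # chip 2's data recovered
```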

Virtualized ECC (Yoon and Erez, ASPLOS 2010)
Also builds a two-tier error-protection scheme, but implements the second tier in software.
The second-tier codes are stored in the regular physical address space (not in specialized DRAM chips); software has flexibility in the types of codes to use and the types of pages that are protected.
Reads are cheap; writes are expensive as usual, but the second-tier codes can now be cached, which greatly reduces the number of DRAM writes.
Requires a 144-bit datapath (increases overfetch).
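A highly simplified sketch of why caching the second-tier codes helps: the code for a written block lands in the regular (cacheable) physical address space and reaches DRAM only on write-back. The address mapping, the toy one-byte "code", and the dictionaries standing in for DRAM and the cache are all assumptions for illustration.

```python
# Highly simplified sketch of virtualized second-tier codes (mapping, toy code,
# and data structures are illustrative assumptions, not the paper's design).

memory = {}            # models DRAM: physical address -> byte
t2_cache = {}          # models the processor cache holding tier-2 code lines

T2_BASE = 0x8000_0000  # region software sets aside for tier-2 codes (assumed)

def t2_addr(addr):
    return T2_BASE + addr          # placeholder data-address -> code-address mapping

def write(addr, value):
    memory[addr] = value                       # data (and tier-1 ECC) written to DRAM
    t2_cache[t2_addr(addr)] = value ^ 0xFF     # toy tier-2 code stays in the cache

def flush_tier2():
    memory.update(t2_cache)        # dirty tier-2 codes reach DRAM only on write-back
    t2_cache.clear()
```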

LoT-ECC (Udipi et al., ISCA 2012)
Uses checksums to detect errors and parity codes to correct them.
Requires access to only 9 DRAM chips per read, but the storage overhead grows to about 26%.
