Lecture: Storage, GPUs. Topics: disks, RAID, reliability, GPUs (Appendix D, Ch 4)

Magnetic Disks
A magnetic disk consists of 1-12 platters (metal or glass disks covered with magnetic recording material on both sides), with diameters between 1 and 3.5 inches.
Each platter is composed of concentric tracks (5,000-30,000 per surface), and each track is divided into sectors (100-500 per track, each about 512 bytes).
A movable arm holds the read/write heads for each disk surface and moves them all in tandem; a cylinder of data is accessible at a time.
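
Example: a minimal host-side sketch (plain C, also valid as CUDA host code) that multiplies out the geometry terms above to estimate capacity. The specific geometry (4 platters, 20,000 tracks per surface, 300 sectors of 512 bytes) is an illustrative assumption, not a real drive.

    #include <stdio.h>

    int main(void) {
        long platters           = 4;
        long surfaces           = platters * 2;   /* both sides are recorded */
        long tracks_per_surface = 20000;
        long sectors_per_track  = 300;
        long bytes_per_sector   = 512;

        long long capacity = (long long)surfaces * tracks_per_surface *
                             sectors_per_track * bytes_per_sector;
        printf("Capacity: %lld bytes (~%.1f GB)\n", capacity, capacity / 1e9);
        return 0;
    }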

Disk Latency
To read/write data, the arm has to be placed on the correct track; this seek time usually takes 5 to 12 ms on average, and can take less if there is spatial locality.
Rotational latency is the time taken to rotate the correct sector under the head; the average is typically 2 ms or more (a half rotation at 15,000 RPM takes 2 ms).
Transfer time is the time taken to transfer a block of bits out of the disk; transfer rates are typically 3-65 MB/second.
A disk controller maintains a disk cache (so spatial locality can be exploited) and sets up the transfer on the bus (controller overhead).
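
Example: a minimal host-side sketch (plain C, also valid as CUDA host code) that adds up the four latency components above for one 4 KB read. The specific numbers (6 ms average seek, 15,000 RPM, 50 MB/s transfer rate, 0.2 ms controller overhead) are illustrative assumptions, not measurements.

    #include <stdio.h>

    int main(void) {
        double seek_ms       = 6.0;                    /* assumed average seek time        */
        double rpm           = 15000.0;
        double rotational_ms = (60000.0 / rpm) / 2.0;  /* wait half a rotation on average  */
        double transfer_MBps = 50.0;                   /* assumed sustained transfer rate  */
        double block_KB      = 4.0;
        double transfer_ms   = (block_KB / 1024.0) / transfer_MBps * 1000.0;
        double controller_ms = 0.2;                    /* assumed controller overhead      */

        double total_ms = seek_ms + rotational_ms + transfer_ms + controller_ms;
        printf("Average access time for a 4 KB read: %.3f ms\n", total_ms);
        return 0;
    }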

RAID
Reliability and availability are important metrics for disks.
RAID: redundant array of inexpensive (independent) disks.
Redundancy can deal with one or more failures.
Each sector of a disk records check information that allows the drive to determine whether that sector has an error (in other words, redundancy already exists within a disk).
When a disk read flags an error, we turn elsewhere for the correct data.

RAID 0 and RAID 1
RAID 0 has no additional redundancy (the name is a misnomer); it uses an array of disks and stripes (interleaves) data across the disks to improve parallelism and throughput.
RAID 1 mirrors or shadows every disk: every write happens to two disks.
Reads may go to the mirror only when the primary disk fails; alternatively, you may read both copies and accept the quicker response.
Expensive solution: high reliability at twice the cost.
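
Example: a minimal sketch of RAID 0 striping, mapping a logical block number to a (disk, block-within-disk) pair. Block-granularity round-robin striping across a 4-disk array is an illustrative assumption.

    #include <stdio.h>

    #define NUM_DISKS 4

    int main(void) {
        long logical_block = 10;                              /* example logical block number */
        int  disk       = (int)(logical_block % NUM_DISKS);   /* round-robin across disks     */
        long disk_block = logical_block / NUM_DISKS;          /* offset within that disk      */
        printf("Logical block %ld -> disk %d, block %ld\n", logical_block, disk, disk_block);
        return 0;
    }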

RAID 3
Data is bit-interleaved across several disks, and a separate disk maintains parity information for each set of bits.
For example, with 8 data disks: bit 0 is on disk-0, bit 1 is on disk-1, ..., bit 7 is on disk-7; disk-8 maintains parity for all 8 bits.
For any read, all 8 data disks must be accessed (as we usually read more than a byte at a time), and for any write, all 9 disks must be accessed because the parity has to be re-calculated.
High throughput for a single request, low cost for redundancy (overhead: 12.5%), low task-level parallelism.
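
Example: a minimal sketch of RAID 3-style parity. The parity disk stores the XOR of the corresponding bits on all data disks, so the contents of any single failed disk can be rebuilt by XOR-ing the survivors. The 8-disk array and sample bytes are illustrative assumptions.

    #include <stdio.h>

    #define DATA_DISKS 8

    int main(void) {
        unsigned char data[DATA_DISKS] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};
        unsigned char parity = 0;
        for (int i = 0; i < DATA_DISKS; i++)
            parity ^= data[i];                       /* parity disk holds XOR of all data */

        unsigned char rebuilt = parity;              /* reconstruct disk 3 as if it failed */
        for (int i = 0; i < DATA_DISKS; i++)
            if (i != 3) rebuilt ^= data[i];

        printf("parity = 0x%02x, rebuilt disk 3 = 0x%02x (expected 0x%02x)\n",
               parity, rebuilt, data[3]);
        return 0;
    }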

RAID 4 and RAID 5
Data is block-interleaved; this allows us to get all our data from a single disk on a read (in case of a disk error, read all 9 disks).
Block interleaving reduces throughput for a single request (as only a single disk drive services the request), but improves task-level parallelism, as the other disk drives are free to service other requests.
On a write, we access only the disk that stores the data and the parity disk; the parity information can be updated simply by checking whether the new data differs from the old data.
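
Example: a minimal sketch of the "small write" parity update described above: the new parity is the old parity XOR-ed with whatever bits changed between the old and new data, so only the data disk and the parity disk are touched. The byte values are illustrative assumptions.

    #include <stdio.h>

    int main(void) {
        unsigned char old_data   = 0x5A;
        unsigned char new_data   = 0x3C;
        unsigned char old_parity = 0xF0;   /* whatever parity the stripe held before */

        unsigned char new_parity = old_parity ^ (old_data ^ new_data);
        printf("old parity 0x%02x -> new parity 0x%02x\n", old_parity, new_parity);
        return 0;
    }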

RAID 5
If we have a single disk for parity, multiple writes cannot happen in parallel (as all writes must update the parity information).
RAID 5 distributes the parity blocks across all disks to allow simultaneous writes.
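
Example: a minimal sketch of how RAID 5 rotates the parity block from stripe to stripe so that writes to different stripes update parity on different drives. The simple rotating layout and the 5-disk array are illustrative assumptions, not a specific product's layout.

    #include <stdio.h>

    #define NUM_DISKS 5

    int main(void) {
        for (long stripe = 0; stripe < 5; stripe++) {
            int parity_disk = NUM_DISKS - 1 - (int)(stripe % NUM_DISKS);  /* rotate parity */
            printf("stripe %ld: parity on disk %d\n", stripe, parity_disk);
        }
        return 0;
    }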

Other Reliability Approaches
High reliability is also expected of memory systems; many memory systems offer SEC-DED support (single error correct, double error detect), implemented with an 8-bit code for every 64-bit data word on ECC DIMMs.
Some memory systems offer chipkill support: the ability to recover from the complete failure of one memory chip; many implementations exist, some resembling RAID designs.
Caches are typically protected with SEC-DED codes.
Some cores implement various forms of redundancy, e.g., DMR or TMR (dual or triple modular redundancy).
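
Example: a toy SEC-DED sketch, a Hamming(7,4) code extended with one overall parity bit, so that a single flipped bit can be corrected and a double flip detected. Real ECC DIMMs apply the same idea at (72,64) scale; the 4-bit data word here is an illustrative assumption to keep the example short.

    #include <stdio.h>

    /* Encode 4 data bits into an 8-bit codeword: bits 1..7 follow the usual
       Hamming(7,4) layout (p1 p2 d1 p4 d2 d3 d4); bit 0 holds the overall parity. */
    static unsigned char encode(unsigned d) {
        unsigned d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
        unsigned p1 = d1 ^ d2 ^ d4, p2 = d1 ^ d3 ^ d4, p4 = d2 ^ d3 ^ d4;
        unsigned c  = (p1 << 1) | (p2 << 2) | (d1 << 3) | (p4 << 4) |
                      (d2 << 5) | (d3 << 6) | (d4 << 7);
        unsigned overall = 0;
        for (int i = 1; i <= 7; i++) overall ^= (c >> i) & 1;
        return (unsigned char)(c | overall);
    }

    /* Returns 0 = clean, 1 = single error (corrected in place), 2 = double error. */
    static int check(unsigned char *cw) {
        unsigned c = *cw, syndrome = 0, overall = 0;
        for (int i = 1; i <= 7; i++)
            if ((c >> i) & 1) syndrome ^= i;              /* XOR of positions of set bits */
        for (int i = 0; i <= 7; i++) overall ^= (c >> i) & 1;

        if (syndrome == 0 && overall == 0) return 0;      /* no error                     */
        if (overall == 1) {                               /* odd number of flips: single  */
            *cw ^= (unsigned char)(1u << syndrome);       /* syndrome = flipped position  */
            return 1;
        }
        return 2;                                         /* even, nonzero: double error  */
    }

    int main(void) {
        unsigned char cw = encode(0xB);                   /* encode data value 1011b      */
        cw ^= (unsigned char)(1u << 5);                   /* flip one bit                 */
        printf("single flip -> status %d (1 = corrected)\n", check(&cw));

        cw ^= (unsigned char)((1u << 2) | (1u << 6));     /* now flip two bits            */
        printf("double flip -> status %d (2 = detected, not corrected)\n", check(&cw));
        return 0;
    }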

SIMD Processors
Single instruction, multiple data.
Such processors offer energy efficiency because a single instruction fetch can trigger many data operations.
Such data parallelism is useful for many image/sound and numerical applications.

GPUs
Initially developed as graphics accelerators; now viewed as one of the densest compute engines available.
Many ongoing efforts aim to run non-graphics workloads on GPUs, i.e., to use them as general-purpose GPUs or GPGPUs.
C/C++-based programming platforms enable wider use of GPGPUs: CUDA from NVIDIA and OpenCL from an industry consortium.
A heterogeneous system has a regular host CPU and a GPU that handles (say) the CUDA code (they can both be on the same chip).
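
Example: a minimal CUDA sketch of this host/GPU split. The host CPU allocates memory, copies data to the GPU, and launches a kernel; the GPU runs one thread per element. The array size and launch configuration are illustrative choices.

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   /* global thread index */
        if (i < n) c[i] = a[i] + b[i];
    }

    int main(void) {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes),
              *h_c = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        float *d_a, *d_b, *d_c;
        cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        int threads = 256;                               /* threads per block */
        int blocks  = (n + threads - 1) / threads;       /* thread blocks     */
        vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f\n", h_c[0]);                   /* expect 3.0 */
        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }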

The GPU Architecture
SIMT: single instruction, multiple thread; a GPU has many SIMT cores.
A large data-parallel operation is partitioned into many thread blocks (one per SIMT core); a thread block is partitioned into many warps (one warp running at a time in the SIMT core); a warp is partitioned across many in-order pipelines (each is called a SIMD lane).
A SIMT core can have multiple active warps at a time, i.e., the SIMT core stores the registers for each warp; warps can be context-switched at low cost; a warp scheduler keeps track of runnable warps and schedules a new warp when the currently running warp stalls.
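
Example: a small host-side sketch (plain C, also valid as CUDA host code) of the block/warp/lane decomposition above, assuming the common warp size of 32 threads and an illustrative 256-thread block.

    #include <stdio.h>

    #define WARP_SIZE         32
    #define THREADS_PER_BLOCK 256

    int main(void) {
        int tid     = 100;                 /* example thread index within its block */
        int warp_id = tid / WARP_SIZE;     /* which warp of the block (0..7 here)   */
        int lane_id = tid % WARP_SIZE;     /* which SIMD lane within that warp      */
        printf("Block of %d threads = %d warps; thread %d -> warp %d, lane %d\n",
               THREADS_PER_BLOCK, THREADS_PER_BLOCK / WARP_SIZE, tid, warp_id, lane_id);
        return 0;
    }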

The GPU Architecture
[Figure omitted in the transcription.]

Architecture Features
Simple in-order pipelines that rely on thread-level parallelism to hide long latencies.
Many registers (~1K) per in-order pipeline (lane) to support many active warps.
When a branch is encountered, some of the lanes proceed along the "then" case depending on their data values; later, the other lanes evaluate the "else" case; such a branch cuts the data-level parallelism in half (branch divergence).
When a load/store is encountered, the requests from all lanes are coalesced into a few 128-byte cache-line requests; each request may return at a different time (memory divergence).
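
Example: a small CUDA sketch of both effects. The if/else causes branch divergence within a warp (the two paths execute one after the other), while the a[i] accesses of consecutive lanes touch consecutive words and coalesce into a few cache-line requests. The kernel name, data, and launch configuration are illustrative.

    #include <stdio.h>
    #include <cuda_runtime.h>

    __global__ void divergeAndCoalesce(const float *a, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float x = a[i];            /* consecutive lanes read consecutive words: coalesced */
        if (x > 0.0f)              /* lanes whose data is positive run this path first... */
            out[i] = 2.0f * x;
        else                       /* ...then the remaining lanes run this path           */
            out[i] = -x;
    }

    int main(void) {
        const int n = 1024;
        float *a, *out;
        cudaMallocManaged(&a, n * sizeof(float));      /* unified memory, for brevity */
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; i++) a[i] = (i % 2) ? 1.0f : -1.0f;

        divergeAndCoalesce<<<(n + 255) / 256, 256>>>(a, out, n);
        cudaDeviceSynchronize();
        printf("out[0] = %f, out[1] = %f\n", out[0], out[1]);   /* expect 1.0 and 2.0 */
        cudaFree(a); cudaFree(out);
        return 0;
    }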

GPU Memory Hierarchy
Each SIMT core has a private L1 cache (shared by the warps on that core).
A large L2 is shared by all SIMT cores; each L2 bank services a subset of all addresses.
Each L2 partition is connected to its own memory controller and memory channel.
The GDDR5 memory system runs at higher frequencies and uses chips with more banks, wide I/O, and better power-delivery networks.
A portion of GDDR5 memory is private to the GPU and the rest is accessible to the host CPU (the GPU performs the copies).

Advanced Courses
For GPU architectures and programming, see Mary Hall's CS 6235, Parallel Programming for Many-Core Architectures.
Spring '13: CS 7810, Advanced Computer Architecture, Mo/We 11:50am-1:10pm.
Covers core design, cache hierarchies, networks, memory systems, datacenters, etc.
Major course project that evaluates original ideas with simulators (often leads to publications); one assignment; take-home final.
