CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 24: More I/O: DMA, Disks, Networking

Instructors: Krste Asanović & Randy H. Katz
http://inst.eecs.berkeley.edu/~cs61c/fa17

Review: Address Translation and Protection
[Diagram: a virtual address (virtual page number + offset) passes through address translation and a protection check (kernel/user mode, read/write); a miss or violation raises an exception, otherwise it yields a physical address (physical page number + offset).]
Every instruction and data access needs address translation and protection checks.
Good VM design should be fast (~one cycle) and space efficient.

Review: Hierarchical Page Tables
Virtual address (32 bits): 10-bit L1 index p1 (bits 31-22), 10-bit L2 index p2 (bits 21-12), 12-bit offset (bits 11-0).
A processor register holds the root of the current page table. p1 indexes the Level 1 page table; p2 indexes a Level 2 page table, whose PTE points to a data page in physical memory, records a page in secondary memory, or marks a nonexistent page.
Page size: 12-bit offset → 4096 B pages; each 10-bit index → 1024 entries per table (so each L2 table maps 1024 × 4096 B).
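To make the walk concrete, here is a minimal C sketch (not from the slides) simulating the 10/10/12 two-level lookup; the pte_t layout and the l2_tables indirection are invented simplifications:

```c
#include <stdint.h>

/* Hypothetical PTE: a valid bit plus a 20-bit physical page number. */
typedef struct {
    uint32_t valid : 1;
    uint32_t ppn   : 20;
} pte_t;

/* Simulate the 10/10/12 two-level walk. `root` is the Level 1 table
 * (pointed to by a processor register on a real machine); `l2_tables`
 * maps an L1 PTE's ppn to the corresponding Level 2 table.
 * Returns the physical address, or -1 to signal a page fault. */
int64_t walk(const pte_t *root, pte_t *const *l2_tables, uint32_t va) {
    uint32_t p1     = (va >> 22) & 0x3FF;  /* bits 31-22: L1 index    */
    uint32_t p2     = (va >> 12) & 0x3FF;  /* bits 21-12: L2 index    */
    uint32_t offset =  va        & 0xFFF;  /* bits 11-0:  page offset */

    if (!root[p1].valid)
        return -1;                         /* no L2 table: page fault */
    pte_t pte = l2_tables[root[p1].ppn][p2];
    if (!pte.valid)
        return -1;                         /* page not in memory      */
    return ((int64_t)pte.ppn << 12) | offset;  /* PPN ++ offset       */
}
```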

Review: Translation Lookaside Buffers (TLB)
Address translation is very expensive! In a two-level page table, each reference becomes three memory accesses.
Solution: cache some translations in a TLB.
TLB hit ⇒ single-cycle translation. TLB miss ⇒ page-table walk to refill.
[Diagram: the VPN of the virtual address is matched against TLB entries (valid bit, dirty bit, tag, PPN); on a hit, the PPN is concatenated with the page offset to form the physical address. VPN = virtual page number, PPN = physical page number.]
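A small C sketch of the idea, assuming a toy fully associative TLB (the entry count and layout are ours, not a real design):

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16   /* toy size; real TLBs hold tens to hundreds */

typedef struct {
    bool     valid;
    uint32_t vpn;   /* tag: virtual page number */
    uint32_t ppn;   /* cached translation       */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Fully associative lookup. On a hit the translation costs one probe;
 * on a miss the caller walks the page table and refills an entry. */
bool tlb_lookup(uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> 12, offset = va & 0xFFF;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].ppn << 12) | offset;  /* hit */
            return true;
        }
    }
    return false;  /* miss: page-table walk, then refill */
}
```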

VM-related Events in the Pipeline
[Diagram: a five-stage pipeline (fetch, decode, execute, memory, writeback) with an instruction TLB + instruction cache at fetch and a data TLB + data cache at memory; either TLB can raise a TLB miss, page fault, or protection violation.]
Handling a TLB miss needs a hardware or software mechanism to refill the TLB; this is usually done in hardware.
Handling a page fault (e.g., the page is on disk) needs a precise trap so the software handler can easily resume after retrieving the page.
A protection violation may abort the process.

Address Translation: Putting It All Together
[Flowchart: Virtual address → TLB lookup (hardware). Hit → protection check. Miss → page-table walk (hardware or software): if the page is ∉ memory → page fault (OS loads the page); if ∈ memory → update TLB and retry. Protection check: denied → protection fault (SEGFAULT); permitted → physical address (to cache).]

Review: I/O
Memory-mapped I/O: device control/data registers are mapped into the CPU address space.
The CPU synchronizes with the I/O device by polling or interrupts.
Programmed I/O: the CPU executes lw/sw instructions for all data movement to/from devices, so the CPU spends its time doing two things:
1. Getting data from the device to main memory
2. Using the data to compute
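As a rough illustration of programmed I/O with polling, here is a C sketch; the register addresses and status-bit layout are hypothetical:

```c
#include <stdint.h>

/* Hypothetical memory-mapped registers; addresses and bits invented. */
#define DEV_STATUS ((volatile uint32_t *)0x10000000)  /* bit 0: data ready */
#define DEV_DATA   ((volatile uint32_t *)0x10000004)

/* Programmed I/O with polling: the CPU itself moves every word,
 * spinning on the status register between reads. */
void pio_read(uint32_t *buf, int nwords) {
    for (int i = 0; i < nwords; i++) {
        while ((*DEV_STATUS & 0x1) == 0)
            ;                  /* busy-wait until the device has data */
        buf[i] = *DEV_DATA;    /* one lw per word, through the CPU    */
    }
}
```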

Reality Check!
Memory-mapped I/O: device control/data registers mapped into the CPU address space.
The CPU synchronizes with the I/O device by polling or interrupts.
Programmed I/O → DMA: in reality the CPU does not execute lw/sw for all data movement; a DMA engine takes over item 1 below so the CPU can focus on item 2:
1. Getting data from the device to main memory
2. Using the data to compute

Outline
Direct Memory Access
Disks
Networking
Storage Attachment Evolution
Rack Scale Memory
And in Conclusion

What's Wrong with Programmed I/O?
Not ideal because:
1. The CPU has to execute all transfers and could be doing other work
2. Device speeds don't align well with CPU speeds
3. Energy cost of using a beefy general-purpose CPU where simpler hardware would suffice
Until now, the CPU has had sole control of main memory.
5% of CPU cycles on Google servers are spent in the memcpy() and memmove() library routines!*
*Kanev et al., "Profiling a Warehouse-Scale Computer", ISCA 2015 (June 2015), Portland, OR.

PIO vs. DMA
[Figure comparing the programmed I/O and DMA data paths.]

Direct Memory Access (DMA)
Allows I/O devices to directly read/write main memory.
New hardware: the DMA engine. The DMA engine contains registers written by the CPU (see the sketch below):
- Memory address to place data
- # of bytes
- I/O device #, direction of transfer
- Unit of transfer, amount to transfer per burst
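A possible shape for those registers in C, as a sketch only; all field names, the base address, and the start convention are invented for illustration:

```c
#include <stdint.h>

/* Invented register block and base address, for illustration only. */
typedef struct {
    volatile uint64_t mem_addr;   /* memory address to place/fetch data */
    volatile uint32_t num_bytes;  /* total transfer length              */
    volatile uint32_t device;     /* I/O device number                  */
    volatile uint32_t direction;  /* 0 = device-to-memory, 1 = memory-to-device */
    volatile uint32_t burst;      /* amount to transfer per burst       */
    volatile uint32_t control;    /* writing 1 starts the transfer      */
} dma_regs_t;

#define DMA ((dma_regs_t *)0x20000000)

/* Program the engine and return: the transfer then proceeds without
 * the CPU, which learns of completion via an interrupt. */
void dma_start_incoming(uint64_t dst, uint32_t len, uint32_t dev) {
    DMA->mem_addr  = dst;
    DMA->num_bytes = len;
    DMA->device    = dev;
    DMA->direction = 0;    /* incoming: device to memory */
    DMA->burst     = 64;
    DMA->control   = 1;    /* go */
}
```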

Operation of a DMA Transfer
[Figure from Section 5.1.4 "Direct Memory Access" in Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos, 2014.]

DMA: Incoming Data
1. Receive interrupt from device
2. CPU takes the interrupt and begins the transfer: instructs the DMA engine/device to place the data at a certain address
3. Device/DMA engine handles the transfer; the CPU is free to execute other things
4. Upon completion, the device/DMA engine interrupts the CPU again

DMA: Outgoing Data
1. CPU decides to initiate a transfer and confirms that the external device is ready
2. CPU begins the transfer: instructs the DMA engine/device that data is available at a certain address
3. Device/DMA engine handles the transfer; the CPU is free to execute other things
4. Device/DMA engine interrupts the CPU again to signal completion
(The completion handshake is sketched in code below.)
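One way the completion handshake might look in C, under the same invented register layout as above; a real OS would run another process rather than spin:

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_CONTROL ((volatile uint32_t *)0x20000014)  /* invented address */

static volatile bool dma_done = false;

/* Step 4: the engine interrupts the CPU when the transfer completes.
 * (Registering this handler is platform-specific and omitted.) */
void dma_completion_isr(void) {
    dma_done = true;
}

void transfer_and_wait(void) {
    dma_done = false;
    *DMA_CONTROL = 1;        /* steps 1-2: engine already programmed; start */
    while (!dma_done) {
        /* step 3: the CPU is free here; a real OS would schedule other
         * work instead of spinning */
    }
    /* step 4 observed: transfer finished, data is in memory */
}
```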

DMA: Some New Problems
Where in the memory hierarchy do we plug in the DMA engine? Two extremes:
- Between the L1 cache and the CPU. Pro: free coherency. Con: trashes the CPU's working set with transferred data.
- Between the last-level cache and main memory. Pro: doesn't mess with the caches. Con: need to explicitly manage coherency.

Outline
Direct Memory Access
Disks
Networking
And in Conclusion

Computer Memory Hierarchy: One of our Great Ideas
[Figure: the memory hierarchy pyramid, extended with a Rack Scale Memory/Storage level.]

Storage-Centric View of the Memory Hierarchy
[Figure: the levels above RAM appear on the previous slide.]
- RAM: file system cache
- Storage controller cache; embedded disk controller cache
- On-line: solid state disk, magnetic disk storage, disk array
- Near-line: optical disk jukebox, magnetic or optical tape library
- Off-line: shelved magnetic or optical tape

[Figure: DRAM, Flash, and Disk devices.]

Disk Device Performance (1/2)
[Diagram: platter with outer and inner tracks and sectors; spindle, head, arm, actuator, and controller.]
Disk Access Time = Seek Time + Rotation Time + Transfer Time + Controller Overhead
- Seek Time: time to position the head assembly at the proper cylinder
- Rotation Time: time for the disk to rotate to the point where the first sectors of the block to access reach the head
- Transfer Time: time taken by the sectors of the block, and any gaps between them, to rotate past the head

Disk Device Performance (2/2)
Average values to plug into the formula:
Rotation time: average distance of the sector from the head? 1/2 the time of a rotation.
7200 revolutions per minute ⇒ 120 rev/sec; 1 revolution = 1/120 sec ⇒ 8.33 milliseconds; 1/2 rotation ⇒ 4.17 ms.
Seek time: average number of tracks to move the arm? Number of tracks / 3 (see CS186 for the math).
Then seek time = (number of tracks moved) × (time to move across one track).
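Putting the model into a small C helper (the parameterization is ours; it just restates the formula from the previous slide):

```c
#include <stdio.h>

/* The lecture's average-case model, restated; returns milliseconds. */
double disk_access_ms(double tracks, double ms_per_track,
                      double rpm, double megabytes, double mb_per_s,
                      double controller_ms) {
    double seek     = (tracks / 3.0) * ms_per_track;   /* average seek: tracks/3 */
    double rotation = 0.5 * (60000.0 / rpm);           /* half a revolution      */
    double transfer = megabytes / mb_per_s * 1000.0;   /* block past the head    */
    return seek + rotation + transfer + controller_ms;
}

int main(void) {
    /* Sanity check against the slide: half a rotation at 7200 RPM. */
    printf("%.2f ms\n", 0.5 * (60000.0 / 7200.0));     /* prints 4.17 */
    return 0;
}
```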

But wait! Performance estimates are different in practice.
Modern disks have on-disk caches, which are hidden from the outside world.
Generally, what limits real performance is the on-disk cache access time.

Flash Memory / SSD Technology
An NMOS transistor with an additional conductor between the gate and source/drain traps electrons; their presence or absence encodes a 1 or 0.
Memory cells can withstand a limited number of program-erase cycles. Controllers use a technique called wear leveling to distribute writes as evenly as possible across all the flash blocks in the SSD.
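A toy sketch of the wear-leveling idea in C; real controllers also remap logical blocks and separate hot from cold data, so treat this as illustration only:

```c
#include <stdint.h>

#define NUM_BLOCKS 1024

static uint32_t erase_count[NUM_BLOCKS];  /* per-block program-erase tally */

/* Pick the least-worn free block for the next write so that erases
 * spread evenly. Returns -1 if nothing is free (garbage-collect first). */
int pick_block(const uint8_t is_free[NUM_BLOCKS]) {
    int best = -1;
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (is_free[b] && (best < 0 || erase_count[b] < erase_count[best]))
            best = b;
    }
    if (best >= 0)
        erase_count[best]++;  /* simplification: count the cycle at write time */
    return best;
}
```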

Administrivia (1/2)
- Project 3.2 (Performance Contest) has been released: up to 5 extra credit points for the highest speedups
- Final exam: 14 December, 7-10 PM @ TBA. Contact the head TA (Steven Ho) about conflicts if you haven't been contacted yet
- Review lectures and the book with an eye on the important concepts of the course
- Review session: Fri Dec 8, 5-8 PM @ TBA
- Electronic course evaluations starting this week! See https://course-evaluations.berkeley.edu

Administrivia (2/2)
- HW6 party tonight (Nov 21) in the Woz from 5-8 PM
- No discussions or labs this week! Labs resume Monday after Thanksgiving
- Lab 11 due in any lab before December 1
- Lab 13 due in any OH before December 8
- Homework 6 due tomorrow night
- Project 4 to be released Friday at the latest
- Homework 7 to be released Monday after break

Winners of the Project 3 Performance Competition!

Happy Birthday RAID: 11/20/87
RAID (Redundant Arrays of Inexpensive Disks) is covered in next Tuesday's lecture!

Happy Birthday Internet! 11/21/69

CS61C in the News: Supercomputer in a File Drawer
"The Raspberry Pi modules let developers figure out how to write this software and get it to work reliably without having a dedicated testbed of the same size, which would cost a quarter billion dollars and use 25 megawatts of electricity."
-- Gary Grider, leader of the High Performance Computing Division

Peer Instruction Question
We have the following disk:
- 15,000 cylinders; 1 ms to cross 1000 cylinders
- 15,000 RPM = 4 ms per rotation
- Want to copy 1 MB; transfer rate of 1000 MB/s
- 1 ms controller processing time
What is the access time using our model?
Disk Access Time = Seek Time + Rotation Time + Transfer Time + Controller Processing Time
A: 10.5 ms   B: 9 ms   C: 8.5 ms   D: 11.5 ms
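Working the numbers through the model (a quick check, not shown on the slide): average seek is 15,000/3 = 5,000 cylinders at 1 ms per 1,000 = 5 ms; half of a 4 ms rotation = 2 ms; 1 MB at 1000 MB/s = 1 ms; plus 1 ms of controller time gives 9 ms, answer B. In C:

```c
#include <stdio.h>

int main(void) {
    double seek       = (15000.0 / 3.0) * (1.0 / 1000.0); /* 5000 cylinders, 1 ms per 1000 = 5 ms */
    double rotation   = 4.0 / 2.0;                        /* half of a 4 ms rotation       = 2 ms */
    double transfer   = 1.0 / 1000.0 * 1000.0;            /* 1 MB at 1000 MB/s             = 1 ms */
    double controller = 1.0;                              /* given                         = 1 ms */
    printf("access time = %.1f ms\n",
           seek + rotation + transfer + controller);      /* 9.0 ms: answer B */
    return 0;
}
```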

Outline
Direct Memory Access
Disks
Networking
Storage Attachment Evolution
Rack Scale Memory
And in Conclusion

Networks: Talking to the Outside World
- Originally, sharing I/O devices between computers (e.g., printers)
- Then communicating between computers (e.g., file transfer protocol)
- Then communicating between people (e.g., e-mail)
- Then communicating between networks of computers (e.g., file sharing, the www, ...)

History: The Internet (1962)
- 1963: J.C.R. Licklider ("Lick"), while at DoD's ARPA, writes a memo describing the desire to connect the computers at various research universities: Stanford, Berkeley, UCLA, ...
- 1969: ARPA deploys 4 nodes @ UCLA, SRI, Utah, & UCSB
- 1973: Robert Kahn & Vint Cerf invent TCP, now part of the Internet Protocol Suite
- Internet growth rates: exponential since the start!
"Revolutions like this don't come along very often." -- Vint Cerf
www.computerhistory.org/internet_history
www.greatachievements.org/?id=3736
en.wikipedia.org/wiki/internet_protocol_suite

History: The World Wide Web (1989)
A system of interlinked hypertext documents on the Internet.
- 1945: Vannevar Bush describes a hypertext system called "memex" in an article
- 1989: Sir Tim Berners-Lee proposes and implements the first successful communication between a Hypertext Transfer Protocol (HTTP) client and server using the Internet
- ~2000: dot-com entrepreneurs rush in; 2001: the bubble bursts
- Today: access anywhere!
[Photo: Tim Berners-Lee and the world's first web server, 1990.]
en.wikipedia.org/wiki/history_of_the_world_wide_web

Shared vs. Switch-Based Networks
- Shared: one communication at a time (CSMA/CD)
- Switched: pairs (point-to-point connections) communicate at the same time
Aggregate bandwidth (BW) in a switched network is many times that of a shared network; point-to-point is faster since there is no arbitration and the interface is simpler.
[Diagram: nodes on a shared medium vs. nodes connected through a crossbar switch.]

What Makes Networks Work?
- Links connecting switches and/or routers to each other and to computers or devices
[Diagram: a computer's network interface connected through a chain of switches.]
- The ability to name the components and to route packets of information (messages) from a source to a destination
- Layering, redundancy, protocols, and encapsulation as means of abstraction (a 61C big idea)

Software Protocol to Send and Receive
SW send steps (a toy code sketch follows this slide):
1. Application copies data to an OS buffer
2. OS calculates checksum, starts timer
3. OS sends data to the network interface HW and says start
SW receive steps:
3. OS copies data from the network interface HW to an OS buffer
2. OS calculates checksum; if OK, send ACK; if not, delete the message (sender resends when its timer expires)
1. If OK, OS copies data to the user address space and signals the application to continue
[Packet format: Header (Dest Net ID, Src Net ID, Len, ACK/INFO, CMD/Address), Data Payload, Trailer (Checksum).]
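A toy C sketch of the send side, assuming hypothetical net_send()/wait_for_ack() primitives standing in for the network interface hardware and the OS timer:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical primitives standing in for the NIC and the OS timer. */
extern void net_send(const void *pkt, int len);
extern bool wait_for_ack(int timeout_ms);

/* Simple additive checksum, standing in for whatever the OS uses. */
static uint16_t checksum(const uint8_t *p, int n) {
    uint32_t sum = 0;
    for (int i = 0; i < n; i++) sum += p[i];
    return (uint16_t)sum;
}

/* Send steps 1-3, plus resend-on-timeout from the receive side's rules.
 * Assumes len leaves room for the 2-byte trailer in buf. */
void os_send(const uint8_t *app_data, int len) {
    uint8_t buf[1600];
    memcpy(buf, app_data, len);            /* 1: copy into an OS buffer       */
    uint16_t sum = checksum(buf, len);     /* 2: checksum, then start timer   */
    memcpy(buf + len, &sum, sizeof sum);   /* checksum travels in the trailer */
    do {
        net_send(buf, len + 2);            /* 3: hand the frame to the NIC    */
    } while (!wait_for_ack(100));          /* timer expired, no ACK: resend   */
}
```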

Networks are like Ogres
https://www.youtube.com/watch?v=_bmcxve8zis

Protocols for Networks of Networks?
What does it take to send packets across the globe?
- Bits on wire or air
- Packets on wire or air
- Deliver packets within a single physical network
- Deliver packets across multiple networks
- Ensure the destination received the data
- Create data at the sender and make use of the data at the receiver

Protocol for Networks of Networks?
Lots to do, and at multiple levels! Use abstraction to cope with the complexity of communication.
Networks are like ogres: they have layers, like onions.
Hierarchy of layers:
- Application (chat client, game, etc.)
- Transport (TCP, UDP)
- Network (IP)
- Data Link (Ethernet)
- Physical Link (copper, wireless, etc.)

Protocol Family Concept
- Protocol: packet structure and control commands to manage communication
- Protocol families (suites): a set of cooperating protocols that implement the network stack
- Key to protocol families: communication occurs logically at the same level of the protocol (called peer-to-peer), but is implemented via services at the next lower level
- Encapsulation: carry higher-level information within a lower-level envelope

Inspiration
CEO A writes a letter to CEO B ("Dear Bill, Your days are numbered. --Steve"), folds the letter, and hands it to an assistant.
Assistant: puts the letter in an envelope with CEO B's full name; takes it to FedEx.
FedEx office: puts the letter in a larger envelope; puts the name and street address on the FedEx envelope; puts the package on a FedEx delivery truck.
FedEx delivers to the other company.

The Path of the Letter
Peers on each side understand the same things; no one else needs to. The lowest level has the most packaging.
CEO   <-> CEO:   semantics (the letter's content)
Aide  <-> Aide:  identity (the envelope)
FedEx <-> FedEx: location (the FedEx envelope)

Protocol Family Concept
[Diagram: a message passes logically between peers at each level, but actually travels down the stack: Message, then H|Message|T, then H|H|Message|T|T, down to the physical link.]
Each lower level of the stack encapsulates information from the layer above by adding a header and trailer.

Most Popular Protocol for Networks of Networks
Transmission Control Protocol / Internet Protocol (TCP/IP).
- This protocol family is the basis of the Internet, a WAN (wide area network) protocol
- IP makes a best effort to deliver; packets can be lost or corrupted
- TCP guarantees delivery
- TCP/IP is so popular it is used even when communicating locally, even across a homogeneous LAN (local area network)

TCP/IP Packet, Ethernet Packet, Protocols
- Application sends a message
- TCP breaks it into 64 KiB segments and adds a 20 B header
- IP adds a 20 B header and sends it to the network
- If Ethernet, broken into 1500 B packets with headers and trailers
[Nesting: Ethernet Hdr | IP Header | TCP Header | message data | Ethernet trailer.]
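A quick C calculation of the encapsulation overhead implied by the slide's numbers (simplified: it ignores the per-fragment IP headers a real stack would add):

```c
#include <stdio.h>

#define TCP_HDR   20   /* bytes, from the slide */
#define IP_HDR    20
#define ETH_MTU 1500   /* Ethernet payload per packet */

int main(void) {
    int message  = 64 * 1024;          /* one maximum-size TCP segment's data */
    int segment  = TCP_HDR + message;  /* TCP wraps the message               */
    int datagram = IP_HDR + segment;   /* IP wraps the TCP segment            */
    int frames   = (datagram + ETH_MTU - 1) / ETH_MTU;  /* split for Ethernet */
    printf("segment = %d B, datagram = %d B, frames = %d\n",
           segment, datagram, frames); /* 65556 B, 65576 B, 44 frames */
    return 0;
}
```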

Outline
Direct Memory Access
Disks
Networking
Storage Attachment Evolution
Rack Scale Memory
And in Conclusion

Storage Attachment Evolution
[Diagram, left: Direct attachment. The host OS keeps an allocation table and talks to the disk through a disk interface (DI), addressing by (disk, cylinder, track, sector).]
[Diagram, right: Network server attachment. Hosts with network interfaces (NI) request (file name, offset, length) over the LAN from a network file server, whose OS owns the disk.]

Storage Attachment Evolution
[Diagram, left: Channel-attached storage. A mainframe OS issues (LUN, offset, length) requests through a channel interface to a disk storage subsystem, which maps LUNs to physical (disk, cylinder, track, sector) addresses.]
[Diagram, right: Network-attached storage (NAS). Hosts and network-attached workstations send (file name, offset, length) requests over the LAN to a network file server.]

Storage Attachment Evolution
[Diagram: Storage Area Networks (SAN). Hosts send (file name, offset, length) requests over the LAN to file servers; file servers and mainframes issue (LUN, offset, length) requests through channel interfaces (CI) onto the SAN, which connects optical disk, disk, and tape storage subsystems addressed physically by (device, cylinder, track, sector). Gateways bridge the LAN and SAN to a remote SAN and storage subsystems over a WAN.]

Outline
Direct Memory Access
Disks
Networking
Storage Attachment Evolution
Rack Scale Memory
And in Conclusion

Storage Class Memory, aka Rack-Scale Memory

Storage Class Memory, aka Rack-Scale Memory
- High bandwidth and low latency through a simplified interface based on memory semantics (i.e., ld/st), scalable from tens to several hundred GB/sec of bandwidth, with sub-100-nanosecond load-to-use memory latency
- Supports scalable memory pools and resources for real-time analytics and in-memory applications (e.g., map-reduce)
- Highly software compatible, with no required changes to the operating system
- Scales from simple, low-cost connectivity to a highly capable, rack-scale interconnect

Storage Class Memory, aka Rack-Scale Memory
- Cheaper than DRAM, more expensive than disk
- Non-volatile, and faster than disk

Storage Class Memory, aka Rack-Scale Memory
System memory is flat or shrinking:
- Memory bandwidth per core continues to decrease
- Memory capacity per core is generally flat
- Memory is changing on a different cadence compared to the CPU
Data is growing:
- Data that requires real-time analysis is growing exponentially
- The value of the analysis decreases if it takes too long to provide insights
Industry needs an open architecture to solve these problems:
- Memory tiers will become increasingly important
- Rack-scale composability requires a high-bandwidth, low-latency fabric
- Must seamlessly plug into existing ecosystems without requiring OS changes

Remote Direct Memory Access

Remote Direct Memory Access
[Figure: conventional networking vs. cut-through memory access over the network.]

Outline
Direct Memory Access
Disks
Networking
Rack Scale Memory
And, in Conclusion

And, in Conclusion
- I/O gives computers their 5 senses
- I/O speed range is 100 million to one
- DMA avoids wasting CPU time on data transfers
- Disks for persistent storage, being replaced by flash and emerging storage class memory
- Networks: computer-to-computer I/O
- Protocol suites allow networking of heterogeneous components. Great Idea: layers and abstraction
- Emerging class: rack-scale / storage-class memory, accessible over RDMA or another network interconnect