
L7: Performance. Frans Kaashoek (kaashoek@mit.edu), 6.033, Spring 2013

Overview
- Technology fixes some performance problems: ride the technology curves if you can
- Some performance requirements require thinking
- An increase in load may need redesign: batching, caching, scheduling, concurrency
- Important numbers

Moore's law
[Figure: cost per transistor and transistors per die over time]
Source: "Cramming More Components Onto Integrated Circuits," Electronics, April 1965

Transistors/die doubles every ~18 months

Moore's law sets a clear goal
- Tremendous investment in technology
- Technology improvement is proportional to the technology itself
- Example: better processors yield better layout tools, which yield better processors
- Mathematically: d(technology)/dt ∝ technology, so technology ∝ e^t
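
A quick check of the exponent, using the doubling time from the previous slide: solving d(technology)/dt = k · technology gives technology(t) = T0 · e^(kt); doubling every 18 months means e^(1.5k) = 2, so k = ln 2 / 1.5 ≈ 0.46 per year.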

CPU performance

DRAM density

Disk: Price per GByte drops at ~30-35% per year

Latency improves slowly
[Figure: improvement relative to year #1, log scale, over years 1-11: Moore's law (~70% per year), DRAM access latency (~7% per year), speed of light (0% per year)]

Performance and system design
- Improvements in technology can fix performance problems
- Some performance problems are intrinsic (e.g., design project 1): technology doesn't fix them; you have to think
- Handling increasing load can require re-design
- Not every aspect of the system improves over time

Approach to performance problems
[Diagram: clients connect over the Internet to a server with a disk]
- Users complain that the system is too slow
- Measure the system to find the bottleneck
- Relax the bottleneck: add hardware, change the system design

Performance metrics
- Throughput: requests/time, for many requests
- Latency: time/request, for a single request
- Latency = 1/throughput? Often not; e.g., the server may have two CPUs

Heavily-loaded systems
[Figures: throughput vs. #users levels off at the bottleneck; latency vs. #users climbs once the system is busy]
Once the system is busy, requests queue up.

Approaches to finding the bottleneck
(Client, network, server, disk)
- Measure utilization of each resource:
  - CPU is 100% busy, disk is 20% busy
  - CPU is 50% busy, disk is 50% busy, alternating
- Model the performance of your system: what performance do you expect? Say the network takes 10 ms, the CPU 50 ms, the disk 10 ms
- Guess, check, and iterate

Fixing a bottleneck
- Get faster hardware
- Fix the application: better algorithm, fewer features (6.033 cannot help you here)
- General system techniques: batching, caching, concurrency, scheduling

Case study: I/O bottleneck

Hitachi 7K400

Inside a disk: 7200 RPM, so one rotation takes 60,000 ms / 7200 ≈ 8.3 ms

Top view: 88,283 tracks per platter; 576 to 1,170 sectors per track

Performance of reading a sector
- Latency = seek + rotation + reading/writing:
  - Seek time: 1-15 ms (avg 8.2 ms for a read, avg 9.2 ms for a write)
  - Rotation time: 0-8.3 ms
  - Reading/writing bits: 35-62 MB/s (inner to outer track)
- Read(4 KB): latency = 8.2 ms + 4.1 ms + ~0.1 ms = 12.4 ms; throughput = 4 KB / 12.4 ms = 322 KB/s
- 99% of the time is spent moving the disk; 1% reading!
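
To make the arithmetic concrete, here is a minimal sketch of the latency model in Python, using the 7K400 figures quoted above:

    # Random 4 KB read: seek + half a rotation (on average) + transfer.
    avg_seek_ms = 8.2          # average read seek
    avg_rotation_ms = 8.3 / 2  # half a rotation at 7200 RPM
    transfer_ms = 0.1          # ~4 KB at 35-62 MB/s

    latency_ms = avg_seek_ms + avg_rotation_ms + transfer_ms
    throughput_kb_per_s = 4 / (latency_ms / 1000)
    print(f"{latency_ms:.1f} ms, {throughput_kb_per_s:.0f} KB/s")  # ~12.4 ms, ~320 KB/s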

Batching
- Batch reads/writes into large sequential transfers (e.g., a whole track)
- Time for a track (1,000 512-byte sectors): 0.8 ms to seek to the next track + 8.3 ms to read the track
- Throughput: 512 KB / 9.1 ms = 55 MB/s
- As fast as a LAN; less likely to be a bottleneck
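
The same model applied to a batched, track-sized transfer (a sketch with the slide's numbers):

    # Whole-track transfer: one short seek, then sequential reading.
    track_kb = 512                 # 1,000 sectors x 512 bytes
    track_seek_ms = 0.8            # track-to-track seek
    rotation_ms = 8.3              # reading the full track takes one rotation

    mb_per_s = (track_kb / 1024) / ((track_seek_ms + rotation_ms) / 1000)
    print(f"{mb_per_s:.0f} MB/s")  # ~55 MB/s, vs ~0.3 MB/s for random 4 KB reads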

System design implications
- If the system reads/writes large files: lay them out contiguously on disk
- If the system reads/writes many small files: group them together into a single track
- Modern Unix: puts directories + inodes + data together

Caching
- Use DRAM to remember recently read sectors; most operating systems use much of DRAM for caching
- DRAM latency and throughput are orders of magnitude better than disk
- Challenge: what to cache? DRAM is often much smaller than disk

Performance model
- Average time to read/write a sector with a cache: hit_time × hit_rate + miss_time × miss_rate
- Example: 100 sectors, 90% hit rate (10 sectors miss)
  - Without cache: 10 ms for each sector
  - With cache: 0 ms × 0.9 + 10 ms × 0.1 = 1 ms
- The hit rate must be high to make the cache work well!
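
The average-time formula as a sketch (Python; the values are the slide's example):

    # Average access time = hit_time * hit_rate + miss_time * miss_rate.
    hit_time_ms, miss_time_ms = 0.0, 10.0  # DRAM hit ~0 ms, disk miss ~10 ms
    hit_rate = 0.9

    avg_ms = hit_time_ms * hit_rate + miss_time_ms * (1 - hit_rate)
    print(avg_ms)  # 1.0 ms: a 10x speedup over the uncached 10 ms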

Replacement policy
- Many caches have bounded size
- Goal: evict the cache entry that is unlikely to be used soon
- Approach: predict future behavior from past behavior; extend with hints from the application

Least-Recently Used (LRU) policy
- If data was used recently, it is likely to be used again
- Easy to implement in software
- Works well for popular data (e.g., "/")
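
A minimal LRU cache sketch in Python (collections.OrderedDict keeps entries in recency order; fetching from disk on a miss is left to the caller):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()      # least recently used first

        def get(self, key):
            if key not in self.entries:
                return None                   # miss: caller reads from disk, then put()s
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]

        def put(self, key, value):
            self.entries[key] = value
            self.entries.move_to_end(key)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used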

Is LRU always the best policy?
- LRU fails for sequential access to a file larger than the cache: it would evict all the useful data
- A better policy for this workload: Most-Recently Used (MRU)

When do caches work?
1. All data fits in the cache, or
2. The access pattern has locality:
   - Temporal locality, e.g., home directories of users currently logged in
   - Spatial locality, e.g., all inodes in a directory ("ls -l")
Not all patterns have locality, e.g., reading large files.

Simple server
    while (true) {
        wait for request
        data = read(name)
        response = compute(data)
        send response
    }

Caching often cannot help writes
- Writes often must go to disk for durability: after a power failure, the new data must be available
- Result: disk performance dominates write-intensive workloads
- Worst case: many random small writes, e.g., a mail server storing each message in a separate file
- Logging can help: write data to a sequential log (covered later in the semester; a sketch follows)
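
A sketch of the logging idea, assuming a hypothetical append-only file log.dat: every write becomes a sequential append, and fsync forces it to disk before the server acknowledges.

    import os

    log = open("log.dat", "ab")   # hypothetical append-only log file

    def durable_write(record: bytes):
        log.write(record)          # sequential append, no seek
        log.flush()
        os.fsync(log.fileno())     # on disk before we acknowledge the write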

I/O concurrency motivation
A simple server alternates between waiting for the disk and computing:

    CPU:            --- A ---           --- B ---
    Disk: --- A ---           --- B ---

Use several threads to overlap I/O
Thread 1 works on A and thread 2 works on B, keeping both the CPU and the disk busy:

    CPU:            --- A --- --- B --- --- C --- ...
    Disk: --- A --- --- B --- --- C ---

Other benefit: fast requests can get ahead of slow requests. Downside: need locks, etc.!
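
A minimal sketch of the threaded version in Python; time.sleep stands in for a ~10 ms disk read, so this is an illustration, not a real server:

    import threading, time

    def handle(name):
        time.sleep(0.010)   # "disk read" of file `name`: this thread blocks
                            # here while other threads keep the CPU busy
        # compute(data) and send(response) would go here

    threads = [threading.Thread(target=handle, args=(n,)) for n in "ABC"]
    for t in threads: t.start()   # requests A, B, C overlap their disk waits
    for t in threads: t.join()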

Scheduling
- Suppose many threads issue disk requests for tracks 71, 10, 92, 45, 29
- Naive algorithm: service them in random order (8-15 ms seek each)
- Better: shortest-seek-first (~1 ms seek): 10, 29, 45, 71, 92
- High load -> smaller seeks -> higher throughput
- Downside: unfair, risk of starvation; the elevator algorithm avoids starvation
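
Sketches of both policies (Python; the starting head positions 0 and 50 are assumptions, chosen so shortest-seek-first reproduces the slide's order):

    def shortest_seek_first(head, pending):
        """Repeatedly service the request closest to the current head position."""
        order, pending = [], list(pending)
        while pending:
            nxt = min(pending, key=lambda t: abs(t - head))
            pending.remove(nxt)
            order.append(nxt)
            head = nxt
        return order

    def elevator(head, pending):
        """Sweep upward past all requests, then reverse: no request starves."""
        ahead = sorted(t for t in pending if t >= head)
        behind = sorted((t for t in pending if t < head), reverse=True)
        return ahead + behind

    print(shortest_seek_first(0, [71, 10, 92, 45, 29]))   # [10, 29, 45, 71, 92]
    print(elevator(50, [71, 10, 92, 45, 29]))             # [71, 92, 45, 29, 10]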

Parallel hardware
- Use several disks to increase performance:
  - Many small requests: group files on disks to minimize seeks
  - Many large requests: stripe files across disks to increase throughput
- Use many computers: balance work across computers? What if one computer fails? How to program? MapReduce?

Solid State Disk (SSD)
- Faster storage technology than disk: flash memory that exports a disk interface; no moving parts
- OCZ Vertex 3 (256 GB SSD):
  - Sequential read: 400 MB/s
  - Sequential write: 200-300 MB/s
  - Random 4 KB reads: 5,700/s (23 MB/s)
  - Random 4 KB writes: 2,200/s (9 MB/s)

SSDs and writes
- Write performance is slower: flash can erase only large units (e.g., 512 KB)
- Writing a small block means: 1. read 512 KB, 2. update 4 KB of it, 3. write 512 KB back
- Controllers try to avoid this using logging
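
The write-amplification arithmetic as a quick sketch (Python):

    # Cost of a 4 KB write into a 512 KB erase unit (read-modify-write).
    erase_kb, write_kb = 512, 4
    amplification = erase_kb / write_kb   # 128x more bytes written than requested
    total_io_kb = 2 * erase_kb            # plus the 512 KB read back first
    print(amplification, total_io_kb)     # 128.0 1024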

SSD versus disk
- Disk: ~$100 for 2 TB ($0.05 per GB); SSD: ~$300 for 256 GB ($1.00 per GB)
- Many performance issues are still the same: both SSDs and disks are much slower than RAM; avoid random small writes by batching

Important numbers
Latency:
- 0.00000001 ms: instruction time (1 ns)
- 0.0001 ms: DRAM load (100 ns)
- 0.1 ms: LAN network
- 10 ms: random disk I/O
- 25 ms: Internet east -> west coast
Throughput:
- 10,000 MB/s: DRAM
- 1,000 MB/s: LAN (or 100 MB/s)
- 100 MB/s: sequential disk (or 500 MB/s)
- 1 MB/s: random disk I/O

Summary
- Technology fixes some performance problems
- If the performance problem is intrinsic: batching, caching, concurrency, scheduling
- Important numbers