Scalability, Performance & Caching


COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015) Scalability, Performance & Caching Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah Copyright 2012 & 2015 Noah Mendelsohn

Goals: Explore some general principles of performance, scalability, and caching. Explore key issues relating to performance and scalability of the Web. 2

Performance Concepts and Terminology 3

Performance, Scalability, Availability, Reliability. Performance: get a lot done quickly, preferably at low cost. Scalability: low barriers to growth; a scalable system isn't necessarily fast, but it can grow without slowing down. Availability: always there when you need it. Reliability: never does the wrong thing, never loses or corrupts data. 4

Throughput vs. Response Time vs. Latency. Throughput: the aggregate rate at which a system does work. Response time: the time to get a response to a request. Latency: time spent waiting (e.g. for disk or network). We can improve throughput by: minimizing work done per request; doing enough work at once to keep all hardware resources busy, and when some work is delayed (latency), finding other work to do. We can improve response time by: minimizing total work and delay (latency) on the critical path to a response; applying parallel resources to an individual response, including streaming; precomputing response values. 5

Know How Fast Things Are 6

Typical speeds 'n' feeds. CPU (e.g. Intel Core i7): a few billion instructions/second per core. Memory: 20 GB/sec (20 bytes per instruction executed). Long-distance network: latency (ping time) 10-100 ms; bandwidth 5-100 Mb/sec. Local area network (Gbit Ethernet): latency 50-100 usec (note: microseconds); bandwidth 1 Gb/sec (~100 MB/sec). Hard disk: rotational delay 5 ms; seek time 5-10 ms; bandwidth from magnetic media 1 Gbit/sec. SSD: setup time 100 usec; bandwidth 2 Gbit/sec (typical). Note: SSD wins big on latency, somewhat on bandwidth. 7
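To make these numbers concrete, a quick back-of-envelope calculation (using the rough figures above, which are illustrative rather than measured) shows how much potential CPU work is lost waiting for a single I/O operation:

```python
# Back-of-envelope: how much CPU work fits inside one I/O wait?
# Figures are the rough ones from the slide, not measured values.

INSTRUCTIONS_PER_SEC = 2e9      # a few billion instructions/sec per core
DISK_SEEK_SEC = 7.5e-3          # ~5-10 ms disk seek time
SSD_SETUP_SEC = 100e-6          # ~100 usec SSD setup time
LAN_RTT_SEC = 75e-6             # ~50-100 usec LAN round trip

for name, wait in [("disk seek", DISK_SEEK_SEC),
                   ("SSD access", SSD_SETUP_SEC),
                   ("LAN round trip", LAN_RTT_SEC)]:
    wasted = INSTRUCTIONS_PER_SEC * wait
    print(f"{name}: ~{wasted:,.0f} instructions of potential work")
```

A single disk seek costs on the order of ten million instruction times, which is why the latency-hiding techniques in the next slides matter so much.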

Making Systems Faster Hiding Latency 8

Hard disks are slow. (Diagram: disk platter and sector.)

Handling disk data the slow way. The Slow Way: Read a block. Compute on the block. Read another block. Compute on the other block. Rinse and repeat. The computer waits milliseconds while reading the disk: 1000s of instruction times!

Faster way: overlap to hide latency. The Faster Way: Read a block. Start reading another block. Compute on the 1st block. Start reading the 3rd block. Compute on the 2nd block. Rinse and repeat. Buffering: we're reading ahead, computing while reading!
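The read-ahead pattern above can be sketched with a background thread and a small bounded queue; the block size here is a hypothetical choice, and a real implementation would use OS-level asynchronous I/O:

```python
import queue
import threading

BLOCK_SIZE = 4096  # hypothetical block size

def read_ahead(path, depth=2):
    """Yield blocks of a file while a background thread reads ahead.

    While the consumer computes on block N, the reader thread is
    already fetching block N+1 (up to `depth` blocks buffered).
    """
    blocks = queue.Queue(maxsize=depth)

    def reader():
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                blocks.put(block)
                if not block:          # empty bytes: EOF sentinel
                    return

    threading.Thread(target=reader, daemon=True).start()
    while True:
        block = blocks.get()
        if not block:
            return
        yield block
```

Each `yield` hands the consumer a block to compute on while the reader thread overlaps the next read, hiding the disk latency behind useful work.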

Making Systems Faster Bottlenecks and Parallelism 12

Amdahl's claim: parallel processing won't scale. 1967: Major controversy: will parallel computers work? "Demonstration is made of the continued validity of the single processor approach and of the weaknesses of the multiple processor approach in terms of application to real problems and their attendant irregularities." Gene Amdahl* * Gene Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities", AFIPS spring joint computer conference, 1967. http://wwwinst.eecs.berkeley.edu/~n252/paper/amdahl.pdf 13

Amdahl: why no parallel scaling? "The first characteristic of interest is the fraction of the computational load which is associated with data management housekeeping. This fraction [ might eventually be reduced to 20%...]. The nature of this overhead appears to be sequential so that it is unlikely to be amenable to parallel processing techniques. Overhead alone would then place an upper limit on throughput of five to seven times the sequential processing rate." Gene Amdahl (Ibid). In short: even if the part you're optimizing went to zero time, the speedup would be only 5x. Speedup = 1 / (r_s + r_p/n), where r_s and r_p are the sequential and parallel fractions of the computation and n is the number of processors. As r_p/n → 0, Speedup → 1/r_s. 14
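Amdahl's formula is easy to check numerically; a minimal sketch:

```python
def amdahl_speedup(seq_fraction, n):
    """Speedup = 1 / (r_s + r_p / n) for n processors,
    where r_s is the sequential fraction and r_p = 1 - r_s."""
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / n)

# Amdahl's 20% "data management housekeeping" example: even with an
# unlimited number of processors, speedup is capped at 1/0.2 = 5x.
print(amdahl_speedup(0.2, 10))         # ~3.57x with 10 processors
print(amdahl_speedup(0.2, 1_000_000))  # approaches the 5x ceiling
```

Notice how quickly the curve flattens: going from 10 to a million processors buys less than a 1.5x further improvement when 20% of the work is sequential.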

Web Performance and Scaling 15

Web Performance & Scalability Goals Overall Web Goals: Extraordinary scalability, with good performance Therefore very high aggregate throughput (think of all the accesses being made this second) Economical to deploy (modest cost/user) Be a good citizen on the Internet Web servers: Decent performance, high throughput and scalability Web clients (browsers): Low latency (quick response for users) Reasonable burden on PC Minimize memory and CPU on small devices 16

What we've already studied about Web scalability. The Web builds on scalable, high-performance Internet infrastructure: IP, DNS, TCP. Decentralized administration & deployment: the only thing resembling a global, central Web server is the DNS root. URI generation. Stateless protocols. Relatively easy to add servers. 17

Web server scaling. (Diagram: a browser talks to a Web server and application logic, backed by a data store holding reservation records.)

Stateless HTTP protocol helps scalability. (Diagram: several browsers, each served by a Web server and application-logic instance; all the servers share one data store. Because HTTP is stateless, servers can be added easily.)

Caching 20

There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton 21

Why does caching work at all? Locality: in many computer systems, a small fraction of data gets most of the accesses; in other systems, a slowly changing set of data is accessed repeatedly. History: use of memory by typical programs. Denning's Working Set Theory*: an early demonstration of locality in program access to memory; justified paged virtual memory with the LRU replacement algorithm; also indirectly explains why CPU caches work. But not all data-intensive programs follow the theory: video processing! Many simulations. Hennessy and Patterson: running vector (think MMX/SIMD) data through the CPU cache was a big mistake in IBM mainframe vector implementations. * Peter J. Denning, The Working Set Model for Program Behavior, 1968. http://cs.gmu.edu/cne/pjd/pubs/wsmodel_1968.pdf Also a 2008 overview on locality from Denning: http://denninginstitute.com/pjd/pubs/enc/locality08.pdf 22
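The LRU replacement policy mentioned above can be sketched in a few lines with an ordered dictionary; the two-entry capacity is an arbitrary illustration:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: on overflow, evict the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None                       # cache miss
        self.entries.move_to_end(key)         # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now most recently used
cache.put("c", 3)     # evicts "b", the least recently used
```

Under locality, the working set stays inside the cache and most `get` calls hit; an access pattern with no locality (e.g. streaming video frames) defeats the policy, which is exactly the caveat in the slide.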

Why is caching hard? Things change. Telling everyone when things change adds overhead. So we're tempted to cheat: caches get out of sync with reality. 23

CPU Caching: Simple System. (Diagram: the CPU sends a read request to the cache; the cache forwards the request to memory and returns the data to the CPU.) 24

CPU Caching: Simple System. Life is good: repeated read requests are satisfied from the cache, with no traffic to slow memory. 25

CPU Caching: Store-Through Writing. Everything is up to date, but every write waits for slow memory! (Diagram: each write request passes through the cache and on to memory.) 26

CPU Caching: Store-In Writing. The write is fast, but memory is out of date! (Diagram: the write request updates only the cache; memory is not written.) 27

CPU Caching: Store-In Writing. If we try to write data from memory to disk, the wrong data will go out! 28

Cache invalidation is hard! We can start to see why: the cache holds newly written data while memory is stale. 29
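The store-through vs. store-in tradeoff in the slides above can be captured in a toy model (a hypothetical sketch, not how hardware caches are implemented); the dirty set is what makes store-in dangerous when memory is read behind the cache's back:

```python
class ToyCache:
    """Toy cache over a backing 'memory' dict.

    write_through=True  -> store-through: every write also updates memory.
    write_through=False -> store-in: writes stay in the cache (marked
                           dirty) until flushed, so memory can be stale.
    """

    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}      # cached key -> value
        self.dirty = set()   # keys written but not yet flushed

    def read(self, key):
        if key not in self.lines:
            self.lines[key] = self.memory[key]   # miss: fill from memory
        return self.lines[key]

    def write(self, key, value):
        self.lines[key] = value
        if self.write_through:
            self.memory[key] = value             # memory stays up to date
        else:
            self.dirty.add(key)                  # memory is now stale

    def flush(self):
        for key in self.dirty:
            self.memory[key] = self.lines[key]
        self.dirty.clear()

mem = {"x": 0}
wb = ToyCache(mem, write_through=False)
wb.write("x", 42)
# mem["x"] is still 0: anything reading memory directly (e.g. a transfer
# of memory to disk) sees stale data until wb.flush() runs.
```

The store-through variant pays the slow-memory cost on every write but never needs invalidation; the store-in variant is fast but forces exactly the flush-before-disk-I/O problem shown in the slides.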

Multi-core CPU caching. (Diagram: multiple CPUs, each with its own cache; a coherence protocol connects the caches; one CPU issues a write request.) 30

Multi-core CPU caching. (Diagram: one CPU writes; the coherence protocol propagates the update so another CPU's read request returns the new data.) 31

Multi-core CPU caching. A read from disk must flush all caches. (Diagram: a disk read request brings data into memory, so cached copies of that data must be flushed from every cache.) 32

Consistency vs. performance. Caching involves difficult tradeoffs. Coherence is the enemy of performance! This proves true over and over in lots of systems. There's a ton of research on weak-consistency models. Weak consistency: let things get out of sync sometimes. Programming: compilers and libraries can hide or even exploit weak consistency. Yet another example of leaky abstractions! 33

What about Web Caching? Note: the update rate on the Web is mostly low, which makes things easier! 34

Browsers have caches. The browser (e.g. Firefox) usually includes a cache! The browser cache prevents repeated requests for the same representations: even different pages share images, stylesheets, etc. (Diagram: browser talking to a Web server, e.g. Apache.)

Web Reservation System. (Diagram: a browser or phone app on iPhone or Android speaks HTTP to an optional proxy cache, e.g. Squid, which speaks HTTP to a Web server running the flight-reservation application logic; that logic reaches the data store of reservation records via RPC? ODBC? Something proprietary?) Many commercial applications work this way.

HTTP Caches Help the Web to Scale. (Diagram: each browser talks through a Web proxy cache to a Web server and its application logic; all the servers share one data store.)

Web Caching Details 39

HTTP Headers for Caching. Cache-Control: max-age: the server indicates how long a response is good. Heuristics: if no explicit times are given, the cache can guess. Caches check with the server when content has expired: they send an ordinary GET with validator headers. Validators: modified time (1 sec resolution); ETag (an opaque code). The server returns 304 Not Modified or new content. Cache-Control can override default caching rules: e.g. the client forces a check for a fresh copy, or the client explicitly allows stale data (for performance or availability). Caches inform downstream clients/proxies of a response's age. PUT/POST/DELETE clear caches, but there is no guarantee that updates go through the same proxies as all reads! Don't mark as cacheable things you expect to update through parallel proxies! 40
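The validator flow above can be sketched from the server's side. The header names and the 304/200 status codes are real HTTP; the handler function itself is a hypothetical minimal sketch:

```python
# Sketch of server-side validator handling (hypothetical handler).
# A cache revalidates by sending GET with If-None-Match carrying the
# ETag it stored; the server answers 304 if the content is unchanged.

def handle_conditional_get(request_headers, current_etag, body):
    """Return (status, headers, body) for a revalidation GET."""
    cached_etag = request_headers.get("If-None-Match")
    if cached_etag == current_etag:
        # The cache's copy is still good: send no body, just refresh
        # its freshness lifetime.
        return 304, {"ETag": current_etag,
                     "Cache-Control": "max-age=60"}, b""
    # The cache's copy is stale (or it had none): send full content.
    return 200, {"ETag": current_etag,
                 "Cache-Control": "max-age=60"}, body

status, headers, body = handle_conditional_get(
    {"If-None-Match": '"v1"'}, '"v1"', b"<html>...</html>")
# status is 304 here, so the cache keeps serving its stored copy.
```

The win is that a 304 response carries headers only: the cache avoids re-transferring the representation while still learning that its copy is fresh for another max-age interval.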

Summary 41

Summary. We have studied some key principles and techniques relating to performance and scalability: Amdahl's law; buffering and caching; stateless protocols, etc. The Web is a highly scalable system with mostly OK performance. Web approaches to scalability: built on scalable Internet infrastructure; few single points of control (the DNS root changes slowly and is available in parallel); administrative scalability: no central Web site registry. Web performance and scalability: very high parallelism (browsers and servers all run in parallel); stateless protocols support scale-out; caching. 42