Large-scale Caching. CS6450: Distributed Systems Lecture 18. Ryan Stutsman

1 Large-scale Caching
CS6450: Distributed Systems, Lecture 18. Ryan Stutsman.
Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed for use under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Some material taken/derived from MIT by Robert Morris, Franz Kaashoek, and Nickolai Zeldovich.

2 Move Fast & Break Things
Feb 2004: open thefacebook.com to Harvard (PHP + MySQL)
Jun 2004: expand to Columbia, Yale, Stanford
Dec 2005: 6 million users
2008: 100 million
2009: 200 million
2010: 400 million
2011: 800 million
Today: 1.4 billion?
PHP + MySQL

3 Big Picture
Billions of users
Inherently wide fan-outs, poor locality
Near real-time communication, content sharing
Spans the globe
Billions of data requests per second
Trillions of items
Combine simple, off-the-shelf components to do it

4 Simple
[Diagram: one www server (PHP frontend) in front of one DB (MySQL backend)]

5 PHP is slow, so scale
[Diagram: a load balancer (LB) in front of many www servers (PHP frontend), all using one DB (MySQL backend)]
Too many DNS entries? Add a load balancer (HTTP proxy).

6 DB is slow, so shard
[Diagram: LB, many www servers, many DB shards]
Problem: DBs are short-stroked, only 100 IOPS each
Enter memcached

7 Memcached
Get(k) -> v
Set(k, v)
CAS(k, v, v')
All data in DRAM
LRU + slab allocator
A couple thousand lines of C
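A minimal sketch of the interface on this slide, assuming a Python toy rather than memcached's actual C code: an `OrderedDict` provides LRU order, capacity is counted in items rather than bytes, and there is no slab allocator. `TinyMemcache` is a hypothetical name. Real memcached's `cas` compares against a unique token returned by `gets`, not against the old value; this sketch follows the slide's CAS(k, v, v') form.

```python
from collections import OrderedDict


class TinyMemcache:
    """Toy in-DRAM key-value cache with LRU eviction (no slab allocator)."""

    def __init__(self, capacity: int):
        self.capacity = capacity       # max number of items (not bytes)
        self.items = OrderedDict()     # least recently used first

    def get(self, k):
        if k not in self.items:
            return None                # miss
        self.items.move_to_end(k)      # mark as most recently used
        return self.items[k]

    def set(self, k, v):
        self.items[k] = v
        self.items.move_to_end(k)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used item

    def cas(self, k, expected, new):
        """Install `new` only if k currently maps to `expected`."""
        if k in self.items and self.items[k] == expected:
            self.set(k, new)
            return True
        return False
```

With capacity 2, setting "a" and "b", reading "a", then setting "c" evicts "b", because "a" was touched more recently.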

8 +Memcache
[Diagram: LB, many www servers, a memcached (mc) tier, and DB shards]
Look-aside, ~1M IOPS per memcached box
All writes go through the DB; a read miss goes to the DB
Even this won't be enough...
Reads >> writes: decouple read and write capacity scaling

9 Cache Workload
[Figure 10: Cumulative distribution of value sizes fetched; percentile of requests vs. bytes]
Long tail; some values are KBs or 10s of KBs
Median a few tens of bytes? The paper says 135 B
Caching looks promising: lots of small values are hard for disks
Reads >> writes

10 Look Aside
[Figure 1: Memcache as a demand-filled look-aside cache]
Read path (web server, memcache, database): 1. get k from memcache; 2. on a miss, SELECT from the database; 3. set (k, v) in memcache.
Write path: 1. UPDATE the database; 2. delete k from memcache.
Why forward deletes and not sets?
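The two paths in Figure 1 can be sketched as follows, with plain dicts standing in for memcache and MySQL (`LookAsideClient` is a hypothetical name). The write path deletes rather than sets; one reason the paper gives is that deletes are idempotent, so retrying a delete is always safe.

```python
class LookAsideClient:
    """Demand-filled look-aside caching over two dict-like stores."""

    def __init__(self, cache: dict, db: dict):
        self.cache = cache   # stands in for memcache
        self.db = db         # stands in for MySQL

    def read(self, k):
        v = self.cache.get(k)          # 1. get k
        if v is None:
            v = self.db.get(k)         # 2. SELECT on a miss
            if v is not None:
                self.cache[k] = v      # 3. set (k, v)
        return v

    def write(self, k, v):
        self.db[k] = v                 # 1. UPDATE the database
        self.cache.pop(k, None)        # 2. delete k (not set) in the cache
```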

11 Interesting Points
Shard with consistent hashing to distribute load
Different hash function than DB sharding. Why?
Scale DB capacity and throughput independently
DB provisioning is determined by write throughput and peak miss rate
50% miss rate: get rid of ½ of the DBs
1% miss rate: get rid of 99% of the DBs
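A sketch of consistent hashing for spreading keys over memcached servers, assuming MD5-based ring positions and 100 virtual nodes per server (both arbitrary choices; `HashRing` is hypothetical). The property that matters here: adding a server only moves the keys that now land on the new server, roughly 1/(n+1) of them, so most of the cache stays warm.

```python
import bisect
import hashlib


def _h(s: str) -> int:
    """64-bit ring position derived from MD5 (any well-mixed hash works)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server appears at `vnodes` points on the ring to smooth load.
        self.ring = sorted((_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def server_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self.points, _h(key)) % len(self.ring)
        return self.ring[i][1]
```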

12 Communication Issues
All-to-all communication pattern
TCP connection state isn't free
Communication scheduling and connection scaling: # www servers >> # mc servers, and > 100 threads per www, so O(nm) connections are needed
Also, flow control is per connection
Possible to DoS yourself by issuing parallel requests on 100 threads ("incast")
Incast can happen further up too, even in a nonblocking network

13 Connection Aggregation: mcrouter
One TCP connection per machine to each memcached
100x reduction in memcached-side connection state
For gets, each client thread uses UDP and skips mcrouter
[Plot: max sustained items/second per fb www server (0 to 2M), TCP via mcrouter vs. UDP, for a single get and a 10-key multiget]
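To make the O(nm) concern and the 100x figure concrete, here is a back-of-the-envelope calculation with made-up sizes (1,000 www servers, 100 threads each, 200 memcached servers; none of these numbers come from the paper). Without aggregation every thread holds a connection to every memcached; with one mcrouter per machine, each machine holds one connection per memcached, so memcached-side state shrinks by the thread count. `connections` is a hypothetical helper.

```python
def connections(www_servers: int, threads_per_www: int, mc_servers: int,
                mcrouter: bool) -> tuple[int, int]:
    """Return (total TCP connections, connections held by each memcached)."""
    # Connections opened by one www machine: one per memcached with mcrouter,
    # otherwise one per (thread, memcached) pair.
    per_www = mc_servers if mcrouter else threads_per_www * mc_servers
    total = www_servers * per_www
    return total, total // mc_servers
```

With these numbers each memcached goes from 100,000 connections to 1,000.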

14 Latency
[Plot: latency (microseconds), UDP direct vs. by mcrouter (TCP); average of medians and average of 95th percentiles]

15 App-level Flow Control
Paces requests across all targets, unlike TCP (which is per connection)
[Plot: median and 95th-percentile latency (milliseconds) vs. window size]
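The paper describes client-side flow control as a sliding window on outstanding requests that grows slowly on success and shrinks when a request goes unanswered, much like TCP congestion control, but with one window covering requests to all memcached servers. The sketch below assumes an additive-increase/multiplicative-decrease rule and arbitrary bounds; the exact policy and the name `SlidingWindow` are assumptions.

```python
class SlidingWindow:
    """Client-side cap on outstanding memcache requests across all servers."""

    def __init__(self, initial=10, min_size=1, max_size=300):
        self.size = float(initial)
        self.min_size, self.max_size = min_size, max_size
        self.outstanding = 0

    def can_send(self) -> bool:
        return self.outstanding < int(self.size)

    def on_send(self):
        self.outstanding += 1

    def on_response(self):
        self.outstanding -= 1
        # Additive increase: about +1 per window's worth of successes.
        self.size = min(self.max_size, self.size + 1.0 / self.size)

    def on_timeout(self):
        self.outstanding -= 1
        # Multiplicative decrease when a request goes unanswered.
        self.size = max(self.min_size, self.size / 2)
```

A small window limits incast but queues requests at the client; a large one does the opposite, which is the trade-off the latency-vs-window-size plot shows.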

16 Why Separate Cache?
High fanout and multiple rounds of data fetching
[Interstitial slide: data dependency DAG for a small request]
Batching: can we use multiget well?
Can amortize dispatch/remote call cost, but how do we collect groups of keys to request?
Coroutines
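One way to use multiget well is to group keys by their position in the data-dependency DAG, so that every key whose inputs are already available is fetched in a single round. A minimal sketch using Python's standard `graphlib.TopologicalSorter` (3.9+); `fetch_rounds` and the key names are hypothetical, and in a real page the keys of later rounds usually depend on values returned by earlier ones (for example, a friend list determines which profiles to fetch).

```python
from graphlib import TopologicalSorter


def fetch_rounds(deps):
    """Group keys into rounds; each round's keys can go in one multiget.

    deps maps each key to the set of keys it depends on.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()
    rounds = []
    while ts.is_active():
        ready = sorted(ts.get_ready())   # all keys whose inputs are fetched
        rounds.append(ready)             # one batched multiget per round
        ts.done(*ready)
    return rounds
```

For the DAG {"user": set(), "friends": {"user"}, "feed": {"user"}, "posts": {"friends"}}, this yields three rounds: ["user"], ["feed", "friends"], ["posts"].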

17 Fan Out
[Plot: number of distinct memcached servers accessed, by percentile of requests; all requests vs. a popular data-intensive page]

18 Two Problems
Stale sets: what if the DB value changes before the cache value can be installed?
Thundering herds: cache misses on hot keys cause runs on the DB
Leases: lock the key on a get miss while fetching

19 Stale Set
[Sequence diagram: C1, MC, DB, C2]
C1 -> MC: Get(k) -> Miss
C1 -> DB: Get(k) -> v1
C1 -> MC: Set(k, v1) -> Ok

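20 Stale Set
[Sequence diagram: C1, MC, DB, C2]
C1 -> MC: Get(k) -> Miss
C1 -> DB: Get(k) -> v1
C2 -> DB: Set(k, v2) -> Ok
C2 -> MC: Del(k)
C1 -> MC: Set(k, v1) -> Ok
MC now holds stale v1 even though the DB holds v2.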

21 Stale Set Fix: Leases
C1 -> MC: Get(k) -> Miss (lease granted)
C1 -> DB: Get(k) -> v1
C2 -> DB: Set(k, v2) -> Ok
C2 -> MC: Del(k) (lease cleared)
C1 -> MC: Set(k, v1) -> Reject! (set rejected)
Idea: ensure that all sets induced by a get miss that came before a DB update are invalidated. Similar in spirit to LL/SC.
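The lease protocol on this slide can be sketched as follows. On a miss the cache hands out a token bound to the key; a later set must present that token, and a delete clears it, so C1's set of v1 is rejected. This is a simplified model: `LeasingCache` is hypothetical, the paper describes 64-bit tokens where this sketch uses a counter, and a second miss on the same key here simply replaces the outstanding token.

```python
import itertools


class LeasingCache:
    """Cache that hands out a lease token on a miss and checks it on set."""

    def __init__(self):
        self.data = {}
        self.leases = {}                  # key -> outstanding lease token
        self._tokens = itertools.count(1)

    def get(self, k):
        """Return (value, token); the token is non-None only on a miss."""
        if k in self.data:
            return self.data[k], None
        token = next(self._tokens)
        self.leases[k] = token            # lease granted
        return None, token

    def set(self, k, v, token):
        if token is None or self.leases.get(k) != token:
            return False                  # no valid lease (e.g. cleared by a delete)
        del self.leases[k]
        self.data[k] = v
        return True

    def delete(self, k):
        self.data.pop(k, None)
        self.leases.pop(k, None)          # lease cleared
```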

22 Thundering Herd
[Sequence diagram: MC, DB, C2]

23 Thundering Herd
[Sequence diagram: MC, DB, C2]
C2 -> DB: update k -> Ok
C2 -> MC: Del(k)

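24 Thundering Herd Fix: Leases
[Sequence diagram: C1, C3, MC, DB, C2]
C1 -> MC: Get(k) -> Miss (lease granted)
C3 -> MC: Get(k) -> Miss: Retry Soon (lease already outstanding, so C3 does not go to the DB)
C1 -> DB: Get(k) -> v1
C1 -> MC: Set(k, v1)
C2 -> DB: Set(k, v2) -> Ok
C2 -> MC: Del(k) (lease cleared)
Idea: only let one miss handle the DB fetch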
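For thundering herds, the paper has memcached issue a lease token for a given key at most once per interval (10 seconds by default in their configuration); other clients that miss within that interval are told to wait briefly and retry, by which time the lease holder has usually filled the cache. A minimal sketch with an injectable clock so it can be tested deterministically; `HerdGuard` and the status strings are hypothetical, and token validation from the stale-set fix is omitted.

```python
import time

HIT, MISS_WITH_LEASE, RETRY_SOON = "hit", "miss_with_lease", "retry_soon"


class HerdGuard:
    """Hands out at most one lease per key per interval; others retry later."""

    def __init__(self, lease_interval=10.0, clock=time.monotonic):
        self.data = {}
        self.last_lease = {}        # key -> time the last lease was issued
        self.interval = lease_interval
        self.clock = clock

    def get(self, k):
        if k in self.data:
            return HIT, self.data[k]
        now = self.clock()
        issued = self.last_lease.get(k)
        if issued is not None and now - issued < self.interval:
            return RETRY_SOON, None     # another client is already fetching from the DB
        self.last_lease[k] = now
        return MISS_WITH_LEASE, None    # this client should fetch from the DB

    def set(self, k, v):
        self.data[k] = v
        self.last_lease.pop(k, None)

    def delete(self, k):
        self.data.pop(k, None)
```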

25 Pools
[Figure 5: daily and weekly working set (terabytes) of high-churn vs. low-churn data; minimum, mean, and maximum]
Mixing high-churn and low-churn apps causes negative interference in the eviction policy
Solution: separate apps physically

26 Memshare Detour: Partitioned Memory Over Time
[Plot: memory allocated over time to App B and App C, static partition vs. no partition]

27 Memshare Detour: Estimate Hit Rate Curve Gradient to Optimize Hit Rate
[Plot: hit rate vs. cache allocation for workload 1 and workload 2]

28 Memshare Detour: Estimate Hit Rate Curve Gradient to Optimize Hit Rate
[Plot: hit rate vs. cache allocation for workload 1 and workload 2]
If workload 1's hit-rate gradient is smaller than workload 2's (g1 < g2), keep items from workload 2

29 Memshare Detour: Estimating Hit Rate Gradient
Track access frequency to recently evicted objects to determine the gradient at the working point
Can be further improved with full hit-rate-curve estimation: SHARDS [Waldspurger 2015, 2017], AET [Hu 2016]
[Plot: hit rate vs. cache allocation]
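A sketch of the gradient estimate on this slide, written in the spirit of Memshare rather than as its implementation: each tenant keeps a small shadow queue of recently evicted keys, and a hit in that queue means the tenant would have hit with slightly more memory. Counting those shadow hits approximates the local slope of the hit-rate curve, and `rebalance` moves one slot toward the tenant with the steeper slope. Capacities in items, the shadow-queue size, and all names (`App`, `rebalance`) are assumptions.

```python
from collections import OrderedDict


class App:
    """One tenant: an LRU of resident keys plus a shadow queue of evicted keys."""

    def __init__(self, capacity, shadow_size):
        self.capacity = capacity
        self.lru = OrderedDict()
        self.shadow = OrderedDict()      # recently evicted keys (no values)
        self.shadow_size = shadow_size
        self.shadow_hits = 0             # proxy for the hit-rate gradient

    def access(self, k):
        if k in self.lru:
            self.lru.move_to_end(k)
            return True                  # real hit
        if k in self.shadow:             # would have hit with a bit more memory
            self.shadow_hits += 1
            del self.shadow[k]
        self.lru[k] = True
        while len(self.lru) > self.capacity:
            evicted, _ = self.lru.popitem(last=False)
            self.shadow[evicted] = True
            if len(self.shadow) > self.shadow_size:
                self.shadow.popitem(last=False)
        return False


def rebalance(a, b, step=1):
    """Move `step` slots toward the app with more shadow hits, then reset."""
    if a.shadow_hits > b.shadow_hits and b.capacity > step:
        a.capacity, b.capacity = a.capacity + step, b.capacity - step
    elif b.shadow_hits > a.shadow_hits and a.capacity > step:
        a.capacity, b.capacity = a.capacity - step, b.capacity + step
    a.shadow_hits = b.shadow_hits = 0
```

An app cycling over 6 keys in 5 slots misses constantly but hits its shadow queue, while an app reading 2 keys never does, so memory flows to the first.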

30 Too Hot to Scale: Replicate
Some keys have high locality and are hot; not amenable to sharding
[Diagram: two mcd servers, capacity 500k get/s each; incoming 1M get/s in 100-key multigets]

31 Too Hot to Scale: Replicate
Some keys have high locality and are hot; not amenable to sharding
Interesting angle of the paper: sharding in some cases, replication in others
Depends on workload and resources, e.g. network topology
[Diagram: if sharded, incoming 2M get/s in 50-key multigets (1M/s to each server); capacity 500k get/s each]
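The arithmetic behind these two slides, following the paper's example: with 100-key multigets, splitting keys over two servers still sends a sub-request to each server for every client request, so each server sees the full 1M requests/s, while full replication lets each request go to a single replica. The function assumes keys hash evenly so that a multiget touches min(keys, servers) servers; `per_server_requests` is a hypothetical name.

```python
def per_server_requests(client_rps: int, keys_per_request: int,
                        servers: int, replicate: bool) -> int:
    """Requests/s arriving at each cache server for a stream of multigets."""
    if replicate:
        # Every server holds every key: each multiget goes to one replica,
        # and clients spread their requests evenly across replicas.
        return client_rps // servers
    # Sharded: a multiget is split into one sub-request per server it touches.
    touched = min(keys_per_request, servers)
    return client_rps * touched // servers
```

per_server_requests(1_000_000, 100, 2, replicate=False) gives 1,000,000, which exceeds a 500k get/s server; with replicate=True it gives 500,000. For single-key gets (keys_per_request=1), sharding also halves the load, which is why the problem is specific to wide multigets.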

32 Handling Failures: Gutter
Reroute gets aimed at unresponsive nodes to a small mcd cluster (Gutter)
No deletes, just short lifetimes
Recovers the hot part of a crashed node's LRU chain quickly
But, big cut in hit rate on that shard: hit rate > 35% in 4 min
[Diagram: frontend mcd cluster and a small Gutter mcd cluster]
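A sketch of the Gutter read path, assuming the primary server signals failure by raising `TimeoutError` and that Gutter entries simply expire after a short TTL (10 seconds here, an arbitrary choice). Because Gutter never receives deletes, a value there can be stale for up to one TTL, which the test demonstrates. `GutterClient` and its parameters are hypothetical; the normal-path cache fill is omitted to keep the focus on failover.

```python
import time


class GutterClient:
    """Look-aside reads that fall back to a small Gutter pool on failure."""

    def __init__(self, primary_get, db, gutter_ttl=10.0, clock=time.monotonic):
        self.primary_get = primary_get  # callable; raises TimeoutError if down
        self.db = db
        self.gutter = {}                # key -> (value, expiry time)
        self.ttl = gutter_ttl           # short lifetime instead of deletes
        self.clock = clock

    def read(self, k):
        try:
            v = self.primary_get(k)
            return v if v is not None else self.db.get(k)
        except TimeoutError:
            now = self.clock()
            entry = self.gutter.get(k)
            if entry is not None and entry[1] > now:
                return entry[0]         # Gutter hit: the DB is spared
            v = self.db.get(k)
            self.gutter[k] = (v, now + self.ttl)
            return v
```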

33 Replication versus Partitioning
Partition: frontend i, key k -> cache server hash(k)
Memory efficient
Max per-key throughput equals single-server throughput
Multiplier on the number of servers each frontend talks to
Replication: frontend i, key k -> cache server hash(i)
Redundant data
Works well if a few keys are extremely popular
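The two routing rules can be written directly. This sketch uses simple modulo hashing for brevity (the deck shards with consistent hashing), and the function names are hypothetical. With partitioning a 100-key multiget fans out to many servers, while with replication each frontend talks to a single server for all keys.

```python
import hashlib


def h(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")


def partitioned(frontend: str, key: str, n: int) -> int:
    return h(key) % n          # server depends only on the key


def replicated(frontend: str, key: str, n: int) -> int:
    return h(frontend) % n     # server depends only on the frontend


def servers_touched(route, frontend, keys, n):
    """Distinct servers one frontend must contact for a multiget."""
    return len({route(frontend, k, n) for k in keys})
```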

34 Regions
mcrouter helps connection scaling, but only by a constant factor
Want some failure-independent clusters
Inter-cluster links are likely to be less well provisioned
Want low latency to local DCs

35 Bigger Picture

36 Shootdowns
[Sequence diagram: C1, MC, DB]
C1 -> DB: Set(k, v2) -> Ok
C1 -> MC: Del(k)

37 Regional Invalidations
[Figure: invalidation pipeline. On each storage server, McSqueal reads update operations from the MySQL commit log and sends deletes through mcrouter to memcache]
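A sketch of the invalidation pipeline in this figure: mcsqueal reads committed updates, extracts the memcache keys each one invalidates (the paper amends modifying SQL statements with the keys to invalidate), batches the deletes, and sends each batch to every frontend cluster, where mcrouter unpacks it and routes each delete to the owning memcached. The log-entry format (an `"invalidate"` list per entry), the batch size, and the `route` function are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def mcsqueal(commit_log, clusters, batch_size=64):
    """Collect invalidation keys from committed entries; batch them per cluster."""
    keys = [k for entry in commit_log for k in entry["invalidate"]]
    messages = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        for cluster in clusters:        # every frontend cluster gets every batch
            messages.append((cluster, batch))
    return messages


def route(key, n_servers):
    """Hypothetical key -> memcached server mapping inside one cluster."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_servers


def mcrouter_unpack(batch, n_servers, route=route):
    """Split one batch into per-server delete lists."""
    per_server = defaultdict(list)
    for k in batch:
        per_server[route(k, n_servers)].append(k)
    return dict(per_server)
```

Batching matters because one committed write can invalidate keys in every frontend cluster; sending fewer, larger packets keeps the delete stream cheap.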

38 How Bobby was able to sleep at night
Problem: what if the power goes off?
Lose 100 TB of 100 B objects in DRAM
Disks do 100 IOPS
Need 1 trillion disk accesses to refill the cache
To recover in 1 s, you would need 10 billion disks
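The arithmetic on this slide, spelled out, assuming decimal terabytes and one random disk read per object. `refill_estimate` is a hypothetical helper.

```python
def refill_estimate(cache_bytes, object_bytes, disk_iops, target_seconds):
    """Objects to reload and disks needed to reload them within the target time."""
    objects = cache_bytes // object_bytes     # one random disk read per object
    disk_seconds = objects / disk_iops        # total disk time required
    disks_needed = disk_seconds / target_seconds
    return objects, disks_needed
```

refill_estimate(100 * 10**12, 100, 100, 1) gives 10**12 objects and 1e10 disks. With a more forgiving target of one day (86,400 s), the same load still needs roughly 116,000 disks.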

39 Cold Cluster Warmup
Use Region 1's DRAM cache to warm Region 2

40 Cold Cluster Warmup
[Diagram: Cluster 1, Cluster 2, Storage (DB) tier]
1. Get misses in the cold cluster
2. Get hits in the warm cluster
3. Set the value into the cold cluster

41 Cold Cluster Warmup
[Diagram: Cluster 1, Cluster 2, Storage (DB) tier]
1. Get misses in the cold cluster
2. Get misses in the warm cluster too
3. Get from the DB

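42 Cold Cluster
[Sequence diagram: C1, MC2 (cold), DB, MC1 (warm), C2]
C1 -> DB: Set(k, v2); deletes for k go to MC2 (cold) and MC1 (warm)
C2 -> MC2: Get(k) -> Miss (the Del has already reached the cold cluster)
C2 -> MC1: Get(k) -> v1 (the Del has not yet reached the warm cluster)
C2 -> MC2: Set(k, v1) -> Ok
The cold cluster now caches stale v1.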

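43 Cold Cluster: Fix: Hold-Off
Same sequence: C1 -> DB: Set(k, v2); deletes for k go to MC2 and MC1
The Del opens a hold-off window on MC2 for k
C2 -> MC2: Get(k) -> Miss
C2 -> MC1: Get(k) -> v1
C2 -> MC2: Set(k, v1) -> Reject (inside the hold-off window)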
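The warmup path and the hold-off fix can be sketched together. A cold-cluster miss is filled from the warm cluster (falling back to the DB) using an add, which stores only if the key is absent, and a delete arriving at the cold cluster opens a hold-off window during which adds for that key are refused, so the stale v1 read from the warm cluster is not cached. The 2-second default and all names (`ColdClusterCache`, `warm_read`) are assumptions for illustration; the clock is injectable for testing.

```python
import time


class ColdClusterCache:
    """Cold cluster's memcache with a hold-off window after each delete."""

    def __init__(self, holdoff=2.0, clock=time.monotonic):
        self.data = {}
        self.holdoff_until = {}   # key -> time until which adds are refused
        self.holdoff = holdoff
        self.clock = clock

    def add(self, k, v):
        if self.clock() < self.holdoff_until.get(k, float("-inf")):
            return False          # inside the hold-off window: reject
        if k in self.data:
            return False          # add stores only if the key is absent
        self.data[k] = v
        return True

    def delete(self, k):
        self.data.pop(k, None)
        self.holdoff_until[k] = self.clock() + self.holdoff


def warm_read(cold, warm, db, k):
    v = cold.data.get(k)
    if v is not None:
        return v
    v = warm.get(k)               # try the warm cluster first
    if v is None:
        v = db.get(k)             # then the storage tier
    if v is not None:
        cold.add(k, v)            # may be refused during a hold-off
    return v
```

The client in the race still sees v1 once, but the cold cluster does not keep it.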

44 Non-local Writes
Set in a non-master cluster:
Invalidate the local cache
Send the write to the master region
Fetch the value from the non-master region...
Could still get the old value...
So not even read-your-own-writes

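45 Non-local Writes
[Sequence diagram: C1, MC, Slave DB, Master DB]
C1 -> Master DB: Set(k, v2)
C1 -> MC: Del(k) -> Ok
C1 -> MC: Get(k) -> Miss
C1 -> Slave DB: Get(k) -> v1 (replication has not caught up)
Later: the Master DB replicates Set(k, v2) to the Slave DB, which triggers Del(k) in MC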

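46 Non-local Writes: Remote Marker
[Sequence diagram: C1, MC, Slave DB, Master DB]
C1 -> MC: Set(rk) (remote marker for k)
C1 -> Master DB: Set(k, v2)
C1 -> MC: Del k -> Ok
C1 -> MC: Get(k) -> Miss; Get(rk) finds the marker
C1 -> Master DB: Get(k) -> v2
Later: replication of Set(k, v2) to the Slave DB triggers Del k and Del rk in MC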
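The remote-marker protocol can be sketched with dicts standing in for the local memcache, the replica (slave) DB, and the master DB. The write sets a marker r_k, writes to the master, and deletes k locally; a reader that misses and finds the marker goes to the master instead of the possibly stale replica. When replication applies the write, deletes for k and r_k arrive (in the real system, via mcsqueal). The `remote:` key prefix and all function names are hypothetical.

```python
MARKER = "remote:"   # hypothetical prefix for remote-marker keys


def nonmaster_write(local_mc, master_db, k, v):
    local_mc[MARKER + k] = True   # (1) set remote marker r_k
    master_db[k] = v              # (2) send the write to the master region
    local_mc.pop(k, None)         # (3) delete k in the local cluster


def nonmaster_read(local_mc, replica_db, master_db, k):
    v = local_mc.get(k)
    if v is not None:
        return v
    if MARKER + k in local_mc:    # recent local write: replica may be stale
        return master_db.get(k)
    v = replica_db.get(k)
    if v is not None:
        local_mc[k] = v           # normal look-aside fill from the replica
    return v


def replica_applies(local_mc, replica_db, master_db, k):
    """Replication catches up; invalidations for k and r_k reach the cache."""
    replica_db[k] = master_db[k]
    local_mc.pop(k, None)
    local_mc.pop(MARKER + k, None)
```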

47 Busted, but my boss told me to break things
What if two clients set markers?
The marker will get cleared by the first Set handled by mcsqueal
The filled cache value may miss the second update
Cache state diverges for an unbounded period of time
"In practice, we find both the eviction of remote markers and situations of concurrent modification to be rare."

48 So, consistency?
[Figure 11: Latency of the delete pipeline. Fraction of deletes that failed (1e-6 to 1e-3) vs. seconds of delay (1 s to 1 d), for the master region and a replica region]
1e-3: 1 in 1,000; 1e-4: 1 in 10,000
1 in 10,000 gets of cross-regional writes will return the incorrect value for > 1 day...
Follow-on work finds consistency is pretty, pretty, pretty good
How the paper measures this: it samples deletes and records the time each delete was issued, then periodically queries memcache across all frontend clusters for the sampled keys and logs an error if an item remains cached despite a delete that should have invalidated it. Figure 11 reports these invalidation latencies over a 30-day span, broken down by region (master vs. replica).

49 Takeaways
Cache is crucial for survival at FB: caches go down, site goes down
Partitioning and replication have different nuances for increasing performance
How much does consistency matter?

51 Discussion Why not have DB send new values to memcached, so clients only read memcached? Then, no racing client updates. All writes ordered.

52 Discussion
Why not have the DB send new values to memcached, so clients only read memcached? Then, no racing client updates. All writes ordered.
1. The DB doesn't know how to compute values for memcached (a cache entry isn't a literal DB record)
2. Would increase read-your-writes delay (probably need a Spanner-like mechanism?)
3. The DB doesn't know what is cached; it would have to send values for uncached items


More information

Getafix: Workload-aware Distributed Interactive Analytics

Getafix: Workload-aware Distributed Interactive Analytics Getafix: Workload-aware Distributed Interactive Analytics Presenter: Mainak Ghosh Collaborators: Le Xu, Xiaoyao Qian, Thomas Kao, Indranil Gupta, Himanshu Gupta Data Analytics 2 Picture borrowed from https://conferences.oreilly.com/strata/strata-ny-2016/public/schedule/detail/51640

More information

Huge market -- essentially all high performance databases work this way

Huge market -- essentially all high performance databases work this way 11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch

More information

SCYLLA: NoSQL at Ludicrous Speed. 主讲人 :ScyllaDB 软件工程师贺俊

SCYLLA: NoSQL at Ludicrous Speed. 主讲人 :ScyllaDB 软件工程师贺俊 SCYLLA: NoSQL at Ludicrous Speed 主讲人 :ScyllaDB 软件工程师贺俊 Today we will cover: + Intro: Who we are, what we do, who uses it + Why we started ScyllaDB + Why should you care + How we made design decisions to

More information

Extreme Computing. NoSQL.

Extreme Computing. NoSQL. Extreme Computing NoSQL PREVIOUSLY: BATCH Query most/all data Results Eventually NOW: ON DEMAND Single Data Points Latency Matters One problem, three ideas We want to keep track of mutable state in a scalable

More information

CLOUD-SCALE INFORMATION RETRIEVAL

CLOUD-SCALE INFORMATION RETRIEVAL 1 CLOUD-SCALE INFORMATION RETRIEVAL Ken Birman, CS5412 Cloud Computing Styles of cloud computing 2 Think about Facebook We normally see it in terms of pages that are imageheavy But the tags and comments

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit)

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit) CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2003 Lecture 21: Network Protocols (and 2 Phase Commit) 21.0 Main Point Protocol: agreement between two parties as to

More information

Anti-Caching: A New Approach to Database Management System Architecture. Guide: Helly Patel ( ) Dr. Sunnie Chung Kush Patel ( )

Anti-Caching: A New Approach to Database Management System Architecture. Guide: Helly Patel ( ) Dr. Sunnie Chung Kush Patel ( ) Anti-Caching: A New Approach to Database Management System Architecture Guide: Helly Patel (2655077) Dr. Sunnie Chung Kush Patel (2641883) Abstract Earlier DBMS blocks stored on disk, with a main memory

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

CMPSC 311- Introduction to Systems Programming Module: Caching

CMPSC 311- Introduction to Systems Programming Module: Caching CMPSC 311- Introduction to Systems Programming Module: Caching Professor Patrick McDaniel Fall 2016 Reminder: Memory Hierarchy L0: Registers CPU registers hold words retrieved from L1 cache Smaller, faster,

More information

Distributed Systems 16. Distributed File Systems II

Distributed Systems 16. Distributed File Systems II Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS

More information

Parallel DBs. April 25, 2017

Parallel DBs. April 25, 2017 Parallel DBs April 25, 2017 1 Why Scale Up? Scan of 1 PB at 300MB/s (SATA r2 Limit) (x1000) ~1 Hour ~3.5 Seconds 2 Data Parallelism Replication Partitioning A A A A B C 3 Operator Parallelism Pipeline

More information

CS 167 Final Exam Solutions

CS 167 Final Exam Solutions CS 167 Final Exam Solutions Spring 2018 Do all questions. 1. [20%] This question concerns a system employing a single (single-core) processor running a Unix-like operating system, in which interrupts are

More information

RAMCloud: Scalable High-Performance Storage Entirely in DRAM John Ousterhout Stanford University

RAMCloud: Scalable High-Performance Storage Entirely in DRAM John Ousterhout Stanford University RAMCloud: Scalable High-Performance Storage Entirely in DRAM John Ousterhout Stanford University (with Nandu Jayakumar, Diego Ongaro, Mendel Rosenblum, Stephen Rumble, and Ryan Stutsman) DRAM in Storage

More information

A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers

A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers A Distributed System Case Study: Apache Kafka High throughput messaging for diverse consumers As always, this is not a tutorial Some of the concepts may no longer be part of the current system or implemented

More information

CSE 124: Networked Services Lecture-17

CSE 124: Networked Services Lecture-17 Fall 2010 CSE 124: Networked Services Lecture-17 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/30/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

The Microsoft Large Mailbox Vision

The Microsoft Large Mailbox Vision WHITE PAPER The Microsoft Large Mailbox Vision Giving users large mailboxes without breaking your budget Introduction Giving your users the ability to store more email has many advantages. Large mailboxes

More information

DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies!

DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies! DEMYSTIFYING BIG DATA WITH RIAK USE CASES Martin Schneider Basho Technologies! Agenda Defining Big Data in Regards to Riak A Series of Trade-Offs Use Cases Q & A About Basho & Riak Basho Technologies is

More information

CMPSC 311- Introduction to Systems Programming Module: Caching

CMPSC 311- Introduction to Systems Programming Module: Caching CMPSC 311- Introduction to Systems Programming Module: Caching Professor Patrick McDaniel Fall 2014 Lecture notes Get caching information form other lecture http://hssl.cs.jhu.edu/~randal/419/lectures/l8.5.caching.pdf

More information

Transactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL

Transactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL Transactional Consistency and Automatic Management in an Application Data Cache Dan R. K. Ports MIT CSAIL joint work with Austin Clements Irene Zhang Samuel Madden Barbara Liskov Applications are increasingly

More information

Hyperbolic Caching: Flexible Caching for Web Applications

Hyperbolic Caching: Flexible Caching for Web Applications Hyperbolic Caching: Flexible Caching for Web Applications Aaron Blankstein Princeton University (now @ Blockstack Inc.) Siddhartha Sen Microsoft Research NY Michael J. Freedman Princeton University Modern

More information

The Google File System

The Google File System The Google File System By Ghemawat, Gobioff and Leung Outline Overview Assumption Design of GFS System Interactions Master Operations Fault Tolerance Measurements Overview GFS: Scalable distributed file

More information

An Analysis of Linux Scalability to Many Cores

An Analysis of Linux Scalability to Many Cores An Analysis of Linux Scalability to Many Cores 1 What are we going to talk about? Scalability analysis of 7 system applications running on Linux on a 48 core computer Exim, memcached, Apache, PostgreSQL,

More information

COMP 273 Winter physical vs. virtual mem Mar. 15, 2012

COMP 273 Winter physical vs. virtual mem Mar. 15, 2012 Virtual Memory The model of MIPS Memory that we have been working with is as follows. There is your MIPS program, including various functions and data used by this program, and there are some kernel programs

More information

Yuval Carmel Tel-Aviv University "Advanced Topics in Storage Systems" - Spring 2013

Yuval Carmel Tel-Aviv University Advanced Topics in Storage Systems - Spring 2013 Yuval Carmel Tel-Aviv University "Advanced Topics in About & Keywords Motivation & Purpose Assumptions Architecture overview & Comparison Measurements How does it fit in? The Future 2 About & Keywords

More information

NFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency

NFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency Local file systems Disks are terrible abstractions: low-level blocks, etc. Directories, files, links much

More information

CS162 Operating Systems and Systems Programming Lecture 14. Caching (Finished), Demand Paging

CS162 Operating Systems and Systems Programming Lecture 14. Caching (Finished), Demand Paging CS162 Operating Systems and Systems Programming Lecture 14 Caching (Finished), Demand Paging October 11 th, 2017 Neeraja J. Yadwadkar http://cs162.eecs.berkeley.edu Recall: Caching Concept Cache: a repository

More information

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016 Consistency in Distributed Storage Systems Mihir Nanavati March 4 th, 2016 Today Overview of distributed storage systems CAP Theorem About Me Virtualization/Containers, CPU microarchitectures/caches, Network

More information

CGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout

CGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout CGAR: Strong Consistency without Synchronous Replication Seo Jin Park Advised by: John Ousterhout Improved update performance of storage systems with master-back replication Fast: updates complete before

More information

FaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs

FaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs FaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs Anuj Kalia (CMU), Michael Kaminsky (Intel Labs), David Andersen (CMU) RDMA RDMA is a network feature that

More information

Sharding & CDNs. CS 475, Spring 2018 Concurrent & Distributed Systems

Sharding & CDNs. CS 475, Spring 2018 Concurrent & Distributed Systems Sharding & CDNs CS 475, Spring 2018 Concurrent & Distributed Systems Review: Distributed File Systems Challenges: Heterogeneity (different kinds of computers with different kinds of network links) Scale

More information

A memcached implementation in Java. Bela Ban JBoss 2340

A memcached implementation in Java. Bela Ban JBoss 2340 A memcached implementation in Java Bela Ban JBoss 2340 AGENDA 2 > Introduction > memcached > memcached in Java > Improving memcached > Infinispan > Demo Introduction 3 > We want to store all of our data

More information

Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark

Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark Testing validation report prepared under contract with Dell Introduction As innovation drives

More information

Mo Money, No Problems: Caches #2...

Mo Money, No Problems: Caches #2... Mo Money, No Problems: Caches #2... 1 Reminder: Cache Terms... Cache: A small and fast memory used to increase the performance of accessing a big and slow memory Uses temporal locality: The tendency to

More information

Migrating to Vitess at (Slack) Scale. Michael Demmer Percona Live Europe 2017

Migrating to Vitess at (Slack) Scale. Michael Demmer Percona Live Europe 2017 Migrating to Vitess at (Slack) Scale Michael Demmer Percona Live Europe 2017 This is a (brief) story of how Slack's databases work today, why we're migrating to Vitess, and some lessons we've learned

More information

Performance and Scalability with Griddable.io

Performance and Scalability with Griddable.io Performance and Scalability with Griddable.io Executive summary Griddable.io is an industry-leading timeline-consistent synchronized data integration grid across a range of source and target data systems.

More information