Rocksteady: Fast Migration for Low-Latency In-memory Storage. Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman (University of Utah)
1 Rocksteady: Fast Migration for Low-Latency In-memory Storage. Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman
2 Introduction. Distributed low-latency in-memory key-value stores are emerging, with predictable response times (~10 µs median, ~60 µs 99.9th percentile). Problem: data must be migrated between servers while minimizing the performance impact of migration (go slow?) yet responding quickly to hot spots, skew shifts, and load spikes (go fast?). Solution: fast data migration with low impact: early ownership transfer of data, leveraging workload skew; low-priority, parallel, and adaptive migration. Result: a migration protocol for the RAMCloud in-memory key-value store that migrates 256 GB in 6 minutes with 99.9th-percentile latency under 250 µs; median latency recovers from 40 µs to 20 µs in 14 s.
3 Why Migrate Data? [Diagram: two clients issue multiget()s over keys scattered across Server 1 and Server 2; no locality, fan-out = 7, 6 million ops/s.] Poor spatial locality and high multiget() fan-out mean more RPCs.
4 Migrate To Improve Spatial Locality. [Diagram: same workload, but keys are migrated between Server 1 and Server 2 so that each multiget()'s keys land on a single server.]
5 Spatial Locality Improves Throughput. [Diagram: full locality, fan-out = 1, 25 million ops/s vs. no locality, fan-out = 7, 6 million ops/s.] Better spatial locality means fewer RPCs and higher throughput; this benefits multiget() and range scans.
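The fan-out argument above can be made concrete with a small sketch. This is an illustrative model, not RAMCloud code: the server names and placement maps are assumptions, and the only point is that a client must issue one RPC per distinct server owning a requested key.

```python
def multiget_fanout(keys, placement):
    """Number of RPCs a client must issue for one multiget():
    one per distinct server that owns at least one requested key."""
    return len({placement[k] for k in keys})

keys = ["a", "b", "c", "d", "e", "f", "g"]

# Poor spatial locality: the seven related keys are scattered across servers.
scattered = {k: f"server-{i}" for i, k in enumerate(keys)}

# Full locality after migration: all related keys co-located on one server.
colocated = {k: "server-0" for k in keys}

print(multiget_fanout(keys, scattered))  # 7 RPCs
print(multiget_fanout(keys, colocated))  # 1 RPC
```

Fewer RPCs per multiget() directly translates into higher client-visible throughput, which is the motivation for migrating data in the first place.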
6 The RAMCloud Key-Value Store. [Diagram: clients connect to masters over the data center fabric; a coordinator manages the cluster.] All data in RAM; kernel bypass / DPDK; 10 µs reads.
7 The RAMCloud Key-Value Store. [Diagram: a client's Write RPC travels over the data center fabric to a master.]
8 The RAMCloud Key-Value Store. [Diagram: each write is kept 1x in DRAM at the master and replicated 3x on disk.]
9 Fault-tolerance & Recovery In RAMCloud. [Diagram: clients, coordinator, and masters on the data center fabric.]
10 Fault-tolerance & Recovery In RAMCloud. [Diagram: after a master crashes, its data is recovered across the cluster.] 2 seconds to recover.
11 Performance Goals For Migration. Maintain low access latency: the 10 µs median makes the system extremely sensitive, and tail latency matters even more at scale. Migrate data fast: workloads are dynamic, so the system must respond quickly. With DRAM storage growing to 512 GB per server, slow data migration would take an entire day just to scale the cluster.
12 Rocksteady Overview: Early Ownership Transfer. Problem: a loaded source can bottleneck migration. Solution: instantly shift ownership and all load to the target. [Diagram: four clients' reads and writes directed at the source server, with the target server idle.]
13 Rocksteady Overview: Early Ownership Transfer. [Diagram: client operations instantly redirected to the target server.] All future operations are serviced at the target, creating headroom at the source to speed migration.
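A minimal sketch of early ownership transfer (the `Coordinator` class and tablet names are assumptions for illustration, not RAMCloud's API): the coordinator flips a tablet's owner at migration start, so every future client operation routes to the target even though no data has moved yet.

```python
class Coordinator:
    """Toy cluster coordinator: maps tablets to their owning server."""

    def __init__(self):
        self.owner = {}  # tablet -> server name

    def start_migration(self, tablet, source, target):
        assert self.owner[tablet] == source
        self.owner[tablet] = target  # ownership flips before any data moves

    def route(self, tablet):
        """Clients consult the coordinator to find where to send requests."""
        return self.owner[tablet]

coord = Coordinator()
coord.owner["tablet-7"] = "source"
coord.start_migration("tablet-7", "source", "target")
print(coord.route("tablet-7"))  # "target": the source sheds all request load at once
```

Shedding the whole request load at once is what creates the CPU headroom at the source that the later slides quantify.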
14 Rocksteady Overview: Leverage Skew. Problem: data has not yet arrived at the target. Solution: on-demand migration of unavailable data. [Diagram: a client read at the target triggers an on-demand pull from the source.]
15 Rocksteady Overview: Leverage Skew. [Diagram: client reads cause hot keys to move early.] Median latency recovers to 20 µs in 14 s.
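The on-demand pull path can be sketched as follows (the class and method names are hypothetical, and a plain dict stands in for an RPC to the source): a read at the target either hits data that has already arrived, or pulls exactly that record from the source before answering. Under a skewed workload the hot keys are touched first, so the target warms up after moving very little data.

```python
class Target:
    """Toy target server: serves reads, pulling missing records on demand."""

    def __init__(self, source_store):
        self.local = {}
        self.source = source_store  # stands in for an RPC channel to the source

    def read(self, key):
        if key not in self.local:                    # record not migrated yet
            self.local[key] = self.source.pop(key)   # on-demand (priority) pull
        return self.local[key]

source = {"hot": 1, "cold": 2}
t = Target(source)
print(t.read("hot"))  # pulled on demand from the source -> 1
print(t.read("hot"))  # now served locally, no pull -> 1
print(len(source))    # only the hot key has moved so far -> 1
```

This is why more skew restores the median faster: the handful of hot keys that dominate the access distribution migrate within the first seconds.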
16 Rocksteady Overview: Adaptive and Parallel. Problem: the old single-threaded protocol was limited to 130 MB/s. Solution: pipelined and parallel at both source and target. [Diagram: parallel pulls from source to target alongside on-demand pulls.]
17 Rocksteady Overview: Adaptive and Parallel. [Diagram: target-driven parallel pulls that yield to on-demand pulls.] Moves 758 MB/s.
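The parallel-pull idea can be sketched with standard-library threads (the four-way partitioning and the `pull_partition` helper are illustrative assumptions): the migration manager pulls disjoint partitions of the source's data concurrently, so no single thread caps migration speed.

```python
from concurrent.futures import ThreadPoolExecutor

def pull_partition(source, partition_keys):
    """Copy one partition's records. Each call touches a disjoint key set,
    so partitions can be pulled in parallel without coordination."""
    return {k: source[k] for k in partition_keys}

source = {f"key-{i}": i for i in range(1000)}

# Four disjoint partitions of the source's keys (stands in for the
# co-partitioned hash table described on the later slides).
partitions = [list(source)[i::4] for i in range(4)]

migrated = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    for chunk in pool.map(lambda p: pull_partition(source, p), partitions):
        migrated.update(chunk)

print(len(migrated) == len(source))  # True: every record moved exactly once
```

The real protocol adds the adaptive part on top: these pull workers run at low priority and yield to client-facing requests and on-demand pulls.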
18 Rocksteady Overview: Eliminate Sync Replication. Problem: synchronous replication bottlenecks the target. Solution: safely defer replication until after migration. [Diagram: parallel and on-demand pulls proceed while replication is deferred.]
19 Rocksteady Overview: Eliminate Sync Replication. [Diagram: replication at the target happens only after migration completes.]
20 Rocksteady: Putting It All Together. Instantaneous ownership transfer: immediate load reduction at the overloaded source creates headroom for migration work. Leverage skew to rapidly migrate hot data: the target comes up to speed with little data movement. Adaptive, parallel, pipelined at source and target: all cores avoid stalls, but yield to client-facing operations. Safely defer replication at the target: eliminates the replication bottleneck and contention.
21 Rocksteady. Instantaneous ownership transfer. Leverage skew to rapidly migrate hot data. Adaptive, parallel, pipelined at source and target. Safely defer synchronous replication at the target.
22 Evaluation Setup. [Diagram: four YCSB-B clients (95/5 read/write, Zipfian skew = 0.99) driving a source server holding 300 million records (45 GB), with an idle target server.]
23 Evaluation Setup. [Diagram: after migrating half the table, the source and target servers each hold 150 million records (22.5 GB).]
24 Instantaneous Ownership Transfer. Source CPU utilization drops from 80% before ownership transfer to 25% immediately after, creating 55% source CPU headroom. Before migration the source is overloaded and the target underloaded; ownership transfer creates source headroom for migration.
25 Rocksteady. Instantaneous ownership transfer. Leverage skew to rapidly migrate hot data. Adaptive, parallel, pipelined at source and target. Safely defer synchronous replication at the target.
26 Leverage Skew To Move Hot Data. Before migration: median = 10 µs, 99.9th percentile = 60 µs. [Chart: median and 99.9th-percentile latency during migration under Uniform (low), Skew=0.99, and Skew=1.5 (high) workloads; bar values 240 µs, 245 µs, 75 µs, 155 µs, 28 µs, and 17 µs.] After ownership transfer, hot keys are pulled on demand; more skew restores the median faster (fewer hot keys to migrate).
27 Rocksteady. Instantaneous ownership transfer. Leverage skew to rapidly migrate hot data. Adaptive, parallel, pipelined at source and target. Safely defer synchronous replication at the target.
28 Parallel, Pipelined, & Adaptive Pulls: Target. [Diagram: the target's dispatch core polls the NIC; worker cores serve read()s and replay pulled records; the migration manager drives pulls into pull buffers.] Target-driven migration manager; co-partitioned hash tables, pulled from partitions in parallel; pulled data replayed into per-core buffers.
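The per-core replay step can be sketched as follows (the hash-based partitioning and `NUM_CORES` are illustrative assumptions standing in for RAMCloud's actual structures): each worker core appends only to its own buffer, so replay needs no cross-core locking.

```python
NUM_CORES = 4

def replay(pulled_records):
    """Scatter pulled records into per-core buffers. The core choice is
    co-partitioned with the hash table, so each core owns a disjoint
    slice of the key space and appends without synchronization."""
    buffers = [[] for _ in range(NUM_CORES)]
    for key, value in pulled_records:
        core = hash(key) % NUM_CORES
        buffers[core].append((key, value))
    return buffers

records = [(f"k{i}", i) for i in range(8)]
buffers = replay(records)
print(sum(len(b) for b in buffers))  # 8: every pulled record landed in some buffer
```

Because the buffers are private to a core, replay work can be interleaved with client-facing requests on that core and dropped to lower priority whenever real requests arrive.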
29 Parallel, Pipelined, & Adaptive Pulls: Source. [Diagram: the source's dispatch core polls the NIC, serving read()s and pull() RPCs by copying record addresses into gather lists.] The source is stateless and passive; pulls are granular (20 KB each).
30 Parallel, Pipelined, & Adaptive Pulls. Any idle CPU is redirected to migration; migration yields to regular requests and on-demand pulls.
31 Rocksteady. Instantaneous ownership transfer. Leverage skew to rapidly migrate hot data. Adaptive, parallel, pipelined at source and target. Safely defer synchronous replication at the target.
32 Naïve Fault Tolerance During Migration. Each server has a recovery log distributed across the cluster. [Diagram: source and target servers, each with its own recovery log.]
33 Naïve Fault Tolerance During Migration. Migrated data needs to be triplicated to the target's recovery log. [Diagram: records moving from source to target.]
34 Naïve Fault Tolerance During Migration. [Diagram: each migrated record is re-replicated 3x into the target's recovery log as it arrives.]
35 Synchronous Replication Bottlenecks Migration. Synchronous replication cuts migration speed by 34%. [Diagram: migrated records queue behind replication to the target's recovery log.]
36 Rocksteady: Safely Defer Replication At The Target. Replicate at the target only after all data has been moved over. [Diagram: migration proceeds without inline replication to the target's recovery log.]
37 Writes/Mutations Served By The Target. Mutations do have to be replicated by the target. [Diagram: a client write at the target is appended to the target's recovery log.]
38 Crash Safety During Migration. Both the source and target recovery logs are needed for data recovery: the initial table state is on the source's recovery log, and writes/mutations are on the target's. On a crash, ownership transfers back to the source and the migration is cancelled; recovery involves both recovery logs. The source takes a dependency on the target's recovery log at migration start, stored reliably at the cluster coordinator; it identifies the log position after which mutations are present.
39 If The Source Crashes During Migration. Recover the source, then apply the mutations from the target's recovery log. [Diagram: both recovery logs feed source recovery.]
40 If The Target Crashes During Migration. Recover the target from both the source's and the target's recovery logs. [Diagram: both recovery logs feed target recovery.]
41 Crash Safety During Migration. Both recovery logs are needed for data recovery: the initial table state is on the source's recovery log, and writes/mutations are on the target's. Safely transfer ownership at migration start: on a crash, ownership transfers back to the source, the migration is cancelled, and recovery involves both recovery logs. Safely delay replication until all data has been moved: the source takes a dependency on the target's recovery log at migration start, stored reliably at the cluster coordinator, identifying the position after which mutations are present.
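The two-log recovery rule above can be sketched with simple replay (the log representation is an assumption for illustration): the source's log carries the initial table state, the target's log carries mutations made after ownership transfer, and recovery replays the source log first so that later target-side mutations win.

```python
def recover(source_log, target_log):
    """Rebuild table state from both recovery logs. Order matters:
    the target's mutations happened after ownership transfer, so they
    must overwrite the initial state from the source's log."""
    state = {}
    for k, v in source_log:   # initial table state at migration start
        state[k] = v
    for k, v in target_log:   # writes served by the target since then
        state[k] = v
    return state

source_log = [("a", 1), ("b", 2)]
target_log = [("b", 20), ("c", 3)]   # "b" was overwritten at the target
print(recover(source_log, target_log))  # {'a': 1, 'b': 20, 'c': 3}
```

The dependency recorded at the coordinator plays the role of the boundary between the two replay phases: it marks the log position after which the target's mutations are authoritative.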
42 Performance of Rocksteady. YCSB-B, 300 million objects (30 B keys, 100 B values), migrate half the table. [Graph: median access latency (µs) vs. time since experiment start (s), comparing Rocksteady against "source keeps ownership + sync replication".]
43 Performance of Rocksteady. [Same graph.] Median latency is better after ~14 seconds.
44 Performance of Rocksteady. [Same graph.] Median latency is better after ~14 seconds, and migration is 28% faster.
45 Related Work. Dynamo: pre-partitions hash keys. Spanner: applications are given control over locality (directories). FaRM and DrTM: re-use in-memory redundancy for migration. Squall: a reconfiguration protocol for H-Store with early ownership transfer and paced background migration, but it assumes fully partitioned, serial execution with no synchronization, each migration pull stalls execution, and it replicates synchronously at the target.
46 Conclusion. Distributed low-latency in-memory key-value stores are emerging, with predictable response times (~10 µs median, ~60 µs 99.9th percentile). Problem: data must be migrated between servers while minimizing the performance impact of migration (go slow?) yet responding quickly to hot spots, skew shifts, and load spikes (go fast?). Solution: fast data migration with low impact: leverage skew, transfer ownership before data, move hot data first; low-priority, parallel, and adaptive migration. Result: a migration protocol for the RAMCloud in-memory key-value store that migrates at 758 MB/s with 99.9th-percentile latency < 250 µs. Source Code:
47 Slides
48 Rocksteady Tail Latency Breakdown. [Graph: tail latency during migration for Rocksteady.]
49 Rocksteady Tail Latency Breakdown. Disabling parallel pulls brings tail latency down to 160 µs.
50 Rocksteady Tail Latency Breakdown. Disabling parallel pulls brings tail latency down to 160 µs; making on-demand pulls synchronous further brings it down to 135 µs.
More informationConceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.
Conceptual Modeling on Tencent s Distributed Database Systems Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc. Outline Introduction System overview of TDSQL Conceptual Modeling on TDSQL Applications Conclusion
More informationCSE 153 Design of Operating Systems
CSE 153 Design of Operating Systems Winter 2018 Lecture 22: File system optimizations and advanced topics There s more to filesystems J Standard Performance improvement techniques Alternative important
More informationComputer-System Organization (cont.)
Computer-System Organization (cont.) Interrupt time line for a single process doing output. Interrupts are an important part of a computer architecture. Each computer design has its own interrupt mechanism,
More informationChapter 11: Implementing File Systems
Chapter 11: Implementing File Systems Operating System Concepts 99h Edition DM510-14 Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation
More informationOracle Rdb Hot Standby Performance Test Results
Oracle Rdb Hot Performance Test Results Bill Gettys (bill.gettys@oracle.com), Principal Engineer, Oracle Corporation August 15, 1999 Introduction With the release of Rdb version 7.0, Oracle offered a powerful
More informationHigh Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc.
High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Warm Standby...2 The Business Problem...2 Section II:
More informationINFINIDAT Storage Architecture. White Paper
INFINIDAT Storage Architecture White Paper Abstract The INFINIDAT enterprise storage solution is based upon the unique and patented INFINIDAT Storage Architecture (ISA). The INFINIDAT Storage Architecture
More informationGFS: The Google File System
GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one
More informationDatacenter replication solution with quasardb
Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION
More informationPerformance comparisons and trade-offs for various MySQL replication schemes
Performance comparisons and trade-offs for various MySQL replication schemes Darpan Dinker VP Engineering Brian O Krafka, Chief Architect Schooner Information Technology, Inc. http://www.schoonerinfotech.com/
More informationReview: Creating a Parallel Program. Programming for Performance
Review: Creating a Parallel Program Can be done by programmer, compiler, run-time system or OS Steps for creating parallel program Decomposition Assignment of tasks to processes Orchestration Mapping (C)
More informationCeph: A Scalable, High-Performance Distributed File System
Ceph: A Scalable, High-Performance Distributed File System S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long Presented by Philip Snowberger Department of Computer Science and Engineering University
More informationScalable Streaming Analytics
Scalable Streaming Analytics KARTHIK RAMASAMY @karthikz TALK OUTLINE BEGIN I! II ( III b Overview Storm Overview Storm Internals IV Z V K Heron Operational Experiences END WHAT IS ANALYTICS? according
More informationCSE 124: Networked Services Lecture-16
Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments
More information18-hdfs-gfs.txt Thu Nov 01 09:53: Notes on Parallel File Systems: HDFS & GFS , Fall 2012 Carnegie Mellon University Randal E.
18-hdfs-gfs.txt Thu Nov 01 09:53:32 2012 1 Notes on Parallel File Systems: HDFS & GFS 15-440, Fall 2012 Carnegie Mellon University Randal E. Bryant References: Ghemawat, Gobioff, Leung, "The Google File
More informationIntroduction to the Active Everywhere Database
Introduction to the Active Everywhere Database INTRODUCTION For almost half a century, the relational database management system (RDBMS) has been the dominant model for database management. This more than
More informationASN Configuration Best Practices
ASN Configuration Best Practices Managed machine Generally used CPUs and RAM amounts are enough for the managed machine: CPU still allows us to read and write data faster than real IO subsystem allows.
More informationVMware vsphere with ESX 4.1 and vcenter 4.1
QWERTYUIOP{ Overview VMware vsphere with ESX 4.1 and vcenter 4.1 This powerful 5-day class is an intense introduction to virtualization using VMware s vsphere 4.1 including VMware ESX 4.1 and vcenter.
More informationI/O CANNOT BE IGNORED
LECTURE 13 I/O I/O CANNOT BE IGNORED Assume a program requires 100 seconds, 90 seconds for main memory, 10 seconds for I/O. Assume main memory access improves by ~10% per year and I/O remains the same.
More informationAn Implementation of the Homa Transport Protocol in RAMCloud. Yilong Li, Behnam Montazeri, John Ousterhout
An Implementation of the Homa Transport Protocol in RAMCloud Yilong Li, Behnam Montazeri, John Ousterhout Introduction Homa: receiver-driven low-latency transport protocol using network priorities HomaTransport
More informationCPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University
Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads
More informationDistributed System. Gang Wu. Spring,2018
Distributed System Gang Wu Spring,2018 Lecture7:DFS What is DFS? A method of storing and accessing files base in a client/server architecture. A distributed file system is a client/server-based application
More information