The OceanStore Write Path
1 The OceanStore Write Path
Sean C. Rhea, John Kubiatowicz
University of California, Berkeley
June 11, 2002
2-8 Introduction: the OceanStore Write Path
The Inner Ring
- Acts as the single point of consistency for a file
- Performs write access control and serialization
- Creates archival fragments of new data and disperses them
- Certifies the results of its actions with cryptography
The Second Tier
- Caches certificates and data produced at the inner ring
- Self-organizes into a dissemination tree to share results
The Archival Storage Servers
- Store archival fragments generated by the Inner Ring
The Client Machines
- Create updates and send them to the inner ring
- Wait for responses to come down the dissemination tree
9-11 Introduction: the OceanStore Write Path (cont.)
[Timeline figure: the inner ring, archive, and application replicas, with times T_req, T_agree, and T_disseminate marked]
1. A client sends an update to the inner ring
2. The inner ring performs a Byzantine agreement, applying the update
3. The results are sent down the dissemination tree and into the archive
12-14 Write Path Details
The Inner Ring uses Byzantine agreement for fault tolerance
- Up to f of 3f + 1 servers can fail
- We use a modified version of the Castro-Liskov protocol
The Inner Ring certifies decisions with proactive threshold signatures
- Single public (verification) key
- Each member has a key share which lets it generate signature shares
- Need f + 1 signature shares to generate a full signature
- Independent sets of key shares can be used to control membership
The Second Tier and Archive are ignorant of the composition of the Inner Ring
- They know only the single public key
- Allows simple replacement of faulty Inner Ring servers
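The f + 1 threshold above can be illustrated with plain Shamir secret sharing. OceanStore actually uses proactive threshold RSA, which is considerably more involved; this Python sketch is only an analogy for why any f + 1 shares suffice to reconstruct while the field size, share indices, and key value are all toy choices, not anything from the real system.

```python
# Toy (f+1)-of-n threshold sketch via Shamir secret sharing over a prime
# field. Illustrates the threshold property only; NOT the actual
# proactive threshold RSA scheme used by the OceanStore inner ring.
import random

P = 2**61 - 1          # a Mersenne prime modulus (toy field, not a key size)
f = 2                  # tolerated faults; ring size n = 3f + 1 = 7
n = 3 * f + 1

def deal_shares(secret):
    """Hide `secret` in a random degree-f polynomial; share i is poly(i)."""
    coeffs = [secret] + [random.randrange(P) for _ in range(f)]
    def poly(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return {i: poly(i) for i in range(1, n + 1)}

def combine(shares):
    """Lagrange-interpolate poly(0) from any f+1 of the shares."""
    assert len(shares) >= f + 1
    pts = list(shares.items())[: f + 1]
    secret = 0
    for i, (xi, yi) in enumerate(pts):
        num = den = 1
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, P-2, P) is the modular inverse of den (Fermat).
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

key = 123456789
shares = deal_shares(key)
subset = {i: shares[i] for i in [2, 5, 7]}   # any f+1 = 3 members suffice
assert combine(subset) == key
```

Any f + 1 of the n = 7 shares reconstruct the secret, while f or fewer determine nothing about it, which is exactly the property that lets the second tier trust a single public key without knowing which inner-ring members signed.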
15 Micro Benchmarks: Update Latency vs. Update Size
[Figure: update latency (ms) vs. update size (kB) for 1024-bit and 512-bit keys; both curves have slope 0.6 s/MB]
- Use two key sizes to show the effects of Moore's Law on latency
- 512-bit keys are not secure, but are 4x faster
- Gives an upper bound on latency three years from now
16-18 Micro Benchmarks: Update Latency Remarks
Threshold signatures are expensive
- Takes 6.3 ms to generate a regular 1024-bit signature
- But takes 73.9 ms to generate a 1024-bit threshold signature share
- (Combining shares takes less than 1 ms)
Unfortunately, this is a mathematical fact of life
- Cannot use the Chinese Remainder Theorem in computing shares (4x)
- Making individual shares verifiable is expensive
Almost no research into the performance of threshold cryptography
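The "(4x)" above refers to the standard CRT speedup for RSA: a signer who knows the factors p and q can do two half-size exponentiations instead of one full-size one, which is roughly 4x faster. A threshold member holds only a share of the private exponent and never sees p and q, so the trick is unavailable. A minimal sketch with toy primes (real keys are 512-1024 bits here):

```python
# CRT-based RSA signing vs. plain signing. The primes are small,
# well-known primes chosen for illustration only.
p, q = 104729, 1299709        # toy primes; a real key uses ~512-bit primes
N = p * q
e = 65537
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)           # private exponent (Python 3.8+ modular inverse)

m = 42                        # "message" (already hashed/padded in practice)

# Plain signing: one full-size modular exponentiation.
sig_plain = pow(m, d, N)

# CRT signing: two half-size exponentiations, then Garner recombination.
dp, dq = d % (p - 1), d % (q - 1)
sp, sq = pow(m, dp, p), pow(m, dq, q)
q_inv = pow(q, -1, p)
h = (q_inv * (sp - sq)) % p
sig_crt = sq + h * q

assert sig_crt == sig_plain           # same signature, ~4x less work
assert pow(sig_plain, e, N) == m      # signature verifies under (e, N)
```

Each half-size exponentiation costs roughly 1/8 of the full one (half the exponent bits, quarter-cost multiplications), so the two together cost about 1/4: hence the 4x that threshold signers must forgo.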
19-20 Micro Benchmarks: Throughput vs. Update Size
[Figure: total update operations per second and total bandwidth (MB/s) vs. update size (kB)]
- Using 1024-bit keys and 60 synchronous clients
- Max throughput is a respectable 5 MB/s (Berkeley DB through Java can only do about 7.5 MB/s)
- But we have a problem with small updates: 13 ops/s is atrocious!
21-25 Batching: A Solution to the Small Update Problem
What if we could combine many small updates into a single batch?
Each Inner Ring member
- Decides the result of each update individually
- Generates a signature share over the results of all of the updates
Saves CPU time
- Generating signature shares is expensive
Saves network bandwidth
- Each Byzantine agreement requires O(ringsize^2) messages
But makes signatures unwieldy
- Each signature is now O(batchsize) long
- For high throughput, we want batch sizes in the hundreds or thousands
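A back-of-the-envelope model shows why amortizing one agreement-plus-signature over a batch helps small updates so much. The cost model below (one fixed ~74 ms signing cost per batch, from the measured share-generation time, plus an assumed 1 ms to apply each small update) is an illustration, not the system's actual accounting:

```python
# Simplified amortization model for batched small updates.
# sig_and_agreement_ms comes from the measured 73.9 ms share generation;
# per_update_ms is an assumed per-update apply cost.
sig_and_agreement_ms = 74.0
per_update_ms = 1.0

def ops_per_sec(batch_size):
    batch_ms = sig_and_agreement_ms + per_update_ms * batch_size
    return 1000.0 * batch_size / batch_ms

print(round(ops_per_sec(1), 1))    # no batching: ~13 ops/s, as measured
print(round(ops_per_sec(10), 1))
print(round(ops_per_sec(100), 1))
```

With batch size 1 the model reproduces the atrocious ~13 ops/s; larger batches push throughput toward the 1 / per_update_ms ceiling. The model ignores network and agreement latency, which is why the real system's batched numbers are lower than this bound.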
26-29 Merkle Trees: Making Batching Efficient
[Figure: binary hash tree with root H_1 and heap-indexed nodes, H_i = SHA1(H_2i, H_2i+1); leaf hashes cover Result 1 through Result 15; the signature covers (n = 15, H_1); the verification path for Result 2 is highlighted]
- Build a Merkle tree over the results
- Each node is a hash of its two children
- Sign only the tree size and the top hash
- To verify Result 2, need only the signature plus H_2, H_4, ...
- The signature over any one result is only O(log batchsize)
- Provably secure
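The scheme above can be sketched in a few lines: hash the batch's results into a tree, treat only the root (plus the tree size) as the signed value, and give each client an O(log batchsize) sibling path. SHA-1 matches the slide; the exact tree layout (simple bottom-up pairing, duplicating the last node at odd levels) is an assumption of this sketch.

```python
# Minimal Merkle tree over a batch of results. Only the root would be
# covered by the threshold signature; each result verifies against it
# with a logarithmic-size sibling path.
import hashlib

def sha1(*parts):
    h = hashlib.sha1()
    for p in parts:
        h.update(p)
    return h.digest()

def build_tree(results):
    """Return the list of levels, leaf hashes first, root level last."""
    level = [sha1(r) for r in results]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate last node if odd
            level = level + [level[-1]]
        level = [sha1(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect the sibling hash at each level: an O(log n) proof."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1
        proof.append(level[sib] if sib < len(level) else level[index])
        index //= 2
    return proof

def verify(root, result, index, proof):
    """Recompute the root from one result and its sibling path."""
    h = sha1(result)
    for sib in proof:
        h = sha1(h, sib) if index % 2 == 0 else sha1(sib, h)
        index //= 2
    return h == root

batch = [b"result %d" % i for i in range(15)]   # slide uses n = 15
levels = build_tree(batch)
root = levels[-1][0]                            # only (n, root) gets signed
proof = prove(levels, 2)                        # 4 hashes for 15 results
assert verify(root, b"result 2", 2, proof)
assert not verify(root, b"forged", 2, proof)
```

A client holding Result 2, its 4-hash path, and the signed (n, root) pair can check its own result without seeing the rest of the batch, which is what keeps per-result certificates small even for batches in the thousands.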
31-32 Micro Benchmarks: Throughput vs. Update Size (w/ Batching)
[Figure: ops/s and total bandwidth (MB/s) vs. update size (kB), with and without naive batching]
- Batching works great
- Amortizes expensive agreements over many updates
- For small updates, go from 13.5 ops/s to 76 ops/s
- Introspecting on batch size should further improve small-update throughput
33 Macro Benchmarks: The Andrew Benchmark
[Figure: client architecture; applications call fopen, fread, fwrite, etc. through the JVM, the Linux kernel translates them to NFS READ/WRITE/GETATTR requests for a user-level daemon, and the daemon issues OceanStore read/update/create messages to a replica over Tapestry]
- Built a UNIX file system on top of OceanStore
- Runs as a user-level NFS daemon on Linux
- Applications use the familiar fopen, fwrite, etc.; no recompilation
- The kernel translates calls to NFS requests and sends them to the local daemon
- The daemon translates them to OceanStore requests and sends them out on the network
34-35 Macro Benchmarks: The Andrew Benchmark

Inter-host ping times in milliseconds (standard deviation in parentheses):

Source \ Dest.   U. TX         GA Tech       Rice          UW
UCB              45.3 (0.75)   56.5 (0.14)   49.6 (3.1)    20.0 (0.11)
UTA                            24.1 (0.49)   8.45 (1.5)    61.7 (0.22)
GA Tech                                      27.7 (2.2)    59.0 (0.20)
Rice                                                       61.5 (0.69)

- For more realism, we used a nationwide network
- Goal: find out whether Byzantine agreement is practical in the wide area
- Ran the Andrew Benchmark, which simulates a software development workload
- For a control, used several competitors:
  - Linux user-level NFS daemon: real NFS, ships with Debian GNU/Linux
  - Java-based user-level NFS daemon: uses the disk (not OceanStore)
36-37 Macro Benchmarks: Local Andrew
[Figure: benchmark time (s), broken into phases 1-5 (create directories, copy source tree, stat all files, read all files, compile source tree), for Linux NFS, Java NFS, simple OceanStore with 512- and 1024-bit keys, and 512-bit keys with batching + tentative updates]
- Simple OceanStore performance is not so hot
- In the local area, NFS is in its element; OceanStore isn't
- But with tentative update support and batching, OceanStore is pretty good
  - Tentative updates let the client go on while waiting for agreements
  - Batching allows the inner ring to keep up
  - Within a factor of two of the Java-based NFS
38 Macro Benchmarks: Nationwide Andrew
[Figure: benchmark time (s) by phase for simple OceanStore (1024- and 512-bit keys) and Linux NFS]
- In the wide area, OceanStore is in its element; NFS isn't
- Even simple OceanStore is nearly within a factor of two
- Numbers with batching and tentative updates forthcoming; should outperform NFS
39-43 Conclusion
All the basics of the OceanStore write path are implemented and working
- Not doing full recovery yet
Performance is good
- Single update time is under 100 ms, and improves directly with Moore's Law
- Throughput is great for large updates
Batching allows the inner ring to amortize signatures over many updates
- Get large-update throughput with small updates
- Secure and space-efficient
Provides a lot more functionality than the competition
- Higher durability and availability than NFS
- Cryptographic data integrity
- Versioning allows logical undo
More informationState of the Linux Kernel
State of the Linux Kernel Timothy D. Witham Chief Technology Officer Open Source Development Labs, Inc. 1 Agenda Process Performance/Scalability Responsiveness Usability Improvements Device support Multimedia
More informationReducing the Costs of Large-Scale BFT Replication
Reducing the Costs of Large-Scale BFT Replication Marco Serafini & Neeraj Suri TU Darmstadt, Germany Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de
More informationPeer-to-Peer Systems and Distributed Hash Tables
Peer-to-Peer Systems and Distributed Hash Tables CS 240: Computing Systems and Concurrency Lecture 8 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Selected
More informationHyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University
HyperDex A Distributed, Searchable Key-Value Store Robert Escriva Bernard Wong Emin Gün Sirer Department of Computer Science Cornell University School of Computer Science University of Waterloo ACM SIGCOMM
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in
More informationIBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage
IBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage Silverton Consulting, Inc. StorInt Briefing 2017 SILVERTON CONSULTING, INC. ALL RIGHTS RESERVED Page 2 Introduction Unstructured data has
More informationXen Network I/O Performance Analysis and Opportunities for Improvement
Xen Network I/O Performance Analysis and Opportunities for Improvement J. Renato Santos G. (John) Janakiraman Yoshio Turner HP Labs Xen Summit April 17-18, 27 23 Hewlett-Packard Development Company, L.P.
More informationAlgorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory II
Memory Performance of Algorithms CSE 32 Data Structures Lecture Algorithm Performance Factors Algorithm choices (asymptotic running time) O(n 2 ) or O(n log n) Data structure choices List or Arrays Language
More informationIBM DS8870 Release 7.0 Performance Update
IBM DS8870 Release 7.0 Performance Update Enterprise Storage Performance David Whitworth Yan Xu 2012 IBM Corporation Agenda Performance Overview System z (CKD) Open Systems (FB) Easy Tier Copy Services
More informationI/O Buffering and Streaming
I/O Buffering and Streaming I/O Buffering and Caching I/O accesses are reads or writes (e.g., to files) Application access is arbitary (offset, len) Convert accesses to read/write of fixed-size blocks
More informationArcGIS Enterprise: Performance and Scalability Best Practices. Darren Baird, PE, Esri
ArcGIS Enterprise: Performance and Scalability Best Practices Darren Baird, PE, Esri dbaird@esri.com What is ArcGIS Enterprise What s Included with ArcGIS Enterprise ArcGIS Server the core web services
More informationCeph: A Scalable, High-Performance Distributed File System
Ceph: A Scalable, High-Performance Distributed File System S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long Presented by Philip Snowberger Department of Computer Science and Engineering University
More informationLecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown
Lecture 21: Reliable, High Performance Storage CSC 469H1F Fall 2006 Angela Demke Brown 1 Review We ve looked at fault tolerance via server replication Continue operating with up to f failures Recovery
More informationOPERATING SYSTEM. Chapter 12: File System Implementation
OPERATING SYSTEM Chapter 12: File System Implementation Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management
More informationKernel Level Speculative DSM
Motivation Main interest is performance, fault-tolerance, and correctness of distributed systems Present our ideas in the context of a DSM system We are developing tools that Improve performance Address
More informationFile Size Distribution on UNIX Systems Then and Now
File Size Distribution on UNIX Systems Then and Now Andrew S. Tanenbaum, Jorrit N. Herder*, Herbert Bos Dept. of Computer Science Vrije Universiteit Amsterdam, The Netherlands {ast@cs.vu.nl, jnherder@cs.vu.nl,
More informationPractical Byzantine Fault Tolerance Consensus and A Simple Distributed Ledger Application Hao Xu Muyun Chen Xin Li
Practical Byzantine Fault Tolerance Consensus and A Simple Distributed Ledger Application Hao Xu Muyun Chen Xin Li Abstract Along with cryptocurrencies become a great success known to the world, how to
More informationAgenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache
Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,
More informationCold Storage: The Road to Enterprise Ilya Kuznetsov YADRO
Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO Agenda Technical challenge Custom product Growth of aspirations Enterprise requirements Making an enterprise cold storage product 2 Technical Challenge
More informationHDFS: Hadoop Distributed File System. Sector: Distributed Storage System
GFS: Google File System Google C/C++ HDFS: Hadoop Distributed File System Yahoo Java, Open Source Sector: Distributed Storage System University of Illinois at Chicago C++, Open Source 2 System that permanently
More informationDYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun
DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:
More information1 of 8 14/12/2013 11:51 Tuning long-running processes Contents 1. Reduce the database size 2. Balancing the hardware resources 3. Specifying initial DB2 database settings 4. Specifying initial Oracle database
More informationGeorgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong
Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong Relatively recent; still applicable today GFS: Google s storage platform for the generation and processing of data used by services
More informationCS Amazon Dynamo
CS 5450 Amazon Dynamo Amazon s Architecture Dynamo The platform for Amazon's e-commerce services: shopping chart, best seller list, produce catalog, promotional items etc. A highly available, distributed
More informationNFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency
Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency Local file systems Disks are terrible abstractions: low-level blocks, etc. Directories, files, links much
More informationChallenges in the Wide-area. Tapestry: Decentralized Routing and Location. Global Computation Model. Cluster-based Applications
Challenges in the Wide-area Tapestry: Decentralized Routing and Location System Seminar S 0 Ben Y. Zhao CS Division, U. C. Berkeley Trends: Exponential growth in CPU, b/w, storage Network expanding in
More informationMemory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts
Memory management Last modified: 26.04.2016 1 Contents Background Logical and physical address spaces; address binding Overlaying, swapping Contiguous Memory Allocation Segmentation Paging Structure of
More informationCMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22
More informationSoftware Defined Storage at the Speed of Flash. PRESENTATION TITLE GOES HERE Carlos Carrero Rajagopal Vaideeswaran Symantec
Software Defined Storage at the Speed of Flash PRESENTATION TITLE GOES HERE Carlos Carrero Rajagopal Vaideeswaran Symantec Agenda Introduction Software Technology Architecture Review Oracle Configuration
More informationChapter 8: Main Memory
Chapter 8: Main Memory Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and 64-bit Architectures Example:
More informationChallenges in the Wide-area. Tapestry: Decentralized Routing and Location. Key: Location and Routing. Driving Applications
Challenges in the Wide-area Tapestry: Decentralized Routing and Location SPAM Summer 00 Ben Y. Zhao CS Division, U. C. Berkeley! Trends: Exponential growth in CPU, b/w, storage Network expanding in reach
More information