Two-Choice Randomized Dynamic I/O Scheduler for Object Storage Systems. Dong Dai, Yong Chen, Dries Kimpe, and Robert Ross
4 Parallel Object Storage
Many HPC systems use object storage: PVFS, Lustre, PanFS, Ceph, etc. Files are arranged as lists of objects, which are physically distributed across OSDs. I/O requests are mapped to objects, and each object is served by an OSD. (Diagram: application processes p1..pn map a file to objects stored on Object Store Devices (OSDs); the slow OSDs are the I/O stragglers.)
7 Motivation: I/O Stragglers
The existence of I/O stragglers is a well-known problem.
Long-term I/O stragglers last hours, days, or even indefinitely; statistical data help identify them; they are caused by software bugs, hardware failures, or outdated hardware.
Short-term, dynamic I/O stragglers last minutes, seconds, or even less; there is no good strategy to identify them; they come from interference between applications or from resource contention.
With more clients and more storage servers, this problem will only get worse. In this research, we focus on detecting and avoiding short-term, dynamic I/O stragglers.
8 Two-Choice Randomized, Dynamic HPC I/O Scheduler
Identify and avoid short-term stragglers by tracking the real-time performance of storage servers.
Dynamically place write operations onto OSDs in a decentralized way (using two-choice randomization).
Efficiently track the dynamic data placement that results from this I/O scheduling.
13 Two-Choice Algorithm
A parallel, randomized load balancer (M. D. Mitzenmacher et al., The Power of Two Choices in Randomized Load Balancing). It has also recently been applied to task scheduling (K. Ousterhout, Sparrow: Distributed, Low Latency Scheduling, SOSP 2013).
Having two choices yields a qualitative improvement in maximal queue length: instead of randomly choosing one server, randomly choose two and send the request to the one with the shorter queue. (Diagram: requests placed into per-server queues; queue length under one random choice vs. two.)
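The two-choice rule can be sketched in a few lines. This is a generic illustration of the algorithm (not the paper's scheduler), with a tiny simulation comparing maximal queue length against purely random placement:

```python
import random

def two_choice_schedule(loads, rng):
    """Pick two distinct servers uniformly at random and send the
    request to the less-loaded one (the classic two-choice rule)."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b

def random_schedule(loads, rng):
    """Baseline: pick one server uniformly at random."""
    return rng.randrange(len(loads))

def simulate(n_servers, n_requests, chooser, seed=0):
    """Place n_requests one by one and report the maximal queue length."""
    rng = random.Random(seed)
    loads = [0] * n_servers
    for _ in range(n_requests):
        loads[chooser(loads, rng)] += 1
    return max(loads)

# With 100 servers and 10,000 requests (mean load 100), two-choice
# typically yields a much smaller maximum queue than random placement.
max_random = simulate(100, 10_000, random_schedule)
max_two_choice = simulate(100, 10_000, two_choice_schedule)
```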
16 Is Two-Choice Good Enough?
Two-choice seems promising, so we simulated it for HPC I/O: run background workloads that issue I/O operations on the storage servers; synthetically create some slower servers (I/O stragglers) by putting 5x load on them; run a new application with parallel I/O operations and measure its response time (determined by the slowest I/O operation).
Setup: 1,000 storage nodes, 10,000 processes, 1 MB writes, 5 ms round-trip time. The straggler ratio is the percentage of servers that are much slower.
Results: random selection can easily be worse than the fixed strategy, and two-choice converges to the same performance as the fixed scheduler. (Chart: response time (ms) vs. straggler ratio for Fixed Scheduler, Random Selection, and Two-Choice Random.)
17 Why do the I/O schedulers perform like this?
By analyzing the scheduling results, we find: random selection will most likely hit some of the stragglers and will still put I/O requests on heavily loaded servers. The native two-choice algorithm can avoid stragglers when there are not many of them, but it tends to put many more I/O requests on servers with slightly less load, and so may generate new hotspots. (Chart: maximal number of scheduled I/O requests vs. current load (pending I/O requests) per server, for Fixed Scheduler, Random Selection, and Two-Choice Random.)
19 Extending Native Two-Choice
We cannot simply apply native two-choice to HPC I/O: each scheduler probes only two random storage servers, so it cannot avoid stragglers effectively; and because highly concurrent HPC applications place I/O requests at the same time, all schedulers probe at the same time, make the same scheduling decision, and may generate new stragglers.
We extend native two-choice with two strategies: the Collaborative Probe strategy and the Preassign strategy.
27 Collaborative Probe (CP) Strategy
Combine the concurrent probes issued by a single node: instead of each request probing two servers and selecting one, the node probes 2*k servers and selects k, so schedulers see k times more server information. Storage servers also attach extra load information to probe replies, and schedulers learn from this short-term cached server information.
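A minimal sketch of the probe-2k/select-k idea (illustrative only; a real scheduler would batch actual probe RPCs and also reuse the attached server information):

```python
import random

def collaborative_probe(loads, k, rng):
    """Batch k concurrent requests from one client node: probe 2*k
    distinct servers once, then assign the k least-loaded of them,
    one per request, instead of k independent two-choice probes."""
    probed = rng.sample(range(len(loads)), 2 * k)
    # Sort the probed servers by their reported load; take the k best.
    probed.sort(key=lambda s: loads[s])
    return probed[:k]

# Example: 2 concurrent requests probe 4 of the 8 servers and get
# the 2 least-loaded of those 4, in increasing order of load.
loads = [5, 1, 9, 3, 7, 2, 8, 4]
targets = collaborative_probe(loads, k=2, rng=random.Random(0))
```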
30 With the Collaborative Probe (CP) Strategy
The scheduler observes more load information and achieves better performance when the straggler ratio is low. But with more stragglers, performance drops quickly, caused by the high concurrency of HPC I/O. (Chart: response time (ms) vs. straggler ratio for Fixed Scheduler, Random Selection, Two-Choice Random, and Two-Choice with Collaborative Probe.)
36 Preassign Strategy
Storage servers maintain statistics about how often a probe that saw a given local load value is accepted. On each probe, the server adds a fraction of one request's load to the reported load, based on these historical statistics. After receiving the real requests, storage servers adjust these preassigned loads. (Timeline: a probe returns the OSD's local load info; the client compares and selects; when the I/O arrives the load info is updated, otherwise it is left unchanged; later probes see the updated information.)
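The server-side bookkeeping can be sketched as follows. This is a simplified model: a single fixed acceptance ratio stands in for the historical statistics the slide describes.

```python
class OSDLoadTracker:
    """Server-side preassign bookkeeping (a simplified sketch).
    Each probe tentatively adds a fraction of one request's load, so
    concurrent probers do not all see the same stale load value and
    converge on the same server."""

    def __init__(self, accept_ratio=0.5):
        self.load = 0.0                   # pending I/O load (may be fractional)
        self.accept_ratio = accept_ratio  # historical P(probe -> real I/O)

    def on_probe(self):
        # Preassign: optimistically count a fraction of the request now,
        # and report the adjusted load to the prober.
        self.load += self.accept_ratio
        return self.load

    def on_io_arrival(self):
        # The probe was accepted: replace the preassigned fraction
        # with the full request's load.
        self.load += 1.0 - self.accept_ratio

    def on_io_complete(self):
        self.load -= 1.0

# Two concurrent probers see 0.5 and 1.0 rather than both seeing 0.
tracker = OSDLoadTracker(accept_ratio=0.5)
first_reply = tracker.on_probe()   # 0.5
second_reply = tracker.on_probe()  # 1.0
```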
37 With CP + Preassign
With preassign, schedulers no longer all receive the same load information: with more probes, the expected load on a server increases, and if the preassigned load is large, the probability of being selected drops. Even with more stragglers, we are able to keep performance stable: better performance by avoiding stragglers, and stable performance without creating new stragglers. (Chart: response time (ms) vs. straggler ratio for Fixed Scheduler, Random Selection, Two-Choice Random, Collaborative Probe, and CP + Preassign.)
40 Implementation Issues
To implement a dynamic scheduler in an object storage system, we must be able to redirect I/O requests (including small fragments) to arbitrary storage servers, and to remember these redirections so the data can be read back in the future.
Core idea: do not update the metadata servers; update the storage servers. Do not invalidate the client-side cache; move the redirected data back.
Components: the redirect table and metadata migration.
41 Read/Write under This Solution
Write(obj, offset, len): randomly probe several storage servers and get their real-time loads; compare and select the one that should serve this I/O request; put the real data on the selected server, and put the redirection information on the original server.
Read(obj, offset, len): find the original server via the client-side metadata cache; send the read request to the original server, which may be a false hit; in that case, get the redirected server's location and send the read request there again; return the value from the redirected server.
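The read path with a possible false hit can be sketched with stub objects. The class and method names here are illustrative, not the Triton API, and the per-(object, offset) keying is a deliberate simplification of range handling:

```python
class OSD:
    """Minimal storage-server stub: holds object data plus a redirect
    map for ranges it no longer owns, pointing at other OSDs."""

    def __init__(self):
        self.data = {}       # (obj, offset) -> bytes
        self.redirects = {}  # (obj, offset) -> target OSD

    def write(self, obj, offset, payload):
        self.data[(obj, offset)] = payload

    def read(self, obj, offset):
        key = (obj, offset)
        if key in self.redirects:
            # False hit: tell the client where the data actually lives.
            return ("redirect", self.redirects[key])
        return ("data", self.data[key])

def client_read(original, obj, offset):
    """Client read path: try the server named by the (possibly stale)
    metadata cache; on a false hit, retry at the redirected server."""
    kind, value = original.read(obj, offset)
    if kind == "redirect":
        kind, value = value.read(obj, offset)
    return value

# A write that was redirected from osd_a to the less-loaded osd_b:
osd_a, osd_b = OSD(), OSD()
osd_b.write("obj1", 0, b"hello")
osd_a.redirects[("obj1", 0)] = osd_b
result = client_read(osd_a, "obj1", 0)  # follows the redirect to osd_b
```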
44 Performance Considerations
Redirection is not free; it introduces extra latency: during writes, probing and updating the redirect table; during reads, false hits and querying the redirect table. We mitigate this by moving data back quickly to reduce cache misses, and by designing an efficient redirect table that supports fast updates and queries.
45 Efficient Redirect Table
The redirect table is a time-ordered array storing redirection ranges and their target servers, supporting create/query/delete: create appends a new entry at the end of the table; query scans the table backwards to find the entry covering the requested range; delete also appends new entries.
Tips: maintain an in-memory structure for fast access; append new entries with timestamps; while appending, remove existing entries whose ranges are covered by the new one.
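The create/query/delete behavior described above can be sketched as an append-only, timestamped list. This is an illustrative sketch, not Triton's implementation; for simplicity it only prunes old entries fully covered by a new range.

```python
import time

class RedirectTable:
    """Time-ordered, append-only redirect table (sketch). Each entry
    records a redirected byte range and its target server. Queries scan
    backwards so the newest entry covering an offset wins, and appends
    drop older entries fully covered by the new range."""

    def __init__(self):
        self.entries = []  # (timestamp, start, end, target)

    def create(self, start, end, target):
        # Prune existing entries whose range is fully covered by the new one.
        self.entries = [e for e in self.entries
                        if not (start <= e[1] and e[2] <= end)]
        self.entries.append((time.time(), start, end, target))

    def query(self, offset):
        # Newest matching entry wins: scan from the end of the table.
        for _ts, start, end, target in reversed(self.entries):
            if start <= offset < end:
                return target
        return None  # not redirected: the data is local

    def delete(self, start, end):
        # Deletes are also expressed as appended entries (empty target).
        self.create(start, end, None)

table = RedirectTable()
table.create(0, 100, "osd2")
table.create(50, 60, "osd3")   # newer, narrower redirect wins for [50, 60)
```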
47 Move Data Back: Metadata Migration
Metadata migration threads run in the background on the storage servers with the lowest priority, pausing whenever there are I/O requests. There are plenty of idle periods in which to do this (Carns, Philip, et al., Understanding and improving computational science storage access through continuous characterization, TOS, 2011). Consistency is easy to manage because each storage server manages its own redirect table, and timestamps direct the consistency management.
48 Implementation and Evaluation
We built a prototype on Triton, an object-based storage system. All evaluations were conducted on the Fusion cluster at ANL: 320 compute nodes, each with a 2.53 GHz Xeon CPU, 36 GB of memory, and a 250 GB local hard disk.
Strategy: evaluate the critical components separately (probing performance, redirect performance), then evaluate the overall performance on real-world workloads (whole-cluster load balancing and single-application finish time).
49 Evaluation with Workloads
We synthetically created a group of I/O workloads to mimic real-world applications' behaviors, using 128 storage servers and 64 compute nodes (512 cores). Fixed Scheduler denotes the round-robin scheduler (with a random start index); Dynamic Scheduler denotes the two-choice randomized scheduler. The dynamic scheduler achieves a better-balanced load on each storage server.
53 Evaluating an Application
We used the previous workloads, running in the background, to generate unbalanced load, then scheduled one new application to run. Results: the fixed scheduler let more processes finish in the first several seconds (60% of I/Os finished quickly, but 100% only at 200+ s); the native two-choice scheduler suffers from creating new stragglers; the proposed two-choice scheduler finishes faster (90% of I/Os finished quickly, 100% at around 200 s) because it avoids the stragglers.
54 Conclusion
We extended the native two-choice randomized algorithm (with the collaborative probe and preassign strategies) into an I/O scheduler for object storage systems, and implemented the new components (redirect table, metadata migration) that such a dynamic I/O scheduler needs in parallel file systems. Evaluations confirm better response times for applications and a better load balance across storage servers.
55 Thanks & Questions
More informationMidterm Exam Solutions and Grading Guidelines March 3, 1999 CS162 Operating Systems
University of California, Berkeley College of Engineering Computer Science Division EECS Spring 1999 Anthony D. Joseph Midterm Exam Solutions and Grading Guidelines March 3, 1999 CS162 Operating Systems
More informationParallel Databases C H A P T E R18. Practice Exercises
C H A P T E R18 Parallel Databases Practice Exercises 181 In a range selection on a range-partitioned attribute, it is possible that only one disk may need to be accessed Describe the benefits and drawbacks
More informationAerie: Flexible File-System Interfaces to Storage-Class Memory [Eurosys 2014] Operating System Design Yongju Song
Aerie: Flexible File-System Interfaces to Storage-Class Memory [Eurosys 2014] Operating System Design Yongju Song Outline 1. Storage-Class Memory (SCM) 2. Motivation 3. Design of Aerie 4. File System Features
More informationIndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion
IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion Kai Ren Qing Zheng, Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University Why Scalable
More informationExtreme Storage Performance with exflash DIMM and AMPS
Extreme Storage Performance with exflash DIMM and AMPS 214 by 6East Technologies, Inc. and Lenovo Corporation All trademarks or registered trademarks mentioned here are the property of their respective
More informationPerformance Modeling and Analysis of Flash based Storage Devices
Performance Modeling and Analysis of Flash based Storage Devices H. Howie Huang, Shan Li George Washington University Alex Szalay, Andreas Terzis Johns Hopkins University MSST 11 May 26, 2011 NAND Flash
More information1. Consider the following page reference string: 1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6.
1. Consider the following page reference string: 1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6. What will be the ratio of page faults for the following replacement algorithms - FIFO replacement
More informationCS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015
CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable
More informationPresented by: Nafiseh Mahmoudi Spring 2017
Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory
More informationPlot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0;
How will execution time grow with SIZE? int array[size]; int A = ; for (int i = ; i < ; i++) { for (int j = ; j < SIZE ; j++) { A += array[j]; } TIME } Plot SIZE Actual Data 45 4 5 5 Series 5 5 4 6 8 Memory
More informationMoneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories
Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Adrian M. Caulfield Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta, Steven Swanson Non-Volatile Systems
More informationGetafix: Workload-aware Distributed Interactive Analytics
Getafix: Workload-aware Distributed Interactive Analytics Presenter: Mainak Ghosh Collaborators: Le Xu, Xiaoyao Qian, Thomas Kao, Indranil Gupta, Himanshu Gupta Data Analytics 2 Picture borrowed from https://conferences.oreilly.com/strata/strata-ny-2016/public/schedule/detail/51640
More informationPROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter , Lecture 18
PROCESS VIRTUAL MEMORY CS124 Operating Systems Winter 2015-2016, Lecture 18 2 Programs and Memory Programs perform many interactions with memory Accessing variables stored at specific memory locations
More informationGo Deep: Fixing Architectural Overheads of the Go Scheduler
Go Deep: Fixing Architectural Overheads of the Go Scheduler Craig Hesling hesling@cmu.edu Sannan Tariq stariq@cs.cmu.edu May 11, 2018 1 Introduction Golang is a programming language developed to target
More informationVirtual Memory. Kevin Webb Swarthmore College March 8, 2018
irtual Memory Kevin Webb Swarthmore College March 8, 2018 Today s Goals Describe the mechanisms behind address translation. Analyze the performance of address translation alternatives. Explore page replacement
More informationMemory Management Virtual Memory
Memory Management Virtual Memory Part of A3 course (by Theo Schouten) Biniam Gebremichael http://www.cs.ru.nl/~biniam/ Office: A6004 April 4 2005 Content Virtual memory Definition Advantage and challenges
More informationCSE 124: Networked Services Fall 2009 Lecture-19
CSE 124: Networked Services Fall 2009 Lecture-19 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa09/cse124 Some of these slides are adapted from various sources/individuals including but
More informationDell EMC CIFS-ECS Tool
Dell EMC CIFS-ECS Tool Architecture Overview, Performance and Best Practices March 2018 A Dell EMC Technical Whitepaper Revisions Date May 2016 September 2016 Description Initial release Renaming of tool
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationClick to edit Master title
Click to edit Master title DIMM: A Distributed Metadata Management for Data-Intensive HPC Brandon Szeliga, John Cavicchio and Weisong Shi Wayne State University bszeliga@wayne.edu 1 Click Roadmap to edit
More informationOS and Hardware Tuning
OS and Hardware Tuning Tuning Considerations OS Threads Thread Switching Priorities Virtual Memory DB buffer size File System Disk layout and access Hardware Storage subsystem Configuring the disk array
More informationMultiprocessor Scheduling. Multiprocessor Scheduling
Multiprocessor Scheduling Will consider only shared memory multiprocessor or multi-core CPU Salient features: One or more caches: cache affinity is important Semaphores/locks typically implemented as spin-locks:
More informationMultiprocessor Scheduling
Multiprocessor Scheduling Will consider only shared memory multiprocessor or multi-core CPU Salient features: One or more caches: cache affinity is important Semaphores/locks typically implemented as spin-locks:
More informationFile Open, Close, and Flush Performance Issues in HDF5 Scot Breitenfeld John Mainzer Richard Warren 02/19/18
File Open, Close, and Flush Performance Issues in HDF5 Scot Breitenfeld John Mainzer Richard Warren 02/19/18 1 Introduction Historically, the parallel version of the HDF5 library has suffered from performance
More informationMain Memory. Electrical and Computer Engineering Stephen Kim ECE/IUPUI RTOS & APPS 1
Main Memory Electrical and Computer Engineering Stephen Kim (dskim@iupui.edu) ECE/IUPUI RTOS & APPS 1 Main Memory Background Swapping Contiguous allocation Paging Segmentation Segmentation with paging
More informationFall COMP3511 Review
Outline Fall 2015 - COMP3511 Review Monitor Deadlock and Banker Algorithm Paging and Segmentation Page Replacement Algorithms and Working-set Model File Allocation Disk Scheduling Review.2 Monitors Condition
More informationGetting it Right: Testing Storage Arrays The Way They ll be Used
Getting it Right: Testing Storage Arrays The Way They ll be Used Peter Murray Virtual Instruments Flash Memory Summit 2017 Santa Clara, CA 1 The Journey: How Did we Get Here? Storage testing was black
More informationWHITE PAPER. Optimizing Virtual Platform Disk Performance
WHITE PAPER Optimizing Virtual Platform Disk Performance Optimizing Virtual Platform Disk Performance 1 The intensified demand for IT network efficiency and lower operating costs has been driving the phenomenal
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationXen scheduler status. George Dunlap Citrix Systems R&D Ltd, UK
Xen scheduler status George Dunlap Citrix Systems R&D Ltd, UK george.dunlap@eu.citrix.com Goals for talk Understand the problem: Why a new scheduler? Understand reset events in credit1 and credit2 algorithms
More informationOS and HW Tuning Considerations!
Administração e Optimização de Bases de Dados 2012/2013 Hardware and OS Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID OS and HW Tuning Considerations OS " Threads Thread Switching Priorities " Virtual
More informationOASIS: Self-tuning Storage for Applications
OASIS: Self-tuning Storage for Applications Kostas Magoutis, Prasenjit Sarkar, Gauri Shah 14 th NASA Goddard- 23 rd IEEE Mass Storage Systems Technologies, College Park, MD, May 17, 2006 Outline Motivation
More informationSCALING A DISTRIBUTED SPATIAL CACHE OVERLAY. Alexander Gessler Simon Hanna Ashley Marie Smith
SCALING A DISTRIBUTED SPATIAL CACHE OVERLAY Alexander Gessler Simon Hanna Ashley Marie Smith MOTIVATION Location-based services utilize time and geographic behavior of user geotagging photos recommendations
More informationCSE 124: Networked Services Lecture-17
Fall 2010 CSE 124: Networked Services Lecture-17 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/30/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments
More informationChapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction
Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.
More informationCOSC3330 Computer Architecture Lecture 20. Virtual Memory
COSC3330 Computer Architecture Lecture 20. Virtual Memory Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston Virtual Memory Topics Reducing Cache Miss Penalty (#2) Use
More informationGLUSTER CAN DO THAT! Architecting and Performance Tuning Efficient Gluster Storage Pools
GLUSTER CAN DO THAT! Architecting and Performance Tuning Efficient Gluster Storage Pools Dustin Black Senior Architect, Software-Defined Storage @dustinlblack 2017-05-02 Ben Turner Principal Quality Engineer
More informationA Comparison of Two Distributed Systems: Amoeba & Sprite. By: Fred Douglis, John K. Ousterhout, M. Frans Kaashock, Andrew Tanenbaum Dec.
A Comparison of Two Distributed Systems: Amoeba & Sprite By: Fred Douglis, John K. Ousterhout, M. Frans Kaashock, Andrew Tanenbaum Dec. 1991 Introduction shift from time-sharing to multiple processors
More informationActive Storage using OSD. John A. Chandy Department of Electrical and Computer Engineering
Active Storage using OSD John A. Chandy Department of Electrical and Computer Engineering Active Disks We already have intelligence at the disk Block management Arm scheduling Can we use that intelligence
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationRUNTIME SUPPORT FOR ADAPTIVE SPATIAL PARTITIONING AND INTER-KERNEL COMMUNICATION ON GPUS
RUNTIME SUPPORT FOR ADAPTIVE SPATIAL PARTITIONING AND INTER-KERNEL COMMUNICATION ON GPUS Yash Ukidave, Perhaad Mistry, Charu Kalra, Dana Schaa and David Kaeli Department of Electrical and Computer Engineering
More informationBigtable. Presenter: Yijun Hou, Yixiao Peng
Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. OSDI 06 Presenter: Yijun Hou, Yixiao Peng
More information1 of 8 14/12/2013 11:51 Tuning long-running processes Contents 1. Reduce the database size 2. Balancing the hardware resources 3. Specifying initial DB2 database settings 4. Specifying initial Oracle database
More informationI, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3.
5 Solutions Chapter 5 Solutions S-3 5.1 5.1.1 4 5.1.2 I, J 5.1.3 A[I][J] 5.1.4 3596 8 800/4 2 8 8/4 8000/4 5.1.5 I, J 5.1.6 A(J, I) 5.2 5.2.1 Word Address Binary Address Tag Index Hit/Miss 5.2.2 3 0000
More informationDalí: A Periodically Persistent Hash Map
Dalí: A Periodically Persistent Hash Map Faisal Nawab* 1, Joseph Izraelevitz* 2, Terence Kelly*, Charles B. Morrey III*, Dhruva R. Chakrabarti*, and Michael L. Scott 2 1 Department of Computer Science
More informationIOGP. an Incremental Online Graph Partitioning algorithm for distributed graph databases. Dong Dai*, Wei Zhang, Yong Chen
IOGP an Incremental Online Graph Partitioning algorithm for distributed graph databases Dong Dai*, Wei Zhang, Yong Chen Workflow of The Presentation A Use Case IOGP Details Evaluation Setup OLTP vs. OLAP
More informationE-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems
E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, Michael
More informationToday: Segmentation. Last Class: Paging. Costs of Using The TLB. The Translation Look-aside Buffer (TLB)
Last Class: Paging Process generates virtual addresses from 0 to Max. OS divides the process onto pages; manages a page table for every process; and manages the pages in memory Hardware maps from virtual
More informationDesign of Global Data Deduplication for A Scale-out Distributed Storage System
218 IEEE 38th International Conference on Distributed Computing Systems Design of Global Data Deduplication for A Scale-out Distributed Storage System Myoungwon Oh, Sejin Park, Jungyeon Yoon, Sangjae Kim,
More information!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced?
Chapter 10: Virtual Memory Questions? CSCI [4 6] 730 Operating Systems Virtual Memory!! What is virtual memory and when is it useful?!! What is demand paging?!! When should pages in memory be replaced?!!
More informationIntel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment
Intel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment Case Study Order Number: 334534-002US Ordering Information Contact your local Intel sales representative for ordering
More informationLecture 16. Today: Start looking into memory hierarchy Cache$! Yay!
Lecture 16 Today: Start looking into memory hierarchy Cache$! Yay! Note: There are no slides labeled Lecture 15. Nothing omitted, just that the numbering got out of sequence somewhere along the way. 1
More informationThe Google File System (GFS)
1 The Google File System (GFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur 2 Design constraints
More informationCSE 124: Networked Services Lecture-16
Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More information