Handling heterogeneous storage devices in clusters
Transcription
1 Handling heterogeneous storage devices in clusters
André Brinkmann, University of Paderborn
Toni Cortes, Barcelona Supercomputing Center
2 Randomized Data Placement Schemes
- Randomized Data Placement Schemes
  - Introduction
  - Randomization
  - Balls into bins
  - Randomized Data Placement Schemes
  - Distributed Hash Tables
  - Consistent Hashing and Share
  - Redundancy and Randomized Data Placement Schemes
  - Distributed Metadata Management
3 Introduction: Randomization
- Deterministic data placement schemes have long suffered from many drawbacks
  - Heterogeneity has been an issue
  - It has been costly to adapt to new storage systems
  - It is difficult to support storage-on-demand concepts
- Is there an alternative to deterministic schemes?
- Yes, randomization can help to overcome these drawbacks, but it introduces new challenges!
4 Balls into bins Games I
- Basic task of balls-into-bins games
  - Assign a set of m balls to n bins
- Motivation
  - Bins = hard disks
  - Balls = data items
  - L = maximum number of data items on any disk
  - Where should I place the next item?
5 Balls into bins Games II
- Basic results: assign n balls to n bins
  - For every ball, choose one bin independently and uniformly at random
  - The maximum load is sharply concentrated: $L = \Theta(\log n / \log\log n)$ w.h.p.,
    where w.h.p. abbreviates "with probability at least $1 - n^{-\alpha}$, for any fixed $\alpha \ge 1$"
6 Balls into bins Games III
- This sounds terrible:
  - The maximally loaded hard disk stores $\Theta(\log n / \log\log n)$ times more data than the average
  - This does not seem to be scalable, or does it?
- The model assumes that only very few data items are stored inside the environment, but each disk is able to store many objects
  - Let's assume that "many objects" means $m \ge n \log n$
  - Then it holds w.h.p. that $L = \Theta(m/n)$
  - See, e.g., M. Raab, A. Steger: Balls into Bins - A Simple and Tight Analysis
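The effect described on the two slides above can be checked empirically. The following is a minimal simulation sketch, not part of the original tutorial: the function names `max_load` and `relative_max_load` are hypothetical, and the bin for each ball is drawn with Python's standard pseudo-random generator.

```python
import random
from collections import Counter


def max_load(num_balls: int, num_bins: int, rng: random.Random) -> int:
    """Throw num_balls balls into num_bins bins, each bin chosen independently
    and uniformly at random, and return the load of the fullest bin."""
    loads = Counter(rng.randrange(num_bins) for _ in range(num_balls))
    return max(loads.values())


def relative_max_load(num_balls: int, num_bins: int, rng: random.Random) -> float:
    """Maximum load divided by the average load m/n."""
    return max_load(num_balls, num_bins, rng) / (num_balls / num_bins)
```

Calling `relative_max_load(n, n, rng)` and `relative_max_load(200 * n, n, rng)` for the same `n` shows the contrast: with as many balls as bins the fullest bin holds several times the average, while with many balls per bin the ratio moves close to 1.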
7 Distributed Hash Tables
- Randomization introduces some (well-known) challenges
- Key questions are:
  - How can we retrieve a stored data item?
  - How can we adapt to a changing number of disks?
  - How can we handle heterogeneity?
  - How can we support redundancy?
- Key tasks of Distributed Hash Tables (DHTs)
8 Consistent Hashing I
- Introduced in the context of Web caching
- Bins are mapped by a pseudo-random hash function h onto a ring of length 1
- Bins become responsible for their interval
- Balls are mapped by an additional hash function g onto the ring
- Each bin stores the balls in its interval
- See D. Karger, E. Lehman et al.: Consistent Hashing and Random Trees: Tools for Relieving Hot Spots on the World Wide Web
9 Consistent Hashing II
- The average load of each bin is $m/n$, but the deviation from the average can be high: the maximum arc length on the ring becomes $\Theta(\log n / n)$ w.h.p.
- Solution: each bin is mapped by a set of independent hash functions to multiple points on the ring
  - The maximum arc length assigned to a bin can be reduced to $(1+\varepsilon)/n$ for an arbitrarily small constant $\varepsilon > 0$, if $\Theta(\log n)$ virtual bins are used for each physical bin
  - See I. Stoica, R. Morris, et al.: Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications
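To make the effect of virtual bins concrete, here is a small sketch that measures how much of the ring each physical bin owns. It is an illustration under assumptions: SHA-256 stands in for the pseudo-random hash function, each point is assumed to own the arc that ends at it, and the names `ring_position`, `arc_share_per_bin` and `max_relative_share` are hypothetical.

```python
import hashlib


def ring_position(key: str) -> float:
    """Map a string to a pseudo-random point on the ring [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def arc_share_per_bin(num_bins: int, virtual_bins: int) -> dict:
    """Total ring length owned by each physical bin.

    Every physical bin is hashed to `virtual_bins` points; each point owns
    the arc from its predecessor on the ring up to itself (an assumption).
    """
    points = sorted(
        (ring_position(f"bin{b}#{v}"), b)
        for b in range(num_bins)
        for v in range(virtual_bins)
    )
    share = {b: 0.0 for b in range(num_bins)}
    previous = points[-1][0] - 1.0  # predecessor of the first point, wrapped around
    for position, b in points:
        share[b] += position - previous
        previous = position
    return share


def max_relative_share(num_bins: int, virtual_bins: int) -> float:
    """Largest share divided by the fair share 1/n (1.0 would be perfect)."""
    return max(arc_share_per_bin(num_bins, virtual_bins).values()) * num_bins
```

Comparing `max_relative_share(64, 1)` with `max_relative_share(64, 256)` shows the imbalance shrinking towards 1 as more virtual bins are used per physical bin.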
10 Join and Leave Operations I
- In a dynamic network, nodes can join and leave at any time
- The main goal of a DHT is to be able to locate every key in the network at (nearly) any time
- (Planned) removal of bins changes the length of their neighbors' intervals
  - Data has to be moved to the neighbor
- Insertion of bins changes the interval length of their new neighbors
11 Join and Leave Operations II
- Definition of a view V: a view V is the set of bins of which a particular client is aware.
- Monotonicity: a ranged hash function f is monotone if, for all views $V_1 \subseteq V_2$, $f_{V_2}(b) \in V_1$ implies $f_{V_1}(b) = f_{V_2}(b)$
- Monotonicity implies that in case of a join operation of a bin i, all moved data items have destination i
- Consistent Hashing has the property of monotonicity
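The monotonicity property can be checked on a small consistent-hashing ring. The sketch below is illustrative only: SHA-256 replaces the hash functions h and g, a ball is assumed to belong to the first bin point at or after its own position (wrapping around), and `lookup` and `is_monotone_on` are hypothetical helper names.

```python
import bisect
import hashlib


def ring_position(key: str) -> float:
    """Map a string to a pseudo-random point on the ring [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def lookup(ball: str, view: list) -> str:
    """f_V(ball): the bin of the view whose point is the first one at or
    after the ball's position, wrapping around the end of the ring."""
    points = sorted((ring_position(f"bin:{b}"), b) for b in view)
    index = bisect.bisect_left(points, (ring_position(f"ball:{ball}"), ""))
    return points[index % len(points)][1]


def is_monotone_on(balls: list, small_view: list, large_view: list) -> bool:
    """Check that f_large(b) in small_view implies f_small(b) == f_large(b)."""
    assert set(small_view) <= set(large_view)
    for ball in balls:
        target = lookup(ball, large_view)
        if target in small_view and lookup(ball, small_view) != target:
            return False
    return True
```

With `small_view` holding eight disks and `large_view` holding the same disks plus one new disk, every ball whose location changes between the two views ends up on the new disk, which is the join behavior stated on the slide.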
12 Heterogeneous Bins
- Consistent Hashing is (nearly) optimally suited for homogeneous environments, where all bins (disks) have the same capacity and performance
- Heterogeneous bins can be mapped to Consistent Hashing by using a different number of virtual bins for each physical bin
- The relation between the numbers of virtual bins of the different bins changes constantly
- Monotonicity (and some other properties) cannot be maintained
13 Share Strategy I
(Figure: interval of length l(c_d) starting at g(d) on the line from 0 to 1.)
- The Share strategy tries to map the heterogeneous problem to a homogeneous solution
- Each bin d is assigned by a hash function g to a start point g(d) inside the [0,1)-interval
- The length l of the interval is proportional to the capacity c_i (performance, or other metric) of bin i
- See A. Brinkmann, K. Salzwedel, C. Scheideler: Compact, adaptive placement schemes for non-uniform distribution requirements
14 Share Strategy II
(Figure: data item x mapped to the point h(x) on the line starting at 0.)
- How to retrieve the location of a data item x inside this heterogeneous setting?
- Use a hash function h to map x to the [0,1)-interval
- Use a DHT for homogeneous bins to retrieve the location of x from all intervals cutting h(x)
15 Share Strategy III
- Properties:
  - (Arbitrarily close to) optimal distribution of balls over bins
  - Computational complexity in O(1)
  - Competitive ratio concerning join and leave is (1+ε) for every ε > 0
- But:
  - Share has been optimized for usage in data center environments
  - Share is not monotone and only partially suited for P2P networks
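The two-step lookup of the Share slides can be sketched as follows. This is a simplified illustration and not the authors' implementation: the `stretch` factor that scales interval lengths is an assumed parameter, intervals are capped at the full ring, a fallback to all bins is used for uncovered points, and highest-random-weight hashing is swapped in for the homogeneous DHT step. All function names are hypothetical.

```python
import hashlib


def unit_hash(key: str) -> float:
    """Map a string to a pseudo-random point in [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def covers(start: float, length: float, point: float) -> bool:
    """True if the interval [start, start + length), wrapped around the
    [0,1)-interval, contains the point."""
    return (point - start) % 1.0 < length


def share_lookup(item: str, capacities: dict, stretch: float) -> str:
    """Pick a bin for the item.

    Step 1: every bin d owns an interval starting at g(d) whose length is
    proportional to its relative capacity (scaled by the assumed stretch).
    Step 2: among the bins whose interval cuts h(item), choose one with a
    uniform strategy (here: highest-random-weight hashing, an assumption).
    """
    total = sum(capacities.values())
    point = unit_hash(f"item:{item}")
    candidates = [
        d for d, c in capacities.items()
        if covers(unit_hash(f"bin:{d}"), min(1.0, stretch * c / total), point)
    ]
    if not candidates:  # uncovered point; rare when the stretch is large enough
        candidates = list(capacities)
    return max(candidates, key=lambda d: unit_hash(f"{item}@{d}"))
```

For example, `share_lookup("block-17", {"a": 100, "b": 100, "c": 80, "d": 80, "e": 60}, stretch=2.0)` returns one of the five bin names, and repeated calls with the same arguments return the same bin. Larger bins cover more of the interval and therefore receive more items, although this sketch makes no claim about reaching the fairness bounds stated on the slide.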
16 V:Drive
(Figure: SAN and MDA.)
- V:Drive out-of-band virtualization environment
  - Each (Linux) server includes an additional block-level driver module
  - A metadata appliance ensures a consistent view on storage and servers
  - The Share strategy is used as the data distribution strategy
- See A. Brinkmann, S. Effert, et al.: Influence of Adaptive Data Layouts on Performance in dynamically changing Storage Environments
17 Performance V:Drive - Static
(Figure: synthetic random I/O benchmark, static configuration. Panels: throughput (MB/s) and average latency (ms); x-axis: physical volumes; series: VDrive and LVM.)
18 Performance V:Drive - Dynamic
(Figure: synthetic random I/O benchmark, dynamic configuration. Panels: throughput (MB/s) and average latency (ms); x-axis: physical volumes; series: VDrive and LVM.)
19 V:Drive - Reconfiguration Overhead
20 Randomization and Redundancy
- Randomized data distribution schemes do not include mechanisms to protect data against disk failures
- Question: how can randomization and RAID schemes be used together?
- Assumption:
  - n copies of a data block have to be distributed over n disks
  - No two copies of a data block are allowed to be stored on the same disk
21 Trivial Solutions
- Trivial Solution I:
  - Divide the storage systems into n storage pools
  - Distribute the first copies over the first pool, ..., the n-th copies over the n-th pool
  - → Missing flexibility
- Trivial Solution II:
  - The first copy is distributed over all disks
  - The second copy is distributed over all but the previously chosen disk, ...
  - → Not able to use capacity efficiently
(Figure: placement probabilities p for the first copy and the second copy.)
22 Observation
- Trivial Solution II is not able to use capacity efficiently, because big storage systems will be penalized compared to smaller devices
- Theorem: Assume a trivial replication strategy that has to distribute k copies of m balls over n > k bins. Furthermore, the biggest bin has a capacity c_max that is at least (1 + ε) c_j of the next biggest bin j. In this case, the expected load of the biggest bin will be smaller than the expected load required for an optimal capacity efficiency.
- See A. Brinkmann, S. Effert, et al.: Dynamic and Redundant Data Placement
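The penalty for the biggest bin can be computed exactly for mirroring (k = 2). The sketch below assumes that Trivial Solution II draws the first copy proportionally to capacity and the second copy proportionally to capacity among the remaining bins; the function names are hypothetical and capacities are expected as integers.

```python
from fractions import Fraction


def trivial_mirror_load(capacities: list) -> list:
    """Expected number of copies per ball that each bin receives under
    Trivial Solution II with k = 2 (assumed capacity-proportional draws)."""
    total = sum(capacities)
    rel = [Fraction(c, total) for c in capacities]
    loads = []
    for i, ci in enumerate(rel):
        # Bin i gets the second copy if some other bin j took the first copy
        # and bin i is then drawn among the remaining bins.
        second = sum(cj * ci / (1 - cj) for j, cj in enumerate(rel) if j != i)
        loads.append(ci + second)
    return loads


def fair_mirror_load(capacities: list) -> list:
    """Expected copies per ball each bin must hold so that all bins fill up
    at the same rate: twice the relative capacity for k = 2."""
    total = sum(capacities)
    return [Fraction(2 * c, total) for c in capacities]
```

For capacities `[2, 1, 1]` the biggest bin would have to hold one copy of every ball to use its capacity fully, but `trivial_mirror_load` gives it only 5/6 of a copy per ball in expectation, which is the kind of shortfall the theorem describes.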
23 Idea
- The algorithm has to ensure that bigger bins get data items according to their capacities
- This can be ensured by an algorithm that iterates over a sorted list of bins
  1. At each iteration, the algorithm randomly decides whether or not to place the ball
  2. If one of the k copies of a ball has been placed, use the optimal strategy for (k-1) copies with the remaining bins as input
- Challenge: how to make the random decision in step 1 of each iteration
24 Example for Mirroring (k=2)
(Figure: five disks with capacities 100 GB, 100 GB, 80 GB, 80 GB, 60 GB.)
- $c_i$ denotes the relative capacity of disk i to all disks
- $\hat{c}_i$ denotes the relative capacity of disk i to all disks starting with index i
- $\check{c}_i = 2 \hat{c}_i$ is the weight for the random decision!
25 Example for Mirroring (k=2)
(Figure: five disks with capacities 100 GB, 100 GB, 80 GB, 80 GB, 60 GB.)
- If, e.g., disk 2 is chosen as the first copy of a mirror, just distribute the second copy according to Share over disks 3, 4, and 5
- Some adaptation is necessary if disk 3 is chosen, because the weight of disk 4 is greater than 1
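The weights of the mirroring example can be recomputed directly. In this sketch the weight is taken to be twice the relative capacity of a disk among the disks starting at its index, which is consistent with the remark that the weight of disk 4 exceeds 1; the function name `mirror_weights` is hypothetical, and the adaptation needed for weights above 1 is not modeled.

```python
from fractions import Fraction


def mirror_weights(capacities: list) -> list:
    """For disks sorted by descending capacity, return one tuple per disk:
    (relative capacity to all disks,
     relative capacity to all disks starting with this index,
     weight for the random decision, assumed to be twice the latter)."""
    total = sum(capacities)
    remaining = total
    rows = []
    for capacity in capacities:
        c_rel = Fraction(capacity, total)
        c_hat = Fraction(capacity, remaining)
        rows.append((c_rel, c_hat, 2 * c_hat))
        remaining -= capacity
    return rows
```

For `[100, 100, 80, 80, 60]` the weights are 10/21, 5/8, 8/11, 8/7 and 2. The first disk is chosen as first copy with probability 10/21, which equals twice its relative capacity and hence its fair share of all copies; the weights of disks 4 and 5 exceed 1, which is where the adaptation mentioned above comes in.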
26 Observations
(Figure: five disks with capacities 100 GB, 100 GB, 80 GB, 80 GB, 60 GB.)
- The strategy can easily be extended to arbitrary k
- Data distribution is optimal
- Redistribution of data in a dynamic environment is $k^2$-competitive
- Computational complexity can be reduced to O(k)
27 Fairness of k-fold Replication
28 Adaptivity of k-fold Replication
29 Metadata Management
- Assignment of data items to disks can be solved efficiently for random data distribution schemes
  - Very good distribution of data and requests
  - Low computational complexity
  - Adaptivity to new infrastructures: optimal without redundancy, OK with redundancy
  - Over-provisioning can be efficiently integrated
- ... but how to find the position of a data item on the disks?
  - Equal to the dictionary problem
  - Requires O(n) entries to find the location of n objects!
  - Defines the bulk of the metadata
30 Dictionary Problem
Extent Size vs. Volume Size

Volume size | 4 KB   | 16 KB  | 256 KB | 4 MB   | 16 MB  | 256 MB   | 1 GB
1 GB        | 8 MB   | 2 MB   | 128 KB | 8 KB   | 2 KB   | 128 Byte | 32 Byte
64 GB       | 512 MB | 128 MB | 8 MB   | 512 KB | 128 KB | 8 KB     | 2 KB
1 TB        | 8 GB   | 2 GB   | 128 MB | 8 MB   | 2 MB   | 128 KB   | 32 KB
64 TB       | 512 GB | 128 GB | 8 GB   | 512 MB | 128 MB | 8 MB     | 2 MB
1 PB        | 8 TB   | 2 TB   | 128 GB | 8 GB   | 2 GB   | 128 MB   | 32 MB

- Extent: smallest contiguous unit that can be addressed by the virtualization solution
- The dictionary easily becomes too big to be stored inside each server system for small extent sizes
- Solutions
  - Caching
  - Huge extent sizes
  - Object-based storage systems
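The sizes in the table follow from simple arithmetic: one dictionary entry per extent. The sketch below assumes 32 bytes per entry; that value is not stated on the slide but is inferred from the table (1 GB / 4 KB extents × 32 bytes = 8 MB). The function name is hypothetical.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB
PB = 1024 * TB


def dictionary_size_bytes(volume_bytes: int, extent_bytes: int, entry_bytes: int = 32) -> int:
    """Size of a dictionary with one entry per extent of the volume.
    The default of 32 bytes per entry is inferred from the table above."""
    return (volume_bytes // extent_bytes) * entry_bytes
```

For instance, `dictionary_size_bytes(64 * TB, 256 * KB)` yields `8 * GB`, matching the corresponding table cell, while the same volume with 1 GB extents needs only 2 MB.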
31 Summary and Conclusions
- Introduction to Disk Arrays
- Why Heterogeneity?
- Deterministic Data Placement Schemes
- Randomized Data Placement Schemes
- Summary and Conclusions
32 Summary
- Problem to be solved: scalable storage systems supporting heterogeneous devices
- Two solutions developed concurrently
  - Deterministic
    - Modify RAID technology, keeping its flavor
  - Non-deterministic
    - Distribute data blocks by using randomization
    - RAID encoding on top of the randomization process
33 Conclusions
- Advantages of each version
  - Deterministic
    - Easy metadata management
    - Easy recovery
  - Non-deterministic
    - Good support for storage-on-demand concepts
    - Lower probability of reaching a degraded state?
- Both approaches are complementary concerning the advantages, but have many similarities
  - A zone is very similar to a group of extents
  - Not fully described in the tutorial
- Next step: work on a mixed version
34 Bibliography I
- A. Brinkmann, S. Effert, F. Meyer auf der Heide, C. Scheideler: Dynamic and Redundant Data Placement. In Proceedings of the 27th IEEE International Conference on Distributed Computing Systems (ICDCS), 2007
- A. Brinkmann, S. Effert, M. Heidebuer, M. Vodisek: Influence of Adaptive Data Layouts on Performance in dynamically changing Storage Environments. In Proceedings of the 14th Euromicro Conference on Parallel, Distributed and Network based Processing, 2006
- A. Brinkmann, K. Salzwedel, C. Scheideler: Compact, adaptive placement schemes for non-uniform distribution requirements. In Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 2002
- T. Cortes, J. Labarta: Taking Advantage of Heterogeneity in Disk Arrays. Journal on Parallel and Distributed Computing (JPDC), Volume 63, number 4, April 2003
- J. L. Gonzalez, T. Cortes: An Adaptive Data Block Placement based on Deterministic Zones (AdaptiveZ). International Conference on Grid computing, high-performance and Distributed Applications (GADA'07), Vilamoura, Algarve, Portugal, Nov 29-30, 2007
35 Bibliography II
- J. L. Gonzalez, T. Cortes: Evaluating the Effects of Upgrading Heterogeneous Disk Arrays. International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS 2006), Calgary, Canada, July 31 - August 2, 2006
- M. Holland, G. A. Gibson: Parity declustering for continuous operation in redundant disk arrays. In Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, Boston, Massachusetts, 1992
- D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, R. Panigrahy: Consistent Hashing and Random Trees: Tools for Relieving Hot Spots on the World Wide Web. In Proceedings of Symposium on Theory of Computing (STOC)
- P. Lyman, H. R. Varian: How much information 2003? School of Information Management and Systems, University of California at Berkeley
- D. A. Patterson, G. A. Gibson, R. H. Katz: A Case for Redundant Arrays of Inexpensive Disks (RAID). In Proceedings of the International Conference on Management of Data (SIGMOD), 1988
36 Bibliography III
- M. Raab, A. Steger: Balls into Bins - A Simple and Tight Analysis. In Proceedings of the 2nd Workshop on Randomization and Approximation Techniques in Computer Science (RANDOM'98), 1998
- I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balakrishnan: Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. In Proceedings of the 2001 ACM SIGCOMM Conference, 2001
- R. Yellin: The data storage evolution. Has disk capacity outgrown its usefulness? Teradata Magazine, 2006
More informationOp#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD
Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD Riyaz Haque and David F. Richards This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore
More informationM 2 R: Enabling Stronger Privacy in MapReduce Computa;on
M 2 R: Enabling Stronger Privacy in MapReduce Computa;on Anh Dinh, Prateek Saxena, Ee- Chien Chang, Beng Chin Ooi, Chunwang Zhang School of Compu,ng Na,onal University of Singapore 1. Mo;va;on Distributed
More informationSubway : Peer-To-Peer Clustering of Clients for Web Proxy
Subway : Peer-To-Peer Clustering of Clients for Web Proxy Kyungbaek Kim and Daeyeon Park Department of Electrical Engineering & Computer Science, Division of Electrical Engineering, Korea Advanced Institute
More informationCombinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons
Combinatorial Mathema/cs and Algorithms at Exascale: Challenges and Promising Direc/ons Assefaw Gebremedhin Purdue University (Star/ng August 2014, Washington State University School of Electrical Engineering
More informationCh06. NoSQL Part III.
Ch06. NoSQL Part III. Joonho Kwon Data Science Laboratory, PNU 2017. Spring Adapted from Dr.-Ing. Sebastian Michel s slides Recap: Configurations R/W Configuration Kind ofconsistency W=N and R=1 Read optimized
More informationConsistency Rationing in the Cloud: Pay only when it matters
Consistency Rationing in the Cloud: Pay only when it matters By Sandeepkrishnan Some of the slides in this presenta4on have been taken from h7p://www.cse.iitb.ac.in./dbms/cs632/ra4oning.ppt 1 Introduc4on:
More informationDegree Optimal Deterministic Routing for P2P Systems
Degree Optimal Deterministic Routing for P2P Systems Gennaro Cordasco Luisa Gargano Mikael Hammar Vittorio Scarano Abstract We propose routing schemes that optimize the average number of hops for lookup
More informationWeb- Scale Mul,media: Op,mizing LSH. Malcolm Slaney Yury Li<shits Junfeng He Y! Research
Web- Scale Mul,media: Op,mizing LSH Malcolm Slaney Yury Li
More informationA Directed-multicast Routing Approach with Path Replication in Content Addressable Network
2010 Second International Conference on Communication Software and Networks A Directed-multicast Routing Approach with Path Replication in Content Addressable Network Wenbo Shen, Weizhe Zhang, Hongli Zhang,
More informationPerformance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis
Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Elif Dede, Madhusudhan Govindaraju Lavanya Ramakrishnan, Dan Gunter, Shane Canon Department of Computer Science, Binghamton
More informationHP AutoRAID (Lecture 5, cs262a)
HP AutoRAID (Lecture 5, cs262a) Ion Stoica, UC Berkeley September 13, 2016 (based on presentation from John Kubiatowicz, UC Berkeley) Array Reliability Reliability of N disks = Reliability of 1 Disk N
More informationThere is a tempta7on to say it is really used, it must be good
Notes from reviews Dynamo Evalua7on doesn t cover all design goals (e.g. incremental scalability, heterogeneity) Is it research? Complexity? How general? Dynamo Mo7va7on Normal database not the right fit
More informationBroadcas(ng Video in Dense g Networks Using Applica(on FEC and Mul(cast
Broadcas(ng Video in Dense 802.11g Networks Using Applica(on FEC and Mul(cast Last update: 6-10-2011 Dr James Martin School of Computing Clemson University Clemson, SC jim.martin@cs.clemson.edu Dr James
More informationSta$c Analysis Dataflow Analysis
Sta$c Analysis Dataflow Analysis Roadmap Overview. Four Analysis Examples. Analysis Framework Soot. Theore>cal Abstrac>on of Dataflow Analysis. Inter- procedure Analysis. Taint Analysis. Overview Sta>c
More informationSimple Determination of Stabilization Bounds for Overlay Networks. are now smaller, faster, and near-omnipresent. Computer ownership has gone from one
Simple Determination of Stabilization Bounds for Overlay Networks A. Introduction The landscape of computing has changed dramatically in the past half-century. Computers are now smaller, faster, and near-omnipresent.
More informationChord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan Presented by Veranika Liaukevich Jacobs University
More informationDYNAMIC TREE-LIKE STRUCTURES IN P2P-NETWORKS
DYNAMIC TREE-LIKE STRUCTURES IN P2P-NETWORKS Herwig Unger Markus Wulff Department of Computer Science University of Rostock D-1851 Rostock, Germany {hunger,mwulff}@informatik.uni-rostock.de KEYWORDS P2P,
More informationMul$media Networking. #9 CDN Solu$ons Semester Ganjil 2012 PTIIK Universitas Brawijaya
Mul$media Networking #9 CDN Solu$ons Semester Ganjil 2012 PTIIK Universitas Brawijaya Schedule of Class Mee$ng 1. Introduc$on 2. Applica$ons of MN 3. Requirements of MN 4. Coding and Compression 5. RTP
More informationChapter 10: Mass-Storage Systems
Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Chapter 10: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space
More informationCluster-Level Google How we use Colossus to improve storage efficiency
Cluster-Level Storage @ Google How we use Colossus to improve storage efficiency Denis Serenyi Senior Staff Software Engineer dserenyi@google.com November 13, 2017 Keynote at the 2nd Joint International
More informationVirtual Allocation: A Scheme for Flexible Storage Allocation
Virtual Allocation: A Scheme for Flexible Storage Allocation Sukwoo Kang, and A. L. Narasimha Reddy Dept. of Electrical Engineering Texas A & M University College Station, Texas, 77843 fswkang, reddyg@ee.tamu.edu
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationA Super-Peer Based Lookup in Structured Peer-to-Peer Systems
A Super-Peer Based Lookup in Structured Peer-to-Peer Systems Yingwu Zhu Honghao Wang Yiming Hu ECECS Department ECECS Department ECECS Department University of Cincinnati University of Cincinnati University
More informationClick to edit Master title
Click to edit Master title DIMM: A Distributed Metadata Management for Data-Intensive HPC Brandon Szeliga, John Cavicchio and Weisong Shi Wayne State University bszeliga@wayne.edu 1 Click Roadmap to edit
More informationChapter 6 PEER-TO-PEER COMPUTING
Chapter 6 PEER-TO-PEER COMPUTING Distributed Computing Group Computer Networks Winter 23 / 24 Overview What is Peer-to-Peer? Dictionary Distributed Hashing Search Join & Leave Other systems Case study:
More informationCLOUD COMPUTING IT0530. G.JEYA BHARATHI Asst.Prof.(O.G) Department of IT SRM University
CLOUD COMPUTING IT0530 G.JEYA BHARATHI Asst.Prof.(O.G) Department of IT SRM University What is virtualization? Virtualization is way to run multiple operating systems and user applications on the same
More informationChapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition
Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Chapter 10: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space
More information