1 DIMM: A Distributed Metadata Management for Data-Intensive HPC
Brandon Szeliga, John Cavicchio and Weisong Shi
Wayne State University
bszeliga@wayne.edu

2 Roadmap
- Motivation
- DIMM
- DHT
- Bloomfilter
- System Walk Through
- Evaluation of DIMM
- Related Work
- Conclusion

3 Motivation
- Amount of data being stored continually growing
  - Soon to reach levels never seen before
  - Petabyte levels
  - E.g., physics, bioinformatics, etc.
- Data needs to be migrated from storage to computational nodes
- We envision this

4 Motivation
- Along with the increase in data stores comes an increase in the metadata associated with migrated files
- Two challenges
  - Maintaining the migrated file information
  - Frequent updating
- Our Solution: DiSK: A Distributed Shared Disk Cache System
  - DIMM is the key component of DiSK

5 DIMM
- DIstributed Metadata Management
- Goal: Reduce the number of migrations from archival storage and minimize metadata for a centralized scheduler
- DIMM uses two key concepts:
  - Distributed Hash Table (DHT)
  - Bloomfilter
- DIMM is used for read-only data and does not guarantee that data is persistent within it

6 Distributed Hash Table Overview
- A distributed hash table:
  - Organizes nodes into a ring
  - Inserts/retrieves items based on a key
  - E.g., Chord [Stoica et al. 2001]

7 Distributed Hash Table
- By using a key related to the name of a file, a home location for each file can be determined
- Every node can determine where a file's home is, but not whether the file is actually there
- This allows every node to retrieve a file if it is stored on its home node
- To differentiate between data stored on its home node and data stored elsewhere, storage is divided into a home cache (H) and a local cache (C)
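
The home-location idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not the DIMM implementation: the SHA-1 hash, the `Ring` and `Node` classes, and the node and file names are all hypothetical. Only the ring layout, the file-name key, and the split into a home cache (H) and a local cache (C) come from the slides.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the ring (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class Node:
    def __init__(self, name: str):
        self.name = name
        self.home_cache = set()   # H: files whose home is this node
        self.local_cache = set()  # C: files held here although their home is elsewhere


class Ring:
    def __init__(self, node_names):
        self.nodes = {n: Node(n) for n in node_names}
        # Sorted (position, node name) pairs form the ring.
        self._points = sorted((ring_position(n), n) for n in node_names)
        self._keys = [p for p, _ in self._points]

    def home_of(self, filename: str) -> Node:
        """Return the first node at or after the file's position, wrapping around."""
        i = bisect.bisect_left(self._keys, ring_position(filename))
        return self.nodes[self._points[i % len(self._points)][1]]

    def store(self, filename: str, at: str) -> None:
        """Place a file on node `at`, in H if that node is its home, else in C."""
        node = self.nodes[at]
        if self.home_of(filename) is node:
            node.home_cache.add(filename)
        else:
            node.local_cache.add(filename)

    def lookup(self, filename: str) -> bool:
        """Any node can compute the home, but must ask it whether the file is there."""
        return filename in self.home_of(filename).home_cache
```

In this sketch `lookup` only consults the home cache, which mirrors the point that a file is retrievable by every node only when it is stored on its home node.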

8 Bloomfilter
- Why Bloomfilter?
  - Quick checking
  - Easy insertion
  - Small storage requirement
- Why not Bloomfilter?
  - Needs to be larger than the set
  - Tendency to contain false positives
  - Cannot delete

9 Bloomfilter Overview
- A Bloomfilter is an array of bits that is k times larger than the set of n items it represents
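
A minimal Python sketch of such a filter may help. It is an illustration, not the authors' code: the `BloomFilter` class, the SHA-256-based hash positions, and the default values are assumptions. The sizing rule (k times the n items) is taken from this slide and the choice of 4 hash functions from the evaluation slides.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_items: int, k: int = 10, num_hashes: int = 4):
        self.size = k * n_items           # array is k times larger than the set
        self.num_hashes = num_hashes
        self.bits = bytearray(self.size)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive num_hashes positions by salting one hash (an assumed scheme).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))
```

Membership tests never miss an inserted item, but bits set by other items can make an absent item look present, and clearing bits to delete an item would break other items that share them.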

10 Counter-Based Bloomfilter
- Uses an array of integers instead of bits
- By using a counter-based Bloomfilter, a centralized manager can monitor the data available in the DHT
- This allows data to be removed from the Bloomfilter without introducing false negatives
- However, false positives are still a problem with this Bloomfilter
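
The counter-based variant can be sketched as follows. Again this is a hedged illustration: the `CountingBloomFilter` class and its hashing scheme are assumptions, and the guard in `remove` is a design choice of this sketch rather than something the slides describe.

```python
import hashlib
from collections import Counter


class CountingBloomFilter:
    def __init__(self, size: int, num_hashes: int = 4):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size  # integers instead of bits

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item: str) -> None:
        """Undo one earlier add(); refuse if the counters cannot hold the item."""
        needed = Counter(self._positions(item))
        if any(self.counters[p] < c for p, c in needed.items()):
            raise KeyError(item)
        for p, c in needed.items():
            self.counters[p] -= c

    def __contains__(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(item))
```

Because each insertion increments its counters and each removal decrements the same ones, removing an item leaves the counts contributed by other items intact, so those items are still reported as present.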

11 Locality-Check Bloomfilter
- In order to reduce false positives, a locality check is introduced into the Bloomfilter
- For every file, a set of its neighboring files is checked as well
- Neighboring files can be defined alphanumerically or chronologically
- Using the existence of these neighboring files, a probability of existence of the original file is given by:
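
The formula from this slide does not survive in the transcript, so the sketch below does not reproduce it. As a stand-in, and purely as an assumption, it scores a file by the fraction of its neighbors that the filter reports as present and accepts the hit only when that fraction reaches a threshold. The function names and the `distance` and `threshold` parameters are hypothetical. The `contains` argument is any membership test, such as a Bloomfilter lookup.

```python
from typing import Callable, Sequence


def neighbor_fraction(index: int, ordered_files: Sequence[str],
                      contains: Callable[[str], bool], distance: int) -> float:
    """Fraction of files within `distance` positions of `index` reported present."""
    lo = max(0, index - distance)
    hi = min(len(ordered_files), index + distance + 1)
    neighbors = [ordered_files[i] for i in range(lo, hi) if i != index]
    if not neighbors:
        return 0.0
    return sum(1 for f in neighbors if contains(f)) / len(neighbors)


def locality_check(index: int, ordered_files: Sequence[str],
                   contains: Callable[[str], bool],
                   distance: int, threshold: float) -> bool:
    """Accept a filter hit only if enough neighboring files are also reported."""
    if not contains(ordered_files[index]):
        return False
    return neighbor_fraction(index, ordered_files, contains, distance) >= threshold
```

Here `ordered_files` stands for the alphanumeric or chronological ordering mentioned above. A file that really is stored but whose neighbors are absent would be rejected by this rule, which is how a check of this kind trades false positives for false negatives.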

12 Standard System vs. DIMM System

13 System Walk Through

14 DIMM Evaluation
- Simulation used for monitoring the impact of DHT and Bloomfilter
- Evaluate:
  - Impact in job scheduling
    - Local hits and migrations from archive
  - Database size vs. Bloomfilter size
  - Impact of the locality check in the Bloomfilter
    - False negatives and false positives

15 Simulation Setup
- 400 nodes, each capable of holding 2,500 files
  - 250 GB nodes with 100 MB files
- Trace file generates the number of input files from a normal distribution (mean 500, standard deviation 22)
- Actual files are drawn from a uniform distribution over 100,000 files
- Jobs (collections of the input files) are scheduled based on SWAP (Storage-aware App. Scheduling)
  - This attempts to maximize the number of file hits
  - This scheduling policy is a separate topic of ours
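
The setup numbers can be checked and a job trace generated with a few lines of Python. This is a sketch under assumptions, not the authors' simulator: the constant and function names are hypothetical, decimal units (1 GB = 1,000 MB) are assumed so that 250 GB / 100 MB gives the stated 2,500 files, and the slides do not say whether the files within one job are distinct, so repeats are allowed here.

```python
import random

NUM_NODES = 400
NODE_CAPACITY_GB = 250
FILE_SIZE_MB = 100
TOTAL_FILES = 100_000

# 250 GB / 100 MB = 2,500 files per node (decimal units assumed).
FILES_PER_NODE = NODE_CAPACITY_GB * 1000 // FILE_SIZE_MB


def generate_job(rng: random.Random) -> list:
    """One job: a normally distributed number of uniformly chosen file ids."""
    count = max(1, round(rng.gauss(500, 22)))
    return [rng.randrange(TOTAL_FILES) for _ in range(count)]


def generate_trace(num_jobs: int, seed: int = 0) -> list:
    """A reproducible list of jobs; the seed is an arbitrary choice."""
    rng = random.Random(seed)
    return [generate_job(rng) for _ in range(num_jobs)]
```

With these figures the cluster can hold 400 x 2,500 = 1,000,000 file copies in total, ten times the 100,000 distinct files in the simulation.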

16 Scheme Comparisons
- SWAP
  - All storage is considered cache space
- DIMM_h
  - This is DIMM where only the home location is used to hold files
- DIMM_hr
  - This is DIMM with both the home and the local cache used to hold files
- JobMig
  - This is DIMM, but with the ability to migrate jobs to the location of the data

17 Impact in Job Scheduling: Local Hits
- DIMM_hr performs similarly to SWAP until limiting size
- DIMM_h underperforms SWAP due to restrictions on size
- DIMM_hr is comparable to SWAP with a large cache, but suffers when caches are larger

18 Impact in Job Scheduling
- SWAP's cache scheme suffers from often needing to go to the archive for data
- Also, as the home cache increases we can match the migrations required of
- DIMM has fewer migrations than SWAP with a large cache, and is capable of matching JobMig

19 Database Size vs. Bloomfilter Size
- Problem with Bloomfilter is that the array needs to be larger than the number of items
- Problem with databases is that the per-item entry is large
- Next slide we compare Bloomfilter and database:
  - Bloomfilter where each element is a byte
  - Database where each item is 4 bytes for location information and lg(n) bytes for file differentiation
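
The two cost models above amount to simple arithmetic, sketched here in Python. The function names are hypothetical, and the reading of n as the number of distinct files that must be told apart is an assumption, since the slide does not define it. The sketch only evaluates the stated per-element costs and makes no claim about where the curves on the next slide cross.

```python
import math


def bloomfilter_bytes(total_files: int, factor: int) -> int:
    """Array of factor * total_files elements at one byte per element."""
    return factor * total_files


def database_bytes(items: int, n: int) -> float:
    """Each stored item costs 4 bytes of location plus lg(n) bytes of file id."""
    return items * (4 + math.log2(n))
```

For example, a filter sized at 5x a population of 100,000 files takes 500,000 bytes under this model regardless of how many files are currently stored, while the database cost grows with the number of stored items.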

20 Database Size vs. Bloomfilter Size
- Bloomfilters with 5x and 10x the total number of files have a savings on space before 500 files in database
- Counter-based Bloomfilter is a space-efficient alternative to a database

21 Impact of Locality Check in Bloomfilter
- Created Bloomfilters of various sizes (10^7, 10^6, 500x10^3, 250x10^3) with 4 hash functions
- 100x10^3 files selected from a normal distribution with various variances (250, 10^3, 5x10^3, 10x10^3, 25x10^3, 50x10^3, 75x10^3, 100x10^3, 125x10^3, 250x10^3, 375x10^3, 500x10^3)
  - Changing this parameter changes the number of different files inserted

22 Number of False Positives
- Increase of variance increases the number of files inserted and increases the number of false positives
- Smaller Bloomfilter has more false positives, and increasing variance leads to more files, which increases false positives as well
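
Both trends match the standard textbook approximation for a Bloomfilter's false-positive rate, (1 - e^(-h*n/m))^h for h hash functions, n distinct inserted items, and m array elements. The Python sketch below evaluates that approximation. It is not the measured data behind this slide: the function name is hypothetical and the inserted count of 50,000 in the example is an arbitrary assumed value. The filter sizes and the 4 hash functions are the ones listed in the experiment setup.

```python
import math

# Filter sizes from the experiment setup: 10^7, 10^6, 500x10^3, 250x10^3.
SIZES = [10**7, 10**6, 500 * 10**3, 250 * 10**3]


def expected_false_positive_rate(size: int, inserted: int, num_hashes: int = 4) -> float:
    """Textbook approximation (1 - exp(-h*n/m))**h; not a measured result."""
    return (1.0 - math.exp(-num_hashes * inserted / size)) ** num_hashes


if __name__ == "__main__":
    for size in SIZES:
        rate = expected_false_positive_rate(size, inserted=50_000)
        print(f"size={size:>10,}  approx. false-positive rate={rate:.6f}")
```

The approximation grows as the array shrinks and as more distinct files are inserted, which is the direction of both effects described above.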

23 False Positives Identified
- D is the distance in the locality check
- T is the threshold to identify false positives
- Bloomfilter size of 250x10^3
- Identifies at least 25% of false positives, more if D/T increases

24 False Negatives
- Bloomfilter size of 250x10^3
  - Similar results for other sizes
- The increase in false negatives is comparable to the decrease in false positives, but these are less costly

25 Related Work
- Giggle: Manages replicas in a user-given configuration [A. Chervenak et al. 2002]
  - Requires user to define system type
  - Achieves distributed nature by high redundancy of data
- L-Store: Manages files on block level in a file system [A. Tackett et al. 2006]
  - Doesn't get the benefit of local file hits, but may have faster transfers
  - Interesting comparison to DIMM
- Zhang et al.: Job recovery in the event of node failure [Zhang et al. 2007]

26 Conclusions
- We present a method for distributing the metadata management in HPC environments
  - capable of reducing the number of migrations from archive while keeping a high number of local hits
  - capable of reducing the size of the centralized management scheme
- With the introduction of locality checks in a Bloomfilter we are able to reduce the number of false positives in exchange for increasing the less costly false negatives

27 Current and Future Work
- Currently implementing a version of DIMM into our DiSK project (Distributed Shared Disk Cache)
- DiSK is the major project that is a culmination of DIMM's management and Differentiable Replication (DiR)
- Based on MIT's Chord/DHash
- A prototype is running on a 20-node cluster

28 Questions and More Information
Brandon Szeliga
Weisong Shi
