DataMods. Programmable File System Services. Noah Watkins

Size: px

Start display at page:

Download "DataMods. Programmable File System Services. Noah Watkins"

Barbra McDaniel
6 years ago
Views:

1 DataMods Programmable File System Services Noah Watkins 1

2 Overview Middleware scalability challenges How DataMods can help address some issues Case Studies Hadoop Cloud I/O Stack Offline Indexing in PLFS (PDSW 12) 2

3 DataMods Overview ApplicaKons struggle to scale on POSIX Systems rarely to provide other interfaces POSIX designed to prevent lock- in Open- source systems are now available Can we generalize these systems to provide new behavior to new users? 3

4 ApplicaKon Middleware Complex data models and interfaces Difficult to work directly with simple byte stream Middleware maps the complex onto the simple Applications Performing Raw I/O BloomMap Subset of Hadoop Ecosystem SetFile ArrayFile MapFile TFile SequenceFile BCFile HFile RCFile NetCDF, HDF5, FITS DB (BTree, WAL,...) Byte Stream Interface 4

5 Middleware Complexity Bloat Hadoop and Big Data data models Ordered key/value pairs stored in file DicKonary for random key- oriented access Common table abstrackons Applications Performing Raw I/O BloomMap Subset of Hadoop Ecosystem SetFile ArrayFile MapFile SequenceFile TFile BCFile HFile RCFile NetCDF, HDF5, FITS DB (BTree, WAL,...) Byte Stream Interface 5

Middleware Complexity Bloat ScienKfic data MulK- dimensional arrays Imaging Genomics Applications Performing Raw I/O Subset of Hadoop Ecosystem BloomMap

6 Middleware Complexity Bloat ScienKfic data MulK- dimensional arrays Imaging Genomics Applications Performing Raw I/O Subset of Hadoop Ecosystem BloomMap SetFile ArrayFile MapFile TFile SequenceFile BCFile HFile RCFile NetCDF Scientific Formats HDF5 GRIB FITS FASTA DB (BTree, WAL,...) Byte Stream Interface 6

Middleware Complexity Bloat IO Middleware Low- level data models and I/O opkmizakon TransformaKve I/O avoids POSIX limitakons Applications Performing Raw I/O Subset of Hadoop Ecosystem

7 Middleware Complexity Bloat IO Middleware Low- level data models and I/O opkmizakon TransformaKve I/O avoids POSIX limitakons Applications Performing Raw I/O Subset of Hadoop Ecosystem BloomMap SetFile ArrayFile MapFile TFile SequenceFile BCFile HFile RCFile Scientific Formats NetCDF HDF5 GRIB FITS FASTA IO Middleware MPI-IO ADIOS PLFS DB (BTree, WAL,...) Byte Stream Interface 7

Middleware Scalability Challenges Mapping between domains Difficult to translate efficiently Magic numbers / alignment Application Applications Performing Raw I/O Subset of Hadoop Ecosystem

8 Middleware Scalability Challenges Mapping between domains Difficult to translate efficiently Magic numbers / alignment Application Applications Performing Raw I/O Subset of Hadoop Ecosystem BloomMap SetFile ArrayFile MapFile TFile SequenceFile BCFile HFile RCFile Scientific Formats NetCDF HDF5 GRIB FITS FASTA IO Middleware MPI-IO ADIOS PLFS DB (BTree, WAL,...) Byte Stream Interface 8

Middleware Scalability Challenges Underlying scalable storage Highly scalable services Customizable Application Applications Performing Raw I/O Subset of Hadoop Ecosystem BloomMap SetFile ArrayFile

9 Middleware Scalability Challenges Underlying scalable storage Highly scalable services Customizable Application Applications Performing Raw I/O Subset of Hadoop Ecosystem BloomMap SetFile ArrayFile MapFile TFile SequenceFile BCFile HFile RCFile Scientific Formats NetCDF HDF5 GRIB FITS FASTA IO Middleware MPI-IO ADIOS PLFS DB (BTree, WAL,...) Byte Stream Interface Metadata Service Object-storage Service Client File Operations Scalable Recovery 9

10 Data Model Modules Define new interfaces and behavior Built atop exiskng scalable services Enables richer set of file implementakons Bytes SeqFile ND-Array SeqFile Module Definition Metadata Service Client File Operations ND-Array Module Definition Object-storage Service Scalable Recovery 10

11 What does middleware do? Metadata Management Data Placement Intelligent Access Asynchronous Services 11

12 Hadoop SequenceFile Middleware Consolidate many key/value pairs (less files) Preserves the ordering of inserts Binary file format / block model isn t flexible Applications Performing Raw I/O Subset of Hadoop Ecosystem BloomMap SetFile ArrayFile MapFile TFile SequenceFile BCFile HFile RCFile Scientific Formats NetCDF HDF5 GRIB FITS FASTA IO Middleware MPI-IO ADIOS PLFS DB (BTree, WAL,...) Byte Stream Interface Metadata Service Object-storage Service Client File Operations Scalable Recovery 12

13 SequenceFile Metadata Management Header added when file is created Key/value data type Free- text descripkons Inflexible afer creakon (shif problem) Hot spot for thousands of clients Interface - Create - GetMetaData SequenceFile Header 0 EOF 13

14 SequenceFile Data Placement New records are append to file Interface - Create - GetMetaData - Append(K/V) SequenceFile Header KEY VALUE 0 EOF First Record 14

15 SequenceFile Data Placement New records are append to file Format encodes length and marker Interface - Create - GetMetaData - Append(K/V) SequenceFile Klen Rlen Header KEY VALUE KEY VALUE KEY VALUE... 0 EOF Record Record Record 15

16 SequenceFile Data Placement New records are append to file Format encodes length and marker Block boundaries break records Interface - Create - GetMetaData - Append(K/V) - Next(K/V) Compute Remote reads to fetch value SequenceFile Header KEY VALUE KEY VALUE KEY VALUE... 0 EOF 16

17 SequenceFile Intelligent Access Block boundary!= record bounary Interface - Create - GetMetaData - Append(K/V) - Next(K/V) - Sync() SequenceFile Header KEY VALUE KEY VALUE KEY VALUE... 0 EOF 17

18 SequenceFile Intelligent Access Block boundary!= record boundary Sync markers added before record Interface - Create - GetMetaData - Append(K/V) - Next(K/V) - Sync() SequenceFile Header KEY VALUE KEY VALUE KEY VALUE... 0 EOF 18

19 SequenceFile Intelligent Access Block boundary!= record boundary Sync markers added before record Sliding window over block Scan can be very costly More markers add space overhead Interface - Create - GetMetaData - Append(K/V) - Next(K/V) - Sync() scan SequenceFile Header KEY VALUE KEY VALUE KEY VALUE... 0 EOF 19

20 DataMods: The AbstracKon File Manifold Generalize metadata and data organizakon AcKve and Typed Objects Domain- specific interfaces and computakon Asynchronous Services Long- term schedulable work units Indexing, compression, sorkng, re- organizakon 20

21 DataMods: File Manifold Customized metadata and data placement Inode stores manifold placement strategy Manifold manages data in container model Inode MDObj Obj 1 Obj 2 Extended Metadata K V K V K V K V K V K V K V K V... 21

DataMods: AcGve and Type Objects Domain- specific interfaces and funckonality Objects automakcally build index Avoid record fragmentakon Inode

22 DataMods: AcGve and Type Objects Domain- specific interfaces and funckonality Objects automakcally build index Avoid record fragmentakon Inode update_index(off, key) append(key, value append new append free space Index MDObj Obj 1 Obj 2 Extended Metadata K V K V K V K V K V K V K V K V... 22

23 DataMods Summary File Manifold Metadata Data placement AcKve and Typed Storage Expose low- level services Well- defined programming model 23

24 Case Study Hadoop in the Cloud Offline indexing for checkpoint/restart 24

25 Locality and the Hadoop I/O Stack Map/Reduce clients local data blocks Reader will filter and parse blocks into <K/V> Traffic reduckon is significant Not as simple in cloud Parsing & Filtering Reduce Map Record Reader SequenceFile Header KEY VALUE KEY VALUE KEY VALUE... 0 EOF 25

26 Locality and Hadoop Cloud I/O Stack Providers separate compute from storage Co- locakon is prevented Network traffic increases Filtering rendered ineffeckve Parsing & Filtering Reduce Map Record Reader SequenceFile Header KEY VALUE KEY VALUE KEY VALUE... 0 EOF 26

27 Unmanaged Co- located Filtering Use ackve storage to implement reader Filtering is now effeckve Reduce Co- locakng arbitrary code Map Providers unlikely to support Security and QoS challenges Hammer is too big Parsing & Filtering Record Reader SequenceFile Header KEY VALUE KEY VALUE KEY VALUE... 0 EOF Record Reader 27

28 Managed Co- located Filtering Expand map/reduce model Explicitly remote reader Restricted access to services Vesed programming model Domain- specific interfaces SequenceFile à <K/V> access Parsing & Filtering Reduce Map Record Reader Extended Record Reader Extended Metadata K V K V K V K V K V K V K V K V... Extended Record Reader 28

29 Case Study Summary Non- POSIX file interfaces don t need POSIX Sequence File can be directly implemented Enables new funckonality using custom programming models 29

30 Case Study: PLFS Checkpoint/Restart Long- running simulakons need fault- tolerance SimulaKons run on expensive machines > 300 TB RAM, > 300,000 cores Checkpoint state periodically to file system Compute- I/O- Compute cycle More bandwidth = increased uklizakon Some workloads are very difficult to opkmize 30

31 Checkpoint Workloads Many variakons N- 1 Client 1 Client 2 Client 3 N- N Chkpt.bin Client 1 Client 2 Client 3 Chkpt_1.bin Chkpt_3.bin Chkpt_3.bin 31

32 Overview of PLFS Middleware layer Transforms N- 1 into N- N above file system Application / Client PLFS MPI-IO POSIX File System Each process writes into dedicated log Manages mulkple logs/indexes in the file system 32

33 Overview of PLFS I/O Client 1 Client 2 Client 3 Index Log- structured Index Log- structured Index Log- structured 33

34 PLFS Scalability Challenges Indexing overheads Volume I/O traffic Index compression Compute Kme wasted What if we take advantage of idle and excess capacity in storage? It already knows about the data 34

35 DataMods Module for PLFS File Manifold AutomaKc namespace management AcKve and Typed Objects No explicit index management (reduced I/O) Indexes automakcally generated by ackve objects Asynchronous Services Offline index support frees compute resources Compression opkmizes indexes over Kme 35

36 PLFS File Manifold Logical Byte Stream Logical top- half file is not materialized Log Selection (write) Per-proc Log-structured File Manifold Index Lookup (read) Striping Strategy (Append) Index Maintenance API Object Store append to log... pid.obj.0 pid.obj.1 pid.obj.n 36

37 PLFS File Manifold Logical Byte Stream Logical top- half file is not materialized Log Selection (write) Per-proc Log-structured File Manifold Index Lookup (read) Routes to per- process log file Striping Strategy (Append) Object Store Index Maintenance API append to log... pid.obj.0 pid.obj.1 pid.obj.n 37

38 PLFS File Manifold Logical Byte Stream Logical top- half file is not materialized Log Selection (write) Per-proc Log-structured File Manifold Index Lookup (read) Routes to per- process log file Striping Strategy (Append) Object Store Index Maintenance API append to log... Append striping within RADOS namespace pid.obj.0 pid.obj.1 pid.obj.n 38

39 PLFS File Manifold Logical Byte Stream Logical top- half file is not materialized Log Selection (write) Per-proc Log-structured File Manifold Index Lookup (read) Routes to per- process log file Striping Strategy (Append) Object Store Index Maintenance API append to log... Append striping within RADOS namespace pid.obj.0 pid.obj.1 pid.obj.n Index- enabled objects record logical- to- phy 39

within RADOS namespace Striping Strategy (Append) Object Store Index Maintenance API append to log... pid.

40 PLFS File Manifold Logical Byte Stream Logical top- half file is not materialized Log Selection (write) Per-proc Log-structured File Manifold Index Lookup (read) Routes to per- process log file Append striping within RADOS namespace Striping Strategy (Append) Object Store Index Maintenance API append to log... pid.obj.0 pid.obj.1 pid.obj.n Index- enabled objects record logical- to- phy Interface to index maintenance rouknes 40

Offline Index Compression Exploit storage system idle Kme Asynchronously opkmize global index Merging conkguous entries Pasern discovery and replacement

41 Offline Index Compression Exploit storage system idle Kme Asynchronously opkmize global index Merging conkguous entries Pasern discovery and replacement Index consolidakon Raw Index Index Compression Pipeline Merge Pattern ConslidaKon Compression & Consolidated Index (logical offset, physical offset, length) 41

42 Offline Index Compression Effect Effect'of'Merging'and'Pa>ern'Techniques'for'PLFS'Traces' Pasern Compression on 92 PLFS Traces Merging" Merging+Pa-ern"Recogni1on" " " Factor'Reduc,on'over'Baseline' 10000" 1000" 100" 10" 1" Data'Sets'Ordered'by'Reduc,on'Factor' 42

43 Current Status / Conclusion Building out more mechanisms for case study OpKmizing low- level performance Preliminary indexing performance: 10% overhead Next steps ConstrucKng specificakon language Building DataMods plugins for new file formats Thanks! QuesKons? 43

DataMods: Programmable File System Services

DataMods: Programmable File System Services Noah Watkins, Carlos Maltzahn, Scott Brandt University of California, Santa Cruz {jayhawk,carlosm,scott}@cs.ucsc.edu Adam Manzanares California State University,