8.5 End-to-End Demonstration - Exascale Fast Forward Storage Team - June 30th, 2014


1 8.5 End-to-End Demonstration - Exascale Fast Forward Storage Team - June 30th, 2014
NOTICE: THIS MANUSCRIPT HAS BEEN AUTHORED BY INTEL, THE HDF GROUP, AND EMC UNDER INTEL'S SUBCONTRACT WITH LAWRENCE LIVERMORE NATIONAL SECURITY, LLC, WHO IS THE OPERATOR AND MANAGER OF LAWRENCE LIVERMORE NATIONAL LABORATORY UNDER CONTRACT NO. DE-AC52-07NA27344 WITH THE U.S. DEPARTMENT OF ENERGY. THE UNITED STATES GOVERNMENT RETAINS AND THE PUBLISHER, BY ACCEPTING THE ARTICLE FOR PUBLICATION, ACKNOWLEDGES THAT THE UNITED STATES GOVERNMENT RETAINS A NON-EXCLUSIVE, PAID-UP, IRREVOCABLE, WORLD-WIDE LICENSE TO PUBLISH OR REPRODUCE THE PUBLISHED FORM OF THIS MANUSCRIPT, OR ALLOW OTHERS TO DO SO, FOR UNITED STATES GOVERNMENT PURPOSES. THE VIEWS AND OPINIONS OF AUTHORS EXPRESSED HEREIN DO NOT NECESSARILY REFLECT THOSE OF THE UNITED STATES GOVERNMENT OR LAWRENCE LIVERMORE NATIONAL SECURITY, LLC.
Fast Forward Project

2 Statement of Work - 8.5 Deliverables
8.5 End-to-End Demonstration with Final Design Documentation and Report
a) The Subcontractor shall develop a final version of the design documentation that is an updated version of the Milestone 4.1 Design Document, updated to include technical lessons learned during the research and development phases of the project. The resulting design will include the following sections for the topics listed in Milestones 3.1 and 4.1:
   - Description of the solution elements/implementation components and how they are expected to address the research goals
   - Explanation of why/how the proposed solution will address the project requirements
   - Identification of risks or unknowns with the proposed approach
b) The Subcontractor shall complete a final report that describes the research methodology, key findings, and recommendations for future research. This report will also identify work that did not result in usable functionality, so that others may avoid those paths in the future.
c) As part of this reporting process the Subcontractor shall complete a final end-to-end demonstration representative of work completed during the project. The specific demonstration criteria will be described and mutually agreed to by the Subcontractor and Technical Representative in the Solution Architecture document and further refined and finalized during the quarter prior to the demonstration (Project Quarter 7).
This milestone is deemed complete when the Final Report, Final Design Document, and demonstration results have been presented to and approved by the Technical Representative.

3 Final Report and Updated Design Documents
Milestone / Final Report Ref. / Document:
- M8.5 FF-Storage Final Report
- 3.1 D1 DAOS 3.1 Reduction Network Discovery Design Document
- 4.1 D2 DAOS 4.1 Server Collectives Design Document
- 3.1 D3 DAOS 3.1 Versioning Object Storage Device (VOSD) Design Document
- D4 DAOS API and DAOS POSIX Design Document
- 4.1 D5 DAOS 4.1 Epoch Recovery
- D6 DAOS 4.1 Lustre Restructuring and Protocol Changes
- DAOS 4.1 Client Health and Global Eviction Design Document
- I1 IOD Solution Architecture
- I2 IOD API
- 3.1 I3 IOD 3.1 Design Document
- I4 IOD KV Design Document
- I5 IOD Object Storage on DAOS Document
- HDF 3.1 HDF5 IOD VOL Design
- HDF 4.4 HDF5 Data Integrity Report
- POSIX Function Shipping Design Document
- H1 The Design and Implementation of FastForward Features in HDF5
- H2 User's Guide to FastForward Features in HDF5
- 5.6, 5.7 H3 HDF5 Data in IOD Containers Layout Specification
- 5.1 H4 HDF Function Shipper Design
- 7.3 H5 Burst Buffer Space Management - Prototype and Production
- H6 Deep Dive - Transactions - Presentation, Sept.
- H7 Mercury Design Document
- H8 AXE Design Document
- A1 ACG Solution Architecture
- 4.1 A2 ACG Computation
- ACG Software Install Guide
- EFF1 End to End Data Integrity in the Intel/EMC/HDF Exascale IO Presentation, Sept.
- HDF 3.1 Dynamic Data Structure Support

4 Demonstration Goals
- Simulate a producer/consumer workflow
- Run applications at larger scale
- Demonstrate resilience of the stack to failures
- Early feedback on stack behavior/performance
[Diagram: Compute Nodes (CNs), I/O Nodes (IONs), DAOS-Lustre MetaData Server (MDS), DAOS-Lustre Object Storage Servers (OSSs)]

5 LANL Test Cluster: Buffy

6 Test Environment
- 64 compute nodes: Cray, SLES, Aries interconnect
- 14 I/O nodes: CentOS 6.4, connected to both the Aries & InfiniBand networks
- 3 Lustre file systems exported by the abba appliance, cross-mounted on all IONs
- 5 DAOS servers (1 MDS + 4 OSSs): CentOS, with each OST using 5x RAID6+2 LUNs
- SFA12K has writeback cache enabled

7 Workflow Simulation
Container layout: root group "/" with an attribute; groups Step#0, Step#1, Step#2, each holding a 1D dataset (DT0, DT1, DT2).
- Split the communicator in half: Producers and Consumers.
- Producers create N timesteps (runtime argument); each timestep is written with one transaction, and every R timesteps are persisted.
- Consumers read and verify R timesteps once the Producers have completed the Persist of those timesteps; Producers concurrently write the next R timesteps while the Consumers are reading and verifying.
- After an application crash/relaunch, Consumers first verify all the timesteps written up to the latest readable transaction on DAOS; then the above behavior resumes.
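To make the workflow concrete, here is a minimal sketch of the producer/consumer split using a plain MPI communicator split. It is an illustration only, not the VPIC-IO code used in the demo; the constants NUM_TIMESTEPS and PERSIST_INTERVAL and the produce_timestep()/verify_timestep() helpers are hypothetical placeholders for the N and R runtime arguments and for the actual transaction write/verify logic.

```c
/* Sketch of the producer/consumer workflow split; all helper names and
 * constants are illustrative placeholders, not the actual VPIC-IO code. */
#include <mpi.h>

#define NUM_TIMESTEPS    16   /* N: timesteps created by the producers */
#define PERSIST_INTERVAL 4    /* R: persist (and verify) every R steps */

static void produce_timestep(int step, MPI_Comm comm) { /* write one transaction */ }
static void verify_timestep(int step, MPI_Comm comm)  { /* read back and check   */ }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split the world communicator in half: lower ranks produce, upper ranks consume. */
    int is_producer = (rank < size / 2);
    MPI_Comm half;
    MPI_Comm_split(MPI_COMM_WORLD, is_producer, rank, &half);

    for (int step = 0; step < NUM_TIMESTEPS; step++) {
        if (is_producer)
            produce_timestep(step, half);          /* one transaction per timestep */

        if ((step + 1) % PERSIST_INTERVAL == 0) {
            /* Producers persist the last R transactions; consumers then verify
             * them while producers move on to the next R timesteps. */
            if (!is_producer)
                for (int s = step + 1 - PERSIST_INTERVAL; s <= step; s++)
                    verify_timestep(s, half);
        }
    }

    MPI_Comm_free(&half);
    MPI_Finalize();
    return 0;
}
```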

8 Three failure scenarios
1. Fail the CN
   - Purge the BBs and restart the app
   - Recover from the last persisted checkpoint on DAOS
   - Future work: recover from the ION, since the data is still there
2. Fail the ION
   - Wait for VPIC to fail
   - Remount the needed mount points on the ION after recovery
   - Same recovery as above
   - Future work: recover from the ION if the ION data is resilient
3. Fail the OSS
   - Fail and then recover an OSS
   - VPIC will fail a persist, wait for the OSS, and then retry the persist
   - Future work: VPIC continues computing and committing to IOD while waiting for the OSS; IOD internally retries the persist

9 Compute Node Failure Simulation
Let's start the demo:
- Run the VPIC application on CNs & IONs
- Kill the application running on the CNs
- Clear all burst buffers
- Run VPIC again, which restarts from DAOS
Expected result: the application resumes from the last state persisted to DAOS and completes successfully

10 I/O Node Failure Simulation
Let's start the demo:
- Run the VPIC application on CNs & IONs
- Power cycle an ION (ion01)
- Wait for VPIC to fail
- Remount the BB filesystems & DAOS once the ION is back online
- Clear all burst buffers
- Run VPIC again
Expected result: the application resumes from the last state persisted to DAOS and completes successfully

11 DAOS Server Failure Simulation
Let's start the demo:
- Run the VPIC application on CNs & IONs
- Power cycle an OSS
- Remount the OSTs once the OSS is back online
- Wait for VPIC to complete
Expected result:
- The application continues running until it needs to communicate with an OST that is down
- The application waits for the OST to come back online
- The Persist() call might fail if the OST has lost I/Os on disk (HDF5 will retry)
- The application resumes and completes successfully

12 VPIC-IO & Backend Activity
[Chart: DAOS & BB activity with 100% of transactions persisted; throughput (MB/s) over time (s) for BB reads, BB writes, and DAOS writes.]

13 VPIC-IO & Persist Frequency
[Chart: runtime (s) as a function of persist rate (% of transactions persisted, up to 100%).]

14 Performance Evaluation
- An IOR driver was developed for each layer of the stack; each driver uses a single transaction, with checksums disabled
- -a DAOS: new driver using asynchronous DAOS I/O submission
- -a IOD: new driver supporting all IOD object types (blobs, arrays & KVs); supports purge/prefetch/persist/cksum and variable array cell size
- -a HDF5: original HDF5 driver extended to support the EFF extensions; only supports datasets (mapped to IOD array objects), with a single dataset shared by all tasks
- -a PLFS: for checking the baseline performance of the Lustre cross-mounts on the BBs and the overhead of IOD, which layers above PLFS for BB storage
- LANL fs_test was also augmented with an IOD module (-b iod)
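For readers unfamiliar with IOR's access patterns, the sketch below shows the N:1 (single shared file) pattern that the drivers sweep over different transfer sizes, written with plain MPI-IO. It is not one of the EFF IOR backends; the file name, transfer size, and transfer count are arbitrary placeholders.

```c
/* Illustration only: the N:1 (single shared file) access pattern that the
 * IOR runs sweep over different I/O sizes, sketched with plain MPI-IO. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int xfer_size = 256 * 1024;             /* one point of the 4KB..1MB sweep */
    const int nxfers    = 64;                     /* transfers per task              */
    char *buf = malloc((size_t)xfer_size);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "ior_shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes its own contiguous region of the shared file (N:1). */
    for (int i = 0; i < nxfers; i++) {
        MPI_Offset off = ((MPI_Offset)rank * nxfers + i) * xfer_size;
        MPI_File_write_at(fh, off, buf, xfer_size, MPI_BYTE, MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```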

15 Performance - DAOS
Several important VOSD performance optimizations:
- Enabled VIL zero-copy
  - Disabled in the M7 demo due to problems in the ZFS patch; the ZFS patch has been almost entirely rewritten since
  - Fully overwritten blocks (128KB) are migrated from the VIL to the final DAOS object; partially-modified blocks are still copied
  - Fixed a bug causing zero-copy blocks to be read from disk during flattening
- Detach aggregation: detach blocks from the VIL in bulk just before the transaction commits
- Fixed a bug in the DAOS writeback cache that caused poor results in the M5 demo

16 DAOS/IOR - Write
[Chart: write bandwidth (MB/s) versus I/O size (4KB-1MB) for DAOS Buffered HCE+1, DAOS DIO HCE+1, DAOS DIO VIL, POSIX 1:1, and POSIX N:1.]

17 DAOS/IOR - Read
[Chart: read bandwidth (MB/s) versus I/O size (4KB-1MB) for DAOS DIO, POSIX N:1, and POSIX 1:1.]

18 Performance - IOD
Several important performance optimizations:
- Reduced overly-eager PLFS index merging
  - PLFS will extend index entries indefinitely for contiguous data and has one checksum per index entry, so small reads within a giant checksum chunk are very slow
- Better aggregation and larger writes to DAOS during persists
- Use of <stdio.h> FILE * I/O instead of <fcntl.h> I/O, which provides client-side user-space buffering and caching (see the sketch below)
- Many performance tuning parameters in iodrc and plfsrc: threadpool sizes, stripe sizes, checksum chunk sizes, memory consumption during persist, and DAOS shards used per DAOS storage target
- Incremental KV persist
- Optimized sorting in range queries of data across multiple TIDs
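The <stdio.h> change can be illustrated with a small stand-alone sketch (not IOD code): many small writes through fwrite() are coalesced in a user-space buffer, while the same writes through write() each become a separate system call. File names and sizes below are arbitrary.

```c
/* Why buffered <stdio.h> I/O helps for small writes compared with
 * unbuffered <fcntl.h>/write(): fwrite() coalesces data in user space
 * and hits the kernel with much larger writes. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
    char chunk[128];
    memset(chunk, 'x', sizeof(chunk));

    /* Unbuffered: every write() is a 128-byte system call. */
    int fd = open("unbuffered.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    for (int i = 0; i < 100000; i++)
        if (write(fd, chunk, sizeof(chunk)) < 0)
            break;
    close(fd);

    /* Buffered: fwrite() fills a user-space buffer (enlarged here with
     * setvbuf) and flushes it as much larger writes. */
    FILE *fp = fopen("buffered.dat", "w");
    setvbuf(fp, NULL, _IOFBF, 4 * 1024 * 1024);   /* 4MB stdio buffer */
    for (int i = 0; i < 100000; i++)
        fwrite(chunk, 1, sizeof(chunk), fp);
    fclose(fp);

    return 0;
}
```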

19 IOD/IOR Burst Buffer Write
[Chart: write bandwidth (MB/s) versus I/O size (4KB-1GB) for POSIX 1:1, IOD Blob 1:1, IOD Blob N:1, IOD Blob Noglib, and IOD Array N:1.]

20 Small IOs to a Single Object on Lola
[Chart: write bandwidth (GB/s) versus IO size (4K-1G) for PLFS File, IOD Blob, and IOD Array.]
Note that these tests set the array cell size equal to the IO size. In other IOR measurements (not shown), we set the array cells to a constant 8 bytes regardless of the IO size and observed that this did not affect performance.

21 IOD IOPS Scaling by NP (64K 128-length keys per task)
Two problems:
1. Array small IO: fixable with faster internal transformation to blobs.
2. KV inserts: fixable with the newer MDHIM, which uses LevelDB instead of PBL-ISAM.

22 IOStore and Debug Impact on PLFS on Lola
[Chart: write bandwidth (GB/s) versus IO size (4K-1G) for IOStore=Glib/Mlog=Err, IOStore=Glib/Mlog=Debug, IOStore=Posix/Mlog=Err, and IOStore=Posix/Mlog=Debug.]
INSIGHT: Applications should either do large IO or have client-side buffering.

23 IOD/IOR Burst Buffer Read
[Chart: read bandwidth (MB/s) versus I/O size (4KB-1GB) for POSIX 1:1, IOD Blob 1:1, IOD Blob N:1, and IOD Array N:1.]

24 IOD/IOR Persist
[Chart: persist bandwidth (MB/s) versus I/O size (4KB-1MB) for IOD Blob 1:1 and IOD Blob N:1.]

25 IOD IOPS Persist Scaling by NP (64K 128-length keys per task)
INSIGHT: Applications need to understand how KV objects are sharded, as shown by the very poor performance of the KV using decimal keys. Alternatively, new functionality is needed from IOD, e.g. "Hey IOD: don't use sorted ranges; hash the keys instead." A minimal sketch of key hashing for shard assignment follows.
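The sketch below illustrates the hashing idea under stated assumptions: FNV-1a and an eight-way shard count are arbitrary choices, and the code is not the IOD/MDHIM implementation. A batch of consecutive decimal keys would land in a single sorted range (one shard); hashing each key first spreads the same batch across all shards.

```c
/* "Hash the keys instead of using sorted ranges": sequential decimal keys
 * fall into one sorted range (one shard) at a time, so inserts serialize;
 * hashing each key spreads a batch of inserts across all shards. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NUM_SHARDS 8

/* FNV-1a hash over the key bytes. */
static uint64_t fnv1a(const char *key, size_t len)
{
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)key[i];
        h *= 1099511628211ULL;
    }
    return h;
}

int main(void)
{
    int hits[NUM_SHARDS] = {0};
    char key[32];

    /* A batch of 1024 consecutive decimal keys, as a producer might insert. */
    for (int i = 0; i < 1024; i++) {
        snprintf(key, sizeof(key), "%020d", i);
        hits[fnv1a(key, strlen(key)) % NUM_SHARDS]++;   /* hashed shard choice */
    }

    /* With sorted-range sharding this whole batch would land in one shard;
     * with hashing it is spread roughly evenly. */
    for (int s = 0; s < NUM_SHARDS; s++)
        printf("shard %d: %d keys\n", s, hits[s]);
    return 0;
}
```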

26 IOD/IOR Persist - DAOS RPC Size
[Chart: DAOS RPC size distribution (512KB vs. 1MB RPCs, as a percentage) for each I/O size from 4KB to 1MB.]

27 IOD/IOR Persist Storage Activity (32K IO)
[Chart: bandwidth (MB/s) over time (s) for BB reads, BB writes, and DAOS writes.]

28 Performance - HDF5
Performance results for HDF5 were gathered in two modes:
- Mercury (non co-resident) mode: the IOR application/clients run on the CNs and connect to the HDF5 VOL servers on the IONs, so the results include the cost of data transfer through Mercury.
- Co-resident mode: the IOR application/clients are executed on the IONs and also act as the HDF5 VOL server processes, eliminating the data-transfer overhead.
Checksums add a considerable overhead and were disabled for most runs.

29 HDF5 and IOD IOR Read & Write
[Chart: bandwidth (MB/s) versus I/O size (up to 1GB) for IOD Array write/read, HDF5 Co-resident write/read, and HDF5 Mercury write/read.]

30 HDF5 Co-resident IOR - Impact of IOD Checksums (with the slower crc64)
[Chart: bandwidth (MB/s) versus I/O size (up to 1MB) for write and read, with and without checksums.]

31 ACG Retrieval of Sub-objects
[Chart: HDFS vs. EFF.]
The EFF stack is much faster and more consistent than HDFS for an important ACG workload: retrieving sub-objects.

32 Performance - Buffy
- MPICH with GNI support: native support for the Aries interconnect, low latency & RDMA support
- Some simple IOD operations, like transaction finish, are significantly slower with MPICH/GNI than with MPICH/TCP
  - Might be related to multi-thread support, with one thread polling inside MPI (see the sketch below); the problem is still under investigation
- All performance tests were run with MPICH/TCP
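The suspected pattern can be sketched as follows, assuming MPI_THREAD_MULTIPLE: one thread blocks inside MPI while another thread issues its own MPI traffic. This is only an illustration of the usage pattern, not IOD or MDHIM code; the progress thread, tag, and collective are hypothetical.

```c
/* Usage pattern suspected to trigger the MPICH/GNI slowdown: one thread
 * blocks inside MPI while other threads also make MPI calls. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static void *progress_thread(void *arg)
{
    /* Blocks inside MPI waiting for a message; with the GNI netmod this
     * blocking call can starve other threads of the progress engine. */
    int msg;
    MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    pthread_t tid;
    pthread_create(&tid, NULL, progress_thread, NULL);

    /* Meanwhile the main thread issues its own MPI traffic
     * (e.g. the collective behind a transaction finish). */
    int value = rank, sum = 0;
    MPI_Allreduce(&value, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Unblock the progress thread so the sketch terminates cleanly. */
    int wake = 0;
    MPI_Send(&wake, 1, MPI_INT, rank, 0, MPI_COMM_WORLD);
    pthread_join(tid, NULL);

    if (rank == 0) printf("sum = %d\n", sum);
    MPI_Finalize();
    return 0;
}
```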

33 IOD Operations - MPI/TCP vs MPI/GNI
[Chart: time (s) per IOD operation for Lola eth, Buffy gni, and Buffy eth.]
Note: Buffy gni outperforms Buffy ethernet by about 3x on the standard Intel MPI Benchmarks; however, IOD, and MDHIM within it, use MPI in non-standard ways, such as heavy use of threads.

34 Intel MPI Benchmark - Alltoall

35 Intel MPI Benchmark - Pingpong

36 Performance - Checksum
- We noticed a significant impact of checksums on performance
- The default checksum algorithm is crc64, which is not the most optimal choice; e.g. adler32 usually performs better
- We should also test with crc32c, which has been supported in hardware since SSE4.2 (see the sketch below)
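As a rough illustration of the comparison, the sketch below times zlib's adler32 against the SSE4.2 crc32c instruction on a large buffer. It is not the checksum code used in the EFF stack; buffer size and iteration count are arbitrary, and it must be built with -msse4.2 and linked with -lz.

```c
/* Rough micro-benchmark: zlib adler32 vs. SSE4.2 hardware crc32c.
 * Build with: gcc -O2 -msse4.2 cksum_bench.c -lz */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <zlib.h>        /* adler32() */
#include <nmmintrin.h>   /* _mm_crc32_u64 (SSE4.2) */

#define BUF_SIZE (64 * 1024 * 1024)
#define ITERS    16

static uint32_t crc32c_sse42(const void *buf, size_t len)
{
    const uint64_t *p = buf;
    uint64_t crc = ~0ULL;
    for (size_t i = 0; i < len / 8; i++)
        crc = _mm_crc32_u64(crc, p[i]);      /* 8 bytes per instruction */
    return (uint32_t)~crc;
}

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    unsigned char *buf = malloc(BUF_SIZE);
    for (size_t i = 0; i < BUF_SIZE; i++) buf[i] = (unsigned char)i;

    double t0 = seconds();
    uLong a = adler32(0L, Z_NULL, 0);
    for (int i = 0; i < ITERS; i++)
        a = adler32(a, buf, BUF_SIZE);
    double t1 = seconds();
    uint32_t c = 0;
    for (int i = 0; i < ITERS; i++)
        c = crc32c_sse42(buf, BUF_SIZE);
    double t2 = seconds();

    double gb = (double)BUF_SIZE * ITERS / 1e9;
    printf("adler32: %.2f GB/s (checksum 0x%lx)\n", gb / (t1 - t0), (unsigned long)a);
    printf("crc32c : %.2f GB/s (checksum 0x%x)\n", gb / (t2 - t1), (unsigned)c);
    free(buf);
    return 0;
}
```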

37 Checksum Algorithms versus memset
[Chart: throughput of memset, adler32, and crc64.]

38 Performance Summary (1/2) - Full IOR Write Comparison
[Chart: write bandwidth (MB/s) versus I/O size (4KB-1GB) for PLFS N-1, POSIX 1:1, IOD Blob 1:1, IOD Blob N:1, IOD Array N:1, HDF5 Co-resident, HDF5 Mercury, DAOS Buffered HCE+1, IOD Blob 1:1 Persist, and IOD Blob N:1 Persist.]

39 Performance Summary (2/2) - Full IOR Read Comparison
[Chart: read bandwidth (MB/s) versus I/O size (4KB-1GB) for POSIX 1:1, IOD Blob 1:1, PLFS N-1, IOD Blob N:1, IOD Array N:1, HDF5 Co-resident, DAOS DIO, and HDF5 Mercury.]

40 Performance - Next Steps (1/2)
All / ACG / Cray:
- Checksums: transition from crc64 to adler32
- Identify whether performance issues are design or implementation
- Scale tests and more performance benchmarks
- Efficiently extending the on-demand sizes of data structures
- More efficiency for variable-length structures (common in natural graphs)
- Multi-threaded MPI on Aries
DAOS:
- Zero-copy VIL performance improvement for large datasets
- More performance benchmarks: many objects in a single shard; persist to epochs > HCE+1

41 Performance - Next Steps (2/2)
HDF5:
- Mercury plugins that support the native network protocol as well as true RDMA (as opposed to the current MPI emulation)
- More testing of access patterns from the HDF5 level, to understand the tradeoffs between various data representations, prefetch options, and read/write/persist granularities
IOD:
- Arrays (especially small IOs)
- KVs: update to the new MDHIM with LevelDB instead of PBL-ISAM; implement fetch_next_list from key K in addition to the current fetch_next_list from the Nth key
- More testing of reads (reads after purge and reads after fetch); some was done here and some in 8.3 and 8.4, but the larger focus was on writes. What do you expect? The PLFS motto is: checkpoints.

42 Fast Forward Project

43 IOD/IOR Persist Storage Activity (1MB IO)
[Chart: bandwidth (MB/s) over time (s) for BB reads, BB writes, and DAOS writes.]

44 Howard Pritchard about Buffy/Gemini: It appears the issue is one of fairness within the nemesis progress engine when one thread is blocking within MPI - i.e. has made some kind of blocking MPI call. Although there are places within the progress engine where the thread yields the lock, it's not sufficient to prevent hindering the progress of other threads. The tcp netmod is less susceptible to this, I suspect, because once a thread starts reading data off of a socket, it's doing so with the big lock held. With ugni (and likely also for ib), since there aren't these "blocking" calls (note the mpich tcp uses non-blocking sockets but once there's data to read it keeps reading till there's no more data in the sock buffer at that time), they are more likely to exit the progress engine and return to the application without having completed a transfer. There may be additional effects of this long blocking barrier on other threads making progress.

45 IOD Scaling by NP (8MB IO, 4GB per task) 45

46 Adler32 Checksums in IOD - Writes
The drop-off for large IO is probably due to an fs_test timing measurement bug which includes buffer allocation and setup in the open time.

47 Adler32 Checksums in IOD - Reads
The drop-off for large IO is probably due to the same fs_test timing measurement bug.

48 Adler32 Checksums in IOD - Persists
The drop-off for large IO is probably due to the same fs_test timing measurement bug.

49 Demo Performance

50 Demo Performance

51 Performance - VPIC-IO
[Chart: VPIC-IO performance across configurations: DAOS Lustre, IOD aggregation, Buffy mpi-eth, IOD <stdio.h>, VOSD zero-copy.]
