REMEM: REmote MEMory as Checkpointing Storage

1 REMEM: REmote MEMory as Checkpointing Storage
Hui Jin, Illinois Institute of Technology
Xian-He Sun, Illinois Institute of Technology
Yong Chen, Oak Ridge National Laboratory
Tao Ke, Illinois Institute of Technology
12/20/2010 CloudCom

2 Outline
Background & Motivation
REMEM Design
Implementation of REMEM on Open MPI
Adaptive Checkpointing Storage Selection
Experimental Results
Conclusions & Future Work

3 Motivation
Checkpointing is the most widely used mechanism for supporting fault tolerance in High-Performance Computing (HPC) environments. However, it introduces considerable overhead because of its high I/O cost: for a 1-petaFLOPS system, checkpointing can potentially degrade system performance by 50% [R. Oldfield et al., 2007].
The upcoming exascale computing environment poses even greater challenges:
10^18 FLOPS of computing power.
Millions of computing components.
Checkpointing to a centralized parallel file system does not scale.
What if the MTBF is shorter than the time it takes to write a checkpoint?

4 A Detailed Look at Checkpointing Cost
J. Hursey et al., "Interconnect Agnostic Checkpoint/Restart in Open MPI", HPDC

5 Motivation
Memory-based checkpointing is a promising way to break through the stable-storage bottleneck. However, it is rarely supported by current mainstream checkpointing systems, owing to:
Complexity.
Reliability concerns.
Excess memory usage.

6 REMEM
REmote MEMory as Checkpointing Storage:
Seamless integration with existing checkpointing systems.
Flexible switching between disk and remote memory as checkpointing storage.
Consideration of both reliability and space efficiency.

7 REMEM Design Goals
Reliability: memory is volatile.
Scalability: large-scale environments.
Space efficiency: memory is precious.
Transparency: augments existing systems.
Flexibility: switches between disk and memory.

8 REMEM Design

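9 REMEM Node Matching Reliability
Probability that k simultaneous node failures among n nodes can all be recovered from remote memory:
Ring (neighbor) matching: $\frac{C_{n-k}^{k} + C_{n-k-1}^{k-1}}{C_{n}^{k}}$
Pairwise matching: $\frac{2^{k}\,C_{n/2}^{k}}{C_{n}^{k}}$
Z. Chen et al., "Fault Tolerant High Performance Computing by a Coding Approach", PPoPP '05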
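The node-matching reliability comparison can be reproduced numerically. This is a sketch under stated assumptions: "ring matching" is taken to mean node i keeps its checkpoint image in the memory of node (i + 1) mod n, "pairwise matching" means nodes are mirrored in n/2 disjoint pairs, every set of k failed nodes is assumed equally likely, and an image is assumed lost only when its source node and its backup node both fail. The function names are hypothetical. The closed forms are evaluated with exact fractions and checked against brute-force enumeration of all failure sets.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_ring(n: int, k: int) -> Fraction:
    """Ring matching (assumed): node i backs up to node (i + 1) % n.

    k failures are recoverable iff no two failed nodes are ring neighbors;
    there are C(n-k, k) + C(n-k-1, k-1) such failure sets (1 <= k < n).
    """
    return Fraction(comb(n - k, k) + comb(n - k - 1, k - 1), comb(n, k))


def p_pair(n: int, k: int) -> Fraction:
    """Pairwise matching (assumed): n (even) nodes mirrored in n/2 pairs.

    Recoverable iff the k failures hit k distinct pairs: 2^k * C(n/2, k) sets.
    """
    return Fraction(2**k * comb(n // 2, k), comb(n, k))


def brute_force(n: int, k: int, lost) -> Fraction:
    """Fraction of all k-node failure sets for which lost(failed) is False."""
    good = sum(1 for s in combinations(range(n), k) if not lost(set(s)))
    return Fraction(good, comb(n, k))


def ring_lost(failed: set, n: int) -> bool:
    # An image is lost when a node and its ring successor (its backup) both fail.
    return any((i + 1) % n in failed for i in failed)


def pair_lost(failed: set) -> bool:
    # Pairs are (0, 1), (2, 3), ...; (i ^ 1) is the partner of node i.
    return any((i ^ 1) in failed for i in failed)


print(p_ring(64, 2), p_pair(64, 2))  # prints: 61/63 62/63
```

Exact fractions keep the closed-form and enumerated results directly comparable without floating-point tolerance.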

10 REMEM System Configuration

11 REMEM: Failure Handling
If a failure occurs on a source node:
If its backup node is healthy, simply recover from remote memory.
If the backup node has also failed, load the image from the last disk-based checkpoint.
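The failure-handling rule can be sketched as a small decision function. This is a hypothetical illustration, not REMEM's code: the `backup_of` map, the `plan_recovery` name, and the job-level rule are assumptions. In particular, it assumes a coordinated checkpoint, where all processes must restart from the same global snapshot, so losing even one node's in-memory image (its source node and backup node both failed) sends the whole job back to the last disk-based checkpoint.

```python
from enum import Enum


class RestoreFrom(Enum):
    REMOTE_MEMORY = "remote memory"
    LAST_DISK_CHECKPOINT = "last disk-based checkpoint"


def plan_recovery(failed: set[int], backup_of: dict[int, int]) -> RestoreFrom:
    """Choose the checkpoint source for a coordinated restart (hypothetical).

    failed:    IDs of the nodes that have just failed.
    backup_of: maps each source node to the node holding its in-memory image.
    """
    for node in failed:
        if backup_of[node] in failed:
            # This node's image was lost together with its backup, so a
            # consistent global restart must fall back to the disk checkpoint.
            return RestoreFrom.LAST_DISK_CHECKPOINT
    # Every failed node's image survives in a healthy backup's memory.
    return RestoreFrom.REMOTE_MEMORY


# Hypothetical mutual mapping of two 4-node groups: node i <-> node i + 4.
backup = {i: (i + 4) % 8 for i in range(8)}
print(plan_recovery({1}, backup))     # RestoreFrom.REMOTE_MEMORY
print(plan_recovery({1, 5}, backup))  # RestoreFrom.LAST_DISK_CHECKPOINT
```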

12 REMEM: Implementation on Open MPI
Open MPI is an open-source MPI-2 implementation that provides a high-performance, robust parallel execution environment for a wide variety of computing environments.
It supports transparent, coordinated checkpoint/restart, implemented primarily with the BLCR (Berkeley Lab Checkpoint/Restart) library.

13 REMEM: Implementation on Open MPI

14 Adaptive Checkpointing Storage Selection
Disk:
Memory:

15 Experimental Setup
Hardware: a 65-node Sun Fire cluster.
Compute nodes: dual 2.3 GHz quad-core Opteron processors, 8 GB memory, and a 250 GB 7.2K-RPM SATA hard drive.
OS: Ubuntu enterprise server with Linux kernel
Software: Open MPI v1.3.3 and GCC
REMEM was implemented in Open MPI using tmpfs and NFS.

16 Experimental Setup
The 64 compute nodes are naturally organized into two groups by rack ID, and the nodes of the two groups are mutually mapped for REMEM.
4 dedicated X2200 nodes are configured as PVFS2 servers.
Results were obtained for the NAS Parallel Benchmarks (NPB) version

17 REMEM Performance

18 Problem Size Scaling Performance

19 Task Scaling Performance

20 Adaptive Checkpointing Storage Selection
We simulate a cluster of 2048 nodes. For each node, we generate a series of failure arrivals following a Weibull distribution. MTBF = 7668 hours; shape parameter =
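The failure-arrival generation described on this slide can be sketched with Python's standard library. Assumptions: each node's failures form a renewal process with Weibull-distributed gaps (the node is treated as repaired instantly), and the Weibull scale is chosen so that the mean gap equals the 7668-hour MTBF. The function names and the shape value 0.7 are hypothetical placeholders, not the parameters used in the original simulation.

```python
import math
import random


def weibull_scale_for_mtbf(mtbf_hours: float, shape: float) -> float:
    """Scale that makes the Weibull mean equal the target MTBF.

    The mean of a Weibull(scale, shape) variable is scale * Gamma(1 + 1/shape).
    """
    return mtbf_hours / math.gamma(1.0 + 1.0 / shape)


def failure_arrivals(mtbf_hours, shape, horizon_hours, rng):
    """Failure times of one node: Weibull inter-arrival gaps, summed until
    the simulated horizon is passed (node assumed repaired instantly)."""
    scale = weibull_scale_for_mtbf(mtbf_hours, shape)
    times, t = [], 0.0
    while True:
        t += rng.weibullvariate(scale, shape)  # alpha = scale, beta = shape
        if t > horizon_hours:
            return times
        times.append(t)


def simulate_cluster(num_nodes, mtbf_hours, shape, horizon_hours, seed=0):
    """One list of failure times per node, reproducible via the seed."""
    rng = random.Random(seed)
    return [failure_arrivals(mtbf_hours, shape, horizon_hours, rng)
            for _ in range(num_nodes)]


# shape=0.7 is an illustrative placeholder, not the value used on the slide.
nodes = simulate_cluster(num_nodes=2048, mtbf_hours=7668.0, shape=0.7,
                         horizon_hours=8760.0, seed=1)
print(sum(len(ts) for ts in nodes), "node failures in one simulated year")
```

A shape below 1 gives a decreasing hazard rate (failures cluster early after a repair), while shape = 1 reduces to the exponential case; passing an explicit seeded `random.Random` keeps runs reproducible.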

21 Adaptive Checkpointing Storage Selection: Metrics
Rework cost
Checkpoint cost
Restart cost
Useful work
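To make these four metrics concrete, a toy accounting model can split a job's wall-clock time into useful work, checkpoint cost, rework cost, and restart cost. This is a hypothetical model for illustration, not the simulator used in these experiments. Assumptions: a checkpoint follows every compute segment except the last, a failure discards the segment in progress (counted as rework), time spent on an interrupted checkpoint write counts as checkpoint cost, and failures that land inside a restart are ignored. The names `run_job` and `TimeBreakdown` are hypothetical.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    useful_work: float = 0.0
    checkpoint: float = 0.0
    rework: float = 0.0
    restart: float = 0.0

    @property
    def total(self) -> float:
        return self.useful_work + self.checkpoint + self.rework + self.restart


def run_job(work, interval, ckpt_cost, restart_cost, failures):
    """Split one job's wall-clock time into the four metrics (toy model).

    All quantities are in hours; failures must be sorted absolute times.
    """
    b = TimeBreakdown()
    t = 0.0
    n_seg = math.ceil(work / interval)
    k = 0  # segments already protected by a checkpoint (or finished)
    while k < n_seg:
        last = k == n_seg - 1
        seg = work - interval * (n_seg - 1) if last else interval
        cost = 0.0 if last else ckpt_cost
        i = bisect_right(failures, t)  # first failure strictly after t
        f = failures[i] if i < len(failures) else math.inf
        if f >= t + seg + cost:
            # Segment and its checkpoint complete before the next failure.
            b.useful_work += seg
            b.checkpoint += cost
            t += seg + cost
            k += 1
        else:
            elapsed = f - t
            b.rework += min(elapsed, seg)            # lost computation
            b.checkpoint += max(0.0, elapsed - seg)  # interrupted checkpoint write
            b.restart += restart_cost
            t = f + restart_cost  # failures during the restart are skipped
    return b, t


b, wall = run_job(work=100.0, interval=5.0, ckpt_cost=0.2,
                  restart_cost=0.5, failures=[12.0, 47.5, 80.0])
print(b, round(wall, 2))
```

By construction the four components always sum to the returned wall-clock time, which makes the breakdown easy to plot as stacked bars.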

22 Adaptive Checkpointing Storage Selection: Performance with Different Numbers of Processes

23 Adaptive Checkpointing Storage Selection: Performance with Different Numbers of I/O Nodes

24 Adaptive Checkpointing Storage Selection: Performance with Different Checkpointing Intervals
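For context on how the checkpointing interval trades checkpoint cost against rework, a common first-order rule is Young's approximation, in which the optimal compute time between checkpoints is about sqrt(2 * delta * M), where delta is the time to write one checkpoint and M is the MTBF. This standard approximation is swapped in here for illustration; it is not the model used in these experiments, and it assumes independent, exponentially distributed failures. Under that same assumption, the sketch derives a system MTBF by dividing the 7668-hour per-node MTBF by the 2048 nodes. The checkpoint costs for disk and memory below are hypothetical values.

```python
import math


def system_mtbf(node_mtbf_hours: float, num_nodes: int) -> float:
    """System MTBF for independent, exponentially distributed node failures."""
    return node_mtbf_hours / num_nodes


def young_interval(ckpt_cost_hours: float, mtbf_hours: float) -> float:
    """Young's first-order optimal compute time between checkpoints."""
    if ckpt_cost_hours >= mtbf_hours:
        # Reject the regime where a checkpoint takes at least as long as the
        # MTBF: the approximation is meaningless there.
        raise ValueError("checkpoint cost must be smaller than the MTBF")
    return math.sqrt(2.0 * ckpt_cost_hours * mtbf_hours)


m = system_mtbf(7668, 2048)  # 3.744140625 hours
# Hypothetical checkpoint costs: a slower disk write vs. a faster memory write.
for label, delta in [("disk", 0.25), ("memory", 0.01)]:
    print(label, round(young_interval(delta, m), 3), "hours between checkpoints")
```

A cheaper checkpoint (for example, one written to remote memory) shortens the optimal interval, which in turn reduces the expected rework per failure.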

25 Future Work
Release the software.
More flexible node matching.
What will HPC checkpointing look like in the cloud?
Adopt MapReduce as checkpointing storage?

26 Conclusions
It is feasible to implement memory-based checkpointing seamlessly.
Remote memory is a promising alternative to disk as checkpointing storage.
Memory should be used in combination with disk to guarantee reliability while achieving efficiency.

27 Thanks! Questions?
