REMEM: REmote MEMory as Checkpointing Storage
1 REMEM: REmote MEMory as Checkpointing Storage. Hui Jin (Illinois Institute of Technology), Xian-He Sun (Illinois Institute of Technology), Yong Chen (Oak Ridge National Laboratory), Tao Ke (Illinois Institute of Technology). 12/20/2010, CloudCom
2 Outline: Background & Motivation; REMEM Design; Implementation of REMEM on Open MPI; Adaptive Checkpointing Storage Selection; Experimental Results; Conclusions & Future Work
3 Motivation: Checkpointing is a widely used mechanism for fault tolerance in high-performance computing environments. However, it introduces considerable overhead because of expensive I/O access. For a 1-petaFLOPS system, checkpointing can potentially reduce system performance by 50% [R. Oldfield et al., 2007]. The upcoming exascale computing environment poses even greater challenges: 10^18 FLOPS of computing power and millions of computing components. Checkpointing to a centralized parallel file system is not scalable. What if the MTBF is shorter than the checkpointing cost?
4 A Detailed Look at Checkpointing Cost. Source: J. Hursey et al., "Interconnect Agnostic Checkpoint/Restart in Open MPI", HPDC.
5 Motivation: Memory-based checkpointing is a promising way to break through the stable-storage bottleneck, but it is rarely supported by mainstream checkpointing systems. The obstacles: complexity, reliability concerns, and excess memory usage.
6 REMEM: REmote MEMory as Checkpointing Storage. Seamless integration with existing checkpointing systems. Flexible switching between disk and remote memory as checkpointing storage. Consideration of reliability and space efficiency.
7 REMEM Design Goals. Reliability: memory is volatile. Scalability: large-scale environments. Space efficiency: memory is precious. Transparency: augments existing systems. Flexibility: switch between disk and memory.
8 REMEM Design
9 REMEM Node Matching. Reliability: [formulas garbled in transcription; the surviving terms are binomial coefficients of the form C_n^k]. Source: Z. Chen et al., "Fault Tolerant High Performance Computing by a Coding Approach", PPoPP '05.
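The effect of node matching on reliability can be illustrated with a small calculation. This is a minimal sketch, not the formula from the original slide: it assumes a mutual pairwise mapping (each node keeps its partner's checkpoint in memory) and k simultaneous node failures drawn uniformly at random from n nodes. Under those assumptions, every failed node can be recovered from remote memory exactly when no pair loses both members, which has probability 2^k C(n/2, k) / C(n, k). The function names and the pairing rule `i ^ 1` are hypothetical.

```python
from itertools import combinations
from math import comb


def pair_survival_probability(n: int, k: int) -> float:
    """Probability that k random failures among n paired nodes
    never hit both members of the same pair (assumed model)."""
    if n % 2 != 0:
        raise ValueError("n must be even so that every node has a partner")
    if k < 0 or k > n:
        raise ValueError("k must be between 0 and n")
    # Pick k distinct pairs to be hit, then one of the two members in each.
    return (2 ** k) * comb(n // 2, k) / comb(n, k)


def brute_force(n: int, k: int) -> float:
    """Cross-check by enumerating every failure set.
    Node i is paired with node i ^ 1 (0 with 1, 2 with 3, ...)."""
    total = 0
    survivable = 0
    for failed in combinations(range(n), k):
        total += 1
        failed_set = set(failed)
        if all((node ^ 1) not in failed_set for node in failed):
            survivable += 1
    return survivable / total


if __name__ == "__main__":
    # 64 compute nodes, as in the experimental setup.
    for k in range(5):
        print(k, pair_survival_probability(64, k))
```

In this model a single failure is always recoverable from remote memory, and the probability falls as k grows, which is why the design keeps disk-based checkpoints as a fallback.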
10 REMEM System Configuration
11 REMEM: Failure Handling. If a failure occurs on the source node: if the backup node is healthy, simply recover from remote memory; if the backup node has also failed, load the image from the last disk-based checkpoint.
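The failure-handling rule can be sketched as a small decision function. This is an illustrative sketch only; the enum and function names are hypothetical and do not come from REMEM or Open MPI. The slide does not say what happens when only the backup node fails, so the sketch assumes no restart is needed in that case.

```python
from enum import Enum


class RecoverySource(Enum):
    NONE_NEEDED = "source node is healthy; no restart required"
    REMOTE_MEMORY = "restart from the checkpoint held in the backup node's memory"
    DISK = "restart from the last disk-based checkpoint"


def choose_recovery_source(source_failed: bool, backup_failed: bool) -> RecoverySource:
    """Pick where to load the checkpoint image from after a failure."""
    if not source_failed:
        # Assumption: a backup-only failure loses the in-memory copy
        # but does not force the source process to restart.
        return RecoverySource.NONE_NEEDED
    if not backup_failed:
        # Fast path: the backup node still holds the image in memory.
        return RecoverySource.REMOTE_MEMORY
    # Both nodes are gone: fall back to stable storage.
    return RecoverySource.DISK


if __name__ == "__main__":
    for source_failed in (False, True):
        for backup_failed in (False, True):
            choice = choose_recovery_source(source_failed, backup_failed)
            print(source_failed, backup_failed, choice.name)
```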
12 REMEM: Implementation on Open MPI. Open MPI is an open-source MPI-2 implementation that provides a high-performance, robust, parallel execution environment for a wide variety of computing environments. It supports transparent, coordinated checkpoint/restart, backed primarily by the BLCR library.
13 REMEM: Implementation on Open MPI
14 Adaptive Checkpointing Storage Selection. Disk: Memory:
15 Experimental Setup. Hardware: a 65-node SunFire cluster. Compute nodes: dual 2.3 GHz Opteron quad-core processors, 8 GB memory, and a 250 GB 7.2K-RPM SATA hard drive. OS: Ubuntu enterprise server with Linux kernel. Software: Open MPI v1.3.3 and GCC. REMEM was implemented on Open MPI with the support of tmpfs and NFS.
16 Experimental Setup. The 64 compute nodes are naturally organized into two groups by rack ID. The nodes of the two groups are mutually mapped for REMEM. Four dedicated X2200 nodes are configured as PVFS2 servers. Results were obtained with the NAS Parallel Benchmarks (NPB).
17 REMEM Performance
18 Problem Size Scaling Performance
19 Task Scaling Performance
20 Adaptive Checkpointing Storage Selection. Simulate a cluster of 2048 nodes. For each node, we generate a series of failure arrivals following a Weibull distribution. MTBF = 7668 hours; shape parameter = [value lost in transcription].
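A failure trace of this kind can be generated with the Python standard library. The sketch below is an assumption-laden illustration, not the authors' simulator: it treats the times between failures on a node as independent Weibull draws, derives the scale from the MTBF via mean = scale * Gamma(1 + 1/shape), and uses a placeholder shape of 0.7 because the slide's value did not survive. All function names and the one-year horizon are hypothetical.

```python
import math
import random


def weibull_scale(mtbf_hours: float, shape: float) -> float:
    """Scale parameter that gives a Weibull distribution the requested mean."""
    return mtbf_hours / math.gamma(1.0 + 1.0 / shape)


def failure_arrivals(mtbf_hours, shape, horizon_hours, rng):
    """Failure times (hours) for one node within [0, horizon_hours)."""
    scale = weibull_scale(mtbf_hours, shape)
    arrivals = []
    t = rng.weibullvariate(scale, shape)  # arguments: scale, then shape
    while t < horizon_hours:
        arrivals.append(t)
        t += rng.weibullvariate(scale, shape)
    return arrivals


def simulate_cluster(num_nodes=2048, mtbf_hours=7668.0, shape=0.7,
                     horizon_hours=24.0 * 365, seed=0):
    """One failure trace per node. shape=0.7 is a placeholder, not the slide's value."""
    rng = random.Random(seed)
    return [failure_arrivals(mtbf_hours, shape, horizon_hours, rng)
            for _ in range(num_nodes)]


if __name__ == "__main__":
    traces = simulate_cluster()
    total_failures = sum(len(trace) for trace in traces)
    print("nodes:", len(traces), "failures in one year:", total_failures)
```

Merging the per-node traces and sorting them gives the system-wide failure sequence that a checkpointing simulation would consume.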
21 Adaptive Checkpointing Storage Selection - Metrics: Rework Cost Checkpoint Restart Cost Useful Work
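One way to read these metrics is as a breakdown of wall-clock time. The sketch below is a simplified accounting model under stated assumptions, not the authors' simulator: the job checkpoints after every fixed interval of work (including the last one), a failure discards everything since the last completed checkpoint, failures that arrive during a restart are ignored, and checkpoint and restart time are tracked separately. The function and key names are hypothetical.

```python
def account_time(total_work, interval, ckpt_cost, restart_cost, failures):
    """Split wall-clock time into useful work, rework, checkpoint and restart.

    failures: sorted wall-clock failure times, in the same unit as the
    other arguments.
    """
    t = 0.0       # current wall-clock time
    done = 0.0    # work protected by a completed checkpoint
    rework = checkpoint = restart = 0.0
    idx = 0
    while done < total_work:
        segment = min(interval, total_work - done)
        end_of_work = t + segment
        end_of_ckpt = end_of_work + ckpt_cost
        # Skip failures that fell inside a restart window (assumption).
        while idx < len(failures) and failures[idx] < t:
            idx += 1
        if idx < len(failures) and failures[idx] < end_of_ckpt:
            f = failures[idx]
            idx += 1
            rework += min(f, end_of_work) - t          # work that must be redone
            checkpoint += max(0.0, f - end_of_work)    # interrupted checkpoint time
            restart += restart_cost
            t = f + restart_cost
        else:
            done += segment
            checkpoint += ckpt_cost
            t = end_of_ckpt
    return {"useful_work": done, "rework": rework,
            "checkpoint": checkpoint, "restart": restart, "wall_clock": t}


if __name__ == "__main__":
    print(account_time(10.0, 2.0, 0.5, 1.0, failures=[3.0]))
```

The four components always sum to the wall-clock time in this model, so a cheaper checkpoint or restart shows up directly as a larger share of useful work.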
22 Adaptive Checkpointing Storage Selection: Performance with Different Numbers of Processes
23 Adaptive Checkpointing Storage Selection: Performance with Different Numbers of I/O Nodes
24 Adaptive Checkpointing Storage Selection: Performance with Different Checkpointing Intervals
25 Future Work. Release the software. More flexible node matching. What does HPC checkpointing look like in the cloud? Adopt MapReduce as checkpointing storage?
26 Conclusions. It is feasible to implement memory-based checkpointing seamlessly. Remote memory is a promising alternative to disk as checkpointing storage. Memory should be used in combination with disk to guarantee reliability while achieving efficiency.
27 Thanks! Questions?