ECE7995 (7) Parallel I/O


2 Parallel I/O
From the user's perspective:
- Multiple processes or threads of a parallel program access data concurrently from a common file.
From the system's perspective:
- Files are striped across multiple I/O servers.
- The file system is designed to perform well for concurrent writes and reads (a parallel file system).
I/O requests are served in parallel, so you may receive good performance.
[Figure: compute nodes connected through an interconnect to I/O nodes]

3 A Scenario: Running an MPI Program with a Parallel File System
[Figure: MPI processes P0 ... Pn run on compute nodes CN0 ... CNn, which connect over the cluster network to data servers DS0 ... DSm and a metadata server (Meta-S)]

4 Parallel I/O Infrastructure
- HDF5, Parallel netCDF: map application abstractions onto storage abstractions and provide data portability.
- PVFS, Lustre, GPFS, PanFS: maintain the logical space and provide efficient access to data.
- MPI-IO, ROMIO: organize accesses from many processes, especially those using collective I/O.
- Underlying I/O hardware and storage devices.

5 What Are Parallel File Systems
- Store application data persistently
  - Usually extremely large datasets that can't fit in memory
- Provide a global shared namespace (files, directories)
- Designed for parallelism
  - Concurrent (often coordinated) access from many clients
- Designed for high performance
  - Operate over high-speed networks (IB, Myrinet)
  - Optimized I/O path for maximum bandwidth

6 Parallel File Systems
- Provide a directory tree all nodes can see (the global name space)
- Map data across many servers and drives (parallelism of access)
- Coordinate access to data so certain access rules are followed (useful semantics)

7 Data Distribution in Parallel File Systems

8 Data Distribution
- Round-robin (aka Simple Stripe in PVFS) is a reasonable default solution.
  - Works consistently for a variety of workloads
  - Works well on most systems
- Can you think of a system where this might not work so well?
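The round-robin distribution named on this slide can be sketched as a mapping from a file offset to a server and a server-local offset. This is a minimal illustration, not PVFS code: the names `stripe_loc` and `locate`, and the parameters `strip_size` and `nservers`, are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Location of one byte of a striped file: which server holds it and
 * where it sits inside that server's local portion of the file. */
typedef struct {
    uint32_t server;        /* index of the data server, 0 .. nservers-1 */
    uint64_t local_offset;  /* byte offset within that server's local data */
} stripe_loc;

/* Round-robin (simple stripe) mapping: consecutive strips of the file go
 * to consecutive servers, wrapping around after the last one.
 * strip_size and nservers are illustrative parameters; both must be nonzero. */
stripe_loc locate(uint64_t offset, uint64_t strip_size, uint32_t nservers)
{
    uint64_t strip = offset / strip_size;   /* global strip index */
    stripe_loc loc;
    loc.server = (uint32_t)(strip % nservers);
    loc.local_offset = (strip / nservers) * strip_size + offset % strip_size;
    return loc;
}
```

With a strip of 65536 bytes and 7 servers (values picked for illustration), offset 65536 lands on server 1 at local offset 0, and the first byte of the eighth strip wraps back to server 0.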

9 Data Distribution
- Clients perform writes/reads of a file at various regions.
- The regions usually depend on the application workload and the number of tasks.

10 PVFS - Parallel Virtual File System
- An open-source parallel file system
- Brings state-of-the-art parallel I/O concepts to production parallel systems
- Designed to scale to petabytes of storage and provide access rates of 100s of GB/s
- Developed by the Parallel Architecture Research Lab at Clemson University (since 1993), MCS of ANL, and the Ohio Supercomputer Center
- Major developers: Walt Ligon, Rob Ross, Phil Carns, Pete Wyckoff, Neil Miller, Rob Latham, Sam Lang, Brad Settlemyer
- Current stable release (PVFS2):

11 PVFS Features
- Performance
  - Designed to provide high performance for parallel applications, where concurrent, large I/O and many file accesses are common
  - Dynamic distribution of I/O and metadata, avoiding single points of contention
- Optimizations
  - Integration with HPC interfaces (MPI-IO)
  - Non-contiguous accesses
- Easy deployment
  - Hardware independent
  - Mostly userspace (small Linux kernel module)
  - Proven production environment
- Good research platform
  - Much research in parallel I/O has used PVFS

12 IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination

13 Outline
- A Motivation Example
- Design of IOrchestrator
- Performance Evaluation
- Related Work
- Conclusions

14 A Motivation Example
Experimental setting:
- MPICH compiled with ROMIO
- 6 compute nodes, 7 data servers, 1 metadata server
- PVFS with the default striping configuration
- CFQ disk scheduler
The benchmark: mpi-io-test
- Run two instances of the benchmark.
- Each instance collectively reads a 10 GB file with a 64 KB request size.
- Each running instance spawns five processes.
- The processes access contiguous data.

15 I/O Request Generation
Request-generation loop of the benchmark:

    for (j = 0; j < opt_iter; j++) {
        err = MPI_File_read_all(fh, buf, nchars, MPI_CHAR, &status);
    }

[Figure: requests from Instance 1 and Instance 2 across iterations 0, 1, and 2]

16 Work-conserving I/O Schedulers on a Single Disk
- Disk head thrashing!
[Figure: requests from Process 1 and Process 2 on disk, with the new position of the disk head]

17 Non-work-conserving Scheduling for One Disk
- Requests are efficiently served in a non-work-conserving manner.
[Figure: requests from Process 1 and Process 2 on disk, with the position and new position of the disk head]
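The difference between the two single-disk pictures can be made concrete by summing head travel for a given service order. The sketch below is a toy model under stated assumptions: positions are abstract block numbers, seek cost is taken as proportional to distance, and `head_travel` is a hypothetical helper, not part of any scheduler.

```c
#include <stddef.h>
#include <stdint.h>

/* Total distance the disk head travels when requests at the given block
 * positions are served in array order, starting from position `start`. */
uint64_t head_travel(uint64_t start, const uint64_t *pos, size_t n)
{
    uint64_t total = 0, head = start;
    for (size_t i = 0; i < n; i++) {
        /* absolute distance between the head and the next request */
        total += pos[i] > head ? pos[i] - head : head - pos[i];
        head = pos[i];
    }
    return total;
}
```

For two processes reading sequentially near blocks 0 and 1000, the alternating order {0, 1000, 1, 1001, 2, 1002} costs 4998 blocks of travel from a head at 0, while serving each process's run in turn, {0, 1, 2, 1000, 1001, 1002}, costs 1002.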

18 Analysis of On-Disk Data Accesses
- The disk is accessed in random order even though each instance's data accesses have ample spatial locality.

19 Non-work-conserving Scheduling for Multiple Disks
- The scheduler anticipates requests that access nearby disk areas.
- The next request from the same instance does not arrive quickly, because other requests from the same collective I/O call are still being served or are pending at other servers.
- Anticipation fails! Disk heads have to seek to serve the remote pending requests.
[Figure: requests from Instance 1 and Instance 2 in iteration 0, with the position and new position of the disk head]

20 Coordinating Data Accesses on Disks
- Dedicated use of the disks through server coordination.
- Anticipation succeeds!
[Figure: requests from Instance 1 and Instance 2 in iterations 0, 1, and 2, with the new position of the disk head]

21 Issues with Dedicated Use of Disks
- But what if the program takes a long think time to issue its next requests?
- The cost of idle waiting might be larger than the cost of long disk-head seeks.
[Figure: requests from Instance 1 and Instance 2 in iteration 0, with the position of the disk head]

22 Issues with Dedicated Use of Disks (Cont'd)
- But what if requests are not evenly distributed over all the disks?
[Figure: requests from Instance 1 and Instance 2 in iteration 0, with the position of the disk head]

23 Outline
- A Motivation Example
- Design of IOrchestrator
- Performance Evaluation
- Related Work
- Conclusions

24 IOrchestrator: Orchestration of Data Access on Multi-node I/O Systems
Objectives:
- Recover the spatial locality of a parallel program that runs in a shared multi-node storage cluster.
- System performance should not be compromised.
Challenges:
- How to track the spatial locality and think time of each program?
- How to determine the cost effectiveness of dedicated service for programs?
- How to implement the data access orchestration in a parallel file system such as PVFS2?

25 Measurement of Spatial Locality and a Program's Reuse Distance
Two concepts:
- Spatial locality (SL): the average distance of disk head movement for serving a request.
- Program's reuse distance (RD): the think time between two requests from the same program that hit a data server.
Measurements:
- To quantify spatial locality and reuse distance statistically, we use a method similar to the one developed in Linux.
- It smooths out short-term dynamics accurately.
- It phases out historical statistics quickly.
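The slide does not give the measurement formula. One common way to meet both stated goals (smooth short-term dynamics, phase out history quickly) is an exponentially decayed average, which is assumed here purely for illustration; the type `decayed_avg`, the function `decayed_avg_update`, and the weight `alpha` are hypothetical names, not taken from IOrchestrator or the Linux kernel.

```c
/* Running statistic updated once per served request
 * (for example a seek distance or a think time). */
typedef struct {
    double mean;     /* decayed average of the samples seen so far */
    int    primed;   /* 0 until the first sample arrives */
} decayed_avg;

/* alpha in (0, 1]: weight of the newest sample (hypothetical tuning knob).
 * A larger alpha forgets history faster; a smaller alpha smooths more. */
void decayed_avg_update(decayed_avg *s, double sample, double alpha)
{
    if (!s->primed) {
        s->mean = sample;       /* first sample initializes the average */
        s->primed = 1;
    } else {
        s->mean = (1.0 - alpha) * s->mean + alpha * sample;
    }
}
```

With `alpha = 0.125`, a first sample of 100 followed by a sample of 200 moves the average to 112.5, so a single outlier shifts the statistic only slightly while a sustained change takes over within a few tens of samples.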

26 Eligible Programs for Dedicated Service
- The spatial locality condition: the standard deviation of spatial locality is small.
  $\sigma_{SL} \le 20\%$
- The benefit condition: the average disk seek distance across all the programs sharing the data server is large enough.
  $\left(\sum_{i=0}^{n} SL_{i}\right) / \sum_{i=0}^{n} SL_{ij} \ge 1.5$
- The cost-effectiveness condition: the average reuse distance is smaller than the disk seek time.
  $\left(\sum_{i=0}^{n} RD_{ij}\right) / n < SeekTime$
Symbols: $\sigma_{SL}$ is the standard deviation of SL; SeekTime is the disk seek time derived from SL; RD is the reuse distance; SL is the spatial locality.
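The three conditions on this slide can be combined into a single eligibility test. The sketch below is one reading of the slide, not the authors' implementation: the comparison directions follow the prose ("small", "large enough", "smaller than"), the thresholds 20% and 1.5 are the values printed on the slide, and the function name `eligible` and its parameters are hypothetical. The benefit ratio is passed in precomputed so that the code does not commit to how the two sums are formed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decide whether a program qualifies for dedicated service on a data server.
 *   sigma_sl      : standard deviation of spatial locality, as a fraction (0.20 = 20%)
 *   benefit_ratio : ratio of seek distances from the benefit condition
 *   rd, n         : reuse-distance samples (same time unit as seek_time)
 *   seek_time     : disk seek time derived from SL */
bool eligible(double sigma_sl, double benefit_ratio,
              const double *rd, size_t n, double seek_time)
{
    if (n == 0)
        return false;                 /* no history yet: stay ineligible */

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += rd[i];
    double mean_rd = sum / (double)n;

    return sigma_sl <= 0.20           /* spatial locality condition */
        && benefit_ratio >= 1.5       /* benefit condition */
        && mean_rd < seek_time;       /* cost-effectiveness condition */
}
```

A program fails the test as soon as any one condition is violated, for example when its mean reuse distance reaches the seek time, which matches the concern that idle waiting can cost more than a long seek.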

27 Scheduling of Programs
What is an object for scheduling?
- Each eligible program is a running object for dedicated I/O service.
- Ineligible programs together constitute one object for I/O service.
How are the time slices for scheduling determined?
- The window size is fixed at 500 ms.
- Each object receives a portion of the window as the time slice for its dedicated service.
- An object's share of each window is inversely proportional to the percentage of its average reuse distance over the sum of the distances of all objects.
How is starvation avoided?
- Programs with poor SL or longer RD are serviced together in a dedicated time slice.
- Their share is determined by their combined data access efficiency on disks.
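The time-slice rule can be sketched as follows. Since each object's share is inversely proportional to its fraction of the total reuse distance, the shares reduce to weights of 1/RD normalized over all objects. This is a simplified illustration: the names `WINDOW_MS` and `time_slices` are hypothetical, and the slice for the group of ineligible programs, which the slide bases on combined access efficiency, is not modeled.

```c
#include <stddef.h>

#define WINDOW_MS 500.0   /* fixed scheduling window from the slide */

/* Fill slice_ms[i] with the part of the window given to object i.
 * rd[i] is the object's average reuse distance and must be > 0.
 * Share_i = (1 / rd[i]) / sum_k (1 / rd[k]), so shorter reuse
 * distances earn longer slices and the slices sum to the window. */
void time_slices(const double *rd, double *slice_ms, size_t n)
{
    double inv_sum = 0.0;
    for (size_t i = 0; i < n; i++)
        inv_sum += 1.0 / rd[i];
    for (size_t i = 0; i < n; i++)
        slice_ms[i] = WINDOW_MS * (1.0 / rd[i]) / inv_sum;
}
```

For two objects with average reuse distances of 1 and 3 (in any common unit), the slices come out at about 375 ms and 125 ms.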

28 IOrchestrator Architecture
[Figure: architecture diagram. Labels: compute nodes (with instrumented MPI library), "mpdlistjobs", IDs of open files, metadata server (program-files, orchestrator), data servers (each with locality, ischeduler, disk I/O scheduler, and hard disk)]

29 Outline
- A Motivation Example
- Design of IOrchestrator
- Performance Evaluation
- Related Work
- Conclusions

30 Performance Evaluation: Benchmarks
Name | Access Pattern | Source
mpi-io-test | contiguous data sets | PVFS2 software package
ior-mpi-io | non-contiguous data sets | the ASCI Purple benchmark suite developed at LLNL
mpi-tile-io | data access in a tile-by-tile fashion | the Parallel I/O Benchmarking Consortium at ANL
noncontig | data access with vector-derived MPI data type | the Parallel I/O Benchmarking Consortium at ANL
hpio | diverse set of access patterns | Northwestern University and SNL

31 Homogeneous Workloads (w/o Collective I/O)
- IOrchestrator improves the I/O throughput of the entire file system by up to 89%, and by 43% on average.
- For the mpi-io-test benchmark, IOrchestrator increases I/O throughput by 57% for reads and 37% for writes.

32 On-disk Data Access
- The disk head frequently alternates between two disk regions.
- Without IOrchestrator, CFQ does not preserve spatial locality.

33 Reuse Distances
- Without IOrchestrator, there are many very large reuse distances.
- With IOrchestrator, reuse distances are significantly reduced, and CFQ turns the strong locality in the program into efficient disk access.

34 Homogeneous Workloads (w/ Collective I/O)
- IOrchestrator improves the I/O throughput of the entire file system by up to 63%, and by 28% on average.
- For the ior-mpi-io benchmark, I/O throughput is significantly reduced when collective I/O is used, because of unbalanced workloads.

35 Performance of Heterogeneous Workloads
- Even time slicing improves the system throughput by 17%.
- IOrchestrator improves the system throughput by 47%.

36 Effect of File Distances among Programs
- The throughputs are improved by 33%, 64%, 147%, and 147%, respectively.

37 Impact of Scheduling Window Size
- The throughputs are improved by 40%, 48%, 58%, and 59%, respectively, with the selected window sizes.

38 Conclusions
- Coordinating data accesses across data servers is critical to preserving the spatial locality of a parallel program.
- We designed and implemented IOrchestrator on PVFS2 to coordinate request scheduling across data servers according to the monitored access behaviors of programs.
- Our experiments with representative benchmarks show that IOrchestrator increases I/O throughput by 39% on average.


What is a file system

What is a file system COSC 6397 Big Data Analytics Distributed File Systems Edgar Gabriel Spring 2017 What is a file system A clearly defined method that the OS uses to store, catalog and retrieve files Manage the bits that

More information

Benefits of Quadrics Scatter/Gather to PVFS2 Noncontiguous IO

Benefits of Quadrics Scatter/Gather to PVFS2 Noncontiguous IO Benefits of Quadrics Scatter/Gather to PVFS2 Noncontiguous IO Weikuan Yu Dhabaleswar K. Panda Network-Based Computing Lab Dept. of Computer Science & Engineering The Ohio State University {yuw,panda}@cse.ohio-state.edu

More information

Coordinating Parallel HSM in Object-based Cluster Filesystems

Coordinating Parallel HSM in Object-based Cluster Filesystems Coordinating Parallel HSM in Object-based Cluster Filesystems Dingshan He, Xianbo Zhang, David Du University of Minnesota Gary Grider Los Alamos National Lab Agenda Motivations Parallel archiving/retrieving

More information

CSE6230 Fall Parallel I/O. Fang Zheng

CSE6230 Fall Parallel I/O. Fang Zheng CSE6230 Fall 2012 Parallel I/O Fang Zheng 1 Credits Some materials are taken from Rob Latham s Parallel I/O in Practice talk http://www.spscicomp.org/scicomp14/talks/l atham.pdf 2 Outline I/O Requirements

More information

Short Talk: System abstractions to facilitate data movement in supercomputers with deep memory and interconnect hierarchy

Short Talk: System abstractions to facilitate data movement in supercomputers with deep memory and interconnect hierarchy Short Talk: System abstractions to facilitate data movement in supercomputers with deep memory and interconnect hierarchy François Tessier, Venkatram Vishwanath Argonne National Laboratory, USA July 19,

More information

S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems

S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems : Smart Selective SSD Cache for Parallel I/O Systems Shuibing He, Xian-He Sun, Bo Feng Department of Computer Science Illinois Institute of Technology Chicago, IL 6616 {she11, sun, bfeng5}@iit.edu Abstract

More information
