Hint Controlled Distribution with Parallel File Systems
Hipolito Vasquez Lucas and Thomas Ludwig
Parallele und Verteilte Systeme, Institut für Informatik, Ruprecht-Karls-Universität Heidelberg, 69120 Heidelberg, Germany
{hipolito.vasquez,

Abstract. The performance of scientific parallel programs with high file I/O activity running on cluster computers strongly depends on the qualitative and quantitative characteristics of the requested I/O accesses. It also depends on the corresponding mechanisms and policies used at the parallel file system level. This paper presents the motivation and design of a set of MPI-IO hints. These hints are used to select the distribution function with which a parallel file system manipulates an opened file. The implementation of a new physical distribution function called varstrip_dist is also presented. This function is proposed based on spatial characteristics of I/O access patterns observed at the application level.

1 Introduction

Hard disks offer a cost-effective solution for secondary storage, but, mainly for mechanical reasons, their access time has not kept pace with the speed development of processors. Disk and microprocessor performance have evolved at different rates [1]. This difference in development at the hardware level is one of the main causes of the so-called I/O bottleneck problem [3] in disk-based computing systems. The performance of I/O-intensive scientific applications, which move huge amounts of data between primary and secondary storage, suffers heavily from this bottleneck. The performance of such an application depends on the architecture of the I/O subsystem and on the way the application uses it, which is inherent to the application's nature. In order to design computing systems that keep the cost advantages of hard disks and at the same time favor the I/O-intensive scientific applications running on them, the parallel I/O approach [4] has been adopted.
This approach consists in arranging a set of disks over which files are striped or declustered [2]. By applying this mechanism, applications take advantage of the resulting aggregated throughput. A Beowulf cluster computer [5] in which many nodes have their own hard disk inherently constitutes an appropriate hardware testbed for supporting parallel I/O. In order to make this hardware-level parallelism visible to the applications, corresponding parallel I/O operations must be supported at the file system and middleware levels. Two implementations that fulfill these tasks are the PVFS2 [6] parallel file system and the ROMIO [7] library, an implementation of the MPI-IO part of MPI-2 [16].

ROMIO accepts so-called hints, which are communicated via the info argument of the functions MPI_File_open, MPI_File_set_view, and MPI_File_set_info. Their main purpose is to communicate information that may improve the performance of the I/O subsystem. A hint is represented by a key-value pair and mainly concerns parameters for striping, collective I/O, and access patterns. In this work we propose a set of hints, which we call distribution hints. This set gives the user the opportunity to choose the type of physical distribution function [20] to be applied by the PVFS2 parallel file system when manipulating an opened file. After choosing the type of distribution function, the user can set its corresponding parameters. Assigning a value to the strip size, for example, requires information on the type of distribution function to which this parameter belongs.

To augment the set of distribution functions that can be controlled via distribution hints, we also propose the new varstrip_dist distribution for PVFS2. We propose this distribution function taking into consideration the characteristics of spatial I/O access patterns generated by scientific parallel applications. By using the varstrip distribution, programmers can control the throughput or the degree of load balancing in a PVFS2-ROMIO-based I/O subsystem, thus influencing the performance of their MPI-IO-based application.

B. Di Martino et al. (Eds.): EuroPVM/MPI 2005, LNCS 3666, pp , 2005. © Springer-Verlag Berlin Heidelberg 2005

2 Parallel I/O Access Patterns

2.1 Introduction

Our objective in this section is to present an abstract set of spatial I/O access patterns at the application level, together with their parameters. These patterns represent the assignment of storage areas of a logical file to the single-threaded processes of a parallel program.
This assignment is known as logical file partitioning [8]. The logical file can be interpreted as a one-dimensional array of data blocks, whose smallest granularity is one byte. We use this set of patterns as a reference model in order to propose distribution functions for the PVFS2 parallel file system. We have summarized these characteristics based on studies of I/O-intensive parallel scientific applications running mainly on multiprocessor systems [1], [12], [14]. These patterns depend on the application's nature [15], but they are also conditioned by the application programming interface being used and by the way this interface is used. ROMIO's interface, for example, offers four different levels at which a request pattern can be communicated to the I/O subsystem. Each of these levels might have different performance implications for the application [13].

2.2 Parameters

We use the following parameters to characterize a spatial I/O access pattern: request size, type of operation, and sequentiality.
We differentiate between absolute and relative request sizes. An absolute request size, R, is the number of bytes requested from the perspective of each involved process within a parallel program. R can be uniform or variable across processes. In order to express the relative size of R, we define M_size as the main memory size of the compute node on which the accessing process runs. Taking M_size as reference, we distinguish the types of relative sizes shown in Table 1.

Table 1. Relative Sizes of R

Condition                      Relative Size
R < M_size · .5                Small
M_size · .5 < R < M_size       Medium
R > M_size                     Big

Requests are also characterized by the type of operation they perform. In this work we consider essentially read and write operations. The main criterion that we use to characterize the set of spatial access patterns in this work is the sequentiality from the program's perspective. We consider two types in particular: partitioned and interleaved [11]. Partitioned sequentiality appears when the processes collectively access the entire file in disjoint sequential segments. No area of the file is shared by two processes. Interleaved sequentiality appears when the accesses of every process are strided, or noncontiguous, and together form a global sequential pattern.

2.3 Spatial Patterns

Figure 1 shows snapshots of five spatial patterns. The circles represent processes running within a common program that are accessing a common logical file, and the arrows stand for any type of operation. Pattern 0 represents non-MPI parallel I/O to multiple files, where every process is sequential with respect to I/O. This pattern has drawbacks such as the lack of a single logical view of the entire data set, the difficulty of managing the number of files, and a dependency on the original number of processes. Since it can be generated using language I/O [17], it will often be applied. Patterns 1 through 4 are MPI parallel I/O variants.
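The two kinds of sequentiality defined above can be made concrete with a small offset calculation. The following is only a minimal sketch, assuming a file divided into equally sized blocks; the function names and the parameters `blocks_per_proc` and `block_size` are hypothetical and do not come from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Partitioned sequentiality (sketch): each process owns one contiguous
 * segment of blocks_per_proc blocks. Returns the byte offset of the k-th
 * block accessed by process `rank`. */
int64_t partitioned_block_offset(int rank, int64_t k,
                                 int64_t blocks_per_proc, int64_t block_size)
{
    assert(rank >= 0 && k >= 0 && k < blocks_per_proc && block_size > 0);
    return ((int64_t)rank * blocks_per_proc + k) * block_size;
}

/* Interleaved sequentiality (sketch): consecutive blocks of one process
 * lie nprocs blocks apart, so all processes together cover the file
 * sequentially. Returns the byte offset of the k-th block of `rank`. */
int64_t interleaved_block_offset(int rank, int64_t k,
                                 int nprocs, int64_t block_size)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    assert(k >= 0 && block_size > 0);
    return (k * nprocs + rank) * block_size;
}
```

With four processes and 100-byte blocks, process 1 in the partitioned case starts at the beginning of its own segment and moves forward block by block, whereas in the interleaved case its successive blocks are four blocks apart.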
The main advantage of Patterns 1 through 4 is that they offer the user a single logical view of the file. Patterns 1 and 3 fall into the category of global partitioned sequentiality, whereas 2 and 4 are variants of interleaved global sequentiality. Pattern 4 appears when each process accesses the file in a noncontiguous manner. This happens when parallel scientific applications access multidimensional data structures. It can be generated by calling the darray or the subarray function of the MPI-2 interface. We call Pattern 4 irregular because it is the result of irregularly distributed arrays. In such a pattern each process has a data array and a map array, which indicates the position in the file of the corresponding data in the data array. Such a pattern can be expressed using the MPI-2 interface through MPI_Type_create_indexed_block. It can also be generated unknowingly by using darray in cases where the size of
the array in any dimension is not evenly divisible by the number of processes in that dimension. For this kind of access, load balancing is an issue.

Fig. 1. Parallel I/O Application Patterns

3 Distribution Functions in PVFS2

File distribution, physical distribution, or simply distribution, is a set of methods describing a mapping from a logical sequence of bytes to a physical layout of bytes on PVFS2 I/O servers, which we here simply call I/O nodes. These functions are similar to the declustering or striping methods used to scatter data across many disks, such as in RAID systems [18]. One of these functions is the round robin scheme, which is implemented in PVFS2. In the context of PVFS2, the logical file consists of a set of strips of size ss, which are stored in a contiguous manner on I/O servers [9]. These strips are stored in datafiles [19] on I/O nodes through a distribution function.

4 A Set of Distribution Hints

To ease the discussion in this section, we define an I/O cluster as a Beowulf cluster computer in which every physical node has a secondary storage device. The default distribution function in PVFS2 is the so-called simple_stripe, a round robin mechanism that uses a fixed value of 64 KB for ss. Suppose that PVFS2 is configured on an I/O cluster such that each node is a compute node and an I/O node at the same time, and that on top of this configuration an application generates pattern 1. Under these circumstances simple_stripe might penalize some strips by sending them over the network, thus slowing down I/O operations. In this work we propose the varstrip distribution. Our approach consists in reproducing pattern 1 at each level of the software stack down to the raw hardware. The varstrip distribution therefore does not scatter strips over I/O nodes in a RAID manner, but instead it guarantees that each compute node accesses
only its own local hard disk. Furthermore, the strip size to be stored on or retrieved from an I/O node can be defined. The varstrip distribution allows the definition of flexible strip sizes that can be assigned to a given datafile number, thus influencing the degree of load balancing among the different I/O servers.

Fig. 2. Software Stack Environment for Distribution Hints (layers: parallel I/O intensive applications, MPI, MPI-IO, PVFS2, I/O hardware)

In order to control the parameters of any distribution function from an MPI program running on a software stack similar to that shown in Figure 2, we introduce distribution hints. The purpose of such a hint is to select not only a type of distribution function, but also its parameters. The hint key must have the following format:

    <distribution name>:<parameter type>:<parameter name>

At the moment the user can choose, using this format, the following functions: basic_dist, simple_stripe, and varstrip_dist. By choosing the first one, the user saves the data on one single I/O node. The second applies the round robin mechanism with a strip size of 64 KB. These functions are already part of the standard set of distributions in PVFS2. By selecting our proposed varstrip_dist function, the user can influence the throughput or the amount of data to be assigned to the I/O nodes when manipulating an opened file.

In the hint key the parameter name must be given together with its type, so that ROMIO and PVFS2 can handle it. Currently the strip_size parameter, of type int64, is supported for simple_stripe. The parameter strips is supported for varstrip_dist. This parameter represents the assignment between datafile numbers and strip sizes. The following piece of code shows the usage of varstrip_dist.

    MPI_Info_set(theinfo, "distribution_name", "varstrip_dist");
    /* Throughput */
    MPI_Info_set(theinfo, "varstrip_dist:string:strips", ":1;1:1");
    /* Load Balancing */
    MPI_Info_set(theinfo, "varstrip_dist:string:strips", ":8;1:1");
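To illustrate how a strips value relates datafile numbers to strip sizes, here is a minimal sketch in plain C that needs neither MPI nor PVFS2. It rests on assumptions that the text does not confirm: the value is taken to be a list of `datafile:size` pairs separated by semicolons, and the list is taken to repeat cyclically over the logical file. The names `strip_t`, `parse_strips`, `varstrip_datafile`, and `simple_stripe_datafile` are hypothetical and are not part of PVFS2 or ROMIO.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_STRIPS 16

typedef struct {
    int     datafile;  /* datafile (I/O node) number */
    int64_t size;      /* strip size in bytes */
} strip_t;

/* Parse "datafile:size;datafile:size;..." into strips[].
 * Returns the number of pairs parsed, or -1 on malformed input. */
int parse_strips(const char *s, strip_t *strips, int max)
{
    int n = 0;
    while (*s != '\0') {
        int df, used = 0;
        long long sz;
        if (n == max || sscanf(s, "%d:%lld%n", &df, &sz, &used) != 2
            || df < 0 || sz <= 0)
            return -1;
        strips[n].datafile = df;
        strips[n].size = sz;
        n++;
        s += used;
        if (*s == ';')
            s++;                 /* skip the pair separator */
        else if (*s != '\0')
            return -1;           /* unexpected trailing character */
    }
    return n;
}

/* Assumed varstrip-like mapping: the strip list repeats cyclically, and
 * each logical offset falls into exactly one strip of one cycle. */
int varstrip_datafile(const strip_t *strips, int n, int64_t logical_offset)
{
    int64_t cycle = 0;
    assert(n > 0 && logical_offset >= 0);
    for (int i = 0; i < n; i++)
        cycle += strips[i].size;
    int64_t pos = logical_offset % cycle;
    for (int i = 0; i < n; i++) {
        if (pos < strips[i].size)
            return strips[i].datafile;
        pos -= strips[i].size;
    }
    return -1;                   /* not reached for valid input */
}

/* Round robin with a fixed strip size, in the spirit of simple_stripe. */
int simple_stripe_datafile(int64_t logical_offset, int64_t strip_size,
                           int num_datafiles)
{
    assert(logical_offset >= 0 && strip_size > 0 && num_datafiles > 0);
    return (int)((logical_offset / strip_size) % num_datafiles);
}
```

With the made-up value "0:8000;1:1000", the first 8000 bytes of every 9000-byte cycle map to datafile 0 and the remaining 1000 bytes to datafile 1, which is one way to picture how unequal strip sizes shift the load between I/O nodes.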
5 Experiments

5.1 Testbed

The hardware testbed used for the implementation and tests was an I/O cluster consisting of 5 SMP nodes (,..). Each node had two hyper-threaded Xeon processors running at 2 GHz, a main memory size of 1 GB, and an 8 GB hard disk. These nodes were networked using a store-and-forward Gigabit Ethernet switch. The operating system used was Linux with kernel On top of this operating system we installed version 1.0.1 of PVFS2 and MPICH2. PVFS2 was running on top of an ext3 file system, and every node was configured both as client and as server. The node called was configured as the metadata server.

5.2 Objective

The purpose of the measurements was to compare the bandwidth observed at the nodes when using the varstrip distribution with the bandwidth observed when using pattern 0 or two variants of the round robin PVFS2 distribution: the default distribution function with a strip size of 64 KB, and a variant that we call simple_stripe with fitted strip size. This variant resulted from setting the same value for R, ss, and datafile. When using the fitted simple_stripe, a compute node did not necessarily access its own secondary storage device.

5.3 Measurements

Figures 3, 4, and 5 show the bandwidths (y-axes) calculated from the times measured before and after MPI_File_write or MPI_File_read operations. One single process was started per node. Each process issued read and write requests of small and medium size R, following pattern 1. The request sizes (R < 1 GB) are shown on the x-axes.

Fig. 3. Measured Bandwidth: Pattern 1, varstrip_dist (x-axis: request size in MBytes; y-axis: bandwidth in MB/s; series: reads and writes)
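The bandwidth calculation described above can be sketched as a small helper. This is only an illustration under stated assumptions: the paper does not say how the timestamps were taken (MPI_Wtime would be one option in an MPI program), nor whether a megabyte means 2^20 or 10^6 bytes; the sketch assumes 2^20, and the function name `bandwidth_mb_per_s` is hypothetical.

```c
#include <assert.h>

/* Bandwidth in MB/s for `bytes` bytes transferred between two timestamps
 * given in seconds, e.g. taken immediately before and after a write or
 * read call. Assumption: 1 MB = 1024 * 1024 bytes. */
double bandwidth_mb_per_s(double bytes, double t_start, double t_end)
{
    assert(bytes >= 0.0 && t_end > t_start);
    return (bytes / (1024.0 * 1024.0)) / (t_end - t_start);
}
```

For example, a request of 104857600 bytes that completes two seconds after it was issued corresponds to 50 MB/s under this convention.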
Fig. 4. Measured Bandwidth: Pattern 1, simple_stripe, fitted strip size (x-axis: request size in MBytes; y-axis: bandwidth in MB/s; series: reads and writes)

Fig. 5. Measured Bandwidth: Pattern 1, simple_stripe (x-axis: request size in MBytes; y-axis: bandwidth in MB/s; series: reads and writes)

For comparison purposes, the same types of operations and values of R were requested at the application level using the UNIX write and read functions. The data was saved to or retrieved from the local ext3 file system directly on the involved nodes, following pattern 0. The corresponding values are presented in Figure 6. For pattern 0 the measured bandwidth at the nodes was approximately 5 MB/s and 4 MB/s for read and write operations respectively. The bandwidth for write operations of s hard disk was 3 MB/s. These results correlate with similar tests made with the bonnie++ benchmarking program.

Fig. 6. Bandwidth obtained using the UNIX interface (x-axis: request size in MBytes; y-axis: bandwidth in MB/s; series: reads and writes)

Using the values obtained for pattern 0 as reference, we obtained only 55% and 4% of the reference performance for write and read accesses respectively when using the default function simple_stripe, as presented in Figure 5. It was the only case where the bandwidth of write operations was better than that of read operations. With the fitted strip size for the simple_stripe function, performances of approximately 75% and 8% were measured for write and read operations respectively. Since the compute nodes were not necessarily using their own local hard disks, accessed the hard disk of during reading operations as shown in Figure 4. The node, also the metadata server, used its own local disk during read operations. Figure 3 presents the performance observed when using our proposed varstrip distribution. The bandwidth reached 8% and 1% of the reference bandwidth for write and read operations respectively.

6 Conclusion and Future Work

In this paper we have described a set of MPI-IO hints with which the user can select a certain distribution function of the PVFS2 parallel file system and its corresponding parameters. We have also described the varstrip distribution function. This function is proposed taking into consideration pattern 1, a parallel I/O spatial pattern that appears at the application level. For this type of workload the varstrip distribution performs better than the other distribution functions, as shown by the experiments. Furthermore, by selecting varstrip the user can control the degree of load balancing among the I/O servers.

Our future work consists in implementing other distribution functions and constructing a matrix of pattern-distribution pairs, which will provide information about the functions best suited to particular application patterns. During this process we shall find out for which pattern and configuration simple_stripe performs best and how well varstrip_dist performs with other patterns as workload.

Acknowledgment

We thank Tobias Eberle and Frederik Grüll for the implementations, and Sven Marnach, our cluster administrator.
Additionally, we would like to acknowledge the Department of Education of Baden-Württemberg, Germany, for supporting this work.

References

1. Patterson, David A., Chen, Peter M.: Storage Performance - Metrics and Benchmarks. (1998)
2. Patterson, David A., Chen, Peter M.: Maximizing Performance in a Striped Disk Array. Proc. 17th Annual Symposium on Computer Architecture (17th ISCA '90), Computer Architecture News. (1990)
3. Hsu, W. W., Smith, A. J.: Characteristics of I/O traffic in personal computer and server workloads. IBM Syst. J. 42 (2003)
4. Hsu, W. W., Smith, A. J.: The performance impact of I/O optimizations and disk improvements. IBM Journal of Research and Development. 48 (2004)
5. Sterling, T.: An Overview of Cluster Computing. Beowulf Cluster Computing with Linux. (2002)
6. PVFS2 URL:
7. ROMIO URL:
8. Ligon, W.B., Ross, R.B.: Implementation and Performance of a Parallel File System for High Performance Distributed Applications. Proceedings of the Fifth IEEE International Symposium on High Performance Distributed Computing. (1996)
9. Ross, Robert B., Carns, Philip H., Ligon III, Walter B., Latham, Robert: Using the Parallel Virtual File System. (2002)
10. Madhyastha, Tara M.: Automatic Classification of Input/Output Access Patterns. PhD Thesis. (1997)
11. Madhyastha, Tara M., Reed, Daniel A.: Exploiting Global Input/Output Access Pattern Classification. Proceedings of SC97: High Performance Networking and Computing. (1997)
12. Thakur, Rajeev, Gropp, William, Lusk, Ewing: On implementing MPI-IO portably and with high performance. Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems (IOPADS-99). (1999)
13. Thakur, Rajeev S., Gropp, William, Lusk, Ewing: A Case for Using MPI's derived datatypes to improve I/O Performance. Proceedings of Supercomputing '98 (CD-ROM). (1998)
14. Rabenseifner, Rolf, Koniges, Alice E., Prost, Jean-Pierre, Hedges, Richard: The Parallel Effective I/O Bandwidth Benchmark: b_eff_io. Parallel I/O for Cluster Computing. (2004)
15. Miller, Ethan L., Katz, Randy H.: Input/output behavior of supercomputing applications. SC. (1991)
16. MPI-2 URL:
17. Gropp, William, Lusk, Ewing, Thakur, Rajeev: Using MPI-2: Advanced Features of the Message-Passing Interface. (1999)
18. Patterson, David, Gibson, Garth, Katz, Randy: A case for redundant arrays of inexpensive disks (RAID). Proceedings of the ACM SIGMOD International Conference on Management of Data. (1988)
19. PVFS Development Team: PVFS 2 Concepts: the new guy's guide to PVFS. PVFS 2 Documentation. (2004)
20. PVFS Development Team: PVFS 2 Distribution Design Notes. PVFS 2 Documentation. (2004)
More informationI/O Systems and Storage Devices
CSC 256/456: Operating Systems I/O Systems and Storage Devices John Criswell! University of Rochester 1 I/O Device Controllers I/O devices have both mechanical component & electronic component! The electronic
More informationStructuring PLFS for Extensibility
Structuring PLFS for Extensibility Chuck Cranor, Milo Polte, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University What is PLFS? Parallel Log Structured File System Interposed filesystem b/w
More informationA Methodology to characterize the parallel I/O of the message-passing scientific applications
A Methodology to characterize the parallel I/O of the message-passing scientific applications Sandra Méndez, Dolores Rexachs and Emilio Luque Computer Architecture and Operating Systems Department (CAOS)
More informationOPERATING SYSTEM. PREPARED BY : DHAVAL R. PATEL Page 1. Q.1 Explain Memory
Q.1 Explain Memory Data Storage in storage device like CD, HDD, DVD, Pen drive etc, is called memory. The device which storage data is called storage device. E.g. hard disk, floppy etc. There are two types
More informationExploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors
Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors G. Chen 1, M. Kandemir 1, I. Kolcu 2, and A. Choudhary 3 1 Pennsylvania State University, PA 16802, USA 2 UMIST,
More informationExploiting redundancy to boost performance in a RAID-10 style cluster-based file system
Cluster Comput (6) 9:433 447 DOI 10.1007/s10586-006-0011-6 Exploiting redundancy to boost performance in a RAID-10 style cluster-based file system Yifeng Zhu Hong Jiang Xiao Qin Dan Feng David R. Swanson
More informationI/O Analysis and Optimization for an AMR Cosmology Application
I/O Analysis and Optimization for an AMR Cosmology Application Jianwei Li Wei-keng Liao Alok Choudhary Valerie Taylor ECE Department, Northwestern University {jianwei, wkliao, choudhar, taylor}@ece.northwestern.edu
More informationFEATURE. High-throughput Video Data Transfer of Striping with SSDs for 8K Super Hi-Vision
High-throughput Video Data Transfer of Striping with s for 8K Super Hi-Vision Takeshi KAJIYAMA, Kodai KIKUCHI and Eiichi MIYASHITA We have developed a high-throughput recording and playback method for
More informationThe Design and Implementation of a MPI-Based Parallel File System
Proc. Natl. Sci. Counc. ROC(A) Vol. 23, No. 1, 1999. pp. 50-59 (Scientific Note) The Design and Implementation of a MPI-Based Parallel File System YUNG-YU TSAI, TE-CHING HSIEH, GUO-HUA LEE, AND MING-FENG
More informationIBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide
V7 Unified Asynchronous Replication Performance Reference Guide IBM V7 Unified R1.4.2 Asynchronous Replication Performance Reference Guide Document Version 1. SONAS / V7 Unified Asynchronous Replication
More informationV. Mass Storage Systems
TDIU25: Operating Systems V. Mass Storage Systems SGG9: chapter 12 o Mass storage: Hard disks, structure, scheduling, RAID Copyright Notice: The lecture notes are mainly based on modifications of the slides
More informationA Hybrid Scheme for Object Allocation in a Distributed Object-Storage System
A Hybrid Scheme for Object Allocation in a Distributed Object-Storage System Fang Wang **, Shunda Zhang, Dan Feng, Hong Jiang, Lingfang Zeng, and Song Lv Key Laboratory of Data Storage System, Ministry
More informationDATA access is one of the critical performance bottlenecks
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 1.119/TC.216.2637353,
More informationMPICH on Clusters: Future Directions
MPICH on Clusters: Future Directions Rajeev Thakur Mathematics and Computer Science Division Argonne National Laboratory thakur@mcs.anl.gov http://www.mcs.anl.gov/~thakur Introduction Linux clusters are
More informationArchitectural Issues for the 1990s. David A. Patterson. Computer Science Division EECS Department University of California Berkeley, CA 94720
Microprocessor Forum 10/90 1 Architectural Issues for the 1990s David A. Patterson Computer Science Division EECS Department University of California Berkeley, CA 94720 1990 (presented at Microprocessor
More informationScalable Performance Analysis of Parallel Systems: Concepts and Experiences
1 Scalable Performance Analysis of Parallel Systems: Concepts and Experiences Holger Brunst ab and Wolfgang E. Nagel a a Center for High Performance Computing, Dresden University of Technology, 01062 Dresden,
More informationSingle I/O Space for Scalable Cluster Computing
Single I/O Space for Scalable Cluster Computing Roy S. C. Ho 1, Kai Hwang 1, 2, and Hai Jin 1,2 The University of Hong Kong 1 and University of Southern California 2 Email: scho@csis.hku.hk, kaihwang@usc.edu,
More informationIntroduction to Parallel I/O
Introduction to Parallel I/O Bilel Hadri bhadri@utk.edu NICS Scientific Computing Group OLCF/NICS Fall Training October 19 th, 2011 Outline Introduction to I/O Path from Application to File System Common
More informationOnline Optimization of VM Deployment in IaaS Cloud
Online Optimization of VM Deployment in IaaS Cloud Pei Fan, Zhenbang Chen, Ji Wang School of Computer Science National University of Defense Technology Changsha, 4173, P.R.China {peifan,zbchen}@nudt.edu.cn,
More informationLecture 33: More on MPI I/O. William Gropp
Lecture 33: More on MPI I/O William Gropp www.cs.illinois.edu/~wgropp Today s Topics High level parallel I/O libraries Options for efficient I/O Example of I/O for a distributed array Understanding why
More informationClassification of Partioning Problems for Networks of Heterogeneous Computers
Classification of Partioning Problems for Networks of Heterogeneous Computers Alexey Lastovetsky and Ravi Reddy Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland {alexey.lastovetsky,
More informationThe Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler
The Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler MSST 10 Hadoop in Perspective Hadoop scales computation capacity, storage capacity, and I/O bandwidth by
More informationImage-Space-Parallel Direct Volume Rendering on a Cluster of PCs
Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs B. Barla Cambazoglu and Cevdet Aykanat Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey {berkant,aykanat}@cs.bilkent.edu.tr
More informationBenefits of Quadrics Scatter/Gather to PVFS2 Noncontiguous IO
Benefits of Quadrics Scatter/Gather to PVFS2 Noncontiguous IO Weikuan Yu Dhabaleswar K. Panda Network-Based Computing Lab Dept. of Computer Science & Engineering The Ohio State University {yuw,panda}@cse.ohio-state.edu
More informationSnapshot-Based Data Recovery Approach
Snapshot-Based Data Recovery Approach Jaechun No College of Electronics and Information Engineering Sejong University 98 Gunja-dong, Gwangjin-gu, Seoul Korea Abstract: - In this paper, we present the design
More informationDisk scheduling Disk reliability Tertiary storage Swap space management Linux swap space management
Lecture Overview Mass storage devices Disk scheduling Disk reliability Tertiary storage Swap space management Linux swap space management Operating Systems - June 28, 2001 Disk Structure Disk drives are
More informationDatabase Management Systems, 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill
Lecture Handout Database Management System Lecture No. 34 Reading Material Database Management Systems, 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill Modern Database Management, Fred McFadden,
More informationMachine-Independent Virtual Memory Management for Paged June Uniprocessor 1st, 2010and Multiproce 1 / 15
Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures Matthias Lange TU Berlin June 1st, 2010 Machine-Independent Virtual Memory Management for Paged June
More informationAn Evolutionary Path to Object Storage Access
An Evolutionary Path to Object Storage Access David Goodell +, Seong Jo (Shawn) Kim*, Robert Latham +, Mahmut Kandemir*, and Robert Ross + *Pennsylvania State University + Argonne National Laboratory Outline
More informationMain Points. File layout Directory layout
File Systems Main Points File layout Directory layout File System Design Constraints For small files: Small blocks for storage efficiency Files used together should be stored together For large files:
More informationOn the scalability of tracing mechanisms 1
On the scalability of tracing mechanisms 1 Felix Freitag, Jordi Caubet, Jesus Labarta Departament d Arquitectura de Computadors (DAC) European Center for Parallelism of Barcelona (CEPBA) Universitat Politècnica
More informationLoaded: Server Load Balancing for IPv6
Loaded: Server Load Balancing for IPv6 Sven Friedrich, Sebastian Krahmer, Lars Schneidenbach, Bettina Schnor Institute of Computer Science University Potsdam Potsdam, Germany fsfried, krahmer, lschneid,
More informationExecution-driven Simulation of Network Storage Systems
Execution-driven ulation of Network Storage Systems Yijian Wang and David Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA 2115 yiwang, kaeli@ece.neu.edu Abstract
More informationExploiting Shared Memory to Improve Parallel I/O Performance
Exploiting Shared Memory to Improve Parallel I/O Performance Andrew B. Hastings 1 and Alok Choudhary 2 1 Sun Microsystems, Inc. andrew.hastings@sun.com 2 Northwestern University choudhar@ece.northwestern.edu
More informationChapter 10: Mass-Storage Systems
Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Chapter 10: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space
More informationThe Optimal CPU and Interconnect for an HPC Cluster
5. LS-DYNA Anwenderforum, Ulm 2006 Cluster / High Performance Computing I The Optimal CPU and Interconnect for an HPC Cluster Andreas Koch Transtec AG, Tübingen, Deutschland F - I - 15 Cluster / High Performance
More informationIteration Based Collective I/O Strategy for Parallel I/O Systems
Iteration Based Collective I/O Strategy for Parallel I/O Systems Zhixiang Wang, Xuanhua Shi, Hai Jin, Song Wu Services Computing Technology and System Lab Cluster and Grid Computing Lab Huazhong University
More informationChapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition
Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Chapter 10: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space
More informationOverview of High Performance Input/Output on LRZ HPC systems. Christoph Biardzki Richard Patra Reinhold Bader
Overview of High Performance Input/Output on LRZ HPC systems Christoph Biardzki Richard Patra Reinhold Bader Agenda Choosing the right file system Storage subsystems at LRZ Introduction to parallel file
More informationChapter 11: Implementing File Systems
Chapter 11: Implementing File Systems Operating System Concepts 99h Edition DM510-14 Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation
More informationFile. File System Implementation. File Metadata. File System Implementation. Direct Memory Access Cont. Hardware background: Direct Memory Access
File File System Implementation Operating Systems Hebrew University Spring 2009 Sequence of bytes, with no structure as far as the operating system is concerned. The only operations are to read and write
More information2011/11/04 Sunwook Bae
2011/11/04 Sunwook Bae Contents Introduction Ext4 Features Block Mapping Ext3 Block Allocation Multiple Blocks Allocator Inode Allocator Performance results Conclusion References 2 Introduction (1/3) The
More informationCSE 153 Design of Operating Systems
CSE 153 Design of Operating Systems Winter 2018 Lecture 22: File system optimizations and advanced topics There s more to filesystems J Standard Performance improvement techniques Alternative important
More informationDisk Reads with DRAM Latency
Disk Reads with DRAM Latency Garth A. Gibson?, R. Hugo Patterson& M. Satyanarayanant?School of Computer Science $Dept. of Electrical and Computer Engineering Carnegie Mellon University 5000 Forbes Avenue
More informationHDF5 I/O Performance. HDF and HDF-EOS Workshop VI December 5, 2002
HDF5 I/O Performance HDF and HDF-EOS Workshop VI December 5, 2002 1 Goal of this talk Give an overview of the HDF5 Library tuning knobs for sequential and parallel performance 2 Challenging task HDF5 Library
More informationMaximizing NFS Scalability
Maximizing NFS Scalability on Dell Servers and Storage in High-Performance Computing Environments Popular because of its maturity and ease of use, the Network File System (NFS) can be used in high-performance
More informationEnhancements to Linux I/O Scheduling
Enhancements to Linux I/O Scheduling Seetharami R. Seelam, UTEP Rodrigo Romero, UTEP Patricia J. Teller, UTEP William Buros, IBM-Austin 21 July 2005 Linux Symposium 2005 1 Introduction Dynamic Adaptability
More informationPresented by: Nafiseh Mahmoudi Spring 2017
Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory
More informationA Fast Sequential Rainfalling Watershed Segmentation Algorithm
A Fast Sequential Rainfalling Watershed Segmentation Algorithm Johan De Bock, Patrick De Smet, and Wilfried Philips Ghent University, Belgium jdebock@telin.ugent.be Abstract. In this paper we present a
More information1993 Paper 3 Question 6
993 Paper 3 Question 6 Describe the functionality you would expect to find in the file system directory service of a multi-user operating system. [0 marks] Describe two ways in which multiple names for
More informationChapter 12: File System Implementation
Chapter 12: File System Implementation Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency
More information