COSC 6374 Parallel Computation. Parallel I/O (I): I/O basics. Concept of a cluster

COSC 6374 Parallel I/O (I): I/O basics, Fall 2010

Concept of a cluster
[Figure: a compute node with Processor 1, Processor 2, Memory, local disks, and two network cards; Network card 1 connects to the message passing network, Network card 2 to the administrative network]

I/O Problem (I)
- Every node has its own local disk.
- Most applications require data and the executable to be locally available, e.g. an MPI application using multiple nodes requires the executable to be available on all nodes in the same directory under the same name.
- Multiple processes need to access the same file, potentially different portions of it, and need to do so efficiently.

Basic characteristics of storage devices
- Capacity: amount of data a device can store
- Transfer rate or bandwidth: amount of data a device can read/write in a certain amount of time
- Access time or latency: delay before the first byte is moved

Prefix        Abbreviation   Base ten   Base two
kilo, kibi    K, Ki          10^3       2^10 = 1024
mega, mebi    M, Mi          10^6       2^20
giga, gibi    G, Gi          10^9       2^30
tera, tebi    T, Ti          10^12      2^40
peta, pebi    P, Pi          10^15      2^50
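As a worked example of these characteristics, the minimal sketch below estimates the time to read a file from a single disk as latency plus size divided by bandwidth. The latency, bandwidth, and file size values are illustrative assumptions, not figures from the slides.

/* Minimal sketch: estimated time to read a file from one disk,
 *   time = latency + size / bandwidth
 * The numbers below are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    double latency_s = 0.005;               /* 5 ms access time        */
    double bandwidth = 100.0 * (1 << 20);   /* 100 MiB/s transfer rate */
    double file_size = 1.0 * (1 << 30);     /* 1 GiB file              */

    double time_s = latency_s + file_size / bandwidth;
    printf("estimated read time: %.2f s\n", time_s);
    return 0;
}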

UNIX File Access Model
- A file is a sequence of bytes.
- When a program opens a file, the file system establishes a file pointer. The file pointer is an integer indicating the position in the file where the next byte will be written/read.
- Disk drives read and write data in fixed-size units (disk sectors).
- File systems allocate space in blocks, where a block is a fixed number of contiguous disk sectors.
- In UNIX-based file systems, the blocks that hold data are listed in an inode. An inode contains the information needed to find all the blocks that belong to a file.
- If a file is too large and an inode cannot hold the whole list of blocks, intermediate nodes (indirect blocks) are introduced.

Write operations
- Write: the file system copies bytes from the user buffer into a system buffer. If the buffer fills up, the system sends the data to disk.
- System buffering
  + allows the file system to collect full blocks of data before sending them to disk
  + the file system can send several blocks at once to the disk (delayed write or write behind)
  - data is not really saved in the case of a system crash
  - for very large write operations, the additional copy from the user buffer to the system buffer could/should be avoided
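The point that buffered data is not really saved until it reaches the disk can be illustrated with a small POSIX example. This is a minimal sketch, not part of the slides; the file name is hypothetical.

/* Minimal sketch (assumptions: POSIX system, the file name "data.bin"
 * is hypothetical).  write() only copies the bytes into the kernel's
 * system buffer; fsync() forces the delayed write out to disk. */
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    char buf[4096] = {0};                       /* user buffer           */
    int fd = open("data.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, buf, sizeof(buf)) < 0)        /* user -> system buffer */
        perror("write");

    if (fsync(fd) < 0)                          /* system buffer -> disk */
        perror("fsync");                        /* without this, a crash
                                                   may lose the data     */
    close(fd);
    return 0;
}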

Read operations
- Read: the file system determines which blocks contain the requested data, reads those blocks from disk into a system buffer, and copies the data from the system buffer into user memory.
- System buffering:
  + the file system always reads a full block (file caching)
  + if the application reads data sequentially, prefetching (read ahead) can improve performance
  - prefetching hurts performance if the application has a random access pattern

Dealing with disk latency
- Caching and buffering
  - avoid repeated access to the same block
  - allow a file system to smooth out I/O behavior
  - help to hide the latency of the hard drives
  - lower the performance of I/O operations for irregular access
- Non-blocking I/O gives users control over prefetching and delayed writing:
  - initiate read/write operations as soon as possible
  - wait for the completion of the read/write operations only when absolutely necessary (see the sketch below)
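One way to express this idea is with POSIX asynchronous I/O; this is a minimal sketch under that assumption (the slides do not prescribe a particular interface), the file name is hypothetical, and on Linux the program is linked with -lrt.

/* Minimal sketch of non-blocking I/O with POSIX AIO: the read is
 * initiated early and the program only waits when the data is needed.
 * Assumptions: POSIX AIO available; "input.bin" is a hypothetical file. */
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char buf[1 << 20];
    int fd = open("input.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) {      /* initiate the read as soon as possible */
        perror("aio_read");
        return 1;
    }

    /* ... overlap the read with computation here ... */

    const struct aiocb *const list[1] = { &cb };
    aio_suspend(list, 1, NULL);    /* wait only when the data is needed */
    ssize_t n = aio_return(&cb);
    printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}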

Improving Disk Bandwidth: disk striping
- Utilize multiple hard drives.
- Split a file into constant-size chunks and distribute them across all disks.
- Three relevant parameters:
  - stripe factor: number of disks
  - stripe depth: size of each block
  - which disk contains the first block of the file

[Figure: blocks 1, 2, 3, ..., n of a file distributed round-robin across disks 1-4]

Disk striping
- Ideal assumption: b(N, p) = p * b(N/p, 1), with
  N: number of bytes to be written
  b: bandwidth
  p: number of disks
- Realistically b(N, p) < p * b(N/p, 1), since
  - N is often not large enough to fully utilize p hard drives
  - networking overhead
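A minimal sketch of the block-to-disk mapping implied by the three striping parameters above; the variable names and values are illustrative, not taken from a particular file system.

/* Minimal sketch of the block-to-disk mapping used by disk striping.
 * stripe_factor, stripe_depth and first_disk correspond to the three
 * parameters listed above; all values are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    int    stripe_factor = 4;          /* number of disks                */
    size_t stripe_depth  = 64 * 1024;  /* size of each block (bytes)     */
    int    first_disk    = 0;          /* disk holding block 0           */

    size_t offset = 300 * 1024;        /* byte offset within the file    */

    size_t block  = offset / stripe_depth;                 /* logical block      */
    int    disk   = (int)((first_disk + block) % stripe_factor); /* target disk  */
    size_t dblock = block / stripe_factor;                 /* block on that disk */

    printf("offset %zu -> disk %d, local block %zu\n", offset, disk, dblock);
    return 0;
}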

Two levels of disk striping (I)
Using a RAID controller
- Hardware, typically a single box
- Number of disks: 3 ... n

Redundant arrays of independent disks (RAID)
- Goals:
  - improve reliability of an I/O system
  - improve performance of an I/O system
- Several RAID levels defined

RAID 0: disk striping without redundant storage (JBOD = just a bunch of disks)
- No fault tolerance
- Good for high transfer rates, i.e. read/write bandwidth of a single large file
- Good for high request rates, i.e. access time to many (small) files

RAID 1: mirroring
- All data is replicated on two or more disks
- Does not improve write performance and improves read performance only moderately

RAID level 2
RAID 2: Hamming codes
- Each group of data bits has several check bits appended to it, forming Hamming code words.
- Each bit of a Hamming code word is stored on a separate disk.
- Very high additional cost: e.g. up to 50% additional capacity required.
- Hardly used today, since parity-based codes are faster and easier.

RAID level 3
Parity-based protection:
- Based on exclusive OR (XOR)
- Reversible

Example:
      01101010  (data byte 1)
  XOR 11001001  (data byte 2)
  --------------------------
      10100011  (parity byte)

Recovery:
      11001001  (data byte 2)
  XOR 10100011  (parity byte)
  --------------------------
      01101010  (recovered data byte 1)
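The same parity computation and recovery as a minimal C sketch; the byte values are exactly the ones from the example above.

/* Minimal sketch of XOR parity as used by RAID 3/4/5: XORing the
 * surviving bytes with the parity byte reconstructs the lost byte. */
#include <stdio.h>

int main(void)
{
    unsigned char d1 = 0x6A;           /* 01101010, data byte 1 */
    unsigned char d2 = 0xC9;           /* 11001001, data byte 2 */

    unsigned char p  = d1 ^ d2;        /* parity byte: 10100011 */
    unsigned char r1 = d2 ^ p;         /* recover data byte 1   */

    printf("parity    = 0x%02X\n", p);
    printf("recovered = 0x%02X (original 0x%02X)\n", r1, d1);
    return 0;
}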

RAID level 3 (cont.)
- Data is divided evenly into N subblocks (N = number of disks, typically 4 or 5).
- Computing the parity bytes generates an additional subblock.
- Subblocks are written in parallel to N+1 disks.
- For best performance, data should be of size N * sector size.
- Problems with RAID level 3:
  - all disks participate in every operation => contention for applications with high access rates
  - if the data size is less than N * sector size, the system has to read the old subblocks to calculate the parity bytes
- RAID level 3 is good for high transfer rates.

RAID level 4
- Parity bytes for N disks are calculated and stored.
- Parity bytes are stored on a separate disk.
- Files are not necessarily distributed over N disks.
- For read operations:
  - determine the disks holding the requested blocks
  - read the data from these disks
- For write operations:
  - retrieve the old data from the sector being overwritten
  - retrieve the parity block from the parity disk
  - extract the old data from the parity block using XOR operations
  - add the new data to the parity block using XOR
  - store the new data
  - store the new parity block
- Bottleneck: the parity disk is involved in every write operation.
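The write sequence above boils down to the read-modify-write parity update sketched below: the old data is XORed out of the parity and the new data is XORed in. The byte values are illustrative; the old parity is the one computed in the earlier example.

/* Minimal sketch of the RAID 4/5 small-write parity update described
 * above.  Only the old data block and the old parity block have to be
 * read; all values here are illustrative. */
#include <stdio.h>

int main(void)
{
    unsigned char old_data   = 0x6A;   /* block being overwritten        */
    unsigned char old_parity = 0xA3;   /* parity over all data blocks    */
    unsigned char new_data   = 0x55;   /* new contents of the block      */

    /* XOR out the old data, XOR in the new data */
    unsigned char new_parity = old_parity ^ old_data ^ new_data;

    printf("new parity = 0x%02X\n", new_parity);
    return 0;
}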

RAID level 5
- Same as RAID 4, but the parity blocks are distributed over different disks.

[Figure: RAID 5 layout across five disks; the first stripe holds blocks 1-4 plus P(1,2,3,4), the second stripe holds blocks 5-8 with P(5,6,7,8) placed on a different disk]

RAID level 6
- Tolerates the loss of more than one disk.
- Collection of several techniques, e.g.:
  - P+Q parity: compute parity bytes using two different algorithms and store the two parity blocks on different disks
  - two-dimensional parity [figure: grid of data disks with dedicated parity disks]

RAID level 10
- Is RAID level 1 + RAID level 0: combines RAID 1 mirroring with RAID 0 striping.
- Also available: RAID 53 (RAID 0 + RAID 3).

Comparing RAID levels

RAID level   Protection      Space usage           Good at...         Poor at...
0            None            N                     Performance        Data protection
1            Mirroring       2N                    Data protection    Space efficiency
2            Hamming codes   ~1.5N                 Transfer rate      Request rate
3            Parity          N+1                   Transfer rate      Request rate
4            Parity          N+1                   Read request rate  Write performance
5            Parity          N+1                   Request rate       Transfer rate
6            P+Q or 2-D      (N+2) or (MN+M+N)     Data protection    Write performance
10           Mirroring       2N                    Performance        Space efficiency
53           Parity          N + striping factor   Performance        Space efficiency

Two levels of disk striping (II)
Using a parallel file system
- exposes the individual units capable of handling data, often called storage servers, I/O nodes, etc.
- each storage server might use multiple hard drives under the hood to increase its read/write bandwidth
- a metadata server keeps track of which parts of a file are on which storage server
- a single disk failure is less of a problem if each server uses a RAID 5 storage system under the hood

Parallel File Systems: conceptual overview
[Figure: compute nodes connected over the network to a metadata server and storage servers 0-3]

File access on a parallel file system
[Figure: interaction between a compute node and the metadata server]
1. The application calls write().
2. The OS requests the list of relevant I/O nodes for this write operation from the metadata server.
3. The metadata server sends storage IDs, offsets, etc. (the kind of mapping sketched below).
4. The OS sends the data to the storage servers.

Disk striping
- Requirements to improve the performance of I/O operations using disk striping:
  - multiple physical disks
  - network bandwidth and I/O bandwidth have to be balanced
- Problem of simple disk striping: for a fixed file size, the number of disks that can be used in parallel is limited.

Prominent parallel file systems
- PVFS2
- Lustre
- GPFS
- NFS v4.2 (new standard currently being ratified)
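Referring back to the write path above, the following minimal sketch shows how a striped write request is split into per-server pieces, which is the kind of information the metadata server returns to the client. It assumes simple round-robin striping with a fixed stripe depth; the server count and values are illustrative and not tied to a specific file system such as PVFS2 or Lustre.

/* Minimal sketch: split a write request (offset, length) into the
 * byte ranges handled by each storage server, assuming round-robin
 * striping.  All parameters are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    int    num_servers  = 4;
    size_t stripe_depth = 64 * 1024;           /* bytes per stripe unit */

    size_t offset = 100 * 1024;                /* write request: offset */
    size_t length = 200 * 1024;                /*                length */

    /* Walk the request stripe unit by stripe unit and report which
     * storage server stores each piece. */
    size_t pos = offset;
    while (pos < offset + length) {
        size_t block     = pos / stripe_depth;
        int    server    = (int)(block % num_servers);
        size_t block_end = (block + 1) * stripe_depth;
        size_t req_end   = offset + length;
        size_t chunk_end = block_end < req_end ? block_end : req_end;

        printf("bytes [%zu, %zu) -> storage server %d\n",
               pos, chunk_end, server);
        pos = chunk_end;
    }
    return 0;
}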

Distributed vs. Parallel File Systems
Distributed file systems
- Offer access to a collection of files on remote machines.
- Typically a client-server based approach.
- Transparent for the user.

NFS: The Network File System
- Protocol for a remote file service.
- Stateless server (v3).
- Communication based on RPC (Remote Procedure Call).
- NFS provides session semantics: changes to an open file are initially only visible to the process that modified the file.
- File locking is not part of the NFS protocol (v3), but is often available through a separate protocol/daemon.
- Client caching is not part of the NFS protocol (v3); the behavior is implementation dependent.

Network File System (NFS)
[Figure: compute node (= NFS client) and NFS server]
1. The application calls write().
2. The OS forwards the data to the NFS server.
3. The NFS daemon receives the data.
4. The NFS daemon calls write().

Parallel vs. Distributed File Systems
- In distributed file systems, concurrent access to the same file from several processes is considered an unlikely event.
- Distributed file systems assume different numbers of processors than parallel file systems.
- Distributed file systems have different security requirements than parallel file systems.