Language, Compiler and Parallel Database Support for I/O Intensive Applications*

Peter Brezany (a), Thomas A. Mueck (b) and Erich Schikuta (b)

University of Vienna
(a) Inst. for Softw. Technology and Parallel Systems, Liechtensteinstr. 22, A-1092 Vienna
(b) Department of Data Engineering, Rathausstrasse 19/4, A-1010 Vienna, Austria

Abstract. Automatic mapping of I/O intensive applications onto massively parallel systems is a challenging problem of great importance. This paper proposes a novel solution to the I/O problem. First, Fortran language extensions are introduced that support highly efficient I/O processing. Second, we specify the appropriate compilation method, which utilizes an advanced runtime system called VIPIOS that is designed on the basis of parallel database technology. We present this proposal in the context of Vienna Fortran and its compiler.

1 Introduction

This paper proposes a language, compiler and runtime software solution to the problem of I/O in distributed-memory systems (DMSs). We present this proposal in the context of Vienna Fortran [8] and its compilation system.

In typical supercomputing applications, six types of I/O can be identified ([5]): (1) input, (2) debugging, (3) scratch files, (4) checkpoint/restart, (5) output, and (6) accessing out-of-core structures. Types (3), (4) and, in some phases, (6) do not contribute to the communication with the environment of the processing system. Therefore, the data they include may be stored on devices of the parallel I/O subsystem as parallel files. The layout of such files may be optimized to achieve the highest I/O data transfer rate. In our approach, this I/O functionality is implemented by the VIenna Parallel I/O System (VIPIOS). All other types of I/O operations involve files which have to resemble the standard sequential FORTRAN file format; we call these standard files. The flow of I/O data in a typical application processing cycle is sketched in Figure 1.
In general, the logical view of a VIPIOS file corresponds to the conventional sequential FORTRAN file model.

* The work described in this paper was carried out as part of the CEI PACT Project funded by the Austrian Ministry for Science and Research (BMWF).

[Figure: diagram with the labels DBMS Environment, Application Running in DMS, Source of data, Input File, COPYIN, READ, Parallel I/O Subsystem (VIPIOS), READ/WRITE, Computational nodes, WRITE, Output File, COPYOUT, Utilization of data, Standard files, Parallel files.]
Fig. 1. I/O in a Typical Application Processing Cycle

2 Language Support for I/O Operations

2.1 Opening a Parallel File

Specification of the File Location. The standard Fortran OPEN statement is extended by a new optional specifier MODE. The meaning of the MODE specifier follows from the examples in Figure 2: units 8 and 9 refer to standard files (MODE = 'ST'), and unit 10 is connected to a parallel file (MODE = 'PF').

    D1:  IO_DTYPE PAT1(M,N1,N2,K1,K2)
    D2:    PROCESSORS P2D(M,M); ELM_TYPE REAL; ARRAY_SHAPE (N1,N2)
    D3:    ARRAY_DIST (CYCLIC(K1), CYCLIC(K2)) TO P2D
    D4:  END IO_DTYPE PAT1

    O1:  OPEN (u1 = 8, FILE = '/usr/exa1', MODE = 'ST', STATUS = 'NEW')
    O2:  OPEN (u2 = 9, FILE = '/usr/exa2', STATUS = 'OLD')
    O3:  OPEN (u3 = 10, FILE = '/usr/exa3', MODE = 'PF', STATUS = 'NEW', &
    O4:        IO_DIST = PAT1(8,400,100,4,2))

    C1:  COPYIN (u1) ONTO (u3); COPYOUT (u3) ONTO (u2)

Fig. 2. Opening and Copying Files - Examples

I/O Distribution Hints. Using a new optional specifier IO_DIST, the application programmer may pass to the compiler/runtime system a hint that data in the file will be written to or read from an array of the specified distribution. According to lines D1-D4 and O3-O4 in Figure 2, by default, elements of all arrays will be written to the file '/usr/exa3' so as to optimize reading them into real arrays which have the shape (400,100) and are distributed as (CYCLIC(4), CYCLIC(2)) onto a grid of processors having the shape (8,8). This predefined global I/O distribution specification can be temporarily changed by a WRITE statement.

2.2 I/O Operations on Parallel Files

(i) In the simplest form, the individual distributions of the arrays determine the sequence of array elements written out to the file, as in the statement

    WRITE (f) A_1, A_2, ..., A_r

where A_i, 1 ≤ i ≤ r, are array identifiers. This form should be used when the data is going to be read into arrays with the same distribution as A_i.

(ii) The IO_DIST specifier of the WRITE statement enables the application programmer to specify the distribution of the target array in a similar way as outlined in subsection 2.1.

    WRITE (f, IO_DIST = PAT1(4,100,100,5,5)) A

(iii) If the data in a file is to be subsequently read into arrays with different distributions, or if there is no information available about the distribution of the target arrays, the application programmer may allow the compiler to choose the sequence of the elements to be written out.

    WRITE (f, IO_DIST = SYSTEM) A_1, ..., A_r

(iv) A read operation into one or more distributed arrays is specified by

    READ (f) B_1, B_2, ..., B_r

(v) The REORGANIZE statement enables the application programmer to restructure a file. The statements COPYIN and COPYOUT copy files (see line C1 in Figure 2).

All I/O statements may include an EVENT specifier to specify the asynchronous mode.
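As a rough illustration of what such a distribution hint describes, the sketch below computes which processor of the grid would own a given array element under a (CYCLIC(K1), CYCLIC(K2)) distribution. It assumes the usual block-cyclic convention (blocks of K consecutive indices dealt round-robin to the processors of a dimension, with 1-based indices and 0-based processor numbers). The function names are hypothetical and are not part of Vienna Fortran or VIPIOS.

```python
def cyclic_owner(index, block, procs):
    """Return the 0-based processor owning 1-based `index` under CYCLIC(block).

    Assumed convention: blocks of `block` consecutive indices are dealt
    round-robin to `procs` processors.
    """
    if index < 1 or block < 1 or procs < 1:
        raise ValueError("index, block and procs must be positive")
    return ((index - 1) // block) % procs


def grid_owner(i, j, k1, k2, m):
    """Owner coordinates on an m x m processor grid for element (i, j)."""
    return cyclic_owner(i, k1, m), cyclic_owner(j, k2, m)


def local_elements(proc, extent, block, procs):
    """1-based indices of one dimension owned by `proc`, in ascending order."""
    return [i for i in range(1, extent + 1)
            if cyclic_owner(i, block, procs) == proc]


if __name__ == "__main__":
    # Parameters of PAT1(8,400,100,4,2) from Figure 2.
    print(grid_owner(1, 1, 4, 2, 8))      # first element of the array
    print(grid_owner(5, 3, 4, 2, 8))      # first element of the next block in both dimensions
    print(local_elements(0, 40, 4, 8))    # indices 1..40 of dimension 1 owned by processor row 0
```

Under these assumptions, a file laid out for PAT1(8,400,100,4,2) could group the elements returned by `local_elements` for each processor so that a later READ into an identically distributed array touches contiguous data.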
3 Specification of the Compilation Method

The implementation of the parallel I/O operations introduced in the last section comprises both run time and compile time elements. At compile time, processing of parallel I/O operations conceptually consists of two phases: basic compilation and advanced optimizations.

The basic compilation phase extracts parameters about data distributions and file access patterns from the VF programs and passes them in a normalized form to the VIPIOS runtime primitives without performing any sophisticated program analysis and optimization. As an example, a possible translation of the OPEN and WRITE statements is shown in Figure 3. These statements are translated to calls of the functions VIPIOS_open and VIPIOS_write, respectively. The latter function synchronously writes the distributed array referenced in the statement to the open VIPIOS file. The structures dd_source and dd_target store the data distribution descriptors associated with the source and target arrays, respectively. The file descriptor fd contains all information about the associated file in a compact form. This information is needed during the subsequent file operations. File descriptors are stored in a one-dimensional array FdArray using the unit number as an index. The value of the logical variable result indicates whether the operation succeeded or failed.

Original code

    PROCESSORS P1D(16); REAL A(10000) DIST (BLOCK) TO P1D
    OPEN (13, FILE = '/usr/exa6', MODE = 'PF', STATUS = 'NEW')
    WRITE (13, IO_DIST = '(CYCLIC)') A

Transformed form (generated by VFCS automatically)

    PARAMETER :: Max_Numb_of_Units = ...; LOGICAL :: result
    TYPE (Distr_Descriptor) :: dd_source, dd_target; TYPE (File_Descriptor) :: fd
    TYPE (File_Descriptor), DIMENSION(Max_Numb_of_Units) :: FdArray
    ... initialization of dd_source and dd_target ...
    fd = VIPIOS_open(name='/usr/exa6',status='new'); FdArray(13) = fd
    result = VIPIOS_write (file_descr=FdArray(13),data_address = A, &
                           dist_source=dd_source,dist_target=dd_target)

Fig. 3. Translation of the OPEN and WRITE statements.

The optimization phase utilizes the results of program analysis which are provided by the Analysis Subsystem of VFCS. Program analysis that supports I/O optimizations comprises data flow analysis, reaching distribution analysis and cost estimation. The goal of I/O optimizations is to derive an efficient parallel program with:

- Low I/O overhead. The compiler derives hints for data organization in the files and inserts the appropriate data reorganization statements.
- High amount of computation that can be performed concurrently with I/O. These optimizations restructure the computation and I/O to increase the amount of useful computation that may be performed in parallel with I/O. The VIPIOS calls offer a choice between synchronous and asynchronous I/O.
- I/O performed concurrently with computation and other I/O. The program analysis is capable of providing information on whether the I/O-computation-I/O parallelism is safe (due to the data dependence analysis) and useful (due to the performance analysis). If both preconditions are fulfilled, the compiler allows I/O to run in parallel with other computations or other I/O statements.
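The overlap of I/O and computation targeted by these optimizations can be illustrated outside Fortran with a small Python sketch. It is not the VIPIOS interface: `write_block` and `compute` are hypothetical stand-ins, and a thread pool plays the role of an asynchronous write whose completion is awaited before the next write is issued.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_block(path, data):
    """Hypothetical stand-in for an asynchronous write; returns bytes written."""
    with open(path, "ab") as f:
        return f.write(data)


def compute(step):
    """Hypothetical computation that does not depend on the block being written."""
    return sum(i * i for i in range(step * 1000))


def run(path, steps):
    """Overlap the write of block k with the computation of step k + 1."""
    results, written = [], 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for step in range(steps):
            results.append(compute(step))      # runs while the previous write is pending
            if pending is not None:
                written += pending.result()    # wait for the previous write to finish
            block = results[-1].to_bytes(16, "little")
            pending = pool.submit(write_block, path, block)
        if pending is not None:
            written += pending.result()
    return results, written


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        results, written = run(path, 4)
        print(written, os.path.getsize(path))
    finally:
        os.remove(path)
```

The sketch is only safe because `compute` never touches the block that is still being written, which mirrors the role of data dependence analysis in deciding whether the overlap is allowed.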

4 VIPIOS

In contrast to the parallelized computation supported by HPF languages (like VF), files are read and written sequentially in current implementations. I/O requests are processed by a single centralized host process, and data is transferred via the network interconnections to the node processes. Parallel file I/O is therefore not yet supported by the current system architecture.

4.1 Design characteristics

To exploit parallelization, the physical file reads and writes have to be shifted from the host process to the node processes. The proposed solution is the VIPIOS (Vienna Parallel Input Output System), a separate I/O subsystem which resolves the read/write requests locally on the node. The VIPIOS is realized by cooperative parallelized data server modules running on the nodes. The data requests of the processes of each node are received and handled directly by the I/O subsystem on the nodes. The VIPIOS guarantees that each processor has access to its requested data. Based on the information about the data and the access profile provided by an HPF language system, the VIPIOS organizes the information and tries to assure high performance of the accesses to the stored data. To reach this aim, the design and development of the VIPIOS is determined by the following characteristics.

Parallelism. The foremost design principle is the utilization of parallelization to achieve the highest possible performance. This is reached by parallelized accesses of processors to multiple disks. To avoid unnecessary communication and synchronization overhead, the physical data distribution has to reflect the problem distribution of the SPMD processes. This guarantees that each processor accesses mostly the data of its local disk ("data locality principle").

Abstract I/O model. The notion of a data type is supported by the VIPIOS. Stored information is not seen as byte sequences only, but as topologically ordered typed data values bearing semantics. This can be exploited by the runtime system and allows data administration on a higher level, which in turn results in a smarter data organization and higher performance.

Efficient data administration. Finding specific information in a sequential file is a costly process, which can result in a scan of the whole file. Index structures support access to the stored data set. This can improve performance dramatically, because the size of the data set to be scanned is drastically reduced.

Scalability. The size of the I/O system, i.e. the number of I/O nodes, is independent of any implementation or system dependent characteristics. The only defining attribute is the problem size. Furthermore, the number of I/O nodes can be changed dynamically in accordance with the problem solution process. This requires the ability to redistribute the data among the changed set of participating nodes.
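To make the redistribution requirement concrete, the following sketch lists which data blocks would have to move when the number of I/O nodes changes. It assumes a simple BLOCK-style placement (contiguous ranges of near-equal size per node). The paper does not specify the VIPIOS redistribution algorithm, and the function names are hypothetical.

```python
def block_owner(block_id, n_blocks, n_nodes):
    """Node holding `block_id` when n_blocks are split into contiguous ranges."""
    if not 0 <= block_id < n_blocks or n_nodes < 1:
        raise ValueError("invalid block id or node count")
    chunk = -(-n_blocks // n_nodes)  # ceiling division
    return block_id // chunk


def redistribution_plan(n_blocks, old_nodes, new_nodes):
    """List (block_id, from_node, to_node) for every block that must move."""
    plan = []
    for b in range(n_blocks):
        src = block_owner(b, n_blocks, old_nodes)
        dst = block_owner(b, n_blocks, new_nodes)
        if src != dst:
            plan.append((b, src, dst))
    return plan


if __name__ == "__main__":
    # Growing from 3 to 4 I/O nodes with 12 data blocks.
    plan = redistribution_plan(12, 3, 4)
    print(len(plan), plan)
```

With this placement, growing from 3 to 4 nodes moves half of the 12 blocks, which hints at why the cost of dynamic reconfiguration depends strongly on the placement scheme chosen.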

4.2 Declustering

Declustering of a data set is the distribution of the blocks, the records, the data objects or the bytes of a file among two or more disk drives according to a defined declustering schema. Declustering allows the I/O system to increase the bandwidth of the I/O operations by reading and writing multiple disks in parallel. It is the common technique of parallel database systems to speed up data accesses. Generally, three different declustering types can be distinguished.

- Key declustering. The declustering is performed according to the key values of one or more attributes of the records.
- Data independent declustering. A general data-independent declustering algorithm is employed, for example round-robin declustering.
- Problem declustering. The location of the data records is defined by the distribution criteria of the superimposed problem solution approach. In the context of a data parallel language like VF, this means the array distribution among the processes. This approach seems extremely promising for increasing system performance due to the data locality principle. It is one of the key elements of the VIPIOS project.

The problem specific distribution information given by the VF example

    PROCESSORS P1D(5)
    REAL A(100, 10000, 200) DIST (:, BLOCK, :) TO P1D
    WRITE (f) A

can be reflected by the data declustering shown in Figure 4.

4.3 Implementation

The basis of the VIPIOS implementation is the existing DiNG file system [6]. It is a prototype based on Distributed and Nested Grid files (i.e. DiNG files), which supports the efficient parallel execution of exact match, partial match and range queries directly by its inherent data structure. All necessary operations are provided at the system call level. Distributed and nested grid files are multikey index structures designed for mass-storage subsystems on shared nothing architectures.

5 Conclusion

In this paper, a novel solution to the I/O problem of HPF languages is presented. The necessary language constructs, compilation methods and runtime support are discussed. The language constructs proposed in this paper and the VIPIOS are described and discussed in more detail in [4]. The proposed system is planned to become part of the VFCS in the future. Further interesting topics are checkpoint/restart and out-of-core programs. These issues are beyond the current state of the research project, but will be tackled in the future.
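Returning to the declustering types of Section 4.2, the sketch below contrasts round-robin declustering with problem declustering for the example array A(100, 10000, 200) distributed as (:, BLOCK, :) onto five processors. It assumes that the BLOCK distribution gives each processor one contiguous range of 2000 indices of the second dimension, that partition p is stored on disk p, and, for the round-robin case, that the file is cut into blocks of one second-dimension index each. The function names are hypothetical.

```python
def problem_disk(j, extent=10000, disks=5):
    """1-based disk holding second-dimension index j (1-based) under BLOCK."""
    if not 1 <= j <= extent:
        raise ValueError("index out of range")
    chunk = -(-extent // disks)          # ceiling division: 2000 in the example
    return (j - 1) // chunk + 1


def round_robin_disk(block_no, disks=5):
    """1-based disk holding 0-based file block `block_no` under round-robin."""
    return block_no % disks + 1


def local_fraction(proc, placement, extent=10000, disks=5):
    """Fraction of the indices owned by `proc` that `placement` stores on disk `proc`."""
    owned = [j for j in range(1, extent + 1)
             if problem_disk(j, extent, disks) == proc]
    local = sum(1 for j in owned if placement(j) == proc)
    return local / len(owned)


if __name__ == "__main__":
    print(problem_disk(1), problem_disk(2000), problem_disk(2001), problem_disk(10000))
    print(local_fraction(1, problem_disk))
    print(local_fraction(1, lambda j: round_robin_disk(j - 1)))
```

Under these assumptions, every element a processor needs lies on its own disk with problem declustering, while round-robin placement leaves only one fifth of them local.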

[Figure: diagram with the labels array A, Declustering method, Partition 1 to Partition 5, and Disk 1 to Disk 5.]
Fig. 4. Problem specific data declustering

References

1. Bordawekar R.R., Choudhary A.N., Language and Compiler Support for Parallel I/O. IFIP Working Conf. Prog. Env. for Massively Parallel Distributed Systems, Swiss, April
2. Bordawekar R., Rosario J.M., Choudhary A., Design and Evaluation of Primitives for Parallel I/O, in Proc. Supercomputing '93, Nov
3. Brezany P., Gerndt M., Mehrotra P., Zima H., Concurrent File Operations in a High Performance Fortran. In Proceedings of Supercomputing '92, (November 1992), 230-
4. Brezany P., Mueck T.A., Schikuta E., Language, Compiler and Database Support for Parallel I/O Operations, Int. Rep. Inst. for Softw. Techn. and Par. Sys., Dept. of Data Eng., Nov
5. Galbreath N., Gropp W., Levine D., Applications-Driven Parallel I/O. Supercomputing 93, Portland, USA, 462-
6. Mueck T.A., The DiNG - A Parallel Multiattribute File System for Deductive Database Machines, 3rd Int. Symp. on Database Systems for Adv. Appl., World Scientific, Taejon,
7. Snir M., Proposal for IO. Posted to HPFF I/O Forum by Marc Snir, July
8. Zima H., Brezany P., Chapman B., Mehrotra P., and Schwald A., Vienna Fortran - a language specification. ACPC Technical Report Series, University of Vienna, Vienna, Austria, Also available as ICASE INTERIM REPORT 21, MS 132c, NASA, Hampton VA

This article was processed using the LaTeX macro package with LLNCS style.
