Design Issues for Efficient Implementation of MPI in Java

Glenn Judd, Mark Clement, Quinn Snell
Computer Science Department, Brigham Young University, Provo, USA

Vladimir Getov
School of Computer Science, University of Westminster, London, UK

Abstract

While there is growing interest in using Java for high-performance applications, many in the high-performance computing community do not believe that Java can match the performance of traditional native message passing environments. This paper discusses critical issues that must be addressed in the design of Java-based message passing systems. Efficient handling of these issues allows Java-MPI applications to obtain performance which rivals that of traditional native message passing systems. To illustrate these concepts, the design and performance of a pure Java implementation of MPI are discussed.

1 Introduction

The Message Passing Interface (MPI) [1] has proven to be an effective means of writing portable parallel programs. With the increasing interest in using Java for high-performance computing, several groups have investigated using MPI from within Java. Nevertheless, there are still many in the high-performance computing community who are skeptical that Java MPI performance can compete with native MPI. These skeptics usually refer to data showing that early Java implementations of message passing standards [2] performed orders of magnitude slower than native versions. Their skepticism is further backed by the fact that, as of yet, there has not been an MPI implementation completely written in Java that is competitive with native MPI implementations.

To investigate possible Java MPI performance, we have designed and implemented an MPI system written completely in Java which seeks to be competitive with native MPI implementations in clustered computing environments. In this paper we discuss issues that must be addressed in order to efficiently implement MPI in Java, explain how they are concretely addressed in our implementation, and compare the performance obtained with that of native MPI. Our results show that there are many cases where MPI in Java can compete with native MPI.

Section 2 reviews previous work on Java message passing. In Section 3, we discuss issues which must be addressed in order to allow efficient communication between MPI processes in Java. In Section 4 we discuss issues related to supporting threads in Java MPI processes. Section 5 discusses methods for integrating high performance libraries into Java. In Section 6 we discuss the design of our pure Java implementation of MPI. Section 7 presents performance results.

2 Related Work

Over the last few years many Java message passing systems have been developed. A large number of these systems, such as JavaParty [3], JET [4], and IceT [5], have developed novel parallel programming methodologies using Java. Others have looked at using variations on Java Remote Method Invocation [6] or JavaSpaces [7] for high performance computing. A number of efforts have also investigated using Java versions of established message passing standards such as MPI [1] and Parallel Virtual Machine (PVM) [8]:

JPVM [2] is an implementation of PVM written completely in Java. Unfortunately, JPVM has very poor performance compared to native PVM and MPI systems.

mpijava [9] is a Java wrapper to native MPI implementations. It allows application code to be written in pure Java, but currently requires a native MPI implementation in order to function.
JavaMPI [10] is also a Java wrapper to native MPI libraries, but JavaMPI wrappers are generated automatically with the help of a special-purpose tool called JCI (Java-to-C Interface generator).

Efforts are currently underway to develop a standard Java MPI binding in order to increase the interoperability and quality of Java MPI bindings [11]. This research adds to that effort by exploring issues that must be addressed in order to efficiently implement MPI in Java. These issues can then be addressed both by the developing Java MPI bindings and by the Java environment itself, in order to foster the development of efficient message passing systems which are written completely in Java.

3 Network Communication in Java

3.1 Native Marshalling

High network communication performance is a critical element of any MPI implementation. Achieving high network communication performance under Java requires consideration of issues not found under native code. Before discussing these issues, consider Figure 6, which compares Java byte array communication to C byte array communication. It is clear that Java communication of byte arrays completely matches C in this case. As both C and Java rely on the same underlying communication library to carry out the communication, it is not surprising that they achieve very comparable results.

However, most communication in MPI consists of data other than bytes. In C this is a trivial issue since an array of any type can be type cast to a byte array, but in Java this issue is significant because a simple type cast is not permitted. The most common method of sending non-byte data in Java is to marshal the data into a byte array and then send this byte array. Performing this marshaling in Java code is an intrinsically less efficient operation than performing it in native code. Consider the following Java code fragment which marshals an array of doubles:

    void marshal(double[] src, byte[] dst) {
        int count = 0;
        for (int i = 0; i < src.length; i++) {
            long value = Double.doubleToLongBits(src[i]);
            dst[count++] = (byte)((int)(value >>> 56));
            dst[count++] = (byte)((int)(value >>> 48));
            dst[count++] = (byte)((int)(value >>> 40));
            dst[count++] = (byte)((int)(value >>> 32));
            dst[count++] = (byte)(value >>> 24);
            dst[count++] = (byte)(value >>> 16);
            dst[count++] = (byte)(value >>> 8);
            dst[count++] = (byte)(value >>> 0);
        }
    }

Figure 1: Native vs. Java Marshalling (JDK 1.2 on a Pentium II 266 MHz machine running Windows NT)

In C this marshaling can be accomplished with a simple memcpy. In Java the equivalent marshaling code requires a total of 35 Java bytecode instructions, including a method invocation, several shift operations, and several type conversion operations, none of which are required in C. It is suggested in [12] that just-in-time (JIT) compilers should be able to optimize data marshaling into its native equivalent (i.e. eliminate the method invocation, shifts, etc., and replace them with memcpy code). While this is theoretically possible, it is complicated by the fact that there are several primitive types and several approaches to marshaling each of these types that a JIT compiler would need to be able to optimize. At this time, we are not aware of any JIT compiler which even attempts this optimization.

A much simpler solution is to follow the precedent established by the System.arraycopy method already included in Java. This method is used to efficiently copy data between Java arrays. Currently this method requires both source and destination to be of the same data type. This routine could be extended to allow the source and destination arrays to be of different primitive types.
Alternatively, a new method could be added which would specifically allow for copying data between arrays of different primitive types. Note that such a method would not compromise Java language safety or security, as it introduces no new functionality but rather expedites existing functionality. As shown in Figure 1, this approach enables huge increases in data marshaling speed, allowing data marshaling to occur at the same speed as a memcpy.
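To illustrate how such a native marshaling routine can be combined with a pure Java fallback, the sketch below hides the two behind a single call. The class name NativeMarshal, the library name "nativemarshal", and the native entry point arrayCopyDoubleToByte are hypothetical, not part of any existing API; the Java fallback simply mirrors the marshaling loop shown above.

    // Hypothetical helper: prefer a native cross-type copy when available,
    // and fall back to pure Java marshaling otherwise.
    public final class NativeMarshal {
        private static final boolean NATIVE_AVAILABLE = tryLoad();

        private static boolean tryLoad() {
            try { System.loadLibrary("nativemarshal"); return true; }  // hypothetical library name
            catch (UnsatisfiedLinkError e) { return false; }
        }

        // Hypothetical native entry point: copies src.length doubles into dst as raw bytes.
        private static native void arrayCopyDoubleToByte(double[] src, byte[] dst);

        public static void marshal(double[] src, byte[] dst) {
            if (NATIVE_AVAILABLE) {
                arrayCopyDoubleToByte(src, dst);   // effectively a memcpy
                return;
            }
            int count = 0;                         // pure Java fallback, as in the fragment above
            for (int i = 0; i < src.length; i++) {
                long v = Double.doubleToLongBits(src[i]);
                for (int shift = 56; shift >= 0; shift -= 8) {
                    dst[count++] = (byte)(v >>> shift);
                }
            }
        }
    }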

3.2 Typed Array Communication

The native marshaling we have discussed still requires a memory copy. Native code is able to transfer data over the network without any copy, unless the destination machine of a message uses different byte ordering, in which case a byte-order-changing memory copy is required. The same approach could be used in Java code, but Java's current design does not allow this. Currently in Java, all network communication is sent using java.io.InputStream and java.io.OutputStream; these classes only provide routines for sending bytes. This design allows Java to leave byte ordering undefined, allowing each Java Virtual Machine (JVM) to use the native machine's byte ordering. Since JVMs are only able to send bytes to each other, different internal byte ordering is unimportant. While this is a clean design, it limits performance for primitive arrays of type other than byte.

A possible solution is to add input and output classes which are able to send typed data directly without any memory copy. These classes would have the ability to automatically determine when different byte ordering is used on the input and output machines, and introduce byte ordering changes only when needed. These classes could also provide a uniform means of access to the non-TCP/IP communication found in many high performance computing clusters. Under this scheme, Java MPI implementations would request that a "factory" provide a typed array communication class capable of communicating on the local machine's specialized network.

Now consider a typical implementation of MPI's standard communication mode: for messages below a certain size threshold, messages are buffered and sent; for messages above the threshold, the sender blocks until the receiver actually posts the receive, which allows the message to be sent without any buffering. Native marshaling allows buffering in Java to occur at essentially the same speed as buffering in native code. However, native marshaling does not allow Java code to send without buffering. Using typed input and output classes alleviates this problem by allowing communication to occur directly between send and receive buffers. Together, the two additions to Java we have discussed allow Java applications to achieve communication performance which is comparable to that of native applications.

3.3 Shared Memory Communication

On multi-processor machines, it is desirable to use direct memory transfers for communication. In Java this is accomplished by placing the multiple MPI processes in a single JVM. This introduces a significant difference with native MPI: MPI processes become Java threads. This means that multiple threads in a single class which uses class variables (class variables are global to the JVM) will all see the same data. This is inconvenient for applications which assume a native MPI process model, where MPI processes do not see the same global information. However, the cost of this inconvenience is more than made up for by the fact that Java class variables can be exploited by programmers to provide a very simple and very powerful shared memory mechanism for threads residing on the same machine. MPI implementations can exploit this shared memory mechanism to speed up both point-to-point communication and global operations. We have found that global operations in Java MPI benefit greatly from using Java class variables to organize a rendezvous and direct memory transfer, rather than the standard method of using point-to-point communication for global operations.
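As a minimal sketch of this rendezvous idea, the class below uses a class (static) variable to publish the root's buffer so that the other ranks, running as threads in the same JVM, can copy from it directly. The names are illustrative, a fixed rank count is assumed, and java.util.concurrent is used for brevity even though it postdates this work; a real implementation would key the shared slot by communicator and tag.

    import java.util.concurrent.BrokenBarrierException;
    import java.util.concurrent.CyclicBarrier;

    // Intra-JVM broadcast built on a class-variable rendezvous.
    public final class SharedMemoryBcast {
        private static volatile double[] sharedSrc;      // visible to all ranks (threads) in the JVM
        private static final int NUM_RANKS = 4;          // assumed fixed thread count for this sketch
        private static final CyclicBarrier barrier = new CyclicBarrier(NUM_RANKS);

        // Every rank calls bcast with its own buffer; rank 'root' supplies the data.
        public static void bcast(double[] buf, int root, int myRank)
                throws InterruptedException, BrokenBarrierException {
            if (myRank == root) {
                sharedSrc = buf;                         // publish the root's buffer
            }
            barrier.await();                             // rendezvous: the buffer is now published
            if (myRank != root) {
                System.arraycopy(sharedSrc, 0, buf, 0, buf.length);  // direct memory copy, no sockets
            }
            barrier.await();                             // root must not reuse its buffer until all copies finish
        }
    }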
4 Thread Support

4.1 Thread Support for Shared Memory Utilization

Programmers desiring the maximum amount of performance on multi-processor machines can write programs which use Java MPI calls between machines and Java threads within a machine. One simple way that Java MPI implementations can aid this process is to provide a method for determining the number of processors on the machine. Java provides no mechanism for determining the number of processors on a machine, but this can be overcome by writing a method which determines the number of effective processors. This is easily accomplished by writing a routine which divides work among increasing numbers of threads; the number of processors is then indicated by the occurrence of a significant drop in the amount of work completed per CPU. This method can also be used by Java MPI implementations to automatically determine the number of MPI processes to run on the machine, rather than relying on a process group file.
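A sketch of such a probe is shown below: it divides a fixed amount of CPU-bound work among increasing numbers of threads and stops as soon as an additional thread no longer yields a significant gain in aggregate throughput, i.e. when the work completed per CPU drops. The work kernel, thread limit, and 25% threshold are illustrative assumptions rather than values from this work; on later JVMs, Runtime.getRuntime().availableProcessors() reports the processor count directly.

    // Estimate the number of effective processors by probing with 1, 2, 3, ... threads.
    public final class ProcessorProbe {
        private static final long WORK_UNITS = 20_000_000L;

        private static volatile long sink;               // volatile sink keeps the JIT from removing the loop
        private static void burn(long units) {
            long acc = 0;
            for (long i = 0; i < units; i++) acc += i ^ (i << 1);
            sink = acc;
        }

        // Time the same per-thread workload run by 'threads' threads in parallel.
        private static long timeWithThreads(int threads) throws InterruptedException {
            Thread[] workers = new Thread[threads];
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                workers[t] = new Thread(() -> burn(WORK_UNITS));
                workers[t].start();
            }
            for (Thread w : workers) w.join();
            return System.nanoTime() - start;
        }

        // Aggregate throughput should keep growing until the thread count exceeds the CPU count.
        public static int effectiveProcessors(int maxToProbe) throws InterruptedException {
            double prevThroughput = 0;
            int detected = 1;
            for (int n = 1; n <= maxToProbe; n++) {
                double throughput = (double) n * WORK_UNITS / timeWithThreads(n);
                if (n == 1 || throughput > prevThroughput * 1.25) {   // "significant" gain: heuristic threshold
                    detected = n;
                    prevThroughput = throughput;
                } else {
                    break;                                            // per-CPU work dropped: stop probing
                }
            }
            return detected;
        }

        public static void main(String[] args) throws InterruptedException {
            System.out.println("Effective processors: " + effectiveProcessors(8));
        }
    }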

4.2 General Thread Support

Threads in Java are useful for more than just taking advantage of multiple processors. They allow many important functions, such as I/O, to be performed outside of the main thread of execution. The widespread use of threading in Java makes support for threading a very important issue. Unfortunately, MPI includes very little thread support. Rather, MPI merely delineates what a "threaded" version of MPI should provide, and what the user should be required to provide. As threads are pervasive in Java, any Java MPI implementation should, at least, follow the guidelines provided by MPI for a threaded MPI. However, the ease and power of Java threading begs for a more elegant solution.

5 Integrating Standard High Performance Libraries Into Java

A significant issue that must be addressed is how to integrate the additions we have proposed into Java. Native marshaling could easily be added to Java's current API with very little difficulty, but the proposed classes for high performance communication of typed arrays introduce a more substantial change. As Java is largely driven by the needs of business applications, it is unlikely that a substantial class like the typed array networking class will make it into the core Java API, in spite of the huge performance increases possible. The issue of how to integrate a high performance computing API into Java is faced by several efforts to establish standard Java high performance computing libraries [12]. No standard method has yet been established for inclusion of these libraries in Java. Therefore, we propose a straightforward and flexible system for adding high performance libraries to Java.

With the introduction of the Java 2 platform, Java contains a core API, packages named java.*, and several standard extension packages named javax.*. When Sun introduced the Swing API (Java's new GUI API), Sun originally defined its package as com.sun.java.swing.*. In order to move Swing into the core library for Java 2, but leave it as an extension for JDK 1.1, Sun defined Swing's package to be javax.swing. Sun then defined javax.swing as a core API, in addition to the java.* packages, in Java 1.2. Following this pattern, we propose that libraries critical for high-performance computing be included in standard Java Grande extensions javax.grande.*. Parts of these libraries which are useful to the general public could eventually be defined as core. Less critical libraries, or libraries still under development, could be defined in grande.org.* packages. These classes could eventually be promoted to javax.* if necessary. Under this scheme, standard native libraries critical for performance would be installed on systems; if a native library cannot be found, a default Java implementation is substituted. In this way applications can have both portability to machines which do not have any native code installed, and superior performance on machines which do.

6 Design Principles

In designing our implementation of MPI in Java, we followed four major principles:

Pure Java implementation. A pure Java implementation is very desirable as it inherits all of Java's cross-platform, security, and language safety features. The only exception we allowed to the pure Java implementation is on systems where a library for native marshaling of arrays is available. In this case, it is best to use native marshaling. If the native marshaling library is unavailable, we simply use Java marshaling. As will be shown, this small bit of native code allows messaging to compete favorably with native message passing schemes.

Java Grande Forum MPI binding proposal compliance. The MPI standard contains bindings for C, Fortran, and C++, but none for Java. It is important to have well-defined Java bindings for MPI in order to foster compatible, high-quality implementations. To remedy this situation, we are working as part of the Java Grande Forum Concurrency and Applications Working Group to develop MPI bindings. We sought to follow these emerging bindings in order to allow programs written under our implementation to run under other Java MPI systems and vice versa.
High communication performance. Efficient communication is critical in order to make Java MPI a viable alternative to native MPI. When our implementation starts, it first searches for a native marshaling library. If a native library is found, it is used to perform native marshaling as described earlier. If no native marshaling library is found, a Java library is used for marshaling data. Our implementation does not yet use any typed array communication classes; we are currently working on incorporating them, and we expect to see significant performance increases when they are included. On multi-processor machines, our implementation makes use of shared access to class variables to perform efficient collective communication. This allows us to copy data directly between source and destination buffers, and achieve a high degree of efficiency.

Independence from any particular application framework. This greatly increases the usability of our implementation by allowing it to be used by any framework which provides a few simple startup functions.

7 Performance Results

7.1 Test Environment

To quantify the performance of our current implementation, we ran benchmarks on three different parallel computing environments:

1. A cluster of dual-processor Pentium II 266 MHz Windows NT machines under JDK 1.2, communicating via switched Ethernet, with only one MPI process per machine.

2. The same cluster as in 1, but with each machine running up to two MPI processes (one per CPU).

3. A four-processor 400 MHz Xeon Windows NT machine under JDK 1.2.

As stated, one of our major aims is to show that MPI under Java can match native MPI performance. In order to demonstrate this we compare performance with one of the best available MPI systems for Windows NT, WMPI [13]. WMPI was chosen because an evaluation study elsewhere [14] showed it to have very good shared memory and distributed memory communication performance. We do not compare our results with JPVM and PVM 3.4 under Windows NT because their performance is significantly less than that of WMPI. We also do not compare against Linux MPI because WMPI performance is fairly comparable to MPI on Linux (NT and Linux bandwidth on our hardware is nearly equal, while Linux latency is much lower), and because Java on Linux is far less advanced than Java on Windows NT.

7.2 Point-to-Point Communication Performance

Ping Pong

The Ping Pong benchmark finds the maximum bandwidth that can be achieved sending bytes between two nodes, one direction at a time. As can be seen in Figure 2 and Figure 3, our distributed memory communication is essentially equivalent to that of WMPI. Shared memory performance of our implementation is reasonably close to that of WMPI. (Note that this test was run with explicit tags and sources in the MPI receive call. Our implementation currently performs significantly slower when using the MPI.ANY_SOURCE wildcard. The cause of this inefficiency seems to be synchronization overhead, and we are investigating more efficient implementations.)

Figure 2: Ping Pong Distributed Memory
Figure 3: Ping Pong Shared Memory

Ping Ping

The Ping Ping test (Figure 4 and Figure 5) finds the maximum bandwidth that can be obtained between two nodes when messages are being sent simultaneously in both directions. Once again, distributed memory performance is equivalent to that of WMPI. However, in this case, our implementation significantly outperforms WMPI in a shared memory environment.

Figure 4: Ping Ping Distributed Memory
Figure 5: Ping Ping Shared Memory
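For reference, a minimal Ping Pong measurement loop written against the draft mpiJava-style binding [9, 11] might look as follows. The mpi.* package and the method signatures are assumed from that binding and may differ in detail, and the message size and repetition count are arbitrary; note the explicit source rank in Recv, matching the test configuration described above.

    import mpi.MPI;            // mpiJava-style draft binding assumed; names may differ in other bindings
    import mpi.MPIException;

    // Rank 0 sends a byte array to rank 1, rank 1 sends it straight back,
    // and the round-trip times give the bandwidth.
    public class PingPong {
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);
            int rank = MPI.COMM_WORLD.Rank();
            final int REPS = 100, TAG = 0;
            byte[] buf = new byte[1 << 20];                    // 1 MB message (illustrative size)

            double start = MPI.Wtime();
            for (int r = 0; r < REPS; r++) {
                if (rank == 0) {
                    MPI.COMM_WORLD.Send(buf, 0, buf.length, MPI.BYTE, 1, TAG);
                    MPI.COMM_WORLD.Recv(buf, 0, buf.length, MPI.BYTE, 1, TAG);  // explicit source, not ANY_SOURCE
                } else if (rank == 1) {
                    MPI.COMM_WORLD.Recv(buf, 0, buf.length, MPI.BYTE, 0, TAG);
                    MPI.COMM_WORLD.Send(buf, 0, buf.length, MPI.BYTE, 0, TAG);
                }
            }
            double elapsed = MPI.Wtime() - start;
            if (rank == 0) {
                double mbytes = 2.0 * REPS * buf.length / (1024.0 * 1024.0);    // data moved in both directions
                System.out.println("Bandwidth: " + (mbytes / elapsed) + " MB/s");
            }
            MPI.Finalize();
        }
    }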

Communication of Various Primitive Types

The Ping Pong and Ping Ping tests measure communication of bytes. As stated previously, communicating other data types is more troublesome in Java. Figure 6 compares our implementation and WMPI communicating double precision floating point data and integer data. The native marshaling technique mentioned previously allows our implementation to reach essentially the same performance as WMPI on double precision floating point data, and on integer data our implementation actually outperforms WMPI slightly. If a native marshaling library is unavailable, our implementation will use pure Java marshaling. The lowest line in Figure 6 represents double precision floating point communication performance when Java marshaling is used, and clearly shows that the use of Java marshaling instead of native marshaling results in significantly worse performance.

Figure 6: Communication of Primitive Types (double with native marshalling, double with Java marshalling, and int with native marshalling)

Startup Latency

Table 1 shows startup latency for both distributed and shared memory. Our implementation's distributed memory latency is lower than that of WMPI, possibly because our implementation is built directly on the Java socket API, while WMPI relies on an intermediate API before accessing the Windows socket API. However, our implementation is significantly slower in shared memory mode, possibly due to Java synchronization overhead.

Table 1: Startup latencies (in microseconds) for our implementation and WMPI, in shared and distributed memory

7.3 Other Benchmarks

Barrier

The Barrier test measures process synchronization performance. Figures 7, 8, and 9 compare our implementation's performance to that of WMPI for the hybrid system, the shared memory system, and the distributed memory system. Our implementation performs well in both the hybrid and distributed memory modes, but is significantly slower in shared memory. This performance gap should shrink significantly once we optimize the shared memory barrier code.

Figure 7: Barrier Hybrid Memory
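A minimal sketch of a counter-based intra-JVM barrier of the kind discussed here is shown below; it relies only on Java monitors, and the class name and structure are illustrative rather than taken from our implementation. A per-communicator instance would be held in a static field so that all ranks (threads) in the JVM reach the same object.

    // Generation-counting barrier for threads within one JVM.
    public final class SharedMemoryBarrier {
        private final int parties;
        private int waiting = 0;
        private int generation = 0;          // distinguishes successive barrier rounds

        public SharedMemoryBarrier(int parties) { this.parties = parties; }

        public synchronized void await() throws InterruptedException {
            int myGeneration = generation;
            if (++waiting == parties) {      // last rank to arrive releases everyone
                waiting = 0;
                generation++;
                notifyAll();
            } else {
                while (generation == myGeneration) {
                    wait();                  // block until the last rank arrives
                }
            }
        }
    }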

Figure 8: Barrier Shared Memory
Figure 9: Barrier Distributed Memory

NAS Parallel Benchmarks: Integer Sort

As a final test, we evaluated the performance of our implementation on a single NAS Parallel Benchmark: Integer Sort [15]. We compare this performance with the performance of both WMPI on the four-processor Xeon and of MPI on an IBM SP2. A critical element for this benchmark is the performance of the MPI function ALLTOALLV. We optimized this function to exploit shared memory variables. As shown in Figure 10, our implementation was able to outperform WMPI, and performed quite well compared to the SP2 [16].

Figure 10: Integer Sort (seconds vs. processors; Pentium II 266 cluster, Xeon 400, IBM SP2 LAM, IBM SP2 IBM MPI)

8 Conclusions

We have shown several instances where MPI implemented in Java can match the performance of native MPI in a clustered environment. Achieving this performance in Java requires careful implementation of data marshaling. Currently, data marshaling must occur in native code in order to achieve high performance. In our view, this functionality should be added to the core Java classes, either by allowing System.arraycopy to copy between arrays of different types or by adding a method which has this functionality. However, this still requires a memory copy. The most demanding environments will require a zero copy communication system. This is possible by adding a class similar to DataOutputStream that is capable of sending arrays without marshaling unless the message destination requires different byte ordering.

We have also shown that Java MPI implementations which allow multiple threads to exist in a single JVM can exploit shared access to static variables. We have demonstrated how this technique can be used by MPI to speed up global operations, but application programmers could also use threads directly to allow shared access to data without any message passing. As Java MPI implementations mature and incorporate key communication capabilities, they will be able to provide a viable alternative to native MPI implementations.

9 Future Work

We have examined some of the most critical Java MPI performance issues, but there are still many other open questions to be addressed. In addition, while our implementation of MPI contains the most essential functionality, it is not yet complete. Future work will address implementation of the remaining MPI features as they are included in the final Java MPI bindings, as well as implementation on supercomputers such as the IBM SP2.

References

[1] MPI Forum. MPI: A message-passing interface standard. International Journal of Supercomputer Applications, 8(3/4), 1994.

[2] A. Ferrari. JPVM: network parallel computing in Java. Concurrency: Practice and Experience, vol. 10 (11-13), pp. 985-992, 1998.

[3] M. Philippsen and M. Zenger. JavaParty - transparent remote objects in Java. Concurrency: Practice and Experience, vol. 9 (11), pp. 1225-1242, 1997.

[4] H. Pedroso, L. M. Silva, and J. G. Silva. Web-based metacomputing with JET. Concurrency: Practice and Experience, vol. 9 (11), pp. 1169-1173, 1997.

[5] P. Gray and V. Sunderam. IceT: Distributed computing and Java. Concurrency: Practice and Experience, vol. 9 (11), pp. 1161-1167, 1997.

[6] JavaSoft. Remote method invocation. Technical report, docs/guide/rmi/index.html, 1997.

[7] JavaSoft. JavaSpaces. Technical report, javaspaces/, 1997.

[8] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM 3 user's guide and reference manual. Technical Report ORNL/TM-12187, Oak Ridge National Laboratory, Sept.

[9] B. Carpenter, G. Fox, G. Zhang, and X. Li. A draft Java binding for MPI. pcrc/hpjava/mpijava.html, Nov.

[10] S. Mintchev and V. Getov. Towards portable message passing in Java: Binding MPI. In M. Bubak, J. Dongarra, J. Wasniewski (Eds.), Recent Advances in PVM and MPI, LNCS, Springer, pp. 135-142, Nov. 1997.

[11] B. Carpenter, V. Getov, G. Judd, T. Skjellum, and G. Fox. MPI for Java: Position Document and Draft API Specification. Technical Report JGF-TR-3, Java Grande Forum, Nov. 1998.

[12] Java Grande Forum. Making Java Work for High-End Computing. Technical Report JGF-TR-1, Java Grande Forum, Nov. 1998.

[13] WMPI. Technical report.

[14] M. Baker and G. Fox. MPI on NT: A preliminary evaluation of the available environments. In Jose Rolim (Ed.), Parallel and Distributed Computing (12th IPPS and 9th SPDP), LNCS, Springer, pp. 549-563, April 1998.

[15] D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan, and S. Weeratunga. The NAS parallel benchmarks. Technical Report RNR-94-007, NASA Ames Research Center, 1994.

[16] V. Getov, S. Flynn-Hummel, and S. Mintchev. High-performance parallel programming in Java: Exploiting native libraries. Concurrency: Practice and Experience, vol. 10 (11-13), pp. 863-872, 1998.


Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc. Chapter 1 GETTING STARTED SYS-ED/ Computer Education Techniques, Inc. Objectives You will learn: Java platform. Applets and applications. Java programming language: facilities and foundation. Memory management

More information

White Paper: Delivering Enterprise Web Applications on the Curl Platform

White Paper: Delivering Enterprise Web Applications on the Curl Platform White Paper: Delivering Enterprise Web Applications on the Curl Platform Table of Contents Table of Contents Executive Summary... 1 Introduction... 2 Background... 2 Challenges... 2 The Curl Solution...

More information

Java Virtual Machine

Java Virtual Machine Evaluation of Java Thread Performance on Two Dierent Multithreaded Kernels Yan Gu B. S. Lee Wentong Cai School of Applied Science Nanyang Technological University Singapore 639798 guyan@cais.ntu.edu.sg,

More information

EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH PARALLEL IN-MEMORY DATABASE. Dept. Mathematics and Computing Science div. ECP

EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH PARALLEL IN-MEMORY DATABASE. Dept. Mathematics and Computing Science div. ECP EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN/ECP 95-29 11 December 1995 ON-LINE EVENT RECONSTRUCTION USING A PARALLEL IN-MEMORY DATABASE E. Argante y;z,p. v.d. Stok y, I. Willers z y Eindhoven University

More information

Throughput in Mbps. Ethernet e+06 1e+07 Block size in bits

Throughput in Mbps. Ethernet e+06 1e+07 Block size in bits NetPIPE: A Network Protocol Independent Performance Evaluator Quinn O. Snell, Armin R. Mikler and John L. Gustafson Ames Laboratory/Scalable Computing Lab, Ames, Iowa 5, USA snelljmiklerjgus@scl.ameslab.gov

More information

David H. Bailey. November 14, computational uid dynamics and other aerophysics applications. Presently this organization

David H. Bailey. November 14, computational uid dynamics and other aerophysics applications. Presently this organization Experience with Parallel Computers at NASA Ames David H. Bailey November 14, 1991 Ref: Intl. J. of High Speed Computing, vol. 5, no. 1 (993), pg. 51{62. Abstract Beginning in 1988, the Numerical Aerodynamic

More information

Yasuo Okabe. Hitoshi Murai. 1. Introduction. 2. Evaluation. Elapsed Time (sec) Number of Processors

Yasuo Okabe. Hitoshi Murai. 1. Introduction. 2. Evaluation. Elapsed Time (sec) Number of Processors Performance Evaluation of Large-scale Parallel Simulation Codes and Designing New Language Features on the (High Performance Fortran) Data-Parallel Programming Environment Project Representative Yasuo

More information

High Performance Computing Course Notes Message Passing Programming I

High Performance Computing Course Notes Message Passing Programming I High Performance Computing Course Notes 2008-2009 2009 Message Passing Programming I Message Passing Programming Message Passing is the most widely used parallel programming model Message passing works

More information

short long double char octet struct Throughput in Mbps Sender Buffer size in KBytes short long double char octet struct

short long double char octet struct Throughput in Mbps Sender Buffer size in KBytes short long double char octet struct Motivation Optimizations for High Performance ORBs Douglas C. Schmidt (www.cs.wustl.edu/schmidt) Aniruddha S. Gokhale (www.cs.wustl.edu/gokhale) Washington University, St. Louis, USA. Typical state of

More information

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Zhou B. B., Brent R. P. and Tridgell A. y Computer Sciences Laboratory The Australian National University Canberra,

More information

Parallel Pipeline STAP System

Parallel Pipeline STAP System I/O Implementation and Evaluation of Parallel Pipelined STAP on High Performance Computers Wei-keng Liao, Alok Choudhary, Donald Weiner, and Pramod Varshney EECS Department, Syracuse University, Syracuse,

More information

Go Deep: Fixing Architectural Overheads of the Go Scheduler

Go Deep: Fixing Architectural Overheads of the Go Scheduler Go Deep: Fixing Architectural Overheads of the Go Scheduler Craig Hesling hesling@cmu.edu Sannan Tariq stariq@cs.cmu.edu May 11, 2018 1 Introduction Golang is a programming language developed to target

More information

Cost-Performance Evaluation of SMP Clusters

Cost-Performance Evaluation of SMP Clusters Cost-Performance Evaluation of SMP Clusters Darshan Thaker, Vipin Chaudhary, Guy Edjlali, and Sumit Roy Parallel and Distributed Computing Laboratory Wayne State University Department of Electrical and

More information

Communication Characteristics in the NAS Parallel Benchmarks

Communication Characteristics in the NAS Parallel Benchmarks Communication Characteristics in the NAS Parallel Benchmarks Ahmad Faraj Xin Yuan Department of Computer Science, Florida State University, Tallahassee, FL 32306 {faraj, xyuan}@cs.fsu.edu Abstract In this

More information