Design Issues for Efficient Implementation of MPI in Java

Glenn Judd, Mark Clement, Quinn Snell
Computer Science Department, Brigham Young University, Provo, USA

Vladimir Getov
School of Computer Science, University of Westminster, London, UK

Abstract

While there is growing interest in using Java for high-performance applications, many in the high-performance computing community do not believe that Java can match the performance of traditional native message passing environments. This paper discusses critical issues that must be addressed in the design of Java-based message passing systems. Efficient handling of these issues allows Java MPI applications to obtain performance which rivals that of traditional native message passing systems. To illustrate these concepts, the design and performance of a pure Java implementation of MPI are discussed.

1 Introduction

The Message Passing Interface (MPI) [1] has proven to be an effective means of writing portable parallel programs. With the increasing interest in using Java for high-performance computing, several groups have investigated using MPI from within Java. Nevertheless, there are still many in the high-performance computing community who are skeptical that Java MPI performance can compete with native MPI. These skeptics usually point to data showing that early Java implementations of message passing standards [2] performed orders of magnitude slower than native versions. Their skepticism is further backed by the fact that, as of yet, there has been no MPI implementation written completely in Java that is competitive with native MPI implementations. To investigate the performance possible for MPI in Java, we have designed and implemented an MPI implementation written completely in Java which seeks to be competitive with native MPI implementations in clustered computing environments.
In this paper we discuss issues that must be addressed in order to efficiently implement MPI in Java, explain how they are concretely addressed in our implementation, and compare the performance obtained with that of native MPI. Our results show that there are many cases where MPI in Java can compete with native MPI. Section 2 reviews previous work on Java message passing. In Section 3, we discuss issues which must be addressed in order to allow for efficient communication between MPI processes in Java. In Section 4 we discuss issues related to supporting threads in Java MPI processes. Section 5 discusses methods for integrating high performance libraries into Java. Section 6 discusses the design of our pure Java implementation of MPI. Section 7 presents performance results.

2 Related Work

Over the last few years many Java message passing systems have been developed. A large number of these systems, such as JavaParty [3], JET [4], and IceT [5], have developed novel parallel programming methodologies using Java. Others have looked at using variations on Java Remote Method Invocation [6] or JavaSpaces [7] for high performance computing. A number of efforts have also investigated Java versions of established message passing standards such as MPI [1] and Parallel Virtual Machine (PVM) [8]. JPVM [2] is an implementation of PVM written completely in Java. Unfortunately, JPVM has very poor performance compared to native PVM and MPI systems. mpiJava [9] is a Java wrapper to native MPI implementations. It allows application code to be written in pure Java, but currently requires a native MPI implementation in order to function. JavaMPI [10] is also a Java wrapper to native MPI libraries, but JavaMPI wrappers are generated automatically with the help of a special-purpose tool called JCI (Java-to-C Interface generator). Efforts are currently underway to develop a standard Java MPI binding in order to increase the interoperability and quality of Java MPI bindings [11]. This research adds to that effort by exploring issues that must be addressed in order to efficiently implement MPI in Java. These issues can then be addressed both by the developing Java MPI bindings and by the Java environment itself, in order to foster the development of efficient message passing systems written completely in Java.

3 Network Communication in Java

3.1 Native Marshalling

High network communication performance is a critical element of any MPI implementation. Achieving high network communication performance under Java requires consideration of issues not found in native code. Before discussing these issues, consider Figure 6, which compares Java byte array communication to C byte array communication. It is clear that Java communication of byte arrays completely matches C in this case. As both C and Java rely on the same underlying communication library to carry out the communication, it is not surprising that they achieve very comparable results.

However, most communication in MPI consists of data other than bytes. In C this is a trivial issue since an array of any type can be type cast to a byte array, but in Java this issue is significant because a simple type cast is not permitted. The most common method of sending non-byte data in Java is to marshal the data into a byte array and then send this byte array. Performing this marshaling in Java code is an intrinsically less efficient operation than performing it in native code. Consider the following Java code fragment which marshals an array of doubles:

    void marshal(double[] src, byte[] dst) {
        int count = 0;
        for (int i = 0; i < src.length; i++) {
            long value = Double.doubleToLongBits(src[i]);
            dst[count++] = (byte)(value >>> 56);
            dst[count++] = (byte)(value >>> 48);
            dst[count++] = (byte)(value >>> 40);
            dst[count++] = (byte)(value >>> 32);
            dst[count++] = (byte)(value >>> 24);
            dst[count++] = (byte)(value >>> 16);
            dst[count++] = (byte)(value >>> 8);
            dst[count++] = (byte)value;
        }
    }

Figure 1: Native vs. Java Marshalling (JDK 1.2 on a Pentium II 266 MHz machine running Windows NT)

In C this marshaling can be accomplished with a simple memcpy. In Java the equivalent marshaling code requires a total of 35 Java bytecode instructions, including a method invocation, several shift operations, and several type conversion operations, all of which are not required in C. It is suggested in [12] that just-in-time (JIT) compilers should be able to optimize data marshaling into its native equivalent (i.e. eliminate the method invocation, shifts, etc., and replace them with memcpy code). While this is theoretically possible, it is complicated by the fact that there are several primitive types, and several approaches to marshaling each of these types, that a JIT compiler would need to be able to optimize. At this time, we are not aware of any JIT compiler which even attempts this optimization. A much simpler solution is to follow the precedent established by the System.arraycopy method already included in Java. This method is used to efficiently copy data between Java arrays. Currently this method requires both source and destination to be of the same data type. This routine could be extended to allow the source and destination arrays to be of different primitive types. Alternatively, a new method could be added which would specifically allow copying data between arrays of different primitive types. Note that such a method would not compromise Java language safety or security, as it introduces no new functionality but rather expedites existing functionality.
As shown in Figure 1, this method enables huge increases in data marshaling speed, and allows data marshaling to occur at the same speed as a memcpy.
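As a check on the bit-shift marshaling discussed above, the fragment can be paired with its inverse and verified with a round trip. This is an illustrative sketch of ours; the class and method names are not part of the implementation described in this paper:

```java
public class MarshalDemo {
    // Marshal doubles into bytes, big-endian, as in the fragment above.
    static void marshal(double[] src, byte[] dst) {
        int count = 0;
        for (int i = 0; i < src.length; i++) {
            long value = Double.doubleToLongBits(src[i]);
            for (int shift = 56; shift >= 0; shift -= 8)
                dst[count++] = (byte) (value >>> shift);
        }
    }

    // Inverse operation: rebuild each double from its eight bytes.
    static void unmarshal(byte[] src, double[] dst) {
        int count = 0;
        for (int i = 0; i < dst.length; i++) {
            long value = 0;
            for (int b = 0; b < 8; b++)
                value = (value << 8) | (src[count++] & 0xFFL);
            dst[i] = Double.longBitsToDouble(value);
        }
    }

    public static void main(String[] args) {
        double[] in = {1.5, -2.25, 3.14159};
        byte[] wire = new byte[in.length * 8];
        double[] out = new double[in.length];
        marshal(in, wire);
        unmarshal(wire, out);
        System.out.println(java.util.Arrays.equals(in, out)); // round trip preserves values
    }
}
```

Because marshaling goes through doubleToLongBits, the round trip is bit-exact, which is exactly the behavior a native bulk copy would have to reproduce.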
3.2 Typed Array Communication

The native marshaling we have discussed still requires a memory copy. Native code is able to transfer data over the network without any copy unless the destination machine of a message uses different byte ordering, in which case a byte-order-changing memory copy is required. The same approach could be used in Java code, but Java's current design does not allow this. Currently in Java, all network communication is performed using java.io.InputStream and java.io.OutputStream, and these classes only provide routines for sending bytes. This design allows Java to leave byte ordering undefined, so that each Java Virtual Machine (JVM) can use the native machine's byte ordering. Since JVMs are only able to send bytes to each other, different internal byte ordering is unimportant. While this is a clean design, it limits performance for primitive arrays of type other than byte.

A possible solution is to add input and output classes which are able to send typed data directly without any memory copy. These classes would have the ability to automatically determine when different byte ordering is used on the input and output machines, and introduce byte ordering changes only when needed. These classes could also provide a uniform means of access to the non-TCP/IP communication found in many high performance computing clusters. Under this scheme, Java MPI implementations would request that a "factory" provide a typed array communication class capable of communicating on the local machine's specialized network.

Now consider a typical implementation of MPI's standard communication mode: for messages below a certain size threshold, messages are buffered and sent. For messages above the threshold, the sender blocks until the receiver actually posts the receive. This allows the message to be sent without any buffering. Native marshaling allows buffering in Java to occur at essentially the same speed as buffering in native code.
However, native marshaling does not allow Java code to send without buffering. Using typed input and output classes alleviates this problem by allowing communication to occur directly between send and receive buffers. Together, the two additions to Java we have discussed allow Java applications to achieve communication performance comparable to that of native applications.

3.3 Shared Memory Communication

On multi-processor machines, it is desirable to use direct memory transfers for communication. In Java this is accomplished by placing multiple MPI processes in a single JVM. This introduces a significant difference from native MPI: MPI processes become Java threads. This means that multiple threads in a single class which uses class variables (class variables are global to the JVM) will all see the same data. This is inconvenient for applications which assume a native MPI process model, where MPI processes do not see the same global information. However, the cost of this inconvenience is more than made up for by the fact that Java class variables can be exploited by programmers to provide a very simple and very powerful shared memory mechanism for threads residing on the same machine. MPI implementations can exploit this shared memory mechanism to speed up both point-to-point communication and global operations. We have found that global operations in Java MPI benefit greatly from using Java class variables to organize a rendezvous and direct memory transfer, rather than the standard method of using point-to-point communication for global operations.

4 Thread Support

4.1 Thread Support for Shared Memory Utilization

Programmers desiring the maximum amount of performance on multi-processor machines can write programs which use Java MPI calls between machines and Java threads within a machine. One simple way that Java MPI implementations can aid this process is to provide a method for determining the number of processors on the machine.
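Such a processor-counting probe, elaborated below, can be sketched as follows. This is our illustration only: it uses the later java.util.concurrent API (which postdates the JDK discussed in this paper), and the 1.5x slowdown threshold and work sizes are arbitrary assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ProcessorProbe {
    // A fixed amount of floating-point work that cannot be optimized away.
    static double burn(int iters) {
        double x = 1.0;
        for (int i = 0; i < iters; i++) x = x * 1.0000001 + 1e-9;
        return x;
    }

    // Wall-clock time for `threads` threads to each complete the same work.
    static long timeWith(int threads, int iters) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        Future<?>[] fs = new Future<?>[threads];
        for (int t = 0; t < threads; t++) fs[t] = pool.submit(() -> burn(iters));
        for (Future<?> f : fs) f.get();
        pool.shutdown();
        return System.nanoTime() - start;
    }

    // Effective processors: the largest thread count whose wall time is still
    // close to the single-thread time, i.e. the work truly ran in parallel.
    static int effectiveProcessors(int maxThreads, int iters) throws Exception {
        long base = timeWith(1, iters);
        int effective = 1;
        for (int t = 2; t <= maxThreads; t++) {
            if (timeWith(t, iters) < base * 1.5) effective = t;
            else break;  // significant drop in work per CPU: stop here
        }
        return effective;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("effective processors: " + effectiveProcessors(8, 20_000_000));
    }
}
```

Later JDKs expose Runtime.getRuntime().availableProcessors() directly; a timing probe of this shape is only needed on platforms, like the one discussed here, that provide no such call.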
Java provides no mechanism for determining the number of processors on a machine, but this can be overcome by writing a method which determines the number of effective processors. This is easily accomplished by writing a routine which divides work among increasing numbers of threads. The number of processors is then indicated by the occurrence of a significant drop in the amount of work per CPU. This method can also be used by Java MPI implementations to automatically determine the number of MPI processes to run on the machine, rather than relying on a process group file.

4.2 General Thread Support

Threads in Java are useful for more than just taking advantage of multiple processors. They allow many important functions, such as I/O, to be performed outside of the main thread of execution. The widespread use of threading in Java makes support for threading a very important issue. Unfortunately, MPI includes very little thread support. Rather, MPI merely delineates what a "threaded" version of MPI should provide, and what the user should be required to provide. As threads are pervasive in Java, any Java MPI implementation should, at least, follow the guidelines provided by
MPI for a threaded MPI. However, the ease and power of Java threading begs for a more elegant solution.

5 Integrating Standard High Performance Libraries Into Java

A significant issue that must be addressed is how to integrate the additions we have proposed into Java. Native marshaling could easily be added to Java's current API with very little difficulty, but the proposed classes for high performance communication of typed arrays introduce a more substantial change. As Java is largely driven by the needs of business applications, it is unlikely that a substantial class like the typed array networking class will make it into the core Java API, in spite of the huge performance increases possible. The issue of how to integrate a high performance computing API into Java is faced by several efforts to establish standard Java high performance computing libraries [12]. No standard method has yet been established for inclusion of these libraries in Java. Therefore, we propose a straightforward and flexible system for adding high performance libraries to Java. With the introduction of the Java 2 platform, Java contains a core API, packages named java.*, and several standard extension packages named javax.*. When Sun introduced the Swing API (Java's new GUI API), Sun originally defined its package as com.sun.java.swing.*. In order to move Swing into the core library for Java 2, but leave it as an extension for JDK 1.1, Sun defined Swing's package to be javax.swing. Sun then defined javax.swing as a core API, in addition to the java.* packages, in Java 1.2. Following this pattern, we propose that libraries critical for high-performance computing be included in standard Java Grande extensions javax.grande.*. Parts of these libraries which are useful to the general public could eventually be defined as core. Less critical libraries, or libraries still under development, could be defined in grande.org.* packages. These classes could eventually be promoted to javax.* if necessary.
So, under this scheme, standard native libraries critical for performance would be installed on systems. If a native library could not be found, a default Java implementation would be substituted. In this way applications can have both portability to machines which do not have any native code installed, and superior performance on machines which do.

6 Design Principles

In designing our implementation of MPI in Java, we followed four major principles:

Pure Java implementation. A pure Java implementation is very desirable as it inherits all of Java's cross-platform, security, and language safety features. The only exception we allowed to the pure Java implementation is on systems where a library for native marshaling of arrays is available. In this case, it is best to use native marshaling. If the native marshaling library is unavailable, we simply use Java marshaling. As will be shown, this small bit of native code allows messaging to compete favorably with native message passing schemes.

Java Grande Forum MPI binding proposal compliance. The MPI standard contains bindings for C, FORTRAN, and C++, but none for Java. It is important to have well-defined Java bindings for MPI in order to foster compatible, high-quality implementations. To remedy this situation, we are working as part of the Java Grande Forum Concurrency and Applications Working Group to develop MPI bindings. We sought to follow these emerging bindings in order to allow programs written for our implementation to run under other Java MPI systems and vice versa.

High communication performance. Efficient communication is critical in order to make Java MPI a viable alternative to native MPI. When our implementation starts, it first searches for a native marshaling library. If a native library is found, it is used to perform native marshaling as described earlier. If no native marshaling library is found, a Java library is used for marshaling data. Our implementation does not yet use any typed array communication classes.
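The search-then-fall-back startup described above can be sketched as follows. The library name "nativemarshal" and the marshaller classes are hypothetical placeholders of ours, not names from the implementation discussed in this paper:

```java
public class MarshalLoader {
    interface Marshaller { void marshal(double[] src, byte[] dst); }

    // Try the fast native path; fall back to pure Java if the library is absent.
    static Marshaller load() {
        try {
            System.loadLibrary("nativemarshal"); // hypothetical native library name
            return new NativeMarshaller();       // found: use the native fast path
        } catch (UnsatisfiedLinkError e) {
            return new JavaMarshaller();         // not found: pure-Java fallback
        }
    }

    // Portable fallback: big-endian bit-shift marshaling in plain Java.
    static class JavaMarshaller implements Marshaller {
        public void marshal(double[] src, byte[] dst) {
            int count = 0;
            for (double d : src) {
                long value = Double.doubleToLongBits(d);
                for (int shift = 56; shift >= 0; shift -= 8)
                    dst[count++] = (byte) (value >>> shift);
            }
        }
    }

    // Fast path: declared native, implemented by the optional library.
    static class NativeMarshaller implements Marshaller {
        public native void marshal(double[] src, byte[] dst);
    }

    public static void main(String[] args) {
        System.out.println(load().getClass().getSimpleName());
    }
}
```

On a machine without the native library, loadLibrary throws UnsatisfiedLinkError and the pure-Java path is selected, so the application stays portable at the cost of marshaling speed.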
We are currently working on incorporating typed array communication classes, and we expect to see significant performance increases when they are included. On multi-processor machines, our implementation makes use of shared access to class variables to perform efficient collective communication. This allows us to directly copy data between source and destination buffers, and achieve a high degree of efficiency.

Independence from any particular application framework. This greatly increases the usability of the implementation by allowing it to be used by any framework which provides a few simple startup functions.

7 Performance Results

7.1 Test Environment

To quantify the performance of our current implementation, we ran benchmarks on three different parallel computing environments:
1. A cluster of dual-processor Pentium II 266 MHz Windows NT machines under JDK 1.2 communicating via switched 100 Mbps Ethernet, with only one MPI process per machine.
2. The same cluster as in 1, but with each machine running up to two MPI processes (one per CPU).
3. A 4-processor 400 MHz Xeon Windows NT machine under JDK 1.2.

Figure 2: Ping Pong Distributed Memory

As stated, one of our major aims is to show that MPI under Java can match native MPI performance. In order to demonstrate this we compare performance with one of the best available MPI systems for Windows NT, WMPI [13]. WMPI was chosen because an evaluation study elsewhere [14] showed it to have very good shared memory and distributed memory communication performance. We do not compare our results with JPVM and PVM 3.4 under Windows NT because their performance is significantly less than that of WMPI. We also do not compare against MPI on Linux, because WMPI performance is fairly comparable to MPI on Linux (NT and Linux bandwidth on our hardware is nearly equal, while Linux latency is much lower), and because Java on Linux is far less advanced than Java on Windows NT.

7.2 Point-to-Point Communication Performance

Ping Pong. The Ping Pong benchmark finds the maximum bandwidth that can be achieved sending bytes between two nodes, one direction at a time. As can be seen in Figure 2 and Figure 3, distributed memory communication is essentially equivalent to that of WMPI. Shared memory performance of our implementation is reasonably close to that of WMPI. (Note that this test was run with explicit tags and sources in the MPI receive call. Our implementation currently performs significantly slower when using the MPI.ANY_SOURCE wildcard. The cause of this inefficiency seems to be synchronization overhead, and we are investigating more efficient implementations.)

Figure 3: Ping Pong Shared Memory

Figure 4: Ping Ping Distributed Memory
Ping Ping. The Ping Ping test (Figure 4 and Figure 5) finds the maximum bandwidth that can be obtained between two nodes when messages are being sent simultaneously in both directions. Once again, distributed memory performance is equivalent to that of WMPI. However, in this case, our implementation significantly outperforms WMPI in a shared memory environment.

Communication of Various Primitive Types. The Ping Pong and Ping Ping tests measure communication of bytes. As stated previously, communicating other data types is more troublesome in Java. Figure 6 compares our implementation and WMPI communicating double precision floating point data and integer data. The native marshaling technique mentioned previously allows our implementation to reach essentially the same performance as WMPI on double precision floating point data, and on integer data it actually outperforms WMPI slightly. If a native marshaling library is unavailable, pure Java marshaling is used. The
lowest line in Figure 6 represents double precision floating point communication performance when Java marshaling is used, and clearly shows that the use of Java marshaling instead of native marshaling results in significantly worse performance.

Figure 5: Ping Ping Shared Memory

Figure 6: Communication of Primitive Types (double and int, native vs. Java marshalling; message size in kbytes)

Startup Latency. Table 1 shows startup latency for both distributed and shared memory.

Table 1: Startup latencies for our implementation and WMPI (shared and distributed memory)

Our implementation's distributed memory latency is lower than that of WMPI. This is possibly due to the fact that our implementation is built directly on the Java socket API, while WMPI relies on an intermediate API before accessing the Windows socket API. However, our implementation is significantly slower in shared memory mode. This is possibly due to Java synchronization overhead.

7.3 Other Benchmarks

Barrier. The Barrier test measures process synchronization performance. Figures 7, 8, and 9 compare the performance of our implementation to that of WMPI on the hybrid system, the shared memory system, and the distributed memory system. Our implementation performs well in both the hybrid and distributed memory modes, but is significantly slower in shared memory. This performance gap should shrink significantly once we optimize the shared memory barrier code.

Figure 7: Barrier Hybrid Memory

NAS Parallel Benchmarks: Integer Sort. As a final test, we evaluated the performance of our implementation on a single NAS Parallel Benchmark: Integer Sort [15]. We compare this performance with the performance of both
WMPI on the four-processor Xeon, and of MPI on an SP2. A critical element for this benchmark is the performance of the MPI function ALLTOALLV. We optimized this function to exploit shared memory variables. As shown in Figure 10, our implementation was able to outperform WMPI, and performed quite well compared to the SP2 [16].

Figure 8: Barrier Shared Memory

Figure 9: Barrier Distributed Memory

Figure 10: Integer Sort (Pentium II 266 cluster, 400 MHz Xeon, IBM SP2 with LAM, IBM SP2 with IBM MPI)

8 Conclusions

We have shown several instances where MPI implemented in Java can match the performance of native MPI in a clustered environment. Achieving this performance in Java requires careful implementation of data marshaling. Currently, data marshaling must occur in native code in order to achieve high performance. In our view, this functionality should be added to the core Java classes, either by allowing System.arraycopy to copy between arrays of different types or by adding a method which has this functionality. However, this still requires a memory copy. The most demanding environments will require a zero-copy communication system. This is possible by adding a class similar to DataOutputStream that is capable of sending arrays without marshaling unless the message destination requires different byte ordering. We have also shown that Java MPI implementations which allow multiple threads to exist in a single JVM can exploit shared access to static variables. We have demonstrated how this technique can be used by MPI to speed up global operations, but application programmers could also use threads directly to allow shared access to data without any message passing. As Java MPI implementations mature and incorporate key communication capabilities, they will be able to provide a viable alternative to native MPI implementations.
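The swap-only-when-needed behavior of the zero-copy class proposed in these conclusions can be illustrated with the later java.nio API, which postdates this paper. The one-byte order header is our own hypothetical wire format, not part of any described implementation:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class TypedSendSketch {
    // Sender: one header byte announcing the sender's native byte order,
    // followed by the doubles in that order -- no swap on the sending side.
    static byte[] send(double[] data) {
        ByteBuffer buf = ByteBuffer.allocate(1 + data.length * 8)
                                   .order(ByteOrder.nativeOrder());
        buf.put((byte) (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN ? 1 : 0));
        buf.asDoubleBuffer().put(data); // view starts after the header byte
        return buf.array();
    }

    // Receiver: read the header, then reinterpret the payload in the sender's
    // order; bytes are only swapped when the two machines actually differ.
    static double[] receive(byte[] wire, int count) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        ByteOrder senderOrder =
            buf.get() == 1 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        buf.order(senderOrder);
        double[] out = new double[count];
        buf.asDoubleBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        double[] in = {1.0, 2.5, -7.75};
        double[] out = receive(send(in), in.length);
        System.out.println(java.util.Arrays.equals(in, out)); // true on any pairing of orders
    }
}
```

Between two like-ordered machines the receive side degenerates to a straight reinterpretation of the incoming bytes, which is the zero-copy case the conclusions argue for.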
9 Future Work

We have examined some of the most critical Java MPI performance issues, but there are still many other open questions to be addressed. In addition, while our implementation of MPI contains the most essential functionality, it is not yet complete. Future work will address implementation of the remaining MPI features as they are included in the final Java MPI bindings, as well as implementation on supercomputers such as the IBM SP2.

References

[1] MPI Forum. MPI: A message-passing interface standard. International Journal of Supercomputer Applications, 8(3/4), 1994.
[2] A. Ferrari. JPVM: network parallel computing in Java. Concurrency: Pract. Exper., vol. 10(11-13), pp. 985-992, 1998.
[3] M. Philippsen and M. Zenger. JavaParty - transparent remote objects in Java. Concurrency: Pract. Exper., vol. 9(11), pp. 1225-1242, 1997.
[4] H. Pedroso, L. M. Silva, and J. G. Silva. Web-based metacomputing with JET. Concurrency: Pract. Exper., vol. 9(11), pp. 1169-1173, 1997.
[5] P. Gray and V. Sunderam. IceT: Distributed computing and Java. Concurrency: Pract. Exper., vol. 9(11), pp. 1161-1167, 1997.
[6] JavaSoft. Remote method invocation. Technical report, docs/guide/rmi/index.html, 1997.
[7] JavaSoft. JavaSpaces. Technical report, javaspaces/, 1997.
[8] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM 3 user's guide and reference manual. Technical Report ORNL/TM-12187, Oak Ridge National Laboratory, Sept. 1994.
[9] B. Carpenter, G. Fox, G. Zhang, and X. Li. A draft Java binding for MPI. pcrc/hpjava/mpijava.html, Nov. 1997.
[10] S. Mintchev and V. Getov. Towards portable message passing in Java: Binding MPI. In M. Bubak, J. Dongarra, and J. Wasniewski (Eds.), Recent Advances in PVM and MPI, LNCS, Springer, pp. 135-142, Nov. 1997.
[11] B. Carpenter, V. Getov, G. Judd, T. Skjellum, and G. Fox. MPI for Java: Position Document and Draft API Specification. Technical Report JGF-TR-3, Java Grande Forum, Nov. 1998.
[12] Java Grande Forum. Making Java Work for High-End Computing. Technical Report JGF-TR-1, Java Grande Forum, Nov. 1998.
[13] WMPI. Technical report.
[14] M. Baker and G. Fox. MPI on NT: A preliminary evaluation of the available environments. In Jose Rolim (Ed.), Parallel and Distributed Computing (12th IPPS and 9th SPDP), LNCS, Springer, pp. 549-563, April 1998.
[15] D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, P. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan, and S. Weeratunga. The NAS parallel benchmarks. Technical Report RNR-94-007, NASA Ames Research Center, 1994.
[16] V. Getov, S. Flynn-Hummel, and S. Mintchev. High-performance parallel programming in Java: Exploiting native libraries. Concurrency: Pract. Exper., vol. 10(11-13), pp. 863-872, 1998.
Using Java for Scientific Computing Mark Bul EPCC, University of Edinburgh markb@epcc.ed.ac.uk Java and Scientific Computing? Benefits of Java for Scientific Computing Portability Network centricity Software
More informationUntyped Memory in the Java Virtual Machine
Untyped Memory in the Java Virtual Machine Andreas Gal and Michael Franz University of California, Irvine {gal,franz}@uci.edu Christian W. Probst Technical University of Denmark probst@imm.dtu.dk July
More informationCHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS. Xiaodong Zhang and Yongsheng Song
CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS Xiaodong Zhang and Yongsheng Song 1. INTRODUCTION Networks of Workstations (NOW) have become important distributed
More informationAnalysis and Development of Java Grande Benchmarks. J.A. Mathew, P.D. Coddington and K.A. Hawick
Analysis and Development of Java Grande Benchmarks J.A. Mathew, P.D. Coddington and K.A. Hawick Advanced Computational Systems Cooperative Research Centre Department of Computer Science, University of
More informationDeveloping a Thin and High Performance Implementation of Message Passing Interface 1
Developing a Thin and High Performance Implementation of Message Passing Interface 1 Theewara Vorakosit and Putchong Uthayopas Parallel Research Group Computer and Network System Research Laboratory Department
More informationAn Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks
An Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks Ryan G. Lane Daniels Scott Xin Yuan Department of Computer Science Florida State University Tallahassee, FL 32306 {ryanlane,sdaniels,xyuan}@cs.fsu.edu
More informationNAS Applied Research Branch. Ref: Intl. Journal of Supercomputer Applications, vol. 5, no. 3 (Fall 1991), pg. 66{73. Abstract
THE NAS PARALLEL BENCHMARKS D. H. Bailey 1, E. Barszcz 1, J. T. Barton 1,D.S.Browning 2, R. L. Carter, L. Dagum 2,R.A.Fatoohi 2,P.O.Frederickson 3, T. A. Lasinski 1,R.S. Schreiber 3, H. D. Simon 2,V.Venkatakrishnan
More informationNon-blocking Java Communications Support on Clusters
Non-blocking Java Communications Support on Clusters Guillermo L. Taboada*, Juan Touriño, Ramón Doallo UNIVERSIDADE DA CORUÑA SPAIN {taboada,juan,doallo}@udc.es 13th European PVM/MPI Users s Meeting (EuroPVM/MPI
More informationFrank Miller, George Apostolopoulos, and Satish Tripathi. University of Maryland. College Park, MD ffwmiller, georgeap,
Simple Input/Output Streaming in the Operating System Frank Miller, George Apostolopoulos, and Satish Tripathi Mobile Computing and Multimedia Laboratory Department of Computer Science University of Maryland
More informationPM2: High Performance Communication Middleware for Heterogeneous Network Environments
PM2: High Performance Communication Middleware for Heterogeneous Network Environments Toshiyuki Takahashi, Shinji Sumimoto, Atsushi Hori, Hiroshi Harada, and Yutaka Ishikawa Real World Computing Partnership,
More informationVM instruction formats. Bytecode translator
Implementing an Ecient Java Interpreter David Gregg 1, M. Anton Ertl 2 and Andreas Krall 2 1 Department of Computer Science, Trinity College, Dublin 2, Ireland. David.Gregg@cs.tcd.ie 2 Institut fur Computersprachen,
More informationThe Use of the MPI Communication Library in the NAS Parallel Benchmarks
The Use of the MPI Communication Library in the NAS Parallel Benchmarks Theodore B. Tabe, Member, IEEE Computer Society, and Quentin F. Stout, Senior Member, IEEE Computer Society 1 Abstract The statistical
More informationpoint in worrying about performance. The goal of our work is to show that this is not true. This paper is organised as follows. In section 2 we introd
A Fast Java Interpreter David Gregg 1, M. Anton Ertl 2 and Andreas Krall 2 1 Department of Computer Science, Trinity College, Dublin 2, Ireland. David.Gregg@cs.tcd.ie 2 Institut fur Computersprachen, TU
More informationCommission of the European Communities **************** ESPRIT III PROJECT NB 6756 **************** CAMAS
Commission of the European Communities **************** ESPRIT III PROJECT NB 6756 **************** CAMAS COMPUTER AIDED MIGRATION OF APPLICATIONS SYSTEM **************** CAMAS-TR-2.3.4 Finalization Report
More informationOne-Sided Append: A New Communication Paradigm For PGAS Models
One-Sided Append: A New Communication Paradigm For PGAS Models James Dinan and Mario Flajslik Intel Corporation {james.dinan, mario.flajslik}@intel.com ABSTRACT One-sided append represents a new class
More informationTechnische Universitat Munchen. Institut fur Informatik. D Munchen.
Developing Applications for Multicomputer Systems on Workstation Clusters Georg Stellner, Arndt Bode, Stefan Lamberts and Thomas Ludwig? Technische Universitat Munchen Institut fur Informatik Lehrstuhl
More informationLiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster
: Support for High-Performance MPI Intra-Node Communication on Linux Cluster Hyun-Wook Jin Sayantan Sur Lei Chai Dhabaleswar K. Panda Department of Computer Science and Engineering The Ohio State University
More informationMaple on the Intel Paragon. Laurent Bernardin. Institut fur Wissenschaftliches Rechnen. ETH Zurich, Switzerland.
Maple on the Intel Paragon Laurent Bernardin Institut fur Wissenschaftliches Rechnen ETH Zurich, Switzerland bernardin@inf.ethz.ch October 15, 1996 Abstract We ported the computer algebra system Maple
More informationinstruction fetch memory interface signal unit priority manager instruction decode stack register sets address PC2 PC3 PC4 instructions extern signals
Performance Evaluations of a Multithreaded Java Microcontroller J. Kreuzinger, M. Pfeer A. Schulz, Th. Ungerer Institute for Computer Design and Fault Tolerance University of Karlsruhe, Germany U. Brinkschulte,
More information(Preliminary Version 2 ) Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University. College Station, TX
Towards an Adaptive Distributed Shared Memory (Preliminary Version ) Jai-Hoon Kim Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3 E-mail: fjhkim,vaidyag@cs.tamu.edu
More informationComponent-Based Communication Support for Parallel Applications Running on Workstation Clusters
Component-Based Communication Support for Parallel Applications Running on Workstation Clusters Antônio Augusto Fröhlich 1 and Wolfgang Schröder-Preikschat 2 1 GMD FIRST Kekulésraÿe 7 D-12489 Berlin, Germany
More informationLanguage-Based Parallel Program Interaction: The Breezy Approach. Darryl I. Brown Allen D. Malony. Bernd Mohr. University of Oregon
Language-Based Parallel Program Interaction: The Breezy Approach Darryl I. Brown Allen D. Malony Bernd Mohr Department of Computer And Information Science University of Oregon Eugene, Oregon 97403 fdarrylb,
More informationThe MPBench Report. Philip J. Mucci. Kevin London. March 1998
The MPBench Report Philip J. Mucci Kevin London mucci@cs.utk.edu london@cs.utk.edu March 1998 1 Introduction MPBench is a benchmark to evaluate the performance of MPI and PVM on MPP's and clusters of workstations.
More informationMPI as a Coordination Layer for Communicating HPF Tasks
Syracuse University SURFACE College of Engineering and Computer Science - Former Departments, Centers, Institutes and Projects College of Engineering and Computer Science 1996 MPI as a Coordination Layer
More informationExtra-High Speed Matrix Multiplication on the Cray-2. David H. Bailey. September 2, 1987
Extra-High Speed Matrix Multiplication on the Cray-2 David H. Bailey September 2, 1987 Ref: SIAM J. on Scientic and Statistical Computing, vol. 9, no. 3, (May 1988), pg. 603{607 Abstract The Cray-2 is
More informationGroup Management Schemes for Implementing MPI Collective Communication over IP Multicast
Group Management Schemes for Implementing MPI Collective Communication over IP Multicast Xin Yuan Scott Daniels Ahmad Faraj Amit Karwande Department of Computer Science, Florida State University, Tallahassee,
More informationMPJava: High-Performance Message Passing in Java using Java.nio
MPJava: High-Performance Message Passing in Java using Java.nio William Pugh Dept. of Computer Science University of Maryland College Park, MD 20740 USA pugh@cs.umd.edu Jaime Spacco Dept. of Computer Science
More informationBenchmarking a Network of PCs Running Parallel Applications 1
(To Appear: International Performance, Computing, and unications Conference, Feb. 998. Phoenix, AZ) Benchmarking a Network of PCs Running Parallel Applications Jeffrey K. Hollingsworth Erol Guven Cuneyt
More informationMulticast can be implemented here
MPI Collective Operations over IP Multicast? Hsiang Ann Chen, Yvette O. Carrasco, and Amy W. Apon Computer Science and Computer Engineering University of Arkansas Fayetteville, Arkansas, U.S.A fhachen,yochoa,aapong@comp.uark.edu
More informationProcess 0 Process 1 MPI_Barrier MPI_Isend. MPI_Barrier. MPI_Recv. MPI_Wait. MPI_Isend message. header. MPI_Recv. buffer. message.
Where's the Overlap? An Analysis of Popular MPI Implementations J.B. White III and S.W. Bova Abstract The MPI 1:1 denition includes routines for nonblocking point-to-point communication that are intended
More informationI/O in the Gardens Non-Dedicated Cluster Computing Environment
I/O in the Gardens Non-Dedicated Cluster Computing Environment Paul Roe and Siu Yuen Chan School of Computing Science Queensland University of Technology Australia fp.roe, s.chang@qut.edu.au Abstract Gardens
More informationOne-Sided Routines on a SGI Origin 2000 and a Cray T3E-600. Glenn R. Luecke, Silvia Spanoyannis, Marina Kraeva
The Performance and Scalability of SHMEM and MPI-2 One-Sided Routines on a SGI Origin 2 and a Cray T3E-6 Glenn R. Luecke, Silvia Spanoyannis, Marina Kraeva grl@iastate.edu, spanoyan@iastate.edu, kraeva@iastate.edu
More informationWhatÕs New in the Message-Passing Toolkit
WhatÕs New in the Message-Passing Toolkit Karl Feind, Message-passing Toolkit Engineering Team, SGI ABSTRACT: SGI message-passing software has been enhanced in the past year to support larger Origin 2
More informationIntra-MIC MPI Communication using MVAPICH2: Early Experience
Intra-MIC MPI Communication using MVAPICH: Early Experience Sreeram Potluri, Karen Tomko, Devendar Bureddy, and Dhabaleswar K. Panda Department of Computer Science and Engineering Ohio State University
More informationAbstract HPF was originally created to simplify high-level programming of parallel computers. The inventors of HPF strove for an easy-to-use language
Ecient HPF Programs Harald J. Ehold 1 Wilfried N. Gansterer 2 Dieter F. Kvasnicka 3 Christoph W. Ueberhuber 2 1 VCPC, European Centre for Parallel Computing at Vienna E-Mail: ehold@vcpc.univie.ac.at 2
More informationParallel Programming Environments. Presented By: Anand Saoji Yogesh Patel
Parallel Programming Environments Presented By: Anand Saoji Yogesh Patel Outline Introduction How? Parallel Architectures Parallel Programming Models Conclusion References Introduction Recent advancements
More informationRDMA-like VirtIO Network Device for Palacios Virtual Machines
RDMA-like VirtIO Network Device for Palacios Virtual Machines Kevin Pedretti UNM ID: 101511969 CS-591 Special Topics in Virtualization May 10, 2012 Abstract This project developed an RDMA-like VirtIO network
More informationAN EMPIRICAL STUDY OF EFFICIENCY IN DISTRIBUTED PARALLEL PROCESSING
AN EMPIRICAL STUDY OF EFFICIENCY IN DISTRIBUTED PARALLEL PROCESSING DR. ROGER EGGEN Department of Computer and Information Sciences University of North Florida Jacksonville, Florida 32224 USA ree@unf.edu
More informationExploring Performance Improvement for Java-based Scientific Simulations that use the Swarm Toolkit
Exploring Performance Improvement for Java-based Scientific Simulations that use the Swarm Toolkit Xiaorong Xiang and Gregory Madey Department of Computer Science and Engineering University of Notre Dame
More informationIntroduction to Java Programming
Introduction to Java Programming Lecture 1 CGS 3416 Spring 2017 1/9/2017 Main Components of a computer CPU - Central Processing Unit: The brain of the computer ISA - Instruction Set Architecture: the specific
More informationArray Decompositions for Nonuniform Computational Environments
Syracuse University SURFACE College of Engineering and Computer Science - Former Departments, Centers, Institutes and Projects College of Engineering and Computer Science 996 Array Decompositions for Nonuniform
More informationAbstract. provide substantial improvements in performance on a per application basis. We have used architectural customization
Architectural Adaptation in MORPH Rajesh K. Gupta a Andrew Chien b a Information and Computer Science, University of California, Irvine, CA 92697. b Computer Science and Engg., University of California,
More informationEvaluation and Improvements of Programming Models for the Intel SCC Many-core Processor
Evaluation and Improvements of Programming Models for the Intel SCC Many-core Processor Carsten Clauss, Stefan Lankes, Pablo Reble, Thomas Bemmerl International Workshop on New Algorithms and Programming
More informationEPOS: an Object-Oriented Operating System
EPOS: an Object-Oriented Operating System Antônio Augusto Fröhlich 1 Wolfgang Schröder-Preikschat 2 1 GMD FIRST guto@first.gmd.de 2 University of Magdeburg wosch@ivs.cs.uni-magdeburg.de Abstract This position
More informationJava Performance Analysis for Scientific Computing
Java Performance Analysis for Scientific Computing Roldan Pozo Leader, Mathematical Software Group National Institute of Standards and Technology USA UKHEC: Java for High End Computing Nov. 20th, 2000
More informationLAPI on HPS Evaluating Federation
LAPI on HPS Evaluating Federation Adrian Jackson August 23, 2004 Abstract LAPI is an IBM-specific communication library that performs single-sided operation. This library was well profiled on Phase 1 of
More informationGlobal Scheduler. Global Issue. Global Retire
The Delft-Java Engine: An Introduction C. John Glossner 1;2 and Stamatis Vassiliadis 2 1 Lucent / Bell Labs, Allentown, Pa. 2 Delft University oftechnology, Department of Electrical Engineering Delft,
More informationLecture 28: Introduction to the Message Passing Interface (MPI) (Start of Module 3 on Distribution and Locality)
COMP 322: Fundamentals of Parallel Programming Lecture 28: Introduction to the Message Passing Interface (MPI) (Start of Module 3 on Distribution and Locality) Mack Joyner and Zoran Budimlić {mjoyner,
More informationdirector executor user program user program signal, breakpoint function call communication channel client library directing server
(appeared in Computing Systems, Vol. 8, 2, pp.107-134, MIT Press, Spring 1995.) The Dynascope Directing Server: Design and Implementation 1 Rok Sosic School of Computing and Information Technology Grith
More informationNetwork Object in C++
Network Object in C++ Final Project of HonorOS Professor David Maziere Po-yen Huang (Dennis) Dong-rong Wen May 9, 2003 Table of Content Abstract...3 Introduction...3 Architecture...3 The idea...3 More
More informationPerformance of DB2 Enterprise-Extended Edition on NT with Virtual Interface Architecture
Performance of DB2 Enterprise-Extended Edition on NT with Virtual Interface Architecture Sivakumar Harinath 1, Robert L. Grossman 1, K. Bernhard Schiefer 2, Xun Xue 2, and Sadique Syed 2 1 Laboratory of
More informationarxiv: v1 [cs.dc] 27 Sep 2018
Performance of MPI sends of non-contiguous data Victor Eijkhout arxiv:19.177v1 [cs.dc] 7 Sep 1 1 Abstract We present an experimental investigation of the performance of MPI derived datatypes. For messages
More informationOmniRPC: a Grid RPC facility for Cluster and Global Computing in OpenMP
OmniRPC: a Grid RPC facility for Cluster and Global Computing in OpenMP (extended abstract) Mitsuhisa Sato 1, Motonari Hirano 2, Yoshio Tanaka 2 and Satoshi Sekiguchi 2 1 Real World Computing Partnership,
More informationIf the agent performing this co-ordination operation was to fail having performed the in then the incremented counter would never be inserted. This wo
Mobile Co-ordination: Providing fault tolerance in tuple space based co-ordination languages. Antony Rowstron Laboratory for Communication Engineering, Engineering Department, University of Cambridge,
More informationSKaMPI: A Detailed, Accurate MPI Benchmark Ralf Reussner 1,Peter Sanders 2, Lutz Prechelt 1, and Matthias Muller 1 1 University of Karlsruhe, D
SKaMPI: A Detailed, Accurate MPI Benchmark Ralf Reussner 1,Peter Sanders 2, Lutz Prechelt 1, and Matthias Muller 1 1 University of Karlsruhe, D-76128 Karlsruhe 2 Max-Planck Institute for Computer Science,
More informationComparing SIMD and MIMD Programming Modes Ravikanth Ganesan, Kannan Govindarajan, and Min-You Wu Department of Computer Science State University of Ne
Comparing SIMD and MIMD Programming Modes Ravikanth Ganesan, Kannan Govindarajan, and Min-You Wu Department of Computer Science State University of New York Bualo, NY 14260 Abstract The Connection Machine
More informationSupporting Heterogeneous Network Computing: PVM. Jack J. Dongarra. Oak Ridge National Laboratory and University of Tennessee. G. A.
Supporting Heterogeneous Network Computing: PVM Jack J. Dongarra Oak Ridge National Laboratory and University of Tennessee G. A. Geist Oak Ridge National Laboratory Robert Manchek University of Tennessee
More informationLow Latency MPI for Meiko CS/2 and ATM Clusters
Low Latency MPI for Meiko CS/2 and ATM Clusters Chris R. Jones Ambuj K. Singh Divyakant Agrawal y Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106 Abstract
More informationAn Ecient Parallel Algorithm. for Matrix{Vector Multiplication. Albuquerque, NM 87185
An Ecient Parallel Algorithm for Matrix{Vector Multiplication Bruce Hendrickson 1, Robert Leland 2 and Steve Plimpton 3 Sandia National Laboratories Albuquerque, NM 87185 Abstract. The multiplication of
More informationOn the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk
More informationShared Memory vs. Message Passing: the COMOPS Benchmark Experiment
Shared Memory vs. Message Passing: the COMOPS Benchmark Experiment Yong Luo Scientific Computing Group CIC-19 Los Alamos National Laboratory Los Alamos, NM 87545, U.S.A. Email: yongl@lanl.gov, Fax: (505)
More informationChapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.
Chapter 1 GETTING STARTED SYS-ED/ Computer Education Techniques, Inc. Objectives You will learn: Java platform. Applets and applications. Java programming language: facilities and foundation. Memory management
More informationWhite Paper: Delivering Enterprise Web Applications on the Curl Platform
White Paper: Delivering Enterprise Web Applications on the Curl Platform Table of Contents Table of Contents Executive Summary... 1 Introduction... 2 Background... 2 Challenges... 2 The Curl Solution...
More informationJava Virtual Machine
Evaluation of Java Thread Performance on Two Dierent Multithreaded Kernels Yan Gu B. S. Lee Wentong Cai School of Applied Science Nanyang Technological University Singapore 639798 guyan@cais.ntu.edu.sg,
More informationEUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH PARALLEL IN-MEMORY DATABASE. Dept. Mathematics and Computing Science div. ECP
EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH CERN/ECP 95-29 11 December 1995 ON-LINE EVENT RECONSTRUCTION USING A PARALLEL IN-MEMORY DATABASE E. Argante y;z,p. v.d. Stok y, I. Willers z y Eindhoven University
More informationThroughput in Mbps. Ethernet e+06 1e+07 Block size in bits
NetPIPE: A Network Protocol Independent Performance Evaluator Quinn O. Snell, Armin R. Mikler and John L. Gustafson Ames Laboratory/Scalable Computing Lab, Ames, Iowa 5, USA snelljmiklerjgus@scl.ameslab.gov
More informationDavid H. Bailey. November 14, computational uid dynamics and other aerophysics applications. Presently this organization
Experience with Parallel Computers at NASA Ames David H. Bailey November 14, 1991 Ref: Intl. J. of High Speed Computing, vol. 5, no. 1 (993), pg. 51{62. Abstract Beginning in 1988, the Numerical Aerodynamic
More informationYasuo Okabe. Hitoshi Murai. 1. Introduction. 2. Evaluation. Elapsed Time (sec) Number of Processors
Performance Evaluation of Large-scale Parallel Simulation Codes and Designing New Language Features on the (High Performance Fortran) Data-Parallel Programming Environment Project Representative Yasuo
More informationHigh Performance Computing Course Notes Message Passing Programming I
High Performance Computing Course Notes 2008-2009 2009 Message Passing Programming I Message Passing Programming Message Passing is the most widely used parallel programming model Message passing works
More informationshort long double char octet struct Throughput in Mbps Sender Buffer size in KBytes short long double char octet struct
Motivation Optimizations for High Performance ORBs Douglas C. Schmidt (www.cs.wustl.edu/schmidt) Aniruddha S. Gokhale (www.cs.wustl.edu/gokhale) Washington University, St. Louis, USA. Typical state of
More informationEcient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines
Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Zhou B. B., Brent R. P. and Tridgell A. y Computer Sciences Laboratory The Australian National University Canberra,
More informationParallel Pipeline STAP System
I/O Implementation and Evaluation of Parallel Pipelined STAP on High Performance Computers Wei-keng Liao, Alok Choudhary, Donald Weiner, and Pramod Varshney EECS Department, Syracuse University, Syracuse,
More informationGo Deep: Fixing Architectural Overheads of the Go Scheduler
Go Deep: Fixing Architectural Overheads of the Go Scheduler Craig Hesling hesling@cmu.edu Sannan Tariq stariq@cs.cmu.edu May 11, 2018 1 Introduction Golang is a programming language developed to target
More informationCost-Performance Evaluation of SMP Clusters
Cost-Performance Evaluation of SMP Clusters Darshan Thaker, Vipin Chaudhary, Guy Edjlali, and Sumit Roy Parallel and Distributed Computing Laboratory Wayne State University Department of Electrical and
More informationCommunication Characteristics in the NAS Parallel Benchmarks
Communication Characteristics in the NAS Parallel Benchmarks Ahmad Faraj Xin Yuan Department of Computer Science, Florida State University, Tallahassee, FL 32306 {faraj, xyuan}@cs.fsu.edu Abstract In this
More information