Low Latency MPI for Meiko CS/2 and ATM Clusters
Chris R. Jones, Ambuj K. Singh, Divyakant Agrawal
Department of Computer Science, University of California, Santa Barbara, Santa Barbara, CA

Abstract

MPI (Message Passing Interface) is a proposed message passing standard for the development of efficient and portable parallel programs. An implementation of MPI is presented and evaluated for the Meiko CS-2, a 64 node parallel computer, and for a network of 8 SGI workstations connected by an ATM switch and Ethernet.

1. Introduction

A major hurdle of the message passing paradigm has been the lack of a standard library supporting features such as point-to-point and collective communication, tagged message delivery, and synchronization primitives. This motivated a consortium of researchers and practitioners in the parallel computing arena to develop a standard for message passing, the Message Passing Interface (MPI) [12]. Rather than adopt one of the existing message passing libraries, they chose to create a standard with all of the above features by integrating the features provided by existing libraries such as PVM [8], NX/2 [9], p4 [6] and PARMACS [7]. Implementations of high-level message passing libraries such as MPI are often significantly less efficient than lower level libraries because of their support of high-level programming features and their failure to exploit specific architectural details. In implementing message passing libraries, implementors often have to trade off between low latency for small messages and high bandwidth for large ones. This paper examines the necessary overheads in implementing the MPI library on top of existing user level libraries for two platforms: a 64 node Meiko CS-2, and a cluster of Silicon Graphics workstations connected by both Ethernet and an ATM (Asynchronous Transfer Mode) switch.
The focus of the research is on reducing latencies in a tagged message passing model of MPI. (This research was supported by the NSF under grants CCR, CDA, and CCR. The work of D. Agrawal was supported by LANL under UC94-B-A-223 and by a research gift from NEC Japan.) In MPI, a sending process issues a send operation, which eventually transfers data to a process that issues a matching receive operation. Each process might issue several sends or several receives, so the processes are responsible for matching corresponding send and receive operations. A message communication in MPI therefore involves each send operation sending its envelope to the receiver, the matching of these envelopes to receive tags, and the eventual transfer of data from the sender's buffer to the receiver's buffer. This paper examines methods for the sending of envelopes, the matching of sends and receives, and the sending of data. To minimize latency, a mechanism is proposed which overlaps the transfer of data and send envelopes, buffering data temporarily when necessary at the receiver. However, for large amounts of data, the temporary buffering becomes a major overhead and can limit bandwidth. So, we propose a hybrid implementation in which, above a certain message size threshold, the sending and matching of envelopes occurs first, and then a DMA (direct memory access) from the sender to the receiver is initiated without intermediate buffering.

2. Related Work

MPICH [2] is an implementation of MPI designed for portability that provides a well-defined device layer allowing for easy implementation on new architectures. It has been implemented for a variety of architectures through the widely portable p4 [6] communications layer, and also includes specialized device layers for many parallel machines, including the Meiko CS-2. xmpi [15] extends MPICH by providing a device driver on top of the x-kernel, a framework for implementing network protocols in the kernel more efficiently than existing frameworks, such as STREAMS.
Because the x-kernel can be used to implement kernel level network protocols, it could provide more efficient overall implementations than other kernel level protocols, such as those described in this paper. The authors propose an MPI implementation over ATM AAL5. The Meiko CS-2 has specialized hardware for providing secure user level communication over a high speed network [13]. The Meiko implementation of MPI included with the MPICH distribution is based on a tagged message passing widget provided by Meiko, called the tport widget. The tport widget, however, trades off latency to provide high bandwidth communication, and offers no support for collective communications, requiring that MPICH implement collective communications on top of point-to-point mechanisms. LAM [5] is a parallel software environment that implements MPI on top of its own existing features; those MPI features which have no close match are implemented independently. Like MPI, LAM provides a rich environment for parallel programming, and its MPI extension allows users to use LAM features as well. Efficient implementation of the MPI collective communications library on top of Ethernet is examined by Bruck, Dolev, Ho, Rosu and Strong [3]. A user level reliable transport protocol is given which uses the broadcast nature of the Ethernet for efficiency. Like the broadcast mechanism discussed in this paper, the exploitation of hardware broadcast gives a more efficient implementation than would be possible using only point-to-point communication. MPI's message delivery guarantees can be unrealizable due to limited resources for message envelopes. Burns and Daoud [4] discuss tactics for overflow detection and reporting in MPI implementations. Basu, Buch, Vogels and von Eicken [1], as well as Thekkath, Nguyen, Moy and Lazowska [17], discuss the inefficiency of implementing networking protocols in the kernel. They describe methods for moving parts of the networking protocol to user level, leaving only the necessary security mechanisms in the kernel. Both papers give implementations of user level networking protocols for ATM, showing significant improvements over kernel level implementations.
A minimal latency DMA mechanism for ATM is given by Thekkath, Levy and Lazowska [16]. The DMA mechanism is the heart of communication for the Meiko, so a DMA mechanism such as this could be used in conjunction with the Meiko implementation discussed in this paper for a high performance ATM implementation.

3. MPI Standard

MPI [12] is a communication standard to support the development of portable parallel programs. The formal MPI specifications define the following primitives: point-to-point communication, collective communication, process group management and virtual topology management, environmental management, and a profiling interface. In this paper, we restrict our attention primarily to the point-to-point communication primitives:

MPI_Send(buffer, count, datatype, dest, tag, communicator)
MPI_Recv(buffer, count, datatype, source, tag, communicator, status)

There are several variants of the MPI_Send call: a buffered mode, a synchronous mode, and a ready mode, each of which has its own blocking and nonblocking variant. The buffered send, MPI_Bsend, ensures that the completion of the send operation does not depend on the posting of the receive operation. Once the buffered send has been posted, it will complete regardless of the activities of the receiver. The user is responsible for providing the buffer space for the buffered send; as long as this buffer space is monitored to make sure space is available, the send is guaranteed to be buffered correctly. The synchronous mode send, MPI_Ssend, adds the requirement that the sender must wait until the receiver has posted the matching receive and the send and receive have completed, with the sender's data having been successfully transferred to the receiver's buffer. This allows the sender to know the exact point at which the data transfer occurred.
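The buffered-send semantics described above can be mimicked in a few lines. The following is an illustrative sketch in Python (our own code with invented names such as `BufferedSender`; it is not MPI or the paper's implementation): "completion" means only that the data has been copied out of the caller's buffer, not that it has been delivered.

```python
# Sketch of MPI_Bsend-style semantics: the send completes as soon as the
# message is copied into user-provided buffer space; delivery happens later,
# independently of when the receiver posts its matching receive.

class BufferedSender:
    def __init__(self, buffer_bytes):
        self.capacity = buffer_bytes   # buffer space attached by the user
        self.pending = []              # copies awaiting actual delivery

    def bsend(self, data, dest, tag):
        """Complete locally by copying; fail only if buffer space runs out."""
        if len(data) > self.capacity:
            raise MemoryError("attached buffer exhausted")
        self.capacity -= len(data)
        self.pending.append((dest, tag, bytes(data)))  # copy; caller may reuse data
        return True                    # send is complete from the caller's view

    def deliver_all(self, network):
        """Drain buffered messages; frees the space they occupied."""
        for dest, tag, payload in self.pending:
            network.append((dest, tag, payload))
            self.capacity += len(payload)
        self.pending.clear()

net = []
s = BufferedSender(buffer_bytes=64)
s.bsend(b"hello", dest=1, tag=7)   # returns immediately; receiver not involved
s.deliver_all(net)
```

The key property is that `bsend` never blocks on the receiver; it can only fail if the user-attached buffer is exhausted, which mirrors the monitoring obligation described above.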
The ready mode send, MPI_Rsend, allows the programmer to give more information to MPI by informing it that the receiver has already posted a matching receive. For receive operations there are no special modes; all receives match any type of send. However, there is an operation similar to MPI_Recv, MPI_Probe, which polls for incoming messages without actually completing the data transfer. A nonblocking version of this operation, MPI_Iprobe, checks whether a message is available without blocking. The MPI_Bcast call supplies a broadcast mechanism, used when a single process wishes to send data to all other processes in a group. We have implemented MPI point-to-point communication and the broadcast collective communication primitive on the Meiko and the ATM cluster. The implementation of broadcast on the Meiko uses the underlying hardware broadcast mechanism, whereas on the ATM network it uses a succession of point-to-point messages.

4. Implementation of MPI on Meiko

4.1. Design Motivations

As discussed above, MPI point-to-point communication involves transmitting a send envelope, matching a send with a receive, and transferring data from the sender's buffer to the receiver's buffer. There are several choices as to how and when these operations are performed. We discuss some
of these options next. The ability of a receive operation to specify MPI_ANY_SOURCE requires that the matching be done at the receiver. Delaying the data transfer until a match is made results in high latencies, especially for small messages: the send's envelope must first be sent to the receiver, a match must be made at the receiver, a second trip across the network is necessary to request the data from the sender, and finally another trip is necessary for the sender to actually send the DMA data. If the extra latency is avoided by initiating a transfer simultaneously with a match operation, then temporary buffers are needed, increasing the space requirements. We use a hybrid approach in which the transfer of small messages is overlapped with the matching operation, and the transfer of large messages occurs after the match operation. The crossover point is determined by examining the various latencies. As for the matching operation at the receiver, we still need to decide whether to use the SPARC (the main 40 MHz processor on a Meiko node) or the Elan (10 MHz communications co-processor) to perform the matching. If we match sends and receives only with the SPARC processor, data transfers will not complete until the application program issues an MPI_Wait or MPI_Test operation on a nonblocking operation. For instance, if the receiver has issued a nonblocking operation, and the sender issues a synchronous send, the sender will have to wait for the receiver to issue a completion operation on its receive. To handle this problem, we could utilize the Elan co-processor for matching of sends and receives in the background. However, the slower Elan may not be able to handle the somewhat intensive message matching as quickly as the faster SPARC, so latency could increase.
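The receiver-side matching step itself is simple to state. Here is an illustrative sketch (our own Python, with our own names; the paper does not give this code) of matching an incoming send envelope against the queue of posted receives while honoring the MPI_ANY_SOURCE and MPI_ANY_TAG wildcards:

```python
# Receiver-side envelope matching: an incoming envelope (source, tag,
# communicator) is tested against posted receives in posting order, so that
# the earliest compatible receive wins, preserving MPI's ordering guarantee.

MPI_ANY_SOURCE = -1
MPI_ANY_TAG = -1

def match_envelope(envelope, posted_receives):
    """Return the index of the first posted receive matching the envelope, or None."""
    src, tag, comm = envelope
    for i, (rsrc, rtag, rcomm) in enumerate(posted_receives):
        if rcomm != comm:
            continue                      # communicators must match exactly
        if rsrc != MPI_ANY_SOURCE and rsrc != src:
            continue
        if rtag != MPI_ANY_TAG and rtag != tag:
            continue
        return i
    return None                           # no match: message is "unexpected"

posted = [(2, 5, 0), (MPI_ANY_SOURCE, 5, 0)]
hit = match_envelope((3, 5, 0), posted)   # matches index 1 via the wildcard
```

Because a receive may name MPI_ANY_SOURCE, this scan cannot be pushed to the sender; it is exactly the work the paper weighs between the SPARC and the Elan.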
Additionally, since the SPARC must eventually be informed of the completion of a receive, the extra synchronization between the Elan and the SPARC can also increase latency. The existing ANL/MSU MPICH implementation [2] on top of the tport widget uses the Elan for matching in the background. For the sake of comparison, we implemented our matching of sends and receives on the SPARC. Independently of where matching is done at the receiver, we would also like the sending processor to be able to issue nonblocking sends very quickly, freeing the processor for other responsibilities. So, we use the Elan to perform this sending in the background, allowing the SPARC to perform other computation. The actual transfer of data from the sender requires allocation of memory at the receiver. The expense of these mechanisms has been researched in detail in an implementation of Active Messages for the Meiko CS-2 [14]. In order to minimize latency, we allocate space for a single send envelope for each sending processor at each receiver.

Figure 1. Meiko transfer mechanisms (round-trip time vs. message size, with and without buffering)

Figure 2. Meiko round-trip latency (MPI (MPICH), MPI (low latency), and Meiko tport)

4.2. Meiko results

As mentioned before, data can be optimistically transferred before the match to a buffer at the receiver, or after the match is made directly to the receiver's address space. The comparison of these two mechanisms is shown in detail in Figure 1. The intersection of the two lines occurs at a message length of 180 bytes; based on this, we choose 180 bytes as the crossover point between the two implementations. The mechanism for small transfers allows much lower latencies than the tport mechanism for certain applications. Figure 2 shows the round-trip latency times for varying transfer sizes.
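The 180-byte crossover is simply the point where the two per-message cost models intersect. A sketch of that calculation, using hypothetical coefficients (the paper reports only the resulting crossover, not these constants):

```python
# Each transfer mechanism is modeled as latency(n) = fixed + per_byte * n.
# The buffered (optimistic) path has a low fixed cost but pays an extra copy
# per byte; the post-match DMA path has a higher fixed cost (extra round trip)
# but a lower per-byte cost. The crossover is where the lines intersect.

def crossover(fixed_a, per_byte_a, fixed_b, per_byte_b):
    """Message size at which mechanism B becomes cheaper than mechanism A."""
    # Solve fixed_a + per_byte_a * n == fixed_b + per_byte_b * n for n.
    return (fixed_b - fixed_a) / (per_byte_a - per_byte_b)

# Hypothetical numbers chosen only to illustrate the method:
n = crossover(fixed_a=30.0, per_byte_a=0.5,   # optimistic, buffered transfer
              fixed_b=66.0, per_byte_b=0.3)   # matched, direct DMA transfer
```

With these made-up coefficients the intersection lands at 180 bytes; in the implementation the threshold is read off measured curves rather than computed from fitted constants.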
Three plots are shown: our MPI implementation where matching is done entirely on the SPARC, the MPICH tport-based MPI implementation which does matching mostly on the Elan, and a plot of Meiko's tport widget without any MPI overheads. As Meiko's tport widget provides simplified tagged message passing directly, it is the lowest latency mechanism, with a 1 byte round-trip latency of 52 µs. The two MPI implementations add significant overheads to this minimal cost by adding the MPI features of communicators, datatypes and different modes of communication. As MPICH implements MPI directly on top of the tport widget, it can be seen that MPICH adds 158 µs to the 1 byte round-trip latency. Our lower latency SPARC matching implementation decreases these overheads by providing an implementation directly over Meiko DMAs and transactions, decreasing latency significantly with a total round-trip time of 104 µs, about 52 µs higher than the tport widget. The bend visible around 180 bytes in the curve for the SPARC matching MPI implementation clearly shows where the high bandwidth mechanism takes over, and DMAs are invoked by the receiver after the matching of the send and receive has taken place. To demonstrate that this implementation still maintains high bandwidth for large transfers, Figure 3 plots bandwidth versus message size for larger messages with the two MPI implementations and the Meiko tport. As seen in the figure, the best possible DMA provided bandwidth of 39 MBytes/s is nearly reached, and bandwidth is in fact increased as a result of decreasing latency for the SPARC matching implementation.

Figure 3. Meiko bandwidth (throughput in MB/s for MPI (MPICH), MPI (low latency), and Meiko tport)

Figure 4. ATM round-trip latency (TCP, UDP, and Fore AAL4)

5. ATM Implementation

The ATM cluster examined here consists of eight SGI Indy 133-MHz workstations and an SGI Challenge SMP 150-MHz dual-processor. All of these machines have 64 MBytes of RAM, and are connected via a 10 Mbit/s Ethernet and 155 Mbit/s ATM channels. The ATM switch is a Fore Systems ForeRunner ASX-200 with eight 155 Mbit/s ports. Each SGI is connected to the switch by a Fore GIA-200 interface card. These interface cards include an Intel i960 processor dedicated to segmentation and reassembly for the AAL3/4 and AAL5 protocols without using the main processor. Four different user level protocols are available to run in this environment. TCP/IP is available with two different signaling protocols to establish the connection.
The first, Classical IP, is the standard signaling protocol defined by the ATM Forum. The second, SPANS, is Fore's own signaling protocol. The differences between these protocols affect only connection establishment. In the implementations we examine, connections are static, so connection setup time is not of major importance; hence, this paper only examines Classical IP. Fore also offers an API which provides communication on top of ATM adaptation layers 3 and 4 (which are treated identically), and AAL5. As our goal is a low latency implementation of MPI, it is worth considering the latency of a packet over these different protocols. We would expect the Fore adaptation layers to be significantly faster, since they have very few overheads. Unfortunately, they are not significantly faster than the Fore TCP or UDP implementations, because of the overheads involved in the STREAMS protocol layers used by the Fore API. Figure 4 shows a comparison of the latencies for Fore's implementation of AAL4, TCP and UDP. Except for small message sizes, the latencies of these protocols are indistinguishable from each other. This prompted us to confine our attention to TCP/IP and UDP as the underlying communication protocols. Our measurements were made using TCP/IP and UDP on an Ethernet as well as the described ATM network.

5.1. TCP Implementation

The facilities for communication provided by TCP/IP and the Meiko are quite different. While the Meiko provides mechanisms for manipulation of remote data through DMAs or remote transactions, TCP/IP provides a much different communication mechanism: a reliable stream of data between two processes on opposite ends of a channel. However, many of the issues we confronted during the implementation of MPI for the Meiko are common to both, such as the need to match MPI sends to MPI receives.
In order to reuse the components of our MPI implementation on the Meiko, we decided to implement the underlying primitives (assumed in the Meiko implementation) on top of TCP.
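One consequence of TCP's stream semantics is that any envelope-sending primitive layered on it must impose its own message boundaries. The following sketch (illustrative Python over a socketpair; the frame layout and names are our assumptions, not the paper's wire format) shows a minimal length-prefixed framing of an envelope:

```python
# Because TCP delivers an undifferentiated byte stream, each envelope is
# framed with a fixed header (here: 1-byte message type + 4-byte payload
# length) so the receiver can recover message boundaries.

import socket
import struct

def send_envelope(sock, msg_type, payload):
    # Network byte order: unsigned char + unsigned int, then the payload.
    sock.sendall(struct.pack("!BI", msg_type, len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes; TCP may deliver them in arbitrary pieces."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf

def recv_envelope(sock):
    msg_type, length = struct.unpack("!BI", recv_exact(sock, 5))
    return msg_type, recv_exact(sock, length)

# A socketpair stands in for a real TCP connection between two processes.
a, b = socket.socketpair()
send_envelope(a, msg_type=1, payload=b"tag=7;len=1024")
kind, body = recv_envelope(b)
a.close()
b.close()
```

The `recv_exact` loop is the important detail: a single `recv` call on a stream socket is not guaranteed to return a whole frame.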
Essentially, we needed to implement a method of sending an envelope, a method of sending an envelope with piggybacked data, and a method for setting remote events and sending DMA data. As the communication latencies are quite large when TCP is used, piggybacking data is more important than in the Meiko implementation. As such, the buffer flow control mechanism used in the Meiko implementation is inappropriate, since it assumes only a single outstanding message at any given time. A window protocol allowing multiple outstanding messages would be ideal. Unfortunately, since not all MPI messages can be ordered on the same FIFO because of tags and communicators, standard window protocols are inappropriate. So, we have the receiver keep a reserved amount of memory for each sender, to which the sender sends data optimistically. Once the space is freed, the receiver informs the sender that it can be reused. This allows the sender to optimistically send many messages as long as it knows that free space is available at the receiver.

5.2. TCP Results

Figure 5. TCP round-trip latency (mpi/tcp/atm, mpi/tcp/eth, tcp/atm, tcp/eth)

Figure 5 displays the round-trip latency times for messages with both TCP and MPI over TCP. Four plots are shown, two for TCP and MPI over Ethernet and two for ATM. There is approximately a 150 µs higher round-trip latency for the MPI implementations over TCP. This overhead is caused by the additional transfer of envelopes and control information, and the cost of performing message matching. Table 1 displays the breakdown of these overheads for a 1 byte message. The largest overhead is the 925 µs cost of a round-trip message over Ethernet, and 1065 µs for ATM. The next line in the table is the cost of sending 25 bytes of MPI protocol information. (Of the 25 bytes, 1 byte designates the type of message, such as envelope or DMA; 4 bytes tell the destination how much reserved space has been freed; the last 20 bytes are used for the envelope and DMA request information.) The amount of this overhead is 45 µs on Ethernet and about 5 µs on ATM. The next line measures the overhead of determining the incoming message type, measured to be 65 µs on Ethernet and slightly higher (85 µs) on ATM. The next line measures the overhead of receiving the actual envelope once the message type has been determined; once more, the costs are 65 µs on Ethernet and 85 µs on ATM. These costs are so high because the associated operations need to cross the kernel boundary. The last 35 µs overhead is the cost of actually performing MPI matching. (The cost for receiving the actual data is already included in the latency figures of the first line.) Figure 6 displays the bandwidth obtained using TCP as the communications layer.

Figure 6. TCP bandwidth (mpi/tcp/atm, mpi/tcp/eth, tcp/atm, tcp/eth)

Table 1. MPI round-trip overheads with TCP

                               ATM        Ethernet
  1 byte round-trip latency    1065 µs    925 µs
  25 byte info overhead        5 µs       45 µs
  Read for msg type            85 µs      65 µs
  Read for envelope            85 µs      65 µs
  Overheads for matching       35 µs      35 µs

For the sake of completeness, we also implemented MPI using the UDP transport level interface. The UDP implementation is very similar to the one with TCP, with additional measures taken to make the UDP communication reliable. As a consequence of the overhead that arises in making UDP reliable, our results indicated that the performance of the UDP implementation was very similar to that of TCP [10].

6. Applications

6.1. Linear Equation Solver

A linear equation solver for N variables has been implemented which solves the system with an initial phase of computation by the initiator, N phases of broadcasting and computation by all processes, and a final phase of result gathering by the initiator. As the only communication mechanism involved is the broadcast, the MPI-based program uses the collective communication primitives implemented using Meiko's hardware broadcast mechanism. The results for the Meiko are shown in Figure 7, which gives times for solving a linear system using from 1 to 32 processes. Two plots are shown, one for the MPICH implementation, which implements broadcast using point-to-point messages, and one for our implementation. We also implemented matrix multiplication; the performance results are similar to those of the linear equation solver.

Figure 7. Meiko Linear Equation Solver (time vs. number of processes, MPICH vs. low latency)

6.2. Pairwise Interactions

Another problem which exploits the use of a parallel computer well is molecular dynamics, where we need to compute the interacting forces within a group of particles. As each particle interacts with every other particle in the group, O(n²) interactions must be calculated. To parallelize this problem, each processor is in charge of calculating the interactions of P/N particles, where N is the number of processors. The processes communicate in P − 1 phases, passing a partition of the particles around the ring.

Figure 8. Meiko Particle Pairwise Interactions (time vs. number of processors, MPICH vs. low latency)

Figure 9. TCP Particle Pairwise Interactions (time vs. number of processors, Ethernet vs. ATM)
Each process calculates the interactions between the particles it is permanently assigned and the partition of particles it holds in the current round. Messages are simply passed around a ring, requiring only point-to-point messages. To allow concurrent sending and receiving in the communication phase of each round, a nonblocking send is posted to the next processor in the ring, then a blocking receive is performed, followed by a wait operation to complete the send. Figure 8 shows the results of running this program to find the forces acting on each of 24 particles, using up to 8 processes of the Meiko. As each processor has a fairly even load, the processes tend to interact at nearly the same time, so a lower latency communication mechanism is beneficial. The high latencies of TCP on the cluster of workstations make this problem scale well only for much larger problem sets. Figure 9 shows the results of finding the forces acting on 128 particles. The ATM shows a clear performance gain, primarily because there is no network contention and fairly large messages are used, exploiting ATM's higher bandwidth.
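The ring schedule described above can be sketched without any MPI at all (our own illustrative Python, not the paper's code): after the shift phases, every process has held every partition exactly once, which is what guarantees that all pairwise interactions get computed.

```python
# Ring-exchange schedule: partition p starts at process p, and each phase
# shifts every partition one step around the ring. With nprocs processes,
# nprocs rounds (the initial holding plus nprocs-1 shifts) let each process
# see every partition exactly once.

def ring_phases(nprocs):
    """Return, per round, the partition held by each process."""
    held = list(range(nprocs))         # round 0: process r holds partition r
    schedule = [list(held)]
    for _ in range(nprocs - 1):
        held = held[-1:] + held[:-1]   # shift partitions one step around the ring
        schedule.append(list(held))
    return schedule

sched = ring_phases(4)
# Partitions seen by each process across all rounds:
seen = [{sched[t][r] for t in range(len(sched))} for r in range(4)]
```

In the real program each shift is the nonblocking-send / blocking-receive / wait triple described above, so the communication of round t overlaps with the force computation on the partition received in round t−1's shift.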
7. Conclusions

The MPI standard is intended to provide a widely portable message passing interface. Its semantics leave many efficiency issues to the implementor. We have shown three elements that can affect performance on different architectures: the use of a communications co-processor for background processing, the buffering of data at the receiver, and the overlapping of data transfer with message matching. A mechanism was proposed that takes these issues into account, giving efficient implementations on the Meiko CS-2 and on a cluster of workstations communicating with TCP or UDP over Ethernet or ATM.

References

[1] Basu, A., Buch, V., Vogels, W., and von Eicken, T. U-Net: A User-Level Network Interface for Parallel and Distributed Computing. Proceedings of the 15th ACM Symposium on Operating Systems Principles, December.
[2] Bridges, P., Doss, N., Gropp, W., Karrels, E., Lusk, E., and Skjellum, A. Users' Guide to MPICH: A Portable Implementation of MPI. Argonne National Laboratory, May.
[3] Bruck, J., Dolev, D., Ho, C., Rosu, M., and Strong, R. Efficient Message Passing Interface (MPI) for Parallel Computing on Clusters of Workstations. ACM Symposium on Parallel Algorithms and Architectures, July.
[4] Burns, G., and Daoud, R. Robust MPI Message Delivery with Guaranteed Resources. Proceedings of the MPI Developers Conference, June.
[5] Burns, G., Daoud, R., and Vaigl, J. LAM: An Open Cluster Environment for MPI. Ohio Supercomputing Center.
[6] Butler, R., and Lusk, E. User's Guide to the p4 Parallel Programming System. Technical Report ANL-92/17, Argonne National Laboratory.
[7] Calkin, R., Hempel, R., Hoppe, H., and Wypior, P. Portable Programming with the PARMACS Message-Passing Library. Parallel Computing, Special Issue on Message Passing Interfaces, 20 (April 1994).
[8] Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R., and Sunderam, V. PVM 3.0 User's Guide and Reference Manual. Technical Report ORNL/TM, Oak Ridge National Laboratory, February 1993.
[9] iPSC/2 and iPSC/860 User's Guide. Intel Corporation, April.
[10] Jones, C. Low Latency MPI for Meiko CS-2 and ATM Clusters. MS Thesis, Department of Computer Science, University of California at Santa Barbara, July.
[11] Lin, M., Du, D. H. C., Thomas, J. P., and McDonald, J. A. Distributed Network Computing over Local ATM Networks. IEEE Journal on Selected Areas in Communications, Vol. 13, No. 4, May.
[12] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, Version 1.1, June.
[13] CS-2 Documentation Set. Meiko World Incorporated.
[14] Schauser, K., and Scheiman, C. Experience with Active Messages on the Meiko CS-2. 9th International Parallel Processing Symposium, Santa Barbara, CA, April.
[15] Singhai, A., and Campbell, R. H. xmpi: An Implementation over x-kernel for ATM Networks. Proceedings of the MPI Developers Conference, June.
[16] Thekkath, C. A., Levy, H. M., and Lazowska, E. D. Efficient Support for Multicomputing on ATM Networks. Technical Report, Department of Computer Science and Engineering, University of Washington, April.
[17] Thekkath, C. A., Nguyen, T. D., Moy, E., and Lazowska, E. D. Implementing Network Protocols at User Level. IEEE/ACM Transactions on Networking, Vol. 1, No. 5, October.
More informationMPI and comparison of models Lecture 23, cs262a. Ion Stoica & Ali Ghodsi UC Berkeley April 16, 2018
MPI and comparison of models Lecture 23, cs262a Ion Stoica & Ali Ghodsi UC Berkeley April 16, 2018 MPI MPI - Message Passing Interface Library standard defined by a committee of vendors, implementers,
More informationHIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS
HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS CS6410 Moontae Lee (Nov 20, 2014) Part 1 Overview 00 Background User-level Networking (U-Net) Remote Direct Memory Access
More informationCS 5520/ECE 5590NA: Network Architecture I Spring Lecture 13: UDP and TCP
CS 5520/ECE 5590NA: Network Architecture I Spring 2008 Lecture 13: UDP and TCP Most recent lectures discussed mechanisms to make better use of the IP address space, Internet control messages, and layering
More informationPerformance of the MP_Lite message-passing library on Linux clusters
Performance of the MP_Lite message-passing library on Linux clusters Dave Turner, Weiyi Chen and Ricky Kendall Scalable Computing Laboratory, Ames Laboratory, USA Abstract MP_Lite is a light-weight message-passing
More informationUnder the Hood, Part 1: Implementing Message Passing
Lecture 27: Under the Hood, Part 1: Implementing Message Passing Parallel Computer Architecture and Programming CMU 15-418/15-618, Fall 2017 Today s Theme 2 Message passing model (abstraction) Threads
More informationAn Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks
An Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks Ryan G. Lane Daniels Scott Xin Yuan Department of Computer Science Florida State University Tallahassee, FL 32306 {ryanlane,sdaniels,xyuan}@cs.fsu.edu
More informationLoaded: Server Load Balancing for IPv6
Loaded: Server Load Balancing for IPv6 Sven Friedrich, Sebastian Krahmer, Lars Schneidenbach, Bettina Schnor Institute of Computer Science University Potsdam Potsdam, Germany fsfried, krahmer, lschneid,
More informationDistributed Scheduling for the Sombrero Single Address Space Distributed Operating System
Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.
More informationDemultiplexing on the ATM Adapter: Experiments withinternetprotocolsinuserspace
Demultiplexing on the ATM Adapter: Experiments withinternetprotocolsinuserspace Ernst W. Biersack, Erich Rütsche B.P. 193 06904 Sophia Antipolis, Cedex FRANCE e-mail: erbi@eurecom.fr, rue@zh.xmit.ch Abstract
More informationRTI Performance on Shared Memory and Message Passing Architectures
RTI Performance on Shared Memory and Message Passing Architectures Steve L. Ferenci Richard Fujimoto, PhD College Of Computing Georgia Institute of Technology Atlanta, GA 3332-28 {ferenci,fujimoto}@cc.gatech.edu
More informationPart 5: Link Layer Technologies. CSE 3461: Introduction to Computer Networking Reading: Chapter 5, Kurose and Ross
Part 5: Link Layer Technologies CSE 3461: Introduction to Computer Networking Reading: Chapter 5, Kurose and Ross 1 Outline PPP ATM X.25 Frame Relay 2 Point to Point Data Link Control One sender, one receiver,
More informationRDMA-like VirtIO Network Device for Palacios Virtual Machines
RDMA-like VirtIO Network Device for Palacios Virtual Machines Kevin Pedretti UNM ID: 101511969 CS-591 Special Topics in Virtualization May 10, 2012 Abstract This project developed an RDMA-like VirtIO network
More informationData Link Layer. Our goals: understand principles behind data link layer services: instantiation and implementation of various link layer technologies
Data Link Layer Our goals: understand principles behind data link layer services: link layer addressing instantiation and implementation of various link layer technologies 1 Outline Introduction and services
More informationParallel Implementation of 3D FMA using MPI
Parallel Implementation of 3D FMA using MPI Eric Jui-Lin Lu y and Daniel I. Okunbor z Computer Science Department University of Missouri - Rolla Rolla, MO 65401 Abstract The simulation of N-body system
More informationDistributed Computing: PVM, MPI, and MOSIX. Multiple Processor Systems. Dr. Shaaban. Judd E.N. Jenne
Distributed Computing: PVM, MPI, and MOSIX Multiple Processor Systems Dr. Shaaban Judd E.N. Jenne May 21, 1999 Abstract: Distributed computing is emerging as the preferred means of supporting parallel
More informationAdvanced Computer Networks. End Host Optimization
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More informationEfficient Signaling Algorithms for ATM Networks
Efficient Signaling Algorithms for ATM Networks See-Mong Tan Roy H. Campbell Department of Computer Science University of Illinois at Urbana-Champaign 1304 W. Springfield Urbana, IL 61801 stan,roy @cs.uiuc.edu
More informationPush-Pull Messaging: a high-performance communication mechanism for commodity SMP clusters
Title Push-Pull Messaging: a high-performance communication mechanism for commodity SMP clusters Author(s) Wong, KP; Wang, CL Citation International Conference on Parallel Processing Proceedings, Aizu-Wakamatsu
More informationCommunication Kernel for High Speed Networks in the Parallel Environment LANDA-HSN
Communication Kernel for High Speed Networks in the Parallel Environment LANDA-HSN Thierry Monteil, Jean Marie Garcia, David Gauchard, Olivier Brun LAAS-CNRS 7 avenue du Colonel Roche 3077 Toulouse, France
More informationPerformance of Multihop Communications Using Logical Topologies on Optical Torus Networks
Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,
More informationby Brian Hausauer, Chief Architect, NetEffect, Inc
iwarp Ethernet: Eliminating Overhead In Data Center Designs Latest extensions to Ethernet virtually eliminate the overhead associated with transport processing, intermediate buffer copies, and application
More information6.9. Communicating to the Outside World: Cluster Networking
6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and
More informationCS 428/528 Computer Networks Lecture 01. Yan Wang
1 CS 428/528 Computer Lecture 01 Yan Wang 2 Motivation: Why bother? Explosive growth of networks 1989, 100,000 hosts on the Internet Distributed Applications and Systems E-mail, WWW, multimedia, distributed
More informationWeek 2 / Paper 1. The Design Philosophy of the DARPA Internet Protocols
Week 2 / Paper 1 The Design Philosophy of the DARPA Internet Protocols David D. Clark ACM CCR, Vol. 18, No. 4, August 1988 Main point Many papers describe how the Internet Protocols work But why do they
More informationEvaluating Personal High Performance Computing with PVM on Windows and LINUX Environments
Evaluating Personal High Performance Computing with PVM on Windows and LINUX Environments Paulo S. Souza * Luciano J. Senger ** Marcos J. Santana ** Regina C. Santana ** e-mails: {pssouza, ljsenger, mjs,
More informationinterfaces. Originally developed for the CM-5 [1], implementations are also available for the Meiko CS-[10], HP workstations on FDDI ring [9], Intel P
Low-Latency Communication on the IBM RISC System/6000 SP y Chi-Chao Chang, Grzegorz Czajkowski, Chris Hawblitzel, and Thorsten von Eicken Department of Computer Science Cornell University Ithaca, NY 14853
More informationAdvanced Computer Networks. Flow Control
Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Last week TCP in Datacenters Avoid incast problem - Reduce
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationBuilding MPI for Multi-Programming Systems using Implicit Information
Building MPI for Multi-Programming Systems using Implicit Information Frederick C. Wong 1, Andrea C. Arpaci-Dusseau 2, and David E. Culler 1 1 Computer Science Division, University of California, Berkeley
More informationCC MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters
CC MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters Amit Karwande, Xin Yuan Dept. of Computer Science Florida State University Tallahassee, FL 32306 {karwande,xyuan}@cs.fsu.edu
More informationInternetworking Part 1
CMPE 344 Computer Networks Spring 2012 Internetworking Part 1 Reading: Peterson and Davie, 3.1 22/03/2012 1 Not all networks are directly connected Limit to how many hosts can be attached Point-to-point:
More informationPoint-to-Point Communication. Reference:
Point-to-Point Communication Reference: http://foxtrot.ncsa.uiuc.edu:8900/public/mpi/ Introduction Point-to-point communication is the fundamental communication facility provided by the MPI library. Point-to-point
More informationAdaptive RTP Rate Control Method
2011 35th IEEE Annual Computer Software and Applications Conference Workshops Adaptive RTP Rate Control Method Uras Tos Department of Computer Engineering Izmir Institute of Technology Izmir, Turkey urastos@iyte.edu.tr
More informationParallel Computing Trends: from MPPs to NoWs
Parallel Computing Trends: from MPPs to NoWs (from Massively Parallel Processors to Networks of Workstations) Fall Research Forum Oct 18th, 1994 Thorsten von Eicken Department of Computer Science Cornell
More informationECE 650 Systems Programming & Engineering. Spring 2018
ECE 650 Systems Programming & Engineering Spring 2018 Networking Transport Layer Tyler Bletsch Duke University Slides are adapted from Brian Rogers (Duke) TCP/IP Model 2 Transport Layer Problem solved:
More informationPerformance Modeling and Evaluation of MPI
Performance Modeling and Evaluation of MPI Khalid Al-Tawil Csaba Andras Moritz y Abstract Users of parallel machines need to have a good grasp for how different communication patterns and styles affect
More informationPM2: High Performance Communication Middleware for Heterogeneous Network Environments
PM2: High Performance Communication Middleware for Heterogeneous Network Environments Toshiyuki Takahashi, Shinji Sumimoto, Atsushi Hori, Hiroshi Harada, and Yutaka Ishikawa Real World Computing Partnership,
More informationAchieving Distributed Buffering in Multi-path Routing using Fair Allocation
Achieving Distributed Buffering in Multi-path Routing using Fair Allocation Ali Al-Dhaher, Tricha Anjali Department of Electrical and Computer Engineering Illinois Institute of Technology Chicago, Illinois
More information100 Mbps DEC FDDI Gigaswitch
PVM Communication Performance in a Switched FDDI Heterogeneous Distributed Computing Environment Michael J. Lewis Raymond E. Cline, Jr. Distributed Computing Department Distributed Computing Department
More informationPARA++ : C++ Bindings for Message Passing Libraries
PARA++ : C++ Bindings for Message Passing Libraries O. Coulaud, E. Dillon {Olivier.Coulaud, Eric.Dillon}@loria.fr INRIA-lorraine BP101, 54602 VILLERS-les-NANCY, FRANCE Abstract The aim of Para++ is to
More informationPerformance of a High-Level Parallel Language on a High-Speed Network
Performance of a High-Level Parallel Language on a High-Speed Network Henri Bal Raoul Bhoedjang Rutger Hofman Ceriel Jacobs Koen Langendoen Tim Rühl Kees Verstoep Dept. of Mathematics and Computer Science
More informationRecently, symmetric multiprocessor systems have become
Global Broadcast Argy Krikelis Aspex Microsystems Ltd. Brunel University Uxbridge, Middlesex, UK argy.krikelis@aspex.co.uk COMPaS: a PC-based SMP cluster Mitsuhisa Sato, Real World Computing Partnership,
More informationComparing the performance of MPICH with Cray s MPI and with SGI s MPI
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 3; 5:779 8 (DOI:./cpe.79) Comparing the performance of with Cray s MPI and with SGI s MPI Glenn R. Luecke,, Marina
More informationINTRODUCTORY COMPUTER
INTRODUCTORY COMPUTER NETWORKS TYPES OF NETWORKS Faramarz Hendessi Introductory Computer Networks Lecture 4 Fall 2010 Isfahan University of technology Dr. Faramarz Hendessi 2 Types of Networks Circuit
More information1/29/2008. From Signals to Packets. Lecture 6 Datalink Framing, Switching. Datalink Functions. Datalink Lectures. Character and Bit Stuffing.
/9/008 From Signals to Packets Lecture Datalink Framing, Switching Peter Steenkiste Departments of Computer Science and Electrical and Computer Engineering Carnegie Mellon University Analog Signal Digital
More informationComparing Gang Scheduling with Dynamic Space Sharing on Symmetric Multiprocessors Using Automatic Self-Allocating Threads (ASAT)
Comparing Scheduling with Dynamic Space Sharing on Symmetric Multiprocessors Using Automatic Self-Allocating Threads (ASAT) Abstract Charles Severance Michigan State University East Lansing, Michigan,
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationMVAPICH-Aptus: Scalable High-Performance Multi-Transport MPI over InfiniBand
MVAPICH-Aptus: Scalable High-Performance Multi-Transport MPI over InfiniBand Matthew Koop 1,2 Terry Jones 2 D. K. Panda 1 {koop, panda}@cse.ohio-state.edu trj@llnl.gov 1 Network-Based Computing Lab, The
More informationHigh Performance MPI-2 One-Sided Communication over InfiniBand
High Performance MPI-2 One-Sided Communication over InfiniBand Weihang Jiang Jiuxing Liu Hyun-Wook Jin Dhabaleswar K. Panda William Gropp Rajeev Thakur Computer and Information Science The Ohio State University
More informationScalable Multiprocessors
arallel Computer Organization and Design : Lecture 7 er Stenström. 2008, Sally A. ckee 2009 Scalable ultiprocessors What is a scalable design? (7.1) Realizing programming models (7.2) Scalable communication
More informationA trace-driven analysis of disk working set sizes
A trace-driven analysis of disk working set sizes Chris Ruemmler and John Wilkes Operating Systems Research Department Hewlett-Packard Laboratories, Palo Alto, CA HPL OSR 93 23, 5 April 993 Keywords: UNIX,
More informationCan User-Level Protocols Take Advantage of Multi-CPU NICs?
Can User-Level Protocols Take Advantage of Multi-CPU NICs? Piyush Shivam Dept. of Comp. & Info. Sci. The Ohio State University 2015 Neil Avenue Columbus, OH 43210 shivam@cis.ohio-state.edu Pete Wyckoff
More informationPerformance of DB2 Enterprise-Extended Edition on NT with Virtual Interface Architecture
Performance of DB2 Enterprise-Extended Edition on NT with Virtual Interface Architecture Sivakumar Harinath 1, Robert L. Grossman 1, K. Bernhard Schiefer 2, Xun Xue 2, and Sadique Syed 2 1 Laboratory of
More informationTEG: A High-Performance, Scalable, Multi-Network Point-to-Point Communications Methodology
TEG: A High-Performance, Scalable, Multi-Network Point-to-Point Communications Methodology T.S. Woodall 1, R.L. Graham 1, R.H. Castain 1, D.J. Daniel 1, M.W. Sukalski 2, G.E. Fagg 3, E. Gabriel 3, G. Bosilca
More informationChapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup
Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.
More informationCombining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?
Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática
More informationHigh Performance MPI-2 One-Sided Communication over InfiniBand
High Performance MPI-2 One-Sided Communication over InfiniBand Weihang Jiang Jiuxing Liu Hyun-Wook Jin Dhabaleswar K. Panda William Gropp Rajeev Thakur Computer and Information Science The Ohio State University
More informationECE 650 Systems Programming & Engineering. Spring 2018
ECE 650 Systems Programming & Engineering Spring 2018 Networking Introduction Tyler Bletsch Duke University Slides are adapted from Brian Rogers (Duke) Computer Networking A background of important areas
More informationNetwork Control and Signalling
Network Control and Signalling 1. Introduction 2. Fundamentals and design principles 3. Network architecture and topology 4. Network control and signalling 5. Network components 5.1 links 5.2 switches
More informationLightweight Remote Procedure Call
Lightweight Remote Procedure Call Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, Henry M. Levy ACM Transactions Vol. 8, No. 1, February 1990, pp. 37-55 presented by Ian Dees for PSU CS533, Jonathan
More informationANALYSIS OF CLUSTER INTERCONNECTION NETWORK TOPOLOGIES
ANALYSIS OF CLUSTER INTERCONNECTION NETWORK TOPOLOGIES Sergio N. Zapata, David H. Williams and Patricia A. Nava Department of Electrical and Computer Engineering The University of Texas at El Paso El Paso,
More informationRED behavior with different packet sizes
RED behavior with different packet sizes Stefaan De Cnodder, Omar Elloumi *, Kenny Pauwels Traffic and Routing Technologies project Alcatel Corporate Research Center, Francis Wellesplein, 1-18 Antwerp,
More informationSummary of MAC protocols
Summary of MAC protocols What do you do with a shared media? Channel Partitioning, by time, frequency or code Time Division, Code Division, Frequency Division Random partitioning (dynamic) ALOHA, S-ALOHA,
More informationImproving TCP Performance over Wireless Networks using Loss Predictors
Improving TCP Performance over Wireless Networks using Loss Predictors Fabio Martignon Dipartimento Elettronica e Informazione Politecnico di Milano P.zza L. Da Vinci 32, 20133 Milano Email: martignon@elet.polimi.it
More information1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects
Overview of Interconnects Myrinet and Quadrics Leading Modern Interconnects Presentation Outline General Concepts of Interconnects Myrinet Latest Products Quadrics Latest Release Our Research Interconnects
More informationChapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview
Chapter 4: chapter goals: understand principles behind services service models forwarding versus routing how a router works generalized forwarding instantiation, implementation in the Internet 4- Network
More informationParallel Programming
Parallel Programming for Multicore and Cluster Systems von Thomas Rauber, Gudula Rünger 1. Auflage Parallel Programming Rauber / Rünger schnell und portofrei erhältlich bei beck-shop.de DIE FACHBUCHHANDLUNG
More informationImplementation and Evaluation of Prefetching in the Intel Paragon Parallel File System
Implementation and Evaluation of Prefetching in the Intel Paragon Parallel File System Meenakshi Arunachalam Alok Choudhary Brad Rullman y ECE and CIS Link Hall Syracuse University Syracuse, NY 344 E-mail:
More informationThe Design and Implementation of a MPI-Based Parallel File System
Proc. Natl. Sci. Counc. ROC(A) Vol. 23, No. 1, 1999. pp. 50-59 (Scientific Note) The Design and Implementation of a MPI-Based Parallel File System YUNG-YU TSAI, TE-CHING HSIEH, GUO-HUA LEE, AND MING-FENG
More informationSwitch Configuration message sent 1 (1, 0, 1) 2
UNIVESITY COLLEGE LONON EPATMENT OF COMPUTE SCIENCE COMP00: Networked Systems Problem Set istributed: nd November 08 NOT ASSESSE, model answers released: 9th November 08 Instructions: This problem set
More informationNetworking interview questions
Networking interview questions What is LAN? LAN is a computer network that spans a relatively small area. Most LANs are confined to a single building or group of buildings. However, one LAN can be connected
More informationHigh Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore
High Performance Computing Prof. Matthew Jacob Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No # 09 Lecture No # 40 This is lecture forty of the course on
More informationNoWait RPC: Extending ONC RPC to a fully compatible Message Passing System
NoWait RPC: Extending ONC RPC to a fully compatible Message Passing System Thomas Hopfner Franz Fischer Georg Färber Laboratory for Process Control and Real Time Systems Prof. Dr. Ing. Georg Färber Technische
More informationArchitecture or Parallel Computers CSC / ECE 506
Architecture or Parallel Computers CSC / ECE 506 Summer 2006 Scalable Programming Models 6/19/2006 Dr Steve Hunter Back to Basics Parallel Architecture = Computer Architecture + Communication Architecture
More informationRICE UNIVERSITY. High Performance MPI Libraries for Ethernet. Supratik Majumder
RICE UNIVERSITY High Performance MPI Libraries for Ethernet by Supratik Majumder A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree MASTER OF SCIENCE Approved, Thesis Committee:
More informationUnderstanding MPI on Cray XC30
Understanding MPI on Cray XC30 MPICH3 and Cray MPT Cray MPI uses MPICH3 distribution from Argonne Provides a good, robust and feature rich MPI Cray provides enhancements on top of this: low level communication
More informationHomework 1. Question 1 - Layering. CSCI 1680 Computer Networks Fonseca
CSCI 1680 Computer Networks Fonseca Homework 1 Due: 27 September 2012, 4pm Question 1 - Layering a. Why are networked systems layered? What are the advantages of layering? Are there any disadvantages?
More information