A Framework for Distributed Computation Over a Heterogeneous Beowulf Cluster.


Jared A. Heuschele, Computer Science, University of Wisconsin-Eau Claire, heuschja@uwec.edu
Andrew T. Phillips, Computer Science, University of Wisconsin-Eau Claire, phillipa@uwec.edu

Abstract

We present a problem-independent framework software design that takes a description of some computational problem, consisting of both the mathematical model and its data, and then performs the calculations in a distributed computing environment. The MPI standard for distributed computing over a network of heterogeneous workstations is used, but the software framework is fully application independent. Specific goals of the project were dynamic process control and load balancing, and the development of a C++ object-oriented framework that would take a description of the computational problem and its data and distribute the computations over a heterogeneous Beowulf cluster. That is, the distributed computing aspect of the calculation is completely separated from the problem and its description. We will demonstrate the success of the application framework for distributed computation, including dynamic load balancing and process management on a network of Linux and IRIX workstations, all in the context of a non-trivial application, namely the Protein Folding Problem.

This paper describes a problem-independent, framework-based implementation of a variation of the traditional master-slave distributed computing model. Our framework approach takes a description of the problem, consisting of the data and the mathematical/algorithmic model, and performs all calculations in a distributed computing environment without requiring any knowledge or understanding of distributed computation on the part of the user. Hence, the framework approach completely divorces the distributed computation from the user-defined problem statement and solution. Furthermore, our distributed computing framework includes dynamic process control and load balancing using a C++ object-oriented model.

Distributed computation has been practiced since the advent of multiple-processor supercomputers in the early 1980s. With the widespread growth of computer networks and the availability of message passing software, it is also now the domain of common desktop PCs. In this paper we describe a model for distributed computing that makes this process both easier and problem independent. We do this by creating a framework for distributed computing which separates the distributed computing aspect of the implementation from the problem and its description/model.

Model for Distributed Computation

Distributed computation is a method of breaking up a problem into several computational parts and meting out those parts to several independent processors. There are several hardware configurations on which to implement distributed computing, and we have chosen to use a

heterogeneous Beowulf cluster. A Beowulf cluster consists of several independent computers, known as "nodes," linked together via Ethernet or other network technology in an attempt to create the gestalt effect of a supercomputer. MPI (Message Passing Interface), a standard protocol for distributed computation, is then implemented on the Beowulf cluster in order to run distributed computations (MPI Forum, 1997). While we have selected the Beowulf cluster implementation of distributed computation, none of our design is in any way dependent on this choice. We assume only that whatever distributed computation implementation is chosen, it supports message passing via MPI.

There are many different distributed computing models that can be used on a Beowulf cluster. We have chosen a variation of the traditional master-slave model (Breshears, 1998). In the typical master-slave scenario, the master assigns tasks for the slaves to process in parallel. Slaves, in turn, perform most or all of the computation relevant to the application. After the computations are done by a slave, the results are reported back to the master, which then combines the results and continues the cycle of distributing the tasks until the entire computation is complete. The key point is that the master coordinates, or manages, the distribution of work and the compilation of the results, while the slaves are solely responsible for the actual computational kernel.

Our design for distributed computation over N nodes on a Beowulf cluster differs slightly from this master-slave scenario. Our model involves dividing the individual processors into a hierarchy that consists of one node running the master process, another running an assistant process, and the other N-2 nodes running the slave processes. Once started, the master node/process is responsible only for assigning the workload, which is dynamically determined, to the slave nodes and monitoring their progress. As before, the slaves do the bulk of the actual

computation but report their results to the assistant (not the master). The assistant's duty then is strictly to gather and process the individual results reported to it directly by the slaves. This division of labor between master and assistant reduces the amount of communication/computation the master is required to perform, and frees it to focus exclusively on managerial activities.

To use the processing power of a Beowulf cluster at or near its greatest potential, the master program must also monitor the slave nodes' progress in order to deal with the difficulties that arise in an environment created by the hardware and software of heterogeneous machines and operating systems. Difficulties encountered include, but are not limited to: network failures and bottlenecks, memory failures, and scheduling bottlenecks associated with the operating system. An effective way of circumventing these potential difficulties is to incorporate on the master node a dynamic process control mechanism that optimizes job distribution by monitoring the load balance among the N-2 slave nodes. Thus, the master process is responsible for more than distributing work to the slaves.

In our model, a majority of this mechanism is accomplished via an observer thread that runs on the same node as the master process. The observer keeps a record of tasks assigned to slaves and the status of each task, including whether it is completed, in progress, or stalled. On a more fundamental level, it acts as an iterator for the master. That is, the master is not responsible for determining to which slave to send the next task; instead, it just asks the observer for the next slave. The observer therefore provides the basic operation of iteration among the list of all active slaves. The observer also tracks the status of each slave. Should one of the slaves die, the observer notes this and ensures via its iterator methods that the master avoids assigning tasks to the dead node. Furthermore, the observer measures the effective speed of a

slave by tracking task completion times, and enables the master to assign the appropriate amount of work to each slave. Thus, the load balancing and process management is abstracted out of the master into the observer. Of course, the master is responsible for initially assigning tasks to each of the slaves, which it does with the help of the observer's iterator methods. As slaves complete their tasks, the master assigns them more work and coordinates the load balancing based on the recommendations of the observer. Thus, load balancing is an inherent part of the master process; faster slaves will be able to request and receive more tasks. Finally, task completion is tracked via verification from the slaves as well as node monitoring by the observer.

In contrast to the managerial nature of the master, slaves do the actual computational work. They await data from the master, work on it in a user-defined way, and then send the results to the assistant, and a corresponding confirmation of completion to the master, which then prompts the master to send that slave another task. Note that slaves need know nothing about the existence of other slaves, nothing about the workload distribution, and nothing about the overall value/use of the results that they compute.

The assistant is responsible for processing the results generated by the slaves. It needs to communicate with the master only at the end of the entire computation in order to confirm to the master that all work has been completed and the program is ready to end. The communication that occurs in our master-assistant-slave model is therefore more complicated than a typical master-slave model (Figure 1).

[Figure 1: The master-assistant-slave communication pattern, with messages 1-5 exchanged among the master, a slave, and the assistant.]
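The observer's iterator and liveness bookkeeping described above can be sketched in C++ as follows. This is a minimal, hypothetical illustration: the names Observer_t, MoreSlaves, NextSlave, and MarkDead are assumptions made for this sketch, not the authors' actual interface. It shows round-robin iteration over the list of live slaves, with dead nodes skipped:

```cpp
#include <vector>

// Hypothetical sketch of the observer's slave-iteration bookkeeping.
// Names and signatures are illustrative assumptions, not the paper's
// actual class interface.
class Observer_t {
public:
    explicit Observer_t(int numSlaves) : alive_(numSlaves, true), next_(0) {}

    // True while at least one slave can still accept work.
    bool MoreSlaves() const {
        for (bool a : alive_) if (a) return true;
        return false;
    }

    // Round-robin iteration over live slaves, skipping any marked dead;
    // returns -1 when no live slave remains.
    int NextSlave() {
        int n = static_cast<int>(alive_.size());
        for (int i = 0; i < n; ++i) {
            int cand = (next_ + i) % n;
            if (alive_[cand]) { next_ = (cand + 1) % n; return cand; }
        }
        return -1;
    }

    // Called when the observer detects a failed node, so the master
    // never again receives this slave from NextSlave().
    void MarkDead(int slave) { alive_[slave] = false; }

private:
    std::vector<bool> alive_;  // liveness flag per slave
    int next_;                 // next candidate slave index
};
```

A real observer would additionally record per-task status (completed, in progress, stalled) and task completion times, as the paper describes, so that the iteration can be weighted toward faster slaves.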

In step one, the master sends a data set to the slave for processing. The slave then processes that data set and, in step two, passes the result to the assistant, and immediately (step three) sends a confirmation to the master in order to receive another data set. When the master has assigned all work to the slaves, it sends a single message to the assistant indicating that it has finished assigning tasks (step four). When the assistant finishes processing results received from the slaves, step five is to send a final confirmation indicating problem completion to the master.

The Framework

The design and construction of the dynamic process control and communication in the context of a problem-independent framework is the key to our model. Much of the software that implements the rudimentary communication between, and distribution to, the nodes is widely available in software library packages like MPI, which we use. However, the implementation of an object-oriented framework for distributed computation is not included in this and other library packages.

A "framework" in the context of computer software is an extension of the principle of code reuse. The four primary benefits of object-oriented frameworks are modularity, reusability, extensibility, and inversion of control (Fayad, Schmidt, and Johnson, 1999, p. 8). Of the four, inversion of control is the benefit unique to frameworks. Conventional code reuse consists of inserting preexisting modules into the code developed to solve a problem in order to save time in

the construction process. A framework is the figurative inverse of this process: instead of plugging the preexisting modules into the problem, the problem is plugged into the framework. Frameworks are pieces/suites of code that implement a certain process or model and execute independent of the specifics of a problem. Frameworks serve as a template for a problem solution technique. In the case of distributed computation, our framework implements dynamic load balancing, slave process management, and all communication via low-level message passing, all of which can execute independent of the details of the user-defined problem. These aspects of distributing the computation must be maintained independently and remain unrelated to the actual calculation.

Of course, details are necessary in the distribution of problem-specific data to each slave process (which is the responsibility of the master process), in the gathering and processing of the individual results (which is the role of the assistant process), and in the calculations specific to the problem (which is the duty of the slave processes). Thus, when a framework is constructed, all that is needed is a description of certain specific master, assistant, and slave process responsibilities. These descriptions are encapsulated in the four user-defined functions used by the framework that encompass the user-defined solution method, its associated data to be calculated, and how the calculations are to be distributed. These four functions are:

1. UserNextTask: defines and provides a specific data set on which to perform computations. The master requests this data set from the user prior to communicating with a slave.

2. UserWork: defines the computational kernel of work to be done. Each slave is responsible for providing a specific data set (obtained from the master) on which to perform this work.

3. UserCombine: combines results (as reported by the slaves) in a user-defined way. The assistant is responsible for providing a specific solution vector for combining by the user.

4. UserTasksDone: determines when there are no more data sets/tasks to be completed. The master makes this request of the user prior to asking for a next data set (via UserNextTask).

Thus, the user's problem, confined to these methods, can actually be written and executed sequentially. The user does not have to know anything about MPI, message passing, or the vagaries of distributed computing. This is not to say that using the framework to solve the user's problem is plug and play. The user must understand and identify the parallelism inherent in the solution method.

Sample Computation

As a simple example of this model, we use the following elementary computation:

    σ = Σ_{i=1..n} σᵢ²,   where each σᵢ ∈ R, and n is considered to be large.

Our framework approach would then require the following simple set of user-defined functions/behaviors:

    UserWork(σᵢ): returns σᵢ²
    UserCombine(σ_partial): updates σ = σ + σ_partial
    UserNextTask(): returns the next σᵢ
    UserTasksDone(): returns true if and only if all n of the σᵢ have already been requested

In this case, each slave simply computes σᵢ² (via UserWork) given a σᵢ from the master (the σᵢ is obtained via UserNextTask). Upon completion of this task, the slave sends its result to the assistant, which in turn keeps a running total of the results via UserCombine. When the master has determined that all tasks have been completed (via UserTasksDone and confirmation from all slaves), the assistant is notified to prepare to report the final result. Notice that the user is responsible for understanding the inherent parallelism in the computation, but that no reference to distributed computation or MPI is expected or required.

Reliable Message Passing

Distributed computation models rely heavily on the reliable transportation of user data. In our framework, data buffers are sent from the master to be processed by the slaves, and results are stored in buffers that are sent from the slaves to be gathered and processed by the assistant. Memory cannot be shared by these processes due to the distributed nature of the computing environment; that is, we are assuming a distributed memory model, not a shared memory one. Therefore, each of these transfers requires that separate buffers be allocated by the master, assistant, and slaves on each of their respective nodes. The master and assistant need only manage one buffer each, whereas the slave has two buffers to coordinate: one to store data received from the

master, and another in which to send the results to the assistant. Because of the need for reliable transport of the buffers containing the user data, the user must specify the maximum length of data buffers that each individual task might require. By having the user specify the maximum length of the data buffers, the allocation, passing, and referencing of these data buffers is greatly simplified. All data buffers used in message passing are thus allocated by the framework, an abstraction that hides one more complicated aspect of MPI operation from the user. There is only one type of data that can be reliably transferred between multiple systems in a heterogeneous cluster: the IEEE double. By using the IEEE standard 64-bit floating point representation, the framework avoids the overhead of tracking various architecture differences that can result in non-portable or incorrect message passing.

The Object-Oriented Framework

Our framework runs on each machine using a single program, multiple data (SPMD) format; each machine compiles and runs the same code, but executes that code differently depending on which personality (master, slave, assistant) is initiated (Almasi & Gottlieb, 1994). Each node, whether master, slave, or assistant, shares common attributes and behaviors which are abstracted in a base class Mpi_t. Each node personality (Master_t, Assistant_t, or Slave_t) then inherits from this base class, and then determines if it is the master, the assistant, or a slave. Each node also determines the identity (node number) of the master and assistant. A node's identity is easily established in the Mpi_t constructor with the function MPI_Comm_rank(),

which finds the node's rank in the overall model (OSC, 1996). A node's identity does not change throughout the run of the computation. The master node always has a rank of 0, and the assistant node always has a rank of N-1 in an N node cluster.

The run methods of each node type can be summarized in the following C++ pseudocode:

    void Master_t::Run() {
        while (observer->MoreSlaves() && !UserTasksDone()) {
            allocate workBuffer memory for message passing;
            UserNextTask(workBuffer);
            send workBuffer to observer->CurrentSlave();
            deallocate the workBuffer memory;
            observer->NextSlave();
        }
        while (!UserTasksDone()) {
            receive confirmation of work completed by slave #id;
            allocate workBuffer memory for message passing;
            UserNextTask(workBuffer);
            send workBuffer to slave #id;
            deallocate the workBuffer memory;
        }
        while (there are still tasks unfinished) {
            receive confirmation of work completed from any slave;
        }
        send message to assistant declaring end of task allocation,
            including count of tasks completed by slaves;
        receive confirmation from the assistant;
        StopAllSlaves();
        StopMeNow();
    }

The master run method begins by sending off a task to each slave. It uses the observer to iterate through the list of slaves, and does so as long as a task exists for a slave to complete (determined by calling UserTasksDone). After sending a job to each of the slaves, it waits for the slaves to complete a task and then sends that slave a new task if one remains. Tasks are

continuously assigned as slaves complete their work, with the help of UserNextTask. After all tasks have been assigned, the master waits for the slaves to confirm that the remainder of the tasks have completed.

    void Slave_t::Work() {
        allocate memory for resultsBuffer for message passing;
        UserWork(resultsBuffer, workBuffer);
        send resultsBuffer to assistant;
        send confirmation to master;
        deallocate the resultsBuffer memory;
    }

    void Slave_t::Run() {
        bool keep_working = true;
        while (keep_working) {
            allocate memory for workBuffer for message passing;
            switch (wait for message and buffer from master) {
                case stop:
                    StopMeNow();
                    keep_working = false;
                    break;
                case more2do:
                    Work();
                    break;
                default:
                    keep_working = false;
                    break;
            }
            deallocate the workBuffer memory;
        }
    }

The slave run method waits in a loop for a message from the master. In that message, a tag determines whether the slave continues working or whether it exits and terminates on the node. If it is told to do work, it calls its Work method, which executes the UserWork function to accomplish the computation. The slave next sends the results to the assistant and a confirmation to the master, upon which it exits the Work method and returns to the Run method, awaiting more work.
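The four user-defined functions are the only code the user supplies, and, as noted earlier, the user's problem can be written and tested sequentially. The following sketch does that for the paper's sample computation σ = Σ σᵢ², using simplified signatures assumed for illustration (the real framework passes data through double buffers of user-specified maximum length):

```cpp
#include <cstddef>
#include <vector>

// Sequential sketch of the paper's sample computation sigma = sum of
// sigma_i^2, expressed through the four user-defined functions the
// framework expects. The scalar signatures here are assumptions for
// illustration only.
static std::vector<double> inputs = {1.0, 2.0, 3.0};  // the sigma_i values
static std::size_t nextIndex = 0;                     // next task to hand out

bool UserTasksDone() { return nextIndex >= inputs.size(); }

double UserNextTask() { return inputs[nextIndex++]; }           // master side

double UserWork(double sigma_i) { return sigma_i * sigma_i; }   // slave side

void UserCombine(double& sigma, double partial) { sigma += partial; }  // assistant side

// Run the whole problem sequentially: exactly how a user could verify
// the problem definition before handing it to the framework.
double RunSequentially() {
    double sigma = 0.0;
    while (!UserTasksDone())
        UserCombine(sigma, UserWork(UserNextTask()));
    return sigma;
}
```

Under the framework, UserNextTask and UserTasksDone would instead be called by the master, UserWork by each slave, and UserCombine by the assistant, with no change to the user's logic.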

    void Assistant_t::Work() {
        allocate memory for resultsBuffer for message passing;
        wait for a message and resultsBuffer from anybody;
        if (sender == master) {
            masterNotifiedUs = true;
            mastersTally = count of completed tasks as sent by master;
        } else {  // it was from a slave
            UserCombine(resultsBuffer);
            completedJobs++;
        }
        deallocate resultsBuffer memory;
    }

    bool Assistant_t::IsAllWorkDone() {
        return (masterNotifiedUs && (mastersTally == completedJobs));
    }

    void Assistant_t::Run() {
        while (!IsAllWorkDone()) {
            Work();
        }
        send confirmation to master;
        StopMeNow();
    }

The assistant executes similarly to the slave; its run method consists primarily of a waiting-for-message loop, but in a different form. It checks to see if it has received the message to terminate from the master, and checks the master's tally of tasks assigned against the tally of tasks collated by the assistant. If neither of these conditions is met, it executes its Work method, which waits for a message from anybody, processing the results via UserCombine if the message was from a slave, or checking the tally from the master otherwise. If the exit conditions are met, it sends a confirmation of completion to the master and terminates. At this point in the execution, computation is complete, and the program exits normally on each node.

Note that message passing is completely abstracted out of the framework with the class Message_t. It contains the necessary handle instances and other members to facilitate

message passing. Its methods include two sending and two receiving functions: one each for merely sending and receiving a tag (a message with no associated buffer), and another two to send and receive both tag and data. Here is the class definition for Message_t, which is used to encapsulate all message passing behavior:

    enum message_type {none, stop, more2do, confirmed, completed};

    class Message_t {
    public:
        Message_t();
        ~Message_t();
        message_type RecvTag();
        message_type RecvTagFrom(int fromWhom);
        void SendTag2(int toWhom, message_type tag);
        void SendMsg2(int toWhom, message_type tag, double *buffer, int len);
        message_type RecvMsg(double *buffer, int len);
        message_type GetTag();
        int GetSender();
    private:
        int sender;
        message_type tagRecvd;
        MPI_Request theRequest;
        MPI_Status theStatus;
    };

Conclusion

By developing the distributed computation framework, the ease of utilizing parallel computational power is increased. Users need only be able to understand what aspects of their problem can be run independently and in parallel in order to provide the details of the template

functions UserWork, UserNextTask, UserCombine, and UserTasksDone. In no case is there any need on the part of the user to understand or use any principles/methods of distributed computing. Furthermore, by abstracting the message passing out of the master-slave-assistant model, the framework is more adaptable to the changing world of MPI standards and implementations, and also allows itself to be moved to other computing environments by simply changing the implementation of the message passing class.

References

Almasi, G. S., & Gottlieb, A. (1994). Highly Parallel Computing, 2nd Ed. California: The Benjamin/Cummings Publishing Company, Inc.

Breshears, Clay. (1998). Detailed Examples. A Beginner's Guide to PVM Parallel Virtual Machine [Online]. Available: http://www-jics.cs.utk.edu/PVM/pvm_guide.html [2000, February 29].

Fayad, M., Schmidt, D., & Johnson, R. (1999). Building Application Frameworks: Object-Oriented Foundations of Framework Design. New York: Wiley Computer Publishing.

Message Passing Interface Forum. (1997). MPI-2: Extensions to the Message-Passing Interface [Online]. Available: http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html. Knoxville, Tennessee: University of Tennessee. [2000, February 29].

OSC (Ohio Supercomputer Center), The Ohio State University. (1996). Basic Parallel Information. MPI Primer / Developing with LAM (p. 21). Columbus, Ohio: The Ohio State University.

Acknowledgements

We would like to acknowledge the help of the many members of the lam@mpi.nd.edu mailing list.