Language-Based Parallel Program Interaction: The Breezy Approach
Darryl I. Brown, Allen D. Malony, Bernd Mohr
University of Oregon

Department of Computer and Information Science
University of Oregon
Eugene, Oregon 97403
{darrylb, malony, mohr}@cs.uoregon.edu

Abstract

This paper presents a general architecture for runtime interaction with a data-parallel program. We have applied this architecture in the development of the Breezy tool for the pc++ language. Breezy grants application programs convenient and efficient access to higher-level external services (e.g., databases, visualization systems, and distributed resources) and allows external access to the application's state (e.g., for program state display or computational steering). Although such support can be developed on an ad hoc basis for each application, a general approach to the problem of parallel program interaction is preferred. A general approach makes tools more portable and retargetable to different language systems. There are two main conclusions from this work. First, interaction support should be integrated with the language system, so that the interaction model can be implemented consistently with the language design. This aids the application developers and tool builders who require this interaction. Second, as the implementation of Breezy shows, the development of interaction support can leverage the language itself as well as its compiler and runtime systems.

This research is supported by ARPA under Rome Labs contract AF 30602-92-C-0135 and Fort Huachuca contract ARMY DABT63-94-C-0029.

1 Introduction

It is increasingly the case in high-performance parallel applications that interaction with an external computing environment is necessary for a computational problem's solution. Certainly, this has always been true from the point of view of file I/O for reading program input data and writing computation output results, and traditional state-based debugging tools have always had interaction with a halted program as a fundamental feature. However, these interface examples are rudimentary relative to the support required for application programs to access higher-level external services (e.g., databases, visualization systems[1], and distributed resources) and to allow external access to the application's state (e.g., for program state display or computational steering[2]). Although such support can be developed on an ad hoc basis for each application, a general approach to the problem of parallel program interaction is preferred. In particular, high-level parallel languages require an approach that is portable across language implementations and that can accommodate language-level interaction with external applications and tools.

The notion of program interaction support has long been a concern in the distributed computing and software engineering domains, where interactions either arise out of necessity or are the basis for improved application design. In a high-performance computing context, program interaction typically implies a performance loss. However, the development of interaction support, when required, will become more problematic as the sophistication of the application and the parallel computing environment increases. For this reason, the inclusion of interaction support early in the design and development of a parallel language system can lead to an integrated solution where external access to a parallel program will be natural and convenient, and where performance concerns, when they arise, can be addressed within the particular computing environment.

In this paper, we describe the design approach used to implement parallel program interaction support in a parallel object-oriented language system. The unique aspect of this research is the use of the language itself and its associated compiler resources for generating the interaction infrastructure. The remainder of the introduction describes the features of the interaction system and the language platform where it was developed. More details of the architecture and implementation are given in the following sections. Several examples are then briefly described, followed by future directions and conclusions.

1.1 Breezy and pc++

The program interaction system we developed, Breezy (BReakpoint Executive Environment for visualization and data DisplaY), is a tool that provides the infrastructure for a client application to attach to a data-parallel application at runtime. It creates a partnership between the client application and the parallel program. This partnership gives the client several capabilities. The client can control the execution of the program. The client can retrieve data from parallel data structures created in the program. The client can invoke certain functions (or class methods) in the parallel program. The client can retrieve specific information about the program's execution state. The client can retrieve meta-information about the program, such as type descriptions. Finally, Breezy allows general communication between the client application and the parallel program.

Here we describe the Breezy implementation for the data-parallel language pc++[3]. pc++ is a language extension to C++ designed to allow programmers to create distributed data structures that have parallel execution semantics. The basic concept behind pc++ is the notion of a distributed collection, a structured set of objects which are distributed across the processing elements of the computer in a manner designed to be consistent with HPF[4].
To accomplish this, pc++ provides a very simple mechanism to build collections of objects from a base element class. Member functions of this element class can be applied to the entire collection (or a subset) in parallel. This mechanism provides the programmer with a clean interface to data-parallel style operations by simply calling member functions of the element class. To help the programmer build collections, the pc++ language includes a library of standard collection classes that may be used directly or subclassed. These include classes such as DistributedArray, DistributedMatrix, DistributedVector, and DistributedGrid.

pc++ also includes an environment of tuning and analysis utilities (TAU)[5], of which Breezy is a part. This implementation of Breezy in pc++ is a concrete example of how the Breezy architecture has been applied successfully to a data-parallel language environment.

2 Breezy Architecture

Breezy is an architecture that could profitably be reapplied to other data-parallel languages. The Breezy architecture consists of several modules (see Fig. 1). The key modules are the Breakpoint Executive module and the Breezy Access module. The Type module and the Transport Layer module are components utilized by the Access and Executive modules.

Figure 1: Breezy Architecture (diagram labels: Client Application, Type Module, Breezy Access Module, Breezy API, Transport Layer, Executing Parallel Program, Type Module, Breakpoint Executive Module)

2.1 The Breakpoint Executive Module

The Breakpoint Executive module is primarily a request handler. It maintains information about program state, such as the current breakpoint location in the source code and the list of currently instantiated parallel data objects. For meta-information, such as type descriptions of the parallel data structures or lists of all user-defined functions that can be called, the Breakpoint Executive module must consult the Type module.
To serve requests for parallel data, the Breakpoint Executive calls access functions in the executing program. These access functions reside in the (modified) user program in order to have access to the program variables and functions.

2.2 The Breezy Access Module

The Breezy Access module is currently implemented as a library of C routines. This library is linked with a client application to give that application access to the Breezy API. The following list relates this functionality in detail.

The client can control the execution of the program. The parallel program stops at each synchronization barrier and waits for a request from the client. This request specifies one of the functions described below, or it directs Breezy to continue to the next breakpoint. Breezy guarantees a consistent state of data in the program by allowing breakpoints only during these barriers.

The client can retrieve data from parallel data structures. The client specifies the variable from the program that holds the parallel data object of interest. If this object is a structured object with fields (such as a class), then the client can further specify a particular field within that structure. The client can retrieve this data from all of the distributed elements of the parallel data object, or from a single element in that object.

The client can call specified user-defined functions (or invoke methods on classes) in the parallel program. When function names in the user program are prefixed with a particular string (e.g., UserDefined), the Breezy instrumentation process notes these functions and adds code that makes them available to the client at runtime. Note that these can be methods defined on the elements of a parallel object as well as regular global functions.

The client can retrieve specific information about the program state. This includes the current location in the source code where the program has paused, and also a list of variables that are currently instantiated parallel objects.

The client can retrieve meta-information about the program.
This consists of type information for all the parallel structures, and also the names of user-defined functions that are available to be called.

Lastly, Breezy supports communication between the client application and the parallel program. This may be desirable, for instance, if the programmer wants a user-defined function to return one or more values. This can be done in a straightforward manner with a high-level communication interface which accesses the transport layer directly, bypassing the Breakpoint Executive and the Breezy Access module.

One capability whose use may be less obvious is the ability to get type information about the parallel data objects in the program. This type information may be of interest in itself, as in a debugger application. The client program can also make use of type information to interact with the Breezy Access module. Using type information, client applications can be generic, adapting to different parallel programs and to the data within those programs.

2.3 Retrieving Data in Breezy

One way of using type information is in requesting data. For structures, these type descriptions can be used to specify a particular field of interest. Note that nested structures can also be accessed this way. A small example will help explain. Let's assume Breezy gives the client application the following type information:

    class valattributes {
        char *color;
        float threedposition[3];
    }

    class simpleelem {
        int i;
        class valattributes *attr;
        float vals[100][100];
    }

Assume the client further finds (by retrieving program state information using Breezy) that there is a variable mydistarray that is a distributed two-dimensional array of simpleelem elements. Breezy would represent such a structure as follows:

    DistributedArray (simpleelem) mydistarray[20][20];

We can now retrieve all of the elements in the variable mydistarray or a particular element (by specifying the indices of that element in the distributed array).
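To make the role of these type descriptions concrete, here is a minimal C++ sketch of how a client might store the descriptions shown above and check that a (possibly nested) field path is valid before sending a request. The names FieldDesc, TypeTable, baseType, resolvePath, and exampleTypes are hypothetical and are not part of the Breezy API; the sketch also assumes that a type description reduces to a list of field-name and declared-type strings.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical client-side model of one field in a type description.
struct FieldDesc {
    std::string name;  // field name, e.g. "attr"
    std::string type;  // declared type, e.g. "valattributes*"
};

// Maps a class name to its fields in declaration order.
typedef std::map<std::string, std::vector<FieldDesc> > TypeTable;

FieldDesc field(const std::string& name, const std::string& type) {
    FieldDesc f;
    f.name = name;
    f.type = type;
    return f;
}

// Strip pointer and array decorations to get the class name to look up next.
std::string baseType(const std::string& declared) {
    std::string base = declared.substr(0, declared.find_first_of("*["));
    while (!base.empty() && base[base.size() - 1] == ' ')
        base.erase(base.size() - 1);
    return base;
}

// Follow a field path starting at elemType. On success, outType receives the
// declared type of the last field; returns false if any step names a field
// that does not exist or descends into a type with no description.
bool resolvePath(const TypeTable& types, const std::string& elemType,
                 const std::vector<std::string>& path, std::string& outType) {
    std::string current = elemType;
    for (std::size_t s = 0; s < path.size(); ++s) {
        TypeTable::const_iterator cls = types.find(baseType(current));
        if (cls == types.end()) return false;
        bool found = false;
        for (std::size_t f = 0; f < cls->second.size(); ++f) {
            if (cls->second[f].name == path[s]) {
                current = cls->second[f].type;
                found = true;
                break;
            }
        }
        if (!found) return false;
    }
    outType = current;
    return true;
}

// The two classes from the example above, expressed as a TypeTable.
TypeTable exampleTypes() {
    TypeTable t;
    t["valattributes"].push_back(field("color", "char*"));
    t["valattributes"].push_back(field("threedposition", "float[3]"));
    t["simpleelem"].push_back(field("i", "int"));
    t["simpleelem"].push_back(field("attr", "valattributes*"));
    t["simpleelem"].push_back(field("vals", "float[100][100]"));
    return t;
}
```

A generic client could call resolvePath with the element type of mydistarray and a path chosen by the user, and issue the data request only when the path resolves. Validating on the client side is a design choice of this sketch, not something the paper prescribes.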
We can also retrieve a specific field of the element(s) by specifying its name.

    retrievedata "mydistarray", "vals"

The above call would retrieve the data held in the vals field of the elements of the mydistarray variable. This specificity is recursive, so we could further grab the threedposition field of the attr field, which is a class itself:

    retrievedata "mydistarray", "attr", "threedposition"

This returns the values in the float array held in the threedposition field of the valattributes class. All of these requests could be repeated for a particular element by specifying the index of the element. For example, to retrieve the element indexed by (4,5):

    retrievedatafromelem "mydistarray", "vals", 4, 5

3 Breezy Discussion

There are several features that make Breezy a unique tool for its purpose in data-parallel computing analysis. It has a high-level interface that is practical and intuitive to use. Its modular design allows for reuse of components and clean substitution of new technologies (such as substituting CORBA/IDL[6] for the transport layer). It can be used as is with minimal effort, or it can be built upon to achieve much more complex functionality, such as computational steering. It allows the programmer to make functions available to be called by the client (via the Breezy API), giving the client the power to alter the course of the program or perform specific computations. Almost all of the implementation is done in the target language.

This last point is particularly interesting because it allows the client application to reference data objects just as they were defined in the program, not at some lower level into which the data may have been transformed by the compiler. Also, a new implementation of Breezy is not required for each new architecture that the language system is ported to. Because Breezy is implemented in the language, Breezy runs on any architecture supported by the language implementation.

There is a caveat to this argument: at least one, and possibly two, modifications need to take place in the runtime system for Breezy to work. The one necessary change is in the implementation of the synchronization barrier. Breezy allows the client to access the program information during these barriers only, to ensure a consistent state in the program.
The runtime system must modify the implementation of this barrier function to accommodate Breezy by having a single thread of execution call the Breakpoint Executive module before entering the barrier. All other threads enter the barrier and wait for the last one before continuing on. While they are there, they serve requests from the one thread that is in the Breakpoint Executive module. Thus, another requirement of the runtime system would be active messages (the ability to interrupt other threads to answer requests). This may or may not be implemented in the runtime system. If it is not, it would need to be simulated during these barriers.

3.1 How Does Breezy Differ from a Debugger?

Comparing Breezy to a traditional parallel debugger for high-performance applications, such as gdb or dbx-like debuggers, helps illustrate its purpose. Breezy operates one layer below a debugger. A debugger does not provide a programmable interface. Since Breezy makes the parallel program accessible to the client through an API, it is much more flexible than a debugger. In fact, the first use of Breezy was in a simple parallel debugger. The implementation was trivial, basically a GUI built on top of the Breezy API.

Breezy also differs from a debugger in that it provides different functionality. It is specifically geared toward parallel data. Thus, only data of this type is known and can be accessed using Breezy, whereas a debugger keeps track of all data. Breezy streamlines extraction of parallel data and allows interesting interactions with that data.

Breezy also differs from most parallel debuggers in philosophy. Debuggers typically deal with symbol tables and pointers to all the data on each node or thread of execution. Thus, for each thread, a debugger window appears to address the variables in that thread. Breezy accesses data using the language. The philosophy of Breezy is to use the language constructs that already exist to get to parallel data.
A single thread using these language-level constructs can access data from all other threads, just as any thread in the program itself would access data from other threads. This is how a single point of control is maintained, while allowing access to data on all nodes.
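The single point of control described above can be illustrated with a small single-process model. This is only a sketch under stated assumptions: ModelCollection, Elem, getI, and the block distribution are hypothetical stand-ins, not pc++ or Breezy code, and the inner vectors merely play the role of per-node memory. No real communication takes place.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type; stands in for a pc++ element class.
struct Elem {
    int i;
};

// Single-process model of a distributed one-dimensional collection.
// Each inner vector plays the role of one node's local memory.
class ModelCollection {
public:
    ModelCollection(std::size_t numElems, std::size_t numNodes)
        : numElems_(numElems),
          blockSize_(1),
          nodes_(numNodes == 0 ? 1 : numNodes) {
        // Block distribution: ceil(numElems / numNodes) elements per node.
        std::size_t n = nodes_.size();
        std::size_t block = (numElems_ + n - 1) / n;
        if (block > 0) blockSize_ = block;
        for (std::size_t g = 0; g < numElems_; ++g) {
            Elem e;
            e.i = 0;
            nodes_[ownerOf(g)].push_back(e);
        }
    }

    std::size_t size() const { return numElems_; }

    // Node that owns the element with global index g.
    std::size_t ownerOf(std::size_t g) const { return g / blockSize_; }

    // Language-level access: the caller names an element by global index and
    // the collection locates it on its owning node.
    Elem& at(std::size_t g) { return nodes_[ownerOf(g)][g % blockSize_]; }
    const Elem& at(std::size_t g) const { return nodes_[ownerOf(g)][g % blockSize_]; }

    // Single point of control: one caller walks the global index space and
    // applies a fixed-signature accessor to every element, wherever it lives.
    std::vector<int> gather(int (*accessor)(const Elem&)) const {
        std::vector<int> out;
        out.reserve(numElems_);
        for (std::size_t g = 0; g < numElems_; ++g)
            out.push_back(accessor(at(g)));
        return out;
    }

private:
    std::size_t numElems_;
    std::size_t blockSize_;
    std::vector<std::vector<Elem> > nodes_;
};

// Accessor for the i field of an element.
int getI(const Elem& e) { return e.i; }
```

In this model the caller never handles per-node symbol tables or pointers; it uses only the collection's own indexing, which is the contrast with a per-thread debugger view that the section draws.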

4 Program Analysis and Instrumentation

This section discusses what happens during the precompilation process of Breezy. This process is important in that it determines how data access is designed and how user functions are made available. As mentioned above, these two important functions of Breezy are implemented at the language level; program analysis and instrumentation are where that implementation takes place.

Program analysis is accomplished using a utility called Sage++[7], a compiler toolkit that provides an API for browsing the syntax tree of the program and modifying that tree as desired. Once modified, the new syntax tree can be unparsed to C++ source code, which can subsequently be compiled. To take advantage of Breezy, other languages would have to provide similar information about the program, either from a compiler toolkit or from the compiler itself.

The user program is first analyzed to generate type information. This type information is passed on to the instrumentor and is also saved to a separate type description file. This file contains "interesting" types (e.g., types involved in parallel data structures), and is read by Breezy during the initialization phase of the execution to make the type information available at runtime.

The second step in the process is instrumentation of the user program. The instrumentor also makes use of the type information. It must add code to the user program that will allow access to the parallel data objects at runtime. The first step in this process is creating functions that extract the data from the parallel structures. If the distributed elements of the parallel object are instances of a class, then methods must be added to that class to get to that data. In generating these new functions and methods, the instrumentor uses the type information. It must then build a table that correlates the string type name of each interesting data type with the function that accesses (extracts data from) that data type.
This table, and others that are created during the instrumentation process, are accessible by Breezy at runtime. Note that the access functions and methods have fixed argument types, so the access function table entries need not include the parameter types.

The next operation that the instrumentor must perform is detecting all lines in the code where parallel data objects are created. At each of these points, the instrumentor inserts code which adds the new variable's name, the pointer to that variable, and the type of that variable to a table. At each new allocation of a parallel structure, a new table entry is created, making the new variable available to Breezy. Note that at all points where parallel structures are deallocated, code must also be added to delete the table entry for the object being deallocated.

The last step the instrumentor takes is to detect user-defined access functions. This consists of searching all function names in the user program for a certain prefix, such as UserDefined. As in the previous step, the instrumentor must again construct a table relating the name of the function to a pointer to the function for runtime access.

Thus, by accomplishing these steps, Breezy has runtime access to tables from which it can do the following:

    relate the name of a type (or detailed fields within that type) to an access function that can extract the data from that type given a pointer to a variable of that type;

    relate a variable name to its type and to a pointer to that variable's data; and

    relate the name of a user-defined function to the pointer to that function.

Given access to this information at runtime, Breezy can forward variable, type, and function names from these tables to the client. In return, the client can specify what it is interested in by using these names as arguments to basic calls in the Breezy API. The result is a high-level interface based on the language and the program itself.
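The three tables described in this section can be sketched in plain C++ as follows. This is an illustration under assumptions, not the code the Breezy instrumentor generates: BreezyTables, VarEntry, AccessFn, UserFn, Temperatures, accessTemperatures, and UserDefinedReset are hypothetical names, and the access function returns a vector of floats only to keep the example short.

```cpp
#include <map>
#include <string>
#include <vector>

// Fixed signatures, so table entries need no parameter-type information.
typedef std::vector<float> (*AccessFn)(const void* object);
typedef void (*UserFn)();

// One entry per live parallel object.
struct VarEntry {
    std::string typeName;  // key into the access-function table
    const void* object;    // pointer to the variable's data
};

struct BreezyTables {
    std::map<std::string, AccessFn> accessByType;   // type name -> access function
    std::map<std::string, VarEntry> variables;      // variable name -> type and pointer
    std::map<std::string, UserFn> userFunctions;    // function name -> function pointer

    // Call inserted where a parallel object is allocated.
    void registerVar(const std::string& name, const std::string& typeName,
                     const void* object) {
        VarEntry e;
        e.typeName = typeName;
        e.object = object;
        variables[name] = e;
    }

    // Call inserted where the parallel object is deallocated.
    void unregisterVar(const std::string& name) { variables.erase(name); }

    // Resolve a variable name to its data: name -> type -> access function.
    bool retrieve(const std::string& name, std::vector<float>& out) const {
        std::map<std::string, VarEntry>::const_iterator v = variables.find(name);
        if (v == variables.end()) return false;
        std::map<std::string, AccessFn>::const_iterator a =
            accessByType.find(v->second.typeName);
        if (a == accessByType.end()) return false;
        out = a->second(v->second.object);
        return true;
    }

    // Invoke a user-defined function by name, if it was registered.
    bool call(const std::string& name) const {
        std::map<std::string, UserFn>::const_iterator f = userFunctions.find(name);
        if (f == userFunctions.end()) return false;
        f->second();
        return true;
    }
};

// Hypothetical stand-ins for a parallel type, its generated access function,
// and a user function carrying the UserDefined prefix.
struct Temperatures {
    std::vector<float> values;
};

std::vector<float> accessTemperatures(const void* object) {
    return static_cast<const Temperatures*>(object)->values;
}

int userDefinedCallCount = 0;
void UserDefinedReset() { ++userDefinedCallCount; }
```

Keying every table by string is what lets a client name variables, types, and functions exactly as they appear in the source program; the void pointer with a per-type access function is one simple way to keep the fixed argument types the text mentions.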

5 Example Applications of Breezy

The following are three applications that used Breezy as a basis for parallel program interaction. Space limitations restrict their descriptions here; more information can be found at [11].

The first use of Breezy was a simple parallel debugger[5]. This consisted of building a GUI on top of the Breezy Access module. This interface allows basic control of the program and access to data and type information. Using the GUI, the user can choose parallel objects from the program and specify which elements of those objects are of interest. For elements that are structures, the user can choose a particular field of the structure from a display of the structure type. The data from these selected data structures can be processed in two ways: it can be displayed in a scrollable text window or piped to a separate process.

The next application of Breezy was as a utility for extracting data from a specific parallel pc++ program for visualization. The parallel application dealt with objects in three-dimensional space. These objects were visualized using a visualization language, VIZ, which is an STk[8]-based language designed for building visualization tools and prototyping application-specific visualizations.

The latest project applying Breezy is a Distributed Array Visualizer Environment (DAVE)[9][10]. DAVE acts as a database front-end to program data and information. DAVE, in turn, relies on Breezy to actually retrieve that data. DAVE may have several data analysis and visualization applications available. Through DAVE's GUI, a user specifies what data is to be retrieved (utilizing information from Breezy) and to which of these applications that data is to be sent.

6 Future Work

There are currently many projects underway, both in modifying and extending Breezy and in using Breezy as a tool to build on. The network communication of Breezy has been implemented using sockets. A new version of Breezy will use CORBA/IDL[6] for its transport layer. The Breakpoint Executive module will be a CORBA-compliant object, from which clients can request data from the program. This data will be encoded as IDL structures.

Our research team is currently working to develop program analysis tools for HPF. Breezy will be one of the tools incorporated into this HPF environment. DAVE[9][10] will be extended to deal with CORBA objects and to communicate directly with Breezy via the CORBA interface.

7 Conclusion

The result of the research presented here is a general architecture for runtime interaction with a data-parallel program. We have applied this architecture in the development of the Breezy tool for the pc++ language. There are two main conclusions from this work.
First, when interaction support is integrated with a language system, the opportunity exists to implement a model that is consistent with the language design, aiding the application developers and tool builders who require this interaction. Second, the development of interaction support can leverage the language itself, as well as the compiler and runtime systems, to implement it.

References

[1] B. Topol, J. T. Stasko, Integrating Visualization Support Into Distributed Computing Systems, Georgia Institute of Technology, Tech. Rep. GIT-GVU-94-38, Oct. 1994.

[2] W. Gu, G. Eisenhauer, E. Kraemer, K. Schwan, J. Stasko, J. Vetter, N. Mallavarupu, Falcon: On-line Monitoring and Steering of Large-Scale Parallel Programs, Proc. Frontiers of Massively Parallel Computation, pp. 422-429, Feb. 1995.

[3] F. Bodin, P. Beckman, D. Gannon, S. Yang, S. Kesavan, A. Malony, B. Mohr, Implementing a Parallel C++ Runtime System for Scalable Parallel Systems, Proc. 1993 Supercomputing Conference, Portland, pp. 588-597, Nov. 1993.

[4] High Performance Fortran Forum, High Performance Fortran Language Specification (Version 1.0), Rice University, May 3, 1993.

[5] D. Brown, S. Hackstadt, A. Malony, B. Mohr, Program Analysis Environments for Parallel Language Systems: The TAU Environment, Proc. 2nd Workshop on Environments and Tools For Parallel Scientific Computing, Townsend, Tennessee, pp. 162-171, May 1994.

[6] Object Management Group, The Common Object Request Broker: Architecture and Specification, Version 1.2.

[7] F. Bodin, P. Beckman, D. Gannon, J. Gotwals, S. Narayana, S. Srinivas, B. Winnicka, Sage++: An Object Oriented Toolkit and Class Library for Building Fortran and C++ Restructuring Tools, Proc. OONSKI '94, Oregon, 1994.

[8] E. Gallesio, STk Reference Manual, version 2.1.6, Université de Nice, Feb. 1995.

[9] "Distributed Array Query and Visualization," Parallel Tools Consortium Working Document, December 14, 1994.

[10] Parallel Tools Consortium Working Group on Distributed Array Visualization, URL: http://www.llnl.gov/ptools/.

[11] Breezy: A Tool for Runtime Program Interaction With Data Parallel Programs, URL: http://www.cs.uoregon.edu/paracomp/tau/breezy/.