Data Parallel Programming with the Khoros Data Services Library

Steve Kubica, Thomas Robey, Chris Moorman
Khoral Research, Inc.
6200 Indian School Rd. NE Suite 200, Albuquerque, NM 87110 USA
E-mail: info@khoral.com

1 Introduction

Khoros is a powerful, integrated system which allows users to perform a variety of tasks related to image and signal processing, data exploration, and scientific visualization. Khoros includes a visual programming language, interactive image display and plotting programs, and an extensive set of image processing, data manipulation, scientific visualization, geometry, and matrix operators. Additionally, Khoros contains a suite of software development tools that allow users to create new applications[1].

Cantata, the visual programming environment available with Khoros, provides a tool for algorithm development and rapid prototyping. Users can construct visual programs within Cantata by connecting glyphs, which represent the actual data processing or visualization operators to be executed, with data connections, which represent the data flow between these operators. This intuitive approach provides a simple and accessible way to interconnect and execute many different programs.

2 Khoros Support for High Performance Computing

DARPA, Wright Laboratories, and Phillips Laboratories, in conjunction with Khoral Research, have developed a combined plan to extend the Khoros framework to support parallel computers and eventually even embedded High Performance Computers (HPC)[2]. This Advanced Khoros system will provide support for the development and creation of high-performance parallel and embedded systems using a common framework for all levels of development. Research in parallel algorithms has tended to focus on speeding up complex algorithms involving complex communication patterns.
The focus of Advanced Khoros is to handle the simpler parallelization common in many image and signal processing applications and to develop programming tools that simplify the time-consuming conversion of serial to parallel programs.

2.1 Task Parallelism

Task parallelism, or coarse-grained parallelism, refers to the ability to run multiple separate processes in parallel. Multiple branches in a workspace are able to run simultaneously. With task parallelism, there is no interprocess communication between the workspace operators beyond their input and output connections.

2.2 Data Parallelism

Data parallelism, or fine-grained parallelism, refers to the ability to process data in parallel within a single routine. Data parallelism would be exhibited by a single Khoros operator executed in a Single Program, Multiple Data (SPMD) manner, processing a data set scattered across a number of processors. This data parallel operator would distribute or scatter an input data set across multiple processes, process the data, and potentially regather the output.

3 A Merging of Paradigms

There are two paradigms to consider in writing parallel routines:

- The message passing programming paradigm is exemplified by the Message Passing Interface (MPI). Data distribution is explicitly handled by the programmer using direct MPI communication calls. This is flexible, but not convenient: all communication must be explicitly managed by the programmer. Developers who use MPI calls directly must deal with a fair amount of low-level complexity when structuring the intercommunication between parallel processes. This complexity can be prohibitive for developers with little or no parallel programming experience who wish either to write new parallel code or to migrate existing serial code to a parallel machine.

- The data parallel programming paradigm is exemplified by High Performance Fortran (HPF). In HPF, a data distribution is specified on an array with a compiler directive; the compiled code will automatically handle the data distribution. While this mechanism is convenient, the compiler often does not choose optimal strategies, and there is no mechanism for optimization.
Distributed data services provides the easy-to-use features of HPF and the flexibility of MPI. Data distribution will be handled automatically, reducing the work required to create parallel programs, while low-level communication calls can still be used as needed for critical code speed-up.

4 Data Services Overview

Data services provides access to data through an abstract data object via domain-specific data models. The polymorphic data model is the most commonly used,

and is accessible through the polymorphic data services library. The polymorphic data model provides a uniform interpretation of data across many different processing domains. Consisting of a number of data segments representing a time-series of volumes in space, the structure of this model is ideal for storing signal, image, and volumetric data. The primary data segment, known as the value segment, is used to store the actual values intended for processing[3].

The data services application programming interface (API) consists of a set of library functions that operate on an abstract data object. The internals of the data object are kept hidden from the developer; the data contained in the object can never be accessed directly. Access is only possible through the data services library calls. Within a program, a data object is declared to be an instance of the abstract data type kobject. Every data processing routine in Khoros uses these objects in approximately the same manner: data is read in from one or more input objects, processed, and then written out to one or more output objects.

Data objects contain primitives and attributes. Primitives are used to access data within the data object, and attributes are used to access meta-data within the data object. Meta-data is a term used loosely to cover characteristics of the data, such as size and data type, as well as auxiliary information such as the date or a comment. User-defined attributes can also be created to store any needed application-specific information.

4.1 Distributed Data Services

Distributed data services is implemented as an extension to polymorphic data services. These extensions allow a developer to distribute the value segment across several different processors for processing. Following the SPMD model, each processor will be running the same program, but will be operating on a different subset of the overall value segment.
The scattering of data to each processor and the gathering of the final results will be handled within data services. Since the only modifications in the API are made for handling this data distribution, developers should be able to easily migrate their existing polymorphic data services code to run on parallel machines.

The key idea behind distributed data services is the concept of a distributed data object. A distributed data object is an object that consists of many subobjects distributed over several different processors. When viewed collectively, these subobjects act as local handles to a conceptual global object. Each processor only has access to the portion of the value segment which is contained in its local subobject. The global characteristics are accessible to each processor from each local subobject via special attributes. With a few exceptions, distributed data objects are handled just like regular data objects in terms of accessing data and attributes.

4.2 Process Groups and Communicators

The concept of groups and communicators has been borrowed from MPI. In MPI, the set of processes involved in an interprocess communication is defined to be a group. Each process within a group is given a numerical rank, identifying it uniquely within that group. For a given communication call, a communicator handle is used to identify the group in which that communication will occur. MPI_COMM_WORLD is the communicator for the process group which contains all processes; that is, all the processes that were collectively initiated with a call to MPI_Init. New communicators can be created as subsets of this communicator. This provides a mechanism for isolating communication to subgroups within the group of all processes.

For distributed data services, a communicator is used to specify the processes over which data distribution shall occur. The handle KCOMM_WORLD is initialized to refer to the communication group that contains all processes. Khoros extends the concept of groups and communicators by giving each communicator a scattered or gathered state, where gathered implies that all data resides on the root node. Within each communication group, the process with a rank of 0 is considered to be the root process for that group. For any given Khoros communicator, the communication group that contains only the root process is given by the function KCOMM_SELF().

4.3 Shape Description

The shape of the data object refers to the manner in which a dataset is distributed across a series of processors. The shape is defined in terms of the polymorphic data model, and dictates how each segment in the model is divided over a process group. The polymorphic value segment consists of a five-dimensional array of data; it is best pictured as a time-series of volumes, where every element of the volume is a vector of data values. Partitioning is always along a single dimension of the polymorphic data model.
The shape is controlled via a data services attribute, KPDS_DISTRIBUTION. The effect of setting this attribute is immediate; the data object will take on the new shape when the set attribute call is made. Setting this attribute is done collectively by all processors on their local kobject handles, and all processors must set the same attribute values. The KPDS_DISTRIBUTION attribute consists of three values:

- DIMENSION: This value describes the polymorphic dimension along which the partitioning will occur. Eventually, partitioning in more than one dimension may be achieved using recursion.

- DISTRIBUTION: This value describes how the data is to be distributed along the distribution dimension. Possible values are KREPLICATED, KBLOCK, or KCYCLIC. A KREPLICATED distribution implies that the entire dataset is replicated to each processor. A KBLOCK distribution implies that the overall size of the partition dimension is divided by the number of processors, with each processor getting a contiguous block of data along that dimension. A KCYCLIC distribution implies that every slice of the partition is distributed to a different processor in a cyclic manner.

- COMMUNICATOR: This value specifies the Khoros communicator over which the distribution should be performed. The value KCOMM_WORLD will dictate that the data should be distributed over all processors. The value KCOMM_SELF(KCOMM_WORLD) will distribute the data only to the root process. Note that data distributed to one process is effectively gathered on that process; hence, changing the communicator value is used to scatter and gather the data. Any existing data file can be opened as a distributed data object and will have a communicator of KCOMM_SELF(KCOMM_WORLD). A scatter can be accomplished simply by setting a communicator value of KCOMM_WORLD on a distributed data object.

Since many signal and image processing algorithms require data from adjacent blocks when processing, a KPDS_OVERLAP attribute is also provided to specify an overlap for KBLOCK distributions. These overlaps affect the size of each local subobject along the distributed dimension; the first and last subobjects are expanded by the overlap size, while the other subobjects are expanded by two times the overlap size. When collecting distributed overlapped data, only the non-overlapped regions from each subobject will be collected. This attribute has no effect on KCYCLIC or KREPLICATED distributions.

4.4 Related Work

This specific concept of distributing polymorphic data objects is not completely new.
Very similar work on a parallel Khoros data services was done by Doug Warner at the Albuquerque Resource Center (ARC). The main difference between Warner's approach and the one presented here is that his required an explicit gathering glyph to recollect the data[4].

A similar concept has also been investigated by Nathan Doss and Thomas McMahon at Mississippi State University[5]. In their approach, a polymorphic data object is explicitly distributed, which produces local data objects for each machine. The glyph accepting input will block on the opening of the input until the glyph producing output executes a close call for the output. At that time, the communication between the two glyphs, defined to be in different MPI groups, will take place. This communication will be entirely in memory. This approach is particularly interesting since both routines are running simultaneously during the communication. One way in which our approach differs from theirs is that each processor is allowed access not only to the local data object, but also to the initial global data object. Their approach requires multiple copies of the data, contained in multiple subobjects.

5 Summary

The distributed data services library allows developers to control the distribution of data in a parallel program simply by setting attributes on a distributed data object. This interface provides the power of a data parallel programming paradigm by abstracting all the low-level communication required for effecting a data distribution. The simplicity of the interface should also facilitate the porting of serial routines to run on parallel architectures.

The emphasis in the development of distributed data services has been to provide a framework which addresses the common problems encountered in writing even the simplest data parallel programs, while still being extensible to problems which are less frequently encountered and often more complicated. Distributed data services allows developers writing such applications to use low-level MPI calls as necessary. In this way, distributed data services combines the convenience of a data parallel language such as HPF with the flexibility of the message passing library MPI.

AVIS Contract Acknowledgment: This project is supported by Federal Contract Number N660001-96-C-8619 sponsored by DARPA/ITO, 3701 N. Fairfax Drive, Arlington, Virginia 22203-1714.

MAUI Contract Acknowledgment: This project is supported by Federal Contract Number F296001-96-C0006 sponsored by the United States Air Force, Air Force Systems Command, Phillips Laboratory (PL), Kirtland AFB, New Mexico 87117-6008.

References

1. John Rasure and Steve Kubica. The Khoros application development environment. In H.L. Christensen and J.L. Crowley, editors, Experimental Environments for Computer Vision and Image Processing. World Scientific, 1994.
2. Khoral Research, Inc. and the DARPA/WL Khoros Coordinating Committee. Advanced Khoros software development plan. White paper produced as deliverable for KRI contract to DARPA/WL Advanced Khoros Working Group, 1996.
3. Khoral Research, Inc. Khoros Manuals: Programming Services Volume II, 1994.
4. Doug Warner. Parallel Khoros data services. In Parallel Khoros Workshop. Albuquerque Resource Center, April 1996.
5. Nathan Doss and Thomas McMahon. An integrated Khoros and MPI system for the development of portable DSP applications. Department of Computer Science and NSF Engineering Research Center for Computational Field Simulation, Mississippi State University, 1996.

This article was processed using the LaTeX macro package with LLNCS style.