Grid Application Development Software

Grid Application Development Software
Department of Computer Science, University of Houston, Houston, Texas

GrADS: Vision, Goals, Approach, Status
http://www.hipersoft.cs.rice.edu/grads

GrADS Team (PIs): Ken Kennedy, Ruth Aydt, Francine Berman, Andrew Chien, Keith Cooper, Jack Dongarra, Ian Foster, Dennis Gannon, Carl Kesselman, Dan Reed, Linda Torczon, Richard Wolski

E-Science: Data Gathering, Analysis, Simulation, and Collaboration
(Image: simulated Higgs decay, CMS experiment at the LHC.)

What Is a Grid?
- A collection of distributed, interconnected databases, instruments, and computing and visualization resources, potentially dynamically (re)configured
- Accessed through a broad range of devices and methods
- Sharing a common set of services
- Exhibiting great diversity in architecture, capabilities, reliability, availability, and predictability
- Managed and administered independently, with a broad range of policies

GrADS - A Software Grand Challenge
Goals:
- Tools and methodologies for the development and performance management of applications, leading to predictable performance and high efficiency on Grids
Challenges:
- Presenting a high-level application development interface: if programming is hard, it's useless
- Designing and constructing applications for adaptability
- Mapping applications to dynamically changing configurations
- Determining when to interrupt execution and remap, using application monitors and performance estimators (sketched below)
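
The last challenge is what GrADS calls a performance contract: the performance estimator predicts a progress rate, the monitor compares measured progress against it, and a violation is the signal to interrupt and remap. A minimal, self-contained C sketch of that check; all names are hypothetical, not the GrADS API:

```c
/* Sketch of a performance-contract check as described on this slide:
 * an estimator supplies a predicted progress rate, a monitor samples
 * actual progress, and a violation triggers consideration of remapping.
 * All names here are hypothetical illustrations, not the GrADS API. */
#include <stdio.h>

typedef struct {
    double predicted_rate;   /* e.g., iterations/s promised by the model */
    double tolerance;        /* fraction of the prediction we tolerate   */
} contract_t;

/* Returns 1 if the contract is violated and the scheduler should
 * consider interrupting execution and remapping the application. */
static int contract_violated(const contract_t *c,
                             long iterations_done, double elapsed_s)
{
    double actual_rate = iterations_done / elapsed_s;
    return actual_rate < c->predicted_rate * c->tolerance;
}

int main(void)
{
    contract_t c = { .predicted_rate = 100.0, .tolerance = 0.7 };
    /* In a real monitor these would be sampled from the running app. */
    long done = 650;          /* iterations completed so far */
    double elapsed = 10.0;    /* seconds since launch        */

    if (contract_violated(&c, done, elapsed))
        printf("contract violated: consider remapping\n");
    else
        printf("contract holding\n");
    return 0;
}
```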

GrADSoft Architecture
Goal: reliable performance under varying load.
(Diagram: a whole-program compiler combines the source application with software components and libraries into a configurable object program; a service negotiator and scheduler negotiate its placement on the grid runtime system; a dynamic optimizer tunes it at launch; a real-time performance monitor detects performance problems and returns performance feedback to the program preparation tools.)
GrADS Project (NSF NGS): Berman, Chien, Cooper, Dongarra, Foster, Gannon, Johnsson, Kennedy, Kesselman, Mellor-Crummey, Reed, Torczon, Wolski

GrADSoft Architecture: Execution Environment
(Same diagram, highlighting the execution-time components: service negotiator, scheduler, grid runtime system, dynamic optimizer, and real-time performance monitor.)

GrADSoft Architecture: Program Preparation System and Execution Environment
(Same diagram, now also highlighting the compile-time components: the whole-program compiler, software components, and libraries that produce the configurable object program. A sketch of what such an object program carries follows.)
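
The configurable object program at the center of these diagrams is meant to be more than object code: it carries hooks that the service negotiator and scheduler can query before committing resources. A hypothetical C sketch of such a bundle; the types and signatures are illustrative only, not the GrADSoft API:

```c
/* Hypothetical sketch of a "configurable object program": object code
 * bundled with a mapper and a performance estimator that the
 * scheduler/service negotiator can query before committing resources.
 * Names and signatures are illustrative, not the GrADSoft API. */
typedef struct {
    int    num_nodes;
    double cpu_mhz[64];      /* per-node clock rate        */
    double link_mbps[64];    /* per-node network bandwidth */
} resources_t;

typedef struct {
    /* Map the problem onto a candidate resource set; 0 on success. */
    int    (*map)(const resources_t *candidate, int problem_size);
    /* Predict execution time on that resource set, for negotiation. */
    double (*estimate_seconds)(const resources_t *candidate,
                               int problem_size);
    /* Entry point invoked once a schedule is accepted. */
    int    (*run)(int problem_size);
} configurable_object_program_t;
```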

Research Strategy
Begin modestly:
- Prototype a series of applications using components of the envisioned execution system (ScaLAPACK demo, Cactus demo)
- Evolve the definition of performance contracts
- Prototype the reconfiguration system with performance monitoring, but without dynamic optimization
Move from hand development to an automated system:
- Identify the key steps
- Use experience to expand the design of the software integration and execution systems
Experiment:
- Use an artificial testbed to test performance under varying conditions
- Move to real applications on the MacroGrid testbed

GrADS - Status
Testbeds:
- MacroGrid (based on Globus, site resources)
- MicroGrid (Andrew Chien et al., simulator)
Experiments:
- ScaLAPACK (GrADS team, Dongarra lead)
- Cactus (Ed Seidel et al., GrADS team)

GrADS ScaLAPACK
Implement a version of a ScaLAPACK library routine that runs on the Grid:
- Make use of the resources at the user's disposal
- Provide the best time to solution
- Proceed without the user's involvement
- Make as few changes as possible to the numerical software
The assumption is that the user is already Grid-enabled and runs a program that contacts the execution environment to determine where the execution should take place.

How ScaLAPACK Works
To use ScaLAPACK, a user must:
- Download the package and auxiliary packages to the machines
- Write an SPMD program which sets up the logical process grid, places the data on it, calls the library routine in SPMD fashion, and collects the solution after the routine finishes
- Allocate the processors and decide how many the application will run on
- Start the application with mpirun -np N user_app; the number of processors is fixed by the user before the run
- Upon completion, return the processors to the pool of resources
A sketch of this hand-written SPMD sequence follows.
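
For concreteness, here is a minimal sketch of that sequence in C, assuming the C interface to the BLACS, a 1-D process grid, and a problem that fits in memory (error handling and the data-filling step omitted):

```c
/* Sketch of the hand-written SPMD sequence this slide describes:
 * set up the process grid, describe the distributed matrix, call the
 * library routine, release the grid. Link against ScaLAPACK/BLACS. */
#include <stdlib.h>

/* BLACS / ScaLAPACK prototypes (normally supplied by vendor headers). */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int ictxt, int what, int *val);
extern void Cblacs_gridinit(int *ictxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int ictxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cblacs_gridexit(int ictxt);
extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb,
                      int *irsrc, int *icsrc, int *ictxt, int *lld, int *info);
extern void pdgesv_(int *n, int *nrhs, double *a, int *ia, int *ja,
                    int *desca, int *ipiv, double *b, int *ib, int *jb,
                    int *descb, int *info);

int main(void)
{
    int n = 5000, nb = 40, nrhs = 1;  /* sizes used in the experiments */
    int iam, nprocs, ictxt, nprow, npcol = 1, myrow, mycol;
    int izero = 0, ione = 1, info;
    int descA[9], descB[9];

    /* 1. Set up the logical process grid (here nprocs x 1). */
    Cblacs_pinfo(&iam, &nprocs);
    nprow = nprocs;
    Cblacs_get(-1, 0, &ictxt);
    Cblacs_gridinit(&ictxt, "Row-major", nprow, npcol);
    Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);

    /* 2. Place the data: allocate this process's block-cyclic pieces. */
    int locr = numroc_(&n, &nb, &myrow, &izero, &nprow);
    int locc = numroc_(&n, &nb, &mycol, &izero, &npcol);
    int lld  = locr > 1 ? locr : 1;
    double *A = calloc((size_t)locr * locc, sizeof *A);
    double *b = calloc((size_t)locr, sizeof *b);
    int *ipiv = malloc((size_t)(locr + nb) * sizeof *ipiv);
    descinit_(descA, &n, &n, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);
    descinit_(descB, &n, &nrhs, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);
    /* ... fill the local blocks of A and b here ... */

    /* 3. Call the library routine in SPMD fashion. */
    pdgesv_(&n, &nrhs, A, &ione, &ione, descA, ipiv,
            b, &ione, &ione, descB, &info);

    /* 4. Collect the solution (b now holds this process's piece of x),
     *    then release the process grid and return the processors. */
    Cblacs_gridexit(ictxt);
    return info;
}
```

Everything around the pdgesv_ call is boilerplate the user writes and launches by hand with mpirun -np N; the grid-enabled library described next absorbs those steps.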

GrADS Numerical Library
The goal is to relieve the user of these tasks:
- Decide which machines to use, based on the user's problem and the state of the system
- Optimize for the best time to solution
- Distribute the data on the processors and collect the results
- Start the SPMD library routine on all the platforms
- Check that the computation is proceeding as planned, and if not, perhaps migrate the application
The resulting control flow is sketched below.
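
Putting the pieces together, the grid-enabled routine wraps the same library call in resource selection and monitoring. A control-flow sketch in C; every function below is a hypothetical placeholder for a responsibility named on this slide, not a real GrADS API:

```c
/* Control-flow sketch of the grid-enabled library routine described on
 * this slide. Every function is a hypothetical placeholder for a GrADS
 * responsibility, not a real API: the point is the sequence. */
typedef struct resources resources_t;    /* opaque candidate machine set */

extern resources_t *select_machines(int problem_size);   /* NWS/MDS state */
extern double       predicted_time(resources_t *, int);  /* perf. model   */
extern void         distribute_data(resources_t *, int); /* block-cyclic  */
extern int          spawn_spmd_solver(resources_t *);    /* start library */
extern int          making_progress(resources_t *, double predicted);
extern void         migrate(resources_t **, int problem_size);
extern int          solver_done(resources_t *);
extern void         collect_results(resources_t *);

void grid_solve(int n)
{
    /* Decide where to run from the user's problem and system state. */
    resources_t *r = select_machines(n);
    double t = predicted_time(r, n);     /* optimize time to solution */

    distribute_data(r, n);
    spawn_spmd_solver(r);                /* SPMD routine on all platforms */

    while (!solver_done(r)) {
        /* Check the computation is proceeding as planned... */
        if (!making_progress(r, t))
            migrate(&r, n);              /* ...if not, perhaps migrate */
    }
    collect_results(r);
}
```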

Grid Environment for this Experiment: 4 cliques, 43 boxes
- UCSD: Quidam, Mystere, Soleil (PII 400 MHz, 100 Mb switched); Dralion, Nouba (PIII 450 MHz, 100 Mb switched)
- UIUC: amajor-dmajor (PII 266 MHz, 100 Mb switched); opus0, opus13-opus16 (PII 450 MHz, Myrinet)
- U Tennessee: cypher01-cypher16 (dual PIII 500 MHz, 1 Gb switched)
- U Tennessee: torc0-torc8 (dual PIII 550 MHz, 100 Mb switched)
- ferret.usc.edu: 64-processor SGI (32 at 250 MHz, 32 at 195 MHz)
- jupiter.isi.edu: 10-processor SGI
- lunar.uits.indiana.edu

Components Used
- Globus version 1.1.3
- Autopilot version 2.3
- NWS version 2.0.pre2
- MPICH-G version 1.1.2
- ScaLAPACK version 1.6
- ATLAS/BLAS version 3.0.2
- BLACS version 1.1
- PAPI version 1.1.5
- GrADS crafted code
Millions of lines of code in all, installed on the GrADS MacroGrid.

Heterogeneous Grid
- TORC: cluster of 8 dual Pentium III nodes; Red Hat Linux 2.2.15 SMP; 512 MB memory; 550 MHz CPUs; Fast Ethernet (100 Mbit/s, 3Com 3C905B) with a 16-port BayStack 350T switch
- CYPHER: cluster of 16 dual Pentium III nodes; Debian Linux 2.2.17 SMP; 512 MB memory; 500 MHz CPUs; Gigabit Ethernet (SK-9843) with a 24-port Foundry FastIron II switch
- OPUS: cluster of 8 Pentium II nodes; Red Hat Linux 2.2.16; 128 or 256 MB memory; 265-448 MHz CPUs; IP over Myrinet (LANai 4.3) plus Fast Ethernet (3Com 3C905B) with 16-port M2M-SW16 and Cisco Catalyst 2924 XL switches

Performance Model vs. Runs
(Chart: predicted vs. measured execution times.)

Overheads
(Chart: breakdown of Grid execution overheads.)

Grid ScaLAPACK vs. Non-Grid ScaLAPACK (dedicated TORC machines)
(Chart: time in seconds, broken down into application execution, process spawning, NWS retrieval, and MDS retrieval, for Grid and non-Grid runs at each problem size. Ratio = Grid time / non-Grid time.)
- N=600, NB=40, 2 TORC procs: ratio 46.12
- N=1,500, NB=40, 4 TORC procs: ratio 15.03
- N=5,000, NB=40, 6 TORC procs: ratio 2.25
- N=8,000, NB=40, 8 TORC procs: ratio 1.52
- N=10,000, NB=40, 8 TORC procs: ratio 1.29
The Grid overhead is amortized as the problem grows.

ScaLAPACK across 3 Clusters
(Chart: execution time in seconds (0-3,500) vs. matrix size (0-20,000) for machine selections drawn from OPUS alone, OPUS + CYPHER, and OPUS + TORC + CYPHER; configurations include 2 OPUS + 4 TORC + 6 CYPHER, 8 OPUS + 4 TORC + 4 CYPHER, 8 OPUS + 2 TORC + 6 CYPHER, 6 OPUS + 5 CYPHER, 8 OPUS + 6 CYPHER, 5 OPUS, and 8 OPUS.)

Largest Problem Solved
- Matrix of size 30,000: 7.2 GB for the data (30,000^2 x 8 bytes)
- 32 processors to choose from, at UIUC and UT; not all machines have 512 MB, some as little as 128 MB
- The performance model chose 17 machines in 2 clusters from UT
- The computation took 84 minutes: roughly (2/3)(30,000)^3 = 1.8 x 10^13 flops, i.e. 3.6 Gflop/s in total, or 210 Mflop/s per processor
- ScaLAPACK on a dedicated cluster of 17 such processors would get about 50% of peak; the processors are 500 MHz, i.e. 500 Mflop/s peak
- At 210 Mflop/s per processor (42% of peak), this grid computation came in about 20% below the cluster ScaLAPACK figure

UHFFT Library Organization
- UHFFT library: library of FFT modules, initialization routines, execution routines, utilities
- FFT code generator: mixed-radix (Cooley-Tukey), split-radix, prime-factor, and Rader algorithms; scheduler; optimizer; unparser; initializer (algorithm abstraction)
- Key: fixed library code, generated code, code generator
The initialization-time selection of an execution plan is sketched below.
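
UHFFT is an adaptive library: its initialization routines select the fastest execution plan assembled from the generated FFT modules. A toy, self-contained C sketch of that empirical selection (time the candidates, keep the winner); all names are hypothetical, not the UHFFT API:

```c
/* Toy sketch of initialization-time plan selection: empirically time
 * candidate execution plans built from generated FFT modules and keep
 * the fastest. All names are hypothetical, not the actual UHFFT API. */
#include <time.h>

typedef void (*fft_plan_fn)(double *re, double *im, int n);

/* Stand-in plans; real candidates would compose generated codelets
 * using mixed-radix, split-radix, prime-factor, or Rader steps. */
static void plan_a(double *re, double *im, int n) { (void)re; (void)im; (void)n; }
static void plan_b(double *re, double *im, int n) { (void)re; (void)im; (void)n; }

static fft_plan_fn candidates[] = { plan_a, plan_b };
enum { NUM_CANDIDATES = 2 };

/* Run each candidate once on representative data; return the fastest. */
static fft_plan_fn select_fastest_plan(double *re, double *im, int n)
{
    fft_plan_fn best = candidates[0];
    double best_s = 1e30;
    for (int i = 0; i < NUM_CANDIDATES; i++) {
        clock_t t0 = clock();
        candidates[i](re, im, n);                 /* trial execution */
        double s = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (s < best_s) { best_s = s; best = candidates[i]; }
    }
    return best;   /* cached and reused by the execution routines */
}

int main(void)
{
    double re[64] = {0}, im[64] = {0};
    fft_plan_fn plan = select_fastest_plan(re, im, 64);
    plan(re, im, 64);                             /* production use */
    return 0;
}
```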

Summary
- Goal: build programming systems for the Grid that enable ordinary users to develop and run applications in this complex environment
- Execution of applications must be adaptive, in response to changing loads and resources
- Software support is challenging: it must manage execution to reliable completion, must prepare a program for execution, and should support high-level domain-specific programming for breadth of the user community
- The research strategy involves developing a series of prototypes and evolving the infrastructure based on lessons learned