IOS: A Middleware for Decentralized Distributed Computing

Similar documents
A FRAMEWORK FOR THE DYNAMIC RECONFIGURATION OF SCIENTIFIC APPLICATIONS IN GRID ENVIRONMENTS

A Middleware Framework for Dynamically Reconfigurable MPI Applications

AUTONOMIC GRID COMPUTING USING MALLEABILITY AND MIGRATION: AN ACTOR-ORIENTED SOFTWARE FRAMEWORK

The Internet Operating System: Middleware for Adaptive Distributed Computing

Malleable iterative MPI applications

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Peer-to-Peer Systems. Chapter General Characteristics

Chapter 3. Design of Grid Scheduler. 3.1 Introduction

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Telecommunication Services Engineering Lab. Roch H. Glitho

Adaptive Cluster Computing using JavaSpaces

MIDDLEWARE FOR AUTONOMOUS RECONFIGURATION OF VIRTUAL MACHINES

Lecture 9: MIMD Architectures

A GENERIC FRAMEWORK FOR DISTRIBUTED COMPUTATION

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE

Functional Requirements for Grid Oriented Optical Networks

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism

Introduction to Distributed Systems

Assignment 5. Georgia Koloniari

Exploiting peer group concept for adaptive and highly available services

Early Measurements of a Cluster-based Architecture for P2P Systems

TDP3471 Distributed and Parallel Computing

Lecture 9: MIMD Architectures

Chapter 18 Parallel Processing

Distributed Systems LEEC (2006/07 2º Sem.)

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat

Lecture 9: MIMD Architecture

Principles of Parallel Algorithm Design: Concurrency and Mapping

6. Peer-to-peer (P2P) networks I.

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Layers of an information system. Design strategies.

Principles of Parallel Algorithm Design: Concurrency and Mapping

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system

Malleable Components for Scalable High Performance Computing

Overview. Concurrent and Distributed Programming Patterns in SALSA. Farmer Worker Computations. Farmer Worker Computations

Distributed Systems. Overview. Distributed Systems September A distributed system is a piece of software that ensures that:

Chapter 1: Distributed Information Systems

Chapter 2 Distributed Information Systems Architecture

Introduction to parallel Computing

Approaches to Architecture-Aware Parallel Scientific Computation

Current Topics in OS Research. So, what s hot?

Parallel Programming Models. Parallel Programming Models. Threads Model. Implementations 3/24/2014. Shared Memory Model (without threads)

Distributed Computing. Santa Clara University 2016

Distributed System Chapter 16 Issues in ch 17, ch 18

Distributed KIDS Labs 1

Supporting Application- Tailored Grid File System Sessions with WSRF-Based Services

Data Transformation and Migration in Polystores

The Use of Cloud Computing Resources in an HPC Environment

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

02 - Distributed Systems

Systematic Cooperation in P2P Grids

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

6.1 Multiprocessor Computing Environment

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization. Dmitry I. Lyakh.

Announcements. me your survey: See the Announcements page. Today. Reading. Take a break around 10:15am. Ack: Some figures are from Coulouris

Module 16: Distributed System Structures. Operating System Concepts 8 th Edition,

An agent-based peer-to-peer grid computing architecture

Gustavo Alonso, ETH Zürich. Web services: Concepts, Architectures and Applications - Chapter 1 2

Distributed systems. Distributed Systems Architectures. System types. Objectives. Distributed system characteristics.

Crossbar switch. Chapter 2: Concepts and Architectures. Traditional Computer Architecture. Computer System Architectures. Flynn Architectures (2)

SMD149 - Operating Systems - Multiprocessing

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy

Distributed Systems. Lecture 4 Othon Michail COMP 212 1/27

Salsa: Scalable Asynchronous Replica Exchange for Parallel Molecular Dynamics Applications

PARALLEL AND DISTRIBUTED PLATFORM FOR PLUG-AND-PLAY AGENT-BASED SIMULATIONS. Wentong CAI

Nash Equilibrium Load Balancing

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Opportunistic Application Flows in Sensor-based Pervasive Environments

Distributed Systems Architectures. Ian Sommerville 2006 Software Engineering, 8th edition. Chapter 12 Slide 1

Dynamic Server Allocation in a Real-Life Deployable Communications Architecture for

Design of Parallel Algorithms. Models of Parallel Computation

Parallel Algorithm Design. CS595, Fall 2010

Scaling Tuple-Space Communication in the Distributive Interoperable Executive Library. Jason Coan, Zaire Ali, David White and Kwai Wong

Challenges in Ubiquitous Data Mining

Lecture 7: Parallel Processing

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

A Survey of Peer-to-Peer Content Distribution Technologies

Concurrent and Distributed Programming Patterns

Chapter 7. Multicores, Multiprocessors, and Clusters. Goal: connecting multiple computers to get higher performance

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Distributed Information Processing

MPI versions. MPI History

02 - Distributed Systems

4. Networks. in parallel computers. Advances in Computer Architecture

MPI History. MPI versions MPI-2 MPICH2

Client Server & Distributed System. A Basic Introduction

Motivation for peer-to-peer

Streaming data Model is opposite Queries are usually fixed and data are flows through the system.

Dynamic inter-core scheduling in Barrelfish

Chapter 1: Distributed Systems: What is a distributed system? Fall 2013

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

Lecture 4: Principles of Parallel Algorithm Design (part 4)

Module 16: Distributed System Structures

Distributed Systems Architectures. Ian Sommerville 2006 Software Engineering, 8th edition. Chapter 12 Slide 1

Introduction Distributed Systems

Ossification of the Internet

Distributed Operating Systems Fall Prashant Shenoy UMass Computer Science. CS677: Distributed OS

OPTIMIZING MOBILITY MANAGEMENT IN FUTURE IPv6 MOBILE NETWORKS

Transcription:

IOS: A Middleware for Decentralized Distributed Computing Boleslaw Szymanski Kaoutar El Maghraoui, Carlos Varela Department of Computer Science Rensselaer Polytechnic Institute http://www.cs.rpi.edu/wwc September 2005

General Concepts IOS Architecture

IOS Overview The Internet Operating System (IOS) is a decentralized middleware framework that provides: Goal: Opportunistic load balancing capabilities Resource profiling Application-level profiling Automatic Reconfiguration of applications in dynamic environments (e.g., Computational Grids) Scalability to worldwide execution environments Modular architecture enabling evaluation of different load balancing and resource profiling strategies 11/11/2005 3

Reconfigurable Distributed Applications Reconfigurable applications have several needs: Availability resilience to failures Scalability Ability to use new resources while the application is executing Adaptability Ability to adapt to load fluctuations to achieve high performance Autonomy Self-configuration, self-healing, and self-organization 11/11/2005 4

Resource Management Requirements Resource Allocation Pre-execution resource discovery Pre-execution allocation of resources Could be informed: an initial good match with application s requirements Could be non-informed: use any available resources and rely on the middleware to find the optimal allocation Resource Reallocation/Reconfiguration Dynamically and repeatedly modifying the mapping between application components and physical resources to respond to the dynamic behavior of the execution environment Need to support process/data migration Resource Profiling Continuous evaluation of both application-level requirements and resource-level characteristics. 11/11/2005 5

IOS Middleware Middleware Agents: Encapsulate components for resource profiling and reconfiguration policies Interface with high level applications Interfacing with IOS agents Applications need to implement specific APIs to interface with IOS agents to exchange profile information and reconfiguration requests. Applications need to support some level of component migration 11/11/2005 6

IOS Architecture 11/11/2005 7

IOS Architecture IOS middleware layer A Resource Profiling Component Captures information about actor, network topologies and available resources A Decision Component Takes migration, split/merge, or replication decisions based on profile information A Protocol Component Performs communication with other agents in a virtual network (e.g., peer-topeer, cluster-to-cluster, centralized) 11/11/2005 8

Using the IOS middleware Start IOS Peer Servers: a mechanism for peer discovery Start a network of IOS theaters Write SALSA programs with actors and extend all actors to autonomous actors Bind autonomous actors to theaters IOS automatically reconfigures the location of actors in the network for improved performance of the application. IOS supports the dynamic addition and removal of theaters 11/11/2005 9

Resource Sensitive Model Decision components use a resource sensitive model parameterized with application s profile information to decide how to balance the resources consumption Reconfiguration decisions Where to migrate When to migrate How many entities to migrate 11/11/2005 10

Virtual Topologies of IOS Agents Agents organize themselves in various networksensitive virtual topologies to sense the underlying physical environments Peer-to-peer topology: agents form a p2p network to exchange profiled information. Cluster-to-cluster topology: agents organize themselves in groups of clusters. Cluster managers form a p2p network. 11/11/2005 11

11/11/2005 12 C2C vs. P2P topologies Workstations Cluster Mobile Resources Cluster Cluster Manager Cluster Cluster Manager Cluster Cluster Manager

IOS Load Balancing Strategies IOS modular architecture enables using different load balancing and profiling strategies, e.g.: Round-robin (RR) Random work-stealing (RS) Application topology-sensitive work-stealing (ATS) Network topology-sensitive work-stealing (NTS) 11/11/2005 13

Random Stealing (RS) Based on Cilk s random work stealing Lightly-loaded nodes periodically send work steal packets to randomly picked peer theaters Application entities migrate from highly loaded theaters to lightly loaded theaters Simple strategy: no broadcasts required Stable strategy: it avoids additional traffic on overloaded networks 11/11/2005 14

Application Topology-Sensitive Work- Stealing (ATS) An extension of RS to collocate components that communicate frequently Decision agent picks the component that will minimize inter-node communication after migration, based on Location of acquaintances Collected communication history Tries to minimize the frequency of remote communication to improve overall system throughput 11/11/2005 15

Network Topology-Sensitive Work-Stealing (NTS) An extension of ATS to take the network topology and performance into consideration Periodically profile end-to-end network performance among peer theaters Latency Bandwidth Tries to minimize the cost of remote communication improving overall system throughput Tightly coupled entities stay within reasonably low latencies/ high bandwidths Loosely coupled entities can flow more freely 11/11/2005 16

Application Case Studies Actors and MPI programs

Salsa and Autonomous Actors SALSA application layer Programming language constructs for actor communication, migration, and coordination. Actors Unit of concurrency Asynchronous message passing State encapsulation Universal actors Universal names Location/theater Ability to migrate between theaters Autonomous actors Performance profiling to improve quality of service Autonomous migration to balance computational load Split and merge to tune granularity Replication to increase fault tolerance 11/11/2005 18

Preliminary Results---Unconnected/Sparse Load balancing experiments use RR, RS and ATS Applications with diverse inter-actor communication topologies Unconnected, sparse, tree, and hypercube actor graphs Unconnected Topology Sparse Topology 80000 50000 70000 45000 Throughput (Messages Processed) 60000 50000 40000 30000 20000 10000 0 0 10 20 30 40 50 60 70 80 90 Time (seconds) 100 110 120 130 140 ARS Throughput RS Throughput Round Robin 11/11/2005 19 Throughput (Messages Processed) 40000 35000 30000 25000 20000 15000 10000 5000 0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 Time (seconds) 100 105 110 115 120 125 130 135 140 145 ARS Throughput RS Throughput Round Robin

Tree and Hypercube Topology Results RS and ATS do not add substantial overhead to RR ATS performs best in all cases with some interconnectivity Tree Topology Hypercube Topology 25000 5000 4500 20000 4000 Throughput (Messages Processed) 15000 10000 ARS Throughput RS Throughput Round Robin Throughput (Messages Processed) 3500 3000 2500 2000 1500 ARS Throughput RS Throughput Round Robin 5000 0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 Time (seconds) 100 105 110 115 120 125 130 135 140 145 11/11/2005 20 1000 500 0 0 10 20 30 40 50 60 70 80 90 Time (seconds) 100 110 120 130 140

Results for applications with high communication to computation ratio 11/11/2005 21

Results for applications with low communication-to-computation ratio 11/11/2005 22

Load Balancing Strategies for Internet-like and Grid-like Environments Simulation results show that: The peer-to-peer protocol performs better for applications with high communication-tocomputation ratio in Internet-like environments The cluster-to-cluster protocol performs better for applications with low communication-tocomputation ratio in Grid-like environments 11/11/2005 23

Migration Policies Group Migration performs better for the four application topologies. Single Migration has a more stable behavior of the application s topology throughput Future Work: Evaluation of migration policies for different sizes of actors. 11/11/2005 24

Single vs. Group Migration 11/11/2005 25

Dynamic Networks Theaters were added and removed dynamically to test scalability. 140000 Unconnected Topology During the 1 st half of the experiment, every 30 seconds, a theater was added. During the 2 nd half, every 30 seconds, a theater was removed Throughput improves as the number of theaters grows. Throughput (Messages Processed) 120000 100000 80000 60000 40000 20000 0 0 15 30 45 60 75 90 105 120 135 150 165 180 195 210 225 240 255 270 285 300 315 330 345 360 375 390 405 420 435 450 465 480 495 510 525 Time (seconds) ARS Throughput RS Throughput 11/11/2005 26

Experiences with MPI programs Motivation Message Passing interface (MPI) is the de-facto standard to implement single program multiple data (SPMD) parallel applications. Widely used in the scientific community MPI Challenges on Dynamic Grids Tailored for tightly coupled systems/static environments Dynamic reconfiguration Process mobility Scale to accommodate joining nodes Shrink to accommodate leaving nodes Fault-tolerance Level of involvement of end-users in effective resource management (e.g. Application-level load balancing) 11/11/2005 27

MPI/IOS Research Goals Enabling MPI on Grid environments A cost-effective solution for scientific computing A scalable solution for high performance distributed applications Goals Extending MPI to support the dynamic reconfiguration and high adaptability to dynamic computational grids Improving performance and fault-tolerance of the code Minimalizing changes to legacy code 11/11/2005 28

MPI/IOS Framework Extending MPI with: Semi-transparent checkpointing Process migration support Integration with IOS for middleware-triggered reconfiguration 11/11/2005 29

Architecture The MPI/IOS runtime architecture consists of the following components: MPI applications Wrapped MPI that includes a Process Checkpointing and Migration (PCM) API The PCM library, and wrappers for all MPI native calls The MPI library, and The IOS runtime components. 11/11/2005 30

11/11/2005 31

MPI/IOS interactions 11/11/2005 32

Case Study: Heat Diffusion Problem A problem that models heat transfer in a solid A two-dimensional mesh is used to represent the problem data space An Iterative Application Highly synchronized 11/11/2005 33

Parallel Decomposition of the Heat Problem Original Data Space N N Legend Ghost Cells Data Cells Boundary Cells Ghost Cell Exchange 4-pt update stencil Parallel Decomposition N P 0 P 1 P 2 P n-1 11/11/2005 34

Empirical Results: Overhead of the PCM library 11/11/2005 35

Process migration evaluation Breakdown of the execution time of 2D heat application iterations on a 4 node cluster using MPICH2 Breakdown of the execution time of 2D heat application iterations on a 4 node cluster using MPI/IOS prototype 11/11/2005 36

Process migration evaluation Measured throughput of the twodimensional heat application using MPICH2 and MPI/IOS. The applications adapted to the load change by migrating the affected process to one of the participating nodes in the case of MPI/IOS. Measured throughput of the 2D heat application using MPICH2 and MPI/IOS. The applications adapted to the load change by migrating the affected process to a fast machine that joined the computation in the case of MPI/IOS. 11/11/2005 37

Reconfiguration Overhead 1400.00 1232.21 1200.00 1000.00 Time(s) 800.00 753.73 Non-reconfigurable execution time Reconfigurable execution time Reconfiguration overhead 600.00 535.72 441.08 400.00 354.16 200.00 257.95 134.12 210.83 0.82 0.90 1.13 1.52 0.00 0.95 1.37 2.44 3.81 Data Size (Megabytes) 11/11/2005 38

Breakdown of Reconfiguration Cost 100% 90% 80% 70% 60% 0.69 0.72 0.82 1.02 Overhead (s) 50% 40% Synchronization Loading Checkpointing 30% 20% 10% 0.08 0.05 0.12 0.07 0.20 0.11 0.32 0.18 0% 0.95 1.37 2.44 3.81 Data Size (Megabytes) 11/11/2005 39

Related and Ongoing Work

Related Work Work Stealing/Internet Computing/P2P Work stealing Cilk s runtime system for multithreaded parallel programming Cilk s scheduler s techniques of work stealing R. D. Blumofe and C. E. Leiserson, Scheduling Multithreaded Computations by Work Stealing, FOCS 94 Internet Computing SETI@home (Berkeley) Folding@home (Stanford) P2P systems Distributed Storage: Freenet, KaZaA File Sharing: Napster, Gnutella 11/11/2005 41

Related Work-- Globus/NWS Globus: A toolkit to address issues related to the development of grid-enabled tools, services and applications www.globus.org NWS A distributed system that periodically monitors and dynamically forecasts the performance of various network and computational resources http://nws.cs.ucsb.edu/ 11/11/2005 42

Ongoing Work Effective decentralized grid load balancing algorithms. Virtual surgical planning simulation on the Rensselaer Grid Scan of real patient is processed to extract solid model and inlet flow waveform. Model is discretized and flow equations solved. Multiple alterations to model are made within intuitive human computer interface and evaluated similarly. This simulation will be run on Rensselaer Grid using MPI/IOS Other potential target applications: ScalaPACK: a popular software package for linear algebra. PCA: principal component analysis using massive data sets 11/11/2005 43