Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University
Outline
- Background: Introduction, Related Work, Summary of the Presented Framework
- Framework: Framework Architecture, Framework Operation
- Experimental Evaluation
- Conclusions & Future Work
Background
Introduction
Motivation
- Traditional approaches adopted for HPC
- Potential for leveraging existing networked resources for HPC
Objective: design and implement a framework that
- Aggregates networked computing resources
- Non-intrusively exploits idle resources for parallel/distributed computing
Challenges that need to be addressed by the framework
1. Adaptability to cluster dynamics
2. System configuration and management overhead
3. Heterogeneity
4. Security and privacy concerns
5. Non-intrusiveness
Related Work
- Approaches: adaptive parallelism, job-level parallelism
- Cluster based: Condor, Piranha, ATLAS, ObjectSpace
- Web based: Charlotte, Javelin, ParaWeb
Summary of the Presented Work
Features
- Java as the core programming language, to leverage portability across heterogeneous platforms
- User responsibilities offloaded to the framework via resource-availability event triggers
- Minimized configuration overhead by employing remote node configuration techniques
Targeted applications
- High computational complexity
- Massively partitioned problems
- Small input and output sizes
Framework Architecture
Architectural Overview
- Master Module: Master Process
- Worker Module: Remote Node Configuration Engine, Worker Process, Rule-Base Protocol, Java Sockets
- Network Management Module: Inference Engine, Rule-Base Protocol, Java Native Interface, Monitoring Agent
- All modules sit on the Jini/JavaSpaces infrastructure and common services, layered over the platform infrastructure (operating system)
Architectural Components: Jini Infrastructure
Jini discovery and join (service, lookup server, and client interaction):
1. Service discovers the lookup server through multicast discovery
2. Service registers with the lookup server using join
3. Client discovers the lookup server through multicast discovery
4. Client performs a lookup directly against the known lookup server and receives the service location
5. Client directly accesses the server and executes code on the server
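The join/lookup steps above can be sketched with a minimal in-memory stand-in for the Jini lookup service. In the real infrastructure, discovery happens over multicast via the net.jini packages; here registration and lookup are direct method calls (all names are illustrative, not the framework's) so the sketch stays self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-in for a Jini lookup service: services join under a
// service-type name, clients look the service up and then use it directly.
public class MiniLookup {
    private final Map<String, Object> registry = new ConcurrentHashMap<>();

    // Step 2: a service registers (joins) with the lookup server.
    public void join(String serviceType, Object service) {
        registry.put(serviceType, service);
    }

    // Step 4: a client performs a lookup and receives the service reference.
    public Object lookup(String serviceType) {
        return registry.get(serviceType);
    }

    public static void main(String[] args) {
        MiniLookup lookup = new MiniLookup();            // the lookup server
        Runnable computeService = () -> System.out.println("running on server");
        lookup.join("ComputeService", computeService);   // service joins
        Runnable found = (Runnable) lookup.lookup("ComputeService");
        found.run();                                     // step 5: client uses it
    }
}
```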
Architectural Components: Master-Worker Paradigm
Master-worker interaction through the space (pool of tasks):
- Master: creates tasks, loads them into the space, and awaits collection of results
- Workers 1..n: collect tasks from the space, execute them, and return results into the space
- Master: collects the result parameters from the space
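The interaction above can be sketched in plain Java, with a BlockingQueue standing in for the JavaSpace task pool (a simplification: the real framework writes and takes entries from the space rather than a local queue; all names here are illustrative).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the master-worker interaction: the master loads tasks into a
// shared pool, workers take tasks, execute them, and write results back,
// and the master aggregates the collected results.
public class MasterWorkerDemo {
    public static int runJob(int numTasks, int numWorkers) {
        try {
            BlockingQueue<Integer> tasks = new LinkedBlockingQueue<>();   // task pool
            BlockingQueue<Integer> results = new LinkedBlockingQueue<>(); // result pool

            // Master: create tasks and load them into the "space".
            for (int i = 1; i <= numTasks; i++) tasks.add(i);

            // Workers: collect a task, execute it (here: square it), return the result.
            ExecutorService pool = Executors.newFixedThreadPool(numWorkers);
            for (int w = 0; w < numWorkers; w++) {
                pool.execute(() -> {
                    Integer t;
                    while ((t = tasks.poll()) != null) results.add(t * t);
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);

            // Master: await collection of all results and aggregate them.
            int sum = 0;
            for (int i = 0; i < numTasks; i++) sum += results.take();
            return sum;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runJob(4, 2)); // squares of 1..4 sum to 30
    }
}
```

Because every task is independent and results are matched only by count, workers can join or leave between tasks, which is what makes the paradigm a natural fit for adaptive clusters.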
Architectural Components: Rule-Base Protocol
UML interaction for the rule-base protocol (SNMP server, SNMP client, inference engine, worker application):
1. Server listens for client connections
2. Client connects and sends its IP address to the server
3. Server assigns a client ID
4. Server invokes the SNMP service for the client
5. Server invokes the inference engine
6. Inference engine decides the signal for the client
7. Signal is sent to the client through the server
8. Client sends the signal to the application
9. Computations are performed
10. Go to step 5
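The inference-engine step (6) can be sketched as a simple rule base: the monitoring agent reports resource usage for a node, and threshold rules decide which signal to route to the worker. The thresholds and signal names below are illustrative assumptions, not the framework's exact rule base.

```java
// Sketch of the inference engine behind the rule-base protocol: given the
// owner's CPU usage on a worker node (as reported by the SNMP monitoring
// agent), decide which control signal to send to the worker process.
public class InferenceEngine {
    public enum Signal { START, PAUSE, RESUME, STOP }

    // Assumed rules: stop the worker when the node is heavily loaded by its
    // owner, pause it under moderate load, and resume (or start) it when the
    // node is idle again. Threshold values are illustrative.
    public static Signal decide(double ownerCpuPercent, boolean workerPaused) {
        if (ownerCpuPercent > 80.0) return Signal.STOP;
        if (ownerCpuPercent > 50.0) return Signal.PAUSE;
        return workerPaused ? Signal.RESUME : Signal.START;
    }

    public static void main(String[] args) {
        System.out.println(decide(90.0, false)); // heavy load: STOP
        System.out.println(decide(60.0, false)); // moderate load: PAUSE
        System.out.println(decide(10.0, true));  // idle again: RESUME
    }
}
```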
Architectural Components: Worker Execution States
Worker state transition diagram: Start moves the worker into the Running state; Pause moves Running to Paused and Resume moves it back to Running; Stop ends execution.
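The transitions in the state diagram can be sketched as a small state machine (a simplified reading of the diagram; state and signal names are assumptions chosen for the sketch):

```java
// Sketch of the worker execution states: RUNNING after start, PAUSED on a
// pause signal, back to RUNNING on resume, and STOPPED on stop. Signals
// that do not apply in the current state are ignored, except unknown ones.
public class WorkerState {
    public enum State { RUNNING, PAUSED, STOPPED }

    private State state = State.RUNNING; // "start" puts the worker here

    public State current() { return state; }

    public void signal(String sig) {
        switch (sig) {
            case "pause":
                if (state == State.RUNNING) state = State.PAUSED;
                break;
            case "resume":
                if (state == State.PAUSED) state = State.RUNNING;
                break;
            case "stop":
                state = State.STOPPED; // assumed valid from any state
                break;
            default:
                throw new IllegalArgumentException("unknown signal: " + sig);
        }
    }
}
```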
Experimental Evaluation
Experimental Evaluation
Application metrics:
- Parallel Monte Carlo simulation for stock option pricing: the problem domain is divided into 50 sub-tasks, each comprising 100 simulations
- Parallel ray tracing: a 600x600 image plane is divided into rectangular slices of 25x600, creating 24 independent tasks

Characteristic            Option Pricing Scheme                          Ray Tracing Scheme
Scalability               Medium                                         High
CPU/memory requirements   Adaptable, depending on number of simulations  High
Dependency                No                                             No

Experimental setup: a Jini server and master workstation (P III, 800 MHz, 256 MB RAM), an SNMP server, and 12 worker nodes (each P III, 300 MHz, 64 MB RAM) connected over a LAN.
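The option-pricing partitioning above (50 independent tasks of 100 simulations each) can be sketched as follows. The sketch prices a European call by Monte Carlo under geometric Brownian motion; the model, its parameters, and all names are illustrative assumptions, not the paper's exact application code.

```java
import java.util.Random;

// Sketch of the partitioned Monte Carlo option-pricing job: each of the 50
// tasks runs 100 independent price-path simulations, and the master
// aggregates the per-task payoff sums into a single discounted estimate.
public class OptionPricingJob {
    static final int TASKS = 50, SIMS_PER_TASK = 100;

    // One worker task: simulate SIMS_PER_TASK terminal prices and return
    // the summed call payoffs max(S_T - K, 0).
    static double runTask(double s0, double k, double r, double sigma,
                          double t, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < SIMS_PER_TASK; i++) {
            double z = rng.nextGaussian();
            double st = s0 * Math.exp((r - 0.5 * sigma * sigma) * t
                                      + sigma * Math.sqrt(t) * z);
            sum += Math.max(st - k, 0.0);
        }
        return sum;
    }

    // Master-side aggregation: discounted mean over all tasks' simulations.
    static double price(double s0, double k, double r, double sigma, double t) {
        Random rng = new Random(42); // fixed seed for a repeatable sketch
        double total = 0.0;
        for (int task = 0; task < TASKS; task++) {
            total += runTask(s0, k, r, sigma, t, rng);
        }
        return Math.exp(-r * t) * total / (TASKS * SIMS_PER_TASK);
    }

    public static void main(String[] args) {
        System.out.println(price(100, 100, 0.05, 0.2, 1.0));
    }
}
```

Because each task touches only its own random paths, the 50 tasks have no inter-task dependency, matching the "Dependency: No" entry in the comparison above.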
Experiment 1: Scalability Analysis (Option Pricing Scheme)
[Figure 1: Scalability analysis for the option pricing scheme: parallel time, maximum worker time, task planning, and task aggregation (ms) versus number of workers (1-13).]
Experiment 2: Adaptation Protocol Analysis (Option Pricing Scheme)
Load simulator 1 issues the Stop signal; load simulator 2 issues the Pause signal.
[Figure 1(a): CPU usage (%) on worker nodes across the stop and pause simulation runs, including class loading, for the option pricing scheme.]
Figure 1(b): Worker reaction times (ms) for the option pricing scheme:

Signal          Start   Stop    Restart  Pause    Restart
Client signal   0       60246   90440    120603   150777
Worker signal   1061    60256   92182    120723   150777
Experiment 3: Dynamic Behavior Patterns under Varying Load Conditions (Option Pricing Scheme)
[Figure 1(a): Time measurements (ms) for the option pricing scheme (total parallel time, task planning and aggregation, maximum master overhead, maximum worker time) under three conditions: without loading, loading 3 workers, and loading 6 workers.]
[Figure 1(b): Number of tasks executed per worker node (workers 1-12) under the same three load conditions.]
Conclusions and Future Work
Conclusions
Summary
- Good scalability for coarse-grained applications
- Idle workstations can be effectively used to compute large parallel jobs
- The framework provides support for global code deployment and configuration management
- Monitoring and reacting to system state minimizes intrusiveness to machines within the cluster
Future Directions
- Optimizations at the application level
- Use of transaction management services to address partial failures
- Incorporating a distributed JavaSpace model to avoid a single point of resource contention or failure
Additional Slides
Summary of Related Cluster-Based Parallel Computing Architectures

Condor
- Approach: cluster based; job-level parallelism; asynchronous offloading of work to idle resources on a request basis
- Key features: queuing mechanism; checkpointing and process migration; remote procedure call and matchmaking
- Strengths: survives partial failure; preserves the local execution environment
- Weaknesses: incompleteness of checkpointing files; difficulty in linking checkpointing code with user code for third-party software

Piranha
- Approach: cluster based; adaptive parallelism; centralized work distribution using the master-worker paradigm
- Key features: implementation of the Linda model using tuple spaces; recycles idle resources based on user-defined criteria
- Strengths: survives partial failure; efficient cycle stealing
- Weaknesses: requires modifications to the Linda systems; supports only Linda programs
Summary of Related Cluster-Based Parallel Computing Architectures (cont'd)

ATLAS
- Approach: cluster based; adaptive parallelism; hierarchical work stealing
- Key features: implementation based on Cilk; emphasis on a thread scheduling algorithm based on work-stealing techniques; runtime library for work stealing, thread management, and marshalling of objects
- Strengths: fault tolerance; localized computations yield more bandwidth, reduced latencies, and a natural mapping to administrative domains
- Weaknesses: periodic polling of the health of participating hosts; global file sharing leads to conflicts in access control; heterogeneity concerns (conversion mechanisms for native libraries, platform-specific Java interpreter)

ObjectSpace/Anaconda
- Approach: cluster based; job-level parallelism at brokers
- Key features: eager scheduling; automated/manual job replication; job-level parallelism at the broker level
- Strengths: improved scalability
- Weaknesses: fault tolerance is not supported among brokers
Summary of Related Web-Based Parallel Computing Architectures

Charlotte
- Approach: web based; adaptive parallelism; centralized work distribution using downloadable applets
- Key features: abstraction of distributed shared memory; data type mapping and use of an atomic update protocol; directory lookup service for idle resource identification; embedded lightweight class server on local hosts
- Strengths: direct inter-applet communication; fault tolerance; eager scheduling and two-phase idempotent execution
- Weaknesses: the centralized broker can be a potential bottleneck

Javelin
- Approach: web based; adaptive parallelism; centralized work distribution using downloadable applets
- Key features: heterogeneous, secure execution environment; portability to include the Linda and SPMD programming models
- Strengths: offline worker computation; applet balancing scheme via broker registration
- Weaknesses: inter-applet communication is routed via the initiating broker, a potential bottleneck
Summary of Related Web-Based Parallel Computing Architectures (cont'd)

ParaWeb
- Approach: web based; adaptive parallelism; centralized work distribution using scheduling servers
- Key features: a Java Parallel Runtime System providing global shared memory and transparent thread management, and a Java Parallel Class Library to facilitate upload and download of execution code across the web
- Strengths: clients can act as workers
- Weaknesses: the JPRS requires modifying the Java interpreter

Open issues across these systems: heterogeneity concerns and manual management.
Architectural Components: JavaSpace Service
Features
- Coordination tool for gluing processes together into a distributed application
- Supports heterogeneous computing
- Supports a dynamic client pool
- Provides a storage repository for Java objects
- Provides a simple API (write, read, take) for accessing the Java objects in the space
[JavaSpaces interaction diagram; source: the Java web site]
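The three operations can be sketched with a minimal in-memory stand-in for the space. The real net.jini.space.JavaSpace interface additionally matches entries against templates and supports leases and transactions; this simplified sketch (all names illustrative) matches on equality only, which is enough to show that read is non-destructive while take is destructive.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal sketch of the JavaSpaces API on a pool of entries:
//   write - add an entry to the space
//   read  - return a matching entry, leaving it in the space
//   take  - remove and return a matching entry
public class MiniSpace<E> {
    private final ConcurrentLinkedQueue<E> entries = new ConcurrentLinkedQueue<>();

    public void write(E entry) {
        entries.add(entry);
    }

    // Non-destructive read: the entry stays in the space.
    public E read(E template) {
        for (E e : entries) {
            if (e.equals(template)) return e;
        }
        return null; // no match (the real API can block or time out instead)
    }

    // Destructive take: the entry is removed from the space.
    public E take(E template) {
        Iterator<E> it = entries.iterator();
        while (it.hasNext()) {
            E e = it.next();
            if (e.equals(template)) {
                it.remove();
                return e;
            }
        }
        return null;
    }
}
```

In the master-worker setting, the master writes task entries, each worker takes one (so no two workers get the same task), and anyone may read shared parameters without consuming them.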
Experiment 1: Scalability Analysis (Ray Tracing Scheme)
[Figure 2: Scalability analysis for the ray tracing scheme: parallel time, maximum worker time, task planning, and task aggregation (ms) versus number of workers (1-5).]
Experiment 1: Scalability Analysis (Pre-fetching Scheme)
[Figure 3: Scalability analysis for the pre-fetching scheme based on page rank: parallel time, maximum worker time, task planning, and task aggregation (ms) versus number of workers (1-5).]
Experiment 2: Adaptation Protocol Analysis (Ray Tracing Scheme)
[Figure 2(a): CPU usage (%) on worker nodes across the pause and stop simulation runs for the ray tracing scheme.]
Figure 2(b): Worker reaction times (ms) for the ray tracing scheme:

Signal          Start   Stop     Restart  Pause    Restart
Client signal   0       151238   181271   331467   391533
Worker signal   1072    151238   182363   347320   391533
Experiment 2: Adaptation Protocol Analysis (Pre-fetching Scheme)
[Figure 3(a): CPU usage (%) on worker nodes across the stop and pause simulation runs, including class loading, for the pre-fetching scheme.]
Figure 3(b): Worker reaction times (ms) for the pre-fetching scheme:

Signal          Start   Stop     Restart  Pause    Restart
Client signal   0       121164   151187   395889   427044
Worker signal   1162    121164   152449   424130   427054
Experiment 3: Dynamic Behavior Patterns under Varying Load Conditions (Ray Tracing Scheme)
[Figure 2(a): Time measurements (ms) for the ray tracing scheme (total parallel time, task planning and aggregation, maximum master overhead, maximum worker time) under three conditions: without loading, loading 1 worker, and loading 2 workers.]
[Figure 2(b): Number of tasks executed per worker node (workers 1-4) under the same three load conditions.]
Experiment 3: Dynamic Behavior Patterns under Varying Load Conditions (Pre-fetching Scheme)
[Figure 3(a): Time measurements (ms) for the pre-fetching scheme (parallel time, task planning and aggregation, maximum master overhead, maximum worker time) under three conditions: without loading, loading 1 worker, and loading 2 workers.]
[Figure 3(b): Number of tasks executed per worker node (workers 1-4) under the same three load conditions.]