Update on EZ-Grid. Priya Raghunath University of Houston. PI : Dr Barbara Chapman

Similar documents
Integrating SGE and Globus in a Heterogeneous HPC Environment

NUSGRID a computational grid at NUS

Héctor Fernández and G. Pierre Vrije Universiteit Amsterdam

Managing CAE Simulation Workloads in Cluster Environments

3C05 - Advanced Software Engineering Thursday, April 29, 2004

Using Resources of Multiple Grids with the Grid Service Provider. Micha?Kosiedowski

Grid Compute Resources and Job Management

ELFms industrialisation plans

The University of Oxford campus grid, expansion and integrating new partners. Dr. David Wallom Technical Manager

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

Computational Web Portals. Tomasz Haupt Mississippi State University

Grid Scheduling Architectures with Globus

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007

Cycle Sharing Systems

Introduction. Software Trends. Topics for Discussion. Grid Technology. GridForce:

Maintaining the Central Management System Database

CS612: IT Technology and Course Overview

Job sample: SCOPE (VLDBJ, 2012)

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

1 BRIEF / Oracle Solaris Cluster Features and Benefits

Introduction to Grid Computing

MERCED CLUSTER BASICS Multi-Environment Research Computer for Exploration and Discovery A Centerpiece for Computational Science at UC Merced

Job-Oriented Monitoring of Clusters

Distributed Multitiered Application

Building the Enterprise

Grid Computing Fall 2005 Lecture 5: Grid Architecture and Globus. Gabrielle Allen

Grid Architectural Models

UGP and the UC Grid Portals

How to build Scientific Gateways with Vine Toolkit and Liferay/GridSphere framework

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

Management of Biometric Data in a Distributed Internet Environment

Overview SENTINET 3.1

Database code in PL-SQL PL-SQL was used for the database code. It is ready to use on any Oracle platform, running under Linux, Windows or Solaris.

Smart Documents Timely Information Access For All T. V. Raman Senior Computer Scientist Advanced Technology Group Adobe Systems

External Tasks. External Milestone. Deadline. ERS_Project_Schedule.mpp Page 1 Fri 4/21/06 9:04 AM

Grid Computing Middleware. Definitions & functions Middleware components Globus glite

Lecture 11 Hadoop & Spark

Designing and Deploying Microsoft Exchange Server 2016

Inside WebSphere Application Server

Exam Questions C

VMware vsphere 6.5: Install, Configure, Manage (5 Days)

Maintaining End-to-End Service Levels for VMware Virtual Machines Using VMware DRS and EMC Navisphere QoS

Enterprise Software Architecture & Design

Enterprise Web based Software Architecture & Design

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.

[VMICMV6.5]: VMware vsphere: Install, Configure, Manage [V6.5]

Chapter 4 Communication

Java EE Application Assembly & Deployment Packaging Applications, Java EE modules. Model View Controller (MVC)2 Architecture & Packaging EJB Module

Cisco Configuration Engine 2.0

GEMS: A Fault Tolerant Grid Job Management System

CloudBATCH: A Batch Job Queuing System on Clouds with Hadoop and HBase. Chen Zhang Hans De Sterck University of Waterloo

Cloud Computing. Summary

TABLE OF CONTENTS 1. INTRODUCTION DEFINITIONS Error! Bookmark not defined REASON FOR ISSUE 2 3. RELATED DOCUMENTS 2 4.

Oracle WebLogic Server 11g: Administration Essentials

Chapter 3. Design of Grid Scheduler. 3.1 Introduction

Rensselaer and UWCalendar2 an institute-wide open-source Java events calendar. Communication & Collaboration Technologies

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

CERTIFICATION SUCCESS GUIDE ENTERPRISE ARCHITECT FOR JAVA 2 PLATFORM, ENTERPRISE EDITION (J2EE ) TECHNOLOGY

status Emmanuel Cecchet

19/05/2010 SPD 09/10 - M. Coppola - The ASSIST Environment 28 19/05/2010 SPD 09/10 - M. Coppola - The ASSIST Environment 29. <?xml version="1.0"?

Fusion Registry 9 SDMX Data and Metadata Management System

Grid Programming Models: Current Tools, Issues and Directions. Computer Systems Research Department The Aerospace Corporation, P.O.

The LGI Pilot job portal. EGI Technical Forum 20 September 2011 Jan Just Keijser Willem van Engen Mark Somers

Enterprise Java Beans

GlassFish High Availability Overview

Research Collection. WebParFE A web interface for the high performance parallel finite element solver ParFE. Report. ETH Library

Web Services. Lecture I. Valdas Rapševičius. Vilnius University Faculty of Mathematics and Informatics

Tutorial 4: Condor. John Watt, National e-science Centre

WHITE PAPER. Good Mobile Intranet Technical Overview

Building solutions using Secure Global Desktop Curtis Cunningham (Presentation stolen from Mr. Steve Taylor)

Globus GTK and Grid Services

Maintaining End-to-End Service Levels for VMware Virtual Machines Using VMware DRS and EMC Navisphere QoS

COMPUTE CANADA GLOBUS PORTAL

Easy Access to Grid Infrastructures

AN ISO 9001:2008 CERTIFIED COMPANY ADVANCED. Java TRAINING.

Chapter 10 Protecting Virtual Environments

Michigan Grid Research and Infrastructure Development (MGRID)

Day 2 August 06, 2004 (Friday)

Service Execution Platform WebOTX To Support Cloud Computing

Systemwalker Service Quality Coordinator. Technical Guide. Windows/Solaris/Linux

Introduction to Distributed Systems (DS)

TUTORIAL: WHITE PAPER. VERITAS Indepth for the J2EE Platform PERFORMANCE MANAGEMENT FOR J2EE APPLICATIONS

WebSphere 4.0 General Introduction

An Introduction to Virtualization and Cloud Technologies to Support Grid Computing

Gatlet - a Grid Portal Framework

Emacspeak Direct Speech Access. T. V. Raman Senior Computer Scientist Adobe Systems. c1996 Adobe Systems Incorporated.All rights reserved.

Mobile Application Support

Introduction to Distributed Systems (DS)

Symphony A Java-based Composition and Manipulation Framework for Computational Grids

An Open Grid Services Architecture for Mobile Network Operators

Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments *

Introduction to Distributed Systems. INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio)

Cornell Red Cloud: Campus-based Hybrid Cloud. Steven Lee Cornell University Center for Advanced Computing

Deploying virtualisation in a production grid

MPI History. MPI versions MPI-2 MPICH2

EMC Documentum Federated Search Services

Compute Infrastructure Management: The Future. Fred van den Bosch CTO, EVP Advanced Technology VERITAS Software Corporation

Application Server Evaluation Method

Cloud Computing. Up until now

Understanding StoRM: from introduction to internals

Transcription:

Update on EZ-Grid Priya Raghunath University of Houston PI : Dr Barbara Chapman chapman@cs.uh.edu

Outline Campus Grid at the University of Houston (UH) Functionality of EZ-Grid Design and Implementation of EZ-Grid Integrated Resource Information Service (IRIS) Adaptive Fault-tolerance Conclusion

Campus Grid Setup Major resource for computational science is Sun cluster at HPC Center (HPCC) Large-scale computing, data intensive computing Shared file system Job submission is managed by SGEEE Other resources include Solaris clusters and Linux clusters Separately constructed, Individually owned by various departments such as Computer Science, Geosciences and Math

Campus Grid Web Browser AQM HPCC Secure http Web Server + Servlet Container EZ Grid Services Secure Secure http Web Browser Computer Science Campus Grid Geo- Sciences

EZ-Grid High-level environment for job submission and resource usage control Uses Globus tools for middleware Security, resource management, information services, data transport etc. EZ-Grid Goals Developing generic brokerage systems for multi-site computing Enforcing absolute control and priority for resource providers GUI for primary functions such as Job Submission, Job Management and file handling Customizable User Profiles

Previous Architecture of EZ-Grid Client-Server Architecture EZ-Grid Client Provides functionalities such as: Logging into the grid (create grid proxy), GUI, Brokering tools to assist in scheduling of user jobs EZ-Grid Server Provides functionalities such as : Management of remote resources Usage Control of the resources by enforcing owners policies Job Submission

Limitations of Client-Server EZ-Grid Model Difficult to deploy Needs to be installed on each of the resource from which user needs to access. Lack of modularization Client side of the client-server system includes both the GUI and the business logic Hard to maintain Client software maintenance becomes harder with distributed clients

New Architecture of EZ-Grid Highly Modular Presentation layer and business logic are separated Enhanced Web-based portal Convenient, ubiquitous and high-level access to the grid Model-View-Controller (MVC) as the design pattern Separate the application logic (model) from the way it is represented to the user (view) Matches user actions with appropriate model Uses Java Server Technology: JSP + Servlets + Struts web framework Struts is a framework based on JSP Model 2

JSP Model 2 Design (MVC)

EZ-Grid Architecture

EZ-Grid Design Presentation layer HTML/JSP web pages containing customizable GUI to provide users with convenient and useful graphical interfaces Web application service layer Uses struts framework to organize/model the generic web application services in the standard MVC paradigm Business Logic The code is encapsulated as Java beans components Grid Service Interface Layer Database Layer

Driving Application-Air Quality Modeling Air quality initiative with federal funds Work to model ozone problems in Greater Houston area Assess strategies for compliance with ozone standards by 2007 Challenges Time critical results Different components execute on heterogeneous environments Interaction between components Reliable execution

Development Stages of EZ-Grid Stage 1 Basic functionalities such as authentication, maintenance of user profile and job profiles, file handling and job executions Stage 2 Maintenance of Data archives, resource information (static and dynamic), resource brokerage and precise usage control Stage 3 Fault tolerance and project management functions

EZ-Grid User Interface

Integrated Resource Information Service (IRIS) Estimates the queue wait time a user could experience based on the number of nodes desired Aids the user in node selection Information updated in real time Profiles every user based on past usage. Helps users whose jobs have consistent execution times Job Turnaround Time: Queue Wait Time + Execution Time The IRIS backend database is used for storing the relevant information. Useful for grid administration and monitoring.

IRIS Architecture Web / Command Line Based Reporting Interface IRIS Controller Database Interface User Interface 1 4 5 6 Heartbeat Monitor 2 Information Extractor 3 Database Resource Management Systems (SGEEE etc.) Grid Resources

IRIS Sample Output Select * from USER where Owner = archit ; Owner Num of Jobs Avg Exec Time Avg Wait Time Max Exec Time Min Exec Time Tool Archit 1444 16386.44 5957.49 1136383 11 IRIS Select Day, count(*) from JOB group by Day; Day count(*) Fri 1746 Mon 1768 Sat 926 Sun 891 Thu 1674 Tue 1362 Wed 1714 Select Wait, Date, Time from PROBE where Num_Nodes=6; 3157, Fri 28 Jun 2003, 18:54:57

Current Techniques for providing Fault Tolerance Checkpointing by saving the entire application image consumes time and disk space thereby affecting performance Some solutions have benefit of saving only the minimal state needed to recover however the degree of application fault-tolerance is usually fixed High degree of fault-tolerance hurts performance even if resources is very stable More recent trends are in the area of reconfigurable applications

Proposed Solution Adaptive fault-tolerance balanced with performance and resource stability Reconfigurable execution adapted to the changing computation power, such as node addition, failure, network bandwidth change, etc

Proposed Solution Reconfigurable MPI with application-level checkpointing programming Run-time systems, providing resource and application monitoring, and Performance model to decide run-time balance To guide programmer to setup different degrees of fault-tolerance in programs To guide run-time systems to decide the best balance between performance, resource stability and fault-tolerance

Proposed Solution Application Program Levels of checkpoints Performance Requirement Application/Job Management Systems Resource Stability Resource Management Systems Resource Monitoring, failure detection Performance Model Fault-tolerance degree Application performance monitoring, failure detection Scheduling ES0 EE1 ES1 EE2 Executed Checkpoint Non-executed Checkpoint Run-time Systems Reconfigurable MPI Application Application Execution EE2 ES2 EE3 ES3 EE5 EE4 Reconfigured execution scheduling ES4

Conclusion Grids need an convenient interface in order to allow easy access to grid services EZ-Grid is taking efforts in that direction Moving towards OGSA Compliance Grid Engine software provides a powerful tool for building campus grids such as the one at UH Investigating general framework for Policy Fault-tolerance without compromising on performance is an important need for grids